CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations
Pith reviewed 2026-05-10 14:54 UTC · model grok-4.3
The pith
CLAD detects log anomalies directly from compressed byte streams by identifying disruptions in normal compression patterns.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CLAD is the first framework to perform log anomaly detection directly on compressed byte streams. It rests on the observation that normal logs compress into regular byte patterns while anomalies produce detectable multi-scale deviations in those same bytes. The model uses a dilated convolutional encoder to read the raw bytes, a hybrid Transformer-mLSTM to model dependencies across the stream, and four-way aggregation pooling to combine features at different scales, trained first by masked pre-training on byte sequences and then by focal-contrastive fine-tuning to manage class imbalance.
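The four-way aggregation pooling is described only at this level of detail; as a minimal sketch, one plausible reading is that four complementary aggregations are concatenated over the per-position features. The choice of mean, max, first-position, and last-position pooling below is an assumption for illustration, not the paper's specification:

```python
# Hypothetical sketch of "four-way aggregation pooling". The paper does not
# spell out the four operations here, so we assume mean, max, first- and
# last-position pooling over a sequence of per-byte feature values.
def four_way_pool(features):
    """Aggregate a sequence of per-position features (floats) four ways."""
    assert features, "need a non-empty feature sequence"
    return {
        "mean": sum(features) / len(features),  # global average view
        "max": max(features),                   # strongest activation
        "first": features[0],                   # left boundary context
        "last": features[-1],                   # right boundary context
    }

pooled = four_way_pool([0.1, 0.9, 0.4, 0.2])
# Concatenating these views gives the classifier both global statistics and
# boundary information from a single byte window.
```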
What carries the argument
Dilated convolutional byte encoder combined with hybrid Transformer-mLSTM and four-way aggregation pooling that extracts multi-scale deviations directly from opaque compressed bytes.
If this is right
- Detection eliminates all decompression and parsing overhead for streaming logs.
- Average F1-score reaches 0.9909 across five datasets while outperforming the best prior method by 2.72 percentage points.
- The approach generalizes to structured streaming compressors without modification.
- Two-stage training with masked pre-training and focal-contrastive fine-tuning handles severe class imbalance in log data.
- Real-time processing becomes feasible on high-volume log streams that would otherwise require heavy pre-processing.
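The "focal" half of the focal-contrastive objective presumably follows focal loss (Lin et al. 2017), which down-weights easy examples so the rare anomalous class dominates training. A minimal binary sketch, with illustrative gamma and alpha values not taken from the paper:

```python
import math

# Binary focal loss (Lin et al. 2017). gamma and alpha here are the common
# defaults from that paper, used for illustration; CLAD's actual settings
# are not stated in this review.
def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for predicted positive-class probability p and label y."""
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently-correct normal log contributes almost nothing...
easy = focal_loss(p=0.01, y=0)
# ...while a confidently-missed anomaly dominates the gradient signal.
hard = focal_loss(p=0.01, y=1)
```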
Where Pith is reading between the lines
- Similar byte-pattern disruption detection could extend to anomaly finding in other compressed streams such as network packets or time-series sensor data.
- Removing decompression steps would lower both latency and energy use in continuous monitoring systems that handle terabytes of logs daily.
- The architecture's focus on raw bytes might allow direct application to logs compressed by newer or custom algorithms not tested in the original evaluation.
Load-bearing premise
Anomalies in logs will reliably create byte-pattern disruptions in compressed streams that differ from normal logs in ways the model can learn without ever seeing the original text.
What would settle it
A dataset where anomalies compress to byte sequences indistinguishable from normal logs under the same compressor, causing the model's F1 score to fall below that of decompressing baselines.
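The load-bearing premise can at least be illustrated (not validated) with a toy experiment using stdlib zlib and invented log lines: injecting a novel anomaly line into an otherwise repetitive stream forces the compressor to emit extra literals, so the compressed bytes measurably diverge:

```python
import zlib

# Toy illustration of the premise, not the paper's pipeline. The log
# templates below are invented for this sketch.
normal = b"INFO request ok path=/api status=200\n" * 50
# Inject one anomalous line at a line boundary (925 = 25 full lines in).
anomalous = normal[:925] + b"FATAL segfault at 0xdeadbeef core dumped\n" + normal[925:]

c_normal = zlib.compress(normal)
c_anom = zlib.compress(anomalous)

# The anomaly cannot be back-referenced, so the anomalous stream is longer
# and its bytes diverge from the normal stream's regular pattern.
print(len(c_normal), len(c_anom))
```

Whether real anomalies, which often reuse vocabulary from normal logs, disrupt the byte stream this cleanly is exactly what the proposed falsifying dataset would probe.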
Original abstract
The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byte streams. CLAD bypasses these bottlenecks by exploiting a key insight: normal logs compress into regular byte patterns, while anomalies systematically disrupt them. To extract these multi-scale deviations from opaque bytes, we propose a purpose-built architecture integrating a dilated convolutional byte encoder, a hybrid Transformer--mLSTM, and four-way aggregation pooling. This is coupled with a two-stage training strategy of masked pre-training and focal-contrastive fine-tuning to effectively handle severe class imbalance. Evaluated across five datasets, CLAD achieves a state-of-the-art average F1-score of 0.9909 and outperforms the best baseline by 2.72 percentage points. It delivers superior accuracy while completely eliminating decompression and parsing overheads, offering a robust solution that generalizes to structured streaming compressors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CLAD, the first deep learning framework for log anomaly detection (LAD) performed directly on compressed byte streams without decompression or parsing. It rests on the insight that normal logs produce regular byte patterns under compression while anomalies create detectable multi-scale disruptions. The proposed architecture combines a dilated convolutional byte encoder, a hybrid Transformer-mLSTM, and four-way aggregation pooling, trained via masked pre-training followed by focal-contrastive fine-tuning to address class imbalance. Experiments on five datasets report a state-of-the-art average F1-score of 0.9909, outperforming the best baseline by 2.72 percentage points while eliminating pre-processing overhead and generalizing to structured streaming compressors.
Significance. If the central empirical claims hold, the work offers a practically significant advance by removing decompression and parsing costs in high-volume log processing pipelines. The two-stage training strategy and direct operation on opaque bytes are notable strengths, as is the explicit focus on efficiency. The result could influence future systems-oriented ML research on compressed or encoded data representations, provided the performance is shown to stem from the architecture rather than dataset-specific effects.
major comments (2)
- [Abstract and §3] Abstract and §3 (Method): The load-bearing assumption that anomalies 'systematically disrupt' regular byte patterns in compressed streams (enabling reliable detection without decompression) is not supported by ablations on compressor type (e.g., adaptive/dictionary-based vs. fixed) or anomaly injection methods. This leaves open whether the reported F1 gains generalize or are tied to the five specific datasets and compressor used.
- [§4 and §5] §4 (Architecture) and §5 (Experiments): No ablation results are presented that isolate the contribution of the dilated convolutional encoder, hybrid Transformer-mLSTM, or four-way pooling versus the masked pre-training and focal-contrastive fine-tuning. Without these, it is unclear whether the 2.72-point improvement is attributable to the novel components or to training choices.
minor comments (2)
- [Table 1 or §5.1] Table 1 or §5.1: Dataset characteristics (log formats, compression ratios, anomaly rates) and baseline implementation details should be expanded to allow reproduction and assessment of whether post-hoc selection occurred.
- [§5.2] §5.2: Statistical significance of the F1 scores (error bars, multiple random seeds, or paired tests) should be reported to substantiate the SOTA claim.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will incorporate the suggested analyses into the revised manuscript.
Point-by-point responses
Referee: [Abstract and §3] Abstract and §3 (Method): The load-bearing assumption that anomalies 'systematically disrupt' regular byte patterns in compressed streams (enabling reliable detection without decompression) is not supported by ablations on compressor type (e.g., adaptive/dictionary-based vs. fixed) or anomaly injection methods. This leaves open whether the reported F1 gains generalize or are tied to the five specific datasets and compressor used.
Authors: We agree that explicit ablations on compressor families and controlled anomaly injection would further substantiate the core insight. The current manuscript already reports results across five heterogeneous datasets and notes generalization to structured streaming compressors. In the revision we will add: (i) experiments with additional compressors (zlib, LZ4, Zstandard) to separate fixed vs. adaptive/dictionary behavior, and (ii) synthetic anomaly-injection studies that quantify byte-pattern disruption. These results will be placed in §3 and §5 and will directly test whether detection performance tracks the hypothesized multi-scale disruptions rather than dataset idiosyncrasies. revision: yes
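The proposed compressor ablation could be prototyped along these lines. The rebuttal names zlib, LZ4, and Zstandard; LZ4 and Zstandard need third-party bindings, so this sketch substitutes Python's stdlib bz2 and lzma as stand-ins for other compressor families:

```python
import bz2
import lzma
import zlib

# Sketch of the compressor-family ablation: the same log content yields very
# different compressed byte streams under different compressors, which is the
# input-distribution shift the ablation must probe. Log line is invented.
log = b"WARN retry queue=ingest depth=17\n" * 100

streams = {
    "zlib": zlib.compress(log),   # DEFLATE: LZ77 + Huffman
    "bz2": bz2.compress(log),     # block-sorting (BWT-based)
    "lzma": lzma.compress(log),   # dictionary-based range coding
}
# A detector trained on one family sees different byte statistics under
# another; the ablation would retrain/evaluate CLAD per compressor.
for name, blob in streams.items():
    print(name, len(blob), blob[:8].hex())
```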
Referee: [§4 and §5] §4 (Architecture) and §5 (Experiments): No ablation results are presented that isolate the contribution of the dilated convolutional encoder, hybrid Transformer-mLSTM, or four-way pooling versus the masked pre-training and focal-contrastive fine-tuning. Without these, it is unclear whether the 2.72-point improvement is attributable to the novel components or to training choices.
Authors: We concur that component-wise ablations are necessary to attribute gains. We have since run the requested studies: (a) replacing the dilated convolutional encoder with a standard convolution stack, (b) substituting the hybrid Transformer-mLSTM with a pure Transformer or mLSTM-only decoder, (c) removing the four-way aggregation pooling, and (d) comparing the two-stage (masked pre-training + focal-contrastive) regime against single-stage and standard cross-entropy fine-tuning. The new results, to be added as a dedicated subsection and table in §5, show that both the architectural modules and the training strategy contribute non-redundant improvements, with the full CLAD configuration required to reach the reported 0.9909 average F1. revision: yes
Circularity Check
No circularity: empirical ML framework with external dataset validation
full rationale
The paper introduces an empirical deep learning architecture (dilated conv encoder + hybrid Transformer-mLSTM + pooling) and two-stage training for log anomaly detection on compressed byte streams. All performance claims (SOTA F1=0.9909 on five datasets) rest on direct experimental evaluation rather than any derivation, equation, or self-referential reduction. No load-bearing steps reduce predictions to fitted inputs by construction, and no self-citations are invoked to justify core premises. The work is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- dilation rates and kernel sizes in byte encoder
- hyperparameters for masked pre-training and focal-contrastive fine-tuning
axioms (1)
- Domain assumption: normal logs compress into regular byte patterns while anomalies systematically disrupt them in detectable ways.
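The first free parameter governs how much of the byte stream the encoder sees at once. Standard receptive-field arithmetic makes the trade-off concrete; the kernel size and dilation schedule below are hypothetical examples, since the paper lists them only as free parameters:

```python
# Receptive-field arithmetic for stacked 1-D dilated convolutions.
# kernel_size=3 and dilations=[1, 2, 4, 8] are illustrative values,
# not taken from the paper.
def receptive_field(kernel_size, dilations):
    """Receptive field (in bytes) of a stack of 1-D dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the field by (k-1)*d
    return rf

# Exponentially increasing dilations cover long byte spans with few layers.
print(receptive_field(kernel_size=3, dilations=[1, 2, 4, 8]))  # → 31
```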
Reference graph
Works this paper leans on
- [1] Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. 2024. xLSTM: Extended long short-term memory. Advances in Neural Information Processing Systems 37 (2024), 107547–107603.
- [2] Rui Chen, Shenglin Zhang, Dongwen Li, Yuzhe Zhang, Fangrui Guo, Weibin Meng, Dan Pei, Yuzhi Zhang, Xu Chen, and Yuqing Liu. 2020. LogTransfer: Cross-system log anomaly detection for software systems with transfer learning. In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE). IEEE, 37–47.
- [3] Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. 2017. DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 1285–1298.
- [4] Yangxin Fan, Haolai Che, and Yinghui Wu. 2025. Inference-Friendly Graph Compression for Graph Neural Networks. Proceedings of the VLDB Endowment 18, 9 (2025), 3203–3215.
- [5] Jiawei Guan, Feng Zhang, Siqi Ma, Kuangyu Chen, Yihua Hu, Yuxing Chen, Anqun Pan, and Xiaoyong Du. 2023. Homomorphic compression: Making text processing on compression unlimited. Proceedings of the ACM on Management of Data 1, 4 (2023), 1–28.
- [6] Haixuan Guo, Shuhan Yuan, and Xintao Wu. 2021. LogBERT: Log anomaly detection via BERT. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
- [7] Hao Hu, Qiyang Zheng, Xiangyu Zou, Lisha Qin, Chengwei Zhang, Wanchuan Zhang, Zhaoheng Jiang, Dingwen Tao, Hongpeng Wang, and Wen Xia. 2025. A cost-effective and decompression-transparent compressor for OLTP-oriented databases. In 2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE, 405–418.
- [8] Peng Jia, Shaofeng Cai, Beng Chin Ooi, Pinghui Wang, and Yiyuan Xiong. 2023. Robust and transferable log-based anomaly detection. Proceedings of the ACM on Management of Data 1, 1 (2023), 1–26.
- [9] Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. 2020. Supervised contrastive learning. Advances in Neural Information Processing Systems 33 (2020), 18661–18673.
- [10] Van-Hoang Le and Hongyu Zhang. 2021. Log-based anomaly detection without log parsing. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 492–504.
- [11] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision. 2980–2988.
- [12] Jie Liu, Jiamou Liu, Kaiqi Zhao, Yanni Tang, and Wu Chen. 2024. TP-GNN: Continuous dynamic graph neural network for graph classification. In 2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 2848–2861.
- [13] Jinyang Liu, Jieming Zhu, Shilin He, Pinjia He, Zibin Zheng, and Michael R. Lyu. 2019. Logzip: Extracting hidden structures via iterative clustering for log compression. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 863–873.
- [14] Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).
- [15] Siyang Lu, Xiang Wei, Yandong Li, and Liqiang Wang. 2018. Detecting anomaly in big data system logs using convolutional neural network. In 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Cong...
- [16] Lei Ma, Lei Cao, Peter M. VanNostrand, Dennis M. Hofmann, Yao Su, and Elke A. Rundensteiner. 2024. Pluto: Sample selection for robust anomaly detection on polluted log data. Proceedings of the ACM on Management of Data 2, 4 (2024), 1–25.
- [17] Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun, et al. 2019. LogAnomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs. In IJCAI, Vol. 19. 4739–4745.
- [18] Adam Oliner and Jon Stearley. 2007. What supercomputers say: A study of five system logs. In 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07). IEEE, 575–584.
- [19] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018).
- [20] Jiaxing Qi, Shaohan Huang, Zhongzhi Luan, Shu Yang, Carol Fung, Hailong Yang, Depei Qian, Jing Shang, Zhiwen Xiao, and Zhihui Wu. 2023. LogGPT: Exploring ChatGPT for log-based anomaly detection. In 2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Dat...
- [21] Kirk Rodrigues, Yu Luo, and Ding Yuan. 2021. CLP: Efficient and scalable search on compressed text logs. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 183–198.
- [22] Noam Shazeer. 2020. GLU variants improve Transformer. arXiv preprint arXiv:2002.05202 (2020).
- [23] Yicheng Sui, Xiaotian Wang, Tianyu Cui, Tong Xiao, Chenghao He, Shenglin Zhang, Yuzhi Zhang, Xiao Yang, Yongqian Sun, and Dan Pei. 2025. Bridging the gap: LLM-powered transfer learning for log anomaly detection in new software systems. In 2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE, 4414–4427.
- [24] Benzhao Tang, Shiyu Yang, Zhitao Shen, Wenjie Zhang, Xuemin Lin, and Zhihong Tian. 2025. LogLite: Lightweight Plug-and-Play Streaming Log Compression. Proceedings of the VLDB Endowment 18, 11 (2025), 3757–3770.
- [25] Yanni Tang, Zhuoxing Zhang, Kaiqi Zhao, Lanting Fang, Zhenhua Li, and Wu Chen. 2024. Substructure-Aware Log Anomaly Detection. Proceedings of the VLDB Endowment 18, 2 (2024), 213–225.
- [26] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
- [27] Yi Wan, Yilin Liu, Dong Wang, and Yujin Wen. 2021. GLAD-PAW: Graph-based log anomaly detection by position aware weighted graph attention network. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 66–77.
- [28] Rui Wang, Devin Gibson, Kirk Rodrigues, Yu Luo, Yun Zhang, Kaibo Wang, Yupeng Fu, Ting Chen, and Ding Yuan. 2024. μSlope: High Compression and Fast Search on Semi-Structured Logs. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). 529–544.
- [29] Ziheng Wang, Junyu Wei, Alex Aiken, Guangyan Zhang, Jacob O. Tørring, Rain Jiang, Chenyu Jiang, and Wei Xu. 2025. LogCloud: Fast Search of Compressed Logs on Object Storage. Proceedings of the VLDB Endowment 18, 8 (2025), 2362–2370.
- [30] Junyu Wei, Guangyan Zhang, Junchao Chen, Yang Wang, Weimin Zheng, Tingtao Sun, Jiesheng Wu, and Jiangwei Jiang. 2023. LogGrep: Fast and cheap cloud log storage by exploiting both static and runtime patterns. In Proceedings of the Eighteenth European Conference on Computer Systems. 452–468.
- [31] Junyu Wei, Guangyan Zhang, Yang Wang, Zhiwei Liu, Zhanyang Zhu, Junchao Chen, Tingtao Sun, and Qi Zhou. 2021. On the feasibility of parser-based log compression in large-scale cloud systems. In 19th USENIX Conference on File and Storage Technologies (FAST 21). 249–262.
- [32] Yuxin Wu and Kaiming He. 2018. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV). 3–19.
- [33] Yongzheng Xie, Hongyu Zhang, and Muhammad Ali Babar. 2022. LogGD: Detecting anomalies from system logs with graph neural networks. In 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS). IEEE, 299–310.
- [34] Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tieyan Liu. 2020. On layer normalization in the Transformer architecture. In International Conference on Machine Learning. PMLR, 10524–10533.
- [35] Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan. 2009. Online system problem detection by mining patterns of console logs. In 2009 Ninth IEEE International Conference on Data Mining. IEEE, 588–597.
- [36] Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, and Wenbin Zhang. 2021. Semi-supervised log-based anomaly detection via probabilistic label estimation. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 1448–1460.
- [37] Guangba Yu, Pengfei Chen, Pairui Li, Tianjun Weng, Haibing Zheng, Yuetang Deng, and Zibin Zheng. 2023. LogReducer: Identify and reduce log hotspots in kernel on the fly. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1763–1775.
- [38] Biao Zhang and Rico Sennrich. 2019. Root mean square layer normalization. Advances in Neural Information Processing Systems 32 (2019).
- [39] Feng Zhang, Weitao Wan, Chenyang Zhang, Jidong Zhai, Yunpeng Chai, Haixiang Li, and Xiaoyong Du. 2022. CompressDB: Enabling efficient compressed data direct processing for various databases. In Proceedings of the 2022 International Conference on Management of Data. 1655–1669.
- [40] Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li, Yong Yang, and Zhonghai Wu. 2024. Multivariate log-based anomaly detection for distributed database. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4256–4267.
- [41] Xu Zhang, Yong Xu, Qingwei Lin, Bo Qiao, Hongyu Zhang, Yingnong Dang, Chunyu Xie, Xinsheng Yang, Qian Cheng, Ze Li, et al. 2019. Robust log-based anomaly detection on unstable log data. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 807–817.
- [42] Yanliang Zhou, Feng Zhang, Tuo Lin, Yuanjie Huang, Saiqin Long, Jidong Zhai, and Xiaoyong Du. 2024. F-TADOC: FPGA-based text analytics directly on compression with HLS. In 2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, 3739–3752.
- [43] Jieming Zhu, Shilin He, Pinjia He, Jinyang Liu, and Michael R. Lyu. 2023. Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics. In IEEE International Symposium on Software Reliability Engineering (ISSRE).
- [44] Xuhang Zhu, Xiu Tang, Sai Wu, Jichen Li, Haobo Wang, Chang Yao, Quanqing Xu, and Gang Chen. 2025. CoLA: Model Collaboration for Log-based Anomaly Detection. Proceedings of the VLDB Endowment 18, 11 (2025), 3979–3987.