pith. machine review for the scientific record.

arxiv: 2604.08028 · v1 · submitted 2026-04-09 · 💻 cs.SE

Recognition: no theorem link

A Comparative Study of Semantic Log Representations for Software Log-based Anomaly Detection

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 18:04 UTC · model grok-4.3

classification 💻 cs.SE
keywords log anomaly detection · semantic embeddings · BERT · QTyBERT · deep learning · software logs · CPU efficiency · quantization

The pith

QTyBERT produces log embeddings that match or beat BERT-based anomaly detection while generating them far faster on CPUs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks static word embeddings and full BERT for turning log text into vectors that feed deep learning anomaly detectors on three public datasets. It finds BERT more accurate but too slow for practical CPU use, while static methods run quickly yet often miss detections. To fix the trade-off, the authors build QTyBERT from two components: SysBE, a quantized lightweight BERT variant, and CroSysEh, an unsupervised model trained on logs from many systems to sharpen the embeddings. If the results hold, this lets teams run effective deep learning log analysis without needing GPU resources or accepting weak static performance.

Core claim

QTyBERT uses SysBE, a system-specific quantized lightweight BERT, to encode log events into embeddings on CPUs, and trains CroSysEh on unlabeled logs from multiple systems to capture semantic structure in the embedding space. When these embeddings feed the same deep learning models, anomaly detection effectiveness on the BGL, Thunderbird, and Spirit datasets reaches or exceeds that of full BERT embeddings, while log embedding generation time drops close to the speed of static word embedding methods like Word2Vec or FastText.
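The paper's abstract does not specify SysBE's quantization scheme. As a hedged illustration only, a minimal symmetric int8 post-training quantizer, the generic kind of step such a lightweight variant might apply to its weight tensors, could look like this (all names and choices below are assumptions, not the authors' implementation):

```python
# Sketch of symmetric per-tensor int8 post-training quantization.
# Hypothetical stand-in for whatever scheme SysBE actually uses.

def quantize_int8(weights):
    """Map floats to int8 codes: w ≈ scale * q, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [scale * v for v in q]

weights = [0.31, -1.27, 0.005, 0.84]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Uniform quantization error is bounded by half a step (scale / 2).
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

A real deployment would likely quantize per channel and pick scales from a calibration set, consistent with the "small calibration dataset" that appears in Figure 1's workflow.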

What carries the argument

QTyBERT, which combines system-specific quantization in SysBE with cross-system unsupervised enhancement via CroSysEh to produce usable log embeddings.

If this is right

  • DL models can achieve BERT-level log anomaly detection on BGL, Thunderbird, and Spirit without the long embedding generation times that limit BERT in CPU settings.
  • Static word embeddings remain an option for maximum speed, but QTyBERT closes most of the effectiveness gap without their typical performance shortfalls.
  • The method supports practical deployment of semantic log analysis in environments where GPU access is limited or latency matters.
  • Quantization and multi-system pretraining can be reused to adapt other contextual embedding approaches for log data.
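The CPU-time comparison these bullets rest on can be sketched with a small timing harness. The two "encoders" below are toy stand-ins (a hash-based lookup versus a heavier per-token pass), not the paper's models:

```python
# Minimal wall-clock harness for comparing per-event embedding cost on CPU.
import time

def time_embedding(encode, events, repeats=3):
    """Return the best wall-clock seconds to embed all events with `encode`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for event in events:
            encode(event)
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in encoders: a cheap "static" lookup vs. a heavier contextual pass.
static_encode = lambda e: [hash(t) % 100 / 100.0 for t in e.split()]
def contextual_encode(e):
    toks = e.split()
    return [sum(hash((i, t)) % 100 for i, t in enumerate(toks)) / 100.0] * 8

events = ["instruction cache parity error corrected"] * 200
t_static = time_embedding(static_encode, events)
t_ctx = time_embedding(contextual_encode, events)
assert t_static >= 0 and t_ctx >= 0  # both measurable on CPU
```

The paper's claim, in these terms, is that QTyBERT's measured time lands near the static encoder's while its downstream F1 lands near full BERT's.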

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Teams could first adopt QTyBERT for broad CPU-based monitoring and switch to full BERT only on subsets where extra accuracy justifies the cost.
  • The cross-system training idea might extend to other embedding tasks that need both efficiency and semantic depth, such as code snippet classification.
  • If the multi-system training generalizes well, similar lightweight variants could reduce reliance on large pretrained models across software engineering tasks.

Load-bearing premise

Training CroSysEh on unlabeled logs from multiple systems will improve the semantic quality of SysBE embeddings for the target datasets without adding biases or dropping important system-specific details.

What would settle it

The claim would fail if, on a held-out log dataset, DL models using QTyBERT embeddings produced substantially lower anomaly detection F1 scores than the same models using full BERT embeddings while still showing the claimed speed advantage.
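A hedged sketch of this settling experiment: run the same detector on embeddings from both sources and compare held-out F1. The labels and predictions below are toy data, not the paper's results:

```python
# Compare two embedding sources through the same detector via held-out F1.

def f1_score(y_true, y_pred):
    """F1 for binary anomaly labels (1 = anomaly)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true       = [1, 1, 0, 0, 1, 0, 1, 0]
pred_qtybert = [1, 1, 0, 0, 1, 0, 0, 0]  # hypothetical QTyBERT-fed detector
pred_bert    = [1, 1, 0, 1, 1, 0, 1, 0]  # hypothetical BERT-fed detector
gap = f1_score(y_true, pred_bert) - f1_score(y_true, pred_qtybert)
# A substantial positive gap, replicated across datasets, would refute the claim.
```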

Figures

Figures reproduced from arXiv:2604.08028 by Mika V. Mäntylä, Nana Reinikainen, Xiaozhou Li, Ying Song, Yuqing Wang.

Figure 1
Figure 1. Overall workflow of QTyBERT: during application in a target system, SysBE produces log embeddings, which CroSysEh then processes to obtain the final log representations. Inputs include multi-system unlabeled log events, a small calibration dataset for quantization, and target-system unlabeled log events.
Figure 2
Figure 2. t-SNE visualizations comparing log embeddings.
Figure 3
Figure 3. Trade-off between detection effectiveness (Avg F1-…) [caption truncated at source].
original abstract

Recent deep learning (DL) methods for log anomaly detection increasingly rely on semantic log representation methods that convert the textual content of log events into vector embeddings as input to DL models. However, these DL methods are typically evaluated as end-to-end pipelines, while the impact of different semantic representation methods is not well understood. In this paper, we benchmark widely used semantic log representation methods, including static word embedding methods (Word2Vec, GloVe, and FastText) and the BERT-based contextual embedding method, across diverse DL models for log-event level anomaly detection on three publicly available log datasets: BGL, Thunderbird, and Spirit. We identify an effectiveness--efficiency trade off under CPU deployment settings: the BERT-based method is more effective, but incurs substantially longer log embedding generation time, limiting its practicality; static word embedding methods are efficient but are generally less effective and may yield insufficient detection performance. Motivated by this finding, we propose QTyBERT, a novel semantic log representation method that better balances this trade-off. QTyBERT uses SysBE, a lightweight BERT variant with system-specific quantization, to efficiently encode log events into vector embeddings on CPUs, and leverages CroSysEh to enhance the semantic expressiveness of these log embeddings. CroSysEh is trained unsupervisedly using unlabeled logs from multiple systems to capture the underlying semantic structure of the BERT model's embedding space. We evaluate QTyBERT against existing semantic log representation methods. Our results show that, for the DL models, using QTyBERT-generated log embeddings achieves detection effectiveness comparable to or better than BERT-generated log embeddings, while bringing log embedding generation time closer to that of static word embedding methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper benchmarks static word embeddings (Word2Vec, GloVe, FastText) and BERT-based contextual embeddings for deep learning models in log-event anomaly detection on the public BGL, Thunderbird, and Spirit datasets. It identifies an effectiveness-efficiency trade-off under CPU settings and proposes QTyBERT, a method using SysBE (a lightweight BERT variant with system-specific quantization) combined with CroSysEh (an unsupervised cross-system enhancer trained on pooled multi-system logs) to achieve detection performance comparable or superior to BERT while approaching the speed of static embeddings.

Significance. If the central claims hold after addressing experimental gaps, the work would be significant for software engineering and systems reliability by clarifying trade-offs in semantic log representations and offering a practical CPU-friendly alternative. Strengths include the systematic comparison across multiple public datasets and DL models, plus the explicit focus on deployment constraints, which could guide more efficient anomaly detection pipelines.

major comments (2)
  1. [Proposed Method] The central effectiveness claim for QTyBERT (comparable or better than BERT) depends on CroSysEh's ability to enhance semantics via unsupervised multi-system training without dilution or negative transfer of target-system information; however, the manuscript provides no ablation studies, analysis of system-specific vocabulary retention, or tests for bias introduction on BGL/Thunderbird/Spirit, leaving the robustness of the trade-off unverified.
  2. [Experiments] The experimental evaluation lacks critical details on the precise anomaly detection metrics (e.g., F1, precision-recall), statistical significance tests, hyperparameter selection and tuning procedures, and any safeguards against post-hoc model or threshold selections; these omissions directly affect the reliability of the reported benchmark results and the claimed balance between effectiveness and efficiency.
minor comments (2)
  1. [Abstract] The abstract and method descriptions introduce SysBE and CroSysEh without sufficiently clear initial definitions or distinctions from standard BERT components, which could improve readability for readers unfamiliar with the variants.
  2. The paper would benefit from explicit discussion of potential limitations of pooling logs across systems in CroSysEh, even if preliminary results appear positive.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for strengthening the manuscript. We address each major comment below and will incorporate revisions to improve the analysis and experimental details.

point-by-point responses
  1. Referee: [Proposed Method] The central effectiveness claim for QTyBERT (comparable or better than BERT) depends on CroSysEh's ability to enhance semantics via unsupervised multi-system training without dilution or negative transfer of target-system information; however, the manuscript provides no ablation studies, analysis of system-specific vocabulary retention, or tests for bias introduction on BGL/Thunderbird/Spirit, leaving the robustness of the trade-off unverified.

    Authors: We acknowledge that dedicated ablation studies would provide stronger evidence for CroSysEh's role and confirm the lack of negative transfer or bias. In the revised manuscript, we will add ablations comparing SysBE embeddings alone versus full QTyBERT (with CroSysEh) across all three datasets. We will also include analysis of system-specific vocabulary retention (e.g., token overlap and embedding similarity metrics between original and enhanced representations) and bias checks via performance on system-specific anomaly subsets. These additions will directly verify the robustness of the reported effectiveness-efficiency trade-off. revision: yes

  2. Referee: [Experiments] The experimental evaluation lacks critical details on the precise anomaly detection metrics (e.g., F1, precision-recall), statistical significance tests, hyperparameter selection and tuning procedures, and any safeguards against post-hoc model or threshold selections; these omissions directly affect the reliability of the reported benchmark results and the claimed balance between effectiveness and efficiency.

    Authors: We agree that additional experimental details are necessary for full reproducibility and to substantiate the benchmark claims. The revised Experiments section will explicitly report all metrics (F1, precision, recall, and AUC where applicable), include statistical significance tests (e.g., paired t-tests or McNemar's test with p-values for model comparisons), detail the hyperparameter tuning process (including search ranges, validation strategy, and selection criteria), and describe threshold selection safeguards (e.g., fixed use of validation sets only, with the exact procedure documented to avoid post-hoc bias). These changes will be made without altering the core results. revision: yes
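The McNemar test the rebuttal proposes can be sketched from first principles. This is a hedged stand-in for a statistics library routine, and the disagreement counts are illustrative only:

```python
# McNemar's test for paired detector comparisons on the same test items.
import math

def mcnemar(b, c):
    """b: items only detector A classifies correctly; c: items only B does.
    Returns (chi-square statistic with continuity correction,
    approximate two-sided p-value, 1 degree of freedom)."""
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x/2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

chi2, p = mcnemar(b=40, c=22)  # hypothetical disagreement counts
assert 0.0 <= p <= 1.0
```

A small p-value here would indicate that the two embedding sources lead to genuinely different detector behavior, not just run-to-run noise.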

Circularity Check

0 steps flagged

No circularity: empirical benchmarking on external public datasets

full rationale

The paper performs a comparative empirical study by evaluating static word embeddings and BERT-based methods across DL models on the public BGL, Thunderbird, and Spirit datasets. It identifies a trade-off from these experiments, proposes QTyBERT (SysBE + CroSysEh) motivated by the observed results, and validates the new method via direct performance and timing comparisons against baselines. No equations, fitted parameters, or self-referential definitions are present in the provided text; claims do not reduce to inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked. The evaluation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 2 invented entities

The central claim rests on standard assumptions about log semantics and DL model behavior plus two new components whose implementation details and any associated parameters are not specified in the abstract.

free parameters (2)
  • Quantization parameters for SysBE
    System-specific quantization in the lightweight BERT variant; values and selection process not detailed.
  • Training hyperparameters for CroSysEh
    Unsupervised training setup on multi-system logs; specific choices not provided.
axioms (1)
  • domain assumption Log events possess semantic structures that embedding models can capture to improve anomaly detection over raw or static representations.
    Underlies the motivation for semantic representations and the comparison between methods.
invented entities (2)
  • SysBE no independent evidence
    purpose: Lightweight BERT variant using system-specific quantization for efficient CPU-based log embedding generation.
    Core component of QTyBERT introduced to address efficiency.
  • CroSysEh no independent evidence
    purpose: Unsupervised model trained on multi-system logs to enhance semantic expressiveness of the quantized embeddings.
    Second core component of QTyBERT introduced to improve effectiveness.

pith-pipeline@v0.9.0 · 5617 in / 1509 out tokens · 88670 ms · 2026-05-10T18:04:37.462432+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

47 extracted references · 34 canonical work pages · 2 internal anchors

  1. [1]

    Vitor Cerqueira, Luís Torgo, and Igor Mozetič. 2020. Evaluating time series forecasting models: an empirical study on performance estimation methods. Machine Learning 109, 11 (Nov. 2020), 1997–2028. doi:10.1007/s10994-020-05910-7

  2. [2]

    Jining Chen, Weitu Chong, Siyu Yu, Zhun Xu, Chaohong Tan, and Ningjiang Chen. 2022. TCN-based Lightweight Log Anomaly Detection in Cloud-edge Collaborative Environment. In 2022 Tenth International Conference on Advanced Cloud and Big Data (CBD). 13–18. doi:10.1109/CBD58033.2022.00012

  3. [3]

    Rui Chen, Shenglin Zhang, Dongwen Li, Yuzhe Zhang, Fangrui Guo, Weibin Meng, Dan Pei, Yuzhi Zhang, Xu Chen, and Yuqing Liu. 2020. LogTransfer: Cross-System Log Anomaly Detection for Software Systems with Transfer Learning. In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE). IEEE Computer Society, Los Alamitos, CA, USA, ...

  4. [4]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Jill Burstein, Christy...

  5. [5]

    Ying Fu, Meng Yan, Zhou Xu, Xin Xia, Xiaohong Zhang, and Dan Yang. 2022. An empirical study of the impact of log parsers on the performance of log-based anomaly detection. Empirical Software Engineering 28, 1 (Nov. 2022), 39 pages. doi:10.1007/s10664-022-10214-6

  6. [6]

    Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, and Marianne Winslett. 2021. Compressing Large-Scale Transformer-Based Models: A Case Study on BERT. Transactions of the Association for Computational Linguistics 9 (2021), 1061–1080. doi:10.1162/tacl_a_00413

  7. [7]

    Google Research. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://github.com/google-research/bert. Accessed: 2024-03-14

  8. [8]

    Shayan Hashemi and Mika Mäntylä. 2024. OneLog: towards end-to-end software log anomaly detection. Automated Software Engineering 31, 2 (2024), 37

  9. [9]

    Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. 2017. Drain: An Online Log Parsing Approach with Fixed Depth Tree. In 2017 IEEE International Conference on Web Services (ICWS). 33–40. doi:10.1109/ICWS.2017.13

  10. [10]

    Steven C. Hespeler, Pablo Moriano, Mingyan Li, and Samuel C. Hollifield. 2025. Temporal cross-validation impacts multivariate time series subsequence anomaly detection evaluation. arXiv:2506.12183 [stat.ML] https://arxiv.org/abs/2506.12183

  11. [11]

    Adha Hrusto, Nauman Bin Ali, Emelie Engström, and Yuqing Wang. 2025. Monitoring data for Anomaly Detection in Cloud-Based Systems: A Systematic Mapping Study. ACM Transactions on Software Engineering and Methodology (June 2025). doi:10.1145/3744556 Just Accepted

  12. [12]

    Shaohan Huang, Yi Liu, Carol Fung, Rong He, Yining Zhao, Hailong Yang, and Zhongzhi Luan. 2020. HitAnomaly: Hierarchical Transformers for Anomaly Detection in System Log. IEEE Trans. on Netw. and Serv. Manag. 17, 4 (Dec. 2020), 2064–2076. doi:10.1109/TNSM.2020.3034647

  13. [13]

    Peng Jia, Shaofeng Cai, Beng Chin Ooi, Pinghui Wang, and Yiyuan Xiong. 2023. Robust and Transferable Log-based Anomaly Detection. Proc. ACM Manag. Data 1, 1, Article 64 (May 2023), 26 pages. doi:10.1145/3588918

  14. [14]

    Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. 2020. TinyBERT: Distilling BERT for Natural Language Understanding. arXiv:1909.10351 [cs.CL] https://arxiv.org/abs/1909.10351

  15. [15]

    Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, and Tomás Mikolov. 2016. FastText.zip: Compressing text classification models. CoRR abs/1612.03651 (2016). arXiv:1612.03651 http://arxiv.org/abs/1612.03651

  16. [16]

    Van-Hoang Le and Hongyu Zhang. 2021. Log-based anomaly detection without log parsing. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 492–504. doi:10.1109/ASE51524.2021.9678773

  17. [17]

    Van-Hoang Le and Hongyu Zhang. 2022. Log-based anomaly detection with deep learning: how far are we?. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE '22). Association for Computing Machinery, New York, NY, USA, 1356–1367. doi:10.1145/3510003.3510155

  18. [18]

    Yukyung Lee, Jina Kim, and Pilsung Kang. 2023. LAnoBERT: System log anomaly detection based on BERT masked language model. Applied Soft Computing 146 (2023), 110689. doi:10.1016/j.asoc.2023.110689

  19. [19]

    Xiaoyun Li, Pengfei Chen, Linxiao Jing, Zilong He, and Guangba Yu. 2023. SwissLog: Robust Anomaly Detection and Localization for Interleaved Unstructured Logs. IEEE Transactions on Dependable and Secure Computing 20, 4 (2023), 2762–. doi:10.1109/TDSC.2022.3162857

  21. [21]

    Siyang Lu, Xiang Wei, Yandong Li, and Liqiang Wang. 2018. Detecting Anomaly in Big Data System Logs Using Convolutional Neural Network. In 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congr...

  22. [22]

    Chuangying Meng and Ningjiang Chen. 2024. TinyLog: Log Anomaly Detection with Lightweight Temporal Convolutional Network for Edge Device. In 2024 International Joint Conference on Neural Networks (IJCNN). 1–8. doi:10.1109/IJCNN60899.2024.10651312

  23. [23]

    Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun, and Rong Zhou. 2019. LogAnomaly: unsupervised detection of sequential and quantitative anomalies in unstructured logs. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (Macao, China) (IJCAI'19). AAAI Pre...

  24. [24]

    Mika V. Mäntylä, Yuqing Wang, and Jesse Nyyssölä. 2024. LogLead - Fast and Integrated Log Loader, Enhancer, and Anomaly Detector. In 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 395–399. doi:10.1109/SANER60148.2024.00046

  25. [25]

    Markus Nagel, Rana Ali Amjad, Mart Van Baalen, Christos Louizos, and Tijmen Blankevoort. 2020. Up or Down? Adaptive Rounding for Post-Training Quantization. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119), Hal Daumé III and Aarti Singh (Eds.). PMLR, 7197–7206. https://proceeding...

  26. [26]

    Kim Anh Nguyen, Sabine Schulte im Walde, and Ngoc Thang Vu. 2016. Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction. arXiv:1605.07766 [cs.CL] https://arxiv.org/abs/1605.07766

  27. [27]

    Adam Oliner and Jon Stearley. 2007. What Supercomputers Say: A Study of Five System Logs. In 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07). 575–584. doi:10.1109/DSN.2007.103

  28. [28]

    ONNX Project. 2025. ONNX: Open Neural Network Exchange — Introduction. https://onnx.ai/onnx/intro/. Accessed: 2025-09-11

  29. [29]

    Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 1532–1543. doi:10.3115/v1/D14-1162

  30. [30]

    Riley Peronto. 2024. The State of Log Data: 6 Trends Impacting Observability and Security. Blog post, Chronosphere. https://chronosphere.io/learn/observability-log-data-trends/

  31. [31]

    Emad Ul Haq Qazi, Abdulrazaq Almorjan, and Tanveer Zia. 2022. A One-Dimensional Convolutional Neural Network (1D-CNN) Based Deep Learning System for Network Intrusion Detection. Applied Sciences 12, 16 (2022). doi:10.3390/app12167986

  32. [32]

    Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2020. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108 [cs.CL] https://arxiv.org/abs/1910.01108

  33. [33]

    Issam Sedki, Abdelwahab Hamou-Lhadj, Otmane Ait-Mohamed, and Naser Ezzati-Jivan. 2023. Towards a Classification of Log Parsing Errors. In 2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC). 84–88. doi:10.1109/ICPC58990.2023.00023

  34. [34]

    Hudan Studiawan, Ferdous Sohel, and Christian Payne. 2021. Anomaly Detection in Operating System Logs with Deep Learning-Based Sentiment Analysis. IEEE Transactions on Dependable and Secure Computing 18, 5 (2021), 2136–2148. doi:10.1109/TDSC.2020.3037903

  35. [35]

    Lei Sun and Xiaolong Xu. 2023. LogPal: A Generic Anomaly Detection Scheme of Heterogeneous Logs for Network Systems. Security and Communication Networks 2023, 1 (2023), 2803139. doi:10.1155/2023/2803139

  36. [36]

    Marek Suppa, Katarína Benešová, and Andrej Švec. 2021. Cost-effective Deployment of BERT Models in Serverless Environment. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, Young-bum Kim, ...

  37. [37]

    USENIX Association. [n. d.]. The Computer Failure Data Repository (CFDR). https://www.usenix.org/cfdr. Accessed: 2025-09-08

  38. [38]

    Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9, Nov (2008), 2579–2605. http://www.jmlr.org/papers/v9/vandermaaten08a.html

  39. [39]

    Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. arXiv:2002.10957 [cs.CL] https://arxiv.org/abs/2002.10957

  40. [40]

    Yuqing Wang, Mika V. Mäntylä, Jesse Nyyssölä, Ke Ping, and Liqiang Wang. 2025. Cross-System Software Log-based Anomaly Detection Using Meta-Learning. In 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 454–464. doi:10.1109/SANER64311.2025.00049

  41. [41]

    Zumin Wang, Jiyu Tian, Hui Fang, Liming Chen, and Jing Qin. 2022. LightLog: A lightweight temporal convolutional network for log anomaly detection on the edge. Computer Networks 203 (2022), 108616. doi:10.1016/j.comnet.2021.108616

  42. [42]

    Xingfang Wu, Heng Li, and Foutse Khomh. 2023. On the effectiveness of log representation for log-based anomaly detection. Empirical Softw. Engg. 28, 6 (Oct. 2023), 39 pages. doi:10.1007/s10664-023-10364-1

  43. [43]

    Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144 (2016). https://arxiv.org/abs/1609.08144

  45. [45]

    Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, and Qun Liu. 2020. TernaryBERT: Distillation-aware Ultra-low Bit BERT. arXiv preprint arXiv:2009.12812 (2020)

  46. [46]

    Xu Zhang, Yong Xu, Qingwei Lin, Bo Qiao, Hongyu Zhang, Yingnong Dang, Chunyu Xie, Xinsheng Yang, Qian Cheng, Ze Li, Junjie Chen, Xiaoting He, Randolph Yao, Jian-Guang Lou, Murali Chintalapati, Furao Shen, and Dongmei Zhang. 2019. Robust log-based anomaly detection on unstable log data. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Tallinn, Estonia) (ESEC/FSE 2019). Association for Computing Machinery, New York, NY, USA, 807–817. doi:10.1145/3338906.3338931