
arxiv: 2605.14866 · v1 · pith:3E7CQRJU · submitted 2026-05-14 · 💻 cs.SE · cs.AI

Towards In-Depth Root Cause Localization for Microservices with Multi-Agent Recursion-of-Thought

Pith reviewed 2026-06-30 20:02 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords root cause localization, microservices, multi-agent systems, large language models, trace graphs, fault diagnosis, recursion-of-thought, parallel reasoning

The pith

RCLAgent decomposes trace graphs with dedicated agents running recursion-of-thought in parallel to localize microservice root causes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RCLAgent, a framework that assigns a dedicated agent to each span in a microservice trace and organizes these agents recursively and in parallel along the graph topology. This structure is intended to prevent the context explosion that occurs when a single LLM processes an entire trace at once and to replace serial reasoning chains with simultaneous exploration of causal paths. The design draws from a study of how human site reliability engineers actually diagnose failures. On public benchmarks the method reports higher localization accuracy and faster inference than prior LLM-based and traditional approaches.

Core claim

RCLAgent realizes multi-agent recursion-of-thought with parallel reasoning. It decomposes the diagnostic process along the trace graph by assigning each span to a Dedicated Agent and organizing agents recursively and in parallel according to the graph topology, with the final diagnosis obtained by synthesizing the Root-Level Diagnosis Report and the Global Evidence Graph.

What carries the argument

Multi-agent recursion-of-thought, which decomposes diagnostics by assigning dedicated agents to individual trace spans and coordinates them recursively and in parallel to produce a synthesized diagnosis.
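A minimal sketch of this recursion-of-thought pattern, assuming the trace is already available as a tree of spans. The names `Span`, `Report`, `span_agent`, the numeric `anomaly_score`, and the fixed `ABNORMAL` threshold are hypothetical stand-ins: in RCLAgent each agent's verdict comes from an LLM prompted with the span's logs and metrics, not from a score. The consolidation logic paraphrases the rule quoted in the Figure 6 excerpt (propagate a child's strong root-cause hypothesis upward; otherwise blame the current span if it is itself abnormal).

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    service: str
    anomaly_score: float  # hypothetical stand-in for an LLM verdict on this span's own evidence
    children: list["Span"] = field(default_factory=list)


@dataclass
class Report:
    root_cause: Optional[str]  # service this subtree blames, or None if it looks healthy
    confidence: float


ABNORMAL = 0.5  # hypothetical threshold; the paper relies on LLM prompts, not a fixed cut-off


async def span_agent(span: Span) -> Report:
    """One dedicated agent per span: fan out to child agents concurrently, then consolidate."""
    # Recursion: every child subtree gets its own agent, and siblings are awaited together.
    child_reports = await asyncio.gather(*(span_agent(c) for c in span.children))

    # Rule 1: if some child reports strong root-cause evidence, propagate the strongest upward.
    strongest = max(
        (r for r in child_reports if r.root_cause is not None),
        key=lambda r: r.confidence,
        default=None,
    )
    if strongest is not None and strongest.confidence >= ABNORMAL:
        return strongest
    # Rule 2: otherwise, if this span itself looks abnormal, propose it as the local root cause.
    if span.anomaly_score >= ABNORMAL:
        return Report(span.service, span.anomaly_score)
    return Report(None, 0.0)


if __name__ == "__main__":
    trace = Span("frontend", 0.9, [
        Span("cart", 0.2),
        Span("checkout", 0.8, [Span("payment", 0.95), Span("shipping", 0.1)]),
    ])
    print(asyncio.run(span_agent(trace)))  # Report(root_cause='payment', confidence=0.95)
```

With stubbed scores `asyncio.gather` only expresses the structure, since nothing actually waits on I/O. The parallelism would pay off once each agent awaits a real LLM or tool call, because sibling subtrees then wait on the network at the same time instead of one after another.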

If this is right

  • Localization accuracy improves because each agent focuses on a limited local context while still contributing to a global evidence graph.
  • Inference time decreases through parallel execution of agents rather than serial chain-of-thought steps.
  • Interpretability increases because the final report is assembled from per-span diagnoses that follow the actual call graph.
  • Transferability across deployments rises because the agent organization is driven by runtime trace topology rather than fixed training data.
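The inference-time bullet can be made concrete with an idealized cost model, assuming every span agent takes a fixed amount of LLM time, concurrency is unlimited, and consolidation is free. `Node`, `serial_latency`, and `parallel_latency` are illustrative names, not taken from the paper. Under these assumptions a serial chain pays for every span, while parallel recursion pays only along the slowest root-to-leaf path.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    seconds: float  # hypothetical LLM time spent by the agent for this span
    children: list["Node"] = field(default_factory=list)


def serial_latency(n: Node) -> float:
    """One reasoning chain that visits every span in turn: costs add up."""
    return n.seconds + sum(serial_latency(c) for c in n.children)


def parallel_latency(n: Node) -> float:
    """Sibling agents run concurrently, so only the slowest child subtree is on the critical path."""
    return n.seconds + max((parallel_latency(c) for c in n.children), default=0.0)


trace = Node(2.0, [Node(2.0), Node(2.0, [Node(2.0), Node(2.0)]), Node(2.0)])
print(serial_latency(trace), parallel_latency(trace))  # 12.0 6.0
```

In this toy trace of six spans and depth three, the serial cost is 12 s and the parallel cost 6 s: parallel latency scales with trace depth rather than span count. Real gains would be smaller once API rate limits, the synthesis step, and multiple reasoning rounds per agent are included.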

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same recursive-agent pattern could be tested on other graph-structured diagnostic tasks such as network fault isolation or distributed database debugging.
  • If the synthesis step that merges the Root-Level Diagnosis Report and Global Evidence Graph proves sensitive to agent count, then scaling behavior on very large traces would need separate validation.
  • The approach implicitly suggests that human-inspired decomposition may reduce the token budget required for LLM-based diagnosis, an efficiency dimension not directly measured in the reported experiments.

Load-bearing premise

Decomposing the diagnostic process along the trace graph with dedicated agents organized recursively and in parallel will prevent context explosion and enable deeper causal exploration.

What would settle it

On the same public benchmarks, if RCLAgent shows no gain in localization accuracy or inference speed relative to the strongest prior methods, the central performance claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.14866 by Chiming Duan, Gong Zhang, Kangjin Wang, Lingzhe Zhang, Meiling Wang, Minghua He, Renhai Chen, Rongqian Wang, Tong Jia, Xi Peng, Ying Li.

Figure 1. Example of Trace Log
…anomalies correspond to their observable symptoms, such as increased response latency, reduced throughput, or repeated error events recorded in logs. At the system level, these anomalies are reflected in observability signals, including abnormal spikes in service metrics (e.g., error rates, CPU usage, or memory consumption), suspicious log patterns (e.g., repeated timeout or exceptio…
Figure 2. An Illustrative Example of the Manual Root Cause Localization Process
Figure 3. Effect of Reasoning Rounds on RCL Accuracy (MRR) and Latency
Figure 4. Architecture of RCLAgent
Summary. Existing LLM-based methods for root cause localization primarily suffer from two limitations: evidence dilution caused by context explosion, where excessive or accumulated context overwhelms the model and critical causal signals are overlooked; and shallow reasoning caused by serial reasoning, where insufficient exploration prevents full traversal of the causal chain, lead…
Figure 5. Prompt template used to elicit self-state verification from each agent
System: You are a Root Cause Localization agent in a microservice system. A user-reported failure has occurred. Your task is to analyze logs and metrics to identify the DEEPEST ROOT CAUSE SERVICE that initiated the failure chain. User: Analyze the following span for root cause: {T(s)}, You have the following data tools…
Figure 6. The Prompt for Evidence Consolidation
Intuitively, if a child agent exhibits strong and consistent evidence of being the root cause, the agent propagates that hypothesis upward. Otherwise, if the current span itself is deemed abnormal, the agent proposes itself as the local root cause.
Figure 7. Ablation Experiment
Figure 8. Hyperparameter analysis of the retrieval interval
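Figure 3 reports localization accuracy as MRR (mean reciprocal rank), and the referee report below asks for metrics such as precision@K. A sketch of both under their standard definitions, assuming each failure case yields a ranked list of candidate services and one ground-truth root cause. Top-k accuracy (often written AC@k in the RCL literature) stands in for precision@K here; the paper's exact metric definitions are not given on this page, and the cases are made up for illustration.

```python
def reciprocal_rank(ranking: list[str], truth: str) -> float:
    """1 / (1-based position of the true root cause), or 0 if it was never ranked."""
    for position, candidate in enumerate(ranking, start=1):
        if candidate == truth:
            return 1.0 / position
    return 0.0


def mrr(cases: list[tuple[list[str], str]]) -> float:
    """Mean reciprocal rank over all failure cases."""
    return sum(reciprocal_rank(r, t) for r, t in cases) / len(cases)


def ac_at_k(cases: list[tuple[list[str], str]], k: int) -> float:
    """Fraction of cases whose true root cause appears among the top-k candidates."""
    return sum(t in r[:k] for r, t in cases) / len(cases)


cases = [
    (["payment", "checkout", "frontend"], "payment"),              # ranked 1st
    (["checkout", "payment", "frontend"], "payment"),              # ranked 2nd
    (["frontend", "cart", "checkout"], "shipping"),                # missed
    (["cart", "frontend", "checkout", "shipping"], "shipping"),    # ranked 4th
]
print(mrr(cases), ac_at_k(cases, 1), ac_at_k(cases, 3))  # 0.4375 0.25 0.5
```

For these four cases MRR is (1 + 1/2 + 0 + 1/4) / 4 = 0.4375. MRR rewards near-misses partially, whereas AC@k is all-or-nothing at the cut-off, so the two can move differently as the number of reasoning rounds changes.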
Original abstract

As modern microservice systems grow increasingly complex due to dynamic interactions and evolving runtime environments, they experience failures with rising frequency. Ensuring system reliability therefore critically depends on accurate root cause localization (RCL). While numerous traditional machine learning and deep learning approaches have been explored for this task, they often suffer from limited interpretability and poor transferability across deployments. More recently, large language model (LLM)-based methods have been proposed to address these issues. However, existing LLM-based approaches still face two fundamental limitations: context explosion, which dilutes critical evidence and degrades localization accuracy, and serial reasoning structures, which hinder deep causal exploration and impair inference efficiency. In this paper, we conduct a comprehensive study of both how human SREs perform root cause localization in practice and why existing LLM-based methods fall short. Motivated by these findings, we introduce RCLAgent, an in-depth root cause localization framework for microservice systems that realizes multi-agent recursion-of-thought with parallel reasoning. RCLAgent decomposes the diagnostic process along the trace graph by assigning each span to a Dedicated Agent and organizing agents recursively and in parallel according to the graph topology, with the final diagnosis obtained by synthesizing the Root-Level Diagnosis Report and the Global Evidence Graph. Extensive experiments on multiple public benchmarks demonstrate that RCLAgent consistently outperforms state-of-the-art methods in both localization accuracy and inference efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces RCLAgent, a multi-agent recursion-of-thought framework for root cause localization (RCL) in microservice systems. Motivated by an empirical study of human SRE diagnostic practices and the shortcomings of prior LLM-based methods (context explosion and serial reasoning), the approach assigns dedicated agents to individual spans in the trace graph, organizes them recursively and in parallel according to graph topology, and produces a final diagnosis via synthesis of a Root-Level Diagnosis Report and Global Evidence Graph. The central empirical claim is that RCLAgent consistently outperforms state-of-the-art methods on public benchmarks in both localization accuracy and inference efficiency.

Significance. If the reported gains are reproducible and the design choices are shown to be load-bearing, the work would offer a concrete, practice-motivated advance over existing LLM-based RCL techniques. The multi-agent decomposition along the trace graph directly targets two well-recognized failure modes (context dilution and shallow causal chains) and could improve both accuracy and latency in production diagnosis pipelines.

major comments (2)
  1. [Abstract / §4] Abstract and §4 (Experiments): the claim that RCLAgent 'consistently outperforms state-of-the-art methods in both localization accuracy and inference efficiency' is the central result, yet the visible text provides neither the concrete metrics (e.g., F1, precision@K, latency), the exact baselines, the number of benchmarks, nor any ablation or statistical test. Without these data the support for the claim cannot be evaluated.
  2. [§3] §3 (Method): the assumption that recursive parallel agent organization along the trace graph will mitigate context explosion is presented as following directly from the SRE study, but no quantitative evidence (e.g., context-length measurements before/after decomposition or agent-interaction overhead) is supplied to show that the decomposition actually reduces effective context size or improves causal depth.
minor comments (2)
  1. [Abstract] The abstract refers to 'multiple public benchmarks' without naming them; listing the datasets (and citing their sources) would allow readers to assess generalizability.
  2. [§3] Notation for the Global Evidence Graph and Root-Level Diagnosis Report is introduced without a formal definition or pseudocode; a small diagram or algorithm box would clarify the synthesis step.
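The minor comment asks for a formal definition of the synthesis step. Since this page does not give one, the following is only a hypothetical shape: a `GlobalEvidenceGraph` holding per-service evidence notes and caller-to-callee edges, and a `synthesize` function that pairs the root-level verdict with the evidence along the call path leading to the blamed service. None of these names or fields are taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlobalEvidenceGraph:
    """Hypothetical structure: evidence notes per service plus caller -> callee edges."""
    evidence: dict[str, list[str]] = field(default_factory=dict)
    edges: dict[str, list[str]] = field(default_factory=dict)

    def add(self, service: str, note: str) -> None:
        self.evidence.setdefault(service, []).append(note)

    def link(self, caller: str, callee: str) -> None:
        self.edges.setdefault(caller, []).append(callee)

    def path(self, src: str, dst: str, seen: set[str] | None = None) -> list[str] | None:
        """Depth-first search for a call path from src to dst."""
        seen = set() if seen is None else seen
        if src == dst:
            return [src]
        seen.add(src)
        for nxt in self.edges.get(src, []):
            if nxt not in seen:
                rest = self.path(nxt, dst, seen)
                if rest is not None:
                    return [src] + rest
        return None


def synthesize(verdict: str, entry: str, graph: GlobalEvidenceGraph) -> dict:
    """Pair the root-level verdict with the evidence on the call path entry -> verdict."""
    chain = graph.path(entry, verdict) or [verdict]
    return {
        "root_cause": verdict,
        "call_path": chain,
        "evidence": {svc: graph.evidence.get(svc, []) for svc in chain},
    }


g = GlobalEvidenceGraph()
g.link("frontend", "checkout")
g.link("checkout", "payment")
g.link("frontend", "cart")
g.add("frontend", "p99 latency spike on /order")
g.add("checkout", "timeout calling payment")
g.add("payment", "connection pool exhausted")
print(synthesize("payment", "frontend", g)["call_path"])  # ['frontend', 'checkout', 'payment']
```

The `seen` set guards against cycles, which can appear once spans are collapsed to service names (for example, two services that call each other).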

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

  1. Referee: [Abstract / §4] Abstract and §4 (Experiments): the claim that RCLAgent 'consistently outperforms state-of-the-art methods in both localization accuracy and inference efficiency' is the central result, yet the visible text provides neither the concrete metrics (e.g., F1, precision@K, latency), the exact baselines, the number of benchmarks, nor any ablation or statistical test. Without these data the support for the claim cannot be evaluated.

    Authors: The detailed results—including F1, precision@K, latency, exact baselines, benchmark counts, ablations, and statistical tests—are reported in the tables and analysis of §4. To make the central claim immediately evaluable from the abstract and §4 summary, we will revise both to explicitly list the key metrics, baselines, and benchmark details while retaining the full supporting data in the section. revision: yes

  2. Referee: [§3] §3 (Method): the assumption that recursive parallel agent organization along the trace graph will mitigate context explosion is presented as following directly from the SRE study, but no quantitative evidence (e.g., context-length measurements before/after decomposition or agent-interaction overhead) is supplied to show that the decomposition actually reduces effective context size or improves causal depth.

    Authors: The SRE study supplies the qualitative motivation for targeting context explosion and serial reasoning via graph-aligned decomposition. End-to-end gains in accuracy and efficiency are shown quantitatively in §4. We agree that direct measurements (e.g., token counts pre/post-decomposition and interaction overhead) would strengthen the mechanistic claim and will add this analysis to a revised §3. revision: yes
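The measurement promised in this response can be prototyped on any trace. The sketch below compares the single-prompt baseline (all span evidence in one context) with the decomposed setting, in which an agent sees its own span plus one fixed-size report per child. `SpanDoc`, `REPORT_TOKENS`, and the whitespace word count used as a token proxy are all assumptions; a real study would use the model's tokenizer and the actual prompts.

```python
from dataclasses import dataclass, field

REPORT_TOKENS = 50  # hypothetical size of one child agent's summary report


@dataclass
class SpanDoc:
    text: str  # rendered logs/metrics for this span (hypothetical)
    children: list["SpanDoc"] = field(default_factory=list)


def tokens(text: str) -> int:
    """Crude token proxy: whitespace-separated words."""
    return len(text.split())


def whole_trace_context(span: SpanDoc) -> int:
    """Single-prompt baseline: every span's evidence goes into one context."""
    return tokens(span.text) + sum(whole_trace_context(c) for c in span.children)


def agent_context(span: SpanDoc) -> int:
    """What one dedicated agent sees: its own span plus one short report per child."""
    return tokens(span.text) + REPORT_TOKENS * len(span.children)


def max_agent_context(span: SpanDoc) -> int:
    """Largest single prompt across all agents in the decomposed setting."""
    return max([agent_context(span)] + [max_agent_context(c) for c in span.children])


def total_agent_tokens(span: SpanDoc) -> int:
    """Sum of all agents' prompts: a rough proxy for the overall token budget."""
    return agent_context(span) + sum(total_agent_tokens(c) for c in span.children)


EVIDENCE = " ".join(["log"] * 200)  # 200 'tokens' of evidence per span (made-up size)
trace = SpanDoc(EVIDENCE, [
    SpanDoc(EVIDENCE, [SpanDoc(EVIDENCE), SpanDoc(EVIDENCE)]) for _ in range(3)
])

print(whole_trace_context(trace), max_agent_context(trace), total_agent_tokens(trace))
# 2000 350 2450
```

On this ten-span toy trace the largest single prompt drops from 2000 to 350 tokens, yet all agents together consume 2450. Decomposition can therefore shrink per-prompt context while enlarging the total token budget, which is why reporting both numbers would be more informative than either alone.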

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper motivates RCLAgent from an independent study of SRE practices and limitations of prior LLM-based methods, then presents an empirical evaluation on public benchmarks showing outperformance. No equations, fitted parameters renamed as predictions, self-citation chains, or self-definitional reductions appear in the abstract or described structure. The central claim rests on benchmark results rather than any derivation that reduces to its own inputs by construction. This is the normal case of a self-contained empirical proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Insufficient information from the abstract alone to identify specific free parameters or invented entities; the approach relies on domain assumptions about trace graphs.

axioms (1)
  • domain assumption: The trace graph topology provides an effective structure for organizing agents recursively and in parallel to mitigate context explosion and support deep causal exploration.
    The method decomposes the diagnostic process along the trace graph and assigns agents according to graph topology.
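This axiom presupposes that the trace graph is available. Assuming span records carry a span id and a parent span id, as in common distributed-tracing formats such as OpenTelemetry, the topology can be recovered as follows; `build_trace_tree` and `depth` are illustrative helpers, not the paper's code. Depth matters because it bounds how many levels of agents must run one after another.

```python
from collections import defaultdict
from typing import Iterable, Optional

SpanRecord = tuple[str, Optional[str], str]  # (span_id, parent_span_id, service)


def build_trace_tree(spans: Iterable[SpanRecord]) -> tuple[list[str], dict[str, list[str]]]:
    """Recover the call tree: returns root span ids and a parent -> children map."""
    spans = list(spans)
    known = {span_id for span_id, _, _ in spans}
    children: defaultdict[str, list[str]] = defaultdict(list)
    roots: list[str] = []
    for span_id, parent_id, _service in spans:
        if parent_id is None or parent_id not in known:
            roots.append(span_id)  # entry span, or orphan whose parent was never collected
        else:
            children[parent_id].append(span_id)
    return roots, dict(children)


def depth(span_id: str, children: dict[str, list[str]]) -> int:
    """Levels from this span down to its deepest descendant, inclusive."""
    return 1 + max((depth(c, children) for c in children.get(span_id, [])), default=0)


spans = [
    ("s1", None, "frontend"),
    ("s2", "s1", "cart"),
    ("s3", "s1", "checkout"),
    ("s4", "s3", "payment"),
]
roots, children = build_trace_tree(spans)
print(roots, children, depth("s1", children))
# ['s1'] {'s1': ['s2', 's3'], 's3': ['s4']} 3
```

Treating orphan spans as extra roots is one design choice among several; real traces are often incomplete, and a diagnosis pipeline could instead drop such spans or attach them to a synthetic root.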

pith-pipeline@v0.9.1-grok · 5808 in / 1253 out tokens · 60699 ms · 2026-06-30T20:02:14.117880+00:00 · methodology


Reference graph

Works this paper leans on

111 extracted references · 29 canonical work pages · 4 internal anchors

  1. [1]

    Fault analysis and debugging of microservice systems: Industrial survey, benchmark system, and empirical study,

    X. Zhou, X. Peng, T. Xie, J. Sun, C. Ji, W. Li, and D. Ding, “Fault analysis and debugging of microservice systems: Industrial survey, benchmark system, and empirical study,”IEEE Transactions on Software Engineering, vol. 47, no. 2, pp. 243–260, 2018

  2. [2]

    A survey of aiops in the era of large language models,

    L. Zhang, T. Jia, M. Jia, Y . Wu, A. Liu, Y . Yang, Z. Wu, X. Hu, P. Yu, and Y . Li, “A survey of aiops in the era of large language models,” ACM Computing Surveys, 2025

  3. [3]

    Developing self-adaptive microservice systems: Challenges and directions,

    N. C. Mendonc ¸a, P. Jamshidi, D. Garlan, and C. Pahl, “Developing self-adaptive microservice systems: Challenges and directions,”IEEE Software, vol. 38, no. 2, pp. 70–79, 2019

  4. [4]

    Design, monitoring, and testing of microservices systems: The prac- titioners’ perspective,

    M. Waseem, P. Liang, M. Shahin, A. Di Salle, and G. M ´arquez, “Design, monitoring, and testing of microservices systems: The prac- titioners’ perspective,”Journal of Systems and Software, vol. 182, p. 111061, 2021

  5. [5]

    Towards close-to-zero runtime collection overhead: Raft-based anomaly diagno- sis on system faults for distributed storage system,

    L. Zhang, T. Jia, M. Jia, H. Liu, Y . Yang, Z. Wu, and Y . Li, “Towards close-to-zero runtime collection overhead: Raft-based anomaly diagno- sis on system faults for distributed storage system,”IEEE Transactions on Services Computing, 2024

  6. [6]

    Time-tired compaction: An elastic compaction scheme for lsm-tree based time-series database,

    L.-Z. Zhang, X.-D. Huang, Y .-K. Wang, J.-L. Qiao, S.-X. Song, and J.-M. Wang, “Time-tired compaction: An elastic compaction scheme for lsm-tree based time-series database,”Advanced Engineering Infor- matics, vol. 59, p. 102224, 2024

  7. [7]

    Separation or not: On handing out-of-order time-series data in leveled lsm-tree,

    Y . Kang, X. Huang, S. Song, L. Zhang, J. Qiao, C. Wang, J. Wang, and J. Feinauer, “Separation or not: On handing out-of-order time-series data in leveled lsm-tree,” in2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2022, pp. 3340–3352

  8. [8]

    Multivariate log-based anomaly detection for distributed database,

    L. Zhang, T. Jia, M. Jia, Y . Li, Y . Yang, and Z. Wu, “Multivariate log-based anomaly detection for distributed database,” inProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 4256–4267

  9. [9]

    Reducing events to augment log-based anomaly detection models: An empirical study,

    L. Zhang, T. Jia, K. Wang, M. Jia, Y . Yang, and Y . Li, “Reducing events to augment log-based anomaly detection models: An empirical study,” inProceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2024, pp. 538– 548

  10. [10]

    Inter- dependent causal networks for root cause localization,

    D. Wang, Z. Chen, J. Ni, L. Tong, Z. Wang, Y . Fu, and H. Chen, “Inter- dependent causal networks for root cause localization,” inProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, pp. 5051–5060. TOW ARDS IN-DEPTH ROOT CAUSE LOCALIZATION FOR MICROSERVICES WITH MULTI-AGENT RECURSION-OF-THOUGHT 16

  11. [11]

    Failure diagnosis in microservice systems: A comprehensive survey and analysis,

    S. Zhang, S. Xia, W. Fan, B. Shi, X. Xiong, Z. Zhong, M. Ma, Y . Sun, and D. Pei, “Failure diagnosis in microservice systems: A comprehensive survey and analysis,”ACM Transactions on Software Engineering and Methodology, 2024

  12. [12]

    A survey on intelligent management of alerts and incidents in it services,

    Q. Yu, N. Zhao, M. Li, Z. Li, H. Wang, W. Zhang, K. Sui, and D. Pei, “A survey on intelligent management of alerts and incidents in it services,”Journal of Network and Computer Applications, p. 103842, 2024

  13. [13]

    Interpretable failure localization for microservice systems based on graph autoencoder,

    Y . Sun, Z. Lin, B. Shi, S. Zhang, S. Ma, P. Jin, Z. Zhong, L. Pan, Y . Guo, and D. Pei, “Interpretable failure localization for microservice systems based on graph autoencoder,”ACM Transactions on Software Engineering and Methodology, vol. 34, no. 2, pp. 1–28, 2025

  14. [14]

    Hemirca: Fine-grained root cause analysis for microservices with heterogeneous data sources,

    Z. Zhu, C. Lee, X. Tang, and P. He, “Hemirca: Fine-grained root cause analysis for microservices with heterogeneous data sources,” ACM Transactions on Software Engineering and Methodology, vol. 33, no. 8, pp. 1–25, 2024

  15. [15]

    Microservice root cause analysis with limited observability through intervention recognition in the latent space,

    Z. Xie, S. Zhang, Y . Geng, Y . Zhang, M. Ma, X. Nie, Z. Yao, L. Xu, Y . Sun, W. Liet al., “Microservice root cause analysis with limited observability through intervention recognition in the latent space,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 6049–6060

  16. [16]

    Kgroot: A knowledge graph-enhanced method for root cause analysis,

    T. Wang, G. Qi, and T. Wu, “Kgroot: A knowledge graph-enhanced method for root cause analysis,”Expert Systems with Applications, vol. 255, p. 124679, 2024

  17. [17]

    Rcaeval: A bench- mark for root cause analysis of microservice systems with telemetry data,

    L. Pham, H. Zhang, H. Ha, F. Salim, and X. Zhang, “Rcaeval: A bench- mark for root cause analysis of microservice systems with telemetry data,” inCompanion Proceedings of the ACM on Web Conference 2025, 2025, pp. 777–780

  18. [18]

    Lemma-rca: A large multi-modal multi-domain dataset for root cause analysis,

    L. Zheng, Z. Chen, D. Wang, C. Deng, R. Matsuoka, and H. Chen, “Lemma-rca: A large multi-modal multi-domain dataset for root cause analysis,”arXiv preprint arXiv:2406.05375, 2024

  19. [19]

    Microscope: Pinpoint performance issues with causal graphs in micro-service environments,

    J. Lin, P. Chen, and Z. Zheng, “Microscope: Pinpoint performance issues with causal graphs in micro-service environments,” inService- Oriented Computing: 16th International Conference, ICSOC 2018, Hangzhou, China, November 12-15, 2018, Proceedings 16. Springer, 2018, pp. 3–20

  20. [20]

    Causal inference-based root cause analysis for online service systems with intervention recognition,

    M. Li, Z. Li, K. Yin, X. Nie, W. Zhang, K. Sui, and D. Pei, “Causal inference-based root cause analysis for online service systems with intervention recognition,” inProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 3230–3240

  21. [21]

    Microrank: End-to-end latency issue localization with extended spectrum analysis in microservice environments,

    G. Yu, P. Chen, H. Chen, Z. Guan, Z. Huang, L. Jing, T. Weng, X. Sun, and X. Li, “Microrank: End-to-end latency issue localization with extended spectrum analysis in microservice environments,” in Proceedings of the Web Conference 2021, 2021, pp. 3087–3098

  22. [22]

    Tracerank: Abnormal service local- ization with dis-aggregated end-to-end tracing data in cloud native systems,

    G. Yu, Z. Huang, and P. Chen, “Tracerank: Abnormal service local- ization with dis-aggregated end-to-end tracing data in cloud native systems,”Journal of Software: Evolution and Process, vol. 35, no. 10, p. e2413, 2023

  23. [23]

    {CRISP}: Critical path analysis of{Large-Scale} microservice architectures,

    Z. Zhang, M. K. Ramanathan, P. Raj, A. Parwal, T. Sherwood, and M. Chabbi, “{CRISP}: Critical path analysis of{Large-Scale} microservice architectures,” in2022 USENIX Annual Technical Con- ference (USENIX ATC 22), 2022, pp. 655–672

  24. [24]

    Trace-based multi-dimensional root cause localization of performance issues in microservice systems,

    C. Zhang, Z. Dong, X. Peng, B. Zhang, and M. Chen, “Trace-based multi-dimensional root cause localization of performance issues in microservice systems,” inProceedings of the IEEE/ACM 46th Inter- national Conference on Software Engineering, 2024, pp. 1–12

  25. [25]

    Root cause analysis in microservice using neural granger causal discovery,

    C.-M. Lin, C. Chang, W.-Y . Wang, K.-D. Wang, and W.-C. Peng, “Root cause analysis in microservice using neural granger causal discovery,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 1, 2024, pp. 206–213

  26. [26]

    Nezha: Interpretable fine-grained root causes analysis for microservices on multi-modal observability data,

    G. Yu, P. Chen, Y . Li, H. Chen, X. Li, and Z. Zheng, “Nezha: Interpretable fine-grained root causes analysis for microservices on multi-modal observability data,” inProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023, pp. 553–565

  27. [27]

    Seer: Leveraging big data to navigate the complexity of performance debugging in cloud microservices,

    Y . Gan, Y . Zhang, K. Hu, D. Cheng, Y . He, M. Pancholi, and C. Delimitrou, “Seer: Leveraging big data to navigate the complexity of performance debugging in cloud microservices,” inProceedings of the twenty-fourth international conference on architectural support for programming languages and operating systems, 2019, pp. 19–33

  28. [28]

    Micromilts: Fault location for microservices based mutual information and lstm autoencoder,

    L. Yang, J. Li, K. Shi, S. Yang, Q. Yang, and J. Sun, “Micromilts: Fault location for microservices based mutual information and lstm autoencoder,” in2022 23rd Asia-Pacific Network Operations and Management Symposium (APNOMS). IEEE, 2022, pp. 1–6

  29. [29]

    Modelcoder: A fault model based automatic root cause localization framework for microservice systems,

    Y . Cai, B. Han, J. Li, N. Zhao, and J. Su, “Modelcoder: A fault model based automatic root cause localization framework for microservice systems,” in2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS). IEEE, 2021, pp. 1–6

  30. [30]

    Tracediag: Adaptive, interpretable, and efficient root cause analysis on large-scale microservice systems,

    R. Ding, C. Zhang, L. Wang, Y . Xu, M. Ma, X. Wu, M. Zhang, Q. Chen, X. Gao, X. Gaoet al., “Tracediag: Adaptive, interpretable, and efficient root cause analysis on large-scale microservice systems,” inProceed- ings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023, pp. 1762–1773

  31. [31]

    Root cause analysis for microservice systems via hierarchical reinforcement learning from human feedback,

    L. Wang, C. Zhang, R. Ding, Y . Xu, Q. Chen, W. Zou, Q. Chen, M. Zhang, X. Gao, H. Fanet al., “Root cause analysis for microservice systems via hierarchical reinforcement learning from human feedback,” inProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, pp. 5116–5125

  32. [32]

    Trace-based intel- ligent fault diagnosis for microservices with deep learning,

    H. Chen, K. Wei, A. Li, T. Wang, and W. Zhang, “Trace-based intel- ligent fault diagnosis for microservices with deep learning,” in2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 2021, pp. 884–893

  33. [33]

    Grace: Interpretable root cause analysis by graph convolutional network for microservices,

    R. Ren, Y . Wang, F. Liu, Z. Li, G. Tyson, T. Miao, and G. Xie, “Grace: Interpretable root cause analysis by graph convolutional network for microservices,” in2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS). IEEE, 2023, pp. 1–4

  34. [34]

    Actionable and interpretable fault localization for recurring failures in online service systems,

    Z. Li, N. Zhao, M. Li, X. Lu, L. Wang, D. Chang, X. Nie, L. Cao, W. Zhang, K. Suiet al., “Actionable and interpretable fault localization for recurring failures in online service systems,” inProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022, pp. 996–1008

  35. [35]

    Eadro: An end- to-end troubleshooting framework for microservices on multi-source data,

    C. Lee, T. Yang, Z. Chen, Y . Su, and M. R. Lyu, “Eadro: An end- to-end troubleshooting framework for microservices on multi-source data,” in2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2023, pp. 1750–1762

  36. [36]

    mabc: Multi-agent blockchain-inspired collaboration for root cause analysis in micro-services architecture,

    W. Zhang, H. Guo, J. Yang, Z. Tian, Y . Zhang, Y . Chaoran, Z. Li, T. Li, X. Shi, L. Zhenget al., “mabc: Multi-agent blockchain-inspired collaboration for root cause analysis in micro-services architecture,” inFindings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 4017–4033

  37. [37]

    Rcagent: Cloud root cause analysis by autonomous agents with tool-augmented large language models,

    Z. Wang, Z. Liu, Y . Zhang, A. Zhong, J. Wang, F. Yin, L. Fan, L. Wu, and Q. Wen, “Rcagent: Cloud root cause analysis by autonomous agents with tool-augmented large language models,” inProceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024, pp. 4966–4974

  38. [38]

    Flow-of-action: Sop enhanced llm-based multi- agent system for root cause analysis,

    C. Pei, Z. Wang, F. Liu, Z. Li, Y . Liu, X. He, R. Kang, T. Zhang, J. Chen, J. Liet al., “Flow-of-action: Sop enhanced llm-based multi- agent system for root cause analysis,” inCompanion Proceedings of the ACM on Web Conference 2025, 2025, pp. 422–431

  39. [39]

    Coca: Generative root cause analysis for distributed systems with code knowledge,

    Y . Li, Y . Wu, J. Liu, Z. Jiang, Z. Chen, G. Yu, and M. R. Lyu, “Coca: Generative root cause analysis for distributed systems with code knowledge,” in2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 2025, pp. 1346–1358

  40. [40]

    Tamo: Fine-grained root cause analysis via tool-assisted llm agent with multi-modality observation data in cloud-native systems,

    Q. Wang, X. Zhang, M. Li, Y . Yuan, M. Xiao, F. Zhuang, and D. Yu, “Tamo: Fine-grained root cause analysis via tool-assisted llm agent with multi-modality observation data in cloud-native systems,”arXiv preprint arXiv:2504.20462, 2025

  41. [41]

    The multi-agent fault localization system based on monte carlo tree search approach,

    R. Ren, “The multi-agent fault localization system based on monte carlo tree search approach,”arXiv preprint arXiv:2507.22800, 2025

  42. [42]

    Exploring llm-based agents for root cause analysis,

    D. Roy, X. Zhang, R. Bhave, C. Bansal, P. Las-Casas, R. Fonseca, and S. Rajmohan, “Exploring llm-based agents for root cause analysis,” arXiv preprint arXiv:2403.04123, 2024

  43. [43]

    Enhancing cluster resilience: Llm-agent based autonomous intelligent cluster diagnosis system and evaluation framework,

    H. Shi, L. Cheng, W. Wu, Y . Wang, X. Liu, S. Nie, W. Wang, X. Min, C. Men, and Y . Lin, “Enhancing cluster resilience: Llm-agent based autonomous intelligent cluster diagnosis system and evaluation framework,”arXiv preprint arXiv:2411.05349, 2024

  44. [44]

    Cloud atlas: Efficient fault localization for cloud systems using language models and causal insight,

    Z. Xie, Y . Zheng, L. Ottens, K. Zhang, C. Kozyrakis, and J. Mace, “Cloud atlas: Efficient fault localization for cloud systems using language models and causal insight,”arXiv preprint arXiv:2407.08694, 2024

  45. [45]

    The potential of one-shot failure root cause analysis: Collaboration of the large language model and small classifier,

    Y . Han, Q. Du, Y . Huang, J. Wu, F. Tian, and C. He, “The potential of one-shot failure root cause analysis: Collaboration of the large language model and small classifier,” inProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024, pp. 931–943

  46. [46]

    Thinkfl: Self-refining failure localization for microservice sys- tems via reinforcement fine-tuning,

    L. Zhang, Y . Zhai, T. Jia, C. Duan, S. Yu, J. Gao, B. Ding, Z. Wu, and Y . Li, “Thinkfl: Self-refining failure localization for microservice sys- tems via reinforcement fine-tuning,”arXiv preprint arXiv:2504.18776, 2025. TOW ARDS IN-DEPTH ROOT CAUSE LOCALIZATION FOR MICROSERVICES WITH MULTI-AGENT RECURSION-OF-THOUGHT 17

  47. [47]

    Scalalog: Scalable log-based failure diagnosis using llm,

    L. Zhang, T. Jia, M. Jia, Y . Wu, H. Liu, and Y . Li, “Scalalog: Scalable log-based failure diagnosis using llm,” inICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025, pp. 1–5

  48. [48]

    Agentfm: Role-aware failure management for distributed databases with llm- driven multi-agents,

    L. Zhang, Y . Zhai, T. Jia, X. Huang, C. Duan, and Y . Li, “Agentfm: Role-aware failure management for distributed databases with llm- driven multi-agents,”arXiv preprint arXiv:2504.06614, 2025

  49. [49]

    Automated root causing of cloud incidents using in-context learning with gpt-4,

    X. Zhang, S. Ghosh, C. Bansal, R. Wang, M. Ma, Y . Kang, and S. Ra- jmohan, “Automated root causing of cloud incidents using in-context learning with gpt-4,” inCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 2024, pp. 266–277

  50. [50]

    S. Shan, Y. Huo, Y. Su, Y. Li, D. Li, and Z. Zheng, “Face it yourselves: An LLM-based two-stage strategy to localize configuration errors via logs,” arXiv preprint arXiv:2404.00640, 2024

  51. [51]

    J. Xu, Q. Zhang, Z. Zhong, S. He, C. Zhang, Q. Lin, D. Pei, P. He, D. Zhang, and Q. Zhang, “Openrca: Can large language models locate the root cause of software failures?” in The Thirteenth International Conference on Learning Representations, 2025

  52. [52]

    K. Sarda, Z. Namrud, M. Litoiu, L. Shwartz, and I. Watts, “Leveraging large language models for the auto-remediation of microservice applications: An experimental study,” in Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 2024, pp. 358–369

  53. [53]

    “AIOPS 2022 Championship,” https://competition.aiops.cn/, 2022

  54. [54]

    J. Soldani and A. Brogi, “Anomaly detection and failure root cause analysis in (micro) service-based cloud applications: A survey,” ACM Computing Surveys (CSUR), vol. 55, no. 3, pp. 1–39, 2022

  55. [55]

    K. Wang, Y. Li, C. Wang, T. Jia, K. Chow, Y. Wen, Y. Dou, G. Xu, C. Hou, J. Yao et al., “Characterizing job microarchitectural profiles at scale: Dataset and analysis,” in Proceedings of the 51st International Conference on Parallel Processing, 2022, pp. 1–11

  56. [56]

    Y. Yang, L. Wang, J. Gu, and Y. Li, “Capturing request execution path for understanding service behavior and detecting anomalies without code instrumentation,” IEEE Transactions on Services Computing, vol. 16, no. 2, pp. 996–1010, 2022

  57. [57]

    J. Shen, H. Zhang, Y. Xiang, X. Shi, X. Li, Y. Shen, Z. Zhang, Y. Wu, X. Yin, J. Wang et al., “Network-centric distributed tracing with deepflow: Troubleshooting your microservices in zero code,” in Proceedings of the ACM SIGCOMM 2023 Conference, 2023, pp. 420–437

  58. [58]

    L. Zhang, T. Jia, Y. Zhai, L. Pan, C. Duan, M. He, M. Jia, and Y. Li, “Agentic memory enhanced recursive reasoning for root cause localization in microservices,” arXiv preprint arXiv:2601.02732, 2026

  59. [59]

    Y. Xiang, C. P. Chen, L. Zeng, W. Yin, X. Liu, H. Li, and W. Xu, “Simplifying root cause analysis in Kubernetes with stategraph and LLM,” arXiv preprint arXiv:2506.02490, 2025

  60. [60]

    Y. Tian, Y. Liu, Z. Chong, Z. Huang, and H.-A. Jacobsen, “Gala: Can graph-augmented large language model agentic workflows elevate root cause analysis?” arXiv preprint arXiv:2508.12472, 2025

  61. [61]

    S. Xie, J. Wang, H. He, Z. Wang, Y. Zhao, N. Zhang, and B. Li, “Tvdiag: A task-oriented and view-invariant failure diagnosis framework for microservice-based systems with multimodal data,” ACM Transactions on Software Engineering and Methodology, 2025

  62. [62]

    R. Xin, P. Chen, and Z. Zhao, “Causalrca: Causal inference based precise fine-grained root cause localization for microservice applications,” Journal of Systems and Software, vol. 203, p. 111724, 2023

  63. [63]

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022

  64. [64]

    M. Chen et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374, 2021

  65. [65]

    B. Li, X. Peng, Q. Xiang, H. Wang, T. Xie, J. Sun, and X. Liu, “Enjoy your observability: An industrial survey of microservice tracing and analysis,” Empirical Software Engineering, vol. 27, pp. 1–28, 2022

  66. [66]

    S. Luo, H. Xu, C. Lu, K. Ye, G. Xu, L. Zhang, Y. Ding, J. He, and C. Xu, “Characterizing microservice dependency and performance: Alibaba trace analysis,” in Proceedings of the ACM Symposium on Cloud Computing, 2021, pp. 412–426

  67. [67]

    H. Liu, Y. Ma, X. Huang, L. Zhang, T. Jia, and Y. Li, “Ora: Job runtime prediction for high-performance computing platforms using the online retrieval-augmented language model,” in Proceedings of the 39th ACM International Conference on Supercomputing, 2025, pp. 884–894

  68. [68]

    M. He, T. Jia, C. Duan, P. Xiao, L. Zhang, K. Wang, Y. Wu, Y. Li, and G. Huang, “Walk the talk: Is your log-based software reliability maintenance system really reliable?” arXiv preprint arXiv:2509.24352, 2025

  69. [69]

    M. He, C. Duan, P. Xiao, T. Jia, S. Yu, L. Zhang, W. Hong, J. Han, Y. Wu, Y. Li et al., “United we stand: Towards end-to-end log-based fault diagnosis via interactive multi-task learning,” arXiv preprint arXiv:2509.24364, 2025

  70. [70]

    X. Huang, H. Liu, Y. Wu, L. Zhang, T. Jia, Y. Li, and Z. Wu, “Uda-rcl: Unsupervised domain adaptation for microservice root cause localization utilizing multimodal data,” IEEE Transactions on Services Computing, 2025

  71. [71]

    H. Liu, X. Huang, M. Jia, L. Zhang, T. Jia, Z. Wu, and Y. Li, “Aaad: Asynchronous inter-variable relationship-aware anomaly detection for multivariate time series,” in 2025 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2025, pp. 1–6

  72. [72]

    C. Duan, M. He, P. Xiao, T. Jia, X. Zhang, Z. Zhong, X. Luo, Y. Niu, L. Zhang, S. Yu et al., “Logaction: Consistent cross-system anomaly detection through logs via active domain adaptation,” in 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2025, pp. 700–712

  73. [73]

    P. Xiao, C. Duan, M. He, T. Jia, Y. Wu, J. Xu, G. Gao, L. Zhang, W. Hong, Y. Li et al., “Coorlog: Efficient-generalizable log anomaly detection via adaptive coordinator in software evolution,” in 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2025, pp. 1119–1131

  74. [74]

    X. Zhou, X. Peng, T. Xie, J. Sun, C. Ji, D. Liu, Q. Xiang, and C. He, “Latent error prediction and fault localization for microservice applications by learning from system trace logs,” in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 683–694

  75. [75]

    P. Liu, H. Xu, Q. Ouyang, R. Jiao, Z. Chen, S. Zhang, J. Yang, L. Mo, J. Zeng, W. Xue et al., “Unsupervised detection of microservice trace anomalies through service-level deep Bayesian networks,” in 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2020, pp. 48–58

  76. [76]

    Y. Gan, M. Liang, S. Dev, D. Lo, and C. Delimitrou, “Sage: Practical and scalable ML-driven performance debugging in microservices,” in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021, pp. 135–151

  77. [77]

    Z. Li, J. Chen, R. Jiao, N. Zhao, Z. Wang, S. Zhang, Y. Wu, L. Jiang, L. Yan, Z. Wang et al., “Practical root cause localization for microservice systems via trace analysis,” in 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQoS). IEEE, 2021, pp. 1–10

  78. [78]

    K. Rasul, A. Ashok, A. R. Williams, A. Khorasani, G. Adamopoulos, R. Bhagwatkar, M. Biloš, H. Ghonia, N. Hassen, A. Schneider et al., “Lag-llama: Towards foundation models for time series forecasting,” in R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models, 2023

  79. [79]

    Y. Liu, H. Zhang, C. Li, X. Huang, J. Wang, and M. Long, “Timer: Generative pre-trained transformers are large time series models,” in Proceedings of the 41st International Conference on Machine Learning, 2024, pp. 32369–32399

  80. [80]

    A. Das, W. Kong, R. Sen, and Y. Zhou, “A decoder-only foundation model for time-series forecasting,” in Forty-first International Conference on Machine Learning, 2024

Showing first 80 references.