BLAgent: Agentic RAG for File-Level Bug Localization

Gias Uddin; Md Afif Al Mamun

arxiv: 2605.17965 · v1 · pith:BSKQGRZ4new · submitted 2026-05-18 · 💻 cs.SE · cs.AI

BLAgent: Agentic RAG for File-Level Bug Localization

Md Afif Al Mamun , Gias Uddin This is my paper

Pith reviewed 2026-05-20 09:27 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords bug localizationagentic RAGfile-levelSWE-bench Liteautomated program repairlarge language modelscode chunkingsoftware maintenance

0 comments

The pith

BLAgent's agentic RAG localizes bugs to the right file at over 78% top-1 accuracy using open-source models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BLAgent as a way to identify which file in a code repository contains a bug, a step that often limits progress on fixing code automatically or analyzing root causes. Current retrieval methods for this task are static and do not reason enough to pick the faulty file reliably. BLAgent adds three pieces: it chunks the repository while keeping path and AST structure, rewrites the bug description to pull both structural and runtime signals, and reranks a small list of candidate files first by rules then by step-by-step evidence. If these pieces work together, they let large language models ground their answers in the right code without scanning everything or running up high costs.

Core claim

BLAgent integrates code structure-aware repository encoding with path-augmented AST-based chunking, dual-perspective query transformation capturing both structural and behavioral signals, and two-phase agentic reranking combining symbolic inspection with evidence-grounded reasoning to perform accurate file-level bug localization over a compact candidate set.

What carries the argument

The agentic RAG framework with path-augmented AST chunking for repository encoding, dual-perspective query transformation, and two-phase symbolic-plus-reasoning reranking that balances accuracy and cost through bounded reasoning.

If this is right

BLAgent reaches over 78% top-1 accuracy with open-source models on SWE-bench Lite.
Accuracy exceeds 86% when a closed-source model is used instead.
The method runs more than 18 times cheaper than the strongest baseline that uses the same model.
Plugging BLAgent into an automated program repair pipeline raises the final repair success rate by over 20%.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same bounded-reasoning pattern could reduce wasted context in other code-search tasks such as finding functions to edit during refactoring.
Cost reductions of this size might let teams run file-level checks on every commit rather than only on reported bugs.
If the dual-perspective rewrite proves robust, it could be reused as a lightweight add-on for any retrieval system that needs both static and dynamic cues.

Load-bearing premise

The three components of path-augmented AST chunking, dual-perspective query transformation, and two-phase reranking together produce accurate reasoning over a compact set of files that works across benchmarks and models.

What would settle it

Accuracy falling well below 50 percent top-1 on a fresh set of bug reports from different repositories or languages would show the components do not deliver the claimed bounded accuracy.

Figures

Figures reproduced from arXiv: 2605.17965 by Gias Uddin, Md Afif Al Mamun.

**Figure 2.** Figure 2: Overall outline of the proposed localization approach. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Comparison of naive text-based versus AST-aware code splitting. The naive splitter (a) breaks the [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Query transformation of human-reported bug (Example Bug: [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Reranking of the candidate files with ReAct agent. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Basic RAG pipeline for file-level localization. [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Two cases illustrating how path-augmented code chunking improves retrieval similarity. [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗

**Figure 8.** Figure 8: File-level localization in dense retrieval when the correct file appears in the Top-1,3,10 locations. [PITH_FULL_IMAGE:figures/full_fig_p022_8.png] view at source ↗

**Figure 9.** Figure 9: Example of different query transformations and retrieved files. [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗

**Figure 10.** Figure 10: Integration of BLAgent into another APR framework. [PITH_FULL_IMAGE:figures/full_fig_p027_10.png] view at source ↗

**Figure 11.** Figure 11: Overlap of repaired issues across multiple runs using different localization strategies. [PITH_FULL_IMAGE:figures/full_fig_p028_11.png] view at source ↗

**Figure 12.** Figure 12: Overall resolution and failure percentage at different levels of program repair stage. [PITH_FULL_IMAGE:figures/full_fig_p031_12.png] view at source ↗

**Figure 13.** Figure 13: Comparison of (a) APR-generated incorrect patch, and (b) Ground-truth patch for [PITH_FULL_IMAGE:figures/full_fig_p033_13.png] view at source ↗

**Figure 14.** Figure 14: Example of failed line level localization ( [PITH_FULL_IMAGE:figures/full_fig_p034_14.png] view at source ↗

**Figure 15.** Figure 15: Generated patch with correct line level information. [PITH_FULL_IMAGE:figures/full_fig_p034_15.png] view at source ↗

read the original abstract

Bug localization remains a key bottleneck in downstream software maintenance tasks, including root cause analysis, triage, and automated program repair (APR), despite recent advances in large language model (LLM)-based repair systems. File-level bug localization is especially critical in hierarchical pipelines, where errors can propagate to downstream stages such as statement-level localization or patch generation. While Retrieval-Augmented Generation (RAG) offers a promising direction for grounding LLMs in repository context, existing RAG pipelines rely on static retrieval and lack the reasoning needed to identify faulty code accurately. In this work, we present BLAgent, a novel agentic RAG framework for file-level bug localization that integrates three key ideas: (i) code structure-aware repository encoding with path-augmented AST-based chunking, (ii) dual-perspective query transformation capturing both structural and behavioral signals, and (iii) two-phase agentic reranking combining symbolic inspection with evidence-grounded reasoning. Unlike prior graph-based or multi-hop agentic approaches, BLAgent performs bounded reasoning over a compact candidate set, balancing accuracy and cost. On SWE-bench Lite, BLAgent attains over 78% Top-1 accuracy with open-source models and over 86% with a closed-source model, while being over 18x cheaper than the strongest baseline using the same model. When integrated into an APR framework, it improves end-to-end repair success by over 20%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BLAgent reports solid accuracy and cost numbers on SWE-bench Lite for file-level bug localization via three agentic RAG components, but the abstract gives no ablations or setup details to show those components actually cause the gains.

read the letter

The key takeaway is that this paper presents BLAgent as a way to do better file-level bug localization with an agentic RAG system that keeps reasoning bounded and cheap, hitting high accuracy on SWE-bench Lite while cutting costs a lot. But the abstract alone doesn't give enough to confirm the three components are really driving those results. What the work does is pull together three pieces: first, encoding the repo with path-augmented AST chunks to keep code structure in mind; second, rewriting queries from both structural and behavioral angles; and third, a two-phase reranker that does symbolic inspection followed by reasoning. This setup is meant to avoid the heavy lifting of full graph-based or multi-hop agent approaches. It positions itself as more efficient for the task. On the positive side, the claims are concrete. Over 78 percent top-1 accuracy with open-source models, over 86 with closed, more than 18 times cheaper than the best baseline with the same model, and when plugged into an APR setup it boosts repair success by over 20 percent. That kind of cost-accuracy trade-off could matter for real pipelines. The main concern is the missing evidence for why these specific ideas work. The abstract states the performance but skips over experimental setup, baselines, ablations, or tests on other repos. If the improvements are mostly from prompt engineering or the particular bugs in SWE-bench Lite, then the framework's novelty doesn't carry as much weight. The stress-test point about needing to see if the components are causal is fair here, and without that data it's difficult to buy the full story. This kind of paper would interest folks in software engineering who are trying to make LLM agents more reliable for code tasks. Someone looking for practical ways to improve retrieval in large codebases might pick up useful tricks from the described components. Overall, the ideas seem worth checking out in detail. I think it should go to peer review so reviewers can look at the full methods and results to see if the numbers check out and if the approach generalizes.

Referee Report

3 major / 2 minor

Summary. The paper introduces BLAgent, an agentic RAG framework for file-level bug localization in software repositories. It proposes three components: (i) path-augmented AST-based chunking for code structure-aware encoding, (ii) dual-perspective query transformation for structural and behavioral signals, and (iii) two-phase agentic reranking with symbolic inspection and evidence-grounded reasoning. The central empirical claims are that BLAgent achieves over 78% Top-1 accuracy on SWE-bench Lite using open-source models and over 86% with closed-source models, is more than 18x cheaper than the strongest baseline with the same model, and yields over 20% improvement in end-to-end repair success when integrated into an APR pipeline.

Significance. If the reported performance gains and cost reductions are shown to be robust and attributable to the proposed mechanisms, the work would represent a meaningful advance in repository-scale bug localization. It could improve the reliability of hierarchical software maintenance pipelines and APR systems by providing a more accurate and efficient way to ground LLMs in repository context without unbounded reasoning costs.

major comments (3)

[Abstract and §4] Abstract and §4 (Experimental Evaluation): The manuscript reports strong Top-1 accuracy, cost reduction, and APR improvement figures but provides no ablation studies that isolate the individual contributions of path-augmented AST chunking, dual-perspective query transformation, and two-phase reranking. Without these results it is impossible to determine whether the claimed gains are caused by the agentic RAG design or by properties of the base models, prompt engineering, or the SWE-bench Lite bug distribution.
[§4.1] §4.1 (Baselines and Metrics): The abstract states that BLAgent is over 18x cheaper than the strongest baseline using the same model, yet the paper supplies no description of the baseline systems, their retrieval mechanisms, or the exact cost metric (token usage, API calls, or wall-clock time). This omission prevents verification of the cost claim and its load-bearing role in the central argument.
[§4.3] §4.3 (Generalization): No results are presented on repositories or benchmarks outside SWE-bench Lite. The claim that the three components produce bounded, accurate reasoning over a compact candidate set therefore rests on a single benchmark whose bug distribution may not be representative, weakening the assertion that the framework generalizes.

minor comments (2)

[Abstract] The abstract and introduction use the term 'bounded reasoning' without a precise definition or complexity bound; a short paragraph clarifying what 'bounded' means in terms of candidate set size or reasoning steps would improve clarity.
[Figures and Tables] Figure captions and table headers should explicitly state the models (open-source vs. closed-source) and the exact Top-1 accuracy numbers rather than relying on the prose description.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. We address each major comment point by point below, indicating where we will revise the manuscript to improve clarity and rigor.

read point-by-point responses

Referee: [Abstract and §4] The manuscript reports strong Top-1 accuracy, cost reduction, and APR improvement figures but provides no ablation studies that isolate the individual contributions of path-augmented AST chunking, dual-perspective query transformation, and two-phase reranking. Without these results it is impossible to determine whether the claimed gains are caused by the agentic RAG design or by properties of the base models, prompt engineering, or the SWE-bench Lite bug distribution.

Authors: We agree that explicit ablation studies are required to attribute performance gains to the proposed mechanisms. In the revised manuscript we will add a new subsection in §4 that reports results after systematically ablating each component in turn (path-augmented AST chunking, dual-perspective query transformation, and two-phase reranking) while keeping all other factors fixed. These experiments will quantify the contribution of each element to Top-1 accuracy and cost. revision: yes
Referee: [§4.1] The abstract states that BLAgent is over 18x cheaper than the strongest baseline using the same model, yet the paper supplies no description of the baseline systems, their retrieval mechanisms, or the exact cost metric (token usage, API calls, or wall-clock time). This omission prevents verification of the cost claim and its load-bearing role in the central argument.

Authors: We accept that the current description of baselines and cost measurement is insufficient. We will expand §4.1 with full specifications of every baseline, including their retrieval strategies and implementation details, and will state explicitly that cost is measured as total input plus output tokens across all LLM calls (retrieval and generation) using the same model for fair comparison. This will allow direct verification of the 18x reduction. revision: yes
Referee: [§4.3] No results are presented on repositories or benchmarks outside SWE-bench Lite. The claim that the three components produce bounded, accurate reasoning over a compact candidate set therefore rests on a single benchmark whose bug distribution may not be representative, weakening the assertion that the framework generalizes.

Authors: SWE-bench Lite is the current standard benchmark for repository-level bug localization because it consists of real GitHub issues with full repository context. Nevertheless, we acknowledge that results on additional benchmarks would strengthen the generalization argument. In the revision we will add a dedicated paragraph in §4.3 discussing the representativeness of SWE-bench Lite and will include a limitations subsection that explicitly notes the single-benchmark scope and outlines plans for future multi-benchmark evaluation. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical performance claims on external benchmark

full rationale

The paper presents BLAgent as a novel agentic RAG framework incorporating path-augmented AST chunking, dual-perspective query transformation, and two-phase reranking, then reports empirical results on SWE-bench Lite (over 78% Top-1 with open-source models, over 86% with closed-source, 18x cheaper, and 20% APR improvement). No mathematical derivations, equations, fitted parameters, or self-referential definitions appear in the provided text. The performance figures are tied directly to an external benchmark rather than any internal reduction or self-citation chain, making the central claims self-contained empirical observations without circular structure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard assumptions about LLM reasoning capabilities over code and the representativeness of SWE-bench Lite; no free parameters, new invented entities, or non-standard axioms are introduced in the abstract.

axioms (1)

domain assumption Large language models can perform reliable symbolic inspection and evidence-grounded reasoning when given compact, well-structured code context.
The two-phase agentic reranking step depends on this capability.

pith-pipeline@v0.9.0 · 5784 in / 1317 out tokens · 29549 ms · 2026-05-20T09:27:18.545462+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 9 internal anchors

[1]

Abreu, P

R. Abreu, P. Zoeteweij, and A. J. Van Gemund. On the accuracy of spectrum-based fault localization. InTesting: Academic and industrial conference practice and research techniques-MUTATION (TAICPART-MUTATION 2007), pages 89–98. IEEE, 2007

work page 2007
[2]

M. Asad, R. M. Yasir, A. Geramirad, and S. Malek. Leveraging large language model for information retrieval-based bug localization. arXiv preprint arXiv:2508.00253, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

Bettenburg, S

N. Bettenburg, S. Just, A. Schröter, C. Weiss, R. Premraj, and T. Zimmermann. What makes a good bug report? InProceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering, pages 308–318, 2008

work page 2008
[4]

Böhme, E

M. Böhme, E. O. Soremekun, S. Chattopadhyay, E. Ugherughe, and A. Zeller. Where is the bug and how is it fixed? an experiment with practitioners. InProceedings of the 2017 11th joint meeting on foundations of software engineering, pages 117–128, 2017

work page 2017
[5]

RepairAgent: An Autonomous, LLM-Based Agent for Program Repair

I. Bouzenia, P. Devanbu, and M. Pradel. Repairagent: An autonomous, llm-based agent for program repair.arXiv preprint arXiv:2403.17134, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[6]

C.-M. Chan, C. Xu, R. Yuan, H. Luo, W. Xue, Y. Guo, and J. Fu. Rq-rag: Learning to refine queries for retrieval augmented generation. arXiv preprint arXiv:2404.00610, 2024

work page arXiv 2024
[7]

Chang, X

J. Chang, X. Zhou, L. Lulu, D. Lo, and B. Li. Bridging bug localization and issue fixing: A hierarchical localization framework leveraging large language models.IEEE Transactions on Software Engineering, 2026

work page 2026
[8]

A. R. Chen, T.-H. Chen, and S. Wang. Pathidea: Improving information retrieval-based bug localization by re-constructing execution paths using logs.IEEE Transactions on Software Engineering, 48(8):2905–2919, 2021

work page 2021
[9]

Z. Chen, R. Tang, G. Deng, F. Wu, J. Wu, Z. Jiang, V. Prasanna, A. Cohan, and X. Wang. Locagent: Graph-guided llm agents for code localization. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8697–8727, 2025

work page 2025
[10]

Z. Fan, X. Gao, M. Mirchev, A. Roychoudhury, and S. H. Tan. Automated repair of programs from large language models. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 1469–1481. IEEE, 2023

work page 2023
[11]

T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, and X. Zhang. Large language model based multi-agents: A survey of progress and challenges.arXiv preprint arXiv:2402.01680, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[12]

Huang, W

L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, 43(2):1–55, 2025. , Vol. 1, No. 1, Article . Publication date: May 2026. BLAgent: Agentic RAG for File-Level Bug Loca...

work page 2025
[13]

Understanding the planning of LLM agents: A survey

X. Huang, W. Liu, X. Chen, X. Wang, H. Wang, D. Lian, Y. Wang, R. Tang, and E. Chen. Understanding the planning of llm agents: A survey.arXiv preprint arXiv:2402.02716, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[14]

Jiang, X

Z. Jiang, X. Ren, M. Yan, W. Jiang, Y. Li, and Z. Liu. Cosil: Software issue localization via llm-driven code repository graph searching. arXiv preprint arXiv:2503.22424, 2025

work page arXiv 2025
[15]

C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, and K. R. Narasimhan. SWE-bench: Can language models resolve real-world github issues? InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[16]

J. A. Jones, M. J. Harrold, and J. Stasko. Visualization of test information to assist fault localization. InProceedings of the 24th international conference on Software engineering, pages 467–477, 2002

work page 2002
[17]

Joshi, J

H. Joshi, J. C. Sanchez, S. Gulwani, V. Le, G. Verbruggen, and I. Radiček. Repair is nearly generation: Multilingual program repair with llms. InProceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 5131–5140, 2023

work page 2023
[18]

R. Just, D. Jalali, and M. D. Ernst. Defects4j: a database of existing faults to enable controlled testing studies for java programs. In Proceedings of the 2014 International Symposium on Software Testing and Analysis, ISSTA 2014, page 437–440, New York, NY, USA, 2014. Association for Computing Machinery

work page 2014
[19]

S. Kang, G. An, and S. Yoo. A quantitative and qualitative evaluation of llm-based explainable fault localization.Proceedings of the ACM on Software Engineering, 1(FSE):1424–1446, 2024

work page 2024
[20]

A. N. Lam, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen. Bug localization with combination of deep learning and information retrieval. In2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC), pages 218–229. IEEE, 2017

work page 2017
[21]

X. B. D. Le, D. Lo, and C. Le Goues. History driven program repair. In2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), volume 1, pages 213–224. IEEE, 2016

work page 2016
[22]

Lewis, E

P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

work page 2020
[23]

F. Li, J. Jiang, J. Sun, and H. Zhang. Hybrid automated program repair by combining large language models and program analysis.ACM Transactions on Software Engineering and Methodology, 34(7):1–28, 2025

work page 2025
[24]

X. Li, W. Li, Y. Zhang, and L. Zhang. Deepfl: Integrating multiple fault diagnosis dimensions for deep fault localization. InProceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis, pages 169–180, 2019

work page 2019
[25]

Z. Li, J. Wang, Z. Jiang, H. Mao, Z. Chen, J. Du, Y. Zhang, F. Zhang, D. Zhang, and Y. Liu. Dmqr-rag: Diverse multi-query rewriting for rag.arXiv preprint arXiv:2411.13154, 2024

work page arXiv 2024
[26]

K. Lin, K. Lo, J. E. Gonzalez, and D. Klein. Decomposing complex queries for tip-of-the-tongue retrieval.arXiv preprint arXiv:2305.15053, 2023

work page arXiv 2023
[27]

K. Liu, A. Koyuncu, D. Kim, and T. F. Bissyandé. Tbar: Revisiting template-based automated program repair. InProceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis, pages 31–42, 2019

work page 2019
[28]

N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang. Lost in the middle: How language models use long contexts.arXiv preprint arXiv:2307.03172, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[29]

Y. Lu, M. Bartolo, A. Moore, S. Riedel, and P. Stenetorp. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity.arXiv preprint arXiv:2104.08786, 2021

work page arXiv 2021
[30]

X. Ma, Y. Gong, P. He, N. Duan, et al. Query rewriting in retrieval-augmented large language models. InThe 2023 Conference on Empirical Methods in Natural Language Processing, 2023

work page 2023
[31]

Y. Ma, Q. Yang, R. Cao, B. Li, F. Huang, and Y. Li. Alibaba lingmaagent: Improving automated issue resolution via comprehensive repository exploration. InProceedings of the 33rd ACM International Conference on the Foundations of Software Engineering, pages 238–249, 2025

work page 2025
[32]

Z. Ma, A. R. Chen, D. J. Kim, T.-H. Chen, and S. Wang. Llmparser: An exploratory study on using large language models for log parsing. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1–13, 2024

work page 2024
[33]

Macháček, A

R. Macháček, A. Grishina, M. Hort, and L. Moonen. The impact of fine-tuning large language models on automated program repair. arXiv preprint arXiv:2507.19909, 2025

work page arXiv 2025
[34]

Y. A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE transactions on pattern analysis and machine intelligence, 42(4):824–836, 2018

work page 2018
[35]

X. Meng, X. Wang, H. Zhang, H. Sun, and X. Liu. Improving fault localization and program repair with deep semantic features and transferred knowledge. InProceedings of the 44th International Conference on Software Engineering, pages 1169–1180, 2022

work page 2022
[36]

F. Niu, C. Li, K. Liu, X. Xia, and D. Lo. When deep learning meets information retrieval-based bug localization: A survey.ACM Computing Surveys, 57(11):1–41, 2025

work page 2025
[37]

M. R. Parvez, W. U. Ahmad, S. Chakraborty, B. Ray, and K.-W. Chang. Retrieval augmented code generation and summarization.arXiv preprint arXiv:2108.11601, 2021

work page arXiv 2021
[38]

Y. Qin, S. Wang, Y. Lou, J. Dong, K. Wang, X. Li, and X. Mao. Agentfl: Scaling llm-based fault localization to project-level context.arXiv preprint arXiv:2403.16362, 2024. , Vol. 1, No. 1, Article . Publication date: May 2026. 44•Md Afif Al Mamun and Gias Uddin

work page arXiv 2024
[39]

R. Qu, R. Tu, and F. Bao. Is semantic chunking worth the computational cost? InFindings of the Association for Computational Linguistics: NAACL 2025, pages 2155–2177, 2025

work page 2025
[40]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

N. Reimers and I. Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks.arXiv preprint arXiv:1908.10084, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1908
[41]

R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry. Improving bug localization using structured information retrieval. In2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 345–355. IEEE, 2013

work page 2013
[42]

A. M. Samir and M. M. Rahman. Improved ir-based bug localization with intelligent relevance feedback.arXiv preprint arXiv:2501.10542, 2025

work page arXiv 2025
[43]

Sawarkar, A

K. Sawarkar, A. Mangal, and S. R. Solanki. Blended rag: Improving rag (retriever-augmented generation) accuracy with semantic search and hybrid query-based retrievers. In2024 IEEE 7th international conference on multimedia information processing and retrieval (MIPR), pages 155–161. IEEE, 2024

work page 2024
[44]

Shao and T

S. Shao and T. Yu. Enhancing ir-based fault localization using large language models.arXiv preprint arXiv:2412.03754, 2024

work page arXiv 2024
[45]

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer.arXiv preprint arXiv:1701.06538, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[46]

The developer coefficient: Software engineering efficiency and its $3 trillion impact on global GDP

Stripe. The developer coefficient: Software engineering efficiency and its $3 trillion impact on global GDP. https://stripe.com/files/ reports/the-developer-coefficient.pdf, Sept. 2018. Accessed: 2026-03-24

work page 2018
[47]

Y. Tao, Y. Qin, and Y. Liu. Retrieval-augmented code generation: A survey with focus on repository-level approaches.arXiv preprint arXiv:2510.04905, 2025

work page internal anchor Pith review arXiv 2025
[48]

Q. Wang, C. Parnin, and A. Orso. Evaluating the usefulness of ir-based fault localization techniques. InProceedings of the 2015 international symposium on software testing and analysis, pages 1–11, 2015

work page 2015
[49]

Wang and D

S. Wang and D. Lo. Version history, similar report, and structure: Putting them together for improved bug localization. InProceedings of the 22nd international conference on program comprehension, pages 53–63, 2014

work page 2014
[50]

X. Wang, B. Li, Y. Song, F. F. Xu, X. Tang, M. Zhuge, J. Pan, Y. Song, B. Li, J. Singh, et al. Openhands: An open platform for ai software developers as generalist agents.arXiv preprint arXiv:2407.16741, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[51]

W. E. Wong, V. Debroy, R. Gao, and Y. Li. The dstar method for effective software fault localization.IEEE Transactions on Reliability, 63(1):290–308, 2013

work page 2013
[52]

W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa. A survey on software fault localization.IEEE Transactions on Software Engineering, 42(8):707–740, 2016

work page 2016
[53]

W. E. Wong, R. Gao, Y. Li, R. Abreu, F. Wotawa, and D. Li. Software fault localization: An overview of research, techniques, and tools. Handbook of Software Fault Localization: Foundations and Advances, pages 1–117, 2023

work page 2023
[54]

Y. Wu, Z. Li, J. M. Zhang, M. Papadakis, M. Harman, and Y. Liu. Large language models in fault localisation.arXiv preprint arXiv:2308.15276, 2023

work page arXiv 2023
[55]

C. S. Xia, Y. Deng, S. Dunn, and L. Zhang. Demystifying llm-based software engineering agents.Proc. ACM Softw. Eng., 2(FSE), June 2025

work page 2025
[56]

C. S. Xia, Y. Wei, and L. Zhang. Automated program repair in the era of large pre-trained language models. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 1482–1494. IEEE, 2023

work page 2023
[57]

C. S. Xia and L. Zhang. Conversational automated program repair.arXiv preprint arXiv:2301.13246, 2023

work page arXiv 2023
[58]

Y. Xiao, J. Keung, K. E. Bennin, and Q. Mi. Improving bug localization with word embedding and enhanced convolutional neural networks.Information and Software Technology, 105:17–29, 2019

work page 2019
[59]

B. Yang, Z. Cai, F. Liu, B. Le, L. Zhang, T. F. Bissyandé, Y. Liu, and H. Tian. A survey of llm-based automated program repair: Taxonomies, design paradigms, and applications.arXiv preprint arXiv:2506.23749, 2025

work page arXiv 2025
[60]

J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. Narasimhan, and O. Press. Swe-agent: Agent-computer interfaces enable automated software engineering.Advances in Neural Information Processing Systems, 37:50528–50652, 2024

work page 2024
[61]

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, and Y. Cao. React: Synergizing reasoning and acting in language models. In The eleventh international conference on learning representations, 2022

work page 2022
[62]

Zhang, Y

M. Zhang, Y. Li, X. Li, L. Chen, Y. Zhang, L. Zhang, and S. Khurshid. An empirical study of boosting spectrum-based fault localization via pagerank.IEEE Transactions on Software Engineering, 47(6):1089–1113, 2019

work page 2019
[63]

Zhang, C

Q. Zhang, C. Fang, Y. Xie, Y. Ma, W. Sun, Y. Yang, and Z. Chen. A systematic literature review on large language models for automated program repair.arXiv preprint arXiv:2405.01466, 2024

work page arXiv 2024
[64]

Zhang, T

T. Zhang, T. Yu, T. Hashimoto, M. Lewis, W.-t. Yih, D. Fried, and S. Wang. Coder reviewer reranking for code generation. InInternational Conference on Machine Learning, pages 41832–41846. PMLR, 2023

work page 2023
[65]

Zhang, H

Y. Zhang, H. Ruan, Z. Fan, and A. Roychoudhury. Autocoderover: Autonomous program improvement. InProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 1592–1604, 2024

work page 2024
[66]

Zhang, X

Y. Zhang, X. Zhao, Z. Z. Wang, C. Yang, J. Wei, and T. Wu. cast: Enhancing code retrieval-augmented generation with structural chunking via abstract syntax tree.arXiv preprint arXiv:2506.15655, 2025

work page arXiv 2025
[67]

Zhang, Q

Z. Zhang, Q. Dai, X. Bo, C. Ma, R. Li, X. Chen, J. Zhu, Z. Dong, and J.-R. Wen. A survey on the memory mechanism of large language model-based agents.ACM Transactions on Information Systems, 43(6):1–47, 2025. , Vol. 1, No. 1, Article . Publication date: May 2026. BLAgent: Agentic RAG for File-Level Bug Localization•45

work page 2025
[68]

Zhang, Y

Z. Zhang, Y. Lei, X. Mao, M. Yan, L. Xu, and X. Zhang. A study of effectiveness of deep learning in locating real faults.Information and Software Technology, 131:106486, 2021

work page 2021
[69]

Y. Zhao, S. Chen, J. Zhang, and Z. Li. Recode: Improving llm-based code repair with fine-grained retrieval-augmented generation.arXiv preprint arXiv:2509.02330, 2025

work page arXiv 2025
[70]

J. Zhou, H. Zhang, and D. Lo. Where should the bugs be fixed? more accurate information retrieval-based bug localization based on bug reports. In2012 34th International conference on software engineering (ICSE), pages 14–24. IEEE, 2012. , Vol. 1, No. 1, Article . Publication date: May 2026

work page 2012

[1] [1]

Abreu, P

R. Abreu, P. Zoeteweij, and A. J. Van Gemund. On the accuracy of spectrum-based fault localization. InTesting: Academic and industrial conference practice and research techniques-MUTATION (TAICPART-MUTATION 2007), pages 89–98. IEEE, 2007

work page 2007

[2] [2]

M. Asad, R. M. Yasir, A. Geramirad, and S. Malek. Leveraging large language model for information retrieval-based bug localization. arXiv preprint arXiv:2508.00253, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[3] [3]

Bettenburg, S

N. Bettenburg, S. Just, A. Schröter, C. Weiss, R. Premraj, and T. Zimmermann. What makes a good bug report? InProceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering, pages 308–318, 2008

work page 2008

[4] [4]

Böhme, E

M. Böhme, E. O. Soremekun, S. Chattopadhyay, E. Ugherughe, and A. Zeller. Where is the bug and how is it fixed? an experiment with practitioners. InProceedings of the 2017 11th joint meeting on foundations of software engineering, pages 117–128, 2017

work page 2017

[5] [5]

RepairAgent: An Autonomous, LLM-Based Agent for Program Repair

I. Bouzenia, P. Devanbu, and M. Pradel. Repairagent: An autonomous, llm-based agent for program repair.arXiv preprint arXiv:2403.17134, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[6] [6]

C.-M. Chan, C. Xu, R. Yuan, H. Luo, W. Xue, Y. Guo, and J. Fu. Rq-rag: Learning to refine queries for retrieval augmented generation. arXiv preprint arXiv:2404.00610, 2024

work page arXiv 2024

[7] [7]

Chang, X

J. Chang, X. Zhou, L. Lulu, D. Lo, and B. Li. Bridging bug localization and issue fixing: A hierarchical localization framework leveraging large language models.IEEE Transactions on Software Engineering, 2026

work page 2026

[8] [8]

A. R. Chen, T.-H. Chen, and S. Wang. Pathidea: Improving information retrieval-based bug localization by re-constructing execution paths using logs.IEEE Transactions on Software Engineering, 48(8):2905–2919, 2021

work page 2021

[9] [9]

Z. Chen, R. Tang, G. Deng, F. Wu, J. Wu, Z. Jiang, V. Prasanna, A. Cohan, and X. Wang. Locagent: Graph-guided llm agents for code localization. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8697–8727, 2025

work page 2025

[10] [10]

Z. Fan, X. Gao, M. Mirchev, A. Roychoudhury, and S. H. Tan. Automated repair of programs from large language models. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 1469–1481. IEEE, 2023

work page 2023

[11] [11]

T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, and X. Zhang. Large language model based multi-agents: A survey of progress and challenges.arXiv preprint arXiv:2402.01680, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[12] [12]

Huang, W

L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, 43(2):1–55, 2025. , Vol. 1, No. 1, Article . Publication date: May 2026. BLAgent: Agentic RAG for File-Level Bug Loca...

work page 2025

[13] [13]

Understanding the planning of LLM agents: A survey

X. Huang, W. Liu, X. Chen, X. Wang, H. Wang, D. Lian, Y. Wang, R. Tang, and E. Chen. Understanding the planning of llm agents: A survey.arXiv preprint arXiv:2402.02716, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[14] [14]

Jiang, X

Z. Jiang, X. Ren, M. Yan, W. Jiang, Y. Li, and Z. Liu. Cosil: Software issue localization via llm-driven code repository graph searching. arXiv preprint arXiv:2503.22424, 2025

work page arXiv 2025

[15] [15]

C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, and K. R. Narasimhan. SWE-bench: Can language models resolve real-world github issues? InThe Twelfth International Conference on Learning Representations, 2024

work page 2024

[16] [16]

J. A. Jones, M. J. Harrold, and J. Stasko. Visualization of test information to assist fault localization. InProceedings of the 24th international conference on Software engineering, pages 467–477, 2002

work page 2002

[17] [17]

Joshi, J

H. Joshi, J. C. Sanchez, S. Gulwani, V. Le, G. Verbruggen, and I. Radiček. Repair is nearly generation: Multilingual program repair with llms. InProceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 5131–5140, 2023

work page 2023

[18] [18]

R. Just, D. Jalali, and M. D. Ernst. Defects4j: a database of existing faults to enable controlled testing studies for java programs. In Proceedings of the 2014 International Symposium on Software Testing and Analysis, ISSTA 2014, page 437–440, New York, NY, USA, 2014. Association for Computing Machinery

work page 2014

[19] [19]

S. Kang, G. An, and S. Yoo. A quantitative and qualitative evaluation of llm-based explainable fault localization.Proceedings of the ACM on Software Engineering, 1(FSE):1424–1446, 2024

work page 2024

[20] [20]

A. N. Lam, A. T. Nguyen, H. A. Nguyen, and T. N. Nguyen. Bug localization with combination of deep learning and information retrieval. In2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC), pages 218–229. IEEE, 2017

work page 2017

[21] [21]

X. B. D. Le, D. Lo, and C. Le Goues. History driven program repair. In2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), volume 1, pages 213–224. IEEE, 2016

work page 2016

[22] [22]

Lewis, E

P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

work page 2020

[23] [23]

F. Li, J. Jiang, J. Sun, and H. Zhang. Hybrid automated program repair by combining large language models and program analysis.ACM Transactions on Software Engineering and Methodology, 34(7):1–28, 2025

work page 2025

[24] [24]

X. Li, W. Li, Y. Zhang, and L. Zhang. Deepfl: Integrating multiple fault diagnosis dimensions for deep fault localization. InProceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis, pages 169–180, 2019

work page 2019

[25] [25]

Z. Li, J. Wang, Z. Jiang, H. Mao, Z. Chen, J. Du, Y. Zhang, F. Zhang, D. Zhang, and Y. Liu. Dmqr-rag: Diverse multi-query rewriting for rag.arXiv preprint arXiv:2411.13154, 2024

work page arXiv 2024

[26] [26]

K. Lin, K. Lo, J. E. Gonzalez, and D. Klein. Decomposing complex queries for tip-of-the-tongue retrieval.arXiv preprint arXiv:2305.15053, 2023

work page arXiv 2023

[27] [27]

K. Liu, A. Koyuncu, D. Kim, and T. F. Bissyandé. Tbar: Revisiting template-based automated program repair. InProceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis, pages 31–42, 2019

work page 2019

[28] [28]

N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang. Lost in the middle: How language models use long contexts.arXiv preprint arXiv:2307.03172, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[29] [29]

Y. Lu, M. Bartolo, A. Moore, S. Riedel, and P. Stenetorp. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity.arXiv preprint arXiv:2104.08786, 2021

work page arXiv 2021

[30] [30]

X. Ma, Y. Gong, P. He, N. Duan, et al. Query rewriting in retrieval-augmented large language models. InThe 2023 Conference on Empirical Methods in Natural Language Processing, 2023

work page 2023

[31] [31]

Y. Ma, Q. Yang, R. Cao, B. Li, F. Huang, and Y. Li. Alibaba lingmaagent: Improving automated issue resolution via comprehensive repository exploration. InProceedings of the 33rd ACM International Conference on the Foundations of Software Engineering, pages 238–249, 2025

work page 2025

[32] [32]

Z. Ma, A. R. Chen, D. J. Kim, T.-H. Chen, and S. Wang. Llmparser: An exploratory study on using large language models for log parsing. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1–13, 2024

work page 2024

[33] [33]

Macháček, A

R. Macháček, A. Grishina, M. Hort, and L. Moonen. The impact of fine-tuning large language models on automated program repair. arXiv preprint arXiv:2507.19909, 2025

work page arXiv 2025

[34] [34]

Y. A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE transactions on pattern analysis and machine intelligence, 42(4):824–836, 2018

work page 2018

[35] [35]

X. Meng, X. Wang, H. Zhang, H. Sun, and X. Liu. Improving fault localization and program repair with deep semantic features and transferred knowledge. InProceedings of the 44th International Conference on Software Engineering, pages 1169–1180, 2022

work page 2022

[36] [36]

F. Niu, C. Li, K. Liu, X. Xia, and D. Lo. When deep learning meets information retrieval-based bug localization: A survey.ACM Computing Surveys, 57(11):1–41, 2025

work page 2025

[37] [37]

M. R. Parvez, W. U. Ahmad, S. Chakraborty, B. Ray, and K.-W. Chang. Retrieval augmented code generation and summarization.arXiv preprint arXiv:2108.11601, 2021

work page arXiv 2021

[38] [38]

Y. Qin, S. Wang, Y. Lou, J. Dong, K. Wang, X. Li, and X. Mao. Agentfl: Scaling llm-based fault localization to project-level context.arXiv preprint arXiv:2403.16362, 2024. , Vol. 1, No. 1, Article . Publication date: May 2026. 44•Md Afif Al Mamun and Gias Uddin

work page arXiv 2024

[39] [39]

R. Qu, R. Tu, and F. Bao. Is semantic chunking worth the computational cost? InFindings of the Association for Computational Linguistics: NAACL 2025, pages 2155–2177, 2025

work page 2025

[40] [40]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

N. Reimers and I. Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks.arXiv preprint arXiv:1908.10084, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1908

[41] [41]

R. K. Saha, M. Lease, S. Khurshid, and D. E. Perry. Improving bug localization using structured information retrieval. In2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 345–355. IEEE, 2013

work page 2013

[42] [42]

A. M. Samir and M. M. Rahman. Improved ir-based bug localization with intelligent relevance feedback.arXiv preprint arXiv:2501.10542, 2025

work page arXiv 2025

[43] [43]

Sawarkar, A

K. Sawarkar, A. Mangal, and S. R. Solanki. Blended rag: Improving rag (retriever-augmented generation) accuracy with semantic search and hybrid query-based retrievers. In2024 IEEE 7th international conference on multimedia information processing and retrieval (MIPR), pages 155–161. IEEE, 2024

work page 2024

[44] [44]

Shao and T

S. Shao and T. Yu. Enhancing ir-based fault localization using large language models.arXiv preprint arXiv:2412.03754, 2024

work page arXiv 2024

[45] [45]

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer.arXiv preprint arXiv:1701.06538, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[46] [46]

The developer coefficient: Software engineering efficiency and its $3 trillion impact on global GDP

Stripe. The developer coefficient: Software engineering efficiency and its $3 trillion impact on global GDP. https://stripe.com/files/ reports/the-developer-coefficient.pdf, Sept. 2018. Accessed: 2026-03-24

work page 2018

[47] [47]

Y. Tao, Y. Qin, and Y. Liu. Retrieval-augmented code generation: A survey with focus on repository-level approaches.arXiv preprint arXiv:2510.04905, 2025

work page internal anchor Pith review arXiv 2025

[48] [48]

Q. Wang, C. Parnin, and A. Orso. Evaluating the usefulness of ir-based fault localization techniques. InProceedings of the 2015 international symposium on software testing and analysis, pages 1–11, 2015

work page 2015

[49] [49]

Wang and D

S. Wang and D. Lo. Version history, similar report, and structure: Putting them together for improved bug localization. InProceedings of the 22nd international conference on program comprehension, pages 53–63, 2014

work page 2014

[50] [50]

X. Wang, B. Li, Y. Song, F. F. Xu, X. Tang, M. Zhuge, J. Pan, Y. Song, B. Li, J. Singh, et al. Openhands: An open platform for ai software developers as generalist agents.arXiv preprint arXiv:2407.16741, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[51] [51]

W. E. Wong, V. Debroy, R. Gao, and Y. Li. The dstar method for effective software fault localization.IEEE Transactions on Reliability, 63(1):290–308, 2013

work page 2013

[52] [52]

W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa. A survey on software fault localization.IEEE Transactions on Software Engineering, 42(8):707–740, 2016

work page 2016

[53] [53]

W. E. Wong, R. Gao, Y. Li, R. Abreu, F. Wotawa, and D. Li. Software fault localization: An overview of research, techniques, and tools. Handbook of Software Fault Localization: Foundations and Advances, pages 1–117, 2023

work page 2023

[54] [54]

Y. Wu, Z. Li, J. M. Zhang, M. Papadakis, M. Harman, and Y. Liu. Large language models in fault localisation.arXiv preprint arXiv:2308.15276, 2023

work page arXiv 2023

[55] [55]

C. S. Xia, Y. Deng, S. Dunn, and L. Zhang. Demystifying llm-based software engineering agents.Proc. ACM Softw. Eng., 2(FSE), June 2025

work page 2025

[56] [56]

C. S. Xia, Y. Wei, and L. Zhang. Automated program repair in the era of large pre-trained language models. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 1482–1494. IEEE, 2023

work page 2023

[57] [57]

C. S. Xia and L. Zhang. Conversational automated program repair.arXiv preprint arXiv:2301.13246, 2023

work page arXiv 2023

[58] [58]

Y. Xiao, J. Keung, K. E. Bennin, and Q. Mi. Improving bug localization with word embedding and enhanced convolutional neural networks.Information and Software Technology, 105:17–29, 2019

work page 2019

[59] [59]

B. Yang, Z. Cai, F. Liu, B. Le, L. Zhang, T. F. Bissyandé, Y. Liu, and H. Tian. A survey of llm-based automated program repair: Taxonomies, design paradigms, and applications.arXiv preprint arXiv:2506.23749, 2025

work page arXiv 2025

[60] [60]

J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. Narasimhan, and O. Press. Swe-agent: Agent-computer interfaces enable automated software engineering.Advances in Neural Information Processing Systems, 37:50528–50652, 2024

work page 2024

[61] [61]

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, and Y. Cao. React: Synergizing reasoning and acting in language models. In The eleventh international conference on learning representations, 2022

work page 2022

[62] [62]

Zhang, Y

M. Zhang, Y. Li, X. Li, L. Chen, Y. Zhang, L. Zhang, and S. Khurshid. An empirical study of boosting spectrum-based fault localization via pagerank.IEEE Transactions on Software Engineering, 47(6):1089–1113, 2019

work page 2019

[63] [63]

Zhang, C

Q. Zhang, C. Fang, Y. Xie, Y. Ma, W. Sun, Y. Yang, and Z. Chen. A systematic literature review on large language models for automated program repair.arXiv preprint arXiv:2405.01466, 2024

work page arXiv 2024

[64] [64]

Zhang, T

T. Zhang, T. Yu, T. Hashimoto, M. Lewis, W.-t. Yih, D. Fried, and S. Wang. Coder reviewer reranking for code generation. InInternational Conference on Machine Learning, pages 41832–41846. PMLR, 2023

work page 2023

[65] [65]

Zhang, H

Y. Zhang, H. Ruan, Z. Fan, and A. Roychoudhury. Autocoderover: Autonomous program improvement. InProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 1592–1604, 2024

work page 2024

[66] [66]

Zhang, X

Y. Zhang, X. Zhao, Z. Z. Wang, C. Yang, J. Wei, and T. Wu. cast: Enhancing code retrieval-augmented generation with structural chunking via abstract syntax tree.arXiv preprint arXiv:2506.15655, 2025

work page arXiv 2025

[67] [67]

Zhang, Q

Z. Zhang, Q. Dai, X. Bo, C. Ma, R. Li, X. Chen, J. Zhu, Z. Dong, and J.-R. Wen. A survey on the memory mechanism of large language model-based agents.ACM Transactions on Information Systems, 43(6):1–47, 2025. , Vol. 1, No. 1, Article . Publication date: May 2026. BLAgent: Agentic RAG for File-Level Bug Localization•45

work page 2025

[68] [68]

Zhang, Y

Z. Zhang, Y. Lei, X. Mao, M. Yan, L. Xu, and X. Zhang. A study of effectiveness of deep learning in locating real faults.Information and Software Technology, 131:106486, 2021

work page 2021

[69] [69]

Y. Zhao, S. Chen, J. Zhang, and Z. Li. Recode: Improving llm-based code repair with fine-grained retrieval-augmented generation.arXiv preprint arXiv:2509.02330, 2025

work page arXiv 2025

[70] [70]

J. Zhou, H. Zhang, and D. Lo. Where should the bugs be fixed? more accurate information retrieval-based bug localization based on bug reports. In2012 34th International conference on software engineering (ICSE), pages 14–24. IEEE, 2012. , Vol. 1, No. 1, Article . Publication date: May 2026

work page 2012