pith. sign in

arxiv: 2508.00253 · v3 · submitted 2025-08-01 · 💻 cs.SE

Towards Explorative IRBL: Combining Semantic Retrieval with LLM-driven Iterative Code Exploration

Pith reviewed 2026-05-19 01:59 UTC · model grok-4.3

classification 💻 cs.SE
keywords bug localizationlarge language modelsinformation retrievalsoftware maintenancecode explorationiterative analysis
0
0 comments X

The pith

GenLoc identifies more buggy files by combining semantic retrieval with LLM iterative exploration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces GenLoc to improve information retrieval based bug localization. It addresses problems like vocabulary mismatch in traditional methods and poor context provision in LLM methods by using semantic retrieval to select initial candidates and then applying LLM-driven functions for iterative code exploration. The goal is to provide the model with just the right amount of relevant context from the codebase. If this works, it would allow more accurate bug localization in large software projects using both Java and Python, without depending on extra metadata.

Core claim

GenLoc is a new technique for bug localization that merges semantic retrieval with LLM-driven iterative code-exploration functions. These functions let the model analyze the codebase step by step to find the source files responsible for a given bug report. When tested on three benchmarks with large Java datasets and the Python SWE-bench Lite, GenLoc outperforms traditional, deep learning, and other LLM-based methods and succeeds on bugs that the others miss.

What carries the argument

The key mechanism is the set of LLM-driven code-exploration functions that enable iterative gathering and analysis of code context starting from semantically similar files.

If this is right

  • More bugs get localized correctly even when vocabulary does not match the bug report directly.
  • The approach works across different programming languages and project sizes.
  • LLMs can be used effectively for code tasks without being overwhelmed by entire repositories or limited to fixed candidates.
  • Developers benefit from fewer false positives in the list of suspected files.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method points to a broader principle that LLMs perform better on complex software tasks when they can actively query the codebase rather than receive static inputs.
  • Similar iterative exploration could be applied to tasks like automated program repair or vulnerability detection.
  • Testing on even larger codebases or with different LLMs would help confirm the robustness of the gains.

Load-bearing premise

That the LLM can use the exploration functions to collect enough relevant context without overlooking important files or adding misleading information.

What would settle it

If experiments on the Java and SWE-bench Lite benchmarks show that GenLoc does not achieve higher performance metrics than the compared methods or does not localize any additional bugs, the claim of substantial outperformance would not hold.

Figures

Figures reproduced from arXiv: 2508.00253 by Moumita Asad, Rafed Muhammad Yasir, Sam Malek.

Figure 1
Figure 1. Figure 1: Workflow of GenLoc. 3 Methodology GenLoc operates in two primary steps to localize relevant files based on a given bug report. First, it retrieves a set of semantically similar files using embedding-based similarity. Next, an LLM, supported by a set of external functions, iteratively analyzes the bug report and the code base. During this stage, the model may examine the embedding-based retrieved files or e… view at source ↗
Figure 2
Figure 2. Figure 2: LLM Prompt. At first, the LLM is prompted to assume the role of an expert software engineer specializing in bug localization, as role-playing improves the reasoning abilities of LLMs [26]. Next, the prompt is designed to decompose bug localization into a series of smaller tasks since LLMs perform better when complex tasks are broken down into sub-tasks [21]. These sub-tasks include analyzing the bug report… view at source ↗
Figure 3
Figure 3. Figure 3: Bug Report from Birt Project. mismatch between bug reports and source files, which hampers GenLoc’s ability to accurately rank relevant files. Furthermore, a chi-square test revealed a statistically significant relationship between the presence of reproduction information and the failure to localize bugs in the Birt project across all three GenLoc trials (p-value = 0). In contrast, DreamLoc considers addit… view at source ↗
Figure 4
Figure 4. Figure 4: Overlap Analysis between GenLoc and Non-LLM based IRBL techniques. Apart from Accuracy@k, MRR@10 and MAP@10, the number of unique bugs localized by each approach are examined. In this analysis, a bug is considered success￾fully localized if the correct buggy file appears within the top 10 ranked results (i.e., Accuracy@10) [25] [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Overlap Analysis between GenLoc and Recent LLM-based Approaches. In [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Bug Report from Apache RocketMQ Project. [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Bug Report from OpenAPI Generator Project. [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Overlap analysis between Gen￾Loc and Its Ablated Variants. Although GenLoc-NoEmbed and GenLoc-Naive obtain sim￾ilar results, the overlap analysis ( [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
read the original abstract

Information Retrieval-based Bug Localization (IRBL) aims to identify buggy source files for a given bug report. Traditional and deep learning-based IRBL techniques often suffer from vocabulary mismatch and dependence on project-specific metadata. In contrast, recent Large Language Model (LLM)-based approaches struggle to provide appropriate context to the model: they either restrict analysis to a fixed set of candidate files, overwhelm the model with repository-wide information, or rely on explicit bug report cues to guide context collection. To address these issues, we propose GenLoc, a technique that combines semantic retrieval with LLM-driven code-exploration functions to iteratively analyze the code base and identify buggy files. We evaluate GenLoc on three complementary benchmarks, including large-scale and recent Java datasets as well as the Python based SWE-bench Lite dataset. Results demonstrate that GenLoc substantially outperforms traditional IRBL, deep learning-based approaches and recent LLM-based methods, while also localizing bugs that other techniques fail to detect.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes GenLoc, an IRBL technique that combines semantic retrieval with LLM-driven iterative code-exploration functions to gather context and identify buggy files. It addresses limitations of traditional IRBL (vocabulary mismatch), DL-based methods (project-specific metadata), and prior LLM approaches (fixed candidates, repository overload, or cue dependence). Evaluation on three benchmarks (large-scale/recent Java datasets plus SWE-bench Lite) claims substantial outperformance over baselines and the ability to localize bugs missed by other techniques.

Significance. If the results are robust, the work offers a practical advance in LLM-assisted bug localization for large codebases by making context collection iterative and function-driven rather than static or exhaustive. This could improve recall on complex bugs where fixed retrieval fails, with direct relevance to software maintenance tools.

major comments (2)
  1. The central performance claims on SWE-bench Lite and the Java benchmarks rest on the assumption that the LLM-driven iterative exploration reliably retrieves sufficient relevant files without systematic under-retrieval or noise accumulation. The method description provides no coverage guarantees, backtracking, or post-exploration filtering, leaving open the possibility that reported gains are partly due to fortunate retrieval rather than the technique itself.
  2. Evaluation section: while the abstract states 'substantial outperformance,' the manuscript supplies no per-bug breakdown, statistical significance tests, or ablation isolating the iterative exploration component from the semantic retrieval baseline. This makes it difficult to verify that the exploration step is load-bearing for the cross-technique and cross-benchmark superiority claims.
minor comments (2)
  1. Abstract: the claim of localizing 'bugs that other techniques fail to detect' would be stronger with a brief quantitative note (e.g., number of unique bugs or recall@N delta) rather than a qualitative statement.
  2. Notation and figures: ensure that the exploration functions are given explicit pseudocode or a clear interface definition so readers can reproduce the iterative loop.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address the major comments point by point below, indicating where revisions will be made to the manuscript.

read point-by-point responses
  1. Referee: The central performance claims on SWE-bench Lite and the Java benchmarks rest on the assumption that the LLM-driven iterative exploration reliably retrieves sufficient relevant files without systematic under-retrieval or noise accumulation. The method description provides no coverage guarantees, backtracking, or post-exploration filtering, leaving open the possibility that reported gains are partly due to fortunate retrieval rather than the technique itself.

    Authors: We agree that formal coverage guarantees, backtracking, or post-exploration filtering are not described in the current manuscript. The iterative process is intended to allow the LLM to progressively gather context via function calls guided by initial semantic retrieval, which empirically reduces irrelevant exploration in our experiments. To address the concern directly, we will add a dedicated limitations subsection discussing retrieval reliability risks and include new empirical results on average files explored per bug and retrieval success rates in the revised manuscript. revision: partial

  2. Referee: Evaluation section: while the abstract states 'substantial outperformance,' the manuscript supplies no per-bug breakdown, statistical significance tests, or ablation isolating the iterative exploration component from the semantic retrieval baseline. This makes it difficult to verify that the exploration step is load-bearing for the cross-technique and cross-benchmark superiority claims.

    Authors: We acknowledge that these analyses are absent from the current evaluation section and would strengthen the claims. In the revised manuscript we will add statistical significance tests (McNemar’s test for top-k localization and Wilcoxon signed-rank for MRR/MAP differences) across all benchmarks. We will also report a new ablation comparing the full GenLoc pipeline against the semantic-retrieval-only baseline. A per-bug success breakdown will be provided in supplementary material. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external benchmarks

full rationale

The paper proposes GenLoc as a hybrid IRBL technique and supports its performance claims through direct evaluation on three independent benchmarks (including SWE-bench Lite). No equations, fitted parameters, or self-referential derivations appear in the provided text. The central results are comparisons against external baselines rather than quantities defined by the authors' own prior outputs or ansatzes. Self-citations, if present, are not load-bearing for the reported gains. The derivation chain is therefore self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no mathematical derivations, free parameters, or new physical entities. The contribution is an algorithmic technique whose correctness depends on empirical performance rather than axioms or invented constructs.

pith-pipeline@v0.9.0 · 5699 in / 1156 out tokens · 77281 ms · 2026-05-19T01:59:16.575524+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. BLAgent: Agentic RAG for File-Level Bug Localization

    cs.SE 2026-05 unverdicted novelty 6.0

    BLAgent achieves over 78% Top-1 accuracy on SWE-bench Lite for file-level bug localization using agentic RAG, at 18x lower cost than baselines, and boosts end-to-end APR success by over 20%.

  2. SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents

    cs.AI 2026-05 unverdicted novelty 6.0

    SkillLens organizes skills into policies-strategies-procedures-primitives layers, retrieves via degree-corrected random walk, and uses a verifier for local adaptation, yielding up to 6.31 pp gains on MuLocbench and ra...

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · cited by 2 Pith papers · 2 internal anchors

  1. [1]

    [n. d.]. AWS Machine Learning Blog. https://aws.amazon.com/blogs/machine-learning/amazon-bedrock-knowledge- bases-now-supports-advanced-parsing-chunking-and-query-reformulation-giving-greater-control-of-accuracy-in- rag-based-applications/. Accessed: 2025-05-25

  2. [2]

    [n. d.]. Chroma. https://www.trychroma.com/. Accessed: 2025-05-25

  3. [3]

    [n. d.]. GPT-4o mini. https://platform.openai.com/docs/models/gpt-4o-mini. Accessed: 2025-05-25

  4. [4]

    [n. d.]. text-embedding-3-small. https://platform.openai.com/docs/models/text-embedding-3-small. Accessed: 2025- 05-25

  5. [5]

    [n. d.]. Tree-sitter. https://github.com/tree-sitter/tree-sitter. Accessed: 2025-04-17

  6. [6]

    Bui Thi Mai Anh and Nguyen Viet Luyen. 2021. An imbalanced deep learning model for bug localization. InProceedings of the 28th Asia-Pacific Software Engineering Conference Workshops. IEEE, 32–40

  7. [7]

    John Anvik, Lyndon Hiew, and Gail C Murphy. 2006. Who should fix this bug?. InProceedings of the 28th International Conference on Software Engineering. 361–370

  8. [8]

    Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann. 2008. What makes a good bug report?. InProceedings of the 16th International Symposium on Foundations of Software Engineering. 308–318

  9. [9]

    Lili Bo, Wangjie Ji, Xiaobing Sun, Ting Zhang, Xiaoxue Wu, and Ying Wei. 2024. ChatBR: Automated assessment and improvement of bug report quality using ChatGPT. InProceedings of the 39th International Conference on Automated Software Engineering. 1472–1483

  10. [10]

    Junming Cao, Shouliang Yang, Wenhui Jiang, Hushuang Zeng, Beijun Shen, and Hao Zhong. 2020. Bugpecker: Locating faulty methods with deep learning on revision graphs. InProceedings of the 35th International Conference on Automated Software Engineering. 1214–1218

  11. [11]

    Partha Chakraborty, Mahmoud Alfadel, and Meiyappan Nagappan. 2024. Rlocator: Reinforcement learning for bug localization.IEEE Transactions on Software Engineering(2024)

  12. [12]

    Jianming Chang, Xin Zhou, Lulu Wang, David Lo, and Bixin Li. 2025. Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models.arXiv preprint arXiv:2502.15292(2025)

  13. [13]

    Agnieszka Ciborowska and Kostadin Damevski. 2022. Fast changeset-based bug localization with bert. InProceedings of the 44th International Conference on Software Engineering. 946–957

  14. [14]

    Yali Du and Zhongxing Yu. 2023. Pre-training code representation with semantic flow graph for effective bug localization. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 579–591

  15. [15]

    Mikołaj Fejzer, Jakub Narębski, Piotr Przymus, and Krzysztof Stencel. 2021. Tracking buggy files: New efficient adaptive bug localization algorithm.IEEE Transactions on Software Engineering48, 7 (2021), 2557–2569

  16. [16]

    Jiaxuan Han, Cheng Huang, Siqi Sun, Zhonglin Liu, and Jiayong Liu. 2023. bjXnet: an improved bug localization model based on code property graph and attention mechanism.Automated Software Engineering30, 1 (2023), 12

  17. [17]

    Yikun Han, Chunjiang Liu, and Pengfei Wang. 2023. A comprehensive survey on vector database: Storage and retrieval technique, challenge.arXiv preprint arXiv:2310.11703(2023)

  18. [18]

    Shahid Iqbal, Rashid Naseem, Salman Jan, Sami Alshmrany, Muhammad Yasar, and Arshad Ali. 2020. Determining bug prioritization using feature reduction and clustering with classification.IEEE Access8 (2020), 215661–215678

  19. [19]

    Sungmin Kang, Gabin An, and Shin Yoo. 2024. A quantitative and qualitative evaluation of LLM-based explainable fault localization.Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering (2024), 1424–1446

  20. [20]

    Sungmin Kang, Juyeon Yoon, and Shin Yoo. 2023. Large language models are few-shot testers: Exploring llm-based general bug reproduction. InProceedings of the 45th International Conference on Software Engineering. IEEE, 2312–2323

  21. [21]

    Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, and Ashish Sabharwal. 2022. Decomposed prompting: A modular approach for solving complex tasks.arXiv preprint arXiv:2210.02406(2022)

  22. [22]

    Dongsun Kim, Yida Tao, Sunghun Kim, and Andreas Zeller. 2013. Where should we fix this bug? a two-phase recommendation model.IEEE Transactions on Software Engineering39, 11 (2013), 1597–1610

  23. [23]

    Misoo Kim and Eunseok Lee. 2019. A novel approach to automatic query reformulation for ir-based bug localization. InProceedings of the 34th ACM/SIGAPP Symposium on Applied Computing. 1752–1759

  24. [24]

    Misoo Kim and Eunseok Lee. 2021. Are datasets for information retrieval-based bug localization techniques trustworthy? Impact analysis of bug types on IRBL.Empirical Software Engineering26 (2021), 1–66. , Vol. 1, No. 1, Article . Publication date: October 2018. 20 Trovato et al

  25. [25]

    Pavneet Singh Kochhar, Xin Xia, David Lo, and Shanping Li. 2016. Practitioners’ expectations on automated fault localization. InProceedings of the 25th International Symposium on Software Testing and Analysis. 165–176

  26. [26]

    Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Enzhi Wang, and Xiaohang Dong

  27. [27]

    Better zero-shot reasoning with role-play prompting.arXiv preprint arXiv:2308.07702(2023)

  28. [28]

    An Ngoc Lam, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N Nguyen. 2015. Combining deep learning with information retrieval to localize buggy files for bug reports. InProceedings of the 30th International Conference on Automated Software Engineering. IEEE, 476–481

  29. [29]

    An Ngoc Lam, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N Nguyen. 2017. Bug localization with combination of deep learning and information retrieval. InProceedings of the 25th International Conference on Program Comprehension. IEEE, 218–229

  30. [30]

    Jaekwon Lee, Dongsun Kim, Tegawendé F Bissyandé, Woosung Jung, and Yves Le Traon. 2018. Bench4bl: reproducibility study on the performance of ir-based bug localization. InProceedings of the 27th International Symposium on Software Testing and Analysis. 61–72

  31. [31]

    Jae Yong Lee, Sungmin Kang, Juyeon Yoon, and Shin Yoo. 2024. The github recent bugs dataset for evaluating llm-based debugging applications. InProceedings of the International Conference on Software Testing, Verification and Validation. IEEE, 442–444

  32. [32]

    Yue Li, Bohan Liu, Ting Zhang, Zhiqi Wang, David Lo, Lanxin Yang, Jun Lyu, and He Zhang. 2025. A Knowledge Enhanced Large Language Model for Bug Localization.Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering(2025), 1914–1936

  33. [33]

    Zhengliang Li, Zhiwei Jiang, Qiguo Huang, and Qing Gu. 2025. LLM-BL: Large Language Models are Zero-Shot Rankers for Bug Localization. InProceedings of the 33rd International Conference on Program Comprehension. IEEE Computer Society, 548–559

  34. [34]

    Hongliang Liang, Dengji Hang, and Xiangyu Li. 2022. Modeling function-level interactions for file-level bug localization. Empirical Software Engineering27, 7 (2022), 186

  35. [35]

    Guangliang Liu, Yang Lu, Ke Shi, Jingfei Chang, and Xing Wei. 2019. Mapping bug reports to relevant source code files based on the vector space model and word embedding.IEEE Access7 (2019), 78870–78881

  36. [36]

    Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2023. Lost in the middle: How language models use long contexts.arXiv preprint arXiv:2307.03172(2023)

  37. [37]

    Zheng Liu, Yujia Zhou, Yutao Zhu, Jianxun Lian, Chaozhuo Li, Zhicheng Dou, Defu Lian, and Jian-Yun Nie. 2024. Information retrieval meets large language models. InCompanion Proceedings of the ACM Web Conference 2024. 1586–1589

  38. [38]

    Stacy K Lukins, Nicholas A Kraft, and Letha H Etzkorn. 2008. Source code retrieval for bug localization using latent dirichlet allocation. InProceedings of the 15th Working Conference on Reverse Engineering. IEEE, 155–164

  39. [39]

    Stacy K Lukins, Nicholas A Kraft, and Letha H Etzkorn. 2010. Bug localization using latent dirichlet allocation. Information and Software Technology52, 9 (2010), 972–990

  40. [40]

    Zhengmao Luo, Wenyao Wang, and Caichun Cen. 2022. Improving bug localization with effective contrastive learning representation.IEEE Access11 (2022), 32523–32533

  41. [41]

    Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, and Yongbin Li. 2025. Alibaba lingmaagent: Improving automated issue resolution via comprehensive repository exploration. InProceedings of the 33rd ACM International Conference on the Foundations of Software Engineering. 238–249

  42. [42]

    Patrick E McKnight and Julius Najab. 2010. Mann-whitney U test.The Corsini encyclopedia of psychology(2010), 1–1

  43. [43]

    Anh Tuan Nguyen, Tung Thanh Nguyen, Jafar Al-Kofahi, Hung Viet Nguyen, and Tien N Nguyen. 2011. A topic-based approach for narrowing the search space of buggy files from a bug report. InProceedings of the 26th International Conference on Automated Software Engineering. IEEE, 263–272

  44. [44]

    Suphakit Niwattanakul, Jatsada Singthongchai, Ekkachai Naenudorn, and Supachanun Wanapu. 2013. Using of Jaccard coefficient for keywords similarity. InProceedings of the International Multiconference of Engineers and Computer Scientists, Vol. 1. 380–384

  45. [45]

    Laura Plein and Tegawendé F Bissyandé. 2023. Can llms demystify bug reports?arXiv preprint arXiv:2310.06310(2023)

  46. [46]

    Michael Pradel, Vijayaraghavan Murali, Rebecca Qian, Mateusz Machalica, Erik Meijer, and Satish Chandra. 2020. Scaffle: Bug localization on millions of files. InProceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. 225–236

  47. [47]

    Binhang Qi, Hailong Sun, Wei Yuan, Hongyu Zhang, and Xiangxin Meng. 2021. Dreamloc: A deep relevance matching- based framework for bug localization.IEEE Transactions on Reliability71, 1 (2021), 235–249

  48. [48]

    Yihao Qin, Shangwen Wang, Yan Lei, Zhuo Zhang, Bo Lin, Xin Peng, Jun Ma, Liqian Chen, and Xiaoguang Mao. 2025. Fault Localization from the Semantic Code Search Perspective. (2025). doi:10.1145/3757915

  49. [49]

    Yihao Qin, Shangwen Wang, Yiling Lou, Jinhao Dong, Kaixin Wang, Xiaoling Li, and Xiaoguang Mao. 2025. Soap FL: A Standard Operating Procedure for LLM-based Method-Level Fault Localization.IEEE Transactions on Software , Vol. 1, No. 1, Article . Publication date: October 2018. Leveraging Large Language Model for Information Retrieval-based Bug Localization...

  50. [50]

    Md Nakhla Rafi, Dong Jae Kim, Tse-Hsun Chen, and Shaowei Wang. 2024. Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection.arXiv preprint arXiv:2409.13642(2024)

  51. [51]

    Mohammad Masudur Rahman and Chanchal K Roy. 2018. Improving ir-based bug localization with context-aware query reformulation. InProceedings of the 26th ACM joint meeting on European software Engineering Conference and Symposium on the Foundations of Software Engineering. 621–632

  52. [52]

    Matthew Renze. 2024. The effect of sampling temperature on problem solving in large language models. InFindings of the Association for Computational Linguistics: EMNLP 2024. 7346–7356

  53. [53]

    Haifeng Ruan, Yuntong Zhang, and Abhik Roychoudhury. 2025. SpecRover: Code Intent Extraction via LLMs. (2025), 963–974. https://doi.org/10.1109/ICSE55347.2025.00080

  54. [54]

    Ripon K Saha, Matthew Lease, Sarfraz Khurshid, and Dewayne E Perry. 2013. Improving bug localization using structured information retrieval. InProceedings of the 28th International Conference on Automated Software Engineering. IEEE, 345–355

  55. [55]

    Asif Mohammed Samir and Mohammad Masudur Rahman. 2025. Improved IR-Based Bug Localization with Intelligent Relevance Feedback. InProceedings of the 33rd International Conference on Program Comprehension. IEEE, 560–571

  56. [56]

    Bunyamin Sisman and Avinash C Kak. 2012. Incorporating version histories in information retrieval based bug localization. InProceedings of the 9th IEEE Working Conference on Mining Software Repositories. IEEE, 50–59

  57. [57]

    Mozhan Soltani, Felienne Hermans, and Thomas Bäck. 2020. The significance of bug report elements.Empirical Software Engineering25, 6 (2020), 5255–5294

  58. [58]

    Harald Steck, Chaitanya Ekanadham, and Nathan Kallus. 2024. Is cosine-similarity of embeddings really about similarity?. InCompanion Proceedings of the ACM Web Conference. 887–890

  59. [59]

    Wei Tao, Yucheng Zhou, Yanlin Wang, Wenqiang Zhang, Hongyu Zhang, and Yu Cheng. 2024. Magis: Llm-based multi-agent framework for github issue resolution.Advances in Neural Information Processing Systems37 (2024), 51963–51993

  60. [60]

    Stephen W Thomas, Meiyappan Nagappan, Dorothea Blostein, and Ahmed E Hassan. 2013. The impact of classifier configuration and classifier combination on bug localization.IEEE Transactions on Software Engineering39, 10 (2013), 1427–1443

  61. [61]

    Yao Tian, Ziyang Yue, Ruiyuan Zhang, Xi Zhao, Bolong Zheng, and Xiaofang Zhou. 2023. Approximate Nearest Neighbor Search in High Dimensional Vector Databases: Current Research and Future Directions.IEEE Data Eng. Bull. 46, 3 (2023), 39–54

  62. [62]

    Bei Wang, Ling Xu, Meng Yan, Chao Liu, and Ling Liu. 2020. Multi-dimension convolutional neural network for bug localization.IEEE Transactions on Services Computing15, 3 (2020), 1649–1663

  63. [63]

    Shaowei Wang and David Lo. 2014. Version history, similar report, and structure: Putting them together for improved bug localization. InProceedings of the 22nd International Conference on Program Comprehension. 53–63

  64. [64]

    Ming Wen, Rongxin Wu, and Shing-Chi Cheung. 2016. Locus: Locating bugs from software changes. InProceedings of the 31st International Conference on Automated Software Engineering. 262–273

  65. [65]

    Ratnadira Widyasari, Jia Wei Ang, Truong Giang Nguyen, Neil Sharma, and David Lo. 2024. Demystifying faulty code: Step-by-step reasoning for explainable fault localization. (2024), 568–579

  66. [66]

    Ratnadira Widyasari, Stefanus Agus Haryono, Ferdian Thung, Jieke Shi, Constance Tan, Fiona Wee, Jack Phan, and David Lo. 2022. On the influence of biases in bug localization: Evaluation and benchmark. InProceedings of the International Conference on Software Analysis, Evolution and Reengineering. IEEE, 128–139

  67. [67]

    Chu-Pan Wong, Yingfei Xiong, Hongyu Zhang, Dan Hao, Lu Zhang, and Hong Mei. 2014. Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. InProceedings of the International Conference on Software Maintenance and Evolution. IEEE, 181–190

  68. [68]

    Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa

    W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa. 2016. A Survey on Software Fault Localization. IEEE Transactions on Software Engineering42, 8 (2016), 707–740

  69. [69]

    2023.Handbook of software fault localization: foundations and advances

    W Eric Wong and TH Tse. 2023.Handbook of software fault localization: foundations and advances. John Wiley & Sons

  70. [70]

    Yonghao Wu, Zheng Li, Jie M Zhang, Mike Papadakis, Mark Harman, and Yong Liu. 2023. Large language models in fault localisation.arXiv preprint arXiv:2308.15276(2023)

  71. [71]

    Zihao Wu. 2025. Autono: A ReAct-Based Highly Robust Autonomous Agent Framework.arXiv preprint arXiv:2504.04650 (2025)

  72. [72]

    Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. 2025. Demystifying LLM-Based Software Engineering Agents.Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering (2025), 801–824

  73. [73]

    Yan Xiao, Jacky Keung, Kwabena E Bennin, and Qing Mi. 2019. Improving bug localization with word embedding and enhanced convolutional neural networks.Information and Software Technology105 (2019), 17–29. , Vol. 1, No. 1, Article . Publication date: October 2018. 22 Trovato et al

  74. [74]

    Chuyang Xu, Zhongxin Liu, Xiaoxue Ren, Gehao Zhang, Ming Liang, and David Lo. 2025. FlexFL: Flexible and Effective Fault Localization with Open-Source Large Language Models.IEEE Transactions on Software Engineering(2025)

  75. [75]

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. React: Synergizing reasoning and acting in language models. InInternational Conference on Learning Representations

  76. [76]

    Xin Ye, Razvan Bunescu, and Chang Liu. 2014. Learning to rank relevant files for bug reports using domain knowledge. InProceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 689–699

  77. [77]

    Xin Ye, Razvan Bunescu, and Chang Liu. 2015. Mapping bug reports to relevant files: A ranking model, a fine-grained benchmark, and feature evaluation.IEEE Transactions on Software Engineering42, 4 (2015), 379–402

  78. [78]

    Xin Ye, Hui Shen, Xiao Ma, Razvan Bunescu, and Chang Liu. 2016. From word embeddings to document similarities for improved information retrieval in software engineering. InProceedings of the 38th International Conference on Software Engineering. 404–415

  79. [79]

    Klaus Changsun Youm, June Ahn, and Eunseok Lee. 2017. Improved bug localization based on code change histories and bug reports.Information and Software Technology82 (2017), 177–192

  80. [80]

    Abubakar Zakari, Sai Peck Lee, Rui Abreu, Babiker Hussien Ahmed, and Rasheed Abubakar Rasheed. 2020. Multiple fault localization of software programs: A systematic literature review.Information and Software Technology124 (2020), 106312

Showing first 80 references.