Loc2Repair: A Framework for Evaluating the Impact of File-Level Issue Localization in Repo-Level LLM Repair

Mohammad Nour Al Awad; Sergey Ivanov

arxiv: 2606.30963 · v1 · pith:DOJN7YHFnew · submitted 2026-06-29 · 💻 cs.SE · cs.AI

Pith reviewed 2026-07-01 01:02 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords file-level localization · repo-level repair · LLM repair · evaluation framework · SWE-bench Verified · automated program repair · modular pipeline · issue localization

The pith

Explicit file-level localization raises the pooled resolved rate in repository-level LLM repair from 44.7% to 48.9% and 49.1% with predicted file sets, and to 52.4% with gold file sets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Loc2Repair, a modular evaluation framework that decouples file-level issue localization from the repair process in repository-grounded automated repair. It uses this to test the impact of localization by comparing baseline repair without explicit localization to repair guided by predicted or gold file locations across three repair backbones on SWE-bench Verified. The results show consistent improvements in resolved rates and reductions in mean elapsed time when localization is provided. This allows researchers to analyze distinct failure modes in end-to-end repair pipelines under controlled conditions.

Core claim

Loc2Repair decouples localization and repair under a shared runtime, artifact schema, and evaluation harness, allowing researchers to combine different localization models and repair backbones under matched conditions. Using three repair backbones on SWE-bench Verified, we compare baseline repair without explicit localization, repair guided by predicted localization from two localizers, and repair guided by gold modified-file sets. Explicit localization consistently improves resolved rate across all backbones: pooled performance increases from 44.7% for baseline repair to 48.9% and 49.1% with predicted localization, and to 52.4% with gold localization. Localization also reduces mean elapsed time overall.

What carries the argument

The Loc2Repair framework that isolates file-level issue localization as an upstream variable by decoupling it from repair under shared conditions.
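
One way to picture this decoupling is as two narrow interfaces joined by a shared artifact. The sketch below is illustrative only: `Issue`, `LocalizationArtifact`, `Localizer`, `RepairBackbone`, `run_grid`, and the toy backbone are hypothetical names chosen for this example, not the paper's code or its artifact schema.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Issue:
    """One benchmark instance (hypothetical fields)."""
    instance_id: str
    problem_statement: str
    gold_files: tuple[str, ...] = ()


@dataclass(frozen=True)
class LocalizationArtifact:
    """Shared record handed from the localization stage to the repair stage."""
    instance_id: str
    source: str                # e.g. "baseline", "predicted", "gold"
    files: tuple[str, ...]     # empty tuple means no explicit localization


class Localizer(Protocol):
    def __call__(self, issue: Issue) -> LocalizationArtifact: ...


class RepairBackbone(Protocol):
    def __call__(self, issue: Issue, loc: LocalizationArtifact) -> bool: ...


def no_localization(issue: Issue) -> LocalizationArtifact:
    return LocalizationArtifact(issue.instance_id, "baseline", ())


def gold_localization(issue: Issue) -> LocalizationArtifact:
    return LocalizationArtifact(issue.instance_id, "gold", issue.gold_files)


def toy_backbone(issue: Issue, loc: LocalizationArtifact) -> bool:
    # Toy stand-in for an LLM repair agent: it "resolves" an issue when it is
    # pointed at a gold file, or when the issue is marked easy.
    hit = bool(set(loc.files) & set(issue.gold_files))
    return hit or issue.instance_id.startswith("easy")


def run_grid(issues, localizers, backbones):
    """Resolved rate for every (backbone, localization condition) pair."""
    rates = {}
    for b_name, repair in backbones.items():
        for l_name, localize in localizers.items():
            resolved = [repair(issue, localize(issue)) for issue in issues]
            rates[(b_name, l_name)] = sum(resolved) / len(resolved)
    return rates


issues = [
    Issue("easy-1", "toy issue text", ("a.py",)),
    Issue("hard-1", "toy issue text", ("b.py",)),
]
rates = run_grid(
    issues,
    localizers={"baseline": no_localization, "gold": gold_localization},
    backbones={"toy": toy_backbone},
)
print(rates)
```

Because every backbone receives the same artifact type, swapping a localizer changes only one dictionary entry, which is the property the matched-conditions comparison depends on.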

If this is right

Resolved rates increase with both predicted and gold localization across all tested backbones.
Gold localization achieves the highest pooled resolved rate of 52.4%.
Mean elapsed time decreases with localization guidance in paired analysis.
Token effects remain heterogeneous across models despite overall latency improvements.
Gold-guided failures reveal remaining headroom beyond localization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future work could explore combining multiple localizers to approach gold performance more closely.
The modular design makes it straightforward to swap in new localization methods for testing.
Similar decoupling might reveal localization benefits in other software engineering tasks involving LLMs.
The time savings could make repair systems more practical for large repositories if the pattern holds.

Load-bearing premise

The three repair backbones and SWE-bench Verified dataset are representative enough for the observed localization benefit to apply more broadly.

What would settle it

Observing no improvement or a drop in resolved rates when adding explicit localization in experiments with new backbones or datasets would falsify the claim that localization is a consistent repair lever.

Figures

Figures reproduced from arXiv: 2606.30963 by Mohammad Nour Al Awad, Sergey Ivanov.

**Figure 1.** Resolved rate versus average elapsed time by repair backbone. Upper-left is better; arrows indicate the effectiveness–latency shift under localization. Pooled paired tests show the same direction: baseline → Pred-Qwen4B yields +4.3 points (95% CI [+1.9, +6.7], p = 0.0006365); baseline → Pred-Gemma4E4B yields +4.5 points (95% CI [+2.1, +6.8], p = 0.0002982); baseline → gold yields +7.7 points (95% CI …
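
The caption reports paired differences with confidence intervals and p-values but this page does not say which test the authors used. As a hedged illustration of what a paired analysis on per-instance resolved/unresolved outcomes can look like, the sketch below uses a percentile bootstrap for the interval and an exact McNemar test for the p-value. Both are assumptions made for this example, and the toy outcome vectors are invented, not taken from the paper.

```python
import random
from math import comb


def paired_rate_difference(base, treat):
    """Difference in resolved rate (treat - base), in percentage points."""
    assert len(base) == len(treat)
    return 100.0 * (sum(treat) - sum(base)) / len(base)


def bootstrap_ci(base, treat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling instances so pairs stay together."""
    rng = random.Random(seed)
    n = len(base)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(100.0 * sum(treat[i] - base[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def mcnemar_exact_p(base, treat):
    """Two-sided exact McNemar p-value computed from the discordant pairs."""
    gained = sum(1 for b, t in zip(base, treat) if t and not b)
    lost = sum(1 for b, t in zip(base, treat) if b and not t)
    n = gained + lost
    if n == 0:
        return 1.0
    k = min(gained, lost)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented outcomes for 10 instances: 1 = resolved, 0 = unresolved.
base = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
treat = [1, 1, 1, 0, 1, 1, 1, 1, 0, 0]

diff = paired_rate_difference(base, treat)
ci_lo, ci_hi = bootstrap_ci(base, treat)
p_value = mcnemar_exact_p(base, treat)
print(diff, (ci_lo, ci_hi), p_value)
```

Only instances whose outcome flips between conditions affect the McNemar p-value, which is why a paired design can detect a shift of a few points that an unpaired comparison of two rates might miss.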

Original abstract

Repository-grounded automated repair is often reported as a single end-to-end capability, which hides distinct failure modes such as poor file targeting, incorrect patch synthesis, and failed iterative debugging. We present Loc2Repair, a modular evaluation framework for controlled analysis of repository-grounded repair pipelines, and use it to isolate file-level issue localization as an upstream variable. Loc2Repair decouples localization and repair under a shared runtime, artifact schema, and evaluation harness, allowing researchers to combine different localization models and repair backbones under matched conditions. Using three repair backbones on SWE-bench Verified, we compare baseline repair without explicit localization, repair guided by predicted localization from two localizers, and repair guided by gold modified-file sets. Explicit localization consistently improves resolved rate across all backbones: pooled performance increases from 44.7% for baseline repair to 48.9% and 49.1% with predicted localization, and to 52.4% with gold localization. Localization also reduces mean elapsed time overall: in pooled paired analysis, mean elapsed time decreases by 100.94 s and 52.25 s for the two predicted-localization settings, and by 154.45 s with gold guidance, although token effects remain heterogeneous across models. Overall, Loc2Repair shows file-level localization is a consistent repair lever, improving effectiveness and mean latency in pooled analysis, while gold-guided failures expose headroom beyond localization.
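
The pooled figures in the abstract can be turned into point gains with simple arithmetic. The sketch below uses only the four rates quoted above; the condition names are placeholders, and it does not assume which predicted rate belongs to which localizer.

```python
# Pooled resolved rates (%) as quoted in the abstract.
rates = {
    "baseline": 44.7,
    "predicted_a": 48.9,   # placeholder label for one predicted-localization setting
    "predicted_b": 49.1,   # placeholder label for the other
    "gold": 52.4,
}

# Gain over baseline in percentage points, rounded to one decimal.
gains = {
    name: round(rate - rates["baseline"], 1)
    for name, rate in rates.items()
    if name != "baseline"
}

# Share of the gold-localization gain that each predicted setting recovers.
share_of_gold = {
    name: round(100 * gains[name] / gains["gold"], 1)
    for name in ("predicted_a", "predicted_b")
}

# Issues still unresolved even with gold file sets, in percentage points.
unresolved_with_gold = round(100.0 - rates["gold"], 1)

print(gains, share_of_gold, unresolved_with_gold)
```

The differences of the rounded rates are 4.2, 4.4, and 7.7 points. Figure 1 reports paired estimates of +4.3, +4.5, and +7.7; a 0.1-point gap on the first two is consistent with rounding of the published rates. On these numbers the predicted settings recover a little over half of the gain available from gold file sets.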

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Loc2Repair cleanly decouples localization from repair and shows a modest but consistent lift in resolved rate plus lower time on SWE-bench across three backbones.

The main thing to know is that this paper gives researchers a modular harness to test file-level localization as an isolated variable in repo-level LLM repair. They run the same three repair backbones with no explicit localization, with two predicted localizers, and with gold modified-file sets, all under matched conditions on SWE-bench Verified. Pooled resolved rate moves from 44.7% baseline to 48.9-49.1% with predicted localization and 52.4% with gold, and mean elapsed time drops in the paired comparisons.

The framework itself is the clearest new piece. By fixing the runtime, artifact schema, and evaluation harness, the only changing input is the localization signal. That setup makes the comparisons direct and avoids the usual confounding in end-to-end systems. The results are reported per backbone as well as pooled, which helps show the pattern is not driven by one model.

The gains are real on the data shown but remain modest, and the paper notes that even gold localization leaves headroom in the repair step. Token effects are heterogeneous, so the time benefit is not uniform. Generalization rests on the three backbones and this benchmark; that is a standard boundary rather than a hidden flaw in the logic.

The work is aimed at people who build or evaluate automated repair pipelines and want to measure component contributions instead of treating the whole system as a black box. The experimental design is straightforward and the numbers are reported without obvious fitting or circularity. It deserves a serious referee.

Referee Report

0 major / 1 minor

Summary. The paper introduces Loc2Repair, a modular framework that decouples file-level issue localization from repair in repository-grounded LLM repair pipelines under shared runtime, schema, and harness. On SWE-bench Verified with three repair backbones, it compares baseline repair (no explicit localization) against repair guided by two predicted localizers and by gold modified-file sets. Pooled results show resolved-rate gains from 44.7% (baseline) to 48.9%/49.1% (predicted) and 52.4% (gold), with corresponding mean elapsed-time reductions of 100.94 s, 52.25 s, and 154.45 s; gold guidance is used to expose remaining headroom.

Significance. If the empirical comparisons hold, the work demonstrates that explicit file-level localization is a consistent, actionable lever for both effectiveness and mean latency in repo-level repair. The modular decoupling under matched conditions supplies a reusable experimental scaffold for the community; the multi-backbone design and gold-localization upper bound provide concrete, falsifiable evidence rather than end-to-end black-box claims.

minor comments (1)

[Evaluation] The abstract states that token effects remain heterogeneous across models; a brief per-backbone breakdown of token counts (or a supplementary table) would clarify whether the latency gains are driven primarily by fewer repair iterations or by localization overhead.
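
A per-backbone breakdown of the kind requested here could be computed from paired run records. The sketch below is a minimal illustration under stated assumptions: the record fields (`backbone`, `instance_id`, `condition`, `tokens`, `elapsed_s`, `iterations`) are hypothetical, and the numbers are invented to show the shape of the output, not drawn from the paper.

```python
from collections import defaultdict
from statistics import mean

METRICS = ("tokens", "elapsed_s", "iterations")


def per_backbone_deltas(runs, baseline="baseline", condition="gold"):
    """Mean paired change (condition - baseline) per backbone for each metric."""
    by_key = {(r["backbone"], r["instance_id"], r["condition"]): r for r in runs}
    deltas = defaultdict(lambda: defaultdict(list))
    for (backbone, instance_id, cond), run in by_key.items():
        if cond != condition:
            continue
        base = by_key.get((backbone, instance_id, baseline))
        if base is None:
            continue  # keep only instances run under both conditions
        for metric in METRICS:
            deltas[backbone][metric].append(run[metric] - base[metric])
    return {
        backbone: {metric: mean(values) for metric, values in metrics.items()}
        for backbone, metrics in deltas.items()
    }


def record(backbone, instance_id, condition, tokens, elapsed_s, iterations):
    return {
        "backbone": backbone, "instance_id": instance_id, "condition": condition,
        "tokens": tokens, "elapsed_s": elapsed_s, "iterations": iterations,
    }


# Invented records: backbone A uses fewer tokens with localization, B uses more.
runs = [
    record("A", "i1", "baseline", 1000, 100.0, 10),
    record("A", "i1", "gold", 800, 80.0, 8),
    record("A", "i2", "baseline", 2000, 200.0, 20),
    record("A", "i2", "gold", 1600, 150.0, 14),
    record("B", "i1", "baseline", 1000, 100.0, 10),
    record("B", "i1", "gold", 1200, 90.0, 9),
]

summary = per_backbone_deltas(runs)
print(summary)
```

In this toy data both backbones get faster, but token use falls for A and rises for B. Reporting iterations next to tokens and elapsed time is what would let a reader tell whether a latency gain comes from fewer repair steps or is offset by extra prompt content.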

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The review accurately captures the core contribution of Loc2Repair as a modular framework for isolating the effects of file-level localization under controlled conditions.

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents an empirical evaluation framework (Loc2Repair) that decouples localization from repair and measures outcomes via controlled experiments on the external SWE-bench Verified benchmark across three independent repair backbones. All reported improvements (resolved rates from 44.7% baseline to 48.9/49.1% predicted and 52.4% gold; latency reductions) are direct measured results from these runs under matched runtime and harness conditions. No equations, fitted parameters, self-citations, or ansatzes appear in the derivation chain; the claims do not reduce to inputs by construction and remain self-contained against the external benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of the SWE-bench Verified benchmark and the assumption that the modular framework does not introduce confounding artifacts; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: SWE-bench Verified is an appropriate benchmark for evaluating repository-level repair performance. The experiments are conducted on this dataset to measure resolved rates and elapsed time.

