Loc2Repair: A Framework for Evaluating the Impact of File-Level Issue Localization in Repo-Level LLM Repair
Pith reviewed 2026-07-01 01:02 UTC · model grok-4.3
The pith
Explicit file-level localization improves resolved rates in repository-level LLM repair from 44.7% to 52.4%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Loc2Repair decouples localization and repair under a shared runtime, artifact schema, and evaluation harness, allowing researchers to combine different localization models and repair backbones under matched conditions. Using three repair backbones on SWE-bench Verified, we compare baseline repair without explicit localization, repair guided by predicted localization from two localizers, and repair guided by gold modified-file sets. Explicit localization consistently improves resolved rate across all backbones: pooled performance increases from 44.7% for baseline repair to 48.9% and 49.1% with predicted localization, and to 52.4% with gold localization. Localization also reduces mean elapsed time overall.
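The pooled gains are easier to compare as percentage-point differences. The sketch below recomputes them from the reported pooled rates. The "gap closed" figure (the share of the baseline-to-gold gap recovered by each predicted localizer) is a derived illustration, not a metric the paper reports, and because the inputs are rounded to one decimal the outputs are approximate. The labels `predicted_A` and `predicted_B` are placeholders for the two unnamed localizers.

```python
# Sketch only: recomputes simple deltas from the pooled rates (%) in the abstract.
# "predicted_A"/"predicted_B" are placeholder labels, not the localizers' names.
POOLED_RATES = {
    "baseline": 44.7,
    "predicted_A": 48.9,
    "predicted_B": 49.1,
    "gold": 52.4,
}


def gain_pp(rate: float, baseline: float) -> float:
    """Absolute gain over the baseline, in percentage points."""
    return round(rate - baseline, 1)


def relative_gain(rate: float, baseline: float) -> float:
    """Gain expressed as a fraction of the baseline rate."""
    return (rate - baseline) / baseline


def headroom_captured(rate: float, baseline: float, gold: float) -> float:
    """Illustrative: share of the baseline-to-gold gap closed by this setting."""
    return (rate - baseline) / (gold - baseline)


if __name__ == "__main__":
    base, gold = POOLED_RATES["baseline"], POOLED_RATES["gold"]
    for name, rate in POOLED_RATES.items():
        print(f"{name:12s} +{gain_pp(rate, base):4.1f} pp  "
              f"rel {relative_gain(rate, base):6.1%}  "
              f"gap closed {headroom_captured(rate, base, gold):6.1%}")
```

With these inputs the two predicted localizers add 4.2 and 4.4 points and close roughly 54.5% and 57.1% of the 7.7-point gap to gold.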
What carries the argument
The Loc2Repair framework that isolates file-level issue localization as an upstream variable by decoupling it from repair under shared conditions.
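To make the decoupling concrete, here is a minimal sketch of what a shared artifact schema and a localizer interface might look like. All names (`LocalizationArtifact`, `Localizer`, `KeywordLocalizer`, `build_repair_input`) are hypothetical, and the keyword matcher is a toy stand-in rather than either of the paper's localizers. The idea it illustrates is that every repair backbone receives the same input for a given condition, and only the file-hint block changes between the baseline, predicted, and gold settings.

```python
# Sketch only: hypothetical schema and interfaces, not Loc2Repair's actual API.
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


@dataclass(frozen=True)
class LocalizationArtifact:
    """Hypothetical shared schema: candidate files for one issue instance."""
    instance_id: str
    files: tuple[str, ...]
    source: str  # "gold" or the name of the predicted localizer


class Localizer(Protocol):
    """Any upstream localizer; the repair stage only ever sees its artifact."""
    name: str

    def localize(self, instance_id: str, issue_text: str,
                 repo_files: Sequence[str]) -> LocalizationArtifact: ...


@dataclass
class KeywordLocalizer:
    """Toy localizer: keeps files whose name (minus .py) occurs in the issue text."""
    name: str = "keyword"
    top_k: int = 3

    def localize(self, instance_id: str, issue_text: str,
                 repo_files: Sequence[str]) -> LocalizationArtifact:
        text = issue_text.lower()
        hits = [path for path in repo_files
                if path.rsplit("/", 1)[-1].removesuffix(".py").lower() in text]
        return LocalizationArtifact(instance_id, tuple(hits[: self.top_k]), self.name)


def build_repair_input(issue_text: str,
                       artifact: Optional[LocalizationArtifact]) -> str:
    """Build the repair input; only the file-hint block differs across conditions."""
    if artifact is None or not artifact.files:
        # Baseline (or an empty prediction): no explicit localization is given.
        return issue_text
    hint = "\n".join(f"- {path}" for path in artifact.files)
    return f"{issue_text}\n\nLikely relevant files ({artifact.source}):\n{hint}"


if __name__ == "__main__":
    repo = ["src/pkg/parser.py", "src/pkg/utils.py", "tests/test_parser.py"]
    issue = "Parser crashes on empty input"
    artifact = KeywordLocalizer().localize("demo-1", issue, repo)
    print(build_repair_input(issue, artifact))
```

A gold condition would simply construct `LocalizationArtifact(instance_id, gold_files, "gold")` directly, so the repair side cannot tell predicted and gold hints apart except by their content.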
If this is right
- Resolved rates increase with both predicted and gold localization across all tested backbones.
- Gold localization achieves the highest pooled resolved rate of 52.4%.
- Mean elapsed time decreases with localization guidance in paired analysis.
- Token effects remain heterogeneous across models despite overall latency improvements.
- Gold-guided failures reveal remaining headroom beyond localization.
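The "paired analysis" mentioned in the list above compares each run's elapsed time under guidance with the same run's baseline time, rather than comparing two unrelated averages. A minimal sketch, assuming per-run timings are keyed by (backbone, instance_id) so that pooling across backbones means pooling those pairs; the function name is hypothetical and the paper's exact procedure may differ.

```python
# Sketch only: paired mean difference in elapsed time; keys assumed to be
# (backbone, instance_id) tuples. Function name is hypothetical.
from collections.abc import Hashable, Mapping
from statistics import mean


def paired_mean_delta(baseline_s: Mapping[Hashable, float],
                      guided_s: Mapping[Hashable, float]) -> float:
    """Mean per-run change in elapsed seconds (guided minus baseline).

    Only keys present in both conditions are compared, so every difference
    is taken within the same (backbone, instance) pair. A negative result
    means the guided condition was faster on average.
    """
    shared = baseline_s.keys() & guided_s.keys()
    if not shared:
        raise ValueError("no runs shared between the two conditions")
    return mean(guided_s[k] - baseline_s[k] for k in shared)
```

Restricting to shared keys matters: a run that crashed or was skipped in one condition would otherwise bias the comparison.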
Where Pith is reading between the lines
- Future work could explore combining multiple localizers to approach gold performance more closely.
- The modular design makes it straightforward to swap in new localization methods for testing.
- Similar decoupling might reveal localization benefits in other software engineering tasks involving LLMs.
- The time savings could make repair systems more practical for large repositories if the pattern holds.
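The first bullet above suggests combining localizers. A naive way to do that is to take the union of their predicted file sets, which can only raise file-level recall against the gold set but usually lowers precision, meaning longer and noisier hint lists for the repair backbone. The sketch below illustrates that trade-off; the function names are hypothetical, and file-set recall and precision are standard metrics rather than ones this summary attributes to the paper.

```python
# Sketch only: hypothetical helpers for comparing predicted file sets to gold.
def file_recall(predicted: set[str], gold: set[str]) -> float:
    """Share of gold-modified files that the prediction contains."""
    if not gold:
        raise ValueError("gold file set must be non-empty")
    return len(predicted & gold) / len(gold)


def file_precision(predicted: set[str], gold: set[str]) -> float:
    """Share of predicted files that the gold fix actually modified."""
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


def union_localizers(*predictions: set[str]) -> set[str]:
    """Naive ensemble: every file proposed by any localizer."""
    return set().union(*predictions)


if __name__ == "__main__":
    gold = {"pkg/a.py", "pkg/b.py"}
    loc1, loc2 = {"pkg/a.py", "pkg/c.py"}, {"pkg/b.py"}
    merged = union_localizers(loc1, loc2)
    for name, pred in [("loc1", loc1), ("loc2", loc2), ("union", merged)]:
        print(name, file_recall(pred, gold), round(file_precision(pred, gold), 2))
```

In this toy case each localizer alone reaches recall 0.5, while the union reaches full recall and its precision falls to two thirds.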
Load-bearing premise
The three repair backbones and SWE-bench Verified dataset are representative enough for the observed localization benefit to apply more broadly.
What would settle it
Observing no improvement or a drop in resolved rates when adding explicit localization in experiments with new backbones or datasets would falsify the claim that localization is a consistent repair lever.
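One way to operationalize this falsification criterion on paired resolved/unresolved outcomes is an exact McNemar test. The summary does not say which significance test, if any, the paper uses, so this is a swapped-in standard technique, and the function name is hypothetical. Only discordant runs (resolved in one condition but not the other) enter the test.

```python
# Sketch only: exact McNemar test on paired binary outcomes (a standard choice,
# not necessarily the paper's). Keys are assumed to identify the same run.
from collections.abc import Hashable, Mapping
from math import comb


def mcnemar_exact(baseline: Mapping[Hashable, bool],
                  guided: Mapping[Hashable, bool]) -> tuple[int, int, float]:
    """Exact two-sided McNemar test on paired resolved/unresolved outcomes.

    Returns (b, c, p) where b counts runs resolved only with guidance and
    c counts runs resolved only at baseline. Runs where both conditions
    agree carry no information about the difference and are ignored.
    """
    shared = baseline.keys() & guided.keys()
    b = sum(1 for k in shared if guided[k] and not baseline[k])
    c = sum(1 for k in shared if baseline[k] and not guided[k])
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under H0 the discordant runs split 50/50: double the smaller binomial tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

Under this framing, a new backbone or dataset with c at least as large as b, or with b > c but a large p-value, would count against localization being a consistent repair lever.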
Original abstract
Repository-grounded automated repair is often reported as a single end-to-end capability, which hides distinct failure modes such as poor file targeting, incorrect patch synthesis, and failed iterative debugging. We present Loc2Repair, a modular evaluation framework for controlled analysis of repository-grounded repair pipelines, and use it to isolate file-level issue localization as an upstream variable. Loc2Repair decouples localization and repair under a shared runtime, artifact schema, and evaluation harness, allowing researchers to combine different localization models and repair backbones under matched conditions. Using three repair backbones on SWE-bench Verified, we compare baseline repair without explicit localization, repair guided by predicted localization from two localizers, and repair guided by gold modified-file sets. Explicit localization consistently improves resolved rate across all backbones: pooled performance increases from 44.7% for baseline repair to 48.9% and 49.1% with predicted localization, and to 52.4% with gold localization. Localization also reduces mean elapsed time overall: in pooled paired analysis, mean elapsed time decreases by 100.94 s and 52.25 s for the two predicted-localization settings, and by 154.45 s with gold guidance, although token effects remain heterogeneous across models. Overall, Loc2Repair shows file-level localization is a consistent repair lever, improving effectiveness and mean latency in pooled analysis, while gold-guided failures expose headroom beyond localization.
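The gold condition guides repair with the set of files modified by the reference fix. A minimal sketch of how such a set could be derived, assuming the reference fix is available as a git-style unified diff (SWE-bench instances carry one in their `patch` field). The function name is hypothetical, and this is not necessarily how Loc2Repair builds its gold sets.

```python
# Sketch only: hypothetical helper; assumes git-style diffs with unquoted paths.
import re

# Git file header, e.g. "diff --git a/pkg/mod.py b/pkg/mod.py".
# Paths containing spaces (which git quotes) are not handled in this sketch.
_DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def gold_modified_files(patch: str) -> list[str]:
    """Post-image paths touched by a unified git diff, in first-seen order."""
    return list(dict.fromkeys(new for _old, new in _DIFF_HEADER.findall(patch)))


if __name__ == "__main__":
    sample = (
        "diff --git a/src/pkg/parser.py b/src/pkg/parser.py\n"
        "--- a/src/pkg/parser.py\n"
        "+++ b/src/pkg/parser.py\n"
        "@@ -1 +1 @@\n"
        "-x = 1\n"
        "+x = 2\n"
        "diff --git a/src/pkg/utils.py b/src/pkg/utils.py\n"
        "--- a/src/pkg/utils.py\n"
        "+++ b/src/pkg/utils.py\n"
        "@@ -1 +1 @@\n"
        "-y = 1\n"
        "+y = 2\n"
    )
    print(gold_modified_files(sample))
```

Reading the `diff --git` header rather than the `+++` line keeps newly added and deleted files in the set, since their `---`/`+++` lines point at `/dev/null`.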
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Loc2Repair, a modular framework that decouples file-level issue localization from repair in repository-grounded LLM repair pipelines under shared runtime, schema, and harness. On SWE-bench Verified with three repair backbones, it compares baseline repair (no explicit localization) against repair guided by two predicted localizers and by gold modified-file sets. Pooled results show resolved-rate gains from 44.7% (baseline) to 48.9%/49.1% (predicted) and 52.4% (gold), with corresponding mean elapsed-time reductions of 100.94 s, 52.25 s, and 154.45 s; gold guidance is used to expose remaining headroom.
Significance. If the empirical comparisons hold, the work demonstrates that explicit file-level localization is a consistent, actionable lever for both effectiveness and mean latency in repo-level repair. The modular decoupling under matched conditions supplies a reusable experimental scaffold for the community; the multi-backbone design and gold-localization upper bound provide concrete, falsifiable evidence rather than end-to-end black-box claims.
minor comments (1)
- [Evaluation] The abstract states that token effects remain heterogeneous across models; a brief per-backbone breakdown of token counts (or a supplementary table) would clarify whether the latency gains come primarily from fewer repair iterations and how much of them is offset by localization overhead.
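The requested breakdown could be produced from per-run logs by grouping on (backbone, condition) and reporting repair tokens, localization tokens, and iteration counts separately, so that savings in the repair stage can be weighed against localizer cost. A minimal sketch; the record keys (`backbone`, `condition`, `repair_tokens`, `localization_tokens`, `iterations`) are hypothetical and not taken from the paper's artifact schema.

```python
# Sketch only: hypothetical record keys; localization_tokens defaults to 0
# (e.g. for baseline runs, or gold runs that need no localizer call).
from collections import defaultdict
from statistics import mean


def per_backbone_breakdown(records):
    """Average repair tokens, localization tokens, and iterations per (backbone, condition)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["backbone"], rec["condition"])].append(rec)
    table = {}
    for key in sorted(groups):
        rows = groups[key]
        table[key] = {
            "repair_tokens": mean(r["repair_tokens"] for r in rows),
            "localization_tokens": mean(r.get("localization_tokens", 0) for r in rows),
            "iterations": mean(r["iterations"] for r in rows),
        }
    return table
```

Keeping the two token sources in separate columns is the design choice that matters here: a single total would hide whether fewer iterations or cheaper prompts drive the difference.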
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The review accurately captures the core contribution of Loc2Repair as a modular framework for isolating the effects of file-level localization under controlled conditions.
Circularity Check
No significant circularity detected
full rationale
The paper presents an empirical evaluation framework (Loc2Repair) that decouples localization from repair and measures outcomes via controlled experiments on the external SWE-bench Verified benchmark across three independent repair backbones. All reported improvements (resolved rates from 44.7% baseline to 48.9/49.1% predicted and 52.4% gold; latency reductions) are direct measured results from these runs under matched runtime and harness conditions. No equations, fitted parameters, self-citations, or ansatzes appear in the derivation chain; the claims do not reduce to inputs by construction and remain self-contained against the external benchmark.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: SWE-bench Verified is an appropriate benchmark for evaluating repository-level repair performance.