Pith · machine review for the scientific record

arxiv: 2604.08089 · v1 · submitted 2026-04-09 · 💻 cs.SE

Recognition: 2 theorem links


GALA: Multimodal Graph Alignment for Bug Localization in Automated Program Repair


Pith reviewed 2026-05-10 17:55 UTC · model grok-4.3

classification 💻 cs.SE
keywords multimodal automated program repair · bug localization · graph alignment · UI graph · call graph · SWE-bench Multimodal · LLM-based repair · visual-to-code mapping

The pith

Graph alignment between UI screenshots and code structures enables precise bug localization in multimodal automated program repair.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that converting GUI screenshots to plain text for LLM-based repair discards spatial relationships and reduces localization to loose keyword matching. It proposes instead to build an Image UI Graph capturing visual elements and relations, then align it first at the file level with repository structures and next at the function level with code dependency graphs such as call graphs. This hierarchical process grounds visual bug reports directly to specific code components before patch generation. If correct, the method would produce more accurate fixes for real apps where bugs arrive as screenshots rather than text descriptions. The central demonstration is state-of-the-art results on a multimodal benchmark that tests this exact setting.

Core claim

GALA builds an Image UI Graph from the screenshot to represent elements and their structural relationships, performs file-level alignment by cross-referencing the graph against repository file references, conducts function-level alignment by reasoning over code call graphs and dependencies to map visual elements to precise code locations, and finally generates patches inside the resulting grounded context. The framework enforces both semantic and relational consistency across the image and code modalities.
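The staged mapping lends itself to a compact sketch. Everything below (the adjacency-dict graphs, the label-overlap file score, the relational bonus at function level) is an illustrative assumption about how such an alignment could work, not GALA's actual implementation.

```python
# Toy sketch of GALA-style hierarchical alignment (illustrative, not the paper's code).
# Graphs are plain adjacency dicts; labels stand in for recognized UI element text.

def file_level_align(ui_labels, file_refs):
    """Stage-2 analogue: rank files by overlap between UI labels and file references."""
    return max(file_refs, key=lambda f: len(ui_labels & file_refs[f]))

def function_level_align(ui_graph, functions, call_graph):
    """Stage-3 analogue: direct label match plus a relational-consistency bonus,
    rewarding functions whose call-graph neighbors render UI-adjacent elements."""
    def score(fn):
        label = functions[fn]  # UI label this function is assumed to render
        s = 1.0 if label in ui_graph else 0.0
        for neighbor in ui_graph.get(label, ()):
            # relational consistency: a neighboring element's handler
            # should be reachable from fn in the call graph
            s += sum(0.5 for callee in call_graph.get(fn, ())
                     if functions.get(callee) == neighbor)
        return s
    return max(functions, key=score)

# A "Submit" button shown next to a "Spinner" element in the screenshot.
ui_graph = {"Submit": {"Spinner"}, "Spinner": {"Submit"}}
file_refs = {"views.py": {"Submit", "Spinner"}, "utils.py": set()}
functions = {"on_submit": "Submit", "show_spinner": "Spinner"}
call_graph = {"on_submit": {"show_spinner"}}

assert file_level_align(set(ui_graph), file_refs) == "views.py"
assert function_level_align(ui_graph, functions, call_graph) == "on_submit"
```

On this toy instance the relational bonus is what separates `on_submit` from `show_spinner`; bare label overlap alone would leave them tied.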

What carries the argument

Hierarchical structural alignment that cross-references an Image UI Graph with repository-level file structures and code-level dependency graphs to create an explicit visual-to-code mapping.

If this is right

  • Localization moves from imprecise semantic guessing to explicit relational matching.
  • Patch generation occurs inside a code context that has been directly grounded to the reported visual bug.
  • Both file-level and function-level decisions benefit from the same cross-modal consistency checks.
  • The approach scales to any multimodal bug report that includes a GUI screenshot.
  • Performance gains appear specifically on benchmarks that supply visual observations alongside code.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-alignment pattern could be applied to other cross-modal software tasks such as UI test generation or visual debugging.
  • Explicit structural mappings may reduce the rate at which LLMs hallucinate unrelated code changes when given image evidence.
  • Testing the method on real-world user-submitted screenshots rather than benchmark images would reveal whether the alignment generalizes beyond curated data.
  • If graph construction proves costly, lighter approximations of the UI graph might still retain enough structure to improve over text-only baselines.

Load-bearing premise

The assumption that converting screenshots into graphs will preserve the spatial relationships needed to match visual elements reliably to the correct code components.

What would settle it

Randomizing or removing the relational edges inside the Image UI Graph and observing whether GALA's localization accuracy on the SWE-bench Multimodal benchmark falls to the level of simple text-based keyword matching.
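That ablation can be mocked up on a toy instance: with relational edges intact, edge consistency breaks ties between keyword-equivalent candidates; with the edges stripped, localization collapses to bare label matching. The scorer below is a hypothetical stand-in for GALA's, not its published procedure.

```python
# Hedged sketch of the proposed edge-ablation probe (toy scorer, not GALA's).

def localize(ui_graph, functions, call_graph):
    """Pick the function whose label matches and whose call-graph neighbors
    cover the UI element's neighbors (relational consistency)."""
    def score(fn):
        label = functions[fn]
        base = 1 if label in ui_graph else 0
        rel = sum(1 for n in ui_graph.get(label, ())
                  for callee in call_graph.get(fn, ())
                  if functions.get(callee) == n)
        return base + rel
    return max(functions, key=score)

# Two candidates match the "SaveBtn" keyword; only relational edges disambiguate.
ui = {"SaveBtn": {"Toast"}, "Toast": {"SaveBtn"}}
funcs = {"save_a": "SaveBtn", "save_b": "SaveBtn", "toast": "Toast"}
calls = {"save_a": {"toast"}, "save_b": set()}

assert localize(ui, funcs, calls) == "save_a"        # edges break the tie
stripped = {k: set() for k in ui}                    # ablation: remove all edges
assert localize(stripped, funcs, calls) in {"save_a", "save_b"}  # keyword-level tie
```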

Figures

Figures reproduced from arXiv: 2604.08089 by Shikun Zhang, Shu-Dong Huang, Wei Ye, Yang Liu, Zhengran Zeng, Zhuoyao Liu.

Figure 1. Comparison between previous works on Multi… [full figure: figures/full_fig_p001_1.png]

Figure 2. Architecture of GALA. The overall workflow consists of four key stages: (1) the Image Graph Construction module, … [full figure: figures/full_fig_p003_2.png]

Figure 3. Example of the image graph constructed by GALA. [full figure: figures/full_fig_p003_3.png]

Figure 4. Stage-wise simplified prompt for GALA structure-aware refinement via file graph alignment. Within the same model call, we further introduce structural constraints by constructing a candidate file graph over the retrieved candidate file list, where nodes correspond to candidate files and edges represent inter-file dependencies derived from static import relationships. This graph enables structure-aware rea…
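The candidate file graph that Figure 4 describes (nodes for candidate files, edges from static imports) can be approximated in a few lines. The `ast`-based extraction below is a minimal sketch under the assumption of Python sources passed as inline strings; it is not the authors' tooling, and a real pipeline would read files from the retrieved candidate list.

```python
# Minimal sketch of a candidate file graph derived from static imports.
import ast

def import_graph(files):
    """files: {module_name: source}; returns {module: set(imported candidate modules)}."""
    graph = {}
    for name, src in files.items():
        deps = set()
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        # keep only edges that stay inside the candidate set
        graph[name] = deps & files.keys()
    return graph

files = {
    "views": "import models\nfrom widgets import Button\n",
    "models": "import os\n",
    "widgets": "import models\n",
}
g = import_graph(files)
assert g == {"views": {"models", "widgets"}, "models": set(), "widgets": {"models"}}
```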
read the original abstract

Large Language Model (LLM)-based Automated Program Repair (APR) has shown strong potential on textual benchmarks, yet struggles in multimodal scenarios where bugs are reported with GUI screenshots. Existing methods typically convert images into plain text, which discards critical spatial relationships and causes a severe disconnect between visual observations and code components, leading localization to degrade into imprecise keyword matching. To bridge this gap, we propose GALA (Graph Alignment for Localization in APR), a framework that shifts multimodal APR from implicit semantic guessing to explicit structural reasoning. GALA operates in four stages: it first constructs an Image UI Graph to capture visual elements and their structural relationships; then performs file-level alignment by cross-referencing this UI graph with repository-level structures (e.g., file references) to locate candidate files; next conducts function-level alignment by reasoning over fine-grained code dependencies (e.g., call graphs) to precisely ground visual elements to corresponding code components; and finally performs patch generation within the grounded code context based on the aligned files and functions. By systematically enforcing both semantic and relational consistency across modalities, GALA establishes a highly accurate visual-to-code mapping. Evaluations on the SWE-bench Multimodal benchmark demonstrate that GALA achieves state-of-the-art performance, highlighting the effectiveness of hierarchical structural alignment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes GALA, a four-stage framework for multimodal automated program repair that constructs an Image UI Graph from GUI screenshots to capture visual elements and spatial relationships, performs file-level alignment against repository structures, conducts function-level alignment via call graphs, and generates patches in the grounded context. It claims this explicit structural reasoning outperforms implicit LLM text-based approaches and achieves state-of-the-art results on the SWE-bench Multimodal benchmark.

Significance. If the hierarchical alignment reliably improves visual-to-code grounding over text-only baselines, the work could meaningfully advance APR for GUI-reported bugs by replacing ad-hoc image-to-text conversion with explicit graph-based consistency enforcement. The procedural pipeline description is clear and the focus on structural rather than purely semantic matching addresses a documented limitation in current multimodal APR.

major comments (3)
  1. [Abstract] Abstract: The claim that 'GALA achieves state-of-the-art performance' on SWE-bench Multimodal is unsupported by any quantitative metrics, baseline comparisons, ablation results, or error analysis in the provided text. Without tables reporting success rates, localization precision, or patch generation accuracy versus text-only LLM baselines, the central empirical claim cannot be evaluated.
  2. [Method] Method description (four-stage pipeline): The framework treats accurate UI-graph extraction from screenshots and reliable grounding of visual elements to file/function references as given, yet supplies no quantitative breakdown of (a) vision-component precision/recall, (b) recall of repository references in real bug reports, or (c) ablation removing the graph stages while retaining the same LLM backbone. If any stage has high error, reported gains reduce to prompt engineering rather than structural reasoning.
  3. [Evaluation] Evaluation section: No results, figures, or tables are present to substantiate the 'hierarchical structural alignment' effectiveness claim. The weakest assumption—that converting images to text discards critical spatial relationships and that explicit graphs will bridge them—remains untested in the manuscript.
minor comments (2)
  1. [Abstract] The acronym 'GALA' is defined inconsistently as 'Graph Alignment for Localization in APR' in the abstract but the title uses 'Multimodal Graph Alignment'; standardize the expansion.
  2. [Method] Notation for the Image UI Graph and cross-modal alignment steps is introduced procedurally without formal definitions or pseudocode, making reproducibility harder.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments correctly identify that the current manuscript draft lacks the quantitative evidence needed to support the central claims. We will make major revisions to include the missing results, ablations, and analyses. We respond to each major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that 'GALA achieves state-of-the-art performance' on SWE-bench Multimodal is unsupported by any quantitative metrics, baseline comparisons, ablation results, or error analysis in the provided text. Without tables reporting success rates, localization precision, or patch generation accuracy versus text-only LLM baselines, the central empirical claim cannot be evaluated.

    Authors: We agree that the abstract's SOTA claim is not supported by numbers in the provided text. The current draft contains only the high-level abstract without the supporting tables or metrics. In the revised manuscript we will expand the abstract to include key quantitative results (e.g., success rate, localization precision, and improvement over text-only baselines) and will ensure the Evaluation section supplies the full tables, baseline comparisons, ablations, and error analysis. revision: yes

  2. Referee: [Method] Method description (four-stage pipeline): The framework treats accurate UI-graph extraction from screenshots and reliable grounding of visual elements to file/function references as given, yet supplies no quantitative breakdown of (a) vision-component precision/recall, (b) recall of repository references in real bug reports, or (c) ablation removing the graph stages while retaining the same LLM backbone. If any stage has high error, reported gains reduce to prompt engineering rather than structural reasoning.

    Authors: We accept this criticism. The current method description presents the four-stage pipeline without empirical validation of its components. We will add a dedicated subsection with quantitative results for (a) precision/recall of the vision-based UI-graph extraction, (b) recall of repository file/function references extracted from bug reports, and (c) an ablation that disables the graph-alignment stages while keeping the identical LLM backbone. These additions will allow readers to assess whether the reported gains derive from structural reasoning or from prompt engineering. revision: yes

  3. Referee: [Evaluation] Evaluation section: No results, figures, or tables are present to substantiate the 'hierarchical structural alignment' effectiveness claim. The weakest assumption—that converting images to text discards critical spatial relationships and that explicit graphs will bridge them—remains untested in the manuscript.

    Authors: The referee is correct: the provided manuscript text contains no Evaluation section, figures, or tables. We will insert a complete Evaluation section that reports results on the SWE-bench Multimodal benchmark, includes figures and tables comparing hierarchical graph alignment against text-only baselines, and directly tests the assumption that image-to-text conversion loses spatial information while explicit graphs recover it. We will also incorporate error analysis and ablation studies as requested. revision: yes

Circularity Check

0 steps flagged

No circularity; procedural framework evaluated empirically

full rationale

The paper presents GALA as a four-stage procedural pipeline (UI graph construction, file-level alignment via repository references, function-level alignment via call graphs, patch generation) without equations, fitted parameters, predictions, or self-referential derivations. Central claims rest on empirical SOTA results on SWE-bench Multimodal rather than any reduction of outputs to inputs by construction. No self-citations, uniqueness theorems, or ansatzes appear in a load-bearing role within the provided text, making the framework self-contained as an engineering proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review provides no explicit free parameters, axioms, or invented entities beyond the high-level framework description; the Image UI Graph is introduced as a modeling construct without independent evidence or formal definition.

invented entities (1)
  • Image UI Graph (no independent evidence)
    purpose: capture visual elements and their structural relationships from screenshots
    Core new modeling step in the first stage of the framework

pith-pipeline@v0.9.0 · 5536 in / 1128 out tokens · 33464 ms · 2026-05-10T17:55:38.236240+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tagged unclear

    Relation between the paper passage and the cited Recognition theorem.

    GALA operates in four stages: it first constructs an Image UI Graph to capture visual elements and their structural relationships; then performs file-level alignment by cross-referencing this UI graph with repository-level structures... function-level alignment by reasoning over fine-grained code dependencies (e.g., call graphs)

  • IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · tagged unclear

    Relation between the paper passage and the cited Recognition theorem.

    By systematically enforcing both semantic and relational consistency across modalities, GALA establishes a highly accurate visual-to-code mapping.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 13 canonical work pages · 3 internal anchors

  1. [1] Waqas Ali, Lili Bo, Xiaobing Sun, Xiaoxue Wu, Saifullah Memon, Saima Siraj, and Ann Suwaree Ashton. 2023. Automated software bug localization enabled by meta-heuristic-based convolutional neural network and improved deep neural network. Expert Systems with Applications 232 (2023), 120562

  2. [2] Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, and William Wang. 2024. Swe-search: Enhancing software agents with monte carlo tree search and iterative refinement. arXiv preprint arXiv:2410.20285 (2024)

  3. [3–4] Fraol Batole, David OBrien, Tien Nguyen, Robert Dyer, and Hridesh Rajan. An LLM-Based Agent-Oriented Approach for Automated Code Design Issue Localization. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, 637–637

  5. [5] Islem Bouzenia, Premkumar Devanbu, and Michael Pradel. 2025. Repairagent: An autonomous, llm-based agent for program repair. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 2188–2200

  6. [6] Partha Chakraborty, Mahmoud Alfadel, and Meiyappan Nagappan. 2025. BLAZE: Cross-language and cross-project bug localization via dynamic chunking and hard example learning. IEEE Transactions on Software Engineering (2025)

  7. [7] Zhaoling Chen, Robert Tang, Gangda Deng, Fang Wu, Jialong Wu, Zhiwei Jiang, Viktor Prasanna, Arman Cohan, and Xingyao Wang. 2025. Locagent: Graph-guided llm agents for code localization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 8697–8727

  8. [8] Agnieszka Ciborowska and Kostadin Damevski. 2022. Fast changeset-based bug localization with BERT. In Proceedings of the 44th International Conference on Software Engineering. 946–957

  9. [9] Zhiyu Fan, Xiang Gao, Martin Mirchev, Abhik Roychoudhury, and Shin Hwei Tan. 2023. Automated repair of programs from large language models. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1469–1481

  10. [10] Kai Huang, Jian Zhang, Xiangxin Meng, and Yang Liu. 2025. Template-Guided Program Repair in the Era of Large Language Models. In ICSE. 1895–1907

  11. [11] Kai Huang, Jian Zhang, Xiaofei Xie, and Chunyang Chen. 2025. Seeing is fixing: Cross-modal reasoning with multimodal llms for visual software issue fixing. arXiv preprint arXiv:2506.16136 (2025)

  12. [12] Xuan Huo and Ming Li. 2017. Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code. In IJCAI. 1909–1915

  13. [13] Nan Jiang, Kevin Liu, Thibaud Lutellier, and Lin Tan. 2023. Impact of code language models on automated program repair. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1430–1442

  14. [14–15] Zhonghao Jiang, Xiaoxue Ren, Meng Yan, Wei Jiang, Yong Li, and Zhongxin Liu. Issue Localization via LLM-Driven Iterative Code Graph Searching. In 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 3034–3045

  16. [16] Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023. Swe-bench: Can language models resolve real-world github issues? arXiv preprint arXiv:2310.06770 (2023)

  17. [17] An Ngoc Lam, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N Nguyen. 2017. Bug localization with combination of deep learning and information retrieval. In 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC). IEEE, 218–229

  18. [18] Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated program repair. Commun. ACM 62, 12 (2019), 56–65

  19. [19] Cheryl Lee, Chunqiu Steven Xia, Longji Yang, Jen-tse Huang, Zhouruixing Zhu, Lingming Zhang, and Michael R Lyu. 2025. Unidebugger: Hierarchical multi-agent framework for unified software debugging. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 18248–18277

  20. [20] Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, and Qianxiang Wang. 2025. Swe-debate: Competitive multi-agent debate for software issue resolution. arXiv preprint arXiv:2507.23348 (2025)

  21. [21–22] Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, and Jing Ma. Mmcode: Benchmarking multimodal large language models for code generation with visually rich programming problems. In Findings of the Association for Computational Linguistics: EMNLP 2024. 736–783

  23. [23] Derrick Lin, James Koppel, Angela Chen, and Armando Solar-Lezama. 2017. QuixBugs: A multi-lingual program repair benchmark set based on the Quixey Challenge. In Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity. 55–56

  24. [24] Wei Liu, Chao Peng, Pengfei Gao, Aofan Liu, Wei Zhang, Haiyan Zhao, and Zhi Jin. 2025. GraphLocator: Graph-guided Causal Reasoning for Issue Localization. arXiv preprint arXiv:2512.22469 (2025)

  25. [25] Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, and Yongbin Li. 2025. Alibaba lingmaagent: Improving automated issue resolution via comprehensive repository exploration. In Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering. 238–249

  26. [26] Yicheng Ouyang, Jun Yang, and Lingming Zhang. 2024. Benchmarking automated program repair: An extensive study on both real-world and artificial bugs. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 440–452

  27. [27] Yun Peng, Shuzheng Gao, Cuiyun Gao, Yintong Huo, and Michael Lyu. 2024. Domain knowledge matters: Improving prompts with fix templates for repairing python type errors. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering. 1–13

  28. [28] Revanth Gangi Reddy, Tarun Suresh, JaeHyeok Doo, Ye Liu, Xuan Phi Nguyen, Yingbo Zhou, Semih Yavuz, Caiming Xiong, Heng Ji, and Shafiq Joty. 2025. Swerank: Software issue localization with code ranking. arXiv preprint arXiv:2505.07849 (2025)

  29. [29] Asif Mohammed Samir and Mohammad Masudur Rahman. 2026. Improved Bug Localization with AI Agents Leveraging Hypothesis and Dynamic Cognition. arXiv preprint arXiv:2601.12522 (2026)

  30. [30–31] Aditya Bharat Soni, Boxuan Li, Xingyao Wang, Valerie Chen, and Graham Neubig. Coding Agents with Multimodal Browsing are Generalist Problem Solvers. In ICML 2025 Workshop on Computer Use Agents

  32. [32] Shin Hwei Tan, Jooyong Yi, Sergey Mechtaev, Abhik Roychoudhury, et al. 2017. Codeflaws: a programming competition benchmark for evaluating automated program repair tools. In 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C). IEEE, 180–182

  33. [33] Xiaoxuan Tang, Jincheng Wang, Liwei Luo, Jingxuan Xu, Sheng Zhou, Dajun Chen, Wei Jiang, and Yong Li. 2026. SVRepair: Structured Visual Reasoning for Automated Program Repair. arXiv preprint arXiv:2602.06090 (2026)

  34. [34] Boshi Wang, Weijian Xu, Yunsheng Li, Mei Gao, Yujia Xie, Huan Sun, and Dongdong Chen. 2025. Improving code localization with repository memory. arXiv preprint arXiv:2510.01003 (2025)

  35. [35] Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, and Xueyang Liu. 2025. Codevision: evaluating multimodal LLMs logic understanding and code generation capabilities. arXiv preprint arXiv:2502.11829 (2025)

  36. [36] Weishi Wang, Yue Wang, Shafiq Joty, and Steven CH Hoi. 2023. Rap-gen: Retrieval-augmented patch generation with codet5 for automatic program repair. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 146–158

  37. [37] Ying Wang, Wenjun Mao, Chong Wang, Zhenhao Zhou, Yicheng Zhou, Wenyun Zhao, Yiling Lou, and Xin Peng. 2025. Extracting Conceptual Knowledge to Locate Software Issues. arXiv preprint arXiv:2509.21427 (2025)

  38. [38] Yi Wu, Nan Jiang, Hung Viet Pham, Thibaud Lutellier, Jordan Davis, Lin Tan, Petr Babkin, and Sameena Shah. 2023. How effective are neural networks for fixing security vulnerabilities. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1282–1294

  39. [39] Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. 2024. Agentless: Demystifying llm-based software engineering agents. arXiv preprint arXiv:2407.01489 (2024)

  40. [40] Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. 2025. Demystifying llm-based software engineering agents. Proceedings of the ACM on Software Engineering 2, FSE (2025), 801–824

  41. [41] Chunqiu Steven Xia, Yifeng Ding, and Lingming Zhang. 2023. The plastic surgery hypothesis in the era of large language models. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 522–534

  42. [42] Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. 2023. Automated program repair in the era of large pre-trained language models. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1482–1494

  43. [43] Chunqiu Steven Xia and Lingming Zhang. 2022. Less training, more repairing please: revisiting automated program repair via zero-shot learning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 959–971

  44. [44] Ke Xu, Siyang Xiao, Ming Liang, Yichen Yu, Zhixiang Wang, Jingxuan Xu, Dajun Chen, Wei Jiang, and Yong Li. 2026. Learning Adaptive Parallel Execution for Efficient Code Localization. arXiv preprint arXiv:2601.19568 (2026)

  45. [45] Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F Bissyandé, and Shunfu Jin. 2024. Cref: An llm-based conversational software repair framework for programming tutors. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 882–894

  46. [46] John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. Swe-agent: Agent-computer interfaces enable automated software engineering. Advances in Neural Information Processing Systems 37 (2024), 50528–50652

  47. [47–48] John Yang, Carlos E Jimenez, Alex L Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik R Narasimhan, et al. SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? In The Thirteenth International Conference on Learning Representations

  49. [49–50] Xin Yin, Chao Ni, Shaohua Wang, Zhenhao Li, Limin Zeng, and Xiaohu Yang. Thinkrepair: Self-directed automated program repair. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1274–1286

  51. [51–52] Quanjun Zhang, Chunrong Fang, Yuxiang Ma, Weisong Sun, and Zhenyu Chen. 2023. A survey of learning-based automated program repair. ACM Transactions on Software Engineering and Methodology 33, 2 (2023), 1–69

  53. [53] Quanjun Zhang, Chunrong Fang, Yang Xie, YuXiang Ma, Weisong Sun, Yun Yang, and Zhenyu Chen. 2024. A systematic literature review on large language models for automated program repair. ACM Transactions on Software Engineering and Methodology (2024)

  54. [54] Quanjun Zhang, Chunrong Fang, Tongke Zhang, Bowen Yu, Weisong Sun, and Zhenyu Chen. 2023. Gamma: Revisiting template-based automated program repair via mask prediction. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 535–547

  55. [55] Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. 2024. Autocoderover: Autonomous program improvement. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1592–1604

  56. [56–57] Jiuang Zhao, Donghao Yang, Li Zhang, Xiaoli Lian, Zitian Yang, and Fang Liu. Enhancing automated program repair with solution design. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. 1706–1718