Recognition: no theorem link
PAFT: Preservation Aware Fine-Tuning for Minimal-Edit Program Repair
Pith reviewed 2026-05-13 19:18 UTC · model grok-4.3
The pith
Preservation-aware fine-tuning produces more accurate yet shorter patches in LLM-based program repair.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PAFT derives token-level preservation signals by aligning buggy and fixed code, combines them with full-sequence masking, and applies an edit-difficulty curriculum. This produces models that generate plausible patches with higher pass@1 rates and lower average edit distances on Defects4J and HumanEval-Java, outperforming both standard supervised fine-tuning and the AdaPatcher baseline while requiring no extra search or reranking at inference time.
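pass@1 is the headline accuracy metric throughout. A reasonable reading is the standard unbiased pass@k estimator of Chen et al. (cited in the reference graph below); the paper's exact sampling protocol is an assumption here, so this is a sketch of the quantity, not the authors' code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k patches drawn from n generated samples
    (of which c pass the test suite) is plausible."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 sampled patches per bug, 2 of them plausible.
print(f"{pass_at_k(10, 2, 1):.2f}")  # 0.20
```

For pass@1 this reduces to the fraction of sampled patches that pass, averaged over bugs.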
What carries the argument
Token-level preservation signals obtained from aligning buggy and fixed code versions, combined with full-sequence masking and an edit-difficulty curriculum.
If this is right
- Patches concentrate changes on faulty regions while preserving stable surrounding code.
- Pass@1 rates rise by up to 65.6 percent over standard fine-tuning on the evaluated benchmarks.
- Average edit distance falls by up to 32.6 percent, lowering review effort.
- The method beats preference-based baselines such as AdaPatcher on Defects4J without inference-time search or post-processing.
- Smaller, localized patches emerge directly from the fine-tuned model.
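Average edit distance (AED) is the locality metric behind these claims. The granularity the paper uses (character vs. token level) is not restated on this page, so the following character-level Levenshtein sketch only illustrates the quantity being minimized:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance counting
    insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

buggy = "if (x > 0) return x;"
patch = "if (x >= 0) return x;"
print(levenshtein(buggy, patch))  # 1: a single inserted character
```

A minimal patch like the one above scores 1; a rewrite of the whole statement would score far higher, which is the over-editing AED penalizes.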
Where Pith is reading between the lines
- The preservation signals could be reused for other code-editing tasks such as refactoring or test generation.
- Testing the same signals on models larger than 6.7B parameters might show whether the locality benefit scales.
- Pairing PAFT with automated test-case generation could help when minimal edits still fail incomplete test suites.
- Lower edit distances imply reduced long-term maintenance costs once the patches are deployed.
Load-bearing premise
Aligning buggy and fixed code produces reliable token-level preservation signals that generalize without introducing alignment artifacts or dataset-specific biases.
What would settle it
Apply the same training procedure to a new program-repair dataset whose buggy-fixed pairs contain noisy or inconsistent alignments and check whether the reported gains in pass@1 and reductions in edit distance disappear.
Original abstract
Large language models (LLMs) are effective for automated program repair, but plausible patches that pass the full test suite often rewrite more code than necessary, increasing review and maintenance costs. This over-editing is common because most bugs are localized, while standard supervised fine-tuning provides no explicit signal about which tokens should be preserved and which should be changed. We propose PAFT, a preservation-aware fine-tuning method for minimal-edit program repair. PAFT derives token-level preservation signals by aligning buggy and fixed code, combines them with full-sequence masking, and applies an edit-difficulty curriculum. Across Defects4J and HumanEval-Java, PAFT improves pass@1 by up to 65.6% over standard supervised fine-tuning (StdFT) while reducing average edit distance (AED) by up to 32.6%. On Defects4J with DeepSeek-Coder-6.7B, PAFT also outperforms AdaPatcher, a strong preference-based repair baseline, improving pass@1 from 5.9% to 10.1% while reducing median AED from 61.0 to 42.0. Overall, PAFT preserves stable context and concentrates edits on faulty regions, yielding smaller, more localized, plausible patches without inference-time search, reranking, or post-processing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes PAFT, a preservation-aware fine-tuning method for LLMs in automated program repair. It derives token-level preservation signals by aligning buggy and fixed code, combines them with full-sequence masking, and applies an edit-difficulty curriculum during training. Experiments on Defects4J and HumanEval-Java report up to 65.6% pass@1 gains over standard supervised fine-tuning, up to 32.6% reduction in average edit distance, and outperformance of AdaPatcher (pass@1 from 5.9% to 10.1%, median AED from 61.0 to 42.0) on Defects4J with DeepSeek-Coder-6.7B, yielding smaller localized patches without inference-time changes.
Significance. If the gains are shown to stem specifically from the alignment-derived preservation signals rather than masking or curriculum alone, PAFT would offer a lightweight training-time mechanism to reduce over-editing in repair patches, lowering review and maintenance costs while remaining compatible with existing inference pipelines.
major comments (2)
- [§3] §3 (Method): The token-level preservation labels are extracted via string alignment of buggy and fixed code; the manuscript does not specify the alignment algorithm (e.g., diff, edit script) or how it handles identifier renames, reordered equivalent statements, or whitespace-only changes. Because the loss directly weights these labels, any systematic misalignment would make the reported pass@1 and AED improvements benchmark-specific rather than a general advance in minimal-edit repair.
- [§4] §4 (Experiments): The headline results (65.6% pass@1 lift, 32.6% AED reduction, AdaPatcher comparison) are presented without error bars, ablation studies isolating the preservation signal from masking and curriculum, or statistical significance tests. Without these, it is impossible to confirm that the preservation component is load-bearing for the central claim.
minor comments (2)
- [Abstract] Abstract: The maxima 'up to 65.6%' (pass@1) and 'up to 32.6%' (AED) are reported without attribution; clarify whether they occur on the same benchmark/model pair.
- [§4.2] §4.2: Provide the exact number of training examples and the distribution of edit difficulties used in the curriculum.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight important areas for improving reproducibility and experimental rigor. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
Referee: [§3] §3 (Method): The token-level preservation labels are extracted via string alignment of buggy and fixed code; the manuscript does not specify the alignment algorithm (e.g., diff, edit script) or how it handles identifier renames, reordered equivalent statements, or whitespace-only changes. Because the loss directly weights these labels, any systematic misalignment would make the reported pass@1 and AED improvements benchmark-specific rather than a general advance in minimal-edit repair.
Authors: We agree that the alignment procedure must be specified in detail to ensure reproducibility and support the generality of the claims. In the revised manuscript, we will describe the exact method: token sequences are extracted after normalizing whitespace and comments, then aligned using Python's difflib.SequenceMatcher, whose Ratcliff-Obershelp heuristic identifies maximal contiguous matching blocks. Tokens inside matching blocks are labeled as preserved. Identifier renames are treated as edits (as they differ in string content), reordered statements receive partial matches where possible, and whitespace-only differences are eliminated by normalization prior to alignment. We will add pseudocode, an example, and a brief discussion of limitations regarding purely semantic equivalences (e.g., commutative reordering) to §3. revision: yes
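The labeling the rebuttal describes can be sketched directly. The tokenizer and normalization are assumptions here (the rebuttal specifies only the difflib.SequenceMatcher step), so this is an illustration, not the paper's implementation:

```python
import difflib

def preservation_labels(buggy_tokens, fixed_tokens):
    """Label each token of the fixed sequence: 1 if it lies in a
    matching block shared with the buggy sequence (preserved),
    0 otherwise (edited). Follows the rebuttal's SequenceMatcher
    setup; tokenization is assumed to happen upstream."""
    sm = difflib.SequenceMatcher(a=buggy_tokens, b=fixed_tokens,
                                 autojunk=False)
    labels = [0] * len(fixed_tokens)
    for block in sm.get_matching_blocks():
        for j in range(block.b, block.b + block.size):
            labels[j] = 1
    return labels

buggy = ["return", "x", "+", "1", ";"]
fixed = ["return", "x", "-", "1", ";"]
print(preservation_labels(buggy, fixed))  # [1, 1, 0, 1, 1]
```

Only the changed operator is marked as an edit; the surrounding tokens carry the preservation signal that the loss then weights.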
Referee: [§4] §4 (Experiments): The headline results (65.6% pass@1 lift, 32.6% AED reduction, AdaPatcher comparison) are presented without error bars, ablation studies isolating the preservation signal from masking and curriculum, or statistical significance tests. Without these, it is impossible to confirm that the preservation component is load-bearing for the central claim.
Authors: We acknowledge that the current experimental section lacks the requested statistical controls and ablations. In the revision, we will add: (i) error bars computed from five independent training runs with distinct random seeds; (ii) ablation experiments comparing standard fine-tuning, masking-only, curriculum-only, and full PAFT; and (iii) statistical significance tests (paired t-tests on pass@1 and Wilcoxon signed-rank tests on AED) across the reported benchmarks. Updated tables and figures will be included in §4 to demonstrate that the preservation signal contributes measurably beyond the other components. revision: yes
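The proposed tests are straightforward to run with SciPy. The per-seed numbers below are synthetic placeholders (the rebuttal does not report the five-seed results), so only the shape of the analysis is shown:

```python
from scipy.stats import ttest_rel, wilcoxon

# Hypothetical pass@1 (%) over five seeds, StdFT vs. PAFT.
stdft_pass1 = [5.4, 5.9, 6.1, 5.7, 6.0]
paft_pass1  = [9.8, 10.1, 10.4, 9.9, 10.3]

# Hypothetical median AED per seed for the same two systems.
stdft_aed = [62.0, 61.0, 60.5, 63.0, 61.5]
paft_aed  = [43.0, 42.0, 41.5, 44.0, 42.5]

# Paired t-test on pass@1 (same seeds, paired by run).
t_stat, t_p = ttest_rel(paft_pass1, stdft_pass1)
# Wilcoxon signed-rank on AED (robust to non-normal distances).
w_stat, w_p = wilcoxon(paft_aed, stdft_aed)

print(f"paired t-test on pass@1: t={t_stat:.2f}, p={t_p:.4f}")
print(f"Wilcoxon on AED: W={w_stat}, p={w_p:.4f}")
```

With only five seeds the Wilcoxon exact p-value bottoms out at 0.0625, which is why more seeds or per-bug pairing may be needed for a decisive result.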
Circularity Check
No circularity: preservation labels derived externally from code alignment
Full rationale
The paper defines PAFT by aligning buggy and fixed code to extract token-level preservation signals, then combining them with masking and a curriculum. This alignment step is external to the model and its outputs; no equations, fitted parameters, or self-citations are shown that reduce the claimed improvements (pass@1 lift, AED reduction) back to the inputs by construction. The derivation chain is therefore self-contained against the benchmarks.
Reference graph
Works this paper leans on
-
[1]
Alberto Bacchelli and Christian Bird. 2013. Expectations, outcomes, and challenges of modern code review. In 35th International Conference on Software Engineering, ICSE '13, San Francisco, CA, USA, May 18-26, 2013, David Notkin, Betty H. C. Cheng, and Klaus Pohl (Eds.). IEEE Computer Society, 712–721. doi:10.1109/ICSE.2013.6606617
-
[2]
Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009 (ACM International Conference Proceeding Series, Vol. 382), Andrea Pohoreckyj Danyluk, Léon Bottou, and Michael L. Littman (Eds.)...
-
[3]
Worawalan Chatlatanagulchai, Kundjanasith Thonglek, Brittany Reid, Yutaro Kashiwa, Pattara Leelaprute, Arnon Rungsawang, Bundit Manaskasemsak, and Hajimu Iida. 2025. On the Use of Agentic Coding Manifests: An Empirical Study of Claude Code. In Product-Focused Software Process Improvement - 26th International Conference, PROFES 2025, Salerno, Italy, Decembe...
-
[4]
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...
-
[5]
Zhenlong Dai, Bingrui Chen, Zhuoluo Zhao, Xiu Tang, Sai Wu, Chang Yao, Zhipeng Gao, and Jingyuan Chen. 2025. Less Is More: Adaptive Program Repair with Bug Localization and Preference Learning. In AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA, Toby Walsh, Julie Shah,...
-
[6]
DeepSeek-AI. 2024. DeepSeek-V3 Technical Report. CoRR abs/2412.19437 (2024). arXiv:2412.19437 doi:10.48550/ARXIV.2412.19437
-
[7]
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLoRA: Efficient Finetuning of Quantized LLMs. (2023). http://papers.nips.cc/paper_files/paper/2023/hash/1feb87871436031bdc0f2beaa62a049b-Abstract-Conference.html
-
[8]
Marco Di Biase, Magiel Bruntink, Arie Van Deursen, and Alberto Bacchelli. 2019. The effects of change decomposition on code review - a controlled experiment. PeerJ Computer Science 5 (2019), e193.
-
[9]
Ramtin Ehsani, Sakshi Pathak, Shriya Rawal, Abdullah Al Mujahid, Mia Mohammad Imran, and Preetha Chatterjee. 2026. Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub. CoRR abs/2601.15195 (2026). arXiv:2601.15195 doi:10.48550/ARXIV.2601.15195
-
[10]
Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, and Wenfeng Liang. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence. CoRR abs/2401.14196 (2024). arXiv:2401.14196 doi:10.48550/ARXIV.2401.14196
-
[12]
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net. https://openreview.net/forum?id=nZeVKeeFYf9
-
[13]
Kai Huang, Xiangxin Meng, Jian Zhang, Yang Liu, Wenjie Wang, Shuhao Li, and Yuqing Zhang. 2024. An Empirical Study on Fine-Tuning Large Language Models of Code for Automated Program Repair. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (Echternach, Luxembourg) (ASE '23). IEEE Press, 1162–1174. doi:10.1109/AS...
-
[14]
Siming Huang, Tianhao Cheng, Jason Klein Liu, Weidi Xu, Jiaran Hao, Liuyihan Song, Yang Xu, Jian Yang, Jiaheng Liu, Chenchen Zhang, Linzheng Chai, Ruifeng Yuan, Xianzhen Luo, Qiufeng Wang, YuanTao Fan, Qingfu Zhu, Zhaoxiang Zhang, Yang Gao, Jie Fu, Qian Liu, Houyi Li, Ge Zhang, Yuan Qi, Yinghui Xu, Wei Chu, and Zili Wang. 2025. OpenCoder: The Open Cookboo...
-
[15]
Nan Jiang, Kevin Liu, Thibaud Lutellier, and Lin Tan. 2023. Impact of Code Language Models on Automated Program Repair. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023. IEEE, 1430–1442. doi:10.1109/ICSE48619.2023.00125
-
[17]
René Just, Darioush Jalali, and Michael D. Ernst. 2014. Defects4J: a database of existing faults to enable controlled testing studies for Java programs. In International Symposium on Software Testing and Analysis, ISSTA '14, San Jose, CA, USA - July 21 - 26, 2014, Corina S. Pasareanu and Darko Marinov (Eds.). ACM, 437–440. doi:10.1145/2610384.2628055
-
[18]
Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. 2012. GenProg: A Generic Method for Automatic Software Repair. IEEE Trans. Software Eng. 38, 1 (2012), 54–72. doi:10.1109/TSE.2011.104
-
[19]
Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated program repair. Commun. ACM 62, 12 (2019), 56–65. doi:10.1145/3318162
-
[20]
Kui Liu, Li Li, Anil Koyuncu, Dongsun Kim, Zhe Liu, Jacques Klein, and Tegawendé F. Bissyandé. 2021. A critical review on the evaluation of automated program repair systems. J. Syst. Softw. 171 (2021), 110817. doi:10.1016/J.JSS.2020.110817
-
[21]
Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=Bkg6RiCqY7
-
[22]
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with hu-...
-
[23]
Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. (2023). http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html
-
[24]
Achyudh Ram, Anand Ashok Sawant, Marco Castelluccio, and Alberto Bacchelli. 2018. What makes a code change easier to review: an empirical investigation on code change reviewability. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 201–212...
-
[26]
John W. Ratcliff and David E. Metzener. 1988. Pattern Matching: The Gestalt Approach. Dr. Dobb's Journal (July 1988), 46. https://jacobfilipp.com/DrDobbs/articles/DDJ/1988/8807/8807c/8807c.htm (HTML reprint of the original magazine article)
-
[27]
André Silva, Sen Fang, and Martin Monperrus. 2025. RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair. IEEE Trans. Software Eng. 51, 8 (2025), 2366–2380. doi:10.1109/TSE.2025.3581062
-
[28]
Xiaolong Tian. 2024. Evaluating the Repair Ability of LLM Under Different Prompt Settings. In IEEE International Conference on Software Services Engineering, SSE 2024, Shenzhen, China, July 7-13, 2024. IEEE, 313–322. doi:10.1109/SSE62657.2024.00053
-
[29]
Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. 2023. Automated Program Repair in the Era of Large Pre-trained Language Models. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023. IEEE, 1482–1494. doi:10.1109/ICSE48619.2023.00129
-
[30]
Chunqiu Steven Xia and Lingming Zhang. 2024. Automated program repair via conversation: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 819–831.
-
[31]
Junjielong Xu, Ying Fu, Shin Hwei Tan, and Pinjia He. 2025. Aligning the Objective of LLM-Based Program Repair. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 2548–2560.
-
[32]
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jian Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, Le Yu, Liangha... 2025. arXiv:2505.09388. doi:10.48550/ARXIV.2505.09388
-
[33]
Boyang Yang, Zijian Cai, Fengling Liu, Bach Le, Lingming Zhang, Tegawendé F. Bissyandé, Yang Liu, and Haoye Tian. 2025. A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications. CoRR abs/2506.23749 (2025). arXiv:2506.23749 doi:10.48550/ARXIV.2506.23749
-
[34]
Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, and Shunfu Jin. 2024. CREF: An LLM-Based Conversational Software Repair Framework for Programming Tutors. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2024, Vienna, Austria, September 16-20, 2024, ...
-
[35]
Boyang Yang, Haoye Tian, Jiadong Ren, Hongyu Zhang, Jacques Klein, Tegawende Bissyande, Claire Le Goues, and Shunfu Jin. 2025. MORepair: Teaching LLMs to Repair Code via Multi-Objective Fine-Tuning. ACM Transactions on Software Engineering and Methodology (2025).
-
[36]
Jooyong Yi, Shin Hwei Tan, Sergey Mechtaev, Marcel Böhme, and Abhik Roychoudhury. 2018. A correlation study between automated program repair and test-suite metrics. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018, Michel Chaudron, Ivica Crnkovic, Marsha Chechik, and Mark...
-
[37]
Xin Yin, Chao Ni, Shaohua Wang, Zhenhao Li, Limin Zeng, and Xiaohu Yang. 2024. ThinkRepair: Self-Directed Automated Program Repair. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2024, Vienna, Austria, September 16-20, 2024, Maria Christakis and Michael Pradel (Eds.). ACM, 1274–1286. doi:10.1145/3650212.3680359
-
[39]
Andreas Zeller and Ralf Hildebrandt. 2002. Simplifying and Isolating Failure-Inducing Input. IEEE Trans. Software Eng. 28, 2 (2002), 183–200. doi:10.1109/32.988498
-
[40]
Chunting Zhou, Pengfei Liu, Puxin Xu, Srinivasan Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, and Omer Levy. 2023. LIMA: Less Is More for Alignment. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS...