Understanding and Improving Automated Proof Synthesis for Interactive Theorem Provers
Pith reviewed 2026-05-07 17:46 UTC · model grok-4.3
The pith
Guiding proof search with human expert tactic patterns lets existing tools prove about 8 percent more theorems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Tactic selections that conform to human expert proof patterns succeed more often than those chosen by purely statistical search. By adding a heuristic that prefers such patterns, the pattern-guided tactic search method improves existing proof synthesis tools without retraining their models.
What carries the argument
Pattern-guided tactic search (PGTS), a heuristic that reorders or filters candidate tactics during search to favor sequences matching observed human expert patterns from proof corpora.
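The reordering step can be sketched as follows. This is a minimal illustration under assumed pattern and tactic names, not the paper's actual implementation: the pattern set, the prefix-matching bonus, and the Coq tactic names are all placeholders.

```python
# Hypothetical sketch of pattern-guided tactic search (PGTS) reordering.
# Patterns, scoring, and tactic names are illustrative assumptions.

def pattern_bonus(history, candidate, patterns):
    """Score a candidate tactic by how many mined expert patterns
    the extended tactic sequence would still continue."""
    seq = history + [candidate]
    bonus = 0
    for pattern in patterns:
        k = len(seq)
        if k <= len(pattern) and pattern[:k] == seq:
            bonus += 1  # seq is a prefix of an expert pattern
    return bonus

def reorder_candidates(history, candidates, patterns):
    """Stable-sort model-proposed tactics so that those continuing an
    expert pattern are tried first; ties keep the model's own order."""
    return sorted(candidates,
                  key=lambda c: -pattern_bonus(history, c, patterns))

# Toy example: expert proofs often open with `intros`, then `induction`.
patterns = [["intros", "induction", "simpl"], ["intros", "auto"]]
model_ranked = ["simpl", "induction", "auto"]
print(reorder_candidates(["intros"], model_ranked, patterns))
# → ['induction', 'auto', 'simpl']
```

Because the sort is stable, candidates with equal pattern scores retain the base model's ranking, so the heuristic only reorders where the patterns actually discriminate.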
If this is right
- Existing proof synthesizers can be improved by adding the guidance layer without retraining their neural models.
- A measurable fraction of previously unprovable theorems becomes reachable: on average, about 20 percent more of the theorems that baseline tools failed on are now proved.
- Generated proofs for complex theorems become shorter and more readable.
- The approach scales to harder statements where pure search currently fails.
Where Pith is reading between the lines
- The finding implies that explicit structural knowledge from human proofs can usefully complement statistical learning in automated reasoning.
- Similar pattern guidance might improve other search-based tasks such as program synthesis or hardware verification.
- If the human patterns prove stable across corpora, future tools could learn or mine them automatically instead of using a fixed heuristic.
Load-bearing premise
The measured gains come specifically from directing search toward human expert patterns rather than from other implementation details or the choice of test theorems.
What would settle it
Applying the same PGTS heuristic to a fresh set of theorems or a different base proof synthesizer and finding no average increase (or a decrease) in proved theorems would falsify the central claim.
Figures
Original abstract
Formal verification using interactive theorem provers ensures high-quality software. However, writing proof scripts for interactive theorem provers is labor-intensive and requires deep expertise. Recent studies have leveraged deep learning to automate theorem proving by learning from manually written proof corpora. Nevertheless, these techniques still achieve limited success rates in proof synthesis. This paper investigates the theorems that current proof synthesis techniques are unable to prove and analyzes their characteristics. Through an in-depth analysis of the proven theorems, proof scripts, and the proof search process, we identify the limitations of existing tools and summarize the key factors influencing proof success rates. Our research provides valuable insights for the future optimization of automated proof tools. Based on our empirical study, we discover that tactic selections conforming to human expert proof patterns are more likely to lead to successful proofs. Inspired by this finding, we propose a pattern-guided tactic search (PGTS) method to heuristically enhance the search process of existing proof synthesis tools. Our evaluation experiments demonstrate that PGTS improves the number of theorems proved by existing proof synthesis tools by an average of 8.05%, while also achieving an average 20% increase in previously unproven theorems. Furthermore, PGTS enhances the capability of proof synthesis tools to prove complex theorems and generates more concise proof scripts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes limitations of existing deep learning-based proof synthesis tools for interactive theorem provers by examining unproven theorems, proof scripts, and search processes. It identifies that tactic selections conforming to human expert proof patterns are more likely to succeed, proposes a Pattern-Guided Tactic Search (PGTS) heuristic to enhance existing tools, and reports that PGTS yields an average 8.05% increase in proved theorems plus a 20% increase in previously unproven theorems, while also improving performance on complex theorems and producing more concise proofs.
Significance. If the empirical claims hold under detailed scrutiny, the work's analysis of proof patterns offers a useful lens for tool improvement in automated formal verification. The identification of human-pattern alignment as a success factor could guide future heuristics even if the specific PGTS gains require further validation.
major comments (2)
- [Abstract and Evaluation section] The central quantitative claims—an average 8.05% improvement in theorems proved and 20% more previously unproven theorems—are stated without any description of the underlying dataset(s), the specific interactive theorem prover, the baseline proof synthesis tools, the total number of theorems, the number of experimental runs, or any statistical tests. This absence prevents verification of the reported gains and their attribution to PGTS.
- [Section on PGTS proposal and evaluation] The claim that gains arise specifically from guiding toward human expert proof patterns is not supported by an ablation that disables only the pattern heuristic while holding search budget, infrastructure, and other implementation details fixed. Without such isolation, the improvements cannot be confidently attributed to the identified patterns rather than incidental changes in the overall procedure.
minor comments (2)
- [Introduction] The manuscript should define 'human expert proof patterns' and 'pattern-guided tactic search' with concrete examples in the early sections to improve accessibility.
- [Evaluation] Any tables or figures presenting success rates should include standard deviations or confidence intervals and clearly label all baselines and metrics.
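As a concrete illustration of the interval the second minor comment requests, a 95% Wilson score interval for a per-tool success rate could be computed as below. The counts are made-up placeholders, not the paper's data.

```python
# Hedged sketch: 95% Wilson score confidence interval for a binomial
# proportion (e.g., fraction of theorems proved). Counts are invented.
import math

def wilson_interval(successes, trials, z=1.96):
    """Return (low, high) bounds of the Wilson score interval."""
    if trials == 0:
        return (0.0, 0.0)
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials
                                   + z * z / (4 * trials * trials))
    return (center - half, center + half)

lo, hi = wilson_interval(312, 1000)  # e.g., 312 of 1000 theorems proved
print(f"success rate 31.2%, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The Wilson interval behaves better than the naive normal approximation near 0% and 100% success rates, which matters when a tool proves very few or very many theorems in a benchmark slice.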
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will incorporate revisions to improve clarity and rigor.
Point-by-point responses
-
Referee: [Abstract and Evaluation section] The central quantitative claims—an average 8.05% improvement in theorems proved and 20% more previously unproven theorems—are stated without any description of the underlying dataset(s), the specific interactive theorem prover, the baseline proof synthesis tools, the total number of theorems, the number of experimental runs, or any statistical tests. This absence prevents verification of the reported gains and their attribution to PGTS.
Authors: The Evaluation section of the manuscript provides the dataset details, the interactive theorem prover, baseline tools, theorem counts, run numbers, and experimental protocol. However, we agree the abstract should be self-contained. We will revise the abstract to briefly state the experimental setup, including the prover, dataset size, baselines, number of theorems, runs performed, and any statistical tests. This will facilitate verification without altering the reported results. Revision: yes.
-
Referee: [Section on PGTS proposal and evaluation] The claim that gains arise specifically from guiding toward human expert proof patterns is not supported by an ablation that disables only the pattern heuristic while holding search budget, infrastructure, and other implementation details fixed. Without such isolation, the improvements cannot be confidently attributed to the identified patterns rather than incidental changes in the overall procedure.
Authors: We acknowledge that the current evaluation compares PGTS-augmented tools against unmodified baselines but does not isolate the pattern heuristic via a controlled ablation that holds search budget, infrastructure, and all other factors fixed. To strengthen attribution of gains specifically to the human-pattern guidance, we will add such an ablation study to the revised manuscript. Revision: yes.
Circularity Check
No circularity detected in the empirical analysis or heuristic evaluation.
Rationale
The paper conducts an independent empirical analysis of proof scripts, search processes, and unproven theorems to identify correlations with human expert patterns, then implements PGTS as a heuristic inspired by that observation and measures success-rate gains directly against baselines. No equations, self-definitions, fitted parameters presented as predictions, or load-bearing self-citations appear in the derivation; the reported 8.05% and 20% improvements are external experimental outcomes, not restatements of the input analysis.