arxiv: 2605.10712 · v1 · submitted 2026-05-11 · 💻 cs.SE · cs.CR

Recognition: no theorem link

AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification

Paschal C. Amusuo , Ricardo Calvo , Dharun Anandayuvaraj , Taylor Le Lievre , Kevin Kolyakov , Elijah Jorgensen , Aravind Machiry , James C. Davis

Authors on Pith no claims yet

Pith reviewed 2026-05-12 04:15 UTC · model grok-4.3

classification 💻 cs.SE cs.CR

keywords memory safetyunit proofsautomated verificationprogram synthesisLLMembedded systemssafety propertiescomponent verification

0 comments

The pith

AutoSOUP automates generation of Safety-Oriented Unit Proofs for component memory-safety verification using three techniques and an LLM hybrid.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AutoSOUP as a way to automate component-level memory-safety verification in low-level software by creating unit proofs that capture key verification decisions. These proofs encode choices such as the scope of analysis, loop bounds, and environment models needed to prove or disprove safety properties. The authors define three automatic derivation techniques and combine them with a hybrid LLM-As-Function-Call architecture that pairs deterministic program synthesis with language models to produce justifiable results. If the approach holds, it would lower the expertise barrier that currently limits formal verification in embedded systems and similar domains where dynamic checks are hard to apply.

Core claim

AutoSOUP formalizes Safety-Oriented Unit Proofs as artifacts that encode verification choices for memory-safety properties and derives them automatically through three techniques augmented by an LLM-As-Function-Call hybrid that integrates deterministic program synthesis with large language models to generate sound and justifiable proofs for individual components.

What carries the argument

Safety-Oriented Unit Proofs: artifacts that encode verification choices including scope, loop bounds, and environment models to enable proving absence of memory-safety violations in software components.

If this is right

Verification of memory safety for individual components can proceed without manual construction of proofs by experts.
Generated proofs can both confirm safety and surface actual vulnerabilities in the components under analysis.
The proofs carry explicit characterizations of their assumptions and the guarantees they provide.
Memory-safety assurance becomes feasible for embedded systems where hardware protections are limited.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same hybrid synthesis-plus-LLM pattern could be applied to automate proofs for other classes of properties such as functional correctness or concurrency safety.
Integration into build pipelines would allow repeated checks on evolving components without re-investment of manual effort each time.
Performance on larger codebases could be measured by applying the system to established open embedded projects and counting how many components receive complete proofs.

Load-bearing premise

The three automatic techniques together with the LLM-As-Function-Call hybrid can reliably produce unit proofs whose encoded verification choices are both sufficient and sound for the target components.

What would settle it

A concrete case in which an AutoSOUP-generated unit proof asserts memory safety for a component that later exhibits a violation under targeted testing or in which the proof misses a known vulnerability present in the original code.

Figures

Figures reproduced from arXiv: 2605.10712 by Aravind Machiry, Dharun Anandayuvaraj, Elijah Jorgensen, James C. Davis, Kevin Kolyakov, Paschal C. Amusuo, Ricardo Calvo, Taylor Le Lievre.

**Figure 2.** Figure 2: AutoSOUP system diagram. AutoSOUP connects safety-oriented unit-proof construction with LLM-as-function-call [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Distribution of program reachable LOC, verification [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Comparing AutoSOUP’s unit proofs with expert [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

read the original abstract

Memory-safety errors remain a persistent source of zero-day vulnerabilities in low-level software. The problem is especially acute in embedded systems, where hardware protections are often limited and dynamic analysis is difficult to apply effectively. Memory-safety verification can provide stronger assurance by proving the absence of such errors or exposing violations when they exist. However, current verification workflows remain largely manual and require substantial specialized expertise, limiting their adoption in practice. We present AutoSOUP, a system for automating component-level memory-safety verification through Safety-Oriented Unit Proofs. We formalize these unit proofs as artifacts that encode verification choices (scope, loop bounds, and environment models) for verifying safety properties, and introduce three techniques for deriving them automatically. To overcome the limitations of existing automation approaches, we further introduce LLM-As-Function-Call, a hybrid architecture that combines deterministic program synthesis with LLMs to automate these techniques and produce justifiable unit proofs. We evaluate AutoSOUP by assessing its ability to automate memory-safety verification and expose vulnerabilities in verified components, and we characterize the assumptions and guarantees of the resulting proofs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

AutoSOUP pairs deterministic synthesis with LLMs to generate unit proofs for component memory safety, but the evaluation leaves the soundness of those LLM choices under-supported.

read the letter

AutoSOUP formalizes Safety-Oriented Unit Proofs that capture scope, loop bounds, and environment models, then uses three automatic techniques plus a hybrid LLM-As-Function-Call architecture to produce them. The hybrid keeps deterministic synthesis for parts where it can be reliable and hands the rest to an LLM, with the goal of cutting down the manual work that currently blocks component-level memory-safety verification in embedded code. That combination is the clearest new piece: prior work has tried either pure synthesis or standalone LLM prompting, but this specific mix aimed at justifiable unit proofs is not already in the literature they cite. The evaluation checks whether the system can automate the process and surface vulnerabilities, and the authors lay out the assumptions and guarantees of the resulting proofs. That is useful groundwork for anyone who has tried to scale verification beyond small examples. The main soft spot is exactly the one the stress-test flags. The architecture still lets the LLM propose loop bounds and environment models, and the abstract only says the evaluation assesses automation and vulnerability exposure without describing independent soundness checks such as manual inspection of the generated artifacts or comparison to hand-written ground-truth proofs. If those checks are missing or weak in the full paper, then a generated proof could silently accept unsafe code. The lack of any quantitative results or error rates in the abstract makes it hard to judge how often this happens in practice. This work is for researchers building automated verification tools and for security engineers who need memory-safety arguments on real components rather than toy examples. A reader already working on LLM-assisted formal methods or embedded verification will find concrete architecture details and an initial evaluation to build on. It deserves a serious referee because the problem is practical and the hybrid approach is a reasonable next step, even though the soundness argument will need tightening in revision. I would send it out for review.

Referee Report

2 major / 1 minor

Summary. The paper presents AutoSOUP, a system for automating component-level memory-safety verification through Safety-Oriented Unit Proofs. These proofs are formalized as artifacts encoding verification choices including scope, loop bounds, and environment models. The authors introduce three automatic techniques for deriving such proofs and a hybrid LLM-As-Function-Call architecture that combines deterministic program synthesis with LLMs to generate justifiable unit proofs. The evaluation assesses the system's ability to automate verification and expose vulnerabilities in components while characterizing the assumptions and guarantees of the resulting proofs.

Significance. If the results hold, AutoSOUP could meaningfully lower the barrier to memory-safety verification in embedded and low-level systems by reducing manual effort. The hybrid architecture and explicit characterization of assumptions/guarantees are positive contributions that address limitations of pure automation or standalone LLM use. Credit is due for focusing on justifiable proofs rather than black-box generation. However, the overall significance hinges on whether the generated proofs are demonstrably sound; without that, the automation benefit is limited.

major comments (2)

[Evaluation] Evaluation section: the description of assessing 'ability to automate and expose vulnerabilities' does not include independent soundness validation of the generated unit proofs (e.g., manual inspection of chosen scopes/loop bounds/environment models or comparison to ground-truth proofs). This is load-bearing for the central claim that the proofs are justifiable and support reliable safety verification without false negatives.
[LLM-As-Function-Call architecture] LLM-As-Function-Call architecture (likely §4 or equivalent): while the hybrid design is introduced, the manuscript must show how the deterministic synthesis component constrains LLM-proposed choices (loop bounds, environment models) to prevent unsound outputs; the abstract's claim that assumptions/guarantees are characterized does not substitute for this mechanism or its empirical validation.

minor comments (1)

[Abstract] The abstract would be strengthened by including at least one quantitative result (e.g., number of components handled or success rate) to give readers an immediate sense of scale.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The comments highlight important aspects of our evaluation and architecture that we will clarify and strengthen in the revision. We address each major comment below.

read point-by-point responses

Referee: [Evaluation] Evaluation section: the description of assessing 'ability to automate and expose vulnerabilities' does not include independent soundness validation of the generated unit proofs (e.g., manual inspection of chosen scopes/loop bounds/environment models or comparison to ground-truth proofs). This is load-bearing for the central claim that the proofs are justifiable and support reliable safety verification without false negatives.

Authors: We agree that explicit independent soundness validation would strengthen the evaluation and better support the claim of justifiable proofs. Our current evaluation demonstrates successful automation and vulnerability exposure while characterizing assumptions and guarantees of the generated proofs. In the revision, we will add a dedicated subsection to the evaluation that includes manual inspection of a representative sample of generated unit proofs. This will cover verification of chosen scopes, loop bounds, and environment models, with comparisons to ground-truth proofs where available, to confirm the absence of false negatives. revision: yes
Referee: [LLM-As-Function-Call architecture] LLM-As-Function-Call architecture (likely §4 or equivalent): while the hybrid design is introduced, the manuscript must show how the deterministic synthesis component constrains LLM-proposed choices (loop bounds, environment models) to prevent unsound outputs; the abstract's claim that assumptions/guarantees are characterized does not substitute for this mechanism or its empirical validation.

Authors: The LLM-As-Function-Call architecture combines deterministic program synthesis with LLMs specifically to produce justifiable proofs by constraining LLM outputs. We will revise the description in §4 to explicitly detail the constraining mechanisms, including how the synthesis component validates and enforces soundness on LLM-proposed loop bounds and environment models (e.g., by rejecting invalid or unsound choices via static checks). We will also add empirical validation through additional case-study results demonstrating that the generated proofs introduce no false negatives, thereby supporting the characterization of assumptions and guarantees. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper presents AutoSOUP as an engineering system that formalizes Safety-Oriented Unit Proofs and combines deterministic program synthesis with LLMs to generate them. No equations, fitted parameters, or self-referential definitions appear in the abstract or description. The central claims rest on an evaluation of automation capability and vulnerability exposure rather than any derivation that reduces to its own inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text, so the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; full text would be required to populate this ledger.

pith-pipeline@v0.9.0 · 5521 in / 1141 out tokens · 34870 ms · 2026-05-12T04:15:39.364345+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · 1 internal anchor

[1]

URL: https://docs.litellm.ai/docs/

LiteLLM - getting started | liteLLM. URL: https://docs.litellm.ai/docs/

work page
[2]

original-date: 2020-10-25T23:23:54Z

openai/openai-python. original-date: 2020-10-25T23:23:54Z. URL: https://github. com/openai/openai-python

work page 2020
[3]

URL: https://www.computer.org/csdl/ magazine/co/2001/01/r1135/13rRUwgyOg9

Software defect reduction top 10 list. URL: https://www.computer.org/csdl/ magazine/co/2001/01/r1135/13rRUwgyOg9

work page 2001
[4]

URL: https://www.swebench.com/index.html

SWE-bench leaderboards. URL: https://www.swebench.com/index.html

work page
[5]

Second edition

IEC 61508-3: Functional safety of electrical/electronic/programmable electronic safety-related systems – part 3: Software requirements, 2010. Second edition

work page 2010
[6]

URL: https: //www.crowdstrike.com/en-us/blog/channel-file-291-rca-available/

Channel File 291 Incident RCA is Available | CrowdStrike, 2025. URL: https: //www.crowdstrike.com/en-us/blog/channel-file-291-rca-available/

work page 2025
[7]

DARPA Guide for Formal Methods to Deliver Resilient Systems for Proposals,

work page
[8]

URL: https://defencescienceinstitute.com/wp-content/uploads/2025/01/ Resilient_Systems_Best_Practices_Guide_-_1-9-2025.pdf

work page 2025
[9]

In the Wild

Project Zero: 0day "In the Wild", 2025. URL: https://googleprojectzero.blogspot. com/p/0day.html

work page 2025
[10]

Challenges in de- signing exploit mitigations for deeply embedded systems

Ali Abbasi, Jos Wetzels, Thorsten Holz, and Sandro Etalle. Challenges in de- signing exploit mitigations for deeply embedded systems. In2019 IEEE Euro- pean Symposium on Security and Privacy (EuroS&P), pages 31–46, 2025. URL: https://ieeexplore.ieee.org/abstract/document/8806725, doi:10.1109/EuroSP. 2019.00013

work page doi:10.1109/eurosp 2025
[11]

Securing tomorrow’s software: the need for memory safety standards, 2025

Alex Rebert, Ben Laurie, Murali Vijayaraghavan, and Alex Richardson. Securing tomorrow’s software: the need for memory safety standards, 2025. URL: https:// security.googleblog.com/2025/02/securing-tomorrows-software-need-for.html

work page 2025
[12]

Safer with Google: Advancing Memory Safety, 2024

Alex Rebert, Chandler Carruth, Jen Engel, and Andy Qin. Safer with Google: Advancing Memory Safety, 2024. URL: https://security.googleblog.com/2024/10/ safer-with-google-advancing-memory.html

work page 2024
[13]

Amusuo, Owen Cochell, Taylor Le Lievre, Parth V

Paschal C. Amusuo, Owen Cochell, Taylor Le Lievre, Parth V. Patil, Aravind Machiry, and James C. Davis. Do unit proofs work? an empirical study of composi- tional bounded model checking for memory safety verification. In2026 IEEE/ACM 48th International Conference on Software Engineering. URL: http://arxiv.org/abs/ 2503.13762,arXiv:2503.13762[cs],doi:10.48...

work page doi:10.48550/arxiv.2503.13762
[14]

Amusuo, Dongge Liu, Ricardo Andres Calvo Mendez, Jonathan Met- zman, Oliver Chang, and James C

Paschal C. Amusuo, Dongge Liu, Ricardo Andres Calvo Mendez, Jonathan Met- zman, Oliver Chang, and James C. Davis. FalseCrashReducer: Mitigating false positive crashes in OSS-fuzz-gen using agentic AI. URL: http://arxiv.org/abs/ 2510.02185,arXiv:2510.02185[cs],doi:10.48550/arXiv.2510.02185

work page doi:10.48550/arxiv.2510.02185
[15]

Log parsing: How far can chatgpt go? InProceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, ASE ’23, page 1699–1704

Paschal C. Amusuo, Ricardo Andrés Calvo Méndez, Zhongwei Xu, Aravind Machiry, and James C. Davis. Systematically detecting packet validation vul- nerabilities in embedded network stacks. In2023 38th IEEE/ACM Interna- tional Conference on Automated Software Engineering (ASE), pages 926–938, 2023. ISSN: 2643-1572. URL: https://ieeexplore.ieee.org/abstract/d...

work page doi:10.1109/ase56229.2023.00095 2023
[16]

Mousavi Engineering: New Ideas and Emerging Results (ICSE-NIER)

Paschal C. Amusuo, Parth V. Patil, Owen Cochell, Taylor Le Lievre, and James C. Davis. A unit proofing framework for code-level verification: A research agenda. In2025 IEEE/ACM 47th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pages 36–40. URL: https://ieeexplore.ieee.org/ abstract/document/11023946,doi:10....

work page doi:10.1109/icse-nier66352.2025.00013 2025
[17]

Arthur Azevedo de Amorim, Cătălin Hriţcu, and Benjamin C. Pierce. The meaning of memory safety. In Lujo Bauer and Ralf Küsters, editors,Principles of Security and Trust, pages 79–105. Springer International Publishing. doi:10.1007/978- 3-319-89722-6_4

work page doi:10.1007/978-
[18]

Bensalem, M

S. Bensalem, M. Bozga, T.-H. Nguyen, and J. Sifakis. Compositional verification for component-based systems and application.IET Software, 4(3):181–193, 2010. Publisher: IET Digital Library. URL: https://digital-library.theiet.org/content/ journals/10.1049/iet-sen.2009.0011,doi:10.1049/iet-sen.2009.0011

work page doi:10.1049/iet-sen.2009.0011 2010
[19]

Protocol-aware firmware rehosting for effective fuzzing of embedded network stacks

Moritz Bley, Tobias Scharnowski, Simon Wörner, Moritz Schloegel, and Thorsten Holz. Protocol-aware firmware rehosting for effective fuzzing of embedded network stacks. InProceedings of the 2025 ACM SIGSAC Conference on Com- puter and Communications Security, CCS ’25, pages 4484–4498. Association for Computing Machinery. URL: https://dl.acm.org/doi/10.1145...

work page doi:10.1145/3719027.3765125 2025
[20]

The Urgent Need for Memory Safety in Software Products | CISA, September 2025

Bob Lord. The Urgent Need for Memory Safety in Software Products | CISA, September 2025. URL: https://www.cisa.gov/news-events/news/urgent-need- memory-safety-software-products

work page 2025
[21]

Infer: An automatic program verifier for memory safety of c programs

Cristiano Calcagno and Dino Distefano. Infer: An automatic program verifier for memory safety of c programs. In Mihaela Bobaru, Klaus Havelund, Gerard J. Holzmann, and Rajeev Joshi, editors,NASA Formal Methods, pages 459–465. Springer.doi:10.1007/978-3-642-20398-5_33

work page doi:10.1007/978-3-642-20398-5_33
[22]

Compo- sitional shape analysis by means of bi-abduction

Cristiano Calcagno, Dino Distefano, Peter O’Hearn, and Hongseok Yang. Compo- sitional shape analysis by means of bi-abduction. InProceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL ’09, pages 289–300. Association for Computing Machinery, 2025. URL: https: //dl.acm.org/doi/10.1145/1480881.1480917,doi:10....

work page doi:10.1145/1480881.1480917 2025
[23]

Static detection of packet injection vulnerabilities: A case for identifying attacker-controlled implicit information leaks

Qi Alfred Chen, Zhiyun Qian, Yunhan Jack Jia, Yuru Shao, and Zhuoqing Morley Mao. Static detection of packet injection vulnerabilities: A case for identifying attacker-controlled implicit information leaks. InProceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pages 388–400. Association for Computing Machinery, ...

work page doi:10.1145/2810103.2813643 2015
[24]

BLITZ: Compositional bounded model checking for real-world programs

Chia Yuan Cho, Vijay D’Silva, and Dawn Song. BLITZ: Compositional bounded model checking for real-world programs. In2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 136–146, 2013. URL: https://ieeexplore.ieee.org/abstract/document/6693074, doi:10.1109/ASE.2013. 6693074

work page doi:10.1109/ase.2013 2013
[25]

Monteiro, Daniel Schwartz-Narbonne, Serdar Tasiran, Michael Tautschnig, and Mark R

Nathan Chong, Byron Cook, Konstantinos Kallas, Kareem Khazem, Felipe R. Monteiro, Daniel Schwartz-Narbonne, Serdar Tasiran, Michael Tautschnig, and Mark R. Tuttle. Code-level model checking in the software development work- flow. InProceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEI...

work page doi:10.1145/3377813.3381347 2020
[26]

Bounded model checking using satisfiability solving.Formal Methods in System Design, 19(1):7–34, 2001.doi:10.1023/A:1011276507260

Edmund Clarke, Armin Biere, Richard Raimi, and Yunshan Zhu. Bounded model checking using satisfiability solving.Formal Methods in System Design, 19(1):7–34, 2001.doi:10.1023/A:1011276507260

work page doi:10.1023/a:1011276507260 2001
[27]

Cobleigh, Dimitra Giannakopoulou & Corina S

Jamieson M. Cobleigh, Dimitra Giannakopoulou, and Corina S. PĂsĂreanu. Learn- ing assumptions for compositional verification. In Hubert Garavel and John Hatcliff, editors,Tools and Algorithms for the Construction and Analysis of Systems, pages 331–346. Springer, 2003.doi:10.1007/3-540-36577-X_24

work page doi:10.1007/3-540-36577-x_24 2003
[28]

External technical root cause analysis — channel file

CrowdStrike, Inc. External technical root cause analysis — channel file

work page
[29]

Accessed January 2026

Technical report, CrowdStrike, August 2024. Accessed January 2026. URL: https://www.crowdstrike.com/wp-content/uploads/2024/08/Channel-File- 291-Incident-Root-Cause-Analysis-08.06.2024.pdf

work page 2024
[30]

Modeling readability to improve unit tests

Ermira Daka, José Campos, Gordon Fraser, Jonathan Dorn, and Westley Weimer. Modeling readability to improve unit tests. InProceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 107–118. Association for Computing Machinery. URL: https://dl.acm.org/doi/10.1145/ 2786805.2786838,doi:10.1145/2786805.2786838

work page doi:10.1145/2786805.2786838 2015
[31]

Lahiri, Akash Lal, and Yi Li

Ankush Das, Shuvendu K. Lahiri, Akash Lal, and Yi Li. Angelic verification: Precise verification modulo unknowns. In Daniel Kroening and Corina S. Păsăre- anu, editors,Computer Aided Verification, pages 324–342. Springer International Publishing, 2015.doi:10.1007/978-3-319-21690-4_19

work page doi:10.1007/978-3-319-21690-4_19 2015
[32]

A guide to stakeholder analysis for cybersecurity researchers.arXiv preprint arXiv:2508.14796, 2025

James C Davis, Sophie Chen, Huiyun Peng, Paschal C Amusuo, and Kelechi G Kalu. A guide to stakeholder analysis for cybersecurity researchers.arXiv preprint arXiv:2508.14796, 2025

work page arXiv 2025
[33]

Dino Distefano, Manuel Fähndrich, Francesco Logozzo, and Peter W. O’Hearn. Scaling static analyses at facebook.Communications of the ACM, 62(8):62–70,

work page
[34]

URL: https://dl.acm.org/doi/10.1145/3338112,doi:10.1145/3338112

work page doi:10.1145/3338112
[35]

SoK: Enabling security analyses of embedded systems via rehosting

Andrew Fasano, Tiemoko Ballo, Marius Muench, Tim Leek, Alexander Bulekov, Brendan Dolan-Gavitt, Manuel Egele, Aurélien Francillon, Long Lu, Nick Gregory, Davide Balzarotti, and William Robertson. SoK: Enabling security analyses of embedded systems via rehosting. InProceedings of the 2021 ACM Asia Conference on Computer and Communications Security, ASIA CC...

work page doi:10.1145/3433210 2021
[36]

{P2IM}: Scalable and hardware- independent firmware testing via automatic peripheral interface modeling

Bo Feng, Alejandro Mera, and Long Lu. {P2IM}: Scalable and hardware- independent firmware testing via automatic peripheral interface modeling. In 29th USENIX Security Symposium (USENIX Security 20), pages 1237–1254, 2020. URL: https://www.usenix.org/conference/usenixsecurity20/presentation/feng

work page 2020
[37]

Deductive software verification

Jean-Christophe Filliâtre. Deductive software verification. 13(5):397–403. doi: 10.1007/s10009-011-0211-0

work page doi:10.1007/s10009-011-0211-0
[38]

In: Chandra, S., Blincoe, K., Tonella, P

Emily First, Markus N. Rabe, Talia Ringer, and Yuriy Brun. Baldur: Whole- proof generation and repair with large language models. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, pages 1229–1241. Association for Computing Machinery, 2023. URL: https://dl.a...

work page doi:10.1145/3611643.3616243 2023
[39]

An em- pirical study on the correctness of formally verified distributed systems

Pedro Fonseca, Kaiyuan Zhang, Xi Wang, and Arvind Krishnamurthy. An em- pirical study on the correctness of formally verified distributed systems. In Proceedings of the Twelfth European Conference on Computer Systems, EuroSys ’17, pages 328–343. Association for Computing Machinery, 2017. URL: https: //dl.acm.org/doi/10.1145/3064176.3064183,doi:10.1145/306...

work page doi:10.1145/3064176.3064183 2017
[40]

Mikhail Y. R. Gadelha, Hussama I. Ismail, and Lucas C. Cordeiro. Handling loops in bounded model checking of c programs via k-induction. 19(1):97–114. doi:10.1007/s10009-015-0407-9

work page doi:10.1007/s10009-015-0407-9
[41]

What is memory safety? - the PL enthusiast

Michael Hicks. What is memory safety? - the PL enthusiast. URL: http://www.pl- enthusiast.net/2014/07/21/memory-safety/

work page 2014
[42]

Lessons from Formally Verified Deployed Software.CoRR, March 2023

Li Huang, Sophie Ebersold, Alexander Kogtenkov, Bertrand Meyer, and Yinling Liu. Lessons from Formally Verified Deployed Software.CoRR, March 2023. URL: http://arxiv.org/abs/2301.02206,doi:10.48550/arXiv.2301.02206

work page doi:10.48550/arxiv.2301.02206 2023
[43]

In: 26th IEEE/ACM Interna- tional Conference on Automated Software Engineering, ASE 2011

Franjo Ivančić, Gogul Balakrishnan, Aarti Gupta, Sriram Sankaranarayanan, Naoto Maeda, Hiroki Tokuoka, Takashi Imoto, and Yoshiaki Miyazaki. DC2: A framework for scalable, scope-bounded software verification. In2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), pages 133–142. ISSN: 1938-4300. URL: https://ieeexplore....

work page doi:10.1109/ase.2011.6100046 2011
[44]

Ganai, Aarti Gupta, and Pranav Ashar

Franjo Ivančić, Zijiang Yang, Malay K. Ganai, Aarti Gupta, and Pranav Ashar. Efficient SAT-based bounded model checking for software verification. 404(3):256–

work page
[45]

URL: https://www.sciencedirect.com/science/article/pii/S0304397508002223, doi:10.1016/j.tcs.2008.03.013

work page doi:10.1016/j.tcs.2008.03.013 2008
[46]

Artemis: Toward accurate detection of server-side request forgeries through LLM-assisted inter-procedural path-sensitive taint analysis

Yuchen Ji, Ting Dai, Zhichao Zhou, Yutian Tang, and Jingzhu He. Artemis: Toward accurate detection of server-side request forgeries through LLM-assisted inter-procedural path-sensitive taint analysis. 9:128:1349–128:1377. URL: https: //dl.acm.org/doi/10.1145/3720488,doi:10.1145/3720488

work page doi:10.1145/3720488
[47]

Getting started - the kani rust verifier, 2024

kani. Getting started - the kani rust verifier, 2024. URL: https://model-checking. github.io/kani/

work page 2024
[48]

Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

Saketh Ram Kasibatla, Arpan Agarwal, Yuriy Brun, Sorin Lerner, Talia Ringer, and Emily First. Cobblestone: A divide-and-conquer approach for automating formal verification. URL: https://arxiv.org/abs/2410.19940v4, doi:10.1145/3744916. 3773178

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/3744916
[49]

Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., and Winwood, S

Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: formal verification of an OS kernel. InProceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP ’09, pages 207–220. Ass...

work page doi:10.1145/1629575.1629596 2009
[50]

CBMC – c bounded model checker

Daniel Kroening and Michael Tautschnig. CBMC – c bounded model checker. In Erika Ábrahám and Klaus Havelund, editors,Tools and Algorithms for the Construction and Analysis of Systems, pages 389–391. Springer, 2014. doi:10. 1007/978-3-642-54862-8_26

work page 2014
[51]

New EU product liability directive comes into force

Latham & Watkins LLP. New EU product liability directive comes into force. Client alert / legal briefing, December 2024. Accessed January

work page 2024
[52]

URL: https://www.lw.com/admin/upload/SiteAttachments/New-EU- Product-Liability-Directive-Comes-Into-Force.pdf

work page
[53]

Can LLMs reason about program se- mantics? a comprehensive evaluation of LLMs on formal specification inference

Thanh Le-Cong, Bach Le, and Toby Murray. Can LLMs reason about program se- mantics? a comprehensive evaluation of LLMs on formal specification inference. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Proceedings of the 63rd Annual Meeting of the Association for Com- putational Linguistics (Volume 1: Long Papers), ...

work page doi:10.18653/v1/2025.acl-long.1068 2025
[54]

Assessing the impact of interface vulnerabilities in compart- mentalized software

Hugo Lefeuvre, Vlad-Andrei Bădoiu, Yi Chen, Felipe Huici, Nathan Dautenhahn, and Pierre Olivier. Assessing the impact of interface vulnerabilities in compart- mentalized software. InProceedings 2023 Network and Distributed System Security Symposium. Internet Society.doi:10.14722/ndss.2023.24117

work page doi:10.14722/ndss.2023.24117 2023
[55]

Memory state verification based on inductive and deductive reasoning

Shaofeng Li, Lei Qiao, and Mengfei Yang. Memory state verification based on inductive and deductive reasoning. 70(3):1026–1039. URL: https://ieeexplore.ieee. org/abstract/document/9435092,doi:10.1109/TR.2021.3074709

work page doi:10.1109/tr.2021.3074709 2021
[56]

IRIS: LLM-assisted static analysis for detecting security vulnerabilities

Ziyang Li, Saikat Dutta, and Mayur Naik. IRIS: LLM-assisted static analysis for detecting security vulnerabilities. In Y. Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors,International Conference on Learning Representations, volume 2025, pages 35735–35758. URL: https://proceedings.iclr.cc/paper_files/paper/2025/file/ 582d4e27fa24168f3af1f4582655034b-Pape...

work page 2025
[57]

BaseSAFE: baseband sanitized fuzzing through emulation

Dominik Maier, Lukas Seidel, and Shinjo Park. BaseSAFE: baseband sanitized fuzzing through emulation. InProceedings of the 13th ACM Conference on Security and Privacy in Wireless and Mobile Networks, WiSec ’20, pages 122–132. Associ- ation for Computing Machinery. URL: https://dl.acm.org/doi/10.1145/3395351. 3399360,doi:10.1145/3395351.3399360

work page doi:10.1145/3395351
[58]

CBMC: goto-cc, 2024

Martin Brain. CBMC: goto-cc, 2024. URL: https://diffblue.github.io/cbmc/group_ _goto-cc.html

work page 2024
[59]

Validation obli- gations: A novel approach to check compliance between requirements and their formal specification

Atif Mashkoor, Michael Leuschel, and Alexander Egyed. Validation obli- gations: A novel approach to check compliance between requirements and their formal specification. In2021 IEEE/ACM 43rd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pages 1–

work page
[60]

URL: https://ieeexplore.ieee.org/abstract/document/9402243, doi:10.1109/ ICSE-NIER52604.2021.00009

work page arXiv 2021
[61]

Miller, Lars Fredriksen, and Bryan So

Barton P. Miller, Lars Fredriksen, and Bryan So. An empirical study of the reliability of UNIX utilities. 33(12):32–44. URL: https://dl.acm.org/doi/10.1145/ 96267.96279,doi:10.1145/96267.96279

work page doi:10.1145/96267.96279
[62]

A proactive approach to more secure code | MSRC Blog | Microsoft Security Response Center, 2019

MSRC Team. A proactive approach to more secure code | MSRC Blog | Microsoft Security Response Center, 2019. URL: https://msrc.microsoft.com/blog/2019/07/a- proactive-approach-to-more-secure-code/

work page 2019
[63]

What you corrupt is not what you crash: Challenges in fuzzing em- bedded devices

Marius Muench, Jan Stijohann, Frank Kargl, Aurelien Francillon, and Davide Balzarotti. What you corrupt is not what you crash: Challenges in fuzzing em- bedded devices. InNetwork and Distributed System Security Symposium. Internet Society, 2025. URL: https://www.ndss-symposium.org/wp-content/uploads/2018/ 02/ndss2018_01A-4_Muench_paper.pdf,doi:10.14722/nd...

work page doi:10.14722/ndss.2018.23166 2025
[64]

Full spatial and temporal memory safety for c

Santosh Nagarakatte. Full spatial and temporal memory safety for c. 22(4):30–

work page
[65]

URL: https://ieeexplore.ieee.org/abstract/document/10439147, doi:10.1109/ MSEC.2024.3363142

work page arXiv 2024
[66]

Muhammad A. A. Pirzada, Giles Reger, Ahmed Bhayat, and Lucas C. Cordeiro. LLM-generated invariants for bounded model checking without loop unrolling. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ASE ’24, pages 1395–1407. Association for Computing Machinery. URL: https://dl.acm.org/doi/10.1145/3691620.3695...

work page doi:10.1145/3691620.3695512
[67]

The ‘shift left’ principle.New Electronics, 54(14):18–21, 2021

Mark Pitchford. The ‘shift left’ principle.New Electronics, 54(14):18–21, 2021. Publisher: Mark Allen Group. URL: https://www.magonlinelibrary.com/doi/full/ 10.12968/S0047-9624%2822%2960234-7, doi:10.12968/S0047-9624(22)60234- 7

work page doi:10.12968/s0047-9624 2021
[68]

Lemix: Enabling testing of embedded applications as linux applications

Sai Ritvik Tanksalkar, Siddharth Muralee, Srihari Danduri, Paschal Amusuo, Antonio Bianchi, James C Davis, and Aravind Kumar Machiry. Lemix: Enabling testing of embedded applications as linux applications. In[USENIX Security’25] USENIX Security Symposium, 2025, pages arXiv–2503, 2025

work page 2025
[69]

Ernst, Jeff H

Brian Robinson, Michael D. Ernst, Jeff H. Perkins, Vinay Augustine, and Nuo Li. Scaling up automated test generation: Automatically generating maintainable regression unit tests for programs. In2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), pages 23–32. ISSN: 1938-4300. URL: https://ieeexplore.ieee.org/abstract/d...

work page doi:10.1109/ase 2011
[70]

SoK: Eternal war in memory

László Szekeres, Mathias Payer, Tao Wei, and Dawn Song. SoK: Eternal war in memory. In2013 IEEE Symposium on Security and Privacy, pages 48–62, 2025. ISSN: 1081-6011. URL: https://ieeexplore.ieee.org/abstract/document/6547101, doi:10.1109/SP.2013.13

work page doi:10.1109/sp.2013.13 2025
[71]

Memory-safety verification of open programs with angelic assump- tions

Gourav Takhar, Baldip Bijlani, Prantik Chatterjee, Akash Lal, and Subhajit Roy. Memory-safety verification of open programs with angelic assump- tions. 9:312:1119–312:1147. URL: https://dl.acm.org/doi/10.1145/3763090, doi: 10.1145/3763090

work page doi:10.1145/3763090
[72]

Cordeiro

Norbert Tihanyi, Yiannis Charalambous, Ridhi Jain, Mohamed Amine Ferrag, and Lucas C. Cordeiro. A new era in software security: Towards self-healing software via large language models and formal verification. In2025 IEEE/ACM International Conference on Automation of Software Test (AST), pages 136–147. IEEE Press. URL: https://dl.acm.org/doi/10.1109/AST666...

work page doi:10.1109/ast66626.2025.00020 2025
[73]

Taming the static analysis beast

John Toman and Dan Grossman. Taming the static analysis beast. In Ben- jamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi, editors,2nd Sum- mit on Advances in Programming Languages (SNAPL 2017), volume 71 ofLeib- niz International Proceedings in Informatics (LIPIcs), pages 18:1–18:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. URL: https:...

work page doi:10.4230/lipics.snapl.2017.18 2017
[74]

van Oorschot

Paul C. van Oorschot. Memory errors and memory safety: C as a case study. 21(2):70–76. URL: https://ieeexplore.ieee.org/abstract/document/10102611, doi: 10.1109/MSEC.2023.3236542

work page doi:10.1109/msec.2023.3236542 2023
[75]

UnsafeCop: Towards memory safety for real-world unsafe rust code with practical bounded model checking

Minghua Wang, Jingling Xue, Lin Huang, Yuan Zi, and Tao Wei. UnsafeCop: Towards memory safety for real-world unsafe rust code with practical bounded model checking. In Andre Platzer, Kristin Yvonne Rozier, Matteo Pradella, and Matteo Rossi, editors,Formal Methods, pages 307–324. Springer Nature Switzer- land, 2025.doi:10.1007/978-3-031-71177-0_19

work page doi:10.1007/978-3-031-71177-0_19 2025
[76]

Here comes the wave of insurance claims for the CrowdStrike outage, 2025

Lian Kit Wee. Here comes the wave of insurance claims for the CrowdStrike outage, 2025. URL: https://www.businessinsider.com/businesses-claiming-losses- crowdstrike-outage-insurance-billions-losses-cyber-policies-2024-7

work page 2025
[77]

Cordeiro

Tong Wu, Shale Xiong, Edoardo Manino, Gareth Stockwell, and Lucas C. Cordeiro. Verifying components of arm(r) confidential computing architecture with ESBMC,

work page
[78]

URL: https://arxiv.org/abs/2406.04375v1

work page arXiv
[79]

Ckgfuzzer: Llm-based fuzz driver generation enhanced by code knowledge graph,

Hanxiang Xu, Wei Ma, Ting Zhou, Yanjie Zhao, Kai Chen, Qiang Hu, Yang Liu, and Haoyu Wang. CKGFuzzer: LLM-based fuzz driver generation en- hanced by code knowledge graph. In2025 IEEE/ACM 47th International Confer- ence on Software Engineering: Companion Proceedings (ICSE-Companion), pages 243–254. URL: https://ieeexplore.ieee.org/abstract/document/1102425...

work page doi:10.1109/icse-companion66252.2025.00079 2025
[80]

Lorch, Shuai Lu, Fan Yang, Ziqiao Zhou, and Shan Lu

Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Jianan Yao, Weidong Cui, Yeyun Gong, Chris Hawblitzel, Shuvendu Lahiri, Jacob R. Lorch, Shuai Lu, Fan Yang, Ziqiao Zhou, and Shan Lu. AutoVerus: Automated proof generation for rust code, 2024. URL: https://arxiv.org/abs/2409.13082v1

work page arXiv 2024

Showing first 80 references.