pith. machine review for the scientific record. sign in

arxiv: 2605.10712 · v1 · submitted 2026-05-11 · 💻 cs.SE · cs.CR

Recognition: no theorem link

AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification

Authors on Pith no claims yet

Pith reviewed 2026-05-12 04:15 UTC · model grok-4.3

classification 💻 cs.SE cs.CR
keywords memory safetyunit proofsautomated verificationprogram synthesisLLMembedded systemssafety propertiescomponent verification
0
0 comments X

The pith

AutoSOUP automates generation of Safety-Oriented Unit Proofs for component memory-safety verification using three techniques and an LLM hybrid.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AutoSOUP as a way to automate component-level memory-safety verification in low-level software by creating unit proofs that capture key verification decisions. These proofs encode choices such as the scope of analysis, loop bounds, and environment models needed to prove or disprove safety properties. The authors define three automatic derivation techniques and combine them with a hybrid LLM-As-Function-Call architecture that pairs deterministic program synthesis with language models to produce justifiable results. If the approach holds, it would lower the expertise barrier that currently limits formal verification in embedded systems and similar domains where dynamic checks are hard to apply.

Core claim

AutoSOUP formalizes Safety-Oriented Unit Proofs as artifacts that encode verification choices for memory-safety properties and derives them automatically through three techniques augmented by an LLM-As-Function-Call hybrid that integrates deterministic program synthesis with large language models to generate sound and justifiable proofs for individual components.

What carries the argument

Safety-Oriented Unit Proofs: artifacts that encode verification choices including scope, loop bounds, and environment models to enable proving absence of memory-safety violations in software components.

If this is right

  • Verification of memory safety for individual components can proceed without manual construction of proofs by experts.
  • Generated proofs can both confirm safety and surface actual vulnerabilities in the components under analysis.
  • The proofs carry explicit characterizations of their assumptions and the guarantees they provide.
  • Memory-safety assurance becomes feasible for embedded systems where hardware protections are limited.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same hybrid synthesis-plus-LLM pattern could be applied to automate proofs for other classes of properties such as functional correctness or concurrency safety.
  • Integration into build pipelines would allow repeated checks on evolving components without re-investment of manual effort each time.
  • Performance on larger codebases could be measured by applying the system to established open embedded projects and counting how many components receive complete proofs.

Load-bearing premise

The three automatic techniques together with the LLM-As-Function-Call hybrid can reliably produce unit proofs whose encoded verification choices are both sufficient and sound for the target components.

What would settle it

A concrete case in which an AutoSOUP-generated unit proof asserts memory safety for a component that later exhibits a violation under targeted testing or in which the proof misses a known vulnerability present in the original code.

Figures

Figures reproduced from arXiv: 2605.10712 by Aravind Machiry, Dharun Anandayuvaraj, Elijah Jorgensen, James C. Davis, Kevin Kolyakov, Paschal C. Amusuo, Ricardo Calvo, Taylor Le Lievre.

Figure 1
Figure 1. Figure 1: AutoSOUP automatically identifies the functions [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: AutoSOUP system diagram. AutoSOUP connects safety-oriented unit-proof construction with LLM-as-function-call [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Distribution of program reachable LOC, verification [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparing AutoSOUP’s unit proofs with expert [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
read the original abstract

Memory-safety errors remain a persistent source of zero-day vulnerabilities in low-level software. The problem is especially acute in embedded systems, where hardware protections are often limited and dynamic analysis is difficult to apply effectively. Memory-safety verification can provide stronger assurance by proving the absence of such errors or exposing violations when they exist. However, current verification workflows remain largely manual and require substantial specialized expertise, limiting their adoption in practice. We present AutoSOUP, a system for automating component-level memory-safety verification through Safety-Oriented Unit Proofs. We formalize these unit proofs as artifacts that encode verification choices (scope, loop bounds, and environment models) for verifying safety properties, and introduce three techniques for deriving them automatically. To overcome the limitations of existing automation approaches, we further introduce LLM-As-Function-Call, a hybrid architecture that combines deterministic program synthesis with LLMs to automate these techniques and produce justifiable unit proofs. We evaluate AutoSOUP by assessing its ability to automate memory-safety verification and expose vulnerabilities in verified components, and we characterize the assumptions and guarantees of the resulting proofs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents AutoSOUP, a system for automating component-level memory-safety verification through Safety-Oriented Unit Proofs. These proofs are formalized as artifacts encoding verification choices including scope, loop bounds, and environment models. The authors introduce three automatic techniques for deriving such proofs and a hybrid LLM-As-Function-Call architecture that combines deterministic program synthesis with LLMs to generate justifiable unit proofs. The evaluation assesses the system's ability to automate verification and expose vulnerabilities in components while characterizing the assumptions and guarantees of the resulting proofs.

Significance. If the results hold, AutoSOUP could meaningfully lower the barrier to memory-safety verification in embedded and low-level systems by reducing manual effort. The hybrid architecture and explicit characterization of assumptions/guarantees are positive contributions that address limitations of pure automation or standalone LLM use. Credit is due for focusing on justifiable proofs rather than black-box generation. However, the overall significance hinges on whether the generated proofs are demonstrably sound; without that, the automation benefit is limited.

major comments (2)
  1. [Evaluation] Evaluation section: the description of assessing 'ability to automate and expose vulnerabilities' does not include independent soundness validation of the generated unit proofs (e.g., manual inspection of chosen scopes/loop bounds/environment models or comparison to ground-truth proofs). This is load-bearing for the central claim that the proofs are justifiable and support reliable safety verification without false negatives.
  2. [LLM-As-Function-Call architecture] LLM-As-Function-Call architecture (likely §4 or equivalent): while the hybrid design is introduced, the manuscript must show how the deterministic synthesis component constrains LLM-proposed choices (loop bounds, environment models) to prevent unsound outputs; the abstract's claim that assumptions/guarantees are characterized does not substitute for this mechanism or its empirical validation.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by including at least one quantitative result (e.g., number of components handled or success rate) to give readers an immediate sense of scale.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The comments highlight important aspects of our evaluation and architecture that we will clarify and strengthen in the revision. We address each major comment below.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the description of assessing 'ability to automate and expose vulnerabilities' does not include independent soundness validation of the generated unit proofs (e.g., manual inspection of chosen scopes/loop bounds/environment models or comparison to ground-truth proofs). This is load-bearing for the central claim that the proofs are justifiable and support reliable safety verification without false negatives.

    Authors: We agree that explicit independent soundness validation would strengthen the evaluation and better support the claim of justifiable proofs. Our current evaluation demonstrates successful automation and vulnerability exposure while characterizing assumptions and guarantees of the generated proofs. In the revision, we will add a dedicated subsection to the evaluation that includes manual inspection of a representative sample of generated unit proofs. This will cover verification of chosen scopes, loop bounds, and environment models, with comparisons to ground-truth proofs where available, to confirm the absence of false negatives. revision: yes

  2. Referee: [LLM-As-Function-Call architecture] LLM-As-Function-Call architecture (likely §4 or equivalent): while the hybrid design is introduced, the manuscript must show how the deterministic synthesis component constrains LLM-proposed choices (loop bounds, environment models) to prevent unsound outputs; the abstract's claim that assumptions/guarantees are characterized does not substitute for this mechanism or its empirical validation.

    Authors: The LLM-As-Function-Call architecture combines deterministic program synthesis with LLMs specifically to produce justifiable proofs by constraining LLM outputs. We will revise the description in §4 to explicitly detail the constraining mechanisms, including how the synthesis component validates and enforces soundness on LLM-proposed loop bounds and environment models (e.g., by rejecting invalid or unsound choices via static checks). We will also add empirical validation through additional case-study results demonstrating that the generated proofs introduce no false negatives, thereby supporting the characterization of assumptions and guarantees. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper presents AutoSOUP as an engineering system that formalizes Safety-Oriented Unit Proofs and combines deterministic program synthesis with LLMs to generate them. No equations, fitted parameters, or self-referential definitions appear in the abstract or description. The central claims rest on an evaluation of automation capability and vulnerability exposure rather than any derivation that reduces to its own inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text, so the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; full text would be required to populate this ledger.

pith-pipeline@v0.9.0 · 5521 in / 1141 out tokens · 34870 ms · 2026-05-12T04:15:39.364345+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · 1 internal anchor

  1. [1]

    URL: https://docs.litellm.ai/docs/

    LiteLLM - getting started | liteLLM. URL: https://docs.litellm.ai/docs/

  2. [2]

    original-date: 2020-10-25T23:23:54Z

    openai/openai-python. original-date: 2020-10-25T23:23:54Z. URL: https://github. com/openai/openai-python

  3. [3]

    URL: https://www.computer.org/csdl/ magazine/co/2001/01/r1135/13rRUwgyOg9

    Software defect reduction top 10 list. URL: https://www.computer.org/csdl/ magazine/co/2001/01/r1135/13rRUwgyOg9

  4. [4]

    URL: https://www.swebench.com/index.html

    SWE-bench leaderboards. URL: https://www.swebench.com/index.html

  5. [5]

    Second edition

    IEC 61508-3: Functional safety of electrical/electronic/programmable electronic safety-related systems – part 3: Software requirements, 2010. Second edition

  6. [6]

    URL: https: //www.crowdstrike.com/en-us/blog/channel-file-291-rca-available/

    Channel File 291 Incident RCA is Available | CrowdStrike, 2025. URL: https: //www.crowdstrike.com/en-us/blog/channel-file-291-rca-available/

  7. [7]

    DARPA Guide for Formal Methods to Deliver Resilient Systems for Proposals,

  8. [8]

    URL: https://defencescienceinstitute.com/wp-content/uploads/2025/01/ Resilient_Systems_Best_Practices_Guide_-_1-9-2025.pdf

  9. [9]

    In the Wild

    Project Zero: 0day "In the Wild", 2025. URL: https://googleprojectzero.blogspot. com/p/0day.html

  10. [10]

    Challenges in de- signing exploit mitigations for deeply embedded systems

    Ali Abbasi, Jos Wetzels, Thorsten Holz, and Sandro Etalle. Challenges in de- signing exploit mitigations for deeply embedded systems. In2019 IEEE Euro- pean Symposium on Security and Privacy (EuroS&P), pages 31–46, 2025. URL: https://ieeexplore.ieee.org/abstract/document/8806725, doi:10.1109/EuroSP. 2019.00013

  11. [11]

    Securing tomorrow’s software: the need for memory safety standards, 2025

    Alex Rebert, Ben Laurie, Murali Vijayaraghavan, and Alex Richardson. Securing tomorrow’s software: the need for memory safety standards, 2025. URL: https:// security.googleblog.com/2025/02/securing-tomorrows-software-need-for.html

  12. [12]

    Safer with Google: Advancing Memory Safety, 2024

    Alex Rebert, Chandler Carruth, Jen Engel, and Andy Qin. Safer with Google: Advancing Memory Safety, 2024. URL: https://security.googleblog.com/2024/10/ safer-with-google-advancing-memory.html

  13. [13]

    Amusuo, Owen Cochell, Taylor Le Lievre, Parth V

    Paschal C. Amusuo, Owen Cochell, Taylor Le Lievre, Parth V. Patil, Aravind Machiry, and James C. Davis. Do unit proofs work? an empirical study of composi- tional bounded model checking for memory safety verification. In2026 IEEE/ACM 48th International Conference on Software Engineering. URL: http://arxiv.org/abs/ 2503.13762,arXiv:2503.13762[cs],doi:10.48...

  14. [14]

    Amusuo, Dongge Liu, Ricardo Andres Calvo Mendez, Jonathan Met- zman, Oliver Chang, and James C

    Paschal C. Amusuo, Dongge Liu, Ricardo Andres Calvo Mendez, Jonathan Met- zman, Oliver Chang, and James C. Davis. FalseCrashReducer: Mitigating false positive crashes in OSS-fuzz-gen using agentic AI. URL: http://arxiv.org/abs/ 2510.02185,arXiv:2510.02185[cs],doi:10.48550/arXiv.2510.02185

  15. [15]

    Log parsing: How far can chatgpt go? InProceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, ASE ’23, page 1699–1704

    Paschal C. Amusuo, Ricardo Andrés Calvo Méndez, Zhongwei Xu, Aravind Machiry, and James C. Davis. Systematically detecting packet validation vul- nerabilities in embedded network stacks. In2023 38th IEEE/ACM Interna- tional Conference on Automated Software Engineering (ASE), pages 926–938, 2023. ISSN: 2643-1572. URL: https://ieeexplore.ieee.org/abstract/d...

  16. [16]

    Mousavi Engineering: New Ideas and Emerging Results (ICSE-NIER)

    Paschal C. Amusuo, Parth V. Patil, Owen Cochell, Taylor Le Lievre, and James C. Davis. A unit proofing framework for code-level verification: A research agenda. In2025 IEEE/ACM 47th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pages 36–40. URL: https://ieeexplore.ieee.org/ abstract/document/11023946,doi:10....

  17. [17]

    Arthur Azevedo de Amorim, Cătălin Hriţcu, and Benjamin C. Pierce. The meaning of memory safety. In Lujo Bauer and Ralf Küsters, editors,Principles of Security and Trust, pages 79–105. Springer International Publishing. doi:10.1007/978- 3-319-89722-6_4

  18. [18]

    Bensalem, M

    S. Bensalem, M. Bozga, T.-H. Nguyen, and J. Sifakis. Compositional verification for component-based systems and application.IET Software, 4(3):181–193, 2010. Publisher: IET Digital Library. URL: https://digital-library.theiet.org/content/ journals/10.1049/iet-sen.2009.0011,doi:10.1049/iet-sen.2009.0011

  19. [19]

    Protocol-aware firmware rehosting for effective fuzzing of embedded network stacks

    Moritz Bley, Tobias Scharnowski, Simon Wörner, Moritz Schloegel, and Thorsten Holz. Protocol-aware firmware rehosting for effective fuzzing of embedded network stacks. InProceedings of the 2025 ACM SIGSAC Conference on Com- puter and Communications Security, CCS ’25, pages 4484–4498. Association for Computing Machinery. URL: https://dl.acm.org/doi/10.1145...

  20. [20]

    The Urgent Need for Memory Safety in Software Products | CISA, September 2025

    Bob Lord. The Urgent Need for Memory Safety in Software Products | CISA, September 2025. URL: https://www.cisa.gov/news-events/news/urgent-need- memory-safety-software-products

  21. [21]

    Infer: An automatic program verifier for memory safety of c programs

    Cristiano Calcagno and Dino Distefano. Infer: An automatic program verifier for memory safety of c programs. In Mihaela Bobaru, Klaus Havelund, Gerard J. Holzmann, and Rajeev Joshi, editors,NASA Formal Methods, pages 459–465. Springer.doi:10.1007/978-3-642-20398-5_33

  22. [22]

    Compo- sitional shape analysis by means of bi-abduction

    Cristiano Calcagno, Dino Distefano, Peter O’Hearn, and Hongseok Yang. Compo- sitional shape analysis by means of bi-abduction. InProceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL ’09, pages 289–300. Association for Computing Machinery, 2025. URL: https: //dl.acm.org/doi/10.1145/1480881.1480917,doi:10....

  23. [23]

    Static detection of packet injection vulnerabilities: A case for identifying attacker-controlled implicit information leaks

    Qi Alfred Chen, Zhiyun Qian, Yunhan Jack Jia, Yuru Shao, and Zhuoqing Morley Mao. Static detection of packet injection vulnerabilities: A case for identifying attacker-controlled implicit information leaks. InProceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pages 388–400. Association for Computing Machinery, ...

  24. [24]

    BLITZ: Compositional bounded model checking for real-world programs

    Chia Yuan Cho, Vijay D’Silva, and Dawn Song. BLITZ: Compositional bounded model checking for real-world programs. In2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 136–146, 2013. URL: https://ieeexplore.ieee.org/abstract/document/6693074, doi:10.1109/ASE.2013. 6693074

  25. [25]

    Monteiro, Daniel Schwartz-Narbonne, Serdar Tasiran, Michael Tautschnig, and Mark R

    Nathan Chong, Byron Cook, Konstantinos Kallas, Kareem Khazem, Felipe R. Monteiro, Daniel Schwartz-Narbonne, Serdar Tasiran, Michael Tautschnig, and Mark R. Tuttle. Code-level model checking in the software development work- flow. InProceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEI...

  26. [26]

    Bounded model checking using satisfiability solving.Formal Methods in System Design, 19(1):7–34, 2001.doi:10.1023/A:1011276507260

    Edmund Clarke, Armin Biere, Richard Raimi, and Yunshan Zhu. Bounded model checking using satisfiability solving.Formal Methods in System Design, 19(1):7–34, 2001.doi:10.1023/A:1011276507260

  27. [27]

    Cobleigh, Dimitra Giannakopoulou & Corina S

    Jamieson M. Cobleigh, Dimitra Giannakopoulou, and Corina S. PĂsĂreanu. Learn- ing assumptions for compositional verification. In Hubert Garavel and John Hatcliff, editors,Tools and Algorithms for the Construction and Analysis of Systems, pages 331–346. Springer, 2003.doi:10.1007/3-540-36577-X_24

  28. [28]

    External technical root cause analysis — channel file

    CrowdStrike, Inc. External technical root cause analysis — channel file

  29. [29]

    Accessed January 2026

    Technical report, CrowdStrike, August 2024. Accessed January 2026. URL: https://www.crowdstrike.com/wp-content/uploads/2024/08/Channel-File- 291-Incident-Root-Cause-Analysis-08.06.2024.pdf

  30. [30]

    Modeling readability to improve unit tests

    Ermira Daka, José Campos, Gordon Fraser, Jonathan Dorn, and Westley Weimer. Modeling readability to improve unit tests. InProceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 107–118. Association for Computing Machinery. URL: https://dl.acm.org/doi/10.1145/ 2786805.2786838,doi:10.1145/2786805.2786838

  31. [31]

    Lahiri, Akash Lal, and Yi Li

    Ankush Das, Shuvendu K. Lahiri, Akash Lal, and Yi Li. Angelic verification: Precise verification modulo unknowns. In Daniel Kroening and Corina S. Păsăre- anu, editors,Computer Aided Verification, pages 324–342. Springer International Publishing, 2015.doi:10.1007/978-3-319-21690-4_19

  32. [32]

    A guide to stakeholder analysis for cybersecurity researchers.arXiv preprint arXiv:2508.14796, 2025

    James C Davis, Sophie Chen, Huiyun Peng, Paschal C Amusuo, and Kelechi G Kalu. A guide to stakeholder analysis for cybersecurity researchers.arXiv preprint arXiv:2508.14796, 2025

  33. [33]

    Dino Distefano, Manuel Fähndrich, Francesco Logozzo, and Peter W. O’Hearn. Scaling static analyses at facebook.Communications of the ACM, 62(8):62–70,

  34. [34]

    URL: https://dl.acm.org/doi/10.1145/3338112,doi:10.1145/3338112

  35. [35]

    SoK: Enabling security analyses of embedded systems via rehosting

    Andrew Fasano, Tiemoko Ballo, Marius Muench, Tim Leek, Alexander Bulekov, Brendan Dolan-Gavitt, Manuel Egele, Aurélien Francillon, Long Lu, Nick Gregory, Davide Balzarotti, and William Robertson. SoK: Enabling security analyses of embedded systems via rehosting. InProceedings of the 2021 ACM Asia Conference on Computer and Communications Security, ASIA CC...

  36. [36]

    {P2IM}: Scalable and hardware- independent firmware testing via automatic peripheral interface modeling

    Bo Feng, Alejandro Mera, and Long Lu. {P2IM}: Scalable and hardware- independent firmware testing via automatic peripheral interface modeling. In 29th USENIX Security Symposium (USENIX Security 20), pages 1237–1254, 2020. URL: https://www.usenix.org/conference/usenixsecurity20/presentation/feng

  37. [37]

    Deductive software verification

    Jean-Christophe Filliâtre. Deductive software verification. 13(5):397–403. doi: 10.1007/s10009-011-0211-0

  38. [38]

    In: Chandra, S., Blincoe, K., Tonella, P

    Emily First, Markus N. Rabe, Talia Ringer, and Yuriy Brun. Baldur: Whole- proof generation and repair with large language models. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, pages 1229–1241. Association for Computing Machinery, 2023. URL: https://dl.a...

  39. [39]

    An em- pirical study on the correctness of formally verified distributed systems

    Pedro Fonseca, Kaiyuan Zhang, Xi Wang, and Arvind Krishnamurthy. An em- pirical study on the correctness of formally verified distributed systems. In Proceedings of the Twelfth European Conference on Computer Systems, EuroSys ’17, pages 328–343. Association for Computing Machinery, 2017. URL: https: //dl.acm.org/doi/10.1145/3064176.3064183,doi:10.1145/306...

  40. [40]

    Mikhail Y. R. Gadelha, Hussama I. Ismail, and Lucas C. Cordeiro. Handling loops in bounded model checking of c programs via k-induction. 19(1):97–114. doi:10.1007/s10009-015-0407-9

  41. [41]

    What is memory safety? - the PL enthusiast

    Michael Hicks. What is memory safety? - the PL enthusiast. URL: http://www.pl- enthusiast.net/2014/07/21/memory-safety/

  42. [42]

    Lessons from Formally Verified Deployed Software.CoRR, March 2023

    Li Huang, Sophie Ebersold, Alexander Kogtenkov, Bertrand Meyer, and Yinling Liu. Lessons from Formally Verified Deployed Software.CoRR, March 2023. URL: http://arxiv.org/abs/2301.02206,doi:10.48550/arXiv.2301.02206

  43. [43]

    In: 26th IEEE/ACM Interna- tional Conference on Automated Software Engineering, ASE 2011

    Franjo Ivančić, Gogul Balakrishnan, Aarti Gupta, Sriram Sankaranarayanan, Naoto Maeda, Hiroki Tokuoka, Takashi Imoto, and Yoshiaki Miyazaki. DC2: A framework for scalable, scope-bounded software verification. In2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), pages 133–142. ISSN: 1938-4300. URL: https://ieeexplore....

  44. [44]

    Ganai, Aarti Gupta, and Pranav Ashar

    Franjo Ivančić, Zijiang Yang, Malay K. Ganai, Aarti Gupta, and Pranav Ashar. Efficient SAT-based bounded model checking for software verification. 404(3):256–

  45. [45]

    URL: https://www.sciencedirect.com/science/article/pii/S0304397508002223, doi:10.1016/j.tcs.2008.03.013

  46. [46]

    Artemis: Toward accurate detection of server-side request forgeries through LLM-assisted inter-procedural path-sensitive taint analysis

    Yuchen Ji, Ting Dai, Zhichao Zhou, Yutian Tang, and Jingzhu He. Artemis: Toward accurate detection of server-side request forgeries through LLM-assisted inter-procedural path-sensitive taint analysis. 9:128:1349–128:1377. URL: https: //dl.acm.org/doi/10.1145/3720488,doi:10.1145/3720488

  47. [47]

    Getting started - the kani rust verifier, 2024

    kani. Getting started - the kani rust verifier, 2024. URL: https://model-checking. github.io/kani/

  48. [48]

    Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

    Saketh Ram Kasibatla, Arpan Agarwal, Yuriy Brun, Sorin Lerner, Talia Ringer, and Emily First. Cobblestone: A divide-and-conquer approach for automating formal verification. URL: https://arxiv.org/abs/2410.19940v4, doi:10.1145/3744916. 3773178

  49. [49]

    Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., and Winwood, S

    Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: formal verification of an OS kernel. InProceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP ’09, pages 207–220. Ass...

  50. [50]

    CBMC – c bounded model checker

    Daniel Kroening and Michael Tautschnig. CBMC – c bounded model checker. In Erika Ábrahám and Klaus Havelund, editors,Tools and Algorithms for the Construction and Analysis of Systems, pages 389–391. Springer, 2014. doi:10. 1007/978-3-642-54862-8_26

  51. [51]

    New EU product liability directive comes into force

    Latham & Watkins LLP. New EU product liability directive comes into force. Client alert / legal briefing, December 2024. Accessed January

  52. [52]

    URL: https://www.lw.com/admin/upload/SiteAttachments/New-EU- Product-Liability-Directive-Comes-Into-Force.pdf

  53. [53]

    Can LLMs reason about program se- mantics? a comprehensive evaluation of LLMs on formal specification inference

    Thanh Le-Cong, Bach Le, and Toby Murray. Can LLMs reason about program se- mantics? a comprehensive evaluation of LLMs on formal specification inference. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Proceedings of the 63rd Annual Meeting of the Association for Com- putational Linguistics (Volume 1: Long Papers), ...

  54. [54]

    Assessing the impact of interface vulnerabilities in compart- mentalized software

    Hugo Lefeuvre, Vlad-Andrei Bădoiu, Yi Chen, Felipe Huici, Nathan Dautenhahn, and Pierre Olivier. Assessing the impact of interface vulnerabilities in compart- mentalized software. InProceedings 2023 Network and Distributed System Security Symposium. Internet Society.doi:10.14722/ndss.2023.24117

  55. [55]

    Memory state verification based on inductive and deductive reasoning

    Shaofeng Li, Lei Qiao, and Mengfei Yang. Memory state verification based on inductive and deductive reasoning. 70(3):1026–1039. URL: https://ieeexplore.ieee. org/abstract/document/9435092,doi:10.1109/TR.2021.3074709

  56. [56]

    IRIS: LLM-assisted static analysis for detecting security vulnerabilities

    Ziyang Li, Saikat Dutta, and Mayur Naik. IRIS: LLM-assisted static analysis for detecting security vulnerabilities. In Y. Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors,International Conference on Learning Representations, volume 2025, pages 35735–35758. URL: https://proceedings.iclr.cc/paper_files/paper/2025/file/ 582d4e27fa24168f3af1f4582655034b-Pape...

  57. [57]

    BaseSAFE: baseband sanitized fuzzing through emulation

    Dominik Maier, Lukas Seidel, and Shinjo Park. BaseSAFE: baseband sanitized fuzzing through emulation. InProceedings of the 13th ACM Conference on Security and Privacy in Wireless and Mobile Networks, WiSec ’20, pages 122–132. Associ- ation for Computing Machinery. URL: https://dl.acm.org/doi/10.1145/3395351. 3399360,doi:10.1145/3395351.3399360

  58. [58]

    CBMC: goto-cc, 2024

    Martin Brain. CBMC: goto-cc, 2024. URL: https://diffblue.github.io/cbmc/group_ _goto-cc.html

  59. [59]

    Validation obli- gations: A novel approach to check compliance between requirements and their formal specification

    Atif Mashkoor, Michael Leuschel, and Alexander Egyed. Validation obli- gations: A novel approach to check compliance between requirements and their formal specification. In2021 IEEE/ACM 43rd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pages 1–

  60. [60]

    URL: https://ieeexplore.ieee.org/abstract/document/9402243, doi:10.1109/ ICSE-NIER52604.2021.00009

  61. [61]

    Miller, Lars Fredriksen, and Bryan So

    Barton P. Miller, Lars Fredriksen, and Bryan So. An empirical study of the reliability of UNIX utilities. 33(12):32–44. URL: https://dl.acm.org/doi/10.1145/ 96267.96279,doi:10.1145/96267.96279

  62. [62]

    A proactive approach to more secure code | MSRC Blog | Microsoft Security Response Center, 2019

    MSRC Team. A proactive approach to more secure code | MSRC Blog | Microsoft Security Response Center, 2019. URL: https://msrc.microsoft.com/blog/2019/07/a- proactive-approach-to-more-secure-code/

  63. [63]

    What you corrupt is not what you crash: Challenges in fuzzing em- bedded devices

    Marius Muench, Jan Stijohann, Frank Kargl, Aurelien Francillon, and Davide Balzarotti. What you corrupt is not what you crash: Challenges in fuzzing em- bedded devices. InNetwork and Distributed System Security Symposium. Internet Society, 2025. URL: https://www.ndss-symposium.org/wp-content/uploads/2018/ 02/ndss2018_01A-4_Muench_paper.pdf,doi:10.14722/nd...

  64. [64]

    Full spatial and temporal memory safety for c

    Santosh Nagarakatte. Full spatial and temporal memory safety for c. 22(4):30–

  65. [65]

    URL: https://ieeexplore.ieee.org/abstract/document/10439147, doi:10.1109/ MSEC.2024.3363142

  66. [66]

    Muhammad A. A. Pirzada, Giles Reger, Ahmed Bhayat, and Lucas C. Cordeiro. LLM-generated invariants for bounded model checking without loop unrolling. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ASE ’24, pages 1395–1407. Association for Computing Machinery. URL: https://dl.acm.org/doi/10.1145/3691620.3695...

  67. [67]

    The ‘shift left’ principle.New Electronics, 54(14):18–21, 2021

    Mark Pitchford. The ‘shift left’ principle.New Electronics, 54(14):18–21, 2021. Publisher: Mark Allen Group. URL: https://www.magonlinelibrary.com/doi/full/ 10.12968/S0047-9624%2822%2960234-7, doi:10.12968/S0047-9624(22)60234- 7

  68. [68]

    Lemix: Enabling testing of embedded applications as linux applications

    Sai Ritvik Tanksalkar, Siddharth Muralee, Srihari Danduri, Paschal Amusuo, Antonio Bianchi, James C Davis, and Aravind Kumar Machiry. Lemix: Enabling testing of embedded applications as linux applications. In[USENIX Security’25] USENIX Security Symposium, 2025, pages arXiv–2503, 2025

  69. [69]

    Ernst, Jeff H

    Brian Robinson, Michael D. Ernst, Jeff H. Perkins, Vinay Augustine, and Nuo Li. Scaling up automated test generation: Automatically generating maintainable regression unit tests for programs. In2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), pages 23–32. ISSN: 1938-4300. URL: https://ieeexplore.ieee.org/abstract/d...

  70. [70]

    SoK: Eternal war in memory

    László Szekeres, Mathias Payer, Tao Wei, and Dawn Song. SoK: Eternal war in memory. In2013 IEEE Symposium on Security and Privacy, pages 48–62, 2025. ISSN: 1081-6011. URL: https://ieeexplore.ieee.org/abstract/document/6547101, doi:10.1109/SP.2013.13

  71. [71]

    Memory-safety verification of open programs with angelic assump- tions

    Gourav Takhar, Baldip Bijlani, Prantik Chatterjee, Akash Lal, and Subhajit Roy. Memory-safety verification of open programs with angelic assump- tions. 9:312:1119–312:1147. URL: https://dl.acm.org/doi/10.1145/3763090, doi: 10.1145/3763090

  72. [72]

    Cordeiro

    Norbert Tihanyi, Yiannis Charalambous, Ridhi Jain, Mohamed Amine Ferrag, and Lucas C. Cordeiro. A new era in software security: Towards self-healing software via large language models and formal verification. In2025 IEEE/ACM International Conference on Automation of Software Test (AST), pages 136–147. IEEE Press. URL: https://dl.acm.org/doi/10.1109/AST666...

  73. [73]

    Taming the static analysis beast

    John Toman and Dan Grossman. Taming the static analysis beast. In Ben- jamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi, editors,2nd Sum- mit on Advances in Programming Languages (SNAPL 2017), volume 71 ofLeib- niz International Proceedings in Informatics (LIPIcs), pages 18:1–18:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. URL: https:...

  74. [74]

    van Oorschot

    Paul C. van Oorschot. Memory errors and memory safety: C as a case study. 21(2):70–76. URL: https://ieeexplore.ieee.org/abstract/document/10102611, doi: 10.1109/MSEC.2023.3236542

  75. [75]

    UnsafeCop: Towards memory safety for real-world unsafe rust code with practical bounded model checking

    Minghua Wang, Jingling Xue, Lin Huang, Yuan Zi, and Tao Wei. UnsafeCop: Towards memory safety for real-world unsafe rust code with practical bounded model checking. In Andre Platzer, Kristin Yvonne Rozier, Matteo Pradella, and Matteo Rossi, editors,Formal Methods, pages 307–324. Springer Nature Switzer- land, 2025.doi:10.1007/978-3-031-71177-0_19

  76. [76]

    Here comes the wave of insurance claims for the CrowdStrike outage, 2025

    Lian Kit Wee. Here comes the wave of insurance claims for the CrowdStrike outage, 2025. URL: https://www.businessinsider.com/businesses-claiming-losses- crowdstrike-outage-insurance-billions-losses-cyber-policies-2024-7

  77. [77]

    Cordeiro

    Tong Wu, Shale Xiong, Edoardo Manino, Gareth Stockwell, and Lucas C. Cordeiro. Verifying components of arm(r) confidential computing architecture with ESBMC,

  78. [78]

    URL: https://arxiv.org/abs/2406.04375v1

  79. [79]

    Ckgfuzzer: Llm-based fuzz driver generation enhanced by code knowledge graph,

    Hanxiang Xu, Wei Ma, Ting Zhou, Yanjie Zhao, Kai Chen, Qiang Hu, Yang Liu, and Haoyu Wang. CKGFuzzer: LLM-based fuzz driver generation en- hanced by code knowledge graph. In2025 IEEE/ACM 47th International Confer- ence on Software Engineering: Companion Proceedings (ICSE-Companion), pages 243–254. URL: https://ieeexplore.ieee.org/abstract/document/1102425...

  80. [80]

    Lorch, Shuai Lu, Fan Yang, Ziqiao Zhou, and Shan Lu

    Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Jianan Yao, Weidong Cui, Yeyun Gong, Chris Hawblitzel, Shuvendu Lahiri, Jacob R. Lorch, Shuai Lu, Fan Yang, Ziqiao Zhou, and Shan Lu. AutoVerus: Automated proof generation for rust code, 2024. URL: https://arxiv.org/abs/2409.13082v1

Showing first 80 references.