From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification
Pith reviewed 2026-05-08 11:15 UTC · model grok-4.3
The pith
Open-weight LLMs achieve high rates of formal verification when generating code from natural language using structured prompting and Dafny feedback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By supplying structural signatures of the target code and allowing models to iteratively repair their outputs using feedback from the Dafny verifier, open-weight LLMs can produce both implementations and annotations that satisfy formal proofs on the NL2VC-60 problems, with functional correctness further checked by uDebug validation. Contextless prompting yields near-zero success, while the combined strategy yields verification rates above 80 percent for several models.
What carries the argument
The tiered prompting strategy of contextless prompts, signature prompts supplying structural anchors, and self-healing prompts that incorporate iterative feedback from the Dafny verifier, applied to the NL2VC-60 dataset and cross-checked with uDebug functional tests.
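The self-healing tier can be sketched as a simple generate-verify-repair loop. This is a minimal illustration under our own assumptions, not the authors' implementation: `generate` stands in for an LLM call and `verify` for a run of the Dafny verifier, and the feedback format is invented for the sketch.

```python
# Hypothetical sketch of a self-healing prompting loop. `generate` and
# `verify` are stand-ins for an LLM call and a Dafny invocation; the
# paper's actual interface and feedback format may differ.

def self_heal(generate, verify, signature, max_rounds=5):
    """Regenerate code from a structural signature until it verifies,
    feeding verifier error messages back into the next prompt."""
    feedback = None
    for _ in range(max_rounds):
        candidate = generate(signature, feedback)  # LLM call (stubbed)
        ok, errors = verify(candidate)             # verifier run (stubbed)
        if ok:
            return candidate                       # verified output
        feedback = errors                          # repair hint for next round
    return None                                    # budget exhausted
```

With stubbed components, one round of feedback suffices for a model that repairs its output once it sees the verifier's error.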
If this is right
- Contextless prompting produces near-universal failure on verifiable code generation.
- Signature and self-healing prompting produce large gains, reaching 90.91 percent verification success for Gemma 4-31B and 81.82 percent for GPT-OSS 120B.
- Open-weight LLMs become practical tools for synthesizing the annotations needed for high-assurance software.
- Formal verification can be integrated directly into the natural-language-to-code workflow rather than applied after the fact.
Where Pith is reading between the lines
- Teams without formal-methods expertise could use the same loop to draft reliable implementations for safety-critical modules.
- The method might extend to larger codebases if the prompting patterns generalize beyond single-algorithm problems.
- Because the models are open-weight, the approach could be deployed locally or fine-tuned on organization-specific verification tasks.
Load-bearing premise
The 11 randomly selected problems represent the general class of complex algorithmic tasks, and uDebug tests combined with Dafny proofs reliably exclude trivial or vacuous specifications.
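The vacuity concern can be made concrete with a toy example of our own (not from the paper): a postcondition as weak as Dafny's `ensures true` is satisfied by any implementation, so the verifier alone cannot reject a wrong program, whereas a uDebug-style functional test can.

```python
# Illustration (ours, not the paper's) of a vacuous specification:
# the trivial spec accepts a broken implementation that a functional
# test immediately rejects.

def vacuous_spec(inputs, output):
    return True  # analogue of `ensures true`: holds for any output

def real_spec(inputs, output):
    return output == max(inputs)  # the intended postcondition

def broken_max(xs):
    return xs[0]  # wrong, yet "verifies" against the vacuous spec

xs = [1, 5, 3]
assert vacuous_spec(xs, broken_max(xs))   # verifier-style check passes
assert not real_spec(xs, broken_max(xs))  # functional test catches the bug
```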
What would settle it
A new collection of algorithmic problems on which even signature and self-healing prompts produce low verification rates, or cases where Dafny accepts the output yet uDebug reveals functional errors.
Original abstract
Large Language Models (LLMs) show promise in automated software engineering, yet their guarantee of correctness is frequently undermined by erroneous or hallucinated code. To enforce model honesty, formal verification requires LLMs to synthesize implementation logic alongside formal specifications that are subsequently proven correct by a mathematical verifier. However, the transition from informal natural language to precise formal specification remains an arduous task. Our work addresses this by providing the NaturalLanguage2VerifiedCode (NL2VC)-60 dataset: a collection of 60 complex algorithmic problems. We evaluate 11 randomly selected problem sets across seven open-weight LLMs using a tiered prompting strategy: contextless prompts, signature prompts providing structural anchors, and self-healing prompts utilizing iterative feedback from the Dafny verifier. To address vacuous verification, where models satisfy verifiers with trivial specifications, we integrate the uDebug platform to ensure functional validation. Our results show that while contextless prompting leads to near-universal failure, structural signatures and iterative self-healing facilitate a dramatic performance turnaround. Specifically, Gemma 4-31B achieved a 90.91% verification success rate, while GPT-OSS 120B rose from zero to 81.82% success with signature-guided feedback. These findings indicate that formal verification is now attainable for open-weight LLMs, which serve as effective apprentices for synthesizing complex annotations and facilitating high-assurance software development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the NL2VC-60 dataset of 60 complex algorithmic problems and evaluates seven open-weight LLMs on 11 randomly selected problems using tiered prompting (contextless, signature prompts providing structural anchors, and self-healing with iterative Dafny verifier feedback) combined with uDebug functional validation to prevent vacuous specifications. It reports near-zero success with contextless prompts but dramatic improvements to 90.91% verification success for Gemma 4-31B and 81.82% for GPT-OSS 120B after self-healing, concluding that formal verification is now attainable for open-weight LLMs as apprentices for high-assurance code synthesis.
Significance. If the empirical results hold under broader evaluation, the work offers a concrete demonstration that open-weight LLMs can generate Dafny-annotated code that passes both functional tests and formal proofs, with the tiered prompting and uDebug integration providing a practical mechanism to reduce hallucinations. The use of an external verifier (Dafny) and functional oracle (uDebug) to enforce correctness is a methodological strength that could inform future AI-assisted formal methods pipelines.
major comments (2)
- [Abstract] Abstract and results description: the central claim that 'formal verification is now attainable for open-weight LLMs' and that they serve as 'effective apprentices for synthesizing complex annotations' rests on success rates from only 11 randomly selected problems out of NL2VC-60, with no disclosed list of the problems, selection criteria, complexity classification (e.g., data structures or asymptotic bounds), or argument that the random draw avoids bias toward easier cases where signature prompts plus verifier feedback suffice.
- [Results and evaluation description] Evaluation methodology (as described in the abstract and results): the paper invokes uDebug to rule out vacuous specifications but provides no details on the test suites used, how they were constructed, or statistical significance of the reported rates (e.g., confidence intervals or exact problem selection process), leaving the 'dramatic performance turnaround' claim only partially supported and vulnerable to selection bias.
minor comments (2)
- [Abstract] The abstract and main text would benefit from an explicit link or repository reference to the NL2VC-60 dataset and any accompanying code or prompts for reproducibility.
- [Throughout] Notation for model names (e.g., 'Gemma 4-31B', 'GPT-OSS 120B') and the exact definition of 'self-healing prompts' could be clarified with a brief example or table in the methods section.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. These have prompted us to enhance the transparency of our problem selection process and evaluation methodology. We respond to each major comment below and have incorporated the suggested clarifications into the revised manuscript.
Point-by-point responses
Referee: [Abstract] Abstract and results description: the central claim that 'formal verification is now attainable for open-weight LLMs' and that they serve as 'effective apprentices for synthesizing complex annotations' rests on success rates from only 11 randomly selected problems out of NL2VC-60, with no disclosed list of the problems, selection criteria, complexity classification (e.g., data structures or asymptotic bounds), or argument that the random draw avoids bias toward easier cases where signature prompts plus verifier feedback suffice.
Authors: We agree that additional details on the selected problems are necessary to fully support our claims. In the revised manuscript, we have added a dedicated table in the results section that lists all 11 problems, their domains (e.g., sorting, graph algorithms, dynamic programming), and asymptotic complexity bounds. The selection process is now explicitly described as uniform random sampling from the NL2VC-60 dataset using a fixed seed for reproducibility. We further argue that NL2VC-60 was designed to span a representative range of algorithmic difficulties, and the random draw therefore does not systematically favor easier instances. These changes provide concrete evidence that the observed success rates are not an artifact of problem selection. revision: yes
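The uniform random sampling with a fixed seed that the rebuttal describes can be reconstructed in a few lines; this is a hypothetical reconstruction under our own assumptions, and the seed value here is purely illustrative.

```python
# Hypothetical reconstruction of seeded uniform sampling of 11 of the
# 60 NL2VC problems. The seed (42) is illustrative, not the authors'.
import random

rng = random.Random(42)                        # fixed seed for reproducibility
selected = sorted(rng.sample(range(60), 11))   # 11 distinct problem indices
```

Re-running with the same seed reproduces the same draw, which is the reproducibility property the rebuttal claims.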
Referee: [Results and evaluation description] Evaluation methodology (as described in the abstract and results): the paper invokes uDebug to rule out vacuous specifications but provides no details on the test suites used, how they were constructed, or statistical significance of the reported rates (e.g., confidence intervals or exact problem selection process), leaving the 'dramatic performance turnaround' claim only partially supported and vulnerable to selection bias.
Authors: We have substantially expanded the evaluation methodology section. For each of the 11 problems we now detail the uDebug test suites, which were constructed from the canonical inputs provided in the original problem statements and augmented with systematically generated edge cases and boundary conditions to ensure non-vacuous specifications. We report the number of tests per problem and their coverage metrics. In addition, we have added 95% binomial confidence intervals for all reported success rates to quantify statistical uncertainty given the sample size. The random selection procedure is stated with the exact seed and sampling method. These revisions directly address concerns about transparency and strengthen the empirical foundation of the performance claims. revision: yes
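The statistical-uncertainty point is easy to quantify. As a worked example of our own (the paper may use a different interval), a 95% Wilson score interval for the reported 90.91% rate, which is consistent with 10 successes out of 11 problems, spans roughly 0.62 to 0.98, illustrating how wide the uncertainty is at n = 11.

```python
# Worked example (ours): 95% Wilson score interval for 10/11 successes
# (the reported 90.91% rate). With n = 11 the interval is very wide.
import math

def wilson_ci(successes, n, z=1.96):
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_ci(10, 11)  # roughly (0.62, 0.98)
```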
Circularity Check
No circularity; purely empirical measurements against external verifiers
Full rationale
The paper presents success rates obtained by executing LLMs on 11 problems from the NL2VC-60 dataset under tiered prompting, with outcomes checked by the independent Dafny verifier and uDebug functional tests. No equations, fitted parameters, predictions, or first-principles derivations appear that could reduce to the inputs by construction. The reported percentages (e.g., Gemma 4-31B at 90.91%) are direct experimental counts, not self-defined quantities. No self-citations are invoked as load-bearing uniqueness theorems, and the methodology relies on external tools rather than internal redefinitions. Methodological limitations around sample size are validity concerns, not circularity.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the Dafny verifier produces sound proofs for the generated specifications and implementations.
- Domain assumption: the uDebug platform supplies independent functional tests that detect vacuous specifications.