Recognition: no theorem link
Combining Mechanical and Agentic Specification Inference for Move
Pith reviewed 2026-05-14 22:01 UTC · model grok-4.3
The pith
A hybrid tool merges weakest-precondition analysis with an AI agent to infer Move specifications automatically.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a sound weakest-precondition analysis over Move bytecode provides a mechanical baseline for specification inference, and an agentic coding CLI completes the specifications for loop invariants and high-level properties, with the Move Prover acting as the oracle that decides validity and drives iterative refinement until verification succeeds.
What carries the argument
The hybrid inference pipeline that exposes the weakest-precondition analysis and the Move Prover to the agentic coding CLI via a Model Context Protocol service, allowing the agent to propose and refine specifications until they pass verification.
If this is right
- Specifications for functions that support global-state invariants can be inferred with far less manual writing.
- The method handles programs that use higher-order functions, dynamic dispatch, references, and various loop forms.
- The agent can supply proof hints that help the prover reach a successful outcome.
- The approach has been demonstrated on a corpus of canonical Move code exercising these features.
Where Pith is reading between the lines
- Similar hybrid pipelines could be constructed for other languages once a comparable weakest-precondition analysis exists.
- The division of labor suggests that AI is most useful when it augments rather than replaces sound mechanical analysis.
- Scaling the tool to larger Move codebases would reveal how often further human guidance is still required.
Load-bearing premise
The AI agent can reliably generate loop invariants and high-level idiomatic specifications that the Move Prover accepts after a reasonable number of refinements.
What would settle it
A Move function containing a loop for which the agent produces no acceptable invariant after repeated refinement attempts, even though a correct specification exists that would allow verification to succeed.
read the original abstract
In this paper, we describe early work on a specification inference tool for the Move Prover that combines a weakest-precondition (WP) analysis over Move bytecode with an agentic coding CLI such as Claude Code. Specification inference reduces the boilerplate of writing specifications in Move: in order to verify a high-level property such as a global state invariant, pre- and post-conditions for the supporting functions typically have to be written by hand, which is tedious. In our setting, a Model Context Protocol (MCP) service exposes the WP analysis and the prover itself to the coding agent. The WP analysis provides a sound, mechanical baseline for inference; the AI is used precisely where WP is weakest -- for loop invariants and high-level idiomatic specifications such as monotonicity, conservation, and structural invariants. The Move Prover serves as the oracle that decides whether the generated specs are valid, and the agent is equipped to generate proof hints and to refine the inferred specification until verification succeeds. The tool has been applied to a corpus of canonical Move code, including code that uses higher-order functions, dynamic dispatch, global state, references, and various forms of loops.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes early work on a specification inference tool for the Move Prover that combines a weakest-precondition (WP) analysis over Move bytecode with an agentic coding CLI (e.g., Claude Code) exposed via a Model Context Protocol (MCP) service. The WP analysis supplies a sound mechanical baseline for pre- and post-conditions, while the AI agent is used specifically for loop invariants and high-level idiomatic specifications such as monotonicity, conservation, and structural invariants. The Move Prover serves as an oracle that validates the generated specifications, allowing the agent to generate proof hints and refine until verification succeeds. The tool is stated to have been applied to a corpus of canonical Move code that includes higher-order functions, dynamic dispatch, global state, references, and loops.
Significance. If the hybrid approach can be shown to reliably infer valid specifications with limited human intervention, it would meaningfully reduce boilerplate in Move verification and demonstrate a practical way to combine the soundness guarantees of weakest-precondition reasoning with the flexibility of large-language-model agents for idiomatic properties. This could broaden access to formal verification for Move programs that use complex language features where purely mechanical inference is insufficient.
major comments (1)
- [Abstract] Abstract: The manuscript asserts that the tool “has been applied to a corpus” of Move code featuring higher-order functions, dynamic dispatch, global state, references, and loops, yet supplies no quantitative evaluation data whatsoever—no success rates, median refinement iterations, fraction of cases requiring human edits, or failure-mode analysis. Without these metrics the central claim that the agent “refines until verification succeeds” without extensive intervention remains an untested assertion rather than a demonstrated result.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need for quantitative support. The manuscript presents early-stage work on the hybrid inference tool; we agree that the current text does not yet supply the metrics that would turn the reported applications into a demonstrated result. We will revise accordingly.
read point-by-point responses
-
Referee: [Abstract] Abstract: The manuscript asserts that the tool “has been applied to a corpus” of Move code featuring higher-order functions, dynamic dispatch, global state, references, and loops, yet supplies no quantitative evaluation data whatsoever—no success rates, median refinement iterations, fraction of cases requiring human edits, or failure-mode analysis. Without these metrics the central claim that the agent “refines until verification succeeds” without extensive intervention remains an untested assertion rather than a demonstrated result.
Authors: We accept the observation. The present manuscript is framed as a description of the tool architecture and its initial use on a small canonical corpus; no systematic measurement campaign was performed. In the revision we will add a dedicated evaluation subsection that reports, for the functions processed: (i) total number of specifications inferred, (ii) fraction that verified on the first agent attempt, (iii) median number of refinement iterations, (iv) number of cases that required manual edits to the generated invariants or hints, and (v) a brief failure-mode summary. These numbers will be drawn from the same corpus already mentioned in the abstract, thereby grounding the claim that the agent refines until verification succeeds. revision: yes
Circularity Check
No circularity: descriptive tool integration without derivation or self-referential claims
full rationale
The manuscript is an early-work tool description that integrates an existing weakest-precondition analysis over Move bytecode with an external agentic coding CLI (Claude Code) and the Move Prover as oracle. No equations, fitted parameters, predictions, or mathematical derivations are present. The central claim—that the hybrid system infers valid specifications for higher-order functions, loops, etc.—rests on the mechanical soundness of WP plus the agent's ability to supply invariants, with the prover deciding validity. This is not self-definitional, not a fitted-input prediction, and contains no load-bearing self-citations or uniqueness theorems. The paper simply reports application to a corpus without claiming to derive its own correctness from its outputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Weakest-precondition analysis over Move bytecode produces a sound baseline set of specifications.
- domain assumption The Move Prover can serve as a reliable oracle for deciding validity of inferred specifications.
Forward citations
Cited by 2 Pith papers
-
Formal Verification of Imperative First-Class Functions in Move
The Move Prover now verifies imperative first-class functions via behavioral predicates that capture aborts and state changes, with state labels for sequencing and efficient SMT handling of known versus unknown functions.
-
Formal Verification of Imperative First-Class Functions in Move
The Move Prover now verifies imperative first-class functions using behavioral predicates encoded via SMT discrimination over possible function values at call sites.
Reference graph
Works this paper leans on
-
[1]
Fast and reliable formal verification of smart contracts with the move prover,
D. L. Dill, W. Grieskamp, J. Park, S. Qadeer, M. Xu, and J. E. Zhong, “Fast and reliable formal verification of smart contracts with the move prover,” inTools and Algorithms for the Construction and Analysis of Systems (TACAS 2022), ser. Lecture Notes in Computer Science, vol. 13243. Springer, 2022, pp. 183–200
work page 2022
-
[2]
Resources: A safe language abstraction for money,
S. Blackshear, D. L. Dill, S. Qadeer, C. W. Barrett, J. C. Mitchell, O. Padon, and Y . Zohar, “Resources: A safe language abstraction for money,” 2020. [Online]. Available: https://arxiv.org/abs/2004.05106
-
[3]
Aptos Labs, “The Move on Aptos Book,” https://aptos-labs.github.io/ move-book/, 2023, accessed: 2026-05-09
work page 2023
-
[4]
Securing Aptos Framework with Formal Verifica- tion,
J. Park, T. Zhang, W. Grieskamp, M. Xu, G. D. Giacomo, K. Chen, Y . Lu, and R. Chen, “Securing Aptos Framework with Formal Verifica- tion,” in5th International Workshop on Formal Methods for Blockchains (FMBC 2024), ser. Open Access Series in Informatics (OASIcs), vol
work page 2024
- [5]
-
[6]
Aptos Labs, “The Aptos Framework Book,” https://aptos-labs.github.i o/framework-book/, 2023, accessed: 2026-05-09
work page 2023
-
[7]
Dynamically discovering likely program invariants to support program evolution,
M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin, “Dynamically discovering likely program invariants to support program evolution,” IEEE Transactions on Software Engineering, vol. 27, no. 2, pp. 99–123, 2001
work page 2001
-
[8]
Can large language models reason about program invariants?
K. Pei, D. Bieber, K. Shi, C. Sutton, and P. Yin, “Can large language models reason about program invariants?” inProceedings of the 40th International Conference on Machine Learning (ICML 2023), ser. Proceedings of Machine Learning Research, vol. 202. PMLR, 2023, pp. 27 496–27 520. [Online]. Available: https://proceedings.mlr.press/v2 02/pei23a.html
work page 2023
-
[9]
Lahiri, Akash Lal, Aseem Rastogi, Subha- jit Roy, and Rahul Sharma
A. Kamath, A. Senthilnathan, S. Chakraborty, P. Deligiannis, S. K. Lahiri, A. Lal, A. Rastogi, S. Roy, and R. Sharma, “Finding inductive loop invariants using large language models,” 2023. [Online]. Available: https://arxiv.org/abs/2311.07948
- [10]
-
[11]
Guarded commands, nondeterminacy and formal deriva- tion of programs,
E. W. Dijkstra, “Guarded commands, nondeterminacy and formal deriva- tion of programs,”Communications of the ACM, vol. 18, no. 8, pp. 453–457, 1975
work page 1975
-
[12]
Weakest-precondition of unstructured programs,
M. Barnett and K. R. M. Leino, “Weakest-precondition of unstructured programs,” inProceedings of the 6th ACM SIGPLAN-SIGSOFT Work- shop on Program Analysis for Software Tools and Engineering. New York, NY , USA: Association for Computing Machinery, 2005, pp. 82— 87
work page 2005
-
[13]
Anthropic, “Claude Code Plugins,” https://docs.claude.com/en/docs/cl aude-code/plugins, 2025, accessed: 2026-05-10
work page 2025
-
[14]
Aeneas: Rust verification by functional trans- lation,
S. Ho and J. Protzenko, “Aeneas: Rust verification by functional trans- lation,”Proc. ACM Program. Lang., vol. 6, no. ICFP, pp. 116:1–116:31, 2022
work page 2022
-
[15]
Move Prover Spec Inference Paper Examples,
Aptos Labs, “Move Prover Spec Inference Paper Examples,” https://gi thub.com/aptos-labs/aptos-core/tree/d8be469f7f/third_party/move/mov e-prover/doc/inference-paper-26/examples, 2026, commitd8be469f7f, accessed 2026-05-11
work page 2026
-
[16]
MoveFlow: AI-assisted Move smart contract development,
——, “MoveFlow: AI-assisted Move smart contract development,” https: //github.com/aptos-labs/aptos-core/tree/d8be469f7f/aptos-move/flow, 2026, commitd8be469f7f, accessed 2026-05-11
work page 2026
-
[17]
Formal Verification of Imperative First-Class Functions in Move
W. Grieskamp, T. Zhang, V . Kashyap, and J. Silverman, “Formal verification of imperative first-class functions in Move,”CoRR, vol. abs/2605.10007, 2026. [Online]. Available: https://arxiv.org/abs/2605.1 0007
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[18]
Agentic specification generator for Move programs,
Y .-F. Fu, M. Xu, and T. Kim, “Agentic specification generator for Move programs,” inProceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025), 2025, pp. 1286–1298
work page 2025
-
[19]
VeriStruct: AI-assisted automated verification of data-structure modules in Verus,
C. Sun, Y . Sun, D. Amrollahi, E. Zhang, S. K. Lahiri, S. Lu, D. L. Dill, and C. Barrett, “VeriStruct: AI-assisted automated verification of data-structure modules in Verus,” inTools and Algorithms for the Construction and Analysis of Systems (TACAS 2026), ser. Lecture Notes in Computer Science. Springer, 2026, arXiv:2510.25015. [Online]. Available: https...
-
[20]
Y . Liu, Y . Xue, D. Wu, Y . Sun, Y . Li, M. Shi, and Y . Liu, “PropertyGPT: LLM-driven formal verification of smart contracts through retrieval- augmented property generation,” inNetwork and Distributed System Security Symposium (NDSS 2025), 2025
work page 2025
-
[21]
DafnyPro: LLM-assisted automated verification for Dafny programs,
D. Banerjee, O. Bouissou, and S. Zetzsche, “DafnyPro: LLM-assisted automated verification for Dafny programs,” 2026, dafny Workshop, POPL 2026. [Online]. Available: https://arxiv.org/abs/2601.05385
-
[22]
AutoVerus: Automated proof generation for Rust code,
C. Yang, X. Li, M. R. H. Misu, J. Yao, W. Cui, Y . Gong, C. Hawblitzel, S. K. Lahiri, J. R. Lorch, S. Lu, F. Yang, Z. Zhou, and S. Lu, “AutoVerus: Automated proof generation for Rust code,”Proc. ACM Program. Lang., vol. 9, no. OOPSLA2, 2025
work page 2025
-
[23]
Boogie: A modular reusable verifier for object-oriented programs,
M. Barnett, B.-Y . E. Chang, R. DeLine, B. Jacobs, and K. R. M. Leino, “Boogie: A modular reusable verifier for object-oriented programs,” inInternational Symposium on Formal Methods for Components and Objects. Springer, 2005, pp. 364–387
work page 2005
-
[24]
A polymorphic intermediate verifica- tion language: Design and logical encoding,
K. R. M. Leino and P. Rümmer, “A polymorphic intermediate verifica- tion language: Design and logical encoding,” inTACAS, J. Esparza and R. Majumdar, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 312–327
work page 2010
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.