arxiv: 2605.10005 · v2 · submitted 2026-05-11 · 💻 cs.PL · cs.AI· cs.LO· cs.SE

Recognition: no theorem link

Combining Mechanical and Agentic Specification Inference for Move

Wolfgang Grieskamp , Teng Zhang , Vineeth Kashyap

Authors on Pith no claims yet

Pith reviewed 2026-05-14 22:01 UTC · model grok-4.3

classification 💻 cs.PL cs.AIcs.LOcs.SE

keywords specification inferenceMove Proverweakest preconditionagentic codingformal verificationsmart contracts

0 comments

The pith

A hybrid tool merges weakest-precondition analysis with an AI agent to infer Move specifications automatically.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a specification inference tool for the Move Prover that pairs a sound weakest-precondition analysis over Move bytecode with an agentic coding CLI. The mechanical analysis supplies a reliable baseline of pre- and post-conditions while the AI agent supplies loop invariants and high-level idiomatic properties such as monotonicity, conservation, and structural invariants. The Move Prover serves as the oracle that validates each candidate specification, and the agent generates proof hints and refines the output until verification succeeds. The approach targets the manual boilerplate required to verify high-level properties in Move programs that use higher-order functions, dynamic dispatch, global state, references, and loops. A sympathetic reader would care because the method promises to lower the effort of writing supporting specifications for formal verification of smart contracts.

Core claim

The paper claims that a sound weakest-precondition analysis over Move bytecode provides a mechanical baseline for specification inference, and an agentic coding CLI completes the specifications for loop invariants and high-level properties, with the Move Prover acting as the oracle that decides validity and drives iterative refinement until verification succeeds.

What carries the argument

The hybrid inference pipeline that exposes the weakest-precondition analysis and the Move Prover to the agentic coding CLI via a Model Context Protocol service, allowing the agent to propose and refine specifications until they pass verification.

If this is right

Specifications for functions that support global-state invariants can be inferred with far less manual writing.
The method handles programs that use higher-order functions, dynamic dispatch, references, and various loop forms.
The agent can supply proof hints that help the prover reach a successful outcome.
The approach has been demonstrated on a corpus of canonical Move code exercising these features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar hybrid pipelines could be constructed for other languages once a comparable weakest-precondition analysis exists.
The division of labor suggests that AI is most useful when it augments rather than replaces sound mechanical analysis.
Scaling the tool to larger Move codebases would reveal how often further human guidance is still required.

Load-bearing premise

The AI agent can reliably generate loop invariants and high-level idiomatic specifications that the Move Prover accepts after a reasonable number of refinements.

What would settle it

A Move function containing a loop for which the agent produces no acceptable invariant after repeated refinement attempts, even though a correct specification exists that would allow verification to succeed.

read the original abstract

In this paper, we describe early work on a specification inference tool for the Move Prover that combines a weakest-precondition (WP) analysis over Move bytecode with an agentic coding CLI such as Claude Code. Specification inference reduces the boilerplate of writing specifications in Move: in order to verify a high-level property such as a global state invariant, pre- and post-conditions for the supporting functions typically have to be written by hand, which is tedious. In our setting, a Model Context Protocol (MCP) service exposes the WP analysis and the prover itself to the coding agent. The WP analysis provides a sound, mechanical baseline for inference; the AI is used precisely where WP is weakest -- for loop invariants and high-level idiomatic specifications such as monotonicity, conservation, and structural invariants. The Move Prover serves as the oracle that decides whether the generated specs are valid, and the agent is equipped to generate proof hints and to refine the inferred specification until verification succeeds. The tool has been applied to a corpus of canonical Move code, including code that uses higher-order functions, dynamic dispatch, global state, references, and various forms of loops.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Hybrid Move spec tool pairs WP analysis with an agentic CLI but provides no numbers on whether the agent actually reduces manual work.

read the letter

Colleague, the main takeaway is that this describes an early hybrid tool for inferring Move specifications: weakest-precondition analysis supplies the sound baseline, and an agentic coding CLI (via MCP) handles the gaps like loop invariants and high-level properties such as monotonicity or conservation, with the Move Prover as the oracle that forces refinement until verification passes. They report applying it to a corpus covering higher-order functions, dynamic dispatch, global state, references, and loops. The architecture that wires the WP results and prover directly into the agent is the concrete new piece here, and it is a reasonable way to ground the agent without letting it drift into invalid specs. The separation of concerns is clear and practical for this domain. The soft spot is the complete absence of evaluation data. The text states the tool was run on the corpus but gives no success rates, no median refinement iterations, no count of cases needing extra human edits, and no failure modes. Without those figures the claim that the agent reliably fills the hard parts without extensive intervention stays untested. The formal side relies on existing WP and prover machinery rather than new derivations, which is fine for a tool paper. Citation choices look standard and focused on the relevant Move verification literature. This is aimed at researchers building verification tooling for blockchain languages like Move, particularly those already using the prover and interested in AI assistance for specs. A reader in that niche could extract the MCP integration pattern and try it. I would send it for peer review so the authors get concrete feedback on adding benchmarks and to see if the implementation details hold up under scrutiny.

Referee Report

1 major / 0 minor

Summary. The paper describes early work on a specification inference tool for the Move Prover that combines a weakest-precondition (WP) analysis over Move bytecode with an agentic coding CLI (e.g., Claude Code) exposed via a Model Context Protocol (MCP) service. The WP analysis supplies a sound mechanical baseline for pre- and post-conditions, while the AI agent is used specifically for loop invariants and high-level idiomatic specifications such as monotonicity, conservation, and structural invariants. The Move Prover serves as an oracle that validates the generated specifications, allowing the agent to generate proof hints and refine until verification succeeds. The tool is stated to have been applied to a corpus of canonical Move code that includes higher-order functions, dynamic dispatch, global state, references, and loops.

Significance. If the hybrid approach can be shown to reliably infer valid specifications with limited human intervention, it would meaningfully reduce boilerplate in Move verification and demonstrate a practical way to combine the soundness guarantees of weakest-precondition reasoning with the flexibility of large-language-model agents for idiomatic properties. This could broaden access to formal verification for Move programs that use complex language features where purely mechanical inference is insufficient.

major comments (1)

[Abstract] Abstract: The manuscript asserts that the tool “has been applied to a corpus” of Move code featuring higher-order functions, dynamic dispatch, global state, references, and loops, yet supplies no quantitative evaluation data whatsoever—no success rates, median refinement iterations, fraction of cases requiring human edits, or failure-mode analysis. Without these metrics the central claim that the agent “refines until verification succeeds” without extensive intervention remains an untested assertion rather than a demonstrated result.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for quantitative support. The manuscript presents early-stage work on the hybrid inference tool; we agree that the current text does not yet supply the metrics that would turn the reported applications into a demonstrated result. We will revise accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: The manuscript asserts that the tool “has been applied to a corpus” of Move code featuring higher-order functions, dynamic dispatch, global state, references, and loops, yet supplies no quantitative evaluation data whatsoever—no success rates, median refinement iterations, fraction of cases requiring human edits, or failure-mode analysis. Without these metrics the central claim that the agent “refines until verification succeeds” without extensive intervention remains an untested assertion rather than a demonstrated result.

Authors: We accept the observation. The present manuscript is framed as a description of the tool architecture and its initial use on a small canonical corpus; no systematic measurement campaign was performed. In the revision we will add a dedicated evaluation subsection that reports, for the functions processed: (i) total number of specifications inferred, (ii) fraction that verified on the first agent attempt, (iii) median number of refinement iterations, (iv) number of cases that required manual edits to the generated invariants or hints, and (v) a brief failure-mode summary. These numbers will be drawn from the same corpus already mentioned in the abstract, thereby grounding the claim that the agent refines until verification succeeds. revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive tool integration without derivation or self-referential claims

full rationale

The manuscript is an early-work tool description that integrates an existing weakest-precondition analysis over Move bytecode with an external agentic coding CLI (Claude Code) and the Move Prover as oracle. No equations, fitted parameters, predictions, or mathematical derivations are present. The central claim—that the hybrid system infers valid specifications for higher-order functions, loops, etc.—rests on the mechanical soundness of WP plus the agent's ability to supply invariants, with the prover deciding validity. This is not self-definitional, not a fitted-input prediction, and contains no load-bearing self-citations or uniqueness theorems. The paper simply reports application to a corpus without claiming to derive its own correctness from its outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach depends on the soundness of weakest-precondition analysis for Move bytecode and on the ability of the AI agent to produce prover-acceptable specifications for the cases where WP is incomplete.

axioms (2)

domain assumption Weakest-precondition analysis over Move bytecode produces a sound baseline set of specifications.
Invoked as the mechanical foundation that the AI augments.
domain assumption The Move Prover can serve as a reliable oracle for deciding validity of inferred specifications.
Central to the refinement loop described.

pith-pipeline@v0.9.0 · 5506 in / 1336 out tokens · 65654 ms · 2026-05-14T22:01:01.528351+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Formal Verification of Imperative First-Class Functions in Move
cs.PL 2026-05 unverdicted novelty 7.0

The Move Prover now verifies imperative first-class functions via behavioral predicates that capture aborts and state changes, with state labels for sequencing and efficient SMT handling of known versus unknown functions.
Formal Verification of Imperative First-Class Functions in Move
cs.PL 2026-05 unverdicted novelty 7.0

The Move Prover now verifies imperative first-class functions using behavioral predicates encoded via SMT discrimination over possible function values at call sites.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Fast and reliable formal verification of smart contracts with the move prover,

D. L. Dill, W. Grieskamp, J. Park, S. Qadeer, M. Xu, and J. E. Zhong, “Fast and reliable formal verification of smart contracts with the move prover,” inTools and Algorithms for the Construction and Analysis of Systems (TACAS 2022), ser. Lecture Notes in Computer Science, vol. 13243. Springer, 2022, pp. 183–200

work page 2022
[2]

Resources: A safe language abstraction for money,

S. Blackshear, D. L. Dill, S. Qadeer, C. W. Barrett, J. C. Mitchell, O. Padon, and Y . Zohar, “Resources: A safe language abstraction for money,” 2020. [Online]. Available: https://arxiv.org/abs/2004.05106

work page arXiv 2020
[3]

The Move on Aptos Book,

Aptos Labs, “The Move on Aptos Book,” https://aptos-labs.github.io/ move-book/, 2023, accessed: 2026-05-09

work page 2023
[4]

Securing Aptos Framework with Formal Verifica- tion,

J. Park, T. Zhang, W. Grieskamp, M. Xu, G. D. Giacomo, K. Chen, Y . Lu, and R. Chen, “Securing Aptos Framework with Formal Verifica- tion,” in5th International Workshop on Formal Methods for Blockchains (FMBC 2024), ser. Open Access Series in Informatics (OASIcs), vol

work page 2024
[5]

9:1–9:16

Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024, pp. 9:1–9:16

work page 2024
[6]

The Aptos Framework Book,

Aptos Labs, “The Aptos Framework Book,” https://aptos-labs.github.i o/framework-book/, 2023, accessed: 2026-05-09

work page 2023
[7]

Dynamically discovering likely program invariants to support program evolution,

M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin, “Dynamically discovering likely program invariants to support program evolution,” IEEE Transactions on Software Engineering, vol. 27, no. 2, pp. 99–123, 2001

work page 2001
[8]

Can large language models reason about program invariants?

K. Pei, D. Bieber, K. Shi, C. Sutton, and P. Yin, “Can large language models reason about program invariants?” inProceedings of the 40th International Conference on Machine Learning (ICML 2023), ser. Proceedings of Machine Learning Research, vol. 202. PMLR, 2023, pp. 27 496–27 520. [Online]. Available: https://proceedings.mlr.press/v2 02/pei23a.html

work page 2023
[9]

Lahiri, Akash Lal, Aseem Rastogi, Subha- jit Roy, and Rahul Sharma

A. Kamath, A. Senthilnathan, S. Chakraborty, P. Deligiannis, S. K. Lahiri, A. Lal, A. Rastogi, S. Roy, and R. Sharma, “Finding inductive loop invariants using large language models,” 2023. [Online]. Available: https://arxiv.org/abs/2311.07948

work page arXiv 2023
[10]

Claude Code,

Anthropic, “Claude Code,” https://www.anthropic.com/claude-code, 2025

work page 2025
[11]

Guarded commands, nondeterminacy and formal deriva- tion of programs,

E. W. Dijkstra, “Guarded commands, nondeterminacy and formal deriva- tion of programs,”Communications of the ACM, vol. 18, no. 8, pp. 453–457, 1975

work page 1975
[12]

Weakest-precondition of unstructured programs,

M. Barnett and K. R. M. Leino, “Weakest-precondition of unstructured programs,” inProceedings of the 6th ACM SIGPLAN-SIGSOFT Work- shop on Program Analysis for Software Tools and Engineering. New York, NY , USA: Association for Computing Machinery, 2005, pp. 82— 87

work page 2005
[13]

Claude Code Plugins,

Anthropic, “Claude Code Plugins,” https://docs.claude.com/en/docs/cl aude-code/plugins, 2025, accessed: 2026-05-10

work page 2025
[14]

Aeneas: Rust verification by functional trans- lation,

S. Ho and J. Protzenko, “Aeneas: Rust verification by functional trans- lation,”Proc. ACM Program. Lang., vol. 6, no. ICFP, pp. 116:1–116:31, 2022

work page 2022
[15]

Move Prover Spec Inference Paper Examples,

Aptos Labs, “Move Prover Spec Inference Paper Examples,” https://gi thub.com/aptos-labs/aptos-core/tree/d8be469f7f/third_party/move/mov e-prover/doc/inference-paper-26/examples, 2026, commitd8be469f7f, accessed 2026-05-11

work page 2026
[16]

MoveFlow: AI-assisted Move smart contract development,

——, “MoveFlow: AI-assisted Move smart contract development,” https: //github.com/aptos-labs/aptos-core/tree/d8be469f7f/aptos-move/flow, 2026, commitd8be469f7f, accessed 2026-05-11

work page 2026
[17]

Formal Verification of Imperative First-Class Functions in Move

W. Grieskamp, T. Zhang, V . Kashyap, and J. Silverman, “Formal verification of imperative first-class functions in Move,”CoRR, vol. abs/2605.10007, 2026. [Online]. Available: https://arxiv.org/abs/2605.1 0007

work page internal anchor Pith review Pith/arXiv arXiv 2026
[18]

Agentic specification generator for Move programs,

Y .-F. Fu, M. Xu, and T. Kim, “Agentic specification generator for Move programs,” inProceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025), 2025, pp. 1286–1298

work page 2025
[19]

VeriStruct: AI-assisted automated verification of data-structure modules in Verus,

C. Sun, Y . Sun, D. Amrollahi, E. Zhang, S. K. Lahiri, S. Lu, D. L. Dill, and C. Barrett, “VeriStruct: AI-assisted automated verification of data-structure modules in Verus,” inTools and Algorithms for the Construction and Analysis of Systems (TACAS 2026), ser. Lecture Notes in Computer Science. Springer, 2026, arXiv:2510.25015. [Online]. Available: https...

work page arXiv 2026
[20]

PropertyGPT: LLM-driven formal verification of smart contracts through retrieval- augmented property generation,

Y . Liu, Y . Xue, D. Wu, Y . Sun, Y . Li, M. Shi, and Y . Liu, “PropertyGPT: LLM-driven formal verification of smart contracts through retrieval- augmented property generation,” inNetwork and Distributed System Security Symposium (NDSS 2025), 2025

work page 2025
[21]

DafnyPro: LLM-assisted automated verification for Dafny programs,

D. Banerjee, O. Bouissou, and S. Zetzsche, “DafnyPro: LLM-assisted automated verification for Dafny programs,” 2026, dafny Workshop, POPL 2026. [Online]. Available: https://arxiv.org/abs/2601.05385

work page arXiv 2026
[22]

AutoVerus: Automated proof generation for Rust code,

C. Yang, X. Li, M. R. H. Misu, J. Yao, W. Cui, Y . Gong, C. Hawblitzel, S. K. Lahiri, J. R. Lorch, S. Lu, F. Yang, Z. Zhou, and S. Lu, “AutoVerus: Automated proof generation for Rust code,”Proc. ACM Program. Lang., vol. 9, no. OOPSLA2, 2025

work page 2025
[23]

Boogie: A modular reusable verifier for object-oriented programs,

M. Barnett, B.-Y . E. Chang, R. DeLine, B. Jacobs, and K. R. M. Leino, “Boogie: A modular reusable verifier for object-oriented programs,” inInternational Symposium on Formal Methods for Components and Objects. Springer, 2005, pp. 364–387

work page 2005
[24]

A polymorphic intermediate verifica- tion language: Design and logical encoding,

K. R. M. Leino and P. Rümmer, “A polymorphic intermediate verifica- tion language: Design and logical encoding,” inTACAS, J. Esparza and R. Majumdar, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 312–327

work page 2010