pith. machine review for the scientific record. sign in

arxiv: 2605.10005 · v2 · submitted 2026-05-11 · 💻 cs.PL · cs.AI· cs.LO· cs.SE

Recognition: no theorem link

Combining Mechanical and Agentic Specification Inference for Move

Authors on Pith no claims yet

Pith reviewed 2026-05-14 22:01 UTC · model grok-4.3

classification 💻 cs.PL cs.AIcs.LOcs.SE
keywords specification inferenceMove Proverweakest preconditionagentic codingformal verificationsmart contracts
0
0 comments X

The pith

A hybrid tool merges weakest-precondition analysis with an AI agent to infer Move specifications automatically.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a specification inference tool for the Move Prover that pairs a sound weakest-precondition analysis over Move bytecode with an agentic coding CLI. The mechanical analysis supplies a reliable baseline of pre- and post-conditions while the AI agent supplies loop invariants and high-level idiomatic properties such as monotonicity, conservation, and structural invariants. The Move Prover serves as the oracle that validates each candidate specification, and the agent generates proof hints and refines the output until verification succeeds. The approach targets the manual boilerplate required to verify high-level properties in Move programs that use higher-order functions, dynamic dispatch, global state, references, and loops. A sympathetic reader would care because the method promises to lower the effort of writing supporting specifications for formal verification of smart contracts.

Core claim

The paper claims that a sound weakest-precondition analysis over Move bytecode provides a mechanical baseline for specification inference, and an agentic coding CLI completes the specifications for loop invariants and high-level properties, with the Move Prover acting as the oracle that decides validity and drives iterative refinement until verification succeeds.

What carries the argument

The hybrid inference pipeline that exposes the weakest-precondition analysis and the Move Prover to the agentic coding CLI via a Model Context Protocol service, allowing the agent to propose and refine specifications until they pass verification.

If this is right

  • Specifications for functions that support global-state invariants can be inferred with far less manual writing.
  • The method handles programs that use higher-order functions, dynamic dispatch, references, and various loop forms.
  • The agent can supply proof hints that help the prover reach a successful outcome.
  • The approach has been demonstrated on a corpus of canonical Move code exercising these features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar hybrid pipelines could be constructed for other languages once a comparable weakest-precondition analysis exists.
  • The division of labor suggests that AI is most useful when it augments rather than replaces sound mechanical analysis.
  • Scaling the tool to larger Move codebases would reveal how often further human guidance is still required.

Load-bearing premise

The AI agent can reliably generate loop invariants and high-level idiomatic specifications that the Move Prover accepts after a reasonable number of refinements.

What would settle it

A Move function containing a loop for which the agent produces no acceptable invariant after repeated refinement attempts, even though a correct specification exists that would allow verification to succeed.

read the original abstract

In this paper, we describe early work on a specification inference tool for the Move Prover that combines a weakest-precondition (WP) analysis over Move bytecode with an agentic coding CLI such as Claude Code. Specification inference reduces the boilerplate of writing specifications in Move: in order to verify a high-level property such as a global state invariant, pre- and post-conditions for the supporting functions typically have to be written by hand, which is tedious. In our setting, a Model Context Protocol (MCP) service exposes the WP analysis and the prover itself to the coding agent. The WP analysis provides a sound, mechanical baseline for inference; the AI is used precisely where WP is weakest -- for loop invariants and high-level idiomatic specifications such as monotonicity, conservation, and structural invariants. The Move Prover serves as the oracle that decides whether the generated specs are valid, and the agent is equipped to generate proof hints and to refine the inferred specification until verification succeeds. The tool has been applied to a corpus of canonical Move code, including code that uses higher-order functions, dynamic dispatch, global state, references, and various forms of loops.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper describes early work on a specification inference tool for the Move Prover that combines a weakest-precondition (WP) analysis over Move bytecode with an agentic coding CLI (e.g., Claude Code) exposed via a Model Context Protocol (MCP) service. The WP analysis supplies a sound mechanical baseline for pre- and post-conditions, while the AI agent is used specifically for loop invariants and high-level idiomatic specifications such as monotonicity, conservation, and structural invariants. The Move Prover serves as an oracle that validates the generated specifications, allowing the agent to generate proof hints and refine until verification succeeds. The tool is stated to have been applied to a corpus of canonical Move code that includes higher-order functions, dynamic dispatch, global state, references, and loops.

Significance. If the hybrid approach can be shown to reliably infer valid specifications with limited human intervention, it would meaningfully reduce boilerplate in Move verification and demonstrate a practical way to combine the soundness guarantees of weakest-precondition reasoning with the flexibility of large-language-model agents for idiomatic properties. This could broaden access to formal verification for Move programs that use complex language features where purely mechanical inference is insufficient.

major comments (1)
  1. [Abstract] Abstract: The manuscript asserts that the tool “has been applied to a corpus” of Move code featuring higher-order functions, dynamic dispatch, global state, references, and loops, yet supplies no quantitative evaluation data whatsoever—no success rates, median refinement iterations, fraction of cases requiring human edits, or failure-mode analysis. Without these metrics the central claim that the agent “refines until verification succeeds” without extensive intervention remains an untested assertion rather than a demonstrated result.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for quantitative support. The manuscript presents early-stage work on the hybrid inference tool; we agree that the current text does not yet supply the metrics that would turn the reported applications into a demonstrated result. We will revise accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The manuscript asserts that the tool “has been applied to a corpus” of Move code featuring higher-order functions, dynamic dispatch, global state, references, and loops, yet supplies no quantitative evaluation data whatsoever—no success rates, median refinement iterations, fraction of cases requiring human edits, or failure-mode analysis. Without these metrics the central claim that the agent “refines until verification succeeds” without extensive intervention remains an untested assertion rather than a demonstrated result.

    Authors: We accept the observation. The present manuscript is framed as a description of the tool architecture and its initial use on a small canonical corpus; no systematic measurement campaign was performed. In the revision we will add a dedicated evaluation subsection that reports, for the functions processed: (i) total number of specifications inferred, (ii) fraction that verified on the first agent attempt, (iii) median number of refinement iterations, (iv) number of cases that required manual edits to the generated invariants or hints, and (v) a brief failure-mode summary. These numbers will be drawn from the same corpus already mentioned in the abstract, thereby grounding the claim that the agent refines until verification succeeds. revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive tool integration without derivation or self-referential claims

full rationale

The manuscript is an early-work tool description that integrates an existing weakest-precondition analysis over Move bytecode with an external agentic coding CLI (Claude Code) and the Move Prover as oracle. No equations, fitted parameters, predictions, or mathematical derivations are present. The central claim—that the hybrid system infers valid specifications for higher-order functions, loops, etc.—rests on the mechanical soundness of WP plus the agent's ability to supply invariants, with the prover deciding validity. This is not self-definitional, not a fitted-input prediction, and contains no load-bearing self-citations or uniqueness theorems. The paper simply reports application to a corpus without claiming to derive its own correctness from its outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach depends on the soundness of weakest-precondition analysis for Move bytecode and on the ability of the AI agent to produce prover-acceptable specifications for the cases where WP is incomplete.

axioms (2)
  • domain assumption Weakest-precondition analysis over Move bytecode produces a sound baseline set of specifications.
    Invoked as the mechanical foundation that the AI augments.
  • domain assumption The Move Prover can serve as a reliable oracle for deciding validity of inferred specifications.
    Central to the refinement loop described.

pith-pipeline@v0.9.0 · 5506 in / 1336 out tokens · 65654 ms · 2026-05-14T22:01:01.528351+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Formal Verification of Imperative First-Class Functions in Move

    cs.PL 2026-05 unverdicted novelty 7.0

    The Move Prover now verifies imperative first-class functions via behavioral predicates that capture aborts and state changes, with state labels for sequencing and efficient SMT handling of known versus unknown functions.

  2. Formal Verification of Imperative First-Class Functions in Move

    cs.PL 2026-05 unverdicted novelty 7.0

    The Move Prover now verifies imperative first-class functions using behavioral predicates encoded via SMT discrimination over possible function values at call sites.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Fast and reliable formal verification of smart contracts with the move prover,

    D. L. Dill, W. Grieskamp, J. Park, S. Qadeer, M. Xu, and J. E. Zhong, “Fast and reliable formal verification of smart contracts with the move prover,” inTools and Algorithms for the Construction and Analysis of Systems (TACAS 2022), ser. Lecture Notes in Computer Science, vol. 13243. Springer, 2022, pp. 183–200

  2. [2]

    Resources: A safe language abstraction for money,

    S. Blackshear, D. L. Dill, S. Qadeer, C. W. Barrett, J. C. Mitchell, O. Padon, and Y . Zohar, “Resources: A safe language abstraction for money,” 2020. [Online]. Available: https://arxiv.org/abs/2004.05106

  3. [3]

    The Move on Aptos Book,

    Aptos Labs, “The Move on Aptos Book,” https://aptos-labs.github.io/ move-book/, 2023, accessed: 2026-05-09

  4. [4]

    Securing Aptos Framework with Formal Verifica- tion,

    J. Park, T. Zhang, W. Grieskamp, M. Xu, G. D. Giacomo, K. Chen, Y . Lu, and R. Chen, “Securing Aptos Framework with Formal Verifica- tion,” in5th International Workshop on Formal Methods for Blockchains (FMBC 2024), ser. Open Access Series in Informatics (OASIcs), vol

  5. [5]

    9:1–9:16

    Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024, pp. 9:1–9:16

  6. [6]

    The Aptos Framework Book,

    Aptos Labs, “The Aptos Framework Book,” https://aptos-labs.github.i o/framework-book/, 2023, accessed: 2026-05-09

  7. [7]

    Dynamically discovering likely program invariants to support program evolution,

    M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin, “Dynamically discovering likely program invariants to support program evolution,” IEEE Transactions on Software Engineering, vol. 27, no. 2, pp. 99–123, 2001

  8. [8]

    Can large language models reason about program invariants?

    K. Pei, D. Bieber, K. Shi, C. Sutton, and P. Yin, “Can large language models reason about program invariants?” inProceedings of the 40th International Conference on Machine Learning (ICML 2023), ser. Proceedings of Machine Learning Research, vol. 202. PMLR, 2023, pp. 27 496–27 520. [Online]. Available: https://proceedings.mlr.press/v2 02/pei23a.html

  9. [9]

    Lahiri, Akash Lal, Aseem Rastogi, Subha- jit Roy, and Rahul Sharma

    A. Kamath, A. Senthilnathan, S. Chakraborty, P. Deligiannis, S. K. Lahiri, A. Lal, A. Rastogi, S. Roy, and R. Sharma, “Finding inductive loop invariants using large language models,” 2023. [Online]. Available: https://arxiv.org/abs/2311.07948

  10. [10]

    Claude Code,

    Anthropic, “Claude Code,” https://www.anthropic.com/claude-code, 2025

  11. [11]

    Guarded commands, nondeterminacy and formal deriva- tion of programs,

    E. W. Dijkstra, “Guarded commands, nondeterminacy and formal deriva- tion of programs,”Communications of the ACM, vol. 18, no. 8, pp. 453–457, 1975

  12. [12]

    Weakest-precondition of unstructured programs,

    M. Barnett and K. R. M. Leino, “Weakest-precondition of unstructured programs,” inProceedings of the 6th ACM SIGPLAN-SIGSOFT Work- shop on Program Analysis for Software Tools and Engineering. New York, NY , USA: Association for Computing Machinery, 2005, pp. 82— 87

  13. [13]

    Claude Code Plugins,

    Anthropic, “Claude Code Plugins,” https://docs.claude.com/en/docs/cl aude-code/plugins, 2025, accessed: 2026-05-10

  14. [14]

    Aeneas: Rust verification by functional trans- lation,

    S. Ho and J. Protzenko, “Aeneas: Rust verification by functional trans- lation,”Proc. ACM Program. Lang., vol. 6, no. ICFP, pp. 116:1–116:31, 2022

  15. [15]

    Move Prover Spec Inference Paper Examples,

    Aptos Labs, “Move Prover Spec Inference Paper Examples,” https://gi thub.com/aptos-labs/aptos-core/tree/d8be469f7f/third_party/move/mov e-prover/doc/inference-paper-26/examples, 2026, commitd8be469f7f, accessed 2026-05-11

  16. [16]

    MoveFlow: AI-assisted Move smart contract development,

    ——, “MoveFlow: AI-assisted Move smart contract development,” https: //github.com/aptos-labs/aptos-core/tree/d8be469f7f/aptos-move/flow, 2026, commitd8be469f7f, accessed 2026-05-11

  17. [17]

    Formal Verification of Imperative First-Class Functions in Move

    W. Grieskamp, T. Zhang, V . Kashyap, and J. Silverman, “Formal verification of imperative first-class functions in Move,”CoRR, vol. abs/2605.10007, 2026. [Online]. Available: https://arxiv.org/abs/2605.1 0007

  18. [18]

    Agentic specification generator for Move programs,

    Y .-F. Fu, M. Xu, and T. Kim, “Agentic specification generator for Move programs,” inProceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025), 2025, pp. 1286–1298

  19. [19]

    VeriStruct: AI-assisted automated verification of data-structure modules in Verus,

    C. Sun, Y . Sun, D. Amrollahi, E. Zhang, S. K. Lahiri, S. Lu, D. L. Dill, and C. Barrett, “VeriStruct: AI-assisted automated verification of data-structure modules in Verus,” inTools and Algorithms for the Construction and Analysis of Systems (TACAS 2026), ser. Lecture Notes in Computer Science. Springer, 2026, arXiv:2510.25015. [Online]. Available: https...

  20. [20]

    PropertyGPT: LLM-driven formal verification of smart contracts through retrieval- augmented property generation,

    Y . Liu, Y . Xue, D. Wu, Y . Sun, Y . Li, M. Shi, and Y . Liu, “PropertyGPT: LLM-driven formal verification of smart contracts through retrieval- augmented property generation,” inNetwork and Distributed System Security Symposium (NDSS 2025), 2025

  21. [21]

    DafnyPro: LLM-assisted automated verification for Dafny programs,

    D. Banerjee, O. Bouissou, and S. Zetzsche, “DafnyPro: LLM-assisted automated verification for Dafny programs,” 2026, dafny Workshop, POPL 2026. [Online]. Available: https://arxiv.org/abs/2601.05385

  22. [22]

    AutoVerus: Automated proof generation for Rust code,

    C. Yang, X. Li, M. R. H. Misu, J. Yao, W. Cui, Y . Gong, C. Hawblitzel, S. K. Lahiri, J. R. Lorch, S. Lu, F. Yang, Z. Zhou, and S. Lu, “AutoVerus: Automated proof generation for Rust code,”Proc. ACM Program. Lang., vol. 9, no. OOPSLA2, 2025

  23. [23]

    Boogie: A modular reusable verifier for object-oriented programs,

    M. Barnett, B.-Y . E. Chang, R. DeLine, B. Jacobs, and K. R. M. Leino, “Boogie: A modular reusable verifier for object-oriented programs,” inInternational Symposium on Formal Methods for Components and Objects. Springer, 2005, pp. 364–387

  24. [24]

    A polymorphic intermediate verifica- tion language: Design and logical encoding,

    K. R. M. Leino and P. Rümmer, “A polymorphic intermediate verifica- tion language: Design and logical encoding,” inTACAS, J. Esparza and R. Majumdar, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 312–327