pith. sign in

arxiv: 2605.24619 · v1 · pith:C2VVFW33new · submitted 2026-05-23 · 💻 cs.SE

Synthesizing Inductive Invariants for Distributed Protocols via IC3 and Large Language Models

Pith reviewed 2026-06-30 13:01 UTC · model grok-4.3

classification 💻 cs.SE
keywords inductive invariantsdistributed protocolsTLA+IC3large language modelssafety verificationneuro-symbolic methodsprotocol verification
0
0 comments X

The pith

IC3Syn combines IC3 with LLMs to synthesize inductive invariants for all 29 tested distributed protocols in TLA+.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces IC3Syn to automatically find inductive invariants needed to prove safety in distributed protocols. These invariants must hold initially and be preserved by transitions while implying the safety property. Existing methods either limit the protocol models or require human-written templates. IC3Syn uses a symbolic controller to create specific tasks and an LLM to reason about the protocol at a high level. It succeeds on every protocol in the evaluation set, with the invariants verified to work for the complete unbounded versions using a theorem prover.

Core claim

IC3Syn executes an IC3-style process over TLA+ states where the IC3 controller decomposes the invariant search into blocking tasks and the LLM provides the missing protocol-level reasoning, enabling discovery of invariants on finite instances that TLAPS confirms are inductive for the full protocol.

What carries the argument

The IC3Syn neuro-symbolic framework that pairs a symbolic IC3 controller for task decomposition with LLM protocol-level reasoning.

If this is right

  • Safety can be established for protocols like MongoLoglessDynamicRaft that previous tools could not handle.
  • The approach applies to consensus, reconfiguration, and client-server protocols without logical restrictions.
  • Invariants synthesized on finite instances transfer to unbounded models.
  • No manual templates are required for the synthesis process.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This integration could reduce the need for expert-crafted templates in verifying additional classes of systems.
  • Similar neuro-symbolic loops might be tested on liveness properties or other specification languages.
  • The results on industrial-scale examples point to possible use in automated protocol design pipelines.

Load-bearing premise

The invariants discovered on finite instances remain inductive when applied to the full unbounded protocol without subtle flaws that the theorem prover overlooks.

What would settle it

A case where the candidate invariants pass checks on finite instances but fail to be inductive or imply safety when checked by TLAPS on the unbounded TLA+ model.

Figures

Figures reproduced from arXiv: 2605.24619 by Guangyuan Wu, Hengfeng Wei, Taolue Chen, Weining Cao, Xiaoxing Ma, Yuan Yao.

Figure 1
Figure 1. Figure 1: Overview of IC3Syn. The IC3 loop iterates through four stages: frontier query, backward predecessor search, blocking-clause search, and clause push. They are coordinated by three components: (1) an IC3 controller that maintains frames and manages backward search from frontier bad states, (2) a verification backend where Apalache handles symbolic witness queries and TLC screens candidate blocking clauses ag… view at source ↗
read the original abstract

Distributed protocols are notoriously difficult to verify correctly. Proving safety typically requires inductive invariants that both imply the desired property and are preserved by every protocol transition; yet inferring such invariants remains a major bottleneck: existing approaches either restrict the protocol models to a decidable fragment of first-order logic or demand expert-crafted templates. We present IC3Syn, a neuro-symbolic framework that synthesizes inductive invariants by executing an IC3-style process over TLA+ states with the assistance of Large Language Models (LLMs). At large, IC3Syn combines a symbolic IC3 controller, which decomposes invariant synthesis into focused blocking tasks and an LLM which provides protocol-level reasoning that IC3 alone lacks for TLA+ specifications. This integration enables a disciplined yet flexible search for invariants without imposing logical restrictions or requiring manual templates. We evaluate IC3Syn on 29 distributed protocols spanning consensus, reconfiguration and client-server systems, and compare it against Endive, IC3PO, SWISS and DistAI. IC3Syn discovers candidate invariants for all 29 protocols, including MongoLoglessDynamicRaft (MLDR), an industrial-scale Raft-based reconfiguration protocol for which none of the compared tools reports a solution, as well as one complex Paxos variant. In each case, the invariants synthesized on finite instances are shown in TLAPS to be inductive for the full unbounded protocol, thereby establishing safety.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents IC3Syn, a neuro-symbolic framework that combines a symbolic IC3 controller with LLMs to synthesize inductive invariants for TLA+ specifications of distributed protocols. The approach decomposes synthesis into focused blocking tasks on finite instances, using the LLM for protocol-level reasoning to generate candidate clauses. It reports success on all 29 evaluated protocols (including MongoLoglessDynamicRaft and a complex Paxos variant), with the synthesized invariants verified via TLAPS to be inductive and to imply safety over the full unbounded models, outperforming Endive, IC3PO, SWISS, and DistAI.

Significance. If the results hold, the work is significant for reducing reliance on expert templates or decidable fragments in distributed protocol verification. Success on an industrial-scale reconfiguration protocol like MLDR, where prior tools report no solution, is a notable practical advance. Explicit credit is due for the machine-checked TLAPS proofs establishing inductiveness and safety on the unbounded TLA+ models, and for the reproducible empirical comparison across multiple baselines.

major comments (2)
  1. [Evaluation section] Evaluation section (and abstract): The 100% success rate on 29 protocols is reported without quantitative data on LLM usage, such as the number of candidates generated per protocol, rejection rates, prompt variations tested, or failure modes encountered (especially for MLDR). This information is load-bearing for assessing the reliability and reproducibility of the LLM component in the neuro-symbolic integration.
  2. [Approach and evaluation sections] Approach and evaluation sections: The method relies on finite-instance IC3 blocking tasks to surface candidates that TLAPS then certifies for the unbounded model. No argument or sensitivity analysis is provided showing that the chosen finite bounds (e.g., for MLDR) are adequate to surface all invariants required for unbounded inductiveness, even though TLAPS accepts the final sets. This is load-bearing for the transfer claim underlying success on complex protocols.
minor comments (2)
  1. [Evaluation section] The paper would benefit from a table summarizing per-protocol metrics (e.g., number of invariants found, TLAPS proof times) to complement the high-level success claims.
  2. [§3] Notation for the IC3 blocking tasks and LLM interaction could be clarified with a small example in §3 to make the decomposition more accessible.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our submission. We address each major comment below and indicate where revisions will be made to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Evaluation section] Evaluation section (and abstract): The 100% success rate on 29 protocols is reported without quantitative data on LLM usage, such as the number of candidates generated per protocol, rejection rates, prompt variations tested, or failure modes encountered (especially for MLDR). This information is load-bearing for assessing the reliability and reproducibility of the LLM component in the neuro-symbolic integration.

    Authors: We agree that quantitative details on LLM usage would strengthen reproducibility claims. In the revised manuscript we will add a new subsection (or table) in the evaluation reporting, for each of the 29 protocols: (i) total LLM queries issued, (ii) number of unique candidate clauses generated, (iii) rejection rate by the IC3 controller, and (iv) any prompt-template variations that were explored. For MLDR we will additionally document the specific failure modes observed before a successful invariant set was obtained. These statistics were recorded during our experiments and can be included without altering the core claims. revision: yes

  2. Referee: [Approach and evaluation sections] Approach and evaluation sections: The method relies on finite-instance IC3 blocking tasks to surface candidates that TLAPS then certifies for the unbounded model. No argument or sensitivity analysis is provided showing that the chosen finite bounds (e.g., for MLDR) are adequate to surface all invariants required for unbounded inductiveness, even though TLAPS accepts the final sets. This is load-bearing for the transfer claim underlying success on complex protocols.

    Authors: We acknowledge the value of an explicit argument for bound selection. The IC3 blocking phase operates only on finite instances to produce candidate clauses; the subsequent TLAPS proof establishes inductiveness and safety directly on the unbounded TLA+ model, which is the transfer guarantee. Nevertheless, we will insert a short paragraph in the approach section explaining the heuristic used to choose finite bounds (typically the smallest number of nodes/processes sufficient to expose all relevant blocking obligations) and will report, for MLDR, the concrete bounds employed together with a brief note that increasing those bounds did not yield additional necessary clauses. A full sensitivity sweep across multiple bound sizes for every protocol was not performed; we therefore mark this revision as partial. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical synthesis and external verification are independent

full rationale

The paper describes a neuro-symbolic IC3+LLM framework evaluated empirically on 29 protocols against external baselines (Endive, IC3PO, SWISS, DistAI) with TLAPS certification on unbounded TLA+ models. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations reduce any claimed result to its inputs by construction. The finite-to-unbounded transfer is presented as an empirical outcome verified externally rather than derived tautologically from the method's own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Framework rests on standard TLA+ semantics and IC3 algorithm; no new free parameters, axioms beyond domain assumptions, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption TLA+ transition semantics are faithfully represented in the finite instances used by IC3
    The IC3 controller operates directly on TLA+ states.

pith-pipeline@v0.9.1-grok · 5796 in / 1219 out tokens · 33459 ms · 2026-06-30T13:01:30.646032+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 29 canonical work pages · 2 internal anchors

  1. [1]

    Aaron R. Bradley. 2011. SAT-based model checking without unrolling. InInternational Conference on Verification, Model Checking, and Ab- stract Interpretation. Springer, Springer, Berlin, Heidelberg, 70–87. doi:10.1007/978-3-642-18275-4_7

  2. [2]

    Aaron R. Bradley. 2012. Understanding IC3. InInternational Conference on Theory and Applications of Satisfiability Testing (SAT). Springer, Berlin, Heidelberg, 1–14. doi:10.1007/978-3-642-31612-8_1

  3. [3]

    Weining Cao, Guangyuan Wu, Tangzhi Xu, Yuan Yao, Hengfeng Wei, Taolue Chen, and Xiaoxing Ma. 2025. Clause2Inv: A Generate- Combine-Check Framework for Loop Invariant Inference.Proceed- ings of the ACM on Software Engineering2, ISSTA (2025), 1009–1030. doi:10.1145/3728920

  4. [4]

    Saikat Chakraborty, Shuvendu Lahiri, Sarah Fakhoury, Akash Lal, Madanlal Musuvathi, Aseem Rastogi, Aditya Senthilnathan, Rahul Sharma, and Nikhil Swamy. 2023. Ranking LLM-Generated Loop Invariants for Program Verification. InFindings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association...

  5. [5]

    Kaustuv Chaudhuri, Damien Doligez, Leslie Lamport, and Stephan Merz. 2008. A TLA+ proof system. InProceedings of the LPAR Work- shop on Knowledge Exchange: Automated Provers and Proof Assis- tants (KEAPPA) (CEUR Workshop Proceedings, Vol. 418). CEUR-WS.org, Aachen, Germany, 17–37.https://ceur-ws.org/Vol-418/paper2.pdf

  6. [6]

    Alessandro Cimatti and Alberto Griggio. 2016. Infinite-state invariant checking with IC3 and predicate abstraction.Formal Methods in System Design49, 3 (2016), 190–218. doi:10.1007/s10703-016-0257-4

  7. [7]

    Alessandro Cimatti, Alberto Griggio, Sergio Mover, and Stefano Tonetta. 2014. IC3 Modulo Theories via Implicit Predicate Abstraction. InTools and Algorithms for the Construction and Analysis of Systems (TACAS). Springer, Berlin, Heidelberg, 46–61. doi:10.1007/978-3-642- 54862-8_4

  8. [8]

    Niklas Eén, Alan Mishchenko, and Robert Brayton. 2011. Efficient implementation of property directed reachability. In2011 Formal Meth- ods in Computer-Aided Design (FMCAD). IEEE, Piscataway, NJ, USA, 125–134.https://ieeexplore.ieee.org/document/6148886/

  9. [9]

    Yotam M. Y. Feldman, James R. Wilcox, Sharon Shoham, and Mooly Sagiv. 2019. Inferring Inductive Invariants from Phase Structures. InComputer Aided Verification (CA V). Springer, Berlin, Heidelberg, 405–425. doi:10.1007/978-3-030-25543-5_23

  10. [10]

    Sakallah

    Aman Goel and Karem A. Sakallah. 2021. On symmetry and quan- tification: A new approach to verify distributed protocols. InNASA Formal Methods: 13th International Symposium, NFM 2021, Virtual Event, May 24–28, 2021, Proceedings. Springer, Cham, Switzerland, 131–150. doi:10.1007/978-3-030-76384-8_9

  11. [11]

    Jim Gray and Leslie Lamport. 2006. Consensus on Transaction Commit. ACM Transactions on Database Systems31, 1 (2006), 133–160. doi:10. 1145/1132863.1132867

  12. [12]

    Travis Hance, Marijn Heule, Ruben Martins, and Bryan Parno. 2021. Finding Invariants of Distributed Systems: It’s a Small (Enough) World After All. In18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). USENIX Association, Virtual Event, 115–131. https://www.usenix.org/conference/nsdi21/presentation/hance

  13. [13]

    Lorch, Bryan Parno, Michael L

    Chris Hawblitzel, Jon Howell, Manos Kapritsos, Jacob R. Lorch, Bryan Parno, Michael L. Roberts, Srinath Setty, and Brian Zill. 2015. IronFleet: Proving practical distributed systems correct. InProceedings of the 25th Symposium on Operating Systems Principles (SOSP). Association for Computing Machinery, New York, NY, USA, 1–17. doi:10.1145/2815400. 2815428

  14. [14]

    Kryštof Hoder and Nikolaj Bjørner. 2012. Generalized property di- rected reachability. InInternational Conference on Theory and Appli- cations of Satisfiability Testing. Springer, Springer, Berlin, Heidelberg, 157–171. doi:10.1007/978-3-642-31612-8_13

  15. [15]

    Adharsh Kamath, Nausheen Mohammed, Aditya Senthilnathan, Saikat Chakraborty, Pantazis Deligiannis, Shuvendu K Lahiri, Akash Lal, Aseem Rastogi, Subhajit Roy, and Rahul Sharma. 2024. Leveraging LLMs for program verification. InFormal Methods in Computer-Aided Design (FMCAD). FMCAD, Inc., Prague, Czech Republic, 107–118. doi:10.34727/2024/isbn.978-3-85448-065-5_16

  16. [16]

    Koenig, Oded Padon, Neil Immerman, and Alex Aiken

    Jason R. Koenig, Oded Padon, Neil Immerman, and Alex Aiken. 2020. First-order quantified separators. InProceedings of the 41st ACM SIG- PLAN Conference on Programming Language Design and Implemen- tation. Association for Computing Machinery, New York, NY, USA, 703–717. doi:10.1145/3385412.3386018

  17. [17]

    Igor Konnov, Jure Kukovec, and Thanh-Hai Tran. 2019. TLA+ model checking made symbolic.Proceedings of the ACM on Programming Lan- guages3, OOPSLA, Article 123 (2019), 30 pages. doi:10.1145/3360549

  18. [18]

    Leslie Lamport. 1998. The Part-Time Parliament.ACM Transactions on Computer Systems16, 2 (1998), 133–169. doi:10.1145/279227.279229

  19. [19]

    Leslie Lamport. 2001. Paxos Made Simple.ACM SIGACT News32, 4 (2001), 51–58.https://lamport.azurewebsites.net/pubs/paxos-simple. pdf

  20. [20]

    2002.Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers

    Leslie Lamport. 2002.Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley, Boston, MA.https://lamport.org/tla/book.html

  21. [21]

    Sakallah

    Haojun Ma, Aman Goel, Jean-Baptiste Jeannin, Manos Kapritsos, Baris Kasikci, and Karem A. Sakallah. 2019. I4: Incremental Inference of Inductive Invariants for Verification of Distributed Protocols. InPro- ceedings of the 27th ACM Symposium on Operating Systems Principles. Association for Computing Machinery, New York, NY, USA, 370–384. doi:10.1145/334130...

  22. [22]

    McMillan

    Kenneth L. McMillan. 2003. Interpolation and SAT-Based Model Check- ing. InComputer Aided Verification (CA V). Springer, Berlin, Heidelberg, 1–13. doi:10.1007/978-3-540-45069-6_1

  23. [23]

    Chris Newcombe, Tim Rath, Fan Zhang, Bogdan Munteanu, Marc Brooker, and Michael Deardeuff. 2015. How Amazon Web Services Uses Formal Methods.Commun. ACM58, 4 (2015), 66–73. doi:10.1145/ 2699417

  24. [24]

    Oded Padon, Giuliano Losa, Mooly Sagiv, and Sharon Shoham. 2017. Paxos Made EPR: Decidable Reasoning about Distributed Protocols. Proceedings of the ACM on Programming Languages1, OOPSLA, Article 108 (2017), 31 pages. doi:10.1145/3140568

  25. [25]

    McMillan, Aurojit Panda, Mooly Sagiv, and Sharon Shoham

    Oded Padon, Kenneth L. McMillan, Aurojit Panda, Mooly Sagiv, and Sharon Shoham. 2016. Ivy: Safety verification by interactive generaliza- tion. InProceedings of the 37th ACM SIGPLAN Conference on Program- ming Language Design and Implementation. Association for Computing Machinery, New York, NY, USA, 614–630. doi:10.1145/2908080.2908118

  26. [26]

    Muhammad A. A. Pirzada, Giles Reger, Ahmed Bhayat, and Lucas C. Cordeiro. 2024. LLM-Generated Invariants for Bounded Model Check- ing Without Loop Unrolling. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. Asso- ciation for Computing Machinery, New York, NY, USA, 1395–1407. doi:10.1145/3691620.3695512

  27. [27]

    William Schultz, Edward Ashton, Heidi Howard, and Stavros Tri- pakis. 2024. Scalable, Interpretable Distributed Protocol Verifi- cation by Inductive Proof Slicing. doi:10.48550/arXiv.2404.18048 arXiv:2404.18048 [cs.DC]

  28. [28]

    William Schultz, Ian Dardik, and Stavros Tripakis. 2022. Plain and simple inductive invariant inference for distributed protocols in TLA+. InProceedings of the 22nd Conference on Formal Methods in Computer- Aided Design (FMCAD). FMCAD, Inc., Austin, TX, USA, 273–283. doi:10. 34727/2022/isbn.978-3-85448-053-2_34

  29. [29]

    Anjiang Wei, Tarun Suresh, Tianran Sun, Haoze Wu, Ke Wang, and Alex Aiken. 2025. Quokka: Accelerating Program Verification with LLMs via Invariant Synthesis. doi:10.48550/arXiv.2509.21629 arXiv:2509.21629 [cs.PL] 14

  30. [30]

    Cheng Wen, Jialun Cao, Jie Su, Zhiwu Xu, Shengchao Qin, Mengda He, Haokun Li, Shing-Chi Cheung, and Cong Tian. 2024. Enchanting Pro- gram Specification Synthesis by Large Language Models Using Static Analysis and Program Verification. InComputer Aided Verification: 36th International Conference, CA V 2024, Montreal, QC, Canada, July 24–27, 2024, Proceedin...

  31. [31]

    Wilcox, Doug Woos, Pavel Panchekha, Zachary Tatlock, Xi Wang, Michael D

    James R. Wilcox, Doug Woos, Pavel Panchekha, Zachary Tatlock, Xi Wang, Michael D. Ernst, and Thomas Anderson. 2015. Verdi: A frame- work for implementing and formally verifying distributed systems. InProceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. Association for Computing Ma- chinery, New York, NY, USA, 3...

  32. [32]

    Guangyuan Wu, Weining Cao, Yuan Yao, Hengfeng Wei, Taolue Chen, and Xiaoxing Ma. 2024. LLM Meets Bounded Model Checking: Neuro- symbolic Loop Invariant Inference. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering(Sacra- mento, CA, USA)(ASE ’24). Association for Computing Machinery, New York, NY, USA, 406–417. do...

  33. [33]

    Haoze Wu, Clark Barrett, and Nina Narodytska. 2024. Lemur: Inte- grating Large Language Models in Automated Program Verification. https://openreview.net/forum?id=Q3YaCghZNtICLR 2024 poster, OpenReview.net

  34. [34]

    VeruSAGE: A Study of Agent-Based Verification for Rust Systems

    Chenyuan Yang, Natalie Neamtu, Chris Hawblitzel, Jacob R. Lorch, and Shan Lu. 2025. VeruSAGE: A Study of Agent-Based Verification for Rust Systems. doi:10.48550/arXiv.2512.18436arXiv:2512.18436 [cs.SE]

  35. [35]

    Jianan Yao, Runzhou Tao, Ronghui Gu, Jason Nieh, Suman Jana, and Gabriel Ryan. 2021. DistAI: Data-driven automated invariant learning for distributed protocols. In15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). USENIX Association, Santa Clara, CA, 405–421.https://www.usenix.org/conference/osdi21/ presentation/yao

  36. [36]

    Yuan Yu, Panagiotis Manolios, and Leslie Lamport. 1999. Model checking TLA+ specifications. InCorrect Hardware Design and Veri- fication Methods: 10th IFIP WG 10.5 Advanced Research Working Con- ference, CHARME’99. Springer, Springer, Berlin, Heidelberg, 54–66. doi:10.1007/3-540-48153-2_6

  37. [37]

    Tony Nuda Zhang, Travis Hance, Manos Kapritsos, Tej Chajed, and Bryan Parno. 2024. Inductive Invariants That Spark Joy: Using In- variant Taxonomies to Streamline Distributed Protocol Proofs. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). USENIX Association, Santa Clara, CA, 837–853.https: //www.usenix.org/conference/osd...

  38. [38]

    Tony Nuda Zhang, Keshav Singh, Tej Chajed, Manos Kapritsos, and Bryan Parno. 2025. Basilisk: Using Provenance Invariants to Auto- mate Proofs of Undecidable Protocols. In19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). USENIX Association, Boston, MA, 1–18.https://www.usenix.org/conference/ osdi25/presentation/zhang-tonyBest ...

  39. [39]

    Yuhao Zhou and Stavros Tripakis. 2025. Towards Language Model Guided TLA+ Proof Automation. doi:10.48550/arXiv.2512.09758 arXiv:2512.09758 [cs.LO] 15