pith. sign in

arxiv: 2606.17182 · v1 · pith:NYWMQUXOnew · submitted 2026-06-15 · 💻 cs.LG · cs.DC· cs.LO· cs.MA· cs.PL

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

Pith reviewed 2026-06-27 03:49 UTC · model grok-4.3

classification 💻 cs.LG cs.DCcs.LOcs.MAcs.PL
keywords concurrency anomaliesmulti-agent LLM systemsformal verificationTLA+Verusconsistency hierarchydeterministic executionread-generate-write
0
0 comments X

The pith

A 274-obligation Verus development proves sound and complete detectors for four concurrency anomalies in multi-agent LLM systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models state sharing in multi-agent LLM systems as long-running read-generate-write operations under deterministic-generation semantics. It formalizes four anomalies in TLA+ as structural analogues of classical isolation anomalies. A consistency hierarchy consisting of the chain L0 to L4 is defined and one maximal chain is shown to be realizable with strict separation. A machine-checked Verus development establishes that detectors for each level are sound and complete against the specifications, with corresponding avoidance sets realized in Rust runtimes.

Core claim

The exclusion lattice over the four anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, L0 ⊂neq ⋯ ⊂neq L4, to our knowledge the first machine-checked consistency hierarchy for such runtimes, proved via 274 Verus obligations with a trust base of two structural axioms and a mutex correspondence.

What carries the argument

The Verus development that proves the detectors sound and complete against the TLA+ specifications of the anomalies and each runtime's avoidance set.

If this is right

  • Three Rust runtimes realize L0-L1 via pessimistic locking and serializable snapshot isolation, each verified against stale-generation.
  • L2-L4 are realized by exec-mode-verified prevention twins that achieve 0/1000 anomaly rate versus 1000/1000 without the twin.
  • An L0 to L1 refinement eliminates a reproduced silent lost update from ByteDance's deer-flow.
  • An L3 commit-order sequencer removes tool-effect reordering observed on unmodified output from LangGraph's ToolNode.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same verification approach could be applied to other shared-state agent frameworks that adopt deterministic replay.
  • The hierarchy supplies concrete upgrade paths for existing systems that currently exhibit the anomalies.
  • Live verification across model families indicates the prevention mechanisms do not depend on particular LLM generation behaviors.

Load-bearing premise

The two structural axioms and mutex correspondence accurately capture the read-generate-write operations under deterministic-generation semantics.

What would settle it

An execution trace that produces one of the four anomalies inside a runtime asserted to enforce its corresponding avoidance level, or the discovery of a failing obligation inside the 274-obligation Verus development.

Figures

Figures reproduced from arXiv: 2606.17182 by Sajjad Khan.

Figure 1
Figure 1. Figure 1: Hasse diagram of the consistency lattice [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: SSI token overhead versus realized abort rate under the high [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗
read the original abstract

Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics -- the regime durable-execution engines enforce by deterministic replay -- and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, $L_0 \subsetneq \cdots \subsetneq L_4$, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified $L_0 \to L_1$ refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims to model concurrency anomalies (stale-generation, phantom-tool, causal-cascade, tool-effect reordering) in multi-agent LLM systems as long-running read-generate-write operations under deterministic-generation semantics. It formalizes the anomalies in TLA+ with TLC counterexamples, proves a consistency hierarchy L0 ⊂ ... ⊂ L4 via a 274-obligation Verus development (zero assumes/admits, trust base of two structural axioms plus mutex correspondence), realizes L0-L1 in three verified Rust runtimes, provides exec-mode verified prevention for L2-L4, and reproduces/fixes issues in deer-flow and LangGraph.

Significance. If the result holds, the contribution is significant as the first machine-checked consistency hierarchy for such runtimes, with direct mechanical support from 274 Verus obligations and TLC counterexamples for soundness/completeness claims, plus explicit reproduction and fixes in deployed systems (deer-flow, LangGraph). The verified detector, refinements, and realizability artifacts provide strong evidence.

minor comments (3)
  1. [Abstract] Abstract: the phrase 'the exclusion lattice over these anomalies is trivial' is used without elaboration; a brief parenthetical or reference to the classical isolation literature would aid readers.
  2. [Verus development section] The two structural axioms forming the Verus trust base are declared but their precise statements are not reproduced in the main body; including them (even in an appendix reference) would improve self-containment.
  3. [Experimental evaluation] Table or figure reporting the 120 retracted sessions across model families should include the exact model versions and prompt templates used for reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed summary of the manuscript, the assessment of its significance, and the recommendation to accept. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; claims rest on external mechanical verification

full rationale

The paper's derivation consists of TLA+ anomaly definitions with TLC counterexamples, followed by a 274-obligation Verus development (zero assume, zero admit) that proves soundness and completeness of detectors against independently stated specifications and each runtime's avoidance set. The only non-mechanical elements are two explicitly declared structural axioms plus a mutex correspondence, stated as the scoped trust base for the deterministic-generation read-generate-write model. No equation or claim reduces by construction to a fitted parameter, self-definition, or self-citation chain; the realizability proofs and lattice separation are machine-checked against external benchmarks rather than being tautological with the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The verification rests on two unnamed structural axioms and a mutex correspondence as the explicit trust base; no free parameters are introduced and the anomalies are presented as formal analogues of existing types rather than new postulated entities.

axioms (2)
  • domain assumption two structural axioms
    Explicitly listed as part of the trust base for the 274 Verus obligations
  • domain assumption mutex correspondence
    Explicitly listed as part of the trust base for the 274 Verus obligations

pith-pipeline@v0.9.1-grok · 5869 in / 1279 out tokens · 75273 ms · 2026-06-27T03:49:38.968832+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

101 extracted references · 8 linked inside Pith

  1. [1]

    SagaLLM: Context management, validation, and transaction guarantees for multi-agent LLM planning,

    E. Y . Chang and L. Geng, “SagaLLM: Context management, validation, and transaction guarantees for multi-agent LLM planning,”Proceedings of the VLDB Endowment, 2025, arXiv:2503.11951. TABLE 6 Per-file Verus obligation counts (verus_count.sh -full).†marks the four helper files excluded from the 274 curated headline; the default verus_count.shrun applies th...

  2. [2]

    Atomix: Timely, transactional tool use for reliable agentic workflows,

    B. Mohammadi, N. Potamitis, L. Klein, A. Arora, and L. Bindschaedler, “Atomix: Timely, transactional tool use for reliable agentic workflows,” arXiv preprint arXiv:2602.14849, 2026, preprint; not yet peer-reviewed at time of writing

  3. [3]

    How to make a multiprocessor computer that correctly executes multiprocess programs,

    L. Lamport, “How to make a multiprocessor computer that correctly executes multiprocess programs,”IEEE Transactions on Computers, vol. C-28, no. 9, pp. 690–691, 1979

  4. [4]

    Memory consistency and event ordering in scalable shared- memory multiprocessors,

    K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, “Memory consistency and event ordering in scalable shared- memory multiprocessors,” inISCA, 1990, pp. 15–26

  5. [5]

    Sim- plifying arm concurrency: Multicopy-atomic axiomatic and operational models for ARMv8,

    C. Pulte, S. Flur, W. Deacon, J. French, S. Sarkar, and P. Sewell, “Sim- plifying arm concurrency: Multicopy-atomic axiomatic and operational models for ARMv8,” inPOPL, vol. 2, 2018, pp. 19:1–19:29

  6. [6]

    Linearizability: A correctness condition for concurrent objects,

    M. P. Herlihy and J. M. Wing, “Linearizability: A correctness condition for concurrent objects,”ACM TOPLAS, vol. 12, no. 3, pp. 463–492, 1990

  7. [7]

    A critique of ANSI SQL isolation levels,

    H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil, “A critique of ANSI SQL isolation levels,” inACM SIGMOD, 1995, pp. 1–10

  8. [8]

    Weak consistency: A generalized theory and optimistic implementations for distributed transactions,

    A. Adya, “Weak consistency: A generalized theory and optimistic implementations for distributed transactions,” Ph.D. dissertation, MIT, 1999

  9. [9]

    Generalized isolation level definitions,

    A. Adya, B. Liskov, and P. O’Neil, “Generalized isolation level definitions,” inProceedings of the 16th International Conference on Data Engineering (ICDE), 2000, pp. 67–78

  10. [10]

    Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS,

    W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen, “Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS,” inSOSP, 2011, pp. 401–416

  11. [11]

    Consistency, availability, and convergence,

    P. Mahajan, L. Alvisi, and M. Dahlin, “Consistency, availability, and convergence,” UT Austin, Tech. Rep. UTCS-TR-11-22, 2011

  12. [12]

    Burckhardt,Principles of Eventual Consistency

    S. Burckhardt,Principles of Eventual Consistency. now publishers, 2014

  13. [13]

    Conflict-free replicated data types,

    M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski, “Conflict-free replicated data types,” inSSS, 2011, pp. 386–400

  14. [14]

    Session guarantees for weakly consistent replicated data,

    D. B. Terry, A. J. Demers, K. Petersen, M. J. Spreitzer, M. M. Theimer, and B. B. Welch, “Session guarantees for weakly consistent replicated data,” inPDIS, 1994, pp. 140–149

  15. [15]

    A universal modular ACTOR formalism for artificial intelligence,

    C. Hewitt, P. Bishop, and R. Steiger, “A universal modular ACTOR formalism for artificial intelligence,” inIJCAI, 1973, pp. 235–245. S. KHAN 31

  16. [16]

    Consistency tradeoffs in modern distributed database system design: CAP is only part of the story,

    D. J. Abadi, “Consistency tradeoffs in modern distributed database system design: CAP is only part of the story,”IEEE Computer, vol. 45, no. 2, pp. 37–42, 2012

  17. [17]

    Consistency in non-transactional distributed storage systems,

    P. Viotti and M. Vukoli´c, “Consistency in non-transactional distributed storage systems,”ACM Computing Surveys, vol. 49, no. 1, pp. 19:1–19:34, 2016

  18. [18]

    CodeCRDT: Observation-driven coordination for multi- agent LLM code generation,

    S. Pugachev, “CodeCRDT: Observation-driven coordination for multi- agent LLM code generation,”arXiv preprint arXiv:2510.18893, 2025

  19. [19]

    Why do multi-agent LLM systems fail?

    M. Cemri, M. Z. Pan, S. Yang, L. A. Agrawal, B. Chopra, R. Tiwari, K. Keutzer, A. Parameswaran, D. Klein, K. Ramchandran, M. Zaharia, J. E. Gonzalez, and I. Stoica, “Why do multi-agent LLM systems fail?” inNeurIPS Datasets and Benchmarks Track, 2025, spotlight; arXiv:2503.13657

  20. [20]

    Bolt-on causal con- sistency,

    P. Bailis, A. Ghodsi, J. M. Hellerstein, and I. Stoica, “Bolt-on causal con- sistency,” inProceedings of the ACM SIGMOD International Conference on Management of Data, 2013, pp. 761–772

  21. [21]

    Serializable isolation for snapshot databases,

    M. J. Cahill, U. Röhm, and A. D. Fekete, “Serializable isolation for snapshot databases,” inProceedings of the ACM SIGMOD International Conference on Management of Data, 2008, pp. 729–738

  22. [22]

    Making snapshot isolation serializable,

    A. Fekete, D. Liarokapis, E. O’Neil, P. O’Neil, and D. Shasha, “Making snapshot isolation serializable,”ACM Transactions on Database Systems (TODS), vol. 30, no. 2, pp. 492–528, 2005

  23. [23]

    Verus: Verifying Rust programs using linear ghost types,

    A. Lattuada, T. Hance, C. Cho, M. Brun, I. Subasinghe, Y . Zhou, J. Howell, B. Parno, and C. Hawblitzel, “Verus: Verifying Rust programs using linear ghost types,” inProceedings of the ACM on Programming Languages (OOPSLA), vol. 7, 2023, pp. 286–315

  24. [24]

    AutoGen: Enabling next-gen LLM applications via multi-agent conversation,

    Q. Wu, G. Bansal, J. Zhang, Y . Wu, S. Zhang, E. Zhu, B. Li, L. Jiang, X. Zhang, and C. Wang, “AutoGen: Enabling next-gen LLM applications via multi-agent conversation,” inarXiv preprint arXiv:2308.08155, 2023

  25. [25]

    Probabilistically bounded staleness for practical partial quorums,

    P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein, and I. Sto- ica, “Probabilistically bounded staleness for practical partial quorums,” Proceedings of the VLDB Endowment, vol. 5, no. 8, pp. 776–787, 2012

  26. [26]

    Quantitative relaxation of concurrent data structures,

    T. A. Henzinger, C. M. Kirsch, H. Payer, A. Sezgin, and A. Sokolova, “Quantitative relaxation of concurrent data structures,” inProceedings of the 40th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2013, pp. 317–328

  27. [27]

    Robust composition: Towards a unified approach to access control and concurrency control,

    M. S. Miller, “Robust composition: Towards a unified approach to access control and concurrency control,” Ph.D. dissertation, Johns Hopkins University, 2006

  28. [28]

    Analysing the security properties of object-capability patterns,

    T. Murray, “Analysing the security properties of object-capability patterns,” Ph.D. dissertation, University of Oxford, 2010

  29. [29]

    Toolformer: Language models can teach themselves to use tools,

    T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettle- moyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” inNeurIPS, 2023

  30. [30]

    Gorilla: Large language model connected with massive APIs,

    S. G. Patil, T. Zhang, X. Wang, and J. E. Gonzalez, “Gorilla: Large language model connected with massive APIs,” inarXiv preprint arXiv:2305.15334, 2023

  31. [31]

    Garcia-Molina and K

    H. Garcia-Molina and K. Salem, “Sagas,” inProceedings of the 1987 ACM SIGMOD International Conference on Management of Data, 1987, pp. 249–259

  32. [32]

    IronFleet: Proving practical distributed systems correct,

    C. Hawblitzel, J. Howell, M. Kapritsos, J. R. Lorch, B. Parno, M. L. Roberts, S. Setty, and B. Zill, “IronFleet: Proving practical distributed systems correct,” inProceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), 2015, pp. 1–17

  33. [33]

    Repairing sequential consistency in C/C++11,

    O. Lahav, V . Vafeiadis, J. Kang, C.-K. Hur, and D. Dreyer, “Repairing sequential consistency in C/C++11,” inProceedings of the 38th ACM SIG- PLAN Conference on Programming Language Design and Implementation (PLDI), 2017, pp. 618–632

  34. [34]

    GenMC: A model checker for weak memory models,

    M. Kokologiannakis and V . Vafeiadis, “GenMC: A model checker for weak memory models,” inComputer Aided Verification (CAV), 2021, pp. 427–440

  35. [35]

    RustBelt: Securing the foundations of the Rust programming language,

    R. Jung, J.-H. Jourdan, R. Krebbers, and D. Dreyer, “RustBelt: Securing the foundations of the Rust programming language,” inProceedings of the 45th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL). Association for Computing Machinery, 2018, pp. 66:1–66:34

  36. [36]

    A promising semantics for relaxed-memory concurrency,

    J. Kang, C.-K. Hur, O. Lahav, V . Vafeiadis, and D. Dreyer, “A promising semantics for relaxed-memory concurrency,” inProceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), 2017, pp. 175–189

  37. [37]

    INV ALID_CONCURRENT_GRAPH_UPDATE: Lang- Graph error reference,

    LangChain AI, “INV ALID_CONCURRENT_GRAPH_UPDATE: Lang- Graph error reference,” https://docs.langchain.com/oss/python/langgraph/ errors/INV ALID_CONCURRENT_GRAPH_UPDATE, 2025, accessed November 2025

  38. [38]

    InvalidUpdateError when using parallel subgraphs with shared state keys in LangGraph,

    Issue #6446, langchain-ai/langgraph, “InvalidUpdateError when using parallel subgraphs with shared state keys in LangGraph,” https://github. com/langchain-ai/langgraph/issues/6446, 2025, accessed November 2025

  39. [39]

    Handling shared state across multi- agent conversations in AutoGen,

    Discussion #7144, microsoft/autogen, “Handling shared state across multi- agent conversations in AutoGen,” https://github.com/microsoft/autogen/ discussions/7144, 2025, accessed November 2025

  40. [40]

    Stateful persistence grains for each agent,

    Pull Request #3954, microsoft/autogen, “Stateful persistence grains for each agent,” https://github.com/microsoft/autogen/pull/3954, 2024, accessed November 2025

  41. [41]

    GraphFlow state persistence bug: workflow gets stuck after interruption during agent transitions,

    Issue #7043, microsoft/autogen, “GraphFlow state persistence bug: workflow gets stuck after interruption during agent transitions,” https: //github.com/microsoft/autogen/issues/7043, 2025, accessed November 2025

  42. [42]

    To-do list shown during streaming disappears after the run completes,

    Issue #3123, bytedance/deer-flow, “To-do list shown during streaming disappears after the run completes,” https://github.com/bytedance/deer- flow/issues/3123, 2026, accessed June 2026

  43. [43]

    fix(agents): preserve todos state across node updates,

    Pull Request #3180, bytedance/deer-flow, “fix(agents): preserve todos state across node updates,” https://github.com/bytedance/deer-flow/pull/3180, 2026, accessed June 2026

  44. [44]

    [runtime] Plan mode fails with duplicate todos channel type conflict,

    Issue #3199, bytedance/deer-flow, “[runtime] Plan mode fails with duplicate todos channel type conflict,” https://github.com/bytedance/deer- flow/issues/3199, 2026, accessed June 2026

  45. [45]

    Allocating isolation levels to transactions,

    A. Fekete, “Allocating isolation levels to transactions,” inProceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2005, pp. 206–215

  46. [46]

    Highly available transactions: Virtues and limitations,

    P. Bailis, A. Davidson, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Sto- ica, “Highly available transactions: Virtues and limitations,”Proceedings of the VLDB Endowment, vol. 7, no. 3, pp. 181–192, 2014

  47. [47]

    Keeping CALM: When distributed consistency is easy,

    J. M. Hellerstein and P. Alvaro, “Keeping CALM: When distributed consistency is easy,”Communications of the ACM, vol. 63, no. 9, pp. 72–81, 2020

  48. [48]

    Coordination avoidance in database systems,

    P. Bailis, A. Fekete, M. J. Franklin, A. Ghodsi, J. M. Hellerstein, and I. Stoica, “Coordination avoidance in database systems,” inProceedings of the VLDB Endowment, vol. 8, no. 3, 2014, pp. 185–196

  49. [49]

    Seeing is believing: A client-centric specification of database isolation,

    N. Crooks, Y . Pu, L. Alvisi, and A. Clement, “Seeing is believing: A client-centric specification of database isolation,” inProceedings of the ACM Symposium on Principles of Distributed Computing (PODC), 2017, pp. 73–82

  50. [50]

    Spanner: Google’s globally-distributed database,

    J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y . Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford, “Spanner: Google’s globally-distributed database,” inOSDI, 2012...

  51. [51]

    Calvin: Fast distributed transactions for partitioned database systems,

    A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi, “Calvin: Fast distributed transactions for partitioned database systems,” in ACM SIGMOD, 2012, pp. 1–12

  52. [52]

    Software transactional memory,

    N. Shavit and D. Touitou, “Software transactional memory,” inProceed- ings of the 14th ACM Symposium on Principles of Distributed Computing (PODC), 1995, pp. 204–213

  53. [53]

    Transactional memory: Architectural support for lock-free data structures,

    M. Herlihy and J. E. B. Moss, “Transactional memory: Architectural support for lock-free data structures,” inProceedings of the 20th International Symposium on Computer Architecture (ISCA), 1993, pp. 289–300

  54. [54]

    Multiversion concurrency control— theory and algorithms,

    P. A. Bernstein and N. Goodman, “Multiversion concurrency control— theory and algorithms,”ACM Transactions on Database Systems, vol. 8, no. 4, pp. 465–483, 1983

  55. [55]

    ACTA: The SAGA continues,

    P. K. Chrysanthis and K. Ramamritham, “ACTA: The SAGA continues,” inDatabase Transaction Models for Advanced Applications, A. K. Elmagarmid, Ed. Morgan Kaufmann, 1992, pp. 349–397

  56. [56]

    Verdi: A framework for implementing and formally verifying distributed systems,

    J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. Anderson, “Verdi: A framework for implementing and formally verifying distributed systems,” inProceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2015, pp. 357–368

  57. [57]

    Chapar: Certified causally consistent distributed key-value stores,

    M. Lesani, C. J. Bell, and A. Chlipala, “Chapar: Certified causally consistent distributed key-value stores,” inProceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2016

  58. [58]

    Jepsen: Distributed systems safety analysis,

    K. Kingsbury, “Jepsen: Distributed systems safety analysis,” https://jepsen. io, 2013–present, accessed 2026

  59. [59]

    Cobra: Making transactional key- value stores verifiably serializable,

    C. Tan, C. Zhao, S. Mu, and M. Walfish, “Cobra: Making transactional key- value stores verifiably serializable,” inProceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020, pp. 63–80

  60. [60]

    Elle: Inferring isolation anomalies from experimental observations,

    K. Kingsbury and P. Alvaro, “Elle: Inferring isolation anomalies from experimental observations,” arXiv:2003.10554, 2020

  61. [61]

    ReAct: Synergizing reasoning and acting in language models,

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “ReAct: Synergizing reasoning and acting in language models,” inarXiv preprint arXiv:2210.03629, 2022. S. KHAN 32

  62. [62]

    MetaGPT: Meta programming for multi-agent collaborative framework,

    S. Hong, X. Zheng, J. Chen, Y . Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou, C. Ran, L. Xiao, C. Wu, and J. Schmidhuber, “MetaGPT: Meta programming for multi-agent collaborative framework,” inarXiv preprint arXiv:2308.00352, 2023

  63. [63]

    Generative agents: Interactive simulacra of human behavior,

    J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” inUIST, 2023

  64. [64]

    Model context protocol,

    Anthropic, “Model context protocol,” https://modelcontextprotocol.io, 2024

  65. [65]

    Database transaction models for advanced applica- tions,

    A. K. Elmagarmid, “Database transaction models for advanced applica- tions,”Morgan Kaufmann, 1992

  66. [66]

    Synthesis of extended trans- action models using ACTA,

    P. K. Chrysanthis and K. Ramamritham, “Synthesis of extended trans- action models using ACTA,”ACM Transactions on Database Systems, vol. 19, no. 3, pp. 450–491, 1994

  67. [67]

    The ConTract model,

    A. Reuter and H. Wächter, “The ConTract model,”IEEE Data Engineering Bulletin, vol. 14, no. 1, pp. 39–43, 1991

  68. [68]

    Orleans: Distributed virtual actors for programmability and scalability,

    P. A. Bernstein, S. Bykov, A. Geller, G. Kliot, and J. Thelin, “Orleans: Distributed virtual actors for programmability and scalability,” inMicrosoft Research Technical Report MSR-TR-2014-41, 2014

  69. [69]

    Verification-aware planning for multi-agent systems,

    T. Xuet al., “Verification-aware planning for multi-agent systems,”arXiv preprint arXiv:2510.17109, 2025

  70. [70]

    AgentVerify: Compositional formal verification of AI agent safety properties via LTL model checking,

    E. Fang, “AgentVerify: Compositional formal verification of AI agent safety properties via LTL model checking,” Preprints.org, 2026, manuscript 202604.1029

  71. [71]

    Concurrency control in groupware systems,

    C. A. Ellis and S. J. Gibbs, “Concurrency control in groupware systems,” inProceedings of the 1989 ACM SIGMOD International Conference on Management of Data, 1989, pp. 399–407

  72. [72]

    Groupware: Some issues and experiences,

    C. A. Ellis, S. J. Gibbs, and G. L. Rein, “Groupware: Some issues and experiences,”Communications of the ACM, vol. 34, no. 1, pp. 39–58, 1991

  73. [73]

    H-Store: A high-performance, distributed main memory transaction processing system,

    R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y . Zhang, J. Hugg, and D. J. Abadi, “H-Store: A high-performance, distributed main memory transaction processing system,”Proceedings of the VLDB Endowment, vol. 1, no. 2, pp. 1496–1499, 2008

  74. [74]

    Durable Functions: Semantics for stateful serverless,

    S. Burckhardt, C. Gillum, D. Justo, K. Kallas, C. McMahon, and C. S. Meiklejohn, “Durable Functions: Semantics for stateful serverless,” Proceedings of the ACM on Programming Languages (OOPSLA), vol. 5, 2021

  75. [75]

    Temporal: durable execution platform,

    Temporal Technologies, “Temporal: durable execution platform,” https: //temporal.io, accessed 2026

  76. [76]

    MemGPT: Towards LLMs as operating systems,

    C. Packer, V . Fang, S. G. Patil, K. Lin, S. Wooders, and J. E. Gonza- lez, “MemGPT: Towards LLMs as operating systems,”arXiv preprint arXiv:2310.08560, 2023

  77. [77]

    Multiparty asynchronous session types,

    K. Honda, N. Yoshida, and M. Carbone, “Multiparty asynchronous session types,” inProceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2008, pp. 273–284

  78. [78]

    Probabilistic relational reasoning for differential privacy,

    G. Barthe, B. Köpf, F. Olmedo, and S. Zanella-Béguelin, “Probabilistic relational reasoning for differential privacy,” inProceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2012, pp. 97–110

  79. [79]

    Weakest precondition reasoning for expected run-times of probabilistic programs,

    B. L. Kaminski, J.-P. Katoen, C. Matheja, and F. Olmedo, “Weakest precondition reasoning for expected run-times of probabilistic programs,” inProgramming Languages and Systems (ESOP), 2016, pp. 364–389

  80. [80]

    Toward a theory of transac- tional contention managers,

    R. Guerraoui, M. Herlihy, and B. Pochon, “Toward a theory of transac- tional contention managers,” inProceedings of the 24th ACM Symposium on Principles of Distributed Computing (PODC), 2005, pp. 258–264

Showing first 80 references.