Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

Sajjad Khan

arxiv: 2606.17182 · v1 · pith:NYWMQUXOnew · submitted 2026-06-15 · 💻 cs.LG · cs.DC· cs.LO· cs.MA· cs.PL

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

Sajjad Khan This is my paper

Pith reviewed 2026-06-27 03:49 UTC · model grok-4.3

classification 💻 cs.LG cs.DCcs.LOcs.MAcs.PL

keywords concurrency anomaliesmulti-agent LLM systemsformal verificationTLA+Verusconsistency hierarchydeterministic executionread-generate-write

0 comments

The pith

A 274-obligation Verus development proves sound and complete detectors for four concurrency anomalies in multi-agent LLM systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models state sharing in multi-agent LLM systems as long-running read-generate-write operations under deterministic-generation semantics. It formalizes four anomalies in TLA+ as structural analogues of classical isolation anomalies. A consistency hierarchy consisting of the chain L0 to L4 is defined and one maximal chain is shown to be realizable with strict separation. A machine-checked Verus development establishes that detectors for each level are sound and complete against the specifications, with corresponding avoidance sets realized in Rust runtimes.

Core claim

The exclusion lattice over the four anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, L0 ⊂neq ⋯ ⊂neq L4, to our knowledge the first machine-checked consistency hierarchy for such runtimes, proved via 274 Verus obligations with a trust base of two structural axioms and a mutex correspondence.

What carries the argument

The Verus development that proves the detectors sound and complete against the TLA+ specifications of the anomalies and each runtime's avoidance set.

If this is right

Three Rust runtimes realize L0-L1 via pessimistic locking and serializable snapshot isolation, each verified against stale-generation.
L2-L4 are realized by exec-mode-verified prevention twins that achieve 0/1000 anomaly rate versus 1000/1000 without the twin.
An L0 to L1 refinement eliminates a reproduced silent lost update from ByteDance's deer-flow.
An L3 commit-order sequencer removes tool-effect reordering observed on unmodified output from LangGraph's ToolNode.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same verification approach could be applied to other shared-state agent frameworks that adopt deterministic replay.
The hierarchy supplies concrete upgrade paths for existing systems that currently exhibit the anomalies.
Live verification across model families indicates the prevention mechanisms do not depend on particular LLM generation behaviors.

Load-bearing premise

The two structural axioms and mutex correspondence accurately capture the read-generate-write operations under deterministic-generation semantics.

What would settle it

An execution trace that produces one of the four anomalies inside a runtime asserted to enforce its corresponding avoidance level, or the discovery of a failing obligation inside the 274-obligation Verus development.

Figures

Figures reproduced from arXiv: 2606.17182 by Sajjad Khan.

**Figure 2.** Figure 2: SSI token overhead versus realized abort rate under the high [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗

read the original abstract

Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics -- the regime durable-execution engines enforce by deterministic replay -- and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, $L_0 \subsetneq \cdots \subsetneq L_4$, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified $L_0 \to L_1$ refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper supplies a machine-checked consistency hierarchy plus detectors for four concurrency anomalies in multi-agent LLM runtimes, backed by 274 Verus obligations with zero assumes or admits.

read the letter

The core contribution is a TLA+ model of stale-generation, phantom-tool, causal-cascade, and tool-effect reordering anomalies under deterministic-generation semantics, followed by a Verus proof that realizes a strict L0-to-L4 chain and verifies sound, complete detectors plus refinements for each level.

The mechanical evidence is the main asset. TLC produces counterexamples for each anomaly, the Verus development discharges 274 obligations cleanly, and the authors show concrete reproductions and fixes in deer-flow and LangGraph. L2 is also exercised live across model families. These elements give direct support for the soundness and realizability claims rather than relying on fitted parameters or informal argument.

The work is new in its application to LLM agent runtimes and in shipping the verified hierarchy; the anomaly types themselves are acknowledged as classical analogues. The trust base is limited to two structural axioms and a mutex correspondence, which are stated explicitly.

Soft spots are modest and scoped. The axioms need review for how closely they match real read-generate-write behavior in deployed systems, and the entire development stays inside deterministic replay semantics. No broader claims are made.

This is for people building or verifying multi-agent LLM frameworks who care about runtime consistency. A reader who wants mechanically checked detectors and refinements will find usable artifacts here.

It deserves peer review because the verification work is substantial and the evidence is reproducible.

Referee Report

0 major / 3 minor

Summary. The paper claims to model concurrency anomalies (stale-generation, phantom-tool, causal-cascade, tool-effect reordering) in multi-agent LLM systems as long-running read-generate-write operations under deterministic-generation semantics. It formalizes the anomalies in TLA+ with TLC counterexamples, proves a consistency hierarchy L0 ⊂ ... ⊂ L4 via a 274-obligation Verus development (zero assumes/admits, trust base of two structural axioms plus mutex correspondence), realizes L0-L1 in three verified Rust runtimes, provides exec-mode verified prevention for L2-L4, and reproduces/fixes issues in deer-flow and LangGraph.

Significance. If the result holds, the contribution is significant as the first machine-checked consistency hierarchy for such runtimes, with direct mechanical support from 274 Verus obligations and TLC counterexamples for soundness/completeness claims, plus explicit reproduction and fixes in deployed systems (deer-flow, LangGraph). The verified detector, refinements, and realizability artifacts provide strong evidence.

minor comments (3)

[Abstract] Abstract: the phrase 'the exclusion lattice over these anomalies is trivial' is used without elaboration; a brief parenthetical or reference to the classical isolation literature would aid readers.
[Verus development section] The two structural axioms forming the Verus trust base are declared but their precise statements are not reproduced in the main body; including them (even in an appendix reference) would improve self-containment.
[Experimental evaluation] Table or figure reporting the 120 retracted sessions across model families should include the exact model versions and prompt templates used for reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed summary of the manuscript, the assessment of its significance, and the recommendation to accept. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; claims rest on external mechanical verification

full rationale

The paper's derivation consists of TLA+ anomaly definitions with TLC counterexamples, followed by a 274-obligation Verus development (zero assume, zero admit) that proves soundness and completeness of detectors against independently stated specifications and each runtime's avoidance set. The only non-mechanical elements are two explicitly declared structural axioms plus a mutex correspondence, stated as the scoped trust base for the deterministic-generation read-generate-write model. No equation or claim reduces by construction to a fitted parameter, self-definition, or self-citation chain; the realizability proofs and lattice separation are machine-checked against external benchmarks rather than being tautological with the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The verification rests on two unnamed structural axioms and a mutex correspondence as the explicit trust base; no free parameters are introduced and the anomalies are presented as formal analogues of existing types rather than new postulated entities.

axioms (2)

domain assumption two structural axioms
Explicitly listed as part of the trust base for the 274 Verus obligations
domain assumption mutex correspondence
Explicitly listed as part of the trust base for the 274 Verus obligations

pith-pipeline@v0.9.1-grok · 5869 in / 1279 out tokens · 75273 ms · 2026-06-27T03:49:38.968832+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

101 extracted references · 8 linked inside Pith

[1]

SagaLLM: Context management, validation, and transaction guarantees for multi-agent LLM planning,

E. Y . Chang and L. Geng, “SagaLLM: Context management, validation, and transaction guarantees for multi-agent LLM planning,”Proceedings of the VLDB Endowment, 2025, arXiv:2503.11951. TABLE 6 Per-file Verus obligation counts (verus_count.sh -full).†marks the four helper files excluded from the 274 curated headline; the default verus_count.shrun applies th...

arXiv 2025
[2]

Atomix: Timely, transactional tool use for reliable agentic workflows,

B. Mohammadi, N. Potamitis, L. Klein, A. Arora, and L. Bindschaedler, “Atomix: Timely, transactional tool use for reliable agentic workflows,” arXiv preprint arXiv:2602.14849, 2026, preprint; not yet peer-reviewed at time of writing

Pith/arXiv arXiv 2026
[3]

How to make a multiprocessor computer that correctly executes multiprocess programs,

L. Lamport, “How to make a multiprocessor computer that correctly executes multiprocess programs,”IEEE Transactions on Computers, vol. C-28, no. 9, pp. 690–691, 1979

1979
[4]

Memory consistency and event ordering in scalable shared- memory multiprocessors,

K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, “Memory consistency and event ordering in scalable shared- memory multiprocessors,” inISCA, 1990, pp. 15–26

1990
[5]

Sim- plifying arm concurrency: Multicopy-atomic axiomatic and operational models for ARMv8,

C. Pulte, S. Flur, W. Deacon, J. French, S. Sarkar, and P. Sewell, “Sim- plifying arm concurrency: Multicopy-atomic axiomatic and operational models for ARMv8,” inPOPL, vol. 2, 2018, pp. 19:1–19:29

2018
[6]

Linearizability: A correctness condition for concurrent objects,

M. P. Herlihy and J. M. Wing, “Linearizability: A correctness condition for concurrent objects,”ACM TOPLAS, vol. 12, no. 3, pp. 463–492, 1990

1990
[7]

A critique of ANSI SQL isolation levels,

H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil, “A critique of ANSI SQL isolation levels,” inACM SIGMOD, 1995, pp. 1–10

1995
[8]

Weak consistency: A generalized theory and optimistic implementations for distributed transactions,

A. Adya, “Weak consistency: A generalized theory and optimistic implementations for distributed transactions,” Ph.D. dissertation, MIT, 1999

1999
[9]

Generalized isolation level definitions,

A. Adya, B. Liskov, and P. O’Neil, “Generalized isolation level definitions,” inProceedings of the 16th International Conference on Data Engineering (ICDE), 2000, pp. 67–78

2000
[10]

Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS,

W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen, “Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS,” inSOSP, 2011, pp. 401–416

2011
[11]

Consistency, availability, and convergence,

P. Mahajan, L. Alvisi, and M. Dahlin, “Consistency, availability, and convergence,” UT Austin, Tech. Rep. UTCS-TR-11-22, 2011

2011
[12]

Burckhardt,Principles of Eventual Consistency

S. Burckhardt,Principles of Eventual Consistency. now publishers, 2014

2014
[13]

Conflict-free replicated data types,

M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski, “Conflict-free replicated data types,” inSSS, 2011, pp. 386–400

2011
[14]

Session guarantees for weakly consistent replicated data,

D. B. Terry, A. J. Demers, K. Petersen, M. J. Spreitzer, M. M. Theimer, and B. B. Welch, “Session guarantees for weakly consistent replicated data,” inPDIS, 1994, pp. 140–149

1994
[15]

A universal modular ACTOR formalism for artificial intelligence,

C. Hewitt, P. Bishop, and R. Steiger, “A universal modular ACTOR formalism for artificial intelligence,” inIJCAI, 1973, pp. 235–245. S. KHAN 31

1973
[16]

Consistency tradeoffs in modern distributed database system design: CAP is only part of the story,

D. J. Abadi, “Consistency tradeoffs in modern distributed database system design: CAP is only part of the story,”IEEE Computer, vol. 45, no. 2, pp. 37–42, 2012

2012
[17]

Consistency in non-transactional distributed storage systems,

P. Viotti and M. Vukoli´c, “Consistency in non-transactional distributed storage systems,”ACM Computing Surveys, vol. 49, no. 1, pp. 19:1–19:34, 2016

2016
[18]

CodeCRDT: Observation-driven coordination for multi- agent LLM code generation,

S. Pugachev, “CodeCRDT: Observation-driven coordination for multi- agent LLM code generation,”arXiv preprint arXiv:2510.18893, 2025

arXiv 2025
[19]

Why do multi-agent LLM systems fail?

M. Cemri, M. Z. Pan, S. Yang, L. A. Agrawal, B. Chopra, R. Tiwari, K. Keutzer, A. Parameswaran, D. Klein, K. Ramchandran, M. Zaharia, J. E. Gonzalez, and I. Stoica, “Why do multi-agent LLM systems fail?” inNeurIPS Datasets and Benchmarks Track, 2025, spotlight; arXiv:2503.13657

Pith/arXiv arXiv 2025
[20]

Bolt-on causal con- sistency,

P. Bailis, A. Ghodsi, J. M. Hellerstein, and I. Stoica, “Bolt-on causal con- sistency,” inProceedings of the ACM SIGMOD International Conference on Management of Data, 2013, pp. 761–772

2013
[21]

Serializable isolation for snapshot databases,

M. J. Cahill, U. Röhm, and A. D. Fekete, “Serializable isolation for snapshot databases,” inProceedings of the ACM SIGMOD International Conference on Management of Data, 2008, pp. 729–738

2008
[22]

Making snapshot isolation serializable,

A. Fekete, D. Liarokapis, E. O’Neil, P. O’Neil, and D. Shasha, “Making snapshot isolation serializable,”ACM Transactions on Database Systems (TODS), vol. 30, no. 2, pp. 492–528, 2005

2005
[23]

Verus: Verifying Rust programs using linear ghost types,

A. Lattuada, T. Hance, C. Cho, M. Brun, I. Subasinghe, Y . Zhou, J. Howell, B. Parno, and C. Hawblitzel, “Verus: Verifying Rust programs using linear ghost types,” inProceedings of the ACM on Programming Languages (OOPSLA), vol. 7, 2023, pp. 286–315

2023
[24]

AutoGen: Enabling next-gen LLM applications via multi-agent conversation,

Q. Wu, G. Bansal, J. Zhang, Y . Wu, S. Zhang, E. Zhu, B. Li, L. Jiang, X. Zhang, and C. Wang, “AutoGen: Enabling next-gen LLM applications via multi-agent conversation,” inarXiv preprint arXiv:2308.08155, 2023

Pith/arXiv arXiv 2023
[25]

Probabilistically bounded staleness for practical partial quorums,

P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein, and I. Sto- ica, “Probabilistically bounded staleness for practical partial quorums,” Proceedings of the VLDB Endowment, vol. 5, no. 8, pp. 776–787, 2012

2012
[26]

Quantitative relaxation of concurrent data structures,

T. A. Henzinger, C. M. Kirsch, H. Payer, A. Sezgin, and A. Sokolova, “Quantitative relaxation of concurrent data structures,” inProceedings of the 40th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2013, pp. 317–328

2013
[27]

Robust composition: Towards a unified approach to access control and concurrency control,

M. S. Miller, “Robust composition: Towards a unified approach to access control and concurrency control,” Ph.D. dissertation, Johns Hopkins University, 2006

2006
[28]

Analysing the security properties of object-capability patterns,

T. Murray, “Analysing the security properties of object-capability patterns,” Ph.D. dissertation, University of Oxford, 2010

2010
[29]

Toolformer: Language models can teach themselves to use tools,

T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettle- moyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” inNeurIPS, 2023

2023
[30]

Gorilla: Large language model connected with massive APIs,

S. G. Patil, T. Zhang, X. Wang, and J. E. Gonzalez, “Gorilla: Large language model connected with massive APIs,” inarXiv preprint arXiv:2305.15334, 2023

Pith/arXiv arXiv 2023
[31]

Garcia-Molina and K

H. Garcia-Molina and K. Salem, “Sagas,” inProceedings of the 1987 ACM SIGMOD International Conference on Management of Data, 1987, pp. 249–259

1987
[32]

IronFleet: Proving practical distributed systems correct,

C. Hawblitzel, J. Howell, M. Kapritsos, J. R. Lorch, B. Parno, M. L. Roberts, S. Setty, and B. Zill, “IronFleet: Proving practical distributed systems correct,” inProceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), 2015, pp. 1–17

2015
[33]

Repairing sequential consistency in C/C++11,

O. Lahav, V . Vafeiadis, J. Kang, C.-K. Hur, and D. Dreyer, “Repairing sequential consistency in C/C++11,” inProceedings of the 38th ACM SIG- PLAN Conference on Programming Language Design and Implementation (PLDI), 2017, pp. 618–632

2017
[34]

GenMC: A model checker for weak memory models,

M. Kokologiannakis and V . Vafeiadis, “GenMC: A model checker for weak memory models,” inComputer Aided Verification (CAV), 2021, pp. 427–440

2021
[35]

RustBelt: Securing the foundations of the Rust programming language,

R. Jung, J.-H. Jourdan, R. Krebbers, and D. Dreyer, “RustBelt: Securing the foundations of the Rust programming language,” inProceedings of the 45th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL). Association for Computing Machinery, 2018, pp. 66:1–66:34

2018
[36]

A promising semantics for relaxed-memory concurrency,

J. Kang, C.-K. Hur, O. Lahav, V . Vafeiadis, and D. Dreyer, “A promising semantics for relaxed-memory concurrency,” inProceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), 2017, pp. 175–189

2017
[37]

INV ALID_CONCURRENT_GRAPH_UPDATE: Lang- Graph error reference,

LangChain AI, “INV ALID_CONCURRENT_GRAPH_UPDATE: Lang- Graph error reference,” https://docs.langchain.com/oss/python/langgraph/ errors/INV ALID_CONCURRENT_GRAPH_UPDATE, 2025, accessed November 2025

2025
[38]

InvalidUpdateError when using parallel subgraphs with shared state keys in LangGraph,

Issue #6446, langchain-ai/langgraph, “InvalidUpdateError when using parallel subgraphs with shared state keys in LangGraph,” https://github. com/langchain-ai/langgraph/issues/6446, 2025, accessed November 2025

2025
[39]

Handling shared state across multi- agent conversations in AutoGen,

Discussion #7144, microsoft/autogen, “Handling shared state across multi- agent conversations in AutoGen,” https://github.com/microsoft/autogen/ discussions/7144, 2025, accessed November 2025

2025
[40]

Stateful persistence grains for each agent,

Pull Request #3954, microsoft/autogen, “Stateful persistence grains for each agent,” https://github.com/microsoft/autogen/pull/3954, 2024, accessed November 2025

2024
[41]

GraphFlow state persistence bug: workflow gets stuck after interruption during agent transitions,

Issue #7043, microsoft/autogen, “GraphFlow state persistence bug: workflow gets stuck after interruption during agent transitions,” https: //github.com/microsoft/autogen/issues/7043, 2025, accessed November 2025

2025
[42]

To-do list shown during streaming disappears after the run completes,

Issue #3123, bytedance/deer-flow, “To-do list shown during streaming disappears after the run completes,” https://github.com/bytedance/deer- flow/issues/3123, 2026, accessed June 2026

2026
[43]

fix(agents): preserve todos state across node updates,

Pull Request #3180, bytedance/deer-flow, “fix(agents): preserve todos state across node updates,” https://github.com/bytedance/deer-flow/pull/3180, 2026, accessed June 2026

2026
[44]

[runtime] Plan mode fails with duplicate todos channel type conflict,

Issue #3199, bytedance/deer-flow, “[runtime] Plan mode fails with duplicate todos channel type conflict,” https://github.com/bytedance/deer- flow/issues/3199, 2026, accessed June 2026

2026
[45]

Allocating isolation levels to transactions,

A. Fekete, “Allocating isolation levels to transactions,” inProceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2005, pp. 206–215

2005
[46]

Highly available transactions: Virtues and limitations,

P. Bailis, A. Davidson, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Sto- ica, “Highly available transactions: Virtues and limitations,”Proceedings of the VLDB Endowment, vol. 7, no. 3, pp. 181–192, 2014

2014
[47]

Keeping CALM: When distributed consistency is easy,

J. M. Hellerstein and P. Alvaro, “Keeping CALM: When distributed consistency is easy,”Communications of the ACM, vol. 63, no. 9, pp. 72–81, 2020

2020
[48]

Coordination avoidance in database systems,

P. Bailis, A. Fekete, M. J. Franklin, A. Ghodsi, J. M. Hellerstein, and I. Stoica, “Coordination avoidance in database systems,” inProceedings of the VLDB Endowment, vol. 8, no. 3, 2014, pp. 185–196

2014
[49]

Seeing is believing: A client-centric specification of database isolation,

N. Crooks, Y . Pu, L. Alvisi, and A. Clement, “Seeing is believing: A client-centric specification of database isolation,” inProceedings of the ACM Symposium on Principles of Distributed Computing (PODC), 2017, pp. 73–82

2017
[50]

Spanner: Google’s globally-distributed database,

J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y . Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford, “Spanner: Google’s globally-distributed database,” inOSDI, 2012...

2012
[51]

Calvin: Fast distributed transactions for partitioned database systems,

A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi, “Calvin: Fast distributed transactions for partitioned database systems,” in ACM SIGMOD, 2012, pp. 1–12

2012
[52]

Software transactional memory,

N. Shavit and D. Touitou, “Software transactional memory,” inProceed- ings of the 14th ACM Symposium on Principles of Distributed Computing (PODC), 1995, pp. 204–213

1995
[53]

Transactional memory: Architectural support for lock-free data structures,

M. Herlihy and J. E. B. Moss, “Transactional memory: Architectural support for lock-free data structures,” inProceedings of the 20th International Symposium on Computer Architecture (ISCA), 1993, pp. 289–300

1993
[54]

Multiversion concurrency control— theory and algorithms,

P. A. Bernstein and N. Goodman, “Multiversion concurrency control— theory and algorithms,”ACM Transactions on Database Systems, vol. 8, no. 4, pp. 465–483, 1983

1983
[55]

ACTA: The SAGA continues,

P. K. Chrysanthis and K. Ramamritham, “ACTA: The SAGA continues,” inDatabase Transaction Models for Advanced Applications, A. K. Elmagarmid, Ed. Morgan Kaufmann, 1992, pp. 349–397

1992
[56]

Verdi: A framework for implementing and formally verifying distributed systems,

J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. Anderson, “Verdi: A framework for implementing and formally verifying distributed systems,” inProceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2015, pp. 357–368

2015
[57]

Chapar: Certified causally consistent distributed key-value stores,

M. Lesani, C. J. Bell, and A. Chlipala, “Chapar: Certified causally consistent distributed key-value stores,” inProceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2016

2016
[58]

Jepsen: Distributed systems safety analysis,

K. Kingsbury, “Jepsen: Distributed systems safety analysis,” https://jepsen. io, 2013–present, accessed 2026

2013
[59]

Cobra: Making transactional key- value stores verifiably serializable,

C. Tan, C. Zhao, S. Mu, and M. Walfish, “Cobra: Making transactional key- value stores verifiably serializable,” inProceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020, pp. 63–80

2020
[60]

Elle: Inferring isolation anomalies from experimental observations,

K. Kingsbury and P. Alvaro, “Elle: Inferring isolation anomalies from experimental observations,” arXiv:2003.10554, 2020

arXiv 2003
[61]

ReAct: Synergizing reasoning and acting in language models,

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “ReAct: Synergizing reasoning and acting in language models,” inarXiv preprint arXiv:2210.03629, 2022. S. KHAN 32

Pith/arXiv arXiv 2022
[62]

MetaGPT: Meta programming for multi-agent collaborative framework,

S. Hong, X. Zheng, J. Chen, Y . Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou, C. Ran, L. Xiao, C. Wu, and J. Schmidhuber, “MetaGPT: Meta programming for multi-agent collaborative framework,” inarXiv preprint arXiv:2308.00352, 2023

Pith/arXiv arXiv 2023
[63]

Generative agents: Interactive simulacra of human behavior,

J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” inUIST, 2023

2023
[64]

Model context protocol,

Anthropic, “Model context protocol,” https://modelcontextprotocol.io, 2024

2024
[65]

Database transaction models for advanced applica- tions,

A. K. Elmagarmid, “Database transaction models for advanced applica- tions,”Morgan Kaufmann, 1992

1992
[66]

Synthesis of extended trans- action models using ACTA,

P. K. Chrysanthis and K. Ramamritham, “Synthesis of extended trans- action models using ACTA,”ACM Transactions on Database Systems, vol. 19, no. 3, pp. 450–491, 1994

1994
[67]

The ConTract model,

A. Reuter and H. Wächter, “The ConTract model,”IEEE Data Engineering Bulletin, vol. 14, no. 1, pp. 39–43, 1991

1991
[68]

Orleans: Distributed virtual actors for programmability and scalability,

P. A. Bernstein, S. Bykov, A. Geller, G. Kliot, and J. Thelin, “Orleans: Distributed virtual actors for programmability and scalability,” inMicrosoft Research Technical Report MSR-TR-2014-41, 2014

2014
[69]

Verification-aware planning for multi-agent systems,

T. Xuet al., “Verification-aware planning for multi-agent systems,”arXiv preprint arXiv:2510.17109, 2025

arXiv 2025
[70]

AgentVerify: Compositional formal verification of AI agent safety properties via LTL model checking,

E. Fang, “AgentVerify: Compositional formal verification of AI agent safety properties via LTL model checking,” Preprints.org, 2026, manuscript 202604.1029

arXiv 2026
[71]

Concurrency control in groupware systems,

C. A. Ellis and S. J. Gibbs, “Concurrency control in groupware systems,” inProceedings of the 1989 ACM SIGMOD International Conference on Management of Data, 1989, pp. 399–407

1989
[72]

Groupware: Some issues and experiences,

C. A. Ellis, S. J. Gibbs, and G. L. Rein, “Groupware: Some issues and experiences,”Communications of the ACM, vol. 34, no. 1, pp. 39–58, 1991

1991
[73]

H-Store: A high-performance, distributed main memory transaction processing system,

R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y . Zhang, J. Hugg, and D. J. Abadi, “H-Store: A high-performance, distributed main memory transaction processing system,”Proceedings of the VLDB Endowment, vol. 1, no. 2, pp. 1496–1499, 2008

2008
[74]

Durable Functions: Semantics for stateful serverless,

S. Burckhardt, C. Gillum, D. Justo, K. Kallas, C. McMahon, and C. S. Meiklejohn, “Durable Functions: Semantics for stateful serverless,” Proceedings of the ACM on Programming Languages (OOPSLA), vol. 5, 2021

2021
[75]

Temporal: durable execution platform,

Temporal Technologies, “Temporal: durable execution platform,” https: //temporal.io, accessed 2026

2026
[76]

MemGPT: Towards LLMs as operating systems,

C. Packer, V . Fang, S. G. Patil, K. Lin, S. Wooders, and J. E. Gonza- lez, “MemGPT: Towards LLMs as operating systems,”arXiv preprint arXiv:2310.08560, 2023

Pith/arXiv arXiv 2023
[77]

Multiparty asynchronous session types,

K. Honda, N. Yoshida, and M. Carbone, “Multiparty asynchronous session types,” inProceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2008, pp. 273–284

2008
[78]

Probabilistic relational reasoning for differential privacy,

G. Barthe, B. Köpf, F. Olmedo, and S. Zanella-Béguelin, “Probabilistic relational reasoning for differential privacy,” inProceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2012, pp. 97–110

2012
[79]

Weakest precondition reasoning for expected run-times of probabilistic programs,

B. L. Kaminski, J.-P. Katoen, C. Matheja, and F. Olmedo, “Weakest precondition reasoning for expected run-times of probabilistic programs,” inProgramming Languages and Systems (ESOP), 2016, pp. 364–389

2016
[80]

Toward a theory of transac- tional contention managers,

R. Guerraoui, M. Herlihy, and B. Pochon, “Toward a theory of transac- tional contention managers,” inProceedings of the 24th ACM Symposium on Principles of Distributed Computing (PODC), 2005, pp. 258–264

2005

Showing first 80 references.

[1] [1]

SagaLLM: Context management, validation, and transaction guarantees for multi-agent LLM planning,

E. Y . Chang and L. Geng, “SagaLLM: Context management, validation, and transaction guarantees for multi-agent LLM planning,”Proceedings of the VLDB Endowment, 2025, arXiv:2503.11951. TABLE 6 Per-file Verus obligation counts (verus_count.sh -full).†marks the four helper files excluded from the 274 curated headline; the default verus_count.shrun applies th...

arXiv 2025

[2] [2]

Atomix: Timely, transactional tool use for reliable agentic workflows,

B. Mohammadi, N. Potamitis, L. Klein, A. Arora, and L. Bindschaedler, “Atomix: Timely, transactional tool use for reliable agentic workflows,” arXiv preprint arXiv:2602.14849, 2026, preprint; not yet peer-reviewed at time of writing

Pith/arXiv arXiv 2026

[3] [3]

How to make a multiprocessor computer that correctly executes multiprocess programs,

L. Lamport, “How to make a multiprocessor computer that correctly executes multiprocess programs,”IEEE Transactions on Computers, vol. C-28, no. 9, pp. 690–691, 1979

1979

[4] [4]

Memory consistency and event ordering in scalable shared- memory multiprocessors,

K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, “Memory consistency and event ordering in scalable shared- memory multiprocessors,” inISCA, 1990, pp. 15–26

1990

[5] [5]

Sim- plifying arm concurrency: Multicopy-atomic axiomatic and operational models for ARMv8,

C. Pulte, S. Flur, W. Deacon, J. French, S. Sarkar, and P. Sewell, “Sim- plifying arm concurrency: Multicopy-atomic axiomatic and operational models for ARMv8,” inPOPL, vol. 2, 2018, pp. 19:1–19:29

2018

[6] [6]

Linearizability: A correctness condition for concurrent objects,

M. P. Herlihy and J. M. Wing, “Linearizability: A correctness condition for concurrent objects,”ACM TOPLAS, vol. 12, no. 3, pp. 463–492, 1990

1990

[7] [7]

A critique of ANSI SQL isolation levels,

H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil, “A critique of ANSI SQL isolation levels,” inACM SIGMOD, 1995, pp. 1–10

1995

[8] [8]

Weak consistency: A generalized theory and optimistic implementations for distributed transactions,

A. Adya, “Weak consistency: A generalized theory and optimistic implementations for distributed transactions,” Ph.D. dissertation, MIT, 1999

1999

[9] [9]

Generalized isolation level definitions,

A. Adya, B. Liskov, and P. O’Neil, “Generalized isolation level definitions,” inProceedings of the 16th International Conference on Data Engineering (ICDE), 2000, pp. 67–78

2000

[10] [10]

Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS,

W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen, “Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS,” inSOSP, 2011, pp. 401–416

2011

[11] [11]

Consistency, availability, and convergence,

P. Mahajan, L. Alvisi, and M. Dahlin, “Consistency, availability, and convergence,” UT Austin, Tech. Rep. UTCS-TR-11-22, 2011

2011

[12] [12]

Burckhardt,Principles of Eventual Consistency

S. Burckhardt,Principles of Eventual Consistency. now publishers, 2014

2014

[13] [13]

Conflict-free replicated data types,

M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski, “Conflict-free replicated data types,” inSSS, 2011, pp. 386–400

2011

[14] [14]

Session guarantees for weakly consistent replicated data,

D. B. Terry, A. J. Demers, K. Petersen, M. J. Spreitzer, M. M. Theimer, and B. B. Welch, “Session guarantees for weakly consistent replicated data,” inPDIS, 1994, pp. 140–149

1994

[15] [15]

A universal modular ACTOR formalism for artificial intelligence,

C. Hewitt, P. Bishop, and R. Steiger, “A universal modular ACTOR formalism for artificial intelligence,” inIJCAI, 1973, pp. 235–245. S. KHAN 31

1973

[16] [16]

Consistency tradeoffs in modern distributed database system design: CAP is only part of the story,

D. J. Abadi, “Consistency tradeoffs in modern distributed database system design: CAP is only part of the story,”IEEE Computer, vol. 45, no. 2, pp. 37–42, 2012

2012

[17] [17]

Consistency in non-transactional distributed storage systems,

P. Viotti and M. Vukoli´c, “Consistency in non-transactional distributed storage systems,”ACM Computing Surveys, vol. 49, no. 1, pp. 19:1–19:34, 2016

2016

[18] [18]

CodeCRDT: Observation-driven coordination for multi- agent LLM code generation,

S. Pugachev, “CodeCRDT: Observation-driven coordination for multi- agent LLM code generation,”arXiv preprint arXiv:2510.18893, 2025

arXiv 2025

[19] [19]

Why do multi-agent LLM systems fail?

M. Cemri, M. Z. Pan, S. Yang, L. A. Agrawal, B. Chopra, R. Tiwari, K. Keutzer, A. Parameswaran, D. Klein, K. Ramchandran, M. Zaharia, J. E. Gonzalez, and I. Stoica, “Why do multi-agent LLM systems fail?” inNeurIPS Datasets and Benchmarks Track, 2025, spotlight; arXiv:2503.13657

Pith/arXiv arXiv 2025

[20] [20]

Bolt-on causal con- sistency,

P. Bailis, A. Ghodsi, J. M. Hellerstein, and I. Stoica, “Bolt-on causal con- sistency,” inProceedings of the ACM SIGMOD International Conference on Management of Data, 2013, pp. 761–772

2013

[21] [21]

Serializable isolation for snapshot databases,

M. J. Cahill, U. Röhm, and A. D. Fekete, “Serializable isolation for snapshot databases,” inProceedings of the ACM SIGMOD International Conference on Management of Data, 2008, pp. 729–738

2008

[22] [22]

Making snapshot isolation serializable,

A. Fekete, D. Liarokapis, E. O’Neil, P. O’Neil, and D. Shasha, “Making snapshot isolation serializable,”ACM Transactions on Database Systems (TODS), vol. 30, no. 2, pp. 492–528, 2005

2005

[23] [23]

Verus: Verifying Rust programs using linear ghost types,

A. Lattuada, T. Hance, C. Cho, M. Brun, I. Subasinghe, Y . Zhou, J. Howell, B. Parno, and C. Hawblitzel, “Verus: Verifying Rust programs using linear ghost types,” inProceedings of the ACM on Programming Languages (OOPSLA), vol. 7, 2023, pp. 286–315

2023

[24] [24]

AutoGen: Enabling next-gen LLM applications via multi-agent conversation,

Q. Wu, G. Bansal, J. Zhang, Y . Wu, S. Zhang, E. Zhu, B. Li, L. Jiang, X. Zhang, and C. Wang, “AutoGen: Enabling next-gen LLM applications via multi-agent conversation,” inarXiv preprint arXiv:2308.08155, 2023

Pith/arXiv arXiv 2023

[25] [25]

Probabilistically bounded staleness for practical partial quorums,

P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein, and I. Sto- ica, “Probabilistically bounded staleness for practical partial quorums,” Proceedings of the VLDB Endowment, vol. 5, no. 8, pp. 776–787, 2012

2012

[26] [26]

Quantitative relaxation of concurrent data structures,

T. A. Henzinger, C. M. Kirsch, H. Payer, A. Sezgin, and A. Sokolova, “Quantitative relaxation of concurrent data structures,” inProceedings of the 40th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2013, pp. 317–328

2013

[27] [27]

Robust composition: Towards a unified approach to access control and concurrency control,

M. S. Miller, “Robust composition: Towards a unified approach to access control and concurrency control,” Ph.D. dissertation, Johns Hopkins University, 2006

2006

[28] [28]

Analysing the security properties of object-capability patterns,

T. Murray, “Analysing the security properties of object-capability patterns,” Ph.D. dissertation, University of Oxford, 2010

2010

[29] [29]

Toolformer: Language models can teach themselves to use tools,

T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettle- moyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” inNeurIPS, 2023

2023

[30] [30]

Gorilla: Large language model connected with massive APIs,

S. G. Patil, T. Zhang, X. Wang, and J. E. Gonzalez, “Gorilla: Large language model connected with massive APIs,” inarXiv preprint arXiv:2305.15334, 2023

Pith/arXiv arXiv 2023

[31] [31]

Garcia-Molina and K

H. Garcia-Molina and K. Salem, “Sagas,” inProceedings of the 1987 ACM SIGMOD International Conference on Management of Data, 1987, pp. 249–259

1987

[32] [32]

IronFleet: Proving practical distributed systems correct,

C. Hawblitzel, J. Howell, M. Kapritsos, J. R. Lorch, B. Parno, M. L. Roberts, S. Setty, and B. Zill, “IronFleet: Proving practical distributed systems correct,” inProceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), 2015, pp. 1–17

2015

[33] [33]

Repairing sequential consistency in C/C++11,

O. Lahav, V . Vafeiadis, J. Kang, C.-K. Hur, and D. Dreyer, “Repairing sequential consistency in C/C++11,” inProceedings of the 38th ACM SIG- PLAN Conference on Programming Language Design and Implementation (PLDI), 2017, pp. 618–632

2017

[34] [34]

GenMC: A model checker for weak memory models,

M. Kokologiannakis and V . Vafeiadis, “GenMC: A model checker for weak memory models,” inComputer Aided Verification (CAV), 2021, pp. 427–440

2021

[35] [35]

RustBelt: Securing the foundations of the Rust programming language,

R. Jung, J.-H. Jourdan, R. Krebbers, and D. Dreyer, “RustBelt: Securing the foundations of the Rust programming language,” inProceedings of the 45th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL). Association for Computing Machinery, 2018, pp. 66:1–66:34

2018

[36] [36]

A promising semantics for relaxed-memory concurrency,

J. Kang, C.-K. Hur, O. Lahav, V . Vafeiadis, and D. Dreyer, “A promising semantics for relaxed-memory concurrency,” inProceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), 2017, pp. 175–189

2017

[37] [37]

INV ALID_CONCURRENT_GRAPH_UPDATE: Lang- Graph error reference,

LangChain AI, “INV ALID_CONCURRENT_GRAPH_UPDATE: Lang- Graph error reference,” https://docs.langchain.com/oss/python/langgraph/ errors/INV ALID_CONCURRENT_GRAPH_UPDATE, 2025, accessed November 2025

2025

[38] [38]

InvalidUpdateError when using parallel subgraphs with shared state keys in LangGraph,

Issue #6446, langchain-ai/langgraph, “InvalidUpdateError when using parallel subgraphs with shared state keys in LangGraph,” https://github. com/langchain-ai/langgraph/issues/6446, 2025, accessed November 2025

2025

[39] [39]

Handling shared state across multi- agent conversations in AutoGen,

Discussion #7144, microsoft/autogen, “Handling shared state across multi- agent conversations in AutoGen,” https://github.com/microsoft/autogen/ discussions/7144, 2025, accessed November 2025

2025

[40] [40]

Stateful persistence grains for each agent,

Pull Request #3954, microsoft/autogen, “Stateful persistence grains for each agent,” https://github.com/microsoft/autogen/pull/3954, 2024, accessed November 2025

2024

[41] [41]

GraphFlow state persistence bug: workflow gets stuck after interruption during agent transitions,

Issue #7043, microsoft/autogen, “GraphFlow state persistence bug: workflow gets stuck after interruption during agent transitions,” https: //github.com/microsoft/autogen/issues/7043, 2025, accessed November 2025

2025

[42] [42]

To-do list shown during streaming disappears after the run completes,

Issue #3123, bytedance/deer-flow, “To-do list shown during streaming disappears after the run completes,” https://github.com/bytedance/deer- flow/issues/3123, 2026, accessed June 2026

2026

[43] [43]

fix(agents): preserve todos state across node updates,

Pull Request #3180, bytedance/deer-flow, “fix(agents): preserve todos state across node updates,” https://github.com/bytedance/deer-flow/pull/3180, 2026, accessed June 2026

2026

[44] [44]

[runtime] Plan mode fails with duplicate todos channel type conflict,

Issue #3199, bytedance/deer-flow, “[runtime] Plan mode fails with duplicate todos channel type conflict,” https://github.com/bytedance/deer- flow/issues/3199, 2026, accessed June 2026

2026

[45] [45]

Allocating isolation levels to transactions,

A. Fekete, “Allocating isolation levels to transactions,” inProceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2005, pp. 206–215

2005

[46] [46]

Highly available transactions: Virtues and limitations,

P. Bailis, A. Davidson, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Sto- ica, “Highly available transactions: Virtues and limitations,”Proceedings of the VLDB Endowment, vol. 7, no. 3, pp. 181–192, 2014

2014

[47] [47]

Keeping CALM: When distributed consistency is easy,

J. M. Hellerstein and P. Alvaro, “Keeping CALM: When distributed consistency is easy,”Communications of the ACM, vol. 63, no. 9, pp. 72–81, 2020

2020

[48] [48]

Coordination avoidance in database systems,

P. Bailis, A. Fekete, M. J. Franklin, A. Ghodsi, J. M. Hellerstein, and I. Stoica, “Coordination avoidance in database systems,” inProceedings of the VLDB Endowment, vol. 8, no. 3, 2014, pp. 185–196

2014

[49] [49]

Seeing is believing: A client-centric specification of database isolation,

N. Crooks, Y . Pu, L. Alvisi, and A. Clement, “Seeing is believing: A client-centric specification of database isolation,” inProceedings of the ACM Symposium on Principles of Distributed Computing (PODC), 2017, pp. 73–82

2017

[50] [50]

Spanner: Google’s globally-distributed database,

J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y . Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford, “Spanner: Google’s globally-distributed database,” inOSDI, 2012...

2012

[51] [51]

Calvin: Fast distributed transactions for partitioned database systems,

A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi, “Calvin: Fast distributed transactions for partitioned database systems,” in ACM SIGMOD, 2012, pp. 1–12

2012

[52] [52]

Software transactional memory,

N. Shavit and D. Touitou, “Software transactional memory,” inProceed- ings of the 14th ACM Symposium on Principles of Distributed Computing (PODC), 1995, pp. 204–213

1995

[53] [53]

Transactional memory: Architectural support for lock-free data structures,

M. Herlihy and J. E. B. Moss, “Transactional memory: Architectural support for lock-free data structures,” inProceedings of the 20th International Symposium on Computer Architecture (ISCA), 1993, pp. 289–300

1993

[54] [54]

Multiversion concurrency control— theory and algorithms,

P. A. Bernstein and N. Goodman, “Multiversion concurrency control— theory and algorithms,”ACM Transactions on Database Systems, vol. 8, no. 4, pp. 465–483, 1983

1983

[55] [55]

ACTA: The SAGA continues,

P. K. Chrysanthis and K. Ramamritham, “ACTA: The SAGA continues,” inDatabase Transaction Models for Advanced Applications, A. K. Elmagarmid, Ed. Morgan Kaufmann, 1992, pp. 349–397

1992

[56] [56]

Verdi: A framework for implementing and formally verifying distributed systems,

J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. Anderson, “Verdi: A framework for implementing and formally verifying distributed systems,” inProceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2015, pp. 357–368

2015

[57] [57]

Chapar: Certified causally consistent distributed key-value stores,

M. Lesani, C. J. Bell, and A. Chlipala, “Chapar: Certified causally consistent distributed key-value stores,” inProceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2016

2016

[58] [58]

Jepsen: Distributed systems safety analysis,

K. Kingsbury, “Jepsen: Distributed systems safety analysis,” https://jepsen. io, 2013–present, accessed 2026

2013

[59] [59]

Cobra: Making transactional key- value stores verifiably serializable,

C. Tan, C. Zhao, S. Mu, and M. Walfish, “Cobra: Making transactional key- value stores verifiably serializable,” inProceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020, pp. 63–80

2020

[60] [60]

Elle: Inferring isolation anomalies from experimental observations,

K. Kingsbury and P. Alvaro, “Elle: Inferring isolation anomalies from experimental observations,” arXiv:2003.10554, 2020

arXiv 2003

[61] [61]

ReAct: Synergizing reasoning and acting in language models,

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “ReAct: Synergizing reasoning and acting in language models,” inarXiv preprint arXiv:2210.03629, 2022. S. KHAN 32

Pith/arXiv arXiv 2022

[62] [62]

MetaGPT: Meta programming for multi-agent collaborative framework,

S. Hong, X. Zheng, J. Chen, Y . Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou, C. Ran, L. Xiao, C. Wu, and J. Schmidhuber, “MetaGPT: Meta programming for multi-agent collaborative framework,” inarXiv preprint arXiv:2308.00352, 2023

Pith/arXiv arXiv 2023

[63] [63]

Generative agents: Interactive simulacra of human behavior,

J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” inUIST, 2023

2023

[64] [64]

Model context protocol,

Anthropic, “Model context protocol,” https://modelcontextprotocol.io, 2024

2024

[65] [65]

Database transaction models for advanced applica- tions,

A. K. Elmagarmid, “Database transaction models for advanced applica- tions,”Morgan Kaufmann, 1992

1992

[66] [66]

Synthesis of extended trans- action models using ACTA,

P. K. Chrysanthis and K. Ramamritham, “Synthesis of extended trans- action models using ACTA,”ACM Transactions on Database Systems, vol. 19, no. 3, pp. 450–491, 1994

1994

[67] [67]

The ConTract model,

A. Reuter and H. Wächter, “The ConTract model,”IEEE Data Engineering Bulletin, vol. 14, no. 1, pp. 39–43, 1991

1991

[68] [68]

Orleans: Distributed virtual actors for programmability and scalability,

P. A. Bernstein, S. Bykov, A. Geller, G. Kliot, and J. Thelin, “Orleans: Distributed virtual actors for programmability and scalability,” inMicrosoft Research Technical Report MSR-TR-2014-41, 2014

2014

[69] [69]

Verification-aware planning for multi-agent systems,

T. Xuet al., “Verification-aware planning for multi-agent systems,”arXiv preprint arXiv:2510.17109, 2025

arXiv 2025

[70] [70]

AgentVerify: Compositional formal verification of AI agent safety properties via LTL model checking,

E. Fang, “AgentVerify: Compositional formal verification of AI agent safety properties via LTL model checking,” Preprints.org, 2026, manuscript 202604.1029

arXiv 2026

[71] [71]

Concurrency control in groupware systems,

C. A. Ellis and S. J. Gibbs, “Concurrency control in groupware systems,” inProceedings of the 1989 ACM SIGMOD International Conference on Management of Data, 1989, pp. 399–407

1989

[72] [72]

Groupware: Some issues and experiences,

C. A. Ellis, S. J. Gibbs, and G. L. Rein, “Groupware: Some issues and experiences,”Communications of the ACM, vol. 34, no. 1, pp. 39–58, 1991

1991

[73] [73]

H-Store: A high-performance, distributed main memory transaction processing system,

R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y . Zhang, J. Hugg, and D. J. Abadi, “H-Store: A high-performance, distributed main memory transaction processing system,”Proceedings of the VLDB Endowment, vol. 1, no. 2, pp. 1496–1499, 2008

2008

[74] [74]

Durable Functions: Semantics for stateful serverless,

S. Burckhardt, C. Gillum, D. Justo, K. Kallas, C. McMahon, and C. S. Meiklejohn, “Durable Functions: Semantics for stateful serverless,” Proceedings of the ACM on Programming Languages (OOPSLA), vol. 5, 2021

2021

[75] [75]

Temporal: durable execution platform,

Temporal Technologies, “Temporal: durable execution platform,” https: //temporal.io, accessed 2026

2026

[76] [76]

MemGPT: Towards LLMs as operating systems,

C. Packer, V . Fang, S. G. Patil, K. Lin, S. Wooders, and J. E. Gonza- lez, “MemGPT: Towards LLMs as operating systems,”arXiv preprint arXiv:2310.08560, 2023

Pith/arXiv arXiv 2023

[77] [77]

Multiparty asynchronous session types,

K. Honda, N. Yoshida, and M. Carbone, “Multiparty asynchronous session types,” inProceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2008, pp. 273–284

2008

[78] [78]

Probabilistic relational reasoning for differential privacy,

G. Barthe, B. Köpf, F. Olmedo, and S. Zanella-Béguelin, “Probabilistic relational reasoning for differential privacy,” inProceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2012, pp. 97–110

2012

[79] [79]

Weakest precondition reasoning for expected run-times of probabilistic programs,

B. L. Kaminski, J.-P. Katoen, C. Matheja, and F. Olmedo, “Weakest precondition reasoning for expected run-times of probabilistic programs,” inProgramming Languages and Systems (ESOP), 2016, pp. 364–389

2016

[80] [80]

Toward a theory of transac- tional contention managers,

R. Guerraoui, M. Herlihy, and B. Pochon, “Toward a theory of transac- tional contention managers,” inProceedings of the 24th ACM Symposium on Principles of Distributed Computing (PODC), 2005, pp. 258–264

2005