pith. machine review for the scientific record. sign in

arxiv: 2604.20587 · v1 · submitted 2026-04-22 · 💻 cs.DB

Recognition: unknown

Making TransactionIsolation Checking Practical

Authors on Pith no claims yet

Pith reviewed 2026-05-09 22:52 UTC · model grok-4.3

classification 💻 cs.DB
keywords transaction isolation checkingdatabase trace verificationSMT constraint solvingisolation levelskey-value store operationsmodular verification pipelinesuperpositions abstraction
0
0 comments X

The pith

Boomslang provides the first general-purpose way to check database transaction isolation for arbitrary operations without any knowledge of database internals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Boomslang as a checking framework that parses database traces into an abstract graph, lowers the graph into an intermediate representation, and solves the resulting constraints to verify isolation guarantees. This design separates concerns so that the front end handles trace parsing and semantic lowering while the back end performs the actual verification. A sympathetic reader would care because previous tools were restricted to narrow sets of operations or required detailed knowledge of how a specific database works, leaving many real configurations uncheckable. Boomslang shows that a single modular pipeline can reimplement several earlier checkers in a few hundred lines of code and can also expose bugs in production systems.

Core claim

Boomslang is the first general-purpose checking framework capable of verifying configurations that were previously uncheckable. It supports arbitrary operation types provided by modern transactional key-value stores, requires no knowledge of database internals, and offers a modular, extensible pipeline. The framework parses a database trace into an Abstract Semantic Graph, lowers it via semantic analysis into a low-level IR that uses superpositions to capture uncertainty, and converts the IR into constraints for SMT solving.

What carries the argument

Superpositions in the intermediate representation, an abstraction that captures the uncertainty and complexity arising from arbitrary operations and missing information about database internals.

If this is right

  • The core logic of three prior checkers can be reimplemented as Boomslang modules using only 271-386 lines of code while achieving comparable or superior performance.
  • Boomslang can identify a new bug in TiDB and audit the metadata layer of the JuiceFS file system.
  • It can check vendor-specific behavior in MariaDB and support five previously unchecked isolation levels.
  • It can confirm a theoretical result on the correctness of strict serializability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same front-end and superposition approach could be adapted to verify other storage properties such as atomic visibility in file systems or object stores.
  • Integration with automated testing harnesses could allow routine checking of isolation guarantees whenever a new transaction implementation is released.
  • Optimizations to the SMT backend might extend the practical size of traces that can be checked without manual intervention.

Load-bearing premise

The superpositions abstraction can accurately capture and resolve the uncertainty caused by arbitrary operations and unknown database internals.

What would settle it

A concrete database trace containing a known isolation violation that Boomslang reports as correct, or a trace that produces an unsolvable constraint set solely because of unresolved superpositions.

Figures

Figures reproduced from arXiv: 2604.20587 by Cheng Tan, Jian Zhang, Shuai Mu.

Figure 1
Figure 1. Figure 1: Problem setup. 2 Setup, motivation, and applicability 2.1 Problem setup In this paper, we consider the following setup. Multiple clients simultaneously send requests to a database. The database op￾erates on the requests in a concurrent manner and sends re￾sponses back to the clients. These clients and the external world observe the requests to and responses from the database, which we refer to as a trace. … view at source ↗
Figure 2
Figure 2. Figure 2: Boomslang pipeline. may have concerns about whether their databases perform correctly. This is especially true for correctness-critical applications such as financial services and healthcare. The database owner can use a checker to ensure the integrity of their outsourced databases without trusting the cloud. More potential applications. More broadly, a checker enables unexplored applications: auditing int… view at source ↗
Figure 3
Figure 3. Figure 3: An Abstract Semantic Graph example for serializability. Nodes “Ni” represent transactions; “x” denotes a key; and numbers such as “1” and “2” represent values. In the write history, key x has a write set “wSet” consisting of node N1 writing x=1 and node N2 writing x=2. The version order “verOrder” specifies that N1 happens before N2. In the read lineage, reading x → 2 happens at least once for the node N3 … view at source ↗
Figure 4
Figure 4. Figure 4: An example demonstrating unknowns and graphs with superposition. On the left is a trace with four transactions (begins and commits are omitted). On the right are four graphs with different assumptions for checking serializability; they all have transactions as nodes with solid edges representing known dependencies. (a) is a graph with known version order and encoded values. Graph (b), (c) and (d) incorpora… view at source ↗
Figure 5
Figure 5. Figure 5: Transforming ASG to low-level IR. This describes the logical transformation flow. In practice, the implementation combines common dependency passes to improve efficiency. At a high level, Boomslang processes the ASG through multiple passes, generating each required dependency type one at a time. Each dependency-generating pass analyzes the ASG to determine whether a dependency exists between two nodes. Cru… view at source ↗
Figure 6
Figure 6. Figure 6: Boomslang can check various combinations of isolation levels, databases, and workloads. ★ Checking PL-2+ is challenging, and Boomslang does not scale to 10K transactions as it does for other isolation levels. RandomBench can only be checked by Boomslang since they have operations that existing checkers cannot handle. We only present the result of Port on a relatively small trace. Even though both PL-2+ and… view at source ↗
Figure 7
Figure 7. Figure 7: Lines of code (LOC) for each checker component: IR generation (Gen. IR), pruning, and SMT encoding. † , ★ , • , + means this component is shared across checkers. For example, Boomslang SER/PL-2+/CS share the same component of converting ASG to low-level IRs. Checker Original LOC LOC in Boomslang Cobra [64] 16.6K 271 Viper [73] 2.9K 281 PolySI [48] 3.4K 386 [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Lines of code (LOC) for Cobra/Viper/PolySI’s original system and their implementations in Boomslang. 10K 20K 30K 40K 50K 60K 70K 80K 0 50 100 150 200 250 runtime (seconds) 4 12 28 46 78 127 177 233 5 14 27 45 68 100 127 171 PolySI Viper [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Checking SI: a fair comparison between PolySI’s and Viper’s encodings Among the eight checkers, we implement five checkers Boomslang RR, RC, PL-2+, CS, and PL-FCV that verify pre￾viously uncheckable isolation levels—repeatable read [29, 32], read committed [29, Section 3.2.2], PL-2+ [29, Section 4.1], cursor stability [29, Section 4.6] and PL-FCV [29, Section 4.6] [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Checking serializability: Boomslang and Cobra. a fair performance comparison, removing differences due to implementation-specific factors [PITH_FULL_IMAGE:figures/full_fig_p011_10.png] view at source ↗
Figure 12
Figure 12. Figure 12: Checking serializability: Boomslang and Elle. against Viper and PolySI across four workloads with 50K transactions. The results reveal distinct scalability bottlenecks in existing tools: Viper consistently hits the 600-second execu￾tion timeout (TO) on BlindW-C, RUBiS, and Twitter, indicat￾ing Viper fails to scale computationally for large workloads. In contrast, Boomslang demonstrates superior efficiency… view at source ↗
Figure 13
Figure 13. Figure 13: Decomposition of Boomslang runtime of BlindW-C (2K keys, 32 threads, 10K txns) under different settings. 3K 6K 12K 18K 24K number of superpositions 0 50 100 150 200 runtime (s) SER [PITH_FULL_IMAGE:figures/full_fig_p012_13.png] view at source ↗
Figure 16
Figure 16. Figure 16: Boomslang’s IR. read-dependencies; for deleted keys, their tombstones are re￾turned, revealing the transactions responsible for the deletions. This allows Boomslang to establish read-dependencies to the deleting transactions. If tombstones are not used, the trace contains unknowns, requiring superposition to handle such cases, which is described in Section A.4. Other dependencies. Boomslang also supports … view at source ↗
Figure 17
Figure 17. Figure 17: Boomslang’s scalability on checking serializability (SER), snapshot isolation (SI), and read committed (RC). 10K 20K 40K 60K 80K number of transactions 0 5 10 15 20 25 30 35 40 45 memory cost (GB) SER SI RC [PITH_FULL_IMAGE:figures/full_fig_p017_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Boomslang’s scalability on checking serializability (SER), snapshot isolation (SI), and read committed (RC). Applications. We experiment with the existing BlindW-C [4, 5] from Cobra [64]. Boomslang’s scalability In this experiment, we study how performance varies across different isolation levels, and how Boomslang scales as the problem size grows. We use BlindW￾C because scalability testing with basic re… view at source ↗
Figure 19
Figure 19. Figure 19: Comparison of encoding implementations. “Origin” refers to the original implementation, while “Boomslang” represents Boomslang’s implementation of the same encoding. The x-axis shows the encoding and the trace size; for example, “Cobra (10K)” indicates Cobra’s encoding for checking serializability on 10K trans￾actions. All traces are generated from BlindW, except Emme’s, which is generated using RandomRMW… view at source ↗
read the original abstract

Checking whether database transactions adhere to isolation levels is a crucial yet challenging problem. We present Boomslang, the first general-purpose checking framework capable of verifying configurations that were previously uncheckable. Boomslang advances beyond prior work in three key aspects: (1) it supports arbitrary operation types provided by modern transactional key-value stores, (2) it requires no knowledge of database internals, and (3) it offers a modular, extensible pipeline amenable to customization and optimizations. Boomslang adopts a front-/back-end separation. As the front-end, it parses a database trace into an Abstract Semantic Graph, which is then lowered -- via semantic analysis -- into a low-level intermediate representation (IR). The back-end converts this IR to a set of constraints for SMT solving. This design is enabled by a key abstraction in the IR, called superpositions, which capture the uncertainty and complexity caused by arbitrary operations and missing information. Our experiments show that with just 271--386 lines of code, the core logic of three prior checkers can be reimplemented as Boomslang modules, achieving comparable or superior performance. Using Boomslang, we also identify a new bug in TiDB, audit the metadata layer of the JuiceFS file system, check vendor-specific behavior in MariaDB, support five previously unchecked isolation levels, and confirm a theoretical result on the correctness of strict serializability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Boomslang, a general-purpose framework for checking whether database transaction traces adhere to specified isolation levels. It uses a front-end that parses traces into an Abstract Semantic Graph (ASG), lowers the ASG via semantic analysis into a low-level IR that employs a 'superpositions' abstraction to represent uncertainty arising from arbitrary operations and unknown database internals, and a back-end that encodes the IR as SMT constraints for solving. The framework is claimed to support arbitrary operation types from modern key-value stores, require no database internals knowledge, and be modular/extensible. Experiments report that three prior checkers can be reimplemented as Boomslang modules in 271-386 lines of code with comparable or better performance, plus new results including a bug in TiDB, an audit of JuiceFS metadata, checks on MariaDB vendor behavior, support for five previously uncheckable isolation levels, and confirmation of a theoretical result on strict serializability.

Significance. If the superpositions abstraction in the IR is shown to be sound and complete while accurately capturing the relevant uncertainties, Boomslang would be a significant practical contribution: it generalizes prior specialized isolation checkers into a single extensible pipeline and enables verification of configurations that were previously out of reach. The reported reimplementations with small code sizes and the concrete bug findings supply evidence of engineering utility and extensibility.

major comments (2)
  1. [Abstract and §5] Abstract and §5 (Experiments): the claims of 'successful reimplementations' of prior checkers and 'bug findings' are presented without any description of the verification methods used, error analysis performed, or how soundness of the superpositions lowering was established. Because soundness of the new IR abstraction is load-bearing for all downstream claims (including the reimplementations and new checks), this omission prevents assessment of whether the reported results are reliable.
  2. [§4] §4 (IR and superpositions): the manuscript must supply a precise, formal definition of the superpositions abstraction together with a proof sketch or argument that the lowering from ASG to IR preserves the necessary properties for isolation checking. Without this, it is impossible to evaluate whether the abstraction correctly resolves uncertainty from arbitrary operations while remaining sound.
minor comments (2)
  1. [Abstract] The abstract states that Boomslang 'supports five previously unchecked isolation levels' but does not name them or indicate which prior work they extend; this should be clarified in the introduction or related-work section for context.
  2. [§3] Figure captions and the pipeline diagram (presumably in §3) should explicitly label the data flow from ASG through superpositions to SMT constraints so that readers can trace the modular components without ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments correctly identify the need for greater formality in presenting the superpositions abstraction and for more explicit validation details in the experimental claims. We will revise the manuscript to address both points.

read point-by-point responses
  1. Referee: [Abstract and §5] Abstract and §5 (Experiments): the claims of 'successful reimplementations' of prior checkers and 'bug findings' are presented without any description of the verification methods used, error analysis performed, or how soundness of the superpositions lowering was established. Because soundness of the new IR abstraction is load-bearing for all downstream claims (including the reimplementations and new checks), this omission prevents assessment of whether the reported results are reliable.

    Authors: We agree that the current presentation omits necessary details on verification methods and error analysis. In the revised manuscript we will expand §5 with a new subsection describing the validation process for the three reimplementations (including the exact test suites, output comparison procedures against the original checkers, and any manual trace inspections performed) as well as the reproduction steps and confirmation methods for the TiDB bug and other findings. We will also add an explicit argument, placed in §4, that the superpositions lowering preserves the properties required for sound isolation checking; this argument will be supported by the semantic-analysis invariants already computed in the front-end. revision: yes

  2. Referee: [§4] §4 (IR and superpositions): the manuscript must supply a precise, formal definition of the superpositions abstraction together with a proof sketch or argument that the lowering from ASG to IR preserves the necessary properties for isolation checking. Without this, it is impossible to evaluate whether the abstraction correctly resolves uncertainty from arbitrary operations while remaining sound.

    Authors: We accept that a precise formal definition and accompanying argument are required. The revised §4 will contain (1) a formal definition of a superposition as a finite set of possible operation outcomes together with an associated uncertainty predicate, and (2) a proof sketch showing that the ASG-to-IR lowering is semantics-preserving for the purpose of isolation checking: every concrete execution admitted by the original trace is represented by at least one superposition, and every superposition corresponds to a set of executions consistent with the observed trace and the unknown database internals. This establishes that the subsequent SMT encoding cannot produce false negatives for isolation violations. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents Boomslang as a new modular framework for isolation checking via ASG parsing, lowering to an IR that uses the superpositions abstraction for uncertainty, and SMT solving. No mathematical derivations, fitted parameters, predictions, or equations are described that reduce to the inputs by construction. Reimplementations of prior checkers (271-386 LOC) and new bug findings provide independent practical validation. No self-citations, uniqueness theorems, or ansatz smuggling appear in the abstract or pipeline description. The approach is self-contained as an engineering design with external evidence of utility.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central design rests on the new superpositions abstraction to handle uncertainty; no free parameters are mentioned. Relies on standard assumptions that traces can be parsed into semantic graphs and that SMT solvers can decide the generated constraints.

axioms (2)
  • domain assumption Database operation traces can be faithfully parsed into an Abstract Semantic Graph
    Invoked in the front-end description.
  • domain assumption SMT solvers are capable of deciding the constraints produced from the IR
    Assumed in the back-end conversion step.
invented entities (1)
  • superpositions no independent evidence
    purpose: Capture uncertainty and complexity caused by arbitrary operations and missing information in the low-level IR
    New abstraction introduced to enable the general-purpose design; no independent evidence provided in abstract.

pith-pipeline@v0.9.0 · 5539 in / 1304 out tokens · 70738 ms · 2026-05-09T22:52:11.848293+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

75 extracted references · 4 canonical work pages

  1. [1]

    https://github.com/YugaByte/yugabyte-db/issues/824

    Acknowledged inserts can be present in reads for tens of seconds, then disappear. https://github.com/YugaByte/yugabyte-db/issues/824

  2. [2]

    https://github.com

    Anonymous bug issue on github. https://github.com

  3. [3]

    https: //www.infoq.com/presentations/Big-Data-in-Real-Time-at-Twitter

    Big data in real time at Twitter. https: //www.infoq.com/presentations/Big-Data-in-Real-Time-at-Twitter

  4. [4]

    https://github.com/DBCobra/CobraBench

    Cobra bench. https://github.com/DBCobra/CobraBench

  5. [5]

    https://github.com/DBCobra/CobraVerifier

    Cobra verifier. https://github.com/DBCobra/CobraVerifier

  6. [6]

    https://github.com/DBCobra/CobraLogs/tree/ 12e4421a1eb7f276d2de36608980dd3e535de64d

    Cobralogs. https://github.com/DBCobra/CobraLogs/tree/ 12e4421a1eb7f276d2de36608980dd3e535de64d

  7. [7]

    https://github.com/cockroachdb/cockroach/issues/10030

    G2: anti-dependency cycles. https://github.com/cockroachdb/cockroach/issues/10030

  8. [8]

    https://github.com/YugaByte/yugabyte-db/issues/2125

    G2-item anomaly with master kills. https://github.com/YugaByte/yugabyte-db/issues/2125

  9. [9]

    https://jepsen.io/

    Jepsen. https://jepsen.io/

  10. [10]

    http://jepsen.io/analyses/faunadb-2.5.4

    Jepsen: Faunadb 2.5.4. http://jepsen.io/analyses/faunadb-2.5.4

  11. [11]

    https://jepsen.io/analyses/mongodb-4.2.6

    Jepsen: Mongodb 4.2.6. https://jepsen.io/analyses/mongodb-4.2.6

  12. [12]

    https://jepsen.io/analyses/tidb-2.1.7

    Jepsen: Tidb 2.1.7. https://jepsen.io/analyses/tidb-2.1.7

  13. [13]

    https://github.com/juicedata/juicefs

    Juicefs. https://github.com/juicedata/juicefs

  14. [14]

    https://www.cockroachlabs.com/blog/jepsen-tests-lessons/

    Lessons learned from 2+ years of nightly jepsen tests. https://www.cockroachlabs.com/blog/jepsen-tests-lessons/

  15. [15]

    https://mariadb.org/

    Mariadb. https://mariadb.org/

  16. [16]

    https://www.mysql.com/

    Mysql. https://www.mysql.com/

  17. [17]

    NCC: Natural concurrency control for strictly serializable datastores by avoiding the timestamp-inversion pitfall (extended version)

  18. [18]

    https://github.com/dracoooooo/Plume

    Plume: Efficient and complete black-box checking of weak isolation levels. https://github.com/dracoooooo/Plume

  19. [19]

    https://github.com/hengxin/PolySI-PVLDB2023-Artifacts

    Polysi github repo. https://github.com/hengxin/PolySI-PVLDB2023-Artifacts

  20. [20]

    https://www.postgresql.org/

    Postgresql official website. https://www.postgresql.org/

  21. [21]

    https://rubis.ow2.org/

    RUBiS. https://rubis.ow2.org/

  22. [22]

    https://github.com/lucamul/TapirCorrectnessTest

    Tapircorrectnesstest. https://github.com/lucamul/TapirCorrectnessTest

  23. [23]

    https://www.pingcap.com/

    Tidb. https://www.pingcap.com/

  24. [24]

    https://tikv.org/

    Tikv is a highly scalable, low latency, and easy to use key-value database. https://tikv.org/

  25. [25]

    http://www.tpc.org/tpcc/

    TPC-C. http://www.tpc.org/tpcc/

  26. [26]

    https://github.com/Khoury-srg/Viper

    Viper github repo. https://github.com/Khoury-srg/Viper

  27. [27]

    https://jira.mariadb.org/browse/MDEV-26642

    Weird select view when a record is modified to the same value by two transactions. https://jira.mariadb.org/browse/MDEV-26642

  28. [28]

    https://github.com/yugabyte/yugabyte-db

    Yugabytedb github repo. https://github.com/yugabyte/yugabyte-db

  29. [29]

    Adya.Weak consistency: a generalized theory and optimistic implementations for distributed transactions

    A. Adya.Weak consistency: a generalized theory and optimistic implementations for distributed transactions. PhD thesis, Massachusetts Institute of Technology, 1999

  30. [30]

    A. Adya, B. Liskov, and P. O’Neil. Generalized isolation level definitions. InProceedings of 16th International Conference on Data Engineering, 2000

  31. [31]

    Alquraan, H

    A. Alquraan, H. Takruri, M. Alfatafta, and S. Al-Kiswany. An analysis of network-partitioning failures in cloud systems. In13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 51–68, 2018

  32. [32]

    Berenson, P

    H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil. A critique of ANSI SQL isolation levels. InProc. SIGMOD, May 1995

  33. [33]

    P. A. Bernstein, D. W. Shipman, and W. S. Wong. Formal aspects of serializability in database concurrency control.TSE, SE-5(3), May 1979

  34. [34]

    Biswas and C

    R. Biswas and C. Enea. dbcop source code. https://gitlab.math.univ-paris-diderot.fr/ranadeep/dbcop

  35. [35]

    Biswas and C

    R. Biswas and C. Enea. On the complexity of checking transactional consistency. InProc. OOPSLA, Oct. 2019

  36. [36]

    Bouajjani, C

    A. Bouajjani, C. Enea, R. Guerraoui, and J. Hamza. On verifying causal consistency. InProceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, pages 626–638, 2017

  37. [37]

    Burckhardt, C

    S. Burckhardt, C. Dern, M. Musuvathi, and R. Tan. Line-up: a complete and automatic linearizability checker. InProceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 330–340, 2010

  38. [38]

    J. Clark. Verifying serializability protocols with version order recovery. Master’s thesis, ETH Zurich, 2021

  39. [39]

    Clark, A

    J. Clark, A. F. Donaldson, J. Wickerson, and M. Rigger. Validating database system isolation level implementations with version certificate recovery. InProc. EuroSys, 2024

  40. [40]

    Crooks, Y

    N. Crooks, Y. Pu, L. Alvisi, and A. Clement. Seeing is believing: a client-centric specification of database isolation. InProc. PODC, July 2017

  41. [41]

    Z. Cui, W. Dou, Y. Gao, D. Wang, J. Song, Y. Zheng, T. Wang, R. Yang, K. Xu, Y. Hu, et al. Understanding transaction bugs in database systems. In2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE), pages 947–947. IEEE Computer Society, 2024

  42. [42]

    H. D. Dixit, S. Pendharkar, M. Beadon, C. Mason, T. Chakravarthy, B. Muthiah, and S. Sankar. Silent data corruptions at scale.arXiv preprint arXiv:2102.11245, 2021

  43. [43]

    W. Dou, Z. Cui, Q. Dai, J. Song, D. Wang, Y. Gao, W. Wang, J. Wei, L. Chen, H. Wang, et al. Detecting isolation bugs via transaction oracle construction. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 1123–1135. IEEE, 2023

  44. [44]

    Eisenberg and J

    A. Eisenberg and J. Melton. SQL standardization: The next steps. SIGMOD Rec., 29(1):63–67, 2000

  45. [45]

    Ghasemirad, S

    S. Ghasemirad, S. Liu, C. Sprenger, L. Multazzu, and D. A. Basin. Veriso: Verifiable isolation guarantees for database transactions.CoRR, abs/2503.06284, 2025

  46. [46]

    Golab, M

    W. Golab, M. R. Rahman, A. AuYoung, K. Keeton, and I. Gupta. Client-centric benchmarking of eventual consistency for cloud storage systems. InProc. ICDCS, June 2014

  47. [47]

    Huang, Q

    D. Huang, Q. Liu, Q. Cui, Z. Fang, X. Ma, F. Xu, L. Shen, L. Tang, Y. Zhou, M. Huang, et al. Tidb: a raft-based htap database. Proceedings of the VLDB Endowment, 13(12):3072–3084, 2020

  48. [48]

    Huang, S

    K. Huang, S. Liu, Z. Chen, H. Wei, D. Basin, H. Li, and A. Pan. Efficient black-box checking of snapshot isolation in databases. Proceedings of the VLDB Endowment, 16(6):1264–1276, 2023

  49. [49]

    Jiang, S

    Z.-M. Jiang, S. Liu, M. Rigger, and Z. Su. Detecting Transactional Bugs in Database Engines via Graph-Based Oracle Construction. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), pages 397–417, 2023

  50. [50]

    B. H. Kim and D. Lie. Caelus: Verifying the consistency of cloud services with battery-powered devices. InProc. S&P, May 2015

  51. [51]

    Elle: Inferring isolation anomalies from experimental observations.ArXiv abs/2003.10554(2020)

    K. Kingsbury and P. Alvaro. Elle: Inferring isolation anomalies from experimental observations.arXiv preprint arXiv:2003.10554, 2020

  52. [52]

    H. Li, H. Wei, H. Ouyang, Y. Chen, N. Yang, R. Zhang, and A. Pan. Online timestamp-based transactional isolation checking of database systems (extended version). https://github.com/FertileFragrance/TimeKiller, 2025

  53. [53]

    K. Li, S. Weng, P. Liu, L. Ni, C. Yang, R. Zhang, X. Zhou, J. Lou, G. Huang, W. Qian, et al. Leopard: A black-box approach for efficiently verifying various isolation levels. In2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 722–735. IEEE, 2023

  54. [54]

    G. Lowe. Testing for linearizability.Concurrency and Computation: Practice and Experience, 29(4):e3928, 2017

  55. [55]

    H. Lu, S. Mu, S. Sen, and W. Lloyd. NCC: Natural concurrency control for strictly serializable datastores by avoiding the timestamp-inversion pitfall. In17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), 2023

  56. [56]

    H. Lu, S. Sen, and W. Lloyd. Performance-optimal read-only transactions. InProceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, OSDI’20, USA, 2020. USENIX Association. 13

  57. [57]

    H. Lu, K. Veeraraghavan, P. Ajoux, J. Hunt, Y. J. Song, W. Tobagus, S. Kumar, and W. Lloyd. Existential consistency: measuring and understanding consistency at Facebook. InProc. SOSP, Oct. 2015

  58. [58]

    Manson, W

    J. Manson, W. Pugh, and S. V. Adve. The java memory model. SIGPLAN Not., 40(1):378–391, Jan. 2005

  59. [59]

    Ouyang, H

    H. Ouyang, H. Wei, Y. Huang, H. Li, and A. Pan. Verifying transactional consistency of mongodb.arXiv preprint arXiv:2111.14946, 2021

  60. [60]

    C. H. Papadimitriou. The serializability of concurrent database updates.JACM, 26(4), Oct. 1979

  61. [61]

    L. Pick, A. Xu, A. Desai, S. A. Seshia, and A. Albarghouthi. Checking observational correctness of database systems.Proceedings of the ACM on Programming Languages, 9(OOPSLA1):1661–1688, 2025

  62. [62]

    M. R. Rahman, W. Golab, A. AuYoung, K. Keeton, and J. J. Wylie. Toward a principled framework for benchmarking consistency. InProc. HotDep, Oct. 2012

  63. [63]

    R. Taft, I. Sharif, A. Matei, N. VanBenschoten, J. Lewis, T. Grieger, K. Niemi, A. Woods, A. Birzin, R. Poss, P. Bardea, A. Ranade, B. Darnell, B. Gruneir, J. Jaffray, L. Zhang, and P. Mattis. Cockroachdb: The resilient geo-distributed SQL database. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online c...

  64. [64]

    C. Tan, C. Zhao, S. Mu, and M. Walfish. Cobra: Making transactional key-value stores verifiably serializable. In14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), 2020

  65. [65]

    Torlak, M

    E. Torlak, M. Vaziri, and J. Dolby. Memsat: checking axiomatic specifications of memory models.ACM Sigplan Notices, 45(6):341–350, 2010

  66. [66]

    H. Wada, A. Fekete, L. Zhao, K. Lee, and A. Liu. Data consistency properties and the trade-offs in commercial cloud storage: the consumers’ perspective. InProc. CIDR, Jan. 2011

  67. [67]

    S. Wang, G. Zhang, J. Wei, Y. Wang, J. Wu, and Q. Luo. Understanding silent data corruptions in a large production cpu population. InProceedings of the 29th Symposium on Operating Systems Principles, pages 216–230, 2023

  68. [68]

    Weikum and G

    G. Weikum and G. Vossen.Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery. Elsevier, 2001

  69. [69]

    Y. Xia, X. Yu, M. Butrovich, A. Pavlo, and S. Devadas. Litmus: Towards a practical database management system with verifiable acid properties and transaction correctness. InProceedings of the 2022 international conference on management of data, pages 1478–1492, 2022

  70. [70]

    X. Yang, Y. Zhang, S. Wang, B. Yu, F. Li, Y. Li, and W. Yan. Ledgerdb: A centralized ledger database for universal audit and verification. Proceedings of the VLDB Endowment, 13(12):3138–3151, 2020

  71. [71]

    Zellag and B

    K. Zellag and B. Kemme. How consistent is your cloud application? In Proc. SoCC, Oct. 2012

  72. [72]

    Zhang, N

    I. Zhang, N. K. Sharma, A. Szekeres, A. Krishnamurthy, and D. R. K. Ports. Building consistent transactions with inconsistent replication. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015, Monterey, CA, USA, October 4-7, 2015, pages 263–278. ACM, 2015

  73. [73]

    Zhang, Y

    J. Zhang, Y. Ji, S. Mu, and C. Tan. Viper: A Fast Snapshot Isolation Checker. InProc. EuroSys, 2023

  74. [74]

    Zhang and C

    J. Zhang and C. Tan. Simplifying snapshot isolation: A new definition, equivalence, and efficient checking. InProceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data, pages 23–29, 2024

  75. [75]

    happened- before

    M. Zheng, J. Tucek, D. Huang, F. Qin, M. Lillibridge, E. S. Yang, B. W. Zhao, and S. Singh. Torturing databases for fun and profit. In11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 449–464, 2014. 14 A Appendix A.1 Graph-based isolation level definition Serialization graph.Most existing checkers [38,39,48,51, 64, 73] ...