pith. machine review for the scientific record.

arxiv: 2605.11953 · v1 · submitted 2026-05-12 · 💻 cs.DB · cs.CR

Recognition: no theorem link

PROTECT-DB: Protecting Data using Replicated State Machines: Efficient Corruption Detection & Recovery

Pith reviewed 2026-05-13 04:42 UTC · model grok-4.3

classification 💻 cs.DB cs.CR
keywords database security · replicated state machines · Byzantine fault tolerance · corruption detection · data recovery · PostgreSQL · transaction logs · deterministic execution

The pith

Replicated state machines on deterministic PostgreSQL detect database corruption and repair it concurrently with ongoing transactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PROTECT-DB, a system that applies Byzantine fault tolerant replicated state machines to safeguard database data against attacks that alter its state. Replicas execute the same transactions from a shared log using a deterministic extension of PostgreSQL, so any divergence reveals corruption. The design emphasizes quick detection followed by fast repair that runs alongside normal transaction processing. A performance evaluation supports that the method is efficient enough for practical use. If the approach holds, it offers a concrete way to protect critical data without halting database operations.

Core claim

By building Byzantine-fault tolerant replicated state machines on a deterministic extension of PostgreSQL, every replica processes transactions from a shared log identically; differences among replicas therefore indicate corruption, enabling efficient detection and concurrent repair without interrupting transaction execution.

What carries the argument

The replicated state machine in which each replica deterministically replays transactions from a shared log or blockchain, with replica outputs compared to expose corruption.

If this is right

  • Corruption is detected by comparing replica states after identical transaction execution.
  • Repair proceeds without pausing the database for ongoing transactions.
  • Performance remains practical according to the reported study.
  • The method supplies a foundation for applying BFT replication inside database engines.
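
The compare-after-identical-execution idea in the list above can be illustrated with a toy model. This is a minimal sketch, not PROTECT-DB code: the dictionary state, the `(key, delta)` transaction format, the SHA-256 whole-state digest, and every function name are assumptions made for illustration. The figure excerpts further down suggest the paper compares per-transaction results and a Merkle tree rather than rehashing the full state.

```python
import hashlib
import json
from collections import Counter


def apply_transaction(state, txn):
    """Apply one transaction deterministically: a list of (key, delta) updates."""
    for key, delta in txn:
        state[key] = state.get(key, 0) + delta


def state_digest(state):
    """Hash a canonical (key-sorted) serialization so equal states hash equally."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def replay(log, initial_state):
    """Replay the shared log on a copy of the initial state."""
    state = dict(initial_state)
    for txn in log:
        apply_transaction(state, txn)
    return state


def divergent_replicas(states):
    """Return ids of replicas whose digest differs from the most common digest."""
    digests = {rid: state_digest(s) for rid, s in states.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(rid for rid, d in digests.items() if d != majority_digest)


# Hypothetical workload: two transfers replayed by four replicas.
shared_log = [[("alice", -10), ("bob", 10)], [("bob", -3), ("carol", 3)]]
initial = {"alice": 100, "bob": 0, "carol": 0}
replicas = {rid: replay(shared_log, initial) for rid in ("r1", "r2", "r3", "r4")}
replicas["r3"]["bob"] = 9999  # simulate an attacker altering one replica's state
print(divergent_replicas(replicas))  # ['r3']
```

The sketch treats the most common digest as the reference, which is only sound under the assumption that corrupted replicas are a minority.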

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same log-and-compare pattern could be added to other database engines once a deterministic execution layer exists.
  • Shared logs resemble blockchain structures and might allow hybrid systems that log transactions for both durability and security.
  • Reducing replica count or overhead while keeping detection reliable would be a natural next measurement.

Load-bearing premise

All replicas must produce exactly the same results when executing the same transactions from the shared log, which requires a deterministic extension of PostgreSQL.
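
What this premise demands can be made concrete with a small sketch. It assumes, purely for illustration, that a transaction needing a timestamp and a random value derives both from its position in the shared log instead of from the local clock or an unseeded generator. The names `deterministic_context` and `execute` and the seeding scheme are hypothetical, not the paper's PostgreSQL extension.

```python
import hashlib
import random


def deterministic_context(log_sequence_number):
    """Derive the 'clock' and RNG for a transaction from its log position only."""
    digest = hashlib.sha256(str(log_sequence_number).encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return {"logical_time": log_sequence_number, "rng": random.Random(seed)}


def execute(txn_amount, log_sequence_number):
    """Toy transaction that needs a timestamp and a random token."""
    ctx = deterministic_context(log_sequence_number)
    return {
        "amount": txn_amount,
        "committed_at": ctx["logical_time"],  # stands in for a wall-clock timestamp
        "token": ctx["rng"].randrange(1_000_000),  # stands in for a call to random()
    }


# Two replicas executing the same log entry obtain identical results.
result_on_replica_a = execute(250, log_sequence_number=42)
result_on_replica_b = execute(250, log_sequence_number=42)
print(result_on_replica_a == result_on_replica_b)  # True
```

Had `execute` read the system clock or an unseeded generator, the two results would differ even on uncorrupted replicas, and a mismatch could no longer be attributed to corruption.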

What would settle it

Run a workload with one replica's state deliberately altered and check whether the system either misses the corruption or takes longer than normal transaction latency to repair it.

Figures

Figures reproduced from arXiv: 2605.11953 by Anant Utgikar, S. Sudarshan.

Figure 1: System model
2.3 Other Related Work. Ledger databases such as [30] provide verification and auditing support, preserving historical information for forward integrity checking [5]. [30] does not support SQL, whereas SQL Ledger [4] provides forward integrity for updates on an SQL database using a blockchain. However, these systems require an audit process for detecting corruption, and meanwhile wrong answers…
Figure 2: Request format
… checks, and then executes the transaction. Multiple threads are used to allow parallel execution of transactions, with the deterministic database component responsible for ensuring ordering. The results of transaction execution at each replica are sent back asynchronously to the client using Kafka’s pub-sub mechanism, to ensure fault tolerance. We describe the components of the model in thi…
Figure 3: Workers and condition variable
… github.com/lzllai/AriaBC. As noted in [15], the read-write check also detects phantom conflicts since PostgreSQL uses leaf conflicts for checking conflicts between predicates and updates/deletes/inserts, and these conflicts show up as read-write conflicts on index leaves. We also note that the changes to PostgreSQL above are minimal. The worker executes the transaction on the…
Figure 4: Transaction Result Aggregation Map
5.2 Corrupt Replica Detection. All replicas receive an identical sequence of globally ordered transactions. In passive detection, the result of each transaction is monitored. If the result from any two database replicas differs, at least one of them must be corrupted. Since only a minority of replicas may be compromised, we can determine the correct transaction result ba…
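
The passive detection rule in the excerpt above can be sketched as a vote over per-transaction results. The excerpt is truncated before it states how the correct result is chosen, so the strict-majority rule used here is an assumption that follows only from the "minority of replicas may be compromised" premise. `classify_results` and its parameters are hypothetical names, and results are modelled as plain strings.

```python
from collections import Counter


def classify_results(results, total_replicas):
    """Return (agreed_result, suspect_replicas) for one transaction.

    results: mapping replica_id -> result reported so far for the transaction.
    total_replicas: number of replicas in the system.
    agreed_result stays None until one result is reported by a strict majority
    of all replicas; only then are disagreeing replicas marked as suspects.
    """
    if not results:
        return None, []
    value, count = Counter(results.values()).most_common(1)[0]
    if 2 * count <= total_replicas:
        return None, []
    suspects = sorted(rid for rid, r in results.items() if r != value)
    return value, suspects


reported = {"r1": "balance=7", "r2": "balance=7", "r3": "balance=9999", "r4": "balance=7"}
print(classify_results(reported, total_replicas=4))  # ('balance=7', ['r3'])
```

Waiting for a strict majority of all replicas, rather than of the results received so far, avoids declaring a winner while most replicas have yet to report.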
Figure 6: Procedure to recover a failed replica
… replica is different, the hash of at least one of the children of the root node must differ from that in the reference replica. By recursively comparing the hashes of children down the tree, we can identify leaf-nodes whose hashes do not match across replicas. In case of partitions, as proposed in the previous section, the tree comparison must be done between all corre…
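
The top-down comparison described in the excerpt above can be sketched with a small binary Merkle tree. This is an illustration under stated assumptions, not the paper's data structure: the tree is binary, the leaf count is a power of two, leaves are strings hashed with SHA-256, and `build_merkle` and `differing_leaves` are hypothetical names.

```python
import hashlib


def _h(data):
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_merkle(leaves):
    """Build a binary Merkle tree bottom-up; returns a list of levels, leaves first.

    Assumes len(leaves) is a power of two to keep the sketch short.
    """
    level = [_h(leaf.encode("utf-8")) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [
            _h((level[i] + level[i + 1]).encode("utf-8"))
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def differing_leaves(tree_a, tree_b):
    """Descend from the root, following only children whose hashes differ."""
    top = len(tree_a) - 1
    if tree_a[top][0] == tree_b[top][0]:
        return []
    frontier = [0]  # indexes of mismatching nodes at the current level
    for depth in range(top, 0, -1):
        next_frontier = []
        for idx in frontier:
            for child in (2 * idx, 2 * idx + 1):
                if tree_a[depth - 1][child] != tree_b[depth - 1][child]:
                    next_frontier.append(child)
        frontier = next_frontier
    return frontier


reference = ["p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7"]
corrupted = list(reference)
corrupted[5] = "tampered"
print(differing_leaves(build_merkle(reference), build_merkle(corrupted)))  # [5]
```

Subtrees whose hashes match are never visited, so with a single corrupted leaf the comparison touches a number of nodes proportional to the tree height rather than to the number of leaves.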
Figure 7: Transaction Result and State Aggregation Map
Figure 8: Impact of Determinism
Traffic generator. Benchbase [7] generates transactions in a pre-configured proportion of operations. It supports multiple benchmarks like YCSB, Smallbank, and TPC-C, each of which defines its own schema. RSA key-pairs are generated with a common configuration (PKCSv15 padding, 2048 key-length, UTF-8 format, Base64 encoding, etc.) for both client and system, and public keys are publi…
Figure 10: Impact of maintaining Merkle tree
… Merkle tree. However, there is not much difference in system throughput in the two cases. This is probably because the Merkle tree nodes are frequently accessed and would be in buffer, so the IO overhead is minimal. 8.2 End-to-end system performance. We evaluate our proposed model on a multi-node setup. We measure the impact of adding Kafka alone, with transactions alread…
Figure 11: Performance of End-to-end system
… Kafka broker. A transaction is considered complete when its results are received by the Raft client from a threshold number of servers (in our case 3), and they match

Original abstract

Data is critical for the operation of any organization and needs to be protected, especially against attacks that compromise the state of the database. In this paper, we explore an approach based on Byzantine-fault tolerant replicated state machines, built on top of a deterministic extension of PostgreSQL. Each replica deterministically executes transactions recorded in a shared log/blockchain. Our focus is on creating a practical system that is designed for efficient and quick detection of corruption, as well as quick repair concurrent with execution of transactions. We also present a performance study showing the efficiency and practicality of our approach. We believe our work lays the foundations for the practical use of the BFT replicated state machine approach in the context of databases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes PROTECT-DB, a system for protecting databases against corruption attacks using Byzantine fault-tolerant replicated state machines built atop a deterministic extension of PostgreSQL. Replicas execute transactions from a shared log/blockchain to enable efficient corruption detection and concurrent quick repair, with the work including a performance study to demonstrate efficiency and practicality; it positions this as a foundation for practical BFT use in databases.

Significance. If the determinism premise is rigorously established and the performance results hold, the approach could enable practical, low-overhead corruption detection and recovery in production databases by leveraging BFT replication without requiring full system redesigns.

major comments (2)
  1. Abstract: The central claim of a 'performance study showing the efficiency and practicality of our approach' is asserted without any metrics, experimental setup, results, or data, rendering the efficiency claims unverifiable and load-bearing for the practicality argument.
  2. Abstract: The corruption detection mechanism depends entirely on identical deterministic execution across replicas from the shared log, yet the manuscript provides no description of the deterministic PostgreSQL extension or how non-determinism sources (timestamps, random functions, query planning, I/O ordering) are eliminated; without this, divergence cannot be reliably attributed to corruption versus implementation artifacts.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment point by point below and indicate planned revisions to strengthen the manuscript.

point-by-point responses
  1. Referee: Abstract: The central claim of a 'performance study showing the efficiency and practicality of our approach' is asserted without any metrics, experimental setup, results, or data, rendering the efficiency claims unverifiable and load-bearing for the practicality argument.

    Authors: We agree that the abstract would be strengthened by including high-level metrics to support the efficiency claims. The detailed experimental setup, results, and data are presented in Section 8 of the manuscript. In revision, we will update the abstract to concisely report key findings, such as throughput overhead below 15% compared to unmodified PostgreSQL and sub-second concurrent recovery times on standard workloads, while preserving brevity. revision: yes

  2. Referee: Abstract: The corruption detection mechanism depends entirely on identical deterministic execution across replicas from the shared log, yet the manuscript provides no description of the deterministic PostgreSQL extension or how non-determinism sources (timestamps, random functions, query planning, I/O ordering) are eliminated; without this, divergence cannot be reliably attributed to corruption versus implementation artifacts.

    Authors: Section 3.2 of the manuscript describes the deterministic PostgreSQL extension, including replacement of timestamps with log-derived sequence numbers, deterministic seeding for random functions, use of fixed query plans via prepared statements, and log-enforced ordering for I/O. To address the concern that this may not be sufficiently prominent, we will add a brief summary sentence to the abstract and improve cross-referencing from the introduction. We believe this resolves the verifiability issue without requiring new experiments. revision: partial

Circularity Check

0 steps flagged

No circularity: system description relies on external determinism assumption without self-referential derivation

full rationale

The paper presents an architectural description of a BFT replicated state machine system for database corruption detection and recovery, built atop an assumed deterministic PostgreSQL extension. No equations, fitted parameters, predictions, or derivation chains appear. The determinism premise is stated as a prerequisite rather than derived or fitted within the paper, and the performance study is presented as empirical validation without reducing to self-defined inputs. No self-citations, ansatzes, or renamings of known results are load-bearing. The central claims remain independent of any internal circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review limited to the abstract; the provided text details no specific free parameters or invented entities. The two axioms below are domain assumptions read from the abstract.

axioms (2)
  • domain assumption: A deterministic extension of PostgreSQL allows identical transaction execution across replicas from a shared log.
    Required for replicas to maintain consistent state and enable corruption detection via comparison.
  • domain assumption: BFT replicated state machines can be built efficiently on top of this extension for practical use.
    Central premise for the system's practicality and performance claims.

pith-pipeline@v0.9.0 · 5413 in / 1203 out tokens · 64388 ms · 2026-05-13T04:42:44.077241+00:00 · methodology

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

  [1] D. Abadi and J. Faleiro. An overview of deterministic database systems. Communications of the ACM, vol. 61, 2018.
  [2] Daniel J. Abadi and Jose M. Faleiro. An overview of deterministic database systems. Commun. ACM, 61(9):78–88, 2018.
  [3] Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, Srinivasan Muralidharan, Chet Murthy, Binh Nguyen, Manish Sethi, Gari Singh, Keith Smith, Alessandro Sorniotti, Chrysoula Stathakopoulou, Marko Vukolić, Sharon Weed Cocco, and Jas... Hyperledger fabric: a distributed operating system for permissioned blockchains.
  [4] P. Antonopoulos, R. Kaushik, H. Kodavalla, S. Rosales Aceves, R. Wong, J. Anderson, and J. Szymaszek. SQL Ledger: Cryptographically Verifiable Data in Azure SQL Database. In Proceedings of the ICMD, ACM SIGMOD, 2021.
  [5] M. Bellare and B. Yee. Forward integrity for secure audit logs. Technical report, UC San Diego, 1998.
  [6] P. Devanbu, M. Gertz, C. Martel, and S. Stubblebine. Authentic Data Publication over the Internet. Journal of Computer Security, 11, 2002.
  [7] D. Difallah, A. Pavlo, C. Curino, and P. Cudré-Mauroux. Oltp-bench: An extensible testbed for benchmarking relational databases. Proceedings of the VLDB, 2013.
  [8] T. Distler. Byzantine fault-tolerant state-machine replication from a systems perspective. ACM Comput. Surv., 54(1), February 2021.
  [9] Apache Foundation. Apache Kafka, https://kafka.apache.org/, 2025.
  [10] Apache Software Foundation. Apache Ratis. https://ratis.apache.org, 2025.
  [11] A. Freij, H. Zhou, and Y. Solihin. Bonsai Merkle Forests: Efficiently Achieving Crash Consistency in Secure Persistent Memory. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021.
  [12] Hector Garcia-Molina, Frank M. Pittelli, and Susan B. Davidson. Applications of Byzantine agreement in database systems. ACM Trans. Database Syst., 11(1):27–47, 1986.
  [13] Ilir Gashi, Peter T. Popov, Vladimir Stankovic, and Lorenzo Strigini. On designing dependable services with diverse off-the-shelf SQL servers. In WADS, volume 3069 of Lecture Notes in Computer Science, Springer, pages 191–214. 2003.
  [14] Z. Lai, C. Liu, and E. Lo. When private blockchain meets deterministic database. Proceedings of the ACM on Management of Data, 2023.
  [15] Y. Lu, X. Yu, L. Cao, and S. Madden. Aria: a fast and practical deterministic oltp database. Proceedings of the VLDB, 13, 2020.
  [16] Aldelir Fernando Luiz, Lau Cheuk Lung, and Miguel Correia. MITRA: byzantine fault-tolerant middleware for transaction processing on replicated databases. SIGMOD Rec., 43(1):32–38, May 2014.
  [17] E. Mykletun, M. Narasimha, and G. Tsudik. Authentication and integrity in outsourced databases. ACM Trans. Storage, 2(2):107–138, May 2006.
  [18] Senthil Nathan, Chander Govindarajan, Adarsh Saraf, Manish Sethi, and Praveen Jayachandran. Blockchain meets database: Design and implementation of a blockchain relational database. Proc. VLDB Endow., 12(11):1539–1552, 2019.
  [19] D. Ongaro and J. Ousterhout. In Search of an Understandable Consensus Algorithm. In Annual Technical Conference 2014. USENIX Association, 2014.
  [20] Ricardo Padilha, Enrique Fynn, Robert Soule, and Fernando Pedone. Callinicos: Robust transactional storage for distributed data structures. In USENIX Annual Technical Conf., 2016.
  [21] Ricardo Padilha and Fernando Pedone. Augustus: scalable and robust storage for cloud applications. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys ’13, page 99–112, New York, NY, USA, 2013. Association for Computing Machinery.
  [22] Nuno Preguiça, Rodrigo Rodrigues, Cristóvão Honorato, and João Lourenço. Byzantium: Byzantine-Fault-Tolerant Database Replication Providing Snapshot Isolation. In Procs Workshop on Hot Topics in System Dependability (HotDep), Usenix, 2008.
  [23] Kun Ren, Alexander Thomson, and Daniel J. Abadi. An evaluation of the advantages and disadvantages of deterministic database systems. Proc. VLDB Endow., 7(10):821–832, 2014.
  [24] Felix Schuhknecht, Ankur Sharma, Jens Dittrich, and Divya Agrawal. chainifyDB: How to get rid of your blockchain and use your dbms instead. In CIDR, 2021.
  [25] A. Silberschatz, H. F. Korth, and S. Sudarshan. Database system concepts. McGraw-Hill, 7 edition, 2022.
  [26] Florian Suri-Payer, Matthew Burke, Zheng Wang, Yunhao Zhang, Lorenzo Alvisi, and Natacha Crooks. Basil: Breaking up bft with acid (transactions). In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, SOSP ’21, New York, NY, USA, 2021. Association for Computing Machinery.
  [27] Florian Suri-Payer, Neil Giridharan, Liam Arzola, Shir Cohen, Lorenzo Alvisi, and Natacha Crooks. Pesto: Cooking up high performance bft queries. In Procs. ACM SIGOPS Symp. on Operating Systems Principles, SOSP ’25, page 529–554, 2025.
  [28] Alexander Thomson and Daniel J. Abadi. The case for determinism in database systems. Proc. VLDB Endow., 3(1):70–80, 2010.
  [29] Ben Vandiver, Hari Balakrishnan, Barbara Liskov, and Samuel Madden. Tolerating byzantine faults in transaction processing systems using commit barrier scheduling. In Procs. ACM Symposium on Operating Systems Principles SOSP, pages 59–72, 2007.
  [30] X. Yang, Y. Zhang, S. Wang, B. Yu, F. Li, Y. Li, and W. Yan. Ledgerdb: A centralized ledger database for universal audit and verification. Proc. VLDB Endow., 13, 2020.
  [31] M. Yin, D. Malkhi, M. K. Reiter, G. G. Gueta, and I. Abraham. Hotstuff: Bft consensus with linearity and responsiveness. In Proc. ACM PODC, 2019.
  [32] C. Yue, M. Zhang, C. Zhu, G. Chen, D. Loghin, and Beng C. Ooi. Veribench: Analyzing the performance of database systems with verifiability. Proc. VLDB Endow., 16, 2023.
  [33] Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan R. K. Ports. Building consistent transactions with inconsistent replication. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP ’15, page 263–278, New York, NY, USA, 2015. Association for Computing Machinery.
  [34] Liangrong Zhao, Hans Schmiedel, Qin Wang, and Jiangshan Yu. Janus: Enhancing asynchronous common subset with trusted hardware. In 2024 Annual Computer Security Applications Conference (ACSAC), pages 488–504, 2024.