Living Databases: A Unified Model for Continuous Schema Evolution, Versioning, and Transformations
Pith reviewed 2026-05-09 18:52 UTC · model grok-4.3
The pith
A single abstraction unifies schema evolution, versioning, transformations, and provenance tracking in databases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We argue for unifying these diverse functionalities under a single abstraction and a common set of computational primitives. We present such an abstraction, powerful enough to encompass existing use cases and to support new ones. Going beyond previous approaches, our framework seamlessly integrates provenance tracking for system-visible operations, conditional propagation of updates, and configurable alerts on change events. It also offers a principled treatment of dependent objects such as views and derived artifacts like machine learning models, by providing declarative mechanisms to control their evolution. A prototype is sketched in a relational-like database system based on an adapted '
What carries the argument
The unified abstraction for living databases, realized through an adaptation of the Prolly Tree (a Merkle tree-inspired data structure with tunable parameters for performance).
Load-bearing premise
The diverse functionalities of schema evolution, versioning, transformations, and provenance can be unified under a single abstraction and common primitives without losing specialized performance or requiring impractical redesigns of existing systems.
What would settle it
Implementing the prototype on standard schema-evolution and versioning benchmarks and measuring whether it matches the throughput or latency of current specialized systems while still expressing all prior use cases.
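A comparison of this kind could be organized along the lines of the sketch below. It assumes that each system under test can be wrapped as a zero-argument callable performing one operation. The helper names `measure` and `relative_overhead` are hypothetical, and no real benchmark suite or system is referenced.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure(op: Callable[[], None], repeats: int = 5, ops_per_run: int = 1000) -> Dict[str, float]:
    """Time `op` and report median per-operation latency and throughput."""
    per_op: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(ops_per_run):
            op()
        per_op.append((time.perf_counter() - start) / ops_per_run)
    median = statistics.median(per_op)
    throughput = 1.0 / median if median > 0 else float("inf")
    return {"median_latency_s": median, "throughput_ops_s": throughput}


def relative_overhead(unified: Dict[str, float], specialized: Dict[str, float]) -> float:
    """Fractional latency overhead of the unified system (0.2 means 20% slower)."""
    base = specialized["median_latency_s"]
    if base <= 0:
        raise ValueError("specialized latency must be positive")
    return unified["median_latency_s"] / base - 1.0
```

Running `measure` on the same workload for the unified prototype and for a specialized baseline, then reporting `relative_overhead`, would give one number per workload to weigh against the load-bearing premise.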
Original abstract
Databases, and datasets more generally, evolve continuously through updates, transformations, versioning, schema changes, streaming operations, and other mechanisms. While prior work has noted connections among some of these areas, they have traditionally been studied in isolation, each with its own abstractions, algorithms, and system implementations. In this paper, we argue for unifying these diverse functionalities under a single abstraction and a common set of computational primitives. We present such an abstraction, powerful enough to encompass existing use cases and to support new ones. Going beyond previous approaches, our framework seamlessly integrates provenance tracking for system-visible operations, conditional propagation of updates, and configurable alerts on change events. It also offers a principled treatment of dependent objects such as views and derived artifacts like machine learning models, by providing declarative mechanisms to control their evolution. Finally, we sketch a prototype implementation in a relational-like database system based on an adaptation of the "Prolly Tree", a Merkle tree-inspired data structure with tunable parameters to meet varying performance requirements, and present some initial experimental results.
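The abstract names conditional propagation, alerts, and dependent objects without showing how they fit together. The toy sketch below is one possible reading and is not the paper's design: every class name, field, and threshold is hypothetical. Each dependent object declares a predicate that decides whether a change to its source is propagated, and alerts are predicates evaluated on every change event.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Change:
    table: str
    rows_changed: int
    total_rows: int

    @property
    def fraction(self) -> float:
        return self.rows_changed / self.total_rows if self.total_rows else 0.0


@dataclass
class Dependent:
    name: str                                   # e.g. a view or an ML model
    source: str                                 # table it is derived from
    should_propagate: Callable[[Change], bool]  # declarative propagation rule


@dataclass
class Registry:
    dependents: List[Dependent] = field(default_factory=list)
    alerts: List[Tuple[str, Callable[[Change], bool]]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)  # stand-in for a provenance record

    def apply(self, change: Change) -> None:
        self.log.append(f"change:{change.table}:{change.rows_changed}")
        for name, fires in self.alerts:
            if fires(change):
                self.log.append(f"alert:{name}")
        for dep in self.dependents:
            if dep.source != change.table:
                continue
            verdict = "refreshed" if dep.should_propagate(change) else "deferred"
            self.log.append(f"{verdict}:{dep.name}")


def demo() -> List[str]:
    registry = Registry()
    # The view always follows its source; the model retrains only on large changes.
    registry.dependents.append(Dependent("orders_view", "orders", lambda c: True))
    registry.dependents.append(Dependent("churn_model", "orders", lambda c: c.fraction >= 0.10))
    registry.alerts.append(("bulk_update", lambda c: c.fraction >= 0.50))
    registry.apply(Change("orders", 5, 100))
    registry.apply(Change("orders", 60, 100))
    return registry.log
```

In `demo`, the 5% change refreshes the view but defers the model, while the 60% change fires the alert and refreshes both. The log stands in for the provenance record a real system would keep.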
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes unifying schema evolution, versioning, transformations, provenance tracking, conditional update propagation, configurable alerts, and declarative control over dependent objects (views, ML models) under a single abstraction and a common set of primitives. It sketches a prototype in a relational-like system built on an adapted Prolly Tree (a tunable Merkle-tree variant) and reports initial experimental results, claiming that the framework covers prior use cases and enables new ones without sacrificing specialized performance.
Significance. If the concrete mappings and performance claims hold, the work would be significant for bridging traditionally isolated database research areas and providing a principled basis for evolution of derived artifacts. The integration of provenance and alerts is a potential strength, but the current high-level sketch does not yet demonstrate these benefits.
major comments (2)
- [Abstract] Abstract / prototype sketch: the central claim that the adapted Prolly Tree 'seamlessly integrates' conditional propagation of updates, configurable alerts, and declarative mechanisms for views/ML models is not supported by any concrete encoding, operation mapping, or example; without this, it is impossible to verify that specialized performance is preserved or that no impractical redesign is required.
- [Prototype and experiments] Initial experimental results: no quantitative comparison is supplied against specialized systems for schema evolution or versioning, nor any measurement showing that the unified primitives maintain performance for the claimed features; this directly undermines the 'without losing specialized performance' assertion that is load-bearing for the unification argument.
minor comments (1)
- The tunable parameters of the Prolly Tree are mentioned but not defined with sufficient precision or default values to allow reproduction or extension of the initial results.
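To show what a precise definition of such a parameter could look like, consider an idealized model that is assumed here and not taken from the paper: an entry closes its chunk when the low $k$ bits of its hash are zero, and hashes are uniform and independent. The chunk size $S$ then follows a geometric distribution:

```latex
p = 2^{-k}, \qquad
\Pr[S = s] = (1 - p)^{s-1}\, p \quad (s \ge 1), \qquad
\mathbb{E}[S] = \frac{1}{p} = 2^{k}, \qquad
\operatorname{Var}[S] = \frac{1 - p}{p^{2}} .
```

Under this model, a tree over $n$ entries has an expected height of roughly $\log_{2^{k}} n = (\log_2 n)/k$, so reporting $k$ (or the target mean chunk size $2^{k}$) alongside the results would already make them easier to reproduce.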
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that the current high-level sketch requires concrete details to substantiate the unification claims and will revise the manuscript to address both major comments.
Point-by-point responses
- Referee: [Abstract] Abstract / prototype sketch: the central claim that the adapted Prolly Tree 'seamlessly integrates' conditional propagation of updates, configurable alerts, and declarative mechanisms for views/ML models is not supported by any concrete encoding, operation mapping, or example; without this, it is impossible to verify that specialized performance is preserved or that no impractical redesign is required.
  Authors: We agree that the manuscript presents the integration at an abstract level without concrete encodings, operation mappings, or worked examples for conditional update propagation, alerts, and declarative control of views/ML models. This limits verifiability of the 'seamless' claim and performance preservation. In the revised version we will add a dedicated section with explicit mappings of these features onto the adapted Prolly Tree primitives, together with illustrative examples showing how they are realized without redesign of the core structure.
  Revision: yes
- Referee: [Prototype and experiments] Initial experimental results: no quantitative comparison is supplied against specialized systems for schema evolution or versioning, nor any measurement showing that the unified primitives maintain performance for the claimed features; this directly undermines the 'without losing specialized performance' assertion that is load-bearing for the unification argument.
  Authors: The current experimental section contains only preliminary micro-benchmarks and indeed supplies neither head-to-head comparisons against specialized schema-evolution or versioning systems nor overhead measurements for the additional unified features. This weakens the performance-preservation argument. We will expand the evaluation with new quantitative experiments that compare the Prolly Tree prototype against representative specialized systems and report the incremental cost of provenance, conditional propagation, and alert mechanisms.
  Revision: yes
Circularity Check
No circularity: conceptual unification proposal with independent design argument
The manuscript advances a design argument for a unified abstraction (adapted Prolly Trees) that integrates schema evolution, versioning, provenance, conditional propagation, alerts, and dependent-object control. No equations, fitted parameters, or first-principles derivations are presented whose outputs reduce by construction to the inputs. The Prolly Tree adaptation is introduced as an implementation sketch rather than a self-referential definition or renamed prior result; any self-citations (if present) are not load-bearing for the central claim. The work is therefore self-contained as a forward-looking proposal whose validity rests on external evaluation of the sketched prototype rather than tautological reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: diverse database functionalities can be unified under a single abstraction and common computational primitives without significant loss of specialized capabilities.