Determination Provenance: From Ambiguity to Algebra
Pith reviewed 2026-06-27 11:30 UTC · model grok-4.3
The pith
Determination provenance tracks commitments that resolve ambiguity among multiple admissible data outcomes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Determination provenance tracks the commitments that resolve ambiguity in data systems with multiple admissible outcomes. A tuple's support is the set of resolutions under which it holds. These supports form a commutative semiring, and layered commitments induce a filtration measuring each tuple's query-relative depth. Positive relational algebra respects the filtration, enabling compositional robustness analysis and quantitative diagnosis of resolution cost. The framework is instantiated for transactional isolation and for Datalog with negation, where classical semantic variants correspond to different views of a single shared filtration.
What carries the argument
The filtration induced by layered commitments on supports in a commutative semiring, which measures query-relative depth and is respected by positive relational algebra.
If this is right
- Supports form a commutative semiring allowing algebraic manipulation of provenance.
- Positive relational algebra respects the filtration and preserves depth measures under its operations.
- Classical isolation levels and negation semantics correspond to different views of one shared filtration.
- Compositional robustness analysis and quantitative diagnosis of resolution cost become possible for queries.
Where Pith is reading between the lines
- The semiring and filtration structure could be used to compare resolution costs across different query plans in ambiguous settings.
- Similar layering might apply to other models with nondeterminism, such as certain concurrent or probabilistic computations.
- The approach opens a path to provenance that remains defined even before ambiguity is resolved.
Load-bearing premise
Resolutions admit a layering into a filtration whose interaction with relational algebra operations holds compositionally without additional constraints on the resolution sets.
What would settle it
A concrete counterexample consisting of a positive relational algebra query, a set of resolutions, and input tuples where the output tuple's filtration depth does not equal the composition of input depths under the algebra operation.
Figures
read the original abstract
Many data systems admit multiple admissible outcomes for the same input: concurrent transactions may serialize in one of many orders; a logic program may have multiple stable models. Classical data provenance cannot even pose its question in such settings -- it explains how a result was derived, but only after something has chosen which result to produce. We introduce \emph{determination provenance} to track the commitments that resolve this ambiguity. A tuple's \emph{support} is the set of resolutions under which it holds. Supports form a commutative semiring, and layered commitments induce a \emph{filtration} measuring each tuple's \emph{query-relative depth} -- how many layers of semantic resolution it depends on. Positive relational algebra respects the filtration, enabling compositional robustness analysis and quantitative diagnosis of resolution cost. We instantiate the framework for transactional isolation and for $\mbox{Datalog}^\neg$; in both, classical semantic variants (isolation levels; negation semantics) correspond to different views of a single shared filtration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces determination provenance to track commitments resolving ambiguity in systems with multiple admissible outcomes (e.g., transaction serializations or stable models). A tuple's support is defined as the set of resolutions under which it holds; these supports form a commutative semiring via set operations. Layered commitments induce a filtration measuring each tuple's query-relative depth, and the paper asserts that positive relational algebra respects this filtration compositionally. Explicit constructions and proofs are given for transactional isolation levels and Datalog^neg variants, showing that classical semantic variants correspond to different views of a single shared filtration.
Significance. If the algebraic claims hold, the framework unifies provenance analysis across ambiguous semantics, enabling compositional robustness checks and quantitative diagnosis of resolution cost without ad-hoc constraints per instantiation. The explicit constructions and proofs for the two concrete cases (isolation levels and Datalog^neg) are notable strengths, as they demonstrate that the general semiring and filtration properties follow directly from the definitions.
minor comments (3)
- [Abstract] The abstract states that 'positive relational algebra respects the filtration' but does not indicate the section containing the general proof versus the instantiation-specific arguments; adding a forward reference would improve readability.
- Notation for the filtration (e.g., how depth is assigned to layered commitments) is introduced without an early example; a small concrete table illustrating depth values for a sample query would clarify the concept before the general theorems.
- [Introduction] The claim that the general case follows 'without additional ad-hoc constraints on the resolution sets' is stated in the introduction; confirming this is reiterated with a pointer to the relevant lemma or theorem would strengthen the narrative flow.
Simulated Author's Rebuttal
We thank the referee for the positive summary of the paper and the recommendation for minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity; framework is definitional and self-contained
full rationale
The paper defines determination provenance, supports as sets of resolutions forming a commutative semiring, and a filtration induced by layered commitments. It then proves that positive relational algebra respects this filtration compositionally. These are explicit algebraic constructions and proofs for the general case and two instantiations (transactional isolation, Datalog^neg), with no fitted parameters, no predictions that reduce to inputs by construction, and no load-bearing self-citations. The central results follow directly from the stated definitions without circular reduction. This matches the default expectation of a non-circular theoretical paper.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The set of resolutions under which a tuple holds forms a commutative semiring
- ad hoc to paper Layered commitments induce a filtration that positive relational algebra respects
invented entities (3)
-
determination provenance
no independent evidence
-
support
no independent evidence
-
filtration
no independent evidence
Reference graph
Works this paper leans on
-
[1]
1999.Weak consistency: A generalized theory and optimistic implementations
Atul Adya. 1999.Weak consistency: A generalized theory and optimistic implementations. Ph. D. Dissertation. MIT
1999
-
[2]
Ameloot, Frank Neven, and Jan Van den Bussche
Tom J. Ameloot, Frank Neven, and Jan Van den Bussche. 2013. Relational transducers for declarative networking.J. ACM60, 2 (2013), 15:1–15:38. doi:10.1145/2450142.2450151
-
[3]
Bahareh Sadat Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan, and Boris Glavic. 2018. GProM - A Swiss Army Knife for Your Provenance Needs. InIEEE Data Engineering Bulletin, Vol. 41. 51–62
2018
-
[4]
Marcelo Arenas, Leopoldo Bertossi, and Jan Chomicki. 1999. Consistent Query Answers in Inconsistent Databases. In Proceedings of the Eighteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). 68–79. doi:10.1145/303976.303983
-
[5]
Mihai Budiu, Tej Chajed, Frank McSherry, Leonid Ryzhyk, and Val Tannen. 2023. DBSP: Automatic Incremental View Maintenance.Proceedings of the VLDB Endowment16, 7 (2023), 1601–1614. doi:10.14778/3587136.3587137
-
[6]
Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan. 2001. Why and Where: A Characterization of Data Provenance. InDatabase Theory - ICDT 2001, 8th International Conference, London, UK, January 4-6, 2001, Proceedings (Lecture Notes in Computer Science, Vol. 1973), Jan Van den Bussche and Victor Vianu (Eds.). Springer, London, UK, 316–330. doi:10.1007/3-540-...
-
[7]
Nilesh Dalvi and Dan Suciu. 2012. The dichotomy of probabilistic inference for unions of conjunctive queries.J. ACM 59, 6 (2012), 30:1–30:87. doi:10.1145/2395116.2395119
-
[8]
Improved differentially private euclidean distance approximation
Katrin M. Dannert, Erich Grädel, Matthias Naaf, and Val Tannen. 2021. Semiring Provenance for LFP and Strategy Analysis. InProceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS). 276–289. doi:10.1145/3452021.3458325
-
[9]
Daniel Deutch, Tova Milo, Sudeepa Roy, and Val Tannen. 2014. Circuits for Datalog Provenance. InProceedings of the 17th International Conference on Database Theory (ICDT). 201–212. doi:10.5441/002/icdt.2014.22
-
[10]
2012.Answer Set Solving in Practice
Martin Gebser, Roland Kaminski, Benjamin Kaufmann, and Torsten Schaub. 2012.Answer Set Solving in Practice. Morgan & Claypool
2012
-
[11]
Michael Gelfond and Vladimir Lifschitz. 1988. The stable model semantics for logic programming.ICLP(1988)
1988
-
[12]
Boris Glavic and Gustavo Alonso. 2009. Perm: Processing Provenance and Data on the Same Data Model through Query Rewriting. InProceedings of the 25th IEEE International Conference on Data Engineering (ICDE). 174–185. doi:10.1109/ICDE.2009.15
-
[13]
Green, Grigoris Karvounarakis, and Val Tannen
Todd J. Green, Giorgos Karvounarakis, and Val Tannen. 2007. Provenance Semirings. InProceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). Association for Computing Machinery, Beijing, China, 31–40. doi:10.1145/1265530.1265535
-
[14]
2001.Causes and explanations: A structural-model approach
Joseph Halpern and Judea Pearl. 2001.Causes and explanations: A structural-model approach. MIT Press
2001
-
[15]
Joseph M. Hellerstein. 2025. Complete CALM: A Coordination Criterion for Specifications. arXiv:2602.09435 [cs.DC] https://arxiv.org/abs/2602.09435
Pith/arXiv arXiv 2025
-
[16]
Hellerstein
Joseph M. Hellerstein. 2025. On the Complexity of Determinations. (2025). Under review
2025
-
[17]
Sven Köhler, Bertram Ludäscher, and Yannis Smaragdakis. 2012. Declarative Datalog Debugging for Mere Mortals. In Datalog 2.0. 111–122. doi:10.1007/978-3-642-36524-8_12
-
[18]
Leslie Lamport. 1978. Time, Clocks, and the Ordering of Events in a Distributed System.Commun. ACM21, 7 (1978), 558–565. doi:10.1145/359545.359563
-
[19]
Ester Livshits, Leopoldo Bertossi, Benny Kimelfeld, and Moshe Sebag. 2021. The Shapley Value of Tuples in Query Answering. InProceedings of the 24th International Conference on Database Theory (ICDT). 20:1–20:19. doi:10.4230/ LIPIcs.ICDT.2021.20
2021
-
[20]
Wiktor Marek and Miroslaw Truszczyński
V. Wiktor Marek and Miroslaw Truszczyński. 1991. Autoepistemic Logic.J. ACM38, 3 (1991), 588–619. doi:10.1145/ 116825.116836
arXiv 1991
-
[21]
Antoni Mazurkiewicz. 1977. Concurrent Program Schemes and Their Interpretations. InDAIMI Report Series, Vol. 6. Aarhus University. doi:10.7146/dpb.v6i78.7691 34 Joseph M. Hellerstein
-
[22]
Halpern, Christoph Koch, Katherine F
Alexandra Meliou, Wolfgang Gatterbauer, Joseph Y. Halpern, Christoph Koch, Katherine F. Moore, and Dan Suciu
-
[23]
http://sites.computer.org/debull/ A10sept/suciu.pdf
Causality in databases.IEEE Data Engineering Bulletin33, 3 (2010), 59–67. http://sites.computer.org/debull/ A10sept/suciu.pdf
2010
-
[24]
Dan Olteanu and Jakub Závodn `y. 2015. Factorized databases.SIGMOD(2015)
2015
-
[25]
Fotis Psallidas and Eugene Wu. 2018. Smoke: Fine-Grained Lineage at Interactive Speed. InProceedings of the 2018 International Conference on Management of Data (SIGMOD). 719–734. doi:10.1145/3183713.3190515
-
[26]
David P. Reed. 1978.Naming and Synchronization in a Decentralized Computer System. Ph. D. Dissertation. Massachusetts Institute of Technology
1978
-
[27]
Zhang, et al
Margo Seltzer, Y. Zhang, et al. 2005. Provenance in Systems. InProceedings of the 7th USENIX Conference on File and Storage Technologies (FAST). USENIX Association, San Francisco, CA, 99–116
2005
-
[28]
2018.Provenance and Probabilistic Databases
Pierre Senellart. 2018.Provenance and Probabilistic Databases. Cambridge University Press
2018
-
[29]
Lloyd S. Shapley. 1953. A Value for 𝑛-Person Games. InContributions to the Theory of Games, Harold W. Kuhn and Albert W. Tucker (Eds.). Annals of Mathematics Studies, Vol. 2. Princeton University Press, 307–317
1953
-
[30]
Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, and Chandan Shanbhag
Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, and Chandan Shanbhag. 2010.Dapper, a Large-Scale Distributed Systems Tracing Infrastructure. Technical Report. Google
2010
-
[31]
2011.Probabilistic Databases
Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. 2011.Probabilistic Databases. Morgan & Claypool
2011
-
[32]
Allen Van Gelder, Kenneth Ross, and John Schlipf. 1991. The well-founded semantics for general logic programs. JACM(1991)
1991
-
[33]
Brecht Vandevoort, Alan Fekete, Bas Ketsman, Frank Neven, and Stijn Vansummeren. 2025. Using Read Promotion and Mixed Isolation Levels for Performant Yet Serializable Execution of Transaction Programs.arXiv preprint arXiv:2501.18377v3(2025)
arXiv 2025
-
[34]
Jef Wijsen. 2019. Certain Conjunctive Query Answering in First-Order Logic.ACM Transactions on Database Systems 44, 1 (2019), 3:1–3:45. doi:10.1145/3284551
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.