pith. machine review for the scientific record. sign in

arxiv: 2605.00417 · v1 · submitted 2026-05-01 · 💻 cs.DB · cs.LO

Recognition: unknown

Multiset semantics in SPARQL, Relational Algebra and Datalog

Authors on Pith no claims yet

Pith reviewed 2026-05-09 19:04 UTC · model grok-4.3

classification 💻 cs.DB cs.LO
keywords multiset semanticsSPARQLDatalogrelational algebraexpressive equivalencequery languagesRDF data
0
0 comments X

The pith

SPARQL patterns using AND, UNION, FILTER, EXCEPT and SELECT are expressively equivalent to multiset Datalog and relational algebra.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that SPARQL query patterns built from AND, UNION, FILTER, EXCEPT and SELECT produce the same multisets of results as non-recursive Datalog programs with safe negation and as expressions in a multiset relational algebra with projection, selection, natural join, arithmetic union and except. This matters for a sympathetic reader because it lets database theory developed for Datalog and relational algebra transfer directly to SPARQL, which is the standard language for querying RDF data. The work therefore supplies a precise algebraic and logical account of how bag semantics behaves in the SPARQL fragment that corresponds to these operators. If the equivalence holds, any optimization or rewriting technique valid in one formalism is valid in the others.

Core claim

We prove that these three formalisms are expressively equivalent under multiset semantics. The proof proceeds by exhibiting translations between SPARQL patterns, multiset Datalog programs and multiset relational algebra expressions that preserve the multiplicity of every result tuple for the listed operators.

What carries the argument

The explicit translations that map SPARQL patterns to multiset Datalog rules and to multiset relational algebra expressions while preserving tuple multiplicities.

If this is right

  • Any SPARQL query in the fragment can be rewritten into an equivalent multiset relational algebra expression.
  • Optimizations and equivalences known for multiset relational algebra apply unchanged to the corresponding SPARQL patterns.
  • Non-recursive Datalog with safe negation can serve as an intermediate representation for evaluating these SPARQL queries.
  • The algebraic structure of multiset SPARQL is fully characterized by the operators of the aligned Datalog and relational algebra.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The equivalence supplies a route for implementing SPARQL engines on top of existing Datalog or relational algebra engines that already handle multisets.
  • Similar translations might be attempted for additional SPARQL operators such as OPTIONAL or property paths if suitable multiset extensions are defined.
  • The result clarifies why certain duplicate-handling behaviors in SPARQL match the bag semantics already present in SQL.

Load-bearing premise

The claimed equivalence is limited to the listed SPARQL operators, non-recursive Datalog with safe negation, and the particular multiset operations defined for relational algebra in the paper.

What would settle it

A SPARQL pattern using only AND, UNION, FILTER, EXCEPT and SELECT whose result multiset cannot be produced by any non-recursive safe-negation Datalog program or by any expression in the paper's multiset relational algebra.

Figures

Figures reproduced from arXiv: 2605.00417 by Claudio Gutierrez, Daniel Hern\'andez, Renzo Angles.

Figure 1
Figure 1. Figure 1: Transitivity of language containment. The figure represents three languages view at source ↗
Figure 2
Figure 2. Figure 2: The triangle of simulations among SPARQL, Non-Recursive Multiset Datalog with Safe Negation view at source ↗
Figure 3
Figure 3. Figure 3: Example of derivation trees. Let 𝐷 be a NRMD¬ database, 𝐹 = 𝑟(𝑎) be a fact in 𝐷 with card(𝐹, 𝐷) = 2, Π = {𝑅, 𝑆} be a NRMD¬ program where 𝑅 is the rule 𝑞(𝑋) ← 𝑟(𝑋), 𝑝(𝑋) and 𝑆 is the rule 𝑝(𝑋) ← 𝑟(𝑋). This figure shows the derivation trees of Π with respect to 𝐷. A variable 𝑋 occurs positively in a rule 𝑅 if and only if 𝑋 occurs in a positive literal in the body of 𝑅. A rule 𝑅 is said to be safe if all its … view at source ↗
Figure 4
Figure 4. Figure 4: Parse tree of the normalized SPARQL graph pattern presented in Example view at source ↗
Figure 5
Figure 5. Figure 5: Derivation trees for the ground literal 𝜃 (𝑝(𝑋¯)) regarding query (𝑝(𝑋¯), {𝑅}) (on the left), and query (𝑝(𝑋¯), {𝑅 𝐴 2 , 𝑅′′}) (on the right). The children of the nodes labeled with the positive ground literals 𝜃 (𝐴𝑖) are omitted. (2) If 𝑚 > 1 and 𝑛 = 0 then the normalization of rule 𝑅 consists of a set Π𝑅 of rules 𝑅 𝐴 𝑖 , for 2 ≤ 𝑖 ≤ 𝑚, defined recursively as follows: 𝑅 𝐴 2 = 𝑞 𝐴 2 (𝑌¯ 2) ← 𝐴1, 𝐴2, 𝑅 𝐴 𝑖 … view at source ↗
Figure 6
Figure 6. Figure 6: Derivation trees for the ground atom 𝜃 (𝑝(𝑋¯)) regarding query (𝑝(𝑋¯), {𝑅}) (on the left), and query (𝑝(𝑋¯), {𝑅 𝐴 2 , 𝑅′′}) (on the right). The children of the nodes labeled with the positive ground literals 𝜃 (𝐴𝑖) are omitted. Nodes label with the negative ground literals ¬𝜃 (𝐵𝑗) have no children and do no derivation tree include the positive literal 𝜃 (𝐵𝑗) as the root label. Proof. To prove this claim, w… view at source ↗
read the original abstract

The paper analyzes and characterizes the algebraic and logical structure of the multiset semantics for SPARQL patterns involving AND, UNION, FILTER, EXCEPT, and SELECT. To do this, we align SPARQL with two well-established query languages: Datalog and Relational Algebra. Specifically, we study (i) a version of non-recursive Datalog with safe negation extended to support multisets, and (ii) a multiset relational algebra comprising projection, selection, natural join, arithmetic union, and except. We prove that these three formalisms are expressively equivalent under multiset semantics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper analyzes multiset semantics for a SPARQL fragment consisting of AND, UNION, FILTER, EXCEPT, and SELECT patterns. It introduces a non-recursive Datalog variant with safe negation extended to bags and a multiset relational algebra with projection, selection, natural join, arithmetic union, and except. The core result is a proof of expressive equivalence among these three formalisms under multiset semantics, via translations that preserve bag cardinalities.

Significance. If the translations hold, the result is significant for database theory and semantic web query processing. It supplies a precise correspondence that lets optimization and equivalence techniques transfer between SPARQL, Datalog, and relational algebra while respecting duplicate semantics. The explicit scoping to non-recursive safe-negation Datalog and the listed operators makes the claim falsifiable and practically relevant for implementation and verification tasks.

minor comments (2)
  1. The paper would benefit from an explicit table or diagram summarizing the operator correspondences and translation rules across the three formalisms.
  2. Notation for multiset cardinality and bag union could be introduced once in a preliminary section and then used consistently to reduce reader effort when following the inductive proofs.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the thorough summary and positive recommendation to accept the paper. We are pleased that the core contribution—the expressive equivalence of the SPARQL fragment, multiset Datalog with safe negation, and the specified multiset relational algebra under bag semantics—is recognized as significant for database theory and semantic web query processing.

Circularity Check

0 steps flagged

No significant circularity detected in the equivalence proof

full rationale

The paper's central claim is a direct proof of expressive equivalence among three independently defined formalisms (SPARQL fragment with AND/UNION/FILTER/EXCEPT/SELECT under multiset semantics, non-recursive safe-negation multiset Datalog, and the specified multiset relational algebra operators). The abstract and scope explicitly condition the result on these precise definitions and non-recursion, with no internal reduction of any prediction or uniqueness claim to a self-citation, fitted parameter, or definitional tautology. The derivation proceeds via standard inductive translations that preserve bag cardinalities, which is self-contained against external benchmarks and does not rely on load-bearing self-citations or ansatzes smuggled from prior work. This is the expected outcome for a well-scoped equivalence proof.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Limited information available from abstract only; relies on standard definitions of multisets and query operators.

axioms (2)
  • standard math Standard multiset semantics for relational operations (projection, selection, join, union, except)
    Invoked to extend Datalog and relational algebra to multisets.
  • domain assumption Safe negation and non-recursion in Datalog
    Required for the Datalog fragment studied.

pith-pipeline@v0.9.0 · 5389 in / 1091 out tokens · 53908 ms · 2026-05-09T19:04:48.414142+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

45 extracted references · 31 canonical work pages

  1. [1]

    1995.Foundations of Databases

    Serge Abiteboul, Richard Hull, and Victor Vianu. 1995.Foundations of Databases. Addison-Wesley

  2. [2]

    Afrati, Matthew Damigos, and Manolis Gergatsoulis

    Foto N. Afrati, Matthew Damigos, and Manolis Gergatsoulis. 2010. Query Containment Under Bag and Bag-set Semantics.Inform. Process. Lett.110, 10 (2010), 360–369. https://doi.org/10.1016/j.ipl.2010.02.017

  3. [3]

    Joseph Albert. 1991. Algebraic Properties of Bag Data Types. InProc. of the Int. Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 211–219. https://dl.acm.org/doi/10.5555/645917. 672310

  4. [4]

    Renzo Angles, Georg Gottlob, Aleksandar Pavlović, Reinhard Pichler, and Emanuel Sallinger. 2023. SparqLog: A System for Efficient Evaluation of SPARQL 1.1 Queries via Datalog.Proc. VLDB Endow.16, 13 (2023), 4240–4253. https://doi.org/10.14778/3625054.3625061

  5. [5]

    Renzo Angles and Claudio Gutiérrez. 2008. The Expressive Power of SPARQL. InProc. of the International Semantic Web Conference (ISWC) (LNCS, Vol. 5318). Springer, 114–129. https://doi.org/10.1007/978-3-540-88564-1_8

  6. [6]

    Renzo Angles and Claudio Gutiérrez. 2016. The Multiset Semantics of SPARQL Patterns. In15th International Semantic Web Conference (ISWC) (LNCS, Vol. 9981). Springer, 20–36. https://doi.org/10.1007/978-3-319-46523-4_2

  7. [7]

    Renzo Angles and Claudio Gutierrez. 2016. Negation in SPARQL. InAlberto Mendelzon Int. Workshop on Foundations of Data Management (AMW). https://ceur-ws.org/Vol-1644/paper11.pdf

  8. [8]

    Luigi Bellomarini, Emanuel Sallinger, and Georg Gottlob. 2018. The Vadalog system: datalog-based reasoning for knowledge graphs.Proc. VLDB Endow.11, 9 (2018), 975–987. https://doi.org/10.14778/3213880.3213888

  9. [9]

    Leopoldo Bertossi, Georg Gottlob, and Reinhard Pichler. 2019. Datalog: Bag Semantics via Set Semantics. InInternational Conference on Database Theory (ICDT), Vol. 127. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 16:1–16:19. https://doi.org/10.4230/LIPIcs.ICDT.2019.16

  10. [10]

    Val Breazu-Tannen and Ramesh Subrahmanyam. 1991. Logical and computational aspects of programming with sets/bags/lists. InAutomata, Languages and Programming. Springer Berlin Heidelberg, Berlin, Heidelberg, 60–75

  11. [11]

    Artem Chebotko, Shiyong Lu, and Farshad Fotouhi. 2009. Semantics preserving SPARQL-to-SQL translation.Data & Knowledge Engineering68, 10 (2009), 973–1000. https://doi.org/10.1016/j.datak.2009.04.001

  12. [12]

    Sara Cohen. 2009. Equivalence of Queries That Are Sensitive to Multiplicities.The VLDB Journal18, 3 (June 2009), 765–785. https://doi.org/10.1007/s00778-008-0122-1

  13. [13]

    Colby and Leonid Libkin

    Latha S. Colby and Leonid Libkin. 1997. Tractable iteration mechanisms for bag languages. InInternational Conferencia on Database Theory (ICDT) (LNCS, Vol. 1186). Springer Berlin Heidelberg, Berlin, Heidelberg, 461–475. https://doi.org/ 10.1007/3-540-62222-5_64 38 Angles et al

  14. [14]

    Marco Console, Paolo Guagliardo, and Leonid Libkin. 2022. Fragments of bag relational algebra: Expressiveness and certain answers.Information Systems(2022). https://doi.org/10.1016/j.is.2020.101604

  15. [15]

    2005.A relational algebra for SPARQL

    Richard Cyganiak. 2005.A relational algebra for SPARQL. Technical Report HPL-2005-170. HP Labs

  16. [16]

    C. J. Date. 2006.Date on Database: Writings 2000-2006. APress

  17. [17]

    Umeshwar Dayal, Nathan Goodman, and Randy H. Katz. 1982. An Extended Relational Algebra with Control over Duplicate Elimination. InProc. of the Symposium on Principles of Database Systems (PODS). ACM, 117–123. https://doi.org/10.1145/588111.588132

  18. [18]

    Floris Geerts, Thomas Unger, Grigoris Karvounarakis, Irini Fundulaki, and Vassilis Christophides. 2016. Algebraic Structures for Capturing the Provenance of SPARQL Queries.J. ACM63, 1 (2016), 63 pages. https://doi.org/10.1145/ 2810037

  19. [19]

    Todd J. Green. 2009. Bag Semantics. InEncyclopedia of Database Systems. 201–206

  20. [20]

    Stéphane Grumbach, Leonid Libkin, Tova Milo, and Limsoon Wong. 1996. Query languages for bags: expressive power and complexity.SIGACT News27, 2 (1996), 30–44. https://doi.org/10.1145/235767.235770

  21. [21]

    Stéphane Grumbach and Tova Milo. 1996. Towards Tractable Algebras for Bags.J. Comput. System Sci.52, 3 (1996), 570–588. https://doi.org/10.1006/jcss.1996.0042

  22. [22]

    Paolo Guagliardo and Leonid Libkin. 2017. A Formal Semantics of SQL Queries, Its Validation, and Applications.Proc. VLDB Endow.11, 1 (2017), 27–39. https://doi.org/10.14778/3151113.3151116

  23. [23]

    Steve Harris and Andy Seaborne. 2013. SPARQL 1.1 Query Language - W3C Recommendation. http://www.w3.org/TR/2013/REC-sparql11-query-20130321/

  24. [24]

    12 Jean-Baptiste Courtois and Sylvain Schmitz

    A. Hernich and P. G. Kolaitis. 2017. Foundations of information integration under bag semantics. In32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). 1–12. https://doi.org/10.1109/LICS.2017.8005104

  25. [25]

    2020.The Problem of Incomplete Data in SPARQL

    Daniel Hernández. 2020.The Problem of Incomplete Data in SPARQL. Ph.D. dissertation. Universidad de Chile - Faculty of Physical and Mathematical Sciences, Santiago, Chile. https://repositorio.uchile.cl/handle/2250/178033

  26. [26]

    Aidan Hogan, Marcelo Arenas, Alejandro Mallea, and Axel Polleres. 2014. Everything You Always Wanted to Know About Blank Nodes.Journal of Web Semantics27, 1 (2014). https://doi.org/10.1016/j.websem.2014.06.004

  27. [27]

    Kostylev, and Bernardo Cuenca Grau

    Mark Kaminski, Egor V. Kostylev, and Bernardo Cuenca Grau. 2016. Semantics and Expressive Power of Subqueries and Aggregates in SPARQL 1.1. InProc. of the International Conference on World Wide Web. 227–238

  28. [28]

    Kostylev, and Bernardo Cuenca Grau

    Mark Kaminski, Egor V. Kostylev, and Bernardo Cuenca Grau. 2016. Semantics and Expressive Power of Subqueries and Aggregates in SPARQL 1.1.. InProceedings of the Int. Conference on World Wide Web (WWW). ACM, 227–238. https://doi.org/10.1145/2872427.2883022

  29. [29]

    Aviel Klausner and Nathan Goodman. 1985. Multirelations - Semantics and languages. InProc. of Int. Conference on Very Large Data Bases (VLDB). VLDB Endowment, 251–258. https://dl.acm.org/doi/10.5555/1286760.1286783

  30. [30]

    Kostylev

    Roman Kontchakov and Egor V. Kostylev. 2016. On Expressibility of Non-Monotone Operators in SPARQL. InInt. Conference on the Principles of Knowledge Representation and Reasoning. AAAI Press, 369–378. https://dl.acm.org/doi/ 10.5555/3032027.3032071

  31. [31]

    Lamperti, M

    G. Lamperti, M. Melchiori, and M. Zanella. 2001. On Multisets in Database Systems. InProceedings of the Workshop on Multiset Processing. 147–216. https://dl.acm.org/doi/10.5555/647269.721839

  32. [32]

    Leonid Libkin and Limsoon Wong. 1994. Some Properties of Query Languages for Bags. InProc. of the Int. Workshop on Database Programming Languages (DBPL) - Object Models and Languages. 97–114. https://doi.org/10.1007/978-1- 4471-3564-7_7

  33. [33]

    Leonid Libkin and Limsoon Wong. 1997. Query languages for bags and aggregate functions.J. Comput. System Sci.55, 2 (1997), 241–272. https://doi.org/10.1006/jcss.1997.1523

  34. [34]

    J. W. Lloyd. 1998.Programming with multisets. Technical Report. University of Bristol

  35. [35]

    Melton and A

    J. Melton and A. R. Simon. 2002.SQL:1999. Understanding Relational Language Components. Morgan Kaufmann Publ

  36. [36]

    I. S. Mumick, S. J. Finkelstein, Hamid Pirahesh, and Raghu Ramakrishnan. 1990. Magic is Relevant.SIGMOD Rec.19, 2 (1990), 247–258. https://doi.org/10.1145/93605.98734

  37. [37]

    Inderpal Singh Mumick, Hamid Pirahesh, and Raghu Ramakrishnan. 1990. The Magic of Duplicates and Aggregates. InProc. of the International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 264–277. https://dl.acm.org/doi/10.5555/94362.94426

  38. [38]

    2006.Semantics of SPARQL

    Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2006.Semantics of SPARQL. Technical Report TR/DCC-2006-17. Department of Computer Science, University of Chile

  39. [39]

    Polleres

    A. Polleres. 2007. From SPARQL to Rules (and back). InProceedings of the 16th Int. World Wide Web Conference (WWW). ACM, 787–796. https://doi.org/10.1145/1242572.1242679

  40. [40]

    Axel Polleres and Johannes Peter Wallner. 2013. On the relation between SPARQL1.1 and Answer Set Programming. Journal of Applied Non-Classical Logics23, 1-2 (2013), 159–212. https://doi.org/10.1080/11663081.2013.798992

  41. [41]

    Eric Prud’hommeaux and Andy Seaborne. 2008. SPARQL Query Language for RDF. W3C Recommendation. http://www.w3.org/TR/2008/REC-115-sparql-query-20080115/. The multiset semantics of SPARQL patterns 39

  42. [42]

    Wilmer Ricciotti and James Cheney. 2019. Mixing Set and Bag Semantics. InProc. 17th ACM SIGPLAN International Symposium on Database Programming Languages (DBPL)(Phoenix, AZ, USA). ACM, New York, NY, USA, 70–73. https://doi.org/10.1145/3315507.3330202

  43. [43]

    Simon Schenk. 2007. A SPARQL Semantics Based on Datalog. InAnnual German Conference on Advances in Artificial Intelligence, Vol. 4667. 160–174. https://doi.org/10.1007/978-3-540-74565-5_14

  44. [44]

    Michael Schmidt, Michael Meier, and Georg Lausen. 2010. Foundations of SPARQL query optimization. InProc. of the Int. Conference on Database Theory. ACM, 4–33. https://doi.org/10.1145/1804669.1804675

  45. [45]

    Xiaowang Zhang and Jan Van den Bussche. 2014. On the primitivity of operators in SPARQL.Inf. Process. Lett.114, 9 (2014), 480–485. https://doi.org/10.1016/j.ipl.2014.03.014 A Variable renaming in SPARQL This appendix section defines function subs(·,·), which renames SPARQL variables. This function is used to simulate the MRA operator renaming 𝜌𝐴/𝐵 (see Ta...