arxiv: 2605.00417 · v1 · submitted 2026-05-01 · 💻 cs.DB · cs.LO

Recognition: unknown

Multiset semantics in SPARQL, Relational Algebra and Datalog

Renzo Angles , Claudio Gutierrez , Daniel Hern\'andez

Authors on Pith no claims yet

Pith reviewed 2026-05-09 19:04 UTC · model grok-4.3

classification 💻 cs.DB cs.LO

keywords multiset semanticsSPARQLDatalogrelational algebraexpressive equivalencequery languagesRDF data

0 comments

The pith

SPARQL patterns using AND, UNION, FILTER, EXCEPT and SELECT are expressively equivalent to multiset Datalog and relational algebra.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that SPARQL query patterns built from AND, UNION, FILTER, EXCEPT and SELECT produce the same multisets of results as non-recursive Datalog programs with safe negation and as expressions in a multiset relational algebra with projection, selection, natural join, arithmetic union and except. This matters for a sympathetic reader because it lets database theory developed for Datalog and relational algebra transfer directly to SPARQL, which is the standard language for querying RDF data. The work therefore supplies a precise algebraic and logical account of how bag semantics behaves in the SPARQL fragment that corresponds to these operators. If the equivalence holds, any optimization or rewriting technique valid in one formalism is valid in the others.

Core claim

We prove that these three formalisms are expressively equivalent under multiset semantics. The proof proceeds by exhibiting translations between SPARQL patterns, multiset Datalog programs and multiset relational algebra expressions that preserve the multiplicity of every result tuple for the listed operators.

What carries the argument

The explicit translations that map SPARQL patterns to multiset Datalog rules and to multiset relational algebra expressions while preserving tuple multiplicities.

If this is right

Any SPARQL query in the fragment can be rewritten into an equivalent multiset relational algebra expression.
Optimizations and equivalences known for multiset relational algebra apply unchanged to the corresponding SPARQL patterns.
Non-recursive Datalog with safe negation can serve as an intermediate representation for evaluating these SPARQL queries.
The algebraic structure of multiset SPARQL is fully characterized by the operators of the aligned Datalog and relational algebra.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The equivalence supplies a route for implementing SPARQL engines on top of existing Datalog or relational algebra engines that already handle multisets.
Similar translations might be attempted for additional SPARQL operators such as OPTIONAL or property paths if suitable multiset extensions are defined.
The result clarifies why certain duplicate-handling behaviors in SPARQL match the bag semantics already present in SQL.

Load-bearing premise

The claimed equivalence is limited to the listed SPARQL operators, non-recursive Datalog with safe negation, and the particular multiset operations defined for relational algebra in the paper.

What would settle it

A SPARQL pattern using only AND, UNION, FILTER, EXCEPT and SELECT whose result multiset cannot be produced by any non-recursive safe-negation Datalog program or by any expression in the paper's multiset relational algebra.

Figures

Figures reproduced from arXiv: 2605.00417 by Claudio Gutierrez, Daniel Hern\'andez, Renzo Angles.

**Figure 1.** Figure 1: Transitivity of language containment. The figure represents three languages view at source ↗

**Figure 2.** Figure 2: The triangle of simulations among SPARQL, Non-Recursive Multiset Datalog with Safe Negation view at source ↗

**Figure 3.** Figure 3: Example of derivation trees. Let 𝐷 be a NRMD¬ database, 𝐹 = 𝑟(𝑎) be a fact in 𝐷 with card(𝐹, 𝐷) = 2, Π = {𝑅, 𝑆} be a NRMD¬ program where 𝑅 is the rule 𝑞(𝑋) ← 𝑟(𝑋), 𝑝(𝑋) and 𝑆 is the rule 𝑝(𝑋) ← 𝑟(𝑋). This figure shows the derivation trees of Π with respect to 𝐷. A variable 𝑋 occurs positively in a rule 𝑅 if and only if 𝑋 occurs in a positive literal in the body of 𝑅. A rule 𝑅 is said to be safe if all its … view at source ↗

**Figure 4.** Figure 4: Parse tree of the normalized SPARQL graph pattern presented in Example view at source ↗

**Figure 5.** Figure 5: Derivation trees for the ground literal 𝜃 (𝑝(𝑋¯)) regarding query (𝑝(𝑋¯), {𝑅}) (on the left), and query (𝑝(𝑋¯), {𝑅 𝐴 2 , 𝑅′′}) (on the right). The children of the nodes labeled with the positive ground literals 𝜃 (𝐴𝑖) are omitted. (2) If 𝑚 > 1 and 𝑛 = 0 then the normalization of rule 𝑅 consists of a set Π𝑅 of rules 𝑅 𝐴 𝑖 , for 2 ≤ 𝑖 ≤ 𝑚, defined recursively as follows: 𝑅 𝐴 2 = 𝑞 𝐴 2 (𝑌¯ 2) ← 𝐴1, 𝐴2, 𝑅 𝐴 𝑖 … view at source ↗

**Figure 6.** Figure 6: Derivation trees for the ground atom 𝜃 (𝑝(𝑋¯)) regarding query (𝑝(𝑋¯), {𝑅}) (on the left), and query (𝑝(𝑋¯), {𝑅 𝐴 2 , 𝑅′′}) (on the right). The children of the nodes labeled with the positive ground literals 𝜃 (𝐴𝑖) are omitted. Nodes label with the negative ground literals ¬𝜃 (𝐵𝑗) have no children and do no derivation tree include the positive literal 𝜃 (𝐵𝑗) as the root label. Proof. To prove this claim, w… view at source ↗

read the original abstract

The paper analyzes and characterizes the algebraic and logical structure of the multiset semantics for SPARQL patterns involving AND, UNION, FILTER, EXCEPT, and SELECT. To do this, we align SPARQL with two well-established query languages: Datalog and Relational Algebra. Specifically, we study (i) a version of non-recursive Datalog with safe negation extended to support multisets, and (ii) a multiset relational algebra comprising projection, selection, natural join, arithmetic union, and except. We prove that these three formalisms are expressively equivalent under multiset semantics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper proves expressive equivalence under multiset semantics for a SPARQL fragment with AND/UNION/FILTER/EXCEPT/SELECT, non-recursive safe-negation Datalog, and a matching multiset relational algebra.

read the letter

The main point is a clean proof that these three formalisms match exactly when you count multiplicities instead of treating results as sets. They define multiset versions of each language, then give translations both ways that keep bag cardinalities intact for the listed operators. This extends the older set-semantics results in a way that lines up better with how actual engines return duplicates. The work is useful because it lets you move queries or optimizations across SPARQL, Datalog, and relational algebra without losing multiplicity information. The proofs appear to be inductive and scoped tightly to non-recursion and the five SPARQL operators, which avoids overclaiming. The multiset relational algebra uses natural join, arithmetic union, and except in ways that support the equivalence, and the Datalog side adds safe negation to handle EXCEPT. That choice is reasonable and keeps everything decidable. The main limitation is the narrow fragment. Full SPARQL includes OPTIONAL, property paths, and other constructs, while recursive Datalog is common in practice, so the result does not apply there. Readers will need to verify that the paper's specific bag definitions match the ones they care about. The literature citations look standard and build on prior set-based equivalences without obvious gaps. This paper is for database theorists and semantic-web researchers who work on query language foundations or cross-system translations. A reader already familiar with the set versions will pick up the multiset adjustments quickly. It deserves a serious referee because the claim is precise, the scope is stated up front, and the approach is standard for this area. I would send it to peer review.

Referee Report

0 major / 2 minor

Summary. The paper analyzes multiset semantics for a SPARQL fragment consisting of AND, UNION, FILTER, EXCEPT, and SELECT patterns. It introduces a non-recursive Datalog variant with safe negation extended to bags and a multiset relational algebra with projection, selection, natural join, arithmetic union, and except. The core result is a proof of expressive equivalence among these three formalisms under multiset semantics, via translations that preserve bag cardinalities.

Significance. If the translations hold, the result is significant for database theory and semantic web query processing. It supplies a precise correspondence that lets optimization and equivalence techniques transfer between SPARQL, Datalog, and relational algebra while respecting duplicate semantics. The explicit scoping to non-recursive safe-negation Datalog and the listed operators makes the claim falsifiable and practically relevant for implementation and verification tasks.

minor comments (2)

The paper would benefit from an explicit table or diagram summarizing the operator correspondences and translation rules across the three formalisms.
Notation for multiset cardinality and bag union could be introduced once in a preliminary section and then used consistently to reduce reader effort when following the inductive proofs.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the thorough summary and positive recommendation to accept the paper. We are pleased that the core contribution—the expressive equivalence of the SPARQL fragment, multiset Datalog with safe negation, and the specified multiset relational algebra under bag semantics—is recognized as significant for database theory and semantic web query processing.

Circularity Check

0 steps flagged

No significant circularity detected in the equivalence proof

full rationale

The paper's central claim is a direct proof of expressive equivalence among three independently defined formalisms (SPARQL fragment with AND/UNION/FILTER/EXCEPT/SELECT under multiset semantics, non-recursive safe-negation multiset Datalog, and the specified multiset relational algebra operators). The abstract and scope explicitly condition the result on these precise definitions and non-recursion, with no internal reduction of any prediction or uniqueness claim to a self-citation, fitted parameter, or definitional tautology. The derivation proceeds via standard inductive translations that preserve bag cardinalities, which is self-contained against external benchmarks and does not rely on load-bearing self-citations or ansatzes smuggled from prior work. This is the expected outcome for a well-scoped equivalence proof.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Limited information available from abstract only; relies on standard definitions of multisets and query operators.

axioms (2)

standard math Standard multiset semantics for relational operations (projection, selection, join, union, except)
Invoked to extend Datalog and relational algebra to multisets.
domain assumption Safe negation and non-recursion in Datalog
Required for the Datalog fragment studied.

pith-pipeline@v0.9.0 · 5389 in / 1091 out tokens · 53908 ms · 2026-05-09T19:04:48.414142+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 31 canonical work pages

[1]

1995.Foundations of Databases

Serge Abiteboul, Richard Hull, and Victor Vianu. 1995.Foundations of Databases. Addison-Wesley

1995
[2]

Afrati, Matthew Damigos, and Manolis Gergatsoulis

Foto N. Afrati, Matthew Damigos, and Manolis Gergatsoulis. 2010. Query Containment Under Bag and Bag-set Semantics.Inform. Process. Lett.110, 10 (2010), 360–369. https://doi.org/10.1016/j.ipl.2010.02.017

work page doi:10.1016/j.ipl.2010.02.017 2010
[3]

Joseph Albert. 1991. Algebraic Properties of Bag Data Types. InProc. of the Int. Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 211–219. https://dl.acm.org/doi/10.5555/645917. 672310

work page doi:10.5555/645917 1991
[4]

Renzo Angles, Georg Gottlob, Aleksandar Pavlović, Reinhard Pichler, and Emanuel Sallinger. 2023. SparqLog: A System for Efficient Evaluation of SPARQL 1.1 Queries via Datalog.Proc. VLDB Endow.16, 13 (2023), 4240–4253. https://doi.org/10.14778/3625054.3625061

work page doi:10.14778/3625054.3625061 2023
[5]

Renzo Angles and Claudio Gutiérrez. 2008. The Expressive Power of SPARQL. InProc. of the International Semantic Web Conference (ISWC) (LNCS, Vol. 5318). Springer, 114–129. https://doi.org/10.1007/978-3-540-88564-1_8

work page doi:10.1007/978-3-540-88564-1_8 2008
[6]

Renzo Angles and Claudio Gutiérrez. 2016. The Multiset Semantics of SPARQL Patterns. In15th International Semantic Web Conference (ISWC) (LNCS, Vol. 9981). Springer, 20–36. https://doi.org/10.1007/978-3-319-46523-4_2

work page doi:10.1007/978-3-319-46523-4_2 2016
[7]

Renzo Angles and Claudio Gutierrez. 2016. Negation in SPARQL. InAlberto Mendelzon Int. Workshop on Foundations of Data Management (AMW). https://ceur-ws.org/Vol-1644/paper11.pdf

2016
[8]

Luigi Bellomarini, Emanuel Sallinger, and Georg Gottlob. 2018. The Vadalog system: datalog-based reasoning for knowledge graphs.Proc. VLDB Endow.11, 9 (2018), 975–987. https://doi.org/10.14778/3213880.3213888

work page doi:10.14778/3213880.3213888 2018
[9]

Leopoldo Bertossi, Georg Gottlob, and Reinhard Pichler. 2019. Datalog: Bag Semantics via Set Semantics. InInternational Conference on Database Theory (ICDT), Vol. 127. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 16:1–16:19. https://doi.org/10.4230/LIPIcs.ICDT.2019.16

work page doi:10.4230/lipics.icdt.2019.16 2019
[10]

Val Breazu-Tannen and Ramesh Subrahmanyam. 1991. Logical and computational aspects of programming with sets/bags/lists. InAutomata, Languages and Programming. Springer Berlin Heidelberg, Berlin, Heidelberg, 60–75

1991
[11]

Artem Chebotko, Shiyong Lu, and Farshad Fotouhi. 2009. Semantics preserving SPARQL-to-SQL translation.Data & Knowledge Engineering68, 10 (2009), 973–1000. https://doi.org/10.1016/j.datak.2009.04.001

work page doi:10.1016/j.datak.2009.04.001 2009
[12]

Sara Cohen. 2009. Equivalence of Queries That Are Sensitive to Multiplicities.The VLDB Journal18, 3 (June 2009), 765–785. https://doi.org/10.1007/s00778-008-0122-1

work page doi:10.1007/s00778-008-0122-1 2009
[13]

Colby and Leonid Libkin

Latha S. Colby and Leonid Libkin. 1997. Tractable iteration mechanisms for bag languages. InInternational Conferencia on Database Theory (ICDT) (LNCS, Vol. 1186). Springer Berlin Heidelberg, Berlin, Heidelberg, 461–475. https://doi.org/ 10.1007/3-540-62222-5_64 38 Angles et al

work page doi:10.1007/3-540-62222-5_64 1997
[14]

Marco Console, Paolo Guagliardo, and Leonid Libkin. 2022. Fragments of bag relational algebra: Expressiveness and certain answers.Information Systems(2022). https://doi.org/10.1016/j.is.2020.101604

work page doi:10.1016/j.is.2020.101604 2022
[15]

2005.A relational algebra for SPARQL

Richard Cyganiak. 2005.A relational algebra for SPARQL. Technical Report HPL-2005-170. HP Labs

2005
[16]

C. J. Date. 2006.Date on Database: Writings 2000-2006. APress

2006
[17]

Umeshwar Dayal, Nathan Goodman, and Randy H. Katz. 1982. An Extended Relational Algebra with Control over Duplicate Elimination. InProc. of the Symposium on Principles of Database Systems (PODS). ACM, 117–123. https://doi.org/10.1145/588111.588132

work page doi:10.1145/588111.588132 1982
[18]

Floris Geerts, Thomas Unger, Grigoris Karvounarakis, Irini Fundulaki, and Vassilis Christophides. 2016. Algebraic Structures for Capturing the Provenance of SPARQL Queries.J. ACM63, 1 (2016), 63 pages. https://doi.org/10.1145/ 2810037

2016
[19]

Todd J. Green. 2009. Bag Semantics. InEncyclopedia of Database Systems. 201–206

2009
[20]

Stéphane Grumbach, Leonid Libkin, Tova Milo, and Limsoon Wong. 1996. Query languages for bags: expressive power and complexity.SIGACT News27, 2 (1996), 30–44. https://doi.org/10.1145/235767.235770

work page doi:10.1145/235767.235770 1996
[21]

Stéphane Grumbach and Tova Milo. 1996. Towards Tractable Algebras for Bags.J. Comput. System Sci.52, 3 (1996), 570–588. https://doi.org/10.1006/jcss.1996.0042

work page doi:10.1006/jcss.1996.0042 1996
[22]

Paolo Guagliardo and Leonid Libkin. 2017. A Formal Semantics of SQL Queries, Its Validation, and Applications.Proc. VLDB Endow.11, 1 (2017), 27–39. https://doi.org/10.14778/3151113.3151116

work page doi:10.14778/3151113.3151116 2017
[23]

Steve Harris and Andy Seaborne. 2013. SPARQL 1.1 Query Language - W3C Recommendation. http://www.w3.org/TR/2013/REC-sparql11-query-20130321/

2013
[24]

12 Jean-Baptiste Courtois and Sylvain Schmitz

A. Hernich and P. G. Kolaitis. 2017. Foundations of information integration under bag semantics. In32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). 1–12. https://doi.org/10.1109/LICS.2017.8005104

work page doi:10.1109/lics.2017.8005104 2017
[25]

2020.The Problem of Incomplete Data in SPARQL

Daniel Hernández. 2020.The Problem of Incomplete Data in SPARQL. Ph.D. dissertation. Universidad de Chile - Faculty of Physical and Mathematical Sciences, Santiago, Chile. https://repositorio.uchile.cl/handle/2250/178033

2020
[26]

Aidan Hogan, Marcelo Arenas, Alejandro Mallea, and Axel Polleres. 2014. Everything You Always Wanted to Know About Blank Nodes.Journal of Web Semantics27, 1 (2014). https://doi.org/10.1016/j.websem.2014.06.004

work page doi:10.1016/j.websem.2014.06.004 2014
[27]

Kostylev, and Bernardo Cuenca Grau

Mark Kaminski, Egor V. Kostylev, and Bernardo Cuenca Grau. 2016. Semantics and Expressive Power of Subqueries and Aggregates in SPARQL 1.1. InProc. of the International Conference on World Wide Web. 227–238

2016
[28]

Kostylev, and Bernardo Cuenca Grau

Mark Kaminski, Egor V. Kostylev, and Bernardo Cuenca Grau. 2016. Semantics and Expressive Power of Subqueries and Aggregates in SPARQL 1.1.. InProceedings of the Int. Conference on World Wide Web (WWW). ACM, 227–238. https://doi.org/10.1145/2872427.2883022

work page doi:10.1145/2872427.2883022 2016
[29]

Aviel Klausner and Nathan Goodman. 1985. Multirelations - Semantics and languages. InProc. of Int. Conference on Very Large Data Bases (VLDB). VLDB Endowment, 251–258. https://dl.acm.org/doi/10.5555/1286760.1286783

work page doi:10.5555/1286760.1286783 1985
[30]

Kostylev

Roman Kontchakov and Egor V. Kostylev. 2016. On Expressibility of Non-Monotone Operators in SPARQL. InInt. Conference on the Principles of Knowledge Representation and Reasoning. AAAI Press, 369–378. https://dl.acm.org/doi/ 10.5555/3032027.3032071

work page doi:10.5555/3032027.3032071 2016
[31]

Lamperti, M

G. Lamperti, M. Melchiori, and M. Zanella. 2001. On Multisets in Database Systems. InProceedings of the Workshop on Multiset Processing. 147–216. https://dl.acm.org/doi/10.5555/647269.721839

work page doi:10.5555/647269.721839 2001
[32]

Leonid Libkin and Limsoon Wong. 1994. Some Properties of Query Languages for Bags. InProc. of the Int. Workshop on Database Programming Languages (DBPL) - Object Models and Languages. 97–114. https://doi.org/10.1007/978-1- 4471-3564-7_7

work page doi:10.1007/978-1- 1994
[33]

Leonid Libkin and Limsoon Wong. 1997. Query languages for bags and aggregate functions.J. Comput. System Sci.55, 2 (1997), 241–272. https://doi.org/10.1006/jcss.1997.1523

work page doi:10.1006/jcss.1997.1523 1997
[34]

J. W. Lloyd. 1998.Programming with multisets. Technical Report. University of Bristol

1998
[35]

Melton and A

J. Melton and A. R. Simon. 2002.SQL:1999. Understanding Relational Language Components. Morgan Kaufmann Publ

2002
[36]

I. S. Mumick, S. J. Finkelstein, Hamid Pirahesh, and Raghu Ramakrishnan. 1990. Magic is Relevant.SIGMOD Rec.19, 2 (1990), 247–258. https://doi.org/10.1145/93605.98734

work page doi:10.1145/93605.98734 1990
[37]

Inderpal Singh Mumick, Hamid Pirahesh, and Raghu Ramakrishnan. 1990. The Magic of Duplicates and Aggregates. InProc. of the International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 264–277. https://dl.acm.org/doi/10.5555/94362.94426

work page doi:10.5555/94362.94426 1990
[38]

2006.Semantics of SPARQL

Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. 2006.Semantics of SPARQL. Technical Report TR/DCC-2006-17. Department of Computer Science, University of Chile

2006
[39]

Polleres

A. Polleres. 2007. From SPARQL to Rules (and back). InProceedings of the 16th Int. World Wide Web Conference (WWW). ACM, 787–796. https://doi.org/10.1145/1242572.1242679

work page doi:10.1145/1242572.1242679 2007
[40]

Axel Polleres and Johannes Peter Wallner. 2013. On the relation between SPARQL1.1 and Answer Set Programming. Journal of Applied Non-Classical Logics23, 1-2 (2013), 159–212. https://doi.org/10.1080/11663081.2013.798992

work page doi:10.1080/11663081.2013.798992 2013
[41]

Eric Prud’hommeaux and Andy Seaborne. 2008. SPARQL Query Language for RDF. W3C Recommendation. http://www.w3.org/TR/2008/REC-115-sparql-query-20080115/. The multiset semantics of SPARQL patterns 39

2008
[42]

Wilmer Ricciotti and James Cheney. 2019. Mixing Set and Bag Semantics. InProc. 17th ACM SIGPLAN International Symposium on Database Programming Languages (DBPL)(Phoenix, AZ, USA). ACM, New York, NY, USA, 70–73. https://doi.org/10.1145/3315507.3330202

work page doi:10.1145/3315507.3330202 2019
[43]

Simon Schenk. 2007. A SPARQL Semantics Based on Datalog. InAnnual German Conference on Advances in Artificial Intelligence, Vol. 4667. 160–174. https://doi.org/10.1007/978-3-540-74565-5_14

work page doi:10.1007/978-3-540-74565-5_14 2007
[44]

Michael Schmidt, Michael Meier, and Georg Lausen. 2010. Foundations of SPARQL query optimization. InProc. of the Int. Conference on Database Theory. ACM, 4–33. https://doi.org/10.1145/1804669.1804675

work page doi:10.1145/1804669.1804675 2010
[45]

Xiaowang Zhang and Jan Van den Bussche. 2014. On the primitivity of operators in SPARQL.Inf. Process. Lett.114, 9 (2014), 480–485. https://doi.org/10.1016/j.ipl.2014.03.014 A Variable renaming in SPARQL This appendix section defines function subs(·,·), which renames SPARQL variables. This function is used to simulate the MRA operator renaming 𝜌𝐴/𝐵 (see Ta...

work page doi:10.1016/j.ipl.2014.03.014 2014