pith. machine review for the scientific record. sign in

arxiv: 2604.22422 · v1 · submitted 2026-04-24 · 💻 cs.DB · cs.AI

Recognition: unknown

How Hard is it to Decide if a Fact is Relevant to a Query?

Authors on Pith no claims yet

Pith reviewed 2026-05-08 09:15 UTC · model grok-4.3

classification 💻 cs.DB cs.AI
keywords conjunctive queriesquery relevanceself-joinshypertreewidthDL-Lite ontologiescombined complexityminimal subsets
0
0 comments X

The pith

Relevance of a database fact to a conjunctive query matches the complexity of query evaluation once self-joins are forbidden or bounded.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the problem of deciding whether a given fact in a database belongs to a minimal subset that satisfies a Boolean conjunctive query. It identifies self-joins, meaning multiple atoms using the same relation symbol, as the feature responsible for making relevance strictly harder than ordinary query evaluation. When the number of self-joins is zero or bounded, the decision problem falls into exactly the same complexity classes as query evaluation: NP-complete in the general case and LogCFL-complete for queries of bounded hypertreewidth. An analogous result holds for ontology-mediated queries over DL-Lite_R ontologies when a generalized interaction-width measure is bounded.

Core claim

The central claim is that self-joins are the sole source of the extra hardness: relevance is NP-complete for arbitrary conjunctive queries and LogCFL-complete for bounded-hypertreewidth queries, but drops to the same classes as query evaluation itself as soon as self-joins are forbidden or limited in number. The same collapse occurs for ontology-mediated queries once interaction width is bounded.

What carries the argument

Self-join width (the maximum number of times any single relation appears in the query body), which the authors prove is the parameter that separates the complexity of relevance checking from that of query evaluation.

If this is right

  • Relevance checking can be solved by the same algorithms used for query evaluation on any class of queries that forbids or bounds self-joins.
  • For queries of bounded hypertreewidth without self-joins, relevance belongs to LogCFL and therefore admits highly parallel evaluation.
  • In the ontology setting, bounding interaction width makes relevance no harder than standard ontology-mediated query answering over DL-Lite_R.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Query-rewriting or reformulation tools could usefully rewrite queries to eliminate self-joins in order to obtain efficient relevance explanations.
  • The same width-based restrictions may simplify other explanation tasks such as why-not queries or provenance computation.
  • The results suggest that interaction-width measures could be useful for classifying the complexity of explanation problems beyond DL-Lite_R.

Load-bearing premise

The analysis uses the standard definition of relevance as membership in some minimal satisfying subset of the database and studies combined complexity.

What would settle it

A concrete counter-example would be an infinite family of self-join-free conjunctive queries for which relevance cannot be decided in nondeterministic polynomial time.

Figures

Figures reproduced from arXiv: 2604.22422 by Diego Figueira, Meghyn Bienvenu, Pierre Lafourcade.

Figure 1
Figure 1. Figure 1: An example query and database to illustrate the notion of view at source ↗
Figure 2
Figure 2. Figure 2: Interacting atoms where T := {R ′ ⊑ R, R′ ⊑ R −}. Definition 5.6 (Interacting query atoms). Given an OMQ Q = (T , q) ∈ (DL-LiteR, CQ), we say that distinct atoms α, β of q interact if there exists a fact f such that({f}, T ) |= α and ({f}, T ) |= β (with α and β treated as Boolean CQs). An atom α interacts with itself if there exist a fact f, homo￾morphisms h1 : α hom −−→ I{f},T and h2 : α hom −−→ I{f},T ,… view at source ↗
Figure 3
Figure 3. Figure 3: Database DG for Theorem 4.2. Arrows represent r facts. A Appendix to Section 3 “Query Relevance Problem” Proposition 3.7. For any class Q of (ontology-mediated) CQs, the query evaluation problem for Q is in L C , where C is the complexity class of the relevance problem for Q. Proof. It suffices to iterate, in logspace, over all facts in the database and test whether the considered fact is relevant. Query e… view at source ↗
Figure 4
Figure 4. Figure 4: Visual support for proof of Lemma 4.9. Proof. 1 ⇒ 2 If S ⊆ D is a minimal support for q, let h : q hom −−→ D be a homomorphism realizing it, i.e., such that h(q) = S. Consider the equivalence relation E ∈ Eq induced by h (i.e., where (t, t′ ) ∈ E iff h(t) = h(t ′ ) for all t, t′ ∈ Mq ∪ const(q)). Hence, hˆ : q ̸= E hom −−→ S via the relativization hˆ of h onto the variables of q ̸= E . Observe that hˆ : q … view at source ↗
Figure 5
Figure 5. Figure 5: Construction for Lemma D.1. The loops labelled Cp are simple cycles of length n, and the arrows labelled Ppn+2 are simple paths of length 7. The two anchor vertices are uc and ud. containing arity 2 relations. Then there exist two digraphs Gq, GD = (Vq, Eq),(VD, ED) such that: 1. there exists a bijective mapping η from the homomorphic images of q in D to the Gq-homomorphic images on GD that preserves the i… view at source ↗
Figure 6
Figure 6. Figure 6: Visual support for proof of Claim D.4 Proof. From left to right, we have that (u ′ , v′ ) belongs to a minimal G-homomorphic image h(G) via the digraph ho￾momorphism h : G hom −−→ G′ . In particular, h(u, v) = (u ′ , v′ ) for some (u, v) ∈ E(G) – cf view at source ↗
read the original abstract

We consider the following fundamental problem: given a database D, Boolean conjunctive query (CQ) q, and fact f in D, decide whether f is relevant to q wrt. D, i.e., does f belong to a minimal subset S of D such that S |= q. Despite being of central importance to query answer explanation, the combined complexity of deciding query relevance has not been studied in detail, leaving open what makes this problem hard, and which restrictions can yield lower complexity. Relevance has already been shown to be harder than query evaluation: namely, $\Sigma^p_2$-complete for CQs, even over a binary signature. We further observe that NP-hardness applies already to (acyclic) chain CQs. Our work identifies self-joins (multiple atoms with the same relation) as the culprit. Indeed, we prove that if we forbid or bound the occurrence of self-joins, then relevance has the same complexity as query evaluation, namely, NP (without structural restrictions) and LogCFL (for bounded hypertreewidth classes). In the ontology setting, we establish an analogous result for ontology-mediated queries consisting of a CQ and DL-Lite_R ontology, namely that relevance is no harder than query answering provided that we bound the interaction width (which generalizes both self-join width and a recently introduced 'interaction-free' condition). Our results thus pinpoint what makes relevance harder than query evaluation and identify natural classes of queries which admit efficient relevance computation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper studies the combined complexity of deciding relevance of a fact f to a Boolean conjunctive query q over database D, i.e., whether f belongs to some minimal subset S ⊆ D with S |= q. It shows Σ₂ᵖ-completeness in general (even for binary signatures) and NP-hardness already for acyclic chain queries. The main result is that forbidding or bounding self-joins makes relevance equivalent in complexity to plain CQ evaluation: NP-complete without structural restrictions, and in LogCFL for bounded hypertreewidth. An analogous result holds for DL-Lite_R ontology-mediated queries when interaction width (generalizing self-join width) is bounded.

Significance. If the results hold, the work precisely identifies self-joins as the source of extra hardness separating relevance from query evaluation, while identifying natural, practical classes (bounded self-joins, bounded interaction width) where relevance inherits the same efficient algorithms and complexity as evaluation. This has direct value for query explanation, debugging, and provenance systems. The paper earns credit for cleanly separating the self-join-free case (where a homomorphism certificate works directly) from the bounded case and for extending the analysis to ontology-mediated queries.

major comments (3)
  1. [Section 4] Section 4 (NP membership for bounded self-join width): The argument that relevance remains in NP when self-joins are bounded by a constant relies on a reduction to CQ evaluation, but it is unclear how minimality of the support is enforced. A homomorphism placing f in its image may produce a non-minimal support when multiple atoms over the same relation can share or not share facts; the certificate must therefore guarantee existence of some minimal support containing f, not merely a satisfying one. This step is load-bearing for the central claim that relevance matches query-evaluation complexity.
  2. [Section 5] Section 5 (LogCFL for bounded hypertreewidth + bounded self-joins): The reduction establishing membership in LogCFL does not explicitly address how the hypertree decomposition is adapted to enforce minimality of the witness set containing f. Since the reader's verification of the hypertreewidth case is incomplete, a concrete sketch of the decomposition construction or a reference to the exact lemma handling the interaction between width bounds and minimality is required.
  3. [Section 6] Section 6 (ontology-mediated queries): The claim that relevance is no harder than query answering under bounded interaction width is stated via reduction, but the interaction-width definition and its relationship to both self-join width and the prior 'interaction-free' condition are only sketched; a formal definition and a short proof that the bound preserves the LogCFL upper bound would strengthen the result.
minor comments (2)
  1. [Abstract] The abstract and introduction use 'interaction width' without an inline definition or pointer to its formalization; adding a one-sentence definition or equation reference would improve readability.
  2. [Table 1] Table 1 (complexity summary) would benefit from an extra column explicitly listing the self-join-width or interaction-width parameter for each row.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading, the positive assessment of significance, and the constructive comments on proof clarity. We will revise the manuscript to address all points by adding explicit details on minimality enforcement, decomposition sketches, and formal definitions. These changes strengthen the presentation while preserving the stated results.

read point-by-point responses
  1. Referee: [Section 4] Section 4 (NP membership for bounded self-join width): The argument that relevance remains in NP when self-joins are bounded by a constant relies on a reduction to CQ evaluation, but it is unclear how minimality of the support is enforced. A homomorphism placing f in its image may produce a non-minimal support when multiple atoms over the same relation can share or not share facts; the certificate must therefore guarantee existence of some minimal support containing f, not merely a satisfying one. This step is load-bearing for the central claim that relevance matches query-evaluation complexity.

    Authors: We agree the current write-up leaves the minimality step implicit. Because self-join width is bounded by a fixed constant k, the NP certificate guesses both a homomorphism from q to D and an assignment of at most k facts per relation such that f is used and every fact in the image is necessary for the homomorphism to hold (verified by checking that removing any one fact falsifies at least one atom). The bounded k ensures the certificate size remains polynomial. We will add a dedicated paragraph and a short lemma in the revised Section 4 that formally shows this certificate characterises exactly the minimal supports containing f. revision: yes

  2. Referee: [Section 5] Section 5 (LogCFL for bounded hypertreewidth + bounded self-joins): The reduction establishing membership in LogCFL does not explicitly address how the hypertree decomposition is adapted to enforce minimality of the witness set containing f. Since the reader's verification of the hypertreewidth case is incomplete, a concrete sketch of the decomposition construction or a reference to the exact lemma handling the interaction between width bounds and minimality is required.

    Authors: We accept that an explicit sketch is needed for verification. The hypertree decomposition is extended by augmenting each bag with a constant-size set of fact identifiers (bounded by the self-join width) that records which facts realise the atoms involving f; minimality is enforced bottom-up by a LogCFL-computable check that discards any fact whose removal still permits a satisfying assignment in the subtree. We will insert a concrete two-paragraph sketch of this construction together with a pointer to the supporting lemma on width-bounded minimal-witness decompositions in the revised Section 5. revision: yes

  3. Referee: [Section 6] Section 6 (ontology-mediated queries): The claim that relevance is no harder than query answering under bounded interaction width is stated via reduction, but the interaction-width definition and its relationship to both self-join width and the prior 'interaction-free' condition are only sketched; a formal definition and a short proof that the bound preserves the LogCFL upper bound would strengthen the result.

    Authors: We will expand Section 6 with a formal definition of interaction width (the maximum number of atoms that interact via a single ontology axiom or self-join) and a short proof that bounded interaction width reduces ontology-mediated relevance to a bounded-self-join CQ instance whose LogCFL membership follows from the result of Section 5. The definition will also explicitly relate interaction width to the earlier interaction-free condition. These additions will be included in the revision. revision: yes

Circularity Check

0 steps flagged

No circularity: standard complexity reductions to query evaluation

full rationale

The paper establishes membership and hardness results for relevance via explicit polynomial-time reductions and certificates that compare directly to the known combined complexity of CQ evaluation (NP-complete in general, LogCFL for bounded hypertreewidth). These reductions are external to the paper's own definitions: they invoke standard homomorphism-based certificates for query satisfaction and show that, under bounded self-join width, a minimal support containing the distinguished fact can be witnessed without introducing new parameters or self-referential assumptions. No self-citations are load-bearing for the central claims, no fitted quantities are renamed as predictions, and the ontology-mediated extension similarly reduces to interaction-width bounds without circular steps. The derivation chain is therefore self-contained against external complexity benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard definitions of conjunctive queries, minimal models for entailment, and known complexity results for query evaluation; no free parameters, new entities, or ad-hoc axioms are introduced.

axioms (2)
  • domain assumption Standard definition of Boolean conjunctive queries and minimal subsets S such that S |= q
    Used to define the relevance decision problem.
  • standard math Known combined complexity of CQ evaluation (NP-complete in general, LogCFL for bounded hypertreewidth)
    Basis for showing relevance matches evaluation complexity under restrictions.

pith-pipeline@v0.9.0 · 5568 in / 1277 out tokens · 63439 ms · 2026-05-08T09:15:00.313168+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

13 extracted references

  1. [1]

    Ceylan, ˙I

    Reasoning about explanations for negative query an- swers in DL-Lite.Journal of Artificial Intelligence Research (JAIR)48:635–669. Ceylan, ˙I. ˙I.; Lukasiewicz, T.; Malizia, E.; and Vaicenavi- cius, A. 2019. Explanations for query answers under exis- tential rules. InInternational Joint Conference on Artificial Intelligence (IJCAI), 1639–1646. Ceylan, ˙I....

  2. [2]

    Gottlob, G.; Leone, N.; and Scarcello, F

    ACM Press. Gottlob, G.; Leone, N.; and Scarcello, F. 2002. Hypertree decompositions and tractable queries.Journal of Computer and System Sciences (JCSS)64(3):579–627. Green, T. J., and Tannen, V . 2017. The semiring framework for database provenance. InACM Symposium on Principles of Database Systems (PODS), 93–99. ACM Press. Grohe, M. 2007. The complexity...

  3. [3]

    Query Relevance Problem

    The Shapley value of tuples in query answering.Logi- cal Methods in Computer Science (LMCS)V olume 17, Issue 3:6942. Meliou, A.; Gatterbauer, W.; Moore, K. F.; and Suciu, D. 2010. The complexity of causality and responsibility for query answers and non-answers.Proc. VLDB Endow. 4(1):34–45. Pe˜naloza, R., and Sertkaya, B. 2017. Understanding the complexity...

  4. [4]

    ifx∈shared-vars(T, q), thenh S(x)∈const(A) 2.(A,T)|=q hS whereq hS is obtained by remov- ing all atoms in int-atoms(T, q)and replacingx∈ shared-vars(T, q)byh S(x)

  5. [5]

    Proof.First suppose thatfis relevant to(T, q)w.r.t.A

    there is noS ′ ⊊Sandh S′ :int-atoms(Q) hom − − → IS′,T such thath S′(x) =h S(x)for everyx∈ shared-vars(T, q). Proof.First suppose thatfis relevant to(T, q)w.r.t.A. Let Sq be a minimal support for(T, q)inAthat containsf, witnessed byh:q hom − − → ISq,T . Necessarily, every fact inS q must be potentially relevant to some atom inqw.r.t.S, oth- erwise it coul...

  6. [6]

    Rl such thatT |= ∃T − ⊑ ∃R 1, andT |=∃R − i ⊆ ∃R i+1 for1≤i < l

    it only maps variables to wordsR 1 . . . Rl such thatT |= ∃T − ⊑ ∃R 1, andT |=∃R − i ⊆ ∃R i+1 for1≤i < l

  7. [7]

    Minimal Homomorphisms on Graphs

    it satisfies the query atoms, meaning that ifA(x)∈qand h(x) =R 1 . . . Rl, thenT |=R − l ⊑A(orT |=∃T − ⊑ Aifh(x) =ϵ), and ifS(x, y)∈q, then either: •h(x) =R 1 . . . Rl andh(y) =R 1 . . . Rl+1 andT |= Rl+1 ⊑S, or •h(x) =R 1 . . . Rl+1 andh(y) =R 1 . . . Rl andT |= R− l+1 ⊑S Thus, to sum up, we guess a roleTand verify that there exists some elementcP 1 . . ...

  8. [8]

    there exists a bijective mappingηfrom the homomorphic images ofqinDto theG q-homomorphic images onG D that preserves the inclusion relation

  9. [9]

    for every homomorphic imageSofqinD,f∈Siffe f ∈η(S)

    for every factf∈D, there existse f ∈E D s.t. for every homomorphic imageSofqinD,f∈Siffe f ∈η(S)

  10. [10]

    Proof.We build graphsG q, GD that precisely encodeqand Dresp

    ifqis isomorphic toD, thenG q =G D. Proof.We build graphsG q, GD that precisely encodeqand Dresp. The construction will be identical for both in order to satisfy condition 3, so we shall focus onDfor now. The construction fromqis identical except that constants must be replaced with existential variables. For this, we start with a set of vertices that are...

  11. [11]

    for every anchor vertex inS,Salso contains its associated Cpn+1 andC pn+2

  12. [12]

    for every anchor vertex inS,Salso contains at least one Ppn+2 that touches it

  13. [13]

    for everyP pn+2 inS,Salso contains the coupledP pn+2 and the intermediateC p. The image ofηalways has this structure indeed, andη −1(S) can be obtained from everyS∈ Sby simply removing all Cpn+1s andC pn+2s then replacing every Ppn+2 − − − − →Cpi Ppn+2 − − − − → by Ri − − →. We now need to show that everyGq-homomorphic image ˆG′ onG D is inS, thatη −1( ˆG...