
arXiv: 2505.00039 · v5 · submitted 2025-04-29 · cs.CL · cs.AI · cs.IR

An Ontology-Driven Graph RAG for Legal Norms: A Structural, Temporal, and Deterministic Approach

Pith reviewed 2026-05-22 17:49 UTC · model grok-4.3

classification: cs.CL · cs.AI · cs.IR
keywords: Graph RAG · legal norms · ontology · knowledge graph · temporal modeling · causality · LLM · Brazilian Constitution

The pith

An ontology-driven graph RAG distinguishes abstract legal works from versioned expressions to support deterministic temporal and causal queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SAT-Graph RAG, a framework that explicitly models the hierarchical, diachronic, and causal structure of legal norms to address the blind spots of standard flat-text retrieval. It grounds the knowledge graph in an LRMoo-inspired distinction between abstract Works and versioned Expressions, represents temporal states as aggregations that reuse the versioned expressions (CTVs) of unchanged components, and reifies legislative events as Action nodes. A planner-guided query strategy then applies explicit policies to resolve point-in-time retrieval, hierarchical impact analysis, and provenance reconstruction deterministically. The Brazilian Constitution case study illustrates, qualitatively, how this yields a verifiable substrate for LLMs that is intended to reduce factual errors in legal answers. A sympathetic reader cares because law requires exact versions and causal links that ordinary RAG routinely mishandles.

Core claim

We ground our knowledge graph in a formal, LRMoo-inspired model that distinguishes abstract legal Works from their versioned Expressions. We model temporal states as efficient aggregations that reuse the versioned expressions (CTVs) of unchanged components, and we reify legislative events as first-class Action nodes to make causality explicit and queryable. This structured backbone enables a unified, planner-guided query strategy that applies explicit policies to deterministically resolve complex requests for point-in-time retrieval, hierarchical impact analysis, and auditable provenance reconstruction.
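The reuse of unchanged components described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `CTV` class, its field names, the component identifiers, and `derive_parent_version` are all hypothetical, and only the two dates are taken from the paper's Figure 5 caption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class CTV:
    """One versioned Expression of a component (hypothetical field names)."""
    component_id: str
    valid_from: date
    text: str = ""
    children: Tuple["CTV", ...] = ()


def derive_parent_version(parent: CTV, changed_child: CTV) -> CTV:
    """Build a new parent CTV that swaps in one changed child and reuses the rest."""
    new_children = tuple(
        changed_child if c.component_id == changed_child.component_id else c
        for c in parent.children
    )
    return CTV(parent.component_id, changed_child.valid_from, parent.text, new_children)


# Illustrative identifiers and placeholder texts, not real constitutional content.
cap1_v1 = CTV("cap1", date(1988, 10, 5), "<original chapter 1>")
cap2_v1 = CTV("cap2", date(1988, 10, 5), "<original chapter 2>")
title_v1 = CTV("title", date(1988, 10, 5), children=(cap1_v1, cap2_v1))

# Assume an amendment changes only chapter 1.
cap1_v2 = CTV("cap1", date(1993, 9, 14), "<amended chapter 1>")
title_v2 = derive_parent_version(title_v1, cap1_v2)

# The unchanged chapter is the very same object in both parent versions.
print(title_v2.children[1] is cap2_v1)
```

Because the versions are immutable, the older parent version stays intact while the newer one shares the unchanged child, which is the economy the aggregation model aims for.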

What carries the argument

The SAT-Graph RAG framework, which uses an LRMoo-inspired ontology to separate abstract Works from versioned Expressions, CTV aggregations for temporal states, and reified Action nodes to expose causality for deterministic query resolution.

Load-bearing premise

That explicitly modeling legal norms via an LRMoo-inspired distinction between abstract Works and versioned Expressions, combined with CTV aggregations and reified Action nodes, will produce deterministic, auditable resolutions for point-in-time and causal queries without introducing new modeling errors or query complexity.

What would settle it

Execute a point-in-time query on a specific article of the Brazilian Constitution for one date before and one date after a documented amendment, then verify that each retrieved text matches the version in force on that date, with no later changes leaking in and nothing omitted.
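A check of this kind might look like the sketch below. It is an assumption-laden illustration rather than the paper's query policy: the `Version` class, the `resolve_at` helper, the labels, and the half-open interval convention are hypothetical. The two dates for the caput of Art. 6 come from the paper's Figure 7 caption; later versions of that article are left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    label: str
    valid_from: date
    valid_to: Optional[date]  # assumed half-open interval [valid_from, valid_to)


def resolve_at(versions: List[Version], when: date) -> Version:
    """Return the single version in force on `when`, or fail loudly."""
    hits = [
        v for v in versions
        if v.valid_from <= when and (v.valid_to is None or when < v.valid_to)
    ]
    if len(hits) != 1:
        raise LookupError(f"expected exactly one version on {when}, found {len(hits)}")
    return hits[0]


# valid_to=None on the second entry only because later versions are omitted here.
art6_caput = [
    Version("art6_caput@1988-10-05", date(1988, 10, 5), date(2000, 2, 14)),
    Version("art6_caput@2000-02-14", date(2000, 2, 14), None),
]

before = resolve_at(art6_caput, date(1999, 12, 31))
after = resolve_at(art6_caput, date(2000, 2, 14))
print(before.label, after.label)
```

Raising on zero or multiple matches, instead of picking one silently, is what makes a failed test visible.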

Figures

Figures reproduced from arXiv: 2505.00039 by Hudson de Martim.

Figure 1: Example of articulated text for Art. 12 of the Federal Constitution of Brazil (1988) with annotations
Figure 2: Illustration of hierarchical semantic segmentation, and typification of the structural entities/nodes, …
Figure 3: Representation of the multi-layered relationship in the graph: a …
Figure 4: Representation of the multilingual content (in Portuguese and in English).
Figure 5: New Temporal Versions of the component "tit2" (Title II) derived from new CTVs of some of its children (chapters). Unchanged child components have their most recent CTV reused (e.g., the 1988-10-05 CTV of tit2_cap2 is aggregated into the 1993-09-14 CTV of tit2). This aggregation model provides an economical, non-ambiguous, and efficient representation of the norm’s evolution. It establishes that a child’s …
Figure 6: Diagram of aggregation relationships (orange) between …
Figure 7: Representation of a legislative Action (Event) in the knowledge graph. The Action, commanded by the Art. 1’s caput of the Brazilian Constitutional Amendment 26, terminates the validity of the original 1988-10-05 CTV of Art. 6’s caput of the Brazilian Federal Constitution of 1988 and produces its new 2000-02-14 CTV. For each Action node, we also generate a descriptive Text Unit. This text is a structured, na…
Figure 8: Knowledge graph illustrating how Text Units are derived from two sources: from Language Versions (representing content) and from the other entities (Norm, Component, Temporal Version, Action), representing metadata and relationships.
Figure 9: Diagram illustrating inter-norm (and component) aggregation by legal …
Figure 10: Diagram illustrating how a user can select a scope (e.g., a …
Figure 11: Original Version and subsequent Versions of Article 6 of Brazilian Constitution generated by 3 …
Original abstract

Retrieval-Augmented Generation (RAG) systems in the legal domain face a critical challenge: standard, flat-text retrieval is blind to the hierarchical, diachronic, and causal structure of law, leading to anachronistic and unreliable answers. This paper introduces the Structure-Aware Temporal Graph RAG (SAT-Graph RAG), an ontology-driven framework designed to overcome these limitations by explicitly modeling the formal structure and diachronic nature of legal norms. We ground our knowledge graph in a formal, LRMoo-inspired model that distinguishes abstract legal Works from their versioned Expressions. We model temporal states as efficient aggregations that reuse the versioned expressions (CTVs) of unchanged components, and we reify legislative events as first-class Action nodes to make causality explicit and queryable. This structured backbone enables a unified, planner-guided query strategy that applies explicit policies to deterministically resolve complex requests for (i) point-in-time retrieval, (ii) hierarchical impact analysis, and (iii) auditable provenance reconstruction. Through a case study on the Brazilian Constitution, we demonstrate how this approach provides a verifiable, temporally-correct substrate for LLMs, enabling higher-order analytical capabilities while drastically reducing the risk of factual errors. The result is a practical framework for building more trustworthy and explainable legal AI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SAT-Graph RAG, an ontology-driven graph retrieval-augmented generation framework for legal norms. It grounds a knowledge graph in an LRMoo-inspired distinction between abstract legal Works and versioned Expressions, models temporal states via CTV aggregations of unchanged components, and reifies legislative events as Action nodes to capture causality. A unified planner-guided query strategy with explicit policies is proposed for point-in-time retrieval, hierarchical impact analysis, and provenance reconstruction. The central claim is that this structure provides a verifiable, temporally-correct substrate for LLMs that enables higher-order analysis while drastically reducing factual errors, as demonstrated in a case study on the Brazilian Constitution.

Significance. If the modeling and query policies prove robust, the approach could advance legal AI by supplying an auditable, diachronic graph substrate that mitigates anachronism and hallucination risks common in flat-text RAG. The explicit separation of Works/Expressions and reification of Actions offers a principled way to handle versioning and causality that standard vector retrieval lacks. However, the absence of quantitative evaluation in the provided description limits immediate impact assessment.

major comments (2)
  1. [Case study] Case study section: The Brazilian Constitution demonstration is presented only as a qualitative illustration of the framework and query policies. No accuracy metrics, error counts on temporal or causal queries, baseline comparisons (e.g., standard RAG or other graph methods), or error analysis are reported, leaving the claim of 'drastically reducing the risk of factual errors' unsupported by evidence.
  2. [Query strategy] Query strategy description: The planner-guided policies for resolving point-in-time, hierarchical, and provenance queries are described at a high level without formal specification of the resolution algorithms, conflict-handling rules, or complexity analysis. This makes it difficult to verify the determinism and auditability asserted in the abstract.
minor comments (1)
  1. [Abstract] The abstract and introduction use several acronyms (SAT-Graph RAG, CTV, LRMoo) without an initial glossary or expansion on first use.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Case study] Case study section: The Brazilian Constitution demonstration is presented only as a qualitative illustration of the framework and query policies. No accuracy metrics, error counts on temporal or causal queries, baseline comparisons (e.g., standard RAG or other graph methods), or error analysis are reported, leaving the claim of 'drastically reducing the risk of factual errors' unsupported by evidence.

    Authors: We agree that the case study remains qualitative and does not yet supply quantitative metrics, error counts, or baseline comparisons, which leaves the stronger phrasing of the claim without direct empirical backing. In the revision we will add a dedicated error analysis subsection, report success rates on a set of temporal and causal queries, and include a brief comparison against standard vector RAG on the same query set. We will also moderate the wording of the claim to reflect the illustrative nature of the current demonstration while still highlighting the structural advantages.
    Revision: yes

  2. Referee: [Query strategy] Query strategy description: The planner-guided policies for resolving point-in-time, hierarchical, and provenance queries are described at a high level without formal specification of the resolution algorithms, conflict-handling rules, or complexity analysis. This makes it difficult to verify the determinism and auditability asserted in the abstract.

    Authors: The manuscript presents the query strategy at the policy level to emphasize its unified, planner-guided character. To improve verifiability we will insert pseudocode for the core resolution procedures, explicit conflict-handling rules for overlapping temporal states, and a short complexity discussion. These additions will directly support the determinism and auditability claims without altering the overall approach.
    Revision: yes
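To make the "overlapping temporal states" concern concrete, one possible consistency check is sketched here. It is not the pseudocode the authors promise; the `Interval` alias, the half-open convention, and `find_overlaps` are assumptions introduced for illustration only.

```python
from datetime import date
from typing import List, Optional, Tuple

# Assumed convention: [start, end), with end=None meaning open-ended.
Interval = Tuple[date, Optional[date]]


def find_overlaps(intervals: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Report neighbouring validity intervals that overlap once sorted by start.

    An empty result means the timeline is consistent; a non-empty result
    lists overlapping neighbours, not every overlapping pair.
    """
    ordered = sorted(intervals, key=lambda iv: iv[0])
    conflicts = []
    for prev, cur in zip(ordered, ordered[1:]):
        if prev[1] is None or cur[0] < prev[1]:
            conflicts.append((prev, cur))
    return conflicts


clean = [(date(1988, 10, 5), date(2000, 2, 14)), (date(2000, 2, 14), None)]
bad = [(date(1988, 10, 5), date(2001, 1, 1)), (date(2000, 2, 14), None)]

print(find_overlaps(clean))
print(len(find_overlaps(bad)))
```

Sorting dominates the cost, so the check runs in O(n log n) for n versions of one component. A rule for what to do when a conflict is found would still have to come from the authors.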

Circularity Check

0 steps flagged

No significant circularity; framework derives from independent ontology modeling

Full rationale

The paper constructs SAT-Graph RAG from explicit modeling decisions: an LRMoo-inspired distinction between abstract Works and versioned Expressions, CTV aggregations for temporal states, and reified Action nodes for legislative events. These choices enable a planner-guided query strategy for point-in-time, hierarchical, and provenance queries, illustrated qualitatively via the Brazilian Constitution case study. No equations, fitted parameters, or self-referential definitions appear that would reduce any claimed prediction or result to the inputs by construction. The derivation chain relies on applying a structural ontology rather than on self-citation chains or renamed empirical patterns, so no step presupposes the result it is meant to establish.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on domain assumptions from library science and several new modeling entities introduced without external validation beyond the single case-study description.

axioms (1)
  • [domain assumption] LRMoo-inspired distinction between abstract legal Works and their versioned Expressions
    Invoked to ground the knowledge graph structure and enable reuse of unchanged components.
invented entities (2)
  • CTVs (versioned expressions of components, reused when a component is unchanged) [no independent evidence]
    purpose: Efficient temporal state aggregation that reuses prior versions
    Introduced to avoid full duplication when laws remain stable across time periods.
  • Action nodes for legislative events [no independent evidence]
    purpose: Make causality explicit and directly queryable
    Reified as first-class nodes to support provenance and impact analysis.
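How reified Action nodes could support provenance reconstruction is sketched below. This is a hypothetical illustration: the `Action` fields, the identifier strings, and the `provenance` walk are assumptions, and the single example Action paraphrases the paper's Figure 7 caption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    commanded_by: str  # provision that orders the change
    terminates: str    # id of the CTV whose validity ends
    produces: str      # id of the CTV that replaces it


def provenance(ctv_id: str, actions: List[Action]) -> List[Action]:
    """Walk backwards from a CTV through the Actions that produced it."""
    by_product: Dict[str, Action] = {a.produces: a for a in actions}
    chain: List[Action] = []
    seen = set()
    while ctv_id in by_product and ctv_id not in seen:
        seen.add(ctv_id)  # guard against cycles in malformed data
        action = by_product[ctv_id]
        chain.append(action)
        ctv_id = action.terminates
    return chain


# Identifier strings are invented for this sketch.
ec26 = Action(
    commanded_by="ec26/art1_caput",
    terminates="cf1988/art6_caput@1988-10-05",
    produces="cf1988/art6_caput@2000-02-14",
)

trail = provenance("cf1988/art6_caput@2000-02-14", [ec26])
print([a.commanded_by for a in trail])
```

Each step in the returned chain names the commanding provision, so the answer to "why does this text read as it does" is a list of graph nodes rather than a model's guess.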

pith-pipeline@v0.9.0 · 5763 in / 1406 out tokens · 41421 ms · 2026-05-22T17:49:58.611897+00:00 · methodology



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge

    cs.IR · 2026-04 · unverdicted · novelty 7.0

    CAR is a new retrieval objective that targets the currently active authority set rather than most-similar documents, with theorems on coverage conditions and evaluations showing two-stage methods outperform dense retr...

  2. Deterministic Legal Agents: A Canonical Primitive API for Auditable Reasoning over Temporal Knowledge Graphs

    cs.AI · 2025-10 · unverdicted · novelty 7.0

    The paper specifies the SAT-Graph API, a canonical primitive interface that enables auditable, deterministic reasoning over temporal knowledge graphs by isolating uncertainty to intent translation and narrative synthesis.
