pith. machine review for the scientific record.

arxiv: 2605.02472 · v1 · submitted 2026-05-04 · 💻 cs.CL


Accurate Legal Reasoning at Scale: Neuro-Symbolic Offloading and Structural Auditability for Robust Legal Adjudication


Pith reviewed 2026-05-09 16:09 UTC · model grok-4.3

classification 💻 cs.CL
keywords: legal reasoning · neuro-symbolic AI · deterministic contract language · amortized intelligence · graph-based execution · legal auditability · large language models · reasoning consistency

The pith

Legal texts are translated once by an LLM into a deterministic graph for accurate, low-cost, auditable adjudication.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a neuro-symbolic method to handle computational legal clauses that require complex logic. An LLM performs a single translation of the legal text into Deterministic Autonomous Contract Language (DACL), a typed graph intermediate representation. Subsequent adjudication runs on deterministic graph execution rather than repeated probabilistic inference from large reasoning models. This yields near-perfect consistency, avoids the reasoning cliff of probabilistic models, cuts compute costs by more than 90 percent in high-volume settings, and generates visually auditable traces that satisfy legal requirements.

Core claim

The central claim is that legal reasoning at scale becomes reliable when an LLM is used only once to convert text into DACL, a typed graph representation, after which all adjudication occurs through deterministic graph executions that carry a complete, visually inspectable trace. Against runtime baselines such as GPT-5.2 and Gemini 3 Pro, the resulting DACL-based agent delivers near-perfect consistency, removes the reasoning cliff, reduces costs by over 90 percent, and meets strict auditability standards for legal systems.

What carries the argument

Deterministic Autonomous Contract Language (DACL): a typed graph intermediate representation that encodes legal clauses so that adjudication reduces to deterministic execution with an auditable trace.
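The paper does not publish DACL's concrete syntax, so the node names, types, and operators below are purely illustrative. As a minimal sketch of the pattern it describes, a typed clause graph paired with a deterministic evaluator that records an audit trace might look like:

```python
from dataclasses import dataclass

# Hypothetical sketch only: DACL's actual representation is not specified
# in the reviewed text; ops and shapes here are invented for illustration.

@dataclass(frozen=True)
class Node:
    op: str        # "const", "input", "ge", "mul", or "if"
    args: tuple = ()

def evaluate(node: Node, facts: dict, trace: list):
    """Deterministically evaluate a clause graph, appending every step to the trace."""
    if node.op == "const":
        value = node.args[0]
    elif node.op == "input":
        value = facts[node.args[0]]
    elif node.op == "ge":
        value = evaluate(node.args[0], facts, trace) >= evaluate(node.args[1], facts, trace)
    elif node.op == "mul":
        value = evaluate(node.args[0], facts, trace) * evaluate(node.args[1], facts, trace)
    elif node.op == "if":
        cond = evaluate(node.args[0], facts, trace)
        value = evaluate(node.args[1] if cond else node.args[2], facts, trace)
    else:
        raise ValueError(f"unknown op {node.op!r}")
    trace.append((node.op, value))  # the trace is the inspectable record
    return value

# "A late fee of 5% applies if payment is more than 30 days overdue."
late_fee = Node("if", (
    Node("ge", (Node("input", ("days_overdue",)), Node("const", (30,)))),
    Node("mul", (Node("input", ("amount",)), Node("const", (0.05,)))),
    Node("const", (0.0,)),
))

trace = []
fee = evaluate(late_fee, {"days_overdue": 45, "amount": 1000.0}, trace)
# fee is 50.0; trace lists every node visited with its computed value
```

The point of the pattern is that once the graph exists, every adjudication is a pure function of the input facts, and the trace doubles as the audit artifact.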

If this is right

  • Compute costs drop by more than 90 percent in high-volume legal workflows because the expensive LLM step occurs only once.
  • Near-perfect consistency is achieved because probabilistic inference is replaced by deterministic graph execution.
  • The reasoning cliff disappears because errors are confined to the initial translation rather than accumulating across repeated model calls.
  • Every adjudication produces a visually auditable trace that satisfies the transparency demands of legal systems.
  • Production systems become feasible where repeated large-model inference would otherwise remain too expensive or error-prone.
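The over-90-percent figure is a straightforward consequence of amortization whenever per-adjudication cost collapses after the one-time translation. A back-of-envelope sketch, with wholly hypothetical prices (the paper's actual cost figures are not reproduced here):

```python
# Illustrative amortization arithmetic; all dollar amounts are invented.
llm_translation_cost = 2.00    # one-time LLM pass, per contract
graph_execution_cost = 0.0005  # per adjudication, deterministic execution
llm_runtime_cost = 0.05        # per adjudication with a runtime LRM baseline

def amortized_saving(n_adjudications: int) -> float:
    """Fractional cost saving of translate-once vs. LLM-at-runtime."""
    baseline = n_adjudications * llm_runtime_cost
    amortized = llm_translation_cost + n_adjudications * graph_execution_cost
    return 1.0 - amortized / baseline

amortized_saving(10_000)  # well above 0.9 with these numbers
```

At low volume the one-time translation dominates and the saving can be negative; the claim is specifically about high-volume workflows.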

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same one-time translation plus deterministic execution pattern could be tested in regulatory compliance or financial contract monitoring where rule-based logic dominates.
  • Measuring translation accuracy across different legal jurisdictions would show how much jurisdiction-specific knowledge must be supplied to the LLM.
  • The auditable graph traces open the possibility of automated review pipelines in which non-lawyers can inspect intermediate steps without needing to re-run the model.
  • Pre-computing DACL graphs for standard contract templates could further amortize the initial cost across thousands of similar cases.

Load-bearing premise

A single LLM pass can produce a complete and accurate DACL graph from arbitrary legal text without semantic loss or logical gaps that would affect downstream deterministic execution.

What would settle it

The claim would be falsified by a test set of complex legal clauses on which the outcomes of DACL graph executions systematically diverge from the consensus of independent legal experts evaluating the same inputs.
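Such a falsification test reduces to comparing graph outcomes against an expert majority vote over a shared case set. A minimal harness, with invented case labels and outcomes:

```python
from collections import Counter

# Hypothetical harness: clause outcomes and expert votes are illustrative.

def expert_consensus(votes: list) -> str:
    """Majority label among independent expert votes."""
    return Counter(votes).most_common(1)[0][0]

def disagreement_rate(cases: list) -> float:
    """cases: list of (graph_outcome, [expert votes]) pairs."""
    disagreements = sum(
        1 for graph_outcome, votes in cases
        if graph_outcome != expert_consensus(votes)
    )
    return disagreements / len(cases)

cases = [
    ("liable",     ["liable", "liable", "not_liable"]),
    ("not_liable", ["not_liable", "not_liable", "not_liable"]),
    ("liable",     ["not_liable", "not_liable", "liable"]),  # a miss
]
disagreement_rate(cases)  # 1/3 on this toy set
```

A systematically high disagreement rate on complex clauses, rather than isolated misses, is what would count against the central claim.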

Figures

Figures reproduced from arXiv: 2605.02472 by Stanisław Sójka, Witold Kowalczyk.

Figure 1: Architectural Comparison: Baseline vs. Amortized Intelligence. The baseline (top) relies on expensive, probabilistic inference-time compute for every transaction, leading to linear cost scaling and potential hallucinations. Our proposed approach (bottom) shifts reasoning to a one-time compile-time step, translating the contract into a DACL graph. This enables the runtime agent to execute deterministic logi…
Figure 2: Error Taxonomy by Model Configuration. The stacked bars show the total failure count for each model (N = 400 events). Arithmetic (AH) errors remain negligible across all models, confirming that the difficulty lies in state tracking rather than computation. This presents a paradoxical finding: frontier models have mastered the computational primitives of law (arithmetic) but lack the structural fidelity to…
read the original abstract

Legal texts often contain computational legal clauses--provisions whose understanding requires complex logic. While frontier Large Reasoning Models (LRMs) can describe such clauses, building production-ready systems is limited by reasoning errors and the high cost of inference. We propose Amortized Intelligence, a neuro-symbolic approach where we use an LLM once to translate a legal text into Deterministic Autonomous Contract Language (DACL): a typed graph intermediate representation. Adjudication then relies on deterministic graph executions with a visually auditable trace. In comparison against runtime LRM baselines (including GPT-5.2 and Gemini 3 Pro), our DACL-based Agent achieves near-perfect consistency and mitigates the "reasoning cliff" observed in probabilistic models. The system reduces compute costs by over 90% in high-volume workflows while satisfying the strict auditability requirements of legal adjudication.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a neuro-symbolic framework called Amortized Intelligence for legal adjudication. An LLM is used once to translate legal text into Deterministic Autonomous Contract Language (DACL), a typed graph intermediate representation. Adjudication then proceeds via deterministic graph execution with a visually auditable trace. The DACL-based agent is claimed to deliver near-perfect consistency, mitigate the reasoning cliff seen in probabilistic LRMs, reduce compute costs by over 90% in high-volume workflows, and satisfy legal auditability requirements, outperforming runtime baselines such as GPT-5.2 and Gemini 3 Pro.

Significance. If the central claims were substantiated with rigorous evidence, the work would be significant for neuro-symbolic AI and legal informatics. Offloading reasoning to deterministic execution after a single neural translation step could address inconsistency and cost barriers that currently limit LLM deployment in high-stakes legal settings, while the emphasis on structural auditability directly responds to regulatory and practical needs for explainable adjudication systems.

major comments (2)
  1. [Abstract] The performance claims of near-perfect consistency, mitigation of the reasoning cliff, and over 90% cost reduction are asserted without any reported data, baselines, error analysis, methodology details, or quantitative comparisons to the cited LRM baselines. No evidence is supplied to support the central assertions.
  2. [Abstract] The load-bearing assumption that a single LLM pass produces a complete, semantically faithful DACL graph from arbitrary legal text (without omissions, mis-typings, or logical gaps) is stated but unsupported by any metrics on translation fidelity such as clause coverage, type correctness, or expert agreement on graph structure. Deterministic execution is only as reliable as its input graph.
minor comments (1)
  1. [Abstract] The acronym DACL and the term Amortized Intelligence are introduced without an explicit definition, formal syntax, or comparison to related intermediate representations in the neuro-symbolic literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which correctly identifies areas where the abstract's claims require stronger evidentiary support. We address each point below and will revise the manuscript accordingly to improve clarity and substantiation without altering the core contributions.

read point-by-point responses
  1. Referee: [Abstract] The performance claims of near-perfect consistency, mitigation of the reasoning cliff, and over 90% cost reduction are asserted without any reported data, baselines, error analysis, methodology details, or quantitative comparisons to the cited LRM baselines. No evidence is supplied to support the central assertions.

    Authors: We agree that the abstract asserts these performance outcomes without including or referencing supporting data, which limits its standalone value. The manuscript body contains the relevant experimental comparisons to GPT-5.2 and Gemini 3 Pro along with discussions of consistency and cost, but these are not summarized or cited in the abstract itself. We will revise the abstract to incorporate concise quantitative highlights drawn from the evaluation sections and add explicit references to the corresponding figures, tables, and methodology details. revision: yes

  2. Referee: [Abstract] The load-bearing assumption that a single LLM pass produces a complete, semantically faithful DACL graph from arbitrary legal text (without omissions, mis-typings, or logical gaps) is stated but unsupported by any metrics on translation fidelity such as clause coverage, type correctness, or expert agreement on graph structure. Deterministic execution is only as reliable as its input graph.

    Authors: This observation is accurate and central to the framework's validity. The manuscript describes the translation process but provides no quantitative metrics assessing its fidelity or completeness. We will add a new evaluation subsection focused on the translation step, reporting clause coverage, type correctness, and expert agreement metrics, and will reference these results in the revised abstract to substantiate the assumption. revision: yes
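The fidelity metrics the rebuttal promises (clause coverage, type correctness, expert agreement) are simple ratios once the counts exist. A hypothetical sketch of how they might be computed; all function names and inputs are invented for illustration:

```python
# Illustrative translation-fidelity metrics; the paper defines none of these.

def clause_coverage(source_clauses: set, graph_clauses: set) -> float:
    """Fraction of source clauses represented in the translated graph."""
    return len(source_clauses & graph_clauses) / len(source_clauses)

def type_correctness(typed_nodes: int, well_typed_nodes: int) -> float:
    """Fraction of graph nodes whose assigned type checks out."""
    return well_typed_nodes / typed_nodes

def expert_agreement(labels_a: list, labels_b: list) -> float:
    """Raw agreement between two annotators on graph-structure judgments."""
    agree = sum(a == b for a, b in zip(labels_a, labels_b))
    return agree / len(labels_a)
```

Reporting these three numbers would directly address the referee's second major comment, since deterministic execution inherits any gap in the translated graph.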

Circularity Check

0 steps flagged

No circularity; claims rest on unquantified empirical comparison without self-referential derivations

full rationale

The paper describes a neuro-symbolic pipeline in which an LLM performs a one-time translation of legal text into a DACL graph, after which adjudication proceeds via deterministic execution. The headline results (near-perfect consistency, mitigation of the reasoning cliff, >90% cost reduction) are asserted via comparison to runtime LRM baselines. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text. The central claim therefore does not reduce to its own inputs by construction; it is an empirical assertion whose verification would require external metrics on translation fidelity, which are not supplied but whose absence does not create circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the unverified assumption that legal logic can be losslessly captured in a typed graph and that one LLM translation suffices for all subsequent deterministic runs. No free parameters are stated. DACL is introduced as a new entity without independent evidence of completeness.

axioms (1)
  • domain assumption: An LLM can translate arbitrary legal text into a complete and semantically faithful DACL graph representation.
    This is the load-bearing premise of the neuro-symbolic offloading step described in the abstract.
invented entities (1)
  • Deterministic Autonomous Contract Language (DACL): no independent evidence
    purpose: Typed graph intermediate representation that enables deterministic execution and visual audit trails for legal clauses
    New language introduced to replace probabilistic reasoning; no external validation or prior reference provided in the abstract.

pith-pipeline@v0.9.0 · 5453 in / 1442 out tokens · 31618 ms · 2026-05-09T16:09:32.924216+00:00 · methodology


Reference graph

Works this paper leans on

56 extracted references · 15 canonical work pages
