pith. machine review for the scientific record.

arxiv: 2604.05539 · v1 · submitted 2026-04-07 · 💻 cs.AI

Recognition: 2 theorem links


From Large Language Model Predicates to Logic Tensor Networks: Neurosymbolic Offer Validation in Regulated Procurement

Cedric Haufe, Frieder Stolzenburg

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 19:57 UTC · model grok-4.3

classification 💻 cs.AI
keywords neurosymbolic AI · logic tensor networks · large language models · offer validation · public procurement · explainable AI · XAI · regulated decision making

The pith

A neurosymbolic pipeline extracts predicates from offer documents with a language model and validates them via Logic Tensor Networks to produce auditable decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes combining a language model for extracting specific facts from procurement offer documents with a Logic Tensor Network that applies logical rules to determine validity. This neurosymbolic method aims to deliver decisions that match the accuracy of standard models while remaining fully traceable to individual predicates, rule evaluations, and source text passages. In regulated public institutions, such traceability supports legal verification and auditing, which purely statistical approaches often lack. Experiments on a real corpus of offer documents confirm that the pipeline performs at a level comparable to existing methods. The modular structure also allows domain rules to be updated independently of the language model.

Core claim

By employing a language model to extract information from offer documents and then aggregating the results with a Logic Tensor Network, the approach produces decisions that are both factually correct and legally verifiable. These decisions can be justified by predicate values, rule truth values, and corresponding text passages, enabling rule checking based on a real corpus. Experiments show performance comparable to existing models, with the key advantage lying in interpretability, modular predicate extraction, and explicit support for XAI.

What carries the argument

A Logic Tensor Network (LTN) that evaluates the truth of domain rules over predicates extracted by the language model from offer documents.
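The mechanism can be illustrated with a toy version of LTN "real logic", in which predicates carry truth values in [0, 1] and logical connectives become fuzzy operators. The predicate names and the rule below are hypothetical illustrations, not the paper's actual rule set:

```python
# Minimal sketch of LTN-style fuzzy evaluation: predicates take truth
# values in [0, 1]; conjunction is a product t-norm, negation is 1 - x.
# All predicate names and the validity rule are invented for illustration.

def t_and(*vals):
    """Product t-norm: fuzzy conjunction of truth values."""
    out = 1.0
    for v in vals:
        out *= v
    return out

def t_not(v):
    """Standard fuzzy negation."""
    return 1.0 - v

def implies(a, b):
    """Reichenbach fuzzy implication: 1 - a + a*b."""
    return 1.0 - a + a * b

# Truth values an LLM extractor might assign to one offer document.
predicates = {
    "HAS_TOTAL_PRICE": 0.95,
    "HAS_SIGNATURE": 0.90,
    "IS_INVOICE": 0.05,
}

# Hypothetical rule: valid offer := has price AND has signature AND not an invoice.
is_valid = t_and(
    predicates["HAS_TOTAL_PRICE"],
    predicates["HAS_SIGNATURE"],
    t_not(predicates["IS_INVOICE"]),
)
print(round(is_valid, 3))  # aggregated truth value of the validity rule
```

Because every intermediate truth value survives the aggregation, a decision can be traced back to the exact predicate that weakened it, which is the auditability property the paper emphasizes.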

If this is right

  • Decisions can be justified by referencing specific predicate values, rule truth values, and the original text passages.
  • Domain-specific knowledge can be linked directly to the semantic understanding provided by language models.
  • The modular predicate extraction allows rules to be updated or extended without retraining the language model.
  • The structure provides explicit support for Explainable AI through auditable rule checking.
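The modularity point can be made concrete: if the domain rules live in a declarative table separate from the extractor, they can be edited or extended without touching the language model. Rule names, predicates, and the 0.5 threshold below are illustrative assumptions, not the paper's configuration:

```python
# Sketch of rules-as-data modularity: the rule table is independent of the
# predicate extractor, so rules can change without retraining the LLM.
# Every rule name and predicate here is a hypothetical example.
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[Dict[str, float]], float]]

def build_rules() -> List[Rule]:
    # Each rule maps extracted predicate truth values to one rule truth value.
    return [
        ("price_present", lambda p: p["HAS_TOTAL_PRICE"]),
        ("signed", lambda p: p["HAS_SIGNATURE"]),
        ("not_an_invoice", lambda p: 1.0 - p["IS_INVOICE"]),
    ]

def validate(preds: Dict[str, float], rules: List[Rule], threshold: float = 0.5):
    truths = {name: fn(preds) for name, fn in rules}
    decision = all(t >= threshold for t in truths.values())
    return decision, truths  # decision plus a per-rule audit trail

preds = {"HAS_TOTAL_PRICE": 0.95, "HAS_SIGNATURE": 0.9, "IS_INVOICE": 0.05}
decision, audit = validate(preds, build_rules())
```

Adding or amending a rule is a one-line change to `build_rules`, and the returned `audit` dictionary is the kind of per-rule justification the review highlights.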

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same predicate-to-LTN structure could be reused in other regulated text domains such as contract compliance or grant application review.
  • Human review of the extracted predicates before LTN evaluation could serve as a practical safeguard against extraction errors.
  • If predicate accuracy improves with newer language models, overall decision quality could exceed purely neural baselines while retaining traceability.

Load-bearing premise

The predicates extracted by the language model are sufficiently accurate and complete for the Logic Tensor Network rules to produce legally correct decisions on unseen offer documents.
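As an editorial illustration (not an analysis from the paper), this premise can be partially quantified under a product t-norm reading of conjunctive rules, a common choice in LTNs:

```latex
% Sensitivity of a conjunctive rule's truth value to one extraction error.
% Assume the rule truth is T = \prod_{i=1}^{n} p_i with p_i \in [0,1],
% and the extractor perturbs a single predicate: \hat{p}_j = p_j + \varepsilon.
\[
  \bigl|\hat{T} - T\bigr|
  \;=\; |\varepsilon| \prod_{i \neq j} p_i
  \;\le\; |\varepsilon|.
\]
```

So a single-predicate extraction error is never amplified by the conjunction itself; the risk concentrates where the aggregated truth value sits near the decision threshold, which is exactly where predicate completeness matters most.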

What would settle it

A test corpus of offer documents with independent legal-expert validity labels. Systematic disagreement between the pipeline's decisions and the ground truth on cases involving clear rule violations would refute the claim; close agreement on such cases would support it.
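A minimal harness for that test could look like the following. The labels below are fabricated placeholders; a real evaluation would use the paper's corpus with expert annotations:

```python
# Hypothetical settle-it harness: compare pipeline decisions against
# independent legal-expert labels and tally the confusion counts.
# The gold and pred lists are made-up placeholder data.

def confusion(gold, pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = sum(bool(g and p) for g, p in zip(gold, pred))
    tn = sum(bool((not g) and (not p)) for g, p in zip(gold, pred))
    fp = sum(bool((not g) and p) for g, p in zip(gold, pred))
    fn = sum(bool(g and (not p)) for g, p in zip(gold, pred))
    return tp, tn, fp, fn

# gold: expert IS_VALID_OFFER labels; pred: pipeline decisions.
gold = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion(gold, pred)
accuracy = (tp + tn) / len(gold)
```

The false positives (invalid offers accepted) and false negatives (valid offers rejected) are the disagreement cases that would need case-by-case legal review to decide whether the rules or the extraction failed.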

Figures

Figures reproduced from arXiv: 2604.05539 by Cedric Haufe, Frieder Stolzenburg.

Figure 1. Anonymised example of a valid offer document from the corpus. Each document d was manually assigned a binary label IS_VALID_OFFER(d) ∈ {0, 1} which identifies the extent to which it is valid or invalid. A valid offer is marked with IS_VALID_OFFER = 1 and could be used in the real process. Documents that clearly serve a different purpose (e.g. invoices, delivery notes, order confirmations, general price li…
Figure 2. Overview of the pipeline: incoming potential offers are segmented, evaluated by an LLM, and then used by an LTN to make the final decision. Since these potential offers usually contain several pages and numerous layout elements (text, tables, diagrams), the LLM does not work directly with the entire raw text. Instead, this raw text is divided into a series of meaningful text excerpts (chunks). For this pu…
original abstract

We present a neurosymbolic approach, i.e., combining symbolic and subsymbolic artificial intelligence, to validating offer documents in regulated public institutions. We employ a language model to extract information and then aggregate with an LTN (Logic Tensor Network) to make an auditable decision. In regulated public institutions, decisions must be made in a manner that is both factually correct and legally verifiable. Our neurosymbolic approach allows existing domain-specific knowledge to be linked to the semantic text understanding of language models. The decisions resulting from our pipeline can be justified by predicate values, rule truth values, and corresponding text passages, which enables rule checking based on a real corpus of offer documents. Our experiments on a real corpus show that the proposed pipeline achieves performance comparable to existing models, while its key advantage lies in its interpretability, modular predicate extraction, and explicit support for XAI (Explainable AI).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a neurosymbolic pipeline for validating offer documents in regulated public procurement. A large language model extracts predicates from text, which are then aggregated via Logic Tensor Networks (LTNs) using domain-specific rules to produce auditable decisions. The authors claim that this approach achieves performance comparable to existing models on a real corpus while providing advantages in interpretability, modularity, predicate-level justification, and explicit support for XAI through rule truth values and linked text passages.

Significance. If the empirical claims are substantiated, the work offers a concrete demonstration of combining LLM semantic extraction with symbolic logical reasoning in a high-stakes legal domain. The emphasis on auditability and XAI could address regulatory requirements for verifiable decisions, potentially increasing trust in automated procurement systems.

major comments (2)
  1. [Abstract] Abstract: The central claim that 'the proposed pipeline achieves performance comparable to existing models' is unsupported by any reported metrics, baselines, corpus split details, labeling process, or error analysis. Without these, the performance assertion cannot be evaluated.
  2. [Experiments] Experiments (as described in the abstract and skeptic analysis): The legal verifiability claim rests on the assumption that LM-extracted predicates are accurate and complete enough for LTN rules to yield correct decisions. No predicate-level validation (precision/recall vs. expert annotations, inter-annotator agreement, or error analysis on extracted predicates) is provided; only end-to-end performance is referenced. This is load-bearing for the interpretability and XAI advantages.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'neurosymbolic approach' is introduced without a short definition or citation to foundational LTN or neurosymbolic literature, which may hinder accessibility for readers outside the subfield.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to strengthen the empirical presentation.

point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that 'the proposed pipeline achieves performance comparable to existing models' is unsupported by any reported metrics, baselines, corpus split details, labeling process, or error analysis. Without these, the performance assertion cannot be evaluated.

    Authors: We agree that the abstract's performance claim requires concrete supporting details to be properly evaluated. The current version states the claim at a high level based on the experiments but does not include the specific metrics or procedural details in the abstract. We will revise the abstract to report key performance metrics, baselines, corpus split details, labeling process, and a summary of error analysis, ensuring the claim is fully substantiated and evaluable. revision: yes

  2. Referee: [Experiments] Experiments (as described in the abstract and skeptic analysis): The legal verifiability claim rests on the assumption that LM-extracted predicates are accurate and complete enough for LTN rules to yield correct decisions. No predicate-level validation (precision/recall vs. expert annotations, inter-annotator agreement, or error analysis on extracted predicates) is provided; only end-to-end performance is referenced. This is load-bearing for the interpretability and XAI advantages.

    Authors: We concur that the accuracy and completeness of the LLM-extracted predicates are foundational to the legal verifiability, interpretability, and XAI claims. The manuscript currently emphasizes end-to-end results. We will add a dedicated predicate-level evaluation in the revised experiments section, reporting precision and recall against expert annotations, inter-annotator agreement, and a detailed error analysis on the extracted predicates. These additions will directly bolster the support for the modularity and explainability advantages. revision: yes
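The predicate-level evaluation promised in this response could be as simple as set-based precision and recall per document. Predicate names and the annotations below are invented for illustration, not drawn from the paper's corpus:

```python
# Sketch of predicate-level validation: compare the set of (predicate, value)
# pairs the LLM extracted against expert annotations for one document.
# All predicates and values here are hypothetical examples.

def precision_recall(expert, extracted):
    """Set-based precision/recall of extracted predicate assignments."""
    tp = len(expert & extracted)
    precision = tp / len(extracted) if extracted else 1.0
    recall = tp / len(expert) if expert else 1.0
    return precision, recall

expert = {("HAS_TOTAL_PRICE", 1), ("HAS_SIGNATURE", 1), ("IS_INVOICE", 0)}
extracted = {("HAS_TOTAL_PRICE", 1), ("IS_INVOICE", 0), ("HAS_VAT_ID", 1)}

p, r = precision_recall(expert, extracted)
```

Averaging these scores over the corpus, alongside inter-annotator agreement on the expert sets, would directly address the referee's load-bearing concern.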

Circularity Check

0 steps flagged

No circularity in derivation chain; empirical pipeline with no equations or self-referential reductions

full rationale

The paper presents a neurosymbolic pipeline that extracts predicates via language model and aggregates them using Logic Tensor Networks for auditable decisions on offer documents. Central claims rest on experiments showing performance comparable to baselines plus advantages in interpretability and XAI support. No equations, derivations, fitted parameters, or self-citations appear in the provided text that would reduce any prediction or result to its inputs by construction. The approach invokes domain knowledge and empirical validation on a real corpus without self-definitional loops, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation. This matches the default expectation of no significant circularity for an empirical methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; the paper presupposes that domain rules can be expressed as first-order logic predicates and that LLM extraction errors do not invalidate downstream LTN inference.

axioms (1)
  • domain assumption Decisions in regulated public institutions must be both factually correct and legally verifiable.
    Stated in the abstract as the motivating requirement for the neurosymbolic approach.

pith-pipeline@v0.9.0 · 5455 in / 1275 out tokens · 29474 ms · 2026-05-10T19:57:42.091955+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

19 extracted references · 18 canonical work pages · 1 internal anchor

  1. [1] Badreddine, S., d’Avila Garcez, A., Serafini, L., Spranger, M.: Logic tensor networks. Artificial Intelligence 303, 103649 (2022). https://doi.org/10.1016/j.artint.2021.103649
  2. [2] Bai, J., Bai, S., Chu, Y., Cui, Z., Dang, K., Deng, X., Fan, Y., Ge, W., Han, Y., Huang, F., Hui, B., Ji, L., Li, M., Lin, J., Lin, R., Liu, D., Liu, G., Lu, C., Lu, K., Ma, J., Men, R., Ren, X., Ren, X., Tan, C., Tan, S., Tu, J., Wang, P., Wang, S., Wang, W., Wu, S., Xu, B., Xu, J., Yang, Yang, H., Yang, J., Yang, S., Yao, Y., Yu, B., Yuan, H., Yuan, Z., ...
  3. [3] Besold, T.R., d’Avila Garcez, A., Bader, S., Bowman, H., Domingos, P., Hitzler, P., Kühnberger, K.U., Lamb, L.C., Lima, P.M.V., de Penning, L., Pinkas, G., Poon, H., Zaverucha, G.: Chapter 1. Neural-symbolic learning and reasoning: A survey and interpretation. In: Hitzler, P., Sarker, M.K. (eds.) Neuro-symbolic artificial intelligence: the state of the ...
  4. [4] Bodhwani, U., Ling, Y., Dong, S., Feng, Y., Li, H., Goyal, A.: A calibrated reflection approach for enhancing confidence estimation in LLMs. In: Cao, T., Das, A., Kumarage, T., Wan, Y., Krishna, S., Mehrabi, N., Dhamala, J., Ramakrishna, A., Galystan, A., Kumar, A., Gupta, R., Chang, K.W. (eds.) Proceedings of the 5th Workshop on Trustworthy NLP (Trus...
  5. [5] Association for Computational Linguistics, Stroudsburg, PA, USA (2025). https://doi.org/10.18653/v1/2025.trustnlp-main.26
  6. [6] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Lon...
  7. [7] Donadello, I., Serafini, L., d’Avila Garcez, A.: Logic tensor networks for semantic image interpretation. In: Sierra, C. (ed.) International Joint Conferences on Artificial Intelligence (IJCAI 2017). pp. 1596–1602. Curran Associates Inc, Red Hook, NY (2017). https://doi.org/10.24963/ijcai.2017/221
  8. [8] Garcez, A.d., Lamb, L.C.: Neurosymbolic AI: the 3rd wave. Artificial Intelligence Review 56(11), 12387–12406 (2023). https://doi.org/10.1007/s10462-023-10448-w
  9. [9] Hájek, P.: Metamathematics of fuzzy logic, Trends in Logic, vol. 4. Kluwer Academic, Dordrecht (1998). https://doi.org/10.1007/978-94-011-5300-3
  10. [10] Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence – Volume 2. pp. 1137–1143. IJCAI’95, Morgan Kaufmann Publishers Inc, San Francisco, CA, USA (1995). https://doi.org/10.5555/1643031.1643047
  11. [11] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.t., Rocktäschel, T., Riedel, S., Kiela, D.: Retrieval-augmented generation for knowledge-intensive NLP tasks. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems. vol. 33, pp. 9459–9474...
  12. [12] Lin, J., Nogueira, R., Yates, A.: Pretrained Transformers for Text Ranking: BERT and Beyond. Synthesis Lectures on Human Language Technologies, Springer International Publishing and Imprint Springer, Cham, 1st ed. 2022 edn. (2022). https://doi.org/10.1007/978-3-031-02181-7
  13. [13] Powers, D.M.W.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. International Journal of Machine Learning Technology 2:1 (2011). https://arxiv.org/pdf/2010.16061
  14. [14] Richmond, K.M., Muddamsetty, S.M., Gammeltoft-Hansen, T., Olsen, H.P., Moeslund, T.B.: Explainable AI and law: An evidential survey. Digital Society 3(1), 1 (2023). https://doi.org/10.1007/s44206-023-00081-z
  15. [15] Robertson, S., Zaragoza, H.: The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019
  16. [16] Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1(5), 206–215 (2019). https://doi.org/10.1038/s42256-019-0048-x
  17. [17] Serafini, L., d’Avila Garcez, A.S.: Learning and reasoning with logic tensor networks. In: Adorni, G., Cagnoni, S., Gori, M., Maratea, M. (eds.) AI*IA 2016: advances in artificial intelligence, Lecture Notes in Artificial Intelligence, vol. 10037, pp. 334–348. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49130-1_25
  18. [18] Sokolova, M., Lapalme, G.: A systematic analysis of performance measures for classification tasks. Information Processing & Management 45(4), 427–437 (2009). https://doi.org/10.1016/j.ipm.2009.03.002
  19. [19] Taubenfeld, A., Sheffer, T., Ofek, E., Feder, A., Goldstein, A., Gekhman, Z., Yona, G.: Confidence improves self-consistency in LLMs. In: Che, W., Nabende, J., Shutova, E., Pilehvar, M.T. (eds.) Findings of the Association for Computational Linguistics: ACL 2025. pp. 20090–20111. Association for C...