NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning

Björn Gehrke; Daniel Romero Schellhorn; Till Mossakowski

arXiv: 2606.19279 · v1 · pith:PZXTIAJUnew · submitted 2026-06-17 · 💻 cs.AI · cs.LG · cs.LO · math.CT · math.LO · math.PR

Pith reviewed 2026-06-26 20:34 UTC · model grok-4.3

classification 💻 cs.AI · cs.LG · cs.LO · math.CT · math.LO · math.PR

keywords neurosymbolic semantics · categorical semantics · monads · differentiable tensor programming · neural predicates · MNIST addition task · probabilistic programming

The pith

NeSyCat Torch embeds neural network interpretations of symbols into a monadic categorical semantics for uniform neurosymbolic learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce NeSyCat Torch to fill a gap in NeSyCat, which so far had no account of computational symbols interpreted through neural networks. NeSyCat itself provides a single inductive definition of truth that works across classical, fuzzy, probabilistic and neural systems by being parametric in a strong monad and an aggregation structure on truth values. The implementation uses the distribution monad for reference semantics and the lazy log-tensor monad for numerically stable, differentiable training, with monadic bind, written in do-notation, performing marginalization. This setup supports batching and remains parametric in the monad, so the same axioms apply to many first-order neurosymbolic approaches. On MNIST addition, the reported speed and accuracy are better than those of LTN and DeepProbLog, and the accuracy is close to that of DeepStochLog.
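The marginalization performed by bind can be illustrated with a minimal sketch of a finite distribution monad. This is not the NeSyCat Torch API: the names `unit` and `bind`, and the toy distributions `d1` and `d2`, are assumptions made for illustration, and plain dictionaries stand in for tensors.

```python
from collections import defaultdict


def unit(x):
    """Dirac distribution: all probability mass on x."""
    return {x: 1.0}


def bind(dist, f):
    """Monadic bind: run f on every outcome and sum out the intermediate value."""
    out = defaultdict(float)
    for x, p in dist.items():
        for y, q in f(x).items():
            out[y] += p * q  # marginalise over x
    return dict(out)


# Hypothetical outputs of two digit classifiers (toy two-class case).
d1 = {0: 0.5, 1: 0.5}
d2 = {0: 0.25, 1: 0.75}

# Desugared do-notation: x <- d1; y <- d2; return (x + y)
sum_dist = bind(d1, lambda x: bind(d2, lambda y: unit(x + y)))
print(sum_dist)  # {0: 0.125, 1: 0.5, 2: 0.375}
```

The nested `bind` calls are what do-notation would expand to; the inner loop is where the intermediate values `x` and `y` are summed out.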

Core claim

The paper claims that by interpreting symbols via neural networks and realizing the framework in tensor backends with the distribution monad and lazy log-tensor monad, one obtains a differentiable implementation of the categorical semantics that preserves the inductive truth definition while enabling efficient training and outperforming prior systems on benchmark tasks.

What carries the argument

The lazy log-tensor monad over the log-semiring, which performs lazy marginalization via monadic bind in do-notation for numerically stable backpropagation.
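To see why the log-semiring helps with numerical stability, consider the following sketch of its two operations on plain floats. The function names `log_add` and `log_mul` are assumptions for illustration; the paper's tensor implementation is not reproduced here.

```python
import math

NEG_INF = float("-inf")  # log(0): the additive identity of the log-semiring


def log_add(a, b):
    """Semiring addition: log(exp(a) + exp(b)), shifted by the max for stability."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_mul(a, b):
    """Semiring multiplication: log(exp(a) * exp(b)) = a + b."""
    return a + b


# Two branches, each with probability exp(-1000).
# In float64, exp(-1000) underflows to 0.0, so the naive sum loses all information.
naive_mass = math.exp(-1000.0) + math.exp(-1000.0)
# The log-space sum keeps the value: -1000 + log(2).
stable_log_mass = log_add(-1000.0, -1000.0)
```

Working with log-weights throughout means products of many small probabilities become sums, and only the final readout (if any) needs to leave log space.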

If this is right

Code written once in monad-based do-notation can be reused across different monads for various semantics.
Neural approximation of symbols integrates directly without breaking the categorical structure.
Batching via an additional monad layer allows scalable training without changes to the core logic.
The approach applies to continuous domains once a suitable monad is implemented with neural components.
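The first point, writing a program once and reusing it across monads, can be sketched as follows. The `Monad` record, the `add_digits` program, and both instances are hypothetical names chosen for illustration, not the paper's interface.

```python
from collections import namedtuple

# A monad is represented here only by its unit and bind (hypothetical record type).
Monad = namedtuple("Monad", ["unit", "bind"])


def add_digits(m, digit_a, digit_b):
    """do { x <- digit_a; y <- digit_b; return (x + y) }, for any monad m."""
    return m.bind(digit_a, lambda x: m.bind(digit_b, lambda y: m.unit(x + y)))


# Identity monad: classical, deterministic values.
identity = Monad(unit=lambda x: x, bind=lambda v, f: f(v))


def dist_bind(dist, f):
    """Bind for finite distributions stored as {outcome: probability}."""
    out = {}
    for x, p in dist.items():
        for y, q in f(x).items():
            out[y] = out.get(y, 0.0) + p * q
    return out


distribution = Monad(unit=lambda x: {x: 1.0}, bind=dist_bind)

classical = add_digits(identity, 3, 4)                                 # 7
probabilistic = add_digits(distribution, {3: 0.5, 4: 0.5}, {4: 1.0})   # {7: 0.5, 8: 0.5}
```

`add_digits` never inspects which monad it runs in; only the `Monad` value passed in changes between the classical and the probabilistic reading.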

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This construction could integrate with other tensor-based probabilistic programming libraries for broader neurosymbolic applications.
Testing on tasks beyond MNIST addition would verify if the performance gains generalize to new logical queries.
The lazy pruning of branches in the monad may suggest similar optimizations in other logical tensor systems.

Load-bearing premise

The monadic bind realized by tensor operations and the lazy log-tensor monad preserves the categorical semantics enough for the inductive truth definition to hold after neural approximation and batching.

What would settle it

A counterexample where the output probabilities or truth values from the tensor implementation diverge significantly from those computed by the reference distribution monad on the same logical program would show the semantics are not preserved.
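A check of this kind could look like the sketch below: the same program is run under a reference bind on probabilities and under a log-space bind that skips zero-probability branches, and the results are compared. All names (`dist_bind`, `log_bind`, `program`, the toy inputs) are assumptions for illustration; the pruning rule shown is one simple reading of "lazy pruning" and may differ from the paper's.

```python
import math

NEG_INF = float("-inf")


def dist_bind(dist, f):
    """Reference bind on probabilities."""
    out = {}
    for x, p in dist.items():
        for y, q in f(x).items():
            out[y] = out.get(y, 0.0) + p * q
    return out


def log_add(a, b):
    """log(exp(a) + exp(b)), computed stably."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_bind(logdist, f):
    """Log-space bind; branches with log-weight -inf are skipped without calling f."""
    out = {}
    for x, lp in logdist.items():
        if lp == NEG_INF:
            continue  # pruned: a zero-probability branch cannot contribute
        for y, lq in f(x).items():
            out[y] = log_add(out.get(y, NEG_INF), lp + lq)
    return out


def program(bind, unit, d1, d2):
    """x <- d1; y <- d2; return (x + y), for a given bind and unit."""
    return bind(d1, lambda x: bind(d2, lambda y: unit(x + y)))


def to_log(dist):
    return {k: (math.log(v) if v > 0 else NEG_INF) for k, v in dist.items()}


p1 = {0: 0.2, 1: 0.8, 2: 0.0}
p2 = {0: 0.6, 1: 0.4}

reference = program(dist_bind, lambda v: {v: 1.0}, p1, p2)
candidate = program(log_bind, lambda v: {v: 0.0}, to_log(p1), to_log(p2))

# Largest disagreement between the two semantics, measured on probabilities.
max_gap = max(abs(reference[k] - math.exp(candidate.get(k, NEG_INF)))
              for k in reference)
```

On this toy input the two agree up to floating-point error, and the outcome reachable only through the zero-probability branch never appears in `candidate`. A large `max_gap` on some program would be the kind of counterexample described above.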

Original abstract

Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic and neural systems each define truth by their own inductive rules. NeSyCat, extending ULLER, subsumes them under a single inductive definition of truth, parametric in a strong monad and an aggregation structure on truth-values. NeSyCat has so far lacked an account of predicates and functions learned by neural networks. We provide NeSyCat Torch as the missing link and interpret computational symbols via neural networks, implementing the framework in probabilistic programming and tensor-based backends. We use the distribution monad for reference semantics and metric evaluation, and complement it by a monad for numerically stable, differentiable training: the lazy log-tensor monad over the log-semiring. For efficient training in batches, we furthermore employ a batch monad. The axioms are the source code: written once in monad-based do-notation, monadic bind performs marginalisation, lazily pruning unneeded branches. On MNIST addition, our HaskTorch, JAX, and PyTorch implementations outperform LTN and DeepProbLog in speed and accuracy, while achieving nearly the accuracy of DeepStochLog. However, unlike DeepStochLog, we stay in a uniform framework that applies to many first-order NeSy approaches. Namely, the construction is parametric in the monad; instantiating it with, e.g., the Giry monad extends the approach to continuous probability (working out a neural representation here is left for future work).
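For readers unfamiliar with the benchmark, MNIST addition supervises only the sum of two digit images. The sketch below shows one way the sum distribution and a negative log-likelihood loss could be computed in log space. The logits are made up, the helper names are hypothetical, and the choice of loss is an assumption rather than something stated in the abstract.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax(logits):
    """Normalise raw classifier scores into log-probabilities."""
    log_z = logsumexp(logits)
    return [v - log_z for v in logits]


def sum_log_probs(log_p1, log_p2):
    """log P(x + y = s) for s in 0..18, marginalising over digit pairs (x, y)."""
    return [
        logsumexp([log_p1[x] + log_p2[s - x]
                   for x in range(10) if 0 <= s - x <= 9])
        for s in range(19)
    ]


# Hypothetical classifier logits: the first image looks like a 3, the second like a 4.
logits_a = [0.0] * 10
logits_a[3] = 5.0
logits_b = [0.0] * 10
logits_b[4] = 5.0

log_sum = sum_log_probs(log_softmax(logits_a), log_softmax(logits_b))
loss = -log_sum[7]  # negative log-likelihood of the observed sum label 7
```

Only the label for the sum enters the loss; the individual digit labels are never used, which is what makes the task a test of learning through the logical layer.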

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

NeSyCat Torch supplies a concrete tensor implementation that adds neural predicates to the monad-parametric NeSyCat setup and runs competitively on MNIST addition, but the claim that lazy log-tensor and batch monads preserve the original inductive semantics rests on the code without an explicit check.

The main takeaway is that this paper ships working code for NeSyCat in PyTorch, JAX, and HaskTorch. It lets neural networks stand in for predicates and functions while the truth definition stays parametric in a strong monad, using the distribution monad for reference and a lazy log-tensor monad plus batch monad for stable, differentiable training. The MNIST addition results show better speed and accuracy than LTN and DeepProbLog, and nearly match DeepStochLog without leaving the uniform framework.

What is actually new is the tensor realization itself. Earlier NeSyCat work lacked an account of learned neural symbols; here the axioms are written once in do-notation, monadic bind handles marginalization and lazy pruning, and the same source works across backends. That is a direct, usable extension.

The soft spot is the preservation question. The construction assumes that realizing bind with tensor operations, log-semiring arithmetic, and batching still computes the same truth values as the abstract monad once neural approximations enter. No small non-neural example or monad-law verification is described to confirm equivalence before the empirical test. The performance numbers are given without error bars or ablations, so the robustness of the gains is not fully clear. These are real but not fatal gaps for an implementation paper.
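A monad-law check of the kind mentioned here is cheap to write on a non-neural example. The sketch below tests the three laws for a finite distribution monad on one hand-picked input; `unit`, `bind`, `close`, and the kernels `f` and `g` are hypothetical names, and passing on one input is evidence, not a proof.

```python
import math


def unit(x):
    return {x: 1.0}


def bind(dist, f):
    out = {}
    for x, p in dist.items():
        for y, q in f(x).items():
            out[y] = out.get(y, 0.0) + p * q
    return out


def close(d1, d2, tol=1e-12):
    """Compare two finite distributions up to floating-point tolerance."""
    keys = set(d1) | set(d2)
    return all(math.isclose(d1.get(k, 0.0), d2.get(k, 0.0), abs_tol=tol)
               for k in keys)


# Two hypothetical stochastic kernels on integers.
def f(x):
    return {x: 0.5, x + 1: 0.5}


def g(y):
    return {2 * y: 0.25, 2 * y + 1: 0.75}


m = {0: 0.3, 1: 0.7}

left_identity = close(bind(unit(1), f), f(1))     # unit(a) >>= f   ==  f(a)
right_identity = close(bind(m, unit), m)          # m >>= unit      ==  m
associativity = close(bind(bind(m, f), g),        # (m >>= f) >>= g
                      bind(m, lambda x: bind(f(x), g)))  # == m >>= (x -> f(x) >>= g)
```

The same three comparisons could be pointed at a tensor-backed `bind` by swapping the two functions at the top, which is the sort of check the note finds missing.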

This is for people already working on neurosymbolic systems who want one codebase that can swap monads and plug in neural components. A reader who needs portable code and a unified starting point will get value. It is worth sending to peer review because the implementation claim is specific, the comparison is direct, and the parametric property is preserved in the reported experiments.

Referee Report

2 major / 2 minor

Summary. The paper presents NeSyCat Torch, a tensor implementation of the NeSyCat categorical framework for neurosymbolic learning. It extends the framework to neural predicates and functions by realizing symbols via neural networks, using the distribution monad for reference semantics and a lazy log-tensor monad (over the log-semiring) for differentiable training, together with a batch monad for efficiency. The implementation is written once in monad-based do-notation (with bind performing marginalization and lazy pruning), remains parametric in the monad, and is instantiated in HaskTorch, JAX, and PyTorch backends. On the MNIST addition task the approach is reported to outperform LTN and DeepProbLog in speed and accuracy while approaching DeepStochLog performance; the source code is presented as the axioms of the construction.

Significance. If the tensor monads preserve the original inductive definition of truth, the work supplies a uniform, extensible first-order neurosymbolic framework that supports both probabilistic reference semantics and efficient neural training. Notable strengths are the parametric monad design (explicitly allowing future extension to the Giry monad), the use of source code as axioms, and the provision of multiple backend implementations. These features could reduce fragmentation across classical, fuzzy, probabilistic and neural NeSy systems.

major comments (2)

[Abstract] The central claim that the lazy log-tensor monad and batch monad preserve NeSyCat's inductive definition of truth after neural approximation and batching is load-bearing, yet the manuscript supplies no monad-law verification, equivalence proof on a non-neural fragment, or other formal check that tensor-realized bind computes the same marginals and truth values as the abstract strong monad; MNIST accuracy/speed results alone do not establish semantic fidelity.
[Experimental evaluation, MNIST addition] The reported outperformance over LTN and DeepProbLog is presented without error bars, number of runs, ablation on the choice of lazy pruning or log-semiring arithmetic, or details of how neural predicates are integrated into the monadic construction, leaving open whether the gains are robust or depend on post-hoc implementation choices.

minor comments (2)

The interaction between the batch monad and the lazy log-tensor monad would benefit from a small worked example showing how a simple first-order formula is evaluated under batching.
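As a rough idea of what such a worked example might contain, the sketch below evaluates the formula "there is a digit d with d > 4" for each element of a small batch, reading truth as a log-probability. This treats batching simply as mapping one per-example evaluation over a leading axis, which is an assumption about the batch monad rather than its definition in the paper; all names and numbers are hypothetical.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_truth_exists(log_probs, predicate):
    """log P(exists d. digit = d and predicate(d)) for one example."""
    selected = [lp for d, lp in enumerate(log_probs) if predicate(d)]
    return logsumexp(selected) if selected else float("-inf")


def batched(evaluate, batch, *args):
    """Apply the same per-example evaluation along the leading batch axis."""
    return [evaluate(example, *args) for example in batch]


# Hypothetical batch of two examples, each a log-distribution over digits 0..9.
uniform = [math.log(0.1)] * 10
peaked = [math.log(0.91) if d == 7 else math.log(0.01) for d in range(10)]
batch = [uniform, peaked]

log_truths = batched(log_truth_exists, batch, lambda d: d > 4)
truths = [math.exp(v) for v in log_truths]  # approximately [0.5, 0.95]
```

The formula evaluator knows nothing about batches; `batched` is the only place where the batch axis appears, which mirrors the claim that batching needs no change to the core logic.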
[Abstract] The abstract states that the construction 'stays in a uniform framework' but does not explicitly contrast the monadic parametricity with the non-parametric aspects of DeepStochLog; a short comparative paragraph would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger evidence of semantic preservation and more rigorous experimental reporting. We address each major comment below, indicating planned revisions where appropriate. The manuscript's core contribution remains the monad-parametric tensor implementation with source code as axioms, but we will strengthen the presentation as detailed.

Referee: [Abstract] The central claim that the lazy log-tensor monad and batch monad preserve NeSyCat's inductive definition of truth after neural approximation and batching is load-bearing, yet the manuscript supplies no monad-law verification, equivalence proof on a non-neural fragment, or other formal check that tensor-realized bind computes the same marginals and truth values as the abstract strong monad; MNIST accuracy/speed results alone do not establish semantic fidelity.

Authors: The implementation is expressed uniformly in monad-based do-notation with bind explicitly performing marginalization and lazy pruning, which by construction mirrors the abstract strong monad semantics of NeSyCat. The distribution monad provides the reference semantics, while the lazy log-tensor monad is engineered for numerical stability over the log-semiring without altering the inductive truth definition. We acknowledge the absence of an explicit monad-law verification or equivalence proof in the current draft. In revision we will add a dedicated subsection sketching the equivalence on the non-neural fragment (showing that tensor bind yields identical marginals and truth values) and confirming that the custom monads satisfy the required laws up to the lazy pruning approximation. This addresses the load-bearing claim without relying solely on MNIST results.

revision: partial

Referee: [Experimental evaluation, MNIST addition] The reported outperformance over LTN and DeepProbLog is presented without error bars, number of runs, ablation on the choice of lazy pruning or log-semiring arithmetic, or details of how neural predicates are integrated into the monadic construction, leaving open whether the gains are robust or depend on post-hoc implementation choices.

Authors: We agree that the experimental section would benefit from greater transparency. The neural predicates are integrated by realizing symbols as neural networks inside the monadic do-notation, with the batch monad handling efficient tensor operations; this is described in the implementation sections but can be expanded. In the revised manuscript we will report results with error bars over 5 independent runs with different random seeds, include an ablation study varying the lazy pruning threshold and comparing log-semiring versus standard arithmetic, and add explicit pseudocode or diagrams showing neural predicate embedding within the monadic construction. These additions will demonstrate that the reported speed and accuracy gains are robust rather than dependent on specific post-hoc choices.

revision: yes

Circularity Check

0 steps flagged

No circularity: implementation and empirical MNIST results are independent of fitted inputs or self-referential definitions

The paper describes a monad-parametric implementation of NeSyCat (distribution monad for reference, lazy log-tensor monad for training, plus batch monad) with neural symbol interpretation, then reports measured speed/accuracy on MNIST addition as an external empirical outcome. No equation, claim, or self-citation reduces the performance numbers or the preservation of inductive truth to a quantity defined inside the same paper. The source-code-as-axioms statement is an implementation choice, not a derivation that equates output to input by construction. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The construction rests on the categorical semantics of NeSyCat/ULLER (standard_math) and on the existence of numerically stable tensor realizations of the lazy log-tensor monad and batch monad (domain_assumption). No free parameters or invented entities are introduced in the abstract.

axioms (2)

standard math: NeSyCat supplies a single inductive definition of truth parametric in a strong monad and an aggregation structure.
Invoked in the first paragraph as the base framework being extended.
domain assumption: The lazy log-tensor monad over the log-semiring yields numerically stable differentiable training.
Stated as the monad chosen for training; no independent verification supplied in abstract.
