arxiv: 2605.20919 · v1 · pith:Z5HIGJQXnew · submitted 2026-05-20 · 💻 cs.LG · cs.AI · cs.PL

Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures

Pith reviewed 2026-05-21 06:25 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.PL
keywords compilation to tensor graphs · vector symbolic architectures · frozen embeddings · differentiable programming · polynomial logic · end-to-end training · functional programming · multi-modal embeddings

The pith

A single Sutra program compiles to a tensor graph that decodes bundles at 100% accuracy on four frozen embeddings where Hadamard products fail and supports end-to-end training via autograd.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Sutra as a typed functional language whose entire program, including control flow and I/O, lowers through beta-reduction to one fused tensor-operation graph over a frozen embedding substrate. Rotation binding, unbinding, bundling, and polynomial realizations of Kleene three-valued logic all become differentiable tensor ops, so the same source runs unchanged on text encoders and a protein language model. Because the emitted graph is a standard PyTorch network, gradients flow back to train classifiers from random initialization to perfect accuracy, and trained scalars can be written back into the original source as literals. If the central claim holds, symbolic logic programs become first-class differentiable artifacts that inherit the geometric structure already present in off-the-shelf embeddings.

Core claim

Sutra is a typed, purely functional programming language whose compiled forward pass is a PyTorch neural network. The compiler beta-reduces the whole program -- primitives, control flow, string I/O -- to one fused tensor-op graph over a frozen embedding substrate. Rotation binding, unbind, bundle, polynomial Kleene three-valued logic, and tail-recursive loops all lower to tensor operations; the Kleene connectives are Lagrange-interpolated polynomials exact on the {-1, 0, +1} truth grid. Validation shows the same program runs on four frozen embeddings spanning two modalities and decodes bundles at 100% accuracy through width k=8 on every substrate, where the textbook Hadamard product has already collapsed.
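To make the Lagrange-interpolation claim concrete, here is a minimal sketch. It assumes the connectives are the strong Kleene ones (AND = min, OR = max, NOT = negation) and that each polynomial is the tensor-product Lagrange interpolant on the nodes {-1, 0, +1}; the paper's actual polynomials may differ. The names `lagrange_basis`, `interpolate`, `kleene_and`, `kleene_or`, `kleene_not`, and `max_grid_error` are illustrative, not Sutra identifiers.

```python
from itertools import product

# Truth grid: -1 = false, 0 = unknown, +1 = true.
GRID = (-1.0, 0.0, 1.0)


def lagrange_basis(node, x):
    """Lagrange basis polynomial for `node` over the nodes in GRID, evaluated at x."""
    value = 1.0
    for other in GRID:
        if other != node:
            value *= (x - other) / (node - other)
    return value


def interpolate(truth_table):
    """Build the bivariate polynomial that equals truth_table(a, b) on the 3x3 grid."""
    def poly(a, b):
        return sum(
            truth_table(i, j) * lagrange_basis(i, a) * lagrange_basis(j, b)
            for i, j in product(GRID, GRID)
        )
    return poly


# Assumption: strong Kleene semantics, so AND is min and OR is max on the grid.
kleene_and = interpolate(min)
kleene_or = interpolate(max)


def kleene_not(a):
    """Negation is already a polynomial: it just flips the sign."""
    return -a


def max_grid_error(poly, reference):
    """Largest absolute deviation between poly and reference over the truth grid."""
    return max(abs(poly(a, b) - reference(a, b)) for a, b in product(GRID, GRID))


if __name__ == "__main__":
    print("AND grid error:", max_grid_error(kleene_and, min))
    print("OR grid error:", max_grid_error(kleene_or, max))
    print("AND(0.5, 0.5) =", kleene_and(0.5, 0.5))
```

Under these assumptions the interpolant of min simplifies to (a + b + ab - a^2 - b^2 + a^2 b^2) / 2. It agrees with min at all nine grid points and is an ordinary polynomial in between, which is what lets gradients pass through it.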

What carries the argument

The Sutra compiler that beta-reduces functional programs containing vector-symbolic primitives into a single fused PyTorch tensor graph over a frozen embedding substrate.

If this is right

  • The identical source achieves perfect bundle decoding on three text encoders and one protein encoder even after the Hadamard product baseline collapses.
  • Autograd through the emitted graph trains a fuzzy-rule classifier from chance-level accuracy to 100 percent without altering the symbolic source.
  • A weighted variant learns a scalar gain and writes the value back into the original Sutra source as a literal that reproduces the trained behavior upon recompilation.
  • The resulting artifact functions simultaneously as executable logic and as a trainable neural network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Existing embeddings may already embed enough algebraic structure to support symbolic manipulation across modalities without fine-tuning.
  • Hybrid systems could let engineers write logic once and obtain modality-specific neural implementations by choosing different frozen substrates.
  • The approach opens a route to inspect trained models by reading the updated numeric literals back in the original source language.

Load-bearing premise

The chosen frozen embedding substrates already contain sufficient geometric structure for the tensor-op implementations of rotation binding, unbinding, and polynomial logic to succeed without any embedding adaptation or additional learned parameters.

What would settle it

Running the same compiled Sutra program on one of the four tested embeddings at bundle width k=8 and observing decoding accuracy below 100 percent.

Figures

Figures reproduced from arXiv: 2605.20919 by Emma Leonhart.

Figure 1: Per-tick dataflow of the soft-halt RNN cell. Once …
Figure 2: The K = 3 rule pipeline. The Sutra source above is the literal program the compiler beta-reduces to the tensor-op graph shown; own binds to p1 when computing rule1 (o0, o1 = p2, p3), and to p2, p3 for rule2, rule3 respectively. Solid boxes are PyTorch tensor ops; dashed boxes are learnable prototypes. The AND in the leftmost branch combines cos(x, p1) with the AND-of-NOTs over the other classes; rule2 and …
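The rule structure described in the Figure 2 caption can be sketched as a plain forward pass. This is an illustrative reading, not the compiler's emitted graph: it assumes AND is the polynomial interpolant of min on {-1, 0, +1}, NOT is negation, and each rule scores its own prototype against the AND-of-NOTs over the others. The function names and the cross-entropy helper are hypothetical, and training itself (autograd over the prototypes) is left out.

```python
import math


def kleene_and(a, b):
    # Assumption: closed form of the Lagrange interpolant of min on {-1, 0, +1}^2.
    return 0.5 * (a + b + a * b - a * a - b * b + a * a * b * b)


def kleene_not(a):
    return -a


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def rule_scores(x, prototypes):
    """Score each class k as AND(cos(x, p_k), AND over other classes of NOT cos(x, p_o))."""
    sims = [cosine(x, p) for p in prototypes]
    scores = []
    for own, own_sim in enumerate(sims):
        negated = [kleene_not(s) for k, s in enumerate(sims) if k != own]
        none_of_others = negated[0]
        for term in negated[1:]:
            none_of_others = kleene_and(none_of_others, term)
        scores.append(kleene_and(own_sim, none_of_others))
    return scores


def cross_entropy(scores, label):
    """Softmax cross-entropy of the rule scores against an integer class label."""
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - scores[label]


if __name__ == "__main__":
    # Three orthogonal prototypes; the input coincides with the first one.
    prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    scores = rule_scores([1.0, 0.0, 0.0], prototypes)
    print("rule scores:", scores)
    print("loss for class 0:", cross_entropy(scores, 0))
```

With orthogonal prototypes and an input equal to the first prototype, every cosine lands on the truth grid and the three rules score 0, -1, -1, so the first class wins. In a trainable version the prototypes would be the learnable parameters, matching the dashed boxes in the caption.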
Figure 3: Five-stage compilation pipeline (§4). Boxes are intermediate artifacts; italic labels are the …
Original abstract

Sutra is a typed, purely functional programming language whose compiled forward pass is a PyTorch neural network. The compiler beta-reduces the whole program -- primitives, control flow, string I/O -- to one fused tensor-op graph over a frozen embedding substrate. Rotation binding, unbind, bundle, polynomial Kleene three-valued logic, and tail-recursive loops all lower to tensor operations; the Kleene connectives are Lagrange-interpolated polynomials exact on the {-1, 0, +1} truth grid. Validation is one fact tested two ways. (1) The same program runs on four frozen embeddings spanning two modalities -- three text encoders (nomic-embed-text, all-minilm, mxbai-embed-large) and one protein language model (ESM-2) -- and decodes bundles at 100% accuracy through width k=8 on every substrate, where the textbook Hadamard product has already collapsed (2.5% on mxbai-embed-large, 7.5% on all-minilm). (2) PyTorch autograd flows through the actually compiled graph: a fuzzy-rule classifier written in .su trains from random init (18.7 +/- 9.5%; chance = 20%, five classes) to 100.0 +/- 0.0% (three seeds) by backpropagating through the emitted graph, the symbolic source unmodified. A weighted variant additionally trains a scalar cosine gain and writes it back into the .su source as a numeric literal; recompiling reproduces the trained behaviour to ~2e-7 per logit, so the trained model is itself legible, recompilable code. The same artifact is therefore both a logic program and a trainable neural network.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Sutra, a typed purely functional language that compiles entire programs (including rotation binding, unbinding, bundling, polynomial Kleene logic, and loops) to a single fused tensor-op graph executable as a PyTorch network over a frozen embedding substrate. It claims that the identical Sutra source achieves 100% bundle decoding accuracy through width k=8 on four diverse frozen embeddings (nomic-embed-text, all-minilm, mxbai-embed-large, ESM-2) where the textbook Hadamard product already fails, and that PyTorch autograd flows through the emitted graph to train a fuzzy-rule classifier from random initialization (18.7% to 100%) with trained scalar parameters exportable back into the source for exact reproduction (~2e-7 error).

Significance. If substantiated, the work supplies a concrete compilation target that renders vector symbolic architectures differentiable and directly trainable inside standard ML frameworks while preserving source-level legibility. The cross-embedding consistency and the closed loop from symbolic source through training back to recompilable code are concrete strengths that could support hybrid neuro-symbolic systems without embedding fine-tuning.

major comments (2)
  1. [Abstract, validation paragraph] The headline result, that the same compiled tensor-op graph decodes bundles at 100% for k=8 on all four listed embeddings while Hadamard collapses, is load-bearing for the claim that the primitives are substrate-agnostic. However, no control experiments on random or non-contrastively-trained vectors of matching dimension are reported. This leaves open the possibility that success exploits latent superposition statistics already present in the chosen pre-trained encoders rather than the rotation/Lagrange operators themselves.
  2. [Abstract, training paragraph] The reported end-to-end training (18.7 +/- 9.5% to 100.0 +/- 0.0% across three seeds) and the ~2e-7 reproduction error after writing the learned cosine gain back into the .su source are central to the differentiability claim. Yet the manuscript provides no classifier architecture, no loss function, and no ablation separating the contribution of the compiled graph from that of the embedding, so the numerical support is difficult to verify or reproduce from the given text.
minor comments (1)
  1. The description of how tail-recursive loops and string I/O lower to tensor operations is stated at a high level; an explicit small example program together with its emitted graph would clarify the beta-reduction claim.
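The random-vector control raised in major comment 1 could be set up along the following lines. This is a sketch under stated assumptions, not the paper's implementation: "rotation binding" is modelled as a role-specific block-diagonal rotation (one random angle per coordinate pair), bundling as a plain sum, and decoding as a nearest-neighbour lookup against a codebook of random unit vectors. Every function name and default parameter here is hypothetical, and the output says nothing about the paper's reported numbers on real embeddings.

```python
import math
import random


def random_unit_vector(rng, dim):
    """Gaussian vector scaled to unit length (stand-in for a frozen embedding)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def random_rotation(rng, dim):
    """One (cos, sin) pair per coordinate pair: a block-diagonal orthogonal map."""
    assert dim % 2 == 0, "this sketch needs an even dimension"
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim // 2)]
    return [(math.cos(t), math.sin(t)) for t in angles]


def rotate(rotation, v, inverse=False):
    """Apply the rotation (bind) or its transpose (unbind) to v."""
    out = [0.0] * len(v)
    for p, (c, s) in enumerate(rotation):
        if inverse:
            s = -s
        x, y = v[2 * p], v[2 * p + 1]
        out[2 * p] = c * x - s * y
        out[2 * p + 1] = s * x + c * y
    return out


def bundle(vectors):
    """Superpose bound vectors by element-wise summation."""
    return [sum(column) for column in zip(*vectors)]


def decode(query, codebook):
    """Index of the best-matching codebook entry (dot product ranks like cosine here,
    because every codebook entry has unit length)."""
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in codebook]
    return max(range(len(codebook)), key=scores.__getitem__)


def decode_accuracy(dim=1024, width=8, codebook_size=16, trials=10, seed=0):
    """Fraction of fillers recovered after bind -> bundle -> unbind -> decode."""
    rng = random.Random(seed)
    codebook = [random_unit_vector(rng, dim) for _ in range(codebook_size)]
    roles = [random_rotation(rng, dim) for _ in range(width)]
    correct = total = 0
    for _ in range(trials):
        fillers = [rng.randrange(codebook_size) for _ in range(width)]
        trace = bundle([rotate(roles[i], codebook[f]) for i, f in enumerate(fillers)])
        for i, f in enumerate(fillers):
            if decode(rotate(roles[i], trace, inverse=True), codebook) == f:
                correct += 1
            total += 1
    return correct / total


if __name__ == "__main__":
    print("decode accuracy on random vectors:", decode_accuracy())
```

Swapping `random_unit_vector` for rows of a real embedding table would turn the same harness into the substrate comparison; keeping the dimension and bundle width fixed across both runs is what makes the control informative.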

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional controls and details for improved clarity and reproducibility.

Point-by-point responses
  1. Referee: [Abstract, validation paragraph] The headline result, that the same compiled tensor-op graph decodes bundles at 100% for k=8 on all four listed embeddings while Hadamard collapses, is load-bearing for the claim that the primitives are substrate-agnostic. However, no control experiments on random or non-contrastively-trained vectors of matching dimension are reported. This leaves open the possibility that success exploits latent superposition statistics already present in the chosen pre-trained encoders rather than the rotation/Lagrange operators themselves.

    Authors: We agree that control experiments on random vectors would strengthen the substrate-agnostic claim. The current results show 100% decoding on four diverse frozen embeddings (including cross-modal ESM-2) where Hadamard fails, but we acknowledge the absence of random-vector baselines leaves room for alternative explanations. In the revised manuscript we will add experiments with randomly initialized vectors of matching dimension to isolate the contribution of the rotation/Lagrange operators. revision: yes

  2. Referee: [Abstract, training paragraph] The reported end-to-end training (18.7 +/- 9.5% to 100.0 +/- 0.0% across three seeds) and the ~2e-7 reproduction error after writing the learned cosine gain back into the .su source are central to the differentiability claim. Yet the manuscript provides no classifier architecture, no loss function, and no ablation separating the contribution of the compiled graph from that of the embedding, so the numerical support is difficult to verify or reproduce from the given text.

    Authors: We accept that the training details are insufficient for verification. The revised manuscript will explicitly describe the fuzzy-rule classifier architecture written in Sutra, the loss function (cross-entropy over five classes), and an ablation that compares performance with and without the compiled graph. These additions will make the autograd flow and closed-loop reproduction (~2e-7 error) fully reproducible from the text. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical results on independent frozen embeddings

Full rationale

The paper's derivation consists of a compiler that lowers Sutra programs (including rotation binding, unbinding, bundling, and Lagrange-interpolated Kleene logic) to fused tensor operations over a frozen embedding substrate, followed by direct empirical validation. The headline result (identical source achieving 100% bundle decode accuracy at k=8 across nomic-embed-text, all-minilm, mxbai-embed-large, and ESM-2 while Hadamard collapses) is measured on public, independently trained models with no adaptation or learned parameters in the embeddings themselves. Training a fuzzy-rule classifier from random initialization (18.7% to 100.0%) via autograd through the emitted graph further confirms differentiability without any fitted quantity being renamed as a prediction. No self-definitional loop, fitted-input prediction, or load-bearing self-citation chain appears in the provided text; the chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on standard functional reduction, the geometric properties of the chosen embeddings, and the exactness of Lagrange interpolation on the discrete truth grid; no free parameters are fitted to the reported results and no new physical entities are postulated.

axioms (2)
  • [standard math] Beta-reduction of the functional program to a fused tensor graph is semantics-preserving
    Invoked as the compilation mechanism that turns control flow and primitives into tensor ops.
  • [domain assumption] Lagrange interpolation yields exact Kleene connectives on the {-1,0,+1} grid
    Used to implement three-valued logic as polynomial tensor operations.
invented entities (1)
  • Sutra language and its compiler [no independent evidence]
    purpose: Typed functional language whose forward pass is a PyTorch tensor graph
    Newly defined artifact that serves as both logic program and trainable network.

pith-pipeline@v0.9.0 · 5852 in / 1582 out tokens · 82496 ms · 2026-05-21T06:25:00.773335+00:00 · methodology

