pith. machine review for the scientific record.

arxiv: 2605.08150 · v1 · submitted 2026-05-03 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

A PyTorch Library of Turing-Complete Neural Networks

Jonathan Bates

Authors on Pith · no claims yet

Pith reviewed 2026-05-12 01:00 UTC · model grok-4.3

classification 💻 cs.LG
keywords Turing machines · neural networks · PyTorch · transformers · recurrent networks · symbolic computation · boolean circuits · hard attention

The pith

A PyTorch package builds neural networks whose forward pass exactly advances any given Turing machine by one step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a library that converts a Turing machine's transition function and terminal states into either a transformer or a recurrent network, so that running the network once matches one step of the machine. It shows how to realize this using ReLU layers for boolean logic and hard attention for tape positions, without any training involved. A reader would care because the code turns abstract proofs of neural Turing completeness into runnable models that can be inspected and optimized. The work also derives the necessary circuit constructions from scratch to ensure the simulation is exact.

Core claim

Given a transition function and terminal states, the package produces a neural network whose single forward pass computes the next configuration of the Turing machine. One architecture uses a transformer with self-attention, cross-attention, and feedforward layers; the other is a recurrent network that encodes the tape stack via a Cantor set representation. Both are built by first showing how ReLU networks realize boolean gates, disjunctive normal form formulas, and binary addition, then using hard attention to perform exact positional lookups on the tape.
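
For concreteness, here is a minimal sketch of fixed-weight ReLU gates of the kind such a compilation uses for inputs restricted to {0, 1}; the closed forms below are standard constructions, not necessarily the library's exact weight choices.

```python
import torch

# For inputs in {0, 1}, each gate is one fixed-weight ReLU layer and its output stays in {0, 1}.
def relu_and(x, y):   # AND(x, y) = ReLU(x + y - 1)
    return torch.relu(x + y - 1)

def relu_or(x, y):    # OR(x, y) = ReLU(x + y) - ReLU(x + y - 1)
    return torch.relu(x + y) - torch.relu(x + y - 1)

def relu_not(x):      # NOT(x) = ReLU(1 - x)
    return torch.relu(1 - x)

def relu_xor(x, y):   # XOR(x, y) = ReLU(x - y) + ReLU(y - x)
    return torch.relu(x - y) + torch.relu(y - x)

# A DNF formula is an OR of ANDs of (possibly negated) literals, so it composes
# from these primitives, e.g. f(a, b, c) = (a AND NOT b) OR c:
a, b, c = (torch.tensor(v) for v in (1.0, 0.0, 0.0))
print(relu_or(relu_and(a, relu_not(b)), c))   # tensor(1.)
```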

What carries the argument

The compilation procedure that maps a Turing machine transition function directly to network weights, using ReLU subnetworks for logic and hard attention for tape access.
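
A minimal sketch of the hard-attention half of that procedure: with one-hot positional keys and a one-hot query for the head position, averaging over the argmax set returns exactly the selected tape cell. The layout here is illustrative, not the library's actual parameterization.

```python
import torch

def hard_attention(query, keys, values):
    """Average-hard attention: uniform weights over the keys with maximal score."""
    scores = keys @ query                      # (n,)
    mask = (scores == scores.max()).float()    # select the argmax positions
    weights = mask / mask.sum()                # uniform over that set
    return weights @ values                    # (d,)

# Toy tape with 5 cells: one-hot positional keys, scalar symbol per cell.
keys = torch.eye(5)
values = torch.tensor([[0.], [1.], [1.], [0.], [1.]])
query = torch.eye(5)[2]                        # head at position 2
print(hard_attention(query, keys, values))     # tensor([1.]), the symbol under the head
```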

If this is right

  • The resulting models simulate the specified Turing machine exactly in one forward pass with no training required.
  • ReLU networks can be composed to implement arbitrary boolean circuits including AND, OR, NOT, XOR, DNF formulas, and binary adders.
  • Hard attention layers can perform precise positional lookups to read and write on a simulated tape.
  • The library provides a concrete starting point for examining how gradient-based training affects these exactly constructed solutions.
  • Two separate architectures realize the same simulation result via different theoretical routes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Researchers could initialize networks with these constructed weights and test whether gradient descent preserves or improves the exact simulation behavior (see the sketch after this list).
  • The same construction technique might be adapted to other computational models such as pushdown automata or register machines.
  • Because the networks are differentiable, they open the possibility of studying hybrid symbolic-neural systems where parts of the machine are learned rather than hard-coded.
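
One way to probe the first bullet, as a sketch only: `compile_transformer` below is a hypothetical stand-in for the library's constructor, and argmax decoding of configurations is assumed rather than taken from the package.

```python
import torch

def simulation_drift(trained_model, exact_model, encoded_configs):
    """Fraction of test configurations where a trained copy's decoded output
    disagrees with the exactly constructed model (argmax decoding assumed)."""
    with torch.no_grad():
        exact = exact_model(encoded_configs).argmax(dim=-1)
        trained = trained_model(encoded_configs).argmax(dim=-1)
    return (exact != trained).float().mean().item()

# Hypothetical usage, assuming a constructor the library might expose:
#   exact_model = compile_transformer(delta, terminal_states)
#   trained_model = copy.deepcopy(exact_model)
#   ... fine-tune trained_model on transition examples ...
#   print(simulation_drift(trained_model, exact_model, encoded_configs))
```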

Load-bearing premise

The PyTorch code correctly implements the exact behaviors described in the two cited theoretical papers without introducing any discrepancies in the neural operations.

What would settle it

Run the generated network on a simple Turing machine for several steps and compare its tape contents and state against an independent Python simulator of the same machine; any mismatch on even one step would disprove exact simulation.
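
A minimal sketch of that check: the reference simulator below is self-contained, while `model_step` stands in for a hypothetical wrapper that runs the compiled network and decodes its configuration, since the library's input/output encoding is not spelled out here.

```python
def tm_step(tape, head, state, delta, blank="_"):
    """Advance a Turing machine one step.
    delta maps (state, read_symbol) -> (write_symbol, move in {'L', 'R'}, next_state)."""
    symbol = tape.get(head, blank)
    write, move, next_state = delta[(state, symbol)]
    tape = dict(tape)                      # leave the caller's tape untouched
    tape[head] = write
    head += 1 if move == "R" else -1
    return tape, head, next_state

def check_exact_simulation(model_step, delta, tape, head, state, steps=20):
    """Compare the network's decoded configuration to the reference at every step.
    model_step(t) is a hypothetical callable returning (tape, head, state) after t+1 forward passes."""
    for t in range(steps):
        tape, head, state = tm_step(tape, head, state, delta)
        if model_step(t) != (tape, head, state):
            return f"mismatch at step {t}"
    return f"exact for all {steps} checked steps"
```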

Figures

Figures reproduced from arXiv: 2605.08150 by Jonathan Bates.

Figure 1. Execution trace on input B()E. The head scans right in state R, matches parentheses in state M, and verifies in state V. Matched pairs are replaced with *. Head symbol highlighted.
Figure 2. Architecture of the WCM21 simulator for one step. Feedforward (FF) layers imple…
Figure 3. Cantor set encodings by stack length. Each stack symbol selects a subinterval, and…
Original abstract

We present a PyTorch package that compiles neural networks and their weights from Turing machine descriptions, producing models that exactly simulate the specified machine without any training. Given a transition function and a set of terminal states, the package constructs a model whose forward pass corresponds to one step of the Turing machine. Two architectures are implemented, each realizing a different theoretical result: (1) a transformer with self-attention, cross-attention, and feedforward layers based on Wei, Chen, and Ma (2021), and (2) a recurrent network based on Siegelmann and Sontag (1995) that encodes the stack in a Cantor set. We develop the constructions from first principles, showing how ReLU networks implement Boolean circuits (AND, OR, NOT, XOR gates and their composition into DNF formulas and binary adders) and how hard attention implements positional lookup on the tape. The package serves as a concrete, runnable reference for the symbolic-neural bridge, and as a foundation for future work on the stability of constructed solutions under gradient-based optimization. Code is available at https://github.com/jonrbates/turing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents a PyTorch package that compiles neural network models and weights directly from Turing machine transition functions and terminal states, yielding models whose single forward pass exactly simulates one step of the specified machine without training. It implements two architectures: a transformer using self-attention, cross-attention, and feedforward layers (following Wei, Chen, and Ma 2021) and a recurrent network that encodes the stack via the Cantor-set construction (following Siegelmann and Sontag 1995). The work develops the underlying constructions from first principles, including ReLU-based implementations of Boolean gates, DNF formulas, and binary adders, plus hard attention for positional tape lookup.

Significance. If the exact-simulation claim holds, the paper supplies a concrete, open-source artifact that makes theoretical results on neural Turing completeness directly runnable and inspectable. This is a useful reference for the symbolic-neural bridge and a foundation for studying the behavior of constructed solutions under gradient descent. The first-principles derivations and provision of reproducible code are explicit strengths that increase the work's utility beyond a purely theoretical contribution.

major comments (1)
  1. [Abstract] The assertion that the constructed models 'exactly simulate' the Turing machine is load-bearing for the central claim, yet the Siegelmann-Sontag RNN relies on a Cantor-set stack encoding whose arbitrary-depth representations cannot be distinguished under the IEEE 754 floating-point arithmetic (float32 or float64) used by PyTorch. The manuscript provides no discussion of rounding behavior, precision loss after repeated stack operations, or empirical verification that the forward pass remains faithful for non-trivial stack depths.
minor comments (2)
  1. The manuscript would benefit from an explicit mapping between the first-principles Boolean-circuit constructions and the corresponding PyTorch module implementations, including any helper functions for DNF or adder circuits.
  2. The GitHub repository link is given, but the main text lacks a concise API or usage example that would allow readers to reproduce the claimed one-step simulation without inspecting the source.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for this constructive observation on the distinction between theoretical exactness and floating-point implementation. We address the comment directly below.

read point-by-point responses
  1. Referee: [Abstract] The assertion that the constructed models 'exactly simulate' the Turing machine is load-bearing for the central claim, yet the Siegelmann-Sontag RNN relies on a Cantor-set stack encoding whose arbitrary-depth representations cannot be distinguished under the IEEE 754 floating-point arithmetic (float32 or float64) used by PyTorch. The manuscript provides no discussion of rounding behavior, precision loss after repeated stack operations, or empirical verification that the forward pass remains faithful for non-trivial stack depths.

    Authors: We agree that the manuscript does not address floating-point effects. The exact-simulation claim follows the real-arithmetic constructions of Siegelmann and Sontag (1995), in which the Cantor-set encoding represents stack contents exactly for any finite depth. The PyTorch implementation, however, uses IEEE 754 binary floating-point, so repeated multiplications by 1/3 (or equivalent) will incur rounding error that eventually collapses distinct stack symbols. We will revise the manuscript by (i) qualifying the abstract and introduction to state that exact simulation holds in the mathematical model, (ii) adding a dedicated subsection on numerical considerations that derives the maximum reliable stack depth for float32 and float64, and (iii) including empirical verification plots that measure symbol-recovery error versus stack depth. These changes will be incorporated in the next version. revision: yes
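
As a sketch of the depth measurement that item (iii) describes, assuming the standard base-4 Cantor encoding for a binary stack alphabet (symbol s stored as digit 2s + 1); the library's actual encoding may differ. Each pushed symbol consumes roughly two mantissa bits, so float32 degrades after roughly a dozen symbols and float64 after roughly two dozen.

```python
import torch

def push(q, bit):
    # Base-4 Cantor encoding: symbol bit in {0, 1} becomes digit 2*bit + 1 in {1, 3}.
    return q / 4 + (2 * bit + 1) / 4

def pop(q):
    digit = torch.floor(4 * q)    # top digit (1 or 3 while the encoding is still exact)
    return 4 * q - digit, (digit - 1) / 2

for dtype in (torch.float32, torch.float64):
    q = torch.tensor(0.0, dtype=dtype)
    bits = [i % 2 for i in range(40)]
    for b in bits:                # push 40 symbols
        q = push(q, b)
    recovered = 0
    for b in reversed(bits):      # pop in LIFO order, count exact recoveries
        q, top = pop(q)
        if int(round(top.item())) != b:
            break
        recovered += 1
    print(dtype, "symbols recovered before the first error:", recovered)
```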

Circularity Check

0 steps flagged

No circularity: new implementation of external theoretical results

full rationale

The paper constructs PyTorch models that simulate Turing machines by developing ReLU-based Boolean circuits and hard-attention tape lookup from first principles, then instantiating two architectures drawn from the external citations Wei et al. (2021) and Siegelmann & Sontag (1995). No parameters are fitted, no predictions are made from subsets of data, and no load-bearing step reduces to a self-citation or self-definition. The central claim is the production of a runnable library whose forward pass matches one TM step by explicit construction; this is independent of the present paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the validity of the two cited theoretical results and the correctness of their translation into PyTorch code. No free parameters are fitted as the models are constructed exactly. No new entities are invented.

axioms (2)
  • domain assumption The theoretical results from Wei, Chen, and Ma (2021) on transformer-based Turing machine simulation hold.
    The paper bases one architecture on this work.
  • domain assumption The recurrent network construction from Siegelmann and Sontag (1995) using Cantor sets for stack encoding is valid.
    The second architecture relies on this.

pith-pipeline@v0.9.0 · 5485 in / 1531 out tokens · 63919 ms · 2026-05-12T01:00:59.195673+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages · 1 internal anchor

  1. S. Bhattamishra, K. Ahuja, and N. Goyal. On the computational power of transformers and its implications in sequence modeling. In Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 455–475, 2020.

  2. A. Graves, G. Wayne, and I. Danihelka. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.

  3. Z. Li, T. Wang, and Y. Yu. Chain of thought empowers transformers to solve inherently serial problems. In The Twelfth International Conference on Learning Representations (ICLR), 2024.

  4. Q. Li and Y. Wang. Constant bit-size transformers are Turing complete. In Advances in Neural Information Processing Systems (NeurIPS), 2025.

  5. W. Merrill and A. Sabharwal. The expressive power of transformers with chain of thought. In The Twelfth International Conference on Learning Representations (ICLR), 2024.

  6. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, 2019.

  7. J. Pérez, P. Barceló, and J. Marinković. Attention is Turing-complete. Journal of Machine Learning Research, 22(75):1–35, 2021.

  8. H. T. Siegelmann and E. D. Sontag. On the computational power of neural nets. In Proceedings of the 5th Annual ACM Conference on Computational Learning Theory (COLT), pp. 440–449, 1992.

  9. H. T. Siegelmann and E. D. Sontag. On the computational power of neural nets. Journal of Computer and System Sciences, 50(1):132–150, 1995.

  10. P. Smolensky, R. Fernandez, Z. H. Zhou, M. Opper, A. Davies, and J. Gao. Mechanisms of symbol processing for in-context learning in transformer networks. arXiv preprint arXiv:2410.17498, 2024.

  11. C. Wei, Y. Chen, and T. Ma. Statistically meaningful approximation: a case study on approximating Turing machines with transformers. arXiv preprint arXiv:2107.13163, 2021.