Recognition: 2 theorem links · Lean Theorem
A PyTorch Library of Turing-Complete Neural Networks
Pith reviewed 2026-05-12 01:00 UTC · model grok-4.3
The pith
A PyTorch package builds neural networks whose forward pass exactly advances any given Turing machine by one step.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a transition function and terminal states, the package produces a neural network whose single forward pass computes the next configuration of the Turing machine. One architecture uses a transformer with self-attention, cross-attention, and feedforward layers; the other is a recurrent network that encodes the tape stack via a Cantor set representation. Both are built by first showing how ReLU networks realize boolean gates, disjunctive normal form formulas, and binary addition, then using hard attention to perform exact positional lookups on the tape.
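As a minimal sketch of the first step (the standard ReLU-gate construction, written here for illustration rather than copied from the package), each Boolean gate on {0, 1}-valued activations reduces to one or two ReLU units:

```python
import torch
import torch.nn.functional as F

# Sketch: ReLU realizations of Boolean gates on {0, 1}-valued tensors.
# These identities are the building blocks for DNF formulas and binary adders.

def AND(x, y):
    return F.relu(x + y - 1)        # 1 only when both inputs are 1

def OR(x, y):
    return 1 - F.relu(1 - x - y)    # 0 only when both inputs are 0

def NOT(x):
    return F.relu(1 - x)            # flips a single bit

def XOR(x, y):
    return x + y - 2 * AND(x, y)    # stays in {0, 1} for Boolean inputs

x = torch.tensor([0., 0., 1., 1.])
y = torch.tensor([0., 1., 0., 1.])
print(AND(x, y), OR(x, y), XOR(x, y))  # [0,0,0,1], [0,1,1,1], [0,1,1,0]
```

A DNF formula then composes ANDs of (possibly negated) literals followed by an OR, which is exactly a two-layer ReLU network of the gates above.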
What carries the argument
The compilation procedure that maps a Turing machine transition function directly to network weights, using ReLU subnetworks for logic and hard attention for tape access.
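A minimal sketch of the hard-attention lookup (my own illustration with one-hot positional codes; the package's actual layer shapes and parameters may differ) shows how a one-hot attention pattern reads the symbol under the head exactly:

```python
import torch

# Sketch: hard attention as an exact positional lookup on a simulated tape.
# Keys are one-hot position codes, values are one-hot symbol codes.

tape_len, n_symbols = 8, 3
keys = torch.eye(tape_len)                     # one-hot positional encodings
tape = torch.randint(0, n_symbols, (tape_len,))
values = torch.eye(n_symbols)[tape]            # one-hot symbol encoding per cell

head = 5
query = keys[head]                             # query encodes the head position

scores = keys @ query                          # equals 1 at the head, 0 elsewhere
attn = torch.zeros(tape_len)
attn[scores.argmax()] = 1.0                    # hard attention: one-hot selection

read = attn @ values                           # exact read of the symbol under the head
assert torch.equal(read, values[head])
```

Because the attention weights are exactly one-hot rather than a soft mixture, the retrieved symbol is bit-exact, which is what the compilation relies on.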
If this is right
- The resulting models simulate the specified Turing machine exactly in one forward pass with no training required.
- ReLU networks can be composed to implement arbitrary boolean circuits including AND, OR, NOT, XOR, DNF formulas, and binary adders.
- Hard attention layers can perform precise positional lookups to read and write on a simulated tape.
- The library provides a concrete starting point for examining how gradient-based training affects these exactly constructed solutions.
- Two separate architectures realize the same simulation result via different theoretical routes.
Where Pith is reading between the lines
- Researchers could initialize networks with these constructed weights and test whether gradient descent preserves or improves the exact simulation behavior.
- The same construction technique might be adapted to other computational models such as pushdown automata or register machines.
- Because the networks are differentiable, they open the possibility of studying hybrid symbolic-neural systems where parts of the machine are learned rather than hard-coded.
Load-bearing premise
The PyTorch code correctly implements the exact behaviors described in the two cited theoretical papers without introducing any discrepancies in the neural operations.
What would settle it
Run the generated network on a simple Turing machine for several steps and compare its tape contents and state against an independent Python simulator of the same machine; any mismatch on even one step would disprove exact simulation.
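A sketch of that check, assuming a hypothetical model_step wrapper that encodes the current configuration, runs one forward pass of the compiled network, and decodes the result (the package's real interface may differ):

```python
# Sketch: lockstep comparison between an independent Turing machine simulator
# and a compiled network wrapped in a hypothetical `model_step` function.

def tm_step(delta, state, head, tape, blank="_"):
    """Reference simulator: one Turing machine step."""
    symbol = tape[head]
    state, write, move = delta[(state, symbol)]
    tape = tape[:head] + [write] + tape[head + 1:]
    head += 1 if move == "R" else -1
    if head < 0:                                  # grow the tape with blanks as needed
        tape, head = [blank] + tape, 0
    elif head == len(tape):
        tape = tape + [blank]
    return state, head, tape

def compare(model_step, delta, state, head, tape, n_steps=20):
    """Any disagreement on any step falsifies the exact-simulation claim."""
    for t in range(n_steps):
        ref = tm_step(delta, state, head, tape)
        net = model_step(state, head, tape)       # hypothetical: encode, forward, decode
        if net != ref:
            return f"mismatch at step {t}: network {net} vs reference {ref}"
        state, head, tape = ref
    return f"exact agreement over {n_steps} steps"
```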
Original abstract
We present a PyTorch package that compiles neural networks and their weights from Turing machine descriptions, producing models that exactly simulate the specified machine without any training. Given a transition function and a set of terminal states, the package constructs a model whose forward pass corresponds to one step of the Turing machine. Two architectures are implemented, each realizing a different theoretical result: (1) a transformer with self-attention, cross-attention, and feedforward layers based on Wei, Chen, and Ma (2021), and (2) a recurrent network based on Siegelmann and Sontag (1995) that encodes the stack in a Cantor set. We develop the constructions from first principles, showing how ReLU networks implement Boolean circuits (AND, OR, NOT, XOR gates and their composition into DNF formulas and binary adders) and how hard attention implements positional lookup on the tape. The package serves as a concrete, runnable reference for the symbolic-neural bridge, and as a foundation for future work on the stability of constructed solutions under gradient-based optimization. Code is available at https://github.com/jonrbates/turing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a PyTorch package that compiles neural network models and weights directly from Turing machine transition functions and terminal states, yielding models whose single forward pass exactly simulates one step of the specified machine without training. It implements two architectures: a transformer using self-attention, cross-attention, and feedforward layers (following Wei, Chen, and Ma 2021) and a recurrent network that encodes the stack via the Cantor-set construction (following Siegelmann and Sontag 1995). The work develops the underlying constructions from first principles, including ReLU-based implementations of Boolean gates, DNF formulas, and binary adders, plus hard attention for positional tape lookup.
Significance. If the exact-simulation claim holds, the paper supplies a concrete, open-source artifact that makes theoretical results on neural Turing completeness directly runnable and inspectable. This is a useful reference for the symbolic-neural bridge and a foundation for studying the behavior of constructed solutions under gradient descent. The first-principles derivations and provision of reproducible code are explicit strengths that increase the work's utility beyond a purely theoretical contribution.
major comments (1)
- [Abstract] The assertion that the constructed models 'exactly simulate' the Turing machine is load-bearing for the central claim, yet the Siegelmann-Sontag RNN relies on a Cantor-set stack encoding whose arbitrary-depth representations cannot be distinguished under the IEEE 754 floating-point arithmetic (float32 or float64) used by PyTorch; the manuscript provides no discussion of rounding behavior, precision loss after repeated stack operations, or empirical verification that the forward pass remains faithful for non-trivial stack depths.
minor comments (2)
- The manuscript would benefit from an explicit mapping between the first-principles Boolean-circuit constructions and the corresponding PyTorch module implementations, including any helper functions for DNF or adder circuits.
- The GitHub repository link is given, but the main text lacks a concise API or usage example that would allow readers to reproduce the claimed one-step simulation without inspecting the source.
Simulated Author's Rebuttal
We thank the referee for this constructive observation on the distinction between theoretical exactness and floating-point implementation. We address the comment directly below.
Point-by-point responses
Referee: [Abstract] The assertion that the constructed models 'exactly simulate' the Turing machine is load-bearing for the central claim, yet the Siegelmann-Sontag RNN relies on a Cantor-set stack encoding whose arbitrary-depth representations cannot be distinguished under the IEEE 754 floating-point arithmetic (float32 or float64) used by PyTorch; the manuscript provides no discussion of rounding behavior, precision loss after repeated stack operations, or empirical verification that the forward pass remains faithful for non-trivial stack depths.
Authors: We agree that the manuscript does not address floating-point effects. The exact-simulation claim follows the real-arithmetic constructions of Siegelmann and Sontag (1995), in which the Cantor-set encoding represents stack contents exactly for any finite depth. The PyTorch implementation, however, uses IEEE 754 binary floating-point, so repeated multiplications by 1/3 (or equivalent) will incur rounding error that eventually collapses distinct stack symbols. We will revise the manuscript by (i) qualifying the abstract and introduction to state that exact simulation holds in the mathematical model, (ii) adding a dedicated subsection on numerical considerations that derives the maximum reliable stack depth for float32 and float64, and (iii) including empirical verification plots that measure symbol-recovery error versus stack depth. These changes will be incorporated in the next version.
Revision: yes
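The measurement proposed in (ii) and (iii) can be sketched as follows, assuming a base-4 Cantor encoding in the style of Siegelmann and Sontag (one digit 2b + 1 per stack bit); the package's actual constants and saturation function may differ:

```python
import numpy as np

# Sketch: empirical maximum stack depth at which the top symbol is still
# recovered correctly after pushes and pops, for a given floating-point dtype.

def push(q, b):
    return q / 4 + (2 * b + 1) / 4        # prepend one base-4 digit

def top(q):
    return min(max(4 * q - 2, 0.0), 1.0)  # saturated-linear read of the top bit

def pop(q, b):
    return 4 * q - (2 * b + 1)            # remove the top digit

def max_faithful_depth(dtype, max_depth=60):
    rng = np.random.default_rng(0)
    bits = rng.integers(0, 2, size=max_depth).tolist()
    q = dtype(0.0)
    for b in bits:                        # push max_depth random bits
        q = dtype(push(q, b))
    for depth, b in enumerate(reversed(bits), start=1):
        if round(float(top(q))) != b:     # rounding error has collapsed this symbol
            return depth - 1
        q = dtype(pop(q, b))
    return max_depth

print("float32:", max_faithful_depth(np.float32))
print("float64:", max_faithful_depth(np.float64))
```

Since each stack symbol occupies two bits of the significand, the faithful depth should land near half the significand width (roughly 12 symbols for float32 and 26 for float64), which is the quantity the proposed subsection would derive analytically.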
Circularity Check
No circularity: new implementation of external theoretical results
Full rationale
The paper constructs PyTorch models that simulate Turing machines by developing ReLU-based Boolean circuits and hard-attention tape lookup from first principles, then instantiating two architectures drawn from the external citations Wei et al. (2021) and Siegelmann & Sontag (1995). No parameters are fitted, no predictions are made from subsets of data, and no load-bearing step reduces to a self-citation or self-definition. The central claim is the production of a runnable library whose forward pass matches one TM step by explicit construction; this is independent of the present paper's own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: The theoretical results from Wei, Chen, and Ma (2021) on transformer-based Turing machine simulation hold.
- Domain assumption: The recurrent network construction from Siegelmann and Sontag (1995) using Cantor sets for stack encoding is valid.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery theorem (LogicNat ≃ Nat) · unclear · claimed match: "ReLU networks implement Boolean circuits (AND, OR, NOT, XOR gates and their composition into DNF formulas and binary adders)"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective (J-positivity off-identity) · unclear · claimed match: "recurrent network based on Siegelmann and Sontag (1995) that encodes the stack in a Cantor set"
Reference graph
Works this paper leans on
- [1] S. Bhattamishra, K. Ahuja, and N. Goyal. On the computational power of transformers and its implications in sequence modeling. In Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 455–475, 2020.
- [2] A. Graves, G. Wayne, and I. Danihelka. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.
- [3] Z. Li, T. Wang, and Y. Yu. Chain of thought empowers transformers to solve inherently serial problems. In The Twelfth International Conference on Learning Representations (ICLR), 2024.
- [4]
- [5] W. Merrill and A. Sabharwal. The expressive power of transformers with chain of thought. In The Twelfth International Conference on Learning Representations (ICLR), 2024.
- [6]
- [7]
- [8] H. T. Siegelmann and E. D. Sontag. On the computational power of neural nets. In Proceedings of the 5th Annual ACM Conference on Computational Learning Theory (COLT), pp. 440–449, 1992.
- [9] H. T. Siegelmann and E. D. Sontag. On the computational power of neural nets. Journal of Computer and System Sciences, 50(1):132–150, 1995.
- [10] P. Smolensky, R. Fernandez, Z. H. Zhou, M. Opper, A. Davies, and J. Gao. Mechanisms of symbol processing for in-context learning in transformer networks. arXiv preprint arXiv:2410.17498, 2024.
- [11]