Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management

Guanyu Cui; Kun He; Zhewei Wei

arxiv: 2605.19514 · v2 · pith:LUUT6FOUnew · submitted 2026-05-19 · 💻 cs.AI · cs.CL· cs.LG

Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management

Guanyu Cui , Zhewei Wei , Kun He This is my paper

Pith reviewed 2026-05-20 06:24 UTC · model grok-4.3

classification 💻 cs.AI cs.CLcs.LG

keywords TransformersTuring-completenesscontext managementautoregressive modelslarge language modelscomputational powerfixed-system setting

0 comments

The pith

Turing-completeness of real-world autoregressive Transformers depends on context management in the fixed-system setting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Many existing proofs claim Transformers are Turing-complete by letting a family of models grow in context length or numerical precision to handle longer inputs. The paper separates this scaling-family approach from the fixed-system setting, where one unchanging autoregressive Transformer paired with one unchanging context-management method processes inputs of any length step by step. Real LLM deployments match the fixed-system setting, so scaling-family results supply only resource bounds and do not prove Turing-completeness for practical systems. Different context-management methods produce sharply different computational abilities even for identical base models. This distinction shows that context management is the component that actually sets the computational power of deployed Transformers.

Core claim

Existing proofs of Transformer Turing-completeness are established in the scaling-family setting, but real-world LLM deployment and the standard notion of Turing-completeness correspond to the fixed-system setting in which a single fixed autoregressive Transformer is coupled with a fixed context-management method; therefore scaling-family results do not establish Turing-completeness, and different context-management methods yield sharply different computational power.

What carries the argument

The fixed-system setting, in which a single autoregressive Transformer uses a fixed context-management method to process inputs of varying lengths step by step.

If this is right

Scaling-family proofs supply resource bounds but do not establish Turing-completeness for fixed deployed systems.
Context-management methods can produce sharply different computational power for the same base Transformer architecture.
Real-world LLMs' ability to perform arbitrary computations hinges on the particular context-management method in use.
Standard Turing-completeness aligns with fixed systems rather than families of models that scale with input length.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designing stronger fixed context-management rules could raise the effective computational reach of current LLMs without enlarging model size or precision.
Examining the concrete context handlers in popular deployed models would reveal their practical computational limits more clearly than scaling arguments do.
Future constructions could identify the weakest context-management rules that achieve Turing-completeness inside the fixed-system setting.

Load-bearing premise

The premise that real-world LLM deployment and the standard notion of Turing-completeness correspond to the fixed-system setting of one unchanging Transformer and one fixed context-management method.

What would settle it

A concrete demonstration that a fixed autoregressive Transformer equipped with a standard fixed context-management method can or cannot simulate arbitrary Turing-machine computations on inputs of unbounded length would settle whether the fixed-system claim holds.

Figures

Figures reproduced from arXiv: 2605.19514 by Guanyu Cui, Kun He, Zhewei Wei.

**Figure 1.** Figure 1: illustrates the summarization-style context manager described above. Transformer xˆ Transformer xˆ =⇒ xˆ [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗

**Figure 2.** Figure 2: Illustration of appending-style context management. Left: decode a new token xˆ to append to the end of the sequence. Right: shift the window forward by 1 token. Other Methods and Remark on the Power of C. Beyond the two methods mentioned above, many other context-management mechanisms have been proposed. For example, many systems write important information to external storage and retrieve it when needed… view at source ↗

read the original abstract

Many works make the eye-catching claim that Transformers are Turing-complete. However, the literature often conflates two distinct settings: (i) a fixed Transformer system setting, in which a fixed autoregressive Transformer is coupled with a fixed context-management method to process inputs of different lengths step by step, and (ii) a scaling-family setting, in which a family of different models (with increasing context-window length or numerical precision) is used to handle different input lengths. Existing proofs of Transformer Turing-completeness are frequently established in setting (ii), whereas real-world LLM deployment and the standard notion of Turing-completeness correspond more naturally to setting (i). In this paper, we first formalize the fixed-system setting, thereby providing a concrete characterization of how real-world LLMs operate. We then argue that results proved in the scaling-family setting provide theoretically meaningful resource bounds but do not establish Turing-completeness, thereby clarifying a common misinterpretation of existing results. Finally, we show that different context-management methods can yield sharply different computational power, and we advocate the position that context management is a central component that critically determines the computational power of real-world autoregressive Transformers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper usefully separates scaling-family proofs from the fixed single-model setting that matches real LLM use, but it stays conceptual without new theorems or data.

read the letter

The main point worth knowing is that most existing proofs of Transformer Turing-completeness rely on a scaling family where you can grow context length or precision across different models, while actual deployed systems run one fixed autoregressive model plus whatever context management it uses. The authors formalize this split and say the scaling results give resource bounds but do not establish Turing-completeness for the fixed case that real LLMs inhabit. That distinction is the useful part of the work. They also note that swapping the context-management policy can produce quite different effective power, which follows from their definitions and lines up with how KV caches and truncation actually work in practice. Credit where due: the paper avoids overclaiming by acknowledging that the scaling results are still meaningful on their own terms, and it keeps the focus on how real deployments operate rather than chasing abstract universality. The soft spots are proportionate to the scope. This is a position paper, so there are no new proofs, constructions, or experiments showing exactly when a fixed system crosses into Turing-completeness or fails to. The argument that different context methods change computational power is presented conceptually rather than with concrete reductions or counterexamples that could be checked. The mapping of standard Turing-completeness to the fixed-system setting is reasonable but rests on that definitional choice, and readers who see context management as part of the overall system might push back. Overall the reasoning is internally consistent and does not rely on circularity or hidden fitting. This is for people who work on the theoretical limits of current LLMs and want to avoid loose claims when citing prior completeness results. A reader who already follows the literature on Transformer expressivity would get the most out of it as a clarifying note rather than a source of new techniques. It deserves a serious referee to test whether the formalization of the two settings holds up under scrutiny and whether the practical consequences for context management are stated sharply enough.

Referee Report

1 major / 3 minor

Summary. The paper claims that existing proofs of Turing-completeness for autoregressive Transformers typically rely on a scaling-family setting (varying models with increasing context length or precision), whereas real-world LLM deployment and the standard definition of Turing-completeness align with a fixed-system setting (a single fixed Transformer paired with a fixed context-management method). It formalizes the fixed-system setting, argues that scaling-family results yield only resource bounds rather than full Turing-completeness, demonstrates that context-management policies can produce sharply different computational powers, and concludes that context management is central to the power of real-world autoregressive Transformers.

Significance. If the fixed-system vs. scaling-family distinction is accepted, the position clarifies a frequent misinterpretation in the literature on Transformer expressivity. The formalization of the fixed-system setting supplies a concrete characterization matching practical deployment, and the observation that context management determines computational power is a direct consequence of the definitions rather than a new theorem. This could usefully redirect future work toward analyzing specific context policies instead of blanket Turing-completeness claims.

major comments (1)

The central argument rests on the premise that real-world deployment corresponds to the fixed-system setting; however, the manuscript does not address hybrid deployments (e.g., periodic model scaling or dynamic context resizing) that might blur the boundary between the two settings.

minor comments (3)

Abstract: the phrase 'eye-catching claim' is colloquial; a more neutral formulation such as 'frequently asserted claim' would better suit the journal's tone.
The formalization of the fixed-system setting (likely §3) would benefit from an explicit notation for the context-management function to make the dependence on policy clearer.
The discussion of differing computational powers induced by context-management methods could be strengthened by a short comparative table or example computation for at least two policies.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback on our position paper. We address the single major comment below and have incorporated a targeted revision to strengthen the manuscript's treatment of boundary cases.

read point-by-point responses

Referee: The central argument rests on the premise that real-world deployment corresponds to the fixed-system setting; however, the manuscript does not address hybrid deployments (e.g., periodic model scaling or dynamic context resizing) that might blur the boundary between the two settings.

Authors: We agree that hybrid deployments constitute a relevant boundary case not explicitly discussed in the original manuscript. In the revised version we add a short paragraph (new text in Section 2) noting that such hybrids typically consist of a sequence of fixed-system invocations or bounded dynamic adjustments to context length or precision. Because each individual computation trace still operates under a fixed context-management policy and a single model instance at any given step, the core distinction between fixed-system and scaling-family settings remains intact; unbounded external scaling is still absent. This clarification does not alter our main claims but directly responds to the referee's observation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims follow from explicit definitions

full rationale

The paper is a position piece that first defines two distinct settings (fixed-system vs. scaling-family) for assessing Transformer Turing-completeness, then argues that real-world deployment and the standard TC notion align with the fixed-system setting while prior proofs apply only to scaling-family. This mapping and the subsequent observation that context-management policies affect computational power are direct consequences of the supplied formalizations rather than reductions to self-referential equations, fitted parameters, or load-bearing self-citations. No derivation chain collapses by construction; the central position is self-contained against external benchmarks of practical LLM usage and the conventional definition of Turing-completeness.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The argument rests on domain assumptions about what counts as real-world deployment and standard Turing-completeness; no numerical free parameters or new entities are introduced.

axioms (1)

domain assumption Real-world LLM deployment corresponds to a fixed autoregressive Transformer coupled with a fixed context-management method.
This mapping is invoked to argue that scaling-family proofs do not establish Turing-completeness for practical systems.

pith-pipeline@v0.9.0 · 5747 in / 1135 out tokens · 51879 ms · 2026-05-20T06:24:39.184352+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We formalize a computational model for a fixed Transformer system (T, D, C) ... summarization-style context management leads to ... constant-space ... appending-style ... linear space
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

different context-management methods can yield sharply different computational power

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.