pith. machine review for the scientific record.

arxiv: 2604.05469 · v1 · submitted 2026-04-07 · 📊 stat.ME · cs.LG · stat.ML

Recognition: 3 theorem links · Lean Theorem

Task Ecologies and the Evolution of World-Tracking Representations in Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 19:30 UTC · model grok-4.3

classification 📊 stat.ME · cs.LG · stat.ML
keywords language models · representational selection · ecological veridicality · Jensen-Shannon excess · next-token prediction · equivalence classes · world-tracking · task ecologies
0 comments

The pith

Language models develop world-tracking representations exactly when their encodings preserve the equivalence classes of the training ecology.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that for any encoding of latent world states, the Bayes-optimal next-token cross-entropy decomposes into irreducible conditional entropy plus a Jensen-Shannon excess term. This excess vanishes if and only if the encoding preserves the training ecology's equivalence classes, which gives a precise notion of ecological veridicality. A sympathetic reader cares because it explains when autoregressive learning yields representations that track the world rather than just fitting surface patterns. It also predicts specific failure modes: loss of distinctions under simplicity pressure, and excess error on deployment ecologies that refine the training one.

Core claim

For any encoding of latent world states, the Bayes-optimal next-token cross-entropy decomposes into the irreducible conditional entropy plus a Jensen-Shannon excess term. That excess vanishes if and only if the encoding preserves the training ecology's equivalence classes. This yields a precise notion of ecological veridicality for language models and identifies the minimum-complexity zero-excess solution as the quotient partition by training equivalence. The framework applies to frozen dense and frozen Mixture-of-Experts transformers, with in-context learning not enlarging the separation set and per-task adaptation breaking the premise. It predicts two characteristic failure modes: simplicity pressure preferentially removes low-gain distinctions, and training-optimal models can still incur positive excess on deployment ecologies that refine the training ecology. A conditional dynamic extension shows how inter-model selection and post-training can recover such gap distinctions.
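The page does not reproduce the paper's equations, so the following is a plausible rendering of the claimed decomposition, with all notation (state prior w, encoding e, next token X, cells z) assumed here rather than taken from the source:

\[
  \mathrm{CE}(e) \;=\; \underbrace{H(X \mid S)}_{\text{irreducible}} \;+\; \underbrace{\sum_{z} w_z\, \mathrm{JS}_{w}\bigl(\{\, p(\cdot \mid s) : e(s)=z \,\}\bigr)}_{\text{Jensen--Shannon excess}},
  \qquad w_z = \sum_{s:\, e(s)=z} w_s,
\]
\[
  \mathrm{JS}_{w}\bigl(\{P_s\}\bigr) \;=\; H\Bigl(\sum_{s} \tfrac{w_s}{w_z} P_s\Bigr) \;-\; \sum_{s} \tfrac{w_s}{w_z} H(P_s) \;\ge\; 0 .
\]

The excess is a weighted Jensen-Shannon divergence of the conditionals merged into each cell, so it is zero exactly when no cell mixes states with different next-token laws, i.e. when e preserves the equivalence s ~ s' iff p(· | s) = p(· | s').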

What carries the argument

The Jensen-Shannon excess term in the next-token cross-entropy decomposition, which measures how far the encoding departs from the training ecology's equivalence classes and vanishes only for veridical encodings.

Load-bearing premise

The fixed-encoding analysis requires frozen models without per-task adaptation, and the dynamic extension assumes explicit heredity, variation, and selection mechanisms.
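The heredity/variation/selection premise can be made concrete with a toy Wright–Fisher loop, in the spirit of the microgpt experiment shown in Figure 3 but far simpler: here the "organisms" are partitions of an invented four-state ecology, variation is a random split or merge, and fitness decays with the Jensen-Shannon excess. Everything below (states, conditionals, rates, the fitness sharpness 20) is an illustrative assumption, not the paper's setup.

import math
import random

# Toy Wright-Fisher dynamics over encodings (illustrative only; the paper's
# experiment evolves microgpt models, and its operators are its own).
COND = {"a": (0.9, 0.1), "b": (0.9, 0.1), "c": (0.5, 0.5), "d": (0.1, 0.9)}
W = {s: 0.25 for s in COND}  # uniform state prior

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def js_excess(partition):
    """Weighted Jensen-Shannon excess of an encoding (a set partition)."""
    total = 0.0
    for cell in partition:
        wz = sum(W[s] for s in cell)
        mix = [sum(W[s] / wz * COND[s][k] for s in cell) for k in (0, 1)]
        total += wz * (entropy(mix) - sum(W[s] / wz * entropy(COND[s]) for s in cell))
    return total

def mutate(partition):
    """Variation: merge two random cells, or split one state off a cell."""
    cells = [set(c) for c in partition]
    if len(cells) > 1 and random.random() < 0.5:
        i, j = random.sample(range(len(cells)), 2)
        cells[i] |= cells[j]
        del cells[j]
    else:
        splittable = [c for c in cells if len(c) > 1]
        if splittable:
            cell = random.choice(splittable)
            s = random.choice(sorted(cell))
            cell.remove(s)
            cells.append({s})
    return tuple(frozenset(c) for c in cells)

random.seed(0)
pop = [(frozenset(COND),)] * 16  # heredity: clones of one coarse encoding
for _ in range(300):
    pop = [mutate(p) if random.random() < 0.3 else p for p in pop]  # variation
    weights = [math.exp(-20.0 * js_excess(p)) for p in pop]         # selection
    pop = random.choices(pop, weights=weights, k=len(pop))          # resampling

best = min(pop, key=js_excess)
print(sorted(sorted(c) for c in best), round(js_excess(best), 6))

Under this fitness, merging a with b is selectively neutral, so that distinction can drift away, a crude analogue of the simplicity-pressure failure mode; distinctions with predictive gain (c versus d) are what selection retains.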

What would settle it

Controlled experiments on small language models with known finite ecologies where one can directly compute whether the Jensen-Shannon excess is zero exactly when the encoding matches the equivalence classes, or positive on refined deployment data.
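A minimal, fully observable version of that check, on an invented four-state ecology (not from the paper): enumerate every partition of the states, compute the excess exactly, and confirm it vanishes precisely for the class-preserving encodings, with the quotient partition as the coarsest zero-excess solution. The sketch assumes the decomposition as rendered above; names and numbers are illustrative.

import math

COND = {  # state -> conditional next-token distribution
    "a": (0.9, 0.1),
    "b": (0.9, 0.1),  # a ~ b: one training-equivalence class
    "c": (0.5, 0.5),
    "d": (0.1, 0.9),
}
W = {s: 1.0 / len(COND) for s in COND}  # uniform state prior

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def js_excess(partition):
    """Weighted Jensen-Shannon excess of an encoding (a set partition)."""
    total = 0.0
    for cell in partition:
        wz = sum(W[s] for s in cell)
        mix = [sum(W[s] / wz * COND[s][k] for s in cell) for k in (0, 1)]
        total += wz * (entropy(mix) - sum(W[s] / wz * entropy(COND[s]) for s in cell))
    return total

def partitions(items):
    """All set partitions of a list (Bell-number many; fine for 4 states)."""
    if not items:
        yield []
        return
    head, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [head]] + part[i + 1:]
        yield part + [[head]]

def preserves_classes(partition):
    # No cell may mix states with different conditional next-token laws.
    return all(len({COND[s] for s in cell}) == 1 for cell in partition)

zero_excess = []
for part in partitions(list(COND)):
    assert (js_excess(part) < 1e-12) == preserves_classes(part)
    if preserves_classes(part):
        zero_excess.append(part)

quotient = min(zero_excess, key=len)  # coarsest zero-excess encoding
print(sorted(sorted(cell) for cell in quotient))  # [['a', 'b'], ['c'], ['d']]

On this toy the zero-excess encodings are exactly the quotient [['a','b'],['c'],['d']] and its singleton refinement; a deployment ecology that split the a ~ b class into distinct conditionals would hand the quotient positive excess, which is the off-ecology failure pattern the paper predicts.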

Figures

Figures reproduced from arXiv: 2604.05469 by Giulio Valentino Dalla Riva.

Figure 1: Exact finite-ecology calibration of Thm. 8. Each point is a discrete ecol… [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗
Figure 2: Empirical-corpus corroboration of the static theory. Left: the exact empirical… [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3: Selection-stage diagnostics in the microgpt Wright–Fisher experiment. Left:… [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗
Figure 4: Neural validation of the two-ecology mechanism on bracket balance in Lisp source… [PITH_FULL_IMAGE:figures/full_fig_p028_4.png] view at source ↗
Figure 5: Exact finite-ecology calibration of Thm. 55. The plotted quantity is the objective… [PITH_FULL_IMAGE:figures/full_fig_p035_5.png] view at source ↗
Figure 6: Exact corpus-induced test of Thm. 55. Left: the exact global optimum path under… [PITH_FULL_IMAGE:figures/full_fig_p035_6.png] view at source ↗
Figure 7: Off-ecology failure in the microgpt model organism. Left: per-model cross-entropy… [PITH_FULL_IMAGE:figures/full_fig_p036_7.png] view at source ↗
read the original abstract

We study language models as evolving model organisms and ask when autoregressive next-token learning selects for world-tracking representations. For any encoding of latent world states, the Bayes-optimal next-token cross-entropy decomposes into the irreducible conditional entropy plus a Jensen-Shannon excess term. That excess vanishes if and only if the encoding preserves the training ecology's equivalence classes. This yields a precise notion of ecological veridicality for language models and identifies the minimum-complexity zero-excess solution as the quotient partition by training equivalence. We then determine when this fixed-encoding analysis applies to transformer families: frozen dense and frozen Mixture-of-Experts transformers satisfy it, in-context learning does not enlarge the model's separation set, and per-task adaptation breaks the premise. The framework predicts two characteristic failure modes: simplicity pressure preferentially removes low-gain distinctions, and training-optimal models can still incur positive excess on deployment ecologies that refine the training ecology. A conditional dynamic extension shows how inter-model selection and post-training can recover such gap distinctions under explicit heredity, variation, and selection assumptions. Exact finite-ecology checks and controlled microgpt experiments validate the static decomposition, split-merge threshold, off-ecology failure pattern, and two-ecology rescue mechanism in a regime where the relevant quantities are directly observable. The goal is not to model frontier systems at scale, but to use small language models as laboratory organisms for theory about representational selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper develops an information-theoretic framework for when autoregressive next-token prediction in language models selects for world-tracking representations. For any encoding of latent world states, the Bayes-optimal cross-entropy decomposes into irreducible conditional entropy plus a Jensen-Shannon excess term; the excess is zero if and only if the encoding preserves the training ecology's equivalence classes (the quotient partition). This yields a notion of ecological veridicality, identifies the minimal zero-excess solution, states applicability conditions for frozen dense and MoE transformers (in-context learning does not enlarge the separation set; per-task adaptation breaks the premise), predicts two failure modes (simplicity pressure removes low-gain distinctions; training-optimal models can still incur excess on refined deployment ecologies), and offers a conditional dynamic extension under explicit heredity/variation/selection. The claims are supported by exact finite-ecology checks and controlled microgpt experiments that validate the static decomposition, split-merge threshold, off-ecology failure pattern, and two-ecology rescue mechanism.

Significance. If the decomposition and its consequences hold, the work supplies a precise, observable, and parameter-free criterion for representational fidelity that depends only on the training distribution's equivalence classes rather than external world models or fitted parameters. The framing of small language models as laboratory organisms for theory, together with the exact checks and microgpt validation in a fully observable regime, is a genuine strength. The applicability statements and failure-mode predictions are scoped clearly rather than overclaimed. The result is likely to be useful for analyzing representational selection in both static and evolutionary settings.

minor comments (3)
  1. The abstract and introduction use 'ecological veridicality' and 'Jensen-Shannon excess term' without an early forward reference to the precise definitions (presumably in §2 or §3); adding one sentence that points to the relevant equations would improve readability for readers outside information theory.
  2. The dynamic extension is presented as 'conditional' under explicit heredity/variation/selection; a short paragraph clarifying which of these assumptions are necessary versus sufficient for the rescue mechanism would prevent over-interpretation.
  3. Figure captions for the microgpt experiments should explicitly state the finite ecology size and the observable quantities used to compute the excess term, even if they appear in the main text.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading, accurate summary of the framework, and positive assessment. The recommendation for minor revision is appreciated. No major comments were raised in the report, so we have no specific points requiring rebuttal or revision at this stage. We will address any minor editorial suggestions in the revised manuscript.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The central decomposition of Bayes-optimal next-token cross-entropy into irreducible conditional entropy plus a Jensen-Shannon excess term follows directly from standard information-theoretic identities (conditional entropy and divergence between predictive distributions). The equivalence classes are defined externally from the training ecology's conditional next-token distributions, and the 'vanishes iff preserves classes' statement is a direct mathematical consequence of sufficiency rather than a redefinition or self-referential fit. Applicability conditions for frozen transformers, in-context learning, and the dynamic extension are stated with explicit scope and assumptions. Finite-ecology checks and microgpt experiments use observable quantities independent of the target result. No load-bearing self-citations, fitted inputs renamed as predictions, or ansatzes smuggled via prior work appear in the derivation chain.
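For readers checking the audit's "standard identities" claim, the chain presumably runs as follows, in the assumed notation from the decomposition above, with Z = e(S) the encoding cell and \bar p_z the within-cell mixture of conditionals:

\[
  \mathbb{E}\bigl[-\log \bar p_{Z}(X)\bigr] \;=\; H(X \mid Z) \;=\; H(X \mid S) \;+\; I(X ; S \mid Z),
\]
since Z is a function of S, and
\[
  I(X ; S \mid Z) \;=\; \sum_{z} w_z \Bigl[\, H(\bar p_z) \;-\; \sum_{s:\, e(s)=z} \tfrac{w_s}{w_z}\, H\bigl(p(\cdot \mid s)\bigr) \Bigr],
\]
which is exactly the weighted Jensen-Shannon excess term; the "vanishes iff classes are preserved" direction is then the equality condition of Jensen's inequality for the strictly concave entropy.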

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The framework rests on standard information-theoretic identities and domain assumptions about autoregressive training and transformer architectures; it introduces new terminology but no fitted numerical parameters or entities with external falsifiable handles.

axioms (2)
  • domain assumption Language models are trained under autoregressive next-token prediction
    Stated as the core learning objective for the model organisms under study.
  • domain assumption Training data defines a task ecology with well-defined equivalence classes
    Central premise required for the vanishing condition on the excess term.
invented entities (2)
  • ecological veridicality no independent evidence
    purpose: Precise notion of when an encoding preserves training equivalence classes so that excess cross-entropy vanishes
    Newly defined within the paper to capture the zero-excess condition.
  • Jensen-Shannon excess term no independent evidence
    purpose: Quantifies additional prediction error arising from failure to preserve training equivalence classes
    Derived directly from the decomposition of Bayes-optimal cross-entropy.

pith-pipeline@v0.9.0 · 5551 in / 1714 out tokens · 92013 ms · 2026-05-10T19:30:08.011932+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

  1. [1] Alexander Atanasov, Blake Bordelon, and Cengiz Pehlevan. Neural networks as kernel learners: The silent alignment effect. In The Tenth International Conference on Learning Representations (ICLR 2022 poster). https://openreview.net/forum?id=1NvflqAdoom

  2. [2] Jack Lindsey. Emergent introspective awareness in large language models. Transformer Circuits Thread, 2025. https://transformer-circuits.pub/2025/introspection/index.html

  3. [3] Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, and Jacob Steinhardt. Progress measures for grokking via mechanistic interpretability. In The Eleventh International Conference on Learning Representations (ICLR 2023, notable top 25%). https://openreview.net/forum?id=9XFSbDPmdW

  4. [4] arXiv:2501.00226 [cs.AI], first submitted December 31, 2024; revised July 16, 2025. https://arxiv.org/abs/2501.00226