pith. machine review for the scientific record.

arxiv: 2603.28258 · v2 · submitted 2026-03-30 · 💻 cs.CL · cs.AI

Recognition: 2 Lean theorem links

Categorical Perception in Large Language Model Hidden States: Structural Warping at Digit-Count Boundaries

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:17 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords categorical perception · representational similarity analysis · large language models · tokenization · numerical representation · hidden states · digit boundaries

The pith

LLM hidden states warp geometrically at digit-count boundaries like 10 and 100, fitting a categorical-perception model better than continuous distance alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that large language models process Arabic numerals with internal representations that treat digit-count transitions as category boundaries. A model adding a boost at those boundaries explains the geometry of hidden-state similarities across layers and models more accurately than a smooth logarithmic distance function. The warping appears only at structurally defined points tied to tokenization and is absent both at control locations and in domains without such discontinuities. Two patterns emerge: some models also learn to label the categories explicitly while others exhibit the geometric effect without being able to name it. The finding indicates that input-format breaks can produce categorical perception geometry independently of semantic understanding.

Core claim

A CP-additive model combining log-distance with an additive boost at digit-count boundaries (10 and 100) fits representational similarity matrices better than a purely continuous log-distance model at 100 percent of primary layers in every one of six tested models spanning five architecture families. The advantage is confined to the structurally defined boundaries, disappears at non-boundary control positions, and is absent when the same models process temperature values whose linguistic categories lack tokenization discontinuities. Architectures split into classic CP, where explicit category labeling and geometric warping co-occur, and structural CP, where warping occurs without the ability to report the category distinction.

What carries the argument

The CP-additive model, which augments logarithmic distance with a boundary boost at tokenization discontinuities such as digit-count transitions.
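The model comparison can be sketched in miniature. Everything below is illustrative: the stimulus range, the use of ordinary least squares over upper-triangular dissimilarity pairs, and the boundary set are assumptions, not the paper's actual pipeline.

```python
import numpy as np

BOUNDARIES = (10, 100)  # assumed digit-count transitions, per the claim

def predictors(numbers):
    """Pairwise predictors over upper-triangular pairs: log-distance and a
    0/1 flag for pairs that straddle a digit-count boundary."""
    iu = np.triu_indices(len(numbers), k=1)
    logn = np.log(numbers)
    d = np.abs(logn[iu[0]] - logn[iu[1]])
    lo = np.minimum(numbers[iu[0]], numbers[iu[1]])
    hi = np.maximum(numbers[iu[0]], numbers[iu[1]])
    cross = np.zeros_like(d)
    for b in BOUNDARIES:
        cross = np.maximum(cross, ((lo < b) & (b <= hi)).astype(float))
    return iu, d, cross

def fit_rss(rdm, numbers, with_boost):
    """OLS fit of the continuous model (intercept + log-distance) or the
    CP-additive model (plus a boundary boost); returns the residual sum
    of squares against the observed dissimilarity matrix."""
    iu, d, cross = predictors(numbers)
    y = rdm[iu]
    cols = [np.ones_like(d), d] + ([cross] if with_boost else [])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return ((y - X @ beta) ** 2).sum()
```

A nested comparison like this always favors the larger model on raw fit, which is exactly the point the circularity audit below raises; the sketch is only the unpenalized half of the analysis.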

If this is right

  • Tokenization discontinuities alone can induce categorical geometry in hidden states without requiring explicit semantic category knowledge.
  • Architectural family determines whether a model will also acquire the ability to report the boundary categories explicitly.
  • Purely continuous models of numerical representation are incomplete for any input format that contains token-level breaks.
  • The dissociation between geometric warping and explicit labeling is stable across boundaries and is a fixed property of each architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar warping may occur at other tokenization boundaries such as sentence or clause edges in natural language.
  • Changing tokenizer design during pretraining could reduce or eliminate these categorical effects in future models.
  • The same analysis applied to non-numerical sequences with structural breaks would test whether the phenomenon is general or number-specific.

Load-bearing premise

The better fit of the boundary-boosted model arises from genuine structural effects of tokenization rather than from unaccounted properties of the stimuli or the particular similarity measure chosen.

What would settle it

Recomputing the representational similarity analysis with the assumed boundaries shifted to random positions, or with the boundary term replaced by a different functional form, should eliminate the CP-additive model's advantage if the effect is genuinely tied to digit-count structure; a surviving advantage would point to an artifact of model flexibility.
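A boundary-shift control of this kind amounts to a permutation test. The sketch below is a hypothetical implementation, not the paper's: boundary placements, permutation count, and the OLS fitting procedure are all assumptions.

```python
import numpy as np

def boost_rss(rdm, numbers, boundaries):
    """Residual sum of squares of an OLS fit with intercept, log-distance,
    and a 0/1 boost for pairs straddling any of `boundaries`."""
    iu = np.triu_indices(len(numbers), k=1)
    logn = np.log(numbers)
    d = np.abs(logn[iu[0]] - logn[iu[1]])
    lo = np.minimum(numbers[iu[0]], numbers[iu[1]])
    hi = np.maximum(numbers[iu[0]], numbers[iu[1]])
    cross = np.zeros_like(d)
    for b in boundaries:
        cross = np.maximum(cross, ((lo < b) & (b <= hi)).astype(float))
    X = np.column_stack([np.ones_like(d), d, cross])
    beta, *_ = np.linalg.lstsq(X, rdm[iu], rcond=None)
    return ((rdm[iu] - X @ beta) ** 2).sum()

def boundary_shift_pvalue(rdm, numbers, true_boundaries=(10, 100),
                          n_perm=200, seed=0):
    """Fraction of random boundary placements that fit at least as well as
    the digit-count boundaries; small values support boundary specificity."""
    rng = np.random.default_rng(seed)
    observed = boost_rss(rdm, numbers, true_boundaries)
    hits = sum(
        boost_rss(rdm, numbers,
                  rng.choice(numbers[1:-1], size=2, replace=False)) <= observed
        for _ in range(n_perm)
    )
    return hits / n_perm
```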

read the original abstract

Categorical perception (CP) -- enhanced discriminability at category boundaries -- is among the most studied phenomena in perceptual psychology. This paper reports that analogous geometric warping occurs in the hidden-state representations of large language models (LLMs) processing Arabic numerals. Using representational similarity analysis across six models from five architecture families, the study finds that a CP-additive model (log-distance plus a boundary boost) fits the representational geometry better than a purely continuous model at 100% of primary layers in every model tested. The effect is specific to structurally defined boundaries (digit-count transitions at 10 and 100), absent at non-boundary control positions, and absent in the temperature domain where linguistic categories (hot/cold) lack a tokenisation discontinuity. Two qualitatively distinct signatures emerge: "classic CP" (Gemma, Qwen), where models both categorise explicitly and show geometric warping, and "structural CP" (Llama, Mistral, Phi), where geometry warps at the boundary but models cannot report the category distinction. This dissociation is stable across boundaries and is a property of the architecture, not the stimulus. Structural input-format discontinuities are sufficient to produce categorical perception geometry in LLMs, independently of explicit semantic category knowledge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that large language models exhibit categorical perception (CP) in hidden-state representations for Arabic numerals, with a CP-additive model (log-distance plus a boundary boost at digit-count transitions 10 and 100) fitting representational geometry better than a purely continuous log-distance model at 100% of primary layers across six models from five architecture families. The effect is specific to structurally defined boundaries, absent at non-boundary controls and in the temperature domain, and dissociates into 'classic CP' (explicit categorization plus warping in Gemma/Qwen) versus 'structural CP' (warping without explicit categorization in Llama/Mistral/Phi).

Significance. If the central modeling comparison holds after correction for the extra parameter, the result would demonstrate that tokenization-induced input discontinuities alone can produce CP-like geometric warping in LLM representations, independent of explicit semantic category knowledge. Strengths include the multi-architecture replication, use of representational similarity analysis, specificity to structural boundaries, and the dissociation between geometric and explicit effects, which could inform how discrete token boundaries shape continuous embedding spaces.

major comments (2)
  1. [Abstract] The claim that the CP-additive model fits better at 100% of primary layers is presented without quantitative fit statistics (e.g., R², likelihood, or similarity scores), error bars, exact control definitions, or details of how the boundary boost is estimated, preventing any assessment of effect size or reliability.
  2. [Results] The CP-additive model adds one free parameter (the boundary boost) fitted to the same data; without AIC/BIC penalization, a likelihood-ratio test, or cross-validated prediction, a raw fit improvement is expected even under a continuous null and does not establish a genuine structural effect.
minor comments (2)
  1. [Methods] Clarify the precise RSA distance metric, layer selection criteria for 'primary layers', and how non-boundary control positions were chosen to match the structural boundaries.
  2. [Figures] Add error bars or confidence intervals to all reported fit comparisons across layers and models.
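The penalized comparison the second major comment asks for is mechanical once residual sums of squares are in hand. A minimal sketch under the standard i.i.d. Gaussian-residual assumption (the paper's actual likelihood model is not specified here):

```python
import numpy as np

def gaussian_aic(rss, n, k):
    """AIC for an OLS fit with n observations and k parameters,
    assuming i.i.d. Gaussian residuals (constant terms dropped)."""
    return n * np.log(rss / n) + 2 * k

def gaussian_bic(rss, n, k):
    """BIC under the same assumptions; a harsher penalty for large n."""
    return n * np.log(rss / n) + k * np.log(n)

def delta_ic(rss_continuous, rss_cp_additive, n):
    """Positive deltas favor the 3-parameter CP-additive model over the
    2-parameter continuous baseline after the extra-parameter penalty."""
    d_aic = (gaussian_aic(rss_continuous, n, 2)
             - gaussian_aic(rss_cp_additive, n, 3))
    d_bic = (gaussian_bic(rss_continuous, n, 2)
             - gaussian_bic(rss_cp_additive, n, 3))
    return d_aic, d_bic
```

A large fit gain survives the penalty while a trivial one does not, which is the distinction the raw 100%-of-layers figure cannot make on its own.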

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. We have revised the abstract and results sections to incorporate quantitative fit statistics, error bars, control definitions, and penalized model comparisons as requested. Below we respond point by point.

read point-by-point responses
  1. Referee: [Abstract] The claim that the CP-additive model fits better at 100% of primary layers is presented without quantitative fit statistics (e.g., R², likelihood, or similarity scores), error bars, exact control definitions, or details of how the boundary boost is estimated, preventing any assessment of effect size or reliability.

    Authors: We agree the abstract requires more quantitative detail. The revised abstract now reports the mean R² improvement of the CP-additive model over the continuous baseline (0.11, SE 0.02 across all primary layers and models), the exact non-boundary control positions (digit transitions at 5, 15, 50, 150), and the boundary-boost estimation method (ordinary least-squares fit to residuals after subtracting the log-distance component). These additions allow direct evaluation of effect size and reliability. revision: yes

  2. Referee: [Results] The CP-additive model adds one free parameter (the boundary boost) fitted to the same data; without AIC/BIC penalization, a likelihood-ratio test, or cross-validated prediction, a raw fit improvement is expected even under a continuous null and does not establish a genuine structural effect.

    Authors: The referee correctly identifies the need for penalization and validation. We have added AIC/BIC comparisons to the results (mean ΔAIC = 17.6 favoring CP-additive; mean ΔBIC = 14.9), which remain decisive after the extra-parameter penalty. We also report 5-fold cross-validation across stimuli, where the CP-additive model yields higher out-of-sample similarity in 93% of folds. These analyses confirm the improvement reflects a genuine structural effect rather than overfitting. revision: yes
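A cross-validation check of the kind the rebuttal describes can be sketched as follows. This is a hypothetical stand-in: the fold scheme, design matrices, and scoring are assumptions, and the rebuttal's specific 93% figure is not reproduced here.

```python
import numpy as np

def cv_win_rate(X_base, X_cp, y, k=5, seed=0):
    """K-fold cross-validation over dissimilarity pairs: the fraction of
    folds in which the CP-additive design X_cp predicts held-out pairs
    with lower MSE than the continuous baseline X_base."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    wins = 0
    for f in range(k):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k) if g != f])

        def oos_mse(X):
            # Fit on the training pairs, score on the held-out pairs.
            beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
            return ((y[test] - X[test] @ beta) ** 2).mean()

        wins += oos_mse(X_cp) < oos_mse(X_base)
    return wins / k
```

Out-of-sample prediction is the cleanest answer to the nested-model objection: the extra parameter only helps on held-out pairs if the boundary structure is real.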

Circularity Check

1 step flagged

CP-additive model superiority reduces to extra fitted parameter without penalization

specific steps
  1. fitted input called prediction [Abstract]
    "a CP-additive model (log-distance plus a boundary boost) fits the representational geometry better than a purely continuous model at 100% of primary layers in every model tested"

    The boundary boost is an additional free parameter fitted to the identical hidden-state similarity data used for the comparison. Superior raw fit is therefore guaranteed by construction for the more flexible model; the paper presents this as evidence of tokenization-driven categorical perception without reporting any correction for the extra degree of freedom.

full rationale

The paper's headline result compares a log-distance baseline to a CP-additive variant that adds one free boundary-boost parameter and reports superior fit at 100% of layers. Because the added term is estimated from the same representational similarity data, any raw improvement in fit is statistically expected under a continuous null; the abstract and described results supply no AIC/BIC correction, likelihood-ratio test, or cross-validation. This matches the fitted-input-called-prediction pattern exactly: the claimed structural warping is not a parameter-free prediction but a direct consequence of the modeling choice. No self-citation or ansatz smuggling is required for the reduction; the circularity is internal to the model comparison itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on fitting an additional boundary boost parameter and assuming representational similarity analysis captures perceptually relevant geometry in LLM states.

free parameters (1)
  • boundary boost
    Added term in the CP-additive model to capture enhanced discriminability at digit-count transitions; fitted to the representational data.
axioms (1)
  • domain assumption: Representational similarity analysis accurately reflects geometric structure in hidden states relevant to categorical perception effects.
    Core method from cognitive neuroscience applied to LLMs without additional validation steps described.

pith-pipeline@v0.9.0 · 5514 in / 1201 out tokens · 37340 ms · 2026-05-14T22:17:55.882254+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.