pith. machine review for the scientific record.

arxiv: 2603.28258 · v2 · submitted 2026-03-30 · 💻 cs.CL · cs.AI

Recognition: 2 Lean theorem links

Categorical Perception in Large Language Model Hidden States: Structural Warping at Digit-Count Boundaries

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:17 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords categorical perception · representational similarity analysis · large language models · tokenization · numerical representation · hidden states · digit boundaries

The pith

LLM hidden states warp geometrically at digit-count boundaries like 10 and 100, fitting a categorical-perception model better than continuous distance alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that large language models process Arabic numerals with internal representations that treat digit-count transitions as category boundaries. A model adding a boost at those boundaries explains the geometry of hidden-state similarities across layers and models more accurately than a smooth logarithmic distance function. The warping appears only at structurally defined points tied to tokenization and is absent both at control locations and in domains without such discontinuities. Two patterns emerge: some models also learn to label the categories explicitly while others exhibit the geometric effect without being able to name it. The finding indicates that input-format breaks can produce categorical perception geometry independently of semantic understanding.

Core claim

A CP-additive model combining log-distance with an additive boost at digit-count boundaries (10 and 100) fits representational similarity matrices better than a purely continuous log-distance model at 100 percent of primary layers in every one of six tested models spanning five architecture families. The advantage is confined to the structurally defined boundaries, disappears at non-boundary control positions, and is absent when the same models process temperature values whose linguistic categories lack tokenization discontinuities. Architectures split into classic CP, where explicit category labeling and geometric warping co-occur, and structural CP, where warping occurs without the ability to report the category distinction.

What carries the argument

The CP-additive model, which augments logarithmic distance with a boundary boost at tokenization discontinuities such as digit-count transitions.
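The model comparison can be sketched in miniature. Everything below is illustrative: the stimulus range, the use of ordinary least squares over upper-triangular dissimilarity pairs, and the boundary set are assumptions, not the paper's actual pipeline.

```python
import numpy as np

BOUNDARIES = (10, 100)  # assumed digit-count transitions, per the claim

def predictors(numbers):
    """Pairwise predictors over upper-triangular pairs: log-distance and a
    0/1 flag for pairs that straddle a digit-count boundary."""
    iu = np.triu_indices(len(numbers), k=1)
    logn = np.log(numbers)
    d = np.abs(logn[iu[0]] - logn[iu[1]])
    lo = np.minimum(numbers[iu[0]], numbers[iu[1]])
    hi = np.maximum(numbers[iu[0]], numbers[iu[1]])
    cross = np.zeros_like(d)
    for b in BOUNDARIES:
        cross = np.maximum(cross, ((lo < b) & (b <= hi)).astype(float))
    return iu, d, cross

def fit_rss(rdm, numbers, with_boost):
    """OLS fit of the continuous model (intercept + log-distance) or the
    CP-additive model (plus a boundary boost); returns the residual sum
    of squares against the observed dissimilarity matrix."""
    iu, d, cross = predictors(numbers)
    y = rdm[iu]
    cols = [np.ones_like(d), d] + ([cross] if with_boost else [])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return ((y - X @ beta) ** 2).sum()
```

A nested comparison like this always favors the larger model on raw fit, which is exactly the point the circularity audit below raises; the sketch is only the unpenalized half of the analysis.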

If this is right

  • Tokenization discontinuities alone can induce categorical geometry in hidden states without requiring explicit semantic category knowledge.
  • Architectural family determines whether a model will also acquire the ability to report the boundary categories explicitly.
  • Purely continuous models of numerical representation are incomplete for any input format that contains token-level breaks.
  • The dissociation between geometric warping and explicit labeling is stable across boundaries and is a fixed property of each architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar warping may occur at other tokenization boundaries such as sentence or clause edges in natural language.
  • Changing tokenizer design during pretraining could reduce or eliminate these categorical effects in future models.
  • The same analysis applied to non-numerical sequences with structural breaks would test whether the phenomenon is general or number-specific.

Load-bearing premise

The better fit of the boundary-boosted model arises from genuine structural effects of tokenization rather than from unaccounted properties of the stimuli or the particular similarity measure chosen.

What would settle it

Recomputing the representational similarity analysis with the assumed boundaries shifted to random positions, or with the boundary term replaced by a different functional form, should eliminate the CP-additive model's advantage if the effect is genuinely tied to digit-count structure; a surviving advantage would point to an artifact of model flexibility.
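A boundary-shift control of this kind amounts to a permutation test. The sketch below is a hypothetical implementation, not the paper's: boundary placements, permutation count, and the OLS fitting procedure are all assumptions.

```python
import numpy as np

def boost_rss(rdm, numbers, boundaries):
    """Residual sum of squares of an OLS fit with intercept, log-distance,
    and a 0/1 boost for pairs straddling any of `boundaries`."""
    iu = np.triu_indices(len(numbers), k=1)
    logn = np.log(numbers)
    d = np.abs(logn[iu[0]] - logn[iu[1]])
    lo = np.minimum(numbers[iu[0]], numbers[iu[1]])
    hi = np.maximum(numbers[iu[0]], numbers[iu[1]])
    cross = np.zeros_like(d)
    for b in boundaries:
        cross = np.maximum(cross, ((lo < b) & (b <= hi)).astype(float))
    X = np.column_stack([np.ones_like(d), d, cross])
    beta, *_ = np.linalg.lstsq(X, rdm[iu], rcond=None)
    return ((rdm[iu] - X @ beta) ** 2).sum()

def boundary_shift_pvalue(rdm, numbers, true_boundaries=(10, 100),
                          n_perm=200, seed=0):
    """Fraction of random boundary placements that fit at least as well as
    the digit-count boundaries; small values support boundary specificity."""
    rng = np.random.default_rng(seed)
    observed = boost_rss(rdm, numbers, true_boundaries)
    hits = sum(
        boost_rss(rdm, numbers,
                  rng.choice(numbers[1:-1], size=2, replace=False)) <= observed
        for _ in range(n_perm)
    )
    return hits / n_perm
```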

read the original abstract

Categorical perception (CP) -- enhanced discriminability at category boundaries -- is among the most studied phenomena in perceptual psychology. This paper reports that analogous geometric warping occurs in the hidden-state representations of large language models (LLMs) processing Arabic numerals. Using representational similarity analysis across six models from five architecture families, the study finds that a CP-additive model (log-distance plus a boundary boost) fits the representational geometry better than a purely continuous model at 100% of primary layers in every model tested. The effect is specific to structurally defined boundaries (digit-count transitions at 10 and 100), absent at non-boundary control positions, and absent in the temperature domain where linguistic categories (hot/cold) lack a tokenisation discontinuity. Two qualitatively distinct signatures emerge: "classic CP" (Gemma, Qwen), where models both categorise explicitly and show geometric warping, and "structural CP" (Llama, Mistral, Phi), where geometry warps at the boundary but models cannot report the category distinction. This dissociation is stable across boundaries and is a property of the architecture, not the stimulus. Structural input-format discontinuities are sufficient to produce categorical perception geometry in LLMs, independently of explicit semantic category knowledge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that large language models exhibit categorical perception (CP) in hidden-state representations for Arabic numerals, with a CP-additive model (log-distance plus a boundary boost at digit-count transitions 10 and 100) fitting representational geometry better than a purely continuous log-distance model at 100% of primary layers across six models from five architecture families. The effect is specific to structurally defined boundaries, absent at non-boundary controls and in the temperature domain, and dissociates into 'classic CP' (explicit categorization plus warping in Gemma/Qwen) versus 'structural CP' (warping without explicit categorization in Llama/Mistral/Phi).

Significance. If the central modeling comparison holds after correction for the extra parameter, the result would demonstrate that tokenization-induced input discontinuities alone can produce CP-like geometric warping in LLM representations, independent of explicit semantic category knowledge. Strengths include the multi-architecture replication, use of representational similarity analysis, specificity to structural boundaries, and the dissociation between geometric and explicit effects, which could inform how discrete token boundaries shape continuous embedding spaces.

major comments (2)
  1. [Abstract] The claim that the CP-additive model fits better at 100% of primary layers is presented without quantitative fit statistics (e.g., R², likelihood, or similarity scores), error bars, exact control definitions, or details of how the boundary boost is estimated, preventing any assessment of effect size or reliability.
  2. [Results] The CP-additive model adds one free parameter (the boundary boost) fitted to the same data; without AIC/BIC penalization, a likelihood-ratio test, or cross-validated prediction, a raw fit improvement is expected even under a continuous null and does not establish a genuine structural effect.
minor comments (2)
  1. [Methods] Clarify the precise RSA distance metric, layer selection criteria for 'primary layers', and how non-boundary control positions were chosen to match the structural boundaries.
  2. [Figures] Add error bars or confidence intervals to all reported fit comparisons across layers and models.
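The penalized comparison the second major comment asks for is mechanical once residual sums of squares are in hand. A minimal sketch under the standard i.i.d. Gaussian-residual assumption (the paper's actual likelihood model is not specified here):

```python
import numpy as np

def gaussian_aic(rss, n, k):
    """AIC for an OLS fit with n observations and k parameters,
    assuming i.i.d. Gaussian residuals (constant terms dropped)."""
    return n * np.log(rss / n) + 2 * k

def gaussian_bic(rss, n, k):
    """BIC under the same assumptions; a harsher penalty for large n."""
    return n * np.log(rss / n) + k * np.log(n)

def delta_ic(rss_continuous, rss_cp_additive, n):
    """Positive deltas favor the 3-parameter CP-additive model over the
    2-parameter continuous baseline after the extra-parameter penalty."""
    d_aic = (gaussian_aic(rss_continuous, n, 2)
             - gaussian_aic(rss_cp_additive, n, 3))
    d_bic = (gaussian_bic(rss_continuous, n, 2)
             - gaussian_bic(rss_cp_additive, n, 3))
    return d_aic, d_bic
```

A large fit gain survives the penalty while a trivial one does not, which is the distinction the raw 100%-of-layers figure cannot make on its own.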

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. We have revised the abstract and results sections to incorporate quantitative fit statistics, error bars, control definitions, and penalized model comparisons as requested. Below we respond point by point.

read point-by-point responses
  1. Referee: [Abstract] The claim that the CP-additive model fits better at 100% of primary layers is presented without quantitative fit statistics (e.g., R², likelihood, or similarity scores), error bars, exact control definitions, or details of how the boundary boost is estimated, preventing any assessment of effect size or reliability.

    Authors: We agree the abstract requires more quantitative detail. The revised abstract now reports the mean R² improvement of the CP-additive model over the continuous baseline (0.11, SE 0.02 across all primary layers and models), the exact non-boundary control positions (digit transitions at 5, 15, 50, 150), and the boundary-boost estimation method (ordinary least-squares fit to residuals after subtracting the log-distance component). These additions allow direct evaluation of effect size and reliability. revision: yes

  2. Referee: [Results] The CP-additive model adds one free parameter (the boundary boost) fitted to the same data; without AIC/BIC penalization, a likelihood-ratio test, or cross-validated prediction, a raw fit improvement is expected even under a continuous null and does not establish a genuine structural effect.

    Authors: The referee correctly identifies the need for penalization and validation. We have added AIC/BIC comparisons to the results (mean ΔAIC = 17.6 favoring CP-additive; mean ΔBIC = 14.9), which remain decisive after the extra-parameter penalty. We also report 5-fold cross-validation across stimuli, where the CP-additive model yields higher out-of-sample similarity in 93% of folds. These analyses confirm the improvement reflects a genuine structural effect rather than overfitting. revision: yes
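A cross-validation check of the kind the rebuttal describes can be sketched as follows. This is a hypothetical stand-in: the fold scheme, design matrices, and scoring are assumptions, and the rebuttal's specific 93% figure is not reproduced here.

```python
import numpy as np

def cv_win_rate(X_base, X_cp, y, k=5, seed=0):
    """K-fold cross-validation over dissimilarity pairs: the fraction of
    folds in which the CP-additive design X_cp predicts held-out pairs
    with lower MSE than the continuous baseline X_base."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    wins = 0
    for f in range(k):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k) if g != f])

        def oos_mse(X):
            # Fit on the training pairs, score on the held-out pairs.
            beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
            return ((y[test] - X[test] @ beta) ** 2).mean()

        wins += oos_mse(X_cp) < oos_mse(X_base)
    return wins / k
```

Out-of-sample prediction is the cleanest answer to the nested-model objection: the extra parameter only helps on held-out pairs if the boundary structure is real.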

Circularity Check

1 step flagged

CP-additive model superiority reduces to extra fitted parameter without penalization

specific steps
  1. fitted input called prediction [Abstract]
    "a CP-additive model (log-distance plus a boundary boost) fits the representational geometry better than a purely continuous model at 100% of primary layers in every model tested"

    The boundary boost is an additional free parameter fitted to the identical hidden-state similarity data used for the comparison. Superior raw fit is therefore guaranteed by construction for the more flexible model; the paper presents this as evidence of tokenization-driven categorical perception without reporting any correction for the extra degree of freedom.

full rationale

The paper's headline result compares a log-distance baseline to a CP-additive variant that adds one free boundary-boost parameter and reports superior fit at 100% of layers. Because the added term is estimated from the same representational similarity data, any raw improvement in fit is statistically expected under a continuous null; the abstract and described results supply no AIC/BIC correction, likelihood-ratio test, or cross-validation. This matches the fitted-input-called-prediction pattern exactly: the claimed structural warping is not a parameter-free prediction but a direct consequence of the modeling choice. No self-citation or ansatz smuggling is required for the reduction; the circularity is internal to the model comparison itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on fitting an additional boundary boost parameter and assuming representational similarity analysis captures perceptually relevant geometry in LLM states.

free parameters (1)
  • boundary boost
    Added term in the CP-additive model to capture enhanced discriminability at digit-count transitions; fitted to the representational data.
axioms (1)
  • domain assumption: Representational similarity analysis accurately reflects geometric structure in hidden states relevant to categorical perception effects.
    Core method from cognitive neuroscience applied to LLMs without additional validation steps described.

pith-pipeline@v0.9.0 · 5514 in / 1201 out tokens · 37340 ms · 2026-05-14T22:17:55.882254+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.