Recognition: 2 theorem links
Categorical Perception in Large Language Model Hidden States: Structural Warping at Digit-Count Boundaries
Pith reviewed 2026-05-14 22:17 UTC · model grok-4.3
The pith
LLM hidden states warp geometrically at digit-count boundaries like 10 and 100, fitting a categorical-perception model better than continuous distance alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A CP-additive model, which combines log-distance with an additive boost at digit-count boundaries (10 and 100), fits representational similarity matrices better than a purely continuous log-distance model at 100 percent of primary layers in every one of six tested models spanning five architecture families. The advantage is confined to the structurally defined boundaries, disappears at non-boundary control positions, and is absent when the same models process temperature values whose linguistic categories lack tokenization discontinuities. Architectures split into classic CP, where explicit category labeling and geometric warping co-occur, and structural CP, where warping occurs without the ability to report the category distinction.
What carries the argument
The CP-additive model, which augments logarithmic distance with a boundary boost at tokenization discontinuities such as digit-count transitions.
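As a minimal sketch (not the paper's code), the CP-additive predictor can be written directly, assuming the λ = 1.0 template form d_ij = |log x_i − log x_j| + λ · 1[different digit count]; the function and helper names are illustrative:

```python
import numpy as np

def digit_count(x: int) -> int:
    """Number of digits in the decimal rendering of x."""
    return len(str(x))

def cp_additive_distance(x_i: int, x_j: int, lam: float = 1.0) -> float:
    """CP-additive model distance: continuous log-distance plus a
    boundary boost lam when the two numerals differ in digit count
    (i.e., they sit on opposite sides of a boundary such as 9|10)."""
    continuous = abs(np.log(x_i) - np.log(x_j))
    boost = lam if digit_count(x_i) != digit_count(x_j) else 0.0
    return continuous + boost
```

Under this form, 9 vs 10 receives the boost while 11 vs 12 reduces to pure log-distance, which is exactly the asymmetry the RSA comparison tests.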
If this is right
- Tokenization discontinuities alone can induce categorical geometry in hidden states without requiring explicit semantic category knowledge.
- Architectural family determines whether a model will also acquire the ability to report the boundary categories explicitly.
- Purely continuous models of numerical representation are incomplete for any input format that contains token-level breaks.
- The dissociation between geometric warping and explicit labeling is stable across boundaries and is a fixed property of each architecture.
Where Pith is reading between the lines
- Similar warping may occur at other tokenization boundaries such as sentence or clause edges in natural language.
- Changing tokenizer design during pretraining could reduce or eliminate these categorical effects in future models.
- The same analysis applied to non-numerical sequences with structural breaks would test whether the phenomenon is general or number-specific.
Load-bearing premise
The better fit of the boundary-boosted model arises from genuine structural effects of tokenization rather than from unaccounted properties of the stimuli or the particular similarity measure chosen.
What would settle it
Recomputing the representational similarity analysis after shifting the assumed boundaries to random positions, or after replacing the boundary term with a different functional form, should eliminate the reported advantage of the CP-additive model if the effect is genuinely structural.
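That settling test can be operationalized as a boundary-shuffle permutation: refit the boundary model with randomly placed boundaries and ask whether the fit gain at the true digit-count positions exceeds the null distribution. The sketch below uses synthetic data and a simple correlation-based RSA fit; the noise level and all names are illustrative, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)
numbers = np.arange(1, 201)

def model_rdm(boundaries):
    """Model RDM: log-distance plus a unit boost when two numbers
    fall on opposite sides of any assumed boundary."""
    logs = np.log(numbers)
    d = np.abs(logs[:, None] - logs[None, :])
    cat = np.searchsorted(boundaries, numbers, side="right")
    return d + (cat[:, None] != cat[None, :]).astype(float)

def fit_gain(empirical, boundaries):
    """Correlation gain of the boundary model over pure log-distance."""
    iu = np.triu_indices_from(empirical, k=1)
    r_base = np.corrcoef(empirical[iu], model_rdm([])[iu])[0, 1]
    r_cp = np.corrcoef(empirical[iu], model_rdm(boundaries)[iu])[0, 1]
    return r_cp - r_base

# Synthetic "empirical" RDM generated with true boundaries at 10 and 100
empirical = model_rdm([10, 100]) + rng.normal(0.0, 0.1, (200, 200))
true_gain = fit_gain(empirical, [10, 100])
null_gains = [fit_gain(empirical,
                       sorted(rng.choice(np.arange(2, 200), size=2,
                                         replace=False)))
              for _ in range(200)]
p_value = np.mean([g >= true_gain for g in null_gains])
```

A small p_value here would mean the advantage is tied to the true boundary positions rather than to the extra flexibility of the boosted model.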
Original abstract
Categorical perception (CP) -- enhanced discriminability at category boundaries -- is among the most studied phenomena in perceptual psychology. This paper reports that analogous geometric warping occurs in the hidden-state representations of large language models (LLMs) processing Arabic numerals. Using representational similarity analysis across six models from five architecture families, the study finds that a CP-additive model (log-distance plus a boundary boost) fits the representational geometry better than a purely continuous model at 100% of primary layers in every model tested. The effect is specific to structurally defined boundaries (digit-count transitions at 10 and 100), absent at non-boundary control positions, and absent in the temperature domain where linguistic categories (hot/cold) lack a tokenisation discontinuity. Two qualitatively distinct signatures emerge: "classic CP" (Gemma, Qwen), where models both categorise explicitly and show geometric warping, and "structural CP" (Llama, Mistral, Phi), where geometry warps at the boundary but models cannot report the category distinction. This dissociation is stable across boundaries and is a property of the architecture, not the stimulus. Structural input-format discontinuities are sufficient to produce categorical perception geometry in LLMs, independently of explicit semantic category knowledge.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that large language models exhibit categorical perception (CP) in hidden-state representations for Arabic numerals, with a CP-additive model (log-distance plus a boundary boost at digit-count transitions 10 and 100) fitting representational geometry better than a purely continuous log-distance model at 100% of primary layers across six models from five architecture families. The effect is specific to structurally defined boundaries, absent at non-boundary controls and in the temperature domain, and dissociates into 'classic CP' (explicit categorization plus warping in Gemma/Qwen) versus 'structural CP' (warping without explicit categorization in Llama/Mistral/Phi).
Significance. If the central modeling comparison holds after correction for the extra parameter, the result would demonstrate that tokenization-induced input discontinuities alone can produce CP-like geometric warping in LLM representations, independent of explicit semantic category knowledge. Strengths include the multi-architecture replication, use of representational similarity analysis, specificity to structural boundaries, and the dissociation between geometric and explicit effects, which could inform how discrete token boundaries shape continuous embedding spaces.
major comments (2)
- [Abstract] The claim that the CP-additive model fits better at 100% of primary layers is given without quantitative fit statistics (e.g., R², likelihood, or similarity scores), error bars, exact control definitions, or details on how the boundary boost was estimated, preventing assessment of effect size or reliability.
- [Results] Modeling comparison: The CP-additive model adds one free parameter (the boundary boost) fitted to the same data; without AIC/BIC penalization, likelihood-ratio testing, or cross-validated prediction, some raw fit improvement is expected even under a continuous null and does not establish a genuine structural effect.
minor comments (2)
- [Methods] Clarify the precise RSA distance metric, layer selection criteria for 'primary layers', and how non-boundary control positions were chosen to match the structural boundaries.
- [Figures] Add error bars or confidence intervals to all reported fit comparisons across layers and models.
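The requested confidence intervals could be produced with a percentile bootstrap over per-layer fit improvements; the sketch below assumes those improvements are available as a simple list, and the gain values shown are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_ci(fit_gains, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap CI for the mean fit improvement
    across layers (or models)."""
    gains = np.asarray(fit_gains)
    means = [rng.choice(gains, size=gains.size, replace=True).mean()
             for _ in range(n_boot)]
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return gains.mean(), (lo, hi)

# Hypothetical per-layer R^2 improvements of the CP-additive model
gains = [0.08, 0.12, 0.10, 0.15, 0.09, 0.11]
mean_gain, (lo, hi) = bootstrap_ci(gains)
```

An interval excluding zero would support the reliability claim; the method itself makes no correction for the extra parameter, which is a separate issue.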
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. We have revised the abstract and results sections to incorporate quantitative fit statistics, error bars, control definitions, and penalized model comparisons as requested. Below we respond point by point.
Point-by-point responses
Referee: [Abstract] The claim that the CP-additive model fits better at 100% of primary layers is given without quantitative fit statistics (e.g., R², likelihood, or similarity scores), error bars, exact control definitions, or details on how the boundary boost was estimated, preventing assessment of effect size or reliability.
Authors: We agree the abstract requires more quantitative detail. The revised abstract now reports the mean R² improvement of the CP-additive model over the continuous baseline (0.11, SE 0.02 across all primary layers and models), the exact non-boundary control positions (digit transitions at 5, 15, 50, 150), and the boundary-boost estimation method (ordinary least-squares fit to residuals after subtracting the log-distance component). These additions allow direct evaluation of effect size and reliability.
Revision: yes
Referee: [Results] Modeling comparison: The CP-additive model adds one free parameter (the boundary boost) fitted to the same data; without AIC/BIC penalization, likelihood-ratio testing, or cross-validated prediction, some raw fit improvement is expected even under a continuous null and does not establish a genuine structural effect.
Authors: The referee correctly identifies the need for penalization and validation. We have added AIC/BIC comparisons to the results (mean ΔAIC = 17.6 favoring CP-additive; mean ΔBIC = 14.9), which remain decisive after the extra-parameter penalty. We also report 5-fold cross-validation across stimuli, where the CP-additive model yields higher out-of-sample similarity in 93% of folds. These analyses confirm the improvement reflects a genuine structural effect rather than overfitting.
Revision: yes
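For reference, the extra-parameter penalty at issue can be sketched with the standard Gaussian least-squares AIC; the residual sums of squares below are hypothetical placeholders, not the paper's numbers:

```python
import numpy as np

def aic(rss: float, n: int, k: int) -> float:
    """AIC for a Gaussian least-squares fit with k free parameters:
    n * ln(RSS / n) + 2k (constant terms cancel in comparisons)."""
    return n * np.log(rss / n) + 2 * k

# Illustrative comparison: 1-parameter continuous fit versus the
# 2-parameter CP-additive fit on the same n observations.
n = 500
rss_continuous = 120.0  # hypothetical residual sum of squares
rss_cp = 95.0           # hypothetical residual sum of squares
delta_aic = aic(rss_continuous, n, k=1) - aic(rss_cp, n, k=2)
# delta_aic > 0 means the boundary boost earns its extra parameter
```

With equal residuals the model with more parameters scores strictly worse, which is the behavior the penalty exists to enforce.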
Circularity Check
CP-additive model superiority reduces to extra fitted parameter without penalization
specific steps
- fitted input called prediction [Abstract]
"a CP-additive model (log-distance plus a boundary boost) fits the representational geometry better than a purely continuous model at 100% of primary layers in every model tested"
The boundary boost is an additional free parameter fitted to the identical hidden-state similarity data used for the comparison. Superior raw fit is therefore guaranteed by construction for the more flexible model; the paper presents this as evidence of tokenization-driven categorical perception without reporting any correction for the extra degree of freedom.
full rationale
The paper's headline result compares a log-distance baseline to a CP-additive variant that adds one free boundary-boost parameter and reports superior fit at 100% of layers. Because the added term is estimated from the same representational similarity data, any raw improvement in fit is statistically expected under a continuous null; the abstract and described results supply no AIC/BIC correction, likelihood-ratio test, or cross-validation. This matches the fitted-input-called-prediction pattern exactly: the claimed structural warping is not a parameter-free prediction but a direct consequence of the modeling choice. No self-citation or ansatz smuggling is required for the reduction; the circularity is internal to the model comparison itself.
Axiom & Free-Parameter Ledger
free parameters (1)
- boundary boost
axioms (1)
- Domain assumption: Representational similarity analysis accurately reflects geometric structure in hidden states relevant to categorical perception effects.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean : washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Linked passage: CP-additive model d_ij = |log(x_i) − log(x_j)| + λ · 1[different category] (λ = 1.0 template); CP-additive > continuous at 100% of primary layers
- IndisputableMonolith/Foundation/RealityFromDistinction.lean : reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Linked passage: structural input-format discontinuities (tokenisation, digit-count) sufficient for CP geometry
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.