The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models

Lihao Huang; Liuyuan Wen; Wenbin Li; Xun Zhu; Yang Gao

arxiv: 2606.03645 · v1 · pith:OBKCHHNCnew · submitted 2026-05-29 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-28 23:42 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords large language models · arithmetic · residual stream geometry · carry potential · quantization · geometric slippage · error correction

The pith

Large language models represent multi-operand addition as an Iso-Raw-Sum Trajectory in residual streams, anchored by semantic digits and modulated by continuous carry fibers, with errors as geometric slippages from noisy quantization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the geometry of the residual stream in large language models while they perform addition with multiple operands. It identifies a trajectory in which semantic digits fix the base position of representations while continuous carry fibers adjust values that exceed single-digit ranges. The authors introduce the Noisy Quantization Model, which treats arithmetic mistakes as slips that occur when internal neural noise moves a latent carry potential across discrete thresholds. This same account explains why lightweight probes can separate coexisting signals, such as correct values and hallucinations, from one activation vector. The framework also yields a geometric consistency check that detects and corrects these failures during inference.

Core claim

By analyzing the residual stream geometry during multi-operand addition, the authors identify the Iso-Raw-Sum Trajectory (IRST), a geometric structure where representations are anchored by semantic digits and modulated by continuous carry fibers. They propose the Noisy Quantization Model to explain this geometry, framing arithmetic errors as Geometric Slippages caused by internal neural noise pushing a continuous, latent Carry Potential across quantization thresholds. This geometric framework elucidates Probe Versatility, explaining how lightweight probes can disentangle coexisting latent signals from a single activation vector, and validates the insights through a geometric consistency check that detects and corrects these quantization failures during inference.
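The digit-and-carry bookkeeping behind the IRST can be illustrated with ordinary column addition. The sketch below is not taken from the paper's code. It assumes, following the Figure 3 caption, that trajectory T_k collects the positions whose carry-free column sum r_p satisfies r_p mod 10 = k, and that the emitted digit is (r_p + c_p) mod 10. The function name `column_trace` and the least-significant-first ordering are illustrative choices.

```python
def column_trace(operands):
    """Column-wise addition trace, least significant column first.

    Each entry records, for one digit column:
      raw_sum  : sum of the operands' digits in that column, without carry (r_p)
      carry_in : carry arriving from the lower columns (c_p)
      digit    : emitted digit, (r_p + c_p) mod 10 (s_p)
      irst     : index k of the assumed trajectory T_k, i.e. r_p mod 10
    Returns the trace and the final carry out of the top column.
    """
    width = max(len(str(x)) for x in operands)
    trace = []
    carry = 0
    for i in range(width):
        raw = sum((x // 10**i) % 10 for x in operands)
        total = raw + carry
        trace.append(
            {"raw_sum": raw, "carry_in": carry, "digit": total % 10, "irst": raw % 10}
        )
        carry = total // 10
    return trace, carry


# Example operands taken from the Figure 1 caption.
trace, final_carry = column_trace([123, 392, 136])
for column in trace:
    print(column)
```

For 123 + 392 + 136 the trace gives the digits 1, 5, 6 (units first), i.e. 651. The tens column has raw sum 14, so under the stated assumption it sits on T4 and receives an input carry of 1.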

What carries the argument

The Iso-Raw-Sum Trajectory (IRST), a geometric structure in the residual stream where representations are anchored by semantic digits and modulated by continuous carry fibers, together with the Noisy Quantization Model that attributes errors to noise-driven crossings of quantization thresholds.
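The noise-driven threshold crossing can be made concrete with a toy calculation. The sketch assumes, purely for illustration, that the carry is read out as floor(Φ + ε) with Gaussian noise ε of standard deviation `sigma`. The paper's actual noise model and fitted parameters are not given in this summary, and `slip_probability` is a hypothetical helper name.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def slip_probability(phi, sigma):
    """Probability that floor(phi + eps) differs from floor(phi).

    Assumption (not from the paper): eps is Gaussian with mean 0 and
    standard deviation sigma, and the carry is the floor of the noisy value.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    lower = math.floor(phi)   # quantization threshold just below phi
    upper = lower + 1         # quantization threshold just above phi
    p_stay = normal_cdf((upper - phi) / sigma) - normal_cdf((lower - phi) / sigma)
    return 1.0 - p_stay


# Sweep phi across one quantization cell to see the error profile.
for phi in (1.0, 1.02, 1.25, 1.5, 1.75, 1.98):
    print(f"phi={phi:.2f}  slip probability={slip_probability(phi, sigma=0.1):.4f}")
```

Under these assumptions the slip probability is high when Φ sits close to an integer threshold and negligible in the middle of a cell, which matches the qualitative bathtub shape described in the Figure 5 caption. It says nothing about whether the fitted curve in the paper has this exact form.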

Load-bearing premise

The observed trajectories and error patterns are produced by a continuous carry potential that is quantized at discrete thresholds rather than by other mechanisms such as attention patterns, token embeddings, or training data statistics.

What would settle it

An experiment that clamps or suppresses the continuous carry dimension in the residual stream during addition and checks whether the specific patterns of geometric slippage errors disappear; persistence of those error patterns would falsify the model.
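The clamping intervention amounts to fixing one coordinate of the hidden state. A minimal sketch, assuming the carry signal lies along a single known direction `v` (for instance the weight vector of a linear carry probe, which is an assumption here and not something this summary specifies). The helper name `clamp_component` is hypothetical, and writing the edited vector back into a model's forward pass is omitted.

```python
def clamp_component(h, v, value=0.0):
    """Return a copy of h whose coordinate along direction v equals `value`.

    The coordinate is measured in units of v, i.e. h = coef * v + (part
    orthogonal to v). With value=0.0 this is a plain projection that removes
    the component along v; the orthogonal part of h is left untouched.
    """
    vv = sum(x * x for x in v)
    if vv == 0:
        raise ValueError("direction v must be non-zero")
    coef = sum(a * b for a, b in zip(h, v)) / vv
    shift = value - coef
    return [a + shift * b for a, b in zip(h, v)]


hidden = [3.0, 4.0]
carry_direction = [1.0, 0.0]   # hypothetical carry direction
print(clamp_component(hidden, carry_direction))             # component removed
print(clamp_component(hidden, carry_direction, value=2.0))  # component pinned to 2
```

If the slippage errors persist after the carry coordinate is pinned this way, that would count against the proposed account. A single direction is the simplest case, and a curved carry fiber would need a different intervention.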

Figures

Figures reproduced from arXiv: 2606.03645 by Lihao Huang, Liuyuan Wen, Wenbin Li, Xun Zhu, Yang Gao.

**Figure 1.** Overview of our probing framework. (Left) The LLM performs multi-operand addition (e.g., 123 + 392 + 136) in an autoregressive manner. At each generation step (e.g., p = 1, corresponding to the tens digit), we extract the hidden state vectors $h_p^{(l)}$ (mainly focusing on the final layer L). (Right) We train versatile probes on these activation vectors to decode several critical arithmetic variables, includi…

**Figure 2.** 2D UMAP visualization of the arithmetic manifold. (Left) Macroscopic Backbone: Global geometry of $h_p^{(L)}$ (p = 4) organized around digit Anchors (0–9). Blue points denote correct samples labeled as $\hat{s}_p$; red points denote errors labeled as $\hat{s}_p(s_p)$. The inset highlights high-error transition zones between digit basins. (Right) Microscopic Texture: Magnified view around Anchor 1 labeled with $(s_p, \hat{s}_p, c_p)$, sh…

**Figure 3.** The Iso-Raw-Sum Trajectory (IRST) framework of the arithmetic manifold. (Left) Magnified UMAP projection around digit Anchors 1, 2, and 3. Points are labeled with $(\hat{s}_p, c_p)$. The geometry reveals distinct IRSTs (T0 ∼ T3) that act as continuous “threads” piercing through adjacent digit basins. For instance, T1 (where $r_p \bmod 10 = 1$) connects stable nodes (1, 0) ↔ (2, 1) ↔ (3, 2) as the input carry increases. …

**Figure 5.** Empirical validation of the Noisy Quantization Model. (Top) The distribution of Carry Potential Φ across all generated positions p in the dataset. Green vertical lines indicate integer quantization thresholds (1.0, 2.0, ...). (Bottom) The conditional error rate as a function of Φ. The empirical data (red bars, off-by-one errors only) exhibits a distinct periodic bathtub shape, spiking near integer bound…

**Figure 6.** Representative trajectory-level validation on T3. The markers are labeled as $(s_p, \hat{s}_p, c_p)$. Circles denote correct predictions and squares denote errors. (Left) Empirical projection of last-layer activations for samples in T3. The x-axis shows the cosine distance from the central centroid (4, 4, 1). The manifold exhibits a clear V-shaped progression connecting stable basins. Crucially, error states such as…

**Figure 7.** PCA and t-SNE visualizations. PCA panel axes: PC1 (9.76% variance) and PC2 (8.22% variance).

**Figure 8.** Projection-independent validation of the IRSTs in the native representation space. Cosine distance to anchor states is evaluated directly in $\mathbb{R}^{2560}$ across multiple trajectories. The characteristic V-shaped correlation between native-space distance and the continuous carry potential Φ generalizes across T0 to T9, supporting the claim that the IRST organization is not solely a 2D projection artifact. The …

**Figure 9.** Intrinsic dimensionality across IRSTs. We report the participation ratio, TWO-NN, and Levina–Bickel MLE estimates for T0 to T9 and their pooled union. The trajectory-conditioned subsets remain stably low-dimensional in native space, while the pooled set exhibits a larger linear effective dimension. D. Validation of the Raw Sum Assumption. In the main context, we posit that the model’s arithmetic errors a…

**Figure 10.** Layer-wise evolution of intrinsic dimensionality. We compare the nonlinear and linear intrinsic-dimension estimates for T5 and the pooled set of all trajectories across layers. The two remain broadly similar in early layers, while later layers develop more trajectory-specific local structure. … an incorrect output digit $\hat{s}_p$. This suggests that arithmetic errors are largely geometrically constrained: the rep…

**Figure 11.** Geometric signatures of carry-based and non-carry errors. Error modes are decoupled using the raw-sum probe. Carry-based errors, where the local raw sum is still recovered correctly, remain concentrated near adjacent decision boundaries. In contrast, non-carry errors scatter across more distant regions and do not follow a single continuous trajectory, indicating a distinct failure mode beyond the dominant…

**Figure 12.** Scaling of cognitive noise with task complexity. We extend the analysis of …

**Figure 13.** Boundary-position bathtub profiles. Frequency and conditional-error distributions at the most significant digit (p = 0, top row) and the least significant digit (p = 9, bottom row) for 3-term 10-digit addition. Unlike the interior columns summarized in the main text, these boundary positions show structurally different profiles, highlighting that the steady-state bathtub regime is primarily an interior-po…

**Figure 14.** Generalization of the IRST geometry across different models. (Left) UMAP visualization for Qwen3-8B on a 12-digit addition task. The manifold structure is highly consistent with the 4B model, featuring a sequential arrangement of digit basins (0–9) connected by clear trajectories (e.g., T0, T1, T2). (Right) UMAP visualization for Gemma-3-4B-IT on a 10-digit addition task. Despite architectural differences…

**Figure 15.** Generalization of the Noisy Quantization hypothesis. Validation on Qwen3-8B (12-digit addition, top) and Gemma-3-4B-IT (10-digit addition, bottom). The Left panels show the sample frequency distribution relative to Carry Potential Φ, indicating that the dataset covers the entire potential space. The Right panels display the conditional error rates (red bars) overlaid with our theoretical fit (dashed blue …

**Figure 16.** Additional evidence from specialized arithmetic transformers. (Top) UMAP projections from the under-converged and fully converged single-task addition models reported by Quirke et al. (2025). The under-converged model exhibits continuous inter-basin geometry similar to our main setting, while the fully converged model forms more isolated basins. (Bottom) Conditional error-rate curves for the under-converg…

**Figure 17.** IRST analysis for 4-term addition ($c_p \in \{0, 1, 2, 3\}$). (Left) 2D UMAP projection of activations at p = 4. While the digit backbone (0-9) persists, the increased density of carry fibers causes visual entanglement of trajectories in 2D space. However, 3D plots can resolve these overlaps. (Right) Schematic of the expanded geometry. Between any adjacent digit basins (e.g., 0 and 1), there are now three distinc…

**Figure 18.** Impact of Tolerance δ. Token accuracy (left axis, blue) and Question accuracy (right axis, orange) under different values of δ. The peak at δ ≈ 0.12 validates the existence of a noise margin in the model’s carry potential estimation. … context: “That step looks incorrect. Let’s re-do just this step: {expression} = {current output}”. The model is then forced to regenerate the current digit based on this augm…

**Figure 19.** Layer-wise performance of different probes trained on Qwen3-4B combining all positions. … (nostalgebraist, 2020). The Logit Lens applies the final layer’s unembedding matrix $W_U$ to intermediate hidden states $h_p^{(l)}$ to project them directly into the vocabulary space. We evaluated both methods on identifying the Ground Truth Digit (GT) and the Model’s Final Output Digit (Pred) across all 36 layers. The results …

**Figure 20.** Layer-wise input-carry decoding accuracy for attention and FFN outputs. Attention blocks exhibit sharper stepwise gains, while FFN outputs largely follow these updates, suggesting that carry information is consolidated through a staged pipeline across layers.

**Figure 21.** Layer-wise aligned UMAP visualization.

**Figure 22.** Layer-wise Performance Comparison: Linear Probes vs. Logit Lens. The Green and Red lines represent the accuracy of linear probes trained on $h_p^{(l)}$ for the Ground Truth (GT) and Model Prediction (Pred), respectively. The Blue and Orange lines represent the accuracy of the Logit Lens (applying the unembedding matrix directly). Key Observation: There is a significant decoding lag. Probes successfully decode…

Original abstract

Large Language Models exhibit paradoxical fragility in fundamental arithmetic, implying a disconnect between internal computation and discrete output. By analyzing the residual stream geometry during multi-operand addition, we identify the Iso-Raw-Sum Trajectory (IRST), a geometric structure where representations are anchored by semantic digits and modulated by continuous carry fibers. We propose the Noisy Quantization Model to explain this geometry, framing arithmetic errors as Geometric Slippages caused by internal neural noise pushing a continuous, latent Carry Potential across quantization thresholds. This geometric framework further elucidates Probe Versatility, explaining how lightweight probes can disentangle coexisting latent signals (such as ground truth versus hallucination) from a single activation vector. Finally, we validate these insights through a geometric consistency check method that effectively detects and corrects these quantization failures during inference. Our code is available at https://github.com/RL-MIND/Shape-of-Addition.
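The abstract does not spell out how the geometric consistency check works, so the following is only one plausible reading, not the authors' implementation. It assumes that probes supply an estimated raw column sum and an estimated carry potential Φ for the current step, and it treats the tolerance δ from the Figure 18 caption as a margin around each integer threshold inside which either neighbouring carry is accepted. The function names are hypothetical. The retry wording is taken from the Figure 18 caption, with the placeholder renamed to `current_output` so that it is a valid Python format field.

```python
import math

# Wording quoted in the Figure 18 caption; placeholder renamed for str.format.
RETRY_TEMPLATE = (
    "That step looks incorrect. Let's re-do just this step: "
    "{expression} = {current_output}"
)


def implied_digits(raw_sum, carry_potential, delta):
    """Digits compatible with the probe estimates, allowing a margin delta.

    Assumption: the carry is floor(carry_potential), and an estimate within
    delta of an integer threshold is compatible with both adjacent carries.
    """
    low = max(0, math.floor(carry_potential - delta))
    high = max(0, math.floor(carry_potential + delta))
    return {(raw_sum + carry) % 10 for carry in range(low, high + 1)}


def check_step(emitted_digit, raw_sum, carry_potential, delta=0.12):
    """True if the emitted digit agrees with the probe-implied digits."""
    return emitted_digit in implied_digits(raw_sum, carry_potential, delta)


def retry_prompt(expression, current_output):
    """Build the re-do context for a step that failed the check."""
    return RETRY_TEMPLATE.format(expression=expression, current_output=current_output)


# Tens column of 123 + 392 + 136: raw sum 14, carry potential assumed 1.5.
if not check_step(emitted_digit=4, raw_sum=14, carry_potential=1.5):
    print(retry_prompt("123 + 392 + 136", "4"))
```

In this reading a very small δ flags steps whenever the probe estimate is slightly off, while a very large δ accepts almost any digit, so an intermediate optimum such as the reported δ ≈ 0.12 would be expected. Whether the paper's check has this form cannot be confirmed from the abstract.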

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gives a geometric framing for addition in transformers via the Iso-Raw-Sum Trajectory and Noisy Quantization Model, but the attribution to continuous carry potential lacks controls against simpler alternatives.

The main takeaway is that this work describes a specific geometric pattern in the residual stream for multi-operand addition, anchored by semantic digits and modulated by what they call continuous carry fibers, then explains errors as slippages across quantization thresholds in their Noisy Quantization Model. The framing also ties into why probes can separate coexisting signals like ground truth and hallucination.

What stands out as new is the particular combination of the Iso-Raw-Sum Trajectory with the carry-potential account and the geometric-slippages explanation for errors. Prior probing work on arithmetic exists, but this descriptive structure and the inference-time consistency check do not reduce directly to it. Releasing code helps, as it lets others inspect the claimed geometric consistency method.

The soft spots are in the evidence. The claims rest on observational geometry without reported quantitative metrics, error bars, or ablation details in the abstract. The central move, attributing the trajectories and errors specifically to a continuous latent Carry Potential quantized at thresholds, does not appear to include controls that would distinguish it from attention patterns, token embeddings, or training statistics. On the supplied description the concern stands: without those comparisons, the choice of a quantized carry potential over simpler explanations remains unsupported.

This paper is for researchers already working on mechanistic interpretability of structured computation in language models. Someone tracking activation geometry or arithmetic failures could extract usable ideas from the framing, even if the support needs more work.

It deserves peer review. The descriptive entities are concrete enough that referees could push on the validation and see whether the model adds explanatory power beyond existing techniques.

Referee Report

2 major / 1 minor

Summary. The paper analyzes residual stream geometry in LLMs during multi-operand addition tasks. It identifies an Iso-Raw-Sum Trajectory (IRST) in which representations are anchored by semantic digits and modulated by continuous carry fibers. The authors propose a Noisy Quantization Model that attributes arithmetic errors to geometric slippages arising when internal neural noise drives a latent continuous Carry Potential across discrete quantization thresholds. The framework is also used to explain probe versatility for disentangling coexisting signals (e.g., ground truth vs. hallucination) and is validated via a geometric consistency check that detects and corrects quantization failures at inference time. Code is released.

Significance. If the IRST geometry and the attribution of errors specifically to a continuous Carry Potential quantized at thresholds can be substantiated, the work would offer a mechanistic account of arithmetic fragility in LLMs and a practical inference-time correction method. The public code release is a clear strength that supports reproducibility and further testing of the geometric claims.

major comments (2)

[Abstract] Abstract and introduction: the central claim that arithmetic errors arise from geometric slippages of a continuous latent Carry Potential across quantization thresholds is not accompanied by described controls or ablation experiments that would distinguish this mechanism from alternatives such as discrete attention patterns, token embedding statistics, or training-data regularities. Without such disambiguation the attribution remains unsecured.
[Abstract] Abstract: the validation of the Noisy Quantization Model and the geometric consistency check is described only at a high level; no quantitative metrics, error bars, or statistical tests are mentioned that would allow assessment of whether the observed trajectories and error patterns are better explained by the proposed model than by simpler alternatives.

minor comments (1)

[Abstract] The abstract introduces several new terms (IRST, Noisy Quantization Model, Carry Potential, Geometric Slippages) without immediate definitions or pointers to the sections where they are formalized.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for your constructive feedback. We value the emphasis on rigorous disambiguation of mechanisms and quantitative validation of the Noisy Quantization Model. We will revise the manuscript to incorporate explicit controls, ablations, and quantitative metrics as outlined below. These additions will strengthen the attribution of errors to geometric slippages of the continuous Carry Potential while preserving the core geometric findings on the IRST.

Referee: [Abstract] Abstract and introduction: the central claim that arithmetic errors arise from geometric slippages of a continuous latent Carry Potential across quantization thresholds is not accompanied by described controls or ablation experiments that would distinguish this mechanism from alternatives such as discrete attention patterns, token embedding statistics, or training-data regularities. Without such disambiguation the attribution remains unsecured.

Authors: We agree that the abstract does not detail explicit controls. The manuscript's geometric consistency check functions as an implicit disambiguation by demonstrating that interventions aligned with the continuous carry dimension correct errors in a manner not predicted by discrete attention patterns or static embedding statistics. However, to directly address the concern, the revision will add a dedicated ablation section comparing the Noisy Quantization Model against alternatives, including attention-head ablations and training-data regularity baselines, with quantitative error-prediction comparisons. revision: yes

Referee: [Abstract] Abstract: the validation of the Noisy Quantization Model and the geometric consistency check is described only at a high level; no quantitative metrics, error bars, or statistical tests are mentioned that would allow assessment of whether the observed trajectories and error patterns are better explained by the proposed model than by simpler alternatives.

Authors: The current manuscript emphasizes qualitative trajectory visualizations and the functional success of the consistency check. We acknowledge the absence of explicit quantitative metrics in the abstract and high-level description. The revision will add quantitative results, including detection accuracy with error bars across multiple seeds, statistical significance tests against baseline models, and tables comparing slippage prediction performance to simpler alternatives. revision: yes

Circularity Check

0 steps flagged

No circularity in observational geometry analysis

The paper presents observational analysis of residual stream geometry during addition, identifying structures such as the Iso-Raw-Sum Trajectory and proposing the Noisy Quantization Model to frame errors as geometric slippages. No load-bearing derivations, equations, or results are shown to reduce by construction to fitted inputs, self-citations, or self-definitional loops. The central claims remain descriptive and model-proposing without the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claims rest on the assumption that residual-stream geometry directly reflects computational mechanisms and that the proposed quantization thresholds exist as latent continuous variables; no free parameters are named in the abstract, but the model introduces new descriptive entities without independent falsification criteria beyond the observed trajectories.

axioms (1)

domain assumption: Residual stream activations during addition contain linearly readable semantic and carry information.
Invoked when the authors state that representations are anchored by semantic digits and modulated by carry fibers.
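What "linearly readable" would mean in practice can be shown on synthetic data. The sketch below does not use model activations and does not reproduce the paper's probes. It builds toy vectors as digit anchor plus carry times a fixed fiber direction plus Gaussian noise (all of which are assumptions), and it swaps in a simple mean-difference probe in place of whatever probe architecture the authors trained.

```python
import random

DIM = 6  # toy activation width; real residual streams are far wider


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def make_samples(rng, anchors, fiber, reps, noise=0.1):
    """Synthetic activations: anchor[digit] + carry * fiber + Gaussian noise."""
    samples = []
    for digit, anchor in enumerate(anchors):
        for carry in range(3):  # carries 0, 1, 2
            for _ in range(reps):
                h = [a + carry * f + rng.gauss(0.0, noise) for a, f in zip(anchor, fiber)]
                samples.append((h, digit, carry))
    return samples


def fit_carry_probe(samples):
    """Mean-difference probe: direction from carry-0 mean toward carry-2 mean."""
    mean0 = mean([h for h, _, c in samples if c == 0])
    mean2 = mean([h for h, _, c in samples if c == 2])
    direction = [(b - a) / 2.0 for a, b in zip(mean0, mean2)]  # one carry unit
    return direction, mean0


def read_carry(h, direction, origin):
    """Continuous carry readout: coordinate of (h - origin) along direction."""
    centered = [x - o for x, o in zip(h, origin)]
    return dot(centered, direction) / dot(direction, direction)


def probe_accuracy(samples, direction, origin):
    hits = sum(round(read_carry(h, direction, origin)) == c for h, _, c in samples)
    return hits / len(samples)


rng = random.Random(0)
# Anchors occupy the first DIM-1 coordinates; the fiber is the last coordinate.
anchors = [[rng.gauss(0.0, 1.0) for _ in range(DIM - 1)] + [0.0] for _ in range(10)]
fiber = [0.0] * (DIM - 1) + [1.0]
train = make_samples(rng, anchors, fiber, reps=20)
test = make_samples(rng, anchors, fiber, reps=5)
direction, origin = fit_carry_probe(train)
accuracy = probe_accuracy(test, direction, origin)
print(f"held-out carry accuracy: {accuracy:.3f}")
```

The toy data make the carry direction orthogonal to the digit anchors by construction, so the probe succeeds almost trivially. The open empirical question for the paper is whether real activations have anything close to this structure.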

invented entities (3)

Iso-Raw-Sum Trajectory (IRST) · no independent evidence
purpose: Describes the observed geometric path of activations during addition
New descriptive structure introduced to organize the activation patterns

Noisy Quantization Model · no independent evidence
purpose: Explains arithmetic errors as slips of a continuous carry potential across thresholds
New explanatory model proposed in the abstract

Carry Potential · no independent evidence
purpose: Continuous latent variable that is quantized to produce carry decisions
Postulated continuous signal whose noise produces observed errors

Reference graph

Works this paper leans on

43 extracted references · 15 canonical work pages · 5 internal anchors

[1] Language models encode the value of numbers linearly. Proceedings of the 31st International Conference on Computational Linguistics.
[2] Probing for arithmetic errors in language models. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025.
[3] Probing the Difficulty Perception Mechanism of Large Language Models. arXiv preprint arXiv:2510.05969.
[4] Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
[5] Measuring Mathematical Problem Solving With the MATH Dataset. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).
[6] Tokenization counts: the impact of tokenization on arithmetic in frontier llms. arXiv preprint arXiv:2402.14903.
[7] Progress measures for grokking via mechanistic interpretability. The Eleventh International Conference on Learning Representations.
[8] The lookahead limitation: Why multi-operand addition is hard for llms. arXiv preprint arXiv:2502.19981.
[9] Dissecting Multiplication in Transformers: Insights into LLMs. arXiv preprint arXiv:2407.15360.
[10] Understanding Addition in Transformers. The Twelfth International Conference on Learning Representations.
[11] The riemannian geometry of deep generative models. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[12] Topology of deep neural networks. Journal of Machine Learning Research.
[13] The Geometries of Truth Are Orthogonal Across Tasks. ICML 2025 Workshop on Reliable and Responsible Foundation Models, 2025.
[14] Hidden Holes-topological aspects of language models. NeurIPS 2024 Workshop on Symmetry and Geometry in Neural Representations, 2024.
[15] Understanding Addition and Subtraction in Transformers. arXiv preprint arXiv:2402.02619v10.
[16] Addition in Four Movements: Mapping Layer-wise Information Trajectories in LLMs. arXiv preprint arXiv:2506.07824.
[17] Language Models Use Trigonometry to Do Addition. ICLR 2025 Workshop on Building Trust in Language Models and Applications, 2025.
[18] Qwen3 technical report. arXiv preprint arXiv:2505.09388.
[19] UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.
[20] Towards unifying interpretability and control: Evaluation via intervention. arXiv preprint arXiv:2411.04430.
[21] To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models. Forty-second International Conference on Machine Learning.
[22] Gemma 3 technical report. arXiv preprint arXiv:2503.19786.
[23] The llama 3 herd of models. arXiv preprint arXiv:2407.21783.
[24] Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems.
[25] Transformers: State-of-the-art natural language processing. Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, 2020.
[26] Language models encode numbers using digit representations in base 10. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), 2025.
[27] Not All Language Model Features Are One-Dimensionally Linear. Proceedings of the Thirteenth International Conference on Learning Representations (ICLR).
[28] Scikit-learn: Machine learning in Python. the Journal of machine Learning research, 2011.
[29] Visualizing data using t-SNE. Journal of machine learning research.
[30] Principal component analysis. Wiley interdisciplinary reviews: computational statistics, 2010.
[31] nostalgebraist. AI Alignment Forum, 2020.
[32] Unraveling arithmetic in large language models: The role of algebraic structures. arXiv preprint arXiv:2411.16260.
[33] Tiblias, Federico and Bigoulaeva, Irina and Niu, Jingcheng and Balloccu, Simone and Gurevych, Iryna. Hypothesis-Driven Feature Manifold Analysis in …
[34] Mathematical Computation and Reasoning Errors by Large Language Models. Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers.
[35] Dimitri von R. A Language Model. Forty-first International Conference on Machine Learning.
[36] Planning in a recurrent neural network that plays Sokoban. arXiv preprint arXiv:2407.15421.
[37] Linearly Controlled Language Generation with Performative Guarantees. MINT: Foundation Model Interventions.
[38] Eliciting Latent Knowledge from "Quirky" Language Models. First Conference on Language Modeling.
[39] Inference-time intervention: Eliciting truthful answers from a language model. Advances in Neural Information Processing Systems.
[40] Interpreting and improving large language models in arithmetic calculation. Proceedings of the 41st International Conference on Machine Learning.
[41] How well do large language models perform in arithmetic tasks? arXiv preprint arXiv:2304.02015.
[42] Exposing numeracy gaps: A benchmark to evaluate fundamental numerical abilities in large language models. Findings of the Association for Computational Linguistics: ACL 2025, 2025.
[43] Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models. Findings of the Association for Computational Linguistics ACL 2024, 2024.
