LLM residual streams during addition form an Iso-Raw-Sum Trajectory anchored by digit semantics and modulated by continuous carry signals, with errors arising as geometric slippages across quantization thresholds in a noisy model.
arXiv preprint arXiv:2502.19981 , year=
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
BitTokens represent numbers as single tokens via IEEE 754 binary format, allowing small language models to learn basic arithmetic algorithms nearly perfectly.
LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.
citing papers explorer
-
The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models
LLM residual streams during addition form an Iso-Raw-Sum Trajectory anchored by digit semantics and modulated by continuous carry signals, with errors arising as geometric slippages across quantization thresholds in a noisy model.
-
Efficient numeracy in language models through single-token number embeddings
BitTokens represent numbers as single tokens via IEEE 754 binary format, allowing small language models to learn basic arithmetic algorithms nearly perfectly.
-
Hypothesis generation and updating in large language models
LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
-
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.