Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs. arXiv preprint arXiv:2402.14903

Aaditya K · 2024 · arXiv 2402.14903

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 support 1

representative citing papers

The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

LLM residual streams during addition form an Iso-Raw-Sum Trajectory anchored by digit semantics and modulated by continuous carry signals, with errors arising as geometric slippages across quantization thresholds in a noisy model.

Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex Optimization

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

DIPS fine-tunes LLMs to output ordered feasible decision vectors approximating Pareto fronts for constrained bi-objective convex problems, reaching 95-98% normalized hypervolume with 0.16s inference.

How Tokenization Limits Phonological Knowledge Representation in Language Models and How to Improve Them

cs.CL · 2026-04-18 · unverdicted · novelty 7.0

Subword tokenization impairs phonological knowledge encoding in LMs, but an IPA-based fine-tuning method restores it with minimal impact on other capabilities.

Efficient numeracy in language models through single-token number embeddings

cs.LG · 2025-10-08 · unverdicted · novelty 7.0

BitTokens represent numbers as single tokens via IEEE 754 binary format, allowing small language models to learn basic arithmetic algorithms nearly perfectly.

FLEXITOKENS: Flexible Tokenization for Evolving Language Models

cs.CL · 2025-07-17 · unverdicted · novelty 7.0

FLEXITOKENS replaces rigid subword tokenizers and fixed-compression auxiliary losses with a simplified boundary-prediction objective in byte-level models, yielding lower over-fragmentation and up to 10-point gains on multilingual and domain-adaptation tasks.

Inside the LLM Word Factory

cs.CL · 2026-06-07 · unverdicted · novelty 6.0

Activation patching localizes English detokenization in Llama2-7B to a two-stage attention-then-MLP process at layer 1 that generalizes to 12 models from 8 families, with depth varying by positional encoding, plus an early-layer probe achieving 0.94-0.97 AUROC.

MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining

cs.CL · 2025-09-08 · unverdicted · novelty 6.0

MachineLearningLM uses continued pretraining on SCM-synthesized ML tasks with random-forest distillation to give LLMs robust many-shot in-context learning on tabular classification, reaching random-forest accuracy levels while preserving general chat performance.

The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic

cs.AI · 2026-05-27 · unverdicted · novelty 5.0

Re-evaluation of GSM-Symbolic using GLMMs on 20 models shows only half have significant performance changes; a distribution shift in larger integers (K-S=0.12) accounts for significance in half the remaining cases.

Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective

cs.CR · 2026-04-20 · unverdicted · novelty 5.0

BPE tokenization creates gibberish bias in CLLMs, causing secrets with high character entropy but low token entropy to be preferentially memorized due to training data distribution shifts.

A Triadic Suffix Tokenization Scheme for Numerical Reasoning

cs.CL · 2026-04-13 · unverdicted · novelty 5.0

Triadic Suffix Tokenization groups digits into triads with fixed magnitude suffixes to make order-of-magnitude relationships explicit at the token level for LLMs.

citing papers explorer

Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex Optimization

cs.AI · 2026-05-12 · unverdicted · none · ref 40

The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic

cs.AI · 2026-05-27 · unverdicted · none · ref 5
