pith. machine review for the scientific record.

Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages

2 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.CL (2)

years: 2026 (2)

verdicts: unverdicted (2)

representative citing papers

Compute Optimal Tokenization

cs.CL · 2026-05-02 · unverdicted · novelty 6.0

Compute-optimal language models require parameter count to scale with data bytes rather than tokens, with optimal token compression rate decreasing as compute budget grows.
