In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 34303–34326
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
- Vocab Diet: Reshaping the Vocabulary of LLMs via Vector Arithmetic
LLMs can compose surface-form tokens from base embeddings plus learned transformation vectors, freeing 10-40% of vocabulary slots while expanding coverage and preserving downstream performance across five languages.