Investigating the Effectiveness of BPE: The Power of Shorter Sequences
3 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (1)
Citation polarities: unclear (1)
Fields: cs.CL (3)
Years: 2026 (3)
Representative citing papers:
- Tokenisation via Convex Relaxations
ConvexTok relaxes tokenization to a linear program via convex relaxation, improving intrinsic metrics, bits-per-byte, and results on some downstream tasks, while certifying that its solutions are within 1% of optimal at typical vocabulary sizes.
- Tokenization with Split Trees
- Compute Optimal Tokenization