LangMAP adapts UnigramLM for multilingual use, delivering language-specific tokenization from a shared vocabulary. It improves boundary-alignment metrics across natural and programming languages, while downstream fine-tuning gains are mixed.
Impact of tokenization on language models: an analysis for Turkish. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(4):116:1–116:21, 2023.
2 Pith papers cite this work, both from 2026 and both currently unverdicted. Polarity classification is still indexing.
Representative citing papers
TOTEN: A Knowledge-Based System for Structure-Preserving Representation of Physical Quantities and Technical Notation in Brazilian Portuguese
TOTEN is a knowledge-based system for structure-preserving representation of physical quantities and technical notation in Brazilian Portuguese using an ontology of engineering entities and external authorities, outperforming statistical baselines in atomicity and reconstruction.