Social, Economic, and Demographic Factors Drive the Emergence of Hinglish Code-Mixing on Social Media
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Script Sensitivity: Benchmarking Language Models on Unicode, Romanized and Mixed-Script Sinhala
Language models perform over 300 times worse on Romanized Sinhala than on Unicode Sinhala, and model size shows no correlation with script robustness.