SARA aligns internal routing distributions in MoE layers to high-resource semantic anchors via symmetric JS divergence, improving low-resource language performance by 0.8-1.2% over standard instruction tuning on Global-MMLU.
arXiv preprint arXiv:2505.17747
1 Pith paper cites this work. Polarity classification is still being indexed.
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
SARA: Unlocking Multilingual Knowledge in Mixture-of-Experts via Semantically Anchored Routing Alignment
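For readers unfamiliar with the divergence named in the summary above, the following is a minimal sketch of a symmetric Jensen-Shannon (JS) divergence between two routing distributions over experts. It is not taken from the SARA paper: the function names, the example logits, and the use of a softmax over router logits are all assumptions made for illustration, and the summary does not say how the paper aggregates the divergence across tokens or layers.

```python
import math


def softmax(logits):
    """Turn raw router logits into a probability distribution over experts."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Symmetric JS divergence: average KL of p and q to their mixture."""
    mixture = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Hypothetical router logits for one token over four experts.
anchor_routing = softmax([2.0, 0.5, -1.0, 0.1])        # assumed high-resource anchor
low_resource_routing = softmax([0.3, 1.8, -0.5, 0.0])  # assumed low-resource input

alignment_penalty = js_divergence(low_resource_routing, anchor_routing)
```

Unlike plain KL divergence, this quantity does not change when its two arguments are swapped, and it is bounded above by ln 2, which is presumably why a symmetric form suits an alignment objective between two routing distributions.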