On the Limits of Model Merging for Multilinguality in Pre-Training

Aleksandr Umnov; Christof Monz; Fedor Vitiugin; Khalil Sima'an; Seth Aycock

arXiv: 2605.25846 · v1 · pith:652SEMUFnew · submitted 2026-05-25 · cs.CL

keywords: merging, pre-training, model, models, monolingual, performance, language-specific, achieved

Abstract

Endowing models with consistent multilingual performance can be achieved either by mixing pre-training data or through post-training approaches such as language-specific model merging. In this work, we test whether merging can be applied to monolingually pre-trained models. We conduct a controlled study comparing the efficacy of mixed, merged, and monolingual pre-training setups. We find that while monolingual pre-training yields strong in-language performance, merging any combination of monolingual models leads to performance collapse due to interference. Our analysis suggests that representational similarity is a prerequisite for model merging. We therefore conclude that the flexibility of merging in fine-tuning does not extend trivially to language-specific pre-training.
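The abstract does not say which merging operator or similarity measure the authors use. The following is a minimal sketch under two loudly stated assumptions: merging is plain weighted parameter averaging ("linear merging"), and representational similarity is measured with linear centered kernel alignment (CKA). Both are common choices swapped in here for illustration, not necessarily the paper's method. The helpers `linear_merge` and `linear_cka` are hypothetical names, and parameters are assumed to be stored as flat lists of floats keyed by parameter name, with identical names and sizes across models.

```python
from typing import Dict, List

Matrix = List[List[float]]          # rows = inputs, columns = hidden units
StateDict = Dict[str, List[float]]  # assumption: flattened parameters keyed by name


def linear_merge(models: List[StateDict], weights: List[float]) -> StateDict:
    """Weighted average of same-named, same-sized parameters (illustrative linear merge)."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    total = sum(weights)
    merged: StateDict = {}
    for name in models[0]:
        size = len(models[0][name])
        # Each merged entry is the normalized weighted sum of the corresponding entries.
        merged[name] = [
            sum(w * m[name][i] for m, w in zip(models, weights)) / total
            for i in range(size)
        ]
    return merged


def _center_columns(x: Matrix) -> Matrix:
    """Subtract each column's mean so every hidden unit has zero mean over the inputs."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in x]


def _cross_frob_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of A^T B, for A (n x p) and B (n x q)."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(a[k][i] * b[k][j] for k in range(len(a)))
            total += s * s
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two models' activations on the same n inputs.

    CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on column-centered data.
    Assumes activations are not constant (otherwise the denominator is zero).
    """
    xc, yc = _center_columns(x), _center_columns(y)
    num = _cross_frob_sq(yc, xc)
    den = (_cross_frob_sq(xc, xc) * _cross_frob_sq(yc, yc)) ** 0.5
    return num / den
```

For example, `linear_merge([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}], [0.5, 0.5])` averages to `{"w": [2.0, 3.0]}`. Linear CKA is 1.0 when one representation is an isotropic rescaling of the other and drops toward 0 as they become unrelated, so under these assumptions a low score between two monolingual models' hidden states would be one way to probe the kind of representational mismatch the abstract links to merge failure.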
