2409.16235
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Representative citing papers:
- On the Limits of Model Merging for Multilinguality in Pre-Training
  Merging any combination of monolingual pre-trained models leads to performance collapse due to interference, indicating that merging flexibility from fine-tuning does not extend to pre-training.
- A Recipe for Long-Context Reasoning in Large Language Models via On-Policy Optimization and Distillation