Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining
Multilingual pretraining develops translation in two phases: early copying driven by surface similarities, followed by the emergence of generalizing translation mechanisms while copying is refined.
URL https://aclanthology.org/2024.acl-long.309/
4 Pith papers cite this work.
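To make the two-phase claim concrete, here is a minimal sketch (not taken from the paper) of one way copying can be separated from translating across pretraining checkpoints: count the fraction of generated target tokens that already appear verbatim in the source. The function and example tokens are illustrative only.

    # Illustrative probe: a high copy rate at early checkpoints and a falling copy
    # rate at later checkpoints would reflect the shift from surface copying to
    # generalizing translation mechanisms described above.
    def copy_rate(source_tokens, output_tokens):
        """Fraction of output tokens that appear verbatim in the source."""
        if not output_tokens:
            return 0.0
        source_set = set(source_tokens)
        return sum(tok in source_set for tok in output_tokens) / len(output_tokens)

    src = ["der", "Hund", "schläft"]
    print(copy_rate(src, ["der", "Hund", "schläft"]))        # 1.0: pure copying
    print(copy_rate(src, ["the", "dog", "is", "sleeping"]))  # 0.0: genuine translation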
Representative citing papers
-
One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging
Merging fine-tuned models for multilingual translation fails because fine-tuning redistributes language-specific neurons rather than sharpening them, increasing representational divergence in output-generating layers.
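For readers unfamiliar with the setup being critiqued, the sketch below shows plain parameter averaging of per-language fine-tuned checkpoints, one of the simplest merging schemes. It is an illustration under assumed file names, not the paper's pipeline.

    import torch

    def average_merge(state_dicts):
        """Naive merge: element-wise mean of corresponding parameters.
        If fine-tuning places language-specific neurons at different positions
        in each checkpoint, averaging blurs them instead of combining them."""
        merged = {}
        for name in state_dicts[0]:
            merged[name] = torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
        return merged

    # Hypothetical usage with per-language fine-tuned checkpoints of one base model:
    # merged = average_merge([torch.load(p) for p in ["ft_de.pt", "ft_fr.pt", "ft_zh.pt"]])
    # base_model.load_state_dict(merged)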
-
Do Language Models Encode Knowledge of Linguistic Constraint Violations?
Sparse autoencoder analysis of language model activations finds limited evidence for a unified set of features detecting linguistic constraint violations.
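The kind of sparse-autoencoder analysis referred to here can be sketched as follows; the dimensions, L1 coefficient, and class name are placeholders rather than the paper's configuration.

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        """Minimal sparse autoencoder over language-model activations. A feature
        that fires on constraint-violating sentences but not on matched controls
        would be a candidate 'violation detector'."""
        def __init__(self, d_model, d_features, l1_coeff=1e-3):
            super().__init__()
            self.encoder = nn.Linear(d_model, d_features)
            self.decoder = nn.Linear(d_features, d_model)
            self.l1_coeff = l1_coeff

        def forward(self, acts):
            features = torch.relu(self.encoder(acts))
            recon = self.decoder(features)
            loss = (recon - acts).pow(2).mean() + self.l1_coeff * features.abs().mean()
            return recon, features, loss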
-
From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models
HONES ranks feed-forward neurons by their causal contributions from task-relevant attention heads and uses lightweight scaling to steer performance on multiple vision-language tasks.
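The HONES procedure itself is not reproduced here; the sketch below only illustrates the generic "lightweight scaling" step, multiplying the highest-ranked feed-forward neurons by a constant via a forward hook. The attribution scores, the module being hooked, and the constants are assumptions.

    import torch

    def scale_top_neurons(ffn_activation_module, neuron_scores, top_k=32, scale=1.5):
        """Scale the top-k neurons (ranked by a precomputed attribution score) in
        the output of the hooked module, leaving all other activations unchanged."""
        top_idx = torch.topk(neuron_scores, k=top_k).indices

        def hook(module, inputs, output):
            output = output.clone()
            output[..., top_idx] *= scale
            return output  # returned tensor replaces the module's original output

        return ffn_activation_module.register_forward_hook(hook)

    # Hypothetical usage: hook the module whose output is the FFN hidden activation,
    # run the task of interest, then call handle.remove() to restore the unsteered model.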