HAT Score analysis of 20 models on 3 benchmarks finds transfer functional in small models, slower-than-expected gains with scale, and clear progress over time.
On the Cross-lingual Transferability of Monolingual Representations
9 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 9
citation-role summary: dataset 1
citation-polarity summary: use dataset 1
representative citing papers
SHIFT mitigates language bias in MLIR by subtracting estimated relative language vectors from document embeddings during indexing using parallel translation pairs.
CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
The study filters non-English Wikipedia, reveals quality problems, proposes a 4-level ranking, and shows filtered data matches or beats raw data in language modeling with largest gains for lower-quality editions.
Selective replacement of the worst 20-30% of text-only subtitle segments with visual-enhanced outputs raises COMET scores for Indic languages, but full visual grounding is ineffective because of temporal misalignment between subtitles and frames.
Incidental multilingualism from uneven web training makes LLMs unequal, brittle, and opaque across languages.
A new pre-training task that maps languages bidirectionally in embedding space improves machine translation by up to 11.9 BLEU, cross-lingual QA by 6.72 BERTScore points, and understanding accuracy by over 5% over strong baselines.
Activation steering with FLORES-derived language vectors produces modest, layer-sensitive and language-dependent gains on cultural awareness tasks, with some settings degrading performance and strong interaction with prompt design.
citing papers explorer
- Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance
A new pre-training task that maps languages bidirectionally in embedding space improves machine translation by up to 11.9 BLEU, cross-lingual QA by 6.72 BERTScore points, and understanding accuracy by over 5% over strong baselines.