Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT
6 Pith papers cite this work.
citing papers explorer
- Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL
  Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages. (A minimal data-mixing sketch follows this list.)
- Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation
  Translation function vectors extracted from English-to-one-target-language prompts improve the ranking of correct tokens when translating into multiple other, unseen target languages in decoder-only multilingual LLMs. (A simplified extraction-and-steering sketch follows this list.)
- Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining
  Multilingual pretraining develops translation ability in two phases: early copying driven by surface similarities between languages, followed by the emergence of more general translation mechanisms as copying is refined.
- BloombergGPT: A Large Language Model for Finance
  BloombergGPT is a 50B-parameter LLM trained on a 708B-token mix of financial and general-purpose data; it outperforms prior models on financial benchmarks while preserving general LLM performance.
- The Impact of Vocabulary Overlaps on Knowledge Transfer in Multilingual Machine Translation
  Experiments show that domain match and language relatedness drive knowledge transfer in multilingual MT more than vocabulary overlap does. (A sketch of one way to quantify vocabulary overlap follows this list.)
- YEZE at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization via Heterogeneous Ensembling
  A heterogeneous ensemble of XLM-RoBERTa-large and mDeBERTa-v3-base, with each task modeled independently and class-weighted losses, is reported to be effective for multilingual, multicultural, and multievent online polarization detection. (A generic ensembling sketch follows this list.)
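Data-mixing sketch (Parallel-SFT entry): a minimal illustration of mixing parallel programs across languages into one SFT set, assuming a hypothetical corpus keyed by problem id. The prompt template, sampling ratios, and the `build_parallel_sft` helper are illustrative placeholders, not the paper's actual recipe.

```python
import random

# Hypothetical parallel corpus: each problem id maps to semantically
# equivalent reference solutions in several programming languages.
PARALLEL = {
    "two_sum": {
        "python": "def two_sum(nums, t): ...",
        "cpp":    "std::vector<int> two_sum(...) { ... }",
        "rust":   "fn two_sum(...) -> Vec<i32> { ... }",
    },
}

def build_parallel_sft(corpus, langs_per_problem=2, seed=0):
    """Interleave parallel solutions across languages into one SFT set,
    so the model sees each problem in several surface forms."""
    rng = random.Random(seed)
    examples = []
    for problem, solutions in corpus.items():
        langs = rng.sample(sorted(solutions),
                           k=min(langs_per_problem, len(solutions)))
        for lang in langs:
            examples.append({
                "prompt": f"Solve `{problem}` in {lang}.",
                "completion": solutions[lang],
            })
    rng.shuffle(examples)  # mix languages rather than grouping by language
    return examples

print(build_parallel_sft(PARALLEL))
```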
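Function-vector sketch (language-agnosticity entry): a simplified extraction-and-steering loop that averages last-token residual-stream activations, a stand-in for the attention-head function vectors the paper actually analyzes. It assumes a LLaMA-style Hugging Face decoder whose blocks live at `model.model.layers`; `prompts`, `layer_idx`, and `alpha` are illustrative.

```python
import torch

@torch.no_grad()
def extract_fv(model, tok, prompts, layer_idx):
    """Mean last-token residual activation at `layer_idx`, averaged over
    English->L1 translation prompts."""
    cache = []
    def grab(_mod, _inp, out):
        hidden = out[0] if isinstance(out, tuple) else out
        cache.append(hidden[:, -1, :])  # activation at the last prompt token
    handle = model.model.layers[layer_idx].register_forward_hook(grab)
    for p in prompts:
        model(**tok(p, return_tensors="pt").to(model.device))
    handle.remove()
    return torch.cat(cache).mean(dim=0)  # the "function vector"

@torch.no_grad()
def generate_with_fv(model, tok, prompt, fv, layer_idx, alpha=1.0):
    """Decode a zero-shot English->L2 prompt while adding the vector to the
    residual stream at `layer_idx`."""
    def steer(_mod, _inp, out):
        hidden = out[0] if isinstance(out, tuple) else out
        hidden += alpha * fv  # in-place steering toward "translate" behavior
    handle = model.model.layers[layer_idx].register_forward_hook(steer)
    ids = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=20)
    handle.remove()
    return tok.decode(out[0], skip_special_tokens=True)
```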
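Vocabulary-overlap sketch (knowledge-transfer entry): one hypothetical way to quantify "vocabulary overlap", as Jaccard similarity over the token sets two corpora actually use; the paper's exact metric is not reproduced here.

```python
def vocab_overlap(corpus_a, corpus_b, tokenize):
    """Jaccard similarity between the token vocabularies used by two corpora."""
    vocab_a = {t for sent in corpus_a for t in tokenize(sent)}
    vocab_b = {t for sent in corpus_b for t in tokenize(sent)}
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)

# Toy example with whitespace tokens standing in for a subword tokenizer:
german = ["der hund schläft", "die katze schläft"]
dutch  = ["de hond slaapt", "de kat slaapt"]
print(vocab_overlap(german, dutch, str.split))  # 0.0
# Closely related languages can still share almost no surface vocabulary,
# which is consistent with relatedness and domain match mattering more
# than raw overlap.
```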
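Ensembling sketch (polarization entry): a generic scaffold for class-weighted loss and probability-averaged inference over independently fine-tuned Hugging Face sequence classifiers. The fine-tuning loop, per-task heads, and the exact weighting scheme of the actual system are omitted.

```python
import torch
import torch.nn.functional as F

def class_weights(labels, num_classes):
    """Inverse-frequency weights so rare classes are not drowned out."""
    counts = torch.bincount(torch.tensor(labels), minlength=num_classes).float()
    return counts.sum() / (num_classes * counts.clamp(min=1))

def weighted_loss(logits, targets, weights):
    """Class-weighted cross-entropy for fine-tuning each member model."""
    return F.cross_entropy(logits, targets, weight=weights)

@torch.no_grad()
def ensemble_predict(members, texts):
    """Average softmax probabilities over independently fine-tuned
    (model, tokenizer) pairs, then take the argmax."""
    probs = []
    for model, tok in members:
        enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
        probs.append(F.softmax(model(**enc).logits, dim=-1))
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)
```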