Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT
6 Pith papers cite this work.
citing papers explorer
- Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL
  Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages. (A minimal data-mixing sketch follows this list.)
- Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation
  Translation function vectors extracted from English-to-one-target-language prompts improve the ranking of correct tokens when translating into multiple other, unseen target languages in decoder-only multilingual LLMs. (A simplified extraction-and-steering sketch follows this list.)
- Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining
  Multilingual pretraining develops translation ability in two phases: early copying driven by surface similarities between languages, followed by the emergence of more general translation mechanisms as copying is refined.
- BloombergGPT: A Large Language Model for Finance
  BloombergGPT is a 50B-parameter LLM trained on a 708B-token mix of financial and general-purpose data; it outperforms prior models on financial benchmarks while preserving general LLM performance.
- The Impact of Vocabulary Overlaps on Knowledge Transfer in Multilingual Machine Translation
  Experiments show that domain match and language relatedness drive knowledge transfer in multilingual MT more than vocabulary overlap does. (A sketch of one way to quantify vocabulary overlap follows this list.)
- YEZE at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization via Heterogeneous Ensembling
  A heterogeneous ensemble of XLM-RoBERTa-large and mDeBERTa-v3-base, with each task modeled independently and class-weighted losses, is reported to be effective for multilingual, multicultural, and multievent online polarization detection. (A generic ensembling sketch follows this list.)
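Data-mixing sketch (Parallel-SFT entry): a minimal illustration of mixing parallel programs across languages into one SFT set, assuming a hypothetical corpus keyed by problem id. The prompt template, sampling ratios, and the `build_parallel_sft` helper are illustrative placeholders, not the paper's actual recipe.

```python
import random

# Hypothetical parallel corpus: each problem id maps to semantically
# equivalent reference solutions in several programming languages.
PARALLEL = {
    "two_sum": {
        "python": "def two_sum(nums, t): ...",
        "cpp":    "std::vector<int> two_sum(...) { ... }",
        "rust":   "fn two_sum(...) -> Vec<i32> { ... }",
    },
}

def build_parallel_sft(corpus, langs_per_problem=2, seed=0):
    """Interleave parallel solutions across languages into one SFT set,
    so the model sees each problem in several surface forms."""
    rng = random.Random(seed)
    examples = []
    for problem, solutions in corpus.items():
        langs = rng.sample(sorted(solutions),
                           k=min(langs_per_problem, len(solutions)))
        for lang in langs:
            examples.append({
                "prompt": f"Solve `{problem}` in {lang}.",
                "completion": solutions[lang],
            })
    rng.shuffle(examples)  # mix languages rather than grouping by language
    return examples

print(build_parallel_sft(PARALLEL))
```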
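Function-vector sketch (language-agnosticity entry): a simplified extraction-and-steering loop that averages last-token residual-stream activations, a stand-in for the attention-head function vectors the paper actually analyzes. It assumes a LLaMA-style Hugging Face decoder whose blocks live at `model.model.layers`; `prompts`, `layer_idx`, and `alpha` are illustrative.

```python
import torch

@torch.no_grad()
def extract_fv(model, tok, prompts, layer_idx):
    """Mean last-token residual activation at `layer_idx`, averaged over
    English->L1 translation prompts."""
    cache = []
    def grab(_mod, _inp, out):
        hidden = out[0] if isinstance(out, tuple) else out
        cache.append(hidden[:, -1, :])  # activation at the last prompt token
    handle = model.model.layers[layer_idx].register_forward_hook(grab)
    for p in prompts:
        model(**tok(p, return_tensors="pt").to(model.device))
    handle.remove()
    return torch.cat(cache).mean(dim=0)  # the "function vector"

@torch.no_grad()
def generate_with_fv(model, tok, prompt, fv, layer_idx, alpha=1.0):
    """Decode a zero-shot English->L2 prompt while adding the vector to the
    residual stream at `layer_idx`."""
    def steer(_mod, _inp, out):
        hidden = out[0] if isinstance(out, tuple) else out
        hidden += alpha * fv  # in-place steering toward "translate" behavior
    handle = model.model.layers[layer_idx].register_forward_hook(steer)
    ids = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=20)
    handle.remove()
    return tok.decode(out[0], skip_special_tokens=True)
```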
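Vocabulary-overlap sketch (knowledge-transfer entry): one hypothetical way to quantify "vocabulary overlap", as Jaccard similarity over the token sets two corpora actually use; the paper's exact metric is not reproduced here.

```python
def vocab_overlap(corpus_a, corpus_b, tokenize):
    """Jaccard similarity between the token vocabularies used by two corpora."""
    vocab_a = {t for sent in corpus_a for t in tokenize(sent)}
    vocab_b = {t for sent in corpus_b for t in tokenize(sent)}
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)

# Toy example with whitespace tokens standing in for a subword tokenizer:
german = ["der hund schläft", "die katze schläft"]
dutch  = ["de hond slaapt", "de kat slaapt"]
print(vocab_overlap(german, dutch, str.split))  # 0.0
# Closely related languages can still share almost no surface vocabulary,
# which is consistent with relatedness and domain match mattering more
# than raw overlap.
```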
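Ensembling sketch (polarization entry): a generic scaffold for class-weighted loss and probability-averaged inference over independently fine-tuned Hugging Face sequence classifiers. The fine-tuning loop, per-task heads, and the exact weighting scheme of the actual system are omitted.

```python
import torch
import torch.nn.functional as F

def class_weights(labels, num_classes):
    """Inverse-frequency weights so rare classes are not drowned out."""
    counts = torch.bincount(torch.tensor(labels), minlength=num_classes).float()
    return counts.sum() / (num_classes * counts.clamp(min=1))

def weighted_loss(logits, targets, weights):
    """Class-weighted cross-entropy for fine-tuning each member model."""
    return F.cross_entropy(logits, targets, weight=weights)

@torch.no_grad()
def ensemble_predict(members, texts):
    """Average softmax probabilities over independently fine-tuned
    (model, tokenizer) pairs, then take the argmax."""
    probs = []
    for model, tok in members:
        enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
        probs.append(F.softmax(model(**enc).logits, dim=-1))
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)
```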