How multilingual is Multilingual BERT?

Telmo Pires, Eva Schlinger, Dan Garrette · 2019 · cs.CL · arXiv 1906.01502

5 Pith papers cite this work.

abstract

In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer works best between typologically similar languages, that monolingual corpora can train models for code-switching, and that the model can find translation pairs. From these results, we can conclude that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs.
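The abstract reports that M-BERT "can find translation pairs." The sketch below is a hedged illustration only, not the authors' code. It shows one common way to set up such a probe. Start from precomputed vectors for aligned sentences in two languages. For each source sentence, retrieve the most similar target sentence by cosine similarity, optionally after adding the mean source-to-target offset first. The embeddings are hand-made toy vectors standing in for model representations. The offset step is an assumption of this sketch. The names `retrieve_pairs`, `mean_offset`, and `accuracy` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_offset(src: List[Vector], tgt: List[Vector]) -> Vector:
    """Average of (tgt_i - src_i) over aligned pairs: a crude 'translation vector' (assumption)."""
    n, dim = len(src), len(src[0])
    return [sum(t[d] - s[d] for s, t in zip(src, tgt)) / n for d in range(dim)]


def retrieve_pairs(src: List[Vector], tgt: List[Vector],
                   offset: Optional[Vector] = None) -> List[int]:
    """For each source vector, return the index of its most similar target vector."""
    matches = []
    for s in src:
        # Optionally shift the query toward the target language before searching.
        query = s if offset is None else [x + o for x, o in zip(s, offset)]
        scores = [cosine(query, t) for t in tgt]
        matches.append(max(range(len(tgt)), key=scores.__getitem__))
    return matches


def accuracy(matches: List[int]) -> float:
    """Fraction of sources whose nearest neighbour is the aligned target (i -> i)."""
    return sum(1 for i, j in enumerate(matches) if i == j) / len(matches)


# Toy stand-ins for sentence embeddings (hypothetical data, not model output):
# each "German" vector is its "English" partner plus a constant shift.
en = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [0.5, 0.5, 0.5]
de = [[x + s for x, s in zip(v, shift)] for v in en]

if __name__ == "__main__":
    print(accuracy(retrieve_pairs(en, de)))                      # plain nearest neighbour
    print(accuracy(retrieve_pairs(en, de, mean_offset(en, de))))  # with mean offset
```

With real model vectors, the offset is typically estimated on held-out pairs rather than on the evaluation set. This toy reuses one set for brevity.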

representative citing papers

Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation

cs.CL · 2026-05-14 · unverdicted · novelty 7.0

New metrics KSS and KPS are introduced to evaluate multilingual machine unlearning quality and cross-language consistency in LLMs, addressing limitations of single-language evaluation protocols.

Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Lesioning a shared core in multilingual LLMs drops whole-brain fMRI encoding correlation by 60.32%, while language-specific lesions selectively weaken predictions only for the matched native language.

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

cs.CL · 2024-02-05 · unverdicted · novelty 7.0

M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

cs.CL · 2020-02-19 · unverdicted · novelty 6.0

CodeBERT pre-trains a bimodal model on code and text pairs plus unimodal data to achieve state-of-the-art results on natural language code search and code documentation generation.

Patent Claim Generation by Fine-Tuning OpenAI GPT-2

cs.CL · 2019-07-01 · unverdicted · novelty 5.0

Fine-tunes GPT-2 on patent claims, probes training steps, analyzes conditional and unconditional sampling outputs, proposes a new sampling method, and releases an email bot for exploration.
