pith:A536HLN7
Unsupervised Cross-lingual Representation Learning at Scale
Pretraining multilingual language models on 100 languages with over two terabytes of data leads to large gains on cross-lingual benchmarks.
arxiv:1911.02116 v2 · 2019-11-05 · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{A536HLN7UJUF763LWO53KFKHUV}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks.
This rests on the assumption that the observed gains are caused by the increased scale of pretraining data and languages rather than by differences in data filtering, hyperparameter choices, or evaluation protocol details not visible in the abstract.
XLM-R, pretrained on 100 languages with 2TB of CommonCrawl data, improves average XNLI accuracy by 14.6 points and MLQA F1 by 13 points over mBERT while matching strong monolingual models on GLUE.
Receipt and verification
| First computed | 2026-05-17T23:38:47.315378Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
0777e3adbfa2685ffb6bb3bbb51547a555b8cee1800b2893176cd77160efda46
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/A536HLN7UJUF763LWO53KFKHUV \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0777e3adbfa2685ffb6bb3bbb51547a555b8cee1800b2893176cd77160efda46
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "fbed3f020eb6d7cdcebb15ee2da04eef8c1db21877b60207edcbbd0b72267088",
"cross_cats_sorted": [],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.CL",
"submitted_at": "2019-11-05T22:42:00Z",
"title_canon_sha256": "f1c1e325d47d6ee88301d33ccbf5082b8804a1f894c8786e31e3003ca0f104c5"
},
"schema_version": "1.0",
"source": {
"id": "1911.02116",
"kind": "arxiv",
"version": 2
}
}
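The verification command above canonicalizes the record by re-serializing it with sorted keys and compact separators, then hashing the UTF-8 bytes with SHA-256. The following is a minimal offline sketch of that same step in Python, using the record as printed on this page. The names `canonical_hash` and `RECORD` are illustrative and not part of any Pith tooling. The result should equal the canonical hash listed above only if the published record is identical to the one shown here.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a record in canonical JSON form.

    Canonical form here mirrors the one-liner on this page: keys sorted,
    no whitespace around separators, non-ASCII characters left unescaped.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# The canonical record copied from this page (hypothetical local copy).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "fbed3f020eb6d7cdcebb15ee2da04eef8c1db21877b60207edcbbd0b72267088",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2019-11-05T22:42:00Z",
        "title_canon_sha256": "f1c1e325d47d6ee88301d33ccbf5082b8804a1f894c8786e31e3003ca0f104c5",
    },
    "schema_version": "1.0",
    "source": {"id": "1911.02116", "kind": "arxiv", "version": 2},
}

if __name__ == "__main__":
    # Compare this value by eye with the canonical hash on the page.
    print(canonical_hash(RECORD))
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the fields arrive, which is why the command can pipe the `jq` output straight into the hash.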