Expanding pretrained models to thousands more languages via lexicon-based adaptation

Xinyi Wang, Sebastian Ruder, Graham Neubig · 2022 · DOI 10.18653/v1/2022.acl-long.61

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers

cs.IR · 2026-04-19 · unverdicted · novelty 7.0

Code-switching creates a fundamental performance bottleneck for multilingual retrievers, causing drops of up to 27% on new benchmarks CSR-L and CS-MTEB, with embedding divergence as the key cause and vocabulary expansion insufficient to fix it.

ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model

cs.CL · 2024-04-03 · unverdicted · novelty 4.0

Four MAFT-based PLMs for Angolan languages report 12.3-point gains over AfroXLMR-base and 3.8-point gains over OFA baselines on downstream tasks.

citing papers explorer

Showing 2 of 2 citing papers.

Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers
cs.IR · 2026-04-19 · unverdicted · none · ref 45
ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model
cs.CL · 2024-04-03 · unverdicted · none · ref 26
