Modular Monolingual Adaptation using Pretrained Language Models

Nalin Kumar; Ondřej Dušek

arxiv: 2606.06738 · v1 · pith:OLSVQJKInew · submitted 2026-06-04 · 💻 cs.CL


Pith reviewed 2026-06-28 01:08 UTC · model grok-4.3

classification 💻 cs.CL

keywords: monolingual adaptation · low-resource languages · pretrained language models · modular approach · token replacement · embedding freezing · NLU tasks


The pith

Replacing tokens, freezing embeddings, and tuning the rest adapts pretrained models better to low-resource languages than full finetuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether full model finetuning is needed to adapt pretrained language models to low-resource languages. It proposes a modular alternative: replace the tokens with language-specific ones, freeze the corresponding embeddings, and tune only the rest of the model. Experiments on Scottish Gaelic, Irish, and Quechua report improved performance on mask filling, named entity recognition, and part-of-speech tagging. This suggests more efficient knowledge transfer for languages with scarce data.

Core claim

By replacing tokens, freezing the corresponding embeddings, and tuning the rest of the model rather than the entire model, the adaptation to low-resource languages yields better results on natural language understanding tasks.

What carries the argument

Modular adaptation through token replacement and embedding freezing while selectively tuning model parameters.
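The freeze-and-tune split can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the parameter names, the toy scalar "weights", and the `is_frozen` and `sgd_step` helpers are hypothetical and do not come from the paper. In PyTorch-style frameworks the same effect is usually obtained by disabling gradient tracking for the embedding parameters.

```python
# Toy illustration of "freeze the embeddings, tune the rest".
# Parameter names and values are hypothetical, not taken from the paper.

def is_frozen(name: str) -> bool:
    """Assumption: embedding parameters are identified by a name prefix."""
    return name.startswith("embeddings.")


def sgd_step(params: dict, grads: dict, lr: float = 0.1) -> dict:
    """Return updated parameters, leaving frozen ones untouched."""
    updated = {}
    for name, value in params.items():
        if is_frozen(name):
            updated[name] = value  # frozen: the gradient is ignored
        else:
            updated[name] = value - lr * grads[name]
    return updated


params = {
    "embeddings.word": 0.50,        # new, language-specific token embeddings
    "encoder.layer0.attn": 0.20,    # remaining model body
    "encoder.layer1.ffn": -0.30,
    "encoder.final_norm": 1.00,
}
grads = {name: 1.0 for name in params}  # dummy gradients for the demo

new_params = sgd_step(params, grads)
trainable = [name for name in params if not is_frozen(name)]
```

After one step the embedding entry keeps its value while every other entry moves, which is the behaviour the modular approach relies on.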

If this is right

The modular approach can be more effective than full tuning for low-resource language adaptation.
It works for very low-resource cases such as Quechua with 8.5k training instances.
Analysis shows the importance of training strategies and pretrained embedding choices.
Performance gains are observed on mask filling, NER, and POS tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such modularity may reduce computational costs for adapting models to many languages.
The results imply that preserving original embeddings helps retain cross-lingual knowledge.
Similar freezing strategies could be explored for other layers in future adaptations.

Load-bearing premise

Freezing the embeddings after token replacement is enough to keep useful knowledge from the original model without updates or new interference.

What would settle it

If experiments show that updating all parameters including embeddings leads to higher accuracy on the NLU tasks for these languages, the modular claim would be challenged.
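The comparison that would settle the question can be laid out as a small experiment grid. The sketch below only enumerates the runs; the languages and tasks are those named in the paper, while the `frozen`/`unfrozen` labels and the helper names are hypothetical.

```python
from itertools import product

LANGUAGES = ["Scottish Gaelic", "Irish", "Quechua"]
TASKS = ["mask filling", "NER", "POS"]
# Both conditions use the same replaced tokens; only embedding updates differ.
EMBEDDING_MODES = ["frozen", "unfrozen"]


def ablation_grid():
    """Yield one run description per (language, task, embedding mode)."""
    for language, task, mode in product(LANGUAGES, TASKS, EMBEDDING_MODES):
        yield {"language": language, "task": task, "embeddings": mode}


def paired_runs(runs):
    """Group runs so each frozen run sits next to its unfrozen counterpart."""
    pairs = {}
    for run in runs:
        key = (run["language"], run["task"])
        pairs.setdefault(key, {})[run["embeddings"]] = run
    return pairs


runs = list(ablation_grid())
pairs = paired_runs(runs)
```

Each of the nine (language, task) cells then holds exactly one frozen and one unfrozen run, so any accuracy difference within a cell can be attributed to the embedding treatment alone.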

Figures

Figures reproduced from arXiv: 2606.06738 by Nalin Kumar, Ondřej Dušek.

**Figure 2.** The graph shows the weight differences be…

Original abstract

Building monolingual language models (LMs) for low-resource languages typically relies on adapting pretrained language models (PLMs) by finetuning the whole model on the target language. This approach is widely favored over training from scratch, as it enables effective knowledge transfer. Additionally, prior work has shown that using a language-specific tokenizer can enhance the adaptability. In this work, we hypothesize that full model tuning is often unnecessary and propose a more modular approach. Specifically, we replace the tokens, freeze the corresponding embeddings, and tune the rest of the model. We use Scottish Gaelic, Irish, and Quechua for our experiments, with Quechua being a very low-resource language (8.5k training instances). Evaluation on natural language understanding (NLU) tasks -- mask filling, NER, and POS -- shows that our proposed approach improves performance when adapting models to low-resource languages. Additionally, we provide a comprehensive analysis of the effectiveness of training strategies, the choice of pretrained embeddings, and models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The modular freezing approach after token replacement is a practical tweak worth testing, but the evidence for it beating full fine-tuning rests on unshown numbers and an untested assumption about embedding stability in tiny data regimes.


The main thing to know is that this paper tests a modular adaptation method: swap in a language-specific tokenizer, freeze the new embeddings, and tune only the rest of the model instead of full fine-tuning. They report better results on mask filling, NER, and POS for Scottish Gaelic, Irish, and Quechua, with the last one using just 8.5k training examples.

What is actually new is the specific combination of token replacement plus frozen embeddings plus partial tuning, presented as a testable hypothesis building on prior tokenizer observations. The paper does a decent job focusing on very low-resource cases and including analysis of training strategies, embedding choices, and different models. That breakdown gives readers concrete options to consider when adapting PLMs.

The soft spots are in the strength of the supporting evidence. The abstract states an improvement without any numbers, baselines, or tests, so it is impossible to judge effect size or robustness from what is shown. The stress-test concern lands: freezing the embeddings after replacement may not preserve useful transfer if the new tokens need language-specific updates, especially for Quechua where the corpus is tiny and upper layers may not compensate. If the paper lacks a direct frozen-versus-unfrozen comparison under identical token replacement, the central assumption stays untested.

This is for people working on efficient PLM adaptation to low-resource languages in NLP. A reader looking for practical experiments on tokenizer and embedding handling would find some value in the setup and analysis. It deserves a serious referee to check the actual results, controls, and whether the gains survive closer scrutiny.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a modular approach to adapting pretrained language models (PLMs) to low-resource languages by replacing tokens with language-specific ones, freezing the corresponding embeddings, and fine-tuning only the remaining model parameters. Experiments are conducted on Scottish Gaelic, Irish, and Quechua (the latter with only 8.5k training instances), evaluating on NLU tasks including mask filling, NER, and POS tagging. The central claim is that this method improves performance over full-model fine-tuning, with additional analysis of training strategies, embedding choices, and model selections.

Significance. If the results hold after addressing the noted gaps, the work would indicate that full fine-tuning is often unnecessary for monolingual PLM adaptation in low-resource settings, potentially offering efficiency gains while better preserving pretrained knowledge. The focus on a very low-resource case (Quechua) and multiple tasks provides a relevant testbed for modular adaptation techniques.

major comments (2)

[analysis of training strategies and embedding choices] The central claim depends on the assumption that freezing new embeddings after token replacement is sufficient to preserve transfer without harmful interference or the need for language-specific updates. However, the analysis of training strategies and embedding choices does not include a direct frozen-vs-unfrozen ablation under identical token replacement, which is required to validate this for Quechua's 8.5k-instance regime where upper layers may not compensate.
[abstract and experimental evaluation] The abstract asserts that the proposed approach 'improves performance' on the NLU tasks but supplies no quantitative results, baselines, effect sizes, or statistical tests. If the results section lacks these controls and comparisons to full fine-tuning (and to token replacement without freezing), the empirical support for the central claim cannot be assessed.
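One common way to supply the kind of statistical test requested here is a paired bootstrap over per-example outcomes. The sketch below is an assumption about how such a test could be run, not the paper's procedure; the function name, the resample count, and the example outcomes are all illustrative.

```python
import random


def paired_bootstrap(correct_a, correct_b, n_resamples=1000, seed=0):
    """Fraction of resamples in which system A scores strictly higher than B.

    correct_a / correct_b: per-example 0/1 outcomes on the same test items.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same examples")
    if not correct_a:
        raise ValueError("need at least one example")
    rng = random.Random(seed)  # fixed seed keeps the demo reproducible
    n = len(correct_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample test items with replacement, keeping the A/B pairing.
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(correct_a[i] - correct_b[i] for i in idx)
        if delta > 0:
            wins += 1
    return wins / n_resamples


# Hypothetical outcomes, for illustration only (not results from the paper).
modular = [1, 1, 1, 0, 1, 1, 0, 1]
full_ft = [1, 0, 1, 0, 1, 0, 0, 1]
win_rate = paired_bootstrap(modular, full_ft)
```

A win rate close to 1.0 would suggest that the advantage of one system survives resampling of the test set; a rate near 0.5 or below would not support the claimed improvement.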

minor comments (1)

[method] The method description would benefit from a diagram or pseudocode illustrating the token replacement and which parameters are frozen vs. tuned.
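A minimal sketch of what such pseudocode might look like, in plain Python. The initialisation rule used here (copy the vector for tokens shared with the old vocabulary, otherwise fall back to the mean of the old embeddings) is an assumption chosen for illustration; the abstract does not state how the replaced embeddings are obtained. All function and parameter names are hypothetical.

```python
def replace_tokens(old_vocab, old_embeddings, new_vocab):
    """Build an embedding table for new_vocab from an existing one.

    old_vocab: list of token strings; old_embeddings: list of vectors in
    the same order. Assumed rule: reuse the old vector when the token
    already exists, otherwise fall back to the mean old vector.
    """
    dim = len(old_embeddings[0])
    mean = [sum(vec[d] for vec in old_embeddings) / len(old_embeddings)
            for d in range(dim)]
    lookup = dict(zip(old_vocab, old_embeddings))
    return [list(lookup.get(token, mean)) for token in new_vocab]


def parameter_plan(param_names, embedding_prefix="embeddings."):
    """Label each parameter as frozen (embeddings) or tuned (the rest)."""
    return {name: ("frozen" if name.startswith(embedding_prefix) else "tuned")
            for name in param_names}


old_vocab = ["the", "##s", "cat"]
old_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
new_vocab = ["cat", "agus", "##s"]  # hypothetical target-language vocabulary

new_embeddings = replace_tokens(old_vocab, old_embeddings, new_vocab)
plan = parameter_plan(
    ["embeddings.word", "encoder.layer0.attn", "encoder.layer0.ffn"]
)
```

The resulting `plan` makes the frozen-versus-tuned split explicit, which is the information a method diagram would need to convey.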

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major comment below, clarifying the manuscript's content and outlining planned revisions where appropriate.


Referee: [analysis of training strategies and embedding choices] The central claim depends on the assumption that freezing new embeddings after token replacement is sufficient to preserve transfer without harmful interference or the need for language-specific updates. However, the analysis of training strategies and embedding choices does not include a direct frozen-vs-unfrozen ablation under identical token replacement, which is required to validate this for Quechua's 8.5k-instance regime where upper layers may not compensate.

Authors: We appreciate the referee's emphasis on this distinction. The manuscript's analysis of training strategies explicitly compares the proposed modular approach (token replacement followed by freezing the new embeddings while tuning the remainder) against full fine-tuning after identical token replacement. The latter case updates the new embeddings and thus serves as the unfrozen counterpart. These comparisons are reported for all languages, including the 8.5k-instance Quechua setting. To make the frozen-vs-unfrozen contrast even more explicit, we will add a dedicated ablation subsection isolating this factor in the revised version. revision: yes
Referee: [abstract and experimental evaluation] The abstract asserts that the proposed approach 'improves performance' on the NLU tasks but supplies no quantitative results, baselines, effect sizes, or statistical tests. If the results section lacks these controls and comparisons to full fine-tuning (and to token replacement without freezing), the empirical support for the central claim cannot be assessed.

Authors: We agree that the abstract would be strengthened by including concrete quantitative support. In revision we will update the abstract to report key performance deltas versus full fine-tuning on mask filling, NER, and POS tagging, along with the primary baselines. The results section already presents direct comparisons to full fine-tuning (which encompasses token replacement without freezing the new embeddings) across the three languages and tasks; we will ensure effect sizes are highlighted and will add statistical significance markers wherever they are missing. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical adaptation method with independent experimental validation


The paper proposes replacing tokens, freezing the new embeddings, and tuning the rest of the model for PLM adaptation to low-resource languages, then evaluates this on mask filling, NER, and POS tasks for Scottish Gaelic, Irish, and Quechua. There is no mathematical derivation chain, no equations, and no fitted parameters renamed as predictions. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim rests on direct empirical comparisons rather than any self-referential construction, satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is empirical and introduces no mathematical axioms, free parameters, or new entities; the central claim rests entirely on the experimental comparison described in the abstract.



Reference graph

Works this paper leans on

93 extracted references · 26 canonical work pages

[1]

Aho and Jeffrey D

Alfred V. Aho and Jeffrey D. Ullman , title =. 1972

1972
[2]

The Limits of Interpretation

Umberto Eco. The Limits of Interpretation
[3]

Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards

Jannik Strötgen and Michael Gertz. Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards. Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12). 2012

2012
[4]

Chercheur

J.L. Chercheur. Case-Based Reasoning. 1994

1994
[5]

Castor and L

A. Castor and L. E. Pollux. The use of user modelling to guide inference and learning. Applied Intelligence. 1992

1992
[6]

Superman and B

S. Superman and B. Batman and C. Catwoman and S. Spiderman. Superheroes experiences with books. Journal journal journal
[7]

Elementary Statistics

Paul Gerhard Hoel. Elementary Statistics. 1971

1971
[8]

1954--58

A history of technology. 1954--58

1954
[9]

N. Chomsky. Conditions on Transformations. A festschrift for Morris Halle. 1973

1973
[10]

Natural Fibre Twines

BSI. Natural Fibre Twines. 1973

1973
[11]

Language: Its Nature, Development, and Origin

Otto Jespersen. Language: Its Nature, Development, and Origin
[12]

Proceedings of the 29th International Conference on Computational Linguistics , pages=

Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning , author=. Proceedings of the 29th International Conference on Computational Linguistics , pages=
[13]

Accelerating Multilingual Language Model for Excessively Tokenized Languages

Hong, Jimin and Lee, Gibbeum and Cho, Jaewoong. Accelerating Multilingual Language Model for Excessively Tokenized Languages. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.660

work page doi:10.18653/v1/2024.findings-acl.660 2024
[14]

Efficient Active Learning with Adapters

Galimzianova, Daria and Sanochkin, Leonid. Efficient Active Learning with Adapters. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.840

work page doi:10.18653/v1/2024.findings-emnlp.840 2024
[15]

Multi- BERT : Leveraging Adapters for Low-Resource Multi-Domain Adaptation

Abed Azad, Parham and Beigy, Hamid. Multi- BERT : Leveraging Adapters for Low-Resource Multi-Domain Adaptation. Proceedings of the Tenth Workshop on Noisy and User-generated Text. 2025. doi:10.18653/v1/2025.wnut-1.12

work page doi:10.18653/v1/2025.wnut-1.12 2025
[16]

Multilingual Machine Translation with Hyper-Adapters

Baziotis, Christos and Artetxe, Mikel and Cross, James and Bhosale, Shruti. Multilingual Machine Translation with Hyper-Adapters. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022. doi:10.18653/v1/2022.emnlp-main.77

work page doi:10.18653/v1/2022.emnlp-main.77 2022
[17]

A dapter H ub: A Framework for Adapting Transformers

Pfeiffer, Jonas and R. A dapter H ub: A Framework for Adapting Transformers. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2020. doi:10.18653/v1/2020.emnlp-demos.7

work page doi:10.18653/v1/2020.emnlp-demos.7 2020
[18]

and Adelani, David Ifeoluwa and Mosbach, Marius and Klakow, Dietrich

Alabi, Jesujoba O. and Adelani, David Ifeoluwa and Mosbach, Marius and Klakow, Dietrich. Adapting Pre-trained Language Models to A frican Languages via Multilingual Adaptive Fine-Tuning. Proceedings of the 29th International Conference on Computational Linguistics. 2022

2022
[19]

International Conference on Machine Learning , pages=

Overtrained Language Models Are Harder to Fine-Tune , author=. International Conference on Machine Learning , pages=. 2025 , organization=

2025
[20]

12 Oleksiy Syvokon and Mariana Romanyshyn

Rust, Phillip and Pfeiffer, Jonas and Vuli \'c , Ivan and Ruder, Sebastian and Gurevych, Iryna. How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volum...

work page doi:10.18653/v1/2021.acl-long.243 2021
[21]

As Good as New

de Vries, Wietse and Nissim, Malvina. As Good as New. How to Successfully Recycle E nglish GPT -2 to Make Models for Other Languages. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.74

work page doi:10.18653/v1/2021.findings-acl.74 2021
[22]

Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages

Limisiewicz, Tomasz and Balhar, Ji r \'i and Mare c ek, David. Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages. Findings of the Association for Computational Linguistics: ACL 2023. 2023. doi:10.18653/v1/2023.findings-acl.350

work page doi:10.18653/v1/2023.findings-acl.350 2023
[23]

Rethinking Vocabulary Augmentation: Addressing the Challenges of Low-Resource Languages in Multilingual Models

Lin, Nankai and Zeng, Peijian and Zheng, Weixiong and Jiang, Shengyi and Zhou, Dong and Yang, Aimin. Rethinking Vocabulary Augmentation: Addressing the Challenges of Low-Resource Languages in Multilingual Models. Proceedings of the 31st International Conference on Computational Linguistics. 2025

2025
[24]

arXiv preprint arXiv:2406.11477 , year=

How Can We Effectively Expand the Vocabulary of LLMs with 0.01 GB of Target Language Text? , author=. arXiv preprint arXiv:2406.11477 , year=

arXiv
[25]

Bert: Pre-training of deep bidirectional transformers for language understanding , author=. Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers) , pages=

2019
[26]

and Smith, Noah A

Chau, Ethan C. and Smith, Noah A. Specializing Multilingual Language Models: An Empirical Study. Proceedings of the 1st Workshop on Multilingual Representation Learning. 2021. doi:10.18653/v1/2021.mrl-1.5

work page doi:10.18653/v1/2021.mrl-1.5 2021
[27]

When Being Unseen from m BERT is just the Beginning: Handling New Languages With Multilingual Language Models

Muller, Benjamin and Anastasopoulos, Antonios and Sagot, Beno \^i t and Seddah, Djam \'e. When Being Unseen from m BERT is just the Beginning: Handling New Languages With Multilingual Language Models. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10...

work page doi:10.18653/v1/2021.naacl-main.38 2021
[28]

Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in H ausa Language Using A fri BERT a

Sani, Sani Abdullahi and Muhammad, Shamsuddeen Hassan and Jarvis, Devon. Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in H ausa Language Using A fri BERT a. Proceedings of the First Workshop on Language Models for Low-Resource Languages. 2025

2025
[29]

MAD-X : A n A dapter- B ased F ramework for M ulti- T ask C ross- L ingual T ransfer

Pfeiffer, Jonas and Vuli \'c , Ivan and Gurevych, Iryna and Ruder, Sebastian. MAD-X : A n A dapter- B ased F ramework for M ulti- T ask C ross- L ingual T ransfer. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.617

work page doi:10.18653/v1/2020.emnlp-main.617 2020
[30]

and Tsvetkov, Yulia

Wang, Zirui and Lipton, Zachary C. and Tsvetkov, Yulia. On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.359

work page doi:10.18653/v1/2020.emnlp-main.359 2020
[31]

arXiv preprint arXiv:1912.07076 , year=

Multilingual is not enough: BERT for Finnish , author=. arXiv preprint arXiv:1912.07076 , year=

arXiv 1912
[32]

arXiv preprint arXiv:2003.02912 , year=

What the [mask]? making sense of language-specific BERT models , author=. arXiv preprint arXiv:2003.02912 , year=

arXiv 2003
[33]

M ulti F i T : Efficient Multi-lingual Language Model Fine-tuning

Eisenschlos, Julian and Ruder, Sebastian and Czapla, Piotr and Kadras, Marcin and Gugger, Sylvain and Howard, Jeremy. M ulti F i T : Efficient Multi-lingual Language Model Fine-tuning. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCN...

work page doi:10.18653/v1/d19-1572 2019
[34]

arXiv preprint arXiv:2303.08774 , year=

Gpt-4 technical report , author=. arXiv preprint arXiv:2303.08774 , year=

Pith/arXiv arXiv
[35]

Exploring the Impact of Transliteration on NLP Performance: Treating M altese as an A rabic Dialect

Micallef, Kurt and Eryani, Fadhl and Habash, Nizar and Bouamor, Houda and Borg, Claudia. Exploring the Impact of Transliteration on NLP Performance: Treating M altese as an A rabic Dialect. Proceedings of the Workshop on Computation and Written Language (CAWL 2023). 2023. doi:10.18653/v1/2023.cawl-1.4

work page doi:10.18653/v1/2023.cawl-1.4 2023
[36]

arXiv preprint arXiv:2407.02320 , year=

Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts , author=. arXiv preprint arXiv:2407.02320 , year=

arXiv
[37]

arXiv preprint arXiv:2409.17326 , year=

How Transliterations Improve Crosslingual Alignment , author=. arXiv preprint arXiv:2409.17326 , year=

arXiv
[38]

Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=

Romanization-based Large-scale Adaptation of Multilingual Language Models , author=. Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=

2023
[39]

arXiv preprint arXiv:2203.09904 , year=

Do Multilingual Language Models Capture Differing Moral Norms? , author=. arXiv preprint arXiv:2203.09904 , year=

arXiv
[40]

The 2023 W eb NLG Shared Task on Low Resource Languages

Cripwell, Liam and Belz, Anya and Gardent, Claire and Gatt, Albert and Borg, Claudia and Borg, Marthese and Judge, John and Lorandi, Michela and Nikiforovskaya, Anna and Soto Martinez, William. The 2023 W eb NLG Shared Task on Low Resource Languages. Overview and Evaluation Results ( W eb NLG 2023). Proceedings of the Workshop on Multimodal, Multilingual ...

2023
[41]

and McDonald, Ryan and Petrov, Slav and Pyysalo, Sampo and Silveira, Natalia and Tsarfaty, Reut and Zeman, Daniel

Nivre, Joakim and de Marneffe, Marie-Catherine and Ginter, Filip and Goldberg, Yoav and Haji c , Jan and Manning, Christopher D. and McDonald, Ryan and Petrov, Slav and Pyysalo, Sampo and Silveira, Natalia and Tsarfaty, Reut and Zeman, Daniel. U niversal D ependencies v1: A Multilingual Treebank Collection. Proceedings of the Tenth International Conferenc...

2016
[42]

and Pyysalo, Sampo and Schuster, Sebastian and Tyers, Francis and Zeman, Daniel

Nivre, Joakim and de Marneffe, Marie-Catherine and Ginter, Filip and Haji c , Jan and Manning, Christopher D. and Pyysalo, Sampo and Schuster, Sebastian and Tyers, Francis and Zeman, Daniel. U niversal D ependencies v2: An Evergrowing Multilingual Treebank Collection. Proceedings of the Twelfth Language Resources and Evaluation Conference. 2020

2020
[43]

W ord N et Embeddings

Saedi, Chakaveh and Branco, Ant \'o nio and Ant \'o nio Rodrigues, Jo \ a o and Silva, Jo \ a o. W ord N et Embeddings. Proceedings of the Third Workshop on Representation Learning for NLP. 2018. doi:10.18653/v1/W18-3016

work page doi:10.18653/v1/w18-3016 2018
[44]

arXiv preprint arXiv:2307.09288 , year=

Llama 2: Open foundation and fine-tuned chat models , author=. arXiv preprint arXiv:2307.09288 , year=

Pith/arXiv arXiv
[45]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=
[46]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages=

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation , author=. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages=
[47]

Journal of Machine Learning Research , volume=

Beyond english-centric multilingual machine translation , author=. Journal of Machine Learning Research , volume=
[48]

Mmi01 at The B aby LM Challenge: Linguistically Motivated Curriculum Learning for Pretraining in Low-Resource Settings

Mi, Maggie. Mmi01 at The B aby LM Challenge: Linguistically Motivated Curriculum Learning for Pretraining in Low-Resource Settings. Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning. 2023. doi:10.18653/v1/2023.conll-babylm.23

work page doi:10.18653/v1/2023.conll-babylm.23 2023
[49]

arXiv preprint arXiv:2402.07827 , year=

Aya model: An instruction finetuned open-access multilingual language model , author=. arXiv preprint arXiv:2402.07827 , year=

arXiv
[50]

CCN et: Extracting High Quality Monolingual Datasets from Web Crawl Data

Wenzek, Guillaume and Lachaux, Marie-Anne and Conneau, Alexis and Chaudhary, Vishrav and Guzm \'a n, Francisco and Joulin, Armand and Grave, Edouard. CCN et: Extracting High Quality Monolingual Datasets from Web Crawl Data. Proceedings of the Twelfth Language Resources and Evaluation Conference. 2020

2020
[51]

ACM , author =

Miller, George A. , title =. Commun. ACM , month = nov, pages =. 1995 , issue_date =. doi:10.1145/219717.219748 , abstract =

work page doi:10.1145/219717.219748 1995
[52]

Advances in Neural Information Processing Systems , volume=

Direct preference optimization: Your language model is secretly a reward model , author=. Advances in Neural Information Processing Systems , volume=
[53]

Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning. 2023

2023
[54]

arXiv preprint arXiv:2409.11968 , year=

Efficacy of Synthetic Data as a Benchmark , author=. arXiv preprint arXiv:2409.11968 , year=

arXiv
[55]

arXiv preprint arXiv:2308.08747 , year=

An empirical study of catastrophic forgetting in large language models during continual fine-tuning , author=. arXiv preprint arXiv:2308.08747 , year=

Pith/arXiv arXiv
[56]

MC ^2 : Towards Transparent and Culturally-Aware NLP for Minority Languages in C hina

Zhang, Chen and Tao, Mingxu and Huang, Quzhe and Lin, Jiuheng and Chen, Zhibin and Feng, Yansong. MC ^2 : Towards Transparent and Culturally-Aware NLP for Minority Languages in C hina. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.479

work page doi:10.18653/v1/2024.acl-long.479 2024
[57]

News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces

J \"o rg Tiedemann. News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces. Recent Advances in Natural Language Processing. 2009

2009
[58]

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

A New Massive Multilingual Dataset for High-Performance Language Technologies , author=. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

2024
[59]

arXiv preprint arXiv:2404.11553 , year=

Quantifying multilingual performance of large language models across languages , author=. arXiv preprint arXiv:2404.11553 , year=

arXiv
[60]

arXiv preprint arXiv:2402.14714 , year=

Efficient and effective vocabulary expansion towards multilingual large language models , author=. arXiv preprint arXiv:2402.14714 , year=

arXiv
[61]

Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

Marchisio, Kelly and Lewis, Patrick and Chen, Yihong and Artetxe, Mikel. Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training. Findings of the Association for Computational Linguistics: ACL 2023. 2023. doi:10.18653/v1/2023.findings-acl.338

work page doi:10.18653/v1/2023.findings-acl.338 2023
[62]

arXiv preprint arXiv:2007.09757 , year=

Mono vs multilingual transformer-based models: a comparison across several language tasks , author=. arXiv preprint arXiv:2007.09757 , year=

arXiv 2007
[63]

arXiv preprint arXiv:2010.11934 , year=

mt5: A massively multilingual pre-trained text-to-text transformer , author=. arXiv preprint arXiv:2010.11934 , year=

arXiv 2010
[64]

arXiv preprint arXiv:2401.01055 , year=

Llama beyond english: An empirical study on language capability transfer , author=. arXiv preprint arXiv:2401.01055 , year=

arXiv
[65]

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining , pages=

Improving cross-lingual information retrieval on low-resource languages via optimal transport distillation , author=. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining , pages=
[66]

arXiv preprint arXiv:2401.13303 , year=

Mala-500: Massive language adaptation of large language models , author=. arXiv preprint arXiv:2401.13303 , year=

arXiv
[67]

Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages=

Alexa teacher model: Pretraining and distilling multi-billion-parameter encoders for natural language understanding systems , author=. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages=
[68]

arXiv preprint arXiv:2407.21783 , year=

The llama 3 herd of models , author=. arXiv preprint arXiv:2407.21783 , year=

Pith/arXiv arXiv
[69]

arXiv preprint arXiv:2408.00118 , year=

Gemma 2: Improving open language models at a practical size , author=. arXiv preprint arXiv:2408.00118 , year=

Pith/arXiv arXiv
[70]

arXiv preprint arXiv:2312.11805 , year=

Gemini: a family of highly capable multimodal models , author=. arXiv preprint arXiv:2312.11805 , year=

Pith/arXiv arXiv
[71]

Publications Manual , year = "1983", publisher =

1983
[72]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243

work page doi:10.1145/322234.322243 1981
[73]

Scalable training of

Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of. 2007 , url=

2007
[74]

Dan Gusfield , title =. 1997

1997
[75]

Tetreault , title =

Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =

2015
[76]

and Lin, Lucy H

Chau, Ethan C. and Lin, Lucy H. and Smith, Noah A. Parsing with Multilingual BERT , a Small Corpus, and a Small Treebank. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.118

work page doi:10.18653/v1/2020.findings-emnlp.118 2020
[77]

and Zettlemoyer, Luke

Blevins, Terra and Limisiewicz, Tomasz and Gururangan, Suchin and Li, Margaret and Gonen, Hila and Smith, Noah A. and Zettlemoyer, Luke. Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.604

work page doi:10.18653/v1/2024.emnlp-main.604 2024
[78]

Wu, Shijie and Dredze, Mark. Are All Languages Created Equal in Multilingual BERT? Proceedings of the 5th Workshop on Representation Learning for NLP. 2020. doi:10.18653/v1/2020.repl4nlp-1.16
[79]

Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzmán, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin. Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. ... 2020. doi:10.18653/v1/2020.acl-main.747
[80]

Pan, Xiaoman and Zhang, Boliang and May, Jonathan and Nothman, Joel and Knight, Kevin and Ji, Heng. Cross-lingual Name Tagging and Linking for 282 Languages. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1178
