Modular Monolingual Adaptation using Pretrained Language Models
Pith reviewed 2026-06-28 01:08 UTC · model grok-4.3
The pith
Replacing tokens, freezing embeddings, and tuning the rest adapts pretrained models better to low-resource languages than full finetuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Replacing the tokens, freezing the corresponding embeddings, and tuning only the rest of the model, rather than the entire model, yields better results on natural language understanding tasks when adapting pretrained models to low-resource languages.
What carries the argument
Modular adaptation through token replacement and embedding freezing while selectively tuning model parameters.
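The split between frozen and tuned parameters can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the `Param` class, the `freeze_token_embeddings` helper, and all parameter names are hypothetical stand-ins for whatever the underlying framework exposes.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    requires_grad: bool = True  # every parameter starts out trainable


def freeze_token_embeddings(params, embedding_prefix="embeddings.word_embeddings"):
    """Freeze parameters whose name starts with `embedding_prefix`; keep the rest tunable."""
    for p in params:
        p.requires_grad = not p.name.startswith(embedding_prefix)
    return params


# Hypothetical parameter names, loosely modelled on an encoder-style PLM.
model_params = [
    Param("embeddings.word_embeddings.weight"),  # token embeddings after replacement
    Param("encoder.layer.0.attention.weight"),
    Param("encoder.layer.0.ffn.weight"),
    Param("lm_head.bias"),
]

freeze_token_embeddings(model_params)
frozen = [p.name for p in model_params if not p.requires_grad]
tuned = [p.name for p in model_params if p.requires_grad]
print("frozen:", frozen)
print("tuned:", tuned)
```

In a real training setup the same idea would typically be expressed by switching off gradient tracking for the embedding matrix and handing only the remaining parameters to the optimizer.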
If this is right
- The modular approach can be more effective than full tuning for low-resource language adaptation.
- It works for very low-resource cases such as Quechua with 8.5k training instances.
- Analysis shows the importance of training strategies and pretrained embedding choices.
- Performance gains are observed on mask filling, NER, and POS tasks.
Where Pith is reading between the lines
- Such modularity may reduce computational costs for adapting models to many languages.
- The results imply that preserving original embeddings helps retain cross-lingual knowledge.
- Similar freezing strategies could be explored for other layers in future adaptations.
Load-bearing premise
Freezing the embeddings after token replacement is enough to retain useful knowledge from the original model: no embedding updates are needed, and no new interference is introduced.
What would settle it
Experiments showing that updating all parameters, embeddings included, yields higher accuracy on the NLU tasks for these languages would challenge the modular claim.
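One way such a comparison could be made robust is a paired bootstrap over per-example correctness. The sketch below is an assumption about how one might run the check, not the paper's evaluation procedure. The function name `paired_bootstrap` and the two score lists are hypothetical, and the scores are made-up illustration data rather than reported results.

```python
import random


def paired_bootstrap(correct_a, correct_b, n_resamples=2000, seed=0):
    """Return (observed accuracy difference A - B, share of resamples where A beats B)."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same examples")
    n = len(correct_a)
    rng = random.Random(seed)
    observed = (sum(correct_a) - sum(correct_b)) / n
    wins = 0
    for _ in range(n_resamples):
        # Resample example indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(correct_a[i] - correct_b[i] for i in idx)
        if diff > 0:
            wins += 1
    return observed, wins / n_resamples


# Made-up per-example correctness (1 = correct), for illustration only.
frozen_embeddings = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
full_finetuning = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]

delta, win_rate = paired_bootstrap(frozen_embeddings, full_finetuning)
print(f"accuracy difference: {delta:.2f}, win rate over resamples: {win_rate:.3f}")
```

A win rate close to 1 would support the frozen-embedding variant, while a rate near or below 0.5 would point the other way. With test sets as small as those available for Quechua, the resampling spread is itself worth reporting.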
Original abstract
Building monolingual language models (LMs) for low-resource languages typically relies on adapting pretrained language models (PLMs) by finetuning the whole model on the target language. This approach is widely favored over training from scratch, as it enables effective knowledge transfer. Additionally, prior work has shown that using a language-specific tokenizer can enhance the adaptability. In this work, we hypothesize that full model tuning is often unnecessary and propose a more modular approach. Specifically, we replace the tokens, freeze the corresponding embeddings, and tune the rest of the model. We use Scottish Gaelic, Irish, and Quechua for our experiments, with Quechua being a very low-resource language (8.5k training instances). Evaluation on natural language understanding (NLU) tasks -- mask filling, NER, and POS -- shows that our proposed approach improves performance when adapting models to low-resource languages. Additionally, we provide a comprehensive analysis of the effectiveness of training strategies, the choice of pretrained embeddings, and models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a modular approach to adapting pretrained language models (PLMs) to low-resource languages by replacing tokens with language-specific ones, freezing the corresponding embeddings, and fine-tuning only the remaining model parameters. Experiments are conducted on Scottish Gaelic, Irish, and Quechua (the latter with only 8.5k training instances), evaluating on NLU tasks including mask filling, NER, and POS tagging. The central claim is that this method improves performance over full-model fine-tuning, with additional analysis of training strategies, embedding choices, and model selections.
Significance. If the results hold after addressing the noted gaps, the work would indicate that full fine-tuning is often unnecessary for monolingual PLM adaptation in low-resource settings, potentially offering efficiency gains while better preserving pretrained knowledge. The focus on a very low-resource case (Quechua) and multiple tasks provides a relevant testbed for modular adaptation techniques.
major comments (2)
- [analysis of training strategies and embedding choices] The central claim depends on the assumption that freezing new embeddings after token replacement is sufficient to preserve transfer without harmful interference or the need for language-specific updates. However, the analysis of training strategies and embedding choices does not include a direct frozen-vs-unfrozen ablation under identical token replacement, which is required to validate this for Quechua's 8.5k-instance regime where upper layers may not compensate.
- [abstract and experimental evaluation] The abstract asserts that the proposed approach 'improves performance' on the NLU tasks but supplies no quantitative results, baselines, effect sizes, or statistical tests. If the results section lacks these controls and comparisons to full fine-tuning (and to token replacement without freezing), the empirical support for the central claim cannot be assessed.
minor comments (1)
- [method] The method description would benefit from a diagram or pseudocode illustrating the token replacement and which parameters are frozen vs. tuned.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive comments. We address each major comment below, clarifying the manuscript's content and outlining planned revisions where appropriate.
- Referee: [analysis of training strategies and embedding choices] The central claim depends on the assumption that freezing new embeddings after token replacement is sufficient to preserve transfer without harmful interference or the need for language-specific updates. However, the analysis of training strategies and embedding choices does not include a direct frozen-vs-unfrozen ablation under identical token replacement, which is required to validate this for Quechua's 8.5k-instance regime where upper layers may not compensate.
  Authors: We appreciate the referee's emphasis on this distinction. The manuscript's analysis of training strategies explicitly compares the proposed modular approach (token replacement followed by freezing the new embeddings while tuning the remainder) against full fine-tuning after identical token replacement. The latter case updates the new embeddings and thus serves as the unfrozen counterpart. These comparisons are reported for all languages, including the 8.5k-instance Quechua setting. To make the frozen-vs-unfrozen contrast even more explicit, we will add a dedicated ablation subsection isolating this factor in the revised version.
  Revision: yes
- Referee: [abstract and experimental evaluation] The abstract asserts that the proposed approach 'improves performance' on the NLU tasks but supplies no quantitative results, baselines, effect sizes, or statistical tests. If the results section lacks these controls and comparisons to full fine-tuning (and to token replacement without freezing), the empirical support for the central claim cannot be assessed.
  Authors: We agree that the abstract would be strengthened by including concrete quantitative support. In revision we will update the abstract to report key performance deltas versus full fine-tuning on mask filling, NER, and POS tagging, along with the primary baselines. The results section already presents direct comparisons to full fine-tuning (which encompasses token replacement without freezing the new embeddings) across the three languages and tasks; we will ensure effect sizes are highlighted and will add statistical significance markers wherever they are missing.
  Revision: yes
Circularity Check
No circularity: empirical adaptation method with independent experimental validation
The paper proposes replacing tokens, freezing the new embeddings, and tuning only the rest of the model for PLM adaptation to low-resource languages, then evaluates this on mask filling, NER, and POS tasks for Scottish Gaelic, Irish, and Quechua. The paper contains no mathematical derivation chain, no equations, and no fitted parameters renamed as predictions. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim rests on direct empirical comparisons rather than any self-referential construction, satisfying the default expectation of no significant circularity.