Scripts Through Time: A Survey of the Evolving Role of Transliteration in NLP
Pith reviewed 2026-05-10 05:04 UTC · model grok-4.3
The pith
Transliteration converts text between writing systems to raise lexical overlap and ease cross-lingual transfer in NLP.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Transliteration converts text from one script to another so that models see greater lexical overlap across languages and therefore transfer knowledge more effectively. Successive approaches to incorporating transliteration at input time each carry their own accuracy, efficiency, and coverage trade-offs, which vary with the target languages and tasks.
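The lexical-overlap mechanism behind this claim can be made concrete with a toy sketch. The character table, the `transliterate` helper, and the bigram-overlap measure below are illustrative inventions, not the survey's method or any standard romanization scheme (the naive per-character map even ignores Devanagari's inherent-vowel rule), yet even this crude mapping moves overlap off zero:

```python
# Toy, hypothetical Devanagari->Latin map (not ISO 15919 or any real scheme).
DEV_TO_LATIN = {
    "न": "na", "म": "ma", "स": "sa", "्": "", "त": "ta", "े": "e",
}

def transliterate(text: str) -> str:
    """Replace each Devanagari sign with its Latin counterpart (toy scheme)."""
    return "".join(DEV_TO_LATIN.get(ch, ch) for ch in text)

def bigram_overlap(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams, a crude proxy for lexical overlap."""
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

hindi = "नमस्ते"    # "namaste" written in Devanagari
latin = "namaste"  # the same word as written in a Latin-script language

before = bigram_overlap(hindi, latin)                 # scripts share nothing
after = bigram_overlap(transliterate(hindi), latin)   # overlap appears
print(f"overlap before: {before:.2f}, after: {after:.2f}")
```

Before transliteration the two strings share no characters at all, so any surface-overlap measure is zero; after even this naive mapping, roughly 40% of bigrams coincide.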
What carries the argument
A taxonomy of motivations for transliteration, together with an overview of input-incorporation approaches, organizing the reported benefits across code-mixing, language-family relatedness, and inference efficiency.
If this is right
- For code-mixed text, transliteration improves model handling by aligning mixed scripts into one representation.
- Languages from the same family gain more from transfer when transliteration increases shared vocabulary.
- Inference-time cost drops when romanized input yields shorter token sequences than native-script text under the same tokenizer in large models.
- Researchers obtain concrete selection rules that match transliteration strategy to language, task, and compute limits.
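The efficiency point above rests on simple encoding arithmetic: most non-Latin characters cost three bytes in UTF-8, so a byte-level tokenizer (and, typically, a subword tokenizer as well) sees markedly longer sequences for native-script text than for its romanization. A minimal sketch, assuming the romanizations shown (one common variant each; exact token counts depend on the tokenizer):

```python
# UTF-8 byte counts as a lower-bound proxy for sequence length under a
# byte-level tokenizer. Romanizations are common variants, not canonical.
pairs = [
    ("नमस्ते", "namaste"),      # Hindi greeting
    ("সংবাদ", "songbad"),       # Bengali "news"
    ("ありがとう", "arigatou"),   # Japanese "thank you"
]

for native, roman in pairs:
    nb = len(native.encode("utf-8"))
    rb = len(roman.encode("utf-8"))
    print(f"{native!r}: {nb} bytes native vs {rb} bytes romanized "
          f"({nb / rb:.1f}x)")
```

Each native-script word here is roughly two to three times the byte length of its romanization, which is the mechanism behind the survey's inference-efficiency motivation.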
Where Pith is reading between the lines
- The same trade-off analysis could be rerun to test whether transliteration still helps at the scale of LLMs released after the survey.
- Low-resource languages not heavily represented in the reviewed literature could be checked to see if the same taxonomy still predicts useful strategies.
- Combining transliteration with other cross-lingual signals such as shared subword units might produce larger gains than either method alone.
Load-bearing premise
The taxonomy and the collected studies together cover the main ways transliteration is used today without leaving out important recent work or major language settings.
What would settle it
A controlled experiment on a language pair and task outside the surveyed cases that shows transliteration either reduces accuracy or adds no measurable gain would undermine the selection recommendations.
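Such an experiment is, at bottom, a paired comparison, and "no measurable gain" calls for a significance check. Below is a minimal sketch using a paired bootstrap, assuming per-example 0/1 correctness scores from a baseline run and a transliterated-input run are already in hand; the score lists are placeholders, not reported results:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Fraction of bootstrap resamples in which system B beats system A.

    scores_a / scores_b: per-example scores (e.g. 0/1 correctness) from the
    baseline and the transliteration run on the *same* evaluation examples.
    """
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        if sum(scores_b[i] for i in idx) > sum(scores_a[i] for i in idx):
            wins += 1
    return wins / n_resamples

# Placeholder data: baseline vs transliterated-input run on 12 examples.
baseline = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
translit = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]

p_better = paired_bootstrap(baseline, translit)
print(f"P(transliteration run better) ~ {p_better:.2f}")
```

A value hovering near 0.5 means the two runs are indistinguishable (no measurable gain); a value near 1.0 supports a real improvement. Either outcome, on a language pair outside the surveyed cases, bears directly on the selection recommendations.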
Original abstract
Cross-lingual transfer in NLP is often hindered by the "script barrier" where differences in writing systems inhibit transfer learning between languages. Transliteration, the process of converting the script, has emerged as a powerful technique to bridge this gap by increasing lexical overlap. This paper provides a comprehensive survey of the application of transliteration in cross-lingual NLP. We present a taxonomy of key motivations to utilize transliterations in language models, and provide an overview of different approaches of incorporating transliterations as input. We analyze the evolution and effectiveness of these methods, discussing the critical trade-offs involved, and contextualize their need in modern LLMs. The review explores various settings that show how transliteration is beneficial, including handling code-mixed text, leveraging language family relatedness, and pragmatic gains in inference efficiency. Based on this analysis, we provide concrete recommendations for researchers on selecting and implementing the most appropriate transliteration strategy based on their specific language, task, and resource constraints.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper surveys the role of transliteration in NLP for overcoming the script barrier in cross-lingual transfer. It introduces a taxonomy of motivations for using transliteration in language models, overviews methods for incorporating transliterations as input, analyzes the evolution, effectiveness, and trade-offs of these approaches, and situates them within modern LLMs. The review highlights benefits in code-mixed text handling, leveraging language family relatedness, and inference efficiency gains, and concludes with concrete recommendations for selecting transliteration strategies based on language, task, and resource constraints.
Significance. If the taxonomy and literature synthesis prove comprehensive, the survey offers a useful organizing framework for researchers in multilingual NLP and LLMs by consolidating motivations, approaches, and practical trade-offs. It explicitly credits prior work through structured analysis and provides actionable recommendations that could aid strategy selection in resource-constrained settings, particularly where efficiency matters. The emphasis on pragmatic gains in modern LLMs is a timely contribution if recent developments are adequately covered.
major comments (1)
- [Taxonomy and modern LLMs contextualization] Taxonomy section and the modern LLMs contextualization (as referenced in the abstract): The central claim that transliteration yields benefits in code-mixing, language-family transfer, and inference efficiency, leading to concrete strategy recommendations, depends on the taxonomy comprehensively capturing current practice. If post-2022 literature on transliteration interactions with subword/byte-level tokenizers (e.g., in mT5, Llama, or Mistral-style models) or long-context efficiency is omitted, the generalization to current LLM settings is undermined. This is load-bearing for the recommendations in resource-constrained scenarios.
minor comments (2)
- [Abstract] The abstract states the survey contextualizes transliteration in modern LLMs but does not indicate the cutoff date for reviewed literature or list key recent models explicitly; adding this would help readers evaluate coverage.
- [Taxonomy] Figure or table presenting the taxonomy of motivations could benefit from clearer visual distinction between historical and contemporary approaches to aid quick reference.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for identifying a key area for strengthening the manuscript. We address the major comment below and will incorporate revisions to enhance the coverage of recent literature.
Point-by-point responses
Referee: Taxonomy section and the modern LLMs contextualization (as referenced in the abstract): The central claim that transliteration yields benefits in code-mixing, language-family transfer, and inference efficiency, leading to concrete strategy recommendations, depends on the taxonomy comprehensively capturing current practice. If post-2022 literature on transliteration interactions with subword/byte-level tokenizers (e.g., in mT5, Llama, or Mistral-style models) or long-context efficiency is omitted, the generalization to current LLM settings is undermined. This is load-bearing for the recommendations in resource-constrained scenarios.
Authors: We agree that robust coverage of post-2022 developments is necessary to support the recommendations for modern LLM settings. The taxonomy organizes motivations (e.g., lexical overlap for code-mixing, family relatedness, and efficiency) that are largely architecture-agnostic, and the manuscript already reviews the evolution of incorporation methods through transformer-based models while contextualizing needs in LLMs. However, to directly address the concern, we will expand the modern LLMs section with additional analysis and citations of recent works examining transliteration interactions with subword tokenizers (as in Llama and Mistral) and byte-level approaches, including any documented effects on long-context efficiency. This revision will make the generalization and practical recommendations more explicit and evidence-based without altering the core taxonomy.
revision: yes
Circularity Check
No circularity: survey aggregates external studies
Full rationale
This is a literature survey that presents a taxonomy of motivations, overviews approaches from cited works, analyzes trade-offs, and offers recommendations based on external evidence. No equations, fitted parameters, or self-derived predictions exist. Central claims about benefits in code-mixing, language-family transfer, and efficiency rest on reviewed literature rather than reducing to the paper's own inputs by construction. Self-citations, if present, are not load-bearing for any derivation chain.