Title resolution pending
4 Pith papers cite this work.
fields: cs.CL
year: 2026

4 representative citing papers
citing papers explorer
-
BhashaSutra: A Task-Centric Unified Survey of Indian NLP Datasets, Corpora, and Resources
A unified survey that consolidates Indian NLP resources by task, language, domain, and modality while identifying gaps in coverage and generalization (a coverage-audit sketch follows this list).
-
KS-PRET-5M: A 5 Million Word, 12 Million Token Kashmiri Pretraining Dataset
KS-PRET-5M is a newly released 5.09 million word Kashmiri pretraining dataset containing 12.13 million subword tokens after MuRIL tokenization, made available as a continuous text stream under CC BY 4.0 (a token-counting sketch follows this list).
-
MKJ at SemEval-2026 Task 9: A Comparative Study of Generalist, Specialist, and Ensemble Strategies for Multilingual Polarization
A language-adaptive combination of generalist, specialist, and ensemble transformer models achieves 0.796 macro F1 and 0.826 accuracy on multilingual polarization detection across 22 languages (a model-selection sketch follows this list).
-
Scripts Through Time: A Survey of the Evolving Role of Transliteration in NLP
A survey that taxonomizes motivations for transliteration in cross-lingual NLP, reviews incorporation approaches and their evolution, analyzes trade-offs in settings like code-mixing and language families, and offers implementation recommendations (a transliteration sketch follows this list).
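
To make BhashaSutra's task-by-language framing concrete, here is a minimal, hypothetical sketch of the kind of coverage audit such a survey describes: a resource catalog keyed by task, language, domain, and modality, queried for (task, language) pairs with no entry. All catalog entries and field names are illustrative, not taken from the survey.

```python
# Hypothetical sketch of a task-by-language coverage audit.
# Catalog entries are illustrative, not from the survey.
from itertools import product

CATALOG = [
    {"name": "HiSent", "task": "sentiment",      "language": "hi", "domain": "social",  "modality": "text"},
    {"name": "TaNER",  "task": "ner",            "language": "ta", "domain": "news",    "modality": "text"},
    {"name": "KsCorp", "task": "lm-pretraining", "language": "ks", "domain": "general", "modality": "text"},
]

def coverage_gaps(catalog, tasks, languages):
    """Return (task, language) pairs with no catalogued resource."""
    covered = {(entry["task"], entry["language"]) for entry in catalog}
    return [pair for pair in product(tasks, languages) if pair not in covered]

for task, lang in coverage_gaps(CATALOG, ["sentiment", "ner", "lm-pretraining"], ["hi", "ta", "ks"]):
    print(f"gap: no {task} resource for {lang}")
```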
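For the KS-PRET-5M entry, a short sketch of how the reported word and subword counts could be checked: stream the corpus and compare whitespace word counts against MuRIL subword counts. Only the tokenizer checkpoint name (google/muril-base-cased) is a real identifier; the corpus path is hypothetical, and the paper's exact counting procedure is not specified here.

```python
# Hedged sketch: reproducing word vs. MuRIL subword counts over a text stream.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/muril-base-cased")

words = subwords = 0
with open("ks_pret_5m.txt", encoding="utf-8") as stream:  # hypothetical path
    for line in stream:
        words += len(line.split())
        # add_special_tokens=False keeps [CLS]/[SEP] out of the count
        subwords += len(tokenizer.encode(line, add_special_tokens=False))

print(f"{words / 1e6:.2f}M words, {subwords / 1e6:.2f}M subword tokens")
# The paper reports 5.09M words and 12.13M subwords, i.e. roughly
# 2.4 MuRIL subword tokens per Kashmiri word.
```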
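For the MKJ system, a sketch of one plausible reading of "language-adaptive": per language, keep whichever of the generalist, specialist, or ensemble predictions scores best on held-out dev data by macro F1. The predictions below are toy stand-ins; the paper's actual selection rule may differ.

```python
# Hedged sketch of per-language strategy selection by dev macro F1.
from sklearn.metrics import f1_score

def pick_strategy(dev_gold, dev_preds_by_strategy):
    """Return the strategy name with the highest dev macro F1."""
    return max(
        dev_preds_by_strategy,
        key=lambda s: f1_score(dev_gold, dev_preds_by_strategy[s], average="macro"),
    )

# Toy dev labels for one language (0 = not polarized, 1 = polarized).
dev_gold = [1, 0, 1, 1, 0]
candidates = {
    "generalist": [1, 0, 0, 1, 0],   # one multilingual model for all languages
    "specialist": [1, 0, 1, 1, 1],   # a model fine-tuned for this language
    "ensemble":   [1, 0, 1, 1, 0],   # e.g. a vote combining both
}

print(f"route this language to: {pick_strategy(dev_gold, candidates)}")  # ensemble wins here
```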
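For the transliteration survey, a minimal sketch of the setting it reviews most: mapping text into a shared script before cross-lingual modeling. This uses the indic-transliteration package's sanscript module (pip install indic-transliteration); the example sentence and the choice of IAST as the target scheme are illustrative.

```python
# Minimal sketch: Devanagari -> Latin (IAST) transliteration, the kind of
# shared-script preprocessing reviewed for cross-lingual transfer.
from indic_transliteration import sanscript
from indic_transliteration.sanscript import transliterate

devanagari_text = "भारत में बहुत भाषाएँ हैं"  # "India has many languages"
romanized = transliterate(devanagari_text, sanscript.DEVANAGARI, sanscript.IAST)
print(romanized)  # roughly "bhārata meṃ bahuta bhāṣāeṃ haiṃ"
```

Romanizing both sides of a language pair puts related languages written in different scripts into one token space, one of the transfer settings (e.g. within a language family) the survey analyzes.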