Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research

Haiyang Sun; Hanqi Jiang; Junhao Chen; Ruidong Zhang; Tianming Liu; Tianyang Zhong; Weihang You; Xiang Li; Yifan Zhou; Yiheng Liu

arxiv: 2412.04497 · v5 · submitted 2024-11-30 · 💻 cs.CL · cs.AI

Tianyang Zhong, Zhenyuan Yang, Zhengliang Liu, Ruidong Zhang, Weihang You, Yiheng Liu, Haiyang Sun, Yi Pan, Yiwei Li, Yifan Zhou, Hanqi Jiang, Junhao Chen, Xiang Li, Tianming Liu

Pith reviewed 2026-05-23 16:33 UTC · model grok-4.3

keywords: low-resource languages, large language models, humanities research, linguistic variation, cultural preservation, data scarcity, interdisciplinary collaboration, ethical considerations

The pith

Large language models can enable new research methods for low-resource languages in linguistics, history, and culture despite data limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews how recent large language models create opportunities to study and preserve low-resource languages that carry cultural and historical value. It examines applications across linguistic variation, historical documentation, cultural expressions, and literary analysis while noting persistent barriers such as scarce data and limited model adaptability. The work argues that customized models paired with collaboration between technologists and humanities researchers offer a practical route forward. A sympathetic reader would care because these languages encode irreplaceable records of human diversity that standard tools have so far left under-examined.

Core claim

Recent advancements in large language models offer transformative opportunities for addressing challenges in low-resource languages, enabling innovative methodologies in linguistic, historical, and cultural research. The paper systematically evaluates applications of LLMs in these domains, identifies key challenges such as data accessibility, model adaptability, and cultural sensitivity, and concludes that interdisciplinary collaboration and customized models are promising avenues for advancing research and preserving linguistic heritage.

What carries the argument

Systematic evaluation of LLM technical frameworks, current methodologies, and ethical considerations applied to low-resource language research in the humanities.

If this is right

Researchers gain new ways to document and analyze linguistic variation that traditional methods miss.
Historical and cultural records in low-resource languages become more accessible for study and preservation.
Customized models reduce the impact of data scarcity on humanities work.
Ethical guidelines for cultural sensitivity become necessary when deploying these tools.
Global efforts to safeguard intellectual diversity gain concrete technical support.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Success would create digital archives that combine LLM output with traditional fieldwork for endangered languages.
The approach could be tested by measuring research output before and after LLM assistance on the same set of texts.
Neighboring fields such as anthropology or oral-history studies might adopt similar customized models for their own low-resource data.
Wider adoption would require shared benchmarks that test both technical performance and respect for cultural context.

Load-bearing premise

LLMs can be meaningfully adapted to low-resource languages through customized models and interdisciplinary collaboration even when data is scarce and technology is limited.

What would settle it

A controlled comparison on a documented low-resource language showing that customized LLMs produce no measurable improvement in accuracy or completeness of linguistic, historical, or cultural analysis compared with non-LLM methods.
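
A minimal sketch of how such a controlled comparison might be scored, assuming each item in a documented low-resource corpus is analyzed by both a customized LLM and a non-LLM method and then marked correct or incorrect by an expert. The function name `paired_bootstrap_diff`, the toy scores, and the choice of a paired bootstrap are illustrative assumptions; the paper does not specify an evaluation protocol.

```python
import random


def paired_bootstrap_diff(llm_correct, baseline_correct, n_resamples=2000, seed=0):
    """Return (observed accuracy difference, (low, high) 95% percentile interval)."""
    if len(llm_correct) != len(baseline_correct) or not llm_correct:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(llm_correct)
    # Per-item differences keep the pairing: both methods saw the same item.
    diffs = [int(a) - int(b) for a, b in zip(llm_correct, baseline_correct)]
    observed = sum(diffs) / n
    rng = random.Random(seed)
    resampled = []
    for _ in range(n_resamples):
        # Resample items with replacement and recompute the mean difference.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        resampled.append(sum(sample) / n)
    resampled.sort()
    low = resampled[int(0.025 * n_resamples)]
    high = resampled[int(0.975 * n_resamples) - 1]
    return observed, (low, high)


# Hypothetical toy data: 1 = analysis judged correct by an expert, 0 = incorrect.
llm = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
baseline = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
diff, (low, high) = paired_bootstrap_diff(llm, baseline)
print(f"accuracy difference: {diff:+.2f}, 95% interval: [{low:+.2f}, {high:+.2f}]")
```

Under this sketch, an interval that sits at or below zero on a realistically sized test set would be the kind of evidence against the premise described above; ten items, as in the toy data, are far too few to conclude anything.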

Figures

Figures reproduced from arXiv: 2412.04497 by Haiyang Sun, Hanqi Jiang, Junhao Chen, Ruidong Zhang, Tianming Liu, Tianyang Zhong, Weihang You, Xiang Li, Yifan Zhou, Yiheng Liu, Yi Pan, Yiwei Li, Zhengliang Liu, Zhenyuan Yang.

original abstract

Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical challenges, including data scarcity and technological limitations, which hinder their comprehensive study and preservation. Recent advancements in large language models (LLMs) offer transformative opportunities for addressing these challenges, enabling innovative methodologies in linguistic, historical, and cultural research. This study systematically evaluates the applications of LLMs in low-resource language research, encompassing linguistic variation, historical documentation, cultural expressions, and literary analysis. By analyzing technical frameworks, current methodologies, and ethical considerations, this paper identifies key challenges such as data accessibility, model adaptability, and cultural sensitivity. Given the cultural, historical, and linguistic richness inherent in low-resource languages, this work emphasizes interdisciplinary collaboration and the development of customized models as promising avenues for advancing research in this domain. By underscoring the potential of integrating artificial intelligence with the humanities to preserve and study humanity's linguistic and cultural heritage, this study fosters global efforts towards safeguarding intellectual diversity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript is a literature review surveying applications of large language models (LLMs) to low-resource languages within humanities research. It describes challenges including data scarcity and technological limitations, argues that recent LLM progress creates opportunities for new methods in linguistic variation, historical documentation, cultural expressions, and literary analysis, and identifies issues such as data accessibility, model adaptability, and cultural sensitivity. The paper positions customized models and interdisciplinary collaboration as promising directions while stressing ethical considerations and the value of preserving linguistic diversity.

Significance. A thorough synthesis of this kind could help orient researchers at the AI-humanities boundary and support preservation efforts for low-resource languages, particularly by cataloguing acknowledged limitations rather than overstating current capabilities. Its value would rest on the breadth and balance of the cited work and the specificity with which technical and ethical gaps are mapped.

major comments (1)

[Abstract] The statements that the study 'systematically evaluates the applications of LLMs... encompassing linguistic variation, historical documentation, cultural expressions, and literary analysis' and proceeds by 'analyzing technical frameworks, current methodologies' are central to the paper's claim of identifying key challenges. Yet the abstract supplies no concrete examples, search methodology, or quantitative summary of the reviewed literature, which makes the rigor and coverage of the evaluation difficult to assess.

minor comments (2)

[Abstract] The abstract repeatedly uses 'this study' and 'this paper' interchangeably; consistent terminology would improve clarity.
No explicit statement of the review protocol (databases, inclusion criteria, or time window) appears in the provided abstract; adding this in the introduction or methods section would strengthen reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and positive overall assessment of the manuscript as a literature review. We address the single major comment below and will revise the abstract accordingly.

point-by-point responses

Referee: [Abstract] The statements that the study 'systematically evaluates the applications of LLMs... encompassing linguistic variation, historical documentation, cultural expressions, and literary analysis' and proceeds by 'analyzing technical frameworks, current methodologies' are central to the paper's claim of identifying key challenges. Yet the abstract supplies no concrete examples, search methodology, or quantitative summary of the reviewed literature, which makes the rigor and coverage of the evaluation difficult to assess.

Authors: We agree that the abstract would benefit from greater specificity to better signal the review's scope and approach. The full manuscript details the literature synthesis across the four domains (linguistic variation, historical documentation, cultural expressions, and literary analysis) along with technical and ethical considerations. In revision we will add a concise clause to the abstract noting the breadth of sources examined and the focus on acknowledged limitations rather than overstating capabilities.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is a systematic literature review surveying LLM applications to low-resource languages in humanities contexts. It presents no original derivations, equations, fitted parameters, predictions, or technical claims that reduce to inputs by construction. Opportunities are framed as potential avenues enabled by recent LLM progress, with explicit acknowledgment of data scarcity and limits; no self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The structure contains no derivation chain to inspect.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is a survey and introduces no new technical derivations, fitted parameters, or postulated entities; it relies on standard domain assumptions about LLM capabilities in NLP.

axioms (1)

domain assumption: LLMs can be adapted to low-resource languages via customization and interdisciplinary effort.
Implicit in the discussion of opportunities and the recommendation for customized models.

pith-pipeline@v0.9.0 · 5751 in / 1004 out tokens · 48005 ms · 2026-05-23T16:33:59.423469+00:00 · methodology

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Cross-Lingual Transfer and Parameter-Efficient Adaptation in the Turkic Language Family: A Theoretical Framework for Low-Resource Language Models
cs.CL 2026-03 unverdicted novelty 7.0

The paper introduces the Turkic Transfer Coefficient (TTC) as a theoretical measure of transfer potential and a scaling model linking adaptation performance to model capacity, data size, and adaptation module expressi...

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
cs.LG 2026-04 unverdicted novelty 6.0

COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.

In Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphs
cs.AI 2026-05 unverdicted novelty 4.0

A research plan to analyze language distribution in LOD knowledge graphs and explore cross-lingual transfer plus analogical reasoning to improve coverage for low-resource languages.

LLM Harms: A Taxonomy and Discussion
cs.CY 2025-12 unverdicted novelty 3.0

This paper proposes a taxonomy of LLM harms in five categories and suggests mitigation strategies plus a dynamic auditing system for responsible development.

Reference graph

Works this paper leans on

157 extracted references · 157 canonical work pages · cited by 4 Pith papers · 14 internal anchors

[1]

GPT-4 Technical Report

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

Arab World English Journal (AWEJ) Special Issue on Translation (2015)

Agliz, R.: Translation of religious texts: Difficulties and challenges. Arab World English Journal (AWEJ) Special Issue on Translation (2015)

work page 2015
[3]

Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics pp

Alam, F., Chowdhury, S.A., Boughorbel, S., Hasanain, M.: Llms for low resource languages in multilingual, multimodal, and dialectal settings. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics pp. 27–33 (2024)

work page 2024
[4]

Amrami, A., Goldberg, Y.: Towards better substitution-based word sense induction (2019), https://arxiv.org/abs/1905.12598

work page internal anchor Pith review Pith/arXiv arXiv 2019
[5]

Anderson, B.: Sacred Texts in a Digital Age: Materiality, Digital Culture, and the Functional Dimensions of Scriptures in Judaism, Christianity, and Islam, pp. 281–302. De Gruyter (06 2020). doi:10.1515/9783110634440-013

work page doi:10.1515/9783110634440-013 2020
[6]

https://www.anthropic.com/ news/1m-context (August 2025), anthropic API Update 31

Anthropic: Claude sonnet 4 now supports 1m tokens of context. https://www.anthropic.com/ news/1m-context (August 2025), anthropic API Update 31

work page 2025
[7]

anthropic (May 2025), https://www.anthropic.com/news/ claude-4

Anthropic: Introducing claude 4. anthropic (May 2025), https://www.anthropic.com/news/ claude-4

work page 2025
[8]

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges

Arivazhagan, N., Bapna, A., Firat, O., Lepikhin, D., Johnson, M., Krikun, M., Chen, M.X., Cao, Y., Foster, G., Cherry, C., Macherey, W., Chen, Z., Wu, Y.: Massively multilingual neural ma- chine translation in the wild: Findings and challenges. arXiv preprint arXiv:1907.05019 (2019), https://arxiv.org/abs/1907.05019

work page internal anchor Pith review Pith/arXiv arXiv 1907
[9]

Computational Linguistics 34(4), 555–596 (2008)

Artstein, R., Poesio, M.: Survey article: Inter-coder agreement for computational linguis- tics. Computational Linguistics 34(4), 555–596 (2008). doi:10.1162/coli.07-034-R2, https:// aclanthology.org/J08-4004/

work page doi:10.1162/coli.07-034-r2 2008
[10]

Nature 603(7900), 280–283 (2022)

Assael, Y., Sommerschield, T., Shillingford, B., Bordbar, M., Pavlopoulos, J., Chatzipanagiotou, M., Androutsopoulos, I., Prag, J., de Freitas, N.: Restoring and attributing ancient texts using deep neural networks. Nature 603(7900), 280–283 (2022)

work page 2022
[11]

In: 2022 1st International Conference on Technology Innovation and Its Applications (ICTIIA)

Avyodri, R., Lukas, S., Tjahyadi, H.: Optical character recognition (ocr) for text recognition and its post-processing method: A literature review. In: 2022 1st International Conference on Technology Innovation and Its Applications (ICTIIA). pp. 1–6. IEEE (2022)

work page 2022
[12]

In: International Conference Florence Heri-Tech: the Future of Heritage Science and Technologies

Barucci, A., Canfailla, C., Cucci, C., Forasassi, M., Franci, M., Guarducci, G., Guidi, T., Loschi- avo, M., Picollo, M., Pini, R., et al.: Ancient egyptian hieroglyphs segmentation and classification with convolutional neural networks. In: International Conference Florence Heri-Tech: the Future of Heritage Science and Technologies. pp. 126–139. Springer (2022)

work page 2022
[13]

In: 2023 19th International Asian School-Seminar on Optimization Problems of Complex Systems (OPCS)

Bimagambetova, Z., Rakhymzhanov, D., Jaxylykova, A., Pak, A.: Evaluating large language models for sentence augmentation in low-resource languages: A case study on kazakh. In: 2023 19th International Asian School-Seminar on Optimization Problems of Complex Systems (OPCS). pp. 14–18. IEEE (2023)

work page 2023
[14]

In: Proceedings of the ACM Web Conference 2022

Braylan, A., Alonso, O., Lease, M.: Measuring annotator agreement generally across complex structured, multi-object, and free-text annotation tasks. In: Proceedings of the ACM Web Conference 2022. p. 1720–1730. WWW ’22, ACM (Apr 2022). doi:10.1145/3485447.3512242, http://dx.doi.org/10.1145/3485447.3512242

work page doi:10.1145/3485447.3512242 2022
[15]

Brockington, J.: The concept of” dharma” in the r¯ am¯ ayan . a. Journal of Indian Philosophy 32(5/6), 655–670 (2004)

work page 2004
[16]

Language Models are Few-Shot Learners

Brown, T.B.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2005
[17]

In: Che, W., Nabende, J., Shutova, E., Pilehvar, M.T

Calderon, N., Reichart, R., Dror, R.: The alternative annotator test for LLM-as-a-judge: How to statistically justify replacing human annotators with LLMs. In: Che, W., Nabende, J., Shutova, E., Pilehvar, M.T. (eds.) Proceedings of the 63rd Annual Meeting of the Associa- tion for Computational Linguistics (Volume 1: Long Papers). pp. 16051–16081. Associat...

work page doi:10.18653/v1/2025.acl-long.782 2025
[18]

Carlo, V.D., Bianchi, F., Palmonari, M.: Training temporal word embeddings with a compass (2019), https://arxiv.org/abs/1906.02376

work page internal anchor Pith review Pith/arXiv arXiv 2019
[19]

low-resource fine-tuning: A case study with fact-checking in turkish

Cekinel, R.F., Karagoz, P., Coltekin, C.: Cross-lingual learning vs. low-resource fine-tuning: A case study with fact-checking in turkish. arXiv preprint arXiv:2403.00411 (2024)

work page arXiv 2024
[20]

arXiv preprint arXiv:2205.10964 (2022)

Chang, T.A., Tu, Z., Bergen, B.K.: The geometry of multilingual language model representations. arXiv preprint arXiv:2205.10964 (2022). doi:10.48550/arXiv.2205.10964, https://arxiv.org/ abs/2205.10964

work page doi:10.48550/arxiv.2205.10964 2022
[21]

arXiv preprint arXiv:2412.05184 (2024) 32

Chen, J., Shu, P., Li, Y., Zhao, H., Jiang, H., Pan, Y., Zhou, Y., Liu, Z., Howe, L.C., Liu, T.: Queen: A large language model for quechua-english translation. arXiv preprint arXiv:2412.05184 (2024) 32

work page arXiv 2024
[22]

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Cho, K.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[23]

Unsupervised Cross-lingual Representation Learning at Scale

Conneau, A.: Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1911
[24]

arXiv preprint arXiv:2306.10095 (2023)

Dai, H., Li, Y., Liu, Z., Zhao, L., Wu, Z., Song, S., Shen, Y., Zhu, D., Li, X., Li, S., et al.: Ad-autogpt: An autonomous gpt for alzheimer’s disease infodemiology. arXiv preprint arXiv:2306.10095 (2023)

work page arXiv 2023
[25]

arXiv preprint arXiv:2302.13007 (2023)

Dai, H., Liu, Z., Liao, W., Huang, X., Wu, Z., Zhao, L., Liu, W., Liu, N., Li, S., Zhu, D., et al.: Chataug: Leveraging chatgpt for text data augmentation. arXiv preprint arXiv:2302.13007 (2023)

work page arXiv 2023
[26]

Lingua 75(4), 325–361 (1988)

Deane, P.D.: Polysemy and cognition. Lingua 75(4), 325–361 (1988). doi:https://doi.org/10.1016/0024-3841(88)90009-5, https://www.sciencedirect.com/ science/article/pii/0024384188900095

work page doi:10.1016/0024-3841(88)90009-5 1988
[27]

google (March 2025), https://blog

DeepMind, G.: Gemini 2.5: Our most intelligent ai model. google (March 2025), https://blog. google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/ #gemini-2-5-thinking

work page 2025
[28]

ahmed: Debate- induced bias in multilingual LLMs

Demidova, A., Atwany, H., Rabih, N., Sha’ban, S., Abdul-Mageed, M.: John vs. ahmed: Debate- induced bias in multilingual LLMs. In: Habash, N., Bouamor, H., Eskander, R., Tomeh, N., Abu Farha, I., Abdelali, A., Touileb, S., Hamed, I., Onaizan, Y., Alhafni, B., Antoun, W., Khalifa, S., Haddad, H., Zitouni, I., AlKhamissi, B., Almatham, R., Mrini, K. (eds.) ...

work page doi:10.18653/v1/2024.arabicnlp- 2024
[29]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin, J.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[30]

In: International Conference on Pattern Recognition Applications and Methods 2017

Dhali, M., He, S., Popovic, M., Tigchelaar, E., Schomaker, L.: A digital palaeographic approach towards writer identification in the dead sea scrolls. In: International Conference on Pattern Recognition Applications and Methods 2017. pp. 693–702 (2017)

work page 2017
[31]

arXiv preprint arXiv:1911.07930 (2019)

Dhali, M.A., de Wit, J.W., Schomaker, L.: Binet: Degraded-manuscript binarization in di- verse document textures and layouts using deep encoder-decoder networks. arXiv preprint arXiv:1911.07930 (2019)

work page arXiv 1911
[32]

The Llama 3 Herd of Models

Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al.: The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[33]

In: 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO)

Dun ⃗der, I., Seljan, S., Pavlovski, M.: Automatic machine translation of poetry and a low- resource language pair. In: 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO). pp. 1034–1039. IEEE (2020)

work page 2020
[34]

In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Pa- pers)

Ebrahimi, A., Mager, M., Oncevay, A., Chaudhary, V., Chiruzzo, L., Fan, A., Ortega, J., Ramos, R., Rios, A., Meza Ruiz, I.V., Gim´ enez-Lugo, G., Mager, E., Neubig, G., Palmer, A., Coto- Solano, R., Vu, T., Kann, K.: AmericasNLI: Evaluating zero-shot natural language understand- ing of pretrained multilingual models in truly low-resource languages. In: Pr...

work page 2022
[35]

In: 2024 National Conference on Communications (NCC)

Eledath, D., Baby, A., Singh, S.: Robust speech recognition using meta-learning for low- resource accents. In: 2024 National Conference on Communications (NCC). pp. 1–6 (2024). doi:10.1109/NCC60321.2024.10485786

work page doi:10.1109/ncc60321.2024.10485786 2024
[36]

Journal of translation 10(1), 25–33 (2014) 33

Elewa, A.: Features of translating religious texts. Journal of translation 10(1), 25–33 (2014) 33

work page 2014
[37]

Memoirs of the American Academy in Rome 65, 43–69 (2020)

Erasmo, M.: The theatre of pompey. Memoirs of the American Academy in Rome 65, 43–69 (2020)

work page 2020
[38]

In: Proceedings of ACL

Faisal, F., Ahia, O., Srivastava, A., Ahuja, K., Chiang, D., Tsvetkov, Y., Anastasopoulos, A.: DIALECTBENCH: A NLP benchmark for dialects, varieties, and closely-related languages. In: Proceedings of ACL. Association for Computational Linguistics, Bangkok, Thailand (Aug 2024), https://arxiv.org/pdf/2403.11009

work page arXiv 2024
[39]

Journal of Machine Learning Research 23(120), 1–39 (2022), https://jmlr.org/papers/v23/21-0998.html

Fedus, W., Zoph, B., Shazeer, N.: Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research 23(120), 1–39 (2022), https://jmlr.org/papers/v23/21-0998.html

work page 2022
[40]

In: International conference on machine learning

Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep net- works. In: International conference on machine learning. pp. 1126–1135. PMLR (2017)

work page 2017
[41]

In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Foroutan, N., Banaei, M., Lebret, R., Bosselut, A., Aberer, K.: Discovering language-neutral sub- networks in multilingual language models. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. pp. 7560–7575. Association for Computational Lin- guistics, Abu Dhabi, United Arab Emirates (Dec 2022). doi:10.18653/v1/2022....

work page doi:10.18653/v1/2022.emnlp-main.513 2022
[42]

arXiv preprint arXiv:2503.12051 (2025)

Gao, F., Huang, C., Tashi, N., Wang, X., Tsering, T., Ma-bao, B., Duojie, R., Luosang, G., Dongrub, R., Tashi, D., et al.: Tlue: A tibetan language understanding evaluation benchmark. arXiv preprint arXiv:2503.12051 (2025)

work page arXiv 2025
[43]

org/abs/2403.06399

Ginn, M., Tjuatja, L., He, T., Rice, E., Neubig, G., Palmer, A., Levin, L.: Glosslm: A massively multilingual corpus and pretrained model for interlinear glossed text (2024), https://arxiv. org/abs/2403.06399

work page arXiv 2024
[44]

In: Proceedings of the 58th Annual Meeting of the As- sociation for Computational Linguistics

Giulianelli, M., Del Tredici, M., Fern´ andez, R.: Analysing lexical semantic change with con- textualised word representations. In: Proceedings of the 58th Annual Meeting of the As- sociation for Computational Linguistics. Association for Computational Linguistics (2020). doi:10.18653/v1/2020.acl-main.365, http://dx.doi.org/10.18653/v1/2020.acl-main.365

work page doi:10.18653/v1/2020.acl-main.365 2020
[45]

Master of science thesis, International Institute of Information Technology (IIIT), Hyderabad (2023), https://api.semanticscholar.org/CorpusID:259371553, report No: IIIT/TH/2023/84

Goel, A.: Beyond the Surface: A Computational Exploration of Linguistic Ambiguity. Master of science thesis, International Institute of Information Technology (IIIT), Hyderabad (2023), https://api.semanticscholar.org/CorpusID:259371553, report No: IIIT/TH/2023/84

work page 2023
[46]

arXiv preprint arXiv:2010.08275 (2020)

Gonen, H., Ravfogel, S., Elazar, Y., Goldberg, Y.: It’s not greek to mbert: inducing word-level translations from multilingual bert. arXiv preprint arXiv:2010.08275 (2020)

work page arXiv 2010
[47]

Supervised sequence labelling with recurrent neural networks pp

Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks pp. 37–45 (2012)

work page 2012
[48]

arXiv preprint arXiv:2406.00684 (2024)

Guan, H., Yang, H., Wang, X., Han, S., Liu, Y., Jin, L., Bai, X., Liu, Y.: Deciphering oracle bone language with diffusion models. arXiv preprint arXiv:2406.00684 (2024)

work page arXiv 2024
[49]

International Journal of Heritage Studies 29(5), 398–412 (2023)

Gwerevende, S., Mthombeni, Z.M.: Safeguarding intangible cultural heritage: exploring the synergies in the transmission of indigenous languages, dance and music practices in southern africa. International Journal of Heritage Studies 29(5), 398–412 (2023)

work page 2023
[50]

In: Erk, K., Smith, N.A

Hamilton, W.L., Leskovec, J., Jurafsky, D.: Diachronic word embeddings reveal statistical laws of semantic change. In: Erk, K., Smith, N.A. (eds.) Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1489–1501. Association for Computational Linguistics, Berlin, Germany (Aug 2016). doi:10.1865...

work page doi:10.18653/v1/p16- 2016
[51]

arXiv preprint arXiv:2408.02237 (2024) 34

Hasan, M.A., Tarannum, P., Dey, K., Razzak, I., Naseem, U.: Do large language models speak all languages equally? a comparative study in low-resource settings. arXiv preprint arXiv:2408.02237 (2024) 34

work page arXiv 2024
[52]

arXiv preprint arXiv:2010.12309 (2020)

Hedderich, M.A., Lange, L., Adel, H., Str¨ otgen, J., Klakow, D.: A survey on recent approaches for natural language processing in low-resource scenarios. arXiv preprint arXiv:2010.12309 (2020)

work page arXiv 2010
[53]

In: North American Chapter of the Asso- ciation for Computational Linguistics (2020), https://api.semanticscholar.org/CorpusID: 225062337

Hedderich, M.A., Lange, L., Adel, H., Strotgen, J., Klakow, D.: A survey on recent approaches for natural language processing in low-resource scenarios. In: North American Chapter of the Asso- ciation for Computational Linguistics (2020), https://api.semanticscholar.org/CorpusID: 225062337

work page 2020
[54]

ter Hoeve, M., Grangier, D., Schluter, N.: High-resource methodological bias in low-resource investigations (2022), https://arxiv.org/abs/2211.07534

work page arXiv 2022
[55]

Lost in the Middle: How Language Models Use Long Contexts

Hofmann, V., Glavaˇ s, G., Ljubeˇ si´ c, N., Pierrehumbert, J.B., Sch¨ utze, H.: Geographic adaptation of pretrained language models. Transactions of the Association for Computational Linguistics 12, 411–431 (2024). doi:10.1162/tacl a 00652, https://aclanthology.org/2024.tacl-1.23/

work page internal anchor Pith review doi:10.1162/tacl 2024
[56]

arXiv preprint arXiv:2005.04290 (2020)

Huang, J., Kuchaiev, O., O’Neill, P., Lavrukhin, V., Li, J., Flores, A., Kucsko, G., Ginsburg, B.: Cross-language transfer learning, continuous learning, and domain adaptation for end-to-end automatic speech recognition. arXiv preprint arXiv:2005.04290 (2020)

work page arXiv 2005
[57]

In: International Conference on Machine Learning

Huang, Y., Sun, L., Wang, H., Wu, S., Zhang, Q., Li, Y., Gao, C., Huang, Y., Lyu, W., Zhang, Y., et al.: Position: Trustllm: Trustworthiness in large language models. In: International Conference on Machine Learning. pp. 20166–20270. PMLR (2024)

work page 2024
[58]

TrustLLM: Trustworthiness in Large Language Models

Huang, Y., Sun, L., Wang, H., Wu, S., Zhang, Q., Li, Y., Gao, C., Huang, Y., Lyu, W., Zhang, Y., et al.: Trustllm: Trustworthiness in large language models. arXiv preprint arXiv:2401.05561 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[59]

Faculty Scholarship 612 (2024), https: //digitalcommons.lindenwood.edu/faculty-research-papers/612, available online

Hutson, J., Ellsworth, P., Ellsworth, M.: Preserving linguistic diversity in the digital age: A scalable model for cultural heritage continuity. Faculty Scholarship 612 (2024), https: //digitalcommons.lindenwood.edu/faculty-research-papers/612, available online

work page 2024
[60]

Metaphorik

J¨ akel, O.: Hypotheses revisited: The cognitive theory of metaphor applied to religious texts. Metaphorik. de 2(1), 20–42 (2002)

work page 2002
[61]

Jiang, H., Pan, Y., Chen, J., Liu, Z., Zhou, Y., Shu, P., Li, Y., Zhao, H., Mihm, S., Howe, L.C., Liu, T.: Oraclesage: Towards unified visual-linguistic understanding of oracle bone scripts through cross-modal knowledge fusion (2024), https://arxiv.org/abs/2411.17837

work page arXiv 2024
[62]

edited by antonia michelle daymond, frederick l

Jones Medine, C.M.: T&t clark handbook of african american theology. edited by antonia michelle daymond, frederick l. ware, and eric lewis williams. new york: Bloomsbury t&t clark,

work page
[63]

464 pages. $198.00. Horizons 50(1), 230–232 (2023). doi:10.1017/hor.2023.26

work page doi:10.1017/hor.2023.26 2023
[65]

Proceedings of the National Academy of Sciences122(31), e2426815122 (2025)

Kamath, G., Yang, M., Reddy, S., Sonderegger, M., Card, D.: Semantic change in adults is not primarily a generational phenomenon. Proceedings of the National Academy of Sciences122(31), e2426815122 (2025). doi:10.1073/pnas.2426815122, https://www.pnas.org/doi/abs/10.1073/ pnas.2426815122

[66]

Karabayeva, I., Kalizhanova, A.: Evaluating machine translation of literature through rhetorical analysis. Journal of Translation and Language Studies 5(1), 1–9 (2024)

[67]

Khanuja, S., Dandapat, S., Srinivasan, A., Sitaram, S., Choudhury, M.: GLUECoS: An evaluation benchmark for code-switched NLP. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 3575–3585. Association for Computational Linguistics, Online (Jul 2020)....

[68]

Kholodna, N., Julka, S., Khodadadi, M., Gumus, M.N., Granitzer, M.: LLMs in the loop: Leveraging large language model annotations for active learning in low-resource languages. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 397–412. Springer (2024)

[69]

Kirk, H.R., Vidgen, B., Röttger, P., Hale, S.A.: Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback (2023), https://arxiv.org/abs/2303.05453

[70]

Koopmans, L., Dhali, M.A., Schomaker, L.: Performance analysis of handwritten text augmentation on style-based dating of historical documents. SN Computer Science 5(4), 397 (2024)

[71]

Lai, R.K.Y.: Tupleised co-occurrence measures vs LLM word embeddings for corpus linguistics: The case of English light verb construction detection. In: Oco, N., Dita, S.N., Borlongan, A.M., Kim, J.B. (eds.) Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation. pp. 1201–1212. Tokyo University of Foreign Studies, Tokyo, J...

[72]

Lai, R.K.Y., Yin, L.Z., Zhang, A.Y., Jiang, Y., Xin, B.S., Gao, J.: Turn design, resonance and epistemic stance in the diamond sutra: A dialogic constructionist approach. In: Huang, C.R., Harada, Y., Kim, J.B., Chen, S., Hsu, Y.Y., Chersoni, E., A, P., Zeng, W.H., Peng, B., Li, Y., Li, J. (eds.) Proceedings of the 37th Pacific Asia Conference on Language,...

[73]

LeBlanc, J.R., Medine, C.M.J.: Ancient and Modern Religion and Politics: Negotiating Transitive Spaces and Hybrid Identities. Palgrave Macmillan (2012)

[74]

Lee, G.G., Shi, L., Latif, E., Gao, Y., Bewersdorf, A., Nyaaba, M., Guo, S., Wu, Z., Liu, Z., Wang, H., et al.: Multimodality of ai for education: Towards artificial general intelligence. arXiv preprint arXiv:2312.06037 (2023)

[75]

Li, C., Chen, M., Wang, J., Sitaram, S., Xie, X.: Culturellm: Incorporating cultural differences into large language models. arXiv preprint arXiv:2402.10946 (2024)

[76]

Li, C., Teney, D., Yang, L., Wen, Q., Xie, X., Wang, J.: Culturepark: Boosting cross-cultural understanding in large language models (2024), https://arxiv.org/abs/2405.15145

[77]

Li, Y., Zhao, H., Jiang, H., Pan, Y., Liu, Z., Wu, Z., Shu, P., Tian, J., Yang, T., Xu, S., et al.: Large language models for manufacturing. arXiv preprint arXiv:2410.21418 (2024)

[78]

Liao, W., Liu, Z., Dai, H., Xu, S., Wu, Z., Zhang, Y., Huang, X., Zhu, D., Cai, H., Li, Q., et al.: Differentiating chatgpt-generated and human-written medical texts: quantitative study. JMIR Medical Education 9(1), e48904 (2023)

[79]

Lin, D., Murakami, Y., Ishida, T.: Towards language service creation and customization for low-resource languages. Information 11(2), 67 (2020)

[80]

Liu, A., Wu, Z., Michael, J., Suhr, A., West, P., Koller, A., Swayamdipta, S., Smith, N.A., Choi, Y.: We’re afraid language models aren’t modeling ambiguity (2023), https://arxiv.org/abs/2304.14399

[81]

Liu, Y., Han, T., Ma, S., Zhang, J., Yang, Y., Tian, J., He, H., Li, A., He, M., Liu, Z., et al.: Summary of chatgpt-related research and perspective towards the future of large language models. Meta-Radiology p. 100017 (2023)

