pith.

arxiv: 2412.04497 · v5 · submitted 2024-11-30 · 💻 cs.CL · cs.AI

Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research

Pith reviewed 2026-05-23 16:33 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords low-resource languages · large language models · humanities research · linguistic variation · cultural preservation · data scarcity · interdisciplinary collaboration · ethical considerations

The pith

Large language models can enable new research methods for low-resource languages in linguistics, history, and culture despite data limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews how recent large language models create opportunities to study and preserve low-resource languages that carry cultural and historical value. It examines applications across linguistic variation, historical documentation, cultural expressions, and literary analysis while noting persistent barriers such as scarce data and limited model adaptability. The work argues that customized models paired with collaboration between technologists and humanities researchers offer a practical route forward. A sympathetic reader would care because these languages encode irreplaceable records of human diversity that standard tools have so far left under-examined.

Core claim

Recent advancements in large language models offer transformative opportunities for addressing challenges in low-resource languages, enabling innovative methodologies in linguistic, historical, and cultural research. The paper systematically evaluates applications of LLMs in these domains, identifies key challenges such as data accessibility, model adaptability, and cultural sensitivity, and concludes that interdisciplinary collaboration and customized models are promising avenues for advancing research and preserving linguistic heritage.

What carries the argument

Systematic evaluation of LLM technical frameworks, current methodologies, and ethical considerations applied to low-resource language research in the humanities.

If this is right

  • Researchers gain new ways to document and analyze linguistic variation that traditional methods miss.
  • Historical and cultural records in low-resource languages become more accessible for study and preservation.
  • Customized models reduce the impact of data scarcity on humanities work.
  • Ethical guidelines for cultural sensitivity become necessary when deploying these tools.
  • Global efforts to safeguard intellectual diversity gain concrete technical support.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Success would create digital archives that combine LLM output with traditional fieldwork for endangered languages.
  • The approach could be tested by measuring research output before and after LLM assistance on the same set of texts.
  • Neighboring fields such as anthropology or oral-history studies might adopt similar customized models for their own low-resource data.
  • Wider adoption would require shared benchmarks that test both technical performance and respect for cultural context.

Load-bearing premise

LLMs can be meaningfully adapted to low-resource languages through customized models and interdisciplinary collaboration even when data is scarce and technology is limited.

What would settle it

A controlled comparison on a documented low-resource language showing that customized LLMs produce no measurable improvement in accuracy or completeness of linguistic, historical, or cultural analysis compared with non-LLM methods.
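One way to make this test concrete, sketched under assumptions the paper does not state: suppose a customized LLM and a non-LLM method are each scored correct or incorrect on the same annotated items (for example, interlinear glosses checked by a specialist). An exact McNemar test, swapped in here as a standard paired test rather than taken from the paper, then asks whether the items on which the two methods disagree favour one method more than chance would allow. The function name `mcnemar_exact` and its inputs are hypothetical.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(llm_correct: Sequence[bool], baseline_correct: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired correct/incorrect scores.

    Hypothetical helper: both methods must be scored on the same items,
    in the same order. Only discordant items matter:
      b = LLM right, baseline wrong
      c = LLM wrong, baseline right
    Under the null hypothesis of no difference, b ~ Binomial(b + c, 0.5).
    """
    if len(llm_correct) != len(baseline_correct):
        raise ValueError("both methods must be scored on the same items")

    pairs = list(zip(llm_correct, baseline_correct))
    b = sum(1 for llm, base in pairs if llm and not base)
    c = sum(1 for llm, base in pairs if base and not llm)
    n = b + c
    if n == 0:
        # No discordant items: the data give no evidence of a difference.
        return 1.0

    # Lower-tail probability P(X <= min(b, c)) for X ~ Binomial(n, 0.5),
    # doubled for a two-sided test and capped at 1.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For instance, if the LLM is right on 5 items where the baseline is wrong and the reverse never happens, the two-sided p-value is 2 × (1/2)^5 = 0.0625. A large p-value only means no difference was detected on that item set; supporting "no measurable improvement" in the sense above would also require an adequately sized sample.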

Figures

Figures reproduced from arXiv: 2412.04497 by Haiyang Sun, Hanqi Jiang, Junhao Chen, Ruidong Zhang, Tianming Liu, Tianyang Zhong, Weihang You, Xiang Li, Yifan Zhou, Yiheng Liu, Yi Pan, Yiwei Li, Zhengliang Liu, Zhenyuan Yang.

Figure 1
Figure 1. Overview of the structure outline of the article.
Original abstract

Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical challenges, including data scarcity and technological limitations, which hinder their comprehensive study and preservation. Recent advancements in large language models (LLMs) offer transformative opportunities for addressing these challenges, enabling innovative methodologies in linguistic, historical, and cultural research. This study systematically evaluates the applications of LLMs in low-resource language research, encompassing linguistic variation, historical documentation, cultural expressions, and literary analysis. By analyzing technical frameworks, current methodologies, and ethical considerations, this paper identifies key challenges such as data accessibility, model adaptability, and cultural sensitivity. Given the cultural, historical, and linguistic richness inherent in low-resource languages, this work emphasizes interdisciplinary collaboration and the development of customized models as promising avenues for advancing research in this domain. By underscoring the potential of integrating artificial intelligence with the humanities to preserve and study humanity's linguistic and cultural heritage, this study fosters global efforts towards safeguarding intellectual diversity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript is a literature review surveying applications of large language models (LLMs) to low-resource languages within humanities research. It describes challenges including data scarcity and technological limitations, argues that recent LLM progress creates opportunities for new methods in linguistic variation, historical documentation, cultural expressions, and literary analysis, and identifies issues such as data accessibility, model adaptability, and cultural sensitivity. The paper positions customized models and interdisciplinary collaboration as promising directions while stressing ethical considerations and the value of preserving linguistic diversity.

Significance. A thorough synthesis of this kind could help orient researchers at the AI-humanities boundary and support preservation efforts for low-resource languages, particularly by cataloguing acknowledged limitations rather than overstating current capabilities. Its value would rest on the breadth and balance of the cited work and the specificity with which technical and ethical gaps are mapped.

major comments (1)
  1. [Abstract] Abstract: the statement that the study 'systematically evaluates the applications of LLMs... encompassing linguistic variation, historical documentation, cultural expressions, and literary analysis' and 'analyzing technical frameworks, current methodologies' is central to the paper's claim of identifying key challenges, yet the abstract supplies no concrete examples, search methodology, or quantitative summary of the reviewed literature, leaving the evaluation's rigor and coverage difficult to assess.
minor comments (2)
  1. [Abstract] The abstract repeatedly uses 'this study' and 'this paper' interchangeably; consistent terminology would improve clarity.
  2. No explicit statement of the review protocol (databases, inclusion criteria, or time window) appears in the provided abstract; adding this in the introduction or methods section would strengthen reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and positive overall assessment of the manuscript as a literature review. We address the single major comment below and will revise the abstract accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the statement that the study 'systematically evaluates the applications of LLMs... encompassing linguistic variation, historical documentation, cultural expressions, and literary analysis' and 'analyzing technical frameworks, current methodologies' is central to the paper's claim of identifying key challenges, yet the abstract supplies no concrete examples, search methodology, or quantitative summary of the reviewed literature, leaving the evaluation's rigor and coverage difficult to assess.

    Authors: We agree that the abstract would benefit from greater specificity to better signal the review's scope and approach. The full manuscript details the literature synthesis across the four domains (linguistic variation, historical documentation, cultural expressions, and literary analysis) along with technical and ethical considerations. In revision we will add a concise clause to the abstract noting the breadth of sources examined and the focus on acknowledged limitations rather than overstating capabilities. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is a systematic literature review surveying LLM applications to low-resource languages in humanities contexts. It presents no original derivations, equations, fitted parameters, predictions, or technical claims that reduce to inputs by construction. Opportunities are framed as potential avenues enabled by recent LLM progress, with explicit acknowledgment of data scarcity and limits; no self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The structure contains no derivation chain to inspect.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is a survey and introduces no new technical derivations, fitted parameters, or postulated entities; it relies on standard domain assumptions about LLM capabilities in NLP.

axioms (1)
  • domain assumption: LLMs can be adapted to low-resource languages via customization and interdisciplinary effort
    Implicit in the discussion of opportunities and the recommendation for customized models.

pith-pipeline@v0.9.0 · 5751 in / 1004 out tokens · 48005 ms · 2026-05-23T16:33:59.423469+00:00 · methodology


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Cross-Lingual Transfer and Parameter-Efficient Adaptation in the Turkic Language Family: A Theoretical Framework for Low-Resource Language Models

    cs.CL 2026-03 unverdicted novelty 7.0

    The paper introduces the Turkic Transfer Coefficient (TTC) as a theoretical measure of transfer potential and a scaling model linking adaptation performance to model capacity, data size, and adaptation module expressi...

  2. COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

    cs.LG 2026-04 unverdicted novelty 6.0

    COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.

  3. In Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphs

    cs.AI 2026-05 unverdicted novelty 4.0

    A research plan to analyze language distribution in LOD knowledge graphs and explore cross-lingual transfer plus analogical reasoning to improve coverage for low-resource languages.

  4. LLM Harms: A Taxonomy and Discussion

    cs.CY 2025-12 unverdicted novelty 3.0

    This paper proposes a taxonomy of LLM harms in five categories and suggests mitigation strategies plus a dynamic auditing system for responsible development.

Reference graph

Works this paper leans on

157 extracted references · 157 canonical work pages · cited by 4 Pith papers · 14 internal anchors

  1. [1]

    GPT-4 Technical Report

    Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

  2. [2]

    Translation of Religious Texts: Difficulties and Challenges

    Agliz, R.: Translation of religious texts: Difficulties and challenges. Arab World English Journal (AWEJ) Special Issue on Translation (2015)

  3. [3]

    LLMs for Low Resource Languages in Multilingual, Multimodal, and Dialectal Settings

    Alam, F., Chowdhury, S.A., Boughorbel, S., Hasanain, M.: Llms for low resource languages in multilingual, multimodal, and dialectal settings. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics pp. 27–33 (2024)

  4. [4]

    Amrami, A., Goldberg, Y.: Towards better substitution-based word sense induction (2019), https://arxiv.org/abs/1905.12598

  5. [5]

    Anderson, B.: Sacred Texts in a Digital Age: Materiality, Digital Culture, and the Functional Dimensions of Scriptures in Judaism, Christianity, and Islam, pp. 281–302. De Gruyter (06 2020). doi:10.1515/9783110634440-013

  6. [6]

    Claude Sonnet 4 Now Supports 1M Tokens of Context

    Anthropic: Claude Sonnet 4 now supports 1M tokens of context. https://www.anthropic.com/news/1m-context (August 2025), Anthropic API Update

  7. [7]

    Introducing Claude 4

    Anthropic: Introducing Claude 4. Anthropic (May 2025), https://www.anthropic.com/news/claude-4

  8. [8]

    Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges

    Arivazhagan, N., Bapna, A., Firat, O., Lepikhin, D., Johnson, M., Krikun, M., Chen, M.X., Cao, Y., Foster, G., Cherry, C., Macherey, W., Chen, Z., Wu, Y.: Massively multilingual neural machine translation in the wild: Findings and challenges. arXiv preprint arXiv:1907.05019 (2019), https://arxiv.org/abs/1907.05019

  9. [9]

    Survey Article: Inter-Coder Agreement for Computational Linguistics

    Artstein, R., Poesio, M.: Survey article: Inter-coder agreement for computational linguistics. Computational Linguistics 34(4), 555–596 (2008). doi:10.1162/coli.07-034-R2, https://aclanthology.org/J08-4004/

  10. [10]

    Restoring and Attributing Ancient Texts Using Deep Neural Networks

    Assael, Y., Sommerschield, T., Shillingford, B., Bordbar, M., Pavlopoulos, J., Chatzipanagiotou, M., Androutsopoulos, I., Prag, J., de Freitas, N.: Restoring and attributing ancient texts using deep neural networks. Nature 603(7900), 280–283 (2022)

  11. [11]

    Optical Character Recognition (OCR) for Text Recognition and Its Post-Processing Method: A Literature Review

    Avyodri, R., Lukas, S., Tjahyadi, H.: Optical character recognition (ocr) for text recognition and its post-processing method: A literature review. In: 2022 1st International Conference on Technology Innovation and Its Applications (ICTIIA). pp. 1–6. IEEE (2022)

  12. [12]

    Ancient Egyptian Hieroglyphs Segmentation and Classification with Convolutional Neural Networks

    Barucci, A., Canfailla, C., Cucci, C., Forasassi, M., Franci, M., Guarducci, G., Guidi, T., Loschiavo, M., Picollo, M., Pini, R., et al.: Ancient egyptian hieroglyphs segmentation and classification with convolutional neural networks. In: International Conference Florence Heri-Tech: the Future of Heritage Science and Technologies. pp. 126–139. Springer (2022)

  13. [13]

    Evaluating Large Language Models for Sentence Augmentation in Low-Resource Languages: A Case Study on Kazakh

    Bimagambetova, Z., Rakhymzhanov, D., Jaxylykova, A., Pak, A.: Evaluating large language models for sentence augmentation in low-resource languages: A case study on kazakh. In: 2023 19th International Asian School-Seminar on Optimization Problems of Complex Systems (OPCS). pp. 14–18. IEEE (2023)

  14. [14]

    Measuring Annotator Agreement Generally across Complex Structured, Multi-Object, and Free-Text Annotation Tasks

    Braylan, A., Alonso, O., Lease, M.: Measuring annotator agreement generally across complex structured, multi-object, and free-text annotation tasks. In: Proceedings of the ACM Web Conference 2022. p. 1720–1730. WWW ’22, ACM (Apr 2022). doi:10.1145/3485447.3512242, http://dx.doi.org/10.1145/3485447.3512242

  15. [15]

    Brockington, J.: The concept of “dharma” in the Rāmāyaṇa. Journal of Indian Philosophy 32(5/6), 655–670 (2004)

  16. [16]

    Language Models are Few-Shot Learners

    Brown, T.B.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)

  17. [17]

    The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs

    Calderon, N., Reichart, R., Dror, R.: The alternative annotator test for LLM-as-a-judge: How to statistically justify replacing human annotators with LLMs. In: Che, W., Nabende, J., Shutova, E., Pilehvar, M.T. (eds.) Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 16051–16081. Associat...

  18. [18]

    Carlo, V.D., Bianchi, F., Palmonari, M.: Training temporal word embeddings with a compass (2019), https://arxiv.org/abs/1906.02376

  19. [19]

    Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish

    Cekinel, R.F., Karagoz, P., Coltekin, C.: Cross-lingual learning vs. low-resource fine-tuning: A case study with fact-checking in turkish. arXiv preprint arXiv:2403.00411 (2024)

  20. [20]

    The Geometry of Multilingual Language Model Representations

    Chang, T.A., Tu, Z., Bergen, B.K.: The geometry of multilingual language model representations. arXiv preprint arXiv:2205.10964 (2022). doi:10.48550/arXiv.2205.10964, https://arxiv.org/abs/2205.10964

  21. [21]

    Queen: A Large Language Model for Quechua-English Translation

    Chen, J., Shu, P., Li, Y., Zhao, H., Jiang, H., Pan, Y., Zhou, Y., Liu, Z., Howe, L.C., Liu, T.: Queen: A large language model for quechua-english translation. arXiv preprint arXiv:2412.05184 (2024)

  22. [22]

    Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

    Cho, K.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)

  23. [23]

    Unsupervised Cross-lingual Representation Learning at Scale

    Conneau, A.: Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116 (2019)

  24. [24]

    AD-AutoGPT: An Autonomous GPT for Alzheimer's Disease Infodemiology

    Dai, H., Li, Y., Liu, Z., Zhao, L., Wu, Z., Song, S., Shen, Y., Zhu, D., Li, X., Li, S., et al.: Ad-autogpt: An autonomous gpt for alzheimer’s disease infodemiology. arXiv preprint arXiv:2306.10095 (2023)

  25. [25]

    ChatAug: Leveraging ChatGPT for Text Data Augmentation

    Dai, H., Liu, Z., Liao, W., Huang, X., Wu, Z., Zhao, L., Liu, W., Liu, N., Li, S., Zhu, D., et al.: Chataug: Leveraging chatgpt for text data augmentation. arXiv preprint arXiv:2302.13007 (2023)

  26. [26]

    Polysemy and Cognition

    Deane, P.D.: Polysemy and cognition. Lingua 75(4), 325–361 (1988). doi:10.1016/0024-3841(88)90009-5, https://www.sciencedirect.com/science/article/pii/0024384188900095

  27. [27]

    Gemini 2.5: Our Most Intelligent AI Model

    DeepMind, G.: Gemini 2.5: Our most intelligent AI model. Google (March 2025), https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking

  28. [28]

    John vs. Ahmed: Debate-Induced Bias in Multilingual LLMs

    Demidova, A., Atwany, H., Rabih, N., Sha’ban, S., Abdul-Mageed, M.: John vs. ahmed: Debate-induced bias in multilingual LLMs. In: Habash, N., Bouamor, H., Eskander, R., Tomeh, N., Abu Farha, I., Abdelali, A., Touileb, S., Hamed, I., Onaizan, Y., Alhafni, B., Antoun, W., Khalifa, S., Haddad, H., Zitouni, I., AlKhamissi, B., Almatham, R., Mrini, K. (eds.) ...

  29. [29]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Devlin, J.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  30. [30]

    A Digital Palaeographic Approach towards Writer Identification in the Dead Sea Scrolls

    Dhali, M., He, S., Popovic, M., Tigchelaar, E., Schomaker, L.: A digital palaeographic approach towards writer identification in the dead sea scrolls. In: International Conference on Pattern Recognition Applications and Methods 2017. pp. 693–702 (2017)

  31. [31]

    BiNet: Degraded-Manuscript Binarization in Diverse Document Textures and Layouts Using Deep Encoder-Decoder Networks

    Dhali, M.A., de Wit, J.W., Schomaker, L.: Binet: Degraded-manuscript binarization in diverse document textures and layouts using deep encoder-decoder networks. arXiv preprint arXiv:1911.07930 (2019)

  32. [32]

    The Llama 3 Herd of Models

    Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al.: The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

  33. [33]

    Automatic Machine Translation of Poetry and a Low-Resource Language Pair

    Dunđer, I., Seljan, S., Pavlovski, M.: Automatic machine translation of poetry and a low-resource language pair. In: 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO). pp. 1034–1039. IEEE (2020)

  34. [34]

    AmericasNLI: Evaluating Zero-Shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-Resource Languages

    Ebrahimi, A., Mager, M., Oncevay, A., Chaudhary, V., Chiruzzo, L., Fan, A., Ortega, J., Ramos, R., Rios, A., Meza Ruiz, I.V., Giménez-Lugo, G., Mager, E., Neubig, G., Palmer, A., Coto-Solano, R., Vu, T., Kann, K.: AmericasNLI: Evaluating zero-shot natural language understanding of pretrained multilingual models in truly low-resource languages. In: Pr...

  35. [35]

    Robust Speech Recognition Using Meta-Learning for Low-Resource Accents

    Eledath, D., Baby, A., Singh, S.: Robust speech recognition using meta-learning for low-resource accents. In: 2024 National Conference on Communications (NCC). pp. 1–6 (2024). doi:10.1109/NCC60321.2024.10485786

  36. [36]

    Features of Translating Religious Texts

    Elewa, A.: Features of translating religious texts. Journal of translation 10(1), 25–33 (2014)

  37. [37]

    The Theatre of Pompey

    Erasmo, M.: The theatre of pompey. Memoirs of the American Academy in Rome 65, 43–69 (2020)

  38. [38]

    DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages

    Faisal, F., Ahia, O., Srivastava, A., Ahuja, K., Chiang, D., Tsvetkov, Y., Anastasopoulos, A.: DIALECTBENCH: A NLP benchmark for dialects, varieties, and closely-related languages. In: Proceedings of ACL. Association for Computational Linguistics, Bangkok, Thailand (Aug 2024), https://arxiv.org/pdf/2403.11009

  39. [39]

    Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

    Fedus, W., Zoph, B., Shazeer, N.: Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research 23(120), 1–39 (2022), https://jmlr.org/papers/v23/21-0998.html

  40. [40]

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference on machine learning. pp. 1126–1135. PMLR (2017)

  41. [41]

    Discovering Language-Neutral Sub-Networks in Multilingual Language Models

    Foroutan, N., Banaei, M., Lebret, R., Bosselut, A., Aberer, K.: Discovering language-neutral sub-networks in multilingual language models. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. pp. 7560–7575. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Dec 2022). doi:10.18653/v1/2022....

  42. [42]

    TLUE: A Tibetan Language Understanding Evaluation Benchmark

    Gao, F., Huang, C., Tashi, N., Wang, X., Tsering, T., Ma-bao, B., Duojie, R., Luosang, G., Dongrub, R., Tashi, D., et al.: Tlue: A tibetan language understanding evaluation benchmark. arXiv preprint arXiv:2503.12051 (2025)

  43. [43]

    GlossLM: A Massively Multilingual Corpus and Pretrained Model for Interlinear Glossed Text

    Ginn, M., Tjuatja, L., He, T., Rice, E., Neubig, G., Palmer, A., Levin, L.: Glosslm: A massively multilingual corpus and pretrained model for interlinear glossed text (2024), https://arxiv.org/abs/2403.06399

  44. [44]

    Analysing Lexical Semantic Change with Contextualised Word Representations

    Giulianelli, M., Del Tredici, M., Fernández, R.: Analysing lexical semantic change with contextualised word representations. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics (2020). doi:10.18653/v1/2020.acl-main.365, http://dx.doi.org/10.18653/v1/2020.acl-main.365

  45. [45]

    Beyond the Surface: A Computational Exploration of Linguistic Ambiguity

    Goel, A.: Beyond the Surface: A Computational Exploration of Linguistic Ambiguity. Master of science thesis, International Institute of Information Technology (IIIT), Hyderabad (2023), https://api.semanticscholar.org/CorpusID:259371553, report No: IIIT/TH/2023/84

  46. [46]

    It's Not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT

    Gonen, H., Ravfogel, S., Elazar, Y., Goldberg, Y.: It’s not greek to mbert: inducing word-level translations from multilingual bert. arXiv preprint arXiv:2010.08275 (2020)

  47. [47]

    Long Short-Term Memory

    Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks pp. 37–45 (2012)

  48. [48]

    Deciphering Oracle Bone Language with Diffusion Models

    Guan, H., Yang, H., Wang, X., Han, S., Liu, Y., Jin, L., Bai, X., Liu, Y.: Deciphering oracle bone language with diffusion models. arXiv preprint arXiv:2406.00684 (2024)

  49. [49]

    Safeguarding Intangible Cultural Heritage: Exploring the Synergies in the Transmission of Indigenous Languages, Dance and Music Practices in Southern Africa

    Gwerevende, S., Mthombeni, Z.M.: Safeguarding intangible cultural heritage: exploring the synergies in the transmission of indigenous languages, dance and music practices in southern africa. International Journal of Heritage Studies 29(5), 398–412 (2023)

  50. [50]

    Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change

    Hamilton, W.L., Leskovec, J., Jurafsky, D.: Diachronic word embeddings reveal statistical laws of semantic change. In: Erk, K., Smith, N.A. (eds.) Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1489–1501. Association for Computational Linguistics, Berlin, Germany (Aug 2016). doi:10.1865...

  51. [51]

    Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings

    Hasan, M.A., Tarannum, P., Dey, K., Razzak, I., Naseem, U.: Do large language models speak all languages equally? a comparative study in low-resource settings. arXiv preprint arXiv:2408.02237 (2024)

  52. [52]

    A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

    Hedderich, M.A., Lange, L., Adel, H., Strötgen, J., Klakow, D.: A survey on recent approaches for natural language processing in low-resource scenarios. arXiv preprint arXiv:2010.12309 (2020)

  53. [53]

    A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

    Hedderich, M.A., Lange, L., Adel, H., Strötgen, J., Klakow, D.: A survey on recent approaches for natural language processing in low-resource scenarios. In: North American Chapter of the Association for Computational Linguistics (2020), https://api.semanticscholar.org/CorpusID:225062337

  54. [54]

    ter Hoeve, M., Grangier, D., Schluter, N.: High-resource methodological bias in low-resource investigations (2022), https://arxiv.org/abs/2211.07534

  55. [55]

    Geographic Adaptation of Pretrained Language Models

    Hofmann, V., Glavaš, G., Ljubešić, N., Pierrehumbert, J.B., Schütze, H.: Geographic adaptation of pretrained language models. Transactions of the Association for Computational Linguistics 12, 411–431 (2024). doi:10.1162/tacl_a_00652, https://aclanthology.org/2024.tacl-1.23/

  56. [56]

    Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition

    Huang, J., Kuchaiev, O., O’Neill, P., Lavrukhin, V., Li, J., Flores, A., Kucsko, G., Ginsburg, B.: Cross-language transfer learning, continuous learning, and domain adaptation for end-to-end automatic speech recognition. arXiv preprint arXiv:2005.04290 (2020)

  57. [57]

    Position: TrustLLM: Trustworthiness in Large Language Models

    Huang, Y., Sun, L., Wang, H., Wu, S., Zhang, Q., Li, Y., Gao, C., Huang, Y., Lyu, W., Zhang, Y., et al.: Position: Trustllm: Trustworthiness in large language models. In: International Conference on Machine Learning. pp. 20166–20270. PMLR (2024)

  58. [58]

    TrustLLM: Trustworthiness in Large Language Models

    Huang, Y., Sun, L., Wang, H., Wu, S., Zhang, Q., Li, Y., Gao, C., Huang, Y., Lyu, W., Zhang, Y., et al.: Trustllm: Trustworthiness in large language models. arXiv preprint arXiv:2401.05561 (2024)

  59. [59]

    Preserving Linguistic Diversity in the Digital Age: A Scalable Model for Cultural Heritage Continuity

    Hutson, J., Ellsworth, P., Ellsworth, M.: Preserving linguistic diversity in the digital age: A scalable model for cultural heritage continuity. Faculty Scholarship 612 (2024), https://digitalcommons.lindenwood.edu/faculty-research-papers/612, available online

  60. [60]

    Hypotheses Revisited: The Cognitive Theory of Metaphor Applied to Religious Texts

    Jäkel, O.: Hypotheses revisited: The cognitive theory of metaphor applied to religious texts. metaphorik.de 2(1), 20–42 (2002)

  61. [61]

    Jiang, H., Pan, Y., Chen, J., Liu, Z., Zhou, Y., Shu, P., Li, Y., Zhao, H., Mihm, S., Howe, L.C., Liu, T.: Oraclesage: Towards unified visual-linguistic understanding of oracle bone scripts through cross-modal knowledge fusion (2024), https://arxiv.org/abs/2411.17837

  62. [62]

    T&T Clark Handbook of African American Theology

    Jones Medine, C.M.: T&T Clark Handbook of African American Theology. Edited by Antonia Michelle Daymond, Frederick L. Ware, and Eric Lewis Williams. New York: Bloomsbury T&T Clark, 464 pages. $198.00. Horizons 50(1), 230–232 (2023). doi:10.1017/hor.2023.26

  64. [65]

    Semantic Change in Adults Is Not Primarily a Generational Phenomenon

    Kamath, G., Yang, M., Reddy, S., Sonderegger, M., Card, D.: Semantic change in adults is not primarily a generational phenomenon. Proceedings of the National Academy of Sciences 122(31), e2426815122 (2025). doi:10.1073/pnas.2426815122, https://www.pnas.org/doi/abs/10.1073/pnas.2426815122

  65. [66]

    Evaluating Machine Translation of Literature through Rhetorical Analysis

    Karabayeva, I., Kalizhanova, A.: Evaluating machine translation of literature through rhetorical analysis. Journal of Translation and Language Studies 5(1), 1–9 (2024)

  66. [67]

    GLUECoS: An Evaluation Benchmark for Code-Switched NLP

    Khanuja, S., Dandapat, S., Srinivasan, A., Sitaram, S., Choudhury, M.: GLUECoS: An evaluation benchmark for code-switched NLP. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 3575–3585. Association for Computational Linguistics, Online (Jul 2020)....

  67. [68]

    LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages

    Kholodna, N., Julka, S., Khodadadi, M., Gumus, M.N., Granitzer, M.: Llms in the loop: Leveraging large language model annotations for active learning in low-resource languages. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 397–412. Springer (2024)

  68. [69]

    Kirk, H.R., Vidgen, B., Röttger, P., Hale, S.A.: Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback (2023), https://arxiv.org/abs/2303.05453

  69. [70]

    Performance Analysis of Handwritten Text Augmentation on Style-Based Dating of Historical Documents

    Koopmans, L., Dhali, M.A., Schomaker, L.: Performance analysis of handwritten text augmentation on style-based dating of historical documents. SN Computer Science 5(4), 397 (2024)

  70. [71]

    Tupleised Co-occurrence Measures vs LLM Word Embeddings for Corpus Linguistics: The Case of English Light Verb Construction Detection

    Lai, R.K.Y.: Tupleised co-occurrence measures vs LLM word embeddings for corpus linguistics: The case of English light verb construction detection. In: Oco, N., Dita, S.N., Borlongan, A.M., Kim, J.B. (eds.) Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation. pp. 1201–1212. Tokyo University of Foreign Studies, Tokyo, J...

  71. [72]

    Turn Design, Resonance and Epistemic Stance in the Diamond Sutra: A Dialogic Constructionist Approach

    Lai, R.K.Y., Yin, L.Z., Zhang, A.Y., Jiang, Y., Xin, B.S., Gao, J.: Turn design, resonance and epistemic stance in the diamond sutra: A dialogic constructionist approach. In: Huang, C.R., Harada, Y., Kim, J.B., Chen, S., Hsu, Y.Y., Chersoni, E., A, P., Zeng, W.H., Peng, B., Li, Y., Li, J. (eds.) Proceedings of the 37th Pacific Asia Conference on Language,...

  72. [73]

    Ancient and Modern Religion and Politics: Negotiating Transitive Spaces and Hybrid Identities

    LeBlanc, J.R., Medine, C.M.J.: Ancient and Modern Religion and Politics: Negotiating Transitive Spaces and Hybrid Identities. Palgrave Macmillan (2012)

  73. [74]

    Multimodality of AI for Education: Towards Artificial General Intelligence

    Lee, G.G., Shi, L., Latif, E., Gao, Y., Bewersdorf, A., Nyaaba, M., Guo, S., Wu, Z., Liu, Z., Wang, H., et al.: Multimodality of ai for education: Towards artificial general intelligence. arXiv preprint arXiv:2312.06037 (2023)

  74. [75]

    CultureLLM: Incorporating Cultural Differences into Large Language Models

    Li, C., Chen, M., Wang, J., Sitaram, S., Xie, X.: Culturellm: Incorporating cultural differences into large language models. arXiv preprint arXiv:2402.10946 (2024)

  75. [76]

    Li, C., Teney, D., Yang, L., Wen, Q., Xie, X., Wang, J.: Culturepark: Boosting cross-cultural understanding in large language models (2024), https://arxiv.org/abs/2405.15145

  76. [77]

    Large Language Models for Manufacturing

    Li, Y., Zhao, H., Jiang, H., Pan, Y., Liu, Z., Wu, Z., Shu, P., Tian, J., Yang, T., Xu, S., et al.: Large language models for manufacturing. arXiv preprint arXiv:2410.21418 (2024)

  77. [78]

    Differentiating ChatGPT-Generated and Human-Written Medical Texts: Quantitative Study

    Liao, W., Liu, Z., Dai, H., Xu, S., Wu, Z., Zhang, Y., Huang, X., Zhu, D., Cai, H., Li, Q., et al.: Differentiating chatgpt-generated and human-written medical texts: quantitative study. JMIR Medical Education 9(1), e48904 (2023)

  78. [79]

    Towards Language Service Creation and Customization for Low-Resource Languages

    Lin, D., Murakami, Y., Ishida, T.: Towards language service creation and customization for low-resource languages. Information 11(2), 67 (2020)

  79. [80]

    Liu, A., Wu, Z., Michael, J., Suhr, A., West, P., Koller, A., Swayamdipta, S., Smith, N.A., Choi, Y.: We’re afraid language models aren’t modeling ambiguity (2023), https://arxiv.org/abs/2304.14399

  80. [81]

    Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models

    Liu, Y., Han, T., Ma, S., Zhang, J., Yang, Y., Tian, J., He, H., Li, A., He, M., Liu, Z., et al.: Summary of chatgpt-related research and perspective towards the future of large language models. Meta-Radiology p. 100017 (2023)

Showing first 80 references.