pith. machine review for the scientific record.

arXiv: 2605.09147 · v1 · submitted 2026-05-09 · 💻 cs.CL · cs.AI · stat.AP

Recognition: no theorem link

From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages

Esteban Garces Arias, Matthias Schöffel

Pith reviewed 2026-05-12 03:33 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · stat.AP
keywords POS tagging · large language models · medieval Romance languages · cross-lingual transfer · fine-tuning · historical NLP · digital humanities

The pith

Large language models outperform traditional taggers for part-of-speech tagging of medieval Romance languages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper compares traditional rule-based and statistical taggers against large language models for part-of-speech tagging on medieval Occitan, Catalan, and French texts. LLM-based methods outperform the traditional ones in experiments on historically grounded datasets. Fine-tuning and multilingual training deliver the largest accuracy gains, with cross-lingual transfer helping under-resourced varieties particularly well. Targeted bilingual training can even surpass broader multilingual approaches for specific languages. The work stresses the value of linguistic proximity and dataset features when choosing transfer strategies for historical language processing.

Core claim

Experiments on historically grounded datasets show that LLM-based approaches consistently outperform traditional taggers, with fine-tuning and multilingual training yielding the largest improvements. In particular, cross-lingual transfer learning substantially benefits under-resourced varieties, while targeted bilingual training can outperform broader multilingual configurations for specific target languages. The results highlight the importance of linguistic proximity and dataset characteristics when designing transfer strategies for historical NLP.

What carries the argument

An evaluation of open-source LLMs under zero-shot prompting, few-shot prompting, monolingual fine-tuning, and cross-lingual transfer learning, benchmarked against rule-based and statistical taggers.
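
The zero-shot and few-shot settings differ mainly in how many tagged demonstrations precede the target sentence. The following is a minimal sketch of such a prompt builder, not the paper's actual prompts: the function name `build_pos_prompt`, the instruction wording, the UPOS-style tag list, and the demonstration sentence are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical tag inventory (UPOS-style); the paper's tag set may differ.
TAGS = ["ADJ", "ADP", "ADV", "CCONJ", "DET", "NOUN", "NUM", "PRON", "PUNCT", "VERB"]

TaggedSentence = List[Tuple[str, str]]


def build_pos_prompt(
    tokens: Sequence[str],
    examples: Sequence[TaggedSentence] = (),
) -> str:
    """Build a tagging prompt: zero-shot if `examples` is empty, few-shot otherwise."""
    lines = [
        "Tag each token with one part-of-speech label from: " + ", ".join(TAGS) + ".",
        "Answer with one token/TAG pair per token, separated by spaces.",
        "",
    ]
    # Each demonstration shows the input tokens followed by the expected output.
    for sentence in examples:
        lines.append("Tokens: " + " ".join(tok for tok, _ in sentence))
        lines.append("Tags: " + " ".join(f"{tok}/{tag}" for tok, tag in sentence))
        lines.append("")
    # The target sentence ends with an open "Tags:" slot for the model to fill.
    lines.append("Tokens: " + " ".join(tokens))
    lines.append("Tags:")
    return "\n".join(lines)


# Made-up demonstration sentence, for illustration only.
demo = [[("lo", "DET"), ("rei", "NOUN"), ("parla", "VERB")]]
zero_shot = build_pos_prompt(["la", "domna", "canta"])
few_shot = build_pos_prompt(["la", "domna", "canta"], demo)
```

Keeping the output format identical across settings (token/TAG pairs) makes the model responses parseable by the same scoring script, so differences in accuracy can be attributed to the number of demonstrations rather than to parsing.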

If this is right

  • Fine-tuned LLMs achieve higher POS tagging accuracy than traditional taggers on these medieval languages.
  • Multilingual training substantially improves results for under-resourced medieval varieties.
  • Targeted bilingual training can exceed the performance of broader multilingual training for particular target languages.
  • Linguistic proximity between languages influences how effective cross-lingual transfer will be.
  • The findings supply practical guidance for deploying LLM-based tagging in digital humanities work on historical texts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Higher tagging accuracy could support improved downstream analysis such as syntactic parsing or information retrieval from medieval manuscripts.
  • The transfer strategies may extend to other historical language varieties that share similar spelling variation and data scarcity.
  • Hybrid systems that combine LLM predictions with targeted traditional rules could be tested to further reduce errors on variant spellings.
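
The hybrid idea in the last bullet could be prototyped as a post-processing pass: keep the LLM's tags, but override them for tokens found in a small hand-built lexicon after spelling normalization. This is an editorial sketch under stated assumptions; the lexicon entries, the normalization rule (lowercasing and stripping diacritics), and the function names are hypothetical and not taken from the paper.

```python
import unicodedata
from typing import Dict, List, Sequence

# Hypothetical closed-class lexicon; entries are illustrative only.
CLOSED_CLASS: Dict[str, str] = {"e": "CCONJ", "et": "CCONJ", "de": "ADP"}


def normalize(token: str) -> str:
    """Lowercase and strip diacritics so spelling variants share one lookup key."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def apply_rules(
    tokens: Sequence[str],
    llm_tags: Sequence[str],
    lexicon: Dict[str, str] = CLOSED_CLASS,
) -> List[str]:
    """Override the LLM tag wherever the normalized token is in the lexicon."""
    if len(tokens) != len(llm_tags):
        raise ValueError("tokens and llm_tags must have the same length")
    return [lexicon.get(normalize(tok), tag) for tok, tag in zip(tokens, llm_tags)]


# The LLM tags here are deliberately wrong to show the override.
corrected = apply_rules(["Dé", "rei", "ET", "domna"], ["NOUN", "NOUN", "NOUN", "NOUN"])
```

Whether such overrides help would have to be measured: a rule that is right for one variety can be wrong for another, so the lexicon would need to be built per language.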

Load-bearing premise

That performance differences arise only from the tagging methods and transfer strategies rather than from unstated differences in model scale, pretraining data overlap with historical texts, or biases in the medieval corpora.

What would settle it

A controlled re-run of the experiments using models matched for size and pretraining data, or an evaluation of the top configurations on an independent medieval text dataset not used in the original study.

Figures

Figures reproduced from arXiv: 2605.09147 by Esteban Garces Arias, Matthias Schöffel.

Figure 1. Geographic distribution and spelling characteristics of medieval Romance languages (13th century). Left: …
Figure 2. Illustration of performance evolution, in terms …
Figure 3. Effect of trilingual CLTF with respect to single-dataset LLM fine-tuning.
Figure 4. Decoding strategy performance across varying prompts, models and datasets.

Original abstract

Part-of-speech (POS) tagging for Medieval Romance languages remains challenging due to orthographic variation, morphological complexity, and limited annotated resources. This paper presents a systematic empirical evaluation of large language models (LLMs) for POS tagging across three medieval varieties: Medieval Occitan, Medieval Catalan, and Medieval French. We compare traditional rule-based and statistical taggers with modern open-source LLMs under zero-shot prompting, few-shot prompting, monolingual fine-tuning, and cross-lingual transfer learning settings. Experiments on historically grounded datasets show that LLM-based approaches consistently outperform traditional taggers, with fine-tuning and multilingual training yielding the largest improvements. In particular, cross-lingual transfer learning substantially benefits under-resourced varieties, while targeted bilingual training can outperform broader multilingual configurations for specific target languages. The results highlight the importance of linguistic proximity and dataset characteristics when designing transfer strategies for historical NLP. These findings provide empirical insights into the applicability of modern neural methods to medieval text processing and provide practical guidance for deploying LLM-based POS tagging pipelines in digital humanities research. All code, models, and processed datasets are released for reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. This manuscript presents a systematic empirical comparison of traditional rule-based and statistical POS taggers against open-source LLMs for part-of-speech tagging on historically grounded datasets of Medieval Occitan, Medieval Catalan, and Medieval French. The authors evaluate zero-shot and few-shot prompting, monolingual fine-tuning, and cross-lingual transfer learning, claiming that LLM-based methods consistently outperform traditional taggers, with the largest gains from fine-tuning and multilingual training. Cross-lingual transfer particularly benefits under-resourced varieties, while targeted bilingual training can outperform broader multilingual setups for specific languages. The work stresses the importance of linguistic proximity and dataset characteristics, and releases all code, models, and processed datasets for reproducibility.

Significance. If the quantitative results hold under the reported controls, this study offers a valuable contribution to historical NLP by providing concrete evidence on the effectiveness of modern neural methods for low-resource medieval Romance languages. The emphasis on fine-tuning and cross-lingual strategies supplies practical guidance for digital humanities applications where annotated data is scarce. A notable strength is the full release of code, models, and datasets, which directly supports reproducibility and enables follow-up work. The paper bridges traditional computational linguistics with LLM techniques in a domain that has received limited attention.

minor comments (3)
  1. Abstract: The abstract asserts consistent outperformance and benefits of fine-tuning and cross-lingual transfer but omits any quantitative results, dataset sizes, or specific metrics. Adding one or two key numbers (e.g., accuracy deltas) would strengthen the summary without lengthening the paragraph.
  2. Section 4 (Experiments): While the full text supplies model names, hyperparameters, and dataset provenance as noted in the review, ensure that all tables reporting accuracy include error bars or statistical significance tests for the cross-lingual comparisons to support the claim of substantial benefits.
  3. Section 5 (Discussion): The discussion of linguistic proximity is insightful, but a brief paragraph on potential confounds (e.g., pretraining data overlap with historical texts) would address the main assumption raised in the review and strengthen the interpretation of transfer results.
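
One way to act on minor comment 2 is a paired bootstrap over sentences, which yields a confidence interval for the accuracy difference between two taggers evaluated on the same test set. The sketch below is an assumption about how such a check could be run, not the authors' evaluation code; the function names, the 95% percentile interval, and the toy data are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Sentences = Sequence[Sequence[str]]


def accuracy(gold: Sentences, pred: Sentences) -> float:
    """Token-level accuracy over a list of tagged sentences."""
    correct = sum(g == p for gs, ps in zip(gold, pred) for g, p in zip(gs, ps))
    total = sum(len(gs) for gs in gold)
    return correct / total


def paired_bootstrap(
    gold: Sentences,
    pred_a: Sentences,
    pred_b: Sentences,
    n_resamples: int = 2000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed accuracy delta A minus B, lower bound, upper bound).

    Sentences are resampled with replacement, and the same resample is
    scored for both systems, which is what makes the comparison paired.
    """
    rng = random.Random(seed)
    # Per sentence: (tokens A got right minus tokens B got right, length).
    stats: List[Tuple[int, int]] = [
        (sum((a == g) - (b == g) for g, a, b in zip(gs, sa, sb)), len(gs))
        for gs, sa, sb in zip(gold, pred_a, pred_b)
    ]
    observed = sum(d for d, _ in stats) / sum(n for _, n in stats)
    deltas = []
    for _ in range(n_resamples):
        sample = [stats[rng.randrange(len(stats))] for _ in stats]
        deltas.append(sum(d for d, _ in sample) / sum(n for _, n in sample))
    deltas.sort()
    lower = deltas[int(0.025 * n_resamples)]
    upper = deltas[int(0.975 * n_resamples) - 1]
    return observed, lower, upper


# Toy data, made up for illustration: system A is perfect, B errs once per sentence.
gold = [["DET", "NOUN", "VERB"], ["PRON", "VERB"], ["DET", "NOUN"]]
pred_a = [list(s) for s in gold]
pred_b = [["DET", "VERB", "VERB"], ["PRON", "NOUN"], ["NOUN", "NOUN"]]
observed, lower, upper = paired_bootstrap(gold, pred_a, pred_b)
```

If the interval excludes zero, the difference is unlikely to be a sampling artifact of the test set. Resampling whole sentences rather than single tokens respects the fact that tagging errors within a sentence are not independent.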

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review, which accurately summarizes our empirical comparison of traditional POS taggers and LLMs on medieval Occitan, Catalan, and French. We appreciate the recognition of our contributions to historical NLP, the practical guidance on fine-tuning and cross-lingual strategies, and the emphasis on reproducibility through released resources. We will incorporate minor revisions to further strengthen clarity and presentation as suggested.

Circularity Check

0 steps flagged

No significant circularity; empirical comparison only

full rationale

The paper is a straightforward empirical comparison of POS taggers (traditional vs. LLM-based) on three medieval Romance language datasets. It reports experimental outcomes under zero-shot, few-shot, fine-tuning, and cross-lingual settings, with explicit statements that code, models, and datasets are released. No mathematical derivation chain, fitted parameters renamed as predictions, self-definitional equations, or load-bearing self-citations appear in the abstract or described structure. The central claims rest on reproducible experimental results rather than any reduction to inputs by construction. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review provides no explicit free parameters, invented entities, or non-standard axioms; the work rests on standard NLP domain assumptions about the applicability of POS tagging and transfer learning to historical texts.

axioms (1)
  • domain assumption: POS tagging remains a meaningful and evaluable task for medieval Romance languages despite orthographic variation and morphological complexity.
    Invoked implicitly in the experimental design comparing taggers on historically grounded datasets.

pith-pipeline@v0.9.0 · 5500 in / 1476 out tokens · 62975 ms · 2026-05-12T03:33:01.296142+00:00 · methodology

