pith. machine review for the scientific record.

arxiv: 2605.02608 · v1 · submitted 2026-05-04 · 💻 cs.CL · cs.AI · cs.LG

Recognition: 3 Lean theorem links

Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:33 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG
keywords dependency parsing · low-resource languages · Biaffine LSTM · transformer models · morphological complexity · African languages · treebanks

The pith

Biaffine LSTM outperforms transformers for dependency parsing in low-resource regimes until data volume reaches a moderate threshold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares four dependency parsers across ten typologically diverse languages, with emphasis on low-resource African languages. It shows that the Biaffine LSTM architecture delivers higher accuracy than transformer models when annotated training data is limited, yet transformers regain the lead once data volume increases. The switchover occurs at corpus sizes commonly seen in under-resourced treebanks. Morphological complexity, quantified by the MATTR metric, further widens the performance gap for transformers even after corpus size is controlled. These patterns offer direct guidance on architecture selection for syntactic tools in data-scarce settings.
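
For readers new to the metric: MATTR (moving-average type-token ratio) averages the type-token ratio over a sliding window of fixed length, which decouples the diversity score from text length. A minimal sketch, assuming a tokenized corpus; the 500-token window is a common default, not necessarily the paper's setting:

    def mattr(tokens, window=500):
        """Moving-Average Type-Token Ratio: mean TTR over all fixed-length windows.

        Unlike plain TTR, the fixed window keeps the score comparable across
        corpora of different sizes."""
        if len(tokens) < window:                      # short text: fall back to plain TTR
            return len(set(tokens)) / len(tokens)
        ttrs = [len(set(tokens[i:i + window])) / window
                for i in range(len(tokens) - window + 1)]
        return sum(ttrs) / len(ttrs)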

Core claim

The Biaffine LSTM consistently outperforms transformer models in low-resource regimes, with transformers recovering their advantage as training data increases. The crossover falls within a resource range typical of treebanks for under-resourced languages, and morphological complexity, measured via MATTR, emerges as a significant secondary predictor of transformers' relative disadvantage after controlling for corpus size.

What carries the argument

Direct head-to-head evaluation of the Biaffine LSTM and Stack-Pointer Network against pre-trained transformers (AfroXLMR-large and RemBERT) on controlled subsets of training data, with MATTR serving as the measure of morphological complexity.
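
For orientation, the biaffine family scores every head-dependent pair with a bilinear form over encoder states, which keeps the trainable machinery small next to a pre-trained transformer. A minimal PyTorch sketch of a biaffine arc scorer in the Dozat-and-Manning style, assuming BiLSTM-derived head and dependent representations; it illustrates the standard design rather than the paper's exact implementation:

    import torch
    import torch.nn as nn

    class BiaffineArcScorer(nn.Module):
        """Arc scores s[b, i, j] = score of token i as head of token j."""
        def __init__(self, dim):
            super().__init__()
            # the extra row lets the head side carry a bias term
            self.U = nn.Parameter(torch.empty(dim + 1, dim))
            nn.init.xavier_uniform_(self.U)

        def forward(self, head_repr, dep_repr):
            # head_repr, dep_repr: (batch, seq, dim), e.g. two MLP views of BiLSTM states
            ones = head_repr.new_ones(*head_repr.shape[:2], 1)
            h = torch.cat([head_repr, ones], dim=-1)       # (batch, seq, dim + 1)
            return h @ self.U @ dep_repr.transpose(1, 2)   # (batch, seq, seq)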

If this is right

  • The Biaffine LSTM is the better choice for building syntactic tools when annotated data is scarce.
  • Transformers become preferable once treebank size exceeds the typical range for under-resourced languages.
  • Morphological complexity remains an independent factor that favors simpler LSTM parsers.
  • Resource-aware parser selection can improve parsing accuracy for languages with small treebanks.
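
Taken together, the bullets reduce to a simple decision rule. A hypothetical selector keyed to the crossover estimates tabulated under Figure 1 below; the thresholds are model- and metric-specific readings of this one study, not universal constants:

    # Hypothetical resource-aware parser selection; thresholds are the
    # Figure 1 crossover estimates (sentence counts), not universal constants.
    CROSSOVER_SENTS = {
        ("AfroXLMR-large", "LAS"): 838,
        ("RemBERT", "LAS"): 1388,
        ("AfroXLMR-large", "UAS"): 829,
        ("RemBERT", "UAS"): 1339,
    }

    def pick_architecture(n_train_sents, transformer="AfroXLMR-large", metric="LAS"):
        """Favor the Biaffine LSTM below the crossover, the transformer above it."""
        threshold = CROSSOVER_SENTS[(transformer, metric)]
        return "Biaffine LSTM" if n_train_sents < threshold else transformer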

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same data-size crossover may appear in other structured prediction tasks such as semantic role labeling or named-entity recognition.
  • Targeted data collection for morphologically complex languages could accelerate the point at which transformers become viable.
  • Hybrid systems that switch between LSTM and transformer backbones based on available data volume are worth testing.

Load-bearing premise

The ten chosen languages, especially the low-resource African ones, represent broader low-resource conditions, and the MATTR metric isolates morphological complexity independently of data quality or annotation consistency.

What would settle it

Repeating the experiments on a fresh set of low-resource languages while systematically varying training-set size and morphological complexity to check whether the same performance crossover and MATTR correlation appear.

Figures

Figures reproduced from arXiv: 2605.02608 by Christiane Fellbaum, Happy Buzaaba, Kevin Guan.

Figure 1
Figure 1. RER vs. sentence count; RER < 0 indicates fewer errors than the Biaffine LSTM baseline. Crossover estimates:

    Model            Metric   log10(train)   Sentences
    AfroXLMR-large   LAS      2.923              838
    RemBERT          LAS      3.142            1,388
    AfroXLMR-large   UAS      2.918              829
    RemBERT          UAS      3.127            1,339

Figure 3
Figure 3. Partial regression of RER_LAS on MATTR for RemBERT (top) and AfroXLMR-large (bottom), after residualizing both variables for training size. Partial regression plots for RER_UAS are omitted for brevity, although the relationship of RER_UAS with morphological complexity is consistent with that of RER_LAS and leads to the same conclusions.
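
The construction behind Figure 3 is a standard added-variable (partial regression) plot. A self-contained sketch with synthetic stand-in numbers, where the real inputs would be one (RER, MATTR, log training size) triple per treebank; by the Frisch-Waugh-Lovell theorem, the residual-on-residual slope equals the MATTR coefficient in the full regression that also includes training size:

    import numpy as np

    def residualize(y, x):
        """Residuals of y after an OLS fit on x (with intercept)."""
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta

    rng = np.random.default_rng(0)                 # synthetic, purely illustrative
    log_train = rng.uniform(2.0, 4.5, 30)
    mattr = rng.uniform(0.6, 0.9, 30)
    rer_las = 0.8 * log_train - 0.6 * mattr + rng.normal(0.0, 0.1, 30)

    r_rer = residualize(rer_las, log_train)        # residualize outcome on size
    r_mattr = residualize(mattr, log_train)        # residualize predictor on size
    slope = np.polyfit(r_mattr, r_rer, 1)[0]       # = MATTR coefficient in full model
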
original abstract

Transformer-based models achieve state-of-the-art dependency parsing for high-resource languages, yet their advantage over simpler architectures in low-resource settings remains poorly understood. We evaluate four parsers -- the Biaffine LSTM, Stack-Pointer Network, AfroXLMR-large, and RemBERT -- across ten typologically diverse languages, with a focus on low-resource African languages. We find that the Biaffine LSTM consistently outperforms transformer models in low-resource regimes, with transformers recovering their advantage as training data increases. The crossover falls within a resource range typical of treebanks for under-resourced languages. Morphological complexity (measured via MATTR) emerges as a significant secondary predictor of transformers' relative disadvantage after controlling for corpus size. These results indicate that the Biaffine LSTM may be better suited for syntactic tool development in low-resource regimes until sufficient annotated data is available to leverage the representational capacity of pre-trained transformers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript evaluates four dependency parsing architectures—the Biaffine LSTM, Stack-Pointer Network, AfroXLMR-large, and RemBERT—across ten typologically diverse languages, emphasizing low-resource African languages. It reports that the Biaffine LSTM outperforms the transformer models in low-resource regimes, that transformers recover their advantage as training data increases, that the performance crossover occurs within a resource range typical of under-resourced treebanks, and that morphological complexity (measured via MATTR) is a significant secondary predictor of transformers' relative disadvantage after controlling for corpus size. These findings are used to recommend the Biaffine LSTM for syntactic tool development in low-resource settings until sufficient data is available.

Significance. If the primary performance curves and crossover point are reproducible, the work provides practical guidance for architecture selection in low-resource dependency parsing, a topic of direct relevance to NLP tool development for under-resourced languages. The empirical focus on a resource spectrum and typologically diverse set (including African languages) adds value beyond single-language studies. The secondary MATTR-based predictor, however, requires stronger justification to support the mechanistic interpretation offered in the abstract.

major comments (1)
  1. [Results and discussion of secondary predictors] The claim that MATTR measures morphological complexity and serves as a significant secondary predictor (abstract and results/discussion) is not adequately supported. MATTR is a standard lexical-diversity metric (moving-average type-token ratio), while morphological complexity is conventionally quantified by inflectional entropy, paradigm size, or average morphemes per word. The manuscript provides no validation that the regression isolates morphological effects from lexical diversity, annotation consistency, or data quality. Because this predictor is presented as explanatory support for the observed crossover and the recommendation for Biaffine LSTM, the mechanistic interpretation is insecure.
minor comments (2)
  1. [Methods] Provide explicit details on data splits, hyperparameter search procedures, statistical significance tests for performance differences, and any controls for model size or pretraining corpus overlap to allow full reproducibility of the primary comparisons.
  2. [Results] Clarify the exact regression model, included covariates, and reported coefficients or p-values for the MATTR analysis so readers can assess the strength of the secondary finding independently.
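
Minor comment 1's request for significance testing has a standard instantiation in parsing work: a paired bootstrap over per-sentence scores. A minimal sketch, assuming per-sentence LAS values for two systems on the same test set (whether the paper ran such a test is not shown here):

    import numpy as np

    def paired_bootstrap_p(scores_a, scores_b, n_boot=10_000, seed=0):
        """One-sided bootstrap p-value for mean(A - B) > 0 over paired sentences."""
        rng = np.random.default_rng(seed)
        diffs = np.asarray(scores_a, float) - np.asarray(scores_b, float)
        idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
        boot_means = diffs[idx].mean(axis=1)       # resampled mean advantages
        return float((boot_means <= 0.0).mean())   # how often the advantage vanishes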

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment regarding the interpretation and justification of MATTR below. The primary empirical findings on parser performance across resource levels and the recommended use of the Biaffine LSTM in low-resource settings are unaffected by this revision.

point-by-point responses
  1. Referee: The claim that MATTR measures morphological complexity and serves as a significant secondary predictor (abstract and results/discussion) is not adequately supported. MATTR is a standard lexical-diversity metric (moving-average type-token ratio), while morphological complexity is conventionally quantified by inflectional entropy, paradigm size, or average morphemes per word. The manuscript provides no validation that the regression isolates morphological effects from lexical diversity, annotation consistency, or data quality. Because this predictor is presented as explanatory support for the observed crossover and the recommendation for Biaffine LSTM, the mechanistic interpretation is insecure.

    Authors: We acknowledge the referee's point that MATTR is conventionally a lexical-diversity metric rather than a direct measure of morphological complexity (such as inflectional entropy or paradigm size). The manuscript's phrasing in the abstract and discussion does overstate the direct link. In the revised version we will (1) replace the parenthetical claim with language describing MATTR as a lexical-diversity proxy that correlates with morphological richness in the languages studied, (2) add explicit discussion of the regression controls (corpus size already included as a covariate) and the limitations of this proxy, and (3) tone down the mechanistic interpretation to note that MATTR captures a secondary signal whose precise causal contribution requires further validation with dedicated morphological metrics. These changes will be made in the abstract, results, and discussion sections.

    revision: yes

Circularity Check

0 steps flagged

No circularity: direct empirical measurements on held-out data

full rationale

The paper reports performance comparisons of four parsers across ten languages using standard train/dev/test splits and regression to identify predictors. All reported advantages, crossovers, and secondary correlations (including MATTR) are measured outcomes from observed data rather than quantities defined by the analysis itself or reduced to fitted inputs by construction. No derivations, uniqueness theorems, ansatzes, or self-citations are invoked as load-bearing steps in any claimed chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim is supported entirely by empirical measurements of parser accuracy across languages and data regimes; no free parameters, axioms, or invented entities are introduced to derive the result.

pith-pipeline@v0.9.0 · 5457 in / 1227 out tokens · 50717 ms · 2026-05-08T19:33:04.849919+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
