AfroBench: How Good are Large Language Models on African Languages?

Jessica Ojo, Odunayo Ogundepo, Akintunde Oladipo, Kelechi Ogueji, Jimmy Lin, Pontus Stenetorp, David Ifeoluwa Adelani · 2025 · DOI 10.18653/v1/2025.findings-acl.976

3 Pith papers cite this work.


Representative citing papers

DEPART: DEcomposing PARiTy across Multilingual LLMs

cs.CL · 2026-05-27 · unverdicted · novelty 6.0 · ref 30

A Bayesian framework decomposes mLLM variance, showing language features explain 79-92% of language identity variance and that model identity vs. benchmark-model interactions dominate differently for understanding versus reasoning tasks.

Challenges and Recommendations for LLMs-as-a-Judge in Multilingual Settings and Low-Resource Languages

cs.CL · 2026-07-02 · unverdicted · novelty 5.0 · ref 181

Meta-analysis of 33 ACL papers shows inconsistent LLM-as-a-Judge results, overtrust, and single-model reliance in multilingual/low-resource settings, with recommendations for better practice.

Sample-Size Scaling of the African Languages NLI Evaluation

cs.CL · 2026-06-02 · unverdicted · novelty 5.0 · ref 12

Scaling NLI performance with sample size in African languages is language-dependent and frequently non-monotonic, with saturation or declines observed in some cases.

