AfroBench: How Good are Large Language Models on African Languages?
3 Pith papers cite this work.
fields: cs.CL (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- DEPART: DEcomposing PARiTy across Multilingual LLMs
  A Bayesian framework decomposes mLLM variance, showing language features explain 79-92% of language identity variance and that model identity vs. benchmark-model interactions dominate differently for understanding versus reasoning tasks.
- Challenges and Recommendations for LLMs-as-a-Judge in Multilingual Settings and Low-Resource Languages
  Meta-analysis of 33 ACL papers shows inconsistent LLM-as-a-Judge results, overtrust, and single-model reliance in multilingual/low-resource settings, with recommendations for better practice.
- Sample-Size Scaling of the African Languages NLI Evaluation
  Scaling NLI performance with sample size in African languages is language-dependent and frequently non-monotonic, with saturation or declines observed in some cases.