Fluent but Foreign: Even Regional LLMs Lack Cultural Alignment
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Frontier LLMs reach ~97% aggregate reliability on Nepal's K-10 curriculum but show major shortfalls in pedagogical clarity and cultural contextualization, indicating they are not ready for autonomous tutoring.
citing papers explorer
- Benchmarking Frontier LLMs on Arabic Cultural and Sociolinguistic Knowledge: A Cross-Evaluation Framework with Human SME Ground Truth
A cross-evaluation setup with native-speaker SME rubrics benchmarks LLMs on underrepresented Arabic dialects and quantifies automated judge bias and cultural-reasoning gaps.