Waxal: A large-scale multilingual African language speech corpus
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- Open but Incompatible: A License Compatibility Analysis of Corpora for Low-Resource African Languages
  An audit of over twenty African NLP corpus families documents license incompatibilities, hidden restrictions, and data persistence failures via a six-tier matrix applied to three languages.
- WAXAL-NET: Finetuned Edge ASR Across 19 African Languages
  Fine-tuned edge ASR models reduce WER by 26.9 points over zero-shot baselines on 19 African languages while being substantially smaller; the authors also release supporting artifacts.
- AfriVox-v2: A Domain-Verticalized Benchmark for In-the-Wild African Speech Recognition
  AfriVox-v2 is a benchmark that evaluates modern speech models on in-the-wild African audio with domain-specific tests for sectors including government, finance, health, and agriculture.