Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning

Lau, Mingfei, Chen, Qian, Fang, Yeming, Xu, Tingting, Chen, Tongzhou, Golik, Pavel , editor = · 2025 · DOI 10.18653/v1/2025.acl-long.370

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open at publisher browse 2 citing papers

representative citing papers

OpenBibleTTS: Large-Scale Speech Resources and TTS Models for Low-Resource Languages

cs.CL · 2026-06-08 · accept · novelty 7.0

OpenBibleTTS supplies speech data and alignments for 37 underrepresented languages and shows that no single TTS system leads on all metrics, with Gemini-TTS highest in listener ratings but monolingual EveryVoice models strongest on intelligibility for several African languages.

Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents

cs.CL · 2026-05-11 · unverdicted · novelty 4.0

Audio language models are benchmarked on five semantic and paralinguistic reasoning tasks to reveal limitations in handling spoken audio evidence, accent variation, and domain shifts.

citing papers explorer

Showing 2 of 2 citing papers.

OpenBibleTTS: Large-Scale Speech Resources and TTS Models for Low-Resource Languages cs.CL · 2026-06-08 · accept · none · ref 13
OpenBibleTTS supplies speech data and alignments for 37 underrepresented languages and shows that no single TTS system leads on all metrics, with Gemini-TTS highest in listener ratings but monolingual EveryVoice models strongest on intelligibility for several African languages.
Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents cs.CL · 2026-05-11 · unverdicted · none · ref 163
Audio language models are benchmarked on five semantic and paralinguistic reasoning tasks to reveal limitations in handling spoken audio evidence, accent variation, and domain shifts.

Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning

fields

years

verdicts

representative citing papers

citing papers explorer