Tadabur: A Large-Scale Quran Audio Dataset
Pith reviewed 2026-05-10 02:12 UTC · model grok-4.3
The pith
Tadabur is a Quran audio dataset comprising over 1400 hours of recitation from more than 600 reciters, spanning diverse styles, voices, and recording conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1400 hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representative resource for Quranic speech research and analysis. By significantly expanding both the total duration and variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.
What carries the argument
The Tadabur dataset, a large collection of Quran recitation audio assembled to maximize total duration and variation across many reciters and recording setups.
If this is right
- Researchers can train speech processing models on a much larger and more varied set of Quranic recitations than was previously possible (see the ingestion sketch after this list).
- Standardized benchmarks for Quranic speech analysis and recognition can be developed using this shared resource.
- Studies of how recitation style, voice, and recording conditions affect automated analysis become feasible at scale.
- Applications that process real-world Quranic audio gain access to training data that better matches deployment conditions.
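To make the first implication concrete, here is a minimal ingestion sketch. The paper does not specify Tadabur's release format; the `audiofolder` layout, the `tadabur/` path, and the 16 kHz target rate below are assumptions for illustration, not the authors' published interface.

```python
# Hypothetical ingestion sketch: assumes an "audiofolder"-style release
# (audio files plus a metadata.csv), which Hugging Face `datasets` can
# read directly. Tadabur's real layout is not described in the abstract.
from datasets import Audio, load_dataset

ds = load_dataset("audiofolder", data_dir="tadabur/", split="train")

# Most ASR encoders (e.g. wav2vec2, Whisper) expect 16 kHz mono input.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

sample = ds[0]["audio"]
print(sample["array"].shape, sample["sampling_rate"])
```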
Where Pith is reading between the lines
- Widespread use of Tadabur could improve the performance of digital tools for Quran memorization and pronunciation practice.
- The same collection strategy might be applied to build comparable large audio resources for other religious or liturgical texts.
- Comparisons between Tadabur and general Arabic speech datasets could highlight distinctive acoustic properties of Quranic recitation.
Load-bearing premise
The audio has been accurately collected, properly attributed to distinct reciters, and represents genuine diversity in styles and conditions without significant collection or labeling errors.
What would settle it
An independent audit that finds the total unique audio duration substantially below 1400 hours or that the number of truly distinct reciters falls well short of 600 due to duplication or misattribution.
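In practice, the first half of such an audit can be scripted. The sketch below tallies total corpus hours and groups files with identical Chromaprint acoustic fingerprints; it is a minimal sketch assuming a local directory of audio files, not the paper's verification method. Identical fingerprints only catch re-encoded copies of the same recording, and confirming that 600+ reciters are genuinely distinct would additionally require metadata and speaker-level checks.

```python
# Audit sketch: total hours plus duplicate grouping via Chromaprint
# fingerprints (pip install pyacoustid; needs the chromaprint native
# library or the fpcalc tool on PATH).
import sys
from collections import defaultdict
from pathlib import Path

import acoustid

AUDIO_EXTS = {".mp3", ".wav", ".flac", ".ogg"}

def audit(corpus_root: str) -> None:
    total_s = 0.0
    groups = defaultdict(list)  # fingerprint -> [(path, duration), ...]
    for path in Path(corpus_root).rglob("*"):
        if path.suffix.lower() not in AUDIO_EXTS:
            continue
        duration, fp = acoustid.fingerprint_file(str(path))
        total_s += duration
        groups[fp].append((path, duration))

    dupes = {fp: files for fp, files in groups.items() if len(files) > 1}
    extra_s = sum(sum(d for _, d in files[1:]) for files in dupes.values())
    print(f"{sum(map(len, groups.values()))} files, {total_s / 3600:.1f} h total")
    print(f"{len(dupes)} duplicate groups; {extra_s / 3600:.1f} h beyond first copies")

if __name__ == "__main__":
    audit(sys.argv[1])
```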
Original abstract
Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1400 hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representative resource for Quranic speech research and analysis. By significantly expanding both the total duration and variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Tadabur, a large-scale Quran audio dataset comprising more than 1400 hours of recitation audio from over 600 distinct reciters. It highlights the dataset's diversity in recitation styles, vocal characteristics, and recording conditions as a means to address limitations in existing Quran datasets and to support future research and standardized benchmarks in Quranic speech analysis.
Significance. If the scale, diversity, and accessibility claims are substantiated through detailed methodology, verification, and public release, Tadabur could provide a meaningfully larger and more varied resource than prior Quran audio collections, enabling improved model training and evaluation for tasks such as recitation recognition and analysis. The absence of supporting details currently prevents assessment of this potential impact.
Major comments (2)
- [Abstract] The central claims of >1400 hours from >600 reciters with "substantial variation" in styles and conditions are asserted without any description of collection sources, the reciter attribution process, duration measurement, or verification steps. This is load-bearing for the paper's contribution as a dataset release.
- [Abstract, manuscript description] No quantitative breakdown (e.g., hours per reciter, condition statistics) or safeguards against duplication/misattribution are provided, leaving the diversity and scale assertions unverified and preventing independent confirmation.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback emphasizing the need for methodological transparency to substantiate the dataset's scale and diversity claims. We agree that the current manuscript lacks sufficient detail in these areas and will undertake a major revision to address all points raised.
Point-by-point responses
- Referee: [Abstract] The central claims of >1400 hours from >600 reciters with "substantial variation" in styles and conditions are asserted without any description of collection sources, the reciter attribution process, duration measurement, or verification steps. This is load-bearing for the paper's contribution as a dataset release.
  Authors: We agree that the abstract and manuscript currently assert these claims without supporting methodological descriptions. In the revised manuscript, we will add a dedicated "Data Collection and Verification" section detailing the sources of the audio (publicly available recitation repositories and licensed recordings), the reciter attribution process (including metadata cross-verification and expert consultation), duration measurement procedures (automated segmentation followed by manual validation), and verification steps (such as duplicate detection and attribution audits). This will directly support the load-bearing claims. Revision: yes.
- Referee: [Abstract, manuscript description] No quantitative breakdown (e.g., hours per reciter, condition statistics) or safeguards against duplication/misattribution are provided, leaving the diversity and scale assertions unverified and preventing independent confirmation.
  Authors: We concur that the absence of quantitative breakdowns and explicit safeguards limits verifiability. The revised manuscript will include tables and figures breaking down hours per reciter (mean, median, and range), statistics on recitation styles and recording conditions (e.g., studio vs. live, noise levels), and descriptions of safeguards including audio fingerprinting for deduplication, source provenance tracking, and multi-stage attribution verification. These additions will enable independent confirmation; a sketch of such a breakdown follows these responses. Revision: yes.
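As a concrete shape for the promised tables, the sketch below computes the hours-per-reciter summary statistics and a recording-condition breakdown from clip-level metadata. The file name and column names (`tadabur_metadata.csv`, `reciter_id`, `duration_s`, `condition`) are hypothetical placeholders, not the dataset's released schema.

```python
# Hypothetical per-reciter breakdown; the metadata filename and column
# names (reciter_id, duration_s, condition) are assumed for illustration.
import pandas as pd

meta = pd.read_csv("tadabur_metadata.csv")  # one row per audio clip

hours = meta.groupby("reciter_id")["duration_s"].sum() / 3600
print(f"{hours.size} reciters, {hours.sum():.0f} h total")
print(f"hours/reciter: mean {hours.mean():.2f}, median {hours.median():.2f}, "
      f"range [{hours.min():.2f}, {hours.max():.2f}]")

# Recording-condition breakdown (e.g. studio vs. live) as share of hours.
share = meta.groupby("condition")["duration_s"].sum() / meta["duration_s"].sum()
print(share.round(3).to_string())
```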
Circularity Check
No circularity: dataset release paper with no derivations or predictions
Full rationale
This manuscript is a dataset release paper. It contains no equations, no derivations, no fitted parameters, no predictions, and no load-bearing claims that reduce to self-referential logic or self-citation chains. The headline numbers (1400+ hours, 600+ reciters) are presented as direct statements of collection results rather than outputs of any derivation process. No steps exist that could be flagged under the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: The audio recordings are authentic recitations of the Quran from distinct reciters, with the claimed variation in styles and conditions.
Reference graph
Works this paper leans on
[1] M. Alrajeh, "Quran recitations for audio classification," https://www.kaggle.com/datasets/mohammedalrajeh/quran-recitations-for-audio-classification, 2023, accessed 2026-01-24.
[2] "Quran speech to text dataset," https://www.openslr.org/132/, n.d., OpenSLR SLR132.
[3] M. U. Salman, M. A. Qazi, and M. T. Alam, "Quran-MD: A fine-grained multimodal dataset of the Quran," in 5th Muslims in ML Workshop, co-located with NeurIPS 2025.
[4] Google DeepMind, "Gemini 2.5 Flash," https://deepmind.google/models/gemini/flash/, 2025.
[5] M. Bain, J. Huh, T. Han, and A. Zisserman, "WhisperX: Time-accurate speech transcription of long-form audio," in INTERSPEECH, 2023.
[6] A. B. Soliman, K. Ouda, and Silma AI, "Silma embedding matryoshka 0.1," https://huggingface.co/silma-ai/silma-embedding-matryoshka-v0.1, 2024.
[7] H. M. Abdullah Abdelfattah and M. I. Khalil, "Automatic pronunciation error detection and correction of the Holy Quran's learners using deep learning," arXiv preprint arXiv:2509.00094, 2025.
[8] W. Chen, Y. Liang, Z. Ma, Z. Zheng, and X. Chen, "EAT: Self-supervised pre-training with efficient audio transformer," in IJCAI, 2024.
[9] M. Oquab, T. Darcet, T. Moutakanni, et al., "DINOv2: Learning robust visual features without supervision," 2023.
[10] Tarteel AI, "whisper-base-ar-quran: Whisper base fine-tuned on Qur'anic Arabic," https://huggingface.co/tarteel-ai/whisper-base-ar-quran, 2022.
[11] A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, "Robust speech recognition via large-scale weak supervision," arXiv preprint arXiv:2212.04356, 2023.
[12] J. Grosman, "Fine-tuned XLSR-53 large model for speech recognition in Arabic," https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-arabic, 2021.
[13] V. Pratap, A. Tjandra, B. Shi, et al., "Scaling speech technology to 1,000+ languages," arXiv preprint arXiv:2305.13516, 2023.
[14] Qwen Team, "Qwen3-ASR technical report," https://huggingface.co/Qwen/Qwen3-ASR-1.7B, 2026.
[15] Cohere Labs, "Cohere Transcribe: State-of-the-art speech recognition," https://huggingface.co/CohereLabs/cohere-transcribe-03-2026, 2026, accessed 2026-04-22.
[16] A. H. Liu, A. Ehrenberg, A. Lo, et al., "Voxtral Realtime," arXiv preprint arXiv:2602.11298, 2026.
[17] Microsoft Research, "VibeVoice-ASR technical report," https://huggingface.co/microsoft/VibeVoice-ASR, 2026.