pith. machine review for the scientific record. sign in

arxiv: 2604.18932 · v1 · submitted 2026-04-21 · 💻 cs.SD · cs.AI

Recognition: unknown

Tadabur: A Large-Scale Quran Audio Dataset

Authors on Pith no claims yet

Pith reviewed 2026-05-10 02:12 UTC · model grok-4.3

classification 💻 cs.SD cs.AI
keywords Quran audio datasetQuranic recitationspeech datasetaudio diversitylarge-scale audioQuran speech researchrecitation styles
0
0 comments X

The pith

Tadabur introduces a Quran audio dataset with over 1400 hours from more than 600 reciters featuring diverse styles and conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing Quran audio datasets are limited in both scale and diversity, which restricts progress in Quranic speech research. The paper presents Tadabur to close this gap by compiling more than 1400 hours of recitation audio drawn from over 600 distinct reciters. This collection captures real differences in recitation styles, vocal characteristics, and recording conditions. A sympathetic reader would care because the added volume and variety can support training of more capable speech models and the creation of shared evaluation standards for the domain.

Core claim

We present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1400 hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representative resource for Quranic speech research and analysis. By significantly expanding both the total duration and variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.

What carries the argument

The Tadabur dataset, a large collection of Quran recitation audio assembled to maximize total duration and variation across many reciters and recording setups.

If this is right

  • Researchers can train speech processing models on a much larger and more varied set of Quranic recitations than was previously possible.
  • Standardized benchmarks for Quranic speech analysis and recognition can be developed using this shared resource.
  • Studies of how recitation style, voice, and recording conditions affect automated analysis become feasible at scale.
  • Applications that process real-world Quranic audio gain access to training data that better matches deployment conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use of Tadabur could improve the performance of digital tools for Quran memorization and pronunciation practice.
  • The same collection strategy might be applied to build comparable large audio resources for other religious or liturgical texts.
  • Comparisons between Tadabur and general Arabic speech datasets could highlight distinctive acoustic properties of Quranic recitation.

Load-bearing premise

The audio has been accurately collected, properly attributed to distinct reciters, and represents genuine diversity in styles and conditions without significant collection or labeling errors.

What would settle it

An independent audit that finds the total unique audio duration substantially below 1400 hours or that the number of truly distinct reciters falls well short of 600 due to duplication or misattribution.

Figures

Figures reproduced from arXiv: 2604.18932 by Faisal Alherran.

Figure 1
Figure 1. Figure 1: Overview of the verse-level alignment pipeline, illustrating the processing flow from audio input through [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Detailed architecture of the Ayah Alignment Module, illustrating the embedding-based similarity matching [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Recitation stop segmentation and boundary correction module for precise ayah-level audio extraction through [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: LLM-based metadata classification pipeline for identifying valid surah recitations. [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Deduplication pipeline using EAT embeddings, cosine similarity, and connected component extraction. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Number of reciters across major publicly available Quran datasets. [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
read the original abstract

Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1400+ hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representative resource for Quranic speech research and analysis. By significantly expanding both the total duration and variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces Tadabur, a large-scale Quran audio dataset comprising more than 1400 hours of recitation audio from over 600 distinct reciters. It highlights the dataset's diversity in recitation styles, vocal characteristics, and recording conditions as a means to address limitations in existing Quran datasets and to support future research and standardized benchmarks in Quranic speech analysis.

Significance. If the scale, diversity, and accessibility claims are substantiated through detailed methodology, verification, and public release, Tadabur could provide a meaningfully larger and more varied resource than prior Quran audio collections, enabling improved model training and evaluation for tasks such as recitation recognition and analysis. The absence of supporting details currently prevents assessment of this potential impact.

major comments (2)
  1. [Abstract] Abstract: The central claims of >1400 hours from >600 reciters with 'substantial variation' in styles and conditions are asserted without any description of collection sources, reciter attribution process, duration measurement, or verification steps. This is load-bearing for the paper's contribution as a dataset release.
  2. [Abstract] Abstract and manuscript description: No quantitative breakdown (e.g., hours per reciter, condition statistics) or safeguards against duplication/misattribution are provided, leaving the diversity and scale assertions unverified and preventing independent confirmation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing the need for methodological transparency to substantiate the dataset's scale and diversity claims. We agree that the current manuscript lacks sufficient detail in these areas and will undertake a major revision to address all points raised.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claims of >1400 hours from >600 reciters with 'substantial variation' in styles and conditions are asserted without any description of collection sources, reciter attribution process, duration measurement, or verification steps. This is load-bearing for the paper's contribution as a dataset release.

    Authors: We agree that the abstract and manuscript currently assert these claims without supporting methodological descriptions. In the revised manuscript, we will add a dedicated 'Data Collection and Verification' section that explicitly details the sources of the audio (publicly available recitation repositories and licensed recordings), the reciter attribution process (including metadata cross-verification and expert consultation), duration measurement procedures (using automated segmentation followed by manual validation), and verification steps (such as duplicate detection and attribution audits). This will directly support the load-bearing claims. revision: yes

  2. Referee: [Abstract] Abstract and manuscript description: No quantitative breakdown (e.g., hours per reciter, condition statistics) or safeguards against duplication/misattribution are provided, leaving the diversity and scale assertions unverified and preventing independent confirmation.

    Authors: We concur that the absence of quantitative breakdowns and explicit safeguards limits verifiability. The revised manuscript will include new tables and figures providing breakdowns such as hours per reciter (with mean, median, and range), statistics on recitation styles and recording conditions (e.g., studio vs. live, noise levels), and descriptions of safeguards including audio fingerprinting for deduplication, source provenance tracking, and multi-stage attribution verification to prevent misattribution. These additions will enable independent confirmation. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset release paper with no derivations or predictions

full rationale

This manuscript is a dataset release paper. It contains no equations, no derivations, no fitted parameters, no predictions, and no load-bearing claims that reduce to self-referential logic or self-citation chains. The headline numbers (1400+ hours, 600+ reciters) are presented as direct statements of collection results rather than outputs of any derivation process. No steps exist that could be flagged under the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the collected audio meets the stated criteria for authenticity, attribution, and diversity. No free parameters or invented entities are introduced.

axioms (1)
  • domain assumption The audio recordings are authentic recitations of the Quran from distinct reciters with the claimed variations in style and conditions.
    The abstract relies on this premise to establish the dataset's value but provides no verification details.

pith-pipeline@v0.9.0 · 5389 in / 1140 out tokens · 54441 ms · 2026-05-10T02:12:18.690903+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

17 extracted references · 4 canonical work pages · 2 internal anchors

  1. [1]

    Quran recitations for audio classification,

    M. Alrajeh, “Quran recitations for audio classification,” https://www.kaggle.com/datasets/mohammedalrajeh/ quran-recitations-for-audio-classification, 2023, accessed: 2026-01-24

  2. [2]

    Quran speech to text dataset,

    “Quran speech to text dataset,” https://www.openslr.org/132/, n.d., openSLR SLR132

  3. [3]

    Quran-md: A fine-grained multimodal dataset of the quran,

    M. U. Salman, M. A. Qazi, and M. T. Alam, “Quran-md: A fine-grained multimodal dataset of the quran,” in5th Muslims in ML Workshop co-located with NeurIPS 2025

  4. [4]

    Gemini 2.5 flash,

    Google DeepMind, “Gemini 2.5 flash,” https://deepmind.google/models/gemini/flash/, 2025

  5. [5]

    Whisperx: Time-accurate speech transcription of long-form audio,

    M. Bain, J. Huh, T. Han, and A. Zisserman, “Whisperx: Time-accurate speech transcription of long-form audio,” INTERSPEECH, 2023

  6. [6]

    Silma embedding matryoshka 0.1,

    A. B. Soliman, K. Ouda, and S. AI, “Silma embedding matryoshka 0.1,” https://huggingface.co/silma-ai/ silma-embedding-matryoshka-v0.1, 2024

  7. [7]

    Automatic pronunciation error detection and correction of the holy quran’s learners using deep learning,

    H. M. Abdullah Abdelfattah, Mahmoud I.Khalil, “Automatic pronunciation error detection and correction of the holy quran’s learners using deep learning,”arXiv preprint arXiv:2509.00094, 2025

  8. [8]

    EAT: Self-supervised pre-training with efficient audio transformer,

    W. Chen, Y . Liang, Z. Ma, Z. Zheng, and X. Chen, “EAT: Self-supervised pre-training with efficient audio transformer,” inIJCAI, 2024

  9. [9]

    Dinov2: Learning robust visual features without supervision,

    M. Oquab, T. Darcet, T. Moutakanni, H. V . V o, M. Szafraniec, V . Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, R. Howes, P.-Y . Huang, H. Xu, V . Sharma, S.-W. Li, W. Galuba, M. Rabbat, M. Assran, N. Ballas, G. Synnaeve, I. Misra, H. Jegou, J. Mairal, P. Labatut, A. Joulin, and P. Bojanowski, “Dinov2: Learning robust visual features without ...

  10. [10]

    whisper-base-ar-quran: Whisper base fine-tuned on qur’anic arabic,

    Tarteel AI, “whisper-base-ar-quran: Whisper base fine-tuned on qur’anic arabic,” https://huggingface.co/tarteel-ai/ whisper-base-ar-quran, 2022

  11. [11]

    Robust Speech Recognition via Large-Scale Weak Supervision

    A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust speech recognition via large-scale weak supervision,”arXiv preprint arXiv:2212.04356, 2023

  12. [12]

    Fine-tuned XLSR-53 large model for speech recognition in Arabic,

    J. Grosman, “Fine-tuned XLSR-53 large model for speech recognition in Arabic,” https://huggingface.co/ jonatasgrosman/wav2vec2-large-xlsr-53-arabic, 2021

  13. [13]

    Scaling speech technology to 1,000+ languages,

    V . Pratap, A. Tjandra, B. Shiet al., “Scaling speech technology to 1,000+ languages,”arXiv preprint arXiv:2305.13516, 2023

  14. [14]

    Qwen3-asr technical report,

    Qwen Team, “Qwen3-asr technical report,” https://huggingface.co/Qwen/Qwen3-ASR-1.7B, 2026

  15. [15]

    Cohere transcribe: State-of-the-art speech recognition,

    Cohere Labs, “Cohere transcribe: State-of-the-art speech recognition,” https://huggingface.co/CohereLabs/ cohere-transcribe-03-2026, 2026. 2026-04-22

  16. [16]

    Voxtral Realtime

    A. H. Liu, A. Ehrenberg, A. Loet al., “V oxtral realtime,”arXiv preprint arXiv:2602.11298, 2026

  17. [17]

    Vibevoice-asr technical report,

    Microsoft Research, “Vibevoice-asr technical report,” https://huggingface.co/microsoft/VibeV oice-ASR, 2026