
arxiv: 2606.06975 · v1 · pith:HJQXDWIPnew · submitted 2026-06-05 · 💻 cs.SD · eess.AS

MyGardenBird: A Machine-Learning-Ready Bird Sound Dataset for Twelve Common Malaysian Birds

Pith reviewed 2026-06-27 21:14 UTC · model grok-4.3

classification 💻 cs.SD eess.AS
keywords bird sound dataset · bioacoustics · machine learning · Malaysian birds · Mel-spectrograms · convolutional neural networks · data curation

The pith

A new dataset of twelve common Malaysian bird species supports 92-96 percent test accuracy for convolutional classifiers trained on Mel-spectrograms of their vocalisations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper assembles and releases MyGardenBird as a balanced collection of 7200 three-second audio clips drawn from twelve common bird species in Peninsular Malaysia and the Indo-Malayan region. Recordings sourced from public archives undergo species filtering, manual spectrogram segmentation, and quality control to create the primary release, which supplies geospatial metadata, vocalisation categories, and signal-to-noise ratios. Partitions are defined at the level of original recordings to block data leakage between training and evaluation sets. Baseline experiments with convolutional neural networks operating on Mel-spectrograms reach 92 to 96 percent test accuracy, establishing that the species vocalisations remain separable under this curation. The release includes preprocessing code and a higher-rate supplementary version to aid further use.

Core claim

The authors establish that their MyGardenBird dataset of 7200 manually validated three-second clips from twelve Malaysian bird species exhibits strong interspecies separability, as shown by convolutional neural network classification accuracies of 92-96 percent on Mel-spectrograms when partitions are enforced at the source-recording level.

What carries the argument

The curation pipeline (species-level filtering of Xeno-canto recordings, manual spectrogram segmentation, quality control, and source-recording-level partitioning) produces the balanced, leak-free dataset that carries the separability result.
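Source-recording-level partitioning can be sketched as follows. This is a minimal illustration, not the authors' released script: the `recording_id` field, the 80/20 ratio, and the function name `split_by_recording` are assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_recording(clips, test_fraction=0.2, seed=0):
    """Split clips so that no source recording contributes to both sets.

    `clips` is a list of dicts carrying a hypothetical `recording_id` key.
    """
    by_recording = defaultdict(list)
    for clip in clips:
        by_recording[clip["recording_id"]].append(clip)

    # Shuffle recordings (not clips) so each recording lands in one set only.
    recording_ids = sorted(by_recording)
    random.Random(seed).shuffle(recording_ids)

    n_test = max(1, round(len(recording_ids) * test_fraction))
    test = [c for rid in recording_ids[:n_test] for c in by_recording[rid]]
    train = [c for rid in recording_ids[n_test:] for c in by_recording[rid]]
    return train, test


# Toy data: 10 recordings with 3 clips each (illustrative only).
clips = [
    {"clip_id": f"r{r}_c{c}", "recording_id": f"r{r}"}
    for r in range(10)
    for c in range(3)
]
train, test = split_by_recording(clips)
print(len(train), len(test))
```

The sketch ignores class balance; a real split would probably apply the same grouping separately within each species so that every class appears in both partitions.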

If this is right

  • Models trained on the dataset can classify the twelve species in unseen recordings of comparable quality at close to the reported accuracy levels.
  • The provided SNR metadata enables direct study of how signal quality influences classification performance.
  • The accompanying code allows extension of the same pipeline to additional species or geographic areas.
  • The 44.1 kHz version supports experiments that require higher sampling rates than the primary 16 kHz release.
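One way to study the effect of signal quality is to bucket test clips by their SNR value and compute accuracy per bucket. The sketch below assumes each test result is available as a `(snr_db, is_correct)` pair; the bin edges and the toy values are illustrative and not taken from the paper.

```python
from bisect import bisect_right


def accuracy_by_snr(results, edges=(10.0, 20.0, 30.0)):
    """Return {bin_label: accuracy} for clips grouped by SNR in dB.

    `results` is an iterable of (snr_db, is_correct) pairs.
    `edges` are hypothetical bin boundaries; a clip whose SNR equals an
    edge falls into the higher bin.
    """
    labels = [f"<{edges[0]:g}"]
    labels += [f"{lo:g}-{hi:g}" for lo, hi in zip(edges, edges[1:])]
    labels.append(f">={edges[-1]:g}")

    counts = {label: [0, 0] for label in labels}  # [correct, total]
    for snr_db, is_correct in results:
        label = labels[bisect_right(edges, snr_db)]
        counts[label][0] += int(is_correct)
        counts[label][1] += 1

    # Empty bins are omitted rather than reported as zero accuracy.
    return {
        label: correct / total
        for label, (correct, total) in counts.items()
        if total > 0
    }


# Toy predictions, made up for illustration.
results = [
    (5.2, False), (8.0, True), (12.5, True),
    (18.0, True), (25.0, True), (41.0, True),
]
per_bin = accuracy_by_snr(results)
print(per_bin)
```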

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The curation approach could be applied to other tropical regions where labeled bird sound data remain scarce.
  • Field recordings collected through citizen science could be tested against the dataset for real-world deployment.
  • High separability opens the possibility of extending the work to continuous monitoring or multi-species detection tasks.

Load-bearing premise

Single-annotator manual segmentation and quality control produce labels consistent enough to support reliable high-accuracy machine learning.

What would settle it

Independent re-annotation of a subset of clips by multiple experts that reveals frequent species misassignments or inconsistent segment boundaries, with classification accuracy falling below 80 percent when measured against the corrected labels.

Figures

Figures reproduced from arXiv: 2606.06975 by Mohd Yamani Idna Idris, Muhammad Mun'im Ahmad Zabidi, Norisma Idris.

Figure 1: MyGardenBird curation pipeline, comprising six sequential steps implemented across nine Python scripts (Stage 1 …
Figure 2: (a) Xeno-canto quality grade composition per species. Grade A recordings were prioritised, with Grades B and C …
Figure 3: Interactive segmentation GUI (Stage4 annotate segments.py). The upper panel displays the log-mel spectrogram of the source FLAC recording with automatically proposed candidate segments (blob detection). The middle panel shows curator-approved segments as coloured overlays, each fixed at exactly three seconds and repositionable via drag-and-drop. The lower panel provides blob-detection tuning controls and a …
Figure 4: Representative mel-spectrograms for all twelve species in MyGardenBird. Each panel shows the clip whose SNR …
Figure 5: Entity–Relationship diagram for the MyGardenBird metadata files.
Figure 6: Geographic distribution of the Xeno-canto source recordings used in MyGardenBird (sources with available …
Figure 7: Per-species signal-to-noise ratio distribution. SNR was estimated using a percentile-based noise-floor method …
Figure 8: Per-class performance of MobileNetV3-Small on the 16 kHz test set (Mixup …
Figure 9: Mean test accuracy versus computational cost (MFLOPs, log scale) for the three CNN architectures trained on …
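The Figure 7 caption refers to a percentile-based noise-floor method, but its details are cut off in this excerpt. One common variant, sketched here as an assumption rather than as the authors' procedure, takes a low percentile of per-frame power as the noise floor and a high percentile as the signal level. The frame length, the 10th/90th percentiles, and the function names are all hypothetical.

```python
import math


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("no values")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def estimate_snr_db(samples, frame_len=400, noise_pct=10, signal_pct=90):
    """Estimate SNR in dB from per-frame mean power.

    Assumed method: noise floor = low percentile of frame power,
    signal level = high percentile of frame power.
    """
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    powers = sorted(sum(x * x for x in f) / frame_len for f in frames)
    noise = percentile(powers, noise_pct)
    signal = percentile(powers, signal_pct)
    eps = 1e-12  # guards against log of zero for digitally silent frames
    return 10.0 * math.log10((signal + eps) / (noise + eps))


# Toy clip: a quiet half (amplitude 0.01) followed by a loud half (1.0).
samples = [0.01] * 4000 + [1.0] * 4000
snr = estimate_snr_db(samples)
print(round(snr, 2))
```

On this toy clip the power ratio between the loud and quiet halves is 10^4, so the estimate comes out at roughly 40 dB.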
Original abstract

Bioacoustic datasets from tropical regions remain limited, in part due to the absence of reproducible workflows for aggregating recordings from public archives. We present MyGardenBird, a curated dataset of bird vocalisations representing twelve common species across Peninsular Malaysia and the Indo-Malayan region. Recordings were sourced from Xeno-canto and processed through species-level filtering, manual spectrogram segmentation, and quality control checks. The primary release comprises 7,200 manually validated audio clips (16 kHz, 16-bit PCM mono WAV), balanced at 600 three-second clips per species (6.0 hours total) derived from 1,381 distinct recordings. Metadata includes geospatial coordinates, vocalisation categories, and signal-to-noise ratio (SNR) values (range: 0.83-59.18 dB; mean: 15.80 dB). A supplementary 44.1 kHz version is also provided. To mitigate data leakage, dataset partitions are defined at the source-recording level. Baseline classification experiments using convolutional neural networks on Mel-spectrograms achieved test accuracies of 92-96%, indicating strong interspecies separability. Limitations include reliance on single-annotator curation; however, validation with BirdNET confirmed label consistency. MyGardenBird is openly available at https://doi.org/10.5281/zenodo.20306877 under a CC BY-NC-SA 4.0 licence. Complete preprocessing code accompanies the release to support reproducibility and future expansion.
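The headline figures in the abstract are mutually consistent, as a quick arithmetic check shows. All inputs below come from the abstract; the mean number of clips per source recording is not stated there and is derived here only for orientation.

```python
SPECIES = 12
CLIPS_PER_SPECIES = 600
CLIP_SECONDS = 3
SOURCE_RECORDINGS = 1381

total_clips = SPECIES * CLIPS_PER_SPECIES              # 7200 clips
total_hours = total_clips * CLIP_SECONDS / 3600        # 6.0 hours
clips_per_recording = total_clips / SOURCE_RECORDINGS  # about 5.2 on average

print(total_clips, total_hours, round(clips_per_recording, 2))
```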

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents MyGardenBird, a curated dataset of 7,200 three-second audio clips (600 per species) from 12 common Malaysian bird species sourced from Xeno-canto. It describes species-level filtering, single-annotator manual spectrogram segmentation, quality control, recording-level train/test splits to prevent leakage, and metadata including SNR values. Baseline CNN classification on Mel-spectrograms yields 92-96% test accuracy, presented as evidence of strong interspecies separability. The dataset (16 kHz and 44.1 kHz versions), metadata, and preprocessing code are released openly under CC BY-NC-SA 4.0.

Significance. If label quality holds, the work supplies a balanced, reproducible tropical bioacoustic dataset where such resources are scarce, with explicit recording-level partitioning and full code release as clear strengths for ML reproducibility. The reported baseline accuracies indicate practical utility for classification tasks, and the open Zenodo DOI supports immediate use and extension.

major comments (1)
  1. [Abstract] The assertion that 'validation with BirdNET confirmed label consistency' provides no quantitative metric (agreement rate, confusion matrix, or error analysis on any subset). This is load-bearing for the separability claim, because the 92-96% CNN accuracies on the recording-level split could be inflated by systematic single-annotator mislabels rather than true acoustic distinctiveness; a concrete agreement statistic is required to substantiate the interpretation.
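The kind of statistic requested here could be as simple as a raw agreement rate plus Cohen's kappa between the curator's labels and a second labeller's top-1 predictions, together with a count of the disagreeing label pairs. The sketch below is illustrative only: the label lists are invented placeholders, the function names are hypothetical, and nothing in it calls BirdNET itself.

```python
from collections import Counter


def agreement_stats(curator, model):
    """Return (agreement_rate, cohens_kappa) for two equal-length label lists."""
    if len(curator) != len(model) or not curator:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(curator)
    observed = sum(a == b for a, b in zip(curator, model)) / n

    # Chance agreement computed from the two marginal label distributions.
    curator_freq = Counter(curator)
    model_freq = Counter(model)
    expected = sum(
        curator_freq[label] * model_freq[label] for label in curator_freq
    ) / (n * n)

    if expected == 1.0:
        return observed, 1.0  # both labellers used one identical label
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


def confusion_pairs(curator, model):
    """Count (curator_label, model_label) pairs where the two disagree."""
    return Counter((a, b) for a, b in zip(curator, model) if a != b)


# Placeholder labels for ten clips (not real data).
curator = ["sp1"] * 4 + ["sp2"] * 4 + ["sp3"] * 2
model = ["sp1", "sp1", "sp1", "sp2", "sp2", "sp2", "sp2", "sp2", "sp3", "sp1"]

rate, kappa = agreement_stats(curator, model)
disagreements = confusion_pairs(curator, model)
print(rate, round(kappa, 3), dict(disagreements))
```

Kappa is included alongside the raw rate because, with twelve balanced classes, chance agreement is low but not zero, and the disagreement pairs show whether errors concentrate in particular species.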

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the manuscript. We address the single major comment below and will revise the paper accordingly to improve clarity and substantiation of claims.

Point-by-point responses
  1. Referee: [Abstract] The assertion that 'validation with BirdNET confirmed label consistency' provides no quantitative metric (agreement rate, confusion matrix, or error analysis on any subset). This is load-bearing for the separability claim, because the 92-96% CNN accuracies on the recording-level split could be inflated by systematic single-annotator mislabels rather than true acoustic distinctiveness; a concrete agreement statistic is required to substantiate the interpretation.

    Authors: We agree that the current wording in the abstract lacks the requested quantitative support. The BirdNET check was performed as a secondary consistency verification on the curated clips rather than a formal inter-annotator study, and no agreement rate, confusion matrix, or subset analysis is reported. In the revised manuscript we will either (a) remove the BirdNET sentence from the abstract and limitations section or (b) add the concrete statistics that were computed during curation (whichever is supported by our internal records) so that readers can properly evaluate label quality independent of the CNN results.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: dataset curation with empirical baselines only

Full rationale

The manuscript is a data release paper describing sourcing from Xeno-canto, manual segmentation, and release of 7200 clips. Baseline CNN accuracies (92-96%) are reported as empirical results on the released data, not as derivations or predictions from fitted parameters. No equations, self-definitional steps, fitted-input predictions, or load-bearing self-citations appear. External archives and BirdNET provide independent grounding; label curation assumptions affect correctness but do not create circularity by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper contributes a new curated dataset rather than deriving results from parameters or new entities; the primary assumption concerns the representativeness and label quality of archived recordings.

axioms (1)
  • Domain assumption: recordings from Xeno-canto are representative of the vocalisations of the twelve species.
    The curation begins from public archive data without independent field validation or multi-annotator consensus.

pith-pipeline@v0.9.1-grok · 5815 in / 1283 out tokens · 23021 ms · 2026-06-27T21:14:36.865725+00:00 · methodology

