A strongly annotated passive acoustic dataset for tropical bird monitoring
Pith reviewed 2026-05-22 08:57 UTC · model grok-4.3
The pith
PteroSet supplies 15,372 time-frequency annotations of 168 Neotropical bird species across 73 hours of recordings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present PteroSet, a curated dataset of strongly annotated Neotropical bird vocalizations recorded in Puerto Asis (Putumayo) and Pivijay (Magdalena), Colombia. The dataset comprises 563 recordings totaling 73.62 hours and 15,372 time-frequency annotations, including 6,702 species-level events across 168 species. Annotations follow a COCO-inspired JSON schema that unifies audio files, taxonomic categories, and labels. PteroSet serves as a benchmark highlighting acoustic co-occurrence and domain shift, and includes a deep learning baseline for binary bird detection.
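Domain shift across the two sites is usually probed by training on one site and evaluating on the other. Below is a minimal Python sketch of a leave-one-site-out split. It assumes each recording carries a site label. The function name `cross_site_splits` and the `(recording_id, site)` input format are hypothetical and not part of the released schema.

```python
from typing import Iterator, List, Tuple


def cross_site_splits(
    recordings: List[Tuple[str, str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_site, train_ids, test_ids) for leave-one-site-out evaluation.

    `recordings` is a list of (recording_id, site) pairs (hypothetical format).
    """
    sites = sorted({site for _, site in recordings})
    for held_out in sites:
        train = [rid for rid, site in recordings if site != held_out]
        test = [rid for rid, site in recordings if site == held_out]
        yield held_out, train, test


# Illustrative ids only; real recordings would come from the dataset's JSON file.
recordings = [("rec_001", "Puerto Asis"), ("rec_002", "Pivijay"), ("rec_003", "Puerto Asis")]
for site, train_ids, test_ids in cross_site_splits(recordings):
    print(f"test on {site}: train={train_ids} test={test_ids}")
```

With only two sites, this produces exactly two train/test folds, one per held-out site.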
What carries the argument
PteroSet dataset with its COCO-inspired JSON schema for audio files, taxonomic categories, and machine learning labels.
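COCO organizes a dataset as three linked arrays (`images`, `categories`, `annotations`) joined by integer ids. The sketch below shows how an audio analogue of that layout could be read in Python. Every field name here (`audios`, `audio_id`, `t_start`, `f_low`, and so on) is a hypothetical stand-in, because the actual PteroSet keys are not given in this summary. The species name is a placeholder.

```python
import json
from collections import defaultdict

# Hypothetical COCO-style audio document. "audios" plays the role of COCO's
# "images", and each annotation carries a time-frequency box. Field names are
# assumptions, not the released PteroSet schema.
EXAMPLE = """
{
  "audios": [
    {"id": 1, "file_name": "site_A_0001.wav", "duration": 600.0, "site": "Puerto Asis"}
  ],
  "categories": [
    {"id": 10, "name": "Species X", "rank": "species"},
    {"id": 99, "name": "Bird (unidentified)", "rank": "class"}
  ],
  "annotations": [
    {"id": 100, "audio_id": 1, "category_id": 10,
     "t_start": 12.5, "t_end": 14.0, "f_low": 2000.0, "f_high": 5500.0},
    {"id": 101, "audio_id": 1, "category_id": 99,
     "t_start": 13.0, "t_end": 13.8, "f_low": 3000.0, "f_high": 7000.0}
  ]
}
"""


def index_dataset(doc):
    """Group annotations by audio file and attach category names (COCO-style id joins)."""
    cat_names = {c["id"]: c["name"] for c in doc["categories"]}
    by_audio = defaultdict(list)
    for ann in doc["annotations"]:
        by_audio[ann["audio_id"]].append({**ann, "label": cat_names[ann["category_id"]]})
    # Files with no annotations map to an empty list.
    return {a["file_name"]: by_audio[a["id"]] for a in doc["audios"]}


doc = json.loads(EXAMPLE)
index = index_dataset(doc)
for ann in index["site_A_0001.wav"]:
    print(ann["label"], ann["t_start"], ann["t_end"])
```

Keeping taxonomy in a separate `categories` array lets coarse labels (such as an unidentified-bird class) coexist with species-level ones without duplicating names in every annotation.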
If this is right
- Supervised models can train on exact time labels for bird detection.
- The dataset exposes real challenges from overlapping calls and site differences.
- Provides a public benchmark for tropical acoustic monitoring algorithms.
- Supports non-invasive biodiversity tracking in Neotropical regions.
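Training on exact time labels for binary bird detection typically means turning each annotated (start, end) interval into per-frame presence targets. The sketch below is a minimal version of that conversion. The 0.1 s hop and the "frame center inside an event" rule are assumptions, and it is not the paper's baseline pipeline.

```python
import math


def frame_targets(events, duration_s, hop_s=0.1):
    """Convert strong (t_start, t_end) labels into per-frame 0/1 bird-presence targets.

    A frame is positive when its center time falls inside any annotated event
    (half-open interval [t_start, t_end)). Overlapping events simply merge.
    """
    # Round before ceil so float noise (0.3 / 0.1 == 2.9999999999999996)
    # does not add or drop a frame.
    n_frames = math.ceil(round(duration_s / hop_s, 9))
    targets = []
    for i in range(n_frames):
        center = (i + 0.5) * hop_s
        targets.append(int(any(t0 <= center < t1 for t0, t1 in events)))
    return targets


print(frame_targets([(0.2, 0.5), (0.4, 0.9)], duration_s=1.0))
# [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
```

Frequency bounds are ignored here because a binary detector only needs to know whether any bird is present at a given time.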
Where Pith is reading between the lines
- Similar datasets for other taxa could extend monitoring to insects or amphibians.
- Transfer learning from this data may reduce labeling needs at new sites.
- Strong annotations appear necessary when vocalizations overlap frequently.
Load-bearing premise
Expert annotators can correctly name species and draw exact time-frequency boundaries in dense overlapping tropical soundscapes.
What would settle it
Other experts re-labeling part of the data and showing major mismatches in species names or boundary times.
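A re-labeling test needs a rule for deciding when two annotators' boxes describe the same event. One common choice, borrowed from object detection and assumed here rather than taken from the paper, is time-frequency intersection-over-union (IoU) with a 0.5 threshold and greedy one-to-one matching. The sketch below reports how many reference events were recovered and how often matched pairs share a species label. `box_iou` and `agreement` are hypothetical helper names.

```python
def box_iou(a, b):
    """IoU of two time-frequency boxes given as (t_start, t_end, f_low, f_high)."""
    overlap_t = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    overlap_f = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))
    inter = overlap_t * overlap_f

    def area(box):
        return (box[1] - box[0]) * (box[3] - box[2])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def agreement(ref, other, iou_thr=0.5):
    """Greedily match (label, box) events one-to-one by descending IoU.

    Returns (recall, label_agreement): the share of `ref` events matched at
    IoU >= iou_thr, and the share of matched pairs that carry the same label.
    """
    candidates = sorted(
        ((box_iou(r[1], o[1]), i, j)
         for i, r in enumerate(ref) for j, o in enumerate(other)),
        reverse=True,
    )
    used_ref, used_other, matches = set(), set(), []
    for iou, i, j in candidates:
        if iou < iou_thr:
            break  # candidates are sorted, so every remaining pair is below threshold
        if i in used_ref or j in used_other:
            continue
        used_ref.add(i)
        used_other.add(j)
        matches.append((i, j))
    same = sum(ref[i][0] == other[j][0] for i, j in matches)
    recall = len(matches) / len(ref) if ref else 0.0
    label_agreement = same / len(matches) if matches else 0.0
    return recall, label_agreement
```

Greedy matching is simpler than an optimal (Hungarian) assignment. It can differ from the optimal assignment only when several boxes compete for the same partner.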
Original abstract
Passive acoustic monitoring enables continuous, non-invasive biodiversity assessment across diverse ecosystems. The scale of these datasets has driven the adoption of machine learning, with supervised approaches showing strong performance. However, supervised methods require time-resolved annotated datasets, which remain scarce, especially in complex tropical soundscapes. We present PteroSet, a curated dataset of strongly annotated Neotropical bird vocalizations recorded in Puerto Asis (Putumayo) and Pivijay (Magdalena), Colombia, between 2023 and 2025. The dataset comprises 563 recordings (73.62 h) and 15,372 time-frequency annotations, including 6,702 events identified to the species level across 168 species. We release the annotations in a COCO-inspired JSON schema that unifies audio files, taxonomic categories, and labels for machine learning workflows. Beyond providing annotated data, PteroSet serves as a realistic benchmark that highlights key characteristics of tropical soundscapes, including acoustic co-occurrence and domain shift across recording sites. We provide a deep learning baseline for binary bird detection, demonstrating PteroSet's usability and the challenges it presents.
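Acoustic co-occurrence can be quantified in several ways. A simple one, assumed here and not defined in the paper, is the fraction of annotations that overlap in time with at least one other annotation in the same recording. The sketch treats events as half-open intervals, so boxes that merely touch do not count, and it ignores frequency overlap. `cooccurrence_rate` is a hypothetical name.

```python
from collections import defaultdict


def cooccurrence_rate(annotations):
    """Share of (recording, t_start, t_end) annotations that overlap another one in time.

    Overlap is tested only within the same recording; frequency bounds are ignored.
    """
    by_rec = defaultdict(list)
    for rec, t0, t1 in annotations:
        by_rec[rec].append((t0, t1))
    overlapping = 0
    for events in by_rec.values():
        for i, (a0, a1) in enumerate(events):
            # Half-open intervals [a0, a1) and [b0, b1) overlap iff a0 < b1 and b0 < a1.
            if any(j != i and a0 < b1 and b0 < a1 for j, (b0, b1) in enumerate(events)):
                overlapping += 1
    return overlapping / len(annotations) if annotations else 0.0
```

The pairwise check is quadratic per recording, which is acceptable at the scale of tens of annotations per file. A sort-and-sweep would be the natural replacement for much denser files.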
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents PteroSet, a curated dataset of strongly annotated Neotropical bird vocalizations recorded at two sites in Colombia (Puerto Asis and Pivijay) from 2023 to 2025. It comprises 563 recordings (73.62 h total) containing 15,372 time-frequency annotations, of which 6,702 events are identified to species level across 168 species. Annotations are released in a COCO-inspired JSON schema, and the paper supplies a deep-learning baseline for binary bird detection while noting the dataset's value for studying acoustic co-occurrence and domain shift.
Significance. If the species-level labels and time-frequency boundaries can be shown to be reliable, PteroSet would constitute a useful addition to the limited set of strongly annotated tropical PAM datasets, providing a realistic benchmark that incorporates overlapping vocalizations and cross-site variation. The public JSON schema and baseline code further support immediate use in supervised ML workflows.
major comments (2)
- [Abstract] Abstract and Dataset section: the central claim that the 6,702 species-level events constitute reliable ground truth rests on the assertion of 'expert manual' annotations, yet no annotation protocol, number of annotators, reference materials, decision rules for ambiguous or overlapping calls, or inter-annotator agreement metric is supplied. Without these elements the headline statistics cannot be treated as verified labels in complex Neotropical soundscapes.
- [Dataset Description] Dataset section: the two-site design is presented as capturing domain shift, but no quantitative comparison of acoustic properties, species composition, or recording conditions between Puerto Asis and Pivijay is provided. This leaves the domain-shift claim unsupported by evidence.
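The first major comment asks for an inter-annotator agreement metric. One standard option for categorical species labels is Cohen's kappa, computed on events that both annotators marked. The sketch below is a generic implementation under that assumption. It is not a statistic reported for PteroSet, and pairing events across annotators would have to be done first.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' categorical labels on the same events."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of events given the same label.
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_exp = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_exp == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_obs - p_exp) / (1.0 - p_exp)
```

Kappa corrects raw agreement for chance. This matters when a few common species dominate the labels, because raw percent agreement would then look inflated.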
minor comments (1)
- [Abstract] Abstract: the total duration is given as 73.62 h; confirm that this figure is repeated with the same precision in the main text and tables.
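One way to confirm the reported total is to recompute it from per-recording durations and round to the abstract's two-decimal precision. The sketch below assumes durations are available in seconds (for example from a per-file duration field). `total_hours` and `matches_reported` are hypothetical helpers.

```python
def total_hours(durations_s, ndigits=2):
    """Sum per-recording durations (seconds) and express the total in hours at fixed precision."""
    return round(sum(durations_s) / 3600.0, ndigits)


def matches_reported(durations_s, reported_h=73.62):
    """True when the recomputed total agrees with the figure quoted in the abstract."""
    return total_hours(durations_s) == reported_h


# Illustrative durations only: 73 one-hour files plus one 37.2-minute file.
print(total_hours([3600.0] * 73 + [2232.0]))
```

At the reported totals, 73.62 h over 563 recordings corresponds to a mean of about 7.85 min per recording (73.62 × 60 / 563).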
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each major comment below and describe the revisions we will make to strengthen the paper.
Point-by-point responses
- Referee: [Abstract] Abstract and Dataset section: the central claim that the 6,702 species-level events constitute reliable ground truth rests on the assertion of 'expert manual' annotations, yet no annotation protocol, number of annotators, reference materials, decision rules for ambiguous or overlapping calls, or inter-annotator agreement metric is supplied. Without these elements the headline statistics cannot be treated as verified labels in complex Neotropical soundscapes.
Authors: We agree that a more detailed description of the annotation process is necessary to support the reliability of the species-level labels. In the revised manuscript we will add a new subsection to the Dataset Description that specifies the annotation protocol: annotations were performed sequentially by two expert ornithologists with extensive field experience in Colombian avifauna; reference materials included xeno-canto recordings, local field guides, and spectrogram examples from prior studies; decision rules for overlapping calls prioritized labeling the clearest vocalization while noting co-occurrence; ambiguous cases were resolved through joint review. Although a formal inter-annotator agreement metric was not computed, we will describe the consistency checks employed. These additions will allow readers to evaluate the ground-truth quality directly. revision: yes
- Referee: [Dataset Description] Dataset section: the two-site design is presented as capturing domain shift, but no quantitative comparison of acoustic properties, species composition, or recording conditions between Puerto Asis and Pivijay is provided. This leaves the domain-shift claim unsupported by evidence.
Authors: We accept that quantitative evidence is required to substantiate the domain-shift claim. In the revised manuscript we will insert a new table (and accompanying text) that compares the two sites on species composition (unique species counts and Jaccard overlap), recording conditions (vegetation type, elevation, time-of-day distribution), and basic acoustic properties (mean sound pressure level and dominant frequency range across recordings). This will provide concrete support for the claim that the sites introduce realistic domain variation suitable for benchmarking. revision: yes
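The proposed species-composition comparison can be made concrete with per-site unique species counts and the Jaccard index |A ∩ B| / |A ∪ B|. The sketch assumes `(site, species)` pairs with unidentified labels already filtered out. The species codes are placeholders, and the helper names are hypothetical.

```python
from collections import defaultdict


def site_species(annotations):
    """Map each site to the set of species annotated there, from (site, species) pairs."""
    sites = defaultdict(set)
    for site, species in annotations:
        sites[site].add(species)
    return dict(sites)


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| between two species sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Placeholder species codes, not PteroSet data.
anns = [("Puerto Asis", "sp1"), ("Puerto Asis", "sp2"), ("Puerto Asis", "sp3"),
        ("Pivijay", "sp2"), ("Pivijay", "sp3"), ("Pivijay", "sp4")]
by_site = site_species(anns)
print(len(by_site["Puerto Asis"]), len(by_site["Pivijay"]),
      jaccard(by_site["Puerto Asis"], by_site["Pivijay"]))
# 3 3 0.5
```

A low Jaccard value indicates that a detector trained at one site will meet many species it never saw during training, which is one concrete form of the domain shift claimed.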
Circularity Check
No circularity: data-release paper with no derivations or self-referential predictions
The manuscript is a dataset release describing the collection and annotation of 563 recordings with 15,372 time-frequency boxes across 168 species. No equations, fitted parameters, uniqueness theorems, or predictive claims appear that could reduce to author-defined inputs by construction. The central contribution is the curated collection itself, released in a COCO-inspired JSON schema, together with a simple baseline detector. Because there is no derivation chain to inspect, no load-bearing step reduces to a prior choice of the authors. The two-site design and expert-annotation claim are presented as factual descriptions of the data rather than results derived from internal definitions or self-citations.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: expert manual annotation provides reliable species identification and temporal localization in complex, overlapping soundscapes.
Reference graph
Works this paper leans on
[1] Sugai, L. S. M., Silva, T. S. F., Ribeiro, J. W., Jr & Llusia, D. Terrestrial Passive Acoustic Monitoring: Review and Perspectives. BioScience 69, 15–25 (2019)
[2] Sugai, L. S. M., Desjonquères, C., Silva, T. S. F. & Llusia, D. A roadmap for survey designs in terrestrial acoustic monitoring. Remote Sensing in Ecology and Conservation 6, 220–235 (2020)
[3] Gibb, R., Browning, E., Glover-Kapfer, P. & Jones, K. E. Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring. Methods in Ecology and Evolution (2018). doi:10.1111/2041-210X.13101
[4] Dufourq, E., Batist, C., Foquet, R. & Durbach, I. Passive acoustic monitoring of animal populations with transfer learning. Ecological Informatics 70, 101688 (2022)
[5] Kahl, S. et al. Overview of BirdCLEF 2023: Automated Bird Species Identification in Eastern Africa. in CLEF-WN 2023 - 14th Conference and Labs of the Evaluation Forum (Thessaloniki, Greece, 2023)
[6] Stowell, D. Computational bioacoustics with deep learning: a review and roadmap. PeerJ 10, e13152 (2022)
[7] Stowell, D., Wood, M. D., Pamuła, H., Stylianou, Y. & Glotin, H. Automatic acoustic detection of birds through deep learning: The first Bird Audio Detection challenge. Methods in Ecology and Evolution 10, 368–380 (2019)
[8] Koumura, T. & Okanoya, K. Automatic Recognition of Element Classes and Boundaries in the Birdsong with Variable Sequences. PLOS ONE 11, e0159188 (2016)
[9] Gómez-Gómez, J., Vidaña-Vila, E. & Sevillano, X. Western Mediterranean wetlands bird species classification: evaluating small-footprint deep learning approaches on a new annotated dataset. Preprint at https://doi.org/10.48550/arXiv.2207.05393 (2022)
[10] Martin, K., Adam, O., Obin, N. & Dufour, V. Rookognise: Acoustic detection and identification of individual rooks in field recordings using multi-task neural networks. Ecological Informatics 72, 101818 (2022)
[11] Kumar, S., Anshuman, B., Rüttimann, L., Hahnloser, R. H. R. & Arora, V. Balanced Deep CCA for Bird Vocalization Detection. in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1–5 (2023)
[12] Zhao, Z. et al. Automated bird acoustic event detection and robust species classification. Ecological Informatics 39, 99–108 (2017)
[13] Salamon, J. et al. Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring. PLOS ONE 11, e0166866 (2016)
[14] Lostanlen, V., Salamon, J., Farnsworth, A., Kelling, S. & Bello, J. P. Birdvox-Full-Night: A Dataset and Benchmark for Avian Flight Call Detection. in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 266–270 (2018)
[15] Morfi, V., Bas, Y., Pamuła, H., Glotin, H. & Stowell, D. NIPS4Bplus: a richly annotated birdsong audio dataset. PeerJ Comput. Sci. 5, e223 (2019)
[16] Cramer, A. L., Lostanlen, V., Farnsworth, A., Salamon, J. & Bello, J. P. Chirping up the Right Tree: Incorporating Biological Taxonomies into Deep Bioacoustic Classifiers. in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 901–905 (2020). doi:10.1109/ICASSP40776.2020.9052908
[17] Arriaga, J. G., Cody, M. L., Vallejo, E. E. & Taylor, C. E. Bird-DB: A database for annotated bird song sequences. Ecological Informatics 27, 21–25 (2015)
[18] Chronister, L., Rhinehart, T., Place, A. & Kitzes, J. An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information. Ecology 102, (2021)
[19] Hedley, R. W. Complexity, Predictability and Time Homogeneity of Syntax in the Songs of Cassin’s Vireo (Vireo cassinii). PLOS ONE 11, e0150822 (2016)
[20] Vidaña-Vila, E., Navarro, J. & Alsina-Pagès, R. M. Towards Automatic Bird Detection: An Annotated and Segmented Acoustic Dataset of Seven Picidae Species. Data 2, (2017)
[21] Merino Recalde, N. et al. A densely sampled and richly annotated acoustic data set from a wild bird population. Animal Behaviour 211, 111–122 (2024)
[22] Weldy, M. J. et al. Audio tagging of avian dawn chorus recordings in California, Oregon and Washington. Biodivers Data J 12, e118315 (2024)
[23] Jing, X. et al. DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition. Preprint at https://doi.org/10.48550/arXiv.2406.08517 (2024)
[24] Hagiwara, M. et al. BEANS: The Benchmark of Animal Sounds. in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1–5 (2023)
[25] Rauch, L. et al. BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics. in The Thirteenth International Conference on Learning Representations (2025)
[26] Cañas, J. et al. Overview of BirdCLEF+ 2025: Multi-Taxonomic Sound Identification in the Middle Magdalena, Colombia. in CLEF 2025-Working Notes of the Conference and Labs of the Evaluation Forum vol. 4038 2909–2919 (2025)
[27] Mesaros, A., Serizel, R., Heittola, T., Virtanen, T. & Plumbley, M. D. A decade of DCASE: Achievements, practices, evaluations and future challenges. in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1–5 (2025)
[28] Sharing bird sounds from around the world. https://xeno-canto.org/ (2022)
[29] Kahl, S., Wood, C. M., Eibl, M. & Klinck, H. BirdNET: A deep learning solution for avian diversity monitoring. Ecological Informatics 61, 101236 (2021)
[30] Hamer, J. et al. BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics. Preprint at https://doi.org/10.48550/arXiv.2312.07439 (2023)
[31] Waller, J. GBIF Regional Statistics - 2020. GBIF Data Blog https://data-blog.gbif.org/post/gbif-regional-statistics-2020/ (2020)
[32] Butler, R. The top 10 most biodiverse countries. https://news.mongabay.com/2016/05/top-10-biodiverse-countries/ (2016)
[33] Vega-Hidalgo, Á. et al. A collection of fully-annotated soundscape recordings from neotropical coffee farms in Colombia and Costa Rica. Zenodo https://doi.org/10.5281/zenodo.7525349 (2023)
[34] Hopping, W. A., Kahl, S. & Klinck, H. A collection of fully-annotated soundscape recordings from the Southwestern Amazon Basin. Zenodo https://doi.org/10.5281/zenodo.7079124 (2022)
[35] Pérez-Granados, C. et al. WABAD: A world annotated bird acoustic dataset for passive acoustic monitoring. Ecology 107, e70317 (2026)
[36] Lin, T.-Y. et al. Microsoft COCO: Common Objects in Context. in Computer Vision – ECCV 2014 (eds Fleet, D., Pajdla, T., Schiele, B. & Tuytelaars, T.) 740–755 (Springer International Publishing, Cham, 2014). doi:10.1007/978-3-319-10602-1_48
[37] Hill, A. P., Prince, P., Snaddon, J. L., Doncaster, C. P. & Rogers, A. AudioMoth: A low-cost acoustic device for monitoring biodiversity and the environment. HardwareX 6, e00073 (2019)
[38] K. Lisa Yang Center for Conservation Bioacoustics. Raven Pro: Interactive Sound Analysis Software (Version 1.6.5). The Cornell Lab of Ornithology (2026)
[39] Ruiz, D. et al. Pteroset. Zenodo https://doi.org/10.5281/zenodo.18563039 (2026)