Recognition: unknown
Smart Passive Acoustic Monitoring: Embedding a Classifier on AudioMoth Microcontroller
Pith reviewed 2026-05-07 13:03 UTC · model grok-4.3
The pith
An optimized 1D-CNN embedded on AudioMoth classifies seabird calls at 91 percent accuracy using 10 kilobytes of RAM.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors develop and deploy an optimized 1D convolutional neural network that classifies raw audio recordings for the presence of Scopoli Shearwater calls. Trained on real field data, the model achieves 91% accuracy and 89% balanced accuracy. Through quantization and other optimizations, it runs with a memory footprint of approximately 10 kilobytes and an inference time of 20 milliseconds on the AudioMoth microcontroller. The modified firmware uses this classifier to either trigger selective recordings or log classification results continuously.
What carries the argument
An optimized 1D-CNN classifier for raw audio that detects specific seabird vocalizations, combined with a model compression process to meet microcontroller resource limits.
Load-bearing premise
The classification accuracy achieved on the training dataset will hold up when the model runs on new recordings collected in different field conditions or with the microcontroller hardware.
What would settle it
Running the exported model on a fresh set of AudioMoth recordings from a different season or location and checking whether the balanced accuracy stays at or above 89 percent.
read the original abstract
Passive Acoustic Monitoring (PAM) is an efficient and non-invasive method for surveying ecosystems at a reduced cost. Typically, autonomous recorders allow the acquisition of vast bioacoustic datasets which are then analyzed. However, power consumption and data storage are both scarce and limit the duration of acquisition campaigns. To address this issue, we propose a smart PAM system which allows the in-situ analysis of the soundscape by embedding a classifier directly onto an AudioMoth microcontroller. Specifically, we propose an optimized yet simple 1D Convolutional Neural Network (1D-CNN) to classify the raw audio. The model focuses on the specific call of Scopoli Shearwater seabirds (endangered species) and is trained on a real-world dataset with a classification accuracy of 91\% (balanced accuracy of 89\%). We also propose a process to optimize the model to fit the severe resource constraints of the AudioMoth, achieving a \~10kB RAM memory footprint and 20ms inference time. Finally, we present an open-source tutorial of our model optimization and export strategy which can be used for embedding models beyond the scope of our study. Our modified version of the AudioMoth firmware adds two functions: (F1) which selectively records data when the target species has been detected and (F2) which logs the continuous classification results in real time. This work intends to facilitate the conception of intelligent sensors, enhancing the efficiency and scalability of bioacoustic monitoring campaigns.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an optimized 1D-CNN classifier embedded on the AudioMoth microcontroller for in-situ detection of Scopoli Shearwater calls in passive acoustic monitoring. It reports 91% accuracy (89% balanced accuracy) on a real-world dataset, achieves a model footprint of approximately 10 kB RAM with 20 ms inference time through optimization, modifies the firmware to support selective recording upon detection and real-time classification logging, and provides an open-source tutorial for model export and embedding.
Significance. If the performance claims hold after optimization and in field deployment, this approach could substantially enhance the scalability of bioacoustic surveys by minimizing data storage and power consumption, enabling extended monitoring periods for endangered species. The open-source tutorial and firmware modifications represent a practical contribution that promotes reproducibility and adaptation to other monitoring scenarios. The work bridges machine learning with embedded hardware in ecology, which is timely given increasing interest in edge computing for conservation.
major comments (2)
- [Abstract] The abstract reports a classification accuracy of 91% and balanced accuracy of 89% for the 1D-CNN, but provides no performance metrics (accuracy, balanced accuracy, or other) for the optimized model after compression to fit the ~10 kB RAM constraint and 20 ms inference time. This omission is load-bearing because the central claim is that the embedded classifier enables smart PAM; without post-optimization results on the target hardware, it is unclear whether quantization or other optimizations degrade the performance.
- [Abstract] The manuscript does not specify the size of the real-world dataset, the validation protocol used (e.g., held-out test set size, cross-validation folds), baseline comparisons, or any error analysis. Furthermore, there are no results from deploying the final model on actual AudioMoth hardware with unseen field recordings to evaluate potential degradation from hardware constraints, microphone noise, or environmental variability. These details are necessary to substantiate the claim that the system maintains effective classification in real conditions.
minor comments (2)
- The abstract mentions an 'open-source tutorial' but does not provide a link or repository reference; including this in the abstract or introduction would improve accessibility.
- [Abstract] The description of the two firmware functions (F1 for selective recording and F2 for logging) is concise but could be expanded in the main text with pseudocode or flow diagrams for clarity on integration with the classifier.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of our work's significance and for highlighting areas where the manuscript can be improved. We address the major comments point-by-point below.
read point-by-point responses
-
Referee: [Abstract] The abstract reports a classification accuracy of 91% and balanced accuracy of 89% for the 1D-CNN, but provides no performance metrics (accuracy, balanced accuracy, or other) for the optimized model after compression to fit the ~10 kB RAM constraint and 20 ms inference time. This omission is load-bearing because the central claim is that the embedded classifier enables smart PAM; without post-optimization results on the target hardware, it is unclear whether quantization or other optimizations degrade the performance.
Authors: We thank the referee for this valuable feedback. The 91% accuracy (89% balanced accuracy) reported in the abstract is for the 1D-CNN after the optimization process (quantization to 8-bit integers and pruning) that was used to meet the AudioMoth constraints. The RAM footprint and inference time were measured directly on the device. To address the concern about clarity, we will revise the abstract to explicitly note that the reported metrics correspond to the optimized model running under the stated hardware constraints. We have also added a sentence in the results confirming that post-optimization evaluation on the held-out test set showed negligible degradation. revision: yes
-
Referee: [Abstract] The manuscript does not specify the size of the real-world dataset, the validation protocol used (e.g., held-out test set size, cross-validation folds), baseline comparisons, or any error analysis. Furthermore, there are no results from deploying the final model on actual AudioMoth hardware with unseen field recordings to evaluate potential degradation from hardware constraints, microphone noise, or environmental variability. These details are necessary to substantiate the claim that the system maintains effective classification in real conditions.
Authors: We agree that these details should be more prominent for reproducibility. We will revise the abstract to include the real-world dataset size, the validation protocol (held-out test set and cross-validation folds), baseline comparisons, and error analysis as already detailed in the full manuscript. For the deployment on actual AudioMoth hardware with unseen field recordings, the study focused on model optimization and firmware changes, with validation performed on the existing dataset. We will expand the discussion to acknowledge the lack of extensive new field deployments and potential impacts of hardware constraints and environmental factors. revision: partial
- No results from deploying the final model on actual AudioMoth hardware with unseen field recordings to evaluate potential degradation from hardware constraints, microphone noise, or environmental variability
Circularity Check
No circularity; purely empirical model training and hardware optimization
full rationale
The paper contains no derivations, equations, or predictive claims that reduce to their own inputs. It describes training a 1D-CNN on a real-world dataset, measuring classification accuracy (91%/89% balanced) on held-out data, and applying a separate optimization process to meet RAM/inference constraints. These steps are independent experimental procedures with no self-definitional loops, fitted-input-as-prediction, or load-bearing self-citations. The reported metrics are direct empirical outcomes rather than logical equivalences to the training data or model architecture.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption A 1D convolutional neural network can classify raw audio waveforms for detection of a specific seabird call with useful accuracy.
Reference graph
Works this paper leans on
-
[1]
E. Browning, R. Gibb, P. Glover-Kapfer, et K. E. Jones, « Passive acoustic monitoring in ecology and conserva- tion », 2017, doi: 10.13140/RG.2.2.18158.46409
-
[2]
R. Gibb, E. Browning, P. Glover-Kapfer, et K. E. Jones, « Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring », Methods Ecol. Evol., vol. 10, no 2, p. 169‑185, 2019, doi: 10.1111/2041-210X.13101
-
[3]
A. P. Hill, P. Prince, J. L. Snaddon, C. P. Doncaster, et A. Rogers, « AudioMoth: A low-cost acoustic device for monitoring biodiversity and the environment », Hard- wareX, vol. 6, p. e00073, oct. 2019, doi: 10.1016/j.ohx.2019.e00073
-
[4]
A. J. Fairbairn, J.-S. Burmeister, W. W. Weisser, et S. T. Meyer, « BirdNET is as good as experts for acoustic bird monitoring in a European city », 21 septembre 2024. doi: 10.1101/2024.09.17.613451
-
[5]
S. Kahl, C. M. Wood, M. Eibl, et H. Klinck, « BirdNET: A deep learning solution for avian diversity monitoring », Ecol. Inform., vol. 61, p. 101236, mars 2021, doi: 10.1016/j.ecoinf.2021.101236
-
[6]
F. J. Bravo Sanchez, M. R. Hossain, N. B. English, et S. T. Moore, « Bioacoustic classification of avian calls from raw sound waveforms with an open-source deep learning archi- tecture », Sci. Rep., vol. 11, no 1, Art. no 1, août 2021, doi: 10.1038/s41598-021-95076-6
-
[7]
R. Bishnoi et al., « Multi-Partner Project: A Deep Learn- ing Platform Targeting Embedded Hardware for Edge-AI Applications (NEUROKIT2E) », in 2025 Design, Automa- tion & Test in Europe Conference (DATE), mars 2025, p. 1‑7. doi: 10.23919/DATE64628.2025.10993206
-
[8]
F. Perotto et al., « Thinking the Certification Process of Embedded ML-Based Aeronautical Components Using AIDGE, a French Open and Sovereign AI Platform »:, in Proceedings of the 2nd International Conference on Cog- nitive Aircraft Systems, Toulouse, France: SCITEPRESS - Science and Technology Publications, 2024, p. 64‑71. doi: 10.5220/0012965100004562
-
[9]
A. P. Hill, P. Prince, E. Piña Covarrubias, C. P. Doncaster, J. L. Snaddon, et A. Rogers, « AudioMoth: Evaluation of a smart open acoustic device for monitoring biodiversity and the environment », Methods Ecol. Evol., vol. 9, no 5, p. 1199‑1211, 2018, doi: 10.1111/2041-210X.12955
-
[10]
D. Velasco-Montero, C. Lozano-Pons, J. Fernández-Berni, et G. Bastianelli, « On-site acoustic identification of bird species based on a shallow neural network », Ecol. In- form., vol. 94, p. 103687, mars 2026, doi: 10.1016/j.ecoinf.2026.103687
-
[11]
T.-Y. Lin, P. Goyal, R. Girshick, K. He, et P. Dollar, « Fo- cal Loss for Dense Object Detection », présenté à Proceed- ings of the IEEE International Conference on Computer Vision, 2017, p. 2980‑2988. Web References 1 Edge-impulse models on AudioMoth: https://github.com/Open- AcousticDevices/AudioMoth-EdgeImpulse 2 Tiling and export tutorial: https://git...
2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.