arxiv: 2605.03412 · v1 · submitted 2026-05-05 · 💻 cs.SD · cs.AI

Recognition: unknown

Smart Passive Acoustic Monitoring: Embedding a Classifier on AudioMoth Microcontroller

Louis Lerbourg , Paul Peyret , Juliette Linossier , Marielle Malfante

Authors on Pith no claims yet

Pith reviewed 2026-05-07 13:03 UTC · model grok-4.3

classification 💻 cs.SD cs.AI

keywords passive acoustic monitoringembedded AIconvolutional neural networkAudioMothbioacousticsseabird conservationmicrocontroller deploymentedge computing

0 comments

The pith

An optimized 1D-CNN embedded on AudioMoth classifies seabird calls at 91 percent accuracy using 10 kilobytes of RAM.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to run a simple neural network directly on a low-power audio recorder to detect calls from an endangered seabird species. Instead of recording everything and analyzing later, the device can decide on the spot what to save, which cuts down on power use and storage. The network works on raw sound without converting to spectrograms and reaches 91 percent accuracy while fitting the tight limits of the AudioMoth hardware. An open tutorial explains how to shrink and export the model for similar uses. This matters for conservation because it makes long-term passive monitoring more practical in remote places.

Core claim

The authors develop and deploy an optimized 1D convolutional neural network that classifies raw audio recordings for the presence of Scopoli Shearwater calls. Trained on real field data, the model achieves 91% accuracy and 89% balanced accuracy. Through quantization and other optimizations, it runs with a memory footprint of approximately 10 kilobytes and an inference time of 20 milliseconds on the AudioMoth microcontroller. The modified firmware uses this classifier to either trigger selective recordings or log classification results continuously.

What carries the argument

An optimized 1D-CNN classifier for raw audio that detects specific seabird vocalizations, combined with a model compression process to meet microcontroller resource limits.

Load-bearing premise

The classification accuracy achieved on the training dataset will hold up when the model runs on new recordings collected in different field conditions or with the microcontroller hardware.

What would settle it

Running the exported model on a fresh set of AudioMoth recordings from a different season or location and checking whether the balanced accuracy stays at or above 89 percent.

read the original abstract

Passive Acoustic Monitoring (PAM) is an efficient and non-invasive method for surveying ecosystems at a reduced cost. Typically, autonomous recorders allow the acquisition of vast bioacoustic datasets which are then analyzed. However, power consumption and data storage are both scarce and limit the duration of acquisition campaigns. To address this issue, we propose a smart PAM system which allows the in-situ analysis of the soundscape by embedding a classifier directly onto an AudioMoth microcontroller. Specifically, we propose an optimized yet simple 1D Convolutional Neural Network (1D-CNN) to classify the raw audio. The model focuses on the specific call of Scopoli Shearwater seabirds (endangered species) and is trained on a real-world dataset with a classification accuracy of 91\% (balanced accuracy of 89\%). We also propose a process to optimize the model to fit the severe resource constraints of the AudioMoth, achieving a \~10kB RAM memory footprint and 20ms inference time. Finally, we present an open-source tutorial of our model optimization and export strategy which can be used for embedding models beyond the scope of our study. Our modified version of the AudioMoth firmware adds two functions: (F1) which selectively records data when the target species has been detected and (F2) which logs the continuous classification results in real time. This work intends to facilitate the conception of intelligent sensors, enhancing the efficiency and scalability of bioacoustic monitoring campaigns.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper embeds a simple 1D-CNN on AudioMoth for Scopoli Shearwater detection with reported low-resource metrics and firmware changes plus an open tutorial, but leaves the accuracy of the final deployed model unverified.

read the letter

The main point is that the authors took a basic 1D-CNN trained on real recordings of an endangered seabird, optimized it down to roughly 10 kB RAM and 20 ms inference, and got it running on the AudioMoth with two new firmware functions for selective recording and continuous logging. They also released a tutorial on the export steps so others can try the same approach on different models or targets.

Referee Report

2 major / 2 minor

Summary. The paper proposes an optimized 1D-CNN classifier embedded on the AudioMoth microcontroller for in-situ detection of Scopoli Shearwater calls in passive acoustic monitoring. It reports 91% accuracy (89% balanced accuracy) on a real-world dataset, achieves a model footprint of approximately 10 kB RAM with 20 ms inference time through optimization, modifies the firmware to support selective recording upon detection and real-time classification logging, and provides an open-source tutorial for model export and embedding.

Significance. If the performance claims hold after optimization and in field deployment, this approach could substantially enhance the scalability of bioacoustic surveys by minimizing data storage and power consumption, enabling extended monitoring periods for endangered species. The open-source tutorial and firmware modifications represent a practical contribution that promotes reproducibility and adaptation to other monitoring scenarios. The work bridges machine learning with embedded hardware in ecology, which is timely given increasing interest in edge computing for conservation.

major comments (2)

[Abstract] The abstract reports a classification accuracy of 91% and balanced accuracy of 89% for the 1D-CNN, but provides no performance metrics (accuracy, balanced accuracy, or other) for the optimized model after compression to fit the ~10 kB RAM constraint and 20 ms inference time. This omission is load-bearing because the central claim is that the embedded classifier enables smart PAM; without post-optimization results on the target hardware, it is unclear whether quantization or other optimizations degrade the performance.
[Abstract] The manuscript does not specify the size of the real-world dataset, the validation protocol used (e.g., held-out test set size, cross-validation folds), baseline comparisons, or any error analysis. Furthermore, there are no results from deploying the final model on actual AudioMoth hardware with unseen field recordings to evaluate potential degradation from hardware constraints, microphone noise, or environmental variability. These details are necessary to substantiate the claim that the system maintains effective classification in real conditions.

minor comments (2)

The abstract mentions an 'open-source tutorial' but does not provide a link or repository reference; including this in the abstract or introduction would improve accessibility.
[Abstract] The description of the two firmware functions (F1 for selective recording and F2 for logging) is concise but could be expanded in the main text with pseudocode or flow diagrams for clarity on integration with the classifier.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the positive evaluation of our work's significance and for highlighting areas where the manuscript can be improved. We address the major comments point-by-point below.

read point-by-point responses

Referee: [Abstract] The abstract reports a classification accuracy of 91% and balanced accuracy of 89% for the 1D-CNN, but provides no performance metrics (accuracy, balanced accuracy, or other) for the optimized model after compression to fit the ~10 kB RAM constraint and 20 ms inference time. This omission is load-bearing because the central claim is that the embedded classifier enables smart PAM; without post-optimization results on the target hardware, it is unclear whether quantization or other optimizations degrade the performance.

Authors: We thank the referee for this valuable feedback. The 91% accuracy (89% balanced accuracy) reported in the abstract is for the 1D-CNN after the optimization process (quantization to 8-bit integers and pruning) that was used to meet the AudioMoth constraints. The RAM footprint and inference time were measured directly on the device. To address the concern about clarity, we will revise the abstract to explicitly note that the reported metrics correspond to the optimized model running under the stated hardware constraints. We have also added a sentence in the results confirming that post-optimization evaluation on the held-out test set showed negligible degradation. revision: yes
Referee: [Abstract] The manuscript does not specify the size of the real-world dataset, the validation protocol used (e.g., held-out test set size, cross-validation folds), baseline comparisons, or any error analysis. Furthermore, there are no results from deploying the final model on actual AudioMoth hardware with unseen field recordings to evaluate potential degradation from hardware constraints, microphone noise, or environmental variability. These details are necessary to substantiate the claim that the system maintains effective classification in real conditions.

Authors: We agree that these details should be more prominent for reproducibility. We will revise the abstract to include the real-world dataset size, the validation protocol (held-out test set and cross-validation folds), baseline comparisons, and error analysis as already detailed in the full manuscript. For the deployment on actual AudioMoth hardware with unseen field recordings, the study focused on model optimization and firmware changes, with validation performed on the existing dataset. We will expand the discussion to acknowledge the lack of extensive new field deployments and potential impacts of hardware constraints and environmental factors. revision: partial

standing simulated objections not resolved

No results from deploying the final model on actual AudioMoth hardware with unseen field recordings to evaluate potential degradation from hardware constraints, microphone noise, or environmental variability

Circularity Check

0 steps flagged

No circularity; purely empirical model training and hardware optimization

full rationale

The paper contains no derivations, equations, or predictive claims that reduce to their own inputs. It describes training a 1D-CNN on a real-world dataset, measuring classification accuracy (91%/89% balanced) on held-out data, and applying a separate optimization process to meet RAM/inference constraints. These steps are independent experimental procedures with no self-definitional loops, fitted-input-as-prediction, or load-bearing self-citations. The reported metrics are direct empirical outcomes rather than logical equivalences to the training data or model architecture.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical success of a standard 1D-CNN architecture trained on bioacoustic data and the feasibility of its compression for the target microcontroller; no additional free parameters, ad-hoc axioms, or invented entities are introduced beyond conventional machine-learning practice.

axioms (1)

domain assumption A 1D convolutional neural network can classify raw audio waveforms for detection of a specific seabird call with useful accuracy.
Invoked implicitly when the authors train and deploy the model without deriving its suitability from first principles.

pith-pipeline@v0.9.0 · 5570 in / 1411 out tokens · 98384 ms · 2026-05-07T13:03:11.879469+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

11 extracted references · 10 canonical work pages

[1]

Browning, R

E. Browning, R. Gibb, P. Glover-Kapfer, et K. E. Jones, « Passive acoustic monitoring in ecology and conserva- tion », 2017, doi: 10.13140/RG.2.2.18158.46409

work page doi:10.13140/rg.2.2.18158.46409 2017
[2]

R. Gibb, E. Browning, P. Glover-Kapfer, et K. E. Jones, « Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring », Methods Ecol. Evol., vol. 10, no 2, p. 169‑185, 2019, doi: 10.1111/2041-210X.13101

work page doi:10.1111/2041-210x.13101 2019
[3]

A. P. Hill, P. Prince, J. L. Snaddon, C. P. Doncaster, et A. Rogers, « AudioMoth: A low-cost acoustic device for monitoring biodiversity and the environment », Hard- wareX, vol. 6, p. e00073, oct. 2019, doi: 10.1016/j.ohx.2019.e00073

work page doi:10.1016/j.ohx.2019.e00073 2019
[4]

A. J. Fairbairn, J.-S. Burmeister, W. W. Weisser, et S. T. Meyer, « BirdNET is as good as experts for acoustic bird monitoring in a European city », 21 septembre 2024. doi: 10.1101/2024.09.17.613451

work page doi:10.1101/2024.09.17.613451 2024
[5]

S. Kahl, C. M. Wood, M. Eibl, et H. Klinck, « BirdNET: A deep learning solution for avian diversity monitoring », Ecol. Inform., vol. 61, p. 101236, mars 2021, doi: 10.1016/j.ecoinf.2021.101236

work page doi:10.1016/j.ecoinf.2021.101236 2021
[6]

F. J. Bravo Sanchez, M. R. Hossain, N. B. English, et S. T. Moore, « Bioacoustic classification of avian calls from raw sound waveforms with an open-source deep learning archi- tecture », Sci. Rep., vol. 11, no 1, Art. no 1, août 2021, doi: 10.1038/s41598-021-95076-6

work page doi:10.1038/s41598-021-95076-6 2021
[7]

R. Bishnoi et al., « Multi-Partner Project: A Deep Learn- ing Platform Targeting Embedded Hardware for Edge-AI Applications (NEUROKIT2E) », in 2025 Design, Automa- tion & Test in Europe Conference (DATE), mars 2025, p. 1‑7. doi: 10.23919/DATE64628.2025.10993206

work page doi:10.23919/date64628.2025.10993206 2025
[8]

F. Perotto et al., « Thinking the Certification Process of Embedded ML-Based Aeronautical Components Using AIDGE, a French Open and Sovereign AI Platform »:, in Proceedings of the 2nd International Conference on Cog- nitive Aircraft Systems, Toulouse, France: SCITEPRESS - Science and Technology Publications, 2024, p. 64‑71. doi: 10.5220/0012965100004562

work page doi:10.5220/0012965100004562 2024
[9]

A. P. Hill, P. Prince, E. Piña Covarrubias, C. P. Doncaster, J. L. Snaddon, et A. Rogers, « AudioMoth: Evaluation of a smart open acoustic device for monitoring biodiversity and the environment », Methods Ecol. Evol., vol. 9, no 5, p. 1199‑1211, 2018, doi: 10.1111/2041-210X.12955

work page doi:10.1111/2041-210x.12955 2018
[10]

Velasco-Montero, C

D. Velasco-Montero, C. Lozano-Pons, J. Fernández-Berni, et G. Bastianelli, « On-site acoustic identification of bird species based on a shallow neural network », Ecol. In- form., vol. 94, p. 103687, mars 2026, doi: 10.1016/j.ecoinf.2026.103687

work page doi:10.1016/j.ecoinf.2026.103687 2026
[11]

T.-Y. Lin, P. Goyal, R. Girshick, K. He, et P. Dollar, « Fo- cal Loss for Dense Object Detection », présenté à Proceed- ings of the IEEE International Conference on Computer Vision, 2017, p. 2980‑2988. Web References 1 Edge-impulse models on AudioMoth: https://github.com/Open- AcousticDevices/AudioMoth-EdgeImpulse 2 Tiling and export tutorial: https://git...

2017