arxiv: 2502.20838 · v3 · pith:7NTAAJT3new · submitted 2025-02-28 · cs.SD · cs.AI · cs.LG · eess.AS

Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data

classification cs.SD · cs.AI · cs.LG · eess.AS
keywords temporal · localization · recordings · supervised · annotation · baselines · binary · bioacoustic

Passive acoustic monitoring (PAM) systems generate continuous recordings spanning months, yet automated bioacoustic analysis of whale calls requires two separate annotation efforts: binary presence labels for classification and precise temporal boundaries for localization. A binary label for a multi-minute recording can be assigned in seconds, but timestamping every call within it requires hours of expert effort. Providing both is infeasible at operational scale. We present DSMIL-LocNet, a weakly supervised multiple instance learning (MIL) framework that performs both classification and temporal localization using only recording-level presence/absence labels. Our dual-stream architecture integrates spectral and temporal features to process recordings of 2--30 minutes without the temporal compression that degrades existing CNN methods on long inputs. On the AcousticTrends BlueFinLibrary, DSMIL-LocNet achieves F1 scores of 0.88--0.91 on recordings of 300--1800s, where fully supervised CNN baselines degrade to 0.19--0.64. It also provides temporal localization that these baselines cannot produce without frame-level annotation. Code: https://github.com/Ragib-Amin-Nihal/DSMIL-Loc
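
To make the weak-supervision idea in the abstract concrete, the sketch below shows a generic max-pooling MIL scheme in plain Python. It is not the DSMIL-LocNet architecture, which the abstract does not specify in detail. A recording is treated as a bag of fixed-length segments; the bag-level probability (the only quantity a recording-level label could supervise) is the maximum segment probability, and localization falls out of thresholding the per-segment scores. The function names, the 10-second segment length, the 0.5 threshold, and the logits are all hypothetical values chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bag_probability(instance_logits):
    """Max-pooling MIL: the recording is as positive as its most confident segment.

    During training, only this value would be compared against the
    recording-level presence/absence label.
    """
    return max(sigmoid(z) for z in instance_logits)


def localize(instance_logits, segment_seconds, threshold=0.5):
    """Merge consecutive above-threshold segments into (start, end) intervals in seconds."""
    intervals = []
    start = None  # index of the first segment in the current active run
    for i, z in enumerate(instance_logits):
        active = sigmoid(z) >= threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            intervals.append((start * segment_seconds, i * segment_seconds))
            start = None
    if start is not None:  # close a run that reaches the end of the recording
        intervals.append((start * segment_seconds, len(instance_logits) * segment_seconds))
    return intervals


# Hypothetical per-segment logits for six 10-second segments of one recording.
logits = [-3.0, -2.0, 2.5, 3.0, -1.0, 1.5]
presence = bag_probability(logits)   # recording-level score
calls = localize(logits, 10.0)       # estimated call intervals in seconds
```

With these illustrative logits, `calls` contains two intervals, one covering segments 2 and 3 and one covering the final segment. Max pooling is the simplest MIL aggregator; attention-based pooling is a common alternative when several segments should contribute to the bag score.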

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Ecologically-Constrained Task Arithmetic for Multi-Taxa Bioacoustic Classifiers Without Shared Data

    cs.SD · 2026-05 · unverdicted · novelty 7.0

    Task vector arithmetic on near-orthogonal bioacoustic models allows composing multi-taxa classifiers without data sharing, with asymmetric accuracy gains for underrepresented taxa.