Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection

Emiru Tsunoo; Kazuki Shimada; Masafumi Takahashi; Naoya Takahashi; Shusuke Takahashi; Yuichiro Koyama; Yuki Mitsufuji

arXiv: 2106.10806 · v1 · pith:BSWWT44Onew · submitted 2021-06-21 · eess.AS · cs.SD

Kazuki Shimada, Naoya Takahashi, Yuichiro Koyama, Shusuke Takahashi, Emiru Tsunoo, Masafumi Takahashi, Yuki Mitsufuji

keywords: system, systems, detection, event, impulse, localization, model, SELD

Abstract

This report describes our systems submitted to DCASE2021 Challenge Task 3: sound event localization and detection (SELD) with directional interference. Our previous system, based on the activity-coupled Cartesian direction of arrival (ACCDOA) representation, enables us to solve a SELD task with a single target, in which each class's output vector encodes both its activity and its direction. This ACCDOA-based system, combined with an efficient network architecture called RD3Net and data augmentation techniques, outperformed state-of-the-art SELD systems in terms of localization and location-dependent detection. Using the ACCDOA-based system as a base, we build model ensembles by averaging the outputs of several systems trained under different conditions, such as input features, training folds, and model architectures. We also use an event-independent network v2 (EINV2)-based system to increase the diversity of the ensembles. To improve generalization, we further propose impulse response simulation (IRS), which generates simulated multichannel signals by convolving simulated room impulse responses (RIRs) with source signals extracted from the original dataset. Our systems significantly outperformed the baseline system on the development dataset.
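To make the abstract's three ingredients more concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It assumes ACCDOA outputs of shape (frames, classes, 3), where each class's Cartesian vector has a length equal to its activity and points toward the source, and it decodes activity by thresholding the vector norm at a hypothetical 0.5. The ensemble is a plain arithmetic mean of ACCDOA outputs. IRS is reduced to its core operation, per-channel convolution of a mono source with a multichannel RIR; the `toy_rir` generator (exponentially decaying noise) is only a placeholder for a proper room-acoustics simulator. All function names here (`accdoa_encode`, `accdoa_decode`, `ensemble_average`, `toy_rir`, `simulate_multichannel`) are hypothetical.

```python
import numpy as np

# --- ACCDOA (activity-coupled Cartesian DOA), assumed layout (T, C, 3) ---

def accdoa_encode(activity, doa_unit):
    """Build targets: activity (T, C) in {0, 1} scales unit DOA vectors (T, C, 3)."""
    return activity[..., None] * doa_unit

def accdoa_decode(accdoa, threshold=0.5):
    """Return an activity mask (T, C) and unit DOA vectors (T, C, 3).

    A class is treated as active when its vector length exceeds `threshold`
    (0.5 is an assumed value, not taken from the report).
    """
    norm = np.linalg.norm(accdoa, axis=-1)               # vector length per class
    active = norm > threshold
    doa = accdoa / np.maximum(norm[..., None], 1e-8)     # normalize; zero stays zero
    return active, doa

# --- Model ensemble by averaging outputs ---

def ensemble_average(outputs):
    """Average a list of (T, C, 3) ACCDOA predictions from different systems."""
    return np.mean(np.stack(outputs, axis=0), axis=0)

# --- Impulse response simulation (IRS), heavily simplified ---

def toy_rir(n_channels, n_taps, rt60_s, fs, rng):
    """Placeholder RIRs: white noise with an exponential envelope.

    The amplitude falls by 60 dB (factor e^-6.9, about 1e-3) at t = rt60_s.
    A real pipeline would use a room-acoustics simulator instead.
    """
    t = np.arange(n_taps) / fs
    envelope = np.exp(-6.9 * t / rt60_s)
    return rng.standard_normal((n_channels, n_taps)) * envelope

def simulate_multichannel(source, rirs):
    """Convolve a mono source (N,) with each channel's RIR (M, L) -> (M, N + L - 1)."""
    return np.stack([np.convolve(source, h) for h in rirs])
```

Averaging ACCDOA vectors before thresholding means a class is detected only when the systems roughly agree on both activity and direction: vectors pointing in different directions partly cancel and the averaged length shrinks. This sketch averages ACCDOA-format outputs only; mixing in EINV2 outputs, which use a different track-wise format, would presumably require a conversion step that is not shown here.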

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DeepASA: An Object-Oriented Multi-Purpose Network for Auditory Scene Analysis
eess.AS 2025-09 unverdicted novelty 7.0

DeepASA unifies source separation, dereverberation, SED, classification, and DoAE via object-oriented processing, chain-of-inference, and temporal coherence matching, reporting SOTA on ASA2, MC-FUSS, and STARSS23.