General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

Daniel P. W. Ellis; Eduardo Fonseca; Frederic Font; Jordi Pons; Manoj Plakal; Xavier Favory; Xavier Serra

arxiv: 1807.09902 · v3 · pith:JPC4PO6Onew · submitted 2018-07-26 · cs.SD · cs.LG · eess.AS · stat.ML

Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

classification: cs.SD, cs.LG, eess.AS, stat.ML

keywords: audio, task, tagging, audioset, freesound, general-purpose, baseline, challenge

This paper describes Task 2 of the DCASE 2018 Challenge, titled "General-purpose audio tagging of Freesound content with AudioSet labels". This task was hosted on the Kaggle platform as "Freesound General-Purpose Audio Tagging Challenge". The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology. We present the task, the dataset prepared for the competition, and a baseline system.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

WQ-Fusion: Dynamic Gated Attention for Cross-Domain Audio Representation
cs.SD 2026-06 unverdicted novelty 4.0

WQ-Fusion combines Whisper and Qwen encoders with gated attention to reach 0.836 on the Interspeech 2026 Audio Encoder Capability Challenge, outperforming single-encoder baselines.