ASVspoof 2021: Automatic speaker verification spoofing and countermeasures challenge evaluation plan
3 representative citing papers (2026):
-
Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis
Introduces the LDD task, the ListenForge dataset built from five listening head generation methods, and the MANet model, which detects listening forgeries via motion inconsistencies guided by audio semantics.
-
Phoneme-Level Deepfake Detection Across Emotional Conditions Using Self-Supervised Embeddings
Phoneme-level analysis using self-supervised embeddings identifies higher divergence in complex vowels and fricatives for emotional voice conversion deepfakes, enabling more interpretable detection across emotions.
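The phoneme-level divergence idea can be sketched as a simple embedding comparison between genuine and converted speech. The function names and the cosine measure below are illustrative assumptions; the paper's exact divergence metric is not specified here.

```python
import numpy as np

def cosine_divergence(real_frames, fake_frames):
    """Cosine distance between mean frame embeddings of the same phoneme
    in genuine vs. converted speech (illustrative measure, not the
    paper's exact formulation). Inputs: (n_frames, dim) arrays."""
    r = real_frames.mean(axis=0)
    f = fake_frames.mean(axis=0)
    cos = np.dot(r, f) / (np.linalg.norm(r) * np.linalg.norm(f))
    return 1.0 - cos

def per_phoneme_divergence(segments):
    """segments: dict mapping phoneme label -> (real_frames, fake_frames).
    Returns one divergence score per phoneme, so classes such as complex
    vowels and fricatives can be ranked by how much conversion distorts them."""
    return {ph: cosine_divergence(r, f) for ph, (r, f) in segments.items()}
```

Ranking phonemes by this score is what makes the detection interpretable: the classes with the largest divergence point to where the voice conversion leaves the strongest artifacts.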
-
Deepfake Audio Detection Using Self-supervised Fusion Representations
A dual-branch fusion model with XLS-R, BEATs, Matching Head, and cross-attention achieves 70.20% F1-score and 16.54% environmental EER on CompSpoofV2, outperforming the baseline for component-level deepfake detection.
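For context, the EER reported above is the standard operating point for spoofing countermeasures: the threshold where the false-acceptance and false-rejection rates are equal. A minimal sketch of computing it from detector scores follows; the function name and the simple threshold sweep are illustrative, not taken from the paper.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal Error Rate from per-utterance detector scores, where a higher
    score means 'more likely genuine'. Sweeps every observed score as a
    threshold and returns the rate where false accepts (spoof passed as
    genuine) and false rejects (genuine flagged as spoof) cross."""
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))  # closest crossing point
    return (far[idx] + frr[idx]) / 2
```

A perfectly separating detector yields an EER of 0; random scoring gives 0.5, so the 16.54% figure above sits between those extremes.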