Probing-guided selection of depth zones from frozen SSL speech models yields compact classifiers with 28% relative EER improvement on cross-domain deepfake detection tasks.
Audio Deepfake detection: A survey
17 Pith papers cite this work. Polarity classification is still indexing.
roles: background 3
polarities: background 3
representative citing papers
MixFake is a new benchmark for mixed-authenticity audio, and a multi-stream prompt tuning method achieves 0.95% foreground EER and a 7.72% absolute gain in complex-background deepfake detection.
A new dataset, iterative coarse-to-fine localization framework, and segment-level IoU F1 metric tackle the open problem of detecting multiple unknown word-level inpainted regions in speech.
Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharyya distance.
ArtifactNet extracts codec residuals from spectrograms with a 4M-parameter network to detect AI music at F1=0.9829 and 1.49% FPR on unseen tracks from 22 generators, outperforming larger baselines.
Introduces the LDD task, ListenForge dataset built from five listening head generation methods, and MANet model that detects listening forgeries via motion inconsistencies guided by audio semantics.
DetectZoo is a unified toolkit providing reference implementations of 61 detectors, native loaders for 22 benchmark datasets, and a standardized evaluation pipeline for AI-generated content detection across text, audio, and image modalities.
APC embeds compact Ed25519 signatures into audio phase data with error correction to achieve 97.5-98.3% cryptographic verification under eight attack types at mean PESQ 3.02.
Phoneme-level analysis using self-supervised embeddings identifies higher divergence in complex vowels and fricatives for emotional voice conversion deepfakes, enabling more interpretable detection across emotions.
A two-stage boundary detection plus segment classification method with multi-length training achieves state-of-the-art results for detecting and localizing partial deepfakes on PartialSpoof and Half-Truth benchmarks.
The paper proposes Synthetic Trust Attacks (STAs) as a formal threat model with an eight-stage attack chain (STAM) that shifts defense focus from detecting synthetic media to protecting human decision processes in social engineering.
The AuthGlass dataset and proposed multi-modal models achieve state-of-the-art results on voice liveness detection and user authentication for smart glasses.
A survey of Large Audio Language Models that establishes a taxonomy of trustworthiness vulnerabilities and proposes a Defense-in-Depth roadmap for audio intelligence.
Fairness metrics uncover gender disparities in audio deepfake detection error distributions that standard Equal Error Rate metrics obscure.
A zero-shot open-set speech deepfake source tracing framework using adapted SSL-AASIST embeddings and AAM loss achieves EER of 16.43% in OOD trials with cosine scoring, outperforming few-shot alternatives.
A dual-granularity orthogonal disentanglement framework achieves EERs of 1.35%, 7.88%, and 21.58% on ASVspoof 2019 LA, ASVspoof 2021 DF, and In-the-Wild datasets, outperforming gradient reversal by 2.60% on cross-dataset transfer.
RBF SVM achieves ~93% accuracy and ~7% EER on deepfake audio detection using prosodic and spectral features from the FoR dataset at 44.1 kHz and 16 kHz sampling rates.
citing papers explorer
- Asymmetric Phase Coding Audio Watermarking
  APC embeds compact Ed25519 signatures into audio phase data with error correction to achieve 97.5-98.3% cryptographic verification under eight attack types at mean PESQ 3.02.
- Split and Conquer Partial Deepfake Speech
  A two-stage boundary detection plus segment classification method with multi-length training achieves state-of-the-art results for detecting and localizing partial deepfakes on PartialSpoof and Half-Truth benchmarks.
- A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
  A survey of Large Audio Language Models that establishes a taxonomy of trustworthiness vulnerabilities and proposes a Defense-in-Depth roadmap for audio intelligence.