Audio Deepfake detection: A survey
15 Pith papers cite this work. Polarity classification is still indexing.
roles: background 3
polarities: background 3
representative citing papers
citing papers explorer
- Probing-Guided Layer Selection from Self-Supervised Speech Models for Generalizable Audio Deepfake Detection
  Probing-guided selection of depth zones from frozen SSL speech models yields compact classifiers with 28% relative EER improvement on cross-domain deepfake detection tasks.
- MixFake: Benchmarking and Enhancing Audio Deepfake Detection in Diverse Real-world Mixed Audio
  MixFake is a new benchmark for mixed-authenticity audio; a multi-stream prompt tuning method achieves 0.95% foreground EER and a 7.72% absolute gain in complex-background deepfake detection.
- Toward Fine-Grained Speech Inpainting Forensics: A Dataset, Method, and Metric for Multi-Region Tampering Localization
  A new dataset, iterative coarse-to-fine localization framework, and segment-level IoU F1 metric tackle the open problem of detecting multiple unknown word-level inpainted regions in speech.
- Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
  Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharyya distance.
- ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics
  ArtifactNet extracts codec residuals from spectrograms with a 4M-parameter network to detect AI music at F1=0.9829 and 1.49% FPR on unseen tracks from 22 generators, outperforming larger baselines.
- Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis
  Introduces the LDD task, ListenForge dataset built from five listening head generation methods, and MANet model that detects listening forgeries via motion inconsistencies guided by audio semantics.
- Asymmetric Phase Coding Audio Watermarking
  APC embeds compact Ed25519 signatures into audio phase data with error correction to achieve 97.5-98.3% cryptographic verification under eight attack types at mean PESQ 3.02.
- Phoneme-Level Deepfake Detection Across Emotional Conditions Using Self-Supervised Embeddings
  Phoneme-level analysis using self-supervised embeddings identifies higher divergence in complex vowels and fricatives for emotional voice conversion deepfakes, enabling more interpretable detection across emotions.
- Split and Conquer Partial Deepfake Speech
  A two-stage boundary detection plus segment classification method with multi-length training achieves state-of-the-art results for detecting and localizing partial deepfakes on PartialSpoof and Half-Truth benchmarks.
- Synthetic Trust Attacks: Modeling How Generative AI Manipulates Human Decisions in Social Engineering Fraud
  The paper proposes Synthetic Trust Attacks (STAs) as a formal threat model with an eight-stage attack chain (STAM) that shifts defense focus from detecting synthetic media to protecting human decision processes in social engineering.
- AuthGlass: Benchmarking Voice Liveness Detection and Authentication on Smart Glasses via Comprehensive Acoustic Features
  The AuthGlass dataset and proposed multi-modal models achieve state-of-the-art results on voice liveness detection and user authentication for smart glasses.
- A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
  A survey of Large Audio Language Models that establishes a taxonomy of trustworthiness vulnerabilities and proposes a Defense-in-Depth roadmap for audio intelligence.
- Gender Fairness in Audio Deepfake Detection: Performance and Disparity Analysis
  Fairness metrics uncover gender disparities in audio deepfake detection error distributions that standard Equal Error Rate metrics obscure.
- Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing
  A zero-shot open-set speech deepfake source tracing framework using adapted SSL-AASIST embeddings and AAM loss achieves EER of 16.43% in OOD trials with cosine scoring, outperforming few-shot alternatives.
- Classical Machine Learning Baselines for Deepfake Audio Detection on the Fake-or-Real Dataset
  RBF SVM achieves ~93% accuracy and ~7% EER on deepfake audio detection using prosodic and spectral features from the FoR dataset at 44.1 kHz and 16 kHz sampling rates.