Fooled Twice: People Cannot Detect Deepfakes but Think They Can
9 Pith papers cite this work. Representative citing papers (2026):
- Toward Fine-Grained Speech Inpainting Forensics: A Dataset, Method, and Metric for Multi-Region Tampering Localization
  A new dataset, an iterative coarse-to-fine localization framework, and a segment-level IoU F1 metric tackle the open problem of detecting multiple unknown word-level inpainted regions in speech (a sketch of the metric follows this list).
- Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
  Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using the Bhattacharyya distance (the distance is sketched after this list).
- ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics
  ArtifactNet extracts codec residuals from spectrograms with a 4M-parameter network to detect AI music at F1=0.9829 and 1.49% FPR on unseen tracks from 22 generators, outperforming larger baselines (the residual idea is sketched after this list).
- Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis
  Introduces the LDD task, the ListenForge dataset built from five listening-head generation methods, and the MANet model, which detects listening forgeries via motion inconsistencies guided by audio semantics.
- Asymmetric Phase Coding Audio Watermarking
  APC embeds compact Ed25519 signatures into audio phase data with error correction, achieving 97.5-98.3% cryptographic verification under eight attack types at a mean PESQ of 3.02 (the signature layer is sketched after this list).
- Phoneme-Level Deepfake Detection Across Emotional Conditions Using Self-Supervised Embeddings
  Phoneme-level analysis of self-supervised embeddings identifies higher divergence in complex vowels and fricatives for emotional voice-conversion deepfakes, enabling more interpretable detection across emotions (a per-phoneme divergence sketch follows the list).
- Split and Conquer Partial Deepfake Speech
  A two-stage method, boundary detection followed by segment classification, trained over multiple segment lengths, achieves state-of-the-art results for detecting and localizing partial deepfakes on the PartialSpoof and Half-Truth benchmarks (a skeleton of the two stages follows the list).
- Synthetic Trust Attacks: Modeling How Generative AI Manipulates Human Decisions in Social Engineering Fraud
  The paper proposes Synthetic Trust Attacks (STAs) as a formal threat model with an eight-stage attack chain (STAM) that shifts defense focus from detecting synthetic media to protecting human decision processes in social engineering.
- Classical Machine Learning Baselines for Deepfake Audio Detection on the Fake-or-Real Dataset
  An RBF SVM achieves ~93% accuracy and ~7% EER on deepfake audio detection using prosodic and spectral features from the FoR dataset at 44.1 kHz and 16 kHz sampling rates (an SVM-plus-EER sketch closes the examples below).
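
The sketches referenced above follow, all in Python. First, the segment-level IoU F1 idea from the speech-inpainting forensics paper. This is a minimal version assuming greedy one-to-one matching of predicted tampered segments to ground truth at an IoU threshold; the paper's exact matching rule may differ.

```python
# Hedged sketch: segment-level IoU F1 for tampering localization.
# Segments are (start, end) pairs in seconds.

def iou(a, b):
    """IoU of two 1-D segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def segment_f1(pred, gold, thr=0.5):
    """Greedy one-to-one matching: a prediction is a TP if it overlaps
    an as-yet-unmatched gold segment with IoU >= thr."""
    matched, tp = set(), 0
    for p in pred:
        best_j, best_iou = None, 0.0
        for j, g in enumerate(gold):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j is not None and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# One hit, one miss, one missed gold segment -> P = R = F1 = 0.5.
print(segment_f1([(0.9, 1.6), (3.0, 3.4)], [(1.0, 1.5), (2.0, 2.5)]))
```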
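For SATYAM, the named quantity is the Bhattacharyya distance between distributions. A minimal sketch for two diagonal Gaussians fitted to embedding sets follows; how the paper applies the distance inside its dual-stage semantic-prosodic fusion is not reproduced here.

```python
import numpy as np

# Hedged sketch: Bhattacharyya distance between two diagonal Gaussians
# fitted to two sets of embeddings (e.g. semantic vs. prosodic).

def bhattacharyya_diag(x, y, eps=1e-6):
    """x, y: (n_samples, dim) arrays of embeddings."""
    mu1, mu2 = x.mean(0), y.mean(0)
    v1, v2 = x.var(0) + eps, y.var(0) + eps
    v = 0.5 * (v1 + v2)                       # averaged diagonal covariance
    term_mean = 0.125 * np.sum((mu1 - mu2) ** 2 / v)
    term_cov = 0.5 * np.sum(np.log(v / np.sqrt(v1 * v2)))
    return term_mean + term_cov

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, (200, 16))
b = rng.normal(0.5, 1.2, (200, 16))
print(bhattacharyya_diag(a, b))
```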
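ArtifactNet's premise is that a codec round trip leaves residual traces in the spectrogram. Below is a toy sketch of such a residual, with a mu-law quantizer standing in for a real neural codec; `codec_roundtrip` is a placeholder, not the paper's pipeline, and the 4M-parameter detector is not reproduced.

```python
import numpy as np
import librosa

# Hedged sketch: log-mel residual between original audio and a lossy
# encode/decode round trip. Swap `mulaw_roundtrip` for a real codec pass.

def log_mel(y, sr, n_mels=128):
    m = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(m)

def codec_residual(y, sr, codec_roundtrip):
    y_hat = codec_roundtrip(y, sr)            # lossy encode -> decode
    n = min(len(y), len(y_hat))
    return log_mel(y[:n], sr) - log_mel(y_hat[:n], sr)

def mulaw_roundtrip(y, sr, mu=255):
    """Toy stand-in codec: 8-bit mu-law compress/quantize/expand."""
    c = np.sign(y) * np.log1p(mu * np.abs(y)) / np.log1p(mu)
    c = np.round((c + 1) * 127.5) / 127.5 - 1  # 8-bit quantization
    return np.sign(c) * np.expm1(np.abs(c) * np.log1p(mu)) / mu

sr = 22050
t = np.arange(sr * 2) / sr
y = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.2 * np.sin(2 * np.pi * 1320 * t)
print(codec_residual(y, sr, mulaw_roundtrip).shape)
```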
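For APC, only the cryptographic layer is easy to sketch: signing a digest of the audio's content with Ed25519 and verifying it, via the `cryptography` package. The phase-coding embedding and error-correction layers that carry these 64 signature bytes inside the audio are not reproduced, and the digest input is a placeholder.

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Hedged sketch: the Ed25519 sign/verify step only.
priv = Ed25519PrivateKey.generate()
pub = priv.public_key()

# Placeholder digest; APC would hash perceptually stable audio features.
audio_digest = hashlib.sha256(b"...stable audio features...").digest()
sig = priv.sign(audio_digest)            # 64 bytes to hide in phase data

try:
    pub.verify(sig, audio_digest)        # raises InvalidSignature on tamper
    print("verified")
except InvalidSignature:
    print("tampered")
```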
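For the phoneme-level study, a stand-in divergence: cosine distance between per-phoneme mean embeddings of bona fide and deepfake frames. The paper's SSL model, forced aligner, and exact divergence measure may differ.

```python
import numpy as np

# Hedged sketch: rank phonemes by how far fake embeddings drift from real
# ones. Inputs map phoneme labels to (n_frames, dim) embedding arrays.

def phoneme_divergence(real, fake):
    out = {}
    for ph in real.keys() & fake.keys():
        a, b = real[ph].mean(0), fake[ph].mean(0)
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        out[ph] = 1.0 - cos                 # larger -> more divergent
    return dict(sorted(out.items(), key=lambda kv: -kv[1]))

rng = np.random.default_rng(1)
real = {"iy": rng.normal(0, 1, (50, 32)), "f": rng.normal(0, 1, (50, 32))}
fake = {"iy": rng.normal(0.8, 1, (50, 32)), "f": rng.normal(0.1, 1, (50, 32))}
print(phoneme_divergence(real, fake))
```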
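For Split and Conquer, a runnable skeleton of the two-stage recipe with toy stand-in models: stage 1 predicts frame-level boundary probabilities, stage 2 classifies each resulting segment. The paper's architectures and multi-length training procedure are not reproduced.

```python
import numpy as np

# Hedged skeleton: boundaries from thresholded frame probabilities, then
# per-segment classification. Both models here are toy stand-ins.

def boundaries_from_probs(p, thr=0.5):
    """Frame indices where boundary probability crosses thr upward."""
    above = p >= thr
    return [i for i in range(1, len(p)) if above[i] and not above[i - 1]]

def localize(frames, boundary_model, segment_model, thr=0.5):
    probs = boundary_model(frames)              # stage 1: (n_frames,)
    cuts = [0] + boundaries_from_probs(probs, thr) + [len(frames)]
    return [(s, e, segment_model(frames[s:e]))  # stage 2: 0=real, 1=fake
            for s, e in zip(cuts, cuts[1:])]

frames = np.random.default_rng(2).normal(0, 1, (100, 8))
toy_boundary = lambda x: np.where(np.arange(len(x)) % 40 == 0, 0.9, 0.1)
toy_segcls = lambda seg: int(seg.mean() > 0)
print(localize(frames, toy_boundary, toy_segcls))
```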
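Finally, the classical baseline: an RBF-kernel SVM over per-clip feature vectors plus an EER readout from the ROC curve. The feature matrix here is synthetic; the paper's prosodic/spectral feature set is not reproduced, and scoring training data is for demonstration only.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_curve

# Hedged sketch: RBF SVM baseline with an equal error rate (EER) readout.

def eer(y_true, scores):
    """EER: operating point where false accepts equal false rejects."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    fnr = 1 - tpr
    i = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[i] + fnr[i]) / 2

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (200, 20)), rng.normal(0.7, 1, (200, 20))])
y = np.array([0] * 200 + [1] * 200)          # 0 = real, 1 = fake

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X, y)
print("EER:", eer(y, clf.predict_proba(X)[:, 1]))
```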