Deepfake video detection using convolutional vision transformer
7 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
Citation polarities: background (2)
citing papers explorer
- Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis
  Introduces the listening deepfake detection (LDD) task, the ListenForge dataset built from five listening head generation methods, and the MANet model, which detects listening forgeries via motion inconsistencies guided by audio semantics.
- Architecture-Adaptive Uncertainty Fusion for Deepfake Detection
  COF fuses epistemic, aleatoric, calibration, conformal, and distributional uncertainties via simplex optimization of their Pearson correlation with errors. It outperforms alternatives under distribution shift on CelebDF but collapses along with all other methods on cross-dataset tests.
- Enhancing Self-Supervised Talking Head Forgery Detection via a Training-Free Dual-System Framework
  A training-free dual-system framework refines the anomaly score ordering of uncertain samples from self-supervised talking head forgery detectors to improve detection performance.
- LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection
  LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.
- PVLM: Parsing-Aware Vision Language Model with Dynamic Contrastive Learning for Zero-Shot Deepfake Attribution
  PVLM combines parsing-aware vision-language modeling with dynamic contrastive learning to enable fine-grained zero-shot attribution of deepfakes to unseen generators, and it outperforms prior methods on a new benchmark.
- MFVLR: Multi-domain Fine-grained Vision-Language Reconstruction for Generalizable Diffusion Face Forgery Detection and Localization
  MFVLR uses multi-domain vision-language reconstruction with a fine-grained language transformer, a multi-domain vision encoder, and a vision injection module to achieve generalizable detection and localization of diffusion-synthesized face forgeries.
- EMO-BOOST: Emotion-Augmented Audio-Visual Features for Improved Generalization in Deepfake Detection
  Emo-Boost augments low-level deepfake detectors with intra- and inter-modal emotion consistency checks, raising cross-manipulation generalization AUC by 2.1% on FakeAVCeleb.