Introduces the LDD task, the ListenForge dataset built from five listening-head generation methods, and the MANet model, which detects listening forgeries via motion inconsistencies guided by audio semantics.
Deepfake video detection using convolutional vision transformer
7 Pith papers cite this work. Polarity classification is still indexing.
citation roles: background 2
citation polarities: background 2
representative citing papers
COF fuses epistemic, aleatoric, calibration, conformal, and distributional uncertainties via simplex optimization of Pearson correlation with errors; it outperforms alternatives under distribution shift on CelebDF but, like all methods, collapses on cross-dataset tests.
A training-free dual-system framework refines anomaly score ordering on uncertain samples from self-supervised talking head forgery detectors to improve detection performance.
LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.
PVLM combines parsing-aware vision-language modeling with dynamic contrastive learning to enable fine-grained zero-shot attribution of deepfakes to unseen generators and outperforms prior methods on a new benchmark.
MFVLR uses multi-domain vision-language reconstruction with a fine-grained language transformer, multi-domain vision encoder, and vision injection module to achieve generalizable detection and localization of diffusion-synthesized face forgeries.
Emo-Boost augments low-level deepfake detectors with intra- and inter-modal emotion consistency checks to raise cross-manipulation generalization AUC by 2.1% on FakeAVCeleb.
citing papers explorer
- LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection
- MFVLR: Multi-domain Fine-grained Vision-Language Reconstruction for Generalizable Diffusion Face Forgery Detection and Localization