Pith · machine review for the scientific record

arxiv: 2605.10334 · v1 · submitted 2026-05-11 · 💻 cs.CV

Recognition: 2 Lean theorem links

The Alpha Blending Hypothesis: Compositing Shortcut in Deepfake Detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 05:05 UTC · model grok-4.3

classification 💻 cs.CV
keywords deepfake detection · alpha blending · compositing artifacts · self-blended images · cross-dataset generalization · shortcut learning · face manipulation

The pith

Deepfake detectors primarily search for alpha blending compositing artifacts rather than semantic or generative cues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper advances the Alpha Blending Hypothesis that current frame-based deepfake detectors work chiefly by locating low-level compositing traces left when manipulated faces are inserted into real frames. If this holds, then detectors succeed through a simple shortcut rather than by recognizing what makes an image forged. Experiments show these models react strongly to self-blended real images and to ordinary non-neural edits. The authors therefore train a detector using only real facial photos augmented with artificial blends and report competitive results across many held-out deepfake test collections. Combining the blending-focused model with one that ignores such cues further raises performance.

Core claim

The central claim is that state-of-the-art frame-based detectors primarily function as alpha blending searchers. They localize low-level compositing artifacts introduced during the integration of manipulated faces into target frames, rather than learning semantic anomalies or specific generative neural fingerprints.
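The compositing operation at the center of the hypothesis is ordinary alpha blending: each output pixel is a convex combination of a source face and a target frame. A minimal NumPy sketch of the idea (illustrative only; the paper's actual mask shapes and alpha schedules are not reproduced here):

```python
import numpy as np

def alpha_blend(source, target, alpha):
    """Per-pixel alpha compositing: out = alpha*source + (1-alpha)*target.

    `alpha` is a mask in [0, 1] with the same spatial shape as the images.
    Intermediate alpha values create the soft transition band whose
    low-level traces the hypothesis says detectors learn to find.
    """
    return alpha * source + (1.0 - alpha) * target

# Toy single-channel 4x4 example: a hard mask leaves a sharp seam,
# a feathered (linearly ramped) mask leaves a gradual one.
source = np.ones((4, 4))
target = np.zeros((4, 4))

hard_mask = np.zeros((4, 4))
hard_mask[:, :2] = 1.0                                   # abrupt 1 -> 0 step
soft_mask = np.tile(np.linspace(1.0, 0.0, 4), (4, 1))    # gradual ramp

hard = alpha_blend(source, target, hard_mask)
soft = alpha_blend(source, target, soft_mask)
```

A hard step in the mask leaves a sharp discontinuity at the seam; a feathered ramp leaves a soft one. Both are the kind of low-level compositing trace the hypothesis says detectors key on.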

What carries the argument

The Alpha Blending Hypothesis, which frames detector success as the detection of alpha blending artifacts created when fake faces are composited into target frames.

If this is right

  • Detectors exhibit high sensitivity to self-blended images and to simple non-generative manipulations.
  • Training on real images augmented only with synthetic blends produces strong average cross-dataset generalization on compositional deepfake collections.
  • Explicit blending searchers and models that avoid blending shortcuts produce complementary predictions.
  • An ensemble of the two types of models yields improved detection results.
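The complementarity claim can be made concrete with a toy score-averaging ensemble and a rank-based AUROC. The detector names, scores, and size of the gain below are invented for illustration; only the mechanism (complementary errors cancelling under averaging) mirrors the paper's ensemble result.

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC: Mann-Whitney U statistic divided by n_pos * n_neg."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    ranks = np.empty(len(scores))
    ranks[scores.argsort()] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical scores from a blending searcher and a blending-resilient
# model: each misranks a different fake, so averaging fixes both.
labels   = np.array([0, 0, 1, 1, 1, 0])
blender  = np.array([0.10, 0.20, 0.90, 0.30, 0.80, 0.40])  # misranks the fake at index 3
semantic = np.array([0.20, 0.50, 0.40, 0.90, 0.70, 0.30])  # misranks the fake at index 2
ensemble = (blender + semantic) / 2
```

Here each individual model scores 8/9 AUROC while the averaged ensemble separates all fakes from all reals, the toy analogue of the reported ensemble gain.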

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Many existing deepfake benchmarks may be solved largely by exploiting compositing cues instead of learning forgery semantics.
  • Future detectors could be hardened by training on data that deliberately varies or removes blending signals.
  • The method suggests it is possible to build effective detectors without ever training on explicitly generated deepfake examples.

Load-bearing premise

High sensitivity to self-blended real images and non-generative manipulations shows that detectors rely primarily on compositing artifacts rather than other cues.
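This premise can be probed without any generator: composite a photometrically perturbed copy of a real image back onto itself through a mask. The brightness gain, mask geometry, and image below are placeholders, not the paper's SBI parameters; the point is only that such a sample carries compositing traces and no neural fingerprints.

```python
import numpy as np

def self_blend(image, mask, gain=1.5):
    """Build a 'fake' from a single real image: brighten a copy and
    composite it back onto the original through `mask`. The output
    contains compositing artifacts but no generative fingerprints,
    which is what makes it a probe of the blending-shortcut premise.
    """
    perturbed = np.clip(image * gain, 0.0, 1.0)  # placeholder perturbation
    return mask * perturbed + (1.0 - mask) * image

# A flat gray image with a central square mask: inside the mask the
# pixel values change; outside, the image is untouched by construction.
image = np.full((8, 8), 0.4)
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
blended = self_blend(image, mask)
```

Scoring such Real-on-Real samples with a pre-trained detector, as in the sensitivity experiments, tests whether the detector responds to the seam itself rather than to any property of a generator.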

What would settle it

A detector that continues to generalize across deepfake datasets after all compositing artifacts have been removed from both training and test images would falsify the hypothesis.

Figures

Figures reproduced from arXiv: 2605.10334 by Andrii Yermakov, Jan Cech, Jiri Matas, Mario Fritz.

Figure 1: The immunization effect. Validation (left) and training (right) curves for PE.
Figure 2: Sensitivity of GenD [53] and FS-VFM [44] to alpha blending (hard/soft discontinuities). Right: samples from the "Real-on-Real" dataset, where a +100% brightness shift or other augmentation is used during "fake" sample creation (classes in brackets). Crucially, such samples contain no neural fingerprints; if a pre-trained detector is invariant to this manipulation, it will have a low AUROC.
Figure 3: DINO, CLIP and PE backbones: average cross-dataset test AUROC as a function of training.
Original abstract

Recent deepfake detection methods demonstrate improved cross-dataset generalization, yet the underlying mechanisms remain underexplored. We introduce the Alpha Blending Hypothesis, positing that state-of-the-art frame-based detectors primarily function as alpha blending searchers; rather than learning semantic anomalies or specific generative neural fingerprints, they localize low-level compositing artifacts introduced during the integration of manipulated faces into target frames. We experimentally validate the hypothesis, demonstrating that deepfake detectors exhibit high sensitivity to the so-called self-blended images (SBI) and non-generative manipulations. We propose the method BlenD that leverages a large-scale, diverse dataset of real-only facial images augmented with SBI. This approach achieves the best average cross-dataset generalization on 15 compositional deepfake datasets released between 2019 and 2025 without utilizing explicitly generated deepfakes during training. Furthermore, we show that predictions from explicit blending searchers and models resilient to blending shortcuts are highly complementary, yielding a state-of-the-art AUROC of 94.0% in an ensemble configuration. The code with experiments and the trained model will be publicly released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces the Alpha Blending Hypothesis, which posits that state-of-the-art frame-based deepfake detectors primarily function as alpha blending searchers that localize low-level compositing artifacts from face integration rather than semantic anomalies or generative neural fingerprints. It validates the hypothesis empirically by demonstrating high detector sensitivity to self-blended images (SBI) and non-generative manipulations, proposes the BlenD training method that uses only real facial images augmented with SBI to achieve the best average cross-dataset generalization across 15 compositional deepfake datasets (2019-2025) without any explicitly generated deepfakes, and shows that an ensemble of explicit blending searchers with blending-resilient models reaches 94.0% AUROC. The code, experiments, and trained model are to be released publicly.

Significance. If the central results hold, the work is significant for offering a mechanistic account of why recent deepfake detectors generalize well across datasets and for introducing a practical, real-image-only training regime that sidesteps the need to generate or curate synthetic deepfakes. The scale of the evaluation (15 datasets spanning multiple years), the demonstration of complementarity via the ensemble, and the commitment to public code and model release are clear strengths that support reproducibility and further investigation in the field.

major comments (2)
  1. [Hypothesis validation experiments] Abstract and hypothesis validation: The claim that detectors 'primarily' function as alpha blending searchers is inferred from observed high sensitivity to SBI and non-generative manipulations plus the success of the SBI-only BlenD regime. However, this remains correlational evidence of sufficiency; the manuscript does not include a controlled ablation that removes or masks alpha-blending artifacts while preserving other potential cues (semantic inconsistencies or generative fingerprints) and then quantifies the resulting performance degradation. Such an isolation test is load-bearing for the 'primarily' qualifier and would be required to rule out mixed reliance on multiple cue types.
  2. [BlenD training and evaluation] BlenD method and cross-dataset results: While the average generalization across 15 datasets is presented as state-of-the-art, the manuscript should provide per-dataset AUROC breakdowns (ideally in a table) together with explicit details on SBI generation parameters, the real-to-blended sample ratio, and any post-processing steps. Without these, it is difficult to assess whether the reported gains are robust or could be influenced by unstated dataset-construction choices.
minor comments (3)
  1. [Abstract] The abstract states that BlenD 'achieves the best average cross-dataset generalization' but does not report the numerical average AUROC value or list the specific baseline methods and scores being compared; adding these figures would make the summary self-contained.
  2. [Figures illustrating SBI and detector responses] Figure captions and SBI examples should explicitly state the blending parameters (alpha range, mask type, source/target image selection) used to generate the illustrated self-blended images so that readers can reproduce the sensitivity tests.
  3. [Discussion or conclusion] The manuscript would benefit from a short discussion of potential limitations, such as whether the alpha-blending shortcut remains dominant for video-level or temporal deepfake detectors that may exploit additional motion cues.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: The claim that detectors 'primarily' function as alpha blending searchers is inferred from high sensitivity to SBI and non-generative manipulations plus BlenD success. This is correlational evidence; a controlled ablation removing alpha-blending artifacts while preserving other cues (semantic inconsistencies or generative fingerprints) is needed to quantify degradation and support the 'primarily' qualifier.

    Authors: We agree that the current evidence is primarily correlational and that a direct ablation isolating blending artifacts from other potential cues would offer stronger causal support for the 'primarily' claim. Designing such an experiment without introducing new confounds is technically challenging. In the revised manuscript we will add a dedicated limitations subsection that explicitly discusses this gap, refine the abstract and hypothesis statements to avoid overclaiming, and include additional supporting analyses (e.g., further sensitivity tests on non-blending manipulations). We believe the combination of SBI sensitivity, BlenD's cross-dataset performance without synthetic deepfakes, and complementarity with blending-resilient models still provides meaningful mechanistic insight. revision: partial

  2. Referee: The manuscript should provide per-dataset AUROC breakdowns (ideally in a table) together with explicit details on SBI generation parameters, the real-to-blended sample ratio, and any post-processing steps.

    Authors: We agree that these details will improve transparency and allow readers to better evaluate robustness. The revised manuscript will include a new table reporting AUROC for each of the 15 individual datasets. We will also expand the experimental setup section to specify all SBI generation parameters, the exact real-to-blended training ratio, and any post-processing steps applied. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical hypothesis tested on external data without self-referential reduction

Full rationale

The paper advances the Alpha Blending Hypothesis through experimental validation on external deepfake datasets and non-generative manipulations, then introduces BlenD trained exclusively on real images augmented with self-blended images. No derivation chain, equation, or central claim reduces by construction to fitted parameters, self-citations, or renamed inputs; the sufficiency of blending artifacts is demonstrated via cross-dataset AUROC on 15 independent compositional sets, and complementarity with other models is shown empirically. The approach remains self-contained against external benchmarks with no load-bearing self-citation or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that SBI and non-generative manipulations sufficiently isolate alpha blending effects without introducing confounding factors; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Self-blended images (SBI) and non-generative manipulations isolate low-level compositing artifacts that match those in real deepfakes.
    Invoked to validate that detector sensitivity to SBI indicates primary reliance on blending shortcuts rather than semantic features.

pith-pipeline@v0.9.0 · 5499 in / 1286 out tokens · 36200 ms · 2026-05-12T05:05:53.568763+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 4 internal anchors

  1. [1]

    The DeepSpeak Dataset

    Sarah Barrington, Matyas Bohacek, and Hany Farid. DeepSpeak dataset v1.0. arXiv preprint arXiv:2408.05366, 2024

  2. [2]

    Perception encoder: The best visual embeddings are not at the output of the network

    Daniel Bolya, Po-Yao Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, Chen Wei, Tengyu Ma, Jiale Zhi, Jathushan Rajasegaran, Hanoona Abdul Rasheed, Junke Wang, Marco Monteiro, Hu Xu, Shiyu Dong, Nikhila Ravi, Shang-Wen Li, Piotr Dollar, and Christoph Feichtenhofer. Perception encoder: The best visual embeddings are not at the output of the network. In The...

  3. [3]

    A multiresolution spline with application to image mosaics

    Peter J. Burt and Edward H. Adelson. A multiresolution spline with application to image mosaics. ACM Trans. Graph., 2(4):217–236, October 1983. ISSN 0730-0301. doi: 10.1145/245.247. URL https://doi.org/10.1145/245.247

  4. [4]

    Forensics adapter: Adapting CLIP for generalizable face forgery detection

    Xinjie Cui, Yuezun Li, Ao Luo, Jiaran Zhou, and Junyu Dong. Forensics adapter: Adapting CLIP for generalizable face forgery detection. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 19207–19217, 2025

  5. [5]

    Detecting lip-syncing deepfakes: Vision temporal transformer for analyzing mouth inconsistencies

    Soumyya Kanti Datta, Shan Jia, and Siwei Lyu. Detecting lip-syncing deepfakes: Vision temporal transformer for analyzing mouth inconsistencies. arXiv preprint arXiv:2504.01470, 2025

  6. [6]

    Retinaface: Single-shot multi-level face localisation in the wild

    Jiankang Deng, Jia Guo, Evangelos Ververas, Irene Kotsia, and Stefanos Zafeiriou. Retinaface: Single-shot multi-level face localisation in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5203–5212, 2020

  7. [7]

    The DeepFake Detection Challenge (DFDC) Dataset

    Brian Dolhansky, Joanna Bitton, Ben Pflaum, Jikuo Lu, Russ Howes, Menglin Wang, and Cristian Canton Ferrer. The deepfake detection challenge (DFDC) dataset.arXiv preprint arXiv:2006.07397, 2020

  8. [8]

    Deepfakes Detection Dataset by Google & Jigsaw

    Nicholas Dufour, Andrew Gully, Per Karlsson, Alexey Victor Vorbyov, Thomas Leung, Jeremiah Childs, and Christoph Bregler. Deepfakes Detection Dataset by Google & Jigsaw. https://research.google/blog/contributing-data-to-deepfake-detection-research/, 2019

  9. [9]

    Liveportrait: Efficient portrait animation with stitching and retargeting control

    Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, and Di Zhang. Liveportrait: Efficient portrait animation with stitching and retargeting control. arXiv preprint arXiv:2407.03168, 2024

  10. [10]

    Lips don’t lie: A generalisable and robust approach to face forgery detection

    Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. Lips don’t lie: A generalisable and robust approach to face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5039–5049, 2021

  11. [11]

    Leveraging real talking faces via self-supervision for robust forgery detection

    Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, and Maja Pantic. Leveraging real talking faces via self-supervision for robust forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14950–14962, 2022

  12. [12]

    FSBI: Deepfake detection with frequency enhanced self-blended images

    Ahmed Abul Hasanaath, Hamzah Luqman, Raed Katib, and Saeed Anwar. FSBI: Deepfake detection with frequency enhanced self-blended images. Image and Vision Computing, 154:105418, 2025

  13. [13]

    PolyGlotFake: A novel multilingual and multimodal deepfake dataset

    Yang Hou, Haitao Fu, Chunkai Chen, Zida Li, Haoyu Zhang, and Jianjun Zhao. PolyGlotFake: A novel multilingual and multimodal deepfake dataset. InInternational Conference on Pattern Recognition, pages 180–193. Springer, 2024

  14. [14]

    Exposing gan-generated faces using inconsistent corneal specular highlights

    Shu Hu, Yuezun Li, and Siwei Lyu. Exposing gan-generated faces using inconsistent corneal specular highlights. InICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2500–2504. IEEE, 2021

  15. [15]

    Model attribution of face-swap deepfake videos

    Shan Jia, Xin Li, and Siwei Lyu. Model attribution of face-swap deepfake videos. In 2022 IEEE International Conference on Image Processing (ICIP), pages 2356–2360. IEEE, 2022

  16. [16]

    FakeAVCeleb: A novel audio-video multimodal deepfake dataset

    Hasam Khalid, Shahroz Tariq, Minha Kim, and Simon S Woo. FakeAVCeleb: A novel audio-video multimodal deepfake dataset. arXiv preprint arXiv:2108.05080, 2021

  17. [17]

    Beyond spatial frequency: Pixel-wise temporal frequency-based deepfake video detection

    Taehoon Kim, Jongwook Choi, Yonghyun Jeong, Haeun Noh, Jaejun Yoo, Seungryul Baek, and Jongwon Choi. Beyond spatial frequency: Pixel-wise temporal frequency-based deepfake video detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 11198–11207, 2025

  18. [18]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

  19. [19]

    KoDF: A large-scale korean deepfake detection dataset

    Patrick Kwon, Jaeseong You, Gyuhyeon Nam, Sungwoo Park, and Gyeongsu Chae. KoDF: A large-scale korean deepfake detection dataset. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10744–10753, 2021

  20. [20]

    Seeable: Soft discrepancies and bounded contrastive learning for exposing deepfakes

    Nicolas Larue, Ngoc-Son Vu, Vitomir Struc, Peter Peer, and Vassilis Christophides. Seeable: Soft discrepancies and bounded contrastive learning for exposing deepfakes. InProceedings of the IEEE/CVF international conference on computer vision, pages 21011–21021, 2023

  21. [21]

    Sok: Systematization and benchmarking of deepfake detectors in a unified framework

    Binh M Le, Jiwon Kim, Simon S Woo, Kristen Moore, Alsharif Abuadbba, and Shahroz Tariq. Sok: Systematization and benchmarking of deepfake detectors in a unified framework. In2025 IEEE 10th European Symposium on Security and Privacy (EuroS&P), pages 883–902. IEEE, 2025

  22. [22]

    Latentsync: Taming audio-conditioned latent diffusion models for lip sync with syncnet supervision

    Chunyu Li, Chao Zhang, Weikai Xu, Jingyu Lin, Jinghui Xie, Weiguo Feng, Bingyue Peng, Cunjian Chen, and Weiwei Xing. Latentsync: Taming audio-conditioned latent diffusion models for lip sync with syncnet supervision. arXiv preprint arXiv:2412.09262, 2024

  23. [23]

    Advancing high fidelity identity swapping for forgery detection

    Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, and Fang Wen. Advancing high fidelity identity swapping for forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5074–5083, 2020

  24. [24]

    In ictu oculi: Exposing ai generated fake face videos by detecting eye blinking

    Yuezun Li, Ming-Ching Chang, and Siwei Lyu. In ictu oculi: Exposing ai generated fake face videos by detecting eye blinking. InIEEE International Workshop on Information Forensics and Security (WIFS), 2018

  25. [25]

    Celeb-DF: A large-scale challenging dataset for deepfake forensics

    Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. Celeb-DF: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3207–3216, 2020

  26. [26]

    Celeb-DF++: A large-scale challenging video deepfake benchmark for generalizable forensics

    Yuezun Li, Delong Zhu, Xinjie Cui, and Siwei Lyu. Celeb-DF++: A large-scale challenging video deepfake benchmark for generalizable forensics. arXiv preprint arXiv:2507.18015, 2025

  27. [27]

    Fake it till you make it: Curricular dynamic forgery augmentations towards general deepfake detection

    Yuzhen Lin, Wentang Song, Bin Li, Yuezun Li, Jiangqun Ni, Han Chen, and Qiushi Li. Fake it till you make it: Curricular dynamic forgery augmentations towards general deepfake detection. InEuropean conference on computer vision, pages 104–122. Springer, 2024

  28. [28]

    Lips are lying: Spotting the temporal inconsistency between audio and visual in lip-syncing deepfakes

    Weifeng Liu, Tianyi She, Jiawei Liu, Boheng Li, Dongyu Yao, and Run Wang. Lips are lying: Spotting the temporal inconsistency between audio and visual in lip-syncing deepfakes. Advances in Neural Information Processing Systems, 37:91131–91155, 2024

  29. [29]

    Generalizing face forgery detection with high-frequency features

    Yuchen Luo, Yong Zhang, Junchi Yan, and Wei Liu. Generalizing face forgery detection with high-frequency features. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16317–16326, 2021

  30. [30]

    Exploiting visual artifacts to expose deepfakes and face manipulations

    Falko Matern, Christian Riess, and Marc Stamminger. Exploiting visual artifacts to expose deepfakes and face manipulations. In2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), pages 83–92, 2019

  31. [31]

    Diff2lip: Audio conditioned diffusion models for lip-synchronization

    Soumik Mukhopadhyay, Saksham Suri, Ravi Teja Gadde, and Abhinav Shrivastava. Diff2lip: Audio conditioned diffusion models for lip-synchronization. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 5292–5302, January 2024

  32. [32]

    Deconvolution and checkerboard artifacts

    Augustus Odena, Vincent Dumoulin, and Chris Olah. Deconvolution and checkerboard artifacts. Distill, 2016. URLhttp://distill.pub/2016/deconv-checkerboard/

  33. [33]

    Towards universal fake image detectors that generalize across generative models

    Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards universal fake image detectors that generalize across generative models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 24480–24489, 2023

  34. [34]

    Poisson image editing

    Patrick Pérez, Michel Gangnet, and Andrew Blake. Poisson image editing. ACM Trans. Graph., 22(3):313–318, July 2003. ISSN 0730-0301. doi: 10.1145/882262.882269. URL https://doi.org/10.1145/882262.882269

  35. [35]

    Parameter-efficient tuning on layer normalization for pre-trained language models

    Wang Qi, Yu-Ping Ruan, Yuan Zuo, and Taihao Li. Parameter-efficient tuning on layer normalization for pre-trained language models. arXiv preprint arXiv:2211.08682, 2022

  36. [36]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational Conference on Machine Learning, pages 8748–8763. PmLR, 2021

  37. [37]

    Faceforensics++: Learning to detect manipulated facial images

    Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. Faceforensics++: Learning to detect manipulated facial images. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1–11, 2019

  38. [38]

    Towards real-world deepfake detection: A diverse in-the-wild dataset of forgery faces

    Junyu Shi, Minghui Li, Junguo Zuo, Zhifei Yu, Yipeng Lin, Shengshan Hu, Ziqi Zhou, Yechao Zhang, Wei Wan, Yinzhe Xu, et al. Towards real-world deepfake detection: A diverse in-the-wild dataset of forgery faces. arXiv preprint arXiv:2510.08067, 2025

  39. [39]

    Detecting deepfakes with self-blended images

    Kaede Shiohara and Toshihiko Yamasaki. Detecting deepfakes with self-blended images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18720–18729, 2022

  40. [40]

    DINOv3

    Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. DINOv3. arXiv preprint arXiv:2508.10104, 2025

  41. [41]

    Cyclical learning rates for training neural networks

    Leslie N Smith. Cyclical learning rates for training neural networks. InProceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472. IEEE, 2017

  42. [42]

    Illumination enlightened spatial-temporal inconsistency for deepfake video detection

    Kaiyue Tian, Chen Chen, Yichao Zhou, and Xiyuan Hu. Illumination enlightened spatial-temporal inconsistency for deepfake video detection. In 2024 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2024

  43. [43]

    FSFM: A generalizable face security foundation model via self-supervised facial representation learning

    Gaojian Wang, Feng Lin, Tong Wu, Zhenguang Liu, Zhongjie Ba, and Kui Ren. FSFM: A generalizable face security foundation model via self-supervised facial representation learning. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 24364– 24376, 2025

  44. [44]

    Scalable face security vision foundation model for deepfake, diffusion, and spoofing detection

    Gaojian Wang, Feng Lin, Tong Wu, Zhisheng Yan, and Kui Ren. Scalable face security vision foundation model for deepfake, diffusion, and spoofing detection. arXiv preprint arXiv:2510.10663, 2025

  45. [45]

    Scaling laws for deepfake detection

    Wenhao Wang, Longqi Cai, Taihong Xiao, Yuxiao Wang, and Ming-Hsuan Yang. Scaling laws for deepfake detection. arXiv preprint arXiv:2510.16320, 2025

  46. [46]

    Identity-driven multimedia forgery detection via reference assistance

    Junhao Xu, Jingjing Chen, Xue Song, Feng Han, Haijun Shan, and Yu-Gang Jiang. Identity-driven multimedia forgery detection via reference assistance. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 3887–3896, 2024

  47. [47]

    Learning spatiotemporal inconsistency via thumbnail layout for face deepfake detection

    Yuting Xu, Jian Liang, Lijun Sheng, and Xiao-Yu Zhang. Learning spatiotemporal inconsistency via thumbnail layout for face deepfake detection. International Journal of Computer Vision, 132(12):5663–5680, 2024

  48. [48]

    UCF: Uncovering common features for generalizable deepfake detection

    Zhiyuan Yan, Yong Zhang, Yanbo Fan, and Baoyuan Wu. UCF: Uncovering common features for generalizable deepfake detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 22412–22423, 2023

  49. [49]

    DeepfakeBench: A comprehensive benchmark of deepfake detection

    Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, and Baoyuan Wu. DeepfakeBench: A comprehensive benchmark of deepfake detection. In A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 4534–4565. Curran Associates, Inc., 2023. URL https://proceedings.neurips...

  50. [50]

    Transcending forgery specificity with latent space augmentation for generalizable deepfake detection

    Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, and Baoyuan Wu. Transcending forgery specificity with latent space augmentation for generalizable deepfake detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8984–8994, 2024

  51. [51]

    Orthogonal subspace decomposition for generalizable AI-generated image detection

    Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decomposition for generalizable AI-generated image detection. InProceedings of the International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=GFpjO8S8Po

  52. [52]

    Exposing deep fakes using inconsistent head poses

    Xin Yang, Yuezun Li, and Siwei Lyu. Exposing deep fakes using inconsistent head poses. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8261–8265. IEEE, 2019

  53. [53]

    Deepfake detection that generalizes across benchmarks

    Andrii Yermakov, Jan Cech, Jiri Matas, and Mario Fritz. Deepfake detection that generalizes across benchmarks. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), March 2026

  54. [54]

    HelloMeme: Integrating spatial knitting attentions to embed high-level and fidelity-rich conditions in diffusion models

    Shengkai Zhang, Nianhong Jiao, Tian Li, Chaojie Yang, Chenhui Xue, Boya Niu, and Jun Gao. HelloMeme: Integrating spatial knitting attentions to embed high-level and fidelity-rich conditions in diffusion models. arXiv preprint arXiv:2410.22901, 2024

  55. [55]

    Detecting and simulating artifacts in GAN fake images

    Xu Zhang, Svebor Karaman, and Shih-Fu Chang. Detecting and simulating artifacts in GAN fake images. In2019 IEEE international workshop on information forensics and security (WIFS), pages 1–6. IEEE, 2019

  56. [56]

    Memo: Memory-guided diffusion for expressive talking video generation

    Longtao Zheng, Yifan Zhang, Hanzhong Guo, Jiachun Pan, Zhenxiong Tan, Jiahao Lu, Chuanxin Tang, Bo An, and Shuicheng Yan. Memo: Memory-guided diffusion for expressive talking video generation. arXiv preprint arXiv:2412.04448, 2024

  57. [57]

    FreqBlender: Enhancing deepfake detection by blending frequency knowledge

    Jiaran Zhou, Yuezun Li, Baoyuan Wu, Bin Li, Junyu Dong, et al. FreqBlender: Enhancing deepfake detection by blending frequency knowledge. Advances in Neural Information Processing Systems, 37:44965–44988, 2024

  58. [58]

    Face forensics in the wild

    Tianfei Zhou, Wenguan Wang, Zhiyuan Liang, and Jianbing Shen. Face forensics in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5778–5788, 2021

  59. [59]

    Face forgery detection by 3D decomposition

    Xiangyu Zhu, Hao Wang, Hongyan Fei, Zhen Lei, and Stan Z Li. Face forgery detection by 3D decomposition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2929–2939, 2021