Deepfake Detection that Generalizes Across Benchmarks

Andrii Yermakov; Jan Cech; Jiri Matas; Mario Fritz

arxiv: 2508.06248 · v4 · submitted 2025-08-08 · 💻 cs.CV

Deepfake Detection that Generalizes Across Benchmarks

Andrii Yermakov , Jan Cech , Jiri Matas , Mario Fritz This is my paper

Pith reviewed 2026-05-18 23:43 UTC · model grok-4.3

classification 💻 cs.CV

keywords deepfake detectioncross-dataset generalizationlayer normalizationhyperspherical manifoldmetric learningparameter-efficient adaptationvision encoder

0 comments

The pith

A minimal update to a pre-trained vision model delivers state-of-the-art cross-dataset generalization for deepfake detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper sets out to show that strong generalization to unseen deepfake techniques is possible without introducing elaborate new model architectures or large numbers of trainable parameters. It achieves this by taking a standard pre-trained vision encoder and updating only its Layer Normalization parameters while forcing the learned features onto a hyperspherical manifold through L2 normalization and metric learning. A sympathetic reader would care because practical deepfake detectors must handle manipulation methods that appear after training, and complex models are difficult to deploy or retrain at scale. If the central claim holds, effective detectors could be built and updated with far less computation than current approaches require.

Core claim

The GenD method updates only the Layer Normalization parameters of a foundational pre-trained vision encoder, representing 0.03 percent of the total parameters, and enforces a hyperspherical feature manifold by combining L2 normalization with metric learning. When evaluated across 14 benchmark datasets spanning 2019 to 2025, the resulting detector records higher average cross-dataset AUROC than more complex recent methods. The analysis further establishes that training on paired real and fake images drawn from the same source video is required to limit shortcut learning and that detection difficulty on academic benchmarks has not increased in a strictly monotonic fashion over time.

What carries the argument

GenD, the parameter-efficient adaptation of a pre-trained vision encoder that updates only its Layer Normalization parameters and projects features onto a hyperspherical manifold via L2 normalization and metric learning.

Load-bearing premise

That performance on the chosen 14 academic benchmarks from 2019 to 2025 serves as a reliable indicator of how the detector will behave against entirely new real-world manipulation techniques absent from all evaluation sets.

What would settle it

Evaluate the trained detector on a fresh dataset that uses a deepfake generation method completely outside the 14 benchmarks and check whether its average cross-dataset AUROC remains higher than that of competing approaches.

Figures

Figures reproduced from arXiv: 2508.06248 by Andrii Yermakov, Jan Cech, Jiri Matas, Mario Fritz.

**Figure 3.** Figure 3: Video-level AUROC for (a) Training and (b) Valida [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Evolution of detection difficulty over time. Each video-level AUROC is computed on the test set of the corresponding benchmark. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Robustness to image degradations for GenD (PE [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

read the original abstract

The generalization of deepfake detectors to unseen manipulation techniques remains a challenge for practical deployment. Although many approaches adapt foundation models by introducing significant architectural complexity, this work demonstrates that robust generalization is achievable through a parameter-efficient adaptation of one of the foundational pre-trained vision encoders. The proposed method, GenD, fine-tunes only the Layer Normalization parameters (0.03% of the total) and enhances generalization by enforcing a hyperspherical feature manifold using L2 normalization and metric learning on it. We conducted an extensive evaluation on 14 benchmark datasets spanning from 2019 to 2025. The proposed method achieves state-of-the-art performance, outperforming more complex, recent approaches in average cross-dataset AUROC. Our analysis yields two primary findings for the field: 1) training on paired real-fake data from the same source video is essential for mitigating shortcut learning and improving generalization, and 2) detection difficulty on academic datasets has not strictly increased over time, with models trained on older, diverse datasets showing strong generalization capabilities. This work delivers a computationally efficient and reproducible method, proving that state-of-the-art generalization is attainable by making targeted, minimal changes to a pre-trained foundational image encoder model. The code is at: https://github.com/yermandy/GenD

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes GenD, a parameter-efficient method for deepfake detection. It adapts a pre-trained vision encoder by fine-tuning only the Layer Normalization parameters (0.03% of total parameters) while applying L2 normalization and metric learning to enforce a hyperspherical feature manifold. The method is evaluated on 14 benchmark datasets spanning 2019–2025 and claims state-of-the-art average cross-dataset AUROC, outperforming more complex recent approaches. Two key findings are reported: paired real-fake training data from the same source video is essential to mitigate shortcut learning, and detection difficulty on academic datasets has not strictly increased over time, with older diverse datasets generalizing well. Public code is provided.

Significance. If the empirical results hold, the work is significant for showing that robust cross-dataset generalization in deepfake detection is achievable via minimal, targeted adaptation of foundational encoders rather than architectural complexity. The 0.03% parameter count and public code are clear strengths that support reproducibility and practical utility. The two findings on paired training and non-increasing dataset difficulty offer concrete, actionable insights for the field.

major comments (1)

[§4] §4 (Experimental evaluation): The manuscript claims state-of-the-art average cross-dataset AUROC but provides insufficient detail on the precise train/test splits across the 14 datasets and on whether baseline methods were re-implemented under identical conditions or taken from reported numbers. These details are load-bearing for independent verification of the SOTA claim and the two primary findings.

minor comments (2)

[Abstract] The abstract and introduction could more explicitly name the base pre-trained vision encoder (e.g., which ViT or ResNet variant) to aid immediate reproducibility.
[Tables/Figures] Figure captions and table headers would benefit from clearer indication of whether results are averaged over multiple runs or single runs.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We address the single major comment below and will incorporate additional experimental details to strengthen reproducibility.

read point-by-point responses

Referee: [§4] §4 (Experimental evaluation): The manuscript claims state-of-the-art average cross-dataset AUROC but provides insufficient detail on the precise train/test splits across the 14 datasets and on whether baseline methods were re-implemented under identical conditions or taken from reported numbers. These details are load-bearing for independent verification of the SOTA claim and the two primary findings.

Authors: We agree that greater specificity on data partitioning and baseline implementation is essential for verification. In the revised manuscript we will add a dedicated subsection (and accompanying table) that enumerates, for each of the 14 datasets, the exact source videos or clips used for training versus testing, the number of real and fake frames in each split, and the temporal or identity-based partitioning strategy employed to avoid leakage. We will also explicitly state that all reported baseline results were obtained by re-implementing the competing methods ourselves under identical training protocols, data splits, optimizer settings, and evaluation metrics; any minor hyper-parameter deviations required by the original papers will be noted. The public code repository already contains the precise split-generation scripts and configuration files that reproduce these experiments, and we will add a README section that maps each table entry to the corresponding code path. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation or claims

full rationale

The paper's core contribution is an empirical method (GenD) that fine-tunes only LayerNorm parameters plus L2 normalization and metric learning, evaluated via comparative AUROC on 14 independent external benchmark datasets (2019-2025). The SOTA performance claim and two primary findings (importance of paired real-fake training; non-increasing difficulty over time) are direct experimental observations on held-out data, not quantities fitted within the training loop or reduced by construction to the method's inputs. No mathematical derivations, equations, uniqueness theorems, or ansatzes are invoked that collapse to self-definition or self-citation chains. The evaluation protocol is reproducible with public code and relies on external benchmarks rather than internal self-referential metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard transfer-learning assumptions about pre-trained vision encoders and metric-learning objectives, with no new free parameters or invented entities introduced beyond conventional training choices.

axioms (2)

domain assumption Pre-trained vision foundation models yield transferable features for deepfake detection tasks
The method adapts an existing encoder without re-deriving its feature properties.
domain assumption Enforcing a hyperspherical manifold via L2 normalization and metric learning improves generalization to unseen manipulations
This is the primary claimed enhancement mechanism.

pith-pipeline@v0.9.0 · 5762 in / 1362 out tokens · 43664 ms · 2026-05-18T23:43:36.110679+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

L2-normalized classification token … uniformity and alignment losses … Luniform = log E[exp(−2∥zx−zy∥²)] … Lalign = E[∥zx−zy∥²]
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

fine-tunes only the Layer Normalization parameters (0.03% of the total)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Energy-Based Constraint Networks: Learning Structural Coherence Across Modalities
cs.CV 2026-05 unverdicted novelty 7.0

Energy-based constraint networks learn structural coherence from contrastive pairs using frozen encoders, achieving 93.4% accuracy on text corruptions and 0.959 AUC on deepfake detection with composable branches that ...
Aletheia: Physics-Conditioned Localized Artifact Attention (PhyLAA-X) for End-to-End Generalizable and Robust Deepfake Video Detection
cs.CV 2026-04 unverdicted novelty 6.0

PhyLAA-X embeds physics-derived feature volumes into localized artifact attention for improved cross-generator generalization and adversarial robustness in deepfake detection.
Fractal Characterization of Low-Correlation Signals in AI-Generated Image Detection
cs.CV 2026-04 unverdicted novelty 4.0

Fractal characterization of low-correlation signals distinguishes AI-generated images from real ones with claimed robustness and superior performance.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · cited by 3 Pith papers · 5 internal anchors

[1]

Protecting world leaders against 8 deep fakes

Shruti Agarwal, Hany Farid, Yuming Gu, Mingming He, Koki Nagano, and Hao Li. Protecting world leaders against 8 deep fakes. InCVPR Workshops, 2019. 2

work page 2019
[2]

Proactive image manipulation detection

Vishal Asnani, Xi Yin, Tal Hassner, Sijia Liu, and Xiaoming Liu. Proactive image manipulation detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15386–15395, 2022. 2

work page 2022
[3]

MALP: manipulation localization using a proactive scheme

Vishal Asnani, Xi Yin, Tal Hassner, and Xiaoming Liu. MALP: manipulation localization using a proactive scheme. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12343–12352, 2023. 2

work page 2023
[4]

Realistic and efficient face swapping: A unified approach with diffusion models

Sanoojan Baliah, Qinliang Lin, Shengcai Liao, Xiaodan Liang, and Muhammad Haris Khan. Realistic and efficient face swapping: A unified approach with diffusion models. In 2025 IEEE/CVF Winter Conference on Applications of Com- puter Vision (WACV), pages 1062–1071. IEEE, 2025. 1

work page 2025
[5]

The DeepSpeak Dataset

Sarah Barrington, Matyas Bohacek, and Hany Farid. Deep- speak dataset v1.0.arXiv preprint arXiv:2408.05366, 2024. 3, 4

work page internal anchor Pith review Pith/arXiv arXiv 2024
[6]

Per- turb, attend, detect and localize (PADL): Robust proactive image defense.IEEE Access, 2025

Filippo Bartolucci, Iacopo Masi, and Giuseppe Lisanti. Per- turb, attend, detect and localize (PADL): Robust proactive image defense.IEEE Access, 2025. 2

work page 2025
[7]

Bit- Fit: Simple parameter-efficient fine-tuning for transformer- based masked language-models

Elad Ben Zaken, Yoav Goldberg, and Shauli Ravfogel. Bit- Fit: Simple parameter-efficient fine-tuning for transformer- based masked language-models. InProceedings of the 60th Annual Meeting of the Association for Computational Lin- guistics (Volume 2: Short Papers), pages 1–9, Dublin, Ire- land, 2022. Association for Computational Linguistics. 5

work page 2022
[8]

Perception Encoder: The best visual embeddings are not at the output of the network

Daniel Bolya, Po-Yao Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, Chen Wei, Tengyu Ma, Jiale Zhi, Jathushan Rajasegaran, Hanoona Rasheed, et al. Perception encoder: The best visual embeddings are not at the output of the net- work.arXiv preprint arXiv:2504.13181, 2025. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2025
[9]

One-shot neu- ral face reenactment via finding directions in GAN’s latent space.International Journal of Computer Vision, 132(8): 3324–3354, 2024

Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, and Georgios Tzimiropoulos. One-shot neu- ral face reenactment via finding directions in GAN’s latent space.International Journal of Computer Vision, 132(8): 3324–3354, 2024. 1

work page 2024
[10]

Testing human ability to detect ‘deepfake’ images of human faces.Journal of Cybersecurity, 9(1):tyad011, 2023

Sergi D Bray, Shane D Johnson, and Bennett Kleinberg. Testing human ability to detect ‘deepfake’ images of human faces.Journal of Cybersecurity, 9(1):tyad011, 2023. 1

work page 2023
[11]

Can we leave deepfake data behind in training deepfake detector? InThe Thirty-eighth Annual Conference on Neural Information Processing Sys- tems, 2024

Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, and Chen Li. Can we leave deepfake data behind in training deepfake detector? InThe Thirty-eighth Annual Conference on Neural Information Processing Sys- tems, 2024. 5

work page 2024
[12]

Exploiting style latent flows for generalizing deepfake video detection

Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, and Jongwon Choi. Exploiting style latent flows for generalizing deepfake video detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1133–1143, 2024. 5

work page 2024
[13]

Forensics adapter: Adapting CLIP for generaliz- able face forgery detection

Xinjie Cui, Yuezun Li, Ao Luo, Jiaran Zhou, and Junyu Dong. Forensics adapter: Adapting CLIP for generaliz- able face forgery detection. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 19207– 19217, 2025. 2, 4, 5, 6, 8

work page 2025
[14]

Veo 3, 2025.https://deepmind

Google DeepMind. Veo 3, 2025.https://deepmind. google/models/veo/. 2

work page 2025
[15]

Retinaface: Single-shot multi- level face localisation in the wild

Jiankang Deng, Jia Guo, Evangelos Ververas, Irene Kot- sia, and Stefanos Zafeiriou. Retinaface: Single-shot multi- level face localisation in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5203–5212, 2020. 3

work page 2020
[16]

The DeepFake Detection Challenge (DFDC) Dataset

Brian Dolhansky, Joanna Bitton, Ben Pflaum, Jikuo Lu, Russ Howes, Menglin Wang, and Cristian Canton Ferrer. The deepfake detection challenge (DFDC) dataset.arXiv preprint arXiv:2006.07397, 2020. 4

work page internal anchor Pith review Pith/arXiv arXiv 2006
[17]

Deepfakes De- tection Dataset by Google & Jigsaw.https : / / research

Nicholas Dufour, Andrew Gully, Per Karlsson, Alexey Victor V orbyov, Thomas Leung, Jeremiah Childs, and Christoph Bregler. Deepfakes De- tection Dataset by Google & Jigsaw.https : / / research . google / blog / contributing - data - to - deepfake - detection - research/,

work page
[18]

Exploring unbiased deepfake detection via token-level shuffling and mixing.arXiv preprint arXiv:2501.04376,

Xinghe Fu, Zhiyuan Yan, Taiping Yao, Shen Chen, and Xi Li. Exploring unbiased deepfake detection via token-level shuffling and mixing.arXiv preprint arXiv:2501.04376,

work page arXiv
[19]

The expressive power of tuning only the normal- ization layers.arXiv preprint arXiv:2302.07937, 2023

Angeliki Giannou, Shashank Rajput, and Dimitris Papail- iopoulos. The expressive power of tuning only the normal- ization layers.arXiv preprint arXiv:2302.07937, 2023. 2

work page arXiv 2023
[20]

Lips don’t lie: A generalisable and robust approach to face forgery detection

Alexandros Haliassos, Konstantinos V ougioukas, Stavros Petridis, and Maja Pantic. Lips don’t lie: A generalisable and robust approach to face forgery detection. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5039–5049, 2021. 2, 4, 5

work page 2021
[21]

Leveraging real talking faces via self- supervision for robust forgery detection

Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, and Maja Pantic. Leveraging real talking faces via self- supervision for robust forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14950–14962, 2022. 5

work page 2022
[22]

Towards more general video-based deepfake detection through facial component guided adaptation for foundation model

Yue-Hua Han, Tai-Ming Huang, Kai-Lung Hua, and Jun- Cheng Chen. Towards more general video-based deepfake detection through facial component guided adaptation for foundation model. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 22995–23005,

work page
[23]

Parameter-efficient fine-tuning for large models: A comprehensive survey.Transactions on Machine Learning Research, 2024

Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. Parameter-efficient fine-tuning for large models: A comprehensive survey.Transactions on Machine Learning Research, 2024. 5

work page 2024
[24]

Unmasking illusions: Under- standing human perception of audiovisual deepfakes.arXiv preprint arXiv:2405.04097, 2024

Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, and Hsin-Min Wang. Unmasking illusions: Under- standing human perception of audiovisual deepfakes.arXiv preprint arXiv:2405.04097, 2024. 1

work page arXiv 2024
[25]

Polyglotfake: A novel multilingual and multimodal deepfake dataset

Yang Hou, Haitao Fu, Chunkai Chen, Zida Li, Haoyu Zhang, and Jianjun Zhao. Polyglotfake: A novel multilingual and multimodal deepfake dataset. InInternational Conference on Pattern Recognition, pages 180–193. Springer, 2024. 4

work page 2024
[26]

LoRA: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InIn- ternational Conference on Learning Representations, 2022. 5

work page 2022
[27]

Model attribution of face- swap deepfake videos

Shan Jia, Xin Li, and Siwei Lyu. Model attribution of face- swap deepfake videos. In2022 IEEE International Confer- 9 ence on Image Processing (ICIP), pages 2356–2360. IEEE,

work page
[28]

FakeA VCeleb: A novel audio-video multimodal deepfake dataset.arXiv preprint arXiv:2108.05080,

Hasam Khalid, Shahroz Tariq, Minha Kim, and Simon S Woo. FakeA VCeleb: A novel audio-video multimodal deep- fake dataset.arXiv preprint arXiv:2108.05080, 2021. 4

work page arXiv 2021
[29]

Clipping the deception: Adapting vision-language models for univer- sal deepfake detection

Sohail Ahmed Khan and Duc-Tien Dang-Nguyen. Clipping the deception: Adapting vision-language models for univer- sal deepfake detection. InProceedings of the 2024 Inter- national Conference on Multimedia Retrieval, pages 1006– 1015, 2024. 2

work page 2024
[30]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980,

work page internal anchor Pith review Pith/arXiv arXiv
[31]

Kodf: A large-scale korean deep- fake detection dataset

Patrick Kwon, Jaeseong You, Gyuhyeon Nam, Sungwoo Park, and Gyeongsu Chae. Kodf: A large-scale korean deep- fake detection dataset. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision, pages 10744– 10753, 2021. 4

work page 2021
[32]

Advancing high fidelity identity swapping for forgery detection

Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, and Fang Wen. Advancing high fidelity identity swapping for forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5074–5083,

work page
[33]

Vision-language model fine-tuning via simple parameter-efficient modification

Ming Li, Jike Zhong, Chenxin Li, Liuzhuozheng Li, Nie Lin, and Masashi Sugiyama. Vision-language model fine-tuning via simple parameter-efficient modification. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 14394–14410, Miami, Florida, USA, 2024. Association for Computational Linguistics. 5

work page 2024
[34]

Celeb-df: A large-scale challenging dataset for deep- fake forensics

Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. Celeb-df: A large-scale challenging dataset for deep- fake forensics. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3207– 3216, 2020. 4

work page 2020
[35]

Celeb-df++: A large-scale chal- lenging video deepfake benchmark for generalizable forensics,

Yuezun Li, Delong Zhu, Xinjie Cui, and Siwei Lyu. Celeb-df++: A large-scale challenging video deepfake benchmark for generalizable forensics.arXiv preprint arXiv:2507.18015, 2025. 3, 4

work page arXiv 2025
[36]

Forgery-aware adaptive transformer for generalizable synthetic image detection

Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Jingdong Wang, and Yao Zhao. Forgery-aware adaptive transformer for generalizable synthetic image detection. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 10770–10780, 2024. 2

work page 2024
[37]

Lips are lying: Spotting the temporal inconsistency between audio and visual in lip-syncing deep- fakes.Advances in Neural Information Processing Systems, 37:91131–91155, 2024

Weifeng Liu, Tianyi She, Jiawei Liu, Boheng Li, Dongyu Yao, and Run Wang. Lips are lying: Spotting the temporal inconsistency between audio and visual in lip-syncing deep- fakes.Advances in Neural Information Processing Systems, 37:91131–91155, 2024. 2, 4

work page 2024
[38]

Laa-net: Localized artifact attention network for quality-agnostic and generalizable deepfake de- tection

Dat Nguyen, Nesryne Mejri, Inder Pal Singh, Polina Kuleshova, Marcella Astrid, Anis Kacem, Enjie Ghorbel, and Djamila Aouada. Laa-net: Localized artifact attention network for quality-agnostic and generalizable deepfake de- tection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17395– 17405, 2024. 5

work page 2024
[39]

Ex- ploring self-supervised vision transformers for deepfake de- tection: A comparative analysis

Huy H Nguyen, Junichi Yamagishi, and Isao Echizen. Ex- ploring self-supervised vision transformers for deepfake de- tection: A comparative analysis. InProceedings of the IEEE International Joint Conference on Biometrics (IJCB), pages 1–10, 2024. 2

work page 2024
[40]

Towards uni- versal fake image detectors that generalize across genera- tive models

Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards uni- versal fake image detectors that generalize across genera- tive models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24480– 24489, 2023. 1, 2, 4

work page 2023
[41]

Omnisync: Towards universal lip synchronization via diffusion transformers.arXiv preprint arXiv:2505.21448,

Ziqiao Peng, Jiwen Liu, Haoxian Zhang, Xiaoqiang Liu, Songlin Tang, Pengfei Wan, Di Zhang, Hongyan Liu, and Jun He. Omnisync: Towards universal lip synchronization via diffusion transformers.arXiv preprint arXiv:2505.21448,

work page arXiv
[42]

Parameter-efficient tuning on layer normalization for pre- trained language models.arXiv preprint arXiv:2211.08682,

Wang Qi, Yu-Ping Ruan, Yuan Zuo, and Taihao Li. Parameter-efficient tuning on layer normalization for pre- trained language models.arXiv preprint arXiv:2211.08682,

work page arXiv
[43]

Learn- ing transferable visual models from natural language super- vision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learn- ing transferable visual models from natural language super- vision. InInternational Conference on Machine Learning, pages 8748–8763. PmLR, 2021. 1, 2

work page 2021
[44]

Faceforen- sics++: Learning to detect manipulated facial images

Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Chris- tian Riess, Justus Thies, and Matthias Nießner. Faceforen- sics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1–11, 2019. 3, 4

work page 2019
[45]

Detecting deep- fakes with self-blended images

Kaede Shiohara and Toshihiko Yamasaki. Detecting deep- fakes with self-blended images. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18720–18729, 2022. 2, 5

work page 2022
[46]

DINOv3

Oriane Sim ´eoni, Huy V V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Micha ¨el Ramamonjisoa, et al. Dinov3.arXiv preprint arXiv:2508.10104, 2025. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2025
[47]

Cyclical learning rates for training neural networks

Leslie N Smith. Cyclical learning rates for training neural networks. InProceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472. IEEE, 2017. 3

work page 2017
[48]

Synthesia, 2024.https://www.synthesia.io. 2

work page 2024
[49]

In- triguing properties of neural networks

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. In- triguing properties of neural networks. InProceedings of the International Conference on Learning Representations (ICLR), 2014. 7

work page 2014
[50]

Real appearance mod- eling for more general deepfake detection

Jiahe Tian, Cai Yu, Xi Wang, Peng Chen, Zihao Xiao, Jiao Dai, Jizhong Han, and Yesheng Chai. Real appearance mod- eling for more general deepfake detection. InEuropean Con- ference on Computer Vision, pages 402–419. Springer, 2024. 5

work page 2024
[51]

Layernorm: A key component in parameter-efficient fine-tuning.arXiv preprint arXiv:2403.20284, 2024

Taha ValizadehAslani and Hualou Liang. Layernorm: A key component in parameter-efficient fine-tuning.arXiv preprint arXiv:2403.20284, 2024. 2

work page arXiv 2024
[52]

Understanding contrastive representation learning through alignment and uniformity on the hypersphere

Tongzhou Wang and Phillip Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. InInternational conference on machine learning, pages 9929–9939. PMLR, 2020. 2, 5 10

work page 2020
[53]

Altfreezing for more general video face forgery detection

Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, and Houqiang Li. Altfreezing for more general video face forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4129–4138, 2023. 5

work page 2023
[54]

Identity-driven multimedia forgery detection via reference assistance

Junhao Xu, Jingjing Chen, Xue Song, Feng Han, Hai- jun Shan, and Yu-Gang Jiang. Identity-driven multimedia forgery detection via reference assistance. InProceedings of the 32nd ACM International Conference on Multimedia, pages 3887–3896, 2024. 4

work page 2024
[55]

Learning spatiotemporal inconsistency via thumbnail layout for face deepfake detection.International Journal of Com- puter Vision, 132(12):5663–5680, 2024

Yuting Xu, Jian Liang, Lijun Sheng, and Xiao-Yu Zhang. Learning spatiotemporal inconsistency via thumbnail layout for face deepfake detection.International Journal of Com- puter Vision, 132(12):5663–5680, 2024. 5

work page 2024
[56]

Deepfakebench: A comprehensive benchmark of deepfake detection

Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, and Baoyuan Wu. Deepfakebench: A comprehensive benchmark of deepfake detection. InAdvances in Neural Information Processing Systems, pages 4534–4565. Curran Associates, Inc., 2023. 3

work page 2023
[57]

Transcending forgery specificity with latent space augmentation for generalizable deepfake detection

Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, and Baoyuan Wu. Transcending forgery specificity with latent space augmentation for generalizable deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 8984–8994, 2024. 5

work page 2024
[58]

Orthogonal subspace decomposi- tion for generalizable AI-generated image detection

Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decomposi- tion for generalizable AI-generated image detection. InPro- ceedings of the International Conference on Machine Learn- ing, 2025. 2, 4, 5, 6, 8

work page 2025
[59]

Generalizing deepfake video detection with plug- and-play: Video-level blending and spatiotemporal adapter tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen, Mingyi Guo, Xinghe Fu, Taiping Yao, Shouhong Ding, Yunsheng Wu, and Li Yuan. Generalizing deepfake video detection with plug- and-play: Video-level blending and spatiotemporal adapter tuning. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12615–12625, 2025. 4, 5

work page 2025
[60]

Exposing deep fakes using inconsistent head poses

Xin Yang, Yuezun Li, and Siwei Lyu. Exposing deep fakes using inconsistent head poses. InProceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8261–8265. IEEE, 2019. 4

work page 2019
[61]

Faceguard: Proactive deepfake detection.arXiv preprint arXiv:2109.05673, 2021

Yuankun Yang, Chenyue Liang, Hongyu He, Xiaoyu Cao, and Neil Zhenqiang Gong. Faceguard: Proactive deepfake detection.arXiv preprint arXiv:2109.05673, 2021. 2

work page arXiv 2021
[62]

Defending fake via warning: Universal proactive defense against face manipulation.IEEE Signal Processing Letters, 30:1072–1076, 2023

Rui Zhai, Rongrong Ni, Yu Chen, Yang Yu, and Yao Zhao. Defending fake via warning: Universal proactive defense against face manipulation.IEEE Signal Processing Letters, 30:1072–1076, 2023. 2

work page 2023
[63]

Learning natural consistency represen- tation for face forgery video detection

Daichi Zhang, Zihao Xiao, Shikun Li, Fanzhao Lin, Jianmin Li, and Shiming Ge. Learning natural consistency represen- tation for face forgery video detection. InEuropean Con- ference on Computer Vision, pages 407–424. Springer, 2024. 5

work page 2024
[64]

Learning self-consistency for deepfake detection

Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, and Wei Xia. Learning self-consistency for deepfake detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 15023–15033, 2021. 5

work page 2021
[65]

Proactive image manipulation detection via deep semi-fragile watermark.Neurocomputing, 585:127593,

Yuan Zhao, Bo Liu, Tianqing Zhu, Ming Ding, Xin Yu, and Wanlei Zhou. Proactive image manipulation detection via deep semi-fragile watermark.Neurocomputing, 585:127593,

work page
[66]

Exploring temporal coherence for more gen- eral video face forgery detection

Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, and Fang Wen. Exploring temporal coherence for more gen- eral video face forgery detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 15044–15054, 2021. 4, 5

work page 2021
[67]

Face forensics in the wild

Tianfei Zhou, Wenguan Wang, Zhiyuan Liang, and Jian- bing Shen. Face forensics in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5778–5788, 2021. 3, 4 11

work page 2021

[1] [1]

Protecting world leaders against 8 deep fakes

Shruti Agarwal, Hany Farid, Yuming Gu, Mingming He, Koki Nagano, and Hao Li. Protecting world leaders against 8 deep fakes. InCVPR Workshops, 2019. 2

work page 2019

[2] [2]

Proactive image manipulation detection

Vishal Asnani, Xi Yin, Tal Hassner, Sijia Liu, and Xiaoming Liu. Proactive image manipulation detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15386–15395, 2022. 2

work page 2022

[3] [3]

MALP: manipulation localization using a proactive scheme

Vishal Asnani, Xi Yin, Tal Hassner, and Xiaoming Liu. MALP: manipulation localization using a proactive scheme. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12343–12352, 2023. 2

work page 2023

[4] [4]

Realistic and efficient face swapping: A unified approach with diffusion models

Sanoojan Baliah, Qinliang Lin, Shengcai Liao, Xiaodan Liang, and Muhammad Haris Khan. Realistic and efficient face swapping: A unified approach with diffusion models. In 2025 IEEE/CVF Winter Conference on Applications of Com- puter Vision (WACV), pages 1062–1071. IEEE, 2025. 1

work page 2025

[5] [5]

The DeepSpeak Dataset

Sarah Barrington, Matyas Bohacek, and Hany Farid. Deep- speak dataset v1.0.arXiv preprint arXiv:2408.05366, 2024. 3, 4

work page internal anchor Pith review Pith/arXiv arXiv 2024

[6] [6]

Per- turb, attend, detect and localize (PADL): Robust proactive image defense.IEEE Access, 2025

Filippo Bartolucci, Iacopo Masi, and Giuseppe Lisanti. Per- turb, attend, detect and localize (PADL): Robust proactive image defense.IEEE Access, 2025. 2

work page 2025

[7] [7]

Bit- Fit: Simple parameter-efficient fine-tuning for transformer- based masked language-models

Elad Ben Zaken, Yoav Goldberg, and Shauli Ravfogel. Bit- Fit: Simple parameter-efficient fine-tuning for transformer- based masked language-models. InProceedings of the 60th Annual Meeting of the Association for Computational Lin- guistics (Volume 2: Short Papers), pages 1–9, Dublin, Ire- land, 2022. Association for Computational Linguistics. 5

work page 2022

[8] [8]

Perception Encoder: The best visual embeddings are not at the output of the network

Daniel Bolya, Po-Yao Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, Chen Wei, Tengyu Ma, Jiale Zhi, Jathushan Rajasegaran, Hanoona Rasheed, et al. Perception encoder: The best visual embeddings are not at the output of the net- work.arXiv preprint arXiv:2504.13181, 2025. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2025

[9] [9]

One-shot neu- ral face reenactment via finding directions in GAN’s latent space.International Journal of Computer Vision, 132(8): 3324–3354, 2024

Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, and Georgios Tzimiropoulos. One-shot neu- ral face reenactment via finding directions in GAN’s latent space.International Journal of Computer Vision, 132(8): 3324–3354, 2024. 1

work page 2024

[10] [10]

Testing human ability to detect ‘deepfake’ images of human faces.Journal of Cybersecurity, 9(1):tyad011, 2023

Sergi D Bray, Shane D Johnson, and Bennett Kleinberg. Testing human ability to detect ‘deepfake’ images of human faces.Journal of Cybersecurity, 9(1):tyad011, 2023. 1

work page 2023

[11] [11]

Can we leave deepfake data behind in training deepfake detector? InThe Thirty-eighth Annual Conference on Neural Information Processing Sys- tems, 2024

Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, and Chen Li. Can we leave deepfake data behind in training deepfake detector? InThe Thirty-eighth Annual Conference on Neural Information Processing Sys- tems, 2024. 5

work page 2024

[12] [12]

Exploiting style latent flows for generalizing deepfake video detection

Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, and Jongwon Choi. Exploiting style latent flows for generalizing deepfake video detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1133–1143, 2024. 5

work page 2024

[13] [13]

Forensics adapter: Adapting CLIP for generaliz- able face forgery detection

Xinjie Cui, Yuezun Li, Ao Luo, Jiaran Zhou, and Junyu Dong. Forensics adapter: Adapting CLIP for generaliz- able face forgery detection. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 19207– 19217, 2025. 2, 4, 5, 6, 8

work page 2025

[14] [14]

Veo 3, 2025.https://deepmind

Google DeepMind. Veo 3, 2025.https://deepmind. google/models/veo/. 2

work page 2025

[15] [15]

Retinaface: Single-shot multi- level face localisation in the wild

Jiankang Deng, Jia Guo, Evangelos Ververas, Irene Kot- sia, and Stefanos Zafeiriou. Retinaface: Single-shot multi- level face localisation in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5203–5212, 2020. 3

work page 2020

[16] [16]

The DeepFake Detection Challenge (DFDC) Dataset

Brian Dolhansky, Joanna Bitton, Ben Pflaum, Jikuo Lu, Russ Howes, Menglin Wang, and Cristian Canton Ferrer. The deepfake detection challenge (DFDC) dataset.arXiv preprint arXiv:2006.07397, 2020. 4

work page internal anchor Pith review Pith/arXiv arXiv 2006

[17] [17]

Deepfakes De- tection Dataset by Google & Jigsaw.https : / / research

Nicholas Dufour, Andrew Gully, Per Karlsson, Alexey Victor V orbyov, Thomas Leung, Jeremiah Childs, and Christoph Bregler. Deepfakes De- tection Dataset by Google & Jigsaw.https : / / research . google / blog / contributing - data - to - deepfake - detection - research/,

work page

[18] [18]

Exploring unbiased deepfake detection via token-level shuffling and mixing.arXiv preprint arXiv:2501.04376,

Xinghe Fu, Zhiyuan Yan, Taiping Yao, Shen Chen, and Xi Li. Exploring unbiased deepfake detection via token-level shuffling and mixing.arXiv preprint arXiv:2501.04376,

work page arXiv

[19] [19]

The expressive power of tuning only the normal- ization layers.arXiv preprint arXiv:2302.07937, 2023

Angeliki Giannou, Shashank Rajput, and Dimitris Papail- iopoulos. The expressive power of tuning only the normal- ization layers.arXiv preprint arXiv:2302.07937, 2023. 2

work page arXiv 2023

[20] [20]

Lips don’t lie: A generalisable and robust approach to face forgery detection

Alexandros Haliassos, Konstantinos V ougioukas, Stavros Petridis, and Maja Pantic. Lips don’t lie: A generalisable and robust approach to face forgery detection. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5039–5049, 2021. 2, 4, 5

work page 2021

[21] [21]

Leveraging real talking faces via self- supervision for robust forgery detection

Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, and Maja Pantic. Leveraging real talking faces via self- supervision for robust forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14950–14962, 2022. 5

work page 2022

[22] [22]

Towards more general video-based deepfake detection through facial component guided adaptation for foundation model

Yue-Hua Han, Tai-Ming Huang, Kai-Lung Hua, and Jun- Cheng Chen. Towards more general video-based deepfake detection through facial component guided adaptation for foundation model. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 22995–23005,

work page

[23] [23]

Parameter-efficient fine-tuning for large models: A comprehensive survey.Transactions on Machine Learning Research, 2024

Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. Parameter-efficient fine-tuning for large models: A comprehensive survey.Transactions on Machine Learning Research, 2024. 5

work page 2024

[24] [24]

Unmasking illusions: Under- standing human perception of audiovisual deepfakes.arXiv preprint arXiv:2405.04097, 2024

Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, and Hsin-Min Wang. Unmasking illusions: Under- standing human perception of audiovisual deepfakes.arXiv preprint arXiv:2405.04097, 2024. 1

work page arXiv 2024

[25] [25]

Polyglotfake: A novel multilingual and multimodal deepfake dataset

Yang Hou, Haitao Fu, Chunkai Chen, Zida Li, Haoyu Zhang, and Jianjun Zhao. Polyglotfake: A novel multilingual and multimodal deepfake dataset. InInternational Conference on Pattern Recognition, pages 180–193. Springer, 2024. 4

work page 2024

[26] [26]

LoRA: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InIn- ternational Conference on Learning Representations, 2022. 5

work page 2022

[27] [27]

Model attribution of face- swap deepfake videos

Shan Jia, Xin Li, and Siwei Lyu. Model attribution of face- swap deepfake videos. In2022 IEEE International Confer- 9 ence on Image Processing (ICIP), pages 2356–2360. IEEE,

work page

[28] [28]

FakeA VCeleb: A novel audio-video multimodal deepfake dataset.arXiv preprint arXiv:2108.05080,

Hasam Khalid, Shahroz Tariq, Minha Kim, and Simon S Woo. FakeA VCeleb: A novel audio-video multimodal deep- fake dataset.arXiv preprint arXiv:2108.05080, 2021. 4

work page arXiv 2021

[29] [29]

Clipping the deception: Adapting vision-language models for univer- sal deepfake detection

Sohail Ahmed Khan and Duc-Tien Dang-Nguyen. Clipping the deception: Adapting vision-language models for univer- sal deepfake detection. InProceedings of the 2024 Inter- national Conference on Multimedia Retrieval, pages 1006– 1015, 2024. 2

work page 2024

[30] [30]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980,

work page internal anchor Pith review Pith/arXiv arXiv

[31] [31]

Kodf: A large-scale korean deep- fake detection dataset

Patrick Kwon, Jaeseong You, Gyuhyeon Nam, Sungwoo Park, and Gyeongsu Chae. Kodf: A large-scale korean deep- fake detection dataset. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision, pages 10744– 10753, 2021. 4

work page 2021

[32] [32]

Advancing high fidelity identity swapping for forgery detection

Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, and Fang Wen. Advancing high fidelity identity swapping for forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5074–5083,

work page

[33] [33]

Vision-language model fine-tuning via simple parameter-efficient modification

Ming Li, Jike Zhong, Chenxin Li, Liuzhuozheng Li, Nie Lin, and Masashi Sugiyama. Vision-language model fine-tuning via simple parameter-efficient modification. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 14394–14410, Miami, Florida, USA, 2024. Association for Computational Linguistics. 5

work page 2024

[34] [34]

Celeb-df: A large-scale challenging dataset for deep- fake forensics

Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. Celeb-df: A large-scale challenging dataset for deep- fake forensics. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3207– 3216, 2020. 4

work page 2020

[35] [35]

Celeb-df++: A large-scale chal- lenging video deepfake benchmark for generalizable forensics,

Yuezun Li, Delong Zhu, Xinjie Cui, and Siwei Lyu. Celeb-df++: A large-scale challenging video deepfake benchmark for generalizable forensics.arXiv preprint arXiv:2507.18015, 2025. 3, 4

work page arXiv 2025

[36] [36]

Forgery-aware adaptive transformer for generalizable synthetic image detection

Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Jingdong Wang, and Yao Zhao. Forgery-aware adaptive transformer for generalizable synthetic image detection. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 10770–10780, 2024. 2

work page 2024

[37] [37]

Lips are lying: Spotting the temporal inconsistency between audio and visual in lip-syncing deep- fakes.Advances in Neural Information Processing Systems, 37:91131–91155, 2024

Weifeng Liu, Tianyi She, Jiawei Liu, Boheng Li, Dongyu Yao, and Run Wang. Lips are lying: Spotting the temporal inconsistency between audio and visual in lip-syncing deep- fakes.Advances in Neural Information Processing Systems, 37:91131–91155, 2024. 2, 4

work page 2024

[38] [38]

Laa-net: Localized artifact attention network for quality-agnostic and generalizable deepfake de- tection

Dat Nguyen, Nesryne Mejri, Inder Pal Singh, Polina Kuleshova, Marcella Astrid, Anis Kacem, Enjie Ghorbel, and Djamila Aouada. Laa-net: Localized artifact attention network for quality-agnostic and generalizable deepfake de- tection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17395– 17405, 2024. 5

work page 2024

[39] [39]

Ex- ploring self-supervised vision transformers for deepfake de- tection: A comparative analysis

Huy H Nguyen, Junichi Yamagishi, and Isao Echizen. Ex- ploring self-supervised vision transformers for deepfake de- tection: A comparative analysis. InProceedings of the IEEE International Joint Conference on Biometrics (IJCB), pages 1–10, 2024. 2

work page 2024

[40] [40]

Towards uni- versal fake image detectors that generalize across genera- tive models

Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards uni- versal fake image detectors that generalize across genera- tive models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24480– 24489, 2023. 1, 2, 4

work page 2023

[41] [41]

Omnisync: Towards universal lip synchronization via diffusion transformers.arXiv preprint arXiv:2505.21448,

Ziqiao Peng, Jiwen Liu, Haoxian Zhang, Xiaoqiang Liu, Songlin Tang, Pengfei Wan, Di Zhang, Hongyan Liu, and Jun He. Omnisync: Towards universal lip synchronization via diffusion transformers.arXiv preprint arXiv:2505.21448,

work page arXiv

[42] [42]

Parameter-efficient tuning on layer normalization for pre- trained language models.arXiv preprint arXiv:2211.08682,

Wang Qi, Yu-Ping Ruan, Yuan Zuo, and Taihao Li. Parameter-efficient tuning on layer normalization for pre- trained language models.arXiv preprint arXiv:2211.08682,

work page arXiv

[43] [43]

Learn- ing transferable visual models from natural language super- vision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learn- ing transferable visual models from natural language super- vision. InInternational Conference on Machine Learning, pages 8748–8763. PmLR, 2021. 1, 2

work page 2021

[44] [44]

Faceforen- sics++: Learning to detect manipulated facial images

Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Chris- tian Riess, Justus Thies, and Matthias Nießner. Faceforen- sics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1–11, 2019. 3, 4

work page 2019

[45] [45]

Detecting deep- fakes with self-blended images

Kaede Shiohara and Toshihiko Yamasaki. Detecting deep- fakes with self-blended images. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18720–18729, 2022. 2, 5

work page 2022

[46] [46]

DINOv3

Oriane Sim ´eoni, Huy V V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Micha ¨el Ramamonjisoa, et al. Dinov3.arXiv preprint arXiv:2508.10104, 2025. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2025

[47] [47]

Cyclical learning rates for training neural networks

Leslie N Smith. Cyclical learning rates for training neural networks. InProceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472. IEEE, 2017. 3

work page 2017

[48] [48]

Synthesia, 2024.https://www.synthesia.io. 2

work page 2024

[49] [49]

In- triguing properties of neural networks

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. In- triguing properties of neural networks. InProceedings of the International Conference on Learning Representations (ICLR), 2014. 7

work page 2014

[50] [50]

Real appearance mod- eling for more general deepfake detection

Jiahe Tian, Cai Yu, Xi Wang, Peng Chen, Zihao Xiao, Jiao Dai, Jizhong Han, and Yesheng Chai. Real appearance mod- eling for more general deepfake detection. InEuropean Con- ference on Computer Vision, pages 402–419. Springer, 2024. 5

work page 2024

[51] [51]

Layernorm: A key component in parameter-efficient fine-tuning.arXiv preprint arXiv:2403.20284, 2024

Taha ValizadehAslani and Hualou Liang. Layernorm: A key component in parameter-efficient fine-tuning.arXiv preprint arXiv:2403.20284, 2024. 2

work page arXiv 2024

[52] [52]

Understanding contrastive representation learning through alignment and uniformity on the hypersphere

Tongzhou Wang and Phillip Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. InInternational conference on machine learning, pages 9929–9939. PMLR, 2020. 2, 5 10

work page 2020

[53] [53]

Altfreezing for more general video face forgery detection

Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, and Houqiang Li. Altfreezing for more general video face forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4129–4138, 2023. 5

work page 2023

[54] [54]

Identity-driven multimedia forgery detection via reference assistance

Junhao Xu, Jingjing Chen, Xue Song, Feng Han, Hai- jun Shan, and Yu-Gang Jiang. Identity-driven multimedia forgery detection via reference assistance. InProceedings of the 32nd ACM International Conference on Multimedia, pages 3887–3896, 2024. 4

work page 2024

[55] [55]

Learning spatiotemporal inconsistency via thumbnail layout for face deepfake detection.International Journal of Com- puter Vision, 132(12):5663–5680, 2024

Yuting Xu, Jian Liang, Lijun Sheng, and Xiao-Yu Zhang. Learning spatiotemporal inconsistency via thumbnail layout for face deepfake detection.International Journal of Com- puter Vision, 132(12):5663–5680, 2024. 5

work page 2024

[56] [56]

Deepfakebench: A comprehensive benchmark of deepfake detection

Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, and Baoyuan Wu. Deepfakebench: A comprehensive benchmark of deepfake detection. InAdvances in Neural Information Processing Systems, pages 4534–4565. Curran Associates, Inc., 2023. 3

work page 2023

[57] [57]

Transcending forgery specificity with latent space augmentation for generalizable deepfake detection

Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, and Baoyuan Wu. Transcending forgery specificity with latent space augmentation for generalizable deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 8984–8994, 2024. 5

work page 2024

[58] [58]

Orthogonal subspace decomposi- tion for generalizable AI-generated image detection

Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decomposi- tion for generalizable AI-generated image detection. InPro- ceedings of the International Conference on Machine Learn- ing, 2025. 2, 4, 5, 6, 8

work page 2025

[59] [59]

Generalizing deepfake video detection with plug- and-play: Video-level blending and spatiotemporal adapter tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen, Mingyi Guo, Xinghe Fu, Taiping Yao, Shouhong Ding, Yunsheng Wu, and Li Yuan. Generalizing deepfake video detection with plug- and-play: Video-level blending and spatiotemporal adapter tuning. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12615–12625, 2025. 4, 5

work page 2025

[60] [60]

Exposing deep fakes using inconsistent head poses

Xin Yang, Yuezun Li, and Siwei Lyu. Exposing deep fakes using inconsistent head poses. InProceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8261–8265. IEEE, 2019. 4

work page 2019

[61] [61]

Faceguard: Proactive deepfake detection.arXiv preprint arXiv:2109.05673, 2021

Yuankun Yang, Chenyue Liang, Hongyu He, Xiaoyu Cao, and Neil Zhenqiang Gong. Faceguard: Proactive deepfake detection.arXiv preprint arXiv:2109.05673, 2021. 2

work page arXiv 2021

[62] [62]

Defending fake via warning: Universal proactive defense against face manipulation.IEEE Signal Processing Letters, 30:1072–1076, 2023

Rui Zhai, Rongrong Ni, Yu Chen, Yang Yu, and Yao Zhao. Defending fake via warning: Universal proactive defense against face manipulation.IEEE Signal Processing Letters, 30:1072–1076, 2023. 2

work page 2023

[63] [63]

Learning natural consistency represen- tation for face forgery video detection

Daichi Zhang, Zihao Xiao, Shikun Li, Fanzhao Lin, Jianmin Li, and Shiming Ge. Learning natural consistency represen- tation for face forgery video detection. InEuropean Con- ference on Computer Vision, pages 407–424. Springer, 2024. 5

work page 2024

[64] [64]

Learning self-consistency for deepfake detection

Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, and Wei Xia. Learning self-consistency for deepfake detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 15023–15033, 2021. 5

work page 2021

[65] [65]

Proactive image manipulation detection via deep semi-fragile watermark.Neurocomputing, 585:127593,

Yuan Zhao, Bo Liu, Tianqing Zhu, Ming Ding, Xin Yu, and Wanlei Zhou. Proactive image manipulation detection via deep semi-fragile watermark.Neurocomputing, 585:127593,

work page

[66] [66]

Exploring temporal coherence for more gen- eral video face forgery detection

Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, and Fang Wen. Exploring temporal coherence for more gen- eral video face forgery detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 15044–15054, 2021. 4, 5

work page 2021

[67] [67]

Face forensics in the wild

Tianfei Zhou, Wenguan Wang, Zhiyuan Liang, and Jian- bing Shen. Face forensics in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5778–5788, 2021. 3, 4 11

work page 2021