pith. machine review for the scientific record.

arxiv: 2604.17477 · v1 · submitted 2026-04-19 · 💻 cs.CV · cs.LG


Unveiling Deepfakes: A Frequency-Aware Triple Branch Network for Deepfake Detection


Pith reviewed 2026-05-10 06:41 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords deepfake detection · frequency analysis · triple-branch network · mutual information · feature decoupling · forgery robustness · image reconstruction

The pith

A triple-branch network jointly analyzes original and frequency-reconstructed images with mutual information losses to detect deepfakes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing deepfake detectors often focus on limited frequency ranges, causing them to overfit to specific artifacts and miss varied forgery patterns. They also suffer from redundant features when multiple branches attend to the same manipulated regions. This paper introduces a triple-branch architecture that processes the original image alongside reconstructions from different frequency channels, paired with new losses derived from mutual information theory to decouple overlapping features and fuse complementary ones. The result is improved generalization across forgery types. Experiments confirm superior performance on six large benchmark datasets.

Core claim

We propose a triple-branch network that jointly captures spatial and frequency features by learning from both the original image and images reconstructed from different frequency channels, and we mathematically derive feature decoupling and fusion losses grounded in mutual information theory, which encourage the model to focus on task-relevant features across the original and reconstructed images. Extensive experiments on six large-scale benchmark datasets demonstrate that our method consistently achieves state-of-the-art performance.

What carries the argument

Triple-branch network with frequency channel reconstructions and mutual-information-based decoupling and fusion losses, which separate redundant features and promote complementary spatial-frequency clues.
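
The page does not reproduce the paper's exact channel-selection rule, so the following is a minimal sketch of the general pattern only: DCT-band masking produces the two reconstructions, and three off-the-shelf ResNet-18 encoders serve as branches. Every name, band boundary, and fusion choice here is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of a triple-branch detector of the kind described above.
# Assumptions (not from the paper): DCT-band masking for the reconstructions,
# ResNet-18 encoders, and plain concatenation before the classifier head.
import torch
import torch.nn as nn
from scipy.fft import dctn, idctn
from torchvision.models import resnet18

def band_reconstruct(img, lo, hi):
    """Keep DCT coefficients whose diagonal index u+v lies in [lo, hi); zero the rest."""
    x = img.detach().cpu().numpy()                   # done outside autograd for clarity
    coeff = dctn(x, axes=(-2, -1), norm="ortho")
    h, w = coeff.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    mask = ((yy + xx >= lo) & (yy + xx < hi)).numpy()
    return torch.from_numpy(idctn(coeff * mask, axes=(-2, -1), norm="ortho")).float()

class TripleBranch(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        backbone = lambda: nn.Sequential(*list(resnet18(weights=None).children())[:-1])
        self.spatial, self.low, self.high = backbone(), backbone(), backbone()
        self.head = nn.Linear(3 * 512, n_classes)

    def forward(self, img):                          # img: (B, 3, H, W)
        cut = sum(img.shape[-2:]) // 4               # assumed low/high split point
        views = [img,
                 band_reconstruct(img, 0, cut),                    # low-frequency view
                 band_reconstruct(img, cut, sum(img.shape[-2:]))]  # high-frequency view
        feats = [b(v).flatten(1)
                 for b, v in zip((self.spatial, self.low, self.high), views)]
        return self.head(torch.cat(feats, dim=1)), feats  # logits + per-branch features
```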

If this is right

  • The model reduces overfitting to particular frequency artifacts by incorporating multiple reconstruction branches.
  • Feature representations become more diverse as the losses prevent branches from focusing on identical forged regions.
  • Generalization capability improves for detecting diverse and unseen manipulation techniques.
  • Consistent state-of-the-art performance is achieved across six large-scale benchmark datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This frequency-aware approach could be extended to other image forensics tasks such as identifying image manipulations beyond deepfakes.
  • The mutual information losses may help in designing multi-view networks for tasks like object detection where feature complementarity matters.
  • Real-world deployment would benefit from testing on deepfakes from social media platforms with compression artifacts.
  • Adaptive selection of frequency channels instead of fixed ones might further enhance the method's coverage of forgery types.

Load-bearing premise

The mathematically derived feature decoupling and fusion losses must reliably steer each branch toward task-relevant, complementary features without introducing new overfitting risks when facing diverse forgery patterns.
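
The identity such derivations typically lean on is the chain rule of mutual information, stated here for two branch features and the label (a standard result, not the paper's Eqs. (4)–(6)):

```latex
\[
  I(Y; F_1, F_2) \;=\; I(Y; F_1) \;+\; I(Y; F_2 \mid F_1)
\]
```

If a decoupling penalty drives I(F_1; F_2) toward zero while each branch stays predictive, the conditional term must carry genuinely new evidence; whether a neurally estimated MI penalty achieves this in practice is exactly what the premise assumes.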

What would settle it

Testing on a dataset of deepfakes engineered with uniformly distributed frequency artifacts or where all single-frequency methods fail would show whether the triple-branch design and losses deliver a genuine advantage.
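
As a concrete and entirely hypothetical illustration of "uniformly distributed frequency artifacts", one could perturb images with spectrally flat noise in the DCT domain, so that no single band carries more forgery energy than another; a synthetic probe, not a protocol from the paper.

```python
import numpy as np
from scipy.fft import dctn, idctn

def flat_spectrum_perturb(img, eps=0.01, seed=0):
    """Add noise with equal expected energy in every DCT coefficient (last two axes).
    Synthetic stress probe: artifact energy is spread uniformly across bands."""
    rng = np.random.default_rng(seed)
    coeff = dctn(img, axes=(-2, -1), norm="ortho")
    coeff += eps * rng.standard_normal(coeff.shape)   # spectrally flat perturbation
    return idctn(coeff, axes=(-2, -1), norm="ortho")
```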

Figures

Figures reproduced from arXiv: 2604.17477 by Jiaxing Xuan, Kui Ren, Qihao Shen, Sifan Wu, Yingying Jiao, Yutong Xie, Zhaoyan Ming, Zhenguang Liu.

Figure 1: The input image is transformed into the frequency domain via discrete cosine transform (DCT), where different frequency channels reveal distinct …

Figure 2: Overview of the proposed framework. The dynamic frequency channel selection (DFCS) module first transforms the input image into multiple …

Figure 3: Detailed flowchart of the Global Fusion Module (GFM). The module transforms the simply concatenated features F into a discriminative fused …

Figure 4: In-dataset impact of channel number on AUC, ACC and F1 score.

Figure 5: Cross-dataset impact of channel number on AUC, ACC and F1 score.

Figure 6: Visual examples of our method on Deepfakes (DF), Face2Face (F2F), FaceSwap (FS), NeuralTextures (NT), CDF, CDF2, DFDC and DFDC-P datasets.
Original abstract

Advanced deepfake technologies are blurring the lines between real and fake, presenting both revolutionary opportunities and alarming threats. While it unlocks novel applications in fields like entertainment and education, its malicious use has sparked urgent ethical and societal concerns ranging from identity theft to the dissemination of misinformation. To tackle these challenges, feature analysis using frequency features has emerged as a promising direction for deepfake detection. However, one aspect that has been overlooked so far is that existing methods tend to concentrate on one or a few specific frequency domains, which risks overfitting to particular artifacts and significantly undermines their robustness when facing diverse forgery patterns. Another underexplored aspect we observe is that different features often attend to the same forged region, resulting in redundant feature representations and limiting the diversity of the extracted clues. This may undermine the ability of a model to capture complementary information across different facets, thereby compromising its generalization capability to diverse manipulations. In this paper, we seek to tackle these challenges from two aspects: (1) we propose a triple-branch network that jointly captures spatial and frequency features by learning from both original image and image reconstructed by different frequency channels, and (2) we mathematically derive feature decoupling and fusion losses grounded in the mutual information theory, which enhances the model to focus on task-relevant features across the original image and the image reconstructed by different frequency channels. Extensive experiments on six large-scale benchmark datasets demonstrate that our method consistently achieves state-of-the-art performance. Our code is released at https://github.com/injooker/Unveiling Deepfake.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a triple-branch CNN architecture for deepfake detection that processes an input image together with two versions reconstructed from selected frequency channels. It introduces two losses derived from mutual information theory—one for decoupling redundant features across branches and one for fusing complementary task-relevant information—and reports state-of-the-art accuracy on six public benchmarks while releasing code.

Significance. If the MI-derived losses demonstrably produce non-redundant, forgery-sensitive representations without estimator-induced overfitting, the work would strengthen frequency-aware detection pipelines and reduce reliance on single-domain artifacts. The public code release supports reproducibility and allows direct verification of the claimed gains.

major comments (2)
  1. [§3.2] Eq. (4)–(6): The decoupling and fusion losses are presented as direct consequences of mutual-information identities, yet the implementation necessarily employs a variational or neural MI estimator for high-dimensional frequency features. No ablation, bias/variance analysis, or comparison against alternative estimators is reported, leaving open the possibility that the losses fail to enforce the intended complementarity and instead fit dataset-specific noise.
  2. [§4.3] Tables 2–5: The SOTA claim rests on single-run accuracy numbers across six datasets without reported standard deviations, multiple random seeds, or statistical significance tests against the strongest baselines. This omission prevents assessment of whether the observed margins are robust or could be explained by training stochasticity.
minor comments (2)
  1. [Abstract] Typographical errors (“emergedas”, “oneaspect”) should be corrected for readability.
  2. [§3.1] The precise selection rule for the frequency channels used in reconstruction is stated only qualitatively; an explicit algorithm or hyper-parameter table would aid reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our frequency-aware triple-branch network and the mutual information losses. The comments highlight important aspects of estimator reliability and result robustness that we address below. We indicate planned revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3.2] Eq. (4)–(6): The decoupling and fusion losses are presented as direct consequences of mutual-information identities, yet the implementation necessarily employs a variational or neural MI estimator for high-dimensional frequency features. No ablation, bias/variance analysis, or comparison against alternative estimators is reported, leaving open the possibility that the losses fail to enforce the intended complementarity and instead fit dataset-specific noise.

    Authors: We agree that the practical implementation uses a neural MI estimator (MINE) to approximate the high-dimensional terms in Eqs. (4)–(6). The theoretical identities hold in the ideal case, and our empirical gains across six benchmarks support that the losses promote complementarity rather than noise fitting. However, the absence of estimator-specific ablations is a valid limitation. In the revised manuscript, we will expand §3.2 with a discussion of the estimator choice, add a comparison to an alternative (InfoNCE-based) estimator on one dataset, and include training curves for the MI loss values to illustrate stability. This provides partial but substantive additional evidence without requiring entirely new large-scale runs. revision: partial
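
For orientation, a MINE-style (Donsker–Varadhan) bound of the kind named in this response looks roughly as follows; a generic sketch of the estimator family, not the authors' code, with the statistics network T assumed to be a small MLP.

```python
import torch
import torch.nn as nn

class MINEBound(nn.Module):
    """Donsker-Varadhan lower bound on I(F1; F2). Used as a decoupling
    penalty, the bound is *minimized* w.r.t. the feature encoders."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.T = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def forward(self, f1, f2):
        joint = self.T(torch.cat([f1, f2], dim=1))      # paired samples ~ p(f1, f2)
        shuffled = f2[torch.randperm(f2.size(0))]       # break pairing ~ p(f1)p(f2)
        marginal = self.T(torch.cat([f1, shuffled], dim=1))
        n = torch.tensor(float(marginal.size(0)))
        # I(F1; F2) >= E_joint[T] - log E_marginal[exp(T)]
        return joint.mean() - (torch.logsumexp(marginal, dim=0) - torch.log(n)).squeeze()
```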

  2. Referee: [§4.3] Tables 2–5: The SOTA claim rests on single-run accuracy numbers across six datasets without reported standard deviations, multiple random seeds, or statistical significance tests against the strongest baselines. This omission prevents assessment of whether the observed margins are robust or could be explained by training stochasticity.

    Authors: The referee is correct that single-run results limit evaluation of variability due to training stochasticity. In the revised version, we will re-train our method and the primary baselines using multiple random seeds (minimum of three), report mean accuracy ± standard deviation in updated Tables 2–5, and add paired statistical significance tests against the strongest competitors. These changes will be incorporated into §4.3 to directly address concerns about robustness. revision: yes
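
The promised analysis is mechanically simple; a sketch with placeholder numbers (all values hypothetical, shown only to illustrate the procedure, not results from the paper):

```python
import numpy as np
from scipy import stats

# Per-seed AUCs; the numbers below are placeholders, not reported results.
ours     = np.array([0.915, 0.921, 0.918])   # proposed method, three seeds
baseline = np.array([0.902, 0.907, 0.905])   # strongest baseline, same seeds

print(f"ours:     {ours.mean():.3f} ± {ours.std(ddof=1):.3f}")
print(f"baseline: {baseline.mean():.3f} ± {baseline.std(ddof=1):.3f}")

t, p = stats.ttest_rel(ours, baseline)       # paired test across seeds
print(f"paired t = {t:.2f}, p = {p:.3f}")
```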

Circularity Check

0 steps flagged

No circularity: losses derived from external MI theory; architecture and claims are independent

Full rationale

The paper proposes a triple-branch network motivated by observed limitations in prior frequency-based detectors and derives its decoupling and fusion losses from mutual information theory, an external framework. None of the equations or derivation steps reduces the claimed performance gains, feature complementarity, or SOTA results to fitted parameters, self-definitions, or self-citation chains. Empirical validation on six benchmarks provides independent falsifiability rather than tautological equivalence to the inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard deep learning optimization assumptions plus the validity of applying mutual information theory to enforce feature complementarity in this architecture; no new physical entities are introduced.

free parameters (2)
  • loss weighting coefficients for decoupling and fusion terms
    Hyperparameters that balance the mutual information losses against the main detection objective and must be tuned on validation data (see the sketch after this ledger).
  • number and selection of frequency channels for image reconstruction
    Choices that determine which frequency bands are used in the two reconstruction branches.
axioms (1)
  • domain assumption: Mutual information theory provides a principled way to quantify and minimize redundant information across feature branches for improved generalization.
    Invoked to justify the derived decoupling and fusion losses.
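
For orientation, a plausible shape for how the two loss-weighting coefficients enter the training objective, assumed rather than transcribed from the paper:

```latex
\[
  \mathcal{L}_{\text{total}}
    \;=\; \mathcal{L}_{\text{cls}}
    \;+\; \lambda_{\text{dec}}\,\mathcal{L}_{\text{dec}}
    \;+\; \lambda_{\text{fus}}\,\mathcal{L}_{\text{fus}},
  \qquad \lambda_{\text{dec}},\ \lambda_{\text{fus}}\ \text{tuned on validation data.}
\]
```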

pith-pipeline@v0.9.0 · 5600 in / 1320 out tokens · 42316 ms · 2026-05-10T06:41:23.222218+00:00 · methodology


Reference graph

Works this paper leans on

62 extracted references · 11 canonical work pages · 2 internal anchors

  1. [1]

    Face2face: Real-time face capture and reenactment of rgb videos,

    J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, and M. Nießner, “Face2face: Real-time face capture and reenactment of rgb videos,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2387–2395

  2. [2]

    Simswap: An efficient framework for high fidelity face swapping,

    R. Chen, X. Chen, B. Ni, and Y. Ge, “Simswap: An efficient framework for high fidelity face swapping,” in Proceedings of the 28th ACM international conference on multimedia, 2020, pp. 2003–2011

  3. [3]

    Fsgan: Subject agnostic face swapping and reenactment,

    Y. Nirkin, Y. Keller, and T. Hassner, “Fsgan: Subject agnostic face swapping and reenactment,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 7184–7193

  4. [4]

    Deep dual consecutive network for human pose estimation,

    Z. Liu, H. Chen, R. Feng, S. Wu, S. Ji, B. Yang, and X. Wang, “Deep dual consecutive network for human pose estimation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 525–534

  5. [5]

    Deepfakes and beyond: A survey of face manipulation and fake detection,

    R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, and J. Ortega-Garcia, “Deepfakes and beyond: A survey of face manipulation and fake detection,” Information Fusion, vol. 64, pp. 131–148, 2020

  6. [6]

    Bi-directional distribution alignment for transductive zero-shot learning,

    Z. Wang, Y. Hao, T. Mu, O. Li, S. Wang, and X. He, “Bi-directional distribution alignment for transductive zero-shot learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19893–19902

  7. [7]

    Sfe-net: Spatial-frequency enhancement network for robust nuclei segmentation in histopathology images,

    J. Chen, G. Yang, A. Liu, X. Chen, and J. Liu, “Sfe-net: Spatial-frequency enhancement network for robust nuclei segmentation in histopathology images,” Computers in Biology and Medicine, vol. 171, p. 108131, 2024

  8. [8]

    Mix-dann and dynamic-modal-distillation for video domain adaptation,

    Y. Yin, B. Zhu, J. Chen, L. Cheng, and Y.-G. Jiang, “Mix-dann and dynamic-modal-distillation for video domain adaptation,” in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 3224–3233

  9. [9]

    Dfil: Deepfake incremental learning by exploiting domain-invariant forgery clues,

    K. Pan, Y. Yin, Y. Wei, F. Lin, Z. Ba, Z. Liu, Z. Wang, L. Cavallaro, and K. Ren, “Dfil: Deepfake incremental learning by exploiting domain-invariant forgery clues,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 8035–8046

  10. [10]

    Copy motion from one to another: Fake motion video generation,

    Z. Liu, S. Wu, C. Xu, X. Wang, L. Zhu, S. Wu, and F. Feng, “Copy motion from one to another: Fake motion video generation,” arXiv preprint arXiv:2205.01373, 2022

  11. [11]

    Synthesizing obama: learning lip sync from audio,

    S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Shlizerman, “Synthesizing obama: learning lip sync from audio,” ACM Transactions on Graphics (ToG), vol. 36, no. 4, pp. 1–13, 2017

  12. [12]

    W. Staff. (2022) Deepfake video of Zelensky urging surrender circulates online. Accessed: 2025-08-16. [Online]. Available: https://www.wired.com/story/zelensky-deepfake-facebook-twitter-playbook/

  13. [13]

    Lips don’t lie: A generalisable and robust approach to face forgery detection,

    A. Haliassos, K. Vougioukas, S. Petridis, and M. Pantic, “Lips don’t lie: A generalisable and robust approach to face forgery detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 5039–5049

  14. [14]

    Face x-ray for more general face forgery detection,

    L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, and B. Guo, “Face x-ray for more general face forgery detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 5001–5010

  15. [15]

    Thinking in frequency: Face forgery detection by mining frequency-aware clues,

    Y. Qian, G. Yin, L. Sheng, Z. Chen, and J. Shao, “Thinking in frequency: Face forgery detection by mining frequency-aware clues,” in European conference on computer vision. Springer, 2020, pp. 86–103

  16. [16]

    Two-branch recurrent network for isolating deepfakes in videos,

    I. Masi, A. Killekar, R. M. Mascarenhas, S. P. Gurudatt, and W. AbdAlmageed, “Two-branch recurrent network for isolating deepfakes in videos,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16. Springer, 2020, pp. 667–684

  17. [17]

    Rich models for steganalysis of digital images,

    J. Fridrich and J. Kodovsky, “Rich models for steganalysis of digital images,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 868–882, 2012

  18. [18]

    Sfiad: Deepfake detection through spatial-frequency feature integration and dynamic margin optimization,

    Y. Kou, P. Li, H. Ma, J. Zhou, X. Li et al., “Sfiad: Deepfake detection through spatial-frequency feature integration and dynamic margin optimization,” Artificial Intelligence Review, vol. 58, no. 7, pp. 1–22, 2025

  19. [19]

    A spatio-frequency cross fusion model for deepfake detection and segmentation,

    J. Zheng, Y. Zhou, N. Zhang, X. Hu, K. Xu, D. Gao, and Z. Tang, “A spatio-frequency cross fusion model for deepfake detection and segmentation,” Neurocomputing, vol. 628, p. 129683, 2025

  20. [20]

    Spatial-frequency feature fusion based deepfake detection through knowledge distillation,

    B. Wang, X. Wu, F. Wang, Y. Zhang, F. Wei, and Z. Song, “Spatial-frequency feature fusion based deepfake detection through knowledge distillation,” Engineering Applications of Artificial Intelligence, vol. 133, p. 108341, 2024

  21. [21]

    Df40: Toward next-generation deepfake detection,

    Z. Yan, T. Yao, S. Chen, Y. Zhao, X. Fu, J. Zhu, D. Luo, C. Wang, S. Ding, Y. Wu, and L. Yuan, “Df40: Toward next-generation deepfake detection,” 2024. [Online]. Available: https://arxiv.org/abs/2406.13495

  22. [22]

    Faceforensics++: Learning to detect manipulated facial images,

    A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, “Faceforensics++: Learning to detect manipulated facial images,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 1–11

  23. [23]

    Celeb-df: A large-scale challenging dataset for deepfake forensics,

    Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, “Celeb-df: A large-scale challenging dataset for deepfake forensics,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 3207–3216

  24. [24]

    Deepfake detection challenge,

    Deepfake detection challenge, 2020, accessed: 2025-01-12. [Online]. Available: https://kaggle.com/c/Deepfake-detection-challenge

  25. [25]

    Locate and verify: A two-stream network for improved deepfake detection,

    C. Shuai, J. Zhong, S. Wu, F. Lin, Z. Wang, Z. Ba, Z. Liu, L. Cavallaro, and K. Ren, “Locate and verify: A two-stream network for improved deepfake detection,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 7131–7142

  26. [26]

    Freqblender: Enhancing deepfake detection by blending frequency knowledge,

    H. Li, Y. Li, J. Zhou, B. Li, and J. Dong, “Freqblender: Enhancing deepfake detection by blending frequency knowledge,” arXiv preprint arXiv:2404.13872, 2024

  27. [27]

    Protecting celebrities from deepfake with identity consistency transformer,

    X. Dong, J. Bao, D. Chen, T. Zhang, W. Zhang, N. Yu, D. Chen, F. Wen, and B. Guo, “Protecting celebrities from deepfake with identity consistency transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 9468–9478

  28. [28]

    Learning second-order local anomaly for general face forgery detection,

    J. Fei, Y. Dai, P. Yu, T. Shen, Z. Xia, and J. Weng, “Learning second-order local anomaly for general face forgery detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 20270–20280

  29. [29]

    Dual contrastive learning for general face forgery detection,

    K. Sun, T. Yao, S. Chen, S. Ding, J. Li, and R. Ji, “Dual contrastive learning for general face forgery detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 2316–2324

  30. [30]

    Uia-vit: Unsupervised inconsistency-aware method based on vision transformer for face forgery detection,

    W. Zhuang, Q. Chu, Z. Tan, Q. Liu, H. Yuan, C. Miao, Z. Luo, and N. Yu, “Uia-vit: Unsupervised inconsistency-aware method based on vision transformer for face forgery detection,” in European conference on computer vision. Springer, 2022, pp. 391–407

  31. [31]

    Detecting deepfakes with self-blended images,

    K. Shiohara and T. Yamasaki, “Detecting deepfakes with self-blended images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 18720–18729

  32. [32]

    Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection,

    J. Li, H. Xie, J. Li, Z. Wang, and Y. Zhang, “Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 6458–6467

  33. [33]

    Leveraging frequency analysis for deep fake image recognition,

    J. Frank, T. Eisenhofer, L. Schönherr, A. Fischer, D. Kolossa, and T. Holz, “Leveraging frequency analysis for deep fake image recognition,” in International conference on machine learning. PMLR, 2020, pp. 3247–3258

  34. [34]

    Fcanet: Frequency channel attention networks,

    Z. Qin, P. Zhang, F. Wu, and X. Li, “Fcanet: Frequency channel attention networks,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 783–792

  35. [35]

    Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions,

    R. Durall, M. Keuper, and J. Keuper, “Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

  36. [36]

    Learning in the frequency domain,

    K. Xu, M. Qin, F. Sun, Y. Wang, Y.-K. Chen, and F. Ren, “Learning in the frequency domain,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 1740–1749

  37. [37]

    Privacy-preserving face recognition with learnable privacy budgets in frequency domain,

    J. Ji, H. Wang, Y. Huang, J. Wu, X. Xu, S. Ding, S. Zhang, L. Cao, and R. Ji, “Privacy-preserving face recognition with learnable privacy budgets in frequency domain,” in European Conference on Computer Vision. Springer, 2022, pp. 475–491

  38. [38]

    Exposing the deception: Uncovering more forgery clues for deepfake detection,

    Z. Ba, Q. Liu, Z. Liu, S. Wu, F. Lin, L. Lu, and K. Ren, “Exposing the deception: Uncovering more forgery clues for deepfake detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 2, 2024, pp. 719–728

  39. [39]

    Adam: A Method for Stochastic Optimization

    D. P. Kingma, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014

  40. [40]

    Auxiliary tasks in multi-task learning,

    L. Liebel and M. Körner, “Auxiliary tasks in multi-task learning,” arXiv preprint arXiv:1805.06334, 2018

  41. [41]

    The DeepFake Detection Challenge (DFDC) Dataset

    B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, and C. C. Ferrer, “The deepfake detection challenge (dfdc) dataset,” arXiv preprint arXiv:2006.07397, 2020

  42. [42]

    Deepfakes faceswap,

    Deepfakes faceswap, 2020, accessed: 2025-01-12. [Online]. Available: https://github.com/deepfakes/faceswap

  43. [43]

    FaceSwap,

    FaceSwap, 2016, accessed: 2025-01-12. [Online]. Available: https://github.com/MarekKowalski/FaceSwap

  44. [44]

    Deferred neural rendering: Image synthesis using neural textures,

    J. Thies, M. Zollhöfer, and M. Nießner, “Deferred neural rendering: Image synthesis using neural textures,” ACM Transactions on Graphics (TOG), vol. 38, no. 4, pp. 1–12, 2019

  45. [45]

    Mesonet: a compact facial video forgery detection network,

    D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, “Mesonet: a compact facial video forgery detection network,” in 2018 IEEE international workshop on information forensics and security (WIFS). IEEE, 2018, pp. 1–7

  46. [46]

    Deepfakeucl: Deepfake detection via unsupervised contrastive learning,

    S. Fung, X. Lu, C. Zhang, and C.-T. Li, “Deepfakeucl: Deepfake detection via unsupervised contrastive learning,” in 2021 international joint conference on neural networks (IJCNN). IEEE, 2021, pp. 1–8

  47. [47]

    Detecting deepfake videos from appearance and behavior,

    S. Agarwal, H. Farid, T. El-Gaaly, and S.-N. Lim, “Detecting deepfake videos from appearance and behavior,” in 2020 IEEE international workshop on information forensics and security (WIFS). IEEE, 2020, pp. 1–6

  48. [48]

    Generalizing face forgery detection via uncertainty learning,

    Y. Wu, X. Song, J. Chen, and Y.-G. Jiang, “Generalizing face forgery detection via uncertainty learning,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 1759–1767

  49. [49]

    Detecting deepfake videos with temporal dropout 3dcnn

    D. Zhang, C. Li, F. Lin, D. Zeng, and S. Ge, “Detecting deepfake videos with temporal dropout 3dcnn.” in IJCAI, 2021, pp. 1288–1294

  50. [50]

    Detecting compressed deepfake videos in social networks using frame-temporality two-stream convolutional network,

    J. Hu, X. Liao, W. Wang, and Z. Qin, “Detecting compressed deepfake videos in social networks using frame-temporality two-stream convolutional network,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 3, pp. 1089–1102, 2021

  51. [51]

    Finfer: Frame inference-based deepfake detection for high-visual-quality videos,

    J. Hu, X. Liao, J. Liang, W. Zhou, and Z. Qin, “Finfer: Frame inference-based deepfake detection for high-visual-quality videos,” in Proceedings of the AAAI conference on artificial intelligence, vol. 36, no. 1, 2022, pp. 951–959

  52. [52]

    Ucf: Uncovering common features for generalizable deepfake detection,

    Z. Yan, Y. Zhang, Y. Fan, and B. Wu, “Ucf: Uncovering common features for generalizable deepfake detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 22412–22423

  53. [53]

    Exposing DeepFake Videos By Detecting Face Warping Artifacts

    Y. Li, “Exposing deepfake videos by detecting face warping artifacts,” arXiv preprint arXiv:1811.00656, 2018

  54. [54]

    Core: Consistent representation learning for face forgery detection,

    Y. Ni, D. Meng, C. Yu, C. Quan, D. Ren, and Y. Zhao, “Core: Consistent representation learning for face forgery detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 12–21

  55. [55]

    Representative forgery mining for fake face detection,

    C. Wang and W. Deng, “Representative forgery mining for fake face detection,” 2021. [Online]. Available: https://arxiv.org/abs/2104.06609

  56. [56]

    Hierarchical contrastive inconsistency learning for deepfake video detection,

    Z. Gu, T. Yao, Y. Chen, S. Ding, and L. Ma, “Hierarchical contrastive inconsistency learning for deepfake video detection,” in European Conference on Computer Vision. Springer, 2022, pp. 596–613

  57. [57]

    Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain,

    H. Liu, X. Li, W. Zhou, Y. Chen, Y. He, H. Xue, W. Zhang, and N. Yu, “Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain,” 2021. [Online]. Available: https://arxiv.org/abs/2103.01856

  58. [58]

    End-to-end reconstruction-classification learning for face forgery detection,

    J. Cao, C. Ma, T. Yao, S. Chen, S. Ding, and X. Yang, “End-to-end reconstruction-classification learning for face forgery detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4113–4122

  59. [59]

    Preserving fairness generalization in deepfake detection,

    L. Lin, X. He, Y. Ju, X. Wang, F. Ding, and S. Hu, “Preserving fairness generalization in deepfake detection,” 2024. [Online]. Available: https://arxiv.org/abs/2402.17229

  60. [60]

    Beyond the prior forgery knowledge: Mining critical clues for general face forgery detection,

    A. Luo, C. Kong, J. Huang, Y. Hu, X. Kang, and A. C. Kot, “Beyond the prior forgery knowledge: Mining critical clues for general face forgery detection,” 2023. [Online]. Available: https://arxiv.org/abs/2304.12489

  61. [61]

    Grad-cam: Visual explanations from deep networks via gradient-based localization,

    R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 618–626
