pith. machine review for the scientific record.

arxiv: 2604.17477 · v1 · submitted 2026-04-19 · 💻 cs.CV · cs.LG


Unveiling Deepfakes: A Frequency-Aware Triple Branch Network for Deepfake Detection


Pith reviewed 2026-05-10 06:41 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords deepfake detection · frequency analysis · triple-branch network · mutual information · feature decoupling · forgery robustness · image reconstruction

The pith

A triple-branch network jointly analyzes original and frequency-reconstructed images with mutual information losses to detect deepfakes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing deepfake detectors often focus on limited frequency ranges, causing them to overfit to specific artifacts and miss varied forgery patterns. They also suffer from redundant features when multiple branches attend to the same manipulated regions. This paper introduces a triple-branch architecture that processes the original image alongside reconstructions from different frequency channels, paired with new losses derived from mutual information theory to decouple overlapping features and fuse complementary ones. The result is improved generalization across forgery types. Experiments confirm superior performance on six large benchmark datasets.

Core claim

We propose a triple-branch network that jointly captures spatial and frequency features by learning from both the original image and images reconstructed from different frequency channels, and we mathematically derive feature decoupling and fusion losses grounded in mutual information theory, which encourage the model to focus on task-relevant features across the original and reconstructed images. Extensive experiments on six large-scale benchmark datasets demonstrate that our method consistently achieves state-of-the-art performance.

What carries the argument

Triple-branch network with frequency channel reconstructions and mutual-information-based decoupling and fusion losses, which separate redundant features and promote complementary spatial-frequency clues.
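
The page does not reproduce the paper's exact channel-selection rule, so the following is a minimal sketch of the general pattern only: DCT-band masking produces the two reconstructions, and three off-the-shelf ResNet-18 encoders serve as branches. Every name, band boundary, and fusion choice here is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of a triple-branch detector of the kind described above.
# Assumptions (not from the paper): DCT-band masking for the reconstructions,
# ResNet-18 encoders, and plain concatenation before the classifier head.
import torch
import torch.nn as nn
from scipy.fft import dctn, idctn
from torchvision.models import resnet18

def band_reconstruct(img, lo, hi):
    """Keep DCT coefficients whose diagonal index u+v lies in [lo, hi); zero the rest."""
    x = img.detach().cpu().numpy()                   # done outside autograd for clarity
    coeff = dctn(x, axes=(-2, -1), norm="ortho")
    h, w = coeff.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    mask = ((yy + xx >= lo) & (yy + xx < hi)).numpy()
    return torch.from_numpy(idctn(coeff * mask, axes=(-2, -1), norm="ortho")).float()

class TripleBranch(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        backbone = lambda: nn.Sequential(*list(resnet18(weights=None).children())[:-1])
        self.spatial, self.low, self.high = backbone(), backbone(), backbone()
        self.head = nn.Linear(3 * 512, n_classes)

    def forward(self, img):                          # img: (B, 3, H, W)
        cut = sum(img.shape[-2:]) // 4               # assumed low/high split point
        views = [img,
                 band_reconstruct(img, 0, cut),                    # low-frequency view
                 band_reconstruct(img, cut, sum(img.shape[-2:]))]  # high-frequency view
        feats = [b(v).flatten(1)
                 for b, v in zip((self.spatial, self.low, self.high), views)]
        return self.head(torch.cat(feats, dim=1)), feats  # logits + per-branch features
```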

If this is right

  • The model reduces overfitting to particular frequency artifacts by incorporating multiple reconstruction branches.
  • Feature representations become more diverse as the losses prevent branches from focusing on identical forged regions.
  • Generalization capability improves for detecting diverse and unseen manipulation techniques.
  • Consistent state-of-the-art performance is achieved across six large-scale benchmark datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This frequency-aware approach could be extended to other image forensics tasks such as identifying image manipulations beyond deepfakes.
  • The mutual information losses may help in designing multi-view networks for tasks like object detection where feature complementarity matters.
  • Real-world deployment would benefit from testing on deepfakes from social media platforms with compression artifacts.
  • Adaptive selection of frequency channels instead of fixed ones might further enhance the method's coverage of forgery types.

Load-bearing premise

The mathematically derived feature decoupling and fusion losses must reliably steer each branch toward task-relevant, complementary features without introducing new overfitting risks when facing diverse forgery patterns.
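
The identity such derivations typically lean on is the chain rule of mutual information, stated here for two branch features and the label (a standard result, not the paper's Eqs. (4)–(6)):

```latex
\[
  I(Y; F_1, F_2) \;=\; I(Y; F_1) \;+\; I(Y; F_2 \mid F_1)
\]
```

If a decoupling penalty drives I(F_1; F_2) toward zero while each branch stays predictive, the conditional term must carry genuinely new evidence; whether a neurally estimated MI penalty achieves this in practice is exactly what the premise assumes.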

What would settle it

Testing on a dataset of deepfakes engineered with uniformly distributed frequency artifacts or where all single-frequency methods fail would show whether the triple-branch design and losses deliver a genuine advantage.
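
As a concrete and entirely hypothetical illustration of "uniformly distributed frequency artifacts", one could perturb images with spectrally flat noise in the DCT domain, so that no single band carries more forgery energy than another; a synthetic probe, not a protocol from the paper.

```python
import numpy as np
from scipy.fft import dctn, idctn

def flat_spectrum_perturb(img, eps=0.01, seed=0):
    """Add noise with equal expected energy in every DCT coefficient (last two axes).
    Synthetic stress probe: artifact energy is spread uniformly across bands."""
    rng = np.random.default_rng(seed)
    coeff = dctn(img, axes=(-2, -1), norm="ortho")
    coeff += eps * rng.standard_normal(coeff.shape)   # spectrally flat perturbation
    return idctn(coeff, axes=(-2, -1), norm="ortho")
```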

Figures

Figures reproduced from arXiv: 2604.17477 by Jiaxing Xuan, Kui Ren, Qihao Shen, Sifan Wu, Yingying Jiao, Yutong Xie, Zhaoyan Ming, Zhenguang Liu.

Figure 1: The input image is transformed into the frequency domain via discrete cosine transform (DCT), where different frequency channels reveal distinct …

Figure 2: Overview of the proposed framework. The dynamic frequency channel selection (DFCS) module first transforms the input image into multiple …

Figure 3: Detailed flowchart of the Global Fusion Module (GFM). The module transforms the simply concatenated features F into a discriminative fused …

Figure 4: In-dataset impact of channel number on AUC, ACC and F1 score.

Figure 5: Cross-dataset impact of channel number on AUC, ACC and F1 score.

Figure 6: Visual examples of our method on Deepfakes (DF), Face2Face (F2F), FaceSwap (FS), NeuralTextures (NT), CDF, CDF2, DFDC and DFDC-P datasets.
Original abstract

Advanced deepfake technologies are blurring the lines between real and fake, presenting both revolutionary opportunities and alarming threats. While it unlocks novel applications in fields like entertainment and education, its malicious use has sparked urgent ethical and societal concerns ranging from identity theft to the dissemination of misinformation. To tackle these challenges, feature analysis using frequency features has emerged as a promising direction for deepfake detection. However, one aspect that has been overlooked so far is that existing methods tend to concentrate on one or a few specific frequency domains, which risks overfitting to particular artifacts and significantly undermines their robustness when facing diverse forgery patterns. Another underexplored aspect we observe is that different features often attend to the same forged region, resulting in redundant feature representations and limiting the diversity of the extracted clues. This may undermine the ability of a model to capture complementary information across different facets, thereby compromising its generalization capability to diverse manipulations. In this paper, we seek to tackle these challenges from two aspects: (1) we propose a triple-branch network that jointly captures spatial and frequency features by learning from both original image and image reconstructed by different frequency channels, and (2) we mathematically derive feature decoupling and fusion losses grounded in the mutual information theory, which enhances the model to focus on task-relevant features across the original image and the image reconstructed by different frequency channels. Extensive experiments on six large-scale benchmark datasets demonstrate that our method consistently achieves state-of-the-art performance. Our code is released at https://github.com/injooker/Unveiling Deepfake.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a triple-branch CNN architecture for deepfake detection that processes an input image together with two versions reconstructed from selected frequency channels. It introduces two losses derived from mutual information theory—one for decoupling redundant features across branches and one for fusing complementary task-relevant information—and reports state-of-the-art accuracy on six public benchmarks while releasing code.

Significance. If the MI-derived losses demonstrably produce non-redundant, forgery-sensitive representations without estimator-induced overfitting, the work would strengthen frequency-aware detection pipelines and reduce reliance on single-domain artifacts. The public code release supports reproducibility and allows direct verification of the claimed gains.

major comments (2)
  1. [§3.2] Eq. (4)–(6): The decoupling and fusion losses are presented as direct consequences of mutual-information identities, yet the implementation necessarily employs a variational or neural MI estimator for high-dimensional frequency features. No ablation, bias/variance analysis, or comparison against alternative estimators is reported, leaving open the possibility that the losses fail to enforce the intended complementarity and instead fit dataset-specific noise.
  2. [§4.3] Tables 2–5: The SOTA claim rests on single-run accuracy numbers across six datasets without reported standard deviations, multiple random seeds, or statistical significance tests against the strongest baselines. This omission prevents assessment of whether the observed margins are robust or could be explained by training stochasticity.
minor comments (2)
  1. [Abstract] Typographical errors (“emergedas”, “oneaspect”) should be corrected for readability.
  2. [§3.1] The precise selection rule for the frequency channels used in reconstruction is stated only qualitatively; an explicit algorithm or hyper-parameter table would aid reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our frequency-aware triple-branch network and the mutual information losses. The comments highlight important aspects of estimator reliability and result robustness that we address below. We indicate planned revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3.2] Eq. (4)–(6): The decoupling and fusion losses are presented as direct consequences of mutual-information identities, yet the implementation necessarily employs a variational or neural MI estimator for high-dimensional frequency features. No ablation, bias/variance analysis, or comparison against alternative estimators is reported, leaving open the possibility that the losses fail to enforce the intended complementarity and instead fit dataset-specific noise.

    Authors: We agree that the practical implementation uses a neural MI estimator (MINE) to approximate the high-dimensional terms in Eqs. (4)–(6). The theoretical identities hold in the ideal case, and our empirical gains across six benchmarks support that the losses promote complementarity rather than noise fitting. However, the absence of estimator-specific ablations is a valid limitation. In the revised manuscript, we will expand §3.2 with a discussion of the estimator choice, add a comparison to an alternative (InfoNCE-based) estimator on one dataset, and include training curves for the MI loss values to illustrate stability. This provides partial but substantive additional evidence without requiring entirely new large-scale runs. revision: partial
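
For orientation, a MINE-style (Donsker–Varadhan) bound of the kind named in this response looks roughly as follows; a generic sketch of the estimator family, not the authors' code, with the statistics network T assumed to be a small MLP.

```python
import torch
import torch.nn as nn

class MINEBound(nn.Module):
    """Donsker-Varadhan lower bound on I(F1; F2). Used as a decoupling
    penalty, the bound is *minimized* w.r.t. the feature encoders."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.T = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def forward(self, f1, f2):
        joint = self.T(torch.cat([f1, f2], dim=1))      # paired samples ~ p(f1, f2)
        shuffled = f2[torch.randperm(f2.size(0))]       # break pairing ~ p(f1)p(f2)
        marginal = self.T(torch.cat([f1, shuffled], dim=1))
        n = torch.tensor(float(marginal.size(0)))
        # I(F1; F2) >= E_joint[T] - log E_marginal[exp(T)]
        return joint.mean() - (torch.logsumexp(marginal, dim=0) - torch.log(n)).squeeze()
```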

  2. Referee: [§4.3] Tables 2–5: The SOTA claim rests on single-run accuracy numbers across six datasets without reported standard deviations, multiple random seeds, or statistical significance tests against the strongest baselines. This omission prevents assessment of whether the observed margins are robust or could be explained by training stochasticity.

    Authors: The referee is correct that single-run results limit evaluation of variability due to training stochasticity. In the revised version, we will re-train our method and the primary baselines using multiple random seeds (minimum of three), report mean accuracy ± standard deviation in updated Tables 2–5, and add paired statistical significance tests against the strongest competitors. These changes will be incorporated into §4.3 to directly address concerns about robustness. revision: yes
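
The promised analysis is mechanically simple; a sketch with placeholder numbers (all values hypothetical, shown only to illustrate the procedure, not results from the paper):

```python
import numpy as np
from scipy import stats

# Per-seed AUCs; the numbers below are placeholders, not reported results.
ours     = np.array([0.915, 0.921, 0.918])   # proposed method, three seeds
baseline = np.array([0.902, 0.907, 0.905])   # strongest baseline, same seeds

print(f"ours:     {ours.mean():.3f} ± {ours.std(ddof=1):.3f}")
print(f"baseline: {baseline.mean():.3f} ± {baseline.std(ddof=1):.3f}")

t, p = stats.ttest_rel(ours, baseline)       # paired test across seeds
print(f"paired t = {t:.2f}, p = {p:.3f}")
```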

Circularity Check

0 steps flagged

No circularity: losses derived from external MI theory; architecture and claims are independent

Full rationale

The paper proposes a triple-branch network motivated by observed limitations in prior frequency-based detectors and derives its decoupling and fusion losses from mutual information theory, an external framework. None of the equations or derivation steps reduces the claimed performance gains, feature complementarity, or SOTA results to fitted parameters, self-definitions, or self-citation chains. Empirical validation on six benchmarks provides independent falsifiability rather than tautological equivalence to the inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard deep learning optimization assumptions plus the validity of applying mutual information theory to enforce feature complementarity in this architecture; no new physical entities are introduced.

free parameters (2)
  • loss weighting coefficients for decoupling and fusion terms
    Hyperparameters that balance the mutual information losses against the main detection objective and must be tuned on validation data (see the sketch after this ledger).
  • number and selection of frequency channels for image reconstruction
    Choices that determine which frequency bands are used in the two reconstruction branches.
axioms (1)
  • domain assumption: Mutual information theory provides a principled way to quantify and minimize redundant information across feature branches for improved generalization.
    Invoked to justify the derived decoupling and fusion losses.
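
For orientation, a plausible shape for how the two loss-weighting coefficients enter the training objective, assumed rather than transcribed from the paper:

```latex
\[
  \mathcal{L}_{\text{total}}
    \;=\; \mathcal{L}_{\text{cls}}
    \;+\; \lambda_{\text{dec}}\,\mathcal{L}_{\text{dec}}
    \;+\; \lambda_{\text{fus}}\,\mathcal{L}_{\text{fus}},
  \qquad \lambda_{\text{dec}},\ \lambda_{\text{fus}}\ \text{tuned on validation data.}
\]
```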

pith-pipeline@v0.9.0 · 5600 in / 1320 out tokens · 42316 ms · 2026-05-10T06:41:23.222218+00:00 · methodology


Reference graph

Works this paper leans on

62 extracted references · 11 canonical work pages · 2 internal anchors

  1. [1]

    Face2face: Real-time face capture and reenactment of rgb videos,

    J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, and M. Nießner, “Face2face: Real-time face capture and reenactment of rgb videos,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2387–2395

  2. [2]

    Simswap: An efficient framework for high fidelity face swapping,

    R. Chen, X. Chen, B. Ni, and Y. Ge, “Simswap: An efficient framework for high fidelity face swapping,” in Proceedings of the 28th ACM international conference on multimedia, 2020, pp. 2003–2011

  3. [3]

    Fsgan: Subject agnostic face swapping and reenactment,

    Y. Nirkin, Y. Keller, and T. Hassner, “Fsgan: Subject agnostic face swapping and reenactment,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 7184–7193

  4. [4]

    Deep dual consecutive network for human pose estimation,

    Z. Liu, H. Chen, R. Feng, S. Wu, S. Ji, B. Yang, and X. Wang, “Deep dual consecutive network for human pose estimation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 525–534

  5. [5]

    Deepfakes and beyond: A survey of face manipulation and fake detection,

    R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, and J. Ortega-Garcia, “Deepfakes and beyond: A survey of face manipulation and fake detection,” Information Fusion, vol. 64, pp. 131–148, 2020

  6. [6]

    Bi-directional distribution alignment for transductive zero-shot learning,

    Z. Wang, Y. Hao, T. Mu, O. Li, S. Wang, and X. He, “Bi-directional distribution alignment for transductive zero-shot learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19893–19902

  7. [7]

    Sfe-net: Spatial-frequency enhancement network for robust nuclei segmentation in histopathology images,

    J. Chen, G. Yang, A. Liu, X. Chen, and J. Liu, “Sfe-net: Spatial-frequency enhancement network for robust nuclei segmentation in histopathology images,” Computers in Biology and Medicine, vol. 171, p. 108131, 2024

  8. [8]

    Mix-dann and dynamic-modal-distillation for video domain adaptation,

    Y. Yin, B. Zhu, J. Chen, L. Cheng, and Y.-G. Jiang, “Mix-dann and dynamic-modal-distillation for video domain adaptation,” in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 3224–3233

  9. [9]

    Dfil: Deepfake incremental learning by exploiting domain-invariant forgery clues,

    K. Pan, Y. Yin, Y. Wei, F. Lin, Z. Ba, Z. Liu, Z. Wang, L. Cavallaro, and K. Ren, “Dfil: Deepfake incremental learning by exploiting domain-invariant forgery clues,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 8035–8046

  10. [10]

    Copy motion from one to another: Fake motion video generation,

    Z. Liu, S. Wu, C. Xu, X. Wang, L. Zhu, S. Wu, and F. Feng, “Copy motion from one to another: Fake motion video generation,” arXiv preprint arXiv:2205.01373, 2022

  11. [11]

    Synthesizing obama: learning lip sync from audio,

    S. Suwajanakorn, S. M. Seitz, and I. Kemelmacher-Shlizerman, “Synthesizing obama: learning lip sync from audio,” ACM Transactions on Graphics (ToG), vol. 36, no. 4, pp. 1–13, 2017

  12. [12]

    W. Staff. (2022) Deepfake video of Zelensky urging surrender circulates online. Accessed: 2025-08-16. [Online]. Available: https://www.wired.com/story/zelensky-deepfake-facebook-twitter-playbook/

  13. [13]

    Lips don’t lie: A generalisable and robust approach to face forgery detection,

    A. Haliassos, K. Vougioukas, S. Petridis, and M. Pantic, “Lips don’t lie: A generalisable and robust approach to face forgery detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 5039–5049

  14. [14]

    Face x-ray for more general face forgery detection,

    L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, and B. Guo, “Face x-ray for more general face forgery detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 5001–5010

  15. [15]

    Thinking in frequency: Face forgery detection by mining frequency-aware clues,

    Y. Qian, G. Yin, L. Sheng, Z. Chen, and J. Shao, “Thinking in frequency: Face forgery detection by mining frequency-aware clues,” in European conference on computer vision. Springer, 2020, pp. 86–103

  16. [16]

    Two-branch recurrent network for isolating deepfakes in videos,

    I. Masi, A. Killekar, R. M. Mascarenhas, S. P. Gurudatt, and W. AbdAlmageed, “Two-branch recurrent network for isolating deepfakes in videos,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16. Springer, 2020, pp. 667–684

  17. [17]

    Rich models for steganalysis of digital images,

    J. Fridrich and J. Kodovsky, “Rich models for steganalysis of digital images,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 868–882, 2012

  18. [18]

    Sfiad: Deepfake detection through spatial-frequency feature integration and dynamic margin optimization,

    Y. Kou, P. Li, H. Ma, J. Zhou, X. Li et al., “Sfiad: Deepfake detection through spatial-frequency feature integration and dynamic margin optimization,” Artificial Intelligence Review, vol. 58, no. 7, pp. 1–22, 2025

  19. [19]

    A spatio-frequency cross fusion model for deepfake detection and segmentation,

    J. Zheng, Y. Zhou, N. Zhang, X. Hu, K. Xu, D. Gao, and Z. Tang, “A spatio-frequency cross fusion model for deepfake detection and segmentation,” Neurocomputing, vol. 628, p. 129683, 2025

  20. [20]

    Spatial-frequency feature fusion based deepfake detection through knowledge distillation,

    B. Wang, X. Wu, F. Wang, Y. Zhang, F. Wei, and Z. Song, “Spatial-frequency feature fusion based deepfake detection through knowledge distillation,” Engineering Applications of Artificial Intelligence, vol. 133, p. 108341, 2024

  21. [21]

    Df40: Toward next-generation deepfake detection,

    Z. Yan, T. Yao, S. Chen, Y. Zhao, X. Fu, J. Zhu, D. Luo, C. Wang, S. Ding, Y. Wu, and L. Yuan, “Df40: Toward next-generation deepfake detection,” 2024. [Online]. Available: https://arxiv.org/abs/2406.13495

  22. [22]

    Faceforensics++: Learning to detect manipulated facial images,

    A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, “Faceforensics++: Learning to detect manipulated facial images,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 1–11

  23. [23]

    Celeb-df: A large-scale challenging dataset for deepfake forensics,

    Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, “Celeb-df: A large-scale challenging dataset for deepfake forensics,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 3207–3216

  24. [24]

    Deepfake detection challenge,

    Deepfake detection challenge, 2020, accessed: 2025-01-12. [Online]. Available: https://kaggle.com/c/Deepfake-detection-challenge

  25. [25]

    Locate and verify: A two-stream network for improved deepfake detection,

    C. Shuai, J. Zhong, S. Wu, F. Lin, Z. Wang, Z. Ba, Z. Liu, L. Cavallaro, and K. Ren, “Locate and verify: A two-stream network for improved deepfake detection,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 7131–7142

  26. [26]

    Freqblender: Enhancing deepfake detection by blending frequency knowledge,

    H. Li, Y. Li, J. Zhou, B. Li, and J. Dong, “Freqblender: Enhancing deepfake detection by blending frequency knowledge,” arXiv preprint arXiv:2404.13872, 2024

  27. [27]

    Protecting celebrities from deepfake with identity consistency transformer,

    X. Dong, J. Bao, D. Chen, T. Zhang, W. Zhang, N. Yu, D. Chen, F. Wen, and B. Guo, “Protecting celebrities from deepfake with identity consistency transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 9468–9478

  28. [28]

    Learning second-order local anomaly for general face forgery detection,

    J. Fei, Y. Dai, P. Yu, T. Shen, Z. Xia, and J. Weng, “Learning second-order local anomaly for general face forgery detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 20270–20280

  29. [29]

    Dual contrastive learning for general face forgery detection,

    K. Sun, T. Yao, S. Chen, S. Ding, J. Li, and R. Ji, “Dual contrastive learning for general face forgery detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 2316–2324

  30. [30]

    Uia-vit: Unsupervised inconsistency-aware method based on vision transformer for face forgery detection,

    W. Zhuang, Q. Chu, Z. Tan, Q. Liu, H. Yuan, C. Miao, Z. Luo, and N. Yu, “Uia-vit: Unsupervised inconsistency-aware method based on vision transformer for face forgery detection,” in European conference on computer vision. Springer, 2022, pp. 391–407

  31. [31]

    Detecting deepfakes with self-blended images,

    K. Shiohara and T. Yamasaki, “Detecting deepfakes with self-blended images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 18720–18729

  32. [32]

    Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection,

    J. Li, H. Xie, J. Li, Z. Wang, and Y. Zhang, “Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 6458–6467

  33. [33]

    Leveraging frequency analysis for deep fake image recognition,

    J. Frank, T. Eisenhofer, L. Schönherr, A. Fischer, D. Kolossa, and T. Holz, “Leveraging frequency analysis for deep fake image recognition,” in International conference on machine learning. PMLR, 2020, pp. 3247–3258

  34. [34]

    Fcanet: Frequency channel attention networks,

    Z. Qin, P. Zhang, F. Wu, and X. Li, “Fcanet: Frequency channel attention networks,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 783–792

  35. [35]

    Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions,

    R. Durall, M. Keuper, and J. Keuper, “Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

  36. [36]

    Learning in the frequency domain,

    K. Xu, M. Qin, F. Sun, Y. Wang, Y.-K. Chen, and F. Ren, “Learning in the frequency domain,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 1740–1749

  37. [37]

    Privacy-preserving face recognition with learnable privacy budgets in frequency domain,

    J. Ji, H. Wang, Y. Huang, J. Wu, X. Xu, S. Ding, S. Zhang, L. Cao, and R. Ji, “Privacy-preserving face recognition with learnable privacy budgets in frequency domain,” in European Conference on Computer Vision. Springer, 2022, pp. 475–491

  38. [38]

    Exposing the deception: Uncovering more forgery clues for deepfake detection,

    Z. Ba, Q. Liu, Z. Liu, S. Wu, F. Lin, L. Lu, and K. Ren, “Exposing the deception: Uncovering more forgery clues for deepfake detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 2, 2024, pp. 719–728

  39. [39]

    Adam: A Method for Stochastic Optimization

    D. P. Kingma, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014

  40. [40]

    Auxiliary tasks in multi-task learning,

    L. Liebel and M. Körner, “Auxiliary tasks in multi-task learning,” arXiv preprint arXiv:1805.06334, 2018

  41. [41]

    The DeepFake Detection Challenge (DFDC) Dataset

    B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, and C. C. Ferrer, “The deepfake detection challenge (dfdc) dataset,” arXiv preprint arXiv:2006.07397, 2020

  42. [42]

    Deepfakes faceswap,

    Deepfakes faceswap, 2020, accessed: 2025-01-12. [Online]. Available: https://github.com/deepfakes/faceswap

  43. [43]

    FaceSwap,

    FaceSwap, 2016, accessed: 2025-01-12. [Online]. Available: https://github.com/MarekKowalski/FaceSwap

  44. [44]

    Deferred neural rendering: Image synthesis using neural textures,

    J. Thies, M. Zollhöfer, and M. Nießner, “Deferred neural rendering: Image synthesis using neural textures,” ACM Transactions on Graphics (TOG), vol. 38, no. 4, pp. 1–12, 2019

  45. [45]

    Mesonet: a compact facial video forgery detection network,

    D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, “Mesonet: a compact facial video forgery detection network,” in 2018 IEEE international workshop on information forensics and security (WIFS). IEEE, 2018, pp. 1–7

  46. [46]

    Deepfakeucl: Deepfake detection via unsupervised contrastive learning,

    S. Fung, X. Lu, C. Zhang, and C.-T. Li, “Deepfakeucl: Deepfake detection via unsupervised contrastive learning,” in 2021 international joint conference on neural networks (IJCNN). IEEE, 2021, pp. 1–8

  47. [47]

    Detecting deepfake videos from appearance and behavior,

    S. Agarwal, H. Farid, T. El-Gaaly, and S.-N. Lim, “Detecting deepfake videos from appearance and behavior,” in 2020 IEEE international workshop on information forensics and security (WIFS). IEEE, 2020, pp. 1–6

  48. [48]

    Generalizing face forgery detection via uncertainty learning,

    Y. Wu, X. Song, J. Chen, and Y.-G. Jiang, “Generalizing face forgery detection via uncertainty learning,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 1759–1767

  49. [49]

    Detecting deepfake videos with temporal dropout 3dcnn

    D. Zhang, C. Li, F. Lin, D. Zeng, and S. Ge, “Detecting deepfake videos with temporal dropout 3dcnn.” in IJCAI, 2021, pp. 1288–1294

  50. [50]

    Detecting compressed deepfake videos in social networks using frame-temporality two-stream convolutional network,

    J. Hu, X. Liao, W. Wang, and Z. Qin, “Detecting compressed deepfake videos in social networks using frame-temporality two-stream convolutional network,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 3, pp. 1089–1102, 2021

  51. [51]

    Finfer: Frame inference-based deepfake detection for high-visual-quality videos,

    J. Hu, X. Liao, J. Liang, W. Zhou, and Z. Qin, “Finfer: Frame inference-based deepfake detection for high-visual-quality videos,” in Proceedings of the AAAI conference on artificial intelligence, vol. 36, no. 1, 2022, pp. 951–959

  52. [52]

    Ucf: Uncovering common features for generalizable deepfake detection,

    Z. Yan, Y. Zhang, Y. Fan, and B. Wu, “Ucf: Uncovering common features for generalizable deepfake detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 22412–22423

  53. [53]

    Exposing DeepFake Videos By Detecting Face Warping Artifacts

    Y. Li, “Exposing deepfake videos by detecting face warping artifacts,” arXiv preprint arXiv:1811.00656, 2018

  54. [54]

    Core: Consistent representation learning for face forgery detection,

    Y. Ni, D. Meng, C. Yu, C. Quan, D. Ren, and Y. Zhao, “Core: Consistent representation learning for face forgery detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 12–21

  55. [55]

    Representative forgery mining for fake face detection,

    C. Wang and W. Deng, “Representative forgery mining for fake face detection,” 2021. [Online]. Available: https://arxiv.org/abs/2104.06609

  56. [56]

    Hierarchical contrastive inconsistency learning for deepfake video detection,

    Z. Gu, T. Yao, Y. Chen, S. Ding, and L. Ma, “Hierarchical contrastive inconsistency learning for deepfake video detection,” in European Conference on Computer Vision. Springer, 2022, pp. 596–613

  57. [57]

    Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain,

    H. Liu, X. Li, W. Zhou, Y. Chen, Y. He, H. Xue, W. Zhang, and N. Yu, “Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain,” 2021. [Online]. Available: https://arxiv.org/abs/2103.01856

  58. [58]

    End-to-end reconstruction-classification learning for face forgery detection,

    J. Cao, C. Ma, T. Yao, S. Chen, S. Ding, and X. Yang, “End-to-end reconstruction-classification learning for face forgery detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4113–4122

  59. [59]

    Preserving fairness generalization in deepfake detection,

    L. Lin, X. He, Y. Ju, X. Wang, F. Ding, and S. Hu, “Preserving fairness generalization in deepfake detection,” 2024. [Online]. Available: https://arxiv.org/abs/2402.17229

  60. [60]

    Beyond the prior forgery knowledge: Mining critical clues for general face forgery detection,

    A. Luo, C. Kong, J. Huang, Y. Hu, X. Kang, and A. C. Kot, “Beyond the prior forgery knowledge: Mining critical clues for general face forgery detection,” 2023. [Online]. Available: https://arxiv.org/abs/2304.12489

  61. [61]

    Grad-cam: Visual explanations from deep networks via gradient-based localization,

    R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 618–626
