pith. machine review for the scientific record.

arxiv: 2411.15633 · v4 · pith:NTVX2QFGnew · submitted 2024-11-23 · 💻 cs.CV

Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection

Pith reviewed 2026-05-17 23:20 UTC · model grok-4.3

classification 💻 cs.CV
keywords AI-generated image detection · generalization · singular value decomposition · orthogonal subspaces · feature space · pre-trained models · overfitting

The pith

Decomposing features via SVD into orthogonal parts lets detectors freeze general pre-trained knowledge and adapt only the rest to spot AI fakes without overfitting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that AI-generated image detectors fail to generalize because training on limited fake examples collapses the feature space into a low-rank form that cannot express new patterns. The proposed fix splits the space with singular value decomposition into two orthogonal subspaces, holds the largest principal components fixed to keep broad pre-trained visual knowledge, and updates only the remaining components to capture fake signals. This explicit orthogonality keeps the overall space higher-ranked than full fine-tuning or low-rank adapters. A side result is that the adapted components implicitly encode the idea that fakes are derived from reals rather than independent.

Core claim

Using singular value decomposition to split the original feature space into two orthogonal subspaces, then freezing the principal components while adapting only the remaining components, preserves pre-trained knowledge while learning fake patterns. This keeps the rank of the whole feature space higher, minimizes overfitting, and enhances generalization compared to full-parameter and LoRA-based tuning methods. The method also implicitly learns a vital prior that fakes are actually derived from real images, indicating a hierarchical relationship.

What carries the argument

SVD orthogonal subspace decomposition that freezes principal components to retain pre-trained rank and adapts only residual components to learn detection signals.
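
A minimal NumPy sketch of the split described above, under stated assumptions. The matrix `W` stands in for whatever matrix the paper decomposes (this page does not say whether it is a weight matrix or a feature matrix), and `split_by_svd`, `W_main`, `W_res`, and the cutoff `r` are illustrative names, not the authors' code.

```python
import numpy as np


def split_by_svd(W, r):
    """Split W into a rank-r principal part and the remaining residual part."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Top-r singular directions: the part that would be frozen.
    W_main = (U[:, :r] * S[:r]) @ Vt[:r, :]
    # Remaining directions: the part that would be adapted during training.
    W_res = (U[:, r:] * S[r:]) @ Vt[r:, :]
    return W_main, W_res


# Random stand-in matrix (assumption: purely illustrative, not real model data).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
W_main, W_res = split_by_svd(W, r=4)

# The two parts sum back to W and occupy mutually orthogonal subspaces.
reconstruction_ok = np.allclose(W_main + W_res, W)
orthogonal_ok = np.allclose(W_main.T @ W_res, 0.0, atol=1e-10)
```

In a training loop, only the residual part would receive gradient updates; how the paper keeps the updated residual orthogonal to the frozen part is not specified on this page.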

Load-bearing premise

The largest directions found by SVD on pre-trained vision features hold general visual knowledge that does not overlap with the specific clues needed to detect fakes, so freezing them keeps useful information without blocking detection learning.

What would settle it

Apply the method to a training set of images from several known generators, then evaluate accuracy on a held-out generator never seen in training; if performance matches or falls below a standard fine-tuned baseline, the benefit of freezing principal components is not supported.
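
The test above amounts to a leave-one-generator-out protocol. The sketch below shows one way to build such splits and to apply the stated decision rule; the function names, the generator names, and the image ids are all hypothetical placeholders, and real images would need their own disjoint train/test split, which is omitted here.

```python
def leave_one_generator_out(fakes_by_generator):
    """Yield (held_out, train_fakes, test_fakes) once per generator."""
    for held_out in sorted(fakes_by_generator):
        # Train on fakes from every generator except the held-out one.
        train_fakes = [
            item
            for gen, items in sorted(fakes_by_generator.items())
            if gen != held_out
            for item in items
        ]
        # Test only on fakes from the generator never seen in training.
        test_fakes = list(fakes_by_generator[held_out])
        yield held_out, train_fakes, test_fakes


def benefit_supported(method_acc, baseline_acc):
    """True only if the method strictly beats the fine-tuned baseline."""
    return method_acc > baseline_acc


# Placeholder generator names and image ids, for illustration only.
fakes = {"gen_a": ["a1", "a2"], "gen_b": ["b1"], "gen_c": ["c1", "c2"]}
splits = list(leave_one_generator_out(fakes))
```

A tie counts against the method in `benefit_supported`, matching the "matches or falls below" wording of the criterion.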

Original abstract

AI-generated images (AIGIs), such as natural or face images, have become increasingly important yet challenging. In this paper, we start from a new perspective to excavate the reason behind the failure generalization in AIGI detection, named the asymmetry phenomenon, where a naively trained detector tends to favor overfitting to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked, which is proved seriously limiting the expressivity and generalization. One potential remedy is incorporating the pre-trained knowledge within the vision foundation models (higher-ranked) to expand the feature space, alleviating the model's overfitting to fake. To this end, we employ Singular Value Decomposition (SVD) to decompose the original feature space into two orthogonal subspaces. By freezing the principal components and adapting only the remained components, we preserve the pre-trained knowledge while learning fake patterns. Compared to existing full-parameters and LoRA-based tuning methods, we explicitly ensure orthogonality, enabling the higher rank of the whole feature space, effectively minimizing overfitting and enhancing generalization. We finally identify a crucial insight: our method implicitly learns a vital prior that fakes are actually derived from the real, indicating a hierarchical relationship rather than independence. Modeling this prior, we believe, is essential for achieving superior generalization. Our codes are publicly available at GitHub (https://github.com/YZY-stack/Effort-AIGI-Detection).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that naive training for AI-generated image (AIGI) detection exhibits an 'asymmetry phenomenon' in which the detector overfits to limited and monotonous fake patterns, collapsing the feature space to low rank and harming generalization. To remedy this, the authors apply Singular Value Decomposition (SVD) to features extracted from pre-trained vision foundation models, decomposing the space into two orthogonal subspaces. They freeze the principal components (to retain general pre-trained knowledge) while adapting only the residual components (to capture fake patterns), explicitly enforcing orthogonality to maintain higher rank, reduce overfitting, and improve generalization relative to full-parameter or LoRA tuning. The work also identifies an implicit prior that fakes are hierarchically derived from reals rather than independent.

Significance. If the SVD partitioning reliably isolates general knowledge from task-specific adaptation without discarding detection-critical directions, the method would supply a principled, orthogonality-aware fine-tuning recipe that directly targets rank collapse, a recurring issue in AIGI generalization. The derived insight about modeling the real-to-fake hierarchical prior could usefully shape subsequent detector design.

major comments (1)
  1. [Method (SVD decomposition and asymmetry phenomenon)] The load-bearing assumption that the principal components obtained from SVD on pre-trained features encode only general real-image knowledge and lie orthogonal to (and independent of) the directions needed to detect fakes is not justified in the method description. SVD is performed on the variance structure of the pre-trained feature matrix without any real/fake separation; because common AIGI artifacts (frequency biases, diffusion-specific patterns) frequently align with high-variance axes, a non-negligible fraction of the detection signal may reside in the frozen principal subspace. Freezing it would then remove rather than protect useful information, undermining both the orthogonality guarantee and the claimed generalization benefit. This concern directly affects the central claim and requires either a formal argument or targeted ablations showing the contribution of the frozen versus
minor comments (2)
  1. Specify exactly on which feature matrix (real-only, mixed real/fake, or pre-training corpus) the SVD is computed and how the rank cutoff for the principal subspace is chosen.
  2. The abstract asserts generalization gains and the implicit prior but supplies no quantitative metrics, ablation tables, or cross-generator results; these must be clearly presented and compared against full fine-tuning and LoRA baselines.
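
On the rank-cutoff question in minor comment 1, one common heuristic (an assumption here, not something the paper is known to use) is to keep the smallest number of components whose squared singular values reach a target share of the total. The function name `rank_for_energy`, the `energy` threshold, and the example spectrum are illustrative.

```python
import numpy as np


def rank_for_energy(singular_values, energy=0.9):
    """Smallest r whose top-r squared singular values reach `energy` of the total."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(s2) / s2.sum()
    # First index where the cumulative share meets the threshold, as a count.
    r = int(np.searchsorted(cumulative, energy)) + 1
    # Guard against floating-point shortfall when energy is 1.0.
    return min(r, len(s2))


# Made-up singular values, sorted in descending order.
spectrum = [4.0, 2.0, 1.0, 1.0]
r = rank_for_energy(spectrum, energy=0.9)
```

Reporting the chosen cutoff alongside the threshold that produced it would make the principal/residual split reproducible.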

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. The concern regarding the justification of the SVD decomposition assumption is well-taken, and we address it directly below while outlining planned revisions to strengthen the presentation.

  1. Referee: [Method (SVD decomposition and asymmetry phenomenon)] The load-bearing assumption that the principal components obtained from SVD on pre-trained features encode only general real-image knowledge and lie orthogonal to (and independent of) the directions needed to detect fakes is not justified in the method description. SVD is performed on the variance structure of the pre-trained feature matrix without any real/fake separation; because common AIGI artifacts (frequency biases, diffusion-specific patterns) frequently align with high-variance axes, a non-negligible fraction of the detection signal may reside in the frozen principal subspace. Freezing it would then remove rather than protect useful information, undermining both the orthogonality guarantee and the claimed generalization benefit. This concern directly affects the central claim and requires either a formal argument or targeted

    Authors: We agree that the unsupervised nature of SVD on the pre-trained feature matrix does not explicitly separate real and fake directions, and that certain AIGI artifacts could in principle align with high-variance axes. Our core rationale is that the principal components still predominantly encode the high-rank, general visual priors learned from massive real-image corpora during foundation-model pre-training; the residual subspace then captures the lower-variance deviations that correspond to the hierarchical real-to-fake relationship we identify. The explicit orthogonality constraint we impose further prevents rank collapse even if some detection signal overlaps the principal directions. To directly address the referee’s request, we will add targeted ablations in the revised manuscript that (i) measure detection performance when the principal subspace is progressively unfrozen and (ii) quantify the rank and generalization gap with and without the orthogonality constraint. These experiments will clarify the contribution of each subspace and strengthen the empirical support for our modeling choice.

    revision: partial
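
Two quantities from this exchange can be made concrete with a short sketch, under stated assumptions. `effective_rank` uses the entropy-of-singular-values definition as one possible way to "quantify the rank"; `overlap_with_principal` measures how much of a candidate detection direction (for example, a real-versus-fake mean difference) lies in the frozen subspace. Both function names and the toy inputs are illustrative, not taken from the paper.

```python
import numpy as np


def effective_rank(X, eps=1e-12):
    """Exponential of the Shannon entropy of the normalized singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero entries before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))


def overlap_with_principal(delta, V_r):
    """Fraction of the squared norm of `delta` inside span(V_r).

    Assumes the columns of V_r are orthonormal.
    """
    proj = V_r.T @ delta
    return float(proj @ proj / (delta @ delta))


# Toy inputs: a 3-dimensional space whose first two axes are "principal".
V_r = np.eye(3)[:, :2]
half_inside = overlap_with_principal(np.array([1.0, 0.0, 1.0]), V_r)
```

An overlap near 1 would indicate that freezing the principal subspace hides most of that direction from training, which is the situation the referee is worried about.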

Circularity Check

0 steps flagged

No circularity: SVD decomposition is an explicit algorithmic choice, not a self-referential reduction

The paper's central derivation applies SVD to the feature matrix of a pre-trained vision model, freezes the top singular components, and adapts only the orthogonal residual subspace. This procedure is defined directly by the linear algebra of SVD and the training protocol; the resulting feature space rank and orthogonality follow from the decomposition itself rather than from any fitted parameter that is later renamed as a prediction. The asymmetry phenomenon is presented as an empirical observation motivating the method, and the claim that freezing principal components preserves general knowledge is an interpretive hypothesis evaluated on downstream generalization benchmarks, not a tautology. No self-citation chain, uniqueness theorem, or ansatz smuggling appears in the load-bearing steps. The method is therefore self-contained against external benchmarks.
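
The statement that rank and orthogonality follow from the decomposition itself can be written out as follows. The notation is an assumption made for illustration: $W$ is the decomposed matrix, $r$ the cutoff, and $A$ any replacement for the residual singular-value block that keeps the residual inside the complementary subspaces.

```latex
\begin{align*}
W &= U \Sigma V^{\top}
   = \underbrace{U_r \Sigma_r V_r^{\top}}_{W_{\text{main}}\ \text{(frozen)}}
   + \underbrace{U_{\perp} \Sigma_{\perp} V_{\perp}^{\top}}_{W_{\text{res}}\ \text{(adapted)}}, \\
U_r^{\top} U_{\perp} &= 0, \qquad V_r^{\top} V_{\perp} = 0
   \;\Longrightarrow\;
   W_{\text{main}}^{\top} W_{\text{res}} = 0, \quad
   W_{\text{main}} W_{\text{res}}^{\top} = 0, \\
\operatorname{rank}\!\left(W_{\text{main}} + U_{\perp} A V_{\perp}^{\top}\right)
  &= r + \operatorname{rank}(A)
  \quad \text{when all } r \text{ diagonal entries of } \Sigma_r \text{ are nonzero.}
\end{align*}
```

The last line shows why an update confined to the complementary subspaces cannot lower the rank below $r$: the frozen block and the adapted block sit in separate diagonal blocks of the same orthogonal bases.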

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that SVD decomposition cleanly separates general pre-trained knowledge from fake-specific patterns; no free parameters or new entities are introduced beyond standard training choices.

axioms (1)
  • domain assumption: Feature spaces extracted from pre-trained vision foundation models admit an SVD decomposition in which the principal components encode general knowledge orthogonal to task-specific fake patterns.
    Invoked to justify freezing the principal components while adapting only the residual subspace.

pith-pipeline@v0.9.0 · 5589 in / 1388 out tokens · 40329 ms · 2026-05-17T23:20:52.690182+00:00 · methodology

Forward citations

Cited by 16 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection

    cs.CV 2026-05 unverdicted novelty 7.0

    LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.

  2. Reduce the Artifacts Bias for More Generalizable AI-Generated Image Detection

    cs.CV 2026-05 conditional novelty 6.0

    SEF introduces GAN upsampling for diverse artifacts and expert fusion to reduce domain interference, yielding stronger generalization on 13 benchmarks for AI-generated image detection.

  3. Decoupling Semantics and Fingerprints: A Universal Representation for AI-Generated Image Detection

    cs.CV 2026-05 unverdicted novelty 6.0

    ODP-Net structurally disentangles universal forgery traces from generator fingerprints and semantics via orthogonal decomposition and purification, delivering state-of-the-art generalization to unseen AI image generat...

  4. Rethinking Cross-Domain Evaluation for Face Forgery Detection with Semantic Fine-grained Alignment and Mixture-of-Experts

    cs.CV 2026-04 unverdicted novelty 6.0

    Cross-AUC exposes large robustness drops in existing face forgery detectors across datasets, while the SFAM model with semantic alignment and region-specific experts delivers better performance on public benchmarks.

  5. Combating Pattern and Content Bias: Adversarial Feature Learning for Generalized AI-Generated Image Detection

    cs.CV 2026-04 unverdicted novelty 6.0

    MAFL uses adversarial training to suppress pattern and content biases, guiding models to learn shared generative features for better cross-model generalization in detecting AI images.

  6. Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models

    cs.CV 2026-02 conditional novelty 6.0

    Frozen features from vision foundation models enable a linear probe to outperform specialized AIGI detectors by over 30% on in-the-wild data due to emergent forgery knowledge from pre-training.

  7. Scaling Up AI-Generated Image Detection with Generator-Aware Prototypes

    cs.CV 2025-12 unverdicted novelty 6.0

    GAPL learns a compact set of canonical forgery prototypes and applies two-stage LoRA training to build a low-variance feature space that improves generalization across GAN and diffusion generators.

  8. How Noise Benefits AI-generated Image Detection

    cs.CV 2025-11 unverdicted novelty 6.0

    PiN-CLIP jointly trains a noise generator and detector under a variational positive-incentive principle to inject feature-space noise that suppresses shortcut directions and improves out-of-distribution accuracy by 5....

  9. Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts

    cs.CV 2026-05 unverdicted novelty 5.0

    MDMF detects AI-generated images by learning patch-level forensic signatures and quantifying their distributional discrepancies with MMD, yielding larger separation than global methods when micro-defects are present.

  10. VRAG-DFD: Verifiable Retrieval-Augmentation for MLLM-based Deepfake Detection

    cs.CV 2026-04 unverdicted novelty 5.0

    VRAG-DFD uses RAG to retrieve forgery knowledge and RL-based training to build critical reasoning in MLLMs, delivering state-of-the-art generalization on deepfake detection tasks.

  11. LOGER: Local-Global Ensemble for Robust Deepfake Detection in the Wild

    cs.CV 2026-04 unverdicted novelty 5.0

    LOGER ensembles heterogeneous global vision models with selective local patch aggregation via multiple instance learning to achieve robust deepfake detection across varied manipulations and degradations.

  12. Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles

    cs.CV 2026-04 unverdicted novelty 4.0

    A multi-stream ensemble using DINOv2 and CLIP backbones trained with extreme degradations achieves stable deepfake detection and fourth place in the NTIRE 2026 challenge.

  13. Towards Generalizable Deepfake Image Detection with Vision Transformers

    cs.CV 2026-04 unverdicted novelty 4.0

    Ensemble of vision transformers reaches 96.77% AUC and 9% EER on DF-Wild deepfake test set, outperforming the prior Effort baseline by 7% AUC and 8% EER.

  14. Adaptive Forensic Feature Refinement via Intrinsic Importance Perception

    cs.CV 2026-04 unverdicted novelty 4.0

    I2P adaptively selects the most discriminative layers from visual foundation models for synthetic image detection and constrains task updates to low-sensitivity parameter subspaces to improve specificity without harmi...

  15. Boosting Robust AIGI Detection with LoRA-based Pairwise Training

    cs.CV 2026-04 unverdicted novelty 4.0

    LoRA-based pairwise training with distortion and size simulations boosts robust AIGI detection under severe distortions, placing third in the NTIRE challenge.

  16. HEDGE: Heterogeneous Ensemble for Detection of AI-GEnerated Images in the Wild

    cs.CV 2026-04 unverdicted novelty 4.0

    HEDGE is a heterogeneous ensemble using progressive DINOv3 training, multi-scale features, and MetaCLIP2 diversity with dual-gating fusion to achieve robust AI-generated image detection and 4th place in the NTIRE 2026...

Reference graph

Works this paper leans on

290 extracted references · 290 canonical work pages · cited by 16 Pith papers · 16 internal anchors

  1. [1]

    Wukong, 2022. 5. In https://xihe.mindspore.cn/modelzoo/wukong, 2022. 5

  2. [3]

    Brock, A. et al. Large scale gan training for high fidelity natural image synthesis. In ICLR, 2018 b

  3. [4]

    End-to-end reconstruction-classification learning for face forgery detection

    Cao, J., Ma, C., Yao, T., Chen, S., Ding, S., and Yang, X. End-to-end reconstruction-classification learning for face forgery detection. In CVPR, pp.\ 4113--4122, 2022

  4. [5]

    What makes fake images detectable? understanding properties that generalize

    Chai, L., Bau, D., Lim, S.-N., and Isola, P. What makes fake images detectable? understanding properties that generalize. In ECCV, pp.\ 103--120. Springer, 2020

  5. [6]

    Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images

    Chen, B., Zeng, J., Yang, J., and Yang, R. Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. In ICML, 2024

  6. [7]

    Learning to see in the dark

    Chen, C., Chen, Q., Xu, J., and Koltun, V. Learning to see in the dark. In CVPR, 2018

  7. [8]

    Self-supervised learning of adversarial example: Towards good generalizations for deepfake detection

    Chen, L., Zhang, Y., Song, Y., Liu, L., and Wang, J. Self-supervised learning of adversarial example: Towards good generalizations for deepfake detection. In CVPR, pp.\ 18710--18719, 2022 a

  8. [9]

    Ost: Improving generalization of deepfake detection via one-shot test-time training

    Chen, L., Zhang, Y., Song, Y., Wang, J., and Liu, L. Ost: Improving generalization of deepfake detection via one-shot test-time training. In NeurIPS, 2022 b

  9. [10]

    and Koltun, V

    Chen, Q. and Koltun, V. Photographic image synthesis with cascaded refinement networks. In ICCV, 2017

  10. [11]

    Can we leave deepfake data behind in training deepfake detector? NeurIPS, 2024

    Cheng, J., Yan, Z., Zhang, Y., Luo, Y., Wang, Z., and Li, C. Can we leave deepfake data behind in training deepfake detector? NeurIPS, 2024

  11. [12]

    Exploiting style latent flows for generalizing deepfake video detection

    Choi, J., Kim, T., Jeong, Y., Baek, S., and Choi, J. Exploiting style latent flows for generalizing deepfake video detection. In CVPR, pp.\ 1133--1143, 2024

  12. [13]

    Stargan: Unified generative adversarial networks for multi-domain image-to-image translation

    Choi, Y., Choi, M., Kim, M., Ha, J.-W., Kim, S., and Choo, J. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In CVPR, 2018

  13. [14]

    Second-order attention network for single image super-resolution

    Dai, T., Cai, J., Zhang, Y., Xia, S.-T., and Lei, Z. Second-order attention network for single image super-resolution. In CVPR, 2019

  14. [15]

    https://www.kaggle.com/c/deepfake-detection-challenge Accessed 2021-04-24

    detection challenge., D., 2020. https://www.kaggle.com/c/deepfake-detection-challenge Accessed 2021-04-24

  15. [16]

    https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html Accessed 2021-04-24

    DFD., 2020. https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html Accessed 2021-04-24

  16. [17]

    and Nichol, A

    Dhariwal, P. and Nichol, A. Diffusion models beat gans on image synthesis. In NeurIPS, 2021

  17. [18]

    Dhariwal, P. et al. Diffusion models beat gans on image synthesis. NeurIPS, 34: 0 8780--8794, 2021

  18. [19]

    Parameter-efficient fine-tuning of large-scale pre-trained language models

    Ding, N., Qin, Y., Yang, G., Wei, F., Yang, Z., Su, Y., Hu, S., Chen, Y., Chan, C.-M., Chen, W., et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5 0 (3): 0 220--235, 2023

  19. [20]

    Dolhansky, B., Howes, R., Pflaum, B., Baram, N., and Ferrer, C. C. The deepfake detection challenge (dfdc) preview dataset. arXiv preprint arXiv:1910.08854, 2019

  20. [21]

    Implicit identity leakage: The stumbling block to improving deepfake detection generalization

    Dong, S., Wang, J., Ji, R., Liang, J., Fan, H., and Ge, Z. Implicit identity leakage: The stumbling block to improving deepfake detection generalization. In CVPR, pp.\ 3994--4004, 2023

  21. [22]

    Exploring unbiased deepfake detection via token-level shuffling and mixing

    Fu, X., Yan, Z., Yao, T., Chen, S., and Li, X. Exploring unbiased deepfake detection via token-level shuffling and mixing. In AAAI, 2025

  22. [23]

    Generative adversarial networks

    Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial networks. Communications of the ACM, 63 0 (11): 0 139--144, 2020

  23. [24]

    Vector quantized diffusion model for text-to-image synthesis

    Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., Yuan, L., and Guo, B. Vector quantized diffusion model for text-to-image synthesis. In CVPR, pp.\ 10696--10706, 2022 a

  24. [25]

    Hierarchical contrastive inconsistency learning for deepfake video detection

    Gu, Z., Yao, T., Chen, Y., Ding, S., and Ma, L. Hierarchical contrastive inconsistency learning for deepfake video detection. In ECCV, pp.\ 596--613. Springer, 2022 b

  25. [26]

    Delving into sequential patches for deepfake detection

    Guan, J., Zhou, H., Hong, Z., Ding, E., Wang, J., Quan, C., and Zhao, Y. Delving into sequential patches for deepfake detection. NeurIPS, 35: 0 4517--4530, 2022

  26. [27]

    E., Bhojanapalli, S., Neyshabur, B., and Srebro, N

    Gunasekar, S., Woodworth, B. E., Bhojanapalli, S., Neyshabur, B., and Srebro, N. Implicit regularization in matrix factorization. NeurIPS, 30, 2017

  27. [28]

    Lips don't lie: A generalisable and robust approach to face forgery detection

    Haliassos, A., Vougioukas, K., Petridis, S., and Pantic, M. Lips don't lie: A generalisable and robust approach to face forgery detection. In CVPR, 2021

  28. [29]

    Leveraging real talking faces via self-supervision for robust forgery detection

    Haliassos, A., Mira, R., Petridis, S., and Pantic, M. Leveraging real talking faces via self-supervision for robust forgery detection. In CVPR, pp.\ 14950--14962, 2022

  29. [30]

    Deep residual learning for image recognition

    He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, pp.\ 770--778, 2016

  30. [31]

    Denoising diffusion probabilistic models

    Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. NeurIPS, 33: 0 6840--6851, 2020

  31. [33]

    Implicit identity driven deepfake face swapping detection

    Huang, B., Wang, Z., Yang, J., Ai, J., Zou, Q., Wang, Q., and Ye, D. Implicit identity driven deepfake face swapping detection. In CVPR, pp.\ 4490--4499, 2023

  32. [34]

    Jeong, Y. et al. Bihpf: Bilateral high-pass filters for robust deepfake detection. In WACV, pp.\ 48--57, 2022

  33. [35]

    Jiang, L., Li, R., Wu, W., Qian, C., and Loy, C. C. Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection. In CVPR, 2020

  34. [36]

    Progressive growing of gans for improved quality, stability, and variation

    Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. In ICLR, 2018

  35. [37]

    A style-based generator architecture for generative adversarial networks

    Karras, T., Laine, S., and Aila, T. A style-based generator architecture for generative adversarial networks. In CVPR, 2019

  36. [38]

    and Woo, S

    Khalid, H. and Woo, S. S. Oc-fakedect: Classifying deepfakes using one-class variational autoencoder. In CVPRW, pp.\ 656--657, 2020

  37. [40]

    Enhancing general face forgery detection via vision transformer with low-rank adaptation

    Kong, C., Li, H., and Wang, S. Enhancing general face forgery detection via vision transformer with low-rank adaptation. In ICCV, pp.\ 102--107. IEEE, 2023

  38. [42]

    DeepFakes: a New Threat to Face Recognition? Assessment and Detection

    Korshunov, P. and Marcel, S. Deepfakes: a new threat to face recognition? assessment and detection. arXiv preprint arXiv:1812.08685, 2018

  39. [43]

    Seeable: Soft discrepancies and bounded contrastive learning for exposing deepfakes

    Larue, N., Vu, N.-S., Struc, V., Peer, P., and Christophides, V. Seeable: Soft discrepancies and bounded contrastive learning for exposing deepfakes. In ICCV, pp.\ 21011--21021, 2023

  40. [44]

    Learning to generalize: Meta-learning for domain generalization

    Li, D., Yang, Y., Song, Y.-Z., and Hospedales, T. Learning to generalize: Meta-learning for domain generalization. In AAAI, volume 32, 2018

  41. [45]

    Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection

    Li, J., Xie, H., Li, J., Wang, Z., and Zhang, Y. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection. In CVPR, 2021

  42. [46]

    Diverse image synthesis from semantic layouts via conditional imle

    Li, K., Zhang, T., and Malik, J. Diverse image synthesis from semantic layouts via conditional imle. In ICCV, 2019

  43. [47]

    Face x-ray for more general face forgery detection

    Li, L., Bao, J., Zhang, T., Yang, H., Chen, D., Wen, F., and Guo, B. Face x-ray for more general face forgery detection. In CVPR, 2020 a

  44. [48]

    Celeb-df: A new dataset for deepfake forensics

    Li, Y., Yang, X., Sun, P., Qi, H., and Lyu, S. Celeb-df: A new dataset for deepfake forensics. In CVPR, 2020 b

  45. [50]

    Spatial-phase shallow learning: rethinking face forgery detection in frequency domain

    Liu, H., Li, X., Zhou, W., Chen, Y., He, Y., Xue, H., Zhang, W., and Yu, N. Spatial-phase shallow learning: rethinking face forgery detection in frequency domain. In CVPR, 2021 a

  46. [51]

    Forgery-aware adaptive transformer for generalizable synthetic image detection

    Liu, H., Tan, Z., Tan, C., Wei, Y., Wang, J., and Zhao, Y. Forgery-aware adaptive transformer for generalizable synthetic image detection. In CVPR, pp.\ 10770--10780, 2024

  47. [52]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In ICCV, pp.\ 10012--10022, 2021 b

  48. [53]

    Liu, Z. et al. Global texture enhancement for fake face detection in the wild. In CVPR, pp.\ 8060--8069, 2020

  49. [55]

    Luo, A., Kong, C., Huang, J., Hu, Y., Kang, X., and Kot, A. C. Beyond the prior forgery knowledge: Mining critical clues for general face forgery detection. IEEE TIFS, 19: 0 1168--1182, 2023 b

  50. [56]

    Generalizing face forgery detection with high-frequency features

    Luo, Y., Zhang, Y., Yan, J., and Liu, W. Generalizing face forgery detection with high-frequency features. In CVPR, 2021

  51. [57]

    Lare\^ 2 : Latent reconstruction error based method for diffusion-generated image detection

    Luo, Y., Du, J., Yan, K., and Ding, S. Lare\^ 2 : Latent reconstruction error based method for diffusion-generated image detection. In CVPR, pp.\ 17006--17015, 2024

  52. [58]

    F 2 trans: High-frequency fine-grained transformer for face forgery detection

    Miao, C., Tan, Z., Chu, Q., Liu, H., Hu, H., and Yu, N. F 2 trans: High-frequency fine-grained transformer for face forgery detection. IEEE TIFS, 18: 0 1039--1051, 2023

  53. [59]

    https://www.midjourney.com/home

    MidJourney. https://www.midjourney.com/home

  54. [60]

    and Rostamizadeh, A

    Mohri, M. and Rostamizadeh, A. Rademacher complexity bounds for non-iid processes. Advances in neural information processing systems, 21, 2008

  55. [61]

    M., Chandrasekaran, S., Flenner, A., Bappy, J

    Nataraj, L., Mohammed, T. M., Chandrasekaran, S., Flenner, A., Bappy, J. H., Roy-Chowdhury, A. K., and Manjunath, B. Detecting gan generated fake images using co-occurrence matrices. arXiv preprint arXiv:1903.06836, 2019

  56. [62]

    Core: Consistent representation learning for face forgery detection

    Ni, Y., Meng, D., Yu, C., Quan, C., Ren, D., and Zhao, Y. Core: Consistent representation learning for face forgery detection. In CVPRW, pp.\ 12--21, 2022

  57. [64]

    Glide: Towards photorealistic image generation and editing with text-guided diffusion models

    Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., and Chen, M. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In ICML, 2022

  58. [65]

    Ojha, U. et al. Towards universal fake image detectors that generalize across generative models. In CVPR, pp.\ 24480--24489, 2023

  59. [66]

    Semantic image synthesis with spatially-adaptive normalization

    Park, T., Liu, M.-Y., Wang, T.-C., and Zhu, J.-Y. Semantic image synthesis with spatially-adaptive normalization. In CVPR, 2019

  60. [67]

    Beit v2: Masked image modeling with vector-quantized visual tokenizers

    Peng, Z., Dong, L., Bao, H., Ye, Q., and Wei, F. Beit v2: Masked image modeling with vector-quantized visual tokenizers. arXiv preprint arXiv:2208.06366, 2022

  61. [68]

    Thinking in frequency: Face forgery detection by mining frequency-aware clues

    Qian, Y., Yin, G., Sheng, L., Chen, Z., and Shao, J. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In ECCV, pp.\ 86--103. Springer, 2020

  62. [69]

    W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al

    Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. Learning transferable visual models from natural language supervision. In ICML, pp.\ 8748--8763. PMLR, 2021

  63. [70]

    Zero-shot text-to-image generation

    Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., and Sutskever, I. Zero-shot text-to-image generation. In ICML, 2021

  64. [71]

    High-resolution image synthesis with latent diffusion models

    Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In CVPR, 2022a

  65. [72]

    High-resolution image synthesis with latent diffusion models

    Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In CVPR, pp. 10684–10695, 2022b

  66. [73]

    FaceForensics++: Learning to detect manipulated facial images

    Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., and Nießner, M. FaceForensics++: Learning to detect manipulated facial images. In ICCV, 2019

  67. [74]

    Faceforensics++: Learning to detect manipulated facial images

    Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., and Nießner, M. Faceforensics++: Learning to detect manipulated facial images. In ICCV, pp. 1–11, 2019

  68. [76]

    Detecting deepfakes with self-blended images

    Shiohara, K. and Yamasaki, T. Detecting deepfakes with self-blended images. In CVPR, pp. 18720–18729, 2022

  69. [77]

    Blendface: Re-designing identity encoders for face-swapping

    Shiohara, K., Yang, X., and Taketomi, T. Blendface: Re-designing identity encoders for face-swapping. In ICCV, pp. 7634–7644, 2023

  70. [78]

    Domain general face forgery detection by learning to weight

    Sun, K., Liu, H., Ye, Q., Gao, Y., Liu, J., Shao, L., and Ji, R. Domain general face forgery detection by learning to weight. In AAAI, volume 35, pp. 2638–2646, 2021

  71. [79]

    Dual contrastive learning for general face forgery detection

    Sun, K., Yao, T., Chen, S., Ding, S., Li, J., and Ji, R. Dual contrastive learning for general face forgery detection. In AAAI, volume 36, pp. 2316–2324, 2022

  72. [80]

    Learning on gradients: Generalized artifacts representation for gan-generated images detection

    Tan, C., Zhao, Y., Wei, S., Gu, G., and Wei, Y. Learning on gradients: Generalized artifacts representation for gan-generated images detection. In CVPR, pp. 12105–12114, June 2023

  73. [81]

    Data-independent operator: A training-free artifact representation extractor for generalizable deepfake detection

    Tan, C., Liu, P., Tao, R., Liu, H., Zhao, Y., Wu, B., and Wei, Y. Data-independent operator: A training-free artifact representation extractor for generalizable deepfake detection. arXiv preprint arXiv:2403.06803, 2024a

  74. [82]

    Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning

    Tan, C., Zhao, Y., Wei, S., Gu, G., Liu, P., and Wei, Y. Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning. In AAAI, volume 38, pp. 5052–5060, 2024b

  75. [83]

    Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection

    Tan, C., Zhao, Y., Wei, S., Gu, G., Liu, P., and Wei, Y. Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection. In CVPR, pp. 28130–28139, 2024c

  76. [84]

    Efficientnet: Rethinking model scaling for convolutional neural networks

    Tan, M. and Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML, pp. 6105–6114. PMLR, 2019

  77. [85]

    Face2face: Real-time face capture and reenactment of rgb videos

    Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., and Nießner, M. Face2face: Real-time face capture and reenactment of rgb videos. In CVPR, 2016

  78. [86]

    Training data-efficient image transformers & distillation through attention

    Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and Jégou, H. Training data-efficient image transformers & distillation through attention. In ICML, pp. 10347–10357. PMLR, 2021

  79. [87]

    An examination of fairness of ai models for deepfake detection

    Trinh, L. and Liu, Y. An examination of fairness of ai models for deepfake detection. arXiv, 2021

  80. [88]

    M2tr: Multi-modal multi-scale transformers for deepfake detection

    Wang, J., Wu, Z., Ouyang, W., Han, X., Chen, J., Jiang, Y.-G., and Li, S.-N. M2tr: Multi-modal multi-scale transformers for deepfake detection. In ICMR, pp. 615–623, 2022

Showing first 80 references.