arxiv: 2411.15633 · v4 · pith:NTVX2QFGnew · submitted 2024-11-23 · 💻 cs.CV

Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection

Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, Li Yuan

Pith reviewed 2026-05-17 23:20 UTC · model grok-4.3

classification 💻 cs.CV

keywords: AI-generated image detection, generalization, singular value decomposition, orthogonal subspaces, feature space, pre-trained models, overfitting

The pith

Decomposing features via SVD into orthogonal parts lets detectors freeze general pre-trained knowledge and adapt only the rest to spot AI fakes without overfitting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that AI-generated image detectors fail to generalize because training on limited fake examples collapses the feature space into a low-rank form that cannot express new patterns. The proposed fix splits the space with singular value decomposition into two orthogonal subspaces, holds the largest principal components fixed to keep broad pre-trained visual knowledge, and updates only the remaining components to capture fake signals. This explicit orthogonality keeps the overall space higher-ranked than under full fine-tuning or low-rank adapters. A side result is that the adapted components implicitly encode the idea that fakes are derived from reals rather than being independent of them.

Core claim

Using singular value decomposition to split the original feature space into two orthogonal subspaces, then freezing the principal components and adapting only the remaining ones, preserves the pre-trained knowledge while learning fake patterns. This keeps the rank of the whole feature space higher, reduces overfitting, and improves generalization compared with full-parameter and LoRA-based tuning. The method also implicitly learns a prior that fakes are derived from reals, which indicates a hierarchical relationship rather than independence.

What carries the argument

SVD orthogonal subspace decomposition that freezes principal components to retain pre-trained rank and adapts only residual components to learn detection signals.
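
A minimal NumPy sketch of the split described above may help make it concrete. It is an illustration under stated assumptions, not the authors' implementation: the matrix `W` is a random stand-in for whatever pre-trained matrix the method decomposes, and the function name `split_principal_residual` and the cutoff `r` are hypothetical.

```python
import numpy as np

def split_principal_residual(W, r):
    """Split W into a rank-r principal part and the remaining residual part via SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    principal = (U[:, :r] * s[:r]) @ Vt[:r, :]  # top-r components: the part kept frozen
    residual = (U[:, r:] * s[r:]) @ Vt[r:, :]   # remaining components: the part to adapt
    return principal, residual

# Hypothetical stand-in matrix (random, not taken from any real model).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
principal, residual = split_principal_residual(W, r=4)

# The two parts add back up to W, and their column spaces are orthogonal.
reconstruction_error = float(np.abs(principal + residual - W).max())
cross_term = float(np.abs(principal.T @ residual).max())
```

Both `reconstruction_error` and `cross_term` should come out at floating-point noise level, which is the sense in which the two subspaces are orthogonal at the moment of the split.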

Load-bearing premise

The largest directions found by SVD on pre-trained vision features hold general visual knowledge that does not overlap with the specific clues needed to detect fakes, so freezing them keeps useful information without blocking detection learning.

What would settle it

Apply the method to a training set of images from several known generators, then evaluate accuracy on a held-out generator never seen in training; if performance matches or falls below a standard fine-tuned baseline, the benefit of freezing principal components is not supported.
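
The comparison described above can be sketched as a small helper. This is only an illustration: the score arrays are made-up toy numbers, the 0.5 threshold is an assumption, and the names `accuracy` and `benefit_supported` are hypothetical. In practice the scores would come from the two detectors evaluated on real images plus fakes from the held-out generator.

```python
import numpy as np

def accuracy(scores, labels, threshold=0.5):
    """Fraction of correct real(0)/fake(1) decisions at a fixed score threshold."""
    preds = (np.asarray(scores, dtype=float) >= threshold).astype(int)
    return float((preds == np.asarray(labels)).mean())

def benefit_supported(method_scores, baseline_scores, labels, margin=0.0):
    """True only if the method strictly beats the baseline on the held-out generator."""
    return accuracy(method_scores, labels) > accuracy(baseline_scores, labels) + margin

# Toy numbers for illustration only (not results from the paper).
labels = [0, 0, 1, 1, 1, 0]
method_scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3]
baseline_scores = [0.1, 0.6, 0.9, 0.4, 0.3, 0.2]

method_acc = accuracy(method_scores, labels)
baseline_acc = accuracy(baseline_scores, labels)
supported = benefit_supported(method_scores, baseline_scores, labels)
```

A tie counts as "not supported" here, matching the criterion that performance which merely matches the baseline does not back the claim.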

Original abstract

AI-generated images (AIGIs), such as natural or face images, have become increasingly important yet challenging. In this paper, we start from a new perspective to excavate the reason behind the failure generalization in AIGI detection, named the asymmetry phenomenon, where a naively trained detector tends to favor overfitting to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked, which is proved seriously limiting the expressivity and generalization. One potential remedy is incorporating the pre-trained knowledge within the vision foundation models (higher-ranked) to expand the feature space, alleviating the model's overfitting to fake. To this end, we employ Singular Value Decomposition (SVD) to decompose the original feature space into two orthogonal subspaces. By freezing the principal components and adapting only the remained components, we preserve the pre-trained knowledge while learning fake patterns. Compared to existing full-parameters and LoRA-based tuning methods, we explicitly ensure orthogonality, enabling the higher rank of the whole feature space, effectively minimizing overfitting and enhancing generalization. We finally identify a crucial insight: our method implicitly learns a vital prior that fakes are actually derived from the real, indicating a hierarchical relationship rather than independence. Modeling this prior, we believe, is essential for achieving superior generalization. Our codes are publicly available on GitHub: https://github.com/YZY-stack/Effort-AIGI-Detection

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper uses SVD to split pre-trained features into orthogonal subspaces so the model can freeze general knowledge and adapt only residuals for AIGI detection, but the key assumption that top components hold no fake signal needs direct checks.

The main thing here is that they apply SVD to the feature matrix from a pre-trained vision model, split it into orthogonal parts, freeze the principal components, and tune only the rest to pick up fake patterns while keeping the overall space higher rank. This is framed as a fix for the asymmetry phenomenon, where standard training overfits to limited fakes and collapses expressivity. The explicit orthogonality step and the derived insight that fakes are hierarchically derived from reals rather than independent are the clearest new pieces. It goes beyond plain LoRA or full fine-tuning by enforcing the split at the subspace level.

The public code is a plus for anyone who wants to test it. The motivation for preserving pre-trained knowledge without letting the detector collapse is straightforward and addresses a real practical issue in media verification.

The soft spot sits in the central assumption: the top singular vectors are treated as carrying only general real-image structure that stays orthogonal to fake cues. If diffusion artifacts or frequency biases sit along those high-variance axes, freezing them would discard rather than protect useful signal, exactly as the stress-test note flags. The abstract motivates the asymmetry but the full experiments would have to show through ablations or feature visualizations that the frozen part really adds no detection value and that the residual adaptation produces measurable rank and generalization gains. Without that evidence the benefit stays plausible rather than demonstrated.

This is for people working on robust synthetic-image detectors who already use foundation models and want a structured alternative to standard adaptation. A reader who cares about generalization tricks in computer vision would get something concrete to try. It deserves a serious referee because the technical move is clear and the problem matters, even if the empirical link between the SVD split and the claimed gains still needs scrutiny.
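
One way to probe the soft spot raised in the letter is to measure how much of the real-versus-fake mean difference falls inside the top singular directions. The sketch below is a hypothetical diagnostic, not something the paper is known to report: the function name, the use of class-mean difference as a proxy for "detection signal", and the synthetic data are all assumptions.

```python
import numpy as np

def signal_fraction_in_principal(features, labels, r):
    """Share of the squared real-vs-fake mean difference lying in the top-r right singular subspace."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    centered = features - features.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    diff = features[labels == 1].mean(axis=0) - features[labels == 0].mean(axis=0)
    projected = Vt[:r] @ diff  # coordinates of the class difference in the top-r basis
    return float(projected @ projected / (diff @ diff))

# Synthetic toy data (not from the paper): the class shift is placed on the
# highest-variance axis, which is the worrying case described in the letter.
rng = np.random.default_rng(0)
features = 0.1 * rng.standard_normal((200, 5))
labels = np.repeat([0, 1], 100)
features[labels == 1, 0] += 5.0

fraction = signal_fraction_in_principal(features, labels, r=1)
```

A fraction close to 1 on real data would suggest that freezing the principal directions also freezes most of this particular signal; a fraction close to 0 would be consistent with the paper's assumption.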

Referee Report

1 major / 2 minor

Summary. The paper claims that naive training for AI-generated image (AIGI) detection exhibits an 'asymmetry phenomenon' in which the detector overfits to limited and monotonous fake patterns, collapsing the feature space to low rank and harming generalization. To remedy this, the authors apply Singular Value Decomposition (SVD) to features extracted from pre-trained vision foundation models, decomposing the space into two orthogonal subspaces. They freeze the principal components (to retain general pre-trained knowledge) while adapting only the residual components (to capture fake patterns), explicitly enforcing orthogonality to maintain higher rank, reduce overfitting, and improve generalization relative to full-parameter or LoRA tuning. The work also identifies an implicit prior that fakes are hierarchically derived from reals rather than independent.

Significance. If the SVD partitioning reliably isolates general knowledge from task-specific adaptation without discarding detection-critical directions, the method would supply a principled, orthogonality-aware fine-tuning recipe that directly targets rank collapse, a recurring issue in AIGI generalization. The derived insight about modeling the real-to-fake hierarchical prior could usefully shape subsequent detector design.

major comments (1)

[Method (SVD decomposition and asymmetry phenomenon)] The load-bearing assumption that the principal components obtained from SVD on pre-trained features encode only general real-image knowledge and lie orthogonal to (and independent of) the directions needed to detect fakes is not justified in the method description. SVD is performed on the variance structure of the pre-trained feature matrix without any real/fake separation; because common AIGI artifacts (frequency biases, diffusion-specific patterns) frequently align with high-variance axes, a non-negligible fraction of the detection signal may reside in the frozen principal subspace. Freezing it would then remove rather than protect useful information, undermining both the orthogonality guarantee and the claimed generalization benefit. This concern directly affects the central claim and requires either a formal argument or targeted ablations showing the contribution of the frozen versus

minor comments (2)

Specify exactly on which feature matrix (real-only, mixed real/fake, or pre-training corpus) the SVD is computed and how the rank cutoff for the principal subspace is chosen.
The abstract asserts generalization gains and the implicit prior but supplies no quantitative metrics, ablation tables, or cross-generator results; these must be clearly presented and compared against full fine-tuning and LoRA baselines.
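
For the rank-cutoff question in the first minor comment, one common heuristic is to keep the smallest number of components that retains a target share of the squared singular-value mass. The sketch below illustrates that heuristic only; it is an assumption for discussion, not the rule the paper uses, and the function name and the 0.9 default are hypothetical.

```python
import numpy as np

def rank_for_energy(singular_values, energy=0.9):
    """Smallest r whose top-r singular values hold at least `energy` of the total squared mass."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(s2) / s2.sum()
    r = int(np.searchsorted(cumulative, energy)) + 1
    return min(r, len(s2))  # guard against rounding at energy close to 1

# Example spectrum chosen for illustration: squared values 9, 4, 1, 0.25.
r_cut = rank_for_energy([3.0, 2.0, 1.0, 0.5], energy=0.9)
```

With this spectrum the first component holds about 63% of the squared mass and the first two about 91%, so the cutoff lands at two components.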

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. The concern regarding the justification of the SVD decomposition assumption is well-taken, and we address it directly below while outlining planned revisions to strengthen the presentation.

Referee: [Method (SVD decomposition and asymmetry phenomenon)] The load-bearing assumption that the principal components obtained from SVD on pre-trained features encode only general real-image knowledge and lie orthogonal to (and independent of) the directions needed to detect fakes is not justified in the method description. SVD is performed on the variance structure of the pre-trained feature matrix without any real/fake separation; because common AIGI artifacts (frequency biases, diffusion-specific patterns) frequently align with high-variance axes, a non-negligible fraction of the detection signal may reside in the frozen principal subspace. Freezing it would then remove rather than protect useful information, undermining both the orthogonality guarantee and the claimed generalization benefit. This concern directly affects the central claim and requires either a formal argument or targeted

Authors: We agree that the unsupervised nature of SVD on the pre-trained feature matrix does not explicitly separate real and fake directions, and that certain AIGI artifacts could in principle align with high-variance axes. Our core rationale is that the principal components still predominantly encode the high-rank, general visual priors learned from massive real-image corpora during foundation-model pre-training; the residual subspace then captures the lower-variance deviations that correspond to the hierarchical real-to-fake relationship we identify. The explicit orthogonality constraint we impose further prevents rank collapse even if some detection signal overlaps the principal directions. To directly address the referee’s request, we will add targeted ablations in the revised manuscript that (i) measure detection performance when the principal subspace is progressively unfrozen and (ii) quantify the rank and generalization gap with and without the orthogonality constraint. These experiments will clarify the contribution of each subspace and strengthen the empirical support for our modeling choice.

revision: partial
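
The rank quantity mentioned in ablation (ii) could be tracked with an entropy-based effective rank, which is one common way to turn a singular-value spectrum into a single number. This is a sketch under that assumption; the rebuttal does not say which rank measure would be used, and the function name is hypothetical.

```python
import numpy as np

def effective_rank(features):
    """Entropy-based effective rank: exp of the Shannon entropy of the normalized singular values."""
    s = np.linalg.svd(np.asarray(features, dtype=float), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so that log is defined
    return float(np.exp(-(p * np.log(p)).sum()))

# Two extreme toy cases: a fully spread spectrum and a collapsed one.
spread = effective_rank(np.eye(4))           # all four singular values equal
collapsed = effective_rank(np.ones((4, 4)))  # a single dominant singular value
```

The spread case gives an effective rank of 4 and the collapsed case gives approximately 1, so a drop in this number during fine-tuning would be one observable sign of the rank collapse the paper describes.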

Circularity Check

0 steps flagged

No circularity: SVD decomposition is an explicit algorithmic choice, not a self-referential reduction

The paper's central derivation applies SVD to the feature matrix of a pre-trained vision model, freezes the top singular components, and adapts only the orthogonal residual subspace. This procedure is defined directly by the linear algebra of SVD and the training protocol; the resulting feature space rank and orthogonality follow from the decomposition itself rather than from any fitted parameter that is later renamed as a prediction. The asymmetry phenomenon is presented as an empirical observation motivating the method, and the claim that freezing principal components preserves general knowledge is an interpretive hypothesis evaluated on downstream generalization benchmarks, not a tautology. No self-citation chain, uniqueness theorem, or ansatz smuggling appears in the load-bearing steps. The method is therefore self-contained against external benchmarks.
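
The statement that rank and orthogonality follow from the decomposition itself can be written out as follows. The notation is an assumption introduced here for illustration: $W$ is the decomposed matrix, $r$ is the number of frozen components, and the subscript $\perp$ marks the remaining components.

```latex
W = U \Sigma V^{\top}
  = \underbrace{U_r \Sigma_r V_r^{\top}}_{W_r \ \text{(frozen)}}
  + \underbrace{U_{\perp} \Sigma_{\perp} V_{\perp}^{\top}}_{W_{\perp} \ \text{(adapted)}},
\qquad
U_r^{\top} U_{\perp} = 0, \quad V_r^{\top} V_{\perp} = 0 .

W_r^{\top} W_{\perp}
  = V_r \Sigma_r \, \bigl( U_r^{\top} U_{\perp} \bigr) \, \Sigma_{\perp} V_{\perp}^{\top} = 0,
\qquad
\operatorname{rank}(W) = \operatorname{rank}(W_r) + \operatorname{rank}(W_{\perp}) .
```

These identities hold at the moment of the split. Whether they continue to hold after the residual part is updated depends on the constraint applied during training, which this page does not spell out.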

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that SVD decomposition cleanly separates general pre-trained knowledge from fake-specific patterns; no free parameters or new entities are introduced beyond standard training choices.

axioms (1)

domain assumption: Feature spaces extracted from pre-trained vision foundation models admit an SVD decomposition in which the principal components encode general knowledge orthogonal to task-specific fake patterns.
Invoked to justify freezing the principal components while adapting only the residual subspace.

pith-pipeline@v0.9.0 · 5589 in / 1388 out tokens · 40329 ms · 2026-05-17T23:20:52.690182+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: "we employ Singular Value Decomposition (SVD) to decompose the original feature space into two orthogonal subspaces. By freezing the principal components and adapting only the remained components"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective · unclear
Paper passage: "our method implicitly learns a vital prior that fakes are actually derived from the real, indicating a hierarchical relationship rather than independence"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 16 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection
cs.CV 2026-05 unverdicted novelty 7.0

LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.

Reduce the Artifacts Bias for More Generalizable AI-Generated Image Detection
cs.CV 2026-05 conditional novelty 6.0

SEF introduces GAN upsampling for diverse artifacts and expert fusion to reduce domain interference, yielding stronger generalization on 13 benchmarks for AI-generated image detection.

Decoupling Semantics and Fingerprints: A Universal Representation for AI-Generated Image Detection
cs.CV 2026-05 unverdicted novelty 6.0

ODP-Net structurally disentangles universal forgery traces from generator fingerprints and semantics via orthogonal decomposition and purification, delivering state-of-the-art generalization to unseen AI image generat...

Rethinking Cross-Domain Evaluation for Face Forgery Detection with Semantic Fine-grained Alignment and Mixture-of-Experts
cs.CV 2026-04 unverdicted novelty 6.0

Cross-AUC exposes large robustness drops in existing face forgery detectors across datasets, while the SFAM model with semantic alignment and region-specific experts delivers better performance on public benchmarks.

Combating Pattern and Content Bias: Adversarial Feature Learning for Generalized AI-Generated Image Detection
cs.CV 2026-04 unverdicted novelty 6.0

MAFL uses adversarial training to suppress pattern and content biases, guiding models to learn shared generative features for better cross-model generalization in detecting AI images.

Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models
cs.CV 2026-02 conditional novelty 6.0

Frozen features from vision foundation models enable a linear probe to outperform specialized AIGI detectors by over 30% on in-the-wild data due to emergent forgery knowledge from pre-training.

Scaling Up AI-Generated Image Detection with Generator-Aware Prototypes
cs.CV 2025-12 unverdicted novelty 6.0

GAPL learns a compact set of canonical forgery prototypes and applies two-stage LoRA training to build a low-variance feature space that improves generalization across GAN and diffusion generators.

How Noise Benefits AI-generated Image Detection
cs.CV 2025-11 unverdicted novelty 6.0

PiN-CLIP jointly trains a noise generator and detector under a variational positive-incentive principle to inject feature-space noise that suppresses shortcut directions and improves out-of-distribution accuracy by 5....

Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts
cs.CV 2026-05 unverdicted novelty 5.0

MDMF detects AI-generated images by learning patch-level forensic signatures and quantifying their distributional discrepancies with MMD, yielding larger separation than global methods when micro-defects are present.

VRAG-DFD: Verifiable Retrieval-Augmentation for MLLM-based Deepfake Detection
cs.CV 2026-04 unverdicted novelty 5.0

VRAG-DFD uses RAG to retrieve forgery knowledge and RL-based training to build critical reasoning in MLLMs, delivering state-of-the-art generalization on deepfake detection tasks.

LOGER: Local--Global Ensemble for Robust Deepfake Detection in the Wild
cs.CV 2026-04 unverdicted novelty 5.0

LOGER ensembles heterogeneous global vision models with selective local patch aggregation via multiple instance learning to achieve robust deepfake detection across varied manipulations and degradations.

Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles
cs.CV 2026-04 unverdicted novelty 4.0

A multi-stream ensemble using DINOv2 and CLIP backbones trained with extreme degradations achieves stable deepfake detection and fourth place in the NTIRE 2026 challenge.

Towards Generalizable Deepfake Image Detection with Vision Transformers
cs.CV 2026-04 unverdicted novelty 4.0

Ensemble of vision transformers reaches 96.77% AUC and 9% EER on DF-Wild deepfake test set, outperforming the prior Effort baseline by 7% AUC and 8% EER.

Adaptive Forensic Feature Refinement via Intrinsic Importance Perception
cs.CV 2026-04 unverdicted novelty 4.0

I2P adaptively selects the most discriminative layers from visual foundation models for synthetic image detection and constrains task updates to low-sensitivity parameter subspaces to improve specificity without harmi...

Boosting Robust AIGI Detection with LoRA-based Pairwise Training
cs.CV 2026-04 unverdicted novelty 4.0

LoRA-based pairwise training with distortion and size simulations boosts robust AIGI detection under severe distortions, placing third in the NTIRE challenge.

HEDGE: Heterogeneous Ensemble for Detection of AI-GEnerated Images in the Wild
cs.CV 2026-04 unverdicted novelty 4.0

HEDGE is a heterogeneous ensemble using progressive DINOv3 training, multi-scale features, and MetaCLIP2 diversity with dual-gating fusion to achieve robust AI-generated image detection and 4th place in the NTIRE 2026...

Reference graph

Works this paper leans on

290 extracted references · 290 canonical work pages · cited by 16 Pith papers · 16 internal anchors

[1]

Wukong, 2022. 5. In https://xihe.mindspore.cn/modelzoo/wukong, 2022. 5

work page 2022
[3]

Brock, A. et al. Large scale gan training for high fidelity natural image synthesis. In ICLR, 2018 b

work page 2018
[4]

End-to-end reconstruction-classification learning for face forgery detection

Cao, J., Ma, C., Yao, T., Chen, S., Ding, S., and Yang, X. End-to-end reconstruction-classification learning for face forgery detection. In CVPR, pp.\ 4113--4122, 2022

work page 2022
[5]

What makes fake images detectable? understanding properties that generalize

Chai, L., Bau, D., Lim, S.-N., and Isola, P. What makes fake images detectable? understanding properties that generalize. In ECCV, pp.\ 103--120. Springer, 2020

work page 2020
[6]

Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images

Chen, B., Zeng, J., Yang, J., and Yang, R. Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. In ICML, 2024

work page 2024
[7]

Learning to see in the dark

Chen, C., Chen, Q., Xu, J., and Koltun, V. Learning to see in the dark. In CVPR, 2018

work page 2018
[8]

Self-supervised learning of adversarial example: Towards good generalizations for deepfake detection

Chen, L., Zhang, Y., Song, Y., Liu, L., and Wang, J. Self-supervised learning of adversarial example: Towards good generalizations for deepfake detection. In CVPR, pp.\ 18710--18719, 2022 a

work page 2022
[9]

Ost: Improving generalization of deepfake detection via one-shot test-time training

Chen, L., Zhang, Y., Song, Y., Wang, J., and Liu, L. Ost: Improving generalization of deepfake detection via one-shot test-time training. In NeurIPS, 2022 b

work page 2022
[10]

and Koltun, V

Chen, Q. and Koltun, V. Photographic image synthesis with cascaded refinement networks. In ICCV, 2017

work page 2017
[11]

Can we leave deepfake data behind in training deepfake detector? NeurIPS, 2024

Cheng, J., Yan, Z., Zhang, Y., Luo, Y., Wang, Z., and Li, C. Can we leave deepfake data behind in training deepfake detector? NeurIPS, 2024

work page 2024
[12]

Exploiting style latent flows for generalizing deepfake video detection

Choi, J., Kim, T., Jeong, Y., Baek, S., and Choi, J. Exploiting style latent flows for generalizing deepfake video detection. In CVPR, pp.\ 1133--1143, 2024

work page 2024
[13]

Stargan: Unified generative adversarial networks for multi-domain image-to-image translation

Choi, Y., Choi, M., Kim, M., Ha, J.-W., Kim, S., and Choo, J. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In CVPR, 2018

work page 2018
[14]

Second-order attention network for single image super-resolution

Dai, T., Cai, J., Zhang, Y., Xia, S.-T., and Lei, Z. Second-order attention network for single image super-resolution. In CVPR, 2019

work page 2019
[15]

https://www.kaggle.com/c/deepfake-detection-challenge Accessed 2021-04-24

detection challenge., D., 2020. https://www.kaggle.com/c/deepfake-detection-challenge Accessed 2021-04-24

work page 2020
[16]

https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html Accessed 2021-04-24

DFD., 2020. https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html Accessed 2021-04-24

work page 2020
[17]

and Nichol, A

Dhariwal, P. and Nichol, A. Diffusion models beat gans on image synthesis. In NeurIPS, 2021

work page 2021
[18]

Dhariwal, P. et al. Diffusion models beat gans on image synthesis. NeurIPS, 34: 0 8780--8794, 2021

work page 2021
[19]

Parameter-efficient fine-tuning of large-scale pre-trained language models

Ding, N., Qin, Y., Yang, G., Wei, F., Yang, Z., Su, Y., Hu, S., Chen, Y., Chan, C.-M., Chen, W., et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5 0 (3): 0 220--235, 2023

work page 2023
[20]

Dolhansky, B., Howes, R., Pflaum, B., Baram, N., and Ferrer, C. C. The deepfake detection challenge (dfdc) preview dataset. arXiv preprint arXiv:1910.08854, 2019

work page arXiv 1910
[21]

Implicit identity leakage: The stumbling block to improving deepfake detection generalization

Dong, S., Wang, J., Ji, R., Liang, J., Fan, H., and Ge, Z. Implicit identity leakage: The stumbling block to improving deepfake detection generalization. In CVPR, pp.\ 3994--4004, 2023

work page 2023
[22]

Exploring unbiased deepfake detection via token-level shuffling and mixing

Fu, X., Yan, Z., Yao, T., Chen, S., and Li, X. Exploring unbiased deepfake detection via token-level shuffling and mixing. In AAAI, 2025

work page 2025
[23]

Generative adversarial networks

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial networks. Communications of the ACM, 63 0 (11): 0 139--144, 2020

work page 2020
[24]

Vector quantized diffusion model for text-to-image synthesis

Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., Yuan, L., and Guo, B. Vector quantized diffusion model for text-to-image synthesis. In CVPR, pp.\ 10696--10706, 2022 a

work page 2022
[25]

Hierarchical contrastive inconsistency learning for deepfake video detection

Gu, Z., Yao, T., Chen, Y., Ding, S., and Ma, L. Hierarchical contrastive inconsistency learning for deepfake video detection. In ECCV, pp.\ 596--613. Springer, 2022 b

work page 2022
[26]

Delving into sequential patches for deepfake detection

Guan, J., Zhou, H., Hong, Z., Ding, E., Wang, J., Quan, C., and Zhao, Y. Delving into sequential patches for deepfake detection. NeurIPS, 35: 0 4517--4530, 2022

work page 2022
[27]

E., Bhojanapalli, S., Neyshabur, B., and Srebro, N

Gunasekar, S., Woodworth, B. E., Bhojanapalli, S., Neyshabur, B., and Srebro, N. Implicit regularization in matrix factorization. NeurIPS, 30, 2017

work page 2017
[28]

Lips don't lie: A generalisable and robust approach to face forgery detection

Haliassos, A., Vougioukas, K., Petridis, S., and Pantic, M. Lips don't lie: A generalisable and robust approach to face forgery detection. In CVPR, 2021

work page 2021
[29]

Leveraging real talking faces via self-supervision for robust forgery detection

Haliassos, A., Mira, R., Petridis, S., and Pantic, M. Leveraging real talking faces via self-supervision for robust forgery detection. In CVPR, pp.\ 14950--14962, 2022

work page 2022
[30]

Deep residual learning for image recognition

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, pp.\ 770--778, 2016

work page 2016
[31]

Denoising diffusion probabilistic models

Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. NeurIPS, 33: 0 6840--6851, 2020

work page 2020
[33]

Implicit identity driven deepfake face swapping detection

Huang, B., Wang, Z., Yang, J., Ai, J., Zou, Q., Wang, Q., and Ye, D. Implicit identity driven deepfake face swapping detection. In CVPR, pp.\ 4490--4499, 2023

work page 2023
[34]

Jeong, Y. et al. Bihpf: Bilateral high-pass filters for robust deepfake detection. In WACV, pp.\ 48--57, 2022

work page 2022
[35]

Jiang, L., Li, R., Wu, W., Qian, C., and Loy, C. C. Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection. In CVPR, 2020

work page 2020
[36]

Progressive growing of gans for improved quality, stability, and variation

Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. In ICLR, 2018

work page 2018
[37]

A style-based generator architecture for generative adversarial networks

Karras, T., Laine, S., and Aila, T. A style-based generator architecture for generative adversarial networks. In CVPR, 2019

work page 2019
[38]

and Woo, S

Khalid, H. and Woo, S. S. Oc-fakedect: Classifying deepfakes using one-class variational autoencoder. In CVPRW, pp.\ 656--657, 2020

work page 2020
[40]

Enhancing general face forgery detection via vision transformer with low-rank adaptation

Kong, C., Li, H., and Wang, S. Enhancing general face forgery detection via vision transformer with low-rank adaptation. In ICCV, pp.\ 102--107. IEEE, 2023

work page 2023
[42]

DeepFakes: a New Threat to Face Recognition? Assessment and Detection

Korshunov, P. and Marcel, S. Deepfakes: a new threat to face recognition? assessment and detection. arXiv preprint arXiv:1812.08685, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[43]

Seeable: Soft discrepancies and bounded contrastive learning for exposing deepfakes

Larue, N., Vu, N.-S., Struc, V., Peer, P., and Christophides, V. Seeable: Soft discrepancies and bounded contrastive learning for exposing deepfakes. In ICCV, pp.\ 21011--21021, 2023

work page 2023
[44]

Learning to generalize: Meta-learning for domain generalization

Li, D., Yang, Y., Song, Y.-Z., and Hospedales, T. Learning to generalize: Meta-learning for domain generalization. In AAAI, volume 32, 2018

work page 2018
[45]

Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection

Li, J., Xie, H., Li, J., Wang, Z., and Zhang, Y. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection. In CVPR, 2021

work page 2021
[46]

Diverse image synthesis from semantic layouts via conditional imle

Li, K., Zhang, T., and Malik, J. Diverse image synthesis from semantic layouts via conditional imle. In ICCV, 2019

work page 2019
[47]

Face x-ray for more general face forgery detection

Li, L., Bao, J., Zhang, T., Yang, H., Chen, D., Wen, F., and Guo, B. Face x-ray for more general face forgery detection. In CVPR, 2020 a

work page 2020
[48]

Celeb-df: A new dataset for deepfake forensics

Li, Y., Yang, X., Sun, P., Qi, H., and Lyu, S. Celeb-df: A new dataset for deepfake forensics. In CVPR, 2020 b

work page 2020
[50]

Spatial-phase shallow learning: rethinking face forgery detection in frequency domain

Liu, H., Li, X., Zhou, W., Chen, Y., He, Y., Xue, H., Zhang, W., and Yu, N. Spatial-phase shallow learning: rethinking face forgery detection in frequency domain. In CVPR, 2021 a

work page 2021
[51]

Forgery-aware adaptive transformer for generalizable synthetic image detection

Liu, H., Tan, Z., Tan, C., Wei, Y., Wang, J., and Zhao, Y. Forgery-aware adaptive transformer for generalizable synthetic image detection. In CVPR, pp.\ 10770--10780, 2024

work page 2024
[52]

Swin transformer: Hierarchical vision transformer using shifted windows

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In ICCV, pp.\ 10012--10022, 2021 b

work page 2021
[53]

Liu, Z. et al. Global texture enhancement for fake face detection in the wild. In CVPR, pp.\ 8060--8069, 2020

work page 2020
[55]

Luo, A., Kong, C., Huang, J., Hu, Y., Kang, X., and Kot, A. C. Beyond the prior forgery knowledge: Mining critical clues for general face forgery detection. IEEE TIFS, 19: 0 1168--1182, 2023 b

work page 2023
[56]

Generalizing face forgery detection with high-frequency features

Luo, Y., Zhang, Y., Yan, J., and Liu, W. Generalizing face forgery detection with high-frequency features. In CVPR, 2021

[57]

LaRE^2: Latent reconstruction error based method for diffusion-generated image detection

Luo, Y., Du, J., Yan, K., and Ding, S. LaRE^2: Latent reconstruction error based method for diffusion-generated image detection. In CVPR, pp. 17006-17015, 2024

[58]

F^2Trans: High-frequency fine-grained transformer for face forgery detection

Miao, C., Tan, Z., Chu, Q., Liu, H., Hu, H., and Yu, N. F^2Trans: High-frequency fine-grained transformer for face forgery detection. IEEE TIFS, 18:1039-1051, 2023

[59]

https://www.midjourney.com/home

MidJourney. https://www.midjourney.com/home

[60]

Rademacher complexity bounds for non-iid processes

Mohri, M. and Rostamizadeh, A. Rademacher complexity bounds for non-iid processes. Advances in neural information processing systems, 21, 2008

[61]

Detecting gan generated fake images using co-occurrence matrices

Nataraj, L., Mohammed, T. M., Chandrasekaran, S., Flenner, A., Bappy, J. H., Roy-Chowdhury, A. K., and Manjunath, B. Detecting gan generated fake images using co-occurrence matrices. arXiv preprint arXiv:1903.06836, 2019

[62]

Core: Consistent representation learning for face forgery detection

Ni, Y., Meng, D., Yu, C., Quan, C., Ren, D., and Zhao, Y. Core: Consistent representation learning for face forgery detection. In CVPRW, pp. 12-21, 2022

[64]

Glide: Towards photorealistic image generation and editing with text-guided diffusion models

Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., and Chen, M. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In ICML, 2022

[65]

Ojha, U. et al. Towards universal fake image detectors that generalize across generative models. In CVPR, pp. 24480-24489, 2023

[66]

Semantic image synthesis with spatially-adaptive normalization

Park, T., Liu, M.-Y., Wang, T.-C., and Zhu, J.-Y. Semantic image synthesis with spatially-adaptive normalization. In CVPR, 2019

[67]

Beit v2: Masked image modeling with vector-quantized visual tokenizers

Peng, Z., Dong, L., Bao, H., Ye, Q., and Wei, F. Beit v2: Masked image modeling with vector-quantized visual tokenizers. arXiv preprint arXiv:2208.06366, 2022

[68]

Thinking in frequency: Face forgery detection by mining frequency-aware clues

Qian, Y., Yin, G., Sheng, L., Chen, Z., and Shao, J. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In ECCV, pp. 86-103. Springer, 2020

[69]

Learning transferable visual models from natural language supervision

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. Learning transferable visual models from natural language supervision. In ICML, pp. 8748-8763. PMLR, 2021

[70]

Zero-shot text-to-image generation

Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., and Sutskever, I. Zero-shot text-to-image generation. In ICML, 2021

[71]

High-resolution image synthesis with latent diffusion models

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In CVPR, 2022a

[72]

High-resolution image synthesis with latent diffusion models

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In CVPR, pp. 10684-10695, 2022b

[73]

FaceForensics++: Learning to detect manipulated facial images

Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., and Nießner, M. FaceForensics++: Learning to detect manipulated facial images. In ICCV, 2019

[74]

Faceforensics++: Learning to detect manipulated facial images

Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., and Nießner, M. Faceforensics++: Learning to detect manipulated facial images. In ICCV, pp. 1-11, 2019

[76]

Detecting deepfakes with self-blended images

Shiohara, K. and Yamasaki, T. Detecting deepfakes with self-blended images. In CVPR, pp. 18720-18729, 2022

[77]

Blendface: Re-designing identity encoders for face-swapping

Shiohara, K., Yang, X., and Taketomi, T. Blendface: Re-designing identity encoders for face-swapping. In ICCV, pp. 7634-7644, 2023

[78]

Domain general face forgery detection by learning to weight

Sun, K., Liu, H., Ye, Q., Gao, Y., Liu, J., Shao, L., and Ji, R. Domain general face forgery detection by learning to weight. In AAAI, volume 35, pp. 2638-2646, 2021

[79]

Dual contrastive learning for general face forgery detection

Sun, K., Yao, T., Chen, S., Ding, S., Li, J., and Ji, R. Dual contrastive learning for general face forgery detection. In AAAI, volume 36, pp. 2316-2324, 2022

[80]

Learning on gradients: Generalized artifacts representation for gan-generated images detection

Tan, C., Zhao, Y., Wei, S., Gu, G., and Wei, Y. Learning on gradients: Generalized artifacts representation for gan-generated images detection. In CVPR, pp. 12105-12114, June 2023

[81]

Data-independent operator: A training-free artifact representation extractor for generalizable deepfake detection

Tan, C., Liu, P., Tao, R., Liu, H., Zhao, Y., Wu, B., and Wei, Y. Data-independent operator: A training-free artifact representation extractor for generalizable deepfake detection. arXiv preprint arXiv:2403.06803, 2024a

[82]

Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning

Tan, C., Zhao, Y., Wei, S., Gu, G., Liu, P., and Wei, Y. Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning. In AAAI, volume 38, pp. 5052-5060, 2024b

[83]

Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection

Tan, C., Zhao, Y., Wei, S., Gu, G., Liu, P., and Wei, Y. Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection. In CVPR, pp. 28130-28139, 2024c

[84]

Efficientnet: Rethinking model scaling for convolutional neural networks

Tan, M. and Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML, pp. 6105-6114. PMLR, 2019

[85]

Face2face: Real-time face capture and reenactment of rgb videos

Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C., and Nießner, M. Face2face: Real-time face capture and reenactment of rgb videos. In CVPR, 2016

[86]

Training data-efficient image transformers & distillation through attention

Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and Jégou, H. Training data-efficient image transformers & distillation through attention. In ICML, pp. 10347-10357. PMLR, 2021

[87]

An examination of fairness of ai models for deepfake detection

Trinh, L. and Liu, Y. An examination of fairness of ai models for deepfake detection. arXiv, 2021

[88]

M2tr: Multi-modal multi-scale transformers for deepfake detection

Wang, J., Wu, Z., Ouyang, W., Han, X., Chen, J., Jiang, Y.-G., and Li, S.-N. M2tr: Multi-modal multi-scale transformers for deepfake detection. In ICMR, pp. 615-623, 2022

