DRIFT: From Robustness Gaps to Invariance Manifolds for AI-Generated Image Detection

Abhishek Ameta; Akshay Janardan Bankar; Amit Satish Unde; Ankita Chatterjee; Harshit; Sayan Banerjee; Shreyas Pandith

arxiv: 2606.06918 · v1 · pith:MRGU775A · submitted 2026-06-05 · 💻 cs.CV


Abhishek Ameta, Sayan Banerjee, Shreyas Pandith, Harshit, Ankita Chatterjee, Akshay Janardan Bankar, Amit Satish Unde

Pith reviewed 2026-06-27 22:31 UTC · model grok-4.3

classification 💻 cs.CV

keywords: AI-generated image detection · invariance manifold · one-class supervision · robust subspace · fragile subspace · ordering margin · open-world generalization · multi-scale drift


The pith

Detection of AI-generated images works by learning an invariance manifold from real images alone and flagging margin violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that AI-generated image detection can be reformulated as one-class learning of a structured invariance manifold built only from real images. Lightweight projection heads on a frozen vision foundation model split the space into a robust subspace that ignores physical imaging changes and a fragile subspace that stays sensitive to edits, with an ordering margin keeping the two separated. Detection then reduces to checking whether a candidate image violates that margin under the learned transformations. This matters to a sympathetic reader because prior robustness-gap methods use fixed pretraining geometry that fails on new generators, while the manifold approach claims to adapt the invariance structure directly to the detection task and to supply localization maps as well.
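A minimal sketch of the decomposition described above, assuming the frozen VFM yields one feature vector per image and that each "lightweight projection head" is a single linear map (the summary does not specify the architecture). `robust_head`, `fragile_head`, the dimensions, and the random stand-in features are hypothetical. Drift is measured here as cosine distance between the projections of an image and of one transformed view, which is one common choice rather than the paper's confirmed metric.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, PROJ_DIM = 768, 128  # hypothetical sizes (768 = ViT-B token width)

# Hypothetical lightweight heads: one linear map per subspace. In the paper's
# setup these would be trained on real images; here they are random stand-ins.
robust_head = rng.standard_normal((PROJ_DIM, FEAT_DIM)) / np.sqrt(FEAT_DIM)
fragile_head = rng.standard_normal((PROJ_DIM, FEAT_DIM)) / np.sqrt(FEAT_DIM)


def cosine_drift(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity; 0 when the two projections point the same way."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def subspace_drifts(feat_orig: np.ndarray, feat_view: np.ndarray) -> tuple[float, float]:
    """(robust drift, fragile drift) between frozen-VFM features of an image and
    of one transformed view; only the heads would receive gradients in training."""
    robust = cosine_drift(robust_head @ feat_orig, robust_head @ feat_view)
    fragile = cosine_drift(fragile_head @ feat_orig, fragile_head @ feat_view)
    return robust, fragile


feat = rng.standard_normal(FEAT_DIM)                 # stand-in frozen VFM feature
view = feat + 0.05 * rng.standard_normal(FEAT_DIM)   # stand-in transformed view
print(subspace_drifts(feat, view))
```

Keeping the backbone frozen means only the two small matrices would be learned, which matches the "lightweight heads on a frozen VFM" framing.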

Core claim

We formulate AI-generated image detection as learning a structured invariance manifold of real images under one-class supervision. Building upon a frozen VFM, we introduce lightweight projection heads that decompose representation space into complementary robust and fragile subspaces. The robust subspace is explicitly trained to suppress variations induced by physically plausible imaging transformations, approximating tangent directions of a real-image manifold, while the fragile subspace retains sensitivity to edit-like perturbations. A structured ordering margin enforces hierarchical separation between physical invariance and edit-induced variability, enabling detection as a margin-violation test relative to the learned manifold.
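One way to read the "structured ordering margin" is as a hinge that asks robust-subspace drift under physical transforms to sit at least a margin below fragile-subspace drift under edit-like transforms. The sketch below assumes exactly that single-hinge form and a margin of 0.1; the paper's actual objective (which the referee report below asks to see) may be structured differently, and `ordering_margin_loss` is a hypothetical name.

```python
import numpy as np


def ordering_margin_loss(robust_drift_phys, fragile_drift_edit, margin: float = 0.1) -> float:
    """Mean hinge penalty; zero when robust_drift_phys + margin <= fragile_drift_edit.

    Both arguments are per-image drift arrays of equal shape (assumed form)."""
    robust_drift_phys = np.asarray(robust_drift_phys, dtype=float)
    fragile_drift_edit = np.asarray(fragile_drift_edit, dtype=float)
    violation = robust_drift_phys + margin - fragile_drift_edit
    return float(np.mean(np.maximum(0.0, violation)))


# Ordering satisfied for both images -> no penalty.
print(ordering_margin_loss([0.02, 0.05], [0.30, 0.40]))  # 0.0
# Second image violates the ordering by about 0.05 -> mean penalty of about 0.025.
print(ordering_margin_loss([0.02, 0.25], [0.30, 0.30]))
```

A hinge only pushes on images that break the ordering, so well-separated real images contribute no gradient; whether the margin should be fixed or learned is exactly the open question raised in the minor comments.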

What carries the argument

structured invariance manifold with robust and fragile subspaces separated by a structured ordering margin

If this is right

Images from unseen generators trigger margin violations relative to the learned real-image manifold, allowing detection without any fake training data.
Multi-scale patch-wise drift supplies both an overall detection score and spatially localized invariance-violation maps.
The dual-channel signature from robust and fragile subspaces improves open-world performance over fixed pretraining robustness gaps.
The method generalizes across generator types and image resolutions by construction of the manifold rather than by retraining on new fakes.
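The multi-scale patch-wise drift in the list above can be pictured as follows: compute a per-patch drift at several patch-grid resolutions, upsample each grid to a common size, and average them into a localization map whose mean serves as a scalar score. This is a sketch for one transformation channel under assumptions: cosine drift per patch, nearest-neighbour upsampling via `np.kron`, and equal weighting of scales are illustrative choices, and `multiscale_drift_map` is a hypothetical helper, not the paper's code.

```python
import numpy as np


def patch_cosine_drift(tokens_a: np.ndarray, tokens_b: np.ndarray) -> np.ndarray:
    """Per-patch 1 - cosine similarity for token grids of shape (H, W, D)."""
    num = np.sum(tokens_a * tokens_b, axis=-1)
    den = np.linalg.norm(tokens_a, axis=-1) * np.linalg.norm(tokens_b, axis=-1) + 1e-12
    return 1.0 - num / den


def multiscale_drift_map(pairs, out_size: int):
    """Average per-patch drift over scales on a common out_size x out_size grid.

    pairs: one (tokens_orig, tokens_transformed) tuple per scale; each grid side
    must divide out_size so nearest-neighbour upsampling is exact."""
    maps = []
    for tok_o, tok_t in pairs:
        d = patch_cosine_drift(tok_o, tok_t)                # (h, w) drift grid
        factor = out_size // d.shape[0]
        maps.append(np.kron(d, np.ones((factor, factor))))  # upsample to out_size
    heatmap = np.mean(maps, axis=0)
    return heatmap, float(heatmap.mean())


rng = np.random.default_rng(0)
coarse = rng.standard_normal((4, 4, 16))   # stand-in patch tokens, coarse scale
fine = rng.standard_normal((8, 8, 16))     # stand-in patch tokens, fine scale
fine_view = fine.copy()
fine_view[:2, :2] = rng.standard_normal((2, 2, 16))  # drift confined to top-left
heatmap, score = multiscale_drift_map([(coarse, coarse), (fine, fine_view)], out_size=8)
```

Only the top-left patches drift here, so the heatmap is non-zero there and zero elsewhere, which is the kind of spatial localization the list describes.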

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same one-class manifold construction could be tested on video or audio by swapping the definition of physically plausible transformations.
The fragile subspace might be inspected directly to surface recurring artifacts that current generators still produce.
Combining the margin-violation score with existing pixel-level detectors could yield ensembles that remain stable when generators change.
The approach suggests experiments on whether smaller real-image sets from a single camera model still suffice to build a usable manifold.

Load-bearing premise

One-class training on real images using physically plausible transformations will produce a manifold whose margin violations reliably flag images from entirely unseen generators at different resolutions.

What would settle it

A newly released generator that produces images consistently inside the margin boundaries at multiple scales and under both transformation families would show the detection rule does not hold.
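The falsification test above can be phrased as a simple check, sketched here under assumptions: each (scale, transformation family) condition yields per-image violation scores for the new generator, a threshold has already been calibrated on real images, and "consistently inside" is read as at least 90% of images staying at or below that threshold in every condition. The names `inside_rate` and `evades_detector`, the condition labels, and the 0.9 cut-off are hypothetical, and the scores are made-up illustration values, not results.

```python
import numpy as np


def inside_rate(scores, threshold: float) -> float:
    """Fraction of images whose violation score stays at or below the threshold."""
    return float(np.mean(np.asarray(scores, dtype=float) <= threshold))


def evades_detector(scores_by_condition: dict, threshold: float, min_inside: float = 0.9) -> bool:
    """True if, in every (scale, family) condition, at least min_inside of the
    generator's images stay inside the margin boundary."""
    return all(inside_rate(s, threshold) >= min_inside
               for s in scores_by_condition.values())


# Illustrative scores only (hypothetical condition labels).
scores = {
    ("scale_1.0", "physical"): [-0.20, -0.15, -0.10, -0.30],
    ("scale_0.5", "physical"): [-0.25, -0.12, -0.18, -0.22],
    ("scale_1.0", "edit"): [-0.05, 0.20, 0.15, -0.10],
}
print(evades_detector(scores, threshold=-0.06))  # False
```

Requiring every condition to pass mirrors the "at multiple scales and under both transformation families" wording: a single condition that still fires is enough to keep the detection rule alive.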

Figures

Figures reproduced from arXiv: 2606.06918 by Abhishek Ameta, Akshay Janardan Bankar, Amit Satish Unde, Ankita Chatterjee, Harshit, Sayan Banerjee, Shreyas Pandith.

**Figure 1.** Visualization of robust-fragile drift violation. Real images maintain drift consistency, whereas AI-generated images show strong drift violations, producing high-response heatmaps.

**Figure 2.** Overview of the proposed student–teacher training framework. The student network processes the original image x while the EMA teacher processes transformed views. Two projection heads learn complementary representations. A reconstruction decoder anchors the representation to prevent collapse.
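The EMA teacher in Figure 2 presumably follows the standard exponential-moving-average rule (the same kind of update BYOL uses for its target network): after each student step, every teacher parameter moves a small step toward its student counterpart and receives no gradients itself. A minimal sketch, assuming parameters are stored as a dict of NumPy arrays; `ema_update`, the parameter names, and the decay values are hypothetical, since the paper's decay is not given here.

```python
import numpy as np


def ema_update(teacher: dict, student: dict, decay: float = 0.99) -> dict:
    """In-place EMA: teacher <- decay * teacher + (1 - decay) * student."""
    for name, s_param in student.items():
        teacher[name] *= decay
        teacher[name] += (1.0 - decay) * s_param
    return teacher


student = {"robust_head": np.ones((2, 2)), "fragile_head": np.zeros((2, 2))}
teacher = {k: np.zeros_like(v) for k, v in student.items()}
for _ in range(3):
    ema_update(teacher, student, decay=0.5)
print(teacher["robust_head"][0, 0])  # 0.875
```

A high decay (e.g. 0.99 or above) makes the teacher a slowly moving target for the student's transformed-view predictions, which is the usual reason such setups resist collapse alongside the reconstruction anchor.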

**Figure 3.** Representation geometry under robust and fragile transformations.

**Figure 4.** Visualization of robust-fragile drift violation. Real images maintain drift consistency.

**Figure 5.** Visualization of robust-fragile drift violation. Real images maintain drift consistency.

**Figure 6.** Visualization of robust-fragile drift violation. AI-generated images show strong drift violations, producing high-response heatmaps.

**Figure 7.** Visualization of robust-fragile drift violation. AI-generated images show strong drift violations, producing high-response heatmaps.

Original abstract

The rapid evolution of generative image models challenges existing AI-generated image detectors, particularly in open-world settings with unseen generators. Recent training-free approaches measure robustness gaps in frozen vision foundation models (VFMs), detecting fakes via perturbation-induced embedding drift. However, these methods rely on fixed invariance geometry inherited from pretraining and lack principled adaptation to the detection task. We instead formulate AI-generated image detection as learning a structured invariance manifold of real images under one-class supervision. Building upon a frozen VFM, we introduce lightweight projection heads that decompose representation space into complementary robust and fragile subspaces. The robust subspace is explicitly trained to suppress variations induced by physically plausible imaging transformations, approximating tangent directions of a real-image manifold, while the fragile subspace retains sensitivity to edit-like perturbations. A structured ordering margin enforces hierarchical separation between physical invariance and edit-induced variability, enabling detection as a margin-violation test relative to the learned manifold. At inference, multi-scale patch-wise drift under both transformation families yields a dual-channel invariance signature and interpretable localization. Extensive experiments demonstrate strong open-world generalization across unseen generators and resolutions, consistently outperforming training-free robustness-based baselines while providing interpretable invariance-violation maps.
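The inference step in the abstract (a dual-channel invariance signature tested for margin violations) can be sketched as follows, under assumptions not stated in the abstract: the signature is the pair (robust drift under physical transforms, fragile drift under edit-like transforms), the violation score is the signed ordering gap with a margin of 0.1, and the decision threshold is the 95th percentile of scores on held-out real images, so no fake images are needed. The function names, margin, quantile, and the uniform stand-in drifts are hypothetical.

```python
import numpy as np


def violation_score(robust_drift, fragile_drift, margin: float = 0.1):
    """Signed ordering gap; positive values mean the learned ordering is violated."""
    return np.asarray(robust_drift, dtype=float) + margin - np.asarray(fragile_drift, dtype=float)


def calibrate_threshold(real_scores, quantile: float = 0.95) -> float:
    """One-class calibration: only real images are used to set the threshold."""
    return float(np.quantile(real_scores, quantile))


def is_flagged(robust_drift, fragile_drift, threshold: float, margin: float = 0.1):
    """Flag an image as AI-generated when its violation score exceeds the threshold."""
    return violation_score(robust_drift, fragile_drift, margin) > threshold


rng = np.random.default_rng(0)
# Synthetic stand-in drifts for 1000 real validation images (not data from the paper).
real_r = rng.uniform(0.00, 0.05, 1000)
real_f = rng.uniform(0.20, 0.40, 1000)
tau = calibrate_threshold(violation_score(real_r, real_f))
# A candidate whose robust drift is large relative to its fragile drift is flagged.
print(is_flagged(0.30, 0.25, tau))  # True
```

Setting the threshold from a real-image quantile fixes the false-positive rate on reals (about 5% here) without ever looking at generated images, which is what keeps the test one-class.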

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

The paper formulates AI-generated image detection as one-class learning of a structured invariance manifold via subspace decomposition on a frozen VFM, but supplies no experimental results to support its open-world claims.


The paper's core idea is to move past fixed robustness-gap checks by training lightweight projection heads on real images only. These heads split the representation into a robust subspace (suppressed under physical transforms) and a fragile subspace (kept sensitive to edits), with an ordering margin that enforces separation so detection reduces to margin-violation checks. At inference it combines multi-scale drift from both transform families.

What is actually new is the explicit decomposition plus the structured margin under one-class supervision; the cited robustness-gap baselines do not use this trainable split or the hierarchical ordering term.

The formulation is clear and directly addresses the limitation of inheriting pretraining geometry. The choice to keep the VFM frozen and add only small heads is practical, and the dual-channel signature with localization maps is a reasonable way to make the output interpretable.

The soft spots are large and central. The abstract states strong open-world generalization and consistent outperformance, yet contains zero numbers, no dataset sizes, no ablations, and no error bars. The key assumption—that physical transforms alone will make the fragile subspace flag artifacts from entirely unseen generators—receives no supporting evidence here. The stress-test concern holds: nothing penalizes the fragile subspace from treating generator-specific patterns as in-manifold, so the method could simply measure sensitivity to the chosen transform families rather than learn a true detection manifold.

This is for people working on training-light or one-class methods in image forensics. A reader interested in the subspace idea might extract a useful angle, but only if the full paper's experiments are actually present and convincing.

I would not send it to peer review on the basis of the abstract alone; the claims need quantitative grounding before referee time is warranted.

Referee Report

2 major / 2 minor

Summary. The paper claims that AI-generated image detection can be reformulated as learning a structured invariance manifold of real images under one-class supervision on a frozen VFM. Lightweight projection heads decompose the representation into complementary robust (suppressing physically plausible transforms) and fragile subspaces, separated by a structured ordering margin that enforces hierarchical separation; detection then reduces to a margin-violation test, with multi-scale patch-wise drift yielding a dual-channel signature and localization maps. The abstract asserts that this yields strong open-world generalization across unseen generators and resolutions while outperforming training-free robustness baselines.

Significance. If the experimental claims hold, the work would offer a principled, adaptive alternative to fixed-geometry training-free detectors, with added interpretability via invariance-violation maps. The one-class formulation and explicit subspace decomposition could influence future detector design in open-world settings.

major comments (2)

[Abstract] Abstract: the central claim of strong open-world generalization and consistent outperformance rests on experimental assertions, yet the abstract supplies no quantitative results, ablation details, error bars, or dataset statistics, leaving the load-bearing generalization claim unverified in the provided summary.
[Method] Method description (one-class supervision and subspace decomposition): the assumption that training solely on real images plus physically plausible transforms will cause the fragile subspace to flag artifacts from entirely unseen generators via margin violation is load-bearing for the open-world claim, but nothing in the formulation explicitly penalizes leakage or ensures the margin enforces sensitivity to generator-specific edits rather than the training transform families.

minor comments (2)

Clarify the precise optimization objective for the projection heads and the structured ordering margin (including whether the margin value is a learned free parameter or derived from data).
Provide dataset statistics, generator names, and resolution ranges used in the claimed extensive experiments to allow assessment of the open-world scope.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate where revisions will be made.


Referee: [Abstract] Abstract: the central claim of strong open-world generalization and consistent outperformance rests on experimental assertions, yet the abstract supplies no quantitative results, ablation details, error bars, or dataset statistics, leaving the load-bearing generalization claim unverified in the provided summary.

Authors: We agree that the abstract would be strengthened by including key quantitative support. In the revised version we will add concise performance highlights (e.g., mean AUC on unseen generators, number of generators and resolutions tested) while remaining within length limits. revision: yes
Referee: [Method] Method description (one-class supervision and subspace decomposition): the assumption that training solely on real images plus physically plausible transforms will cause the fragile subspace to flag artifacts from entirely unseen generators via margin violation is load-bearing for the open-world claim, but nothing in the formulation explicitly penalizes leakage or ensures the margin enforces sensitivity to generator-specific edits rather than the training transform families.

Authors: The structured ordering margin and complementary subspace objectives already enforce the desired separation: the robust head is explicitly optimized to absorb only the listed physical transforms, while the fragile head retains all residual directions. Any generator-induced edit lies outside the learned real-image manifold by construction and therefore triggers a margin violation in the fragile channel. We will insert a short clarifying paragraph in Section 3.3 that makes this leakage-prevention argument explicit and references the margin loss term. revision: yes

Circularity Check

0 steps flagged

No significant circularity; new trainable components learned independently from data


The derivation introduces lightweight projection heads and a structured ordering margin that are explicitly trained under one-class supervision on real images with physical transformations. These elements decompose the space into robust and fragile subspaces and enforce separation via a learned margin, rather than reducing by construction to any pre-fitted parameters, self-citations, or renamed known results. The central claim of open-world detection via margin violation is an empirical generalization step outside the training loop, with no load-bearing self-definitional or fitted-input reductions visible in the formulation.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 4 invented entities

The approach rests on several domain assumptions about vision foundation models and introduces multiple new learned components and conceptual entities without external verification.

free parameters (2)

projection head parameters
Weights of the lightweight heads trained to define robust and fragile subspaces
ordering margin value
Hyperparameter controlling hierarchical separation between physical and edit-induced variability

axioms (2)

domain assumption Frozen vision foundation models supply a representation space in which physically plausible transformations approximate tangent directions of a real-image manifold
Invoked to justify training the robust subspace
ad hoc to paper One-class supervision on real images is sufficient to learn a manifold that generalizes to detect fakes from unseen generators
Central modeling choice of the method
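The tangent-direction axiom has a simple linear analogue worth making concrete: estimate the feature-space tangent of each physical transform by finite differences, then build a projector that removes those directions, which is what a perfectly trained robust head would do to first order. This is an illustrative stand-in under stated assumptions (a toy linear "backbone", brightness and contrast as the two transforms, a closed-form SVD projection instead of learned heads); none of the names come from the paper.

```python
import numpy as np


def finite_difference_tangents(feat_fn, x, transforms, eps: float = 1e-3) -> np.ndarray:
    """Stack (f(T(x, eps)) - f(x)) / eps for each transform T, where T(x, 0) == x."""
    base = feat_fn(x)
    return np.stack([(feat_fn(t(x, eps)) - base) / eps for t in transforms])


def tangent_nullspace_projector(tangents: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Orthogonal projector that removes span(tangents) from feature space."""
    u, s, _ = np.linalg.svd(tangents.T, full_matrices=False)
    basis = u[:, s > tol * s.max()]  # orthonormal basis of the tangent span
    return np.eye(tangents.shape[1]) - basis @ basis.T


rng = np.random.default_rng(0)
W = rng.standard_normal((32, 48)) / np.sqrt(48)  # toy linear "frozen backbone"


def feat_fn(img: np.ndarray) -> np.ndarray:
    return W @ img.ravel()


def brightness(img: np.ndarray, e: float) -> np.ndarray:
    return img + e


def contrast(img: np.ndarray, e: float) -> np.ndarray:
    return img * (1.0 + e)


image = rng.uniform(0.0, 1.0, (4, 4, 3))  # 48 pixels, matching W
tangents = finite_difference_tangents(feat_fn, image, [brightness, contrast])
P = tangent_nullspace_projector(tangents)
```

Applying `P` zeroes out feature changes along the two physical-transform tangents while leaving most other directions intact; a learned robust head approximates this behaviour, and whether edits from unseen generators fall outside the suppressed span is the "ad hoc" assumption the ledger flags.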

invented entities (4)

structured invariance manifold no independent evidence
purpose: Representation of real images under physical transformations for detection
Core new construct introduced for the detection task
robust subspace no independent evidence
purpose: Suppresses variations from physically plausible imaging transformations
Decomposition component introduced in the paper
fragile subspace no independent evidence
purpose: Retains sensitivity to edit-like perturbations
Complementary decomposition component
structured ordering margin no independent evidence
purpose: Enforces hierarchical separation between physical invariance and edit variability
New margin mechanism for the manifold

pith-pipeline@v0.9.1-grok · 5767 in / 1704 out tokens · 29769 ms · 2026-06-27T22:31:44.803239+00:00 · methodology


Reference graph

Works this paper leans on

48 extracted references · 12 canonical work pages · 7 internal anchors

[1] Prompthero. Openjourney. https://openjourney.art (2023)

[2] Yiffymix v31. https://huggingface.co/Yntec/YiffyMix (2023)

[3] Proteus v0.3. https://huggingface.co/dataautogpt3/ProteusV0.3 (2024)

[4] Brock, A., Donahue, J., Simonyan, K.: Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018)

[5] Bychkovsky, V., Paris, S., Chan, E., Durand, F.: Learning photographic global tonal adjustment with a database of input/output image pairs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2011)

[6] Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 9650–9660 (2021)

[7] Cheng, S., Lyu, L., Wang, Z., Zhang, X., Sehwag, V.: Co-spy: Combining semantic and pixel features to detect synthetic images by ai. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 13455–13465 (2025)

[8] Choi, S., Lee, H., Lee, M.: Training-free detection of ai-generated images via cropping robustness. arXiv preprint arXiv:2511.14030 (2025)

[9] Chu, B., Xu, X., Wang, X., Zhang, Y., You, W., Zhou, L.: Fire: Robust detection of diffusion-generated images via frequency-guided reconstruction error. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 12830–12839 (2025)

[10] Corvi, R., Cozzolino, D., Zingarini, G., Poggi, G., Nagano, K., Verdoliva, L.: On the detection of synthetic images generated by diffusion models. In: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1–5. IEEE (2023)

[11] Dang-Nguyen, D.T., Pasquini, C., Conotter, V., Boato, G.: Raise: A raw images dataset for digital image forensics. In: Proceedings of the ACM Multimedia Systems Conference (MMSys) (2015)

[12] Dhariwal, P., Nichol, A.: Diffusion models beat gans on image synthesis. Advances in neural information processing systems 34, 8780–8794 (2021)

[14] Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., et al.: Scaling rectified flow transformers for high-resolution image synthesis. In: Forty-first international conference on machine learning (2024)

[15] Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020)

[16] Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., Yuan, L., Guo, B.: Vector quantized diffusion model for text-to-image synthesis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10696–10706 (2022)

[17] He, Z., Chen, P.Y., Ho, T.Y.: Rigid: A training-free and model-agnostic framework for robust ai-generated image detection. arXiv preprint arXiv:2405.20112 (2024)

[18] Jeong, Y., Kim, D., Min, S., Joe, S., Gwon, Y., Choi, J.: Bihpf: Bilateral high-pass filters for robust deepfake detection. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp. 48–57 (2022)

[19] Jeong, Y., Kim, D., Ro, Y., Choi, J.: Frepgan: robust deepfake detection using frequency-level perturbations. In: Proceedings of the AAAI conference on artificial intelligence. vol. 36(1), pp. 1060–1068 (2022)

[20] Karageorgiou, D., Papadopoulos, S., Kompatsiaris, I., Gavves, E.: Any-resolution ai-generated image detection by spectral learning. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 18706–18717 (2025)

[21] Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)

[22] Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4401–4410 (2019)

[23] Li, O., Cai, J., Hao, Y., Jiang, X., Hu, Y., Feng, F.: Improving synthetic image detection towards generalization: An image transformation perspective. In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. pp. 2405–2414 (2025)

[24] Liang, S., Liu, J., Chen, R., Guan, Q.: Ferretnet: Efficient synthetic image detection via local pixel dependencies. arXiv preprint arXiv:2509.20890 (2025)

[25] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: European conference on computer vision. pp. 740–755. Springer (2014)

[26] Liu, H., Tan, Z., Tan, C., Wei, Y., Wang, J., Zhao, Y.: Forgery-aware adaptive transformer for generalizable synthetic image detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10770–10780 (2024)

[27] Liu, L., Ren, Y., Lin, Z., Zhao, Z.: Pseudo numerical methods for diffusion models on manifolds. arXiv preprint arXiv:2202.09778 (2022)

[28] Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of the IEEE international conference on computer vision. pp. 3730–3738 (2015)

[29] Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., Chen, M.: Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741 (2021)

[30] Ojha, U., Li, Y., Lee, Y.J.: Towards universal fake image detectors that generalize across generative models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 24480–24489 (2023)

[31] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)

[32] Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 2337–2346 (2019)

[33] Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023)

[34] Qian, Y., Yin, G., Sheng, L., Chen, Z., Shao, J.: Thinking in frequency: Face forgery detection by mining frequency-aware clues. In: European conference on computer vision. pp. 86–103. Springer (2020)

[35] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International conference on machine learning. pp. 8748–8763. PMLR (2021)

[36] Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., Sutskever, I.: Zero-shot text-to-image generation. In: International conference on machine learning. pp. 8821–8831. PMLR (2021)

[37] Ricker, J., Lukovnikov, D., Fischer, A.: Aeroblade: Training-free detection of latent diffusion images using autoencoder reconstruction error. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9130–9140 (2024)

[38] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models, 2022. URL https://arxiv.org/abs/2112.10752 (2021)

[39] Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., Nießner, M.: Faceforensics++: Learning to detect manipulated facial images. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 1–11 (2019)

[40] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. International journal of computer vision 115(3), 211–252 (2015)

[41] Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., et al.: Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in neural information processing systems 35, 25278–25294 (2022)

[42] Tan, C., Zhao, Y., Wei, S., Gu, G., Liu, P., Wei, Y.: Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38(5), pp. 5052–5060 (2024)

[43] Tan, C., Zhao, Y., Wei, S., Gu, G., Liu, P., Wei, Y.: Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 28130–28139 (2024)

[44] Tan, C., Zhao, Y., Wei, S., Gu, G., Wei, Y.: Learning on gradients: Generalized artifacts representation for gan-generated images detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12105–12114 (2023)

[45] Tsai, C.T., Ko, C.Y., Chung, I., Wang, Y.C.F., Chen, P.Y., et al.: Understanding and improving training-free ai-generated image detections with vision foundation models. arXiv preprint arXiv:2411.19117 (2024)

[46] Wang, S.Y., Wang, O., Zhang, R., Owens, A., Efros, A.A.: Cnn-generated images are surprisingly easy to spot... for now. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 8695–8704 (2020)

[47] Wang, Z., Bao, J., Zhou, W., Wang, W., Hu, H., Chen, H., Li, H.: Dire for diffusion-generated image detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 22445–22455 (2023)

[48] Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., Xiao, J.: Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015)

[49] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision. pp. 2223–2232 (2017)
