Navigating the Challenges of AI-Generated Image Detection in the Wild: What Truly Matters?

Christos Koutlis; Despina Konstantinidou; Dimitrios Karageorgiou; Emmanouil Schinas; Olga Papadopoulou; Symeon Papadopoulos

arxiv: 2507.10236 · v2 · pith:HNLG54IYnew · submitted 2025-07-14 · 💻 cs.CV

Pith reviewed 2026-05-21 23:27 UTC · model grok-4.3

classification 💻 cs.CV

keywords: AI-generated image detection, real-world evaluation, social media images, design choices, low-level traces, high-level semantics, ITW-SM dataset, AUC improvement

The pith

Optimizing each design choice to propagate low-level traces and high-level semantics improves AI-generated image detection AUC by 26.87% under real-world conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the ITW-SM dataset of real and AI-generated images collected from major social media platforms to test detectors outside laboratory benchmarks. It systematically varies architecture, pre-trained latent spaces, training data volume, and preprocessing steps to measure their effects. Simple increases in model scale or data quantity do not reliably raise performance. Instead, the authors find that tuning the full pipeline so it can carry forward both low-level generation artifacts and high-level semantic content produces consistent gains. This yields an average 26.87 percent AUC lift across several existing detectors when evaluated on the challenging social-media images.
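The headline figure is an average AUC gain across detectors. As a rough illustration of how such a number is formed, the sketch below computes AUC with the rank (Mann-Whitney) formulation and then averages the gain over detectors in two ways: absolute percentage points and relative percent. This summary does not state which of the two the 26.87% refers to, so both are shown. The function names, detector names, and all numbers are placeholders invented for this example, not values or code from the paper.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    labels: 1 for AI-generated, 0 for real.
    scores: higher means "more likely AI-generated".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Fraction of (generated, real) pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_gains(baseline, optimized):
    """Return (mean absolute gain in percentage points, mean relative gain in %)."""
    absolute = [100 * (optimized[k] - baseline[k]) for k in baseline]
    relative = [100 * (optimized[k] - baseline[k]) / baseline[k] for k in baseline]
    return mean(absolute), mean(relative)


# Placeholder AUCs for two hypothetical detectors (illustrative only).
baseline = {"detector_a": 0.60, "detector_b": 0.50}
optimized = {"detector_a": 0.75, "detector_b": 0.80}
abs_gain, rel_gain = average_gains(baseline, optimized)
print(f"absolute: {abs_gain:.2f} points, relative: {rel_gain:.2f}%")
```

The two averages can differ substantially when baseline AUCs sit close to chance, which is why the convention matters when comparing reported gains.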

Core claim

By curating the ITW-SM dataset from major social media platforms and performing ablation studies on detector components, the authors establish that effective AI-generated image detection in uncontrolled environments requires a processing pipeline optimized to transmit and utilize both low-level forensic traces and high-level image semantics, rather than relying on larger models or more data alone; this optimization produces an average AUC improvement of 26.87 percent across multiple state-of-the-art approaches.

What carries the argument

The optimized detection pipeline that propagates and analyzes both low-level traces and high-level image semantics.

If this is right

Naively scaling pre-training or adding more training data does not always improve detection performance.
Effective real-world detectors must balance low-level trace analysis with high-level semantic understanding.
The same optimizations improve performance across multiple existing state-of-the-art detection approaches.
These choices supply a practical roadmap for constructing more resilient detectors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar dual-level optimization may help detectors for other generative media such as video or audio.
Test datasets will require regular refresh as new generative models appear.
Adding explicit semantic modules could further strengthen purely artifact-based detectors.
Benchmark-only evaluations are likely to overestimate real-world robustness.

Load-bearing premise

The ITW-SM dataset and the specific experimental conditions used are representative enough of real-world social media images and future AI generators for the observed gains to generalize.

What would settle it

A new test collection drawn from previously unseen social media platforms or generated by newer AI models shows no AUC improvement or a performance drop when the optimized pipeline is applied.

Figures

Figures reproduced from arXiv: 2507.10236 by Christos Koutlis, Despina Konstantinidou, Dimitrios Karageorgiou, Emmanouil Schinas, Olga Papadopoulou, Symeon Papadopoulos.

**Figure 2.** Real (a-e) and generated (f-j) images from the ITW-SM Dataset.

**Figure 3.** Model performance (AUC) on benchmark datasets, as reported in original papers, and

**Figure 4.** Original and updated model performance (AUC)

Original abstract

As generative Artificial Intelligence (AI) advances, the realism of AI generated imagery has reached a threshold capable of deceiving even vigilant human observers. Yet, while current AI-generated Image Detection (AID) approaches perform exceptionally well on controlled benchmark datasets, they struggle significantly with real-world cases. To study this behavior we introduce the ITW-SM dataset, a curated collection of real and AI-generated images originating from major social media platforms. We employ it to analyze the effects of key design choices typically considered when building a detector, involving its architecture, pre-trained latent spaces, training data as well as pre-processing approaches. We indicate that naively scaling the pre-training stage or opting for more training data does not always lead to better detection performance. Instead, our work reveals that it is crucial to optimize each design choice to enable the processing pipeline to propagate and effectively analyze both low-level traces as well as high-level image semantics. Building on our findings, we achieve a substantial average improvement of 26.87% in AUC across multiple state-of-the-art detection approaches and under real-world conditions, providing a roadmap for developing more resilient detectors. Our assets are available on https://mever-team.github.io/itw-sm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds a social-media-sourced dataset and shows that targeted tuning for both low-level traces and high-level semantics beats naive scaling for real-world detectors, but the gains sit on one collection whose representativeness is unproven.

The main thing to know is that this work releases the ITW-SM dataset of real and AI-generated images pulled from actual social media platforms and reports that optimizing architecture, latent spaces, training data, and preprocessing to preserve both low-level forensic signals and high-level semantics delivers an average 26.87% AUC lift across several detectors. Just adding more pretraining or data does not reliably help under these conditions.

Referee Report

3 major / 2 minor

Summary. The paper introduces the ITW-SM dataset of real and AI-generated images collected from major social media platforms. It systematically examines the impact of design choices in AI-generated image detectors—architecture, pre-trained latent spaces, training data, and pre-processing—and concludes that naive scaling of pre-training or data volume does not reliably improve performance. Instead, optimizing each choice to allow the pipeline to analyze both low-level forensic traces and high-level semantics produces a 26.87% average AUC gain across multiple state-of-the-art detectors under real-world conditions. The assets are released publicly.

Significance. If the reported gains prove robust, the work is significant for the field of multimedia forensics and computer vision. It supplies an empirical roadmap that prioritizes balanced feature propagation over simple scaling, introduces a new in-the-wild benchmark, and demonstrates concrete performance lifts on social-media imagery where existing detectors degrade. The public release of the dataset and code supports reproducibility and follow-on research.

major comments (3)

[§4] §4 (Experimental Evaluation) and associated tables: The abstract and results claim a 26.87% average AUC improvement, yet the manuscript provides neither per-detector variance, number of random seeds, nor statistical significance tests (e.g., paired t-test or Wilcoxon test across runs). Without these, it is impossible to determine whether the reported gain exceeds experimental noise and therefore supports the central claim that the optimized pipeline is reliably superior.
[§3.1] §3.1 (ITW-SM Dataset Construction): The generalization argument rests on ITW-SM being representative of unseen platforms and future generators. The curation details—exact generator versions, platform-specific compression pipelines, and any post-processing filters—are not sufficiently quantified. If the dataset inadvertently emphasizes particular artifacts, the observed low-level/high-level balance may be partly tuned to ITW-SM rather than reflecting a transferable principle.
[§4.3] §4.3 (Ablation Studies): The paper states that naive scaling does not always help and that balancing low- and high-level cues is crucial, but the ablation tables do not isolate the marginal contribution of each optimized component (pre-processing, latent space, architecture) to the final 26.87% gain. This weakens the causal link between the design principle and the measured improvement.

minor comments (2)

[Figure 3] Figure 3 and §4.2: Axis labels and legend entries are too small for comfortable reading; increasing font size and adding a short caption explaining the color coding would improve clarity.
[Related Work] Related Work section: Several recent works on social-media image forensics (post-2023) are missing; adding them would better situate the contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have addressed each major comment below and will incorporate revisions to strengthen the empirical support and clarity of our claims.

Referee: [§4] §4 (Experimental Evaluation) and associated tables: The abstract and results claim a 26.87% average AUC improvement, yet the manuscript provides neither per-detector variance, number of random seeds, nor statistical significance tests (e.g., paired t-test or Wilcoxon test across runs). Without these, it is impossible to determine whether the reported gain exceeds experimental noise and therefore supports the central claim that the optimized pipeline is reliably superior.

Authors: We agree that the absence of variance estimates and statistical tests limits the strength of the central claim. In the revised manuscript we will report AUC results averaged over five independent random seeds with standard deviations for each detector and include paired t-tests (or Wilcoxon signed-rank tests where appropriate) comparing the baseline and optimized pipelines. These additions will allow readers to assess whether the 26.87% average gain exceeds experimental variability. revision: yes
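
The seed-level comparison promised in this response could look roughly like the following. It is a minimal standard-library sketch, not the authors' analysis: it computes the paired t statistic on per-seed AUC differences and an exact sign-flip permutation p-value, which stands in here for the Wilcoxon signed-rank test named above. All AUC values and function names are hypothetical placeholders.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(baseline, optimized):
    """t statistic of the per-seed differences (n - 1 degrees of freedom).

    Assumes the differences are not all identical (non-zero variance).
    """
    diffs = [o - b for b, o in zip(baseline, optimized)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def sign_flip_p_value(baseline, optimized):
    """Exact two-sided paired permutation test on the summed difference."""
    diffs = [o - b for b, o in zip(baseline, optimized)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    # Under the null, each paired difference is equally likely to have either sign.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            count += 1
    return count / total


# Placeholder AUCs for five seeds of one hypothetical detector (illustrative only).
baseline_auc = [0.61, 0.63, 0.60, 0.62, 0.64]
optimized_auc = [0.78, 0.80, 0.79, 0.77, 0.81]

print(f"mean gain: {mean(o - b for b, o in zip(baseline_auc, optimized_auc)):.3f}")
print(f"paired t:  {paired_t_statistic(baseline_auc, optimized_auc):.2f}")
print(f"p (exact sign-flip): {sign_flip_p_value(baseline_auc, optimized_auc):.4f}")
```

One design point worth noting: with only five paired runs, an exact two-sided sign-flip or signed-rank test cannot return a p-value below 2/32 = 0.0625, so a nonparametric test at the 0.05 level would need more seeds or pooling across detectors.
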
Referee: [§3.1] §3.1 (ITW-SM Dataset Construction): The generalization argument rests on ITW-SM being representative of unseen platforms and future generators. The curation details—exact generator versions, platform-specific compression pipelines, and any post-processing filters—are not sufficiently quantified. If the dataset inadvertently emphasizes particular artifacts, the observed low-level/high-level balance may be partly tuned to ITW-SM rather than reflecting a transferable principle.

Authors: We concur that additional quantitative details on curation are required to support claims of representativeness. In the revised §3.1 we will enumerate the specific generator versions and release dates used, document the exact JPEG quality factors and resizing pipelines applied by each platform, and describe the post-processing filters (e.g., duplicate removal, resolution thresholds). These expansions will clarify the artifact distribution and help readers evaluate the transferability of the low-/high-level balance principle. revision: yes
Referee: [§4.3] §4.3 (Ablation Studies): The paper states that naive scaling does not always help and that balancing low- and high-level cues is crucial, but the ablation tables do not isolate the marginal contribution of each optimized component (pre-processing, latent space, architecture) to the final 26.87% gain. This weakens the causal link between the design principle and the measured improvement.

Authors: We acknowledge that the current ablation tables do not isolate the incremental effect of each component. In the revised version we will add a set of incremental ablation experiments that successively introduce the optimized pre-processing, latent-space choice, and architecture, reporting the marginal AUC gain at each step. This will provide a clearer decomposition of how each design decision contributes to the overall 26.87% improvement. revision: yes
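
The incremental ablation described in this response amounts to simple bookkeeping, sketched below under stated assumptions. Components are added cumulatively in a fixed order and the marginal AUC gain is read off at each step. The component labels follow the response; the AUC values and the function name are hypothetical placeholders, not results from the paper.

```python
def marginal_gains(steps):
    """Marginal AUC gain of each cumulative step.

    steps: ordered list of (label, auc_after_adding_it); the first entry is the baseline.
    """
    gains = []
    for (_, prev_auc), (label, current_auc) in zip(steps, steps[1:]):
        gains.append((label, round(current_auc - prev_auc, 4)))
    return gains


# Placeholder AUCs for a cumulative ablation (illustrative only).
steps = [
    ("baseline", 0.62),
    ("+ pre-processing", 0.70),
    ("+ latent space", 0.76),
    ("+ architecture", 0.79),
]
gains = marginal_gains(steps)
for label, gain in gains:
    print(f"{label:18s} {gain:+.2f}")
```

Marginal gains computed this way depend on the order in which components are introduced, so a fuller decomposition would repeat the sequence under several orderings or report leave-one-out results alongside it.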

Circularity Check

0 steps flagged

No circularity: empirical ablation study with self-contained experimental results

This paper is an empirical ablation study that introduces the ITW-SM dataset and measures the effects of architecture, pre-trained latent spaces, training data, and pre-processing choices on AI-generated image detection performance. The central result of a 26.87% average AUC improvement is obtained directly from experimental evaluations on the collected real-world images rather than from any mathematical derivation, first-principles prediction, or quantity that reduces to its own inputs by construction. No self-definitional patterns, fitted-input-called-predictions, or load-bearing self-citation chains appear in the reported analysis; the work remains self-contained against its own benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work to force the outcome.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the representativeness of the newly collected social-media dataset and on the assumption that the tested design choices capture the dominant factors affecting real-world detector performance.

axioms (1)

domain assumption The ITW-SM dataset is representative of real-world social media images and generators
Generalization of the 26.87% AUC improvement rests on this assumption.

pith-pipeline@v0.9.0 · 5769 in / 1237 out tokens · 38360 ms · 2026-05-21T23:27:32.111784+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "optimizing each design choice to enable the processing pipeline to propagate and effectively analyze both low-level traces as well as high-level image semantics"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "texture-based cropping... targets high-frequency regions such as edges and fine textures"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Automated In-the-Wild Data Collection for Continual AI Generated Image Detection
cs.CV 2026-05 unverdicted novelty 7.0

An automated fact-check-based pipeline for in-the-wild AI image data, when mixed with generator data in continual learning, lets detectors adapt to new generators while avoiding forgetting and delivers 8-9% accuracy g...

Boosting Robust AIGI Detection with LoRA-based Pairwise Training
cs.CV 2026-04 unverdicted novelty 4.0

LoRA-based pairwise training with distortion and size simulations boosts robust AIGI detection under severe distortions, placing third in the NTIRE challenge.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · cited by 2 Pith papers · 10 internal anchors

[1] Amoroso, R., Morelli, D., Cornia, M., Baraldi, L., Bimbo, A.D., Cucchiara, R.: Parents and children: Distinguishing multimodal deepfakes from natural images (2024), https://arxiv.org/abs/2304.00500

[2] Bammey, Q.: Synthbuster: Towards detection of diffusion model generated images. In: IEEE Open Journal of Signal Processing (2023)

[3] Chai, L., Bau, D., Lim, S.N., Isola, P.: What makes fake images detectable? understanding properties that generalize (2020), https://arxiv.org/abs/2008.10588

[4] Chen, Y., Zou, J.: Twigma: A dataset of ai-generated images with metadata from twitter (2023), https://arxiv.org/abs/2306.08310

[5] Cherti, M., Beaumont, R., Wightman, R., Wortsman, M., Ilharco, G., Gordon, C., Schuhmann, C., Schmidt, L., Jitsev, J.: Reproducible scaling laws for contrastive language-image learning. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). p. 2818–2829. IEEE (Jun 2023). https://doi.org/10.1109/cvpr52729.2023.00276, http://dx.do...

[6] Corvi, R., Cozzolino, D., Poggi, G., Nagano, K., Verdoliva, L.: Intriguing properties of synthetic images: from generative adversarial networks to diffusion models (2023), https://arxiv.org/abs/2304.06408

[7] Corvi, R., Cozzolino, D., Zingarini, G., Poggi, G., Nagano, K., Verdoliva, L.: On the detection of synthetic images generated by diffusion models (2022), https://arxiv.org/abs/2211.00680

[8] Cozzolino, D., Gragnaniello, D., Poggi, G., Verdoliva, L.: Towards universal gan image detection (2021), https://arxiv.org/abs/2112.12606

[9] Cozzolino, D., Poggi, G., Corvi, R., Nießner, M., Verdoliva, L.: Raising the bar of ai-generated image detection with clip (2024), https://arxiv.org/abs/2312.00195

[10] Cozzolino, D., Poggi, G., Nießner, M., Verdoliva, L.: Zero-shot detection of ai-generated images (2024), https://arxiv.org/abs/2409.15875

[11] Dang-Nguyen, D.T., Pasquini, C., Conotter, V., Boato, G.: Raise: a raw images dataset for digital image forensics. In: Proceedings of the 6th ACM Multimedia Systems Conference (2015)

[12] Dhariwal, P., Nichol, A.: Diffusion models beat gans on image synthesis (2021), https://arxiv.org/abs/2105.05233

[13] Dogoulis, P., Kordopatis-Zilos, G., Kompatsiaris, I., Papadopoulos, S.: Improving synthetically generated image detection in cross-concept settings. In: Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation. ICMR ’23, ACM (Jun 2023). https://doi.org/10.1145/3592572.3592846, http://dx.doi.org/10.1145/3592572.3592846

[14] Durall, R., Keuper, M., Keuper, J.: Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions (2020), https://arxiv.org/abs/2003.01826

[15] Gragnaniello, D., Cozzolino, D., Marra, F., Poggi, G., Verdoliva, L.: Are gan generated images easy to detect? a critical analysis of the state-of-the-art (2021), https://arxiv.org/abs/2104.02617

[16] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015), https://arxiv.org/abs/1512.03385

[17] He, Z., Chen, P.Y., Ho, T.Y.: Rigid: A training-free and model-agnostic framework for robust ai-generated image detection (2024), https://arxiv.org/abs/2405.20112

[18] Ju, Y., Jia, S., Ke, L., Xue, H., Nagano, K., Lyu, S.: Fusing global and local features for generalized ai-synthesized image detection (2022), https://arxiv.org/abs/2203.13964

[19] Karageorgiou, D., Bammey, Q., Porcellini, V., Goupil, B., Teyssou, D., Papadopoulos, S.: Evolution of detection performance throughout the online lifespan of synthetic images. arXiv preprint arXiv:2408.11541 (2024)

[20] Karageorgiou, D., Papadopoulos, S., Kompatsiaris, I., Gavves, E.: Any-resolution ai-generated image detection by spectral learning (2024), https://arxiv.org/abs/2411.19417

[21] Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation (2018), https://arxiv.org/abs/1710.10196

[22] Konstantinidou, D., Koutlis, C., Papadopoulos, S.: Texturecrop: Enhancing synthetic image detection through texture-based cropping (2025), https://arxiv.org/abs/2407.15500

[23] Koutlis, C., Papadopoulos, S.: Leveraging representations from intermediate encoder-blocks for synthetic image detection (2024), https://arxiv.org/abs/2402.19091

[24] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A., Duerig, T., Ferrari, V.: The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. International Journal of Computer Vision 128(7), 1956–1981 (Mar 2020). https:...

[25] Li, J., Zhang, C., Zhu, W., Ren, Y.: A comprehensive survey of image generation models based on deep learning. Annals of Data Science 12(1), 141–170 (2025)

[26] Li, J., Li, D., Savarese, S., Hoi, S.: Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models (2023), https://arxiv.org/abs/2301.12597

[27] Li, J., Selvaraju, R.R., Gotmare, A.D., Joty, S., Xiong, C., Hoi, S.: Align before fuse: Vision and language representation learning with momentum distillation (2021), https://arxiv.org/abs/2107.07651

[28] Li, Y., Bammey, Q., Gardella, M., Nikoukhah, T., Morel, J.M., Colom, M., Gioi, R.G.V.: Masksim: Detection of synthetic images by masked spectrum similarity analysis. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3855–3865. IEEE (Jun 2024)

[29] Lin, T.Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., Dollár, P.: Microsoft coco: Common objects in context (2015), https://arxiv.org/abs/1405.0312

[30] Lu, Z., Huang, D., Bai, L., Qu, J., Wu, C., Liu, X., Ouyang, W.: Seeing is not always believing: Benchmarking human and model perception of ai-generated images (2023), https://arxiv.org/abs/2304.13023

[31] Mandelli, S., Bonettini, N., Bestagini, P., Tubaro, S.: Detecting gan-generated images by orthogonal training of multiple cnns (2022), https://arxiv.org/abs/2203.02246

[32] Ojha, U., Li, Y., Lee, Y.J.: Towards universal fake image detectors that generalize across generative models (2024), https://arxiv.org/abs/2302.10174

[33] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: Dinov2: Learning robust visual featur...

[34] Papa, L., Faiella, L., Corvitto, L., Maiano, L., Amerini, I.: On the use of stable diffusion for creating realistic faces: From generation to detection. In: 11th International Workshop on Biometrics and Forensics (IWBF). pp. 1–6 (2023)

[35] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision (2021), https://arxiv.org/abs/2103.00020

[36] Ricker, J., Lukovnikov, D., Fischer, A.: Aeroblade: Training-free detection of latent diffusion images using autoencoder reconstruction error (2024), https://arxiv.org/abs/2401.17879

[37] Schinas, M., Papadopoulos, S.: Sidbench: A python framework for reliably assessing synthetic image detection methods (2024), https://arxiv.org/abs/2404.18552

[38] Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., Schramowski, P., Kundurthy, S., Crowson, K., Schmidt, L., Kaczmarczyk, R., Jitsev, J.: Laion-5b: An open large-scale dataset for training next generation image-text models (2022), https://arxiv.org/abs/2210.08402

[39] Sha, Z., Li, Z., Yu, N., Zhang, Y.: De-fake: Detection and attribution of fake images generated by text-to-image generation models (2023), https://arxiv.org/abs/2210.06998

[40] Tan, C., Liu, H., Zhao, Y., Wei, S., Gu, G., Liu, P., Wei, Y.: Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection (2023), https://arxiv.org/abs/2312.10461

[41] Tan, C., Zhao, Y., Wei, S., Gu, G., Wei, Y.: Learning on gradients: Generalized artifacts representation for gan-generated images detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12105–12114 (2023)

[42] Tan, M., Le, Q.V.: Efficientnet: Rethinking model scaling for convolutional neural networks (2020), https://arxiv.org/abs/1905.11946

[43] Tredinnick, L., Laybats, C.: The dangers of generative artificial intelligence (2023)

[44] Wang, S.Y., Wang, O., Zhang, R., Owens, A., Efros, A.A.: Cnn-generated images are surprisingly easy to spot... for now (2020), https://arxiv.org/abs/1912.11035

[45] Wang, Z., Bao, J., Zhou, W., Wang, W., Hu, H., Chen, H., Li, H.: Dire for diffusion-generated image detection (2023), https://arxiv.org/abs/2303.09295

[46] Yan, S., Li, O., Cai, J., Hao, Y., Jiang, X., Hu, Y., Xie, W.: A sanity check for ai-generated image detection (2025), https://arxiv.org/abs/2406.19435

[47] Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., Xiao, J.: Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop (2016), https://arxiv.org/abs/1506.03365

[1] [1]

Amoroso, R., Morelli, D., Cornia, M., Baraldi, L., Bimbo, A.D., Cucchiara, R.: Parents and children: Distinguishing multimodal deepfakes from natural images (2024), https://arxiv.org/abs/2304.00500

work page arXiv 2024

[2] [2]

In: IEEE Open Journal of Signal Processing (2023)

Bammey, Q.: Synthbuster: Towards detection of diffusion model generated images. In: IEEE Open Journal of Signal Processing (2023)

work page 2023

[3] [3]

Chai, L., Bau, D., Lim, S.N., Isola, P.: What makes fake images detectable? understanding properties that generalize (2020), https://arxiv.org/abs/2008. 10588

work page 2020

[4] [4]

Chen, Y., Zou, J.: Twigma: A dataset of ai-generated images with metadata from twitter (2023), https://arxiv.org/abs/2306.08310 28

work page arXiv 2023

[5] [5]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Cherti, M., Beaumont, R., Wightman, R., Wortsman, M., Ilharco, G., Gor- don, C., Schuhmann, C., Schmidt, L., Jitsev, J.: Reproducible scaling laws for contrastive language-image learning. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). p. 2818–2829. IEEE (Jun 2023). https://doi.org/10.1109/cvpr52729.2023.00276, http://dx.do...

work page doi:10.1109/cvpr52729.2023.00276 2023

[6] [6]

Corvi, R., Cozzolino, D., Poggi, G., Nagano, K., Verdoliva, L.: Intriguing properties of synthetic images: from generative adversarial networks to diffusion models (2023), https://arxiv.org/abs/2304.06408

work page arXiv 2023

[7] [7]

org/abs/2211.00680

Corvi, R., Cozzolino, D., Zingarini, G., Poggi, G., Nagano, K., Verdoliva, L.: On the detection of synthetic images generated by diffusion models (2022), https://arxiv. org/abs/2211.00680

work page arXiv 2022

[8] [8]

Cozzolino, D., Gragnaniello, D., Poggi, G., Verdoliva, L.: Towards universal gan image detection (2021), https://arxiv.org/abs/2112.12606

work page arXiv 2021

[9] [9]

Cozzolino, D., Poggi, G., Corvi, R., Nießner, M., Verdoliva, L.: Raising the bar of ai-generated image detection with clip (2024), https://arxiv.org/abs/2312.00195

work page arXiv 2024

[10] [10]

Cozzolino, D., Poggi, G., Nießner, M., Verdoliva, L.: Zero-shot detection of ai-generated images (2024), https://arxiv.org/abs/2409.15875

work page arXiv 2024

[11] [11]

In: Proceedings of the 6th ACM Multimedia Systems Conference (2015)

Dang-Nguyen, D.T., Pasquini, C., Conotter, V., Boato, G.: Raise: a raw images 29 dataset for digital image forensics. In: Proceedings of the 6th ACM Multimedia Systems Conference (2015)

work page 2015

[12] [12]

Dhariwal, P., Nichol, A.: Diffusion models beat gans on image synthesis (2021), https://arxiv.org/abs/2105.05233

work page internal anchor Pith review Pith/arXiv arXiv 2021

[13] [13]

In: Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation

Dogoulis, P., Kordopatis-Zilos, G., Kompatsiaris, I., Papadopoulos, S.: Improving synthetically generated image detection in cross-concept settings. In: Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation. ICMR ’23, ACM (Jun 2023). https://doi.org/10.1145/3592572.3592846, http://dx. doi.org/10.1145/3592572.3592846

work page doi:10.1145/3592572.3592846 2023

[14] [14]

Durall, R., Keuper, M., Keuper, J.: Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions (2020), https: //arxiv.org/abs/2003.01826

work page arXiv 2020

[15] [15]

Gragnaniello, D., Cozzolino, D., Marra, F., Poggi, G., Verdoliva, L.: Are gan generated images easy to detect? a critical analysis of the state-of-the-art (2021), https:// arxiv.org/abs/2104.02617

[16]

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015), https://arxiv.org/abs/1512.03385

[17]

He, Z., Chen, P.Y., Ho, T.Y.: Rigid: A training-free and model-agnostic framework for robust ai-generated image detection (2024), https://arxiv.org/abs/2405.20112

[18]

Ju, Y., Jia, S., Ke, L., Xue, H., Nagano, K., Lyu, S.: Fusing global and local features for generalized ai-synthesized image detection (2022), https://arxiv.org/abs/2203.13964

[19]

Karageorgiou, D., Bammey, Q., Porcellini, V., Goupil, B., Teyssou, D., Papadopoulos, S.: Evolution of detection performance throughout the online lifespan of synthetic images. arXiv preprint arXiv:2408.11541 (2024)

[20]

Karageorgiou, D., Papadopoulos, S., Kompatsiaris, I., Gavves, E.: Any-resolution ai-generated image detection by spectral learning (2024), https://arxiv.org/abs/2411.19417

[21]

Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation (2018), https://arxiv.org/abs/1710.10196

[22]

Konstantinidou, D., Koutlis, C., Papadopoulos, S.: Texturecrop: Enhancing synthetic image detection through texture-based cropping (2025), https://arxiv.org/abs/2407.15500

[23]

Koutlis, C., Papadopoulos, S.: Leveraging representations from intermediate encoder-blocks for synthetic image detection (2024), https://arxiv.org/abs/2402.19091

[24]

Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A., Duerig, T., Ferrari, V.: The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. International Journal of Computer Vision 128(7), 1956–1981 (Mar 2020). https:...

[25]

Li, J., Zhang, C., Zhu, W., Ren, Y.: A comprehensive survey of image generation models based on deep learning. Annals of Data Science 12(1), 141–170 (2025)

[26]

Li, J., Li, D., Savarese, S., Hoi, S.: Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models (2023), https://arxiv.org/abs/2301.12597

[27]

Li, J., Selvaraju, R.R., Gotmare, A.D., Joty, S., Xiong, C., Hoi, S.: Align before fuse: Vision and language representation learning with momentum distillation (2021), https://arxiv.org/abs/2107.07651

[28]

Li, Y., Bammey, Q., Gardella, M., Nikoukhah, T., Morel, J.M., Colom, M., Gioi, R.G.V.: Masksim: Detection of synthetic images by masked spectrum similarity analysis. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3855–3865. IEEE (Jun 2024)

[29]

Lin, T.Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., Dollár, P.: Microsoft coco: Common objects in context (2015), https://arxiv.org/abs/1405.0312

[30]

Lu, Z., Huang, D., Bai, L., Qu, J., Wu, C., Liu, X., Ouyang, W.: Seeing is not always believing: Benchmarking human and model perception of ai-generated images (2023), https://arxiv.org/abs/2304.13023

[31]

Mandelli, S., Bonettini, N., Bestagini, P., Tubaro, S.: Detecting gan-generated images by orthogonal training of multiple cnns (2022), https://arxiv.org/abs/2203.02246

[32]

Ojha, U., Li, Y., Lee, Y.J.: Towards universal fake image detectors that generalize across generative models (2024), https://arxiv.org/abs/2302.10174

[33]

Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: Dinov2: Learning robust visual featur...

[34]

Papa, L., Faiella, L., Corvitto, L., Maiano, L., Amerini, I.: On the use of stable diffusion for creating realistic faces: From generation to detection. In: 11th International Workshop on Biometrics and Forensics (IWBF). pp. 1–6 (2023)

[35]

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision (2021), https://arxiv.org/abs/2103.00020

[36]

Ricker, J., Lukovnikov, D., Fischer, A.: Aeroblade: Training-free detection of latent diffusion images using autoencoder reconstruction error (2024), https://arxiv.org/abs/2401.17879

[37]

Schinas, M., Papadopoulos, S.: Sidbench: A python framework for reliably assessing synthetic image detection methods (2024), https://arxiv.org/abs/2404.18552

[38]

Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., Schramowski, P., Kundurthy, S., Crowson, K., Schmidt, L., Kaczmarczyk, R., Jitsev, J.: Laion-5b: An open large-scale dataset for training next generation image-text models (2022), https://arxiv.org/abs/2210.08402

[39]

Sha, Z., Li, Z., Yu, N., Zhang, Y.: De-fake: Detection and attribution of fake images generated by text-to-image generation models (2023), https://arxiv.org/abs/2210.06998

[40]

Tan, C., Liu, H., Zhao, Y., Wei, S., Gu, G., Liu, P., Wei, Y.: Rethinking the up-sampling operations in cnn-based generative network for generalizable deepfake detection (2023), https://arxiv.org/abs/2312.10461

[41]

Tan, C., Zhao, Y., Wei, S., Gu, G., Wei, Y.: Learning on gradients: Generalized artifacts representation for gan-generated images detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12105–12114 (2023)

[42]

Tan, M., Le, Q.V.: Efficientnet: Rethinking model scaling for convolutional neural networks (2020), https://arxiv.org/abs/1905.11946

[43]

Tredinnick, L., Laybats, C.: The dangers of generative artificial intelligence (2023)

[44]

Wang, S.Y., Wang, O., Zhang, R., Owens, A., Efros, A.A.: Cnn-generated images are surprisingly easy to spot... for now (2020), https://arxiv.org/abs/1912.11035

[45]

Wang, Z., Bao, J., Zhou, W., Wang, W., Hu, H., Chen, H., Li, H.: Dire for diffusion-generated image detection (2023), https://arxiv.org/abs/2303.09295

[46]

Yan, S., Li, O., Cai, J., Hao, Y., Jiang, X., Hu, Y., Xie, W.: A sanity check for ai-generated image detection (2025), https://arxiv.org/abs/2406.19435

[47]

Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., Xiao, J.: Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop (2016), https://arxiv.org/abs/1506.03365
