pith. machine review for the scientific record.

arxiv: 2605.04445 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 3 theorem links

LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection

Caiyan Qin, Jiwei Wei, Ke Liu, Ran Ran, Shuchang Zhou, Yutong Xiao, Zheng Ziqiang

Pith reviewed 2026-05-08 18:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords synthetic image detection · LoRA adaptation · generator-specific artifacts · deepfake detection · modular framework · two-stage training · attention fusion

The pith

LEGO detects synthetic images by pretraining a separate LoRA module on each generator's unique artifacts, then using MLP modulation and attention fusion to combine them on mixed data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that universal artifact detectors lose effectiveness as more generators appear because shared features become rarer. Focusing only on one generator's patterns leads to overfitting. LEGO instead assigns each generator its own pretrained LoRA module to capture distinctive traces, then trains an MLP to control their contributions and attention layers to fuse the results. This two-stage process lets the system add new modules for fresh generators without retraining everything. The result is higher accuracy than prior methods while using under 30,000 images and only five epochs per stage.

Core claim

By dividing training into individual LoRA pretraining on single-generator sets followed by MLP and attention training on mixed sets, the framework extracts generator-specific artifacts in dedicated low-rank adapters and dynamically regulates their use, avoiding both the dilution of universal features and the overfitting of single-pattern detectors while remaining extensible to new generators.
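The two-stage split can be sketched as a parameter-freezing schedule. This is an illustrative reading, not the paper's code: the module names (`mlp_router`, `attn_fusion`) and generator labels are invented, and the "training" here only toggles which parameters a stage would update.

```python
from dataclasses import dataclass, field

@dataclass
class Param:
    name: str
    trainable: bool = False

@dataclass
class LegoDetector:
    generators: list
    params: dict = field(default_factory=dict)

    def __post_init__(self):
        self.params["backbone"] = Param("backbone")       # frozen CLIP backbone
        for g in self.generators:                          # one adapter per generator
            self.params[f"lora:{g}"] = Param(f"lora:{g}")
        self.params["mlp_router"] = Param("mlp_router")    # modulation MLP
        self.params["attn_fusion"] = Param("attn_fusion")  # fusion layers

    def stage1(self, generator):
        """Train one LoRA in isolation on a single-generator dataset."""
        for p in self.params.values():
            p.trainable = False
        self.params[f"lora:{generator}"].trainable = True
        return [p.name for p in self.params.values() if p.trainable]

    def stage2(self):
        """Freeze every LoRA; train only the router and fusion on mixed data."""
        for p in self.params.values():
            p.trainable = False
        self.params["mlp_router"].trainable = True
        self.params["attn_fusion"].trainable = True
        return sorted(p.name for p in self.params.values() if p.trainable)

det = LegoDetector(["stylegan2", "sdxl"])
assert det.stage1("sdxl") == ["lora:sdxl"]
assert det.stage2() == ["attn_fusion", "mlp_router"]
```

Because stage 2 never unfreezes the adapters, a stage-1 pass for a new generator leaves every previously trained module untouched, which is the mechanism behind the extensibility claim.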

What carries the argument

MLP-modulated LoRA blocks with attention-based feature fusion: each LoRA is pretrained on one generator's data to hold its unique artifacts, the MLP learns to scale their influence, and attention fuses the modulated features for the final decision.
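The abstract gives no equations for the modulation or fusion, so the following numpy sketch is one plausible reading with invented shapes: feature dimension 16, LoRA rank 4, three generators, and a single linear layer standing in for the MLP router.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_gens = 16, 4, 3                      # feature dim, LoRA rank, generators

x = rng.normal(size=d)                       # frozen-backbone feature for one image
loras = [(rng.normal(size=(d, r)) * 0.1,     # A_i: down-projection d -> r
          rng.normal(size=(r, d)) * 0.1)     # B_i: up-projection r -> d
         for _ in range(n_gens)]
W_gate = rng.normal(size=(n_gens, d)) * 0.1  # toy stand-in for the router MLP

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# 1) each pretrained LoRA adds its generator-specific residual
feats = np.stack([x + (x @ A) @ B for A, B in loras])   # (n_gens, d)

# 2) the router scales each module's contribution
gates = softmax(W_gate @ x)                             # (n_gens,)
modulated = gates[:, None] * feats

# 3) attention (query = backbone feature) fuses the modulated features
scores = softmax(modulated @ x / np.sqrt(d))
fused = scores @ modulated                              # (d,) -> classifier head

assert fused.shape == (d,) and np.isclose(gates.sum(), 1.0)
```

How the real router computes per-module scaling, gating, or offsets is exactly what the referee report below flags as unspecified; this sketch picks softmax gating purely for concreteness.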

If this is right

  • New LoRA modules can be inserted for emerging generators without full retraining of the detector.
  • Detection accuracy exceeds prior state-of-the-art methods when trained on fewer than 30,000 images and under 10 percent of the data volume used by earlier approaches.
  • Each training stage requires only five epochs.
  • The modular split prevents the overlap loss that harms universal-feature detectors and the overfitting that harms single-generator detectors.
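The first bullet's extensibility property can be made concrete with a small registry sketch. The adapter size is an assumption (a rank-8 LoRA on a 768-dimensional backbone), not a figure from the paper.

```python
# Assumed adapter size: rank-8 LoRA (A and B matrices) on a 768-d backbone.
LORA_PARAMS = 2 * 768 * 8

class LoraRegistry:
    """Holds one small adapter per known generator."""
    def __init__(self):
        self.modules = {}

    def add(self, generator, weights):
        self.modules[generator] = weights

    def storage(self):
        # cost grows linearly with the number of known generators
        return len(self.modules) * LORA_PARAMS

reg = LoraRegistry()
reg.add("stylegan2", b"w0")
reg.add("sdxl", b"w1")
before = dict(reg.modules)

reg.add("brand-new-gen", b"w2")   # extend: one stage-1 run, no full retraining

assert all(reg.modules[k] is before[k] for k in before)  # old adapters untouched
assert reg.storage() == 3 * LORA_PARAMS                  # linear growth
```

Only the router and fusion layers would need a stage-2 refresh after registering the new module; the existing adapters are never rewritten.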

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same modular adapter pattern could let detectors for other media types, such as audio or video, incorporate new synthesis methods incrementally.
  • Tracking which LoRA module receives the highest modulation weight during inference might reveal which generator is most likely responsible for a given fake image.
  • Because each LoRA is small and independent, storage and update costs for the detector grow linearly with the number of known generators rather than requiring complete model replacement.
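The attribution idea in the second bullet could be probed as follows. This is an editorial extension, not a capability the paper claims; the router logits and generator names are invented.

```python
import numpy as np

def softmax(z):
    e = np.exp(np.asarray(z, dtype=float) - np.max(z))
    return e / e.sum()

def attribute(gate_logits, generator_names):
    """Guess the source generator from the router's modulation weights
    for one image; returns (best_guess, weight of that module)."""
    w = softmax(gate_logits)
    i = int(w.argmax())
    return generator_names[i], float(w[i])

names = ["stylegan2", "sdxl", "midjourney"]
guess, conf = attribute([0.2, 2.5, 0.1], names)
assert guess == "sdxl" and conf > 1 / 3
```

Whether the gate weights are calibrated enough for source attribution is an open question; a generator never seen in training would still be forced onto the nearest known module.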

Load-bearing premise

Generator-specific artifacts stay distinct enough that the MLP and attention layers can pick and blend the right modules without causing overfitting or reduced performance on older generators.

What would settle it

Add a new generator's LoRA module, then evaluate on a held-out set of mixed real and synthetic images with the original modules in place. A sharp drop in performance on previously covered generators would refute the extensibility claim; stable accuracy across old and new generators would support it.
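That settling test reduces to a regression check over per-generator accuracies before and after a module is added. The numbers and the 2-point tolerance below are made up for illustration.

```python
def regressions(acc_before, acc_after, tol=0.02):
    """Return old generators whose accuracy dropped by more than `tol`
    after a new LoRA module was added, mapped to the size of the drop."""
    return {g: acc_before[g] - acc_after[g]
            for g in acc_before
            if acc_before[g] - acc_after[g] > tol}

before = {"stylegan2": 0.97, "sdxl": 0.95}
after  = {"stylegan2": 0.96, "sdxl": 0.88, "new-gen": 0.93}

drops = regressions(before, after)
assert set(drops) == {"sdxl"}            # sdxl regressed past tolerance
assert abs(drops["sdxl"] - 0.07) < 1e-9  # by seven points
```

An empty result on a real evaluation would be evidence for the load-bearing premise above; a populated one would indicate crosstalk between modules.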

Figures

Figures reproduced from arXiv: 2605.04445 by Caiyan Qin, Jiwei Wei, Ke Liu, Ran Ran, Shuchang Zhou, Yutong Xiao, Zheng Ziqiang.

Figure 1: Comparison of detector efficiency and generalization. Left: data-efficiency–performance trade-off using ACC on …
Figure 2: Overview of the proposed LEGO framework. LEGO is built upon a frozen CLIP backbone and trained in two stages. In …
original abstract

The rapid advancement of generative technologies has made synthetic images nearly indistinguishable from real ones, thereby creating an urgent need for robust detectors to counter misinformation. However, existing methods mainly rely on universal artifact features that are shared across multiple generators. We observe that as the diversity of generators increases, the overlap of these common features gradually decreases. This severely undermines model generalization. In contrast, focusing only on unique artifacts tends to cause overfitting to specific forgery patterns. To address this challenge, we propose LEGO (LoRA-Enabled Generator-Oriented Framework). The core mechanism of LEGO employs an MLP to modulate multiple LoRA (Low-Rank Adaptation) blocks, each pretrained to capture the unique artifacts of a specific generator, followed by attention-based feature fusion. Unlike conventional methods that seek a single universal solution, LEGO delegates unique artifact extraction to specialized LoRA modules by dividing its training procedure into two stages. Each LoRA module is individually trained on a single-generator dataset to learn generator-specific representations, then MLP and attention layers are trained on mixed datasets to dynamically regulate the contribution of each module. Benefiting from its modular yet robust design, LEGO can be naturally extended by incorporating new LoRA modules for adaptation to newly emerging next-generation datasets, while still achieving substantially better performance than prior SOTA methods with fewer than 30,000 training images, less than 10% of their training data, and only 5 epochs in each training stage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes LEGO, a LoRA-enabled framework for synthetic image detection. Individual LoRA modules are pretrained in isolation on single-generator datasets to capture unique artifacts; an MLP then modulates their contributions and attention fuses the features during a second stage on mixed data. The central claims are substantially superior performance to prior SOTA methods while using under 30,000 training images (less than 10% of prior data) and only 5 epochs per stage, plus natural extensibility by adding new LoRA modules for emerging generators.

Significance. If the performance and extensibility claims are substantiated, the modular specialization-plus-fusion design could meaningfully advance generalization in synthetic-image detection as generator diversity grows, offering a data-efficient alternative to universal-artifact approaches.

major comments (3)
  1. Abstract: the claim of 'substantially better performance than prior SOTA methods with fewer than 30,000 training images, less than 10% of their training data, and only 5 epochs in each training stage' is load-bearing for the contribution yet is unsupported by any quantitative metrics, tables, ablation studies, dataset descriptions, or evaluation protocol, preventing assessment of the data-to-claim link.
  2. Abstract: the MLP modulation of multiple LoRA blocks and the subsequent attention-based fusion are described only procedurally, with no equations, pseudocode, or architectural diagram specifying how per-module scaling, gating, or offsets are computed; this directly affects the central extensibility claim that new modules can be added without crosstalk or degradation on prior generators.
  3. Abstract: the motivating observation that 'as the diversity of generators increases, the overlap of these common features gradually decreases' is stated without supporting analysis, quantitative measurement, or citation, yet it underpins the decision to move from universal to generator-oriented modeling.
minor comments (1)
  1. The acronym expansion 'LoRA-Enabled Generator-Oriented Framework' is given but the manuscript would benefit from an explicit statement of the exact generators and datasets used to pretrain the initial LoRA modules.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below with clarifications from the full manuscript and indicate where revisions will strengthen the abstract.

point-by-point responses
  1. Referee: Abstract: the claim of 'substantially better performance than prior SOTA methods with fewer than 30,000 training images, less than 10% of their training data, and only 5 epochs in each training stage' is load-bearing for the contribution yet is unsupported by any quantitative metrics, tables, ablation studies, dataset descriptions, or evaluation protocol, preventing assessment of the data-to-claim link.

    Authors: The full manuscript provides the supporting quantitative comparisons, ablation studies, dataset descriptions, and evaluation protocols in the Experiments section, which demonstrate the claimed performance gains and data efficiency. To address the concern that the abstract is not self-contained, we will revise the abstract to include key performance highlights and a brief reference to the experimental evidence. revision: yes

  2. Referee: Abstract: the MLP modulation of multiple LoRA blocks and the subsequent attention-based fusion are described only procedurally, with no equations, pseudocode, or architectural diagram specifying how per-module scaling, gating, or offsets are computed; this directly affects the central extensibility claim that new modules can be added without crosstalk or degradation on prior generators.

    Authors: The full manuscript details the MLP modulation and attention-based fusion with equations and an architectural diagram in the Method section; the extensibility claim is supported by experiments in the later sections. We will revise the abstract to reference these equations and the diagram explicitly. revision: yes

  3. Referee: Abstract: the motivating observation that 'as the diversity of generators increases, the overlap of these common features gradually decreases' is stated without supporting analysis, quantitative measurement, or citation, yet it underpins the decision to move from universal to generator-oriented modeling.

    Authors: The full manuscript grounds this observation in preliminary analysis with visualizations and quantitative measurements in the Introduction. We will revise the abstract to include a concise quantitative summary and add relevant citations. revision: yes

Circularity Check

0 steps flagged

No circularity: procedural two-stage training method with empirical claims

full rationale

The paper presents a modular training procedure (individual LoRA pretraining on single-generator data, followed by MLP modulation and attention fusion on mixed data) without any equations, derivations, or first-principles results that reduce to self-referential definitions or fitted inputs. Performance and extensibility claims are stated as empirical outcomes of the described architecture rather than logical necessities derived from the inputs themselves. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked that collapse the central argument. The method is self-contained as a design choice whose validity rests on external validation, not internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that distinct generators leave sufficiently separable artifact signatures that can be isolated by LoRA adapters and later recombined by a learned router.

axioms (1)
  • domain assumption LoRA modules pretrained on single-generator data capture unique, non-overlapping artifacts that remain useful when combined on mixed data
    Invoked in the description of the two-stage training and the claim of easy extension to new generators.

pith-pipeline@v0.9.0 · 5570 in / 1283 out tokens · 41847 ms · 2026-05-08T18:32:04.144457+00:00 · methodology

discussion (0)

