pith. machine review for the scientific record. sign in

arxiv: 2604.03555 · v1 · submitted 2026-04-04 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

HEDGE: Heterogeneous Ensemble for Detection of AI-GEnerated Images in the Wild

Authors on Pith no claims yet

Pith reviewed 2026-05-13 18:49 UTC · model grok-4.3

classification 💻 cs.CV
keywords AI-generated image detectionheterogeneous ensemblerobust detectionimage forensicsensemble fusiongenerative modelscomputer visionAIGC benchmarks
0
0 comments X

The pith

HEDGE detects AI-generated images robustly by ensembling detectors across diverse training data, resolutions, and backbones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that single training regimes, fixed resolutions, or one backbone cannot cover the range of evolving generative models and real-world distortions in AI images. HEDGE builds three routes to address this: a DINOv3 path with staged data expansion and augmentation, a high-resolution branch for fine details, and a MetaCLIP2 branch for backbone variety. These outputs combine through weighted logit averaging refined by dual-gating that corrects branch outliers and fusion mistakes. The resulting system reaches fourth place in the NTIRE 2026 challenge while leading on multiple AIGC benchmarks. This shows that deliberate spread across key design axes can outperform uniform single-model detectors for this task.

Core claim

HEDGE establishes that structuring detection around three complementary axes—progressive training data expansion, multi-scale resolution, and backbone heterogeneity—then fusing the routes via logit-space weighted averaging and a lightweight dual-gating mechanism yields stronger robustness to unseen generators and distortions than any single route alone.

What carries the argument

Three detection routes (DINOv3-based staged training, higher-resolution branch, MetaCLIP2 branch) fused by logit weighted averaging and dual-gating for outlier correction and majority-error handling.

If this is right

  • The ensemble handles a broader set of unseen generative models and distortions than homogeneous detectors.
  • It attains state-of-the-art accuracy and robustness on standard AIGC image detection benchmarks.
  • It secures fourth place in the NTIRE 2026 Robust AI-Generated Image Detection in the Wild Challenge.
  • The dual-gating fusion reduces the impact of individual branch failures without requiring retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same three-axis heterogeneity pattern could be tested on related tasks such as video or audio deepfake detection.
  • Adding further backbone types or resolution levels might increase performance provided the new routes stay complementary.
  • Long-term monitoring on post-2026 generators would test whether the current diversity axes continue to cover future model shifts.

Load-bearing premise

The three chosen axes of heterogeneity plus dual-gating will remain complementary and superior when facing entirely new generative models and distortion distributions not seen in development.

What would settle it

A new generative model that causes all three individual routes in HEDGE to fail at comparable rates would show the heterogeneity no longer supplies independent strengths.

Figures

Figures reproduced from arXiv: 2604.03555 by Dagong Lu, Fei Wu, Fengjun Guo, Mufeng Yao, Xinlei Xu.

Figure 1
Figure 1. Figure 1: Overview of the proposed three-route framework for robust AIGC image detection. Route A progressively constructs DINOv3- [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Robustness evaluation under common image perturbations on HiRes-50K (1,000 real + 1,000 fake, unseen during training). [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: t-SNE visualization of M3 (DINOv3-Huge) CLS token [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
read the original abstract

Robust detection of AI-generated images in the wild remains challenging due to the rapid evolution of generative models and varied real-world distortions. We argue that relying on a single training regime, resolution, or backbone is insufficient to handle all conditions, and that structured heterogeneity across these dimensions is essential for robust detection. To this end, we propose HEDGE, a Heterogeneous Ensemble for Detection of AI-GEnerated images, that introduces complementary detection routes along three axes: diverse training data with strong augmentation, multi-scale feature extraction, and backbone heterogeneity. Specifically, Route~A progressively constructs DINOv3-based detectors through staged data expansion and augmentation escalation, Route~B incorporates a higher-resolution branch for fine-grained forensic cues, and Route~C adds a MetaCLIP2-based branch for backbone diversity. All outputs are fused via logit-space weighted averaging, refined by a lightweight dual-gating mechanism that handles branch-level outliers and majority-dominated fusion errors. HEDGE achieves 4th place in the NTIRE 2026 Robust AI-Generated Image Detection in the Wild Challenge and attains state-of-the-art performance with strong robustness on multiple AIGC image detection benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper proposes HEDGE, a heterogeneous ensemble for robust detection of AI-generated images. It defines three complementary routes—Route A (DINOv3 backbone with staged data expansion and augmentation escalation), Route B (higher-resolution forensic branch), and Route C (MetaCLIP2 backbone)—whose logit outputs are combined via weighted averaging and refined by a lightweight dual-gating mechanism to mitigate outliers and majority errors. The central claim is that this structured heterogeneity across training data, resolution, and backbone yields 4th place in the NTIRE 2026 Robust AI-Generated Image Detection in the Wild Challenge together with state-of-the-art robustness on multiple AIGC benchmarks.

Significance. If the performance claims hold under scrutiny, the work provides concrete evidence that deliberate heterogeneity along data, scale, and architecture axes can produce additive error patterns useful for detection under real-world distortions. This would be a useful empirical contribution to the AIGC detection literature, particularly if accompanied by reproducible code or detailed per-route diagnostics that future ensembles could build upon.

major comments (3)
  1. [§4] §4 (Experiments): the manuscript reports 4th-place ranking and SOTA robustness yet supplies no ablation tables, per-route accuracy breakdowns, or direct comparisons against a naive average of the three branches; without these the claim that the chosen heterogeneity axes remain complementary cannot be evaluated.
  2. [§3.3] §3.3 (Fusion): the dual-gating mechanism is described as correcting branch-level outliers, but no quantitative comparison (e.g., weighted average vs. gated fusion on the same backbones) is provided; this leaves open whether the added complexity is load-bearing for the reported gains.
  3. [§5] §5 (Discussion / Generalization): the robustness claim rests on the assumption that the three axes capture diverse errors on unseen generators, yet no post-challenge OOD evaluation on generative models or distortion distributions absent from both training and the NTIRE 2026 set is reported; this is central to the “in the wild” title claim.
minor comments (3)
  1. [Abstract] Abstract: states competitive ranking and SOTA results but contains no numerical metrics, making the headline claim difficult to assess at first reading.
  2. [§3.3] Notation: the weighting coefficients in the logit fusion are introduced without an explicit equation or initialization procedure; a short equation would improve clarity.
  3. [Figure 2] Figure 2 (architecture diagram): the dual-gating block is shown schematically but lacks a legend for the gate outputs; a small table of gate activation statistics on the validation set would help.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of experimental validation and generalization that we will address in the revision. Below we respond point-by-point to the major comments.

read point-by-point responses
  1. Referee: [§4] §4 (Experiments): the manuscript reports 4th-place ranking and SOTA robustness yet supplies no ablation tables, per-route accuracy breakdowns, or direct comparisons against a naive average of the three branches; without these the claim that the chosen heterogeneity axes remain complementary cannot be evaluated.

    Authors: We agree that explicit ablations are necessary to substantiate the complementarity claim. In the revised manuscript we will add a dedicated ablation subsection in §4 that reports (i) individual route accuracies on the NTIRE 2026 test set and additional AIGC benchmarks, (ii) all pairwise and triple combinations, and (iii) a direct head-to-head comparison of the proposed weighted-average-plus-gating fusion against a naive unweighted average of the three branch logits. These tables will quantify the additive gains attributable to each heterogeneity axis. revision: yes

  2. Referee: [§3.3] §3.3 (Fusion): the dual-gating mechanism is described as correcting branch-level outliers, but no quantitative comparison (e.g., weighted average vs. gated fusion on the same backbones) is provided; this leaves open whether the added complexity is load-bearing for the reported gains.

    Authors: We accept that an isolated comparison is required. The revised §3.3 and §4 will include a controlled ablation that keeps the three backbones and training regimes fixed while replacing the dual-gating module with simple logit averaging. Performance deltas on both the challenge test set and robustness benchmarks will be reported, allowing readers to assess whether the gating contributes meaningfully beyond the heterogeneity already present in the routes. revision: yes

  3. Referee: [§5] §5 (Discussion / Generalization): the robustness claim rests on the assumption that the three axes capture diverse errors on unseen generators, yet no post-challenge OOD evaluation on generative models or distortion distributions absent from both training and the NTIRE 2026 set is reported; this is central to the “in the wild” title claim.

    Authors: We acknowledge that truly post-challenge OOD testing on generators and distortions completely absent from the training distribution and the NTIRE 2026 protocol would provide stronger evidence. Because such models were not available during the challenge window, we cannot retroactively supply those results. In the revised discussion we will (i) articulate the rationale for the three chosen axes and why they are expected to produce diverse error patterns, (ii) report any additional internal OOD splits we can construct from publicly released generators, and (iii) explicitly state the limitation regarding future unseen generators. This will temper the generalization claim while preserving the empirical contribution of the challenge results. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical ensemble of independent routes validated on external benchmarks

full rationale

The paper presents HEDGE as a construction of three heterogeneous detection routes (data/augmentation expansion on DINOv3, higher-resolution forensic branch, MetaCLIP2 backbone) whose outputs are fused by logit-space weighted averaging plus dual-gating. No equations, fitted parameters, or self-citations are shown that reduce the claimed performance or robustness to quantities defined by the same inputs. Results are reported via external NTIRE 2026 challenge placement and multiple AIGC benchmarks, making the derivation self-contained against independent test distributions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to the high-level design assumptions stated there; no free parameters, new entities, or non-standard axioms are explicitly introduced.

axioms (1)
  • domain assumption Structured heterogeneity across training regimes, resolution, and backbone yields complementary detection cues that improve robustness over any single configuration
    Explicitly invoked in the opening argument that a single regime is insufficient.

pith-pipeline@v0.9.0 · 5514 in / 1318 out tokens · 47829 ms · 2026-05-13T18:49:19.885069+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 4 internal anchors

  1. [1]

    Synthbuster: Towards detection of diffu- sion model generated images.IEEE Open Journal of Signal Processing, 5:1–9, 2023

    Quentin Bammey. Synthbuster: Towards detection of diffu- sion model generated images.IEEE Open Journal of Signal Processing, 5:1–9, 2023. 5

  2. [2]

    Real-time deepfake detection in the real-world

    Bar Cavia, Eliahu Horwitz, Tal Reiss, and Yedid Hoshen. Real-time deepfake detection in the real-world.arXiv preprint arXiv:2406.09398, 2024. 5

  3. [3]

    Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images

    Baoying Chen, Jishen Zeng, Jianquan Yang, and Rui Yang. Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. InForty- first International Conference on Machine Learning, 2024. 5, 6, 7

  4. [4]

    Dual data alignment makes AI-generated image detector easier generalizable

    Ruoxin Chen, Junwei Xi, Zhiyuan Yan, Ke-Yue Zhang, Shuang Wu, Jingyi Xie, Xu Chen, Lei Xu, Isabel Guan, Taip- ing Yao, and Shouhong Ding. Dual data alignment makes AI-generated image detector easier generalizable. InThe Thirty-ninth Annual Conference on Neural Information Pro- cessing Systems, 2025. 1, 2, 5, 6, 7

  5. [5]

    Meta clip 2: A worldwide scaling recipe.arXiv preprint arXiv:2507.22062,

    Yung-Sung Chuang, Yang Li, Dong Wang, et al. Meta clip 2: A worldwide scaling recipe.arXiv preprint arXiv:2507.22062, 2025. 1

  6. [6]

    Raising the bar of ai-generated image detection with clip

    Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, and Luisa Verdoliva. Raising the bar of ai-generated image detection with clip. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4356–4366, 2024. 5

  7. [7]

    A bias-free training paradigm for more general ai-generated image de- tection

    Fabrizio Guillaro, Giada Zingarini, Ben Usman, Avneesh Sud, Davide Cozzolino, and Luisa Verdoliva. A bias-free training paradigm for more general ai-generated image de- tection. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 18685–18694, 2025. 2, 5, 6

  8. [8]

    Ntire 2026 challenge on robust ai-generated image detection in the wild

    Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya, Artem Filippov, Georgii Bychkov, Sergey Lavrushkin, Mikhail Erofeev, Anastasia Antsiferova, Changsheng Chen, Shunquan Tan, Radu Timofte, Dmitriy Vatolin, et al. Ntire 2026 challenge on robust ai-generated image detection in the wild. InProceedings of the IEEE/CVF Conference on Com- puter Vision and P...

  9. [9]

    arXiv preprint arXiv:2510.03161 , year=

    Qing Huang, Zhipei Xu, Xuanyu Zhang, and Jian Zhang. Unishield: An adaptive multi-agent framework for unified forgery image detection and localization.arXiv preprint arXiv:2510.03161, 2025. 2

  10. [10]

    So-fake: Benchmarking and explain- ing social media image forgery detection.arXiv preprint arXiv:2505.18660, 2025

    Zhenglin Huang, Tianxiao Li, Xiangtai Li, Haiquan Wen, Yiwei He, Jiangning Zhang, Hao Fei, Xi Yang, Xiaowei Huang, Bei Peng, et al. So-fake: Benchmarking and explain- ing social media image forgery detection.arXiv preprint arXiv:2505.18660, 2025. 3, 5

  11. [11]

    Locate-Then-Examine: Grounded Region Reasoning Improves Detection of AI-Generated Images

    Yikun Ji, Yan Hong, Bowen Deng, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang, et al. Zoom-in to sort ai-generated images out.arXiv preprint arXiv:2510.04225,

  12. [12]

    Fakexplain: AI- generated images detection via human-aligned grounded rea- soning

    Yikun Ji, Yan Hong, Qi Fan, jun lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, and Jianfu Zhang. Fakexplain: AI- generated images detection via human-aligned grounded rea- soning. InThe Fourteenth International Conference on Learning Representations, 2026. 1, 2

  13. [13]

    Legion: Learning to ground and ex- plain for synthetic image detection

    Hengrui Kang, Siwei Wen, Zichen Wen, Junyan Ye, Wei- jia Li, Peilin Feng, Baichuan Zhou, Bin Wang, Dahua Lin, Linfeng Zhang, et al. Legion: Learning to ground and ex- plain for synthetic image detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 18937–18947, 2025. 1, 2

  14. [14]

    Bridging the gap between ideal and real-world evaluation: Benchmarking ai-generated image detection in challenging scenarios

    Chunxiao Li, Xiaoxiao Wang, Meiling Li, Boming Miao, Peng Sun, Yunjian Zhang, Xiangyang Ji, and Yao Zhu. Bridging the gap between ideal and real-world evaluation: Benchmarking ai-generated image detection in challenging scenarios. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 20379–20389, 2025. 3, 5

  15. [15]

    Improving synthetic image detection towards generalization: An image transformation perspec- tive

    Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Fuli Feng. Improving synthetic image detection towards generalization: An image transformation perspec- tive. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 1, pages 2405– 2414, 2025. 5, 6, 7

  16. [16]

    arXiv preprint arXiv:2505.12335 , year=

    Ziqiang Li, Jiazhen Yan, Ziwen He, Kai Zeng, Weiwei Jiang, Lizhi Xiong, and Zhangjie Fu. Is artificial intelligence gen- erated image detection a solved problem?arXiv preprint arXiv:2505.12335, 2025. 2, 3, 5

  17. [17]

    From Evidence to Verdict: An Agent-Based Forensic Framework for AI-Generated Image Detection

    Mengfei Liang, Yiting Qu, Yukun Jiang, Michael Backes, and Yang Zhang. From evidence to verdict: An agent-based forensic framework for ai-generated image detection.arXiv preprint arXiv:2511.00181, 2025. 2

  18. [18]

    Forgery-aware adaptive transformer for generalizable synthetic image detection

    Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Jingdong Wang, and Yao Zhao. Forgery-aware adaptive transformer for generalizable synthetic image detection. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), 2024. 5, 6, 7

  19. [19]

    Beyond artifacts: Real-centric envelope modeling for reliable ai-generated image detection.arXiv preprint arXiv:2512.20937, 2025

    Ruiqi Liu, Yi Han, Zhengbo Zhang, Liwei Yao, Zhiyuan Yan, Jialiang Shen, ZhiJin Chen, Boyi Sun, Lubin Weng, Jing Dong, et al. Beyond artifacts: Real-centric envelope modeling for reliable ai-generated image detection.arXiv preprint arXiv:2512.20937, 2025. 2, 5, 7

  20. [20]

    arXiv preprint arXiv:2602.02222 , year=

    Ruiqi Liu, Manni Cui, Ziheng Qin, Zhiyuan Yan, Ruoxin Chen, Yi Han, Zhiheng Li, Junkai Chen, ZhiJin Chen, Kaiqing Lin, et al. Mirror: Manifold ideal reference re- constructor for generalizable ai-generated image detection. arXiv preprint arXiv:2602.02222, 2026. 2, 5, 6

  21. [21]

    Deepfake scam tricks hong kong firm into paying out $25 million.https://edition.cnn

    Kathleen Magramo. Deepfake scam tricks hong kong firm into paying out $25 million.https://edition.cnn. com/2024/02/04/asia/deepfake- cfo- scam- hong-kong-intl-hnk, 2024. Accessed: 2025-06-30. 1

  22. [22]

    No pixel left behind: A detail-preserving architecture for robust high-resolution AI-generated image detection

    Lianrui Mu, Haoji Hu, Zou Xingze, Jianhong Bai, and Jiaqi Hu. No pixel left behind: A detail-preserving architecture for robust high-resolution AI-generated image detection. In The Fourteenth International Conference on Learning Rep- resentations, 2026. 2, 7

  23. [23]

    Towards uni- versal fake image detectors that generalize across genera- tive models

    Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards uni- versal fake image detectors that generalize across genera- tive models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 24480– 24489, 2023. 1, 2, 5, 6, 7 9

  24. [24]

    Scaling Up AI-Generated Image Detection with Generator-Aware Prototypes

    Ziheng Qin, Yuheng Ji, Renshuai Tao, Yuxuan Tian, Yuyang Liu, Yipu Wang, and Xiaolong Zheng. Scaling up ai- generated image detection with generator-aware prototypes. arXiv preprint arXiv:2512.12982, 2025. 2

  25. [25]

    Aligned datasets improve detection of latent diffusion-generated images.arXiv preprint arXiv:2410.11835, 2024

    Anirudh Sundara Rajan, Utkarsh Ojha, Jedidiah Schloesser, and Yong Jae Lee. Aligned datasets improve detec- tion of latent diffusion-generated images.arXiv preprint arXiv:2410.11835, 2024. 5, 6, 7

  26. [26]

    Janko Roettgers. This tiktok tom cruise impersonator deep- fake is scary good—and it could be the future of entertain- ment.https://www.theverge.com/22303756/ tiktok-tom-cruise-impersonator-deepfake,

  27. [27]

    Accessed: 2025-06-30. 1

  28. [28]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022. 1

  29. [29]

    DINOv3

    Oriane Sim ´eoni, Huy V V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Micha ¨el Ramamonjisoa, et al. Dinov3.arXiv preprint arXiv:2508.10104, 2025. 1

  30. [30]

    Frequency-aware deepfake de- tection: Improving generalizability through frequency space domain learning

    Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Frequency-aware deepfake de- tection: Improving generalizability through frequency space domain learning. InProceedings of the AAAI Conference on Artificial Intelligence, pages 5052–5060, 2024. 1

  31. [31]

    Rethinking the up-sampling op- erations in cnn-based generative network for generalizable deepfake detection

    Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Rethinking the up-sampling op- erations in cnn-based generative network for generalizable deepfake detection. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 28130–28139, 2024. 1, 2, 5, 6, 7, 8

  32. [32]

    C2p-clip: Inject- ing category common prompt in clip to enhance generaliza- tion in deepfake detection

    Chuangchuang Tan, Renshuai Tao, Huan Liu, Guanghua Gu, Baoyuan Wu, Yao Zhao, and Yunchao Wei. C2p-clip: Inject- ing category common prompt in clip to enhance generaliza- tion in deepfake detection. InProceedings of the AAAI Con- ference on Artificial Intelligence, pages 7184–7192, 2025. 5, 6, 7

  33. [33]

    Veritas: Generalizable deepfake detection via pattern-aware reasoning

    Hao Tan, Jun Lan, Zichang Tan, Ajian Liu, Chuanbiao Song, Senyuan Shi, Huijia Zhu, Weiqiang Wang, Jun Wan, and Zhen Lei. Veritas: Generalizable deepfake detection via pattern-aware reasoning. InInternational Conference on Learning Representations, 2026. 2

  34. [34]

    Forensics-bench: A comprehensive forgery detection bench- mark suite for large vision language models

    Jin Wang, Chenghui Lv, Xian Li, Shichao Dong, Huadong Li, Kelu Yao, Chao Li, Wenqi Shao, and Ping Luo. Forensics-bench: A comprehensive forgery detection bench- mark suite for large vision language models. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 4233–4245, 2025. 2

  35. [35]

    Dire for diffusion-generated image detection

    Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. Dire for diffusion-generated image detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 22445–22455, 2023. 1

  36. [36]

    Qwen-image technical report, 2025

    Chenfei Wu, Jiahao Li, Jingren Zhou, et al. Qwen-image technical report, 2025. 1

  37. [37]

    Unveiling perceptual artifacts: A fine-grained benchmark for interpretable AI-generated im- age detection

    Yao Xiao, Weiyan Chen, Jiahao Chen, Zijie Cao, Weijian Deng, Binbin Yang, ZiYi Dong, Xiangyang Ji, Wei Ke, Pengxu Wei, and Liang Lin. Unveiling perceptual artifacts: A fine-grained benchmark for interpretable AI-generated im- age detection. InThe Fourteenth International Conference on Learning Representations, 2026. 2

  38. [38]

    Fakeshield: Explainable image forgery detection and localization via multi-modal large lan- guage models

    Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, and Jian Zhang. Fakeshield: Explainable image forgery detection and localization via multi-modal large lan- guage models. InInternational Conference on Learning Representations, 2025. 1, 2

  39. [39]

    Dual frequency branch framework with reconstructed sliding windows attention for ai-generated image detection

    Jiazhen Yan, Ziqiang Li, Fan Wang, Ziwen He, and Zhangjie Fu. Dual frequency branch framework with reconstructed sliding windows attention for ai-generated image detection. IEEE Transactions on Information Forensics and Security,

  40. [40]

    A sanity check for AI- generated image detection

    Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Weidi Xie. A sanity check for AI- generated image detection. InThe Thirteenth International Conference on Learning Representations, 2025. 1, 2, 3, 5, 6, 7

  41. [41]

    Orthogonal subspace decomposition for generalizable ai-generated image detection.arXiv preprint arXiv:2411.15633, 2024

    Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decompo- sition for generalizable ai-generated image detection.arXiv preprint arXiv:2411.15633, 2024. 8

  42. [42]

    Dˆ 3: scaling up deepfake detection by learning from discrepancy

    Yongqi Yang, Zhihao Qian, Ye Zhu, Olga Russakovsky, and Yu Wu. Dˆ 3: scaling up deepfake detection by learning from discrepancy. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 23850–23859,

  43. [43]

    All patches matter, more patches better: En- hance ai-generated image detection via panoptic patch learn- ing.arXiv preprint arXiv:2504.01396, 2025

    Zheng Yang, Ruoxin Chen, Zhiyuan Yan, Ke-Yue Zhang, Xinghe Fu, Shuang Wu, Xiujun Shu, Taiping Yao, Shouhong Ding, and Xi Li. All patches matter, more patches better: En- hance ai-generated image detection via panoptic patch learn- ing.arXiv preprint arXiv:2504.01396, 2025. 2

  44. [44]

    arXiv preprint arXiv:2311.12397 , year=

    Nan Zhong, Yiran Xu, Sheng Li, Zhenxing Qian, and Xinpeng Zhang. Patchcraft: Exploring texture patch for efficient ai-generated image detection.arXiv preprint arXiv:2311.12397, 2023. 1, 2, 5, 8

  45. [45]

    Brought a gun to a knife fight: Modern vfm baselines outgun specialized detectors on in-the-wild ai image detection.arXiv preprint arXiv:2509.12995, 2025

    Yue Zhou, Xinan He, Kaiqing Lin, Bing Fan, Feng Ding, Jin- hua Zeng, and Bin Li. Brought a gun to a knife fight: Modern vfm baselines outgun specialized detectors on in-the-wild ai image detection.arXiv preprint arXiv:2509.12995, 2025. 1, 2

  46. [46]

    Genimage: A million-scale benchmark for de- tecting ai-generated image.Advances in neural information processing systems, 36:77771–77782, 2023

    Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, and Yunhe Wang. Genimage: A million-scale benchmark for de- tecting ai-generated image.Advances in neural information processing systems, 36:77771–77782, 2023. 3, 5 10