arxiv: 2604.03555 · v1 · submitted 2026-04-04 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

HEDGE: Heterogeneous Ensemble for Detection of AI-GEnerated Images in the Wild

Fei Wu , Dagong Lu , Mufeng Yao , Xinlei Xu , Fengjun Guo

Authors on Pith no claims yet

Pith reviewed 2026-05-13 18:49 UTC · model grok-4.3

classification 💻 cs.CV

keywords AI-generated image detectionheterogeneous ensemblerobust detectionimage forensicsensemble fusiongenerative modelscomputer visionAIGC benchmarks

0 comments

The pith

HEDGE detects AI-generated images robustly by ensembling detectors across diverse training data, resolutions, and backbones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that single training regimes, fixed resolutions, or one backbone cannot cover the range of evolving generative models and real-world distortions in AI images. HEDGE builds three routes to address this: a DINOv3 path with staged data expansion and augmentation, a high-resolution branch for fine details, and a MetaCLIP2 branch for backbone variety. These outputs combine through weighted logit averaging refined by dual-gating that corrects branch outliers and fusion mistakes. The resulting system reaches fourth place in the NTIRE 2026 challenge while leading on multiple AIGC benchmarks. This shows that deliberate spread across key design axes can outperform uniform single-model detectors for this task.

Core claim

HEDGE establishes that structuring detection around three complementary axes—progressive training data expansion, multi-scale resolution, and backbone heterogeneity—then fusing the routes via logit-space weighted averaging and a lightweight dual-gating mechanism yields stronger robustness to unseen generators and distortions than any single route alone.

What carries the argument

Three detection routes (DINOv3-based staged training, higher-resolution branch, MetaCLIP2 branch) fused by logit weighted averaging and dual-gating for outlier correction and majority-error handling.

If this is right

The ensemble handles a broader set of unseen generative models and distortions than homogeneous detectors.
It attains state-of-the-art accuracy and robustness on standard AIGC image detection benchmarks.
It secures fourth place in the NTIRE 2026 Robust AI-Generated Image Detection in the Wild Challenge.
The dual-gating fusion reduces the impact of individual branch failures without requiring retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same three-axis heterogeneity pattern could be tested on related tasks such as video or audio deepfake detection.
Adding further backbone types or resolution levels might increase performance provided the new routes stay complementary.
Long-term monitoring on post-2026 generators would test whether the current diversity axes continue to cover future model shifts.

Load-bearing premise

The three chosen axes of heterogeneity plus dual-gating will remain complementary and superior when facing entirely new generative models and distortion distributions not seen in development.

What would settle it

A new generative model that causes all three individual routes in HEDGE to fail at comparable rates would show the heterogeneity no longer supplies independent strengths.

Figures

Figures reproduced from arXiv: 2604.03555 by Dagong Lu, Fei Wu, Fengjun Guo, Mufeng Yao, Xinlei Xu.

**Figure 1.** Figure 1: Overview of the proposed three-route framework for robust AIGC image detection. Route A progressively constructs DINOv3- [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: Robustness evaluation under common image perturbations on HiRes-50K (1,000 real + 1,000 fake, unseen during training). [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: t-SNE visualization of M3 (DINOv3-Huge) CLS token [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

Robust detection of AI-generated images in the wild remains challenging due to the rapid evolution of generative models and varied real-world distortions. We argue that relying on a single training regime, resolution, or backbone is insufficient to handle all conditions, and that structured heterogeneity across these dimensions is essential for robust detection. To this end, we propose HEDGE, a Heterogeneous Ensemble for Detection of AI-GEnerated images, that introduces complementary detection routes along three axes: diverse training data with strong augmentation, multi-scale feature extraction, and backbone heterogeneity. Specifically, Route~A progressively constructs DINOv3-based detectors through staged data expansion and augmentation escalation, Route~B incorporates a higher-resolution branch for fine-grained forensic cues, and Route~C adds a MetaCLIP2-based branch for backbone diversity. All outputs are fused via logit-space weighted averaging, refined by a lightweight dual-gating mechanism that handles branch-level outliers and majority-dominated fusion errors. HEDGE achieves 4th place in the NTIRE 2026 Robust AI-Generated Image Detection in the Wild Challenge and attains state-of-the-art performance with strong robustness on multiple AIGC image detection benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

HEDGE is a practical three-route ensemble for AI-generated image detection that ranks 4th in the NTIRE challenge, though its robustness to new generators needs more validation.

read the letter

The one thing to take away is that HEDGE is a heterogeneous ensemble using DINOv3 with progressive training, a dedicated high-resolution branch, and MetaCLIP2, all fused through weighted logits and dual gating, and it placed fourth in the NTIRE 2026 challenge while claiming top results on several AIGC benchmarks. What stands out as new is the particular mix of those three axes plus the gating step. Earlier work has used ensembles for detection, but this specific staged data expansion on DINOv3, the resolution split, and the MetaCLIP2 addition with outlier handling is a fresh combination. The paper does well at showing how these routes can be made to work together without too much added complexity. The soft spots are around the evidence for robustness. The abstract and results focus on the challenge and current benchmarks, but as the stress test notes, those may not cover generators that produce correlated errors across the routes. If a future model fools the DINO features, the high-res cues, and the CLIP features in similar ways, the fusion has no extra information to fall back on. The paper would be stronger with an explicit section testing on models released after the training data cutoff. This is the kind of paper that reading groups in computer vision forensics would find useful to discuss, because it gives a concrete system that people can implement and test themselves. It is worth a serious referee because the challenge placement is a real external signal and the method is grounded in reproducible components.

Referee Report

3 major / 3 minor

Summary. The paper proposes HEDGE, a heterogeneous ensemble for robust detection of AI-generated images. It defines three complementary routes—Route A (DINOv3 backbone with staged data expansion and augmentation escalation), Route B (higher-resolution forensic branch), and Route C (MetaCLIP2 backbone)—whose logit outputs are combined via weighted averaging and refined by a lightweight dual-gating mechanism to mitigate outliers and majority errors. The central claim is that this structured heterogeneity across training data, resolution, and backbone yields 4th place in the NTIRE 2026 Robust AI-Generated Image Detection in the Wild Challenge together with state-of-the-art robustness on multiple AIGC benchmarks.

Significance. If the performance claims hold under scrutiny, the work provides concrete evidence that deliberate heterogeneity along data, scale, and architecture axes can produce additive error patterns useful for detection under real-world distortions. This would be a useful empirical contribution to the AIGC detection literature, particularly if accompanied by reproducible code or detailed per-route diagnostics that future ensembles could build upon.

major comments (3)

[§4] §4 (Experiments): the manuscript reports 4th-place ranking and SOTA robustness yet supplies no ablation tables, per-route accuracy breakdowns, or direct comparisons against a naive average of the three branches; without these the claim that the chosen heterogeneity axes remain complementary cannot be evaluated.
[§3.3] §3.3 (Fusion): the dual-gating mechanism is described as correcting branch-level outliers, but no quantitative comparison (e.g., weighted average vs. gated fusion on the same backbones) is provided; this leaves open whether the added complexity is load-bearing for the reported gains.
[§5] §5 (Discussion / Generalization): the robustness claim rests on the assumption that the three axes capture diverse errors on unseen generators, yet no post-challenge OOD evaluation on generative models or distortion distributions absent from both training and the NTIRE 2026 set is reported; this is central to the “in the wild” title claim.

minor comments (3)

[Abstract] Abstract: states competitive ranking and SOTA results but contains no numerical metrics, making the headline claim difficult to assess at first reading.
[§3.3] Notation: the weighting coefficients in the logit fusion are introduced without an explicit equation or initialization procedure; a short equation would improve clarity.
[Figure 2] Figure 2 (architecture diagram): the dual-gating block is shown schematically but lacks a legend for the gate outputs; a small table of gate activation statistics on the validation set would help.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of experimental validation and generalization that we will address in the revision. Below we respond point-by-point to the major comments.

read point-by-point responses

Referee: [§4] §4 (Experiments): the manuscript reports 4th-place ranking and SOTA robustness yet supplies no ablation tables, per-route accuracy breakdowns, or direct comparisons against a naive average of the three branches; without these the claim that the chosen heterogeneity axes remain complementary cannot be evaluated.

Authors: We agree that explicit ablations are necessary to substantiate the complementarity claim. In the revised manuscript we will add a dedicated ablation subsection in §4 that reports (i) individual route accuracies on the NTIRE 2026 test set and additional AIGC benchmarks, (ii) all pairwise and triple combinations, and (iii) a direct head-to-head comparison of the proposed weighted-average-plus-gating fusion against a naive unweighted average of the three branch logits. These tables will quantify the additive gains attributable to each heterogeneity axis. revision: yes
Referee: [§3.3] §3.3 (Fusion): the dual-gating mechanism is described as correcting branch-level outliers, but no quantitative comparison (e.g., weighted average vs. gated fusion on the same backbones) is provided; this leaves open whether the added complexity is load-bearing for the reported gains.

Authors: We accept that an isolated comparison is required. The revised §3.3 and §4 will include a controlled ablation that keeps the three backbones and training regimes fixed while replacing the dual-gating module with simple logit averaging. Performance deltas on both the challenge test set and robustness benchmarks will be reported, allowing readers to assess whether the gating contributes meaningfully beyond the heterogeneity already present in the routes. revision: yes
Referee: [§5] §5 (Discussion / Generalization): the robustness claim rests on the assumption that the three axes capture diverse errors on unseen generators, yet no post-challenge OOD evaluation on generative models or distortion distributions absent from both training and the NTIRE 2026 set is reported; this is central to the “in the wild” title claim.

Authors: We acknowledge that truly post-challenge OOD testing on generators and distortions completely absent from the training distribution and the NTIRE 2026 protocol would provide stronger evidence. Because such models were not available during the challenge window, we cannot retroactively supply those results. In the revised discussion we will (i) articulate the rationale for the three chosen axes and why they are expected to produce diverse error patterns, (ii) report any additional internal OOD splits we can construct from publicly released generators, and (iii) explicitly state the limitation regarding future unseen generators. This will temper the generalization claim while preserving the empirical contribution of the challenge results. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical ensemble of independent routes validated on external benchmarks

full rationale

The paper presents HEDGE as a construction of three heterogeneous detection routes (data/augmentation expansion on DINOv3, higher-resolution forensic branch, MetaCLIP2 backbone) whose outputs are fused by logit-space weighted averaging plus dual-gating. No equations, fitted parameters, or self-citations are shown that reduce the claimed performance or robustness to quantities defined by the same inputs. Results are reported via external NTIRE 2026 challenge placement and multiple AIGC benchmarks, making the derivation self-contained against independent test distributions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to the high-level design assumptions stated there; no free parameters, new entities, or non-standard axioms are explicitly introduced.

axioms (1)

domain assumption Structured heterogeneity across training regimes, resolution, and backbone yields complementary detection cues that improve robustness over any single configuration
Explicitly invoked in the opening argument that a single regime is insufficient.

pith-pipeline@v0.9.0 · 5514 in / 1318 out tokens · 47829 ms · 2026-05-13T18:49:19.885069+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

HEDGE ... three-route heterogeneous framework that diversifies training data and augmentation, input resolution, and backbone architecture ... logit-space weighted fusion ... dual-gating mechanism
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Route A progressively constructs DINOv3-based detectors through staged data expansion and augmentation escalation; Route B ... higher-resolution branch; Route C ... MetaCLIP2-based branch

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 4 internal anchors

[1]

Synthbuster: Towards detection of diffu- sion model generated images.IEEE Open Journal of Signal Processing, 5:1–9, 2023

Quentin Bammey. Synthbuster: Towards detection of diffu- sion model generated images.IEEE Open Journal of Signal Processing, 5:1–9, 2023. 5

work page 2023
[2]

Real-time deepfake detection in the real-world

Bar Cavia, Eliahu Horwitz, Tal Reiss, and Yedid Hoshen. Real-time deepfake detection in the real-world.arXiv preprint arXiv:2406.09398, 2024. 5

work page arXiv 2024
[3]

Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images

Baoying Chen, Jishen Zeng, Jianquan Yang, and Rui Yang. Drct: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. InForty- first International Conference on Machine Learning, 2024. 5, 6, 7

work page 2024
[4]

Dual data alignment makes AI-generated image detector easier generalizable

Ruoxin Chen, Junwei Xi, Zhiyuan Yan, Ke-Yue Zhang, Shuang Wu, Jingyi Xie, Xu Chen, Lei Xu, Isabel Guan, Taip- ing Yao, and Shouhong Ding. Dual data alignment makes AI-generated image detector easier generalizable. InThe Thirty-ninth Annual Conference on Neural Information Pro- cessing Systems, 2025. 1, 2, 5, 6, 7

work page 2025
[5]

Meta clip 2: A worldwide scaling recipe.arXiv preprint arXiv:2507.22062,

Yung-Sung Chuang, Yang Li, Dong Wang, et al. Meta clip 2: A worldwide scaling recipe.arXiv preprint arXiv:2507.22062, 2025. 1

work page arXiv 2025
[6]

Raising the bar of ai-generated image detection with clip

Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, and Luisa Verdoliva. Raising the bar of ai-generated image detection with clip. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4356–4366, 2024. 5

work page 2024
[7]

A bias-free training paradigm for more general ai-generated image de- tection

Fabrizio Guillaro, Giada Zingarini, Ben Usman, Avneesh Sud, Davide Cozzolino, and Luisa Verdoliva. A bias-free training paradigm for more general ai-generated image de- tection. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 18685–18694, 2025. 2, 5, 6

work page 2025
[8]

Ntire 2026 challenge on robust ai-generated image detection in the wild

Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya, Artem Filippov, Georgii Bychkov, Sergey Lavrushkin, Mikhail Erofeev, Anastasia Antsiferova, Changsheng Chen, Shunquan Tan, Radu Timofte, Dmitriy Vatolin, et al. Ntire 2026 challenge on robust ai-generated image detection in the wild. InProceedings of the IEEE/CVF Conference on Com- puter Vision and P...

work page 2026
[9]

arXiv preprint arXiv:2510.03161 , year=

Qing Huang, Zhipei Xu, Xuanyu Zhang, and Jian Zhang. Unishield: An adaptive multi-agent framework for unified forgery image detection and localization.arXiv preprint arXiv:2510.03161, 2025. 2

work page arXiv 2025
[10]

So-fake: Benchmarking and explain- ing social media image forgery detection.arXiv preprint arXiv:2505.18660, 2025

Zhenglin Huang, Tianxiao Li, Xiangtai Li, Haiquan Wen, Yiwei He, Jiangning Zhang, Hao Fei, Xi Yang, Xiaowei Huang, Bei Peng, et al. So-fake: Benchmarking and explain- ing social media image forgery detection.arXiv preprint arXiv:2505.18660, 2025. 3, 5

work page arXiv 2025
[11]

Locate-Then-Examine: Grounded Region Reasoning Improves Detection of AI-Generated Images

Yikun Ji, Yan Hong, Bowen Deng, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang, et al. Zoom-in to sort ai-generated images out.arXiv preprint arXiv:2510.04225,

work page internal anchor Pith review Pith/arXiv arXiv
[12]

Fakexplain: AI- generated images detection via human-aligned grounded rea- soning

Yikun Ji, Yan Hong, Qi Fan, jun lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, and Jianfu Zhang. Fakexplain: AI- generated images detection via human-aligned grounded rea- soning. InThe Fourteenth International Conference on Learning Representations, 2026. 1, 2

work page 2026
[13]

Legion: Learning to ground and ex- plain for synthetic image detection

Hengrui Kang, Siwei Wen, Zichen Wen, Junyan Ye, Wei- jia Li, Peilin Feng, Baichuan Zhou, Bin Wang, Dahua Lin, Linfeng Zhang, et al. Legion: Learning to ground and ex- plain for synthetic image detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 18937–18947, 2025. 1, 2

work page 2025
[14]

Bridging the gap between ideal and real-world evaluation: Benchmarking ai-generated image detection in challenging scenarios

Chunxiao Li, Xiaoxiao Wang, Meiling Li, Boming Miao, Peng Sun, Yunjian Zhang, Xiangyang Ji, and Yao Zhu. Bridging the gap between ideal and real-world evaluation: Benchmarking ai-generated image detection in challenging scenarios. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 20379–20389, 2025. 3, 5

work page 2025
[15]

Improving synthetic image detection towards generalization: An image transformation perspec- tive

Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Fuli Feng. Improving synthetic image detection towards generalization: An image transformation perspec- tive. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 1, pages 2405– 2414, 2025. 5, 6, 7

work page 2025
[16]

arXiv preprint arXiv:2505.12335 , year=

Ziqiang Li, Jiazhen Yan, Ziwen He, Kai Zeng, Weiwei Jiang, Lizhi Xiong, and Zhangjie Fu. Is artificial intelligence gen- erated image detection a solved problem?arXiv preprint arXiv:2505.12335, 2025. 2, 3, 5

work page arXiv 2025
[17]

From Evidence to Verdict: An Agent-Based Forensic Framework for AI-Generated Image Detection

Mengfei Liang, Yiting Qu, Yukun Jiang, Michael Backes, and Yang Zhang. From evidence to verdict: An agent-based forensic framework for ai-generated image detection.arXiv preprint arXiv:2511.00181, 2025. 2

work page internal anchor Pith review Pith/arXiv arXiv 2025
[18]

Forgery-aware adaptive transformer for generalizable synthetic image detection

Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Jingdong Wang, and Yao Zhao. Forgery-aware adaptive transformer for generalizable synthetic image detection. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), 2024. 5, 6, 7

work page 2024
[19]

Beyond artifacts: Real-centric envelope modeling for reliable ai-generated image detection.arXiv preprint arXiv:2512.20937, 2025

Ruiqi Liu, Yi Han, Zhengbo Zhang, Liwei Yao, Zhiyuan Yan, Jialiang Shen, ZhiJin Chen, Boyi Sun, Lubin Weng, Jing Dong, et al. Beyond artifacts: Real-centric envelope modeling for reliable ai-generated image detection.arXiv preprint arXiv:2512.20937, 2025. 2, 5, 7

work page arXiv 2025
[20]

arXiv preprint arXiv:2602.02222 , year=

Ruiqi Liu, Manni Cui, Ziheng Qin, Zhiyuan Yan, Ruoxin Chen, Yi Han, Zhiheng Li, Junkai Chen, ZhiJin Chen, Kaiqing Lin, et al. Mirror: Manifold ideal reference re- constructor for generalizable ai-generated image detection. arXiv preprint arXiv:2602.02222, 2026. 2, 5, 6

work page arXiv 2026
[21]

Deepfake scam tricks hong kong firm into paying out $25 million.https://edition.cnn

Kathleen Magramo. Deepfake scam tricks hong kong firm into paying out $25 million.https://edition.cnn. com/2024/02/04/asia/deepfake- cfo- scam- hong-kong-intl-hnk, 2024. Accessed: 2025-06-30. 1

work page 2024
[22]

No pixel left behind: A detail-preserving architecture for robust high-resolution AI-generated image detection

Lianrui Mu, Haoji Hu, Zou Xingze, Jianhong Bai, and Jiaqi Hu. No pixel left behind: A detail-preserving architecture for robust high-resolution AI-generated image detection. In The Fourteenth International Conference on Learning Rep- resentations, 2026. 2, 7

work page 2026
[23]

Towards uni- versal fake image detectors that generalize across genera- tive models

Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards uni- versal fake image detectors that generalize across genera- tive models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 24480– 24489, 2023. 1, 2, 5, 6, 7 9

work page 2023
[24]

Scaling Up AI-Generated Image Detection with Generator-Aware Prototypes

Ziheng Qin, Yuheng Ji, Renshuai Tao, Yuxuan Tian, Yuyang Liu, Yipu Wang, and Xiaolong Zheng. Scaling up ai- generated image detection with generator-aware prototypes. arXiv preprint arXiv:2512.12982, 2025. 2

work page internal anchor Pith review Pith/arXiv arXiv 2025
[25]

Aligned datasets improve detection of latent diffusion-generated images.arXiv preprint arXiv:2410.11835, 2024

Anirudh Sundara Rajan, Utkarsh Ojha, Jedidiah Schloesser, and Yong Jae Lee. Aligned datasets improve detec- tion of latent diffusion-generated images.arXiv preprint arXiv:2410.11835, 2024. 5, 6, 7

work page arXiv 2024
[26]

Janko Roettgers. This tiktok tom cruise impersonator deep- fake is scary good—and it could be the future of entertain- ment.https://www.theverge.com/22303756/ tiktok-tom-cruise-impersonator-deepfake,

work page arXiv
[27]

Accessed: 2025-06-30. 1

work page 2025
[28]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022. 1

work page 2022
[29]

DINOv3

Oriane Sim ´eoni, Huy V V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Micha ¨el Ramamonjisoa, et al. Dinov3.arXiv preprint arXiv:2508.10104, 2025. 1

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

Frequency-aware deepfake de- tection: Improving generalizability through frequency space domain learning

Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Frequency-aware deepfake de- tection: Improving generalizability through frequency space domain learning. InProceedings of the AAAI Conference on Artificial Intelligence, pages 5052–5060, 2024. 1

work page 2024
[31]

Rethinking the up-sampling op- erations in cnn-based generative network for generalizable deepfake detection

Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Rethinking the up-sampling op- erations in cnn-based generative network for generalizable deepfake detection. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 28130–28139, 2024. 1, 2, 5, 6, 7, 8

work page 2024
[32]

C2p-clip: Inject- ing category common prompt in clip to enhance generaliza- tion in deepfake detection

Chuangchuang Tan, Renshuai Tao, Huan Liu, Guanghua Gu, Baoyuan Wu, Yao Zhao, and Yunchao Wei. C2p-clip: Inject- ing category common prompt in clip to enhance generaliza- tion in deepfake detection. InProceedings of the AAAI Con- ference on Artificial Intelligence, pages 7184–7192, 2025. 5, 6, 7

work page 2025
[33]

Veritas: Generalizable deepfake detection via pattern-aware reasoning

Hao Tan, Jun Lan, Zichang Tan, Ajian Liu, Chuanbiao Song, Senyuan Shi, Huijia Zhu, Weiqiang Wang, Jun Wan, and Zhen Lei. Veritas: Generalizable deepfake detection via pattern-aware reasoning. InInternational Conference on Learning Representations, 2026. 2

work page 2026
[34]

Forensics-bench: A comprehensive forgery detection bench- mark suite for large vision language models

Jin Wang, Chenghui Lv, Xian Li, Shichao Dong, Huadong Li, Kelu Yao, Chao Li, Wenqi Shao, and Ping Luo. Forensics-bench: A comprehensive forgery detection bench- mark suite for large vision language models. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 4233–4245, 2025. 2

work page 2025
[35]

Dire for diffusion-generated image detection

Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. Dire for diffusion-generated image detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 22445–22455, 2023. 1

work page 2023
[36]

Qwen-image technical report, 2025

Chenfei Wu, Jiahao Li, Jingren Zhou, et al. Qwen-image technical report, 2025. 1

work page 2025
[37]

Unveiling perceptual artifacts: A fine-grained benchmark for interpretable AI-generated im- age detection

Yao Xiao, Weiyan Chen, Jiahao Chen, Zijie Cao, Weijian Deng, Binbin Yang, ZiYi Dong, Xiangyang Ji, Wei Ke, Pengxu Wei, and Liang Lin. Unveiling perceptual artifacts: A fine-grained benchmark for interpretable AI-generated im- age detection. InThe Fourteenth International Conference on Learning Representations, 2026. 2

work page 2026
[38]

Fakeshield: Explainable image forgery detection and localization via multi-modal large lan- guage models

Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, and Jian Zhang. Fakeshield: Explainable image forgery detection and localization via multi-modal large lan- guage models. InInternational Conference on Learning Representations, 2025. 1, 2

work page 2025
[39]

Dual frequency branch framework with reconstructed sliding windows attention for ai-generated image detection

Jiazhen Yan, Ziqiang Li, Fan Wang, Ziwen He, and Zhangjie Fu. Dual frequency branch framework with reconstructed sliding windows attention for ai-generated image detection. IEEE Transactions on Information Forensics and Security,

work page
[40]

A sanity check for AI- generated image detection

Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Weidi Xie. A sanity check for AI- generated image detection. InThe Thirteenth International Conference on Learning Representations, 2025. 1, 2, 3, 5, 6, 7

work page 2025
[41]

Orthogonal subspace decomposition for generalizable ai-generated image detection.arXiv preprint arXiv:2411.15633, 2024

Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decompo- sition for generalizable ai-generated image detection.arXiv preprint arXiv:2411.15633, 2024. 8

work page arXiv 2024
[42]

Dˆ 3: scaling up deepfake detection by learning from discrepancy

Yongqi Yang, Zhihao Qian, Ye Zhu, Olga Russakovsky, and Yu Wu. Dˆ 3: scaling up deepfake detection by learning from discrepancy. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 23850–23859,

work page
[43]

All patches matter, more patches better: En- hance ai-generated image detection via panoptic patch learn- ing.arXiv preprint arXiv:2504.01396, 2025

Zheng Yang, Ruoxin Chen, Zhiyuan Yan, Ke-Yue Zhang, Xinghe Fu, Shuang Wu, Xiujun Shu, Taiping Yao, Shouhong Ding, and Xi Li. All patches matter, more patches better: En- hance ai-generated image detection via panoptic patch learn- ing.arXiv preprint arXiv:2504.01396, 2025. 2

work page arXiv 2025
[44]

arXiv preprint arXiv:2311.12397 , year=

Nan Zhong, Yiran Xu, Sheng Li, Zhenxing Qian, and Xinpeng Zhang. Patchcraft: Exploring texture patch for efficient ai-generated image detection.arXiv preprint arXiv:2311.12397, 2023. 1, 2, 5, 8

work page arXiv 2023
[45]

Brought a gun to a knife fight: Modern vfm baselines outgun specialized detectors on in-the-wild ai image detection.arXiv preprint arXiv:2509.12995, 2025

Yue Zhou, Xinan He, Kaiqing Lin, Bing Fan, Feng Ding, Jin- hua Zeng, and Bin Li. Brought a gun to a knife fight: Modern vfm baselines outgun specialized detectors on in-the-wild ai image detection.arXiv preprint arXiv:2509.12995, 2025. 1, 2

work page arXiv 2025
[46]

Genimage: A million-scale benchmark for de- tecting ai-generated image.Advances in neural information processing systems, 36:77771–77782, 2023

Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, and Yunhe Wang. Genimage: A million-scale benchmark for de- tecting ai-generated image.Advances in neural information processing systems, 36:77771–77782, 2023. 3, 5 10

work page 2023