pith. machine review for the scientific record.

arxiv: 2604.09106 · v2 · submitted 2026-04-10 · 💻 cs.CV · cs.LG

Recognition: 2 Lean theorem links

Detecting Diffusion-generated Images via Dynamic Assembly Forests

Mengxin Fu, Yuezun Li

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:12 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords diffusion-generated images · image detection · deep forest · dynamic assembly · lightweight detector · forgery detection · computer vision

The pith

A Dynamic Assembly Forest detects diffusion-generated images using far fewer parameters than neural networks and without GPUs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a Dynamic Assembly Forest model to identify images created by diffusion models. It extends the deep forest paradigm with dynamic assembly to improve feature learning and make training more scalable than standard machine learning approaches. If this holds, detection becomes practical on ordinary hardware, addressing security risks from realistic fakes while avoiding the heavy resource demands of deep neural networks. The approach reaches competitive accuracy on standard tests despite its simpler structure.

Core claim

The paper claims that DAF, built by dynamically assembling decision trees within a forest structure, extracts features sufficient to separate real images from those produced by diffusion models, delivering an effective detector that sidesteps the parameter scale and compute requirements of CNN and Transformer methods.

What carries the argument

Dynamic Assembly Forest (DAF), which extends the deep forest model through dynamic assembly of tree ensembles to enable scalable feature learning for binary image classification.
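To make the carrying machinery concrete: the deep-forest idea DAF builds on can be sketched in a few dozen lines. Each cascade level is a small tree ensemble whose class-probability output is appended to the feature vector fed to the next level, and levels are added only while held-out accuracy improves. Everything below (the stump learner, forest size, the stopping rule standing in for "dynamic assembly") is a hypothetical illustration, not the paper's actual algorithm.

```python
import random

def gini(labels):
    # Gini impurity of a binary label list
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

class Stump:
    """One-level decision tree: best Gini threshold on a single feature."""
    def fit(self, X, y):
        best = (float("inf"), 0, 0.0)  # (weighted impurity, feature, threshold)
        for j in range(len(X[0])):
            for t in sorted(set(row[j] for row in X)):
                left = [yi for row, yi in zip(X, y) if row[j] <= t]
                right = [yi for row, yi in zip(X, y) if row[j] > t]
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if score < best[0]:
                    best = (score, j, t)
        _, self.j, self.t = best
        left = [yi for row, yi in zip(X, y) if row[self.j] <= self.t]
        right = [yi for row, yi in zip(X, y) if row[self.j] > self.t]
        self.p_left = sum(left) / len(left) if left else 0.5
        self.p_right = sum(right) / len(right) if right else 0.5
        return self

    def prob(self, row):
        return self.p_left if row[self.j] <= self.t else self.p_right

class Forest:
    """Bagged stumps; predicts the mean positive-class probability."""
    def __init__(self, n_trees=5, seed=0):
        self.n_trees, self.rng = n_trees, random.Random(seed)

    def fit(self, X, y):
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]
            self.trees.append(Stump().fit([X[i] for i in idx], [y[i] for i in idx]))
        return self

    def prob(self, row):
        return sum(t.prob(row) for t in self.trees) / len(self.trees)

def cascade_fit(X_tr, y_tr, X_val, y_val, max_levels=3):
    """Grow cascade levels while validation accuracy improves — a stand-in
    for the paper's dynamic assembly rule, which is not reproduced here."""
    levels, best_acc = [], 0.0
    aug_tr = [list(r) for r in X_tr]
    aug_val = [list(r) for r in X_val]
    for _ in range(max_levels):
        f = Forest().fit(aug_tr, y_tr)
        acc = sum((f.prob(r) > 0.5) == bool(yi)
                  for r, yi in zip(aug_val, y_val)) / len(y_val)
        if acc <= best_acc and levels:
            break  # stop assembling once held-out accuracy stops improving
        levels.append(f)
        best_acc = acc
        # augment features with this level's class probability
        aug_tr = [r + [f.prob(r)] for r in aug_tr]
        aug_val = [r + [f.prob(r)] for r in aug_val]
    return levels, best_acc
```

The appeal of this family of models for the efficiency claim is that "parameters" are just split thresholds and leaf probabilities, grown on demand, rather than fixed weight tensors.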

If this is right

  • DAF requires significantly fewer parameters than CNN or Transformer detectors.
  • It runs with much lower computational cost and does not require GPUs.
  • It maintains competitive detection performance under standard evaluation protocols.
  • It offers a practical substitute for heavyweight models in resource-constrained settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar dynamic assembly ideas could extend to detecting other forms of AI-generated media such as audio or video.
  • The low resource footprint opens the possibility of running detectors directly on consumer devices for real-time checks.
  • Ongoing refinement of the assembly process might increase robustness as new diffusion models emerge.

Load-bearing premise

The deep forest structure, when extended with dynamic assembly, can learn features discriminative enough to tell real images from diffusion-generated ones without the layered representations of neural networks.

What would settle it

A side-by-side test on a standard benchmark of real and diffusion-generated images, comparing DAF directly against a typical CNN detector: the claim fails if DAF accuracy falls substantially below the CNN's, and holds if it stays competitive.
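Such a settling test amounts to a paired accuracy comparison on a shared benchmark split. A minimal sketch follows; the 5-point margin and both detectors' predictions are illustrative choices, not values from the paper.

```python
def accuracy(preds, labels):
    """Fraction of binary predictions matching ground-truth labels."""
    assert len(preds) == len(labels)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def daf_claim_survives(daf_preds, cnn_preds, labels, margin=0.05):
    """The lightweight-detector claim fails only if DAF accuracy falls
    more than `margin` below the CNN baseline on the same benchmark
    split (the margin here is a hypothetical tolerance)."""
    return accuracy(daf_preds, labels) >= accuracy(cnn_preds, labels) - margin
```

The point of the margin is to separate "substantially below" from ordinary run-to-run noise; a real evaluation would pick it from repeated splits or confidence intervals.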

Figures

Figures reproduced from arXiv: 2604.09106 by Mengxin Fu, Yuezun Li.

Figure 1: Overview of the proposed DAF model. Patches at different scales are fused, and the Dynamic Assembly Strategy allows advanced feature extraction to be deployed without being limited by memory overhead.
Figure 2: Detailed explanation of the proposed dynamic assembly strategy.
Figure 3: The detailed process of Task-specific Feature Extraction.
Figure 4: Effect of the number of image patch partitions.
Figure 5: Effect of different p values on ACC (%) on the ImageNet and LSUN-B datasets under three settings: Single, Multi (Recalc.), and Multi (Avg.). As p increases, performance drops slightly (by no more than 1%); detailed results for the three setups appear in the Supplementary.
Figure 6: Visualization results for the model trained on the LSUN-B dataset.
Figure 7: Visualization results for the model trained on the ImageNet dataset.
Figure 8: Robustness under various perturbations, including Gaussian blur (top) and JPEG compression (bottom).
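Figure 8's robustness protocol, sweeping a detector over increasingly blurred inputs, can be sketched with a separable Gaussian blur and an accuracy curve. The detector below is any callable mapping an image (list of pixel rows) to a 0/1 label — a hypothetical stand-in, not the paper's DAF.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """1D Gaussian kernel, normalized to sum to 1."""
    radius = radius if radius is not None else max(1, int(3 * sigma))
    w = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(w)
    return [x / s for x in w]

def blur_rows(img, kernel):
    """Horizontal pass of a separable blur, clamping at the edges."""
    r = len(kernel) // 2
    out = []
    for row in img:
        out.append([
            sum(kernel[k + r] * row[min(max(j + k, 0), len(row) - 1)]
                for k in range(-r, r + 1))
            for j in range(len(row))
        ])
    return out

def gaussian_blur(img, sigma):
    """2D blur via two 1D passes: blur rows, transpose, blur rows, transpose."""
    k = gaussian_kernel(sigma)
    h = blur_rows(img, k)
    t = [list(col) for col in zip(*h)]
    v = blur_rows(t, k)
    return [list(col) for col in zip(*v)]

def robustness_sweep(detector, images, labels, sigmas):
    """Detector accuracy at each blur strength; a falling curve
    indicates sensitivity to the perturbation."""
    accs = []
    for s in sigmas:
        preds = [detector(gaussian_blur(im, s)) for im in images]
        accs.append(sum(p == y for p, y in zip(preds, labels)) / len(labels))
    return accs
```

A JPEG-compression sweep would follow the same shape, with the blur step replaced by an encode/decode round trip at each quality level.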
read the original abstract

Diffusion models are known for generating high-quality images, causing serious security concerns. To combat this, most efforts rely on deep neural networks (e.g., CNNs and Transformers), while largely overlooking the potential of traditional machine learning models. In this paper, we freshly investigate such alternatives and propose a novel Dynamic Assembly Forest model (DAF) to detect diffusion-generated images. Built upon the deep forest paradigm, DAF addresses the inherent limitations in feature learning and scalable training, making it an effective diffusion-generated image detector. Compared to existing DNN-based methods, DAF has significantly fewer parameters, much lower computational cost, and can be deployed without GPUs, while achieving competitive performance under standard evaluation protocols. These results highlight the strong potential of the proposed method as a practical substitute for heavyweight DNN models in resource-constrained scenarios. Our code and models are available at https://github.com/OUC-VAS/DAF.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a Dynamic Assembly Forest (DAF) model extending the deep forest paradigm to detect diffusion-generated images. It claims that DAF overcomes limitations in feature learning and scalable training of traditional ML models, delivering competitive detection performance with far fewer parameters, lower computational cost, and GPU-free deployment compared to DNN-based detectors under standard protocols.

Significance. If the empirical claims hold, the work establishes a lightweight tree-ensemble alternative for diffusion-image detection, enabling practical use in resource-constrained environments and expanding options beyond DNNs. The public release of code and models supports reproducibility and follow-up work.

major comments (3)
  1. [Method] Method section: The claim that dynamic assembly addresses 'inherent limitations in feature learning' of deep forests is load-bearing for the central thesis, yet the manuscript supplies no explicit description of the assembly rule, no equations defining how it constructs hierarchical representations from raw images, and no mechanism for capturing frequency-domain or spatial artifacts typical in diffusion traces.
  2. [Experiments] Experiments section: The assertion of 'competitive performance' and efficiency gains rests on empirical comparisons, but the text provides no ablation isolating the dynamic-assembly component versus standard deep-forest baselines (multi-grained scanning + cascade forests), no reported metrics or dataset splits, and no tables comparing parameter counts or inference time against both DNNs and unmodified deep forests.
  3. [Results] Results and Discussion: Without a direct comparison to unmodified deep-forest models or an analysis showing that dynamic assembly supplies the missing automatic hierarchical feature extraction, the weakest assumption (that forests can match DNN representational power for this task) remains untested and risks underfitting subtle generation artifacts.
minor comments (2)
  1. [Abstract] The abstract mentions 'standard evaluation protocols' without naming the datasets, metrics (e.g., AUC, accuracy), or specific diffusion generators used; this should be stated explicitly in the introduction or experimental setup.
  2. [Figures/Tables] Figure captions and tables lack clarity on what 'competitive' means numerically; adding side-by-side parameter counts and FLOPs against at least two recent DNN detectors would improve readability.
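The metrics the first minor comment asks for can be stated precisely. ROC-AUC, for instance, is the probability that a randomly chosen fake image scores above a randomly chosen real one; a minimal rank-based (Mann-Whitney) implementation, offered as an illustration rather than the paper's evaluation code:

```python
def roc_auc(scores, labels):
    """ROC-AUC via the Mann-Whitney U statistic: the probability that a
    random positive outscores a random negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike accuracy, this is threshold-free, which is why the referee's request for AUC alongside accuracy matters when detectors calibrate their score cutoffs differently.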

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable comments, which have helped us identify areas for improvement in the manuscript. We address each of the major comments below and plan to revise the paper accordingly.

read point-by-point responses
  1. Referee: [Method] Method section: The claim that dynamic assembly addresses 'inherent limitations in feature learning' of deep forests is load-bearing for the central thesis, yet the manuscript supplies no explicit description of the assembly rule, no equations defining how it constructs hierarchical representations from raw images, and no mechanism for capturing frequency-domain or spatial artifacts typical in diffusion traces.

    Authors: We agree that the Method section would benefit from a more detailed and formal presentation of the dynamic assembly mechanism. In the revised version, we will provide an explicit description of the assembly rule, include the relevant equations that formalize how hierarchical representations are constructed from raw images, and explain the mechanisms by which frequency-domain and spatial artifacts are captured. This will strengthen the central thesis and make the contributions clearer. revision: yes

  2. Referee: [Experiments] Experiments section: The assertion of 'competitive performance' and efficiency gains rests on empirical comparisons, but the text provides no ablation isolating the dynamic-assembly component versus standard deep-forest baselines (multi-grained scanning + cascade forests), no reported metrics or dataset splits, and no tables comparing parameter counts or inference time against both DNNs and unmodified deep forests.

    Authors: We acknowledge the need for more comprehensive experimental validation. We will add an ablation study that isolates the effect of the dynamic-assembly component compared to standard deep forest baselines. The revised manuscript will include details on the metrics used, the dataset splits, and new tables that report parameter counts and inference times for DAF, DNN-based methods, and unmodified deep forests. These additions will better substantiate the claims of competitive performance and efficiency gains. revision: yes

  3. Referee: [Results] Results and Discussion: Without a direct comparison to unmodified deep-forest models or an analysis showing that dynamic assembly supplies the missing automatic hierarchical feature extraction, the weakest assumption (that forests can match DNN representational power for this task) remains untested and risks underfitting subtle generation artifacts.

    Authors: To address this concern, we will include in the revised Results and Discussion section a direct comparison with unmodified deep-forest models. We will also provide an analysis, potentially with supporting figures or metrics, demonstrating that the dynamic assembly enables the automatic hierarchical feature extraction necessary to capture subtle generation artifacts. This will test the assumption regarding the representational power of forests for this task and mitigate concerns about underfitting. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical model proposal with no self-referential derivations

full rationale

The paper introduces DAF as a dynamic assembly extension to the deep forest paradigm for diffusion-image detection. No equations, uniqueness theorems, or first-principles derivations appear that reduce by construction to fitted parameters, self-definitions, or prior self-citations. Claims of addressing feature-learning limits and achieving competitive performance rest on empirical comparisons under standard protocols, which are externally falsifiable and independent of any circular reduction. Self-citations (if present) are not load-bearing for the core result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the unproven effectiveness of the newly proposed DAF architecture for this task and on the assumption that deep forests can overcome their stated feature-learning limitations when applied to diffusion artifacts.

axioms (1)
  • domain assumption Deep forest models can be dynamically assembled to address inherent limitations in feature learning and scalable training for image classification tasks.
    Invoked to justify the DAF design as an effective substitute for DNNs.
invented entities (1)
  • Dynamic Assembly Forest (DAF) no independent evidence
    purpose: Detect diffusion-generated images with low parameter count and computational cost.
    New model introduced in the paper with no prior independent evidence provided.

pith-pipeline@v0.9.0 · 5446 in / 1157 out tokens · 57667 ms · 2026-05-10T18:12:39.925646+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

50 extracted references · 5 canonical work pages · 3 internal anchors
