How Noise Benefits AI-generated Image Detection
Pith reviewed 2026-05-17 20:50 UTC · model grok-4.3
The pith
Constructing positive-incentive noise in feature space helps CLIP suppress shortcuts and detect AI-generated images more reliably across many generators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce Positive-Incentive Noise for CLIP (PiN-CLIP) that jointly optimizes a noise generator and a detection network under a variational positive-incentive principle. Noise is formed in feature space by cross-attention fusion of visual and categorical semantic features. When this noise is injected during fine-tuning of the visual encoder, shortcut-sensitive directions are suppressed while stable forensic cues are amplified, producing more robust and generalized artifact representations.
What carries the argument
Positive-incentive noise built via cross-attention fusion of visual and categorical semantic features and injected into the feature space to fine-tune the visual encoder.
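The construction named above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`cross_attention_noise`, `inject`), the single-head dot-product attention, the fixed `sigma` and `scale` values, and the toy vectors are hypothetical and are not taken from the paper, where the noise generator is learned jointly with the detector.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_noise(visual, categories, sigma=0.1, rng=None):
    """Hypothetical generator: the visual feature attends over category embeddings.

    The attention-weighted category vector acts as the noise mean, and a
    Gaussian term is added on top (sigma is an assumed, fixed value).
    """
    rng = rng or random.Random(0)
    d = len(visual)
    # Scaled dot-product scores between the visual query and each category key.
    scores = [sum(v * c for v, c in zip(visual, cat)) / math.sqrt(d)
              for cat in categories]
    weights = softmax(scores)
    # Fuse: attention-weighted sum of the category embeddings.
    mean = [sum(w * cat[i] for w, cat in zip(weights, categories))
            for i in range(d)]
    noise = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return noise, weights

def inject(visual, noise, scale=0.05):
    # Additive injection in feature space (scale is an assumed hyperparameter).
    return [v + scale * n for v, n in zip(visual, noise)]

visual = [0.2, -0.1, 0.4, 0.3]                  # toy visual feature
categories = [[1.0, 0.0, 0.0, 0.0],             # toy "real" embedding
              [0.0, 0.0, 1.0, 0.0]]             # toy "fake" embedding
noise, weights = cross_attention_noise(visual, categories)
perturbed = inject(visual, noise)
```

In this toy setup the noise depends on both the image feature and the category embeddings, which is the property that distinguishes it from a generic random perturbation.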
If this is right
- The method reaches new state-of-the-art accuracy on an open-world dataset of images from 42 distinct generative models.
- It delivers an average accuracy gain of 5.4 points compared with previous approaches.
- The detector extracts more robust forensic features that generalize beyond the training distribution.
- Reliance on spurious shortcuts learned during training is reduced without generator-specific tuning.
Where Pith is reading between the lines
- The same noise-injection idea might help other vision models that suffer from shortcut learning in tasks like object recognition or medical imaging.
- Extending the approach to newer generators released after the 42-model dataset could test whether the gained robustness scales with rapid model progress.
- Replacing the CLIP backbone with other vision-language encoders would show whether the benefit depends on the particular architecture or on the noise principle itself.
Load-bearing premise
The specific noise will suppress only the unwanted shortcut directions without creating new failure modes or needing adjustments for each generator.
What would settle it
A drop or no gain in accuracy when the trained model is tested on synthetic images produced by generative models held out from the noise-construction and fine-tuning stages.
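A held-out-generator check of this kind could be scored as in the sketch below. The record format, the generator names, and the helper names (`per_generator_accuracy`, `held_out_report`) are hypothetical; the paper's own evaluation protocol is not described in this summary.

```python
from collections import defaultdict

def per_generator_accuracy(records):
    """records: iterable of (generator_name, true_label, predicted_label)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for gen, y_true, y_pred in records:
        total[gen] += 1
        correct[gen] += int(y_true == y_pred)
    return {gen: correct[gen] / total[gen] for gen in total}

def held_out_report(records, training_generators):
    # Keep only generators that were never seen during noise construction
    # or fine-tuning, then average their accuracies.
    acc = per_generator_accuracy(records)
    held_out = {g: a for g, a in acc.items() if g not in training_generators}
    if not held_out:
        raise ValueError("no held-out generators in the evaluation records")
    mean_acc = sum(held_out.values()) / len(held_out)
    return held_out, mean_acc

# Toy predictions (made-up data, for illustration only).
records = [
    ("gen_seen", 1, 1), ("gen_seen", 0, 0),
    ("gen_unseen_a", 1, 1), ("gen_unseen_a", 1, 0),
    ("gen_unseen_b", 1, 1), ("gen_unseen_b", 0, 0),
]
held_out, mean_acc = held_out_report(records, training_generators={"gen_seen"})
```

Comparing `mean_acc` for the noise-trained detector against the same quantity for a baseline trained without the noise is what would show a gain, a drop, or no change.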
Original abstract
The rapid advancement of generative models has made real and synthetic images increasingly indistinguishable. Although extensive efforts have been devoted to detecting AI-generated images, out-of-distribution generalization remains a persistent challenge. We trace this weakness to spurious shortcuts exploited during training and we also observe that small feature-space perturbations can mitigate shortcut dominance. To address this problem in a more controllable manner, we propose the Positive-Incentive Noise for CLIP (PiN-CLIP), which jointly trains a noise generator and a detection network under a variational positive-incentive principle. Specifically, we construct positive-incentive noise in the feature space via cross-attention fusion of visual and categorical semantic features. During optimization, the noise is injected into the feature space to fine-tune the visual encoder, suppressing shortcut-sensitive directions while amplifying stable forensic cues, thereby enabling the extraction of more robust and generalized artifact representations. Comparative experiments are conducted on an open-world dataset comprising synthetic images generated by 42 distinct generative models. Our method achieves new state-of-the-art performance, with notable improvements of 5.4 in average accuracy over existing approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes PiN-CLIP, which jointly optimizes a noise generator and detection network under a variational positive-incentive principle. Positive-incentive noise is constructed in CLIP feature space via cross-attention fusion of visual and categorical semantic features; this noise is injected to suppress shortcut-sensitive directions while amplifying stable forensic cues. Comparative experiments on an open-world dataset of synthetic images from 42 generative models report new state-of-the-art performance with a 5.4 average accuracy gain over prior methods.
Significance. If the reported gains are reproducible and the mechanism is isolated, the work would advance out-of-distribution generalization in AI-generated image detection, an area of high practical importance for forensic and misinformation applications. The variational noise-injection framework offers a controllable alternative to ad-hoc augmentation and could influence shortcut-mitigation techniques in other vision tasks.
major comments (2)
- [Abstract] The central SOTA claim rests on a 5.4 average accuracy improvement over existing approaches, yet the abstract (and by extension the experimental section) provides no error bars, exact baseline implementations, data exclusion rules, or ablation details on the noise generator and cross-attention weights; this leaves the performance claim unverifiable from the reported summary.
- [Method / Experiments] The positive-incentive noise is asserted to specifically suppress shortcut-sensitive directions in the CLIP encoder without introducing new failure modes or requiring per-generator tuning, but no directional analyses, saliency maps, or controlled ablations against non-semantic noise are described to rule out gains from generic perturbation or joint optimization alone.
minor comments (1)
- [Abstract] The phrase 'variational positive-incentive principle' is used without a concise mathematical statement or pointer to its definition in the main text.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of verifiability and mechanistic validation that we address below. We have prepared revisions to strengthen the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] The central SOTA claim rests on a 5.4 average accuracy improvement over existing approaches, yet the abstract (and by extension the experimental section) provides no error bars, exact baseline implementations, data exclusion rules, or ablation details on the noise generator and cross-attention weights; this leaves the performance claim unverifiable from the reported summary.
  Authors: We agree that greater transparency in the abstract would improve verifiability. In the revised version we will expand the abstract to note that all reported accuracies are means over five independent runs with standard deviations provided in the experimental tables, that baselines follow the original authors' public implementations with our re-implementations using the same training protocol, and that the open-world dataset excludes only images with obvious artifacts as described in Section 4.1. We will also add a concise reference to the ablation results on the noise generator and cross-attention fusion weights. These additions preserve the 5.4-point gain while making the claim directly verifiable from the summary.
  Revision: yes
- Referee: [Method / Experiments] The positive-incentive noise is asserted to specifically suppress shortcut-sensitive directions in the CLIP encoder without introducing new failure modes or requiring per-generator tuning, but no directional analyses, saliency maps, or controlled ablations against non-semantic noise are described to rule out gains from generic perturbation or joint optimization alone.
  Authors: We recognize that explicit isolation of the mechanism would strengthen the contribution. While the current experiments demonstrate consistent gains across 42 generators without per-model tuning, we will incorporate the requested analyses in the revision: directional comparisons of the top principal components and gradient directions in CLIP feature space before versus after noise injection; saliency visualizations that highlight reduced activation on known shortcut regions; and a controlled ablation replacing semantic positive-incentive noise with isotropic Gaussian noise of matched magnitude under otherwise identical joint optimization. These additions will directly address whether the observed improvements arise from targeted suppression rather than generic regularization.
  Revision: yes
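The matched-magnitude control mentioned in the rebuttal could be built along the following lines. This is a sketch under one reading of "matched magnitude", namely that each Gaussian sample is rescaled to the L2 norm of the semantic noise vector it replaces; the rebuttal does not say how the matching would be done, and the helper name `matched_gaussian_noise` and the toy vector are hypothetical.

```python
import math
import random

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def matched_gaussian_noise(reference_noise, rng=None):
    """Draw an isotropic Gaussian vector and rescale it to the reference norm.

    Assumption: magnitude is matched per sample via the L2 norm, so the
    control differs from the reference only in direction.
    """
    rng = rng or random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in reference_noise]
    sample_norm = l2_norm(sample)
    if sample_norm == 0.0:
        # Degenerate draw; a zero vector cannot be rescaled.
        return [0.0] * len(sample)
    target = l2_norm(reference_noise)
    return [x * target / sample_norm for x in sample]

semantic_noise = [0.3, -0.4, 0.0, 1.2]   # toy stand-in for generated noise
control_noise = matched_gaussian_noise(semantic_noise)
```

Because the two vectors share a norm, any accuracy difference between training with `semantic_noise` and training with `control_noise` would be attributable to direction rather than perturbation strength.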
Circularity Check
No circularity: empirical SOTA claim rests on external open-world test set
The paper defines PiN-CLIP as a joint training procedure under a variational positive-incentive principle that constructs and injects cross-attention-fused noise to fine-tune the CLIP encoder. The load-bearing result is an accuracy improvement of 5.4 on a held-out open-world dataset spanning 42 distinct generators. This performance metric is measured on external data and does not reduce to any fitted parameter or self-defined quantity by construction. No equations equate the claimed suppression of shortcut directions to the training objective itself, and no self-citation chain is invoked to justify uniqueness or the core ansatz. The derivation therefore remains self-contained and externally falsifiable.
Axiom & Free-Parameter Ledger
free parameters (2)
- noise generator parameters
- cross-attention fusion weights
axioms (1)
- domain assumption: the variational positive-incentive principle can be applied to generate helpful rather than adversarial noise in feature space
invented entities (1)
- Positive-incentive noise (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "we propose the Positive-Incentive Noise for CLIP (PiN-CLIP), which jointly trains a noise generator and a detection network under a variational positive-incentive principle... suppressing shortcut-sensitive directions while amplifying stable forensic cues"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.