pith. machine review for the scientific record.

arxiv: 2605.06357 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.AI · cs.CV

Recognition: unknown

Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 12:49 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords adversarial attacks · gradient checkpointing · stochastic purification · diffusion models · Langevin sampling · robustness evaluation · memory efficiency · white-box attacks

The pith

Gradient checkpointing enables exact full-gradient attacks on stochastic purification defenses, revealing that their robustness has been overestimated.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that many evaluations of purification-based defenses have relied on approximate gradients because backpropagating through long iterative trajectories exceeds memory limits. These approximations weaken the attack and produce inflated robustness numbers. Gradient checkpointing recomputes intermediate steps to keep memory low while preserving exact gradients end-to-end. Protocols that fix the randomness across purification runs further ensure fair comparisons. If correct, this means prior claims about the security of diffusion and Langevin purification methods must be revisited with stronger attacks.

Core claim

The central claim is that gradient checkpointing makes exact end-to-end gradient computation practical through long purification trajectories in diffusion-based and Langevin-based defenses. This enables full-gradient adaptive white-box attacks that are stronger than those using approximate backpropagation, which can weaken the attack signal. When combined with methods to control stochastic variability across trajectories, the approach produces more reliable robustness measurements and uncovers vulnerabilities missed by earlier evaluations.

What carries the argument

The MEFA framework, which uses gradient checkpointing to trade recomputation for lower memory while preserving exact gradients through iterative purification trajectories, paired with stochastic control protocols.
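
A minimal sketch of that mechanism, assuming a hypothetical one-step purification function purify_step (none of these names come from the paper); only the torch.utils.checkpoint mechanics mirror what the paper describes:

    import torch
    from torch.utils.checkpoint import checkpoint

    def purified_logits(x, classifier, purify_step, noises):
        # Each step's activations are dropped after the forward pass and
        # recomputed during backward, so memory stays roughly constant in
        # the trajectory length while the gradient remains exact.
        for k, noise in enumerate(noises):
            x = checkpoint(purify_step, x, k, noise, use_reentrant=False)
        return classifier(x)

    # With x_adv.requires_grad_(True), a cross-entropy loss on
    # purified_logits(x_adv, ...) backpropagates the exact end-to-end
    # gradient at the cost of one extra forward recomputation per step.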

If this is right

  • Full-gradient attacks set stronger state-of-the-art ℓ∞ and ℓ2 white-box results against diffusion-based and EBM purification defenses.
  • Approximate gradients risk overestimating defense robustness by weakening the attack signal.
  • Controlling stochastic variability across purification trajectories is required for reproducible robustness metrics (see the sketch after this list).
  • The framework supports reliable probing of out-of-distribution robustness.
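
A minimal sketch of such stochastic control, assuming the common pattern of pre-sampling one noise trajectory per attack; sample_trajectory_noise and its reuse are illustrative, not the paper's exact protocol:

    import torch

    def sample_trajectory_noise(num_steps, shape, seed, device="cpu"):
        # One fixed realization of the purification randomness: the same
        # seed reproduces the same noise sequence on every call.
        gen = torch.Generator(device=device).manual_seed(seed)
        return [torch.randn(shape, generator=gen, device=device)
                for _ in range(num_steps)]

    # Reusing one noises list across all attack iterations gives the
    # optimizer a consistent gradient signal; evaluation can then re-sample
    # fresh trajectories to test whether the attack survives new randomness.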

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The checkpointing technique could extend to other memory-intensive iterative computations in machine learning beyond purification.
  • Many published robustness figures for stochastic generative defenses may require re-evaluation under exact-gradient conditions.
  • Standard evaluation protocols for complex defenses could incorporate similar memory-efficient exact gradient methods to reduce bias.

Load-bearing premise

That approximate backpropagation through purification trajectories meaningfully weakens the attack signal and leads to overestimation of robustness, while controlling stochastic variability is feasible and necessary for fair comparisons.

What would settle it

A controlled head-to-head experiment on the same diffusion and Langevin purification defenses: the central claim fails if full-gradient attacks achieve no higher success rate than approximate-gradient attacks (a skeleton of such a comparison follows).
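
A skeleton of that comparison, assuming hypothetical attack, defense, and classifier callables (none of these names appear in the paper):

    def success_rate(attack, grad_fn, defense, classifier, loader):
        # Same attack budget and same defense; only the gradient oracle
        # passed to the attack differs between the two conditions.
        hits = total = 0
        for x, y in loader:
            x_adv = attack(x, y, grad_fn)
            pred = classifier(defense(x_adv)).argmax(dim=1)
            hits += (pred != y).sum().item()
            total += y.numel()
        return hits / total

    # The central claim fails if success_rate with an exact checkpointed
    # gradient oracle does not exceed success_rate with an approximate one.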

Figures

Figures reproduced from arXiv: 2605.06357 by HanQin Cai, Mitchel Hill, Yuan Du.

Figure 1
Figure 1. Diagram of the adversarial attack process (top) and validation process (bottom). In the MEFA-PGD gradient checkpointing process (top right), each image state represents a forward computation step from the initial state x to the final state x̂ through the attack on the defense model transformation T(x): from x(0) to x(K) for the EBM in (7), and from xT to x0 for the diffusion model in (2) or (3).
Figure 2
Figure 2. Purified images produce random predictions, shown with occlusion attributions [52]; red and blue regions indicate areas of high and low relevance to confidence reduction, respectively. Left to right: original, adversarial state, 1st-trial natural state, 1st-trial adversarial state, 2nd-trial natural state, and 2nd-trial adversarial state.
Figure 3
Figure 3. Memory (left) and time (right) comparison between the MEFA framework's gradient checkpointing (CKPT) and standard PyTorch (PT) backpropagation, for a smooth EBM under a PGD ℓ∞ attack on one CIFAR-10 image with WideResNet-28-10.
Figure 4
Figure 4. Loss trend stabilizes by 20 attack steps. Red line: PGD loss [36]; green line: APGD loss from AutoAttack [12]. Zeros on the lines mark wrong predictions; vertical dashed lines mark the first broken state.
Figure 5
Figure 5. Analysis of the number of defense replicates required to stabilize predictions under purification randomness. Histograms of the correct logits and the highest incorrect logits, with mean and standard deviation, for an adversarial image at different Hd. Left: Hd = 1. Middle: Hd = 20. Right: Hd = 50.
Figure 6
Figure 6. Percentage of images with p-value < 0.001% for Hdef = 1, 20, 50, 100.
Original abstract

This work studies the robust evaluation of iterative stochastic purification defenses under white-box adversarial attacks. Our key technical insight is that gradient checkpointing makes exact end-to-end gradient computation through long purification trajectories practical by trading additional recomputation for substantially lower memory usage. This enables full-gradient adaptive attacks against diffusion- and Langevin-based purification defenses, where prior evaluations often resort to approximate backpropagation due to memory constraints. These approximations can weaken the attack signal and risk overestimating robustness. In parallel, stochasticity in iterative purification is frequently under-controlled, even though different purification trajectories can substantially change reported robustness metrics. Building on this insight, we introduce a memory-efficient full-gradient evaluation framework for stochastic purification defenses. The framework combines checkpointed backpropagation with evaluation protocols that control stochastic variability, thereby reducing memory bottlenecks while preserving exact gradients. We evaluate diffusion-based purification and Langevin sampling with Energy-Based Models (EBMs), demonstrating that full-gradient attacks uncover vulnerabilities missed by approximate-gradient evaluations. Our framework yields stronger state-of-the-art $\ell_{\infty}$ and $\ell_{2}$ white-box attacks and further supports probing out-of-distribution robustness. Overall, our results show that exact-gradient evaluation is essential for reliable benchmarking of iterative stochastic defenses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces the Memory Efficient Full-gradient Attacks (MEFA) framework, which applies gradient checkpointing to enable exact end-to-end gradient computation through long iterative purification trajectories in diffusion-based and Langevin/EBM defenses. It argues that memory constraints previously forced approximate backpropagation, weakening attacks and overestimating robustness, and that stochastic variability in purification must be controlled for reliable evaluation. Experiments demonstrate that full-gradient adaptive attacks uncover missed vulnerabilities and achieve stronger ℓ∞ and ℓ2 white-box performance than prior approximate methods, while also supporting out-of-distribution robustness probing.

Significance. If the central empirical claims hold, the work provides a practical tool for more accurate benchmarking of stochastic purification defenses, which are increasingly common. Enabling full gradients without prohibitive memory use addresses a real engineering bottleneck and could improve the reliability of robustness numbers reported in the literature. The emphasis on controlling stochasticity is a useful reminder, though its implementation requires care to ensure representativeness.

major comments (1)
  1. Evaluation protocol section: the paper acknowledges that different purification trajectories (via seeds) can substantially alter reported robustness metrics, yet the control mechanism appears to fix a single seed or trajectory for the attack optimization. This evaluates robustness against one realization rather than the defense's distribution over trajectories. Without averaging success rates over multiple independent trajectories or adopting a minimax formulation, the full-gradient numbers may still not generalize to the stochastic defense, weakening the claim that the framework yields reliably stronger and more trustworthy evaluations compared to prior approximate methods.
minor comments (2)
  1. Abstract: the claim of 'stronger state-of-the-art' attacks would benefit from a brief quantitative statement (e.g., improvement in robust accuracy or attack success rate) to give readers an immediate sense of effect size.
  2. Notation and figures: ensure consistent use of symbols for checkpointing overhead and trajectory length across equations and plots; some figure legends could more explicitly label full-gradient vs. approximate backprop curves.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment on the evaluation protocol below and agree that revisions are needed to better represent the stochastic defense.

Point-by-point responses
  1. Referee: Evaluation protocol section: the paper acknowledges that different purification trajectories (via seeds) can substantially alter reported robustness metrics, yet the control mechanism appears to fix a single seed or trajectory for the attack optimization. This evaluates robustness against one realization rather than the defense's distribution over trajectories. Without averaging success rates over multiple independent trajectories or adopting a minimax formulation, the full-gradient numbers may still not generalize to the stochastic defense, weakening the claim that the framework yields reliably stronger and more trustworthy evaluations compared to prior approximate methods.

    Authors: We agree that fixing a single seed evaluates robustness on one specific realization of the stochastic purification process rather than its full distribution, which limits how representative the results are. The manuscript controls stochasticity by fixing the random seed during attack optimization and evaluation to ensure a consistent, reproducible gradient signal for the checkpointed full-gradient backpropagation. However, this does not fully address the referee's concern about generalization across trajectories. In the revision, we will update the evaluation protocol section to report attack success rates and robustness metrics averaged over multiple independent purification trajectories (using 10 different seeds per experiment; a sketch of this protocol follows). We will also add a brief discussion of minimax alternatives, noting that they are computationally prohibitive for long trajectories even with MEFA, while our averaged approach offers a practical and more reliable alternative to single-trajectory evaluation. These changes will strengthen the claims about trustworthy evaluations. Revision: yes
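
A minimal sketch of that averaged protocol, assuming a hypothetical per-seed evaluation routine evaluate_defense (the name is illustrative, not from the paper):

    import statistics

    def averaged_robust_accuracy(evaluate_defense, seeds=range(10)):
        # One independent purification trajectory per seed; the mean and
        # spread characterize the defense's distribution over trajectories
        # rather than a single realization.
        accs = [evaluate_defense(seed=s) for s in seeds]
        return statistics.mean(accs), statistics.stdev(accs)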

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

Full rationale

The paper introduces a memory-efficient framework using gradient checkpointing for exact end-to-end gradients through long stochastic purification trajectories, plus protocols to control variability. This rests on standard checkpointing (trading compute for memory) and empirical comparisons to prior approximate-gradient attacks, without any self-definitional reduction, fitted inputs renamed as predictions, or load-bearing self-citations. The central claim that full gradients yield stronger attacks is demonstrated via evaluations on diffusion and EBM defenses rather than being tautological or forced by prior author work. The skeptic concern about seed-fixing affecting representativeness is a potential correctness issue but does not create circularity in the presented derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard deep-learning techniques without introducing new fitted parameters or postulated entities beyond the named framework itself.

axioms (2)
  • standard math Gradient checkpointing computes exact gradients by selective recomputation of activations (a quick check follows this list)
    Invoked as the core technical mechanism enabling memory-efficient backpropagation through long trajectories.
  • domain assumption Stochastic variability in purification trajectories can substantially affect reported robustness metrics
    Stated as motivation for controlled evaluation protocols.
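
A quick numerical check of the first axiom, as a sketch on a toy two-step chain with torch.utils.checkpoint (not tied to the paper's models):

    import torch
    from torch.utils.checkpoint import checkpoint

    step = torch.nn.Linear(8, 8)
    x = torch.randn(4, 8, requires_grad=True)

    # Standard backpropagation through two applications of the step.
    (g_plain,) = torch.autograd.grad(step(step(x)).sum(), x)

    # Checkpointed version: activations are recomputed during backward.
    h = checkpoint(step, x, use_reentrant=False)
    out = checkpoint(step, h, use_reentrant=False).sum()
    (g_ckpt,) = torch.autograd.grad(out, x)

    assert torch.allclose(g_plain, g_ckpt)  # exact, unlike surrogate gradients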

pith-pipeline@v0.9.0 · 5518 in / 1338 out tokens · 64218 ms · 2026-05-08T12:49:57.549296+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

62 extracted references · 25 canonical work pages · 8 internal anchors

  1. [1]

    Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

    Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, pages 274–283. PMLR, 2018

  2. [2]

    Synthesizing robust adversarial examples

    Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples, 2018. URL https://arxiv.org/abs/1707.07397

  3. [3]

    Threat model-agnostic adversarial defense using diffusion models

    Tsachi Blau, Roy Ganz, Bahjat Kawar, Alex Bronstein, and Michael Elad. Threat model-agnostic adversarial defense using diffusion models, 2022. URL https://arxiv.org/abs/2207.08089

  4. [4]

    Decision-based adversarial attacks: Reliable attacks against black-box machine learning models

    Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models, 2018. URL https://arxiv.org/abs/1712.04248

  5. [5]

    A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization

    HanQin Cai, Yuchen Lou, Daniel McKenzie, and Wotao Yin. A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization. In Proceedings of the 38th International Conference on Machine Learning, volume 139, pages 1193–1203. PMLR, 2021

  6. [6, 7]

    Towards evaluating the robustness of neural networks

    Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. URL https://arxiv.org/abs/1608.04644

  8. [8]

    On evaluating adversarial robustness

    Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, and Alexey Kurakin. On evaluating adversarial robustness. IEEE Symposium on Security and Privacy, 2019

  9. [9]

    Robust classification via a single diffusion model

    Huanran Chen, Yinpeng Dong, Zhengyi Wang, Xiao Yang, Chengqi Duan, Hang Su, and Jun Zhu. Robust classification via a single diffusion model. arXiv preprint arXiv:2305.15241, 2023

  10. [10]

    Beats: Audio pre-training with acoustic tokenizers

    Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, and Furu Wei. Beats: Audio pre-training with acoustic tokenizers. arXiv preprint arXiv:2212.09058, 2022

  11. [11]

    Training deep nets with sublinear memory cost

    Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016

  12. [12]

    Certified adversarial robustness via randomized smoothing

    Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320. PMLR, 2019

  13. [13]

    Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

    Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In ICML, 2020

  14. [14]

    Mind the box: l1-APGD for sparse adversarial attacks on image classifiers

    Francesco Croce and Matthias Hein. Mind the box: l1-APGD for sparse adversarial attacks on image classifiers. In ICML, 2021

  15. [15]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009

  16. [16]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

  17. [17]

    Implicit generation and modeling with energy based models

    Yilun Du and Igor Mordatch. Implicit generation and modeling with energy based models. In Advances in Neural Information Processing Systems, volume 32, 2019

  18. [18]

    Robust physical-world attacks on deep learning models

    Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning models, 2018. URL https://arxiv.org/abs/1707.08945

  19. [19]

    AST: Audio spectrogram transformer

    Yuan Gong, Yu-An Chung, and James Glass. AST: Audio spectrogram transformer. Interspeech, 2021

  20. [20]

    Generative adversarial nets

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014

  21. [21]

    Explaining and harnessing adversarial examples

    Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. Proceedings of the 33rd International Conference on Machine Learning, 2014

  22. [22]

    Your classifier is secretly an energy based model and you should treat it like one

    Will Grathwohl, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. Your classifier is secretly an energy based model and you should treat it like one. In International Conference on Learning Representations, 2020

  23. [23]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

  24. [24]

    Stochastic security: Adversarial defense using long-run dynamics of energy-based models

    Mitch Hill, Jonathan Craig Mitchell, and Song-Chun Zhu. Stochastic security: Adversarial defense using long-run dynamics of energy-based models. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=gwFTuzxJW0

  25. [25]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020

  26. [26]

    Diffattack: Evasion attacks against diffusion-based adversarial purification

    Mintong Kang, Dawn Song, and Bo Li. Diffattack: Evasion attacks against diffusion-based adversarial purification. Advances in Neural Information Processing Systems, 36, 2024

  27. [27]

    Diffbreak: Is diffusion-based purification robust?

    Andre Kassis, Urs Hengartner, and Yaoliang Yu. Diffbreak: Is diffusion-based purification robust? arXiv preprint arXiv:2411.16598, 2024

  28. [28]

    Curvature-aware derivative-free optimization

    Bumsu Kim, Daniel McKenzie, HanQin Cai, and Wotao Yin. Curvature-aware derivative-free optimization. Journal of Scientific Computing, 103(43):1–28, 2025

  29. [29]

    Stochastic differential equations

    Peter E Kloeden and Eckhard Platen. Stochastic differential equations. Springer, 1992

  30. [30]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

  31. [31]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NeurIPS, 2012

  32. [32]

    Robust evaluation of diffusion-based adversarial purification

    Minjong Lee and Dongwoo Kim. Robust evaluation of diffusion-based adversarial purification. arXiv preprint arXiv:2303.09051, 2023

  33. [33]

    ADBM: Adversarial diffusion bridge model for reliable adversarial purification

    Xiao Li, Wenxuan Sun, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, and Xiaolin Hu. ADBM: Adversarial diffusion bridge model for reliable adversarial purification. arXiv preprint arXiv:2408.00315, 2024

  34. [34]

    Scalable gradients and variational inference for stochastic differential equations

    Xuechen Li, Ting-Kam Leonard Wong, Ricky TQ Chen, and David K Duvenaud. Scalable gradients and variational inference for stochastic differential equations. In Symposium on Advances in Approximate Bayesian Inference, pages 1–28. PMLR, 2020

  35. [35]

    Towards understanding the robustness of diffusion-based purification: A stochastic perspective

    Yiming Liu, Kezhao Liu, Yao Xiao, Ziyi Dong, Xiaogang Xu, Pengxu Wei, and Liang Lin. Towards understanding the robustness of diffusion-based purification: A stochastic perspective. arXiv preprint arXiv:2404.14309, 2024

  36. [36]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021

  37. [37]

    Towards deep learning models resistant to adversarial attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018

  38. [38]

    Diffusion models for adversarial purification

    Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Anima Anandkumar. Diffusion models for adversarial purification. arXiv preprint arXiv:2205.07460, 2022

  39. [39]

    GPT-4 technical report

    OpenAI. GPT-4 technical report, 2024. URL https://arxiv.org/abs/2303.08774

  40. [40]

    Practical black-box attacks against machine learning

    Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning, 2017. URL https://arxiv.org/abs/1602.02697

  41. [41]

    Defense-GAN: Protecting classifiers against adversarial attacks using generative models

    Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605, 2018

  42. [42]

    Denoising diffusion implicit models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020

  43. [43]

    Sliced score matching: A scalable approach to density and score estimation

    Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Uncertainty in Artificial Intelligence, pages 574–584. PMLR, 2020

  44. [44]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020

  45. [45, 46]

    One pixel attack for fooling deep neural networks

    Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828–841, October 2019. ISSN 1941-0026. doi: 10.1109/tevc.2019.2890858

  47. [47]

    Intriguing properties of neural networks

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013

  48. [48]

    LLaMA: Open and efficient foundation language models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023

  49. [49]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017

  50. [50]

    A connection between score matching and denoising autoencoders

    Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011

  51. [51]

    A theory of generative convnet

    Jianwen Xie, Yang Lu, Song-Chun Zhu, and Yingnian Wu. A theory of generative convnet. In Proceedings of the 33rd International Conference on Machine Learning, pages 2635–2644, 2016

  52. [52]

    Adversarial purification with score-based generative models

    Jongmin Yoon, Sung Ju Hwang, and Juho Lee. Adversarial purification with score-based generative models. arXiv preprint arXiv:2106.06041, 2021

  53. [54]

    Wide residual networks

    Sergey Zagoruyko and Nikos Komodakis. Wide residual networks, 2016. URL http://arxiv.org/abs/1605.07146

  54. [55, 56]

    Visualizing and understanding convolutional networks

    Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks, 2013. URL https://arxiv.org/abs/1311.2901

Supplementary materials: MEFA Framework – PGD

The supplement to arXiv: 2605.06357 summarizes the full-gradient attack algorithm for MEFA Framework PGD with a fixed step size in Algorithm 1, alongside the APGD attack with an adaptive step size.

The code excerpts below are reproduced from the paper's supplementary listings, lightly reformatted; the surrounding backward loop is implied by the step indices. For the smooth EBM / Score SDE defense, the checkpointed backward pass seeds the gradient with the CE loss and propagates it through each recomputed purification step:

    # MEFA Framework code: full gradient computation, input: CE loss
    net_grads = torch.autograd.grad(
        loss,
        [X_repeat_purified]
    )[0]
    # per checkpointed step i of the backward loop:
    curr_t = curr_ts[n - i - 1].to(device)
    next_t = next_ts[n - i - 1].to(device)
    net_out = ys[n - i - 1]
    curr_y = torch.autograd.Variable(
        net_out,
        requires_grad=True
    ).to(device)
    noi = noises[n - i - 1].to(device)
    prev_t, prev_y = curr_t, curr_y
    next_layer, curr_extra = self.step(
        curr_t,
        next_t,
        curr_y,
        noi,
        curr_extra
    )
    curr_t = next_t
    net_grads = torch.autograd.grad(
        next_layer,
        [curr_y],
        grad_outputs=net_grads
    )[0]
    # after the loop, reshape to the input:
    grad = net_grads.view(
        X_repeat_purified.shape
    )

Key technical differences for the Score SDE-based defense: the left listing approximates gradients through segmented forward/backward passes via the adjoint method with deviated reconstruction; the right listing computes exact gradients from the CE loss via SDE-solver gradient checkpointing with O(1) memory.

For the DDPM-based defense, the same pattern recomputes each sampling step before backpropagating through it:

    # MEFA Framework code: full gradient computation, input: net_grads
    net_out = x_diff_list[n - k - 1]
    net_out = torch.autograd.Variable(
        net_out,
        requires_grad=True
    ).to(model.device)
    noi_out = noi_diff_list[
        n - k - 1
    ].to(model.device)
    t = torch.tensor(
        [n - k - 1] * shape[0],
        device=X.device
    )

    # DDPM sampling forward process
    next_layer = scheduler.ddim_sample_grad(
        model,
        net_out,
        t,
        noi_out,
        model_kwargs
    )

Key technical differences for the DDPM-based defense: the left listing approximates gradients at specific noise levels using a deviated-reconstruction loss; the right listing computes exact gradients from the CE loss through the full DDPM sampling chain with O(1) memory. Red lines in the original listings mark the core gradient computation steps. The loss trajectory of one image example stabilizes after about 20 attack steps (cf. Figure 4).