pith. machine review for the scientific record.

arxiv: 2605.06357 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.AI · cs.CV

Recognition: unknown

Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 12:49 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords adversarial attacks · gradient checkpointing · stochastic purification · diffusion models · Langevin sampling · robustness evaluation · memory efficiency · white-box attacks

The pith

Gradient checkpointing enables exact full-gradient attacks on stochastic purification defenses, revealing that their robustness has been overestimated.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that many evaluations of purification-based defenses have relied on approximate gradients because backpropagating through long iterative trajectories exceeds memory limits. These approximations weaken the attack and produce inflated robustness numbers. Gradient checkpointing recomputes intermediate steps to keep memory low while preserving exact gradients end-to-end. Protocols that fix the randomness across purification runs further ensure fair comparisons. If correct, this means prior claims about the security of diffusion and Langevin purification methods must be revisited with stronger attacks.

Core claim

The central claim is that gradient checkpointing makes exact end-to-end gradient computation practical through long purification trajectories in diffusion-based and Langevin-based defenses. This enables full-gradient adaptive white-box attacks that are stronger than those using approximate backpropagation, which can weaken the attack signal. When combined with methods to control stochastic variability across trajectories, the approach produces more reliable robustness measurements and uncovers vulnerabilities missed by earlier evaluations.

What carries the argument

The MEFA framework, which uses gradient checkpointing to trade recomputation for lower memory while preserving exact gradients through iterative purification trajectories, paired with stochastic control protocols.
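
A minimal sketch of that mechanism, assuming a hypothetical one-step purification function purify_step (none of these names come from the paper); only the torch.utils.checkpoint mechanics mirror what the paper describes:

    import torch
    from torch.utils.checkpoint import checkpoint

    def purified_logits(x, classifier, purify_step, noises):
        # Each step's activations are dropped after the forward pass and
        # recomputed during backward, so memory stays roughly constant in
        # the trajectory length while the gradient remains exact.
        for k, noise in enumerate(noises):
            x = checkpoint(purify_step, x, k, noise, use_reentrant=False)
        return classifier(x)

    # With x_adv.requires_grad_(True), a cross-entropy loss on
    # purified_logits(x_adv, ...) backpropagates the exact end-to-end
    # gradient at the cost of one extra forward recomputation per step.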

If this is right

  • Full-gradient attacks set stronger state-of-the-art ℓ∞ and ℓ2 white-box results against diffusion-based and EBM purification defenses.
  • Approximate gradients risk overestimating defense robustness by weakening the attack signal.
  • Controlling stochastic variability across purification trajectories is required for reproducible robustness metrics (see the sketch after this list).
  • The framework supports reliable probing of out-of-distribution robustness.
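
A minimal sketch of such stochastic control, assuming the common pattern of pre-sampling one noise trajectory per attack; sample_trajectory_noise and its reuse are illustrative, not the paper's exact protocol:

    import torch

    def sample_trajectory_noise(num_steps, shape, seed, device="cpu"):
        # One fixed realization of the purification randomness: the same
        # seed reproduces the same noise sequence on every call.
        gen = torch.Generator(device=device).manual_seed(seed)
        return [torch.randn(shape, generator=gen, device=device)
                for _ in range(num_steps)]

    # Reusing one noises list across all attack iterations gives the
    # optimizer a consistent gradient signal; evaluation can then re-sample
    # fresh trajectories to test whether the attack survives new randomness.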

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The checkpointing technique could extend to other memory-intensive iterative computations in machine learning beyond purification.
  • Many published robustness figures for stochastic generative defenses may require re-evaluation under exact-gradient conditions.
  • Standard evaluation protocols for complex defenses could incorporate similar memory-efficient exact gradient methods to reduce bias.

Load-bearing premise

That approximate backpropagation through purification trajectories meaningfully weakens the attack signal and leads to overestimation of robustness, while controlling stochastic variability is feasible and necessary for fair comparisons.

What would settle it

A controlled head-to-head experiment on the same diffusion and Langevin purification defenses: the central claim fails if full-gradient attacks achieve no higher success rate than approximate-gradient attacks (a skeleton of such a comparison follows).
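
A skeleton of that comparison, assuming hypothetical attack, defense, and classifier callables (none of these names appear in the paper):

    def success_rate(attack, grad_fn, defense, classifier, loader):
        # Same attack budget and same defense; only the gradient oracle
        # passed to the attack differs between the two conditions.
        hits = total = 0
        for x, y in loader:
            x_adv = attack(x, y, grad_fn)
            pred = classifier(defense(x_adv)).argmax(dim=1)
            hits += (pred != y).sum().item()
            total += y.numel()
        return hits / total

    # The central claim fails if success_rate with an exact checkpointed
    # gradient oracle does not exceed success_rate with an approximate one.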

Figures

Figures reproduced from arXiv: 2605.06357 by HanQin Cai, Mitchel Hill, Yuan Du.

Figure 1
Figure 1. Diagram of the adversarial attack process (top) and validation process (bottom). In the MEFA-PGD gradient checkpointing process (top right), each image state represents a forward computation step from the initial state x to the final state x̂ through the attack on the defense model transformation T(x): from x(0) to x(K) for the EBM in (7), and from xT to x0 for the diffusion model in (2) or (3).
Figure 2
Figure 2. Purified images produce random predictions, shown with occlusion attributions [52]; red and blue regions indicate areas of high and low relevance to confidence reduction, respectively. Left to right: original, adversarial state, 1st-trial natural state, 1st-trial adversarial state, 2nd-trial natural state, and 2nd-trial adversarial state.
Figure 3
Figure 3. Memory (left) and time (right) comparison between the MEFA framework's gradient checkpointing (CKPT) and standard PyTorch (PT) backpropagation, for a smooth EBM under a PGD ℓ∞ attack on one CIFAR-10 image with WideResNet-28-10.
Figure 4
Figure 4. Loss trend stabilizes by 20 attack steps. Red line: PGD loss [36]; green line: APGD loss from AutoAttack [12]. Zeros on the lines mark wrong predictions; vertical dashed lines mark the first broken state.
Figure 5
Figure 5. Analysis of the number of defense replicates required to stabilize predictions under purification randomness. Histograms of the correct logits and the highest incorrect logits, with mean and standard deviation, for an adversarial image at different Hd. Left: Hd = 1. Middle: Hd = 20. Right: Hd = 50.
Figure 6
Figure 6. Percentage of images with p-value < 0.001% for Hdef = 1, 20, 50, 100.
Original abstract

This work studies the robust evaluation of iterative stochastic purification defenses under white-box adversarial attacks. Our key technical insight is that gradient checkpointing makes exact end-to-end gradient computation through long purification trajectories practical by trading additional recomputation for substantially lower memory usage. This enables full-gradient adaptive attacks against diffusion- and Langevin-based purification defenses, where prior evaluations often resort to approximate backpropagation due to memory constraints. These approximations can weaken the attack signal and risk overestimating robustness. In parallel, stochasticity in iterative purification is frequently under-controlled, even though different purification trajectories can substantially change reported robustness metrics. Building on this insight, we introduce a memory-efficient full-gradient evaluation framework for stochastic purification defenses. The framework combines checkpointed backpropagation with evaluation protocols that control stochastic variability, thereby reducing memory bottlenecks while preserving exact gradients. We evaluate diffusion-based purification and Langevin sampling with Energy-Based Models (EBMs), demonstrating that full-gradient attacks uncover vulnerabilities missed by approximate-gradient evaluations. Our framework yields stronger state-of-the-art $\ell_{\infty}$ and $\ell_{2}$ white-box attacks and further supports probing out-of-distribution robustness. Overall, our results show that exact-gradient evaluation is essential for reliable benchmarking of iterative stochastic defenses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces the Memory Efficient Full-gradient Attacks (MEFA) framework, which applies gradient checkpointing to enable exact end-to-end gradient computation through long iterative purification trajectories in diffusion-based and Langevin/EBM defenses. It argues that memory constraints previously forced approximate backpropagation, weakening attacks and overestimating robustness, and that stochastic variability in purification must be controlled for reliable evaluation. Experiments demonstrate that full-gradient adaptive attacks uncover missed vulnerabilities and achieve stronger ℓ∞ and ℓ2 white-box performance than prior approximate methods, while also supporting out-of-distribution robustness probing.

Significance. If the central empirical claims hold, the work provides a practical tool for more accurate benchmarking of stochastic purification defenses, which are increasingly common. Enabling full gradients without prohibitive memory use addresses a real engineering bottleneck and could improve the reliability of robustness numbers reported in the literature. The emphasis on controlling stochasticity is a useful reminder, though its implementation requires care to ensure representativeness.

major comments (1)
  1. Evaluation protocol section: the paper acknowledges that different purification trajectories (via seeds) can substantially alter reported robustness metrics, yet the control mechanism appears to fix a single seed or trajectory for the attack optimization. This evaluates robustness against one realization rather than the defense's distribution over trajectories. Without averaging success rates over multiple independent trajectories or adopting a minimax formulation, the full-gradient numbers may still not generalize to the stochastic defense, weakening the claim that the framework yields reliably stronger and more trustworthy evaluations compared to prior approximate methods.
minor comments (2)
  1. Abstract: the claim of 'stronger state-of-the-art' attacks would benefit from a brief quantitative statement (e.g., improvement in robust accuracy or attack success rate) to give readers an immediate sense of effect size.
  2. Notation and figures: ensure consistent use of symbols for checkpointing overhead and trajectory length across equations and plots; some figure legends could more explicitly label full-gradient vs. approximate backprop curves.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment on the evaluation protocol below and agree that revisions are needed to better represent the stochastic defense.

Point-by-point responses
  1. Referee: Evaluation protocol section: the paper acknowledges that different purification trajectories (via seeds) can substantially alter reported robustness metrics, yet the control mechanism appears to fix a single seed or trajectory for the attack optimization. This evaluates robustness against one realization rather than the defense's distribution over trajectories. Without averaging success rates over multiple independent trajectories or adopting a minimax formulation, the full-gradient numbers may still not generalize to the stochastic defense, weakening the claim that the framework yields reliably stronger and more trustworthy evaluations compared to prior approximate methods.

    Authors: We agree that fixing a single seed evaluates robustness on one specific realization of the stochastic purification process rather than its full distribution, which limits how representative the results are. The manuscript controls stochasticity by fixing the random seed during attack optimization and evaluation to ensure a consistent, reproducible gradient signal for the checkpointed full-gradient backpropagation. However, this does not fully address the referee's concern about generalization across trajectories. In the revision, we will update the evaluation protocol section to report attack success rates and robustness metrics averaged over multiple independent purification trajectories (using 10 different seeds per experiment; a sketch of this protocol follows). We will also add a brief discussion of minimax alternatives, noting that they are computationally prohibitive for long trajectories even with MEFA, while our averaged approach offers a practical and more reliable alternative to single-trajectory evaluation. These changes will strengthen the claims about trustworthy evaluations. Revision: yes
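
A minimal sketch of that averaged protocol, assuming a hypothetical per-seed evaluation routine evaluate_defense (the name is illustrative, not from the paper):

    import statistics

    def averaged_robust_accuracy(evaluate_defense, seeds=range(10)):
        # One independent purification trajectory per seed; the mean and
        # spread characterize the defense's distribution over trajectories
        # rather than a single realization.
        accs = [evaluate_defense(seed=s) for s in seeds]
        return statistics.mean(accs), statistics.stdev(accs)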

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

Full rationale

The paper introduces a memory-efficient framework using gradient checkpointing for exact end-to-end gradients through long stochastic purification trajectories, plus protocols to control variability. This rests on standard checkpointing (trading compute for memory) and empirical comparisons to prior approximate-gradient attacks, without any self-definitional reduction, fitted inputs renamed as predictions, or load-bearing self-citations. The central claim that full gradients yield stronger attacks is demonstrated via evaluations on diffusion and EBM defenses rather than being tautological or forced by prior author work. The skeptic concern about seed-fixing affecting representativeness is a potential correctness issue but does not create circularity in the presented derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard deep-learning techniques without introducing new fitted parameters or postulated entities beyond the named framework itself.

axioms (2)
  • standard math Gradient checkpointing computes exact gradients by selective recomputation of activations (a quick check follows this list)
    Invoked as the core technical mechanism enabling memory-efficient backpropagation through long trajectories.
  • domain assumption Stochastic variability in purification trajectories can substantially affect reported robustness metrics
    Stated as motivation for controlled evaluation protocols.
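
A quick numerical check of the first axiom, as a sketch on a toy two-step chain with torch.utils.checkpoint (not tied to the paper's models):

    import torch
    from torch.utils.checkpoint import checkpoint

    step = torch.nn.Linear(8, 8)
    x = torch.randn(4, 8, requires_grad=True)

    # Standard backpropagation through two applications of the step.
    (g_plain,) = torch.autograd.grad(step(step(x)).sum(), x)

    # Checkpointed version: activations are recomputed during backward.
    h = checkpoint(step, x, use_reentrant=False)
    out = checkpoint(step, h, use_reentrant=False).sum()
    (g_ckpt,) = torch.autograd.grad(out, x)

    assert torch.allclose(g_plain, g_ckpt)  # exact, unlike surrogate gradients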

pith-pipeline@v0.9.0 · 5518 in / 1338 out tokens · 64218 ms · 2026-05-08T12:49:57.549296+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

62 extracted references · 25 canonical work pages · 8 internal anchors

  1. [1]

    Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

    Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, pages 274–283. PMLR, 2018

  2. [2]

    Synthesizing robust adversarial examples

    Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples, 2018. URL https://arxiv.org/abs/1707.07397

  3. [3]

    Threat model-agnostic adversarial defense using diffusion models

    Tsachi Blau, Roy Ganz, Bahjat Kawar, Alex Bronstein, and Michael Elad. Threat model-agnostic adversarial defense using diffusion models, 2022. URL https://arxiv.org/abs/2207.08089

  4. [4]

    Decision-based adversarial attacks: Reliable attacks against black-box machine learning models

    Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models, 2018. URL https://arxiv.org/abs/1712.04248

  5. [5]

    A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization

    HanQin Cai, Yuchen Lou, Daniel McKenzie, and Wotao Yin. A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization. In Proceedings of the 38th International Conference on Machine Learning, volume 139, pages 1193–1203. PMLR, 2021

  6. [6, 7]

    Towards evaluating the robustness of neural networks

    Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. URL https://arxiv.org/abs/1608.04644

  8. [8]

    On evaluating adversarial robustness

    Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, and Alexey Kurakin. On evaluating adversarial robustness. IEEE Symposium on Security and Privacy, 2019

  9. [9]

    Robust classification via a single diffusion model

    Huanran Chen, Yinpeng Dong, Zhengyi Wang, Xiao Yang, Chengqi Duan, Hang Su, and Jun Zhu. Robust classification via a single diffusion model. arXiv preprint arXiv:2305.15241, 2023

  10. [10]

    Beats: Audio pre-training with acoustic tokenizers

    Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, and Furu Wei. Beats: Audio pre-training with acoustic tokenizers. arXiv preprint arXiv:2212.09058, 2022

  11. [11]

    Training deep nets with sublinear memory cost

    Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016

  12. [12]

    Certified adversarial robustness via randomized smoothing

    Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320. PMLR, 2019

  13. [13]

    Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

    Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In ICML, 2020

  14. [14]

    Mind the box: l1-APGD for sparse adversarial attacks on image classifiers

    Francesco Croce and Matthias Hein. Mind the box: l1-APGD for sparse adversarial attacks on image classifiers. In ICML, 2021

  15. [15]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009

  16. [16]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

  17. [17]

    Implicit generation and modeling with energy based models

    Yilun Du and Igor Mordatch. Implicit generation and modeling with energy based models. In Advances in Neural Information Processing Systems, volume 32, 2019

  18. [18]

    Robust physical-world attacks on deep learning models

    Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning models, 2018. URL https://arxiv.org/abs/1707.08945

  19. [19]

    AST: Audio spectrogram transformer

    Yuan Gong, Yu-An Chung, and James Glass. AST: Audio spectrogram transformer. Interspeech, 2021

  20. [20]

    Generative adversarial nets

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014

  21. [21]

    Explaining and harnessing adversarial examples

    Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. Proceedings of the 33rd International Conference on Machine Learning, 2014

  22. [22]

    Your classifier is secretly an energy based model and you should treat it like one

    Will Grathwohl, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. Your classifier is secretly an energy based model and you should treat it like one. In International Conference on Learning Representations, 2020

  23. [23]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

  24. [24]

    Stochastic security: Adversarial defense using long-run dynamics of energy-based models

    Mitch Hill, Jonathan Craig Mitchell, and Song-Chun Zhu. Stochastic security: Adversarial defense using long-run dynamics of energy-based models. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=gwFTuzxJW0

  25. [25]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020

  26. [26]

    Diffattack: Evasion attacks against diffusion-based adversarial purification

    Mintong Kang, Dawn Song, and Bo Li. Diffattack: Evasion attacks against diffusion-based adversarial purification. Advances in Neural Information Processing Systems, 36, 2024

  27. [27]

    Diffbreak: Is diffusion-based purification robust?

    Andre Kassis, Urs Hengartner, and Yaoliang Yu. Diffbreak: Is diffusion-based purification robust? arXiv preprint arXiv:2411.16598, 2024

  28. [28]

    Curvature-aware derivative-free optimization

    Bumsu Kim, Daniel McKenzie, HanQin Cai, and Wotao Yin. Curvature-aware derivative-free optimization. Journal of Scientific Computing, 103(43):1–28, 2025

  29. [29]

    Stochastic differential equations

    Peter E Kloeden and Eckhard Platen. Stochastic differential equations. Springer, 1992

  30. [30]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

  31. [31]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NeurIPS, 2012

  32. [32]

    Robust evaluation of diffusion-based adversarial purification

    Minjong Lee and Dongwoo Kim. Robust evaluation of diffusion-based adversarial purification. arXiv preprint arXiv:2303.09051, 2023

  33. [33]

    ADBM: Adversarial diffusion bridge model for reliable adversarial purification

    Xiao Li, Wenxuan Sun, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, and Xiaolin Hu. ADBM: Adversarial diffusion bridge model for reliable adversarial purification. arXiv preprint arXiv:2408.00315, 2024

  34. [34]

    Scalable gradients and variational inference for stochastic differential equations

    Xuechen Li, Ting-Kam Leonard Wong, Ricky TQ Chen, and David K Duvenaud. Scalable gradients and variational inference for stochastic differential equations. In Symposium on Advances in Approximate Bayesian Inference, pages 1–28. PMLR, 2020

  35. [35]

    Towards understanding the robustness of diffusion-based purification: A stochastic perspective

    Yiming Liu, Kezhao Liu, Yao Xiao, Ziyi Dong, Xiaogang Xu, Pengxu Wei, and Liang Lin. Towards understanding the robustness of diffusion-based purification: A stochastic perspective. arXiv preprint arXiv:2404.14309, 2024

  36. [36]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021

  37. [37]

    Towards deep learning models resistant to adversarial attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018

  38. [38]

    Diffusion models for adversarial purification

    Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Anima Anandkumar. Diffusion models for adversarial purification. arXiv preprint arXiv:2205.07460, 2022

  39. [39]

    GPT-4 technical report

    OpenAI. GPT-4 technical report, 2024. URL https://arxiv.org/abs/2303.08774

  40. [40]

    Practical black-box attacks against machine learning

    Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning, 2017. URL https://arxiv.org/abs/1602.02697

  41. [41]

    Defense-GAN: Protecting classifiers against adversarial attacks using generative models

    Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605, 2018

  42. [42]

    Denoising diffusion implicit models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020

  43. [43]

    Sliced score matching: A scalable approach to density and score estimation

    Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Uncertainty in Artificial Intelligence, pages 574–584. PMLR, 2020

  44. [44]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020

  45. [45, 46]

    One pixel attack for fooling deep neural networks

    Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828–841, October 2019. ISSN 1941-0026. doi: 10.1109/tevc.2019.2890858

  47. [47]

    Intriguing properties of neural networks

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013

  48. [48]

    LLaMA: Open and efficient foundation language models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023

  49. [49]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017

  50. [50]

    A connection between score matching and denoising autoencoders

    Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011

  51. [51]

    A theory of generative convnet

    Jianwen Xie, Yang Lu, Song-Chun Zhu, and Yingnian Wu. A theory of generative convnet. In Proceedings of the 33rd International Conference on Machine Learning, pages 2635–2644, 2016

  52. [52]

    Adversarial purification with score-based generative models

    Jongmin Yoon, Sung Ju Hwang, and Juho Lee. Adversarial purification with score-based generative models. arXiv preprint arXiv:2106.06041, 2021

  53. [54]

    Wide residual networks

    Sergey Zagoruyko and Nikos Komodakis. Wide residual networks, 2016. URL http://arxiv.org/abs/1605.07146

  54. [55, 56]

    Visualizing and understanding convolutional networks

    Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks, 2013. URL https://arxiv.org/abs/1311.2901

Supplementary materials: MEFA Framework – PGD

The supplement to arXiv: 2605.06357 summarizes the full-gradient attack algorithm for MEFA Framework PGD with a fixed step size in Algorithm 1, alongside the APGD attack with an adaptive step size.

The code excerpts below are reproduced from the paper's supplementary listings, lightly reformatted; the surrounding backward loop is implied by the step indices. For the smooth EBM / Score SDE defense, the checkpointed backward pass seeds the gradient with the CE loss and propagates it through each recomputed purification step:

    # MEFA Framework code: full gradient computation, input: CE loss
    net_grads = torch.autograd.grad(
        loss,
        [X_repeat_purified]
    )[0]
    # per checkpointed step i of the backward loop:
    curr_t = curr_ts[n - i - 1].to(device)
    next_t = next_ts[n - i - 1].to(device)
    net_out = ys[n - i - 1]
    curr_y = torch.autograd.Variable(
        net_out,
        requires_grad=True
    ).to(device)
    noi = noises[n - i - 1].to(device)
    prev_t, prev_y = curr_t, curr_y
    next_layer, curr_extra = self.step(
        curr_t,
        next_t,
        curr_y,
        noi,
        curr_extra
    )
    curr_t = next_t
    net_grads = torch.autograd.grad(
        next_layer,
        [curr_y],
        grad_outputs=net_grads
    )[0]
    # after the loop, reshape to the input:
    grad = net_grads.view(
        X_repeat_purified.shape
    )

Key technical differences for the Score SDE-based defense: the left listing approximates gradients through segmented forward/backward passes via the adjoint method with deviated reconstruction; the right listing computes exact gradients from the CE loss via SDE-solver gradient checkpointing with O(1) memory.

For the DDPM-based defense, the same pattern recomputes each sampling step before backpropagating through it:

    # MEFA Framework code: full gradient computation, input: net_grads
    net_out = x_diff_list[n - k - 1]
    net_out = torch.autograd.Variable(
        net_out,
        requires_grad=True
    ).to(model.device)
    noi_out = noi_diff_list[
        n - k - 1
    ].to(model.device)
    t = torch.tensor(
        [n - k - 1] * shape[0],
        device=X.device
    )

    # DDPM sampling forward process
    next_layer = scheduler.ddim_sample_grad(
        model,
        net_out,
        t,
        noi_out,
        model_kwargs
    )

Key technical differences for the DDPM-based defense: the left listing approximates gradients at specific noise levels using a deviated-reconstruction loss; the right listing computes exact gradients from the CE loss through the full DDPM sampling chain with O(1) memory. Red lines in the original listings mark the core gradient computation steps. The loss trajectory of one image example stabilizes after about 20 attack steps (cf. Figure 4).