pith. machine review for the scientific record.

arxiv: 2605.02109 · v1 · submitted 2026-05-04 · 💻 cs.LG · cs.CR

Recognition: unknown

Detecting Adversarial Data via Provable Adversarial Noise Amplification

Authors on Pith · no claims yet

Pith reviewed 2026-05-09 17:03 UTC · model grok-4.3

classification 💻 cs.LG cs.CR
keywords adversarial detection · noise amplification · deep neural networks · robustness · inference-time defenses · spectral loss · adversarial attacks

The pith

A formal theorem guarantees that adversarial noise amplifies across layers in deep networks under specific conditions, enabling reliable detection of adversarial inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a mathematical theorem guaranteeing that adversarial perturbations grow stronger as they propagate through certain neural network layers. This growth creates a detectable signal that separates adversarial examples from clean inputs. The authors develop a training approach using a custom spectral loss function and targeted architectural choices to strengthen the signal, then build a lightweight detector that reads it at inference time with little overhead. If the theorem holds, networks could gain a provable defense mechanism against both standard and adaptive attacks.

Core claim

We present a formal adversarial noise amplification theorem and specify sufficient conditions on the network architecture, attack type, and noise properties under which the amplification is mathematically guaranteed. Using these observations we introduce a custom spectral loss and architectural design that enhance the amplification effect, along with a lightweight inference-time detector that relies on the strengthened signal.
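The excerpt never defines the spectral loss. Purely as an assumption, a regularizer of the kind this description suggests would penalize layer singular values that fall below a target larger than 1, pushing each layer toward the expansion regime an amplification theorem needs; a minimal sketch:

```python
def spectral_loss(singular_values, target=1.1):
    """Penalize layer singular values that fall below a target > 1.

    Hypothetical form: the paper's actual spectral loss is not given in
    this excerpt. This merely illustrates a regularizer that encourages
    every singular value of a layer to exceed 1, the expansion regime
    that layer-wise noise amplification requires.
    """
    return sum(max(0.0, target - s) ** 2 for s in singular_values)

# A layer whose singular values already exceed the target incurs no
# penalty; one that contracts along some direction is penalized.
assert spectral_loss([1.2, 1.3]) == 0.0
assert spectral_loss([0.8, 1.3]) > 0.0
```

In practice such a term would be added to the classification loss with a weighting coefficient; both the target and the hinge form here are illustrative choices, not the authors'.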

What carries the argument

The adversarial noise amplification theorem, which proves that perturbation magnitude increases layer by layer when the listed sufficient conditions hold.
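A toy numerical illustration of this load-bearing claim, under assumptions chosen for clarity rather than taken from the paper: when every layer's singular values exceed 1, a perturbation's norm grows strictly layer by layer.

```python
import math

def matvec(M, v):
    """Multiply a vector by a layer's weight matrix (plain Python)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Toy stand-in for the theorem's setting: linear layers whose singular
# values all exceed 1, so each layer expands the perturbation. This
# illustrates the claimed layer-by-layer growth, not the paper's actual
# sufficient conditions (which also constrain activations, the attack
# model, and the noise statistics).
layers = [[[1.5, 0.0], [0.0, 1.2]] for _ in range(4)]

delta = [0.01, -0.02]          # adversarial perturbation at the input
norms = [norm(delta)]          # d_1, ..., d_n across layers
for W in layers:
    delta = matvec(W, delta)
    norms.append(norm(delta))

growing = all(b > a for a, b in zip(norms, norms[1:]))   # strictly grows
amplification = norms[-1] / norms[0]                     # net ratio d_n / d_1
```

A counterexample in the sense described below would be a network meeting the stated conditions for which `growing` comes out `False`.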

Load-bearing premise

The network architecture, attack type, and noise properties must satisfy the theorem's sufficient conditions for amplification to be guaranteed.

What would settle it

A counterexample network and attack that meet the sufficient conditions yet show no increase in noise magnitude across layers would disprove the theorem.

Figures

Figures reproduced from arXiv: 2605.02109 by Furkan Mumcu, Yasin Yilmaz.

Figure 1
Figure 1. The adversarial noise amplification theorem (top) inspires the JPEG compression-based lightweight detector (bottom), which measures the amplification ratio with respect to the sanitized version x_san of the input x_inp. If the ratio exceeds a predetermined threshold, the input sample is labeled adversarial and the predicted label g(x_inp) is ignored.
Figure 2
Figure 2. Amplification vs. adversarial noise budget ε for different models and attacks. For each model, the net amplification ratio (d_n/d_1) is computed under strong gradient-based attacks that successfully induce misclassification, including PGD [18], BIM [17], VMI [36], and VNI [36], using the default configurations recommended in their respective works.
Figure 3
Figure 3. Trade-off between attack success rate and JAD's detection AUROC as the adaptive attack shifts its objective via λ. The attack optimizes the classifier loss both before and after sanitization, aiming to suppress the amplification ratio d_n/d_1 that JAD relies on for detection, via an objective of the form max_{x_adv} (1 − λ) L_CE(g(x_adv), y) + λ E_{q∼U(Q)}[…].
Figure 4
Figure 4. Detection AUROC and attack complexity as the number of JPEG trials T increases from 1 to 20. AUROC remains highly stable with only minor fluctuations, while attack complexity increases linearly with T.
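The detector described in Figure 1 can be sketched as follows. The features, the sanitized inputs, and the threshold here are placeholders standing in for the paper's trained network and JPEG sanitization pipeline, not the authors' implementation.

```python
import math

def layer_distances(feats_inp, feats_san):
    """d_i = ||f_i(x_inp) - f_i(x_san)|| at each layer i, for features
    represented as plain vectors."""
    return [math.dist(a, b) for a, b in zip(feats_inp, feats_san)]

def is_adversarial(feats_inp, feats_san, threshold=3.0):
    """Flag the input when the net amplification ratio d_n/d_1 is large.

    Stand-in for the paper's JAD scheme: in the real pipeline the
    sanitized version comes from JPEG compression and the features from
    the trained network's layers; both are placeholders here, and the
    threshold is an arbitrary illustrative value.
    """
    d = layer_distances(feats_inp, feats_san)
    if d[0] == 0:                      # identical at the first layer
        return False
    return d[-1] / d[0] > threshold

# Clean input: sanitization barely changes it, so distances stay flat.
clean = [[1.0, 2.0], [0.5, 1.5], [0.2, 0.9]]
clean_san = [[1.01, 2.0], [0.51, 1.5], [0.21, 0.9]]

# Adversarial input: the perturbation is amplified in deeper layers.
adv = [[1.0, 2.0], [0.8, 1.9], [1.6, 3.1]]
adv_san = [[1.01, 2.0], [0.51, 1.5], [0.21, 0.9]]

print(is_adversarial(clean, clean_san))  # flat ratio -> False
print(is_adversarial(adv, adv_san))      # amplified ratio -> True
```

When the input is flagged, the predicted label g(x_inp) is ignored, as the caption describes.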
read the original abstract

The nonuniform and growing impact of adversarial noise across the layers of deep neural networks has been used in the literature, without a formal mathematical justification, to detect adversarial inputs and improve robustness. In this work, we study this phenomenon in detail and present a formal adversarial noise amplification theorem. We specify a set of sufficient conditions under which the adversarial noise amplification is mathematically guaranteed. Based on theoretical observations, we propose a novel training methodology with a custom spectral loss function and a specific architectural design to enhance the amplification signal for detecting adversarial data. Finally, we introduce a new, lightweight detection mechanism that leverages the enhanced amplification signal and operates entirely at inference time. To validate our approach, we demonstrate the detector's efficacy against both state-of-the-art attacks and a purpose-built adaptive attack, confirming that enhanced amplification can serve as a robust and reliable signal for adversarial defense.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to derive a formal adversarial noise amplification theorem that guarantees amplification of adversarial perturbations across DNN layers under a set of sufficient conditions on architecture, attack model, and noise statistics. It then introduces a custom spectral loss and targeted architectural modifications to strengthen this amplification signal, enabling a lightweight inference-time detector. The detector is evaluated empirically on standard benchmarks against both SOTA attacks and a purpose-designed adaptive attack.

Significance. A rigorously verified theorem linking noise amplification to detection would be a notable contribution, moving beyond the heuristic layer-wise noise observations common in prior work. The inference-time design and inclusion of an adaptive attack are practical strengths. Credit is due for attempting a formal statement with external sufficient conditions rather than post-hoc fitting. However, the significance is limited by the unresolved question of whether the proposed spectral loss and architectural changes preserve the theorem's premises.

major comments (2)
  1. [Theorem 1 and §4 (training methodology)] Theorem 1 (and its proof in §3): the manuscript must demonstrate that the networks trained under the custom spectral loss and the described architectural modifications continue to satisfy every sufficient condition (e.g., spectral properties of layers, bounded noise statistics, attack model assumptions). Without an explicit verification or counter-example check, the mathematical guarantee does not transfer to the deployed detector, rendering the central claim unsupported.
  2. [§5] §5 (experiments): the reported detection rates against SOTA and adaptive attacks are not accompanied by any measurement or argument showing that the observed amplification matches the quantitative predictions of the theorem under the stated conditions. This leaves open whether performance stems from the provable mechanism or from incidental effects of the spectral loss.
minor comments (2)
  1. [Abstract and §1] The abstract and introduction should explicitly list the sufficient conditions rather than referring to them generically, so readers can immediately assess applicability.
  2. [§4] Notation for the spectral loss (Eq. (X)) should be defined before its first use and cross-referenced to the theorem's assumptions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify how the theoretical guarantees connect to the trained detector and empirical results. We address each major comment below and will revise the manuscript accordingly to strengthen these links.

read point-by-point responses
  1. Referee: [Theorem 1 and §4 (training methodology)] Theorem 1 (and its proof in §3): the manuscript must demonstrate that the networks trained under the custom spectral loss and the described architectural modifications continue to satisfy every sufficient condition (e.g., spectral properties of layers, bounded noise statistics, attack model assumptions). Without an explicit verification or counter-example check, the mathematical guarantee does not transfer to the deployed detector, rendering the central claim unsupported.

    Authors: The sufficient conditions of Theorem 1 concern the final network architecture (e.g., layer spectral norms and activation properties that control noise propagation), the attack model (standard bounded perturbations), and noise statistics (bounded variance assumptions). Our architectural modifications were explicitly selected to satisfy these spectral requirements, and the attack model is unchanged from the theorem statement. The custom spectral loss regularizes singular values to enhance amplification but is constructed to preserve boundedness of the noise statistics, as it does not introduce unbounded growth or violate the layer-wise contraction/expansion factors used in the proof. Nevertheless, we agree that an explicit post-training verification is necessary to confirm the conditions hold for the deployed models. In the revised manuscript we will add a dedicated subsection in §4 that computes the relevant spectral norms, verifies bounded noise statistics on the trained networks, and confirms the attack-model assumptions remain satisfied, thereby ensuring the theorem's guarantee applies to the detector. revision: yes

  2. Referee: [§5] §5 (experiments): the reported detection rates against SOTA and adaptive attacks are not accompanied by any measurement or argument showing that the observed amplification matches the quantitative predictions of the theorem under the stated conditions. This leaves open whether performance stems from the provable mechanism or from incidental effects of the spectral loss.

    Authors: The theorem guarantees that adversarial noise is amplified across layers (specifically, the noise norm grows by a factor strictly greater than one under the stated conditions). Our experiments already demonstrate that adversarial inputs produce substantially larger layer-wise noise signals than clean inputs, which directly enables the reported detection performance. To make the quantitative link explicit, we will augment §5 with additional measurements: for each evaluated architecture we will report the observed amplification ratios (layer-wise noise norm ratios) and compare them against the theoretical lower bound obtained by instantiating the theorem with the network dimensions, activation parameters, and noise bounds used in the experiments. This comparison will show that the measured amplification is consistent with the provable mechanism and exceeds what would be expected from incidental effects of the spectral loss alone. revision: yes
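The two verification steps promised in these responses can be sketched together: estimate each trained layer's spectral norm by power iteration (a standard method, not code from the paper), and compare the measured amplification ratio d_n/d_1 against the product of per-layer smallest singular values, the elementary linear-algebra bound ||Wx|| ≥ σ_min(W)·||x||, used here as a stand-in for the theorem's actual prediction.

```python
import math
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def spectral_norm(W, iters=200, seed=0):
    """Largest singular value of W via power iteration on W^T W
    (standard numerical method; not the authors' code)."""
    rng = random.Random(seed)
    Wt = [list(col) for col in zip(*W)]
    v = [rng.random() + 0.1 for _ in range(len(W[0]))]
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))   # v <- W^T W v
        nv = norm(v)
        v = [x / nv for x in v]
    return norm(matvec(W, v))

# Toy "trained network": diagonal layers, so singular values are the
# absolute diagonal entries. Illustrative only; the paper's theorem has
# its own conditions and its own quantitative bound.
layers = [[[2.0, 0.0], [0.0, 1.5]],
          [[1.3, 0.0], [0.0, 1.1]]]

# Check 1: per-layer spectral norms, as the rebuttal proposes to report.
top_singular = [spectral_norm(W) for W in layers]

# Check 2: measured amplification ratio d_n/d_1 versus the product of
# per-layer smallest singular values.
sigma_min = [1.5, 1.1]
delta = [0.02, 0.01]
d1 = norm(delta)
for W in layers:
    delta = matvec(W, delta)
measured = norm(delta) / d1
lower_bound = sigma_min[0] * sigma_min[1]

assert measured >= lower_bound   # amplification meets the toy bound
```

For a real network with nonlinear activations, both checks would have to be instantiated with the theorem's own per-layer expansion factors rather than this diagonal construction.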

Circularity Check

0 steps flagged

Formal theorem with independent sufficient conditions; no reduction to inputs by construction

full rationale

The paper states a new adversarial noise amplification theorem under a set of sufficient conditions on architecture, attack type, and noise properties. It then uses theoretical observations to motivate a custom spectral loss and architectural changes that aim to strengthen the amplification signal, followed by an inference-time detector. No equation or claim in the provided text shows a prediction that is definitionally equivalent to a fitted parameter from the same data, nor does the central result rest on a self-citation chain whose premises already encode the target conclusion. The theorem is presented as a mathematical guarantee with externally verifiable conditions, and empirical results are reported separately. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the central claim rests on unspecified sufficient conditions for the theorem.

axioms (1)
  • domain assumption Sufficient conditions exist under which adversarial noise amplification is mathematically guaranteed
    The theorem is stated to hold only under these conditions, which are not enumerated in the abstract.

pith-pipeline@v0.9.0 · 5439 in / 1119 out tokens · 23541 ms · 2026-05-09T17:03:53.310488+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

47 extracted references · 18 canonical work pages · 5 internal anchors

  1. [1]

    Synthesizing robust adversarial examples

    Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. In International Conference on Machine Learning, pages 284–293. PMLR, 2018.

  2. [2]

    Adversarial robustness limits via scaling-law and human-alignment studies

    Brian R. Bartoldson, James Diffenderfer, Konstantinos Parasyris, and Bhavya Kailkhura. Adversarial robustness limits via scaling-law and human-alignment studies. arXiv preprint arXiv:2404.09349, 2024.

  3. [3]

    Parseval networks: Improving robustness to adversarial examples

    Moustapha Cissé, Piotr Bojanowski, Édouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning (ICML), volume 70 of Proceedings of Machine Learning Research, pages 854–

  4. [4]

    URL https://proceedings

    PMLR, 2017. URL https://proceedings.mlr.press/v70/cisse17a.html

  5. [5]

    Detecting adversarial samples using influence functions and nearest neighbors

    Gilad Cohen, Guillermo Sapiro, and Raja Giryes. Detecting adversarial samples using influence functions and nearest neighbors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14453–14462, 2020.

  6. [7]

    Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

    Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning, pages 2206–2216. PMLR, 2020.

  7. [8]

    RobustBench: a standardized adversarial robustness benchmark

    Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, and Matthias Hein. RobustBench: a standardized adversarial robustness benchmark. arXiv preprint arXiv:2010.09670, 2020.

  8. [9]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

  9. [10]

    Generalizable adversarial training via spectral normalization

    Farzan Farnia, Jesse M. Zhang, and David Tse. Generalizable adversarial training via spectral normalization. arXiv preprint arXiv:1811.07457, 2018.

  10. [11]

    Explaining and Harnessing Adversarial Examples

    Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

  11. [12]

    Uncovering the limits of adversarial training against norm-bounded adversarial examples

    Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. Uncovering the limits of adversarial training against norm-bounded adversarial examples. arXiv preprint arXiv:2010.03593, 2020.

  12. [13]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.

  13. [14]

    A singular value perspective on model robustness

    Malhar Jere, Maghav Kumar, and Farinaz Koushanfar. A singular value perspective on model robustness. arXiv preprint arXiv:2012.03516, 2020. URL https://arxiv.org/abs/2012.03516

  14. [15]

    Improving training of deep neural networks via singular value bounding

    Kui Jia, Dacheng Tao, and Shenghua Gao. Improving training of deep neural networks via singular value bounding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017. URL https://openaccess.thecvf.com/content_cvpr_2017/html/Jia_Improving_Training_of_CVPR_2017_paper.html

  15. [16]

    Improving training of deep neural networks via singular value bounding

    Kui Jia, Dacheng Tao, Shenghua Gao, and Xiangmin Xu. Improving training of deep neural networks via singular value bounding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4344–4352, 2017.

  16. [17]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.

  17. [18]

    Adversarial examples in the physical world

    Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In Artificial Intelligence Safety and Security, pages 99–112. Chapman and Hall/CRC, 2018.

  18. [19]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

  19. [20]

    Adversarial robustness of deep neural networks: A survey from a formal verification perspective

    Mark Huasong Meng, Guangdong Bai, Sin Gee Teo, Zhe Hou, Yan Xiao, Yun Lin, and Jin Song Dong. Adversarial robustness of deep neural networks: A survey from a formal verification perspective. IEEE Transactions on Dependable and Secure Computing, 2022.

  20. [21]

    Sequential architecture-agnostic black-box attack design and analysis

    Furkan Mumcu and Yasin Yilmaz. Sequential architecture-agnostic black-box attack design and analysis. Pattern Recognition, 147:110066, 2024.

  21. [22]

    Fast and lightweight vision-language model for adversarial traffic sign detection

    Furkan Mumcu and Yasin Yilmaz. Fast and lightweight vision-language model for adversarial traffic sign detection. Electronics, 13(11):2172, 2024.

  22. [23]

    Multimodal attack detection for action recognition models

    Furkan Mumcu and Yasin Yilmaz. Multimodal attack detection for action recognition models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2967–2976, 2024.

  23. [24]

    Universal and efficient detection of adversarial data through nonuniform impact on network layers

    Furkan Mumcu and Yasin Yilmaz. Universal and efficient detection of adversarial data through nonuniform impact on network layers. arXiv preprint arXiv:2506.20816, 2025.

  24. [25]

    Robustness of agentic AI systems via adversarially-aligned Jacobian regularization

    Furkan Mumcu and Yasin Yilmaz. Robustness of agentic AI systems via adversarially-aligned Jacobian regularization. arXiv preprint arXiv:2603.04378, 2026.

  25. [26]

    Adversarial machine learning attacks against video anomaly detection systems

    Furkan Mumcu, Keval Doshi, and Yasin Yilmaz. Adversarial machine learning attacks against video anomaly detection systems. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 206–213, 2022.

  26. [27]

    Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning

    Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765, 2018.

  27. [28]

    Deflecting adversarial attacks with pixel deflection

    Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, and James Storer. Deflecting adversarial attacks with pixel deflection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8571–8580, 2018.

  28. [29]

    ImageNet Large Scale Visual Recognition Challenge

    Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252,

  29. [30]

    doi: 10.1007/s11263-015-0816-y

  30. [31]

    Denoised smoothing: A provable defense for pretrained classifiers

    Hadi Salman, Mingjie Sun, Greg Yang, Ashish Kapoor, and J. Zico Kolter. Denoised smoothing: A provable defense for pretrained classifiers. Advances in Neural Information Processing Systems, 33:21945–21957, 2020.

  31. [32]

    Towards practical control of singular values of convolutional layers

    Alexandra Senderovich, Ekaterina Bulatova, Anton Obukhov, and Maxim Rakhuba. Towards practical control of singular values of convolutional layers. In Advances in Neural Information Processing Systems (NeurIPS), 2022.

  32. [33]

    Towards practical control of singular values of convolutional layers

    Alexandra Senderovich, Ekaterina Bulatova, Anton Obukhov, and Maxim Rakhuba. Towards practical control of singular values of convolutional layers. Advances in Neural Information Processing Systems, 35:10918–10930, 2022.

  33. [34]

    Opportunities and challenges in deep learning adversarial robustness: A survey

    Samuel Henrique Silva and Peyman Najafirad. Opportunities and challenges in deep learning adversarial robustness: A survey. arXiv preprint arXiv:2007.00753, 2020.

  34. [35]

    Going deeper with convolutions

    Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.

  35. [36]

    Training data-efficient image transformers & distillation through attention

    Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning, pages 10347–10357. PMLR, 2021.

  36. [37]

    Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks

    Yusuke Tsuzuku, Issei Sato, and Masashi Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. Advances in Neural Information Processing Systems, 31, 2018.

  37. [38]

    Enhancing the transferability of adversarial attacks through variance tuning

    Xiaosen Wang and Kun He. Enhancing the transferability of adversarial attacks through variance tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1924–1933, 2021.

  38. [39]

    Better diffusion models further improve adversarial training

    Zekai Wang, Tianyu Pang, Chao Du, Min Lin, Weiwei Liu, and Shuicheng Yan. Better diffusion models further improve adversarial training. In International Conference on Machine Learning, pages 36246–36263. PMLR, 2023.

  39. [40]

    PyTorch Image Models

    Ross Wightman. PyTorch Image Models. URL https://github.com/huggingface/pytorch-image-models

  40. [41]

    LRS: Enhancing adversarial transferability through Lipschitz regularized surrogate

    Tao Wu, Tie Luo, and Donald C. Wunsch II. LRS: Enhancing adversarial transferability through Lipschitz regularized surrogate. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 6135–6143, 2024.

  41. [42]

    Dynamical isometry and a mean field theory of CNNs: How to train 10,000-layer vanilla convolutional neural networks

    Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, and Jeffrey Pennington. Dynamical isometry and a mean field theory of CNNs: How to train 10,000-layer vanilla convolutional neural networks. arXiv preprint, 2018. URL https://arxiv.org/abs/1806.05393

  42. [43]

    Feature squeezing: Detecting adversarial examples in deep neural networks

    W. Xu. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.

  43. [44]

    Diffusion-based adversarial sample generation for improved stealthiness and controllability

    Haotian Xue, Alexandre Araujo, Bin Hu, and Yongxin Chen. Diffusion-based adversarial sample generation for improved stealthiness and controllability. Advances in Neural Information Processing Systems, 36:2894–2921, 2023.

  44. [46]

    What you see is not what the network infers: Detecting adversarial examples based on semantic contradiction

    Yijun Yang, Ruiyuan Gao, Yu Li, Qiuxia Lai, and Qiang Xu. What you see is not what the network infers: Detecting adversarial examples based on semantic contradiction. arXiv preprint arXiv:2201.09650, 2022.

  45. [47]

    Norm-preservation: Why residual networks can become extremely deep?

    Alireza Zaeemzadeh, Nazanin Rahnavard, and Mubarak Shah. Norm-preservation: Why residual networks can become extremely deep? IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11):3980–3990, 2020.

  46. [48]

    Wide Residual Networks

    Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.

  47. [49]

    Detecting adversarial data by probing multiple perturbations using expected perturbation score

    Shuhai Zhang, Feng Liu, Jiahao Yang, Yifan Yang, Changsheng Li, Bo Han, and Mingkui Tan. Detecting adversarial data by probing multiple perturbations using expected perturbation score. In International Conference on Machine Learning, pages 41429–41451. PMLR, 2023.