arxiv: 2604.27487 · v1 · submitted 2026-04-30 · 💻 cs.LG · cs.CR

Recognition: unknown

Low Rank Adaptation for Adversarial Perturbation

Chongjie Zhang, Han Liu, Ning Zhang, Shanghao Shi, Yevgeniy Vorobeychik

Pith reviewed 2026-05-07 09:57 UTC · model grok-4.3

classification 💻 cs.LG cs.CR

keywords adversarial perturbationslow-rank adaptationblack-box attacksLoRAadversarial robustnessgradient projectionquery efficiency

0 comments

The pith

Adversarial perturbations exhibit an inherently low-rank structure that can be exploited to improve black-box attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether adversarial perturbations generated by optimization processes share the low-dimensional structure that LoRA exploits for efficient model updates. Theoretical analysis and experiments across attack methods, architectures, and datasets confirm that these perturbations do possess such a low-rank structure. This enables a two-step process: a reference model and auxiliary data project gradients into a low-dimensional subspace, after which the black-box attack search is restricted to that subspace. The approach reduces query demands while raising success rates compared to standard black-box methods. Readers care because black-box attacks are often limited by query budgets, and confirming a shared structural property could reshape both offensive and defensive strategies.

Core claim

Adversarial perturbations possess an inherently low-rank structure. Using a reference model and auxiliary data to project gradients into a low-dimensional subspace and then confining the perturbation search to this subspace yields substantial gains in efficiency and effectiveness for black-box attacks over conventional methods.

What carries the argument

Reference-model-guided projection of gradients into a low-rank subspace that confines the search for adversarial perturbations.

If this is right

Black-box attacks achieve higher success rates with significantly fewer queries.
The low-rank property generalizes across the tested attack methods, models, and datasets.
Both adversarial attacks and defenses can be redesigned around the low-dimensional subspace.
The method integrates with existing black-box techniques without requiring white-box access to the target.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

A universal low-rank basis derived once from many reference models might transfer across entirely new targets without per-instance projection.
Defenses could target the low-rank directions directly, for example by regularizing model weights to increase the effective rank of any perturbation.
The analogy to LoRA suggests adversarial training itself might be made more efficient by operating only in low-rank perturbation spaces.

Load-bearing premise

The low-rank property of adversarial perturbations holds consistently enough across attack methods, model architectures, and datasets for a reference-model projection to reliably improve performance.

What would settle it

A test on a standard attack such as PGD using a new dataset and architecture where the low-rank projected version shows no improvement in success rate per query over an unprojected black-box baseline would falsify the claim.

Figures

Figures reproduced from arXiv: 2604.27487 by Chongjie Zhang, Han Liu, Ning Zhang, Shanghao Shi, Yevgeniy Vorobeychik.

**Figure 1.** Figure 1: The relative magnitude change of singular values across differ view at source ↗

**Figure 2.** Figure 2: Overview of the low-rank adversarial attack. view at source ↗

**Figure 3.** Figure 3: The accuracy vs. query curves (ǫ = 2 for untargeted attack, ǫ = 10 for targeted attack) across four datasets and explanators. The first four rows display curves for pre-trained models, while the last row represents non-pre-trained models. The first row shows curves for HSJA, the second row shows Sign-Opt, the third row shows RamBoAttack, and the fourth row highlights the HSJA targeted attack, and the final… view at source ↗

**Figure 5.** Figure 5: Comparison of attack performance under different view at source ↗

**Figure 4.** Figure 4: Comparison of adversarial attack performance view at source ↗

**Figure 7.** Figure 7: Adversarial attack performance under l∞ norm. The accuracy vs. queries curve is obtained using ǫ = 4/255. Evaluation of l∞ norm: We additionally conduct evaluations under the l∞ norm, applying our subspace optimization to HSJA attacks and performing experiments on the CUB200 datasets. The outcomes, presented in view at source ↗

**Figure 8.** Figure 8: The accuracy vs. perturbation budget curves of the view at source ↗

**Figure 6.** Figure 6: Comparison of adversarial attack performance view at source ↗

**Figure 9.** Figure 9: The accuracy vs. queries curves of applying our view at source ↗

**Figure 10.** Figure 10: Visualized trajectories of our proposed attacks on randomly sel view at source ↗

**Figure 11.** Figure 11: The accuracy vs. perturbation budget curves across different dat view at source ↗

read the original abstract

Low-Rank Adaptation (LoRA), which leverages the insight that model updates typically reside in a low-dimensional space, has significantly improved the training efficiency of Large Language Models (LLMs) by updating neural network layers using low-rank matrices. Since the generation of adversarial examples is an optimization process analogous to model training, this naturally raises the question: Do adversarial perturbations exhibit a similar low-rank structure? In this paper, we provide both theoretical analysis and extensive empirical investigation across various attack methods, model architectures, and datasets to show that adversarial perturbations indeed possess an inherently low-rank structure. This insight opens up new opportunities for improving both adversarial attacks and defenses. We mainly focus on leveraging this low-rank property to improve the efficiency and effectiveness of black-box adversarial attacks, which often suffer from excessive query requirements. Our method follows a two-step approach. First, we use a reference model and auxiliary data to guide the projection of gradients into a low-dimensional subspace. Next, we confine the perturbation search in black-box attacks to this low-rank subspace, significantly improving the efficiency and effectiveness of the adversarial attacks. We evaluated our approach across a range of attack methods, benchmark models, datasets, and threat models. The results demonstrate substantial and consistent improvements in the performance of our low-rank adversarial attacks compared to conventional methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that adversarial perturbations exhibit an inherently low-rank structure, supported by theoretical analysis and extensive empirical checks across attack methods, architectures, and datasets. It proposes a two-step method that first uses a reference model plus auxiliary data to project gradients into a low-dimensional subspace and then restricts black-box attack search to that subspace, yielding improved attack efficiency and success rates.

Significance. If the low-rank property holds independently of the projection construction, the work could meaningfully improve query efficiency for black-box attacks while opening avenues for defenses that exploit the same structure. The broad empirical coverage across settings is a strength, but the result's independence from the reference-guided construction is central to its claimed novelty.

major comments (3)

[§3] §3 (theoretical analysis): the derivation that perturbations are 'inherently' low-rank must be shown to hold for unprojected gradients or perturbations; the subsequent reference-model projection in §4.1 risks defining the subspace rather than discovering an intrinsic property. A direct rank comparison (e.g., singular-value decay) on raw perturbations before projection is required to support the central claim.
[§5] §5 (empirical evaluation): the reported improvements in attack success rate and query count rely on the projected subspace; without an ablation that measures effective rank of unprojected perturbations across the same attack methods and datasets, it remains unclear whether the low-rank observation is consistent or induced by the reference model choice.
[§4.2] §4.2 (projection method): the auxiliary data and reference model selection criteria are not fully specified; if the low-rank subspace is sensitive to these choices, the method may not generalize when the reference model differs substantially from the target, weakening the practical claim.

minor comments (2)

[§4] Notation for the low-rank matrices (e.g., A and B in the LoRA-style update) should be introduced earlier and kept consistent with the gradient projection equations.
[Figure 3] Figure captions for rank-decay plots should explicitly state whether the plotted values are before or after the reference-model projection.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below, clarifying the separation between our theoretical claim of an inherent low-rank structure and the practical projection method used to exploit it. We agree that additional direct analyses on unprojected perturbations will strengthen the manuscript and will incorporate them.

read point-by-point responses

Referee: [§3] §3 (theoretical analysis): the derivation that perturbations are 'inherently' low-rank must be shown to hold for unprojected gradients or perturbations; the subsequent reference-model projection in §4.1 risks defining the subspace rather than discovering an intrinsic property. A direct rank comparison (e.g., singular-value decay) on raw perturbations before projection is required to support the central claim.

Authors: Our theoretical analysis in §3 derives the low-rank property directly from the optimization dynamics of adversarial perturbation generation, which parallels the low-rank update structure in LoRA without any reference to projection or a reference model. The projection in §4.1 is introduced later as a practical exploitation of this property for black-box attacks. To strengthen the presentation of the intrinsic claim, we will add explicit singular-value decay plots and effective-rank measurements on raw, unprojected perturbations across the attack methods and datasets already evaluated. revision: yes
Referee: [§5] §5 (empirical evaluation): the reported improvements in attack success rate and query count rely on the projected subspace; without an ablation that measures effective rank of unprojected perturbations across the same attack methods and datasets, it remains unclear whether the low-rank observation is consistent or induced by the reference model choice.

Authors: The empirical results in §5 already span multiple attack methods, architectures, and datasets and show consistent performance gains from the low-rank subspace. We acknowledge that a dedicated ablation isolating the rank of unprojected perturbations would make the independence clearer. We will add this ablation (singular-value spectra and rank statistics on raw perturbations) to the revised manuscript to confirm the observation holds prior to projection. revision: yes
Referee: [§4.2] §4.2 (projection method): the auxiliary data and reference model selection criteria are not fully specified; if the low-rank subspace is sensitive to these choices, the method may not generalize when the reference model differs substantially from the target, weakening the practical claim.

Authors: Section 4.2 describes the reference model as a surrogate of comparable architecture trained on auxiliary data drawn from the same distribution as the target. We will expand this description with explicit selection criteria (architecture similarity, data-domain match, and capacity considerations). We will also add experiments using deliberately mismatched reference models to quantify robustness and show that the low-rank subspace remains effective even when the reference differs from the target. revision: yes

Circularity Check

0 steps flagged

No significant circularity; low-rank claim is empirical observation

full rationale

The paper's derivation begins with an analogy to LoRA, then asserts via separate theoretical analysis plus empirical checks across attack methods, architectures, and datasets that perturbations possess inherently low-rank structure. The subsequent reference-model projection step is presented as a downstream application that exploits this observed property to improve black-box search efficiency, not as the source that defines or forces the low-rank finding itself. No equations, fitted parameters, or self-citations reduce the central claim to a tautology or to the same data used for the attack improvement. The structure is therefore self-contained against external benchmarks and does not match any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.0 · 5537 in / 933 out tokens · 56033 ms · 2026-05-07T09:57:12.269178+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

96 extracted references · 23 canonical work pages · 7 internal anchors

[1]

Adve rsarial machine learning: A taxonomy and terminology of attacks and mit- igations,

A. V assilev, A. Oprea, A. Fordyce, and H. Anderson, “Adve rsarial machine learning: A taxonomy and terminology of attacks and mit- igations,” tech. rep., National Institute of Standards and Technology, 2024

2024
[2]

Stealing part of a production language model,

N. Carlini, D. Paleka, K. D. Dvijotham, T. Steinke, J. Hay ase, A. F. Cooper, K. Lee, M. Jagielski, M. Nasr, A. Conmy, et al. , “Stealing part of a production language model,” in International Conference on Machine Learning , pp. 5680–5705, PMLR, 2024

2024
[3]

Formalizin g and benchmarking prompt injection attacks and defenses,

Y . Liu, Y . Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizin g and benchmarking prompt injection attacks and defenses,” i n 33rd USENIX Security Symposium (USENIX Security 24) , pp. 1831–1847, 2024

2024
[4]

Instruction backdoor attacks against custom ized {LLMs},

R. Zhang, H. Li, R. Wen, W. Jiang, Y . Zhang, M. Backes, Y . Sh en, and Y . Zhang, “Instruction backdoor attacks against custom ized {LLMs},” in 33rd USENIX Security Symposium (USENIX Security 24), pp. 1849–1866, 2024

2024
[5]

Parameter- efﬁcient model adaptation for vision transformers,

X. He, C. Li, P . Zhang, J. Y ang, and X. E. Wang, “Parameter- efﬁcient model adaptation for vision transformers,” in Proceedings of the AAAI Conference on Artiﬁcial Intelligence , vol. 37, pp. 817–825, 2023

2023
[6]

Mtlora: Low-rank adapt ation approach for efﬁcient multi-task learning,

A. Agiza, M. Neseem, and S. Reda, “Mtlora: Low-rank adapt ation approach for efﬁcient multi-task learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni tion, pp. 16196–16205, 2024

2024
[7]

Heterogeneous lora for federated ﬁne-tuning of on-device foundation models,

Y . J. Cho, L. Liu, Z. Xu, A. Fahrezi, M. Barnes, and G. Joshi , “Heterogeneous lora for federated ﬁne-tuning of on-device foundation models,” in International Workshop on Federated Learning in the Age of F oundation Models in Conjunction with NeurIPS 2023 , 2023

2023
[8]

Intriguing properties of neural networks

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfel- low, and R. Fergus, “Intriguing properties of neural networ ks,” arXiv preprint arXiv:1312.6199, 2013

work page internal anchor Pith review arXiv 2013
[9]

Explaining and Harnessing Adversarial Examples

I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harness- ing adversarial examples,” arXiv preprint arXiv:1412.6572 , 2014

work page internal anchor Pith review arXiv 2014
[10]

Towards Deep Learning Models Resistant to Adversarial Attacks

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vlad u, “Towards deep learning models resistant to adversarial att acks,” arXiv preprint arXiv:1706.06083, 2017

work page internal anchor Pith review arXiv 2017
[11]

Towards evaluating the robus tness of neural networks,

N. Carlini and D. Wagner, “Towards evaluating the robus tness of neural networks,” in 2017 ieee symposium on security and privacy (sp), pp. 39–57, Ieee, 2017

2017
[12]

Sign-opt: A query-efﬁcient hard-label adversaria l attack,

M. Cheng, S. Singh, P . H. Chen, P .-Y . Chen, S. Liu, and C.- J. Hsieh, “Sign-opt: A query-efﬁcient hard-label adversaria l attack,” in International Conference on Learning Representations , 2019

2019
[13]

Ramboattack: A robust query efﬁcient deep neural network decision exploit,

D. C. R. Viet Quoc V o, Ehsan Abbasnejad, “Ramboattack: A robust query efﬁcient deep neural network decision exploit,” in Network and Distributed Systems Security (NDSS) Symposium , 2022

2022
[14]

Hopskipjum pattack: A query-efﬁcient decision-based attack,

J. Chen, M. I. Jordan, and M. J. Wainwright, “Hopskipjum pattack: A query-efﬁcient decision-based attack,” in 2020 ieee symposium on security and privacy (sp) , pp. 1277–1294, IEEE, 2020

2020
[15]

Hybrid batch atta cks: Finding black-box adversarial examples with limited queri es,

F. Suya, J. Chi, D. Evans, and Y . Tian, “Hybrid batch atta cks: Finding black-box adversarial examples with limited queri es,” in 29th USENIX Security Symposium (USENIX Security 20) , 2020

2020
[16]

Adversari al machine learning at scale,

A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversari al machine learning at scale,” in International Conference on Learning Repre- sentations, 2022

2022
[17]

Adversarial train ing for free!,

A. Shafahi, M. Najibi, M. A. Ghiasi, Z. Xu, J. Dickerson, C. Studer, L. S. Davis, G. Taylor, and T. Goldstein, “Adversarial train ing for free!,” Advances in neural information processing systems , vol. 32, 2019

2019
[18]

Adversarial robustness via robust low rank representations,

P . Awasthi, H. Jain, A. S. Rawat, and A. Vijayaraghavan, “Adversarial robustness via robust low rank representations,” Advances in Neural Information Processing Systems , vol. 33, pp. 11391–11403, 2020

2020
[19]

Cross -entropy loss and low-rank features have responsibility for adversa rial exam- ples,

K. Nar, O. Ocal, S. S. Sastry, and K. Ramchandran, “Cross -entropy loss and low-rank features have responsibility for adversa rial exam- ples,” arXiv preprint arXiv:1901.08360 , 2019

work page arXiv 1901
[20]

Low-rank and sparse decomposition for low-query decision -based adversarial attacks,

A. Esmaeili, M. Edraki, N. Rahnavard, A. Mian, and M. Sha h, “Low-rank and sparse decomposition for low-query decision -based adversarial attacks,” IEEE Transactions on Information F orensics and Security, vol. 19, pp. 1561–1575, 2023

2023
[21]

Low-ra nk adversar- ial pgd attack,

D. Savostianova, E. Zangrando, and F. Tudisco, “Low-ra nk adversar- ial pgd attack,” arXiv preprint arXiv:2410.12607 , 2024

work page arXiv 2024
[22]

Lora: Low-rank adaptation of large language models,

E. J. Hu, P . Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, W. Chen, et al. , “Lora: Low-rank adaptation of large language models,” in International Conference on Learning Representations , 2022

2022
[23]

Delving deep into rec tiﬁers: Surpassing human-level performance on imagenet classiﬁca tion,

K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rec tiﬁers: Surpassing human-level performance on imagenet classiﬁca tion,” in Proceedings of the IEEE international conference on comput er vision, pp. 1026–1034, 2015

2015
[24]

Adam: A Method for Stochastic Optimization

D. P . Kingma, “Adam: A method for stochastic optimizati on,” arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review arXiv 2014
[25]

Qlora: Efﬁcient ﬁnetuning of quantized llms,

T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoye r, “Qlora: Efﬁcient ﬁnetuning of quantized llms,” Advances in Neural Informa- tion Processing Systems , vol. 36, 2024

2024
[26]

Adversari al examples in the physical world,

A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversari al examples in the physical world,” in Artiﬁcial intelligence safety and security , pp. 99–112, Chapman and Hall/CRC, 2018

2018
[27]

Zoo: Zeroth order optimization based black-box attacks to deep neural n etworks without training substitute models,

P .-Y . Chen, H. Zhang, Y . Sharma, J. Yi, and C.-J. Hsieh, “ Zoo: Zeroth order optimization based black-box attacks to deep neural n etworks without training substitute models,” in Proceedings of the 10th ACM workshop on artiﬁcial intelligence and security , 2017

2017
[28]

Advers arial risk and the dangers of evaluating against weak attacks,

J. Uesato, B. O’donoghue, P . Kohli, and A. Oord, “Advers arial risk and the dangers of evaluating against weak attacks,” in International Conference on Machine Learning , pp. 5025–5034, PMLR, 2018

2018
[29]

Square attack: a query-efﬁcient black-box adversarial attack via random search,

M. Andriushchenko, F. Croce, N. Flammarion, and M. Hein , “Square attack: a query-efﬁcient black-box adversarial attack via random search,” in European conference on computer vision , 2020

2020
[30]

Prior convictions : Black- box adversarial attacks with bandits and priors,

A. Ilyas, L. Engstrom, and A. Madry, “Prior convictions : Black- box adversarial attacks with bandits and priors,” in International Conference on Learning Representations , 2018

2018
[31]

{AutoDA}: Auto- mated decision-based iterative adversarial attacks,

Q.-A. Fu, Y . Dong, H. Su, J. Zhu, and C. Zhang, “ {AutoDA}: Auto- mated decision-based iterative adversarial attacks,” in 31st USENIX Security Symposium (USENIX Security 22) , pp. 3557–3574, 2022

2022
[32]

Query-efﬁcient hard-label black-box attack: An optimiza tion-based approach,

M. Cheng, T. Le, P .-Y . Chen, J. Yi, H. Zhang, and C.-J. Hsi eh, “Query-efﬁcient hard-label black-box attack: An optimiza tion-based approach,” arXiv preprint arXiv:1807.04457 , 2018

work page arXiv 2018
[33]

All you need is low (rank) defending against adversar ial attacks on graphs,

N. Entezari, S. A. Al-Sayouri, A. Darvishzadeh, and E. E . Papalex- akis, “All you need is low (rank) defending against adversar ial attacks on graphs,” in Proceedings of the 13th international conference on web search and data mining , pp. 169–177, 2020

2020
[34]

On the number of linear regions of deep neural networks,

G. F. Montufar, R. Pascanu, K. Cho, and Y . Bengio, “On the number of linear regions of deep neural networks,” Advances in neural information processing systems , vol. 27, 2014

2014
[35]

D. P . Bertsekas, Constrained optimization and Lagrange multiplier methods. Academic press, 2014

2014
[36]

Deep residual learni ng for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learni ng for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition , pp. 770–778, 2016

2016
[37]

Efﬁcientnet: Rethinking model scalin g for con- volutional neural networks,

M. Tan and Q. Le, “Efﬁcientnet: Rethinking model scalin g for con- volutional neural networks,” in International conference on machine learning, pp. 6105–6114, PMLR, 2019

2019
[38]

An image is worth 16x16 words: Transformers for image recognition at scale,

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenbor n, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gel ly, et al. , “An image is worth 16x16 words: Transformers for image recognition at scale,” in International Conference on Learning Rep- resentations, 2020. 14

2020
[39]

Do adversarially robust imagenet models transfer better?,

H. Salman, A. Ilyas, L. Engstrom, A. Kapoor, and A. Madry , “Do adversarially robust imagenet models transfer better?,” Advances in Neural Information Processing Systems , vol. 33, 2020

2020
[40]

Imagenet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fe i, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition , Ieee, 2009

2009
[41]

The caltech-ucsd birds-200-2011 dataset,

C. Wah, S. Branson, P . Welinder, P . Perona, and S. Belong ie, “The caltech-ucsd birds-200-2011 dataset,” 2011

2011
[42]

Decision-based a dversarial attacks: Reliable attacks against black-box machine learn ing models,

W. Brendel, J. Rauber, and M. Bethge, “Decision-based a dversarial attacks: Reliable attacks against black-box machine learn ing models,” in International Conference on Learning Representations , 2018

2018
[43]

Towards deep learning models resistant to adversarial att acks,

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vlad u, “Towards deep learning models resistant to adversarial att acks,” in International Conference on Learning Representations , 2018

2018
[44]

Concur rent adversarial learning for large-batch training,

Y . Liu, X. Chen, M. Cheng, C. J. Hsieh, and Y . Y ou, “Concur rent adversarial learning for large-batch training,” in 10th International Conference on Learning Representations, ICLR 2022 , 2022

2022
[45]

High-resolution image synthesis with latent diffusion mo dels,

R. Rombach, A. Blattmann, D. Lorenz, P . Esser, and B. Omm er, “High-resolution image synthesis with latent diffusion mo dels,” in Proceedings of the IEEE/CVF conference on computer vision a nd pattern recognition , pp. 10684–10695, 2022

2022
[46]

Introducing mpt-7b: A new standard for open -source, commercially usable llms,

M. N. Team, “Introducing mpt-7b: A new standard for open -source, commercially usable llms,” 2023. Accessed: 2024-08-28

2023
[47]

Boosting the transferability of adversarial attacks with reverse ad- versarial perturbation,

Z. Qin, Y . Fan, Y . Liu, L. Shen, Y . Zhang, J. Wang, and B. Wu , “Boosting the transferability of adversarial attacks with reverse ad- versarial perturbation,” Advances in neural information processing systems, vol. 35, pp. 29845–29858, 2022

2022
[48]

A un iﬁed approach to interpreting and boosting adversarial transfe rability,

X. Wang, J. Ren, S. Lin, X. Zhu, Y . Wang, and Q. Zhang, “A un iﬁed approach to interpreting and boosting adversarial transfe rability,” in International Conference on Learning Representations , 2021

2021
[49]

Pytorch hub

“Pytorch hub.” https://pytorch.org/hub/. Accessed: 2023-12-15

2023
[50]

Tensorﬂow hub

“Tensorﬂow hub.” https://www.tensorﬂow.org/hub. Ac cessed: 2023- 12-15

2023
[51]

Delving into transf erable adversarial examples and black-box attacks,

Y . Liu, X. Chen, C. Liu, and D. Song, “Delving into transf erable adversarial examples and black-box attacks,” in International Con- ference on Learning Representations , 2022

2022
[52]

B oosting adversarial attacks with momentum,

Y . Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “B oosting adversarial attacks with momentum,” in Proceedings of the IEEE conference on computer vision and pattern recognition , pp. 9185– 9193, 2018

2018
[53]

How tra nsferable are features in deep neural networks?,

J. Y osinski, J. Clune, Y . Bengio, and H. Lipson, “How tra nsferable are features in deep neural networks?,” Advances in neural information processing systems , vol. 27, 2014

2014
[54]

Visualizing and understand ing convo- lutional networks,

M. D. Zeiler and R. Fergus, “Visualizing and understand ing convo- lutional networks,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Pr oceedings, Part I 13 , pp. 818–833, Springer, 2014

2014
[55]

& Wattenberg, M

D. Smilkov, N. Thorat, B. Kim, F. Vi´ egas, and M. Wattenb erg, “Smoothgrad: removing noise by adding noise,” arXiv preprint arXiv:1706.03825, 2017

work page arXiv 2017
[56]

Learning i mportant features through propagating activation differences,

A. Shrikumar, P . Greenside, and A. Kundaje, “Learning i mportant features through propagating activation differences,” in International conference on machine learning , pp. 3145–3153, PMLR, 2017

2017
[57]

A uniﬁed approach to inter preting model predictions,

S. M. Lundberg and S.-I. Lee, “A uniﬁed approach to inter preting model predictions,” Advances in neural information processing sys- tems, vol. 30, 2017

2017
[58]

Grad-cam: Visual explanations from deep network s via gradient-based localization,

R. R. Selvaraju, M. Cogswell, A. Das, R. V edantam, D. Par ikh, and D. Batra, “Grad-cam: Visual explanations from deep network s via gradient-based localization,” in Proceedings of the IEEE international conference on computer vision , pp. 618–626, 2017

2017
[59]

Principal compone nt analysis,

S. Wold, K. Esbensen, and P . Geladi, “Principal compone nt analysis,” Chemometrics and intelligent laboratory systems , vol. 2, no. 1-3, pp. 37–52, 1987

1987
[60]

spca: Scalable principal component analysis f or big data on distributed platforms,

T. Elgamal, M. Y abandeh, A. Aboulnaga, W. Mustafa, and M. Hefeeda, “spca: Scalable principal component analysis f or big data on distributed platforms,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data , 2015

2015
[61]

Autoencoders,

D. Bank, N. Koenigstein, and R. Giryes, “Autoencoders, ” Machine learning for data science handbook: data mining and knowled ge discovery handbook , pp. 353–374, 2023

2023
[62]

U-net: Convol utional networks for biomedical image segmentation,

O. Ronneberger, P . Fischer, and T. Brox, “U-net: Convol utional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention , Springer, 2015

2015
[63]

3d object r epresen- tations for ﬁne-grained categorization,

J. Krause, M. Stark, J. Deng, and L. Fei-Fei, “3d object r epresen- tations for ﬁne-grained categorization,” in Proceedings of the IEEE international conference on computer vision workshops , 2013

2013
[64]

On detecting adversarial perturbations

J. H. Metzen, T. Genewein, V . Fischer, and B. Bischoff, “ On detecting adversarial perturbations,” arXiv preprint arXiv:1702.04267 , 2017

work page arXiv 2017
[65]

Bounceattack: A quer y- efﬁcient decision-based adversarial attack by bouncing in to the wild,

J. Wan, J. Fu, L. Wang, and Z. Y ang, “Bounceattack: A quer y- efﬁcient decision-based adversarial attack by bouncing in to the wild,” in 2024 IEEE Symposium on Security and Privacy (SP) , pp. 1270– 1286, IEEE, 2024

2024
[66]

Benchmarking adversarial robustness on image classiﬁcat ion,

Y . Dong, Q.-A. Fu, X. Y ang, T. Pang, H. Su, Z. Xiao, and J. Z hu, “Benchmarking adversarial robustness on image classiﬁcat ion,” in proceedings of the IEEE/CVF conference on computer vision a nd pattern recognition , pp. 321–331, 2020

2020
[67]

On Evaluating Adversarial Robustness

N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Raub er, D. Tsipras, I. Goodfellow, A. Madry, and A. Kurakin, “On eval uating adversarial robustness,” arXiv preprint arXiv:1902.06705 , 2019

work page arXiv 1902
[68]

Axiomatic attrib ution for deep networks,

M. Sundararajan, A. Taly, and Q. Y an, “Axiomatic attrib ution for deep networks,” in International conference on machine learning , pp. 3319–3328, PMLR, 2017

2017
[69]

” why should i t rust you?

M. T. Ribeiro, S. Singh, and C. Guestrin, “” why should i t rust you?” explaining the predictions of any classiﬁer,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge disc overy and data mining , pp. 1135–1144, 2016

2016
[70]

What makes imagenet good for transfer learning?,

M. Huh, P . Agrawal, and A. A. Efros, “What makes imagenet good for transfer learning?,” arXiv preprint arXiv:1608.08614 , 2016

work page arXiv 2016
[71]

Densely connected convolutional networks,

G. Huang, Z. Liu, L. V an Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2017

2017
[72]

Finding st ructure with randomness: Probabilistic algorithms for constructing ap proximate matrix decompositions,

N. Halko, P .-G. Martinsson, and J. A. Tropp, “Finding st ructure with randomness: Probabilistic algorithms for constructing ap proximate matrix decompositions,” SIAM review , 2011

2011
[73]

Theoretically principled trade-off between robustness a nd accuracy,

H. Zhang, Y . Y u, J. Jiao, E. Xing, L. El Ghaoui, and M. Jord an, “Theoretically principled trade-off between robustness a nd accuracy,” in ICML, PMLR, 2019

2019
[74]

Di stil- lation as a defense to adversarial perturbations against de ep neural networks,

N. Papernot, P . McDaniel, X. Wu, S. Jha, and A. Swami, “Di stil- lation as a defense to adversarial perturbations against de ep neural networks,” in 2016 IEEE symposium on security and privacy (SP) , pp. 582–597, IEEE, 2016

2016
[75]

Data ﬁltering for efﬁcient ad versarial training,

E.-C. Chen and C.-R. Lee, “Data ﬁltering for efﬁcient ad versarial training,” Pattern Recognition, p. 110394, 2024

2024
[76]

Wide Residual Networks

S. Zagoruyko and N. Komodakis, “Wide residual networks ,” arXiv preprint arXiv:1605.07146, 2016

work page internal anchor Pith review arXiv 2016
[77]

Defensive distillation is no t robust to adversarial examples,

N. Carlini and D. Wagner, “Defensive distillation is no t robust to adversarial examples,” arXiv preprint arXiv:1607.04311 , 2016

work page arXiv 2016
[78]

Improving the robust ness of deep neural networks via adversarial training with triplet loss,

P . Li, J. Yi, B. Zhou, and L. Zhang, “Improving the robust ness of deep neural networks via adversarial training with triplet loss,” arXiv preprint arXiv:1905.11713, 2019

work page arXiv 1905
[79]

Improving adve rsarial robustness via promoting ensemble diversity,

T. Pang, K. Xu, C. Du, N. Chen, and J. Zhu, “Improving adve rsarial robustness via promoting ensemble diversity,” in International Con- ference on Machine Learning , pp. 4970–4979, PMLR, 2019

2019
[80]

Defense-gan: Protecting classifiers against adversarial attacks using generative models

P . Samangouei, M. Kabkab, and R. Chellappa, “Defense-g an: Protect- ing classiﬁers against adversarial attacks using generati ve models,” arXiv preprint arXiv:1805.06605 , 2018. 15

work page arXiv 2018

Showing first 80 references.