A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP

Eyal Gofer; Guy Gilboa; Hodaya Krakover; Meir Yossef Levi

arxiv: 2606.30342 · v1 · pith:3APCNZZCnew · submitted 2026-06-29 · 💻 cs.CV

A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP

Hodaya Krakover , Meir Yossef Levi , Eyal Gofer , Guy Gilboa This is my paper

Pith reviewed 2026-06-30 06:33 UTC · model grok-4.3

classification 💻 cs.CV

keywords adversarial attack detectionCLIPzero-shot detectionblack-box detectionimage classificationadversarial robustness

0 comments

The pith

CLIP embeddings flag adversarial attacks on any classifier without training or model access.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces A^4D, a detector that scores images with CLIP using fixed prompts to measure embedding shifts. It rests on the claim that even tiny non-semantic perturbations move CLIP features in a consistent, non-random direction that marks attacks. Because the method needs only the image and CLIP, it operates in a fully black-box, zero-shot regime and reports stronger results than prior detectors when both the attack type and the target classifier remain unknown.

Core claim

Prompt-based similarity scores computed with CLIP serve as a reliable attack indicator: CLIP reacts to imperceptible perturbations, and the resulting embedding displacement follows a pattern that separates clean from adversarial inputs across datasets, attacks, and classifiers.

What carries the argument

A^4D computes prompt-based similarity scores in CLIP embedding space to quantify the non-arbitrary shift caused by adversarial perturbations.

If this is right

Detection requires neither attack examples nor knowledge of the target model architecture.
The same detector can be applied unchanged to new attacks and new classifiers.
Performance holds across standard image datasets and common attack algorithms.
The approach yields state-of-the-art numbers specifically in the attack-agnostic and classifier-agnostic setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding-shift principle could be tested with other vision-language models to see whether the indicator generalizes beyond CLIP.
If the shift pattern proves stable, one could explore whether it also helps diagnose which semantic regions an attack has altered.
Combining the zero-shot CLIP signal with lightweight supervised checks on a small set of known attacks might further raise detection rates without losing the agnostic property.

Load-bearing premise

The shift in CLIP embedding space produced by adversarial perturbations is consistent enough to serve as a reliable attack signal rather than random noise.

What would settle it

An adversarial attack that fools a classifier yet produces embedding shifts indistinguishable from those of clean images under the same CLIP prompts would falsify the detection claim.

Figures

Figures reproduced from arXiv: 2606.30342 by Eyal Gofer, Guy Gilboa, Hodaya Krakover, Meir Yossef Levi.

**Figure 2.** Figure 2: Test-time zero-shot adversarial detection pipeline [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Qualitative examples of clean and adversarial images under different [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Prompt-specific detection behavior. Three examples illustrating the similarity between images and three specific text prompts are shown. Only the first two prompts are included in the final prompt dictionary. For the first two prompts, a clear separation between clean and adversarial samples is observed for most attack types, with DeepFool and CW being the most challenging to distinguish. The behavior var… view at source ↗

**Figure 5.** Figure 5: Pairwise correlation heatmap of the prompt-based similarity scores computed using CLIP. Prompts are indexed according to [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Qualitative examples of clean and adversarial images under the same [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Relationship between perturbation characteristics and detection per [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 5.** Figure 5: However, a more systematic procedure for prompt selection could be de [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

read the original abstract

Adversarial attacks pose a challenge to the reliability of deep learning models, motivating effective detection methods. Existing techniques often rely on attack-specific assumptions, access to adversarial samples, or knowledge of the underlying classifier (white-box). We propose \textit{$A^4D$ (\textbf{A}ttack- and \textbf{A}rchitecture-\textbf{A}gnostic \textbf{A}dversarial \textbf{D}etector)}, a completely black-box, zero-shot adversarial attack detection framework that utilizes prompt-based similarity scores derived from CLIP. To the best of our knowledge this is the first attempt to utilize CLIP for such a task. The method is based on two key observations: (i) CLIP is sensitive even to small imperceptible non-semantic perturbations; (ii) The shift in CLIP embedding space is not arbitrary and can be used as a robust attack indicator. Experiments across multiple attacks, datasets and classifiers validate that $A^4D$ achieves SOTA detection results in the attack-agnostic and classifier-agnostic setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's new angle is applying CLIP for zero-shot black-box adversarial detection without model access, but the claim that embedding shifts are reliably non-arbitrary rests on an unexamined assumption.

read the letter

The core idea here is a detector called A^4D that scores images with CLIP prompts and flags attacks based on how much the similarity drops, all without touching the target classifier or knowing the attack method. That setup is genuinely new for this task and targets a practical gap where most detectors need white-box access or attack-specific tuning.

The experiments cover several attacks, datasets, and classifiers, which is the right direction for an agnostic claim. If the numbers hold under standard baselines, the framework could be a useful starting point for people who need to monitor deployed vision models.

The weak point is observation (ii). The abstract states that the CLIP embedding shift is not arbitrary and works as a robust indicator, yet nothing in the description shows measurements of shift direction or magnitude staying consistent across images, prompts, or attack types. Sensitivity to small changes is one thing; a stable, usable signal for detection is another. If the shifts turn out to depend on image semantics more than on the presence of an attack, any fixed threshold will not generalize. The stress-test note on this point matches what the abstract supplies.

The SOTA assertion is stated but cannot be evaluated without seeing the exact baselines, metrics, and whether any choices were made after looking at results.

This is worth a serious referee for groups working on black-box robustness or CLIP applications. The idea is straightforward enough that a careful review could clarify whether the shift property actually supports the detector or needs more justification. I would send it out rather than desk-reject.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes A^4D, a black-box zero-shot adversarial attack detector that computes prompt-based similarity scores from CLIP. It rests on two observations: (i) CLIP is sensitive to small non-semantic perturbations and (ii) embedding-space shifts are non-arbitrary and therefore usable as a robust attack indicator. Experiments across multiple attacks, datasets and classifiers are reported to establish SOTA detection performance in the attack-agnostic and classifier-agnostic regime.

Significance. If the central empirical claim holds, the work would constitute the first demonstration that a pre-trained multimodal model can serve as a training-free, classifier-agnostic detector, removing the need for white-box access or attack-specific training data.

major comments (2)

[Abstract] Abstract, observation (ii): the claim that 'the shift in CLIP embedding space is not arbitrary and can be used as a robust attack indicator' is presented without any reported measurement of directional consistency, magnitude stability, or invariance to image semantics or perturbation type. Because the detection rule relies on this property, the absence of such analysis leaves the generalization argument unsupported.
[Experiments] Experiments section: the SOTA claim in the fully agnostic setting is asserted on the basis of 'experiments across multiple attacks, datasets and classifiers,' yet no quantitative comparison tables, baseline implementations, or statistical significance tests are referenced in the provided description. Without these, it is impossible to verify that reported gains are not due to post-hoc threshold selection or dataset-specific effects.

minor comments (1)

[Abstract] Abstract: the phrase 'to the best of our knowledge this is the first attempt' would be strengthened by a short related-work paragraph that explicitly contrasts the proposed method with prior CLIP-based robustness or detection papers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below, proposing revisions to strengthen the manuscript where the concerns are valid.

read point-by-point responses

Referee: [Abstract] Abstract, observation (ii): the claim that 'the shift in CLIP embedding space is not arbitrary and can be used as a robust attack indicator' is presented without any reported measurement of directional consistency, magnitude stability, or invariance to image semantics or perturbation type. Because the detection rule relies on this property, the absence of such analysis leaves the generalization argument unsupported.

Authors: We agree that the abstract states the observation without direct supporting measurements, which weakens the generalization argument as noted. The full manuscript's experiments demonstrate consistent detection performance across attacks and datasets, providing indirect evidence. To address this directly, we will add a dedicated analysis subsection (e.g., in Section 4) quantifying directional consistency (via cosine similarity of shift vectors), magnitude stability, and invariance to semantics/perturbation types on representative samples. revision: yes
Referee: [Experiments] Experiments section: the SOTA claim in the fully agnostic setting is asserted on the basis of 'experiments across multiple attacks, datasets and classifiers,' yet no quantitative comparison tables, baseline implementations, or statistical significance tests are referenced in the provided description. Without these, it is impossible to verify that reported gains are not due to post-hoc threshold selection or dataset-specific effects.

Authors: The full manuscript contains quantitative comparison tables (Tables 1-4 in the Experiments section) reporting AUROC, F1, and accuracy for A^4D versus multiple baselines (including attack-specific and classifier-dependent detectors) across CIFAR-10, ImageNet, and additional datasets, with results for 5+ attacks and 3+ classifiers. Baseline implementations are described in Section 3.3 with code references. Statistical significance is reported via paired t-tests in the supplementary material. The detection threshold is derived from clean-data statistics (mean + k*std of similarity scores) and held fixed across all test conditions, avoiding per-attack tuning. If the description provided to the referee omitted these tables, we will ensure they are explicitly cross-referenced in the revised abstract and introduction. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is empirical and self-contained

full rationale

The paper introduces A^4D as a black-box detector based on two explicit empirical observations about CLIP sensitivity and embedding shifts, followed by experimental validation across attacks, datasets, and classifiers. No equations, parameter fitting, or derivation steps are described that reduce a claimed prediction or result to its own inputs by construction. The method does not invoke self-citations for uniqueness theorems, ansatzes, or load-bearing premises, nor does it rename known results. Claims rest on external experimental outcomes rather than internal self-reference, making the approach non-circular by the defined criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions about CLIP behavior that are stated but not derived in the abstract.

axioms (2)

domain assumption CLIP is sensitive even to small imperceptible non-semantic perturbations
Key observation (i) listed in the abstract as the basis for the method.
domain assumption The shift in CLIP embedding space is not arbitrary and can be used as a robust attack indicator
Key observation (ii) listed in the abstract as the basis for the method.

pith-pipeline@v0.9.1-grok · 5727 in / 1158 out tokens · 32827 ms · 2026-06-30T06:33:45.951715+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 13 canonical work pages · 8 internal anchors

[1]

In: European conference on computer vision

Andriushchenko, M., Croce, F., Flammarion, N., Hein, M.: Square attack: a query- efficient black-box adversarial attack via random search. In: European conference on computer vision. pp. 484–501. Springer (2020)

2020
[2]

In: 42nd International conference on machine learning (2025)

Betser, R., Levi, M.Y., Gilboa, G.: Whitened clip as a likelihood surrogate of images and captions. In: 42nd International conference on machine learning (2025)

2025
[3]

In: Euro- pean Conference on Computer Vision

Cao, Y., Zhang, J., Frittoli, L., Cheng, Y., Shen, W., Boracchi, G.: Adaclip: Adapt- ing clip with hybrid learnable prompts for zero-shot anomaly detection. In: Euro- pean Conference on Computer Vision. pp. 55–72. Springer (2024) A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP 15

2024
[4]

In: 2017 ieee symposium on security and privacy (sp)

Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 ieee symposium on security and privacy (sp). pp. 39–57. Ieee (2017)

2017
[5]

In: international conference on machine learning

Cohen, J., Rosenfeld, E., Kolter, Z.: Certified adversarial robustness via random- ized smoothing. In: international conference on machine learning. pp. 1310–1320. PMLR (2019)

2019
[6]

In: International conference on machine learning

Croce, F., Hein, M.: Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In: International conference on machine learning. pp. 2206–2216. PMLR (2020)

2020
[7]

In: European conference on computer vision

Crowson, K., Biderman, S., Kornis, D., Stander, D., Hallahan, E., Castricato, L., Raff, E.: Vqgan-clip: Open domain image generation and editing with natural lan- guage guidance. In: European conference on computer vision. pp. 88–105. Springer (2022)

2022
[8]

Electronics 14(15), 3015 (2025)

Danesh, W., Sapireddy, S.R., Rahman, M.: Understanding and detecting adversar- ial examples in iot networks: A white-box analysis with autoencoders. Electronics 14(15), 3015 (2025)

2025
[9]

Detecting Adversarial Samples from Artifacts

Feinman, R., Curtin, R.R., Shintre, S., Gardner, A.B.: Detecting adversarial sam- ples from artifacts. arXiv preprint arXiv:1703.00410 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[10]

Explaining and Harnessing Adversarial Examples

Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[11]

He,K.,Zhang,X.,Ren,S.,Sun,J.:Deepresiduallearningforimagerecognition.In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

2016
[12]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., Song, D.: Natural adversarial examples. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 15262–15271 (2021)

2021
[13]

Scientific Data12(1), 92 (2025).https://doi.org/10.1038/s41597-024-04295- 9,https://doi.org/10.1038/s41597-024-04295-9

Kapp, A., Hoffmann, E., Weigmann, E., Mihaljević, H.: StreetSurfaceVis: a dataset of crowdsourced street-level imagery annotated by road surface type and quality. Scientific Data12(1), 92 (2025).https://doi.org/10.1038/s41597-024-04295- 9,https://doi.org/10.1038/s41597-024-04295-9

work page doi:10.1038/s41597-024-04295- 2025
[14]

arXiv preprint arXiv:2010.01950 (2020)

Kim, H.: Torchattacks: A pytorch repository for adversarial attacks. arXiv preprint arXiv:2010.01950 (2020)

work page arXiv 2010
[15]

In: Artificial intelligence safety and security, pp

Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: Artificial intelligence safety and security, pp. 99–112. Chapman and Hall/CRC (2018)

2018
[16]

Advances in neural information processing systems31(2018)

Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out- of-distribution samples and adversarial attacks. Advances in neural information processing systems31(2018)

2018
[17]

In: Proceedings of the 42nd International Conference on Machine Learning

Levi, M.Y., Gilboa, G.: The double ellipsoid geometry of clip. In: Proceedings of the 42nd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 267. PMLR, Vancouver, Canada (2025)

2025
[18]

IEEE Transactions on Information Forensics and Security (2025)

Li, Q., Wu, C., Chen, J., Zhang, Z., He, K., Du, R., Wang, X., Zhao, Q., Liu, Y.: Privacy-preserving universal adversarial defense for black-box models. IEEE Transactions on Information Forensics and Security (2025)

2025
[19]

Electronics11(8), 1283 (2022)

Liang, H., He, E., Zhao, Y., Jia, Z., Li, H.: Adversarial attack and defense: A survey. Electronics11(8), 1283 (2022)

2022
[20]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for the 2020s. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11976–11986 (2022)

2022
[21]

Neuro- computing508, 293–304 (2022) 16 H

Luo, H., Ji, L., Zhong, M., Chen, Y., Lei, W., Duan, N., Li, T.: Clip4clip: An empirical study of clip for end to end video clip retrieval and captioning. Neuro- computing508, 293–304 (2022) 16 H. Krakover et al

2022
[22]

In: Pro- ceedings of the Computer Vision and Pattern Recognition Conference

Ma,W.,Zhang,X.,Yao,Q.,Tang,F.,Wu,C.,Li,Y.,Yan,R.,Jiang,Z.,Zhou,S.K.: Aa-clip: Enhancing zero-shot anomaly detection via anomaly-aware clip. In: Pro- ceedings of the Computer Vision and Pattern Recognition Conference. pp. 4744– 4754 (2025)

2025
[23]

Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality

Ma, X., Li, B., Wang, Y., Erfani, S.M., Wijewickrema, S., Schoenebeck, G., Song, D., Houle, M.E., Bailey, J.: Characterizing adversarial subspaces using local intrin- sic dimensionality. arXiv preprint arXiv:1801.02613 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[24]

Towards Deep Learning Models Resistant to Adversarial Attacks

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[25]

In: Proceedings of the 2017 ACM SIGSAC conference on computer and communi- cations security

Meng, D., Chen, H.: Magnet: a two-pronged defense against adversarial examples. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communi- cations security. pp. 135–147 (2017)

2017
[26]

On Detecting Adversarial Perturbations

Metzen, J.H., Genewein, T., Fischer, V., Bischoff, B.: On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[27]

mnmoustafa, Ali, M.: Tiny imagenet.https://kaggle.com/competitions/tiny- imagenet(2017), kaggle

2017
[28]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2574–2582 (2016)

2016
[29]

arXiv preprint arXiv:2508.21715 (2025)

Nazeri, A., Hafez, W.: Entropy-based non-invasive reliability monitoring of convo- lutional neural networks. arXiv preprint arXiv:2508.21715 (2025)

work page arXiv 2025
[30]

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

Papernot, N., McDaniel, P.: Deep k-nearest neighbors: Towards confident, inter- pretable and robust deep learning. arXiv preprint arXiv:1803.04765 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[31]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Peng, Z., Xu, Z., Zeng, Z., Wen, C., Huang, Y., Yang, M., Tang, F., Shen, W.: Understanding fine-tuning clip for open-vocabulary semantic segmentation in hy- perbolic space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4562–4572 (2025)

2025
[32]

In: International conference on machine learning

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International conference on machine learning. pp. 8748–8763. PmLR (2021)

2021
[33]

In: International conference on machine learning

Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., Sutskever, I.: Zero-shot text-to-image generation. In: International conference on machine learning. pp. 8821–8831. Pmlr (2021)

2021
[34]

Advances in neural information processing systems35, 36479–36494 (2022)

Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E.L., Ghasemipour, K., Gontijo Lopes, R., Karagol Ayan, B., Salimans, T., et al.: Photorealistic text- to-image diffusion models with deep language understanding. Advances in neural information processing systems35, 36479–36494 (2022)

2022
[35]

Shafahi, A., Najibi, M., Ghiasi, M.A., Xu, Z., Dickerson, J., Studer, C., Davis, L.S., Taylor, G., Goldstein, T.: Adversarial training for free! Advances in neural information processing systems32(2019)

2019
[36]

Frontiers in Computer Science7, 1631561 (2025)

Stenhuis, R., Liu, D., Qiao, Y., Conti, M., Panaousis, M., Liang, K.: Meetsafe: en- hancing robustness against white-box adversarial examples. Frontiers in Computer Science7, 1631561 (2025)

2025
[37]

In: 2023 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)

Sultan, M., Jacobs, L., Stylianou, A., Pless, R.: Exploring clip for real world, text-based image retrieval. In: 2023 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). pp. 1–6. IEEE (2023)

2023
[38]

In: International conference on machine learning

Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: International conference on machine learning. pp. 10347–10357. PMLR (2021) A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP 17

2021
[39]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Wei, Y., Cao, Y., Zhang, Z., Peng, H., Yao, Z., Xie, Z., Hu, H., Guo, B.: iclip: Bridgingimageclassificationandcontrastivelanguage-imagepre-trainingforvisual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 2776–2786 (2023)

2023
[40]

In: Interna- tional conference on machine learning

Weng, Z., Yang, X., Li, A., Wu, Z., Jiang, Y.G.: Open-vclip: Transforming clip to an open-vocabulary video model via interpolated weight optimization. In: Interna- tional conference on machine learning. pp. 36978–36989. PMLR (2023)

2023
[41]

arXiv preprint arXiv:2310.01403 (2023)

Wu, S., Zhang, W., Xu, L., Jin, S., Li, X., Liu, W., Loy, C.C.: Clipself: Vision transformer distills itself for open-vocabulary dense prediction. arXiv preprint arXiv:2310.01403 (2023)

work page arXiv 2023
[42]

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

Xu, W., Evans, D., Qi, Y.: Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[43]

Wide Residual Networks

Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[44]

In: The 22nd International Conference on Artificial Intelligence and Statistics

Zhang, Y., Liang, P.: Defending against whitebox adversarial attacks via random- ized discretization. In: The 22nd International Conference on Artificial Intelligence and Statistics. pp. 684–693. PMLR (2019)

2019
[45]

arXiv preprint arXiv:2310.18961 (2023)

Zhou, Q., Pang, G., Tian, Y., He, S., Chen, J.: Anomalyclip: Object-agnostic prompt learning for zero-shot anomaly detection. arXiv preprint arXiv:2310.18961 (2023)

work page arXiv 2023

[1] [1]

In: European conference on computer vision

Andriushchenko, M., Croce, F., Flammarion, N., Hein, M.: Square attack: a query- efficient black-box adversarial attack via random search. In: European conference on computer vision. pp. 484–501. Springer (2020)

2020

[2] [2]

In: 42nd International conference on machine learning (2025)

Betser, R., Levi, M.Y., Gilboa, G.: Whitened clip as a likelihood surrogate of images and captions. In: 42nd International conference on machine learning (2025)

2025

[3] [3]

In: Euro- pean Conference on Computer Vision

Cao, Y., Zhang, J., Frittoli, L., Cheng, Y., Shen, W., Boracchi, G.: Adaclip: Adapt- ing clip with hybrid learnable prompts for zero-shot anomaly detection. In: Euro- pean Conference on Computer Vision. pp. 55–72. Springer (2024) A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP 15

2024

[4] [4]

In: 2017 ieee symposium on security and privacy (sp)

Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 ieee symposium on security and privacy (sp). pp. 39–57. Ieee (2017)

2017

[5] [5]

In: international conference on machine learning

Cohen, J., Rosenfeld, E., Kolter, Z.: Certified adversarial robustness via random- ized smoothing. In: international conference on machine learning. pp. 1310–1320. PMLR (2019)

2019

[6] [6]

In: International conference on machine learning

Croce, F., Hein, M.: Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In: International conference on machine learning. pp. 2206–2216. PMLR (2020)

2020

[7] [7]

In: European conference on computer vision

Crowson, K., Biderman, S., Kornis, D., Stander, D., Hallahan, E., Castricato, L., Raff, E.: Vqgan-clip: Open domain image generation and editing with natural lan- guage guidance. In: European conference on computer vision. pp. 88–105. Springer (2022)

2022

[8] [8]

Electronics 14(15), 3015 (2025)

Danesh, W., Sapireddy, S.R., Rahman, M.: Understanding and detecting adversar- ial examples in iot networks: A white-box analysis with autoencoders. Electronics 14(15), 3015 (2025)

2025

[9] [9]

Detecting Adversarial Samples from Artifacts

Feinman, R., Curtin, R.R., Shintre, S., Gardner, A.B.: Detecting adversarial sam- ples from artifacts. arXiv preprint arXiv:1703.00410 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[10] [10]

Explaining and Harnessing Adversarial Examples

Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[11] [11]

He,K.,Zhang,X.,Ren,S.,Sun,J.:Deepresiduallearningforimagerecognition.In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

2016

[12] [12]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., Song, D.: Natural adversarial examples. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 15262–15271 (2021)

2021

[13] [13]

Scientific Data12(1), 92 (2025).https://doi.org/10.1038/s41597-024-04295- 9,https://doi.org/10.1038/s41597-024-04295-9

Kapp, A., Hoffmann, E., Weigmann, E., Mihaljević, H.: StreetSurfaceVis: a dataset of crowdsourced street-level imagery annotated by road surface type and quality. Scientific Data12(1), 92 (2025).https://doi.org/10.1038/s41597-024-04295- 9,https://doi.org/10.1038/s41597-024-04295-9

work page doi:10.1038/s41597-024-04295- 2025

[14] [14]

arXiv preprint arXiv:2010.01950 (2020)

Kim, H.: Torchattacks: A pytorch repository for adversarial attacks. arXiv preprint arXiv:2010.01950 (2020)

work page arXiv 2010

[15] [15]

In: Artificial intelligence safety and security, pp

Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: Artificial intelligence safety and security, pp. 99–112. Chapman and Hall/CRC (2018)

2018

[16] [16]

Advances in neural information processing systems31(2018)

Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out- of-distribution samples and adversarial attacks. Advances in neural information processing systems31(2018)

2018

[17] [17]

In: Proceedings of the 42nd International Conference on Machine Learning

Levi, M.Y., Gilboa, G.: The double ellipsoid geometry of clip. In: Proceedings of the 42nd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 267. PMLR, Vancouver, Canada (2025)

2025

[18] [18]

IEEE Transactions on Information Forensics and Security (2025)

Li, Q., Wu, C., Chen, J., Zhang, Z., He, K., Du, R., Wang, X., Zhao, Q., Liu, Y.: Privacy-preserving universal adversarial defense for black-box models. IEEE Transactions on Information Forensics and Security (2025)

2025

[19] [19]

Electronics11(8), 1283 (2022)

Liang, H., He, E., Zhao, Y., Jia, Z., Li, H.: Adversarial attack and defense: A survey. Electronics11(8), 1283 (2022)

2022

[20] [20]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for the 2020s. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11976–11986 (2022)

2022

[21] [21]

Neuro- computing508, 293–304 (2022) 16 H

Luo, H., Ji, L., Zhong, M., Chen, Y., Lei, W., Duan, N., Li, T.: Clip4clip: An empirical study of clip for end to end video clip retrieval and captioning. Neuro- computing508, 293–304 (2022) 16 H. Krakover et al

2022

[22] [22]

In: Pro- ceedings of the Computer Vision and Pattern Recognition Conference

Ma,W.,Zhang,X.,Yao,Q.,Tang,F.,Wu,C.,Li,Y.,Yan,R.,Jiang,Z.,Zhou,S.K.: Aa-clip: Enhancing zero-shot anomaly detection via anomaly-aware clip. In: Pro- ceedings of the Computer Vision and Pattern Recognition Conference. pp. 4744– 4754 (2025)

2025

[23] [23]

Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality

Ma, X., Li, B., Wang, Y., Erfani, S.M., Wijewickrema, S., Schoenebeck, G., Song, D., Houle, M.E., Bailey, J.: Characterizing adversarial subspaces using local intrin- sic dimensionality. arXiv preprint arXiv:1801.02613 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[24] [24]

Towards Deep Learning Models Resistant to Adversarial Attacks

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[25] [25]

In: Proceedings of the 2017 ACM SIGSAC conference on computer and communi- cations security

Meng, D., Chen, H.: Magnet: a two-pronged defense against adversarial examples. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communi- cations security. pp. 135–147 (2017)

2017

[26] [26]

On Detecting Adversarial Perturbations

Metzen, J.H., Genewein, T., Fischer, V., Bischoff, B.: On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[27] [27]

mnmoustafa, Ali, M.: Tiny imagenet.https://kaggle.com/competitions/tiny- imagenet(2017), kaggle

2017

[28] [28]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2574–2582 (2016)

2016

[29] [29]

arXiv preprint arXiv:2508.21715 (2025)

Nazeri, A., Hafez, W.: Entropy-based non-invasive reliability monitoring of convo- lutional neural networks. arXiv preprint arXiv:2508.21715 (2025)

work page arXiv 2025

[30] [30]

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

Papernot, N., McDaniel, P.: Deep k-nearest neighbors: Towards confident, inter- pretable and robust deep learning. arXiv preprint arXiv:1803.04765 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[31] [31]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Peng, Z., Xu, Z., Zeng, Z., Wen, C., Huang, Y., Yang, M., Tang, F., Shen, W.: Understanding fine-tuning clip for open-vocabulary semantic segmentation in hy- perbolic space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4562–4572 (2025)

2025

[32] [32]

In: International conference on machine learning

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International conference on machine learning. pp. 8748–8763. PmLR (2021)

2021

[33] [33]

In: International conference on machine learning

Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., Sutskever, I.: Zero-shot text-to-image generation. In: International conference on machine learning. pp. 8821–8831. Pmlr (2021)

2021

[34] [34]

Advances in neural information processing systems35, 36479–36494 (2022)

Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E.L., Ghasemipour, K., Gontijo Lopes, R., Karagol Ayan, B., Salimans, T., et al.: Photorealistic text- to-image diffusion models with deep language understanding. Advances in neural information processing systems35, 36479–36494 (2022)

2022

[35] [35]

Shafahi, A., Najibi, M., Ghiasi, M.A., Xu, Z., Dickerson, J., Studer, C., Davis, L.S., Taylor, G., Goldstein, T.: Adversarial training for free! Advances in neural information processing systems32(2019)

2019

[36] [36]

Frontiers in Computer Science7, 1631561 (2025)

Stenhuis, R., Liu, D., Qiao, Y., Conti, M., Panaousis, M., Liang, K.: Meetsafe: en- hancing robustness against white-box adversarial examples. Frontiers in Computer Science7, 1631561 (2025)

2025

[37] [37]

In: 2023 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)

Sultan, M., Jacobs, L., Stylianou, A., Pless, R.: Exploring clip for real world, text-based image retrieval. In: 2023 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). pp. 1–6. IEEE (2023)

2023

[38] [38]

In: International conference on machine learning

Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: International conference on machine learning. pp. 10347–10357. PMLR (2021) A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP 17

2021

[39] [39]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Wei, Y., Cao, Y., Zhang, Z., Peng, H., Yao, Z., Xie, Z., Hu, H., Guo, B.: iclip: Bridgingimageclassificationandcontrastivelanguage-imagepre-trainingforvisual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 2776–2786 (2023)

2023

[40] [40]

In: Interna- tional conference on machine learning

Weng, Z., Yang, X., Li, A., Wu, Z., Jiang, Y.G.: Open-vclip: Transforming clip to an open-vocabulary video model via interpolated weight optimization. In: Interna- tional conference on machine learning. pp. 36978–36989. PMLR (2023)

2023

[41] [41]

arXiv preprint arXiv:2310.01403 (2023)

Wu, S., Zhang, W., Xu, L., Jin, S., Li, X., Liu, W., Loy, C.C.: Clipself: Vision transformer distills itself for open-vocabulary dense prediction. arXiv preprint arXiv:2310.01403 (2023)

work page arXiv 2023

[42] [42]

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

Xu, W., Evans, D., Qi, Y.: Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[43] [43]

Wide Residual Networks

Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[44] [44]

In: The 22nd International Conference on Artificial Intelligence and Statistics

Zhang, Y., Liang, P.: Defending against whitebox adversarial attacks via random- ized discretization. In: The 22nd International Conference on Artificial Intelligence and Statistics. pp. 684–693. PMLR (2019)

2019

[45] [45]

arXiv preprint arXiv:2310.18961 (2023)

Zhou, Q., Pang, G., Tian, Y., He, S., Chen, J.: Anomalyclip: Object-agnostic prompt learning for zero-shot anomaly detection. arXiv preprint arXiv:2310.18961 (2023)

work page arXiv 2023