
arxiv: 2512.03730 · v2 · submitted 2025-12-03 · 💻 cs.CV · cs.AI

Out-of-the-box: Black-box Causal Attacks on Object Detectors

Pith reviewed 2026-05-17 02:30 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords black-box adversarial attacks · object detectors · causal pixel sets · adversarial perturbations · explainable attacks · computer vision security · imperceptible attacks

The pith

BlackCAtt identifies minimal causally sufficient pixel sets to generate smaller, explainable attacks on object detectors from black-box outputs alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BlackCAtt, which locates the smallest groups of pixels that directly cause an object detector to produce a particular output. These groups are found using only the detector's visible results such as bounding-box coordinates, class labels, or confidence values. Changing just those pixels produces attacks that match or surpass other black-box methods in effectiveness while remaining fully explainable because the changes target explicit causes. When confidence scores are also available, BlackCAtt can be combined with existing attack algorithms to shrink the number of altered pixels without lowering success rates. This matters for developers who want to understand why detectors fail and to create more precise ways to test and strengthen them.

Core claim

BlackCAtt identifies minimal causally sufficient pixel sets from black-box detector outputs to construct explainable, imperceptible, reproducible, and architecture-agnostic attacks. With access only to bounding-box position and label, the attacks are comparable to or better than those from other black-box methods. With added access to model confidence, BlackCAtt functions as a meta-algorithm that reduces perturbation size, for instance lowering the average L0 norm from 0.987 to 0.072 when paired with SquareAttack while maintaining a similar success rate.
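
To make the L0 figures concrete, here is a minimal sketch of a normalized L0 distance. It assumes, since this summary does not say so, that the reported values are the fraction of pixels that differ between the original and the attacked image. The function name `l0_fraction` and the nested-list image representation are illustrative and do not come from the paper.

```python
def l0_fraction(original, attacked):
    """Fraction of pixels that differ between two equally sized images.

    Images are nested lists: rows of pixels, each pixel a tuple of channels.
    Assumption: a pixel counts as changed if any of its channels differs.
    """
    if len(original) != len(attacked):
        raise ValueError("images must have the same height")
    changed = 0
    total = 0
    for row_a, row_b in zip(original, attacked):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if px_a != px_b:
                changed += 1
    return changed / total if total else 0.0


# Toy 2x2 RGB image in which exactly one pixel is perturbed.
clean = [[(0, 0, 0), (10, 10, 10)], [(20, 20, 20), (30, 30, 30)]]
adv = [[(0, 0, 0), (10, 10, 10)], [(20, 20, 20), (30, 31, 30)]]
print(l0_fraction(clean, adv))  # 0.25
```

Under this reading, a value of 0.987 would mean that nearly every pixel was touched, and 0.072 that roughly 7% of pixels were.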

What carries the argument

Minimal causally sufficient pixel sets, identified from detector outputs and perturbed to produce targeted failures in detection results.

If this is right

  • Attacks become fully explainable because they manipulate only the pixels that cause the detector's output.
  • Using only position and label information yields attacks that are comparable to or better than those from other black-box techniques.
  • When model confidence is available, BlackCAtt reduces the size of perturbations from standard methods such as SquareAttack while keeping similar success rates.
  • Ablation studies show that each component of the algorithm contributes measurably to attack quality.
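
The meta-algorithm idea in the third bullet can be illustrated with a toy sketch: take a dense perturbation from some base attack and keep it only on a chosen pixel set. How the paper actually couples BlackCAtt with SquareAttack is not described in this summary, so `restrict_to_mask`, `changed_pixels`, and the hard-coded `mask` are hypothetical stand-ins.

```python
def restrict_to_mask(original, attacked, mask):
    """Keep the attacked value only at coordinates in `mask`.

    `original` and `attacked` are equally sized 2-D lists of intensities;
    `mask` is a set of (row, col) coordinates standing in for a causal
    pixel set. Pixels outside the mask are restored to the original value.
    """
    return [
        [attacked[r][c] if (r, c) in mask else original[r][c]
         for c in range(len(original[r]))]
        for r in range(len(original))
    ]


def changed_pixels(a, b):
    """Count positions at which two equally sized 2-D lists differ."""
    return sum(1 for row_a, row_b in zip(a, b)
               for va, vb in zip(row_a, row_b) if va != vb)


original = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
dense_attack = [[v + 5 for v in row] for row in original]  # every pixel changed
mask = {(0, 0), (1, 1)}                                    # hypothetical causal set

sparse_attack = restrict_to_mask(original, dense_attack, mask)
print(changed_pixels(original, dense_attack))   # 9
print(changed_pixels(original, sparse_attack))  # 2
```

Restricting a perturbation this way does not by itself preserve the attack; whether the sparser image still fools the detector has to be checked by querying it again.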

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The causal-pixel approach could be applied to design defenses that protect the most influential image regions rather than the entire input.
  • The same identification process might be tested on other vision tasks such as segmentation or classification to check whether causal sets remain small and useful.
  • Developers could use the method to audit deployed detectors for hidden causal weaknesses without needing model weights or gradients.

Load-bearing premise

Minimal causally sufficient pixel sets can be reliably identified and effectively altered using only black-box outputs such as bounding-box position, label, or confidence.
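
What the premise asks of a black-box procedure can be sketched as follows. This is not the BlackCAtt algorithm, which this summary does not specify. It is a generic masking test plus a greedy shrink, run against a hypothetical `toy_detector` that stands in for a real model. All names are illustrative.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Pixel = Tuple[int, int]
Image = Dict[Pixel, int]                      # toy image: coordinate -> intensity
Detector = Callable[[Image], Optional[str]]   # black box: image -> label or None


def toy_detector(image: Image) -> Optional[str]:
    """Hypothetical stand-in for a real detector (not YOLO or any real API).

    Reports "cat" when at least three of four fixed pixels are bright.
    """
    features = [(0, 0), (0, 1), (1, 0), (1, 1)]
    bright = sum(1 for p in features if image.get(p, 0) > 128)
    return "cat" if bright >= 3 else None


def keep_only(image: Image, pixels: Set[Pixel], baseline: int = 0) -> Image:
    """Return a copy of `image` with every pixel outside `pixels` masked."""
    return {p: (v if p in pixels else baseline) for p, v in image.items()}


def is_sufficient(detector: Detector, image: Image,
                  pixels: Set[Pixel], target: str) -> bool:
    """True if the detector still reports `target` on `pixels` alone."""
    return detector(keep_only(image, pixels)) == target


def greedy_shrink(detector: Detector, image: Image, target: str) -> Set[Pixel]:
    """Drop pixels one at a time while the remaining set stays sufficient."""
    current = set(image)
    if not is_sufficient(detector, image, current, target):
        raise ValueError("the full image does not produce the target output")
    for p in sorted(image):
        trial = current - {p}
        if is_sufficient(detector, image, trial, target):
            current = trial
    return current


image = {(r, c): 200 for r in range(3) for c in range(3)}
result = greedy_shrink(toy_detector, image, "cat")
print(sorted(result))  # [(0, 1), (1, 0), (1, 1)]
```

The result depends on the order in which pixels are tried: in this toy any three of the four feature pixels would do, so several distinct sufficient sets exist. A single greedy pass also gives no general guarantee of minimality, which is the concern the referee raises.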

What would settle it

An experiment in which perturbing the identified causal pixel sets produces no change in the detector's outputs or in which the resulting attacks are larger or less successful than those generated by baseline black-box methods.

Figures

Figures reproduced from arXiv: 2512.03730 by David A. Kelly, Hana Chockler, Melane Navaratnarajah.

Figure 1: The MSPS for cat (Figure 1b) reveals a dependency on the surrounding context. BlackCAtt starts with causal pixels outside of the bounding box and works inwards in order to maximize imperceptibility. In both Figures 1c and 1d the cat is still clearly present and complete, but YOLO no longer detects the cat. The attack works because BlackCAtt changes part of the cause of the detection. BlackCAtt is model agn…
Figure 2: The DC between bounding box and MSPS stays almost…
Figure 3: Causally explainable adversarial attacks on…
Figure 4: Example of a trial in BlackCAttMoG. From top-left to bottom-right: original image overlaid with the responsibility for inside-MSPS and bbox, the top 7 peaks extracted, fitted MoG mask and, finally, the attacked image with no detection. This perturbs the image at the location of the peak with the intensity indicated by P(X).
Figure 5: Success rate of different approaches in adding new spurious detection, with different models on COCO dataset, for different…
Figure 6: No-prediction, change prediction, add prediction with YOLO on the COCO dataset, showing the distribution of LPIPS and L2 distances for the three most successful methods. The problem of over-determination is well known in the literature of causality [16]. As Chockler et al. [8] show for image classifiers, many images have multiple, independent, MSPSs. We know of no comparable work on OD, so we restrict our…
Figure 7: Mutations on the pixels that are a part of the MSPS but not in the bounding box (zoomed-in view).
Figure 8: Single step attack on a car using YOLO. Shows attacking the MSPS that is inside the bounding box and outside the bounding box.
Figure 9: Success rate of different approaches in removing a detection, with different models on COCO dataset, for different thresholds with L0, L1, L2, LPIPS, SSIM. The different techniques are noise, targeted noise, blended, DRISEMoG and MoG.
Figure 10: Success rate of different approaches in changing the label of the detection, with different models on COCO dataset, for different thresholds with L0, L1, L2, LPIPS, SSIM. The different techniques are noise, targeted noise, blended, DRISEMoG and MoG.
Figure 11: Success rate of different approaches in adding new spurious detection, with different models on COCO dataset, for different thresholds with L0, L1, L2, LPIPS, SSIM. The different techniques are noise, targeted noise, blended, DRISEMoG and MoG.
Figure 12: Complete results comparing L2 against confidence for Faster-R-CNN, YOLO and RT-DETR. The different approaches are noise, targeted noise, blended, DRISEMoG and MoG.
Figure 13: (a) Original image and detector bbox; (b–d) responsibility heatmaps (same min/max scale) used for BlackCAtt
Figure 14: (a) Original image and detector bbox; (b–d) responsibility heatmaps (same min/max scale) used for BlackCAtt
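
The Figure 2 caption compares the bounding box with the MSPS through a quantity labelled DC. Assuming DC abbreviates the Dice coefficient (the reference list includes Dice's 1945 paper), a minimal sketch of that overlap measure on pixel sets looks like this. The helper names and the toy coordinates are illustrative only.

```python
def dice_coefficient(a, b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two pixel sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


def box_pixels(x0, y0, x1, y1):
    """All integer pixel coordinates in the half-open box [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


bbox = box_pixels(0, 0, 4, 4)   # 16 pixels
msps = box_pixels(2, 2, 6, 6)   # 16 pixels, 4 of them shared with bbox
print(dice_coefficient(bbox, msps))  # 0.25
```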
Original abstract

Adversarial perturbations are a useful way to expose vulnerabilities in object detectors. Existing perturbation methods are frequently white-box, architecture specific and use a loss function. More importantly, while they are often successful, it is rarely clear why they work. Insights into the mechanism of this success would allow developers to understand and analyze these attacks, as well as fine-tune the model to prevent them. This paper presents BlackCAtt, a black-box algorithm and tool, which uses minimal, causally sufficient pixel sets to construct explainable, imperceptible, reproducible, architecture-agnostic attacks on object detectors. We evaluate BlackCAtt on standard benchmarks and compare it to other black-box adversarial attack methods. When BlackCAtt has access only to the position and label of a bounding box, it produces attacks that are comparable to or better than those produced by other black-box methods. When BlackCAtt has access to the model confidence as well, it can work as a meta-algorithm, improving the ability of standard black-box techniques to construct smaller, less perceptible attacks. As BlackCAtt attacks manipulate causes only, the attacks become fully explainable. We compare the performance of BlackCAtt with other black-box attack methods and show that targeting causal pixels leads to smaller and less perceptible attacks. For example, when using BlackCAtt with SquareAttack, it reduces the average distance ($L_0$ norm) of the attack from the original input from $0.987$ to $0.072$, while maintaining a similar success rate. We perform ablation studies on the BlackCAtt algorithm and analyze the effect of different components on its performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces BlackCAtt, a black-box algorithm for adversarial attacks on object detectors that identifies minimal causally sufficient pixel sets to produce explainable perturbations. It claims that with access only to bounding-box position and label the method matches or exceeds other black-box attacks, and that when model confidence is also available it functions as a meta-algorithm that improves existing techniques (e.g., reducing average L0 norm from 0.987 to 0.072 while preserving success rate when combined with SquareAttack). The work is evaluated on standard benchmarks with ablation studies on algorithmic components.

Significance. If the central claim that minimal causal pixel sets can be recovered and edited from black-box detector outputs holds, the result would be significant for adversarial robustness research in computer vision. It offers an architecture-agnostic, gradient-free route to smaller, more interpretable attacks and could supply concrete diagnostic information for hardening detectors.

major comments (2)
  1. [§3] §3 (causal identification procedure): the algorithm that recovers minimal causally sufficient pixel sets via black-box interventions (masking/perturbation followed by output comparison) is not shown to guarantee minimality. Because detector outputs are discrete and thresholded, the procedure can return supersets or pixels whose effect is merely correlated; this directly affects the reported L0 reduction and the claim that attacks are 'fully explainable' because they manipulate causes only.
  2. [Experimental results] Experimental results (L0-norm comparison paragraph and associated table/figure): the reduction from 0.987 to 0.072 when BlackCAtt is used with SquareAttack is presented without the number of images, number of independent runs, variance, or statistical test for the success-rate equivalence. These details are load-bearing for the meta-algorithm claim.
minor comments (2)
  1. [Abstract] Abstract and §1: 'standard benchmarks' and the exact object detectors used should be named explicitly rather than left generic.
  2. [Notation] Notation: the formal definition of 'causally sufficient' for a pixel set relative to a detector output (position, label, or score) should be stated once, preferably with a short equation or set notation.
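
One way the definition requested in the second minor comment could be written is sketched below. It is an assumption-laden illustration, not the authors' formalization: $x$ is the image with pixel index set $P$, $f$ the detector, $o = f(x)$ the output of interest, $b$ a baseline (masking) value, and $\approx$ an unspecified matching relation on outputs (for example, same label and sufficiently overlapping box).

```latex
% Masked image: agree with x on S, use the baseline b elsewhere.
x_S(p) =
  \begin{cases}
    x(p) & \text{if } p \in S, \\
    b    & \text{if } p \in P \setminus S,
  \end{cases}
\qquad S \subseteq P .

% Sufficiency and minimality of a pixel set S for the output o.
S \text{ is sufficient for } o \iff f(x_S) \approx o,
\qquad
S \text{ is minimal} \iff S \text{ is sufficient and }
  \forall S' \subsetneq S :\ f(x_{S'}) \not\approx o .
```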

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their detailed and insightful comments, which have helped us identify areas for improvement in the manuscript. We address each major comment below, indicating the revisions we plan to make.

  1. Referee: [§3] §3 (causal identification procedure): the algorithm that recovers minimal causally sufficient pixel sets via black-box interventions (masking/perturbation followed by output comparison) is not shown to guarantee minimality. Because detector outputs are discrete and thresholded, the procedure can return supersets or pixels whose effect is merely correlated; this directly affects the reported L0 reduction and the claim that attacks are 'fully explainable' because they manipulate causes only.

    Authors: We acknowledge that our causal identification procedure, as described in Section 3, does not include a formal proof or guarantee of minimality. The discrete and thresholded nature of object detector outputs means that the identified pixel sets may indeed be supersets or include correlated effects rather than purely causal minimal sets. This is a valid point that impacts the strength of our claims regarding L0 reductions and full explainability. In the revised version, we will update Section 3 to explicitly state that the procedure identifies empirically sufficient sets through interventions but does not guarantee strict minimality. We will also temper the language around 'fully explainable' to reflect that the attacks manipulate pixels that are sufficient to cause changes in the detector output based on our black-box interventions. This will be a partial revision as we clarify rather than fundamentally alter the algorithm. revision: partial

  2. Referee: [Experimental results] Experimental results (L0-norm comparison paragraph and associated table/figure): the reduction from 0.987 to 0.072 when BlackCAtt is used with SquareAttack is presented without the number of images, number of independent runs, variance, or statistical test for the success-rate equivalence. These details are load-bearing for the meta-algorithm claim.

    Authors: We agree that the experimental details for the L0-norm comparison are insufficient as presented. The results are derived from evaluations on the standard COCO dataset or similar benchmarks used in the paper, but we did not report the exact number of images tested, the number of independent runs, variance measures, or perform statistical tests to support the equivalence in success rates. We will revise the relevant paragraph, table, and figure captions to include: the number of images (e.g., 1000 images from the validation set), number of independent runs (e.g., 3 runs with different random seeds), standard deviation or variance for the L0 norms, and a statistical test (such as Wilcoxon signed-rank test) confirming that the success rate remains statistically equivalent while L0 is significantly reduced. This will strengthen the meta-algorithm claim. revision: yes
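
The kind of paired test named in the response can be sketched with the standard library alone. The code below is a hand-rolled Wilcoxon signed-rank test using a normal approximation, run on synthetic numbers that are not results from the paper. The sample size of 30, the noise levels, and all variable names are assumptions for illustration. It covers only the L0 comparison, not the success-rate equivalence.

```python
import math
import random


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test with a normal approximation.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and tied
    absolute differences share their average rank. No tie or continuity
    correction is applied, so the p-value is approximate.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value


# Synthetic per-image L0 values (NOT data from the paper).
rng = random.Random(0)
baseline_l0 = [0.98 + rng.gauss(0, 0.01) for _ in range(30)]
combined_l0 = [0.07 + rng.gauss(0, 0.02) for _ in range(30)]

w_plus, w_minus, p = wilcoxon_signed_rank(baseline_l0, combined_l0)
print(w_plus, w_minus, p)
```

A maintained statistics library would normally be preferable to this hand-written version, since such libraries also offer exact p-values for small samples.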

Circularity Check

0 steps flagged

No circularity; algorithmic method evaluated on external benchmarks


The paper presents BlackCAtt as a black-box algorithmic procedure for identifying and manipulating minimal causally sufficient pixel sets using only detector outputs such as bounding box position, label, or confidence. It evaluates performance on standard benchmarks, reports empirical comparisons to other black-box methods (e.g., L0-norm reduction when combined with SquareAttack), and performs ablation studies. No equations, derivations, or claims are shown that reduce by construction to fitted parameters, self-definitions, or self-citation chains. The method is self-contained against external benchmarks with no load-bearing steps that loop back to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the domain assumption that causal sufficiency of pixel sets can be determined from black-box queries alone; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption Causal relationships between pixel changes and detector outputs can be identified using only black-box access to bounding box position, label, or confidence.
    This underpins the construction of minimal sufficient sets and the explainability claim.

pith-pipeline@v0.9.0 · 5605 in / 1303 out tokens · 78896 ms · 2026-05-17T02:30:52.862689+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models

    cs.SD 2026-04 unverdicted novelty 7.0

    Transferability analysis finds that minimal sufficient signals transfer across audio models at rates varying by task, around 26% for music genre classification, with some deepfake models showing distinct behaviors not...

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Square attack: a query-efficient black-box adversarial attack via random search

    Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. In European conference on computer vision, pages 484–501. Springer,

  2. [2]

    On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation

    Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS One, 10(7), 2015.

  3. [3]

    Face: Faithful automatic concept extraction

    Dipkamal Bhusal, Michael Clifford, Sara Rampazzi, and Nidhi Rastogi. Face: Faithful automatic concept extraction. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.

  4. [4]

    Context-aware transfer attacks for object detection

    Zikui Cai, Xinxin Xie, Shasha Li, Mingjun Yin, Chengyu Song, Srikanth V Krishnamurthy, Amit K Roy-Chowdhury, and M Salman Asif. Context-aware transfer attacks for object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 149–157, 2022.

  5. [5]

    Hana Chockler and Joseph Y. Halpern. Responsibility and blame: A structural-model approach. J. Artif. Intell. Res., 22:93–115, 2004.

  6. [6]

    Hana Chockler and Joseph Y. Halpern. Explaining image classifiers, 2024.

  7. [7]

    Causal explanations for image classifiers

    Hana Chockler, David A Kelly, Daniel Kroening, and Youcheng Sun. Causal explanations for image classifiers. arXiv preprint arXiv:2411.08875, 2024.

  8. [8]

    Multiple different explanations for image classifiers

    Hana Chockler, David A. Kelly, and Daniel Kroening. Multiple different explanations for image classifiers. In ECAI European Conference on Artificial Intelligence, 2025.

  9. [9]

    Sparse and imperceivable adversarial attacks

    Francesco Croce and Matthias Hein. Sparse and imperceivable adversarial attacks. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4724–4732, 2019.

  10. [10]

    Saliency attack: Towards imperceptible black-box adversarial attack

    Zeyu Dai, Shengcai Liu, Qing Li, and Ke Tang. Saliency attack: Towards imperceptible black-box adversarial attack. ACM Transactions on Intelligent Systems and Technology, 14(3), 2023.

  11. [11]

    Lee R. Dice. Measures of the amount of ecologic association between species. Ecology, 26:297–302, 1945.

  12. [12]

    On the connection between adversarial robustness and saliency map interpretability

    Christian Etmann, Sebastian Lunz, Peter Maass, and Carola Schoenlieb. On the connection between adversarial robustness and saliency map interpretability. In International Conference on Machine Learning, pages 1823–1832. PMLR,

  13. [13]

    Fastjsma: Accelerating jacobian-based saliency map attacks through gradient decoupling

    Zhenghao Gao, Shengjie Xu, Zijing Li, Meixi Chen, Chaojian Yu, Yuanjie Shao, and Changxin Gao. Fastjsma: Accelerating jacobian-based saliency map attacks through gradient decoupling. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1506–1515,

  14. [14]

    Saliency methods for explaining adversarial attacks

    Jindong Gu and Volker Tresp. Saliency methods for explaining adversarial attacks, 2019.

  15. [15]

    Joseph Y. Halpern. A modification of the Halpern–Pearl definition of causality. In Proceedings of IJCAI, pages 3022–

  16. [16]

    AAAI Press, 2015.

  17. [17]

    Actual Causality

    Joseph Y. Halpern. Actual Causality. The MIT Press, 2019.

  18. [18]

    On relating explanations and adversarial examples

    Alexey Ignatiev, Nina Narodytska, and Joao Marques-Silva. On relating explanations and adversarial examples. Advances in neural information processing systems, 32, 2019.

  19. [19]

    Black-box adversarial attacks with limited queries and information

    Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In International conference on machine learning, pages 2137–2146. PMLR, 2018.

  20. [20]

    FastSHAP: Real-Time Shapley Value Estimation

    Neil Jethani, Mukund Sudarshan, Ian Connick Covert, Su-In Lee, and Rajesh Ranganath. FastSHAP: Real-Time Shapley Value Estimation. In International Conference on Learning Representations, 2022.

  21. [21]

    FastSHAP: Real-time shapley value estimation

    Neil Jethani, Mukund Sudarshan, Ian Connick Covert, Su-In Lee, and Rajesh Ranganath. FastSHAP: Real-time shapley value estimation. In International Conference on Learning Representations, 2022.

  22. [22]

    Comparing the decision-making mechanisms by transformers and cnns via explanation methods

    Mingqi Jiang, Saeed Khorram, and Li Fuxin. Comparing the decision-making mechanisms by transformers and cnns via explanation methods. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9546–9555, 2024.

  23. [23]

    Causal identification of sufficient, contrastive and complete feature sets in image classification

    David A Kelly and Hana Chockler. Causal identification of sufficient, contrastive and complete feature sets in image classification. arXiv preprint arXiv:2507.23497, 2025.

  24. [24]

    Fast Explanation Using Shapley Value for Object Detection

    Michihiro Kuroki and Toshihiko Yamasaki. Fast Explanation Using Shapley Value for Object Detection. IEEE Access, 12:31047–31054, 2024.

  25. [25]

    Adversarial attacks and defenses: An interpretation perspective

    Ninghao Liu, Mengnan Du, Ruocheng Guo, Huan Liu, and Xia Hu. Adversarial attacks and defenses: An interpretation perspective. ACM SIGKDD Explorations Newsletter, 23(1):86–99, 2021.

  26. [26]

    SSD: single shot multibox detector

    Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: single shot multibox detector. In Proceedings of European Conference in Computer Vision ECCV, Part I, pages 21–37. Springer, 2016.

  27. [27]

    DPatch: An Adversarial Patch Attack on Object Detectors

    Xin Liu, Huanrui Yang, Ziwei Liu, Linghao Song, Hai Li, and Yiran Chen. Dpatch: An adversarial patch attack on object detectors. arXiv preprint arXiv:1806.02299, 2018.

  28. [28]

    A unified approach to interpreting model predictions

    Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (NeurIPS), pages 4765–4774,

  29. [29]

    Detrs beat yolos on real-time object detection

    Wenyu Lv, Shangliang Xu, Yian Zhao, Guanzhong Wang, Jinman Wei, Cheng Cui, Yuning Du, Qingqing Dang, and Yi Liu. Detrs beat yolos on real-time object detection, 2023.

  30. [30]

    On saliency maps and adversarial robustness

    Puneet Mangla, Vedant Singh, and Vineeth N Balasubramanian. On saliency maps and adversarial robustness. In joint European conference on machine learning and knowledge discovery in databases, pages 272–288. Springer, 2020.

  31. [31]

    Relative attributing propagation: Interpreting the comparative contributions of individual units in deep neural networks

    Woo-Jeoung Nam, Shir Gur, Jaesik Choi, Lior Wolf, and Seong-Whan Lee. Relative attributing propagation: Interpreting the comparative contributions of individual units in deep neural networks. In AAAI Conference on Artificial Intelligence, pages 2501–2508, 2020.

  32. [32]

    Simple black-box adversarial attacks on deep neural networks

    Nina Narodytska and Shiva Kasiviswanathan. Simple black-box adversarial attacks on deep neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1310–1318, 2017.

  33. [33]

    A survey and evaluation of adversarial attacks for object detection

    Khoi Nguyen Tiet Nguyen, Wenyu Zhang, Kangkang Lu, Yuhuan Wu, Xingjian Zheng, Hui Li Tan, and Liangli Zhen. A survey and evaluation of adversarial attacks for object detection, 2025.

  34. [34]

    A survey and evaluation of adversarial attacks in object detection

    Khoi Nguyen Tiet Nguyen, Wenyu Zhang, Kangkang Lu, Yu-Huan Wu, Xingjian Zheng, Hui Li Tan, and Liangli Zhen. A survey and evaluation of adversarial attacks in object detection. IEEE Transactions on Neural Networks and Learning Systems, 36(9):15706–15722, 2025.

  35. [35]

    The limitations of deep learning in adversarial settings

    Nicolas Papernot, Patrick Mcdaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387, 2015.

  36. [36]

    The Limitations of Deep Learning in Adversarial Settings

    Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. CoRR, abs/1511.07528, 2015.

  37. [37]

    RISE: randomized input sampling for explanation of black-box models

    Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: randomized input sampling for explanation of black-box models. In British Machine Vision Conference (BMVC). BMVA Press,

  38. [38]

    Black-box explanation of object detectors via saliency maps

    Vitali Petsiuk, Rajiv Jain, Varun Manjunatha, Vlad I. Morariu, Ashutosh Mehra, Vicente Ordonez, and Kate Saenko. Black-box explanation of object detectors via saliency maps. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11438–11447, 2021.

  39. [39]

    You only look once: Unified, real-time object detection

    Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of CVPR, pages 779–788,

  40. [40]

    “Why should I trust you?” Explaining the predictions of any classifier

    Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why should I trust you?” Explaining the predictions of any classifier. In Knowledge Discovery and Data Mining (KDD), pages 1135–1144. ACM, 2016.

  41. [41]

    Do explanations expose bias? how saliency maps affect judgements of biased face-recognition models

    Justyn Rodrigues, Kris Ehinger, Oliver Obst, and Rosalind Wang. Do explanations expose bias? how saliency maps affect judgements of biased face-recognition models. In Proceedings of the 25th European Conference on Artificial Intelligence (ECAI 2025), 2025.

  42. [42]

    Grad-CAM: Visual explanations from deep networks via gradient-based localization

    Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In International Conference on Computer Vision (ICCV), pages 618–626. IEEE, 2017.

  43. [43]

    Learning important features through propagating activation differences

    Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In International Conference on Machine Learning (ICML), pages 3145–3153. PMLR, 2017.

  44. [44]

    Striving for simplicity: The all convolutional net

    Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin A. Riedmiller. Striving for simplicity: The all convolutional net. In ICLR (Workshop Track), 2015.

  45. [45]

    One pixel attack for fooling deep neural networks

    Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828–841,

  46. [46]

    Axiomatic attribution for deep networks

    Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.

  47. [47]

    Adversarial sample detection for deep neural network through model mutation testing

    Jingyi Wang, Guoliang Dong, Jun Sun, Xinyu Wang, and Peixin Zhang. Adversarial sample detection for deep neural network through model mutation testing. In Proceedings of the 41st International Conference on Software Engineering. ACM, 2019.

  48. [48]

    Adversarial example detection based on saliency map features

    Shen Wang and Yuxin Gong. Adversarial example detection based on saliency map features. Applied Intelligence, 52(6):6262–6275, 2022.

  49. [49]

    Adversarial attention perturbations for large object detection transformers

    Zachary Yahn, Selim Furkan Tekin, Fatih Ilhan, Sihao Hu, Tiansheng Huang, Yichang Xu, Margaret Loper, and Ling Liu. Adversarial attention perturbations for large object detection transformers. ArXiv, abs/2508.02987, 2025.

  50. [50]

    Saliency Maps Give a False Sense of Explanability to Image Classifiers: An empirical evaluation across methods and metrics

    Hanwei Zhang, Felipe Torres Figueroa, and Holger Hermanns. Saliency Maps Give a False Sense of Explanability to Image Classifiers: An empirical evaluation across methods and metrics. In Proceedings of the 16th Asian Conference on Machine Learning, pages 479–494. PMLR, 2025.

  51. [51]

    The unreasonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.

Supplementary Materials

Models: The different models used were (i) single-stage real-time detectors (YOLOv11 checkpoint: “yolo11n.pt”), (ii) two-stage (Faster-R-CNN using the standard CO...