Out-of-the-box: Black-box Causal Attacks on Object Detectors

David A. Kelly; Hana Chockler; Melane Navaratnarajah

arxiv: 2512.03730 · v2 · submitted 2025-12-03 · 💻 cs.CV · cs.AI


Pith reviewed 2026-05-17 02:30 UTC · model grok-4.3


keywords black-box adversarial attacks · object detectors · causal pixel sets · adversarial perturbations · explainable attacks · computer vision security · imperceptible attacks


The pith

BlackCAtt identifies minimal causally sufficient pixel sets to generate smaller, explainable attacks on object detectors from black-box outputs alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BlackCAtt, which locates the smallest groups of pixels that directly cause an object detector to produce a particular output. These groups are found using only the detector's visible results such as bounding-box coordinates, class labels, or confidence values. Changing just those pixels produces attacks that match or surpass other black-box methods in effectiveness while remaining fully explainable because the changes target explicit causes. When confidence scores are also available, BlackCAtt can be combined with existing attack algorithms to shrink the number of altered pixels without lowering success rates. This matters for developers who want to understand why detectors fail and to create more precise ways to test and strengthen them.

Core claim

BlackCAtt identifies minimal causally sufficient pixel sets from black-box detector outputs to construct explainable, imperceptible, reproducible, and architecture-agnostic attacks. With access only to bounding-box position and label, the attacks are comparable to or better than those from other black-box methods. With added access to model confidence, it functions as a meta-algorithm that reduces perturbation size, for instance lowering the average L0 norm from 0.987 to 0.072 when paired with SquareAttack while preserving a similar success rate.

What carries the argument

Minimal causally sufficient pixel sets, identified from detector outputs and perturbed to produce targeted failures in detection results.
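As a rough illustration of what identifying a sufficient pixel set from outputs alone can look like, the sketch below removes pixels one at a time and keeps a removal only when a black-box query still returns the same detection. It is not the BlackCAtt procedure: `toy_detector`, `is_sufficient` and `prune_to_sufficient_set` are hypothetical names, the detector is a stand-in that fires when three fixed pixels are visible, and a single greedy pass of this kind only guarantees that no single remaining pixel can be dropped.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Set, Tuple

Pixel = Tuple[int, int]
# A detection is (label, bounding box) or None when nothing is detected.
Detection = Optional[Tuple[str, Tuple[int, int, int, int]]]


def toy_detector(visible: Set[Pixel]) -> Detection:
    """Hypothetical black box: reports a 'cat' only if three key pixels are visible."""
    key_pixels = {(1, 1), (1, 2), (2, 2)}
    if key_pixels <= visible:
        return ("cat", (1, 1, 2, 2))
    return None


def is_sufficient(detector: Callable[[Set[Pixel]], Detection],
                  pixels: Iterable[Pixel],
                  target: Detection) -> bool:
    """A set is treated as sufficient if showing only those pixels keeps the output."""
    return detector(set(pixels)) == target


def prune_to_sufficient_set(detector: Callable[[Set[Pixel]], Detection],
                            all_pixels: Set[Pixel]) -> FrozenSet[Pixel]:
    """Greedy single pass: drop each pixel whose removal leaves the output unchanged."""
    target = detector(set(all_pixels))
    kept = set(all_pixels)
    for pixel in sorted(all_pixels):
        trial = kept - {pixel}
        if is_sufficient(detector, trial, target):
            kept = trial
    return frozenset(kept)


image = {(row, col) for row in range(4) for col in range(4)}
msps = prune_to_sufficient_set(toy_detector, image)
print(sorted(msps))  # [(1, 1), (1, 2), (2, 2)]

# Hiding any one pixel of the recovered set removes the detection.
print(toy_detector(image - {(1, 1)}))  # None
```

On a real detector the equality check would likely need a tolerance on box position (for example an overlap threshold), which is an assumption this page does not spell out.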

If this is right

Attacks become fully explainable because they manipulate only the pixels that cause the detector's output.
Using only position and label information yields attacks that are comparable to or better than those from other black-box techniques.
When model confidence is available, BlackCAtt reduces the size of perturbations from standard methods such as SquareAttack while keeping similar success rates.
Ablation studies show that each component of the algorithm contributes measurably to attack quality.
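The L0 figures quoted on this page read most naturally as the fraction of pixels an attack changes, although that normalisation is an assumption here rather than something the page states. Under that reading, a minimal sketch of the metric, using the hypothetical helper `l0_fraction` and toy grayscale images, could be:

```python
from typing import List

Image = List[List[int]]  # toy grayscale image, one int per pixel


def l0_fraction(original: Image, attacked: Image) -> float:
    """Fraction of pixel positions whose value differs between the two images."""
    total = 0
    changed = 0
    for row_o, row_a in zip(original, attacked):
        for value_o, value_a in zip(row_o, row_a):
            total += 1
            if value_o != value_a:
                changed += 1
    return changed / total


original = [[0] * 10 for _ in range(10)]

# A dense attack touches every pixel.
dense = [[1] * 10 for _ in range(10)]

# A sparse attack touches only three pixels.
sparse = [row[:] for row in original]
for r, c in [(0, 0), (3, 4), (9, 9)]:
    sparse[r][c] = 255

print(l0_fraction(original, dense))   # 1.0
print(l0_fraction(original, sparse))  # 0.03
```

The values are toy numbers chosen to show the scale of the metric, not results from the paper.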

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The causal-pixel approach could be applied to design defenses that protect the most influential image regions rather than the entire input.
The same identification process might be tested on other vision tasks such as segmentation or classification to check whether causal sets remain small and useful.
Developers could use the method to audit deployed detectors for hidden causal weaknesses without needing model weights or gradients.

Load-bearing premise

Minimal causally sufficient pixel sets can be reliably identified and effectively altered using only black-box outputs such as bounding-box position, label, or confidence.

What would settle it

An experiment in which perturbing the identified causal pixel sets produces no change in the detector's outputs or in which the resulting attacks are larger or less successful than those generated by baseline black-box methods.

Figures

Figures reproduced from arXiv: 2512.03730 by David A. Kelly, Hana Chockler, Melane Navaratnarajah.

**Figure 1.** The MSPS for cat (Figure 1b) reveals a dependency on the surrounding context. BlackCAtt starts with causal pixels outside of the bounding box and works inwards in order to maximize imperceptibility. In both Figures 1c and 1d the cat is still clearly present and complete, but YOLO no longer detects the cat. The attack works because BlackCAtt changes part of the cause of the detection. BlackCAtt is model agn…

**Figure 2.** The DC between bounding box and MSPS stays almost…

**Figure 3.** Causally explainable adversarial attacks on…

**Figure 4.** Example of a trial in BlackCAttMoG. From top-left to bottom-right: original image overlaid with the responsibility for inside-MSPS and bbox, the top 7 peaks extracted, fitted MoG mask and, finally, the attacked image with no detection. This perturbs the image at the location of the peak with the intensity indicated by P(X).

**Figure 5.** Success rate of different approaches in adding new spurious detection, with different models on COCO dataset, for different…

**Figure 6.** no-prediction, change prediction, add prediction with YOLO on the COCO dataset, showing the distribution of LPIPS and L2 distances for the three most successful methods. The problem of over-determination is well known in the literature of causality [16]. As Chockler et al. [8] show for image classifiers, many images have multiple, independent, MSPSs. We know of no comparable work on OD, so we restrict our…

**Figure 7.** Mutations on the pixels that are a part of the MSPS but not in the bounding box (zoomed-in view).

**Figure 8.** Single step attack on a car using YOLO. Shows attacking the MSPS that is inside the bounding box and outside the bounding box.

**Figure 9.** Success rate of different approaches in removing a detection, with different models on COCO dataset, for different thresholds with L0, L1, L2, LPIPS, SSIM. The different techniques are noise, targeted noise, blended, DRISEMoG and MoG.

**Figure 10.** Success rate of different approaches in changing the label of the detection, with different models on COCO dataset, for different thresholds with L0, L1, L2, LPIPS, SSIM. The different techniques are noise, targeted noise, blended, DRISEMoG and MoG.

**Figure 11.** Success rate of different approaches in adding new spurious detection, with different models on COCO dataset, for different thresholds with L0, L1, L2, LPIPS, SSIM. The different techniques are noise, targeted noise, blended, DRISEMoG and MoG.

**Figure 12.** Complete results comparing L2 against confidence for Faster-R-CNN, YOLO and RT-DETR. The different approaches are noise, targeted noise, blended, DRISEMoG and MoG.

**Figure 13.** (a) Original image and detector bbox; (b–d) responsibility heatmaps (same min/max scale) used for BlackCAtt

**Figure 14.** (a) Original image and detector bbox; (b–d) responsibility heatmaps (same min/max scale) used for BlackCAtt
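Figure 2 refers to the DC between the bounding box and the MSPS. Assuming DC stands for the Dice coefficient (the reference list includes Dice's 1945 paper) and treating both regions as sets of pixel coordinates, a small sketch of the overlap measure might look like this. The helpers `dice_coefficient` and `bbox_pixels` are hypothetical names, and the two regions are toy rectangles.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def dice_coefficient(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Dice overlap 2|A ∩ B| / (|A| + |B|); defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def bbox_pixels(x0: int, y0: int, x1: int, y1: int) -> Set[Pixel]:
    """All integer pixel coordinates inside an inclusive axis-aligned box."""
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


bbox = bbox_pixels(2, 2, 5, 5)  # 16 pixels
msps = bbox_pixels(4, 4, 7, 7)  # 16 pixels, 4 of them shared with bbox

print(dice_coefficient(bbox, msps))  # 0.25
print(dice_coefficient(bbox, bbox))  # 1.0
```

A value well below 1 would indicate that much of the pixel set lies outside the box, which fits the Figure 1 caption's remark about dependence on surrounding context.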

Original abstract

Adversarial perturbations are a useful way to expose vulnerabilities in object detectors. Existing perturbation methods are frequently white-box, architecture-specific and use a loss function. More importantly, while they are often successful, it is rarely clear why they work. Insights into the mechanism of this success would allow developers to understand and analyze these attacks, as well as fine-tune the model to prevent them. This paper presents BlackCAtt, a black-box algorithm and tool, which uses minimal, causally sufficient pixel sets to construct explainable, imperceptible, reproducible, architecture-agnostic attacks on object detectors. We evaluate BlackCAtt on standard benchmarks and compare it to other black-box adversarial attack methods. When BlackCAtt has access only to the position and label of a bounding box, it produces attacks that are comparable to or better than those produced by other black-box methods. When BlackCAtt has access to the model confidence as well, it can work as a meta-algorithm, improving the ability of standard black-box techniques to construct smaller, less perceptible attacks. As BlackCAtt attacks manipulate causes only, the attacks become fully explainable. We compare the performance of BlackCAtt with other black-box attack methods and show that targeting causal pixels leads to smaller and less perceptible attacks. For example, when using BlackCAtt with SquareAttack, it reduces the average distance ($L_0$ norm) of the attack from the original input from $0.987$ to $0.072$, while maintaining a similar success rate. We perform ablation studies on the BlackCAtt algorithm and analyze the effect of different components on its performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

BlackCAtt frames black-box attacks around minimal causally sufficient pixels and can shrink L0 norms when wrapped around methods like SquareAttack, but the recovery of those exact causal sets from discrete outputs is the part that needs checking.


BlackCAtt identifies minimal sets of pixels that causally determine an object detector's bounding box outputs, then attacks by changing just those pixels. With only position and label info it matches or beats other black-box attacks. Adding confidence lets it improve existing methods, cutting average L0 distance from 0.987 down to 0.072 at similar success rates.

The new part is framing the attack around causal sufficiency rather than pure optimization. This gives a clearer story for why the attack works and makes the changes more reproducible and explainable. The meta-algorithm use is practical if it holds up, and they back it with benchmark comparisons plus ablations on the algorithm's parts.

The weak point is verifying that the pixel sets are truly minimal and causal from black-box queries alone. Detector outputs are thresholded, so small changes often do nothing until a boundary is crossed. That makes it hard to isolate exact causal pixels without over-including or picking correlated ones instead. The abstract gives numbers but skips details on the search procedure, controls, or tests for minimality. If that step is loose, the size and explainability benefits shrink.

This is for people building or testing robust vision systems who need black-box tools that also offer some insight into what the model is using. It has enough concrete results and a distinct angle to warrant a full review, though the causal identification needs more scrutiny in the full text.

Referee Report

2 major / 2 minor

Summary. The paper introduces BlackCAtt, a black-box algorithm for adversarial attacks on object detectors that identifies minimal causally sufficient pixel sets to produce explainable perturbations. It claims that with access only to bounding-box position and label the method matches or exceeds other black-box attacks, and that when model confidence is also available it functions as a meta-algorithm that improves existing techniques (e.g., reducing average L0 norm from 0.987 to 0.072 while preserving success rate when combined with SquareAttack). The work is evaluated on standard benchmarks with ablation studies on algorithmic components.

Significance. If the central claim that minimal causal pixel sets can be recovered and edited from black-box detector outputs holds, the result would be significant for adversarial robustness research in computer vision. It offers an architecture-agnostic, gradient-free route to smaller, more interpretable attacks and could supply concrete diagnostic information for hardening detectors.

major comments (2)

§3 (causal identification procedure): the algorithm that recovers minimal causally sufficient pixel sets via black-box interventions (masking/perturbation followed by output comparison) is not shown to guarantee minimality. Because detector outputs are discrete and thresholded, the procedure can return supersets or pixels whose effect is merely correlated; this directly affects the reported L0 reduction and the claim that attacks are 'fully explainable' because they manipulate causes only.
Experimental results (L0-norm comparison paragraph and associated table/figure): the reduction from 0.987 to 0.072 when BlackCAtt is used with SquareAttack is presented without the number of images, number of independent runs, variance, or statistical test for the success-rate equivalence. These details are load-bearing for the meta-algorithm claim.

minor comments (2)

Abstract and §1: 'standard benchmarks' and the exact object detectors used should be named explicitly rather than left generic.
Notation: the formal definition of 'causally sufficient' for a pixel set relative to a detector output (position, label, or score) should be stated once, preferably with a short equation or set notation.
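One way the definition requested in the notation comment might read is sketched below. It is an illustration, not the authors' notation: the detector $f$, the image $x$, the baseline (masking) value $b$, and the masked image $x_S$ that keeps only the pixels in $S$ are all symbols introduced here as assumptions.

```latex
x_S(p) =
\begin{cases}
  x(p) & \text{if } p \in S,\\
  b    & \text{otherwise,}
\end{cases}
\qquad
S \text{ is sufficient for } f(x) \iff f(x_S) = f(x),
\qquad
S \text{ is minimal} \iff S \text{ is sufficient and no } S' \subsetneq S \text{ is sufficient.}
```

For a detector, the equality $f(x_S) = f(x)$ would probably need to be relaxed to agreement on the label plus box overlap above some threshold; that relaxation is likewise an assumption.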

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their detailed and insightful comments, which have helped us identify areas for improvement in the manuscript. We address each major comment below, indicating the revisions we plan to make.


Referee: §3 (causal identification procedure): the algorithm that recovers minimal causally sufficient pixel sets via black-box interventions (masking/perturbation followed by output comparison) is not shown to guarantee minimality. Because detector outputs are discrete and thresholded, the procedure can return supersets or pixels whose effect is merely correlated; this directly affects the reported L0 reduction and the claim that attacks are 'fully explainable' because they manipulate causes only.

Authors: We acknowledge that our causal identification procedure, as described in Section 3, does not include a formal proof or guarantee of minimality. The discrete and thresholded nature of object detector outputs means that the identified pixel sets may indeed be supersets or include correlated effects rather than purely causal minimal sets. This is a valid point that impacts the strength of our claims regarding L0 reductions and full explainability. In the revised version, we will update Section 3 to explicitly state that the procedure identifies empirically sufficient sets through interventions but does not guarantee strict minimality. We will also temper the language around 'fully explainable' to reflect that the attacks manipulate pixels that are sufficient to cause changes in the detector output based on our black-box interventions. This will be a partial revision as we clarify rather than fundamentally alter the algorithm.

revision: partial

Referee: Experimental results (L0-norm comparison paragraph and associated table/figure): the reduction from 0.987 to 0.072 when BlackCAtt is used with SquareAttack is presented without the number of images, number of independent runs, variance, or statistical test for the success-rate equivalence. These details are load-bearing for the meta-algorithm claim.

Authors: We agree that the experimental details for the L0-norm comparison are insufficient as presented. The results are derived from evaluations on the standard COCO dataset or similar benchmarks used in the paper, but we did not report the exact number of images tested, the number of independent runs, variance measures, or perform statistical tests to support the equivalence in success rates. We will revise the relevant paragraph, table, and figure captions to include: the number of images (e.g., 1000 images from the validation set), number of independent runs (e.g., 3 runs with different random seeds), standard deviation or variance for the L0 norms, and a statistical test (such as Wilcoxon signed-rank test) confirming that the success rate remains statistically equivalent while L0 is significantly reduced. This will strengthen the meta-algorithm claim.

revision: yes
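To make the kind of paired comparison discussed in this exchange concrete, the sketch below runs an exact sign-flip permutation test on per-image perturbation sizes. This is a deliberately simpler stand-in for the Wilcoxon signed-rank test named in the rebuttal, chosen because it needs only the standard library. The function `paired_sign_flip_p_value` is a hypothetical helper, and the numbers are made up for illustration; they are not results from the paper.

```python
from itertools import product
from statistics import mean
from typing import Sequence


def paired_sign_flip_p_value(before: Sequence[int], after: Sequence[int]) -> float:
    """Two-sided exact sign-flip test on paired differences.

    Under the null hypothesis each difference is equally likely to be
    positive or negative, so every sign pattern is enumerated and the
    fraction whose absolute sum is at least the observed one is returned.
    """
    diffs = [b - a for b, a in zip(before, after)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            extreme += 1
    return extreme / total


# Made-up counts of changed pixels per image (out of 100), same six images.
before = [90, 80, 95, 85, 90, 80]  # hypothetical baseline attack
after = [10, 20, 5, 15, 10, 10]    # hypothetical attack restricted to causal pixels

mean_reduction = mean(b - a for b, a in zip(before, after))
p_value = paired_sign_flip_p_value(before, after)
print(mean_reduction)  # 75
print(p_value)         # 0.03125
```

With only six pairs the smallest attainable two-sided p-value is 2/64, which is one reason the number of images matters for the claim under discussion.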

Circularity Check

0 steps flagged

No circularity; algorithmic method evaluated on external benchmarks


The paper presents BlackCAtt as a black-box algorithmic procedure for identifying and manipulating minimal causally sufficient pixel sets using only detector outputs such as bounding box position, label, or confidence. It evaluates performance on standard benchmarks, reports empirical comparisons to other black-box methods (e.g., L0-norm reduction when combined with SquareAttack), and performs ablation studies. No equations, derivations, or claims are shown that reduce by construction to fitted parameters, self-definitions, or self-citation chains. The method is self-contained against external benchmarks with no load-bearing steps that loop back to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the domain assumption that causal sufficiency of pixel sets can be determined from black-box queries alone; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: Causal relationships between pixel changes and detector outputs can be identified using only black-box access to bounding box position, label, or confidence.
This underpins the construction of minimal sufficient sets and the explainability claim.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models
cs.SD 2026-04 unverdicted novelty 7.0

Transferability analysis finds that minimal sufficient signals transfer across audio models at rates varying by task, around 26% for music genre classification, with some deepfake models showing distinct behaviors not...

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. In European conference on computer vision, pages 484–501. Springer,
[2] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS One, 10(7), 2015.
[3] Dipkamal Bhusal, Michael Clifford, Sara Rampazzi, and Nidhi Rastogi. Face: Faithful automatic concept extraction. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
[4] Zikui Cai, Xinxin Xie, Shasha Li, Mingjun Yin, Chengyu Song, Srikanth V Krishnamurthy, Amit K Roy-Chowdhury, and M Salman Asif. Context-aware transfer attacks for object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 149–157, 2022.
[5] Hana Chockler and Joseph Y. Halpern. Responsibility and blame: A structural-model approach. J. Artif. Intell. Res., 22:93–115, 2004.
[6] Hana Chockler and Joseph Y. Halpern. Explaining image classifiers, 2024.
[7] Hana Chockler, David A Kelly, Daniel Kroening, and Youcheng Sun. Causal explanations for image classifiers. arXiv preprint arXiv:2411.08875, 2024.
[8] Hana Chockler, David A. Kelly, and Daniel Kroening. Multiple different explanations for image classifiers. In ECAI European Conference on Artificial Intelligence, 2025.
[9] Francesco Croce and Matthias Hein. Sparse and imperceivable adversarial attacks. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4724–4732, 2019.
[10] Zeyu Dai, Shengcai Liu, Qing Li, and Ke Tang. Saliency attack: Towards imperceptible black-box adversarial attack. ACM Transactions on Intelligent Systems and Technology, 14(3), 2023.
[11] Lee R. Dice. Measures of the amount of ecologic association between species. Ecology, 26:297–302, 1945.
[12] Christian Etmann, Sebastian Lunz, Peter Maass, and Carola Schoenlieb. On the connection between adversarial robustness and saliency map interpretability. In International Conference on Machine Learning, pages 1823–1832. PMLR,
[13] Zhenghao Gao, Shengjie Xu, Zijing Li, Meixi Chen, Chaojian Yu, Yuanjie Shao, and Changxin Gao. Fastjsma: Accelerating jacobian-based saliency map attacks through gradient decoupling. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1506–1515,
[14] Jindong Gu and Volker Tresp. Saliency methods for explaining adversarial attacks, 2019.
[15] Joseph Y. Halpern. A modification of the Halpern–Pearl definition of causality. In Proceedings of IJCAI, pages 3022–… AAAI Press, 2015.
[17] Joseph Y. Halpern. Actual Causality. The MIT Press, 2019.
[18] Alexey Ignatiev, Nina Narodytska, and Joao Marques-Silva. On relating explanations and adversarial examples. Advances in neural information processing systems, 32, 2019.
[19] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In International conference on machine learning, pages 2137–2146. PMLR, 2018.
[20] Neil Jethani, Mukund Sudarshan, Ian Connick Covert, Su-In Lee, and Rajesh Ranganath. FastSHAP: Real-Time Shapley Value Estimation. In International Conference on Learning Representations, 2022.
[21] Neil Jethani, Mukund Sudarshan, Ian Connick Covert, Su-In Lee, and Rajesh Ranganath. FastSHAP: Real-time shapley value estimation. In International Conference on Learning Representations, 2022.
[22] Mingqi Jiang, Saeed Khorram, and Li Fuxin. Comparing the decision-making mechanisms by transformers and cnns via explanation methods. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9546–9555, 2024.
[23] David A Kelly and Hana Chockler. Causal identification of sufficient, contrastive and complete feature sets in image classification. arXiv preprint arXiv:2507.23497, 2025.
[24] Michihiro Kuroki and Toshihiko Yamasaki. Fast Explanation Using Shapley Value for Object Detection. IEEE Access, 12:31047–31054, 2024.
[25] Ninghao Liu, Mengnan Du, Ruocheng Guo, Huan Liu, and Xia Hu. Adversarial attacks and defenses: An interpretation perspective. ACM SIGKDD Explorations Newsletter, 23(1):86–99, 2021.
[26] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: single shot multibox detector. In Proceedings of European Conference in Computer Vision ECCV, Part I, pages 21–37. Springer, 2016.
[27] Xin Liu, Huanrui Yang, Ziwei Liu, Linghao Song, Hai Li, and Yiran Chen. Dpatch: An adversarial patch attack on object detectors. arXiv preprint arXiv:1806.02299, 2018.
[28] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (NeurIPS), pages 4765–4774,
[29] Wenyu Lv, Shangliang Xu, Yian Zhao, Guanzhong Wang, Jinman Wei, Cheng Cui, Yuning Du, Qingqing Dang, and Yi Liu. Detrs beat yolos on real-time object detection, 2023.
[30] Puneet Mangla, Vedant Singh, and Vineeth N Balasubramanian. On saliency maps and adversarial robustness. In joint European conference on machine learning and knowledge discovery in databases, pages 272–288. Springer, 2020.
[31] Woo-Jeoung Nam, Shir Gur, Jaesik Choi, Lior Wolf, and Seong-Whan Lee. Relative attributing propagation: Interpreting the comparative contributions of individual units in deep neural networks. In AAAI Conference on Artificial Intelligence, pages 2501–2508, 2020.
[32] Nina Narodytska and Shiva Kasiviswanathan. Simple black-box adversarial attacks on deep neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1310–1318, 2017.
[33] Khoi Nguyen Tiet Nguyen, Wenyu Zhang, Kangkang Lu, Yuhuan Wu, Xingjian Zheng, Hui Li Tan, and Liangli Zhen. A survey and evaluation of adversarial attacks for object detection, 2025.
[34] Khoi Nguyen Tiet Nguyen, Wenyu Zhang, Kangkang Lu, Yu-Huan Wu, Xingjian Zheng, Hui Li Tan, and Liangli Zhen. A survey and evaluation of adversarial attacks in object detection. IEEE Transactions on Neural Networks and Learning Systems, 36(9):15706–15722, 2025.
[35] Nicolas Papernot, Patrick Mcdaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387, 2015.
[36] Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. CoRR, abs/1511.07528, 2015.
[37] Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: randomized input sampling for explanation of black-box models. In British Machine Vision Conference (BMVC). BMVA Press,
[38] Vitali Petsiuk, Rajiv Jain, Varun Manjunatha, Vlad I. Morariu, Ashutosh Mehra, Vicente Ordonez, and Kate Saenko. Black-box explanation of object detectors via saliency maps. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11438–11447, 2021.
[39] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of CVPR, pages 779–788,
[40] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why should I trust you?” Explaining the predictions of any classifier. In Knowledge Discovery and Data Mining (KDD), pages 1135–1144. ACM, 2016.
[41] Justyn Rodrigues, Kris Ehinger, Oliver Obst, and Rosalind Wang. Do explanations expose bias? how saliency maps affect judgements of biased face-recognition models. In Proceedings of the 25th European Conference on Artificial Intelligence (ECAI 2025), 2025.
[42] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In International Conference on Computer Vision (ICCV), pages 618–626. IEEE, 2017.
[43] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In International Conference on Machine Learning (ICML), pages 3145–3153. PMLR, 2017.
[44] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin A. Riedmiller. Striving for simplicity: The all convolutional net. In ICLR (Workshop Track), 2015.
[45] Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828–841,
[46] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
[47]

Adversarial sample detection for deep neural network through model mutation testing

Jingyi Wang, Guoliang Dong, Jun Sun, Xinyu Wang, and Peixin Zhang. Adversarial sample detection for deep neural network through model mutation testing. InProceedings of the 41st International Conference on Software Engineering. ACM, 2019. 8

[48] Shen Wang and Yuxin Gong. Adversarial example detection based on saliency map features. Applied Intelligence, 52(6):6262–6275, 2022.

[49] Zachary Yahn, Selim Furkan Tekin, Fatih Ilhan, Sihao Hu, Tiansheng Huang, Yichang Xu, Margaret Loper, and Ling Liu. Adversarial attention perturbations for large object detection transformers. ArXiv, abs/2508.02987, 2025.

[50] Hanwei Zhang, Felipe Torres Figueroa, and Holger Hermanns. Saliency Maps Give a False Sense of Explanability to Image Classifiers: An empirical evaluation across methods and metrics. In Proceedings of the 16th Asian Conference on Machine Learning, pages 479–494. PMLR, 2025.

[51] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.

Supplementary Materials

Models. The different models used were (i) single-stage real-time detectors (YOLOv11 checkpoint: “yolo11n.pt”), (ii) two-stage (Faster-R-CNN using the standard CO...

