pith. machine review for the scientific record.

arxiv: 2604.22552 · v1 · submitted 2026-04-24 · 💻 cs.CV

Recognition: unknown

Transferable Physical-World Adversarial Patches Against Pedestrian Detection Models

Minghui Li, Shengshan Hu, Shihui Yan, Yifan Hu, Yufei Song, Ziqi Zhou


Pith reviewed 2026-05-08 12:23 UTC · model grok-4.3

classification 💻 cs.CV
keywords adversarial patch · pedestrian detection · physical world attacks · multi-stage attack · transferable patch · triplet loss

The pith

TriPatch generates physical adversarial patches that attack multiple stages of the pedestrian detection pipeline while remaining robust to physical variation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a method called TriPatch to create patches that can be printed and used to make pedestrian detectors fail in the real world. Existing patches often disrupt only one part of the detection process, letting the remaining stages compensate, and they do not hold up when lighting or viewing angle changes. TriPatch addresses this with a loss that hits confidence, box position, and the final selection step (NMS) all at once, while keeping the patch's appearance stable and training under varied conditions. A reader would care because these systems are used in cars and surveillance cameras, so stronger attacks reveal how to make them safer.

Core claim

The central discovery is that a triplet loss targeting detection confidence suppression, bounding-box offset amplification, and NMS disruption, combined with appearance consistency loss and data augmentation, enables generation of more effective and robust physical adversarial patches that achieve higher attack success rates across multiple detector models.

What carries the argument

The triplet loss that jointly disrupts confidence scores, bounding box predictions, and non-maximum suppression in the detection pipeline, supported by consistency constraints and physical augmentations.
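
The sketch below shows one way such a joint objective might be assembled in PyTorch. It is a minimal illustration, not the authors' code: the component definitions (max-confidence suppression, IoU minimization against ground truth, and a pairwise-overlap surrogate for NMS disruption) are plausible instantiations of the three stages named above, the weights mirror the λ coefficients of the paper's Eq. 2, the appearance term is omitted, and the names tripatch_style_loss and pairwise_iou are hypothetical helpers defined here.

    import torch

    def tripatch_style_loss(obj_scores, boxes, target_boxes,
                            w_det=1.0, w_iou=1.0, w_nms=0.5):
        # Illustrative multi-stage adversarial objective, minimized with
        # respect to the patch pixels. Not the paper's exact formulation.
        #   obj_scores:   (N,) pedestrian confidence per candidate box
        #   boxes:        (N, 4) predicted boxes, (x1, y1, x2, y2)
        #   target_boxes: (M, 4) ground-truth pedestrian boxes

        # Stage 1: confidence suppression -- drive the best score down.
        l_det = obj_scores.max()

        # Stage 2: box-offset amplification -- shrink each prediction's
        # best IoU with any ground-truth box, so surviving boxes localize
        # the pedestrian badly.
        l_iou = pairwise_iou(boxes, target_boxes).max(dim=1).values.mean()

        # Stage 3: NMS disruption (surrogate) -- reduce overlap among the
        # predictions themselves, so NMS cannot consolidate them into one
        # confident detection.
        l_nms = pairwise_iou(boxes, boxes).triu(diagonal=1).mean()

        return w_det * l_det + w_iou * l_iou + w_nms * l_nms

    def pairwise_iou(a, b):
        # IoU between every box in a (N, 4) and every box in b (M, 4),
        # xyxy format; returns an (N, M) tensor.
        area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
        area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        lt = torch.max(a[:, None, :2], b[None, :, :2])
        rb = torch.min(a[:, None, 2:], b[None, :, 2:])
        wh = (rb - lt).clamp(min=0)
        inter = wh[..., 0] * wh[..., 1]
        return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)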

If this is right

  • The attacks become more transferable to different pedestrian detection models.
  • The patches maintain effectiveness despite physical variations in the environment.
  • Residual modules in detectors are less able to compensate for the perturbations.
  • Overall attack success rates increase compared to previous single-stage or non-augmented methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Security systems relying on pedestrian detection may require new countermeasures that protect all pipeline stages simultaneously.
  • Similar approaches could apply to other detection tasks like vehicle or object recognition in autonomous systems.
  • Further work could test the patches' performance on specific hardware setups used in real deployments.

Load-bearing premise

The simulated effects of the triplet loss and augmentations will carry over to actual physical patches without a large drop in performance due to real-world factors like printing errors or sensor differences.
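
A minimal sketch of the expectation-over-transformation style training this premise assumes: random photometric perturbations are applied to the patch each iteration so the optimized pattern survives lighting and print variation in expectation. The transform set, ranges, and the name augment_patch are illustrative assumptions, not the paper's exact augmentation pipeline.

    import torch

    def augment_patch(patch):
        # patch: (3, H, W) tensor in [0, 1].
        # Brightness/contrast jitter approximates lighting changes;
        # additive Gaussian noise approximates sensor and printing error.
        brightness = 1.0 + 0.3 * (2.0 * torch.rand(1) - 1.0)
        contrast = 1.0 + 0.2 * (2.0 * torch.rand(1) - 1.0)
        out = ((patch - 0.5) * contrast + 0.5) * brightness
        out = out + 0.02 * torch.randn_like(out)
        return out.clamp(0.0, 1.0)

    # During optimization the detector loss is averaged over many random
    # draws, so gradients favor patches that work in expectation over
    # physical conditions rather than for one fixed rendering.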

What would settle it

Printing the generated patches and attaching them to real pedestrians, then recording whether multiple detection models miss them under varied outdoor lighting, distances, and angles, and comparing the miss rates to those from earlier patch designs.
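
Schematically, that comparison reduces to a miss-rate computation per detector. The helper below is hypothetical and simplifies the attack-success-rate definition relative to whatever protocol the paper uses; it only fixes the bookkeeping.

    def attack_success_rate(person_scores_per_image, conf_threshold=0.5):
        # One inner list per patched test image, holding the detector's
        # 'person' confidence scores for that image. An image counts as a
        # successful attack if no score clears the threshold.
        misses = sum(
            1 for scores in person_scores_per_image
            if all(s < conf_threshold for s in scores)
        )
        return misses / max(len(person_scores_per_image), 1)

    # Repeat per detector (YOLO variants, Faster R-CNN, SSD, ...) and per
    # patch design; consistently higher rates for TriPatch under varied
    # lighting, distance, and angle would settle the claim.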

Figures

Figures reproduced from arXiv: 2604.22552 by Minghui Li, Shengshan Hu, Shihui Yan, Yifan Hu, Yufei Song, Ziqi Zhou.

Figure 1
Figure 1: Overview of adversarial examples on object detection. view at source ↗
Figure 2
Figure 2: The pipeline of TriPatch. The total objective optimized on the adversarial image x_adv combines three primary losses and one appearance regularizer: L_total = λ_det·L_det + λ_iou·L_iou + λ_nms·L_nms + λ_app·L_app (Eq. 2), where λ_det, λ_iou, λ_nms, and λ_app are weighting coefficients that balance the contribution of each loss component. view at source ↗
Figure 3
Figure 3: Digital-World Adversarial Attack Results of Our Method. view at source ↗
Figure 4
Figure 4: Physical attack demo of TriPatch. view at source ↗
Figure 5
Figure 5: The results (%) of the ablation study. (a)-(d) investigate the effect of different epochs, patch sizes, modules, and loss… view at source ↗
Figure 7
Figure 7: Performance consistency across multiple random… view at source ↗
read the original abstract

Physical adversarial patch attacks critically threaten pedestrian detection, causing surveillance and autonomous driving systems to miss pedestrians and creating severe safety risks. Despite their effectiveness in controlled settings, existing physical attacks face two major limitations in practice: they lack systematic disruption of the multi-stage decision pipeline, enabling residual modules to offset perturbations, and they fail to model complex physical variations, leading to poor robustness. To overcome these limitations, we propose a novel pedestrian adversarial patch generation method that combines multi-stage collaborative attacks with robustness enhancement under physical diversity, called TriPatch. Specifically, we design a triplet loss consisting of detection confidence suppression, bounding-box offset amplification, and non-maximum suppression (NMS) disruption, which jointly act across different stages of the detection pipeline. In addition, we introduce an appearance consistency loss to constrain the color distribution of the patch, thereby improving its adaptability under diverse imaging conditions, and incorporate data augmentation to further enhance robustness against complex physical perturbations. Extensive experiments demonstrate that TriPatch achieves a higher attack success rate across multiple detector models compared to existing approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes TriPatch, a method for generating transferable physical-world adversarial patches against pedestrian detection models. It introduces a triplet loss that jointly suppresses detection confidence, amplifies bounding-box offsets, and disrupts NMS across the multi-stage detection pipeline, combined with an appearance consistency loss to constrain patch color distribution and data augmentations to improve robustness under physical variations such as lighting and viewpoint changes. The central claim is that this design yields higher attack success rates (ASR) across multiple detector models than existing physical patch attacks.

Significance. If the empirical results hold under rigorous evaluation, the work is significant for safety-critical applications in autonomous driving and surveillance, as it directly targets residual compensation mechanisms in modern detectors and models physical diversity more explicitly than prior confidence-only attacks. The multi-component loss provides a principled way to attack the full pipeline rather than isolated stages, which could guide both attack and defense research.

minor comments (3)
  1. [Abstract] The claim of 'higher attack success rate across multiple detector models' is stated without any quantitative values, baselines, or ablation highlights; adding one or two key ASR numbers and model names would strengthen the summary without exceeding length limits.
  2. [Section 3] The method section describes the triplet loss in terms of its three goals, but the precise weighting or combination formula is not shown; an explicit equation would clarify how the components interact during optimization.
  3. [Evaluation] While multiple models and physical perturbations are mentioned, the manuscript would benefit from reporting standard deviations over repeated physical trials and a clear ablation isolating the contribution of the NMS-disruption term.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful summary of our TriPatch method and for the positive assessment of its significance for safety-critical applications. We appreciate the recommendation for minor revision and will ensure the final version incorporates any clarifications needed to further strengthen the presentation of the multi-stage triplet loss, appearance consistency, and physical augmentation components.

Circularity Check

0 steps flagged

No significant circularity; empirical method with direct loss definitions

full rationale

The paper proposes TriPatch as an empirical adversarial patch method. It defines a triplet loss directly targeting detector pipeline stages (confidence suppression, bounding-box offset amplification, NMS disruption) plus an appearance consistency loss and physical data augmentations. No equations, predictions, or first-principles derivations are present that reduce by construction to fitted inputs or self-referential definitions. Central claims rest on experimental validation across multiple detectors and physical perturbations, with no load-bearing self-citations or uniqueness theorems invoked. This is a standard self-contained empirical proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on standard assumptions that detector stages are differentiable and that simulated augmentations approximate physical variations; no new entities or fitted constants are introduced in the abstract.

axioms (2)
  • domain assumption Detection pipeline stages (confidence, bounding box, NMS) can be independently targeted by loss terms.
    Invoked when defining the triplet loss components.
  • domain assumption Appearance consistency and data augmentation suffice to bridge simulation-to-real gap for patches.
    Central to the robustness enhancement claim.
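
To make the first axiom concrete, here is a minimal greedy NMS pass. Because its output depends only on scores and pairwise box overlap, a loss term can target this stage separately from the confidence and regression heads. A schematic implementation under that assumption, not any particular detector's code:

    def nms(boxes, scores, iou_threshold=0.5):
        # boxes: list of (x1, y1, x2, y2); returns indices of kept boxes.
        order = sorted(range(len(scores)), key=lambda i: scores[i],
                       reverse=True)
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            # Suppress every remaining box that overlaps the kept one.
            order = [i for i in order
                     if iou(boxes[best], boxes[i]) < iou_threshold]
        return keep

    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    # Scattering predictions so their pairwise IoU stays below the
    # threshold leaves NMS nothing to merge, which is the disruption the
    # triplet loss's third term aims at.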

pith-pipeline@v0.9.0 · 5490 in / 1181 out tokens · 26904 ms · 2026-05-08T12:23:35.383016+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

65 extracted references · 13 canonical work pages · 5 internal anchors

  1. [1]

    Towards Reliable Forgetting: A Survey on Machine Unlearning Verification

Lulu Xue, Shengshan Hu, Wei Lu, Yan Shen, Dongxu Li, Peijin Guo, Ziqi Zhou, Minghui Li, Yanjun Zhang, and Leo Yu Zhang. Towards reliable forgetting: A survey on machine unlearning verification. arXiv preprint arXiv:2506.15115, 2025

  2. [2]

    Ufvideo: Towards unified fine-grained video cooperative understanding with large language models

    Hewen Pan, Cong Wei, Dashuang Liang, Zepeng Huang, Pengfei Gao, Ziqi Zhou, Lulu Xue, Pengfei Yan, Xiaoming Wei, Minghui Li, et al. Ufvideo: Towards unified fine-grained video cooperative understanding with large language models. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR’26), 2026

  3. [3]

Tattoo: Training-free aesthetic-aware outfit recommendation

    Yuntian Wu, Xiaonan Hu, Ziqi Zhou, and Hao Lu. Tattoo: Training-free aesthetic-aware outfit recommendation. arXiv preprint arXiv:2509.23242, 2025

  4. [4]

Darkhash: A data-free backdoor attack against deep hashing

    Ziqi Zhou, Menghao Deng, Yufei Song, Hangtao Zhang, Wei Wan, Shengshan Hu, Minghui Li, Leo Yu Zhang, and Dezhong Yao. Darkhash: A data-free backdoor attack against deep hashing. IEEE Transactions on Information Forensics and Security, 2025

  5. [5]

    Badhash: Invisible backdoor attacks against deep hashing with clean label

Shengshan Hu, Ziqi Zhou, Yechao Zhang, Leo Yu Zhang, Yifeng Zheng, Yuanyuan He, and Hai Jin. Badhash: Invisible backdoor attacks against deep hashing with clean label. In Proceedings of the 30th ACM International Conference on Multimedia (ACM MM’22), pages 678–686, 2022

  6. [6]

    Mars: A malignity-aware backdoor defense in federated learning

Wei Wan, Yuxuan Ning, Zhicong Huang, Cheng Hong, Shengshan Hu, Ziqi Zhou, Yechao Zhang, Tianqing Zhu, Wanlei Zhou, and Leo Yu Zhang. Mars: A malignity-aware backdoor defense in federated learning. In Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS’25), 2025

  7. [7]

    Detector collapse: Backdooring object detection to catastrophic overload or blindness

Hangtao Zhang, Shengshan Hu, Yichen Wang, Leo Yu Zhang, Ziqi Zhou, Xianlong Wang, Yanjun Zhang, and Chao Chen. Detector collapse: Backdooring object detection to catastrophic overload or blindness. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI’24), 2024

  8. [8]

    Test-time backdoor detection for object detection models

Hangtao Zhang, Yichen Wang, Shihui Yan, Chenyu Zhu, Ziqi Zhou, Linshan Hou, Shengshan Hu, Minghui Li, Yanjun Zhang, and Leo Yu Zhang. Test-time backdoor detection for object detection models. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR’25), pages 24377–24386, 2025

  9. [9]

Trojanrobot: Backdoor attacks against robotic manipulation in the physical world

    Xianlong Wang, Hewen Pan, Hangtao Zhang, Minghui Li, Shengshan Hu, Ziqi Zhou, Lulu Xue, Peijin Guo, Yichen Wang, Wei Wan, et al. Trojanrobot: Backdoor attacks against robotic manipulation in the physical world. arXiv e-prints, pages arXiv–2411, 2024

  10. [10]

Detecting and corrupting convolution-based unlearnable examples

    Minghui Li, Xianlong Wang, Zhifei Yu, Shengshan Hu, Ziqi Zhou, Longling Zhang, and Leo Yu Zhang. Detecting and corrupting convolution-based unlearnable examples. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’25), volume 39, pages 18403–18411, 2025

  11. [11]

    Eclipse: Expunging clean-label indiscriminate poisons via sparse diffusion purification

Xianlong Wang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Leo Yu Zhang, Peng Xu, Wei Wan, and Hai Jin. Eclipse: Expunging clean-label indiscriminate poisons via sparse diffusion purification. In European Symposium on Research in Computer Security, pages 146–166. Springer, 2024

  12. [12]

Spa-vlm: Stealthy poisoning attacks on rag-based vlm

    Lei Yu, Yechao Zhang, Ziqi Zhou, Yang Wu, Wei Wan, Minghui Li, Shengshan Hu, Pei Xiaobing, and Jing Wang. Spa-vlm: Stealthy poisoning attacks on rag-based vlm. arXiv preprint arXiv:2505.23828, 2025

  13. [13]

Unlearnable 3d point clouds: Class-wise transformation is all you need

    Xianlong Wang, Minghui Li, Wei Liu, Hangtao Zhang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, and Hai Jin. Unlearnable 3d point clouds: Class-wise transformation is all you need. In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS’24), volume 37, pages 99404–99432, 2024

  14. [14]

Towards evaluating the robustness of neural networks

    Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644, 2016

  15. [15]

Advedm: Fine-grained adversarial attack against vlm-based embodied agents

    Yichen Wang, Hangtao Zhang, Hewen Pan, Ziqi Zhou, Xianlong Wang, Peijin Guo, Lulu Xue, Shengshan Hu, Minghui Li, Leo Yu Zhang, and Dezhong Yao. Advedm: Fine-grained adversarial attack against vlm-based embodied agents. In Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS’25), 2025

  16. [16]

    Breaking barriers in physical-world adversarial examples: Improving robustness and transferability via robust feature

Yichen Wang, Yuxuan Chou, Ziqi Zhou, Hangtao Zhang, Wei Wan, Shengshan Hu, and Minghui Li. Breaking barriers in physical-world adversarial examples: Improving robustness and transferability via robust feature. In Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI’25), 2025

  17. [17]

    Pb-uap: Hybrid universal adversarial attack for image segmentation

Yufei Song, Ziqi Zhou, Minghui Li, Xianlong Wang, Menghao Deng, Wei Wan, Shengshan Hu, and Leo Yu Zhang. Pb-uap: Hybrid universal adversarial attack for image segmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’25), 2025

  18. [18]

    Dap: A dynamic adversarial patch for evading person detectors

Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif, Ihsen Alouani, and Muhammad Shafique. Dap: A dynamic adversarial patch for evading person detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24595–24604. IEEE, 2024

  19. [19]

    Physically realizable natural-looking clothing textures evade person detectors via 3d modeling

Zhanhao Hu, Wenda Chu, Xiaopei Zhu, Hui Zhang, Bo Zhang, and Xiaolin Hu. Physically realizable natural-looking clothing textures evade person detectors via 3d modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16975–16984. IEEE, 2023

  20. [20]

    Transferable adversarial facial images for privacy protection

Minghui Li, Jiangxiong Wang, Hao Zhang, Ziqi Zhou, Shengshan Hu, and Xiaobing Pei. Transferable adversarial facial images for privacy protection. In Proceedings of the 32nd ACM International Conference on Multimedia (ACM MM’24), pages 10649–10658, 2024

  21. [21]

Segtrans: Transferable adversarial examples for segmentation models

    Yufei Song, Ziqi Zhou, Qi Lu, Hangtao Zhang, Yifan Hu, Lulu Xue, Shengshan Hu, Minghui Li, and Leo Yu Zhang. Segtrans: Transferable adversarial examples for segmentation models. IEEE Transactions on Multimedia, 2025

  22. [22]

    Erosion attack for adversarial training to enhance semantic segmentation robustness

Yufei Song, Ziqi Zhou, Menghao Deng, Yifan Hu, Shengshan Hu, Minghui Li, and Leo Yu Zhang. Erosion attack for adversarial training to enhance semantic segmentation robustness. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’26), 2026

  23. [23]

Advedm: Fine-grained adversarial attack against vlm-based embodied agents

    Yichen Wang, Hangtao Zhang, Hewen Pan, Ziqi Zhou, Xianlong Wang, Peijin Guo, Lulu Xue, Shengshan Hu, Minghui Li, and Leo Yu Zhang. Advedm: Fine-grained adversarial attack against vlm-based embodied agents. In Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS’25), 2025

  24. [24]

Fooling automated surveillance cameras: Adversarial patches to attack person detection

    Simen Thys, Wiebe Van Ranst, and Toon Goedemé. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 0–0, 2019

  25. [25]

    Dual attention suppression attack: Generate adversarial camouflage in physical world

Jiakai Wang, Aishan Liu, Zixin Yin, Shunchang Liu, Shiyu Tang, and Xianglong Liu. Dual attention suppression attack: Generate adversarial camouflage in physical world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8565–8574, 2021

  26. [26]

    Adversarial t-shirt! evading person detectors in a physical world

Kaidi Xu, Gaoyuan Zhang, Sijia Liu, Quanfu Fan, Mengshu Sun, Hongge Chen, Pin-Yu Chen, Yanzhi Wang, and Xue Lin. Adversarial t-shirt! evading person detectors in a physical world. In European Conference on Computer Vision (ECCV), pages 665–681. Springer, 2020

  27. [27]

    On physical adversarial patches for object detection

    Mark Lee and Zico Kolter. On physical adversarial patches for object detection. arXiv preprint arXiv:1906.11897, 2019

  28. [28]

    Rich feature hierarchies for accurate object detection and semantic segmentation

Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 580–587, 2014

  29. [29]

    Fast r-cnn

Ross Girshick. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1440–1448. IEEE, 2015

  30. [30]

Faster r-cnn: Towards real-time object detection with region proposal networks

    Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. CoRR, abs/1506.01497, 2015

  31. [31]

    You only look once: Unified, real-time object detection

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, Las Vegas, NV, USA, 2016. IEEE

  32. [32]

    YOLOv3: An Incremental Improvement

Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018

  33. [33]

    YOLOv4: Optimal Speed and Accuracy of Object Detection

Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020

  34. [34]

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

    Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7464–7475, Vancouver, BC, Canada, 2023. IEEE

  35. [35]

    Ssd: Single shot multibox detector

    Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander Berg. Ssd: Single shot multibox detector. In European Conference on Computer Vision (ECCV), volume 9905, pages 21–37. Springer, 2016

  36. [36]

    Focal loss for dense object detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 2980–2988, Venice, Italy, 2017. IEEE

  37. [37]

    Securely fine-tuning pre-trained encoders against adversarial examples

Ziqi Zhou, Minghui Li, Wei Liu, Shengshan Hu, Yechao Zhang, Wei Wan, Lulu Xue, Leo Yu Zhang, Dezhong Yao, and Hai Jin. Securely fine-tuning pre-trained encoders against adversarial examples. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP’24), 2024

  38. [38]

    Advclip: Downstream-agnostic adversarial examples in multimodal contrastive learning

Ziqi Zhou, Shengshan Hu, Minghui Li, Hangtao Zhang, Yechao Zhang, and Hai Jin. Advclip: Downstream-agnostic adversarial examples in multimodal contrastive learning. In Proceedings of the 32nd ACM International Conference on Multimedia (MM’23), pages 6311–6320, 2023

  39. [39]

    Downstream-agnostic adversarial examples

Ziqi Zhou, Shengshan Hu, Ruizhi Zhao, Qian Wang, Leo Yu Zhang, Junhui Hou, and Hai Jin. Downstream-agnostic adversarial examples. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV’23), pages 4345–4355, 2023

  40. [40]

    Numbod: A spatial-frequency fusion attack against object detectors

Zihan Zhou, Bo Li, Yifan Song, et al. Numbod: A spatial-frequency fusion attack against object detectors. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 1201–1209, 2025

  41. [41]

    Darksam: Fooling segment anything model to segment nothing

Ziqi Zhou, Yufei Song, Minghui Li, Shengshan Hu, Xianlong Wang, Leo Yu Zhang, Dezhong Yao, and Hai Jin. Darksam: Fooling segment anything model to segment nothing. In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS’24), 2024

  42. [42]

    Vanish into thin air: Cross-prompt universal adversarial attacks for sam2

Ziqi Zhou, Yifan Hu, Yufei Song, Zijing Li, Shengshan Hu, Leo Yu Zhang, Dezhong Yao, Long Zheng, and Hai Jin. Vanish into thin air: Cross-prompt universal adversarial attacks for sam2. In Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS’25), 2025

  43. [43]

    Towards robust rain removal against adversarial attacks: A comprehensive benchmark analysis and beyond

    Yi Yu, Wenhan Yang, Yap-Peng Tan, and Alex C Kot. Towards robust rain removal against adversarial attacks: A comprehensive benchmark analysis and beyond. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’22), 2022

  44. [44]

    Benchmarking adversarial robustness of image shadow removal with shadow-adaptive attacks

Chong Wang, Yi Yu, Lanqing Guo, and Bihan Wen. Benchmarking adversarial robustness of image shadow removal with shadow-adaptive attacks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’24), 2024

  45. [45]

    Transferable adversarial attacks on sam and its downstream models

Song Xia, Wenhan Yang, Yi Yu, Xun Lin, Henghui Ding, Lingyu Duan, and Xudong Jiang. Transferable adversarial attacks on sam and its downstream models. In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS’24), 2024

  46. [46]

    From pretrain to pain: Adversarial vulnerability of video foundation models without task knowledge

Hui Lu, Yi Yu, Song Xia, Yiming Yang, Deepu Rajan, Boon Poh Ng, Alex Kot, and Xudong Jiang. From pretrain to pain: Adversarial vulnerability of video foundation models without task knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’26), 2026

  47. [47]

    Intriguing properties of neural networks

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013

  48. [48]

Explaining and harnessing adversarial examples

    Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations (ICLR), 2015

  49. [49]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017

  50. [50]

    Camou: Learning physical vehicle camouflages to adversarially attack detectors in the wild

    Yang Zhang, Hassan Foroosh, Philip David, and Boqing Gong. Camou: Learning physical vehicle camouflages to adversarially attack detectors in the wild. In International Conference on Learning Representations (ICLR), 2019

  51. [51]

    Synthesizing robust adversarial examples

Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. In International Conference on Machine Learning (ICML), pages 284–293. PMLR, 2018

  52. [52]

Uv-attack: Physical-world adversarial attacks for person detection via dynamic-nerf-based uv mapping

    Yanjie Li, Wenxuan Zhang, Kaisheng Liang, and Bin Xiao. Uv-attack: Physical-world adversarial attacks for person detection via dynamic-nerf-based uv mapping. In International Conference on Learning Representations (ICLR), 2025. Poster

  53. [53]

Adversarial patch

    Tom B Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017

  54. [54]

    Naturalistic physical adversarial patch for object detectors

Yu-Chih-Tuan Hu, Bo-Han Kung, Daniel Stanley Tan, JunCheng Chen, Kai-Lung Hua, and Wen-Huang Cheng. Naturalistic physical adversarial patch for object detectors. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7848–7857, 2021

  55. [55]

    Adversarial camouflage: Hiding physical-world attacks with natural styles

    Ranjie Duan, Xingjun Ma, Yisen Wang, James Bailey, A Kai Qin, and Yun Yang. Adversarial camouflage: Hiding physical-world attacks with natural styles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1000–1008, 2020

  56. [56]

Revisiting adversarial patches for designed camera-agnostic attacks against person detection

    Haoxuan Wei, Zhicong Wang, Kai Zhang, Jinghui Hou, Yifei Liu, Haotian Tang, and Zhen Wang. Revisiting adversarial patches for designed camera-agnostic attacks against person detection. Advances in Neural Information Processing Systems (NeurIPS), 37:8047–8064, 2024

  57. [57]

Doepatch: Dynamically optimized ensemble model for adversarial patches generation

    Wenbin Tan, Yifan Li, Cheng Zhao, Xin Chen, and Lei Wang. Doepatch: Dynamically optimized ensemble model for adversarial patches generation. IEEE Transactions on Information Forensics and Security, 19:9039–9054, 2024

  58. [58]

    Histograms of oriented gradients for human detection

Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 886–893. IEEE, 2005

  59. [59]

Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV), pages 740–755. Springer, 2014

  60. [60]

Yolo9000: Better, faster, stronger

    Joseph Redmon and Ali Farhadi. Yolo9000: Better, faster, stronger. CoRR, abs/1612.08242, 2016

  61. [61]

    ultralytics/yolov5

Glenn Jocher, Alex Stoken, Jirka Borovec, NanoCode012, ChristopherSTAN, Liu Changyu, Laughing, tkianai, Adam Hogan, lorenzomammana, yxNONG, AlexWang1900, Laurentiu Diaconu, Marc, wanghaoyang0106, ml5ah, Doug, Francisco Ingham, Frederik, Guilhen, Hatovix, Jake Poznanski, Jiacong Fang, Lijun Yu, changyu98, Mingyu Wang, Naman Gupta, Osama Akhtar, Petr Dvor...

  62. [62]

Yolov8

    Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Yolov8. https://github.com/ultralytics/ultralytics, 2023. Version 8.0.0, GitHub repository, Ultralytics

  63. [63]

    Yolov9: Learning what you want to learn using programmable gradient information

Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. Yolov9: Learning what you want to learn using programmable gradient information. In Proceedings of the European Conference on Computer Vision (ECCV), Cham, Switzerland, 2024. Springer Nature Switzerland

  64. [64]

    T-sea: Transfer-based self-ensemble attack on object detection

Haotian Huang, Zhi Chen, Hao Chen, et al. T-sea: Transfer-based self-ensemble attack on object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20514–20523. IEEE, 2023

  65. [65]

Upc: Learning universal physical camouflage attacks on object detectors

    Lifeng Huang, Chengying Gao, Yuyin Zhou, Changqing Zou, Cihang Xie, Alan L. Yuille, and Ning Liu. Upc: Learning universal physical camouflage attacks on object detectors. CoRR, abs/1909.04326, 2019