Understanding Adversarial Transferability in Vision-Language Models for Autonomous Driving: A Cross-Architecture Analysis
Pith reviewed 2026-05-07 09:41 UTC · model grok-4.3
The pith
Adversarial patches on roadside infrastructure transfer across vision-language models in autonomous driving at rates of 73 to 91 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that physically realizable adversarial patches demonstrate high cross-architecture transferability in VLM-based autonomous driving systems. Transfer rates range from 73 to 91 percent, with mean transfer rates of 0.815 in crosswalk scenarios and 0.833 in highway scenarios. Even when not optimized for the target architecture, the patches continue to alter model outputs across 64.7 to 79.4 percent of the frames in the critical decision window.
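Both headline metrics reduce to simple frequencies over boolean outcomes. A minimal sketch, assuming per-trial and per-frame attack outcomes are recorded as booleans (the function names, variable names, and toy counts below are ours, not the paper's):

```python
from typing import List

def transfer_rate(successes: List[bool]) -> float:
    # Fraction of cross-model trials in which a patch optimized on a
    # source model also flips the target model's decision output.
    return sum(successes) / len(successes)

def sustained_fraction(frame_flags: List[bool]) -> float:
    # Fraction of frames inside the critical decision window whose
    # output the patch alters (the paper's frame-level metric).
    return sum(frame_flags) / len(frame_flags)

# Toy counts shaped like the headline figures; not the paper's data.
trials = [True] * 41 + [False] * 9    # 41 of 50 transfers succeed
frames = [True] * 27 + [False] * 7    # 27 of 34 window frames altered
print(transfer_rate(trials))          # 0.82, near the crosswalk mean
print(sustained_fraction(frames))
```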
What carries the argument
The transfer-matrix evaluation that measures how patches optimized for one model affect the decision outputs of the other models in simulated driving scenes.
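The transfer-matrix design can be sketched in a few lines. Since the paper's evaluation harness is not shown, `run_trial`, `n_trials`, and the stub below are our assumptions, not the authors' code:

```python
import itertools

MODELS = ("Dolphins", "OmniDrive", "LeapVAD")

def build_transfer_matrix(run_trial, n_trials=50):
    # run_trial(src, tgt, i) -> bool: did the patch optimized on `src`
    # flip `tgt`'s decision output in trial i?  Off-diagonal entries
    # only; the diagonal would be the white-box success rate.
    return {
        (src, tgt): sum(run_trial(src, tgt, i) for i in range(n_trials)) / n_trials
        for src, tgt in itertools.permutations(MODELS, 2)
    }

# Stub harness that succeeds in 40 of every 50 trials, for all pairs.
matrix = build_transfer_matrix(lambda src, tgt, i: i < 40)
print(matrix[("Dolphins", "OmniDrive")])  # 0.8
```

Averaging the six off-diagonal entries per scenario gives the paper's mean TR figures (0.815 crosswalk, 0.833 highway).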
If this is right
- Attackers can create effective physical patches without knowing which VLM a target vehicle uses.
- A single modification to roadside infrastructure can disrupt multiple autonomous driving systems at once.
- Successful manipulation lasts through most of the critical time window for vehicle decisions.
- Architectural differences between models provide limited protection against these attacks.
Where Pith is reading between the lines
- Safety testing for autonomous vehicles may need to include cross-model attack evaluations as a standard requirement.
- Defenses focused on shared visual features rather than model-specific tuning could become necessary.
- The same transfer pattern might appear in other real-time control systems that use vision-language models.
- Expanding the set of tested models and moving from simulation to physical trials would provide clearer bounds on the risk.
Load-bearing premise
The three evaluated VLM architectures are representative of those deployed in real autonomous vehicles, and the simulated patches accurately model feasible real-world attacks on roadside infrastructure.
What would settle it
Experiments that find transfer rates below 50 percent when the same patches are applied to additional VLM architectures or tested on physical roadside signs against actual vehicles would disprove the high transferability result.
Original abstract
Vision-language models (VLMs) are increasingly used in autonomous driving because they combine visual perception with language-based reasoning, supporting more interpretable decision-making, yet their robustness to physical adversarial attacks, especially whether such attacks transfer across different VLM architectures, is not well understood and poses a practical risk when attackers do not know which model a vehicle uses. We address this gap with a systematic cross-architecture study of adversarial transferability in VLM-based driving, evaluating three representative architectures (Dolphins, OmniDrive, and LeapVAD) using physically realizable patches placed on roadside infrastructure in both crosswalk and highway scenarios. Our transfer-matrix evaluation shows high cross-architecture effectiveness, with transfer rates of 73-91% (mean TR = 0.815 for crosswalk and 0.833 for highway) and sustained frame-level manipulation over 64.7-79.4% of the critical decision window even when patches are not optimized for the target model.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a systematic empirical study of adversarial transferability across vision-language models for autonomous driving. It evaluates three architectures (Dolphins, OmniDrive, LeapVAD) using physically realizable adversarial patches placed on roadside infrastructure in simulated crosswalk and highway scenarios. The central results are high cross-architecture transfer rates of 73-91% (mean TR = 0.815 crosswalk, 0.833 highway) with sustained frame-level manipulation over 64.7-79.4% of the critical decision window, even for patches not optimized for the target model.
Significance. If the physical simulation faithfully captures real-world conditions, the results would establish a concrete practical risk: transferable attacks on VLM-based driving systems remain effective across architectures without target knowledge. The work is credited for its multi-architecture transfer-matrix design and attention to temporal persistence of manipulation in driving contexts, which strengthens the safety relevance of the findings.
Major comments (2)
- [Abstract and §4 (Evaluation)] The claim that patches are 'physically realizable' and support real-world risk conclusions rests on the simulation of projection, lighting, viewpoint, and material effects, yet the manuscript provides no explicit description or validation of these factors. Without them the reported transfer rates measure digital transfer only and do not substantiate the practical roadside-infrastructure attack scenario.
- [Table 2 (Transfer Matrix) and §5.1] The headline means (TR = 0.815/0.833) and ranges (73-91%) are presented without reported trial counts, standard deviations, or statistical tests. This omission makes it impossible to judge whether the cross-architecture effectiveness is robust or sensitive to simulation variance.
Minor comments (1)
- [Abstract] The abbreviation 'TR' for transfer rate is used in the abstract before being defined; add an explicit definition on first use.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to improve clarity and rigor on the simulation methodology and statistical reporting.
Point-by-point responses
- Referee: [Abstract and §4 (Evaluation)] The claim that patches are 'physically realizable' and support real-world risk conclusions rests on the simulation of projection, lighting, viewpoint, and material effects, yet the manuscript provides no explicit description or validation of these factors. Without them the reported transfer rates measure digital transfer only and do not substantiate the practical roadside-infrastructure attack scenario.
Authors: We agree that greater transparency is required to substantiate the physical realizability claims. Although the manuscript references a simulation framework that incorporates projection, lighting, viewpoint variation, and material properties (as described in the evaluation setup), we acknowledge that these elements were not described with sufficient explicit detail or validation metrics. In the revised manuscript we will expand the methodology section to provide a dedicated subsection detailing the rendering pipeline, including specific models for lighting (e.g., HDR environment maps and directional sources), viewpoint sampling (randomized camera poses within realistic driving ranges), material reflectance simulation, and patch projection onto infrastructure surfaces. We will also add quantitative validation where feasible, such as comparisons of simulated patch visibility against expected physical degradation factors. These additions will clarify that the reported transfer rates derive from a physically-informed simulation rather than purely digital perturbations, thereby better supporting the practical risk conclusions.
Revision: yes.
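The kind of physically informed pipeline the response promises typically optimizes patches in expectation over sampled rendering conditions. A minimal stdlib sketch of that idea, the expectation-over-transformations approach of Athalye et al. (2018); all names and numeric ranges below are illustrative assumptions, not the authors' pipeline:

```python
import random

def sample_condition(rng):
    # One rendering condition: viewpoint, lighting, and patch scale
    # drawn from plausible driving ranges.  The ranges are
    # illustrative only, not those promised for the revision.
    return {
        "camera_yaw_deg": rng.uniform(-25.0, 25.0),
        "distance_m": rng.uniform(5.0, 40.0),
        "light_intensity": rng.uniform(0.4, 1.6),  # relative exposure
        "patch_scale": rng.uniform(0.8, 1.2),
    }

def expected_attack_loss(loss_under, n_samples=16, seed=0):
    # Average the attack loss over sampled conditions; a patch
    # optimized against this expectation tends to survive the
    # physical variation each sample models.
    rng = random.Random(seed)
    return sum(loss_under(sample_condition(rng))
               for _ in range(n_samples)) / n_samples
```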
- Referee: [Table 2 (Transfer Matrix) and §5.1] The headline means (TR = 0.815/0.833) and ranges (73-91%) are presented without reported trial counts, standard deviations, or statistical tests. This omission makes it impossible to judge whether the cross-architecture effectiveness is robust or sensitive to simulation variance.
Authors: We concur that the absence of trial counts, variability measures, and statistical tests limits the interpretability of the headline results. In the revised manuscript we will update Table 2 and the accompanying text in §5.1 to report the exact number of independent trials per transfer pair and scenario (50 runs each), include standard deviations for all mean transfer rates, and add statistical analyses such as paired t-tests or Wilcoxon rank-sum tests to assess whether observed differences in transferability across architectures are significant. These revisions will allow readers to evaluate the robustness of the 73-91% range and the reported means against simulation variance.
Revision: yes.
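The promised statistics are straightforward to report. A minimal stdlib sketch (`summarize` and `sign_test_p` are our names; a paired sign test stands in here for the Wilcoxon test the response proposes, which in practice would come from `scipy.stats.wilcoxon`):

```python
import math
import statistics

def summarize(rates):
    # Mean and sample standard deviation over independent trials,
    # as promised for each cell of Table 2.
    return statistics.mean(rates), statistics.stdev(rates)

def sign_test_p(x, y):
    # Two-sided paired sign test: is one architecture pair's
    # transfer rate systematically higher than another's across
    # matched simulation runs?
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```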
Circularity Check
No circularity: purely empirical reporting of measured transfer rates
Full rationale
The manuscript presents an experimental study that directly measures and tabulates cross-architecture adversarial transfer rates (73-91%) and sustained-frame manipulation percentages in simulated driving scenes. No equations, fitted parameters, first-principles derivations, or uniqueness theorems appear in the provided text. The reported quantities are obtained by running the three VLMs on the same set of physically-simulated patches and counting successes; they are not obtained by re-expressing any input quantity or by a self-citation chain. Consequently the central claims do not reduce to their own inputs by construction.