arxiv: 2604.27414 · v1 · submitted 2026-04-30 · 💻 cs.CV · cs.CR· cs.LG

Recognition: unknown

Understanding Adversarial Transferability in Vision-Language Models for Autonomous Driving: A Cross-Architecture Analysis

David Fernandez , Pedram MohajerAnsari , Amir Salarpour , Mert D. Pese

Authors on Pith no claims yet

Pith reviewed 2026-05-07 09:41 UTC · model grok-4.3

classification 💻 cs.CV cs.CRcs.LG

keywords adversarial transferabilityvision-language modelsautonomous drivingphysical adversarial attackscross-architecture analysisroadside infrastructureVLM robustness

0 comments

The pith

Adversarial patches on roadside infrastructure transfer across vision-language models in autonomous driving at 73-91 percent rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether physical adversarial patches optimized for one vision-language model will also disrupt others in autonomous driving tasks. It runs controlled tests with three models in crosswalk and highway scenarios, placing patches on signs and measuring success when the patch is not tuned for the target model. High transfer rates and sustained effects over most of the critical decision frames are reported. This finding matters because it shows that switching between models may fail to block attacks if the underlying visual vulnerabilities are shared. A sympathetic reader would view this as a practical limit on using architectural diversity for safety.

Core claim

The paper claims that physically realizable adversarial patches demonstrate high cross-architecture transferability in VLM-based autonomous driving systems. Transfer rates range from 73 to 91 percent, with averages of 0.815 in crosswalk scenarios and 0.833 in highway scenarios. These patches continue to alter model outputs across 64.7 to 79.4 percent of the frames in the critical decision window even when not optimized for the target architecture.

What carries the argument

The transfer-matrix evaluation that measures how patches optimized for one model affect the decision outputs of the other models in simulated driving scenes.

If this is right

Attackers can create effective physical patches without knowing which VLM a target vehicle uses.
A single modification to roadside infrastructure can disrupt multiple autonomous driving systems at once.
Successful manipulation lasts through most of the critical time window for vehicle decisions.
Architectural differences between models provide limited protection against these attacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Safety testing for autonomous vehicles may need to include cross-model attack evaluations as a standard requirement.
Defenses focused on shared visual features rather than model-specific tuning could become necessary.
The same transfer pattern might appear in other real-time control systems that use vision-language models.
Expanding the set of tested models and moving from simulation to physical trials would provide clearer bounds on the risk.

Load-bearing premise

The three evaluated VLM architectures represent those deployed in real autonomous vehicles and the simulated patches accurately model feasible real-world attacks on roadside infrastructure.

What would settle it

Experiments that find transfer rates below 50 percent when the same patches are applied to additional VLM architectures or tested on physical roadside signs against actual vehicles would disprove the high transferability result.

Figures

Figures reproduced from arXiv: 2604.27414 by Amir Salarpour, David Fernandez, Mert D. Pese, Pedram MohajerAnsari.

**Figure 1.** Figure 1: Cross-Architecture Adversarial Transferability Framework. Five-stage pipeline to evaluate how adversarial patches transfer across VLM architectures: (1) generate patches (three architecture-specific and one universal), (2) record scenarios in CARLA, (3) test all 9 patch–VLM pairings and build a transfer matrix, (4) normalize model outputs using a CLIP text encoder for consistent scoring, and (5) analyze t… view at source ↗

**Figure 2.** Figure 2: Highway Attack Scenario. The attack scenario demonstrates how view at source ↗

**Figure 3.** Figure 3: Transfer rate heat map for the highway scenario. Darker colors indi view at source ↗

**Figure 6.** Figure 6: Mean cross-architecture frame success rates with standard deviation view at source ↗

read the original abstract

Vision-language models (VLMs) are increasingly used in autonomous driving because they combine visual perception with language-based reasoning, supporting more interpretable decision-making, yet their robustness to physical adversarial attacks, especially whether such attacks transfer across different VLM architectures, is not well understood and poses a practical risk when attackers do not know which model a vehicle uses. We address this gap with a systematic cross-architecture study of adversarial transferability in VLM-based driving, evaluating three representative architectures (Dolphins, OmniDrive, and LeapVAD) using physically realizable patches placed on roadside infrastructure in both crosswalk and highway scenarios. Our transfer-matrix evaluation shows high cross-architecture effectiveness, with transfer rates of 73-91% (mean TR = 0.815 for crosswalk and 0.833 for highway) and sustained frame-level manipulation over 64.7-79.4% of the critical decision window even when patches are not optimized for the target model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives the first reported transfer rates for physical adversarial patches across three VLMs in simulated driving scenes, but the realism of those simulations is the load-bearing part that needs checking.

read the letter

The main thing here is the cross-architecture transfer matrix on Dolphins, OmniDrive, and LeapVAD. They place patches on roadside infrastructure in crosswalk and highway simulations and measure how often an attack crafted for one model fools the others. The numbers come out at 73-91% transfer with means of 0.815 and 0.833, plus sustained manipulation over roughly two-thirds to three-quarters of the decision window. That is new empirical data for VLMs in driving; prior work on transferability has not focused on this combination of models and scenarios with a transfer matrix layout.

Referee Report

2 major / 1 minor

Summary. The paper presents a systematic empirical study of adversarial transferability across vision-language models for autonomous driving. It evaluates three architectures (Dolphins, OmniDrive, LeapVAD) using physically realizable adversarial patches placed on roadside infrastructure in simulated crosswalk and highway scenarios. The central results are high cross-architecture transfer rates of 73-91% (mean TR = 0.815 crosswalk, 0.833 highway) with sustained frame-level manipulation over 64.7-79.4% of the critical decision window, even for patches not optimized for the target model.

Significance. If the physical simulation faithfully captures real-world conditions, the results would establish a concrete practical risk: transferable attacks on VLM-based driving systems remain effective across architectures without target knowledge. The work is credited for its multi-architecture transfer-matrix design and attention to temporal persistence of manipulation in driving contexts, which strengthens the safety relevance of the findings.

major comments (2)

[Abstract and §4] Abstract and §4 (Evaluation): The claim that patches are 'physically realizable' and support real-world risk conclusions rests on the simulation of projection, lighting, viewpoint, and material effects, yet the manuscript provides no explicit description or validation of these factors. Without them the reported transfer rates measure digital transfer only and do not substantiate the practical roadside-infrastructure attack scenario.
[Table 2 and §5.1] Table 2 (Transfer Matrix) and §5.1: The headline means (TR = 0.815/0.833) and ranges (73-91%) are presented without reported trial counts, standard deviations, or statistical tests. This omission makes it impossible to judge whether the cross-architecture effectiveness is robust or sensitive to simulation variance.

minor comments (1)

[Abstract] The abbreviation 'TR' for transfer rate is used in the abstract before being defined; add an explicit definition on first use.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to improve clarity and rigor on the simulation methodology and statistical reporting.

read point-by-point responses

Referee: [Abstract and §4] Abstract and §4 (Evaluation): The claim that patches are 'physically realizable' and support real-world risk conclusions rests on the simulation of projection, lighting, viewpoint, and material effects, yet the manuscript provides no explicit description or validation of these factors. Without them the reported transfer rates measure digital transfer only and do not substantiate the practical roadside-infrastructure attack scenario.

Authors: We agree that greater transparency is required to substantiate the physical realizability claims. Although the manuscript references a simulation framework that incorporates projection, lighting, viewpoint variation, and material properties (as described in the evaluation setup), we acknowledge that these elements were not described with sufficient explicit detail or validation metrics. In the revised manuscript we will expand the methodology section to provide a dedicated subsection detailing the rendering pipeline, including specific models for lighting (e.g., HDR environment maps and directional sources), viewpoint sampling (randomized camera poses within realistic driving ranges), material reflectance simulation, and patch projection onto infrastructure surfaces. We will also add quantitative validation where feasible, such as comparisons of simulated patch visibility against expected physical degradation factors. These additions will clarify that the reported transfer rates derive from a physically-informed simulation rather than purely digital perturbations, thereby better supporting the practical risk conclusions. revision: yes
Referee: [Table 2 and §5.1] Table 2 (Transfer Matrix) and §5.1: The headline means (TR = 0.815/0.833) and ranges (73-91%) are presented without reported trial counts, standard deviations, or statistical tests. This omission makes it impossible to judge whether the cross-architecture effectiveness is robust or sensitive to simulation variance.

Authors: We concur that the absence of trial counts, variability measures, and statistical tests limits the interpretability of the headline results. In the revised manuscript we will update Table 2 and the accompanying text in §5.1 to report the exact number of independent trials per transfer pair and scenario (50 runs each), include standard deviations for all mean transfer rates, and add statistical analyses such as paired t-tests or Wilcoxon rank-sum tests to assess whether observed differences in transferability across architectures are significant. These revisions will allow readers to evaluate the robustness of the 73-91% range and the reported means against simulation variance. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical reporting of measured transfer rates

full rationale

The manuscript presents an experimental study that directly measures and tabulates cross-architecture adversarial transfer rates (73-91 %) and sustained-frame manipulation percentages in simulated driving scenes. No equations, fitted parameters, first-principles derivations, or uniqueness theorems appear in the provided text. The reported quantities are obtained by running the three VLMs on the same set of physically-simulated patches and counting success; they are not obtained by re-expressing any input quantity or by a self-citation chain. Consequently the central claims do not reduce to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As an empirical evaluation study, the paper introduces no new free parameters, axioms, or invented entities; it relies on standard assumptions from adversarial machine learning and computer vision research.

pith-pipeline@v0.9.0 · 5486 in / 1044 out tokens · 53130 ms · 2026-05-07T09:41:19.228895+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 2 canonical work pages

[1]

Synthesizing robust adversarial examples

Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. InInternational con- ference on machine learning, pages 284–293. PMLR, 2018

2018
[2]

https://doi.org/10.48550/arXiv.1712.09665

Tom B. Brown, Dandelion Man ´e, Aurko Roy, Mart´ın Abadi, and Justin Gilmer. Adversarial patch.CoRR, abs/1712.09665, 2017

work page arXiv 2017
[3]

CARLA: An open urban driving simulator

Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. InProceedings of the 1st Annual Conference on Robot Learning, pages 1–16, 2017

2017
[4]

Robust physical-world attacks on deep learning visual classification

Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1625–1634, 2018

2018
[5]

David Fernandez, Pedram MohajerAnsari, Cigdem Kokenoz, Amir Salarpour, Bing Li, and Mert D. Pes´e. Wip: From detection to explanation: Using llms for adversarial scenario analysis in 8 vehicles. InProceedings of the 3rd USENIX Symposium on V ehi- cle Security and Privacy (V ehicleSec ’25). USENIX Association, 2025

2025
[6]

David Fernandez, Pedram MohajerAnsari, Amir Salarpour, Richard Brooks, and Mert D. Pes ´e. Forensic reconstruction of traffic incidents: A vision-language model framework for post- incident forensic analysis.IEEE Multimedia, 2026

2026
[7]

Avoiding the crash: A vision-language model eval- uation of critical traffic scenarios

David Fernandez, Pedram MohajerAnsari, Amir Salarpour, and Mert D Pes´e. Avoiding the crash: A vision-language model eval- uation of critical traffic scenarios. Technical report, SAE Techni- cal Paper, 2025

2025
[8]

Stoll, and Alexandru Condurache

Steffen Hagedorn, Marcel Hallgarten, M. Stoll, and Alexandru Condurache. The integration of prediction and planning in deep learning automated driving systems: A review.IEEE Transac- tions on Intelligent V ehicles, 10:3626–3643, 2023

2023
[9]

Universal adversarial perturbations against semantic image segmentation

Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, and V olker Fischer. Universal adversarial perturbations against semantic image segmentation. InProceedings of the IEEE international conference on computer vision, pages 2755–2764, 2017

2017
[10]

Planning-oriented autonomous driving

Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. InProceedings of the IEEE/CVF conference on computer vision and pattern recog- nition, pages 17853–17862, 2023

2023
[11]

Black-box adversarial attacks with limited queries and informa- tion

Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and informa- tion. InInternational conference on machine learning, pages 2137–2146. PMLR, 2018

2018
[12]

Universal adversar- ial perturbations against object detection.Pattern Recognition, 110:107584, 2021

Debang Li, Junge Zhang, and Kaiqi Huang. Universal adversar- ial perturbations against object detection.Pattern Recognition, 110:107584, 2021

2021
[13]

Blip: Bootstrapping language-image pre-training for unified vision- language understanding and generation

Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. Blip: Bootstrapping language-image pre-training for unified vision- language understanding and generation. InInternational confer- ence on machine learning, pages 12888–12900. PMLR, 2022

2022
[14]

Vi- sual instruction tuning, 2023

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Vi- sual instruction tuning, 2023

2023
[15]

Dolphins: Multimodal language model for driv- ing, 2023

Yingzi Ma, Yulong Cao, Jiachen Sun, Marco Pavone, and Chaowei Xiao. Dolphins: Multimodal language model for driv- ing, 2023

2023
[16]

Leap- vad: A leap in autonomous driving via cognitive perception and dual-process thinking.arXiv preprint arXiv:2501.08168, 2025

Yukai Ma, Tiantian Wei, Naiting Zhong, Jianbiao Mei, Tao Hu, Licheng Wen, Xuemeng Yang, Botian Shi, and Yong Liu. Leap- vad: A leap in autonomous driving via cognitive perception and dual-process thinking.arXiv preprint arXiv:2501.08168, 2025

work page arXiv 2025
[17]

Lingoqa: Visual question answering for autonomous driving

Ana-Maria Marcu, Long Chen, Jan H ¨unermann, Alice Karnsund, Benoit Hanotte, Prajwal Chidananda, Saurabh Nair, Vijay Badri- narayanan, Alex Kendall, Jamie Shotton, et al. Lingoqa: Visual question answering for autonomous driving. InEuropean Con- ference on Computer Vision, pages 252–269. Springer, 2024

2024
[18]

Universal adversarial perturbations against semantic image segmentation, 2017

Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, and V olker Fischer. Universal adversarial perturbations against semantic image segmentation, 2017

2017
[19]

Attention-aware tem- poral adversarial shadows on traffic sign sequences

Pedram MohajerAnsari, Amir Salarpour, David Fernandez, Cig- dem Kokenoz, Bing Li, and Mert D Pes ´e. Attention-aware tem- poral adversarial shadows on traffic sign sequences. InThe 5th Workshop of Adversarial Machine Learning on Computer Vision: F oundation Models + X, 2025

2025
[20]

Universal adversarial perturbations, 2017

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations, 2017

2017
[21]

Papers for vlm in driving.https://github

OpenDriveLab. Papers for vlm in driving.https://github. com/OpenDriveLab/End-to-end-Autonomous-Driving/ blob/main/papers.md#papers-for-vlm-in-driving,
[22]

Learning transferable visual models from natural lan- guage supervision, 2021

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural lan- guage supervision, 2021

2021
[23]

Om- nidrive: A holistic vision-language dataset for autonomous driv- ing with counterfactual reasoning

Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, and Jose M Alvarez. Om- nidrive: A holistic vision-language dataset for autonomous driv- ing with counterfactual reasoning. InProceedings of the Com- puter Vision and Pattern Recognition Conference, pages 22442– 22452, 2025

2025
[24]

Physical adversarial attack meets computer vision: A decade sur- vey.IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 46(12):9797–9817, 2024

Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin’ichi Satoh, Luc Van Gool, and Zheng Wang. Physical adversarial attack meets computer vision: A decade sur- vey.IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 46(12):9797–9817, 2024

2024
[25]

Natural evolution strategies.J

Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jan Peters, and J ¨urgen Schmidhuber. Natural evolution strategies.J. Mach. Learn. Res., 15(1):949–980, January 2014

2014
[26]

Drivegpt4: Interpretable end-to-end autonomous driving via large language model.IEEE Robotics and Automation Letters, 2024

Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K Wong, Zhenguo Li, and Hengshuang Zhao. Drivegpt4: Interpretable end-to-end autonomous driving via large language model.IEEE Robotics and Automation Letters, 2024

2024
[27]

Vlattack: multimodal adversarial attacks on vision-language tasks via pre- trained models

Ziyi Yin, Muchao Ye, Tianrong Zhang, Tianyu Du, Jinguo Zhu, Han Liu, Jinghui Chen, Ting Wang, and Fenglong Ma. Vlattack: multimodal adversarial attacks on vision-language tasks via pre- trained models. InProceedings of the 37th International Confer- ence on Neural Information Processing Systems, NIPS ’23, Red Hook, NY , USA, 2023. Curran Associates Inc

2023
[28]

Xingcheng Zhou, Mingyu Liu, Ekim Yurtsever, Bare Luka Zagar, Walter Zimmer, Hu Cao, and Alois C. Knoll. Vision language models in autonomous driving: A survey and outlook.IEEE Transactions on Intelligent V ehicles, pages 1–20, 2024. 9

2024