Adversarial Flow Matching for Imperceptible Attacks on End-to-End Autonomous Driving
Pith reviewed 2026-05-09 20:26 UTC · model grok-4.3
The pith
A gray-box attack framework called Adversarial Flow Matching crafts imperceptible perturbations that fool end-to-end autonomous driving systems by targeting their Transformer modules.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AFM achieves a superior trade-off between attack effectiveness and imperceptibility by generating adversarial examples in one step via a neural average velocity field. It perturbs both the generative latent space and the neural average velocity field to create attacks that substantially degrade the performance of Vision-Language-Action (VLA) and modular end-to-end driving agents across various scenarios, while maintaining state-of-the-art visual imperceptibility. The generated adversarial examples also show robust cross-model transferability, allowing the method to approximate black-box attacks with only prior knowledge of the Transformer module.
What carries the argument
Adversarial Flow Matching, a gray-box framework that uses a neural average velocity field to enable one-step adversarial example generation through synergistic perturbation of the generative latent space and the velocity field.
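Assuming AFM follows the mean-flow recipe for one-step generation, the mechanism can be sketched in a few lines. Everything here is illustrative: the linear stand-in field `W`, the perturbation names `delta_z` and `delta_u`, and the step sizes are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_velocity(z, r, t, W):
    # Stand-in for a learned average velocity field u_theta(z, r, t);
    # here a toy linear map (W is hypothetical, not from the paper).
    return z @ W

def one_step_generate(z, W, delta_z=0.0, delta_u=0.0, r=0.0, t=1.0):
    # One-step sampling x = z + (t - r) * u(z, r, t), as in mean-flow
    # generators; an AFM-style attack would perturb both the latent z
    # (delta_z) and the field output (delta_u).
    z_pert = z + delta_z
    u = avg_velocity(z_pert, r, t, W) + delta_u
    return z_pert + (t - r) * u

d = 8
W = rng.standard_normal((d, d)) * 0.1
z = rng.standard_normal(d)

x_clean = one_step_generate(z, W)
x_adv = one_step_generate(z, W, delta_z=0.01 * rng.standard_normal(d))
print(np.linalg.norm(x_adv - x_clean))  # a small latent shift yields a small output shift
```

The one-step property is what makes the attack cheap: no iterative ODE solve is needed between the latent perturbation and the rendered adversarial image.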
If this is right
- AFM substantially degrades the performance of both VLA and modular AD agents across various scenarios compared to baselines.
- AFM maintains state-of-the-art visual imperceptibility.
- Adversarial examples generated by AFM exhibit robust cross-model transferability.
- AFM approximates a black-box attack setting while requiring only the prior knowledge that the target AD model incorporates a Transformer-based module.
Where Pith is reading between the lines
- Similar structural vulnerabilities may exist in other Transformer-based decision systems outside of driving, pointing to a wider class of latent-space attacks.
- Autonomous driving developers could prioritize robustness testing against flow-matching perturbations in the latent space of their models.
- The approach might be adapted to probe vulnerabilities in other generative or attention-heavy AI components with minimal model access.
Load-bearing premise
Knowledge that the target end-to-end AD model incorporates a Transformer-based module is sufficient for the gray-box attack to succeed without full model transparency or excessive queries.
What would settle it
An evaluation showing that AFM-generated examples fail to degrade AD agent performance beyond random noise levels or that the perturbations exceed imperceptibility thresholds under standard visual metrics would disprove the central claim.
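The imperceptibility half of such a check could be operationalized with a standard visual metric such as PSNR. This is a minimal sketch; the 40 dB acceptance threshold is a hypothetical choice, not one stated by the paper.

```python
import numpy as np

def psnr(clean, adv, max_val=1.0):
    # Peak signal-to-noise ratio between a clean and a perturbed image;
    # higher PSNR means the perturbation is closer to imperceptible.
    mse = np.mean((clean - adv) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
adv = np.clip(img + 0.004 * rng.standard_normal(img.shape), 0.0, 1.0)

# Hypothetical acceptance threshold for "imperceptible".
print(psnr(img, adv) > 40.0)
```

Perceptual metrics such as LPIPS or SSIM would complement PSNR, since small-MSE perturbations can still be visible when structured.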
Original abstract
Autonomous driving (AD) is evolving towards end-to-end (E2E) frameworks through two primary paradigms: monolithic models exemplified by Vision-Language-Action (VLA), and specialized modular architectures. Despite their divergent designs, both paradigms increasingly rely on Transformer backbones for complex reasoning, potentially causing a shared vulnerability: visually imperceptible perturbations can manipulate E2E AD models into hazardous maneuvers by targeting the Transformer module. Most existing adversarial attack approaches against AD systems operate under white-box or black-box settings; yet, they typically necessitate full model transparency, or suffer from either prohibitive query latency or limited attack transferability. In this paper, we propose Adversarial Flow Matching (AFM), a novel gray-box attack framework that exploits Transformer structural vulnerabilities in E2E AD models. AFM enables efficient one-step generation of adversarial examples via a neural average velocity field. Additionally, the proposed technique yields effective and visually imperceptible attacks by synergistically perturbing the generative latent space and the neural average velocity field. Extensive experiments demonstrate that AFM achieves a superior trade-off between attack effectiveness and imperceptibility: it substantially degrades the performance of both VLA and modular AD agents across various scenarios compared to baselines, while maintaining state-of-the-art visual imperceptibility. Furthermore, adversarial examples generated by AFM exhibit robust cross-model transferability, indicating that AFM closely approximates a black-box attack setting while requiring only the prior knowledge that the target AD model incorporates a Transformer-based module.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Adversarial Flow Matching (AFM), a gray-box adversarial attack framework for end-to-end autonomous driving (E2E AD) models that use Transformer backbones. AFM generates one-step adversarial examples by synergistically perturbing the generative latent space and a neural average velocity field, claiming superior effectiveness in degrading VLA and modular AD agent performance, state-of-the-art imperceptibility, and robust cross-model transferability, using only the prior knowledge of Transformer module presence.
Significance. If the experimental results support the claims, this would represent a notable advance in adversarial attacks on safety-critical systems by demonstrating an efficient attack that bridges gray-box and black-box settings with minimal assumptions, potentially exposing vulnerabilities in both monolithic VLA and modular Transformer-based AD architectures and motivating better defenses.
major comments (2)
- [Abstract and Methods] The central gray-box claim depends on constructing the 'neural average velocity field' solely from the knowledge that the target model incorporates a Transformer-based module. However, no explicit derivation, pseudocode, or algorithm is provided showing how this field is obtained without target-model gradients, logits, or extensive queries, which is load-bearing for attributing the reported effectiveness and imperceptibility to the minimal assumption.
- [Abstract and Experiments] The abstract asserts superior trade-off between attack effectiveness and imperceptibility with quantitative degradation of performance across scenarios compared to baselines, but without specific metrics (e.g., success rates, perceptual distances), baseline details, or validation procedures, it is difficult to assess if the data support the claims of outperforming baselines while maintaining SOTA imperceptibility and transferability.
minor comments (2)
- [Abstract] The abstract would benefit from including key quantitative results (e.g., attack success rates, LPIPS/PSNR values) to substantiate the claims of superiority and imperceptibility.
- [Methods] Clarify the exact definition and computation of the neural average velocity field and its integration with flow matching for adversarial perturbation generation.
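On the second minor comment, one plausible reading (an assumption on our part, following the MeanFlow formulation of reference [20], not a confirmed detail of AFM) is that the average velocity field is the time-average of the instantaneous flow velocity:

```latex
% Average velocity over [r, t] (MeanFlow-style definition; assumed, not confirmed for AFM):
u(z_t, r, t) \;\triangleq\; \frac{1}{t - r} \int_r^t v(z_s, s)\, \mathrm{d}s,
% so that one-step generation from a latent z_0 reads:
x \;=\; z_0 + (1 - 0)\, u_\theta(z_0, 0, 1).
```

Under this reading, perturbing the field means perturbing the learned network $u_\theta$ or its output, while perturbing the latent means shifting $z_0$ before the single integration step.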
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for clarification in the gray-box methodology and the presentation of experimental claims. We address each major comment point by point below and will incorporate revisions to improve the manuscript.
Point-by-point responses
- Referee: [Abstract and Methods] The central gray-box claim depends on constructing the 'neural average velocity field' solely from the knowledge that the target model incorporates a Transformer-based module. However, no explicit derivation, pseudocode, or algorithm is provided showing how this field is obtained without target-model gradients, logits, or extensive queries, which is load-bearing for attributing the reported effectiveness and imperceptibility to the minimal assumption.
Authors: We agree that the manuscript would benefit from an explicit derivation and pseudocode to substantiate the gray-box construction. The current text describes the high-level synergy between latent space perturbation and the neural average velocity field but does not detail the step-by-step approximation process. In the revised version, we will add a new subsection in the Methods along with pseudocode (as Algorithm 1) that derives the field using only the known presence of Transformer modules (e.g., via structural properties of attention and feed-forward layers to estimate average velocity in latent space). This construction requires no gradients, logits, or repeated queries, directly supporting the minimal-assumption claim. revision: yes
- Referee: [Abstract and Experiments] The abstract asserts superior trade-off between attack effectiveness and imperceptibility with quantitative degradation of performance across scenarios compared to baselines, but without specific metrics (e.g., success rates, perceptual distances), baseline details, or validation procedures, it is difficult to assess if the data support the claims of outperforming baselines while maintaining SOTA imperceptibility and transferability.
Authors: We acknowledge that the abstract summarizes the results qualitatively without embedding key numbers. The full paper reports these metrics in the Experiments section (including success rates, LPIPS and other perceptual distances, baseline comparisons such as PGD and query-based attacks, and cross-model transfer protocols). We will revise the abstract to concisely include representative quantitative results and a brief reference to the evaluation setup and baselines, enabling readers to directly assess the claimed trade-off and transferability. revision: yes
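For context, the PGD baseline named in the response is the standard projected-gradient iteration of Madry et al. [37]. The sketch below is generic: the toy gradient, target pattern, and eps/alpha settings are illustrative, not the paper's configuration.

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=8/255, alpha=2/255, steps=10):
    # L-infinity PGD: take signed gradient steps, then project back
    # into the eps-ball around the clean input and the valid pixel range.
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # eps-ball projection
        x_adv = np.clip(x_adv, 0.0, 1.0)          # valid image range
    return x_adv

# Toy objective: push pixels toward a hypothetical target pattern,
# so the ascent direction is known in closed form.
target = np.full((4, 4), 0.5)
grad = lambda x: target - x

x_clean = np.zeros((4, 4))
x_adv = pgd_attack(x_clean, grad)
print(np.max(np.abs(x_adv - x_clean)))  # capped at eps = 8/255
```

Unlike AFM's one-step generation, PGD needs target-model gradients at every iteration, which is the white-box assumption the gray-box framing is meant to avoid.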
Circularity Check
No detectable circularity; derivation chain not visible in text
full rationale
The abstract and provided text present AFM as a novel gray-box framework that generates one-step adversarial examples by perturbing a generative latent space and a neural average velocity field, exploiting Transformer backbones in E2E AD models. No equations, pseudocode, derivations, or self-citations appear that would reduce any claimed prediction or result to an input by construction. Claims of effectiveness, imperceptibility, and transferability are framed as outcomes of experiments rather than tautological redefinitions or fitted parameters renamed as predictions. The method is described without load-bearing reliance on prior author work or uniqueness theorems imported from self-citations, rendering the presented chain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] S. P. H. Boroujeni and A. Razi, "Vla4codrive: Vision-language-action dataset for cooperative autonomous driving," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026, pp. 1789–1799.
- [2] S. Jiang, Z. Huang, K. Qian, Z. Luo, T. Zhu, Y. Zhong, Y. Tang, M. Kong, Y. Wang, S. Jiao et al., "A survey on vision-language-action models for autonomous driving," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 4524–4536.
- [3] L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger, and H. Li, "End-to-end autonomous driving: Challenges and frontiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10164–10183, 2024.
- [4] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.
- [5] J. Zhang, J. W. Keung, Y. Xiao, Y. Liao, Y. Li, and X. Ma, "Uniada: Universal adaptive multiobjective adversarial attack for end-to-end autonomous driving systems," IEEE Transactions on Reliability, vol. 73, no. 4, pp. 1892–1906, 2024.
- [6] T. Zhang, L. Wang, X. Zhang, Y. Zhang, B. Jia, S. Liang, S. Hu, Q. Fu, A. Liu, and X. Liu, "Visual adversarial attack on vision-language models for autonomous driving," arXiv preprint arXiv:2411.18275, 2024.
- [7] L. Wang, T. Zhang, Y. Han, M. Fang, T. Jin, and J. Kang, "Attack end-to-end autonomous driving through module-wise noise," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 8349–8352.
- [8] Z. Xu, B. Li, H.-a. Gao, M. Gao, Y. Chen, M. Liu, C. Yan, H. Zhao, S. Feng, and H. Zhao, "Challenger: Affordable adversarial driving video generation," arXiv preprint arXiv:2505.15880, 2025.
- [9] T. Wang, X. Kuang, H. Li, Q. Du, Z. Hu, H. Deng, and G. Zhao, "Driving into danger: Adversarial patch attack on end-to-end autonomous driving systems using deep learning," in 2023 IEEE Symposium on Computers and Communications (ISCC). IEEE, 2023, pp. 995–1000.
- [10] J. Zheng, C. Lin, J. Sun, Z. Zhao, Q. Li, and C. Shen, "Physical 3d adversarial attacks against monocular depth estimation in autonomous driving," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 24452–24461.
- [11] J. Hu, X. Liu, J. Wang, J. Zhang, X. Yang, H. Qin, Y. Ma, and K. Xu, "Dynamicpae: Generating scene-aware physical adversarial examples in real-time," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
- [12] K. N. Kumar, C. Vishnu, R. Mitra, and C. K. Mohan, "Black-box adversarial attacks in autonomous vehicle technology," in 2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), 2020, pp. 1–7.
- [13] J. Sun, Y. Cao, Q. A. Chen, and Z. M. Mao, "Towards robust LiDAR-based perception in autonomous driving: General black-box adversarial sensor attack and countermeasures," in 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, Aug. 2020, pp. 877–894.
- [14] R. S. Hallyburton, Y. Liu, Y. Cao, Z. M. Mao, and M. Pajic, "Security analysis of Camera-LiDAR fusion against Black-Box attacks on autonomous vehicles," in 31st USENIX Security Symposium (USENIX Security 22). Boston, MA: USENIX Association, Aug. 2022, pp. 1903–1920.
- [15] L. Wang, T. Zhang, Y. Qu, S. Liang, Y. Chen, A. Liu, X. Liu, and D. Tao, "Black-box adversarial attack on vision language models for autonomous driving," arXiv preprint arXiv:2501.13563, 2025.
- [16] A. Boloor, K. Garimella, X. He, C. Gill, Y. Vorobeychik, and X. Zhang, "Attacking vision-based perception in end-to-end autonomous driving models," Journal of Systems Architecture, vol. 110, p. 101766, 2020.
- [17] Z. Kong, J. Guo, A. Li, and C. Liu, "Physgan: Generating physical-world-resilient adversarial examples for autonomous driving," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 14254–14263.
- [18] J. Fan, Z. Wang, and G. Li, "Adversarial attack on trajectory prediction for autonomous vehicles with generative adversarial networks," in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 1026–1031.
- [19] R. Zhao, B. Zhu, C. Tong, X. Zhou, and X. Zheng, "Generating adversarial point clouds using diffusion model," arXiv preprint arXiv:2507.21163, 2025.
- [20] Z. Geng, M. Deng, X. Bai, J. Z. Kolter, and K. He, "Mean flows for one-step generative modeling," arXiv preprint arXiv:2505.13447, 2025.
- [21] J. Zhao, W. Zhao, B. Deng, Z. Wang, F. Zhang, W. Zheng, W. Cao, J. Nan, Y. Lian, and A. F. Burke, "Autonomous driving system: A comprehensive survey," Expert Systems with Applications, vol. 242, p. 122836, 2024.
- [22] J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S. Kammel, J. Z. Kolter, D. Langer, O. Pink, V. Pratt et al., "Towards fully autonomous driving: Systems and algorithms," in 2011 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2011, pp. 163–168.
- [23] C. Gao, G. Wang, W. Shi, Z. Wang, and Y. Chen, "Autonomous driving security: State of the art and challenges," IEEE Internet of Things Journal, vol. 9, no. 10, pp. 7572–7595, 2021.
- [24] A. D. M. Ibrahum, M. Hussain, and J.-E. Hong, "Deep learning adversarial attacks and defenses in autonomous vehicles: A systematic literature review from a safety perspective," Artificial Intelligence Review, vol. 58, no. 1, p. 28, 2024.
- [25] Z. Jindi, L. Yang, W. Jianping, W. Kui, L. Kejie, and J. Xiaohua, "Evaluating adversarial attacks on driving safety in vision-based autonomous vehicles," IEEE Internet of Things Journal, vol. 5, no. 9, 2022.
- [26] I. Sobh, A. Hamed, V. R. Kumar, and S. Yogamani, "Adversarial attacks on multi-task visual perception for autonomous driving," arXiv preprint arXiv:2107.07449, 2021.
- [27] Y. Deng, T. Zhang, G. Lou, X. Zheng, J. Jin, and Q.-L. Han, "Deep learning-based autonomous driving systems: A survey of attacks and defenses," IEEE Transactions on Industrial Informatics, vol. 17, no. 12, pp. 7897–7912, 2021.
- [28] C. Luo, Q. Lin, W. Xie, B. Wu, J. Xie, and L. Shen, "Frequency-driven imperceptible adversarial attack on semantic similarity," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 15315–15324.
- [29] Z. Zhao, Z. Liu, and M. Larson, "Towards large yet imperceptible adversarial image perturbations with perceptual color distance," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [30] S. Yuan, Q. Zhang, L. Gao, Y. Cheng, and J. Song, "Natural color fool: Towards boosting black-box unrestricted attacks," Advances in Neural Information Processing Systems, vol. 35, pp. 7546–7560, 2022.
- [31] Z. Chen, Z. Wang, J.-J. Huang, W. Zhao, X. Liu, and D. Guan, "Imperceptible adversarial attack via invertible neural networks," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, pp. 414–424, Jun. 2023.
- [32] H. Mohaghegh Dolatabadi, S. Erfani, and C. Leckie, "Advflow: Inconspicuous black-box adversarial attacks using normalizing flows," Advances in Neural Information Processing Systems, vol. 33, pp. 15871–15884, 2020.
- [33] M. Kang, D. Song, and B. Li, "Diffattack: Evasion attacks against diffusion-based adversarial purification," Advances in Neural Information Processing Systems, vol. 36, pp. 73919–73942, 2023.
- [34] K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger, "Transfuser: Imitation with transformer-based sensor fusion for autonomous driving," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 12878–12895, 2022.
- [35] K. Renz, L. Chen, E. Arani, and O. Sinavski, "Simlingo: Vision-only closed-loop autonomous driving with language-action alignment," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 11993–12003.
- [36] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint arXiv:1412.6572, 2014.
- [37] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models resistant to adversarial attacks," arXiv preprint arXiv:1706.06083, 2017.
- [38] J. Chen, H. Chen, K. Chen, Y. Zhang, Z. Zou, and Z. Shi, "Diffusion models for imperceptible and transferable adversarial attack," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 2, pp. 961–977, 2024.
- [39] Z. Zhao, Z. Liu, and M. Larson, "Towards large yet imperceptible adversarial image perturbations with perceptual color distance," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1039–1048.
- [40] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
- [41] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
- [42] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "Gans trained by a two time-scale update rule converge to a local nash equilibrium," in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017.
discussion (0)