Cross-Modal Phantom: Coordinated Camera-LiDAR Spoofing Against Multi-Sensor Fusion in Autonomous Vehicles
Pith reviewed 2026-05-09 21:13 UTC · model grok-4.3
The pith
Coordinated camera and LiDAR spoofing creates consistent phantom objects that bypass the redundancy of multi-sensor fusion in autonomous vehicles.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We design a coordinated, data-level (early-fusion) attack that emulates the outcome of two synchronized physical spoofing sources: an infrared projection that induces a false camera detection and a LiDAR signal injection that produces a matching 3D point cluster. Rather than implementing the physical attack hardware, we simulate its sensor-level outcomes by inserting perspective-aware image patches and synthetic LiDAR point clusters aligned in 3D space. Using 400 KITTI scenes, our large-scale evaluation shows that the coordinated spoofing deceives a state-of-the-art perception model with an 85.5% attack success rate. These findings provide the first quantitative evidence that malicious cross-modal consistency can compromise MSF-based perception.
What carries the argument
Perspective-aware image patches and 3D-aligned synthetic LiDAR point clusters that fabricate cross-sensor consistency for a false object while preserving the perceptual effects of real synchronized physical attacks.
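The coupling can be made concrete. A minimal sketch, assuming KITTI-style calibration (a 4x4 LiDAR-to-camera extrinsic and a 3x4 camera projection matrix); the box-shaped cluster generator and its parameters are illustrative, not the paper's actual method:

```python
import numpy as np

def make_phantom(center_xyz, size_lwh=(4.0, 1.7, 1.5), n_points=200, seed=0):
    """Sample a synthetic LiDAR point cluster shaped like a car-sized box.

    center_xyz: phantom center in the LiDAR frame (x forward, y left, z up).
    Returns an (n_points, 3) array. Density and shape are illustrative only.
    """
    rng = np.random.default_rng(seed)
    l, w, h = size_lwh
    pts = rng.uniform([-l / 2, -w / 2, -h / 2], [l / 2, w / 2, h / 2], (n_points, 3))
    # Concentrate points on the sensor-facing surface, as real returns would be.
    pts[:, 0] = -l / 2 + np.abs(rng.normal(0.0, 0.1, n_points))
    return pts + np.asarray(center_xyz)

def project_to_image(pts_lidar, T_velo_to_cam, P):
    """Project LiDAR-frame points into pixel coordinates.

    T_velo_to_cam: 4x4 extrinsic; P: 3x4 projection matrix (KITTI-style).
    Placing the image patch at the projected cluster keeps the two
    modalities geometrically consistent.
    """
    n = pts_lidar.shape[0]
    homo = np.hstack([pts_lidar, np.ones((n, 1))])
    cam = T_velo_to_cam @ homo.T   # 4 x n, camera frame
    img = P @ cam                  # 3 x n, homogeneous pixels
    return (img[:2] / img[2]).T    # n x 2 pixel coordinates
```

Projecting the injected cluster through the same calibration used by the fusion model is what fabricates the cross-sensor agreement the paper describes.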
If this is right
- Multi-sensor fusion no longer provides robust protection once an attacker can force consistent but false readings across modalities.
- The data-fusion logic itself becomes the attack surface when cross-modal agreement is fabricated rather than verified.
- Sensor-level simulation of physical attacks is sufficient to expose the vulnerability without needing full hardware implementation.
- AV perception pipelines that rely on early fusion are exposed to this class of consistency-based attacks at scale.
Where Pith is reading between the lines
- Future defenses could add independent consistency validation or cross-modal anomaly detection before trusting fused output.
- The simulation-to-real gap identified in the weakest assumption suggests targeted physical experiments as the next direct test.
- Similar coordinated spoofing could be explored against other sensor pairs such as radar and camera in the same fusion framework.
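One concrete shape such a cross-modal defense could take (not from the paper): a range-dependent density check. Real LiDAR returns from an object thin out roughly with the square of range, so an injected cluster whose point count ignores that falloff is suspect. The calibration constant below is hypothetical:

```python
import numpy as np

def density_anomaly_score(cluster, expected_at_10m=400.0):
    """Score how implausible a point cluster's density is for its range.

    cluster: (n, 3) LiDAR points. `expected_at_10m` is a hypothetical
    calibration constant (returns from a car-sized target at 10 m), not a
    value from the paper. Returns |log(observed / expected)|; larger means
    more suspicious.
    """
    r = np.linalg.norm(cluster.mean(axis=0))      # range to cluster centroid
    expected = expected_at_10m * (10.0 / r) ** 2  # inverse-square falloff
    return abs(np.log(len(cluster) / expected))
```

A threshold on this score, validated per sensor model, would flag clusters that are too dense (or too sparse) for their distance before the fusion stage trusts them.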
Load-bearing premise
The simulated image patches and point clusters produce the same sensor outputs and fusion behavior as actual physical infrared projections and electromagnetic signal injections would on real hardware.
What would settle it
Run synchronized physical attacks using an infrared projector and a LiDAR signal injector on a real vehicle with the same perception model, then measure whether the observed success rate for phantom object detection matches the 85.5% reported from simulation.
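The bookkeeping behind such a comparison is straightforward. A minimal sketch, with axis-aligned bird's-eye-view IoU and illustrative thresholds standing in for whatever success criterion the paper actually uses:

```python
def bev_iou(a, b):
    """Axis-aligned bird's-eye-view IoU between boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def attack_success_rate(scenes, iou_thresh=0.5, score_thresh=0.3):
    """Fraction of scenes where the phantom was detected.

    scenes: list of (phantom_box, detections), each detection a (box, score)
    pair. A scene counts as a successful attack if any detection overlaps
    the injected phantom box above both thresholds. Threshold values here
    are illustrative, not the paper's.
    """
    hits = sum(
        any(s >= score_thresh and bev_iou(box, phantom) >= iou_thresh
            for box, s in dets)
        for phantom, dets in scenes
    )
    return hits / len(scenes)
```

Running the same tally over physical-world trials and over the 400 simulated scenes would make the transfer claim directly testable.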
Original abstract
Autonomous Vehicles (AVs) increasingly depend on Multi-Sensor Fusion (MSF) to combine complementary modalities such as cameras and LiDAR for robust perception. While this redundancy is intended to safeguard against single-sensor failures, the fusion process itself introduces a subtle and underexplored vulnerability. In this work, we investigate whether an attacker can bypass MSF's redundancy by fabricating cross-sensor consistency, making multiple sensors agree on the same false object. We design a coordinated, data-level (early-fusion) attack that emulates the outcome of two synchronized physical spoofing sources: an infrared (IR) projection that induces a false camera detection and a LiDAR signal injection that produces a matching 3D point cluster. Rather than implementing the physical attack hardware, we simulate its sensor-level outcomes by inserting perspective-aware image patches and synthetic LiDAR point clusters aligned in 3D space. This approach preserves the perceptual effects that real IR and IEMI-based spoofing would create at the sensor output. Using 400 KITTI scenes, our large-scale evaluation shows that the coordinated spoofing deceives a state-of-the-art perception model with an 85.5% successful attack rate. These findings provide the first quantitative evidence that malicious cross-modal consistency can compromise MSF-based perception, revealing a critical vulnerability in the core data-fusion logic of modern autonomous vehicle systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a coordinated data-level attack on multi-sensor fusion (MSF) perception in autonomous vehicles. It emulates synchronized physical spoofing—an IR projection creating a false camera detection and IEMI-based LiDAR point injection creating a matching 3D cluster—by inserting perspective-aware image patches and aligned synthetic LiDAR points. Large-scale evaluation on 400 KITTI scenes reports an 85.5% attack success rate against a state-of-the-art perception model, positioning this as the first quantitative demonstration that malicious cross-modal consistency can defeat MSF redundancy.
Significance. If the simulated sensor artifacts faithfully reproduce real physical attacks, the result would be significant: it shows that MSF's intended robustness can be inverted by enforcing cross-sensor agreement on false objects rather than attacking modalities independently. The scale (400 public scenes) and explicit simulation approach are strengths that enable reproducibility and falsifiable follow-up work.
major comments (2)
- [Abstract] Abstract and evaluation description: the 85.5% success rate is presented without specifying the exact MSF architecture, the precise definition of 'successful attack' (e.g., IoU threshold, detection score, or 3D consistency metric), the ranges of attack parameters, or any single-sensor baseline comparisons. These omissions make it impossible to determine whether the result actually demonstrates bypassing of redundancy or simply reflects a weak fusion implementation.
- [Abstract] Abstract (simulation paragraph): the central claim that the attack 'deceives a state-of-the-art perception model' and reveals a 'critical vulnerability in the core data-fusion logic' rests on the untested assumption that perspective-aware patches plus synthetic LiDAR clusters produce sensor outputs equivalent to real synchronized IR projection and IEMI spoofing. No hardware validation, noise-characteristic comparison, or ablation on sensor-specific responses is provided; any mismatch in beam patterns, timing, or response functions would invalidate transfer to the claimed real-world MSF compromise.
minor comments (2)
- Add a dedicated subsection or table enumerating the exact MSF model, fusion weights, and detection thresholds used in the 400-scene experiments.
- Clarify how 3D alignment between inserted image patches and LiDAR clusters is maintained across varying KITTI camera-LiDAR extrinsics and scene depths.
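On the second point, the pinhole camera model fixes how a perspective-aware patch must scale: on-image extent goes as focal length times physical size over depth, so the patch must shrink as the injected cluster is placed deeper in the scene. A minimal sketch of that sizing rule (focal lengths `fx`, `fy` in pixels; not code from the paper):

```python
def patch_size_px(obj_width_m, obj_height_m, depth_m, fx, fy):
    """Pixel extent of a patch matching a physical object at a given depth.

    Pinhole model: on-image size = focal_length_px * physical_size / depth.
    Keeping the patch at this size, at the projected cluster location, is
    what maintains camera-LiDAR consistency across scene depths.
    """
    return obj_width_m * fx / depth_m, obj_height_m * fy / depth_m
```

Per-scene extrinsics then only affect where the patch lands, via the calibration projection, while depth alone governs its scale.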
Simulated Author's Rebuttal
We appreciate the referee's thorough review and constructive feedback. Below we provide point-by-point responses to the major comments. We have revised the manuscript to address the concerns regarding evaluation details and simulation assumptions.
Point-by-point responses
Referee: [Abstract] Abstract and evaluation description: the 85.5% success rate is presented without specifying the exact MSF architecture, the precise definition of 'successful attack' (e.g., IoU threshold, detection score, or 3D consistency metric), the ranges of attack parameters, or any single-sensor baseline comparisons. These omissions make it impossible to determine whether the result actually demonstrates bypassing of redundancy or simply reflects a weak fusion implementation.
Authors: We thank the referee for highlighting this. The full paper specifies the exact MSF architecture in the evaluation section, defines a successful attack as the model detecting the false object above an IoU and confidence-score threshold, reports the attack parameter ranges used in generation, and includes single-sensor baselines. To make this immediately clear, we have revised the abstract to mention these elements concisely. The single-modality baselines show markedly lower success rates, clarifying that the high coordinated rate stems from fabricated cross-modal consistency rather than a weak fusion implementation. revision: yes
Referee: [Abstract] Abstract (simulation paragraph): the central claim that the attack 'deceives a state-of-the-art perception model' and reveals a 'critical vulnerability in the core data-fusion logic' rests on the untested assumption that perspective-aware patches plus synthetic LiDAR clusters produce sensor outputs equivalent to real synchronized IR projection and IEMI spoofing. No hardware validation, noise-characteristic comparison, or ablation on sensor-specific responses is provided; any mismatch in beam patterns, timing, or response functions would invalidate transfer to the claimed real-world MSF compromise.
Authors: We agree that hardware validation would provide stronger evidence for real-world transferability. Our work focuses on a simulation of the sensor outputs to enable large-scale quantitative evaluation, which is a common approach in security research for AV perception attacks. In the revised manuscript, we have expanded the simulation description to include more details on how the patches and points are generated to match typical sensor characteristics, added comparisons to noise models from related physical attack papers, and included an ablation study on sensor response variations. We also explicitly discuss the limitations of the simulation approach and the assumptions made. This addresses the concern while maintaining the paper's scope as a data-level study. revision: partial
- Not addressed: full hardware validation of the coordinated IR and IEMI spoofing, which was not performed because the study is simulation-based.
Circularity Check
No circularity: empirical attack simulation on public data
full rationale
The paper presents a purely empirical evaluation: it simulates coordinated spoofing by inserting perspective-aware image patches and aligned synthetic LiDAR clusters into 400 KITTI scenes, then measures an 85.5% attack success rate against a state-of-the-art perception model. No equations, derivations, fitted parameters, or predictions are claimed. No self-citations are used to justify uniqueness or load-bearing premises. The central result is a direct, falsifiable measurement on an external public dataset; the simulation method is described explicitly without reducing to its own outputs by construction. This is a standard empirical demonstration with no circular steps.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Simulated perspective-aware patches and aligned 3D point clusters produce the same perceptual effect on the fusion model as real IR projection and LiDAR signal injection would.
- domain assumption The state-of-the-art perception model and KITTI dataset distribution are representative of real-world MSF behavior in autonomous vehicles.
Reference graph
Works this paper leans on
- [1] E. Stafford, "Tesla Robotaxi Reveal: How to Watch and What to Expect," https://www.caranddriver.com/news/a62567491/tesla-robotaxi-reveal/, 2024.
- [2] A. Roy and A. Sriram, "Tesla's Musk Unveils Robotaxis Amid Fanfare and Skepticism," https://www.reuters.com/technology/teslas-musk-unveil-robotaxis-amid-fanfare-skepticism-2024-10-10/, 2024.
- [3] W. D. Jones, "Driverless Buses Are Coming: An Inside Look at the Technology Behind Them," https://spectrum.ieee.org/driverless-bus, 2025.
- [4] S. Clevenger, "The Impact of Self-Driving Trucks on Jobs," https://www.ttnews.com/articles/self-driving-trucks-jobs, 2025.
- [5] B. Shaban, "Uber Eats and Waymo Team Up for Driverless Deliveries in Phoenix," https://www.nbcbayarea.com/news/tech/uber-waymo-driverless-deliveries/3500454/, 2024.
- [6] W. Li, X. Wan, Z. Ma, and Y. Hu, "Multi-sensor Fusion Perception of Vehicle Environment and its Application in Obstacle Avoidance of Autonomous Vehicle," International Journal of Intelligent Transportation Systems Research, pp. 1–14, 2025.
- [7] M. Zhou and L. Han, "Sensor Spoofing Detection On Autonomous Vehicle Using Channel-spatial-temporal Attention Based Autoencoder Network," Mobile Networks and Applications, pp. 1–14, 2023.
- [8] S. H. V. Bhupathiraju, J. Sheldon, L. A. Bauer, V. Bindschaedler, T. Sugawara, and S. Rampazzi, "EMI-LiDAR: Uncovering Vulnerabilities of LiDAR Sensors in Autonomous Driving Setting using Electromagnetic Interference," in Proceedings of the 16th ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2023, pp. 329–340.
- [9] Y. Man, M. Li, and R. Gerdes, "GhostImage: Remote Perception Attacks against Camera-based Image Classification Systems," in 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), 2020, pp. 317–332.
- [10] Z. Sun, S. Balakrishnan, L. Su, A. Bhuyan, P. Wang, and C. Qiao, "Who Is in Control? Practical Physical Layer Attack and Defense for mmWave-Based Sensing in Autonomous Vehicles," IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3199–3214, 2021.
- [11] A. Biswas and H.-C. Wang, "Autonomous Vehicles Enabled by the Integration of IoT, Edge Intelligence, 5G, and Blockchain," Sensors, vol. 23, no. 4, p. 1963, 2023.
- [12] Z. Cheng, H. Choi, J. Liang, S. Feng, G. Tao, D. Liu, M. Zuzak, and X. Zhang, "Fusion is Not Enough: Single Modal Attacks on Fusion Models for 3D Object Detection," arXiv preprint arXiv:2304.14614, 2023.
- [13] R. S. Hallyburton, Y. Liu, Y. Cao, Z. M. Mao, and M. Pajic, "Security Analysis of Camera-LiDAR Fusion Against Black-Box Attacks on Autonomous Vehicles," in 31st USENIX Security Symposium (USENIX Security 22), 2022, pp. 1903–1920.
- [14] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv preprint arXiv:2004.10934, 2020.
- [15] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2016.
- [16] W. Jia, Z. Lu, H. Zhang, Z. Liu, J. Wang, and G. Qu, "Fooling the Eyes of Autonomous Vehicles: Robust Physical Adversarial Examples Against Traffic Sign Recognition Systems," arXiv preprint arXiv:2201.06192, 2022.
- [17] N. Wang, S. Xie, T. Sato, Y. Luo, K. Xu, and Q. A. Chen, "Revisiting Physical-World Adversarial Attack on Traffic Sign Recognition: A Commercial Systems Perspective," arXiv preprint arXiv:2409.09860, 2024.
- [18] T. Sato, S. H. V. Bhupathiraju, M. Clifford, T. Sugawara, Q. A. Chen, and S. Rampazzi, "Invisible Reflections: Leveraging Infrared Laser Reflections to Target Traffic Sign Perception," arXiv preprint arXiv:2401.03582, 2024.
- [19] S. Shi, X. Wang, and H. Li, "PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 770–779.
- [20] Z. Jin, Q. Jiang, X. Lu, C. Yan, X. Ji, and W. Xu, "PhantomLiDAR: Cross-modality Signal Injection Attacks against LiDAR," arXiv preprint arXiv:2409.17907, 2024.
- [21] Z. Jin, X. Ji, Y. Cheng, B. Yang, C. Yan, and W. Xu, "PLA-LiDAR: Physical Laser Attacks against LiDAR-based 3D Object Detection in Autonomous Vehicle," in 2023 IEEE Symposium on Security and Privacy (SP), 2023, pp. 1822–1839.
- [22] Y. Cao, S. H. Bhupathiraju, P. Naghavi, T. Sugawara, Z. M. Mao, and S. Rampazzi, "You Can't See Me: Physical Removal Attacks on LiDAR-based Autonomous Vehicles Driving Frameworks," in 32nd USENIX Security Symposium (USENIX Security 23), 2023, pp. 2993–3010.
- [23] J. Shen, J. Y. Won, Z. Chen, and Q. A. Chen, "Drift with Devil: Security of Multi-Sensor Fusion based Localization in High-Level Autonomous Driving under GPS Spoofing," in 29th USENIX Security Symposium (USENIX Security 20), 2020, pp. 931–948.
- [24] Y. Zhu, C. Miao, H. Xue, Y. Yu, L. Su, and C. Qiao, "Malicious Attacks against Multi-Sensor Fusion in Autonomous Driving," in Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024, pp. 436–451.
- [25] Y. Cao, N. Wang, C. Xiao, D. Yang, J. Fang, R. Yang, Q. A. Chen, M. Liu, and B. Li, "Invisible for both Camera and LiDAR: Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical-World Attacks," in 2021 IEEE Symposium on Security and Privacy (SP), 2021, pp. 176–194.
- [26] Q. Zhang, J. Shen, M. Tan, Z. Zhou, Z. Li, Q. A. Chen, and H. Zhang, "Play the Imitation Game: Model Extraction Attack against Autonomous Driving Localization," in Proceedings of the 38th Annual Computer Security Applications Conference, 2022, pp. 56–70.
- [27] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets Robotics: The KITTI Dataset," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
- [28] V. A. Sindagi, Y. Zhou, and O. Tuzel, "MVX-Net: Multimodal VoxelNet for 3D Object Detection," in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 7276–7282.
- [29] S. Vora, A. H. Lang, B. Helou, and O. Beijbom, "PointPainting: Sequential Fusion for 3D Object Detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4604–4612.
- [30] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, "PointPillars: Fast Encoders for Object Detection from Point Clouds," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12697–12705.
- [31] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3354–3361.