Fail2Drive: Benchmarking Closed-Loop Driving Generalization
Pith reviewed 2026-05-10 17:15 UTC · model grok-4.3
The pith
A paired benchmark in CARLA shows state-of-the-art driving models suffer an average 22.8 percent success-rate drop under distribution shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that generalization under distribution shift is a central bottleneck for closed-loop autonomous driving, which the authors demonstrate by creating Fail2Drive, the first paired-route benchmark in CARLA. Each of the 200 routes has an in-distribution counterpart, so performance differences can be attributed directly to the 17 scenario classes spanning appearance, layout, behavioral, and robustness shifts. Evaluation of state-of-the-art models shows an average success-rate drop of 22.8 percent, with analysis revealing unexpected failure modes including ignoring clearly visible LiDAR objects and failing to learn fundamental concepts of free and occupied space. The benchmark's open-source toolbox supports creating new scenarios and validating their solvability with a privileged expert policy.
What carries the argument
Fail2Drive benchmark of 200 paired routes across 17 scenario classes that isolates distribution-shift effects by matching each shifted route to an in-distribution counterpart.
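The paired-route design reduces the generalization claim to a simple per-pair statistic: subtract each shifted route's success rate from its in-distribution twin's and average. A minimal sketch (route ids and numbers are illustrative, not the paper's data):

```python
def paired_drop(results: dict[str, dict[str, float]]) -> float:
    """Average success-rate drop (percentage points) across route pairs.

    `results` maps a route id to {"id": in-distribution success rate,
    "shifted": shifted-route success rate}, both in percent.
    """
    drops = [r["id"] - r["shifted"] for r in results.values()]
    return sum(drops) / len(drops)


# Toy example with two hypothetical route pairs.
example = {
    "route_001": {"id": 80.0, "shifted": 55.0},
    "route_002": {"id": 70.0, "shifted": 50.0},
}
print(paired_drop(example))  # mean of 25.0 and 20.0 -> 22.5
```

Because each shifted route keeps everything but the target factor fixed, this per-pair difference, rather than an unpaired comparison of two route sets, is what lets the drop be attributed to the shift itself.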
If this is right
- Evaluation of new driving models must include paired shifted and in-distribution routes to avoid overestimating generalization.
- Training procedures should explicitly target robustness to appearance, layout, behavioral, and robustness shifts rather than relying on memorization.
- Models that ignore visible LiDAR objects or fail to represent free and occupied space require architectural or data changes focused on spatial reasoning.
- The open-source toolbox allows creation and validation of additional scenario pairs to expand the benchmark.
Where Pith is reading between the lines
- The failure modes suggest current end-to-end models may lack basic spatial understanding that rule-based planners take for granted.
- Similar paired-route designs could be adapted for real-vehicle testing to diagnose generalization before deployment.
- The consistent degradation across models implies that scaling data or model size alone may not resolve these issues without targeted shift training.
Load-bearing premise
That the chosen scenario shifts in the CARLA simulator produce distribution changes representative of those encountered in real-world driving, rather than simulator-specific artifacts.
What would settle it
Running the same models in a higher-fidelity simulator, or in real-world closed-loop tests with matched in-distribution and shifted routes, and finding neither the consistent 22.8 percent success-rate drop nor the reported failure modes.
read the original abstract
Generalization under distribution shift remains a central bottleneck for closed-loop autonomous driving. Although simulators like CARLA enable safe and scalable testing, existing benchmarks rarely measure true generalization: they typically reuse training scenarios at test time. Success can therefore reflect memorization rather than robust driving behavior. We introduce Fail2Drive, the first paired-route benchmark for closed-loop generalization in CARLA, with 200 routes and 17 new scenario classes spanning appearance, layout, behavioral, and robustness shifts. Each shifted route is matched with an in-distribution counterpart, isolating the effect of the shift and turning qualitative failures into quantitative diagnostics. Evaluating multiple state-of-the-art models reveals consistent degradation, with an average success-rate drop of 22.8%. Our analysis uncovers unexpected failure modes, such as ignoring objects clearly visible in the LiDAR and failing to learn the fundamental concepts of free and occupied space. To accelerate follow-up work, Fail2Drive includes an open-source toolbox for creating new scenarios and validating solvability via a privileged expert policy. Together, these components establish a reproducible foundation for benchmarking and improving closed-loop driving generalization. We open-source all code, data, and tools at https://github.com/autonomousvision/fail2drive.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Fail2Drive, a new benchmark for closed-loop driving generalization in CARLA consisting of 200 routes across 17 scenario classes that introduce appearance, layout, behavioral, and robustness distribution shifts. Each shifted route is explicitly paired with an in-distribution counterpart to isolate the effect of the shift. Evaluation of multiple state-of-the-art models shows an average success-rate drop of 22.8%, accompanied by qualitative analysis of failure modes such as ignoring LiDAR-visible objects and failing to learn free/occupied space concepts. The work also releases an open-source toolbox for scenario generation and solvability validation using a privileged expert policy.
Significance. If the paired-route design validly isolates distribution-shift effects, Fail2Drive would provide a reproducible, quantitative foundation for diagnosing generalization failures in closed-loop autonomous driving, moving beyond memorization of training scenarios. The consistent degradation across models and the identification of specific failure modes (e.g., LiDAR object ignoring) offer actionable diagnostics, while the open-sourced code, data, and tools lower the barrier for follow-up work.
major comments (2)
- [§3 and §4] §3 (Benchmark Construction) and §4 (Experiments): The central claim that the 22.8% success-rate drop measures generalization under the intended shifts rests on the assumption that each shifted route differs from its in-distribution pair only in the target factor. The manuscript describes the pairing and provides a solvability-validation toolbox, but does not report quantitative expert-policy success rates on both members of each pair. Without this, it remains possible that some pairs introduce solvability differences that the models simply expose rather than pure generalization gaps.
- [§4.2] §4.2 (Failure Mode Analysis): The claims that models 'ignore objects clearly visible in the LiDAR' and 'fail to learn the fundamental concepts of free and occupied space' are presented as unexpected diagnostics. These would be strengthened by quantitative supporting metrics (e.g., frequency of such events across multiple runs, comparison against expert trajectories, or occlusion/visibility statistics) rather than relying primarily on qualitative examples, especially given CARLA's idealized ray-casting LiDAR.
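The solvability check the first comment asks for can be sketched as a filter over candidate pairs. The function name, oracle interface, and tolerance below are illustrative assumptions, not the paper's actual validation procedure:

```python
def filter_solvable_pairs(pairs, expert_success_rate, tol=5.0):
    """Keep only route pairs whose privileged-expert success rates differ
    by at most `tol` percentage points, so that model drops can be
    attributed to the distribution shift rather than to unequal route
    difficulty. `expert_success_rate` is a hypothetical oracle mapping a
    route id to the expert's success rate in percent.
    """
    kept = []
    for in_route, shifted_route in pairs:
        if abs(expert_success_rate(in_route) - expert_success_rate(shifted_route)) <= tol:
            kept.append((in_route, shifted_route))
    return kept


# Toy usage: the second pair is rejected because the expert itself
# struggles on the shifted route, i.e. solvability is not comparable.
rates = {"r1_id": 98.0, "r1_shift": 96.0, "r2_id": 97.0, "r2_shift": 80.0}
pairs = [("r1_id", "r1_shift"), ("r2_id", "r2_shift")]
kept = filter_solvable_pairs(pairs, rates.get)  # keeps only the first pair
```

Reporting the two expert rates per pair, as the comment suggests, is the tabular form of exactly this check.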
minor comments (2)
- [Abstract and §4] The abstract states '200 routes' but the main text should clarify the exact distribution across the 17 classes and the number of evaluation episodes per route to allow readers to assess statistical reliability of the 22.8% aggregate figure.
- [§3] Notation for success rate and the precise definition of 'paired-route' matching criteria (e.g., how layout or behavior shifts are controlled while keeping other variables fixed) could be made more explicit in §3 to aid reproducibility.
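The statistical-reliability concern about the 22.8% aggregate could be addressed with a resampling interval over per-route drops. A minimal percentile-bootstrap sketch (the drop values are invented for illustration):

```python
import random


def bootstrap_ci(per_route_drops, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean per-route
    success-rate drop. Resamples routes with replacement `n_boot` times
    and returns the (alpha/2, 1 - alpha/2) percentiles of the resampled
    means."""
    rng = random.Random(seed)
    n = len(per_route_drops)
    means = sorted(sum(rng.choices(per_route_drops, k=n)) / n
                   for _ in range(n_boot))
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Illustrative per-route drops in percentage points (not the paper's data).
drops = [25.0, 20.0, 30.0, 15.0, 22.0, 18.0, 28.0, 24.0]
lo, hi = bootstrap_ci(drops)
```

With the per-class route counts and episodes per route reported, readers could run this kind of interval themselves to judge how tight the aggregate figure is.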
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be made to strengthen the paper.
read point-by-point responses
-
Referee: [§3 and §4] §3 (Benchmark Construction) and §4 (Experiments): The central claim that the 22.8% success-rate drop measures generalization under the intended shifts rests on the assumption that each shifted route differs from its in-distribution pair only in the target factor. The manuscript describes the pairing and provides a solvability-validation toolbox, but does not report quantitative expert-policy success rates on both members of each pair. Without this, it remains possible that some pairs introduce solvability differences that the models simply expose rather than pure generalization gaps.
Authors: We agree that explicitly reporting the expert-policy success rates on both members of each pair would provide stronger evidence that the observed drops reflect generalization gaps rather than solvability differences. The solvability-validation toolbox (using the privileged expert) was applied during benchmark construction to filter routes, but the manuscript does not include the per-pair quantitative rates. In the revised version, we will add a table reporting the expert success rates for all 200 routes (in-distribution and shifted pairs), confirming that solvability is comparable across pairs and isolating the effect of the distribution shifts. revision: yes
-
Referee: [§4.2] §4.2 (Failure Mode Analysis): The claims that models 'ignore objects clearly visible in the LiDAR' and 'fail to learn the fundamental concepts of free and occupied space' are presented as unexpected diagnostics. These would be strengthened by quantitative supporting metrics (e.g., frequency of such events across multiple runs, comparison against expert trajectories, or occlusion/visibility statistics) rather than relying primarily on qualitative examples, especially given CARLA's idealized ray-casting LiDAR.
Authors: We acknowledge that the failure-mode claims would benefit from quantitative backing beyond the qualitative examples. The analysis draws from observed behaviors across multiple model evaluations and runs, but the manuscript presents them illustratively. In the revision, we will incorporate quantitative metrics, including the frequency of LiDAR-visible object collisions (computed via post-hoc trajectory analysis with visibility checks), trajectory comparisons to the expert policy on free/occupied space violations, and basic occlusion statistics where feasible. This will complement the examples while noting the idealized nature of CARLA's LiDAR. revision: yes
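The post-hoc metric the authors promise could take a form like the following hedged sketch: count the fraction of collisions in which the struck object returned enough LiDAR points shortly before impact. The function name, thresholds, and data layout are illustrative assumptions, not the paper's protocol:

```python
def lidar_visible_collision_rate(collision_traces, min_points=10, lookback=20):
    """Fraction of collisions in which the struck object was 'clearly
    visible' in the LiDAR: at least `min_points` returns in each of the
    `lookback` frames preceding impact. Each trace is the per-frame LiDAR
    point count on the struck object, ending at the collision frame."""
    if not collision_traces:
        return 0.0
    visible = sum(
        1
        for counts in collision_traces
        if len(counts) >= lookback
        and all(c >= min_points for c in counts[-lookback:])
    )
    return visible / len(collision_traces)


# Toy traces: dense returns (visible), sparse returns, too-short history.
traces = [[12] * 25, [3] * 25, [15] * 5]
rate = lidar_visible_collision_rate(traces)  # 1 of 3 collisions counts
```

Such a count, aggregated over runs, would turn the qualitative "ignores LiDAR-visible objects" observation into the frequency statistic the referee requests, while the idealized ray-casting caveat still applies to the visibility labels themselves.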
Circularity Check
Empirical benchmark construction with no derivations or self-referential predictions
full rationale
The paper introduces a paired-route benchmark in CARLA for measuring closed-loop generalization, evaluates existing models on 200 routes across 17 scenario classes, and reports direct success-rate drops plus qualitative failure modes. No equations, parameter fits, uniqueness theorems, or ansatzes appear; the central claims rest on simulator measurements and external model performance rather than any derivation chain that reduces to its own inputs by construction. Self-citations are absent from the provided text, and the open-sourced toolbox is a reproducibility aid, not a load-bearing premise. This is a standard empirical benchmark paper whose results are falsifiable against the simulator and independent models.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: CARLA simulator dynamics and sensor models are sufficiently faithful to real-world driving to diagnose generalization failures that would occur outside simulation.
- domain assumption: The pairing procedure isolates the intended distribution shift without introducing uncontrolled differences in route difficulty or solvability.
Forward citations
Cited by 1 Pith paper
-
MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems
MDrive benchmark shows multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings, but perception sharing does not always improve planning and negotiation can harm performance...
Reference graph
Works this paper leans on
- [1] Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, et al. Cosmos-Transfer1: Conditional world generation with adaptive multimodal control. arXiv preprint, 2025.
- [2] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yuxin Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- [3] Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric M. Wolff, Alex H. Lang, Luke Fletcher, Oscar Beijbom, and Sammy Omari. nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021.
- [4] CARLA Contributors. CARLA autonomous driving leaderboard 2.0. https://leaderboard.carla.org/.
- [5] Dian Chen and Philipp Krähenbühl. Learning from all vehicles. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- [6] Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, and Hongyang Li. End-to-end autonomous driving: Challenges and frontiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
- [7] Kashyap Chitta, Aditya Prakash, and Andreas Geiger. NEAT: Neural attention fields for end-to-end autonomous driving. In IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
- [8] Felipe Codevilla, Antonio M. Lopez, Vladlen Koltun, and Alexey Dosovitskiy. On offline evaluation of vision-based driving models. In European Conference on Computer Vision (ECCV), 2018.
- [9] Felipe Codevilla, Eder Santana, Antonio M. López, and Adrien Gaidon. Exploring the limitations of behavior cloning for autonomous driving. In IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
- [10] Alexander Cui, Abbas Sadat, Sergio Casas, Renjie Liao, and Raquel Urtasun. LookOut: Diverse multi-future prediction and planning for self-driving. In IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
- [11] Marco F. Cusumano-Towner, David Hafner, Alexander Hertzberg, Brody Huval, Aleksei Petrenko, Eugene Vinitsky, Erik Wijmans, Taylor Killian, Stuart Bowers, Ozan Sener, Philipp Krähenbühl, and Vladlen Koltun. Robust autonomy emerges from self-play. arXiv preprint, 2502.03349, 2025.
- [12] Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, and Kashyap Chitta. NAVSIM: Data-driven non-reactive autonomous vehicle simulation and benchmarking. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
- [13] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator, 2017.
- [14] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. In Conference on Robot Learning (CoRL), 2017.
- [15] Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, and Xiang Bai. ORION: A holistic end-to-end autonomous driving framework by vision-language instructed action generation. arXiv preprint, 2503.19755.
- [16] Simon Gerstenecker, Andreas Geiger, and Katrin Renz. PlanT 2.0: Exposing biases and structural flaws in closed-loop driving, 2025.
- [17] Marcel Hallgarten, Julián Zapata, Martin Stoll, Katrin Renz, and Andreas Zell. Can vehicle motion planning generalize to realistic long-tail scenarios? In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024.
- [18] Isaac Han, Dong-Hyeok Park, and Kyung-Joong Kim. A new open-source off-road environment for benchmark generalization of autonomous driving. IEEE Access, 2021.
- [19] Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, et al. Planning-oriented autonomous driving. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
- [20] Bernhard Jaeger, Kashyap Chitta, and Andreas Geiger. Hidden biases of end-to-end driving models. In IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
- [21] Bernhard Jaeger, Kashyap Chitta, Daniel Dauner, Katrin Renz, and Andreas Geiger. Common Mistakes in Benchmarking Autonomous Driving. https://github.com/autonomousvision/carla_garage/blob/leaderboard_2/docs/common_mistakes_in_benchmarking_ad.md, 2024.
- [22] Bernhard Jaeger, Daniel Dauner, Jens Beißwenger, Simon Gerstenecker, Kashyap Chitta, and Andreas Geiger. CaRL: Learning scalable planning policies with simple rewards. arXiv preprint, 2504.17838, 2025.
- [23] Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, and Junchi Yan. Bench2Drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. In NeurIPS 2024 Datasets and Benchmarks Track, 2024.
- [24] Kashyap Chitta, Aditya Prakash, Bernhard Jaeger, Zehao Yu, Katrin Renz, and Andreas Geiger. TransFuser: Imitation with transformer-based sensor fusion for autonomous driving. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
- [25] William Ljungbergh, Adam Tonderski, Joakim Johnander, Holger Caesar, Kalle Åström, Michael Felsberg, and Christoffer Petersson. NeuroNCAP: Photorealistic closed-loop safety testing for autonomous driving. In European Conference on Computer Vision (ECCV), 2024.
- [26] Yichong Lu, Yichi Cai, Shangzhan Zhang, Hongyu Zhou, Haoji Hu, Huimin Yu, Andreas Geiger, and Yiyi Liao. UrbanCAD: Towards highly controllable and photorealistic 3D vehicles for urban scene simulation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- [27] Kira Maag, Robin Chan, Svenja Uhlemeyer, Kamil Kowol, and Hanno Gottschalk. Two video data sets for tracking and retrieval of out of distribution objects. In Asian Conference on Computer Vision (ACCV), 2023.
- [28] Federico Nesti, Giulio Rossolini, Saasha Nair, Alessandro Biondi, and Giorgio Buttazzo. Evaluating the robustness of semantic segmentation for autonomous driving against real-world adversarial patch attacks. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022.
- [29] Błażej Osiński, Piotr Miłoś, Adam Jakubowski, Paweł Zięcina, Michał Martyniak, Christopher Galias, Antonia Breuer, Silviu Homoceanu, and Henryk Michalewski. CARLA real traffic scenarios – novel training ground and benchmark for autonomous driving, 2021.
- [30] Katrin Renz, Long Chen, Elahe Arani, and Oleg Sinavski. SimLingo: Vision-only closed-loop autonomous driving with language-action alignment. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
- [31] Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, and Hongyang Li. DriveLM: Driving with graph visual question answering. In European Conference on Computer Vision (ECCV), 2024.
- [32] Naufal Suryanto, Yongsu Kim, Hyoeun Kang, Harashta Tatimma Larasati, Youngyeo Yun, Thi-Thu-Huong Le, Hunmin Yang, Se-Yoon Oh, and Howon Kim. DTA: Physical camouflage attacks using differentiable transformation network. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- [33] Yingqi Tang, Zhuoran Xu, Zhaotie Meng, and Erkang Cheng. HiP-AD: Hierarchical and multi-granularity planning with deformable attention for autonomous driving in a single decoder, 2025.
- [34] Charles Thorpe, Martial H. Hebert, Takeo Kanade, and Steven A. Shafer. Vision and navigation for the Carnegie-Mellon Navlab. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1988.
- [35] Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, and Yu Qiao. Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [36] Chejian Xu, Wenhao Ding, Weijie Lyu, Zuxin Liu, Shuai Wang, Yihan He, Hanjiang Hu, Ding Zhao, and Bo Li. SafeBench: A benchmarking platform for safety evaluation of autonomous vehicles. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [37] Wenda Xu, Jia Pan, Junqing Wei, and John M. Dolan. Motion planning under uncertainty for on-road autonomous driving. In IEEE International Conference on Robotics and Automation (ICRA), 2014.
- [38] Jiawei Zhang, Chejian Xu, and Bo Li. ChatScene: Knowledge-enabled safety-critical scenario generation for autonomous vehicles. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
- [39] Hongyu Zhou, Longzhong Lin, Jiabao Wang, Yichong Lu, Dongfeng Bai, Bingbing Liu, Yue Wang, Andreas Geiger, and Yiyi Liao. HUGSIM: A real-time, photo-realistic and closed-loop simulator for autonomous driving. arXiv preprint, 2412.01718, 2024.
- [40] Zewei Zhou, Tianhui Cai, Seth Z. Zhao, Yun Zhang, Zhiyu Huang, Bolei Zhou, and Jiaqi Ma. AutoVLA: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning. arXiv preprint, 2506.13757, 2025.
- [41] Julian Zimmerlin, Jens Beißwenger, Bernhard Jaeger, Andreas Geiger, and Kashyap Chitta. Hidden biases of end-to-end driving datasets. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024.
Scenario descriptions (from the paper's supplementary material)
- BadParking: A parked vehicle partially occludes the ego lane. Unlike the standard CARLA parked-vehicle scenario, which always places the vehicle in the same position, this variant can be defined with any orientation, location, and asset, challenging models' spatial understanding with known obstacles. The standard ParkedObstacle scenario serves as the in-distribution sample.
- ConstructionPermutations: A modified version of the standard ConstructionObstacle in which construction assets can be replaced or removed, isolating dependencies on specific parts of construction sites. The in-distribution sample is the default ConstructionObstacle.
- CustomObstacle: Fully customizable obstacles block the road. The obstacles can be defined by any number of CARLA assets at arbitrary locations and orientations, enabling testing of generalization to unseen objects and structures. Depending on obstacle size, a ParkedObstacle or ConstructionObstacle is used as the in-distribution sample.
- ObscuredStop: Occlusions are placed on stop signs at intersection entries, challenging visual traffic-sign detection. Five different occlusions are included with Fail2Drive, and any CARLA asset can be used. The in-distribution sample includes the scenario with no occlusion.
- HardBrakeNoLights: The leading vehicle suddenly brakes with disabled brake lights, testing whether models can judge distance and deceleration without this cue. The classic HardBrake scenario with active brake lights is the in-distribution sample.
- RightOfWay: A custom vehicle takes the ego vehicle's priority while crossing a junction. Since CARLA includes this scenario only with emergency vehicles, the variations test whether models yield only to emergency vehicles or generalize to other traffic participants. The emergency-vehicle scenarios serve as the in-distribution sample.
- Animals: An animal crosses the road, forcing the ego vehicle to react and testing generalization to actors with appearances and shapes unlike pedestrians. Fail2Drive introduces 17 animal assets usable in all pedestrian scenarios; by default, CARLA includes only pedestrians, which define the in-distribution scenario.
- PedestrianOtherBlocker: A pedestrian emerges from behind an unseen object to cross the road, evaluating whether models overfit to expect pedestrians only from certain objects. The in-distribution scenario uses the default CARLA assets.
- RightConstruction: A construction obstacle is placed off the road to the right, requiring no reaction from the ego vehicle and testing whether models react to known cues even outside the relevant regions. The in-distribution sample includes no scenario.
- OppositeConstruction: A construction site is placed in the opposite lane, requiring no reaction from the ego vehicle, again testing overfitting to scenario structures. The in-distribution sample includes no scenario.
- ImageOnObject: A deceptive image is placed on an advertisement or a bus stop; the ego vehicle should not react. Images include a walking child at two scales and a red light, testing whether models can distinguish printed images from real objects. The in-distribution scenario includes no image.
- PassableObstacles: Objects are placed on or near the road such that the vehicle can pass while keeping its lane, testing the ability to disregard irrelevant objects that do not affect driving behavior. The in-distribution scenario includes no obstacles.
- PedestrianCrowd: A large number of pedestrians stands on the sidewalk while the ego vehicle passes or performs a scenario. Since pedestrians in CARLA v2 are only present when relevant to a scenario, models may learn to react strongly to their mere presence. The in-distribution sample is the same scenarios without any pedestrians.
- ConstructionPedestrian: While passing a construction site, a pedestrian crosses the road, requiring the model to generalize to stopping during an overtaking maneuver, which is not shown during training. The default ConstructionObstacle without a pedestrian serves as the in-distribution sample.
- PedestriansOnRoad: Pedestrians walk on the road in front of the ego vehicle, requiring deceleration or an evasive maneuver and testing whether pedestrians are correctly identified and responded to in out-of-distribution scenarios. The in-distribution sample is the underlying route without a scenario.
- FullyBlocked: An object blocks the entire road, forcing the ego vehicle to stop and wait 60 seconds until the obstacle is removed and the vehicle can pass. While training shows only passable objects, this tests whether models generalize to stopping and waiting at obstacles. The in-distribution sample uses no scenario.
- Wall: A large-scale wall with a printed image is placed on the road, requiring the agent to wait 60 seconds until the obstacle is removed; in addition to the waiting behavior, this introduces highly deceptive visuals. Fail2Drive includes one brick wall and three walls with images of roads. The in-distribution route is again defined without the obstacle.
discussion (0)