pith. machine review for the scientific record.

arxiv: 2604.11854 · v1 · submitted 2026-04-13 · 💻 cs.RO · cs.AI

Recognition: unknown

MVAdapt: Zero-Shot Multi-Vehicle Adaptation for End-to-End Autonomous Driving

Haesung Oh, Jaeheung Park


Pith reviewed 2026-05-10 15:48 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords end-to-end autonomous driving · vehicle adaptation · zero-shot transfer · physics conditioning · multi-vehicle · waypoint prediction · CARLA benchmark

The pith

Conditioning end-to-end driving policies on vehicle physics properties enables zero-shot adaptation to new vehicles with different sizes, masses, and drivetrains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the vehicle-domain gap in end-to-end autonomous driving, where a policy trained on one vehicle loses performance when deployed on another because its outputs are tied to specific dynamics. It introduces MVAdapt, which keeps a scene encoder frozen and adds a lightweight physics encoder plus cross-attention to modify scene features according to vehicle property vectors before waypoint decoding. This setup is tested in the CARLA Leaderboard 1.0 benchmark, where it outperforms naive transfer and multi-embodiment baselines on both in-distribution and unseen vehicles. The results show strong zero-shot transfer for many vehicles and data-efficient few-shot calibration only when physical differences are extreme. A sympathetic reader would see this as a step toward driving models that can be reused across varied vehicle fleets without per-vehicle retraining.

Core claim

MVAdapt is a physics-conditioned adaptation framework that freezes a TransFuser++ scene encoder and inserts a lightweight physics encoder together with a cross-attention module. The cross-attention conditions the extracted scene features on vehicle property vectors such as size, mass, and drivetrain type before the waypoint decoder produces outputs. In the CARLA Leaderboard 1.0 benchmark this yields measurable gains over naive transfer and multi-embodiment baselines for both in-distribution and unseen vehicles, with complementary behaviors of strong zero-shot transfer on many vehicles and data-efficient few-shot calibration only for severe physical outliers.

What carries the argument

The cross-attention module that fuses outputs from a lightweight physics encoder with frozen scene features to adjust waypoint predictions according to vehicle property vectors.
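The paper's text does not spell out this module's exact shapes or weights, so as a rough illustration of the kind of conditioning described, here is a minimal NumPy sketch in which frozen scene tokens cross-attend to physics tokens derived from a static vehicle property vector. All names, dimensions, the property layout, and the projection matrices are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def condition_scene_features(scene, phys_tokens, Wq, Wk, Wv):
    """Cross-attention: frozen scene tokens (queries) attend to physics tokens."""
    q = scene @ Wq                                   # (n, d) queries
    k = phys_tokens @ Wk                             # (m, d) keys
    v = phys_tokens @ Wv                             # (m, d) values
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (n, m), rows sum to 1
    return scene + attn @ v                          # residual conditioning

rng = np.random.default_rng(0)
n, m, d, d_p = 8, 4, 16, 6                           # token counts / widths (assumed)

# Hypothetical normalized property vector: mass, wheelbase, width, drivetrain one-hot.
props = np.array([0.8, 0.75, 0.76, 0.0, 1.0, 0.0])
W_phys = rng.normal(size=(d_p, m * d))
phys_tokens = np.tanh(props @ W_phys).reshape(m, d)  # lightweight physics "encoder"

scene = rng.normal(size=(n, d))                      # stand-in for frozen scene features
out = condition_scene_features(scene, phys_tokens,
                               rng.normal(size=(d, d)),
                               rng.normal(size=(d, d)),
                               rng.normal(size=(d, d)))
assert out.shape == scene.shape                      # conditioned features keep shape
```

The residual form means the frozen scene representation passes through unchanged up to an additive, physics-dependent correction, which matches the described design of modifying scene features without retraining the backbone.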

If this is right

  • Waypoint predictions become more accurate on both familiar and previously unseen vehicle types without additional training.
  • Strong zero-shot transfer occurs across a wide range of vehicle property combinations.
  • Only a small amount of calibration data is required when the new vehicle is a severe physical outlier.
  • Explicit conditioning on vehicle physics improves overall transferability of end-to-end policies in simulation benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If vehicle properties can be measured or estimated at deployment time, the same conditioning mechanism could support on-the-fly adaptation during operation.
  • The separation of scene encoding from physics encoding may reduce the cost of maintaining policies across large mixed fleets of vehicles.
  • The approach could be tested for transfer to other control domains where embodiment differences affect policy outputs.

Load-bearing premise

That vehicle property vectors fed through a lightweight physics encoder and cross-attention can meaningfully reshape scene features for accurate waypoint prediction without any vehicle-specific fine-tuning data or internal dynamics simulation.

What would settle it

If MVAdapt produces no measurable improvement in waypoint accuracy or collision rate over a naive transfer baseline when evaluated on a vehicle whose mass or wheelbase lies far outside the training distribution in the CARLA simulator, the value of the physics-conditioning step would be called into question.

Figures

Figures reproduced from arXiv: 2604.11854 by Haesung Oh, Jaeheung Park.

Figure 1: Advantages of MVAdapt. Conventional E2E models lack transferability across vehicles (left), while MVAdapt enables zero-shot adaptation to unseen vehicle types (right).
Figure 2: Few-shot adaptation. Even if MVAdapt cannot adapt to an exceptional unseen vehicle in a zero-shot manner (left), it shows fine-tuning ability with a minimal dataset (right).
Figure 3: Overall architecture of MVAdapt. Raw sensor inputs (camera image and LiDAR point cloud) are processed by a frozen TransFuser++ backbone to extract scene features, while vehicle-specific physical properties are encoded into a physics embedding. A multi-head transformer encoder fuses the physics embedding with scene features, producing an integrated feature embedding that conditions perception on the ego-veh…
Figure 4: A Tesla Cybertruck making a right turn. Top (Ours): MVAdapt successfully navigates the turn by accounting for the vehicle’s large size. Bottom (Baseline): The baseline misjudges the turning radius and gets blocked by another car. The red dots represent the model’s output trajectory, and the blue dot indicates the target point.
Figure 5: A Mini Cooper making a right turn. Top (Ours): MVAdapt executes a smooth, tight turn appropriate for the vehicle. Bottom (Baseline): The baseline model over-rotates and hits the curb, failing the maneuver. The red dots represent the model’s output trajectory, and the blue dot indicates the target point.
Figure 6: Catastrophic failure during a right turn.
Original abstract

End-to-End (E2E) autonomous driving models are usually trained and evaluated with a fixed ego-vehicle, even though their driving policy is implicitly tied to vehicle dynamics. When such a model is deployed on a vehicle with different size, mass, or drivetrain characteristics, its performance can degrade substantially; we refer to this problem as the vehicle-domain gap. To address it, we propose MVAdapt, a physics-conditioned adaptation framework for multi-vehicle E2E driving. MVAdapt combines a frozen TransFuser++ scene encoder with a lightweight physics encoder and a cross-attention module that conditions scene features on vehicle properties before waypoint decoding. In the CARLA Leaderboard 1.0 benchmark, MVAdapt improves over naive transfer and multi-embodiment adaptation baselines on both in-distribution and unseen vehicles. We further show two complementary behaviors: strong zero-shot transfer on many unseen vehicles, and data-efficient few-shot calibration for severe physical outliers. These results suggest that explicitly conditioning E2E driving policies on vehicle physics is an effective step toward more transferable autonomous driving models. All codes are available at https://github.com/hae-sung-oh/MVAdapt

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes MVAdapt, a zero-shot adaptation framework for end-to-end autonomous driving that addresses the vehicle-domain gap by freezing a TransFuser++ scene encoder and conditioning its features on vehicle properties (size, mass, drivetrain) via a lightweight physics encoder and cross-attention module before waypoint decoding. It claims improved performance over naive transfer and multi-embodiment baselines on the CARLA Leaderboard 1.0 for both in-distribution and unseen vehicles, plus data-efficient few-shot calibration for physical outliers, with all code released.

Significance. If the empirical gains hold under scrutiny, the work provides evidence that explicit static physics conditioning can enhance transferability of E2E policies without vehicle-specific fine-tuning or in-loop dynamics simulation. This is a meaningful step toward more robust deployment across heterogeneous vehicle fleets. The open-source code release strengthens reproducibility.

major comments (3)
  1. [Methods (physics encoder and cross-attention)] The central claim that cross-attention on static vehicle property vectors produces dynamics-consistent feature adjustments for waypoint prediction (without state feedback or simulation) is load-bearing for the zero-shot transfer result. The manuscript should include an analysis or visualization showing how the adapted features differ for novel mass/inertia combinations in a manner consistent with CARLA's physics engine rather than memorized correlations from training vehicles.
  2. [Results] Results section: the abstract states improvement on CARLA Leaderboard 1.0 but the provided description lacks specific quantitative metrics, standard deviations, or per-vehicle breakdowns. Without these, the magnitude of gains over baselines and the distinction between in-distribution vs. unseen vehicles cannot be evaluated for statistical reliability.
  3. [Experiments / Ablations] The evaluation relies on CARLA's enforced physics, yet no ablation isolates whether gains arise from the physics encoder versus other factors (e.g., the multi-embodiment baseline details). This is needed to support the claim that explicit conditioning, rather than implicit memorization, drives the zero-shot behavior.
minor comments (2)
  1. [Methods] Clarify the exact vehicle property vector dimensionality and normalization used in the physics encoder; this affects reproducibility.
  2. [Abstract / Results] The abstract mentions 'strong zero-shot transfer on many unseen vehicles' but does not define the criteria for 'many' or list the specific unseen vehicles tested; add this to the results for clarity.
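The first minor comment asks for the property vector's dimensionality and normalization, which the available text does not specify. One plausible scheme, sketched below purely as an assumption, is to z-score continuous properties against the training fleet and one-hot encode the categorical drivetrain; the field names, fleet values, and resulting 6-dimensional layout are all hypothetical.

```python
import numpy as np

# Hypothetical training-fleet statistics (mass_kg, wheelbase_m, width_m);
# the paper does not publish its actual fields or normalization.
TRAIN_FLEET = np.array([
    [1500.0, 2.7, 1.8],
    [1200.0, 2.5, 1.7],
    [2600.0, 3.4, 2.0],
])
MU, SIGMA = TRAIN_FLEET.mean(axis=0), TRAIN_FLEET.std(axis=0)

DRIVETRAINS = ["fwd", "rwd", "awd"]  # assumed categorical encoding

def encode_properties(mass_kg, wheelbase_m, width_m, drivetrain):
    """Z-score continuous properties, one-hot the drivetrain, concatenate."""
    cont = (np.array([mass_kg, wheelbase_m, width_m]) - MU) / SIGMA
    onehot = np.eye(len(DRIVETRAINS))[DRIVETRAINS.index(drivetrain)]
    return np.concatenate([cont, onehot])  # 6-dimensional property vector

v = encode_properties(2400.0, 3.0, 1.9, "awd")
assert v.shape == (6,)
```

Under a scheme like this, an unseen vehicle far outside the fleet statistics shows up directly as large-magnitude z-scores, which is one way the "severe physical outlier" regime discussed in the results could be made concrete.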

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of explicit physics conditioning for zero-shot transfer in end-to-end driving. We address each major comment below with targeted revisions that strengthen the manuscript without altering its core claims.

read point-by-point responses
  1. Referee: [Methods (physics encoder and cross-attention)] The central claim that cross-attention on static vehicle property vectors produces dynamics-consistent feature adjustments for waypoint prediction (without state feedback or simulation) is load-bearing for the zero-shot transfer result. The manuscript should include an analysis or visualization showing how the adapted features differ for novel mass/inertia combinations in a manner consistent with CARLA's physics engine rather than memorized correlations from training vehicles.

    Authors: We agree that an explicit demonstration of dynamics-consistent feature adaptation is valuable to support the zero-shot claims. In the revised manuscript we will add a new analysis subsection with visualizations of cross-attention weights and feature deltas for vehicles with varying mass, size, and inertia (including unseen combinations). We will relate these changes to CARLA physics expectations, such as altered steering curvature or braking response for heavier vehicles, using both quantitative metrics and qualitative examples to distinguish from memorization of training-vehicle correlations. revision: yes

  2. Referee: [Results] Results section: the abstract states improvement on CARLA Leaderboard 1.0 but the provided description lacks specific quantitative metrics, standard deviations, or per-vehicle breakdowns. Without these, the magnitude of gains over baselines and the distinction between in-distribution vs. unseen vehicles cannot be evaluated for statistical reliability.

    Authors: The full results section and supplementary tables already report driving scores, route completion, and infractions as means with standard deviations across multiple evaluation seeds, together with per-vehicle breakdowns separating in-distribution and unseen vehicles. To improve clarity we will revise the abstract to include the key quantitative gains (e.g., absolute and relative improvements on unseen vehicles) while preserving conciseness. revision: yes

  3. Referee: [Experiments / Ablations] The evaluation relies on CARLA's enforced physics, yet no ablation isolates whether gains arise from the physics encoder versus other factors (e.g., the multi-embodiment baseline details). This is needed to support the claim that explicit conditioning, rather than implicit memorization, drives the zero-shot behavior.

    Authors: We will add a targeted ablation that keeps the frozen scene encoder and waypoint decoder fixed while replacing the physics encoder and cross-attention with either a constant vehicle embedding or a non-physics multi-embodiment variant. The revised paper will report the resulting drop in zero-shot performance on unseen vehicles, thereby isolating the contribution of explicit static physics conditioning. revision: yes
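The ablation the simulated authors propose, replacing the physics pathway with a constant embedding, can be sketched in a few lines. The code below is a toy illustration of the logic of that control, not the authors' experiment: a shared conditioning path is fed either a vehicle-dependent physics embedding or a single constant vector, and only the former produces features that differ between vehicles.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16

def condition(scene, embedding):
    # Shared conditioning path: add a (possibly vehicle-specific) embedding
    # to every frozen scene token before waypoint decoding.
    return scene + embedding

def physics_embedding(props, W):
    return np.tanh(props @ W)            # varies with the vehicle's properties

CONSTANT_EMBEDDING = rng.normal(size=d)  # ablation: same vector for every vehicle

scene = rng.normal(size=(8, d))          # stand-in for frozen scene features
W = rng.normal(size=(4, d))
# Hypothetical normalized property vectors (mass, wheelbase, width, drivetrain flag).
sedan = np.array([1.2, 2.7, 1.8, 0.0])
suv = np.array([2.6, 3.2, 2.0, 1.0])

full = condition(scene, physics_embedding(sedan, W)) - condition(scene, physics_embedding(suv, W))
ablated = condition(scene, CONSTANT_EMBEDDING) - condition(scene, CONSTANT_EMBEDDING)

# The ablation removes all vehicle dependence from the conditioned features,
# so any remaining zero-shot gain cannot come from physics conditioning.
assert np.allclose(ablated, 0.0)
assert np.abs(full).max() > 0
```

Comparing driving performance between these two variants on unseen vehicles is exactly the isolation the referee's third major comment asks for.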

Circularity Check

0 steps flagged

No significant circularity; empirical adaptation framework with no derivations or fitted predictions

full rationale

The paper describes an empirical method (frozen TransFuser++ encoder + lightweight physics encoder + cross-attention for conditioning on static vehicle properties) evaluated on CARLA benchmarks for zero-shot and few-shot transfer. No equations, derivations, or first-principles predictions are presented that could reduce to inputs by construction. Claims rest on benchmark improvements over baselines rather than any relabeled fit or self-referential definition. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text. This is a standard non-circular empirical ML paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, training details, or explicit assumptions; therefore no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5504 in / 1054 out tokens · 32369 ms · 2026-05-10T15:48:12.615353+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

41 extracted references · 7 canonical work pages · 1 internal anchor

  1. [1] L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger, and H. Li, “End-to-end autonomous driving: Challenges and frontiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
  2. [2] E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda, “A survey of autonomous driving: Common practices and emerging technologies,” IEEE Access, vol. 8, pp. 58443–58469, 2020.
  3. [3] P. S. Chib and P. Singh, “Recent advancements in end-to-end autonomous driving using deep learning: A survey,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 103–118, 2023.
  4. [4] A. Tampuu, T. Matiisen, M. Semikin, D. Fishman, and N. Muhammad, “A survey of end-to-end driving: Architectures and training methods,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 4, pp. 1364–1384, 2020.
  5. [5] S. Kuutti, R. Bowden, Y. Jin, P. Barber, and S. Fallah, “A survey of deep learning applications to autonomous vehicle control,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 712–733, 2020.
  6. [6] X. Hu, S. Li, T. Huang, B. Tang, R. Huai, and L. Chen, “How simulation helps autonomous driving: A survey of sim2real, digital twins, and parallel intelligence,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 593–612, 2023.
  7. [7] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 23–30.
  8. [8] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy, “End-to-end driving via conditional imitation learning,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 4693–4700.
  9. [9] J. So, A. Xie, S. Jung, J. Edlund, R. Thakker, A. Agha-mohammadi, P. Abbeel, and S. James, “Sim-to-real via sim-to-seg: End-to-end off-road autonomous driving without real data,” arXiv preprint arXiv:2210.14721, 2022.
  10. [10] C. Sakaridis, D. Dai, and L. Van Gool, “ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10765–10775.
  11. [11] M. Bijelic, T. Gruber, F. Mannan, F. Kraus, W. Ritter, K. Dietmayer, and F. Heide, “Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11682–11692.
  12. [12] R. Zhu, P. Huang, E. Ohn-Bar, and V. Saligrama, “Learning to drive anywhere,” arXiv preprint arXiv:2309.12295, 2023.
  13. [13] Y. Tsuchiya, T. Balch, P. Drews, and G. Rosman, “Online adaptation of learned vehicle dynamics model with meta-learning approach,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 802–809.
  14. [14] Y. Wang, X. Chen, Y. You, L. E. Li, B. Hariharan, M. Campbell, K. Q. Weinberger, and W.-L. Chao, “Train in Germany, test in the USA: Making 3D object detectors generalize,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11713–11723.
  15. [15] M. Schwonberg, J. Niemeijer, J.-A. Termöhlen, N. M. Schmidt, H. Gottschalk, T. Fingscheidt et al., “Survey on unsupervised domain adaptation for semantic segmentation for visual perception in automated driving,” IEEE Access, vol. 11, pp. 54296–54336, 2023.
  16. [16] N. A. Ajak, W. H. Ong, and O. A. Malik, “A comparison of imitation learning pipelines for autonomous driving on the effect of change in ego-vehicle,” in 2024 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2024, pp. 1693–1698.
  17. [17] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., “End to end learning for self-driving cars,” arXiv preprint arXiv:1604.07316, 2016.
  18. [18] M. Bojarski, P. Yeres, A. Choromanska, K. Choromanski, B. Firner, L. Jackel, and U. Muller, “Explaining how a deep neural network trained with end-to-end learning steers a car,” arXiv preprint arXiv:1704.07911, 2017.
  19. [19] K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger, “TransFuser: Imitation with transformer-based sensor fusion for autonomous driving,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 12878–12895, 2022.
  20. [20] B. Jaeger, K. Chitta, and A. Geiger, “Hidden biases of end-to-end driving models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8240–8249.
  21. [21] H. Shao, L. Wang, R. Chen, H. Li, and Y. Liu, “Safety-enhanced autonomous driving using interpretable sensor fusion transformer,” in Conference on Robot Learning. PMLR, 2023, pp. 726–737.
  22. [22] G. Varma, A. Subramanian, A. Namboodiri, M. Chandraker, and C. Jawahar, “IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2019, pp. 1743–1751.
  23. [23] F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, “BDD100K: A diverse driving dataset for heterogeneous multitask learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2636–2645.
  24. [24] B. Amos, I. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, “Differentiable MPC for end-to-end planning and control,” Advances in Neural Information Processing Systems, vol. 31, 2018.
  25. [25] J. Kabzan, L. Hewing, A. Liniger, and M. N. Zeilinger, “Learning-based model predictive control for autonomous racing,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 3363–3370, 2019.
  26. [26] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause, “Learning-based model predictive control for safe exploration,” in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 6059–6066.
  27. [27] J. Chrosniak, J. Ning, and M. Behl, “Deep dynamics: Vehicle dynamics modeling with a physics-constrained neural network for autonomous racing,” IEEE Robotics and Automation Letters, vol. 9, no. 6, pp. 5292–5297, 2024.
  28. [28] H. Zhou, H. Liu, H. Lu, J. Ma, and Y. Ji, “Enhance planning with physics-informed safety controller for end-to-end autonomous driving,” in 2024 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 2024, pp. 1775–1782.
  29. [29] G. Feng, H. Zhang, Z. Li, X. B. Peng, B. Basireddy, L. Yue, Z. Song, L. Yang, Y. Liu, K. Sreenath et al., “GenLoco: Generalized locomotion controllers for quadrupedal robots,” in Conference on Robot Learning. PMLR, 2023, pp. 1893–1903.
  30. [30] T. Chen, A. Murali, and A. Gupta, “Hardware conditioned policies for multi-robot transfer learning,” Advances in Neural Information Processing Systems, vol. 31, 2018.
  31. [31] D. Shah, A. Sridhar, A. Bhorkar, N. Hirose, and S. Levine, “GNM: A general navigation model to drive any robot,” arXiv preprint arXiv:2210.03370, 2022.
  32. [32] N. Bohlinger, G. Czechmanowski, M. Krupka, P. Kicki, K. Walas, J. Peters, and D. Tateo, “One policy to run them all: An end-to-end learning approach to multi-embodiment locomotion,” arXiv preprint arXiv:2409.06366, 2024.
  33. [33] C. Sferrazza, D.-M. Huang, F. Liu, J. Lee, and P. Abbeel, “Body transformer: Leveraging robot embodiment for policy learning,” arXiv preprint arXiv:2408.06316, 2024.
  34. [34] W. Xiao, H. Xue, T. Tao, D. Kalaria, J. M. Dolan, and G. Shi, “AnyCar to anywhere: Learning universal dynamics model for agile and adaptive mobility,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 8819–8825.
  35. [35] F. Djeumou, T. J. Lew, N. Ding, M. Thompson, M. Suminaka, M. Greiff, and J. Subosits, “One model to drift them all: Physics-informed conditional diffusion model for driving at the limits,” in 8th Annual Conference on Robot Learning, 2024.
  36. [36] Y. Liu, J. W. Lavington, A. Scibior, and F. Wood, “Vehicle type specific waypoint generation,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022, pp. 12225–12230.
  37. [37] P. Wu, X. Jia, L. Chen, J. Yan, H. Li, and Y. Qiao, “Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline,” in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 6119–6132.
  38. [38] D. Chen, B. Zhou, V. Koltun, and P. Krähenbühl, “Learning by cheating,” in Proceedings of the Conference on Robot Learning, ser. Proceedings of Machine Learning Research, L. P. Kaelbling, D. Kragic, and K. Sugiura, Eds., vol. 100. PMLR, 30 Oct–01 Nov 2020, pp. 66–75.
  39. [39] K. Chitta, A. Prakash, and A. Geiger, “NEAT: Neural attention fields for end-to-end autonomous driving,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 15793–15803.
  40. [40] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
  41. [41] CARLA Simulator Team, “Leaderboard for CARLA autonomous driving challenge,” 2024. [Online]. Available: https://github.com/carla-simulator/leaderboard