MVAdapt: Zero-Shot Multi-Vehicle Adaptation for End-to-End Autonomous Driving
Pith reviewed 2026-05-10 15:48 UTC · model grok-4.3
The pith
Conditioning end-to-end driving policies on vehicle physics properties enables zero-shot adaptation to new vehicles with different sizes, masses, and drivetrains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MVAdapt is a physics-conditioned adaptation framework that freezes a TransFuser++ scene encoder and inserts a lightweight physics encoder together with a cross-attention module. The cross-attention conditions the extracted scene features on vehicle property vectors such as size, mass, and drivetrain type before the waypoint decoder produces outputs. On the CARLA Leaderboard 1.0 benchmark this yields measurable gains over naive transfer and multi-embodiment baselines for both in-distribution and unseen vehicles, exhibiting two complementary behaviors: strong zero-shot transfer on many vehicles, and data-efficient few-shot calibration needed only for severe physical outliers.
What carries the argument
The cross-attention module that fuses outputs from a lightweight physics encoder with frozen scene features to adjust waypoint predictions according to vehicle property vectors.
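The conditioning mechanism described above can be sketched in a few lines. This is an illustrative single-head, numpy-only toy, not the paper's implementation: the token counts, the residual update, and the shape of the physics embedding are all assumptions.

```python
import numpy as np

# Toy sketch: condition frozen scene features (queries) on a physics
# embedding (keys/values) via cross-attention. All shapes are illustrative.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def physics_encoder(props, W, b):
    # props: (P,) raw vehicle properties -> (K, D) physics tokens
    h = np.tanh(props @ W + b)
    return h.reshape(K, D)

def condition_scene_features(scene, phys_tokens):
    # scene: (N, D) frozen scene tokens; phys_tokens: (K, D)
    d = scene.shape[-1]
    attn = softmax(scene @ phys_tokens.T / np.sqrt(d))  # (N, K) weights
    return scene + attn @ phys_tokens                   # residual update

rng = np.random.default_rng(0)
N, K, D, P = 6, 2, 8, 4  # scene tokens, physics tokens, feature dim, raw props
scene = rng.standard_normal((N, D))
# Hypothetical raw properties: length (m), mass (kg), wheelbase (m), drivetrain flag
props = np.array([4.5, 1800.0, 2.7, 1.0])
props = (props - props.mean()) / props.std()  # crude normalization for the toy
W = rng.standard_normal((P, K * D)) * 0.1
b = np.zeros(K * D)
adapted = condition_scene_features(scene, physics_encoder(props, W, b))
print(adapted.shape)  # (6, 8)
```

In a real system the adapted tokens would feed the frozen waypoint decoder; the point of the sketch is only that the scene representation is modulated by, but does not replace, the physics embedding.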
If this is right
- Waypoint predictions become more accurate on both familiar and previously unseen vehicle types without additional training.
- Strong zero-shot transfer occurs across a wide range of vehicle property combinations.
- Only a small amount of calibration data is required when the new vehicle is a severe physical outlier.
- Explicit conditioning on vehicle physics improves overall transferability of end-to-end policies in simulation benchmarks.
Where Pith is reading between the lines
- If vehicle properties can be measured or estimated at deployment time, the same conditioning mechanism could support on-the-fly adaptation during operation.
- The separation of scene encoding from physics encoding may reduce the cost of maintaining policies across large mixed fleets of vehicles.
- The approach could be tested for transfer to other control domains where embodiment differences affect policy outputs.
Load-bearing premise
That vehicle property vectors fed through a lightweight physics encoder and cross-attention can meaningfully reshape scene features for accurate waypoint prediction without any vehicle-specific fine-tuning data or internal dynamics simulation.
What would settle it
If MVAdapt produces no measurable improvement in waypoint accuracy or collision rate over a naive transfer baseline when evaluated on a vehicle whose mass or wheelbase lies far outside the training distribution in the CARLA simulator, the value of the physics-conditioning step would be called into question.
Original abstract
End-to-End (E2E) autonomous driving models are usually trained and evaluated with a fixed ego-vehicle, even though their driving policy is implicitly tied to vehicle dynamics. When such a model is deployed on a vehicle with different size, mass, or drivetrain characteristics, its performance can degrade substantially; we refer to this problem as the vehicle-domain gap. To address it, we propose MVAdapt, a physics-conditioned adaptation framework for multi-vehicle E2E driving. MVAdapt combines a frozen TransFuser++ scene encoder with a lightweight physics encoder and a cross-attention module that conditions scene features on vehicle properties before waypoint decoding. In the CARLA Leaderboard 1.0 benchmark, MVAdapt improves over naive transfer and multi-embodiment adaptation baselines on both in-distribution and unseen vehicles. We further show two complementary behaviors: strong zero-shot transfer on many unseen vehicles, and data-efficient few-shot calibration for severe physical outliers. These results suggest that explicitly conditioning E2E driving policies on vehicle physics is an effective step toward more transferable autonomous driving models. All codes are available at https://github.com/hae-sung-oh/MVAdapt
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MVAdapt, a zero-shot adaptation framework for end-to-end autonomous driving that addresses the vehicle-domain gap by freezing a TransFuser++ scene encoder and conditioning its features on vehicle properties (size, mass, drivetrain) via a lightweight physics encoder and cross-attention module before waypoint decoding. It claims improved performance over naive transfer and multi-embodiment baselines on the CARLA Leaderboard 1.0 for both in-distribution and unseen vehicles, plus data-efficient few-shot calibration for physical outliers, with all code released.
Significance. If the empirical gains hold under scrutiny, the work provides evidence that explicit static physics conditioning can enhance transferability of E2E policies without vehicle-specific fine-tuning or in-loop dynamics simulation. This is a meaningful step toward more robust deployment across heterogeneous vehicle fleets. The open-source code release strengthens reproducibility.
major comments (3)
- [Methods (physics encoder and cross-attention)] The central claim that cross-attention on static vehicle property vectors produces dynamics-consistent feature adjustments for waypoint prediction (without state feedback or simulation) is load-bearing for the zero-shot transfer result. The manuscript should include an analysis or visualization showing how the adapted features differ for novel mass/inertia combinations in a manner consistent with CARLA's physics engine rather than memorized correlations from training vehicles.
- [Results] Results section: the abstract states improvement on CARLA Leaderboard 1.0 but the provided description lacks specific quantitative metrics, standard deviations, or per-vehicle breakdowns. Without these, the magnitude of gains over baselines and the distinction between in-distribution vs. unseen vehicles cannot be evaluated for statistical reliability.
- [Experiments / Ablations] The evaluation relies on CARLA's enforced physics, yet no ablation isolates whether gains arise from the physics encoder versus other factors (e.g., the multi-embodiment baseline details). This is needed to support the claim that explicit conditioning, rather than implicit memorization, drives the zero-shot behavior.
minor comments (2)
- [Methods] Clarify the exact vehicle property vector dimensionality and normalization used in the physics encoder; this affects reproducibility.
- [Abstract / Results] The abstract mentions 'strong zero-shot transfer on many unseen vehicles' but does not define the criteria for 'many' or list the specific unseen vehicles tested; add this to the results for clarity.
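The first minor comment is easy to make concrete. A paper-ready specification might look like the sketch below; the feature set, ordering, and training-set statistics are hypothetical placeholders for exactly the details the referee asks the authors to report.

```python
import numpy as np

# Hypothetical 5-D vehicle property vector with per-feature z-score
# normalization. Feature names, ordering, and statistics are assumptions,
# not taken from the MVAdapt paper.
FEATURES = ["length_m", "width_m", "mass_kg", "wheelbase_m", "drivetrain"]
# drivetrain encoded as 0=FWD, 1=RWD, 2=AWD (illustrative encoding)
train_stats = {"mean": np.array([4.4, 1.9, 1600.0, 2.7, 1.0]),
               "std":  np.array([0.6, 0.2, 500.0, 0.3, 0.8])}

def normalize_props(raw):
    # z-score each feature with statistics frozen from the training fleet
    raw = np.asarray(raw, dtype=float)
    return (raw - train_stats["mean"]) / train_stats["std"]

v = normalize_props([5.1, 2.0, 2400.0, 3.0, 2.0])
print(v.round(2))
```

Fixing the statistics to the training fleet matters: an unseen heavy vehicle should map to a point outside the unit ball, which is what makes "severe physical outlier" a measurable notion rather than a qualitative label.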
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of explicit physics conditioning for zero-shot transfer in end-to-end driving. We address each major comment below with targeted revisions that strengthen the manuscript without altering its core claims.
Point-by-point responses
-
Referee: [Methods (physics encoder and cross-attention)] The central claim that cross-attention on static vehicle property vectors produces dynamics-consistent feature adjustments for waypoint prediction (without state feedback or simulation) is load-bearing for the zero-shot transfer result. The manuscript should include an analysis or visualization showing how the adapted features differ for novel mass/inertia combinations in a manner consistent with CARLA's physics engine rather than memorized correlations from training vehicles.
Authors: We agree that an explicit demonstration of dynamics-consistent feature adaptation is valuable to support the zero-shot claims. In the revised manuscript we will add a new analysis subsection with visualizations of cross-attention weights and feature deltas for vehicles with varying mass, size, and inertia (including unseen combinations). We will relate these changes to CARLA physics expectations, such as altered steering curvature or braking response for heavier vehicles, using both quantitative metrics and qualitative examples to distinguish from memorization of training-vehicle correlations. revision: yes
-
Referee: [Results] Results section: the abstract states improvement on CARLA Leaderboard 1.0 but the provided description lacks specific quantitative metrics, standard deviations, or per-vehicle breakdowns. Without these, the magnitude of gains over baselines and the distinction between in-distribution vs. unseen vehicles cannot be evaluated for statistical reliability.
Authors: The full results section and supplementary tables already report driving scores, route completion, and infractions as means with standard deviations across multiple evaluation seeds, together with per-vehicle breakdowns separating in-distribution and unseen vehicles. To improve clarity we will revise the abstract to include the key quantitative gains (e.g., absolute and relative improvements on unseen vehicles) while preserving conciseness. revision: yes
-
Referee: [Experiments / Ablations] The evaluation relies on CARLA's enforced physics, yet no ablation isolates whether gains arise from the physics encoder versus other factors (e.g., the multi-embodiment baseline details). This is needed to support the claim that explicit conditioning, rather than implicit memorization, drives the zero-shot behavior.
Authors: We will add a targeted ablation that keeps the frozen scene encoder and waypoint decoder fixed while replacing the physics encoder and cross-attention with either a constant vehicle embedding or a non-physics multi-embodiment variant. The revised paper will report the resulting drop in zero-shot performance on unseen vehicles, thereby isolating the contribution of explicit static physics conditioning. revision: yes
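The proposed ablation reduces to a one-line swap at the embedding interface. A minimal sketch, with illustrative names only (the authors' actual module boundaries are not given in the text):

```python
import numpy as np

# Ablation sketch: replace the physics-dependent embedding with a constant
# learned vector, so the adapter can no longer distinguish vehicles while
# every other component stays fixed.
rng = np.random.default_rng(1)
D = 8  # embedding dimension (illustrative)

def physics_embedding(props, W):
    return np.tanh(np.asarray(props) @ W)  # varies with the vehicle

def constant_embedding(props, const):
    return const                           # ignores the vehicle entirely

W = rng.standard_normal((3, D)) * 0.1
const = rng.standard_normal(D) * 0.1
a = physics_embedding([1.0, 0.5, -0.3], W)
b = physics_embedding([2.0, -1.0, 0.7], W)
c1 = constant_embedding([1.0, 0.5, -0.3], const)
c2 = constant_embedding([2.0, -1.0, 0.7], const)
print(np.allclose(c1, c2), np.allclose(a, b))  # True False
```

If zero-shot performance on unseen vehicles survives this swap, the gains cannot be attributed to explicit physics conditioning; if it collapses, the physics pathway is doing real work.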
Circularity Check
No significant circularity; empirical adaptation framework with no derivations or fitted predictions
full rationale
The paper describes an empirical method (frozen TransFuser++ encoder + lightweight physics encoder + cross-attention for conditioning on static vehicle properties) evaluated on CARLA benchmarks for zero-shot and few-shot transfer. No equations, derivations, or first-principles predictions are presented that could reduce to inputs by construction. Claims rest on benchmark improvements over baselines rather than any relabeled fit or self-referential definition. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked in the provided text. This is a standard non-circular empirical ML paper.
Axiom & Free-Parameter Ledger