Test-Time Trajectory Optimization for Autonomous Driving

Alexandre Boulch; Elias Ramzi; Ellington Kirby; Eloi Zablocki; Matthieu Cord; Yihong Xu; Yuan Yin

arxiv: 2606.07170 · v1 · pith:WAHCVO6Fnew · submitted 2026-06-05 · 💻 cs.RO

Yihong Xu, Eloi Zablocki, Yuan Yin, Elias Ramzi, Ellington Kirby, Alexandre Boulch, Matthieu Cord

Pith reviewed 2026-06-27 21:51 UTC · model grok-4.3

classification 💻 cs.RO

keywords autonomous driving · trajectory optimization · test-time optimization · cross-entropy method · end-to-end planners · NAVSIM benchmark · HUGSIM benchmark

The pith

Treating a planner's scorer as a reward and maximizing it at test time with the Cross-Entropy Method improves driving performance without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

End-to-end autonomous driving planners generate candidate trajectories and then score them, but the scoring step cannot affect which candidates are proposed. The paper treats the scorer as a fixed trajectory-level reward and runs the Cross-Entropy Method at inference time to search for higher-scoring trajectories, warm-started from the planner's original proposals. This requires no model changes or extra training and is compatible with any base planner. Results improve on NAVSIM-v1, NAVSIM-v2, and the closed-loop HUGSIM benchmark when the method is added to six different planners. A reader cares because the approach converts an existing limitation of proposal generation into an optimizable objective at runtime.
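The search loop described in the paragraph above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it runs the Cross-Entropy Method directly on fixed-length vectors rather than in control space, it omits any regularizers, and it uses a hypothetical `toy_scorer` in place of a learned scorer. The function name `cem_maximize`, its default hyperparameters, and the choice to carry the best candidate across iterations are all choices made for this example.

```python
import random
import statistics


def cem_maximize(scorer, proposals, iters=5, samples=64, elite_frac=0.125,
                 min_std=1e-3, seed=0):
    """Cross-Entropy Method on fixed-length vectors, warm-started from proposals."""
    rng = random.Random(seed)
    dim = len(proposals[0])
    # Warm start: fit a diagonal Gaussian to the planner's proposals.
    mean = [statistics.fmean(p[d] for p in proposals) for d in range(dim)]
    std = [max(statistics.pstdev([p[d] for p in proposals]), min_std)
           for d in range(dim)]
    best = max(proposals, key=scorer)
    n_elite = max(2, int(samples * elite_frac))
    for _ in range(iters):
        cands = [[rng.gauss(mean[d], std[d]) for d in range(dim)]
                 for _ in range(samples)]
        cands.append(best)  # assumption: keep the incumbent in the pool
        cands.sort(key=scorer, reverse=True)
        elites = cands[:n_elite]
        best = elites[0]
        # Refit the sampling distribution to the elite set.
        mean = [statistics.fmean(e[d] for e in elites) for d in range(dim)]
        std = [max(statistics.pstdev([e[d] for e in elites]), min_std)
               for d in range(dim)]
    return best


def toy_scorer(traj):
    """Hypothetical stand-in for a learned scorer: peak at a fixed target."""
    target = [1.0, 2.0, 3.0]
    return -sum((a - b) ** 2 for a, b in zip(traj, target))


proposals = [[0.0, 0.0, 0.0], [0.5, 1.0, 1.0], [2.0, 2.0, 2.0]]
refined = cem_maximize(toy_scorer, proposals)
```

Because the incumbent is re-inserted at every iteration in this sketch, the returned vector never scores below the best warm-start proposal under the same scorer. Whether a higher scorer value translates into better driving depends on how well the scorer generalizes away from the original proposals.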

Core claim

Treating the base planner's scorer as a learned trajectory-level reward function and applying the Cross-Entropy Method at test time, warm-started from the planner's proposals, finds trajectories that better maximize that reward. Across six base planners, this yields higher performance on NAVSIM-v1 (94.7 PDMS), NAVSIM-v2 (56.3 EPDMS), and closed-loop HUGSIM without any retraining or additional supervision.

What carries the argument

The Cross-Entropy Method run at test time to maximize the base planner's trajectory scorer treated as a reward function.

If this is right

TOAD works as a plug-and-play addition to any existing end-to-end planner.
No retraining or extra supervision is needed to obtain the reported gains.
Performance rises on NAVSIM-v1 to 94.7 PDMS when TOAD is applied.
Performance rises on NAVSIM-v2 to 56.3 EPDMS when TOAD is applied.
Performance rises on the closed-loop HUGSIM benchmark when TOAD is applied.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Many current planners may leave substantial headroom on the table by not optimizing against their own scoring functions after proposal generation.
Similar test-time search could be tried in other sequential decision domains where a learned scorer exists but proposal quality is limited.
Faster or more sample-efficient optimizers than the Cross-Entropy Method might be substituted while preserving the same plug-and-play property.
Pairing the method with stronger initial proposal generators could produce additive rather than redundant gains.

Load-bearing premise

The scorer already produced by the base planner is smooth and informative enough for the Cross-Entropy Method to maximize it effectively at test time.
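One rough way to probe this premise is to perturb a trajectory slightly and measure how much the score moves. The sketch below is an illustrative diagnostic, not something described in the paper: `local_sensitivity`, the perturbation radius `eps`, and the flat-vector trajectory encoding are all assumptions.

```python
import math
import random


def local_sensitivity(scorer, traj, eps=0.05, trials=200, seed=0):
    """Mean of |s(x + d) - s(x)| / eps over random directions d with ||d|| = eps."""
    rng = random.Random(seed)
    base = scorer(traj)
    total = 0.0
    for _ in range(trials):
        d = [rng.gauss(0.0, 1.0) for _ in traj]
        norm = math.sqrt(sum(v * v for v in d)) or 1.0
        d = [eps * v / norm for v in d]  # rescale to length eps
        moved = [x + v for x, v in zip(traj, d)]
        total += abs(scorer(moved) - base) / eps
    return total / trials


def flat_scorer(traj):
    """Hypothetical scorer with no signal at all."""
    return 1.0


def linear_scorer(traj):
    """Hypothetical scorer with gradient (2, 1), so sensitivity is at most sqrt(5)."""
    return 2.0 * traj[0] + traj[1]


flat_value = local_sensitivity(flat_scorer, [0.0, 0.0])
linear_value = local_sensitivity(linear_scorer, [0.0, 0.0])
```

Values near zero around the planner's proposals would suggest a flat region where sampling-based search has little to climb, while very large or erratic values would suggest a jagged surface. Neither outcome is established by the source; the probe only makes the premise checkable.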

What would settle it

A controlled test in which running the Cross-Entropy Method on a planner's proposals produces trajectories that receive lower scores from the same scorer or worse closed-loop performance than the original proposals.
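The first half of that test reduces to a per-scene comparison under the same scorer. The helper below is a minimal sketch of that bookkeeping; `regression_report` and the sample numbers are hypothetical and are not results from the paper.

```python
def regression_report(base_scores, optimized_scores, tol=1e-9):
    """Compare per-scene scores from one scorer before and after test-time search."""
    if len(base_scores) != len(optimized_scores):
        raise ValueError("score lists must have the same length")
    deltas = [o - b for b, o in zip(base_scores, optimized_scores)]
    worse = sum(1 for d in deltas if d < -tol)  # scenes where search hurt
    n = len(deltas)
    return {
        "num_scenes": n,
        "num_worse": worse,
        "frac_worse": worse / n if n else 0.0,
        "mean_delta": sum(deltas) / n if n else 0.0,
    }


# Made-up scores for four scenes, purely to exercise the function.
report = regression_report([0.80, 0.90, 0.70, 0.95], [0.85, 0.90, 0.75, 0.90])
```

A nonzero `num_worse` under the planner's own scorer would point at the optimizer, whereas a clean report paired with worse closed-loop results would point at the scorer.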

Figures

Figures reproduced from arXiv: 2606.07170 by Alexandre Boulch, Elias Ramzi, Ellington Kirby, Eloi Zablocki, Matthieu Cord, Yihong Xu, Yuan Yin.

**Figure 1.** Overview of TOAD. A frozen base planner outputs trajectory proposals and a selected anchor τ_base (left). Trajectories are mapped to controls by the inverse bicycle model (BM⁻¹). Then, a Cross-Entropy Method (CEM) loop runs for K steps in control space: sample controls, roll them out with the bicycle model (BM), score them under 'Scorer + reg' (the learned reward with comfort and anchor regularizers), re…
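The BM and BM⁻¹ blocks in the caption can be illustrated with a standard kinematic bicycle model. The sketch below is an assumption about what such a mapping could look like, not the paper's implementation: the state layout `(x, y, yaw, v)`, the control pair `(accel, steer)`, the Euler integration, and the constants `WHEELBASE` and `DT` are all illustrative.

```python
import math

WHEELBASE = 2.7  # metres; illustrative value, not taken from the paper
DT = 0.5         # seconds between waypoints; illustrative value


def bicycle_rollout(state, controls, wheelbase=WHEELBASE, dt=DT):
    """BM: integrate (accel, steer) controls from an initial (x, y, yaw, v)."""
    x, y, yaw, v = state
    states = [state]
    for accel, steer in controls:
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += v / wheelbase * math.tan(steer) * dt
        v += accel * dt
        states.append((x, y, yaw, v))
    return states


def inverse_bicycle(states, wheelbase=WHEELBASE, dt=DT):
    """BM^-1: recover (accel, steer) from consecutive states; assumes v != 0."""
    controls = []
    for (_, _, yaw0, v0), (_, _, yaw1, v1) in zip(states, states[1:]):
        accel = (v1 - v0) / dt
        steer = math.atan(wheelbase * (yaw1 - yaw0) / (v0 * dt))
        controls.append((accel, steer))
    return controls


start = (0.0, 0.0, 0.0, 5.0)
controls = [(0.5, 0.05), (0.0, -0.02), (-0.3, 0.0)]
states = bicycle_rollout(start, controls)
recovered = inverse_bicycle(states)
```

Searching over controls and rolling them out keeps every sampled trajectory kinematically consistent with the model, which is one plausible reason to run the loop in control space. A planner that outputs only (x, y) waypoints would need heading and speed estimated first, a step this sketch leaves out.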

**Figure 2.** Visualization example of improvements and failures found by TOAD on top of DrivoR [8].

**Figure 3.** Per-sample inference cost across methods. The bar length is the total inference time per sample, decomposed into the base planner (light gray), the scorer backbone (dark gray), and our test-time optimization (TOAD, blue).

**Figure 4.** Ablation study of TOAD with iPad on warmup-two-stage. In Fig. 4a, the first observation is that TOAD improves overall performance over the vanilla planner. We observe that the initialization from a standard normal distribution still achieves competitive performance (42.7 vs. 48.5), suggesting that TOAD can generalize to a broader class of E2E planners that output only a single trajectory withou…

**Figure 5.** CEM iteration visualization. We visualize the progress of CEM iteration in TOAD, with Hydra-MDP [4].

Original abstract

End-to-end planners for autonomous driving typically generate a set of candidate trajectories, score each one, and return the highest-scoring candidate. However, the scorer is applied only after the proposals are generated and cannot influence the set of trajectories: a weak set of candidates limits planning performance regardless of the scorer's quality. We instead treat the scorer as a learned trajectory-level reward function and search for trajectories that maximize it. Our method, TOAD, runs the Cross-Entropy Method at test time, warm-started from the planner's proposals. It requires no retraining and is plug-and-play for existing planners. Across six base planners, TOAD improves results on NAVSIM-v1 (94.7 PDMS), NAVSIM-v2 (56.3 EPDMS), and the closed-loop HUGSIM benchmark. The code will be made publicly available via the project page: https://valeoai.github.io/TOAD/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TOAD applies CEM at test time to refine trajectories from existing planners and reports benchmark gains, but the abstract leaves the reward landscape and optimization details unexamined.

The core idea is to treat a planner's scorer as a fixed reward and run Cross-Entropy Method at inference time, warm-started from the planner's own proposals. No retraining is needed, and the method is presented as a drop-in addition for any end-to-end driving planner.

What stands out is the breadth of the experiments: the same procedure is applied to six different base planners and produces higher scores on NAVSIM-v1, NAVSIM-v2, and the closed-loop HUGSIM benchmark. That kind of plug-and-play result is useful to see even if the underlying mechanism is simple.

The main gap is the missing evidence on why the optimization succeeds. The abstract gives only final aggregate numbers; there are no per-scene score deltas, no plots of the reward surface, and no ablations on CEM parameters such as elite fraction, iteration count, or sampling variance. Without those, it is hard to tell whether the scorer is smooth enough for CEM to locate meaningfully better trajectories or whether the gains come from something else.

The paper is aimed at researchers and engineers who already have a working planner and want a lightweight way to squeeze out extra performance on standard driving benchmarks. It is not trying to solve open problems in planning theory.

If the full paper supplies the missing controls, ablations, and some analysis of the reward landscape, it is worth sending to referees. The empirical scope is reasonable and the implementation claim is testable.

Referee Report

3 major / 1 minor

Summary. The paper introduces TOAD, which treats the scorer of existing end-to-end autonomous driving planners as a learned trajectory-level reward and applies the Cross-Entropy Method (CEM) at test time, warm-started from the planner's proposals, to search for higher-scoring trajectories. The method requires no retraining and is presented as plug-and-play. It reports numerical improvements across six base planners on NAVSIM-v1 (94.7 PDMS), NAVSIM-v2 (56.3 EPDMS), and the closed-loop HUGSIM benchmark, with code to be released publicly.

Significance. If the central assumption holds, the approach provides a practical test-time enhancement to existing planners without additional training. The public code release is a clear strength supporting reproducibility and adoption.

major comments (3)

Section 3 (Method): The description of CEM application provides no analysis of the scorer's properties (smoothness, variance, or flat regions) as a trajectory-level reward, which is load-bearing for the claim that CEM will reliably locate improved trajectories over the initial proposals.
Section 4 (Experiments): Aggregate benchmark improvements are reported without per-scene score deltas between base proposals and CEM outputs, ablations on CEM hyperparameters (elite fraction, iterations, sampling variance), or statistical significance tests, preventing verification that gains stem from the optimization rather than confounding factors.
Table 1 (or equivalent results table): The cross-planner gains on NAVSIM-v1/v2 and HUGSIM are presented without controls for CEM convergence criteria or multiple random seeds, undermining assessment of robustness.

minor comments (1)

Abstract and Section 3: The abstract and method section use 'parameter-free' phrasing for the overall approach, but CEM hyperparameters are introduced without explicit discussion of their selection or sensitivity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and indicate where revisions will be made to the manuscript.

Referee: Section 3 (Method): The description of CEM application provides no analysis of the scorer's properties (smoothness, variance, or flat regions) as a trajectory-level reward, which is load-bearing for the claim that CEM will reliably locate improved trajectories over the initial proposals.

Authors: We agree that analyzing the scorer's properties would strengthen the justification for applying CEM. Our empirical results show consistent gains across six planners, indicating the scorers are amenable to optimization in practice. In the revised manuscript, we will add a brief discussion in Section 3 on observed scorer characteristics (e.g., empirical smoothness from sampling) while noting that a full theoretical analysis lies outside the paper's scope.

Revision: partial

Referee: Section 4 (Experiments): Aggregate benchmark improvements are reported without per-scene score deltas between base proposals and CEM outputs, ablations on CEM hyperparameters (elite fraction, iterations, sampling variance), or statistical significance tests, preventing verification that gains stem from the optimization rather than confounding factors.

Authors: We will revise Section 4 to include ablations on CEM hyperparameters (elite fraction, iterations, sampling variance). We will also report average per-scene score deltas and apply statistical significance tests (e.g., paired t-tests) on the improvements to confirm they arise from the optimization.

Revision: yes

Referee: Table 1 (or equivalent results table): The cross-planner gains on NAVSIM-v1/v2 and HUGSIM are presented without controls for CEM convergence criteria or multiple random seeds, undermining assessment of robustness.

Authors: We will update the experimental description and results tables to explicitly state the CEM convergence criteria and report performance averaged over multiple random seeds with standard deviations to better demonstrate robustness.

Revision: yes
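The two statistics promised in these responses are simple to compute. The sketch below shows a paired t statistic on per-scene deltas and a mean with standard deviation across seeds; the function names are hypothetical, the inputs are toy values rather than results from the paper, and turning the statistic into a p-value would additionally need a Student t distribution with n - 1 degrees of freedom, which is not implemented here.

```python
import math
import statistics


def paired_t_statistic(base_scores, optimized_scores):
    """t statistic of the mean per-scene delta (optimized - base)."""
    deltas = [o - b for b, o in zip(base_scores, optimized_scores)]
    n = len(deltas)
    if n < 2:
        raise ValueError("need at least two paired scenes")
    sd = statistics.stdev(deltas)  # sample standard deviation
    if sd == 0.0:
        raise ValueError("all deltas are identical; t statistic is undefined")
    return statistics.fmean(deltas) / (sd / math.sqrt(n))


def summarize_seeds(metric_per_seed):
    """Mean and sample standard deviation of one metric across random seeds."""
    return statistics.fmean(metric_per_seed), statistics.stdev(metric_per_seed)


# Toy inputs chosen so the expected values are easy to verify by hand.
t_stat = paired_t_statistic([0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
seed_mean, seed_std = summarize_seeds([1.0, 2.0, 3.0])
```

Pairing by scene matters because scene difficulty varies widely, so an unpaired comparison of two benchmark averages would hide a small but consistent per-scene gain.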

Circularity Check

0 steps flagged

No circularity: empirical test-time optimization with no self-referential derivations

The paper describes a plug-and-play method that applies the Cross-Entropy Method at test time to maximize an existing base planner's scorer treated as a reward. No equations, parameter fits, or derivations are present that reduce the claimed benchmark improvements to quantities defined by the paper itself. The approach relies on an external optimizer (CEM) applied to pre-existing components without retraining. No self-citations are load-bearing for any uniqueness theorem or ansatz. The central claim is an empirical performance gain on external benchmarks (NAVSIM, HUGSIM), which does not reduce to a fitted input renamed as prediction or any of the enumerated circular patterns. This is the common case of a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, the method uses standard CEM optimization and existing planner components; no new free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5699 in / 1059 out tokens · 20406 ms · 2026-06-27T21:51:17.992395+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CADET: Physics-Grounded Causal Auditing and Training-Free Deconfounding of End-to-End Driving Planners
cs.RO 2026-06 unverdicted novelty 5.0

CADET is a training-free framework for auditing, benchmarking, and repairing spurious correlations in pretrained end-to-end autonomous driving planners using physics-grounded causal methods.

Reference graph

Works this paper leans on

62 extracted references · 11 linked inside Pith · cited by 1 Pith paper

[1] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
[2] Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang, L. Lu, X. Jia, Q. Liu, J. Dai, Y. Qiao, and H. Li. Planning-oriented autonomous driving. In CVPR, 2023.
[3] K. Renz, L. Chen, E. Arani, and O. Sinavski. Simlingo: Vision-only closed-loop autonomous driving with language-action alignment. In CVPR, 2025.
[4] Z. Li, K. Li, S. Wang, S. Lan, Z. Yu, Y. Ji, Z. Li, Z. Zhu, J. Kautz, Z. Wu, et al. Hydra-mdp: End-to-end multimodal planning with multi-target hydra-distillation. arXiv preprint arXiv:2406.06978, 2024.
[5] Z. Li, W. Yao, Z. Wang, X. Sun, J. Chen, N. Chang, M. Shen, Z. Wu, S. Lan, and J. M. Alvarez. Generalized trajectory scoring for end-to-end multimodal planning. arXiv preprint arXiv:2506.06664, 2025.
[6] Z. Li, W. Yao, Z. Wang, X. Sun, J. Chen, N. Chang, M. Shen, J. Song, Z. Wu, S. Lan, et al. Ztrs: Zero-imitation end-to-end autonomous driving with trajectory scoring. arXiv preprint arXiv:2510.24108, 2025.
[7] W. Sun, X. Lin, K. Chen, Z. Pei, X. Li, Y. Shi, and S. Zheng. Sparsedrivev2: Scoring is all you need for end-to-end autonomous driving. arXiv preprint arXiv:2603.29163, 2026.
[8] E. Kirby, A. Boulch, Y. Xu, Y. Yin, G. Puy, E. Zablocki, A. Bursuc, S. Gidaris, R. Marlet, F. Bartoccioni, A.-Q. Cao, N. Samet, T.-H. Vu, and M. Cord. Driving on registers. In CVPR, 2026.
[9] L. Feng, Y. Gao, E. Zablocki, Q. Li, W. Li, S. Liu, M. Cord, and A. Alahi. Rap: 3d rasterization augmented end-to-end planning. In ICLR, 2026.
[10] K. Guo, H. Liu, X. Wu, J. Pan, and C. Lv. ipad: Iterative proposal-centric end-to-end autonomous driving. arXiv preprint arXiv:2505.15111, 2025.
[11] R. Y. Rubinstein and D. P. Kroese. The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning. Springer, 2004.
[12] D. Dauner, M. Hallgarten, T. Li, X. Weng, Z. Huang, Z. Yang, H. Li, I. Gilitschenski, B. Ivanovic, M. Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. In NeurIPS, 2024.
[13] W. Cao, M. Hallgarten, T. Li, D. Dauner, X. Gu, C. Wang, Y. Miron, M. Aiello, H. Li, I. Gilitschenski, B. Ivanovic, M. Pavone, A. Geiger, and K. Chitta. Pseudo-simulation for autonomous driving. In CoRL, 2025.
[14] D. Dauner, M. Hallgarten, A. Geiger, and K. Chitta. Parting with misconceptions about learning-based vehicle motion planning. In CoRL, 2023.
[15] H. Zhou, L. Lin, J. Wang, Y. Lu, D. Bai, B. Liu, Y. Wang, A. Geiger, and Y. Liao. HUGSIM: A real-time, photo-realistic and closed-loop simulator for autonomous driving. IEEE Trans. Pattern Anal. Mach. Intell., 2026.
[16] W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun. End-to-end interpretable neural motion planner. In CVPR, 2019.
[17] E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda. A survey of autonomous driving: Common practices and emerging technologies. IEEE Access, 2020.
[18] B. Jiang, S. Chen, Q. Xu, B. Liao, J. Chen, H. Zhou, Q. Zhang, W. Liu, C. Huang, and X. Wang. Vad: Vectorized scene representation for efficient autonomous driving. In ICCV, 2023.
[19] X. Weng, B. Ivanovic, Y. Wang, Y. Wang, and M. Pavone. Para-drive: Parallelized architecture for real-time autonomous driving. In CVPR, 2024.
[20] B. Liao, S. Chen, H. Yin, B. Jiang, C. Wang, S. Yan, X. Zhang, X. Li, Y. Zhang, Q. Zhang, et al. Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. In CVPR, 2025.
[21] Z. Xing, X. Zhang, Y. Hu, B. Jiang, T. He, Q. Zhang, X. Long, and W. Yin. Goalflow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. In CVPR, 2025.
[22] W. Yao, Z. Li, S. Lan, Z. Wang, X. Sun, J. M. Álvarez, and Z. Wu. Drivesuprim: Towards precise trajectory selection for end-to-end planning. In AAAI, 2026.
[23] M. Kobilarov. Cross-entropy motion planning. Int. J. Robotics Res., 2012.
[24] C. Pinneri, S. Sawant, S. Blaes, J. Achterhold, J. Stueckler, M. Rolínek, and G. Martius. Sample-efficient cross-entropy method for real-time planning. In CoRL, 2020.
[25] G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou. Information theoretic MPC for model-based reinforcement learning. In ICRA, 2017.
[26] P. Drews, G. Williams, B. Goldfain, E. A. Theodorou, and J. M. Rehg. Aggressive deep driving: Combining convolutional neural networks and model predictive control. In CoRL, 2017.
[27] Z. Huang, H. Liu, J. Wu, and C. Lv. Differentiable integrated motion prediction and planning with learnable cost function for autonomous driving. IEEE Trans. Neural Networks Learn. Syst., 2024.
[28] C. Sima, K. Chitta, Z. Yu, S. Lan, P. Luo, A. Geiger, H. Li, and J. M. Alvarez. Centaur: Robust end-to-end autonomous driving with test-time training. arXiv preprint arXiv:2503.11650, 2025.
[29] A. Jaech, A. Kalai, A. Lerer, A. Richardson, A. El-Kishky, A. Low, A. Helyar, A. Madry, A. Beutel, A. Carney, et al. Openai o1 system card. arXiv preprint arXiv:2412.16720, 2024.
[30] D. Guo, D. Yang, H. Zhang, J. Song, P. Wang, Q. Zhu, R. Xu, R. Zhang, S. Ma, X. Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
[31] B. Brown, J. Juravsky, R. Ehrlich, R. Clark, Q. V. Le, C. Ré, and A. Mirhoseini. Large language monkeys: Scaling inference compute with repeated sampling. arXiv preprint arXiv:2407.21787, 2024.
[32] P. Zheng, Y. Zhao, Z. Gong, H. Zhu, and S. Wu. Simplevsf: Vlm-scoring fusion for trajectory prediction of end-to-end autonomous driving. arXiv preprint arXiv:2510.17191, 2025.
[33] Y. Li, Y. Wang, Y. Liu, J. He, L. Fan, and Z. Zhang. End-to-end driving with online trajectory evaluation via BEV world model. In ICCV, 2025.
[34] J. Sun, F. Xue, T. Long, C. Liu, J.-F. Hu, W.-S. Zheng, and N. Sebe. Risk-aware world model predictive control for generalizable end-to-end autonomous driving. arXiv preprint arXiv:2602.23259, 2026.
[35] Y. Li, K. Xiong, X. Guo, F. Li, S. Yan, G. Xu, L. Zhou, L. Chen, H. Sun, B. Wang, et al. Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving. In ICLR, 2026.
[36] Z. Song, L. Liu, H. Pan, B. Liao, M. Guo, L. Yang, Y. Zhang, S. Xu, C. Jia, and Y. Luo. Diver: Reinforced diffusion breaks imitation bottlenecks in end-to-end autonomous driving. arXiv preprint arXiv:2507.04049, 2025.
[37] D. Li, C. Li, Y. Wang, J. Ren, X. Wen, P. Li, L. Xu, K. Zhan, P. Jia, X. Lang, et al. Learning personalized driving styles via reinforcement learning from human feedback. arXiv preprint arXiv:2503.10434, 2025.
[38] N. Hanselmann, K. Renz, K. Chitta, A. Bhattacharyya, and A. Geiger. KING: generating safety-critical driving scenarios for robust imitation via kinematics gradients. In ECCV, 2022.
[39] Y. Yin, P. Khayatan, É. Zablocki, A. Boulch, and M. Cord. Regents: Real-world safety-critical driving scenario generation made stable. In ECCV Workshops, 2024.
[40] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 2023.
[41] Y. Hong, X. Zhou, Y. Li, X. Zhou, L. Liu, Y. Luo, S. Xu, L. Yang, and Z. Song. Drivefuture: Future-aware latent world models for autonomous driving. arXiv preprint arXiv:2605.09701, 2026.
[42] Y. Liao, J. Xie, and A. Geiger. KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. TPAMI, 2023.
[43] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom. nuscenes: A multimodal dataset for autonomous driving. In CVPR, 2020.
[44] P. Xiao, Z. Shao, S. Hao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jiang, Y. Wang, and D. Yang. Pandaset: Advanced sensor suite dataset for autonomous driving. In ITSC, 2021.
[45] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y. Zhang, J. Shlens, Z. Chen, and D. Anguelov. Scalability in perception for autonomous driving: Waymo open dataset. In CVPR, 2020.
[46] K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger. Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. TPAMI, 2022.
[47] Y. Wang, X. Li, W. Wang, J. Zhang, Y. Li, Y. Chen, X. Wang, and Z. Zhang. Unified vision-language-action model. arXiv preprint arXiv:2506.19850, 2025.
[48] Y. Chen, Y. Wang, and Z. Zhang. Drivinggpt: Unifying driving world modeling and planning with multi-modal autoregressive transformers. In ICCV, 2025.
[49] C. Shi, S. Shi, K. Sheng, B. Zhang, and L. Jiang. Drivex: Omni scene modeling for learning generalizable world knowledge in autonomous driving. In ICCV, 2025.
[50] Y. Zheng, P. Yang, Z. Xing, Q. Zhang, Y. Zheng, Y. Gao, P. Li, T. Zhang, Z. Xia, P. Jia, et al. World4drive: End-to-end autonomous driving via intention-aware physical latent world model. In ICCV, 2025.
[51] C. Yuan, Z. Zhang, J. Sun, S. Sun, Z. Huang, C. D. W. Lee, D. Li, Y. Han, A. Wong, K. P. Tee, et al. Drama: An efficient end-to-end motion planner for autonomous driving with mamba. In ISRR, 2024.
[52] S. Chen, B. Jiang, H. Gao, B. Liao, Q. Xu, Q. Zhang, C. Huang, W. Liu, and X. Wang. Vadv2: End-to-end vectorized autonomous driving via probabilistic planning. arXiv preprint arXiv:2402.13243, 2024.
[53] M. Wozniak, L. Liu, Y. Cai, and P. Jensfelt. PRIX: learning to plan from raw pixels for end-to-end autonomous driving. IEEE Robotics Autom. Lett., 2026.
[54] Z. Zhou, T. Cai, S. Z. Zhao, Y. Zhang, Z. Huang, B. Zhou, and J. Ma. Autovla: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning. NeurIPS, 2025.
[55] Y. Li, S. Shang, W. Liu, B. Zhan, H. Wang, Y. Wang, Y. Chen, X. Wang, Y. An, C. Tang, et al. Drivevla-w0: World models amplify data scaling law in autonomous driving. arXiv preprint arXiv:2510.12796, 2025.
[56] K. Li, Z. Li, S. Lan, Y. Xie, Z. Zhang, J. Liu, Z. Wu, Z. Yu, and J. M. Alvarez. Hydra-mdp++: Advancing end-to-end driving via expert-guided hydra-distillation. arXiv e-prints, pages arXiv–2503, 2025.
[57] L. Liu, C. Jia, G. Yu, Z. Song, J. Li, F. Jia, P. Wu, X. Hao, and Y. Luo. Guideflow: Constraint-guided flow matching for planning in end-to-end autonomous driving. arXiv preprint arXiv:2511.18729, 2025.
[58] B. Jiang, S. Chen, B. Liao, X. Zhang, W. Yin, Q. Zhang, C. Huang, W. Liu, and X. Wang. Senna: Bridging large vision-language models and end-to-end autonomous driving. arXiv preprint arXiv:2410.22313, 2024.
[59] B. Sun, Y. Cao, Y. Wang, R. Wang, J. Shang, X. Feng, J. Lu, J. Shi, S. Yang, X. Yan, et al. Minddrive: An all-in-one framework bridging world models and vision-language model for end-to-end autonomous driving. arXiv preprint arXiv:2512.04441, 2025.
[60] Z. Zhou, R. Yang, Y. Guo, S. X. Chen, T. Feng, K. Pistunova, Y. Shen, L. Su, J. Ma, et al. Spanvla: Efficient action bridging and learning from negative-recovery samples for vision-language-action model. arXiv preprint arXiv:2604.19710, 2026.
[61] A. Jiang, Y. Gao, Z. Sun, Y. Wang, J. Wang, J. Chai, Q. Cao, Y. Heng, H. Jiang, Y. Dong, et al. Diffvla: Vision-language guided diffusion planning for autonomous driving. arXiv preprint arXiv:2505.19381, 2025.
[62] H. Tian, T. Li, H. Liu, J. Yang, Y. Qiu, G. Li, J. Wang, Y. Gao, Z. Zhang, L. Wang, H. Ye, T. Tan, L. Chen, and H. Li. Simscale: Learning to drive via real-world simulation at scale. In CVPR, 2026.

[1] [1]

Bojarski, D

M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al. End to end learning for self-driving cars.arXiv preprint arXiv:1604.07316, 2016

Pith/arXiv arXiv 2016

[2] [2]

Y . Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang, L. Lu, X. Jia, Q. Liu, J. Dai, Y . Qiao, and H. Li. Planning-oriented autonomous driving. InCVPR, 2023

2023

[3] [3]

K. Renz, L. Chen, E. Arani, and O. Sinavski. Simlingo: Vision-only closed-loop autonomous driving with language-action alignment. InCVPR, 2025

2025

[4] [4]

Z. Li, K. Li, S. Wang, S. Lan, Z. Yu, Y . Ji, Z. Li, Z. Zhu, J. Kautz, Z. Wu, et al. Hydra- mdp: End-to-end multimodal planning with multi-target hydra-distillation.arXiv preprint arXiv:2406.06978, 2024

Pith/arXiv arXiv 2024

[5] [5]

Z. Li, W. Yao, Z. Wang, X. Sun, J. Chen, N. Chang, M. Shen, Z. Wu, S. Lan, and J. M. Alvarez. Generalized trajectory scoring for end-to-end multimodal planning.arXiv preprint arXiv:2506.06664, 2025

arXiv 2025

[6] [6]

Z. Li, W. Yao, Z. Wang, X. Sun, J. Chen, N. Chang, M. Shen, J. Song, Z. Wu, S. Lan, et al. Ztrs: Zero-imitation end-to-end autonomous driving with trajectory scoring.arXiv preprint arXiv:2510.24108, 2025

arXiv 2025

[7] [7]

W. Sun, X. Lin, K. Chen, Z. Pei, X. Li, Y . Shi, and S. Zheng. Sparsedrivev2: Scoring is all you need for end-to-end autonomous driving.arXiv preprint arXiv:2603.29163, 2026

arXiv 2026

[8] [8]

Kirby, A

E. Kirby, A. Boulch, Y . Xu, Y . Yin, G. Puy, E. Zablocki, A. Bursuc, S. Gidaris, R. Marlet, F. Bartoccioni, A.-Q. Cao, N. Samet, T.-H. Vu, and M. Cord. Driving on registers. InCVPR, 2026

2026

[9] [9]

L. Feng, Y . Gao, E. Zablocki, Q. Li, W. Li, S. Liu, M. Cord, and A. Alahi. Rap: 3d rasterization augmented end-to-end planning. InICLR, 2026

2026

[10] [10]

K. Guo, H. Liu, X. Wu, J. Pan, and C. Lv. ipad: Iterative proposal-centric end-to-end autonomous driving.arXiv preprint arXiv:2505.15111, 2025

arXiv 2025

[11] [11]

R. Y . Rubinstein and D. P. Kroese.The cross-entropy method: a unified approach to combinato- rial optimization, Monte-Carlo simulation, and machine learning. Springer, 2004

2004

[12] [12]

Dauner, M

D. Dauner, M. Hallgarten, T. Li, X. Weng, Z. Huang, Z. Yang, H. Li, I. Gilitschenski, B. Ivanovic, M. Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. InNeurIPS, 2024

2024

[13] [13]

W. Cao, M. Hallgarten, T. Li, D. Dauner, X. Gu, C. Wang, Y . Miron, M. Aiello, H. Li, I. Gilitschenski, B. Ivanovic, M. Pavone, A. Geiger, and K. Chitta. Pseudo-simulation for autonomous driving. InCoRL, 2025

2025

[14] [14]

Dauner, M

D. Dauner, M. Hallgarten, A. Geiger, and K. Chitta. Parting with misconceptions about learning-based vehicle motion planning. InCoRL, 2023. 9

2023

[15] [15]

H. Zhou, L. Lin, J. Wang, Y . Lu, D. Bai, B. Liu, Y . Wang, A. Geiger, and Y . Liao. HUGSIM: A real-time, photo-realistic and closed-loop simulator for autonomous driving.IEEE Trans. Pattern Anal. Mach. Intell., 2026

2026

[16] [16]

W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun. End-to-end interpretable neural motion planner. InCVPR, 2019

2019

[17] [17]

Yurtsever, J

E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda. A survey of autonomous driving: Common practices and emerging technologies.IEEE Access, 2020

2020

[18] [18]

Jiang, S

B. Jiang, S. Chen, Q. Xu, B. Liao, J. Chen, H. Zhou, Q. Zhang, W. Liu, C. Huang, and X. Wang. Vad: Vectorized scene representation for efficient autonomous driving. InICCV, 2023

2023

[19] [19]

X. Weng, B. Ivanovic, Y . Wang, Y . Wang, and M. Pavone. Para-drive: Parallelized architecture for real-time autonomous driving. InCVPR, 2024

2024

[20] [20]

B. Liao, S. Chen, H. Yin, B. Jiang, C. Wang, S. Yan, X. Zhang, X. Li, Y . Zhang, Q. Zhang, et al. Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. InCVPR, 2025

2025

[21] [21]

Z. Xing, X. Zhang, Y . Hu, B. Jiang, T. He, Q. Zhang, X. Long, and W. Yin. Goalflow: Goal- driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. InCVPR, 2025

2025

[22] [22]

W. Yao, Z. Li, S. Lan, Z. Wang, X. Sun, J. M. ´Alvarez, and Z. Wu. Drivesuprim: Towards precise trajectory selection for end-to-end planning. InAAAI, 2026

[23]

M. Kobilarov. Cross-entropy motion planning. Int. J. Robotics Res., 2012.

[24]

C. Pinneri, S. Sawant, S. Blaes, J. Achterhold, J. Stueckler, M. Rolínek, and G. Martius. Sample-efficient cross-entropy method for real-time planning. In CoRL, 2020.

[25]

G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou. Information theoretic MPC for model-based reinforcement learning. In ICRA, 2017.

[26]

P. Drews, G. Williams, B. Goldfain, E. A. Theodorou, and J. M. Rehg. Aggressive deep driving: Combining convolutional neural networks and model predictive control. In CoRL, 2017.

[27]

Z. Huang, H. Liu, J. Wu, and C. Lv. Differentiable integrated motion prediction and planning with learnable cost function for autonomous driving. IEEE Trans. Neural Networks Learn. Syst., 2024.

[28]

C. Sima, K. Chitta, Z. Yu, S. Lan, P. Luo, A. Geiger, H. Li, and J. M. Alvarez. Centaur: Robust end-to-end autonomous driving with test-time training. arXiv preprint arXiv:2503.11650, 2025.

[29]

A. Jaech, A. Kalai, A. Lerer, A. Richardson, A. El-Kishky, A. Low, A. Helyar, A. Madry, A. Beutel, A. Carney, et al. Openai o1 system card. arXiv preprint arXiv:2412.16720, 2024.

[30]

D. Guo, D. Yang, H. Zhang, J. Song, P. Wang, Q. Zhu, R. Xu, R. Zhang, S. Ma, X. Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.

[31]

B. Brown, J. Juravsky, R. Ehrlich, R. Clark, Q. V. Le, C. Ré, and A. Mirhoseini. Large language monkeys: Scaling inference compute with repeated sampling. arXiv preprint arXiv:2407.21787, 2024.

[32]

P. Zheng, Y. Zhao, Z. Gong, H. Zhu, and S. Wu. Simplevsf: Vlm-scoring fusion for trajectory prediction of end-to-end autonomous driving. arXiv preprint arXiv:2510.17191, 2025.

[33]

Y. Li, Y. Wang, Y. Liu, J. He, L. Fan, and Z. Zhang. End-to-end driving with online trajectory evaluation via BEV world model. In ICCV, 2025.

[34]

J. Sun, F. Xue, T. Long, C. Liu, J.-F. Hu, W.-S. Zheng, and N. Sebe. Risk-aware world model predictive control for generalizable end-to-end autonomous driving. arXiv preprint arXiv:2602.23259, 2026.

[35]

Y. Li, K. Xiong, X. Guo, F. Li, S. Yan, G. Xu, L. Zhou, L. Chen, H. Sun, B. Wang, et al. Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving. In ICLR, 2026.

[36]

Z. Song, L. Liu, H. Pan, B. Liao, M. Guo, L. Yang, Y. Zhang, S. Xu, C. Jia, and Y. Luo. Diver: Reinforced diffusion breaks imitation bottlenecks in end-to-end autonomous driving. arXiv preprint arXiv:2507.04049, 2025.

[37]

D. Li, C. Li, Y. Wang, J. Ren, X. Wen, P. Li, L. Xu, K. Zhan, P. Jia, X. Lang, et al. Learning personalized driving styles via reinforcement learning from human feedback. arXiv preprint arXiv:2503.10434, 2025.

[38]

N. Hanselmann, K. Renz, K. Chitta, A. Bhattacharyya, and A. Geiger. KING: generating safety-critical driving scenarios for robust imitation via kinematics gradients. In ECCV, 2022.

[39]

Y. Yin, P. Khayatan, É. Zablocki, A. Boulch, and M. Cord. Regents: Real-world safety-critical driving scenario generation made stable. In ECCV Workshops, 2024.

[40]

B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 2023.

[41]

Y. Hong, X. Zhou, Y. Li, X. Zhou, L. Liu, Y. Luo, S. Xu, L. Yang, and Z. Song. Drivefuture: Future-aware latent world models for autonomous driving. arXiv preprint arXiv:2605.09701, 2026.

[42]

Y. Liao, J. Xie, and A. Geiger. KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. TPAMI, 2023.

[43]

H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom. nuscenes: A multimodal dataset for autonomous driving. In CVPR, 2020.

[44]

P. Xiao, Z. Shao, S. Hao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jiang, Y. Wang, and D. Yang. Pandaset: Advanced sensor suite dataset for autonomous driving. In ITSC, 2021.

[45]

P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y. Zhang, J. Shlens, Z. Chen, and D. Anguelov. Scalability in perception for autonomous driving: Waymo open dataset. In CVPR, 2020.

[46]

K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger. Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. TPAMI, 2022.

[47]

Y. Wang, X. Li, W. Wang, J. Zhang, Y. Li, Y. Chen, X. Wang, and Z. Zhang. Unified vision-language-action model. arXiv preprint arXiv:2506.19850, 2025.

[48]

Y. Chen, Y. Wang, and Z. Zhang. Drivinggpt: Unifying driving world modeling and planning with multi-modal autoregressive transformers. In ICCV, 2025.

[49]

C. Shi, S. Shi, K. Sheng, B. Zhang, and L. Jiang. Drivex: Omni scene modeling for learning generalizable world knowledge in autonomous driving. In ICCV, 2025.

[50]

Y. Zheng, P. Yang, Z. Xing, Q. Zhang, Y. Zheng, Y. Gao, P. Li, T. Zhang, Z. Xia, P. Jia, et al. World4drive: End-to-end autonomous driving via intention-aware physical latent world model. In ICCV, 2025.

[51]

C. Yuan, Z. Zhang, J. Sun, S. Sun, Z. Huang, C. D. W. Lee, D. Li, Y. Han, A. Wong, K. P. Tee, et al. Drama: An efficient end-to-end motion planner for autonomous driving with mamba. In ISRR, 2024.

[52]

S. Chen, B. Jiang, H. Gao, B. Liao, Q. Xu, Q. Zhang, C. Huang, W. Liu, and X. Wang. Vadv2: End-to-end vectorized autonomous driving via probabilistic planning. arXiv preprint arXiv:2402.13243, 2024.

[53]

M. Wozniak, L. Liu, Y. Cai, and P. Jensfelt. PRIX: learning to plan from raw pixels for end-to-end autonomous driving. IEEE Robotics Autom. Lett., 2026.

[54]

Z. Zhou, T. Cai, S. Z. Zhao, Y. Zhang, Z. Huang, B. Zhou, and J. Ma. Autovla: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning. NeurIPS, 2025.

[55]

Y. Li, S. Shang, W. Liu, B. Zhan, H. Wang, Y. Wang, Y. Chen, X. Wang, Y. An, C. Tang, et al. Drivevla-w0: World models amplify data scaling law in autonomous driving. arXiv preprint arXiv:2510.12796, 2025.

[56]

K. Li, Z. Li, S. Lan, Y. Xie, Z. Zhang, J. Liu, Z. Wu, Z. Yu, and J. M. Alvarez. Hydra-mdp++: Advancing end-to-end driving via expert-guided hydra-distillation. arXiv e-prints, pages arXiv–2503, 2025.

[57]

L. Liu, C. Jia, G. Yu, Z. Song, J. Li, F. Jia, P. Wu, X. Hao, and Y. Luo. Guideflow: Constraint-guided flow matching for planning in end-to-end autonomous driving. arXiv preprint arXiv:2511.18729, 2025.

[58]

B. Jiang, S. Chen, B. Liao, X. Zhang, W. Yin, Q. Zhang, C. Huang, W. Liu, and X. Wang. Senna: Bridging large vision-language models and end-to-end autonomous driving. arXiv preprint arXiv:2410.22313, 2024.

[59]

B. Sun, Y. Cao, Y. Wang, R. Wang, J. Shang, X. Feng, J. Lu, J. Shi, S. Yang, X. Yan, et al. Minddrive: An all-in-one framework bridging world models and vision-language model for end-to-end autonomous driving. arXiv preprint arXiv:2512.04441, 2025.

[60]

Z. Zhou, R. Yang, Y. Guo, S. X. Chen, T. Feng, K. Pistunova, Y. Shen, L. Su, J. Ma, et al. Spanvla: Efficient action bridging and learning from negative-recovery samples for vision-language-action model. arXiv preprint arXiv:2604.19710, 2026.

[61]

A. Jiang, Y. Gao, Z. Sun, Y. Wang, J. Wang, J. Chai, Q. Cao, Y. Heng, H. Jiang, Y. Dong, et al. Diffvla: Vision-language guided diffusion planning for autonomous driving. arXiv preprint arXiv:2505.19381, 2025.

[62]

H. Tian, T. Li, H. Liu, J. Yang, Y. Qiu, G. Li, J. Wang, Y. Gao, Z. Zhang, L. Wang, H. Ye, T. Tan, L. Chen, and H. Li. Simscale: Learning to drive via real-world simulation at scale. In CVPR, 2026.

A Driving Metrics

In the study, we use the official metrics as defined by their original benchmarks.

NAVSIM-v1 [12]. We use the Predictive Driver Model Sco...
