pith. sign in

arxiv: 2606.07170 · v1 · pith:WAHCVO6Fnew · submitted 2026-06-05 · 💻 cs.RO

Test-Time Trajectory Optimization for Autonomous Driving

Pith reviewed 2026-06-27 21:51 UTC · model grok-4.3

classification 💻 cs.RO
keywords autonomous drivingtrajectory optimizationtest-time optimizationcross-entropy methodend-to-end plannersNAVSIM benchmarkHUGSIM benchmark
0
0 comments X

The pith

Treating a planner's scorer as a reward and maximizing it at test time with the Cross-Entropy Method improves driving performance without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

End-to-end autonomous driving planners generate candidate trajectories and then score them, but the scoring step cannot affect which candidates are proposed. The paper treats the scorer as a fixed trajectory-level reward and runs the Cross-Entropy Method at inference time to search for higher-scoring trajectories, warm-started from the planner's original proposals. This requires no model changes or extra training and is compatible with any base planner. Results improve on NAVSIM-v1, NAVSIM-v2, and the closed-loop HUGSIM benchmark when the method is added to six different planners. A reader cares because the approach converts an existing limitation of proposal generation into an optimizable objective at runtime.

Core claim

By treating the base planner's scorer as a learned trajectory-level reward function and applying the Cross-Entropy Method at test time, trajectories that better maximize the reward can be found, warm-started from the planner's proposals, yielding higher performance on NAVSIM-v1 (94.7 PDMS), NAVSIM-v2 (56.3 EPDMS), and closed-loop HUGSIM across six base planners without any retraining or additional supervision.

What carries the argument

The Cross-Entropy Method run at test time to maximize the base planner's trajectory scorer treated as a reward function.

If this is right

  • TOAD works as a plug-and-play addition to any existing end-to-end planner.
  • No retraining or extra supervision is needed to obtain the reported gains.
  • Performance rises on NAVSIM-v1 to 94.7 PDMS when TOAD is applied.
  • Performance rises on NAVSIM-v2 to 56.3 EPDMS when TOAD is applied.
  • Performance rises on the closed-loop HUGSIM benchmark when TOAD is applied.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Many current planners may leave substantial headroom on the table by not optimizing against their own scoring functions after proposal generation.
  • Similar test-time search could be tried in other sequential decision domains where a learned scorer exists but proposal quality is limited.
  • Faster or more sample-efficient optimizers than the Cross-Entropy Method might be substituted while preserving the same plug-and-play property.
  • Pairing the method with stronger initial proposal generators could produce additive rather than redundant gains.

Load-bearing premise

The scorer already produced by the base planner is smooth and informative enough for the Cross-Entropy Method to maximize it effectively at test time.

What would settle it

A controlled test in which running the Cross-Entropy Method on a planner's proposals produces trajectories that receive lower scores from the same scorer or worse closed-loop performance than the original proposals.

Figures

Figures reproduced from arXiv: 2606.07170 by Alexandre Boulch, Elias Ramzi, Ellington Kirby, Eloi Zablocki, Matthieu Cord, Yihong Xu, Yuan Yin.

Figure 1
Figure 1. Figure 1: Overview of TOAD. A frozen base planner outputs trajectory proposals and a selected anchor τbase (left). Trajectories are mapped to controls by the inverse bicycle model (BM−1 ). Then, a Cross-Entropy Method (CEM) loop runs for K steps in control space: sample controls, rolls them out with the bicycle model (BM), scores them under ‘Scorer + reg’ (the learned reward with comfort and anchor regularizers), re… view at source ↗
Figure 2
Figure 2. Figure 2: Visualization example of improvements and failures found by TOAD on top of DrivoR [8]. 4.3 The gains require search and a generalizing scorer Two factors explain TOAD’s gains: they come from search discovering new trajectories rather than re-ranking or smoothing existing ones, and they only appear with a scorer that remains accurate off 6 [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Per-sample inference cost across meth￾ods. The bar length is the total inference time per sample, decomposed into the base planner (light gray), the scorer backbone (dark gray), and our test-time optimization (TOAD, blue) [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Ablation study of TOAD with iPad on warmup-two-stage. In Fig. 4a, first observation is that with TOAD, the overall performance is improved compared to vanilla planner. We observe that the initialization from a standard normal distribution still achieves competitive performance (42.7 vs. 48.5), suggesting that TOAD can generalize to a broader class of E2E planners that output only a single trajectory withou… view at source ↗
Figure 5
Figure 5. Figure 5: CEM iteration visualization. We visualize the progress of CEM iteration in TOAD, with Hydra-MDP [4]. B.1 Detailed Results in NAVSIM Benchmarks NAVSIM-v1 detailed results. In Tab. 5 we report PDMS and its constituent sub-metrics on navtest. Tab. 5 shows that the safety columns (NC, DAC, TTC) are already saturated for every base planner. Thus, TOAD improves PDMS on the different base planner by boosting the … view at source ↗
read the original abstract

End-to-end planners for autonomous driving typically generate a set of candidate trajectories, score each one, and return the highest-scoring candidate. However, the scorer is applied only after the proposals are generated and cannot influence the set of trajectories: a weak set of candidates limits planning performance regardless of the scorer's quality. We instead treat the scorer as a learned trajectory-level reward function and search for trajectories that maximize it. Our method, TOAD, runs the Cross-Entropy Method at test time, warm-started from the planner's proposals. It requires no retraining and is plug-and-play for existing planners. Across six base planners, TOAD improves results on NAVSIM-v1 (94.7 PDMS), NAVSIM-v2 (56.3 EPDMS), and the closed-loop HUGSIM benchmark. The code will be made publicly available via the project page: https://valeoai.github.io/TOAD/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces TOAD, which treats the scorer of existing end-to-end autonomous driving planners as a learned trajectory-level reward and applies the Cross-Entropy Method (CEM) at test time, warm-started from the planner's proposals, to search for higher-scoring trajectories. The method requires no retraining and is presented as plug-and-play. It reports numerical improvements across six base planners on NAVSIM-v1 (94.7 PDMS), NAVSIM-v2 (56.3 EPDMS), and the closed-loop HUGSIM benchmark, with code to be released publicly.

Significance. If the central assumption holds, the approach provides a practical test-time enhancement to existing planners without additional training. The public code release is a clear strength supporting reproducibility and adoption.

major comments (3)
  1. [Section 3] Section 3 (Method): The description of CEM application provides no analysis of the scorer's properties (smoothness, variance, or flat regions) as a trajectory-level reward, which is load-bearing for the claim that CEM will reliably locate improved trajectories over the initial proposals.
  2. [Section 4] Section 4 (Experiments): Aggregate benchmark improvements are reported without per-scene score deltas between base proposals and CEM outputs, ablations on CEM hyperparameters (elite fraction, iterations, sampling variance), or statistical significance tests, preventing verification that gains stem from the optimization rather than confounding factors.
  3. [Table 1] Table 1 (or equivalent results table): The cross-planner gains on NAVSIM-v1/v2 and HUGSIM are presented without controls for CEM convergence criteria or multiple random seeds, undermining assessment of robustness.
minor comments (1)
  1. [Abstract and Section 3] The abstract and method section use 'parameter-free' phrasing for the overall approach, but CEM hyperparameters are introduced without explicit discussion of their selection or sensitivity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and indicate where revisions will be made to the manuscript.

read point-by-point responses
  1. Referee: [Section 3] Section 3 (Method): The description of CEM application provides no analysis of the scorer's properties (smoothness, variance, or flat regions) as a trajectory-level reward, which is load-bearing for the claim that CEM will reliably locate improved trajectories over the initial proposals.

    Authors: We agree that analyzing the scorer's properties would strengthen the justification for applying CEM. Our empirical results show consistent gains across six planners, indicating the scorers are amenable to optimization in practice. In the revised manuscript, we will add a brief discussion in Section 3 on observed scorer characteristics (e.g., empirical smoothness from sampling) while noting that a full theoretical analysis lies outside the paper's scope. revision: partial

  2. Referee: [Section 4] Section 4 (Experiments): Aggregate benchmark improvements are reported without per-scene score deltas between base proposals and CEM outputs, ablations on CEM hyperparameters (elite fraction, iterations, sampling variance), or statistical significance tests, preventing verification that gains stem from the optimization rather than confounding factors.

    Authors: We will revise Section 4 to include ablations on CEM hyperparameters (elite fraction, iterations, sampling variance). We will also report average per-scene score deltas and apply statistical significance tests (e.g., paired t-tests) on the improvements to confirm they arise from the optimization. revision: yes

  3. Referee: [Table 1] Table 1 (or equivalent results table): The cross-planner gains on NAVSIM-v1/v2 and HUGSIM are presented without controls for CEM convergence criteria or multiple random seeds, undermining assessment of robustness.

    Authors: We will update the experimental description and results tables to explicitly state the CEM convergence criteria and report performance averaged over multiple random seeds with standard deviations to better demonstrate robustness. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical test-time optimization with no self-referential derivations

full rationale

The paper describes a plug-and-play method that applies the Cross-Entropy Method at test time to maximize an existing base planner's scorer treated as a reward. No equations, parameter fits, or derivations are present that reduce the claimed benchmark improvements to quantities defined by the paper itself. The approach relies on an external optimizer (CEM) applied to pre-existing components without retraining. No self-citations are load-bearing for any uniqueness theorem or ansatz. The central claim is an empirical performance gain on external benchmarks (NAVSIM, HUGSIM), which does not reduce to a fitted input renamed as prediction or any of the enumerated circular patterns. This is the common case of a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, the method uses standard CEM optimization and existing planner components; no new free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5699 in / 1059 out tokens · 20406 ms · 2026-06-27T21:51:17.992395+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CADET: Physics-Grounded Causal Auditing and Training-Free Deconfounding of End-to-End Driving Planners

    cs.RO 2026-06 unverdicted novelty 5.0

    CADET is a training-free framework for auditing, benchmarking, and repairing spurious correlations in pretrained end-to-end autonomous driving planners using physics-grounded causal methods.

Reference graph

Works this paper leans on

62 extracted references · 11 linked inside Pith · cited by 1 Pith paper

  1. [1]

    Bojarski, D

    M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al. End to end learning for self-driving cars.arXiv preprint arXiv:1604.07316, 2016

  2. [2]

    Y . Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang, L. Lu, X. Jia, Q. Liu, J. Dai, Y . Qiao, and H. Li. Planning-oriented autonomous driving. InCVPR, 2023

  3. [3]

    K. Renz, L. Chen, E. Arani, and O. Sinavski. Simlingo: Vision-only closed-loop autonomous driving with language-action alignment. InCVPR, 2025

  4. [4]

    Z. Li, K. Li, S. Wang, S. Lan, Z. Yu, Y . Ji, Z. Li, Z. Zhu, J. Kautz, Z. Wu, et al. Hydra- mdp: End-to-end multimodal planning with multi-target hydra-distillation.arXiv preprint arXiv:2406.06978, 2024

  5. [5]

    Z. Li, W. Yao, Z. Wang, X. Sun, J. Chen, N. Chang, M. Shen, Z. Wu, S. Lan, and J. M. Alvarez. Generalized trajectory scoring for end-to-end multimodal planning.arXiv preprint arXiv:2506.06664, 2025

  6. [6]

    Z. Li, W. Yao, Z. Wang, X. Sun, J. Chen, N. Chang, M. Shen, J. Song, Z. Wu, S. Lan, et al. Ztrs: Zero-imitation end-to-end autonomous driving with trajectory scoring.arXiv preprint arXiv:2510.24108, 2025

  7. [7]

    W. Sun, X. Lin, K. Chen, Z. Pei, X. Li, Y . Shi, and S. Zheng. Sparsedrivev2: Scoring is all you need for end-to-end autonomous driving.arXiv preprint arXiv:2603.29163, 2026

  8. [8]

    Kirby, A

    E. Kirby, A. Boulch, Y . Xu, Y . Yin, G. Puy, E. Zablocki, A. Bursuc, S. Gidaris, R. Marlet, F. Bartoccioni, A.-Q. Cao, N. Samet, T.-H. Vu, and M. Cord. Driving on registers. InCVPR, 2026

  9. [9]

    L. Feng, Y . Gao, E. Zablocki, Q. Li, W. Li, S. Liu, M. Cord, and A. Alahi. Rap: 3d rasterization augmented end-to-end planning. InICLR, 2026

  10. [10]

    K. Guo, H. Liu, X. Wu, J. Pan, and C. Lv. ipad: Iterative proposal-centric end-to-end autonomous driving.arXiv preprint arXiv:2505.15111, 2025

  11. [11]

    R. Y . Rubinstein and D. P. Kroese.The cross-entropy method: a unified approach to combinato- rial optimization, Monte-Carlo simulation, and machine learning. Springer, 2004

  12. [12]

    Dauner, M

    D. Dauner, M. Hallgarten, T. Li, X. Weng, Z. Huang, Z. Yang, H. Li, I. Gilitschenski, B. Ivanovic, M. Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. InNeurIPS, 2024

  13. [13]

    W. Cao, M. Hallgarten, T. Li, D. Dauner, X. Gu, C. Wang, Y . Miron, M. Aiello, H. Li, I. Gilitschenski, B. Ivanovic, M. Pavone, A. Geiger, and K. Chitta. Pseudo-simulation for autonomous driving. InCoRL, 2025

  14. [14]

    Dauner, M

    D. Dauner, M. Hallgarten, A. Geiger, and K. Chitta. Parting with misconceptions about learning-based vehicle motion planning. InCoRL, 2023. 9

  15. [15]

    H. Zhou, L. Lin, J. Wang, Y . Lu, D. Bai, B. Liu, Y . Wang, A. Geiger, and Y . Liao. HUGSIM: A real-time, photo-realistic and closed-loop simulator for autonomous driving.IEEE Trans. Pattern Anal. Mach. Intell., 2026

  16. [16]

    W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun. End-to-end interpretable neural motion planner. InCVPR, 2019

  17. [17]

    Yurtsever, J

    E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda. A survey of autonomous driving: Common practices and emerging technologies.IEEE Access, 2020

  18. [18]

    Jiang, S

    B. Jiang, S. Chen, Q. Xu, B. Liao, J. Chen, H. Zhou, Q. Zhang, W. Liu, C. Huang, and X. Wang. Vad: Vectorized scene representation for efficient autonomous driving. InICCV, 2023

  19. [19]

    X. Weng, B. Ivanovic, Y . Wang, Y . Wang, and M. Pavone. Para-drive: Parallelized architecture for real-time autonomous driving. InCVPR, 2024

  20. [20]

    B. Liao, S. Chen, H. Yin, B. Jiang, C. Wang, S. Yan, X. Zhang, X. Li, Y . Zhang, Q. Zhang, et al. Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. InCVPR, 2025

  21. [21]

    Z. Xing, X. Zhang, Y . Hu, B. Jiang, T. He, Q. Zhang, X. Long, and W. Yin. Goalflow: Goal- driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. InCVPR, 2025

  22. [22]

    W. Yao, Z. Li, S. Lan, Z. Wang, X. Sun, J. M. ´Alvarez, and Z. Wu. Drivesuprim: Towards precise trajectory selection for end-to-end planning. InAAAI, 2026

  23. [23]

    Kobilarov

    M. Kobilarov. Cross-entropy motion planning.Int. J. Robotics Res., 2012

  24. [24]

    Pinneri, S

    C. Pinneri, S. Sawant, S. Blaes, J. Achterhold, J. Stueckler, M. Rol ´ınek, and G. Martius. Sample-efficient cross-entropy method for real-time planning. InCoRL, 2020

  25. [25]

    Williams, N

    G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou. Information theoretic MPC for model-based reinforcement learning. InICRA, 2017

  26. [26]

    Drews, G

    P. Drews, G. Williams, B. Goldfain, E. A. Theodorou, and J. M. Rehg. Aggressive deep driving: Combining convolutional neural networks and model predictive control. InCoRL, 2017

  27. [27]

    Huang, H

    Z. Huang, H. Liu, J. Wu, and C. Lv. Differentiable integrated motion prediction and planning with learnable cost function for autonomous driving.IEEE Trans. Neural Networks Learn. Syst., 2024

  28. [28]

    C. Sima, K. Chitta, Z. Yu, S. Lan, P. Luo, A. Geiger, H. Li, and J. M. Alvarez. Centaur: Robust end-to-end autonomous driving with test-time training.arXiv preprint arXiv:2503.11650, 2025

  29. [29]

    Jaech, A

    A. Jaech, A. Kalai, A. Lerer, A. Richardson, A. El-Kishky, A. Low, A. Helyar, A. Madry, A. Beutel, A. Carney, et al. Openai o1 system card.arXiv preprint arXiv:2412.16720, 2024

  30. [30]

    D. Guo, D. Yang, H. Zhang, J. Song, P. Wang, Q. Zhu, R. Xu, R. Zhang, S. Ma, X. Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

  31. [31]

    Brown, J

    B. Brown, J. Juravsky, R. Ehrlich, R. Clark, Q. V . Le, C. R´e, and A. Mirhoseini. Large language monkeys: Scaling inference compute with repeated sampling.arXiv preprint arXiv:2407.21787, 2024

  32. [32]

    Zheng, Y

    P. Zheng, Y . Zhao, Z. Gong, H. Zhu, and S. Wu. Simplevsf: Vlm-scoring fusion for trajectory prediction of end-to-end autonomous driving.arXiv preprint arXiv:2510.17191, 2025

  33. [33]

    Y . Li, Y . Wang, Y . Liu, J. He, L. Fan, and Z. Zhang. End-to-end driving with online trajectory evaluation via BEV world model. InICCV, 2025. 10

  34. [34]

    J. Sun, F. Xue, T. Long, C. Liu, J.-F. Hu, W.-S. Zheng, and N. Sebe. Risk-aware world model predictive control for generalizable end-to-end autonomous driving.arXiv preprint arXiv:2602.23259, 2026

  35. [35]

    Y . Li, K. Xiong, X. Guo, F. Li, S. Yan, G. Xu, L. Zhou, L. Chen, H. Sun, B. Wang, et al. Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving. InICLR, 2026

  36. [36]

    Z. Song, L. Liu, H. Pan, B. Liao, M. Guo, L. Yang, Y . Zhang, S. Xu, C. Jia, and Y . Luo. Diver: Reinforced diffusion breaks imitation bottlenecks in end-to-end autonomous driving.arXiv preprint arXiv:2507.04049, 2025

  37. [37]

    D. Li, C. Li, Y . Wang, J. Ren, X. Wen, P. Li, L. Xu, K. Zhan, P. Jia, X. Lang, et al. Learning personalized driving styles via reinforcement learning from human feedback.arXiv preprint arXiv:2503.10434, 2025

  38. [38]

    Hanselmann, K

    N. Hanselmann, K. Renz, K. Chitta, A. Bhattacharyya, and A. Geiger. KING: generating safety-critical driving scenarios for robust imitation via kinematics gradients. InECCV, 2022

  39. [39]

    Y . Yin, P. Khayatan,´E. Zablocki, A. Boulch, and M. Cord. Regents: Real-world safety-critical driving scenario generation made stable. InECCV Workshops, 2024

  40. [40]

    Kerbl, G

    B. Kerbl, G. Kopanas, T. Leimk ¨uhler, and G. Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 2023

  41. [41]

    Y . Hong, X. Zhou, Y . Li, X. Zhou, L. Liu, Y . Luo, S. Xu, L. Yang, and Z. Song. Drivefuture: Future-aware latent world models for autonomous driving.arXiv preprint arXiv:2605.09701, 2026

  42. [42]

    Y . Liao, J. Xie, and A. Geiger. KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d.TPAMI, 2023

  43. [43]

    Caesar, V

    H. Caesar, V . Bankiti, A. H. Lang, S. V ora, V . E. Liong, Q. Xu, A. Krishnan, Y . Pan, G. Baldan, and O. Beijbom. nuscenes: A multimodal dataset for autonomous driving. InCVPR, 2020

  44. [44]

    P. Xiao, Z. Shao, S. Hao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jiang, Y . Wang, and D. Yang. Pandaset: Advanced sensor suite dataset for autonomous driving. InITSC, 2021

  45. [45]

    P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V . Patnaik, P. Tsui, J. Guo, Y . Zhou, Y . Chai, B. Caine, V . Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y . Zhang, J. Shlens, Z. Chen, and D. Anguelov. Scalability in perception for autonomous driving: Waymo open dataset. InCVPR, 2020

  46. [46]

    Chitta, A

    K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger. Transfuser: Imitation with transformer-based sensor fusion for autonomous driving.TPAMI, 2022

  47. [47]

    Y . Wang, X. Li, W. Wang, J. Zhang, Y . Li, Y . Chen, X. Wang, and Z. Zhang. Unified vision- language-action model.arXiv preprint arXiv:2506.19850, 2025

  48. [48]

    Y . Chen, Y . Wang, and Z. Zhang. Drivinggpt: Unifying driving world modeling and planning with multi-modal autoregressive transformers. InICCV, 2025

  49. [49]

    C. Shi, S. Shi, K. Sheng, B. Zhang, and L. Jiang. Drivex: Omni scene modeling for learning generalizable world knowledge in autonomous driving. InICCV, 2025

  50. [50]

    Zheng, P

    Y . Zheng, P. Yang, Z. Xing, Q. Zhang, Y . Zheng, Y . Gao, P. Li, T. Zhang, Z. Xia, P. Jia, et al. World4drive: End-to-end autonomous driving via intention-aware physical latent world model. InICCV, 2025. 11

  51. [51]

    C. Yuan, Z. Zhang, J. Sun, S. Sun, Z. Huang, C. D. W. Lee, D. Li, Y . Han, A. Wong, K. P. Tee, et al. Drama: An efficient end-to-end motion planner for autonomous driving with mamba. In ISRR, 2024

  52. [52]

    S. Chen, B. Jiang, H. Gao, B. Liao, Q. Xu, Q. Zhang, C. Huang, W. Liu, and X. Wang. Vadv2: End-to-end vectorized autonomous driving via probabilistic planning.arXiv preprint arXiv:2402.13243, 2024

  53. [53]

    Wozniak, L

    M. Wozniak, L. Liu, Y . Cai, and P. Jensfelt. PRIX: learning to plan from raw pixels for end-to-end autonomous driving.IEEE Robotics Autom. Lett., 2026

  54. [54]

    Z. Zhou, T. Cai, S. Z. Zhao, Y . Zhang, Z. Huang, B. Zhou, and J. Ma. Autovla: A vision- language-action model for end-to-end autonomous driving with adaptive reasoning and rein- forcement fine-tuning.NeurIPS, 2025

  55. [55]

    Y . Li, S. Shang, W. Liu, B. Zhan, H. Wang, Y . Wang, Y . Chen, X. Wang, Y . An, C. Tang, et al. Drivevla-w0: World models amplify data scaling law in autonomous driving.arXiv preprint arXiv:2510.12796, 2025

  56. [56]

    K. Li, Z. Li, S. Lan, Y . Xie, Z. Zhang, J. Liu, Z. Wu, Z. Yu, and J. M. Alvarez. Hydra- mdp++: Advancing end-to-end driving via expert-guided hydra-distillation.arXiv e-prints, pages arXiv–2503, 2025

  57. [57]

    L. Liu, C. Jia, G. Yu, Z. Song, J. Li, F. Jia, P. Wu, X. Hao, and Y . Luo. Guideflow: Constraint-guided flow matching for planning in end-to-end autonomous driving.arXiv preprint arXiv:2511.18729, 2025

  58. [58]

    Jiang, S

    B. Jiang, S. Chen, B. Liao, X. Zhang, W. Yin, Q. Zhang, C. Huang, W. Liu, and X. Wang. Senna: Bridging large vision-language models and end-to-end autonomous driving.arXiv preprint arXiv:2410.22313, 2024

  59. [59]

    B. Sun, Y . Cao, Y . Wang, R. Wang, J. Shang, X. Feng, J. Lu, J. Shi, S. Yang, X. Yan, et al. Minddrive: An all-in-one framework bridging world models and vision-language model for end-to-end autonomous driving.arXiv preprint arXiv:2512.04441, 2025

  60. [60]

    Z. Zhou, R. Yang, Y . Guo, S. X. Chen, T. Feng, K. Pistunova, Y . Shen, L. Su, J. Ma, et al. Spanvla: Efficient action bridging and learning from negative-recovery samples for vision- language-action model.arXiv preprint arXiv:2604.19710, 2026

  61. [61]

    Jiang, Y

    A. Jiang, Y . Gao, Z. Sun, Y . Wang, J. Wang, J. Chai, Q. Cao, Y . Heng, H. Jiang, Y . Dong, et al. Diffvla: Vision-language guided diffusion planning for autonomous driving.arXiv preprint arXiv:2505.19381, 2025

  62. [62]

    H. Tian, T. Li, H. Liu, J. Yang, Y . Qiu, G. Li, J. Wang, Y . Gao, Z. Zhang, L. Wang, H. Ye, T. Tan, L. Chen, and H. Li. Simscale: Learning to drive via real-world simulation at scale. InCVPR, 2026. 12 A Driving Metrics In the study, we use the official metrics as defined by their original benchmarks. NA VSIM-v1 [12].We use the Predictive Driver Model Sco...