pith. machine review for the scientific record. sign in

arxiv: 2603.19675 · v2 · submitted 2026-03-20 · 💻 cs.CV · cs.RO

Recognition: 2 theorem links

· Lean Theorem

DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving

Authors on Pith no claims yet

Pith reviewed 2026-05-15 09:03 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords autonomous drivingworld modelsrectified flowlatent dynamicstrajectory selectionscene evolutionstability-aware planning
0
0 comments X

The pith

DynFlowDrive learns a velocity field via rectified flow to predict how driving actions evolve latent scene states, supporting stability-based trajectory selection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a latent world model that replaces appearance generation and deterministic regression with flow-based dynamics for autonomous driving. It trains a velocity field that describes continuous changes in scene states conditioned on different actions, allowing the model to integrate forward step by step to forecast future states. This formulation supports a new selection method that ranks candidate trajectories by how stable the resulting transitions appear in latent space. The approach is shown to improve planning reliability on standard driving benchmarks while adding no inference cost. A sympathetic reader would care because unreliable future-state prediction has been a persistent barrier to safe action planning in real-world driving systems.

Core claim

By adopting the rectified flow formulation, the model learns a velocity field that describes how the scene state changes under different driving actions, enabling progressive prediction of future latent states. Building on this, the method adds a stability-aware multi-mode trajectory selection strategy that evaluates candidates according to the stability of the induced scene transitions, yielding consistent gains across driving frameworks on the nuScenes and NavSim benchmarks.

What carries the argument

The rectified flow velocity field in latent space that models continuous, action-conditioned transitions between world states.

If this is right

  • Future states can be predicted progressively by integrating along the learned velocity field instead of generating appearances or regressing deterministically.
  • Trajectory selection becomes possible by measuring the stability of the scene transitions each candidate action would induce.
  • The same model can be plugged into existing driving frameworks to improve reliability without increasing inference time.
  • Action-conditioned evolution is captured directly in the latent dynamics rather than through separate generation steps.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same velocity-field approach could be tested on other sequential control tasks where state transitions depend on chosen actions.
  • If the latent space preserves enough scene structure, the stability metric might be extended to incorporate uncertainty estimates from the flow itself.
  • Online fine-tuning of the velocity field using new sensor data could allow the model to adapt to changing environments without retraining from scratch.

Load-bearing premise

The velocity field learned in latent space accurately captures how real scenes evolve under driving actions and that the stability of those transitions reliably indicates safe planning choices.

What would settle it

If integrating the learned velocity field from an observed initial state under a known action produces latent predictions that deviate substantially from the actual future states recorded in held-out driving sequences, the central modeling claim would be falsified.

Figures

Figures reproduced from arXiv: 2603.19675 by Angela Yao, Jianke Zhu, Junbo Chen, Song Wang, Xiaolu Liu, Yicong Li.

Figure 1
Figure 1. Figure 1: (a) Comparisons of perception-based and latent world model-based approaches on nuScenes and NavSim benchmarks. (b) Planning visualization on the front view and bird’s-eye-view (BEV) space. Our DynFlowDrive achieves comparable performance. Abstract. Recently, world models have been incorporated into the au￾tonomous driving systems to improve the planning reliability. Existing approaches typically predict fu… view at source ↗
Figure 2
Figure 2. Figure 2: Comparison between (a) the existing static world model and (b) the dy￾namic latent world model of our DynFlowDrive. Instead of the static regression of next-frame latents, we propose the dynamic modeling that learns a continuous ve￾locity field vθ to capture the evolution of world transitions. trajectories. The learned velocity field explicitly captures the rate of change of the scene during state transiti… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of DynFlowDrive. Given current observations, multi-mode tra￾jectories are firstly generated by the standard planning module. A flow-based dynamic latent world model is incorporated to simulate the progressive future evolution in la￾tent space. The resulting dynamics are used by a stability-aware multi-mode selection module, which assess the trajectory based on reconstruction quality and flow-based… view at source ↗
Figure 4
Figure 4. Figure 4: The architecture of our dynamic latent world model design, in which the velocity field vθ is learnt to capture the trajectory-conditioned dynamics transitions in the latent space. Flow-based Latent World Model Simulation. Given the current world latent z˜ w t and predicted trajectories Tˆ t, we model trajectory￾conditioned world evolu￾tion in latent space. As shown in [PITH_FULL_IMAGE:figures/full_fig_p00… view at source ↗
Figure 5
Figure 5. Figure 5: Stability-aware Multi￾mode Selection. For training, the score head is supervised by the stable criterion. For In￾ference, the best mode trajec￾tory is selected according to the highest score index. Our multi-modal trajectory predictor generates a set of candidate trajectories {Tˆ n t } N n=1 to capture the in￾herent uncertainty of future driving behaviors. Con￾ventional selection strategies typically rely … view at source ↗
Figure 6
Figure 6. Figure 6: Visualization of Planning Results on nuScenese dataset. [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
read the original abstract

Recently, world models have been incorporated into the autonomous driving systems to improve the planning reliability. Existing approaches typically predict future states through appearance generation or deterministic regression, which limits their ability to capture trajectory-conditioned scene evolution and leads to unreliable action planning. To address this, we propose DynFlowDrive, a latent world model that leverages flow-based dynamics to model the transition of world states under different driving actions. By adopting the rectifiedflow formulation, the model learns a velocity field that describes how the scene state changes under different driving actions, enabling progressive prediction of future latent states. Building upon this, we further introduce a stability-aware multi-mode trajectory selection strategy that evaluates candidate trajectories according to the stability of the induced scene transitions. Extensive experiments on the nuScenes and NavSim benchmarks demonstrate consistent improvements across diverse driving frameworks without introducing additional inference overhead. Source code will be abaliable at https://github.com/xiaolul2/DynFlowDrive.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents DynFlowDrive, a latent world model for autonomous driving that adopts the rectified flow formulation to learn a velocity field describing action-conditioned scene state transitions. This enables progressive prediction of future latent states via ODE integration. A stability-aware multi-mode trajectory selection strategy is proposed that scores candidate trajectories by the integrated norm of the induced velocity field. Experiments on nuScenes and NavSim benchmarks report consistent improvements over prior world-model and planning baselines without added inference cost.

Significance. If the empirical gains and ablations hold under rigorous scrutiny, the work offers a principled continuous-dynamics alternative to deterministic regression or generative world models, potentially improving trajectory-conditioned prediction reliability for planning. The stability metric provides a concrete, if empirical, link between flow-field properties and planning safety.

major comments (2)
  1. [Experiments] Experiments section: the central claim of consistent improvements rests on quantitative results, yet the manuscript provides insufficient detail on data splits, number of runs, error bars, and exact metric values for the nuScenes and NavSim evaluations, preventing verification that gains are robust rather than post-hoc.
  2. [§3.2] §3.2 (Stability-aware Trajectory Selection): the stability metric is defined as the integrated norm of the velocity field along the predicted path, but no analysis is given of its sensitivity to integration step count, norm choice, or latent-space scaling; this directly affects whether the metric reliably proxies safe planning.
minor comments (2)
  1. [Abstract] Abstract: typo 'abaliable' should be 'available'.
  2. [Abstract] Abstract and §2: 'rectifiedflow' should be hyphenated or spaced as 'rectified flow' to match standard terminology in the flow-matching literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive summary and constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the presentation of results and analysis.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: the central claim of consistent improvements rests on quantitative results, yet the manuscript provides insufficient detail on data splits, number of runs, error bars, and exact metric values for the nuScenes and NavSim evaluations, preventing verification that gains are robust rather than post-hoc.

    Authors: We agree that additional details are required for reproducibility and verification. In the revised manuscript we will expand the Experiments section to explicitly describe the train/validation/test splits for both nuScenes and NavSim, state that all quantitative results are averaged over five independent runs with different random seeds, include error bars (standard deviation) in all tables and figures, and report the precise numerical values (rather than only relative gains) for every metric. revision: yes

  2. Referee: [§3.2] §3.2 (Stability-aware Trajectory Selection): the stability metric is defined as the integrated norm of the velocity field along the predicted path, but no analysis is given of its sensitivity to integration step count, norm choice, or latent-space scaling; this directly affects whether the metric reliably proxies safe planning.

    Authors: We acknowledge that the current manuscript does not contain sensitivity analysis for the stability metric. In the revision we will add a dedicated paragraph (and, if space permits, a small table or plot in the appendix) that examines the effect of varying the number of ODE integration steps, the choice of norm (L1 versus L2), and different latent-space scaling factors on the ranking of candidate trajectories. This will provide evidence that the metric remains stable under reasonable hyper-parameter choices. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper adopts the standard rectified-flow formulation to learn a velocity field in latent space conditioned on driving actions and trajectories; this is trained directly from data rather than defined by construction to equal its own outputs. The stability metric is introduced as the integrated norm of the predicted velocity field along candidate paths, which is a downstream computation and does not reduce the core prediction to a fitted input. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are used to justify the central modeling choice. The claimed gains are presented as empirical results on nuScenes and NavSim benchmarks, leaving the derivation self-contained against external validation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are stated. The rectified-flow velocity field is learned from data rather than postulated as a new entity.

pith-pipeline@v0.9.0 · 5469 in / 1122 out tokens · 33815 ms · 2026-05-15T09:03:24.475413+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 7 internal anchors

  1. [1]

    In: IJCAI

    Allen, J.F., Koomen, J.A.: Planning using a temporal world model. In: IJCAI. pp. 741–747 (1983) 2

  2. [2]

    In: CVPR

    Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuscenes: A multimodal dataset for autonomous driving. In: CVPR. pp. 11621–11631 (2020) 3, 9, 10

  3. [3]

    IEEE TPAMI (2024) 2

    Chen, L., Wu, P., Chitta, K., Jaeger, B., Geiger, A., Li, H.: End-to-end autonomous driving: Challenges and frontiers. IEEE TPAMI (2024) 2

  4. [4]

    VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

    Chen,S.,Jiang,B.,Gao,H.,Liao,B.,Xu,Q.,Zhang,Q.,Huang,C.,Liu,W.,Wang, X.: Vadv2: End-to-end vectorized autonomous driving via probabilistic planning. arXiv preprint arXiv:2402.13243 (2024) 2, 13

  5. [5]

    The International Journal of Robotics Research44(10-11), 1684–1704 (2025) 4

    Chi,C.,Xu,Z.,Feng,S.,Cousineau,E.,Du,Y.,Burchfiel,B.,Tedrake,R.,Song,S.: Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research44(10-11), 1684–1704 (2025) 4

  6. [6]

    Impromptu vla: Open weights and open data for driving vision-language-action models.arXiv preprint arXiv:2505.23757,

    Chi, H., Gao, H.a., Liu, Z., Liu, J., Liu, C., Li, J., Yang, K., Yu, Y., Wang, Z., Li, W., et al.: Impromptu vla: Open weights and open data for driving vision- language-action models. arXiv preprint arXiv:2505.23757 (2025) 4

  7. [7]

    TIV9(1), 103–118 (2023) 2

    Chib,P.S.,Singh,P.:Recentadvancementsinend-to-endautonomousdrivingusing deep learning: A survey. TIV9(1), 103–118 (2023) 2

  8. [8]

    IEEE TPAMI 45(11), 12878–12895 (2022) 13

    Chitta, K., Prakash, A., Jaeger, B., Yu, Z., Renz, K., Geiger, A.: Transfuser: Imi- tation with transformer-based sensor fusion for autonomous driving. IEEE TPAMI 45(11), 12878–12895 (2022) 13

  9. [9]

    In: NeurIPS (2024) 3, 9, 11

    Dauner, D., Hallgarten, M., Li, T., Weng, X., Huang, Z., Yang, Z., Li, H., Gilitschenski, I., Ivanovic, B., Pavone, M., et al.: Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. In: NeurIPS (2024) 3, 9, 11

  10. [10]

    IEEE Internet of Things Journal13(3), 3870–3898 (2025) 2

    Dong, W., Lu, S., Chen, X., Zhang, S., Liu, Q., Liu, Z., Chen, L., Wang, H., Cai, Y.: End-to-end autonomous driving: From classic paradigm to large model empowerment—a comprehensive survey. IEEE Internet of Things Journal13(3), 3870–3898 (2025) 2

  11. [11]

    Rad: Training an end-to-end driving policy via large-scale 3dgs-based reinforcement learning.arXiv preprint arXiv:2502.13144,

    Gao, H., Chen, S., Jiang, B., Liao, B., Shi, Y., Guo, X., Pu, Y., Yin, H., Li, X., Zhang, X., Zhang, Y., Liu, W., Zhang, Q., Wang, X.: Rad: Training an end-to-end driving policy via large-scale 3dgs-based reinforcement learning. arXiv preprint arXiv:2502.13144 (2025) 4

  12. [12]

    In: ICLR (2024) 2

    Gao, R., Chen, K., Xie, E., Hong, L., Li, Z., Yeung, D.Y., Xu, Q.: Magicdrive: Street view generation with diverse 3d geometry control. In: ICLR (2024) 2

  13. [13]

    TITS17(4), 1135–1145 (2015) 4

    González, D., Pérez, J., Milanés, V., Nashashibi, F.: A review of motion planning techniques for automated vehicles. TITS17(4), 1135–1145 (2015) 4

  14. [14]

    TIV (2024) 2, 4

    Guan, Y., Liao, H., Li, Z., Hu, J., Yuan, R., Zhang, G., Xu, C.: World models for autonomous driving: An initial survey. TIV (2024) 2, 4

  15. [15]

    In: CVPR

    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR. pp. 770–778 (2016) 10, 11

  16. [16]

    GAIA-1: A Generative World Model for Autonomous Driving

    Hu, A., Russell, L., Yeo, H., Murez, Z., Fedoseev, G., Kendall, A., Shotton, J., Corrado, G.: Gaia-1: A generative world model for autonomous driving. arXiv preprint arXiv:2309.17080 (2023) 2

  17. [17]

    In: ECCV

    Hu, S., Chen, L., Wu, P., Li, H., Yan, J., Tao, D.: St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning. In: ECCV. pp. 533–549. Springer (2022) 4, 11

  18. [18]

    arXiv preprint arXiv:2512.16760 (2025) 2 16 X

    Hu, T., Liu, X., Wang, S., Zhu, Y., Liang, A., Kong, L., Zhao, G., Gong, Z., Cen, J., Huang, Z., et al.: Vision-language-action models for autonomous driving: Past, present, and future. arXiv preprint arXiv:2512.16760 (2025) 2 16 X. Liu et al

  19. [19]

    In: CVPR

    Hu, Y., Yang, J., Chen, L., Li, K., Sima, C., Zhu, X., Chai, S., Du, S., Lin, T., Wang, W., et al.: Planning-oriented autonomous driving. In: CVPR. pp. 17853– 17862 (2023) 4, 6, 10, 11, 13

  20. [20]

    In: ICCV

    Jiang, B., Chen, S., Xu, Q., Liao, B., Chen, J., Zhou, H., Zhang, Q., Liu, W., Huang,C.,Wang,X.:Vad:Vectorizedscenerepresentationforefficientautonomous driving. In: ICCV. pp. 8340–8350 (2023) 4, 6, 10, 11

  21. [21]

    arXiv preprint arXiv:2508.09158 (2025) 4

    Jiao, S., Qian, K., Ye, H., Zhong, Y., Luo, Z., Jiang, S., Huang, Z., Fang, Y., Miao, J., Fu, Z., et al.: Evadrive: Evolutionary adversarial policy optimization for end-to-end autonomous driving. arXiv preprint arXiv:2508.09158 (2025) 4

  22. [22]

    arXiv preprint arXiv:2509.07996 (2025) 2, 4

    Kong, L., Yang, W., Mei, J., Liu, Y., Liang, A., Zhu, D., Lu, D., Yin, W., Hu, X., Jia, M., et al.: 3d and 4d world modeling: A survey. arXiv preprint arXiv:2509.07996 (2025) 2, 4

  23. [23]

    arXiv preprint arXiv:2409.18341 (2024) 3, 4, 6, 10, 11

    Li, P., Cui, D.: Navigation-guided sparse scene representation for end-to-end au- tonomous driving. arXiv preprint arXiv:2409.18341 (2024) 3, 4, 6, 10, 11

  24. [24]

    Enhancing end-to-end autonomous driving with latent world model.arXiv preprint arXiv:2406.08481, 2024

    Li, Y., Fan, L., He, J., Wang, Y., Chen, Y., Zhang, Z., Tan, T.: Enhancing end-to- end autonomous driving with latent world model. arXiv preprint arXiv:2406.08481 (2024) 2, 4, 6, 10, 11, 13

  25. [25]

    End-to-end driving with online tra- jectory evaluation via bev world model.arXiv preprint arXiv:2504.01941, 2025

    Li, Y., Wang, Y., Liu, Y., He, J., Fan, L., Zhang, Z.: End-to-end driving with online trajectory evaluation via bev world model. arXiv preprint arXiv:2504.01941 (2025) 4, 10, 11, 13

  26. [26]

    ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

    Li, Y., Xiong, K., Guo, X., Li, F., Yan, S., Xu, G., Zhou, L., Chen, L., Sun, H., Wang, B., et al.: Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving. arXiv preprint arXiv:2506.08052 (2025) 4

  27. [27]

    Drive-r1: Bridging reasoning and planning in vlms for autonomous driving with reinforcement learning.arXiv preprint arXiv:2506.18234,

    Li, Y., Tian, M., Zhu, D., Zhu, J., Lin, Z., Xiong, Z., Zhao, X.: Drive-r1: Bridg- ing reasoning and planning in vlms for autonomous driving with reinforcement learning. arXiv preprint arXiv:2506.18234 (2025) 4

  28. [28]

    Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

    Li, Z., Li, K., Wang, S., Lan, S., Yu, Z., Ji, Y., Li, Z., Zhu, Z., Kautz, J., Wu, Z., et al.: Hydra-mdp: End-to-end multimodal planning with multi-target hydra- distillation. arXiv preprint arXiv:2406.06978 (2024) 13

  29. [29]

    IEEE TPAMI (2024) 3

    Li, Z., Wang, W., Li, H., Xie, E., Sima, C., Lu, T., Yu, Q., Dai, J.: Bevformer: learningbird’s-eye-viewrepresentationfromlidar-cameraviaspatiotemporaltrans- formers. IEEE TPAMI (2024) 3

  30. [30]

    Li, Z., Yu, Z., Lan, S., Li, J., Kautz, J., Lu, T., Alvarez, J.M.: Is ego status all you need for open-loop end-to-end autonomous driving? In: CVPR (2024) 4, 10, 11

  31. [31]

    In: CVPR

    Liao, B., Chen, S., Yin, H., Jiang, B., Wang, C., Yan, S., Zhang, X., Li, X., Zhang, Y., Zhang, Q., et al.: Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. In: CVPR. pp. 12037–12047 (2025) 4, 11, 13

  32. [32]

    TITS22(1), 341–355 (2019) 4

    Lim, W., Lee, S., Sunwoo, M., Jo, K.: Hybrid trajectory planning for autonomous driving in on-road dynamic scenarios. TITS22(1), 341–355 (2019) 4

  33. [33]

    Flow Matching for Generative Modeling

    Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. arXiv preprint arXiv:2210.02747 (2022) 4

  34. [34]

    In: ROBIO

    Liu, J., Mao, X., Fang, Y., Zhu, D., Meng, M.Q.H.: A survey on deep-learning approaches for vehicle trajectory prediction in autonomous driving. In: ROBIO. pp. 978–985. IEEE (2021) 4

  35. [35]

    In: CVPR

    Liu, X., Wang, S., Li, W., Yang, R., Chen, J., Zhu, J.: Mgmap: Mask-guided learning for online vectorized hd map construction. In: CVPR. pp. 14812–14821 (2024) 3

  36. [36]

    In: CVPR

    Liu, X., Yang, R., Wang, S., Li, W., Chen, J., Zhu, J.: Uncertainty-instructed structure injection for generalizable hd map construction. In: CVPR. pp. 22359– 22368 (2025) 3 DynFlowDrive 17

  37. [37]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    Liu, X., Gong, C., Liu, Q.: Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003 (2022) 7

  38. [38]

    In: ICCV

    Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV. pp. 10012–10022 (2021) 11

  39. [39]

    Decoupled Weight Decay Regularization

    Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017) 10

  40. [40]

    In: ICLR (2024) 2

    Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: Sdxl: Improving latent diffusion models for high-resolution image synthesis. In: ICLR (2024) 2

  41. [41]

    In: CVPR

    Prakash, A., Chitta, K., Geiger, A.: Multi-modal fusion transformer for end-to-end autonomous driving. In: CVPR. pp. 7077–7087 (2021) 13

  42. [42]

    In: ECCV

    Song, H., Ding, W., Chen, Y., Shen, S., Wang, M.Y., Chen, Q.: Pip: Planning- informed trajectory prediction for autonomous driving. In: ECCV. pp. 598–614. Springer (2020) 4

  43. [43]

    In: CVPR

    Song, Z., Jia, C., Liu, L., Pan, H., Zhang, Y., Wang, J., Zhang, X., Xu, S., Yang, L., Luo, Y.: Don’t shake the wheel: Momentum-aware planning in end-to-end au- tonomous driving. In: CVPR. pp. 22432–22441 (2025) 11

  44. [44]

    In: ICRA

    Sun, W., Lin, X., Shi, Y., Zhang, C., Wu, H., Zheng, S.: Sparsedrive: End-to- end autonomous driving via sparse scene representation. In: ICRA. pp. 8795–8801. IEEE (2025) 4, 11

  45. [45]

    In: ICCV

    Wang, S., Liu, Y., Wang, T., Li, Y., Zhang, X.: Exploring object-centric temporal modeling for efficient multi-view 3d object detection. In: ICCV. pp. 3621–3631 (2023) 3

  46. [46]

    In: CVPR

    Wang, S., Yu, J., Li, W., Liu, W., Liu, X., Chen, J., Zhu, J.: Not all voxels are equal: Hardness-aware semantic scene completion with self-distillation. In: CVPR. pp. 14792–14801 (2024) 3

  47. [47]

    In: ECCV

    Wang, X., Zhu, Z., Huang, G., Chen, X., Zhu, J., Lu, J.: Drivedreamer: Towards real-world-drive world models for autonomous driving. In: ECCV. pp. 55–72 (2024) 2

  48. [48]

    arXiv preprint arXiv:2503.24381 (2025) 4

    Wang, Y., Huang, X., Sun, X., Yan, M., Xing, S., Tu, Z., Li, J.: Uniocc: A unified benchmark for occupancy forecasting and prediction in autonomous driving. arXiv preprint arXiv:2503.24381 (2025) 4

  49. [49]

    In: CVPR

    Wang, Y., He, J., Fan, L., Li, H., Chen, Y., Zhang, Z.: Driving into the future: Mul- tiview visual forecasting and planning with world model for autonomous driving. In: CVPR. pp. 14749–14759 (2024) 2, 4

  50. [50]

    In: CVPR

    Weng, X., Ivanovic, B., Wang, Y., Wang, Y., Pavone, M.: Para-drive: Parallelized architecture for real-time autonomous driving. In: CVPR. pp. 15449–15458 (2024) 2, 11, 13

  51. [51]

    In: ITSC

    Xin, L., Wang, P., Chan, C.Y., Chen, J., Li, S.E., Cheng, B.: Intention-aware long horizon trajectory prediction of surrounding vehicles using dual lstm networks. In: ITSC. pp. 1441–1446. IEEE (2018) 4

  52. [52]

    In: CVPR

    Xing, Z., Zhang, X., Hu, Y., Jiang, B., He, T., Zhang, Q., Long, X., Yin, W.: Goalflow: Goal-driven flow matching for multimodal trajectories generation in end- to-end autonomous driving. In: CVPR. pp. 1602–1611 (2025) 4

  53. [53]

    arXiv preprint arXiv:2511.20325 (2025) 4 18 X

    Yan, T., Tang, T., Gui, X., Li, Y., Zhesng, J., Huang, W., Kong, L., Han, W., Zhou, X., Zhang, X., et al.: Ad-r1: Closed-loop reinforcement learning for end-to-end autonomous driving with impartial world models. arXiv preprint arXiv:2511.20325 (2025) 4 18 X. Liu et al

  54. [54]

    In: CVPR

    Yang, J., Gao, S., Qiu, Y., Chen, L., Li, T., Dai, B., Chitta, K., Wu, P., Zeng, J., Luo, P., et al.: Generalized predictive model for autonomous driving. In: CVPR. pp. 14662–14672 (2024) 4

  55. [55]

    arXiv preprint arXiv:2512.19133 (2025) 2, 4

    Yang, P., Lu, B., Xia, Z., Han, C., Gao, Y., Zhang, T., Zhan, K., Lang, X., Zheng, Y., Zhang, Q.: Worldrft: Latent world model planning with reinforcement fine- tuning for autonomous driving. arXiv preprint arXiv:2512.19133 (2025) 2, 4

  56. [56]

    In: AAAI

    Yang, Y., Mei, J., Ma, Y., Du, S., Chen, W., Qian, Y., Feng, Y., Liu, Y.: Driving in the occupancy world: Vision-centric 4d occupancy forecasting and planning via world models for autonomous driving. In: AAAI. vol. 39, pp. 9327–9335 (2025) 2, 11

  57. [57]

    arXiv preprint arXiv:2408.03601 (2024) 13

    Yuan, C., Zhang, Z., Sun, J., Sun, S., Huang, Z., Lee, C.D.W., Li, D., Han, Y., Wong, A., Tee, K.P., et al.: Drama: An efficient end-to-end motion planner for autonomous driving with mamba. arXiv preprint arXiv:2408.03601 (2024) 13

  58. [58]

    arXiv preprint arXiv:2506.24113 (2025) 2, 4

    Zhang, K., Tang, Z., Hu, X., Pan, X., Guo, X., Liu, Y., Huang, J., Yuan, L., Zhang, Q., Long, X.X., et al.: Epona: Autoregressive diffusion world model for autonomous driving. arXiv preprint arXiv:2506.24113 (2025) 2, 4

  59. [59]

    In: CVPR

    Zhao, G., Ni, C., Wang, X., Zhu, Z., Zhang, X., Wang, Y., Huang, G., Chen, X., Wang, B., Zhang, Y., et al.: Drivedreamer4d: World models are effective data machines for 4d driving scene representation. In: CVPR. pp. 12015–12026 (2025) 4

  60. [60]

    In: ECCV

    Zheng,W.,Chen,W.,Huang,Y.,Zhang,B.,Duan,Y.,Lu,J.:Occworld:Learninga 3d occupancy world model for autonomous driving. In: ECCV. pp. 55–72. Springer (2024) 2, 4

  61. [61]

    In: ECCV

    Zheng, W., Song, R., Guo, X., Zhang, C., Chen, L.: Genad: Generative end-to-end autonomous driving. In: ECCV. pp. 87–104. Springer (2024) 4, 11

  62. [62]

    In: ICCV

    Zheng, Y., Yang, P., Xing, Z., Zhang, Q., Zheng, Y., Gao, Y., Li, P., Zhang, T., Xia, Z., Jia, P., et al.: World4drive: End-to-end autonomous driving via intention- aware physical latent world model. In: ICCV. pp. 28632–28642 (2025) 2, 4, 9, 11, 13

  63. [63]

    arXiv preprint arXiv:2503.23463 (2025) 4

    Zhou, X., Han, X., Yang, F., Ma, Y., Tresp, V., Knoll, A.: Opendrivevla: Towards end-to-end autonomous driving with large vision language action model. arXiv preprint arXiv:2503.23463 (2025) 4