pith. machine review for the scientific record.

arxiv: 2604.07378 · v1 · submitted 2026-04-08 · 💻 cs.RO

Recognition: 2 theorem links · Lean Theorem

Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles

Chengkai Xu, Jian Sun, Jiaqi Liu, Peng Hang, Yicheng Guo

Pith reviewed 2026-05-10 18:50 UTC · model grok-4.3

classification 💻 cs.RO
keywords adversarial evaluation · autonomous vehicles · closed-loop training · diffusion models · safety-critical scenarios · nuScenes · nuPlan

The pith

Adversarial scenario generation for autonomous vehicles can be reframed as a closed-loop evolutionary curriculum that discovers more failure cases while staying close to real data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Evaluation as Evolution (E²), a framework that converts static adversarial stress testing into an adaptive process in which discovered failures are recycled to refine vehicle policies. It models scenario synthesis as transport-regularized sparse control over a learned reverse-time SDE prior, using topology-driven support selection to identify the critical interacting agents and Topological Anchoring to stabilize generation. This closed loop targets the scarcity of safety-critical events in existing datasets. On nuScenes and nuPlan, E² discovers more collisions than the strongest baselines and yields robustness gains when the discovered boundary cases are used to fine-tune policies.

Core claim

By casting adversarial generation as transport-regularized sparse control on a reverse-time SDE prior, stabilized by Topological Anchoring and topology-driven agent selection, the method discovers more collision failures than open-loop baselines while constraining deviations from real distributions, allowing those cases to be fed back for policy improvement.

What carries the argument

The Evaluation as Evolution (E²) framework, which performs transport-regularized sparse control over a learned reverse-time SDE prior and applies Topological Anchoring to stabilize high-dimensional scenario generation.
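The phrase "transport-regularized sparse control over a learned reverse-time SDE prior" can be made concrete with a toy. The sketch below is not the authors' method. It assumes a variance-preserving forward SDE with constant β and Gaussian data per coordinate, so the score is known in closed form instead of learned, and it stands in for transport regularization with a quadratic control penalty λ‖u‖², whose per-step minimizer is u = ∇J/(2λ). Only coordinates flagged in `mask` (the "selected" agents) receive control. The function name, the objective `grad_objective`, and all parameter values are hypothetical.

```python
import numpy as np


def sparse_controlled_reverse_sde(x_T, mask, grad_objective, beta=1.0, data_var=0.25,
                                  lam=1.0, T=1.0, n_steps=200, seed=0):
    """Euler-Maruyama integration of a toy reverse-time VP SDE with sparse control.

    Toy assumptions (hypothetical, not the paper's model):
      * forward SDE dx = -0.5*beta*x dt + sqrt(beta) dW with data ~ N(0, data_var)
        per coordinate, so the score is analytic: grad log p_t(x) = -x / v(t);
      * control u = grad_objective(x) / (2*lam), the per-step minimizer of
        -grad_J . u + lam*||u||^2, applied only where mask is True.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x_T, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    dt = T / n_steps
    for k in range(n_steps):
        t = T - k * dt
        v = data_var * np.exp(-beta * t) + (1.0 - np.exp(-beta * t))  # marginal variance at t
        score = -x / v
        drift = -0.5 * beta * x - beta * score  # f(x, t) - g(t)^2 * score
        u = np.where(mask, grad_objective(x) / (2.0 * lam), 0.0)
        noise = rng.standard_normal(x.shape)
        # One step backward in time: x_{t-dt} = x_t - drift*dt + u*dt + g*sqrt(dt)*z
        x = x - drift * dt + u * dt + np.sqrt(beta * dt) * noise
    return x


def push(x):
    """Gradient of the hypothetical objective J(x) = sum(x): push every coordinate up."""
    return np.ones_like(x)


# Usage: control agent 0 only; agents 1 and 2 follow the uncontrolled prior.
x_T = np.array([0.3, -0.2, 0.5])
controlled = sparse_controlled_reverse_sde(x_T, [True, False, False], push)
free = sparse_controlled_reverse_sde(x_T, [False, False, False], push)
# With the same seed, the unselected coordinates match the uncontrolled run exactly.
```

Because the coordinates are independent in this toy and the noise stream is shared, any difference between the two runs comes only from the control on the selected coordinate, which is the "sparse" part of the idea.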

If this is right

  • Collision failure discovery rates rise by 9.01% on nuScenes and up to 21.43% on nuPlan relative to the strongest baselines.
  • Generated scenarios maintain low invalidity and high realism scores.
  • Recycling the discovered boundary cases into closed-loop policy fine-tuning produces measurable robustness gains.
  • Adversarial evaluation shifts from a one-time post-hoc check to an ongoing part of the training loop.
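The first bullet does not say whether "9.01%" and "21.43%" are percentage-point differences in the discovery rate or improvements relative to the baseline rate, and the two readings can differ substantially. A minimal sketch of both conventions, using made-up rates rather than values from the paper:

```python
def absolute_gain_pp(new_rate: float, base_rate: float) -> float:
    """Gain in percentage points; rates are fractions in [0, 1]."""
    return 100.0 * (new_rate - base_rate)


def relative_gain_pct(new_rate: float, base_rate: float) -> float:
    """Gain relative to the baseline rate, in percent."""
    if base_rate <= 0.0:
        raise ValueError("relative gain is undefined for a zero baseline rate")
    return 100.0 * (new_rate - base_rate) / base_rate


# Hypothetical rates: the baseline finds collisions in 40% of scenarios, the new method in 49%.
print(absolute_gain_pp(0.49, 0.40))   # about 9 percentage points
print(relative_gain_pct(0.49, 0.40))  # about 22.5% relative improvement
```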

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same transport-regularized control mechanism could be applied to other sequential decision domains where rare failure modes limit robustness.
  • If the Topological Anchoring step scales to higher-dimensional state spaces, similar curricula might be constructed without hand-crafted scenario templates.
  • Repeated cycles of discovery and retraining could reduce reliance on static datasets for safety validation.

Load-bearing premise

The adversarial scenarios generated from the reverse-time SDE prior stay close enough to real traffic distributions to supply useful training signals without introducing invalid or biased cases.

What would settle it

A controlled experiment showing that policies fine-tuned on E²-generated boundary cases exhibit no reduction in collision rates during closed-loop simulation or real-world replay compared with policies fine-tuned on the strongest baseline adversarial sets.

Figures

Figures reproduced from arXiv: 2604.07378 by Chengkai Xu, Jian Sun, Jiaqi Liu, Peng Hang, Yicheng Guo.

Figure 1: From decoupled evaluation to closed-loop evolution. Top: Conventional pipelines separate training from evaluation, so discovered failures remain disconnected from policy learning. Bottom: E² couples an Adversarial Synthesizer with an Ego Policy to synthesize adversarial interactions for closed-loop simulation and recycle outcomes as learning signals to update the ego policy.
Figure 2: Overview of Evaluation as Evolution. Left (Adversarial Synthesizer): given scene context, the Synthesizer builds a risk-weighted interaction graph, selects an intervention-critical Top-K adversary set via topological bifurcation analysis, and synthesizes feasible, realistic adversarial trajectories by transport-regularized sparse control over a reverse-time SDE prior with Topological Anchoring. Right (Ego …
Figure 3: Structure-aware sparse control via scene interaction graph construction. The Synthesizer builds a TTC-based risk interaction matrix, converts it into a weighted interaction graph, discovers higher-order risk groups (cliques), and selects top cliques across time to score and activate a small subset of adversarial agents for targeted control. … where $\sigma(z) := (1 + \exp(-z))^{-1}$, $\tau_{\max} > 0$ caps the TTC horizon, and…
Figure 4: Qualitative controllability of failure timing and type under increasing adversarial intensity. Closed-loop rollouts for the lane-graph (LG) ego policy under a fixed random seed (seed=50), with η ∈ {0, 1, 2, 3} from bottom to top. Efficacy of Structure-Aware Selection. Table II validates the hypothesis that the selection of controlled agents is as critical as the control mechanism itself. Randomly selectin…
Figure 5: Qualitative interaction skeleton. TTC-based interaction matrices and the activated ego-adversary chain for the same scene. Left: Topological Anchoring disabled. Right: Topological Anchoring enabled. D. Generalization Across Ego Policies. To assess generalization, we evaluate E² against three distinct ego policies, specifically rule-based (LG), physics-based (IDM), and hybrid learned (BITS), treating each a…
Figure 6: Policy improvement from adversarial closed-loop fine-tuning. Bars show the percentage gain over the original lane-graph policy on a tuning subset and a disjoint test subset of the nuScenes validation split, evaluated under matched and increased adversarial intensity (η = 1.0, 2.0). … of magnitude. This observation implies that while BITS effectively manages low-speed interactions, it remains brittle in high…
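The Figure 3 pipeline (TTC-based risk matrix, weighted interaction graph, risk cliques, activation of a small agent subset) can be sketched for a single frame. The extracted caption keeps only σ and τ_max from the risk formula, so the mapping w = 2σ(κ(τ_max − min(TTC, τ_max))) − 1 used here is an assumption, as are constant-velocity disk agents, the edge threshold, and scoring agents by the total risk of the ego-containing cliques they belong to. The paper aggregates cliques "across time"; this sketch looks at one frame only, and every name and parameter is hypothetical.

```python
import itertools

import numpy as np


def pairwise_ttc(pos, vel, radius=1.0):
    """Constant-velocity time-to-collision between equal disks (inf if they never touch)."""
    n = len(pos)
    ttc = np.full((n, n), np.inf)
    r2 = (2.0 * radius) ** 2
    for i, j in itertools.combinations(range(n), 2):
        dp, dv = pos[j] - pos[i], vel[j] - vel[i]
        a, b, c = dv @ dv, 2.0 * (dp @ dv), dp @ dp - r2
        disc = b * b - 4.0 * a * c
        if c <= 0.0:
            t = 0.0  # already overlapping
        elif a == 0.0 or disc < 0.0:
            t = np.inf  # same velocity, or paths never get close enough
        else:
            t = (-b - np.sqrt(disc)) / (2.0 * a)  # first contact time
            t = t if t >= 0.0 else np.inf  # contact would lie in the past
        ttc[i, j] = ttc[j, i] = t
    return ttc


def risk_matrix(ttc, tau_max=5.0, kappa=1.0):
    """Hypothetical risk weight: 0 when TTC >= tau_max, approaching 1 as TTC -> 0."""
    clipped = np.minimum(ttc, tau_max)
    w = 2.0 / (1.0 + np.exp(-kappa * (tau_max - clipped))) - 1.0  # 2*sigma(.) - 1
    np.fill_diagonal(w, 0.0)
    return w


def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques in a boolean adjacency matrix."""
    nbrs = [{int(k) for k in np.flatnonzero(row)} for row in adj]
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(len(adj))), set())
    return cliques


def select_top_k(pos, vel, ego=0, k=2, tau_max=5.0, kappa=1.0, threshold=0.1):
    """Score non-ego agents by the total risk of the ego-containing cliques they join."""
    w = risk_matrix(pairwise_ttc(pos, vel), tau_max, kappa)
    score = np.zeros(len(pos))
    for clique in maximal_cliques(w > threshold):
        if ego not in clique or len(clique) < 2:
            continue
        s = sum(w[i, j] for i, j in itertools.combinations(sorted(clique), 2))
        for agent in clique - {ego}:
            score[agent] += s
    ranked = [int(a) for a in np.argsort(-score) if a != ego and score[a] > 0.0]
    return ranked[:k], score


# Usage: ego (0) heads +x; agent 1 is oncoming, agent 2 drives in a far parallel lane,
# agent 3 closes in from behind. Agents 1 and 3 form a risk clique with the ego.
pos = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 50.0], [-20.0, 0.0]])
vel = np.array([[10.0, 0.0], [-10.0, 0.0], [10.0, 0.0], [15.0, 0.0]])
selected, score = select_top_k(pos, vel, ego=0, k=2)  # agent 2 is never selected
```

Scoring through cliques rather than single edges rewards agents that are mutually entangled with the ego and with each other, which loosely mirrors the caption's "higher-order risk groups"; a per-edge ranking would be the simpler alternative.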
Original abstract

Autonomous vehicles in interactive traffic environments are often limited by the scarcity of safety-critical tail events in static datasets, which biases learned policies toward average-case behaviors and reduces robustness. Existing evaluation methods attempt to address this through adversarial stress testing, but are predominantly open-loop and post-hoc, making it difficult to incorporate discovered failures back into the training process. We introduce Evaluation as Evolution ($E^2$), a closed-loop framework that transforms adversarial generation from a static validation step into an adaptive evolutionary curriculum. Specifically, $E^2$ formulates adversarial scenario synthesis as transport-regularized sparse control over a learned reverse-time SDE prior. To make this high-dimensional generation tractable, we utilize topology-driven support selection to identify critical interacting agents, and introduce Topological Anchoring to stabilize the process. This approach enables the targeted discovery of failure cases while strictly constraining deviations from realistic data distributions. Empirically, $E^2$ improves collision failure discovery by 9.01% on the nuScenes dataset and up to 21.43% on the nuPlan dataset over the strongest baselines, while maintaining low invalidity and high realism. It further yields substantial robustness gains when the resulting boundary cases are recycled for closed-loop policy fine-tuning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Evaluation as Evolution (E²), a closed-loop framework that reformulates adversarial scenario synthesis for autonomous vehicles as an adaptive evolutionary curriculum. It models generation via transport-regularized sparse control over a learned reverse-time SDE prior, augmented by topology-driven support selection and Topological Anchoring to ensure tractability and distributional closeness. The approach discovers more collision failures than open-loop baselines and recycles the resulting boundary cases to fine-tune policies for improved robustness. It reports collision-discovery gains of 9.01% on nuScenes and up to 21.43% on nuPlan over the strongest baselines while preserving low invalidity and high realism.

Significance. If the empirical results hold under rigorous validation, the work offers a meaningful advance by closing the loop between adversarial evaluation and policy training, directly addressing the scarcity of tail events in static AV datasets. The combination of diffusion-based generation with explicit realism constraints and iterative recycling provides a principled way to evolve curricula, which could lead to more robust closed-loop controllers in interactive traffic settings.

major comments (2)
  1. [Abstract / Results] The concrete percentage improvements in collision failure discovery (9.01% on nuScenes, 21.43% on nuPlan) are presented without accompanying details on baseline implementations, the number of evaluation runs, variance estimates, or statistical significance tests. This makes it difficult to assess whether the gains are robust or sensitive to implementation choices in the transport-regularized control and Topological Anchoring components.
  2. [Method] The claim that Topological Anchoring and topology-driven support selection keep generated scenarios sufficiently close to the data distribution (to avoid harmful biases) is central to the utility of the closed-loop curriculum, yet the manuscript provides no quantitative validation (e.g., distributional distance metrics or human realism ratings) beyond the stated “low invalidity and high realism.”
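For the variance and significance details requested in comment 1, one standard option (not necessarily what the authors will use) is a paired percentile bootstrap over scenarios: both methods are run on the same scenario set, and resampling scenarios with replacement gives an interval for the difference in collision-discovery rates. A minimal sketch, assuming per-scenario binary outcomes; the function name and defaults are hypothetical:

```python
import numpy as np


def paired_bootstrap_ci(hits_a, hits_b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for rate(A) - rate(B) on the same scenarios.

    hits_a, hits_b: per-scenario outcomes (1 = collision found), aligned by scenario.
    Returns (point_estimate, (lower, upper)).
    """
    a = np.asarray(hits_a, dtype=float)
    b = np.asarray(hits_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("outcomes must be paired per scenario")
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(a), size=(n_boot, len(a)))  # resample scenarios with replacement
    diffs = a[idx].mean(axis=1) - b[idx].mean(axis=1)      # same indices for both methods
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(a.mean() - b.mean()), (float(lower), float(upper))


# Usage with synthetic outcomes (not data from the paper).
rng = np.random.default_rng(3)
hits_new = rng.random(200) < 0.6
hits_base = rng.random(200) < 0.4
diff, (lo, hi) = paired_bootstrap_ci(hits_new, hits_base)
```

Pairing matters here: resampling the two methods with the same scenario indices removes scenario-difficulty variance from the comparison, which an unpaired test would leave in.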
minor comments (2)
  1. [Abstract / Introduction] The abstract and introduction use several novel terms (transport-regularized sparse control, Topological Anchoring, topology-driven support selection) without a concise glossary or forward reference to their formal definitions.
  2. [Figures / Tables] Figure captions and experimental tables should explicitly list the exact baselines compared (e.g., which open-loop adversarial methods) and give the precise definition of the “collision failure discovery” metric.
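As an illustration of what such a definition might look like (an assumption, not the paper's metric), a collision failure can be counted whenever the ego footprint overlaps another agent's footprint at any time step of a closed-loop rollout, and the discovery rate is then the fraction of rollouts with at least one such event. The sketch treats every footprint as a disk with a hypothetical shared `radius`; simulators often use oriented bounding boxes instead.

```python
import numpy as np


def rollout_collides(ego_traj, agent_trajs, radius=1.0):
    """True if the ego disk overlaps any other agent's disk at any time step.

    ego_traj: (T, 2) ego positions; agent_trajs: (N, T, 2) positions of the other agents.
    All footprints are disks of the same hypothetical radius.
    """
    ego_traj = np.asarray(ego_traj, dtype=float)
    agent_trajs = np.asarray(agent_trajs, dtype=float)
    gaps = np.linalg.norm(agent_trajs - ego_traj[None, :, :], axis=-1)  # (N, T) center distances
    return bool((gaps < 2.0 * radius).any())


def collision_discovery_rate(rollouts, radius=1.0):
    """Fraction of (ego_traj, agent_trajs) rollouts containing at least one ego collision."""
    hits = [rollout_collides(ego, agents, radius) for ego, agents in rollouts]
    return float(np.mean(hits)) if hits else 0.0


# Usage: a stationary ego, one agent that drives through it, one that stays far away.
ego = np.zeros((3, 2))
crossing = np.array([[[-4.0, 0.0], [0.0, 0.5], [4.0, 0.0]]])
far = np.full((1, 3, 2), 10.0)
print(collision_discovery_rate([(ego, crossing), (ego, far)]))  # 0.5
```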

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation of minor revision. We address each major comment below and will revise the manuscript accordingly to improve clarity and substantiation of our claims.

Point-by-point responses
  1. Referee: [Abstract / Results] The concrete percentage improvements in collision failure discovery (9.01% on nuScenes, 21.43% on nuPlan) are presented without accompanying details on baseline implementations, the number of evaluation runs, variance estimates, or statistical significance tests. This makes it difficult to assess whether the gains are robust or sensitive to implementation choices in the transport-regularized control and Topological Anchoring components.

    Authors: We agree that additional experimental details are necessary for rigorous assessment. In the revised manuscript, we will expand the Results and Experimental Setup sections to explicitly describe the baseline implementations, the number of independent evaluation runs, variance estimates (e.g., standard deviations across runs), and any statistical significance testing performed. This will clarify the robustness of the reported improvements. revision: yes

  2. Referee: [Method] The claim that Topological Anchoring and topology-driven support selection keep generated scenarios sufficiently close to the data distribution (to avoid harmful biases) is central to the utility of the closed-loop curriculum, yet the manuscript provides no quantitative validation (e.g., distributional distance metrics or human realism ratings) beyond the stated “low invalidity and high realism.”

    Authors: We acknowledge that the current presentation relies primarily on invalidity rates and qualitative realism indicators. To strengthen the validation of distributional closeness, we will add quantitative metrics in the revised Method and Results sections, such as explicit distributional distance measures (e.g., maximum mean discrepancy or Wasserstein distance) between generated scenarios and the original dataset, computed over the same evaluation runs. revision: yes
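The two distances named in this response can be computed on per-scenario feature vectors; which features to use (speeds, accelerations, minimum gaps, and so on) is an assumption left open here. The sketch below uses only NumPy: an unbiased squared-MMD estimate with a Gaussian kernel of hypothetical `bandwidth`, and the closed-form 1D Wasserstein-1 distance for equal-size samples. For the 1D case, `scipy.stats.wasserstein_distance` is an off-the-shelf alternative that also handles unequal sizes.

```python
import numpy as np


def rbf_mmd2(x, y, bandwidth=1.0):
    """Unbiased squared MMD between samples x (n, d) and y (m, d) with a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def kernel(a, b):
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

    n, m = len(x), len(y)
    kxx, kyy, kxy = kernel(x, x), kernel(y, y), kernel(x, y)
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))  # drop the i == j terms
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return float(term_x + term_y - 2.0 * kxy.mean())


def wasserstein_1d(u, v):
    """W1 distance between two 1D empirical samples of equal size (mean gap of sorted values)."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("this closed form assumes equal sample sizes")
    return float(np.abs(u - v).mean())


# Usage with synthetic features (not data from the paper).
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 2))
close = rng.normal(size=(200, 2))
shifted = rng.normal(size=(200, 2)) + 2.0
print(rbf_mmd2(real, close), rbf_mmd2(real, shifted))   # the second value is much larger
print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

The unbiased MMD estimate can dip slightly below zero when the two samples come from the same distribution, so small negative values are expected rather than a bug.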

Circularity Check

0 steps flagged

No significant circularity; empirical claims are measured outcomes

full rationale

The paper describes a closed-loop curriculum method that formulates adversarial generation as transport-regularized sparse control over a learned reverse-time SDE prior, stabilized by topology-driven support selection and Topological Anchoring. All load-bearing claims consist of empirical measurements: collision failure discovery rates (9.01% on nuScenes, up to 21.43% on nuPlan) and downstream robustness gains when boundary cases are recycled for policy fine-tuning. These quantities are presented as direct experimental results against baselines on fixed external datasets, subject to explicit constraints on invalidity and realism. No equations, parameter fits, or self-citations are shown to reduce the reported gains to the inputs by construction; the framework remains self-contained with independent content that can be falsified by replication on the same benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

Framework rests on standard properties of diffusion SDEs and domain assumptions about scenario realism; introduces new procedural elements without external validation.

axioms (2)
  • standard math: Existence and properties of reverse-time stochastic differential equations as priors for scenario generation
    Invoked to model the diffusion process for generating adversarial traffic scenes.
  • domain assumption: Generated scenarios remain sufficiently realistic under transport regularization and topological constraints
    Required to ensure low invalidity and high realism as claimed in results.
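The first axiom can be stated concretely. Score-based diffusion models rely on the standard result that a forward noising SDE has a reverse-time counterpart driven by the score $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$, which a learned network $s_\theta$ approximates. The third line is only an illustration of how a sparse, quadratically penalized control might enter the reverse dynamics, written in reverse time $\tau = T - t$; the selection mask $\mathbf{M}$, control $\mathbf{u}_\tau$, adversarial objective $J$, and weight $\lambda$ are assumed notation, not the paper's.

```latex
% forward (noising) SDE on the scene state x, t in [0, T]
\mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}

% reverse-time SDE, integrated from t = T down to t = 0
\mathrm{d}\mathbf{x} = \big[\mathbf{f}(\mathbf{x}, t) - g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}

% hypothetical sparse control in reverse time tau = T - t (d tau = -dt), score replaced by s_theta
\mathrm{d}\mathbf{x} = \big[-\mathbf{f}(\mathbf{x}, T-\tau) + g(T-\tau)^2\, s_\theta(\mathbf{x}, T-\tau) + \mathbf{M}\,\mathbf{u}_\tau\big]\,\mathrm{d}\tau + g(T-\tau)\,\mathrm{d}\mathbf{w}_\tau,
\qquad \text{cost} = -J(\mathbf{x}_{\tau = T}) + \lambda \int_0^T \lVert \mathbf{u}_\tau \rVert^2\,\mathrm{d}\tau
```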
invented entities (2)
  • Topological Anchoring (no independent evidence)
    purpose: Stabilize high-dimensional adversarial generation process
    New stabilization technique introduced to make the method tractable.
  • Topology-driven support selection (no independent evidence)
    purpose: Identify critical interacting agents in scenarios
    New selection mechanism for focusing control on relevant parts of the scene.

pith-pipeline@v0.9.0 · 5529 in / 1395 out tokens · 48401 ms · 2026-05-10T18:50:55.993459+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

45 extracted references · 12 canonical work pages

  [1] Y. Liu, W. Chen, Y. Bai, X. Liang, G. Li, W. Gao, and L. Lin, “Aligning cyber space with physical world: A comprehensive survey on embodied ai,” IEEE/ASME Transactions on Mechatronics, 2025.

  [2] X. Han, S. Chen, Z. Fu, Z. Feng, L. Fan, D. An, C. Wang, L. Guo, W. Meng, X. Zhang et al., “Multimodal fusion and vision-language models: A survey for robot vision,” Information Fusion, p. 103652, 2025.

  [3] J. Liu, K. Xiong, P. Xia, Y. Zhou, H. Ji, L. Feng, S. Han, M. Ding, and H. Yao, “Agent0-vl: Exploring self-evolving agent for tool-integrated vision-language reasoning,” arXiv preprint arXiv:2511.19900, 2025.

  [4] J. Wu, C. Huang, H. Huang, C. Lv, Y. Wang, and F.-Y. Wang, “Recent advances in reinforcement learning-based autonomous driving behavior planning: A survey,” Transportation Research Part C: Emerging Technologies, vol. 164, p. 104654, 2024.

  [5] Q. Liu, H. Huang, S. Zhao, L. Shi, S. Ahn, and X. Li, “Risknet: interaction-aware risk forecasting for autonomous driving in long-tail scenarios,” Transportation Research Part E: Logistics and Transportation Review, vol. 205, p. 104478, 2026.

  [6] X. Jia, Z. Yang, Q. Li, Z. Zhang, and J. Yan, “Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving,” Advances in Neural Information Processing Systems, vol. 37, pp. 819–844, 2024.

  [7] Y. Xiao, L. Li, S. Yan, X. Liu, S. Peng, Y. Wei, X. Zhou, and B. Kang, “Spatialtree: How spatial abilities branch out in mllms,” arXiv preprint arXiv:2512.20617, 2025.

  [8] R. Yang, H. Chen, J. Zhang, M. Zhao, C. Qian, K. Wang, Q. Wang, T. V. Koripella, M. Movahedi, M. Li et al., “Embodiedbench: Comprehensive benchmarking multi-modal large language models for vision-driven embodied agents,” arXiv preprint arXiv:2502.09560, 2025.

  [9] M. Biagiola and P. Tonella, “Boundary state generation for testing and improvement of autonomous driving systems,” IEEE Transactions on Software Engineering, vol. 50, no. 8, pp. 2040–2053, 2024.

  [10] Q. Cui, X. Zhang, Q. Bao, and Q. Liao, “Elucidating the solution space of extended reverse-time sde for diffusion models,” in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025, pp. 243–252.

  [11] W. Ljungbergh, A. Tonderski, J. Johnander, H. Caesar, K. Åström, M. Felsberg, and C. Petersson, “Neuroncap: Photorealistic closed-loop safety testing for autonomous driving,” in European Conference on Computer Vision. Springer, 2024, pp. 161–177.

  [12] X. Long, Q. Zhao, K. Zhang, Z. Zhang, D. Wang, Y. Liu, Z. Shu, Y. Lu, S. Wang, X. Wei et al., “A survey: Learning embodied intelligence from physical simulators and world models,” arXiv preprint arXiv:2507.00917, 2025.

  [13] Y. Guo, C. Xu, J. Liu, H. Zhang, P. Hang, and J. Sun, “Interactive adversarial testing of autonomous vehicles with adjustable confrontation intensity,” arXiv preprint arXiv:2507.21814, 2025.

  [14] A. Kuznietsov, B. Gyevnar, C. Wang, S. Peters, and S. V. Albrecht, “Explainable ai for safe and trustworthy autonomous driving: A systematic review,” IEEE Transactions on Intelligent Transportation Systems, 2024.

  [15] X. Liu, H. Huang, J. Bian, R. Zhou, Z. Wei, and H. Zhou, “Generating intersection pre-crash trajectories for autonomous driving safety testing using transformer time-series generative adversarial networks,” Engineering Applications of Artificial Intelligence, vol. 160, p. 111995, 2025.

  [16] J. Liu, P. Hang, X. Zhao, J. Wang, and J. Sun, “Ddm-lag: A diffusion-based decision-making model for autonomous vehicles with lagrangian safety enhancement,” IEEE Transactions on Artificial Intelligence, 2024.

  [17] B. Liao, S. Chen, H. Yin, B. Jiang, C. Wang, S. Yan, X. Zhang, X. Li, Y. Zhang, Q. Zhang et al., “Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 12037–12047.

  [18] Y. Wang, S. Xing, C. Can, R. Li, H. Hua, K. Tian, Z. Mo, X. Gao, K. Wu, S. Zhou et al., “Generative ai for autonomous driving: Frontiers and opportunities,” arXiv preprint arXiv:2505.08854, 2025.

  [19] S. Zeng, X. Chang, M. Xie, X. Liu, Y. Bai, Z. Pan, M. Xu, X. Wei, and N. Guo, “Futuresightdrive: Thinking visually with spatio-temporal cot for autonomous driving,” arXiv preprint arXiv:2505.17685, 2025.

  [20] C. Xu, Y. Cui, J. Liu, C. Qin, G. Zhang, X. Dong, S. Fang, Y. Guo, P. Hang, and J. Sun, “A survey on end-to-end autonomous driving training from the perspectives of data, strategy, and platform,” Authorea Preprints, 2025.

  [21] E. Pronovost, M. R. Ganesina, N. Hendy, Z. Wang, A. Morales, K. Wang, and N. Roy, “Scenario diffusion: Controllable driving scenario generation with diffusion,” Advances in Neural Information Processing Systems, vol. 36, pp. 68873–68894, 2023.

  [22] H. Ran, V. Guizilini, and Y. Wang, “Towards realistic scene generation with lidar diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 14738–14748.

  [23] L. Rowe, R. Girgis, A. Gosselin, L. Paull, C. Pal, and F. Heide, “Scenario dreamer: Vectorized latent diffusion for generating driving simulation environments,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 17207–17218.

  [24] X. Li, Y. Zhang, and X. Ye, “Drivingdiffusion: layout-guided multi-view driving scenarios video generation with latent diffusion model,” in European Conference on Computer Vision. Springer, 2024, pp. 469–485.

  [25] H. Cao, C. Tan, Z. Gao, Y. Xu, G. Chen, P.-A. Heng, and S. Z. Li, “A survey on generative diffusion models,” IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 7, pp. 2814–2830, 2024.

  [26] D. Zhu, Q. Bu, Z. Zhu, Y. Zhang, and Z. Wang, “Advancing autonomy through lifelong learning: a survey of autonomous intelligent systems,” Frontiers in Neurorobotics, vol. 18, p. 1385778, 2024.

  [27] Y. Guo, J. Liu, R. Yu, P. Hang, and J. Sun, “Mappo-pis: A multi-agent proximal policy optimization method with prior intent sharing for cavs’ cooperative decision-making,” in European Conference on Computer Vision. Springer, 2024, pp. 244–263.

  [28] P. Xia, K. Zeng, J. Liu, C. Qin, F. Wu, Y. Zhou, C. Xiong, and H. Yao, “Agent0: Unleashing self-evolving agents from zero data via tool-integrated reasoning,” arXiv preprint arXiv:2511.16043, 2025.

  [29] H.-a. Gao, J. Geng, W. Hua, M. Hu, X. Juan, H. Liu, S. Liu, J. Qiu, X. Qi, Y. Wu et al., “A survey of self-evolving agents: On path to artificial super intelligence,” arXiv preprint arXiv:2507.21046, 2025.

  [30] Y. Meng, Z. Bing, X. Yao, K. Chen, K. Huang, Y. Gao, F. Sun, and A. Knoll, “Preserving and combining knowledge in robotic lifelong reinforcement learning,” Nature Machine Intelligence, pp. 1–14, 2025.

  [31] H. Fu, D. Zhang, Z. Zhao, J. Cui, D. Liang, C. Zhang, D. Zhang, H. Xie, B. Wang, and X. Bai, “Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025.

  [32] H. Liu, T. Li, H. Yang, L. Chen, C. Wang, K. Guo, H. Tian, H. Li, H. Li, and C. Lv, “Reinforced refinement with self-aware expansion for end-to-end autonomous driving,” arXiv preprint arXiv:2506.09800, 2025.

  [33] S. Xing, C. Qian, Y. Wang, H. Hua, K. Tian, Y. Zhou, and Z. Tu, “Openemma: Open-source multimodal model for end-to-end autonomous driving,” in Proceedings of the Winter Conference on Applications of Computer Vision, 2025, pp. 1001–1009.

  [34] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.

  [35] D. Xu, Y. Chen, B. Ivanovic, and M. Pavone, “Bits: Bi-level imitation for traffic simulation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 2929–2936.

  [36] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11621–11631.

  [37] H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari, “nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles,” arXiv preprint arXiv:2106.11810, 2021.

  [38] D. Rempe, J. Philion, L. J. Guibas, S. Fidler, and O. Litany, “Generating useful accident-prone driving scenarios via a learned traffic prior,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17305–17315.

  [39] M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in empirical observations and microscopic simulations,” Physical Review E, vol. 62, no. 2, p. 1805, 2000.

  [40] Z. Zhong, D. Rempe, D. Xu, Y. Chen, S. Veer, T. Che, B. Ray, and M. Pavone, “Guided conditional diffusion for controllable traffic simulation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 3560–3566.

  [41] C. Xu, A. Petiushko, D. Zhao, and B. Li, “Diffscene: Diffusion-based safety-critical scenario generation for autonomous vehicles,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 8, 2025, pp. 8797–8805.

  [42] H. Lin, X. Huang, T. Phan, D. Hayden, H. Zhang, D. Zhao, S. Srinivasa, E. Wolff, and H. Chen, “Causal composition diffusion model for closed-loop traffic generation,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 27542–27552.

  [43] W.-J. Chang, F. Pittaluga, M. Tomizuka, W. Zhan, and M. Chandraker, “Safe-sim: Safety-critical closed-loop traffic simulation with diffusion-controllable adversaries,” in European Conference on Computer Vision. Springer, 2024, pp. 242–258.

  [44] S. Tan, B. Ivanovic, X. Weng, M. Pavone, and P. Kraehenbuehl, “Language conditioned traffic generation,” arXiv preprint arXiv:2307.07947, 2023.

  [45] W. Ding, Y. Cao, D. Zhao, C. Xiao, and M. Pavone, “Realgen: Retrieval augmented generation for controllable traffic scenarios,” in European Conference on Computer Vision. Springer, 2024, pp. 93–110.