arxiv: 2604.07378 · v1 · submitted 2026-04-08 · 💻 cs.RO

Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles

Chengkai Xu, Jian Sun, Jiaqi Liu, Peng Hang, Yicheng Guo

Pith reviewed 2026-05-10 18:50 UTC · model grok-4.3

classification 💻 cs.RO

keywords: adversarial evaluation, autonomous vehicles, closed-loop training, diffusion models, safety-critical scenarios, nuScenes, nuPlan

The pith

Adversarial scenario generation for autonomous vehicles can be reframed as a closed-loop evolutionary curriculum that discovers more failure cases while staying close to real data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Evaluation as Evolution (E²), a framework that converts static adversarial stress testing into an adaptive process where discovered failures are recycled to refine vehicle policies. It models scenario synthesis as transport-regularized sparse control over a learned reverse-time SDE prior, using topology-driven support selection and Topological Anchoring to keep outputs realistic. This closed loop addresses the scarcity of safety-critical events in existing datasets. On nuScenes and nuPlan, E² finds more collisions than prior methods and yields robustness improvements when the boundary cases retrain policies.

Core claim

By casting adversarial generation as transport-regularized sparse control on a reverse-time SDE prior, stabilized by Topological Anchoring and topology-driven agent selection, the method discovers more collision failures than open-loop baselines while constraining deviations from real distributions, allowing those cases to be fed back for policy improvement.

What carries the argument

The Evaluation as Evolution (E²) framework, which performs transport-regularized sparse control over a learned reverse-time SDE prior and applies Topological Anchoring to stabilize high-dimensional scenario generation.
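As a reading aid, the phrase "transport-regularized sparse control over a learned reverse-time SDE prior" can be written out in one generic form. This is a sketch under stated assumptions, not the paper's own equations: the drift `f`, diffusion scale `g`, learned score `s_\theta`, adversarial cost `C`, weight `\lambda`, and the 0/1 agent mask `M_S` are all notation introduced here for illustration. The first line is the standard reverse-time SDE of score-based diffusion with an added control term; the second penalizes control energy, which, when the control enters through `g(t)` as written, equals the KL divergence between the controlled and uncontrolled path measures by Girsanov's theorem.

```latex
% Assumed notation (not taken from the paper):
%   f, g      : drift and diffusion scale of the forward noising SDE
%   s_\theta  : learned score, approximating \nabla_x \log p_t(x)
%   M_S       : diagonal 0/1 mask that is nonzero only for the selected agents S
%   C         : adversarial cost on the generated scenario x_0
% Time runs backward from t = T (noise) to t = 0 (scenario).
\begin{aligned}
\mathrm{d}x_t &= \big[\, f(x_t,t) - g(t)^2\, s_\theta(x_t,t) + g(t)\, M_S\, u_t \,\big]\,\mathrm{d}t
               + g(t)\,\mathrm{d}\bar{w}_t, \\
\min_{u}\quad & \mathbb{E}\Big[\, C(x_0) + \frac{\lambda}{2}\int_0^T \lVert M_S\, u_t \rVert^2 \,\mathrm{d}t \,\Big].
\end{aligned}
```

Setting `u = 0` recovers the unmodified prior, and a larger `\lambda` keeps samples closer to it; sparsity comes from `M_S` zeroing the control for every agent outside the selected set.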

If this is right

Collision failure discovery rates rise by 9.01% on nuScenes and up to 21.43% on nuPlan relative to the strongest baselines.
Generated scenarios maintain low invalidity and high realism scores.
Recycling the discovered boundary cases into closed-loop policy fine-tuning produces measurable robustness gains.
Adversarial evaluation shifts from a one-time post-hoc check to an ongoing part of the training loop.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same transport-regularized control mechanism could be applied to other sequential decision domains where rare failure modes limit robustness.
If the topological anchoring step scales to higher-dimensional state spaces, similar curricula might be constructed without hand-crafted scenario templates.
Repeated cycles of discovery and retraining could reduce reliance on static datasets for safety validation.

Load-bearing premise

The adversarial scenarios generated from the reverse-time SDE prior stay close enough to real traffic distributions to supply useful training signals without introducing invalid or biased cases.

What would settle it

A controlled experiment showing that policies fine-tuned on E²-generated boundary cases exhibit no reduction in collision rates during closed-loop simulation or real-world replay compared with policies fine-tuned on the strongest baseline adversarial sets.

Figures

Figures reproduced from arXiv: 2604.07378 by Chengkai Xu, Jian Sun, Jiaqi Liu, Peng Hang, Yicheng Guo.

**Figure 1.** From decoupled evaluation to closed-loop evolution. Top: Conventional pipelines separate training from evaluation, so discovered failures remain disconnected from policy learning. Bottom: E² couples an Adversarial Synthesizer with an Ego Policy to synthesize adversarial interactions for closed-loop simulation and recycle outcomes as learning signals to update the ego policy.

**Figure 2.** Overview of Evaluation as Evolution. Left (Adversarial Synthesizer): given scene context, the Synthesizer builds a risk-weighted interaction graph, selects an intervention-critical Top-K adversary set via topological bifurcation analysis, and synthesizes feasible, realistic adversarial trajectories by transport-regularized sparse control over a reverse-time SDE prior with Topological Anchoring. Right (Ego …

**Figure 3.** Structure-aware sparse control via scene interaction graph construction. The Synthesizer builds a TTC-based risk interaction matrix, converts it into a weighted interaction graph, discovers higher-order risk groups (cliques), and selects top cliques across time to score and activate a small subset of adversarial agents for targeted control.

where $\sigma(z) := (1 + \exp(-z))^{-1}$, $\tau_{\max} > 0$ caps the TTC horizon, and…

**Figure 4.** Qualitative controllability of failure timing and type under increasing adversarial intensity. Closed-loop rollouts for the lane-graph (LG) ego policy under a fixed random seed (seed=50), with $\eta \in \{0, 1, 2, 3\}$ from bottom to top.

Efficacy of Structure-Aware Selection. Table II validates the hypothesis that the selection of controlled agents is as critical as the control mechanism itself. Randomly selectin…

**Figure 5.** Qualitative interaction skeleton. TTC-based interaction matrices and the activated ego-adversary chain for the same scene. Left: Topological Anchoring disabled. Right: Topological Anchoring enabled.

D. Generalization Across Ego Policies. To assess generalization, we evaluate E² against three distinct ego policies, specifically rule-based (LG), physics-based (IDM), and hybrid learned (BITS), treating each a…

**Figure 6.** Policy improvement from adversarial closed-loop fine-tuning. Bars show the percentage gain over the original lane-graph policy on a tuning subset and a disjoint test subset of the nuScenes validation split, evaluated under matched and increased adversarial intensity ($\eta = 1.0, 2.0$).

… of magnitude. This observation implies that while BITS effectively manages low-speed interactions, it remains brittle in high…
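The selection pipeline named in the Figure 3 caption (TTC-based risk matrix, weighted interaction graph, cliques, Top-K activation) can be illustrated with a small single-timestep sketch. Everything below is an assumption made for illustration rather than the paper's implementation: the point-mass constant-velocity TTC, the linear risk mapping `1 - ttc / tau_max` (the paper's own mapping involves a sigmoid whose exact form is truncated above), the edge `threshold`, the clique-weight scoring rule, and all function names. The paper also selects cliques across time, which this sketch omits.

```python
import itertools
import numpy as np


def ttc_matrix(pos, vel, tau_max=8.0):
    """Pairwise time-to-collision for point masses at constant velocity, capped at tau_max."""
    n = len(pos)
    ttc = np.full((n, n), tau_max)
    for i, j in itertools.combinations(range(n), 2):
        r = pos[j] - pos[i]      # relative position
        w = vel[j] - vel[i]      # relative velocity
        closing = -float(r @ w)  # positive when the gap is shrinking
        if closing > 1e-9:
            ttc[i, j] = ttc[j, i] = min(float(r @ r) / closing, tau_max)
    return ttc


def risk_matrix(ttc, tau_max=8.0):
    """Hypothetical risk in [0, 1]: 0 at the TTC cap, approaching 1 as TTC goes to 0."""
    risk = 1.0 - ttc / tau_max
    np.fill_diagonal(risk, 0.0)
    return risk


def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques with at least two nodes."""
    n = len(adj)
    nbrs = [set(int(k) for k in np.flatnonzero(adj[i])) for i in range(n)]
    found = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= 2:
                found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return found


def select_adversaries(risk, k=2, threshold=0.3, ego=0):
    """Score each agent by the mean edge risk of the cliques it belongs to; keep the top k."""
    adj = risk > threshold
    scores = np.zeros(len(risk))
    for clique in maximal_cliques(adj):
        m = len(clique)
        weight = risk[np.ix_(clique, clique)].sum() / (m * (m - 1))
        scores[clique] += weight
    scores[ego] = -np.inf  # the ego vehicle is never selected as an adversary
    order = np.argsort(-scores)[:k]
    return [int(a) for a in order if scores[a] > 0]


# Toy scene: agent 0 is the ego; agents 1 and 2 are closing on it; agent 3 is far away.
pos = np.array([[0.0, 0.0], [30.0, 0.0], [-20.0, 0.0], [0.0, 200.0]])
vel = np.array([[10.0, 0.0], [2.0, 0.0], [15.0, 0.0], [0.0, 5.0]])
ttc = ttc_matrix(pos, vel)
selected = select_adversaries(risk_matrix(ttc), k=2)
```

In this toy scene the ego and agents 1 and 2 form one clique, so both are selected, while the distant agent 3 has no edges and is left uncontrolled. Restricting control to such a subset is what makes the control "sparse" in the sense used above.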

Original abstract

Autonomous vehicles in interactive traffic environments are often limited by the scarcity of safety-critical tail events in static datasets, which biases learned policies toward average-case behaviors and reduces robustness. Existing evaluation methods attempt to address this through adversarial stress testing, but are predominantly open-loop and post-hoc, making it difficult to incorporate discovered failures back into the training process. We introduce Evaluation as Evolution ($E^2$), a closed-loop framework that transforms adversarial generation from a static validation step into an adaptive evolutionary curriculum. Specifically, $E^2$ formulates adversarial scenario synthesis as transport-regularized sparse control over a learned reverse-time SDE prior. To make this high-dimensional generation tractable, we utilize topology-driven support selection to identify critical interacting agents, and introduce Topological Anchoring to stabilize the process. This approach enables the targeted discovery of failure cases while strictly constraining deviations from realistic data distributions. Empirically, $E^2$ improves collision failure discovery by 9.01% on the nuScenes dataset and up to 21.43% on the nuPlan dataset over the strongest baselines, while maintaining low invalidity and high realism. It further yields substantial robustness gains when the resulting boundary cases are recycled for closed-loop policy fine-tuning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames adversarial scenario generation for AVs as a closed-loop evolutionary curriculum using transport-regularized diffusion control, with reported gains in failure discovery and policy robustness.

The main point is that this work gives a practical way to recycle failures discovered during adversarial testing back into AV policy training, instead of treating evaluation as a dead-end step. The authors call the approach E² and build it around formulating scenario synthesis as transport-regularized sparse control on a learned reverse-time SDE, with topology-driven agent selection and Topological Anchoring to keep outputs close to real distributions. This is a new synthesis compared with the open-loop adversarial methods they cite, and it turns the process into an adaptive curriculum.

The paper shows concrete results: 9.01% better collision failure discovery on nuScenes and up to 21.43% on nuPlan versus strong baselines, plus robustness improvements after fine-tuning on the generated boundary cases, all while reporting low invalidity and high realism. That combination of generation and recycling is the part that could actually move the needle on tail-event robustness. The experiments appear grounded in standard datasets, and the claims are presented as measured outcomes rather than fitted quantities.

The soft spots are mostly in the experimental reporting. The abstract gives percentages but leaves out baseline implementation details, statistical tests, and full ablation breakdowns, so the full manuscript needs to show that the gains hold under different random seeds and are not driven by one or two lucky choices. The realism constraint via anchoring is central, and any reader will want to see how sensitive the outputs are to small distribution shifts.

Overall this is aimed at AV safety and robust learning researchers who already work with nuScenes- or nuPlan-style data. The thinking is clear, the pipeline is reproducible in principle, and the claims are falsifiable, so it deserves a serious referee rather than a desk reject. I'd send it out for review with a request for more ablation and baseline transparency.

Referee Report

2 major / 2 minor

Summary. The paper introduces Evaluation as Evolution (E²), a closed-loop framework that reformulates adversarial scenario synthesis for autonomous vehicles as an adaptive evolutionary curriculum. It models generation via transport-regularized sparse control over a learned reverse-time SDE prior, augmented by topology-driven support selection and Topological Anchoring to ensure tractability and distributional closeness. The approach discovers more collision failures than open-loop baselines and recycles the resulting boundary cases to fine-tune policies for improved robustness, with reported gains of 9.01% on nuScenes and up to 21.43% on nuPlan while preserving low invalidity and high realism.

Significance. If the empirical results hold under rigorous validation, the work offers a meaningful advance by closing the loop between adversarial evaluation and policy training, directly addressing the scarcity of tail events in static AV datasets. The combination of diffusion-based generation with explicit realism constraints and iterative recycling provides a principled way to evolve curricula, which could lead to more robust closed-loop controllers in interactive traffic settings.

major comments (2)

[Abstract / Results] The concrete percentage improvements in collision failure discovery (9.01% on nuScenes, up to 21.43% on nuPlan) are presented without accompanying details on baseline implementations, number of evaluation runs, variance estimates, or statistical significance tests. This makes it difficult to assess whether the gains are robust or sensitive to implementation choices in the transport-regularized control and Topological Anchoring components.
[Method] The claim that Topological Anchoring and topology-driven support selection keep generated scenarios sufficiently close to the data distribution (to avoid harmful biases) is central to the utility of the closed-loop curriculum, yet the manuscript provides no quantitative validation (e.g., distributional distance metrics or human realism ratings) beyond the stated “low invalidity and high realism.”

minor comments (2)

[Abstract / Introduction] The abstract and introduction use several novel terms (transport-regularized sparse control, Topological Anchoring, topology-driven support selection) without a concise glossary or forward reference to their formal definitions.
[Figures / Tables] Figure captions and experimental tables should explicitly list the exact baselines compared (e.g., which open-loop adversarial methods) and the precise definition of “collision failure discovery” metric.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation of minor revision. We address each major comment below and will revise the manuscript accordingly to improve clarity and substantiation of our claims.

Referee: [Abstract / Results] The concrete percentage improvements in collision failure discovery (9.01% on nuScenes, up to 21.43% on nuPlan) are presented without accompanying details on baseline implementations, number of evaluation runs, variance estimates, or statistical significance tests. This makes it difficult to assess whether the gains are robust or sensitive to implementation choices in the transport-regularized control and Topological Anchoring components.

Authors: We agree that additional experimental details are necessary for rigorous assessment. In the revised manuscript, we will expand the Results and Experimental Setup sections to explicitly describe the baseline implementations, the number of independent evaluation runs, variance estimates (e.g., standard deviations across runs), and any statistical significance testing performed. This will clarify the robustness of the reported improvements.
Revision: yes
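One way the promised variance reporting could look is a paired bootstrap over seeds, sketched below. This is only an illustration of the kind of analysis the referee asks for: the function name, the choice of a percentile bootstrap, and the per-seed numbers are all hypothetical. The numbers are synthetic placeholders, not results from the paper.

```python
import numpy as np


def paired_bootstrap_ci(method, baseline, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-seed difference (method - baseline)."""
    diffs = np.asarray(method, dtype=float) - np.asarray(baseline, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample seeds with replacement; each row is one bootstrap replicate.
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diffs.mean()), float(lo), float(hi)


# Synthetic placeholder collision-discovery rates for five matched seeds (NOT paper results).
method_rates = [0.42, 0.45, 0.40, 0.47, 0.44]
baseline_rates = [0.36, 0.38, 0.37, 0.39, 0.35]
mean_diff, ci_lo, ci_hi = paired_bootstrap_ci(method_rates, baseline_rates)
print(f"mean gain = {mean_diff:.3f}, 95% CI = [{ci_lo:.3f}, {ci_hi:.3f}]")
```

Pairing by seed removes scene-sampling noise that both methods share. An interval that excludes zero would support the claim that a gain is not an artifact of a single lucky seed.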

Referee: [Method] The claim that Topological Anchoring and topology-driven support selection keep generated scenarios sufficiently close to the data distribution (to avoid harmful biases) is central to the utility of the closed-loop curriculum, yet the manuscript provides no quantitative validation (e.g., distributional distance metrics or human realism ratings) beyond the stated “low invalidity and high realism.”

Authors: We acknowledge that the current presentation relies primarily on invalidity rates and qualitative realism indicators. To strengthen the validation of distributional closeness, we will add quantitative metrics in the revised Method and Results sections, such as explicit distributional distance measures (e.g., maximum mean discrepancy or Wasserstein distance) between generated scenarios and the original dataset, computed over the same evaluation runs.
Revision: yes
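For concreteness, the maximum mean discrepancy mentioned in this response can be estimated from two sets of per-scenario feature vectors as sketched below. The feature choice, the RBF kernel, the bandwidth, and the function names are assumptions made for illustration, and the arrays are random stand-ins rather than nuScenes or nuPlan data.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


# Random stand-ins for per-scenario features (e.g. speed, acceleration, headway).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 3))     # reference scenarios
close = rng.normal(0.0, 1.0, size=(200, 3))    # generated, same distribution
shifted = rng.normal(1.5, 1.0, size=(200, 3))  # generated, drifted away

mmd_close = mmd2_biased(real, close)
mmd_shifted = mmd2_biased(real, shifted)
```

A generator that stays near the data should score closer to `mmd_close` than to `mmd_shifted`. In practice the bandwidth is often set from the median pairwise distance, and the estimate depends on which scenario features are compared.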

Circularity Check

0 steps flagged

No significant circularity; empirical claims are measured outcomes

The paper describes a closed-loop curriculum method that formulates adversarial generation as transport-regularized sparse control over a learned reverse-time SDE prior, stabilized by topology-driven support selection and Topological Anchoring. All load-bearing claims consist of empirical measurements: collision failure discovery rates (9.01% on nuScenes, up to 21.43% on nuPlan) and downstream robustness gains when boundary cases are recycled for policy fine-tuning. These quantities are presented as direct experimental results against baselines on fixed external datasets, subject to explicit constraints on invalidity and realism. No equations, parameter fits, or self-citations are shown to reduce the reported gains to the inputs by construction; the framework remains self-contained with independent content that can be falsified by replication on the same benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

Framework rests on standard properties of diffusion SDEs and domain assumptions about scenario realism; introduces new procedural elements without external validation.

axioms (2)

standard math: Existence and properties of reverse-time stochastic differential equations as priors for scenario generation.
Invoked to model the diffusion process for generating adversarial traffic scenes.
domain assumption: Generated scenarios remain sufficiently realistic under transport regularization and topological constraints.
Required to ensure low invalidity and high realism as claimed in results.

invented entities (2)

Topological Anchoring (no independent evidence)
Purpose: Stabilize the high-dimensional adversarial generation process.
New stabilization technique introduced to make the method tractable.
Topology-driven support selection (no independent evidence)
Purpose: Identify critical interacting agents in scenarios.
New selection mechanism for focusing control on relevant parts of the scene.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
formulates adversarial scenario synthesis as transport-regularized sparse control over a learned reverse-time SDE prior... Topological Anchoring to stabilize the process
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear
topology-driven support selection... Risk-weighted interaction graph... temporal clique scoring
