pith. machine review for the scientific record.

arxiv: 2605.04647 · v2 · submitted 2026-05-06 · 💻 cs.RO

Recognition: no theorem link

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

Ben Lu, Bihao Cui, Chuan Tang, Huimin Wang, Kun Zhan, Mingqian Wang, Pengxiang Li, Teng Zhang, Tong Wang, Yue Wang

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:43 UTC · model grok-4.3

classification 💻 cs.RO
keywords discrete diffusion · autonomous driving · reinforcement learning · self-editing · trajectory planning · masked decoding · NAVSIM · AutoEdit

The pith

Full-rollout reinforcement learning enables effective in-place self-editing in a discrete diffusion driving planner.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ReflectDrive-2, a masked discrete diffusion model that generates autonomous driving trajectories as sequences of discrete tokens through parallel decoding. It introduces AutoEdit, an inference-time mechanism that lets the same model rewrite selected tokens to revise plans without any auxiliary network. Training proceeds in two stages: supervised recovery of expert trajectories from structure-aware perturbations along longitudinal and lateral directions, followed by reinforcement learning on complete decision-draft-reflect rollouts that assign terminal driving rewards to the final edited trajectory. The central result is that this full-rollout RL stage is required to couple drafting and editing; supervised training alone produces at most a 0.3 PDMS gain from AutoEdit, while RL raises the gain to 1.9. The system reports 91.0 PDMS with camera-only input on NAVSIM and 31.8 ms average latency on NVIDIA Thor.
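The decision-draft-reflect loop described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `toy_model`, the even unmasking schedule, and the neighbor-based confidence heuristic are all invented for the example; only the shape of the loop (parallel masked decoding, then in-place remask-and-redecode) follows the paper.

```python
MASK = -1  # sentinel for a masked trajectory token

def parallel_decode(model, tokens, steps):
    """Iteratively unmask: each step, commit the highest-confidence masked positions."""
    tokens = list(tokens)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        k = max(1, len(masked) // (steps - step))  # even unmasking schedule (toy)
        by_conf = sorted(masked, key=lambda i: -model(tokens, i)[1])
        for i in by_conf[:k]:
            tokens[i] = model(tokens, i)[0]
    return tokens

def auto_edit(model, tokens, n_edit):
    """In-place revision: remask the n_edit lowest-confidence tokens, then redecode."""
    tokens = list(tokens)
    by_conf = sorted(range(len(tokens)), key=lambda i: model(tokens, i)[1])
    for i in by_conf[:n_edit]:
        tokens[i] = MASK
    return parallel_decode(model, tokens, steps=1)

# Hypothetical stand-in for the diffusion model: predicts a fixed target token,
# with confidence growing with the number of already-unmasked neighbors.
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]
def toy_model(tokens, i):
    ctx = sum(1 for j in (i - 1, i + 1) if 0 <= j < len(tokens) and tokens[j] != MASK)
    return TARGET[i], 0.5 + 0.25 * ctx

draft = parallel_decode(toy_model, [MASK] * len(TARGET), steps=4)
plan = auto_edit(toy_model, draft, n_edit=2)
```

The key property the sketch preserves is that drafting and editing are the same model call: AutoEdit just remasks and re-runs the decoder, so no auxiliary refinement network is needed.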

Core claim

ReflectDrive-2 shows that a masked discrete diffusion planner for driving trajectories can be trained to perform useful in-place revisions of its own discrete token plans when the full decision-draft-reflect sequence is optimized end-to-end with reinforcement learning and terminal driving rewards. This full-rollout credit assignment is essential: supervised training on perturbed experts yields only marginal editing benefit, while the RL stage produces a substantially larger improvement in final trajectory quality.

What carries the argument

The two-stage training procedure that first supervises recovery from longitudinal and lateral perturbations and then applies policy-gradient updates across full decision-draft-reflect rollouts to align the AutoEdit self-revision behavior with terminal driving rewards.
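The full-rollout credit assignment this describes can be sketched with a toy REINFORCE loop: a draft action and an edit action share one terminal reward, and the policy gradient flows through both. The two-entry policy table, the reward rule, and the learning rate are all invented for illustration; only the credit-flow structure mirrors the paper.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def rollout(theta, rng):
    """Sample one draft action and one edit action; only the final (post-edit)
    outcome is rewarded, as with a terminal driving reward."""
    actions = []
    for stage in ("draft", "edit"):
        probs = softmax(theta[stage])
        actions.append((stage, rng.choices(range(len(probs)), weights=probs)[0]))
    reward = 1.0 if actions[-1][1] == 0 else 0.0  # toy reward: final edit picks action 0
    return actions, reward

def reinforce_step(theta, rng, lr=0.5, baseline=0.5):
    actions, reward = rollout(theta, rng)
    advantage = reward - baseline
    for stage, chosen in actions:  # credit flows through draft AND edit transitions
        probs = softmax(theta[stage])
        for i, p in enumerate(probs):
            theta[stage][i] += lr * advantage * ((1.0 if i == chosen else 0.0) - p)
    return reward

rng = random.Random(0)
theta = {"draft": [0.0, 0.0], "edit": [0.0, 0.0]}
for _ in range(500):
    reinforce_step(theta, rng)
p_good_edit = softmax(theta["edit"])[0]  # rises well above the initial 0.5
```

Because the reward attaches only to the post-edit outcome, the edit stage can only be shaped by optimizing the whole rollout, which is the point the ablation (+0.3 vs. +1.9 PDMS) is making.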

If this is right

  • The planner reaches 91.0 PDMS with camera-only input and 94.8 PDMS in a best-of-6 oracle setting on NAVSIM.
  • Average inference runs at 31.8 ms on NVIDIA Thor hardware through the co-designed reflective decoding stack.
  • Shared-prefix KV reuse, Alternating Step Decode, and fused unmasking keep the decision-draft-reflect pipeline computationally efficient.
  • Discrete token representation allows trajectory revision without training or deploying a separate refinement model.
  • A separate action expert supports modular policy updates while keeping the diffusion backbone fixed.
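One of the efficiency items above, shared-prefix KV reuse, amounts to computing the prompt's key/value projections once and reusing them across the decision, draft, and reflect passes. A hypothetical memoizing wrapper sketches the idea; `encode_prompt` and the pass names are invented for the example.

```python
class PrefixKVCache:
    """Memoize the expensive prompt encoding so the decision, draft, and
    reflect passes all reuse one set of prefix key/value tensors."""
    def __init__(self, encode_fn):
        self.encode_fn = encode_fn
        self.calls = 0       # how many real encodings we actually paid for
        self._store = {}

    def get(self, prompt):
        key = tuple(prompt)
        if key not in self._store:
            self.calls += 1
            self._store[key] = self.encode_fn(prompt)
        return self._store[key]

def encode_prompt(prompt):   # hypothetical stand-in for the prompt K/V projection
    return [hash(tok) % 97 for tok in prompt]

cache = PrefixKVCache(encode_prompt)
prompt = ("camera_feats", "nav_cmd", "ego_state")
for _pass in ("decision", "draft", "reflect_1", "reflect_2"):
    kv = cache.get(prompt)   # same prefix K/V on every pass, encoded once
```

In the real system the cached object would be the transformer's prefix key/value tensors rather than a toy list, but the amortization argument is the same: reflect steps add decode work, not prompt-encoding work.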

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same full-rollout RL alignment could be applied to other discrete generative planners where self-correction steps need to be learned rather than hand-designed.
  • Terminal rewards alone appear sufficient to train multi-step generation-plus-editing when the rollout explicitly includes the editing transitions.
  • Structure-aware perturbation directions may transfer to training self-editing in continuous-trajectory or non-driving planning domains.
  • Reducing reliance on auxiliary refinement networks could simplify deployment of generative planners in resource-constrained vehicles.

Load-bearing premise

That structure-aware perturbations along longitudinal and lateral directions together with terminal driving rewards will train effective in-place editing without introducing distribution shift that harms performance outside the training distribution.
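The structure-aware perturbations named here can be sketched directly: scale the trajectory's longitudinal progress by a factor β and rotate it laterally by an angle α about the ego origin, matching the β/α notation in Figure 3. The composition order and the ego-frame convention (x = longitudinal) are assumptions made for the example.

```python
import math

def perturb(traj, beta, alpha):
    """Structure-aware perturbation of an ego-frame trajectory:
    beta scales longitudinal progress (x), alpha rotates heading (radians)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    out = []
    for x, y in traj:
        xs = beta * x                                     # longitudinal scaling
        out.append((ca * xs - sa * y, sa * xs + ca * y))  # lateral rotation
    return out

expert = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]           # straight-ahead expert path
slowed = perturb(expert, beta=0.5, alpha=0.0)            # under-shoots progress
veered = perturb(expert, beta=1.0, alpha=math.pi / 12)   # 15-degree heading error
```

Supervising the model to map `slowed` or `veered` back to `expert` is the recovery task of the first training stage; the premise at stake is whether errors of this shape cover the errors the drafter actually makes.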

What would settle it

A controlled comparison on NAVSIM showing that the RL-fine-tuned model produces no larger AutoEdit improvement than the supervised-only baseline, or that edited trajectories yield lower driving metrics than non-edited ones when evaluated in closed-loop simulation.

Figures

Figures reproduced from arXiv: 2605.04647 by Ben Lu, Bihao Cui, Chuan Tang, Huimin Wang, Kun Zhan, Mingqian Wang, Pengxiang Li, Teng Zhang, Tong Wang, Yue Wang.

Figure 1. ReflectDrive-2 plans through a decision–draft–reflect process in a shared discrete token space. Conditioned on surround-view cameras, navigation, and ego-state, a goal-point posterior commits a single Goal Token, masked discrete diffusion parallel-decodes a trajectory Draft (blue), and AutoEdit rewrites tokens in place to yield the final Plan (green). discrete diffusion [Austin et al., 2021, Nie et al., 20…
Figure 2. ReflectDrive-2 architecture. Vision, language, ego-state, and trajectory tokens share a single self-attention backbone, with a Full FFN over prompt tokens and a compact Action FFN over the action-token block. The attention pattern (right) is causal over the prompt, enabling shared-prefix KV reuse, and block-wise bidirectional over action tokens. The reinforcement-learning stage optimizes the complete draft-…
Figure 3. Training, inference, and deployment of ReflectDrive-2. (a) SFT with three objectives: masked-token loss L_DLM, structure-aware perturbation loss L_SAP over longitudinal scaling β and lateral rotation α, and drivable-area field loss L_field. (b) Frame-level decision–draft–reflect inference: commit a goal token, parallel-decode the draft, then rewrite tokens in place via AutoEdit. (c) RL fine-tuning scores full…
Figure 4. Decision diversity from goal points. Each goal anchors a distinct behavior hypothesis. Trajectory opacity encodes PDMS.
Figure 5. Inference-time reflection with AutoEdit. Semi-transparent: initial drafts. Solid: final outputs after AutoEdit. PDMS annotated at trajectory endpoints. [Inset plot: NAVSIM closed-loop PDMS versus number of steps, for generation steps and ReEdit steps.]
Figure 6. Sensitivity to diffusion steps. PDMS with different generation and AutoEdit step counts. [Inset plot: NAVSIM closed-loop PDMS versus parameter values, for number of goals and NMS threshold / 0.2 m.]
Original abstract

We introduce ReflectDrive-2, a masked discrete diffusion planner with separate action expert for autonomous driving that represents plans as discrete trajectory tokens and generates them through parallel masked decoding. This discrete token space enables in-place trajectory revision: AutoEdit rewrites selected tokens using the same model, without requiring an auxiliary refinement network. To train this capability, we use a two-stage procedure. First, we construct structure-aware perturbations of expert trajectories along longitudinal progress and lateral heading directions and supervise the model to recover the original expert trajectory. We then fine-tune the full decision--draft--reflect rollout with reinforcement learning (RL), assigning terminal driving reward to the final post-edit trajectory and propagating policy-gradient credit through full-rollout transitions. Full-rollout RL proves crucial for coupling drafting and editing: under supervised training alone, inference-time AutoEdit improves PDMS by at most $0.3$, whereas RL increases its gain to $1.9$. We also co-design an efficient reflective decoding stack for the decision--draft--reflect pipeline, combining shared-prefix KV reuse, Alternating Step Decode, and fused on-device unmasking. On NAVSIM, ReflectDrive-2 achieves $91.0$ PDMS with camera-only input and $94.8$ PDMS in a best-of-6 oracle setting, while running at $31.8$ ms average latency on NVIDIA Thor.
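The first supervised objective in the abstract, recovering the expert trajectory at masked or perturbed token positions, is in essence a masked-token cross-entropy. A minimal sketch under that reading follows; the token vocabulary, logits, and mask are invented for the example, and the paper's exact losses (L_DLM, L_SAP, L_field) are not reproduced here.

```python
import math

def masked_recovery_loss(logits, target, mask):
    """Mean cross-entropy over masked positions only: the model is supervised
    to recover the expert token wherever the input was masked or perturbed."""
    total, count = 0.0, 0
    for pos, is_masked in enumerate(mask):
        if not is_masked:
            continue
        row = logits[pos]
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target[pos]]   # -log p(expert token)
        count += 1
    return total / max(count, 1)

# Toy check: logits peaked on the expert tokens give a near-zero loss,
# uniform logits give log(vocab_size).
target = [2, 0, 1]
mask = [True, False, True]
peaked = [[0.0, 0.0, 8.0], [8.0, 0.0, 0.0], [0.0, 8.0, 0.0]]
uniform = [[0.0] * 3 for _ in range(3)]
```

Averaging only over masked positions keeps the objective focused on the recovery task rather than on tokens the model was simply given.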

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ReflectDrive-2, a masked discrete diffusion planner for autonomous driving that represents plans as discrete trajectory tokens and generates them via parallel masked decoding. It enables in-place self-editing (AutoEdit) without an auxiliary network. Training uses a two-stage process: supervised recovery from structure-aware perturbations of expert trajectories along longitudinal/lateral axes, followed by RL fine-tuning on full decision-draft-reflect rollouts using terminal driving rewards. The central claim is that full-rollout RL is crucial for coupling drafting and editing, as supervised training alone yields at most +0.3 PDMS improvement from AutoEdit while RL increases this to +1.9. On NAVSIM it reports 91.0 PDMS (camera-only) and 94.8 in best-of-6, with 31.8 ms average latency on NVIDIA Thor via a co-designed reflective decoding stack.

Significance. If the RL coupling result holds under rigorous controls, the approach could meaningfully advance self-correcting diffusion planners for driving by eliminating separate refinement networks and using on-policy RL to align draft and edit stages. The efficient decoding co-design (shared-prefix KV reuse, Alternating Step Decode, fused unmasking) is a practical strength for real-time deployment.

major comments (2)
  1. [Abstract] Abstract and training procedure: the supervised stage trains the editor exclusively on structure-aware perturbations of expert trajectories along longitudinal progress and lateral heading. No evidence is provided that these perturbations match the token-level error distribution (magnitude, direction, or correlations) produced by the diffusion drafter at inference, especially after RL policy updates. This leaves open the possibility that the reported 1.9 PDMS gain reflects on-policy training rather than a fundamental requirement for full-rollout RL to couple the stages.
  2. [Results] Results and evaluation: the headline comparison (supervised AutoEdit gain +0.3 PDMS vs. RL gain +1.9 PDMS) is presented without ablations on perturbation design, error-distribution matching, or intermediate reward signals. The terminal driving reward alone provides no gradient to distinguish useful edits from reward hacking, weakening the causal claim that full-rollout RL is required for effective self-editing.
minor comments (2)
  1. [Abstract] Clarify whether the 31.8 ms latency figure includes the full decision-draft-reflect pipeline or only the diffusion forward pass.
  2. [Results] The best-of-6 oracle result (94.8 PDMS) should be accompanied by variance or selection criteria to allow fair comparison with single-shot baselines.

Simulated Author's Rebuttal

2 responses · 2 unresolved

We thank the referee for the constructive feedback on our manuscript. We respond point-by-point to the major comments below, providing clarifications on our design choices while acknowledging limitations in the presented evidence.

Point-by-point responses
  1. Referee: [Abstract] Abstract and training procedure: the supervised stage trains the editor exclusively on structure-aware perturbations of expert trajectories along longitudinal progress and lateral heading. No evidence is provided that these perturbations match the token-level error distribution (magnitude, direction, or correlations) produced by the diffusion drafter at inference, especially after RL policy updates. This leaves open the possibility that the reported 1.9 PDMS gain reflects on-policy training rather than a fundamental requirement for full-rollout RL to couple the stages.

    Authors: The perturbations are constructed along longitudinal and lateral axes to target the dominant sources of trajectory error in driving, providing a practical initialization for the editor to recover expert-like behavior. We do not claim or demonstrate that these exactly replicate the token-level error statistics of the diffusion drafter at inference time, particularly after RL updates. The supervised stage initializes recovery capability, while the subsequent full-rollout RL exposes the model to the actual on-policy errors arising from the joint draft-edit process. The observed gap in AutoEdit gains (+0.3 PDMS supervised-only versus +1.9 PDMS after RL) is consistent with RL being necessary to align the two stages under realistic inference conditions. We can add further discussion of the perturbation rationale in a revision. revision: partial

  2. Referee: [Results] Results and evaluation: the headline comparison (supervised AutoEdit gain +0.3 PDMS vs. RL gain +1.9 PDMS) is presented without ablations on perturbation design, error-distribution matching, or intermediate reward signals. The terminal driving reward alone provides no gradient to distinguish useful edits from reward hacking, weakening the causal claim that full-rollout RL is required for effective self-editing.

    Authors: Our experiments prioritize the direct comparison of supervised versus RL training regimes to isolate the contribution of full-rollout RL. We did not perform exhaustive ablations on alternative perturbation designs or intermediate rewards, which would require substantial additional compute. The terminal reward is applied to the final post-edit trajectory and optimized via policy gradients over the complete rollout, allowing credit to flow to both drafting and editing decisions. While this does not explicitly penalize reward hacking at the edit level, the gains are measured on held-out NAVSIM scenarios and remain consistent. We can incorporate a limitations discussion on these aspects in the revised manuscript. revision: partial

Standing simulated objections (not resolved)
  • Empirical verification that the chosen perturbations match the precise token-level error distribution of the diffusion drafter at inference after RL updates
  • Ablations using intermediate reward signals or alternative perturbation distributions to isolate their effects

Circularity Check

0 steps flagged

No circularity: empirical gains measured on external benchmark

full rationale

The paper's central claim compares measured PDMS scores (0.3 under supervised perturbations of expert trajectories vs. 1.9 after full-rollout RL) on the NAVSIM benchmark. These are direct evaluation outcomes from distinct training procedures, not quantities defined in terms of the inputs or fitted parameters. The two-stage process (supervised recovery on longitudinal/lateral perturbations, then RL with terminal reward) is described explicitly without self-referential equations or load-bearing self-citations that reduce the result to its own assumptions. No ansatz smuggling, renaming of known results, or uniqueness theorems imported from prior author work appear in the derivation chain. The performance numbers remain falsifiable external measurements.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; all details on reward design, token vocabulary, and perturbation construction are missing.

pith-pipeline@v0.9.0 · 5568 in / 1077 out tokens · 20164 ms · 2026-05-13T01:43:14.780880+00:00 · methodology

