pith. machine review for the scientific record.

arxiv: 2605.03075 · v1 · submitted 2026-05-04 · 💻 cs.RO · cs.AI · cs.LG

Recognition: 3 theorem links

· Lean Theorem

Refining Compositional Diffusion for Reliable Long-Horizon Planning

Anh Tong, Jaesik Choi, Kyowoon Lee, Yunhao Luo

Pith reviewed 2026-05-08 17:43 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG
keywords compositional diffusion · long-horizon planning · mode averaging · guidance method · diffusion models · robotics · trajectory stitching

The pith

Refining compositional diffusion uses reconstruction error and overlap consistency to steer sampling toward high-density coherent long-horizon plans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Compositional diffusion plans long trajectories by stitching overlapping short segments from pretrained models, but multimodal local distributions produce mode-averaged outputs that are neither feasible nor coherent. Refining Compositional Diffusion (RCD) introduces training-free guidance that treats a pretrained diffusion model's self-reconstruction error as a proxy for the log-density of the full composed plan. An additional overlap consistency term penalizes mismatches at segment boundaries. Together these terms concentrate the sampling trajectory on high-density regions, reducing mode-averaging without any retraining. Experiments on locomotion, manipulation, and pixel-based tasks from OGBench show consistent gains over prior compositional baselines.

Core claim

RCD is a guidance procedure that augments the composed score with two terms: the self-reconstruction error of a pretrained diffusion model, used as a proxy for the log-density of the stitched plan, and an explicit overlap consistency penalty. The resulting guidance concentrates samples on high-density, boundary-consistent plans and avoids the mode-averaging failure of naive score composition.

What carries the argument

Self-reconstruction error treated as log-density proxy, combined with an overlap consistency term, used as additive guidance on the composed score.
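
A minimal sketch of how the two guidance terms could be combined into one scalar energy, on a one-dimensional toy problem. Everything here is an assumption made for illustration: the closed-form `toy_denoiser` stands in for a pretrained diffusion model, the noise-then-denoise form of the reconstruction error is one plausible reading of "self-reconstruction error", and the names `w_rec` and `w_ov` are hypothetical weights. The paper's actual definitions are not reproduced on this page.

```python
import numpy as np

def toy_denoiser(x_noisy, sigma):
    # Posterior mean E[x0 | x_noisy] when x0 is +1 or -1 with equal probability
    # and x_noisy = x0 + sigma * eps. Stand-in for a pretrained diffusion model
    # (assumption for illustration only).
    return np.tanh(x_noisy / sigma**2)

def reconstruction_error(segment, sigma=0.5, n_samples=2000, seed=0):
    # Noise the segment, denoise it, and average the squared gap to the input.
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples,) + segment.shape)
    restored = toy_denoiser(segment + sigma * eps, sigma)
    return float(np.mean(np.sum((restored - segment) ** 2, axis=-1)))

def overlap_penalty(segments, overlap=1):
    # Squared mismatch between the tail of each segment and the head of the next.
    total = 0.0
    for left, right in zip(segments[:-1], segments[1:]):
        total += float(np.sum((left[-overlap:] - right[:overlap]) ** 2))
    return total

def guidance_energy(segments, w_rec=1.0, w_ov=1.0):
    # Hypothetical weighted sum of the two terms; lower means "more plausible".
    rec = sum(reconstruction_error(s) for s in segments)
    return w_rec * rec + w_ov * overlap_penalty(segments)

# Three toy plans made of two segments of length 3 that share one state.
coherent = [np.ones(3), np.ones(3)]      # both segments on the +1 mode
averaged = [np.zeros(3), np.zeros(3)]    # mode-averaged, sits between the modes
mismatched = [np.ones(3), -np.ones(3)]   # each segment fine, boundary disagrees

for name, plan in [("coherent", coherent), ("averaged", averaged), ("mismatched", mismatched)]:
    print(name, round(guidance_energy(plan), 3))
```

In this toy setting the reconstruction term penalizes the mode-averaged plan and the overlap term penalizes the boundary mismatch, so the coherent plan receives the lowest energy. A sampler would follow the negative gradient of such an energy alongside the composed score; that step is omitted here.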

If this is right

  • Sampling concentrates on high-density regions of the composed plan distribution.
  • Mode-averaging is reduced, yielding plans that remain locally feasible and globally coherent.
  • No retraining or architectural change is required; the method applies to any pretrained diffusion planner.
  • Performance improves on long-horizon locomotion, object manipulation, and pixel-based tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same reconstruction-error signal could serve as a cheap diagnostic for plan quality in other diffusion-based planners.
  • Boundary consistency terms may be useful beyond diffusion when stitching any locally generated trajectory segments.
  • If the proxy holds, it suggests that internal model signals already encode enough information to resolve global coherence without external rewards.

Load-bearing premise

The self-reconstruction error of the pretrained diffusion model remains a faithful proxy for the log-density of the stitched plan even when adjacent segments contain incompatible local modes.

What would settle it

A controlled test in which deliberately incompatible local modes are stitched; if the guided sampler still produces plans with feasibility rates no better than unguided compositional diffusion, the density-proxy assumption fails.
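
The measurement side of such a test can be sketched as follows. This is a caricature under stated assumptions, not the paper's protocol: waypoints are scalars, a waypoint counts as feasible when it lies within a hypothetical tolerance `tol` of one of the two modes, and "unguided composition" is modeled as plain averaging of two independently sampled segment endpoints. A guided sampler would be scored with the same `feasibility_rate` function and compared against the unguided number.

```python
import numpy as np

def feasibility_rate(waypoints, modes=(-1.0, 1.0), tol=0.25):
    # Fraction of waypoints that lie within tol of at least one mode.
    w = np.asarray(waypoints, dtype=float)[:, None]
    dist = np.min(np.abs(w - np.asarray(modes)[None, :]), axis=1)
    return float(np.mean(dist <= tol))

rng = np.random.default_rng(0)
n = 1000

# Each local model independently picks a mode for the shared waypoint.
left = rng.choice([-1.0, 1.0], size=n) + 0.05 * rng.standard_normal(n)
right = rng.choice([-1.0, 1.0], size=n) + 0.05 * rng.standard_normal(n)

# Unguided stand-in: average the two proposals at the shared waypoint.
averaged = 0.5 * (left + right)

print("single segment:", feasibility_rate(left))
print("averaged overlap:", feasibility_rate(averaged))
```

Whenever the two local models pick opposite modes, the averaged waypoint lands near zero and is counted infeasible, so the unguided rate in this toy drops to roughly the fraction of pairs that happened to agree.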

Figures

Figures reproduced from arXiv: 2605.03075 by Anh Tong, Jaesik Choi, Kyowoon Lee, Yunhao Luo.

Figure 1: Toy illustration of the mode-averaging problem. (a) Training data consists of overlapping l=3 segments, each shown in a distinct color. They are anchored at a fixed start and goal, and pass through a bimodal distribution with modes at +1 and −1. (b) CompDiffuser averages incompatible modes in overlap regions, producing many invalid (red) trajectories off both modes. (c) RCD guides the denoising toward hig…
Figure 2: Overview of RCD, contrasted with CompDiffuser and CDGS. Trajectories are planned from start to goal over a transition distribution. (a) CompDiffuser composes overlapping segments via bidirectional conditioning and autoregressive denoising, but produces mode-averaged trajectories in low-density regions when local distributions are multimodal. (b) CDGS mitigates this through population-based search and prun…
Figure 3: Plan quality comparisons of CompDiffuser, CDGS, and RCD. Each column shows 20 sampled plans in the AntMaze-Giant-Stitch environment for 5 test-time tasks defined in OGBench. Plans that violate environment constraints (wall penetration) are shown in red; feasible plans in green. We first assess whether RCD improves the physical feasibility of composed plans
Figure 4: A cube manipulation sequence executed by RCD.
Figure 5: Plan quality comparisons on PointMaze-Giant-Stitch. Each column shows 20 sampled plans from CompDiffuser, CDGS, and RCD for 5 test-time tasks defined in OGBench. Plans that violate environment constraints (wall penetration) are shown in red; feasible plans in green. averaged over 5 random seeds, each evaluated on 5 start-goal pairs with 20 episodes per pair. The defaults (w=0.25, λov=0.5, s/T=0.4) are high…
Figure 6: Visualization of RCD rollout execution on AntMaze-Giant-Stitch. The ant agent navigates from the starting region to the pink goal.
Figure 7: Visualization of RCD rollout execution on AntSoccer-Medium-Stitch. The ant agent dribbles the soccer ball toward the goal location.
Figure 8: Visualization of RCD rollout execution on Cube-Triple. The 6-DoF UR5e robot arm arranges three cubes into their target configuration via pick-and-place. Frames shown at t = 0, 55, 110, 165, 220, 275, 330, 385.
Figure 9: Visualization of RCD rollout execution on Visual-AntMaze-Medium-Stitch. The ant agent navigates from the starting region to the goal in a long-horizon task where start and goal are at opposite ends of the maze. Frames are the env-rendered 64×64 pixel observations. Planning is performed in a learned 16-dimensional VAE latent space derived from these pixel renders.
read the original abstract

Compositional diffusion planning generates long-horizon trajectories by stitching together overlapping short-horizon segments through score composition. However, when local plan distributions are multimodal, existing compositional methods suffer from mode-averaging, where averaging incompatible local modes leads to plans that are neither locally feasible nor globally coherent. We propose Refining Compositional Diffusion (RCD), a training-free guidance method that steers compositional sampling toward high-density, globally coherent plans. RCD leverages the self-reconstruction error of a pretrained diffusion model as a proxy for the log-density of composed plans, combined with an overlap consistency term that enforces consistency at segment boundaries. We show that the combined guidance concentrates sampling on high-density plans that mitigate mode-averaging. Experiments on challenging long-horizon tasks from OGBench, including locomotion, object manipulation, and pixel-based observations, demonstrate that RCD consistently outperforms existing methods.
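
To make "stitching together overlapping short-horizon segments" concrete, here is a small bookkeeping sketch. It is not the paper's composition rule: the helper names `segment_windows` and `stitch_by_averaging` are hypothetical, and plain averaging in the overlap is used only as the simplest possible stand-in for score composition. The segment length of 3 echoes the toy setup in the Figure 1 caption; the horizon of 7 and overlap of 1 are arbitrary choices.

```python
import numpy as np

def segment_windows(horizon, seg_len, overlap):
    # Index ranges [start, end) of overlapping windows that tile the horizon.
    if not 0 <= overlap < seg_len:
        raise ValueError("overlap must satisfy 0 <= overlap < seg_len")
    stride = seg_len - overlap
    if horizon < seg_len or (horizon - seg_len) % stride != 0:
        raise ValueError("horizon is not tiled exactly by these windows")
    return [(s, s + seg_len) for s in range(0, horizon - seg_len + 1, stride)]

def stitch_by_averaging(segments, windows, horizon):
    # Merge segments into one trajectory, averaging wherever windows overlap.
    total = np.zeros(horizon)
    count = np.zeros(horizon)
    for seg, (a, b) in zip(segments, windows):
        total[a:b] += np.asarray(seg, dtype=float)
        count[a:b] += 1
    return total / count

windows = segment_windows(horizon=7, seg_len=3, overlap=1)

# Adjacent segments commit to different modes (+1, -1, +1).
segments = [np.ones(3), -np.ones(3), np.ones(3)]
stitched = stitch_by_averaging(segments, windows, horizon=7)
print(windows)
print(stitched)
```

In the stitched result the shared states at the window boundaries fall between the two modes, which is the mode-averaging failure in its simplest form: every segment is locally valid, yet the composed plan is not.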

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that compositional diffusion planning for long-horizon trajectories suffers from mode-averaging when stitching multimodal short-horizon segments, and proposes Refining Compositional Diffusion (RCD) as a training-free guidance technique. RCD combines the self-reconstruction error of a pretrained diffusion model (used as a proxy for the log-density of the composed plan) with an overlap consistency term to steer sampling toward high-density, coherent plans. Experiments on OGBench tasks (locomotion, object manipulation, pixel observations) show consistent outperformance over prior compositional methods.

Significance. If the reconstruction-error proxy reliably approximates the composed log-density and the guidance mitigates mode-averaging without new artifacts, the work would be significant for enabling reliable long-horizon planning from pretrained short-horizon models in a training-free manner. Strengths include the training-free design, explicit use of a pretrained model, and evaluation across diverse OGBench domains including pixel-based observations.

major comments (3)
  1. [§3.2, method] The central claim that self-reconstruction error on the concatenated trajectory serves as a proxy for log-density of the composed (product) distribution lacks any derivation, bound, or inequality showing that argmin of this error coincides with argmax of log p_composed. When local modes conflict, the joint error can be dominated by local denoising residuals rather than the measure of the stitched distribution; the overlap term addresses boundaries but does not correct this global mis-estimation.
  2. [Experiments section, Tables 1-2] Reported outperformance on OGBench lacks quantitative details on effect sizes, standard deviations, number of random seeds, or ablation isolating the reconstruction-error guidance from the overlap-consistency term. Without these, it is impossible to verify whether the claimed mitigation of mode-averaging is robust or driven by one component.
  3. [§4.1, analysis] No theoretical or empirical examination is given of cases where incompatible local modes cause the reconstruction error to fail as a density proxy, nor of whether the combined guidance can still produce incoherent plans in highly multimodal settings.
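
The gap raised in the first major comment can be written out explicitly. The notation below is illustrative and not taken from the paper: τ is a stitched plan, D_θ a pretrained denoiser, σ a noise level, E_rec the reconstruction error, and p_comp the composed distribution. The first line is an assumed form of the error. The second is Tweedie's formula for the ideal denoiser under Gaussian noise, which ties the denoising residual to the score of the noised density rather than to the density value itself. The third line is the step the comment says is missing.

```latex
% Assumed form of the self-reconstruction error (illustrative notation).
E_{\mathrm{rec}}(\tau) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}
  \left\| D_\theta(\tau + \sigma \epsilon, \sigma) - \tau \right\|^2

% Tweedie's formula for the ideal denoiser, with \tau_\sigma = \tau + \sigma \epsilon.
D^\star(\tau_\sigma, \sigma) = \mathbb{E}[\tau \mid \tau_\sigma]
  = \tau_\sigma + \sigma^2 \nabla_{\tau_\sigma} \log p_\sigma(\tau_\sigma)

% Step that would need a derivation or bound.
\arg\min_\tau E_{\mathrm{rec}}(\tau) \overset{?}{=} \arg\max_\tau \log p_{\mathrm{comp}}(\tau)
```

Because the residual is governed by a gradient of log-density, a small reconstruction error indicates a point that the denoiser barely moves, which need not be a point of high density under the composed distribution. That is the distinction the comment asks the authors to address.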
minor comments (2)
  1. [Abstract] The abstract would benefit from brief quantitative effect sizes or success-rate deltas to ground the 'consistently outperforms' claim.
  2. [§3] Notation for the two guidance scales (e.g., weighting coefficients) could be introduced earlier and used consistently in the method equations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive review and the recommendation for major revision. We appreciate the focus on strengthening the theoretical motivation, experimental reporting, and analysis of limitations. We address each major comment below and will incorporate revisions to improve the manuscript.

read point-by-point responses
  1. Referee: [§3.2, method] The central claim that self-reconstruction error on the concatenated trajectory serves as a proxy for log-density of the composed (product) distribution lacks any derivation, bound, or inequality showing that argmin of this error coincides with argmax of log p_composed. When local modes conflict, the joint error can be dominated by local denoising residuals rather than the measure of the stitched distribution; the overlap term addresses boundaries but does not correct this global mis-estimation.

    Authors: We agree that §3.2 presents the reconstruction error as a proxy without a formal derivation or inequality establishing equivalence to the log-density of the composed distribution. The motivation is that, for a single diffusion model, reconstruction error correlates with negative log-likelihood under the learned distribution, and we extend this heuristically to the product of short-horizon models via concatenation. However, we acknowledge that local residuals can dominate when modes conflict and that the overlap term primarily enforces boundary consistency. In the revision we will explicitly label this as an approximation in §3.2, add a paragraph discussing the conditions under which the proxy may degrade, and note that empirical success on OGBench tasks supports its practical utility despite the lack of a tight bound. revision: partial

  2. Referee: [Experiments section, Tables 1-2] Reported outperformance on OGBench lacks quantitative details on effect sizes, standard deviations, number of random seeds, or ablation isolating the reconstruction-error guidance from the overlap-consistency term. Without these, it is impossible to verify whether the claimed mitigation of mode-averaging is robust or driven by one component.

    Authors: We accept that the current tables omit standard deviations, seed counts, effect sizes, and component ablations. The revised manuscript will update Tables 1 and 2 with means and standard deviations over 5 random seeds, include effect-size metrics (e.g., relative improvement percentages), and add a dedicated ablation table that runs RCD with reconstruction-error guidance disabled and with overlap consistency disabled. These additions will allow readers to assess the robustness of mode-averaging mitigation and the individual contributions of each term. revision: yes

  3. Referee: [§4.1, analysis] No theoretical or empirical examination is given of cases where incompatible local modes cause the reconstruction error to fail as a density proxy, nor of whether the combined guidance can still produce incoherent plans in highly multimodal settings.

    Authors: The existing §4.1 emphasizes successful long-horizon results but does not systematically explore failure regimes. We will expand this section with a new subsection that (i) provides theoretical intuition on when local denoising residuals may overwhelm the global density signal and (ii) presents empirical examples drawn from the OGBench locomotion and manipulation tasks, including both cases where the combined guidance yields coherent plans and cases where incompatible modes still produce artifacts. This will give a more balanced characterization of the method’s limitations. revision: yes
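
The reporting promised in the second response (means and standard deviations over seeds, plus relative improvement) amounts to a few lines of aggregation. The sketch below assumes episode outcomes are stored as a 0/1 array of shape (seeds, start-goal pairs, episodes); the 5 × 5 × 20 shape follows the protocol mentioned in the Figure 5 caption text. The success probabilities used to fill the arrays are placeholders invented for the example and are not results from the paper. The same summary would be computed for each ablation variant.

```python
import numpy as np

def per_seed_success(outcomes):
    # outcomes: array of 0/1 with shape (seeds, pairs, episodes).
    return np.asarray(outcomes, dtype=float).mean(axis=(1, 2))

def mean_and_std(per_seed):
    # Mean and sample standard deviation (ddof=1) across seeds.
    per_seed = np.asarray(per_seed, dtype=float)
    return float(per_seed.mean()), float(per_seed.std(ddof=1))

def relative_improvement(method_mean, baseline_mean):
    # Percentage gain of the method over the baseline.
    return 100.0 * (method_mean - baseline_mean) / baseline_mean

rng = np.random.default_rng(0)
shape = (5, 5, 20)  # seeds, start-goal pairs, episodes per pair

# PLACEHOLDER outcomes drawn at random; not measurements from the paper.
baseline_outcomes = rng.random(shape) < 0.5
method_outcomes = rng.random(shape) < 0.7

base_mean, base_std = mean_and_std(per_seed_success(baseline_outcomes))
meth_mean, meth_std = mean_and_std(per_seed_success(method_outcomes))

print(f"baseline: {base_mean:.3f} +/- {base_std:.3f}")
print(f"method:   {meth_mean:.3f} +/- {meth_std:.3f}")
print(f"relative improvement: {relative_improvement(meth_mean, base_mean):.1f}%")
```

Averaging within each seed first and taking the spread across seeds treats the seed as the unit of replication, which matches the "over 5 random seeds" wording in the response.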

Circularity Check

0 steps flagged

No significant circularity; guidance terms are independently defined proxies.

full rationale

The paper introduces RCD as a training-free method that explicitly defines two guidance terms—the self-reconstruction error of a pretrained diffusion model used as a log-density proxy, plus an overlap consistency term—without reducing either to a fitted parameter, a self-citation chain, or a renamed input. The central claim that the combined guidance mitigates mode-averaging is supported by empirical results on OGBench tasks rather than by construction from the method's own definitions. No load-bearing step equates the claimed concentration on high-density plans to the inputs by definition or prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions: (1) that reconstruction error of a pretrained diffusion model correlates with log-density of stitched plans, and (2) that enforcing overlap consistency is sufficient to resolve multimodal conflicts. No free parameters are introduced in the abstract description; the method is presented as training-free.

axioms (2)
  • domain assumption Self-reconstruction error of a pretrained diffusion model serves as a usable proxy for the log-density of a composed plan.
    Invoked in the description of RCD guidance; no derivation or external validation supplied in the abstract.
  • domain assumption Adding an overlap consistency term during sampling will concentrate probability mass on globally coherent plans.
    Stated as part of the combined guidance mechanism.

pith-pipeline@v0.9.0 · 5453 in / 1471 out tokens · 46923 ms · 2026-05-08T17:43:25.085735+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

85 extracted references · 15 canonical work pages

  1. [1]

    floq: Training critics via flow- matching for scaling compute in value-based rl

    Agrawalla, B., Nauman, M., Agrawal, K., and Kumar, A. floq: Training critics via flow- matching for scaling compute in value-based rl. InInternational Conference on Learning Representations (ICLR), 2026

  2. [2]

    Option-aware temporally abstracted value for offline goal-conditioned reinforcement learning

    Ahn, H., Choi, H., Han, J., and Moon, T. Option-aware temporally abstracted value for offline goal-conditioned reinforcement learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

  3. [3]

    B., Jaakkola, T

    Ajay, A., Du, Y ., Gupta, A., Tenenbaum, J. B., Jaakkola, T. S., and Agrawal, P. Is conditional generative modeling all you need for decision making? InInternational Conference on Learning Representations (ICLR), 2023

  4. [4]

    Graph-assisted stitching for offline hierarchical reinforcement learning

    Baek, S., Park, T., Park, J., Oh, S., and Kim, Y . Graph-assisted stitching for offline hierarchical reinforcement learning. InInternational Conference on Machine Learning (ICML), 2025

  5. [5]

    Flooddiffusion: Tailored diffusion forcing for streaming motion generation.arXiv preprint arXiv:2512.03520, 2025

    Cai, Y ., Wu, Y ., Li, K., Zhou, Y ., Zheng, B., and Liu, H. Flooddiffusion: Tailored diffusion forcing for streaming motion generation.arXiv preprint arXiv:2512.03520, 2025

  6. [6]

    M., and Schneider, J

    Char, I., Mehta, V ., Villaflor, A., Dolan, J. M., and Schneider, J. Bats: Best action trajectory stitching.arXiv preprint arXiv:2204.12026, 2022

  7. [7]

    Diffusion forcing: Next-token prediction meets full-sequence diffusion

    Chen, B., Martí Monsó, D., Du, Y ., Simchowitz, M., Tedrake, R., and Sitzmann, V . Diffusion forcing: Next-token prediction meets full-sequence diffusion. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

  8. [8]

    Simple hierarchical planning with diffusion.arXiv preprint arXiv:2401.02644, 2024

    Chen, C., Deng, F., Kawaguchi, K., Gulcehre, C., and Ahn, S. Simple hierarchical planning with diffusion.arXiv preprint arXiv:2401.02644, 2024

  9. [9]

    Extendable long-horizon planning via hierarchical multiscale diffusion.arXiv e-prints, pp

    Chen, C., Hamed, H., Baek, D., Kang, T., Bengio, Y ., and Ahn, S. Extendable long-horizon planning via hierarchical multiscale diffusion.arXiv e-prints, pp. arXiv–2503, 2025

  10. [10]

    Diffusion policy: Visuomotor policy learning via action diffusion

    Chi, C., Feng, S., Du, Y ., Xu, Z., Cousineau, E., Burchfiel, B., and Song, S. Diffusion policy: Visuomotor policy learning via action diffusion. InProceedings of Robotics: Science and Systems (RSS), 2023

  11. [11]

    T., Klasky, M

    Chung, H., Kim, J., Mccann, M. T., Klasky, M. L., and Ye, J. C. Diffusion posterior sampling for general noisy inverse problems. InInternational Conference on Learning Representations (ICLR), 2023

  12. [12]

    and Shkurti, F

    Clark, Q. and Shkurti, F. What do you need for diverse trajectory composition in diffusion planning?arXiv preprint arXiv:2505.18083, 2025

  13. [13]

    and Nichol, A

    Dhariwal, P. and Nichol, A. Diffusion models beat gans on image synthesis. InAdvances in Neural Information Processing Systems (NeurIPS), 2021

  14. [14]

    Proximal action replacement for behavior cloning actor-critic in offline reinforcement learning.arXiv preprint arXiv:2602.07441, 2026

    Dong, J., Huang, W., Zhang, J., Chen, Z., Yuan, X., Gu, Q., Jiang, Z., and Ye, N. Proximal action replacement for behavior cloning actor-critic in offline reinforcement learning.arXiv preprint arXiv:2602.07441, 2026

  15. [15]

    Diffuserlite: Towards real-time diffusion planning

    Dong, Z., Hao, J., Yuan, Y ., Ni, F., Wang, Y ., Li, P., and Zheng, Y . Diffuserlite: Towards real-time diffusion planning. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

  16. [16]

    Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making

    Dong, Z., Yuan, Y ., Hao, J., Ni, F., Ma, Y ., Li, P., and Zheng, Y . Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making. InAdvances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2024

  17. [17]

    Compositional visual generation with energy based models

    Du, Y ., Li, S., and Mordatch, I. Compositional visual generation with energy based models. In Advances in Neural Information Processing Systems (NeurIPS), 2020

  18. [18]

    B., Dieleman, S., Fergus, R., Sohl-Dickstein, J., Doucet, A., and Grathwohl, W

    Du, Y ., Durkan, C., Strudel, R., Tenenbaum, J. B., Dieleman, S., Fergus, R., Sohl-Dickstein, J., Doucet, A., and Grathwohl, W. S. Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and mcmc. InInternational Conference on Machine Learning (ICML), 2023. 10

  19. [19]

    Eysenbach, B., Zhang, T., Levine, S., and Salakhutdinov, R. R. Contrastive learning as goal- conditioned reinforcement learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

  20. [20]

    Temporal difference flows

    Farebrother, J., Pirotta, M., Tirinzoni, A., Munos, R., Lazaric, A., and Touati, A. Temporal difference flows. InInternational Conference on Machine Learning (ICML), 2025

  21. [21]

    Ada- diffuser: Latent-aware adaptive diffusion for decision-making

    Feng, F., Ge, S., Fu, M., Li, Z., Zheng, Y ., Tang, Z., Hu, Y ., Huang, B., and Zhang, K. Ada- diffuser: Latent-aware adaptive diffusion for decision-making. InInternational Conference on Learning Representations (ICLR), 2026

  22. [22]

    Resisting stochastic risks in diffusion planners with the trajectory aggregation tree

    Feng, L., Gu, P., An, B., and Pan, G. Resisting stochastic risks in diffusion planners with the trajectory aggregation tree. InInternational Conference on Machine Learning (ICML), 2024

  23. [23]

    Diffusion guidance is a controllable policy improvement operator.arXiv preprint arXiv:2505.23458, 2025

    Frans, K., Park, S., Abbeel, P., and Levine, S. Diffusion guidance is a controllable policy improvement operator.arXiv preprint arXiv:2505.23458, 2025

  24. [24]

    Learning to reach goals via iterated supervised learning, 2020

    Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C., Eysenbach, B., and Levine, S. Learning to reach goals via iterated supervised learning.arXiv preprint arXiv:1912.06088, 2019

  25. [25]

    Closing the gap between td learning and supervised learning–a generalisation point of view

    Ghugare, R., Geist, M., Berseth, G., and Eysenbach, B. Closing the gap between td learning and supervised learning–a generalisation point of view. InInternational Conference on Learning Representations (ICLR), 2024

  26. [26]

    Hierarchical entity- centric reinforcement learning with factored subgoal diffusion

    Haramati, D., Qi, C., Daniel, T., Zhang, A., Tamar, A., and Konidaris, G. Hierarchical entity- centric reinforcement learning with factored subgoal diffusion. InInternational Conference on Learning Representations (ICLR), 2026

  27. [27]

    Z., Salakhutdinov, R., et al

    He, Y ., Murata, N., Lai, C.-H., Takida, Y ., Uesaka, T., Kim, D., Liao, W.-H., Mitsufuji, Y ., Kolter, J. Z., Salakhutdinov, R., et al. Manifold preserving guided diffusion. InInternational Conference on Learning Representations (ICLR), 2024

  28. [28]

    Hepburn, C. A. and Montana, G. Model-based trajectory stitching for improved offline rein- forcement learning.arXiv preprint arXiv:2211.11603, 2022

  29. [29]

    Denoising diffusion probabilistic models

    Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. InAdvances in Neural Information Processing Systems (NeurIPS), 2020

  30. [30]

    Policy-guided diffusion.arXiv preprint arXiv:2404.06356, 2024

    Jackson, M. T., Matthews, M. T., Lu, C., Ellis, B., Whiteson, S., and Foerster, J. Policy-guided diffusion.arXiv preprint arXiv:2404.06356, 2024

  31. [31]

    B., and Levine, S

    Janner, M., Du, Y ., Tenenbaum, J. B., and Levine, S. Planning with diffusion for flexible behavior synthesis. InInternational Conference on Machine Learning (ICML), 2022

  32. [32]

    Tree-guided diffusion planner

    Jeon, H., Min, C., and Park, J. Tree-guided diffusion planner. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

  33. [33]

    Prior-guided diffusion planning for offline reinforce- ment learning

    Ki, D., Oh, J., Shim, S.-W., and Lee, B.-J. Prior-guided diffusion planning for offline reinforce- ment learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

  34. [34]

    DEAS: DEtached value learning with action sequence for scalable offline RL

    Kim, C., Lee, H., Seo, Y ., Lee, K., and Zhu, Y . DEAS: DEtached value learning with action sequence for scalable offline RL. InInternational Conference on Learning Representations (ICLR), 2026

  35. [35]

    Offline reinforcement learning with implicit q-learning

    Kostrikov, I., Nair, A., and Levine, S. Offline reinforcement learning with implicit q-learning. InInternational Conference on Learning Representations (ICLR), 2022

  36. [36]

    Gta: Generative trajectory augmentation with guidance for offline reinforcement learning.arXiv preprint arXiv:2405.16907, 2024

    Lee, J., Yun, S., Yun, T., and Park, J. Gta: Generative trajectory augmentation with guidance for offline reinforcement learning.arXiv preprint arXiv:2405.16907, 2024

  37. [37]

    and Choi, J

    Lee, K. and Choi, J. Local manifold approximation and projection for manifold-aware diffusion planning. InInternational Conference on Machine Learning (ICML), 2025

  38. [38]

    and Choi, J

    Lee, K. and Choi, J. State-covering trajectory stitching for diffusion planners. InAdvances in Neural Information Processing Systems (NeurIPS), 2025. 11

  39. [39]

    Deep reinforcement learning in continuous action spaces: a case study in the game of simulated curling

    Lee, K., Kim, S.-A., Choi, J., and Lee, S.-W. Deep reinforcement learning in continuous action spaces: a case study in the game of simulated curling. InInternational Conference on Machine Learning (ICML), 2018

  40. [40]

    Refining diffusion planner for reliable behavior synthesis by automatic detection of infeasible plans

    Lee, K., Kim, S., and Choi, J. Refining diffusion planner for reliable behavior synthesis by automatic detection of infeasible plans. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

  41. [41]

    Diffstitch: Boosting offline reinforcement learning with diffusion-based trajectory stitching

    Li, G., Shan, Y ., Zhu, Z., Long, T., and Zhang, W. Diffstitch: Boosting offline reinforcement learning with diffusion-based trajectory stitching. InInternational Conference on Machine Learning (ICML), 2024

  42. [42]

    Decoupled q-chunking

    Li, Q., Park, S., and Levine, S. Decoupled q-chunking. InInternational Conference on Learning Representations (ICLR), 2026

  43. [43]

    Hierarchical diffusion for offline decision making

    Li, W., Wang, X., Jin, B., and Zha, H. Hierarchical diffusion for offline decision making. In International Conference on Machine Learning (ICML), 2023

  44. [44]

    K., Koenig, S., and Fioretto, F

    Liang, J., Christopher, J. K., Koenig, S., and Fioretto, F. Simultaneous multi-robot motion planning with projected diffusion models. InInternational Conference on Machine Learning (ICML), 2025

  45. [45]

    Adaptdiffuser: Diffusion models as adaptive self-evolving planners

    Liang, Z., Mu, Y ., Ding, M., Ni, F., Tomizuka, M., and Luo, P. Adaptdiffuser: Diffusion models as adaptive self-evolving planners. InInternational Conference on Machine Learning (ICML), 2023

  46. [46]

    What makes a good diffusion planner for decision making? InInternational Conference on Learning Representations (ICLR), 2025

    Lu, H., Han, D., Shen, Y ., and Li, D. What makes a good diffusion planner for decision making? InInternational Conference on Learning Representations (ICLR), 2025

  47. [47]

    Improving diffusion planners by self-supervised action gating with energies.arXiv preprint arXiv:2603.02650, 2026

    Lu, Y ., Han, D., Wang, Y ., and Li, D. Improving diffusion planners by self-supervised action gating with energies.arXiv preprint arXiv:2603.02650, 2026

  48. [48]

    B., and Du, Y

    Luo, Y ., Sun, C., Tenenbaum, J. B., and Du, Y . Potential based diffusion motion planning. In International Conference on Machine Learning (ICML), 2024

  49. [49]

    A., Du, Y ., and Xu, D

    Luo, Y ., Mishra, U. A., Du, Y ., and Xu, D. Generative trajectory stitching through diffusion composition. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

  50. [50]

    Learning latent plans from play

    Lynch, C., Khansari, M., Xiao, T., Kumar, V ., Tompson, J., Levine, S., and Sermanet, P. Learning latent plans from play. InConference on robot learning, 2020

  51. [51]

    A., Xue, S., Chen, Y ., and Xu, D

    Mishra, U. A., Xue, S., Chen, Y ., and Xu, D. Generative skill chaining: Long-horizon skill planning with diffusion models. InConference on Robot Learning, pp. 2905–2925. PMLR, 2023

  52. [52]

    A., He, D., Chen, Y ., and Xu, D

    Mishra, U. A., He, D., Chen, Y ., and Xu, D. Compositional diffusion with guided search for long-horizon planning. InInternational Conference on Learning Representations (ICLR), 2026

  53. [53]

    Hdflow: Hierarchical diffusion-flow planning for long-horizon robotic assembly

    Nandiraju, G., Ju, Y ., Xu, C., and Wang, H. Hdflow: Hierarchical diffusion-flow planning for long-horizon robotic assembly. InNeurIPS 2025 Workshop on Embodied World Models for Decision Making, 2025

  54. [54]

    Test-time graph search for goal-conditioned reinforcement learning.arXiv preprint arXiv:2510.07257, 2025

    Opryshko, E., Quan, J., V oelcker, C., Du, Y ., and Gilitschenski, I. Test-time graph search for goal-conditioned reinforcement learning.arXiv preprint arXiv:2510.07257, 2025

  55. [55]

    Scalable offline model-based RL with action chunks

    Park, K., Park, S., Lee, Y ., and Levine, S. Scalable offline model-based RL with action chunks. InInternational Conference on Learning Representations (ICLR), 2026

  56. [56]

    Offline goal-conditioned rl with latent states as actions

    Park, S., Ghosh, D., Eysenbach, B., and Levine, S. Offline goal-conditioned rl with latent states as actions. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

  57. [57]

    Foundation policies with hilbert representations

    Park, S., Kreiman, T., and Levine, S. Foundation policies with hilbert representations. In International Conference on Machine Learning (ICML), 2024. 12

  58. [58]

    Ogbench: Benchmarking offline goal- conditioned rl

    Park, S., Frans, K., Eysenbach, B., and Levine, S. Ogbench: Benchmarking offline goal- conditioned rl. InInternational Conference on Learning Representations (ICLR), 2025

  59. [59]

    Horizon reduction makes rl scalable

    Park, S., Frans, K., Mann, D., Eysenbach, B., Kumar, A., and Levine, S. Horizon reduction makes rl scalable. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

  60. [60]

    Flow q-learning

    Park, S., Li, Q., and Levine, S. Flow q-learning. InInternational Conference on Machine Learning (ICML), 2025

  61. [61]

    Dual goal representations

    Park, S., Mann, D., and Levine, S. Dual goal representations. In International Conference on Learning Representations (ICLR), 2026

  62. [62]

    Transitive rl: Value learning via divide and conquer

    Park, S., Oberai, A., Atreya, P., and Levine, S. Transitive rl: Value learning via divide and conquer. In International Conference on Learning Representations (ICLR), 2026

  63. [63]

    Driftlite: Lightweight drift control for inference-time scaling of diffusion models

    Ren, Y., Gao, W., Ying, L., Rotskoff, G. M., and Han, J. Driftlite: Lightweight drift control for inference-time scaling of diffusion models. In International Conference on Learning Representations (ICLR), 2026

  64. [64]

    Robbins, H. E. An empirical bayes approach to statistics. In Breakthroughs in Statistics: Foundations and basic theory, pp. 388–394. Springer, 1992

  65. [65]

    Multi-robot motion planning with diffusion models

    Shaoul, Y., Mishani, I., Vats, S., Li, J., and Likhachev, M. Multi-robot motion planning with diffusion models. In International Conference on Learning Representations (ICLR), 2025

  66. [66]

    Understanding and improving training-free loss-based diffusion guidance

    Shen, Y., Jiang, X., Yang, Y., Wang, Y., Han, D., and Li, D. Understanding and improving training-free loss-based diffusion guidance. In Advances in Neural Information Processing Systems (NeurIPS), 2024

  67. [67]

    Mastering the game of go with deep neural networks and tree search

    Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016

  68. [68]

    Mastering the game of go without human knowledge

    Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al. Mastering the game of go without human knowledge. Nature, 550(7676):354–359, 2017

  69. [69]

    Denoising diffusion implicit models

    Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations (ICLR), 2021

  70. [70]

    Loss-guided diffusion models for plug-and-play controllable generation

    Song, J., Zhang, Q., Yin, H., Mardani, M., Liu, M.-Y., Kautz, J., Chen, Y., and Vahdat, A. Loss-guided diffusion models for plug-and-play controllable generation. In International Conference on Machine Learning (ICML), 2023

  71. [71]

    Score-based generative modeling through stochastic differential equations

    Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR), 2021

  72. [72]

    Synthesis and stabilization of complex behaviors through online trajectory optimization

    Tassa, Y., Erez, T., and Todorov, E. Synthesis and stabilization of complex behaviors through online trajectory optimization. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4906–4913, 2012

  73. [73]

    Optimal goal-reaching reinforcement learning via quasimetric learning

    Wang, T., Torralba, A., Isola, P., and Zhang, A. Optimal goal-reaching reinforcement learning via quasimetric learning. In International Conference on Machine Learning (ICML), 2023

  74. [74]

    Inference-time policy steering through human interactions

    Wang, Y., Wang, L., Du, Y., Sundaralingam, B., Yang, X., Chao, Y.-W., Pérez-D’Arpino, C., Fox, D., and Shah, J. Inference-time policy steering through human interactions. In IEEE International Conference on Robotics and Automation (ICRA), 2025

  75. [75]

    Diffusion policies as an expressive policy class for offline reinforcement learning

    Wang, Z., Hunt, J. J., and Zhou, M. Diffusion policies as an expressive policy class for offline reinforcement learning. arXiv preprint arXiv:2208.06193, 2022

  76. [76]

    Latent diffusion planning for imitation learning

    Xie, A., Rybkin, O., Sadigh, D., and Finn, C. Latent diffusion planning for imitation learning. In International Conference on Machine Learning (ICML), 2025

  77. [77]

    Guidance with spherical gaussian constraint for conditional diffusion

    Yang, L., Ding, S., Cai, Y., Yu, J., Wang, J., and Shi, Y. Guidance with spherical gaussian constraint for conditional diffusion. In International Conference on Machine Learning (ICML), 2024

  78. [78]

    Constructing free-energy approximations and generalized belief propagation algorithms

    Yedidia, J. S., Freeman, W. T., and Weiss, Y. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on information theory, 51(7):2282–2312, 2005

  79. [79]

    Monte carlo tree diffusion for system 2 planning

    Yoon, J., Cho, H., Baek, D., Bengio, Y., and Ahn, S. Monte carlo tree diffusion for system 2 planning. In International Conference on Machine Learning (ICML), 2025

  80. [80]

    Fast monte carlo tree diffusion: 100x speedup via parallel sparse planning

    Yoon, J., Cho, H., Bengio, Y., and Ahn, S. Fast monte carlo tree diffusion: 100x speedup via parallel sparse planning. arXiv preprint arXiv:2506.09498, 2025

Showing first 80 references.