arxiv: 2605.03075 · v1 · submitted 2026-05-04 · 💻 cs.RO · cs.AI · cs.LG

Refining Compositional Diffusion for Reliable Long-Horizon Planning

Anh Tong, Jaesik Choi, Kyowoon Lee, Yunhao Luo

Pith reviewed 2026-05-08 17:43 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG

keywords: compositional diffusion, long-horizon planning, mode averaging, guidance method, diffusion models, robotics, trajectory stitching

The pith

Refining compositional diffusion uses reconstruction error and overlap consistency to steer sampling toward high-density coherent long-horizon plans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Compositional diffusion plans long trajectories by stitching overlapping short segments from pretrained models, but multimodal local distributions produce mode-averaged outputs that are neither feasible nor coherent. Refining Compositional Diffusion (RCD) introduces training-free guidance that treats a pretrained diffusion model's self-reconstruction error as a proxy for the log-density of the full composed plan. An additional overlap consistency term penalizes mismatches at segment boundaries. Together these terms concentrate the sampling trajectory on high-density regions, reducing mode-averaging without any retraining. Experiments on locomotion, manipulation, and pixel-based tasks from OGBench show consistent gains over prior compositional baselines.
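To make the mode-averaging failure concrete, here is a minimal sketch under simplifying assumptions: a one-dimensional overlap state whose valid values sit at +1 and −1 (as in the toy setting described for Figure 1), and a plain arithmetic average standing in for score composition. The names `mixture_logpdf`, `left`, `right`, and the noise scale 0.1 are illustrative choices, not taken from the paper.

```python
import numpy as np

def mixture_logpdf(x, modes=(-1.0, 1.0), sigma=0.1):
    """Log-density of an equal-weight Gaussian mixture centred on the modes."""
    x = np.asarray(x, dtype=float)[..., None]
    comps = (-0.5 * ((x - np.array(modes)) / sigma) ** 2
             - np.log(sigma * np.sqrt(2.0 * np.pi)))
    return np.logaddexp.reduce(comps, axis=-1) - np.log(len(modes))

rng = np.random.default_rng(0)
n = 1000
# Two neighbouring segments each pick a mode independently at the shared state.
left = rng.choice([-1.0, 1.0], size=n) + 0.1 * rng.normal(size=n)
right = rng.choice([-1.0, 1.0], size=n) + 0.1 * rng.normal(size=n)

# Naive stitching (assumption): average the two proposals for the overlap state.
averaged = 0.5 * (left + right)

# A stitched state is "off-mode" if it is far from both +1 and -1.
off_mode_fraction = float(np.mean(np.abs(np.abs(averaged) - 1.0) > 0.5))
print("off-mode fraction:", off_mode_fraction)
print("log-density at 0 vs 1:", mixture_logpdf(0.0), mixture_logpdf(1.0))
```

Whenever the two segments disagree on the mode, the average lands near 0, where the mixture density is far lower than at either mode. That low-density region is what the guidance described above is meant to steer away from.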

Core claim

RCD is a guidance procedure that adds the self-reconstruction error of a pretrained diffusion model, interpreted as log-density of the stitched plan, to the composed score, together with an explicit overlap consistency penalty; the resulting guidance concentrates samples on high-density, boundary-consistent plans that avoid the mode-averaging failure mode of naive score composition.
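The overlap consistency penalty can be sketched as follows, assuming (hypothetically) that a plan is stored as an array of K segments with shape (K, L, D), where the last `overlap` steps of each segment should match the first `overlap` steps of the next. The squared-error form, the function names, and the plain gradient step are assumptions for illustration; this page does not give the paper's exact expression.

```python
import numpy as np

def overlap_penalty(segments, overlap):
    """Sum of squared mismatches between each segment's tail and the next head."""
    tails = segments[:-1, -overlap:]
    heads = segments[1:, :overlap]
    return float(((tails - heads) ** 2).sum())

def overlap_penalty_grad(segments, overlap):
    """Analytic gradient of overlap_penalty with respect to every segment entry."""
    diff = segments[:-1, -overlap:] - segments[1:, :overlap]
    grad = np.zeros_like(segments)
    grad[:-1, -overlap:] += 2.0 * diff   # tails are pulled toward the next head
    grad[1:, :overlap] -= 2.0 * diff     # heads are pulled toward the previous tail
    return grad

# Three segments of length 4 in a 2-D state space; the middle one disagrees.
segments = np.zeros((3, 4, 2))
segments[1] += 1.0
before = overlap_penalty(segments, overlap=1)
stepped = segments - 0.1 * overlap_penalty_grad(segments, overlap=1)
after = overlap_penalty(stepped, overlap=1)
print(before, after)
```

In a guided sampler, a gradient of this kind would be scaled by an overlap weight and added to the composed score at each denoising step. It only acts on boundary states, so it cannot by itself tell a feasible segment interior from an infeasible one.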

What carries the argument

Self-reconstruction error treated as log-density proxy, combined with an overlap consistency term, used as additive guidance on the composed score.

If this is right

Sampling concentrates on high-density regions of the composed plan distribution.
Mode-averaging is reduced, yielding plans that remain locally feasible and globally coherent.
No retraining or architectural change is required; the method applies to any pretrained diffusion planner.
Performance improves on long-horizon locomotion, object manipulation, and pixel-based tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same reconstruction-error signal could serve as a cheap diagnostic for plan quality in other diffusion-based planners.
Boundary consistency terms may be useful beyond diffusion when stitching any locally generated trajectory segments.
If the proxy holds, it suggests that internal model signals already encode enough information to resolve global coherence without external rewards.

Load-bearing premise

The self-reconstruction error of the pretrained diffusion model remains a faithful proxy for the log-density of the stitched plan even when adjacent segments contain incompatible local modes.
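The premise can be probed on a toy problem. The sketch below estimates the reconstruction error E_recon(τ̂_0; s) = E_ε[‖τ̂_0 − τ̂_0^rec(τ̂_0, ε, s)‖²] by Monte Carlo, with two loud assumptions: the pretrained network is replaced by the exact posterior-mean denoiser of a one-dimensional two-mode Gaussian mixture, and the forward process is variance-preserving with an arbitrary noise scale `beta = 0.3`. The names `toy_denoiser` and `recon_error` are hypothetical.

```python
import numpy as np

MODES = np.array([-1.0, 1.0])   # toy data modes (assumption)
SIGMA = 0.1                     # per-mode standard deviation (assumption)

def toy_denoiser(x_s, alpha, beta):
    """Exact E[x0 | x_s] when x0 ~ equal-weight mixture and x_s = alpha*x0 + beta*eps."""
    x_s = np.asarray(x_s, dtype=float)[..., None]
    var = alpha**2 * SIGMA**2 + beta**2
    log_r = -0.5 * (x_s - alpha * MODES) ** 2 / var
    r = np.exp(log_r - log_r.max(axis=-1, keepdims=True))
    r /= r.sum(axis=-1, keepdims=True)            # mode responsibilities
    cond_mean = MODES + alpha * SIGMA**2 / var * (x_s - alpha * MODES)
    return (r * cond_mean).sum(axis=-1)

def recon_error(x0, beta, n_noise=256, seed=0):
    """Monte Carlo estimate of the squared self-reconstruction error per entry."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    alpha = np.sqrt(1.0 - beta**2)
    eps = rng.normal(size=(n_noise,) + x0.shape)
    x_rec = toy_denoiser(alpha * x0 + beta * eps, alpha, beta)
    return ((x0 - x_rec) ** 2).mean(axis=0)

errors = recon_error(np.array([-1.0, 0.0, 1.0]), beta=0.3)
print(dict(zip(["x0=-1", "x0=0", "x0=+1"], errors)))
```

In this toy case the error is small at the two modes and large at the mode-averaged point 0, which is the behaviour the proxy relies on. Whether the same ordering holds for a learned denoiser applied to a stitched plan with conflicting segments is the open question the premise raises.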

What would settle it

A controlled test in which deliberately incompatible local modes are stitched; if the guided sampler still produces plans with feasibility rates no better than unguided compositional diffusion, the density-proxy assumption fails.

Figures

Figures reproduced from arXiv: 2605.03075 by Anh Tong, Jaesik Choi, Kyowoon Lee, Yunhao Luo.

**Figure 1.** Toy illustration of the mode-averaging problem. (a) Training data consists of overlapping l=3 segments, each shown in a distinct color. They are anchored at a fixed start and goal, and pass through a bimodal distribution with modes at +1 and −1. (b) CompDiffuser averages incompatible modes in overlap regions, producing many invalid (red) trajectories off both modes. (c) RCD guides the denoising toward hig…

**Figure 2.** Overview of RCD, contrasted with CompDiffuser and CDGS. Trajectories are planned from start to goal over a transition distribution. (a) CompDiffuser composes overlapping segments via bidirectional conditioning and autoregressive denoising, but produces mode-averaged trajectories in low-density regions when local distributions are multimodal. (b) CDGS mitigates this through population-based search and prun…

**Figure 3.** Plan quality comparisons of CompDiffuser, CDGS, and RCD. Each column shows 20 sampled plans in the AntMaze-Giant-Stitch environment for 5 test-time tasks defined in OGBench. Plans that violate environment constraints (wall penetration) are shown in red; feasible plans in green.

**Figure 4.** A cube manipulation sequence executed by RCD.

**Figure 5.** Plan quality comparisons on PointMaze-Giant-Stitch. Each column shows 20 sampled plans from CompDiffuser, CDGS, and RCD for 5 test-time tasks defined in OGBench. Plans that violate environment constraints (wall penetration) are shown in red; feasible plans in green. … averaged over 5 random seeds, each evaluated on 5 start-goal pairs with 20 episodes per pair. The defaults (w=0.25, λ_ov=0.5, s/T=0.4) are high…

**Figure 6.** Visualization of RCD rollout execution on AntMaze-Giant-Stitch. The ant agent navigates from the starting region to the pink goal.

**Figure 7.** Visualization of RCD rollout execution on AntSoccer-Medium-Stitch. The ant agent dribbles the soccer ball toward the goal location.

**Figure 8.** Visualization of RCD rollout execution on Cube-Triple. The 6-DoF UR5e robot arm arranges three cubes into their target configuration via pick-and-place. Frames are shown at t = 0, 55, 110, 165, 220, 275, 330, and 385.

**Figure 9.** Visualization of RCD rollout execution on Visual-AntMaze-Medium-Stitch. The ant agent navigates from the starting region to the goal in a long-horizon task where start and goal are at opposite ends of the maze. Frames are the env-rendered 64×64 pixel observations. Planning is performed in a learned 16-dimensional VAE latent space derived from these pixel renders.

Original abstract

Compositional diffusion planning generates long-horizon trajectories by stitching together overlapping short-horizon segments through score composition. However, when local plan distributions are multimodal, existing compositional methods suffer from mode-averaging, where averaging incompatible local modes leads to plans that are neither locally feasible nor globally coherent. We propose Refining Compositional Diffusion (RCD), a training-free guidance method that steers compositional sampling toward high-density, globally coherent plans. RCD leverages the self-reconstruction error of a pretrained diffusion model as a proxy for the log-density of composed plans, combined with an overlap consistency term that enforces consistency at segment boundaries. We show that the combined guidance concentrates sampling on high-density plans that mitigate mode-averaging. Experiments on challenging long-horizon tasks from OGBench, including locomotion, object manipulation, and pixel-based observations, demonstrate that RCD consistently outperforms existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

RCD adds reconstruction-error guidance plus overlap consistency to compositional diffusion planning and shows gains on OGBench, but the proxy assumption for stitched density still rests on heuristics.

RCD refines compositional diffusion planning by steering sampling with two training-free terms: the pretrained model's self-reconstruction error on the full concatenated trajectory as a density proxy, and an explicit overlap consistency penalty at segment boundaries. The goal is to reduce mode averaging when local short-horizon plans are multimodal and incompatible. This specific pairing is not in the earlier compositional diffusion papers they cite, so the combination counts as the incremental novelty here.

The experiments run across locomotion, object manipulation, and pixel observations on OGBench and report consistent outperformance over prior stitching methods, which gives the claim some practical grounding without requiring new model training. That keeps the bar low for adoption in existing diffusion planners.

The soft spot is the central assumption that reconstruction error on the stitched plan reliably tracks log-density of the composed distribution. When local modes conflict, the joint error can be driven by local denoising residuals rather than the product measure, and the overlap term only fixes boundary mismatch. The paper does not appear to supply a derivation or bound showing why minimizing reconstruction error aligns with maximizing the composed density, so the justification stays empirical. Ablations isolating each guidance term and reporting of effect sizes with variance would make the results easier to interpret.

This is aimed at people already working on diffusion-based long-horizon control in robotics. The method is simple enough to implement on top of existing codebases, so readers in that niche can test it directly. I would send it to peer review. The experiments give it enough substance to merit referee input, even though the theoretical link between reconstruction error and composed density could be tightened.

Referee Report

3 major / 2 minor

Summary. The paper claims that compositional diffusion planning for long-horizon trajectories suffers from mode-averaging when stitching multimodal short-horizon segments, and proposes Refining Compositional Diffusion (RCD) as a training-free guidance technique. RCD combines the self-reconstruction error of a pretrained diffusion model (used as a proxy for the log-density of the composed plan) with an overlap consistency term to steer sampling toward high-density, coherent plans. Experiments on OGBench tasks (locomotion, object manipulation, pixel observations) show consistent outperformance over prior compositional methods.

Significance. If the reconstruction-error proxy reliably approximates the composed log-density and the guidance mitigates mode-averaging without new artifacts, the work would be significant for enabling reliable long-horizon planning from pretrained short-horizon models in a training-free manner. Strengths include the training-free design, explicit use of a pretrained model, and evaluation across diverse OGBench domains including pixel-based observations.

major comments (3)

[§3.2, method] The central claim that self-reconstruction error on the concatenated trajectory serves as a proxy for log-density of the composed (product) distribution lacks any derivation, bound, or inequality showing that argmin of this error coincides with argmax of log p_composed. When local modes conflict, the joint error can be dominated by local denoising residuals rather than the measure of the stitched distribution; the overlap term addresses boundaries but does not correct this global mis-estimation.
[Experiments, Tables 1-2] Reported outperformance on OGBench lacks quantitative details on effect sizes, standard deviations, number of random seeds, or ablation isolating the reconstruction-error guidance from the overlap-consistency term. Without these, it is impossible to verify whether the claimed mitigation of mode-averaging is robust or driven by one component.
[§4.1, analysis] No theoretical or empirical examination is given of cases where incompatible local modes cause the reconstruction error to fail as a density proxy, nor of whether the combined guidance can still produce incoherent plans in highly multimodal settings.

minor comments (2)

[Abstract] The abstract would benefit from brief quantitative effect sizes or success-rate deltas to ground the 'consistently outperforms' claim.
[§3] Notation for the two guidance scales (e.g., weighting coefficients) could be introduced earlier and used consistently in the method equations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive review and the recommendation for major revision. We appreciate the focus on strengthening the theoretical motivation, experimental reporting, and analysis of limitations. We address each major comment below and will incorporate revisions to improve the manuscript.

Referee: [§3.2, method] The central claim that self-reconstruction error on the concatenated trajectory serves as a proxy for log-density of the composed (product) distribution lacks any derivation, bound, or inequality showing that argmin of this error coincides with argmax of log p_composed. When local modes conflict, the joint error can be dominated by local denoising residuals rather than the measure of the stitched distribution; the overlap term addresses boundaries but does not correct this global mis-estimation.

Authors: We agree that §3.2 presents the reconstruction error as a proxy without a formal derivation or inequality establishing equivalence to the log-density of the composed distribution. The motivation is that, for a single diffusion model, reconstruction error correlates with negative log-likelihood under the learned distribution, and we extend this heuristically to the product of short-horizon models via concatenation. However, we acknowledge that local residuals can dominate when modes conflict and that the overlap term primarily enforces boundary consistency. In the revision we will explicitly label this as an approximation in §3.2, add a paragraph discussing the conditions under which the proxy may degrade, and note that empirical success on OGBench tasks supports its practical utility despite the lack of a tight bound.
Revision: partial

Referee: [Experiments, Tables 1-2] Reported outperformance on OGBench lacks quantitative details on effect sizes, standard deviations, number of random seeds, or ablation isolating the reconstruction-error guidance from the overlap-consistency term. Without these, it is impossible to verify whether the claimed mitigation of mode-averaging is robust or driven by one component.

Authors: We accept that the current tables omit standard deviations, seed counts, effect sizes, and component ablations. The revised manuscript will update Tables 1 and 2 with means and standard deviations over 5 random seeds, include effect-size metrics (e.g., relative improvement percentages), and add a dedicated ablation table that runs RCD with reconstruction-error guidance disabled and with overlap consistency disabled. These additions will allow readers to assess the robustness of mode-averaging mitigation and the individual contributions of each term.
Revision: yes

Referee: [§4.1, analysis] No theoretical or empirical examination is given of cases where incompatible local modes cause the reconstruction error to fail as a density proxy, nor of whether the combined guidance can still produce incoherent plans in highly multimodal settings.

Authors: The existing §4.1 emphasizes successful long-horizon results but does not systematically explore failure regimes. We will expand this section with a new subsection that (i) provides theoretical intuition on when local denoising residuals may overwhelm the global density signal and (ii) presents empirical examples drawn from the OGBench locomotion and manipulation tasks, including both cases where the combined guidance yields coherent plans and cases where incompatible modes still produce artifacts. This will give a more balanced characterization of the method’s limitations.
Revision: yes
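The reporting requested for Tables 1 and 2 (mean and standard deviation across seeds, plus a relative-improvement effect size) could be computed along these lines. This is a sketch only: the helper names are hypothetical, and the numbers in the example are synthetic placeholders, not results from the paper.

```python
import numpy as np

def summarize_runs(success_by_seed):
    """Return (mean, sample standard deviation) of per-seed success rates."""
    values = np.asarray(success_by_seed, dtype=float)
    return float(values.mean()), float(values.std(ddof=1))

def relative_improvement(new_mean, base_mean):
    """Effect size as a fraction of the baseline mean, e.g. 0.2 means +20%."""
    return (new_mean - base_mean) / base_mean

# Synthetic placeholder success rates for five seeds (NOT results from the paper).
baseline_runs = [0.40, 0.45, 0.50, 0.55, 0.60]
guided_runs = [0.55, 0.60, 0.60, 0.60, 0.65]

base_mean, base_std = summarize_runs(baseline_runs)
new_mean, new_std = summarize_runs(guided_runs)
print(f"baseline {base_mean:.2f} ± {base_std:.2f}")
print(f"guided   {new_mean:.2f} ± {new_std:.2f}")
print(f"relative improvement {relative_improvement(new_mean, base_mean):+.0%}")
```

Running the same summary for each ablation arm (reconstruction-error guidance off, overlap consistency off) would show how much each term contributes on its own.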

Circularity Check

0 steps flagged

No significant circularity; guidance terms are independently defined proxies.

full rationale

The paper introduces RCD as a training-free method that explicitly defines two guidance terms—the self-reconstruction error of a pretrained diffusion model used as a log-density proxy, plus an overlap consistency term—without reducing either to a fitted parameter, a self-citation chain, or a renamed input. The central claim that the combined guidance mitigates mode-averaging is supported by empirical results on OGBench tasks rather than by construction from the method's own definitions. No load-bearing step equates the claimed concentration on high-density plans to the inputs by definition or prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions: (1) that reconstruction error of a pretrained diffusion model correlates with log-density of stitched plans, and (2) that enforcing overlap consistency is sufficient to resolve multimodal conflicts. No free parameters are introduced in the abstract description; the method is presented as training-free.

axioms (2)

Domain assumption: Self-reconstruction error of a pretrained diffusion model serves as a usable proxy for the log-density of a composed plan.
Invoked in the description of RCD guidance; no derivation or external validation supplied in the abstract.
Domain assumption: Adding an overlap consistency term during sampling will concentrate probability mass on globally coherent plans.
Stated as part of the combined guidance mechanism.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost (J(x) = ½(x+x⁻¹) − 1) · washburn_uniqueness_aczel · unclear
E_recon(τ̂_0;s) = E_ε[ ‖τ̂_0 − τ̂_0^rec(τ̂_0,ε,s)‖^2 ]
Foundation (parameter-free forcing chain) · reality_from_one_distinction · unclear
We use a single fixed set of RCD hyperparameters (w=0.25, λ_ov=0.5, s/T=0.4) across every benchmark
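The fixed hyperparameter set quoted above could be held in a small immutable config, sketched here. The values 0.25, 0.5, and 0.4 come from the quoted sentence; the field names, the reading of `w` as a reconstruction-guidance weight, and the reading of s/T as a fraction of T diffusion steps are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RCDConfig:
    """Hypothetical container for the three quoted RCD hyperparameters."""
    w: float = 0.25          # assumed: weight on reconstruction-error guidance
    lambda_ov: float = 0.5   # assumed: weight on the overlap consistency term
    s_over_t: float = 0.4    # assumed: noise level s as a fraction of T steps

    def noise_step(self, total_steps: int) -> int:
        """Convert the s/T ratio into an integer step index for a T-step sampler."""
        return round(self.s_over_t * total_steps)

config = RCDConfig()
print(config, config.noise_step(1000))
```

Freezing the dataclass mirrors the claim that one setting is reused across every benchmark: any per-task override would have to be made explicitly by constructing a new instance.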

Reference graph

Works this paper leans on

85 extracted references · 15 canonical work pages

[1]

floq: Training critics via flow- matching for scaling compute in value-based rl

Agrawalla, B., Nauman, M., Agrawal, K., and Kumar, A. floq: Training critics via flow- matching for scaling compute in value-based rl. InInternational Conference on Learning Representations (ICLR), 2026

2026
[2]

Option-aware temporally abstracted value for offline goal-conditioned reinforcement learning

Ahn, H., Choi, H., Han, J., and Moon, T. Option-aware temporally abstracted value for offline goal-conditioned reinforcement learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

2025
[3]

B., Jaakkola, T

Ajay, A., Du, Y ., Gupta, A., Tenenbaum, J. B., Jaakkola, T. S., and Agrawal, P. Is conditional generative modeling all you need for decision making? InInternational Conference on Learning Representations (ICLR), 2023

2023
[4]

Graph-assisted stitching for offline hierarchical reinforcement learning

Baek, S., Park, T., Park, J., Oh, S., and Kim, Y . Graph-assisted stitching for offline hierarchical reinforcement learning. InInternational Conference on Machine Learning (ICML), 2025

2025
[5]

Flooddiffusion: Tailored diffusion forcing for streaming motion generation.arXiv preprint arXiv:2512.03520, 2025

Cai, Y ., Wu, Y ., Li, K., Zhou, Y ., Zheng, B., and Liu, H. Flooddiffusion: Tailored diffusion forcing for streaming motion generation.arXiv preprint arXiv:2512.03520, 2025

work page arXiv 2025
[6]

M., and Schneider, J

Char, I., Mehta, V ., Villaflor, A., Dolan, J. M., and Schneider, J. Bats: Best action trajectory stitching.arXiv preprint arXiv:2204.12026, 2022

work page arXiv 2022
[7]

Diffusion forcing: Next-token prediction meets full-sequence diffusion

Chen, B., Martí Monsó, D., Du, Y ., Simchowitz, M., Tedrake, R., and Sitzmann, V . Diffusion forcing: Next-token prediction meets full-sequence diffusion. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

2024
[8]

Simple hierarchical planning with diffusion.arXiv preprint arXiv:2401.02644, 2024

Chen, C., Deng, F., Kawaguchi, K., Gulcehre, C., and Ahn, S. Simple hierarchical planning with diffusion.arXiv preprint arXiv:2401.02644, 2024

work page arXiv 2024
[9]

Extendable long-horizon planning via hierarchical multiscale diffusion.arXiv e-prints, pp

Chen, C., Hamed, H., Baek, D., Kang, T., Bengio, Y ., and Ahn, S. Extendable long-horizon planning via hierarchical multiscale diffusion.arXiv e-prints, pp. arXiv–2503, 2025

2025
[10]

Diffusion policy: Visuomotor policy learning via action diffusion

Chi, C., Feng, S., Du, Y ., Xu, Z., Cousineau, E., Burchfiel, B., and Song, S. Diffusion policy: Visuomotor policy learning via action diffusion. InProceedings of Robotics: Science and Systems (RSS), 2023

2023
[11]

T., Klasky, M

Chung, H., Kim, J., Mccann, M. T., Klasky, M. L., and Ye, J. C. Diffusion posterior sampling for general noisy inverse problems. InInternational Conference on Learning Representations (ICLR), 2023

2023
[12]

and Shkurti, F

Clark, Q. and Shkurti, F. What do you need for diverse trajectory composition in diffusion planning?arXiv preprint arXiv:2505.18083, 2025

work page arXiv 2025
[13]

and Nichol, A

Dhariwal, P. and Nichol, A. Diffusion models beat gans on image synthesis. InAdvances in Neural Information Processing Systems (NeurIPS), 2021

2021
[14]

Proximal action replacement for behavior cloning actor-critic in offline reinforcement learning.arXiv preprint arXiv:2602.07441, 2026

Dong, J., Huang, W., Zhang, J., Chen, Z., Yuan, X., Gu, Q., Jiang, Z., and Ye, N. Proximal action replacement for behavior cloning actor-critic in offline reinforcement learning.arXiv preprint arXiv:2602.07441, 2026

work page arXiv 2026
[15]

Diffuserlite: Towards real-time diffusion planning

Dong, Z., Hao, J., Yuan, Y ., Ni, F., Wang, Y ., Li, P., and Zheng, Y . Diffuserlite: Towards real-time diffusion planning. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

2024
[16]

Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making

Dong, Z., Yuan, Y ., Hao, J., Ni, F., Ma, Y ., Li, P., and Zheng, Y . Cleandiffuser: An easy-to-use modularized library for diffusion models in decision making. InAdvances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2024

2024
[17]

Compositional visual generation with energy based models

Du, Y ., Li, S., and Mordatch, I. Compositional visual generation with energy based models. In Advances in Neural Information Processing Systems (NeurIPS), 2020

2020
[18]

B., Dieleman, S., Fergus, R., Sohl-Dickstein, J., Doucet, A., and Grathwohl, W

Du, Y ., Durkan, C., Strudel, R., Tenenbaum, J. B., Dieleman, S., Fergus, R., Sohl-Dickstein, J., Doucet, A., and Grathwohl, W. S. Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and mcmc. InInternational Conference on Machine Learning (ICML), 2023. 10

2023
[19]

Eysenbach, B., Zhang, T., Levine, S., and Salakhutdinov, R. R. Contrastive learning as goal- conditioned reinforcement learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2022

2022
[20]

Temporal difference flows

Farebrother, J., Pirotta, M., Tirinzoni, A., Munos, R., Lazaric, A., and Touati, A. Temporal difference flows. InInternational Conference on Machine Learning (ICML), 2025

2025
[21]

Ada- diffuser: Latent-aware adaptive diffusion for decision-making

Feng, F., Ge, S., Fu, M., Li, Z., Zheng, Y ., Tang, Z., Hu, Y ., Huang, B., and Zhang, K. Ada- diffuser: Latent-aware adaptive diffusion for decision-making. InInternational Conference on Learning Representations (ICLR), 2026

2026
[22]

Resisting stochastic risks in diffusion planners with the trajectory aggregation tree

Feng, L., Gu, P., An, B., and Pan, G. Resisting stochastic risks in diffusion planners with the trajectory aggregation tree. InInternational Conference on Machine Learning (ICML), 2024

2024
[23]

Diffusion guidance is a controllable policy improvement operator.arXiv preprint arXiv:2505.23458, 2025

Frans, K., Park, S., Abbeel, P., and Levine, S. Diffusion guidance is a controllable policy improvement operator.arXiv preprint arXiv:2505.23458, 2025

work page arXiv 2025
[24]

Learning to reach goals via iterated supervised learning, 2020

Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C., Eysenbach, B., and Levine, S. Learning to reach goals via iterated supervised learning.arXiv preprint arXiv:1912.06088, 2019

work page arXiv 1912
[25]

Closing the gap between td learning and supervised learning–a generalisation point of view

Ghugare, R., Geist, M., Berseth, G., and Eysenbach, B. Closing the gap between td learning and supervised learning–a generalisation point of view. InInternational Conference on Learning Representations (ICLR), 2024

2024
[26]

Hierarchical entity- centric reinforcement learning with factored subgoal diffusion

Haramati, D., Qi, C., Daniel, T., Zhang, A., Tamar, A., and Konidaris, G. Hierarchical entity- centric reinforcement learning with factored subgoal diffusion. InInternational Conference on Learning Representations (ICLR), 2026

2026
[27]

Z., Salakhutdinov, R., et al

He, Y ., Murata, N., Lai, C.-H., Takida, Y ., Uesaka, T., Kim, D., Liao, W.-H., Mitsufuji, Y ., Kolter, J. Z., Salakhutdinov, R., et al. Manifold preserving guided diffusion. InInternational Conference on Learning Representations (ICLR), 2024

2024
[28]

Hepburn, C. A. and Montana, G. Model-based trajectory stitching for improved offline rein- forcement learning.arXiv preprint arXiv:2211.11603, 2022

work page arXiv 2022
[29]

Denoising diffusion probabilistic models

Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. InAdvances in Neural Information Processing Systems (NeurIPS), 2020

2020
[30]

Policy-guided diffusion.arXiv preprint arXiv:2404.06356, 2024

Jackson, M. T., Matthews, M. T., Lu, C., Ellis, B., Whiteson, S., and Foerster, J. Policy-guided diffusion.arXiv preprint arXiv:2404.06356, 2024

work page arXiv 2024
[31]

B., and Levine, S

Janner, M., Du, Y ., Tenenbaum, J. B., and Levine, S. Planning with diffusion for flexible behavior synthesis. InInternational Conference on Machine Learning (ICML), 2022

2022
[32]

Tree-guided diffusion planner

Jeon, H., Min, C., and Park, J. Tree-guided diffusion planner. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

2025
[33]

Prior-guided diffusion planning for offline reinforce- ment learning

Ki, D., Oh, J., Shim, S.-W., and Lee, B.-J. Prior-guided diffusion planning for offline reinforce- ment learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

2025
[34]

DEAS: DEtached value learning with action sequence for scalable offline RL

Kim, C., Lee, H., Seo, Y ., Lee, K., and Zhu, Y . DEAS: DEtached value learning with action sequence for scalable offline RL. InInternational Conference on Learning Representations (ICLR), 2026

2026
[35]

Offline reinforcement learning with implicit q-learning

Kostrikov, I., Nair, A., and Levine, S. Offline reinforcement learning with implicit q-learning. InInternational Conference on Learning Representations (ICLR), 2022

2022
[36]

Gta: Generative trajectory augmentation with guidance for offline reinforcement learning.arXiv preprint arXiv:2405.16907, 2024

Lee, J., Yun, S., Yun, T., and Park, J. Gta: Generative trajectory augmentation with guidance for offline reinforcement learning.arXiv preprint arXiv:2405.16907, 2024

work page arXiv 2024
[37]

and Choi, J

Lee, K. and Choi, J. Local manifold approximation and projection for manifold-aware diffusion planning. InInternational Conference on Machine Learning (ICML), 2025

2025
[38]

and Choi, J

Lee, K. and Choi, J. State-covering trajectory stitching for diffusion planners. InAdvances in Neural Information Processing Systems (NeurIPS), 2025. 11

2025
[39]

Deep reinforcement learning in continuous action spaces: a case study in the game of simulated curling

Lee, K., Kim, S.-A., Choi, J., and Lee, S.-W. Deep reinforcement learning in continuous action spaces: a case study in the game of simulated curling. InInternational Conference on Machine Learning (ICML), 2018

2018
[40]

Refining diffusion planner for reliable behavior synthesis by automatic detection of infeasible plans

Lee, K., Kim, S., and Choi, J. Refining diffusion planner for reliable behavior synthesis by automatic detection of infeasible plans. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

2023
[41]

Diffstitch: Boosting offline reinforcement learning with diffusion-based trajectory stitching

Li, G., Shan, Y ., Zhu, Z., Long, T., and Zhang, W. Diffstitch: Boosting offline reinforcement learning with diffusion-based trajectory stitching. InInternational Conference on Machine Learning (ICML), 2024

2024
[42]

Decoupled q-chunking

Li, Q., Park, S., and Levine, S. Decoupled q-chunking. InInternational Conference on Learning Representations (ICLR), 2026

2026
[43]

Hierarchical diffusion for offline decision making

Li, W., Wang, X., Jin, B., and Zha, H. Hierarchical diffusion for offline decision making. In International Conference on Machine Learning (ICML), 2023

2023
[44]

K., Koenig, S., and Fioretto, F

Liang, J., Christopher, J. K., Koenig, S., and Fioretto, F. Simultaneous multi-robot motion planning with projected diffusion models. InInternational Conference on Machine Learning (ICML), 2025

2025
[45]

Adaptdiffuser: Diffusion models as adaptive self-evolving planners

Liang, Z., Mu, Y ., Ding, M., Ni, F., Tomizuka, M., and Luo, P. Adaptdiffuser: Diffusion models as adaptive self-evolving planners. InInternational Conference on Machine Learning (ICML), 2023

2023
[46]

What makes a good diffusion planner for decision making? InInternational Conference on Learning Representations (ICLR), 2025

Lu, H., Han, D., Shen, Y ., and Li, D. What makes a good diffusion planner for decision making? InInternational Conference on Learning Representations (ICLR), 2025

2025
[47]

Improving diffusion planners by self-supervised action gating with energies.arXiv preprint arXiv:2603.02650, 2026

Lu, Y ., Han, D., Wang, Y ., and Li, D. Improving diffusion planners by self-supervised action gating with energies.arXiv preprint arXiv:2603.02650, 2026

work page arXiv 2026
[48]

B., and Du, Y

Luo, Y ., Sun, C., Tenenbaum, J. B., and Du, Y . Potential based diffusion motion planning. In International Conference on Machine Learning (ICML), 2024

2024
[49]

A., Du, Y ., and Xu, D

Luo, Y ., Mishra, U. A., Du, Y ., and Xu, D. Generative trajectory stitching through diffusion composition. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

2025
[50]

Learning latent plans from play

Lynch, C., Khansari, M., Xiao, T., Kumar, V ., Tompson, J., Levine, S., and Sermanet, P. Learning latent plans from play. InConference on robot learning, 2020

2020
[51]

A., Xue, S., Chen, Y ., and Xu, D

Mishra, U. A., Xue, S., Chen, Y ., and Xu, D. Generative skill chaining: Long-horizon skill planning with diffusion models. InConference on Robot Learning, pp. 2905–2925. PMLR, 2023

2023
[52] Mishra, U. A., He, D., Chen, Y., and Xu, D. Compositional diffusion with guided search for long-horizon planning. In International Conference on Learning Representations (ICLR), 2026.
[53] Nandiraju, G., Ju, Y., Xu, C., and Wang, H. Hdflow: Hierarchical diffusion-flow planning for long-horizon robotic assembly. In NeurIPS 2025 Workshop on Embodied World Models for Decision Making, 2025.
[54] Opryshko, E., Quan, J., Voelcker, C., Du, Y., and Gilitschenski, I. Test-time graph search for goal-conditioned reinforcement learning. arXiv preprint arXiv:2510.07257, 2025.
[55] Park, K., Park, S., Lee, Y., and Levine, S. Scalable offline model-based RL with action chunks. In International Conference on Learning Representations (ICLR), 2026.
[56] Park, S., Ghosh, D., Eysenbach, B., and Levine, S. Offline goal-conditioned RL with latent states as actions. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
[57] Park, S., Kreiman, T., and Levine, S. Foundation policies with Hilbert representations. In International Conference on Machine Learning (ICML), 2024.
[58] Park, S., Frans, K., Eysenbach, B., and Levine, S. OGBench: Benchmarking offline goal-conditioned RL. In International Conference on Learning Representations (ICLR), 2025.
[59] Park, S., Frans, K., Mann, D., Eysenbach, B., Kumar, A., and Levine, S. Horizon reduction makes RL scalable. In Advances in Neural Information Processing Systems (NeurIPS), 2025.
[60] Park, S., Li, Q., and Levine, S. Flow Q-learning. In International Conference on Machine Learning (ICML), 2025.
[61] Park, S., Mann, D., and Levine, S. Dual goal representations. In International Conference on Learning Representations (ICLR), 2026.
[62] Park, S., Oberai, A., Atreya, P., and Levine, S. Transitive RL: Value learning via divide and conquer. In International Conference on Learning Representations (ICLR), 2026.
[63] Ren, Y., Gao, W., Ying, L., Rotskoff, G. M., and Han, J. Driftlite: Lightweight drift control for inference-time scaling of diffusion models. In International Conference on Learning Representations (ICLR), 2026.
[64] Robbins, H. E. An empirical Bayes approach to statistics. In Breakthroughs in Statistics: Foundations and Basic Theory, pp. 388–394. Springer, 1992.
[65] Shaoul, Y., Mishani, I., Vats, S., Li, J., and Likhachev, M. Multi-robot motion planning with diffusion models. In International Conference on Learning Representations (ICLR), 2025.
[66] Shen, Y., Jiang, X., Yang, Y., Wang, Y., Han, D., and Li, D. Understanding and improving training-free loss-based diffusion guidance. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
[67] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[68] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, 2017.
[69] Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations (ICLR), 2021.
[70] Song, J., Zhang, Q., Yin, H., Mardani, M., Liu, M.-Y., Kautz, J., Chen, Y., and Vahdat, A. Loss-guided diffusion models for plug-and-play controllable generation. In International Conference on Machine Learning (ICML), 2023.
[71] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR), 2021.
[72] Tassa, Y., Erez, T., and Todorov, E. Synthesis and stabilization of complex behaviors through online trajectory optimization. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4906–4913, 2012.
[73] Wang, T., Torralba, A., Isola, P., and Zhang, A. Optimal goal-reaching reinforcement learning via quasimetric learning. In International Conference on Machine Learning (ICML), 2023.
[74] Wang, Y., Wang, L., Du, Y., Sundaralingam, B., Yang, X., Chao, Y.-W., Pérez-D’Arpino, C., Fox, D., and Shah, J. Inference-time policy steering through human interactions. In IEEE International Conference on Robotics and Automation (ICRA), 2025.
[75] Wang, Z., Hunt, J. J., and Zhou, M. Diffusion policies as an expressive policy class for offline reinforcement learning. arXiv preprint arXiv:2208.06193, 2022.
[76] Xie, A., Rybkin, O., Sadigh, D., and Finn, C. Latent diffusion planning for imitation learning. In International Conference on Machine Learning (ICML), 2025.
[77] Yang, L., Ding, S., Cai, Y., Yu, J., Wang, J., and Shi, Y. Guidance with spherical Gaussian constraint for conditional diffusion. In International Conference on Machine Learning (ICML), 2024.
[78] Yedidia, J. S., Freeman, W. T., and Weiss, Y. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282–2312, 2005.
[79] Yoon, J., Cho, H., Baek, D., Bengio, Y., and Ahn, S. Monte Carlo tree diffusion for system 2 planning. In International Conference on Machine Learning (ICML), 2025.
[80] Yoon, J., Cho, H., Bengio, Y., and Ahn, S. Fast Monte Carlo tree diffusion: 100x speedup via parallel sparse planning. arXiv preprint arXiv:2506.09498, 2025.
