Energy-based Compositional Diffusion Planning

Danfei Xu; Iro Armeni; Jiaxin Lu; Tao Sun; Utkarsh Aashu Mishra

arxiv: 2606.21646 · v1 · pith:CKKRB64Tnew · submitted 2026-06-19 · 💻 cs.RO

Tao Sun, Utkarsh Aashu Mishra, Jiaxin Lu, Danfei Xu, Iro Armeni

Pith reviewed 2026-06-26 14:10 UTC · model grok-4.3

classification 💻 cs.RO

keywords compositional diffusion planning · energy-based models · trajectory stitching · robotic planning · diffusion models · long-horizon tasks · conservative fields

The pith

ECD recovers global trajectories by minimizing the sum of local bridge potentials, producing a conservative correction field that includes the omitted boundary reaction term.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current compositional diffusion planners stitch local predictions heuristically, but the resulting update field is generally non-conservative and does not correspond to any valid global trajectory log-density. The paper introduces the Energy-based Compositional Diffuser (ECD), which treats the global trajectory as the minimizer of summed local bridge potentials. This formulation supplies both a conservative correction and the boundary reaction term that heuristic stitching omits. A Markov-based score approximation then computes the reaction term through a single block-tridiagonal solve, keeping inference linear in the planning horizon. The paper reports higher success rates on OGBench long-horizon stitching tasks at nearly the inference speed of heuristic baselines.

Core claim

The global trajectory is recovered as the minimizer of the sum of local bridge potentials; this energy-based view defines a conservative correction field that contains the boundary reaction term omitted by heuristic stitching, and the reaction term is recovered efficiently by a Markov-based score approximation solved via a single block-tridiagonal linear system.

What carries the argument

Energy-based Compositional Diffuser (ECD), the formulation of the global trajectory as the minimizer of the sum of local bridge potentials.
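A schematic way to write this formulation, under assumed notation, is given below. The symbols $P_k$ and $O_k$ are borrowed from the Figure 5 caption (selectors for the scored local coordinates and the local boundary condition) and are taken here to act directly on the global trajectory $x$, which is a simplification. The potential $U_k$, the chunk count $K$, and the split into two sums are illustrative assumptions rather than the paper's exact equations.

```latex
\begin{aligned}
E(x) &= \sum_{k=1}^{K} U_k\bigl(P_k x,\; O_k x\bigr),
\qquad x^\star = \arg\min_x E(x), \\
-\nabla_x E(x) &= -\sum_{k=1}^{K} P_k^{\top}\,\nabla_{1} U_k\bigl(P_k x,\, O_k x\bigr)
\;-\; \sum_{k=1}^{K} O_k^{\top}\,\nabla_{2} U_k\bigl(P_k x,\, O_k x\bigr).
\end{aligned}
```

Here $\nabla_1$ and $\nabla_2$ denote gradients with respect to the first and second argument of $U_k$. Because the whole field is the gradient of one scalar $E$, it is conservative by construction. The second sum is the dependence of each local potential on its boundary condition; a scheme that treats the boundary as a fixed input would drop it, which is plausibly the role the summary assigns to the boundary reaction term.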

If this is right

The stitched update becomes a conservative field that corresponds to a valid global trajectory log-density.
Inference cost remains linear in the planning horizon through the block-tridiagonal solve.
Higher success rates are obtained on long-horizon robotic stitching tasks.
The boundary reaction term is included without changing the asymptotic speed of heuristic methods.
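The linear-cost claim rests on a standard fact: a block-tridiagonal system with n blocks of size d can be solved in O(n d^3) time by block forward elimination and back substitution (the block Thomas algorithm). The NumPy sketch below illustrates that fact only. The function name and array layout are assumptions made for illustration, not the paper's implementation, and no pivoting across blocks is performed, so it presumes the eliminated diagonal blocks stay well-conditioned.

```python
import numpy as np


def solve_block_tridiagonal(lower, diag, upper, rhs):
    """Solve A x = rhs for block-tridiagonal A (hypothetical helper).

    diag:  (n, d, d)   diagonal blocks D_0 .. D_{n-1}
    lower: (n-1, d, d) lower[i] couples block row i+1 to block column i
    upper: (n-1, d, d) upper[i] couples block row i to block column i+1
    rhs:   (n, d)      right-hand side, one d-vector per block row
    """
    n = diag.shape[0]
    diag_mod = np.array(diag, dtype=float)
    rhs_mod = np.array(rhs, dtype=float)

    # Forward elimination: remove the sub-diagonal block of each row.
    for i in range(1, n):
        # factor = lower[i-1] @ inv(diag_mod[i-1]), computed via a solve.
        factor = np.linalg.solve(diag_mod[i - 1].T, lower[i - 1].T).T
        diag_mod[i] = diag_mod[i] - factor @ upper[i - 1]
        rhs_mod[i] = rhs_mod[i] - factor @ rhs_mod[i - 1]

    # Back substitution, from the last block row upward.
    x = np.zeros_like(rhs_mod)
    x[-1] = np.linalg.solve(diag_mod[-1], rhs_mod[-1])
    for i in range(n - 2, -1, -1):
        x[i] = np.linalg.solve(diag_mod[i], rhs_mod[i] - upper[i] @ x[i + 1])
    return x
```

Each loop iteration touches only d-by-d blocks, so the work grows linearly with the number of blocks n, whereas a dense solve over all n·d unknowns would grow cubically in n.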

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same energy-minimization view could be applied to other sequence-composition methods that currently rely on ad-hoc stitching.
The reaction term recovered by the Markov approximation may correspond to measurable boundary effects in physical robot dynamics.
Extending the block-tridiagonal structure to non-Markov score models could further reduce approximation error on very long horizons.

Load-bearing premise

The global trajectory log-density is exactly recovered by minimizing the sum of local bridge potentials and the Markov score approximation captures the reaction term without introducing bias that changes planning outcomes.

What would settle it

A direct comparison on a stitching task where the trajectory distribution recovered by ECD diverges measurably from the true global log-density or where success rates fall below heuristic stitching.

Figures

Figures reproduced from arXiv: 2606.21646 by Danfei Xu, Iro Armeni, Jiaxin Lu, Tao Sun, Utkarsh Aashu Mishra.

**Figure 1.** Performance-runtime frontier on OGBench stitch tasks (Park et al., 2025). We plot the trade-off between success rate and inference runtime averaged over PointMaze Giant and AntMaze Giant environments. While recent inference-time search methods like CDGS (4 or 8 resampling rounds) achieve high success, they suffer from significantly longer runtime costs. Conversely, heuristic stitching methods, e.g., CompDi…

**Figure 2.** Comparison of the denoising process. Snapshots of the 2D trajectory on one OGBench AntMaze Giant sample during reverse diffusion process (timesteps are normalized to 1 → 0). While CompDiffuser (CD, top) exhibits scattered samples until the later steps, our method (bottom) rapidly converges to one possible global mode at earlier time (t ≈ 0.8 for ours and t ≈ 0.3 for CD) and maintains this mode throughout t…

**Figure 3.** Method comparison. At diffusion time $t$, the global trajectory $x(t)$ is covered by overlapping chunks (colored bands) connecting a fixed start and goal around obstacles. For chunk $k$, $x_k^s(t)$ and $x_k^e(t)$ denote the boundary conditions given to the local denoiser (i.e., start/goal states or segments overlapped with neighboring chunks). Left: CompDiffuser denoises chunks independently, producing local noisy-mea…

**Figure 4.** Toy example with a sequence of 1D distributions: For scalar variables $\{x_{0:L}\}$, there are two feasible long-horizon plans from start $x_0 = 0$ to goal $x_L = 0$: one through the top modes ($\mathbb{E}[x_i] = +1$) and one through the bottom modes ($\mathbb{E}[x_i] = -1$). Each training chunk contains $l = 3$ consecutive variables. While both CD and our method show robust performance at $L = 4$, CD’s performance degrades significantly at L = 1…

**Figure 5.** Physical interpretation of three properties. The arrows indicate the desired gradient direction leading to energy minimization for each property. Here, small dots denote interior variables and large dots denote boundary variables. (Best viewed in color.) Let $P_k x_k$ denote the local coordinates that we score, and let $O_k x_k$ denote the local boundary condition. The selector $P_k$ selects the local trajectory coordi…

**Figure 6.** Performance at different replan budgets. We compare the success rates of our method and CD on PointMaze Giant and AntMaze Giant as a function of the allowed number of replans. Our method consistently outperforms CD across all replan budgets. et al., 2025) comprising of goal-conditioned behavioral cloning (GCBC) (Lynch et al., 2020; Ghosh et al., 2019), goal-conditioned implicit V-learning (GCIVL) and Qlea…

**Figure 7.** Fidelity of reaction term approximation. We compare our approximated reaction term with the exact JVP computed via backpropagation. We report the Cosine Similarity and Norm Ratio across normalized diffusion timesteps (1.0 → 0.0) on PointMaze and AntMaze Giant. The approximation shows reasonable alignment, particularly in the later denoising stages. Shaded regions indicate standard deviation. a comparable c…
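The two fidelity metrics named in the Figure 7 caption can be sketched as follows. The caption does not define them, so this assumes the usual cosine similarity between the flattened vectors and takes the norm ratio as approximate over exact; the function names and the small `eps` guard are illustrative choices.

```python
import numpy as np


def cosine_similarity(approx, exact, eps=1e-12):
    """Cosine of the angle between two arrays, compared as flat vectors."""
    a = np.ravel(approx).astype(float)
    e = np.ravel(exact).astype(float)
    return float(a @ e / (np.linalg.norm(a) * np.linalg.norm(e) + eps))


def norm_ratio(approx, exact, eps=1e-12):
    """Ratio ||approx|| / ||exact|| (assumed orientation of the ratio)."""
    a = np.ravel(approx).astype(float)
    e = np.ravel(exact).astype(float)
    return float(np.linalg.norm(a) / (np.linalg.norm(e) + eps))
```

Read together, a cosine similarity near 1 says the approximate term points in the right direction, and a norm ratio near 1 says its magnitude is right; either can be good while the other is poor.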

Abstract

Compositional diffusion planners aim to solve long-horizon robotic tasks using short training trajectories. Yet, current approaches often rely on the heuristic stitching of local predictions. We show that the resulting stitched update is generally a non-conservative field that does not mathematically correspond to any valid global trajectory log-density function. We propose Energy-based Compositional Diffuser (ECD), a framework that formulates the global trajectory as the minimizer of the sum of local bridge potentials. This energy-based perspective defines a conservative correction field and contains a boundary reaction term that heuristic stitching omits. To enable efficient inference, we further introduce a Markov-based score approximation that computes the reaction term via a single block-tridiagonal solve, maintaining time complexity linear in the planning horizon. Empirically, ECD achieves state-of-the-art success rates on a range of OGBench stitching tasks, while nearly matching the inference speed of heuristic stitching methods. Code is available at https://github.com/GradientSpaces/ECD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ECD's energy view and reaction term fix a real gap in stitching, but the Markov approx needs error checks to back the conservativeness claim.

The core advance here is treating the global trajectory as the exact minimizer of summed local bridge potentials. This produces a conservative correction field and surfaces the boundary reaction term that heuristic stitching leaves out. They then approximate the score with a single block-tridiagonal Markov solve to keep inference linear in horizon length.

That formulation is mechanically different from the prior stitching methods cited in the abstract, and the empirical results on OGBench stitching tasks show higher success rates at comparable speed. The energy perspective is a clean way to derive the missing term instead of patching it in after the fact.

The soft spot is the Markov approximation itself. The abstract introduces it purely for speed, with no derivation showing the error is zero or bounded independently of horizon, and no ablation isolating its effect on trajectory validity. If the full paper does not supply either a proof or targeted controls, the guarantee that the implemented method remains conservative may not hold. The stress-test concern lands here.

This is for people building long-horizon diffusion planners in robotics who already know the stitching literature. The technical move is distinct enough that a serious referee should see the derivations and the experimental controls around the approximation. I would send it to review.

Referee Report

2 major / 1 minor

Summary. The paper claims that heuristic stitching of local predictions in compositional diffusion planners produces a non-conservative vector field that does not correspond to any valid global trajectory log-density. It proposes the Energy-based Compositional Diffuser (ECD), which formulates the global trajectory as the minimizer of the sum of local bridge potentials; this yields a conservative correction field that includes an omitted boundary reaction term. A Markov-based score approximation recovers the reaction term via a single block-tridiagonal solve while preserving linear time complexity in the horizon. Experiments report state-of-the-art success rates on OGBench stitching tasks at near-heuristic inference speed, with code released.

Significance. If the energy minimization exactly recovers the global log-density and the Markov approximation introduces no material bias, the work supplies a principled, conservative alternative to heuristic stitching for long-horizon diffusion planning. The open-source implementation is a concrete strength that supports reproducibility.

major comments (2)

[Abstract] Abstract and the section introducing the non-conservative claim: the assertion that the stitched update 'does not mathematically correspond to any valid global trajectory log-density function' is stated without a derivation showing that the heuristic field cannot arise from any global density; this claim is load-bearing for the motivation of ECD.
[Markov-based score approximation] The section on the Markov-based score approximation: the claim that the single block-tridiagonal solve 'accurately captures the omitted reaction term without introducing bias' lacks both an error bound (independent of horizon) and an ablation measuring the approximation's effect on trajectory validity or planning success; this directly affects whether the conservativeness guarantee survives the practical implementation.

minor comments (1)

[Abstract] The abstract states 'nearly matching the inference speed'; a table or figure quantifying wall-clock times and success rates side-by-side with the heuristic baseline would strengthen the empirical claim.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate the revisions that will be made.

Referee: [Abstract] Abstract and the section introducing the non-conservative claim: the assertion that the stitched update 'does not mathematically correspond to any valid global trajectory log-density function' is stated without a derivation showing that the heuristic field cannot arise from any global density; this claim is load-bearing for the motivation of ECD.

Authors: We agree that an explicit derivation strengthens the motivation. In the revision we will add a short proof in Section 3 showing that the heuristic stitched field has nonzero curl in general (hence cannot be the gradient of any scalar log-density) together with a two-segment counterexample demonstrating nonzero circulation. This derivation follows directly from the energy formulation already present in the manuscript. revision: yes
Referee: [Markov-based score approximation] The section on the Markov-based score approximation: the claim that the single block-tridiagonal solve 'accurately captures the omitted reaction term without introducing bias' lacks both an error bound (independent of horizon) and an ablation measuring the approximation's effect on trajectory validity or planning success; this directly affects whether the conservativeness guarantee survives the practical implementation.

Authors: We will add an ablation comparing the Markov approximation to exact block-tridiagonal solves on short-horizon instances where the latter is tractable, reporting effects on success rate and trajectory validity. A horizon-independent error bound is not currently derived and appears difficult without further assumptions on the underlying dynamics; we will instead clarify the approximation's bias properties and its effect on the conservativeness guarantee in the revised text. revision: partial
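The curl argument in the first response can be illustrated on a toy chain. A smooth vector field on R^n is the gradient of a scalar exactly when its Jacobian is symmetric everywhere, so an asymmetric Jacobian at any point rules out a global log-density. The sketch below is not the paper's construction: the coupling `f = tanh`, the quadratic local potential, and all function names are assumptions chosen to make the effect visible. The "heuristic" field updates each variable from its own local residual and treats the previous variable as a fixed condition, while the energy field also differentiates through that condition.

```python
import numpy as np


def f(a):
    """Assumed local coupling: x_j is pulled toward f(x_{j-1})."""
    return np.tanh(a)


def df(a):
    """Derivative of the assumed coupling."""
    return 1.0 - np.tanh(a) ** 2


def energy(x):
    """Sum of toy local potentials 0.5 * (x_j - f(x_{j-1}))^2."""
    return 0.5 * np.sum((x[1:] - f(x[:-1])) ** 2)


def heuristic_field(x):
    """Each x_j follows only its own residual; x_{j-1} is held fixed."""
    field = np.zeros_like(x)
    field[1:] = -(x[1:] - f(x[:-1]))
    return field


def energy_field(x):
    """Exact negative gradient of energy(x), condition term included."""
    residual = x[1:] - f(x[:-1])
    field = np.zeros_like(x)
    field[1:] -= residual                 # same term as the heuristic field
    field[:-1] += df(x[:-1]) * residual   # extra term acting on the condition
    return field


def numerical_jacobian(field, x, h=1e-6):
    """Central-difference Jacobian of a vector field at x."""
    n = x.size
    jac = np.zeros((n, n))
    for i in range(n):
        step = np.zeros(n)
        step[i] = h
        jac[:, i] = (field(x + step) - field(x - step)) / (2.0 * h)
    return jac


def asymmetry(jac):
    """Largest absolute entry of J - J^T; zero for a gradient field."""
    return float(np.max(np.abs(jac - jac.T)))
```

Evaluating `asymmetry(numerical_jacobian(heuristic_field, x))` at a generic point gives a value of order `df(x)`, while the same check on `energy_field` returns a value at the level of finite-difference noise. In this toy, the only difference between the two fields is the term that acts on the conditioning variable.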

standing simulated objections not resolved

Deriving a general error bound for the Markov score approximation that is independent of planning horizon

Circularity Check

0 steps flagged

No significant circularity; central construction is definitional proposal

The paper defines ECD by formulating the global trajectory explicitly as the minimizer of summed local bridge potentials; this choice directly yields the conservative field and reaction term by the mathematical properties of energy minimization, without reducing a claimed derivation back to fitted data or prior self-citations. The Markov score approximation is presented solely as a computational device for linear-time inference, not as a result that is forced by or equivalent to the target quantities. No equations or steps in the provided text exhibit self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations. The framework is therefore self-contained as a modeling choice rather than a circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard diffusion-model assumptions for trajectory generation and the mathematical equivalence between energy minimization and conservative vector fields; no new free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Local diffusion predictions can be treated as bridge potentials whose sum defines a global energy whose minimizer is a valid trajectory density.
Invoked when the paper states that the global trajectory is the minimizer of the sum of local bridge potentials.
domain assumption The Markov score approximation computes the boundary reaction term without material bias for planning success.
Required for the claim that the single block-tridiagonal solve maintains correctness while achieving linear complexity.

Reference graph

Works this paper leans on

30 extracted references · 26 canonical work pages · 5 internal anchors

[1] Agia, C., Migimatsu, T., Wu, J., and Bohg, J. Stap: Sequencing task-agnostic policies. arXiv preprint arXiv:2210.12250,

[2] Ajay, A., Du, Y., Gupta, A., Tenenbaum, J. B., Jaakkola, T. S., and Agrawal, P. Is conditional generative modeling all you need for decision making? In International Conference on Learning Representations (ICLR), 2023a. Ajay, A., Han, S., Du, Y., Li, S., Gupta, A., Jaakkola, T. S., Tenenbaum, J. B., Kaelbling, L. P., Srivastava, A., and Agrawal, P. Compo...

[3] Carvalho, J., Le, A. T., Baierl, M., Koert, D., and Peters, J. Motion planning diffusion: Learning and planning of robot motions with diffusion models. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1916–1923. IEEE,

[4] Char, I., Mehta, V., Villaflor, A., Dolan, J. M., and Schneider, J. Bats: Best action trajectory stitching. arXiv preprint arXiv:2204.12026,

[5] Chen, C., Deng, F., Kawaguchi, K., Gulcehre, C., and Ahn, S. Simple hierarchical planning with diffusion. In International Conference on Learning Representations (ICLR), 2024a. Chen, C., Deng, F., Kawaguchi, K., Gulcehre, C., and Ahn, S. Simple hierarchical planning with diffusion. arXiv preprint arXiv:2401.02644, 2024b. Chen, C., Hamed, H., Baek, D., Kang...

[6] Chi, C., Feng, S., Du, Y., Xu, W., Wang, T., Cousineau, E., Burchfiel, B., and Song, S. Diffusion policy: Visuomotor policy learning via action diffusion. arXiv preprint arXiv:2303.04137,

[7] URL https://arxiv.org/abs/2105.05233. Du, Y. and Kaelbling, L. Compositional generative modeling: A single model is not all you need. arXiv preprint arXiv:2402.01103,

[8] Garipov, T., De Peuter, S., Yang, G., Garg, V., Kaski, S., and Jaakkola, T. Compositional sculpting of iterative generative processes. arXiv preprint arXiv:2309.16115,

[9] Ghosh, D., Gupta, A., Reddy, A., Fu, J., Devin, C., Eysenbach, B., and Levine, S. Learning to reach goals via iterated supervised learning. arXiv preprint arXiv:1912.06088,

[10] Ghugare, R., Geist, M., Berseth, G., and Eysenbach, B. Closing the gap between td learning and supervised learning–a generalisation point of view. arXiv preprint arXiv:2401.11237, 2024a. Ghugare, R., Geist, M., Berseth, G., and Eysenbach, B. Closing the gap between td learning and supervised learning-a generalisation point of view. In The Twelfth Internatio...

[11] Ho, J. and Salimans, T. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598,

[12] Huang, X., Wu, D., and Boulet, B. Drdt3: Diffusion-refined decision test-time training model. arXiv preprint arXiv:2501.06718,

[13] Kim, J., Lee, S., Kim, W., and Sung, Y. Adaptive q-aid for conditional supervised learning in offline reinforcement learning. Advances in Neural Information Processing Systems, 37:87104–87135, 2024a. Kim, S., Choi, Y., Matsunaga, D. E., and Kim, K.-E. Stitching sub-trajectories with conditional diffusion model for goal-conditioned offline rl. In Proceedin...

[14] Lee, K. and Choi, J. State-covering trajectory stitching for diffusion planners. arXiv preprint arXiv:2506.00895,

[15] Lei, X., Zhang, X., and Wang, D. Mgda: Model-based goal data augmentation for offline goal-conditioned weighted supervised learning. arXiv preprint arXiv:2412.11410,

[16] Li, S. and Zhang, X. Augmenting offline reinforcement learning with state-only interactions. arXiv preprint arXiv:2402.00807,

[17] Liu, Z., Qian, L., Liu, Z., Wan, L., Chen, X., and Lan, X. Enhancing decision transformer with diffusion-based trajectory branch generation. arXiv preprint arXiv:2411.11327,

[18] Luo, Y., Mishra, U. A., Du, Y., and Xu, D. Generative trajectory stitching through diffusion composition. arXiv preprint arXiv:2503.05153,

[19] Mahajan, D., Pezeshki, M., Mitliagkas, I., Ahuja, K., and Vincent, P. Compositional risk minimization. arXiv preprint arXiv:2410.06303,

[20] Mishra, U. A., He, D., Chen, Y., and Xu, D. Compositional diffusion with guided search for long-horizon planning. arXiv preprint arXiv:2601.00126,

[21] Patil, O., Sah, A., and Gopalan, N. Composing diffusion policies for few-shot learning of movement trajectories. arXiv preprint arXiv:2410.17479,

[22] Thornton, J., Bethune, L., Zhang, R., Bradley, A., Nakkiran, P., and Zhai, S. Composition and control with distilled energy diffusion models and sequential monte carlo. arXiv preprint arXiv:2502.12786,

[23] Venkatraman, S., Khaitan, S., Akella, R. T., Dolan, J., Schneider, J., and Berseth, G. Reasoning with latent diffusion in offline reinforcement learning. arXiv preprint arXiv:2309.06599,

[24] Wang, L., Zhao, J., Du, Y., Adelson, E. H., and Tedrake, R. Poco: Policy composition from and for heterogeneous robot learning. arXiv preprint arXiv:2402.02511, 2024a. Wang, T., Torralba, A., Isola, P., and Zhang, A. Optimal goal-reaching reinforcement learning via quasimetric learning. In International Conference on Machine Learning, pp. 36411–36430. PMLR,

[25] Wang, Y., Yang, C., Wen, Y., Liu, Y., and Qiao, Y. Critic-guided decision transformer for offline reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp. 15706–15714, 2024b. Wang, Z., Hunt, J. J., and Zhou, M. Diffusion policies as an expressive policy class for offline reinforcement learning. arXiv pre...

[26] Yang, C.-F., Xu, H., Wu, T.-L., Gao, X., Chang, K.-W., and Gao, F. Planning as in-painting: A diffusion-based embodied task planning framework for environments under uncertainty. arXiv preprint arXiv:2312.01097, 2023a. Yang, Z., Mao, J., Du, Y., Wu, J., Tenenbaum, J. B., Lozano-Pérez, T., and Kaelbling, L. P. Compositional diffusion-based continuous co...

[27] Zhang, Z., Xu, J., Liu, J., Zhuang, Z., Wang, D., Liu, M., and Zhang, S. Context-former: Stitching via latent conditioned sequence modeling. arXiv preprint arXiv:2401.16452,
[28] Hence the score field is generally non-conservative. Chunk-interleaved variant. CD’s chunk-interleaved variant used in practice is an ordered composition of local updates. It is therefore better viewed as a proposal operator than as a single simultaneous score field. Its non-conservative character can still be formalized. Let $T_k$ denote the local update tha...

[29] and CompDiffuser (Luo et al., 2025). We follow the official CompDiffuser implementation and hyperparameter conventions: https://github.com/devinluo27/comp_diffuser_release. We use the same planner checkpoints for CD and ECD. PointMaze is executed with the PD controller. AntMaze, HumanoidMaze, AntSoccer, and AntMaze-o15d use inverse-dynamics models trained...

[30] For HumanoidMaze and AntSoccer, which are not fully covered by the released pretrained checkpoints, we train our own planner and inverse-dynamics models following the CD codebase. Compositional Diffusion with Guided Search (Mishra et al., 2025). We use the official CDGS codebase: https://github.com/UtkarshMishra04/CDGS_ogbench. We use the same checkpoints...
