hub Canonical reference

Planning with Diffusion for Flexible Behavior Synthesis

Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine · 2022 · cs.LG · arXiv 2205.09991

Canonical reference. 73% of citing Pith papers cite this work as background.

62 Pith papers citing it

Background 73% of classified citations

open full Pith review browse 62 citing papers arXiv PDF

abstract

Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple, this combination has a number of empirical shortcomings, suggesting that learned models may not be well-suited to standard trajectory optimization. In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories. We show how classifier-guided sampling and image inpainting can be reinterpreted as coherent planning strategies, explore the unusual and useful properties of diffusion-based planning methods, and demonstrate the effectiveness of our framework in control settings that emphasize long-horizon decision-making and test-time flexibility.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 13 baseline 1 method 1

citation-polarity summary

background 11 unclear 2 baseline 1 use method 1

representative citing papers

GLENS: Global Search via Learning from Solver Iterates with Diffusion Models

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

GLENS uses diffusion models on solver iterates to generate high-quality and diverse initial guesses for multimodal non-convex optimization, leading to faster solver convergence.

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

CoDi decomposes the multi-agent diffusion score into pre-trained single-agent policies plus a gradient-free cost guidance term to generate coordinated behavior from single-agent data alone.

Muninn: Your Trajectory Diffusion Model But Faster

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.

Decoupled Guidance Diffusion for Adaptive Offline Safe Reinforcement Learning

cs.LG · 2026-05-04 · unverdicted · novelty 7.0

SDGD uses cost-conditioned classifier-free guidance plus reward guidance with feasible trajectory relabeling to generate safe high-reward trajectories that adapt to changing safety budgets in offline RL.

Being-H0.7: A Latent World-Action Model from Egocentric Videos

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.

Long-Text-to-Image Generation via Compositional Prompt Decomposition

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

PRISM lets pre-trained text-to-image models handle long prompts by breaking them into compositional parts, predicting noise separately, and merging outputs via energy-based conjunction, matching fine-tuned models while generalizing better to prompts over 500 tokens.

ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching

cs.RO · 2026-04-13 · unverdicted · novelty 7.0

ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.

Advantage-Guided Diffusion for Model-Based Reinforcement Learning

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

Advantage-guided diffusion (SAG and EAG) steers sampling in diffusion world models to higher-advantage trajectories, enabling policy improvement and better sample efficiency on MuJoCo tasks.

Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

cs.RO · 2026-04-07 · unverdicted · novelty 7.0

ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.

Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control

cs.RO · 2026-03-18 · conditional · novelty 7.0

GeCO replaces time-dependent flow matching with time-unconditional optimization, enabling adaptive inference and intrinsic OOD detection for robotic imitation learning.

From Static Constraints to Dynamic Adaptation: Sample-Level Constraint Relaxation for Offline-to-Online Reinforcement Learning

cs.LG · 2025-11-05 · unverdicted · novelty 7.0 · 2 refs

DARE performs sample-level constraint relaxation in offline-to-online RL by conditioning on behavioral consistency with a behavior model via posterior-induced exchange, yielding improved fine-tuning stability and performance on D4RL benchmarks.

MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies

cs.RO · 2025-09-17 · unverdicted · novelty 7.0

MIMIC-D enables multi-modal multi-agent coordination via joint training of decentralized diffusion policies using only local information.

BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion

cs.RO · 2025-08-11 · conditional · novelty 7.0

BeyondMimic combines compact motion tracking with a unified guided latent diffusion model to master diverse agile behaviors from human demos and solve unseen downstream tasks via test-time classifier guidance.

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

cs.RO · 2025-06-18 · unverdicted · novelty 7.0

DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.

BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning

cs.LG · 2025-06-06 · conditional · novelty 7.0

BiTrajDiff augments offline RL datasets by running independent forward and backward diffusion processes from intermediate states, yielding higher performance than prior one-directional data-augmentation baselines on D4RL.

RoboDreamer: Learning Compositional World Models for Robot Imagination

cs.RO · 2024-04-18 · unverdicted · novelty 7.0

RoboDreamer factorizes video generation using language primitives to achieve compositional generalization in robot world models, outperforming monolithic baselines on unseen goals in RT-X.

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

cs.LG · 2022-08-12 · unverdicted · novelty 7.0

Diffusion-QL uses conditional diffusion models as expressive policies in offline RL by coupling behavior cloning with Q-value maximization, achieving SOTA on most D4RL tasks.

Off the Rails: Hijacking the Scoring Head in Generative End-to-End Driving Planners with Safety-Violating Adversarial Perturbations

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

Derail adversarial perturbations hijack the scoring head in generative E2E driving planners, flipping safe to unsafe trajectory selection with 39-80% score drops and up to 50% collision rates.

RS-Diffuser: Risk-Sensitive Diffusion Planning with Distributional Value Guidance

cs.LG · 2026-06-26 · unverdicted · novelty 6.0

RS-Diffuser integrates diffusion planners, quantile regression critics, and CVaR-style guidance to produce risk-averse to risk-seeking behaviors from one model in offline RL.

GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

GDSD reduces RL for dLLMs to likelihood-free self-distillation via a normalization-free logit-matching objective, outperforming ELBO methods with more stable training on LLaDA-8B and Dream-7B.

A Systematic Study of Behavioral Cloning for Scientific Data Annotation

cs.HC · 2026-05-26 · unverdicted · novelty 6.0

Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.

Unbiased Diffusion Variational Inversion via Principled Posterior Matching

cs.CV · 2026-05-24 · unverdicted · novelty 6.0

PPM derives a tractable gradient for exact KL optimization in diffusion variational inversion to achieve unbiased posterior matching without heuristic approximations.

Cross-Domain Energy-Guided Diffusion Generation for Off-Dynamics Reinforcement Learning

cs.LG · 2026-05-24 · unverdicted · novelty 6.0

CEDGE applies energy-guided trajectory diffusion to generate adapted samples for off-dynamics offline RL, improving planning and policy learning on the ODRL benchmark.

citing papers explorer

Showing 50 of 62 citing papers.

GLENS: Global Search via Learning from Solver Iterates with Diffusion Models cs.LG · 2026-05-29 · unverdicted · none · ref 55 · internal anchor
GLENS uses diffusion models on solver iterates to generate high-quality and diverse initial guesses for multimodal non-convex optimization, leading to faster solver convergence.
JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning cs.LG · 2026-05-13 · unverdicted · none · ref 62 · internal anchor
JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.
Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations cs.RO · 2026-05-12 · unverdicted · none · ref 3 · internal anchor
CoDi decomposes the multi-agent diffusion score into pre-trained single-agent policies plus a gradient-free cost guidance term to generate coordinated behavior from single-agent data alone.
Muninn: Your Trajectory Diffusion Model But Faster cs.RO · 2026-05-11 · unverdicted · none · ref 21 · internal anchor
Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
Decoupled Guidance Diffusion for Adaptive Offline Safe Reinforcement Learning cs.LG · 2026-05-04 · unverdicted · none · ref 11 · internal anchor
SDGD uses cost-conditioned classifier-free guidance plus reward guidance with feasible trajectory relabeling to generate safe high-reward trajectories that adapt to changing safety budgets in offline RL.
Being-H0.7: A Latent World-Action Model from Egocentric Videos cs.RO · 2026-04-30 · unverdicted · none · ref 23 · internal anchor
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
Long-Text-to-Image Generation via Compositional Prompt Decomposition cs.CV · 2026-04-20 · unverdicted · none · ref 5 · internal anchor
PRISM lets pre-trained text-to-image models handle long prompts by breaking them into compositional parts, predicting noise separately, and merging outputs via energy-based conjunction, matching fine-tuned models while generalizing better to prompts over 500 tokens.
ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching cs.RO · 2026-04-13 · unverdicted · none · ref 10 · internal anchor
ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.
Advantage-Guided Diffusion for Model-Based Reinforcement Learning cs.AI · 2026-04-10 · unverdicted · none · ref 11 · internal anchor
Advantage-guided diffusion (SAG and EAG) steers sampling in diffusion world models to higher-advantage trajectories, enabling policy improvement and better sample efficiency on MuJoCo tasks.
Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation cs.RO · 2026-04-07 · unverdicted · none · ref 15 · internal anchor
ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.
Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control cs.RO · 2026-03-18 · conditional · none · ref 11 · internal anchor
GeCO replaces time-dependent flow matching with time-unconditional optimization, enabling adaptive inference and intrinsic OOD detection for robotic imitation learning.
From Static Constraints to Dynamic Adaptation: Sample-Level Constraint Relaxation for Offline-to-Online Reinforcement Learning cs.LG · 2025-11-05 · unverdicted · none · ref 6 · 2 links · internal anchor
DARE performs sample-level constraint relaxation in offline-to-online RL by conditioning on behavioral consistency with a behavior model via posterior-induced exchange, yielding improved fine-tuning stability and performance on D4RL benchmarks.
MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies cs.RO · 2025-09-17 · unverdicted · none · ref 13 · internal anchor
MIMIC-D enables multi-modal multi-agent coordination via joint training of decentralized diffusion policies using only local information.
BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion cs.RO · 2025-08-11 · conditional · none · ref 66 · internal anchor
BeyondMimic combines compact motion tracking with a unified guided latent diffusion model to master diverse agile behaviors from human demos and solve unseen downstream tasks via test-time classifier guidance.
Steering Your Diffusion Policy with Latent Space Reinforcement Learning cs.RO · 2025-06-18 · unverdicted · none · ref 34 · internal anchor
DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning cs.LG · 2025-06-06 · conditional · none · ref 21 · internal anchor
BiTrajDiff augments offline RL datasets by running independent forward and backward diffusion processes from intermediate states, yielding higher performance than prior one-directional data-augmentation baselines on D4RL.
RoboDreamer: Learning Compositional World Models for Robot Imagination cs.RO · 2024-04-18 · unverdicted · none · ref 52 · internal anchor
RoboDreamer factorizes video generation using language primitives to achieve compositional generalization in robot world models, outperforming monolithic baselines on unseen goals in RT-X.
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning cs.LG · 2022-08-12 · unverdicted · none · ref 5 · internal anchor
Diffusion-QL uses conditional diffusion models as expressive policies in offline RL by coupling behavior cloning with Q-value maximization, achieving SOTA on most D4RL tasks.
Off the Rails: Hijacking the Scoring Head in Generative End-to-End Driving Planners with Safety-Violating Adversarial Perturbations cs.RO · 2026-06-29 · unverdicted · none · ref 20 · internal anchor
Derail adversarial perturbations hijack the scoring head in generative E2E driving planners, flipping safe to unsafe trajectory selection with 39-80% score drops and up to 50% collision rates.
RS-Diffuser: Risk-Sensitive Diffusion Planning with Distributional Value Guidance cs.LG · 2026-06-26 · unverdicted · none · ref 10 · internal anchor
RS-Diffuser integrates diffusion planners, quantile regression critics, and CVaR-style guidance to produce risk-averse to risk-seeking behaviors from one model in offline RL.
GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models cs.LG · 2026-05-28 · unverdicted · none · ref 20 · internal anchor
GDSD reduces RL for dLLMs to likelihood-free self-distillation via a normalization-free logit-matching objective, outperforming ELBO methods with more stable training on LLaDA-8B and Dream-7B.
A Systematic Study of Behavioral Cloning for Scientific Data Annotation cs.HC · 2026-05-26 · unverdicted · none · ref 286 · internal anchor
Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.
Unbiased Diffusion Variational Inversion via Principled Posterior Matching cs.CV · 2026-05-24 · unverdicted · none · ref 13 · internal anchor
PPM derives a tractable gradient for exact KL optimization in diffusion variational inversion to achieve unbiased posterior matching without heuristic approximations.
Cross-Domain Energy-Guided Diffusion Generation for Off-Dynamics Reinforcement Learning cs.LG · 2026-05-24 · unverdicted · none · ref 13 · internal anchor
CEDGE applies energy-guided trajectory diffusion to generate adapted samples for off-dynamics offline RL, improving planning and policy learning on the ODRL benchmark.
Whole-Body Inverse Kinematics with Graph Diffusion cs.RO · 2026-05-23 · unverdicted · none · ref 34 · internal anchor
GraphDiff-IK formulates inverse kinematics as a conditional graph diffusion process on kinematic graphs derived from URDF to generate joint configurations for single-arm, dual-arm, and multi-branch robots.
Score-Based One-step MeanFlow Policy Optimization cs.LG · 2026-05-22 · unverdicted · none · ref 10 · internal anchor
SOM is an actor-critic algorithm that constructs the target velocity field for one-step MeanFlow policies directly from the Q-function via score estimation and probability flow ODE, achieving claimed SOTA on locomotion tasks with reduced training and inference time.
Variance Reduction for Expectations with Diffusion Teachers cs.LG · 2026-05-20 · unverdicted · none · ref 19 · 2 links · internal anchor
CARV amortizes upstream diffusion teacher costs over noise resamples with timestep importance sampling and stratified-inverse-CDF sampling, delivering 2-3x effective compute gains in text-to-3D experiments and order-of-magnitude variance cuts in single-step distillation.
Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing cs.LG · 2026-05-15 · unverdicted · none · ref 112 · internal anchor
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making cs.LG · 2026-05-15 · unverdicted · none · ref 90 · internal anchor
Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale cs.LG · 2026-05-14 · unverdicted · none · ref 61 · internal anchor
TFGN is an architectural overlay for transformers enabling task-free, replay-free continual pre-training across heterogeneous domains at LLM scale with near-zero backward transfer and high gradient orthogonality.
Policy-DRIFT: Dynamic Reward-Informed Flow Trajectory Steering physics.flu-dyn · 2026-05-13 · unverdicted · none · ref 4 · internal anchor
Policy-DRIFT combines conditional flow matching with terminal reward guidance and decoupled DRL to achieve 49% drag reduction in Re_tau=180 channel flow, 16% above DRL benchmarks and with 37 times less actuation energy.
Enforcing Constraints in Generative Sampling via Adaptive Correction Scheduling cs.LG · 2026-05-11 · unverdicted · none · ref 15 · internal anchor
Adaptive correction scheduling for hard constraints in generative sampling recovers 71% of stepwise projection benefits using 75% fewer corrections by focusing on trajectory-perturbing steps.
Path-Coupled Bellman Flows for Distributional Reinforcement Learning cs.LG · 2026-05-07 · unverdicted · none · ref 34 · internal anchor
PCBF learns return distributions via source-consistent Bellman-coupled paths with shared noise and λ-parameterized control variates, reporting improved fidelity and stability on MRPs, OGBench, and D4RL.
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies cs.LG · 2026-05-04 · unverdicted · none · ref 15 · internal anchor
OGPO enables sample-efficient full-finetuning of generative control policies via off-policy critics and modified PPO, achieving SOTA on robot manipulation tasks while rescuing poorly initialized behavior cloning policies without expert data.
Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation cs.RO · 2026-04-30 · unverdicted · none · ref 20 · internal anchor
Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.
Fisher Decorator: Refining Flow Policy via a Local Transport Map cs.LG · 2026-04-20 · unverdicted · none · ref 8 · internal anchor
Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
CMP: Robust Whole-Body Tracking for Loco-Manipulation via Competence Manifold Projection cs.RO · 2026-04-08 · unverdicted · none · ref 23 · internal anchor
CMP projects actions onto a learned competence manifold using a frame-wise safety scheme and isomorphic latent space to achieve up to 10x better survival in out-of-distribution scenarios with under 10% tracking loss.
Equivariant Asynchronous Diffusion: An Adaptive Denoising Schedule for Accelerated Molecular Conformation Generation cs.LG · 2026-03-10 · unverdicted · none · ref 6 · internal anchor
EAD is an equivariant diffusion model with adaptive asynchronous denoising that achieves state-of-the-art 3D molecular conformation generation.
SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation cs.RO · 2026-03-05 · conditional · none · ref 18 · internal anchor
SeedPolicy introduces self-evolving gated attention to extend the temporal horizon of diffusion policies, yielding 36.8% and 169% relative gains over standard DP on clean and randomized RoboTwin 2.0 tasks.
How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models? cs.LG · 2026-02-02 · unverdicted · none · ref 7 · internal anchor
ALGD augments the Lagrangian to locally convexify the energy landscape in diffusion models, stabilizing safe RL training and generation without changing optimal policies.
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions cs.LG · 2025-09-23 · unverdicted · none · ref 16 · internal anchor
DAWM introduces a modular diffusion world model with an inverse dynamics model to produce complete synthetic transitions that improve conservative offline RL algorithms like TD3BC and IQL on D4RL tasks.
Real-Time Execution of Action Chunking Flow Policies cs.RO · 2025-06-09 · unverdicted · none · ref 26 · internal anchor
Real-time chunking (RTC) allows diffusion- and flow-based action chunking policies to execute smoothly and asynchronously, maintaining high success rates on dynamic tasks even with significant inference latency.
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model cs.CV · 2025-03-13 · unverdicted · none · ref 57 · internal anchor
HybridVLA unifies diffusion and autoregression in a single VLA model via collaborative training and ensemble to raise robot manipulation success rates by 14% in simulation and 19% in real-world tasks.
Diffusion Policy Policy Optimization cs.RO · 2024-09-01 · unverdicted · none · ref 45 · internal anchor
DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies cs.LG · 2023-04-20 · conditional · none · ref 22 · internal anchor
IDQL generalizes IQL into an actor-critic framework and uses diffusion policies for robust policy extraction, outperforming prior offline RL methods.
Scaling Robot Learning with Semantically Imagined Experience cs.RO · 2023-02-22 · unverdicted · none · ref 51 · internal anchor
Augmenting robot datasets via diffusion-based semantic inpainting enables manipulation policies to solve unseen tasks with new objects and improves robustness to novel distractors.
FlowAWR: Online Adaptive Flow Reinforcement via Advantage-Weighted Rectification cs.LG · 2026-06-29 · unverdicted · none · ref 9 · internal anchor
FlowAWR derives an advantage-weighted rectification for optimal velocity fields in flow models, claiming 2-5x faster convergence than DiffusionNFT on SD3.5-Medium.
GPC: Large-Scale Generative Pretraining for Transferable Motor Control cs.CV · 2026-06-28 · unverdicted · none · ref 192 · internal anchor
GPC learns a motion vocabulary via Finite Scalar Quantization and end-to-end RL, then trains an autoregressive transformer for next-token control generation, achieving 99.98% motion reproduction success with emergent robustness.
Physically Viable World Models: A Case for Query-Conditioned Embodied AI cs.AI · 2026-05-28 · unverdicted · none · ref 39 · internal anchor
Embodied AI requires query-conditioned world models that select the simplest physical abstraction sufficient to answer intervention queries.
Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization cs.LG · 2026-05-25 · unverdicted · none · ref 44 · internal anchor
MBDPO reformulates policy optimization as a diffusion process over searched trajectories in latent world models to reduce misalignment between search and value learning.

Planning with Diffusion for Flexible Behavior Synthesis

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer