Parameterized Diffusion Policy learns a behavior manifold to condition diffusion policies on low-dimensional continuous parameters, enabling interpolation between strategies and adaptation to novel constraints without policy weight updates.
arXiv preprint arXiv:2311.01223.
12 Pith papers cite this work.
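The mechanism summarized above is a single fixed diffusion policy steered by a low-dimensional continuous parameter. A toy sampler can illustrate it. This is a minimal sketch under stated assumptions, not the paper's implementation. The hypothetical `behavior_mean` decoder maps a 2-D parameter `z` to a 1-D action. The exact noise predictor for a point-mass target (`exact_eps`) stands in for a trained denoiser network. Sampling uses a deterministic DDIM-style update with a linear beta schedule. Interpolating `z` between two "strategies" changes the sampled action, while the denoiser is never updated.

```python
import math
import random
from typing import Callable, List, Sequence

EpsModel = Callable[[float, float, Sequence[float]], float]

def make_alpha_bars(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar[t] for a linear beta schedule; alpha_bar[0] = 1 (no noise)."""
    alpha_bars = [1.0]
    for s in range(T):
        beta = beta_start + (beta_end - beta_start) * s / (T - 1)
        alpha_bars.append(alpha_bars[-1] * (1.0 - beta))
    return alpha_bars

def behavior_mean(z: Sequence[float]) -> float:
    """Hypothetical decoder from a 2-D behavior parameter z to a 1-D action."""
    return 1.0 * z[0] - 0.5 * z[1]

def exact_eps(x: float, alpha_bar: float, z: Sequence[float]) -> float:
    """Stand-in for a trained eps-network: the exact noise prediction
    when the clean action is the point mass behavior_mean(z)."""
    mu = behavior_mean(z)
    return (x - math.sqrt(alpha_bar) * mu) / math.sqrt(1.0 - alpha_bar)

def ddim_sample(z: Sequence[float], eps_model: EpsModel, T: int = 50, seed: int = 0) -> float:
    """Deterministic DDIM reverse process conditioned on the parameter z."""
    alpha_bars = make_alpha_bars(T)
    x = random.Random(seed).gauss(0.0, 1.0)  # start from Gaussian noise
    for t in range(T, 0, -1):
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        eps = eps_model(x, a_t, z)
        # Predict the clean action, then re-noise it to level t-1.
        x0_pred = (x - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
        x = math.sqrt(a_prev) * x0_pred + math.sqrt(1.0 - a_prev) * eps
    return x

def interpolate(z_a: Sequence[float], z_b: Sequence[float], lam: float) -> List[float]:
    """Linear interpolation between two behavior parameters."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(z_a, z_b)]

if __name__ == "__main__":
    z_a, z_b = [1.0, 0.0], [0.0, 1.0]  # two hypothetical "strategies"
    for lam in (0.0, 0.5, 1.0):
        print(lam, ddim_sample(interpolate(z_a, z_b, lam), exact_eps))
```

Only `z` varies between the three calls. The denoiser (`exact_eps`) is shared, which mirrors the "no policy weight updates" property at toy scale.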
Citation roles: background (2).
Citation polarities: background (2).

Representative citing papers:
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, reporting state-of-the-art performance.
Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
TacticGen generates realistic, adaptable football tactics via a multi-agent diffusion transformer trained on 3.3M events and 100M frames, supporting rule-, language-, or model-based guidance at inference time.
The paper establishes an O(ε^{-4}) sample complexity bound for score estimation in diffusion models without requiring access to the empirical risk minimizer.
RS-Diffuser integrates diffusion planners, quantile regression critics, and CVaR-style guidance to produce behaviors ranging from risk-averse to risk-seeking with a single model in offline RL.
Conditional Graph Diffusion generates continuous negotiation outcomes with high individual rationality using GATv2 encoders, cross-attention fusion, and inference-time normative guidance gradients.
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees, and introduces DIDA, an annealed algorithm that converges to the global optimum.
Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.
A survey compiling DM-enabled DRL algorithms and applications across computation offloading, UAV systems, resource allocation, security, and robotics in wireless networks.
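The RS-Diffuser summary pairs quantile regression critics with CVaR-style guidance. The risk measure can be sketched from a critic's quantile outputs. With N equally weighted quantiles, CVaR at level α is approximately the mean of the quantiles in the lowest α fraction. This sketch rests on assumptions and is not the paper's method. It assumes QR-DQN-style quantile fractions τ_i = (2i+1)/(2N) for i = 0..N-1. It uses a hypothetical `select_by_risk` helper that ranks candidate plans rather than applying gradient guidance. It also treats the upper-tail average as the risk-seeking counterpart.

```python
from typing import List, Sequence

def cvar_from_quantiles(quantiles: Sequence[float], alpha: float, upper: bool = False) -> float:
    """Approximate CVaR from N equally weighted quantile estimates.

    Quantile i (0-based, after sorting) is assumed to sit at fraction
    tau_i = (2i + 1) / (2N). With upper=False this averages the lowest
    alpha fraction (risk-averse); with upper=True it averages the highest
    alpha fraction (a risk-seeking counterpart)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    qs = sorted(quantiles, reverse=upper)
    n = len(qs)
    taus = [(2 * i + 1) / (2 * n) for i in range(n)]
    tail = [q for q, tau in zip(qs, taus) if tau <= alpha]
    if not tail:  # alpha below the first quantile fraction: keep the extreme one
        tail = qs[:1]
    return sum(tail) / len(tail)

def select_by_risk(candidates: List[Sequence[float]], alpha: float, upper: bool = False) -> int:
    """Hypothetical helper: index of the candidate plan with the best risk score."""
    return max(range(len(candidates)),
               key=lambda k: cvar_from_quantiles(candidates[k], alpha, upper))

safe = [4.0, 5.0, 5.0, 6.0]        # low-variance return quantiles (mean 5.0)
risky = [-10.0, 2.0, 10.0, 20.0]   # high-variance return quantiles (mean 5.5)
print(select_by_risk([safe, risky], alpha=0.25))  # risk-averse choice
print(select_by_risk([safe, risky], alpha=1.0))   # risk-neutral (plain mean)
```

With α = 0.25 the low-variance plan scores higher (4.0 vs. -10.0). With α = 1 the score reduces to the mean, and the high-variance plan wins (5.5 vs. 5.0). Changing this one scalar therefore moves a single scorer along the risk-averse to risk-neutral range.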
Full title of the wireless-networks survey: From Denoising to Decision Making: A Survey on Diffusion Model-Enabled Deep Reinforcement Learning for Wireless Networks.