Diffusion models for reinforcement learning: A survey
4 Pith papers cite this work.
citing papers explorer
- Aligning Flow Map Policies with Optimal Q-Guidance
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
- Muninn: Your Trajectory Diffusion Model But Faster
Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
- TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics
TacticGen generates realistic, adaptable football tactics via a multi-agent diffusion transformer trained on 3.3M events and 100M frames, supporting rule-, language-, or model-based guidance at inference time.
- Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation
RSBM exploits velocity field invariance across regularization levels to achieve over 94% cosine similarity and 92% success in visual navigation using only 3 integration steps.