EnergyFlow shows that denoising score matching on diffusion policies recovers the gradient of the expert's soft Q-function under maximum-entropy optimality, enabling non-adversarial reward extraction and improved policy generalization.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.RO 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
A survey of VLA robotics research identifies data infrastructure as the primary bottleneck and distills four open challenges in representation alignment, multimodal supervision, reasoning assessment, and scalable data generation.
citing papers explorer
-
Recovering Hidden Reward in Diffusion-Based Policies
EnergyFlow shows that denoising score matching on diffusion policies recovers the gradient of the expert's soft Q-function under maximum-entropy optimality, enabling non-adversarial reward extraction and improved policy generalization.
-
Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines
A survey of VLA robotics research identifies data infrastructure as the primary bottleneck and distills four open challenges in representation alignment, multimodal supervision, reasoning assessment, and scalable data generation.