EXPO stabilizes online RL for expressive policies by training a base policy with imitation and using a lightweight Gaussian edit policy to select higher-value actions on the fly for sampling and TD backups.
Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, and Sergey Levine
2 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.
citing papers explorer
-
EXPO: Stable Reinforcement Learning with Expressive Policies
EXPO stabilizes online RL for expressive policies by training a base policy with imitation and using a lightweight Gaussian edit policy to select higher-value actions on the fly for sampling and TD backups.
-
RISE: Self-Improving Robot Policy with Compositional World Model
RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.