Visual Reinforcement Learning with Imagined Goals

Ashvin Nair , Vitchyr Pong , Murtaza Dalal , Shikhar Bahl , Steven Lin , Sergey Levine

classification cs.LG cs.CV cs.RO stat.ML

keywords goals, learn, learning, agent, algorithm, general-purpose, goal, must

For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test-time are not known in advance, the agent performs a self-supervised "practice" phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample-efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.
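
The three roles the abstract assigns to the learned representation (goal sampling, input transformation, reward computation), together with retroactive goal relabeling, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not taken from the paper: the encoder is a fixed random linear map standing in for a learned model, goals are drawn from a unit Gaussian, the reward is the negative Euclidean distance in latent space, and the names `encode`, `sample_imagined_goal`, `latent_reward`, and `relabel` are hypothetical.

```python
import math
import random

rng = random.Random(0)

LATENT_DIM = 4   # hypothetical latent size
IMAGE_DIM = 12   # hypothetical (tiny) flattened image size

# Stand-in for a learned encoder: a fixed random linear map (assumption).
_W = [[rng.gauss(0.0, 1.0) / math.sqrt(IMAGE_DIM) for _ in range(IMAGE_DIM)]
      for _ in range(LATENT_DIM)]


def encode(image):
    """Transform a flattened image into a latent vector."""
    return [sum(w * x for w, x in zip(row, image)) for row in _W]


def sample_imagined_goal():
    """Imagine a goal by sampling in latent space (unit Gaussian assumed)."""
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def latent_reward(z, z_goal):
    """Goal-reaching reward: negative Euclidean distance between latents."""
    return -math.dist(z, z_goal)


def relabel(transition, new_goal):
    """Swap the goal of a stored transition and recompute its reward."""
    z, action, z_next, _old_goal, _old_reward = transition
    return (z, action, z_next, new_goal, latent_reward(z_next, new_goal))


# One practice step: encode observations, imagine a goal, store the transition.
image = [rng.random() for _ in range(IMAGE_DIM)]
next_image = [rng.random() for _ in range(IMAGE_DIM)]
z, z_next = encode(image), encode(next_image)
goal = sample_imagined_goal()
action = [0.0, 0.0]  # placeholder action
transition = (z, action, z_next, goal, latent_reward(z_next, goal))

# Relabel with the state that was actually reached, so the stored
# transition becomes an example of successfully reaching a goal.
relabeled = relabel(transition, z_next)
```

Because the reward is recomputed from stored latents, relabeling needs no further interaction with the environment, which is consistent with the abstract's description of the method as off-policy and its emphasis on sample efficiency.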

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic
cs.LG 2026-05 unverdicted novelty 7.0

Embedding Temporal Logic enables runtime monitoring of temporally extended perceptual behaviors by defining predicates via distances between observed and reference embeddings in learned spaces, with conformal calibrat...

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic
cs.LG 2026-05 unverdicted novelty 7.0

Embedding Temporal Logic (ETL) performs runtime monitoring directly in learned embedding spaces using distance-based predicates composed with temporal operators, supported by conformal calibration for reliable predica...

Breaking the Computational Barrier: Provably Efficient Actor-Critic for Low-Rank MDPs
cs.LG 2026-05 unverdicted novelty 6.0

An actor-critic RL algorithm for low-rank MDPs achieves improved sample efficiency using solely a policy evaluation oracle.