Reinforcement Learning with Unsupervised Auxiliary Tasks

David Silver; Joel Z Leibo; Koray Kavukcuoglu; Max Jaderberg; Tom Schaul; Volodymyr Mnih; Wojciech Marian Czarnecki

arxiv: 1611.05397 · v1 · pith:4KAI5ADKnew · submitted 2016-11-16 · 💻 cs.LG · cs.NE

Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, Koray Kavukcuoglu

keywords learning · reinforcement · tasks · agent · averaging · expert · extrinsic · human

Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that also maximises many other pseudo-reward functions simultaneously by reinforcement learning. All of these tasks share a common representation that, like unsupervised learning, continues to develop in the absence of extrinsic rewards. We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task. Our agent significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and a challenging suite of first-person, three-dimensional Labyrinth tasks leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
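The idea of training on pseudo-rewards alongside the extrinsic reward can be illustrated with a minimal sketch. The abstract does not say which pseudo-rewards are used or how the objectives are combined, so everything here is an assumption made for illustration: the pseudo-reward is taken to be the mean absolute pixel change within each cell of a grid over the frame, and the objectives are combined as a plain weighted sum. The names `pixel_change_rewards`, `combined_loss`, `cell`, and `aux_weights` are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # 2-D grayscale image: rows of pixel intensities


def pixel_change_rewards(prev: Frame, curr: Frame, cell: int) -> List[List[float]]:
    """Return one pseudo-reward per grid cell: the mean absolute intensity change.

    Assumption: this is one illustrative pseudo-reward, not necessarily the
    one used by the paper's agent.
    """
    height, width = len(prev), len(prev[0])
    if height % cell or width % cell:
        raise ValueError("frame size must be a multiple of the cell size")
    rewards = []
    for top in range(0, height, cell):
        row = []
        for left in range(0, width, cell):
            total = 0.0
            for y in range(top, top + cell):
                for x in range(left, left + cell):
                    total += abs(curr[y][x] - prev[y][x])
            row.append(total / (cell * cell))
        rewards.append(row)
    return rewards


def combined_loss(main_loss: float,
                  aux_losses: List[float],
                  aux_weights: List[float]) -> float:
    """Weighted sum of the extrinsic-reward loss and the auxiliary-task losses.

    Assumption: a simple weighted sum; the weights are hypothetical
    hyperparameters. Because all losses would be computed from one shared
    representation, auxiliary gradients keep shaping it even when the
    extrinsic reward is absent.
    """
    if len(aux_losses) != len(aux_weights):
        raise ValueError("need exactly one weight per auxiliary loss")
    return main_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))


# Usage: a 4x4 frame in which a single pixel changes, split into 2x2 cells.
prev_frame = [[0.0] * 4 for _ in range(4)]
curr_frame = [[0.0] * 4 for _ in range(4)]
curr_frame[0][0] = 4.0
rewards = pixel_change_rewards(prev_frame, curr_frame, cell=2)
total = combined_loss(1.0, [2.0, 4.0], [0.5, 0.25])
```

In this toy run only the top-left cell receives a non-zero pseudo-reward, which shows how such a signal can be dense even when the extrinsic reward is sparse.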

Forward citations

Cited by 13 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
cs.LG 2019-11 accept novelty 8.0

MuZero matches or exceeds AlphaZero-level performance in Go, Chess, and Shogi, and sets a new state of the art on 57 Atari games by learning a model that directly supports planning rather than reconstructing full environme...
A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 7.0

Adapting RFRL objectives as auxiliary tasks with preference-guided exploration outperforms prior MORL methods in performance and data efficiency on MO-Gymnasium tasks.
Mastering Diverse Domains through World Models
cs.AI 2023-01 unverdicted novelty 7.0

DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.
Dream to Control: Learning Behaviors by Latent Imagination
cs.LG 2019-12 accept novelty 7.0

Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.
Goal-Conditioned Agents that Learn Everything All at Once
cs.LG 2026-05 unverdicted novelty 6.0

LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
Learning to Theorize the World from Observation
cs.LG 2026-05 unverdicted novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.
Reflective Context Learning: Studying the Optimization Primitives of Context Space
cs.LG 2026-04 unverdicted novelty 6.0

Reflective Context Learning unifies context optimization for agents by recasting prior methods as instances of a shared learning problem and extending them with classical primitives such as batching, failure replay, a...
Reliability-Aware Geometric Fusion for Robust Audio-Visual Navigation
cs.SD 2026-04 unverdicted novelty 6.0

RAVN improves audio-visual navigation by learning audio-derived reliability cues via an Acoustic Geometry Reasoner and using them to modulate visual features through Reliability-Aware Geometric Modulation.
Learning Belief Representations for Imitation Learning in POMDPs
cs.LG 2019-06 unverdicted novelty 6.0

BMIL learns belief modules jointly with policies for GAIL-style imitation learning in POMDPs, outperforming separate training and standard GAIL on continuous control tasks.
Continual Reinforcement Learning with Diversity Exploration and Adversarial Self-Correction
cs.LG 2019-06 unverdicted novelty 6.0

The CDAN framework uses diversity exploration and adversarial self-correction for continual RL in continuous control. Evaluated on a new CAM environment, it shows an 18.35% improvement over the baseline on the NSD metric.
Supervise Thyself: Examining Self-Supervised Representations in Interactive Environments
cs.LG 2019-06 unverdicted novelty 5.0

Empirical comparison finds that self-supervised representations vary in capturing agent state and generalizing to new levels or textures depending on environment visuals and dynamics.
Shaping Belief States with Generative Environment Models for RL
cs.LG 2019-06 unverdicted novelty 5.0

Multi-step predictive generative models form stable belief states capturing environment layout and agent pose, yielding higher data efficiency on RL tasks than model-free agents.
To Learn or Not to Learn: Analyzing the Role of Learning for Navigation in Virtual Environments
cs.CV 2019-07 unverdicted novelty 4.0

Classical agents outperform learning-based ones on MINOS and Stanford 3D Indoor Spaces, with learned agents weaker at collision avoidance and memory but stronger at handling ambiguity and noise.