
arxiv: 1703.01732 · v1 · pith:673E4YWYnew · submitted 2017-03-06 · 💻 cs.LG

Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

classification 💻 cs.LG
keywords learning · exploration · intrinsic · motivation · reinforcement · rewards · tasks · complex

Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards. Recent successes in deep reinforcement learning have been achieved mostly using simple heuristic exploration strategies such as $\epsilon$-greedy action selection or Gaussian control noise, but there are many tasks where these methods are insufficient to make any learning progress. Here, we consider more complex heuristics: efficient and scalable exploration strategies that maximize a notion of an agent's surprise about its experiences via intrinsic motivation. We propose to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model. One of our approximations results in using surprisal as intrinsic motivation, while the other gives the $k$-step learning progress. We show that our incentives enable agents to succeed in a wide range of environments with high-dimensional state spaces and very sparse rewards, including continuous control tasks and games in the Atari RAM domain, outperforming several other heuristic exploration techniques.
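The two intrinsic rewards named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical linear-Gaussian transition model (`LinearGaussianModel`), a fixed noise scale `sigma`, a plain gradient step on the log-likelihood, and a hypothetical weighting coefficient `eta`. Surprisal is taken as the negative log-probability of the observed transition under the learned model, and learning progress as the gain in log-probability between an older and a newer copy of the model.

```python
import numpy as np


class LinearGaussianModel:
    """Hypothetical transition model: s' ~ N(W [s; a; 1], sigma^2 I)."""

    def __init__(self, state_dim, action_dim, sigma=1.0, lr=0.01):
        self.W = np.zeros((state_dim, state_dim + action_dim + 1))
        self.sigma = sigma
        self.lr = lr

    def _features(self, s, a):
        # Concatenate state, action, and a constant bias feature.
        return np.concatenate([s, a, [1.0]])

    def log_prob(self, s, a, s_next):
        # Log-density of the observed next state under the learned model.
        mean = self.W @ self._features(s, a)
        d = s_next.shape[0]
        sq_err = np.sum((s_next - mean) ** 2)
        return (-0.5 * sq_err / self.sigma ** 2
                - 0.5 * d * np.log(2.0 * np.pi * self.sigma ** 2))

    def update(self, s, a, s_next):
        # One gradient-ascent step on the log-likelihood of the transition.
        x = self._features(s, a)
        err = s_next - self.W @ x
        self.W += self.lr * np.outer(err, x) / self.sigma ** 2


def surprisal_reward(model, s, a, s_next):
    """Surprisal bonus: -log P_model(s' | s, a)."""
    return -model.log_prob(s, a, s_next)


def learning_progress_reward(new_model, old_model, s, a, s_next):
    """Learning-progress bonus: log P_new(s' | s, a) - log P_old(s' | s, a)."""
    return new_model.log_prob(s, a, s_next) - old_model.log_prob(s, a, s_next)


def shaped_reward(extrinsic, intrinsic, eta=0.1):
    """Combine the task reward with a bonus; eta is an assumed trade-off weight."""
    return extrinsic + eta * intrinsic
```

In use, the model would be updated on the same transitions the policy collects, so frequently visited transitions become predictable and their surprisal bonus shrinks, while the learning-progress bonus is positive only where the newer model explains the transition better than the older one.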



Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Goal-Conditioned Agents that Learn Everything All at Once

    cs.LG 2026-05 unverdicted novelty 6.0

    LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.

  2. Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

    cs.LG 2019-07 unverdicted novelty 6.0

    A two-stage framework learns a world graph of pivotal states task-agnostically via joint training of a latent model and curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.

  3. LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning

    cs.RO 2025-09 unverdicted novelty 5.0

    LLM-TALE steers RL exploration using LLM-generated plans at task and affordance levels with online suboptimality correction, improving sample efficiency and success rates on pick-and-place tasks without human supervision.

  4. Neural Embedding for Physical Manipulations

    cs.LG 2019-07 unverdicted novelty 4.0

    Generative model with normalized pairwise distance constraint discovers output space topologies from sparse data and outperforms GANs and VAEs by avoiding mode collapse.