
arxiv: 1804.01238 · v1 · pith:QQVZ4Z5Tnew · submitted 2018-04-04 · 💻 cs.LG · stat.ML

Information Maximizing Exploration with a Latent Dynamics Model

keywords: methods, dynamics, exploration, learning, model, reinforcement, approach, latent

All reinforcement learning algorithms must handle the trade-off between exploration and exploitation. Many state-of-the-art deep reinforcement learning methods use noise in the action selection, such as Gaussian noise in policy gradient methods or $\epsilon$-greedy in Q-learning. While these methods are appealing due to their simplicity, they do not explore the state space in a methodical manner. We present an approach that uses a model to derive reward bonuses as a means of intrinsic motivation to improve model-free reinforcement learning. A key insight of our approach is that this dynamics model can be learned in the latent feature space of a value function, representing the dynamics of the agent and the environment. This method is both theoretically grounded and computationally advantageous, permitting the efficient use of Bayesian information-theoretic methods in high-dimensional state spaces. We evaluate our method on several continuous control tasks, focusing on improving exploration.
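The abstract does not specify the model, so the following is only a minimal sketch of the general idea: an information-gain reward bonus computed from a Bayesian dynamics model over latent features. It assumes a Bayesian linear model with known noise variance as a stand-in for the paper's dynamics model, and a fixed random projection as a stand-in for the value function's latent features. The names `encode`, `LatentInfoGainBonus`, `augmented_reward`, `beta`, `prior_var`, and `noise_var` are hypothetical and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, LATENT_DIM = 4, 3

# Hypothetical stand-in for the value function's latent features:
# a fixed random projection followed by tanh (an assumption, not the paper's network).
_PROJECTION = rng.standard_normal((LATENT_DIM, STATE_DIM))


def encode(state):
    """Map a raw state to a latent feature vector."""
    return np.tanh(_PROJECTION @ state)


class LatentInfoGainBonus:
    """Bayesian linear dynamics model over latent features (illustrative assumption).

    With a Gaussian prior on the weights and known Gaussian noise, the posterior
    covariance depends only on the inputs seen so far, so the expected information
    gain of a new input z has the closed form 0.5 * log(1 + z^T Sigma z / noise_var).
    """

    def __init__(self, latent_dim, prior_var=1.0, noise_var=0.1):
        self.noise_var = noise_var
        self.precision = np.eye(latent_dim) / prior_var  # posterior precision
        self.cov = np.eye(latent_dim) * prior_var        # posterior covariance

    def info_gain(self, z):
        """Expected information gain (in nats, per output dimension) from observing z."""
        return 0.5 * np.log1p(z @ self.cov @ z / self.noise_var)

    def update(self, z):
        """Bayesian update of the posterior covariance after observing input z."""
        self.precision = self.precision + np.outer(z, z) / self.noise_var
        self.cov = np.linalg.inv(self.precision)


def augmented_reward(extrinsic_reward, bonus, beta=0.1):
    """Add the scaled intrinsic bonus to the environment reward."""
    return extrinsic_reward + beta * bonus


# Usage: the bonus for a state shrinks once the model has seen it.
model = LatentInfoGainBonus(LATENT_DIM)
z = encode(np.array([0.1, -0.2, 0.3, 0.0]))
bonus_first = model.info_gain(z)
model.update(z)
bonus_second = model.info_gain(z)
print(bonus_first, bonus_second)
print(augmented_reward(1.0, bonus_first))
```

In this sketch the bonus is largest for latent directions the model has rarely seen and decays with repeated visits, which is the behavior an intrinsic-motivation signal is meant to provide. A model-free learner would then be trained on the augmented reward instead of the raw environment reward.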
