
arXiv: 1707.03497 · v2 · submitted 2017-07-11 · cs.AI · cs.LG


Value Prediction Network

keywords: model-based, network, deep, future, learning, methods, model, model-free

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
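
To make the idea of planning over value predictions rather than observation predictions concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract states are plain integers, the functions `toy_transition`, `toy_reward`, and `toy_value` are hypothetical stand-ins for learned network heads, and the backup is a generic depth-limited max over options, which may differ from the exact rule used in the paper.

```python
from typing import Callable, Sequence

# Hypothetical stand-ins for the learned components of an abstract-state model.
# In a real system these would be neural network heads; here they are toy functions.
Transition = Callable[[int, int], int]   # (abstract state, option) -> next abstract state
Reward = Callable[[int, int], float]     # (abstract state, option) -> predicted reward
Value = Callable[[int], float]           # abstract state -> predicted value


def q_lookahead(state: int, option: int, depth: int,
                transition: Transition, reward: Reward, value: Value,
                options: Sequence[int], gamma: float) -> float:
    """Estimate the return of taking `option` in `state` with `depth`-step lookahead.

    No observation is ever predicted: only rewards, values, and abstract states.
    """
    next_state = transition(state, option)
    r = reward(state, option)
    if depth <= 1:
        # Leaf of the search: bootstrap from the predicted value.
        return r + gamma * value(next_state)
    # Otherwise expand every option one level deeper and back up the best estimate.
    best = max(
        q_lookahead(next_state, o, depth - 1, transition, reward, value, options, gamma)
        for o in options
    )
    return r + gamma * best


# Toy chain with abstract states 0..3 (an assumption made for illustration).
OPTIONS = (0, 1)   # 0 = stay, 1 = move right
GAMMA = 0.5


def toy_transition(state: int, option: int) -> int:
    return min(state + option, 3)


def toy_reward(state: int, option: int) -> float:
    # Reward only for stepping from state 2 into state 3.
    return 1.0 if (state == 2 and option == 1) else 0.0


def toy_value(state: int) -> float:
    # Deliberately uninformative value estimate, so planning has to do the work.
    return 0.0


def plan(state: int, depth: int) -> int:
    """Pick the option with the highest lookahead estimate (ties go to the first option)."""
    return max(
        OPTIONS,
        key=lambda o: q_lookahead(state, o, depth, toy_transition, toy_reward,
                                  toy_value, OPTIONS, GAMMA),
    )
```

In this toy setup, `plan(1, depth=1)` cannot see the reward two steps away and falls back to option 0 on a tie, while `plan(1, depth=2)` selects option 1. This illustrates, under the stated assumptions, how even short lookahead over predicted rewards and values can change the chosen option.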
