Dueling Network Architectures for Deep Reinforcement Learning

2015 · cs.LG · arXiv 1511.06581

7 Pith papers cite this work. Polarity classification is still indexing.

abstract

In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state-of-the-art on the Atari 2600 domain.
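The factoring described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two estimators are recombined, so the mean-subtracted aggregation below is an assumption (it is the form commonly associated with this architecture), and the function name `dueling_q_values` is hypothetical. The sketch works on plain numbers rather than network outputs.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Assumed aggregation (not stated in the abstract):
        Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a'))
    Subtracting the mean makes the split between V and A identifiable:
    adding a constant to every advantage leaves Q unchanged.
    """
    if not advantages:
        raise ValueError("need at least one action")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]


# One state with three actions: the value stream outputs 2.0,
# the advantage stream outputs one number per action.
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
greedy_action = max(range(len(q_values)), key=q_values.__getitem__)
print(q_values, greedy_action)
```

Because the output is an ordinary vector of Q-values, the surrounding learning algorithm (for example Q-learning with a greedy or epsilon-greedy policy) needs no change, which is the point the abstract makes about the factoring.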

representative citing papers

Learning the Arrow of Time

cs.LG · 2019-07-02 · unverdicted · novelty 7.0

Introduces a learned arrow of time in MDPs that aligns with the Jordan-Kinderlehrer-Otto notion for stochastic processes and enables practical RL utilities like reachability and side-effect detection.

UAV Access Point Placement for Connectivity to a User with Unknown Location Using Deep RL

eess.SP · 2019-07-09 · unverdicted · novelty 6.0

Deep RL positions a UAV to deliver a target SINR to a user at an unknown location, using SINR feedback and a 3D map, and achieves 90% success in ray-tracing simulations.

Continual Reinforcement Learning with Diversity Exploration and Adversarial Self-Correction

cs.LG · 2019-06-21 · unverdicted · novelty 6.0

The CDAN framework combines diversity exploration and adversarial self-correction for continual RL in continuous control. Evaluated on a new CAM environment, it shows an 18.35% improvement in the NSD metric over the baseline.

Growing Action Spaces

cs.LG · 2019-06-28 · unverdicted · novelty 5.0

A curriculum of growing action spaces combined with simultaneous off-policy value estimation accelerates learning in large multi-agent action spaces.

In Hindsight: A Smooth Reward for Steady Exploration

cs.LG · 2019-06-24 · unverdicted · novelty 4.0

Adding a hindsight factor that integrates historic temporal differences into the Q-learning loss reduces overestimation and yields higher average scores than DQN, DDQN, and dueling networks on Atari games after 10 million frames.

Deep Reinforcement Learning for Personalized Search Story Recommendation

cs.LG · 2019-07-26 · unverdicted · novelty 3.0

A deep RL architecture using imitation learning and reinforcement learning is proposed to model immediate and future values of search story recommendations in a Markov decision process framework.

Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey

cs.LG · 2019-07-22 · unverdicted · novelty 2.0

This survey compiles deep reinforcement learning algorithms for clinical decision support, reviews case studies, and offers guidance on algorithm selection for medical applications.

citing papers explorer

Showing 7 of 7 citing papers.

Learning the Arrow of Time
cs.LG · 2019-07-02 · unverdicted · none · ref 20 · internal anchor

UAV Access Point Placement for Connectivity to a User with Unknown Location Using Deep RL
eess.SP · 2019-07-09 · unverdicted · none · ref 16 · internal anchor

Continual Reinforcement Learning with Diversity Exploration and Adversarial Self-Correction
cs.LG · 2019-06-21 · unverdicted · none · ref 31 · internal anchor

Growing Action Spaces
cs.LG · 2019-06-28 · unverdicted · none · ref 15 · internal anchor

In Hindsight: A Smooth Reward for Steady Exploration
cs.LG · 2019-06-24 · unverdicted · none · ref 16 · internal anchor

Deep Reinforcement Learning for Personalized Search Story Recommendation
cs.LG · 2019-07-26 · unverdicted · none · ref 55 · internal anchor

Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey
cs.LG · 2019-07-22 · unverdicted · none · ref 15 · internal anchor
