A Brief Survey of Deep Reinforcement Learning

Anil Anthony Bharath; Kai Arulkumaran; Marc Peter Deisenroth; Miles Brundage

arxiv: 1708.05866 · v2 · pith:B44GYKFGnew · submitted 2017-08-19 · cs.LG · cs.AI · cs.CV · stat.ML

Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath

classification cs.LG cs.AI cs.CV stat.ML

keywords learning deep reinforcement field survey algorithms directly understanding

Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
cs.LG 2026-04 unverdicted novelty 6.0
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.

FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
cs.LG 2026-04 unverdicted novelty 6.0
FlashSAC scales up Soft Actor-Critic with fewer updates, larger models, higher data throughput, and norm bounds to deliver faster, more stable training than PPO on high-dimensional robot control tasks across dozens of...

A Survey on Vision-Language-Action Models for Embodied AI
cs.RO 2024-05 unverdicted novelty 6.0
This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.

An Aircraft Upset Recovery System with Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 4.0
A SAC-based reinforcement learning controller for aircraft upset recovery is judged by domain experts to produce more desirable behavior than conventional control methods.

Plasticity Loss in Deep Reinforcement Learning: A Survey
cs.AI 2024-11 unverdicted novelty 4.0
Survey unifies the definition of plasticity loss in DRL, taxonomizes over 50 mitigations, identifies evaluation gaps, and finds general regularization often outperforms domain-specific methods.

An Automatic Ground Collision Avoidance System with Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 3.0
The paper designs a reinforcement learning-based automatic ground collision avoidance system for jet trainers that uses limited observations and line-of-sight terrain queries to prevent collisions.

Deep Reinforcement Learning for Personalized Search Story Recommendation
cs.LG 2019-07 unverdicted novelty 3.0
A deep RL architecture using imitation learning and reinforcement learning is proposed to model immediate and future values of search story recommendations in a Markov decision process framework.

Perfecting Aircraft Maneuvers with Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 2.0
Reinforcement learning agents simulate multiple aircraft aerobatic maneuvers to support development of an AI-assisted pilot training module.

Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey
cs.LG 2019-07 unverdicted novelty 2.0
This survey compiles deep reinforcement learning algorithms for clinical decision support, reviews case studies, and offers guidance on algorithm selection for medical applications.