Investigating Generalisation in Continuous Deep Reinforcement Learning

Chenyang Zhao; Freek Stulp; Olivier Sigaud; Timothy M. Hospedales

arxiv: 1902.07015 · v2 · pith:QVKAIGKRnew · submitted 2019-02-19 · 💻 cs.LG · cs.AI · stat.ML

Chenyang Zhao, Olivier Sigaud, Freek Stulp, Timothy M. Hospedales

classification 💻 cs.LG cs.AI stat.ML

keywords deep, generalisation, practice, training, algorithms, challenges, common, conclusions

Deep Reinforcement Learning has shown great success in a variety of control tasks. However, it is unclear how close we are to the vision of putting Deep RL into practice to solve real-world problems. In particular, common practice in the field is to train policies on largely deterministic simulators and to evaluate algorithms through training performance alone, without a train/test distinction to ensure models generalise and are not overfitted. Moreover, it is not standard practice to check for generalisation under domain shift, although robustness to such system change between training and testing would be necessary for real-world Deep RL control, for example, in robotics. In this paper we study these issues by first characterising the sources of uncertainty that provide generalisation challenges in Deep RL. We then provide a new benchmark and a thorough empirical evaluation of generalisation challenges for state-of-the-art Deep RL methods. In particular, we show that, if generalisation is the goal, then the common practice of evaluating algorithms based on their training performance leads to the wrong conclusions about algorithm choice. Finally, we evaluate several techniques for improving generalisation and draw conclusions about the most robust techniques to date.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Understanding Goal Generalisation in Sequential Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

Empirical analysis of over 100 sequential RL training pipelines across 250+ OOD environments finds salient features drive generalization and early goals persist, with latent policy gradients simulating latent variable...

VRLS: A Unified Reinforcement Learning Scheduler for Vehicle-to-Vehicle Communications
cs.NI 2019-07 unverdicted novelty 5.0

VRLS is a single reinforcement learning formulation for V2V resource scheduling that works across different densities and channel conditions, reduces collisions and half-duplex errors relative to prior schedulers, and...