Rainbow: Combining Improvements in Deep Reinforcement Learning

Bilal Piot; Dan Horgan; David Silver; Georg Ostrovski; Hado van Hasselt; Joseph Modayil; Matteo Hessel; Mohammad Azar; Tom Schaul; Will Dabney

arxiv: 1710.02298 · v1 · pith:IUEA73S6new · submitted 2017-10-06 · 💻 cs.AI · cs.LG

Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

keywords: performance, algorithm, combination, deep, extensions, improvements, learning, reinforcement

The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
cs.AI 2026-05 unverdicted novelty 6.0

LQL stabilizes Q-learning by penalizing violations of n-step action-sequence lower bounds with a hinge loss computed from standard network outputs.

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
cs.AI 2026-05 unverdicted novelty 6.0

LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.

Benchmarking Batch Deep Reinforcement Learning Algorithms
cs.LG 2019-10 unverdicted novelty 6.0

Many batch RL algorithms underperform both online DQN and the behavioral policy on Atari; an adapted discrete-action BCQ outperforms the others tested.

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
cs.LG 2019-10 conditional novelty 6.0

AWR learns policies via advantage-weighted supervised regression on actions, achieving competitive off-policy performance on Gym tasks and strong results from static data alone.

Disentangled Skill Embeddings for Reinforcement Learning
cs.LG 2019-06 unverdicted novelty 6.0

Disentangled Skill Embeddings (DSE) is a variational inference framework for multi-task RL using shared parameters and task-specific latent embeddings for generalization to unseen conditions and as skills in hierarchical RL.

On Multi-Agent Learning in Team Sports Games
cs.MA 2019-06 unverdicted novelty 3.0

Describes a hierarchical RL method for multi-agent learning in team sports games aiming for human-like agents, reporting preliminary results that show promise.