arxiv: 1710.02298 · v1 · submitted 2017-10-06 · 💻 cs.AI · cs.LG

Rainbow: Combining Improvements in Deep Reinforcement Learning

Bilal Piot, Dan Horgan, David Silver, Georg Ostrovski, Hado van Hasselt, Joseph Modayil, Matteo Hessel, Mohammad Azar, Tom Schaul, Will Dabney

keywords: performance, algorithm, combination, deep, extensions, improvements, learning, reinforcement

The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

    cs.AI 2026-05 unverdicted novelty 6.0

    LQL stabilizes Q-learning by penalizing violations of n-step action-sequence lower bounds with a hinge loss computed from standard network outputs.

  2. Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

    cs.AI 2026-05 unverdicted novelty 6.0

    LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.

  3. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

    cs.LG 2019-10 conditional novelty 6.0

    AWR learns policies via advantage-weighted supervised regression on actions, achieving competitive off-policy performance on Gym tasks and strong results from static data alone.