Deep Reinforcement Learning with Double Q-learning

Hado van Hasselt, Arthur Guez, David Silver · 2015 · cs.LG · arXiv 1509.06461

4 Pith papers cite this work. Polarity classification is still indexing.

abstract

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
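The abstract's central idea, separating action selection from action evaluation, can be illustrated with a small sketch. It assumes the commonly cited form of the two bootstrap targets: the standard target takes a max over one set of value estimates, while the double variant picks the action with one set (the online estimates) and scores it with another (the target estimates). The function names, the all-zero true values, and the independent Gaussian noise are illustrative assumptions, not details taken from this page. In practice the two networks are correlated, so the reduction in bias would be smaller than in this toy setting.

```python
import random


def dqn_target(reward, gamma, q_target_next):
    """Standard target: the same estimates both select and evaluate the action."""
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, gamma, q_online_next, q_target_next):
    """Double target: online estimates select the action, target estimates evaluate it."""
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def mean_bias(num_actions=10, noise_std=1.0, trials=20000, seed=0):
    """Average both targets when every true action value is 0 (hypothetical setup).

    With zero reward and zero true values, any nonzero average is pure bias.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent noisy estimates of the same (all-zero) action values.
        online = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        target = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        single_total += dqn_target(0.0, 1.0, target)
        double_total += double_dqn_target(0.0, 1.0, online, target)
    return single_total / trials, double_total / trials


single_bias, double_bias = mean_bias()
print(f"max-based target bias:    {single_bias:.3f}")
print(f"double-estimator bias:    {double_bias:.3f}")
```

Under these assumptions the max-based target is biased clearly upward, because the max picks whichever action's noise happens to be largest, while the double estimator averages close to zero because the selecting noise is independent of the evaluating noise.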

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Mastering Atari with Discrete World Models

cs.LG · 2020-10-05 · accept · novelty 7.0

DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

Neural Assistive Impulses: Synthesizing Exaggerated Motions for Physics-based Characters

cs.AI · 2026-04-07 · unverdicted · novelty 7.0

A hybrid neural policy operating in impulse space enables physics-based characters to track exaggerated, dynamically infeasible motions that standard DRL methods cannot stabilize.

Integrating Causal DAGs in Deep RL: Activating Minimal Markovian States with Multi-Order Exposure

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

A procedure builds provably minimal Markovian states from a longitudinal causal graph, but deep RL requires multi-order historical state exposure (MOSE) to realize gains over minimal or fixed-window baselines.

From Bootstrapping to Sequence Modeling: A Unified Generative Framework for Personalized Landing-Page Modeling

cs.IR · 2026-06-26 · unverdicted · novelty 5.0

GLAN replaces CQL bootstrapping with Decision Transformer sequence modeling for PLPM, using global inter-day (L-RTG) and local session (HRM) modules to achieve +0.158% DAU and +0.108% LT gains in Kuaishou online tests.

citing papers explorer

Mastering Atari with Discrete World Models · cs.LG · 2020-10-05 · accept · none · ref 47 · internal anchor
Neural Assistive Impulses: Synthesizing Exaggerated Motions for Physics-based Characters · cs.AI · 2026-04-07 · unverdicted · none · ref 31
Integrating Causal DAGs in Deep RL: Activating Minimal Markovian States with Multi-Order Exposure · cs.LG · 2026-05-08 · unverdicted · none · ref 61
From Bootstrapping to Sequence Modeling: A Unified Generative Framework for Personalized Landing-Page Modeling · cs.IR · 2026-06-26 · unverdicted · none · ref 42 · internal anchor
