Recognition: unknown
Distributed Prioritized Experience Replay
read the original abstract
We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. Our architecture substantially improves the state of the art on the Arcade Learning Environment, achieving better final performance in a fraction of the wall-clock training time.
This paper has not been read by Pith yet.
Forward citations
Cited by 5 Pith papers
-
Mastering Atari with Discrete World Models
DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.
-
Dota 2 with Large Scale Deep Reinforcement Learning
OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.
-
Optimal design of solar-battery hybrid resources considering multi-market participation under weather and price uncertainty
A deep reinforcement learning co-optimization framework is developed for jointly sizing solar-battery hybrids and determining their multi-market bidding strategies under stochastic weather and price conditions.
-
Language Models (Mostly) Know What They Know
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
-
A General Language Assistant as a Laboratory for Alignment
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.