arxiv: 1803.00933 · v1 · pith:MO2K7L7Rnew · submitted 2018-03-02 · 💻 cs.LG

Distributed Prioritized Experience Replay

classification 💻 cs.LG
keywords experience · architecture · learning · replay · actors · data · distributed · environment

We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. Our architecture substantially improves the state of the art on the Arcade Learning Environment, achieving better final performance in a fraction of the wall-clock training time.
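The decoupling described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation. Everything concrete here is an assumption made for illustration: the two-action toy environment, the list of two Q-values standing in for the shared neural network, the names `PrioritizedReplay`, `actor_step`, `learner_step`, and `run`, the per-actor exploration rates, the exponent `alpha`, and the FIFO eviction rule. The only ideas taken from the abstract are that actors write experience into a shared replay memory, and that the learner samples from it by priority and updates the shared parameters.

```python
import random


class PrioritizedReplay:
    """Toy shared replay memory; samples in proportion to priority ** alpha."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # assumed exponent; 0 would give uniform sampling
        self.items = []
        self.priorities = []

    def add(self, transition, priority):
        if len(self.items) >= self.capacity:
            # Simple FIFO eviction (an assumption of this sketch).
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, rng):
        weights = [p ** self.alpha for p in self.priorities]
        indices = rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update_priorities(self, indices, new_priorities):
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = p


def actor_step(q, replay, epsilon, rng):
    """Act epsilon-greedily with the shared parameters and store the experience."""
    if rng.random() < epsilon:
        action = rng.randrange(2)
    else:
        action = max(range(2), key=lambda a: q[a])
    reward = 1.0 if action == 1 else 0.0  # toy environment: action 1 pays off
    priority = abs(reward - q[action]) + 1e-3  # absolute error as priority
    replay.add((action, reward), priority)


def learner_step(q, replay, rng, batch_size=8, lr=0.1):
    """Replay a prioritized batch, update the shared parameters, refresh priorities."""
    indices, batch = replay.sample(batch_size, rng)
    new_priorities = []
    for action, reward in batch:
        q[action] += lr * (reward - q[action])
        new_priorities.append(abs(reward - q[action]) + 1e-3)
    replay.update_priorities(indices, new_priorities)


def run(num_actors=4, steps=200, seed=0):
    rng = random.Random(seed)
    q = [0.0, 0.0]  # stand-in for the shared network
    replay = PrioritizedReplay(capacity=1000)
    # Each actor explores at a different rate (illustrative choice).
    epsilons = [0.4 ** (1 + i) for i in range(num_actors)]
    for _ in range(steps):
        for eps in epsilons:
            actor_step(q, replay, eps, rng)
        learner_step(q, replay, rng)
    return q, replay


if __name__ == "__main__":
    q, replay = run()
    print("learned values:", q, "stored transitions:", len(replay.items))
```

In this sketch the rare rewarding transitions receive a high initial priority, so the learner replays them far more often than the many zero-reward ones. A real distributed system would run actors and the learner as separate processes and periodically copy network parameters to the actors; the loop above only interleaves them.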

This paper has not been read by Pith yet.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mastering Atari with Discrete World Models

    cs.LG 2020-10 accept novelty 7.0

    DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

  2. Dota 2 with Large Scale Deep Reinforcement Learning

    cs.LG 2019-12 accept novelty 7.0

    OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.

  3. Optimal design of solar-battery hybrid resources considering multi-market participation under weather and price uncertainty

    eess.SY 2026-05 unverdicted novelty 6.0

    A deep reinforcement learning co-optimization framework is developed for jointly sizing solar-battery hybrids and determining their multi-market bidding strategies under stochastic weather and price conditions.

  4. Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

    cs.LG 2025-09 unverdicted novelty 6.0

    A method trains discrete diffusion policies for combinatorial RL by matching to a PMD-regularized target distribution, reporting SOTA performance and sample efficiency on DNA generation, macro-action, and multi-agent ...

  5. Language Models (Mostly) Know What They Know

    cs.CL 2022-07 unverdicted novelty 6.0

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  6. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  7. Scaling Laws for Transfer

    cs.LG 2021-02 unverdicted novelty 6.0

    Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

  8. Learning Safe Unlabeled Multi-Robot Planning with Motion Constraints

    cs.RO 2019-07 unverdicted novelty 5.0

    A multi-agent RL framework for unlabeled multi-robot planning that uses velocity obstacle projections to guarantee collision-free trajectories applicable to arbitrary robot models.

  9. Convolutional Reservoir Computing for World Models

    cs.LG 2019-07 unverdicted novelty 4.0

    RCRC uses untrained random CNNs and reservoir computing plus evolution strategies to reach claimed state-of-the-art scores in reinforcement learning tasks while avoiding data storage and heavy training.

  10. A Deep Reinforcement Learning Approach for Global Routing

    cs.LG 2019-06 unverdicted novelty 4.0

    Deep RL agent trained on generated global routing instances outperforms sequential A* search.