Sample Efficient Actor-Critic with Experience Replay
read the original abstract
This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.
This paper has not been read by Pith yet.
Forward citations
Cited by 12 Pith papers
-
Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise
Establishes maximal concentration bounds for stochastic approximation under heavy-tailed Markovian noise, with tails ranging from sub-Gaussian to heavier than Weibull depending on step sizes and contractivity properti...
-
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
-
Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters
Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.
-
Beyond Importance Sampling: Rejection-Gated Policy Optimization
RGPO replaces importance sampling with a smooth [0,1] acceptance gate in policy gradients, unifying TRPO/PPO/REINFORCE, bounding variance for heavy-tailed ratios, and showing gains in online RLHF experiments.
-
Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives
Shows entropy coupling limits DSAC on discrete tasks and introduces a generalized actor-critic framework with m-step critics and novel entropy-regularized objectives that perform robustly on Atari.
-
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
AWR learns policies via advantage-weighted supervised regression on actions, achieving competitive off-policy performance on Gym tasks and strong results from static data alone.
-
Polychromic Objectives for Reinforcement Learning
Introduces polychromic objectives adapted into PPO via vine sampling and modified advantages, showing higher success rates and better coverage under perturbations on BabyAI, Minigrid, and algorithmic tasks.
-
Multi-Agent Deep Reinforcement Learning for Liquidation Strategy Analysis
The authors extend the Almgren-Chriss model to a multi-agent setting and apply deep reinforcement learning to simulate and optimize liquidation strategies under practical constraints.
-
To Learn or Not to Learn: Analyzing the Role of Learning for Navigation in Virtual Environments
Classical agents outperform learning-based ones on MINOS and Stanford 3D Indoor Spaces, with learned agents weaker at collision avoidance and memory but stronger at handling ambiguity and noise.
-
A Dual Memory Structure for Efficient Use of Replay Memory in Deep Reinforcement Learning
Dual memory (main plus cache) for replay memory in DRL yields higher scores than single memory across three Gym environments.
-
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.
-
Optimal Use of Experience in First Person Shooter Environments
Empirical tests in VizDoom show multiple DQN updates per step do not improve performance after learning rate adjustment, with a 4:1 update-to-step ratio optimal before significant degradation.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.