Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
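The two mechanisms named in the abstract, taking the minimum over a pair of critics and delaying policy updates, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the discount factor `gamma`, and the `policy_delay` value are hypothetical choices made for illustration, and the critic outputs are passed in as plain arrays rather than computed by networks.

```python
import numpy as np


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Return TD targets that use the smaller of two critic estimates.

    q1_next and q2_next are the two critics' value estimates for the next
    state-action pair. Taking the elementwise minimum limits overestimation.
    gamma is an assumed discount factor, not a value taken from the abstract.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    min_q = np.minimum(np.asarray(q1_next, dtype=float),
                       np.asarray(q2_next, dtype=float))
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * min_q


def should_update_actor(step, policy_delay=2):
    """Delay policy updates: update the actor only every policy_delay steps.

    policy_delay=2 is an assumed setting for this sketch.
    """
    return step % policy_delay == 0


# Two transitions: the second one is terminal.
targets = clipped_double_q_target(
    reward=[1.0, 0.5],
    done=[0, 1],
    q1_next=[10.0, 3.0],
    q2_next=[8.0, 4.0],
    gamma=0.9,
)
actor_steps = [step for step in range(6) if should_update_actor(step)]
```

In this toy batch the first target bootstraps from the lower estimate (8.0 rather than 10.0), and the actor would be updated on every second critic step.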
Forward citations
Cited by 8 Pith papers
-
To Learn or Not to Learn: A Litmus Test for Using Reinforcement Learning in Control
A litmus test based on reachset-conformant model identification and correlation analysis of uncertainties predicts if RL-based control is superior to model-based control without any RL training.
-
Soft Actor-Critic Algorithms and Applications
SAC extends maximum-entropy RL into a stable off-policy actor-critic method with constrained temperature tuning, outperforming prior algorithms in sample efficiency and consistency on locomotion and manipulation tasks.
-
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Soft Actor-Critic is an off-policy maximum-entropy actor-critic algorithm that achieves state-of-the-art performance and high stability on continuous control benchmarks.
-
RL Token: Bootstrapping Online RL with Vision-Language-Action Models
RL Token enables sample-efficient online RL fine-tuning of large VLAs, delivering up to 3x speed gains and higher success rates on real-robot manipulation tasks within minutes to hours.
-
Scalable Neighborhood-Based Multi-Agent Actor-Critic
MADDPG-K scales centralized critics in multi-agent RL by limiting each critic to k-nearest neighbors under Euclidean distance, yielding constant input size and competitive performance.
-
Cascaded TD3-PID Hybrid Controller for Quadrotor Trajectory Tracking in Wind Disturbance Environments
A cascaded TD3-PID controller with multi-Q TD3 and hybrid disturbance observer delivers more accurate quadrotor trajectory tracking under wind than standard PID or TD3 baselines in both simulation and real flights.
-
Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability
Recurrent TD3 with separate LSTM actor-critic networks delivers substantially stronger and more stable chemotherapy control than feed-forward baselines under partial observability on the AhnChemoEnv benchmark.
-
Cascaded TD3-PID Hybrid Controller for Quadrotor Trajectory Tracking in Wind Disturbance Environments
A cascaded TD3-PID controller with hybrid disturbance observer achieves more accurate and robust quadrotor trajectory tracking under wind disturbances than baseline methods.