Addressing Function Approximation Error in Actor-Critic Methods

Scott Fujimoto , Herke van Hoof , David Meger


classification cs.AI cs.LG stat.ML

keywords actor-critic approximation error function methods overestimation q-learning value


In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
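The two mechanisms named in the abstract, taking the minimum over a pair of critics and delaying policy updates, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the discount factor `gamma`, the `done` flag, and the `policy_delay` value of 2 are assumptions made for illustration, and the critic outputs are passed in as plain numbers rather than computed by networks.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bootstrapped target that uses the smaller of two critic estimates.

    q1_next and q2_next stand in for the two critics' value estimates
    at the next state (hypothetical inputs for this sketch). Taking
    their minimum limits overestimation in the target.
    """
    min_q = min(q1_next, q2_next)
    # No bootstrapping past a terminal transition (done == 1.0).
    return reward + gamma * (1.0 - done) * min_q


def should_update_actor(step, policy_delay=2):
    """Return True on the critic steps where the actor is also updated.

    policy_delay is an assumed setting: the actor is updated once for
    every policy_delay critic updates.
    """
    return step % policy_delay == 0


if __name__ == "__main__":
    # The critics disagree (10.0 vs 8.0); the target uses the lower value.
    target = clipped_double_q_target(1.0, 0.0, 10.0, 8.0, gamma=0.5)
    print(target)
    # Steps at which the actor would be updated in the first six steps.
    print([s for s in range(6) if should_update_actor(s)])
```

In a full agent both critics would be regressed toward this shared target, while the actor would be updated only on the steps selected by `should_update_actor`.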

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

To Learn or Not to Learn: A Litmus Test for Using Reinforcement Learning in Control
eess.SY 2026-04 unverdicted novelty 7.0

A litmus test based on reachset-conformant model identification and correlation analysis of uncertainties predicts if RL-based control is superior to model-based control without any RL training.

Soft Actor-Critic Algorithms and Applications
cs.LG 2018-12 unverdicted novelty 7.0

SAC extends maximum-entropy RL into a stable off-policy actor-critic method with constrained temperature tuning, outperforming prior algorithms in sample efficiency and consistency on locomotion and manipulation tasks.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
cs.LG 2018-01 accept novelty 7.0

Soft Actor-Critic is an off-policy maximum-entropy actor-critic algorithm that achieves state-of-the-art performance and high stability on continuous control benchmarks.

RL Token: Bootstrapping Online RL with Vision-Language-Action Models
cs.LG 2026-04 unverdicted novelty 6.0

RL Token enables sample-efficient online RL fine-tuning of large VLAs, delivering up to 3x speed gains and higher success rates on real-robot manipulation tasks within minutes to hours.

Scalable Neighborhood-Based Multi-Agent Actor-Critic
cs.LG 2026-04 unverdicted novelty 6.0

MADDPG-K scales centralized critics in multi-agent RL by limiting each critic to k-nearest neighbors under Euclidean distance, yielding constant input size and competitive performance.

Cascaded TD3-PID Hybrid Controller for Quadrotor Trajectory Tracking in Wind Disturbance Environments
eess.SY 2026-04 unverdicted novelty 5.0

A cascaded TD3-PID controller with multi-Q TD3 and hybrid disturbance observer delivers more accurate quadrotor trajectory tracking under wind than standard PID or TD3 baselines in both simulation and real flights.

Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability
cs.LG 2026-05 unverdicted novelty 4.0

Recurrent TD3 with separate LSTM actor-critic networks delivers substantially stronger and more stable chemotherapy control than feed-forward baselines under partial observability on the AhnChemoEnv benchmark.

Cascaded TD3-PID Hybrid Controller for Quadrotor Trajectory Tracking in Wind Disturbance Environments
eess.SY 2026-04 unverdicted novelty 3.0

A cascaded TD3-PID controller with hybrid disturbance observer achieves more accurate and robust quadrotor trajectory tracking under wind disturbances than baseline methods.