Trust-PCL: An off-policy trust region method for continuous control. arXiv preprint arXiv:1707.01891
4 Pith papers cite this work.
Fields: cs.LG
4 representative citing papers
-
Soft Actor-Critic Algorithms and Applications
SAC extends maximum-entropy RL into a stable off-policy actor-critic method with constrained temperature tuning, outperforming prior algorithms in sample efficiency and consistency on locomotion and manipulation tasks (a sketch of the entropy-augmented target and temperature update follows this list).
-
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Soft Actor-Critic is an off-policy maximum-entropy actor-critic algorithm that achieves state-of-the-art performance and high stability on continuous control benchmarks.
-
Behavior Regularized Offline Reinforcement Learning
Behavior-regularized actor-critic methods achieve strong offline RL results with simple regularization, rendering many recent technical additions unnecessary (a sketch of the regularized actor loss follows this list).
-
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
Maximum entropy reinforcement learning is equivalent to exact probabilistic inference for deterministic dynamics and variational inference for stochastic dynamics (the underlying variational bound is sketched after this list).
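The two SAC entries above rest on two small pieces of machinery: an entropy-augmented TD target, and tuning the temperature alpha so that policy entropy tracks a fixed target. Below is a minimal PyTorch sketch of both, assuming a stochastic policy whose log-probabilities have already been computed; the tensor names and the toy random batch are illustrative stand-ins, not the papers' code.

```python
import torch

def soft_bellman_target(reward, done, next_log_prob, next_q1, next_q2,
                        alpha, gamma=0.99):
    # Entropy-augmented TD target used by both SAC papers:
    # y = r + gamma * (min_i Q_i(s', a') - alpha * log pi(a'|s')), a' ~ pi.
    next_v = torch.min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * next_v

def temperature_loss(log_alpha, log_prob, target_entropy):
    # Constrained temperature tuning from "SAC Algorithms and Applications":
    # J(alpha) = E[-alpha * (log pi(a|s) + H_target)], so gradient descent
    # raises alpha whenever policy entropy falls below the target.
    return -(log_alpha.exp() * (log_prob + target_entropy).detach()).mean()

# Toy usage: random tensors stand in for a replay-buffer batch.
batch = 32
reward, done = torch.randn(batch), torch.zeros(batch)
log_prob = torch.randn(batch)       # log pi(a|s) of freshly sampled actions
next_log_prob = torch.randn(batch)  # log pi(a'|s')
next_q1, next_q2 = torch.randn(batch), torch.randn(batch)
log_alpha = torch.zeros((), requires_grad=True)  # optimize log(alpha) so alpha stays positive

y = soft_bellman_target(reward, done, next_log_prob, next_q1, next_q2,
                        alpha=log_alpha.exp().detach())
temperature_loss(log_alpha, log_prob, target_entropy=-1.0).backward()
```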
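The behavior-regularization entry reduces to a single penalty term in the actor loss. A minimal sketch, assuming the divergence between the learned and behavior policies has already been estimated; the `divergence` tensor and the `alpha` weight here are illustrative stand-ins, not the paper's code.

```python
import torch

def brac_actor_loss(q_value, divergence, alpha=0.1):
    # Behavior-regularized actor objective, written as a loss to minimize:
    # E[-Q(s, a ~ pi) + alpha * D(pi(.|s), pi_b(.|s))],
    # where pi_b is the (estimated) behavior policy that produced the dataset.
    return (-q_value + alpha * divergence).mean()

# Toy usage: random tensors stand in for critic values and divergence estimates.
q_value = torch.randn(32, requires_grad=True)
divergence = torch.rand(32)  # nonnegative, e.g. a per-state sampled KL estimate
loss = brac_actor_loss(q_value, divergence)
loss.backward()  # gradients would flow into the policy through q_value
```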
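The last entry's equivalence can be stated as a single bound. Introduce optimality variables with p(O_t = 1 | s_t, a_t) proportional to exp r(s_t, a_t), and pick a variational distribution q over trajectories that keeps the true dynamics but swaps in a learned policy pi; Jensen's inequality then gives the following (a sketch in the tutorial's notation):

```latex
% q(\tau) = p(s_1) \prod_t p(s_{t+1} \mid s_t, a_t)\, \pi(a_t \mid s_t)
\log p(O_{1:T} = 1)
  \;\ge\; \mathbb{E}_{\tau \sim q}\Big[ \sum_{t=1}^{T} \big( r(s_t, a_t)
      + \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \big) \Big]
```

The right-hand side is exactly the maximum-entropy RL objective. Under deterministic dynamics the bound is tight, so maximizing it performs exact inference; under stochastic dynamics it remains a variational lower bound, matching the summary above.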