Lyapunov-based Safe Policy Optimization for Continuous Control

Chow, Y · 2019 · cs.LG · arXiv 1901.10031

7 Pith papers cite this work.

abstract

We study continuous-action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that do not take the agent to undesirable situations. We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them. Our algorithms can use any standard policy gradient (PG) method, such as deep deterministic policy gradient (DDPG) or proximal policy optimization (PPO), to train a neural network policy, while guaranteeing near-constraint satisfaction for every policy update by projecting either the policy parameter or the action onto the set of feasible solutions induced by the state-dependent linearized Lyapunov constraints. Compared to existing constrained PG algorithms, ours are more data efficient because they can use both on-policy and off-policy data. Moreover, our action-projection algorithm often leads to less conservative policy updates and integrates naturally into an end-to-end PG training pipeline. We evaluate our algorithms and compare them with state-of-the-art baselines on several simulated (MuJoCo) tasks, as well as on a real-world indoor robot navigation problem, demonstrating their effectiveness in balancing performance and constraint satisfaction. Videos of the experiments can be found at the following link: https://drive.google.com/file/d/1pzuzFqWIE710bE2U6DmS59AfRzqK2Kek/view?usp=sharing.
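The action-projection idea in the abstract can be illustrated with a minimal sketch. As a simplifying assumption, it treats the linearized constraint at a single state as one half-space, g · (a - a_B) <= eps, where `grad` (g), `a_base` (a_B) and `eps` are hypothetical inputs standing in for the constraint gradient, a baseline action, and the allowed slack. It is not the authors' implementation; it only shows how a proposed action is moved to the nearest action (in Euclidean distance) that satisfies one such linear constraint.

```python
from typing import List, Sequence


def project_action(
    a_pi: Sequence[float],
    a_base: Sequence[float],
    grad: Sequence[float],
    eps: float,
) -> List[float]:
    """Euclidean projection of a_pi onto {a : grad . (a - a_base) <= eps}.

    Hypothetical inputs: grad is the gradient of a linearized safety
    constraint, a_base a baseline action, eps the allowed slack.
    """
    # Amount by which the proposed action violates the linear constraint.
    violation = sum(g * (ap - ab) for g, ap, ab in zip(grad, a_pi, a_base)) - eps
    if violation <= 0.0:
        # Already feasible: the projection leaves the action unchanged.
        return list(a_pi)
    sq_norm = sum(g * g for g in grad)
    if sq_norm == 0.0:
        # Zero gradient with a positive violation means eps < 0: no feasible action.
        raise ValueError("constraint is infeasible")
    # Closed-form multiplier for projecting onto a single half-space.
    lam = violation / sq_norm
    return [ap - lam * g for ap, g in zip(a_pi, grad)]


if __name__ == "__main__":
    # The first action component exceeds the slack, so only it is pulled back.
    print(project_action([1.0, 0.5], [0.0, 0.0], [1.0, 0.0], 0.2))
```

With a single constraint the projection reduces to a correction along g, so it is cheap enough to apply at every action; handling several constraints at once would instead require solving a small quadratic program.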

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODACER) for Safe Reinforcement Learning in Optimal Control

eess.SY · 2026-01-10 · unverdicted · novelty 7.0

SODACER uses fast and slow buffers with adaptive clustering for experience replay in safe RL, integrated with CBFs and Sophia optimizer to achieve faster convergence and safety on nonlinear systems like HPV transmission.

Robust Shielding for Safe Reinforcement Learning

cs.AI · 2026-05-29 · unverdicted · novelty 6.0

A sound and optimal shielding method for robust MDPs ensures LTL safety under worst-case transitions and combines with PAC sampling to produce minimally restrictive shields for learned models.

Iteratively Learning Muscle Memory for Legged Robots to Master Adaptive and High Precision Locomotion

cs.RO · 2025-07-18 · unverdicted · novelty 6.0

Integrates iterative learning control with a torque library to enable high-precision adaptive locomotion on bipedal and quadrupedal robots, reducing tracking errors by up to 85% and achieving over 30x faster control rates.

Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning

eess.SY · 2019-06-27 · unverdicted · novelty 6.0

Develops a learning-based MPC algorithm that uses confidence intervals on trajectories and terminal set constraints to guarantee safety throughout RL exploration and training.

Safe-Support Q-Learning: Learning without Unsafe Exploration

cs.LG · 2026-04-28 · unverdicted · novelty 5.0

Safe-Support Q-Learning trains Q-functions and policies in reinforcement learning without ever visiting unsafe states by constraining the behavior policy to a safe set and using KL-regularized Bellman targets in a two-stage framework.

Safe reinforcement learning with online filtering for fatigue-predictive human-robot task planning and allocation in production

cs.AI · 2026-04-14 · unverdicted · novelty 5.0

PF-CD3Q uses online particle filtering to estimate fatigue parameters and constrains a deep Q-learning agent to solve fatigue-aware human-robot task planning as a CMDP.

A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions

eess.SY · 2025-08-12 · unverdicted · novelty 2.0

A literature review of safe RL using Lyapunov and barrier functions that identifies a shift to model-free methods since 2017, well-defined open problems per approach class, and high-dimensional scalability as the main barrier.

citing papers explorer

Showing 7 of 7 citing papers.

Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODACER) for Safe Reinforcement Learning in Optimal Control · eess.SY · 2026-01-10 · unverdicted · none · ref 7
Robust Shielding for Safe Reinforcement Learning · cs.AI · 2026-05-29 · unverdicted · none · ref 31
Iteratively Learning Muscle Memory for Legged Robots to Master Adaptive and High Precision Locomotion · cs.RO · 2025-07-18 · unverdicted · none · ref 12
Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning · eess.SY · 2019-06-27 · unverdicted · none · ref 14
Safe-Support Q-Learning: Learning without Unsafe Exploration · cs.LG · 2026-04-28 · unverdicted · none · ref 2
Safe reinforcement learning with online filtering for fatigue-predictive human-robot task planning and allocation in production · cs.AI · 2026-04-14 · unverdicted · none · ref 15
A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions · eess.SY · 2025-08-12 · unverdicted · none · ref 76
