
arXiv: 1910.10885 · v2 · pith:7AM2RFVPnew · submitted 2019-10-24 · 📡 eess.SY · cs.LG · cs.RO · cs.SY

Robust Model Predictive Shielding for Safe Reinforcement Learning with Stochastic Dynamics

classification 📡 eess.SY · cs.LG · cs.RO · cs.SY
keywords stochastic · learning · backup · controller · dynamics · ensure · policy · reinforcement

This paper proposes a framework for safe reinforcement learning that can handle stochastic nonlinear dynamical systems. We focus on the setting where the nominal dynamics are known and are subject to additive stochastic disturbances with a known distribution. Our goal is to ensure the safety of a control policy trained using reinforcement learning, e.g., in a simulated environment. We build on the idea of model predictive shielding (MPS), where a backup controller overrides the learned policy as needed to ensure safety. The key challenge is how to compute a backup policy in the context of stochastic dynamics. We propose to use a tube-based robust nonlinear model predictive controller (NMPC) as the backup controller. We estimate the tubes using sampled trajectories, leveraging ideas from statistical learning theory to obtain high-probability guarantees. We empirically demonstrate that our approach can ensure safety in stochastic systems, including a cart-pole and a non-holonomic particle with random obstacles.
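The idea of estimating a tube from sampled trajectories can be illustrated with a small sketch. Everything here is an assumption made for illustration and not taken from the paper: the double-integrator dynamics in `step`, the feedback gain `K`, the Gaussian disturbance, and the function names are all hypothetical. The sketch uses a plain empirical quantile of the worst-case deviation as the tube radius; on its own this does not carry the high-probability guarantee the abstract refers to, which relies on statistical learning theory bounds not described here.

```python
import numpy as np

DT = 0.1                    # assumed time step
K = np.array([4.0, 3.0])    # assumed stabilizing feedback gain (hypothetical)


def step(x, u):
    """Nominal dynamics (assumed): discretized double integrator."""
    return np.array([x[0] + DT * x[1], x[1] + DT * u])


def nominal_backup(x0, horizon):
    """Disturbance-free rollout of a backup policy u = -K x."""
    xs, us = [np.asarray(x0, dtype=float)], []
    for _ in range(horizon):
        us.append(float(-K @ xs[-1]))
        xs.append(step(xs[-1], us[-1]))
    return np.array(xs), np.array(us)


def estimate_tube_radius(x0, horizon, sigma, n_samples=500, delta=0.05, seed=0):
    """Empirical (1 - delta) quantile of the largest deviation from the
    nominal trajectory over sampled disturbed rollouts."""
    rng = np.random.default_rng(seed)
    x_nom, u_nom = nominal_backup(x0, horizon)
    worst = np.empty(n_samples)
    for i in range(n_samples):
        x, dev = x_nom[0].copy(), 0.0
        for t in range(horizon):
            # Nominal input plus feedback on the deviation from the nominal state.
            u = u_nom[t] - float(K @ (x - x_nom[t]))
            # Additive disturbance with a known (here Gaussian) distribution.
            x = step(x, u) + rng.normal(0.0, sigma, size=2)
            dev = max(dev, float(np.linalg.norm(x - x_nom[t + 1])))
        worst[i] = dev
    return float(np.quantile(worst, 1.0 - delta))


def backup_is_safe(x0, horizon, sigma, pos_limit):
    """Shield check: nominal positions, inflated by the tube radius,
    must stay within |position| <= pos_limit."""
    x_nom, _ = nominal_backup(x0, horizon)
    radius = estimate_tube_radius(x0, horizon, sigma)
    return bool(np.all(np.abs(x_nom[:, 0]) + radius <= pos_limit))
```

In a shielding loop of this kind, the learned policy's action would be accepted only if `backup_is_safe` holds at the predicted next state; otherwise the backup controller would take over.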


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. One Demonstration Is Enough for Real-World Robotic Reinforcement Learning

    cs.RO 2026-07 unverdicted novelty 6.0

    AutoSERL achieves strong performance on six real-world robot manipulation tasks using RL guided by a single demonstration via sliding-window intervention, safety recovery, and automatic termination.

  2. Mahalanobis-Guided Latent OOD Detection for Hybrid ES-DRL Control in Time-Varying Systems

    cs.LG 2026-06 unverdicted novelty 3.0

    A hybrid ES-DRL controller uses VAE latent Mahalanobis OOD detection to switch between RL and ES modes for time-varying nonlinear systems.