Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning

Andreas Krause; Felix Berkenkamp; Joschka Boedecker; Matteo Turchetta; Torsten Koller

arXiv: 1906.12189 · v1 · pith:IPQ2GHZ2new · submitted 2019-06-27 · eess.SY · cs.AI · cs.LG · cs.SY


Torsten Koller, Felix Berkenkamp, Matteo Turchetta, Joschka Boedecker, Andreas Krause

Pith reviewed 2026-05-25 14:47 UTC · model grok-4.3

classification eess.SY · cs.AI · cs.LG · cs.SY

keywords model predictive control · reinforcement learning · safety guarantees · confidence intervals · safe exploration · learning-based control · terminal set constraint


The pith

A learning-based model predictive control method supplies high-probability safety guarantees during reinforcement learning exploration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a model predictive control scheme that incorporates learned dynamics while enforcing safety throughout the learning process. It builds provably accurate confidence intervals around predicted trajectories from a reliable statistical model that handles input-dependent uncertainty. These intervals are used to ensure that generated trajectories meet safety constraints. A terminal set constraint then guarantees that safe actions remain available at every future step.
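To make the mechanism concrete, here is a minimal sketch, not the authors' implementation. It assumes a scalar system with a hypothetical linear nominal model `x+ = a*x + b*u` plus an unknown residual whose magnitude is bounded by a user-supplied, input-dependent function `err_bound`. Plain intervals stand in for the paper's set representation, and every name and number below is an illustrative assumption.

```python
from typing import Callable, Sequence, Tuple

Interval = Tuple[float, float]
# Hypothetical bound on the model error over a whole interval of states and a given input.
ErrBound = Callable[[float, float, float], float]


def step_interval(box: Interval, u: float, a: float, b: float,
                  err_bound: ErrBound) -> Interval:
    """Over-approximate the one-step image of `box` under x+ = a*x + b*u + residual."""
    lo, hi = box
    ends = (a * lo + b * u, a * hi + b * u)   # nominal image of both endpoints
    e = err_bound(lo, hi, u)                  # input-dependent error margin
    return (min(ends) - e, max(ends) + e)


def plan_is_safe(x0: float, inputs: Sequence[float], a: float, b: float,
                 err_bound: ErrBound, safe: Interval, terminal: Interval) -> bool:
    """True if every predicted interval stays in `safe` and the last one lies in `terminal`."""
    box = (x0, x0)
    for u in inputs:
        box = step_interval(box, u, a, b, err_bound)
        if not (safe[0] <= box[0] and box[1] <= safe[1]):
            return False
    return terminal[0] <= box[0] and box[1] <= terminal[1]
```

In this sketch a plan is accepted only if the whole over-approximated tube satisfies the constraints, which mirrors the role the confidence intervals and the terminal set play in the description above. The guarantee is only as good as `err_bound`: if the true residual exceeds it, the check says nothing.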

Core claim

The authors construct provably accurate confidence intervals on predicted trajectories from a reliable statistical model that handles input-dependent uncertainties. These intervals guarantee that trajectories satisfy safety constraints with high probability. A terminal set constraint recursively guarantees the existence of safe control actions at every iteration.

What carries the argument

Provably accurate confidence intervals on predicted trajectories from a reliable statistical model, together with a terminal set constraint for recursive feasibility.

If this is right

Trajectories generated during learning satisfy safety constraints with high probability.
Safe control actions remain available at every iteration through the terminal set.
The method enables safe exploration of unknown dynamics in physical systems such as pendulums.
Reinforcement learning tasks with explicit safety constraints can be solved without unsafe actions.
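The recursive-availability point in the list above can be sketched as a fallback rule. This is an assumption about how such a guarantee is commonly exploited in MPC, not a transcription of the paper's algorithm: if the optimizer fails to certify a new plan, the controller continues with the remainder of the last certified plan, and once that is exhausted it applies a controller assumed to be safe inside the terminal set. The names `select_input` and `terminal_policy` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def select_input(
    new_plan: Optional[List[float]],
    stored_plan: List[float],
    x: float,
    terminal_policy: Callable[[float], float],
) -> Tuple[float, List[float]]:
    """Return (input to apply now, plan to keep for the next step)."""
    if new_plan:
        # The solver certified a fresh safe plan: apply its first input.
        return new_plan[0], new_plan[1:]
    if stored_plan:
        # No new certificate: reuse the tail of the last certified plan.
        return stored_plan[0], stored_plan[1:]
    # Stored plan exhausted: the state is assumed to be in the terminal set,
    # where terminal_policy is assumed to keep the system safe.
    return terminal_policy(x), []
```

The design choice illustrated here is that a safe input is always available without re-solving, provided the first plan was certified and the terminal-set assumption holds.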

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could support deployment of reinforcement learning in real-world physical plants where constraint violation carries high cost.
Similar confidence-bound techniques might transfer to other model-based planners that must remain feasible under uncertainty.
Testing on systems with time-varying or state-dependent noise would reveal how far the input-dependent interval construction generalizes.

Load-bearing premise

A reliable statistical model must exist that yields provably accurate confidence intervals on predicted trajectories even when uncertainty depends on the input.
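One common way to write such a premise is shown below. The notation is assumed for illustration and is not taken from the source: `f` is the unknown dynamics, `z = (x, u)` a state-input pair, `\mu_n` and `\sigma_n` the model's prediction and uncertainty after `n` samples, `\beta` a scaling constant, and `\delta` the allowed failure probability. For vector-valued dynamics the inequality would be read component-wise.

```latex
% Assumed notation; a sketch of a calibration premise, not the paper's exact statement.
\Pr\Big[\, \forall n \ge 0,\ \forall z = (x, u):\quad
      \lvert f(z) - \mu_n(z) \rvert \;\le\; \beta\, \sigma_n(z) \,\Big] \;\ge\; 1 - \delta
```

Because `\sigma_n(z)` depends on `z`, the width of the bound changes with the input, which is the input-dependent uncertainty the premise refers to.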

What would settle it

Run the controller on the inverted pendulum or cart-pole and record a trajectory that the confidence intervals declared safe yet violates a safety constraint during execution.
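A check of this kind could be scripted roughly as follows. The sketch assumes scalar states, box constraints, and that the predicted confidence interval for each time step has been logged next to the executed state; the function name and data layout are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]


def find_counterexample(
    executed: Sequence[float],
    predicted: Sequence[Interval],
    safe: Interval,
) -> Optional[int]:
    """Return the first step whose interval was inside `safe` while the executed state was not."""
    for t, (x, (lo, hi)) in enumerate(zip(executed, predicted)):
        declared_safe = safe[0] <= lo and hi <= safe[1]
        violated = not (safe[0] <= x <= safe[1])
        if declared_safe and violated:
            return t
    return None
```

Since the claimed guarantee is a high-probability one, a single hit is evidence against it rather than a strict refutation; violations occurring more often than the stated failure probability across repeated runs would be the stronger signal.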

Figures

Figures reproduced from arXiv: 1906.12189 by Andreas Krause, Felix Berkenkamp, Joschka Boedecker, Matteo Turchetta, Torsten Koller.

**Figure 2.** Decomposition of the over-approximated image of the system (1) under …

**Figure 3.** Visualization of the samples acquired in the static exploration setting in Sec. …

**Figure 5.** Comparison of the information gathered from the system after …

**Figure 6.** Comparison of the performance of RL agents with varying …

**Figure 7.** Performance of the RL agents with safety trajectory length …

Abstract

Reinforcement learning has been successfully used to solve difficult tasks in complex unknown environments. However, these methods typically do not provide any safety guarantees during the learning process. This is particularly problematic, since reinforcement learning agents actively explore their environment. This prevents their use in safety-critical, real-world applications. In this paper, we present a learning-based model predictive control scheme that provides high-probability safety guarantees throughout the learning process. Based on a reliable statistical model, we construct provably accurate confidence intervals on predicted trajectories. Unlike previous approaches, we allow for input-dependent uncertainties. Based on these reliable predictions, we guarantee that trajectories satisfy safety constraints. Moreover, we use a terminal set constraint to recursively guarantee the existence of safe control actions at every iteration. We evaluate the resulting algorithm to safely explore the dynamics of an inverted pendulum and to solve a reinforcement learning task on a cart-pole system with safety constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adds input-dependent uncertainty handling and a terminal set to MPC for perpetual safety during RL exploration, but the guarantees hinge on statistical constructions that need verification.


The main advance is allowing the learned model’s uncertainty to vary with the input when building trajectory confidence intervals inside MPC, then adding a terminal set constraint so that safe actions remain feasible at every step. This combination is presented as new relative to earlier safe MPC work. The evaluation on an inverted pendulum for dynamics exploration and a constrained cart-pole RL task gives concrete evidence that the scheme can run without violating safety bounds in simulation. Those experiments are straightforward and directly test the safety claim during learning. The central soft spot is the step that turns a statistical model into provably accurate high-probability intervals on predicted trajectories when uncertainty depends on the input. The abstract asserts this is possible and sufficient for constraint satisfaction, yet supplies no derivation or error analysis. If the intervals do not remain valid once the system is in closed loop and inputs change, the recursive safety guarantee does not hold. The paper is aimed at researchers who already work on model-based safe RL or robust MPC and want a concrete way to keep exploration safe. A reader in that group can extract the terminal-set construction and the evaluation setup even if they later adjust the statistical part. It is worth sending to peer review so the proofs and the input-dependent interval construction can be checked in detail.

Referee Report

2 major / 1 minor

Summary. The paper proposes a learning-based model predictive control (MPC) scheme for safe exploration in reinforcement learning. It constructs high-probability confidence intervals on predicted trajectories that allow input-dependent uncertainties, uses these to enforce safety constraints on trajectories, and adds a terminal set constraint to recursively guarantee the existence of safe control actions at every step. The approach is evaluated on an inverted pendulum for safe dynamics exploration and a cart-pole RL task with safety constraints.

Significance. If the statistical construction of the confidence intervals is valid under input dependence and the recursive feasibility holds in closed loop, the result would be significant for enabling safe RL in real-world applications. The work directly addresses the lack of safety guarantees during exploration, a key barrier for RL in safety-critical domains, and provides a concrete integration of statistical learning with MPC.

major comments (2)

[Abstract / confidence interval construction] Abstract and the section constructing confidence intervals: the central safety claim rests on 'provably accurate confidence intervals on predicted trajectories' that remain valid when uncertainty depends on the input. No derivation, error analysis, or explicit statistical model (e.g., handling of heteroscedasticity or temporal dependence) is supplied in the provided text to establish the high-probability bound for closed-loop trajectories; this is load-bearing for the guarantee.
[Terminal set constraint] Terminal set constraint paragraph: the recursive guarantee of safe actions at every iteration is asserted via the terminal set, but it is unclear how the probabilistic nature of the trajectory predictions (with input-dependent uncertainty) propagates into the terminal set definition and feasibility proof without additional assumptions on the uncertainty structure.

minor comments (1)

[Abstract] The abstract states the method 'guarantee[s] that trajectories satisfy safety constraints' but does not specify whether this is almost-sure or high-probability; clarify the exact probabilistic statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for recognizing the potential significance of the work for safe RL. We address the two major comments below. Both point to sections where the manuscript would benefit from expanded technical detail; we will revise accordingly.


Referee: [Abstract / confidence interval construction] Abstract and the section constructing confidence intervals: the central safety claim rests on 'provably accurate confidence intervals on predicted trajectories' that remain valid when uncertainty depends on the input. No derivation, error analysis, or explicit statistical model (e.g., handling of heteroscedasticity or temporal dependence) is supplied in the provided text to establish the high-probability bound for closed-loop trajectories; this is load-bearing for the guarantee.

Authors: We agree that the submitted manuscript states the existence of provably accurate, input-dependent confidence intervals but does not supply the full derivation or error analysis. In the revision we will add a dedicated subsection that (i) specifies the statistical model, (ii) derives the high-probability bounds while explicitly treating input dependence, heteroscedasticity, and temporal correlation, and (iii) states the precise assumptions under which the bounds hold for closed-loop trajectories. Revision: yes.

Referee: [Terminal set constraint] Terminal set constraint paragraph: the recursive guarantee of safe actions at every iteration is asserted via the terminal set, but it is unclear how the probabilistic nature of the trajectory predictions (with input-dependent uncertainty) propagates into the terminal set definition and feasibility proof without additional assumptions on the uncertainty structure.

Authors: The terminal-set construction is intended to guarantee recursive feasibility under the same high-probability bounds used for the trajectory constraints. We acknowledge that the manuscript does not spell out how the input-dependent probabilistic bounds are propagated into the terminal-set definition and the associated feasibility argument. The revision will expand this paragraph (and the accompanying proof sketch) to make the propagation explicit and to list the additional assumptions required on the uncertainty structure. Revision: yes.

Circularity Check

0 steps flagged

No circularity; safety claims rest on external statistical model assumption


The provided abstract and text describe a scheme that assumes a reliable statistical model yielding provably accurate confidence intervals (including for input-dependent uncertainty), then builds safety guarantees and terminal-set recursive feasibility on top of those intervals. No equations, fitted quantities, or self-citations are exhibited that would reduce the claimed high-probability guarantees to a definition, a renamed fit, or a self-referential chain by construction. The statistical model is treated as an independent input rather than derived within the paper, making the derivation self-contained against that benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the existence of a statistical model capable of producing provably accurate input-dependent confidence intervals; no free parameters, invented entities, or additional axioms are visible in the abstract.

axioms (1)

Domain assumption: A reliable statistical model of the dynamics exists that supports construction of provably accurate confidence intervals on trajectories. Invoked to justify the safety guarantees (abstract).

pith-pipeline@v0.9.0 · 5701 in / 1165 out tokens · 18856 ms · 2026-05-25T14:47:33.886249+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Based on a reliable statistical model, we construct provably accurate confidence intervals on predicted trajectories. Unlike previous approaches, we allow for input-dependent uncertainties. ... terminal set constraint to recursively guarantee the existence of safe control actions"

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We use ellipsoids to bound the uncertainty ... Minkowski sum ... generalized eigenvalue problem"
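
The passage quoted above mentions ellipsoidal uncertainty bounds and Minkowski sums. As a hedged illustration of that operation in general, not of the paper's specific construction, the sketch below computes a standard outer ellipsoid for the Minkowski sum of two ellipsoids. An ellipsoid is represented as a center `c` and a positive-definite shape matrix `Q`, meaning the set of `x` with `(x - c)^T Q^{-1} (x - c) <= 1`. The bound uses the textbook family `(1 + 1/p) Q1 + (1 + p) Q2` with the trace-minimizing choice `p = sqrt(tr Q1 / tr Q2)`; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Ellipsoid = Tuple[List[float], Matrix]  # (center c, shape matrix Q)


def trace(m: Matrix) -> float:
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def minkowski_outer(e1: Ellipsoid, e2: Ellipsoid) -> Ellipsoid:
    """Outer ellipsoid containing {a + b : a in e1, b in e2}; both traces must be positive."""
    (c1, q1), (c2, q2) = e1, e2
    p = math.sqrt(trace(q1) / trace(q2))      # trace-minimizing parameter
    w1, w2 = 1.0 + 1.0 / p, 1.0 + p
    center = [a + b for a, b in zip(c1, c2)]  # centers simply add
    shape = [[w1 * a + w2 * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(q1, q2)]
    return center, shape
```

For two unit balls the result is the ball of radius 2, so the bound is tight in that case; for differently shaped ellipsoids it is a strict over-approximation.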

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Constrained Policy Optimization

Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. Constrained Policy Optimization. arXiv:1705.10528 [cs], May 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[2] [2]

A. K. Akametalu, J. F. Fisac, J. H. Gillula, S. Kaynama, M. N. Zeilinger, and C. J. Tomlin. Reachability-based safe learning with Gaussian processes. In In Proc. of the IEEE Conference on Decision and Control (CDC), pages 1424–1431, December 2014

work page 2014

[3] [3]

Constrained Markov Decision Processes

Eitan Altman. Constrained Markov Decision Processes . CRC Press, March 1999

work page 1999

[4] [4]

A General-Purpose Software Framework for Dynamic Optimization

Joel Andersson. A General-Purpose Software Framework for Dynamic Optimization. PhD thesis, Arenberg Doctoral School, KU Leuven, Leuven, Belgium, October 2013

work page 2013

[5] [5]

Control of uncertain nonlinear systems using ellipsoidal reachability calculus

Leonhard Asselborn, Dominic Gross, and Olaf Stursberg. Control of uncertain nonlinear systems using ellipsoidal reachability calculus. In Proc. of the International Federation of Automatic Control (IFAC) , 46(23):50–55, 2013

work page 2013

[6] [6]

Shankar Sastry, and Claire Tom- lin

Anil Aswani, Humberto Gonzalez, S. Shankar Sastry, and Claire Tom- lin. Provably safe and robust learning-based model predictive control. Automatica, 49(5):1216–1226, May 2013

work page 2013

[7] [7]

Berkenkamp, R

F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause. Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes. In In Proc. of the IEEE Conference on Decision and Control (CDC) , pages 4661–4666, December 2016

work page 2016

[8] [8]

Schoellig, and Andreas Krause

Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, and Andreas Krause. Safe model-based reinforcement learning with stability guaran- tees. In Proc. of Neural Information Processing Systems (NIPS) , 1705, May 2017

work page 2017

[9] [9]

Boedecker, J

J. Boedecker, J. T. Springenberg, J. W ¨ulﬁng, and M. Riedmiller. Approximate real-time optimal control based on sparse Gaussian process models. In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) , pages 1–8, December 2014

work page 2014

[10] [10]

A deterministic algorithm for global optimization

Leo Breiman and Adele Cutler. A deterministic algorithm for global optimization. Mathematical Programming , 58(1-3):179–199, January 1993. 14

work page 1993

[11] [11]

Lai, and Fakhrul Alam

Gang Cao, Edmund M.-K. Lai, and Fakhrul Alam. Gaussian process model predictive control of an unmanned quadrotor. Journal of Intelligent & Robotic Systems , 88(1):147–162, October 2017

work page 2017

[12] [12]

Carson, Beh c ¸et Ac ¸ıkmes ¸e, Richard M

John M. Carson, Beh c ¸et Ac ¸ıkmes ¸e, Richard M. Murray, and Douglas G. MacMartin. A robust model predictive control algorithm augmented with a reactive safety mode. Automatica, 49(5):1251–1260, May 2013

work page 2013

[13] [13]

S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V . Kumar, G. J. Pappas, and M. Morari. Approximating Explicit Model Predictive Control Using Constrained Neural Networks. In 2018 Annual American Control Conference (ACC), pages 1520–1527, June 2018

work page 2018

[14] [14]

Lyapunov-based Safe Policy Optimization for Continuous Control

Yinlam Chow, Oﬁr Nachum, Aleksandra Faust, Edgar Duenez-Guzman, and Mohammad Ghavamzadeh. Lyapunov-based Safe Policy Optimiza- tion for Continuous Control. arXiv:1901.10031 [cs, stat] , January 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901

[15] [15]

Safe Exploration in Continuous Action Spaces

Gal Dalal, Krishnamurthy Dvijotham, Matej Vecerik, Todd Hester, Cosmin Paduraru, and Yuval Tassa. Safe Exploration in Continuous Action Spaces. arXiv:1801.08757 [cs], January 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[16] [16]

Safely Learning to Control the Constrained Linear Quadratic Regulator

Sarah Dean, Stephen Tu, Nikolai Matni, and Benjamin Recht. Safely Learning to Control the Constrained Linear Quadratic Regulator. arXiv:1809.10121 [cs, math, stat] , September 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[17] [17]

PILCO: A model- based and data-efﬁcient approach to policy search

Marc Peter Deisenroth and Carl Edward Rasmussen. PILCO: A model- based and data-efﬁcient approach to policy search. In In Proceedings of the International Conference on Machine Learning , pages 465–472, 2011

work page 2011

[18] [18]

Ernst, M

D. Ernst, M. Glavic, F. Capitanescu, and L. Wehenkel. Reinforcement learning versus model predictive control: A comparison on a power sys- tem problem. In IEEE Transactions on Systems, Man, and Cybernetics , 39(2):517–529, April 2009

[19]

Tatiana F. Filippova. Ellipsoidal estimates of reachable sets for control systems with nonlinear terms. In Proc. of the International Federation of Automatic Control (IFAC), 50(1):15355–15360, July 2017

[20]

Javier García and Fernando Fernández. Safe Exploration of State and Action Spaces in Reinforcement Learning. J. Artif. Int. Res., 45(1):515–564, September 2012

[21]

A. Girard, C. E. Rasmussen, J. Quiñonero-Candela, R. Murray-Smith, S. Becker, S. Thrun, and K. Obermayer. Multiple-step ahead prediction for non linear dynamic systems: A Gaussian Process treatment with propagation of the uncertainty. In Sixteenth Annual Conference on Neural Information Processing Systems (NIPS 2002), pages 529–536. MIT Press, October 2003

[22]

Gene H. Golub and Charles F. Van Loan. Matrix Computations. JHU Press, December 2012

[23]

Lukas Hewing and Melanie N. Zeilinger. Cautious model predictive control using Gaussian process regression. arXiv preprint arXiv:1705.10702, 2017

[24]

Achin Jain, Truong X. Nghiem, Manfred Morari, and Rahul Mangharam. Learning and Control Using Gaussian Processes: Towards Bridging Machine Learning and Controls for Physical Systems. In Proceedings of the 9th ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS ’18, pages 140–149, Piscataway, NJ, USA, 2018. IEEE Press

[25]

Sanket Kamthe and Marc Peter Deisenroth. Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control. arXiv:1706.06491 [cs, stat], June 2017

[26]

Torsten Koller, Felix Berkenkamp, Matteo Turchetta, and Andreas Krause. Learning-based Model Predictive Control for Safe Exploration. In Proc. of the IEEE Conference on Decision and Control (CDC), March 2018

[27]

Andreas Krause, Ajit Singh, and Carlos Guestrin. Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. Journal of Machine Learning Research, 9(Feb):235–284, 2008

[28]

A. B. Kurzhanskii and Istvan Vályi. Ellipsoidal Calculus for Estimation and Control. Boston, MA: Birkhäuser, 1997

[29]

Huibert Kwakernaak and Raphael Sivan. Linear Optimal Control Systems, volume 1. Wiley-Interscience, New York, 1972

[30]

Chris J. Ostafew, Angela P. Schoellig, and Timothy D. Barfoot. Robust constrained learning-based NMPC enabling reliable mobile robot path tracking. The International Journal of Robotics Research, 35(13):1547–1563, November 2016

[31]

Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006

[32]

James Blake Rawlings and David Q. Mayne. Model Predictive Control: Theory and Design. Nob Hill Pub., 2009

[33]

Arthur Richards and Jonathan P. How. Robust variable horizon model predictive control for vehicle maneuvering. International Journal of Robust and Nonlinear Control, 16(7):333–351, February 2006

[34]

Ugo Rosolia and Francesco Borrelli. Sample-Based Learning Model Predictive Control for Linear Uncertain Systems. arXiv:1904.06432 [cs], April 2019

[35]

S. Sadraddini and C. Belta. A provably correct MPC approach to safety control of urban traffic networks. In American Control Conference (ACC), pages 1679–1684, July 2016

[36]

Daniel Simon, Johan Löfberg, and Torkel Glad. Nonlinear Model Predictive Control using Feedback Linearization and Local Inner Convex Constraint Approximations. In 2013 European Control Conference, July 17-19, Zurich, Switzerland, pages 2056–2061, 2013

[37]

Raffaele Soloperto, Matthias A. Müller, Sebastian Trimpe, and Frank Allgöwer. Learning-Based Robust Model Predictive Control with State-Dependent Uncertainty. IFAC-PapersOnLine, 51(20):442–447, January 2018

[38]

Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proc. of the International Conference on Machine Learning (ICML), pages 1015–1022, 2010

[39]

R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. IEEE Transactions on Neural Networks, 9(5):1054–1054, September 1998

[40]

D. H. van Hessem and O. H. Bosgra. Closed-loop stochastic dynamic process optimization under input and state constraints. In Proc. of the American Control Conference (ACC), volume 3, pages 2023–2028, May 2002

[41]

Julia Vinogradska, Bastian Bischoff, Duy Nguyen-Tuong, Henner Schmidt, Anne Romer, and Jan Peters. Stability of Controllers for Gaussian Process Forward Models. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML’16, pages 545–554, New York, NY, USA, 2016. JMLR.org

[42]

Kim P. Wabersich and Melanie N. Zeilinger. Linear model predictive safety certification for learning-based control. arXiv:1803.08552 [cs], March 2018

[44]

Kim P. Wabersich and Melanie N. Zeilinger. Safe exploration of nonlinear dynamical systems: A predictive safety filter for reinforcement learning. arXiv:1812.05506 [cs], December 2018

[45]

Andreas Wächter and Lorenz T. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, March 2006

[46]

Grace Wahba. Spline Models for Observational Data, volume 59. SIAM, 1990

[47]

G. R. Wood and B. P. Zhang. Estimation of the Lipschitz constant of a function. Journal of Global Optimization, 8(1):91–103, January 1996

[48]

C. Xie, S. Patil, T. Moldovan, S. Levine, and P. Abbeel. Model-based reinforcement learning with parametrized physical models and optimism-driven exploration. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), pages 504–511, May 2016
