pith. machine review for the scientific record.

arxiv: 2605.10888 · v2 · submitted 2026-05-11 · 💻 cs.LO · cs.AI

Recognition: 2 Lean theorem links

Shields to Guarantee Probabilistic Safety in MDPs

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 21:06 UTC · model grok-4.3

classification 💻 cs.LO cs.AI
keywords shields · probabilistic safety · Markov decision processes · safety guarantees · permissiveness · offline shielding · online shielding · reachability probabilities

The pith

A new framework extends classical shields to probabilistic safety in MDPs but proves strong safety and permissiveness cannot both be preserved.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a formal framework that adapts shielding techniques from absolute safety to settings where bad events may occur with a small, acceptable probability. It shows that any attempt to keep both the strong safety and the maximal permissiveness properties of classical shields leads to a contradiction once probabilistic failures are allowed. The work then supplies natural weaker shields that relax one property or the other, and gives concrete offline and online constructions that restore strong safety guarantees while remaining computationally feasible on finite, known MDPs.

Core claim

We present a formal framework that conservatively extends classical shields to probabilistic safety. In this framework, we demonstrate the impossibility of preserving the strong guarantees on safety and permissiveness, provide natural shields with weaker guarantees, and introduce offline and online shield constructions ensuring strong safety guarantees.
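Read against the abstract, the two properties in tension can be sketched as follows. This is an editorial reconstruction of the likely shape of the definitions, not the paper's exact formalization; the symbols σ, Π_σ, and the threshold ν are assumed here.

```latex
% A shield \sigma maps each state to a nonempty subset of the available
% actions; \Pi_\sigma denotes the policies that only pick allowed actions,
% and \nu is the safety threshold supplied as input.
\begin{align*}
  (S^{+})\colon\quad & \forall \pi \in \Pi_\sigma :\;
      \Pr\nolimits^{\pi}_{\mathcal{M}}(\lozenge\,\mathit{Unsafe}) \le \nu
      && \text{(strong safety)} \\
  (P^{+})\colon\quad & \forall \pi :\;
      \Pr\nolimits^{\pi}_{\mathcal{M}}(\lozenge\,\mathit{Unsafe}) \le \nu
      \;\Longrightarrow\; \pi \in \Pi_\sigma
      && \text{(maximal permissiveness)}
\end{align*}
```

The impossibility claim of the paper is then that, for 0 < ν < 1, no single shield can satisfy both at once.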

What carries the argument

Probabilistic shields that enforce a given safety probability threshold on a finite MDP by restricting actions based on reachability probabilities to unsafe states.
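One common way to realize such a shield can be sketched as follows. This is a minimal editorial sketch, not the paper's construction: the 3-state MDP, the dictionary encoding, and the function names `min_reach_unsafe` and `shield` are all hypothetical. The idea is to precompute, per state, the best achievable probability of ever reaching an unsafe state, then allow an action only if its one-step successor values stay under the threshold ν.

```python
# Hypothetical 3-state example: state 2 is unsafe and absorbing.
# P[s][a] is the successor distribution of action a in state s,
# given as a list of (next_state, probability) pairs.
P = {
    0: {"a": [(1, 1.0)], "b": [(0, 0.5), (2, 0.5)]},
    1: {"a": [(1, 1.0)]},
    2: {"a": [(2, 1.0)]},
}
UNSAFE = {2}

def min_reach_unsafe(P, unsafe, iters=200):
    """V[s]: smallest achievable probability of ever reaching `unsafe`
    from s, assuming the agent plays as safely as possible afterwards.
    Value iteration from below converges to this least fixed point."""
    V = {s: 1.0 if s in unsafe else 0.0 for s in P}
    for _ in range(iters):
        V = {s: 1.0 if s in unsafe else
                min(sum(p * V[t] for t, p in succ)
                    for succ in P[s].values())
             for s in P}
    return V

def shield(P, unsafe, nu):
    """Allow an action iff, with optimal continuation, taking it keeps
    the probability of reaching `unsafe` at or below the threshold nu."""
    V = min_reach_unsafe(P, unsafe)
    return {s: [a for a, succ in P[s].items()
                if sum(p * V[t] for t, p in succ) <= nu]
            for s in P if s not in unsafe}

print(shield(P, UNSAFE, nu=0.2))  # action "b" in state 0 is blocked
```

Note that this one-step check presumes optimal continuation after every action, which is precisely the kind of assumption the paper shows need not compose into a whole-run guarantee; closing that gap is what the offline and online constructions are for.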

If this is right

  • No single shield can simultaneously deliver strong probabilistic safety and the same level of permissiveness as classical shields.
  • Natural weaker shields exist that sacrifice either safety strength or permissiveness in a controlled way.
  • Offline and online constructions can still enforce strong safety guarantees with practical computation on finite MDPs.
  • The framework conservatively generalizes the classical shielding approach without introducing new unsafe behaviors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These constructions could be combined with approximate reachability solvers to handle larger or continuous-state systems.
  • The impossibility result suggests that designers of autonomous agents must explicitly choose which property to relax when moving from hard safety to probabilistic safety.
  • Online shield variants may enable runtime adaptation in partially observable environments where the full MDP is not known in advance.

Load-bearing premise

The MDP is fully known, finite, and the desired probabilistic safety threshold is supplied as input.

What would settle it

An explicit finite MDP and safety threshold for which one of the proposed shield constructions permits paths whose total probability of reaching an unsafe state exceeds the threshold.
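A falsification test of that shape could be automated: given a candidate shield's allowed-action map, compute the worst-case probability of reaching an unsafe state inside the shielded MDP and compare it to the threshold. A minimal sketch; the MDP encoding, the `allowed` map, and the function name are hypothetical, not the paper's artifact.

```python
# Hypothetical 3-state example: state 2 is unsafe and absorbing.
P = {
    0: {"a": [(1, 1.0)], "b": [(0, 0.5), (2, 0.5)]},
    1: {"a": [(1, 1.0)]},
    2: {"a": [(2, 1.0)]},
}
UNSAFE = {2}

def shield_respects(P, unsafe, allowed, nu, iters=200):
    """Value-iterate the MAX probability of reaching `unsafe` when the
    agent may pick any shield-allowed action; the shield is refuted if
    any non-unsafe state exceeds the threshold nu."""
    V = {s: 1.0 if s in unsafe else 0.0 for s in P}
    for _ in range(iters):
        V = {s: 1.0 if s in unsafe else
                max((sum(p * V[t] for t, p in P[s][a]) for a in allowed[s]),
                    default=0.0)
             for s in P}
    return all(V[s] <= nu for s in P if s not in unsafe)

print(shield_respects(P, UNSAFE, {0: ["a"], 1: ["a"]}, nu=0.2))       # True
print(shield_respects(P, UNSAFE, {0: ["a", "b"], 1: ["a"]}, nu=0.2))  # False
```

The second call illustrates the refutation pattern: allowing action "b" lets repeated attempts drive the reach probability toward 1, which exceeds the threshold.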

Figures

Figures reproduced from arXiv: 2605.10888 by Filip Macák, Linus Heck, Milan Češka, Roman Andriushchenko, Sebastian Junges.

Figure 1: MDP from [26], with states s0, s1, s2 and actions α, β, γ, δ. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png]
Figure 3: A navigation environment MDP, with a table of which policies (π1–π4) each shield variant (S+), (S−), (P+), (P−) allows. [PITH_FULL_IMAGE:figures/full_fig_p010_3.png]
Figure 5: Illustration of the lattice of shields; X → Y means Allow(X) ⊆ Allow(Y). [PITH_FULL_IMAGE:figures/full_fig_p012_5.png]
Figure 6: Convergence of the allowed choices for off with ν = 0.2. Results collected every 4K steps and interpolated; step 0 corresponds to S.
Figure 7: MDP used in the proof of Theorem 3: for any safety threshold 0 < ν < 1, there is an acyclic, 5-state MDP such that no shield satisfies (S+) and (P+); the proof picks x = min{ν/2, (1−ν)/2}. [PITH_FULL_IMAGE:figures/full_fig_p026_7.png]
Figure 9: MDP used in the proof of Lemma 5: for δ = 0 the δ-shield coincides with id and trivially satisfies (P+); for δ = 1 it coincides with Safe and satisfies (S+); for 0 < δ < 1 and 0 < ν < 1 it does not satisfy (P−). [PITH_FULL_IMAGE:figures/full_fig_p031_9.png]
read the original abstract

Shielding is a prominent model-based technique to ensure safety of autonomous agents. Classical shielding aims to ensure that nothing bad ever happens and comes with strong guarantees about safety and maximal permissiveness. However, shielding systems for probabilistic safety, where something bad is allowed to happen with an acceptable probability, has proven to be more intricate. This paper presents a formal framework that conservatively extends classical shields to probabilistic safety. In this framework, we (i) demonstrate the impossibility of preserving the strong guarantees on safety and permissiveness, (ii) provide natural shields with weaker guarantees, and (iii) introduce offline and online shield constructions ensuring strong safety guarantees. The empirical evaluation highlights the practical advantages of the new shields, as well as their computational feasibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a formal framework extending classical shielding to probabilistic safety in finite MDPs. It proves the impossibility of simultaneously preserving strong safety and maximal permissiveness guarantees, introduces natural shields with weaker guarantees, and provides offline and online shield constructions that enforce strong safety (via reachability probability thresholds) while remaining computationally feasible per empirical evaluation on selected instances.

Significance. If the constructions are correct, the work clarifies fundamental trade-offs in shielding for probabilistic systems and supplies practical methods for enforcing bounded-risk safety in autonomous agents. The impossibility result is a clean contribution, and the offline/online constructions build directly on standard MDP reachability, with empirical results indicating advantages over baselines.

major comments (2)
  1. [§5] §5 (online construction): the strong safety guarantee is obtained by solving reachability probabilities in the product MDP, but the manuscript provides no explicit complexity bound or scaling argument, only empirical timing on small instances. Reachability probabilities in a finite MDP are computable in polynomial time (e.g., via linear programming), yet the product construction can blow up the state space, so it remains open whether the method stays practical beyond the evaluated state spaces.
  2. [§4.2] §4.2 (impossibility): the proof that strong safety and maximal permissiveness cannot be preserved simultaneously is presented as definitional, but it would strengthen the claim to exhibit a concrete small MDP where any shield satisfying the probability threshold necessarily blocks an action that a classical shield would permit.
minor comments (2)
  1. [§3] Notation for the safety threshold ν and the resulting shield set S_ν is introduced in §3 but used inconsistently in later figures; a single definition box would improve readability.
  2. [Table 2] The empirical tables report average runtimes but omit standard deviations or the number of random seeds; adding these would make the feasibility claim easier to assess.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive evaluation of the contributions. We address each major comment below and indicate the planned revisions.

read point-by-point responses
  1. Referee: §5 (online construction): the strong safety guarantee is obtained by solving reachability probabilities in the product MDP, but the manuscript provides no explicit complexity bound or scaling argument, only empirical timing on small instances. Reachability probabilities in a finite MDP are computable in polynomial time, yet the product construction can blow up the state space, so it remains open whether the method stays practical beyond the evaluated state spaces.

    Authors: We agree that an explicit complexity discussion is missing. Our online construction reduces to a standard reachability query on the product MDP, which is solvable in time polynomial in the size of the product; the practical bottleneck is the growth of the product itself. We will revise §5 to state this explicitly and to emphasize that the empirical timings on the evaluated benchmarks (up to a few thousand states) demonstrate practical feasibility. We will also add a brief remark on potential scalability improvements via symbolic or approximate methods for larger instances. revision: partial

  2. Referee: §4.2 (impossibility): the proof that strong safety and maximal permissiveness cannot be preserved simultaneously is presented as definitional, but it would strengthen the claim to exhibit a concrete small MDP where any shield satisfying the probability threshold necessarily blocks an action that a classical shield would permit.

    Authors: We accept this suggestion. While the impossibility follows directly from the definitions of strong safety and maximal permissiveness, we will strengthen the presentation in the revised §4.2 by adding a small, fully specified MDP example (with 3 states and 2 actions) that concretely shows an action permitted by the classical shield being blocked to meet the probabilistic safety threshold. revision: yes

Circularity Check

0 steps flagged

No circularity in formal extension of shielding to probabilistic safety

full rationale

The derivation chain consists of a conservative formal extension of classical shields, a direct proof of impossibility for preserving both strong safety and permissiveness, and explicit offline/online constructions that invoke standard reachability probability computation on finite MDPs. These steps rest on externally established MDP theory rather than any self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation. The impossibility result follows immediately from the probabilistic safety threshold definition without reducing to the paper's own outputs. No ansatz is smuggled via citation and no known empirical pattern is merely renamed. The framework therefore remains self-contained against standard benchmarks of probabilistic model checking.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard MDP assumptions plus new definitions of probabilistic shields; no free parameters or invented entities are visible from the abstract.

axioms (1)
  • domain assumption The MDP is finite-state with known transition probabilities and the safety threshold is given as input
    Standard assumption required for any shielding construction in MDPs.

pith-pipeline@v0.9.0 · 5430 in / 1074 out tokens · 30858 ms · 2026-05-14T21:06:05.976574+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 40 canonical work pages

  1. Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: AAAI. pp. 2669–2678. AAAI Press (2018). https://doi.org/10.1609/AAAI.V32I1.11797
  2. Anand, A., Germerie Guizouarn, L., Jéron, T., Mukherjee, S., Pinisetty, S., Sankur, O.: Prompt runtime enforcement. In: ATVA. pp. 135–156 (2025). https://doi.org/10.1007/978-3-032-08707-2_7
  3. Baier, C., Katoen, J.P.: Principles of model checking. MIT Press (2008)
  4. Barrett, C., Henzinger, T.A., Seshia, S.A.: Certificates in AI: Learn but verify. Commun. ACM 69(1), 66–75 (2025). https://doi.org/10.1145/3737447
  5. Bethell, D., Gerasimou, S., Calinescu, R., Imrie, C.: Safe reinforcement learning in black-box environments via adaptive shielding. In: ECAI (2025). https://doi.org/10.3233/FAIA251092
  6. Brázdil, T., Chatterjee, K., Novotný, P., Vahala, J.: Reinforcement learning of risk-constrained policies in Markov decision processes. In: AAAI. AAAI Press (2020)
  7. Brorholt, A.H., Larsen, K.G., Schilling, C.: Compositional shielding and reinforcement learning for multi-agent systems. In: AAMAS. pp. 399–407. IFAAMAS / ACM (2025). https://doi.org/10.5555/3709347.3743554
  8. Carr, S., Jansen, N., Junges, S., Topcu, U.: Safe reinforcement learning via shielding under partial observability. In: AAAI. pp. 14748–14756 (2023). https://doi.org/10.1609/AAAI.V37I12.26723
  9. Chakraborty, D., Busatto-Gaston, D., Raskin, J.F., Pérez, G.A.: Formally-sharp DAgger for MCTS: Lower-latency Monte Carlo tree search using data aggregation with formal methods. In: AAMAS. pp. 1354–1362 (2023). https://doi.org/10.5555/3545946.3598783
  10. Chapman, M.P., Faus, M., Smith, K.M.: On optimizing the conditional value-at-risk of a maximum cost for risk-averse safety analysis. IEEE Transactions on Automatic Control 68(6), 3720–3727 (2022). https://doi.org/10.1109/TAC.2022.3195381
  11. Chatterjee, K., Novotný, P., Pérez, G.A., Raskin, J., Zikelic, D.: Optimizing expectation with guarantees in POMDPs. In: AAAI. pp. 3725–3732. AAAI Press (2017). https://doi.org/10.1609/AAAI.V31I1.11046
  12. le Court, E.H., Belardinelli, F., Goodall, A.W.: Probabilistic shielding for safe reinforcement learning. In: AAAI. pp. 16091–16099. AAAI Press (2025). https://doi.org/10.1609/AAAI.V39I15.33767
  13. David, A., Jensen, P.G., Larsen, K.G., Mikucionis, M., Taankvist, J.H.: Uppaal Stratego. In: TACAS. pp. 206–211. LNCS, Springer (2015). https://doi.org/10.1007/978-3-662-46681-0_16
  14. Dräger, K., Forejt, V., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Permissive controller synthesis for probabilistic systems. Log. Methods Comput. Sci. 11(2) (2015). https://doi.org/10.2168/LMCS-11(2:16)2015
  15. ElSayed-Aly, I., Bharadwaj, S., Amato, C., Ehlers, R., Topcu, U., Feng, L.: Safe multi-agent reinforcement learning via shielding. In: AAMAS. pp. 483–491 (2021). https://doi.org/10.5555/3463952.3464013
  16. Feng, Y., Zhu, J., Platzer, A., Laurent, J.: Adaptive shielding via parametric safety proofs. Proceedings of the ACM on Programming Languages 9(OOPSLA1), 816–843 (2025). https://doi.org/10.1145/3720450
  17. Gu, S., Yang, L., Du, Y., Chen, G., Walter, F., Wang, J., Knoll, A.: A review of safe reinforcement learning: Methods, theories and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024). https://doi.org/10.1109/TPAMI.2024.3457538
  18. Hasanbeig, M., Abate, A., Kroening, D.: Cautious reinforcement learning with logical constraints. In: AAMAS. pp. 483–491. IFAAMAS (2020)
  19. Heck, L., Macák, F., Andriushchenko, R., Ceska, M., Junges, S.: Artifact supplement for "Shields to guarantee probabilistic safety in MDPs" (Apr 2026). https://doi.org/10.5281/zenodo.19829091
  20. Hensel, C., Junges, S., Katoen, J., Quatmann, T., Volk, M.: The probabilistic model checker Storm. Int. J. Softw. Tools Technol. Transf. 24(4), 589–610 (2022). https://doi.org/10.1007/S10009-021-00633-Z
  21. Hsu, K.C., Hu, H., Fisac, J.F.: The safety filter: A unified view of safety-critical control in autonomous systems. Annual Review of Control, Robotics, and Autonomous Systems 7 (2024). https://doi.org/10.1146/ANNUREV-CONTROL-071723-102940
  22. Hudák, D., Galesloot, M.F.L., Tappler, M., Kurecka, M., Jansen, N., Ceska, M.: Finite-state controllers for (hidden-model) POMDPs using deep reinforcement learning. CoRR abs/2602.08734 (2026). https://doi.org/10.48550/arXiv.2602.08734
  23. Jansen, N., Könighofer, B., Junges, S., Serban, A., Bloem, R.: Safe reinforcement learning using probabilistic shields (invited paper). In: CONCUR. pp. 3:1–3:16. LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020). https://doi.org/10.4230/LIPICS.CONCUR.2020.3
  24. Junges, S., Jansen, N., Dehnert, C., Topcu, U., Katoen, J.: Safety-constrained reinforcement learning for MDPs. In: TACAS. pp. 130–146. LNCS, Springer (2016). https://doi.org/10.1007/978-3-662-49674-9_8
  25. Kochdumper, N., Krasowski, H., Wang, X., Bak, S., Althoff, M.: Provably safe reinforcement learning via action projection using reachability analysis and polynomial zonotopes. IEEE Open Journal of Control Systems 2, 79–92 (2023)
  26. Könighofer, B., Alshiekh, M., Bloem, R., Humphrey, L.R., Könighofer, R., Topcu, U., Wang, C.: Shield synthesis. Formal Methods Syst. Des. 51(2), 332–361 (2017). https://doi.org/10.1007/S10703-017-0276-9
  27. Könighofer, B., Bloem, R., Jansen, N., Junges, S., Pranger, S.: Shields for safe reinforcement learning. Commun. ACM 68(11), 80–90 (2025). https://doi.org/10.1145/3715958
  28. Könighofer, B., Lorber, F., Jansen, N., Bloem, R.: Shield synthesis for reinforcement learning. In: ISoLA (1). pp. 290–306. LNCS, Springer (2020). https://doi.org/10.1007/978-3-030-61362-4_16
  29. Könighofer, B., Rudolf, J., Palmisano, A., Tappler, M., Bloem, R.: Online shielding for reinforcement learning. Innovations in Systems and Software Engineering 19(4) (2023). https://doi.org/10.1007/S11334-022-00480-4
  30. Krasowski, H., Thumm, J., Müller, M., Schäfer, L., Wang, X., Althoff, M.: Provably safe reinforcement learning: Conceptual analysis, survey, and benchmarking. Trans. Mach. Learn. Res. 2023 (2023)
  31. Marzari, L., Cicalese, F., Farinelli, A., Amato, C., Marchesini, E.: Verifying online safety properties for safe deep reinforcement learning. ACM Trans. Intell. Syst. Technol. (2025). https://doi.org/10.1145/3770068
  32. Nesti, F., Salamini, N., Marinoni, M., Cicero, G., Serra, G., Biondi, A., Buttazzo, G.C.: The use of the simplex architecture to enhance safety in deep-learning-powered autonomous systems. Eng. Appl. Artif. Intell. 174, 114583 (2026). https://doi.org/10.1016/J.ENGAPPAI.2026.114583
  33. Norman, G., Parker, D., Kwiatkowska, M., Shukla, S., Gupta, R.: Formal analysis and validation of continuous time Markov chain based system level power management strategies. In: HLDVT. pp. 45–50 (2002). https://doi.org/10.1109/HLDVT.2002.1224427
  34. Phan, D.T., Grosu, R., Jansen, N., Paoletti, N., Smolka, S.A., Stoller, S.D.: Neural simplex architecture. In: NASA Formal Methods Symposium. pp. 97–114 (2020). https://doi.org/10.1007/978-3-030-55754-6_6
  35. Pranger, S., Könighofer, B., Tappler, M., Deixelberger, M., Jansen, N., Bloem, R.: Adaptive shielding under uncertainty. In: ACC (2021). https://doi.org/10.23919/ACC50511.2021.9482889
  36. Rodríguez, A., Amir, G., Corsi, D., Sánchez, C., Katz, G.: Shield synthesis for LTL modulo theories. In: AAAI. AAAI Press (2025)
  37. Sheng, S., Parker, D., Feng, L.: Safe POMDP online planning via shielding. In: ICRA. pp. 126–132. IEEE (2024). https://doi.org/10.1109/ICRA57147.2024.10610195
  38. Tappler, M., Pranger, S., Könighofer, B., Muškardin, E., Bloem, R.: Automata learning meets shielding. In: ISoLA. pp. 335–359 (2022). https://doi.org/10.1007/978-3-031-19849-6_20
  39. Tarski, A.: A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics 5(2), 285–309 (1955). https://doi.org/10.2140/pjm.1955.5.285
  40. Van Havermaet, S., Khaluf, Y., Simoens, P.: No more hand-tuning rewards: Masked constrained policy optimization for safe reinforcement learning. In: AAMAS. pp. 1344–1352 (2021). https://doi.org/10.5555/3463952.3464107
  41. Wabersich, K.P., Taylor, A.J., Choi, J.J., Sreenath, K., Tomlin, C.J., Ames, A.D., Zeilinger, M.N.: Data-driven safety filters: Hamilton-Jacobi reachability, control barrier functions, and predictive methods for uncertain systems. IEEE Control Systems Magazine 43(5), 137–177 (2023). https://doi.org/10.1109/MCS.2023.3291885
  42. Waga, M., Castellano, E., Pruekprasert, S., Klikovits, S., Takisaka, T., Hasuo, I.: Dynamic shielding for reinforcement learning in black-box environments. In: ATVA. pp. 25–41 (2022). https://doi.org/10.1007/978-3-031-19992-9_2
  43. Yang, W., Marra, G., Rens, G., Raedt, L.D.: Safe reinforcement learning via probabilistic logic shields. In: IJCAI. pp. 5739–5749. ijcai.org (2023). https://doi.org/10.24963/IJCAI.2023/637