Recognition: 2 theorem links · Lean Theorem
Shields to Guarantee Probabilistic Safety in MDPs
Pith reviewed 2026-05-14 21:06 UTC · model grok-4.3
The pith
A new framework extends classical shields to probabilistic safety in MDPs but proves strong safety and permissiveness cannot both be preserved.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a formal framework that conservatively extends classical shields to probabilistic safety. In this framework, we demonstrate the impossibility of preserving the strong guarantees on safety and permissiveness, provide natural shields with weaker guarantees, and introduce offline and online shield constructions ensuring strong safety guarantees.
What carries the argument
Probabilistic shields that enforce a given safety probability threshold on a finite MDP by restricting actions based on reachability probabilities to unsafe states.
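A minimal sketch of that mechanism, under the ledger's premise of a fully known finite MDP and a supplied threshold ν. The dictionary encoding, function names, and the specific allow-rule below are our illustration, not the paper's exact construction:

```python
# Illustrative sketch (not the paper's exact construction): shield actions in a
# finite MDP by the minimal probability of ever reaching an unsafe state.
# mdp[s][a] is a list of (next_state, probability) pairs; all names are ours.

def min_reach_unsafe(mdp, unsafe, iters=10000, tol=1e-9):
    """v[s] = min over policies of Pr(reach unsafe from s), via value iteration."""
    v = {s: 1.0 if s in unsafe else 0.0 for s in mdp}
    for _ in range(iters):
        delta = 0.0
        for s in mdp:
            if s in unsafe or not mdp[s]:
                continue  # unsafe states stay at 1; actionless sinks stay at 0
            new = min(sum(p * v[t] for t, p in succ) for succ in mdp[s].values())
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            break
    return v

def shield(mdp, unsafe, nu):
    """Allow (s, a) iff taking a and then acting risk-minimally keeps
    Pr(reach unsafe) <= nu. One natural rule; weaker/stronger variants exist."""
    v = min_reach_unsafe(mdp, unsafe)
    return {(s, a) for s in mdp for a, succ in mdp[s].items()
            if sum(p * v[t] for t, p in succ) <= nu}

# Toy usage: from s0, action 'risky' hits the unsafe state with probability 0.4.
mdp = {"s0": {"risky": [("bad", 0.4), ("ok", 0.6)], "safe": [("ok", 1.0)]},
       "ok": {}, "bad": {}}
print(shield(mdp, unsafe={"bad"}, nu=0.3))  # {('s0', 'safe')}: 'risky' is blocked
```

Stronger or weaker shield variants differ precisely in how such an allow-rule accounts for risk accumulated along the history, which is where the trade-offs below arise.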
If this is right
- No single shield can simultaneously deliver strong probabilistic safety and the same level of permissiveness as classical shields.
- Natural weaker shields exist that sacrifice either safety strength or permissiveness in a controlled way.
- Offline and online constructions can still enforce strong safety guarantees with practical computation on finite MDPs.
- The framework conservatively generalizes the classical shielding approach without introducing new unsafe behaviors.
Where Pith is reading between the lines
- These constructions could be combined with approximate reachability solvers to handle larger or continuous-state systems.
- The impossibility result suggests that designers of autonomous agents must explicitly choose which property to relax when moving from hard safety to probabilistic safety.
- Online shield variants may enable runtime adaptation in partially observable environments where the full MDP is not known in advance.
Load-bearing premise
The MDP is fully known, finite, and the desired probabilistic safety threshold is supplied as input.
What would settle it
An explicit finite MDP and safety threshold for which one of the proposed shield constructions permits paths whose total probability of reaching an unsafe state exceeds the threshold.
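Such a counterexample would be mechanically checkable. A hedged harness (ours, not the paper's artifact; it reuses the dictionary MDP encoding from the sketch above): given a shield as a set of allowed state-action pairs, compute the worst-case reachability probability over shield-compliant policies and compare it against the threshold.

```python
# Hedged falsification harness (ours, not the paper's artifact): compute the
# maximum probability of reaching an unsafe state over policies that only use
# shield-allowed actions, again by value iteration.

def max_reach_under_shield(mdp, unsafe, allowed, iters=10000, tol=1e-9):
    v = {s: 1.0 if s in unsafe else 0.0 for s in mdp}
    for _ in range(iters):
        delta = 0.0
        for s in mdp:
            acts = [a for a in mdp[s] if (s, a) in allowed]
            if s in unsafe or not acts:
                continue  # states where the shield blocks everything halt safely here
            new = max(sum(p * v[t] for t, p in mdp[s][a]) for a in acts)
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            break
    return v

def refutes_strong_safety(mdp, unsafe, allowed, init, nu):
    """True iff some shield-compliant policy exceeds threshold nu from init."""
    return max_reach_under_shield(mdp, unsafe, allowed)[init] > nu

# Usage with the sketch above: refutes_strong_safety(mdp, {"bad"},
#     shield(mdp, {"bad"}, 0.3), "s0", 0.3) should return False.
```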
Original abstract
Shielding is a prominent model-based technique to ensure safety of autonomous agents. Classical shielding aims to ensure that nothing bad ever happens and comes with strong guarantees about safety and maximal permissiveness. However, shielding systems for probabilistic safety, where something bad is allowed to happen with an acceptable probability, has proven to be more intricate. This paper presents a formal framework that conservatively extends classical shields to probabilistic safety. In this framework, we (i) demonstrate the impossibility of preserving the strong guarantees on safety and permissiveness, (ii) provide natural shields with weaker guarantees, and (iii) introduce offline and online shield constructions ensuring strong safety guarantees. The empirical evaluation highlights the practical advantages of the new shields, as well as their computational feasibility.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a formal framework extending classical shielding to probabilistic safety in finite MDPs. It proves the impossibility of simultaneously preserving strong safety and maximal permissiveness guarantees, introduces natural shields with weaker guarantees, and provides offline and online shield constructions that enforce strong safety (via reachability probability thresholds) while remaining computationally feasible per empirical evaluation on selected instances.
Significance. If the constructions are correct, the work clarifies fundamental trade-offs in shielding for probabilistic systems and supplies practical methods for enforcing bounded-risk safety in autonomous agents. The impossibility result is a clean contribution, and the offline/online constructions build directly on standard MDP reachability, with empirical results indicating advantages over baselines.
major comments (2)
- [§5] §5 (online construction): the strong safety guarantee is obtained by solving reachability probabilities in the product MDP, yet the manuscript provides no explicit complexity bound or scaling argument, only empirical timings on small instances. Individual reachability queries on finite MDPs are solvable in polynomial time (e.g., via linear programming), but the product construction can enlarge the state space considerably, so it remains open whether the method stays practical beyond the evaluated state spaces.
- [§4.2] §4.2 (impossibility): the proof that strong safety and maximal permissiveness cannot be preserved simultaneously is presented as definitional, but it would strengthen the claim to exhibit a concrete small MDP where any shield satisfying the probability threshold necessarily blocks an action that a classical shield would permit.
minor comments (2)
- [§3] Notation for the safety threshold p and the resulting shield set S_p is introduced in §3 but used inconsistently in later figures; a single definition box would improve readability.
- [Table 2] The empirical tables report average runtimes but omit standard deviations or the number of random seeds; adding these would make the feasibility claim easier to assess.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive evaluation of the contributions. We address each major comment below and indicate the planned revisions.
Point-by-point responses
-
Referee: §5 (online construction): the strong safety guarantee is obtained by solving reachability probabilities in the product MDP, yet the manuscript provides no explicit complexity bound or scaling argument, only empirical timings on small instances. Individual reachability queries on finite MDPs are solvable in polynomial time (e.g., via linear programming), but the product construction can enlarge the state space considerably, so it remains open whether the method stays practical beyond the evaluated state spaces.
Authors: We agree that an explicit complexity discussion is missing. Our online construction reduces to a standard reachability query on the product MDP; each query is polynomial in the size of the product, but the product itself can grow considerably and online operation requires repeated queries. We will revise §5 to state this explicitly and to emphasize that the empirical timings on the evaluated benchmarks (up to a few thousand states) demonstrate practical feasibility. We will also add a brief remark on potential scalability improvements via symbolic or approximate methods for larger instances. revision: partial
-
Referee: §4.2 (impossibility): the proof that strong safety and maximal permissiveness cannot be preserved simultaneously is presented as definitional, but it would strengthen the claim to exhibit a concrete small MDP where any shield satisfying the probability threshold necessarily blocks an action that a classical shield would permit.
Authors: We accept this suggestion. While the impossibility follows directly from the definitions of strong safety and maximal permissiveness, we will strengthen the presentation in the revised §4.2 by adding a small, fully specified MDP example (with 3 states and 2 actions) that concretely shows an action permitted by the classical shield being blocked to meet the probabilistic safety threshold. revision: yes
Circularity Check
No circularity in formal extension of shielding to probabilistic safety
Full rationale
The derivation chain consists of a conservative formal extension of classical shields, a direct proof of impossibility for preserving both strong safety and permissiveness, and explicit offline/online constructions that invoke standard reachability probability computation on finite MDPs. These steps rest on externally established MDP theory rather than any self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation. The impossibility result follows immediately from the probabilistic safety threshold definition without reducing to the paper's own outputs. No ansatz is smuggled via citation and no known empirical pattern is merely renamed. The framework therefore remains self-contained against standard benchmarks of probabilistic model checking.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The MDP is finite-state with known transition probabilities, and the safety threshold is given as input.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · tag: unclear
Relation between the paper passage and the cited Recognition theorem.
Theorem 3. For a safety threshold 0 < ν < 1, there is an (acyclic, 5-state) MDP such that no shield satisfies (S+) and (P+).
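The quoted statement suggests a witness of roughly the following shape. The concrete numbers and the compact reduction below are our reconstruction for intuition, not the paper's proof:

```python
# Our reconstruction of the flavor of this impossibility (not the paper's exact
# 5-state witness): threshold nu = 0.3, two sequential risky choices.
mdp = {"s0": {"risky": [("bad", 0.2), ("s1", 0.8)], "direct": [("s1", 1.0)]},
       "s1": {"risky": [("bad", 0.2), ("done", 0.8)], "safe": [("done", 1.0)]},
       "done": {}, "bad": {}}

# Each risky action is used by SOME policy meeting the threshold:
#   risky@s0 then safe@s1 :  Pr(bad) = 0.2               <= 0.3
#   direct@s0 then risky@s1: Pr(bad) = 0.2               <= 0.3
# So a maximally permissive shield (P+) must allow risky at both states. But it
# then also admits the combined policy risky@s0, risky@s1, with
#   Pr(bad) = 0.2 + 0.8 * 0.2 = 0.36 > 0.3,
# violating strong safety (S+): no shield satisfies both properties here.
# (The harness above, run on the all-allowing shield, reports 0.36 from s0.)
```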
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.