Unifying Runtime Monitoring Approaches for Safety-Critical Machine Learning: Application to Vision-Based Landing
Pith reviewed 2026-05-07 11:09 UTC · model grok-4.3
The pith
Runtime monitoring for safety-critical machine learning can be unified by dividing approaches into three categories that check operating conditions, input distribution, or model behavior.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central contribution is that disparate runtime-monitoring approaches are unified by sorting them into three categories: Operational Design Domain (ODD) monitoring, which checks compliance with expected operating conditions; Out-of-Distribution (OOD) monitoring, which flags inputs outside the training distribution; and Out-of-Model-Scope (OMS) monitoring, which detects anomalous internal model states or outputs. This categorization facilitates the design of complementary monitoring activities and allows evaluation using shared safety-oriented metrics, as demonstrated in an experiment on vision-based runway detection during landing.
What carries the argument
The three distinct monitoring categories—Operational Design Domain (ODD), Out-of-Distribution (OOD), and Out-of-Model-Scope (OMS)—that classify approaches according to whether they check operating conditions, input distribution, or model behavior.
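As a minimal sketch, the three categories could sit behind a common monitor interface and be composed so that an alarm from any category vetoes the prediction. Everything below (the field names such as `altitude_ft`, the thresholds, and the use of maximum softmax probability as an OMS proxy) is an illustrative assumption, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    alarm: bool
    source: str  # which category raised the alarm

def odd_monitor(sensor_state: dict) -> Verdict:
    # ODD: check operating conditions against a declared envelope,
    # e.g. hypothetical altitude and visibility limits for an approach.
    in_envelope = (0.0 <= sensor_state.get("altitude_ft", -1.0) <= 5000.0
                   and sensor_state.get("visibility_m", 0.0) >= 800.0)
    return Verdict(alarm=not in_envelope, source="ODD")

def ood_monitor(ood_score: float, threshold: float = 0.5) -> Verdict:
    # OOD: reject inputs whose distance-to-training-distribution score
    # (assumed precomputed by some detector) exceeds a threshold.
    return Verdict(alarm=ood_score > threshold, source="OOD")

def oms_monitor(max_softmax: float, threshold: float = 0.3) -> Verdict:
    # OMS: flag anomalous model behaviour from the model's own outputs,
    # here a low maximum softmax probability as a crude confidence proxy.
    return Verdict(alarm=max_softmax < threshold, source="OMS")

def combined_monitor(sensor_state: dict, ood_score: float,
                     max_softmax: float) -> list[str]:
    # Complementary use: report every category that objects.
    verdicts = [odd_monitor(sensor_state),
                ood_monitor(ood_score),
                oms_monitor(max_softmax)]
    return [v.source for v in verdicts if v.alarm]
```

The point of the composition is the complementarity claim: an in-envelope, in-distribution input can still trip the OMS monitor, and vice versa.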
If this is right
- Monitoring activities can be designed using monitors from complementary categories to achieve broader coverage.
- Different monitoring methods can be evaluated and compared using the same safety-oriented metrics.
- The framework supports applications in safety-critical domains such as aeronautics, specifically vision-based landing.
- Fragmented research from different communities can be integrated under a single categorization scheme.
Where Pith is reading between the lines
- This unification could enable the creation of standardized test suites for ML monitors across industries.
- Extending the categories to other safety-critical ML uses like medical diagnosis might highlight missing monitor types.
- Future work could develop hybrid monitors that combine elements from multiple categories for improved performance.
Load-bearing premise
The three proposed categories are distinct and comprehensive enough to cover existing runtime monitoring approaches without major overlap or omission, and the runway detection experiment demonstrates practical benefits of the framework.
What would settle it
A runtime monitoring technique for safety-critical ML that cannot be placed in any of the three categories without forcing overlap or omission, or an evaluation in the runway detection task where categorized monitors show no advantage in safety metrics over existing methods.
read the original abstract
Runtime monitoring is essential to ensure the safety of ML applications in safety-critical domains. However, current research is fragmented, with independent methods emerging from different communities. In this paper, we propose a unified framework categorising runtime monitoring approaches into three distinct types: Operational Design Domain (ODD) monitoring, which ensures compliance with expected operating conditions; Out-of-Distribution (OOD) monitoring, which rejects inputs that deviate from the training data; and Out-of-Model-Scope (OMS) monitoring, which detects anomalous model behaviour based on its internal states or outputs. We demonstrate the benefits of this categorization with a dedicated experiment on an aeronautical safety-critical application: runway detection during landing. This framework facilitates design of monitoring activities, with complementary categories of monitors, and enables evaluation and comparison of different monitors using common, safety-oriented metrics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a unified framework for runtime monitoring of safety-critical ML systems by categorizing approaches into three types: Operational Design Domain (ODD) monitoring to ensure compliance with expected operating conditions, Out-of-Distribution (OOD) monitoring to reject inputs deviating from training data, and Out-of-Model-Scope (OMS) monitoring to detect anomalous model behavior from internal states or outputs. It illustrates the framework via a runway detection experiment in a vision-based landing application, claiming that the categorization enables complementary monitor design and evaluation/comparison of monitors using shared safety-oriented metrics.
Significance. If the taxonomy proves comprehensive and the common metrics enable rigorous cross-monitor comparisons, the work could reduce fragmentation across communities (e.g., OOD detection, anomaly detection, and operational monitoring) and support more systematic safety case construction for ML in domains like aviation. The runway-detection case study provides a concrete aeronautical example, which is a strength for applicability, but the absence of quantitative results, error analysis, or a literature mapping table limits the demonstrated impact on evaluation.
major comments (3)
- [§3] §3 (Proposed Framework): The definitions of ODD, OOD, and OMS monitoring are given intuitively but without explicit decision procedures or boundary conditions for borderline cases (e.g., an input-distribution monitor that also inspects model logits or outputs). This directly undermines the claim that the categories are 'distinct' and 'complementary' and that they enable unambiguous design and comparison.
- [§4] §4 (Case Study / Experiment): The runway detection demonstration assigns one monitor to each category but provides no systematic mapping of prior runtime-monitoring papers to the three bins, nor does it report quantitative results, safety-oriented metrics, or comparative evaluation across monitors. Without this, the experiment illustrates rather than validates the unification and common-metric benefits asserted in the abstract and §1.
- [§2] §2 (Related Work): No table or structured review maps existing methods (e.g., from OOD detection literature or runtime verification) onto the ODD/OOD/OMS taxonomy. This omission is load-bearing for the 'unifying' claim, as overlap or omission risks cannot be assessed.
minor comments (2)
- [§3] Notation for the three categories is introduced without a summary table; adding one early in §3 would improve readability.
- [Abstract and §4] The abstract states the experiment demonstrates 'benefits' but the text provides only qualitative description; clarify whether any numerical safety metrics (e.g., false-positive rates under ODD violation) appear in §4.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help us improve the clarity and rigor of our proposed framework. We respond to each major comment below and describe the corresponding revisions to the manuscript.
read point-by-point responses
-
Referee: [§3] §3 (Proposed Framework): The definitions of ODD, OOD, and OMS monitoring are given intuitively but without explicit decision procedures or boundary conditions for borderline cases (e.g., an input-distribution monitor that also inspects model logits or outputs). This directly undermines the claim that the categories are 'distinct' and 'complementary' and that they enable unambiguous design and comparison.
Authors: We agree that the intuitive definitions in Section 3 would benefit from more explicit decision procedures to handle borderline cases and to rigorously support the claims of distinctness and complementarity. In the revised manuscript, we will augment §3 with a decision flowchart or pseudocode outlining the classification criteria for each category, along with concrete examples of monitors that might straddle boundaries (such as those examining both input distributions and model outputs). This addition will clarify how the categories remain distinct while allowing for complementary use in safety-critical applications. revision: yes
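A decision procedure of the kind promised here could be as simple as an ordered rule over what a monitor inspects. The precedence below (model-facing monitors classify as OMS even when they also read inputs) is our own assumption about how borderline cases might be resolved, not the paper's text:

```python
def classify_monitor(uses_operating_conditions: bool,
                     uses_input_distribution: bool,
                     uses_model_internals_or_outputs: bool) -> str:
    """Assign a monitor to one taxonomy bin by what it inspects.

    Precedence (an assumption): a monitor grounded in the model's own
    behaviour is OMS even if it also reads input statistics, because
    its alarm condition is defined on the model, not the input.
    """
    if uses_model_internals_or_outputs:
        return "OMS"
    if uses_input_distribution:
        return "OOD"
    if uses_operating_conditions:
        return "ODD"
    return "unclassified"
```

For example, a detector that thresholds model logits but is calibrated on input statistics would land in OMS under this rule, resolving exactly the kind of straddling case the referee raises.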
-
Referee: [§4] §4 (Case Study / Experiment): The runway detection demonstration assigns one monitor to each category but provides no systematic mapping of prior runtime-monitoring papers to the three bins, nor does it report quantitative results, safety-oriented metrics, or comparative evaluation across monitors. Without this, the experiment illustrates rather than validates the unification and common-metric benefits asserted in the abstract and §1.
Authors: The experiment in §4 serves primarily to demonstrate the application of the proposed framework in a concrete aeronautical scenario. We acknowledge that it does not include a full comparative evaluation or quantitative safety metrics. To address this, we will expand §4 to include quantitative results using safety-oriented metrics such as detection rates, false alarm rates, and coverage of the operational envelope. We will also provide a comparative analysis of the three monitors. However, due to the illustrative nature and computational constraints of the simulation environment, an exhaustive validation across all prior methods is not feasible; instead, we will focus on the assigned monitors and discuss how the metrics enable future comparisons. revision: partial
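For concreteness, shared safety-oriented metrics of the kind proposed (missed-detection and false-alarm rates over hazard-labelled frames) can be computed identically for any monitor, whatever its category. The variable names and data below are illustrative, not results from the paper:

```python
def safety_metrics(alarms: list[bool], hazards: list[bool]) -> dict:
    """alarms[i]: the monitor raised an alarm on frame i.
    hazards[i]: frame i was actually hazardous
    (e.g. an unacceptably wrong runway bounding box)."""
    assert len(alarms) == len(hazards)
    tp = sum(a and h for a, h in zip(alarms, hazards))
    fp = sum(a and not h for a, h in zip(alarms, hazards))
    fn = sum((not a) and h for a, h in zip(alarms, hazards))
    tn = len(alarms) - tp - fp - fn
    return {
        # Missed-hazard rate: the safety-critical quantity.
        "missed_detection_rate": fn / max(tp + fn, 1),
        # False-alarm rate: the availability cost of the monitor.
        "false_alarm_rate": fp / max(fp + tn, 1),
    }
```

Because the hazard labels, not the monitor's internals, define the metric, an ODD, OOD, and OMS monitor can all be scored on the same frames and compared directly.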
-
Referee: [§2] §2 (Related Work): No table or structured review maps existing methods (e.g., from OOD detection literature or runtime verification) onto the ODD/OOD/OMS taxonomy. This omission is load-bearing for the 'unifying' claim, as overlap or omission risks cannot be assessed.
Authors: We concur that a structured mapping of the literature is essential to substantiate the unifying aspect of the framework. We will introduce a new table in §2 that categorizes representative works from OOD detection, anomaly detection, and runtime verification communities according to the ODD, OOD, and OMS types. The table will highlight potential overlaps and gaps, thereby allowing readers to assess the taxonomy's coverage and supporting the claim that the categorization facilitates systematic design and comparison. revision: yes
Circularity Check
No circularity: taxonomy built from external concepts with no self-referential derivations
full rationale
The paper advances a conceptual categorization of runtime monitoring into ODD, OOD, and OMS without equations, fitted parameters, or predictions. No step reduces a claimed result to a quantity defined by the taxonomy itself or by self-citation chains. The runway-detection experiment applies the categories illustratively rather than deriving metrics or complementarity from the framework by construction. The absence of a systematic mapping table or decision procedure is a limitation of completeness, not a circularity in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Runtime monitoring approaches for safety-critical ML can be partitioned into ODD, OOD, and OMS categories that are distinct, comprehensive, and complementary.
invented entities (3)
- ODD monitoring: no independent evidence
- OOD monitoring: no independent evidence
- OMS monitoring: no independent evidence
Reference graph
Works this paper leans on
- [1] Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. arXiv preprint arXiv:1606.06565 (2016)
- [2] Andéol, L., Fel, T., De Grancey, F., Mossina, L.: Confident object detection via conformal prediction and conformal risk control: an application to railway signaling. In: Conformal and Probabilistic Prediction with Applications. pp. 36–55. PMLR (2023)
- [3] Avizienis, A., Laprie, J.C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing 1, 11–33 (2004)
- [4] Cai, F., Koutsoukos, X.: Real-time out-of-distribution detection in learning-enabled cyber-physical systems. In: 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS). pp. 174–183. IEEE (2020)
- [5] Cappi, C., Cohen, N., Ducoffe, M., Gabreau, C., Gardes, L., Gauffriau, A., Ginestet, J.B., Mamalet, F., Mussot, V., Pagetti, C., et al.: How to design a dataset compliant with an ML-based system ODD? In: 12th European Congress on Embedded Real Time Software and Systems (2024)
- [6] Chen, Y., Cheng, C.H., Yan, J., Yan, R.: Monitoring object detection abnormalities via data-label and post-algorithm abstractions. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 6688–6693. IEEE (2021)
- [7] Cheng, C.H., Nührenberg, G., Yasuoka, H.: Runtime monitoring neuron activation patterns. In: 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE). pp. 300–303. IEEE (2019)
- [8] Cheng, C.H., Wu, C., Ruess, H., Bensalem, S.: Towards rigorous design of OoD detectors. arXiv preprint arXiv:2306.08447 (2023)
- [9] Denouden, T., Salay, R., Czarnecki, K., Abdelzad, V., Phan, B., Vernekar, S.: Improving reconstruction autoencoder out-of-distribution detection with Mahalanobis distance. arXiv preprint arXiv:1812.02765 (2018)
- [10] Du, X., Gozum, G., Ming, Y., Li, Y.: Siren: Shaping representations for detecting out-of-distribution objects. Advances in Neural Information Processing Systems 35, 20434–20449 (2022)
- [11] Du, X., Wang, Z., Cai, M., Li, Y.: VOS: Learning what you don't know by virtual outlier synthesis. arXiv preprint arXiv:2202.01197 (2022)
- [12] Ducoffe, M., Carrere, M., Féliers, L., Gauffriau, A., Mussot, V., Pagetti, C., Sammour, T.: LARD–Landing Approach Runway Detection–Dataset for Vision Based Landing. arXiv preprint arXiv:2304.09938 (2023)
- [13] European Aviation Safety Agency (EASA): Artificial Intelligence Roadmap 1.0: A Human-Centric Approach to AI in Aviation. Tech. rep., European Aviation Safety Agency (2020)
- [14] European Aviation Safety Agency (EASA): EASA Concept Paper: Guidance for Level 1 & 2 Machine Learning Applications, Issue 02. Tech. rep., European Aviation Safety Agency (2024)
- [15] Ferreira, R.S., Arlat, J., Guiochet, J., Waeselynck, H.: Benchmarking safety monitors for image classifiers with machine learning. In: 2021 IEEE 26th Pacific Rim International Symposium on Dependable Computing (PRDC). pp. 7–16. IEEE (2021)
- [16] Ferreira, R.S., Guérin, J., Delmas, K., Guiochet, J., Waeselynck, H.: Safety monitoring of machine learning perception functions: a survey. Computational Intelligence 41, e70032 (2025)
- [17] Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning. pp. 1050–1059. PMLR (2016)
- [18] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
- [19] Guérin, J., Delmas, K., Ferreira, R., Guiochet, J.: Out-of-distribution detection is not all you need. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 14829–14837 (2023)
- [20] Guérin, J., Ferreira, R.S., Delmas, K., Guiochet, J.: Unifying evaluation of machine learning safety monitors. In: 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE). pp. 414–422. IEEE (2022)
- [21] Hashemi, V., Křetínský, J., Rieder, S., Schön, T., Vorhoff, J.: Gaussian-based and outside-the-box runtime monitoring join forces. In: International Conference on Runtime Verification. pp. 218–228. Springer (2024)
- [22] He, W., Wu, C., Bensalem, S.: Box-based monitor approach for out-of-distribution detection in YOLO: an exploratory study. In: International Conference on Runtime Verification. pp. 229–239. Springer (2024)
- [23] Hendrycks, D., Dietterich, T.: Benchmarking neural network robustness to common corruptions and perturbations. In: International Conference on Learning Representations (2019)
- [24] Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: International Conference on Learning Representations (2017)
- [25] Henzinger, T.A., Lukina, A., Schilling, C.: Outside the box: Abstraction-based monitoring of neural networks. In: 24th European Conference on Artificial Intelligence. vol. 325 (2020)
- [26] Hsu, Y.C., Shen, Y., Jin, H., Kira, Z.: Generalized ODIN: Detecting out-of-distribution image without learning from out-of-distribution data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10951–10960 (2020)
- [27] Kaakai, F., Adibhatla, S., Pai, G., Escorihuela, E.: Data-centric operational design domain characterization for machine learning-based aeronautical products. In: International Conference on Computer Safety, Reliability, and Security. pp. 227–242. Springer (2023)
- [28] Kang, D., Raghavan, D., Bailis, P., Zaharia, M.: Model assertions for debugging machine learning. In: NeurIPS MLSys Workshop. vol. 3 (2018)
- [29] Kirchheim, K., Filax, M., Ortmeier, F.: PyTorch-OOD: A library for out-of-distribution detection based on PyTorch. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4351–4360 (2022)
- [30] Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in Neural Information Processing Systems 31 (2018)
- [31] Liu, W., Wang, X., Owens, J., Li, Y.: Energy-based out-of-distribution detection. Advances in Neural Information Processing Systems 33, 21464–21475 (2020)
- [32] Machin, M., Guiochet, J., Waeselynck, H., Blanquart, J.P., Roy, M., Masson, L.: SMOF: A safety monitoring framework for autonomous systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems 48, 702–715 (2016)
- [33] Montavon, G., Samek, W., Müller, K.R.: Methods for interpreting and understanding deep neural networks. Digital Signal Processing 73, 1–15 (2018)
- [34] Ndong, J., Salamatian, K.: Signal processing-based anomaly detection techniques: a comparative analysis. In: Proc. 2011 3rd International Conference on Evolving Internet. pp. 32–39. IARIA (2011)
- [35] Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 427–436 (2015)
- [36] Novello, P., Prudent, Y., Friedrich, C., Pequignot, Y., Le Goff, M.: OODEEL, a simple, compact, and hackable post-hoc deep OOD detection for already trained TensorFlow or PyTorch image classifiers (2023)
- [37] Rahman, Q.M., Sünderhauf, N., Dayoub, F.: Did you miss the sign? A false negative alarm system for traffic sign detectors. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 3748–3753. IEEE (2019)
- [38] Rahman, Q.M., Sünderhauf, N., Dayoub, F.: Per-frame mAP prediction for continuous performance monitoring of object detection during deployment. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 152–160 (2021)
- [39] SAE International: Taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles. Tech. Rep. J3016, SAE International, Warrendale, PA, USA (Apr 2021)
- [40] Sun, Y., Guo, C., Li, Y.: ReAct: Out-of-distribution detection with rectified activations. Advances in Neural Information Processing Systems 34, 144–157 (2021)
- [41] Sun, Y., Ming, Y., Zhu, X., Li, Y.: Out-of-distribution detection with deep nearest neighbors. In: International Conference on Machine Learning. pp. 20827–20840. PMLR (2022)
- [42] Torens, C., Jünger, F., Schirmer, S., Nagarajan, P., Schopferer, S., Zhukov, D., Dauer, J.: Runtime monitoring of operational design domain to safeguard machine learning components. CEAS Aeronautical Journal 16, 973–991 (2025)
- [43] Wang, H., Li, Z., Feng, L., Zhang, W.: ViM: Out-of-distribution with virtual-logit matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4921–4930 (2022)
- [44] Wang, H., Xu, J., Xu, C., Ma, X., Lu, J.: Dissector: Input validation for deep learning applications by crossing-layer dissection. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. pp. 727–738 (2020)
- [45] Wilson, S., Fischer, T., Dayoub, F., Miller, D., Sünderhauf, N.: SAFE: Sensitivity-aware features for out-of-distribution object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 23565–23576 (2023)
- [46] Yahaya, S.W., Lotfi, A., Mahmud, M.: A consensus novelty detection ensemble approach for anomaly detection in activities of daily living. Applied Soft Computing 83, 105613 (2019)
- [47] Yang, J., Zhou, K., Li, Y., Liu, Z.: Generalized out-of-distribution detection: A survey. International Journal of Computer Vision 132(12), 5635–5662 (2024)
- [48] Zhang, J., Yang, J., Wang, P., Wang, H., Lin, Y., Zhang, H., Sun, Y., Du, X., Li, Y., Liu, Z., Chen, Y., Li, H.: OpenOOD v1.5: Enhanced benchmark for out-of-distribution detection. In: NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models (2023)
- [49] Zouzou, A., Ducoffe, M., Boumazouza, R., et al.: Robust vision-based runway detection through conformal prediction and conformal map. arXiv preprint arXiv:2505.16740 (2025)