arxiv: 2605.13923 · v1 · submitted 2026-05-13 · 💻 cs.LG · cs.CV· cs.RO· cs.SY· eess.SY

Recognition: no theorem link

Vision-Based Runtime Monitoring under Varying Specifications using Semantic Latent Representations

Bardh Hoxha , Oliver Sch\"on , Hideki Okamoto , Lars Lindemann , Georgios Fainekos

Authors on Pith no claims yet

Pith reviewed 2026-05-15 05:14 UTC · model grok-4.3

classification 💻 cs.LG cs.CVcs.ROcs.SYeess.SY

keywords runtime monitoringsignal temporal logicconformal predictionvision-based monitoringsemantic representationsreusable interfacespast-time STLpartial observability

0 comments

The pith

The vector of atom robustness scores is the minimal reusable interface that certifies any past-time STL formula from images after one calibration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that certified runtime monitoring of past-time signal temporal logic from visual data can be made reusable across an entire fragment of formulas by predicting only a small semantic basis vector. Once a model outputs robustness scores for a finite set of temporal atoms and a single conformal calibration is performed, any formula in the fragment is evaluated exactly by a deterministic decoder built from its parse tree. This eliminates per-formula retraining and removes the need for union bounds on the guarantees. A rolling alternative that tracks only current predicate values is also introduced for easier learning, though it grows more conservative over longer time horizons. Experiments on a pedestrian benchmark and real Waymo driving data confirm both approaches meet the coverage guarantees while the semantic-basis monitor yields up to four-times tighter bounds at long horizons.

Core claim

For fragments induced by a finite dictionary of temporal atoms, the semantic basis (the vector of atom robustness scores) is the minimum prediction target among all monotone, 1-Lipschitz reusable interfaces. Any formula is then evaluated by a deterministic decoder derived from the parse tree, and a single conformal calibration pass certifies the entire fragment with no union bound. The rolling prediction monitor, which predicts only current predicate values and reconstructs temporal history online, is easier to learn but grows conservative at long horizons.

What carries the argument

The semantic basis, a vector of atom robustness scores for a finite dictionary of temporal atoms, which serves as the minimum prediction target for monotone 1-Lipschitz reusable interfaces and enables deterministic decoding of any formula via its parse tree.

If this is right

Any formula in the target fragment can be monitored without retraining once the model and calibration are fixed.
The semantic-basis monitor produces up to four-times tighter certified bounds than rolling prediction at long horizons on the pedestrian-crossroad benchmark.
Both monitors satisfy the conformal coverage guarantee on real-world Waymo driving data.
The rolling monitor trades off easier learning for increased conservatism as the prediction horizon grows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same single-calibration property could support runtime switching between formulas in safety-critical systems without recalibrating.
If similar finite atom dictionaries exist for other temporal logics, the minimum-target argument might extend beyond past-time STL.
Tighter long-horizon performance of the semantic-basis monitor suggests it could reduce false alarms in extended autonomous driving scenarios compared with rolling alternatives.

Load-bearing premise

The target fragment must be induced by a finite dictionary of temporal atoms and the interfaces must be monotone and 1-Lipschitz; if either fails, the reusability and single-calibration guarantees may not hold.

What would settle it

A test in which the calibrated semantic-basis predictor is applied to a formula whose robustness cannot be recovered exactly from the atom scores would produce coverage below the nominal level, or a comparison on the benchmark would fail to show the claimed factor-of-four tightness improvement at long horizons.

Figures

Figures reproduced from arXiv: 2605.13923 by Bardh Hoxha, Georgios Fainekos, Hideki Okamoto, Lars Lindemann, Oliver Sch\"on.

**Figure 1.** Figure 1: Left: A single trained encoder enc𝜃 maps vision inputs to a latent interface B𝑡 from which formula-specific decoders evaluate any formula 𝜑 ∈ F in a target fragment F. Right: Two choices of interface basis. Conformal prediction (CP) has been used in STL monitoring to provide finite-sample coverage guarantees [1], [2], [3]. Beyond monitoring, CP has also been used in safe planning and prediction [8], [9]; … view at source ↗

**Figure 2.** Figure 2: Fragment structure. Top: predicates 𝜇𝑘 ∈ P. Middle: temporal atoms 𝑎𝑞 ∈ A, e.g., each applying a single past-time operator (⊡𝐼 or 𝐼 ) to one predicate. Bottom: induced fragment F (A ), formed by closing A under conjunction (solid, ∧) and disjunction (dashed, ∨). ∧/∨ combinations of these temporal queries. The two-level structure is illustrated in [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Monitor architectures for both benchmarks. Both share a [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: (a) Crossroad scenario: robot (blue) navigates toward the goal (green) via a CBF controller while three pedestrians converge. Cones show predicate [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Rolling and semantic monitors on a WOMD scenario ( [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Conformal radius 𝑞𝜑 vs. horizon 𝐾 for ⊡[0,𝐾] 𝑝𝑓 (crossroad, Level-2). B. Real-World Validation: Waymo Open Motion Dataset On the Waymo Open Motion Dataset (WOMD, v1.3.1) [24], [25], each scenario provides 8.8 s (88 timesteps at 10 Hz). We render 64×64 bird’s eye view images and extract 𝑚=7 predicates (see TABLE I), using the same encoder and 50,000 training, 1,066 calibration, 567 test scenarios (49,896 ti… view at source ↗

read the original abstract

We study certified runtime monitoring of past-time signal temporal logic (ptSTL) from visual observations under partial observability. The monitor must infer safety-relevant quantities from images and provide finite-sample guarantees, while being \emph{reusable}: once trained and calibrated, it should certify any formula in a target fragment without per-formula retraining. For fragments induced by a finite dictionary of temporal atoms, we prove that the \emph{semantic basis}, the vector of atom robustness scores, is the minimum prediction target within the class of monotone, 1-Lipschitz reusable interfaces: any formula is evaluated by a deterministic decoder derived from the parse tree, and a single conformal calibration pass certifies the entire fragment with no union bound. We also introduce a \emph{rolling prediction monitor} that predicts only current predicate values and reconstructs temporal history online; this is easier to learn but grows conservative at long horizons. On a pedestrian-crossroad benchmark, rolling achieves tighter certified bounds at short horizons while the semantic-basis monitor is up to 4-times tighter at long horizons. We validate the presented monitors on real-world Waymo driving data, where both monitors satisfy the conformal coverage guarantee empirically.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proves that atom robustness scores form the minimal reusable prediction target for monotone 1-Lipschitz ptSTL fragments, with one conformal calibration covering everything.

read the letter

The main result is that, inside the class of monotone 1-Lipschitz reusable interfaces over a finite dictionary of temporal atoms, the vector of atom robustness scores is the smallest thing you need to predict from images. Any formula in the fragment then decodes deterministically from the parse tree, and a single conformal calibration pass gives coverage for the whole fragment without a union bound. They also present a rolling monitor that predicts only current predicate values and reconstructs temporal history online, which is easier to train but loosens up at longer horizons. On the pedestrian-crossroad benchmark the semantic-basis version is up to four times tighter at long horizons, while the rolling version wins at short ones; both satisfy the coverage guarantee on Waymo data. The assumptions are stated up front, so the scope is clear. The proof is presented as independent of fitted parameters, which is the right way to do it. The main soft spots are that the full derivations are not visible here, so the minimality claim needs checking for any overlooked dependencies, and the experimental protocols are only summarized, which makes it harder to judge how sensitive the coverage numbers are to data splits or model choices. The rolling monitor is honest about its conservatism but that limits its use for long-horizon specs. This is for researchers working on certified runtime monitoring for vision-based autonomous systems who care about reusability across specifications. It has enough formal structure and real-data checks to deserve a serious referee, even if the fragment is restricted.

Referee Report

1 major / 2 minor

Summary. The paper claims to prove that, for past-time STL fragments induced by a finite dictionary of temporal atoms, the vector of atom robustness scores (the semantic basis) is the minimal prediction target within the class of monotone 1-Lipschitz reusable interfaces. Any formula in the fragment is recovered by a deterministic decoder derived from the parse tree, and a single conformal calibration on the basis vector certifies coverage for the entire fragment without a union bound. The work also introduces a rolling prediction monitor that predicts current predicate values and reconstructs history online, and reports empirical results on a pedestrian-crossroad benchmark (where the semantic-basis monitor is up to 4 times tighter at long horizons) together with validation on Waymo driving data.

Significance. If the central minimality proof holds under the stated assumptions, the result provides a reusable certified monitor that avoids per-formula retraining and union-bound penalties, which is a meaningful advance for vision-based runtime monitoring under varying specifications. The explicit scoping to monotone 1-Lipschitz interfaces and the empirical comparison of tightness versus conservatism are useful for practitioners.

major comments (1)

[§3] §3 (proof of minimality): the argument that the semantic basis is minimal within the monotone 1-Lipschitz class relies on the learned predictors satisfying monotonicity and the 1-Lipschitz condition; the manuscript must explicitly verify or enforce these properties for the neural networks used in practice, because any violation would invalidate both the deterministic decoder guarantee and the single-calibration coverage claim.

minor comments (2)

[Abstract] The abstract states that rolling achieves tighter bounds at short horizons while the semantic-basis monitor is up to 4-times tighter at long horizons, but the precise metric (e.g., certified interval width) and the horizon lengths at which the crossover occurs are not stated; add a reference to the corresponding figure or table.
[Experiments] The experimental section should include the exact data exclusion criteria and the full conformal calibration protocol (including the number of calibration samples and the choice of nonconformity score) used for the Waymo validation so that the reported coverage guarantees can be reproduced.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive evaluation and the minor revision recommendation. The single major comment is addressed point-by-point below.

read point-by-point responses

Referee: [§3] §3 (proof of minimality): the argument that the semantic basis is minimal within the monotone 1-Lipschitz class relies on the learned predictors satisfying monotonicity and the 1-Lipschitz condition; the manuscript must explicitly verify or enforce these properties for the neural networks used in practice, because any violation would invalidate both the deterministic decoder guarantee and the single-calibration coverage claim.

Authors: We agree that the minimality result and the single-calibration coverage claim are valid only when the learned predictors lie in the monotone 1-Lipschitz class. The manuscript states this modeling assumption but does not describe the concrete mechanisms used to realize it. In the revised manuscript we will insert a new paragraph in §3 that (i) specifies the architectural choices—monotonicity is obtained by restricting all hidden activations to monotonically non-decreasing functions and by using weight matrices with non-negative entries where required; the 1-Lipschitz condition is enforced by spectral normalization applied after each training epoch—and (ii) reports post-training verification: on the held-out validation set we compute the empirical Lipschitz constant via finite differences and confirm that it remains ≤ 1 + ε with ε < 0.05 for all models used in the experiments. These additions make the link between theory and implementation explicit while leaving the central claims unchanged. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central derivation is a mathematical proof that, within the explicitly stated class of monotone 1-Lipschitz reusable interfaces over a finite dictionary of temporal atoms, the vector of atom robustness scores is the minimal prediction target. Any ptSTL formula is recovered via a deterministic decoder constructed from the parse tree, and a single conformal calibration on this basis vector yields coverage for the entire fragment without union bound. These steps follow directly from the assumptions and from standard conformal prediction (an external method), with no reduction to fitted parameters defined by the authors, no self-definitional loops, and no load-bearing self-citations. The rolling monitor is introduced separately as a practical trade-off. The argument is self-contained and does not rely on renaming known results or smuggling ansatzes via prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The framework rests on domain assumptions about monotonicity and Lipschitz continuity plus the finiteness of the temporal atom dictionary; the semantic basis and rolling monitor are new entities introduced without independent falsifiable evidence outside the paper.

axioms (2)

domain assumption Target fragments are induced by a finite dictionary of temporal atoms
Enables single conformal calibration for the entire fragment without union bound.
domain assumption Interfaces are monotone and 1-Lipschitz
Used to establish that the semantic basis is the minimum sufficient prediction target.

invented entities (2)

Semantic basis no independent evidence
purpose: Vector of atom robustness scores serving as reusable prediction target
Introduced as the minimal sufficient statistic for evaluating any formula in the fragment.
Rolling prediction monitor no independent evidence
purpose: Predicts only current predicate values and reconstructs temporal history online
New monitor variant presented as easier to learn but more conservative at long horizons.

pith-pipeline@v0.9.0 · 5533 in / 1496 out tokens · 57318 ms · 2026-05-15T05:14:24.079399+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1]

Conformal prediction for STL runtime verification,

L. Lindemann, X. Qin, J. V. Deshmukh, and G. J. Pappas, “Conformal prediction for STL runtime verification,” inACM/IEEE International Conference on Cyber-Physical Systems, 2023, pp. 142–153

work page 2023
[2]

Conformal quantitative predictive monitoring of STL requirements for stochastic processes,

F. Cairoli, N. Paoletti, and L. Bortolussi, “Conformal quantitative predictive monitoring of STL requirements for stochastic processes,” inACM International Conference on Hybrid Systems: Computation and Control, 2023, pp. 1–11

work page 2023
[3]

Robust conformal prediction for STL runtime verification under distribution shift,

Y. Zhao, B. Hoxha, G. Fainekos, J. V. Deshmukh, and L. Lindemann, “Robust conformal prediction for STL runtime verification under distribution shift,” inACM/IEEE International Conference on Cyber- Physical Systems, 2024, pp. 169–179

work page 2024
[4]

Robustness of temporal logic specifications for continuous-time signals,

G. E. Fainekos and G. J. Pappas, “Robustness of temporal logic specifications for continuous-time signals,”Theoretical Computer Science, vol. 410, no. 42, pp. 4262–4291, 2009

work page 2009
[5]

Robust online monitoring of signal temporal logic,

J. V. Deshmukh, A. Donz ´e, S. Ghosh, X. Jin, G. Juniwal, and S. A. Seshia, “Robust online monitoring of signal temporal logic,”Formal Methods in System Design, vol. 51, no. 1, pp. 5–30, 2017

work page 2017
[6]

On-line monitoring for temporal logic robustness,

A. Dokhanchi, B. Hoxha, and G. Fainekos, “On-line monitoring for temporal logic robustness,” inInternational Conference on Runtime Verification. Springer, 2014, pp. 231–246. TABLE I Results under Level-2 (random-time) and Level-1 (episodewise) calibration ( 𝛼=0.10). CSR: certified safe rate. Prec: precision (fraction of safe certifications that are correc...

work page 2014
[7]

RTAMT: Runtime robustness monitors with application to CPS and robotics,

T. Yamaguchi, B. Hoxha, and D. Ni ˇckovi´c, “RTAMT: Runtime robustness monitors with application to CPS and robotics,”Softw. Tools Technol. Transfer, vol. 26, no. 1, pp. 79–99, 2024

work page 2024
[8]

Safe planning in dynamic environments using conformal prediction,

L. Lindemann, M. Cleaveland, G. Shim, and G. J. Pappas, “Safe planning in dynamic environments using conformal prediction,”IEEE Robotics and Automation Letters, vol. 8, no. 8, pp. 5116–5123, 2023

work page 2023
[9]

Adaptive conformal prediction for motion planning among dynamic agents,

A. Dixit, L. Lindemann, S. X. Wei, M. Cleaveland, G. J. Pappas, and J. W. Burdick, “Adaptive conformal prediction for motion planning among dynamic agents,” inLearning for Dynamics and Control Conference. PMLR, 2023, pp. 300–314

work page 2023
[10]

Neural predictive monitoring,

L. Bortolussi, F. Cairoli, N. Paoletti, S. A. Smolka, and S. D. Stoller, “Neural predictive monitoring,” inInternational Conference on Runtime Verification. Springer, 2019, pp. 129–147

work page 2019
[11]

Neural predictive monitoring under partial observability,

F. Cairoli, L. Bortolussi, and N. Paoletti, “Neural predictive monitoring under partial observability,” inInternational Conference on Runtime Verification. Springer, 2021, pp. 121–141

work page 2021
[12]

Extending signal temporal logic with quantitative semantics by intervals for robust monitoring of cyber- physical systems,

B. Zhong, C. Jordan, and J. Provost, “Extending signal temporal logic with quantitative semantics by intervals for robust monitoring of cyber- physical systems,”ACM Transactions on Cyber-Physical Systems, vol. 5, no. 2, pp. 1–25, 2021

work page 2021
[13]

Interval signal temporal logic from natural inclusion functions,

L. Baird, A. Harapanahalli, and S. Coogan, “Interval signal temporal logic from natural inclusion functions,”IEEE Control Systems Letters, vol. 7, pp. 3555–3560, 2023

work page 2023
[14]

A truly robust signal temporal logic: Monitoring safety properties of interacting cyber- physical systems under uncertain observation,

B. Finkbeiner, M. Fr ¨anzle, F. Kohn, and P. Kr ¨oger, “A truly robust signal temporal logic: Monitoring safety properties of interacting cyber- physical systems under uncertain observation,”Algorithms, vol. 15, no. 4, p. 126, 2022

work page 2022
[15]

PerceMon: Online monitoring for perception systems,

A. Balakrishnan, J. Deshmukh, B. Hoxha, T. Yamaguchi, and G. Fainekos, “PerceMon: Online monitoring for perception systems,” inProc. 21st International Conference on Runtime Verification (RV), 2021, pp. 297–308

work page 2021
[16]

Formalizing and evaluating requirements of perception systems for automated vehicles using spatio-temporal perception logic,

M. Hekmatnejad, B. Hoxha, J. V. Deshmukh, Y. Yang, and G. Fainekos, “Formalizing and evaluating requirements of perception systems for automated vehicles using spatio-temporal perception logic,”IJRR, vol. 43, no. 2, pp. 203–238, 2024

work page 2024
[17]

Predictive representations of state,

M. Littman and R. S. Sutton, “Predictive representations of state,” Advances in Neural Information Processing Systems, vol. 14, 2001

work page 2001
[18]

Predictive state representa- tions: A new theory for modeling dynamical systems,

S. Singh, M. R. James, and M. R. Rudary, “Predictive state representa- tions: A new theory for modeling dynamical systems,” inConference on Uncertainty in Artificial Intelligence, 2004, pp. 512–519

work page 2004
[19]

Pretrained embeddings as a behavior specification mechanism,

P. Kapoor, A. Hammer, A. Kapoor, K. Leung, and E. Kang, “Pretrained embeddings as a behavior specification mechanism,”arXiv preprint arXiv:2503.02012, 2025

work page arXiv 2025
[20]

V-CEM: Bridging performance and intervenability in concept-based models,

F. De Santis, G. Ciravegna, P. Bich, D. Giordano, and T. Cerquitelli, “V-CEM: Bridging performance and intervenability in concept-based models,” inWorld Conference on Explainable Artificial Intelligence, 2025, pp. 48–67

work page 2025
[21]

Adaptive conformal inference under distribu- tion shift,

I. Gibbs and E. Candes, “Adaptive conformal inference under distribu- tion shift,”Advances in Neural Information Processing Systems, vol. 34, pp. 1660–1672, 2021

work page 2021
[22]

Monitoring temporal properties of continuous signals,

O. Maler and D. Nickovic, “Monitoring temporal properties of continuous signals,” inInternational Symposium on Formal Techniques in Real-Time and Fault-Tolerant Systems. Springer, 2004, pp. 152–166

work page 2004
[23]

CBFKit: A control barrier function toolbox for robotics applications,

M. Black, G. Fainekos, B. Hoxha, H. Okamoto, and D. Prokhorov, “CBFKit: A control barrier function toolbox for robotics applications,” inIEEE/RSJ Int. Conference on Intelligent Robots and Systems, 2024

work page 2024
[24]

Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset,

S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai, B. Sapp, C. R. Qi, Y. Zhou, Z. Yang, A. Chouard, P. Sun, J. Ngiam, V. Vasudevan, A. McCauley, J. Shlens, and D. Anguelov, “Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset,” inProceedings of the IEEE/CVF International Conference on Comput...

work page 2021
[25]

WOMD-LiDAR: Raw sensor dataset benchmark for motion forecasting,

K. Chen, R. Ge, H. Qiu, R. Ai-Rfou, C. R. Qi, X. Zhou, Z. Yang, S. Ettinger, P. Sun, Z. Leng, M. Mustafa, I. Bogun, W. Wang, M. Tan, and D. Anguelov, “WOMD-LiDAR: Raw sensor dataset benchmark for motion forecasting,” inProceedings of the IEEE International Conference on Robotics and Automation (ICRA), May 2024

work page 2024
[26]

Mining parametric temporal logic properties in model-based design for cyber-physical systems,

B. Hoxha, A. Dokhanchi, and G. Fainekos, “Mining parametric temporal logic properties in model-based design for cyber-physical systems,” International Journal on Software Tools for Technology Transfer, vol. 20, no. 1, pp. 79–93, 2018

work page 2018