Embedding Semantic Risk into Distance Fields and CBFs for Online Monocular Safe Control
Pith reviewed 2026-06-28 14:42 UTC · model grok-4.3
The pith
Semantic class labels are fused into the Euclidean signed distance field (ESDF) before control optimization, so high-risk objects impose larger safety margins on control barrier function (CBF)-based navigation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework reconstructs dense 3-D geometry from monocular RGB video via foundation-model SLAM, fuses per-frame semantic segmentation labels into the map, applies class-dependent inflation to safety-relevant regions, and computes an ESDF on the inflated geometry. This semantic-aware ESDF supplies the local distances and spatial derivatives required by the CBF controller, with additional class-dependent gains regulating the response, enabling 10-20 Hz online operation and semantic-aware safe behavior in teleoperation and autonomous navigation.
What carries the argument
The semantic-aware ESDF formed by class-dependent inflation of reconstructed geometry before field computation, which encodes risk into distances and derivatives supplied to the CBF.
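As a rough illustration of class-dependent inflation, the sketch below treats each labeled obstacle point as a ball whose radius depends on its class and takes the minimum over the resulting distances. This is a brute-force stand-in on a toy 2-D grid, not the paper's ESDF construction; the names `INFLATION`, `semantic_sdf`, and `build_field`, and the radii themselves, are assumptions made for the example.

```python
import math

# Hypothetical class-dependent inflation radii in metres (illustrative values,
# not taken from the paper).
INFLATION = {"chair": 0.0, "person": 0.5}


def semantic_sdf(query, labeled_points, inflation=INFLATION):
    """Distance from `query` to the union of class-inflated obstacle points.

    Each obstacle point p with class c is treated as a ball of radius
    inflation[c], so its contribution is ||query - p|| - inflation[c].
    Negative values mean the query lies inside an inflated region.
    """
    return min(math.dist(query, p) - inflation[c] for p, c in labeled_points)


def build_field(labeled_points, nx, ny, res, inflation=INFLATION):
    """Sample the semantic distance on an nx-by-ny grid with cell size `res`."""
    return [
        [semantic_sdf((i * res, j * res), labeled_points, inflation)
         for j in range(ny)]
        for i in range(nx)
    ]


obstacles = [((1.0, 1.0), "chair"), ((3.0, 1.0), "person")]
field = build_field(obstacles, nx=5, ny=3, res=1.0)

# The cell at (2, 1) is 1 m from both points, but the person's 0.5 m
# inflation makes it the nearer obstacle in the semantic field.
print(field[2][1])  # 0.5
```

Because the risk is baked into the stored distances, a downstream consumer would not need to know which class produced a given value.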
If this is right
- The CBF receives risk-adjusted distances using only standard ESDF queries at runtime.
- High-risk object classes influence larger spatial regions in the safety field by design.
- Efficient distance and gradient queries are retained because the representation remains an ESDF.
- The pipeline achieves 10-20 Hz operation in both simulation and hardware experiments.
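The "standard ESDF queries" mentioned in the list above amount to reading a distance and a spatial gradient from a stored grid. A minimal sketch, assuming a 2-D grid, bilinear interpolation, and central differences (all choices made here for illustration, with hypothetical function names `query` and `gradient`):

```python
def query(field, res, x, y):
    """Bilinearly interpolate a 2-D distance grid at metric position (x, y)."""
    gx, gy = x / res, y / res
    # Clamp the base cell so that the four neighbours always exist.
    i = min(max(int(gx), 0), len(field) - 2)
    j = min(max(int(gy), 0), len(field[0]) - 2)
    tx, ty = gx - i, gy - j
    return (
        field[i][j] * (1 - tx) * (1 - ty)
        + field[i + 1][j] * tx * (1 - ty)
        + field[i][j + 1] * (1 - tx) * ty
        + field[i + 1][j + 1] * tx * ty
    )


def gradient(field, res, x, y, eps=1e-3):
    """Central-difference gradient of the interpolated field."""
    dx = (query(field, res, x + eps, y) - query(field, res, x - eps, y)) / (2 * eps)
    dy = (query(field, res, x, y + eps) - query(field, res, x, y - eps)) / (2 * eps)
    return dx, dy


# Toy field d(x, y) = x - 1 sampled on a 4x3 grid with 0.5 m cells,
# i.e. an obstacle face at x = 1 m.
res = 0.5
field = [[i * res - 1.0 for _ in range(3)] for i in range(4)]
d = query(field, res, 1.25, 0.5)
g = gradient(field, res, 1.25, 0.5)
```

The query code is the same whether or not the grid was built from semantically inflated geometry, which is the sense in which runtime cost is unchanged.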
Where Pith is reading between the lines
- Continuous updating of the semantic field could support safer motion among moving objects whose classes are re-labeled over time.
- The inflation rules could be extended to predicted object trajectories to reduce risk in dynamic scenes.
- Field performance under noisy or incomplete segmentation would indicate how much label accuracy is required for the approach to remain effective.
Load-bearing premise
Per-frame semantic segmentation labels can be fused reliably into the reconstructed 3-D geometry to produce class-dependent inflation that correctly represents risk for the downstream controller.
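One simple way to picture the fusion step this premise depends on is per-voxel majority voting over frames. The sketch below is an assumption for illustration only; this page does not say how the paper fuses labels, and `fuse_labels`, `min_votes`, and the `"unknown"` fallback are hypothetical.

```python
from collections import Counter, defaultdict


def fuse_labels(observations, min_votes=2):
    """Fuse per-frame (voxel, label) observations by majority vote.

    A voxel whose winning label has fewer than `min_votes` votes is marked
    "unknown" instead of trusting a single frame.
    """
    votes = defaultdict(Counter)
    for voxel, label in observations:
        votes[voxel][label] += 1

    fused = {}
    for voxel, counts in votes.items():
        label, n = counts.most_common(1)[0]
        fused[voxel] = label if n >= min_votes else "unknown"
    return fused


obs = [
    ((4, 2, 0), "person"), ((4, 2, 0), "person"), ((4, 2, 0), "chair"),
    ((1, 1, 0), "chair"),
]
fused = fuse_labels(obs)
print(fused[(4, 2, 0)])  # person
print(fused[(1, 1, 0)])  # unknown
```

A conservative design choice in such a scheme would be to give `"unknown"` voxels the largest inflation radius, so that missing labels widen the margin instead of shrinking it.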
What would settle it
A hardware trial in which the robot maintains the same clearance from a high-risk object under semantic inflation as under uniform inflation, or collides despite the inflated field, would show the embedding adds no benefit.
Original abstract
We propose an online monocular perception-to-control framework that embeds semantic risk into the distance field used by Control Barrier Function (CBF)-based safe navigation and teleoperation. Many perception-based safety filters assign the same distance-based safety margin to all mapped obstacles or use semantics only as a downstream controller adjustment, rather than encoding semantic risk in the spatial representation. Our framework instead reasons online about obstacle geometry and class-dependent risk by embedding semantic information directly into the Euclidean Signed Distance Field (ESDF). This design encodes semantic risk before control optimization, so high-risk objects exert a larger spatial influence in the safety field while retaining efficient ESDF queries at runtime. Specifically, a foundation-model-based SLAM front end reconstructs dense 3-D geometry from monocular RGB video, while per-frame semantic segmentation provides pixel-level class labels that are fused into the reconstructed geometry. The resulting geometric-semantic representation is then converted into an ESDF, where semantic labels identify safety-relevant regions and impose class-dependent inflation before field computation. The semantic-aware ESDF provides the local distance values and spatial derivatives required by the CBF controller, while class-dependent gains further regulate the controller response. Extensive simulation and hardware experiments demonstrate online operation at 10-20 Hz and semantic-aware safe behavior in both teleoperation and autonomous navigation.
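To make the role of the distance, its gradient, and the class-dependent gain concrete, here is a minimal CBF safety filter sketch. It assumes single-integrator dynamics (x_dot = u) and a barrier h(x) = d(x), neither of which is stated on this page, and it uses the closed-form solution of a one-constraint quadratic program. The `ALPHA` values and the function name `cbf_filter` are hypothetical.

```python
def cbf_filter(u_nom, d, grad, alpha):
    """Minimally modify u_nom so that grad . u + alpha * d >= 0.

    Closed-form solution of  min ||u - u_nom||^2  subject to one affine
    constraint, assuming x_dot = u and barrier h(x) = d(x).
    """
    slack = sum(g * u for g, u in zip(grad, u_nom)) + alpha * d
    norm_sq = sum(g * g for g in grad)
    if slack >= 0.0 or norm_sq == 0.0:
        return tuple(u_nom)  # nominal command already satisfies the constraint
    scale = slack / norm_sq
    # Project onto the constraint boundary along the gradient direction.
    return tuple(u - scale * g for u, g in zip(u_nom, grad))


# Hypothetical class-dependent gains: a smaller gain forces earlier braking.
ALPHA = {"chair": 2.0, "person": 0.5}

u_nom = (-1.0, 0.0)          # heading straight at the obstacle at 1 m/s
d, grad = 1.0, (1.0, 0.0)    # 1 m away, distance increases along +x

u_chair = cbf_filter(u_nom, d, grad, ALPHA["chair"])
u_person = cbf_filter(u_nom, d, grad, ALPHA["person"])
print(u_chair)   # (-1.0, 0.0)
print(u_person)  # (-0.5, 0.0)
```

In this toy case the same distance yields an unmodified command near the low-risk class and a halved approach speed near the high-risk class, which is the kind of differentiated response the class-dependent gains are described as providing.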
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an online monocular perception-to-control pipeline that embeds semantic risk directly into an ESDF by fusing foundation-model SLAM geometry with per-frame semantic segmentation labels, applying class-dependent inflation to the geometry before ESDF computation, and supplying the resulting field and gradients to a CBF controller (with class-dependent gains) for safe teleoperation and navigation. The central claim is that this pre-control encoding yields semantic-aware safety margins while supporting 10-20 Hz runtime operation, as demonstrated in simulation and hardware experiments.
Significance. If the fusion and inflation steps can be shown to produce reliable risk-adjusted margins under realistic monocular perception noise, the approach would offer a useful alternative to post-hoc semantic adjustments in CBF safety filters by shifting class-dependent risk into the spatial representation itself. This could improve safety in environments with heterogeneous obstacles while preserving efficient distance queries.
major comments (2)
- [Abstract] The claim that experiments demonstrate 'semantic-aware safe behavior' at 10-20 Hz rests on unverified assertions; no quantitative metrics, error bars, ablation results, fusion accuracy measures, or closed-loop violation rates under perception noise are reported, leaving the central safety claim unsupported.
- [Abstract, fusion and inflation steps] The method assumes per-frame semantic labels can be fused reliably into monocular dense geometry and that class-dependent inflation produces appropriate CBF margins, yet provides no analysis of error propagation from scale drift, depth inaccuracies, label confusion, or temporal inconsistency; this assumption is load-bearing for the claim that h(x) ≥ 0 encodes the intended risk-adjusted safety.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for stronger quantitative support and analysis of modeling assumptions. We address each major comment below and will revise the manuscript to incorporate additional metrics and discussion.
- Referee: [Abstract] The claim that experiments demonstrate 'semantic-aware safe behavior' at 10-20 Hz rests on unverified assertions; no quantitative metrics, error bars, ablation results, fusion accuracy measures, or closed-loop violation rates under perception noise are reported, leaving the central safety claim unsupported.
  Authors: The manuscript reports measured runtimes of 10-20 Hz via timing benchmarks in simulation and on hardware, along with qualitative demonstrations of collision-free teleoperation and navigation that differentiate semantic-aware inflation from uniform margins. We agree, however, that the abstract and results section would benefit from explicit quantitative safety metrics (e.g., minimum achieved distances, closed-loop constraint violation counts, ablation on inflation radii, and fusion accuracy under noise). We will add these metrics, error bars, and baseline comparisons in the revised version to better substantiate the safety claims.
  Revision: yes
- Referee: [Abstract, fusion and inflation steps] The method assumes per-frame semantic labels can be fused reliably into monocular dense geometry and that class-dependent inflation produces appropriate CBF margins, yet provides no analysis of error propagation from scale drift, depth inaccuracies, label confusion, or temporal inconsistency; this assumption is load-bearing for the claim that h(x) ≥ 0 encodes the intended risk-adjusted safety.
  Authors: The pipeline relies on off-the-shelf foundation-model SLAM and segmentation whose per-frame outputs are fused into the ESDF; the manuscript does not contain a dedicated propagation analysis or sensitivity study for scale drift, depth error, label noise, or temporal inconsistency. We will revise the manuscript to include an explicit discussion of these error sources and their potential effect on the semantic ESDF and the resulting CBF constraint h(x) ≥ 0. Where data permits, we will also add a limited sensitivity experiment or conservative bounds.
  Revision: yes
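For orientation, the kind of conservative bound under discussion can be sketched as follows. This is a generic robustification argument, not a result from the paper; the error bound ε, the radii r_high and r_low, and the point-wise model d_sem(x) = min_i (‖x − p_i‖ − r_{c_i}) are assumptions introduced here.

```latex
% Assumption: the fused field \hat{d}_{\mathrm{sem}} deviates from the true
% semantic distance d_{\mathrm{sem}} by at most \varepsilon (hypothetical bound).
\[
  \bigl|\hat{d}_{\mathrm{sem}}(x) - d_{\mathrm{sem}}(x)\bigr| \le \varepsilon
  \quad\Longrightarrow\quad
  \Bigl( h(x) := \hat{d}_{\mathrm{sem}}(x) - \varepsilon \ge 0
  \;\Rightarrow\; d_{\mathrm{sem}}(x) \ge 0 \Bigr).
\]
% Label confusion under the point-wise model
% d_{\mathrm{sem}}(x) = \min_i \bigl( \lVert x - p_i \rVert - r_{c_i} \bigr):
% relabeling a high-risk point as low-risk raises its term by
% r_{\mathrm{high}} - r_{\mathrm{low}}, so the field is over-estimated by at most that amount.
\[
  0 \;\le\; d_{\mathrm{sem}}^{\mathrm{mislabeled}}(x) - d_{\mathrm{sem}}^{\mathrm{true}}(x)
  \;\le\; r_{\mathrm{high}} - r_{\mathrm{low}} .
\]
```

Under these assumptions, a sensitivity study would mainly need to estimate ε from scale drift and depth error, and the rate at which high-risk classes are confused with low-risk ones.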
Circularity Check
No significant circularity detected
The paper describes a perception-to-control pipeline that fuses external foundation-model SLAM geometry with per-frame semantic segmentation labels, applies class-dependent inflation, computes an ESDF, and supplies it to a CBF controller. No equations, parameters, or steps are shown that reduce the claimed semantic-risk encoding or safety behavior to fitted values defined inside the paper or to self-citations whose content is itself unverified. The central construction is a design choice whose correctness is asserted to rest on external models rather than internal self-definition or renaming of results. This matches the default expectation of a non-circular pipeline.
Axiom & Free-Parameter Ledger
free parameters (2)
- class-dependent inflation distances
- class-dependent controller gains
axioms (2)
- domain assumption: Foundation-model SLAM produces sufficiently accurate dense 3-D geometry from monocular RGB video for online use.
- domain assumption: Per-frame semantic segmentation yields reliable pixel-level class labels that survive projection and fusion into the map.