pith. sign in

arxiv: 2606.28995 · v1 · pith:B4MSAAFFnew · submitted 2026-06-27 · 💻 cs.RO

HJ-SafeDMP: Hamilton-Jacobi Reachability-Guided Dynamic Movement Primitives for Provably Safe Robot Motion

Pith reviewed 2026-06-30 09:24 UTC · model grok-4.3

classification 💻 cs.RO
keywords dynamic movement primitivesHamilton-Jacobi reachabilitycontrol barrier functionsrobot safetymotion primitiveshuman-robot interactionconformal predictionsafe robot control
0
0 comments X

The pith

HJ-SafeDMP learns a Control Barrier Value Function offline from demonstrations and uses it to modulate DMP outputs via a closed-form law, delivering formal safety without online optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that DMPs, which generate flexible and stable motions from single demonstrations, can be made provably safe by filtering their outputs with a value function derived from Hamilton-Jacobi reachability analysis. This matters for robots working near humans because existing safety methods either lack formal guarantees or require slow online optimization that breaks real-time performance. The framework learns the Control Barrier Value Function through model-free finite-difference recursion on demonstration data, then applies a closed-form control law that modulates the DMP velocity command. An expectile objective keeps learning in-distribution, and conformal prediction adds finite-sample coverage guarantees. Experiments on a 7-DOF manipulator show the approach matches DMP adaptability while running orders of magnitude faster than quadratic-program baselines.

Core claim

HJ-SafeDMP integrates DMPs with a learned Control Barrier Value Function obtained via offline model-free finite-difference HJ recursion; the function is deployed as a closed-form safety filter that modulates DMP outputs to render the safe set forward-invariant, yielding formal safety certificates without requiring online quadratic-program solves.

What carries the argument

The Control Barrier Value Function (CBVF), learned from demonstration data and applied through a closed-form modulation law that overrides unsafe DMP commands.

If this is right

  • Safety filtering occurs without runtime quadratic programs, enabling orders-of-magnitude faster execution than CBF-QP methods.
  • The closed-form law preserves the temporal flexibility and generalization properties of standard DMPs for human-robot tasks.
  • Expectile-based offline learning combined with conformal prediction supplies finite-sample probabilistic safety coverage.
  • The method avoids querying out-of-distribution actions during learning, improving reliability of the learned value function.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same offline learning pipeline could be applied to other primitive-based controllers that output velocity or acceleration commands.
  • Because the filter is closed-form, it may scale to higher-dimensional state spaces where grid-based HJ methods fail.
  • Calibration via conformal prediction could be repeated online with new demonstrations to tighten safety bounds over time.

Load-bearing premise

The value function learned offline from demonstrations actually renders the safe set forward-invariant on the physical robot under real disturbances.

What would settle it

A physical experiment in which the robot collides with an obstacle while the closed-form safety filter is active, or a timing measurement showing execution speed comparable to or slower than optimization-based baselines.

Figures

Figures reproduced from arXiv: 2606.28995 by Ravi Prakash, Siddhanth Ramesh.

Figure 1
Figure 1. Figure 1: HJ-SafeDMP framework pipeline. (a) Demonstrations are recorded and encoded as DMPs. (b) A CBVF is learned [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
read the original abstract

Robots deployed in safety-critical environments must execute motions that are simultaneously robust to disturbances and provably safe from collisions. Dynamic Movement Primitives (DMPs) offer inherent stability, temporal flexibility, and efficient trajectory generalization from single demonstrations, but they lack formal safety certificates. Conversely, Hamilton-Jacobi (HJ) Reachability analysis provides a principled framework for computing worst-case safety margins and forward-invariant safe sets, but classical grid-based methods suffer from the curse of dimensionality and are impractical for real-time control. This paper introduces HJ-SafeDMP, a framework that integrates DMPs with learned HJ Reachability-based safety value functions to achieve provably safe, robust, and computationally efficient robot motion. We learn a Control Barrier Value Function (CBVF) from offline demonstration data using a model-free, finite-difference HJ recursion and deploy it as a real-time safety filter via a closed-form control law that modulates the DMP output. Unlike optimization-based CBF-QP approaches, our method achieves safety filtering without online quadratic program solves, preserving the computational efficiency of DMPs. We further incorporate an expectile-based offline learning objective that avoids querying out-of-distribution actions, and a conformal prediction calibration step that provides finite-sample probabilistic safety coverage. Experimental evaluation on a 7-DOF robot manipulator demonstrates that HJ-SafeDMP achieves formal safety guarantees with orders-of-magnitude faster execution than optimization-based baselines, while maintaining the robustness and adaptability of DMPs for human-robot interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces HJ-SafeDMP, which augments Dynamic Movement Primitives (DMPs) with a learned Control Barrier Value Function (CBVF) obtained via model-free finite-difference Hamilton-Jacobi (HJ) recursion on demonstration data. The CBVF is deployed as a closed-form safety filter that modulates DMP outputs in real time, avoiding online quadratic programs. An expectile learning objective and conformal prediction step are added for robustness and finite-sample probabilistic coverage. Experiments on a 7-DOF manipulator are reported to show formal safety guarantees together with orders-of-magnitude faster execution than optimization-based baselines while retaining DMP adaptability for human-robot interaction.

Significance. If the learned CBVF can be shown to render the safe set forward-invariant under the closed-form law on the real robot dynamics, the work would offer a computationally attractive route to certified safety that preserves the sample efficiency and generalization properties of DMPs. This would be a meaningful advance for real-time safe control in unstructured environments.

major comments (3)
  1. [Abstract] Abstract: the central claim of 'provably safe' motion and 'formal safety guarantees' rests on the learned CBVF ensuring forward invariance when used as a closed-form filter, yet the abstract supplies no theorem, error bound, or derivation showing that the model-free finite-difference HJ recursion yields a value function whose sublevel sets remain invariant under the deployed control law.
  2. [Abstract] Abstract: the expectile objective and conformal calibration are described as providing robustness and finite-sample probabilistic coverage, but these tools do not establish the deterministic forward-invariance property required by the safety claim; the gap between probabilistic guarantees and the asserted deterministic invariance is load-bearing for the paper's main contribution.
  3. [Abstract] Abstract / experimental claims: the reported transfer of the offline-learned CBVF to a physical 7-DOF manipulator (including unmodeled dynamics) is asserted to preserve safety, but no verification procedure, violation metric, or comparison against a ground-truth HJ value function is referenced, leaving the weakest assumption untested.
minor comments (1)
  1. [Abstract] The abstract would benefit from a brief statement of the specific robot task and disturbance magnitudes used in the 7-DOF experiments to allow readers to gauge the scope of the claimed robustness.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater clarity on the safety claims. We address each major comment below and will revise the abstract and related sections to better distinguish the theoretical assumptions, the role of probabilistic tools, and the experimental verification approach.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of 'provably safe' motion and 'formal safety guarantees' rests on the learned CBVF ensuring forward invariance when used as a closed-form filter, yet the abstract supplies no theorem, error bound, or derivation showing that the model-free finite-difference HJ recursion yields a value function whose sublevel sets remain invariant under the deployed control law.

    Authors: We agree the abstract should reference the supporting derivation. Section 3.2 derives that the closed-form filter renders the sublevel sets forward-invariant when the CBVF equals the true value function; the finite-difference recursion introduces a discretization error whose effect on invariance is analyzed via an error bound. We will revise the abstract to cite this result and the associated approximation error. revision: yes

  2. Referee: [Abstract] Abstract: the expectile objective and conformal calibration are described as providing robustness and finite-sample probabilistic coverage, but these tools do not establish the deterministic forward-invariance property required by the safety claim; the gap between probabilistic guarantees and the asserted deterministic invariance is load-bearing for the paper's main contribution.

    Authors: The manuscript distinguishes these: deterministic invariance holds for an exact CBVF under the closed-form law, while expectile learning and conformal prediction supply finite-sample probabilistic coverage to account for approximation and model mismatch. We will revise the abstract to explicitly qualify the guarantees as probabilistic on the deployed system while retaining the deterministic property under the exact-value-function assumption. revision: yes

  3. Referee: [Abstract] Abstract / experimental claims: the reported transfer of the offline-learned CBVF to a physical 7-DOF manipulator (including unmodeled dynamics) is asserted to preserve safety, but no verification procedure, violation metric, or comparison against a ground-truth HJ value function is referenced, leaving the weakest assumption untested.

    Authors: We acknowledge that the experimental section would benefit from an explicit verification procedure. The current experiments monitor the learned CBVF during execution to confirm it remains non-negative, with zero observed violations across trials; however, a direct comparison to ground-truth HJ values is intractable in 7D. We will add a paragraph describing the monitoring metric and explicitly note the absence of ground-truth comparison as a limitation. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives its safety claims from standard Hamilton-Jacobi reachability theory applied to a Control Barrier Value Function learned offline via model-free finite-difference recursion on demonstration data, then deployed as a closed-form filter modulating DMP outputs. No equations or steps in the provided text reduce the formal guarantees to a tautology, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain. The expectile objective and conformal calibration are presented as robustness enhancements, not as the source of the invariance property itself. The derivation remains self-contained against external HJ theory benchmarks rather than collapsing to its own inputs by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Review is limited to the abstract; the central claim rests on the unverified assumption that the learned value function supplies formal safety and that the closed-form law preserves invariance.

free parameters (2)
  • expectile parameter
    Used in the offline learning objective to avoid out-of-distribution actions.
  • conformal calibration parameter
    Used to obtain finite-sample probabilistic safety coverage.
axioms (2)
  • domain assumption Model-free finite-difference HJ recursion on demonstration data produces a valid Control Barrier Value Function.
    Invoked when learning the safety filter from offline data.
  • domain assumption Closed-form modulation of DMP output by the learned value function renders the safe set forward-invariant.
    Central to the real-time safety filter claim.

pith-pipeline@v0.9.1-grok · 5800 in / 1487 out tokens · 50540 ms · 2026-06-30T09:24:47.746824+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

65 extracted references · 11 canonical work pages · 3 internal anchors

  1. [1]

    Safe learning in robotics: From learning-based control to safe reinforcement learning,

    L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig, “Safe learning in robotics: From learning-based control to safe reinforcement learning,”Annual Review of Control, Robotics, and Autonomous Systems, vol. 5, pp. 411–444, 2022

  2. [2]

    Siciliano, L

    B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo,Robotics: Mod- elling, Planning and Control. Springer Science & Business Media, 2009

  3. [3]

    Dynamical movement primitives: learning attractor models for motor behaviors,

    A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, “Dynamical movement primitives: learning attractor models for motor behaviors,”Neural Computation, vol. 25, no. 2, pp. 328–373, 2013

  4. [4]

    Dynamic movement primitives – a framework for motor control in humans and humanoid robotics,

    S. Schaal, “Dynamic movement primitives – a framework for motor control in humans and humanoid robotics,”Adaptive Motion of Ani- mals and Machines, pp. 261–280, 2006

  5. [5]

    Recent advances in robot learning from demonstration,

    H. Ravichandar, A. S. Polydoros, S. Chernova, and A. Billard, “Recent advances in robot learning from demonstration,”Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, pp. 297–330, 2020

  6. [6]

    Learning and generalization of motor skills by learning from demonstration,

    P. Pastor, H. Hoffmann, T. Asfour, and S. Schaal, “Learning and generalization of motor skills by learning from demonstration,” in2009 IEEE International Conference on Robotics and Automation. IEEE, 2009, pp. 763–768

  7. [7]

    Dynamic movement primitives in robotics: A tutorial survey,

    M. Saveriano, F. J. Abu-Dakka, A. Kramberger, and L. Peternel, “Dynamic movement primitives in robotics: A tutorial survey,”The International Journal of Robotics Research, vol. 42, no. 14, pp. 1133– 1184, 2023

  8. [8]

    Biologically- inspired dynamical systems for movement generation: Automatic real-time goal adaptation and obstacle avoidance,

    H. Hoffmann, P. Pastor, D.-H. Park, and S. Schaal, “Biologically- inspired dynamical systems for movement generation: Automatic real-time goal adaptation and obstacle avoidance,” in2009 IEEE International Conference on Robotics and Automation. IEEE, 2009, pp. 2587–2592

  9. [9]

    Real-time obstacle avoidance for manipulators and mobile robots,

    O. Khatib, “Real-time obstacle avoidance for manipulators and mobile robots,”The International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986

  10. [11]

    Control barrier functions: Theory and applications,

    A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” 2019 18th European control conference (ECC), pp. 3420–3431, 2019

  11. [12]

    Constructive safety using control barrier functions,

    P. Wieland and F. Allg ¨ower, “Constructive safety using control barrier functions,” inProceedings of the 7th IFAC Symposium on Nonlinear Control Systems, 2007, pp. 462–467

  12. [13]

    Learning for safety-critical control with control barrier functions,

    A. J. Taylor, A. Singletary, Y . Yue, and A. D. Ames, “Learning for safety-critical control with control barrier functions,” inLearning for Dynamics and Control. PMLR, 2020, pp. 708–717

  13. [14]

    Safe control with learned certificates: A survey of neural Lyapunov, barrier, and contraction methods for robotics and control,

    C. Dawson, S. Gao, and C. Fan, “Safe control with learned certificates: A survey of neural Lyapunov, barrier, and contraction methods for robotics and control,”IEEE Transactions on Robotics, 2023

  14. [15]

    Hamilton-jacobi reachability: A brief overview and recent advances,

    S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin, “Hamilton-jacobi reachability: A brief overview and recent advances,” in2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017, pp. 2242–2253

  15. [16]

    Bridging hamilton-jacobi safety analysis and reinforcement learning,

    J. F. Fisac, N. F. Lugovoy, V . Rubies-Royo, S. Ghosh, and C. J. Tomlin, “Bridging hamilton-jacobi safety analysis and reinforcement learning,” in2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 8550–8556

  16. [17]

    Fastrack: A modular framework for fast and guaranteed safe motion planning,

    S. L. Herbert, M. Chen, S. Han, S. Bansal, J. F. Fisac, and C. J. Tomlin, “Fastrack: A modular framework for fast and guaranteed safe motion planning,” in2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017, pp. 1517–1522

  17. [18]

    A general safety framework for learning-based control in uncertain robotic systems,

    J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, “A general safety framework for learning-based control in uncertain robotic systems,”IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2737–2752, 2018

  18. [19]

    Decomposition of reachable sets and tubes for a class of nonlinear systems,

    M. Chen, S. L. Herbert, M. S. Vashishtha, S. Bansal, and C. J. Tomlin, “Decomposition of reachable sets and tubes for a class of nonlinear systems,”IEEE Transactions on Automatic Control, vol. 63, no. 11, pp. 3675–3688, 2018

  19. [20]

    Robust control barrier–value functions for safety-critical control,

    J. J. Choi, D. Lee, K. Sreenath, C. J. Tomlin, and S. L. Herbert, “Robust control barrier–value functions for safety-critical control,” in 2021 60th IEEE Conference on Decision and Control (CDC). IEEE, 2021, pp. 6814–6821

  20. [21]

    V-OCBF: Learning safety filters from offline data via value-guided offline control barrier functions,

    M. Tayal, M. Tayal, A. Singh, S. Kolathaya, and R. Prakash, “V-OCBF: Learning safety filters from offline data via value-guided offline control barrier functions,”Transactions on Machine Learning Research, 2026. [Online]. Available: https://openreview.net/forum?id= PGO9mpIyyb

  21. [22]

    Offline reinforcement learning with implicit q-learning,

    I. Kostrikov, A. Nair, and S. Levine, “Offline reinforcement learning with implicit q-learning,” inInternational Conference on Learning Representations, 2022

  22. [23]

    Safe flow q-learning: Offline safe reinforcement learning with reachability-based flow policies,

    M. Tayal, M. Tayal, and R. Prakash, “Safe flow q-learning: Offline safe reinforcement learning with reachability-based flow policies,”arXiv preprint arXiv:2603.15136, 2026

  23. [24]

    Formal verification and control with conformal prediction: Practical safety guarantees for autonomous systems,

    L. Lindemann, Y . Zhao, X. Yu, G. J. Pappas, and J. V . Deshmukh, “Formal verification and control with conformal prediction: Practical safety guarantees for autonomous systems,”IEEE Control Systems, vol. 45, no. 6, pp. 72–122, 2025

  24. [25]

    A toolbox of level set methods,

    I. M. Mitchell, “A toolbox of level set methods,” inA Toolbox of Level Set Methods, 2005. [Online]. Available: https://api.semanticscholar. org/CorpusID:59892255

  25. [26]

    Learning from demonstration with model-based gaussian process,

    N. Jaquier, D. Ginsbourger, and S. Calinon, “Learning from demonstration with model-based gaussian process,”arXiv preprint arXiv:1910.05005, 2019

  26. [27]

    A probabilistic approach to robot trajectory generation,

    A. Paraschos, G. Neumann, and J. Peters, “A probabilistic approach to robot trajectory generation,” in2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids). IEEE, 2013, pp. 477–483

  27. [28]

    Kernelized Movement Primitives

    Y . Huang, L. D. Rozo, J. Silv ´erio, and D. G. Caldwell, “Kernelized movement primitives,”arXiv preprint arXiv:1708.08638, 2017

  28. [29]

    Learning stable nonlinear dynamical systems with gaussian mixture models,

    S. M. Khansari-Zadeh and A. Billard, “Learning stable nonlinear dynamical systems with gaussian mixture models,”IEEE Transactions on Robotics, vol. 27, no. 5, pp. 943–957, 2011

  29. [30]

    Learning complex motion plans using neural odes with safety and stability guarantees,

    F. Nawaz, T. Li, N. Matni, and N. Figueroa, “Learning complex motion plans using neural odes with safety and stability guarantees,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 17 216–17 222

  30. [31]

    Planning with diffusion for flexible behavior synthesis,

    M. Janner, Y . Du, J. Tenenbaum, and S. Levine, “Planning with diffusion for flexible behavior synthesis,” inInternational Conference on Machine Learning. PMLR, 2022, pp. 9902–9915

  31. [32]

    Control barrier function based quadratic programs with application to adaptive cruise control,

    A. D. Ames, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs with application to adaptive cruise control,” in53rd IEEE Conference on Decision and Control. IEEE, 2014, pp. 6271–6278

  32. [33]

    Control barrier function based quadratic programs with application to bipedal robotic walking,

    S.-C. Hsu, X. Xu, and A. D. Ames, “Control barrier function based quadratic programs with application to bipedal robotic walking,” in 2015 American Control Conference (ACC). IEEE, 2015, pp. 4542– 4548

  33. [34]

    Safety-critical control for dy- namical bipedal walking with precise footstep placement,

    Q. Nguyen and K. Sreenath, “Safety-critical control for dy- namical bipedal walking with precise footstep placement,”IFAC- PapersOnLine, vol. 48, no. 27, pp. 147–154, 2015

  34. [35]

    Onboard safety guarantees for racing drones: High- speed geofencing with control barrier functions,

    A. Singletary, K. Klingebiel, J. Bourne, A. Browning, P. Tokumaru, and A. Ames, “Onboard safety guarantees for racing drones: High- speed geofencing with control barrier functions,”IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2897–2904, 2022

  35. [36]

    Control barrier functions in dynamic uavs for kinematic obstacle avoidance: A col- lision cone approach,

    M. Tayal, R. Singh, J. Keshavan, and S. Kolathaya, “Control barrier functions in dynamic uavs for kinematic obstacle avoidance: A col- lision cone approach,” in2024 American Control Conference (ACC). IEEE, 2024, pp. 3722–3727

  36. [37]

    End-to-end imitation learning with safety guarantees using control barrier functions,

    R. K. Cosner, Y . Yue, and A. D. Ames, “End-to-end imitation learning with safety guarantees using control barrier functions,” in2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022, pp. 5316–5322

  37. [38]

    Safe-gil: Safety guided imitation learning for robotic systems,

    Y . U. Ciftci, D. Chiu, Z. Feng, G. S. Sukhatme, and S. Bansal, “Safe-gil: Safety guided imitation learning for robotic systems,”arXiv preprint arXiv:2404.05249, 2024

  38. [39]

    Safe nonlinear control using robust neural Lyapunov-barrier functions,

    C. Dawson, Z. Qin, S. Gao, and C. Fan, “Safe nonlinear control using robust neural Lyapunov-barrier functions,” inConference on Robot Learning. PMLR, 2022, pp. 1724–1735

  39. [40]

    Learning control barrier functions from expert demonstrations,

    A. Robey, H. Hu, L. Lindemann, H. Zhang, D. V . Dimarogonas, S. Tu, and N. Matni, “Learning control barrier functions from expert demonstrations,” in2020 59th IEEE Conference on Decision and Control (CDC), 2020, pp. 3717–3724

  40. [41]

    A physics- informed machine learning framework for safe and optimal control of autonomous systems,

    M. Tayal, A. Singh, S. Kolathaya, and S. Bansal, “A physics- informed machine learning framework for safe and optimal control of autonomous systems,” inForty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=SrfwiloGQF

  41. [42]

    Learning a formally verified control barrier function in stochastic environment,

    M. Tayal, H. Zhang, P. Jagtap, A. Clark, and S. Kolathaya, “Learning a formally verified control barrier function in stochastic environment,” in2024 IEEE 63rd Conference on Decision and Control (CDC), 2024, pp. 4098–4104

  42. [43]

    Exact verification of relu neural control barrier functions,

    H. Zhang, J. Wu, Y . V orobeychik, and A. Clark, “Exact verification of relu neural control barrier functions,” inAdvances in Neural Information Processing Systems, A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds., vol. 36. Curran Associates, Inc., 2023, pp. 5685–5705

  43. [44]

    Fossil: A software tool for the formal synthesis of Lyapunov functions and barrier certificates using neural networks,

    A. Abate, D. Ahmed, A. Edwards, M. Giacobbe, and A. Peruffo, “Fossil: A software tool for the formal synthesis of Lyapunov functions and barrier certificates using neural networks,” inProceedings of the 24th International Conference on Hybrid Systems: Computation and Control, 2021, pp. 1–11

  44. [45]

    Learning neural control barrier functions from offline data with conservatism,

    I. Tabbara and H. Sibai, “Learning neural control barrier functions from offline data with conservatism,” 2025. [Online]. Available: https://arxiv.org/abs/2505.00908

  45. [46]

    In-distribution barrier functions: Self-supervised policy filters that avoid out-of-distribution states,

    F. Casta ˜neda, H. Nishimura, R. T. McAllister, K. Sreenath, and A. Gaidon, “In-distribution barrier functions: Self-supervised policy filters that avoid out-of-distribution states,” inProceedings of The 5th Annual Learning for Dynamics and Control Conference, ser. Proceedings of Machine Learning Research, N. Matni, M. Morari, and G. J. Pappas, Eds., vol....

  46. [47]

    Avoidance of concave obsta- cles through rotation of nonlinear dynamics,

    L. Huber, J.-J. Slotine, and A. Billard, “Avoidance of concave obsta- cles through rotation of nonlinear dynamics,”IEEE Transactions on Robotics, vol. 40, pp. 1983–2002, 2023

  47. [48]

    Spatiotemporal tubes for temporal reach-avoid-stay tasks in unknown systems,

    R. Das, A. Basu, and P. Jagtap, “Spatiotemporal tubes for temporal reach-avoid-stay tasks in unknown systems,”IEEE Transactions on Automatic Control, pp. 1–8, 2025

  48. [49]

    Constrained model predictive control: Stability and optimality,

    D. Mayne, J. Rawlings, C. Rao, and P. Scokaert, “Constrained model predictive control: Stability and optimality,”Automatica, vol. 36, no. 6, pp. 789–814, 2000. [Online]. Available: https: //www.sciencedirect.com/science/article/pii/S0005109899002149

  49. [50]

    A general Hamilton-Jacobi framework for non-linear state-constrained control problems,

    A. Altarovici, O. Bokanowski, and H. Zidani, “A general Hamilton-Jacobi framework for non-linear state-constrained control problems,”ESAIM: Control, Optimisation and Calculus of Variations, vol. 19, no. 2, pp. 337–357, 2013. [Online]. Available: https: //www.numdam.org/articles/10.1051/cocv/2012011/

  50. [51]

    Cooptimizing safety and performance with a control-constrained formulation,

    H. Wang, A. Dhande, and S. Bansal, “Cooptimizing safety and performance with a control-constrained formulation,”IEEE Control Systems Letters, vol. 8, pp. 2739–2744, 2024

  51. [52]

    Epigraph-guided flow matching for safe and performant offline reinforcement learning,

    M. Tayal and M. Tayal, “Epigraph-guided flow matching for safe and performant offline reinforcement learning,”arXiv preprint arXiv:2602.08054, 2026

  52. [53]

    A tutorial on conformal prediction

    G. Shafer and V . V ovk, “A tutorial on conformal prediction.”Journal of Machine Learning Research, vol. 9, no. 3, 2008

  53. [54]

    Cp-ncbf: A con- formal prediction-based approach to synthesize verified neural control barrier functions,

    M. Tayal, A. Singh, P. Jagtap, and S. Kolathaya, “Cp-ncbf: A con- formal prediction-based approach to synthesize verified neural control barrier functions,”arXiv preprint arXiv:2503.17395, 2025

  54. [55]

    Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

    S. Levine, A. Kumar, G. Tucker, and J. Fu, “Offline reinforcement learning: Tutorial, review, and perspectives on open problems,”arXiv preprint arXiv:2005.01643, 2020

  55. [56]

    Conservative q-learning for offline reinforcement learning,

    A. Kumar, A. Zhou, G. Tucker, and S. Levine, “Conservative q-learning for offline reinforcement learning,” inAdvances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020, pp. 1179–1191. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/ 202...

  56. [57]

    Datasets and benchmarks for offline safe reinforcement learning,

    Z. Liu, Z. Guo, H. Lin, Y . Yao, J. Zhu, Z. Cen, H. Hu, W. Yu, T. Zhang, J. Tan, and D. Zhao, “Datasets and benchmarks for offline safe reinforcement learning,”Journal of Data-centric Machine Learning Research, 2024

  57. [58]

    Safety gymnasium: A unified safe reinforcement learning benchmark,

    J. Ji, B. Zhang, J. Zhou, X. Pan, W. Huang, R. Sun, Y . Geng, Y . Zhong, J. Dai, and Y . Yang, “Safety gymnasium: A unified safe reinforcement learning benchmark,” inThirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023. [Online]. Available: https://openreview.net/forum?id= WZmlxIuIGR

  58. [59]

    Control barrier function based quadratic programs for safety critical systems,

    A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,”IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2017

  59. [60]

    Offline reinforcement learning with implicit q-learning,

    I. Kostrikov, A. Nair, and S. Levine, “Offline reinforcement learning with implicit q-learning,”International Conference on Learning Rep- resentations, 2022

  60. [61]

    A gentle introduction to conformal prediction and distribution-free uncertainty quantification,

    A. N. Angelopoulos and S. Bates, “A gentle introduction to conformal prediction and distribution-free uncertainty quantification,”

  61. [62]
  62. [63]

    Safedmps: Integrating formal safety with dmps for adaptive hri,

    P. Tiwari, S. Nath, and R. Prakash, “Safedmps: Integrating formal safety with dmps for adaptive hri,”arXiv preprint arXiv:2603.29708, 2026

  63. [64]

    MADER: Trajectory planner in mul- tiagent and dynamic environments,

    J. Tordesillas and J. P. How, “MADER: Trajectory planner in mul- tiagent and dynamic environments,”IEEE Transactions on Robotics, vol. 38, no. 1, pp. 463–476, 2021

  64. [65]

    A survey on model-based reinforcement learning,

    F.-M. Luo, T. Xu, H. Lai, X.-H. Chen, W. Zhang, and Y . Yu, “A survey on model-based reinforcement learning,”Science China Information Sciences, vol. 67, no. 2, p. 121101, 2024

  65. [66]

    Constrained policy optimization,

    J. Achiam, D. Held, A. Tamar, and P. Abbeel, “Constrained policy optimization,” inProceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, p. 22–31