Bellman Value Decomposition for Task Logic in Safe Optimal Control

William Sharpless , Oswin So , Dylan Hirsch , Sylvia Herbert , Chuchu Fan

Authors on Pith no claims yet

classification 💻 cs.RO cs.SYeess.SY

keywords bellmanvaluecomplexdecomposedgraphlogicoptimalperformance

read the original abstract

Real-world tasks involve nuanced combinations of goal and safety specifications. In high dimensions, the challenge is exacerbated: formal automata become cumbersome, and the combination of sparse rewards tends to require laborious tuning. In this work, we consider the innate structure of the Bellman Value as a means to naturally organize the problem for improved automatic performance. Namely, we prove the Bellman Value for a complex task defined in temporal logic can be decomposed into a graph of Bellman Values, connected by a set of well-known Bellman equations (BEs): the Reach-Avoid BE, the Avoid BE, and a novel type, the Reach-Avoid-Loop BE. To solve the Value and optimal policy, we propose VDPPO, which embeds the decomposed Value graph into a two-layer neural net, bootstrapping the implicit dependencies. We conduct a variety of simulated and hardware experiments to test our method on complex, high-dimensional tasks involving heterogeneous teams and nonlinear dynamics. Ultimately, we find this approach greatly improves performance over existing baselines, balancing safety and liveness automatically.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Value Functions for Temporal Logic: Optimal Policies and Safety Filters
cs.RO 2026-05 unverdicted novelty 6.0

Non-Markovian policies from decomposed temporal logic value functions are proven optimal for nested Until, Globally, and Globally-Until specifications and extend Q-function safety filters to complex tasks.