Featurized Occupation Measures for Structured Global Search in Numerical Optimal Control

Haoyang Tan; Hongyu Nie; Jianfeng Tao; Qi Wei

arxiv: 2603.16231 · v2 · submitted 2026-03-17 · 🧮 math.OC · cs.RO· cs.SY· eess.SY

Featurized Occupation Measures for Structured Global Search in Numerical Optimal Control

Qi Wei , Jianfeng Tao , Haoyang Tan , Hongyu Nie This is my paper

Pith reviewed 2026-05-15 10:28 UTC · model grok-4.3

classification 🧮 math.OC cs.ROcs.SYeess.SY

keywords occupation measuresoptimal controlHamilton-Jacobi-Bellmanprimal-dual methodsglobal optimizationfactor graphspassivity-based systemsnumerical methods

0 comments

The pith

Featurized occupation measures provide a primal-dual interface that lets HJB subsolutions guide scalable global search in optimal control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Featurized Occupation Measures to connect explicit Hamilton-Jacobi-Bellman subsolutions with numerical optimal control solvers. Certificates from the dual side direct the primal trajectory search, while primal residuals tighten those certificates in a shared finite-dimensional language. Two realizations, one using weak-form Liouville tests and one using rollout sampling, are shown to become asymptotically consistent with the exact occupation-measure linear program as the discretization is refined. For systems whose factor graphs arise from compatible passivity-based interconnections, the method assembles blockwise HJB inequalities into globally feasible dual certificates, moving the dimensionality burden from state space onto the interconnection topology.

Core claim

Featurized Occupation Measures form a finite-dimensional primal-dual interface that couples numerical optimal control solvers with explicit HJB subsolutions. The explicit realization employs finite weak-form Liouville tests while the implicit realization pairs rollout-based search with sampled primal-dual residuals. Both are asymptotically consistent with the exact occupation-measure linear program under refinement. For factor graphs induced by compatible passivity-based interconnections, blockwise HJB inequalities assemble into globally feasible OM-dual certificates whose decomposition is preserved under blockwise approximation.

What carries the argument

Featurized Occupation Measures, a finite-dimensional primal-dual interface that couples solvers with HJB subsolutions via features and weak-form tests or sampled residuals.

If this is right

Both realizations converge asymptotically to the exact occupation-measure linear program under refinement.
Blockwise HJB inequalities assemble into globally feasible OM-dual certificates for passivity-based factor graphs.
The blockwise decomposition is preserved under approximation, shifting the curse of dimensionality to interconnection topology.
Approximate certificates remain reusable under time shifts and bounded perturbations with explicit degradation bounds.
Certificates of increasing tightness guide sample-based optimizers toward global optima on static obstacle-avoidance benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could scale global search to high-dimensional modular systems by exploiting interconnection structure rather than state dimension.
Certificate reusability under perturbations suggests direct use in receding-horizon or adaptive control loops.
Similar featurization might extend to other decompositions beyond passivity-based interconnections.
One could test whether asymptotic consistency persists when blockwise assembly is only approximate for non-compatible graphs.

Load-bearing premise

The chosen features and weak-form tests or sampled residuals capture enough global information to guide search, and the passivity-based interconnection structure permits exact blockwise assembly of certificates.

What would settle it

A concrete falsifier is the observation that blockwise-assembled certificates on a passivity-interconnected system violate global dual feasibility conditions of the occupation-measure program, or that refined certificates fail to steer a sample-based optimizer to the known global optimum on the static obstacle-avoidance benchmark.

Figures

Figures reproduced from arXiv: 2603.16231 by Haoyang Tan, Hongyu Nie, Jianfeng Tao, Qi Wei.

read the original abstract

Numerical optimal control has long been split between globally structured but dimensionally intractable Hamilton--Jacobi--Bellman (HJB) methods and scalable but local trajectory optimization. We introduce Featurized Occupation Measures (FOM), a finite-dimensional primal--dual interface for coupling numerical optimal control solvers with explicit HJB subsolutions: the certificate guides the primal search, while primal residuals tighten the certificate in a primal-dual language. Two realizations are developed. The explicit realization uses finite weak-form Liouville tests, and the implicit realization couples rollout-based search with sampled primal--dual residuals. Both are proved asymptotically consistent with the exact occupation-measure linear program under refinement, separating primal expressiveness from dual accuracy in the limit. The framework also gives structural conditions under which HJB-type certificates avoid full state-space representation. For factor graphs induced by compatible passivity-based interconnections, blockwise HJB inequalities assemble into globally feasible OM-dual certificates, and the decomposition is preserved under blockwise approximation. The curse of dimensionality is then shifted from state space to interconnection topology. Approximate certificates remain reusable under time shifts and bounded model perturbations, with explicit degradation bounds. On a static obstacle-avoidance benchmark, certificates of increasing tightness guide a sample-based optimizer toward global optima, confirming that even a coarse certificate carries useful global information.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FOM gives a clean primal-dual way to let HJB subsolutions steer trajectory optimization, but the passivity decomposition for global certificates needs an explicit check on the benchmark.

read the letter

The paper's main contribution is a featurized occupation measure setup that creates a finite-dimensional primal-dual loop between HJB subsolutions and trajectory optimization. The explicit version uses weak-form Liouville tests and the implicit one uses rollout residuals; both are proved to converge to the exact occupation-measure LP as refinement increases. That separation of primal expressiveness from dual accuracy is a useful framing. The structural result on passivity-based factor graphs is the part that could matter for scaling: blockwise HJB inequalities assemble into globally feasible dual certificates, and the decomposition survives blockwise approximation. The obstacle-avoidance benchmark shows coarse certificates already guiding a sampler toward better solutions, and the reuse bounds under time shifts and perturbations are stated with explicit degradation rates. Those pieces are concrete and checkable. The soft spot is the load-bearing assumption that the benchmark satisfies the passivity compatibility needed for the blockwise assembly to remain globally feasible. Without a direct verification that the example meets the structural conditions and that approximation errors do not accumulate across blocks, the claim that the curse of dimensionality moves cleanly to topology stays conditional. If the compatibility holds, the argument goes through; if not, the dimensionality reduction benefit is narrower than stated. This is for researchers in numerical optimal control and robotics who already work with trajectory optimization and want a principled way to inject global information without full-state HJB. The machinery is specific enough and the claims are stated sharply enough that it deserves a serious referee.

Referee Report

2 major / 2 minor

Summary. The paper introduces Featurized Occupation Measures (FOM) as a finite-dimensional primal-dual interface coupling numerical optimal control solvers with HJB subsolutions. Two realizations are developed: explicit (finite weak-form Liouville tests) and implicit (rollout-based search with sampled residuals). Both are proved asymptotically consistent with the exact occupation-measure LP under refinement. Structural conditions are provided under which, for factor graphs from compatible passivity-based interconnections, blockwise HJB inequalities assemble into globally feasible OM-dual certificates, shifting the curse of dimensionality to topology. Approximate certificates are reusable under time shifts and perturbations with explicit bounds. The method is illustrated on a static obstacle-avoidance benchmark where increasing certificate tightness guides a sample-based optimizer to global optima.

Significance. If the consistency proofs and structural decomposition hold, this provides a substantive advance in bridging globally structured HJB methods with scalable local trajectory optimization. The ability to assemble certificates blockwise via passivity structure, together with reusability bounds, offers a concrete mechanism for dimensionality reduction without full state-space representation. The benchmark demonstration that even coarse certificates carry useful global information strengthens the practical case.

major comments (2)

[Abstract / structural conditions] Abstract / structural conditions paragraph: The claim that 'for factor graphs induced by compatible passivity-based interconnections, blockwise HJB inequalities assemble into globally feasible OM-dual certificates, and the decomposition is preserved under blockwise approximation' is load-bearing for the dimensionality-reduction argument. The obstacle-avoidance benchmark is presented as confirmation, yet no explicit verification is given that this example satisfies the passivity compatibility condition or that approximation errors do not accumulate across blocks. Without this check the global feasibility guarantee remains conditional.
[Consistency proofs section] Consistency claims (both realizations): The asymptotic consistency with the exact OM LP is stated as proved, separating primal expressiveness from dual accuracy. However, the manuscript must explicitly address whether blockwise approximation errors remain controlled when the passivity interconnection is only approximately satisfied, as this directly affects the claim that the curse shifts from state space to topology.

minor comments (2)

[Introduction / realizations] Clarify the precise definition of 'featurized' occupation measures and the choice of weak-form test functions in the explicit realization; the current description leaves open how the features are selected to capture global information without full state representation.
[Reusability subsection] The reusability bounds for approximate certificates under time shifts and model perturbations are stated with explicit degradation; include a short numerical illustration of the bound tightness on the benchmark to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. We address the major comments point by point below and will revise the manuscript accordingly to strengthen the presentation of the structural conditions and consistency results.

read point-by-point responses

Referee: [Abstract / structural conditions] The claim that blockwise HJB inequalities assemble into globally feasible OM-dual certificates for passivity-based interconnections is load-bearing, but the benchmark lacks explicit verification of the passivity compatibility condition and that approximation errors do not accumulate across blocks.

Authors: We agree that explicit verification is required to support the dimensionality-reduction claim. In the revised manuscript we will add a dedicated subsection verifying that the static obstacle-avoidance benchmark satisfies the compatible passivity-based interconnection assumptions. We will also supply a direct check (via the factor-graph structure and passivity indices) confirming that blockwise approximation errors remain controlled and do not accumulate to violate global feasibility, rendering the guarantee unconditional for the reported example. revision: yes
Referee: [Consistency proofs section] The manuscript must explicitly address whether blockwise approximation errors remain controlled when the passivity interconnection is only approximately satisfied.

Authors: The existing consistency proofs are stated under exact satisfaction of the passivity interconnection. We acknowledge the need for an explicit treatment of the approximate case. The revision will include a new paragraph in the consistency section that derives quantitative bounds on the growth of blockwise errors under small perturbations of the interconnection operators, together with the precise conditions under which the shift of the curse of dimensionality from state space to topology remains valid with controlled degradation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on stated separate proofs and domain assumptions

full rationale

The paper explicitly states that asymptotic consistency with the exact occupation-measure LP is proved separately for both realizations, and the blockwise HJB-to-OM certificate assembly is conditioned on passivity-based interconnection compatibility as a structural assumption rather than a fitted or self-defined quantity. No equation or claim reduces by construction to its own inputs, no parameter is fitted to a subset and renamed as prediction, and no load-bearing step collapses to a self-citation chain. The framework therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

Review based on abstract only; full derivations unavailable. The framework rests on standard occupation measure theory plus new featurization and decomposition assumptions.

axioms (2)

domain assumption Occupation measures admit finite weak-form Liouville tests that preserve asymptotic consistency under refinement.
Invoked for the explicit realization and consistency proof.
domain assumption Passivity-based interconnections induce factor graphs where blockwise HJB inequalities assemble into globally feasible dual certificates.
Central to the structural decomposition and dimensionality shift.

invented entities (1)

Featurized Occupation Measures (FOM) no independent evidence
purpose: Finite-dimensional primal-dual interface coupling HJB subsolutions with optimal control solvers.
Core new construct introduced to separate primal expressiveness from dual accuracy.

pith-pipeline@v0.9.0 · 5545 in / 1407 out tokens · 48808 ms · 2026-05-15T10:28:05.760332+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 2 internal anchors

[1]

J. T. Betts,Practical methods for optimal control and estimation using nonlinear programming. SIAM, 2010

work page 2010
[2]

A direct method for trajectory optimization of rigid bodies through contact,

M. Posa, C. Cantu, and R. Tedrake, “A direct method for trajectory optimization of rigid bodies through contact,”The International Journal of Robotics Research, vol. 33, no. 1, pp. 69–81, 2014

work page 2014
[3]

Footstep planning on uneven terrain with mixed-integer convex optimization,

R. Deits and R. Tedrake, “Footstep planning on uneven terrain with mixed-integer convex optimization,” in2014 IEEE-RAS international conference on humanoid robots. IEEE, 2014, pp. 279–286

work page 2014
[4]

Simultaneous trajectory optimization and contact selection for contact-rich manipula- tion with high-fidelity geometry,

M. Zhang, D. K. Jha, A. U. Raghunathan, and K. Hauser, “Simultaneous trajectory optimization and contact selection for contact-rich manipula- tion with high-fidelity geometry,”IEEE Transactions on Robotics, 2025

work page 2025
[5]

Bas ¸ar and G

T. Bas ¸ar and G. J. Olsder,Dynamic noncooperative game theory. SIAM, 1998

work page 1998
[6]

Blending data-driven priors in dynamic games,

J. Lidard, H. Hu, A. Hancock, Z. Zhang, A. G. Contreras, V . Modi, J. DeCastro, D. Gopinath, G. Rosman, N. E. Leonardet al., “Blending data-driven priors in dynamic games,”arXiv preprint arXiv:2402.14174, 2024

work page arXiv 2024
[7]

Multi-agent guided policy search for non-cooperative dynamic games.arXiv preprint arXiv:2509.24226,

J. Li, G. Qu, J. J. Choi, S. Sojoudi, and C. Tomlin, “Multi-agent guided policy search for non-cooperative dynamic games,”arXiv preprint arXiv:2509.24226, 2025

work page arXiv 2025
[8]

Bardi, I

M. Bardi, I. C. Dolcettaet al.,Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Springer, 1997, vol. 12

work page 1997
[9]

W. H. Fleming and H. M. Soner,Controlled Markov processes and viscosity solutions. Springer, 2006

work page 2006
[10]

Bertsekas,Dynamic programming and optimal control: Volume I

D. Bertsekas,Dynamic programming and optimal control: Volume I. Athena scientific, 2012, vol. 4

work page 2012
[11]

Dynamic programming,

R. Bellman, “Dynamic programming,”science, vol. 153, no. 3731, pp. 34–37, 1966

work page 1966
[12]

Suboptimal feedback control of pdes by solving hjb equations on adaptive sparse grids,

J. Garcke and A. Kr ¨oner, “Suboptimal feedback control of pdes by solving hjb equations on adaptive sparse grids,”Journal of Scientific Computing, vol. 70, no. 1, pp. 1–28, 2017

work page 2017
[13]

An adap- tive sparse grid semi-lagrangian scheme for first order hamilton-jacobi bellman equations,

O. Bokanowski, J. Garcke, M. Griebel, and I. Klompmaker, “An adap- tive sparse grid semi-lagrangian scheme for first order hamilton-jacobi bellman equations,”Journal of Scientific Computing, vol. 55, no. 3, pp. 575–605, 2013

work page 2013
[14]

Mitigating the curse of dimensionality: sparse grid characteristics method for optimal feedback control and hjb equations,

W. Kang and L. C. Wilcox, “Mitigating the curse of dimensionality: sparse grid characteristics method for optimal feedback control and hjb equations,”Computational Optimization and Applications, vol. 68, no. 2, pp. 289–315, 2017

work page 2017
[15]

Tensor decomposition meth- ods for high-dimensional hamilton–jacobi–bellman equations,

S. Dolgov, D. Kalise, and K. K. Kunisch, “Tensor decomposition meth- ods for high-dimensional hamilton–jacobi–bellman equations,”SIAM Journal on Scientific Computing, vol. 43, no. 3, pp. A1625–A1650, 2021

work page 2021
[16]

Linear hamilton jacobi bellman equations in high dimensions,

M. B. Horowitz, A. Damle, and J. W. Burdick, “Linear hamilton jacobi bellman equations in high dimensions,” in53rd IEEE Conference on Decision and Control. IEEE, 2014, pp. 5880–5887

work page 2014
[17]

Algorithms for overcoming the curse of dimensionality for certain hamilton–jacobi equations arising in control theory and elsewhere,

J. Darbon and S. Osher, “Algorithms for overcoming the curse of dimensionality for certain hamilton–jacobi equations arising in control theory and elsewhere,”Research in the Mathematical Sciences, vol. 3, no. 1, p. 19, 2016

work page 2016
[18]

Algorithm for over- coming the curse of dimensionality for state-dependent hamilton-jacobi equations,

Y . T. Chow, J. Darbon, S. Osher, and W. Yin, “Algorithm for over- coming the curse of dimensionality for state-dependent hamilton-jacobi equations,”Journal of Computational Physics, vol. 387, pp. 376–409, 2019

work page 2019
[19]

L. S. Pontryagin,The Mathematical Theory of Optimal Processes. John Wiley, 1963

work page 1963
[20]

A proof of a local maximum principle for optimal control problems with mixed state constraints,

P. Bosch and J. G ´omez, “A proof of a local maximum principle for optimal control problems with mixed state constraints,”Rev Invest Oper Braz, vol. 9, no. 3, pp. 239–262, 2000

work page 2000
[21]

A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems,

D. Mayne, “A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems,”International Journal of Control, vol. 3, no. 1, pp. 85–95, 1966

work page 1966
[22]

Differential dynamic programming,

D. H. Jacobson and D. Q. Mayne, “Differential dynamic programming,” Elsevier Press, 1970

work page 1970
[23]

Control-limited differential dynamic programming,

Y . Tassa, N. Mansard, and E. Todorov, “Control-limited differential dynamic programming,” in2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1168–1175

work page 2014
[24]

Iterative linear quadratic regulator design for nonlinear biological movement systems,

W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems,” inFirst International Con- ference on Informatics in Control, Automation and Robotics, vol. 2. SciTePress, 2004, pp. 222–229

work page 2004
[25]

Linear theory for control of nonlinear stochastic sys- tems,

H. J. Kappen, “Linear theory for control of nonlinear stochastic sys- tems,”Physical review letters, vol. 95, no. 20, p. 200201, 2005

work page 2005
[26]

A generalized path integral control approach to reinforcement learning,

E. Theodorou, J. Buchli, and S. Schaal, “A generalized path integral control approach to reinforcement learning,”The Journal of Machine Learning Research, vol. 11, pp. 3137–3181, 2010

work page 2010
[27]

Model predictive path integral control: From theory to parallel computation,

G. Williams, A. Aldrich, and E. A. Theodorou, “Model predictive path integral control: From theory to parallel computation,”Journal of Guidance, Control, and Dynamics, vol. 40, no. 2, pp. 344–357, 2017. 15

work page 2017
[28]

Information-theoretic model predictive control: Theory and applications to autonomous driving,

G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, “Information-theoretic model predictive control: Theory and applications to autonomous driving,”IEEE Transactions on Robotics, vol. 34, no. 6, pp. 1603–1622, 2018

work page 2018
[29]

Lao: A heuristic search algorithm that finds solutions with loops,

E. A. Hansen and S. Zilberstein, “Lao: A heuristic search algorithm that finds solutions with loops,”Artificial Intelligence, vol. 129, no. 1-2, pp. 35–62, 2001

work page 2001
[30]

Relaxing dynamic programming,

B. Lincoln and A. Rantzer, “Relaxing dynamic programming,”IEEE Transactions on Automatic Control, vol. 51, no. 8, pp. 1249–1260, 2006

work page 2006
[31]

Design of Admissible Heuristics for Kinodynamic Motion Planning via Sum-of-Squares Programming

B. Paden, V . Varriccho, and E. Frazzoli, “Design of admissible heuristics for kinodynamic motion planning via sum-of-squares programming,” arXiv preprint arXiv:1609.06277, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[32]

Convex duality and nonlinear optimal control,

R. Vinter, “Convex duality and nonlinear optimal control,”SIAM journal on control and optimization, vol. 31, no. 2, pp. 518–538, 1993

work page 1993
[33]

Warga,Optimal control of differential and functional equations

J. Warga,Optimal control of differential and functional equations. Academic press, 2014

work page 2014
[34]

Nonlinear optimal control via occupation measures and lmi-relaxations,

J. B. Lasserre, D. Henrion, C. Prieur, and E. Tr ´elat, “Nonlinear optimal control via occupation measures and lmi-relaxations,”SIAM journal on control and optimization, vol. 47, no. 4, pp. 1643–1666, 2008

work page 2008
[35]

The linear programming approach to approximate dynamic programming,

D. P. De Farias and B. Van Roy, “The linear programming approach to approximate dynamic programming,”Operations research, vol. 51, no. 6, pp. 850–865, 2003

work page 2003
[36]

Information relaxations and duality in stochastic dynamic programs: A review and tutorial,

D. B. Brown and J. E. Smith, “Information relaxations and duality in stochastic dynamic programs: A review and tutorial,”Foundations and Trends in Optimization, vol. 5, no. 3, pp. 246–339, 2022

work page 2022
[37]

On infinite linear programming and the moment approach to deterministic infinite horizon discounted optimal control problems,

A. Kamoutsi, T. Sutter, P. M. Esfahani, and J. Lygeros, “On infinite linear programming and the moment approach to deterministic infinite horizon discounted optimal control problems,”IEEE control systems letters, vol. 1, no. 1, pp. 134–139, 2017

work page 2017
[38]

Positivity certificates in op- timal control,

E. Pauwels, D. Henrion, and J.-B. Lasserre, “Positivity certificates in op- timal control,” inGeometric and Numerical Foundations of Movements. Springer, 2017, pp. 113–131

work page 2017
[39]

Optimistic monte carlo tree search with sampled information relaxation dual bounds,

D. R. Jiang, L. Al-Kanj, and W. B. Powell, “Optimistic monte carlo tree search with sampled information relaxation dual bounds,”Operations Research, vol. 68, no. 6, pp. 1678–1697, 2020

work page 2020
[40]

Minimal-time nonlinear control via semi- infinite programming,

A. Oustry and M. Tacchi, “Minimal-time nonlinear control via semi- infinite programming,”arXiv preprint arXiv:2307.00857, 2023

work page arXiv 2023
[41]

Holtorf, A

F. Holtorf, A. Edelman, and C. Rackauckas, “Stochastic optimal control via local occupation measures,”arXiv preprint arXiv:2211.15652, 2022

work page arXiv 2022
[42]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Prox- imal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[43]

Deterministic policy gradient algorithms,

D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” inInternational conference on machine learning. Pmlr, 2014, pp. 387–395

work page 2014
[44]

Path integral policy improvement: An information-geometric optimization approach,

P. Varnai and D. V . Dimarogonas, “Path integral policy improvement: An information-geometric optimization approach,” 2020

work page 2020
[45]

Q-guided stein variational model predictive control via rl-informed policy prior,

S. Cai, Z. Yin, J. Jacob, and F. Ramos, “Q-guided stein variational model predictive control via rl-informed policy prior,” 2026. [Online]. Available: https://arxiv.org/abs/2507.06625

work page arXiv 2026
[46]

Holtorf,Bounds and low-rank approximation for controlled Markov processes

F. Holtorf,Bounds and low-rank approximation for controlled Markov processes. Massachusetts Institute of Technology, 2024

work page 2024
[47]

Sepulchre, M

R. Sepulchre, M. Jankovi ´c, and P. V . Kokotovi´c,Constructive nonlinear control. Springer Science & Business Media, 1997

work page 1997
[48]

Constructive nonlinear control: a historical perspective,

P. Kokotovi ´c and M. Arcak, “Constructive nonlinear control: a historical perspective,”Automatica, vol. 37, no. 5, pp. 637–662, 2001. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S0005109801000024

work page 2001
[49]

Examples for separable control lyapunov functions and their neural network approximation,

L. Gr ¨une and M. Sperl, “Examples for separable control lyapunov functions and their neural network approximation,”IFAC-PapersOnLine, vol. 56, no. 1, pp. 19–24, 2023, 12th IFAC Symposium on Nonlinear Control Systems NOLCOS 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2405896323001921

work page 2023
[50]

Separable approxima- tions of optimal value functions under a decaying sensitivity assump- tion,

M. Sperl, L. Saluzzi, L. Gr ¨une, and D. Kalise, “Separable approxima- tions of optimal value functions under a decaying sensitivity assump- tion,” in2023 62nd IEEE Conference on Decision and Control (CDC), 2023, pp. 259–264

work page 2023
[51]

Compositionality of optimal control laws,

E. Todorov, “Compositionality of optimal control laws,”Advances in neural information processing systems, vol. 22, 2009

work page 2009

[1] [1]

J. T. Betts,Practical methods for optimal control and estimation using nonlinear programming. SIAM, 2010

work page 2010

[2] [2]

A direct method for trajectory optimization of rigid bodies through contact,

M. Posa, C. Cantu, and R. Tedrake, “A direct method for trajectory optimization of rigid bodies through contact,”The International Journal of Robotics Research, vol. 33, no. 1, pp. 69–81, 2014

work page 2014

[3] [3]

Footstep planning on uneven terrain with mixed-integer convex optimization,

R. Deits and R. Tedrake, “Footstep planning on uneven terrain with mixed-integer convex optimization,” in2014 IEEE-RAS international conference on humanoid robots. IEEE, 2014, pp. 279–286

work page 2014

[4] [4]

Simultaneous trajectory optimization and contact selection for contact-rich manipula- tion with high-fidelity geometry,

M. Zhang, D. K. Jha, A. U. Raghunathan, and K. Hauser, “Simultaneous trajectory optimization and contact selection for contact-rich manipula- tion with high-fidelity geometry,”IEEE Transactions on Robotics, 2025

work page 2025

[5] [5]

Bas ¸ar and G

T. Bas ¸ar and G. J. Olsder,Dynamic noncooperative game theory. SIAM, 1998

work page 1998

[6] [6]

Blending data-driven priors in dynamic games,

J. Lidard, H. Hu, A. Hancock, Z. Zhang, A. G. Contreras, V . Modi, J. DeCastro, D. Gopinath, G. Rosman, N. E. Leonardet al., “Blending data-driven priors in dynamic games,”arXiv preprint arXiv:2402.14174, 2024

work page arXiv 2024

[7] [7]

Multi-agent guided policy search for non-cooperative dynamic games.arXiv preprint arXiv:2509.24226,

J. Li, G. Qu, J. J. Choi, S. Sojoudi, and C. Tomlin, “Multi-agent guided policy search for non-cooperative dynamic games,”arXiv preprint arXiv:2509.24226, 2025

work page arXiv 2025

[8] [8]

Bardi, I

M. Bardi, I. C. Dolcettaet al.,Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Springer, 1997, vol. 12

work page 1997

[9] [9]

W. H. Fleming and H. M. Soner,Controlled Markov processes and viscosity solutions. Springer, 2006

work page 2006

[10] [10]

Bertsekas,Dynamic programming and optimal control: Volume I

D. Bertsekas,Dynamic programming and optimal control: Volume I. Athena scientific, 2012, vol. 4

work page 2012

[11] [11]

Dynamic programming,

R. Bellman, “Dynamic programming,”science, vol. 153, no. 3731, pp. 34–37, 1966

work page 1966

[12] [12]

Suboptimal feedback control of pdes by solving hjb equations on adaptive sparse grids,

J. Garcke and A. Kr ¨oner, “Suboptimal feedback control of pdes by solving hjb equations on adaptive sparse grids,”Journal of Scientific Computing, vol. 70, no. 1, pp. 1–28, 2017

work page 2017

[13] [13]

An adap- tive sparse grid semi-lagrangian scheme for first order hamilton-jacobi bellman equations,

O. Bokanowski, J. Garcke, M. Griebel, and I. Klompmaker, “An adap- tive sparse grid semi-lagrangian scheme for first order hamilton-jacobi bellman equations,”Journal of Scientific Computing, vol. 55, no. 3, pp. 575–605, 2013

work page 2013

[14] [14]

Mitigating the curse of dimensionality: sparse grid characteristics method for optimal feedback control and hjb equations,

W. Kang and L. C. Wilcox, “Mitigating the curse of dimensionality: sparse grid characteristics method for optimal feedback control and hjb equations,”Computational Optimization and Applications, vol. 68, no. 2, pp. 289–315, 2017

work page 2017

[15] [15]

Tensor decomposition meth- ods for high-dimensional hamilton–jacobi–bellman equations,

S. Dolgov, D. Kalise, and K. K. Kunisch, “Tensor decomposition meth- ods for high-dimensional hamilton–jacobi–bellman equations,”SIAM Journal on Scientific Computing, vol. 43, no. 3, pp. A1625–A1650, 2021

work page 2021

[16] [16]

Linear hamilton jacobi bellman equations in high dimensions,

M. B. Horowitz, A. Damle, and J. W. Burdick, “Linear hamilton jacobi bellman equations in high dimensions,” in53rd IEEE Conference on Decision and Control. IEEE, 2014, pp. 5880–5887

work page 2014

[17] [17]

Algorithms for overcoming the curse of dimensionality for certain hamilton–jacobi equations arising in control theory and elsewhere,

J. Darbon and S. Osher, “Algorithms for overcoming the curse of dimensionality for certain hamilton–jacobi equations arising in control theory and elsewhere,”Research in the Mathematical Sciences, vol. 3, no. 1, p. 19, 2016

work page 2016

[18] [18]

Algorithm for over- coming the curse of dimensionality for state-dependent hamilton-jacobi equations,

Y . T. Chow, J. Darbon, S. Osher, and W. Yin, “Algorithm for over- coming the curse of dimensionality for state-dependent hamilton-jacobi equations,”Journal of Computational Physics, vol. 387, pp. 376–409, 2019

work page 2019

[19] [19]

L. S. Pontryagin,The Mathematical Theory of Optimal Processes. John Wiley, 1963

work page 1963

[20] [20]

A proof of a local maximum principle for optimal control problems with mixed state constraints,

P. Bosch and J. G ´omez, “A proof of a local maximum principle for optimal control problems with mixed state constraints,”Rev Invest Oper Braz, vol. 9, no. 3, pp. 239–262, 2000

work page 2000

[21] [21]

A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems,

D. Mayne, “A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems,”International Journal of Control, vol. 3, no. 1, pp. 85–95, 1966

work page 1966

[22] [22]

Differential dynamic programming,

D. H. Jacobson and D. Q. Mayne, “Differential dynamic programming,” Elsevier Press, 1970

work page 1970

[23] [23]

Control-limited differential dynamic programming,

Y . Tassa, N. Mansard, and E. Todorov, “Control-limited differential dynamic programming,” in2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1168–1175

work page 2014

[24] [24]

Iterative linear quadratic regulator design for nonlinear biological movement systems,

W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems,” inFirst International Con- ference on Informatics in Control, Automation and Robotics, vol. 2. SciTePress, 2004, pp. 222–229

work page 2004

[25] [25]

Linear theory for control of nonlinear stochastic sys- tems,

H. J. Kappen, “Linear theory for control of nonlinear stochastic sys- tems,”Physical review letters, vol. 95, no. 20, p. 200201, 2005

work page 2005

[26] [26]

A generalized path integral control approach to reinforcement learning,

E. Theodorou, J. Buchli, and S. Schaal, “A generalized path integral control approach to reinforcement learning,”The Journal of Machine Learning Research, vol. 11, pp. 3137–3181, 2010

work page 2010

[27] [27]

Model predictive path integral control: From theory to parallel computation,

G. Williams, A. Aldrich, and E. A. Theodorou, “Model predictive path integral control: From theory to parallel computation,”Journal of Guidance, Control, and Dynamics, vol. 40, no. 2, pp. 344–357, 2017. 15

work page 2017

[28] [28]

Information-theoretic model predictive control: Theory and applications to autonomous driving,

G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, “Information-theoretic model predictive control: Theory and applications to autonomous driving,”IEEE Transactions on Robotics, vol. 34, no. 6, pp. 1603–1622, 2018

work page 2018

[29] [29]

Lao: A heuristic search algorithm that finds solutions with loops,

E. A. Hansen and S. Zilberstein, “Lao: A heuristic search algorithm that finds solutions with loops,”Artificial Intelligence, vol. 129, no. 1-2, pp. 35–62, 2001

work page 2001

[30] [30]

Relaxing dynamic programming,

B. Lincoln and A. Rantzer, “Relaxing dynamic programming,”IEEE Transactions on Automatic Control, vol. 51, no. 8, pp. 1249–1260, 2006

work page 2006

[31] [31]

Design of Admissible Heuristics for Kinodynamic Motion Planning via Sum-of-Squares Programming

B. Paden, V . Varriccho, and E. Frazzoli, “Design of admissible heuristics for kinodynamic motion planning via sum-of-squares programming,” arXiv preprint arXiv:1609.06277, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[32] [32]

Convex duality and nonlinear optimal control,

R. Vinter, “Convex duality and nonlinear optimal control,”SIAM journal on control and optimization, vol. 31, no. 2, pp. 518–538, 1993

work page 1993

[33] [33]

Warga,Optimal control of differential and functional equations

J. Warga,Optimal control of differential and functional equations. Academic press, 2014

work page 2014

[34] [34]

Nonlinear optimal control via occupation measures and lmi-relaxations,

J. B. Lasserre, D. Henrion, C. Prieur, and E. Tr ´elat, “Nonlinear optimal control via occupation measures and lmi-relaxations,”SIAM journal on control and optimization, vol. 47, no. 4, pp. 1643–1666, 2008

work page 2008

[35] [35]

The linear programming approach to approximate dynamic programming,

D. P. De Farias and B. Van Roy, “The linear programming approach to approximate dynamic programming,”Operations research, vol. 51, no. 6, pp. 850–865, 2003

work page 2003

[36] [36]

Information relaxations and duality in stochastic dynamic programs: A review and tutorial,

D. B. Brown and J. E. Smith, “Information relaxations and duality in stochastic dynamic programs: A review and tutorial,”Foundations and Trends in Optimization, vol. 5, no. 3, pp. 246–339, 2022

work page 2022

[37] [37]

On infinite linear programming and the moment approach to deterministic infinite horizon discounted optimal control problems,

A. Kamoutsi, T. Sutter, P. M. Esfahani, and J. Lygeros, “On infinite linear programming and the moment approach to deterministic infinite horizon discounted optimal control problems,”IEEE control systems letters, vol. 1, no. 1, pp. 134–139, 2017

work page 2017

[38] [38]

Positivity certificates in op- timal control,

E. Pauwels, D. Henrion, and J.-B. Lasserre, “Positivity certificates in op- timal control,” inGeometric and Numerical Foundations of Movements. Springer, 2017, pp. 113–131

work page 2017

[39] [39]

Optimistic monte carlo tree search with sampled information relaxation dual bounds,

D. R. Jiang, L. Al-Kanj, and W. B. Powell, “Optimistic monte carlo tree search with sampled information relaxation dual bounds,”Operations Research, vol. 68, no. 6, pp. 1678–1697, 2020

work page 2020

[40] [40]

Minimal-time nonlinear control via semi- infinite programming,

A. Oustry and M. Tacchi, “Minimal-time nonlinear control via semi- infinite programming,”arXiv preprint arXiv:2307.00857, 2023

work page arXiv 2023

[41] [41]

Holtorf, A

F. Holtorf, A. Edelman, and C. Rackauckas, “Stochastic optimal control via local occupation measures,”arXiv preprint arXiv:2211.15652, 2022

work page arXiv 2022

[42] [42]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Prox- imal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[43] [43]

Deterministic policy gradient algorithms,

D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” inInternational conference on machine learning. Pmlr, 2014, pp. 387–395

work page 2014

[44] [44]

Path integral policy improvement: An information-geometric optimization approach,

P. Varnai and D. V . Dimarogonas, “Path integral policy improvement: An information-geometric optimization approach,” 2020

work page 2020

[45] [45]

Q-guided stein variational model predictive control via rl-informed policy prior,

S. Cai, Z. Yin, J. Jacob, and F. Ramos, “Q-guided stein variational model predictive control via rl-informed policy prior,” 2026. [Online]. Available: https://arxiv.org/abs/2507.06625

work page arXiv 2026

[46] [46]

Holtorf,Bounds and low-rank approximation for controlled Markov processes

F. Holtorf,Bounds and low-rank approximation for controlled Markov processes. Massachusetts Institute of Technology, 2024

work page 2024

[47] [47]

Sepulchre, M

R. Sepulchre, M. Jankovi ´c, and P. V . Kokotovi´c,Constructive nonlinear control. Springer Science & Business Media, 1997

work page 1997

[48] [48]

Constructive nonlinear control: a historical perspective,

P. Kokotovi ´c and M. Arcak, “Constructive nonlinear control: a historical perspective,”Automatica, vol. 37, no. 5, pp. 637–662, 2001. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S0005109801000024

work page 2001

[49] [49]

Examples for separable control lyapunov functions and their neural network approximation,

L. Gr ¨une and M. Sperl, “Examples for separable control lyapunov functions and their neural network approximation,”IFAC-PapersOnLine, vol. 56, no. 1, pp. 19–24, 2023, 12th IFAC Symposium on Nonlinear Control Systems NOLCOS 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2405896323001921

work page 2023

[50] [50]

Separable approxima- tions of optimal value functions under a decaying sensitivity assump- tion,

M. Sperl, L. Saluzzi, L. Gr ¨une, and D. Kalise, “Separable approxima- tions of optimal value functions under a decaying sensitivity assump- tion,” in2023 62nd IEEE Conference on Decision and Control (CDC), 2023, pp. 259–264

work page 2023

[51] [51]

Compositionality of optimal control laws,

E. Todorov, “Compositionality of optimal control laws,”Advances in neural information processing systems, vol. 22, 2009

work page 2009