Optimal control as a graphical model inference problem

B. Kappen; M. Opper; V. Gomez

arxiv: 0901.0633 · v3 · pith:JR7P6XF2new · submitted 2009-01-06 · 🧮 math.OC · cs.SY · eess.SY


keywords: control, inference, optimal, approximate, computation, applied, problem, approach


We reformulate a class of non-linear stochastic optimal control problems introduced by Todorov (2007) as a Kullback-Leibler (KL) minimization problem. As a result, the optimal control computation reduces to an inference computation and approximate inference methods can be applied to efficiently compute approximate optimal controls. We show how this KL control theory contains the path integral control method as a special case. We provide an example of a block stacking task and a multi-agent cooperative game where we demonstrate how approximate inference can be successfully applied to instances that are too complex for exact computation. We discuss the relation of the KL control approach to other inference approaches to control.
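To make the reduction from control to inference concrete, the following is a minimal sketch of finite-horizon KL control on a small discrete chain. It is not the authors' code. The 3-state transition matrix `q`, the state cost `R`, the horizon `T`, and the function names `kl_control` and `expected_cost` are all assumptions chosen for illustration. The sketch assumes the cost of a controlled process `p` is its KL divergence from the uncontrolled dynamics `q` plus the expected sum of state costs, in which case the optimal `p` is `q` reweighted by `exp(-R)` and normalized, and the normalizer can be computed with a backward message pass.

```python
import numpy as np

def kl_control(q, R, T):
    """Backward message pass for finite-horizon KL control (illustrative).

    q: (n, n) uncontrolled transitions, q[x, y] = q(y | x), all entries > 0
    R: (n,) state cost, paid on arrival in a state
    T: number of time steps
    Returns the per-step optimal transition matrices and the messages beta.
    """
    n = q.shape[0]
    w = np.exp(-R)                      # cost turned into a likelihood-like weight
    beta = np.ones((T + 1, n))          # beta[T] = 1: nothing left to pay
    for t in range(T - 1, -1, -1):
        beta[t] = q @ (w * beta[t + 1])
    policy = []
    for t in range(T):
        p = q * (w * beta[t + 1])[None, :]   # reweight q by future desirability
        policy.append(p / beta[t][:, None])  # normalize each row
    return policy, beta

def expected_cost(policy, q, R, x0):
    """KL(p || q) over paths plus expected summed state cost, from state x0."""
    marg = np.zeros(q.shape[0])
    marg[x0] = 1.0
    total = 0.0
    for p in policy:
        kl_rows = np.sum(p * np.log(p / q), axis=1)
        total += marg @ kl_rows         # control cost at this step
        marg = marg @ p                 # propagate state marginal
        total += marg @ R               # state cost on arrival
    return total

# Hypothetical toy problem: state 2 is cheap, state 0 is expensive.
q = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
R = np.array([2.0, 1.0, 0.0])
T = 5

policy, beta = kl_control(q, R, T)
cost_opt = -np.log(beta[0][0])                      # optimal cost from state 0
cost_free = expected_cost([q] * T, q, R, x0=0)      # cost of applying no control
```

Under these assumptions, `-log beta[0][x0]` is the optimal cost-to-go, so evaluating `expected_cost(policy, q, R, 0)` should reproduce `cost_opt`, and `cost_opt` should not exceed `cost_free`. The exact recursion here scales with the number of states; the abstract's point is that for larger problems the same quantity can be approximated with standard approximate inference methods.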


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Expected Free Energy-based Planning as Variational Inference
cs.AI 2026-06 unverdicted novelty 7.0

EFE-based planning is formulated as variational free energy minimization with epistemic priors, decomposing into expected plan costs plus a complexity term.

What Type of Inference is Active Inference?
cs.AI 2026-06 unverdicted novelty 7.0

EFE-based active inference planning is characterized as VFE on an augmented model plus entropy and planning corrections, with a derived message-passing implementation and grid-world validation.

Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing
cs.LG 2026-05 unverdicted novelty 6.0

Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.