Policy Gradient-based Algorithms for Continuous-time Linear Quadratic Control
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 5
representative citing papers
citing papers explorer
- Bridging Continuous-time LQR and Reinforcement Learning via Gradient Flow of the Bellman Error
  A gradient flow on a continuous-time Bellman error parametrized by feedback gain converges to the optimal LQR controller and stays inside the stabilizing region.
- Towards Optimal Passive Feedback Control of LTI Systems under LQR Performance
  An indirect optimization method inner-approximates the set of passivating state-feedback gains for continuous-time LTI systems by a convex polytope and uses projected gradient flow to minimize the LQR cost inside that polytope.
- Learning Kalman Policy for Singular Unknown Covariances via Riemannian Regularization
  Riemannian regularization reshapes the policy optimization landscape to enable learning of Kalman gains from data under unknown and rank-deficient covariances with non-asymptotic convergence guarantees.
- Data-driven Linear Quadratic Integral Control: A Convex Formulation and Policy Gradient Approach
  A convex data-driven formulation yields the optimal LQI feedback gain for continuous-time systems directly from measured data without system matrices.
- Data-Driven Continuous-Time Linear Quadratic Regulator via Closed-Loop and Reinforcement Learning Parameterizations
  The authors adapt closed-loop and IRL parameterizations to continuous time, deriving policy iteration schemes, a data-driven CARE, convex reformulations, and a policy gradient flow while unifying the two approaches.