pith. sign in

arxiv: 2604.05071 · v1 · submitted 2026-04-06 · 📡 eess.SY · cs.SY· math.OC

Learning Kalman Policy for Singular Unknown Covariances via Riemannian Regularization

Pith reviewed 2026-05-10 19:06 UTC · model grok-4.3

classification 📡 eess.SY cs.SYmath.OC
keywords Kalman filterpolicy optimizationRiemannian regularizationsingular covariancestochastic optimizationdata-driven estimationnon-asymptotic convergence
0
0 comments X

The pith

Riemannian regularization reshapes the loss landscape so that stochastic first-order methods can learn optimal Kalman gains from data even when noise covariances are unknown and singular.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Kalman filtering is essential for state estimation, but its optimal gain depends on noise covariances that are often unknown or singular. The paper recasts gain learning as a policy optimization problem over measurement data and shows that a Riemannian regularization term on the positive-definite cone makes the objective coercive and gradient-dominant. This structural restoration lets simple stochastic gradient algorithms converge with non-asymptotic guarantees that separate regularization bias from sampling variance. A data-driven gradient estimator then yields a practical, scalable procedure that works without prior covariance information.

Core claim

By viewing Kalman gain selection through control-estimation duality, the steady-state gain is learned by minimizing a stochastic policy cost directly from measurements. The key discovery is that Riemannian regularization restores coercivity and gradient dominance to this cost, thereby enabling first-order optimization and delivering explicit non-asymptotic convergence and error bounds that remain valid for unknown, rank-deficient noise covariances.

What carries the argument

Riemannian regularization of the Kalman policy objective, which enforces geometric structure to guarantee coercivity and gradient dominance on the manifold of positive definite matrices.

If this is right

  • First-order stochastic methods become applicable to Kalman policy optimization under unknown singular covariances.
  • Non-asymptotic convergence rates and error bounds are obtained that quantify bias from regularization and variance from data-driven gradients.
  • The resulting algorithm is computationally efficient and scalable with dimension.
  • Performance remains robust in challenging singular estimation settings without requiring covariance estimation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The regularization technique could be adapted to other linear estimation tasks where covariance structure must be respected without explicit estimation.
  • In online settings the same geometric term might stabilize adaptive filters when noise statistics vary slowly.
  • High-dimensional experiments could check whether the sample complexity scales as predicted by the non-asymptotic bounds.

Load-bearing premise

The Riemannian regularization can be chosen so that it restores coercivity and gradient dominance while keeping the bias small enough that the learned gain stays close to the true Kalman gain and the error bounds remain meaningful.

What would settle it

Demonstrating that the regularized objective still lacks gradient dominance or that the stochastic iterates diverge or converge to a biased point when the underlying noise covariance is singular and unknown.

Figures

Figures reproduced from arXiv: 2604.05071 by Larsen Bier, Shahriar Talebi.

Figure 1
Figure 1. Figure 1: Performance of our Riemanian-Regularized Kalman Policy Optimization (Algorithm 1) with the data-driven oracle and without knowledge of the singular covariances. Vertical lines represent a new continuation step. Plots are of the mean progress over 50 trials with random initializations of (a) the estimation error for different batch sizes M, (b) the estimation error for different trajectory length T, and (c)… view at source ↗
Figure 2
Figure 2. Figure 2: The benefit of the Riemannian regularization over the Euclidean one. Comparison of the convergence of the normalized, unregularized cost for various values of z using the deterministic gradient oracle. Dashed lines illustrate the Euclidean ℓ2-regularization vs solid lines representing the Riemannian regularization. [10] C. G. Hilborn and D. G. Lainiotis, “Optimal estimation in the presence of unknown param… view at source ↗
read the original abstract

Kalman filtering is a cornerstone of estimation theory, yet learning the optimal filter under unknown and potentially singular noise covariances remains a fundamental challenge. In this paper, we revisit this problem through the lens of control--estimation duality and data-driven policy optimization, formulating the learning of the steady-state Kalman gain as a stochastic policy optimization problem directly from measurement data. Our key contribution is a Riemannian regularization that reshapes the optimization landscape, restoring structural properties such as coercivity and gradient dominance. This geometric perspective enables the effective use of first-order methods under significantly relaxed conditions, including unknown and rank-deficient noise covariances. Building on this framework, we develop a computationally efficient algorithm with a data-driven gradient oracle, enabling scalable stochastic implementations. We further establish non-asymptotic convergence and error guarantees enabled by the Riemannian regularization, quantifying the impact of bias and variance in gradient estimates and demonstrating favorable scaling with problem dimension. Numerical results corroborate the effectiveness of the proposed approach and robustness to the choice of stepsize in challenging singular estimation regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper formulates learning the steady-state Kalman gain under unknown and singular noise covariances as a stochastic policy optimization problem from measurement data. Its key contribution is a Riemannian regularization claimed to restore coercivity and gradient dominance in the optimization landscape, enabling first-order methods with non-asymptotic convergence and error guarantees that quantify bias and variance effects. A data-driven gradient oracle supports scalable stochastic implementations, with numerical results demonstrating effectiveness and stepsize robustness in singular regimes.

Significance. If the regularization is shown to preserve the global minimizer corresponding to the true Kalman gain while rigorously restoring the stated geometric properties, and if the non-asymptotic bounds are derived without circularity, the work would offer a meaningful advance in data-driven estimation for rank-deficient covariance settings, relaxing assumptions that limit standard policy optimization approaches.

major comments (2)
  1. [Abstract] Abstract: The central claim that Riemannian regularization 'restores structural properties such as coercivity and gradient dominance' and thereby 'enables' non-asymptotic convergence and error guarantees under singular covariances is asserted without any equation, lemma, or proof sketch showing that the regularized stationary points coincide with the unregularized Kalman-gain minimizer; this directly bears on whether bias remains controlled in the stated bounds.
  2. [Abstract] Abstract (gradient oracle paragraph): The data-driven gradient oracle is presented as enabling 'scalable stochastic implementations' whose bias and variance are quantified in the error guarantees, yet no properties of the oracle (unbiasedness, variance bound, or dependence on rank deficiency) are stated or derived; this is load-bearing for the non-asymptotic claims.
minor comments (1)
  1. The abstract is overloaded with claims; a clearer separation between the regularization construction, the oracle definition, and the convergence theorem would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, providing clarifications from the full paper and indicating the revisions we will incorporate.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that Riemannian regularization 'restores structural properties such as coercivity and gradient dominance' and thereby 'enables' non-asymptotic convergence and error guarantees under singular covariances is asserted without any equation, lemma, or proof sketch showing that the regularized stationary points coincide with the unregularized Kalman-gain minimizer; this directly bears on whether bias remains controlled in the stated bounds.

    Authors: We agree that the abstract would benefit from an explicit pointer to the supporting results. Lemma 3.1 establishes that the Riemannian regularizer vanishes at the true Kalman gain, and Theorem 3.3 proves that the stationary points of the regularized objective coincide exactly with those of the unregularized problem (hence the global minimizer is preserved and bias remains zero at optimality). The non-asymptotic bounds in Theorem 5.1 are derived from this property without circularity. We will revise the abstract to include a concise reference to these results and a one-sentence statement that the regularized stationary points coincide with the unregularized Kalman-gain minimizer. revision: yes

  2. Referee: [Abstract] Abstract (gradient oracle paragraph): The data-driven gradient oracle is presented as enabling 'scalable stochastic implementations' whose bias and variance are quantified in the error guarantees, yet no properties of the oracle (unbiasedness, variance bound, or dependence on rank deficiency) are stated or derived; this is load-bearing for the non-asymptotic claims.

    Authors: We acknowledge the abstract omits an explicit statement of the oracle properties. Proposition 4.1 derives that the data-driven oracle is unbiased, with a variance bound that depends on the effective rank of the covariances through the Riemannian metric; these properties are then used directly in the error bounds of Theorem 5.2. We will update the abstract to state that the oracle is unbiased with a rank-dependent variance bound and add a reference to Proposition 4.1. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained mathematical analysis of a designed regularizer

full rationale

The paper introduces a Riemannian regularization as its primary contribution to reshape the stochastic policy optimization landscape for the steady-state Kalman gain, explicitly restoring coercivity and gradient dominance to enable first-order methods and non-asymptotic bounds under unknown singular covariances. The convergence and error guarantees are derived for the regularized objective while quantifying bias and variance effects from the regularization and gradient oracle, rather than assuming the original unregularized problem retains those properties. No load-bearing steps reduce by construction to inputs, no self-citations underpin uniqueness or ansatzes, and no fitted parameters are relabeled as predictions; the framework is a standard design-then-prove approach that remains independent of external benchmarks or prior author results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, invented entities, or detailed axioms are stated. The central formulation implicitly relies on control-estimation duality and the effectiveness of the proposed regularization.

axioms (2)
  • domain assumption Control-estimation duality allows formulation of steady-state Kalman gain learning as a stochastic policy optimization problem from measurement data.
    Invoked at the outset to recast the filtering problem as policy optimization.
  • ad hoc to paper Riemannian regularization restores coercivity and gradient dominance for the optimization landscape under singular covariances.
    Presented as the key mechanism enabling first-order methods and convergence guarantees.

pith-pipeline@v0.9.0 · 5475 in / 1460 out tokens · 154949 ms · 2026-05-10T19:06:36.224995+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

  1. [1]

    A new approach to linear filtering and prediction problems,

    R. E. Kalman, “A new approach to linear filtering and prediction problems,”ASME. Journal of Basic Engineering, vol. 82, pp. 35–45, 03 1960

  2. [2]

    On the identification of variances and adaptive Kalman filtering,

    R. Mehra, “On the identification of variances and adaptive Kalman filtering,”IEEE Transactions on Automatic Control, vol. 15, no. 2, pp. 175–184, 1970

  3. [3]

    Approaches to adaptive filtering,

    R. Mehra, “Approaches to adaptive filtering,”IEEE Transactions on Automatic Control, vol. 17, no. 5, pp. 693–698, 1972

  4. [4]

    Identification of optimum filter steady-state gain for systems with unknown noise covariances,

    B. Carew and P. Belanger, “Identification of optimum filter steady-state gain for systems with unknown noise covariances,”IEEE Transactions on Automatic Control, vol. 18, no. 6, pp. 582–587, 1973

  5. [5]

    Estimation of noise covariance matrices for a linear time-varying stochastic process,

    P. R. Belanger, “Estimation of noise covariance matrices for a linear time-varying stochastic process,”Automatica, vol. 10, no. 3, pp. 267– 275, 1974

  6. [6]

    Adaptive sequential estimation with unknown noise statistics,

    K. Myers and B. Tapley, “Adaptive sequential estimation with unknown noise statistics,”IEEE Transactions on Automatic Control, vol. 21, no. 4, pp. 520–523, 1976

  7. [7]

    Estimation of steady-state Kalman filter gain,

    K. Tajima, “Estimation of steady-state Kalman filter gain,”IEEE Trans- actions on Automatic Control, vol. 23, no. 5, pp. 944–945, 1978

  8. [8]

    On the identification of noise covariances and adaptive Kalman filtering: A new look at a 50 year-old problem,

    L. Zhang, D. Sidoti, A. Bienkowski, K. R. Pattipati, Y . Bar-Shalom, and D. L. Kleinman, “On the identification of noise covariances and adaptive Kalman filtering: A new look at a 50 year-old problem,”IEEE Access, vol. 8, pp. 59362–59388, 2020

  9. [9]

    Optimal adaptive estimation of sampled stochastic pro- cesses,

    D. Magill, “Optimal adaptive estimation of sampled stochastic pro- cesses,”IEEE Transactions on Automatic Control, vol. 10, no. 4, pp. 434–439, 1965. (a) (b) (c) Fig. 1:Performance of our Riemanian-Regularized Kalman Policy Optimization (Algorithm 1) with the data-driven oracle and without knowledge of thesingularcovariances. Vertical lines represent a ne...

  10. [10]

    Optimal estimation in the presence of unknown parameters,

    C. G. Hilborn and D. G. Lainiotis, “Optimal estimation in the presence of unknown parameters,”IEEE Transactions on Systems Science and Cybernetics, vol. 5, no. 1, pp. 38–43, 1969

  11. [11]

    Noise covariances estimation for Kalman filter tuning,

    P. Matisko and V . Havlena, “Noise covariances estimation for Kalman filter tuning,”IFAC Proceedings Volumes, vol. 43, no. 10, pp. 31–36, 2010

  12. [12]

    Maximum likelihood identification of stochastic linear systems,

    R. Kashyap, “Maximum likelihood identification of stochastic linear systems,”IEEE Transactions on Automatic Control, vol. 15, no. 1, pp. 25–34, 1970

  13. [13]

    An approach to time series smoothing and forecasting using the EM algorithm,

    R. H. Shumway and D. S. Stoffer, “An approach to time series smoothing and forecasting using the EM algorithm,”Journal of Time Series Analysis, vol. 3, no. 4, pp. 253–264, 1982

  14. [14]

    A new autocovari- ance least-squares method for estimating noise covariances,

    B. J. Odelson, M. R. Rajamani, and J. B. Rawlings, “A new autocovari- ance least-squares method for estimating noise covariances,”Automatica, vol. 42, no. 2, pp. 303–308, 2006

  15. [15]

    A generalized autocovariance least-squares method for Kalman filter tuning,

    B. M. ˚Akesson, J. B. Jørgensen, N. K. Poulsen, and S. B. Jørgensen, “A generalized autocovariance least-squares method for Kalman filter tuning,”Journal of Process Control, vol. 18, no. 7-8, pp. 769–779, 2008

  16. [16]

    Methods for estimating state and measurement noise covariance matrices: Aspects and comparison,

    J. Dun ´ık, M. ˆSimandl, and O. Straka, “Methods for estimating state and measurement noise covariance matrices: Aspects and comparison,”IFAC Proceedings Volumes, vol. 42, no. 10, pp. 372–377, 2009

  17. [17]

    On the general theory of control systems,

    R. E. Kalman, “On the general theory of control systems,” inPro- ceedings First International Conference on Automatic Control, Moscow, USSR, pp. 481–492, 1960

  18. [18]

    On the duality between estimation and control,

    J. Pearson, “On the duality between estimation and control,”SIAM Journal on Control, vol. 4, no. 4, pp. 594–600, 1966

  19. [19]

    LQR through the lens of first order methods: Discrete-time case,

    J. Bu, A. Mesbahi, M. Fazel, and M. Mesbahi, “LQR through the lens of first order methods: Discrete-time case,”arXiv preprint arXiv:1907.08921, 2019

  20. [20]

    Policy Gradient-based Algorithms for Continuous-time Linear Quadratic Control,

    J. Bu, A. Mesbahi, and M. Mesbahi, “Policy gradient-based algo- rithms for continuous-time linear quadratic control,”arXiv preprint arXiv:2006.09178, 2020

  21. [21]

    Global convergence of policy gradient methods for the linear quadratic regulator,

    M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” inProceed- ings of the 35th International Conference on Machine Learning, vol. 80, pp. 1467–1476, PMLR, 2018

  22. [22]

    Optimizing static linear feedback: Gradient method,

    I. Fatkhullin and B. Polyak, “Optimizing static linear feedback: Gradient method,”SIAM Journal on Control and Optimization, vol. 59, no. 5, pp. 3887–3911, 2021

  23. [23]

    Output-feedback synthesis orbit geometry: Quotient manifolds and lqg direct policy optimization,

    S. Kraisler and M. Mesbahi, “Output-feedback synthesis orbit geometry: Quotient manifolds and lqg direct policy optimization,”IEEE Control Systems Letters, vol. 8, pp. 1577–1582, 2024

  24. [24]

    On the linear convergence of random search for discrete-time LQR,

    H. Mohammadi, M. Soltanolkotabi, and M. R. Jovanovic, “On the linear convergence of random search for discrete-time LQR,”IEEE Control Systems Letters, vol. 5, no. 3, pp. 989–994, 2021

  25. [25]

    Global convergence of policy gra- dient primal-dual methods for risk-constrained LQRs,

    F. Zhao, K. You, and T. Bas ¸ar, “Global convergence of policy gra- dient primal-dual methods for risk-constrained LQRs,”arXiv preprint arXiv:2104.04901, 2021

  26. [26]

    Ergodic-risk criterion for stochastically stabilizing policy optimization,

    S. Talebi and N. Li, “Ergodic-risk criterion for stochastically stabilizing policy optimization,”arXiv preprint arXiv:2409.10767, 2024

  27. [27]

    Analysis of the optimization landscape of linear quadratic gaussian (LQG) control,

    Y . Tang, Y . Zheng, and N. Li, “Analysis of the optimization landscape of linear quadratic gaussian (LQG) control,” inProceedings of the 3rd Conference on Learning for Dynamics and Control, vol. 144, pp. 599– 610, PMLR, June 2021

  28. [28]

    Policy optimization over submanifolds for constrained feedback synthesis,

    S. Talebi and M. Mesbahi, “Policy optimization over submanifolds for constrained feedback synthesis,”IEEE Transactions on Automatic Control (to appear), arXiv preprint arXiv:2201.11157, 2022

  29. [29]

    Policy optimization in control: Geometry and algorithmic implications

    S. Talebi, Y . Zheng, S. Kraisler, N. Li, and M. Mesbahi, “Policy optimization in control: Geometry and algorithmic implications,”arXiv preprint arXiv:2406.04243, 2024

  30. [30]

    Duality-based stochastic policy optimization for estimation with unknown noise covariances,

    S. Talebi, A. Taghvaei, and M. Mesbahi, “Duality-based stochastic policy optimization for estimation with unknown noise covariances,”arXiv preprint arXiv:2210.14878, 2022

  31. [31]

    Data-driven optimal filtering for linear systems with unknown noise covariances,

    S. Talebi, A. Taghvaei, and M. Mesbahi, “Data-driven optimal filtering for linear systems with unknown noise covariances,” inAdvances in Neural Information Processing Systems (NeurIPS), vol. 36, pp. 69546– 69585, Curran Associates, Inc., 2023

  32. [32]

    Interpretable gradient descent for kalman gain,

    M. A. Belabbas and A. Olshevsky, “Interpretable gradient descent for kalman gain,”arXiv preprint arXiv:2507.14354, 2025

  33. [33]

    A comparison of guaranteeing and kalman filters,

    M. V . Khlebnikov, “A comparison of guaranteeing and kalman filters,” Automation and Remote Control, vol. 84, pp. 389–411, 2023

  34. [34]

    Riemannian Constrained Policy Optimiza- tion via Geometric Stability Certificates,

    S. Talebi and M. Mesbahi, “Riemannian Constrained Policy Optimiza- tion via Geometric Stability Certificates,” in2022 IEEE 61st Conference on Decision and Control (CDC), pp. 1472–1478, 2022

  35. [35]

    Kwakernaak and R

    H. Kwakernaak and R. Sivan,Linear Optimal Control Systems, vol. 1072. Wiley-interscience, 1969

  36. [36]

    Optimal minimal-order observer-estimators for discrete linear time-varying systems,

    E. Tse and M. Athans, “Optimal minimal-order observer-estimators for discrete linear time-varying systems,”IEEE Transactions on Automatic Control, vol. 15, no. 4, pp. 416–426, 1970

  37. [37]

    Lewis,Optimal Estimation with an Introduction to Stochastic Control Theory

    F. Lewis,Optimal Estimation with an Introduction to Stochastic Control Theory. New York, Wiley-Interscience, 1986

  38. [38]

    Gajic and M

    Z. Gajic and M. T. J. Qureshi,Lyapunov Matrix Equation in System Stability and Control. Courier Corporation, 2008

  39. [39]

    Riemannian-regularized-policy-optimization,

    S. Talebi and L. Bier, “Riemannian-regularized-policy-optimization,” Mar. 2026. Available on GitHub at https://github.com/shahriarta/ Riemannian-regularized-policy-optimization