pith. machine review for the scientific record.

arxiv: 2605.10493 · v1 · submitted 2026-05-11 · 🧮 math.OC · cs.SY · eess.SY · stat.ML

Recognition: 2 theorem links · Lean Theorem

A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems

Pith reviewed 2026-05-12 05:12 UTC · model grok-4.3

classification 🧮 math.OC · cs.SY · eess.SY · stat.ML
keywords PAC-Bayes bounds · linear discrete-time systems · stochastic controllers · quadratic cost · high probability guarantees · data-dependent bounds · controller learning · unknown parameters

The pith

A PAC-Bayes bound gives high-probability performance guarantees for any stochastic controller learned on unknown linear discrete-time systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework to learn controllers for linear systems whose dynamics parameters are drawn from an unknown but fixed distribution. It supplies a bound on the expected quadratic cost of any such controller that depends only on observed data and holds with high probability. This bound applies even when the cost is unbounded. The authors also supply efficient algorithms that optimize the bound to find controllers, and these algorithms work for both finite and infinite sets of candidate controllers.

Core claim

We present a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller that holds for unbounded quadratic cost. We also propose novel efficient learning algorithms with theoretical guarantees that can be implemented for both finite and infinite controller spaces. In the special case where LQG is optimal, the learned controllers achieve comparable performance to LQG.
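
As a rough illustration of the quantity being bounded, the sketch below estimates an empirical quadratic cost by averaging rollouts over sampled systems. It is a minimal sketch and not the paper's setup: it assumes full state feedback u_t = K x_t, Gaussian process noise, and a finite horizon. The names `rollout_cost`, `empirical_cost`, and `sample_system` are hypothetical.

```python
import numpy as np


def rollout_cost(A, B, K, Q, R, x0, horizon, noise_std, rng):
    """Quadratic cost of one rollout of x_{t+1} = A x_t + B u_t + w_t with u_t = K x_t.

    Assumption for illustration: state feedback and i.i.d. Gaussian noise w_t.
    """
    x = np.array(x0, dtype=float)
    cost = 0.0
    for _ in range(horizon):
        u = K @ x
        # Stage cost x' Q x + u' R u.
        cost += float(x @ Q @ x + u @ R @ u)
        w = noise_std * rng.standard_normal(x.shape)
        x = A @ x + B @ u + w
    # Terminal state cost.
    return cost + float(x @ Q @ x)


def empirical_cost(K, sample_system, n_systems, Q, R, x0, horizon, noise_std, rng):
    """Average rollout cost over n_systems draws of (A, B) from sample_system(rng)."""
    costs = []
    for _ in range(n_systems):
        A, B = sample_system(rng)  # hypothetical sampler for the unknown distribution
        costs.append(rollout_cost(A, B, K, Q, R, x0, horizon, noise_std, rng))
    return float(np.mean(costs))
```

In a setting like the one described, `sample_system` would stand in for the observed training systems; the true parameter distribution is not assumed to be known.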

What carries the argument

A data-dependent PAC-Bayes generalization bound that upper-bounds the expected quadratic cost of a stochastic controller using its empirical cost on sampled systems plus a complexity penalty based on divergence from a prior.
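
For orientation, bounds of this kind often follow a generic template from the PAC-Bayes literature. The form below is that textbook template, not the paper's exact statement. The notation is assumed here: L is the expected cost, \hat{L}_S the empirical cost on the sample S, P a prior chosen independently of S, Q the posterior, \lambda > 0 a free parameter, and \delta the failure probability.

```latex
% With probability at least 1 - \delta over the sample S,
% simultaneously for all posteriors Q absolutely continuous w.r.t. P:
\mathbb{E}_{\theta \sim Q}\big[L(\theta)\big]
\;\le\;
\mathbb{E}_{\theta \sim Q}\big[\hat{L}_S(\theta)\big]
+ \frac{1}{\lambda}\Big[\mathrm{KL}(Q \,\|\, P) + \ln\frac{1}{\delta} + \Psi(\lambda)\Big],
\qquad
\Psi(\lambda) = \ln \mathbb{E}_{\theta \sim P}\,\mathbb{E}_{S}
\Big[\exp\!\big(\lambda\,(L(\theta) - \hat{L}_S(\theta))\big)\Big].
```

For unbounded costs the term \Psi(\lambda) is the delicate part: the template is informative only when this exponential moment is finite.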

If this is right

  • Any controller obtained by minimizing the bound receives a high-probability performance certificate without knowledge of the true parameter distribution.
  • The same bound and optimization procedure apply directly to both finite and infinite controller parameter spaces.
  • When the true optimum is linear-quadratic-Gaussian, the learned controllers reach performance levels comparable to the optimum in numerical experiments.
  • The bound remains valid for unbounded quadratic costs, removing a restriction present in earlier results.
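
For a finite controller space, minimizing an objective of the form "posterior-averaged empirical cost plus KL(Q‖P)/λ" has a well-known closed-form solution, the Gibbs posterior Q_i ∝ P_i · exp(−λ · cost_i). The sketch below illustrates that generic fact only; it is not the paper's algorithm. The names `gibbs_posterior` and `bound_objective`, the temperature `lam`, and the placeholder `moment_term` are assumptions made for illustration.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) for discrete distributions; requires p_i > 0 wherever q_i > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def gibbs_posterior(prior, emp_costs, lam):
    """Minimizer of E_Q[cost] + KL(Q || prior) / lam over distributions Q."""
    m = min(emp_costs)  # shift costs for numerical stability; cancels after normalizing
    weights = [p * math.exp(-lam * (c - m)) for p, c in zip(prior, emp_costs)]
    total = sum(weights)
    return [w / total for w in weights]


def bound_objective(post, prior, emp_costs, lam, delta, moment_term=0.0):
    """Generic PAC-Bayes style objective (illustrative, not the paper's bound).

    moment_term is a placeholder for whatever term controls the unbounded cost;
    it does not depend on the posterior, so it does not change the minimizer.
    """
    emp = sum(q * c for q, c in zip(post, emp_costs))
    penalty = (kl_divergence(post, prior) + math.log(1.0 / delta)) / lam
    return emp + penalty + moment_term
```

With a uniform prior over three candidate controllers and illustrative empirical costs such as `[3.0, 10.0, 5.0]`, the Gibbs posterior puts the most mass on the first controller, and its objective value is no larger than that of the prior itself.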

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could support certification of controllers for repeated deployments where each instance draws fresh parameters from the same distribution.
  • Analogous data-dependent bounds might be developed for nonlinear dynamics or for costs that penalize control effort differently.
  • Allowing the prior or the posterior to adapt when the parameter distribution drifts would remove a practical limitation of the current guarantees.

Load-bearing premise

System parameters are independently sampled each time from one fixed unknown distribution, and the controller is allowed to be stochastic.

What would settle it

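Repeat the following over many independent training sets drawn from one distribution: train a controller with the proposed method, estimate its expected cost from a large number of fresh independent samples from the same distribution, and record whether that estimate exceeds the bound. The claim is contradicted if the fraction of training sets with a violated bound is larger than the allowed failure probability. Because the bound concerns the expected cost, individual trajectories whose cost exceeds it are not counterexamples.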
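
One way to run such a check is to repeat the train-then-test cycle and count how often the certified value is violated. The sketch below shows the bookkeeping on a toy stand-in rather than on the paper's bound: a single fixed controller whose per-system cost is Gaussian with known standard deviation `sigma`, certified by the standard sub-Gaussian mean bound. The function name `violation_rate` and all parameter values are hypothetical.

```python
import math
import random


def violation_rate(n_trials, n_train, delta, mu=5.0, sigma=2.0, seed=0):
    """Fraction of independent training sets for which the certificate fails.

    Toy assumption: costs are N(mu, sigma^2), so the true expected cost mu is
    known and the bound  mean + sigma * sqrt(2 ln(1/delta) / n)  holds with
    probability at least 1 - delta over the training set.
    """
    rng = random.Random(seed)
    slack = sigma * math.sqrt(2.0 * math.log(1.0 / delta) / n_train)
    violations = 0
    for _ in range(n_trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n_train)]
        bound = sum(sample) / n_train + slack
        if mu > bound:  # the certificate is violated on this training set
            violations += 1
    return violations / n_trials
```

In a real experiment the true expected cost is not known and would be replaced by an estimate from many fresh samples; a violation rate clearly above `delta` would count as evidence against the claimed guarantee.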

Figures

Figures reproduced from arXiv: 2605.10493 by Jingge Zhu, Jonathan H. Manton, Ye Pu, Yujia Luo.

Figure 1
Figure 1: Comparison of PAC-Bayes upper bounds and expected cost, across varying training trajectories per controller, for a time-invariant linear discrete-time system with a finite controller space.
Example 2 (Controller evaluation). To further evaluate the controller learned by our PAC-Bayes approach, we consider a modified version of Example 1 in which the classical finite-horizon LQG controller is globally opt…
Figure 3
Figure 3: Comparison of expected costs under the prior P0 and the learned posterior Pθ, together with the PAC-Bayes bound, across varying iterations.
… cost of Pθ^(Iter) has dropped from its initial value of about 1290 (corresponding to the starting choice Pθ0 = P0) to around 7 and then remains at this low level in all subsequent iterations. This indicates that Algorithm 2 not only effectively learns a posterior Pθ…
read the original abstract

This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve comparable performance to LQG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a PAC-Bayes framework for unknown linear discrete-time systems whose parameters are drawn from a fixed but unknown distribution. It derives a data-dependent high-probability bound on the expected quadratic cost of any learned stochastic controller and proposes efficient algorithms with guarantees that apply to both finite and infinite controller spaces. The central novelty is that the bound is stated to hold for unbounded quadratic costs, with numerical results suggesting performance comparable to LQG when LQG is optimal.

Significance. If the bound is rigorously established, the work would meaningfully extend PAC-Bayes methods to control problems with unbounded losses, a setting that arises naturally with quadratic costs. The data-dependent character of the bound and the algorithms for infinite controller spaces are concrete strengths that could support safer learning-based control under parametric uncertainty.

major comments (2)
  1. [Main PAC-Bayes bound and its proof] The derivation of the PAC-Bayes bound for unbounded quadratic costs (main theorem and its proof) does not explicitly verify or impose conditions ensuring finite exponential moments of the loss. For linear systems the quadratic cost is finite almost surely only when the closed-loop matrix is stable for almost every parameter draw; if the unknown distribution over parameters places positive mass on unstable or marginally stable poles, the expectation can be infinite and the concentration inequality fails to apply. The manuscript must either restrict the prior/posterior to stabilizing controllers or state the required integrability conditions on the parameter distribution.
  2. [Learning algorithms for infinite spaces] The learning algorithms for infinite controller spaces (Section on algorithms and optimization) are presented as efficient with theoretical guarantees, yet it is unclear how the posterior optimization automatically excludes controllers that produce infinite expected cost. Without such a mechanism the data-dependent bound cannot be evaluated or optimized in practice when the parameter distribution has unstable support.
minor comments (2)
  1. [Abstract and Introduction] The abstract and introduction would benefit from a concise statement of the precise assumptions (e.g., stabilizability, moment conditions) under which the unbounded-cost claim holds.
  2. [Numerical experiments] Numerical results should report statistics over multiple independent trials (mean and standard deviation of the achieved cost) rather than single-run comparisons to LQG.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the detailed and insightful comments. Below we provide point-by-point responses to the major comments and indicate the revisions we will make to address them.

read point-by-point responses
  1. Referee: [Main PAC-Bayes bound and its proof] The derivation of the PAC-Bayes bound for unbounded quadratic costs (main theorem and its proof) does not explicitly verify or impose conditions ensuring finite exponential moments of the loss. For linear systems the quadratic cost is finite almost surely only when the closed-loop matrix is stable for almost every parameter draw; if the unknown distribution over parameters places positive mass on unstable or marginally stable poles, the expectation can be infinite and the concentration inequality fails to apply. The manuscript must either restrict the prior/posterior to stabilizing controllers or state the required integrability conditions on the parameter distribution.

    Authors: We thank the referee for pointing this out. The PAC-Bayes bound in the main theorem is derived under the assumption that the expected loss is finite, which requires the closed-loop system to be stable almost surely with respect to the parameter distribution. While the manuscript focuses on stabilizing controllers and the numerical examples use stable systems, we agree that this condition should be stated explicitly. In the revised manuscript, we will add a paragraph in the relevant section clarifying that the prior and posterior distributions are supported only on controllers that stabilize the system for almost all parameter realizations, ensuring the finite exponential moments required for the concentration inequality. This addresses the concern without restricting the generality of the framework, as unstable controllers would yield infinite cost anyway. revision: yes

  2. Referee: [Learning algorithms for infinite spaces] The learning algorithms for infinite controller spaces (Section on algorithms and optimization) are presented as efficient with theoretical guarantees, yet it is unclear how the posterior optimization automatically excludes controllers that produce infinite expected cost. Without such a mechanism the data-dependent bound cannot be evaluated or optimized in practice when the parameter distribution has unstable support.

    Authors: We appreciate this observation. The algorithms optimize the posterior over a parameterized family of controllers where stability is enforced through the choice of parameterization. However, to make this explicit and ensure the bound can be evaluated, in the revised version we will include a detailed description of how the optimization procedure restricts to stabilizing controllers, for example by using a reparameterization that guarantees closed-loop stability or by incorporating stability constraints in the optimization. revision: yes
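
A stability constraint of the kind discussed in this exchange can be checked numerically on sampled systems. The sketch below is one hedged possibility, not the authors' procedure: it assumes state feedback u = K x, treats the closed loop A + B K as stable when its spectral radius is below 1, and uses the hypothetical names `stabilizing_fraction` and `filter_candidates`.

```python
import numpy as np


def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))


def stabilizing_fraction(K, systems, margin=0.0):
    """Fraction of sampled (A, B) pairs for which A + B K has spectral radius < 1 - margin."""
    stable = [spectral_radius(A + B @ K) < 1.0 - margin for A, B in systems]
    return sum(stable) / len(stable)


def filter_candidates(gains, systems, margin=0.0):
    """Keep only the gains that stabilize every sampled system."""
    return [K for K in gains if stabilizing_fraction(K, systems, margin) == 1.0]
```

Passing this check on finitely many samples is only evidence of stability for the sampled systems; it does not by itself establish stability for almost every parameter draw, which is the condition the referee asks to be stated.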

Circularity Check

0 steps flagged

No circularity: PAC-Bayes bound derived from concentration inequalities without reduction to inputs or self-citations

full rationale

The derivation applies standard PAC-Bayes concentration to the expected quadratic cost under a fixed unknown parameter distribution, yielding a data-dependent high-probability bound that holds for unbounded losses via explicit moment or stability conditions stated in the paper. No equation reduces the bound to a fitted quantity by construction, no load-bearing step relies on a self-citation whose content is unverified, and the algorithms optimize the derived bound rather than presupposing its form. The central claim remains a non-tautological generalization guarantee independent of the specific controller parameterization chosen by the user.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard assumptions of linear dynamics and quadratic cost, plus the usual PAC-Bayes construction of a prior and a posterior over controllers; no new invented entities are introduced.

axioms (2)
  • domain assumption: System is linear discrete-time with parameters drawn i.i.d. from a fixed unknown distribution.
    Stated in the abstract as the setting for which the bound holds.
  • domain assumption: Quadratic cost may be unbounded.
    Explicitly contrasted with prior work that required bounded costs.

pith-pipeline@v0.9.0 · 5400 in / 1388 out tokens · 29992 ms · 2026-05-12T05:12:43.369358+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
