arXiv: 2605.10493 · v1 · submitted 2026-05-11 · 🧮 math.OC · cs.SY · eess.SY · stat.ML

A PAC-Bayes Approach for Controlling Unknown Linear Discrete-time Systems

Yujia Luo, Ye Pu, Jonathan H. Manton, Jingge Zhu

Pith reviewed 2026-05-12 05:12 UTC · model grok-4.3

classification 🧮 math.OC · cs.SY · eess.SY · stat.ML

keywords PAC-Bayes bounds · linear discrete-time systems · stochastic controllers · quadratic cost · high probability guarantees · data-dependent bounds · controller learning · unknown parameters

The pith

A PAC-Bayes bound gives high-probability performance guarantees for any stochastic controller learned on unknown linear discrete-time systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework to learn controllers for linear systems whose dynamics parameters are drawn from an unknown but fixed distribution. It supplies a bound on the expected quadratic cost of any such controller that depends only on observed data and holds with high probability. This bound applies even when the cost is unbounded. The authors also supply efficient algorithms that optimize the bound to find controllers, and these algorithms work for both finite and infinite sets of candidate controllers.

Core claim

We present a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller that holds for unbounded quadratic cost. We also propose novel efficient learning algorithms with theoretical guarantees that can be implemented for both finite and infinite controller spaces. In the special case where LQG is optimal, the learned controllers achieve comparable performance to LQG.

What carries the argument

A data-dependent PAC-Bayes generalization bound that upper-bounds the expected quadratic cost of a stochastic controller using its empirical cost on sampled systems plus a complexity penalty based on divergence from a prior.
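
For orientation, the general shape of such a bound can be written out. What follows is the classical Catoni-style PAC-Bayes bound for a loss bounded in [0, C], not the paper's theorem: the paper targets unbounded quadratic costs, and its exact terms and constants are not reproduced on this page. The notation is assumed for this sketch only: π is the prior, ρ the posterior, R the expected cost, R̂_n the empirical cost on n i.i.d. sampled systems, λ > 0 a parameter fixed before seeing data, and δ the failure probability.

```latex
% With probability at least 1 - \delta over the n i.i.d. samples,
% simultaneously for all posteriors \rho:
\mathbb{E}_{\theta\sim\rho}\big[R(\theta)\big]
\;\le\;
\mathbb{E}_{\theta\sim\rho}\big[\hat{R}_n(\theta)\big]
+ \frac{\lambda C^{2}}{8n}
+ \frac{\mathrm{KL}(\rho\,\|\,\pi) + \log(1/\delta)}{\lambda}
```

The first term is the empirical cost and the last term is the complexity penalty; the middle term is where boundedness enters, which is the part an unbounded-cost result has to replace.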

If this is right

Any controller obtained by minimizing the bound receives a high-probability performance certificate without knowledge of the true parameter distribution.
The same bound and optimization procedure apply directly to both finite and infinite controller parameter spaces.
When the true optimum is linear-quadratic-Gaussian, the learned controllers reach performance levels comparable to the optimum in numerical experiments.
The bound remains valid for unbounded quadratic costs, removing a restriction present in earlier results.
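
As a rough illustration of what "minimizing the bound" can look like over a finite controller set, the sketch below minimizes a generic objective of the form empirical cost plus KL(ρ‖π)/λ. For that objective the minimizer is the Gibbs posterior ρ(k) ∝ π(k)·exp(−λ·cost_k). This is a standard PAC-Bayes fact and is not claimed to be the paper's algorithm; the cost values, `lam`, and all function names are hypothetical.

```python
import numpy as np

def gibbs_posterior(emp_costs, prior, lam):
    """Posterior minimising E_rho[emp_cost] + KL(rho || prior) / lam on a finite set."""
    emp_costs = np.asarray(emp_costs, dtype=float)
    prior = np.asarray(prior, dtype=float)
    # Shift costs by their minimum before exponentiating, for numerical stability.
    logits = np.log(prior) - lam * (emp_costs - emp_costs.min())
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

def kl_divergence(rho, prior):
    """KL(rho || prior) for discrete distributions; 0 * log 0 is treated as 0."""
    rho = np.asarray(rho, dtype=float)
    prior = np.asarray(prior, dtype=float)
    mask = rho > 0
    return float(np.sum(rho[mask] * np.log(rho[mask] / prior[mask])))

def objective(rho, emp_costs, prior, lam):
    """Empirical cost plus KL penalty: the data-dependent part of a generic bound."""
    return float(np.dot(rho, emp_costs)) + kl_divergence(rho, prior) / lam

# Hypothetical empirical costs of four candidate controllers (not from the paper).
emp_costs = np.array([12.0, 7.5, 7.0, 30.0])
prior = np.full(4, 0.25)
lam = 2.0

rho = gibbs_posterior(emp_costs, prior, lam)
print("posterior:", np.round(rho, 4))
print("objective at posterior:", objective(rho, emp_costs, prior, lam))
print("objective at prior:    ", objective(prior, emp_costs, prior, lam))
```

The posterior concentrates on low-cost controllers while the KL term keeps it from collapsing onto a single one when λ is small.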

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could support certification of controllers for repeated deployments where each instance draws fresh parameters from the same distribution.
Analogous data-dependent bounds might be developed for nonlinear dynamics or for costs that penalize control effort differently.
Allowing the prior or the posterior to adapt when the parameter distribution drifts would remove a practical limitation of the current guarantees.

Load-bearing premise

System parameters are independently sampled each time from one fixed unknown distribution, and the controller is allowed to be stochastic.
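
The premise can be made concrete with a small simulation. The sketch below assumes a scalar system x_{t+1} = a·x_t + b·u_t + w_t with state feedback u_t = −k·x_t, a fresh (a, b) drawn independently for every rollout, and a stochastic controller that draws its gain k from a distribution over candidate gains. The parameter distribution, gains, weights, and horizon are all illustrative choices, not values from the paper.

```python
import numpy as np

def rollout_cost(a, b, k, horizon, rng, q=1.0, r=0.1, noise_std=0.1, x0=1.0):
    """Finite-horizon quadratic cost of u_t = -k * x_t on x_{t+1} = a x_t + b u_t + w_t."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += q * x**2 + r * u**2
        x = a * x + b * u + noise_std * rng.standard_normal()
    return cost + q * x**2  # terminal state cost

def sample_system(rng):
    """Hypothetical parameter distribution; the learner never sees this function."""
    a = rng.normal(1.1, 0.05)
    b = rng.normal(1.0, 0.05)
    return a, b

def stochastic_controller_cost(gains, posterior, n_systems, horizon, rng):
    """Average cost when each rollout draws a fresh system and a fresh gain."""
    costs = []
    for _ in range(n_systems):
        a, b = sample_system(rng)          # i.i.d. draw of the system parameters
        k = rng.choice(gains, p=posterior)  # the controller itself is random
        costs.append(rollout_cost(a, b, k, horizon, rng))
    return float(np.mean(costs))

rng = np.random.default_rng(0)
gains = np.array([0.0, 0.5, 1.0])
avg_uniform = stochastic_controller_cost(gains, np.array([1/3, 1/3, 1/3]), 200, 20, rng)
avg_peaked = stochastic_controller_cost(gains, np.array([0.0, 0.0, 1.0]), 200, 20, rng)
print("uniform over gains:", avg_uniform)
print("all mass on k=1.0: ", avg_peaked)
```

Putting weight on the zero gain leaves the open-loop unstable plant uncontrolled, so the uniform controller pays a much larger average cost than the one concentrated on k = 1.0.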

What would settle it

Train a controller using the proposed method on samples from one distribution, then estimate its expected cost from many fresh independent samples drawn from the same distribution. Repeat over many independent training sets and check whether the fraction of repetitions in which the expected cost exceeds the bound is larger than the claimed failure probability.
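
A check of this kind can be sketched as a Monte Carlo experiment. The code below is a stand-in rather than the paper's procedure: because the paper's bound is not reproduced here, it clips the cost at a level `C` so that the classical bounded-loss PAC-Bayes bound applies, and it counts how often the (Monte Carlo estimated) expected cost of the learned posterior exceeds that bound across repeated training runs. The system, the gains, `C`, and every other constant are assumptions of this sketch.

```python
import numpy as np

C = 50.0                      # clipping level (assumption; the paper avoids clipping)
HORIZON = 10
GAINS = np.array([0.0, 0.5, 1.0])

def clipped_costs(a):
    """Noise-free cost over t = 0..HORIZON for each gain (x_0 = 1, b = 1), clipped to [0, C]."""
    closed_loop = a[:, None] - GAINS[None, :]
    t = np.arange(HORIZON + 1)
    state_energy = (closed_loop[..., None] ** (2 * t)).sum(axis=-1)  # sum of x_t^2
    return np.minimum((1.0 + 0.1 * GAINS**2) * state_energy, C)

def sample_a(n, rng):
    return rng.uniform(0.8, 1.2, size=n)  # hypothetical parameter distribution

def gibbs_and_bound(emp, prior, n, lam, delta):
    """Gibbs posterior and the bounded-loss PAC-Bayes bound on its expected clipped cost."""
    w = prior * np.exp(-lam * (emp - emp.min()))
    rho = w / w.sum()
    pos = rho > 0
    kl = np.sum(rho[pos] * np.log(rho[pos] / prior[pos]))
    bound = rho @ emp + lam * C**2 / (8 * n) + (kl + np.log(1 / delta)) / lam
    return rho, bound

rng = np.random.default_rng(1)
n, delta, trials = 50, 0.1, 200
lam = np.sqrt(8 * n * np.log(1 / delta)) / C   # chosen before seeing any data
prior = np.full(len(GAINS), 1 / 3)
true_cost = clipped_costs(sample_a(100_000, rng)).mean(axis=0)  # Monte Carlo proxy

violations = 0
for _ in range(trials):
    emp = clipped_costs(sample_a(n, rng)).mean(axis=0)   # fresh training set
    rho, bound = gibbs_and_bound(emp, prior, n, lam, delta)
    violations += int((rho @ true_cost) > bound)
violation_rate = violations / trials
print("empirical violation rate:", violation_rate, "claimed level:", delta)
```

If the bound is valid, the violation rate should not exceed δ beyond Monte Carlo error; a rate clearly above δ would count as evidence against the guarantee.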

Figures

Figures reproduced from arXiv: 2605.10493 by Jingge Zhu, Jonathan H. Manton, Ye Pu, Yujia Luo.

**Figure 1.** Comparison of PAC-Bayes upper bounds and expected cost, across varying training trajectories per controller, for a time-invariant linear discrete-time system with a finite controller space. Example 2 (Controller evaluation). To further evaluate the controller learned by our PAC-Bayes approach, we consider a modified version of Example 1 in which the classical finite-horizon LQG controller is globally opt…

**Figure 3.** Comparison of expected costs under the prior P0 and the learned posterior Pθ, together with the PAC-Bayes bound, across varying iterations. … cost of Pθ^(Iter) has dropped from its initial value of about 1290 (corresponding to the starting choice Pθ0 = P0) to around 7 and then remains at this low level in all subsequent iterations. This indicates that Algorithm 2 not only effectively learns a posterior Pθ…

Original abstract

This paper presents a PAC-Bayes framework for learning controllers for unknown stochastic linear discrete-time systems, where the system parameters are drawn from a fixed but unknown distribution. We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller, and propose novel efficient learning algorithms with theoretical guarantees, which can be implemented for both finite and infinite controller spaces. Compared to prior work, our bound holds for unbounded quadratic cost. In the special case where LQG is optimal, our numerical results suggest that the learned controllers achieve comparable performance to LQG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper extends PAC-Bayes bounds to unbounded quadratic costs for linear control but may implicitly require stability assumptions on the parameter distribution.

The punchline is that this paper gives a PAC-Bayes bound on the performance of stochastic controllers for linear systems with unbounded quadratic costs, along with learning algorithms that scale to infinite controller classes.

What stands out is the extension to unbounded costs. Most prior PAC-Bayes work in control sticks to bounded losses to make the concentration inequalities go through. Here they claim a data-dependent high-probability bound that still works when the cost can be arbitrarily large. They also provide efficient algorithms for both finite and infinite spaces, which is practical. The numerical experiments compare against LQG in the case where LQG is optimal and show similar performance, which is a good check.

The soft spot is the handling of unboundedness. Standard PAC-Bayes for unbounded random variables requires finite exponential moments or similar conditions. For quadratic costs on linear systems, that boils down to the closed-loop being stable almost surely under the parameter distribution. If the unknown distribution over system parameters includes unstable plants, the expected cost diverges and the bound fails. The abstract does not mention any such restriction or integrability condition, so the paper must either impose it implicitly through the prior or the result applies only when the cost is finite by chance. Either way, the claimed generality is narrower than the abstract suggests. This needs to be checked in the proofs.

The rest of the math and the citation choices look standard for the subfield. No obvious fitting issues since the bound is derived rather than tuned.

This work is aimed at people doing theoretical work on data-driven control and PAC-Bayes methods. A reader who cares about high-probability guarantees for stochastic policies would get something from the bound and the algorithms. It deserves a serious referee because the unbounded-cost claim is a non-routine step if the details hold up.
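
The remark about exponential moments can be illustrated with the simplest quadratic quantity, the square of a scalar Gaussian. This is a textbook computation added for context, not a statement taken from the paper; X, σ and λ are notation assumed here.

```latex
% For X \sim \mathcal{N}(0, \sigma^2) and \lambda \in \mathbb{R}:
\mathbb{E}\!\left[e^{\lambda X^{2}}\right]
=
\begin{cases}
\dfrac{1}{\sqrt{1 - 2\lambda\sigma^{2}}}, & \lambda < \dfrac{1}{2\sigma^{2}},\\[2ex]
+\infty, & \lambda \ge \dfrac{1}{2\sigma^{2}}.
\end{cases}
```

The exponential moment exists only for small enough λ, and the admissible range shrinks as the variance of the state grows. This is why a quadratic cost cannot be treated like a bounded or sub-Gaussian loss, and why growth of the state variance under the closed loop matters for any argument based on such moments.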

Referee Report

2 major / 2 minor

Summary. The manuscript develops a PAC-Bayes framework for unknown linear discrete-time systems whose parameters are drawn from a fixed but unknown distribution. It derives a data-dependent high-probability bound on the expected quadratic cost of any learned stochastic controller and proposes efficient algorithms with guarantees that apply to both finite and infinite controller spaces. The central novelty is that the bound is stated to hold for unbounded quadratic costs, with numerical results suggesting performance comparable to LQG when LQG is optimal.

Significance. If the bound is rigorously established, the work would meaningfully extend PAC-Bayes methods to control problems with unbounded losses, a setting that arises naturally with quadratic costs. The data-dependent character of the bound and the algorithms for infinite controller spaces are concrete strengths that could support safer learning-based control under parametric uncertainty.

major comments (2)

[Main PAC-Bayes bound and its proof] The derivation of the PAC-Bayes bound for unbounded quadratic costs (main theorem and its proof) does not explicitly verify or impose conditions ensuring finite exponential moments of the loss. For linear systems the quadratic cost is finite almost surely only when the closed-loop matrix is stable for almost every parameter draw; if the unknown distribution over parameters places positive mass on unstable or marginally stable poles, the expectation can be infinite and the concentration inequality fails to apply. The manuscript must either restrict the prior/posterior to stabilizing controllers or state the required integrability conditions on the parameter distribution.
[Learning algorithms for infinite spaces] The learning algorithms for infinite controller spaces (Section on algorithms and optimization) are presented as efficient with theoretical guarantees, yet it is unclear how the posterior optimization automatically excludes controllers that produce infinite expected cost. Without such a mechanism the data-dependent bound cannot be evaluated or optimized in practice when the parameter distribution has unstable support.
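
One way to picture the stability condition raised in these comments is a direct check of the closed-loop spectral radius over sampled plants. The sketch below assumes state feedback u_t = −K·x_t on x_{t+1} = A·x_t + B·u_t; the matrices, the gain, and the sampling range are hypothetical and are not taken from the paper.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def is_stabilizing(A, B, K):
    """True if u_t = -K x_t makes x_{t+1} = (A - B K) x_t Schur stable."""
    return spectral_radius(A - B @ K) < 1.0

def stabilizing_fraction(systems, K):
    """Fraction of sampled (A, B) pairs that the gain K stabilizes."""
    return float(np.mean([is_stabilizing(A, B, K) for A, B in systems]))

# Hypothetical nominal plant and candidate gain.
A = np.array([[1.1, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
K_good = np.array([[0.5, 1.0]])
K_zero = np.zeros((1, 2))

# Hypothetical parameter uncertainty: only the (0, 0) entry of A varies.
rng = np.random.default_rng(0)
systems = []
for a00 in rng.uniform(0.9, 1.3, size=500):
    A_s = A.copy()
    A_s[0, 0] = a00
    systems.append((A_s, B))

print("nominal closed-loop radius:", spectral_radius(A - B @ K_good))
print("open loop stabilizing?     ", is_stabilizing(A, B, K_zero))
print("fraction stabilized by K:  ", stabilizing_fraction(systems, K_good))
```

A check like this only covers the sampled plants; restricting a prior or posterior to gains that stabilize almost every parameter draw, as the comments request, would need an argument over the whole support of the distribution.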

minor comments (2)

[Abstract and Introduction] The abstract and introduction would benefit from a concise statement of the precise assumptions (e.g., stabilizability, moment conditions) under which the unbounded-cost claim holds.
[Numerical experiments] Numerical results should report statistics over multiple independent trials (mean and standard deviation of the achieved cost) rather than single-run comparisons to LQG.
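
The reporting requested in the last comment can be sketched in a few lines. `run_trial` stands for one full "train, then evaluate" run and is a placeholder; the toy trial below returns synthetic numbers only so that the snippet executes.

```python
import numpy as np

def summarize_trials(run_trial, n_trials, seed=0):
    """Run independent trials on separate RNG streams; return mean and sample std of the cost."""
    seeds = np.random.SeedSequence(seed).spawn(n_trials)
    costs = np.array([run_trial(np.random.default_rng(s)) for s in seeds])
    return float(costs.mean()), float(costs.std(ddof=1))

def toy_trial(rng):
    """Placeholder for one training-and-evaluation run; values are synthetic."""
    return 10.0 + 0.5 * rng.standard_normal()

mean_cost, std_cost = summarize_trials(toy_trial, n_trials=30)
print(f"achieved cost: {mean_cost:.3f} ± {std_cost:.3f} over 30 trials")
```

Spawning child seeds from one `SeedSequence` keeps the trials statistically independent while leaving the whole experiment reproducible from a single seed.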

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the detailed and insightful comments. Below we provide point-by-point responses to the major comments and indicate the revisions we will make to address them.

Referee: [Main PAC-Bayes bound and its proof] The derivation of the PAC-Bayes bound for unbounded quadratic costs (main theorem and its proof) does not explicitly verify or impose conditions ensuring finite exponential moments of the loss. For linear systems the quadratic cost is finite almost surely only when the closed-loop matrix is stable for almost every parameter draw; if the unknown distribution over parameters places positive mass on unstable or marginally stable poles, the expectation can be infinite and the concentration inequality fails to apply. The manuscript must either restrict the prior/posterior to stabilizing controllers or state the required integrability conditions on the parameter distribution.

Authors: We thank the referee for pointing this out. The PAC-Bayes bound in the main theorem is derived under the assumption that the expected loss is finite, which requires the closed-loop system to be stable almost surely with respect to the parameter distribution. While the manuscript focuses on stabilizing controllers and the numerical examples use stable systems, we agree that this condition should be stated explicitly. In the revised manuscript, we will add a paragraph in the relevant section clarifying that the prior and posterior distributions are supported only on controllers that stabilize the system for almost all parameter realizations, ensuring the finite exponential moments required for the concentration inequality. This addresses the concern without restricting the generality of the framework, as unstable controllers would yield infinite cost anyway. revision: yes
Referee: [Learning algorithms for infinite spaces] The learning algorithms for infinite controller spaces (Section on algorithms and optimization) are presented as efficient with theoretical guarantees, yet it is unclear how the posterior optimization automatically excludes controllers that produce infinite expected cost. Without such a mechanism the data-dependent bound cannot be evaluated or optimized in practice when the parameter distribution has unstable support.

Authors: We appreciate this observation. The algorithms optimize the posterior over a parameterized family of controllers where stability is enforced through the choice of parameterization. However, to make this explicit and ensure the bound can be evaluated, in the revised version we will include a detailed description of how the optimization procedure restricts to stabilizing controllers, for example by using a reparameterization that guarantees closed-loop stability or by incorporating stability constraints in the optimization. revision: yes

Circularity Check

0 steps flagged

No circularity: PAC-Bayes bound derived from concentration inequalities without reduction to inputs or self-citations

The derivation applies standard PAC-Bayes concentration to the expected quadratic cost under a fixed unknown parameter distribution, yielding a data-dependent high-probability bound that holds for unbounded losses via explicit moment or stability conditions stated in the paper. No equation reduces the bound to a fitted quantity by construction, no load-bearing step relies on a self-citation whose content is unverified, and the algorithms optimize the derived bound rather than presupposing its form. The central claim remains a non-tautological generalization guarantee independent of the specific controller parameterization chosen by the user.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard assumptions of linear dynamics and quadratic cost plus the PAC-Bayes prior-over-posterior construction; no new invented entities are introduced.

axioms (2)

domain assumption: System is linear discrete-time with parameters drawn i.i.d. from a fixed unknown distribution. Stated in the abstract as the setting for which the bound holds.
domain assumption: Quadratic cost may be unbounded. Explicitly contrasted with prior work that required bounded costs.

pith-pipeline@v0.9.0 · 5400 in / 1388 out tokens · 29992 ms · 2026-05-12T05:12:43.369358+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

We derive a data-dependent high probability bound on the performance of any learned (stochastic) controller... our bound holds for unbounded quadratic cost.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
