Recognition: unknown
CLT-Optimal Parameter Error Bounds for Linear System Identification
Pith reviewed 2026-05-09 21:10 UTC · model grok-4.3
The pith
A second-order decomposition of OLS error yields finite-sample bounds for linear system identification that match instance-optimal rates up to constants or logs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that the current state-of-the-art bounds do not accurately capture the statistical complexity of system identification even in the fundamental setting of estimating a discrete-time linear dynamical system via ordinary least-squares regression. Specifically, we utilize asymptotic normality to identify classes of problem instances for which current bounds overstate the squared parameter error, in both spectral and Frobenius norm, by a factor of the state dimension of the system. Informed by this discrepancy, we then sharpen the OLS parameter error bounds via a novel second-order decomposition of the parameter error, where crucially the lower-order term is a matrix-valued martingale that we show correctly captures the CLT scaling.
What carries the argument
A novel second-order decomposition of the OLS parameter error in which the lower-order term is a matrix-valued martingale that captures the CLT scaling.
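For orientation, a minimal sketch of what such a decomposition can look like, under assumed notation not taken from the paper: the system is x_{t+1} = A x_t + w_t, M_T = Σ_t w_t x_t^T is the noise-regressor martingale, G_T = Σ_t x_t x_t^T is the empirical Gram matrix, and Γ_T = E[G_T] is its deterministic counterpart. The paper's own decomposition (Equation (3.4), per the rebuttal below) may assign the roles differently.

```latex
% Generic second-order split of the OLS error \widehat{A} - A = M_T G_T^{-1}:
\[
  \widehat{A} - A
  \;=\; \underbrace{M_T \Gamma_T^{-1}}_{\text{matrix-valued martingale, CLT scale}}
  \;-\; \underbrace{M_T G_T^{-1}\,(G_T - \Gamma_T)\,\Gamma_T^{-1}}_{\text{second-order remainder}} .
\]
% The first term is a martingale because \Gamma_T is deterministic; with
% isotropic noise covariance \sigma^2 I_d its mean squared Frobenius norm is
\[
  \mathbb{E}\,\bigl\|M_T \Gamma_T^{-1}\bigr\|_F^2
  \;=\; \sigma^2 d \,\operatorname{tr}\bigl(\Gamma_T^{-1}\bigr),
\]
% which coincides with the trace of the asymptotic (CLT) covariance once
% \Gamma_T \approx T\,\Gamma_\infty, so bounding the remainder at a faster rate
% would leave the CLT term as the dominant contribution.
```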
If this is right
- Finite-sample bounds match the instance-specific optimal rates up to constant factors in Frobenius norm for stable systems.
- The same optimality holds, up to constants, in the many-trajectories setting.
- In spectral norm the bounds are optimal up to only polylogarithmic factors of the state dimension.
- Existing bounds overstate the squared error by a multiplicative state-dimension factor on the identified classes of instances.
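A back-of-envelope reading of the last point, under assumptions not taken from the paper (isotropic noise covariance σ²I_d, stationary state covariance Γ, and a prior bound written in a commonly quoted schematic spectral form; the cited works differ in constants and burn-in conditions):

```latex
% Schematic prior-style spectral bound (constants and burn-in omitted):
\[
  \|\widehat{A}-A\|_2^2 \;\lesssim\; \frac{\sigma^2\,\bigl(d+\log(1/\delta)\bigr)}{T\,\lambda_{\min}(\Gamma)},
  \qquad\text{so}\qquad
  \|\widehat{A}-A\|_F^2 \;\le\; d\,\|\widehat{A}-A\|_2^2
  \;\lesssim\; \frac{\sigma^2\, d\,\bigl(d+\log(1/\delta)\bigr)}{T\,\lambda_{\min}(\Gamma)} .
\]
% The CLT-level squared Frobenius error is instead of order
\[
  \mathbb{E}\,\|\widehat{A}-A\|_F^2 \;\approx\; \frac{\sigma^2\, d\,\operatorname{tr}(\Gamma^{-1})}{T},
\]
% which, on instances where tr(Gamma^{-1}) is dominated by a single small
% eigenvalue (tr(Gamma^{-1}) ~ 1/lambda_min(Gamma)), is smaller by a factor of
% order d: the kind of gap the bullets above describe.
```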
Where Pith is reading between the lines
- The martingale-based second-order technique may extend to other regression estimators used in system identification beyond plain OLS.
- Tighter dimension dependence could reduce the number of samples needed for reliable identification in high-dimensional control applications.
- The explicit link between the martingale remainder and CLT scaling offers a template for sharpening non-asymptotic bounds in related time-series estimation problems.
Load-bearing premise
The lower-order term in the second-order decomposition of the OLS parameter error is a matrix-valued martingale that correctly captures the CLT scaling for the classes of problem instances identified via asymptotic normality.
What would settle it
Run Monte Carlo trials on linear systems with increasing state dimension, compute the realized squared parameter error in both norms, and check whether it lies within the new bound while exceeding the prior state-of-the-art bound by a factor that grows with state dimension.
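A minimal sketch of that experiment in Python, with illustrative choices (random stable A of spectral radius 0.9, unit Gaussian noise, T = 2000, 50 trials); the prior and sharpened bounds themselves are not reproduced here, since evaluating them requires the exact constants from the respective papers.

```python
# Minimal Monte Carlo sketch of the check described above. All modelling
# choices (random stable A with spectral radius 0.9, unit Gaussian noise,
# horizon T) are illustrative assumptions; the prior and new bounds would
# have to be evaluated with the exact constants from the respective papers.
import numpy as np

def random_stable_system(d, rho=0.9, rng=None):
    """Random d x d matrix rescaled to spectral radius rho (hence stable)."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.standard_normal((d, d))
    return A * (rho / np.max(np.abs(np.linalg.eigvals(A))))

def ols_squared_errors(A, T, sigma=1.0, rng=None):
    """Simulate x_{t+1} = A x_t + w_t, fit OLS, return (Frobenius^2, spectral^2) error."""
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[0]
    X = np.zeros((T + 1, d))
    for t in range(T):
        X[t + 1] = A @ X[t] + sigma * rng.standard_normal(d)
    # Least squares: rows satisfy x_{t+1}^T ~= x_t^T B, so the estimate is A_hat = B^T.
    B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    E = B.T - A
    return np.linalg.norm(E, "fro") ** 2, np.linalg.norm(E, 2) ** 2

rng = np.random.default_rng(0)
T, trials = 2000, 50
for d in (5, 10, 20, 40):
    errs = np.array([ols_squared_errors(random_stable_system(d, rng=rng), T, rng=rng)
                     for _ in range(trials)])
    print(f"d={d:3d}  mean ||.||_F^2 = {errs[:, 0].mean():.4f}  "
          f"mean ||.||_2^2 = {errs[:, 1].mean():.4f}")
# Compare these realized errors with the prior and the sharpened bounds
# (not reproduced here) to check whether the gap grows with d.
```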
original abstract
There has been remarkable progress over the past decade in establishing finite-sample, non-asymptotic bounds on recovering unknown system parameters from observed system behavior. Surprisingly, however, we show that the current state-of-the-art bounds do not accurately capture the statistical complexity of system identification, even in the most fundamental setting of estimating a discrete-time linear dynamical system (LDS) via ordinary least-squares regression (OLS). Specifically, we utilize asymptotic normality to identify classes of problem instances for which current bounds overstate the squared parameter error, in both spectral and Frobenius norm, by a factor of the state-dimension of the system. Informed by this discrepancy, we then sharpen the OLS parameter error bounds via a novel second-order decomposition of the parameter error, where crucially the lower-order term is a matrix-valued martingale that we show correctly captures the CLT scaling. From our analysis we obtain finite-sample bounds for both (i) stable systems and (ii) the many-trajectories setting that match the instance-specific optimal rates up to constant factors in Frobenius norm, and polylogarithmic state-dimension factors in spectral norm.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that state-of-the-art finite-sample bounds for OLS estimation of linear dynamical system parameters overstate the squared error (in both spectral and Frobenius norms) by a factor of the state dimension for certain problem instances identified via asymptotic normality. It introduces a second-order decomposition of the OLS parameter error in which the lower-order term is a matrix-valued martingale shown to capture the CLT scaling, yielding new finite-sample bounds for stable systems and the many-trajectories regime that match instance-specific optimal rates up to constant factors in Frobenius norm and polylogarithmic state-dimension factors in spectral norm.
Significance. If the central technical claims hold, this would represent a meaningful advance in non-asymptotic system identification by reconciling asymptotic optimality with explicit finite-sample guarantees. The strategy of using the CLT to diagnose loose bounds and then refining via a tailored martingale decomposition is a strength, and the resulting bounds could inform sharper analyses in related areas such as adaptive control and reinforcement learning. The paper's emphasis on instance-specific rates rather than worst-case rates is also a positive feature.
major comments (2)
- [second-order decomposition and main finite-sample bounds] The key technical step is the second-order decomposition and the claim that the matrix martingale lower-order term produces a finite-sample bound whose leading constant in Frobenius norm is independent of state dimension d (up to the asserted factors). Matrix concentration tools (Freedman or matrix Bernstein) introduce a variance proxy and log(1/δ) term; if the proxy is taken as the empirical Gram matrix or an upper bound that scales with d, or if the remainder is controlled only up to o(1/sqrt(T)) with d-dependent constants, the Frobenius-norm bound will contain hidden d factors. Please provide the explicit variance proxy identification and the high-probability bound derivation showing asymptotic equivalence to the CLT covariance term without extra multiplicative d factors for both the stable and many-trajectories cases.
- [asymptotic normality analysis and instance selection] The identification of problem instances via asymptotic normality (used to exhibit the d-factor overstatement in prior bounds) is load-bearing for the optimality claim. It is not clear from the high-level description whether the selected instances are representative or whether the new bound remains optimal outside those instances; a concrete example or corollary showing the exact improvement factor (e.g., removal of the d multiplier in squared Frobenius error) would strengthen the argument.
minor comments (2)
- [abstract and main theorems] Clarify the precise polylogarithmic power of d appearing in the spectral-norm bound; the abstract states 'polylogarithmic state-dimension factors' but an explicit exponent would aid comparison with prior work.
- [notation and preliminaries] Ensure all norm notations (Frobenius vs. spectral) are defined at first use and used consistently in the error decompositions and final bounds.
Simulated Author's Rebuttal
We thank the referee for their insightful comments and for recognizing the potential significance of our work. We address the major comments point-by-point below, providing clarifications and indicating revisions to strengthen the manuscript.
point-by-point responses
-
Referee: [second-order decomposition and main finite-sample bounds] The key technical step is the second-order decomposition and the claim that the matrix martingale lower-order term produces a finite-sample bound whose leading constant in Frobenius norm is independent of state dimension d (up to the asserted factors). Matrix concentration tools (Freedman or matrix Bernstein) introduce a variance proxy and log(1/δ) term; if the proxy is taken as the empirical Gram matrix or an upper bound that scales with d, or if the remainder is controlled only up to o(1/sqrt(T)) with d-dependent constants, the Frobenius-norm bound will contain hidden d factors. Please provide the explicit variance proxy identification and the high-probability bound derivation showing asymptotic equivalence to the CLT covariance term without extra multiplicative d factors for both the stable and many-trajectories cases.
Authors: We appreciate the referee's careful scrutiny of the technical details. In the manuscript, the second-order decomposition is given in Equation (3.4), where the parameter error is expressed as the sum of a first-order term involving the inverse Gram matrix and a second-order martingale term M_T. For the variance proxy in the matrix martingale concentration (a version of Freedman's inequality adapted to matrices), we identify the proxy as the sum of conditional variances Σ_t E[X_t X_t^T | F_{t-1}], where X_t is the noise term; for the stable LDS this quantity is bounded independently of d, in the Frobenius sense, for the leading term. The high-probability bound takes the form ||M_T||_F <= O(sqrt(v log(d/δ))), and when combined with the Gram-matrix normalization the overall Frobenius error bound has a leading constant independent of d, matching the CLT covariance trace. We have revised the paper to include an expanded appendix subsection (Appendix B.3) with the explicit identification of the variance proxy and the full derivation of the bound for both the stable-system and many-trajectories regimes, demonstrating the asymptotic equivalence without extra d factors. revision: yes
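For reference, a generic statement of the matrix Freedman inequality that this style of argument appeals to, in the self-adjoint form of Tropp (reference [29]); the paper's instantiation of the variance proxy and its constants may differ, and rectangular martingales such as M_T are usually handled via the Hermitian dilation.

```latex
% Matrix Freedman inequality (schematic, after Tropp [29]).  Let (Y_k) be a
% self-adjoint d x d matrix martingale with Y_0 = 0, differences
% X_k = Y_k - Y_{k-1} satisfying \lambda_{\max}(X_k) \le R almost surely, and
% predictable quadratic variation W_k = \sum_{j \le k} \mathbb{E}_{j-1}[X_j^2].  Then
\[
  \Pr\Bigl(\exists\, k \ge 0:\ \lambda_{\max}(Y_k) \ge t
           \ \text{and}\ \|W_k\| \le \sigma^2\Bigr)
  \;\le\; d \,\exp\!\Bigl(\frac{-t^2/2}{\sigma^2 + Rt/3}\Bigr).
\]
% The dimensional prefactor d is the source of the log(d/\delta) term in the
% high-probability bound quoted in the response above.
```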
-
Referee: [asymptotic normality analysis and instance selection] The identification of problem instances via asymptotic normality (used to exhibit the d-factor overstatement in prior bounds) is load-bearing for the optimality claim. It is not clear from the high-level description whether the selected instances are representative or whether the new bound remains optimal outside those instances; a concrete example or corollary showing the exact improvement factor (e.g., removal of the d multiplier in squared Frobenius error) would strengthen the argument.
Authors: The asymptotic normality analysis in Section 2 is used to diagnose the looseness in prior bounds for specific instances where the CLT covariance has a Frobenius norm independent of d (e.g., when the system matrix leads to isotropic noise propagation). These instances are representative in the sense that they achieve the minimal possible rate, and our bounds are shown to be optimal up to constants for them. To address the request, we have added a new corollary (Corollary 2.3) and a concrete example in Section 2.2: for a diagonal stable system with equal eigenvalues and isotropic inputs, the prior bounds overestimate the squared Frobenius error by a factor of d, while our bound matches the CLT rate up to constants. This illustrates the improvement and confirms that the optimality holds not merely in the worst case but for these instance-optimal cases. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper identifies instance classes via standard asymptotic normality of the OLS estimator, then applies a second-order error decomposition whose lower-order term is explicitly constructed and proven to be a matrix martingale. Finite-sample bounds are obtained from this decomposition using established martingale concentration results (e.g., matrix Bernstein or Freedman-type inequalities). The resulting bounds are shown to match the CLT scaling for the selected instances, but this matching follows from the explicit identification and bounding of the martingale term rather than from any self-definition, fitted parameter renamed as prediction, or load-bearing self-citation. The chain depends on external, non-author-specific tools and does not reduce any claimed prediction to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Asymptotic normality of the OLS estimator holds for the linear dynamical system under consideration
- ad hoc to paper: The parameter error admits a second-order decomposition in which the lower-order term is a matrix-valued martingale that captures the CLT scaling
Reference graph
Works this paper leans on
- [1] M. Simchowitz, H. Mania, S. Tu, M. I. Jordan, and B. Recht, Learning without mixing: Towards a sharp analysis of linear system identification, Proc. 31st Conf. Learn. Theory, 2018, pp. 439–473.
- [2] T. Sarkar and A. Rakhlin, Near optimal finite time identification of arbitrary linear dynamical systems, Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 5610–5618.
- [3] M. K. Shirani Faradonbeh, A. Tewari, and G. Michailidis, Finite time identification in unstable linear systems, Automatica 96 (2018), 342–353.
- [4] I. Ziemann, A. Tsiamis, B. Lee, Y. Jedra, N. Matni, and G. J. Pappas, A tutorial on the non-asymptotic theory of system identification, Proc. 2023 62nd IEEE Conf. Decis. Control, 2023, pp. 8921–8939.
- [5] Y. Jedra and A. Proutiere, Sample complexity lower bounds for linear system identification, Proc. 2019 IEEE 58th Conf. Decis. Control, 2019, pp. 2676–2681.
- [6] Y. Jedra and A. Proutiere, Finite-time identification of stable linear systems: Optimality of the least-squares estimator, Proc. 2020 59th IEEE Conf. Decis. Control, 2020, pp. 996–1001.
- [7] S. Tu, R. Frostig, and M. Soltanolkotabi, Learning from many trajectories, Journal of Machine Learning Research 25 (2024), no. 216, 1–109.
- [8] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, Online least squares estimation with self-normalized processes: An application to bandit problems, arXiv preprint arXiv:1102.2670 (2011).
- [9] N. Randrianantoanina, Conditioned square functions for noncommutative martingales, The Annals of Probability 35 (2007), no. 3, 1039–1070.
- [10] M. Junge and Q. Xu, Noncommutative Burkholder/Rosenthal inequalities II: Applications, Israel Journal of Mathematics 167 (2008), no. 1, 227–282.
- [11]
- [12] L. Ljung, System identification: Theory for the user, 2nd ed., Prentice Hall, Upper Saddle River, NJ, 1999.
- [13] I. Ziemann, S. Tu, G. J. Pappas, and N. Matni, The noise level in linear regression with dependent data, Advances in Neural Information Processing Systems, 2023, pp. 74903–74920.
- [14] T. L. Lai and C. Z. Wei, Asymptotic properties of general autoregressive models and strong consistency of least-squares estimates of their parameters, Journal of Multivariate Analysis 13 (1983), no. 1, 1–23.
- [15] T. L. Lai and C. Z. Wei, Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems, The Annals of Statistics 10 (1982), no. 1, 154–166.
- [16] T. W. Anderson and N. Kunitomo, Asymptotic distributions of regression and autoregression coefficients with martingale difference disturbances, Journal of Multivariate Analysis 40 (1992), no. 2, 221–243.
- [17] J. S. White, The limiting distribution of the serial correlation coefficient in the explosive case, The Annals of Mathematical Statistics (1958), 1188–1197.
- [18] P. C. Phillips and T. Magdalinos, Inconsistent VAR regression with common explosive roots, Econometric Theory 29 (2013), no. 4, 808–837.
- [19] A. W. van der Vaart, Asymptotic statistics, Vol. 3, Cambridge University Press, 2000.
- [20] N. H. Chan and C. Z. Wei, Limiting distributions of least squares estimates of unstable autoregressive processes, The Annals of Statistics (1988), 367–401.
- [21] R. Vershynin, High-dimensional probability: An introduction with applications in data science, Vol. 47, Cambridge University Press, 2018.
- [22] M. Rudelson and R. Vershynin, Hanson-Wright inequality and sub-gaussian concentration, Electronic Communications in Probability 18 (2013), 1–9.
- [23] R. Adamczak, A note on the Hanson-Wright inequality for random vectors with dependencies, Electronic Communications in Probability 20 (2015), 1–13.
- [24] J. Mourtada, Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices, The Annals of Statistics 50 (2022), no. 4, 2157–2178.
- [25] A. Carbery and J. Wright, Distributional and L^q norm inequalities for polynomials over convex bodies in R^n, Mathematical Research Letters 8 (2001), no. 3, 233–248.
- [26] D. Bakry, I. Gentil, and M. Ledoux, Analysis and geometry of Markov diffusion operators, Springer International Publishing, 2014.
- [27] Y. T. Lee and S. S. Vempala, Eldan's stochastic localization and the KLS conjecture: Isoperimetry, concentration and mixing, Annals of Mathematics 199 (2024), no. 3, 1043–1092.
- [28] R. Adamczak, Logarithmic Sobolev inequalities and concentration of measure for convex functions and polynomial chaoses, arXiv preprint arXiv:0505175 (2005).
- [29] J. Tropp, Freedman's inequality for matrix martingales, Electronic Communications in Probability 16 (2011), 262–270.
- [30] P. Billingsley, Convergence of probability measures, Wiley, 1999.
- [31] A. Osekowski, A note on the Burkholder–Rosenthal inequality, Bulletin of the Polish Academy of Sciences Mathematics 60 (2012), no. 2, 177–185.
- [32] G. B. Folland, Real analysis, 2nd ed., Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts, John Wiley & Sons, Nashville, TN, 2013.
- [33] C. Borell, Convex set functions in d-space, Periodica Mathematica Hungarica 6 (1975), no. 2, 111–136.
- [34] S. Dirksen, Tail bounds via generic chaining, Electronic Journal of Probability 20 (2015), 1–29.
- [35] D. Hsu, S. Kakade, and T. Zhang, A tail inequality for quadratic forms of subgaussian random vectors, Electronic Communications in Probability 17 (2012), 1–6.