Recognition: unknown
CLT-Optimal Parameter Error Bounds for Linear System Identification
Pith reviewed 2026-05-09 21:10 UTC · model grok-4.3
The pith
A second-order decomposition of OLS error yields finite-sample bounds for linear system identification that match instance-optimal rates up to constants or logs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that the current state-of-the-art bounds do not accurately capture the statistical complexity of system identification even in the fundamental setting of estimating a discrete-time linear dynamical system via ordinary least-squares regression. Specifically, we utilize asymptotic normality to identify classes of problem instances for which current bounds overstate the squared parameter error, in both spectral and Frobenius norm, by a factor of the state dimension of the system. Informed by this discrepancy, we then sharpen the OLS parameter error bounds via a novel second-order decomposition of the parameter error, where crucially the lower-order term is a matrix-valued martingale that we show correctly captures the CLT scaling.
What carries the argument
A novel second-order decomposition of the OLS parameter error in which the lower-order term is a matrix-valued martingale that captures the CLT scaling.
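For orientation, a minimal sketch of what such a decomposition can look like, under assumed notation not taken from the paper: the system is x_{t+1} = A x_t + w_t, M_T = Σ_t w_t x_t^T is the noise-regressor martingale, G_T = Σ_t x_t x_t^T is the empirical Gram matrix, and Γ_T = E[G_T] is its deterministic counterpart. The paper's own decomposition (Equation (3.4), per the rebuttal below) may assign the roles differently.

```latex
% Generic second-order split of the OLS error \widehat{A} - A = M_T G_T^{-1}:
\[
  \widehat{A} - A
  \;=\; \underbrace{M_T \Gamma_T^{-1}}_{\text{matrix-valued martingale, CLT scale}}
  \;-\; \underbrace{M_T G_T^{-1}\,(G_T - \Gamma_T)\,\Gamma_T^{-1}}_{\text{second-order remainder}} .
\]
% The first term is a martingale because \Gamma_T is deterministic; with
% isotropic noise covariance \sigma^2 I_d its mean squared Frobenius norm is
\[
  \mathbb{E}\,\bigl\|M_T \Gamma_T^{-1}\bigr\|_F^2
  \;=\; \sigma^2 d \,\operatorname{tr}\bigl(\Gamma_T^{-1}\bigr),
\]
% which coincides with the trace of the asymptotic (CLT) covariance once
% \Gamma_T \approx T\,\Gamma_\infty, so bounding the remainder at a faster rate
% would leave the CLT term as the dominant contribution.
```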
If this is right
- Finite-sample bounds match the instance-specific optimal rates up to constant factors in Frobenius norm for stable systems.
- The same optimality holds, up to constants, in the many-trajectories setting.
- In spectral norm the bounds are optimal up to only polylogarithmic factors of the state dimension.
- Existing bounds overstate the squared error by a multiplicative state-dimension factor on the identified classes of instances.
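A back-of-envelope reading of the last point, under assumptions not taken from the paper (isotropic noise covariance σ²I_d, stationary state covariance Γ, and a prior bound written in a commonly quoted schematic spectral form; the cited works differ in constants and burn-in conditions):

```latex
% Schematic prior-style spectral bound (constants and burn-in omitted):
\[
  \|\widehat{A}-A\|_2^2 \;\lesssim\; \frac{\sigma^2\,\bigl(d+\log(1/\delta)\bigr)}{T\,\lambda_{\min}(\Gamma)},
  \qquad\text{so}\qquad
  \|\widehat{A}-A\|_F^2 \;\le\; d\,\|\widehat{A}-A\|_2^2
  \;\lesssim\; \frac{\sigma^2\, d\,\bigl(d+\log(1/\delta)\bigr)}{T\,\lambda_{\min}(\Gamma)} .
\]
% The CLT-level squared Frobenius error is instead of order
\[
  \mathbb{E}\,\|\widehat{A}-A\|_F^2 \;\approx\; \frac{\sigma^2\, d\,\operatorname{tr}(\Gamma^{-1})}{T},
\]
% which, on instances where tr(Gamma^{-1}) is dominated by a single small
% eigenvalue (tr(Gamma^{-1}) ~ 1/lambda_min(Gamma)), is smaller by a factor of
% order d: the kind of gap the bullets above describe.
```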
Where Pith is reading between the lines
- The martingale-based second-order technique may extend to other regression estimators used in system identification beyond plain OLS.
- Tighter dimension dependence could reduce the number of samples needed for reliable identification in high-dimensional control applications.
- The explicit link between the martingale remainder and CLT scaling offers a template for sharpening non-asymptotic bounds in related time-series estimation problems.
Load-bearing premise
The lower-order term in the second-order decomposition of the OLS parameter error is a matrix-valued martingale that correctly captures the CLT scaling for the classes of problem instances identified via asymptotic normality.
What would settle it
Run Monte Carlo trials on linear systems with increasing state dimension, compute the realized squared parameter error in both norms, and check whether it lies within the new bound while exceeding the prior state-of-the-art bound by a factor that grows with state dimension.
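A minimal sketch of that experiment in Python, with illustrative choices (random stable A of spectral radius 0.9, unit Gaussian noise, T = 2000, 50 trials); the prior and sharpened bounds themselves are not reproduced here, since evaluating them requires the exact constants from the respective papers.

```python
# Minimal Monte Carlo sketch of the check described above. All modelling
# choices (random stable A with spectral radius 0.9, unit Gaussian noise,
# horizon T) are illustrative assumptions; the prior and new bounds would
# have to be evaluated with the exact constants from the respective papers.
import numpy as np

def random_stable_system(d, rho=0.9, rng=None):
    """Random d x d matrix rescaled to spectral radius rho (hence stable)."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.standard_normal((d, d))
    return A * (rho / np.max(np.abs(np.linalg.eigvals(A))))

def ols_squared_errors(A, T, sigma=1.0, rng=None):
    """Simulate x_{t+1} = A x_t + w_t, fit OLS, return (Frobenius^2, spectral^2) error."""
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[0]
    X = np.zeros((T + 1, d))
    for t in range(T):
        X[t + 1] = A @ X[t] + sigma * rng.standard_normal(d)
    # Least squares: rows satisfy x_{t+1}^T ~= x_t^T B, so the estimate is A_hat = B^T.
    B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    E = B.T - A
    return np.linalg.norm(E, "fro") ** 2, np.linalg.norm(E, 2) ** 2

rng = np.random.default_rng(0)
T, trials = 2000, 50
for d in (5, 10, 20, 40):
    errs = np.array([ols_squared_errors(random_stable_system(d, rng=rng), T, rng=rng)
                     for _ in range(trials)])
    print(f"d={d:3d}  mean ||.||_F^2 = {errs[:, 0].mean():.4f}  "
          f"mean ||.||_2^2 = {errs[:, 1].mean():.4f}")
# Compare these realized errors with the prior and the sharpened bounds
# (not reproduced here) to check whether the gap grows with d.
```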
original abstract
There has been remarkable progress over the past decade in establishing finite-sample, non-asymptotic bounds on recovering unknown system parameters from observed system behavior. Surprisingly, however, we show that the current state-of-the-art bounds do not accurately capture the statistical complexity of system identification, even in the most fundamental setting of estimating a discrete-time linear dynamical system (LDS) via ordinary least-squares regression (OLS). Specifically, we utilize asymptotic normality to identify classes of problem instances for which current bounds overstate the squared parameter error, in both spectral and Frobenius norm, by a factor of the state-dimension of the system. Informed by this discrepancy, we then sharpen the OLS parameter error bounds via a novel second-order decomposition of the parameter error, where crucially the lower-order term is a matrix-valued martingale that we show correctly captures the CLT scaling. From our analysis we obtain finite-sample bounds for both (i) stable systems and (ii) the many-trajectories setting that match the instance-specific optimal rates up to constant factors in Frobenius norm, and polylogarithmic state-dimension factors in spectral norm.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that state-of-the-art finite-sample bounds for OLS estimation of linear dynamical system parameters overstate the squared error (in both spectral and Frobenius norms) by a factor of the state dimension for certain problem instances identified via asymptotic normality. It introduces a second-order decomposition of the OLS parameter error in which the lower-order term is a matrix-valued martingale shown to capture the CLT scaling, yielding new finite-sample bounds for stable systems and the many-trajectories regime that match instance-specific optimal rates up to constant factors in Frobenius norm and polylogarithmic state-dimension factors in spectral norm.
Significance. If the central technical claims hold, this would represent a meaningful advance in non-asymptotic system identification by reconciling asymptotic optimality with explicit finite-sample guarantees. The strategy of using the CLT to diagnose loose bounds and then refining via a tailored martingale decomposition is a strength, and the resulting bounds could inform sharper analyses in related areas such as adaptive control and reinforcement learning. The paper's emphasis on instance-specific rates rather than worst-case rates is also a positive feature.
major comments (2)
- [second-order decomposition and main finite-sample bounds] The key technical step is the second-order decomposition and the claim that the matrix martingale lower-order term produces a finite-sample bound whose leading constant in Frobenius norm is independent of state dimension d (up to the asserted factors). Matrix concentration tools (Freedman or matrix Bernstein) introduce a variance proxy and log(1/δ) term; if the proxy is taken as the empirical Gram matrix or an upper bound that scales with d, or if the remainder is controlled only up to o(1/sqrt(T)) with d-dependent constants, the Frobenius-norm bound will contain hidden d factors. Please provide the explicit variance proxy identification and the high-probability bound derivation showing asymptotic equivalence to the CLT covariance term without extra multiplicative d factors for both the stable and many-trajectories cases.
- [asymptotic normality analysis and instance selection] The identification of problem instances via asymptotic normality (used to exhibit the d-factor overstatement in prior bounds) is load-bearing for the optimality claim. It is not clear from the high-level description whether the selected instances are representative or whether the new bound remains optimal outside those instances; a concrete example or corollary showing the exact improvement factor (e.g., removal of the d multiplier in squared Frobenius error) would strengthen the argument.
minor comments (2)
- [abstract and main theorems] Clarify the precise polylogarithmic power of d appearing in the spectral-norm bound; the abstract states 'polylogarithmic state-dimension factors' but an explicit exponent would aid comparison with prior work.
- [notation and preliminaries] Ensure all norm notations (Frobenius vs. spectral) are defined at first use and used consistently in the error decompositions and final bounds.
Simulated Author's Rebuttal
We thank the referee for their insightful comments and for recognizing the potential significance of our work. We address the major comments point-by-point below, providing clarifications and indicating revisions to strengthen the manuscript.
point-by-point responses
-
Referee: [second-order decomposition and main finite-sample bounds] The key technical step is the second-order decomposition and the claim that the matrix martingale lower-order term produces a finite-sample bound whose leading constant in Frobenius norm is independent of state dimension d (up to the asserted factors). Matrix concentration tools (Freedman or matrix Bernstein) introduce a variance proxy and log(1/δ) term; if the proxy is taken as the empirical Gram matrix or an upper bound that scales with d, or if the remainder is controlled only up to o(1/sqrt(T)) with d-dependent constants, the Frobenius-norm bound will contain hidden d factors. Please provide the explicit variance proxy identification and the high-probability bound derivation showing asymptotic equivalence to the CLT covariance term without extra multiplicative d factors for both the stable and many-trajectories cases.
Authors: We appreciate the referee's careful scrutiny of the technical details. In the manuscript, the second-order decomposition is given in Equation (3.4), where the parameter error is expressed as the sum of a first-order term involving the inverse Gram matrix and a second-order martingale term M_T. For the variance proxy in the matrix martingale concentration (a version of Freedman's inequality adapted to matrices), we identify the proxy as the sum of conditional variances Σ_t E[X_t X_t^T | F_{t-1}], where X_t is the noise term; for the stable LDS this quantity is bounded independently of d, in the Frobenius sense, for the leading term. The high-probability bound takes the form ||M_T||_F <= O(sqrt(v log(d/δ))), and when combined with the Gram-matrix normalization the overall Frobenius error bound has a leading constant independent of d, matching the CLT covariance trace. We have revised the paper to include an expanded appendix subsection (Appendix B.3) with the explicit identification of the variance proxy and the full derivation of the bound for both the stable-system and many-trajectories regimes, demonstrating the asymptotic equivalence without extra d factors. revision: yes
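For reference, a generic statement of the matrix Freedman inequality that this style of argument appeals to, in the self-adjoint form of Tropp (reference [29]); the paper's instantiation of the variance proxy and its constants may differ, and rectangular martingales such as M_T are usually handled via the Hermitian dilation.

```latex
% Matrix Freedman inequality (schematic, after Tropp [29]).  Let (Y_k) be a
% self-adjoint d x d matrix martingale with Y_0 = 0, differences
% X_k = Y_k - Y_{k-1} satisfying \lambda_{\max}(X_k) \le R almost surely, and
% predictable quadratic variation W_k = \sum_{j \le k} \mathbb{E}_{j-1}[X_j^2].  Then
\[
  \Pr\Bigl(\exists\, k \ge 0:\ \lambda_{\max}(Y_k) \ge t
           \ \text{and}\ \|W_k\| \le \sigma^2\Bigr)
  \;\le\; d \,\exp\!\Bigl(\frac{-t^2/2}{\sigma^2 + Rt/3}\Bigr).
\]
% The dimensional prefactor d is the source of the log(d/\delta) term in the
% high-probability bound quoted in the response above.
```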
-
Referee: [asymptotic normality analysis and instance selection] The identification of problem instances via asymptotic normality (used to exhibit the d-factor overstatement in prior bounds) is load-bearing for the optimality claim. It is not clear from the high-level description whether the selected instances are representative or whether the new bound remains optimal outside those instances; a concrete example or corollary showing the exact improvement factor (e.g., removal of the d multiplier in squared Frobenius error) would strengthen the argument.
Authors: The asymptotic normality analysis in Section 2 is used to diagnose the looseness in prior bounds for specific instances where the CLT covariance has a Frobenius norm independent of d (e.g., when the system matrix leads to isotropic noise propagation). These instances are representative in the sense that they achieve the minimal possible rate, and our bounds are shown to be optimal up to constants for them. To address the request, we have added a new corollary (Corollary 2.3) and a concrete example in Section 2.2: for a diagonal stable system with equal eigenvalues and isotropic inputs, the prior bounds overestimate the squared Frobenius error by a factor of d, while our bound matches the CLT rate up to constants. This illustrates the improvement and confirms that the optimality holds not merely in the worst case but for these instance-optimal cases. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper identifies instance classes via standard asymptotic normality of the OLS estimator, then applies a second-order error decomposition whose lower-order term is explicitly constructed and proven to be a matrix martingale. Finite-sample bounds are obtained from this decomposition using established martingale concentration results (e.g., matrix Bernstein or Freedman-type inequalities). The resulting bounds are shown to match the CLT scaling for the selected instances, but this matching follows from the explicit identification and bounding of the martingale term rather than from any self-definition, fitted parameter renamed as prediction, or load-bearing self-citation. The chain depends on external, non-author-specific tools and does not reduce any claimed prediction to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Asymptotic normality of the OLS estimator holds for the linear dynamical system under consideration
- ad hoc to paper: The parameter error admits a second-order decomposition in which the lower-order term is a matrix-valued martingale that captures the CLT scaling
Reference graph
Works this paper leans on
- [1] M. Simchowitz, H. Mania, S. Tu, M. I. Jordan, and B. Recht, Learning without mixing: Towards a sharp analysis of linear system identification, Proc. 31st Conf. Learn. Theory, 2018, pp. 439–473.
- [2] T. Sarkar and A. Rakhlin, Near optimal finite time identification of arbitrary linear dynamical systems, Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 5610–5618.
- [3] M. K. Shirani Faradonbeh, A. Tewari, and G. Michailidis, Finite time identification in unstable linear systems, Automatica 96 (2018), 342–353.
- [4] I. Ziemann, A. Tsiamis, B. Lee, Y. Jedra, N. Matni, and G. J. Pappas, A tutorial on the non-asymptotic theory of system identification, Proc. 2023 62nd IEEE Conf. Decis. Control, 2023, pp. 8921–8939.
- [5] Y. Jedra and A. Proutiere, Sample complexity lower bounds for linear system identification, Proc. 2019 IEEE 58th Conf. Decis. Control, 2019, pp. 2676–2681.
- [6] Y. Jedra and A. Proutiere, Finite-time identification of stable linear systems: Optimality of the least-squares estimator, Proc. 2020 59th IEEE Conf. Decis. Control, 2020, pp. 996–1001.
- [7] S. Tu, R. Frostig, and M. Soltanolkotabi, Learning from many trajectories, Journal of Machine Learning Research 25 (2024), no. 216, 1–109.
- [8] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, Online least squares estimation with self-normalized processes: An application to bandit problems, arXiv preprint arXiv:1102.2670 (2011).
- [9] N. Randrianantoanina, Conditioned square functions for noncommutative martingales, The Annals of Probability 35 (2007), no. 3, 1039–1070.
- [10] M. Junge and Q. Xu, Noncommutative Burkholder/Rosenthal inequalities II: Applications, Israel Journal of Mathematics 167 (2008), no. 1, 227–282.
- [11]
- [12] L. Ljung, System identification: Theory for the user, 2nd ed., Prentice Hall, Upper Saddle River, NJ, 1999.
- [13] I. Ziemann, S. Tu, G. J. Pappas, and N. Matni, The noise level in linear regression with dependent data, Advances in Neural Information Processing Systems, 2023, pp. 74903–74920.
- [14] T. L. Lai and C. Z. Wei, Asymptotic properties of general autoregressive models and strong consistency of least-squares estimates of their parameters, Journal of Multivariate Analysis 13 (1983), no. 1, 1–23.
- [15] T. L. Lai and C. Z. Wei, Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems, The Annals of Statistics 10 (1982), no. 1, 154–166.
- [16] T. W. Anderson and N. Kunitomo, Asymptotic distributions of regression and autoregression coefficients with martingale difference disturbances, Journal of Multivariate Analysis 40 (1992), no. 2, 221–243.
- [17] J. S. White, The limiting distribution of the serial correlation coefficient in the explosive case, The Annals of Mathematical Statistics (1958), 1188–1197.
- [18] P. C. Phillips and T. Magdalinos, Inconsistent VAR regression with common explosive roots, Econometric Theory 29 (2013), no. 4, 808–837.
- [19] A. W. van der Vaart, Asymptotic statistics, Vol. 3, Cambridge University Press, 2000.
- [20] N. H. Chan and C. Z. Wei, Limiting distributions of least squares estimates of unstable autoregressive processes, The Annals of Statistics (1988), 367–401.
- [21] R. Vershynin, High-dimensional probability: An introduction with applications in data science, Vol. 47, Cambridge University Press, 2018.
- [22] M. Rudelson and R. Vershynin, Hanson-Wright inequality and sub-gaussian concentration, Electronic Communications in Probability 18 (2013), 1–9.
- [23] R. Adamczak, A note on the Hanson-Wright inequality for random vectors with dependencies, Electronic Communications in Probability 20 (2015), 1–13.
- [24] J. Mourtada, Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices, The Annals of Statistics 50 (2022), no. 4, 2157–2178.
- [25] A. Carbery and J. Wright, Distributional and L^q norm inequalities for polynomials over convex bodies in R^n, Mathematical Research Letters 8 (2001), no. 3, 233–248.
- [26] D. Bakry, I. Gentil, and M. Ledoux, Analysis and geometry of Markov diffusion operators, Springer International Publishing, 2014.
- [27] Y. T. Lee and S. S. Vempala, Eldan's stochastic localization and the KLS conjecture: Isoperimetry, concentration and mixing, Annals of Mathematics 199 (2024), no. 3, 1043–1092.
- [28] R. Adamczak, Logarithmic Sobolev inequalities and concentration of measure for convex functions and polynomial chaoses, arXiv preprint arXiv:0505175 (2005).
- [29] J. Tropp, Freedman's inequality for matrix martingales, Electronic Communications in Probability 16 (2011), 262–270.
- [30] P. Billingsley, Convergence of probability measures, Wiley, 1999.
- [31] A. Osekowski, A note on the Burkholder–Rosenthal inequality, Bulletin of the Polish Academy of Sciences Mathematics 60 (2012), no. 2, 177–185.
- [32] G. B. Folland, Real analysis, 2nd ed., Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts, John Wiley & Sons, Nashville, TN, 2013.
- [33] C. Borell, Convex set functions in d-space, Periodica Mathematica Hungarica 6 (1975), no. 2, 111–136.
- [34] S. Dirksen, Tail bounds via generic chaining, Electronic Journal of Probability 20 (2015), 1–29.
- [35] D. Hsu, S. Kakade, and T. Zhang, A tail inequality for quadratic forms of subgaussian random vectors, Electronic Communications in Probability 17 (2012), 1–6.