pith. machine review for the scientific record. sign in

arxiv: 2605.08362 · v1 · submitted 2026-05-08 · 🧮 math.NA · cs.NA

Recognition: no theorem link

Kernel-based linear system identification using augmented Krylov subspaces

Daniel Kressner, Fabio Matti, Martin Skovgaard Andersen, Tianshi Chen

Pith reviewed 2026-05-12 01:11 UTC · model grok-4.3

classification 🧮 math.NA cs.NA
keywords Krylov subspace methodsaugmented subspaceskernel-based identificationFIR estimationmaximum likelihood estimationlinear time-invariant systemsnumerical linear algebra
0
0 comments X

The pith

A single augmented Krylov subspace jointly approximates both the data-fit quadratic form and the log-determinant needed for kernel-based FIR identification, yielding accelerated convergence and low-cost reuse across regularization values.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Krylov method that estimates the finite impulse response of linear time-invariant systems by combining a kernel formulation with maximum-likelihood hyperparameter selection. This selection requires repeated computation of a quadratic form measuring data fit and the log-determinant measuring model complexity, both depending on a fixed positive-semidefinite matrix A that supports fast matrix-vector products. Rather than approximating the two quantities independently, the method builds one augmented Krylov subspace for A and extracts both approximations from it. Shift invariance of the subspace then permits cheap evaluation of the entire maximum-likelihood objective for many values of the regularization parameter. Error bounds are derived that quantify the benefit of augmentation, and numerical tests on system-identification problems confirm faster convergence for the quadratic term.

Core claim

Instead of approximating the data-fit term y^T (λI + A)^{-1} y and the model-complexity term log det(λI + A) separately, both quantities are obtained from a single augmented Krylov subspace generated by A. Augmentation supplies implicit preconditioning that accelerates convergence of the quadratic-form approximation, while the shift-invariance property of Krylov subspaces allows the extracted information to be reused for many regularization parameters λ at negligible extra cost. Explicit error bounds are proved that reflect these advantages.

What carries the argument

An augmented Krylov subspace for the positive-semidefinite matrix A that arises from the kernel-based FIR formulation; the augmentation jointly supplies approximations to the quadratic form and the log-determinant while enabling reuse across shifts λ.

If this is right

  • The same subspace information yields accurate maximum-likelihood objectives for many values of λ with almost no additional arithmetic.
  • Error bounds are available that incorporate the acceleration gained from augmentation.
  • The overall procedure reduces the cost of hyperparameter tuning inside kernel-based linear-system identification.
  • Because only matrix-vector products with A are required, the method scales to problems where forming A explicitly is prohibitive.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same joint-approximation idea could be tested on other kernel-regularized estimation problems whose objectives involve repeated quadratic forms and log-determinants.
  • Implementation in real-time or embedded identification settings would benefit from the cheap λ-reuse property.
  • Combining the augmented subspace with existing preconditioners for A might produce further speed-ups that the paper does not explore.

Load-bearing premise

The specific positive-semidefinite matrices A that appear in kernel-based FIR identification must admit fast matrix-vector products and must produce the claimed acceleration and error bounds when augmented.

What would settle it

A numerical experiment on a standard FIR identification benchmark in which the augmented Krylov method shows no faster convergence (or larger errors) for the quadratic form than a non-augmented Krylov method of the same dimension.

read the original abstract

We propose a novel Krylov subspace method for estimating the finite impulse response (FIR) of a one-dimensional linear time-invariant systems. The method approximates the system's FIR using a kernel-based formulation combined with hyperparameter selection based on maximum likelihood estimation (MLE), which requires repeated evaluation of two terms: The data fit $\boldsymbol{y}^{\top} (\lambda \boldsymbol{I} + \boldsymbol{A})^{-1} \boldsymbol{y}$ and the model complexity $\log(\det (\lambda \boldsymbol{I} + \boldsymbol{A}))$, where $\boldsymbol{A}$ is a certain positive semidefinite matrix that admits fast matrix--vector products and $\lambda > 0$ is a regularization parameter. Instead of approximating these two quantities separately, we jointly approximate them using a single augmented Krylov subspace for $\boldsymbol{A}$. One major benefit of augmentation is that we obtain accelerated convergence when approximating the data fit quadratic form, through implicit preconditioning. Thanks to the shift invariance of Krylov subspaces, the extracted approximations can be used to evaluate the MLE objective for many values of $\lambda$ at little additional cost. We derive error bounds for the approximations, reflecting the benefits of augmentation demonstrated through multiple numerical experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a Krylov subspace method for approximating the data-fit quadratic form y^T (λI + A)^{-1} y and log-determinant term log det(λI + A) arising in maximum-likelihood hyperparameter selection for kernel-based FIR linear system identification. A single augmented Krylov subspace for the positive-semidefinite matrix A is used to jointly approximate both quantities; shift-invariance of the subspace allows cheap evaluation for many λ, error bounds are derived, and numerical experiments are reported to illustrate accelerated convergence for the quadratic form via implicit preconditioning.

Significance. If the error bounds are valid and the augmentation consistently accelerates the quadratic-form approximation for the kernel-derived PSD matrices that arise in practice, the method offers a practical reduction in computational cost for repeated MLE evaluations in large-scale kernel-based system identification, while preserving the ability to sweep regularization parameters efficiently.

major comments (3)
  1. [§2] §2 (method description): the augmentation vector is introduced without an explicit construction or justification showing why it produces implicit preconditioning for the specific spectrum of the kernel matrix A; the central acceleration claim therefore rests on an unverified interaction between the chosen vector and the eigenvalues of A.
  2. [§3, Eq. (8)] §3, Eq. (8) and surrounding derivation: the error bound for the quadratic-form approximation is stated to reflect the benefit of augmentation, yet the proof does not quantify the improvement relative to a non-augmented Krylov subspace of the same dimension; without this comparison the bound does not yet establish that augmentation is load-bearing for the claimed acceleration.
  3. [§4] §4 (numerical experiments): the reported timings and convergence curves compare the augmented method only against separate approximations, not against a standard (non-augmented) Krylov method using an equivalent total number of matrix-vector products; this leaves open whether the observed gains are due to augmentation or simply to joint evaluation.
minor comments (2)
  1. [Abstract / §1] Notation for the kernel matrix A and the vector y is introduced late; a brief recap in the abstract or §1 would improve readability.
  2. [§4] Several figure captions omit the precise dimensions of the test problems and the number of Monte-Carlo trials; these details should be added for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each of the three major comments below, indicating the revisions we will incorporate to strengthen the presentation while preserving the core contributions of the augmented Krylov approach.

read point-by-point responses
  1. Referee: [§2] §2 (method description): the augmentation vector is introduced without an explicit construction or justification showing why it produces implicit preconditioning for the specific spectrum of the kernel matrix A; the central acceleration claim therefore rests on an unverified interaction between the chosen vector and the eigenvalues of A.

    Authors: We agree that §2 would benefit from an explicit description of the augmentation vector and a clearer justification of its preconditioning effect. In the revised manuscript we will expand the method description to state the construction of the augmentation vector (chosen to align with the observation vector y) and provide a brief motivation based on the eigenvalue clustering typical of kernel-derived matrices A in FIR identification. This addition will make the implicit-preconditioning claim self-contained without altering the algorithmic procedure. revision: yes

  2. Referee: [§3, Eq. (8)] §3, Eq. (8) and surrounding derivation: the error bound for the quadratic-form approximation is stated to reflect the benefit of augmentation, yet the proof does not quantify the improvement relative to a non-augmented Krylov subspace of the same dimension; without this comparison the bound does not yet establish that augmentation is load-bearing for the claimed acceleration.

    Authors: The referee correctly notes that the existing derivation of the bound in Eq. (8) does not contain an explicit side-by-side comparison with the non-augmented Lanczos bound of identical dimension. We will therefore add a short corollary (or remark) immediately following the proof that relates the augmented bound to the classical one, showing that the extra vector can only tighten the estimate and quantifying the improvement in terms of the angle between the augmentation vector and the current residual. This revision will make the load-bearing role of augmentation explicit while leaving the original proof unchanged. revision: yes

  3. Referee: [§4] §4 (numerical experiments): the reported timings and convergence curves compare the augmented method only against separate approximations, not against a standard (non-augmented) Krylov method using an equivalent total number of matrix-vector products; this leaves open whether the observed gains are due to augmentation or simply to joint evaluation.

    Authors: We accept that the current experimental design does not isolate the contribution of augmentation from the benefit of joint evaluation. In the revised §4 we will include an additional set of convergence curves that compare the augmented subspace against a plain (non-augmented) Krylov subspace whose dimension is chosen so that the total number of matrix-vector products matches that of the augmented run. The new plots will demonstrate that the faster decay observed for the quadratic-form error is attributable to the augmentation step rather than to the mere fact of joint approximation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard Krylov theory and derived bounds

full rationale

The paper presents a numerical linear algebra technique for jointly approximating the quadratic form and log-determinant terms arising in kernel-based FIR identification via a single augmented Krylov subspace. The central claims (accelerated convergence for the quadratic term via implicit preconditioning, shift-invariance for multiple λ, and error bounds) are supported by analysis of Krylov subspace properties and explicit error derivations rather than by redefining inputs in terms of outputs or by load-bearing self-citations. The augmentation vector choice and its interaction with the PSD matrix A are part of the proposed method, not assumed by construction. No steps reduce the claimed benefits to fitted parameters or prior self-referential results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on standard properties of Krylov subspaces and the assumption that fast matrix-vector products with A are available. No new entities are postulated and no parameters are fitted inside the central approximation claim.

axioms (2)
  • standard math Krylov subspaces generated by A are shift-invariant
    Invoked to allow reuse of the same subspace for many values of the regularization parameter lambda at little extra cost.
  • domain assumption Augmented Krylov subspaces provide implicit preconditioning for quadratic forms
    Used to claim accelerated convergence for the data-fit term.

pith-pipeline@v0.9.0 · 5518 in / 1389 out tokens · 38385 ms · 2026-05-12T01:11:56.761426+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

  1. [1]

    Springer, London (2010)

    Naylor, P.A., Gaubitch, N.D.: Speech Dereverberation. Springer, London (2010). https://doi.org/10.1007/978-1-84996-056-4 17

  2. [2]

    In: Signal Analysis and Prediction, pp

    Ljung, L.: System Identification. In: Signal Analysis and Prediction, pp. 163–173. Birkh¨ auser Boston, Boston, MA (1998). https://doi.org/10.1007/ 978-1-4612-1768-8 11

  3. [3]

    Andersson, T., Pucar, P.: Estimation of residence time in continuous flow systems with dynamics. J. Process Control5(1), 9–17 (1995) https://doi.org/10.1016/ 0959-1524(95)95941-6

  4. [4]

    Automatica48(8), 1525–1535 (2012) https://doi.org/10.1016/j.automatica.2012.05.026

    Chen, T., Ohlsson, H., Ljung, L.: On the estimation of transfer functions, regular- izations and Gaussian processes—Revisited. Automatica48(8), 1525–1535 (2012) https://doi.org/10.1016/j.automatica.2012.05.026

  5. [5]

    Herzog, T

    Pillonetto, G., De Nicolao, G.: A new kernel-based approach for linear sys- tem identification. Automatica46(1), 81–93 (2010) https://doi.org/10.1016/j. automatica.2009.10.031

  6. [6]

    IEEE Control Syst

    Suzuki, R., Oomen, T., Gonz´ alez, R.A.: Direct Bayesian identification of inverse linear systems. IEEE Control Syst. Lett.9, 1478–1483 (2025) https://doi.org/10. 1109/LCSYS.2025.3578480

  7. [7]

    Chung, J., Miller, S.M., Sabate Landman, M., Saibaba, A.K.: Efficient hyperpa- rameter estimation in Bayesian inverse problems using sample average approxi- mation. Philos. Trans. Roy. Soc. A383(2305), 20240046 (2025) https://doi.org/ 10.1098/rsta.2024.0046

  8. [8]

    Chung, J., Saibaba, A.K.: Generalized hybrid iterative methods for large-scale bayesian inverse problems. SIAM J. Sci. Comput.39(5), 24–46 (2017) https: //doi.org/10.1137/16M1081968

  9. [9]

    IEEE Trans

    Chen, L., Chen, T., Andersen, M.S.: Fast kernel-based regularized system iden- tification using Bayesian optimization. IEEE Trans. Autom. Control71(4), 2200–2212 (2026) https://doi.org/10.1109/TAC.2025.3624192

  10. [10]

    In: Proc

    Chen, L., Chen, T., Detha, U., Andersen, M.S.: Towards scalable kernel-based reg- ularized system identification. In: Proc. 62nd IEEE CDC, pp. 1498–1504 (2023). https://doi.org/10.1109/CDC49753.2023.10384051

  11. [11]

    In: Handbook of Uncertainty Quantification, pp

    Dashti, M., Stuart, A.M.: The Bayesian Approach to Inverse Problems. In: Handbook of Uncertainty Quantification, pp. 311–428. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-12385-1 7

  12. [12]

    Ubaru, S., Chen, J., Saad, Y.: Fast estimation of tr(f(A)) via stochastic Lanczos quadrature. SIAM J. Matrix Anal. Appl.38(4), 1075–1099 (2017) https://doi. org/10.1137/16M1104974

  13. [13]

    Andersen, M.S., Chen, T.: Smoothing splines and rank structured matrices: Revisiting the spline kernel. SIAM J. Matrix Anal. Appl.41(2), 389–412 (2020) 18 https://doi.org/10.1137/19M1267349

  14. [14]

    ACM Trans

    Paige, C.C., Saunders, M.A.: LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software8(1), 43–71 (1982) https: //doi.org/10.1145/355984.355989

  15. [15]

    Frangella, Z., Tropp, J.A., Udell, M.: Randomized Nystr¨ om preconditioning. SIAM J. Matrix Anal. Appl.44(2), 718–752 (2023) https://doi.org/10.1137/ 21M1466244

  16. [16]

    Hutchinson, M.F.: A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines. Comm. Statist. Simulation Comput.19(2), 433–450 (1990) https://doi.org/10.1080/03610919008812866

  17. [17]

    Girard, D.A.: A fast ‘Monte-Carlo cross-validation’ procedure for large least squares problems with noisy data. Numer. Math.56(1), 1–23 (1989) https: //doi.org/10.1007/BF01395775

  18. [18]

    Johns Hopkins University Press, Baltimore, MD (2013)

    Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins University Press, Baltimore, MD (2013). https://doi.org/10.1137/1.9781421407944

  19. [19]

    Zhou, Y., Saad, Y.: Block Krylov–Schur method for large symmetric eigenvalue problems. Numer. Algorithms47(4), 341–359 (2008) https://doi.org/10.1007/ s11075-008-9192-9

  20. [20]

    Princeton University Press, Princeton, NJ (2009)

    Golub, G.H., Meurant, G.: Matrices, Moments and Quadrature with Applications. Princeton University Press, Princeton, NJ (2009)

  21. [21]

    Bit Numer

    Li, H., Zhu, Y.: Randomized block Krylov subspace methods for trace and log- determinant estimators. Bit Numer. Math.61(3), 911–939 (2021) https://doi. org/10.1007/s10543-021-00850-7

  22. [22]

    Cortinovis, A., Kressner, D.: On randomized trace estimates for indefinite matri- ces with an application to determinants. Found. Comput. Math.22(3), 875–903 (2022) https://doi.org/10.1007/s10208-021-09525-9

  23. [23]

    Saibaba, A.K., Alexanderian, A., Ipsen, I.C.F.: Randomized matrix-free trace and log-determinant estimators. Numer. Math.137(2), 353–395 (2017) https: //doi.org/10.1007/s00211-017-0880-z

  24. [24]

    Lin, L.: Randomized estimation of spectral densities of large matrices made accurate. Numer. Math.136(1), 183–213 (2017) https://doi.org/10.1007/ s00211-016-0837-7

  25. [25]

    In: Proc

    Meyer, R.A., Musco, C., Musco, C., Woodruff, D.P.: Hutch++: Optimal stochas- tic trace estimation. In: Proc. 2021 Symp. Simpl. Algorithms, pp. 142–155. SIAM, Philadelphia, PA (2021). https://doi.org/10.1137/1.9781611976496.16 19

  26. [26]

    , title =

    Higham, N.J.: Functions of Matrices. SIAM, Philadelphia, PA (2008). https:// doi.org/10.1137/1.9780898717778

  27. [27]

    Matti, F., He, H., Kressner, D., Lam, H.Y.: Stochastic trace estimation for parameter-dependent matrices applied to spectral density approximation. SIAM J. Sci. Comput. (2026) https://doi.org/10.48550/arXiv.2502.18626 . to appear

  28. [28]

    Greenbaum, A.: Iterative Methods for Solving Linear Systems vol. 17. SIAM, Philadelphia, PA (1997)

  29. [29]

    Linear Algebra Appl.29, 293–322 (1980) https://doi.org/10.1016/0024-3795(80) 90247-5

    O’Leary, D.P.: The block conjugate gradient algorithm and related methods. Linear Algebra Appl.29, 293–322 (1980) https://doi.org/10.1016/0024-3795(80) 90247-5

  30. [30]

    ETNA, 63–92 (2026) https: //doi.org/10.1553/etna vol65s63

    Chen, T., Huber, C., Lin, E., Zaid, H.: Preconditioning without a preconditioner using randomized block Krylov subspace methods. ETNA, 63–92 (2026) https: //doi.org/10.1553/etna vol65s63

  31. [31]

    Cambridge University Press, Cambridge (1985)

    Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1985). https://doi.org/10.1017/CBO9780511810817

  32. [32]

    MathWorks Incorporated Natick, Online (1995) 20

    Ljung, L.: System Identification Toolbox: User’s Guide. MathWorks Incorporated Natick, Online (1995) 20