arxiv: 2605.08362 · v1 · submitted 2026-05-08 · 🧮 math.NA · cs.NA

Recognition: no theorem link

Kernel-based linear system identification using augmented Krylov subspaces

Daniel Kressner, Fabio Matti, Martin Skovgaard Andersen, Tianshi Chen

Pith reviewed 2026-05-12 01:11 UTC · model grok-4.3

classification 🧮 math.NA cs.NA

keywords Krylov subspace methodsaugmented subspaceskernel-based identificationFIR estimationmaximum likelihood estimationlinear time-invariant systemsnumerical linear algebra

0 comments

The pith

A single augmented Krylov subspace jointly approximates both the data-fit quadratic form and the log-determinant needed for kernel-based FIR identification, yielding accelerated convergence and low-cost reuse across regularization values.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Krylov method that estimates the finite impulse response of linear time-invariant systems by combining a kernel formulation with maximum-likelihood hyperparameter selection. This selection requires repeated computation of a quadratic form measuring data fit and the log-determinant measuring model complexity, both depending on a fixed positive-semidefinite matrix A that supports fast matrix-vector products. Rather than approximating the two quantities independently, the method builds one augmented Krylov subspace for A and extracts both approximations from it. Shift invariance of the subspace then permits cheap evaluation of the entire maximum-likelihood objective for many values of the regularization parameter. Error bounds are derived that quantify the benefit of augmentation, and numerical tests on system-identification problems confirm faster convergence for the quadratic term.

Core claim

Instead of approximating the data-fit term y^T (λI + A)^{-1} y and the model-complexity term log det(λI + A) separately, both quantities are obtained from a single augmented Krylov subspace generated by A. Augmentation supplies implicit preconditioning that accelerates convergence of the quadratic-form approximation, while the shift-invariance property of Krylov subspaces allows the extracted information to be reused for many regularization parameters λ at negligible extra cost. Explicit error bounds are proved that reflect these advantages.

What carries the argument

An augmented Krylov subspace for the positive-semidefinite matrix A that arises from the kernel-based FIR formulation; the augmentation jointly supplies approximations to the quadratic form and the log-determinant while enabling reuse across shifts λ.

If this is right

The same subspace information yields accurate maximum-likelihood objectives for many values of λ with almost no additional arithmetic.
Error bounds are available that incorporate the acceleration gained from augmentation.
The overall procedure reduces the cost of hyperparameter tuning inside kernel-based linear-system identification.
Because only matrix-vector products with A are required, the method scales to problems where forming A explicitly is prohibitive.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same joint-approximation idea could be tested on other kernel-regularized estimation problems whose objectives involve repeated quadratic forms and log-determinants.
Implementation in real-time or embedded identification settings would benefit from the cheap λ-reuse property.
Combining the augmented subspace with existing preconditioners for A might produce further speed-ups that the paper does not explore.

Load-bearing premise

The specific positive-semidefinite matrices A that appear in kernel-based FIR identification must admit fast matrix-vector products and must produce the claimed acceleration and error bounds when augmented.

What would settle it

A numerical experiment on a standard FIR identification benchmark in which the augmented Krylov method shows no faster convergence (or larger errors) for the quadratic form than a non-augmented Krylov method of the same dimension.

read the original abstract

We propose a novel Krylov subspace method for estimating the finite impulse response (FIR) of a one-dimensional linear time-invariant systems. The method approximates the system's FIR using a kernel-based formulation combined with hyperparameter selection based on maximum likelihood estimation (MLE), which requires repeated evaluation of two terms: The data fit $\boldsymbol{y}^{\top} (\lambda \boldsymbol{I} + \boldsymbol{A})^{-1} \boldsymbol{y}$ and the model complexity $\log(\det (\lambda \boldsymbol{I} + \boldsymbol{A}))$, where $\boldsymbol{A}$ is a certain positive semidefinite matrix that admits fast matrix--vector products and $\lambda > 0$ is a regularization parameter. Instead of approximating these two quantities separately, we jointly approximate them using a single augmented Krylov subspace for $\boldsymbol{A}$. One major benefit of augmentation is that we obtain accelerated convergence when approximating the data fit quadratic form, through implicit preconditioning. Thanks to the shift invariance of Krylov subspaces, the extracted approximations can be used to evaluate the MLE objective for many values of $\lambda$ at little additional cost. We derive error bounds for the approximations, reflecting the benefits of augmentation demonstrated through multiple numerical experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows how one augmented Krylov subspace can jointly approximate both the quadratic data-fit term and the log-det term for MLE hyperparameter selection in kernel-based FIR identification, with faster convergence on the quadratic via implicit preconditioning and cheap reuse across many λ values.

read the letter

The core contribution is the joint approximation using a single augmented Krylov space for A instead of handling the quadratic form y^T (λI + A)^{-1} y and log det(λI + A) separately. Shift-invariance of the subspaces lets the same basis serve many regularization parameters at low extra cost, which is a practical win when you need to scan λ during MLE. They also derive error bounds that capture the augmentation effect and run numerical tests that show the expected speedups on FIR identification problems where A admits fast matvecs.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a Krylov subspace method for approximating the data-fit quadratic form y^T (λI + A)^{-1} y and log-determinant term log det(λI + A) arising in maximum-likelihood hyperparameter selection for kernel-based FIR linear system identification. A single augmented Krylov subspace for the positive-semidefinite matrix A is used to jointly approximate both quantities; shift-invariance of the subspace allows cheap evaluation for many λ, error bounds are derived, and numerical experiments are reported to illustrate accelerated convergence for the quadratic form via implicit preconditioning.

Significance. If the error bounds are valid and the augmentation consistently accelerates the quadratic-form approximation for the kernel-derived PSD matrices that arise in practice, the method offers a practical reduction in computational cost for repeated MLE evaluations in large-scale kernel-based system identification, while preserving the ability to sweep regularization parameters efficiently.

major comments (3)

[§2] §2 (method description): the augmentation vector is introduced without an explicit construction or justification showing why it produces implicit preconditioning for the specific spectrum of the kernel matrix A; the central acceleration claim therefore rests on an unverified interaction between the chosen vector and the eigenvalues of A.
[§3, Eq. (8)] §3, Eq. (8) and surrounding derivation: the error bound for the quadratic-form approximation is stated to reflect the benefit of augmentation, yet the proof does not quantify the improvement relative to a non-augmented Krylov subspace of the same dimension; without this comparison the bound does not yet establish that augmentation is load-bearing for the claimed acceleration.
[§4] §4 (numerical experiments): the reported timings and convergence curves compare the augmented method only against separate approximations, not against a standard (non-augmented) Krylov method using an equivalent total number of matrix-vector products; this leaves open whether the observed gains are due to augmentation or simply to joint evaluation.

minor comments (2)

[Abstract / §1] Notation for the kernel matrix A and the vector y is introduced late; a brief recap in the abstract or §1 would improve readability.
[§4] Several figure captions omit the precise dimensions of the test problems and the number of Monte-Carlo trials; these details should be added for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each of the three major comments below, indicating the revisions we will incorporate to strengthen the presentation while preserving the core contributions of the augmented Krylov approach.

read point-by-point responses

Referee: [§2] §2 (method description): the augmentation vector is introduced without an explicit construction or justification showing why it produces implicit preconditioning for the specific spectrum of the kernel matrix A; the central acceleration claim therefore rests on an unverified interaction between the chosen vector and the eigenvalues of A.

Authors: We agree that §2 would benefit from an explicit description of the augmentation vector and a clearer justification of its preconditioning effect. In the revised manuscript we will expand the method description to state the construction of the augmentation vector (chosen to align with the observation vector y) and provide a brief motivation based on the eigenvalue clustering typical of kernel-derived matrices A in FIR identification. This addition will make the implicit-preconditioning claim self-contained without altering the algorithmic procedure. revision: yes
Referee: [§3, Eq. (8)] §3, Eq. (8) and surrounding derivation: the error bound for the quadratic-form approximation is stated to reflect the benefit of augmentation, yet the proof does not quantify the improvement relative to a non-augmented Krylov subspace of the same dimension; without this comparison the bound does not yet establish that augmentation is load-bearing for the claimed acceleration.

Authors: The referee correctly notes that the existing derivation of the bound in Eq. (8) does not contain an explicit side-by-side comparison with the non-augmented Lanczos bound of identical dimension. We will therefore add a short corollary (or remark) immediately following the proof that relates the augmented bound to the classical one, showing that the extra vector can only tighten the estimate and quantifying the improvement in terms of the angle between the augmentation vector and the current residual. This revision will make the load-bearing role of augmentation explicit while leaving the original proof unchanged. revision: yes
Referee: [§4] §4 (numerical experiments): the reported timings and convergence curves compare the augmented method only against separate approximations, not against a standard (non-augmented) Krylov method using an equivalent total number of matrix-vector products; this leaves open whether the observed gains are due to augmentation or simply to joint evaluation.

Authors: We accept that the current experimental design does not isolate the contribution of augmentation from the benefit of joint evaluation. In the revised §4 we will include an additional set of convergence curves that compare the augmented subspace against a plain (non-augmented) Krylov subspace whose dimension is chosen so that the total number of matrix-vector products matches that of the augmented run. The new plots will demonstrate that the faster decay observed for the quadratic-form error is attributable to the augmentation step rather than to the mere fact of joint approximation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard Krylov theory and derived bounds

full rationale

The paper presents a numerical linear algebra technique for jointly approximating the quadratic form and log-determinant terms arising in kernel-based FIR identification via a single augmented Krylov subspace. The central claims (accelerated convergence for the quadratic term via implicit preconditioning, shift-invariance for multiple λ, and error bounds) are supported by analysis of Krylov subspace properties and explicit error derivations rather than by redefining inputs in terms of outputs or by load-bearing self-citations. The augmentation vector choice and its interaction with the PSD matrix A are part of the proposed method, not assumed by construction. No steps reduce the claimed benefits to fitted parameters or prior self-referential results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on standard properties of Krylov subspaces and the assumption that fast matrix-vector products with A are available. No new entities are postulated and no parameters are fitted inside the central approximation claim.

axioms (2)

standard math Krylov subspaces generated by A are shift-invariant
Invoked to allow reuse of the same subspace for many values of the regularization parameter lambda at little extra cost.
domain assumption Augmented Krylov subspaces provide implicit preconditioning for quadratic forms
Used to claim accelerated convergence for the data-fit term.

pith-pipeline@v0.9.0 · 5518 in / 1389 out tokens · 38385 ms · 2026-05-12T01:11:56.761426+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1]

Springer, London (2010)

Naylor, P.A., Gaubitch, N.D.: Speech Dereverberation. Springer, London (2010). https://doi.org/10.1007/978-1-84996-056-4 17

work page doi:10.1007/978-1-84996-056-4 2010
[2]

In: Signal Analysis and Prediction, pp

Ljung, L.: System Identification. In: Signal Analysis and Prediction, pp. 163–173. Birkh¨ auser Boston, Boston, MA (1998). https://doi.org/10.1007/ 978-1-4612-1768-8 11

work page 1998
[3]

Andersson, T., Pucar, P.: Estimation of residence time in continuous flow systems with dynamics. J. Process Control5(1), 9–17 (1995) https://doi.org/10.1016/ 0959-1524(95)95941-6

work page 1995
[4]

Automatica48(8), 1525–1535 (2012) https://doi.org/10.1016/j.automatica.2012.05.026

Chen, T., Ohlsson, H., Ljung, L.: On the estimation of transfer functions, regular- izations and Gaussian processes—Revisited. Automatica48(8), 1525–1535 (2012) https://doi.org/10.1016/j.automatica.2012.05.026

work page doi:10.1016/j.automatica.2012.05.026 2012
[5]

Herzog, T

Pillonetto, G., De Nicolao, G.: A new kernel-based approach for linear sys- tem identification. Automatica46(1), 81–93 (2010) https://doi.org/10.1016/j. automatica.2009.10.031

work page doi:10.1016/j 2010
[6]

IEEE Control Syst

Suzuki, R., Oomen, T., Gonz´ alez, R.A.: Direct Bayesian identification of inverse linear systems. IEEE Control Syst. Lett.9, 1478–1483 (2025) https://doi.org/10. 1109/LCSYS.2025.3578480

work page arXiv 2025
[7]

Chung, J., Miller, S.M., Sabate Landman, M., Saibaba, A.K.: Efficient hyperpa- rameter estimation in Bayesian inverse problems using sample average approxi- mation. Philos. Trans. Roy. Soc. A383(2305), 20240046 (2025) https://doi.org/ 10.1098/rsta.2024.0046

work page doi:10.1098/rsta.2024.0046 2025
[8]

Chung, J., Saibaba, A.K.: Generalized hybrid iterative methods for large-scale bayesian inverse problems. SIAM J. Sci. Comput.39(5), 24–46 (2017) https: //doi.org/10.1137/16M1081968

work page doi:10.1137/16m1081968 2017
[9]

IEEE Trans

Chen, L., Chen, T., Andersen, M.S.: Fast kernel-based regularized system iden- tification using Bayesian optimization. IEEE Trans. Autom. Control71(4), 2200–2212 (2026) https://doi.org/10.1109/TAC.2025.3624192

work page doi:10.1109/tac.2025.3624192 2026
[10]

In: Proc

Chen, L., Chen, T., Detha, U., Andersen, M.S.: Towards scalable kernel-based reg- ularized system identification. In: Proc. 62nd IEEE CDC, pp. 1498–1504 (2023). https://doi.org/10.1109/CDC49753.2023.10384051

work page doi:10.1109/cdc49753.2023.10384051 2023
[11]

In: Handbook of Uncertainty Quantification, pp

Dashti, M., Stuart, A.M.: The Bayesian Approach to Inverse Problems. In: Handbook of Uncertainty Quantification, pp. 311–428. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-12385-1 7

work page doi:10.1007/978-3-319-12385-1 2017
[12]

Ubaru, S., Chen, J., Saad, Y.: Fast estimation of tr(f(A)) via stochastic Lanczos quadrature. SIAM J. Matrix Anal. Appl.38(4), 1075–1099 (2017) https://doi. org/10.1137/16M1104974

work page doi:10.1137/16m1104974 2017
[13]

Andersen, M.S., Chen, T.: Smoothing splines and rank structured matrices: Revisiting the spline kernel. SIAM J. Matrix Anal. Appl.41(2), 389–412 (2020) 18 https://doi.org/10.1137/19M1267349

work page doi:10.1137/19m1267349 2020
[14]

ACM Trans

Paige, C.C., Saunders, M.A.: LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software8(1), 43–71 (1982) https: //doi.org/10.1145/355984.355989

work page doi:10.1145/355984.355989 1982
[15]

Frangella, Z., Tropp, J.A., Udell, M.: Randomized Nystr¨ om preconditioning. SIAM J. Matrix Anal. Appl.44(2), 718–752 (2023) https://doi.org/10.1137/ 21M1466244

work page 2023
[16]

Hutchinson, M.F.: A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines. Comm. Statist. Simulation Comput.19(2), 433–450 (1990) https://doi.org/10.1080/03610919008812866

work page doi:10.1080/03610919008812866 1990
[17]

Girard, D.A.: A fast ‘Monte-Carlo cross-validation’ procedure for large least squares problems with noisy data. Numer. Math.56(1), 1–23 (1989) https: //doi.org/10.1007/BF01395775

work page doi:10.1007/bf01395775 1989
[18]

Johns Hopkins University Press, Baltimore, MD (2013)

Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins University Press, Baltimore, MD (2013). https://doi.org/10.1137/1.9781421407944

work page doi:10.1137/1.9781421407944 2013
[19]

Zhou, Y., Saad, Y.: Block Krylov–Schur method for large symmetric eigenvalue problems. Numer. Algorithms47(4), 341–359 (2008) https://doi.org/10.1007/ s11075-008-9192-9

work page 2008
[20]

Princeton University Press, Princeton, NJ (2009)

Golub, G.H., Meurant, G.: Matrices, Moments and Quadrature with Applications. Princeton University Press, Princeton, NJ (2009)

work page 2009
[21]

Bit Numer

Li, H., Zhu, Y.: Randomized block Krylov subspace methods for trace and log- determinant estimators. Bit Numer. Math.61(3), 911–939 (2021) https://doi. org/10.1007/s10543-021-00850-7

work page doi:10.1007/s10543-021-00850-7 2021
[22]

Cortinovis, A., Kressner, D.: On randomized trace estimates for indefinite matri- ces with an application to determinants. Found. Comput. Math.22(3), 875–903 (2022) https://doi.org/10.1007/s10208-021-09525-9

work page doi:10.1007/s10208-021-09525-9 2022
[23]

Saibaba, A.K., Alexanderian, A., Ipsen, I.C.F.: Randomized matrix-free trace and log-determinant estimators. Numer. Math.137(2), 353–395 (2017) https: //doi.org/10.1007/s00211-017-0880-z

work page doi:10.1007/s00211-017-0880-z 2017
[24]

Lin, L.: Randomized estimation of spectral densities of large matrices made accurate. Numer. Math.136(1), 183–213 (2017) https://doi.org/10.1007/ s00211-016-0837-7

work page 2017
[25]

In: Proc

Meyer, R.A., Musco, C., Musco, C., Woodruff, D.P.: Hutch++: Optimal stochas- tic trace estimation. In: Proc. 2021 Symp. Simpl. Algorithms, pp. 142–155. SIAM, Philadelphia, PA (2021). https://doi.org/10.1137/1.9781611976496.16 19

work page doi:10.1137/1.9781611976496.16 2021
[26]

, title =

Higham, N.J.: Functions of Matrices. SIAM, Philadelphia, PA (2008). https:// doi.org/10.1137/1.9780898717778

work page doi:10.1137/1.9780898717778 2008
[27]

Matti, F., He, H., Kressner, D., Lam, H.Y.: Stochastic trace estimation for parameter-dependent matrices applied to spectral density approximation. SIAM J. Sci. Comput. (2026) https://doi.org/10.48550/arXiv.2502.18626 . to appear

work page doi:10.48550/arxiv.2502.18626 2026
[28]

Greenbaum, A.: Iterative Methods for Solving Linear Systems vol. 17. SIAM, Philadelphia, PA (1997)

work page 1997
[29]

Linear Algebra Appl.29, 293–322 (1980) https://doi.org/10.1016/0024-3795(80) 90247-5

O’Leary, D.P.: The block conjugate gradient algorithm and related methods. Linear Algebra Appl.29, 293–322 (1980) https://doi.org/10.1016/0024-3795(80) 90247-5

work page doi:10.1016/0024-3795(80 1980
[30]

ETNA, 63–92 (2026) https: //doi.org/10.1553/etna vol65s63

Chen, T., Huber, C., Lin, E., Zaid, H.: Preconditioning without a preconditioner using randomized block Krylov subspace methods. ETNA, 63–92 (2026) https: //doi.org/10.1553/etna vol65s63

work page doi:10.1553/etna 2026
[31]

Cambridge University Press, Cambridge (1985)

Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1985). https://doi.org/10.1017/CBO9780511810817

work page doi:10.1017/cbo9780511810817 1985
[32]

MathWorks Incorporated Natick, Online (1995) 20

Ljung, L.: System Identification Toolbox: User’s Guide. MathWorks Incorporated Natick, Online (1995) 20

work page 1995