Scalable Krylov Subspace Methods for Generalized Mixed-Effects Models with Crossed Random Effects
Pith reviewed 2026-05-22 15:10 UTC · model grok-4.3
The pith
Krylov subspace methods replace Cholesky decompositions to fit generalized mixed-effects models with high-dimensional crossed random effects up to 10,000 times faster while remaining numerically stable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Preconditioned stochastic Lanczos quadrature and conjugate gradient methods, when applied to the linear algebra problems in generalized mixed-effects models with crossed random effects, converge rapidly enough to deliver accurate estimates and predictive variances at scales where Cholesky factorizations fail, with the methods shown to be both faster and more numerically stable in practice.
What carries the argument
Preconditioned stochastic Lanczos quadrature and conjugate gradient solvers applied to the sparse linear systems induced by crossed random effects.
If this is right
- Model fitting becomes feasible for datasets whose crossed factors would previously require hours or days of computation.
- Predictive variances can now be obtained at the same scale without additional prohibitive cost.
- Numerical stability improves for variance-component estimates that previously suffered from rounding errors in large Cholesky factors.
- The same solvers apply to any generalized linear mixed model whose design matrices produce similar sparse Kronecker-type structures.
Where Pith is reading between the lines
- The same Krylov framework could be adapted to spatial or network models that also involve large crossed or overlapping grouping factors.
- Software packages for mixed modeling could incorporate these solvers as drop-in replacements once the linear-system interface is exposed.
- Extension to non-Gaussian responses beyond the tested cases would follow the same preconditioning strategy if the working-response linearization preserves sparsity.
Load-bearing premise
The convergence and accuracy guarantees derived for the preconditioned Krylov methods transfer directly to the matrix structures created by crossed random effects.
What would settle it
A benchmark dataset with crossed factors at cardinality 10,000 or higher where the proposed solvers either exceed one second of runtime or produce parameter estimates differing from a stable reference solution by more than machine precision.
read the original abstract
Mixed-effects models are widely used to model data with hierarchical grouping structures and high-cardinality categorical predictor variables. However, for high-dimensional crossed random effects, current standard computations relying on Cholesky decompositions can become prohibitively slow. In this work, we present Krylov subspace-based methods that address existing computational bottlenecks, and we analyze them both theoretically and empirically. In particular, we derive new results on the convergence and accuracy of the preconditioned stochastic Lanczos quadrature and conjugate gradient methods for mixed-effects models, and we develop scalable methods for calculating predictive variances. In experiments with simulated and real-world data, the proposed methods yield speedups by factors of up to about 10,000 and are numerically more stable than Cholesky-based computations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Krylov subspace methods—specifically preconditioned stochastic Lanczos quadrature and conjugate gradient—for scalable fitting and inference in generalized mixed-effects models with crossed random effects. It derives new theoretical results on convergence and accuracy of these methods, develops scalable approaches for predictive variances, and reports empirical speedups of up to approximately 10,000 times relative to Cholesky-based methods along with improved numerical stability.
Significance. If the convergence and accuracy guarantees are shown to hold under the iteration-dependent matrices arising in generalized models, the work would offer a meaningful advance for computational statistics. It targets a practical bottleneck in high-dimensional crossed random effects, where standard methods scale poorly, and the combination of new theory with large reported speedups and stability gains could enable routine analysis of previously intractable datasets.
major comments (1)
- Theoretical results section: The new convergence and accuracy results for preconditioned stochastic Lanczos quadrature and CG are derived for the linear algebra structure of (Gaussian) mixed models with a fixed covariance or information matrix. In generalized models the effective matrix is reweighted at each outer iteration (via IRLS or Laplace approximation), so the spectral bounds, preconditioner quality, and quadrature error controls must be re-established for the sequence of matrices. The manuscript should provide an explicit argument or additional bounds showing that the claimed guarantees survive this dependence; without it the accuracy and stability claims for the generalized case rest on an unverified extension.
minor comments (1)
- Abstract: The phrase 'speedups by factors of up to about 10,000' would benefit from a brief qualifier indicating the model dimensions, number of crossed factors, and hardware on which the largest factor was observed.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive comments on our manuscript. We appreciate the emphasis on ensuring the theoretical results extend properly to generalized models. Below we address the major comment point by point and outline the revisions we will make.
read point-by-point responses
-
Referee: Theoretical results section: The new convergence and accuracy results for preconditioned stochastic Lanczos quadrature and CG are derived for the linear algebra structure of (Gaussian) mixed models with a fixed covariance or information matrix. In generalized models the effective matrix is reweighted at each outer iteration (via IRLS or Laplace approximation), so the spectral bounds, preconditioner quality, and quadrature error controls must be re-established for the sequence of matrices. The manuscript should provide an explicit argument or additional bounds showing that the claimed guarantees survive this dependence; without it the accuracy and stability claims for the generalized case rest on an unverified extension.
Authors: We agree that an explicit extension of the guarantees is necessary for the generalized case. The derivations in the theoretical results section assume a fixed matrix, but in the generalized setting each outer iteration produces a new positive definite matrix of the form A_k = X^T W_k X + Z^T W_k Z + Lambda, where W_k is a diagonal matrix of weights that depends on the current estimates. For standard generalized linear models, the weights satisfy 0 < w_{k,i} <= c for some small constant c (e.g., c=1/4 in the logistic case). This implies that the eigenvalues of A_k are bounded above and below by constants that are independent of k and depend only on the design matrices and the prior precision Lambda. Therefore, the condition number and the spectral distribution remain controlled uniformly, allowing the same convergence bounds for CG and the same error controls for stochastic Lanczos quadrature to apply at each iteration. The preconditioner, constructed from a fixed low-rank or diagonal approximation, continues to be effective because the variation is confined to the bounded weights. We will revise the manuscript by adding a dedicated paragraph or subsection that states these uniform bounds and confirms that the claimed accuracy and stability results extend to the generalized models. We will also include a short numerical check verifying that the iteration counts remain stable across outer iterations in our experiments. revision: yes
Circularity Check
No circularity: standard numerical linear algebra applied to model covariance without self-referential reductions
full rationale
The paper derives new convergence and accuracy results for preconditioned stochastic Lanczos quadrature and conjugate gradient methods applied to the linear algebra structure of mixed-effects models with crossed random effects. These results rest on established Krylov subspace theory adapted to the quadratic forms and information matrices arising from the model, with empirical speedups and stability validated on simulated and real data. No derivation step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the central claims remain independent of the inputs and are supported by external numerical linear algebra foundations.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We use the preconditioned CG and SLQ methods to solve sparse linear systems and to calculate log-determinants in log-likelihoods, respectively. ... derive novel results on extremal eigenvalues, condition numbers, and thus the convergence behavior of the SLQ method for the symmetric successive over-relaxation (SSOR) and diagonal preconditioners
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Laplace Approximations for Mixed-Effects and Gaussian Process Quantile Regression
Laplace approximation framework for quantile regression with mixed-effects and Gaussian processes using Fisher information and population curvature of expected loss instead of observed Hessian.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.