pith. sign in

arxiv: 2505.09552 · v3 · pith:HB7OYOC5new · submitted 2025-05-14 · 📊 stat.ME · cs.LG· stat.ML

Scalable Krylov Subspace Methods for Generalized Mixed-Effects Models with Crossed Random Effects

Pith reviewed 2026-05-22 15:10 UTC · model grok-4.3

classification 📊 stat.ME cs.LGstat.ML
keywords mixed-effects modelscrossed random effectsKrylov subspace methodsstochastic Lanczos quadratureconjugate gradientpredictive variancescomputational statistics
0
0 comments X

The pith

Krylov subspace methods replace Cholesky decompositions to fit generalized mixed-effects models with high-dimensional crossed random effects up to 10,000 times faster while remaining numerically stable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Krylov subspace techniques, specifically preconditioned stochastic Lanczos quadrature and conjugate gradient methods, to solve the large linear systems that arise when fitting generalized mixed-effects models containing crossed random effects. These structures make standard Cholesky-based computations prohibitively expensive for high-cardinality factors, but the new solvers exploit matrix properties to accelerate both parameter estimation and predictive variance calculations. Theoretical results establish convergence rates and accuracy bounds tailored to this model class. Empirical tests on simulated and real data confirm the promised speed and stability gains.

Core claim

Preconditioned stochastic Lanczos quadrature and conjugate gradient methods, when applied to the linear algebra problems in generalized mixed-effects models with crossed random effects, converge rapidly enough to deliver accurate estimates and predictive variances at scales where Cholesky factorizations fail, with the methods shown to be both faster and more numerically stable in practice.

What carries the argument

Preconditioned stochastic Lanczos quadrature and conjugate gradient solvers applied to the sparse linear systems induced by crossed random effects.

If this is right

  • Model fitting becomes feasible for datasets whose crossed factors would previously require hours or days of computation.
  • Predictive variances can now be obtained at the same scale without additional prohibitive cost.
  • Numerical stability improves for variance-component estimates that previously suffered from rounding errors in large Cholesky factors.
  • The same solvers apply to any generalized linear mixed model whose design matrices produce similar sparse Kronecker-type structures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same Krylov framework could be adapted to spatial or network models that also involve large crossed or overlapping grouping factors.
  • Software packages for mixed modeling could incorporate these solvers as drop-in replacements once the linear-system interface is exposed.
  • Extension to non-Gaussian responses beyond the tested cases would follow the same preconditioning strategy if the working-response linearization preserves sparsity.

Load-bearing premise

The convergence and accuracy guarantees derived for the preconditioned Krylov methods transfer directly to the matrix structures created by crossed random effects.

What would settle it

A benchmark dataset with crossed factors at cardinality 10,000 or higher where the proposed solvers either exceed one second of runtime or produce parameter estimates differing from a stable reference solution by more than machine precision.

read the original abstract

Mixed-effects models are widely used to model data with hierarchical grouping structures and high-cardinality categorical predictor variables. However, for high-dimensional crossed random effects, current standard computations relying on Cholesky decompositions can become prohibitively slow. In this work, we present Krylov subspace-based methods that address existing computational bottlenecks, and we analyze them both theoretically and empirically. In particular, we derive new results on the convergence and accuracy of the preconditioned stochastic Lanczos quadrature and conjugate gradient methods for mixed-effects models, and we develop scalable methods for calculating predictive variances. In experiments with simulated and real-world data, the proposed methods yield speedups by factors of up to about 10,000 and are numerically more stable than Cholesky-based computations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces Krylov subspace methods—specifically preconditioned stochastic Lanczos quadrature and conjugate gradient—for scalable fitting and inference in generalized mixed-effects models with crossed random effects. It derives new theoretical results on convergence and accuracy of these methods, develops scalable approaches for predictive variances, and reports empirical speedups of up to approximately 10,000 times relative to Cholesky-based methods along with improved numerical stability.

Significance. If the convergence and accuracy guarantees are shown to hold under the iteration-dependent matrices arising in generalized models, the work would offer a meaningful advance for computational statistics. It targets a practical bottleneck in high-dimensional crossed random effects, where standard methods scale poorly, and the combination of new theory with large reported speedups and stability gains could enable routine analysis of previously intractable datasets.

major comments (1)
  1. Theoretical results section: The new convergence and accuracy results for preconditioned stochastic Lanczos quadrature and CG are derived for the linear algebra structure of (Gaussian) mixed models with a fixed covariance or information matrix. In generalized models the effective matrix is reweighted at each outer iteration (via IRLS or Laplace approximation), so the spectral bounds, preconditioner quality, and quadrature error controls must be re-established for the sequence of matrices. The manuscript should provide an explicit argument or additional bounds showing that the claimed guarantees survive this dependence; without it the accuracy and stability claims for the generalized case rest on an unverified extension.
minor comments (1)
  1. Abstract: The phrase 'speedups by factors of up to about 10,000' would benefit from a brief qualifier indicating the model dimensions, number of crossed factors, and hardware on which the largest factor was observed.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We appreciate the emphasis on ensuring the theoretical results extend properly to generalized models. Below we address the major comment point by point and outline the revisions we will make.

read point-by-point responses
  1. Referee: Theoretical results section: The new convergence and accuracy results for preconditioned stochastic Lanczos quadrature and CG are derived for the linear algebra structure of (Gaussian) mixed models with a fixed covariance or information matrix. In generalized models the effective matrix is reweighted at each outer iteration (via IRLS or Laplace approximation), so the spectral bounds, preconditioner quality, and quadrature error controls must be re-established for the sequence of matrices. The manuscript should provide an explicit argument or additional bounds showing that the claimed guarantees survive this dependence; without it the accuracy and stability claims for the generalized case rest on an unverified extension.

    Authors: We agree that an explicit extension of the guarantees is necessary for the generalized case. The derivations in the theoretical results section assume a fixed matrix, but in the generalized setting each outer iteration produces a new positive definite matrix of the form A_k = X^T W_k X + Z^T W_k Z + Lambda, where W_k is a diagonal matrix of weights that depends on the current estimates. For standard generalized linear models, the weights satisfy 0 < w_{k,i} <= c for some small constant c (e.g., c=1/4 in the logistic case). This implies that the eigenvalues of A_k are bounded above and below by constants that are independent of k and depend only on the design matrices and the prior precision Lambda. Therefore, the condition number and the spectral distribution remain controlled uniformly, allowing the same convergence bounds for CG and the same error controls for stochastic Lanczos quadrature to apply at each iteration. The preconditioner, constructed from a fixed low-rank or diagonal approximation, continues to be effective because the variation is confined to the bounded weights. We will revise the manuscript by adding a dedicated paragraph or subsection that states these uniform bounds and confirms that the claimed accuracy and stability results extend to the generalized models. We will also include a short numerical check verifying that the iteration counts remain stable across outer iterations in our experiments. revision: yes

Circularity Check

0 steps flagged

No circularity: standard numerical linear algebra applied to model covariance without self-referential reductions

full rationale

The paper derives new convergence and accuracy results for preconditioned stochastic Lanczos quadrature and conjugate gradient methods applied to the linear algebra structure of mixed-effects models with crossed random effects. These results rest on established Krylov subspace theory adapted to the quadratic forms and information matrices arising from the model, with empirical speedups and stability validated on simulated and real data. No derivation step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the central claims remain independent of the inputs and are supported by external numerical linear algebra foundations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.0 · 5658 in / 1028 out tokens · 29066 ms · 2026-05-22T15:10:11.578227+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    We use the preconditioned CG and SLQ methods to solve sparse linear systems and to calculate log-determinants in log-likelihoods, respectively. ... derive novel results on extremal eigenvalues, condition numbers, and thus the convergence behavior of the SLQ method for the symmetric successive over-relaxation (SSOR) and diagonal preconditioners

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Laplace Approximations for Mixed-Effects and Gaussian Process Quantile Regression

    stat.ME 2026-05 unverdicted novelty 6.0

    Laplace approximation framework for quantile regression with mixed-effects and Gaussian processes using Fisher information and population curvature of expected loss instead of observed Hessian.