Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes

Fabio Sigrist; Reinhard Furrer; Tim Gyger

arxiv: 2507.05064 · v4 · pith:YFN26XWMnew · submitted 2025-07-07 · 📊 stat.ML · cs.LG · stat.ME

Pith reviewed 2026-05-25 07:42 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME

keywords Gaussian processes · Vecchia approximation · inducing points · scalable approximations · non-Gaussian likelihoods · Laplace approximation · preconditioners · numerical stability

The pith

Vecchia-inducing-points full-scale approximations combine inducing points and Vecchia methods to scale Gaussian processes to large datasets with improved accuracy and stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Vecchia-inducing-points full-scale (VIF) approximations to make Gaussian processes scale to large data sets. VIF combines global inducing points, which work well for high-dimensional inputs and smooth covariance functions, with local Vecchia approximations, which suit low-dimensional inputs and moderately smooth covariance functions. The key innovation is an efficient correlation-based strategy for finding neighbors in the Vecchia approximation of the residual process, implemented with a modified cover tree. The method extends to non-Gaussian likelihoods through iterative methods and new preconditioners that, under a Laplace approximation, cut training and prediction costs by several orders of magnitude relative to Cholesky-based computations. Numerical experiments on simulated and real-world data indicate that VIF is computationally efficient and more accurate and numerically stable than current state-of-the-art methods.

Core claim

VIF approximations bridge the regimes of inducing point and Vecchia methods by using inducing points for the main process and Vecchia for the residual, with correlation-based neighbor finding, resulting in computationally efficient, accurate, and stable approximations for both Gaussian and non-Gaussian likelihoods.

What carries the argument

The VIF approximation, which pairs inducing points with a Vecchia approximation on the residual process via correlation-based neighbor search implemented with a modified cover tree algorithm.
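To make the pairing concrete, the following is a minimal NumPy sketch under stated assumptions, not the GPBoost implementation. It forms the residual covariance left after a predictive-process (inducing-point) part, picks conditioning sets by largest absolute residual correlation, and builds the Vecchia factors. The function names, the 3/2-Matérn kernel with a fixed length scale, the given row ordering, and the brute-force O(n²) neighbor search (standing in for the paper's modified cover tree) are all illustrative assumptions.

```python
import numpy as np

def matern32(X1, X2, length_scale=0.3, variance=1.0):
    """3/2-Matérn covariance between the rows of X1 and X2 (assumed kernel)."""
    d = np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1))
    a = np.sqrt(3.0) * d / length_scale
    return variance * (1.0 + a) * np.exp(-a)

def residual_covariance(X, Z, jitter=1e-8):
    """Split the covariance into a low-rank inducing-point part and a residual."""
    K_mm = matern32(Z, Z) + jitter * np.eye(len(Z))
    K_mn = matern32(Z, X)
    low_rank = K_mn.T @ np.linalg.solve(K_mm, K_mn)
    return matern32(X, X) - low_rank, low_rank

def correlation_neighbors(R, m_v):
    """For each point i, pick up to m_v earlier points with the largest
    absolute correlation under R (brute force, hypothetical helper)."""
    sd = np.sqrt(np.diag(R))
    corr = R / np.outer(sd, sd)
    neighbors = []
    for i in range(R.shape[0]):
        order = np.argsort(-np.abs(corr[i, :i]))
        neighbors.append(np.sort(order[:m_v]))
    return neighbors

def vecchia_factor(R, neighbors):
    """Return (B, D) such that inv(R) is approximated by B.T @ diag(1/D) @ B."""
    n = R.shape[0]
    B = np.eye(n)
    D = np.diag(R).copy()
    for i, N in enumerate(neighbors):
        if len(N) == 0:
            continue
        # Regression of point i on its conditioning set N
        A = np.linalg.solve(R[np.ix_(N, N)], R[N, i])
        B[i, N] = -A
        D[i] = R[i, i] - A @ R[N, i]  # conditional variance given N
    return B, D
```

When every earlier point is used as a neighbor, the factorization reproduces the residual covariance exactly, which gives a simple sanity check. Restricting each conditioning set to m_v points is what makes B sparse.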

If this is right

Enables handling of both low- and high-dimensional inputs effectively.
Reduces computational costs for non-Gaussian likelihoods by several orders of magnitude using iterative methods.
Provides theoretical convergence results for the preconditioners in Laplace approximations.
Shows superior performance in experiments on simulated and real-world datasets.
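The iterative-methods item above rests on solving linear systems with matrix-vector products only, instead of a Cholesky factorization. As a rough illustration, here is a generic preconditioned conjugate gradient solver in NumPy. The function name `pcg` and its arguments are hypothetical, and the diagonal (Jacobi) preconditioner mentioned in the usage note is a stand-in chosen for brevity; it is not one of the preconditioners proposed in the paper.

```python
import numpy as np

def pcg(matvec, b, precond_solve, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    matvec(v) returns A @ v; precond_solve(r) applies the inverse
    of the preconditioner to r. Returns (x, number_of_iterations).
    """
    x = np.zeros_like(b, dtype=float)
    b_norm = np.linalg.norm(b)
    if b_norm == 0.0:
        return x, 0
    r = b - matvec(x)
    z = precond_solve(r)
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        # Stop once the relative residual is small enough
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k + 1
        z = precond_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter
```

For a dense symmetric positive definite matrix `A`, a call could look like `x, iters = pcg(lambda v: A @ v, b, lambda r: r / np.diag(A))`. A better preconditioner lowers the iteration count, which is where preconditioner design pays off.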

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may allow Gaussian processes to be applied to even larger problems in fields like spatial statistics or machine learning.
Novel preconditioners could be adapted for other scalable GP methods.
Further testing in extreme regimes might highlight when the neighbor-finding strategy needs adjustment.

Load-bearing premise

The correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process works reliably across various input dimensions and covariance smoothness levels.

What would settle it

If experiments on data with higher input dimensions or less smooth covariance functions than those tested show reduced accuracy or instability compared to alternatives, that would challenge the central claim.

Figures

Figures reproduced from arXiv: 2507.05064 by Fabio Sigrist, Reinhard Furrer, Tim Gyger.

**Figure 2.** Results when comparing the VIF, FITC, and Vecchia approximations for varying input dimensions d for a 3/2-Matérn kernel. As expected, we find that the Vecchia approximation is very accurate for low-dimensional inputs. However, the accuracy of the Vecchia approximation declines relatively quickly with increasing dimension d, and the FITC approximation is considerably more accurate for large dimens…

**Figure 3.** RMSE, log-score (LS), and CRPS (mean ± 2 standard errors) for VIF (m_v = 30 and m = 200), FITC (m = 200), and Vecchia (m_v = 30) approximations for 1/2-Matérn, 3/2-Matérn, 5/2-Matérn, and Gaussian (∞-Matérn) ARD kernels for d = 10. Section 7.2 (Comparison of preconditioners): For all subsequent experiments, unless stated otherwise, we generate 100’000 samples from a zero-mean Gaussian process with five-dimensional inputs…

**Figure 4.** Differences of iterative-methods-based log-marginal likelihoods compared to Cholesky-based …

**Figure 5.** Accuracy and runtime comparison of simulation- and iterative-methods-based predictive …

**Figure 6.** Time (s) for computing the marginal likelihood with VIF, FITC, and Vecchia approximations …

**Figure 7.** Time (s) for constructing the cover tree and finding the …

**Figure 8.** Log-score (LS) with error bars (mean ± 2 standard errors) for the data sets modeled with a Gaussian (left plot) and non-Gaussian likelihoods (right plot). The current implementation of DKLGP by Cao et al. [2023] available on https://github.com/katzfussgroup/DKL-GP/ does not support Poisson or Gamma likelihoods. The NA entry indicates that SGPR crashed. The VIF approximation consistently outperforms all ot…

**Figure 9.** Log-score (LS) with error bars (mean ± 2 standard errors) when estimating the smoothness parameter for the regression data sets (left plot) and when using non-zero prior mean functions (right plot). Next, we extend the GP model (1) by allowing for non-zero prior mean (fixed effects) functions F(·). Specifically, we consider a linear regression function F(x) = xᵀβ as well as a function that is modeled usin…

**Figure 10.** RMSE, log-score (LS), and CRPS (mean ± 2 standard errors) for VIF (m_v = 30 and m = 200), FITC (m = 200), and Vecchia (m_v = 30) approximations for 1/2-Matérn, 3/2-Matérn, 5/2-Matérn, and Gaussian (∞-Matérn) ARD kernels when d = 2.

**Figure 11.** RMSE, log-score (LS), and CRPS (mean ± 2 standard errors) for VIF (m_v = 30 and m = 200), FITC (m = 200), and Vecchia (m_v = 30) approximations for various dimensions d using an error variance of 0.01 and length scale parameters chosen such that the covariance remains approximately equal (to the one of a Gaussian kernel with length scales λ = (0.35, 0.4, 0.45, 0.5, 0.55)ᵀ) at the average distance among two ra…

**Figure 12.** Log-marginal likelihood differences relative to Cholesky-based computations and runtime …

**Figure 13.** Time (s) for computing predictive distributions with VIF, FITC, and Vecchia approxima…

Original abstract

Gaussian processes are flexible, probabilistic, non-parametric models widely used in machine learning and statistics. However, their scalability to large data sets is limited by computational constraints. To overcome these challenges, we propose Vecchia-inducing-points full-scale (VIF) approximations combining the strengths of global inducing points and local Vecchia approximations. Vecchia approximations excel in settings with low-dimensional inputs and moderately smooth covariance functions, while inducing point methods are better suited to high-dimensional inputs and smoother covariance functions. Our VIF approach bridges these two regimes by using an efficient correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process, implemented via a modified cover tree algorithm. We further extend our framework to non-Gaussian likelihoods by introducing iterative methods that substantially reduce computational costs for training and prediction by several orders of magnitudes compared to Cholesky-based computations when using a Laplace approximation. In particular, we propose and compare novel preconditioners and provide theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets show that VIF approximations are both computationally efficient as well as more accurate and numerically stable than state-of-the-art alternatives. All methods are implemented in the open source C++ library GPBoost with high-level Python and R interfaces.
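For readers who want to see where the Cholesky-based cost in a Laplace approximation comes from, the sketch below finds the posterior mode of a latent Gaussian process with a Poisson likelihood by Newton's method. It uses a dense, unapproximated covariance matrix K and a direct solve in every iteration, which is the kind of baseline the paper's iterative methods are meant to replace. The function name, the log link, the step-halving safeguard, and the tolerances are assumptions made for illustration.

```python
import numpy as np

def laplace_mode_poisson(K, y, max_iter=100, tol=1e-8):
    """Mode of p(b | y) for y_i ~ Poisson(exp(b_i)) and b ~ N(0, K).

    Returns (mode, number_of_iterations). Each iteration performs a
    dense n x n solve, so the cost per iteration is cubic in n.
    """
    n = len(y)
    a = np.zeros(n)  # a = K^{-1} b, tracked so that K is never inverted
    b = np.zeros(n)

    def objective(a_, b_):
        # log p(y | b) + log p(b), dropping terms that do not depend on b
        return np.sum(y * b_ - np.exp(b_)) - 0.5 * a_ @ b_

    obj = objective(a, b)
    for it in range(1, max_iter + 1):
        W = np.exp(b)   # negative Hessian of the log-likelihood (diagonal)
        grad = y - W    # gradient of the log-likelihood
        # Newton target: (K^{-1} + W)^{-1} (W b + grad) = K (I + W K)^{-1} (W b + grad)
        a_new = np.linalg.solve(np.eye(n) + W[:, None] * K, W * b + grad)
        b_new = K @ a_new
        if np.max(np.abs(b_new - b)) < tol:
            return b_new, it
        # Step halving: shrink the step while it would lower the objective
        step = 1.0
        slack = 1e-10 * (1.0 + abs(obj))
        while step > 1e-8 and objective(a + step * (a_new - a),
                                        b + step * (b_new - b)) < obj - slack:
            step *= 0.5
        a = a + step * (a_new - a)
        b = b + step * (b_new - b)
        obj = objective(a, b)
    return b, max_iter
```

At the mode, the stationarity condition b = K (y − exp(b)) holds, which is an easy way to check the result. The dense solve is the step that becomes prohibitive for large n.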

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

VIF is a concrete hybrid of inducing points and residual Vecchia that the experiments position as more accurate and stable than either alone, but the correlation-based neighbor search is the part that still needs stress-testing.

The new element is the specific combination: global inducing points for the main process plus a Vecchia factor on the residual, with neighbors chosen by correlation threshold via a modified cover tree. That pairing is presented as filling the gap between low-dimensional Vecchia strengths and high-dimensional inducing-point strengths, and they add iterative solvers plus preconditioners for non-Gaussian likelihoods with convergence results.

The open-source C++ library with Python and R interfaces is a clear practical plus for anyone who wants to try it directly. The experiments on simulated and real data are the main evidence offered for better accuracy, stability, and speed than current alternatives. If those comparisons are clean and the regimes are representative, the hybrid does something useful.

The soft spot is the neighbor-finding step for the residual. The stress-test note is on target: correlation thresholds on a cover tree can miss longer-range conditional dependence once dimension grows or the kernel gets smoother, and the abstract does not isolate whether the reported gains survive when that heuristic is pushed (higher d or larger ν). Without seeing the full experimental design and any ablation on the residual approximation quality, it is hard to know how wide the reliable regime actually is.

This paper is aimed at people already working on scalable GPs who need something between pure inducing points and pure Vecchia. A reader who cares about moderate-dimensional spatial or spatio-temporal data would get the most immediate value. The algorithmic idea is specific enough, the code is public, and the claims are falsifiable, so it deserves a serious referee even if revisions are needed on the experiments.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Vecchia-inducing-points full-scale (VIF) approximations for Gaussian processes that combine global inducing-point methods with local Vecchia approximations applied to the residual process after the inducing-point correction. The Vecchia component uses a correlation-based neighbor-finding strategy implemented via a modified cover tree. The framework is extended to non-Gaussian likelihoods under a Laplace approximation, using iterative methods and novel preconditioners with theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets are reported to show that VIF approximations are computationally efficient and more accurate and numerically stable than state-of-the-art alternatives. The methods are implemented in the open-source GPBoost library with Python and R interfaces.

Significance. If the central claims hold, the work provides a practical method that bridges the regimes where Vecchia approximations excel (low-dimensional inputs, moderate smoothness) and where inducing-point methods are preferred (high-dimensional inputs, smoother kernels). The open-source implementation and the theoretical convergence results for the iterative solvers are explicit strengths that enhance reproducibility and usability.

major comments (2)

[Method description of VIF approximation and neighbor-finding strategy] The central claim that VIF bridges low- and high-dimensional regimes rests on the reliability of the correlation-based neighbor-finding strategy for the residual Vecchia component. The manuscript does not report targeted experiments that stress this heuristic when input dimension exceeds the tested range (d>20) or when covariance smoothness increases (e.g., Matérn ν>5/2), where pairwise correlation may cease to be a faithful proxy for conditional dependence after the global correction. This assumption is load-bearing for the accuracy and stability superiority claims.
[Numerical experiments section] The abstract asserts superiority on the basis of extensive numerical experiments, yet the manuscript provides no details on experimental design, data exclusion criteria, or error-bar reporting. Without these, the support for the claim that VIF is more accurate and numerically stable cannot be fully assessed. This directly affects evaluation of the central empirical claim.

minor comments (2)

[Abstract] The phrase 'several orders of magnitudes' in the abstract should read 'several orders of magnitude'.
Notation for the residual process and the correlation threshold parameter should be introduced with explicit definitions and cross-references to the inducing-point correction step to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their insightful comments. We address each major comment below and indicate planned revisions to strengthen the manuscript.

Referee: [Method description of VIF approximation and neighbor-finding strategy] The central claim that VIF bridges low- and high-dimensional regimes rests on the reliability of the correlation-based neighbor-finding strategy for the residual Vecchia component. The manuscript does not report targeted experiments that stress this heuristic when input dimension exceeds the tested range (d>20) or when covariance smoothness increases (e.g., Matérn ν>5/2), where pairwise correlation may cease to be a faithful proxy for conditional dependence after the global correction. This assumption is load-bearing for the accuracy and stability superiority claims.

Authors: We agree that the reliability of the correlation-based neighbor-finding strategy after the inducing-point correction is central to the bridging claim. While the current experiments span a range of input dimensions and kernel smoothness levels, we acknowledge that the manuscript lacks targeted stress tests specifically for d>20 and Matérn ν>5/2. In the revised manuscript we will add such experiments to directly evaluate the heuristic's performance in these regimes and thereby provide stronger empirical support for the accuracy and stability claims.

Revision: yes

Referee: [Numerical experiments section] The abstract asserts superiority on the basis of extensive numerical experiments, yet the manuscript provides no details on experimental design, data exclusion criteria, or error-bar reporting. Without these, the support for the claim that VIF is more accurate and numerically stable cannot be fully assessed. This directly affects evaluation of the central empirical claim.

Authors: We thank the referee for highlighting the need for greater transparency. In the revised manuscript we will expand the numerical experiments section to include a detailed description of the experimental design, any data exclusion criteria applied, and reporting of error bars or standard deviations across repeated runs. This will allow readers to fully assess the empirical support for the accuracy and stability superiority claims.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; hybrid approximation is independently constructed.

full rationale

The paper introduces VIF as an algorithmic combination of global inducing points and local Vecchia approximations for the residual process, with a correlation-based neighbor search via modified cover tree. No equations, predictions, or uniqueness claims are shown to reduce by construction to fitted inputs, self-citations, or prior ansatzes from the same authors. The central performance claims rest on numerical experiments rather than any self-referential derivation. This is the common case of a self-contained methodological proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities beyond the proposed algorithmic combination itself.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

[1] David Arthur and Sergei Vassilvitskii. K-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027–1035, 2007.

[2] Erlend Aune, Daniel P Simpson, and Jo Eidsvik. Parameter estimation in high dimensional Gaussian distributions. Statistics and Computing, 24:247–263, 2014.

[3] Sudipto Banerjee, Alan E Gelfand, Andrew O Finley, and Huiyan Sang. Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society Series B: Statistical Methodology, 70(4):825–848, 2008.

[4] Costas Bekas, Effrosyni Kokiopoulou, and Yousef Saad. An estimator for the diagonal of a matrix. Applied Numerical Mathematics, 57(11-12):1214–1229, 2007.

[5] Alina Beygelzimer, Sham Kakade, and John Langford. Cover trees for nearest neighbor. In Proceedings of the 23rd international conference on Machine learning, pages 97–104, 2006.

[6] Jian Cao, Myeongjong Kang, Felix Jimenez, Huiyan Sang, Florian Tobias Schaefer, and Matthias Katzfuss. Variational sparse inverse Cholesky approximation for latent Gaussian processes via double Kullback-Leibler minimization. In International Conference on Machine Learning, pages 3559–3576. PMLR, 2023.

[7] Noel Cressie. Statistics for spatial data. John Wiley & Sons, 1993.

[8] Ryan R Curtin. Improving dual-tree algorithms. PhD thesis, Georgia Institute of Technology, Atlanta, GA, USA, 2016.

[9] Abhirup Datta, Sudipto Banerjee, Andrew O Finley, and Alan E Gelfand. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111(514):800–812, 2016.

[10] Timothy A Davis. Direct methods for sparse linear systems. SIAM, 2006.

[11] Kun Dong, David Eriksson, Hannes Nickisch, David Bindel, and Andrew G Wilson. Scalable log determinants for Gaussian process kernel learning. Advances in Neural Information Processing Systems, 30, 2017.

[12] Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.

[13] Yury Elkin and Vitaliy Kurlin. A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree. In International Conference on Machine Learning, pages 9267–9311. PMLR, 2023.

[14] Andrew O Finley, Huiyan Sang, Sudipto Banerjee, and Alan E Gelfand. Improving the performance of predictive process modeling for large datasets. Computational Statistics & Data Analysis, 53(8):2873–2884, 2009.

[15] Roger Fletcher. Practical methods of optimization. John Wiley & Sons, 2000.

[16] Reinhard Furrer, Marc G Genton, and Douglas Nychka. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3):502–523, 2006.

[17] Jacob Gardner, Geoff Pleiss, Kilian Q Weinberger, David Bindel, and Andrew G Wilson. Gpytorch: Blackbox matrix-matrix Gaussian process inference with gpu acceleration. Advances in Neural Information Processing Systems, 31, 2018a.

[18] Jacob Gardner, Geoff Pleiss, Ruihan Wu, Kilian Weinberger, and Andrew Wilson. Product kernel interpolation for scalable Gaussian processes. In International Conference on Artificial Intelligence and Statistics, pages 1407–1416. PMLR, 2018b.

[19] Joseph Guinness. Gaussian process learning via Fisher scoring of Vecchia’s approximation. Statistics and Computing, 31(3):1–8, 2021.

[20] Tim Gyger, Reinhard Furrer, and Fabio Sigrist. Iterative methods for full-scale Gaussian process approximations for large spatial data. arXiv preprint arXiv:2405.14492, 2024.

[21] Helmut Harbrecht, Michael Peters, and Reinhold Schneider. On the low-rank approximation by the pivoted Cholesky decomposition. Applied Numerical Mathematics, 62(4):428–440, 2012.

[22] Matthew J Heaton, Abhirup Datta, Andrew O Finley, Reinhard Furrer, Joseph Guinness, Rajarshi Guhaniyogi, Florian Gerber, Robert B Gramacy, Dorit Hammerling, Matthias Katzfuss, et al. A case study competition among methods for analyzing large spatial data. Journal of Agricultural, Biological and Environmental Statistics, 24:398–425, 2019.

[23] James Hensman, Nicolò Fusi, and Neil D Lawrence. Gaussian processes for big data. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 282–290, 2013.

[24] James Hensman, Alexander Matthews, and Zoubin Ghahramani. Scalable variational Gaussian process classification. In Artificial intelligence and statistics, pages 351–360. PMLR, 2015.

[25] Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge University Press, 2012.

[26] Michael F Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 18(3):1059–1076, 1989.

[27] Myeongjong Kang and Matthias Katzfuss. Correlation-based sparse inverse Cholesky factorization for fast Gaussian-process inference. Statistics and Computing, 33(3):56, 2023.

[28] Matthias Katzfuss and Joseph Guinness. A general framework for Vecchia approximations of Gaussian processes. Statistical Science, 36(1):124–141, 2021.

[29] Matthias Katzfuss, Joseph Guinness, Wenlong Gong, and Daniel Zilber. Vecchia approximations of Gaussian-process predictions. Journal of Agricultural, Biological and Environmental Statistics, 25:383–414, 2020.

[30] Pascal Kündig and Fabio Sigrist. Iterative methods for Vecchia-Laplace approximations for latent Gaussian process models. Journal of the American Statistical Association, (just-accepted):1–22, 2024.

[31] Cornelius Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. 1950.

[32] Christiane Lemieux. Control variates. Wiley StatsRef: Statistics Reference Online, pages 1–8, 2014.

[33] Richard J Lipton, Donald J Rose, and Robert Endre Tarjan. Generalized nested dissection. SIAM Journal on Numerical Analysis, 16(2):346–358, 1979.

[34] Haitao Liu, Yew-Soon Ong, Xiaobo Shen, and Jianfei Cai. When Gaussian process meets big data: A review of scalable GPs. IEEE Transactions on Neural Networks and Learning Systems, 31(11):4405–4423, 2020.

[35] Hannes Nickisch and Carl Edward Rasmussen. Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9(Oct):2035–2078, 2008.

[36] Geoff Pleiss, Jacob Gardner, Kilian Weinberger, and Andrew Gordon Wilson. Constant-time predictive distributions for Gaussian processes. In International Conference on Machine Learning, pages 4114–4123. PMLR, 2018.

[37] Joaquin Quinonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. The Journal of Machine Learning Research, 6:1939–1959, 2005.

[38] Filippo Rambelli and Fabio Sigrist. An accuracy-runtime trade-off comparison of scalable Gaussian process approximations for spatial data. arXiv preprint arXiv:2501.11448, 2025.

[39] Carl Edward Rasmussen and Christopher K.I. Williams. Gaussian processes for machine learning. MIT Press Cambridge, MA, 2006.

[40] Yousef Saad. Iterative methods for sparse linear systems. SIAM, 2003.

[41] Huiyan Sang and Jianhua Z Huang. A full scale approximation of covariance functions for large spatial data sets. Journal of the Royal Statistical Society Series B: Statistical Methodology, 74(1):111–132, 2012.

[42] Huiyan Sang, Mikyoung Jun, and Jianhua Z Huang. Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors. The Annals of Applied Statistics, pages 2519–2548, 2011.

[43] Florian Schäfer, Matthias Katzfuss, and Houman Owhadi. Sparse Cholesky factorization by Kullback–Leibler minimization. SIAM Journal on Scientific Computing, 43(3):A2019–A2046, 2021a.

[44] Florian Schäfer, Timothy John Sullivan, and Houman Owhadi. Compression, inversion, and approximate pca of dense kernel matrices at near-linear computational complexity. Multiscale Modeling & Simulation, 19(2):688–730, 2021b.

[45] Xu Shun. Two new lower bounds for the smallest singular value. arXiv preprint arXiv:2108.01221, 2021.

[46] Fabio Sigrist. Gaussian process boosting. The Journal of Machine Learning Research, 23(1):10565–10610, 2022a.

[47] Fabio Sigrist. Latent Gaussian model boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2):1894–1905, 2022b.

[48] Giora Simchoni and Saharon Rosset. Integrating random effects in deep neural networks. Journal of Machine Learning Research, 24(156):1–57, 2023.

[49] Shiliang Sun, Jing Zhao, and Jiang Zhu. A review of Nyström methods for large-scale machine learning. Information Fusion, 26:36–48, 2015.

[50] Luke Tierney and Joseph B Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393):82–86, 1986.

[51]

Variational learning of inducing variables in sparse G aussian processes

Michalis Titsias. Variational learning of inducing variables in sparse G aussian processes. In Artificial intelligence and statistics, pages 567--574. PMLR, 2009

work page 2009
[52]

N umerical linear algebra , volume 181

Lloyd N Trefethen and David Bau. N umerical linear algebra , volume 181. Siam, 2022

work page 2022
[53]

Some bounds for the singular values of matrices

Ramazan Turkmen and Haci Civciv. Some bounds for the singular values of matrices. Applied M athematical Sciences , 1 0 (49): 0 2443--2449, 2007

work page 2007
[54]

Fast estimation of tr(f(a)) via stochastic lanczos quadrature

Shashanka Ubaru, Jie Chen, and Yousef Saad. Fast estimation of tr(f(a)) via stochastic lanczos quadrature. SIAM Journal on Matrix Analysis and Applications, 38 0 (4): 0 1075--1099, 2017

work page 2017
[55]

Estimation and model identification for continuous spatial processes

Aldo V V ecchia. Estimation and model identification for continuous spatial processes. Journal of the Royal Statistical Society Series B: Statistical Methodology, 50 0 (2): 0 297--312, 1988

work page 1988
[56]

Kernel interpolation for scalable structured G aussian processes ( KISS - GP )

Andrew Wilson and Hannes Nickisch. Kernel interpolation for scalable structured G aussian processes ( KISS - GP ). In International conference on machine learning, pages 1775--1784. PMLR, 2015

work page 2015
[57]

Yu Yi-Sheng and Gu Dun-He. A note on a lower bound for the smallest singular value. Linear Algebra and its Applications, 253(1-3):25–38, 1997
[58]

Bohai Zhang, Huiyan Sang, and Jianhua Z Huang. Smoothed full-scale approximation of Gaussian process models for computation of large spatial data sets. Statistica Sinica, 29(4):1711–1737, 2019
