pith. machine review for the scientific record. sign in

arxiv: 2605.11806 · v1 · submitted 2026-05-12 · 🧮 math.ST · stat.TH

Recognition: no theorem link

Adaptive Kernel Ridge Regression with Linear Structure: Sharp Oracle Inequalities and Minimax Optimality

Authors on Pith no claims yet

Pith reviewed 2026-05-13 04:41 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords kernel ridge regressionoracle inequalityminimax optimalitylinear structureadaptive estimationnonparametric regressionsemiparametric model
0
0 comments X

The pith

Augmenting kernel ridge regression with an explicit linear component produces sharp oracle inequalities and minimax optimal rates for signals with mixed linear and nonlinear structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a modified kernel ridge regression estimator that adds a separate linear term to the standard nonparametric fit. This change prevents the kernel penalty from shrinking the linear part of the regression function unnecessarily. The estimator keeps the same computational cost and number of tuning parameters as ordinary KRR. Under general kernels it satisfies a sharp oracle inequality and attains the minimax optimal prediction risk whether the unknown function is purely linear, purely nonlinear, or a combination. The gain comes from reduced bias and approximation error, offset by only a small parametric variance term that vanishes in low- and moderate-dimensional regimes.

Core claim

By augmenting standard KRR with an explicit linear component, the resulting estimator satisfies a sharp oracle inequality that separates linear and kernel contributions, achieves minimax optimal prediction risk under general kernels, and improves both bias and approximation error relative to ordinary KRR at the expense of only a negligible parametric variance term.

What carries the argument

The augmented estimator that fits an explicit linear term alongside the kernel ridge regression applied to the residual, without introducing new tuning parameters.

Load-bearing premise

The unknown regression function admits a decomposition into an explicit linear part and a residual that the kernel can approximate without the split creating bias that the oracle inequality fails to control.

What would settle it

Observe whether the finite-sample prediction risk of the augmented estimator exceeds the risk of ordinary KRR by more than the extra parametric variance term when the true function contains a nontrivial linear component.

Figures

Figures reproduced from arXiv: 2605.11806 by Chao Wang, Xin Bing.

Figure 1
Figure 1. Figure 1: The average prediction risk against α. Left: the spline kernel with n = 200. Right: the Gaussian kernel with n = 400. The tuning parameter λ is selected via cross-validation. The risk is computed on an independent dataset of size 500, and each setting is repeated 100 times. augments the RKHS component with an explicit linear term of the form fb(x) = x ⊤αb + gb(x) ∀ x ∈ X , (3) where (α, b gb) are jointly o… view at source ↗
Figure 2
Figure 2. Figure 2: illustrates the effect of increasing the ridge penalty µ. This may occur, for instance, when the proportion of linear signal in f ∗ decreases, or when either the eigenvalues of Σ or the dimension d increase. As shown in [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The log average prediction risk against log [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The average prediction risk against α. of different predictors evolves when varying α. The sample size is set to n = 400. We first fix the bandwidth γ in {0.5, 1, 100}. A smaller γ corresponds to a richer HK. The results in the left panel of [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The average prediction risk against α. Left: the Gaussian kernel with fixed γ in {0.5, 1, 100}. Right: the Gaussian kernel with CV-selected and median-adjusted γ. References Ahmed Alaoui and Michael W Mahoney. Fast randomized kernel ridge regression with statistical guarantees. Advances in neural information processing systems, 28, 2015. Senjian An, Wanquan Liu, and Svetha Venkatesh. Face recognition using… view at source ↗
Figure 6
Figure 6. Figure 6: The average prediction risk against α. 28 [PITH_FULL_IMAGE:figures/full_fig_p028_6.png] view at source ↗
read the original abstract

Kernel ridge regression (KRR) is a widely used nonparametric method due to its strong theoretical guarantees and computational convenience. However, standard KRR does not distinguish between linear and nonlinear components in the signal, instead applying a single functional regularization to the entire function. This may lead to unnecessary shrinkage of linear structure and consequently suboptimal prediction performance. In this paper, we propose a modified regression procedure that augments KRR with an explicit linear component. The proposed method has the same computational complexity as standard KRR and introduces no additional tuning parameters. Theoretically, we establish a sharp oracle inequality for the proposed estimator and show that it adaptively captures both linear and nonlinear structure, achieving minimax optimal prediction risk under general kernels. Compared with standard KRR, the proposed method improves both the bias and approximation error at the expense of only an additional parametric variance term, which is negligible in low- and moderate-dimensional settings. In high-dimensional regimes, incorporating ridge regularization for the linear component yields a procedure that performs uniformly no worse than KRR. Extensive simulation studies support the theoretical findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes augmenting standard kernel ridge regression (KRR) with an explicit linear component to better capture mixed linear-nonlinear structure in the regression function. The resulting estimator maintains the same computational complexity as KRR with no additional tuning parameters. The central theoretical contributions are a sharp oracle inequality for the prediction risk and a proof of minimax optimality under general kernels, achieved by isolating the linear part (with a negligible parametric variance penalty in low/moderate dimensions) while applying ridge regularization to the linear term in high dimensions to ensure performance no worse than standard KRR. Simulation studies are cited in support of the claims.

Significance. If the oracle inequality and minimax results hold under the stated assumptions, this would represent a meaningful advance in nonparametric regression by providing an adaptive, computationally efficient hybrid estimator that reduces bias from over-shrinkage of linear components without introducing extra hyperparameters. The approach of separating the linear term while preserving the kernel analysis is a clean way to achieve adaptivity, and the uniform performance guarantee in high dimensions is practically relevant. Strengths include the parameter-free nature of the base procedure and the focus on sharp (rather than rate-only) bounds.

major comments (2)
  1. [§3] §3 (Theorem 3.1 and surrounding derivation): the sharp oracle inequality is stated to absorb only a negligible parametric variance term after isolating the linear component, but the cross-term arising from the joint estimation of the linear coefficients and the kernel residual is not explicitly bounded in the provided sketch; if this term is not controlled by the kernel assumptions, the claimed sharpness may not hold uniformly.
  2. [§4.1] §4.1 (minimax optimality argument): the reduction to the oracle inequality for general kernels assumes the true function admits an exact decomposition f = f_lin + f_res with f_res in the RKHS without residual bias from the separation; the weakest assumption identified (that this decomposition does not invalidate the rates) requires an explicit verification that the linear projection operator commutes appropriately with the kernel regularization, which is load-bearing for the adaptivity claim.
minor comments (2)
  1. [Abstract] The abstract claims 'extensive simulation studies support the theoretical findings' but provides no details on the range of dimensions, kernel choices, or signal-to-noise ratios tested; adding a brief summary or reference to a table/figure in the main text would improve clarity.
  2. [§2] Notation for the augmented estimator (linear coefficients plus kernel weights) is introduced without an explicit comparison table to standard KRR; a side-by-side display of the two optimization problems would aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments, which help clarify the presentation of our results. We address each major comment below and believe the points can be resolved through clarifications and minor additions to the proofs without changing the main theorems.

read point-by-point responses
  1. Referee: [§3] §3 (Theorem 3.1 and surrounding derivation): the sharp oracle inequality is stated to absorb only a negligible parametric variance term after isolating the linear component, but the cross-term arising from the joint estimation of the linear coefficients and the kernel residual is not explicitly bounded in the provided sketch; if this term is not controlled by the kernel assumptions, the claimed sharpness may not hold uniformly.

    Authors: We appreciate this observation. The cross-term is controlled in the complete proof of Theorem 3.1 (Appendix A.2), where we use the reproducing property of the kernel together with the fact that the linear component is estimated via ordinary least squares on the orthogonal complement of the RKHS residual. Under the eigenvalue decay assumptions on the kernel operator, this cross-term is bounded by a quantity of order O(1/n) that is absorbed into the parametric variance term without affecting the sharpness of the oracle inequality. We will expand the sketch in Section 3 to include this explicit bound for clarity. revision: partial

  2. Referee: [§4.1] §4.1 (minimax optimality argument): the reduction to the oracle inequality for general kernels assumes the true function admits an exact decomposition f = f_lin + f_res with f_res in the RKHS without residual bias from the separation; the weakest assumption identified (that this decomposition does not invalidate the rates) requires an explicit verification that the linear projection operator commutes appropriately with the kernel regularization, which is load-bearing for the adaptivity claim.

    Authors: The decomposition f = f_lin + f_res is defined via the orthogonal projection onto the finite-dimensional linear space, so f_res lies exactly in the orthogonal complement and belongs to the RKHS by assumption. The linear projection commutes with the kernel ridge regularization operator because the linear space is spanned by the coordinate functions, which are orthogonal to the RKHS residual under the inner product induced by the kernel; this follows directly from the positive-definiteness of the kernel and the finite-dimensional nature of the linear part. We will add a short lemma in the appendix (new Lemma A.3) that verifies this commutation property under the paper's standing assumptions, thereby confirming that no extra bias is introduced in the minimax argument. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper defines an augmented estimator by adding an explicit linear term to standard KRR, then derives a sharp oracle inequality via standard RKHS analysis that bounds excess risk by approximation error in the linear-plus-kernel space plus a parametric variance term. This upper bound is obtained from the explicit optimization problem and concentration arguments without reducing to a fitted parameter renamed as a prediction or to a self-referential definition. Minimax optimality follows by direct comparison of the derived upper bound to independent lower bounds available in the nonparametric literature under general kernels; the adaptive capture of linear and nonlinear structure is a direct consequence of the decomposition rather than an input assumed by construction. No load-bearing self-citations, ansatz smuggling, or uniqueness theorems imported from prior author work are required for the central claims. The analysis remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available; the ledger is therefore limited to standard background assumptions typical for kernel methods.

axioms (2)
  • standard math The regression function lies in a reproducing kernel Hilbert space with a positive definite kernel.
    Standard assumption invoked for all KRR theory.
  • domain assumption The true signal admits an additive decomposition into a linear part and a kernel-representable residual.
    Central modeling assumption that enables the explicit linear augmentation.

pith-pipeline@v0.9.0 · 5481 in / 1248 out tokens · 53556 ms · 2026-05-13T04:41:32.694661+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

221 extracted references · 221 canonical work pages · 2 internal anchors

  1. [1]

    Probability Theory and Related Fields , volume=

    Risk bounds for model selection via penalization , author=. Probability Theory and Related Fields , volume=. 1999 , publisher=

  2. [2]

    Journal of the European Mathematical Society , volume=

    Gaussian model selection , author=. Journal of the European Mathematical Society , volume=

  3. [3]

    The Annals of Statistics , volume=

    Parametric or nonparametric? A parametricness index for model selection , author=. The Annals of Statistics , volume=. 2011 , publisher=

  4. [4]

    Wahba, Grace , title =

  5. [5]

    Transactions of the American Mathematical Society , volume =

    Nachman Aronszajn , title =. Transactions of the American Mathematical Society , volume =. 1950 , doi =

  6. [6]

    Comptes Rendus

    An elementary analysis of ridge regression with random design , author=. Comptes Rendus. Math

  7. [7]

    Green, P. J. and Silverman, B. W. , title =

  8. [8]

    , title =

    Wood, Simon N. , title =

  9. [9]

    The 22nd International Conference on Artificial Intelligence and Statistics , pages=

    High dimensional inference in partially linear models , author=. The 22nd International Conference on Artificial Intelligence and Statistics , pages=. 2019 , organization=

  10. [10]

    arXiv preprint arXiv:2410.20319 , year=

    High-dimensional partial linear model with trend filtering , author=. arXiv preprint arXiv:2410.20319 , year=

  11. [11]

    Journal of Machine Learning Research , volume=

    Debiased distributed learning for sparse partial linear models in high dimensions , author=. Journal of Machine Learning Research , volume=

  12. [12]

    Minimax optimal estimation in partially linear additive models under high dimension , author=

  13. [13]

    Partially Linear Models , publisher =

  14. [14]

    The Annals of Statistics , pages=

    Nonasymptotic analysis of semiparametric regression models with high-dimensional parametric coefficients , author=. The Annals of Statistics , pages=. 2017 , publisher=

  15. [15]

    Scandinavian Journal of Statistics , volume=

    The partial linear model in high dimensions , author=. Scandinavian Journal of Statistics , volume=. 2015 , publisher=

  16. [16]

    International Journal of Quantum Chemistry , year=

    Machine learning for quantum mechanics in a nutshell , author=. International Journal of Quantum Chemistry , year=

  17. [17]

    On the Saturation Effect of Kernel Ridge Regression , author=

  18. [18]

    Constructive Approximation , volume=

    Learning theory estimates via integral operators and their approximations , author=. Constructive Approximation , volume=. 2007 , publisher=

  19. [19]

    Advances in Neural Information Processing Systems , volume=

    On the saturation effects of spectral algorithms in large dimensions , author=. Advances in Neural Information Processing Systems , volume=

  20. [20]

    Constructive Approximation , volume=

    Mercer’s theorem on general domains: On the interaction between measures, kernels, and RKHSs , author=. Constructive Approximation , volume=. 2012 , publisher=

  21. [21]

    Series: Numerical Methods and Algorithms , volume=

    The Schur complement and its applications , author=. Series: Numerical Methods and Algorithms , volume=

  22. [22]

    arXiv preprint arXiv:2505.20022 , year=

    Kernel ridge regression with predicted feature inputs and applications to factor-based nonparametric regression , author=. arXiv preprint arXiv:2505.20022 , year=

  23. [23]

    2013 , organization=

    Patle, Arti and Chouhan, Deepak Singh , booktitle=. 2013 , organization=

  24. [24]

    Bulletin of the American Mathematical Society , volume=

    On the mathematical foundations of learning , author=. Bulletin of the American Mathematical Society , volume=

  25. [25]

    , title =

    Steinwart, I. , title =. IEEE Transactions on Information Theory , volume =. 2005 , pages =

  26. [26]

    Constructive Approximation , volume =

    Steinwart, I and Scovel, C , title =. Constructive Approximation , volume =. 2012 , pages =

  27. [27]

    2016 , publisher =

    Ian Goodfellow and Yoshua Bengio and Aaron Courville , title =. 2016 , publisher =

  28. [28]

    Artificial Intelligence Review , volume =

    Autoencoders and their applications in machine learning: A survey , author =. Artificial Intelligence Review , volume =

  29. [29]

    Gomez and Lukasz Kaiser and Illia Polosukhin , title =

    Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =

  30. [30]

    Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) , year =

    Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova , title =. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) , year =

  31. [31]

    Proceedings of the International Conference on Machine Learning (ICML) , year =

    Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever , title =. Proceedings of the International Conference on Machine Learning (ICML) , year =

  32. [32]

    Liu , title =

    Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu , title =. The Journal of Machine Learning Research , volume =

  33. [33]

    Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared D

    Tom B. Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared D. Kaplan and Prafulla Dhariwal and Arvind Neelakantan and Pranav Shyam and Girish Sastry and Amanda Askell and Sandhini Agarwal and Ariel Herbert-Voss and Gretchen Krueger and Tom Henighan and Rewon Child and Aditya Ramesh and Daniel M. Ziegler and Jeffrey Wu and Clemens Winter a...

  34. [34]

    arXiv preprint arXiv:2404.01866 , year =

    Bartosz Bieganowski and Robert Ślepaczuk , title =. arXiv preprint arXiv:2404.01866 , year =

  35. [35]

    Clustering Time Series Data with

    Amirabbas Afzali and Hesam Hosseini and Mohmmadamin Mirzai and Arash Amini , journal =. Clustering Time Series Data with

  36. [36]

    Applied Sciences , volume =

    Machine Learning Methods in Weather and Climate Applications: A Survey , author =. Applied Sciences , volume =

  37. [37]

    British Journal of Statistical Psychology , volume =

    Harold Hotelling , title =. British Journal of Statistical Psychology , volume =. 1957 , doi =

  38. [38]

    Econometrica , volume =

    Jushan Bai and Serena Ng , title =. Econometrica , volume =. 2006 , publisher =

  39. [39]

    Journal of Econometrics , volume =

    Bryan Kelly and Seth Pruitt , title =. Journal of Econometrics , volume =

  40. [40]

    D. N. Lawley and A. E. Maxwell , journal =. Factor Analysis as a Statistical Method , urldate =

  41. [41]

    T. W. Anderson , title =. 2003 , publisher =

  42. [42]

    Chamberlain and M

    G. Chamberlain and M. Rothschild , title =. Econometrica , volume =. 1983 , publisher =

  43. [43]

    Factor modeling for high-dimensional time series: Inference for the number of factors

    Lam, Clifford and Yao, Qiwei. Factor modeling for high-dimensional time series: Inference for the number of factors. The Annals of Statistics. 2012

  44. [44]

    Econometrica , volume =

    Bai, Jushan and Ng, Serena , title =. Econometrica , volume =

  45. [45]

    Econometrica81(3), 1203–1227 (2013) https://doi.org/10.3982/ECTA8968

    Ahn, Seung C. and Horenstein, Alex R. , title =. Econometrica , volume =. https://onlinelibrary.wiley.com/doi/pdf/10.3982/ECTA8968 , year =

  46. [46]

    The Annals of Statistics , number =

    Xin Bing and Yang Ning and Yaosheng Xu , title =. The Annals of Statistics , number =

  47. [47]

    Multivariate Behavioral Research , volume =

    Andreas Buja and Nermin Eyuboglu , title =. Multivariate Behavioral Research , volume =. 1992 , publisher =

  48. [48]

    Measures of complexity: festschrift for alexey chervonenkis , pages=

    On the uniform convergence of relative frequencies of events to their probabilities , author=. Measures of complexity: festschrift for alexey chervonenkis , pages=. 2015 , publisher=

  49. [49]

    IEEE Transactions on Information Theory , volume=

    Rademacher penalties and structural risk minimization , author=. IEEE Transactions on Information Theory , volume=. 2001 , publisher=

  50. [50]

    High dimensional probability II , pages=

    Rademacher processes and bounding the risk of function learning , author=. High dimensional probability II , pages=. 2000 , publisher=

  51. [51]

    Random Structures & Algorithms , volume=

    A sharp concentration inequality with applications , author=. Random Structures & Algorithms , volume=. 2000 , publisher=

  52. [52]

    Towards General Text Embeddings with Multi-stage Contrastive Learning

    Towards general text embeddings with multi-stage contrastive learning , author=. arXiv preprint:2308.03281 , year=

  53. [53]

    2025 , publisher=

    Letterboxd Film Dataset , author=. 2025 , publisher=

  54. [54]

    Annales de la Facult

    Some applications of concentration inequalities to statistics , author=. Annales de la Facult

  55. [55]

    Electronic Journal of Statistics , number =

    Mona Eberts and Ingo Steinwart , title =. Electronic Journal of Statistics , number =. 2013 , doi =

  56. [56]

    Computational Learning Theory: 15th Annual Conference on Computational Learning Theory, COLT 2002 Sydney, Australia, July 8--10, 2002 Proceedings 15 , pages=

    Some local measures of complexity of convex hulls and generalization bounds , author=. Computational Learning Theory: 15th Annual Conference on Computational Learning Theory, COLT 2002 Sydney, Australia, July 8--10, 2002 Proceedings 15 , pages=. 2002 , organization=

  57. [57]

    The Annals of Statistics , number =

    G. The Annals of Statistics , number =

  58. [58]

    Annals of statistics , pages=

    Limiting distributions for L1 regression estimators under general conditions , author=. Annals of statistics , pages=. 1998 , publisher=

  59. [59]

    Quantile regression in reproducing kernel

    Li, Youjuan and Liu, Yufeng and Zhu, Ji , journal=. Quantile regression in reproducing kernel. 2007 , publisher=

  60. [60]

    , author=

    l_1 -penalized quantile regression in high dimensional sparse models. , author=. The Annals of Statistics , pages=. 2011 , publisher=

  61. [61]

    The Econometrics Journal , volume =

    Chernozhukov, Victor and Chetverikov, Denis and Demirer, Mert and Duflo, Esther and Hansen, Christian and Newey, Whitney and Robins, James , title = ". The Econometrics Journal , volume =. 2018 , month =

  62. [62]

    Econometrica , volume=

    Sparse models and methods for optimal instruments with an application to eminent domain , author=. Econometrica , volume=. 2012 , publisher=

  63. [63]

    Representation Learning: A Review and New Perspectives , year=

    Bengio, Yoshua and Courville, Aaron and Vincent, Pascal , journal=. Representation Learning: A Review and New Perspectives , year=

  64. [64]

    Journal of the American Statistical Association , volume=

    Adaptive huber regression , author=. Journal of the American Statistical Association , volume=. 2020 , publisher=

  65. [65]

    Proceedings of the Annual Conference on Learning Theory, 2009 , pages=

    Optimal rates for regularized least squares regression , author=. Proceedings of the Annual Conference on Learning Theory, 2009 , pages=

  66. [66]

    The Annals of Statistics , volume=

    Feature elimination in kernel machines in moderately high dimensions , author=. The Annals of Statistics , volume=. 2019 , publisher=

  67. [67]

    Bousquet, Olivier , journal=. A. 2002 , publisher=

  68. [68]

    Journal of Machine Learning Research , volume=

    Towards a unified analysis of random fourier features , author=. Journal of Machine Learning Research , volume=. 2021 , publisher=

  69. [69]

    Distributed Learning of Conditional Quantiles in the Reproducing Kernel

    Lian, Heng , journal=. Distributed Learning of Conditional Quantiles in the Reproducing Kernel

  70. [70]

    Foundations of Computational Mathematics , volume=

    Optimal rates for the regularized least-squares algorithm , author=. Foundations of Computational Mathematics , volume=. 2007 , publisher=

  71. [71]

    arXiv preprint arXiv:2310.08237 , year=

    Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift , author=. arXiv preprint arXiv:2310.08237 , year=

  72. [72]

    Advances in Neural Information Processing Systems , volume=

    Optimal learning rates for kernel conjugate gradient regression , author=. Advances in Neural Information Processing Systems , volume=

  73. [73]

    International Conference on Machine Learning , pages=

    Towards a unified analysis of random Fourier features , author=. International Conference on Machine Learning , pages=. 2019 , organization=

  74. [74]

    High-dimensional

    Wainwright, Martin , volume=. High-dimensional. 2019 , publisher=

  75. [75]

    Neural Networks , volume=

    Distributed learning for sketched kernel regression , author=. Neural Networks , volume=. 2021 , publisher=

  76. [76]

    Advances in Neural Information Processing Systems , volume=

    Generalization error rates in kernel regression: The crossover from the noiseless to noisy regime , author=. Advances in Neural Information Processing Systems , volume=

  77. [77]

    Electronic Journal of Statistics , volume=

    Spectrally-truncated kernel ridge regression and its free lunch , author=. Electronic Journal of Statistics , volume=. 2021 , publisher=

  78. [78]

    Nature Communications , volume=

    Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks , author=. Nature Communications , volume=. 2021 , publisher=

  79. [79]

    Journal of Computational and Graphical Statistics , volume=

    Kernel logistic regression and the import vector machine , author=. Journal of Computational and Graphical Statistics , volume=. 2005 , publisher=

  80. [80]

    Journal of computer and system sciences , volume=

    A decision-theoretic generalization of on-line learning and an application to boosting , author=. Journal of computer and system sciences , volume=. 1997 , publisher=

Showing first 80 references.