When Does ℓ₂-Boosting Overfit Benignly? High-Dimensional Risk Asymptotics and the ℓ₁ Implicit Bias
Pith reviewed 2026-05-13 06:41 UTC · model grok-4.3
The pith
ℓ₂-boosting fails to benignly overfit at the linear rate in high dimensions because greedy selection localizes noise into sparse active sets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under an isotropic pure-noise model, continuous-time ℓ₂-Boosting does not exhibit benign overfitting at the linear rate. Greedy selection localizes noise into sparse active sets, causing the excess variance to decay at the logarithmic rate Θ(σ²/log(p/n)). For spiked-isotropic designs whose tail dimension r₂ satisfies r₂ ≫ n, risk still converges to zero but only at the slower logarithmic rate Θ(σ²/log(r₂/n)). The non-smooth subdifferential dynamics of the boosting flow yield a tuning-free early stopping rule that, under a bounded ℓ₁-path condition, recovers the Lasso basic inequality and attains the minimax-optimal empirical prediction rate for ℓ₁-bounded signals.
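The gap between the two rates is easy to see numerically. A minimal sketch (the constants are placeholders; only the scalings σ²·n/p and σ²/log(p/n) come from the claim above):

```python
import math

sigma2 = 1.0  # noise variance (placeholder value)
n = 100
for ratio in (10, 100, 1000, 10000):
    p = ratio * n
    linear = sigma2 * n / p                  # linear-rate decay typical of ℓ₂ geometries
    logarithmic = sigma2 / math.log(p / n)   # Θ(σ²/log(p/n)) rate claimed for ℓ₂-boosting
    print(f"p/n = {ratio:>5}: linear = {linear:.1e}, logarithmic = {logarithmic:.3f}")
```

Even at p/n = 10⁴ the logarithmic rate has only fallen to about 0.11, while the linear rate is 10⁻⁴; this is the sense in which benign overfitting "fails at the linear rate".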
What carries the argument
The continuous-time ℓ₂-boosting flow whose greedy coordinate selection induces an ℓ₁ implicit bias; this bias is resolved analytically by coupling the Convex Gaussian Minimax Theorem with asymptotic expansions of double-sided truncated Gaussian moments.
If this is right
- Excess variance decays only logarithmically instead of linearly with the overparameterization ratio.
- For designs with tail dimension r₂ ≫ n, the risk converges to zero, but at a logarithmic rate slower than in ridge regression.
- A simple early stopping rule based on the boosting path length recovers the Lasso basic inequality without parameter tuning.
- The localization of noise into sparse active sets persists as long as the ℓ₁-path remains bounded.
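The paper's exact stopping rule is not reproduced here, but its ingredients can be sketched. Below, forward-stagewise (ε-)boosting, the standard discrete surrogate for the continuous-time ℓ₂-boosting flow, is run with a stop that fires when the cumulative ℓ₁ path length exceeds a budget; the budget, step size, and design are illustrative assumptions, not the paper's prescription.

```python
import numpy as np

def epsilon_boost(X, y, eps=0.01, l1_budget=2.0, max_steps=100_000):
    """Forward-stagewise (eps-)boosting with an l1 path-length stop.

    At each step the coordinate most correlated with the residual is
    moved by eps; the loop stops once the cumulative l1 path length
    exceeds l1_budget (a stand-in for a bounded-l1-path stopping rule).
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()
    path_length = 0.0
    for _ in range(max_steps):
        corr = X.T @ residual
        j = int(np.argmax(np.abs(corr)))
        step = eps * np.sign(corr[j])
        beta[j] += step
        residual -= step * X[:, j]
        path_length += abs(step)
        if path_length >= l1_budget:
            break
    return beta, path_length

rng = np.random.default_rng(0)
n, p = 50, 500
X = rng.standard_normal((n, p)) / np.sqrt(n)  # isotropic design, near-unit-norm columns
y = rng.standard_normal(n)                    # pure noise: no underlying signal
beta, L = epsilon_boost(X, y)
print(f"nonzero coordinates: {np.count_nonzero(beta)} of {p}, l1 path length: {L:.2f}")
```

The print line makes the localization claim visible: with roughly 200 steps of size 0.01 before the budget binds, at most 200 of the 500 coordinates can ever become active, so the noise is absorbed into a sparse active set rather than spread across all p directions.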
Where Pith is reading between the lines
- The same localization mechanism may continue to operate even when weak signals are present, making boosting inherently sparsity-promoting.
- In practice, monitoring the cumulative ℓ₁ length of the boosting path could serve as an automatic safeguard against slow noise accumulation.
- The contrast with linear-rate benign overfitting in ℓ₂ methods suggests that greedy coordinate algorithms trade faster noise adaptation for built-in sparsity bias.
Load-bearing premise
The data must be generated from an isotropic pure-noise model with no underlying signal, and the boosting trajectory must obey a bounded ℓ₁-path condition.
What would settle it
Generate high-dimensional pure-noise data with p ≫ n, run continuous-time ℓ₂-boosting, and check whether the excess prediction risk decays like 1/log(p/n) rather than like n/p (the linear rate).
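A rough version of that check, under stated assumptions: ε-boosting stands in for the continuous flow, the design is isotropic so that the excess prediction risk of β̂ against the zero signal equals ‖β̂‖₂², and the sample sizes are small enough to run in seconds. This probes the scaling only; it cannot confirm the constants.

```python
import math
import numpy as np

def boost_until_fit(X, y, eps=0.005, tol=0.01, max_steps=100_000):
    """eps-boosting run until the residual norm falls below tol * ||y||."""
    beta = np.zeros(X.shape[1])
    r = y.astype(float).copy()
    target = tol * np.linalg.norm(y)
    for _ in range(max_steps):
        c = X.T @ r
        j = int(np.argmax(np.abs(c)))
        step = eps * np.sign(c[j])
        beta[j] += step
        r -= step * X[:, j]
        if np.linalg.norm(r) <= target:
            break
    return beta

rng = np.random.default_rng(1)
n = 50
for p in (500, 5000):
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    y = rng.standard_normal(n)                  # pure noise, sigma^2 = 1
    beta = boost_until_fit(X, y)
    risk = float(beta @ beta)                   # excess risk = ||beta||^2 under isotropy
    print(f"p/n = {p // n:>3}: risk = {risk:.3f}, 1/log(p/n) = {1 / math.log(p / n):.3f}")
```

If the paper is right, the printed risks should track 1/log(p/n) up to a constant as p/n grows, rather than shrinking in proportion to n/p.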
Original abstract
Benign overfitting is well-characterized in $\ell_2$ geometries, but its behavior under the $\ell_1$ implicit bias of greedy ensembles remains challenging. The analytical barrier stems from the non-linear coupling of coordinate selection thresholds, which invalidates standard spectral resolvent tools. To isolate this algorithmic bias, we characterize the high-dimensional risk of continuous-time $\ell_2$-Boosting over $p$ features and $n$ samples. By coupling the Convex Gaussian Minimax Theorem with delicate asymptotic expansions of double-sided truncated Gaussian moments, we analytically resolve the non-smooth $\ell_1$ interpolant. Under an isotropic pure-noise model, we prove that benign overfitting fails at the linear rate: greedy selection localizes noise into sparse active sets, and the excess variance decays at a logarithmic rate $\Theta(\sigma^2/\log(p/n))$ for noise variance $\sigma^2$. We remark that while this localization mechanism should persist in the presence of signals, the exact signal-noise decomposition remains an open problem. For spiked-isotropic designs with $k^*$ head eigenvalues and $r_2 = p - k^*$ tail dimensions, the risk converges to zero when $r_{2} \gg n$, but only at a logarithmic rate $\Theta(\sigma^2/\log(r_2/n))$, which is slower than the linear decay observed in $\ell_2$ geometries. To avoid this slow convergence, we analyze the non-smooth subdifferential dynamics of the boosting flow. This yields a tuning-free early stopping rule that, under a bounded $\ell_1$-path condition, recovers the Lasso basic inequality and attains the minimax-optimal empirical prediction rate for $\ell_1$-bounded signals.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes the high-dimensional risk of continuous-time ℓ₂-Boosting over p features and n samples. It couples the Convex Gaussian Minimax Theorem (CGMT) with asymptotic expansions of double-sided truncated Gaussian moments to resolve the non-smooth ℓ₁ interpolant induced by greedy coordinate selection. Under an isotropic pure-noise model, it claims that benign overfitting fails at the linear rate: noise localizes into sparse active sets and excess variance decays as Θ(σ²/log(p/n)). For spiked-isotropic designs it obtains a slower logarithmic rate Θ(σ²/log(r₂/n)) when r₂ ≫ n. It further derives a tuning-free early-stopping rule that, under a bounded ℓ₁-path condition, recovers the Lasso basic inequality and minimax-optimal empirical prediction rates.
Significance. If the central claims hold, the work clarifies how the ℓ₁ implicit bias of boosting prevents the linear-rate benign overfitting familiar from ℓ₂ geometries, while still permitting slow (logarithmic) convergence to zero risk. The technical device of coupling CGMT to path-dependent truncated-Gaussian expansions is a genuine advance for analyzing non-smooth greedy flows. The proposed early-stopping rule supplies a practical, tuning-free method for attaining optimal rates under ℓ₁-bounded signals. These contributions would be of clear interest to the high-dimensional statistics and optimization communities.
Major comments (2)
- [CGMT coupling and moment expansions (proof of the logarithmic excess-risk rate)] The claimed excess-variance rate Θ(σ²/log(p/n)) is obtained by integrating the CGMT representation along the continuous-time boosting path whose coordinate-selection thresholds evolve over an interval of width Θ(log(p/n)). The manuscript invokes asymptotic expansions of double-sided truncated Gaussian moments but supplies no explicit uniform remainder bound over this path-dependent range of truncation levels. If the integrated error is not o(1/log(p/n)), it contaminates the leading term at the same order, rendering the precise rate unproven. This is load-bearing for the main pure-noise theorem.
- [analysis of the non-smooth subdifferential dynamics and early-stopping rule] The early-stopping rule is asserted to recover the Lasso basic inequality and minimax rates once a bounded ℓ₁-path condition is imposed. The paper does not state or verify the design conditions under which this path bound holds (isotropic, spiked, or otherwise), nor does it quantify how often the condition is satisfied in the regimes of interest. Because the optimality claim rests on this assumption, its scope must be clarified.
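The first major comment's contamination worry admits a compact schematic statement. The notation below (path parameter t, threshold path τ(t), leading truncated-moment integrand m, expansion remainder R) is ours, introduced for illustration; it is not taken from the paper:

```latex
% Excess variance as an integral along the boosting path:
V \;=\; \int_0^T \Bigl[\, m\bigl(\tau(t)\bigr) + R\bigl(\tau(t)\bigr) \,\Bigr]\, dt,
\qquad T = \Theta\!\bigl(\log(p/n)\bigr).
% The claimed rate V = \Theta(\sigma^2/\log(p/n)) is established only if
% the remainder is controlled uniformly along the path:
\int_0^T \bigl|R\bigl(\tau(t)\bigr)\bigr|\, dt \;=\; o\!\left(\frac{1}{\log(p/n)}\right).
```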
Minor comments (2)
- [spiked-design paragraph] The transition from the pure-noise result to the spiked-isotropic case (r₂ = p - k*) would benefit from an explicit statement of which parts of the CGMT argument carry over unchanged and which require new expansions.
- [introduction and notation] Notation for the tail dimension r₂ is introduced only in the abstract and late in the text; an earlier definition would improve readability.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments identify key points where the proofs and assumptions require additional rigor, which we will address in the revision.
Point-by-point responses
Referee: [CGMT coupling and moment expansions (proof of the logarithmic excess-risk rate)] The claimed excess-variance rate Θ(σ²/log(p/n)) is obtained by integrating the CGMT representation along the continuous-time boosting path whose coordinate-selection thresholds evolve over an interval of width Θ(log(p/n)). The manuscript invokes asymptotic expansions of double-sided truncated Gaussian moments but supplies no explicit uniform remainder bound over this path-dependent range of truncation levels. If the integrated error is not o(1/log(p/n)), it contaminates the leading term at the same order, rendering the precise rate unproven. This is load-bearing for the main pure-noise theorem.
Authors: We agree that an explicit uniform remainder bound is required to rigorously justify the rate. In the revised manuscript we will add a new appendix lemma establishing a uniform bound on the approximation error of the double-sided truncated Gaussian moments over the full path-dependent interval of truncation levels of width Θ(log(p/n)). The lemma shows that the integrated remainder is o(1/log(p/n)), so that it does not affect the leading Θ(σ²/log(p/n)) term. This directly strengthens the proof of the pure-noise theorem. revision: yes
Referee: [analysis of the non-smooth subdifferential dynamics and early-stopping rule] The early-stopping rule is asserted to recover the Lasso basic inequality and minimax rates once a bounded ℓ₁-path condition is imposed. The paper does not state or verify the design conditions under which this path bound holds (isotropic, spiked, or otherwise), nor does it quantify how often the condition is satisfied in the regimes of interest. Because the optimality claim rests on this assumption, its scope must be clarified.
Authors: We will revise the text to state the precise design conditions under which the bounded ℓ₁-path condition holds with high probability. For isotropic designs the condition is satisfied when the signal is ℓ₁-bounded by a dimension-independent constant; for spiked-isotropic designs it holds whenever r₂ ≫ n. We will add a proposition that quantifies the success probability of the path bound, showing that it tends to 1 in the high-dimensional regime under the paper’s assumptions. This clarifies the scope of the early-stopping optimality claim. revision: yes
Circularity Check
No circularity: risk asymptotics derived from external CGMT and moment expansions
Full rationale
The derivation couples the Convex Gaussian Minimax Theorem (external) to the continuous-time boosting flow via asymptotic expansions of double-sided truncated Gaussian moments. These expansions are applied to path-dependent thresholds without reducing the excess-risk expression to a fitted parameter or a self-referential definition. The early-stopping rule recovers the Lasso basic inequality under an explicit bounded ℓ₁-path assumption rather than by construction. No self-citation is load-bearing for the main isotropic pure-noise result, and the logarithmic rate follows from the analysis without renaming or smuggling in an ansatz. The argument is self-contained and checked against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: isotropic pure-noise model
- Domain assumption: bounded ℓ₁-path condition