pith. machine review for the scientific record.

arxiv: 2605.13203 · v1 · submitted 2026-05-13 · 📊 stat.ME

Recognition: unknown

Double Descent and Emergent Smoothing in Model Averaging Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 18:23 UTC · model grok-4.3

classification 📊 stat.ME
keywords model averaging · double descent · high-dimensional regression · random matrix theory · emergent smoothing · out-of-sample risk · LaMA

The pith

Weighted aggregation in high-dimensional model averaging suppresses the double descent risk peak via emergent smoothing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that model averaging in linear regression with regressor count near sample size still displays double descent, with risk spiking as individual models approach interpolation. Weighted combination of those models, however, produces a smoothing effect that reduces the localized variance divergence at that boundary. The authors derive the precise asymptotic out-of-sample risk under a nested-model regime using random matrix theory, then introduce the LaMA procedure whose weight criterion trades in-sample bias against the asymptotic variance term to improve generalization.

Core claim

In high-dimensional linear regression where the number of regressors is comparable to the sample size, the model-averaging predictor inherits the variance explosion of its constituent estimators near the interpolation boundary, yet weighted aggregation simultaneously generates an emergent smoothing effect that structurally suppresses this localized risk divergence. The exact limiting risk is obtained via random matrix theory under a nested-model setting, which in turn yields the LaMA weighting rule combining in-sample bias with the derived asymptotic variance.
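The interpolation-boundary spike the claim refers to is easy to reproduce. The sketch below is not the paper's derivation, only the textbook minimum-norm least-squares phenomenon: out-of-sample risk of a single model peaks as p/n crosses one.

```python
# Minimal illustration (not the paper's setup): out-of-sample risk of the
# minimum-norm least-squares predictor spikes as p/n crosses 1.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, sigma = 100, 500, 0.5

def oos_risk(p, reps=20):
    """Monte Carlo out-of-sample risk of min-norm OLS with p regressors."""
    risks = []
    for _ in range(reps):
        beta = rng.normal(size=p) / np.sqrt(p)   # signal of fixed total strength
        X = rng.normal(size=(n, p))
        y = X @ beta + sigma * rng.normal(size=n)
        b_hat = np.linalg.pinv(X) @ y            # min-norm solution; valid for p >= n too
        Xt = rng.normal(size=(n_test, p))
        yt = Xt @ beta + sigma * rng.normal(size=n_test)
        risks.append(np.mean((Xt @ b_hat - yt) ** 2))
    return np.mean(risks)

for p in (20, 80, 95, 100, 120, 200):
    print(f"p/n = {p/n:4.2f}  risk = {oos_risk(p):.3f}")
```

The risk is moderate for small p/n, blows up as p/n approaches one, and descends again past it, which is the curve the paper's ensemble-level analysis starts from.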

What carries the argument

Emergent smoothing effect induced by strategic weighted aggregation of nested linear models, which damps the interpolation-boundary risk spike while preserving the double-descent shape overall.
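A schematic sketch of that mechanism, under assumed details (nested submodels taken as leading-column blocks, uniform weights; the paper's construction and weight rule may differ): averaging nested OLS predictors damps the spike that the largest, near-interpolating model suffers on its own.

```python
# Schematic sketch, not the paper's exact estimator: nested submodels use the
# first k columns; averaging their predictions damps the risk spike of the
# near-interpolating largest model (k = n).
import numpy as np

rng = np.random.default_rng(1)
n, p, n_test, sigma = 100, 100, 500, 0.5

beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + sigma * rng.normal(size=n)
Xt = rng.normal(size=(n_test, p))
yt = Xt @ beta + sigma * rng.normal(size=n_test)

sizes = [20, 40, 60, 80, 100]                  # nested submodel dimensions (assumed)
preds = []
for k in sizes:
    b_k = np.linalg.pinv(X[:, :k]) @ y         # min-norm OLS on first k regressors
    preds.append(Xt[:, :k] @ b_k)

risk_largest = np.mean((preds[-1] - yt) ** 2)          # k = p = n: interpolation
risk_uniform = np.mean((np.mean(preds, axis=0) - yt) ** 2)
print(f"largest model risk : {risk_largest:.3f}")
print(f"uniform average    : {risk_uniform:.3f}")
```

Because the wild predictor enters the average with weight 1/5, its variance contribution is cut by roughly a factor of 25, which is the arithmetic behind the "smoothing" framing.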

If this is right

  • The double-descent trajectory persists inside the model-averaging risk curve but is attenuated by weight choice.
  • The LaMA criterion yields lower out-of-sample risk than conventional averaging in the high-dimensional regime.
  • Asymptotic risk expressions become available for any fixed weighting scheme under the nested design.
  • Real-data applications show measurable gains in predictive accuracy when LaMA weights replace default schemes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Ensemble weighting may serve as an implicit regularizer that operates without changing the individual model class.
  • The same smoothing mechanism could be tested in non-linear or nonparametric settings where interpolation boundaries also produce risk spikes.
  • If the nested-model assumption is relaxed, the smoothing benefit may shrink or shift, offering a direct empirical test of the derivation's scope.

Load-bearing premise

The exact limiting risk formula and the LaMA criterion both rest on a nested-model structure plus random-matrix assumptions on the design matrix and noise distribution.

What would settle it

A simulation in which p/n approaches one, weights are chosen by the LaMA rule, and the height of the risk peak is compared against the peak obtained under uniform (equal-weight) averaging.
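A hedged version of that experiment, with one loud caveat: the LaMA rule itself is not reproduced here. As a stand-in for any variance-aware criterion, inverse-variance weights are built from the plug-in proxy sigma^2 * tr((X_k^T X_k)^+), which explodes for near-interpolating submodels; the question the passage poses is whether such weights lower the peak relative to uniform averaging.

```python
# Hedged experimental sketch. The weights below are a stand-in, NOT the LaMA
# criterion: each submodel is down-weighted by the plug-in variance proxy
# sigma^2 * tr((X_k^T X_k)^+), then risks are compared against uniform weights.
import numpy as np

rng = np.random.default_rng(2)
n, p, n_test, sigma = 100, 100, 500, 0.5
sizes = [20, 40, 60, 80, 100]                  # nested submodel dimensions (assumed)

def one_run():
    beta = rng.normal(size=p) / np.sqrt(p)
    X = rng.normal(size=(n, p))
    y = X @ beta + sigma * rng.normal(size=n)
    Xt = rng.normal(size=(n_test, p))
    yt = Xt @ beta + sigma * rng.normal(size=n_test)
    preds, var_proxy = [], []
    for k in sizes:
        Xk = X[:, :k]
        preds.append(Xt[:, :k] @ (np.linalg.pinv(Xk) @ y))
        var_proxy.append(sigma**2 * np.trace(np.linalg.pinv(Xk.T @ Xk)))
    preds = np.array(preds)
    w = 1.0 / np.array(var_proxy)
    w /= w.sum()                               # inverse-variance weights (stand-in)
    r_unif = np.mean((preds.mean(axis=0) - yt) ** 2)
    r_wtd = np.mean((w @ preds - yt) ** 2)
    return r_unif, r_wtd

runs = np.array([one_run() for _ in range(20)])
print("uniform  average risk:", runs[:, 0].mean())
print("weighted average risk:", runs[:, 1].mean())
```

If the paper's smoothing story is right, the LaMA weights should achieve at least the gap this crude proxy opens, and measuring that gap at the peak is exactly the settling experiment.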

read the original abstract

This paper investigates the predictive performance of model averaging in high-dimensional linear regression where the number of regressors is comparable to the sample size. We demonstrate that the double descent trajectory manifests within the model averaging framework, where the ensemble inherits the variance explosion of individual models near the interpolation boundary. However, we reveal that weighted aggregation simultaneously triggers an emergent smoothing effect that structurally suppresses the localized risk divergence, indicating that strategic weight choice serves as a vital stabilizing mechanism. Leveraging tools from random matrix theory, we derive the exact limiting out-of-sample risk under a nested model setting and provide a comprehensive characterization of the risk landscape. Building on these asymptotic results, we propose the Large Model Averaging (LaMA) method, which introduces a novel criterion incorporating in-sample bias and asymptotic out-of-sample variance to balance fitting accuracy and generalization. Numerical studies and real data applications confirm that LaMA achieves superior predictive accuracy in high-dimensional environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that in high-dimensional linear regression, model averaging exhibits the double descent phenomenon with variance explosion near the interpolation boundary, but weighted aggregation induces an emergent smoothing effect that structurally suppresses localized risk divergence. Leveraging random matrix theory under a nested model setting, the authors derive the exact limiting out-of-sample risk, characterize the risk landscape, and propose the Large Model Averaging (LaMA) criterion that balances in-sample bias with asymptotic out-of-sample variance; numerical studies and real-data applications are said to confirm superior predictive accuracy.

Significance. If the RMT derivation holds, the work supplies an exact asymptotic characterization of risk in model averaging ensembles and identifies a mechanism by which strategic weighting stabilizes interpolation behavior. The LaMA criterion offers a concrete, asymptotically motivated selection rule whose empirical performance is reported to outperform standard alternatives in high-dimensional regimes.

major comments (2)
  1. [§3] §3 (limiting risk derivation): the exact limiting out-of-sample risk and the claimed emergent smoothing effect are obtained exclusively under the nested model setting; the paper provides no analytic extension or simulation evidence for non-nested ensembles, yet the central stabilization claim appears to rely on the successive-inclusion bias-variance structure that may not transfer.
  2. [§4] §4 (LaMA criterion): the criterion is constructed directly from the nested-model asymptotic variance; without explicit verification that the in-sample bias term remains consistent when the nested assumption is relaxed, the superiority claim for LaMA inherits the same scope restriction.
minor comments (2)
  1. The abstract and theoretical sections should state the precise assumptions on the design matrix (e.g., Marchenko-Pastur regime, moment conditions) and noise distribution required for the RMT limit to hold.
  2. Figure captions for the risk curves should report the number of Monte Carlo replications and include pointwise variability measures to allow readers to assess the smoothness of the reported trajectories.
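For concreteness, the regime the first minor comment asks the authors to state can be checked numerically: for a design with iid unit-variance entries and p/n converging to gamma < 1, the eigenvalues of X^T X / n concentrate on the Marchenko-Pastur support [(1 - sqrt(gamma))^2, (1 + sqrt(gamma))^2]. A minimal check:

```python
# Marchenko-Pastur sanity check: empirical eigenvalues of X^T X / n versus the
# theoretical support edges for aspect ratio gamma = p/n.
import numpy as np

rng = np.random.default_rng(3)
n, p = 2000, 500                      # gamma = p/n = 0.25
gamma = p / n
X = rng.normal(size=(n, p))
eigs = np.linalg.eigvalsh(X.T @ X / n)

lo = (1 - np.sqrt(gamma)) ** 2
hi = (1 + np.sqrt(gamma)) ** 2
print(f"MP support: [{lo:.3f}, {hi:.3f}]")
print(f"empirical : [{eigs.min():.3f}, {eigs.max():.3f}]")
```

Stating which moment conditions on the entries (and which noise distribution) deliver this limit is precisely the assumption inventory the report requests.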

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major comment point by point below, acknowledging the scope limitations of our analysis while outlining targeted revisions to improve clarity and transparency.

read point-by-point responses
  1. Referee: [§3] §3 (limiting risk derivation): the exact limiting out-of-sample risk and the claimed emergent smoothing effect are obtained exclusively under the nested model setting; the paper provides no analytic extension or simulation evidence for non-nested ensembles, yet the central stabilization claim appears to rely on the successive-inclusion bias-variance structure that may not transfer.

    Authors: We agree that the exact limiting out-of-sample risk derivation and the emergent smoothing analysis are developed exclusively under the nested model setting, as stated in the manuscript. This setting is essential for our random matrix theory approach, which exploits the successive inclusion of predictors to obtain closed-form risk expressions. The stabilization effect is characterized precisely within this successive-inclusion bias-variance structure. We do not claim or demonstrate analytic results for non-nested ensembles, nor do we provide dedicated simulation evidence outside the nested case. In revision we will (i) reinforce the nested assumption in the abstract, introduction, and theoretical sections, (ii) add an explicit discussion of the limitation and the challenges of extending the RMT analysis to non-nested ensembles, and (iii) note that empirical validation for non-nested settings is left for future work. This is a partial revision focused on exposition and caveats. revision: partial

  2. Referee: [§4] §4 (LaMA criterion): the criterion is constructed directly from the nested-model asymptotic variance; without explicit verification that the in-sample bias term remains consistent when the nested assumption is relaxed, the superiority claim for LaMA inherits the same scope restriction.

    Authors: We acknowledge that the LaMA criterion is constructed from the asymptotic variance obtained under the nested model setting, together with an in-sample bias estimate. The theoretical justification for balancing these terms therefore inherits the nested-model scope. While the empirical results (numerical studies and real-data applications) show strong performance, they do not constitute formal verification of bias-term consistency outside the nested case. In the revised manuscript we will (i) explicitly state the nested-model motivation for LaMA, (ii) add a remark on the potential sensitivity of the criterion when the nested assumption is relaxed, and (iii) qualify the superiority claims accordingly. This constitutes a partial revision to improve transparency without altering the core proposal. revision: partial

standing simulated objections not resolved
  • Analytic extension of the limiting risk and emergent smoothing results to non-nested model ensembles

Circularity Check

0 steps flagged

Limiting risk derivation via random matrix theory is independent of LaMA proposal

full rationale

The paper first derives the exact limiting out-of-sample risk under a nested model setting using random matrix theory tools, which constitutes a standard asymptotic analysis whose inputs are the design matrix and noise assumptions rather than the target LaMA weights or performance metric. The LaMA criterion is subsequently defined by combining the resulting asymptotic variance expression with an in-sample bias term; this construction does not redefine the risk quantity in terms of itself or rename a fitted quantity as a prediction. No self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked as load-bearing steps in the provided derivation chain. The emergent smoothing characterization follows directly from the derived risk landscape rather than being presupposed.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the applicability of random matrix theory to obtain exact asymptotics under a nested model setting; no new free parameters or invented entities are introduced beyond the LaMA criterion itself.

axioms (1)
  • domain assumption: Random matrix theory applies to the high-dimensional linear model under the nested model setting.
    Invoked to derive the exact limiting out-of-sample risk.

pith-pipeline@v0.9.0 · 5452 in / 1190 out tokens · 59198 ms · 2026-05-14T18:23:30.005077+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

  1. Ando, R. and Komaki, F. (2023). On high-dimensional asymptotic properties of model averaging estimators. arXiv:2308.09476.
  2. Ando, T. and Li, K.-C. (2014). A model-averaging approach for high-dimensional regression. Journal of the American Statistical Association 109 254–265.
  3. Ando, T. and Li, K.-C. (2017). A weight-relaxed model averaging approach for high-dimensional generalized linear models. The Annals of Statistics 45 2654–2679.
  4. Belkin, M., Hsu, D., Ma, S. and Mandal, S. (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116 15849–15854.
  5. Belkin, M., Hsu, D. and Xu, J. (2020). Two models of double descent for weak features. SIAM Journal on Mathematics of Data Science 2 1167–1180.
  6. Dobriban, E. and Wager, S. (2018). High-dimensional asymptotics of prediction: ridge regression and classification. The Annals of Statistics 46 247–279.
  7. Fang, F., Yuan, C. and Tian, W. (2023). An asymptotic theory for least squares model averaging with nested models. Econometric Theory 39 412–441.
  8. Feng, Y. and Liu, Q. (2020). Nested model averaging on solution path for high-dimensional linear regression. Stat 9 e317.
  9. Hansen, B. E. (2007). Least squares model averaging. Econometrica 75 1175–1189.
  10. Hansen, B. E. and Racine, J. S. (2012). Jackknife model averaging. Journal of Econometrics 167 38–46.
  11. Hastie, T., Montanari, A., Rosset, S. and Tibshirani, R. J. (2022). Surprises in high-dimensional ridgeless least squares interpolation. The Annals of Statistics 50 949–986.
  12. Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88 365–411.
  13. Li, C., Li, Q., Racine, J. and Zhang, D. (2018). Optimal model averaging of varying coefficient models. Statistica Sinica 28 2795–2809.
  14. Liu, Q., Okui, R. and Yoshimura, A. (2016). Generalized least squares model averaging. Econometric Reviews 35 1692–1752.
  15. Peng, J., Li, Y. and Yang, Y. (2025). On optimality of Mallows model averaging. Journal of the American Statistical Association 120 1152–1163.
  16. Rubio, F. and Mestre, X. (2011). Spectral convergence for a general class of random matrices. Statistics & Probability Letters 81 592–602.
  17. Serdobolskii, V. I. (2008). Chapter 6: Theory of solution to high-order systems of empirical linear algebraic equations. In Multiparametric Statistics 239–284. Elsevier, Amsterdam.
  18. Silverstein, J. W. (1995). Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. Journal of Multivariate Analysis 55 331–339.
  19. Wan, A. T. K., Zhang, X. and Zou, G. (2010). Least squares model averaging by Mallows criterion. Journal of Econometrics 156 277–283.
  20. Xie, J., Yan, X. and Tang, N. (2021). A model-averaging method for high-dimensional regression with missing responses at random. Statistica Sinica 31 1005–1026.
  21. Zhang, X. (2021). Optimal model averaging based on generalized method of moments. Statistica Sinica 31 2103–2122.
  22. Zhang, X. (2021). A new study on asymptotic optimality of least squares model averaging. Econometric Theory 37 388–407.
  23. Zhang, X., Zou, G., Liang, H. and Carroll, R. J. (2020). Parsimonious model averaging with a diverging number of parameters. Journal of the American Statistical Association 115 972–984.
  24. Zhou, Z.-H. (2021). Linear models. In Machine Learning. Springer Nature, Singapore.