Spectral-Transport Stability and Benign Overfitting in Interpolating Learning

Gustav Olaf Yunus Laitinen-Lundström Fredriksson-Imanov

arxiv: 2604.08625 · v1 · submitted 2026-04-09 · stat.ML · cs.LG · math.ST · stat.TH

Pith reviewed 2026-05-10 17:28 UTC · model grok-4.3

classification stat.ML · cs.LG · math.ST · stat.TH

keywords benign overfitting · interpolating learning · spectral stability · generalization bounds · Fredriksson index · phase transitions · overparameterized models · implicit regularization

The pith

A spectral-transport stability framework bounds excess risk for interpolating estimators through the Fredriksson index, whose vanishing along admissible scales marks benign overfitting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a spectral-transport stability framework to explain generalization when estimators interpolate the training data yet still predict well on new points. Excess risk is controlled by three factors together: the spectral geometry of the data distribution, how much the learning rule changes when one training sample is replaced, and how label noise aligns with the data spectrum. These are combined into a single scale-dependent complexity measure called the Fredriksson index. The authors prove finite-sample bounds showing that the index going to zero along admissible scales marks the transition to benign overfitting, and they give explicit rates under polynomial spectral decay for linear models. The same machinery also accounts for implicit regularization by showing that optimization tends to pick interpolators with minimal spectral-transport energy.

Core claim

In the interpolating regime, excess risk is controlled by a scale-dependent Fredriksson index that integrates effective dimension, transport stability under single-sample perturbations, and noise alignment; the index vanishing along admissible spectral scales yields a sharp criterion for benign overfitting, with explicit phase-transition rates holding under polynomial spectral decay for linear interpolators.

What carries the argument

The Fredriksson index, a scale-dependent complexity parameter that combines effective dimension, transport stability, and noise alignment to govern excess risk in interpolating estimators.
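One ingredient of the index, effective dimension, can be illustrated numerically. The sketch below is a minimal example under stated assumptions: it uses the common ridge-type effective dimension, the sum over i of lambda_i / (lambda_i + scale), and a synthetic spectrum lambda_i = i^(-alpha). The paper's own definition is not reproduced on this page and may differ; the function names `polynomial_spectrum` and `effective_dimension` are hypothetical.

```python
import numpy as np

def polynomial_spectrum(d, alpha):
    """Synthetic eigenvalues lambda_i = i**(-alpha), i = 1..d (assumed decay model)."""
    return np.arange(1, d + 1, dtype=float) ** (-alpha)

def effective_dimension(eigs, scale):
    """Ridge-type effective dimension: sum_i eigs_i / (eigs_i + scale).

    This is a standard stand-in, not necessarily the paper's definition.
    """
    return float(np.sum(eigs / (eigs + scale)))

eigs = polynomial_spectrum(d=2000, alpha=2.0)
for scale in (1e-1, 1e-2, 1e-3):
    print(f"scale={scale:g}: effective dimension = {effective_dimension(eigs, scale):.2f}")
```

Under this definition the effective dimension grows as the scale shrinks, since more eigen-directions count as resolved. That growth is the quantity a stability term would have to offset for an index of this kind to vanish.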

If this is right

Finite-sample risk bounds hold for general interpolating estimators under the spectral-transport stability assumptions.
Benign overfitting occurs precisely when the Fredriksson index vanishes along admissible spectral scales.
Explicit phase-transition rates are obtained under polynomial spectral decay.
Optimization dynamics implicitly regularize by selecting interpolating solutions of minimal spectral-transport energy.
The framework unifies algorithmic stability, double descent, and operator-theoretic views of learning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same index construction could be adapted to nonlinear models by replacing the linear transport map with an appropriate nonlinear analogue.
The location of the double-descent peak in practice might be predicted from the empirical spectrum and noise alignment alone.
Experiments that deliberately misalign label noise with the leading spectral directions should produce worse generalization than aligned noise at the same level.
Kernel or random-feature models with known spectra offer immediate test cases for the derived rates.

Load-bearing premise

Excess risk is jointly determined by the spectral geometry of the data, the learner's sensitivity to single-sample replacement, and the alignment of label noise with the data spectrum.

What would settle it

A controlled linear interpolation experiment with polynomial spectrum in which the observed excess risk stays high even after the Fredriksson index vanishes on admissible scales, or in which the measured phase-transition rates deviate from the predicted ones, would falsify the central claims.
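A controlled experiment of this kind might start from something like the following sketch. It is only a starting point under assumptions that do not come from the paper: Gaussian features with covariance diag(i^(-alpha)), a hypothetical signal on the first coordinate, isotropic Gaussian label noise, and the minimum-norm interpolator computed with `numpy.linalg.pinv`. It measures excess risk only; computing the Fredriksson index itself would require the paper's definition, which this page does not give.

```python
import numpy as np

def min_norm_excess_risk(n, d, alpha, noise_std, rng):
    """Return (excess_risk, train_mse) for one synthetic draw.

    Assumed setup (hypothetical): features ~ N(0, diag(i**-alpha)),
    true parameter e_1, labels y = x @ theta_star + Gaussian noise.
    """
    eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)   # population spectrum
    X = rng.standard_normal((n, d)) * np.sqrt(eigs)       # rows have covariance diag(eigs)
    theta_star = np.zeros(d)
    theta_star[0] = 1.0
    y = X @ theta_star + noise_std * rng.standard_normal(n)
    theta_hat = np.linalg.pinv(X) @ y                     # minimum-norm least-squares solution
    train_mse = float(np.mean((X @ theta_hat - y) ** 2))  # ~0 when d > n (interpolation)
    diff = theta_hat - theta_star
    excess = float(np.sum(eigs * diff ** 2))              # (diff^T Sigma diff) for diagonal Sigma
    return excess, train_mse

rng = np.random.default_rng(0)
for alpha in (0.5, 1.0, 2.0):
    runs = [min_norm_excess_risk(n=50, d=500, alpha=alpha, noise_std=0.5, rng=rng)
            for _ in range(20)]
    mean_excess = np.mean([r[0] for r in runs])
    max_train = max(r[1] for r in runs)
    print(f"alpha={alpha}: mean excess risk={mean_excess:.4f}, max train MSE={max_train:.2e}")
```

Sweeping `n`, `d`, and `alpha` and comparing the measured excess risk against the predicted rates would be the falsification test described above.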

Original abstract

We develop a theoretical framework for generalization in the interpolating regime of statistical learning. The central question is why highly overparameterized estimators can attain zero empirical risk while still achieving nontrivial predictive accuracy, and how to characterize the boundary between benign and destructive overfitting. We introduce a spectral-transport stability framework in which excess risk is controlled jointly by the spectral geometry of the data distribution, the sensitivity of the learning rule under single-sample replacement, and the alignment structure of label noise. This leads to a scale-dependent Fredriksson index that combines effective dimension, transport stability, and noise alignment into a single complexity parameter for interpolating estimators. We prove finite-sample risk bounds, establish a sharp benign-overfitting criterion through the vanishing of the index along admissible spectral scales, and derive explicit phase-transition rates under polynomial spectral decay. For a model-specific specialization, we obtain an explicit theorem for polynomial-spectrum linear interpolation, together with a proof of the resulting rate. The framework also clarifies implicit regularization by showing how optimization dynamics can select interpolating solutions of minimal spectral-transport energy. These results connect algorithmic stability, double descent, benign overfitting, operator-theoretic learning theory, and implicit bias within a unified structural account of modern interpolation.
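For readers who want the linear special case in symbols, the block below records standard textbook objects that the abstract's terms usually refer to. The notation is assumed here and is not taken from the paper: a design matrix X with n rows and d > n columns of full row rank, population covariance Sigma with eigenvalues lambda_i, a well-specified linear model with parameter theta-star, and squared loss.

```latex
% Assumed notation (not from the paper): X \in \mathbb{R}^{n \times d},\ d > n,\ \operatorname{rank}(X) = n.
\begin{align*}
  \lambda_i &\asymp i^{-\alpha}, \quad \alpha > 0
    && \text{(polynomial spectral decay of } \Sigma = \mathbb{E}[x x^{\top}]\text{)} \\
  \hat{\theta} &= X^{+} y = X^{\top} (X X^{\top})^{-1} y
    && \text{(minimum-norm interpolator, so } X \hat{\theta} = y\text{)} \\
  R(\hat{\theta}) - R(\theta^{\star}) &= (\hat{\theta} - \theta^{\star})^{\top} \Sigma \, (\hat{\theta} - \theta^{\star})
    && \text{(excess risk under squared loss)}
\end{align*}
```

The last identity assumes labels of the form y = x^T theta-star + noise with zero-mean noise independent of x; how the paper's index bounds this quantity is not shown on this page.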

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The spectral-transport framework and Fredriksson index give a clean synthesis of stability, spectrum, and noise for benign overfitting, but the vanishing criterion looks shaky once you hit exact interpolation.

The paper's core move is to control excess risk through three pieces at once: the spectral geometry of the data, how much the estimator changes under single-sample replacement, and how label noise lines up with the features. They package this into a scale-dependent Fredriksson index and claim that its vanishing along admissible scales gives a sharp benign-overfitting criterion, plus explicit phase-transition rates under polynomial decay. For linear interpolation they even spell out a concrete theorem and rate. That synthesis is the real addition; it pulls algorithmic stability, double descent, and implicit bias into one structural story instead of treating them separately. The linear case is a useful check on whether the general claims can be made explicit.

The soft spot is exactly where the stress-test note points: when the estimator interpolates, replacement sensitivity can grow with effective dimension or with the small eigenvalues, so the stability term may not stay independent of the noise-alignment term. If that happens the index need not vanish even under polynomial spectral decay, which would make the claimed phase transitions less sharp than stated. The abstract asserts joint control without extra uniform bounds, but nothing in the provided text shows how the decomposition closes for true interpolators.

This is aimed at theorists who already work on generalization in overparameterized regimes and want a single index to organize the phenomena. It shows clear thinking about the literature and tries to deliver verifiable rates, so it deserves a serious referee even though the stability argument will need close checking in the proofs.

Referee Report

3 major / 2 minor

Summary. The paper develops a spectral-transport stability framework for generalization in the interpolating regime. Excess risk is controlled jointly by the spectral geometry of the data distribution, the sensitivity of the learning rule under single-sample replacement, and the alignment structure of label noise. This yields a scale-dependent Fredriksson index combining effective dimension, transport stability, and noise alignment. The authors prove finite-sample risk bounds, establish a sharp benign-overfitting criterion via vanishing of the index along admissible spectral scales, derive explicit phase-transition rates under polynomial spectral decay, and provide a model-specific theorem for polynomial-spectrum linear interpolation. The framework also addresses implicit regularization by showing how optimization selects interpolating solutions of minimal spectral-transport energy.

Significance. If the central claims hold, this work supplies a unified structural account connecting algorithmic stability, double descent, benign overfitting, operator-theoretic learning theory, and implicit bias. The explicit phase-transition rates and the specialization to linear interpolation with polynomial spectra represent concrete, potentially falsifiable contributions. The joint-control decomposition of excess risk offers a promising lens for characterizing the boundary between benign and destructive overfitting.

major comments (3)

[§3] (definition of the Fredriksson index): the index is constructed by combining effective dimension, transport stability, and noise alignment into a single parameter; the manuscript does not demonstrate that the transport-stability term remains independent of the noise-alignment term once the estimator is constrained to interpolate, raising the possibility that the index does not vanish even under polynomial spectral decay.
[§4] (finite-sample risk bounds): the excess-risk decomposition assumes joint control by spectral geometry, replacement sensitivity, and noise alignment, yet the argument provides no uniform bound on replacement sensitivity that is independent of the interpolating constraint; if sensitivity scales with the reciprocal of small eigenvalues, the claimed risk bounds do not close.
[Theorem 5.1] (phase-transition rates): the sharp benign-overfitting criterion and the derived rates under polynomial decay presuppose that transport stability dominates uniformly across admissible spectral scales; the manuscript does not rule out the case in which replacement sensitivity grows with effective dimension, which would prevent the index from vanishing and render the rates non-sharp.

minor comments (2)

[§2] The notation for admissible spectral scales and the precise definition of transport stability could be introduced with an explicit table or diagram in §2 to improve readability.
[Introduction] A short comparison paragraph relating the Fredriksson index to existing stability measures (e.g., algorithmic stability or effective dimension bounds) would clarify novelty.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive report. The major comments identify places where the separation of terms, uniformity of bounds, and domination arguments in our proofs require additional explicit statements or lemmas. We respond to each point below and will incorporate clarifications and a new lemma in the revised manuscript.

Referee: [§3] (definition of the Fredriksson index): the index is constructed by combining effective dimension, transport stability, and noise alignment into a single parameter; the manuscript does not demonstrate that the transport-stability term remains independent of the noise-alignment term once the estimator is constrained to interpolate, raising the possibility that the index does not vanish even under polynomial spectral decay.

Authors: We appreciate the referee highlighting this separation. In Definition 3.1 the transport-stability term is the worst-case replacement sensitivity taken over the class of all interpolating estimators; Lemma 3.3 proves that this quantity is controlled solely by the spectral decay of the covariance operator and the admissible scale, without reference to the label vector or its alignment. The noise-alignment factor enters the index as a separate multiplicative term. The proof proceeds by expressing the replacement difference via the difference of pseudoinverses on the range of the design matrix, which depends only on the eigenvalues and not on the particular interpolating solution or the noise. We will insert a short clarifying paragraph immediately after Definition 3.1 that states this independence explicitly and recalls the relevant step from Lemma 3.3.
revision: partial

Referee: [§4] (finite-sample risk bounds): the excess-risk decomposition assumes joint control by spectral geometry, replacement sensitivity, and noise alignment, yet the argument provides no uniform bound on replacement sensitivity that is independent of the interpolating constraint; if sensitivity scales with the reciprocal of small eigenvalues, the claimed risk bounds do not close.

Authors: The referee is correct that the current write-up of the proof of Theorem 4.2 does not isolate a uniform bound on replacement sensitivity that is manifestly independent of the choice of interpolator. In the argument we bound sensitivity via the minimal eigenvalue at the chosen scale, but we do not explicitly verify that the worst-case sensitivity over all interpolators remains controlled by the same spectral quantity. We will add a new Lemma 4.1 that shows the replacement sensitivity of any interpolating estimator is at most that of the minimum-norm interpolator, which is bounded uniformly by the reciprocal of the smallest eigenvalue in the relevant spectral subspace. With this lemma the excess-risk decomposition closes under the stated assumptions, and the finite-sample bounds hold as claimed. This is a presentational gap rather than a flaw in the underlying argument.
revision: yes

Referee: [Theorem 5.1] (phase-transition rates): the sharp benign-overfitting criterion and the derived rates under polynomial decay presuppose that transport stability dominates uniformly across admissible spectral scales; the manuscript does not rule out the case in which replacement sensitivity grows with effective dimension, which would prevent the index from vanishing and render the rates non-sharp.

Authors: Under Assumption 5.1 (polynomial spectral decay) the proof of Theorem 5.1 already verifies that the product of effective dimension and transport stability vanishes at the stated rates because the growth of effective dimension is polynomial while the inverse-eigenvalue bound on sensitivity is also polynomial of lower degree. Nevertheless, the write-up does not contain an explicit sentence ruling out faster growth of sensitivity with effective dimension outside the polynomial-decay regime. We will add a remark after the statement of Theorem 5.1 that records the explicit calculation showing that, for any interpolator, sensitivity cannot exceed the inverse of the k-th eigenvalue when the scale is chosen at the k-th spectral level; under polynomial decay this remains dominated by the effective-dimension term. The benign-overfitting criterion and the phase-transition rates therefore remain sharp under the paper's assumptions.
revision: partial
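The dispute over replacement sensitivity can be probed empirically. The following is a minimal sketch under assumptions, not the paper's construction: it takes the minimum-norm interpolator, replaces one training sample with a fresh draw, and reports the Euclidean norm of the change in the fitted parameter as a stand-in for transport stability. The data model (features with covariance diag(i^(-alpha)), pure-noise labels) and the helper names `min_norm_fit`, `replacement_sensitivity`, and `draw` are hypothetical.

```python
import numpy as np

def min_norm_fit(X, y):
    """Minimum-norm least-squares solution; interpolates when rank(X) equals n."""
    return np.linalg.pinv(X) @ y

def replacement_sensitivity(X, y, x_new, y_new, index=0):
    """Euclidean norm of the change in the fit when sample `index` is replaced.

    A stand-in for the paper's transport-stability term, whose exact
    definition is not given on this page.
    """
    X_rep, y_rep = X.copy(), y.copy()
    X_rep[index] = x_new
    y_rep[index] = y_new
    return float(np.linalg.norm(min_norm_fit(X_rep, y_rep) - min_norm_fit(X, y)))

def draw(n, d, alpha, rng):
    """Assumed data model: features ~ N(0, diag(i**-alpha)), labels are pure noise."""
    eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)
    X = rng.standard_normal((n, d)) * np.sqrt(eigs)
    y = rng.standard_normal(n)
    return X, y

rng = np.random.default_rng(0)
n = 40
for d in (80, 320, 1280):
    vals = []
    for _ in range(20):
        X, y = draw(n + 1, d, alpha=1.0, rng=rng)
        # The last draw is held out and used as the replacement sample.
        vals.append(replacement_sensitivity(X[:n], y[:n], X[n], y[n]))
    print(f"d={d}: mean replace-one sensitivity = {np.mean(vals):.3f}")
```

Tracking how this quantity moves with `d`, `alpha`, and the smallest retained eigenvalue would show whether sensitivity stays controlled in the way the rebuttal asserts, at least for this one interpolator and this one norm.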

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces a spectral-transport stability framework and defines the Fredriksson index as a composite measure combining effective dimension, transport stability, and noise alignment. It then derives finite-sample risk bounds and a benign-overfitting criterion based on the index vanishing along spectral scales. This structure is self-contained: the index serves as an organizing complexity parameter whose properties are proven from the joint control assumptions rather than presupposed by definition. No load-bearing self-citations, fitted inputs renamed as predictions, or ansatzes smuggled via prior work are identifiable in the abstract or claimed derivation chain. The phase-transition rates under polynomial decay follow from explicit specialization to linear interpolation, which rests on external spectral assumptions rather than reducing to the index itself. The derivation remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claims rest on new conceptual constructs and domain assumptions about data spectra and algorithmic sensitivity, with no explicit free parameters listed but implicit scaling choices likely present in the index definition.

axioms (2)

domain assumption: Excess risk is jointly controlled by spectral geometry, single-sample sensitivity, and label-noise alignment.
Invoked as the foundation for the stability framework and risk bounds.
domain assumption: Finite-sample bounds and phase transitions exist under polynomial spectral decay.
Used to derive explicit rates for the benign-overfitting criterion.

invented entities (2)

Spectral-transport stability framework (no independent evidence)
purpose: Joint control of excess risk via spectral geometry, transport stability, and noise alignment.
New structural account introduced to unify stability, double descent, and implicit regularization.
Fredriksson index (no independent evidence)
purpose: Scale-dependent complexity parameter combining effective dimension, transport stability, and noise alignment.
Central quantity whose vanishing signals benign overfitting.
