The generalization error of random features regression: precise asymptotics and the double descent curve
3 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 3 unverdicted
citing papers explorer
- Large Dimensional Kernel Ridge Regression: Extending to Product Kernels
Extends high-dimensional KRR to product kernels, proving convergence rates that recover minimax optimality for source condition s ≤ 1, saturation for s > 1, and multiple-descent phenomena with respect to sample size n.
- Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces
Derives adaptive generalization bounds {c_m / N^{1/(2∨m)}} for digital ML models via new concentration of measure results on finite metric spaces, with c_m = O(sqrt(m)).
- Internal Data Repetition Destroys Language Models
Repetition of training data produces a systematic eval loss peak at intermediate repeat counts whose location scales with model size, quantifiable as large compute-equivalent loss even at modest repetition fractions.