Phase transitions for feature learning in neural networks
4 Pith papers cite this work. Polarity classification is still indexing.
[Citation summary: years: 2026 (4); verdicts: UNVERDICTED (4); roles: background (1); polarities: background (1)]
Citing papers explorer
- Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning
  Neural LoFi models deep learning as layer-wise spectral filtering that selects maximal low-degree correlations, yielding a tractable surrogate for hierarchical representation learning beyond the lazy regime. (A toy illustration of the low-degree-correlation idea follows the list.)
- Phases of Muon: When Muon Eclipses SignSGD
  On least-squares problems with power-law covariance, SignSVD (Muon) and SignSGD (a proxy for Adam) exhibit three phases of relative performance, depending on the data exponent α and the target exponent β. (A minimal sketch of the two updates follows the list.)
- Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models
  For multi-index polynomial targets, the top-r eigenspace of the AGOP matrix computed from kernel ridge regression recovers the central subspace at sample complexity n ~ d^{p+δ}, where p is the degree of the informative component. (A sketch of this pipeline follows the list.)
- There Will Be a Scientific Theory of Deep Learning
  A mechanics of the learning process is emerging in deep learning theory, characterized by dynamical descriptions, coarse-grained statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.
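To make the Neural LoFi summary concrete, here is a toy illustration of what selecting a "maximal low-degree correlation" can look like: for a single-index target over Gaussian inputs, the degree-1 correlation E[y·x] already aligns with the hidden direction. This is a minimal sketch of the general idea under assumed conditions (single-index model, tanh link, isotropic Gaussian data), not the paper's Neural LoFi construction.

```python
import numpy as np

# Toy setting (assumed, not from the paper): single-index target y = g(w.x)
# over isotropic Gaussian inputs. The degree-1 correlation E[y * x] is the
# simplest low-degree statistic and aligns with w whenever g has a nonzero
# first Hermite coefficient.
rng = np.random.default_rng(0)
d, n = 50, 20_000
w = rng.standard_normal(d)
w /= np.linalg.norm(w)

X = rng.standard_normal((n, d))
y = np.tanh(X @ w)

c1 = X.T @ y / n              # empirical degree-1 correlation E[y * x]
v = c1 / np.linalg.norm(c1)   # candidate "filtered" direction

print("alignment |<v, w>|:", abs(v @ w))  # approaches 1 as n grows
```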
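For the Muon entry, a minimal sketch contrasting the two updates on a synthetic least-squares problem whose data covariance eigenvalues decay like i^{-α} and whose target coefficients decay like i^{-β}. The dimensions, exponents, step count, and learning rate are illustrative assumptions; the paper's precise setup and phase boundaries are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, steps, lr = 100, 10, 2000, 200, 0.01
alpha, beta = 1.0, 1.5  # assumed data / target exponents, illustration only

# Data with power-law covariance: eigenvalue of coordinate i ~ i^{-alpha}
lam = np.arange(1, d + 1, dtype=float) ** -alpha
X = rng.standard_normal((n, d)) * np.sqrt(lam)

# Matrix target whose row norms decay like i^{-beta}
W_star = (np.arange(1, d + 1.0) ** -beta)[:, None] * rng.standard_normal((d, k))
Y = X @ W_star

def grad(W):
    # gradient of (1/2n) * ||X W - Y||_F^2
    return X.T @ (X @ W - Y) / n

def sign_sgd_step(W):
    # SignSGD (Adam proxy): elementwise sign of the gradient
    return W - lr * np.sign(grad(W))

def sign_svd_step(W):
    # SignSVD (Muon core): set the gradient's singular values to 1,
    # i.e. step along U V^T where U S V^T is the SVD of the gradient
    U, _, Vt = np.linalg.svd(grad(W), full_matrices=False)
    return W - lr * (U @ Vt)

for name, step in [("SignSGD", sign_sgd_step), ("SignSVD", sign_svd_step)]:
    W = np.zeros((d, k))
    for _ in range(steps):
        W = step(W)
    print(f"{name}: loss = {np.mean((X @ W - Y) ** 2):.5f}")
```

Which update wins depends on how α and β are set; sweeping the two exponents is the natural way to probe the three phases the summary mentions.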
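For the AGOP entry, a sketch of the pipeline the summary describes: fit kernel ridge regression, form the Average Gradient Outer Product M = (1/n) Σ_i ∇f(x_i) ∇f(x_i)^T, and take the top-r eigenspace of M as the central-subspace estimate. The RBF kernel, its bandwidth, the degree-2 link, and all sizes are illustrative assumptions; the paper's kernel and sampling regime may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, r, sigma, ridge = 20, 1000, 2, 3.0, 1e-3  # illustrative sizes

# Multi-index target: y depends on x only through an r-dimensional projection
U_star, _ = np.linalg.qr(rng.standard_normal((d, r)))
X = rng.standard_normal((n, d))
Z = X @ U_star
y = Z[:, 0] ** 2 + Z[:, 0] * Z[:, 1]  # degree-2 polynomial link (assumed)

def rbf(A, B):
    # Gaussian kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma ** 2))

# Kernel ridge regression: f(x) = sum_j a_j k(x, x_j)
K = rbf(X, X)
a = np.linalg.solve(K + ridge * np.eye(n), y)

# AGOP: M = (1/n) sum_i grad f(x_i) grad f(x_i)^T; for the RBF kernel,
# grad f(x) = sum_j a_j k(x, x_j) (x_j - x) / sigma^2
G = ((a[None, :] * K) @ X - (K @ a)[:, None] * X) / sigma ** 2  # row i = grad f(x_i)
M = G.T @ G / n

# Top-r eigenspace of M as the central-subspace estimate
_, eigvecs = np.linalg.eigh(M)
U_hat = eigvecs[:, -r:]

# cos of the largest principal angle between span(U_hat) and span(U_star)
overlap = np.linalg.svd(U_hat.T @ U_star, compute_uv=False).min()
print("subspace overlap (1 = exact recovery):", overlap)
```

Recovery quality in this toy run depends on n relative to d^p (p = 2 for this link), consistent with the n ~ d^{p+δ} sample complexity in the summary.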