Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, and Jacob Steinhardt

[MW26] Andrea Montanari, Zihao Wang · 2026 · arXiv 2602.01434

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

citation-polarity summary: background 2

representative citing papers

Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

Neural LoFi models deep learning as layer-wise spectral filtering that selects maximal low-degree correlations, yielding a tractable surrogate for hierarchical representation learning beyond the lazy regime.

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

cs.LG · 2026-02-18 · unverdicted · novelty 8.0

Grokking reflects escape from a metastable low-dimensional regime where transverse curvature accumulates before generalization, with subspace motion necessary but curvature boost insufficient.

The Geometry of Statistical Feature Learning in Mean-Field Langevin Dynamics

math.ST · 2026-06-30 · unverdicted · novelty 7.0

Spherical mean-field Langevin dynamics concentrate near hidden indices in Gaussian multi-index models with a sharp temperature transition at λ ≃ 1 and achieve d/N and Md/N rates in single-index models via Lévy-Milman concentration.

Phases of Muon: When Muon Eclipses SignSGD

math.OC · 2026-05-10 · unverdicted · novelty 7.0

On power-law covariance least squares problems, SignSVD (Muon) and SignSGD (Adam proxy) show three phases of relative performance depending on data exponent α and target exponent β.

The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure

cs.LG · 2026-02-19 · unverdicted · novelty 7.0

Multi-task grokking in Transformers produces staggered generalization, low-dimensional manifolds, weight-decay phase structure, holographic solutions, and transverse redundancy.

Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models

stat.ML · 2026-05-14 · unverdicted · novelty 6.0

For multi-index polynomials, the top-r eigenspace of the AGOP matrix from KRR recovers the central subspace at sample complexity n ~ d^{p+δ}, where p is the degree of the informative component.

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.
