pith. sign in

Learning Quadratic Neural Networks in High Dimensions

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

years

2026 5

roles

background 2

polarities

background 2

representative citing papers

Phases of Muon: When Muon Eclipses SignSGD

math.OC · 2026-05-10 · unverdicted · novelty 7.0

On power-law covariance least squares problems, SignSVD (Muon) and SignSGD (Adam proxy) show three phases of relative performance depending on data exponent α and target exponent β.

Asymmetric Scaling Laws from Sparse Features

stat.ML · 2026-05-22 · unverdicted · novelty 5.0

A sparse-activation model predicts double-descent loss with distinct under- and over-parameterized scaling exponents set by sparsity, plus a compute-optimal frontier favoring dataset growth.

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

citing papers explorer

Showing 5 of 5 citing papers.

  • Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model stat.ML · 2026-05-14 · accept · none · ref 45

    A solvable hierarchical model with power-law feature strengths yields explicit power-law scaling of prediction error through sequential recovery of latent directions by a layer-wise spectral algorithm.

  • Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks stat.ML · 2026-05-11 · unverdicted · none · ref 2

    In extensive-width networks, features are recovered sequentially through sharp phase transitions, yielding an effective width k_c that unifies Bayes-optimal generalization error scaling as Θ(k_c d / n).

  • Phases of Muon: When Muon Eclipses SignSGD math.OC · 2026-05-10 · unverdicted · none · ref 6

    On power-law covariance least squares problems, SignSVD (Muon) and SignSGD (Adam proxy) show three phases of relative performance depending on data exponent α and target exponent β.

  • Asymmetric Scaling Laws from Sparse Features stat.ML · 2026-05-22 · unverdicted · none · ref 4

    A sparse-activation model predicts double-descent loss with distinct under- and over-parameterized scaling exponents set by sparsity, plus a compute-optimal frontier favoring dataset growth.

  • There Will Be a Scientific Theory of Deep Learning stat.ML · 2026-04-23 · unverdicted · none · ref 56

    A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.