pith. machine review for the scientific record.

The Hidden Width of Deep ResNets: Tight Error Bounds and Phase Diagrams

3 Pith papers cite this work. Polarity classification is still indexing.

All 3 citing papers are from 2026; all are currently unverdicted.

representative citing papers

Uniform Scaling Limits in AdamW-Trained Transformers

stat.ML · 2026-05-11 · unverdicted · novelty 7.0

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

citing papers explorer

Showing 3 of 3 citing papers.

  • Uniform Scaling Limits in AdamW-Trained Transformers · stat.ML · 2026-05-11 · unverdicted · polarity none · ref 11

    AdamW-trained transformer hidden states and backpropagated variables converge uniformly in L2 to a forward-backward ODE system (McKean-Vlasov when non-causal) at rate O(L^{-1}+L^{-1/3}H^{-1/2}) as depth L and heads H increase, with bounds independent of token number.

  • Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models · math.PR · 2026-04-29 · unverdicted · polarity none · ref 13

    Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when common noise is coercive relative to self-attention drift.

  • There Will Be a Scientific Theory of Deep Learning · stat.ML · 2026-04-23 · unverdicted · polarity none · ref 13

    A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.