pith. sign in

The hidden width of deep resnets: Tight error bounds and phase diagrams

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

citation-role summary

baseline 1

citation-polarity summary

years

2026 6

verdicts

UNVERDICTED 6

roles

baseline 1

polarities

baseline 1

representative citing papers

Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks

stat.ML · 2026-05-21 · unverdicted · novelty 7.0

Finite-width shallow networks remain within poly(d) m^{-min(1,c/6)} of their mean-field limit uniformly in time when mean-field excess loss decays as t^{-c} under standard regularity and an integral condition on the loss.

Uniform Scaling Limits in AdamW-Trained Transformers

stat.ML · 2026-05-11 · unverdicted · novelty 7.0

AdamW-trained transformer hidden states and backpropagated variables converge uniformly in L2 to a forward-backward ODE system (McKean-Vlasov when non-causal) at rate O(L^{-1}+L^{-1/3}H^{-1/2}) as depth L and heads H increase, with bounds independent of token number.

Propagation of Chaos in Contextual Flow Maps

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

Derives forward and backward propagation-of-chaos bounds for finite vs. infinite-context transformers modeled as contextual flow maps, achieving Wasserstein rate n^{-1/d} generally and n^{-1/2} for transformer-like cases.

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

citing papers explorer

Showing 6 of 6 citing papers.

  • Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks stat.ML · 2026-05-21 · unverdicted · none · ref 3

    Finite-width shallow networks remain within poly(d) m^{-min(1,c/6)} of their mean-field limit uniformly in time when mean-field excess loss decays as t^{-c} under standard regularity and an integral condition on the loss.

  • Uniform Scaling Limits in AdamW-Trained Transformers stat.ML · 2026-05-11 · unverdicted · none · ref 11

    AdamW-trained transformer hidden states and backpropagated variables converge uniformly in L2 to a forward-backward ODE system (McKean-Vlasov when non-causal) at rate O(L^{-1}+L^{-1/3}H^{-1/2}) as depth L and heads H increase, with bounds independent of token number.

  • Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models math.PR · 2026-04-29 · unverdicted · none · ref 13

    Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when common noise is coercive relative to self-attention drift.

  • Propagation of Chaos in Contextual Flow Maps cs.LG · 2026-05-16 · unverdicted · none · ref 9

    Derives forward and backward propagation-of-chaos bounds for finite vs. infinite-context transformers modeled as contextual flow maps, achieving Wasserstein rate n^{-1/d} generally and n^{-1/2} for transformer-like cases.

  • Functional Scaling Limits of Interpolated Correlated Random Walks in H\"older Topology math.PR · 2026-06-02 · unverdicted · none · ref 24

    Strengthens classical scaling limit theorems for correlated random walks to functional convergence in Hölder and rough Hölder topologies.

  • There Will Be a Scientific Theory of Deep Learning stat.ML · 2026-04-23 · unverdicted · none · ref 13

    A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.