A mean field view of the landscape of two-layer neural networks
5 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (1)
polarities: background (1)
citing papers explorer
- A Theory of Saddle Escape in Deep Nonlinear Networks
Derives an exact Frobenius-norm imbalance identity for deep nonlinear networks, classifies activations into four classes, and obtains a critical-depth escape-time law τ★ = Θ(ε^{-(r-2)}) by reduction to a scalar ODE on a permutation-symmetric submanifold.
- Geometric Dyson Brownian Motions and the Free Log-Normal Limit for a Non-Square Product of Random Matrices
In double asymptotic limits, the squared singular value process of non-square matrix products obeys geometric Dyson Brownian motion whose T-transform solves a Burgers equation, producing the free log-normal law via free multiplicative convolution.
- A First-Order Mean Field Control Analysis of Transformer Layers under Cross-Entropy Training
Transformer residual layers are approximated as an explicit Euler scheme for a controlled hidden-state flow whose mean-field limit is a first-order transport control problem with Pontryagin terminal condition given by the softmax residual.
- Balancing structure and randomness: maximum entropy networks for context-dependent computations
Maximum entropy inference on weight distributions under context-dependent task constraints produces neuron populations with contextual gain modulation whose connectivity matches gradient-descent trained networks, with transitions to random structure as context count or weight scale increases.
- On the global convergence of gradient descent for wide shallow models with bounded nonlinearities
Gradient descent on wide shallow models with bounded nonlinearities converges globally in the mean-field limit as non-global critical points are unstable under the dynamics.