The mean-field dynamics of transformers

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

years

2026 14

verdicts

UNVERDICTED 14

representative citing papers

Phase transitions for the noisy transformer model in arbitrary dimension

math.AP · 2026-06-03 · unverdicted · novelty 7.0

In every dimension d≥2 there exists a unique β_*^{(d)}>0 such that the uniform density on the sphere is the unique global minimizer of the USA free energy up to the linear-stability threshold K_# for β≤β_*, yielding a continuous transition, while for β>β_* the uniform density is not globally minimiz

The physics of AI weather models

physics.ao-ph · 2026-05-22 · unverdicted · novelty 7.0

AI weather models may simulate the atmosphere via particle positions in latent space whose updates follow gradient flow on a learned free energy functional rather than conventional physical equations.

Uniform Scaling Limits in AdamW-Trained Transformers

stat.ML · 2026-05-11 · unverdicted · novelty 7.0

AdamW-trained transformer hidden states and backpropagated variables converge uniformly in L2 to a forward-backward ODE system (McKean-Vlasov when non-causal) at rate O(L^{-1}+L^{-1/3}H^{-1/2}) as depth L and heads H increase, with bounds independent of token number.

Spectral Selection in Symmetric Self-Attention Dynamics

math.DS · 2026-04-28 · unverdicted · novelty 7.0

Symmetric self-attention dynamics select the dominant eigendirection of V, producing homogeneous alignment when one positive eigenvalue dominates or sign-split polarization when V is negative definite.

Propagation of Chaos in Contextual Flow Maps

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

Derives forward and backward propagation-of-chaos bounds for finite vs. infinite-context transformers modeled as contextual flow maps, achieving Wasserstein rate n^{-1/d} generally and n^{-1/2} for transformer-like cases.

Anti Mode-Collapse in Mean-Field Transformer via Auxiliary Variables

cs.LG · 2026-05-28 · unverdicted · novelty 5.0

Auxiliary variables prevent mode collapse in mean-field transformers: the limit distribution is the pushforward of the auxiliary distribution, and positional encoding and prompt insertion have universality of representation.
