Perceptrons and localization of attention’s mean-field landscape
3 Pith papers cite this work (2026).
Citing papers
- Kinetic theory for Transformers and the lost-in-the-middle phenomenon. A mean-field kinetic theory derivation produces a closed-form U-shaped token retrieval profile that explains the lost-in-the-middle phenomenon in Transformers.
- Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models. Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when the common noise is coercive relative to the self-attention drift.
- Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime. In the low-temperature regime, the token distribution in mean-field transformers concentrates onto the push-forward under a key-query-value projection, with Wasserstein distance scaling as √(log(β+1)/β) exp(Ct) + exp(-ct).
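The Wasserstein scaling stated in the last entry can be written out explicitly. In this sketch, μ_t denotes the token distribution at time t, ν the push-forward limit under the key-query-value projection, β the inverse temperature, and C, c > 0 constants; the symbol names are assumptions for illustration, not taken from the cited paper:

```latex
W\big(\mu_t,\; \nu\big) \;\lesssim\; \sqrt{\frac{\log(\beta+1)}{\beta}}\, e^{Ct} \;+\; e^{-ct}
```

Read this way, the first term is small at low temperature (large β) but grows exponentially in t, while the second term decays in t, so the bound is informative on time horizons up to order (1/C) log β.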