pith. sign in

The emergence of clusters in self-attention dynamics

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

years

2026 10 2025 1

roles

background 1

polarities

background 1

clear filters

representative citing papers

Reachability and asymptotics of Gaussian Transformer dynamics

cs.LG · 2026-05-29 · unverdicted · novelty 8.0

Gaussian distributions are invariant under the mean-field Transformer flow, reducing infinite-dimensional dynamics to a bilinear control system on mean and covariance with explicit reachability and stability results.

Transformer-like Inference from Optimal Control

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Derives transformer-like dual-filter inference layers from first-principles optimal control on nonlinear discrete and linear Gaussian sequence models.

Constructive conditional normalizing flows

math.OC · 2026-02-09 · unverdicted · novelty 7.0

Explicit constructions approximate diffeomorphisms and pushforward measures via continuity equation flows with perceptron velocity fields of piecewise constant weights, using polar-like decompositions and probabilistic methods for regular maps.

Exact Sequence Interpolation with Transformers

cs.LG · 2025-02-04 · conditional · novelty 7.0

Transformers with O(sum m^j) blocks and O(d sum m^j) parameters can exactly interpolate any finite dataset of input sequences in R^d to output sequences of lengths m^j.

Propagation of Chaos in Contextual Flow Maps

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

Derives forward and backward propagation-of-chaos bounds for finite vs. infinite-context transformers modeled as contextual flow maps, achieving Wasserstein rate n^{-1/d} generally and n^{-1/2} for transformer-like cases.

Measure-to-measure Regression with Transformers

cs.LG · 2026-05-27 · unverdicted · novelty 5.0

Formalizes nonlinear M2M regression and introduces transformer architectures as static maps and dynamic velocity fields between probability measures, tested on synthetic, particle, and organoid datasets.

citing papers explorer

Showing 10 of 10 citing papers after filters.

  • Reachability and asymptotics of Gaussian Transformer dynamics cs.LG · 2026-05-29 · unverdicted · none · ref 2

    Gaussian distributions are invariant under the mean-field Transformer flow, reducing infinite-dimensional dynamics to a bilinear control system on mean and covariance with explicit reachability and stability results.

  • Kinetic theory for Transformers and the lost-in-the-middle phenomenon math.AP · 2026-05-09 · conditional · none · ref 19

    A mean-field kinetic theory derivation produces a closed-form U-shaped token retrieval profile that explains the lost-in-the-middle phenomenon in Transformers.

  • Transformer-like Inference from Optimal Control cs.LG · 2026-05-15 · unverdicted · none · ref 3

    Derives transformer-like dual-filter inference layers from first-principles optimal control on nonlinear discrete and linear Gaussian sequence models.

  • Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models math.PR · 2026-04-29 · unverdicted · none · ref 28

    Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when common noise is coercive relative to self-attention drift.

  • Continuous transformations of probability measures and their transport representations math.FA · 2026-04-17 · unverdicted · none · ref 18

    Lipschitz continuous transformations F of probability measures w.r.t. Wasserstein distance admit continuous transport maps f(·,μ) such that F(μ) = f(·,μ)_# μ.

  • Constructive conditional normalizing flows math.OC · 2026-02-09 · unverdicted · none · ref 8

    Explicit constructions approximate diffeomorphisms and pushforward measures via continuity equation flows with perceptron velocity fields of piecewise constant weights, using polar-like decompositions and probabilistic methods for regular maps.

  • Perceptrons and localization of attention's mean-field landscape cs.LG · 2026-01-29 · unverdicted · none · ref 9

    In the mean-field limit of attention with perceptron blocks, critical points of the energy landscape are generically atomic and localized on subsets of the unit sphere.

  • Propagation of Chaos in Contextual Flow Maps cs.LG · 2026-05-16 · unverdicted · none · ref 14

    Derives forward and backward propagation-of-chaos bounds for finite vs. infinite-context transformers modeled as contextual flow maps, achieving Wasserstein rate n^{-1/d} generally and n^{-1/2} for transformer-like cases.

  • Multi-Headed Transformer Architectures as Time-dependent Wasserstein Gradient Flows cs.LG · 2026-05-15 · unverdicted · none · ref 22

    Models multi-head transformer data flow as time-dependent Wasserstein gradient flows of an attention-capturing interaction energy, with proofs on omega-limit stationary points and stability under weight and input perturbations.

  • Measure-to-measure Regression with Transformers cs.LG · 2026-05-27 · unverdicted · none · ref 10

    Formalizes nonlinear M2M regression and introduces transformer architectures as static maps and dynamic velocity fields between probability measures, tested on synthetic, particle, and organoid datasets.