arXiv preprint arXiv:2411.04551

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary

years: 2026 (9), 2025 (1)

roles: background 1

polarities: background 1

representative citing papers

Reachability and asymptotics of Gaussian Transformer dynamics

cs.LG · 2026-05-29 · unverdicted · novelty 8.0

Gaussian distributions are invariant under the mean-field Transformer flow, reducing infinite-dimensional dynamics to a bilinear control system on mean and covariance with explicit reachability and stability results.

Transformer-like Inference from Optimal Control

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Derives transformer-like dual-filter inference layers from first-principles optimal control on nonlinear discrete and linear Gaussian sequence models.

Constructive conditional normalizing flows

math.OC · 2026-02-09 · unverdicted · novelty 7.0

Explicit constructions approximate diffeomorphisms and pushforward measures via continuity equation flows with perceptron velocity fields of piecewise constant weights, using polar-like decompositions and probabilistic methods for regular maps.

Exact Sequence Interpolation with Transformers

cs.LG · 2025-02-04 · conditional · novelty 7.0

Transformers with O(sum m^j) blocks and O(d sum m^j) parameters can exactly interpolate any finite dataset of input sequences in R^d to output sequences of lengths m^j.

Propagation of Chaos in Contextual Flow Maps

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

Derives forward and backward propagation-of-chaos bounds for finite vs. infinite-context transformers modeled as contextual flow maps, achieving Wasserstein rate n^{-1/d} generally and n^{-1/2} for transformer-like cases.

citing papers explorer

  • Reachability and asymptotics of Gaussian Transformer dynamics cs.LG · 2026-05-29 · unverdicted · none · ref 2

    Gaussian distributions are invariant under the mean-field Transformer flow, reducing infinite-dimensional dynamics to a bilinear control system on mean and covariance with explicit reachability and stability results.

  • Kinetic theory for Transformers and the lost-in-the-middle phenomenon math.AP · 2026-05-09 · conditional · none · ref 19

    A mean-field kinetic theory derivation produces a closed-form U-shaped token retrieval profile that explains the lost-in-the-middle phenomenon in Transformers.

  • Transformer-like Inference from Optimal Control cs.LG · 2026-05-15 · unverdicted · none · ref 3

    Derives transformer-like dual-filter inference layers from first-principles optimal control on nonlinear discrete and linear Gaussian sequence models.

  • Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models math.PR · 2026-04-29 · unverdicted · none · ref 28

    Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when common noise is coercive relative to self-attention drift.

  • Continuous transformations of probability measures and their transport representations math.FA · 2026-04-17 · unverdicted · none · ref 18

    Lipschitz continuous transformations F of probability measures w.r.t. Wasserstein distance admit continuous transport maps f(·,μ) such that F(μ) = f(·,μ)_# μ.

  • Constructive conditional normalizing flows math.OC · 2026-02-09 · unverdicted · none · ref 8

    Explicit constructions approximate diffeomorphisms and pushforward measures via continuity equation flows with perceptron velocity fields of piecewise constant weights, using polar-like decompositions and probabilistic methods for regular maps.

  • Perceptrons and localization of attention's mean-field landscape cs.LG · 2026-01-29 · unverdicted · none · ref 9

    In the mean-field limit of attention with perceptron blocks, critical points of the energy landscape are generically atomic and localized on subsets of the unit sphere.

  • Exact Sequence Interpolation with Transformers cs.LG · 2025-02-04 · conditional · none · ref 12

    Transformers with O(sum m^j) blocks and O(d sum m^j) parameters can exactly interpolate any finite dataset of input sequences in R^d to output sequences of lengths m^j.

  • Propagation of Chaos in Contextual Flow Maps cs.LG · 2026-05-16 · unverdicted · none · ref 14

    Derives forward and backward propagation-of-chaos bounds for finite vs. infinite-context transformers modeled as contextual flow maps, achieving Wasserstein rate n^{-1/d} generally and n^{-1/2} for transformer-like cases.

  • Multi-Headed Transformer Architectures as Time-dependent Wasserstein Gradient Flows cs.LG · 2026-05-15 · unverdicted · none · ref 22

    Models multi-head transformer data flow as time-dependent Wasserstein gradient flows of an attention-capturing interaction energy, with proofs on omega-limit stationary points and stability under weight and input perturbations.