arXiv preprint arXiv:2411.04551

Measure-to-measure interpolation using Transformers · 2024 · arXiv 2411.04551

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Reachability and asymptotics of Gaussian Transformer dynamics

cs.LG · 2026-05-29 · unverdicted · novelty 8.0

Gaussian distributions are invariant under the mean-field Transformer flow, reducing infinite-dimensional dynamics to a bilinear control system on mean and covariance with explicit reachability and stability results.

Kinetic theory for Transformers and the lost-in-the-middle phenomenon

math.AP · 2026-05-09 · conditional · novelty 8.0

A mean-field kinetic theory derivation produces a closed-form U-shaped token retrieval profile that explains the lost-in-the-middle phenomenon in Transformers.

Transformer-like Inference from Optimal Control

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Derives transformer-like dual-filter inference layers from first-principles optimal control on nonlinear discrete and linear Gaussian sequence models.

Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models

math.PR · 2026-04-29 · unverdicted · novelty 7.0

Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when common noise is coercive relative to self-attention drift.

Continuous transformations of probability measures and their transport representations

math.FA · 2026-04-17 · unverdicted · novelty 7.0

Lipschitz continuous transformations F of probability measures w.r.t. Wasserstein distance admit continuous transport maps f(·,μ) such that F(μ) = f(·,μ)_# μ.

Constructive conditional normalizing flows

math.OC · 2026-02-09 · unverdicted · novelty 7.0

Explicit constructions approximate diffeomorphisms and pushforward measures via continuity equation flows with perceptron velocity fields of piecewise constant weights, using polar-like decompositions and probabilistic methods for regular maps.

Perceptrons and localization of attention's mean-field landscape

cs.LG · 2026-01-29 · unverdicted · novelty 7.0

In the mean-field limit of attention with perceptron blocks, critical points of the energy landscape are generically atomic and localized on subsets of the unit sphere.

Exact Sequence Interpolation with Transformers

cs.LG · 2025-02-04 · conditional · novelty 7.0

Transformers with O(Σ_j m^j) blocks and O(d Σ_j m^j) parameters can exactly interpolate any finite dataset of input sequences in R^d to output sequences of lengths m^j.

Propagation of Chaos in Contextual Flow Maps

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

Derives forward and backward propagation-of-chaos bounds for finite vs. infinite-context transformers modeled as contextual flow maps, achieving Wasserstein rate n^{-1/d} generally and n^{-1/2} for transformer-like cases.

Multi-Headed Transformer Architectures as Time-dependent Wasserstein Gradient Flows

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Models multi-head transformer data flow as time-dependent Wasserstein gradient flows of an attention-capturing interaction energy, with proofs on omega-limit stationary points and stability under weight and input perturbations.

citing papers explorer

Showing 10 of 10 citing papers.

Reachability and asymptotics of Gaussian Transformer dynamics · cs.LG · 2026-05-29 · unverdicted · none · ref 2
Kinetic theory for Transformers and the lost-in-the-middle phenomenon · math.AP · 2026-05-09 · conditional · none · ref 19
Transformer-like Inference from Optimal Control · cs.LG · 2026-05-15 · unverdicted · none · ref 3
Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models · math.PR · 2026-04-29 · unverdicted · none · ref 28
Continuous transformations of probability measures and their transport representations · math.FA · 2026-04-17 · unverdicted · none · ref 18
Constructive conditional normalizing flows · math.OC · 2026-02-09 · unverdicted · none · ref 8
Perceptrons and localization of attention's mean-field landscape · cs.LG · 2026-01-29 · unverdicted · none · ref 9
Exact Sequence Interpolation with Transformers · cs.LG · 2025-02-04 · conditional · none · ref 12
Propagation of Chaos in Contextual Flow Maps · cs.LG · 2026-05-16 · unverdicted · none · ref 14
Multi-Headed Transformer Architectures as Time-dependent Wasserstein Gradient Flows · cs.LG · 2026-05-15 · unverdicted · none · ref 22
