arxiv: 2604.16653 · v1 · submitted 2026-04-17 · 🧮 math.FA · math.OC · math.PR

Continuous transformations of probability measures and their transport representations

Hugo Lavenant, Giuseppe Savaré

Pith reviewed 2026-05-10 06:51 UTC · model grok-4.3

classification 🧮 math.FA · math.OC · math.PR

keywords: transport representations, Wasserstein distance, probability measures, Lipschitz continuity, continuous selections, push-forward maps, transformations of measures, optimal transport

The pith

If a map F between probability measures is Lipschitz continuous in the Wasserstein distance, then it admits a continuous transport representation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks when a transformation F of probability measures can be realized as the push-forward of the input measure under a map f that itself depends on the measure. It establishes that continuity of F alone does not guarantee a continuous choice of f, even when a transport representation exists for each fixed measure. Lipschitz continuity of F with respect to the Wasserstein distance is sufficient to ensure the representing map can be chosen continuously in a suitable topology. This regularity matters because it supports stable approximations of measure-valued operations, such as those arising in transformer models that act on distributions rather than points.

Core claim

Given a function F mapping probability measures to probability measures, a transport representation consists of a family of maps f(·, μ) such that F(μ) equals the push-forward of μ under f(·, μ). The central result is that Lipschitz continuity of F in the Wasserstein distance allows selection of a continuous f, while mere continuity of F does not guarantee such a continuous selection. The authors supply concrete counterexamples showing the necessity of the Lipschitz assumption.
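In symbols, the definition and the hypothesis above can be sketched as follows. The notation is an assumption made for illustration: the summary does not fix the exponent p, the constant L, or whether inputs and outputs live on the same space X, and the topology in which f is continuous is left to the paper.

```latex
% Assumed notation: X a Polish space, \mathcal{P}_p(X) the probability measures
% on X with finite p-th moment, W_p the p-Wasserstein distance.
% Transport representation: F(\mu) is the push-forward of \mu under f(\cdot,\mu).
\[
  F(\mu) = f(\cdot,\mu)_{\#}\mu ,
  \qquad\text{i.e.}\qquad
  F(\mu)(B) = \mu\bigl(\{x \in X : f(x,\mu) \in B\}\bigr)
  \quad\text{for every Borel set } B \subseteq X .
\]
% Lipschitz hypothesis on F (L is a placeholder constant).
\[
  W_p\bigl(F(\mu),F(\nu)\bigr) \le L\, W_p(\mu,\nu)
  \qquad\text{for all } \mu,\nu \in \mathcal{P}_p(X).
\]
```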

What carries the argument

The transport representation given by a μ-dependent map f(·, μ) whose push-forward recovers F(μ), with continuity of f enforced by the Lipschitz condition on F in the Wasserstein metric.

If this is right

Continuous selections of transport maps become available for any Lipschitz transformation of measures, enabling uniform approximation schemes.
Transformations satisfying the Lipschitz condition can be stably discretized or learned without introducing discontinuities in the representation.
The provided counterexamples delimit the precise boundary between continuous and merely measurable selections of transport maps.
Results apply in general Polish spaces supporting the Wasserstein metric, not just Euclidean domains.
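The discretization point can be illustrated numerically on empirical measures. The sketch below is not taken from the paper: the names `wasserstein_1d`, `f_center`, and `F` are hypothetical, the space is assumed to be the real line, and the μ-dependent map f(x, μ) = x − mean(μ) is chosen only because its push-forward is easy to compute. For this particular choice and p = 2, the distance between the outputs does not exceed the distance between the inputs.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p between two uniform empirical measures on R with equally many atoms.

    In one dimension the optimal coupling matches sorted atoms.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal numbers of atoms")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

def f_center(x, atoms):
    """Hypothetical mu-dependent map f(x, mu) = x - mean(mu)."""
    return x - np.mean(atoms)

def F(atoms):
    """F(mu) = f(., mu)_# mu: apply the map to every atom of mu."""
    atoms = np.asarray(atoms, dtype=float)
    return f_center(atoms, atoms)

rng = np.random.default_rng(0)
mu = rng.normal(1.0, 1.0, size=200)       # atoms of an empirical measure
nu = mu + 0.1 * rng.normal(size=200)      # a small perturbation of mu

d_in = wasserstein_1d(mu, nu)             # W_2(mu, nu)
d_out = wasserstein_1d(F(mu), F(nu))      # W_2(F(mu), F(nu))
print(d_in, d_out)
```

Comparing `d_out` with `d_in` for several perturbation sizes gives a rough empirical check of a Lipschitz bound; it does not replace a proof.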

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The continuous selection could be leveraged to define differentiable flows or gradients through measure transformations in optimization settings.
Similar continuity statements might hold for other transport costs or unbalanced optimal transport formulations.
In machine-learning contexts the result supplies a theoretical justification for training transformer architectures directly on empirical measures when the target map is Lipschitz.

Load-bearing premise

That a transport map realizing F(μ) as a push-forward exists for every individual measure μ, and that the underlying space is equipped with a well-defined Wasserstein distance.
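A standard example, not quoted from the paper, shows why existence of a transport map for each μ is a genuine restriction: a map cannot split a point mass. The choice X = ℝ is an assumption made only for concreteness.

```latex
% Illustration on X = \mathbb{R} (assumed): a target that no map can reach.
\[
  \mu = \delta_0 , \qquad F(\mu) = \tfrac12\,\delta_0 + \tfrac12\,\delta_1 .
\]
% For any measurable map g : \mathbb{R} \to \mathbb{R},
\[
  g_{\#}\delta_0 = \delta_{g(0)} ,
\]
% which is a single Dirac mass, so no g satisfies g_{\#}\mu = F(\mu).
```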

What would settle it

An explicit Lipschitz continuous F on the space of probability measures together with a sequence of measures μ_n converging in Wasserstein distance to μ such that the corresponding transport maps f(·, μ_n) fail to converge to f(·, μ) in any reasonable topology.

Original abstract

Given a function $F$ transforming a probability measure $\mu$ into another one $F(\mu)$, we study the existence and regularity of a transport representation of it. That is, we ask whether we can represent the image $F(\mu)$ of the input probability measure $\mu$ as the push-forward of $\mu$ by a map $f(\cdot,\mu)$ which may depend on $\mu$; and furthermore, how regular $f$ can be chosen depending on $F$. Even if $F$ is continuous and a transport representative exists, it cannot necessarily be chosen in a continuous way; however, if $F$ is Lipschitz continuous with respect to the Wasserstein distance, then $f$ can be chosen continuous. We provide several examples to illustrate the sharpness of our assumptions. This question is motivated by approximation results for transformations of probability distributions with transformers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Lipschitz continuity of F in Wasserstein distance allows a continuous transport map f, while plain continuity does not, with sharpness examples.

The main thing to know is that if F is Lipschitz continuous in the Wasserstein metric, then you can find a continuous map f such that F(μ) is the push-forward of μ under f(·, μ), and that this can fail for merely continuous F. The paper gives examples showing that the distinction is sharp. This organizes a regularity upgrade that was not previously isolated in this form.

The work does well by building directly on standard optimal transport tools to get the continuous selection under the Lipschitz assumption. The motivation from transformer approximations for distributions is noted but stays in the background, which keeps the focus on the math. The examples are concrete and illustrate why the Lipschitz condition cannot be relaxed. The citation pattern draws on established results without circularity or over-reliance on self-citation. The central argument holds up: the Lipschitz hypothesis controls the variation across measures enough to select a continuous f. One soft spot is that quantitative bounds on the modulus of continuity of f are not developed, so the result is existential rather than giving rates. The assumption that some transport representative exists for continuous F is taken as standard, which works in the usual settings but could use a brief remark for more general spaces.

This paper is for specialists in optimal transport and functional analysis who care about regularity of maps on measures. Readers interested in theoretical support for distribution transformations in ML might pick up the motivation, but the core value is the precise condition and the counterexamples. It shows clear thinking and honest engagement with the literature. I would send it to peer review because the claim is focused, the evidence is direct, and the distinction is useful.

Referee Report

2 major / 3 minor

Summary. The manuscript examines transformations F that map probability measures μ to F(μ) and asks when F(μ) can be realized as the push-forward of μ under a map f(·, μ) that depends on μ. The central claim is that continuity of F alone does not guarantee a continuous choice of f, even when a transport representative exists, but Lipschitz continuity of F with respect to the Wasserstein metric does permit a continuous selection. Several examples are provided to show that the Lipschitz assumption is sharp. The work is motivated by approximation questions arising in transformer models for probability distributions.

Significance. If the main result holds, the paper supplies a clean regularity statement in optimal transport that distinguishes the roles of continuity and Lipschitz continuity for the existence of continuous transport representatives. This distinction is illustrated by explicit counterexamples, and the result could inform stability analyses in settings where measure-valued maps are approximated by neural networks. The absence of free parameters or ad-hoc constructions in the stated theorem is a positive feature.

major comments (2)

[Main theorem / §3] The statement of the main result (presumably Theorem 3.1 or equivalent) conditions on the existence of some (measurable) transport representative f(·, μ) for every continuous F. This assumption is load-bearing for the regularity question to be meaningful, yet the manuscript does not specify the precise class of spaces (e.g., Polish, compact, or separable metric) under which such representatives are guaranteed to exist; without this, the scope of the Lipschitz upgrade remains unclear.
[§4] §4 (examples): The counterexample demonstrating that mere continuity of F does not yield a continuous f should explicitly verify that the underlying space admits a well-defined Wasserstein metric and that the constructed F is indeed continuous but not Lipschitz; otherwise the sharpness claim rests on an implicit verification that is not load-bearing if omitted.

minor comments (3)

[Abstract] The abstract refers to 'several examples' without indicating their location or number; adding a sentence such as 'Examples 4.1–4.3 illustrate sharpness' would improve navigation.
[§2 (Preliminaries)] Notation for the Wasserstein distance (W or W_p) and the precise p-value used in the Lipschitz condition should be introduced in the preliminaries rather than assumed from context.
[Introduction] In the motivation paragraph, the link to transformer approximation results is stated but not referenced; adding one or two citations to relevant works on measure-valued neural networks would strengthen the introduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the constructive comments on our manuscript. We will incorporate clarifications in a minor revision to address the points raised.

Referee: [Main theorem / §3] The statement of the main result (presumably Theorem 3.1 or equivalent) conditions on the existence of some (measurable) transport representative f(·, μ) for every continuous F. This assumption is load-bearing for the regularity question to be meaningful, yet the manuscript does not specify the precise class of spaces (e.g., Polish, compact, or separable metric) under which such representatives are guaranteed to exist; without this, the scope of the Lipschitz upgrade remains unclear.

Authors: We agree that the setting requires explicit clarification. The manuscript is set in Polish spaces (complete separable metric spaces), which is the standard framework ensuring the Wasserstein metric is well-defined on the space of probability measures with finite moments. The main result is conditional on the existence of at least one measurable transport representative for each μ; it does not claim that such representatives exist for every continuous F. We will add a short preliminary subsection (or remark) in §3 stating the assumptions on the space X and emphasizing the conditional nature of the theorem. This improves readability without changing the result.

revision: yes

Referee: [§4] §4 (examples): The counterexample demonstrating that mere continuity of F does not yield a continuous f should explicitly verify that the underlying space admits a well-defined Wasserstein metric and that the constructed F is indeed continuous but not Lipschitz; otherwise the sharpness claim rests on an implicit verification that is not load-bearing if omitted.

Authors: We thank the referee for this suggestion. The counterexamples are constructed on standard Polish spaces (e.g., the unit interval [0,1] or R^d), where the Wasserstein metric is well-defined. In the revised manuscript we will explicitly record that these spaces are Polish, state the Wasserstein distance used, and provide a short direct argument verifying that the constructed F is continuous but fails to be Lipschitz (e.g., by exhibiting pairs of measures whose Wasserstein distance scales differently from the distance between their images). This makes the sharpness of the Lipschitz hypothesis fully explicit.

revision: yes
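The verification described in this response, exhibiting pairs of measures whose distance scales differently from the distance between their images, can be phrased as a criterion. The formulation below is a sketch in assumed notation (W_p denotes a p-Wasserstein distance; the source does not specify p) and is not quoted from the manuscript.

```latex
% F fails to be Lipschitz exactly when no finite constant L works:
\[
  \sup_{\mu \neq \nu} \frac{W_p\bigl(F(\mu),F(\nu)\bigr)}{W_p(\mu,\nu)} = +\infty ,
\]
% equivalently, there exist sequences (\mu_n), (\nu_n) with
\[
  W_p(\mu_n,\nu_n) > 0
  \qquad\text{and}\qquad
  \frac{W_p\bigl(F(\mu_n),F(\nu_n)\bigr)}{W_p(\mu_n,\nu_n)} \longrightarrow +\infty .
\]
```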

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper establishes an existence and regularity result for transport representations of measure transformations: when F is Lipschitz continuous w.r.t. the Wasserstein metric, a continuous selection f(·,μ) exists, while mere continuity of F does not guarantee this. This is a direct theorem relying on standard optimal transport machinery (existence of transport maps under the given assumptions, selection theorems for continuity upgrade under Lipschitz conditions) rather than any self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The weakest assumptions are explicitly stated as prerequisites, and examples are provided only to show sharpness, without reducing the main claim to its inputs by construction. The derivation chain is self-contained against external benchmarks in the field.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard background from optimal transport (Polish spaces, existence of transport maps under certain conditions) without introducing new free parameters or invented entities in the abstract.

axioms (1)

standard math: The underlying space is a Polish metric space, so that the Wasserstein distance is well-defined and transport maps exist under suitable conditions.
Invoked implicitly when discussing Wasserstein continuity and push-forwards.

pith-pipeline@v0.9.0 · 5447 in / 1149 out tokens · 35496 ms · 2026-05-10T06:51:52.721128+00:00 · methodology
