Recognition: unknown
Continuous transformations of probability measures and their transport representations
Pith reviewed 2026-05-10 06:51 UTC · model grok-4.3
The pith
If a map F between probability measures is Lipschitz continuous in the Wasserstein distance, then it admits a continuous transport representation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a function F mapping probability measures to probability measures, a transport representation consists of a family of maps f(·, μ) such that F(μ) equals the push-forward of μ under f(·, μ). The central result is that Lipschitz continuity of F in the Wasserstein distance allows selection of a continuous f, while mere continuity of F does not guarantee such a continuous selection. The authors supply concrete counterexamples showing the necessity of the Lipschitz assumption.
What carries the argument
The transport representation given by a μ-dependent map f(·, μ) whose push-forward recovers F(μ), with continuity of f enforced by the Lipschitz condition on F in the Wasserstein metric.
If this is right
- Continuous selections of transport maps become available for any Lipschitz transformation of measures, enabling uniform approximation schemes.
- Transformations satisfying the Lipschitz condition can be stably discretized or learned without introducing discontinuities in the representation.
- The provided counterexamples delimit the precise boundary between continuous and merely measurable selections of transport maps.
- Results apply in general Polish spaces supporting the Wasserstein metric, not just Euclidean domains.
Where Pith is reading between the lines
- The continuous selection could be leveraged to define differentiable flows or gradients through measure transformations in optimization settings.
- Similar continuity statements might hold for other transport costs or unbalanced optimal transport formulations.
- In machine-learning contexts the result supplies a theoretical justification for training transformer architectures directly on empirical measures when the target map is Lipschitz.
Load-bearing premise
That a transport map realizing F(μ) as a push-forward exists for every individual measure μ, and that the underlying space is equipped with a well-defined Wasserstein distance.
What would settle it
An explicit Lipschitz continuous F on the space of probability measures together with a sequence of measures μ_n converging in Wasserstein distance to μ such that the corresponding transport maps f(·, μ_n) fail to converge to f(·, μ) in any reasonable topology.
read the original abstract
Given a function $F$ transforming a probability measure $\mu$ into another one $F(\mu)$, we study the existence and regularity of a transport representation of it. That is, we ask whether we can represent the image $F(\mu)$ of the input probability measure $\mu$ as the push-forward of $\mu$ by a map $f(\cdot,\mu)$ which may depend on $\mu$; and furthermore, how regular $f$ can be chosen depending on $F$. Even if $F$ is continuous and a transport representative exists, it cannot necessarily be chosen in a continuous way; however, if $F$ is Lipschitz continuous with respect to the Wasserstein distance, then $f$ can be chosen continuous. We provide several examples to illustrate the sharpness of our assumptions. This question is motivated by approximation results for transformations of probability distributions with transformers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript examines transformations F that map probability measures μ to F(μ) and asks when F(μ) can be realized as the push-forward of μ under a map f(·, μ) that depends on μ. The central claim is that continuity of F alone does not guarantee a continuous choice of f, even when a transport representative exists, but Lipschitz continuity of F with respect to the Wasserstein metric does permit a continuous selection. Several examples are provided to show that the Lipschitz assumption is sharp. The work is motivated by approximation questions arising in transformer models for probability distributions.
Significance. If the main result holds, the paper supplies a clean regularity statement in optimal transport that distinguishes the roles of continuity and Lipschitz continuity for the existence of continuous transport representatives. This distinction is illustrated by explicit counterexamples, and the result could inform stability analyses in settings where measure-valued maps are approximated by neural networks. The absence of free parameters or ad-hoc constructions in the stated theorem is a positive feature.
major comments (2)
- [Main theorem / §3] The statement of the main result (presumably Theorem 3.1 or equivalent) conditions on the existence of some (measurable) transport representative f(·, μ) for every continuous F. This assumption is load-bearing for the regularity question to be meaningful, yet the manuscript does not specify the precise class of spaces (e.g., Polish, compact, or separable metric) under which such representatives are guaranteed to exist; without this, the scope of the Lipschitz upgrade remains unclear.
- [§4] §4 (examples): The counterexample demonstrating that mere continuity of F does not yield a continuous f should explicitly verify that the underlying space admits a well-defined Wasserstein metric and that the constructed F is indeed continuous but not Lipschitz; otherwise the sharpness claim rests on an implicit verification that is not load-bearing if omitted.
minor comments (3)
- [Abstract] The abstract refers to 'several examples' without indicating their location or number; adding a sentence such as 'Examples 4.1–4.3 illustrate sharpness' would improve navigation.
- [§2 (Preliminaries)] Notation for the Wasserstein distance (W or W_p) and the precise p-value used in the Lipschitz condition should be introduced in the preliminaries rather than assumed from context.
- [Introduction] In the motivation paragraph, the link to transformer approximation results is stated but not referenced; adding one or two citations to relevant works on measure-valued neural networks would strengthen the introduction.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the constructive comments on our manuscript. We will incorporate clarifications in a minor revision to address the points raised.
read point-by-point responses
-
Referee: [Main theorem / §3] The statement of the main result (presumably Theorem 3.1 or equivalent) conditions on the existence of some (measurable) transport representative f(·, μ) for every continuous F. This assumption is load-bearing for the regularity question to be meaningful, yet the manuscript does not specify the precise class of spaces (e.g., Polish, compact, or separable metric) under which such representatives are guaranteed to exist; without this, the scope of the Lipschitz upgrade remains unclear.
Authors: We agree that the setting requires explicit clarification. The manuscript is set in Polish spaces (complete separable metric spaces), which is the standard framework ensuring the Wasserstein metric is well-defined on the space of probability measures with finite moments. The main result is conditional on the existence of at least one measurable transport representative for each μ; it does not claim that such representatives exist for every continuous F. We will add a short preliminary subsection (or remark) in §3 stating the assumptions on the space X and emphasizing the conditional nature of the theorem. This improves readability without changing the result. revision: yes
-
Referee: [§4] §4 (examples): The counterexample demonstrating that mere continuity of F does not yield a continuous f should explicitly verify that the underlying space admits a well-defined Wasserstein metric and that the constructed F is indeed continuous but not Lipschitz; otherwise the sharpness claim rests on an implicit verification that is not load-bearing if omitted.
Authors: We thank the referee for this suggestion. The counterexamples are constructed on standard Polish spaces (e.g., the unit interval [0,1] or R^d), where the Wasserstein metric is well-defined. In the revised manuscript we will explicitly record that these spaces are Polish, state the Wasserstein distance used, and provide a short direct argument verifying that the constructed F is continuous but fails to be Lipschitz (e.g., by exhibiting pairs of measures whose Wasserstein distance scales differently from the distance between their images). This makes the sharpness of the Lipschitz hypothesis fully explicit. revision: yes
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper establishes an existence and regularity result for transport representations of measure transformations: when F is Lipschitz continuous w.r.t. the Wasserstein metric, a continuous selection f(·,μ) exists, while mere continuity of F does not guarantee this. This is a direct theorem relying on standard optimal transport machinery (existence of transport maps under the given assumptions, selection theorems for continuity upgrade under Lipschitz conditions) rather than any self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The weakest assumptions are explicitly stated as prerequisites, and examples are provided only to show sharpness, without reducing the main claim to its inputs by construction. The derivation chain is self-contained against external benchmarks in the field.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math The underlying space is a Polish metric space so that the Wasserstein distance is well-defined and transport maps exist under suitable conditions.
Reference graph
Works this paper leans on
-
[1]
Alberti, N
S. Alberti, N. Dern, L. Thesing, and G. Kutyniok , Sumformer: Universal approximation for efficient transformers , in Topological, Algebraic and Geometric Learning Workshops 2023, PMLR, 2023, pp. 72--86
2023
-
[2]
C. D. Aliprantis and K. C. Border , Infinite dimensional analysis: a hitchhiker's guide , Springer Science & Business Media, 2006
2006
-
[3]
Ambrosio, N
L. Ambrosio, N. Gigli, and G. Savar \'e , Gradient flows in metric spaces and in the space of probability measures , Lectures in Mathematics ETH Zurich, Birkh \"a user Verlag, 2008
2008
-
[4]
Ambrosio and P
L. Ambrosio and P. Tilli , Topics on analysis in metric spaces , Oxford University Press, 2004
2004
-
[5]
Bergin , On the continuity of correspondences on sets of measures with restricted marginals , Econom
J. Bergin , On the continuity of correspondences on sets of measures with restricted marginals , Econom. Theory, 13 (1999), pp. 471--481
1999
-
[6]
V. I. Bogachev , Measure theory , vol. 2, Springer, 2007
2007
-
[7]
Brenier and W
Y. Brenier and W. Gangbo , Approximation of maps by diffeomorphisms , Calc. Var. Partial Differential Equations, 16 (2003), pp. 147--164
2003
-
[8]
Cardaliaguet , Notes on mean field games , tech
P. Cardaliaguet , Notes on mean field games , tech. rep., 2010
2010
-
[9]
Carmona and F
R. Carmona and F. c. Delarue , Probabilistic theory of mean field games with applications. I , vol. 83 of Probability Theory and Stochastic Modelling, Springer, Cham, 2018. Mean field FBSDEs, control, and games
2018
-
[10]
A Lagrangian approach to totally dissipative evolutions in Wasserstein spaces
G. Cavagnari, G. Savar \'e , and G. E. Sodini , A Lagrangian approach to totally dissipative evolutions in Wasserstein spaces , arXiv preprint arXiv:2305.05211, (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[11]
Cavagnari, G
G. Cavagnari, G. Savar\'e, and G. E. Sodini , Extension of monotone operators and L ipschitz maps invariant for a group of isometries , Canad. J. Math., 77 (2025), pp. 149--186
2025
-
[12]
Fornasier, G
M. Fornasier, G. Savar \'e , and G. E. Sodini , Density of subalgebras of Lipschitz functions in metric Sobolev spaces and applications to Wasserstein Sobolev spaces , J. Funct. Anal., 285 (2023), p. 110153
2023
- [13]
-
[14]
Garc\'ia Trillos and D
N. Garc\'ia Trillos and D. Slep c ev , Continuum limit of total variation on point clouds , Arch. Ration. Mech. Anal., 220 (2016), pp. 193--241
2016
-
[15]
height 2pt depth -1.6pt width 23pt, A variational approach to the consistency of spectral clustering , Appl. Comput. Harmon. Anal., 45 (2018), pp. 239--281
2018
-
[16]
M. R. Garey and D. S. Johnson , Computers and Intractability: a Guide to the Theory of NP-Completeness , W.H. freeman New York, 1979
1979
-
[17]
Geshkovski, C
B. Geshkovski, C. Letrouit, Y. Polyanskiy, and P. Rigollet , A mathematical perspective on transformers , Bull. Amer. Math. Soc., 62 (2025), pp. 427--479
2025
-
[18]
Measure-to-measure inter- polation using transformers.arXiv preprint arXiv:2411.04551, 2024
B. Geshkovski, P. Rigollet, and D. Ruiz-Balet , Measure-to-measure interpolation using transformers , arXiv preprint arXiv:2411.04551, (2024)
-
[19]
Ghossoub and D
M. Ghossoub and D. Saunders , On the continuity of the feasible set mapping in optimal transport , Econ. Theory Bull., 9 (2021), pp. 113--117
2021
-
[20]
Kallenberg , Random measures, theory and applications , vol
O. Kallenberg , Random measures, theory and applications , vol. 1, Springer, 2017
2017
-
[21]
Pinkus , Approximation theory of the MLP model in neural networks , Acta Numer., 8 (1999), pp
A. Pinkus , Approximation theory of the MLP model in neural networks , Acta Numer., 8 (1999), pp. 143--195
1999
-
[22]
M. E. Sander, P. Ablin, M. Blondel, and G. Peyr \'e , Sinkformers: Transformers with doubly stochastic attention , in International Conference on Artificial Intelligence and Statistics, PMLR, 2022, pp. 3515--3530
2022
-
[23]
Thorpe, S
M. Thorpe, S. Park, S. Kolouri, G. K. Rohde, and D. Slep c ev , A transportation L^p distance for signal analysis , J. Math. Imaging Vision, 59 (2017), pp. 187--210
2017
-
[24]
Vaswani, N
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin , Attention is all you need , Advances in neural information processing systems, 30 (2017)
2017
-
[25]
Villani , Optimal transport: old and new , vol
C. Villani , Optimal transport: old and new , vol. 338, Springer, 2009
2009
-
[26]
A mathematical theory of attention
J. Vuckovic, A. Baratin, and R. Tachet des Combes , A mathematical theory of attention , arXiv preprint arXiv:2007.02876, (2020)
- [27]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.