pith. machine review for the scientific record.

arXiv: 2604.16653 · v1 · submitted 2026-04-17 · 🧮 math.FA · math.OC · math.PR

Recognition: unknown

Continuous transformations of probability measures and their transport representations

Pith reviewed 2026-05-10 06:51 UTC · model grok-4.3

classification 🧮 math.FA · math.OC · math.PR
keywords transport representations · Wasserstein distance · probability measures · Lipschitz continuity · continuous selections · push-forward maps · transformations of measures · optimal transport

The pith

If a map F between probability measures is Lipschitz continuous in the Wasserstein distance, then it admits a continuous transport representation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks when a transformation F of probability measures can be realized as the push-forward of the input measure under a map f that itself depends on the measure. It establishes that continuity of F alone does not guarantee a continuous choice of f, even when a transport representation exists for each fixed measure. Lipschitz continuity of F with respect to the Wasserstein distance is sufficient to ensure the representing map can be chosen continuously in a suitable topology. This regularity matters because it supports stable approximations of measure-valued operations, such as those arising in transformer models that act on distributions rather than points.

Core claim

Given a function F mapping probability measures to probability measures, a transport representation consists of a family of maps f(·, μ) such that F(μ) equals the push-forward of μ under f(·, μ). The central result is that Lipschitz continuity of F in the Wasserstein distance allows f to be chosen continuous, whereas mere continuity of F does not guarantee such a continuous selection, even when a transport representation exists. The authors supply concrete examples illustrating the sharpness of their assumptions.
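
For orientation, the two notions used in this claim can be written out as follows. This is a notational sketch only: the exponent p and the constant L are symbols assumed here, and this page does not state which p the paper works with.

```latex
% Transport representation: F(mu) is the push-forward of mu under f(., mu).
\[
  F(\mu) = f(\cdot,\mu)_{\#}\mu ,
  \qquad
  \bigl(f(\cdot,\mu)_{\#}\mu\bigr)(B) = \mu\bigl(\{x : f(x,\mu) \in B\}\bigr)
  \quad \text{for every Borel set } B .
\]
% Lipschitz continuity of F in the Wasserstein distance (p and L are assumed notation).
\[
  W_p\bigl(F(\mu), F(\nu)\bigr) \le L \, W_p(\mu,\nu)
  \quad \text{for all } \mu, \nu .
\]
```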

What carries the argument

The transport representation given by a μ-dependent map f(·, μ) whose push-forward recovers F(μ), with continuity of f enforced by the Lipschitz condition on F in the Wasserstein metric.
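
A minimal numerical sketch of a measure-dependent transport map, assuming uniform empirical measures on the real line with equally many atoms. The centering map `f(x, mu) = x - mean(mu)` is a hypothetical illustration chosen here, not an example from the paper; for it, a short estimate gives W_1(F(μ), F(ν)) ≤ 2 W_1(μ, ν).

```python
# Illustrative sketch only: the centering map below is a hypothetical example,
# not one taken from the paper.

def w1_empirical(xs, ys):
    """W_1 between two uniform empirical measures on R with equally many atoms.

    In one dimension the monotone (sorted) pairing is optimal.
    """
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def f(x, atoms):
    """Measure-dependent map f(x, mu) = x - mean(mu)."""
    return x - sum(atoms) / len(atoms)

def F(atoms):
    """F(mu) = f(., mu)_# mu: push every atom of mu through f(., mu)."""
    return [f(x, atoms) for x in atoms]

mu = [0.0, 1.0, 4.0]
nu = [0.5, 2.0, 3.0]

d_in = w1_empirical(mu, nu)         # W_1(mu, nu)
d_out = w1_empirical(F(mu), F(nu))  # W_1(F(mu), F(nu))
print(d_in, d_out)
```

Here `f` depends on the measure only through its mean, which is itself 1-Lipschitz in W_1, so both F and the representing map vary continuously with the input measure.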

If this is right

  • Continuous selections of transport maps become available for any Lipschitz transformation of measures, enabling uniform approximation schemes.
  • Transformations satisfying the Lipschitz condition can be stably discretized or learned without introducing discontinuities in the representation.
  • The provided counterexamples delimit the precise boundary between continuous and merely measurable selections of transport maps.
  • Results apply in general Polish spaces supporting the Wasserstein metric, not just Euclidean domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The continuous selection could be leveraged to define differentiable flows or gradients through measure transformations in optimization settings.
  • Similar continuity statements might hold for other transport costs or unbalanced optimal transport formulations.
  • In machine-learning contexts the result supplies a theoretical justification for training transformer architectures directly on empirical measures when the target map is Lipschitz.

Load-bearing premise

That a transport map realizing F(μ) as a push-forward exists for every individual measure μ, and that the underlying space is equipped with a well-defined Wasserstein distance.
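
A standard illustration of why this premise cannot be dropped (an elementary example supplied here, not taken from the paper): the push-forward of a Dirac mass is always a Dirac mass, so a target that splits mass admits no transport map.

```latex
\[
  f_{\#}\delta_0 = \delta_{f(0)} \quad \text{for every measurable } f,
  \qquad \text{hence} \qquad
  f_{\#}\delta_0 \neq \tfrac{1}{2}\bigl(\delta_0 + \delta_1\bigr).
\]
```

In particular, the constant map F(μ) = ½(δ_0 + δ_1) is Lipschitz with constant 0, yet it has no transport representation at μ = δ_0.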

What would settle it

An explicit Lipschitz continuous F on the space of probability measures together with a sequence of measures μ_n converging in Wasserstein distance to μ such that the corresponding transport maps f(·, μ_n) fail to converge to f(·, μ) in any reasonable topology.

Original abstract

Given a function $F$ transforming a probability measure $\mu$ into another one $F(\mu)$, we study the existence and regularity of a transport representation of it. That is, we ask whether we can represent the image $F(\mu)$ of the input probability measure $\mu$ as the push-forward of $\mu$ by a map $f(\cdot,\mu)$ which may depend on $\mu$; and furthermore, how regular $f$ can be chosen depending on $F$. Even if $F$ is continuous and a transport representative exists, it cannot necessarily be chosen in a continuous way; however, if $F$ is Lipschitz continuous with respect to the Wasserstein distance, then $f$ can be chosen continuous. We provide several examples to illustrate the sharpness of our assumptions. This question is motivated by approximation results for transformations of probability distributions with transformers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript examines transformations F that map probability measures μ to F(μ) and asks when F(μ) can be realized as the push-forward of μ under a map f(·, μ) that depends on μ. The central claim is that continuity of F alone does not guarantee a continuous choice of f, even when a transport representative exists, but Lipschitz continuity of F with respect to the Wasserstein metric does permit a continuous selection. Several examples are provided to show that the Lipschitz assumption is sharp. The work is motivated by approximation questions arising in transformer models for probability distributions.

Significance. If the main result holds, the paper supplies a clean regularity statement in optimal transport that distinguishes the roles of continuity and Lipschitz continuity for the existence of continuous transport representatives. This distinction is illustrated by explicit counterexamples, and the result could inform stability analyses in settings where measure-valued maps are approximated by neural networks. The absence of free parameters or ad-hoc constructions in the stated theorem is a positive feature.

major comments (2)
  1. [Main theorem / §3] The statement of the main result (presumably Theorem 3.1 or equivalent) conditions on the existence of some (measurable) transport representative f(·, μ) for every continuous F. This assumption is load-bearing for the regularity question to be meaningful, yet the manuscript does not specify the precise class of spaces (e.g., Polish, compact, or separable metric) under which such representatives are guaranteed to exist; without this, the scope of the Lipschitz upgrade remains unclear.
  2. [§4] §4 (examples): The counterexample demonstrating that mere continuity of F does not yield a continuous f should explicitly verify that the underlying space admits a well-defined Wasserstein metric and that the constructed F is indeed continuous but not Lipschitz; otherwise the sharpness claim rests on a verification that is left implicit.
minor comments (3)
  1. [Abstract] The abstract refers to 'several examples' without indicating their location or number; adding a sentence such as 'Examples 4.1–4.3 illustrate sharpness' would improve navigation.
  2. [§2 (Preliminaries)] Notation for the Wasserstein distance (W or W_p) and the precise p-value used in the Lipschitz condition should be introduced in the preliminaries rather than assumed from context.
  3. [Introduction] In the motivation paragraph, the link to transformer approximation results is stated but not referenced; adding one or two citations to relevant works on measure-valued neural networks would strengthen the introduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the constructive comments on our manuscript. We will incorporate clarifications in a minor revision to address the points raised.

Point-by-point responses
  1. Referee: [Main theorem / §3] The statement of the main result (presumably Theorem 3.1 or equivalent) conditions on the existence of some (measurable) transport representative f(·, μ) for every continuous F. This assumption is load-bearing for the regularity question to be meaningful, yet the manuscript does not specify the precise class of spaces (e.g., Polish, compact, or separable metric) under which such representatives are guaranteed to exist; without this, the scope of the Lipschitz upgrade remains unclear.

    Authors: We agree that the setting requires explicit clarification. The manuscript is set in Polish spaces (complete separable metric spaces), which is the standard framework ensuring the Wasserstein metric is well-defined on the space of probability measures with finite moments. The main result is conditional on the existence of at least one measurable transport representative for each μ; it does not claim that such representatives exist for every continuous F. We will add a short preliminary subsection (or remark) in §3 stating the assumptions on the space X and emphasizing the conditional nature of the theorem. This improves readability without changing the result. revision: yes

  2. Referee: [§4] §4 (examples): The counterexample demonstrating that mere continuity of F does not yield a continuous f should explicitly verify that the underlying space admits a well-defined Wasserstein metric and that the constructed F is indeed continuous but not Lipschitz; otherwise the sharpness claim rests on a verification that is left implicit.

    Authors: We thank the referee for this suggestion. The counterexamples are constructed on standard Polish spaces (e.g., the unit interval [0,1] or R^d), where the Wasserstein metric is well-defined. In the revised manuscript we will explicitly record that these spaces are Polish, state the Wasserstein distance used, and provide a short direct argument verifying that the constructed F is continuous but fails to be Lipschitz (e.g., by exhibiting pairs of measures whose Wasserstein distance scales differently from the distance between their images). This makes the sharpness of the Lipschitz hypothesis fully explicit. revision: yes
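
The kind of scaling check described in the response above can be sketched numerically. The example below is a hypothetical one chosen for illustration, not the paper's counterexample: F is the push-forward by the fixed map g(x) = sqrt(x) on [0, 1], which is continuous in W_1 but not Lipschitz. Note that this F does have a continuous transport representation (f = g), so it illustrates only the "continuous but not Lipschitz" verification, not the failure of continuous selection.

```python
import math

def w1_dirac(x, y):
    """W_1 between the Dirac masses delta_x and delta_y on the real line."""
    return abs(x - y)

def g(x):
    """Fixed map on [0, 1]: continuous, but not Lipschitz near 0 (hypothetical choice)."""
    return math.sqrt(x)

# F(delta_x) = g_# delta_x = delta_{g(x)}.
# Compare delta_0 with delta_h as h shrinks: the ratio
# W_1(F(delta_0), F(delta_h)) / W_1(delta_0, delta_h) = sqrt(h) / h is unbounded.
ratios = []
for k in range(1, 6):
    h = 10.0 ** (-2 * k)
    ratios.append(w1_dirac(g(0.0), g(h)) / w1_dirac(0.0, h))

print(ratios)
```

The ratios grow without bound as h tends to 0, so no single Lipschitz constant can work for this F.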

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper establishes an existence and regularity result for transport representations of measure transformations: when F is Lipschitz continuous w.r.t. the Wasserstein metric, a continuous selection f(·,μ) exists, while mere continuity of F does not guarantee this. This is a direct theorem relying on standard optimal transport machinery (existence of transport maps under the given assumptions, selection theorems for continuity upgrade under Lipschitz conditions) rather than any self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The weakest assumptions are explicitly stated as prerequisites, and examples are provided only to show sharpness, without reducing the main claim to its inputs by construction. The derivation chain is self-contained against external benchmarks in the field.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard background from optimal transport (Polish spaces, existence of transport maps under certain conditions) without introducing new free parameters or invented entities in the abstract.

axioms (1)
  • [standard math] The underlying space is a Polish metric space so that the Wasserstein distance is well-defined and transport maps exist under suitable conditions.
    Invoked implicitly when discussing Wasserstein continuity and push-forwards.

pith-pipeline@v0.9.0 · 5447 in / 1149 out tokens · 35496 ms · 2026-05-10T06:51:52.721128+00:00 · methodology
