pith. machine review for the scientific record. sign in

arxiv: 2605.09084 · v1 · submitted 2026-05-09 · 🧮 math.ST · stat.TH

Recognition: no theorem link

Two-Sample Inference for Gaussian-Smoothed Wasserstein Costs with Finite Moments

Jiaping Yang, Yunxin Zhang

Pith reviewed 2026-05-12 03:20 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords Wasserstein distanceGaussian smoothingtwo-sample inferenceconvergence ratescentral limit theoremempirical measuresmoment conditionsoptimal transport
0
0 comments X

The pith

The plug-in estimator for the Gaussian-smoothed Wasserstein cost converges at rates determined by the distributions' polynomial moments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper derives convergence rates for the two-sample plug-in estimator of the Gaussian-smoothed Wasserstein cost between two distributions in Euclidean space. With fixed smoothing and moments of order q greater than the cost order p, the error is bounded in probability by a term that decays like the sample size to a negative power depending on how large q is relative to dimension plus twice p. The same order applies to the distance itself when it is positive. Additional results include a central limit theorem under stronger conditions.

Core claim

For fixed smoothing and measures with finite moments of order q_μ and q_ν strictly above p, the two-sample estimator of the smoothed Wasserstein cost satisfies an upper probability bound of order ρ(m) + ρ(n), where ρ(N) equals N to the power -(q-p)/(q+d) when p < q < d+2p, N to the power -1/2 times log N when q equals d+2p, and N to the power -1/2 when q exceeds d+2p. This rate extends to expectation when moments are at least 2p, and to the distance when positive. For p greater than 1 and moments above d+2p, a first-order expansion and separated CLT hold along with a variance estimator.

What carries the argument

The Gaussian-smoothed Wasserstein cost defined as the p-th power of the Wasserstein distance between the measures each convolved with a Gaussian of variance σ², and its empirical version using independent samples from each measure.

If this is right

  • The estimator converges in probability to the population value at the given rate.
  • The bound also holds in expectation under the condition that moments are at least twice p.
  • When the population smoothed cost is strictly positive, the rate applies directly to the estimator of the distance.
  • For p > 1 and sufficiently high moments, the estimator admits a central limit theorem after suitable centering and scaling.
  • A sample-splitting method provides a consistent estimator for the asymptotic variance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The rates could guide practical choices of sample sizes when using smoothed transport distances in data analysis.
  • Similar techniques might apply to other regularized optimal transport problems with different kernels.
  • Extensions to dependent samples or non-iid settings could build on the moment conditions used here.
  • Testing the sharpness of the phase transition at q = d + 2p would clarify the boundary between different regimes.

Load-bearing premise

The underlying distributions have finite polynomial moments of order strictly larger than p, and the Gaussian smoothing parameter remains fixed and positive.

What would settle it

Generate many pairs of samples from distributions with known moments q just above p and check whether the observed error of the estimator decays slower than the predicted ρ order as m and n increase.

read the original abstract

We study the two-sample plug-in estimator of the Gaussian-smoothed Wasserstein cost \(T_p^{(\sigma)}(\mu,\nu)=W_p(\mu*\gamma_\sigma,\nu*\gamma_\sigma)^p\) on \(\R^d\). For fixed smoothing and finite polynomial moments \(M_{q_\mu}(\mu)<\infty\), \(M_{q_\nu}(\nu)<\infty\), with \(q_\mu,q_\nu>p\), we establish upper bounds in probability of order \(\rho_{q_\mu,p,d}(m)+\rho_{q_\nu,p,d}(n)\). Here \(\rho_{q,p,d}(N)=N^{-(q-p)/(q+d)}\) for \(p<q<d+2p\), \(N^{-1/2}\log N\) at \(q=d+2p\), and \(N^{-1/2}\) for \(q>d+2p\). This order also holds in expectation under \(q_\mu,q_\nu\ge2p\). When the smoothed population distance is positive, the cost bound yields this rate for the distance itself. For \(p>1\) and \(q_\mu,q_\nu>d+2p\), we also derive a first-order expansion, a separated two-sample central limit theorem, and a sample-splitting variance estimator.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper studies two-sample inference for the Gaussian-smoothed Wasserstein cost T_p^{(σ)}(μ, ν) = W_p(μ ∗ γ_σ, ν ∗ γ_σ)^p. It establishes upper bounds in probability of order ρ_{q_μ,p,d}(m) + ρ_{q_ν,p,d}(n) for the plug-in estimator under finite moments M_{q}(·) < ∞ with q > p, where the rate function ρ transitions from polynomial decay to sqrt(N) rates based on whether q is below, at, or above d + 2p. The same rate holds in expectation for q ≥ 2p. When the population cost is positive, the rate carries over to the distance. For p > 1 and q > d + 2p, a first-order expansion, separated CLT, and sample-splitting variance estimator are derived.

Significance. If the claims hold, this contributes meaningfully to the literature on statistical properties of Wasserstein distances by providing explicit non-asymptotic rates and asymptotic normality for the smoothed version, which mitigates some computational and statistical issues of the unsmoothed metric. The results are particularly useful for applications in machine learning and statistics where smoothed OT is used. The strength lies in the comprehensive treatment of different moment regimes and the practical variance estimator.

major comments (2)
  1. §3 (main deviation bound, likely Theorem 3.1): the piecewise definition of ρ_{q,p,d}(N) at the critical value q = d + 2p includes an extra log N factor; while this is standard for empirical processes without smoothing, the Gaussian convolution may improve integrability enough to remove the log term, and the proof should explicitly track whether the smoothing alters the boundary case.
  2. §4 (CLT and expansion): the first-order expansion and separated CLT are stated under p > 1 and q > d + 2p, but the non-degeneracy of the limiting variance is only implicitly tied to positivity of T_p^{(σ)}; an explicit condition ensuring the asymptotic variance is positive (or a statement that the result is for the cost rather than the distance) is needed to make the CLT statement complete.
minor comments (3)
  1. Notation section: the moment functional M_{q_μ}(μ) is used throughout but should be defined with an equation number at first appearance for clarity.
  2. Abstract and §1: the phrase 'upper bounds in probability' should be sharpened to indicate whether the bounds hold with high probability (1 - o(1)) or merely in probability; the distinction affects how the results are used for inference.
  3. References: add citations to recent works on empirical processes for Wasserstein distances (e.g., on rates under polynomial moments) to better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation of our manuscript and for the detailed, constructive comments. We address each major comment below and have incorporated revisions to strengthen the presentation.

read point-by-point responses
  1. Referee: §3 (main deviation bound, likely Theorem 3.1): the piecewise definition of ρ_{q,p,d}(N) at the critical value q = d + 2p includes an extra log N factor; while this is standard for empirical processes without smoothing, the Gaussian convolution may improve integrability enough to remove the log term, and the proof should explicitly track whether the smoothing alters the boundary case.

    Authors: We appreciate the referee's suggestion regarding the possible improvement from Gaussian smoothing. Upon re-examining the proof of Theorem 3.1, the convolution does enhance integrability of the relevant function class. However, because the smoothing parameter σ is held fixed, the entropy integral for the Lipschitz functions at the critical moment order q = d + 2p still produces a logarithmic factor in the maximal inequality. We have added a remark immediately following the proof that explicitly tracks the effect of the smoothing and explains why the log N term cannot be removed in the boundary regime. revision: partial

  2. Referee: §4 (CLT and expansion): the first-order expansion and separated CLT are stated under p > 1 and q > d + 2p, but the non-degeneracy of the limiting variance is only implicitly tied to positivity of T_p^{(σ)}; an explicit condition ensuring the asymptotic variance is positive (or a statement that the result is for the cost rather than the distance) is needed to make the CLT statement complete.

    Authors: We agree that an explicit non-degeneracy condition improves clarity. The first-order expansion and separated CLT are derived for the smoothed cost T_p^{(σ)}(μ, ν). The asymptotic variance is positive whenever the population cost is positive, which follows directly from the form of the influence functions. We have revised the statement of the theorem in Section 4 to include this explicit condition and have added a short sentence clarifying that the CLT applies to the cost (with the distance rate following from the earlier deviation bounds when the cost is positive). revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives upper bounds on the two-sample plug-in estimator of the fixed-σ Gaussian-smoothed Wasserstein cost T_p^(σ)(μ,ν) under the stated polynomial moment assumptions M_{q_μ}(μ)<∞ and M_{q_ν}(ν)<∞ with q>p. The rates ρ_{q,p,d}(N) are obtained from standard empirical-process concentration for the convolved measures μ*γ_σ and ν*γ_σ; the piecewise definition (power-law decay, log/sqrt(N) transition, and sqrt(N) regime) follows directly from moment integrability and does not reduce to a self-definition or a fitted parameter renamed as a prediction. Subsequent claims on positivity implying the same rate for the distance, first-order expansions, separated CLTs, and sample-splitting variance estimators are consistent extensions within the same regime and do not rely on load-bearing self-citations or ansätze smuggled from prior work. The derivation is self-contained against external benchmarks in empirical-process theory and Gaussian convolution properties.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the domain assumption that the two distributions have finite polynomial moments of order strictly larger than p and that the Gaussian smoothing parameter is fixed.

axioms (1)
  • domain assumption Distributions μ and ν satisfy M_{q_μ}(μ) < ∞ and M_{q_ν}(ν) < ∞ for q_μ, q_ν > p
    Explicitly required in the abstract for the probability bounds to hold.

pith-pipeline@v0.9.0 · 5531 in / 1209 out tokens · 43028 ms · 2026-05-12T03:20:47.496906+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

  1. [1]

    (2019).One-Dimensional Empirical Measures, Order Statistics, and Kantorovich Transport Distances

    Bobkov, S., and Ledoux, M. (2019).One-Dimensional Empirical Measures, Order Statistics, and Kantorovich Transport Distances. Memoirs of the American Mathematical Society, 261(1259)

  2. [2]

    Cosso, A., Martini, M., and Perelli, L. (2025). Mean convergence rates for Gaussian-smoothed Wasserstein distances and classical Wasserstein distances. arXiv:2504.17477

  3. [3]

    del Barrio, E., González-Sanz, A., and Loubes, J.-M. (2024). Central limit theorems for general transportation costs.Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 60(2), 847–873. doi:10.1214/22-AIHP1356

  4. [4]

    Ding, Y., and Niles-Weed, J. (2022). Asymptotics of smoothed Wasserstein distances in the small noise regime. InAdvances in Neural Information Processing Systems

  5. [5]

    Fang, Z., and Santos, A. (2019). Inference on directionally differentiable functions.The Review of Economic Studies, 86(1), 377–412

  6. [6]

    Fournier, N., and Guillin, A. (2015). On the rate of convergence in Wasserstein distance of the empirical measure.Probability Theory and Related Fields, 162, 707–738. 22

  7. [7]

    Goldfeld, Z., and Greenewald, K. (2020). Gaussian-smoothed optimal transport: metric structure and statistical efficiency. InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108, 3327–3337

  8. [8]

    Goldfeld, Z., Greenewald, K., Niles-Weed, J., and Polyanskiy, Y. (2020). Convergence of smoothed empirical measures with applications to entropy estimation.IEEE Transactions on Information Theory, 66(7), 4368–4391

  9. [9]

    Goldfeld, Z., Kato, K., Nietert, S., and Rioux, G. (2024a). Limit distribution theory for smooth p-Wasserstein distances.The Annals of Applied Probability, 34(2), 2447–2511

  10. [10]

    Goldfeld, Z., Kato, K., Rioux, G., and Sadhu, R. (2024b). Statistical inference with regularized optimal transport.Information and Inference: A Journal of the IMA, 13(1), iaad056

  11. [11]

    Manole, T., and Niles-Weed, J. (2024). Sharp convergence rates for empirical optimal transport with smooth costs.The Annals of Applied Probability, 34(1B), 1108–1135

  12. [12]

    Nietert, S., Goldfeld, Z., and Kato, K. (2021). Smoothp-Wasserstein distance: structure, empirical approximation, and statistical applications. InProceedings of the 38th International Conference on Machine Learning, PMLR 139, 8172–8183

  13. [13]

    Sadhu, R., Goldfeld, Z., and Kato, K. (2021). Limit distribution theory for the smooth 1-Wasserstein distance with applications. arXiv:2107.13494

  14. [14]

    (2015).Optimal Transport for Applied Mathematicians

    Santambrogio, F. (2015).Optimal Transport for Applied Mathematicians. Birkhäuser

  15. [15]

    Sommerfeld, M., and Munk, A. (2018). Inference for empirical Wasserstein distances on finite spaces.Journal of the Royal Statistical Society: Series B, 80(1), 219–238

  16. [16]

    Staudt, T., Hundrieser, S., and Munk, A. (2025). On the uniqueness of Kantorovich potentials. SIAM Journal on Mathematical Analysis, 57(2), 1452–1482. doi:10.1137/24M1658966

  17. [17]

    W., and Wellner, J

    van der Vaart, A. W., and Wellner, J. A. (1996).Weak Convergence and Empirical Processes. Springer

  18. [18]

    (2009).Optimal Transport: Old and New

    Villani, C. (2009).Optimal Transport: Old and New. Springer

  19. [19]

    Weed, J., and Bach, F. (2019). Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance.Bernoulli, 25(4A), 2620–2648

  20. [20]

    Zhang, Y., Cheng, X., and Reeves, G. (2021). Convergence of Gaussian-smoothed optimal transport distance with sub-gamma distributions and dependent samples. InProceedings of the Twenty Fourth International Conference on Artificial Intelligence and Statistics, PMLR 130, 2422–2430. 23