Two-Sample Inference for Gaussian-Smoothed Wasserstein Costs with Finite Moments
Pith reviewed 2026-05-12 03:20 UTC · model grok-4.3
The pith
The plug-in estimator for the Gaussian-smoothed Wasserstein cost converges at rates determined by the distributions' polynomial moments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For fixed smoothing and measures with finite moments of orders q_μ and q_ν strictly above p, the two-sample plug-in estimator of the smoothed Wasserstein cost satisfies an upper bound in probability of order ρ(m) + ρ(n), with ρ evaluated at q_μ for the first sample and at q_ν for the second. Here ρ(N) = N^{-(q-p)/(q+d)} when p < q < d+2p, N^{-1/2} log N when q = d+2p, and N^{-1/2} when q > d+2p. The same order holds in expectation when both moment orders are at least 2p, and it carries over to the distance itself when the smoothed population distance is positive. For p > 1 and moment orders above d+2p, a first-order expansion and a separated two-sample CLT hold, together with a sample-splitting variance estimator.
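The piecewise rate can be written down directly. The sketch below is only an illustration of the stated orders: the function names `rho` and `two_sample_rate` are hypothetical, and all constants in the paper's bound are ignored. Note that the exponent (q-p)/(q+d) equals 1/2 exactly at q = d+2p, so the three regimes join up apart from the log N factor.

```python
import math


def rho(n, q, p, d):
    """Order of the rate for a sample of size n with moment order q (constants dropped)."""
    if q <= p:
        raise ValueError("the stated bound requires q > p")
    critical = d + 2 * p  # boundary between the slow and the parametric regime
    if q < critical:
        return n ** (-(q - p) / (q + d))
    if q == critical:  # exact comparison; pass q as an int or an exactly representable float
        return n ** -0.5 * math.log(n)
    return n ** -0.5


def two_sample_rate(m, n, q_mu, q_nu, p, d):
    """Order rho(m) + rho(n), each term using the moment order of its own measure."""
    return rho(m, q_mu, p, d) + rho(n, q_nu, p, d)
```

For example, with d = 1 and p = 2 the critical order is 5, so `rho(1000, 3, 2, 1)` uses the exponent -1/4 while `rho(1000, 10, 2, 1)` uses -1/2.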
What carries the argument
The Gaussian-smoothed Wasserstein cost defined as the p-th power of the Wasserstein distance between the measures each convolved with a Gaussian of variance σ², and its empirical version using independent samples from each measure.
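In symbols, the objects described above can be written as follows. The cost formula is taken from the abstract. The estimator symbol \(\widehat{T}_{m,n}\), the sample names \(X_i, Y_j\), and the reading of \(\gamma_\sigma\) as an isotropic Gaussian are notational assumptions made here for illustration.

```latex
\[
T_p^{(\sigma)}(\mu,\nu) = W_p(\mu * \gamma_\sigma,\ \nu * \gamma_\sigma)^p,
\qquad
\widehat{T}_{m,n} = T_p^{(\sigma)}(\hat\mu_m, \hat\nu_n),
\]
\[
\hat\mu_m = \frac{1}{m}\sum_{i=1}^{m}\delta_{X_i},
\qquad
\hat\nu_n = \frac{1}{n}\sum_{j=1}^{n}\delta_{Y_j},
\qquad
\gamma_\sigma = \mathcal{N}(0, \sigma^2 I_d).
\]
```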
If this is right
- The estimator converges in probability to the population value at the given rate.
- The bound also holds in expectation when both moment orders are at least 2p.
- When the population smoothed cost is strictly positive, the rate applies directly to the estimator of the distance.
- For p > 1 and sufficiently high moments, the estimator admits a central limit theorem after suitable centering and scaling.
- A sample-splitting method provides a consistent estimator for the asymptotic variance.
Where Pith is reading between the lines
- The rates could guide practical choices of sample sizes when using smoothed transport distances in data analysis.
- Similar techniques might apply to other regularized optimal transport problems with different kernels.
- Extensions to dependent samples or non-iid settings could build on the moment conditions used here.
- Testing the sharpness of the phase transition at q = d + 2p would clarify the boundary between different regimes.
Load-bearing premise
The underlying distributions have finite polynomial moments of order strictly larger than p, and the Gaussian smoothing parameter remains fixed and positive.
What would settle it
Generate many pairs of samples from distributions with a known moment order q just above p, and check whether the observed error of the estimator decays more slowly than the predicted order ρ(m) + ρ(n) as m and n increase.
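A minimal sketch of such an experiment in dimension d = 1 follows. It is not the paper's procedure: the smoothed plug-in cost is approximated by Monte Carlo (drawing points from each smoothed empirical measure and matching order statistics, which is the optimal coupling on the line), and the heavy-tailed law is a symmetric Pareto chosen here only because its moment order is easy to control. All function names are hypothetical.

```python
import random


def smoothed_cost_1d(xs, ys, sigma, p, k=20000, rng=None):
    """Monte Carlo approximation of W_p(mu_hat * gamma_sigma, nu_hat * gamma_sigma)^p for d = 1."""
    rng = rng or random.Random(0)
    # A draw from a smoothed empirical measure is a uniformly chosen data point plus N(0, sigma^2) noise.
    a = sorted(rng.choice(xs) + rng.gauss(0.0, sigma) for _ in range(k))
    b = sorted(rng.choice(ys) + rng.gauss(0.0, sigma) for _ in range(k))
    # In one dimension the optimal coupling pairs the sorted samples.
    return sum(abs(u - v) ** p for u, v in zip(a, b)) / k


def pareto_sample(n, tail, rng):
    """Symmetric Pareto draws; the moment of order q is finite only for q < tail."""
    return [rng.choice((-1.0, 1.0)) * rng.paretovariate(tail) for _ in range(n)]


def mean_error_same_law(n, tail, sigma, p, reps, rng):
    """Average estimated cost when both samples share one law, so the population cost is 0."""
    total = 0.0
    for _ in range(reps):
        xs = pareto_sample(n, tail, rng)
        ys = pareto_sample(n, tail, rng)
        total += smoothed_cost_1d(xs, ys, sigma, p, k=4000, rng=rng)
    return total / reps
```

Calling `mean_error_same_law` over a grid of n with `tail` slightly above p, and plotting the result against n on a log-log scale, gives an empirical decay exponent to compare with the predicted one. The Monte Carlo step adds its own error, so k should be large relative to n before reading anything into the slope.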
Original abstract
We study the two-sample plug-in estimator of the Gaussian-smoothed Wasserstein cost \(T_p^{(\sigma)}(\mu,\nu)=W_p(\mu*\gamma_\sigma,\nu*\gamma_\sigma)^p\) on \(\R^d\). For fixed smoothing and finite polynomial moments \(M_{q_\mu}(\mu)<\infty\), \(M_{q_\nu}(\nu)<\infty\), with \(q_\mu,q_\nu>p\), we establish upper bounds in probability of order \(\rho_{q_\mu,p,d}(m)+\rho_{q_\nu,p,d}(n)\). Here \(\rho_{q,p,d}(N)=N^{-(q-p)/(q+d)}\) for \(p<q<d+2p\), \(N^{-1/2}\log N\) at \(q=d+2p\), and \(N^{-1/2}\) for \(q>d+2p\). This order also holds in expectation under \(q_\mu,q_\nu\ge2p\). When the smoothed population distance is positive, the cost bound yields this rate for the distance itself. For \(p>1\) and \(q_\mu,q_\nu>d+2p\), we also derive a first-order expansion, a separated two-sample central limit theorem, and a sample-splitting variance estimator.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies two-sample inference for the Gaussian-smoothed Wasserstein cost T_p^{(σ)}(μ, ν) = W_p(μ ∗ γ_σ, ν ∗ γ_σ)^p. It establishes upper bounds in probability of order ρ_{q_μ,p,d}(m) + ρ_{q_ν,p,d}(n) for the plug-in estimator under finite moments M_{q_μ}(μ) < ∞ and M_{q_ν}(ν) < ∞ with q_μ, q_ν > p, where the rate function ρ_{q,p,d}(N) moves from the slower polynomial rate N^{-(q-p)/(q+d)} to the parametric rate N^{-1/2} according to whether q is below, at, or above d + 2p (with an extra log N factor at q = d + 2p). The same rate holds in expectation for q_μ, q_ν ≥ 2p. When the population cost is positive, the rate carries over to the distance. For p > 1 and q_μ, q_ν > d + 2p, a first-order expansion, a separated CLT, and a sample-splitting variance estimator are derived.
Significance. If the claims hold, this contributes meaningfully to the literature on statistical properties of Wasserstein distances by providing explicit non-asymptotic rates and asymptotic normality for the smoothed version, which mitigates some computational and statistical issues of the unsmoothed metric. The results are particularly useful for applications in machine learning and statistics where smoothed OT is used. The strength lies in the comprehensive treatment of different moment regimes and the practical variance estimator.
major comments (2)
- §3 (main deviation bound, likely Theorem 3.1): the piecewise definition of ρ_{q,p,d}(N) at the critical value q = d + 2p includes an extra log N factor; while this is standard for empirical processes without smoothing, the Gaussian convolution may improve integrability enough to remove the log term, and the proof should explicitly track whether the smoothing alters the boundary case.
- §4 (CLT and expansion): the first-order expansion and separated CLT are stated under p > 1 and q > d + 2p, but the non-degeneracy of the limiting variance is only implicitly tied to positivity of T_p^{(σ)}; an explicit condition ensuring the asymptotic variance is positive (or a statement that the result is for the cost rather than the distance) is needed to make the CLT statement complete.
minor comments (3)
- Notation section: the moment functional M_{q_μ}(μ) is used throughout but should be defined with an equation number at first appearance for clarity.
- Abstract and §1: the phrase 'upper bounds in probability' should be sharpened to indicate whether the bounds hold with high probability (1 - o(1)) or merely in probability; the distinction affects how the results are used for inference.
- References: add citations to recent works on empirical processes for Wasserstein distances (e.g., on rates under polynomial moments) to better situate the contribution.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of our manuscript and for the detailed, constructive comments. We address each major comment below and have incorporated revisions to strengthen the presentation.
- Referee: §3 (main deviation bound, likely Theorem 3.1): the piecewise definition of ρ_{q,p,d}(N) at the critical value q = d + 2p includes an extra log N factor; while this is standard for empirical processes without smoothing, the Gaussian convolution may improve integrability enough to remove the log term, and the proof should explicitly track whether the smoothing alters the boundary case.
  Authors: We appreciate the referee's suggestion regarding the possible improvement from Gaussian smoothing. Upon re-examining the proof of Theorem 3.1, the convolution does enhance integrability of the relevant function class. However, because the smoothing parameter σ is held fixed, the entropy integral for the Lipschitz functions at the critical moment order q = d + 2p still produces a logarithmic factor in the maximal inequality. We have added a remark immediately following the proof that explicitly tracks the effect of the smoothing and explains why the log N term cannot be removed in the boundary regime.
  Revision: partial
- Referee: §4 (CLT and expansion): the first-order expansion and separated CLT are stated under p > 1 and q > d + 2p, but the non-degeneracy of the limiting variance is only implicitly tied to positivity of T_p^{(σ)}; an explicit condition ensuring the asymptotic variance is positive (or a statement that the result is for the cost rather than the distance) is needed to make the CLT statement complete.
  Authors: We agree that an explicit non-degeneracy condition improves clarity. The first-order expansion and separated CLT are derived for the smoothed cost T_p^{(σ)}(μ, ν). The asymptotic variance is positive whenever the population cost is positive, which follows directly from the form of the influence functions. We have revised the statement of the theorem in Section 4 to include this explicit condition and have added a short sentence clarifying that the CLT applies to the cost (with the distance rate following from the earlier deviation bounds when the cost is positive).
  Revision: yes
Circularity Check
No significant circularity in derivation chain
The paper derives upper bounds on the two-sample plug-in estimator of the fixed-σ Gaussian-smoothed Wasserstein cost T_p^{(σ)}(μ, ν) under the stated polynomial moment assumptions M_{q_μ}(μ) < ∞ and M_{q_ν}(ν) < ∞ with q_μ, q_ν > p. The rates ρ_{q,p,d}(N) are obtained from standard empirical-process concentration for the convolved measures μ ∗ γ_σ and ν ∗ γ_σ; the piecewise definition (power-law decay N^{-(q-p)/(q+d)}, the boundary case N^{-1/2} log N, and the N^{-1/2} regime) follows directly from moment integrability and does not reduce to a self-definition or a fitted parameter renamed as a prediction. Subsequent claims on positivity implying the same rate for the distance, first-order expansions, separated CLTs, and sample-splitting variance estimators are consistent extensions within the same regime and do not rely on load-bearing self-citations or ansätze smuggled from prior work. The derivation is self-contained against external benchmarks in empirical-process theory and Gaussian convolution properties.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: distributions μ and ν satisfy M_{q_μ}(μ) < ∞ and M_{q_ν}(ν) < ∞ for q_μ, q_ν > p