pith. sign in

arxiv: 2605.18497 · v1 · pith:KIKAYEKNnew · submitted 2026-05-18 · 🧮 math.PR · math.OC· math.ST· stat.TH

Sharp Rates of MMD Empirical Estimation with Power Kernels

Pith reviewed 2026-05-20 08:22 UTC · model grok-4.3

classification 🧮 math.PR math.OCmath.STstat.TH
keywords maximum mean discrepancypower kernelsenergy distanceempirical measuresAhlfors regularityconvergence ratesprobability measures
0
0 comments X

The pith

For Ahlfors-regular measures the MMD with power kernels to any N-point empirical set decays at the exact rate N to the power of minus one-half times one plus q over beta.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes sharp quantitative rates for how well an N-point empirical measure can approximate a target probability measure in the maximum mean discrepancy induced by a power kernel. Under an Ahlfors regularity condition of exponent beta the discrepancy is shown to decay precisely like N to the negative one-half times one plus q over beta, and this rate is achieved both by the worst possible placement of the N points and by the best possible placement. The result supplies the missing quantitative half of an earlier qualitative consistency theorem that only showed convergence without a speed. A reader would care because the bound tells exactly how many samples are needed to reach a given accuracy when using this particular discrepancy for measure approximation.

Core claim

Given a probability measure omega on R^d satisfying an Ahlfors regularity condition of exponent beta, the sharp two-sided bound E_q(mu_N, omega) ≍ N^{-1/2 (1 + q/beta)} holds both for the worst-case empirical measure mu_N (lower bound) and for an optimally chosen empirical measure mu_N (upper bound), where E_q is the energy distance induced by the power kernel K_q(x,y) = -|x-y|^q.

What carries the argument

The energy distance E_q induced by the power kernel K_q(x,y) = -|x-y|^q for q in (0,2), whose empirical estimation is controlled by the Ahlfors regularity exponent beta of the target measure.

If this is right

  • The upper and lower bounds match, so the rate is optimal and cannot be improved by any choice of N points.
  • The same rate applies uniformly to every possible configuration of N points in the lower bound.
  • The quantitative speed fills the gap left by the earlier qualitative narrow-convergence result for minimizers of the energy distance.
  • The exponent depends explicitly on both the kernel parameter q and the regularity exponent beta of the measure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same proof technique might yield analogous sharp rates for other singular kernels whose Fourier transforms decay at comparable rates.
  • Because beta encodes the dimension of the support, the result links empirical approximation quality directly to the intrinsic dimension of the measure.
  • Numerical verification on self-similar measures with known beta would provide an immediate test of the predicted exponent.

Load-bearing premise

The target probability measure must satisfy an Ahlfors regularity condition of exponent beta.

What would settle it

Compute the asymptotic decay of the minimal energy distance for a concrete Ahlfors-regular measure such as the uniform distribution on the unit ball in R^d and check whether the observed exponent matches exactly minus one-half times one plus q over beta.

read the original abstract

We establish quantitative rates of convergence for the empirical estimation of probability measures by means of the Maximum Mean Discrepancy (MMD) with power kernel $K_q(x,y) = -|x-y|^q$, $q \in (0,2)$. The resulting discrepancy is the classical energy distance $$\mathcal E_q^2(\mu, \omega) = -\frac{1}{2}\iint_{\mathbb{R}^d \times \mathbb{R}^d} |x-y|^q \, d(\mu - \omega)(x)\, d(\mu - \omega)(y),$$ and we ask how fast the best $N$-point empirical approximation $\inf_{\mu_N \in \mathcal{P}^N}\mathcal{E}_q(\mu_N,\omega)$ decays as $N \to \infty$. Given a probability measure $\omega$ on $\mathbb{R}^d$ satisfying an Ahlfors regularity condition of exponent $\beta$, we prove that the sharp two-sided bound $$\mathcal E_q(\mu_N, \omega) \asymp N^{-\frac{1}{2}\left(1 + \frac{q}{\beta}\right)}$$ holds both for the worst-case empirical measure $\mu_N$ (lower bound, holding for every configuration of $N$ points) and for an optimally chosen empirical measure $\mu_N$ (upper bound). This complements the qualitative consistency result of Fornasier and H\"utter \cite{fornasier2014consistency}, who proved narrow convergence of the minimizers of $\mathcal E_q^2(\cdot, \omega)$ over empirical measures without quantitative rates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper establishes sharp quantitative rates for the empirical estimation of a probability measure ω on R^d by N-point measures μ_N in the maximum mean discrepancy induced by the power kernel K_q(x,y) = -|x-y|^q for q ∈ (0,2). Under the assumption that ω satisfies an Ahlfors regularity condition of exponent β, it proves the two-sided bound E_q(μ_N, ω) ≍ N^{-(1/2)(1 + q/β)}, which holds both for the worst-case choice of μ_N (lower bound) and for the optimally chosen μ_N (upper bound). This supplies explicit rates that complement the qualitative narrow-convergence result of Fornasier and Hütter.

Significance. If the result holds, it furnishes the first sharp, explicit convergence rates for energy-distance approximation of Ahlfors-regular measures by empirical measures. The two-sided character of the bound, the direct dependence on the regularity exponent β, and the derivation via covering arguments and potential-theoretic comparisons constitute a clean contribution to quantitative potential theory and discrepancy theory.

minor comments (3)
  1. §2, Definition 2.3: the precise statement of the Ahlfors regularity condition (including the admissible range for β relative to dimension d) should be recalled explicitly before the main theorem, rather than only referenced.
  2. The proof of the lower bound in §5 invokes a potential-theoretic comparison on balls of radius N^{-1/β}; a short remark clarifying why the constant in the comparison is independent of the particular ball would improve readability.
  3. Figure 1 (if present) or the numerical illustration in §6: the caption should state the precise values of q and β used in the simulation so that the plotted rate can be directly compared with the theoretical exponent.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report, so we have no individual points to address point-by-point. We will prepare a revised manuscript incorporating any minor editorial suggestions that may arise during the process.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained analytic proof

full rationale

The paper establishes the sharp two-sided rate directly from the Ahlfors regularity assumption of exponent β via an explicit covering argument for the upper bound (placing N points to control local energy at scale N^{-1/β}) and a potential-theoretic comparison for the lower bound (showing any N-point measure leaves discrepancy of the claimed order). The exponent 1/2(1 + q/β) follows from balancing the quadratic form of the power kernel against β-dimensional volume scaling, with no fitted parameters, self-definitional reductions, or load-bearing self-citations. The cited prior result of Fornasier and Hütter supplies only qualitative consistency and is not used to derive the quantitative rates, leaving the central claim independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the Ahlfors regularity assumption for the target measure and standard properties of the energy distance functional; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption The probability measure ω satisfies an Ahlfors regularity condition of exponent β.
    This condition is explicitly required for the rate to hold and enters the exponent of the convergence bound.

pith-pipeline@v0.9.0 · 5839 in / 1352 out tokens · 48947 ms · 2026-05-20T08:22:15.853908+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 2 internal anchors

  1. [1]

    Oxford Mathematical Monographs

    Luigi Ambrosio, Nicola Fusco, and Diego Pallara.Functions of bounded variation and free dis- continuity problems. Oxford Mathematical Monographs. The Clarendon Press, Oxford University Press, New York, 2000

  2. [2]

    Lectures in Mathematics ETH Z¨ urich

    Luigi Ambrosio, Nicola Gigli, and Giuseppe Savar´ e.Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics ETH Z¨ urich. Birkh¨ auser, Basel, 2nd edition, 2008

  3. [3]

    From kinetic theory to AI: A rediscovery of high-dimensional divergences and their properties.Mathematical Models and Methods in Applied Sciences, 2026

    Gennaro Auricchio, Giovanni Brigati, Paolo Giudici, and Giuseppe Toscani. From kinetic theory to AI: A rediscovery of high-dimensional divergences and their properties.Mathematical Models and Methods in Applied Sciences, 2026. Preprint arXiv:2507.11387

  4. [4]

    The equivalence of Fourier-based and Wasserstein metrics on imaging problems.Atti Accademia Nazionale dei Lincei

    Gennaro Auricchio, Andrea Codegoni, Stefano Gualandi, Giuseppe Toscani, and Marco Veneroni. The equivalence of Fourier-based and Wasserstein metrics on imaging problems.Atti Accademia Nazionale dei Lincei. Rendiconti Lincei. Matematica e Applicazioni, 31(3):627–649, 2020

  5. [5]

    Luca Brandolini, William W. L. Chen, Leonardo Colzani, Giacomo Gigante, and Giancarlo Travaglini. Discrepancy and numerical integration on metric measure spaces.Journal of Geo- metric Analysis, 29(1):328–369, 2019

  6. [6]

    A projection algorithm on measures sets

    Nicolas Chauffert, Philippe Ciuciu, Jonas Kahn, and Pierre Weiss. A projection method on mea- sures sets.Constructive Approximation, 45(1):83–111, February 2017. Preprint arXiv:1509.00229, 2015

  7. [7]

    Kernel two-sample tests for manifold data.Bernoulli, 30(4):2572– 2597, 2024

    Xiuyuan Cheng and Yao Xie. Kernel two-sample tests for manifold data.Bernoulli, 30(4):2572– 2597, 2024

  8. [8]

    Quantita- tive convergence of wasserstein gradient flows of kernel mean discrepancies.arXiv preprint arXiv:2603.01977,

    L´ ena ¨ ıc Chizat, Maria Colombo, Roberto Colombo, and Xavier Fern´ andez-Real. Quantita- tive convergence of Wasserstein gradient flows of kernel mean discrepancies, 2026. Preprint, arXiv:2603.01977. SHARP RATES OF MMD EMPIRICAL ESTIMATION WITH POWER KERNELS 33

  9. [9]

    Sinkhorn distances: Lightspeed computation of optimal transport

    Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Christopher J. C. Burges, L´ eon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger, editors,Advances in Neural Information Processing Systems, volume 26, pages 2292–2300, 2013

  10. [10]

    Sugli estremi dei momenti delle funzioni di ripartizione doppia.Annali della Scuola Normale Superiore di Pisa - Scienze Fisiche e Matematiche, Ser

    Giorgio Dall’Aglio. Sugli estremi dei momenti delle funzioni di ripartizione doppia.Annali della Scuola Normale Superiore di Pisa - Scienze Fisiche e Matematiche, Ser. 3, 10(1-2):35–74, 1956

  11. [11]

    Constructive quantization: approxi- mation by empirical measures.Annales de l’I.H.P

    Steffen Dereich, Michael Scheutzow, and Reik Schottstedt. Constructive quantization: approxi- mation by empirical measures.Annales de l’I.H.P. Probabilit´ es et statistiques, 49(4):1183–1203, 2013

  12. [12]

    Asymptotic behavior of gradient flows driven by nonlocal power repulsion and attraction potentials in one dimension.SIAM Journal on Mathematical Analysis, 46(6):3814–3837, 2014

    Marco Di Francesco, Massimo Fornasier, Jan-Christian H¨ utter, and Daniel Matthes. Asymptotic behavior of gradient flows driven by nonlocal power repulsion and attraction potentials in one dimension.SIAM Journal on Mathematical Analysis, 46(6):3814–3837, 2014

  13. [13]

    Springer Monographs in Mathematics

    Irene Fonseca and Giovanni Leoni.Modern Methods in the Calculus of Variations: Lp Spaces. Springer Monographs in Mathematics. Springer New York, 2007

  14. [14]

    Consistency of variational continuous- domain quantization via kinetic theory.Applicable Analysis, 92(6):1283–1298, 2013

    Massimo Fornasier, Jan Haˇ skovec, and Gabriele Steidl. Consistency of variational continuous- domain quantization via kinetic theory.Applicable Analysis, 92(6):1283–1298, 2013

  15. [15]

    Consistency of Probability Measure Quantization by Means of Power Repulsion-Attraction Potentials

    Massimo Fornasier and Jan-Christian H¨ utter. Consistency of probability measure quantization by means of power repulsion–attraction potentials.Journal of Fourier Analysis and Applications, 22(3):694–749, 2016. Preprint arXiv:1310.1120, 2013

  16. [16]

    On the rate of convergence in Wasserstein distance of the empirical measure.Probability Theory and Related Fields, 162(3–4):707–738, 2015

    Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure.Probability Theory and Related Fields, 162(3–4):707–738, 2015

  17. [17]

    Birkh¨ auser, Boston, 1997

    Bert Fristedt and Lawrence Gray.A Modern Approach to Probability Theory. Birkh¨ auser, Boston, 1997

  18. [18]

    Diameter bounded equal measure partitions of Ahlfors regular metric measure spaces.Discrete Comput

    Giacomo Gigante and Paul Leopardi. Diameter bounded equal measure partitions of Ahlfors regular metric measure spaces.Discrete Comput. Geom., 57(2):419–430, 2017

  19. [19]

    Springer, Berlin, 2000

    Siegfried Graf and Harald Luschgy.Foundations of Quantization for Probability Distributions, volume 1730 ofLecture Notes in Mathematics. Springer, Berlin, 2000

  20. [20]

    Borgwardt, Malte J

    Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Sch¨ olkopf, and Alexander Smola. A kernel two-sample test.Journal of Machine Learning Research, 13(25):723–773, 2012

  21. [21]

    Borgwardt, Malte J

    Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Sch¨ olkopf, and Alexander J. Smola. A kernel method for the two-sample-problem. In Bernhard Sch¨ olkopf, John Platt, and Thomas Hofmann, editors,Advances in Neural Information Processing Systems 19 (NIPS 2006), pages 513–520. MIT Press, December 2006

  22. [22]

    Posterior sampling based on gradient flows of the MMD with negative distance kernel

    Paul Hagemann, Johannes Hertrich, Fabian Altekr¨ uger, Robert Beinert, Jannis Chemseddine, and Gabriele Steidl. Posterior sampling based on gradient flows of the MMD with negative distance kernel. InThe Twelfth International Conference on Learning Representations, 2024

  23. [23]

    Generative sliced MMD flows with riesz kernels

    Johannes Hertrich, Christian Wald, Fabian Altekr¨ uger, and Paul Hagemann. Generative sliced MMD flows with riesz kernels. InThe Twelfth International Conference on Learning Represen- tations, 2024

  24. [24]

    Hutchinson

    John E. Hutchinson. Fractals and self-similarity.Indiana Univ. Math. J., 30(5):713–747, 1981

  25. [25]

    Distance covariance in metric spaces.The Annals of Probability, 41(5):3284–3305, 2013

    Russell Lyons. Distance covariance in metric spaces.The Annals of Probability, 41(5):3284–3305, 2013

  26. [26]

    Characterization of translation invariant MMD on Rd and connections with Wasserstein distances.Journal of Machine Learning Research, 25:1–39, 2024

    Thibault Modeste and Cl´ ement Dombry. Characterization of translation invariant MMD on Rd and connections with Wasserstein distances.Journal of Machine Learning Research, 25:1–39, 2024

  27. [27]

    Computational optimal transport.Foundations and Trends in Machine Learning, 11(5–6):355–607, 2019

    Gabriel Peyr´ e and Marco Cuturi. Computational optimal transport.Foundations and Trends in Machine Learning, 11(5–6):355–607, 2019

  28. [28]

    Electrostatic halftoning.Computer Graphics Forum, 29(8):2313–2327, December 2010

    Christian Schmaltz, Pascal Gwosdek, Andr´ es Bruhn, and Joachim Weickert. Electrostatic halftoning.Computer Graphics Forum, 29(8):2313–2327, December 2010

  29. [29]

    I. J. Schoenberg. Metric spaces and completely monotone functions.Annals of Mathematics, 39(4):811–841, 1938

  30. [30]

    I. J. Schoenberg. Metric spaces and positive definite functions.Transactions of the American Mathematical Society, 44(3):522–536, November 1938

  31. [31]

    Equivalence of distance-based and RKHS-based statistics in hypothesis testing.The Annals of Statistics, 41(5):2263–2291, October 2013

    Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, and Kenji Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing.The Annals of Statistics, 41(5):2263–2291, October 2013. SHARP RATES OF MMD EMPIRICAL ESTIMATION WITH POWER KERNELS 34

  32. [32]

    Vogelstein

    Cencheng Shen and Joshua T. Vogelstein. The exact equivalence of distance and kernel methods in hypothesis testing.AStA Advances in Statistical Analysis, 105(3):385–403, 2021

  33. [33]

    Sz´ ekely

    G´ abor J. Sz´ ekely. Potential and kinetic energy in statistics. Lecture Notes, Budapest Institute of Technology (Technical University of Budapest), 1989

  34. [34]

    Sz´ ekely

    G´ abor J. Sz´ ekely. E-statistics: The Energy of Statistical Samples. Technical Report 02-16, Department of Mathematics and Statistics, Bowling Green State University, 2002

  35. [35]

    Sz´ ekely and Maria L

    G´ abor J. Sz´ ekely and Maria L. Rizzo. A new test for multivariate normality.Journal of Multivariate Analysis, 93(1):58–80, 2005

  36. [36]

    Sz´ ekely and Maria L

    G´ abor J. Sz´ ekely and Maria L. Rizzo. Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8):1249–1272, August 2013

  37. [37]

    Sz´ ekely and Maria L

    G´ abor J. Sz´ ekely and Maria L. Rizzo.The Energy of Data and Distance Correlation, volume 171 ofChapman & Hall/CRC Monographs on Statistics and Applied Probability. Chapman and Hall/CRC Press, Boca Raton, 2023

  38. [38]

    Sz´ ekely, Maria L

    G´ abor J. Sz´ ekely, Maria L. Rizzo, and Nail K. Bakirov. Measuring and testing dependence by correlation of distances.The Annals of Statistics, 35(6):2769–2794, December 2007

  39. [39]

    Dithering by differences of convex functions.SIAM Journal on Imaging Sciences, 4(1):79–108, 2011

    Tanja Teuber, Gabriele Steidl, Pascal Gwosdek, Christian Schmaltz, and Joachim Weickert. Dithering by differences of convex functions.SIAM Journal on Imaging Sciences, 4(1):79–108, 2011

  40. [40]

    Cambridge University Press, Cambridge, 2005

    Holger Wendland.Scattered Data Approximation, volume 17 ofCambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2005

  41. [41]

    Kernel two-sample tests in high dimensions: interplay between moment discrepancy and dimension-and-sample orders.Biometrika, 110(2):411–430, 2023

    Jian Yan and Xianyang Zhang. Kernel two-sample tests in high dimensions: interplay between moment discrepancy and dimension-and-sample orders.Biometrika, 110(2):411–430, 2023

  42. [42]

    Paul L. Zador. Topics in the asymptotic quantization of continuous random variables. Technical report, Bell Laboratories, Murray Hill, NJ, 1966. (Francesco Colasanto)Department of Mathematics, CIT School, Technical University of Munich, Munich, Germany Email address:francesco.colasanto@tum.de (Matteo Focardi)DiMaI U. Dini, Universit `a di Firenze, Florenc...