Two-Sample Homogeneity Test via Entropic Optimal Transport
Pith reviewed 2026-06-27 12:26 UTC · model grok-4.3
The pith
The squared L2 distance between empirical entropic optimal transport maps from a uniform reference provides a consistent test for equality of two distributions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that the population map discrepancy is identifiable, derive a functional CLT for the empirical map difference under the null (which yields a Gaussian quadratic-form limit for the test statistic), prove consistency against fixed alternatives, characterize local asymptotic power under contiguous alternatives, and validate a weighted multiplier bootstrap for calibrating the non-pivotal null distribution.
What carries the argument
The entropic optimal transport map from the uniform law on the unit ball to each target distribution; the squared L2 norm of the difference between two such maps forms the test statistic.
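In symbols, these objects can be sketched as follows. The notation ($\mu$, $\pi_\varepsilon^{P}$, $T_\varepsilon^{P}$, $D_\varepsilon$, $\widehat{D}_{n,m}$), the quadratic cost, and the barycentric-projection form of the map are assumptions made for illustration; this page does not reproduce the paper's own definitions.

```latex
% Assumed notation: \mu is the uniform law on the unit ball B,
% \varepsilon > 0 is the fixed regularization parameter, P is a target law.
\pi_\varepsilon^{P}
  = \operatorname*{arg\,min}_{\pi \in \Pi(\mu, P)}
    \int \tfrac{1}{2}\,\|u - x\|^{2} \, d\pi(u, x)
    + \varepsilon \, \mathrm{KL}\!\left(\pi \,\middle\|\, \mu \otimes P\right),
\qquad
T_\varepsilon^{P}(u) = \mathbb{E}_{\pi_\varepsilon^{P}}\!\left[ X \mid U = u \right].

% Population discrepancy and its empirical counterpart (the test statistic),
% where \widehat{T}_{n}^{X} and \widehat{T}_{m}^{Y} are the maps computed from
% the two samples of sizes n and m.
D_\varepsilon(P, Q)
  = \int_{B} \bigl\| T_\varepsilon^{P}(u) - T_\varepsilon^{Q}(u) \bigr\|^{2} \, d\mu(u),
\qquad
\widehat{D}_{n,m}
  = \int_{B} \bigl\| \widehat{T}_{n}^{X}(u) - \widehat{T}_{m}^{Y}(u) \bigr\|^{2} \, d\mu(u).
```

Under this reading, identifiability is the statement that $D_\varepsilon(P, Q) = 0$ only when $P = Q$.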
Load-bearing premise
The entropic regularization parameter is fixed in advance and the reference measure is the uniform distribution on the unit ball, with the data distributions admitting well-behaved EOT maps.
What would settle it
Empirical rejection rates under the null that deviate substantially from the nominal level, or failure to reject with high probability when the two distributions are known to differ by a fixed amount.
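The check described above amounts to a Monte Carlo estimate of rejection rates. The following is a minimal sketch of such a harness, not the paper's simulation design: `rejection_rate` and `mean_shift_pvalue` are hypothetical names, and the z-test is only a stand-in for the EOT-map test so that the example runs on its own.

```python
import math
import numpy as np

def rejection_rate(pvalue_fn, draw_x, draw_y, n_rep=500, alpha=0.05, rng=None):
    """Fraction of Monte Carlo replications in which the test rejects at level alpha."""
    rng = np.random.default_rng() if rng is None else rng
    rejections = 0
    for _ in range(n_rep):
        # Draw a fresh pair of samples and apply the test.
        if pvalue_fn(draw_x(rng), draw_y(rng)) <= alpha:
            rejections += 1
    return rejections / n_rep

def mean_shift_pvalue(x, y):
    """Stand-in test (NOT the EOT-map test): two-sided z-test on the first coordinate."""
    diff = x[:, 0].mean() - y[:, 0].mean()
    se = math.sqrt(x[:, 0].var(ddof=1) / len(x) + y[:, 0].var(ddof=1) / len(y))
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(diff) / se / math.sqrt(2.0))

def gaussian_sampler(n, d, shift=0.0):
    """Return a function that draws n points from N(shift, I_d)."""
    return lambda rng: rng.standard_normal((n, d)) + shift

rng = np.random.default_rng(1)
# Size: both samples from the same law, so the rate should sit near alpha.
size = rejection_rate(mean_shift_pvalue, gaussian_sampler(100, 2),
                      gaussian_sampler(100, 2), rng=rng)
# Power: a fixed location shift, so the rate should be close to one.
power = rejection_rate(mean_shift_pvalue, gaussian_sampler(100, 2),
                       gaussian_sampler(100, 2, shift=1.0), rng=rng)
print(size, power)
```

Swapping `mean_shift_pvalue` for a function that returns the bootstrap p-value of the EOT-map test would give the size and power estimates that the criterion refers to.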
Original abstract
This paper proposes a two-sample homogeneity test based on entropic optimal transport (EOT) maps from a common reference distribution -- the uniform law on the unit ball. The test statistic is the squared $L^2$-distance between the two empirical EOT maps. For fixed entropic regularization parameter, we prove that the population map discrepancy is identifiable, derive a functional central limit theorem for the empirical map difference under the null, and establish the Gaussian quadratic-form null limit. We also prove consistency against fixed alternatives and characterize local asymptotic power under contiguous alternatives. A weighted multiplier bootstrap is proposed to calibrate the non-pivotal null distribution, and its validity is established. Extensive simulations demonstrate that the proposed EOT-map test has reliable finite-sample size control and exhibits competitive power compared with other existing methods. The method is particularly powerful for location alternatives and, beyond a single scalar discrepancy, it provides additional diagnostic information on how the two distributions differ. Finally, a real data application concludes the paper.
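The test statistic described in the abstract can be sketched numerically as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the quadratic cost, the log-domain Sinkhorn solver, the barycentric-projection form of the map, the Monte Carlo reference sample, and the names `sample_unit_ball`, `entropic_map`, and `eot_map_statistic` are all choices made here.

```python
import numpy as np

def sample_unit_ball(n, d, rng):
    """Draw n points uniformly from the unit ball in R^d."""
    g = rng.standard_normal((n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)   # uniform direction on the sphere
    r = rng.random(n) ** (1.0 / d)                  # radius with density proportional to r^(d-1)
    return g * r[:, None]

def entropic_map(ref, data, eps, n_iter=300):
    """Entropic map evaluated at the reference points (assumed barycentric projection)."""
    # Quadratic cost between every reference point and every data point.
    cost = 0.5 * ((ref[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    n, m = cost.shape
    log_a, log_b = -np.log(n), -np.log(m)           # uniform weights on both sides
    g = np.zeros(m)
    for _ in range(n_iter):
        # Log-domain Sinkhorn updates of the two dual potentials.
        f = -eps * np.logaddexp.reduce((g[None, :] - cost) / eps + log_b, axis=1)
        g = -eps * np.logaddexp.reduce((f[:, None] - cost) / eps + log_a, axis=0)
    # Conditional weights of the data points given each reference point.
    log_w = (g[None, :] - cost) / eps
    log_w -= np.logaddexp.reduce(log_w, axis=1, keepdims=True)
    return np.exp(log_w) @ data

def eot_map_statistic(x, y, ref, eps):
    """Monte Carlo approximation of the squared L2 distance between the two maps."""
    diff = entropic_map(ref, x, eps) - entropic_map(ref, y, eps)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

rng = np.random.default_rng(0)
ref = sample_unit_ball(200, 2, rng)                 # shared reference sample
x = rng.standard_normal((150, 2))
y = rng.standard_normal((150, 2)) + 0.5             # location alternative
print(eot_map_statistic(x, y, ref, eps=0.5))
```

Because both maps are evaluated at the same reference points, the pointwise differences `entropic_map(ref, x, eps) - entropic_map(ref, y, eps)` are also available, which is one way to read the abstract's remark about diagnostic information beyond a single scalar.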
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a two-sample homogeneity test using the squared L² distance between empirical entropic optimal transport (EOT) maps from a fixed common reference (uniform on the unit ball). For fixed regularization parameter, it claims to prove identifiability of the population map discrepancy, a functional CLT for the empirical map difference under the null yielding a Gaussian quadratic-form limit, consistency against fixed alternatives, local asymptotic power under contiguous alternatives, and validity of a weighted multiplier bootstrap for the non-pivotal limit. Simulations and a real-data example are included.
Significance. If the regularity conditions hold, the method supplies both a calibrated test statistic and diagnostic information on distributional differences beyond a scalar p-value, with competitive power for location shifts. The combination of identifiability, FCLT, bootstrap validity, and local-power characterization is a substantive contribution to nonparametric two-sample testing.
major comments (2)
- [FCLT theorem statement and proof] The functional CLT (and hence the Gaussian quadratic-form null limit and bootstrap validity) rests on tightness of the centered empirical EOT-map process in the relevant function space. The manuscript must explicitly state the moment/support conditions on P and Q (relative to the fixed uniform reference on the unit ball) that guarantee this tightness; without them the central limit theorem does not follow from standard empirical-process arguments.
- [Identifiability and consistency sections] The identifiability claim and consistency result are stated for the population map discrepancy under the fixed reference measure. The paper should clarify whether the unit-ball support of the reference is essential or whether the results extend to other compactly supported references without altering the test statistic's form.
minor comments (1)
- [Notation and bootstrap section] Notation for the EOT map functional and the precise definition of the weighted multiplier bootstrap weights should be introduced earlier and used consistently.
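To make concrete what a definition of the bootstrap weights would have to pin down, here is a minimal sketch of one common weighted bootstrap. It is an assumption-laden illustration, not the paper's procedure: the normalized exponential weights, the $nm/(n+m)$ scaling, the centering at the unweighted maps, the Sinkhorn solver, and the names `entropic_map` and `bootstrap_pvalue` are all chosen here.

```python
import numpy as np

def entropic_map(ref, data, eps, weights=None, n_iter=300):
    """Entropic map at the reference points for a (possibly reweighted) sample."""
    m = data.shape[0]
    w = np.full(m, 1.0 / m) if weights is None else weights / weights.sum()
    log_b = np.log(w)                               # log of the data-point weights
    log_a = -np.log(ref.shape[0])                   # uniform weights on the reference points
    cost = 0.5 * ((ref[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    g = np.zeros(m)
    for _ in range(n_iter):
        # Log-domain Sinkhorn updates of the two dual potentials.
        f = -eps * np.logaddexp.reduce((g - cost) / eps + log_b, axis=1)
        g = -eps * np.logaddexp.reduce((f[:, None] - cost) / eps + log_a, axis=0)
    log_w = (g - cost) / eps + log_b
    log_w -= np.logaddexp.reduce(log_w, axis=1, keepdims=True)
    return np.exp(log_w) @ data

def bootstrap_pvalue(x, y, ref, eps, n_boot=200, rng=None):
    """Return the scaled statistic and a weighted-bootstrap p-value (assumed scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = len(x), len(y)
    scale = n * m / (n + m)                         # assumed normalization
    tx, ty = entropic_map(ref, x, eps), entropic_map(ref, y, eps)
    observed = scale * np.mean(np.sum((tx - ty) ** 2, axis=1))
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Independent positive multipliers for each sample (assumed Exp(1) weights).
        dx = entropic_map(ref, x, eps, rng.exponential(size=n)) - tx
        dy = entropic_map(ref, y, eps, rng.exponential(size=m)) - ty
        # Centered bootstrap difference, which mimics the null fluctuation.
        boot[b] = scale * np.mean(np.sum((dx - dy) ** 2, axis=1))
    return float(observed), float(np.mean(boot >= observed))
```

A call such as `bootstrap_pvalue(x, y, ref, eps=0.5, n_boot=200, rng=np.random.default_rng(0))` returns the scaled statistic and the fraction of bootstrap draws at least as large. Centering each reweighted map at its unweighted counterpart is what lets the bootstrap draws approximate the null law even when the data come from an alternative.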
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the presentation.
- Referee: [FCLT theorem statement and proof] The functional CLT (and hence the Gaussian quadratic-form null limit and bootstrap validity) rests on tightness of the centered empirical EOT-map process in the relevant function space. The manuscript must explicitly state the moment/support conditions on P and Q (relative to the fixed uniform reference on the unit ball) that guarantee this tightness; without them the central limit theorem does not follow from standard empirical-process arguments.
  Authors: We agree that the conditions guaranteeing tightness of the empirical EOT-map process must be stated explicitly for the FCLT to be fully rigorous. The current manuscript assumes P and Q admit densities with respect to Lebesgue measure and possess finite moments of sufficiently high order (to control the Lipschitz constants of the EOT potentials), but these are not collected in a single assumption. In the revision we will add an explicit Assumption (new Assumption 2.2) stating that P and Q are supported on a fixed compact convex set containing the unit ball in its interior and have finite (2+δ)-moments for some δ>0; we will then verify in the proof of Theorem 3.1 that these conditions imply the required entropy-integrability condition for tightness in the Hölder space used for the functional CLT.
  revision: yes
- Referee: [Identifiability and consistency sections] The identifiability claim and consistency result are stated for the population map discrepancy under the fixed reference measure. The paper should clarify whether the unit-ball support of the reference is essential or whether the results extend to other compactly supported references without altering the test statistic's form.
  Authors: The unit ball is selected for computational convenience and to ensure an explicit characterization of the reference measure, but it is not essential to the theoretical results. Identifiability of the map discrepancy follows from the strict convexity of the entropic transport cost and the uniqueness of the Brenier potential for any reference measure that is absolutely continuous with respect to Lebesgue measure on a compact convex set with nonempty interior; the same argument applies verbatim to consistency. We will insert a short remark after Definition 2.1 clarifying that all results in Sections 3 and 4 continue to hold, with no change to the form of the test statistic, when the reference is replaced by any other fixed compactly supported probability measure satisfying the same absolute-continuity condition.
  revision: yes
Circularity Check
No circularity: standard empirical-process arguments applied to EOT map functional
full rationale
The paper states that for fixed entropic regularization it proves identifiability of the population map discrepancy, derives a functional CLT for the empirical map difference under the null, obtains a Gaussian quadratic-form limit, proves consistency, and characterizes local power, all under the maintained assumption that the data-generating distributions admit well-defined EOT maps from the fixed uniform reference on the unit ball. These are presented as consequences of standard tightness and convergence conditions in the relevant function space rather than as quantities fitted or defined in terms of the target test statistic. No self-definitional loop, fitted-input prediction, or load-bearing self-citation chain appears in the derivation chain; the central claims rest on external empirical-process theory applied to the EOT functional.
Axiom & Free-Parameter Ledger
free parameters (1)
- entropic regularization parameter
axioms (2)
- domain assumption: The EOT map functional satisfies the conditions for a functional central limit theorem under the null (tightness, finite-dimensional convergence).
- domain assumption: The population map discrepancy is identifiable when the two distributions differ.
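The first axiom is what produces the Gaussian quadratic-form null limit. As a sketch of the reasoning, suppose the scaled map difference converges weakly in $L^2(\mu;\mathbb{R}^d)$ to a centered Gaussian element; the continuous mapping theorem and the Karhunen-Loève expansion then give the form below. The $nm/(n+m)$ scaling and the symbols $\widehat{T}_{n}^{X}$, $\widehat{T}_{m}^{Y}$, $\widehat{D}_{n,m}$, $\mathbb{G}$, $\lambda_j$, $Z_j$ are assumptions for illustration, not notation taken from the paper.

```latex
% Assumed functional CLT under H_0: P = Q, with \mu the uniform law on the unit ball.
\sqrt{\frac{nm}{n+m}} \, \bigl( \widehat{T}_{n}^{X} - \widehat{T}_{m}^{Y} \bigr)
  \rightsquigarrow \mathbb{G}
  \quad \text{in } L^{2}(\mu; \mathbb{R}^{d}),
\qquad \mathbb{G} \text{ centered Gaussian.}

% Continuous mapping plus the Karhunen-Loeve expansion of \mathbb{G}:
\frac{nm}{n+m} \, \widehat{D}_{n,m}
  \rightsquigarrow \| \mathbb{G} \|_{L^{2}(\mu)}^{2}
  = \sum_{j \ge 1} \lambda_{j} Z_{j}^{2},
\qquad Z_{j} \overset{\text{i.i.d.}}{\sim} N(0, 1),
```

Here the $\lambda_j \ge 0$ are the eigenvalues of the covariance operator of $\mathbb{G}$. They depend on the unknown common distribution, which is why the limit is non-pivotal and a bootstrap is used for calibration.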