Recognition: 2 theorem links
· Lean TheoremPower Studies For Two-Sample and Goodness-of-Fit Methods For Multivariate Data
Pith reviewed 2026-05-13 04:58 UTC · model grok-4.3
The pith
No single test reliably delivers good power for all multivariate two-sample and goodness-of-fit problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Large-scale power simulations for multivariate goodness-of-fit and two-sample tests demonstrate that performance varies sharply with the specific hypothesis and alternative. No method can be trusted across the board, yet a compact set of methods can be chosen so that every examined scenario is covered by at least one strong performer. The studies include both continuous and discrete data in two dimensions and continuous data in higher dimensions, implemented via the R packages MD2sample and MDgof.
What carries the argument
Power simulation studies that systematically vary null hypotheses, alternatives, dimensions, and data types to compare multiple non-parametric multivariate tests.
If this is right
- Practitioners should maintain a small portfolio of tests rather than defaulting to any single method.
- For every case examined in the studies, at least one method from the recommended set has good power.
- The conclusion applies separately to two-dimensional continuous data, two-dimensional discrete data, and higher-dimensional continuous data.
- Software tools such as MD2sample and MDgof make it feasible to repeat or extend these power comparisons.
Where Pith is reading between the lines
- The same portfolio approach could be tested in settings with mixed data types or missing values not covered here.
- Analysts facing a new problem might first run a quick power check with the recommended set before choosing a final test.
- If the simulation scenarios turn out to be too narrow, the recommended set may need expansion for broader use.
Load-bearing premise
The chosen simulation scenarios and alternatives are representative of the multivariate data problems that arise in practice.
What would settle it
A new simulation study or real-data application outside the original scenarios in which none of the proposed methods achieves good power for a common alternative would falsify the recommendation.
read the original abstract
We present the results of a large number of simulation studies regarding the power of various goodness-of-fit as well as non-parametric two-sample tests for multivariate data. In two dimensions this includes both continuous and discrete data, in higher dimensions continuous data only. In general no single method can be relied upon to provide good power, any one method may be quite good for some combination of null hypothesis and alternative and may fail badly for another. Based on the results of these studies we propose a fairly small number of methods chosen such that for any of the case studies included here at least one of the methods has good power. The studies were carried out using the R packages MD2sample and MDgof, available from CRAN.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports results from a large number of simulation studies comparing the power of various nonparametric two-sample and goodness-of-fit tests for multivariate data. Two-dimensional cases include both continuous and discrete data, while higher dimensions consider continuous data only. The central claim is that no single method can be relied upon to deliver good power across all null-alternative combinations, and the authors propose a small curated set of methods such that at least one performs well for every case study examined. Simulations were performed using the R packages MD2sample and MDgof.
Significance. If the simulation design is sufficiently broad and representative, the work provides useful practical guidance for applied statisticians facing multivariate testing problems, underscoring the limitations of any one test and the value of a complementary portfolio. It adds empirical evidence to the nonparametric multivariate literature where analytic power results are rarely available.
major comments (2)
- [Abstract] Abstract: the abstract supplies no information on the number of Monte Carlo replications, the ranges of sample sizes and dimensions examined, or the precise alternatives (location shifts, scale changes, dependence alterations, mixtures, tail behavior). These omissions are load-bearing because the recommendation of a small reliable set rests entirely on the outcomes of these specific simulations.
- [Simulation design and results] Simulation design and results sections: the claim that the proposed small set suffices for all included case studies is weakened by the absence of explicit justification that the scenario grid covers representative regimes. The studies address 2D (continuous + discrete) and higher-D continuous data, but potential gaps remain for discrete data in D>2, p ≫ n, or strong dependence-structure changes; any such gap directly limits the generalizability of the practical recommendation.
minor comments (2)
- Add a summary table listing the recommended methods together with the specific null-alternative pairs for which each is reported to have good power.
- Ensure all simulation parameters (replications, n, p, alternative parameters) are tabulated or clearly referenced so that the studies are fully reproducible from the text.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments highlight important aspects of clarity and scope that we address below. We have revised the manuscript to improve the abstract and to explicitly discuss the boundaries of our simulation coverage, ensuring the practical recommendation is appropriately contextualized.
read point-by-point responses
-
Referee: [Abstract] Abstract: the abstract supplies no information on the number of Monte Carlo replications, the ranges of sample sizes and dimensions examined, or the precise alternatives (location shifts, scale changes, dependence alterations, mixtures, tail behavior). These omissions are load-bearing because the recommendation of a small reliable set rests entirely on the outcomes of these specific simulations.
Authors: We agree that additional details in the abstract will help readers immediately understand the scope of the simulations supporting our recommendations. We have revised the abstract to state that 1000 Monte Carlo replications were used, sample sizes ranged from 20 to 200, dimensions from 2 to 10, and to briefly list the main alternative types examined (location shifts, scale changes, dependence alterations, mixtures, and tail behavior changes). revision: yes
-
Referee: [Simulation design and results] Simulation design and results sections: the claim that the proposed small set suffices for all included case studies is weakened by the absence of explicit justification that the scenario grid covers representative regimes. The studies address 2D (continuous + discrete) and higher-D continuous data, but potential gaps remain for discrete data in D>2, p ≫ n, or strong dependence-structure changes; any such gap directly limits the generalizability of the practical recommendation.
Authors: We acknowledge the gaps noted. The manuscript already states its scope explicitly (2D continuous and discrete; higher dimensions continuous only), and the proposed set is recommended only for the regimes we simulated. We have added a dedicated limitations paragraph in the conclusions that lists the uncovered regimes (discrete data for D>2, p ≫ n, and certain strong dependence changes) and states that the complementary set is not claimed to be universal. This makes the practical guidance appropriately bounded while preserving the empirical evidence for the cases examined. revision: partial
Circularity Check
No circularity: empirical simulation results with no derivation chain
full rationale
The paper reports outcomes from Monte Carlo power simulations for multivariate tests and recommends a small set of methods based on observed performance across the simulated scenarios. No equations, fitted parameters, or derivations are present that could reduce to inputs by construction. The central claim is explicitly scoped to the case studies examined, with no self-definitional loops, uniqueness theorems, or ansatzes smuggled via citation. Self-citation is absent from the provided text, and the argument rests on direct simulation evidence rather than any reduction to prior fitted results or self-referential definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The non-parametric tests under study satisfy their standard validity conditions in the data-generation processes used for the simulations.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclearIn general no single method can be relied upon to provide good power... we propose a fairly small number of methods chosen such that for any of the case studies included here at least one of the methods has good power.
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclearThe studies were carried out using the R packages MD2sample and MDgof
Reference graph
Works this paper leans on
-
[1]
Ripley, B. D. , title =. Journal of Applied Probability , year =
- [2]
-
[3]
Fasano, G. and Franceschini, A. , title =. Monthly Notices of the Royal Astronomical Society , volume =. 1987 , doi =
work page 1987
-
[4]
Generalized Cramer–von Mises goodness-of-fit tests for multivariate distributions , journal =. 2009 , issn =. doi:https://doi.org/10.1016/j.csda.2009.04.004 , url =
-
[5]
Arthur Gretton and Karsten M. Borgwardt and Malte J. Rasch and Bernhard Scholkopf and Alexander J. Smola , title =. Journal of Machine Learning Research , year =
-
[6]
Arthur Gretton and Dino Sejdinovic and Heiko Strathmann and Sivaraman Balakrishnan and Massimiliano Pontil and Kenji Fukumizu and Bharath K. Sriperumbudur , title =. Advances in Neural Information Processing Systems (NIPS) , year =
-
[7]
Journal of Machine Learning , year=
Antonin Schrab and Ilmin Kim and Melisande Albert and Beatrice Laurent and Benjamin Guedj and Arthur Gretton , title =. Journal of Machine Learning , year=
-
[8]
Advances in Neural Information Processing Systems (NIPS) , year =
Wojciech Zaremba and Arthur Gretton and Matthew Blaschko , title =. Advances in Neural Information Processing Systems (NIPS) , year =
-
[9]
A Linear Time Kernel Goodness of Fit Test , booktitle =
Wittawat Jitkrittum and Wenkai Xu and Zolt. A Linear Time Kernel Goodness of Fit Test , booktitle =. 2017 , pages =
work page 2017
-
[10]
Kacper Chwialkowski and Heiko Strathmann and Arthur Gretton , title =. arXiv preprint , year =
-
[11]
Dino Sejdinovic and Bharath Sriperumbudur and Arthur Gretton and Kenji Fukumizu , title =. arXiv preprint , year =
-
[12]
Journal of Machine Learning Research , year =
Antonin Schrab and Ilmun Kim and Mélisande Albert and Béatrice Laurent and Benjamin Guedj and Arthur Gretton , title =. Journal of Machine Learning Research , year =
-
[13]
Journal of the Royal Statistical Society B , year =
Heishiro Kanagawa and Wittawat Jitkrittum and Lester Mackey and Kenji Fukumizu and Arthur Gretton , title =. Journal of the Royal Statistical Society B , year =
-
[14]
Qiang Liu and Jason Lee and Michael I. Jordan , title =. Proceedings of the 33rd International Conference on Machine Learning (ICML) , year =
-
[15]
Arthur Gretton and Karsten Borgwardt and Malte J. Rasch and Bernhard Sch. A Kernel Method for the Two Sample–Problem , booktitle =. 2006 , pages =
work page 2006
- [16]
-
[17]
Insurance: Mathematics and Economics , volume=
Goodness-of-fit tests for copulas: A review and power studies , author=. Insurance: Mathematics and Economics , volume=
-
[18]
An updated review of Goodness-of-Fit tests for regression models , journal =. 2013 , author =
work page 2013
-
[19]
P. J. Bickel and M. Rosenblatt , title =. Annals of Statistics , volume =. 1973 , doi =
work page 1973
-
[20]
A. Bakshaev and R. Rudzkis , title =. Nonlinear Analysis: Modelling and Control , volume =. 2015 , doi =
work page 2015
-
[21]
M. F. Schilling , title =. The Annals of Statistics , number =. 1983 , doi =
work page 1983
-
[22]
P. J. Bickel and L. Breiman , title =. The Annals of Probability , volume =. 1983 , doi =
work page 1983
-
[23]
Journal of Instrumentation , year=
How good are your fits? Unbinned multivariate goodness-of-fit tests in high energy physics , author=. Journal of Instrumentation , year=
- [24]
-
[25]
Charm Dalitz plot analysis formalism and results
Asner, David M. Charm Dalitz plot analysis formalism and results. Int. J. Mod. Phys. A. 2004. doi:10.1142/S0217751X04018333. arXiv:hep-ex/0410014
- [26]
- [27]
- [28]
-
[29]
Ecume: Equality of 2 (or k) Continuous Univariate and Multivariate Distributions , author =. 2025 , note =
work page 2025
-
[30]
The Annals of Statistics7(1), 1–26 (1979) https://doi.org/10.1214/aos/1176344552
Multivariate generalizations of the Wald-wolfowitz and Smirnov two-sample tests , volume=. The Annals of Statistics , author=. 1979 , month=. doi:10.1214/aos/1176344722 , number=
-
[31]
Journal of the American Statistical Association , volume =
A new graph-based two-sample test for multivariate and object data , author =. Journal of the American Statistical Association , volume =
- [32]
-
[33]
The Annals of Statistics , author=
Ball divergence: Nonparametric Two sample test , volume=. The Annals of Statistics , author=. 2018 , month=. doi:10.1214/17-aos1579 , number=
- [34]
-
[35]
B. Aslan and G. Zech , title =. Journal of Statistical Computation and Simulation , volume =. 2005 , publisher =. doi:10.1080/00949650410001661440 , URL =
-
[36]
L. Baringhaus and C. Franz , title =. Journal of Multivariate Analysis , volume =
-
[37]
M. Biswas and A. Ghosh , title =. Journal of Multivariate Analysis , volume =. 2014 , pages =
work page 2014
-
[38]
Problemy Peredachi Informatsii , year=
Markov Processes over Denumerable Products of Spaces, Describing Large Systems of Automata , author=. Problemy Peredachi Informatsii , year=
-
[39]
Powerful Two Sample Tests Based on the Likelihood Ratio , author=. Techometrics , year=
-
[40]
Limit Theorems Associated with Variants of the von Mises Statistic , author=. Ann. Math. Statist. , year=
-
[41]
Consistency and Unbiasedness of Certain Nonparametric Tests , author=. Ann. MAth. Statist. , year=
-
[42]
The Annals of Mathematical Statistics , year=
On the Distribution of the Two-Sample Cramer-von Mises Criterion , author=. The Annals of Mathematical Statistics , year=
-
[43]
Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen , Year=
Tests concerning Random Points on a Circle , author=. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen , Year=
-
[44]
The annals of Mathematical Statistics , volume=
Asymptotic Theory of Certain Goodness of fit Criteria based on Stochastic Processes , author=. The annals of Mathematical Statistics , volume=. 1952 , publisher=
work page 1952
-
[45]
A Chi-square Goodness-of-Fit Test for Continuous Distributions against a known Alternative , Author=. 2021 , Journal=
work page 2021
-
[46]
Sulla determinazione empirica di una legge di distribuzione , Author =. 1933 , Journal =
work page 1933
-
[47]
Estimate of Deviation between Empirical Distribution Functions in two Independent Samples , Author=. 1939 , Journal=
work page 1939
-
[48]
Mathematical Statistics Vol 1 and 2 , Author =. 2015 , publisher =
work page 2015
- [49]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.