pith. machine review for the scientific record.

arxiv: 2605.03886 · v1 · submitted 2026-05-05 · 📊 stat.CO · math.ST · stat.OT · stat.TH


More Permutations Do Not Always Increase Power: Non-monotonicity in Monte Carlo Permutation Tests


Pith reviewed 2026-05-08 17:51 UTC · model grok-4.3

classification 📊 stat.CO · math.ST · stat.OT · stat.TH
keywords Monte Carlo permutation tests · statistical power · non-monotonicity · discreteness · permutation tests · Monte Carlo methods · saw-toothed structure

The pith

Monte Carlo permutation test power can decrease with more sampled permutations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Monte Carlo permutation tests provide model-free statistical inference by comparing an observed test statistic against values obtained from random permutations of the data. A standard practical assumption is that increasing the number of Monte Carlo samples will steadily improve the test's ability to detect true effects. The paper demonstrates that this assumption fails because the power function takes a saw-toothed shape induced by the discrete nature of the approximated null distribution. As a direct result, there exist increases in the Monte Carlo budget that actually lower power, and such drops recur infinitely often. Readers should care because the finding bears directly on how many permutations are chosen in routine applications of these tests.
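To make the comparison concrete, here is a minimal Python sketch of a one-sided Monte Carlo permutation test. It assumes a two-sample difference-in-means statistic, permutations drawn independently and uniformly at random, and the (k+1)/(B+1) p-value mentioned in the referee summary below. The function name `mc_permutation_pvalue` and the toy data are hypothetical, not taken from the paper.

```python
import random
from statistics import mean


def mc_permutation_pvalue(x, y, num_perms, rng):
    """Hypothetical helper: Monte Carlo permutation p-value (k + 1) / (B + 1)
    for a one-sided difference-in-means test (large values favour H1)."""
    pooled = list(x) + list(y)
    n_x = len(x)
    t_obs = mean(x) - mean(y)
    k = 0  # number of sampled permuted statistics at least as large as t_obs
    for _ in range(num_perms):
        rng.shuffle(pooled)  # one uniformly random permutation of the pooled data
        t_perm = mean(pooled[:n_x]) - mean(pooled[n_x:])
        if t_perm >= t_obs:
            k += 1
    # The "+1" counts the observed statistic itself, which keeps the test valid.
    return (k + 1) / (num_perms + 1)


rng = random.Random(0)
x = [9, 7, 8]
y = [1, 3, 2, 4]
p_value = mc_permutation_pvalue(x, y, num_perms=19, rng=rng)
print(p_value, p_value <= 0.05)
```

Because `p_value` can only take the values 1/(B+1), 2/(B+1), ..., 1, the set of attainable p-values, and hence the rejection region, changes with B.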

Core claim

The power of a Monte Carlo permutation test is not a monotonically increasing function of the number of sampled permutations. Because the Monte Carlo estimate of the permutation distribution is discrete, the power function exhibits a saw-toothed pattern; the paper proves that this pattern produces infinitely many values of the Monte Carlo sample size at which power strictly decreases.

What carries the argument

The saw-toothed structure of the power function induced by the discreteness of the Monte Carlo approximation to the permutation distribution.
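A small sketch of where the discreteness enters, assuming the (k+1)/(B+1) p-value: the test rejects at level α exactly when k ≤ c_B = ⌊α(B+1)⌋ − 1, so the critical count c_B is a step function of B. The helper `critical_count` is hypothetical, and α is kept as an exact `Fraction` so that floating-point error cannot shift the step points.

```python
import math
from fractions import Fraction


def critical_count(num_perms: int, alpha: Fraction) -> int:
    """Largest k with (k + 1) / (B + 1) <= alpha, i.e. floor(alpha * (B + 1)) - 1.

    A value of -1 means no k qualifies, so the test can never reject.
    """
    return math.floor(alpha * (num_perms + 1)) - 1


alpha = Fraction(1, 20)  # alpha = 0.05, kept exact
# Budgets B at which the critical count steps up relative to B - 1.
jumps = [b for b in range(1, 101)
         if critical_count(b, alpha) > critical_count(b - 1, alpha)]
print(jumps)  # [19, 39, 59, 79, 99]
```

For α = 0.05 the test cannot reject at all while B < 19 (c_B = −1), and c_B only steps up when B + 1 is a multiple of 20. This is one reason budgets such as 99 or 999 are popular.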

If this is right

  • Power can decrease when the Monte Carlo budget is increased from one integer to the next.
  • Such decreases occur for infinitely many budget values.
  • The non-monotonicity is explained solely by the discrete support of the estimated null distribution.
  • Standard advice to use as many permutations as computationally feasible does not guarantee higher power.
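The first two bullets can be checked numerically under a simple model. Conditional on the data, each uniformly sampled permutation beats the observed statistic independently with probability p, the exact permutation p-value, so k ~ Binomial(B, p) and the rejection probability is P(Binomial(B, p) ≤ c_B). The sketch below assumes this model, a hypothetical value p = 1/50, and α = 0.05. It is an illustration, not the paper's proof.

```python
import math
from fractions import Fraction


def critical_count(num_perms, alpha):
    # Largest k with (k + 1) / (B + 1) <= alpha; -1 means "never reject".
    return math.floor(alpha * (num_perms + 1)) - 1


def conditional_power(num_perms, p, alpha):
    """P(reject | data) = P(Binomial(B, p) <= c_B), computed exactly."""
    c = critical_count(num_perms, alpha)
    return sum(math.comb(num_perms, k) * p**k * (1 - p) ** (num_perms - k)
               for k in range(c + 1))


alpha = Fraction(1, 20)
p = Fraction(1, 50)  # hypothetical exact permutation p-value of the observed data
powers = {b: conditional_power(b, p, alpha) for b in range(1, 201)}

# Budgets B where going from B to B + 1 permutations strictly lowers power.
decreases = [b for b in range(1, 200) if powers[b + 1] < powers[b]]
print(len(decreases), decreases[:5])  # 172 [19, 20, 21, 22, 23]
```

Between steps of c_B, one extra draw can only push k above an unchanged threshold, so for any p strictly between 0 and 1 the conditional power strictly decreases. It can only rise at the budgets where c_B steps up. That is the saw-tooth in miniature.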

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Routine software defaults that simply maximize the number of permutations may miss locally optimal choices.
  • Similar saw-tooth behavior may appear in other Monte Carlo tests that rely on discrete resampling distributions.
  • Adaptive rules that stop sampling once power begins to decline could be developed from the same structural argument.

Load-bearing premise

The power function of the Monte Carlo permutation test exhibits a saw-toothed shape induced by the discreteness of the permutation distribution.

What would settle it

A direct computation, for a fixed data set, test statistic, and nominal level, that shows the rejection probability at B Monte Carlo permutations exceeds the rejection probability at B+1 permutations for some B.
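A sketch of such a computation for a toy data set, assuming the difference-in-means statistic, α = 0.05, and the (k+1)/(B+1) p-value. Enumerating the C(7, 3) = 35 group splits gives the exact permutation p-value (each split corresponds to the same number, 3!·4!, of permutations), and exact rational arithmetic then gives the rejection probabilities at B = 19 and B = 20. The data and the helper names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import combinations


def diff_in_means(values, group_x):
    # Difference in means between the indices in group_x and the rest, exactly.
    x = [values[i] for i in group_x]
    y = [values[i] for i in range(len(values)) if i not in group_x]
    return Fraction(sum(x), len(x)) - Fraction(sum(y), len(y))


def exact_permutation_pvalue(x, y):
    # Fraction of all group splits whose statistic is >= the observed one.
    values = list(x) + list(y)
    observed = diff_in_means(values, set(range(len(x))))
    splits = list(combinations(range(len(values)), len(x)))
    at_least = sum(1 for s in splits if diff_in_means(values, set(s)) >= observed)
    return Fraction(at_least, len(splits))


def rejection_probability(num_perms, p, alpha):
    # P(Binomial(B, p) <= c_B) with c_B = floor(alpha * (B + 1)) - 1.
    c = math.floor(alpha * (num_perms + 1)) - 1
    return sum(math.comb(num_perms, k) * p**k * (1 - p) ** (num_perms - k)
               for k in range(c + 1))


x, y = [9, 7, 8], [1, 3, 2, 4]
alpha = Fraction(1, 20)
p = exact_permutation_pvalue(x, y)
r19 = rejection_probability(19, p, alpha)
r20 = rejection_probability(20, p, alpha)
print(p, r19 > r20)  # 1/35 True
```

Here the observed split is the unique most extreme one, so p = 1/35 and c_19 = c_20 = 0. The rejection probabilities are therefore (34/35)^19 and (34/35)^20: adding the twentieth permutation lowers power.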

Figures

Figures reproduced from arXiv: 2605.03886 by Antonin Schrab, Ilmun Kim, Seongchan Lee, Suman Cha.

Figure 1. Unconditional power curves for the closed-form Bernoulli example at level …

Figure 2. Estimated power curves for four permutation test statistics at level …

Figure 3. Unconditional power Pow(B) for the closed-form Bernoulli example at level α = 0.05 with p0 = 0 and p1 = 0.16. The top and bottom panels show Case 1 (n = 15, B ≤ 5 × 10^6) and Case 2 (n = 25, B ≤ 10^4). Solid and dashed lines represent Pow(B) and Pow_exact, respectively.
Original abstract

Monte Carlo permutation tests are a cornerstone of valid, model-free statistical inference. A widely held practical intuition is that increasing the number of sampled permutations improves test performance, in particular that statistical power tends to increase with the Monte Carlo budget. In this paper, we show that these intuitions are false in general. Leveraging the saw-toothed structure of power arising from distributional discreteness, we provide a simple structural explanation for why power can decrease as the number of sampled permutations increases, and we prove that such decreases occur infinitely often as the Monte Carlo budget grows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that the power of Monte Carlo permutation tests is non-monotonic in the Monte Carlo sample size B. It attributes this to the saw-toothed structure induced by the discrete support of the p-value, which takes the form (k+1)/(B+1), where k is the number of the B sampled permuted statistics at least as extreme as the observed one. It proves that power decreases occur infinitely often as B grows by exhibiting an alternative distribution whose test-statistic atoms align with the rejection-threshold jumps at arbitrarily large B.
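Unconditional power averages the conditional rejection probability P(Binomial(B, P) ≤ c_B) over the sampling distribution of the exact permutation p-value P. The sketch below assumes a hypothetical two-atom law for P (it is not the alternative constructed in the paper) and shows a drop from B = 19 to B = 20 followed by a rise at the next threshold step, B = 39.

```python
import math
from fractions import Fraction


def critical_count(num_perms, alpha):
    # Largest k with (k + 1) / (B + 1) <= alpha; -1 means "never reject".
    return math.floor(alpha * (num_perms + 1)) - 1


def conditional_power(num_perms, p, alpha):
    # P(Binomial(B, p) <= c_B): rejection probability given the data.
    c = critical_count(num_perms, alpha)
    return sum(math.comb(num_perms, k) * p**k * (1 - p) ** (num_perms - k)
               for k in range(c + 1))


def unconditional_power(num_perms, p_law, alpha):
    """Pow(B) = E[P(Binomial(B, P) <= c_B)] for a finitely supported law of P."""
    return sum(weight * conditional_power(num_perms, atom, alpha)
               for atom, weight in p_law.items())


alpha = Fraction(1, 20)
# Hypothetical law of the exact permutation p-value under some alternative.
p_law = {Fraction(1, 35): Fraction(1, 2), Fraction(1, 5): Fraction(1, 2)}
pow19, pow20, pow39 = (unconditional_power(b, p_law, alpha) for b in (19, 20, 39))
print(pow19 > pow20, pow39 > pow20)  # True True
```

Since each atom strictly inside (0, 1) contributes a conditional power that falls between threshold steps, any mixture of them falls there too. Under this simple model, the saw-tooth carries over from conditional to unconditional power.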

Significance. If the result holds, it is significant for the theory and practice of nonparametric testing. The structural explanation based on discreteness directly challenges the widespread intuition that larger B always improves power, and the explicit construction proving infinite occurrences provides a clean, falsifiable contribution without relying on simulation or parameter fitting. This could inform better default choices of B in applied work.

major comments (1)
  1. [Main theorem and proof] The proof that decreases occur infinitely often (presumably in the main theorem) constructs an alternative distribution to align atoms with threshold jumps. The manuscript should state the precise conditions on the test statistic (e.g., whether it applies to any fixed discrete distribution or requires specific atom locations) to confirm the result is not restricted to contrived cases.
minor comments (2)
  1. [Introduction or early results] A brief numerical illustration of the saw-toothed power curve for small B (e.g., B from 10 to 100) would make the structural argument more accessible before the general proof.
  2. [Preliminaries] The notation for the p-value and rejection region should be introduced with an equation early on to anchor the discreteness argument.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive evaluation and constructive suggestion regarding the conditions in the main theorem. We address the comment below and will revise the manuscript to incorporate the requested clarification.

Point-by-point responses
  1. Referee: [Main theorem and proof] The proof that decreases occur infinitely often (presumably in the main theorem) constructs an alternative distribution to align atoms with threshold jumps. The manuscript should state the precise conditions on the test statistic (e.g., whether it applies to any fixed discrete distribution or requires specific atom locations) to confirm the result is not restricted to contrived cases.

    Authors: We appreciate this observation. Our main theorem establishes that power decreases occur infinitely often by explicit construction of an alternative distribution for which the test statistic has atoms that align with the jumps in the rejection threshold (k+1)/(B+1) at arbitrarily large B. This construction applies whenever the test statistic admits a discrete distribution under the alternative whose support points can be positioned to achieve such alignments; it does not require the distribution to be fixed in advance but rather demonstrates existence for alternatives that induce the necessary atoms. In the setting of permutation tests this covers a wide range of common statistics (e.g., rank or sum statistics when the data distribution under the alternative is discrete or lattice-valued). We will revise the manuscript to state these conditions explicitly, clarifying that the result shows the phenomenon is possible for such distributions rather than claiming it for every alternative, thereby confirming that the cases are not limited to contrived examples. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper derives non-monotonicity of Monte Carlo permutation test power directly from the discrete support of the p-value of the form (k+1)/(B+1) and the resulting jumps in the rejection threshold as B increases, while the test-statistic distribution under the alternative is held fixed. The proof that decreases occur infinitely often proceeds by explicit construction of an alternative distribution whose atoms align with those threshold jumps at arbitrarily large B. No parameters are estimated from data, no self-citations are load-bearing for the central claim, and the argument does not reduce any result to its own inputs by definition or renaming. The derivation rests on elementary properties of discrete distributions and is therefore independent of the paper's own fitted quantities or prior self-referential statements.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard domain property that permutation distributions are discrete, which produces jumps in the estimated critical value or p-value as the Monte Carlo sample size changes.

axioms (1)
  • domain assumption The permutation distribution is discrete, inducing a saw-toothed power function as the Monte Carlo sample size varies.
    This discreteness is invoked to explain the non-monotonic behavior and is a standard feature of permutation tests.

pith-pipeline@v0.9.0 · 5400 in / 1217 out tokens · 34325 ms · 2026-05-08T17:51:07.325229+00:00 · methodology

