pith. machine review for the scientific record.

arxiv: 2604.21851 · v1 · submitted 2026-04-23 · stat.ME · math.ST · stat.TH

Betting on Bets: Anytime-Valid Tests for Stochastic Dominance

Ilia Tsetlin, Marco Scarsini, Sebastian Arnold, Yo Joong Choe

Pith reviewed 2026-05-09 20:53 UTC · model grok-4.3

classification: stat.ME · math.ST · stat.TH
keywords: stochastic dominance · anytime-valid tests · e-processes · sequential testing · nonparametric inference · stochastic ordering · power one tests

The pith

Sequential tests for stochastic dominance achieve power one by mixing asymptotically growth-rate optimal e-variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops sequential tests that let researchers monitor in real time whether one uncertain prospect has any upside over another in a distributional sense, beyond just averages. The tests accumulate evidence against the null hypothesis that one prospect is dominated by the other, using betting-style quantities (e-variables) chosen to grow at an asymptotically optimal rate. For first-order dominance, the resulting procedure eventually rejects whenever that null is false, and it stays valid no matter when the data are inspected. The approach also extends to higher orders of dominance. In simulations, the tests are competitive in power with traditional fixed-sample tests, without requiring the sample size to be fixed in advance.

Core claim

We develop a novel family of sequential, anytime-valid tests for stochastic dominance by constructing e-processes as mixtures of asymptotically growth-rate optimal e-variables. These yield power-one tests that remain valid under continuous monitoring. The construction is nonparametric for first-order stochastic dominance and extends directly to any higher-order version.

What carries the argument

A mixture of asymptotically growth-rate optimal e-variables, forming an e-process that quantifies accumulating evidence against the null hypothesis that one prospect is stochastically dominated by the other.
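
To make the idea of "mixing betting-style e-variables" concrete, here is a minimal sketch. It is not the paper's GRO construction: it assumes paired observations (x_t, y_t) that are i.i.d. across rounds, a finite grid of thresholds, a few fixed bet sizes, and a uniform mixing measure. The function name `mixture_e_process` and all parameters are hypothetical. The null is that X is first-order dominated by Y, i.e. P(X > z) <= P(Y > z) for every z.

```python
import numpy as np

def mixture_e_process(x, y, thresholds, bets):
    """Running mixture wealth for H0: P(X > z) <= P(Y > z) for every z.

    Illustrative betting construction only (not the paper's GRO e-variables).
    For each threshold z and bet size lam in [0, 1), the per-round factor
        1 + lam * (1{x_t > z} - 1{y_t > z})
    is nonnegative and has expectation at most 1 under H0, so each running
    product is a test supermartingale, and so is their uniform average.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(thresholds, dtype=float)[None, :, None]   # shape (1, Z, 1)
    lam = np.asarray(bets, dtype=float)[None, None, :]       # shape (1, 1, L)
    if np.any(lam < 0) or np.any(lam >= 1):
        raise ValueError("bets must lie in [0, 1)")
    # Payoff is +1 when only X exceeds z, -1 when only Y does, else 0.
    above_x = (x[:, None, None] > z).astype(float)
    above_y = (y[:, None, None] > z).astype(float)
    log_factors = np.log1p(lam * (above_x - above_y))        # shape (T, Z, L)
    log_wealth = np.cumsum(log_factors, axis=0)              # one process per (z, lam)
    # Uniform mixing measure over the (z, lam) grid.
    return np.exp(log_wealth).mean(axis=(1, 2))

# Toy usage: X always exceeds the threshold and Y never does, so wealth grows.
print(mixture_e_process([1, 1, 1], [0, 0, 0], thresholds=[0.5], bets=[0.5]))
```

Averaging over thresholds and bet sizes means no single choice has to be right in advance. The price is at most a constant factor (the grid size) relative to the best process in the grid.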

If this is right

  • For first-order SD, the tests achieve power one: they eventually reject almost surely whenever the null of dominance is false.
  • Validity is preserved for any stopping time or continuous monitoring schedule.
  • The same mixing construction applies to higher-order stochastic dominance.
  • Empirical power matches or approaches that of standard non-sequential tests for fixed samples.
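
In standard e-process notation, the first two bullets can be written out as follows. The symbols are assumptions for illustration and need not match the paper's: $(E_t)$ is the e-process, $H_0$ the null class, and $\alpha$ the test level.

```latex
% E-process property: for every stopping time \tau,
\sup_{P \in H_0} \mathbb{E}_P\left[ E_\tau \right] \le 1 .

% Level-\alpha sequential test: reject the first time the evidence reaches 1/\alpha.
\tau_\alpha = \inf\{\, t \ge 1 : E_t \ge 1/\alpha \,\},
\qquad
\sup_{P \in H_0} P\left( \tau_\alpha < \infty \right) \le \alpha .

% Power one: for every distribution Q in the alternative,
Q\left( \tau_\alpha < \infty \right) = 1 .
```

The middle display is what "valid under continuous monitoring" means here: the chance of ever rejecting under the null is at most $\alpha$, however often the data are checked.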

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The framework could support ongoing monitoring of investment or treatment prospects without pre-fixing sample sizes.
  • Conditions sketched for testing the complementary non-dominance null might enable detection of definite upside in sequential settings.
  • Similar mixtures could be explored for related ordering concepts such as convex stochastic order.

Load-bearing premise

Mixtures of asymptotically growth-rate optimal e-variables for the stochastic dominance null will produce processes that grow without bound under the alternative while staying valid at all times.

What would settle it

A data-generating process in which the null of dominance is false yet the constructed evidence process remains bounded, or one in which the test rejects more often than the nominal level under continuous monitoring of identical distributions.
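
Both failure modes can be probed by simulation. The sketch below uses a simple threshold-betting mixture as a stand-in, not the paper's GRO e-processes. It estimates how often the wealth ever reaches 1/α within a finite horizon, once under identical distributions and once under an upward shift of X. The helper names, the bet size 0.5, the threshold grid and the horizon are all assumptions. A finite simulation can expose a failure but cannot prove validity or power one.

```python
import numpy as np

def max_wealth(x, y, thresholds, lam=0.5):
    """Running maximum of a uniform mixture of betting processes over thresholds."""
    z = np.asarray(thresholds, dtype=float)[None, :]
    payoff = (x[:, None] > z).astype(float) - (y[:, None] > z).astype(float)
    log_wealth = np.cumsum(np.log1p(lam * payoff), axis=0)
    return np.exp(log_wealth).mean(axis=1).max()

def rejection_rate(sample_pair, n_runs=200, horizon=500, alpha=0.05, seed=0):
    """Fraction of runs whose wealth reaches 1/alpha at some time up to the horizon."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.1, 0.9, 9)
    hits = 0
    for _ in range(n_runs):
        x, y = sample_pair(rng, horizon)
        hits += int(max_wealth(x, y, grid) >= 1.0 / alpha)
    return hits / n_runs

# Identical distributions: the boundary case of the null.
null_rate = rejection_rate(lambda rng, n: (rng.uniform(size=n), rng.uniform(size=n)))
# Null false: X is shifted upward relative to Y.
alt_rate = rejection_rate(lambda rng, n: (rng.uniform(size=n) + 0.3, rng.uniform(size=n)))
print(f"null rejection rate: {null_rate:.3f}, alternative rejection rate: {alt_rate:.3f}")
```

A null rejection rate well above α, or an alternative rate that stalls as the horizon grows, would be the kind of evidence described above.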

Figures

Figures reproduced from arXiv: 2604.21851 by Ilia Tsetlin, Marco Scarsini, Sebastian Arnold, Yo Joong Choe.

Figure 1: All GRO e-processes grow quickly under anti-monotonicity while maintaining anytime-validity. Plots show simulations with finite support (4 outcomes) and maximal anti-monotonicity (𝜌(𝑋, 𝑌) = −1). Each line is averaged over 500 repeated simulations. The Ville error plot is drawn for 𝑡 = 1, …, 5,000, while the e-power plot is truncated at 𝑡 = 500 and at value 40 for better visualization. Over time, a linear…
Figure 2: Adaptive GRO e-processes can grow even when the non-dominance region is far away from the initial search interval (Case 4 vs. Case 3). They can also grow more quickly than non-adaptive GRO e-processes when the contact set is relatively small within the initial search interval (Case 2). All methods control the Ville error at level 𝛼 = 0.05 under the “hardest” null case (Case 1, where 𝐹𝑋 ≡ 𝐹𝑌). The e-power p…
Figure 3: All GRO e-processes grow quickly even when the contact set is large, with the adaptive variant with exponential weights having the largest e-power. These plots summarize simulations where 𝐹𝑌 ≡ 𝖴𝗇𝗂𝖿[0, 1] and 𝐹𝑋 is a piecewise uniform distribution (25) with a kink point at 𝑧0 ∈ [0, 1]. As 𝑧0 decreases, the two CDFs have more “contact” and the testing problem becomes more challenging (under the null, to not…
Figure 4: When compared against classical, non-anytime-valid approaches, GRO e-processes can achieve competitive (if not better) power. We compare the level-𝛼 sequential tests induced by adaptive and non-adaptive GRO e-processes (AdaGRO-Exp and GRO, respectively) against three classical, non-anytime-valid tests (BD03, LMW05, and LSW10). The level 𝛼 is fixed at 0.05. Figure 4a confirms that, under the null (𝑧0 = 0, i.…
Figure 5: For higher-order SD testing, adaptive UP e-processes grow quickly, relative to non-adaptive UP or constant-bet counterparts, particularly when the contact set between the CDFs is large. These plots summarize simulations in the same setup as for…
Figure 6: Histograms and means of the stopping times…
Figure 7: All UP e-processes control the Ville error at level…

Original abstract

How can we monitor, in real time, whether one uncertain prospect has any upside over another? To answer this question, we develop a novel family of sequential, anytime-valid tests for stochastic dominance (SD; also known as stochastic ordering), a classical and popular notion for comparing entire distribution functions. The problem is distinct from the popular problem of testing for dominance in means, which would not capture distributional differences beyond the first moment. We first derive powerful, nonparametric e-processes that quantify evidence against the null hypothesis that one prospect is dominated by another. For first-order SD, these e-processes are constructed as a mixture of asymptotically growth-rate optimal e-variables and yield a test of power one. The approach further generalizes to sequential testing for SD beyond the first order, including any higher-order SD. Empirically, we demonstrate that the resulting sequential tests are competitive with existing non-sequential SD tests in terms of power, while achieving validity under continuous monitoring that existing methods do not. Finally, we sketch the complementary and challenging problem of testing the non-SD null hypothesis, which asks whether a prospect has a definite upside, and describe the conditions under which we can derive a nontrivial anytime-valid test.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a family of sequential, anytime-valid tests for stochastic dominance (SD) using nonparametric e-processes. For first-order SD, the e-processes are constructed as mixtures of asymptotically growth-rate optimal e-variables and are claimed to yield power-one tests under the alternative while remaining valid under continuous monitoring. The approach generalizes to higher-order SD, and the manuscript includes empirical comparisons showing competitiveness with non-sequential SD tests plus a sketch of the complementary problem of testing the non-SD null.

Significance. If the derivations and power-one claims hold, the work supplies a practical tool for real-time monitoring of distributional dominance that existing fixed-sample tests lack. The grounding in e-process theory for a composite nonparametric null is a clear strength, as is the explicit generalization beyond first-order SD and the empirical demonstration of power competitiveness. These features address a genuine gap in sequential nonparametric testing with direct relevance to economics and decision theory.

major comments (2)
  1. [§3.2] §3.2, the mixture construction: the claim that the e-process is a mixture of asymptotically growth-rate optimal e-variables for the first-order SD null requires an explicit statement of the mixing measure and a proof that the resulting process retains the growth-rate optimality (or at least the power-one property) under the alternative; without this, the power-one guarantee cannot be verified from the given construction.
  2. [§4] §4, generalization to higher-order SD: the extension from first-order to k-th order SD is sketched but the corresponding e-variable family and the validity argument under continuous monitoring are not derived in detail; this is load-bearing for the claim that the method 'further generalizes' and needs at least a theorem statement with the key steps.
minor comments (2)
  1. [Throughout] Notation for the e-processes (e.g., the distinction between the instantaneous e-variable and the cumulative process) is introduced without a consolidated table or definition list, making it easy to lose track across sections.
  2. [§5] The empirical section would benefit from reporting the exact sample sizes and number of Monte Carlo replications used to generate the power curves, as well as a direct comparison of type-I error under continuous monitoring versus fixed-sample benchmarks.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation for minor revision. We address the two major comments point by point below, agreeing that greater explicitness is needed in both cases. The requested clarifications will be incorporated into the revised manuscript.

Point-by-point responses
  1. Referee: [§3.2] §3.2, the mixture construction: the claim that the e-process is a mixture of asymptotically growth-rate optimal e-variables for the first-order SD null requires an explicit statement of the mixing measure and a proof that the resulting process retains the growth-rate optimality (or at least the power-one property) under the alternative; without this, the power-one guarantee cannot be verified from the given construction.

    Authors: We agree that an explicit statement of the mixing measure and a supporting argument for the power-one property are required for full verifiability. In the revision we will state the mixing measure precisely (a probability measure supported on the class of asymptotically growth-rate optimal e-variables for the first-order SD null) and add a short lemma establishing that the resulting mixture e-process inherits the power-one property under the alternative. This addition will be placed in §3.2 immediately after the construction. revision: yes

  2. Referee: [§4] §4, generalization to higher-order SD: the extension from first-order to k-th order SD is sketched but the corresponding e-variable family and the validity argument under continuous monitoring are not derived in detail; this is load-bearing for the claim that the method 'further generalizes' and needs at least a theorem statement with the key steps.

    Authors: We acknowledge that the higher-order extension is currently only sketched. We will expand §4 with a formal theorem that (i) defines the family of e-variables for k-th order SD, (ii) states the corresponding e-process, and (iii) outlines the key steps establishing anytime-validity under continuous monitoring. The proof will adapt the first-order argument via the appropriate integral representation of higher-order dominance. revision: yes
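
    As a small illustration of the integral representation mentioned in this response, the sketch below computes the standard empirical k-th order integrated CDF, using the identity F^(k)(z) = E[(z − X)_+^(k−1)] / (k−1)!. That this is the representation the authors intend is an assumption, and the function name `integrated_cdf` is hypothetical.

```python
import math
import numpy as np

def integrated_cdf(sample, z, order):
    """Empirical k-th order integrated CDF at z: mean((z - x)_+^(k-1)) / (k-1)!."""
    if order < 1:
        raise ValueError("order must be a positive integer")
    x = np.asarray(sample, dtype=float)
    if order == 1:
        return float(np.mean(x <= z))          # ordinary empirical CDF
    gap = np.clip(z - x, 0.0, None)            # (z - x)_+
    return float(np.mean(gap ** (order - 1)) / math.factorial(order - 1))

# Orders 1, 2 and 3 evaluated at z = 1 on a three-point sample.
for k in (1, 2, 3):
    print(k, integrated_cdf([0.0, 1.0, 2.0], z=1.0, order=k))
```

    Higher-order dominance compares these curves pointwise in z, in the same way that first-order dominance compares ordinary CDFs.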

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper constructs e-processes for stochastic dominance testing by mixing asymptotically growth-rate optimal e-variables drawn from established sequential testing theory. This step relies on external optimality results for e-variables under composite nonparametric nulls rather than defining the target quantity in terms of itself or fitting parameters to the same data used for validation. No load-bearing self-citations reduce the central claim to unverified prior work by the same authors; the power-one property and continuous-monitoring validity follow directly from the mixture construction and standard e-process martingale properties. The derivation remains self-contained against external benchmarks in the e-process literature.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are specified in the abstract; the approach builds on existing e-process theory without introducing new postulated entities.

pith-pipeline@v0.9.0 · 5521 in / 1058 out tokens · 40217 ms · 2026-05-09T20:53:50.734725+00:00 · methodology
