Time-sensitive anytime-valid testing
Pith reviewed 2026-05-08 04:19 UTC · model grok-4.3
The pith
Anytime-valid tests can be made to favor early rejection by assigning rewards to rejection times and maximizing their expected value under a known alternative.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By modeling rejection as a controlled process with time-dependent rewards, the authors reduce the problem of finding the optimal anytime-valid test to a dynamic programming task. For the case of a hard deadline and simple hypotheses, this recovers the finite-horizon Neyman-Pearson test as the optimal e-process. When rewards decay exponentially, a stationary approximation yields the exponential-decay-optimal (EDO) criterion, which recovers the classical growth-rate-optimal (GRO) criterion as the time scale grows large.
What carries the argument
The Bellman representation of the optimal control problem, which tracks only time and the current evidence against the null to decide whether to continue or reject.
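To make the state reduction concrete, here is a minimal sketch (not the paper's actual algorithm) of backward induction on the (time, evidence) state for a hard-deadline, simple-vs-simple Bernoulli problem. The data model, reward function, and all constants are illustrative assumptions; the key point is that the value function depends only on time and the current e-value, not on the full observation history.

```python
from functools import lru_cache

# Illustrative constants (assumptions, not taken from the paper)
P0, P1 = 0.5, 0.7      # simple null vs simple alternative (Bernoulli)
ALPHA = 0.05           # level; rejection allowed once evidence >= 1/ALPHA
HORIZON = 50           # hard deadline T

def reward(t):
    """Time-dependent reward for rejecting at time t (illustrative decay)."""
    return 0.95 ** t

def likelihood_ratio(x):
    """Single-observation likelihood ratio dP1/dP0 for a Bernoulli outcome x."""
    return (P1 / P0) if x == 1 else ((1 - P1) / (1 - P0))

@lru_cache(maxsize=None)
def value(t, k):
    """Bellman value at time t after observing k successes.

    The evidence (e-value) after t observations is the product of
    likelihood ratios, which depends only on (t, k) -- the Markov
    state reduction the summary describes.
    """
    e = likelihood_ratio(1) ** k * likelihood_ratio(0) ** (t - k)
    reject_value = reward(t) if e >= 1.0 / ALPHA else 0.0
    if t == HORIZON:                      # hard deadline: must stop
        return reject_value
    # Expected continuation value under the alternative P1
    cont = P1 * value(t + 1, k + 1) + (1 - P1) * value(t + 1, k)
    return max(reject_value, cont)

print(round(value(0, 0), 4))  # expected reward of the optimal policy
```

Because rejection pays `reward(t)` only when the running e-value clears the Ville threshold `1/ALPHA`, the recursion directly encodes the continue-or-reject trade-off the summary describes.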
If this is right
- For hard deadlines the optimal e-process coincides with the solution of a finite-horizon Neyman-Pearson problem.
- Exponentially decaying rewards yield a stationary EDO criterion that serves as a practical finite-time analogue to the growth-rate-optimal viewpoint.
- The classical growth-rate-optimal criterion is recovered in the limit of large time scales.
- Soft time preferences can be incorporated by choosing appropriate reward functions.
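As a hedged illustration of the GRO viewpoint that the EDO criterion approaches in the limit: under a known simple alternative, the growth-rate-optimal bet maximizes the expected log of the per-observation e-value factor, and in the Bernoulli case this maximum equals the KL divergence between alternative and null. The model and all names below are assumptions for illustration.

```python
import math

P0, P1 = 0.5, 0.7  # simple null vs simple alternative (illustrative)

def growth_rate(bet_p):
    """Expected log-growth under P1 of the e-value factor that bets
    bet_p/P0 on success and (1-bet_p)/(1-P0) on failure."""
    return (P1 * math.log(bet_p / P0)
            + (1 - P1) * math.log((1 - bet_p) / (1 - P0)))

# Grid search: the growth-rate-optimal bet equals the alternative itself,
# and the optimal growth rate is KL(P1 || P0).
candidates = [i / 100 for i in range(1, 100)]
best = max(candidates, key=growth_rate)
kl = P1 * math.log(P1 / P0) + (1 - P1) * math.log((1 - P1) / (1 - P0))
print(best, round(growth_rate(best), 6), round(kl, 6))
```

The positive optimal growth rate is what drives eventual rejection under the alternative; time-sensitive rewards modify which bet is optimal at finite time scales.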
Where Pith is reading between the lines
- If the Bellman reduction holds, computation of optimal tests becomes feasible for long sequences without tracking full history.
- The stationary EDO criterion could serve as a default practical choice when exact time preferences are not specified in advance.
- The framework connects finite-time sequential testing back to classical asymptotic optimality results as the horizon lengthens.
Load-bearing premise
The method requires that a specific alternative hypothesis is known in advance so that the expected reward can be maximized.
What would settle it
If the optimal rejection strategy in a simple-vs-simple test with a hard deadline fails to match the threshold derived from the finite-horizon Neyman-Pearson lemma, the reduction claimed in the paper would not hold.
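The benchmark in that check can be computed directly: in the simple-vs-simple Bernoulli case with a hard deadline T, the likelihood ratio is monotone in the success count, so the finite-horizon Neyman-Pearson test rejects when the successes reach a threshold calibrated to the level under the null. The constants below are illustrative assumptions, not values from the paper.

```python
from math import comb

P0, ALPHA, T = 0.5, 0.05, 50  # null success rate, level, hard deadline

def null_tail(k):
    """P0(S >= k) for the success count S ~ Binomial(T, P0)."""
    return sum(comb(T, j) * P0**j * (1 - P0)**(T - j)
               for j in range(k, T + 1))

# Smallest success count whose null tail probability is at most ALPHA:
# this is the (non-randomized) finite-horizon Neyman-Pearson threshold.
k_star = next(k for k in range(T + 1) if null_tail(k) <= ALPHA)
print(k_star, round(null_tail(k_star), 4))
```

An optimal anytime-valid strategy for the hard-deadline problem should reproduce this rejection region at the deadline; a mismatch would falsify the claimed reduction.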
Original abstract
Anytime-valid tests allow evidence to be checked during data collection: one can either continue testing or stop and reject the null while still controlling type-I error. Yet, in many applications rejection is useful only if it comes soon enough. We introduce a time-sensitive testing-by-betting framework that favours early rejection by assigning rewards to rejection times and maximising their expected value under a given alternative. This encompasses hard deadlines and softer time preferences. The resulting optimal control problem admits a Bellman representation in terms only of time and evidence against the null, rather than the full history. For hard deadlines, the simple-vs-simple case reduces to a finite-horizon Neyman--Pearson problem and identify the corresponding optimal e-process. Furthermore, we show that exponentially decaying rewards admit a stationary approximation, yielding the exponential-decay-optimal (EDO) criterion: a finite-time-scale counterpart to the classical growth-rate-optimal (GRO) viewpoint in anytime-valid statistics, with the GRO criterion recovered in the large-time-scale limit.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a time-sensitive testing-by-betting framework for anytime-valid inference. Rewards are assigned to rejection times and the expected reward is maximized under a known alternative, yielding an optimal control problem. This problem admits a Bellman representation depending only on time and current evidence against the null (rather than full history). For hard deadlines in the simple-vs-simple setting, the problem reduces to a finite-horizon Neyman-Pearson problem whose optimal e-process is identified. Exponentially decaying rewards admit a stationary approximation producing the exponential-decay-optimal (EDO) criterion, which recovers the growth-rate-optimal (GRO) criterion in the large-time-scale limit.
Significance. If the claimed state reduction and derivations hold, the work meaningfully extends anytime-valid testing to incorporate explicit time preferences, which is relevant for applications with deadlines or decaying utility of late rejections. The Markovian Bellman representation and the explicit recovery of the GRO criterion as a limiting case are technically attractive features that connect the new framework to both optimal control and existing e-process literature. The transparent modeling choice of a known alternative for optimization avoids hidden circularity.
minor comments (2)
- Abstract: the clause 'the simple-vs-simple case reduces to a finite-horizon Neyman--Pearson problem and identify the corresponding optimal e-process' is grammatically incomplete; rephrase to 'we identify' or restructure the sentence for readability.
- Abstract and introduction: while the Bellman representation is asserted to depend only on time and evidence, a brief statement of the regularity conditions (e.g., Markovian property of the evidence process under the alternative) that justify the state reduction would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript, including the recognition of the Markovian Bellman representation, the connection to optimal control, and the recovery of the GRO criterion in the large-time limit. The recommendation is for minor revision; beyond the two minor comments on the abstract's phrasing and on stating the regularity conditions behind the state reduction, the report raises no major objections.
Circularity Check
No significant circularity
full rationale
The derivation begins from an explicit modeling choice: assign rewards to rejection times and maximize expected reward under a known alternative. The claim that the resulting optimal-control problem admits a Bellman equation whose state is only (time, current evidence) is presented as a structural property of the Markovian setup rather than a fitted or self-defined quantity. The reduction of the hard-deadline simple-vs-simple case to a finite-horizon Neyman-Pearson problem follows directly from standard dynamic programming once that state reduction is granted. The stationary approximation for exponentially decaying rewards produces the EDO criterion, with the classical GRO recovered only as a large-time limit; neither step renames a fitted input as a prediction nor relies on a load-bearing self-citation. All load-bearing steps are therefore independent of the target results and rest on the upfront modeling assumptions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The testing-by-betting setup can be cast as a Markov decision process whose state is (time, current evidence) and whose value function satisfies a Bellman equation.
- domain assumption An optimal policy exists for the finite-horizon or stationary reward maximization problems under the given alternatives.
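In schematic form (our notation; the paper's exact statement may differ), the assumed Bellman equation for the value of the testing problem reads:

```latex
V(t, e) \;=\; \max\Big\{\, r(t)\,\mathbf{1}\{e \ge 1/\alpha\},\;
\sup_{\lambda:\ \mathbb{E}_{P_0}[\lambda(X)] \le 1}
\mathbb{E}_{P_1}\big[\, V\big(t+1,\; e \cdot \lambda(X_{t+1})\big) \big] \Big\},
```

where $e$ is the current evidence (e-value) against the null, $r(t)$ the time-dependent reward for rejecting at time $t$, $\lambda$ a betting factor constrained to keep the wealth process an e-process under $P_0$, and the expectation is taken under the known alternative $P_1$. The first branch is the value of rejecting now (available only above the Ville threshold $1/\alpha$); the second is the optimal continuation value.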
Reference graph
Works this paper leans on
- [1] S. Agrawal and A. Ramdas. On stopping times of power-one sequential tests: Tight lower and upper bounds. arXiv:2504.19952, 2025.
- [2]
- [3] E. Clerico. On the optimality of coin-betting for mean estimation. International Journal of Approximate Reasoning, 187:109550, 2025.
- [4] L. E. Dubins and L. J. Savage. How to Gamble If You Must: Inequalities for Stochastic Processes. McGraw-Hill, New York, 1965.
- [5] W. Feller. An Introduction to Probability Theory and Its Applications, volume 2. John Wiley & Sons, New York, 2nd edition, 1971.
- [6] L. Fischer and A. Ramdas. Improving Wald's (approximate) sequential probability ratio test by avoiding overshoot. IEEE Transactions on Information Theory, 72(4):2457–2471, 2026.
- [7] P. Grünwald, R. de Heide, and W. M. Koolen. Safe testing. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(5):1091–1128, 2024.
- [8] O. Kallenberg. Stationary and invariant densities and disintegration kernels. Probability Theory and Related Fields, 160(3–4):567–592, 2014.
- [9] A. S. Kechris. Classical Descriptive Set Theory, volume 156 of Graduate Texts in Mathematics. Springer, New York, 1995.
- [10] J. L. Kelly, Jr. A new interpretation of information rate. Bell System Technical Journal, 35(4):917–926, 1956.
- [11] N. W. Koning and S. van Meer. Anytime validity is free: Inducing sequential tests. Journal of the Royal Statistical Society Series B: Statistical Methodology, page qkag050, 2026.
- [12] K. Lange. Borel sets of probability measures. Pacific Journal of Mathematics, 48(1):141–161, 1973.
- [13] M. Larsson, A. Ramdas, and J. Ruf. Testing hypotheses generated by constraints. arXiv:2504.02974, 2025.
- [14] E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer, New York, 3rd edition, 2005.
- [15] A. A. Liapounoff. Sur les fonctions-vecteurs complètement additives. Izvestiya Akademii Nauk SSSR. Seriya Matematicheskaya, 4(6):465–478, 1940.
- [16] J. Neyman and E. S. Pearson. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, 231:289–337, 1933.
- [17] F. Orabona and K.-S. Jun. Tight concentrations and confidence sequences from the regret of universal portfolio. IEEE Transactions on Information Theory, 70(1):436–455, 2024.
- [18] A. Ramdas and R. Wang. Hypothesis testing with e-values. Foundations and Trends in Statistics, 1(1–2):1–390, 2025.
- [19]
- [20] A. Ramdas, J. Ruf, M. Larsson, and W. M. Koolen. Testing exchangeability: Fork-convexity, supermartingales and e-processes. International Journal of Approximate Reasoning, 141:83–109, 2022.
- [21] A. Ramdas, P. Grünwald, V. Vovk, and G. Shafer. Game-theoretic statistics and safe anytime-valid inference. Statistical Science, 38(4):576–601, 2023.
- [22] M. Schäl and W. Sudderth. Stationary policies and Markov policies in Borel dynamic programming. Probability Theory and Related Fields, 74(1):91–111, 1987.
- [23]
- [24] V. Voráček and F. Orabona. STaR-Bets: Sequential target-recalculating bets for tighter confidence intervals. In Advances in Neural Information Processing Systems, 2025.
- [25] A. Wald. Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2):117–186, 1945.
- [26] I. Waudby-Smith and A. Ramdas. Estimating means of bounded random variables by betting. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(1):1–27, 2024.
- [27] Y.-C. Yao. On optimality of bold play for discounted Dubins–Savage gambling problems with limited playing times. Journal of Applied Probability, 44(1):212–225, 2007.