Time-sensitive anytime-valid testing
Pith reviewed 2026-05-08 04:19 UTC · model grok-4.3
The pith
Anytime-valid tests can be made to favor early rejection by assigning rewards to rejection times and maximizing their expected value under a known alternative.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By modeling rejection as a controlled process with time-dependent rewards, the authors reduce the problem of finding the optimal anytime-valid test to a dynamic programming task. For the case of a hard deadline and simple hypotheses, this recovers the finite-horizon Neyman-Pearson test as the optimal e-process. When rewards decay exponentially, a stationary approximation yields the exponential-decay-optimal (EDO) criterion, which recovers the classical growth-rate-optimal (GRO) criterion as the time scale grows large.
What carries the argument
The Bellman representation of the optimal control problem, which tracks only time and the current evidence against the null to decide whether to continue or reject.
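To make the state reduction concrete, here is a minimal sketch (not the paper's actual algorithm) of backward induction on the (time, evidence) state for a hard-deadline, simple-vs-simple Bernoulli problem. The data model, reward function, and all constants are illustrative assumptions; the key point is that the value function depends only on time and the current e-value, not on the full observation history.

```python
from functools import lru_cache

# Illustrative constants (assumptions, not taken from the paper)
P0, P1 = 0.5, 0.7      # simple null vs simple alternative (Bernoulli)
ALPHA = 0.05           # level; rejection allowed once evidence >= 1/ALPHA
HORIZON = 50           # hard deadline T

def reward(t):
    """Time-dependent reward for rejecting at time t (illustrative decay)."""
    return 0.95 ** t

def likelihood_ratio(x):
    """Single-observation likelihood ratio dP1/dP0 for a Bernoulli outcome x."""
    return (P1 / P0) if x == 1 else ((1 - P1) / (1 - P0))

@lru_cache(maxsize=None)
def value(t, k):
    """Bellman value at time t after observing k successes.

    The evidence (e-value) after t observations is the product of
    likelihood ratios, which depends only on (t, k) -- the Markov
    state reduction the summary describes.
    """
    e = likelihood_ratio(1) ** k * likelihood_ratio(0) ** (t - k)
    reject_value = reward(t) if e >= 1.0 / ALPHA else 0.0
    if t == HORIZON:                      # hard deadline: must stop
        return reject_value
    # Expected continuation value under the alternative P1
    cont = P1 * value(t + 1, k + 1) + (1 - P1) * value(t + 1, k)
    return max(reject_value, cont)

print(round(value(0, 0), 4))  # expected reward of the optimal policy
```

Because rejection pays `reward(t)` only when the running e-value clears the Ville threshold `1/ALPHA`, the recursion directly encodes the continue-or-reject trade-off the summary describes.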
If this is right
- For hard deadlines the optimal e-process coincides with the solution of a finite-horizon Neyman-Pearson problem.
- Exponentially decaying rewards yield a stationary EDO criterion that serves as a practical finite-time analogue to the growth-rate-optimal viewpoint.
- The classical growth-rate-optimal criterion is recovered in the limit of large time scales.
- Soft time preferences can be incorporated by choosing appropriate reward functions.
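As a hedged illustration of the GRO viewpoint that the EDO criterion approaches in the limit: under a known simple alternative, the growth-rate-optimal bet maximizes the expected log of the per-observation e-value factor, and in the Bernoulli case this maximum equals the KL divergence between alternative and null. The model and all names below are assumptions for illustration.

```python
import math

P0, P1 = 0.5, 0.7  # simple null vs simple alternative (illustrative)

def growth_rate(bet_p):
    """Expected log-growth under P1 of the e-value factor that bets
    bet_p/P0 on success and (1-bet_p)/(1-P0) on failure."""
    return (P1 * math.log(bet_p / P0)
            + (1 - P1) * math.log((1 - bet_p) / (1 - P0)))

# Grid search: the growth-rate-optimal bet equals the alternative itself,
# and the optimal growth rate is KL(P1 || P0).
candidates = [i / 100 for i in range(1, 100)]
best = max(candidates, key=growth_rate)
kl = P1 * math.log(P1 / P0) + (1 - P1) * math.log((1 - P1) / (1 - P0))
print(best, round(growth_rate(best), 6), round(kl, 6))
```

The positive optimal growth rate is what drives eventual rejection under the alternative; time-sensitive rewards modify which bet is optimal at finite time scales.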
Where Pith is reading between the lines
- If the Bellman reduction holds, computation of optimal tests becomes feasible for long sequences without tracking full history.
- The stationary EDO criterion could serve as a default practical choice when exact time preferences are not specified in advance.
- The framework connects finite-time sequential testing back to classical asymptotic optimality results as the horizon lengthens.
Load-bearing premise
The method requires that a specific alternative hypothesis is known in advance so that the expected reward can be maximized.
What would settle it
If the optimal rejection strategy in a simple-vs-simple test with a hard deadline fails to match the threshold derived from the finite-horizon Neyman-Pearson lemma, the reduction claimed in the paper would not hold.
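The benchmark in that check can be computed directly: in the simple-vs-simple Bernoulli case with a hard deadline T, the likelihood ratio is monotone in the success count, so the finite-horizon Neyman-Pearson test rejects when the successes reach a threshold calibrated to the level under the null. The constants below are illustrative assumptions, not values from the paper.

```python
from math import comb

P0, ALPHA, T = 0.5, 0.05, 50  # null success rate, level, hard deadline

def null_tail(k):
    """P0(S >= k) for the success count S ~ Binomial(T, P0)."""
    return sum(comb(T, j) * P0**j * (1 - P0)**(T - j)
               for j in range(k, T + 1))

# Smallest success count whose null tail probability is at most ALPHA:
# this is the (non-randomized) finite-horizon Neyman-Pearson threshold.
k_star = next(k for k in range(T + 1) if null_tail(k) <= ALPHA)
print(k_star, round(null_tail(k_star), 4))
```

An optimal anytime-valid strategy for the hard-deadline problem should reproduce this rejection region at the deadline; a mismatch would falsify the claimed reduction.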
Original abstract
Anytime-valid tests allow evidence to be checked during data collection: one can either continue testing or stop and reject the null while still controlling type-I error. Yet, in many applications rejection is useful only if it comes soon enough. We introduce a time-sensitive testing-by-betting framework that favours early rejection by assigning rewards to rejection times and maximising their expected value under a given alternative. This encompasses hard deadlines and softer time preferences. The resulting optimal control problem admits a Bellman representation in terms only of time and evidence against the null, rather than the full history. For hard deadlines, the simple-vs-simple case reduces to a finite-horizon Neyman--Pearson problem and identify the corresponding optimal e-process. Furthermore, we show that exponentially decaying rewards admit a stationary approximation, yielding the exponential-decay-optimal (EDO) criterion: a finite-time-scale counterpart to the classical growth-rate-optimal (GRO) viewpoint in anytime-valid statistics, with the GRO criterion recovered in the large-time-scale limit.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a time-sensitive testing-by-betting framework for anytime-valid inference. Rewards are assigned to rejection times and the expected reward is maximized under a known alternative, yielding an optimal control problem. This problem admits a Bellman representation depending only on time and current evidence against the null (rather than full history). For hard deadlines in the simple-vs-simple setting, the problem reduces to a finite-horizon Neyman-Pearson problem whose optimal e-process is identified. Exponentially decaying rewards admit a stationary approximation producing the exponential-decay-optimal (EDO) criterion, which recovers the growth-rate-optimal (GRO) criterion in the large-time-scale limit.
Significance. If the claimed state reduction and derivations hold, the work meaningfully extends anytime-valid testing to incorporate explicit time preferences, which is relevant for applications with deadlines or decaying utility of late rejections. The Markovian Bellman representation and the explicit recovery of the GRO criterion as a limiting case are technically attractive features that connect the new framework to both optimal control and existing e-process literature. The transparent modeling choice of a known alternative for optimization avoids hidden circularity.
minor comments (2)
- Abstract: the clause 'the simple-vs-simple case reduces to a finite-horizon Neyman--Pearson problem and identify the corresponding optimal e-process' is grammatically incomplete; rephrase to 'we identify' or restructure the sentence for readability.
- Abstract and introduction: while the Bellman representation is asserted to depend only on time and evidence, a brief statement of the regularity conditions (e.g., Markovian property of the evidence process under the alternative) that justify the state reduction would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript, including the recognition of the Markovian Bellman representation, the connection to optimal control, and the recovery of the GRO criterion in the large-time limit. The recommendation is for minor revision; beyond the two minor comments on the abstract's phrasing and on stating the regularity conditions behind the state reduction, the report raises no major objections.
Circularity Check
No significant circularity
full rationale
The derivation begins from an explicit modeling choice: assign rewards to rejection times and maximize expected reward under a known alternative. The claim that the resulting optimal-control problem admits a Bellman equation whose state is only (time, current evidence) is presented as a structural property of the Markovian setup rather than a fitted or self-defined quantity. The reduction of the hard-deadline simple-vs-simple case to a finite-horizon Neyman-Pearson problem follows directly from standard dynamic programming once that state reduction is granted. The stationary approximation for exponentially decaying rewards produces the EDO criterion, with the classical GRO recovered only as a large-time limit; neither step renames a fitted input as a prediction nor relies on a load-bearing self-citation. All load-bearing steps are therefore independent of the target results and rest on the upfront modeling assumptions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The testing-by-betting setup can be cast as a Markov decision process whose state is (time, current evidence) and whose value function satisfies a Bellman equation.
- domain assumption An optimal policy exists for the finite-horizon or stationary reward maximization problems under the given alternatives.
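In schematic form (our notation; the paper's exact statement may differ), the assumed Bellman equation for the value of the testing problem reads:

```latex
V(t, e) \;=\; \max\Big\{\, r(t)\,\mathbf{1}\{e \ge 1/\alpha\},\;
\sup_{\lambda:\ \mathbb{E}_{P_0}[\lambda(X)] \le 1}
\mathbb{E}_{P_1}\big[\, V\big(t+1,\; e \cdot \lambda(X_{t+1})\big) \big] \Big\},
```

where $e$ is the current evidence (e-value) against the null, $r(t)$ the time-dependent reward for rejecting at time $t$, $\lambda$ a betting factor constrained to keep the wealth process an e-process under $P_0$, and the expectation is taken under the known alternative $P_1$. The first branch is the value of rejecting now (available only above the Ville threshold $1/\alpha$); the second is the optimal continuation value.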
Reference graph
Works this paper leans on
- [1] S. Agrawal and A. Ramdas. On stopping times of power-one sequential tests: Tight lower and upper bounds. arXiv:2504.19952, 2025.
- [2]
- [3] E. Clerico. On the optimality of coin-betting for mean estimation. International Journal of Approximate Reasoning, 187:109550, 2025.
- [4] L. E. Dubins and L. J. Savage. How to Gamble If You Must: Inequalities for Stochastic Processes. McGraw-Hill, New York, 1965.
- [5] W. Feller. An Introduction to Probability Theory and Its Applications, volume 2. John Wiley & Sons, New York, 2nd edition, 1971.
- [6] L. Fischer and A. Ramdas. Improving Wald's (approximate) sequential probability ratio test by avoiding overshoot. IEEE Transactions on Information Theory, 72(4):2457–2471, 2026.
- [7] P. Grünwald, R. de Heide, and W. M. Koolen. Safe testing. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(5):1091–1128, 2024.
- [8] O. Kallenberg. Stationary and invariant densities and disintegration kernels. Probability Theory and Related Fields, 160(3–4):567–592, 2014.
- [9] A. S. Kechris. Classical Descriptive Set Theory, volume 156 of Graduate Texts in Mathematics. Springer, New York, 1995.
- [10] J. L. Kelly, Jr. A new interpretation of information rate. Bell System Technical Journal, 35(4):917–926, 1956.
- [11] N. W. Koning and S. van Meer. Anytime validity is free: Inducing sequential tests. Journal of the Royal Statistical Society Series B: Statistical Methodology, page qkag050, 2026.
- [12] K. Lange. Borel sets of probability measures. Pacific Journal of Mathematics, 48(1):141–161, 1973.
- [13] M. Larsson, A. Ramdas, and J. Ruf. Testing hypotheses generated by constraints. arXiv:2504.02974, 2025.
- [14] E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer, New York, 3rd edition, 2005.
- [15] A. A. Liapounoff. Sur les fonctions-vecteurs complètement additives. Izvestiya Akademii Nauk SSSR. Seriya Matematicheskaya, 4(6):465–478, 1940.
- [16] J. Neyman and E. S. Pearson. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, 231:289–337, 1933.
- [17] F. Orabona and K.-S. Jun. Tight concentrations and confidence sequences from the regret of universal portfolio. IEEE Transactions on Information Theory, 70(1):436–455, 2024.
- [18] A. Ramdas and R. Wang. Hypothesis testing with e-values. Foundations and Trends in Statistics, 1(1–2):1–390, 2025.
- [19]
- [20] A. Ramdas, J. Ruf, M. Larsson, and W. M. Koolen. Testing exchangeability: Fork-convexity, supermartingales and e-processes. International Journal of Approximate Reasoning, 141:83–109, 2022.
- [21] A. Ramdas, P. Grünwald, V. Vovk, and G. Shafer. Game-theoretic statistics and safe anytime-valid inference. Statistical Science, 38(4):576–601, 2023.
- [22] M. Schäl and W. Sudderth. Stationary policies and Markov policies in Borel dynamic programming. Probability Theory and Related Fields, 74(1):91–111, 1987.
- [23]
- [24] V. Voráček and F. Orabona. STaR-Bets: Sequential target-recalculating bets for tighter confidence intervals. In Advances in Neural Information Processing Systems, 2025.
- [25] A. Wald. Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2):117–186, 1945.
- [26] I. Waudby-Smith and A. Ramdas. Estimating means of bounded random variables by betting. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(1):1–27, 2024.
- [27] Y.-C. Yao. On optimality of bold play for discounted Dubins–Savage gambling problems with limited playing times. Journal of Applied Probability, 44(1):212–225, 2007.