Anytime-valid testing with e-values and confirmatory adaptive designs
Pith reviewed 2026-06-28 18:01 UTC · model grok-4.3
The pith
Confirmatory adaptive designs are formally equivalent to e-value based anytime-valid sequential tests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Adaptive design tools like conditional error functions and combination tests are formally equivalent to e-value based, anytime-valid sequential tests. The two frameworks share the goal of introducing flexibility into statistical inference yet differ in focus: combination tests and conditional error functions generally seek to exhaust type I error rates, while e-value testing additionally emphasizes optional stopping, chosen significance levels, and extensions to loss functions. The equivalence is shown under the standard constructions given in the respective literatures.
What carries the argument
The formal mapping between conditional error functions, combination tests, and e-values that establishes their equivalence for sequential testing.
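To make the mapping concrete, here is a minimal sketch of one standard way to read a conditional error function as an e-value. It is an illustration under stated assumptions, not necessarily the paper's own construction. It assumes a two-stage design whose stage-wise p-values are independent and uniform under the null, and it uses the conditional error function A(p1) = min(1, c/p1) of Fisher's product combination test without early-stopping bounds. All function names are hypothetical. The first-stage e-value is A(p1)/alpha and the second-stage e-value is 1{p2 <= A(p1)}/A(p1), so their product reaches 1/alpha exactly when the adaptive test rejects.

```python
import math
import random


def fisher_critical_value(alpha, tol=1e-14):
    """Solve c * (1 - ln c) = alpha by bisection.

    Under the null, P(p1 * p2 <= c) = c * (1 - ln c) for independent uniforms,
    so this c is the level-alpha critical value of Fisher's product test.
    """
    lo, hi = 0.0, alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (1.0 - math.log(mid)) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conditional_error(p1, c):
    """Conditional error function A(p1) = min(1, c / p1)."""
    return 1.0 if p1 <= c else c / p1


def stage_e_values(p1, p2, alpha, c):
    """Stage-wise e-values induced by the conditional error function."""
    a = conditional_error(p1, c)
    e1 = a / alpha                      # null expectation: integral of A / alpha = 1
    e2 = (1.0 / a) if p2 <= a else 0.0  # conditional null expectation given p1: 1
    return e1, e2


def simulate_null(n_trials=200_000, alpha=0.05, seed=1):
    """Monte Carlo check under the null (assumed uniform, independent p-values)."""
    rng = random.Random(seed)
    c = fisher_critical_value(alpha)
    sum_e1 = sum_prod = 0.0
    rejections = 0
    for _ in range(n_trials):
        p1, p2 = rng.random(), rng.random()
        e1, e2 = stage_e_values(p1, p2, alpha, c)
        prod = e1 * e2
        sum_e1 += e1
        sum_prod += prod
        # Reject when the product e-value reaches 1/alpha; the small tolerance
        # only guards against floating-point rounding in (a / alpha) * (1 / a).
        if prod * alpha >= 1.0 - 1e-9:
            rejections += 1
    return c, sum_e1 / n_trials, sum_prod / n_trials, rejections / n_trials


if __name__ == "__main__":
    c, mean_e1, mean_prod, rate = simulate_null()
    print(f"c = {c:.5f}, mean E1 = {mean_e1:.3f}, "
          f"mean E1*E2 = {mean_prod:.3f}, rejection rate = {rate:.4f}")
```

With alpha = 0.05 the solved critical value is about 0.0087, the familiar constant of Fisher's product test. The simulated means of the e-values should sit close to one and the rejection rate close to alpha, which is the sense in which this particular test exhausts the type I error rate.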
If this is right
- E-value methods can supply optional-continuation properties to confirmatory adaptive designs.
- Adaptive design techniques can tighten error-rate control within e-value frameworks.
- The equivalence allows direct transfer of level choice and loss-function extensions between the two areas.
- Clinical trial protocols can adopt elements from both literatures without violating validity.
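The optional-continuation property named in the first bullet rests on Ville's inequality: a nonnegative process that starts at one and has conditional expectation at most one under the null reaches 1/alpha at any time with probability at most alpha. The following simulation is a minimal sketch of that guarantee, assuming standard normal observations under the null and a likelihood-ratio e-process against a hypothetical alternative mean `delta`. The function names, the horizon `max_steps`, and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math
import random


def crosses_threshold(rng, delta, alpha, max_steps):
    """Return True if the running e-process ever reaches 1/alpha.

    Data are drawn from the null N(0, 1). Each factor is the likelihood ratio
    of N(delta, 1) against N(0, 1), which has expectation one under the null.
    The product is tracked on the log scale to avoid underflow.
    """
    log_threshold = math.log(1.0 / alpha)
    log_wealth = 0.0
    for _ in range(max_steps):
        x = rng.gauss(0.0, 1.0)
        log_wealth += delta * x - 0.5 * delta * delta
        if log_wealth >= log_threshold:
            return True  # stop and reject at the first crossing
    return False


def null_crossing_rate(n_paths=10_000, delta=0.5, alpha=0.05,
                       max_steps=100, seed=7):
    """Fraction of null paths that are rejected when monitoring after every observation."""
    rng = random.Random(seed)
    hits = sum(crosses_threshold(rng, delta, alpha, max_steps)
               for _ in range(n_paths))
    return hits / n_paths


if __name__ == "__main__":
    print(f"null rejection rate with continuous monitoring: {null_crossing_rate():.4f}")
```

The simulated rejection rate should stay below alpha even though the data are inspected after every observation. It typically falls somewhat short of alpha because the process overshoots the threshold, which is the kind of slack that error-exhausting adaptive design techniques aim to remove.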
Where Pith is reading between the lines
- The mapping may let anytime-validity guarantees move directly into existing adaptive trial software.
- Hybrid procedures could be built that use e-value optional stopping inside a conditional error function skeleton.
- Similar equivalences might be checked for other sequential methods such as group-sequential boundaries.
Load-bearing premise
The claimed equivalence holds under the specific constructions of conditional error functions, combination tests, and e-values as defined in their respective literatures.
What would settle it
A concrete counterexample of a conditional error function or combination test that cannot be rewritten as an e-value (or vice versa) under the paper's definitions would falsify the equivalence.
Original abstract
Confirmatory adaptive designs were introduced more than 30 years ago and enable, for example, sample size re-assessments and the selection of treatments, endpoints, and subpopulations during the course of a clinical trial. Recently, sequential tests based on e-values for anytime-valid inference have been developed, promising seemingly similar or even greater flexibility and utility. In this note, we compare these two independently developed concepts, shedding light on their formal and methodological connections and differences. Specifically, we show that adaptive design tools like conditional error functions and combination tests are formally equivalent to e-value based, anytime-valid sequential tests. However, in spite of their common fundamental intention to bring flexibility into statistical inference, they have quite different emphases: while hypothesis testing with combination tests and conditional error functions usually intends to exhaust type I error rates under the offered flexibility, e-value based testing aims at additional flexibility with regard to optional continuation, the chosen level and, in recent extensions, the loss functions to be controlled. We also indicate how recent e-value achievements could enrich clinical trial methodology and how adaptive design methodology could inspire and improve e-value based testing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that confirmatory adaptive design tools, specifically conditional error functions and combination tests, are formally equivalent to e-value based anytime-valid sequential tests. It contrasts their emphases—exhausting type I error under flexibility versus additional options for continuation, level choice, and loss-function control—and indicates potential cross-enrichment between the literatures.
Significance. If the claimed formal equivalence is established under standard definitions, the note provides a bridge between two bodies of work on flexible inference, which could allow transfer of techniques such as loss-function extensions from e-values into clinical trial designs or adaptive-design ideas into sequential e-value procedures.
minor comments (2)
- The abstract asserts the equivalence but the manuscript would benefit from an explicit statement (e.g., in the introduction or a dedicated section) of the precise constructions under which the mapping holds and any restrictions that would break it.
- Notation for the mapping between conditional error functions/combination tests and e-values should be introduced once and used consistently to aid readability of the equivalence argument.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. No specific major comments were provided in the report.
Circularity Check
No significant circularity; formal equivalence between independent frameworks
full rationale
The paper's central claim is a formal equivalence mapping between two pre-existing families of procedures (conditional error functions/combination tests from adaptive design literature, and e-value based anytime-valid tests) under their standard definitions. No derivation chain reduces a result to its own inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing premise rests on a self-citation chain. The abstract and described contribution treat the two bodies of work as independently developed, with the paper only exhibiting the mapping and noting differing emphases. This is a self-contained observation of equivalence rather than a constructed result.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: standard axioms of probability and the mathematical definitions of e-values, conditional error functions, and combination tests as established in prior literature.