
arxiv: 2606.00878 · v1 · pith:4YEE36ZQnew · submitted 2026-05-30 · 📊 stat.ME

Anytime-valid testing with e-values and confirmatory adaptive designs

Pith reviewed 2026-06-28 18:01 UTC · model grok-4.3

classification 📊 stat.ME
keywords confirmatory adaptive designs · e-values · anytime-valid inference · combination tests · conditional error functions · sequential testing · clinical trials

The pith

Confirmatory adaptive designs are formally equivalent to e-value based anytime-valid sequential tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that tools from confirmatory adaptive designs, such as conditional error functions and combination tests, are mathematically equivalent to e-value based tests that support anytime-valid inference. The equivalence shows that two independently developed approaches to flexible statistical testing reach the same underlying mechanism. A sympathetic reader would care because the result unifies methods that permit mid-study adaptations like sample size re-assessments or endpoint selection while preserving validity. The work further contrasts their emphases, noting that adaptive designs typically aim to exhaust type I error under the allowed flexibility whereas e-value methods stress optional continuation, level choice, and loss-function control. It indicates routes for each approach to inform the other.

Core claim

Adaptive design tools like conditional error functions and combination tests are formally equivalent to e-value based, anytime-valid sequential tests. The two frameworks share the goal of introducing flexibility into statistical inference yet differ in focus: combination tests and conditional error functions generally seek to exhaust the type I error rate, while e-value testing additionally emphasizes optional continuation, flexibility in the chosen significance level, and extensions to loss functions. The equivalence is shown under the standard constructions given in the respective literatures.

What carries the argument

The formal mapping between conditional error functions, combination tests, and e-values that establishes their equivalence for sequential testing.
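The paper's own construction is not reproduced on this page, so the following is only a minimal sketch of one standard two-stage instance of such a mapping. It assumes Fisher's product combination test (reject when p1 · p2 ≤ c) with no early stopping, independent uniform stage-wise p-values under the null, and the conditional error function A(p1) = min(1, c / p1). The stage-wise quantities E1 = A(p1) / α and E2 = 1{p2 ≤ A(p1)} / A(p1) are e-values in this setting, and their product reaches 1/α exactly when the combination test rejects. All function names are hypothetical and chosen for illustration.

```python
import math
import random


def fisher_critical_value(alpha, tol=1e-14):
    """Solve c * (1 - ln c) = alpha by bisection (c is the product-test cutoff)."""
    lo, hi = 1e-300, alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (1.0 - math.log(mid)) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conditional_error(p1, c):
    """A(p1) = P(p1 * P2 <= c | p1) for P2 uniform on [0, 1]."""
    return 1.0 if p1 <= c else c / p1


def stage_e_values(p1, p2, alpha, c):
    """Split the two-stage test into two stage-wise e-values (illustrative)."""
    a = conditional_error(p1, c)          # always >= c > 0, so division is safe
    e1 = a / alpha                        # mean 1 under H0: A integrates to alpha
    e2 = (1.0 if p2 <= a else 0.0) / a    # conditional mean <= 1 given p1 under H0
    return e1, e2


def simulate_null(alpha=0.05, n=200_000, seed=0):
    """Monte Carlo under H0 with independent uniform stage-wise p-values."""
    rng = random.Random(seed)
    c = fisher_critical_value(alpha)
    sum_e1 = 0.0
    rejections = 0
    for _ in range(n):
        p1, p2 = rng.random(), rng.random()
        e1, e2 = stage_e_values(p1, p2, alpha, c)
        sum_e1 += e1
        # The product is either 0 or 1/alpha, up to floating-point rounding.
        if math.isclose(e1 * e2, 1.0 / alpha, rel_tol=1e-9):
            rejections += 1
    return c, sum_e1 / n, rejections / n


if __name__ == "__main__":
    c, mean_e1, type_one = simulate_null()
    print(f"c = {c:.6f}, mean E1 = {mean_e1:.3f}, rejection rate = {type_one:.4f}")
```

In this sketch the product e-value takes only the values 0 and 1/α, which is the sense in which a test that exhausts the type I error rate corresponds to an "all-or-nothing" e-value. Designs with early stopping boundaries or other combination functions would need a different conditional error function.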

If this is right

  • E-value methods can supply optional-continuation properties to confirmatory adaptive designs.
  • Adaptive design techniques can tighten error-rate control within e-value frameworks.
  • The equivalence allows direct transfer of level choice and loss-function extensions between the two areas.
  • Clinical trial protocols can adopt elements from both literatures without violating validity.
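As a rough illustration of the flexibility that the first bullet refers to, the sketch below multiplies e-values as data arrive and checks the running product after every observation. It assumes a toy setting that is not taken from the paper: unit-variance Gaussian observations, a null hypothesis of mean 0, and a likelihood-ratio e-value betting on a hypothetical mean `delta`. By Ville's inequality, the probability under the null that the running product ever reaches 1/α is at most α, whenever the analyst decides to stop or continue.

```python
import math
import random


def likelihood_ratio_e_value(x, delta):
    """E-value for one N(0, 1) observation under H0, betting on mean delta."""
    return math.exp(delta * x - 0.5 * delta * delta)


def run_until_rejection(observations, alpha, delta):
    """Multiply e-values as data arrive; stop once the product reaches 1/alpha."""
    wealth = 1.0
    for n, x in enumerate(observations, start=1):
        wealth *= likelihood_ratio_e_value(x, delta)
        if wealth >= 1.0 / alpha:
            return True, n, wealth
    return False, len(observations), wealth


def null_rejection_rate(alpha=0.05, delta=0.5, horizon=200, n_sims=5_000, seed=0):
    """Fraction of simulated null trials that ever cross 1/alpha within the horizon."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        rejected, _, _ = run_until_rejection(data, alpha, delta)
        hits += rejected
    return hits / n_sims


if __name__ == "__main__":
    print("null rejection rate:", null_rejection_rate())
```

The simulated null rejection rate should stay at or below α up to Monte Carlo error, and is typically somewhat smaller because the product overshoots the threshold. That gap is one place where the error-exhausting techniques from adaptive designs mentioned in the second bullet could plausibly help.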

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The mapping may let anytime-validity guarantees move directly into existing adaptive trial software.
  • Hybrid procedures could be built that use e-value optional stopping inside a conditional error function skeleton.
  • Similar equivalences might be checked for other sequential methods such as group-sequential boundaries.

Load-bearing premise

The claimed equivalence holds under the specific constructions of conditional error functions, combination tests, and e-values as defined in their respective literatures.
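For orientation, the textbook forms of the three objects named in this premise can be written as follows. This is a sketch in notation chosen here ($E$, $M_n$, $A$, $C$, $p_1$, $p_2$, $c$), assuming a two-stage design with stage-wise p-values that are uniform under the null; the paper's own definitions may be more general, for example by allowing early stopping boundaries.

```latex
% E-value for a null hypothesis H_0, with the induced level-alpha test
\[
E \ge 0, \qquad \sup_{P \in H_0} \mathbb{E}_P[E] \le 1
\quad\Longrightarrow\quad
\sup_{P \in H_0} P\bigl(E \ge 1/\alpha\bigr) \le \alpha
\quad \text{(Markov's inequality)}.
\]

% Anytime validity: (M_n) a nonnegative supermartingale under H_0 with M_0 = 1
\[
\sup_{P \in H_0} P\bigl(\exists\, n \ge 1 : M_n \ge 1/\alpha\bigr) \le \alpha
\quad \text{(Ville's inequality)}.
\]

% Conditional error function A, with second-stage rejection rule
\[
A : [0,1] \to [0,1], \qquad \int_0^1 A(p_1)\, dp_1 \le \alpha,
\qquad \text{reject } H_0 \iff p_2 \le A(p_1).
\]

% Combination test with combination function C and critical value c,
% and the conditional error function it induces (P_2 uniform on [0,1])
\[
\text{reject } H_0 \iff C(p_1, p_2) \le c,
\qquad
A(p_1) = P\bigl(C(p_1, P_2) \le c\bigr).
\]
```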

What would settle it

A concrete counterexample of a conditional error function or combination test that cannot be rewritten as an e-value (or vice versa) under the paper's definitions would falsify the equivalence.

Original abstract

Confirmatory adaptive designs were introduced more than 30 years ago and enable, for example, sample size re-assessments and the selection of treatments, endpoints as well as subpopulations during the course of a clinical trial. Recently, sequential tests based on e-values for anytime-valid inference have been developed, promising seemingly similar or even more flexibility and utility. In this note, we compare these two independently developed concepts, shedding light on their formal and methodological connections and differences. Specifically, we show that adaptive design tools like conditional error functions and combination tests are formally equivalent to e-value based, anytime-valid sequential tests. However, in spite of their common fundamental intention to bring flexibility into statistical inference, they have quite different emphases: While hypothesis testing with combination tests and conditional error functions usually intends to exhaust type I error rates under the offered flexibility, e-value based testing aims at additional flexibility with regard to optional continuation, the chosen level and, in recent extensions, the loss functions to be controlled. We also indicate how recent e-value achievements could enrich clinical trial methodology and how adaptive design methodology could inspire and improve e-value based testing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript claims that confirmatory adaptive design tools, specifically conditional error functions and combination tests, are formally equivalent to e-value based anytime-valid sequential tests. It contrasts their emphases (exhausting type I error under flexibility versus additional options for continuation, level choice, and loss-function control) and indicates potential cross-enrichment between the literatures.

Significance. If the claimed formal equivalence is established under standard definitions, the note provides a bridge between two bodies of work on flexible inference, which could allow transfer of techniques such as loss-function extensions from e-values into clinical trial designs or adaptive-design ideas into sequential e-value procedures.

minor comments (2)
  1. The abstract asserts the equivalence but the manuscript would benefit from an explicit statement (e.g., in the introduction or a dedicated section) of the precise constructions under which the mapping holds and any restrictions that would break it.
  2. Notation for the mapping between conditional error functions/combination tests and e-values should be introduced once and used consistently to aid readability of the equivalence argument.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; formal equivalence between independent frameworks

full rationale

The paper's central claim is a formal equivalence mapping between two pre-existing families of procedures (conditional error functions/combination tests from adaptive design literature, and e-value based anytime-valid tests) under their standard definitions. No derivation chain reduces a result to its own inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing premise rests on a self-citation chain. The abstract and described contribution treat the two bodies of work as independently developed, with the paper only exhibiting the mapping and noting differing emphases. This is a self-contained observation of equivalence rather than a constructed result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on standard definitions from probability theory and existing statistical methodology for e-values and adaptive designs; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)
  • standard math: Standard axioms of probability and the mathematical definitions of e-values, conditional error functions, and combination tests as established in prior literature.
    The equivalence mapping depends on these background definitions.


