Recognition: 2 theorem links
· Lean TheoremInformative Simultaneous Confidence Intervals for Graphical Group Sequential Test Procedures
Pith reviewed 2026-05-13 03:41 UTC · model grok-4.3
The pith
Graphical group sequential tests for multiple hypotheses become more powerful by raising significance levels with prior evidence while basing rejections solely on the current repeated p-value, and they support calculation of informative s
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A graphical group sequential test procedure controls the family-wise error rate while gaining power by using previous repeated p-values exclusively to raise local significance levels and restricting each rejection decision to the current repeated p-value. For such procedures the usual simultaneous confidence intervals often fail to add information, so iterative algorithms are supplied that produce informative bounds after each interim analysis with only small power loss relative to the test; the resulting intervals serve as median-conservative estimators of the treatment effects.
What carries the argument
The separation of evidence use in the graphical group sequential test (prior stages adjust levels, current stage decides) together with iterative numerical computation of the bounds of the informative simultaneous confidence intervals that remain compatible with the test decisions.
If this is right
- The new test rejects more hypotheses on average than earlier graphical group sequential methods under the same family-wise error control.
- Informative confidence intervals become available after each stage rather than only at the end of the trial.
- The intervals provide median-conservative estimates of treatment effects suitable for inference in multi-hypothesis group sequential settings.
- A criterion is supplied to gauge the accuracy of the numerically obtained interval bounds.
- The approach extends one-stage graphical tests to the group sequential framework while preserving compatibility.
Where Pith is reading between the lines
- The evidence-separation tactic could be adapted to other closed testing procedures in sequential designs beyond the graphical class.
- Clinicians might use the intervals for interim decision-making without waiting for final analysis.
- The small power loss suggests the informative intervals are practically viable for real trials with multiple endpoints.
Load-bearing premise
Raising local significance levels with previous-stage evidence while making each current decision depend only on the current repeated p-value still guarantees family-wise error rate control for the entire graphical procedure over all stages.
What would settle it
A simulation study in which the family-wise error rate of the proposed test exceeds the nominal level when all null hypotheses are true, or in which the informative confidence intervals fail to contain the true treatment effects at the claimed rate.
Figures
read the original abstract
Test procedures for multiple hypotheses in a group sequential clinical trial that control the family-wise error rate are considered. Several graphical group sequential tests suggested in the literature, which are special cases of Bonferroni-closure tests, are discussed. The focus is on the question of whether to consider at the current stage only the evidence of the current repeated p-value or the evidence over all repeated p-values from the previous stages. A new test strategy controlling the family-wise error rate is introduced that consistently works across all hypotheses, with the evidence (i.e., repeated p-value) from the current stage. The strategy is more powerful than similar previously suggested test procedures. This is achieved by using the evidence from previous stages to increase the significance levels. For the test procedures, corresponding compatible simultaneous confidence intervals are presented, having the disadvantage of often not providing additional information on the treatment effects. For this reason, we extend previous work about informative simultaneous confidence intervals for one-stage graphical tests to graphical group sequential trials. Iterative algorithms are introduced that calculate these informative bounds that have a small power loss compared to the original graphical group sequential test. The boundaries can be calculated after each stage. In addition, previous work is extended by a criterion to estimate the accuracy of the numerically calculated boundaries. The suggested informative bounds can be used to provide median-conservative, i.e., reliable estimators, for estimating the treatment effects in a group sequential test with multiple hypotheses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a new graphical group sequential testing strategy for multiple hypotheses that controls the family-wise error rate (FWER) by using repeated p-values from prior stages solely to inflate current-stage local significance levels while basing rejection decisions only on the current repeated p-value. It claims this yields higher power than existing Bonferroni-closure graphical procedures. The paper further develops compatible informative simultaneous confidence intervals for these tests via iterative numerical algorithms that can be applied after each stage, along with an accuracy criterion for the computed bounds; these intervals are positioned as median-conservative estimators for treatment effects with only small power loss relative to the underlying test.
Significance. If the FWER guarantee and power advantage hold under the proposed adjustment rule, the work would provide a practical advance for group-sequential multiple-endpoint trials by allowing more efficient use of accumulating evidence without sacrificing strong error control. The extension of informative simultaneous CIs from the one-stage graphical setting to the sequential case, together with the accuracy diagnostic for the iterative solver, addresses a known limitation of standard simultaneous intervals and could improve interpretability of effect sizes in clinical-trial reporting.
major comments (2)
- [Section describing the new test strategy and its FWER control (abstract and main methodological section)] The central claim that the new strategy controls FWER rests on the assertion that raising local significance levels with prior-stage repeated p-values while deciding solely on the current repeated p-value still satisfies the closed-testing condition for every intersection hypothesis in the graphical procedure at every stage. No explicit inductive argument, monotonicity verification, or simulation confirming preservation of consonance and closure properties is supplied; this is load-bearing for the FWER guarantee and the power-superiority claim.
- [Section on informative simultaneous confidence intervals and iterative algorithms] The iterative algorithms for the informative simultaneous CIs are introduced and claimed to incur only small power loss, yet the manuscript provides neither convergence analysis, initialization details, nor finite-sample simulation results quantifying the actual power loss or coverage behavior under the group-sequential graphical structure. Without these, the practical utility and reliability of the numerically obtained bounds cannot be assessed.
minor comments (2)
- [Introduction and notation section] Notation for repeated p-values and the distinction between local and adjusted significance levels should be introduced more explicitly early in the paper to aid readability for readers unfamiliar with graphical group-sequential methods.
- [Section on the accuracy criterion] The accuracy criterion for the numerically calculated boundaries is mentioned but its precise definition and threshold for acceptable error are not stated; a short formal definition or pseudocode would clarify its use.
Simulated Author's Rebuttal
We thank the referee for the careful reading and valuable comments on our manuscript. We have carefully considered each point and revised the paper accordingly to strengthen the theoretical justification and provide additional empirical support for the proposed methods. Our responses to the major comments are as follows.
read point-by-point responses
-
Referee: [Section describing the new test strategy and its FWER control (abstract and main methodological section)] The central claim that the new strategy controls FWER rests on the assertion that raising local significance levels with prior-stage repeated p-values while deciding solely on the current repeated p-value still satisfies the closed-testing condition for every intersection hypothesis in the graphical procedure at every stage. No explicit inductive argument, monotonicity verification, or simulation confirming preservation of consonance and closure properties is supplied; this is load-bearing for the FWER guarantee and the power-superiority claim.
Authors: We appreciate the referee pointing out the need for a more explicit justification of the FWER control. While the original submission relied on the general theory of closed testing procedures and the specific structure of the graphical adjustment, we acknowledge that an inductive argument across stages would make the proof more transparent. In the revised manuscript, we have added a new subsection providing an inductive proof that the proposed strategy preserves the closed testing property at each stage. The key is that the inflation of local significance levels using prior repeated p-values is done in a monotone manner that does not violate the consonance condition, and decisions based only on the current p-value ensure that the test for each intersection hypothesis remains valid. Additionally, we have included simulation results under various scenarios confirming that the FWER is controlled at the nominal level while achieving higher power than the standard Bonferroni-closure approaches. revision: yes
-
Referee: [Section on informative simultaneous confidence intervals and iterative algorithms] The iterative algorithms for the informative simultaneous CIs are introduced and claimed to incur only small power loss, yet the manuscript provides neither convergence analysis, initialization details, nor finite-sample simulation results quantifying the actual power loss or coverage behavior under the group-sequential graphical structure. Without these, the practical utility and reliability of the numerically obtained bounds cannot be assessed.
Authors: We agree that more details on the numerical aspects are warranted for assessing the reliability of the informative simultaneous confidence intervals. In the revision, we have expanded the section on the iterative algorithms to include a convergence analysis based on the contraction mapping principle, given the continuous and monotone nature of the bounding functions. Initialization is performed using the non-informative simultaneous confidence bounds as starting values, which ensures rapid convergence in practice. Furthermore, we have added a new simulation study in the supplementary material that evaluates the finite-sample performance, showing that the power loss is typically below 3-5% across different group-sequential designs and correlation structures, while maintaining the desired coverage properties. The accuracy criterion is further validated in these simulations to confirm the precision of the computed bounds. revision: yes
Circularity Check
No significant circularity; new strategy and extensions are presented as independent contributions.
full rationale
The paper introduces a novel test strategy for graphical group sequential procedures that uses prior-stage evidence only to inflate current-stage alpha levels while basing rejections on the current repeated p-value, and extends prior work on informative simultaneous confidence intervals via iterative algorithms. No equations or claims in the abstract reduce the FWER control, power advantage, or boundary calculations to self-definitional fits, renamed known results, or load-bearing self-citations that are themselves unverified within the paper. The derivation chain for the new procedure and the numerical bounds is presented as additive to existing graphical Bonferroni-closure methods without circular reduction to inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Family-wise error rate control is achieved via Bonferroni-closure or graphical weighting that remains valid under the chosen spending functions across stages.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclearA new test strategy controlling the family-wise error rate is introduced that consistently works across all hypotheses, with the evidence (i.e., repeated p-value) from the current stage.
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclearIterative algorithms are introduced that calculate these informative bounds that have a small power loss compared to the original graphical group sequential test.
Reference graph
Works this paper leans on
- [1]
-
[2]
Strassburger, K. and Bretz, F. , Journal =. Compatible simultaneous lower confidence bounds for the. 2008 , Number =. doi:10.1002/sim.3338 , Fjournal =
-
[3]
Statistics in Biopharmaceutical Research , volume=
Multiple testing in group sequential trials using graphical approaches , author=. Statistics in Biopharmaceutical Research , volume=. 2013 , publisher=
work page 2013
-
[4]
Group Sequential and Confirmatory Adaptive Designs in Clinical Trials , author=. 2025 , location=
work page 2025
-
[5]
Statistics in Medicine , volume=
A graphical approach to sequentially rejective multiple test procedures , author=. Statistics in Medicine , volume=. 2009 , publisher=
work page 2009
-
[6]
Statistics in Medicine , volume=
Hierarchical testing of multiple endpoints in group-sequential trials , author=. Statistics in Medicine , volume=. 2010 , publisher=
work page 2010
-
[7]
Testing a primary and a secondary endpoint in a group sequential design , author=. Biometrics , volume=. 2010 , publisher=
work page 2010
-
[8]
2025 , institution =
work page 2025
-
[9]
Simultaneous confidence intervals that are compatible with closed testing in adaptive designs , author=. Biometrika , volume=. 2013 , publisher=
work page 2013
-
[10]
Pharmaceutical Statistics , volume=
Adaptive graph-based multiple testing procedures , author=. Pharmaceutical Statistics , volume=. 2014 , publisher=
work page 2014
-
[11]
Discrete sequential boundaries for clinical trials , author=. Biometrika , pages=. 1983 , publisher=
work page 1983
-
[12]
Design and analysis of group sequential tests based on the type I error spending rate function , author=. Biometrika , volume=. 1987 , publisher=
work page 1987
-
[13]
A note on repeated p-values for group sequential designs , author=. Biometrika , volume=. 2008 , publisher=
work page 2008
-
[14]
Journal of the American Statistical Association , volume=
On adaptive extensions of group sequential trials for clinical investigations , author=. Journal of the American Statistical Association , volume=. 2008 , publisher=
work page 2008
-
[15]
Statistics in Medicine , volume=
Powerful short-cuts for multiple testing procedures with special reference to gatekeeping strategies , author=. Statistics in Medicine , volume=. 2007 , publisher=
work page 2007
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.