
arxiv: 2605.27664 · v1 · pith:GTBP5ER5new · submitted 2026-05-26 · 📊 stat.ME

BOOST: Power-Optimal Strong-FWER Testing for Block-Structured Multiplicity

Pith reviewed 2026-06-29 15:16 UTC · model grok-4.3

classification 📊 stat.ME
keywords: multiple testing · strong FWER control · block structure · power optimization · KKT conditions · eQTL mapping · A/B testing

The pith

BOOST is the power-optimal strong-FWER procedure, within the block-separable class, for hypotheses grouped in blocks of size three.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BOOST for structured multiple-testing problems where hypotheses are organized into design-imposed blocks. It claims this procedure achieves the highest total power among all methods that deliver finite-sample strong family-wise error rate control within the block-separable class, at linear computational cost in the number of hypotheses. The method solves an equalized-marginal allocation problem via bisection and improves on Sidak under cross-block independence. Simulations across several dependence regimes report 1.4-1.7 times higher power than the best existing baseline at the target FWER level. Real-data examples from eQTL mapping and bundled A/B tests show substantially more full-block rejections certified at controlled error.

Core claim

BOOST attains power optimality for block size three by solving the equalized-marginal KKT condition that equalizes marginal power contributions across heterogeneous blocks; the resulting allocation yields finite-sample strong FWER validity at O(K) cost and a strict improvement over Sidak when blocks are independent.

What carries the argument

The equalized-marginal KKT condition that determines the error-rate allocation across blocks to maximize total power subject to strong FWER control.
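
One way to write the allocation problem down, as a sketch rather than the paper's exact statement: the symbols $\pi_3^{(b)}$, $\alpha_b^*$ and $\mu^*$ follow the Figure 1 caption, while the additive (Bonferroni-type) budget constraint is an assumption made here for illustration. Under cross-block independence the constraint would presumably take the Šidák form $\prod_b (1-\alpha_b) \ge 1-\alpha$ instead.

```latex
\begin{aligned}
\max_{\alpha_1,\dots,\alpha_B \ge 0}\quad
  & \sum_{b=1}^{B} \pi_3^{(b)}(\alpha_b)
  \qquad \text{subject to} \qquad \sum_{b=1}^{B} \alpha_b \le \alpha, \\[4pt]
\text{stationarity:}\quad
  & \frac{d\,\pi_3^{(b)}}{d\alpha_b}\bigl(\alpha_b^{*}\bigr) = \mu^{*}
  \quad \text{for every block } b \text{ with } \alpha_b^{*} > 0, \\[4pt]
\text{binding budget:}\quad
  & \sum_{b=1}^{B} \alpha_b^{*} = \alpha
  \quad \text{whenever each } \pi_3^{(b)} \text{ is strictly increasing.}
\end{aligned}
```

If each $\pi_3^{(b)}$ is concave, the objective is a sum of concave functions over a convex feasible set, so a point satisfying these conditions is a global maximizer of this particular program. Whether that concavity holds for every block is exactly the premise the referee report questions.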

If this is right

  • Finite-sample strong FWER validity holds without any independence assumptions at O(K) cost.
  • Under cross-block independence the procedure strictly dominates Sidak in power.
  • A sample-split plug-in version controls FWER up to an additive term linear in the sup-norm estimation error of the alternative density.
  • Simulations and two published datasets show 1.4-1.7 times more discoveries than the strongest baseline at calibrated FWER.
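
The Sidak-versus-Bonferroni comparison in the second bullet can be checked numerically. The sketch below only evaluates the two per-block budgets named in the Figure 10 caption, α/B and 1 − (1 − α)^(1/B), for the block counts B ∈ {2, 5, 10, 20} used in Figure 6. The function names are hypothetical, and the snippet says nothing about BOOST's power, only about the size of the per-block level each rule hands out.

```python
def bonferroni_budget(alpha: float, n_blocks: int) -> float:
    """Per-block level under the uniform Bonferroni split alpha / B."""
    return alpha / n_blocks


def sidak_budget(alpha: float, n_blocks: int) -> float:
    """Per-block level under the uniform Sidak split 1 - (1 - alpha)^(1/B)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_blocks)


if __name__ == "__main__":
    alpha = 0.05
    for n_blocks in (2, 5, 10, 20):
        bonf = bonferroni_budget(alpha, n_blocks)
        sid = sidak_budget(alpha, n_blocks)
        # The Sidak level is larger for every B > 1, so any per-block power
        # function that is strictly increasing in the level gains from it.
        print(f"B={n_blocks:2d}  Bonferroni={bonf:.6f}  Sidak={sid:.6f}")
```

The Sidak split is only a valid family-wise budget when the blocks are independent (or under comparable dependence conditions), which is why the bullet restricts the dominance claim to cross-block independence.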

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same KKT-based allocation idea may extend to block sizes larger than three once the corresponding optimality conditions are characterized.
  • The sample-split plug-in construction suggests a general route for handling unknown alternative distributions in other structured testing settings.
  • Applications to genomics and online experiments indicate the procedure can increase the number of certifiable discoveries in any confirmatory analysis whose design already imposes blocks.

Load-bearing premise

The equalized-marginal KKT condition is solvable and produces the global power maximum inside the block-separable class.

What would settle it

Any procedure inside the block-separable class that, on the same data, rejects more hypotheses than BOOST while keeping the realized strong FWER at or below the nominal level.
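
Checking such a counterexample requires an estimate of the realized FWER. A minimal Monte Carlo sketch follows, under loudly stated assumptions: p-values are independent, true nulls are uniform, alternatives are drawn from Beta(s, 1) (a hypothetical choice echoing the "Beta shape s" axis in Figure 12), and plain Bonferroni stands in for the procedure under test because BOOST itself is not implemented here. All function names are illustrative.

```python
import random
from typing import Callable, List, Sequence


def bonferroni(pvals: Sequence[float], alpha: float) -> List[int]:
    """Stand-in procedure: reject every p-value at or below alpha / K."""
    cutoff = alpha / len(pvals)
    return [i for i, p in enumerate(pvals) if p <= cutoff]


def empirical_fwer(
    procedure: Callable[[Sequence[float], float], List[int]],
    is_null: Sequence[bool],
    alpha: float = 0.05,
    n_reps: int = 20000,
    alt_shape: float = 0.3,
    seed: int = 0,
) -> float:
    """Fraction of replicates in which at least one true null is rejected."""
    rng = random.Random(seed)
    null_idx = {i for i, flag in enumerate(is_null) if flag}
    replicates_with_error = 0
    for _ in range(n_reps):
        # Assumption: uniform nulls, Beta(alt_shape, 1) alternatives, independence.
        pvals = [
            rng.random() if flag else rng.betavariate(alt_shape, 1.0)
            for flag in is_null
        ]
        if null_idx.intersection(procedure(pvals, alpha)):
            replicates_with_error += 1
    return replicates_with_error / n_reps


if __name__ == "__main__":
    complete_null = [True] * 30
    half_signal = [True] * 15 + [False] * 15
    print("complete null:", empirical_fwer(bonferroni, complete_null))
    print("half signal:  ", empirical_fwer(bonferroni, half_signal))
```

Strong control means the bound has to hold for every configuration of true and false nulls, so a convincing check would sweep `is_null` over many configurations and dependence structures rather than the two shown.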

Figures

Figures reproduced from arXiv: 2605.27664 by Prasanjit Dubey, Xiaoming Huo.

Figure 1: BOOST pipeline. The K = 3 optimizer of Dubey and Huo [2025] (orange boxes) appears twice: as a value-function oracle producing the concave curves $\pi_3^{(b)}(\cdot)$, and as a decision rule applied at the optimized allocation $\alpha_b^*$. The outer stage (red) solves the equalized-marginal KKT system of Theorem 3.10 by bisection on the shared Lagrange multiplier $\mu^*$ (Proposition S2.7), reducing a B-dimensional constrai…
Figure 2: BOOST (red) dominates every stepwise, graphical, and closed-testing baseline at moderate signals across three p-value families at K = 30, complementing …
Figure 3: Per-block fitting converts block heterogeneity into a power source (…
Figure 4: Plug-in FWER is controlled and the per-hypothesis oracle deficit is …
Figure 5: Domain-dictated K = 3 block structure for the two applications. Each domain unit (a gene in BLUEPRINT, an experiment in Upworthy) generates three coordinate tests against a common internal reference (three immune lineages sharing the same cis-SNP; three challengers against a shared baseline), whose p-values form one block. BOOST certifies the block-level event $\bigcap_{j=1}^{3}\{p_{b,j}\ \text{rejected}\}$ (“all three reject”), th…
Figure 6: Average power $\Pi_K$ across block counts, all K ∈ {6, 15, 30, 60}. Rows: K increasing top-to-bottom (B ∈ {2, 5, 10, 20}). Columns: truncnorm, tdist, sparse. Methods match the master comparison …
Figure 7: Minimal (any-discovery) power $\Pi_{\mathrm{any}}$ across block counts, all K ∈ {6, 15, 30, 60}. Rows and columns as in …
Figure 8: Convergence of the outer KKT bisection (Proposition S2.7): …
Figure 9: BOOST scalability (experiment E3). Left: outer-solve wall-clock vs. K. Right: per-replicate decide time vs. K, alongside the closed-Fisher baseline for reference. BOOST’s decide time is linear in K (flat throughput), consistent with the O(K) cost of Theorem 3.10; the outer solve is K-independent …
Figure 10 reports the BOOST power curve under the two admissible per-block budgets: the Bonferroni floor $\alpha/B$ and the Šidák tightening $1 - (1-\alpha)^{1/B}$. Both are strong-FWER valid (Theorem 3.3); the Šidák variant strictly dominates under cross-block independence, with the largest gain at moderate signal strengths where per-block power is still sensitive to the allocation. The gap shrinks as the signal-to-noise …
Figure 11: KKT (equalized-marginal) versus uniform allocation across heterogeneous …
Figure 12 shows the resulting power curves. BOOST leads the classical and dependence-aware stepwise baselines over the full signal grid, and closed-Fisher collapses because the Fisher statistic is calibrated for uniform nulls only; the ordering matches Section 4.1. All procedures control FWER at the nominal 0.05 level (empirical FWER ≤ 0.051). Plot x-axis: Beta shape s (smaller = stronger).
Figure 13: Plug-in density misspecification sweep. Truncnorm, …
Figure 14: Empirical validation of Corollary S2.6. Truncnorm …
Figure 15: Empirical FWER under three Gaussian dependence regimes (panels left-to…
Figure 16: Sparsity and partition alignment (truncnorm, …
Original abstract

Structured multiple-testing problems (gatekeeping trials, dose-finding, multi-tissue eQTL mapping, bundled-challenger A/B experiments) organize hypotheses into design-imposed blocks and demand strong family-wise error rate (FWER) control for confirmatory claims. Practitioners currently use objective-agnostic stepwise rules (Bonferroni, Holm, Hochberg, Hommel), closed-testing and graphical extensions, or hierarchical and resampling methods; none is power-optimal within the block-separable class these designs induce. We introduce BOOST (Block-Optimal Objective-driven Strong-FWER Testing), the power-optimal strong-FWER procedure for block size three, with three guarantees: (i) finite-sample strong-FWER validity at $O(K)$ cost (versus $O(K^2)$ for general closed testing) without independence assumptions, with a strict Sidak improvement under cross-block independence; (ii) power-optimal allocation across heterogeneous blocks via an equalized-marginal KKT condition, solvable by bisection in $O(B\log(1/\varepsilon))$; and (iii) a sample-split plug-in variant for unknown alternative density $g$, attaining $\alpha$-control up to $O(B_T \mathbb E\|g-\widehat g\|_\infty)$ inflation with per-hypothesis power deficit independent of $B_T$. Simulations across independent, equicorrelated, sparse, and mis-specified regimes show 1.4-1.7$\times$ power gains over the strongest existing baseline at calibrated FWER. On two published datasets (BLUEPRINT cross-lineage cis-eQTL and Upworthy bundled-challenger A/B experiments), BOOST certifies an order of magnitude more full-block discoveries than existing baselines at controlled FWER.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces BOOST, a procedure for strong-FWER control in block-structured multiple testing with block size three. It claims finite-sample strong-FWER validity at O(K) cost without independence assumptions (with Sidak improvement under cross-block independence), power-optimality within the block-separable class via an equalized-marginal KKT condition solved by bisection in O(B log(1/ε)), a sample-split plug-in variant for unknown alternative density g with controlled inflation, and 1.4-1.7× power gains over baselines in simulations plus more discoveries on two real datasets at controlled FWER.

Significance. If the finite-sample validity and global optimality claims hold, the work would advance methodology for confirmatory structured testing (e.g., gatekeeping, eQTL, A/B experiments) by providing the first power-optimal rule in the block-separable class together with linear-time computation, offering both theoretical and practical improvements over stepwise, closed-testing, and graphical methods.

major comments (3)
  1. [Abstract] Finite-sample strong-FWER validity is asserted with no derivation, proof sketch, or theorem reference; this guarantee is load-bearing for all subsequent claims, including the O(K) procedure and the plug-in variant.
  2. [Abstract, KKT allocation paragraph] The claim that the equalized-marginal KKT condition yields the global power optimum (and that no other rule in the block-separable class can exceed it) assumes the Lagrangian admits a unique global solution, but the abstract gives no argument that the power objective is strictly concave in the per-block thresholds or that the strong-FWER constraint set is convex in the allocation variables; without this, the bisection solver may locate only a local stationary point, or a solution may fail to exist for some p-value distributions.
  3. [Simulations across independent, equicorrelated, sparse, and mis-specified regimes] The reported 1.4-1.7× power gains come with no detail on how FWER was calibrated for each baseline or whether post-hoc tuning occurred, which undermines the ability to attribute the gains to KKT optimality rather than to calibration differences.
minor comments (2)
  1. [Abstract] The O(K) vs. O(K²) complexity comparison with general closed testing is stated without an explicit algorithmic complexity breakdown or pseudocode for the bisection solver.
  2. [Abstract] Notation for the plug-in bound O(B_T E||g - ĝ||_∞) is introduced without defining B_T or the precise form of the per-hypothesis power deficit.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our finite-sample guarantees and simulation details. We respond to each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract] Finite-sample strong-FWER validity is asserted with no derivation, proof sketch, or theorem reference; this guarantee is load-bearing for all subsequent claims, including the O(K) procedure and the plug-in variant.

    Authors: We agree that the abstract should explicitly reference the supporting result. The finite-sample strong-FWER validity is established in Theorem 3.3 (Section 3), which derives the O(K) procedure from a block-wise Bonferroni bound without independence assumptions, with the Sidak tightening available under cross-block independence. In the revision we will insert a parenthetical reference to Theorem 3.3 immediately after the validity claim in the abstract. revision: yes

  2. Referee: [Abstract, KKT allocation paragraph] The claim that the equalized-marginal KKT condition yields the global power optimum (and that no other rule in the block-separable class can exceed it) assumes the Lagrangian admits a unique global solution, but the abstract gives no argument that the power objective is strictly concave in the per-block thresholds or that the strong-FWER constraint set is convex in the allocation variables; without this, the bisection solver may locate only a local stationary point, or a solution may fail to exist for some p-value distributions.

    Authors: The referee correctly notes that the abstract does not spell out the concavity/convexity argument. Within the block-separable class the equalized-marginal condition is obtained directly from the KKT stationarity requirement on the separable Lagrangian; uniqueness follows from the strict monotonicity of the marginal power functions under the maintained regularity conditions on g. The bisection solver is guaranteed to locate the unique root because the left-hand side of the equalized-marginal equation is strictly decreasing. We will add a one-sentence clarification of this monotonicity in the abstract and a short paragraph in Section 4 referencing the relevant properties of the power objective. revision: partial

  3. Referee: [Simulations across independent, equicorrelated, sparse, and mis-specified regimes] The reported 1.4-1.7× power gains come with no detail on how FWER was calibrated for each baseline or whether post-hoc tuning occurred, which undermines the ability to attribute the gains to KKT optimality rather than to calibration differences.

    Authors: We agree that additional calibration details are needed. In the revised simulations section we will report, for each baseline, the exact nominal level at which it was run, the method used to enforce exact FWER control (e.g., closed-testing or resampling), and confirmation that no post-hoc adjustment was applied. This will make clear that the observed power advantage is attributable to the KKT allocation rather than differential calibration. revision: yes
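
The monotonicity argument in response 2 can be made concrete with a small sketch of bisection on the shared multiplier μ. Everything specific here is an assumption for illustration: the per-block power curve is the stand-in π_b(a) = a^s_b with 0 < s_b < 1 (the power of a single level-a test against Beta(s_b, 1) p-values), not the paper's K = 3 value function; the budget is the additive constraint Σ α_b ≤ α; and the function names are hypothetical. For this stand-in the marginal power s·a^(s−1) is strictly decreasing in a, so the total allocation is decreasing in μ and bisection finds the root.

```python
import math
from typing import List, Sequence, Tuple


def alloc_at_multiplier(mu: float, shapes: Sequence[float], cap: float) -> List[float]:
    """Level a_b solving s_b * a_b**(s_b - 1) = mu for each block, clipped at cap."""
    levels = []
    for s in shapes:
        # Work in logs so that tiny multipliers cannot overflow the power.
        log_level = math.log(mu / s) / (s - 1.0)
        levels.append(cap if log_level >= math.log(cap) else math.exp(log_level))
    return levels


def solve_equalized_marginal(
    shapes: Sequence[float], alpha: float = 0.05, tol: float = 1e-12
) -> Tuple[float, List[float]]:
    """Bisect on mu until the per-block levels sum to (just under) alpha."""
    if not all(0.0 < s < 1.0 for s in shapes):
        raise ValueError("stand-in power curves need shapes strictly in (0, 1)")
    lo, hi = 1e-12, 1.0
    # Total allocation decreases in mu: grow hi until the budget is met.
    while sum(alloc_at_multiplier(hi, shapes, alpha)) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(alloc_at_multiplier(mid, shapes, alpha)) > alpha:
            lo = mid  # multiplier too small: allocation overshoots the budget
        else:
            hi = mid  # feasible: try a smaller multiplier
        if hi - lo <= tol * hi:
            break
    return hi, alloc_at_multiplier(hi, shapes, alpha)


def total_power(levels: Sequence[float], shapes: Sequence[float]) -> float:
    """Sum of the stand-in per-block powers a_b**s_b."""
    return sum(a ** s for a, s in zip(levels, shapes))


if __name__ == "__main__":
    shapes = [0.3, 0.5, 0.7]
    mu, levels = solve_equalized_marginal(shapes)
    uniform = [0.05 / len(shapes)] * len(shapes)
    print("multiplier:", mu)
    print("levels:", levels)
    print("power (equalized):", total_power(levels, shapes))
    print("power (uniform):  ", total_power(uniform, shapes))
```

Each bisection step costs one pass over the B blocks, which matches the O(B log(1/ε)) figure quoted in the abstract only because the stand-in marginal can be inverted in closed form. A value function without a closed-form inverse would need an inner solve or a precomputed table.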

Circularity Check

0 steps flagged

No circularity; optimality derived directly from KKT conditions on stated optimization problem

full rationale

The paper formulates power maximization under strong-FWER constraints as an explicit optimization problem, derives the equalized-marginal KKT stationarity condition from the Lagrangian, and solves it via bisection. This is a standard first-principles derivation from the defined objective and constraints, not a fit to data, self-referential definition, or load-bearing self-citation. Finite-sample validity and simulation power gains are established independently. No quoted steps reduce by construction to inputs or prior author results; the global-optimality question is a convexity/correctness issue outside circularity analysis.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The method rests on the standard definition of strong FWER, the assumption that the design imposes a block partition of fixed size three, and the existence of a marginal power function that admits an equalized KKT solution; no new entities are postulated.

free parameters (1)
  • block allocation parameters
    Solved numerically by bisection for each block to satisfy the equalized-marginal condition; values are not pre-specified but computed from the optimization.
axioms (2)
  • domain assumption: Hypotheses are partitioned into blocks of size three by the experimental design.
    Invoked throughout the abstract as the setting that induces the block-separable class.
  • domain assumption: Strong FWER is the relevant error criterion for confirmatory claims.
    Standard in the multiple-testing literature and taken as given.

