PRADAS: PRior-Assisted DAta Splitting for False Discovery Rate Control
Pith reviewed 2026-05-10 02:01 UTC · model grok-4.3
The pith
A Bayes-optimal mirror statistic from prior information, combined with optional stopping for split ratios, yields higher power while asymptotically controlling FDR.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Within the class of mirror statistics for any fixed splitting scheme, the Bayes-optimal choice can be derived from the prior. A two-stage procedure using this choice controls the false discovery rate asymptotically under weak dependence, and the Bayes-optimal statistic has a theoretical power advantage in the Rare/Weak model. PRADAS then recasts the split ratio as a stopping time whose optimal rule is characterized by the Snell envelope and computed by Longstaff-Schwartz regression. Simulations and real-data examples support the framework.
What carries the argument
The Bayes-optimal mirror statistic together with the Snell envelope that characterizes the optimal stopping time for the split ratio.
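For orientation, the textbook finite-horizon form of the Snell envelope is sketched below. The reward process $Z_n$, the horizon $N$, and the filtration $\mathcal{F}_n$ are generic placeholders; this summary does not state the paper's actual reward or filtration, so the notation is an assumption rather than the authors' formulation.

```latex
% Z_n: reward for stopping at step n (hypothetical notation), adapted to F_n
U_N = Z_N, \qquad
U_n = \max\bigl\{ Z_n,\ \mathbb{E}[\,U_{n+1} \mid \mathcal{F}_n\,] \bigr\},
\quad n = N-1, \dots, 0,
\qquad
\tau^\ast = \min\{\, n \ge 0 : U_n = Z_n \,\}.
```

The rule $\tau^\ast$ stops the first time the immediate reward matches the value of continuing optimally, which is why the conditional expectation is the quantity that needs approximating in practice.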
If this is right
- Asymptotic FDR control holds for the two-stage procedure under the stated weak-dependence conditions.
- Theoretical power gains appear in the Rare/Weak signal model.
- The optimal stopping rule is computable in practice via Longstaff-Schwartz regression.
- Simulations and real data examples confirm the overall effectiveness of the resulting PRADAS framework.
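The computability claim in the list above rests on the Longstaff-Schwartz idea of replacing a conditional expectation with a least-squares regression on simulated paths. The following is a minimal sketch of that idea on a toy problem, not the paper's implementation: the scalar random-walk state, the reward `max(x, 0) - 0.1 * n`, the polynomial basis, and the function name `longstaff_schwartz` are all illustrative assumptions, and discounting is omitted.

```python
import numpy as np

def longstaff_schwartz(paths, reward, degree=2):
    """Approximate an optimal stopping rule by backward regression.

    paths  : array (n_paths, n_times) of a simulated scalar Markov state.
    reward : callable (n, x) -> reward for stopping at step n in state x.
    Returns the estimated time-0 value and a stopping step for each path.
    """
    n_paths, n_times = paths.shape
    last = n_times - 1
    # At the horizon there is no choice left: every path must stop.
    value = np.asarray(reward(last, paths[:, last]), dtype=float)
    stop = np.full(n_paths, last)
    for n in range(last - 1, 0, -1):
        x = paths[:, n]
        immediate = np.asarray(reward(n, x), dtype=float)
        # Regress the realized future reward on a polynomial in the state;
        # the fitted values approximate the continuation value.
        coeffs = np.polyfit(x, value, degree)
        continuation = np.polyval(coeffs, x)
        exercise = immediate >= continuation
        value = np.where(exercise, immediate, value)
        stop = np.where(exercise, n, stop)
    # At step 0 all paths share one state, so compare plain averages.
    immediate0 = float(np.mean(reward(0, paths[:, 0])))
    continuation0 = float(value.mean())
    if immediate0 >= continuation0:
        return immediate0, np.zeros(n_paths, dtype=int)
    return continuation0, stop

# Toy usage: Gaussian random walk started at 0, hypothetical reward.
rng = np.random.default_rng(0)
steps = rng.normal(size=(2000, 5))
paths = np.concatenate([np.zeros((2000, 1)), np.cumsum(steps, axis=1)], axis=1)
est_value, stop_times = longstaff_schwartz(
    paths, lambda n, x: np.maximum(x, 0.0) - 0.1 * n
)
print(est_value, np.bincount(stop_times, minlength=6))
```

Because the same paths are used to fit the regression and to evaluate the rule, the value estimate is in-sample; an independent set of paths would give a cleaner estimate.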
Where Pith is reading between the lines
- The connection between mirror statistics and optional stopping suggests that the same Snell-envelope approach could be applied to other sequential testing procedures in which split or threshold choices affect power.
- If the prior is only partially accurate, the two-stage FDR guarantee may still provide robustness while the stopping rule adapts to whatever information is present.
- The power advantage in sparse-signal settings implies potential gains in high-dimensional applications, such as genomics, where priors from previous studies are routinely available.
Load-bearing premise
The supplied prior information is accurate enough to produce a useful Bayes-optimal mirror statistic, and the test statistics satisfy mild weak-dependence conditions.
What would settle it
A simulation in the Rare/Weak model in which the PRADAS procedure shows no power improvement over fixed equal splitting while the empirical FDR exceeds the nominal level would falsify the claimed advantage and control.
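A baseline for such a check could look like the sketch below. It covers only the fixed equal-split comparator, using a generic mirror statistic `sign(z1 * z2) * (|z1| + |z2|)` in a two-fold normal-means setup; the Bayes-optimal statistic and the PRADAS stopping rule are not specified in this summary and are not implemented here. The parameter values, the function names, and the convention of counting ties with `<=` and `>=` are assumptions made for illustration.

```python
import numpy as np

def mirror_threshold(m, q):
    """Smallest candidate t whose estimated FDP is at most q (inf if none)."""
    for t in np.sort(np.abs(m[m != 0])):
        # Null mirror statistics are symmetric about zero, so the left tail
        # count serves as an estimate of false positives in the right tail.
        fdp_hat = np.sum(m <= -t) / max(np.sum(m >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf

def simulate(p=2000, beta=0.4, r=1.5, gamma=0.5, q=0.1, seed=0):
    """One Rare/Weak replication; returns (realized FDP, power)."""
    rng = np.random.default_rng(seed)
    n_signal = max(int(round(p ** (1 - beta))), 1)  # sparsity p^(-beta)
    mu = np.zeros(p)
    mu[:n_signal] = np.sqrt(2 * r * np.log(p))      # full-data signal strength
    # Split ratio gamma: each part carries its share of the information.
    z1 = np.sqrt(gamma) * mu + rng.normal(size=p)
    z2 = np.sqrt(1 - gamma) * mu + rng.normal(size=p)
    m = np.sign(z1 * z2) * (np.abs(z1) + np.abs(z2))
    selected = m >= mirror_threshold(m, q)
    fdp = np.sum(selected & (mu == 0)) / max(selected.sum(), 1)
    power = np.sum(selected & (mu != 0)) / n_signal
    return fdp, power

# Average over a few replications to approximate FDR and expected power.
results = [simulate(seed=s) for s in range(20)]
print("mean FDP:", np.mean([f for f, _ in results]))
print("mean power:", np.mean([pw for _, pw in results]))
```

Running the same replications with an adaptive procedure in place of `gamma=0.5` would give the side-by-side comparison that the falsification test describes.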
Original abstract
In the FDR-controlling literature, mirror statistics offer a flexible alternative to $p$-value based procedures. When prior information is available, however, it is unclear how to incorporate mirror statistics in a principled way, and the standard equal split used by data-splitting methods can be inefficient. In this paper, we characterize a broader class of mirror statistics for any fixed splitting scheme and establish asymptotic FDR control under mild weak-dependence conditions using a two-stage procedure inspired by \cite{li2021whiteout}. Within this class, we derive a Bayes-optimal mirror statistic. Theoretically, we demonstrate its power advantage through analyses in the Rare/Weak signal model. Building upon this Bayes-optimal mirror statistic, we propose \textsc{PRADAS} (PRior-Assisted DAta Splitting) that treats split ratio as a stopping time and recasts the data-splitting as an optional stopping over a natural filtration; the optimal stopping rule is characterized by the Snell envelope and computed efficiently via a Longstaff--Schwartz regression approximation. Both simulations and real data examples demonstrate the effectiveness of our proposed framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to characterize a broader class of mirror statistics applicable to any fixed splitting scheme, derive the Bayes-optimal member of this class, establish asymptotic FDR control under mild weak-dependence conditions via a two-stage procedure, demonstrate power gains for the Bayes-optimal statistic in the Rare/Weak model, and introduce PRADAS, which recasts the split ratio as a stopping time whose optimal rule is given by the Snell envelope and approximated via Longstaff-Schwartz regression, with supporting simulations and real-data examples.
Significance. If the FDR control and power results hold, particularly for the adaptive splitting procedure, the work would offer a principled method for incorporating prior information into mirror-statistic-based FDR control and a data-driven approach to choosing split ratios. The application of optimal stopping theory via the Snell envelope provides a novel technical contribution beyond fixed-split methods. The Rare/Weak model analysis and empirical validation are positive elements.
major comments (3)
- [§3 (asymptotic FDR control for fixed splits) and §4 (PRADAS stopping-time extension)] The asymptotic FDR control (via the two-stage procedure) is established for fixed splitting schemes. However, PRADAS treats the split ratio as a data-dependent stopping time adapted to the filtration generated by the data. No argument is supplied showing that the control carries over when the stopping decision can correlate with the mirror statistics, which could alter the null distribution or invalidate the threshold.
- [Rare/Weak model analysis (likely §3.3 or §4.3)] The power advantage in the Rare/Weak model is derived for the Bayes-optimal mirror statistic under a fixed split ratio. The analysis does not extend to the case where the split ratio is itself random and chosen via the Snell envelope approximation, leaving open whether the adaptive procedure preserves the claimed power gain.
- [Assumptions preceding the FDR theorem] The mild weak-dependence conditions invoked to justify the asymptotic FDR result are stated for the class of mirror statistics, but it is not shown that these conditions continue to hold uniformly when the mirror statistic is the data-dependent Bayes-optimal choice and the split is adaptive.
minor comments (2)
- [§4.1 (optional stopping setup)] The natural filtration with respect to which the stopping time is defined should be stated explicitly when the optional-stopping framework is introduced.
- [Simulation section] Simulation figures would benefit from reporting the exact Rare/Weak parameters (signal strength, sparsity level, dimension) used in each panel to facilitate direct comparison with the theoretical analysis.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment of the work's significance, and constructive major comments. We address each point below and will make revisions to strengthen the theoretical treatment of the adaptive PRADAS procedure.
- Referee: [§3 (asymptotic FDR control for fixed splits) and §4 (PRADAS stopping-time extension)] The asymptotic FDR control (via the two-stage procedure) is established for fixed splitting schemes. However, PRADAS treats the split ratio as a data-dependent stopping time adapted to the filtration generated by the data. No argument is supplied showing that the control carries over when the stopping decision can correlate with the mirror statistics, which could alter the null distribution or invalidate the threshold.
  Authors: We appreciate this observation, which correctly identifies that the FDR control in Section 3 is derived for fixed splits. For the adaptive case in Section 4, where the split ratio is a bounded stopping time, we acknowledge that the manuscript does not supply an explicit extension accounting for possible correlation with the mirror statistics. In the revised version we will add a new subsection (or appendix) establishing that the two-stage procedure continues to control FDR asymptotically. The argument will use the boundedness of the stopping time together with the weak-dependence conditions to show that the null distribution of the mirror statistics and the threshold selection remain asymptotically unaffected.
  Revision: yes
- Referee: [Rare/Weak model analysis (likely §3.3 or §4.3)] The power advantage in the Rare/Weak model is derived for the Bayes-optimal mirror statistic under a fixed split ratio. The analysis does not extend to the case where the split ratio is itself random and chosen via the Snell envelope approximation, leaving open whether the adaptive procedure preserves the claimed power gain.
  Authors: The referee is right that the Rare/Weak power comparison is stated for a fixed split ratio. While the manuscript already reports simulation evidence that the adaptive PRADAS procedure improves power, we agree that a theoretical link is missing. In the revision we will add a remark or short extension in Section 4.3 showing that the Snell-envelope stopping rule achieves power at least as high as the best fixed-ratio Bayes-optimal statistic in the large-sample limit of the Rare/Weak model, or we will derive a bound on any finite-sample power loss due to adaptivity.
  Revision: yes
- Referee: [Assumptions preceding the FDR theorem] The mild weak-dependence conditions invoked to justify the asymptotic FDR result are stated for the class of mirror statistics, but it is not shown that these conditions continue to hold uniformly when the mirror statistic is the data-dependent Bayes-optimal choice and the split is adaptive.
  Authors: We thank the referee for highlighting this uniformity issue. The weak-dependence conditions are written for the general class, yet the data-dependent Bayes-optimal statistic and adaptive split require verification that the conditions hold uniformly. In the revised manuscript we will add a short lemma (placed before the FDR theorem) proving uniformity: the Bayes-optimal mirror statistic is continuous in the split ratio, the stopping time is bounded, and therefore the dependence coefficients remain controlled uniformly over admissible splits.
  Revision: yes
Circularity Check
No significant circularity; derivation uses external inspiration and standard techniques
full rationale
The paper first characterizes a class of mirror statistics for any fixed splitting scheme and derives the Bayes-optimal member within that class. It then invokes an external two-stage procedure (inspired by li2021whiteout) to establish asymptotic FDR control under weak dependence. The PRADAS proposal recasts the split ratio as a stopping time and applies the Snell envelope plus Longstaff-Schwartz regression, both standard tools from optimal stopping theory. None of these steps reduces the claimed FDR control, power advantage, or Bayes-optimality result to a fitted parameter, self-defined quantity, or unverified self-citation chain. The central mathematical content remains independent of the paper's own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Mild weak-dependence conditions on the test statistics
- domain assumption: Rare/Weak signal model
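For readers unfamiliar with the second assumption, the Rare/Weak model is commonly parametrized as below. This is the standard form from the higher-criticism literature; the summary does not state which variant the paper uses, so the symbols $\beta$, $r$, $\varepsilon_p$, and $\tau_p$ are assumed notation.

```latex
% p hypotheses; beta controls rarity, r controls signal strength
\mu_j \overset{\text{iid}}{\sim} (1 - \varepsilon_p)\,\delta_0 + \varepsilon_p\,\delta_{\tau_p},
\qquad
\varepsilon_p = p^{-\beta}, \quad 0 < \beta < 1,
\qquad
\tau_p = \sqrt{2\,r \log p}, \quad r > 0.
```

Larger $\beta$ makes signals rarer and smaller $r$ makes them weaker, so a power analysis in this model is typically stated as a region in the $(\beta, r)$ plane.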