Feedback-Enhanced Online Multiple Testing with Applications to Conformal Selection
Pith reviewed 2026-05-18 19:31 UTC · model grok-4.3
The pith
GAIF uses revealed hypothesis outcomes to dynamically adjust testing thresholds while preserving finite-sample FDR control.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GAIF is a generalized alpha-investing procedure that receives the true label of each tested hypothesis after the rejection decision has been issued. It uses the observed label to update the remaining alpha budget for all future tests, thereby producing data-dependent thresholds that still guarantee finite-sample FDR or marginal FDR control. When the same feedback loop is attached to a stream of conformal p-values, a model-selection step chooses the score function that has performed best on the already-revealed labels, which increases the number of discoveries while the FDR bound remains intact.
What carries the argument
GAIF, the feedback-enhanced generalized alpha-investing rule that updates the alpha investment level after each revealed outcome and then sets the next rejection threshold from the updated budget.
If this is right
- Sequential testing procedures can now incorporate delayed outcome information without sacrificing exact finite-sample error control.
- Conformal selection gains an automatic way to switch between candidate score functions using only the revealed labels.
- The same feedback mechanism applies to any alpha-investing scheme, not only to the generalized version presented here.
- Power gains are largest when the revealed outcomes are informative about the remaining hypotheses.
Where Pith is reading between the lines
- The method could be paired with bandit-style allocation rules that decide which hypotheses to test next on the basis of past feedback.
- In streaming data settings the procedure might be run with a sliding window on the revealed labels to adapt to distribution drift.
- Because conformal p-values are constructed to be independent of the selection rule, the feedback-driven model choice remains valid under the same marginal FDR bound.
Load-bearing premise
After each decision the true state of the hypothesis is revealed and can be fed back into the threshold rule without destroying the finite-sample FDR guarantee.
What would settle it
Run GAIF on a stream of hypotheses whose true labels are generated adversarially after each decision; check whether the realized proportion of false discoveries exceeds the nominal FDR level at any finite horizon.
Figures
read the original abstract
We study online multiple testing with feedback, where decisions are made sequentially and the true state of the hypothesis is revealed after the decision has been made, either instantly or with a delay. We propose GAIF, a feedback-enhanced generalized alpha-investing framework that dynamically adjusts thresholds using revealed outcomes, ensuring finite-sample false discovery rate (FDR)/marginal FDR control. Extending GAIF to online conformal testing, we construct independent conformal $p$-values and introduce a feedback-driven model selection criterion to identify the best model/score, thereby improving statistical power. We demonstrate the effectiveness of our methods through numerical simulations and real-data applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes GAIF, a feedback-enhanced generalized alpha-investing procedure for online multiple testing in which the true state of each hypothesis is revealed after the decision (instantly or with delay). It claims that GAIF dynamically updates thresholds using these revelations while preserving finite-sample FDR and marginal FDR control. The method is extended to online conformal testing by constructing independent conformal p-values and introducing a feedback-driven model selection step that improves power. Numerical simulations and real-data examples are provided to illustrate gains over non-feedback baselines.
Significance. If the finite-sample control result holds under delayed feedback, the work would meaningfully extend alpha-investing ideas to realistic sequential settings with post-decision revelations, offering a practical route to higher power in conformal selection and related online testing problems. The conformal extension and empirical demonstrations are clear strengths.
major comments (2)
- [§4.1, Theorem 2] §4.1, Theorem 2 (FDR control under delay): the supermartingale argument for the alpha-wealth process is stated for the immediate-revelation filtration; the extension to delayed feedback requires an explicit re-derivation showing that the wealth increment remains a supermartingale when the revelation for hypothesis i arrives after decisions for some j > i have already been made. Without this step the finite-sample bound does not automatically carry over.
- [§5.3] §5.3, conformal p-value construction: the independence claim for the conformal p-values under the feedback-driven model selection criterion is not accompanied by a precise statement of the filtration or conditioning that prevents the selection step from introducing dependence between the p-value for the current hypothesis and the feedback used to choose the score function.
minor comments (2)
- [§3] Notation for the delayed revelation time τ_i is introduced in §3 but used inconsistently in the algorithm pseudocode; a single consistent definition would improve readability.
- [Figure 3] Figure 3 caption does not specify the delay distribution used in the simulation; adding this detail would allow readers to reproduce the power curves.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments. We address each major comment below and indicate the revisions we will incorporate.
read point-by-point responses
-
Referee: [§4.1, Theorem 2] §4.1, Theorem 2 (FDR control under delay): the supermartingale argument for the alpha-wealth process is stated for the immediate-revelation filtration; the extension to delayed feedback requires an explicit re-derivation showing that the wealth increment remains a supermartingale when the revelation for hypothesis i arrives after decisions for some j > i have already been made. Without this step the finite-sample bound does not automatically carry over.
Authors: We agree that the supermartingale argument as currently written is developed explicitly under immediate revelation. In the revision we will add a dedicated subsection that re-derives the supermartingale property under delayed feedback. The argument will use a filtration that includes all decisions made before the delayed revelation arrives and will verify that the wealth increment remains a supermartingale with respect to this filtration, thereby preserving the finite-sample FDR bound. revision: yes
-
Referee: [§5.3] §5.3, conformal p-value construction: the independence claim for the conformal p-values under the feedback-driven model selection criterion is not accompanied by a precise statement of the filtration or conditioning that prevents the selection step from introducing dependence between the p-value for the current hypothesis and the feedback used to choose the score function.
Authors: We will expand Section 5.3 with an explicit statement of the filtration and the conditioning argument. The model-selection step is measurable with respect to the sigma-field generated by all previous revelations and decisions; the conformal p-value for the current hypothesis is then constructed from a score function chosen conditionally on that sigma-field. Because the conformal scores remain exchangeable under the null conditionally on the selected model, the resulting p-value is independent of the selection step and valid for the online procedure. revision: yes
Circularity Check
No significant circularity; GAIF extends alpha-investing with independent feedback control derivation
full rationale
The paper presents GAIF as an extension of generalized alpha-investing that incorporates revealed outcomes (instant or delayed) to dynamically adjust thresholds while preserving finite-sample FDR/mFDR control. This control is derived from the supermartingale property of the alpha-wealth process under the feedback filtration, building on but not reducing to prior alpha-investing results. No step equates the claimed guarantee to a fitted parameter or self-citation by construction; the feedback update rule and conformal p-value construction introduce new elements that are explicitly re-derived for the delayed case. The framework remains self-contained against external benchmarks for FDR control.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Revealed outcomes after each decision are accurate and can be used for threshold updates without breaking finite-sample FDR control.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
GAIF procedure adjusts thresholds using revealed outcomes θt for finite-sample FDR/mFDR control under conditional super-uniformity
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
online conformal p-values constructed by updating calibration set C0t under exchangeability Assumption 1
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables
Post-hoc conformal selection creates a path of selection sets with estimated false discovery proportions, enabling data-driven adaptive FDR control with average reliability guarantees via e-variables and e-BH.
Reference graph
Works this paper leans on
-
[1]
Generalized -investing: definitions, optimality results and application to public databases
Ehud Aharoni and Saharon Rosset. Generalized -investing: definitions, optimality results and application to public databases. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76 0 (4): 0 771--794, 2014
work page 2014
-
[2]
Theoretical Foundations of Conformal Prediction
Anastasios N Angelopoulos, Rina Foygel Barber, and Stephen Bates. Theoretical foundations of conformal prediction. arXiv preprint arXiv:2411.11824, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[3]
Optimized conformal selection: Powerful selective inference after conformity score optimization
Tian Bai and Ying Jin. Optimized conformal selection: Powerful selective inference after conformity score optimization. arXiv preprint arXiv:2411.17983, 2024
-
[4]
Testing for outliers with conformal p-values
Stephen Bates, Emmanuel Cand \`e s, Lihua Lei, Yaniv Romano, and Matteo Sesia. Testing for outliers with conformal p-values. The Annals of Statistics, 51 0 (1): 0 149--178, 2023
work page 2023
-
[5]
Barry Becker and Ronny Kohavi. Adult income investigation . UCI Machine Learning Repository https://archive-beta.ics.uci.edu/dataset/2/adult, 1996
work page 1996
-
[6]
Controlling the false discovery rate: a practical and powerful approach to multiple testing
Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 57 0 (1): 0 289--300, 1995
work page 1995
-
[7]
Thomas Brooks, D Pope, and Michael Marcolini. Airfoil Self-Noise . UCI Machine Learning Repository https://archive.ics.uci.edu/dataset/291/airfoil+self+noise, 2014
work page 2014
-
[8]
On-line consistent ranking on e-recruitment: seeking the truth behind a well-formed cv
Evanthia Faliagka, Lazaros Iliadis, Ioannis Karydis, Maria Rigou, Spyros Sioutas, Athanasios Tsakalidis, and Giannis Tzimas. On-line consistent ranking on e-recruitment: seeking the truth behind a well-formed cv. Artificial Intelligence Review, 42 0 (3): 0 515--528, 2014
work page 2014
-
[9]
Online generalizations of the e-BH and BH procedure
Lasse Fischer, Ziyu Xu, and Aaditya Ramdas. Online generalizations of the e-BH and BH procedure. arXiv preprint arXiv:2407.20683, 2024
-
[10]
Online false discovery rate control for LORD++ and SAFFRON under positive, local dependence
Aaron Fisher. Online false discovery rate control for LORD++ and SAFFRON under positive, local dependence. Biometrical Journal, 66 0 (1): 0 2300177, 2024
work page 2024
-
[11]
-investing: a procedure for sequential control of expected false discoveries
Dean P Foster and Robert A Stine. -investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70 0 (2): 0 429--444, 2008
work page 2008
-
[12]
Structure--adaptive sequential testing for online false discovery rate control
Bowen Gang, Wenguang Sun, and Weinan Wang. Structure--adaptive sequential testing for online false discovery rate control. Journal of the American Statistical Association, pages 1--14, 2021
work page 2021
-
[13]
Conformal online model aggregation
Matteo Gasparin and Aaditya Ramdas. Conformal online model aggregation. arXiv preprint arXiv:2403.15527, 2024
-
[14]
Adaptive conformal inference under distribution shift
Isaac Gibbs and Emmanuel Cand \`e s. Adaptive conformal inference under distribution shift. Advances in Neural Information Processing Systems, 34: 0 1660--1672, 2021
work page 2021
-
[15]
Conformal inference for online prediction with arbitrary distribution shifts
Isaac Gibbs and Emmanuel J Cand \`e s. Conformal inference for online prediction with arbitrary distribution shifts. Journal of Machine Learning Research, 25 0 (162): 0 1--36, 2024
work page 2024
-
[16]
Conformal alignment: Knowing when to trust foundation models with guarantees
Yu Gui, Ying Jin, and Zhimei Ren. Conformal alignment: Knowing when to trust foundation models with guarantees. Advances in Neural Information Processing Systems, 37: 0 73884--73919, 2024
work page 2024
-
[17]
ACS : An interactive framework for conformal selection
Yu Gui, Ying Jin, Yash Nair, and Zhimei Ren. ACS : An interactive framework for conformal selection. arXiv preprint arXiv:2507.15825, 2025
-
[18]
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43 0 (2): 0 1--55, 2025
work page 2025
-
[19]
Online selective conformal inference: adaptive scores, convergence rate and optimality
Pierre Humbert, Ulysse Gazin, Ruth Heller, and Etienne Roquain. Online selective conformal inference: adaptive scores, convergence rate and optimality. arXiv preprint arXiv:2508.10336, 2025
-
[20]
Real-time selection under general constraints via predictive inference
Yuyang Huo, Lin Lu, Haojie Ren, and Changliang Zou. Real-time selection under general constraints via predictive inference. Advances in Neural Information Processing Systems, 37: 0 61267--61305, 2024
work page 2024
-
[21]
On Online Control of False Discovery Rate
Adel Javanmard and Andrea Montanari. On online control of false discovery rate. arXiv preprint arXiv:1502.06197, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[22]
Online rules for control of false discovery rate and false discovery exceedance
Adel Javanmard and Andrea Montanari. Online rules for control of false discovery rate and false discovery exceedance. The Annals of Statistics, 46 0 (2): 0 526--554, 2018
work page 2018
-
[23]
Model-free selective inference under covariate shift via weighted conformal p-values
Ying Jin and Emmanuel J Cand \`e s. Model-free selective inference under covariate shift via weighted conformal p-values. arXiv preprint arXiv:2307.09291, 2023 a
-
[24]
Selection by prediction with conformal p-values
Ying Jin and Emmanuel J Cand \`e s. Selection by prediction with conformal p-values. Journal of Machine Learning Research, 24 0 (244): 0 1--41, 2023 b
work page 2023
-
[25]
Kaggle. Candidate selection dataset. https://www.kaggle.com/datasets/tarunchilkur/client, 2020
work page 2020
-
[26]
Diabetes health indicators dataset
Kaggle. Diabetes health indicators dataset. https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset, 2021
work page 2021
-
[27]
Fdr control for online anomaly detection
Etienne Kr \"o nert, Alain C \'e lisse, and Dalila Hattab. Fdr control for online anomaly detection. arXiv preprint arXiv:2312.01969, 2023
-
[28]
Adaptive novelty detection with false discovery rate guarantee
Ariane Marandon, Lihua Lei, David Mary, and Etienne Roquain. Adaptive novelty detection with false discovery rate guarantee . The Annals of Statistics, 52 0 (1): 0 157 -- 183, 2024
work page 2024
-
[29]
Diversifying conformal selections
Yash Nair, Ying Jin, James Yang, and Emmanuel Candes. Diversifying conformal selections. arXiv preprint arXiv:2506.16229, 2025
-
[30]
WATCH : Adaptive monitoring for AI deployments via weighted-conformal martingales
Drew Prinster, Xing Han, Anqi Liu, and Suchi Saria. WATCH : Adaptive monitoring for AI deployments via weighted-conformal martingales. In Forty-second International Conference on Machine Learning, 2025
work page 2025
-
[31]
Online control of the false discovery rate with decaying memory
Aaditya Ramdas, Fanny Yang, Martin J Wainwright, and Michael I Jordan. Online control of the false discovery rate with decaying memory. Advances in Neural Information Processing Systems, 30: 0 5655--5664, 2017
work page 2017
-
[32]
SAFFRON : an adaptive algorithm for online control of the false discovery rate
Aaditya Ramdas, Tijana Zrnic, Martin Wainwright, and Michael Jordan. SAFFRON : an adaptive algorithm for online control of the false discovery rate. In International Conference on Machine Learning, pages 4286--4294. PMLR, 2018
work page 2018
-
[33]
A unified treatment of multiple testing with prior knowledge using the p-filter
Aaditya K Ramdas, Rina F Barber, Martin J Wainwright, and Michael I Jordan. A unified treatment of multiple testing with prior knowledge using the p-filter. The Annals of Statistics, 47 0 (5): 0 2790--2821, 2019
work page 2019
-
[34]
Online false discovery rate control for anomaly detection in time series
Quentin Rebjock, Baris Kurt, Tim Januschowski, and Laurent Callot. Online false discovery rate control for anomaly detection in time series. Advances in Neural Information Processing Systems, 34: 0 26487--26498, 2021
work page 2021
-
[35]
Online error rate control for platform trials
David S Robertson, James MS Wason, Franz K \"o nig, Martin Posch, and Thomas Jaki. Online error rate control for platform trials. Statistics in Medicine, 42 0 (14): 0 2475--2495, 2023
work page 2023
-
[36]
John D Storey, Jonathan E Taylor, and David Siegmund. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society Series B: Statistical Methodology, 66 0 (1): 0 187--205, 2004
work page 2004
-
[37]
ADDIS : an adaptive discarding algorithm for online FDR control with conservative nulls
Jinjin Tian and Aaditya Ramdas. ADDIS : an adaptive discarding algorithm for online FDR control with conservative nulls. Advances in Neural Information Processing Systems, 32: 0 9388--9396, 2019
work page 2019
-
[38]
Vladimir Vovk. Testing randomness online. Statistical Science, 36 0 (4): 0 595--611, 2021
work page 2021
-
[39]
Testing exchangeability on-line
Vladimir Vovk, Ilia Nouretdinov, and Alexander Gammerman. Testing exchangeability on-line. In Proceedings of the 20th International Conference on Machine Learning, pages 768--775, 2003
work page 2003
-
[40]
Algorithmic learning in a random world
Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic learning in a random world. New York: Springer, 2005
work page 2005
-
[41]
Conformalized multiple testing after data-dependent selection
Xiaoning Wang, Yuyang Huo, Liuhua Peng, and Changliang Zou. Conformalized multiple testing after data-dependent selection. Advances in Neural Information Processing Systems, 37: 0 58574--58609, 2024
work page 2024
-
[42]
Optimal subsampling via predictive inference
Xiaoyang Wu, Yuyang Huo, Haojie Ren, and Changliang Zou. Optimal subsampling via predictive inference. Journal of the American Statistical Association, 119 0 (548): 0 2844--2856, 2024
work page 2024
-
[43]
Conditional testing based on localized conformal p-values
Xiaoyang Wu, Lin Lu, Zhaojun Wang, and Changliang Zou. Conditional testing based on localized conformal p-values. In The Thirteenth International Conference on Learning Representations, 2025
work page 2025
-
[44]
Online multiple testing with e-values
Ziyu Xu and Aaditya Ramdas. Online multiple testing with e-values. In International Conference on Artificial Intelligence and Statistics, pages 3997--4005. PMLR, 2024
work page 2024
-
[45]
Automs: automatic model selection for novelty detection with error rate control
Yifan Zhang, Haiyan Jiang, Haojie Ren, Changliang Zou, and Dejing Dou. Automs: automatic model selection for novelty detection with error rate control. In Advances in Neural Information Processing Systems, 2022
work page 2022
-
[46]
e- GAI : e-value-based generalized alpha-investing for online false discovery rate control
Yifan Zhang, Zijian Wei, Haojie Ren, and Changliang Zou. e- GAI : e-value-based generalized alpha-investing for online false discovery rate control. In Forty-second International Conference on Machine Learning, 2025
work page 2025
-
[47]
Tijana Zrnic, Aaditya Ramdas, and Michael I. Jordan. Asynchronous online testing of multiple hypotheses. Journal of Machine Learning Research, 22 0 (33): 0 1--39, 2021
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.