Feedback-Enhanced Online Multiple Testing with Applications to Conformal Selection

Changliang Zou; Haojie Ren; Lin Lu; Yuyang Huo; Zhaojun Wang

arXiv: 2509.03297 · v2 · submitted 2025-09-03 · 📊 stat.ME · stat.ML


Lin Lu, Yuyang Huo, Haojie Ren, Zhaojun Wang, Changliang Zou

Pith reviewed 2026-05-18 19:31 UTC · model grok-4.3

classification 📊 stat.ME stat.ML

keywords online multiple testing · false discovery rate · feedback · alpha investing · conformal selection · sequential decision making


The pith

GAIF uses revealed hypothesis outcomes to dynamically adjust testing thresholds while preserving finite-sample FDR control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GAIF, a feedback-enhanced version of generalized alpha-investing for online multiple testing. In this setting, decisions on hypotheses are made one after another, and the true status of each hypothesis becomes known afterward, either right away or after some delay. GAIF feeds those revealed outcomes back into the threshold-setting rule so that future decisions can be made more aggressively without breaking the finite-sample false discovery rate guarantee. The same feedback idea is then applied to online conformal selection: independent conformal p-values are built, and a feedback-driven criterion picks the best-performing model or score function at each step to raise power.

Core claim

GAIF is a generalized alpha-investing procedure that receives the true label of each tested hypothesis after the rejection decision has been issued. It uses the observed label to update the remaining alpha budget for all future tests, thereby producing data-dependent thresholds that still guarantee finite-sample FDR or marginal FDR control. When the same feedback loop is attached to a stream of conformal p-values, a model-selection step chooses the score function that has performed best on the already-revealed labels, which increases the number of discoveries while the FDR bound remains intact.
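The conformal side can be illustrated with the standard conformal p-value, which compares a test point's score with scores of calibration points known to be null. The sketch below is a minimal, hypothetical version: it assumes a larger score means stronger evidence that a point is non-null, and its helper `OnlineNullCalibration` (an illustrative name, not the paper's) simply grows the calibration set with points whose labels are revealed as null. It does not reproduce the paper's construction of *independent* p-values; reusing one growing calibration set, as here, makes successive p-values dependent.

```python
from typing import List


def conformal_p_value(test_score: float, null_calibration_scores: List[float]) -> float:
    """Conformal p-value for H0: "the test point is null".

    Assumes larger scores mean stronger evidence that a point is non-null and
    that, under H0, the test score is exchangeable with the null calibration
    scores. Ties are counted against rejection, which keeps the p-value valid.
    """
    n = len(null_calibration_scores)
    # Number of calibration nulls scoring at least as high as the test point.
    n_as_extreme = sum(1 for v in null_calibration_scores if v >= test_score)
    return (1 + n_as_extreme) / (n + 1)


class OnlineNullCalibration:
    """Hypothetical helper: a null calibration set that grows with feedback."""

    def __init__(self, initial_null_scores: List[float]) -> None:
        self.scores = list(initial_null_scores)

    def p_value(self, test_score: float) -> float:
        return conformal_p_value(test_score, self.scores)

    def reveal(self, score: float, is_null: bool) -> None:
        # Feedback step: a point revealed to be null joins the calibration set;
        # revealed non-nulls are not used for calibration in this sketch.
        if is_null:
            self.scores.append(score)


if __name__ == "__main__":
    calib = OnlineNullCalibration([0.1, 0.4, 0.35, 0.2])
    print(calib.p_value(0.5))  # 0.2: no calibration score is >= 0.5, so 1/5
    calib.reveal(0.6, is_null=True)
    print(calib.p_value(0.5))  # 2/6: the revealed null (0.6) now counts
```

Counting ties as "at least as extreme" is the conservative choice; a randomized tie-breaking variant would give exactly uniform null p-values at the cost of extra randomness.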

What carries the argument

GAIF, the feedback-enhanced generalized alpha-investing rule that updates the alpha investment level after each revealed outcome and then sets the next rejection threshold from the updated budget.
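To make "updating the budget from revealed outcomes" concrete, the sketch below uses LORD++-style bookkeeping, in which thresholds must keep the estimated false discovery proportion ∑_{j≤t} α_j / max(1, R(t)) at or below α. It is a hypothetical illustration, not the paper's GAIF update: the feedback variant simply stops charging the budget for tests whose hypotheses have already been revealed as non-null (θ_j = 1), since those tests cannot be false discoveries. The function names and the label encoding (1 = non-null, 0 = null, None = not yet revealed) are assumptions.

```python
from typing import Optional, Sequence

# Label encoding (an assumption of this sketch): 1 = revealed non-null,
# 0 = revealed null, None = outcome not yet revealed.
Label = Optional[int]


def charged_spending(alphas: Sequence[float],
                     labels: Optional[Sequence[Label]] = None) -> float:
    """Sum of past thresholds alpha_j still charged to the FDP estimate.

    Without feedback (labels=None) every alpha_j is charged. With feedback,
    tests already revealed as non-null cannot be false discoveries, so their
    alpha_j is dropped from the numerator.
    """
    if labels is None:
        return sum(alphas)
    return sum(a for a, lab in zip(alphas, labels) if lab != 1)


def allowance(alpha: float, alphas: Sequence[float], rejections: Sequence[bool],
              labels: Optional[Sequence[Label]] = None) -> float:
    """Largest next threshold keeping charged spending / max(1, R) <= alpha.

    R counts rejections made so far. Using the current R is conservative: a
    rejection at the next step can only enlarge the denominator.
    """
    r = sum(1 for rej in rejections if rej)
    return max(0.0, alpha * max(1, r) - charged_spending(alphas, labels))


if __name__ == "__main__":
    past_alphas = [0.05, 0.03, 0.02]
    past_rejections = [True, False, True]
    print(allowance(0.1, past_alphas, past_rejections))  # about 0.10
    # The first test was a rejection later revealed to be non-null.
    print(allowance(0.1, past_alphas, past_rejections, [1, 0, None]))  # about 0.15
```

A practical rule would spread this allowance over future tests rather than spend it at once; the point of the sketch is only that feedback can widen the gap between the budget and what has been charged against it.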

If this is right

Sequential testing procedures can now incorporate delayed outcome information without sacrificing exact finite-sample error control.
Conformal selection gains an automatic way to switch between candidate score functions using only the revealed labels.
The same feedback mechanism extends beyond the specific rules studied here; the paper's Appendix D applies it to e-LORD and e-SAFFRON, although it notes that feedback cannot be directly used to improve e-LOND.
Power gains are largest when the revealed outcomes are informative about the remaining hypotheses.
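One simple way to switch between candidate score functions using only revealed labels is to rank candidates by how well their scores separate revealed non-nulls from revealed nulls. The sketch below uses the empirical AUC (the Mann-Whitney statistic) as that criterion and treats each feature as a single float for brevity; the criterion, the names `empirical_auc` and `select_score`, and the data layout are assumptions, and the paper's own selection criterion may differ.

```python
from typing import Callable, Dict, List, Sequence, Tuple

ScoreFn = Callable[[float], float]


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (non-null, null) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # nothing to compare yet: treat the candidate as uninformative
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_score(candidates: Dict[str, ScoreFn],
                 revealed: List[Tuple[float, int]]) -> str:
    """Name of the candidate whose scores best separate the revealed labels.

    `revealed` holds (feature, label) pairs whose labels are already known
    (1 = non-null, 0 = null). Ties go to the first candidate in dict order.
    """
    xs = [x for x, _ in revealed]
    ys = [y for _, y in revealed]
    return max(candidates,
               key=lambda name: empirical_auc([candidates[name](x) for x in xs], ys))


if __name__ == "__main__":
    candidates = {"identity": lambda x: x, "negated": lambda x: -x}
    revealed = [(0.9, 1), (0.8, 1), (0.2, 0), (0.1, 0)]
    print(select_score(candidates, revealed))  # identity (AUC 1.0 vs 0.0)
```

Whether such a data-dependent choice preserves the validity of the later conformal p-values is exactly the question raised in the referee's second major comment.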

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be paired with bandit-style allocation rules that decide which hypotheses to test next on the basis of past feedback.
In streaming data settings the procedure might be run with a sliding window on the revealed labels to adapt to distribution drift.
Because conformal p-values are constructed to be independent of the selection rule, the feedback-driven model choice remains valid under the same marginal FDR bound.

Load-bearing premise

After each decision the true state of the hypothesis is revealed and can be fed back into the threshold rule without destroying the finite-sample FDR guarantee.

What would settle it

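Run GAIF on many replicated streams of hypotheses whose true labels are set adversarially and revealed after each decision; check whether the average realized false discovery proportion (the Monte Carlo estimate of FDR) exceeds the nominal level at some finite horizon. A single run whose realized proportion exceeds the nominal level does not by itself contradict FDR control, which bounds an expectation.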
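Running that check requires the standard online metrics. The sketch below computes FDP(t) = V(t) / max(1, R(t)) and Power(t) = (true discoveries) / max(1, non-nulls seen so far) along one stream, then estimates FDR as the mean final FDP over replications and mFDR as the ratio of totals, V / max(1, R). The function names are illustrative, and definitions vary slightly across papers (Foster and Stine's mFDR, for instance, adds a constant to the denominator).

```python
from typing import List, Sequence, Tuple


def fdp_and_power_path(rejections: Sequence[bool],
                       non_null: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FDP(t), Power(t)) for t = 1, ..., T along one stream.

    FDP(t) = false discoveries / max(1, discoveries) and
    Power(t) = true discoveries / max(1, non-nulls seen so far).
    """
    path: List[Tuple[float, float]] = []
    v = r = tp = n1 = 0
    for rej, alt in zip(rejections, non_null):
        r += int(rej)
        v += int(rej and not alt)
        tp += int(rej and alt)
        n1 += int(alt)
        path.append((v / max(1, r), tp / max(1, n1)))
    return path


def fdr_and_mfdr(runs: Sequence[Tuple[Sequence[bool], Sequence[bool]]]) -> Tuple[float, float]:
    """Monte Carlo FDR (mean final FDP) and mFDR (total V / max(1, total R))."""
    fdps: List[float] = []
    total_v = total_r = 0
    for rejections, non_null in runs:
        v = sum(1 for rej, alt in zip(rejections, non_null) if rej and not alt)
        r = sum(1 for rej in rejections if rej)
        fdps.append(v / max(1, r))
        total_v += v
        total_r += r
    return sum(fdps) / len(fdps), total_v / max(1, total_r)


if __name__ == "__main__":
    run1 = ([True, True, False, True], [True, False, True, True])
    run2 = ([False, False], [False, True])
    print(fdp_and_power_path(*run1)[-1])  # FDP(T) = 1/3, Power(T) = 2/3
    print(fdr_and_mfdr([run1, run2]))     # FDR = 1/6, mFDR = 1/3
```

The example shows why the two error rates can differ: a run with no rejections contributes FDP = 0 to the FDR average but nothing to either total in the mFDR ratio.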

Figures

Figures reproduced from arXiv: 2509.03297 by Changliang Zou, Haojie Ren, Lin Lu, Yuyang Huo, Zhaojun Wang.

**Figure 1.** Testing thresholds {α_t} over time t for various procedures applied to Gaussian observations. Our methods yield larger thresholds after improving the gap via feedback, with α_t^SF > α_t^SAFFRON and α_t^LF > α_t^LORD++ on average. This illustrates that the GAIF framework leverages alpha-wealth more effectively through feedback, thereby achieving higher power than the traditional GA…

**Figure 2.** Results for Scenarios I and II: FDR and power at the stopping time as the non-null proportion π1 varies from 0.1 to 0.8, over 500 replications. The black dashed lines denote the FDR level α = 0.1. Shaded areas show ±1 standard error. Panels: FDR vs. π1 and Power vs. π1. Methods: SFdep, SF, LFdep, LF, SAFFRONdep, SAFFRON, LORDdep, LORD++, L…

**Figure 3.** Results for Scenario III: FDR and power at the stopping time across 500 replications, with non-null proportion π1 ranging from 0.1 to 0.8. The black dashed line indicates the target FDR level α = 0.1. Shaded areas show ±1 standard error. Scenario IV: data are generated as X | Y = 0 ∼ N_4(µ_1, I_4) and X | Y = 1 ∼ N_4(µ_2, I_4), where µ_1 = (1, 0, 0, 0)^⊤ and µ_2 = (0, 0, −2, −2)^⊤. The target region is A = {1}. We set t…

**Figure 4.** Online FDR and power at the stopping time T under Scenario IV across non-null proportions π1 ∈ [0.1, 0.8]. All methods control the FDR below the nominal level α, with SF aligning most closely with the target level among all competitors. In terms of power, as expected, SF consistently achieves the highest power, while LF also performs competitively and attains higher power than SAFFRON a…

**Figure 5.** Results for Scenario V (sine pattern shifts): FDR(t) and Power(t) over time t. The black dashed lines denote the FDR level α = 0.1. Shaded areas show ±1 standard error. Section 5 (Real Data Applications) opens: "In this section, we evaluate our proposed methods on four real-world datasets, illustrating their practical benefits in diverse online decision-making tasks. Task 1: Online Candidate Scre…"

**Figure 6.** Results for real-data applications: FDR(δ_t) and Power(δ_t) over time t for six benchmarks. The black dashed lines indicate the FDR level α = 0.3. Shaded areas show ±1 standard error.

**Figure 7.** FDR and power of SAFFRON at stopping time 600 under different values of λ, for target FDR level α = 0.1. The red lines show the results for the feedback variant. Appendix D (Extensions of GAIF based on e-values) opens: "Although feedback cannot be directly used to improve e-LOND (Xu and Ramdas, 2024), it can enhance e-LORD and e-SAFFRON (Zhang et al., 2025) through a feedback-driven…"

**Figure 8.** Results for Scenario VI: FDR(T) and Power(T) at stopping time T across different non-null proportions π1. The black dashed line denotes the FDR level α = 0.2. The paper also reports results for Scenarios IV and VI under different training algorithms (RF, SVM, and NN) with varying initial calibration sizes.

**Figure 9.** Results for Scenario IV: FDR(T) and Power(T) vs. initial calibration size n (π1 = 0.5, α = 0.2). Panels: RF, SVM, NN, with n from 200 to 600. Methods: SF, LF, SFS, LFS, SAFFRON, LORD++, LOND.

**Figure 10.** Results for Scenario VI: FDR(T) and Power(T) vs. initial calibration size n (π1 = 0.5, α = 0.2). The results with model selection for Scenarios IV and VI are shown below.

**Figure 11.** Results for Scenario IV: FDR(T) and Power(T) at stopping time T across different non-null proportions π1. The black dashed lines denote the FDR level α = 0.1. Methods: Opt-LF, Ran-LF, Opt-LFS, Ran-LFS, LORD++.

**Figure 12.** Results for Scenario VI: FDR(T) and Power(T) at stopping time T across different non-null proportions π1. The black dashed lines denote the FDR level α = 0.1.

**Figure 13.** Results for Scenario III (local dependence): mFDR and FDR at the stopping time as the non-null proportion π1 varies from 0.1 to 0.8. The black dashed lines denote the target FDR level α = 0.1.

**Figure 14.** Results for Scenarios IV and VI: mFDR and FDR at the stopping time as the non-null proportion π1 varies from 0.1 to 0.8, over 500 replications. The black dashed lines denote the target FDR level α = 0.2.

Original abstract

We study online multiple testing with feedback, where decisions are made sequentially and the true state of the hypothesis is revealed after the decision has been made, either instantly or with a delay. We propose GAIF, a feedback-enhanced generalized alpha-investing framework that dynamically adjusts thresholds using revealed outcomes, ensuring finite-sample false discovery rate (FDR)/marginal FDR control. Extending GAIF to online conformal testing, we construct independent conformal $p$-values and introduce a feedback-driven model selection criterion to identify the best model/score, thereby improving statistical power. We demonstrate the effectiveness of our methods through numerical simulations and real-data applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

GAIF adds feedback from revealed outcomes to adjust thresholds in online alpha-investing and applies the idea to sequential conformal model selection, but the finite-sample FDR claim under delayed feedback needs explicit verification.


The main contribution is a feedback loop inside generalized alpha-investing: once a hypothesis outcome is revealed (instantly or later), the rule updates the remaining alpha-wealth or threshold for future tests. They then carry the same mechanism over to online conformal testing by using the feedback to pick among candidate models or scores on the fly. That combination is not in the earlier alpha-investing or conformal literature they cite, and the simulations plus real-data examples show power gains without obvious inflation of FDR in the reported cases.

Referee Report

2 major / 2 minor

Summary. The paper proposes GAIF, a feedback-enhanced generalized alpha-investing procedure for online multiple testing in which the true state of each hypothesis is revealed after the decision (instantly or with delay). It claims that GAIF dynamically updates thresholds using these revelations while preserving finite-sample FDR and marginal FDR control. The method is extended to online conformal testing by constructing independent conformal p-values and introducing a feedback-driven model selection step that improves power. Numerical simulations and real-data examples are provided to illustrate gains over non-feedback baselines.

Significance. If the finite-sample control result holds under delayed feedback, the work would meaningfully extend alpha-investing ideas to realistic sequential settings with post-decision revelations, offering a practical route to higher power in conformal selection and related online testing problems. The conformal extension and empirical demonstrations are clear strengths.

major comments (2)

[§4.1, Theorem 2] FDR control under delay: the supermartingale argument for the alpha-wealth process is stated for the immediate-revelation filtration; extending it to delayed feedback requires an explicit re-derivation showing that the wealth increment remains a supermartingale when the revelation for hypothesis i arrives after decisions for some j > i have already been made. Without this step, the finite-sample bound does not automatically carry over.
[§5.3] Conformal p-value construction: the independence claim for the conformal p-values under the feedback-driven model selection criterion is not accompanied by a precise statement of the filtration or conditioning that prevents the selection step from introducing dependence between the p-value for the current hypothesis and the feedback used to choose the score function.

minor comments (2)

[§3] Notation for the delayed revelation time τ_i is introduced in §3 but used inconsistently in the algorithm pseudocode; a single consistent definition would improve readability.
[Figure 3] Figure 3 caption does not specify the delay distribution used in the simulation; adding this detail would allow readers to reproduce the power curves.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address each major comment below and indicate the revisions we will incorporate.


Referee: [§4.1, Theorem 2] FDR control under delay: the supermartingale argument for the alpha-wealth process is stated for the immediate-revelation filtration; extending it to delayed feedback requires an explicit re-derivation showing that the wealth increment remains a supermartingale when the revelation for hypothesis i arrives after decisions for some j > i have already been made. Without this step, the finite-sample bound does not automatically carry over.

Authors: We agree that the supermartingale argument as currently written is developed explicitly under immediate revelation. In the revision we will add a dedicated subsection that re-derives the supermartingale property under delayed feedback. The argument will use a filtration that includes all decisions made before the delayed revelation arrives and will verify that the wealth increment remains a supermartingale with respect to this filtration, thereby preserving the finite-sample FDR bound. revision: yes
Referee: [§5.3] Conformal p-value construction: the independence claim for the conformal p-values under the feedback-driven model selection criterion is not accompanied by a precise statement of the filtration or conditioning that prevents the selection step from introducing dependence between the p-value for the current hypothesis and the feedback used to choose the score function.

Authors: We will expand Section 5.3 with an explicit statement of the filtration and the conditioning argument. The model-selection step is measurable with respect to the sigma-field generated by all previous revelations and decisions; the conformal p-value for the current hypothesis is then constructed from a score function chosen conditionally on that sigma-field. Because the conformal scores remain exchangeable under the null conditionally on the selected model, the resulting p-value is independent of the selection step and valid for the online procedure. revision: yes
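Both exchanges turn on what information is available when the threshold or score function for hypothesis t is chosen. The sketch below makes that filtration concrete under simple assumptions: hypothesis i is decided at time i, its label arrives at a revelation time τ_i ≥ i, and any rule used at time t may depend only on labels with τ_i < t. The helper name and the encoding (None for a pending label) are hypothetical; with τ_i = i (instant feedback) every earlier label is visible.

```python
from typing import List, Optional, Sequence


def labels_known_at(t: int, true_labels: Sequence[int],
                    reveal_times: Sequence[int]) -> List[Optional[int]]:
    """Labels visible when deciding hypothesis t (times are 1-indexed).

    Hypothesis i is decided at time i and its label arrives at time
    reveal_times[i - 1] >= i (tau_i in the text). A rule used at time t may
    only depend on labels with tau_i < t; pending labels are returned as None.
    """
    known: List[Optional[int]] = []
    for i in range(1, t):  # hypotheses decided before time t
        tau = reveal_times[i - 1]
        if tau < i:
            raise ValueError("a label cannot be revealed before its decision")
        known.append(true_labels[i - 1] if tau < t else None)
    return known


if __name__ == "__main__":
    truth = [1, 0, 1, 0]
    reveal_at = [1, 4, 3, 6]  # hypothesis 2 is revealed at time 4, hypothesis 4 at time 6
    print(labels_known_at(4, truth, reveal_at))     # [1, None, 1]
    print(labels_known_at(5, truth, reveal_at))     # [1, 0, 1, None]
    print(labels_known_at(4, truth, [1, 2, 3, 4]))  # instant feedback: [1, 0, 1]
```

The returned list is the only label information a threshold or model-selection rule may use at time t; in the delayed case, decisions for hypotheses after i are made while label i is still None, which is the situation the first major comment asks the authors to handle explicitly.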

Circularity Check

0 steps flagged

No significant circularity; GAIF extends alpha-investing with an independent derivation of FDR control under feedback

full rationale

The paper presents GAIF as an extension of generalized alpha-investing that incorporates revealed outcomes (instant or delayed) to dynamically adjust thresholds while preserving finite-sample FDR/mFDR control. This control is derived from the supermartingale property of the alpha-wealth process under the feedback filtration, building on but not reducing to prior alpha-investing results. No step equates the claimed guarantee to a fitted parameter or to a self-citation by construction; the feedback update rule and the conformal p-value construction introduce new elements with their own derivations (the referee report above notes that the delayed case still needs an explicit re-derivation). The framework remains self-contained against external benchmarks for FDR control.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework relies on standard definitions of FDR and marginal FDR plus the assumption that feedback provides accurate ground truth. No new free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: revealed outcomes after each decision are accurate and can be used for threshold updates without breaking finite-sample FDR control.
This is the core setting that enables the feedback mechanism described in the abstract.

pith-pipeline@v0.9.0 · 5636 in / 1072 out tokens · 28257 ms · 2026-05-18T19:31:33.275028+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


GAIF procedure adjusts thresholds using revealed outcomes θ_t for finite-sample FDR/mFDR control under conditional super-uniformity
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


online conformal p-values constructed by updating calibration set C0t under exchangeability Assumption 1

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables
cs.LG 2026-04 unverdicted novelty 7.0

Post-hoc conformal selection creates a path of selection sets with estimated false discovery proportions, enabling data-driven adaptive FDR control with average reliability guarantees via e-variables and e-BH.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] Ehud Aharoni and Saharon Rosset. Generalized α-investing: definitions, optimality results and application to public databases. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4):771–794, 2014.

[2] Anastasios N Angelopoulos, Rina Foygel Barber, and Stephen Bates. Theoretical foundations of conformal prediction. arXiv preprint arXiv:2411.11824, 2024.

[3] Tian Bai and Ying Jin. Optimized conformal selection: Powerful selective inference after conformity score optimization. arXiv preprint arXiv:2411.17983, 2024.

[4] Stephen Bates, Emmanuel Candès, Lihua Lei, Yaniv Romano, and Matteo Sesia. Testing for outliers with conformal p-values. The Annals of Statistics, 51(1):149–178, 2023.

[5] Barry Becker and Ronny Kohavi. Adult income investigation. UCI Machine Learning Repository, https://archive-beta.ics.uci.edu/dataset/2/adult, 1996.

[6] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 57(1):289–300, 1995.

[7] Thomas Brooks, D Pope, and Michael Marcolini. Airfoil Self-Noise. UCI Machine Learning Repository, https://archive.ics.uci.edu/dataset/291/airfoil+self+noise, 2014.

[8] Evanthia Faliagka, Lazaros Iliadis, Ioannis Karydis, Maria Rigou, Spyros Sioutas, Athanasios Tsakalidis, and Giannis Tzimas. On-line consistent ranking on e-recruitment: seeking the truth behind a well-formed CV. Artificial Intelligence Review, 42(3):515–528, 2014.

[9] Lasse Fischer, Ziyu Xu, and Aaditya Ramdas. Online generalizations of the e-BH and BH procedure. arXiv preprint arXiv:2407.20683, 2024.

[10] Aaron Fisher. Online false discovery rate control for LORD++ and SAFFRON under positive, local dependence. Biometrical Journal, 66(1):2300177, 2024.

[11] Dean P Foster and Robert A Stine. α-investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(2):429–444, 2008.

[12] Bowen Gang, Wenguang Sun, and Weinan Wang. Structure-adaptive sequential testing for online false discovery rate control. Journal of the American Statistical Association, pages 1–14, 2021.

[13] Matteo Gasparin and Aaditya Ramdas. Conformal online model aggregation. arXiv preprint arXiv:2403.15527, 2024.

[14] Isaac Gibbs and Emmanuel Candès. Adaptive conformal inference under distribution shift. Advances in Neural Information Processing Systems, 34:1660–1672, 2021.

[15] Isaac Gibbs and Emmanuel J Candès. Conformal inference for online prediction with arbitrary distribution shifts. Journal of Machine Learning Research, 25(162):1–36, 2024.

[16] Yu Gui, Ying Jin, and Zhimei Ren. Conformal alignment: Knowing when to trust foundation models with guarantees. Advances in Neural Information Processing Systems, 37:73884–73919, 2024.

[17] Yu Gui, Ying Jin, Yash Nair, and Zhimei Ren. ACS: An interactive framework for conformal selection. arXiv preprint arXiv:2507.15825, 2025.

[18] Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2):1–55, 2025.

[19] Pierre Humbert, Ulysse Gazin, Ruth Heller, and Etienne Roquain. Online selective conformal inference: adaptive scores, convergence rate and optimality. arXiv preprint arXiv:2508.10336, 2025.

[20] Yuyang Huo, Lin Lu, Haojie Ren, and Changliang Zou. Real-time selection under general constraints via predictive inference. Advances in Neural Information Processing Systems, 37:61267–61305, 2024.

[21] Adel Javanmard and Andrea Montanari. On online control of false discovery rate. arXiv preprint arXiv:1502.06197, 2015.

[22] Adel Javanmard and Andrea Montanari. Online rules for control of false discovery rate and false discovery exceedance. The Annals of Statistics, 46(2):526–554, 2018.

[23] Ying Jin and Emmanuel J Candès. Model-free selective inference under covariate shift via weighted conformal p-values. arXiv preprint arXiv:2307.09291, 2023a.

[24] Ying Jin and Emmanuel J Candès. Selection by prediction with conformal p-values. Journal of Machine Learning Research, 24(244):1–41, 2023b.

[25] Kaggle. Candidate selection dataset. https://www.kaggle.com/datasets/tarunchilkur/client, 2020.

[26] Kaggle. Diabetes health indicators dataset. https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset, 2021.

[27] Etienne Krönert, Alain Célisse, and Dalila Hattab. FDR control for online anomaly detection. arXiv preprint arXiv:2312.01969, 2023.

[28] Ariane Marandon, Lihua Lei, David Mary, and Etienne Roquain. Adaptive novelty detection with false discovery rate guarantee. The Annals of Statistics, 52(1):157–183, 2024.

[29] Yash Nair, Ying Jin, James Yang, and Emmanuel Candès. Diversifying conformal selections. arXiv preprint arXiv:2506.16229, 2025.

[30] Drew Prinster, Xing Han, Anqi Liu, and Suchi Saria. WATCH: Adaptive monitoring for AI deployments via weighted-conformal martingales. In Forty-second International Conference on Machine Learning, 2025.

[31] Aaditya Ramdas, Fanny Yang, Martin J Wainwright, and Michael I Jordan. Online control of the false discovery rate with decaying memory. Advances in Neural Information Processing Systems, 30:5655–5664, 2017.

[32] Aaditya Ramdas, Tijana Zrnic, Martin Wainwright, and Michael Jordan. SAFFRON: an adaptive algorithm for online control of the false discovery rate. In International Conference on Machine Learning, pages 4286–4294. PMLR, 2018.

[33] Aaditya K Ramdas, Rina F Barber, Martin J Wainwright, and Michael I Jordan. A unified treatment of multiple testing with prior knowledge using the p-filter. The Annals of Statistics, 47(5):2790–2821, 2019.

[34] Quentin Rebjock, Baris Kurt, Tim Januschowski, and Laurent Callot. Online false discovery rate control for anomaly detection in time series. Advances in Neural Information Processing Systems, 34:26487–26498, 2021.

[35] David S Robertson, James MS Wason, Franz König, Martin Posch, and Thomas Jaki. Online error rate control for platform trials. Statistics in Medicine, 42(14):2475–2495, 2023.

[36] John D Storey, Jonathan E Taylor, and David Siegmund. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society Series B: Statistical Methodology, 66(1):187–205, 2004.

[37] Jinjin Tian and Aaditya Ramdas. ADDIS: an adaptive discarding algorithm for online FDR control with conservative nulls. Advances in Neural Information Processing Systems, 32:9388–9396, 2019.

[38] Vladimir Vovk. Testing randomness online. Statistical Science, 36(4):595–611, 2021.

[39] Vladimir Vovk, Ilia Nouretdinov, and Alexander Gammerman. Testing exchangeability on-line. In Proceedings of the 20th International Conference on Machine Learning, pages 768–775, 2003.

[40] Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic learning in a random world. New York: Springer, 2005.

[41] Xiaoning Wang, Yuyang Huo, Liuhua Peng, and Changliang Zou. Conformalized multiple testing after data-dependent selection. Advances in Neural Information Processing Systems, 37:58574–58609, 2024.

[42] Xiaoyang Wu, Yuyang Huo, Haojie Ren, and Changliang Zou. Optimal subsampling via predictive inference. Journal of the American Statistical Association, 119(548):2844–2856, 2024.

[43] Xiaoyang Wu, Lin Lu, Zhaojun Wang, and Changliang Zou. Conditional testing based on localized conformal p-values. In The Thirteenth International Conference on Learning Representations, 2025.

[44] Ziyu Xu and Aaditya Ramdas. Online multiple testing with e-values. In International Conference on Artificial Intelligence and Statistics, pages 3997–4005. PMLR, 2024.

[45] Yifan Zhang, Haiyan Jiang, Haojie Ren, Changliang Zou, and Dejing Dou. AutoMS: automatic model selection for novelty detection with error rate control. In Advances in Neural Information Processing Systems, 2022.

[46] Yifan Zhang, Zijian Wei, Haojie Ren, and Changliang Zou. e-GAI: e-value-based generalized alpha-investing for online false discovery rate control. In Forty-second International Conference on Machine Learning, 2025.

[47] Tijana Zrnic, Aaditya Ramdas, and Michael I Jordan. Asynchronous online testing of multiple hypotheses. Journal of Machine Learning Research, 22(33):1–39, 2021.


work page 2014

[9] [9]

Online generalizations of the e-BH and BH procedure

Lasse Fischer, Ziyu Xu, and Aaditya Ramdas. Online generalizations of the e-BH and BH procedure. arXiv preprint arXiv:2407.20683, 2024

work page arXiv 2024

[10] [10]

Online false discovery rate control for LORD++ and SAFFRON under positive, local dependence

Aaron Fisher. Online false discovery rate control for LORD++ and SAFFRON under positive, local dependence. Biometrical Journal, 66 0 (1): 0 2300177, 2024

work page 2024

[11] [11]

-investing: a procedure for sequential control of expected false discoveries

Dean P Foster and Robert A Stine. -investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70 0 (2): 0 429--444, 2008

work page 2008

[12] [12]

Structure--adaptive sequential testing for online false discovery rate control

Bowen Gang, Wenguang Sun, and Weinan Wang. Structure--adaptive sequential testing for online false discovery rate control. Journal of the American Statistical Association, pages 1--14, 2021

work page 2021

[13] [13]

Conformal online model aggregation

Matteo Gasparin and Aaditya Ramdas. Conformal online model aggregation. arXiv preprint arXiv:2403.15527, 2024

work page arXiv 2024

[14] [14]

Adaptive conformal inference under distribution shift

Isaac Gibbs and Emmanuel Cand \`e s. Adaptive conformal inference under distribution shift. Advances in Neural Information Processing Systems, 34: 0 1660--1672, 2021

work page 2021

[15] [15]

Conformal inference for online prediction with arbitrary distribution shifts

Isaac Gibbs and Emmanuel J Cand \`e s. Conformal inference for online prediction with arbitrary distribution shifts. Journal of Machine Learning Research, 25 0 (162): 0 1--36, 2024

work page 2024

[16] [16]

Conformal alignment: Knowing when to trust foundation models with guarantees

Yu Gui, Ying Jin, and Zhimei Ren. Conformal alignment: Knowing when to trust foundation models with guarantees. Advances in Neural Information Processing Systems, 37: 0 73884--73919, 2024

work page 2024

[17] [17]

ACS : An interactive framework for conformal selection

Yu Gui, Ying Jin, Yash Nair, and Zhimei Ren. ACS : An interactive framework for conformal selection. arXiv preprint arXiv:2507.15825, 2025

work page arXiv 2025

[18] [18]

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43 0 (2): 0 1--55, 2025

work page 2025

[19] [19]

Online selective conformal inference: adaptive scores, convergence rate and optimality

Pierre Humbert, Ulysse Gazin, Ruth Heller, and Etienne Roquain. Online selective conformal inference: adaptive scores, convergence rate and optimality. arXiv preprint arXiv:2508.10336, 2025

work page arXiv 2025

[20] [20]

Real-time selection under general constraints via predictive inference

Yuyang Huo, Lin Lu, Haojie Ren, and Changliang Zou. Real-time selection under general constraints via predictive inference. Advances in Neural Information Processing Systems, 37: 0 61267--61305, 2024

work page 2024

[21] [21]

On Online Control of False Discovery Rate

Adel Javanmard and Andrea Montanari. On online control of false discovery rate. arXiv preprint arXiv:1502.06197, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[22] [22]

Online rules for control of false discovery rate and false discovery exceedance

Adel Javanmard and Andrea Montanari. Online rules for control of false discovery rate and false discovery exceedance. The Annals of Statistics, 46 0 (2): 0 526--554, 2018

work page 2018

[23] [23]

Model-free selective inference under covariate shift via weighted conformal p-values

Ying Jin and Emmanuel J Cand \`e s. Model-free selective inference under covariate shift via weighted conformal p-values. arXiv preprint arXiv:2307.09291, 2023 a

work page arXiv 2023

[24] [24]

Selection by prediction with conformal p-values

Ying Jin and Emmanuel J Cand \`e s. Selection by prediction with conformal p-values. Journal of Machine Learning Research, 24 0 (244): 0 1--41, 2023 b

work page 2023

[25] [25]

Candidate selection dataset

Kaggle. Candidate selection dataset. https://www.kaggle.com/datasets/tarunchilkur/client, 2020

work page 2020

[26] [26]

Diabetes health indicators dataset

Kaggle. Diabetes health indicators dataset. https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset, 2021

work page 2021

[27] [27]

Fdr control for online anomaly detection

Etienne Kr \"o nert, Alain C \'e lisse, and Dalila Hattab. Fdr control for online anomaly detection. arXiv preprint arXiv:2312.01969, 2023

work page arXiv 2023

[28] [28]

Adaptive novelty detection with false discovery rate guarantee

Ariane Marandon, Lihua Lei, David Mary, and Etienne Roquain. Adaptive novelty detection with false discovery rate guarantee . The Annals of Statistics, 52 0 (1): 0 157 -- 183, 2024

work page 2024

[29] [29]

Diversifying conformal selections

Yash Nair, Ying Jin, James Yang, and Emmanuel Candes. Diversifying conformal selections. arXiv preprint arXiv:2506.16229, 2025

work page arXiv 2025

[30] [30]

WATCH : Adaptive monitoring for AI deployments via weighted-conformal martingales

Drew Prinster, Xing Han, Anqi Liu, and Suchi Saria. WATCH : Adaptive monitoring for AI deployments via weighted-conformal martingales. In Forty-second International Conference on Machine Learning, 2025

work page 2025

[31] [31]

Online control of the false discovery rate with decaying memory

Aaditya Ramdas, Fanny Yang, Martin J Wainwright, and Michael I Jordan. Online control of the false discovery rate with decaying memory. Advances in Neural Information Processing Systems, 30: 0 5655--5664, 2017

work page 2017

[32] [32]

SAFFRON : an adaptive algorithm for online control of the false discovery rate

Aaditya Ramdas, Tijana Zrnic, Martin Wainwright, and Michael Jordan. SAFFRON : an adaptive algorithm for online control of the false discovery rate. In International Conference on Machine Learning, pages 4286--4294. PMLR, 2018

work page 2018

[33] [33]

A unified treatment of multiple testing with prior knowledge using the p-filter

Aaditya K Ramdas, Rina F Barber, Martin J Wainwright, and Michael I Jordan. A unified treatment of multiple testing with prior knowledge using the p-filter. The Annals of Statistics, 47 0 (5): 0 2790--2821, 2019

work page 2019

[34] [34]

Online false discovery rate control for anomaly detection in time series

Quentin Rebjock, Baris Kurt, Tim Januschowski, and Laurent Callot. Online false discovery rate control for anomaly detection in time series. Advances in Neural Information Processing Systems, 34: 0 26487--26498, 2021

work page 2021

[35] [35]

Online error rate control for platform trials

David S Robertson, James MS Wason, Franz K \"o nig, Martin Posch, and Thomas Jaki. Online error rate control for platform trials. Statistics in Medicine, 42 0 (14): 0 2475--2495, 2023

work page 2023

[36] [36]

Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach

John D Storey, Jonathan E Taylor, and David Siegmund. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society Series B: Statistical Methodology, 66 0 (1): 0 187--205, 2004

work page 2004

[37] [37]

ADDIS : an adaptive discarding algorithm for online FDR control with conservative nulls

Jinjin Tian and Aaditya Ramdas. ADDIS : an adaptive discarding algorithm for online FDR control with conservative nulls. Advances in Neural Information Processing Systems, 32: 0 9388--9396, 2019

work page 2019

[38] [38]

Testing randomness online

Vladimir Vovk. Testing randomness online. Statistical Science, 36 0 (4): 0 595--611, 2021

work page 2021

[39] [39]

Testing exchangeability on-line

Vladimir Vovk, Ilia Nouretdinov, and Alexander Gammerman. Testing exchangeability on-line. In Proceedings of the 20th International Conference on Machine Learning, pages 768--775, 2003

work page 2003

[40] [40]

Algorithmic learning in a random world

Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic learning in a random world. New York: Springer, 2005

work page 2005

[41] [41]

Conformalized multiple testing after data-dependent selection

Xiaoning Wang, Yuyang Huo, Liuhua Peng, and Changliang Zou. Conformalized multiple testing after data-dependent selection. Advances in Neural Information Processing Systems, 37: 0 58574--58609, 2024

work page 2024

[42] [42]

Optimal subsampling via predictive inference

Xiaoyang Wu, Yuyang Huo, Haojie Ren, and Changliang Zou. Optimal subsampling via predictive inference. Journal of the American Statistical Association, 119 0 (548): 0 2844--2856, 2024

work page 2024

[43] [43]

Conditional testing based on localized conformal p-values

Xiaoyang Wu, Lin Lu, Zhaojun Wang, and Changliang Zou. Conditional testing based on localized conformal p-values. In The Thirteenth International Conference on Learning Representations, 2025

work page 2025

[44] [44]

Online multiple testing with e-values

Ziyu Xu and Aaditya Ramdas. Online multiple testing with e-values. In International Conference on Artificial Intelligence and Statistics, pages 3997--4005. PMLR, 2024

work page 2024

[45] [45]

Automs: automatic model selection for novelty detection with error rate control

Yifan Zhang, Haiyan Jiang, Haojie Ren, Changliang Zou, and Dejing Dou. Automs: automatic model selection for novelty detection with error rate control. In Advances in Neural Information Processing Systems, 2022

work page 2022

[46] [46]

e- GAI : e-value-based generalized alpha-investing for online false discovery rate control

Yifan Zhang, Zijian Wei, Haojie Ren, and Changliang Zou. e- GAI : e-value-based generalized alpha-investing for online false discovery rate control. In Forty-second International Conference on Machine Learning, 2025

work page 2025

[47] [47]

Tijana Zrnic, Aaditya Ramdas, and Michael I. Jordan. Asynchronous online testing of multiple hypotheses. Journal of Machine Learning Research, 22 0 (33): 0 1--39, 2021

work page 2021