pith. machine review for the scientific record.

arxiv: 2605.11511 · v1 · submitted 2026-05-12 · 📊 stat.ML · cs.LG

Recognition: 2 theorem links

Post-ADC Inference: Valid Inference After Active Data Collection

Ichiro Takeuchi, Shuichi Nishino, Teruyuki Katsuoka, Tomohiro Shiraishi

Pith reviewed 2026-05-13 01:45 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords post-ADC inference · selective inference · active data collection · black-box optimization · SMBO · valid p-values · confidence intervals · TPE

The pith

Post-ADC inference corrects selection biases from both active data collection and data-driven targets to deliver valid p-values and confidence intervals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Active data collection methods such as sequential model-based optimization concentrate samples in promising regions, which breaks standard statistical inference when the same data is later used for testing or estimation. The paper develops post-ADC inference to adjust for the combined effects of this adaptive sampling and any post-collection choice of inferential target. The method uses selective inference machinery and requires assumptions only on the observation noise. A reader would care because it restores trustworthy p-values and intervals for data that was gathered efficiently but non-randomly. The framework is shown to work empirically for common algorithms including GP-UCB and TPE without placing restrictions on the unknown black-box function.

Core claim

Building on selective inference, post-ADC inference accounts for the biases arising from both the active data collection process and the subsequent data-driven target construction, delivering valid p-values and confidence intervals for a broad class of ADC processes under assumptions on the observation noise alone.

What carries the argument

The post-ADC inference framework, an extension of selective inference that simultaneously corrects for adaptive sampling bias and data-dependent target selection.

If this is right

  • Valid p-values and confidence intervals become available for data collected by GP-UCB and TPE.
  • The same guarantees hold for any ADC process whose sampling depends only on past observations under the stated noise assumptions.
  • No modeling assumptions are needed about the underlying black-box function or the surrogate used during collection.
  • Both sources of bias, adaptive sampling and data-driven target construction, are corrected simultaneously.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could support reliable post-hoc hypothesis tests on the outcomes of hyperparameter optimization or neural architecture search runs.
  • Similar corrections might be developed for other adaptive collection settings such as sequential experimental design or online A/B testing.
  • Future implementations could integrate the method directly into existing SMBO libraries to report inference results alongside optimization traces.

Load-bearing premise

The observation noise must obey conditions that let selective inference produce exact or asymptotic validity, without any requirements on the black-box function or the surrogate model.

What would settle it

A Monte Carlo experiment under the null hypothesis: if the p-values produced by post-ADC inference deviate from the uniform distribution on [0, 1], or the associated confidence intervals fail to attain nominal coverage, the validity claim is refuted; uniform p-values and nominal coverage across many replications would support it.
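Such a check fits in a few lines. The harness below is a stand-in, not the paper's protocol: it tests an array of null p-values for uniformity with a one-sample Kolmogorov–Smirnov statistic and reports the empirical Type-I error. Here it is fed exactly valid two-sided z-test p-values; in the intended use, the post-ADC p-values would be plugged in instead.

```python
import math
import numpy as np

def validity_check(pvals, alpha=0.05):
    """KS check that p-values look Uniform(0,1), plus empirical Type-I error."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    hi = np.arange(1, n + 1) / n
    lo = np.arange(0, n) / n
    ks = max(np.max(hi - p), np.max(p - lo))
    # 1.358 / sqrt(n) approximates the 5% KS critical value for large n.
    return ks <= 1.358 / math.sqrt(n), float(np.mean(p <= alpha))

# Stand-in null p-values from an exactly valid two-sided z-test
# (p = erfc(|z| / sqrt(2)) for z ~ N(0, 1)).
rng = np.random.default_rng(1)
z = rng.normal(size=4000)
pvals = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

uniform_ok, type1 = validity_check(pvals)
print(uniform_ok, round(type1, 3))
```

An anti-conservative procedure would show up here as a KS rejection together with an empirical Type-I error visibly above the nominal level.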

Figures

Figures reproduced from arXiv: 2605.11511 by Ichiro Takeuchi, Shuichi Nishino, Teruyuki Katsuoka, Tomohiro Shiraishi.

Figure 1. Schematic illustration of a post-ADC analysis.
Figure 2. Empirical Type-I error and coverage. Post-ADC remains close to nominal Type-I error and coverage across horizons, while the naive method and the ablations are anti-conservative and Bonferroni is conservative. In GP-UCB, w/o-η nearly overlaps post-ADC; failure cases appear in Appendix F.5.
Figure 3. Empirical power. Post-ADC gains power as the signal increases across six synthetic objective families, while all Bonferroni results (dashed lines) remain near zero.
Figure 4. Toy example summary.
Figure 5. Distribution of selective confidence interval lengths (log scale).
Figure 6. Type-I error under the global null f⋆ ≡ 0 as the problem dimension varies over d ∈ {1, 2, 3}, with Nsteps = 50 fixed. The horizontal dashed line marks the nominal level α = 0.05.
Figure 7. Type-I error under the global null f⋆ ≡ 0 as the acquisition hyperparameters vary, with d = 3 and Nsteps = 50 fixed. The horizontal dashed line marks the nominal level α = 0.05.
read the original abstract

The validity of statistical inference depends critically on how data are collected. When data gathered through active data collection (ADC) are reused for a post-hoc inferential task, conventional inference can fail because the sampling is adaptively biased toward regions favored by the collection strategy. This issue is especially pronounced in black-box optimization, where sequential model-based optimization (SMBO) methods such as the tree-structured Parzen estimator (TPE) and Gaussian process upper confidence bound (GP-UCB) preferentially concentrate evaluations in promising regions. We study statistical inference on actively collected data when the inferential target is constructed in a data-dependent manner after data collection. To enable valid inference in this setting, we propose post-ADC inference, a framework that accounts for the biases arising from both the active data collection process and the subsequent data-driven target construction. Our method builds on selective inference and provides valid $p$-values and confidence intervals that correct for both sources of bias. The framework applies to a broad class of ADC processes by imposing only assumptions on the observation noise, without requiring any assumptions on the underlying black-box function or the surrogate model used by the SMBO algorithm. Empirical results also show that post-ADC inference provides valid inference for data collected by GP-UCB and TPE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes post-ADC inference, a selective-inference framework for valid p-values and confidence intervals after active data collection (ADC) via SMBO algorithms such as GP-UCB and TPE. The inferential target may be constructed data-dependently after collection; the method corrects for selection bias from both the adaptive sampling path and the target construction. Validity is claimed to hold under noise-model assumptions alone, with no requirements on the black-box function or the surrogate model used by the ADC procedure. Empirical checks on GP-UCB and TPE data are reported to confirm validity.

Significance. If the central claim holds, the work would provide a practical tool for post-hoc inference on adaptively collected data in black-box optimization, extending selective-inference techniques to settings where both the sampling and the target are data-driven. The claimed generality (noise assumptions only) would be a strength relative to methods that require surrogate-specific derivations.

major comments (2)
  1. [Abstract] Abstract: the claim that the framework 'imposes only assumptions on the observation noise, without requiring any assumptions on ... the surrogate model' is load-bearing for the generality result, yet the selection event for TPE encodes the Parzen-estimator density ratios and sampling steps at each iteration. It is unclear how the conditional distribution of the data given the full adaptive path can be characterized (exactly or via Monte Carlo) without reference to the surrogate's update mechanics; a derivation or explicit algorithm showing surrogate-agnostic conditioning is needed.
  2. [Abstract] Abstract (and empirical section): the manuscript asserts that empirical checks on GP-UCB and TPE data succeed, but supplies neither a derivation/proof sketch for the conditional distribution nor a detailed protocol (number of Monte Carlo draws, exact conditioning event definition, coverage tables). Without these, the support for exact validity cannot be evaluated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each major comment below, providing clarifications on the framework's generality and committing to additions that strengthen the presentation of the conditional distribution and empirical protocol.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that the framework 'imposes only assumptions on the observation noise, without requiring any assumptions on ... the surrogate model' is load-bearing for the generality result, yet the selection event for TPE encodes the Parzen-estimator density ratios and sampling steps at each iteration. It is unclear how the conditional distribution of the data given the full adaptive path can be characterized (exactly or via Monte Carlo) without reference to the surrogate's update mechanics; a derivation or explicit algorithm showing surrogate-agnostic conditioning is needed.

    Authors: The surrogate-agnostic claim means that the validity theorem depends only on the noise distribution and places no restrictions on the form of the surrogate or black-box function. To characterize the conditional distribution given the observed adaptive path, we use Monte Carlo sampling: draw fresh noise realizations from the assumed model, adjust the observed responses accordingly, and re-execute the identical ADC procedure (including all surrogate updates and sampling rules) that produced the original data. Because the practitioner knows and can re-run their chosen surrogate, this procedure requires no additional modeling assumptions. We will add an explicit algorithm box together with a short derivation sketch in the revised manuscript to detail the conditioning step. revision: yes

  2. Referee: [Abstract] Abstract (and empirical section): the manuscript asserts that empirical checks on GP-UCB and TPE data succeed, but supplies neither a derivation/proof sketch for the conditional distribution nor a detailed protocol (number of Monte Carlo draws, exact conditioning event definition, coverage tables). Without these, the support for exact validity cannot be evaluated.

    Authors: We agree that the current version lacks sufficient implementation detail. In the revision we will insert a proof sketch deriving the conditional distribution under the noise model, give the precise definition of the conditioning event (the full sequence of sampling decisions plus the data-dependent target construction), report the Monte Carlo protocol (10,000 draws in the reported experiments), and supply coverage tables for both GP-UCB and TPE to allow direct evaluation of the empirical results. revision: yes
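As we read the rebuttal, the conditioning step amounts to rejection sampling: redraw noise from the assumed model, re-run the identical ADC procedure, and keep replicates that reproduce the selection. The sketch below is our reconstruction on a toy two-stage greedy collector, and it conditions only on the selected target, a coarser event than the paper's presumably finer path conditioning.

```python
import numpy as np

rng = np.random.default_rng(2)
K, n0, extra = 8, 10, 50

def run_adc(noise_stage1, noise_stage2):
    """Deterministic, re-runnable ADC: two-stage greedy allocation."""
    samples = [list(noise_stage1[k]) for k in range(K)]
    best = max(range(K), key=lambda k: float(np.mean(samples[k])))
    samples[best].extend(noise_stage2)          # concentrate budget on the leader
    target = max(range(K), key=lambda k: float(np.mean(samples[k])))
    return samples, target

# Observed run under the global null (all true means 0, N(0, 1) noise).
samples, target = run_adc(rng.normal(size=(K, n0)), rng.normal(size=extra))
y = np.asarray(samples[target])
t_obs = abs(np.sqrt(len(y)) * y.mean())

# Monte Carlo conditioning: fresh noise from the assumed model, identical
# ADC re-run, keep replicates that select the same data-driven target.
null_stats = []
while len(null_stats) < 500:
    s, tgt = run_adc(rng.normal(size=(K, n0)), rng.normal(size=extra))
    if tgt == target:                           # the selection event
        v = np.asarray(s[tgt])
        null_stats.append(abs(np.sqrt(len(v)) * v.mean()))

p_selective = (1 + sum(t >= t_obs for t in null_stats)) / (1 + len(null_stats))
print(f"conditional p-value: {p_selective:.3f}")
```

Nothing here depends on the form of the surrogate beyond the ability to re-execute it, which is consistent with the authors' surrogate-agnostic reading; the cost is the rejection-sampling acceptance rate, which shrinks as the conditioning event gets finer.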

Circularity Check

0 steps flagged

No significant circularity; derivation rests on external selective-inference conditioning

full rationale

The paper's central construction conditions the target statistic on the observed ADC selection path under a noise model only. This conditioning step is defined independently of the final inferential target and does not reduce any reported p-value or CI to a fitted parameter or self-referential quantity by construction. The framework explicitly invokes the established selective-inference machinery for the selection event; no load-bearing premise collapses to a self-citation, an ansatz smuggled via prior work, or a renaming of an empirical pattern. The generality claim (no assumptions on f or surrogate) is a correctness question rather than a definitional loop, and the derivation chain remains non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on extending selective inference with an additional bias-correction layer whose validity is asserted to require only standard assumptions on observation noise; no free parameters, invented entities, or additional axioms are identifiable from the abstract.

axioms (1)
  • domain assumption Observation noise satisfies standard conditions (e.g., independence, known distribution family) sufficient for the selective-inference correction to remain valid.
    The abstract states that the framework imposes only assumptions on the observation noise and applies to a broad class of ADC processes.

pith-pipeline@v0.9.0 · 5535 in / 1492 out tokens · 54227 ms · 2026-05-13T01:45:42.884014+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.
