Pith · machine review for the scientific record

arxiv: 2605.06608 · v1 · submitted 2026-05-07 · 📊 stat.ML · cs.LG · stat.ME


DARTS: Targeting Prognostic Covariates in Budget-Constrained Sequential Experiments

Alexander Volfovsky, Kateryna Husar

Pith reviewed 2026-05-08 04:35 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME
keywords covariate-adaptive randomization · Thompson sampling · sequential experiments · budget constraints · causal inference · regression adjustment · randomized controlled trials

The pith

DARTS decouples adaptive covariate selection from randomization validity in budget-limited sequential experiments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DARTS, a method that treats the acquisition of high-dimensional pretreatment covariates as a sequential decision problem solved by combinatorial Thompson sampling. It learns which covariates are prognostic from past experimental batches and uses them to improve rerandomization and regression adjustment in future batches. The key result is that this adaptation preserves the validity of each batch's randomization, so the overall inverse-variance weighted estimator for the average treatment effect maintains at least nominal asymptotic coverage. This matters because real-world trials face measurement costs, and DARTS closes much of the efficiency gap to an oracle that knows the best covariates in advance while keeping strict inferential guarantees.
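To make the acquisition layer concrete, here is a minimal sketch of a budgeted combinatorial Thompson sampler for covariate selection. The Beta–Bernoulli reward model, greedy score-per-cost rounding, and all constants are our illustration of the general idea, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_covariates(alpha, beta, costs, budget):
    """One Thompson-sampling step: draw a prognostic score per covariate from
    its Beta posterior, then greedily buy covariates by score-per-cost until
    the per-batch measurement budget is exhausted."""
    scores = rng.beta(alpha, beta)
    chosen, spent = [], 0.0
    for j in np.argsort(-scores / costs):
        if spent + costs[j] <= budget:
            chosen.append(int(j))
            spent += costs[j]
    return chosen

def update_posterior(alpha, beta, chosen, rewards):
    """Binary-reward update: reward 1 if the covariate proved prognostic in
    the completed batch (here a stand-in for a fitted prognostic signal)."""
    for j, r in zip(chosen, rewards):
        alpha[j] += r
        beta[j] += 1 - r

p = 10                                  # candidate covariates
alpha, beta = np.ones(p), np.ones(p)    # uniform Beta(1,1) priors
costs = np.ones(p)                      # unit measurement costs
truly_prognostic = {0, 1, 2}

for t in range(200):                    # 200 experimental batches
    chosen = select_covariates(alpha, beta, costs, budget=3)
    rewards = [1 if j in truly_prognostic else 0 for j in chosen]
    update_posterior(alpha, beta, chosen, rewards)

# The posterior mean should now concentrate on the truly prognostic set.
print(np.sort(np.argsort(-(alpha / (alpha + beta)))[:3]))
```

The key structural point mirrored from the paper: the posterior used to select batch t's covariates is a function of batches 1..t-1 only, so the current batch's randomization is untouched by the selection.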

Core claim

Adaptive covariate selection based on past batches preserves batch-level randomization validity, and the cumulative inverse-variance weighted estimator achieves at least nominal asymptotic coverage. The acquisition layer satisfies a Bayes risk bound that matches the minimax lower bound up to logarithmic factors.

What carries the argument

The decoupling result between adaptive covariate selection via Thompson sampling on past batches and the design-based validity of current-batch randomization.

If this is right

  • The method systematically concentrates the measurement budget on the most informative covariates.
  • It closes a substantial fraction of the efficiency gap to an oracle design that knows the prognostic covariates in advance.
  • The overall estimator retains at least nominal asymptotic coverage for the average treatment effect.
  • Batch-level randomization remains valid at every step despite the adaptive policy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decoupling idea might allow adaptive covariate policies inside other design-based procedures such as stratified randomization or rerandomization with different balance metrics.
  • Empirical checks could compare DARTS against fixed-budget designs on data sets with known cost structures to measure realized variance reduction.
  • Tighter analysis could remove the logarithmic factors in the risk bound or replace Thompson sampling with a different acquisition rule while preserving the validity guarantee.

Load-bearing premise

The prognostic value of covariates can be learned reliably from past batches using Thompson sampling without introducing bias into the current batch's randomization or the overall estimator.

What would settle it

A simulation in which the coverage probability of the treatment-effect confidence interval falls below the nominal level when selection is made adaptive across batches, or in which the Bayes risk of the acquisition policy exceeds the minimax lower bound by more than logarithmic factors.
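The first of those checks is cheap to run. Below is a minimal two-batch Monte Carlo under our own toy data-generating process (one prognostic covariate, normal errors, adaptive selection between batches), not the paper's simulation design: batch 1 uses an unadjusted difference in means and picks the covariate most correlated with its outcomes; batch 2 regression-adjusts on that selection; the two estimates are combined by inverse-variance weights and coverage of the 95% interval is tallied.

```python
import numpy as np

rng = np.random.default_rng(1)
tau, p, n, reps = 1.0, 5, 100, 2000

def ols_tau(y, W, x):
    """OLS of y on [1, W, x]; return the coefficient on W and its classical s.e."""
    Z = np.column_stack([np.ones_like(y), W, x])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[1], np.sqrt(cov[1, 1])

covered = 0
for _ in range(reps):
    # Batch 1: plain randomization, unadjusted difference in means.
    X1 = rng.normal(size=(n, p))
    W1 = rng.permutation(np.repeat([0, 1], n // 2))
    y1 = 2 * X1[:, 0] + tau * W1 + rng.normal(size=n)
    t1 = y1[W1 == 1].mean() - y1[W1 == 0].mean()
    se1 = np.sqrt(y1[W1 == 1].var(ddof=1) / (n // 2)
                  + y1[W1 == 0].var(ddof=1) / (n // 2))
    # Adaptive step: pick the covariate most correlated with batch-1 outcomes.
    sel = np.argmax([abs(np.corrcoef(X1[:, j], y1)[0, 1]) for j in range(p)])
    # Batch 2: fresh randomization, regression-adjust on the selected covariate.
    X2 = rng.normal(size=(n, p))
    W2 = rng.permutation(np.repeat([0, 1], n // 2))
    y2 = 2 * X2[:, 0] + tau * W2 + rng.normal(size=n)
    t2, se2 = ols_tau(y2, W2, X2[:, sel])
    # Inverse-variance-weighted combination and 95% CI.
    w1, w2 = 1 / se1**2, 1 / se2**2
    est = (w1 * t1 + w2 * t2) / (w1 + w2)
    half = 1.96 / np.sqrt(w1 + w2)
    covered += (est - half <= tau <= est + half)

print(covered / reps)  # should sit near the nominal 0.95
```

If cross-batch adaptivity broke validity, the printed coverage would fall visibly below 0.95; in this selection-from-past-batches setup it should not.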

Figures

Figures reproduced from arXiv: 2605.06608 by Alexander Volfovsky, Kateryna Husar.

Figure 1: Cumulative regret relative to Oracle DARTS across 1000 replications. Solid lines show medians; shaded bands show 95% intervals from run percentiles. DARTS with fractional and binary rewards exhibit sublinear growth, with fractional rewards converging faster. We formally characterize the regret bound of this fractional scheme in Appendix A.8, establishing that the algorithm retains sublinear Bayes risk wh…
Figure 2: Distribution of final posterior inclusion probabilities
Figure 3: Comparison of DARTS against ARMM [26] and MADCovar [17] across 1000 replications of the Liang DGP (n = 1000, T = 200, p = 100, B = 2000, variable costs). ARMM and MADCovar are given 12 randomly pre-selected covariates; DARTS learns from the full candidate pool. Solid lines show medians across 1000 replications; shaded bands show 95% intervals from run percentiles. All CIs are 95% valid at fixed T; HC2 stan…
Figure 4: Heterogeneous treatment effects robustness check. Liang outcome surface on covariates
Figure 5: Oracle-costly covariates robustness check. Liang DGP with
Figure 6: Diagnostics from the 1000-replication method comparison (Liang DGP,
Original abstract

Randomized controlled trials typically assume that prognostic covariates are known and available at no cost. In practice, obtaining high-dimensional pretreatment data is costly, forcing a trade-off between covariate-adaptive precision and a measurement budget. We introduce Dynamic Adaptive Rerandomization via Thompson Sampling (DARTS), which treats covariate acquisition as a sequential optimization problem embedded within a design-based causal inference task. A budgeted combinatorial Thompson sampler learns which covariates are most prognostic across successive batches; selected covariates then drive rerandomization and regression adjustment to reduce batch-level average treatment effect variance. Our primary theoretical contribution is a decoupling result: adaptive covariate selection based on past batches preserves batch-level randomization validity, and the cumulative inverse-variance weighted estimator achieves at least nominal asymptotic coverage. We further derive a Bayes risk bound for the acquisition layer that matches the minimax lower bound up to logarithmic factors. Empirically, DARTS systematically concentrates the budget on informative features, significantly closing the efficiency gap to oracle designs while maintaining strict inferential validity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes DARTS, a sequential experimental design that embeds budgeted covariate acquisition as a combinatorial Thompson sampling problem within batch-wise RCTs. Covariates selected from prior batches drive rerandomization and regression adjustment to reduce ATE variance; the central claims are a decoupling result ensuring that past-batch adaptive selection preserves within-batch randomization validity and that the inverse-variance-weighted cumulative estimator attains at least nominal asymptotic coverage, plus a Bayes risk bound for the acquisition layer that matches the minimax lower bound up to logarithmic factors. Empirical results indicate that the method concentrates the budget on prognostic features and narrows the efficiency gap to oracle designs while retaining strict design-based validity.

Significance. If the decoupling and coverage results hold under the stated conditions, the work provides a principled way to trade off measurement cost against precision in high-dimensional covariate settings, which is directly relevant to budget-limited trials. The explicit separation of selection from randomization is a clean contribution to adaptive design theory, and the near-minimax Bayes bound for the acquisition policy is a notable theoretical strength. The empirical demonstration of budget concentration without validity loss further supports practical utility.

major comments (2)
  1. [Abstract / §3] Abstract and §3 (theoretical results): the decoupling claim that 'adaptive covariate selection based on past batches preserves batch-level randomization validity' is asserted by construction, yet the manuscript does not list the precise measurability or independence conditions (e.g., on the Thompson sampling posterior or batch-size growth) required for the argument; without these, it is impossible to verify whether the result extends beyond the finite-batch case or survives model misspecification in the outcome regression.
  2. [Abstract / §4] Abstract and §4 (Bayes risk bound): the statement that the acquisition-layer bound 'matches the minimax lower bound up to logarithmic factors' is given without an explicit statement of the outcome model class, the prior on covariate prognostic values, or the precise logarithmic term; this makes it difficult to assess whether the bound is tight only under strong parametric assumptions or holds more generally.
minor comments (2)
  1. [Empirical evaluation] The empirical section describes results qualitatively ('significantly closing the efficiency gap'); quantitative tables or figures reporting variance reduction ratios, coverage rates, and budget allocation fractions across simulation settings would strengthen the claims.
  2. [Notation / §2] Notation for the cumulative inverse-variance-weighted estimator is introduced without an explicit recursive formula or variance estimator; adding this would improve readability for readers focused on implementation.
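For concreteness, one standard form of the estimator the referee is asking about (our reconstruction from the review's description, not the paper's own notation) is:

```latex
\hat\tau_{\mathrm{IVW}}
  = \frac{\sum_{t=1}^{T} \hat w_t\, \hat\tau_t}{\sum_{t=1}^{T} \hat w_t},
\qquad
\hat w_t = \widehat{\operatorname{Var}}(\hat\tau_t)^{-1},
\qquad
\widehat{\operatorname{Var}}(\hat\tau_{\mathrm{IVW}})
  = \Bigl(\sum_{t=1}^{T} \hat w_t\Bigr)^{-1},
```

where τ̂_t is the batch-t adjusted estimator. The running estimate updates recursively by accumulating the two sums Σ_t ŵ_t and Σ_t ŵ_t τ̂_t, which is presumably what an explicit recursive formula in §2 would state.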

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and recommendation for minor revision. The comments correctly identify places where explicit statements of assumptions will improve clarity and verifiability. We address both points below and will incorporate the requested details in the revised manuscript.

Point-by-point responses
  1. Referee: [Abstract / §3] Abstract and §3 (theoretical results): the decoupling claim that 'adaptive covariate selection based on past batches preserves batch-level randomization validity' is asserted by construction, yet the manuscript does not list the precise measurability or independence conditions (e.g., on the Thompson sampling posterior or batch-size growth) required for the argument; without these, it is impossible to verify whether the result extends beyond the finite-batch case or survives model misspecification in the outcome regression.

    Authors: We agree that the measurability and independence conditions should be stated explicitly rather than left implicit. In the revision we will add a dedicated paragraph in §3 that lists the required conditions: (i) the Thompson sampling posterior at the start of batch t is a measurable function of the data from batches 1 through t-1 only, hence independent of the current-batch potential outcomes; (ii) batch sizes n_t are non-decreasing and satisfy n_t / N → 0 with N = sum n_t → ∞; (iii) the covariate-selection policy is adapted to the filtration generated by previous batches. Under these conditions the within-batch randomization remains valid by construction, the inverse-variance-weighted estimator is asymptotically normal, and the coverage result holds in the design-based sense without requiring correct specification of the outcome regression. The argument therefore extends to the sequential (infinite-batch) limit under the stated growth condition. revision: yes

  2. Referee: [Abstract / §4] Abstract and §4 (Bayes risk bound): the statement that the acquisition-layer bound 'matches the minimax lower bound up to logarithmic factors' is given without an explicit statement of the outcome model class, the prior on covariate prognostic values, or the precise logarithmic term; this makes it difficult to assess whether the bound is tight only under strong parametric assumptions or holds more generally.

    Authors: We will make the modeling assumptions explicit in the revised §4. The Bayes-risk upper bound is derived for a linear outcome model Y = Xβ + τW + ε with ε ~ N(0,σ²) and a Gaussian prior on the prognostic vector β. The minimax lower bound is taken with respect to the same parametric class. The logarithmic factor is O(log T) where T denotes the number of batches; it arises from the posterior concentration rate of the combinatorial Thompson sampler. We will add these statements together with a short remark that extensions to nonparametric or misspecified outcome models are left for future work. revision: yes
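In symbols, the model class the rebuttal commits to is the following (the prior covariance Σ₀ is our placeholder, since the rebuttal says only "a Gaussian prior on β"):

```latex
Y_i = X_i^{\top}\beta + \tau W_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),
\qquad \beta \sim \mathcal{N}(0, \Sigma_0),
```

with the claimed rate reading BayesRisk(T) ≤ C log T · Minimax(T) over T batches, the log T factor coming from the posterior concentration of the combinatorial Thompson sampler.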

Circularity Check

0 steps flagged

No significant circularity; decoupling and risk bound are independent of fitted inputs

Full rationale

The paper's central claims rest on a decoupling argument (past-batch Thompson sampling for covariate selection leaves within-batch randomization and the inverse-variance weighted ATE estimator design-valid) and a separate Bayes-risk bound for the acquisition policy that matches minimax rates up to logs. Neither step reduces to a fitted quantity renamed as a prediction, nor to a self-citation that itself assumes the target result. The decoupling follows directly from the sequential batch structure (selection uses only prior data), and the risk bound is stated under an explicit outcome model without circular dependence on the estimator itself. No equations or self-citations are shown that would force the claimed coverage or rate by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review limits visibility into explicit assumptions; the method appears to rest on standard causal inference conditions for rerandomization validity and bandit assumptions for Thompson sampling convergence.

axioms (2)
  • domain assumption Batch-level randomization remains valid when covariate selection depends only on prior batches
    Invoked in the decoupling result for preserving inferential validity.
  • domain assumption Outcome model permits a Bayes risk bound matching minimax up to logs
    Required for the acquisition layer guarantee.

pith-pipeline@v0.9.0 · 5477 in / 1268 out tokens · 29332 ms · 2026-05-08T04:35:08.037751+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

33 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1] Ashwinkumar Badanidiyuru, Robert Kleinberg, and Aleksandrs Slivkins. Bandits with knapsacks. Journal of the ACM (JACM), 65(3):1–55, 2018.

  2. [2] Federico A. Bugni, Ivan A. Canay, and Azeem M. Shaikh. Inference under covariate-adaptive randomization. Journal of the American Statistical Association, 113(524):1784–1796, 2018.

  3. [3] Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, volume 24, pages 2249–2257, 2011. URL https://proceedings.neurips.cc/paper/2011/hash/e53a0a2978c28872a4505bdb51db06dc-Abstract.html

  4. [4] Ian Connick Covert, Wei Qiu, Mingyu Lu, Na Yoon Kim, Nathan J White, and Su-In Lee. Learning to maximize mutual information for dynamic feature selection. In International Conference on Machine Learning, pages 6424–6447. PMLR, 2023.

  5. [5] Debojit Das, Shweta Jain, and Sujit Gujar. Budgeted combinatorial multi-armed bandits. arXiv preprint arXiv:2202.03704, 2022.

  6. [6] Samarth Gupta, Jinhang Zuo, Carlee Joe-Wong, Gauri Joshi, and Osman Yağan. Correlated combinatorial bandits for online resource allocation. In Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pages 91–100, 2022.

  7. [7] Peter Hall and Christopher C Heyde. Martingale Limit Theory and Its Application. Academic Press, 2014.

  8. [8] Jaromír Janisch, Tomáš Pevný, and Viliam Lisý. Classification with costly features using deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3959–3966, 2019.

  9. [9] Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari. Tight regret bounds for stochastic combinatorial semi-bandits. In Artificial Intelligence and Statistics, pages 535–543. PMLR, 2015.

  10. [10] Quinn Lanners, Harsh Parikh, Alexander Volfovsky, Cynthia Rudin, and David Page. Variable importance matching for causal inference. In Uncertainty in Artificial Intelligence, pages 1174–1184. PMLR, 2023.

  11. [11] Xinran Li and Peng Ding. Rerandomization and regression adjustment. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1):241–268, 2020.

  12. [12] Xinran Li and Anqi Zhao. Design-based theory for causal inference from adaptive experiments. arXiv preprint arXiv:2602.21998, 2026.

  13. [13] Yang Li and Junier Oliva. Active feature acquisition with generative surrogate models. In International Conference on Machine Learning, pages 6450–6459. PMLR, 2021.

  14. [14] Faming Liang, Qizhai Li, and Lei Zhou. Bayesian neural networks for selection of drug sensitive genes. Journal of the American Statistical Association, 113(523):955–972, 2018.

  15. [15] Yi Liu and Veronika Ročková. Variable selection via Thompson sampling. Journal of the American Statistical Association, 118(541):287–304, 2023.

  16. [16] Nicolai Meinshausen and Bin Yu. Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics, 37(1):246–270, 2009.

  17. [17] Daniel Molitor and Samantha Gold. Anytime-valid inference in adaptive experiments: Covariate adjustment and balanced power. arXiv preprint arXiv:2506.20523, 2025.

  18. [18] Kari Lock Morgan and Donald B Rubin. Rerandomization to improve covariate balance in experiments. The Annals of Statistics, 40(2):1263–1282, April 2012. doi: 10.1214/12-AOS1008.

  19. [19] My Phan, Yasin Abbasi-Yadkori, and Justin Domke. Thompson sampling and approximate inference. In Advances in Neural Information Processing Systems, volume 32, pages 8801–8811, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/f3507289cfdc8c9ae93f4098111a13f9-Abstract.html

  20. [20] Chao Qin, Zheng Wen, Xiuyuan Lu, and Benjamin Van Roy. An analysis of ensemble sampling. In Advances in Neural Information Processing Systems, volume 35, 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/hash/874f5e53d7ce44f65fbf27a7b9406983-Abstract-Conference.html

  22. [22] Carlos Riquelme, George Tucker, and Jasper Snoek. Deep Bayesian bandits showdown: An empirical comparison of Bayesian deep networks for Thompson sampling. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SyYe6k-CW

  23. [23] Daniel Russo and Benjamin Van Roy. Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4):1221–1243, 2014.

  24. [24] Hajin Shim, Sung Ju Hwang, and Eunho Yang. Joint active feature acquisition and classification with variable-size set encoding. Advances in Neural Information Processing Systems, 31, 2018.

  25. [25] Peter D Turney. Types of cost in inductive concept learning. arXiv preprint cs/0212034, 2002.

  26. [26] Larry Wasserman and Kathryn Roeder. High dimensional variable selection. Annals of Statistics, 37(5A):2178, 2009.

  27. [27] Haoyu Yang, Yichen Qin, Fan Wang, Yang Li, and Feifang Hu. Balancing covariates in multi-arm trials via adaptive randomization. Computational Statistics & Data Analysis, 179:107642, 2023.

  28. [28] Haoyu Yang, Yichen Qin, Yang Li, and Feifang Hu. Sequential covariate-adjusted randomization via hierarchically minimizing Mahalanobis distance and marginal imbalance. Biometrics, 80(2):ujae047, 2024.

  29. [29] Hengtao Zhang and Guosheng Yin. Response-adaptive rerandomization. Journal of the Royal Statistical Society: Series C (Applied Statistics), 70(5):1281–1298, 2021.

  30. [30] Quan Zhou, Philip A. Ernst, Kari Lock Morgan, Donald B. Rubin, and Anru Zhang. Sequential rerandomization. Biometrika, 105(3):745–752, 2018.
