Pith · machine review for the scientific record

arxiv: 2605.06608 · v1 · submitted 2026-05-07 · 📊 stat.ML · cs.LG · stat.ME


DARTS: Targeting Prognostic Covariates in Budget-Constrained Sequential Experiments

Alexander Volfovsky, Kateryna Husar

Pith reviewed 2026-05-08 04:35 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME
keywords covariate-adaptive randomization · Thompson sampling · sequential experiments · budget constraints · causal inference · regression adjustment · randomized controlled trials

The pith

DARTS decouples adaptive covariate selection from randomization validity in budget-limited sequential experiments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DARTS, a method that treats the acquisition of high-dimensional pretreatment covariates as a sequential decision problem solved by combinatorial Thompson sampling. It learns which covariates are prognostic from past experimental batches and uses them to improve rerandomization and regression adjustment in future batches. The key result is that this adaptation preserves the validity of each batch's randomization, so the overall inverse-variance weighted estimator for the average treatment effect maintains at least nominal asymptotic coverage. This matters because real-world trials face measurement costs, and DARTS closes much of the efficiency gap to an oracle that knows the best covariates in advance while keeping strict inferential guarantees.
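To make the acquisition layer concrete, here is a minimal sketch of a budgeted combinatorial Thompson sampler for covariate selection. The Beta–Bernoulli reward model, greedy score-per-cost rounding, and all constants are our illustration of the general idea, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_covariates(alpha, beta, costs, budget):
    """One Thompson-sampling step: draw a prognostic score per covariate from
    its Beta posterior, then greedily buy covariates by score-per-cost until
    the per-batch measurement budget is exhausted."""
    scores = rng.beta(alpha, beta)
    chosen, spent = [], 0.0
    for j in np.argsort(-scores / costs):
        if spent + costs[j] <= budget:
            chosen.append(int(j))
            spent += costs[j]
    return chosen

def update_posterior(alpha, beta, chosen, rewards):
    """Binary-reward update: reward 1 if the covariate proved prognostic in
    the completed batch (here a stand-in for a fitted prognostic signal)."""
    for j, r in zip(chosen, rewards):
        alpha[j] += r
        beta[j] += 1 - r

p = 10                                  # candidate covariates
alpha, beta = np.ones(p), np.ones(p)    # uniform Beta(1,1) priors
costs = np.ones(p)                      # unit measurement costs
truly_prognostic = {0, 1, 2}

for t in range(200):                    # 200 experimental batches
    chosen = select_covariates(alpha, beta, costs, budget=3)
    rewards = [1 if j in truly_prognostic else 0 for j in chosen]
    update_posterior(alpha, beta, chosen, rewards)

# The posterior mean should now concentrate on the truly prognostic set.
print(np.sort(np.argsort(-(alpha / (alpha + beta)))[:3]))
```

The key structural point mirrored from the paper: the posterior used to select batch t's covariates is a function of batches 1..t-1 only, so the current batch's randomization is untouched by the selection.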

Core claim

Adaptive covariate selection based on past batches preserves batch-level randomization validity, and the cumulative inverse-variance weighted estimator achieves at least nominal asymptotic coverage. The acquisition layer satisfies a Bayes risk bound that matches the minimax lower bound up to logarithmic factors.

What carries the argument

The decoupling result between adaptive covariate selection via Thompson sampling on past batches and the design-based validity of current-batch randomization.

If this is right

  • The method systematically concentrates the measurement budget on the most informative covariates.
  • It closes a substantial fraction of the efficiency gap to an oracle design that knows the prognostic covariates in advance.
  • The overall estimator retains at least nominal asymptotic coverage for the average treatment effect.
  • Batch-level randomization remains valid at every step despite the adaptive policy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decoupling idea might allow adaptive covariate policies inside other design-based procedures such as stratified randomization or rerandomization with different balance metrics.
  • Empirical checks could compare DARTS against fixed-budget designs on data sets with known cost structures to measure realized variance reduction.
  • Tighter analysis could remove the logarithmic factors in the risk bound or replace Thompson sampling with a different acquisition rule while preserving the validity guarantee.

Load-bearing premise

The prognostic value of covariates can be learned reliably from past batches using Thompson sampling without introducing bias into the current batch's randomization or the overall estimator.

What would settle it

A simulation in which the coverage probability of the treatment-effect confidence interval falls below the nominal level when selection is made adaptive across batches, or in which the Bayes risk of the acquisition policy exceeds the minimax lower bound by more than logarithmic factors.
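The first of those checks is cheap to run. Below is a minimal two-batch Monte Carlo under our own toy data-generating process (one prognostic covariate, normal errors, adaptive selection between batches), not the paper's simulation design: batch 1 uses an unadjusted difference in means and picks the covariate most correlated with its outcomes; batch 2 regression-adjusts on that selection; the two estimates are combined by inverse-variance weights and coverage of the 95% interval is tallied.

```python
import numpy as np

rng = np.random.default_rng(1)
tau, p, n, reps = 1.0, 5, 100, 2000

def ols_tau(y, W, x):
    """OLS of y on [1, W, x]; return the coefficient on W and its classical s.e."""
    Z = np.column_stack([np.ones_like(y), W, x])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[1], np.sqrt(cov[1, 1])

covered = 0
for _ in range(reps):
    # Batch 1: plain randomization, unadjusted difference in means.
    X1 = rng.normal(size=(n, p))
    W1 = rng.permutation(np.repeat([0, 1], n // 2))
    y1 = 2 * X1[:, 0] + tau * W1 + rng.normal(size=n)
    t1 = y1[W1 == 1].mean() - y1[W1 == 0].mean()
    se1 = np.sqrt(y1[W1 == 1].var(ddof=1) / (n // 2)
                  + y1[W1 == 0].var(ddof=1) / (n // 2))
    # Adaptive step: pick the covariate most correlated with batch-1 outcomes.
    sel = np.argmax([abs(np.corrcoef(X1[:, j], y1)[0, 1]) for j in range(p)])
    # Batch 2: fresh randomization, regression-adjust on the selected covariate.
    X2 = rng.normal(size=(n, p))
    W2 = rng.permutation(np.repeat([0, 1], n // 2))
    y2 = 2 * X2[:, 0] + tau * W2 + rng.normal(size=n)
    t2, se2 = ols_tau(y2, W2, X2[:, sel])
    # Inverse-variance-weighted combination and 95% CI.
    w1, w2 = 1 / se1**2, 1 / se2**2
    est = (w1 * t1 + w2 * t2) / (w1 + w2)
    half = 1.96 / np.sqrt(w1 + w2)
    covered += (est - half <= tau <= est + half)

print(covered / reps)  # should sit near the nominal 0.95
```

If cross-batch adaptivity broke validity, the printed coverage would fall visibly below 0.95; in this selection-from-past-batches setup it should not.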

Figures

Figures reproduced from arXiv: 2605.06608 by Alexander Volfovsky, Kateryna Husar.

Figure 1: Cumulative regret relative to Oracle DARTS across 1000 replications. Solid lines show medians; shaded bands show 95% intervals from run percentiles. DARTS with fractional and binary rewards exhibit sublinear growth, with fractional rewards converging faster. We formally characterize the regret bound of this fractional scheme in Appendix A.8, establishing that the algorithm retains sublinear Bayes risk wh…
Figure 2: Distribution of final posterior inclusion probabilities
Figure 3: Comparison of DARTS against ARMM [26] and MADCovar [17] across 1000 replications of the Liang DGP (n = 1000, T = 200, p = 100, B = 2000, variable costs). ARMM and MADCovar are given 12 randomly pre-selected covariates; DARTS learns from the full candidate pool. Solid lines show medians across 1000 replications; shaded bands show 95% intervals from run percentiles. All CIs are 95% valid at fixed T; HC2 stan…
Figure 4: Heterogeneous treatment effects robustness check. Liang outcome surface on covariates
Figure 5: Oracle-costly covariates robustness check. Liang DGP with
Figure 6: Diagnostics from the 1000-replication method comparison (Liang DGP,
Original abstract

Randomized controlled trials typically assume that prognostic covariates are known and available at no cost. In practice, obtaining high-dimensional pretreatment data is costly, forcing a trade-off between covariate-adaptive precision and a measurement budget. We introduce Dynamic Adaptive Rerandomization via Thompson Sampling (DARTS), which treats covariate acquisition as a sequential optimization problem embedded within a design-based causal inference task. A budgeted combinatorial Thompson sampler learns which covariates are most prognostic across successive batches; selected covariates then drive rerandomization and regression adjustment to reduce batch-level average treatment effect variance. Our primary theoretical contribution is a decoupling result: adaptive covariate selection based on past batches preserves batch-level randomization validity, and the cumulative inverse-variance weighted estimator achieves at least nominal asymptotic coverage. We further derive a Bayes risk bound for the acquisition layer that matches the minimax lower bound up to logarithmic factors. Empirically, DARTS systematically concentrates the budget on informative features, significantly closing the efficiency gap to oracle designs while maintaining strict inferential validity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes DARTS, a sequential experimental design that embeds budgeted covariate acquisition as a combinatorial Thompson sampling problem within batch-wise RCTs. Covariates selected from prior batches drive rerandomization and regression adjustment to reduce ATE variance; the central claims are a decoupling result ensuring that past-batch adaptive selection preserves within-batch randomization validity and that the inverse-variance-weighted cumulative estimator attains at least nominal asymptotic coverage, plus a Bayes risk bound for the acquisition layer that matches the minimax lower bound up to logarithmic factors. Empirical results indicate that the method concentrates the budget on prognostic features and narrows the efficiency gap to oracle designs while retaining strict design-based validity.

Significance. If the decoupling and coverage results hold under the stated conditions, the work provides a principled way to trade off measurement cost against precision in high-dimensional covariate settings, which is directly relevant to budget-limited trials. The explicit separation of selection from randomization is a clean contribution to adaptive design theory, and the near-minimax Bayes bound for the acquisition policy is a notable theoretical strength. The empirical demonstration of budget concentration without validity loss further supports practical utility.

major comments (2)
  1. [Abstract / §3] Abstract and §3 (theoretical results): the decoupling claim that 'adaptive covariate selection based on past batches preserves batch-level randomization validity' is asserted by construction, yet the manuscript does not list the precise measurability or independence conditions (e.g., on the Thompson sampling posterior or batch-size growth) required for the argument; without these, it is impossible to verify whether the result extends beyond the finite-batch case or survives model misspecification in the outcome regression.
  2. [Abstract / §4] Abstract and §4 (Bayes risk bound): the statement that the acquisition-layer bound 'matches the minimax lower bound up to logarithmic factors' is given without an explicit statement of the outcome model class, the prior on covariate prognostic values, or the precise logarithmic term; this makes it difficult to assess whether the bound is tight only under strong parametric assumptions or holds more generally.
minor comments (2)
  1. [Empirical evaluation] The empirical section describes results qualitatively ('significantly closing the efficiency gap'); quantitative tables or figures reporting variance reduction ratios, coverage rates, and budget allocation fractions across simulation settings would strengthen the claims.
  2. [Notation / §2] Notation for the cumulative inverse-variance-weighted estimator is introduced without an explicit recursive formula or variance estimator; adding this would improve readability for readers focused on implementation.
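For concreteness, one standard form of the estimator the referee is asking about (our reconstruction from the review's description, not the paper's own notation) is:

```latex
\hat\tau_{\mathrm{IVW}}
  = \frac{\sum_{t=1}^{T} \hat w_t\, \hat\tau_t}{\sum_{t=1}^{T} \hat w_t},
\qquad
\hat w_t = \widehat{\operatorname{Var}}(\hat\tau_t)^{-1},
\qquad
\widehat{\operatorname{Var}}(\hat\tau_{\mathrm{IVW}})
  = \Bigl(\sum_{t=1}^{T} \hat w_t\Bigr)^{-1},
```

where τ̂_t is the batch-t adjusted estimator. The running estimate updates recursively by accumulating the two sums Σ_t ŵ_t and Σ_t ŵ_t τ̂_t, which is presumably what an explicit recursive formula in §2 would state.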

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and recommendation for minor revision. The comments correctly identify places where explicit statements of assumptions will improve clarity and verifiability. We address both points below and will incorporate the requested details in the revised manuscript.

Point-by-point responses
  1. Referee: [Abstract / §3] Abstract and §3 (theoretical results): the decoupling claim that 'adaptive covariate selection based on past batches preserves batch-level randomization validity' is asserted by construction, yet the manuscript does not list the precise measurability or independence conditions (e.g., on the Thompson sampling posterior or batch-size growth) required for the argument; without these, it is impossible to verify whether the result extends beyond the finite-batch case or survives model misspecification in the outcome regression.

    Authors: We agree that the measurability and independence conditions should be stated explicitly rather than left implicit. In the revision we will add a dedicated paragraph in §3 that lists the required conditions: (i) the Thompson sampling posterior at the start of batch t is a measurable function of the data from batches 1 through t-1 only, hence independent of the current-batch potential outcomes; (ii) batch sizes n_t are non-decreasing and satisfy n_t / N → 0 with N = sum n_t → ∞; (iii) the covariate-selection policy is adapted to the filtration generated by previous batches. Under these conditions the within-batch randomization remains valid by construction, the inverse-variance-weighted estimator is asymptotically normal, and the coverage result holds in the design-based sense without requiring correct specification of the outcome regression. The argument therefore extends to the sequential (infinite-batch) limit under the stated growth condition. revision: yes

  2. Referee: [Abstract / §4] Abstract and §4 (Bayes risk bound): the statement that the acquisition-layer bound 'matches the minimax lower bound up to logarithmic factors' is given without an explicit statement of the outcome model class, the prior on covariate prognostic values, or the precise logarithmic term; this makes it difficult to assess whether the bound is tight only under strong parametric assumptions or holds more generally.

    Authors: We will make the modeling assumptions explicit in the revised §4. The Bayes-risk upper bound is derived for a linear outcome model Y = Xβ + τW + ε with ε ~ N(0,σ²) and a Gaussian prior on the prognostic vector β. The minimax lower bound is taken with respect to the same parametric class. The logarithmic factor is O(log T) where T denotes the number of batches; it arises from the posterior concentration rate of the combinatorial Thompson sampler. We will add these statements together with a short remark that extensions to nonparametric or misspecified outcome models are left for future work. revision: yes
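In symbols, the model class the rebuttal commits to is the following (the prior covariance Σ₀ is our placeholder, since the rebuttal says only "a Gaussian prior on β"):

```latex
Y_i = X_i^{\top}\beta + \tau W_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),
\qquad \beta \sim \mathcal{N}(0, \Sigma_0),
```

with the claimed rate reading BayesRisk(T) ≤ C log T · Minimax(T) over T batches, the log T factor coming from the posterior concentration of the combinatorial Thompson sampler.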

Circularity Check

0 steps flagged

No significant circularity; decoupling and risk bound are independent of fitted inputs

Full rationale

The paper's central claims rest on a decoupling argument (past-batch Thompson sampling for covariate selection leaves within-batch randomization and the inverse-variance weighted ATE estimator design-valid) and a separate Bayes-risk bound for the acquisition policy that matches minimax rates up to logs. Neither step reduces to a fitted quantity renamed as a prediction, nor to a self-citation that itself assumes the target result. The decoupling follows directly from the sequential batch structure (selection uses only prior data), and the risk bound is stated under an explicit outcome model without circular dependence on the estimator itself. No equations or self-citations are shown that would force the claimed coverage or rate by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review limits visibility into explicit assumptions; the method appears to rest on standard causal inference conditions for rerandomization validity and bandit assumptions for Thompson sampling convergence.

axioms (2)
  • domain assumption Batch-level randomization remains valid when covariate selection depends only on prior batches
    Invoked in the decoupling result for preserving inferential validity.
  • domain assumption Outcome model permits a Bayes risk bound matching minimax up to logs
    Required for the acquisition layer guarantee.

pith-pipeline@v0.9.0 · 5477 in / 1268 out tokens · 29332 ms · 2026-05-08T04:35:08.037751+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

33 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1] Ashwinkumar Badanidiyuru, Robert Kleinberg, and Aleksandrs Slivkins. Bandits with knapsacks. Journal of the ACM (JACM), 65(3):1–55, 2018.

  2. [2] Federico A. Bugni, Ivan A. Canay, and Azeem M. Shaikh. Inference under covariate-adaptive randomization. Journal of the American Statistical Association, 113(524):1784–1796, 2018.

  3. [3] Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, volume 24, pages 2249–2257, 2011. URL https://proceedings.neurips.cc/paper/2011/hash/e53a0a2978c28872a4505bdb51db06dc-Abstract.html

  4. [4] Ian Connick Covert, Wei Qiu, Mingyu Lu, Na Yoon Kim, Nathan J White, and Su-In Lee. Learning to maximize mutual information for dynamic feature selection. In International Conference on Machine Learning, pages 6424–6447. PMLR, 2023.

  5. [5] Debojit Das, Shweta Jain, and Sujit Gujar. Budgeted combinatorial multi-armed bandits. arXiv preprint arXiv:2202.03704, 2022.

  6. [6] Samarth Gupta, Jinhang Zuo, Carlee Joe-Wong, Gauri Joshi, and Osman Yağan. Correlated combinatorial bandits for online resource allocation. In Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pages 91–100, 2022.

  7. [7] Peter Hall and Christopher C Heyde. Martingale Limit Theory and Its Application. Academic Press, 2014.

  8. [8] Jaromír Janisch, Tomáš Pevný, and Viliam Lisý. Classification with costly features using deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3959–3966, 2019.

  9. [9] Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari. Tight regret bounds for stochastic combinatorial semi-bandits. In Artificial Intelligence and Statistics, pages 535–543. PMLR, 2015.

  10. [10] Quinn Lanners, Harsh Parikh, Alexander Volfovsky, Cynthia Rudin, and David Page. Variable importance matching for causal inference. In Uncertainty in Artificial Intelligence, pages 1174–1184. PMLR, 2023.

  11. [11] Xinran Li and Peng Ding. Rerandomization and regression adjustment. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1):241–268, 2020.

  12. [12] Xinran Li and Anqi Zhao. Design-based theory for causal inference from adaptive experiments. arXiv preprint arXiv:2602.21998, 2026.

  13. [13] Yang Li and Junier Oliva. Active feature acquisition with generative surrogate models. In International Conference on Machine Learning, pages 6450–6459. PMLR, 2021.

  14. [14] Faming Liang, Qizhai Li, and Lei Zhou. Bayesian neural networks for selection of drug sensitive genes. Journal of the American Statistical Association, 113(523):955–972, 2018.

  15. [15] Yi Liu and Veronika Ročková. Variable selection via Thompson sampling. Journal of the American Statistical Association, 118(541):287–304, 2023.

  16. [16] Nicolai Meinshausen and Bin Yu. Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics, 37(1):246–270, 2009.

  17. [17] Daniel Molitor and Samantha Gold. Anytime-valid inference in adaptive experiments: Covariate adjustment and balanced power. arXiv preprint arXiv:2506.20523, 2025.

  18. [18] Kari Lock Morgan and Donald B Rubin. Rerandomization to improve covariate balance in experiments. The Annals of Statistics, 40(2):1263–1282, April 2012. doi: 10.1214/12-AOS1008.

  19. [19] My Phan, Yasin Abbasi-Yadkori, and Justin Domke. Thompson sampling and approximate inference. In Advances in Neural Information Processing Systems, volume 32, pages 8801–8811, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/f3507289cfdc8c9ae93f4098111a13f9-Abstract.html

  20. [20] Chao Qin, Zheng Wen, Xiuyuan Lu, and Benjamin Van Roy. An analysis of ensemble sampling. In Advances in Neural Information Processing Systems, volume 35, 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/hash/874f5e53d7ce44f65fbf27a7b9406983-Abstract-Conference.html

  22. [22] Carlos Riquelme, George Tucker, and Jasper Snoek. Deep Bayesian bandits showdown: An empirical comparison of Bayesian deep networks for Thompson sampling. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SyYe6k-CW

  23. [23] Daniel Russo and Benjamin Van Roy. Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4):1221–1243, 2014.

  24. [24] Hajin Shim, Sung Ju Hwang, and Eunho Yang. Joint active feature acquisition and classification with variable-size set encoding. Advances in Neural Information Processing Systems, 31, 2018.

  25. [25] Peter D Turney. Types of cost in inductive concept learning. arXiv preprint cs/0212034, 2002.

  26. [26] Larry Wasserman and Kathryn Roeder. High dimensional variable selection. Annals of Statistics, 37(5A):2178, 2009.

  27. [27] Haoyu Yang, Yichen Qin, Fan Wang, Yang Li, and Feifang Hu. Balancing covariates in multi-arm trials via adaptive randomization. Computational Statistics & Data Analysis, 179:107642, 2023.

  28. [28] Haoyu Yang, Yichen Qin, Yang Li, and Feifang Hu. Sequential covariate-adjusted randomization via hierarchically minimizing Mahalanobis distance and marginal imbalance. Biometrics, 80(2):ujae047, 2024.

  29. [29] Hengtao Zhang and Guosheng Yin. Response-adaptive rerandomization. Journal of the Royal Statistical Society: Series C (Applied Statistics), 70(5):1281–1298, 2021.

  30. [30] Quan Zhou, Philip A. Ernst, Kari Lock Morgan, Donald B. Rubin, and Anru Zhang. Sequential rerandomization. Biometrika, 105(3):745–752, 2018.
