Active Context Selection Improves Simple Regret in Contextual Bandits

Jalal Etesami; Mohammad Shahverdikondori; Negar Kiyavash

arxiv: 2605.20040 · v1 · pith:A573AI53new · submitted 2026-05-19 · 💻 cs.LG

Pith reviewed 2026-05-20 06:35 UTC · model grok-4.3

keywords contextual bandits · simple regret · active sampling · context distribution · instance-dependent bounds · explore then commit · regret rates

The pith

Active sampling of contexts with allocation q_j proportional to p_j^{2/3} achieves the tight simple regret rate sqrt(n/T) ||p||_{2/3}.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines contextual multi-armed bandits evaluated by simple regret weighted according to a fixed context distribution p. It compares the case where contexts arrive randomly according to p against the case where the learner actively chooses which context to query for a reward. For known p, passive sampling produces regret of order sqrt(n/T ||p||_{1/2}) while the active allocation q_j proportional to p_j to the 2/3 power produces the tighter rate sqrt(n/T) ||p||_{2/3}, with the gap reaching Theta(k^{1/4}) for k contexts. The analysis extends to budgeted active selection and to the case of unknown p via a phased estimation-and-commit procedure that recovers the active rate for large horizons.

Core claim

For a known context distribution p the tight rate under active context selection is sqrt(n/T) ||p||_{2/3} when samples are allocated with probabilities q_j proportional to p_j^{2/3}; this improves on the passive rate of order sqrt(n/T ||p||_{1/2}). The bounds hold in the worst case over reward distributions. When p is unknown the Explore-Explore-Then-Commit algorithm first estimates p and then switches to the active allocation, recovering the known-p active rate up to constants for large time horizons.
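The allocation and the two rates can be computed directly for any given p. The following is a minimal sketch, not code from the paper: it assumes the standard quasi-norm ||p||_a = (sum_j p_j^a)^(1/a), treats n as the number of actions, drops all constants, and uses illustrative function names and an illustrative p.

```python
import math

def quasi_norm(p, alpha):
    """(sum_j p_j^alpha)^(1/alpha); assumed definition of ||p||_alpha."""
    return sum(x ** alpha for x in p) ** (1.0 / alpha)

def active_allocation(p):
    """Sampling probabilities q_j proportional to p_j^(2/3)."""
    weights = [x ** (2.0 / 3.0) for x in p]
    total = sum(weights)
    return [w / total for w in weights]

def passive_rate(p, n, T):
    """sqrt(n/T * ||p||_{1/2}), constants omitted."""
    return math.sqrt(n / T * quasi_norm(p, 0.5))

def active_rate(p, n, T):
    """sqrt(n/T) * ||p||_{2/3}, constants omitted."""
    return math.sqrt(n / T) * quasi_norm(p, 2.0 / 3.0)

# Illustrative instance: one common context and three rare ones.
p = [0.7, 0.1, 0.1, 0.1]
q = active_allocation(p)
print("allocation:", q)
print("passive rate:", passive_rate(p, n=5, T=10_000))
print("active rate:", active_rate(p, n=5, T=10_000))
```

Compared with sampling according to p itself, the allocation moves samples away from the common context and toward the rare ones; for a uniform p the two rates coincide.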

What carries the argument

The allocation q_j proportional to p_j^{2/3} that balances the weighted contributions to simple regret across contexts.
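A heuristic calculation shows where the 2/3 exponent comes from. It is a sketch under stated assumptions, not the paper's proof: it takes ||p||_a = (sum_j p_j^a)^(1/a) and assumes that a context receiving a fraction q_j of the T samples incurs per-context simple regret of order sqrt(n/(q_j T)).

```latex
\begin{align*}
\text{weighted regret}(q) &\approx \sum_{j=1}^{k} p_j \sqrt{\frac{n}{q_j T}}
  = \sqrt{\frac{n}{T}} \sum_{j=1}^{k} p_j\, q_j^{-1/2},
  \qquad \sum_{j=1}^{k} q_j = 1, \\
\frac{\partial}{\partial q_j}\Big[\sum_{i} p_i\, q_i^{-1/2} + \lambda \sum_{i} q_i\Big] = 0
  &\;\Longrightarrow\; q_j \propto p_j^{2/3}, \\
q_j = \frac{p_j^{2/3}}{\sum_i p_i^{2/3}}
  &\;\Longrightarrow\; \sum_{j} p_j\, q_j^{-1/2}
  = \Big(\sum_{i} p_i^{2/3}\Big)^{3/2} = \lVert p \rVert_{2/3}, \\
q_j = p_j \ \text{(passive)}
  &\;\Longrightarrow\; \sum_{j} p_j\, q_j^{-1/2}
  = \sum_{j} p_j^{1/2} = \lVert p \rVert_{1/2}^{1/2}.
\end{align*}
```

Multiplying the last two lines by sqrt(n/T) gives the active rate sqrt(n/T) ||p||_{2/3} and the passive rate sqrt(n/T ||p||_{1/2}) quoted above.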

If this is right

The improvement over passive sampling grows with the number of contexts and reaches Theta(k^{1/4}).
A sufficient active-sampling budget recovers the full active rate even under budget constraints.
When p is unknown the EETC algorithm recovers the active regret rate asymptotically.
The rates remain tight, so no better dependence on the norm of p is possible under the stated conditions.
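The growth of the gap with k can be checked numerically from the two rate expressions alone. The sketch below is illustrative and not taken from the paper: the `skewed` family (one dominant context, with the remaining k-1 contexts sharing total mass k^(-1/2)) is a hypothetical construction chosen because it makes the ratio scale like k^{1/4}, and constants are ignored.

```python
import math

def gap(p):
    """Passive rate divided by active rate: sqrt(||p||_{1/2}) / ||p||_{2/3}."""
    passive = sum(math.sqrt(x) for x in p)            # equals sqrt(||p||_{1/2})
    active = sum(x ** (2.0 / 3.0) for x in p) ** 1.5  # equals ||p||_{2/3}
    return passive / active

def skewed(k):
    """Hypothetical family: k-1 rare contexts share total mass k^(-1/2)."""
    eps = k ** -0.5
    return [1.0 - eps] + [eps / (k - 1)] * (k - 1)

# Columns: k, gap for uniform p, gap for the skewed family, k^(1/4).
for k in (16, 256, 4096, 65536):
    uniform = [1.0 / k] * k
    print(k, round(gap(uniform), 3), round(gap(skewed(k)), 3), round(k ** 0.25, 3))
```

For uniform p the ratio stays at 1, while for the skewed family it grows with k at the k^{1/4} scale, with a constant factor below 1.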

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

In settings such as clinical trials the result suggests deliberately oversampling important but rare patient subgroups to lower overall decision regret.
If p changes slowly the same allocation can be re-computed periodically after re-estimation.
The allocation principle may apply to other subpopulation design problems where one controls which group to observe next.
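The re-estimation idea can be sketched as a plug-in rule: estimate p from observed context frequencies, then recompute the allocation with the 2/3 exponent. This is an illustrative sketch, not the paper's EETC procedure; the function name, the additive smoothing constant, and the sample data are all assumptions.

```python
from collections import Counter

def plugin_allocation(observed_contexts, k, smoothing=1.0):
    """Estimate p from context draws; return (p_hat, q_hat), q_hat ∝ p_hat^(2/3).

    `smoothing` is a hypothetical additive constant that keeps unseen
    contexts at nonzero estimated mass.
    """
    counts = Counter(observed_contexts)
    total = len(observed_contexts) + smoothing * k
    p_hat = [(counts.get(j, 0) + smoothing) / total for j in range(k)]
    weights = [x ** (2.0 / 3.0) for x in p_hat]
    norm = sum(weights)
    return p_hat, [w / norm for w in weights]

# Ten passive draws over k = 4 contexts; context 3 was never observed.
p_hat, q_hat = plugin_allocation([0, 0, 0, 0, 0, 0, 1, 2, 0, 0], k=4)
print("estimated p:", p_hat)
print("allocation :", q_hat)
```

Relative to the estimated p, the resulting allocation oversamples the rare contexts and undersamples the dominant one, which is the behavior the clinical-trial remark describes.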

Load-bearing premise

The learner can freely choose any context to sample at each round and the context distribution p is fixed and known or accurately estimable.

What would settle it

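Measure the ratio of simple regret under passive random sampling to simple regret under the proposed active allocation on an instance with k=16 contexts and a strongly skewed p. If the predicted k^{1/4} improvement holds, the ratio should be on the order of 16^{1/4} = 2, up to the constant factors hidden by the Theta notation. For near-uniform p the two rates coincide, so no gap is expected there.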

Figures

Figures reproduced from arXiv: 2605.20040 by Jalal Etesami, Mohammad Shahverdikondori, Negar Kiyavash.

**Figure 2.** Simple regret of the budgeted active algorithm on the MovieLens instance as a function …

**Figure 3.** Synthetic experiment with varying number of subpopulations.

**Figure 4.** Synthetic experiment with varying number of treatments.

Original abstract

We study the contextual multi-armed bandit problem with a finite context space (a.k.a. subpopulations), where the learner recommends a best action for each context and is evaluated by context-weighted simple regret. Our guarantees are worst-case over the reward distributions, while remaining instance-dependent with respect to the context distribution vector $p$. Akin to experimental design problems where the population of interest is fixed but the sampled subpopulation can be controlled, we allow the learner to actively choose which context to sample from. For a known $p$, we characterize tight regret rates: passive sampling where contexts are randomly revealed achieves regret of order $\sqrt{n/T \, \lVert p \rVert_{1/2}}$, whereas active sampling with allocation $q_j \propto p_j^{2/3}$ achieves the tight rate $\sqrt{n/T} \, \lVert p \rVert_{2/3}$. The resulting improvement can be as large as $\Theta(k^{1/4})$, where $k$ is the number of contexts. We further extend the analysis to budgeted active sampling, characterize the corresponding tight rate, and identify when a limited active budget suffices to recover the fully active rate. When $p$ is unknown, we propose the Explore-Explore-Then-Commit (EETC) algorithm, which optimally balances estimating the context distribution and the time to switch to active allocation, such that for large horizons, it matches the known-$p$ active rate up to constants. Experiments on synthetic and real-world data support our theoretical findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Active sampling with q proportional to p to the 2/3 gives a tight improvement over passive sampling in finite-context bandits, up to a k to the 1/4 factor in simple regret, plus a workable algorithm when p is unknown.

The main point is that for contextual bandits with finite contexts and a fixed distribution p, actively choosing which context to sample from beats the usual random revelation of contexts. With the right allocation rule, you get a strictly better instance-dependent simple regret rate, and the paper pins down the exact exponents and the size of the gain. They also give a practical way to handle the case where p is not known in advance.

This is a clean theoretical sharpening rather than a broad new framework. The matching upper and lower bounds for the known-p case are the strongest part, along with the budgeted variant and the EETC procedure that recovers the active rate asymptotically. The experiments on synthetic and real data are straightforward and line up with the predictions without stretching the claims. The analysis stays worst-case over rewards while staying instance-dependent on p, which is the right scope.

One limitation is that the gains shrink when p is close to uniform, and the whole setup requires that the learner can freely select any context at each round. If sampling is constrained in practice, the rates do not apply directly. The unknown-p extension is asymptotic, so finite-time behavior might need more checking.

This paper is aimed at people working on instance-dependent bounds in bandits or experimental design with subpopulations. A reader who already knows the basic contextual bandit setup and cares about tight rates will find the explicit allocation and the improvement factor useful. It is worth sending to peer review because the central claims rest on matching bounds and the algorithm is simple enough to implement and test.

Referee Report

0 major / 2 minor

Summary. The manuscript studies contextual multi-armed bandits with finite contexts under context-weighted simple regret. It derives matching upper and lower bounds for known context distribution p: passive sampling yields regret of order sqrt(n/T ||p||_{1/2}), while active sampling with allocation q_j ∝ p_j^{2/3} achieves the tight rate sqrt(n/T) ||p||_{2/3}, with improvement up to Θ(k^{1/4}). The analysis extends to budgeted active sampling and proposes the EETC algorithm for unknown p, which asymptotically recovers the active rate for large T. Experiments on synthetic and real-world data are included.

Significance. If the matching bounds hold, the work makes a solid contribution by quantifying the benefit of active context selection over passive sampling in heterogeneous populations, with instance-dependent rates that remain worst-case over rewards. The tight characterization, the budgeted extension, and the EETC procedure for unknown p are strengths; the explicit improvement factor of Θ(k^{1/4}) and the connection to experimental design are noteworthy.

minor comments (2)

[Abstract] The abstract states the rates but does not define the vector norms ||p||_{1/2} and ||p||_{2/3} inline; a short parenthetical or reference to the preliminary notation section would improve immediate readability.
In the EETC description, the precise rule for choosing the switch time from exploration to active allocation could be stated more explicitly (e.g., as a function of estimated p and remaining horizon) to facilitate implementation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation to accept. We appreciate the recognition of the tight matching bounds, the explicit improvement factor of Θ(k^{1/4}), the budgeted extension, and the EETC procedure for unknown p.

Circularity Check

0 steps flagged

No significant circularity; rates derived from independent minimax bounds

The paper's central claims on tight regret rates for passive sampling (order sqrt(n/T ||p||_{1/2})) and active sampling with q_j proportional to p_j^{2/3} (order sqrt(n/T) ||p||_{2/3}) follow from explicit upper bounds obtained by optimizing the worst-case weighted regret expression and matching lower bounds via change-of-measure arguments over reward distributions. These steps are standard information-theoretic derivations that do not reduce to self-definitions, fitted inputs renamed as predictions, or load-bearing self-citations. The EETC extension for unknown p is constructed to recover the active rate asymptotically without circular dependence on the target result. The analysis remains self-contained with respect to external benchmarks such as minimax optimization and does not invoke uniqueness theorems or ansatzes from prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard bandit assumptions plus the modeling choice of finite contexts and controllable sampling; no new free parameters or invented entities are introduced.

axioms (2)

domain assumption: Finite context space with fixed distribution p. Stated in the problem setup; enables the norm-based rates and active allocation.
standard math: Worst-case reward distributions (arbitrary but bounded or sub-Gaussian). Invoked to obtain minimax-style guarantees while keeping dependence on p instance-dependent.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

active sampling with allocation q_j ∝ p_j^{2/3} achieves the tight rate √(n/T) ||p||_{2/3}

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

