Leveraging Similarities in Multi-Armed Bandits

Augustin Cablant; Khaled Eldowa; Panayotis Mertikopoulos; Pierre Gaillard; Thibaud Rahier

arxiv: 2606.23414 · v1 · pith:CSIULQZFnew · submitted 2026-06-22 · 💻 cs.LG

Leveraging Similarities in Multi-Armed Bandits

Khaled Eldowa , Thibaud Rahier , Augustin Cablant , Panayotis Mertikopoulos , Pierre Gaillard This is my paper

Pith reviewed 2026-06-26 08:39 UTC · model grok-4.3

classification 💻 cs.LG

keywords multi-armed banditssimilarity structuretree-compatible lossesregret boundsLipschitz banditsfeedback modelseffective action countbest-of-both-worlds

0 comments

The pith

Tree-structured action similarities allow bandit algorithms to replace the full action count K by an effective count K_eff in regret bounds, but only when feedback is richer than one-point.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard one-point bandit feedback cannot exploit similarities between actions even when those similarities are encoded by a tree and losses are forced to be close for related actions. It then supplies a family of algorithms that handle a spectrum of feedback models from semi-bandit down to two-point bandit feedback and obtain regret bounds governed by an effective action count K_eff instead of the raw number of leaves K. These bounds hold in both stochastic and adversarial regimes. The same machinery yields square-root regret for Lipschitz bandits in dimension at most two when only two-point feedback is available.

Core claim

For loss sequences that are tree-compatible, meaning losses of actions whose lowest common ancestor is at a given tree level differ by at most a factor tied to that level, algorithms exist that achieve regret scaling with an effective number of actions K_eff induced by the tree rather than the total number of leaves K, provided the feedback model supplies at least two loss evaluations per round; the same algorithms are impossible to obtain under ordinary one-point feedback.

What carries the argument

A rooted tree on the action set whose internal nodes define similarity levels, together with the derived effective action count K_eff that counts how many distinct loss values are distinguishable under the tree compatibility constraint.

If this is right

Regret bounds replace the raw action count K by the smaller effective count K_eff across semi-bandit, multi-point, and two-point feedback models.
The algorithms simultaneously achieve the best known rates for both stochastic and adversarial environments.
Square-root regret becomes attainable for Lipschitz bandits on the unit ball when dimension is at most two and two-point feedback is supplied.
The same framework covers hierarchical action sets and tagged or latent-trait similarities encoded as trees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Two-point feedback may be sufficient to obtain near-optimal rates in other structured continuous problems whose similarity can be approximated by a tree.
The impossibility for one-point feedback suggests that any practical deployment of similarity-aware bandits will require at least occasional direct loss comparisons between nearby actions.
K_eff could serve as a design criterion for constructing or pruning action trees in applications where the number of leaves is large but the effective diversity is small.

Load-bearing premise

The loss sequence must be tree-compatible so that losses of similar actions stay close.

What would settle it

An explicit loss sequence that is tree-compatible yet forces any algorithm using only one-point feedback to suffer regret linear in the full number of actions K rather than in K_eff.

read the original abstract

In many online learning and bandit problems, the actions we consider possess inherent similarities--for instance because they share latent traits, tags, or hierarchical structure. We study online learning with a similarity-structured action set, encoded by a rooted tree whose leaves are the actions and whose levels quantify how closely two actions are related. The loss sequence is assumed tree-compatible: losses of similar actions are constrained to be close. We establish an impossibility result showing that usual one-point bandit feedback cannot, in general, leverage range or tree-induced similarity, even under very strong similarity constraints. We then provide a unified set of algorithms which adapt to a wide range of richer feedback models, from semi-bandit feedback down to multi-point bandit protocols, including the minimal two-point feedback setting. We show these algorithms exhibit best-of-both-worlds guarantees and provably exploit action similarities by replacing the number of actions $K$ by a similarity-aware effective number of actions $K_{\mathrm{eff}}$ in the regret bounds. As an application, we show that under two-point feedback, it is possible to achieve $\sqrt{T}$ regret in Lipschitz bandits when $d \leq 2$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

One-point feedback can't exploit tree similarities but two-point feedback yields sqrt(T) regret for d<=2 Lipschitz bandits via K_eff.

read the letter

The main point is that standard one-point bandit feedback cannot leverage tree-encoded action similarities even under strong assumptions, while two-point feedback lets you replace K with a smaller K_eff and reach sqrt(T) regret in Lipschitz bandits for dimension at most 2.

The paper sets up a rooted tree on the action set where levels measure similarity, assumes losses are tree-compatible so nearby actions have close losses, and proves an impossibility for one-point feedback. It then gives a family of algorithms that work across feedback models from semi-bandit down to two-point and substitute K_eff into the regret bounds. The Lipschitz application is the clearest payoff.

The unified treatment of feedback richness and the explicit impossibility result look like the genuinely new pieces. The best-of-both-worlds guarantees are a reasonable way to package the positive results.

The tree-compatibility assumption is the clearest limitation; it rules out many natural loss sequences and makes the model narrower than standard Lipschitz bandits. K_eff itself is only sketched in the abstract, so it is unclear how expensive it is to compute or whether the bounds remain useful when the tree is not perfectly known in advance. The d <= 2 restriction is stated plainly but still narrows the scope.

This is aimed at bandit researchers who care about structured action sets and feedback models. The theoretical claims are sharp enough that a serious referee should see the full proofs and check the derivation of K_eff.

Referee Report

0 major / 2 minor

Summary. The paper studies multi-armed bandits where actions form a similarity structure encoded by a rooted tree (leaves are actions, levels quantify relatedness). Losses are assumed tree-compatible (similar actions have close losses). It proves an impossibility result that standard one-point bandit feedback cannot leverage such similarities even under strong constraints. It then gives a unified family of algorithms for richer feedback models (semi-bandit down to minimal two-point feedback) that achieve best-of-both-worlds regret bounds by replacing the action count K with a similarity-aware effective count K_eff. As an application, two-point feedback yields √T regret for Lipschitz bandits when d ≤ 2.

Significance. If the proofs and definitions of K_eff hold, the work supplies a clean way to quantify and exploit tree-induced similarities across feedback regimes, with the two-point Lipschitz application providing a concrete improvement over standard one-point bounds in low dimensions. The impossibility result for one-point feedback and the best-of-both-worlds guarantees are useful complementary contributions.

minor comments (2)

The abstract introduces K_eff as the quantity that replaces K in the regret bounds but does not indicate how K_eff is computed from the tree or the loss-compatibility assumption; a brief parenthetical definition or reference to the main theorem would improve readability.
The tree-compatibility assumption on the loss sequence is central to both the impossibility and the positive results; a short illustrative example of a tree-compatible vs. incompatible loss sequence would help readers assess the modeling restriction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading and for highlighting the potential significance of the impossibility result, the unified algorithms, and the Lipschitz application. We address the points raised below. Since the report lists no enumerated major comments after 'MAJOR COMMENTS:', we respond to the overall assessment and note that we stand ready to supply additional proof details or clarifications if the editor or referee requests them.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation relies on explicit modeling assumptions (rooted tree encoding action similarities, tree-compatible loss sequences) that are stated upfront and used to define K_eff as a similarity-aware effective action count; regret bounds are then derived by substituting this quantity into standard analyses for richer feedback models. The impossibility result for one-point feedback is shown separately as a complementary negative result, and the Lipschitz-bandit application for d ≤ 2 follows directly from instantiating the general two-point-feedback algorithm without reducing any bound to a fitted parameter or self-referential definition. No load-bearing step collapses to a self-citation chain, ansatz smuggled via prior work, or renaming of a known empirical pattern; the central claims remain independent of the target results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the tree encoding of similarities and the tree-compatibility assumption on losses; K_eff is derived from these rather than fitted.

axioms (2)

domain assumption Action set encoded by a rooted tree whose leaves are actions and levels quantify closeness.
Stated in the problem setup to define similarity structure.
domain assumption Loss sequence is tree-compatible: losses of similar actions are constrained to be close.
Invoked to enable leveraging similarities and to define K_eff.

pith-pipeline@v0.9.1-grok · 5744 in / 1353 out tokens · 19421 ms · 2026-06-26T08:39:12.833593+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

39 extracted references · 2 linked inside Pith

[1]

Improved algorithms for linear stochastic bandits

Yasin Abbasi-Yadkori, D \'a vid P \'a l, and Csaba Szepesv \'a ri. Improved algorithms for linear stochastic bandits. Advances in neural information processing systems, 24, 2011

2011
[2]

Optimal algorithms for online convex optimization with multi-point bandit feedback

Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Colt, pages 28--40, 2010

2010
[3]

Thompson sampling for contextual bandits with linear payoffs

Shipra Agrawal and Navin Goyal. Thompson sampling for contextual bandits with linear payoffs. In International conference on machine learning, pages 127--135. PMLR, 2013

2013
[4]

Mixed-effect thompson sampling

Imad Aouali, Branislav Kveton, and Sumeet Katariya. Mixed-effect thompson sampling. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Research, pages 2087--2115. PMLR, 2023

2087
[5]

Finite-time analysis of the multiarmed bandit problem

Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine learning, 47 0 (2): 0 235--256, 2002

2002
[6]

Convex optimization theory, volume 1

Dimitri Bertsekas. Convex optimization theory, volume 1. Athena Scientific, 2009

2009
[7]

Sébastien Bubeck, Nicoló Cesa-Bianchi, and Sham M. Kakade. Towards minimax policies for online linear optimization with bandit feedback. In Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, pages 41.1--41.14. PMLR, 2012

2012
[8]

Algorithmic chaining and the role of partial feedback in online nonparametric learning

Nicol \`o Cesa-Bianchi, Pierre Gaillard, Claudio Gentile, and S \'e bastien Gerchinovitz. Algorithmic chaining and the role of partial feedback in online nonparametric learning. In Conference on Learning Theory, pages 465--481. PMLR, 2017

2017
[9]

Combinatorial bandits revisited

Richard Combes, Mohammad Sadegh Talebi Mazraeh Shahi, Alexandre Proutiere, et al. Combinatorial bandits revisited. Advances in neural information processing systems, 28, 2015

2015
[10]

Minimal exploration in structured stochastic bandits

Richard Combes, Stefan Magureanu, and Alexandre Proutiere. Minimal exploration in structured stochastic bandits. In Advances in Neural Information Processing Systems (NeurIPS), 2017

2017
[11]

A blackbox approach to best of both worlds in bandits and beyond

Chris Dann, Chen - Yu Wei, and Julian Zimmert. A blackbox approach to best of both worlds in bandits and beyond. In Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 5503--5570. PMLR, 12--15 Jul 2023

2023
[12]

Bandits with side observations: Bounded vs

R \'e my Degenne, Evrard Garcelon, and Vianney Perchet. Bandits with side observations: Bounded vs. logarithmic regret. In Conference on Uncertainty in Artificial Intelligence, 2018

2018
[13]

Optimal rates for zero-order convex optimization: The power of two function evaluations

John C Duchi, Michael I Jordan, Martin J Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory, 61 0 (5): 0 2788--2806, 2015

2015
[14]

Online convex optimization in the bandit setting: gradient descent without a gradient

Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. arXiv preprint cs/0408007, 2004

Pith/arXiv arXiv 2004
[15]

Refined lower bounds for adversarial bandits

S \'e bastien Gerchinovitz and Tor Lattimore. Refined lower bounds for adversarial bandits. Advances in Neural Information Processing Systems, 29, 2016

2016
[16]

Zeroth-order non-convex learning via hierarchical dual averaging

Am \'e lie H \'e liou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Zeroth-order non-convex learning via hierarchical dual averaging. In International Conference on Machine Learning, pages 4192--4202. PMLR, 2021

2021
[17]

Deep hierarchy in bandits

Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, and Mohammad Ghavamzadeh. Deep hierarchy in bandits. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 8833--8851. PMLR, 2022 a

2022
[18]

Hierarchical bayesian bandits

Joey Hong, Branislav Kveton, Manzil Zaheer, and Mohammad Ghavamzadeh. Hierarchical bayesian bandits. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 7724--7741. PMLR, 2022 b

2022
[19]

Tight first-and second-order regret bounds for adversarial linear bandits

Shinji Ito, Shuichi Hirahara, Tasuku Soma, and Yuichi Yoshida. Tight first-and second-order regret bounds for adversarial linear bandits. Advances in Neural Information Processing Systems, 33: 0 2028--2038, 2020

2028
[20]

Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs

Shinji Ito, Taira Tsuchiya, and Junya Honda. Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs. In Advances in Neural Information Processing Systems, volume 35, pages 28631--28643. Curran Associates, Inc., 2022

2022
[21]

David S. Johnson. Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 9 0 (3): 0 256--278, 1974. ISSN 0022-0000

1974
[22]

Robust lipschitz bandits to adversarial corruptions

Yue Kang, Cho-Jui Hsieh, and Thomas Chun Man Lee. Robust lipschitz bandits to adversarial corruptions. Advances in Neural Information Processing Systems, 36: 0 10897--10908, 2023

2023
[23]

Nearly tight bounds for the continuum-armed bandit problem

Robert Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. In Advances in Neural Information Processing Systems, volume 17. MIT Press, 2004

2004
[24]

Multi-armed bandits in metric spaces

Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Multi-armed bandits in metric spaces. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 681--690, 2008

2008
[25]

Bandits and experts in metric spaces

Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Bandits and experts in metric spaces. Journal of the ACM (JACM), 66 0 (4): 0 1--77, 2019

2019
[26]

Bandit algorithms

Tor Lattimore and Csaba Szepesv \'a ri. Bandit algorithms. Cambridge University Press, 2020

2020
[27]

Adaptive regret for bandits made possible: Two queries suffice

Zhou Lu, Qiuyi Zhang, Xinyi Chen, Fred Zhang, David Woodruff, and Elad Hazan. Adaptive regret for bandits made possible: Two queries suffice. arXiv preprint arXiv:2401.09278, 2024

arXiv 2024
[28]

Nested bandits

Matthieu Martin, Panayotis Mertikopoulos, Thibaud Rahier, and Houssam Zenati. Nested bandits. In International Conference on Machine Learning, pages 15093--15121. PMLR, 2022

2022
[29]

Prior-dependent allocations for bayesian fixed-budget best-arm identification in structured bandits

Nicolas Nguyen, Imad Aouali, Andr \'a s Gy \"o rgy, and Claire Vernade. Prior-dependent allocations for bayesian fixed-budget best-arm identification in structured bandits. In Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, volume 258 of Proceedings of Machine Learning Research, pages 379--387. PMLR, 2025

2025
[30]

A modern introduction to online learning

Francesco Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2025

Pith/arXiv arXiv 1912
[31]

Adaptive discretization for adversarial lipschitz bandits

Chara Podimata and Alex Slivkins. Adaptive discretization for adversarial lipschitz bandits. In Conference on Learning Theory, pages 3788--3805. PMLR, 2021

2021
[32]

Tyrrell Rockafellar

R. Tyrrell Rockafellar. Convex analysis. Princeton Mathematical Series. Princeton University Press, 1970

1970
[33]

Exploiting easy data in online optimization

Amir Sani, Gergely Neu, and Alessandro Lazaric. Exploiting easy data in online optimization. Advances in Neural Information Processing Systems, 27, 2014

2014
[34]

An optimal algorithm for bandit and zero-order convex optimization with two-point feedback

Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. Journal of Machine Learning Research, 18 0 (52): 0 1--11, 2017

2017
[35]

A tight analysis of the greedy algorithm for set cover

Petr Slav\' k. A tight analysis of the greedy algorithm for set cover. Journal of Algorithms, 25 0 (2): 0 237--254, 1997. ISSN 0196-6774

1997
[36]

Multi-armed bandits on implicit metric spaces

Aleksandrs Slivkins. Multi-armed bandits on implicit metric spaces. Advances in Neural Information Processing Systems, 24, 2011

2011
[37]

Adaptation to easy data in prediction with limited advice

Tobias Sommer Thune and Yevgeny Seldin. Adaptation to easy data in prediction with limited advice. Advances in Neural Information Processing Systems, 31, 2018

2018
[38]

Efficient learning in large-scale combinatorial semi-bandits

Zheng Wen, Branislav Kveton, and Azin Ashkan. Efficient learning in large-scale combinatorial semi-bandits. In International Conference on Machine Learning, pages 1113--1122. PMLR, 2015

2015
[39]

Tsallis-inf: An optimal algorithm for stochastic and adversarial bandits

Julian Zimmert and Yevgeny Seldin. Tsallis-inf: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22 0 (28): 0 1--49, 2021

2021

[1] [1]

Improved algorithms for linear stochastic bandits

Yasin Abbasi-Yadkori, D \'a vid P \'a l, and Csaba Szepesv \'a ri. Improved algorithms for linear stochastic bandits. Advances in neural information processing systems, 24, 2011

2011

[2] [2]

Optimal algorithms for online convex optimization with multi-point bandit feedback

Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Colt, pages 28--40, 2010

2010

[3] [3]

Thompson sampling for contextual bandits with linear payoffs

Shipra Agrawal and Navin Goyal. Thompson sampling for contextual bandits with linear payoffs. In International conference on machine learning, pages 127--135. PMLR, 2013

2013

[4] [4]

Mixed-effect thompson sampling

Imad Aouali, Branislav Kveton, and Sumeet Katariya. Mixed-effect thompson sampling. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Research, pages 2087--2115. PMLR, 2023

2087

[5] [5]

Finite-time analysis of the multiarmed bandit problem

Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine learning, 47 0 (2): 0 235--256, 2002

2002

[6] [6]

Convex optimization theory, volume 1

Dimitri Bertsekas. Convex optimization theory, volume 1. Athena Scientific, 2009

2009

[7] [7]

Sébastien Bubeck, Nicoló Cesa-Bianchi, and Sham M. Kakade. Towards minimax policies for online linear optimization with bandit feedback. In Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, pages 41.1--41.14. PMLR, 2012

2012

[8] [8]

Algorithmic chaining and the role of partial feedback in online nonparametric learning

Nicol \`o Cesa-Bianchi, Pierre Gaillard, Claudio Gentile, and S \'e bastien Gerchinovitz. Algorithmic chaining and the role of partial feedback in online nonparametric learning. In Conference on Learning Theory, pages 465--481. PMLR, 2017

2017

[9] [9]

Combinatorial bandits revisited

Richard Combes, Mohammad Sadegh Talebi Mazraeh Shahi, Alexandre Proutiere, et al. Combinatorial bandits revisited. Advances in neural information processing systems, 28, 2015

2015

[10] [10]

Minimal exploration in structured stochastic bandits

Richard Combes, Stefan Magureanu, and Alexandre Proutiere. Minimal exploration in structured stochastic bandits. In Advances in Neural Information Processing Systems (NeurIPS), 2017

2017

[11] [11]

A blackbox approach to best of both worlds in bandits and beyond

Chris Dann, Chen - Yu Wei, and Julian Zimmert. A blackbox approach to best of both worlds in bandits and beyond. In Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 5503--5570. PMLR, 12--15 Jul 2023

2023

[12] [12]

Bandits with side observations: Bounded vs

R \'e my Degenne, Evrard Garcelon, and Vianney Perchet. Bandits with side observations: Bounded vs. logarithmic regret. In Conference on Uncertainty in Artificial Intelligence, 2018

2018

[13] [13]

Optimal rates for zero-order convex optimization: The power of two function evaluations

John C Duchi, Michael I Jordan, Martin J Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory, 61 0 (5): 0 2788--2806, 2015

2015

[14] [14]

Online convex optimization in the bandit setting: gradient descent without a gradient

Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. arXiv preprint cs/0408007, 2004

Pith/arXiv arXiv 2004

[15] [15]

Refined lower bounds for adversarial bandits

S \'e bastien Gerchinovitz and Tor Lattimore. Refined lower bounds for adversarial bandits. Advances in Neural Information Processing Systems, 29, 2016

2016

[16] [16]

Zeroth-order non-convex learning via hierarchical dual averaging

Am \'e lie H \'e liou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Zeroth-order non-convex learning via hierarchical dual averaging. In International Conference on Machine Learning, pages 4192--4202. PMLR, 2021

2021

[17] [17]

Deep hierarchy in bandits

Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, and Mohammad Ghavamzadeh. Deep hierarchy in bandits. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 8833--8851. PMLR, 2022 a

2022

[18] [18]

Hierarchical bayesian bandits

Joey Hong, Branislav Kveton, Manzil Zaheer, and Mohammad Ghavamzadeh. Hierarchical bayesian bandits. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 7724--7741. PMLR, 2022 b

2022

[19] [19]

Tight first-and second-order regret bounds for adversarial linear bandits

Shinji Ito, Shuichi Hirahara, Tasuku Soma, and Yuichi Yoshida. Tight first-and second-order regret bounds for adversarial linear bandits. Advances in Neural Information Processing Systems, 33: 0 2028--2038, 2020

2028

[20] [20]

Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs

Shinji Ito, Taira Tsuchiya, and Junya Honda. Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs. In Advances in Neural Information Processing Systems, volume 35, pages 28631--28643. Curran Associates, Inc., 2022

2022

[21] [21]

David S. Johnson. Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 9 0 (3): 0 256--278, 1974. ISSN 0022-0000

1974

[22] [22]

Robust lipschitz bandits to adversarial corruptions

Yue Kang, Cho-Jui Hsieh, and Thomas Chun Man Lee. Robust lipschitz bandits to adversarial corruptions. Advances in Neural Information Processing Systems, 36: 0 10897--10908, 2023

2023

[23] [23]

Nearly tight bounds for the continuum-armed bandit problem

Robert Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. In Advances in Neural Information Processing Systems, volume 17. MIT Press, 2004

2004

[24] [24]

Multi-armed bandits in metric spaces

Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Multi-armed bandits in metric spaces. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 681--690, 2008

2008

[25] [25]

Bandits and experts in metric spaces

Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Bandits and experts in metric spaces. Journal of the ACM (JACM), 66 0 (4): 0 1--77, 2019

2019

[26] [26]

Bandit algorithms

Tor Lattimore and Csaba Szepesv \'a ri. Bandit algorithms. Cambridge University Press, 2020

2020

[27] [27]

Adaptive regret for bandits made possible: Two queries suffice

Zhou Lu, Qiuyi Zhang, Xinyi Chen, Fred Zhang, David Woodruff, and Elad Hazan. Adaptive regret for bandits made possible: Two queries suffice. arXiv preprint arXiv:2401.09278, 2024

arXiv 2024

[28] [28]

Nested bandits

Matthieu Martin, Panayotis Mertikopoulos, Thibaud Rahier, and Houssam Zenati. Nested bandits. In International Conference on Machine Learning, pages 15093--15121. PMLR, 2022

2022

[29] [29]

Prior-dependent allocations for bayesian fixed-budget best-arm identification in structured bandits

Nicolas Nguyen, Imad Aouali, Andr \'a s Gy \"o rgy, and Claire Vernade. Prior-dependent allocations for bayesian fixed-budget best-arm identification in structured bandits. In Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, volume 258 of Proceedings of Machine Learning Research, pages 379--387. PMLR, 2025

2025

[30] [30]

A modern introduction to online learning

Francesco Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2025

Pith/arXiv arXiv 1912

[31] [31]

Adaptive discretization for adversarial lipschitz bandits

Chara Podimata and Alex Slivkins. Adaptive discretization for adversarial lipschitz bandits. In Conference on Learning Theory, pages 3788--3805. PMLR, 2021

2021

[32] [32]

Tyrrell Rockafellar

R. Tyrrell Rockafellar. Convex analysis. Princeton Mathematical Series. Princeton University Press, 1970

1970

[33] [33]

Exploiting easy data in online optimization

Amir Sani, Gergely Neu, and Alessandro Lazaric. Exploiting easy data in online optimization. Advances in Neural Information Processing Systems, 27, 2014

2014

[34] [34]

An optimal algorithm for bandit and zero-order convex optimization with two-point feedback

Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. Journal of Machine Learning Research, 18 0 (52): 0 1--11, 2017

2017

[35] [35]

A tight analysis of the greedy algorithm for set cover

Petr Slav\' k. A tight analysis of the greedy algorithm for set cover. Journal of Algorithms, 25 0 (2): 0 237--254, 1997. ISSN 0196-6774

1997

[36] [36]

Multi-armed bandits on implicit metric spaces

Aleksandrs Slivkins. Multi-armed bandits on implicit metric spaces. Advances in Neural Information Processing Systems, 24, 2011

2011

[37] [37]

Adaptation to easy data in prediction with limited advice

Tobias Sommer Thune and Yevgeny Seldin. Adaptation to easy data in prediction with limited advice. Advances in Neural Information Processing Systems, 31, 2018

2018

[38] [38]

Efficient learning in large-scale combinatorial semi-bandits

Zheng Wen, Branislav Kveton, and Azin Ashkan. Efficient learning in large-scale combinatorial semi-bandits. In International Conference on Machine Learning, pages 1113--1122. PMLR, 2015

2015

[39] [39]

Tsallis-inf: An optimal algorithm for stochastic and adversarial bandits

Julian Zimmert and Yevgeny Seldin. Tsallis-inf: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22 0 (28): 0 1--49, 2021

2021