Recognition: no theorem link
On Characterizing Learnability for Adversarial Noisy Bandits
Pith reviewed 2026-05-12 01:51 UTC · model grok-4.3
The pith
Learnability in adversarial noisy bandits is characterized by a convexified generalized maximin volume.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A function class is learnable against oblivious adversaries precisely when its convexified generalized maximin volume is finite. The identical quantity characterizes learnability against adaptive adversaries whenever the arm space is countable. The proofs rest on an equivalence between this volume and the existence of simple hitting sets; the authors further conjecture that the volume remains the right criterion for uncountable arms through its link to the distribution covering number, a strengthened hitting-set notion that still permits efficient learning by the multiplicative-weights algorithm.
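For orientation, the regret notion in the core claim can be written out. This is a standard formulation reconstructed from the abstract's verbal definition, with $\mathcal{A}$ denoting the arm space (notation assumed, not quoted from the paper):

```latex
% Adversary picks f_t \in \mathcal{F}; learner picks a_t \in \mathcal{A}
% and observes a noisy reward with mean f_t(a_t).
R(T) = \max_{a \in \mathcal{A}} \sum_{t=1}^{T} f_t(a)
       - \mathbb{E}\!\left[ \sum_{t=1}^{T} f_t(a_t) \right],
\qquad
\mathcal{F} \text{ is learnable} \iff \exists \text{ an algorithm with } \mathbb{E}[R(T)] = o(T).
```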
What carries the argument
The convexified generalized maximin volume, a convexified variant of the quantity introduced by Hanneke and Wang (2025); it measures function-class complexity, and its finiteness is equivalent to the existence of simple hitting sets.
Load-bearing premise
The characterization for adaptive adversaries requires the arm space to be countable, and the uncountable case rests on a conjecture linking the convexified volume to the distribution covering number.
What would settle it
A concrete function class with countable arms whose convexified generalized maximin volume is finite, yet every algorithm incurs linear regret against some adaptive adversary, would disprove the claimed characterization.
read the original abstract
We study adversarial noisy bandits given a known function class $\mathcal{F}$. In each round, the adversary selects a function $f \in \mathcal{F}$, the learner chooses an arm, and then observes a noisy reward determined by the chosen arm and the function $f$. The goal is to minimize the cumulative regret $R(T)$, defined as the difference between the learner's performance and that of the best fixed arm in hindsight over $T$ rounds. We say that a function class $\mathcal{F}$ is learnable if there exists an algorithm achieving sublinear regret. Our main results concern characterizing learnability. The main quantity appearing in our characterization is a convexified variant of the generalized maximin volume introduced by Hanneke and Wang (2025). For oblivious adversaries, we characterize learnability in terms of this convexified generalized maximin volume. For adaptive adversaries, we show that the same quantity characterizes learnability when the arm space is countable. Our analysis builds on a connection between convexified generalized maximin volume and the existence of simple hitting sets. We further conjecture that the same quantity also characterizes learnability when the arm space is uncountable, via its relation to a new complexity measure, which we call the distribution covering number. This notion can be viewed as a strengthened form of the hitting set that still admits efficient learning via the multiplicative weights algorithm. We also pose a number of relevant open questions regarding this problem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies learnability in adversarial noisy bandits with a known function class F. It introduces a convexified variant of the generalized maximin volume from prior work and proves that finite value of this quantity characterizes learnability (sublinear regret) for all oblivious adversaries and for adaptive adversaries when the arm space is countable, via a reduction to the existence of simple hitting sets. For adaptive adversaries over uncountable arm spaces, the authors conjecture that the same quantity characterizes learnability by controlling a new 'distribution covering number' (a strengthened hitting set) that still permits sublinear regret via the multiplicative weights algorithm. The manuscript also poses several open questions.
Significance. If the proven characterizations hold, the work supplies a clean, geometric condition for learnability in this adversarial noisy setting and strengthens the link between maximin-type volumes and combinatorial hitting-set arguments. The reduction to simple hitting sets for the countable case and the proposal to use multiplicative weights on the distribution covering number are concrete technical contributions that could transfer to related online learning problems. The conjecture, if resolved affirmatively, would complete a full characterization for arbitrary adaptive adversaries.
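The multiplicative-weights route mentioned above can be illustrated with a standard Exp3-style sketch over a finite candidate set. Everything here (the reward oracle `pull`, the finite arm set) is a generic stand-in, not the paper's actual construction; in the conjectured uncountable case the candidate set would instead come from a distribution cover.

```python
import math
import random

def exp3(pull, n_arms, T, eta=None):
    """Standard Exp3: multiplicative weights with importance-weighted
    reward estimates over a finite set of arms.

    pull(arm, t) -> reward in [0, 1]; in the bandit model this would be
    a noisy draw with mean f_t(arm).
    """
    if eta is None:
        # A typical learning-rate choice for rewards in [0, 1].
        eta = math.sqrt(2.0 * math.log(n_arms) / (n_arms * T))
    weights = [1.0] * n_arms
    total_reward = 0.0
    for t in range(T):
        s = sum(weights)
        probs = [w / s for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        r = pull(arm, t)
        total_reward += r
        # Importance weighting keeps the reward estimate unbiased.
        est = r / probs[arm]
        weights[arm] *= math.exp(eta * est)
    return total_reward
```

Run against any finite hitting set of candidate strategies, this achieves the usual $O(\sqrt{KT\log K})$ regret over the candidates, which is what makes a finite cover sufficient for sublinear regret.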
major comments (2)
- [Abstract] Abstract and the statement of main results: the claim that the convexified generalized maximin volume 'characterizes learnability' for adaptive adversaries is only fully established for countable arm spaces; the extension to uncountable spaces rests entirely on an unproven conjecture linking the volume to the distribution covering number. Because the abstract presents the overall characterization of learnability as a main result, this gap is load-bearing for the broadest interpretation of the central claim.
- [Main results (characterization for countable arms)] The reduction to hitting sets (mentioned in the abstract) is used to prove the countable-arm case, but the manuscript does not supply an explicit construction or bound showing that finite convexified volume implies the existence of a hitting set of controlled size; without this step the equivalence between the volume and learnability remains formal rather than quantitative.
minor comments (3)
- [Abstract] The abstract introduces the 'convexified generalized maximin volume' and 'distribution covering number' without a one-sentence definition or pointer to the 2025 reference; adding a brief gloss would improve accessibility.
- [Introduction] Notation for the function class F and the arm space is used without an early global definition; a short 'Notation' paragraph in the introduction would prevent ambiguity.
- [Conclusion / Open questions] The open questions at the end are valuable but would benefit from being numbered and cross-referenced to the specific technical gaps (e.g., the uncountable conjecture) they address.
Simulated Author's Rebuttal
We thank the referee for the careful reading, constructive feedback, and recognition of the technical contributions in our manuscript. We address each major comment below.
read point-by-point responses
- Referee: [Abstract] Abstract and the statement of main results: the claim that the convexified generalized maximin volume 'characterizes learnability' for adaptive adversaries is only fully established for countable arm spaces; the extension to uncountable spaces rests entirely on an unproven conjecture linking the volume to the distribution covering number. Because the abstract presents the overall characterization of learnability as a main result, this gap is load-bearing for the broadest interpretation of the central claim.
Authors: We appreciate the referee highlighting this potential source of ambiguity. The abstract already distinguishes the cases explicitly: it states that learnability is characterized for oblivious adversaries in full, and for adaptive adversaries only when the arm space is countable, while separately presenting the uncountable case as a conjecture via the distribution covering number. The main results section maintains the same separation. To further eliminate any risk of misinterpretation, we will revise the abstract to more prominently emphasize that the full characterization for arbitrary adaptive adversaries is conjectural. This is a clarification rather than a substantive change.
Revision: partial
- Referee: [Main results (characterization for countable arms)] The reduction to hitting sets (mentioned in the abstract) is used to prove the countable-arm case, but the manuscript does not supply an explicit construction or bound showing that finite convexified volume implies the existence of a hitting set of controlled size; without this step the equivalence between the volume and learnability remains formal rather than quantitative.
Authors: The referee is correct that the current manuscript establishes the connection between finite convexified generalized maximin volume and the existence of simple hitting sets at a relatively high level, without an explicit construction or a quantitative bound on hitting-set size in terms of the volume. This leaves the link somewhat formal. We will add an explicit construction of the hitting set from a finite-volume assumption, together with a bound on its cardinality, in a new subsection of the main results (or appendix). This will make the reduction quantitative and strengthen the proof that finite volume implies sublinear regret for countable arms.
Revision: yes
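In the finite case, the quantitative step requested above could resemble a generic greedy hitting-set computation; the sets-of-elements framing below is a schematic stand-in for the paper's actual construction, shown only to fix the combinatorial object being discussed.

```python
def greedy_hitting_set(sets):
    """Greedy hitting set: repeatedly pick the element that hits the
    most not-yet-hit sets. Equivalent to greedy set cover, so it gives
    an O(log n)-approximation of the minimum hitting set.

    sets: iterable of finite sets; returns a list of elements such that
    every input set contains at least one chosen element.
    """
    remaining = [set(s) for s in sets if s]
    universe = set().union(*remaining) if remaining else set()
    chosen = []
    while remaining:
        # Element contained in the largest number of unhit sets.
        best = max(universe, key=lambda e: sum(1 for s in remaining if e in s))
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen
```

A quantitative version of the reduction would need to bound the size of such a set in terms of the convexified generalized maximin volume, which is exactly the bound the referee asks for.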
Circularity Check
No significant circularity; characterizations rest on new reductions to hitting sets
full rationale
The paper introduces a convexified variant of the generalized maximin volume (defined via self-citation to Hanneke and Wang 2025) as the central complexity measure, then proves that finite value of this measure characterizes learnability for all oblivious adversaries and for adaptive adversaries on countable arm spaces. These proofs proceed via explicit reductions to existence of simple hitting sets, which constitute independent content not present in the cited prior work. The uncountable adaptive case is explicitly labeled a conjecture relating the volume to a new distribution covering number, with no claim of proof. No derivation step equates a claimed result to its own inputs by construction, renames a known pattern, or relies on a load-bearing self-citation whose validity is unverified outside this paper. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The function class F is known in advance to the learner
- standard math: Regret is measured against the best fixed arm in hindsight
invented entities (2)
- convexified generalized maximin volume (no independent evidence)
- distribution covering number (no independent evidence)