pith. machine review for the scientific record.

arxiv: 2605.09200 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: no theorem link

On Characterizing Learnability for Adversarial Noisy Bandits

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:51 UTC · model grok-4.3

classification 💻 cs.LG
keywords adversarial bandits · noisy rewards · learnability · function classes · maximin volume · hitting sets · multiplicative weights · regret minimization

The pith

Learnability in adversarial noisy bandits is characterized by a convexified generalized maximin volume.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks which function classes, known in advance to the learner, permit sublinear cumulative regret when an adversary chooses the function each round and rewards are noisy. It identifies the convexified generalized maximin volume as the quantity that decides this. For adversaries that ignore the learner's past actions, finiteness of the volume fully determines whether sublinear regret is achievable. The same condition covers adaptive adversaries provided the arm space is countable. This supplies a concrete test for when efficient learning survives worst-case function choices.
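The round-by-round protocol described above can be sketched as a simulation loop. A minimal sketch in Python, with hypothetical `learner` and `adversary` callables (none of these names come from the paper); regret is measured on the noiseless means, against the best fixed arm in hindsight:

```python
import random

def play_bandit(arms, learner, adversary, T, noise_std=0.1):
    """Simulate the adversarial noisy bandit protocol.

    Each round: the adversary picks a function f, the learner picks an arm,
    and the learner observes f(arm) plus Gaussian noise.  Returns cumulative
    regret against the best fixed arm in hindsight, computed on true means.
    """
    chosen_fs, chosen_arms, rewards = [], [], []
    for t in range(T):
        f = adversary(chosen_arms)           # adaptive: may depend on past arms
        arm = learner(chosen_arms, rewards)  # learner sees only noisy rewards
        r = f(arm) + random.gauss(0.0, noise_std)
        chosen_fs.append(f); chosen_arms.append(arm); rewards.append(r)
    # best fixed arm in hindsight, on the noiseless values
    best = max(sum(f(a) for f in chosen_fs) for a in arms)
    return best - sum(f(a) for f, a in zip(chosen_fs, chosen_arms))
```

An oblivious adversary ignores its history argument; an adaptive one may use it. Learnability asks whether some `learner` keeps this regret sublinear in `T` for every adversary over the class.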

Core claim

A function class is learnable against oblivious adversaries precisely when its convexified generalized maximin volume is finite. The identical quantity characterizes learnability against adaptive adversaries whenever the arm space is countable. The proofs rest on an equivalence between this volume and the existence of simple hitting sets; the authors further conjecture that the volume remains the right criterion for uncountable arms through its link to the distribution covering number, a strengthened hitting-set notion that still permits efficient learning by the multiplicative-weights algorithm.
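The multiplicative-weights algorithm invoked in the claim is standard machinery. A generic Exp3-style sketch over a finite arm set (not the paper's algorithm, which would run over distributions derived from hitting sets) shows the mechanism:

```python
import math
import random

def exp3(n_arms, T, reward_fn, eta=None):
    """Minimal Exp3-style multiplicative-weights learner.

    Weights are updated multiplicatively using importance-weighted reward
    estimates, which keeps the estimates unbiased despite bandit feedback.
    Rewards from reward_fn are assumed to lie in [0, 1].
    Returns the total reward collected.
    """
    eta = eta or math.sqrt(2 * math.log(n_arms) / (T * n_arms))
    weights = [1.0] * n_arms
    total = 0.0
    for t in range(T):
        s = sum(weights)
        probs = [w / s for w in weights]
        arm = random.choices(range(n_arms), probs)[0]
        r = reward_fn(t, arm)
        total += r
        est = r / probs[arm]          # importance-weighted estimate
        weights[arm] *= math.exp(eta * est)
    return total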

What carries the argument

the convexified generalized maximin volume: a convexified variant of the generalized maximin volume of Hanneke and Wang (2025) that measures function-class complexity and whose finiteness is equivalent to the existence of simple hitting sets

Load-bearing premise

The characterization for adaptive adversaries requires the arm space to be countable, and the uncountable case rests on a conjecture linking the convexified volume to the distribution covering number.

What would settle it

A concrete function class with countable arms whose convexified generalized maximin volume is finite, yet every algorithm incurs linear regret against some adaptive adversary, would disprove the claimed characterization.

Original abstract

We study adversarial noisy bandits given a known function class $\mathcal{F}$. In each round, the adversary selects a function $f \in \mathcal{F}$, the learner chooses an arm, and then observes a noisy reward determined by the chosen arm and the function $f$. The goal is to minimize the cumulative regret $R(T)$, defined as the difference between the learner's performance and that of the best fixed arm in hindsight over $T$ rounds. We say that a function class $\mathcal{F}$ is learnable if there exists an algorithm achieving sublinear regret. Our main results concern characterizing learnability. The main quantity appearing in our characterization is a convexified variant of the generalized maximin volume introduced by Hanneke and Wang (2025). For oblivious adversaries, we characterize learnability in terms of this convexified generalized maximin volume. For adaptive adversaries, we show that the same quantity characterizes learnability when the arm space is countable. Our analysis builds on a connection between convexified generalized maximin volume and the existence of simple hitting sets. We further conjecture that the same quantity also characterizes learnability when the arm space is uncountable, via its relation to a new complexity measure, which we call the distribution covering number. This notion can be viewed as a strengthened form of the hitting set that still admits efficient learning via the multiplicative weights algorithm. We also pose a number of relevant open questions regarding this problem.
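The regret quantity in the abstract can be written out explicitly. In standard notation (not quoted from the paper), with $f_t$ the adversary's round-$t$ function, $a_t$ the learner's arm, and $\mathcal{A}$ the arm space:

```latex
R(T) \;=\; \max_{a \in \mathcal{A}} \sum_{t=1}^{T} f_t(a) \;-\; \sum_{t=1}^{T} f_t(a_t)
```

Learnability of $\mathcal{F}$ then asks for an algorithm guaranteeing $\mathbb{E}[R(T)] = o(T)$ against every adversary choosing $f_t \in \mathcal{F}$.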

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper studies learnability in adversarial noisy bandits with a known function class F. It introduces a convexified variant of the generalized maximin volume from prior work and proves that finite value of this quantity characterizes learnability (sublinear regret) for all oblivious adversaries and for adaptive adversaries when the arm space is countable, via a reduction to the existence of simple hitting sets. For adaptive adversaries over uncountable arm spaces, the authors conjecture that the same quantity characterizes learnability by controlling a new 'distribution covering number' (a strengthened hitting set) that still permits sublinear regret via the multiplicative weights algorithm. The manuscript also poses several open questions.

Significance. If the proven characterizations hold, the work supplies a clean, geometric condition for learnability in this adversarial noisy setting and strengthens the link between maximin-type volumes and combinatorial hitting-set arguments. The reduction to simple hitting sets for the countable case and the proposal to use multiplicative weights on the distribution covering number are concrete technical contributions that could transfer to related online learning problems. The conjecture, if resolved affirmatively, would complete a full characterization for arbitrary adaptive adversaries.

major comments (2)
  1. [Abstract] Abstract and the statement of main results: the claim that the convexified generalized maximin volume 'characterizes learnability' for adaptive adversaries is only fully established for countable arm spaces; the extension to uncountable spaces rests entirely on an unproven conjecture linking the volume to the distribution covering number. Because the abstract presents the overall characterization of learnability as a main result, this gap is load-bearing for the broadest interpretation of the central claim.
  2. [Main results (characterization for countable arms)] The reduction to hitting sets (mentioned in the abstract) is used to prove the countable-arm case, but the manuscript does not supply an explicit construction or bound showing that finite convexified volume implies the existence of a hitting set of controlled size; without this step the equivalence between the volume and learnability remains formal rather than quantitative.
minor comments (3)
  1. [Abstract] The abstract introduces the 'convexified generalized maximin volume' and 'distribution covering number' without a one-sentence definition or pointer to the 2025 reference; adding a brief gloss would improve accessibility.
  2. [Introduction] Notation for the function class F and the arm space is used without an early global definition; a short 'Notation' paragraph in the introduction would prevent ambiguity.
  3. [Conclusion / Open questions] The open questions at the end are valuable but would benefit from being numbered and cross-referenced to the specific technical gaps (e.g., the uncountable conjecture) they address.
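For intuition about the "simple hitting sets" at issue in major comment 2, a standard greedy set-cover construction over a finite function class can serve as a sketch (hypothetical names and threshold; the paper's construction may differ):

```python
def greedy_hitting_set(functions, arms, threshold=0.5):
    """Greedily pick arms until every function exceeds `threshold` on some
    picked arm.  A generic set-cover-style sketch, not the paper's
    construction; `functions` map arms to expected rewards.
    """
    uncovered = set(range(len(functions)))
    hitting = []
    while uncovered:
        # arm that "hits" (gives high reward under) the most uncovered functions
        best_arm = max(arms,
                       key=lambda a: sum(functions[i](a) > threshold
                                         for i in uncovered))
        hit = {i for i in uncovered if functions[i](best_arm) > threshold}
        if not hit:
            raise ValueError("no arm hits the remaining functions")
        hitting.append(best_arm)
        uncovered -= hit
    return hitting
```

The greedy rule gives the classical logarithmic approximation to the minimum hitting set; the referee's request is for an analogous explicit bound in which the set's size is controlled by the convexified generalized maximin volume.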

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, constructive feedback, and recognition of the technical contributions in our manuscript. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract and the statement of main results: the claim that the convexified generalized maximin volume 'characterizes learnability' for adaptive adversaries is only fully established for countable arm spaces; the extension to uncountable spaces rests entirely on an unproven conjecture linking the volume to the distribution covering number. Because the abstract presents the overall characterization of learnability as a main result, this gap is load-bearing for the broadest interpretation of the central claim.

    Authors: We appreciate the referee highlighting this potential source of ambiguity. The abstract already distinguishes the cases explicitly: it states that learnability is characterized for oblivious adversaries in full, and for adaptive adversaries only when the arm space is countable, while separately presenting the uncountable case as a conjecture via the distribution covering number. The main results section maintains the same separation. To further eliminate any risk of misinterpretation, we will revise the abstract to more prominently emphasize that the full characterization for arbitrary adaptive adversaries is conjectural. This is a clarification rather than a substantive change. revision: partial

  2. Referee: [Main results (characterization for countable arms)] The reduction to hitting sets (mentioned in the abstract) is used to prove the countable-arm case, but the manuscript does not supply an explicit construction or bound showing that finite convexified volume implies the existence of a hitting set of controlled size; without this step the equivalence between the volume and learnability remains formal rather than quantitative.

    Authors: The referee is correct that the current manuscript establishes the connection between finite convexified generalized maximin volume and the existence of simple hitting sets at a relatively high level, without an explicit construction or a quantitative bound on hitting-set size in terms of the volume. This leaves the link somewhat formal. We will add an explicit construction of the hitting set from a finite-volume assumption, together with a bound on its cardinality, in a new subsection of the main results (or appendix). This will make the reduction quantitative and strengthen the proof that finite volume implies sublinear regret for countable arms. revision: yes

Circularity Check

0 steps flagged

No significant circularity; characterizations rest on new reductions to hitting sets

full rationale

The paper introduces a convexified variant of the generalized maximin volume (defined via self-citation to Hanneke and Wang 2025) as the central complexity measure, then proves that finite value of this measure characterizes learnability for all oblivious adversaries and for adaptive adversaries on countable arm spaces. These proofs proceed via explicit reductions to existence of simple hitting sets, which constitute independent content not present in the cited prior work. The uncountable adaptive case is explicitly labeled a conjecture relating the volume to a new distribution covering number, with no claim of proof. No derivation step equates a claimed result to its own inputs by construction, renames a known pattern, or relies on a load-bearing self-citation whose validity is unverified outside this paper. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The results rest on standard online learning assumptions plus the new definitions of the convexified volume and distribution covering number; no free parameters are fitted to data.

axioms (2)
  • domain assumption The function class F is known in advance to the learner
    Explicitly stated in the problem setup in the abstract.
  • standard math Regret is measured against the best fixed arm in hindsight
    Standard definition of regret in bandit problems.
invented entities (2)
  • convexified generalized maximin volume no independent evidence
    purpose: Quantity used to characterize learnability
    Convexified variant of the quantity from the 2025 prior work.
  • distribution covering number no independent evidence
    purpose: Strengthened hitting set notion for uncountable arm spaces
    New complexity measure introduced to support the conjecture.

pith-pipeline@v0.9.0 · 5552 in / 1329 out tokens · 78349 ms · 2026-05-12T01:51:11.596588+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

  1. Tight guarantees for interactive decision making with the decision-estimation coefficient. The Thirty-Sixth Annual Conference on Learning Theory, 2023.
  2. The Statistical Complexity of Interactive Decision Making. 2023.
  3. A Complete Characterization of Learnability for Stochastic Noisy Bandits. arXiv preprint arXiv:2410.09597.
  4. A Complete Characterization of Learnability for Stochastic Noisy Bandits. 36th International Conference on Algorithmic Learning Theory.
  5. Bandit algorithms. 2020.
  6. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 2002.
  7. Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proceedings of IEEE 36th Annual Foundations of Computer Science, 1995.
  8. Scale-free adversarial multi-armed bandits. International Conference on Algorithmic Learning Theory, 2022.
  9. Explore no more: Improved high-probability regret bounds for non-stochastic bandits. Advances in Neural Information Processing Systems.
  10. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning.
  11. Efficient and robust algorithms for adversarial linear contextual bandits. Conference on Learning Theory, 2020.
  12. Elad Hazan and Zohar Karnin. Journal of Machine Learning Research.
  13. The many faces of exponential weights in online learning. Conference on Learning Theory, 2018.
  14. The entropic barrier: a simple and optimal universal self-concordant barrier. Proceedings of the 28th Conference on Learning Theory, 2015.
  15. Adaptive discretization for adversarial Lipschitz bandits. Conference on Learning Theory, 2021.
  16. Online learning in adversarial Lipschitz environments. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2010.
  17. Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning. arXiv preprint arXiv:2006.12367.
  18. On the complexity of adversarial decision making. Advances in Neural Information Processing Systems.
  19. Bandits, query learning, and the haystack dimension. Proceedings of the 24th Annual Conference on Learning Theory, 2011.
  20. The statistical complexity of interactive decision making. arXiv preprint arXiv:2112.13487.
  21. Bandit learnability can be undecidable. The Thirty-Sixth Annual Conference on Learning Theory, 2023.
  22. On the Hardness of Bandit Learning. Proceedings of the Thirty-Eighth Conference on Learning Theory, 2025.
  23. Eluder dimension and the sample complexity of optimistic exploration. Advances in Neural Information Processing Systems.
  24. Introduction to multi-armed bandits. Foundations and Trends in Machine Learning, 2019.
  25. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 2002.
  26. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society.