pith. machine review for the scientific record.

arxiv: 2605.09200 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: no theorem link

On Characterizing Learnability for Adversarial Noisy Bandits

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:51 UTC · model grok-4.3

classification 💻 cs.LG
keywords adversarial bandits · noisy rewards · learnability · function classes · maximin volume · hitting sets · multiplicative weights · regret minimization

The pith

Learnability in adversarial noisy bandits is characterized by a convexified generalized maximin volume.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks which function classes, known in advance to the learner, permit sublinear cumulative regret when an adversary chooses the function each round and rewards are noisy. It identifies the convexified generalized maximin volume as the quantity that decides this. For adversaries that ignore the learner's past actions, finiteness of the volume fully determines whether sublinear regret is achievable. The same condition covers adaptive adversaries provided the arm space is countable. This supplies a concrete test for when efficient learning survives worst-case function choices.
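The round-by-round protocol described above can be sketched as a simulation loop. A minimal sketch in Python, with hypothetical `learner` and `adversary` callables (none of these names come from the paper); regret is measured on the noiseless means, against the best fixed arm in hindsight:

```python
import random

def play_bandit(arms, learner, adversary, T, noise_std=0.1):
    """Simulate the adversarial noisy bandit protocol.

    Each round: the adversary picks a function f, the learner picks an arm,
    and the learner observes f(arm) plus Gaussian noise.  Returns cumulative
    regret against the best fixed arm in hindsight, computed on true means.
    """
    chosen_fs, chosen_arms, rewards = [], [], []
    for t in range(T):
        f = adversary(chosen_arms)           # adaptive: may depend on past arms
        arm = learner(chosen_arms, rewards)  # learner sees only noisy rewards
        r = f(arm) + random.gauss(0.0, noise_std)
        chosen_fs.append(f); chosen_arms.append(arm); rewards.append(r)
    # best fixed arm in hindsight, on the noiseless values
    best = max(sum(f(a) for f in chosen_fs) for a in arms)
    return best - sum(f(a) for f, a in zip(chosen_fs, chosen_arms))
```

An oblivious adversary ignores its history argument; an adaptive one may use it. Learnability asks whether some `learner` keeps this regret sublinear in `T` for every adversary over the class.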

Core claim

A function class is learnable against oblivious adversaries precisely when its convexified generalized maximin volume is finite. The identical quantity characterizes learnability against adaptive adversaries whenever the arm space is countable. The proofs rest on an equivalence between this volume and the existence of simple hitting sets; the authors further conjecture that the volume remains the right criterion for uncountable arms through its link to the distribution covering number, a strengthened hitting-set notion that still permits efficient learning by the multiplicative-weights algorithm.
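The multiplicative-weights algorithm invoked in the claim is standard machinery. A generic Exp3-style sketch over a finite arm set (not the paper's algorithm, which would run over distributions derived from hitting sets) shows the mechanism:

```python
import math
import random

def exp3(n_arms, T, reward_fn, eta=None):
    """Minimal Exp3-style multiplicative-weights learner.

    Weights are updated multiplicatively using importance-weighted reward
    estimates, which keeps the estimates unbiased despite bandit feedback.
    Rewards from reward_fn are assumed to lie in [0, 1].
    Returns the total reward collected.
    """
    eta = eta or math.sqrt(2 * math.log(n_arms) / (T * n_arms))
    weights = [1.0] * n_arms
    total = 0.0
    for t in range(T):
        s = sum(weights)
        probs = [w / s for w in weights]
        arm = random.choices(range(n_arms), probs)[0]
        r = reward_fn(t, arm)
        total += r
        est = r / probs[arm]          # importance-weighted estimate
        weights[arm] *= math.exp(eta * est)
    return total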

What carries the argument

the convexified generalized maximin volume: a convexified variant of the generalized maximin volume of Hanneke and Wang (2025) that measures function-class complexity and whose finiteness is equivalent to the existence of simple hitting sets

Load-bearing premise

The characterization for adaptive adversaries requires the arm space to be countable, and the uncountable case rests on a conjecture linking the convexified volume to the distribution covering number.

What would settle it

A concrete function class with countable arms whose convexified generalized maximin volume is finite, yet every algorithm incurs linear regret against some adaptive adversary, would disprove the claimed characterization.

Original abstract

We study adversarial noisy bandits given a known function class $\mathcal{F}$. In each round, the adversary selects a function $f \in \mathcal{F}$, the learner chooses an arm, and then observes a noisy reward determined by the chosen arm and the function $f$. The goal is to minimize the cumulative regret $R(T)$, defined as the difference between the learner's performance and that of the best fixed arm in hindsight over $T$ rounds. We say that a function class $\mathcal{F}$ is learnable if there exists an algorithm achieving sublinear regret. Our main results concern characterizing learnability. The main quantity appearing in our characterization is a convexified variant of the generalized maximin volume introduced by Hanneke and Wang (2025). For oblivious adversaries, we characterize learnability in terms of this convexified generalized maximin volume. For adaptive adversaries, we show that the same quantity characterizes learnability when the arm space is countable. Our analysis builds on a connection between convexified generalized maximin volume and the existence of simple hitting sets. We further conjecture that the same quantity also characterizes learnability when the arm space is uncountable, via its relation to a new complexity measure, which we call the distribution covering number. This notion can be viewed as a strengthened form of the hitting set that still admits efficient learning via the multiplicative weights algorithm. We also pose a number of relevant open questions regarding this problem.
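The regret quantity in the abstract can be written out explicitly. In standard notation (not quoted from the paper), with $f_t$ the adversary's round-$t$ function, $a_t$ the learner's arm, and $\mathcal{A}$ the arm space:

```latex
R(T) \;=\; \max_{a \in \mathcal{A}} \sum_{t=1}^{T} f_t(a) \;-\; \sum_{t=1}^{T} f_t(a_t)
```

Learnability of $\mathcal{F}$ then asks for an algorithm guaranteeing $\mathbb{E}[R(T)] = o(T)$ against every adversary choosing $f_t \in \mathcal{F}$.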

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper studies learnability in adversarial noisy bandits with a known function class F. It introduces a convexified variant of the generalized maximin volume from prior work and proves that finite value of this quantity characterizes learnability (sublinear regret) for all oblivious adversaries and for adaptive adversaries when the arm space is countable, via a reduction to the existence of simple hitting sets. For adaptive adversaries over uncountable arm spaces, the authors conjecture that the same quantity characterizes learnability by controlling a new 'distribution covering number' (a strengthened hitting set) that still permits sublinear regret via the multiplicative weights algorithm. The manuscript also poses several open questions.

Significance. If the proven characterizations hold, the work supplies a clean, geometric condition for learnability in this adversarial noisy setting and strengthens the link between maximin-type volumes and combinatorial hitting-set arguments. The reduction to simple hitting sets for the countable case and the proposal to use multiplicative weights on the distribution covering number are concrete technical contributions that could transfer to related online learning problems. The conjecture, if resolved affirmatively, would complete a full characterization for arbitrary adaptive adversaries.

major comments (2)
  1. [Abstract] Abstract and the statement of main results: the claim that the convexified generalized maximin volume 'characterizes learnability' for adaptive adversaries is only fully established for countable arm spaces; the extension to uncountable spaces rests entirely on an unproven conjecture linking the volume to the distribution covering number. Because the abstract presents the overall characterization of learnability as a main result, this gap is load-bearing for the broadest interpretation of the central claim.
  2. [Main results (characterization for countable arms)] The reduction to hitting sets (mentioned in the abstract) is used to prove the countable-arm case, but the manuscript does not supply an explicit construction or bound showing that finite convexified volume implies the existence of a hitting set of controlled size; without this step the equivalence between the volume and learnability remains formal rather than quantitative.
minor comments (3)
  1. [Abstract] The abstract introduces the 'convexified generalized maximin volume' and 'distribution covering number' without a one-sentence definition or pointer to the 2025 reference; adding a brief gloss would improve accessibility.
  2. [Introduction] Notation for the function class F and the arm space is used without an early global definition; a short 'Notation' paragraph in the introduction would prevent ambiguity.
  3. [Conclusion / Open questions] The open questions at the end are valuable but would benefit from being numbered and cross-referenced to the specific technical gaps (e.g., the uncountable conjecture) they address.
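For intuition about the "simple hitting sets" at issue in major comment 2, a standard greedy set-cover construction over a finite function class can serve as a sketch (hypothetical names and threshold; the paper's construction may differ):

```python
def greedy_hitting_set(functions, arms, threshold=0.5):
    """Greedily pick arms until every function exceeds `threshold` on some
    picked arm.  A generic set-cover-style sketch, not the paper's
    construction; `functions` map arms to expected rewards.
    """
    uncovered = set(range(len(functions)))
    hitting = []
    while uncovered:
        # arm that "hits" (gives high reward under) the most uncovered functions
        best_arm = max(arms,
                       key=lambda a: sum(functions[i](a) > threshold
                                         for i in uncovered))
        hit = {i for i in uncovered if functions[i](best_arm) > threshold}
        if not hit:
            raise ValueError("no arm hits the remaining functions")
        hitting.append(best_arm)
        uncovered -= hit
    return hitting
```

The greedy rule gives the classical logarithmic approximation to the minimum hitting set; the referee's request is for an analogous explicit bound in which the set's size is controlled by the convexified generalized maximin volume.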

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, constructive feedback, and recognition of the technical contributions in our manuscript. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract and the statement of main results: the claim that the convexified generalized maximin volume 'characterizes learnability' for adaptive adversaries is only fully established for countable arm spaces; the extension to uncountable spaces rests entirely on an unproven conjecture linking the volume to the distribution covering number. Because the abstract presents the overall characterization of learnability as a main result, this gap is load-bearing for the broadest interpretation of the central claim.

    Authors: We appreciate the referee highlighting this potential source of ambiguity. The abstract already distinguishes the cases explicitly: it states that learnability is characterized for oblivious adversaries in full, and for adaptive adversaries only when the arm space is countable, while separately presenting the uncountable case as a conjecture via the distribution covering number. The main results section maintains the same separation. To further eliminate any risk of misinterpretation, we will revise the abstract to more prominently emphasize that the full characterization for arbitrary adaptive adversaries is conjectural. This is a clarification rather than a substantive change. revision: partial

  2. Referee: [Main results (characterization for countable arms)] The reduction to hitting sets (mentioned in the abstract) is used to prove the countable-arm case, but the manuscript does not supply an explicit construction or bound showing that finite convexified volume implies the existence of a hitting set of controlled size; without this step the equivalence between the volume and learnability remains formal rather than quantitative.

    Authors: The referee is correct that the current manuscript establishes the connection between finite convexified generalized maximin volume and the existence of simple hitting sets at a relatively high level, without an explicit construction or a quantitative bound on hitting-set size in terms of the volume. This leaves the link somewhat formal. We will add an explicit construction of the hitting set from a finite-volume assumption, together with a bound on its cardinality, in a new subsection of the main results (or appendix). This will make the reduction quantitative and strengthen the proof that finite volume implies sublinear regret for countable arms. revision: yes

Circularity Check

0 steps flagged

No significant circularity; characterizations rest on new reductions to hitting sets

full rationale

The paper introduces a convexified variant of the generalized maximin volume (defined via self-citation to Hanneke and Wang 2025) as the central complexity measure, then proves that finite value of this measure characterizes learnability for all oblivious adversaries and for adaptive adversaries on countable arm spaces. These proofs proceed via explicit reductions to existence of simple hitting sets, which constitute independent content not present in the cited prior work. The uncountable adaptive case is explicitly labeled a conjecture relating the volume to a new distribution covering number, with no claim of proof. No derivation step equates a claimed result to its own inputs by construction, renames a known pattern, or relies on a load-bearing self-citation whose validity is unverified outside this paper. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The results rest on standard online learning assumptions plus the new definitions of the convexified volume and distribution covering number; no free parameters are fitted to data.

axioms (2)
  • domain assumption The function class F is known in advance to the learner
    Explicitly stated in the problem setup in the abstract.
  • standard math Regret is measured against the best fixed arm in hindsight
    Standard definition of regret in bandit problems.
invented entities (2)
  • convexified generalized maximin volume no independent evidence
    purpose: Quantity used to characterize learnability
    Convexified variant of the quantity from the 2025 prior work.
  • distribution covering number no independent evidence
    purpose: Strengthened hitting set notion for uncountable arm spaces
    New complexity measure introduced to support the conjecture.

pith-pipeline@v0.9.0 · 5552 in / 1329 out tokens · 78349 ms · 2026-05-12T01:51:11.596588+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

  1. Tight guarantees for interactive decision making with the decision-estimation coefficient. The Thirty-Sixth Annual Conference on Learning Theory, 2023.
  2. The Statistical Complexity of Interactive Decision Making. 2023.
  3. A Complete Characterization of Learnability for Stochastic Noisy Bandits. arXiv preprint arXiv:2410.09597.
  4. A Complete Characterization of Learnability for Stochastic Noisy Bandits. 36th International Conference on Algorithmic Learning Theory.
  5. Bandit algorithms. 2020.
  6. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 2002.
  7. Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proceedings of IEEE 36th Annual Foundations of Computer Science, 1995.
  8. Scale-free adversarial multi-armed bandits. International Conference on Algorithmic Learning Theory, 2022.
  9. Explore no more: Improved high-probability regret bounds for non-stochastic bandits. Advances in Neural Information Processing Systems.
  10. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning.
  11. Efficient and robust algorithms for adversarial linear contextual bandits. Conference on Learning Theory, 2020.
  12. Elad Hazan and Zohar Karnin. Journal of Machine Learning Research.
  13. The many faces of exponential weights in online learning. Conference on Learning Theory, 2018.
  14. The entropic barrier: a simple and optimal universal self-concordant barrier. Proceedings of the 28th Conference on Learning Theory, 2015.
  15. Adaptive discretization for adversarial Lipschitz bandits. Conference on Learning Theory, 2021.
  16. Online learning in adversarial Lipschitz environments. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2010.
  17. Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning. arXiv preprint arXiv:2006.12367.
  18. On the complexity of adversarial decision making. Advances in Neural Information Processing Systems.
  19. Bandits, query learning, and the haystack dimension. Proceedings of the 24th Annual Conference on Learning Theory, 2011.
  20. The statistical complexity of interactive decision making. arXiv preprint arXiv:2112.13487.
  21. Bandit learnability can be undecidable. The Thirty-Sixth Annual Conference on Learning Theory, 2023.
  22. On the Hardness of Bandit Learning. Proceedings of the Thirty-Eighth Conference on Learning Theory, 2025.
  23. Eluder dimension and the sample complexity of optimistic exploration. Advances in Neural Information Processing Systems.
  24. Introduction to multi-armed bandits. Foundations and Trends in Machine Learning, 2019.
  25. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 2002.
  26. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society.