Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235–256

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

fields

cs.AI 2 · cs.LG 2

years

2026 4

polarities

background 1

representative citing papers

Learning Safely Without Knowing the World: COMPASS-Hedge

cs.LG · 2026-03-22 · unverdicted · novelty 7.0

COMPASS-Hedge is presented as the first parameter-free full-information anytime algorithm that simultaneously delivers minimax-optimal adversarial regret, instance-optimal stochastic regret, and Õ(1) regret to a baseline policy.

Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Develops the COF algorithm for MAB-CS, which checks the feasibility of cheap arms by pooling samples, and gives generalized instance-dependent lower bounds with matching upper bounds on cumulative cost and quality regret.

citing papers explorer

Showing 4 of 4 citing papers.

  • Not all uncertainty is alike: volatility, stochasticity, and exploration
    cs.AI · 2026-05-19 · unverdicted · none · ref 1

    Volatility promotes exploration and stochasticity suppresses it in Gaussian state-space bandits, shown by extending Gittins indices and deriving the CAUSE exploration bonus via control-as-inference.

  • The Context Gathering Decision Process: A POMDP Framework for Agentic Search
    cs.AI · 2026-05-07 · accept · none · ref 3

    Framing LLM agent loops as a Context Gathering Decision Process (a POMDP) yields a predicate-based belief state that improves multi-hop reasoning by up to 11.4% and an exhaustion gate that cuts token use by up to 39% with no performance loss.

  • Learning Safely Without Knowing the World: COMPASS-Hedge
    cs.LG · 2026-03-22 · unverdicted · none · ref 7

    COMPASS-Hedge is presented as the first parameter-free full-information anytime algorithm that simultaneously delivers minimax-optimal adversarial regret, instance-optimal stochastic regret, and Õ(1) regret to a baseline policy.

  • Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy
    cs.LG · 2026-05-08 · unverdicted · none · ref 4

    Develops the COF algorithm for MAB-CS, which checks the feasibility of cheap arms by pooling samples, and gives generalized instance-dependent lower bounds with matching upper bounds on cumulative cost and quality regret.