Finite-time analysis of the multiarmed bandit problem.Machine learning, 47(2):235–256

Peter Auer, Nicolo Cesa-Bianchi, Paul Fischer · 2002

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Not all uncertainty is alike: volatility, stochasticity, and exploration

cs.AI · 2026-05-19 · unverdicted · novelty 7.0

Volatility promotes exploration and stochasticity suppresses it in Gaussian state-space bandits, shown by extending Gittins indices and deriving the CAUSE exploration bonus via control-as-inference.

The Context Gathering Decision Process: A POMDP Framework for Agentic Search

cs.AI · 2026-05-07 · accept · novelty 7.0

Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no performance loss.

Learning Safely Without Knowing the World:COMPASS-Hedge

cs.LG · 2026-03-22 · unverdicted · novelty 7.0

COMPASS-Hedge is presented as the first parameter-free full-information anytime algorithm that simultaneously delivers minimax-optimal adversarial regret, instance-optimal stochastic regret, and Õ(1) regret to a baseline policy.

Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Develops COF algorithm for MAB-CS that intelligently checks cheap arm feasibility by pooling samples, with generalized instance-dependent lower bounds and matching upper bounds on cumulative cost and quality regret.

citing papers explorer

Showing 4 of 4 citing papers.

Not all uncertainty is alike: volatility, stochasticity, and exploration cs.AI · 2026-05-19 · unverdicted · none · ref 1
Volatility promotes exploration and stochasticity suppresses it in Gaussian state-space bandits, shown by extending Gittins indices and deriving the CAUSE exploration bonus via control-as-inference.
The Context Gathering Decision Process: A POMDP Framework for Agentic Search cs.AI · 2026-05-07 · accept · none · ref 3
Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no performance loss.
Learning Safely Without Knowing the World:COMPASS-Hedge cs.LG · 2026-03-22 · unverdicted · none · ref 7
COMPASS-Hedge is presented as the first parameter-free full-information anytime algorithm that simultaneously delivers minimax-optimal adversarial regret, instance-optimal stochastic regret, and Õ(1) regret to a baseline policy.
Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy cs.LG · 2026-05-08 · unverdicted · none · ref 4
Develops COF algorithm for MAB-CS that intelligently checks cheap arm feasibility by pooling samples, with generalized instance-dependent lower bounds and matching upper bounds on cumulative cost and quality regret.

Finite-time analysis of the multiarmed bandit problem.Machine learning, 47(2):235–256

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer