The nonstochastic multiarmed bandit problem
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Pareto regret in multi-objective bandits matches the single-objective case by scaling inversely with the largest objective-wise gap g†, independent of dimension d, via a new top-two races and uncertainty-greedy algorithm with matching bounds.
citing papers explorer
- Bandit Learning in General Open Multi-agent Systems
A unified bandit framework for general open multi-agent systems with global-UCB algorithms and regret bounds linear in entry uncertainty and dependent on system stability and agent patterns.
- Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?
Pareto regret in multi-objective bandits matches the single-objective case by scaling inversely with the largest objective-wise gap g†, independent of dimension d, via a new top-two races and uncertainty-greedy algorithm with matching bounds.