A framework for concave distributional utility maximization in stochastic bandits via influence-function stochastic gradients and entropic mirror ascent on the simplex, with regret bounds.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ML (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Concave Statistical Utility Maximization Bandits via Influence-Function Gradients