Derives the first lower bound on γ_t for mean-based algorithms in unknown-horizon bandit settings, proposes two new algorithms, and shows that some are also no-regret.
Bandit Algorithms
7 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (7)
roles: background (2)
polarities: background (2)
representative citing papers
The paper introduces the Worst-case Marginal Benefit (WMB) criterion for sample-size design in test-and-roll experiments and shows it yields an optimal m approximately equal to N/3 for Bernoulli and Gaussian outcomes.
Introduces BOBa, a multi-armed bandit method for scalable surrogate optimization that adaptively allocates inference and evaluations to promising partitions of ultra-large chemical libraries.
A classical agent extracts more work from quantum temporal correlations via adaptive strategies bounded by the new Time-Ordered Free Energy, while reinforcement learning achieves polylogarithmic dissipation when learning unknown states.
CSTS learns context-dependent weights for multiple objectives in a multi-objective contextual bandit and outperforms fixed-weight and standard contextual bandit baselines on Swiss public broadcaster programming data.
Bayesian optimization automates the scientific discovery cycle by modeling observations with surrogate models and using acquisition functions to select experiments that balance known information with new exploration.
citing papers explorer
- A Demon that remembers: An agential approach towards quantum thermodynamics of temporal correlations