Bandit Algorithms
7 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (7)
roles: background (2)
polarities: background (2)
citing papers explorer
- Mean-based algorithms: A lower bound and regret
  Derives first lower bound on γ_t for mean-based algorithms in unknown-horizon bandit settings, proposes two new algorithms, and shows some are also no-regret.
- Prior-Free Sample Size Design for Test-and-Roll Experiments
  The paper introduces the Worst-case Marginal Benefit (WMB) criterion for sample-size design in test-and-roll experiments and shows it yields an optimal m approximately equal to N/3 for Bernoulli and Gaussian outcomes.
- Target-Aware Bandit Allocation for Scalable Surrogate Optimization in Chemical Space
  Introduces BOBa, a multi-armed bandit method for scalable surrogate optimization that adaptively allocates inference and evaluations to promising partitions of ultra-large chemical libraries.
- A Demon that remembers: An agential approach towards quantum thermodynamics of temporal correlations
  A classical agent extracts more work from quantum temporal correlations via adaptive strategies bounded by the new Time-Ordered Free Energy, while reinforcement learning achieves polylogarithmic dissipation when learning unknown states.
- Contextual Scalarisation Thompson Sampling for multi-objective decisions in public media
  CSTS learns context-dependent weights for multiple objectives in a multi-objective contextual bandit and outperforms fixed-weight and standard contextual bandit baselines on Swiss public broadcaster programming data.
- Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial
  Bayesian optimization automates the scientific discovery cycle by modeling observations with surrogate models and using acquisition functions to select experiments that balance known information with new exploration.
- SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters