Bandit Algorithms

Tor Lattimore, Csaba Szepesvári · 2020 · DOI 10.1017/9781108571401

8 Pith papers cite this work.

citation-role summary: background (2)

citation-polarity summary: background (2)

representative citing papers

When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers

cs.DB · 2026-07-01 · unverdicted · novelty 7.0

SOLAR is a learning-augmented policy for semantic cache replacement that achieves a constant competitive ratio of 3 and gains of 5-75% over FIFO on retrieval workloads.

Mean-based algorithms: A lower bound and regret

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Derives the first lower bound on γ_t for mean-based algorithms in unknown-horizon bandit settings, proposes two new algorithms, and shows that some of them are also no-regret.

Prior-Free Sample Size Design for Test-and-Roll Experiments

econ.EM · 2026-05-04 · unverdicted · novelty 7.0

The paper introduces the Worst-case Marginal Benefit (WMB) criterion for sample-size design in test-and-roll experiments and shows that it yields an optimal m of approximately N/3 for Bernoulli and Gaussian outcomes.

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters

cs.DC · 2026-05-01 · unverdicted · novelty 7.0 · 2 refs

SAGA introduces workflow-atomic scheduling for compound AI agents, achieving 1.64x lower task completion time and 1.22x better memory utilization than vLLM on a 64-GPU cluster at the cost of 30% lower peak throughput.

Target-Aware Bandit Allocation for Scalable Surrogate Optimization in Chemical Space

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

Introduces BOBa, a multi-armed bandit method for scalable surrogate optimization that adaptively allocates inference and evaluations to promising partitions of ultra-large chemical libraries.

A Demon that remembers: An agential approach towards quantum thermodynamics of temporal correlations

quant-ph · 2026-04-06 · unverdicted · novelty 6.0

A classical agent extracts more work from quantum temporal correlations via adaptive strategies bounded by the new Time-Ordered Free Energy, while reinforcement learning achieves polylogarithmic dissipation when learning unknown states.

Contextual Scalarisation Thompson Sampling for multi-objective decisions in public media

cs.IR · 2026-05-29 · unverdicted · novelty 4.0

CSTS learns context-dependent weights for multiple objectives in a multi-objective contextual bandit and outperforms fixed-weight and standard contextual bandit baselines on Swiss public broadcaster programming data.

Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial

cs.LG · 2026-04-01 · accept · novelty 2.0

Bayesian optimization automates the scientific discovery cycle by modeling observations with surrogate models and using acquisition functions to select experiments that balance known information with new exploration.

citing papers explorer

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters · cs.DC · 2026-05-01 · unverdicted · none · ref 38
Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial · cs.LG · 2026-04-01 · accept · none · ref 54
