Bandit Algorithms

Tor Lattimore, Csaba Szepesvári · Cambridge University Press · 2020

8 Pith papers cite this work.

citation-role summary: background (2)

citation-polarity summary: background (2)

representative citing papers

Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves

cs.SE · 2026-04-29 · unverdicted · novelty 7.0

Comet-H orchestrates LLMs via deficit-scoring prompt selection and half-life task tracking to co-evolve research software components, demonstrated by a static analysis tool reaching F1=0.768 versus a 0.364 baseline.

Offline Local Search for Online Stochastic Bandits

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

A generic conversion turns offline local search algorithms into online stochastic combinatorial bandit algorithms with O(log^3 T) approximate regret.

A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies

cs.LG · 2025-10-17 · unverdicted · novelty 7.0

Establishes last-iterate convergence rates for on-policy Q-learning under minimal irreducibility assumptions, with sample complexity O(1/ξ²) matching off-policy up to exploration factors.

Delightful Exploration

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Delight-gated exploration spends actions only when expected improvement times surprisal exceeds a gate price, recovers Pandora's reservation rule, and shows weaker regret growth than Thompson sampling or epsilon-greedy across bandits and MDPs with transferable hyperparameters.

Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Develops the COF algorithm for MAB-CS (multi-armed bandits with cost subsidy), which checks the feasibility of cheap arms by pooling samples, and proves generalized instance-dependent lower bounds with matching upper bounds on cumulative cost regret and quality regret.

Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making

cs.LG · 2026-04-28 · unverdicted · novelty 6.0

BCCB unifies learning of heterogeneous ad responses, exploration of uncertain users, and budget pacing into a single online process that works effectively from the first user on the Criteo Uplift dataset.

An Efficient Algorithm for Minimizing Ordered Norms in Fractional Load Balancing

cs.DS · 2025-11-14 · conditional · novelty 6.0

A randomized (1+ε)-approximation algorithm for ordered-norm load balancing uses O((n+d)(ε^{-2} + log log d) log(n+d)) linear-oracle calls via follow-the-regularized-leader prices and martingale progress analysis.

RIE-Greedy: Regularization-Induced Exploration for Contextual Bandits

stat.ML · 2026-03-11 · unverdicted · novelty 5.0

RIE-Greedy uses stochasticity from cross-validation regularization to induce Thompson Sampling-like exploration, claimed equivalent in the two-armed case and empirically competitive in large-scale settings.

