In: Proceedings of the 19th international conference on World Wide Web

Li, L · 2010 · arXiv 2690.177275

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

citation-role summary

background 2 method 1

citation-polarity summary

background 1 support 1 use method 1

representative citing papers

Shuffle and Joint Differential Privacy for Generalized Linear Contextual Bandits

stat.ML · 2026-01-31 · unverdicted · novelty 8.0

First shuffle-DP and joint-DP algorithms for GLM contextual bandits achieve near non-private regret without strong spectral assumptions on contexts.

Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

stat.ML · 2026-06-08 · unverdicted · novelty 7.0

UCB-AA is a screening-enhanced UCB algorithm for bandits with arriving arms that delivers arrival-dependent regret bounds and sublinear dynamic regret under gap regularity conditions.

OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.

Offline Local Search for Online Stochastic Bandits

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

A generic conversion turns offline local search algorithms into online stochastic combinatorial bandit algorithms with O(log^3 T) approximate regret.

Anytime-valid Optimal Policy Identification

stat.ME · 2026-06-16 · unverdicted · novelty 6.0

Constructs a time-indexed set S_t retaining the true optimal policy uniformly over time with high probability, enabling early stopping with sample complexity O((log |Π| + log log(1/Δ_min))/Δ_min²) when the optimum is unique.

Pessimistic Risk-Aware Policy Learning in Contextual Bandits

stat.ML · 2026-05-15 · unverdicted · novelty 6.0

A distributional framework for optimizing Lipschitz risk functionals in offline contextual bandits yields data-dependent suboptimality bounds of Õ(1/√n) that match risk-neutral rates and are minimax optimal.

Feedback-Normalized Developer Memory for Reinforcement-Learning Coding Agents: A Safety-Gated MCP Architecture

cs.SE · 2026-05-02 · unverdicted · novelty 6.0

RL Developer Memory is a feedback-normalized, safety-gated memory architecture for RL coding agents that logs contextual decisions and applies conservative off-policy gates to maintain 80% decision accuracy and full hard-negative suppression on a 200-case benchmark.

PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.

Calibrated Model-Based Deep Reinforcement Learning

cs.LG · 2019-06-19 · unverdicted · novelty 5.0

Augmenting model-based RL agents with calibrated predictive uncertainties improves planning, sample efficiency, and exploration on continuous control tasks.

Contextual Scalarisation Thompson Sampling for multi-objective decisions in public media

cs.IR · 2026-05-29 · unverdicted · novelty 4.0

CSTS learns context-dependent weights for multiple objectives in a multi-objective contextual bandit and outperforms fixed-weight and standard contextual bandit baselines on Swiss public broadcaster programming data.

The EDGE Language: Extended General Einsums for Graph Algorithms

cs.DS · 2024-04-17

citing papers explorer

Showing 1 of 1 citing paper after filters.

OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents cs.AI · 2026-05-11 · unverdicted · none · ref 21
OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.

In: Proceedings of the 19th international conference on World Wide Web

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer