Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback

· 2026 · cs.LG · arXiv 2603.25029

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

We consider the problem of Online Convex Optimization (OCO) with two-point bandit feedback. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions, while only observing the value of each function at two points. While it is well-known that two-point feedback allows for gradient estimation, achieving tight high-probability regret bounds for strongly convex functions still remained open as highlighted by \citet{agarwal2010optimal}. The primary challenge lies in the heavy-tailed nature of bandit gradient estimators, which makes standard concentration analysis difficult. In this paper, we resolve this open challenge and provide the first high-probability regret bound of $O(d(\log T + \log(1/\delta))/\mu)$ for $\mu$-strongly convex losses. Our result is minimax optimal with respect to both the time horizon $T$ and the dimension $d$.

representative citing papers

High-Probability Last-Iterate Guarantees for Two-Point Gaussian Zeroth-Order Stochastic Gradient Descent

math.OC · 2026-06-18 · unverdicted · novelty 7.0

Direct high-probability last-iterate guarantee of Õ(d/T) for same-sample two-point Gaussian ZO-SGD under conditional exponential-moment noise when d ≥ 16 log(6T/δ).

High-Probability Guarantees for Random Zeroth-Order (Stochastic) Gradient Descent

math.OC · 2026-04-26 · unverdicted · novelty 7.0 · 2 refs

Establishes high-probability bounds for zeroth-order GD showing logarithmic dependence on failure probability δ in deterministic case and specific query complexity in stochastic case.

citing papers explorer

Showing 2 of 2 citing papers after filters.

High-Probability Last-Iterate Guarantees for Two-Point Gaussian Zeroth-Order Stochastic Gradient Descent math.OC · 2026-06-18 · unverdicted · none · ref 21 · internal anchor
Direct high-probability last-iterate guarantee of Õ(d/T) for same-sample two-point Gaussian ZO-SGD under conditional exponential-moment noise when d ≥ 16 log(6T/δ).
High-Probability Guarantees for Random Zeroth-Order (Stochastic) Gradient Descent math.OC · 2026-04-26 · unverdicted · none · ref 17 · 2 links · internal anchor
Establishes high-probability bounds for zeroth-order GD showing logarithmic dependence on failure probability δ in deterministic case and specific query complexity in stochastic case.

Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback

fields

years

verdicts

representative citing papers

citing papers explorer