Direct high-probability last-iterate guarantee of Õ(d/T) for same-sample two-point Gaussian ZO-SGD under conditional exponential-moment noise when d ≥ 16 log(6T/δ).
Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
We consider the problem of Online Convex Optimization (OCO) with two-point bandit feedback. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions, while only observing the value of each function at two points. While it is well-known that two-point feedback allows for gradient estimation, achieving tight high-probability regret bounds for strongly convex functions still remained open as highlighted by \citet{agarwal2010optimal}. The primary challenge lies in the heavy-tailed nature of bandit gradient estimators, which makes standard concentration analysis difficult. In this paper, we resolve this open challenge and provide the first high-probability regret bound of $O(d(\log T + \log(1/\delta))/\mu)$ for $\mu$-strongly convex losses. Our result is minimax optimal with respect to both the time horizon $T$ and the dimension $d$.
fields
math.OC 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Establishes high-probability bounds for zeroth-order GD showing logarithmic dependence on failure probability δ in deterministic case and specific query complexity in stochastic case.
citing papers explorer
-
High-Probability Last-Iterate Guarantees for Two-Point Gaussian Zeroth-Order Stochastic Gradient Descent
Direct high-probability last-iterate guarantee of Õ(d/T) for same-sample two-point Gaussian ZO-SGD under conditional exponential-moment noise when d ≥ 16 log(6T/δ).
-
High-Probability Guarantees for Random Zeroth-Order (Stochastic) Gradient Descent
Establishes high-probability bounds for zeroth-order GD showing logarithmic dependence on failure probability δ in deterministic case and specific query complexity in stochastic case.