High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

Alexander Gasnikov; Eduard Gorbunov; Innokentiy Shibaev; Marina Danilova; Pavel Dvurechensky

arxiv: 2106.05958 · v3 · pith:ZSLT5S2Lnew · submitted 2021-06-10 · 🧮 math.OC · cs.LG

High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

Eduard Gorbunov , Marina Danilova , Innokentiy Shibaev , Pavel Dvurechensky , Alexander Gasnikov This is my paper

classification 🧮 math.OC cs.LG

keywords stochasticmethodsnon-smoothcomplexityconvexnoiseobjectiveoptimization

0 comments

read the original abstract

Stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth stochastic convex optimization have complexity bounds with the dependence on the confidence level that is either negative-power or logarithmic but under an additional assumption of sub-Gaussian (light-tailed) noise distribution that may not hold in practice. In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise. To derive our results, we propose novel stepsize rules for two stochastic methods with gradient clipping. Moreover, our analysis works for generalized smooth objectives with H\"older-continuous gradients, and for both methods, we provide an extension for strongly convex problems. Finally, our results imply that the first (accelerated) method we consider also has optimal iteration and oracle complexity in all the regimes, and the second one is optimal in the non-smooth setting.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

High-Probability Last-Iterate Guarantees for Two-Point Gaussian Zeroth-Order Stochastic Gradient Descent
math.OC 2026-06 unverdicted novelty 7.0

Direct high-probability last-iterate guarantee of Õ(d/T) for same-sample two-point Gaussian ZO-SGD under conditional exponential-moment noise when d ≥ 16 log(6T/δ).
Stochastic Optimization and Data Science
math.OC 2026-05 unverdicted novelty 2.0

The paper motivates stochastic optimization problems from statistical perspectives and describes offline and online approaches to solve expectation minimization problems.