Power of Generalized Smoothness in Stochastic Convex Optimization: First- and Zero-Order Algorithms

Aleksandr Lobanov; Alexander Gasnikov

arxiv: 2501.18198 · v2 · pith:YX3XXP7Nnew · submitted 2025-01-30 · 🧮 math.OC

keywords: gradient · stochastic · convex · descent · frac · smoothness · varepsilon · algorithms

This paper studies stochastic optimization problems under the generalized smoothness assumption. Considering an unbiased gradient oracle in Stochastic Gradient Descent, we provide strategies under which the convergence bounds contain terms describing a linear rate. In particular, in the case $L_0 = 0$, we obtain the following iteration complexities in the convex setup: $N = \mathcal{O}\left(L_1R \log\frac{1}{\varepsilon} + \frac{L_1 c R^2}{\varepsilon}\right)$ for Clipped Stochastic Gradient Descent and $N = \mathcal{O}\left(L_1R \log\frac{1}{\varepsilon}\right)$ for Normalized Stochastic Gradient Descent. Furthermore, we generalize the convergence results to the case of a biased gradient oracle and show that the power of $(L_0,L_1)$-smoothness extends to zero-order algorithms. Finally, we demonstrate through numerical experiments the possibility of linear convergence in the convex setup, a phenomenon that has aroused some interest in the machine learning community.
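To make the two update rules named in the abstract concrete, here is a minimal Python sketch of one clipped step and one normalized step, run with a noisy gradient oracle. It is an illustration under stated assumptions, not the authors' implementation: the objective `f` (a sum of `cosh` terms, chosen because its curvature grows with its gradient), the noise level `sigma`, the step sizes, and the clipping threshold are all hypothetical choices, and the paper's actual step-size strategies are not reproduced here.

```python
import math
import random


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def clipped_step(x, g, lr, clip):
    """One clipped step: rescale g so its norm does not exceed `clip`."""
    g_norm = norm(g)
    scale = 1.0 if g_norm <= clip else clip / g_norm
    return [xi - lr * scale * gi for xi, gi in zip(x, g)]


def normalized_step(x, g, lr, eps=1e-12):
    """One normalized step: move a distance of about `lr` along -g."""
    g_norm = norm(g)
    return [xi - lr * gi / (g_norm + eps) for xi, gi in zip(x, g)]


def f(x):
    """Illustrative convex objective (an assumption, not from the paper)."""
    return sum(math.cosh(c) for c in x)


def stochastic_grad(x, rng, sigma):
    """Unbiased oracle: exact gradient of f plus zero-mean Gaussian noise."""
    return [math.sinh(c) + sigma * rng.gauss(0.0, 1.0) for c in x]


def run(step, x0, n_iters, sigma=0.0, seed=0, **kwargs):
    """Apply `step` for n_iters iterations, starting from x0."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_iters):
        g = stochastic_grad(x, rng, sigma)
        x = step(x, g, **kwargs)
    return x


x0 = [3.0, -2.0]
x_clip = run(clipped_step, x0, 200, sigma=0.01, lr=0.5, clip=1.0)
x_norm = run(normalized_step, x0, 200, sigma=0.01, lr=0.05)
print(f(x0), f(x_clip), f(x_norm))
```

With a constant step size, the normalized variant always moves a fixed distance per iteration, so in this sketch it can only settle into a neighborhood of the minimizer whose radius is on the order of `lr`. The clipped variant behaves like plain SGD once the gradient norm falls below `clip`.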

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Avoiding Bias in Clipped SGD for Overparameterized Models under Generalized Smoothness
math.OC · 2026-05 · unverdicted · novelty 7.0

Clipped and normalized SGD converge without bias in overparameterized interpolating models under $(L_0,L_1)$-smoothness, with improved rates and extensions to heavy-tailed noise and weaker smoothness.