Power of Generalized Smoothness in Stochastic Convex Optimization: First- and Zero-Order Algorithms
read the original abstract
This paper is devoted to the study of stochastic optimization problems under the generalized smoothness assumption. By considering the unbiased gradient oracle in Stochastic Gradient Descent, we provide strategies to achieve in bounds the summands describing linear rate. In particular, in the case $L_0 = 0$, we obtain in the convex setup the iteration complexity: $N = \mathcal{O}\left(L_1R \log\frac{1}{\varepsilon} + \frac{L_1 c R^2}{\varepsilon}\right)$ for Clipped Stochastic Gradient Descent and $N = \mathcal{O}\left(L_1R \log\frac{1}{\varepsilon}\right)$ for Normalized Stochastic Gradient Descent. Furthermore, we generalize the convergence results to the case with a biased gradient oracle, and show that the power of $(L_0,L_1)$-smoothness extends to zero-order algorithms. Finally, we demonstrate the possibility of linear convergence in the convex setup through numerical experimentation, which has aroused some interest in the machine learning community.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Avoiding Bias in Clipped SGD for Overparameterized Models under Generalized Smoothness
Clipped and normalized SGD converge without bias in overparameterized interpolating models under (L0,L1)-smoothness, with improved rates and extensions to heavy-tailed noise and weaker smoothness.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.