Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition
We consider optimizing a smooth convex function $f$ that is the average of a set of differentiable functions $f_i$, under the assumption considered by Solodov [1998] and Tseng [1998] that the norm of each gradient $f_i'$ is bounded by a linear function of the norm of the average gradient $f'$. We show that under these assumptions the basic stochastic gradient method with a sufficiently small constant step-size has an $O(1/k)$ convergence rate, and has a linear convergence rate if $f$ is strongly convex.
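As a rough sketch in our own notation (not taken from the paper), and assuming the homogeneous form of the bound with a growth constant $\rho \ge 1$ and a constant step-size $\eta > 0$, the assumption and the method read approximately as

$$\|f_i'(x)\| \le \rho\,\|f'(x)\| \quad \text{for all } i \text{ and } x, \qquad f(x) = \frac{1}{n}\sum_{j=1}^{n} f_j(x),$$

$$x_{k+1} = x_k - \eta\, f_{i_k}'(x_k), \qquad i_k \text{ drawn uniformly at random from } \{1,\dots,n\}.$$

In this form the growth condition forces every $f_i'$ to vanish wherever the average gradient $f'$ vanishes, so the stochastic gradient noise disappears at a minimizer; this is, roughly, what allows a small constant step-size to recover deterministic-style $O(1/k)$ and linear rates.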
This paper has not been read by Pith yet.
Forward citations
Cited by 6 Pith papers
- Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits
  Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.
- Convergence guarantees for stochastic algorithms solving non-unique problems in metric spaces
  A quantitative theorem supplies uniform rates of convergence for stochastic quasi-Fejér monotone sequences in metric spaces by extending a deterministic regularity notion to the stochastic setting and applying it to p...
- Stochastic Trust-Region Methods for Over-parameterized Models
  Stochastic trust-region methods achieve O(ε^{-2} log(1/ε)) complexity for unconstrained problems and O(ε^{-4} log(1/ε)) for equality-constrained problems under the strong growth condition, with experiments showing sta...
- Shuffling the Data, Stretching the Step-size: Sharper Bias in constant step-size SGD
  Combining random reshuffling and Richardson-Romberg extrapolation yields cubic bias refinement and better MSE for constant-step SGD on structured non-monotone variational inequalities.
- VISTA: Decentralized Machine Learning in Adversary Dominated Environments
  VISTA adaptively tunes consistency thresholds in decentralized SGD so that the system converges asymptotically like standard SGD even when adversaries dominate the worker pool.
- Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction
  Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.
Discussion (0)