Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition
We consider optimizing a smooth convex function $f$ that is the average of a set of differentiable functions $f_i$, under the assumption considered by Solodov [1998] and Tseng [1998] that the norm of each gradient $f_i'$ is bounded by a linear function of the norm of the average gradient $f'$. We show that under these assumptions the basic stochastic gradient method with a sufficiently small constant step-size has an $O(1/k)$ convergence rate, and has a linear convergence rate if $f$ is strongly convex.
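As a rough sketch in our own notation (not taken from the paper), and assuming the homogeneous form of the bound with a growth constant $\rho \ge 1$ and a constant step-size $\eta > 0$, the assumption and the method read approximately as

$$\|f_i'(x)\| \le \rho\,\|f'(x)\| \quad \text{for all } i \text{ and } x, \qquad f(x) = \frac{1}{n}\sum_{j=1}^{n} f_j(x),$$

$$x_{k+1} = x_k - \eta\, f_{i_k}'(x_k), \qquad i_k \text{ drawn uniformly at random from } \{1,\dots,n\}.$$

In this form the growth condition forces every $f_i'$ to vanish wherever the average gradient $f'$ vanishes, so the stochastic gradient noise disappears at a minimizer; this is, roughly, what allows a small constant step-size to recover deterministic-style $O(1/k)$ and linear rates.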
This paper has not been read by Pith yet.
Forward citations
Cited by 6 Pith papers
- Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits
  Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.
- Convergence guarantees for stochastic algorithms solving non-unique problems in metric spaces
  A quantitative theorem supplies uniform rates of convergence for stochastic quasi-Fejér monotone sequences in metric spaces by extending a deterministic regularity notion to the stochastic setting and applying it to p...
- Stochastic Trust-Region Methods for Over-parameterized Models
  Stochastic trust-region methods achieve O(ε^{-2} log(1/ε)) complexity for unconstrained problems and O(ε^{-4} log(1/ε)) for equality-constrained problems under the strong growth condition, with experiments showing sta...
- Shuffling the Data, Stretching the Step-size: Sharper Bias in constant step-size SGD
  Combining random reshuffling and Richardson-Romberg extrapolation yields cubic bias refinement and better MSE for constant-step SGD on structured non-monotone variational inequalities.
- VISTA: Decentralized Machine Learning in Adversary Dominated Environments
  VISTA adaptively tunes consistency thresholds in decentralized SGD so that the system converges asymptotically like standard SGD even when adversaries dominate the worker pool.
- Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction
  Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.
Discussion (0)