On Convergence of Incremental Gradient for Non-Convex Smooth Functions

Anastasia Koloskova; Martin Jaggi; Nikita Doikov; Sebastian U. Stich

arxiv: 2305.19259 · v4 · pith:4FKSQ6LCnew · submitted 2023-05-30 · 💻 cs.LG · math.OC · stat.ML

classification 💻 cs.LG math.OC stat.ML

keywords convergence, functions, gradient, incremental, non-convex, optimization, smooth, varepsilon



In machine learning and neural network optimization, algorithms like incremental gradient and shuffle SGD are popular because they minimize the number of cache misses and show good practical convergence behavior. However, their theoretical optimization properties, especially for non-convex smooth functions, remain incompletely explored. This paper studies the convergence properties of SGD algorithms with arbitrary data ordering, within a broad framework for non-convex smooth functions. Our findings show enhanced convergence guarantees for incremental gradient and single shuffle SGD. In particular, if $n$ is the training set size, we improve the optimization term of the convergence guarantee for reaching accuracy $\varepsilon$ by a factor of $n$, from $O(n / \varepsilon)$ to $O(1 / \varepsilon)$.
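To make the two data orderings named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical one-dimensional toy problem with components $f_i(x) = 1 - \cos(x - a_i)$ (smooth and non-convex); the data `A`, the function names, the step size, and the epoch count are all illustrative choices that do not come from the paper.

```python
import math
import random

# Hypothetical toy data: component i is f_i(x) = 1 - cos(x - A[i]),
# which is smooth and non-convex. Not taken from the paper.
A = [-0.3, 0.1, 0.4, 0.0]


def component_grad(x, i):
    """Gradient of the i-th component f_i(x) = 1 - cos(x - A[i])."""
    return math.sin(x - A[i])


def full_grad(x):
    """Gradient of the average objective (1/n) * sum_i f_i(x)."""
    return sum(component_grad(x, i) for i in range(len(A))) / len(A)


def make_order_fn(n, scheme, seed=0):
    """Return a function that yields the visiting order for one epoch.

    "incremental":    the fixed cyclic order 0, 1, ..., n-1 every epoch.
    "single_shuffle": one random permutation, drawn once and reused.
    """
    if scheme not in ("incremental", "single_shuffle"):
        raise ValueError(f"unknown scheme: {scheme}")
    perm = list(range(n))
    if scheme == "single_shuffle":
        random.Random(seed).shuffle(perm)

    def order():
        return list(perm)  # copy, so callers cannot mutate the stored order

    return order


def run_sgd(x0, step, epochs, scheme, seed=0):
    """Run one component-gradient step per data point, epoch by epoch."""
    order = make_order_fn(len(A), scheme, seed)
    x = x0
    for _ in range(epochs):
        for i in order():
            x -= step * component_grad(x, i)
    return x


if __name__ == "__main__":
    for scheme in ("incremental", "single_shuffle"):
        x = run_sgd(x0=1.0, step=0.05, epochs=200, scheme=scheme)
        print(scheme, "final |full gradient| =", abs(full_grad(x)))
```

Both schemes visit every data point exactly once per epoch and differ only in the permutation used, which is the sense in which the abstract speaks of SGD with a given data ordering. With a constant step size, the iterate is only expected to settle near a stationary point rather than exactly at one.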



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity
cs.LG 2026-05 unverdicted novelty 6.0

Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and ...