SEGA: Variance Reduction via Gradient Sketching

Filip Hanzely; Konstantin Mishchenko; Peter Richtarik

arxiv: 1809.03054 · v2 · pith:7TVDGN6Ynew · submitted 2018-09-09 · 🧮 math.OC · cs.LG



keywords: gradient, estimate, SEGA, coordinate, descent, used, convergence, linear


We propose SEGA (SkEtched GrAdient method), a randomized first-order optimization method that progressively builds, over its iterations, a variance-reduced estimate of the gradient from random linear measurements (sketches) of the gradient obtained from an oracle. In each iteration, SEGA updates the current estimate of the gradient through a sketch-and-project operation using the information provided by the latest sketch, and this updated estimate is then used to compute an unbiased estimate of the true gradient through a random relaxation procedure. The unbiased estimate is in turn used to perform a gradient step. Unlike standard subspace descent methods, such as coordinate descent, SEGA can be used for optimization problems with a non-separable proximal term. We provide a general convergence analysis and prove linear convergence for strongly convex objectives. In the special case of coordinate sketches, SEGA can be enhanced with various techniques such as importance sampling, minibatching and acceleration, and its rate is, up to a small constant factor, identical to the best-known rate of coordinate descent.
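The three steps named in the abstract (sketch-and-project update, unbiased estimate, gradient step) can be illustrated with a minimal Python sketch. It assumes the simplest setting only: uniform coordinate sketches, where the oracle returns one partial derivative per iteration, and no proximal term unless one is passed in. The concrete update formulas, the names `sega_estimates`, `sega`, `partial_grad` and `prox`, and the step size used in the example are illustrative assumptions and are not stated on this page.

```python
import numpy as np


def sega_estimates(h, i, partial, n):
    """One coordinate sketch: return (h_new, g).

    h       : current gradient estimate (length n)
    i       : sampled coordinate (assumed uniform over 0..n-1)
    partial : i-th partial derivative of f at the current point
    """
    delta = partial - h[i]
    h_new = h.copy()
    h_new[i] += delta      # sketch-and-project: entry i now equals `partial`
    g = h.copy()
    g[i] += n * delta      # scaled correction; averaging g over uniform i gives the gradient
    return h_new, g


def sega(partial_grad, x0, alpha, n_iters, prox=None, seed=0):
    """Run the coordinate-sketch iteration (illustrative, not the paper's code).

    partial_grad(x, i) -> i-th partial derivative of f at x (hypothetical oracle)
    prox(v, alpha)     -> proximal map of the regularizer; None means no regularizer
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    h = np.zeros(n)        # initial gradient estimate (assumption: start at zero)
    for _ in range(n_iters):
        i = int(rng.integers(n))
        h, g = sega_estimates(h, i, partial_grad(x, i), n)
        x = x - alpha * g  # gradient step with the unbiased estimate
        if prox is not None:
            x = prox(x, alpha)
    return x, h
```

For a strongly convex quadratic f(x) = 0.5 x^T A x - b^T x, the oracle would be `lambda x, i: A[i] @ x - b[i]`. A conservative step size on the order of 1/(10 n L), with L the largest eigenvalue of A, is used here as an assumption; the step sizes justified by the paper's analysis should be taken from the paper itself.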


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Adaptive directional gradients for parameterised quantum circuits
quant-ph 2026-06 unverdicted novelty 8.0

Forward gradient framework for PQCs unifies SPSA and parameter-shift as limits, introduces QUIVER adaptive optimizer with closed-form measurement allocation, and demonstrates efficient training of 60-qubit circuits on...