Stochastic Optimization and Data Science

arxiv: 2605.16875 · v1 · pith:Z23LILDF · submitted 2026-05-16 · 🧮 math.OC

Arutyun Avetisyan, Darina Dvinskikh, Alexander Gasnikov, Vladimir Temlyakov, Nazarii Tupitsa, Denis Turdakov

Pith reviewed 2026-05-19 21:04 UTC · model grok-4.3

classification 🧮 math.OC

keywords stochastic optimization · statistical learning · sample average approximation · stochastic approximation · population risk · log-likelihood · expectation minimization

The pith

Stochastic optimization problems arise when maximizing log-likelihood or minimizing population risk in statistical estimation and learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper connects stochastic optimization directly to statistical estimation and statistical learning tasks by framing them as problems of minimizing an expectation. In this view the objective is either to maximize a log-likelihood based on observed data (equivalently, to minimize the expected negative log-likelihood) or to minimize a population risk that averages a loss over an unknown distribution. It presents two standard routes to these problems: offline methods that draw many samples in advance and replace the expectation with a sample average, and online methods that update a candidate solution after each new observation. This framing explains why random sampling and sequential updates appear naturally in data-driven model fitting and why the same algorithmic ideas serve both optimization and statistics.

Core claim

Stochastic optimization problems are motivated from a statistical perspective and a statistical learning perspective, where the goal is to maximize the log-likelihood or minimize the population risk. The paper briefly describes the two main approaches to solve the resulting expectation minimization problems: the offline approach using Monte Carlo simulation or Sample Average Approximation and the online approach using Stochastic Approximation.

What carries the argument

Expectation minimization problems obtained from log-likelihood maximization or population-risk minimization, solved by either offline Monte Carlo/SAA averaging or online stochastic approximation updates.
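In symbols, the framing above can be sketched as follows. The notation is an assumption made here for illustration and is not taken from this page: $x$ is the decision variable or model parameter, $\xi$ a random observation, $f$ a per-observation loss, $p(\cdot\,; x)$ a parametric density, $\ell$ a loss function, $\gamma_k$ a step size, and $\Pi_X$ the projection onto the feasible set $X$.

```latex
\begin{align*}
  &\text{Expectation minimization:}   && \min_{x \in X} F(x) := \mathbb{E}_{\xi}\bigl[f(x,\xi)\bigr], \\
  &\text{Log-likelihood maximization:} && f(x,\xi) = -\log p(\xi; x), \\
  &\text{Population risk minimization:} && f(x,\xi) = \ell(x;\xi), \\
  &\text{Offline (SAA):}              && \min_{x \in X} \frac{1}{N}\sum_{i=1}^{N} f(x,\xi_i), \\
  &\text{Online (SA):}                && x_{k+1} = \Pi_X\bigl(x_k - \gamma_k \nabla_x f(x_k,\xi_k)\bigr).
\end{align*}
```

Maximizing the expected log-likelihood is the same problem as minimizing the expected negative log-likelihood, which is why both statistical goals fit the single template in the first line.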

If this is right

Statistical parameter estimation can be recast as an expectation-minimization task.
Offline methods replace the unknown expectation by a finite-sample average computed from Monte Carlo draws.
Online methods produce a sequence of iterates that, under suitable step sizes and regularity conditions, converges to the solution using one fresh observation at each step.
Both routes therefore inherit convergence guarantees and complexity bounds from stochastic optimization theory.
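The two routes listed above can be illustrated on a toy problem. The sketch below is not from the paper: it assumes the loss 0.5 * (x - xi)**2, whose expectation is minimized by the mean of xi, a Gaussian data source, and the step-size schedule 1/k. The function names `saa_mean` and `sa_mean` are hypothetical.

```python
import random


def saa_mean(samples):
    """Offline route: minimize the sample average of 0.5 * (x - xi)**2.

    For this particular loss the minimizer is the sample mean.
    """
    return sum(samples) / len(samples)


def sa_mean(stream, x0=0.0):
    """Online route: one stochastic-gradient step per fresh observation."""
    x = x0
    for k, xi in enumerate(stream, start=1):
        gamma = 1.0 / k        # assumed step-size schedule
        x -= gamma * (x - xi)  # (x - xi) is the gradient of 0.5 * (x - xi)**2
    return x


rng = random.Random(0)         # fixed seed so the run is reproducible
true_mean = 3.0                # assumed ground truth for the toy problem
data = [rng.gauss(true_mean, 1.0) for _ in range(10_000)]

x_saa = saa_mean(data)         # uses the whole sample at once
x_sa = sa_mean(iter(data))     # sees the observations one at a time
```

With the 1/k schedule the online iterate after k steps equals the running average of the first k observations, so on this toy problem the two routes agree up to floating-point error. For general losses they differ, and the choice of step size matters.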

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same framing suggests that statistical efficiency concepts such as asymptotic variance could be used to compare different stochastic optimization algorithms.
Hybrid schemes that switch between batch averaging and sequential updates may be worth exploring for very large data sets.
The perspective naturally extends to settings where the loss function itself is defined by a statistical model, such as empirical risk minimization with regularization.

Load-bearing premise

The two described approaches, offline Monte Carlo or Sample Average Approximation and online Stochastic Approximation, are the primary or sufficient ways to solve expectation minimization problems that arise in statistical settings.

What would settle it

A concrete statistical estimation task whose optimum cannot be recovered to arbitrary accuracy by either drawing a large fixed sample and averaging or by processing observations one at a time would falsify the claim that these two routes cover the relevant problems.

Original abstract

This paper aims to motivate stochastic optimization problems from a statistical perspective and a statistical learning perspective, where the goal is to maximize the log-likelihood or minimize the population risk. We briefly describe the two main approaches: offline (Monte Carlo / Sample Average Approximation) and online (Stochastic Approximation) approaches -- to solve the expectation minimization problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a short motivational note restating standard links between stochastic optimization and stats/ML estimation, with no new methods or results.

The main takeaway is that this paper is an expository note rather than a research contribution. It motivates expectation minimization problems by connecting them to maximum likelihood estimation and population risk minimization, then outlines the usual offline route via Monte Carlo or sample average approximation and the online route via stochastic approximation. The framing is straightforward and could serve as a quick entry point for someone moving between optimization and statistics.

Referee Report

1 major / 1 minor

Summary. The manuscript motivates stochastic optimization problems from a statistical perspective and a statistical learning perspective. It states that the goal is to maximize the log-likelihood or minimize the population risk and briefly describes the two main approaches—offline (Monte Carlo / Sample Average Approximation) and online (Stochastic Approximation)—for solving the resulting expectation minimization problems.

Significance. If the high-level framing is accurate, the paper offers a concise motivational overview connecting stochastic optimization to statistical estimation and learning tasks. However, it advances no new results, derivations, theorems, error bounds, or comparisons, instead recalling standard distinctions already established in the literature. Its significance for a research journal in mathematical optimization is therefore limited; it functions more as an expository note than a substantive contribution.

major comments (1)

[Abstract and full text] The central motivation relies entirely on well-known facts about offline Monte Carlo/SAA and online SA methods without any derivations, proofs, or quantitative statements that could be checked. This absence of technical content means the manuscript does not contain load-bearing claims that advance the field beyond existing terminology.

minor comments (1)

The manuscript is extremely brief; adding concrete examples, key references to foundational works (e.g., on SAA or Robbins-Monro), or a short discussion of convergence properties would improve clarity for readers.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and constructive feedback on our manuscript. We acknowledge that the work is expository in nature and does not introduce new technical results.

Referee: [Abstract and full text] The central motivation relies entirely on well-known facts about offline Monte Carlo/SAA and online SA methods without any derivations, proofs, or quantitative statements that could be checked. This absence of technical content means the manuscript does not contain load-bearing claims that advance the field beyond existing terminology.

Authors: We agree that the manuscript contains no new derivations, proofs, error bounds, or quantitative comparisons. Its stated aim is to motivate stochastic optimization problems from statistical and statistical learning perspectives by recalling the standard distinction between offline (Monte Carlo/SAA) and online (stochastic approximation) approaches to expectation minimization. We believe a concise, high-level framing of this connection can still be useful to readers, particularly those entering the area from statistics or data science, even though it does not advance the technical literature.

revision: no

Circularity Check

0 steps flagged

No significant circularity

The paper is a short motivational overview that recalls standard distinctions between offline (Monte Carlo/SAA) and online (stochastic approximation) methods for expectation minimization in statistical settings. No derivations, equations, fitted parameters, or load-bearing claims are advanced; the text simply names well-known approaches already present in the literature. No step reduces by construction to its own inputs or to a self-citation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an expository paper with no new mathematical claims, so the ledger contains no free parameters, axioms, or invented entities beyond standard background assumptions in optimization and statistics.

pith-pipeline@v0.9.0 · 5583 in / 956 out tokens · 30867 ms · 2026-05-19T21:04:14.358367+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

We briefly describe the two main approaches: offline (Monte Carlo / Sample Average Approximation) and online (Stochastic Approximation) approaches – to solve the expectation minimization problems.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Theorem 5 … N = O(M_p² R_p² / ε² (n log(M_p R_p / ε) + log(1/β)))
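To get a feel for how the quoted bound scales, the expression inside the O(·) can be evaluated numerically. This is only a sketch: the page does not define the symbols, so the code assumes M_p is a Lipschitz-type constant, R_p a domain radius, n the dimension, ε the target accuracy, and β a failure probability. The hidden constant is omitted, so the returned value is a scale and not an actual sample count. The function name `sample_size_scale` is hypothetical.

```python
import math


def sample_size_scale(M, R, eps, n, beta):
    """Evaluate M^2 R^2 / eps^2 * (n * log(M R / eps) + log(1 / beta)).

    The constant hidden in the O(.) notation is NOT included, and the
    meaning of each argument is an assumption (see the text above).
    """
    if eps <= 0 or not (0 < beta < 1) or M * R <= eps:
        raise ValueError("need eps > 0, 0 < beta < 1 and M * R > eps")
    ratio = M * R / eps
    return ratio ** 2 * (n * math.log(ratio) + math.log(1.0 / beta))


# Halving the accuracy target multiplies the scale by slightly more than 4,
# because of the logarithmic factor.
coarse = sample_size_scale(M=1.0, R=1.0, eps=0.1, n=10, beta=0.01)
fine = sample_size_scale(M=1.0, R=1.0, eps=0.05, n=10, beta=0.01)
```

The dependence on β is only logarithmic, so demanding a much smaller failure probability changes the scale far less than tightening ε does.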

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.



A fast iterative shrinkage-thresholding algorithm for linear inverse problems , author=. SIAM journal on imaging sciences , volume=. 2009 , publisher=

work page 2009

[39] [39]

Mathematical programming , volume=

Gradient methods for minimizing composite functions , author=. Mathematical programming , volume=. 2013 , publisher=

work page 2013

[40] [40]

Avtomatika i telemekhanika , number=

Adaptive estimation algorithms: convergence, optimality, stability , author=. Avtomatika i telemekhanika , number=. 1979 , publisher=

work page 1979

[41] [41]

arXiv preprint arXiv:2206.08627 , year=

RECAPP: Crafting a More Efficient Catalyst for Convex Optimization , author=. arXiv preprint arXiv:2206.08627 , year=

work page arXiv

[42] [42]

Avtomatika i Telemekhanika , number=

Optimal pseudogradient adaptation algorithms , author=. Avtomatika i Telemekhanika , number=. 1980 , publisher=

work page 1980

[43] [43]

1984 , publisher=

Recurrent estimation and adaptive filtration , author=. 1984 , publisher=

work page 1984

[44] [44]

Computational Mathematics and Mathematical Physics , volume=

Universal method for stochastic composite optimization problems , author=. Computational Mathematics and Mathematical Physics , volume=. 2018 , publisher=

work page 2018

[45] [45]

Conference on learning theory , pages=

A universal algorithm for variational inequalities adaptive to smoothness and noise , author=. Conference on learning theory , pages=. 2019 , organization=

work page 2019

[46] [46]

Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation , volume=

Belkin, Mikhail , year=. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation , volume=. doi:10.1017/S0962492921000039 , journal=

work page doi:10.1017/s0962492921000039

[47] [47]

Proceedings of The 33rd International Conference on Machine Learning , pages =

Generalization Properties and Implicit Regularization for Multiple Passes SGM , author =. Proceedings of The 33rd International Conference on Machine Learning , pages =. 2016 , editor =

work page 2016

[48] [48]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[49] [49]

2020 , publisher=

Deep learning with PyTorch , author=. 2020 , publisher=

work page 2020

[50] [50]

nature , volume=

Deep learning , author=. nature , volume=. 2015 , publisher=

work page 2015

[51] [51]

On the Convergence of Adam and Beyond

On the convergence of adam and beyond , author=. arXiv preprint arXiv:1904.09237 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1904

[52] [52]

, author=

Adaptive subgradient methods for online learning and stochastic optimization. , author=. Journal of machine learning research , volume=

work page

[53] [53]

arXiv preprint arXiv:2103.08280 , year=

Lower complexity bounds of finite-sum optimization problems: The results and construction , author=. arXiv preprint arXiv:2103.08280 , year=

work page arXiv

[54] [54]

Advances in Neural Information Processing Systems , volume=

Optimal black-box reductions between optimization objectives , author=. Advances in Neural Information Processing Systems , volume=

work page

[55] [55]

Mathematical Programming , volume=

Universal gradient methods for convex optimization problems , author=. Mathematical Programming , volume=. 2015 , publisher=

work page 2015

[56] [56]

Advances in neural information processing systems , volume=

Tight complexity bounds for optimizing composite objectives , author=. Advances in neural information processing systems , volume=

work page

[57] [57]

Nature Singapore: Springer , year=

Accelerated optimization for machine learning , author=. Nature Singapore: Springer , year=

work page

[58] [58]

arXiv preprint arXiv:1907.04232 , year=

Unified optimal analysis of the (stochastic) gradient method , author=. arXiv preprint arXiv:1907.04232 , year=

work page arXiv 1907

[59] [59]

Advances in Neural Information Processing Systems , volume=

Stochastic optimization with heavy-tailed noise via accelerated gradient clipping , author=. Advances in Neural Information Processing Systems , volume=

work page

[60] [60]

Journal of Optimization Theory and Applications , volume=

Stochastic intermediate gradient method for convex problems with stochastic inexact oracle , author=. Journal of Optimization Theory and Applications , volume=. 2016 , publisher=

work page 2016

[61] [61]

arXiv preprint arXiv:2206.00090 , year=

Decentralized Saddle-Point Problems with Different Constants of Strong Convexity and Strong Concavity , author=. arXiv preprint arXiv:2206.00090 , year=

work page arXiv

[62] [62]

Advances in neural information processing systems , volume=

Graph oracle models, lower bounds, and gaps for parallel stochastic optimization , author=. Advances in neural information processing systems , volume=

work page

[63] [63]

optimization software , author=

Introduction to optimization. optimization software , author=. Inc., Publications Division, New York , volume=

work page

[64] [64]

Mathematical Programming , volume=

Smooth strongly convex interpolation and exact worst-case performance of first-order methods , author=. Mathematical Programming , volume=. 2017 , publisher=

work page 2017

[65] [65]

Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems , journal =

Darina Dvinskikh and Alexander Gasnikov , doi =. Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems , journal =. 2021 , pages =

work page 2021

[66] [66]

Accelerated and nonaccelerated stochastic gradient descent with model conception , author=. Math. Notes , volume=. 2020 , publisher=

work page 2020

[67] [67]

Mathematical Programming , volume=

Linear convergence of first order methods for non-strongly convex optimization , author=. Mathematical Programming , volume=. 2019 , publisher=

work page 2019

[68] [68]

Joint European conference on machine learning and knowledge discovery in databases , pages=

Linear convergence of gradient and proximal-gradient methods under the Polyak--Lojasiewicz condition , author=. Joint European conference on machine learning and knowledge discovery in databases , pages=. 2016 , organization=

work page 2016

[69] [69]

2014 , publisher=

Understanding machine learning: From theory to algorithms , author=. 2014 , publisher=

work page 2014

[70] [70]

2018 , publisher=

Lectures on convex optimization , author=. 2018 , publisher=

work page 2018

[71] [71]

Advances in neural information processing systems , volume=

Non-asymptotic analysis of stochastic approximation algorithms for machine learning , author=. Advances in neural information processing systems , volume=

work page

[72] [72]

e-print,

Learning Theory from First Principles , author=. e-print,

work page

[73] [73]

Finite sample theory , author=

Parametric estimation. Finite sample theory , author=. The Annals of Statistics , volume=. 2012 , publisher=

work page 2012

[74] [74]

2013 , publisher=

Statistical estimation: asymptotic theory , author=. 2013 , publisher=

work page 2013

[75] [75]

Learning From An Optimization Viewpoint

Learning from an optimization viewpoint , author=. arXiv preprint arXiv:1204.4145 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[76] [76]

2016 , publisher=

Deep learning , author=. 2016 , publisher=

work page 2016

[77] [77]

arXiv preprint arXiv:1909.03550 , year=

Lecture notes: Optimization for machine learning , author=. arXiv preprint arXiv:1909.03550 , year=

work page arXiv 1909

[78] [78]

e-print,

Statistical Learning and Sequential Prediction , author=. e-print,

work page

[79] [79]

Foundations and Trends

Introduction to online convex optimization , author=. Foundations and Trends. 2016 , publisher=

work page 2016

[80] [80]

The mathematics of data , volume=

Introductory lectures on stochastic optimization , author=. The mathematics of data , volume=

work page