pith. machine review for the scientific record.

arxiv: 2604.13388 · v1 · submitted 2026-04-15 · 🧮 math.OC


Convergence of the Iterates of the Stochastic Proximal Gradient Method


Pith reviewed 2026-05-10 13:37 UTC · model grok-4.3

classification 🧮 math.OC
keywords stochastic proximal gradient method · almost sure convergence · convergence in the mean · convex optimization · stochastic optimization · classification problems · convex feasibility

The pith

The iterates of the stochastic proximal gradient method converge almost surely and in mean to a minimizer, without requiring any bound on the variance of the stochastic gradients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that for minimizing the sum of a smooth convex function and another convex function, the stochastic proximal gradient algorithm generates a sequence of points that converges to a solution both almost surely and in the mean. This holds under appropriate assumptions on the functions involved and the underlying stochastic process. Importantly, the proof does not rely on the random variables having bounded variance or the sequence being bounded in any way. This matters because it broadens the applicability of the method to noisy optimization problems where variance control is hard to enforce, such as in machine learning classification tasks and convex feasibility problems.

Core claim

We propose a novel study of the stochastic proximal gradient method for minimizing the sum of two convex functions, one of which is smooth. Under suitable assumptions and without requiring any boundedness or control of the variance of the random variables, we derive the almost sure convergence and the convergence in the mean of the iterates to a solution of the minimization problem. The results are applied to classification and convex feasibility problems.

What carries the argument

The stochastic proximal gradient iteration, which combines a stochastic gradient step on the smooth convex function with a proximal step on the nonsmooth convex function, analyzed for convergence without variance bounds on the stochastic terms.
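The iteration described above admits a compact sketch. The following is an illustrative Python implementation on a lasso instance (our choice of test problem, not the paper's): the smooth term f is a least-squares loss whose gradient is estimated from one randomly sampled row (unbiased, with no variance bound imposed), the nonsmooth term g is an ℓ1 penalty whose proximal map is soft-thresholding, and the step sizes γ_k ∝ 1/k are non-summable but square-summable, the schedule this kind of analysis relies on.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_proximal_gradient(A, b, lam, n_iter=20000, seed=0):
    """Minimize (1/2m)||Ax - b||^2 + lam * ||x||_1.

    Each step uses a single randomly sampled row i of A, so the gradient
    estimate A[i] * (A[i] @ x - b[i]) is unbiased for (1/m) A^T (Ax - b);
    its variance is never assumed bounded. Steps gamma_k ~ 1/k satisfy
    sum gamma_k = inf and sum gamma_k^2 < inf.
    """
    rng = np.random.default_rng(seed)
    m, d = A.shape
    x = np.zeros(d)
    for k in range(1, n_iter + 1):
        i = rng.integers(m)                      # sample one data point
        grad = A[i] * (A[i] @ x - b[i])          # unbiased stochastic gradient of f
        gamma = 1.0 / (k + 10.0)                 # square-summable, non-summable
        x = soft_threshold(x - gamma * grad, gamma * lam)  # proximal step on g
    return x
```

The specific decay `1/(k + 10)` and the lasso setup are assumptions for the sketch; the paper's own step-size conditions should be consulted for the exact admissible schedules.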

If this is right

  • The method provides convergence guarantees for classification problems.
  • The same guarantees apply to convex feasibility problems.
  • No variance reduction techniques or bounded noise assumptions are needed for the convergence results.
  • Both almost sure and mean convergence hold for the sequence of iterates.
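The convex feasibility bullet can be made concrete. Below is an illustrative sketch (our construction, not the paper's experiment) of that specialization: minimize f(x) = ½ E_i dist(x, C_i)² over halfspaces C_i, where the stochastic gradient of the sampled term is x − P_{C_i}(x) and g = 0, so the proximal step is the identity. With unit step this reduces to the classical random-projection method; the paper's analysis would use a decaying step schedule.

```python
import numpy as np

def project_halfspace(x, a, beta):
    """Euclidean projection onto the halfspace {z : a.z <= beta}."""
    viol = a @ x - beta
    if viol <= 0.0:
        return x
    return x - (viol / (a @ a)) * a

def random_projection_feasibility(halfspaces, x0, n_iter=5000, seed=1):
    """Stochastic proximal gradient specialized to convex feasibility.

    f(x) = 0.5 * E_i dist(x, C_i)^2 has per-sample gradient x - P_{C_i}(x),
    and g = 0 makes the proximal step trivial; with step size 1 each
    iteration simply projects onto a randomly chosen constraint set.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        a, beta = halfspaces[rng.integers(len(halfspaces))]
        x = project_halfspace(x, a, beta)  # gradient step with gamma = 1
    return x
```

The halfspace representation and unit step are assumptions made for the sketch; any closed convex sets with computable projections fit the same template.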

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach might extend to other stochastic first-order methods if analogous assumptions can be identified.
  • This removes a barrier for applying the method in settings with heavy-tailed or unbounded noise common in some data-driven problems.
  • Similar analysis could be tested on related algorithms like stochastic forward-backward splitting.

Load-bearing premise

Suitable assumptions exist on the two convex functions and the stochastic process that enable the convergence claims without any boundedness or variance control.

What would settle it

A concrete instance of convex functions and a stochastic process satisfying the paper's assumptions for which the generated sequence of iterates fails to converge, almost surely or in the mean, to a solution.

read the original abstract

We propose a novel study of the stochastic proximal gradient method for minimizing the sum of two convex functions, one of which is smooth. Under suitable assumptions and without requiring any boundedness or control of the variance of the random variables, we derive the almost sure convergence and the convergence in the mean of the iterates to a solution of the minimization problem. The results are applied to classification and convex feasibility problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 0 minor

Summary. The manuscript analyzes the stochastic proximal gradient method for solving convex optimization problems of the form min (f + g), where f is smooth convex and g is convex lower semicontinuous. Under the assumptions that f and g are proper convex lsc, ∇f is Lipschitz continuous, and the stochastic gradient oracle for f is unbiased (with no variance bound imposed), the authors prove that the iterates converge almost surely and in mean to a minimizer. The proof applies the Robbins-Siegmund lemma to a Lyapunov sequence controlled by the square-summable step sizes, and uses Fatou's lemma for mean convergence. The results are illustrated on classification and convex feasibility problems.
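For readers checking the proof route, the Robbins-Siegmund lemma invoked above can be stated as follows (standard form; the notation is ours, not the paper's):

```latex
Let $(\mathcal{F}_k)$ be a filtration and let $(V_k)$, $(a_k)$, $(b_k)$, $(c_k)$
be nonnegative $(\mathcal{F}_k)$-adapted sequences satisfying
\[
\mathbb{E}\!\left[V_{k+1} \mid \mathcal{F}_k\right]
\;\le\; (1 + a_k)\, V_k + b_k - c_k \quad \text{a.s.}
\]
If $\sum_k a_k < \infty$ and $\sum_k b_k < \infty$ almost surely, then $(V_k)$
converges almost surely to a finite random variable and $\sum_k c_k < \infty$
almost surely.
```

In analyses of this type, $V_k$ is typically the squared distance of the iterate to a minimizer, $b_k$ collects the step-size-weighted noise terms, and $c_k$ is an optimality-gap term whose summability drives convergence.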

Significance. If the claims hold, this work is significant because it establishes convergence results for the stochastic proximal gradient algorithm under weaker conditions than standard in the literature, specifically relaxing the common requirement of bounded variance for the stochastic oracle. This is valuable for practical applications where variance control may not hold. The proof strategy leverages classical tools (Robbins-Siegmund lemma, Fatou's lemma) in a clean manner, and the applications to classification and feasibility problems provide concrete context. The parameter-free nature of the convergence (relying only on step-size summability) is a strength.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The referee's summary correctly captures the key contributions, including the almost-sure and mean convergence results under the relaxed assumption of no variance bound on the stochastic oracle.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation applies the Robbins-Siegmund lemma to a Lyapunov sequence constructed from the proximal-gradient iterates, with the noise term bounded solely by square-summability of the step sizes; mean convergence follows directly from Fatou's lemma. All assumptions (proper convex lsc functions, one smooth with Lipschitz gradient, unbiased oracle) are stated externally in Section 2 and do not reference the target convergence statements. No equation reduces to a fitted parameter renamed as prediction, no self-citation chain is load-bearing, and the proof chain is independent of the paper's own results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities can be extracted or audited.

pith-pipeline@v0.9.0 · 5346 in / 984 out tokens · 24319 ms · 2026-05-10T13:37:12.348213+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

28 extracted references · 1 canonical work page

  1. [1] H. Asi and J. C. Duchi, Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity, SIAM J. Optim., vol. 29, pp. 2257–2290, 2019.

  2. [2] Y. F. Atchadé, G. Fort, and E. Moulines, On perturbed proximal gradient algorithms, J. Mach. Learn. Res., vol. 18, pp. 1–33, 2017.

  3. [3] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed. Springer, New York, 2017.

  4. [4] D. P. Bertsekas, Incremental proximal methods for large scale convex optimization, Math. Program., vol. B129, pp. 163–196, 2011.

  5. [5] P. Bianchi and W. Hachem, Dynamical behavior of a stochastic forward-backward algorithm using random monotone operators, J. Optim. Theory Appl., vol. 171, pp. 90–120, 2016.

  6. [6] M. Bravo and R. Cominetti, Stochastic fixed-point iterations for nonexpansive maps: Convergence and error bounds, SIAM J. Control Optim., vol. 62, pp. 191–219, 2024.

  7. [7] D. Butnariu, The expected-projection method: Its behavior and applications to linear operator equations and convex optimization, J. Appl. Anal., vol. 1, pp. 93–108, 1995.

  8. [8] D. Butnariu and S. D. Flåm, Strong convergence of expected-projection methods in Hilbert spaces, Numer. Funct. Anal. Optim., vol. 16, pp. 601–636, 1995.

  9. [9] S. Chen, Y. Zhang, and Q. Yang, Multi-task learning in natural language processing: An overview, ACM Comput. Surv., vol. 56, pp. 1–32, 2024.

  10. [10] P. L. Combettes, The geometry of monotone operator splitting methods, Acta Numer., vol. 33, pp. 487–632, 2024.

  11. [11] P. L. Combettes and J. I. Madariaga, A geometric framework for stochastic iterations, Math. Comput., to appear.

  12. [12] P. L. Combettes and J. I. Madariaga, Asymptotic analysis of an abstract stochastic scheme for solving monotone inclusions, arXiv, 2025. https://arxiv.org/pdf/2512.03023

  13. [13] P. L. Combettes and J.-C. Pesquet, Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping, SIAM J. Optim., vol. 25, pp. 1221–1248, 2015.

  14. [14] P. L. Combettes and J.-C. Pesquet, Stochastic approximations and perturbations in forward-backward splitting for monotone operators, Pure Appl. Funct. Anal., vol. 1, pp. 13–37, 2016.

  15. [15] P. L. Combettes and B. C. Vũ, Variable metric quasi-Fejér monotonicity, Nonlinear Anal., vol. 78, pp. 17–31, 2013.

  16. [16] P. L. Combettes and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul., vol. 4, pp. 1168–1200, 2005.

  17. [17] S. Cui, U. Shanbhag, M. Staudigl, and P. Vuong, Stochastic relaxed inertial forward-backward-forward splitting for monotone inclusions in Hilbert spaces, Comput. Optim. Appl., vol. 83, pp. 465–524, 2022.

  18. [18] M. Eisenmann, T. Stillfjord, and M. Williamson, Sub-linear convergence of a stochastic proximal iteration method in Hilbert space, Comput. Optim. Appl., vol. 83, pp. 181–210, 2022.

  19. [19] G. Fort, E. Ollier, and A. Samson, Stochastic proximal-gradient algorithms for penalized mixed models, Stat. Comput., vol. 29, pp. 231–253, 2019.

  20. [20] N. Hermer, D. R. Luke, and A. Sturm, Nonexpansive Markov operators and random function iterations for stochastic fixed point problems, J. Convex Anal., vol. 30, pp. 1073–1114, 2023.

  21. [21] H. Iiduka, Almost sure convergence of random projected proximal and subgradient algorithms for distributed nonsmooth convex optimization, Optimization, vol. 66, pp. 35–59, 2017.

  22. [22] A. Patrascu and P. Irofti, Stochastic proximal splitting algorithm for composite minimization, Optim. Lett., vol. 15, pp. 2255–2273, 2021.

  23. [23] T. Pennanen and A.-P. Perkkiö, Convex Stochastic Optimization. Springer, Berlin, 2024.

  24. [24] L. Rosasco, S. Villa, and B. C. Vũ, Convergence of stochastic proximal gradient algorithm, Appl. Math. Optim., vol. 82, pp. 891–917, 2020.

  25. [25] A. N. Shiryaev, Probability–1, 3rd ed. Springer, New York, 2016.

  26. [26] C. Traoré, V. Apidopoulos, S. Salzo, and S. Villa, Variance reduction techniques for stochastic proximal point algorithms, J. Optim. Theory Appl., vol. 203, pp. 1910–1939, 2024.

  27. [27] P. Tseng, Applications of a splitting algorithm to decomposition in convex programming and variational inequalities, SIAM J. Control Optim., vol. 29, pp. 119–138, 1991.

  28. [28] L. Xiao and T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM J. Optim., vol. 24, pp. 2057–2075, 2014.