pith. machine review for the scientific record.

arxiv: 2604.22581 · v2 · submitted 2026-04-24 · 🧮 math.OC

Recognition: 2 theorem links


Stochastic Krasnoselskii-Mann Iterations: Convergence without Uniformly Bounded Variance

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:47 UTC · model grok-4.3

classification 🧮 math.OC
keywords stochastic Krasnoselskii-Mann iteration · nonexpansive operators · fixed-point problems · variance assumptions · almost sure convergence · convergence rates · stochastic optimization · operator splitting
0 comments

The pith

Stochastic Krasnoselskii-Mann iterations converge almost surely when variance is finite at only one fixed point.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that stochastic iterations for solving expected nonexpansive fixed-point problems in Hilbert space converge under a variance condition weaker than the usual uniform bound. It is enough for the stochastic oracle to have finite variance at a single fixed point. This still produces almost sure weak convergence of the iterates, rates on the expected residual, and almost sure rates on the running minimum residual while recovering the best-known complexity in oracle calls. The results apply to stochastic gradient descent and give the first such guarantees for stochastic three-operator splitting and backward-forward splitting.
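A minimal formalization of the setup being summarized; the symbols (the expected operator T, the oracle samples T_ξ, the fixed point x*, and the variance bound σ²) are notation assumed here for illustration rather than quoted from the paper:

```latex
% Expected fixed-point problem in a real Hilbert space H:
\[
  \text{find } x \in H \text{ such that } T(x) = x,
  \qquad T(x) := \mathbb{E}_{\xi}\big[T_{\xi}(x)\big].
\]
% Usual assumption in the literature: uniformly bounded oracle variance,
\[
  \sup_{x \in H}\; \mathbb{E}_{\xi}\big[\|T_{\xi}(x) - T(x)\|^{2}\big] \;\le\; \sigma^{2} \;<\; \infty .
\]
% Weaker assumption highlighted here: finite variance at a single fixed point,
\[
  \mathbb{E}_{\xi}\big[\|T_{\xi}(x^{*}) - x^{*}\|^{2}\big] \;<\; \infty
  \quad\text{for some } x^{*} \in \operatorname{Fix} T .
\]
```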

Core claim

For an expected nonexpansive operator on a real Hilbert space, the stochastic Krasnoselskii-Mann iteration driven by an oracle with variance finite at one fixed point yields almost sure weak convergence of the iterates to a fixed point. The expected residual converges at the best-known rate and the running minimum residual converges almost surely at a comparable rate, all without requiring the variance to be bounded uniformly.

What carries the argument

The Krasnoselskii-Mann iteration with a stochastic oracle whose variance is finite only at a single fixed point.
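A sketch of the generic update this refers to; the relaxation parameters α_n and i.i.d. samples ξ_n are assumptions of this illustration, and the paper's exact indexing and step-size conditions may differ:

```latex
\[
  x_{n+1} \;=\; (1-\alpha_{n+1})\,x_n \;+\; \alpha_{n+1}\,T_{\xi_{n+1}}(x_n),
  \qquad \alpha_n \in (0,1),
\]
% with the relaxation parameters typically required to satisfy a divergence
% condition such as \sum_n \alpha_n (1-\alpha_n) = \infty.
```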

If this is right

  • Almost sure weak convergence of the iterates to a fixed point holds.
  • Rates are obtained for the expected residual of the iterates.
  • Almost sure rates are obtained for the running minimum residual.
  • The best-known stochastic oracle complexity is recovered, and known guarantees for stochastic gradient descent are matched.
  • New convergence guarantees without uniform variance bounds are obtained for stochastic three-operator splitting and stochastic backward-forward splitting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The single-point variance condition may apply to stochastic methods where noise intensity increases away from the solution.
  • Analogous local variance assumptions could be tested on other stochastic fixed-point and splitting algorithms.
  • Oracles could be engineered to keep variance controlled only in a neighborhood of the expected solution rather than globally.
  • Numerical checks on problems with distance-dependent variance would provide direct evidence for the rates; a minimal sketch of such a check follows below.
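A minimal sketch of the kind of check the last point describes, on a hypothetical one-dimensional problem whose oracle noise scales with the distance to the fixed point; the map T, the noise model, and the step-size schedule are all illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D nonexpansive map with fixed point x* = 0.
def T(x):
    return 0.5 * x

# Hypothetical stochastic oracle: unbiased, with standard deviation
# proportional to |x - x*|, so its variance is unbounded over the line
# but equals zero (hence is finite) at the fixed point itself.
def T_oracle(x, noise_scale=1.0):
    return T(x) + noise_scale * abs(x) * rng.standard_normal()

x = 10.0                      # starting point
running_min_residual = np.inf

for n in range(1, 100_001):
    alpha = 1.0 / n**0.6      # relaxation: sum(alpha) = inf, sum(alpha^2) < inf
    x = (1 - alpha) * x + alpha * T_oracle(x)
    residual = abs(x - T(x))  # fixed-point residual |x_n - T(x_n)|
    running_min_residual = min(running_min_residual, residual)
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n={n:>6}  x={x:+.3e}  residual={residual:.3e}  "
              f"running min={running_min_residual:.3e}")
```

On a run like this, the residual and its running minimum should shrink even though the oracle variance is unbounded over the domain, which is the behaviour the single-point variance condition is meant to cover.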

Load-bearing premise

The stochastic oracle has finite variance at one fixed point and the operator is nonexpansive in a Hilbert space.

What would settle it

A concrete nonexpansive operator and oracle with finite variance only at one fixed point for which the iterates fail to converge almost surely or the residual rates do not hold.

read the original abstract

We investigate the Stochastic Krasnoselskii-Mann iterations for expected nonexpansive fixed-point problems in a real Hilbert space. We establish convergence guarantees under significantly weaker assumptions on the variance than those typically used in the literature. In particular, instead of a uniform bound on the variance of the stochastic oracle, we only assume finite variance at a single fixed point. Under this assumption, we prove almost sure weak convergence of the iterates, derive convergence rates for the expected residual, and obtain almost sure convergence rates for the running minimum residual. Notably, we recover the best-known stochastic oracle complexity without imposing uniformly bounded variance. We illustrate the applicability of our results to Stochastic Gradient Descent, where we recover known guarantees, and to Stochastic Three-Operator Splitting and Stochastic Backward-Forward Splitting, for which we obtain the first results that avoid uniform variance bounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper investigates stochastic Krasnoselskii-Mann iterations for expected nonexpansive fixed-point problems in real Hilbert spaces. It claims to establish almost sure weak convergence of the iterates, convergence rates for the expected residual, and almost sure convergence rates for the running minimum residual, under the weaker assumption that the stochastic oracle has finite variance only at a single fixed point rather than uniformly bounded variance. The results are applied to stochastic gradient descent and certain operator splitting methods, recovering known guarantees and providing new ones.

Significance. If the proofs hold, this work significantly weakens the variance assumptions in stochastic fixed-point iterations, which could extend the applicability of these methods to a broader class of problems where variance is not uniformly bounded. The recovery of optimal stochastic oracle complexity without uniform bounds is a strong point, and the first results for stochastic three-operator splitting and backward-forward splitting under these conditions are valuable contributions.

major comments (2)
  1. [§2.2] §2.2 (variance assumption): The paper assumes only that the stochastic oracle has finite variance at one fixed point x*. Nonexpansiveness controls ||x_n - x*|| but does not control the variance of the oracle at other points x_n visited by the iteration. No growth condition, continuity of the variance map, or a-priori boundedness argument is indicated in the abstract or setup. This leaves the existence of E[||T(x_n) - x_n + noise||^2] (and thus the well-definedness of the iteration and all residual statements) unverified. The central claims on almost-sure weak convergence and expected-residual rates are load-bearing on this point.
  2. [Main convergence theorem] Main convergence theorem (likely Theorem 3.1 or §3): The proof of almost-sure weak convergence and the derivation of rates for the running minimum residual must explicitly establish that the stochastic term remains square-integrable along the trajectory. If the argument relies only on nonexpansiveness without additional justification, the statements become formally undefined when the variance grows away from x*.
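To make the integrability concern concrete, a hypothetical variance profile of the kind these comments describe (illustration only, not drawn from the paper):

```latex
% Hypothetical variance profile: finite (zero) at x*, unbounded elsewhere,
\[
  \sigma^{2}(x) \;:=\; \mathbb{E}_{\xi}\big[\|T_{\xi}(x) - T(x)\|^{2}\big]
  \;=\; c\,\|x - x^{*}\|^{4}, \qquad c > 0 .
\]
% Under such a profile the quantity the iteration and the residual statements rely on,
\[
  \mathbb{E}\big[\,\|T_{\xi_{n+1}}(x_n) - x_n\|^{2} \,\big|\, \mathcal{F}_n\,\big],
\]
% is not controlled by nonexpansiveness alone and needs an explicit integrability argument.
```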
minor comments (2)
  1. [Abstract] Abstract: The phrase 'recover the best-known stochastic oracle complexity' should cite the specific prior result (e.g., the reference for the O(1/ε) or O(1/√ε) bound being matched).
  2. [Notation] Notation: Ensure the stochastic oracle is denoted consistently (e.g., T_ξ or similar) across sections and that the fixed point is clearly labeled x* throughout.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. The points on the variance assumption and well-definedness of the iteration are well-taken and highlight areas where the manuscript can be clarified. We respond to each major comment below and will incorporate the necessary revisions.

read point-by-point responses
  1. Referee: [§2.2] §2.2 (variance assumption): The paper assumes only that the stochastic oracle has finite variance at one fixed point x*. Nonexpansiveness controls ||x_n - x*|| but does not control the variance of the oracle at other points x_n visited by the iteration. No growth condition, continuity of the variance map, or a-priori boundedness argument is indicated in the abstract or setup. This leaves the existence of E[||T(x_n) - x_n + noise||^2] (and thus the well-definedness of the iteration and all residual statements) unverified. The central claims on almost-sure weak convergence and expected-residual rates are load-bearing on this point.

    Authors: We agree that the setup requires explicit clarification on this point to avoid ambiguity. The model assumes the stochastic oracle produces a square-integrable output at every point in the domain (a standard modeling hypothesis for such iterations to be executable), while the novel contribution is the absence of a uniform bound on the variance. Nonexpansiveness is used to control the expected distance to x* and to derive summability of the residuals, but we acknowledge that this does not automatically guarantee finite variance at x_n without further regularity. In the revised manuscript we will add a dedicated paragraph in §2.2 stating the well-definedness assumption explicitly and noting that the finite-variance condition at x* suffices for the convergence analysis without needing a global bound. This is a clarification rather than a strengthening of the hypotheses. revision: yes

  2. Referee: [Main convergence theorem] Main convergence theorem (likely Theorem 3.1 or §3): The proof of almost-sure weak convergence and the derivation of rates for the running minimum residual must explicitly establish that the stochastic term remains square-integrable along the trajectory. If the argument relies only on nonexpansiveness without additional justification, the statements become formally undefined when the variance grows away from x*.

    Authors: We thank the referee for this precise observation. The proof of Theorem 3.1 applies a Robbins-Siegmund lemma to the sequence of squared distances to x*, where the variance contribution appears as a perturbation term whose conditional expectation is controlled using the finite-variance assumption at x* together with the nonexpansive property. However, to ensure all conditional expectations are rigorously defined, an explicit verification that the iterates remain square-integrable is needed. We will revise §3 to insert a preliminary step (by induction on the boundedness in expectation established via the nonexpansive fixed-point property) that confirms square-integrability of the stochastic term along the trajectory. This makes the argument self-contained without altering the stated assumptions or conclusions. revision: yes
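For reference, the standard form of the Robbins-Siegmund lemma the response invokes (reference [28]); the paper's exact variant may differ in bookkeeping:

```latex
% Robbins-Siegmund (1971), almost-supermartingale convergence:
% for nonnegative, adapted sequences (V_n), (a_n), (b_n), (c_n) with
\[
  \mathbb{E}\big[\,V_{n+1} \mid \mathcal{F}_n\,\big]
  \;\le\; (1 + a_n)\,V_n \;+\; b_n \;-\; c_n \quad \text{a.s.},
  \qquad \sum_n a_n < \infty, \;\; \sum_n b_n < \infty \;\; \text{a.s.},
\]
% the sequence V_n converges almost surely to a finite limit and \sum_n c_n < \infty a.s.
% In the rebuttal's sketch, V_n plays the role of \|x_n - x^*\|^2, and c_n would
% typically collect the residual terms that yield the rates.
```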

Circularity Check

0 steps flagged

No circularity; derivation is self-contained under stated assumptions

full rationale

The paper derives almost-sure weak convergence and residual rates for the stochastic Krasnoselskii-Mann iteration from the nonexpansiveness of the operator, the Hilbert-space inner-product structure, and the single-point finite-variance assumption. All steps invoke standard martingale convergence arguments and expectation bounds that follow directly from the given conditions without any reduction to self-definition, parameter fitting, or load-bearing self-citation. The central claims therefore follow from the stated assumptions without circular dependence.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claims rest on standard properties of Hilbert spaces and nonexpansive operators, with the novel part being the single-point variance assumption.

axioms (3)
  • domain assumption: The mapping is nonexpansive
    Standard assumption for Krasnoselskii-Mann iterations in fixed-point problems.
  • standard math: The space is a real Hilbert space
    Allows use of inner product and weak convergence.
  • domain assumption: Finite variance of the stochastic oracle at the fixed point
    The key weakened assumption replacing uniform boundedness.
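A compact symbolic restatement of the first two axioms, together with the standard Hilbert-space identity that Krasnoselskii-Mann analyses typically rely on; the notation is assumed here, not quoted from the paper:

```latex
% Nonexpansiveness of T on the real Hilbert space H:
\[
  \|T(x) - T(y)\| \;\le\; \|x - y\| \qquad \text{for all } x, y \in H .
\]
% The inner-product structure supplies the convex-combination identity used in
% Krasnoselskii-Mann estimates: for any a, b \in H and \alpha \in [0,1],
\[
  \|(1-\alpha)a + \alpha b\|^{2}
  \;=\; (1-\alpha)\|a\|^{2} + \alpha\|b\|^{2} - \alpha(1-\alpha)\|a - b\|^{2}.
\]
```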

pith-pipeline@v0.9.0 · 5441 in / 1391 out tokens · 47013 ms · 2026-05-12T02:47:28.215629+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Quadratic Objective Perturbation: Curvature-Based Differential Privacy

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    QOP achieves (ε, δ)-differential privacy for ERM in the interpolation regime under weaker assumptions than linear objective perturbation by using random quadratic curvature to enforce stability and control sensitivity.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1] A. Agarwal, M. J. Wainwright, P. Bartlett, and P. Ravikumar. Information-theoretic lower bounds on the oracle complexity of convex optimization. In Advances in Neural Information Processing Systems, volume 22, 2009.

  2. [2] A. Attia, M. Schliserman, U. Sherman, and T. Koren. Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, Oct. 2025.

  3. [3] F. Bach and E. Moulines. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning. In Advances in Neural Information Processing Systems, volume 24, 2011.

  4. [4] L. Bottou, F. E. Curtis, and J. Nocedal. Optimization Methods for Large-Scale Machine Learning. SIAM Review, 60(2):223–311, Jan. 2018.

  5. [5] M. Bravo and R. Cominetti. Stochastic Fixed-Point Iterations for Nonexpansive Maps: Convergence and Error Bounds. SIAM Journal on Control and Optimization, 62(1):191–219, Feb. 2024.

  6. [6] M. Bravo and J. P. Contreras. Stochastic Halpern iteration in normed spaces and applications to reinforcement learning. Mathematical Programming, Mar. 2026.

  7. [7] F. E. Browder. Nonexpansive nonlinear operators in a Banach space. Proceedings of the National Academy of Sciences, 54(4):1041–1044, 1965.

  8. [8] D. Cortild, L. Ketels, J. Peypouquet, and G. Garrigos. Bias-Optimal Bounds for SGD: A Computer-Aided Lyapunov Analysis, May 2025. arXiv preprint arXiv:2505.17965.

  9. [9] D. Cortild and J. Peypouquet. Krasnoselskii–Mann Iterations: Inertia, Perturbations and Approximation. Journal of Optimization Theory and Applications, 204(2):35, Jan. 2025.

  10. [10] D. Davis and W. Yin. A Three-Operator Splitting Scheme and its Optimization Applications. Set-Valued and Variational Analysis, 25(4):829–858, Dec. 2017.

  11. [11] Q.-L. Dong, Y. J. Cho, S. He, P. M. Pardalos, and T. M. Rassias. The Krasnosel'skii-Mann Iterative Method: Recent Progress and Applications. Springer Cham, 2022.

  12. [12] F. Orabona. Almost Sure Convergence of SGD on Smooth Non-Convex Functions, Oct. 2020.

  13. [13] G. Garrigos, D. Cortild, L. Ketels, and J. Peypouquet. Last-Iterate Complexity of SGD for Convex and Smooth Stochastic Problems, July 2025. arXiv preprint arXiv:2507.14122.

  14. [14] G. Garrigos and R. M. Gower. Handbook of Convergence Theorems for (Stochastic) Gradient Methods. arXiv preprint arXiv:2301.11235.

  16. [16] R. Gower, O. Sebbouh, and N. Loizou. SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, Mar. 2021.

  17. [17] R. M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin, and P. Richtárik. SGD: General Analysis and Improved Rates. In Proceedings of the 36th International Conference on Machine Learning, May 2019.

  18. [18] C. W. Groetsch. A note on segmenting Mann iterates. Journal of Mathematical Analysis and Applications, 40(2):369–372, 1972.

  19. [19] H. Iiduka. Mini-Batch Stochastic Krasnosel'skii-Mann Algorithm for Nonexpansive Fixed Point Problems, Apr. 2026. arXiv preprint arXiv:2604.06909.

  20. [20] A. Khaled and P. Richtárik. Better Theory for SGD in the Nonconvex World. Transactions on Machine Learning Research, 2023.

  21. [21] M. A. Krasnosel'skii. Two remarks on the method of successive approximations. Uspekhi Matematicheskikh Nauk, 10(1(63)):123–127, 1955.

  22. [22] J. Liu and Y. Yuan. On Almost Sure Convergence Rates of Stochastic Gradient Methods. In Proceedings of the Thirty Fifth Conference on Learning Theory, June 2022.

  23. [23] W. R. Mann. Mean value methods in iteration. Proceedings of the American Mathematical Society, 4:506–510, 1953.

  24. [24] J. J. Maulén, I. Fierro, and J. Peypouquet. Inertial Krasnoselskii-Mann Iterations. Set-Valued and Variational Analysis, 32(2):10, 2024.

  25. [25] D. Needell, N. Srebro, and R. Ward. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Mathematical Programming, 155(1):549–573, Jan. 2016.

  26. [26] L. Nguyen, P. H. Nguyen, M. Dijk, P. Richtarik, K. Scheinberg, and M. Takac. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption. In Proceedings of the 35th International Conference on Machine Learning, pages 3750–3758, July 2018.

  27. [27] Z. Opial. Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bulletin of the American Mathematical Society, 73(4):591–597, 1967.

  28. [28] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. In Optimizing Methods in Statistics, pages 233–257. Elsevier, 1971.

  29. [29] H. H. Schaefer. Über die Methode sukzessiver Approximationen. Jahresbericht der Deutschen Mathematiker-Vereinigung, 59:131–140, 1957.

  30. [30] A. Yurtsever, A. Gu, and S. Sra. Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates. In Advances in Neural Information Processing Systems, volume 34, 2021.

  31. [31] A. Yurtsever, B. C. Vu, and V. Cevher. Stochastic Three-Composite Convex Minimization. In Advances in Neural Information Processing Systems, volume 29, 2016.