pith. machine review for the scientific record.

arxiv: 2604.22581 · v2 · submitted 2026-04-24 · 🧮 math.OC

Recognition: 2 theorem links


Stochastic Krasnoselskii-Mann Iterations: Convergence without Uniformly Bounded Variance

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:47 UTC · model grok-4.3

classification 🧮 math.OC
keywords stochastic Krasnoselskii-Mann iteration · nonexpansive operators · fixed-point problems · variance assumptions · almost sure convergence · convergence rates · stochastic optimization · operator splitting
0 comments

The pith

Stochastic Krasnoselskii-Mann iterations converge almost surely when variance is finite at only one fixed point.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that stochastic iterations for solving expected nonexpansive fixed-point problems in Hilbert space converge under a variance condition weaker than the usual uniform bound. It is enough for the stochastic oracle to have finite variance at a single fixed point. This still produces almost sure weak convergence of the iterates, rates on the expected residual, and almost sure rates on the running minimum residual while recovering the best-known complexity in oracle calls. The results apply to stochastic gradient descent and give the first such guarantees for stochastic three-operator splitting and backward-forward splitting.
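A minimal formalization of the setup being summarized; the symbols (the expected operator T, the oracle samples T_ξ, the fixed point x*, and the variance bound σ²) are notation assumed here for illustration rather than quoted from the paper:

```latex
% Expected fixed-point problem in a real Hilbert space H:
\[
  \text{find } x \in H \text{ such that } T(x) = x,
  \qquad T(x) := \mathbb{E}_{\xi}\big[T_{\xi}(x)\big].
\]
% Usual assumption in the literature: uniformly bounded oracle variance,
\[
  \sup_{x \in H}\; \mathbb{E}_{\xi}\big[\|T_{\xi}(x) - T(x)\|^{2}\big] \;\le\; \sigma^{2} \;<\; \infty .
\]
% Weaker assumption highlighted here: finite variance at a single fixed point,
\[
  \mathbb{E}_{\xi}\big[\|T_{\xi}(x^{*}) - x^{*}\|^{2}\big] \;<\; \infty
  \quad\text{for some } x^{*} \in \operatorname{Fix} T .
\]
```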

Core claim

For an expected nonexpansive operator on a real Hilbert space, the stochastic Krasnoselskii-Mann iteration driven by an oracle with variance finite at one fixed point yields almost sure weak convergence of the iterates to a fixed point. The expected residual converges at the best-known rate and the running minimum residual converges almost surely at a comparable rate, all without requiring the variance to be bounded uniformly.

What carries the argument

The Krasnoselskii-Mann iteration with a stochastic oracle whose variance is finite only at a single fixed point.
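A sketch of the generic update this refers to; the relaxation parameters α_n and i.i.d. samples ξ_n are assumptions of this illustration, and the paper's exact indexing and step-size conditions may differ:

```latex
\[
  x_{n+1} \;=\; (1-\alpha_{n+1})\,x_n \;+\; \alpha_{n+1}\,T_{\xi_{n+1}}(x_n),
  \qquad \alpha_n \in (0,1),
\]
% with the relaxation parameters typically required to satisfy a divergence
% condition such as \sum_n \alpha_n (1-\alpha_n) = \infty.
```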

If this is right

  • Almost sure weak convergence of the iterates to a fixed point holds.
  • Rates are obtained for the expected residual of the iterates.
  • Almost sure rates are obtained for the running minimum residual.
  • The best-known stochastic oracle complexity is recovered, and known guarantees for stochastic gradient descent are matched.
  • New convergence guarantees without uniform variance bounds are obtained for stochastic three-operator splitting and stochastic backward-forward splitting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The single-point variance condition may apply to stochastic methods where noise intensity increases away from the solution.
  • Analogous local variance assumptions could be tested on other stochastic fixed-point and splitting algorithms.
  • Oracles could be engineered to keep variance controlled only in a neighborhood of the expected solution rather than globally.
  • Numerical checks on problems with distance-dependent variance would provide direct evidence for the rates; a minimal sketch of such a check follows below.
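A minimal sketch of the kind of check the last point describes, on a hypothetical one-dimensional problem whose oracle noise scales with the distance to the fixed point; the map T, the noise model, and the step-size schedule are all illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D nonexpansive map with fixed point x* = 0.
def T(x):
    return 0.5 * x

# Hypothetical stochastic oracle: unbiased, with standard deviation
# proportional to |x - x*|, so its variance is unbounded over the line
# but equals zero (hence is finite) at the fixed point itself.
def T_oracle(x, noise_scale=1.0):
    return T(x) + noise_scale * abs(x) * rng.standard_normal()

x = 10.0                      # starting point
running_min_residual = np.inf

for n in range(1, 100_001):
    alpha = 1.0 / n**0.6      # relaxation: sum(alpha) = inf, sum(alpha^2) < inf
    x = (1 - alpha) * x + alpha * T_oracle(x)
    residual = abs(x - T(x))  # fixed-point residual |x_n - T(x_n)|
    running_min_residual = min(running_min_residual, residual)
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n={n:>6}  x={x:+.3e}  residual={residual:.3e}  "
              f"running min={running_min_residual:.3e}")
```

On a run like this, the residual and its running minimum should shrink even though the oracle variance is unbounded over the domain, which is the behaviour the single-point variance condition is meant to cover.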

Load-bearing premise

The stochastic oracle has finite variance at one fixed point and the operator is nonexpansive in a Hilbert space.

What would settle it

A concrete nonexpansive operator and oracle with finite variance only at one fixed point for which the iterates fail to converge almost surely or the residual rates do not hold.

read the original abstract

We investigate the Stochastic Krasnoselskii-Mann iterations for expected nonexpansive fixed-point problems in a real Hilbert space. We establish convergence guarantees under significantly weaker assumptions on the variance than those typically used in the literature. In particular, instead of a uniform bound on the variance of the stochastic oracle, we only assume finite variance at a single fixed point. Under this assumption, we prove almost sure weak convergence of the iterates, derive convergence rates for the expected residual, and obtain almost sure convergence rates for the running minimum residual. Notably, we recover the best-known stochastic oracle complexity without imposing uniformly bounded variance. We illustrate the applicability of our results to Stochastic Gradient Descent, where we recover known guarantees, and to Stochastic Three-Operator Splitting and Stochastic Backward-Forward Splitting, for which we obtain the first results that avoid uniform variance bounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper investigates stochastic Krasnoselskii-Mann iterations for expected nonexpansive fixed-point problems in real Hilbert spaces. It claims to establish almost sure weak convergence of the iterates, convergence rates for the expected residual, and almost sure convergence rates for the running minimum residual, under the weaker assumption that the stochastic oracle has finite variance only at a single fixed point rather than uniformly bounded variance. The results are applied to stochastic gradient descent and certain operator splitting methods, recovering known guarantees and providing new ones.

Significance. If the proofs hold, this work significantly weakens the variance assumptions in stochastic fixed-point iterations, which could extend the applicability of these methods to a broader class of problems where variance is not uniformly bounded. The recovery of optimal stochastic oracle complexity without uniform bounds is a strong point, and the first results for stochastic three-operator splitting and backward-forward splitting under these conditions are valuable contributions.

major comments (2)
  1. [§2.2] §2.2 (variance assumption): The paper assumes only that the stochastic oracle has finite variance at one fixed point x*. Nonexpansiveness controls ||x_n - x*|| but does not control the variance of the oracle at other points x_n visited by the iteration. No growth condition, continuity of the variance map, or a-priori boundedness argument is indicated in the abstract or setup. This leaves the existence of E[||T(x_n) - x_n + noise||^2] (and thus the well-definedness of the iteration and all residual statements) unverified. The central claims on almost-sure weak convergence and expected-residual rates are load-bearing on this point.
  2. [Main convergence theorem] Main convergence theorem (likely Theorem 3.1 or §3): The proof of almost-sure weak convergence and the derivation of rates for the running minimum residual must explicitly establish that the stochastic term remains square-integrable along the trajectory. If the argument relies only on nonexpansiveness without additional justification, the statements become formally undefined when the variance grows away from x*.
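To make the integrability concern concrete, a hypothetical variance profile of the kind these comments describe (illustration only, not drawn from the paper):

```latex
% Hypothetical variance profile: finite (zero) at x*, unbounded elsewhere,
\[
  \sigma^{2}(x) \;:=\; \mathbb{E}_{\xi}\big[\|T_{\xi}(x) - T(x)\|^{2}\big]
  \;=\; c\,\|x - x^{*}\|^{4}, \qquad c > 0 .
\]
% Under such a profile the quantity the iteration and the residual statements rely on,
\[
  \mathbb{E}\big[\,\|T_{\xi_{n+1}}(x_n) - x_n\|^{2} \,\big|\, \mathcal{F}_n\,\big],
\]
% is not controlled by nonexpansiveness alone and needs an explicit integrability argument.
```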
minor comments (2)
  1. [Abstract] Abstract: The phrase 'recover the best-known stochastic oracle complexity' should cite the specific prior result (e.g., the reference for the O(1/ε) or O(1/√ε) bound being matched).
  2. [Notation] Notation: Ensure the stochastic oracle is denoted consistently (e.g., T_ξ or similar) across sections and that the fixed point is clearly labeled x* throughout.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. The points on the variance assumption and well-definedness of the iteration are well-taken and highlight areas where the manuscript can be clarified. We respond to each major comment below and will incorporate the necessary revisions.

read point-by-point responses
  1. Referee: [§2.2] §2.2 (variance assumption): The paper assumes only that the stochastic oracle has finite variance at one fixed point x*. Nonexpansiveness controls ||x_n - x*|| but does not control the variance of the oracle at other points x_n visited by the iteration. No growth condition, continuity of the variance map, or a-priori boundedness argument is indicated in the abstract or setup. This leaves the existence of E[||T(x_n) - x_n + noise||^2] (and thus the well-definedness of the iteration and all residual statements) unverified. The central claims on almost-sure weak convergence and expected-residual rates are load-bearing on this point.

    Authors: We agree that the setup requires explicit clarification on this point to avoid ambiguity. The model assumes the stochastic oracle produces a square-integrable output at every point in the domain (a standard modeling hypothesis for such iterations to be executable), while the novel contribution is the absence of a uniform bound on the variance. Nonexpansiveness is used to control the expected distance to x* and to derive summability of the residuals, but we acknowledge that this does not automatically guarantee finite variance at x_n without further regularity. In the revised manuscript we will add a dedicated paragraph in §2.2 stating the well-definedness assumption explicitly and noting that the finite-variance condition at x* suffices for the convergence analysis without needing a global bound. This is a clarification rather than a strengthening of the hypotheses. revision: yes

  2. Referee: [Main convergence theorem] Main convergence theorem (likely Theorem 3.1 or §3): The proof of almost-sure weak convergence and the derivation of rates for the running minimum residual must explicitly establish that the stochastic term remains square-integrable along the trajectory. If the argument relies only on nonexpansiveness without additional justification, the statements become formally undefined when the variance grows away from x*.

    Authors: We thank the referee for this precise observation. The proof of Theorem 3.1 applies a Robbins-Siegmund lemma to the sequence of squared distances to x*, where the variance contribution appears as a perturbation term whose conditional expectation is controlled using the finite-variance assumption at x* together with the nonexpansive property. However, to ensure all conditional expectations are rigorously defined, an explicit verification that the iterates remain square-integrable is needed. We will revise §3 to insert a preliminary step (by induction on the boundedness in expectation established via the nonexpansive fixed-point property) that confirms square-integrability of the stochastic term along the trajectory. This makes the argument self-contained without altering the stated assumptions or conclusions. revision: yes
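For reference, the standard form of the Robbins-Siegmund lemma the response invokes (reference [28]); the paper's exact variant may differ in bookkeeping:

```latex
% Robbins-Siegmund (1971), almost-supermartingale convergence:
% for nonnegative, adapted sequences (V_n), (a_n), (b_n), (c_n) with
\[
  \mathbb{E}\big[\,V_{n+1} \mid \mathcal{F}_n\,\big]
  \;\le\; (1 + a_n)\,V_n \;+\; b_n \;-\; c_n \quad \text{a.s.},
  \qquad \sum_n a_n < \infty, \;\; \sum_n b_n < \infty \;\; \text{a.s.},
\]
% the sequence V_n converges almost surely to a finite limit and \sum_n c_n < \infty a.s.
% In the rebuttal's sketch, V_n plays the role of \|x_n - x^*\|^2, and c_n would
% typically collect the residual terms that yield the rates.
```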

Circularity Check

0 steps flagged

No circularity; derivation is self-contained under stated assumptions

full rationale

The paper derives almost-sure weak convergence and residual rates for the stochastic Krasnoselskii-Mann iteration from the nonexpansiveness of the operator, the Hilbert-space inner-product structure, and the single-point finite-variance assumption. All steps invoke standard martingale convergence arguments and expectation bounds that follow directly from the given conditions without any reduction to self-definition, parameter fitting, or load-bearing self-citation. The central claims therefore follow from the stated assumptions without circular dependence.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claims rest on standard properties of Hilbert spaces and nonexpansive operators, with the novel part being the single-point variance assumption.

axioms (3)
  • domain assumption: The mapping is nonexpansive
    Standard assumption for Krasnoselskii-Mann iterations in fixed-point problems.
  • standard math: The space is a real Hilbert space
    Allows use of inner product and weak convergence.
  • domain assumption: Finite variance of the stochastic oracle at the fixed point
    The key weakened assumption replacing uniform boundedness.
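A compact symbolic restatement of the first two axioms, together with the standard Hilbert-space identity that Krasnoselskii-Mann analyses typically rely on; the notation is assumed here, not quoted from the paper:

```latex
% Nonexpansiveness of T on the real Hilbert space H:
\[
  \|T(x) - T(y)\| \;\le\; \|x - y\| \qquad \text{for all } x, y \in H .
\]
% The inner-product structure supplies the convex-combination identity used in
% Krasnoselskii-Mann estimates: for any a, b \in H and \alpha \in [0,1],
\[
  \|(1-\alpha)a + \alpha b\|^{2}
  \;=\; (1-\alpha)\|a\|^{2} + \alpha\|b\|^{2} - \alpha(1-\alpha)\|a - b\|^{2}.
\]
```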

pith-pipeline@v0.9.0 · 5441 in / 1391 out tokens · 47013 ms · 2026-05-12T02:47:28.215629+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Quadratic Objective Perturbation: Curvature-Based Differential Privacy

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    QOP achieves (ε, δ)-differential privacy for ERM in the interpolation regime under weaker assumptions than linear objective perturbation by using random quadratic curvature to enforce stability and control sensitivity.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1] A. Agarwal, M. J. Wainwright, P. Bartlett, and P. Ravikumar. Information-theoretic lower bounds on the oracle complexity of convex optimization. In Advances in Neural Information Processing Systems, volume 22, 2009.

  2. [2] A. Attia, M. Schliserman, U. Sherman, and T. Koren. Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, Oct. 2025.

  3. [3] F. Bach and E. Moulines. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning. In Advances in Neural Information Processing Systems, volume 24, 2011.

  4. [4] L. Bottou, F. E. Curtis, and J. Nocedal. Optimization Methods for Large-Scale Machine Learning. SIAM Review, 60(2):223–311, Jan. 2018.

  5. [5] M. Bravo and R. Cominetti. Stochastic Fixed-Point Iterations for Nonexpansive Maps: Convergence and Error Bounds. SIAM Journal on Control and Optimization, 62(1):191–219, Feb. 2024.

  6. [6] M. Bravo and J. P. Contreras. Stochastic Halpern iteration in normed spaces and applications to reinforcement learning. Mathematical Programming, Mar. 2026.

  7. [7] F. E. Browder. Nonexpansive nonlinear operators in a Banach space. Proceedings of the National Academy of Sciences, 54(4):1041–1044, 1965.

  8. [8] D. Cortild, L. Ketels, J. Peypouquet, and G. Garrigos. Bias-Optimal Bounds for SGD: A Computer-Aided Lyapunov Analysis, May 2025. arXiv preprint arXiv:2505.17965.

  9. [9] D. Cortild and J. Peypouquet. Krasnoselskii–Mann Iterations: Inertia, Perturbations and Approximation. Journal of Optimization Theory and Applications, 204(2):35, Jan. 2025.

  10. [10] D. Davis and W. Yin. A Three-Operator Splitting Scheme and its Optimization Applications. Set-Valued and Variational Analysis, 25(4):829–858, Dec. 2017.

  11. [11] Q.-L. Dong, Y. J. Cho, S. He, P. M. Pardalos, and T. M. Rassias. The Krasnosel'skii-Mann Iterative Method: Recent Progress and Applications. Springer Cham, 2022.

  12. [12] F. Orabona. Almost Sure Convergence of SGD on Smooth Non-Convex Functions, Oct. 2020.

  13. [13] G. Garrigos, D. Cortild, L. Ketels, and J. Peypouquet. Last-Iterate Complexity of SGD for Convex and Smooth Stochastic Problems, July 2025. arXiv preprint arXiv:2507.14122.

  14. [14] G. Garrigos and R. M. Gower. Handbook of Convergence Theorems for (Stochastic) Gradient Methods. arXiv preprint arXiv:2301.11235.

  16. [16] R. Gower, O. Sebbouh, and N. Loizou. SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, Mar. 2021.

  17. [17] R. M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin, and P. Richtárik. SGD: General Analysis and Improved Rates. In Proceedings of the 36th International Conference on Machine Learning, May 2019.

  18. [18] C. W. Groetsch. A note on segmenting Mann iterates. Journal of Mathematical Analysis and Applications, 40(2):369–372, 1972.

  19. [19] H. Iiduka. Mini-Batch Stochastic Krasnosel'skii-Mann Algorithm for Nonexpansive Fixed Point Problems, Apr. 2026. arXiv preprint arXiv:2604.06909.

  20. [20] A. Khaled and P. Richtárik. Better Theory for SGD in the Nonconvex World. Transactions on Machine Learning Research, 2023.

  21. [21] M. A. Krasnosel'skii. Two remarks on the method of successive approximations. Uspekhi Matematicheskikh Nauk, 10(1(63)):123–127, 1955.

  22. [22] J. Liu and Y. Yuan. On Almost Sure Convergence Rates of Stochastic Gradient Methods. In Proceedings of the Thirty Fifth Conference on Learning Theory, June 2022.

  23. [23] W. R. Mann. Mean value methods in iteration. Proceedings of the American Mathematical Society, 4:506–510, 1953.

  24. [24] J. J. Maulén, I. Fierro, and J. Peypouquet. Inertial Krasnoselskii-Mann Iterations. Set-Valued and Variational Analysis, 32(2):10, 2024.

  25. [25] D. Needell, N. Srebro, and R. Ward. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Mathematical Programming, 155(1):549–573, Jan. 2016.

  26. [26] L. Nguyen, P. H. Nguyen, M. Dijk, P. Richtarik, K. Scheinberg, and M. Takac. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption. In Proceedings of the 35th International Conference on Machine Learning, pages 3750–3758, July 2018.

  27. [27] Z. Opial. Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bulletin of the American Mathematical Society, 73(4):591–597, 1967.

  28. [28] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. In Optimizing Methods in Statistics, pages 233–257. Elsevier, 1971.

  29. [29] H. H. Schaefer. Über die Methode sukzessiver Approximationen. Jahresbericht der Deutschen Mathematiker-Vereinigung, 59:131–140, 1957.

  30. [30] A. Yurtsever, A. Gu, and S. Sra. Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates. In Advances in Neural Information Processing Systems, volume 34, 2021.

  31. [31] A. Yurtsever, B. C. Vu, and V. Cevher. Stochastic Three-Composite Convex Minimization. In Advances in Neural Information Processing Systems, volume 29, 2016.