Recognition: 2 theorem links
Lean Theorem · Stochastic Krasnoselskii-Mann Iterations: Convergence without Uniformly Bounded Variance
Pith reviewed 2026-05-12 02:47 UTC · model grok-4.3
The pith
Stochastic Krasnoselskii-Mann iterations converge almost surely when variance is finite at only one fixed point.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For an expected nonexpansive operator on a real Hilbert space, the stochastic Krasnoselskii-Mann iteration driven by an oracle with variance finite at one fixed point yields almost sure weak convergence of the iterates to a fixed point. The expected residual converges at the optimal rate and the running minimum residual converges almost surely at a comparable rate, all without requiring the variance to be bounded uniformly.
What carries the argument
The Krasnoselskii-Mann iteration with a stochastic oracle whose variance is finite only at a single fixed point.
If this is right
- Almost sure weak convergence of the iterates to a fixed point holds.
- Rates are obtained for the expected residual of the iterates.
- Almost sure rates are obtained for the running minimum residual.
- Optimal stochastic oracle complexity is recovered for stochastic gradient descent.
- New convergence guarantees without uniform variance bounds are obtained for stochastic three-operator splitting and stochastic backward-forward splitting.
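The SGD connection above can be made concrete with a toy sketch (the problem, constants, and step sizes here are illustrative choices, not taken from the paper): for an L-smooth convex f, the gradient map T = I − γ∇f is nonexpansive for γ ≤ 2/L, and the stochastic KM update with oracle gradients collapses to SGD with effective step α_n γ.

```python
import numpy as np

# Toy least-squares loss 0.5*(a*x - a)^2 with minimizer x* = 1.
# T(x) = x - gamma * f'(x) is nonexpansive for gamma <= 2/L, and the KM step
#   x_{n+1} = (1 - alpha_n) x_n + alpha_n * (x_n - gamma * g_n)
# simplifies to SGD with effective step alpha_n * gamma.
rng = np.random.default_rng(1)
gamma, x = 0.5, 10.0
for n in range(50_000):
    a = rng.uniform(0.5, 1.5)              # random data sample
    g = a * (a * x - a)                    # stochastic gradient of 0.5*(a*x - a)^2
    alpha = 1.0 / np.sqrt(n + 2)           # KM relaxation parameters
    x = (1 - alpha) * x + alpha * (x - gamma * g)   # equals x - alpha*gamma*g
print(x)   # close to the minimizer x* = 1
```

Note that here the stochastic gradient is exactly zero at x* = 1, so the oracle has zero variance at the fixed point even though nothing bounds it globally.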
Where Pith is reading between the lines
- The single-point variance condition may apply to stochastic methods where noise intensity increases away from the solution.
- Analogous local variance assumptions could be tested on other stochastic fixed-point and splitting algorithms.
- Oracles could be engineered to keep variance controlled only in a neighborhood of the expected solution rather than globally.
- Numerical checks on problems with distance-dependent variance would provide direct evidence for the rates.
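The last point can be checked on a toy problem. The sketch below (a hypothetical setup, not from the paper) runs the stochastic KM iteration x_{n+1} = (1 − α_n) x_n + α_n (T(x_n) + noise), where the noise standard deviation grows linearly with the distance to the fixed point, so the variance is finite only at the fixed point itself.

```python
import numpy as np

def stochastic_km(x0, steps, rng):
    """Stochastic Krasnoselskii-Mann iteration on a toy 1-D problem.

    T(x) = x / 2 is nonexpansive with unique fixed point x* = 0.
    The oracle returns T(x) plus zero-mean noise whose standard deviation
    grows with |x - x*|: the variance is unbounded over the domain but
    vanishes at the fixed point, matching the single-point assumption.
    """
    x = x0
    res_min = abs(x - x / 2)                 # running minimum residual ||x - T(x)||
    for n in range(steps):
        alpha = 1.0 / (n + 2)                # vanishing, non-summable step sizes
        noise = 0.5 * abs(x) * rng.standard_normal()   # distance-dependent noise
        x = (1 - alpha) * x + alpha * (x / 2 + noise)
        res_min = min(res_min, abs(x - x / 2))
    return x, res_min

rng = np.random.default_rng(0)
x_final, res_min = stochastic_km(x0=5.0, steps=100_000, rng=rng)
print(abs(x_final), res_min)   # both should be small
```

On a typical run both the iterate and the running minimum residual shrink toward zero, consistent with the claimed almost sure convergence under distance-dependent variance.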
Load-bearing premise
The stochastic oracle has finite variance at one fixed point and the operator is nonexpansive in a Hilbert space.
What would settle it
A concrete nonexpansive operator and oracle with finite variance only at one fixed point for which the iterates fail to converge almost surely or the residual rates do not hold.
Original abstract
We investigate the Stochastic Krasnoselskii-Mann iterations for expected nonexpansive fixed-point problems in a real Hilbert space. We establish convergence guarantees under significantly weaker assumptions on the variance than those typically used in the literature. In particular, instead of a uniform bound on the variance of the stochastic oracle, we only assume finite variance at a single fixed point. Under this assumption, we prove almost sure weak convergence of the iterates, derive convergence rates for the expected residual, and obtain almost sure convergence rates for the running minimum residual. Notably, we recover the best-known stochastic oracle complexity without imposing uniformly bounded variance. We illustrate the applicability of our results to Stochastic Gradient Descent, where we recover known guarantees, and to Stochastic Three-Operator Splitting and Stochastic Backward-Forward Splitting, for which we obtain the first results that avoid uniform variance bounds.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates stochastic Krasnoselskii-Mann iterations for expected nonexpansive fixed-point problems in real Hilbert spaces. It claims to establish almost sure weak convergence of the iterates, convergence rates for the expected residual, and almost sure convergence rates for the running minimum residual, under the weaker assumption that the stochastic oracle has finite variance only at a single fixed point rather than uniformly bounded variance. The results are applied to stochastic gradient descent and certain operator splitting methods, recovering known guarantees and providing new ones.
Significance. If the proofs hold, this work significantly weakens the variance assumptions in stochastic fixed-point iterations, which could extend the applicability of these methods to a broader class of problems where variance is not uniformly bounded. The recovery of optimal stochastic oracle complexity without uniform bounds is a strong point, and the first results for stochastic three-operator splitting and backward-forward splitting under these conditions are valuable contributions.
major comments (2)
- [§2.2] §2.2 (variance assumption): The paper assumes only that the stochastic oracle has finite variance at one fixed point x*. Nonexpansiveness controls ||x_n - x*|| but does not control the variance of the oracle at other points x_n visited by the iteration. No growth condition, continuity of the variance map, or a-priori boundedness argument is indicated in the abstract or setup. This leaves the existence of E[||T(x_n) - x_n + noise||^2] (and thus the well-definedness of the iteration and all residual statements) unverified. The central claims on almost-sure weak convergence and expected-residual rates are load-bearing on this point.
- [Main convergence theorem] Main convergence theorem (likely Theorem 3.1 or §3): The proof of almost-sure weak convergence and the derivation of rates for the running minimum residual must explicitly establish that the stochastic term remains square-integrable along the trajectory. If the argument relies only on nonexpansiveness without additional justification, the statements become formally undefined when the variance grows away from x*.
minor comments (2)
- [Abstract] Abstract: The phrase 'recover the best-known stochastic oracle complexity' should cite the specific prior result (e.g., the reference for the O(1/ε) or O(1/√ε) bound being matched).
- [Notation] Notation: Ensure the stochastic oracle is denoted consistently (e.g., T_ξ or similar) across sections and that the fixed point is clearly labeled x* throughout.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. The points on the variance assumption and well-definedness of the iteration are well-taken and highlight areas where the manuscript can be clarified. We respond to each major comment below and will incorporate the necessary revisions.
Point-by-point responses
Referee: [§2.2] §2.2 (variance assumption): The paper assumes only that the stochastic oracle has finite variance at one fixed point x*. Nonexpansiveness controls ||x_n - x*|| but does not control the variance of the oracle at other points x_n visited by the iteration. No growth condition, continuity of the variance map, or a-priori boundedness argument is indicated in the abstract or setup. This leaves the existence of E[||T(x_n) - x_n + noise||^2] (and thus the well-definedness of the iteration and all residual statements) unverified. The central claims on almost-sure weak convergence and expected-residual rates are load-bearing on this point.
Authors: We agree that the setup requires explicit clarification on this point to avoid ambiguity. The model assumes the stochastic oracle produces a square-integrable output at every point in the domain (a standard modeling hypothesis for such iterations to be executable), while the novel contribution is the absence of a uniform bound on the variance. Nonexpansiveness is used to control the expected distance to x* and to derive summability of the residuals, but we acknowledge that this does not automatically guarantee finite variance at x_n without further regularity. In the revised manuscript we will add a dedicated paragraph in §2.2 stating the well-definedness assumption explicitly and noting that the finite-variance condition at x* suffices for the convergence analysis without needing a global bound. This is a clarification rather than a strengthening of the hypotheses. revision: yes
Referee: [Main convergence theorem] Main convergence theorem (likely Theorem 3.1 or §3): The proof of almost-sure weak convergence and the derivation of rates for the running minimum residual must explicitly establish that the stochastic term remains square-integrable along the trajectory. If the argument relies only on nonexpansiveness without additional justification, the statements become formally undefined when the variance grows away from x*.
Authors: We thank the referee for this precise observation. The proof of Theorem 3.1 applies a Robbins-Siegmund lemma to the sequence of squared distances to x*, where the variance contribution appears as a perturbation term whose conditional expectation is controlled using the finite-variance assumption at x* together with the nonexpansive property. However, to ensure all conditional expectations are rigorously defined, an explicit verification that the iterates remain square-integrable is needed. We will revise §3 to insert a preliminary step (by induction on the boundedness in expectation established via the nonexpansive fixed-point property) that confirms square-integrability of the stochastic term along the trajectory. This makes the argument self-contained without altering the stated assumptions or conclusions. revision: yes
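For reference, the Robbins-Siegmund lemma [28] invoked in this response, stated in its classical form (from the original 1971 result, not from the manuscript); in this setting one would take V_n = ‖x_n − x*‖² and let c_n carry the residual terms:

```latex
% Robbins–Siegmund lemma (Optimizing Methods in Statistics, 1971).
% (V_n), (a_n), (b_n), (c_n) nonnegative, F_n-measurable sequences.
\[
  \mathbb{E}\left[V_{n+1} \mid \mathcal{F}_n\right]
    \le (1 + a_n)\,V_n + b_n - c_n,
  \qquad \sum_n a_n < \infty,\quad \sum_n b_n < \infty \ \text{a.s.}
  \;\Longrightarrow\;
  V_n \to V_\infty < \infty \ \text{a.s.}
  \quad\text{and}\quad
  \sum_n c_n < \infty \ \text{a.s.}
\]
```

The referee's point is then precisely that each conditional expectation on the left must be well defined, which requires square-integrability of the iterates along the trajectory.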
Circularity Check
No circularity; derivation is self-contained under stated assumptions
full rationale
The paper derives almost-sure weak convergence and residual rates for the stochastic Krasnoselskii-Mann iteration from the nonexpansiveness of the operator, the Hilbert-space inner-product structure, and the single-point finite-variance assumption. All steps invoke standard martingale convergence arguments and expectation bounds that follow directly from the given conditions without any reduction to self-definition, parameter fitting, or load-bearing self-citation. The central claims therefore remain independent of the inputs by construction.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption: The mapping is nonexpansive.
- standard math: The space is a real Hilbert space.
- domain assumption: Finite variance of the stochastic oracle at the fixed point.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We only assume finite variance at a single fixed point... prove almost sure weak convergence... recover O(ε^{-4}) stochastic oracle complexity"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: Variance Transfer Lemma: E[‖(T_ξ − T)x‖²] ≤ 2σ_*² + 8‖x − p‖²
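The quoted Variance Transfer Lemma has the shape of a standard splitting bound. A plausible reconstruction (assuming, beyond what this page states, that each realization T_ξ is itself nonexpansive, with σ_*² := E‖(T_ξ − T)p‖² the variance at the fixed point p; the manuscript's actual proof may differ):

```latex
% Triangle inequality plus nonexpansiveness of T_xi and T:
\[
  \|(T_\xi - T)x\|
    \le \|(T_\xi - T)p\| + \|T_\xi x - T_\xi p\| + \|Tx - Tp\|
    \le \|(T_\xi - T)p\| + 2\,\|x - p\|,
\]
% and (a + b)^2 <= 2a^2 + 2b^2 then gives
\[
  \mathbb{E}\,\|(T_\xi - T)x\|^2
    \le 2\,\mathbb{E}\,\|(T_\xi - T)p\|^2 + 8\,\|x - p\|^2
    = 2\sigma_*^2 + 8\,\|x - p\|^2.
\]
```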
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Quadratic Objective Perturbation: Curvature-Based Differential Privacy
  QOP achieves (ε, δ)-differential privacy for ERM in the interpolation regime under weaker assumptions than linear objective perturbation by using random quadratic curvature to enforce stability and control sensitivity.
Reference graph
Works this paper leans on
- [1] A. Agarwal, M. J. Wainwright, P. Bartlett, and P. Ravikumar. Information-theoretic lower bounds on the oracle complexity of convex optimization. In Advances in Neural Information Processing Systems, volume 22, 2009.
- [2]
- [3] F. Bach and E. Moulines. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning. In Advances in Neural Information Processing Systems, volume 24, 2011.
- [4]
- [5] M. Bravo and R. Cominetti. Stochastic Fixed-Point Iterations for Nonexpansive Maps: Convergence and Error Bounds. SIAM Journal on Control and Optimization, 62(1):191–219, Feb. 2024.
- [6] M. Bravo and J. P. Contreras. Stochastic Halpern iteration in normed spaces and applications to reinforcement learning. Mathematical Programming, Mar. 2026.
- [7] F. E. Browder. Nonexpansive nonlinear operators in a Banach space. Proceedings of the National Academy of Sciences, 54(4):1041–1044, 1965.
- [8] D. Cortild, L. Ketels, J. Peypouquet, and G. Garrigos. Bias-Optimal Bounds for SGD: A Computer-Aided Lyapunov Analysis, May 2025. arXiv preprint arXiv:2505.17965.
- [9] D. Cortild and J. Peypouquet. Krasnoselskii–Mann Iterations: Inertia, Perturbations and Approximation. Journal of Optimization Theory and Applications, 204(2):35, Jan. 2025.
- [10] D. Davis and W. Yin. A Three-Operator Splitting Scheme and its Optimization Applications. Set-Valued and Variational Analysis, 25(4):829–858, Dec. 2017.
- [11] Q.-L. Dong, Y. J. Cho, S. He, P. M. Pardalos, and T. M. Rassias. The Krasnosel'skiĭ-Mann Iterative Method: Recent Progress and Applications. Springer Cham, 2022.
- [12] F. Orabona. Almost Sure Convergence of SGD on Smooth Non-Convex Functions, Oct. 2020.
- [13] G. Garrigos, D. Cortild, L. Ketels, and J. Peypouquet. Last-Iterate Complexity of SGD for Convex and Smooth Stochastic Problems, July 2025. arXiv preprint arXiv:2507.14122.
- [14] G. Garrigos and R. M. Gower. Handbook of Convergence Theorems for (Stochastic) Gradient Methods, Mar.
- [15]
- [16]
- [17] R. M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin, and P. Richtárik. SGD: General Analysis and Improved Rates. In Proceedings of the 36th International Conference on Machine Learning, May 2019.
- [18] C. W. Groetsch. A note on segmenting Mann iterates. Journal of Mathematical Analysis and Applications, 40(2):369–372, 1972.
- [19] H. Iiduka. Mini-Batch Stochastic Krasnosel'skiĭ-Mann Algorithm for Nonexpansive Fixed Point Problems, Apr. 2026. arXiv preprint arXiv:2604.06909.
- [20] A. Khaled and P. Richtárik. Better Theory for SGD in the Nonconvex World. Transactions on Machine Learning Research, 2023.
- [21] M. A. Krasnosel'skii. Two remarks on the method of successive approximations. Uspekhi Matematicheskikh Nauk, 10(1(63)):123–127, 1955.
- [22]
- [23] W. R. Mann. Mean value methods in iteration. Proceedings of the American Mathematical Society, 4:506–510, 1953.
- [24] J. J. Maulén, I. Fierro, and J. Peypouquet. Inertial Krasnoselskii-Mann Iterations. Set-Valued and Variational Analysis, 32(2):10, 2024.
- [25] D. Needell, N. Srebro, and R. Ward. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Mathematical Programming, 155(1):549–573, Jan. 2016.
- [26]
- [27] Z. Opial. Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bulletin of the American Mathematical Society, 73(4):591–597, 1967.
- [28] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. In Optimizing Methods in Statistics, pages 233–257. Elsevier, 1971.
- [29] H. H. Schaefer. Über die Methode sukzessiver Approximationen. Jahresbericht der Deutschen Mathematiker-Vereinigung, 59:131–140, 1957.
- [30] A. Yurtsever, A. Gu, and S. Sra. Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates. In Advances in Neural Information Processing Systems, volume 34, 2021.
- [31] A. Yurtsever, B. C. Vu, and V. Cevher. Stochastic Three-Composite Convex Minimization. In Advances in Neural Information Processing Systems, volume 29, 2016.