A Stochastic Implicit Proximal Point Algorithm for Solving Linearly Constrained Stochastic Minimax Problems
Pith reviewed 2026-05-25 04:33 UTC · model grok-4.3
The pith
The stochastic implicit proximal point algorithm with variance reduction solves linearly constrained stochastic minimax problems and achieves global linear convergence in expectation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that its stochastic implicit proximal point algorithm, which selects stochastic oracles either with or without replacement and solves the implicit proximal point subproblems via semismooth Newton methods with Armijo line search, produces global q-linear convergence of the primal-dual iterates to the saddle point together with global r-linear convergence of the multipliers to the multiplier set, all in expectation, when applied to strongly-convex-strongly-concave objectives that carry nonsmooth regularizers and linear coupling constraints.
What carries the argument
The implicit proximal point update subproblem, solved at each iteration by semismooth Newton methods with Armijo line search, together with variance reduction on the stochastic oracles.
If this is right
- The algorithm handles nonsmooth regularizers and linear coupling constraints without requiring explicit smoothing or penalty reformulations.
- Global q-linear convergence holds for the iterates to the saddle point in expectation.
- Global r-linear convergence holds for the multipliers to the multiplier set in expectation.
- Variance reduction allows the method to use either with-replacement or without-replacement sampling while preserving the linear rates.
- Numerical tests on machine learning problems indicate improved computational efficiency and more robust step-size behavior than existing methods.
Where Pith is reading between the lines
- The same proximal-point structure could be tested on problems whose strong-convexity parameters are unknown or time-varying.
- The linear rates suggest that the method may scale to distributed settings where linear constraints represent consensus or resource-sharing conditions.
- Because each subproblem is solved only to moderate accuracy by the semismooth Newton solver, the overall iteration count may remain modest even when the dimension grows.
Load-bearing premise
The objective function must be strongly convex and strongly concave.
What would settle it
Apply the algorithm to a concrete strongly-convex-strongly-concave minimax instance with known linear constraints and nonsmooth regularizers; if the observed convergence of iterates is slower than linear or fails to hold in expectation over repeated runs, the claimed rates are falsified.
Figures
read the original abstract
This paper presents a novel approach to solving large-scale minimax problems with nonsmooth regularizers. We propose a stochastic implicit proximal point algorithm with variance reduction techniques where stochastic oracles are selected in two cases -- with or without replacement. The semismooth Newton methods with Armijo line search is used to solve the implicit proximal point update subproblem in each iteration. The algorithm efficiently handles the strongly-convex-strongly-concave objective function with nonsmooth regularizers and coupling linear equations, which is proved to exhibit global q-linear convergence of the iterations to the saddle point and global r-linearly convergence of the multipliers to the multiplier set in expectation. Numerical experiments on machine learning problems demonstrate the superiority of the proposed method over state-of-the-art algorithms in terms of both computational efficiency and selection of the step sizes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a stochastic implicit proximal point algorithm incorporating variance reduction (with or without replacement sampling) for linearly constrained stochastic minimax problems with nonsmooth regularizers. Semismooth Newton methods with Armijo line search are used to solve the implicit proximal subproblems at each iteration. Under the assumption that the objective is strongly convex-strongly concave, the algorithm is claimed to achieve global q-linear convergence of the iterates to the saddle point and global r-linear convergence of the multipliers to the multiplier set, both in expectation. Numerical experiments on machine learning problems are presented to show superiority over state-of-the-art methods in computational efficiency and step-size selection.
Significance. If the stated convergence guarantees can be rigorously established, the work would contribute a practically relevant method for large-scale stochastic minimax problems with linear coupling constraints, which appear frequently in machine learning. The combination of implicit proximal-point updates, variance reduction, and semismooth Newton subproblem solvers could address limitations in step-size selection and handling of nonsmooth terms that affect existing stochastic methods.
major comments (2)
- Abstract: the central claims of global q-linear convergence of the iterates and r-linear convergence of the multipliers in expectation are asserted without any indication of the proof strategy, error bounds, or the precise role of the variance-reduction sampling (with/without replacement). Because these rates are load-bearing for the paper's contribution, the absence of even a high-level proof outline prevents verification of the result from the provided information.
- Abstract: the claim that the algorithm 'efficiently handles' the problem class relies on the use of semismooth Newton methods for the proximal subproblems, yet no discussion is given of the conditions under which the Newton steps are well-defined or globally convergent when the regularizers are nonsmooth and the constraints are linear; this assumption appears load-bearing for both theory and implementation.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major comment below and indicate planned revisions to the manuscript.
read point-by-point responses
-
Referee: Abstract: the central claims of global q-linear convergence of the iterates and r-linear convergence of the multipliers in expectation are asserted without any indication of the proof strategy, error bounds, or the precise role of the variance-reduction sampling (with/without replacement). Because these rates are load-bearing for the paper's contribution, the absence of even a high-level proof outline prevents verification of the result from the provided information.
Authors: The abstract provides a concise summary. The proof strategy, error bounds derived from strong convexity-strong concavity, and the precise role of variance reduction (with and without replacement) in controlling stochastic errors to obtain the linear rates in expectation are developed in detail in Section 3. We will revise the introduction to include a high-level proof outline for improved accessibility and verification. revision: yes
-
Referee: Abstract: the claim that the algorithm 'efficiently handles' the problem class relies on the use of semismooth Newton methods for the proximal subproblems, yet no discussion is given of the conditions under which the Newton steps are well-defined or globally convergent when the regularizers are nonsmooth and the constraints are linear; this assumption appears load-bearing for both theory and implementation.
Authors: We agree that the conditions for the semismooth Newton solver merit explicit discussion. Section 2 describes the proximal subproblem solver and notes that Armijo line search is used. The well-definedness of the steps follows from convexity of the regularizers together with the linear constraints, while global convergence of the Newton iterations holds under semismoothness. We will expand Section 2.3 with a dedicated paragraph stating these conditions and citing relevant results on semismooth Newton methods. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper proposes a stochastic implicit proximal point algorithm for linearly constrained stochastic minimax problems and states that global q-linear convergence of iterations and r-linear convergence of multipliers are proved under strong convexity-concavity with nonsmooth regularizers. These are standard convergence claims for proximal-point-type methods on strongly convex-concave problems, derived from the algorithm definition and semismooth Newton subproblem solves rather than by construction from fitted inputs or self-citations. No load-bearing self-citation chains, self-definitional reductions, or renamed empirical patterns appear in the abstract or description; the derivation chain remains self-contained against external benchmarks for such algorithms.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Aryeh Akhavan, Massimiliano Pontil, and Alexandre Tsybakov,Distributed zero-order op- timization under adversarial noise, Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 10209–10220
work page 2021
-
[2]
2, Stanford University Press Stanford, 1958
Kenneth Joseph Arrow, Leonid Hurwicz, Hirofumi Uzawa, Hollis Burnley Chenery, Selmer Johnson, and Samuel Karlin,Studies in linear and non-linear programming, vol. 2, Stanford University Press Stanford, 1958
work page 1958
-
[3]
Amir Beck,First-order methods in optimization, SIAM, 2017
work page 2017
-
[4]
J Fr´ ed´ eric Bonnans and Alexander Shapiro,Perturbation analysis of optimization problems, Springer Science & Business Media, 2013
work page 2013
-
[5]
Antonin Chambolle and Thomas Pock,A first-order primal-dual algorithm for convex prob- lems with applications to imaging, Journal of mathematical imaging and vision40(2011), no. 1, 120–145. 28 KEHAN ZHU, JIANI WANG, AND YU-HONG DAI
work page 2011
-
[6]
,An introduction to continuous optimization for imaging, Acta Numerica25(2016), 161–319
work page 2016
-
[7]
,On the ergodic convergence rates of a first-order primal–dual algorithm, Mathemat- ical Programming159(2016), no. 1, 253–287
work page 2016
-
[8]
Yunmei Chen, Guanghui Lan, and Yuyuan Ouyang,Optimal primal-dual methods for a class of saddle point problems, SIAM Journal on Optimization24(2014), no. 4, 1779–1814
work page 2014
-
[9]
Yu-Hong Dai, Jiani Wang, and Liwei Zhang,Optimality conditions and numerical algorithms for a class of linearly constrained minimax optimization problems, SIAM Journal on Opti- mization34(2024), no. 3, 2883–2916
work page 2024
-
[10]
Cong Dang and Guanghui Lan,Randomized first-order methods for saddle point optimization, arXiv preprint arXiv:1409.8625 (2014)
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[11]
Simon S Du and Wei Hu,Linear convergence of the primal-dual gradient method for convex- concave saddle point problems without strong convexity, The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, 2019, pp. 196–205
work page 2019
- [12]
-
[13]
Jonathan Eckstein and Dimitri P Bertsekas,On the douglas—rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical programming 55(1992), no. 1, 293–318
work page 1992
-
[14]
Francisco Facchinei,Minimization of sc1 functions and the maratos effect, Operations Re- search Letters17(1995), no. 3, 131–137
work page 1995
-
[15]
Xinzhe Fu and Eytan Modiano,Network interdiction using adversarial traffic flows, IEEE INFOCOM 2019-IEEE Conference on Computer Communications, IEEE, 2019, pp. 1765– 1773
work page 2019
-
[16]
Rafal Goebel and R Tyrrell Rockafellar,Local strong convexity and local lipschitz continuity of the gradient of convex functions, Journal of Convex Analysis15(2008), no. 2, 263
work page 2008
-
[17]
Deren Han, Defeng Sun, and Liwei Zhang,Linear rate convergence of the alternating direc- tion method of multipliers for convex composite programming, Mathematics of Operations Research43(2018), no. 2, 622–637
work page 2018
-
[18]
Kevin Huang and Shuzhong Zhang,New first-order algorithms for stochastic variational in- equalities, SIAM Journal on Optimization32(2022), no. 4, 2745–2772
work page 2022
-
[19]
Sashank J Reddi, Suvrit Sra, Barnabas Poczos, and Alexander J Smola,Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization, Advances in neural information processing systems29(2016)
work page 2016
- [20]
-
[21]
Jerzy Kyparisis,On uniqueness of kuhn-tucker multipliers in nonlinear programming, Math- ematical Programming32(1985), no. 2, 242–246
work page 1985
-
[22]
Ming-Jun Lai and Wotao Yin,Augmented\ell 1 and nuclear-norm models with a globally linearly convergent algorithm, SIAM Journal on Imaging Sciences6(2013), no. 2, 1059–1091
work page 2013
-
[23]
Xudong Li, Defeng Sun, and Kim-Chuan Toh,A highly efficient semismooth newton aug- mented lagrangian method for solving lasso problems, SIAM Journal on Optimization28 (2018), no. 1, 433–458
work page 2018
- [24]
-
[25]
Qing Ling, Wei Shi, Gang Wu, and Alejandro Ribeiro,Dlm: Decentralized linearized alter- nating direction method of multipliers, IEEE Transactions on Signal Processing63(2015), no. 15, 4051–4064
work page 2015
-
[26]
Sharon L Lohr,Sampling: design and analysis, Chapman and Hall/CRC, 2021
work page 2021
- [27]
-
[28]
Kaj Madsen and Hans Schjaer-Jacobsen,Linearly constrained minimax optimization, Math- ematical Programming14(1978), no. 1, 208–223
work page 1978
-
[29]
thesis, Technische Universit¨ at M¨ unchen, 2016
Andre Milzarek,Numerical methods and second order theory for nonsmooth problems, Ph.D. thesis, Technische Universit¨ at M¨ unchen, 2016. A STOCHASTIC IMPLICIT PROXIMAL POINT ALGORITHM FOR SOLVING LINEARLY CONSTRAINED STOCHASTIC MINIMAX PROBLEMS 29
work page 2016
-
[30]
Andre Milzarek, Fabian Schaipp, and Michael Ulbrich,A semismooth newton stochastic prox- imal point algorithm with variance reduction, SIAM Journal on Optimization34(2024), no. 1, 1157–1185
work page 2024
-
[31]
Arkadi Nemirovski,Prox-method with rate of convergence o (1/t) for variational inequali- ties with lipschitz continuous monotone operators and smooth convex-concave saddle point problems, SIAM Journal on Optimization15(2004), no. 1, 229–251
work page 2004
-
[32]
Arkadi Nemirovski, Anatoli Juditsky, Guanghui Lan, and Alexander Shapiro,Robust stochas- tic approximation approach to stochastic programming, SIAM Journal on optimization19 (2009), no. 4, 1574–1609
work page 2009
-
[33]
MJD Powell,On the number of iterations of karmarkar’s algorithm for linear programming, Mathematical Programming62(1993), no. 1, 153–197
work page 1993
-
[34]
Liqun Qi and Jie Sun,A nonsmooth version of newton’s method, Mathematical programming 58(1993), no. 1, 353–367
work page 1993
-
[35]
Javier Salmeron, Kevin Wood, and Ross Baldick,Analysis of electric grid security under terrorist threat, IEEE Transactions on power systems19(2004), no. 2, 905–912
work page 2004
-
[36]
Alexander Shapiro and Anton Kleywegt,Minimax analysis of stochastic problems, Optimiza- tion Methods and Software17(2002), no. 3, 523–542
work page 2002
- [37]
-
[38]
Ioannis Tsaknakis, Mingyi Hong, and Shuzhong Zhang,Minimax problems with coupled linear constraints: Computational complexity and duality, SIAM Journal on Optimization33(2023), no. 4, 2675–2702
work page 2023
-
[39]
Michael Ulbrich,Semismooth newton methods for variational inequalities and constrained optimization problems in function spaces, SIAM, 2011
work page 2011
- [40]
-
[41]
Wenhan Xian, Feihu Huang, Yanfu Zhang, and Heng Huang,A faster decentralized algorithm for nonconvex minimax problems, Advances in Neural Information Processing Systems34 (2021), 25865–25877
work page 2021
-
[42]
Lin Xiao and Tong Zhang,A proximal stochastic gradient method with progressive variance reduction, SIAM Journal on Optimization24(2014), no. 4, 2057–2075
work page 2014
- [43]
-
[44]
Zi Xu, Hongchao Zhang, Yang Xu, and Guanghui Lan,A unified single-loop alternating gra- dient projection algorithm for nonconvex-concave and convex-nonconcave minimax problems, Mathematical Programming201(2023), no. 1, 635–706
work page 2023
-
[45]
Liuqin Yang, Defeng Sun, and Kim-Chuan Toh,Sdpnal+: a majorized semismooth newton- cg augmented lagrangian method for semidefinite programming with nonnegative constraints, Mathematical Programming Computation7(2015), no. 3, 331–366
work page 2015
-
[46]
Renbo Zhao,Accelerated stochastic algorithms for convex-concave saddle-point problems, Mathematics of Operations Research47(2022), no. 2, 1443–1473
work page 2022
-
[47]
Xin-Yuan Zhao, Defeng Sun, and Kim-Chuan Toh,A newton-cg augmented lagrangian method for semidefinite programming, SIAM Journal on Optimization20(2010), no. 4, 1737–1765. School of Mathematical Sciences, Beijing University of Posts and Telecommunica- tions, Beijing, China Email address:zhukehan@bupt.edu.cn School of Mathematical Sciences & Key Laboratory ...
work page 2010
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.