pith. sign in

arxiv: 2605.23488 · v1 · pith:CEID35RUnew · submitted 2026-05-22 · 🧮 math.OC

A Stochastic Implicit Proximal Point Algorithm for Solving Linearly Constrained Stochastic Minimax Problems

Pith reviewed 2026-05-25 04:33 UTC · model grok-4.3

classification 🧮 math.OC
keywords stochastic minimax optimizationproximal point algorithmvariance reductionsemismooth Newton methodlinear convergencelinearly constrained saddle-point problemsnonsmooth regularizers
0
0 comments X

The pith

The stochastic implicit proximal point algorithm with variance reduction solves linearly constrained stochastic minimax problems and achieves global linear convergence in expectation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a stochastic implicit proximal point algorithm that incorporates variance reduction and solves each update subproblem using semismooth Newton methods with Armijo line search. It targets large-scale minimax problems that include nonsmooth regularizers and coupling linear constraints, under the assumption that the objective is strongly convex and strongly concave. The central result establishes that the iterates converge globally and q-linearly to a saddle point while the multipliers converge r-linearly to the multiplier set, both in expectation. This matters for applications such as machine learning where such structured minimax problems appear at scale and prior methods struggle with step-size selection or efficiency.

Core claim

The paper claims that its stochastic implicit proximal point algorithm, which selects stochastic oracles either with or without replacement and solves the implicit proximal point subproblems via semismooth Newton methods with Armijo line search, produces global q-linear convergence of the primal-dual iterates to the saddle point together with global r-linear convergence of the multipliers to the multiplier set, all in expectation, when applied to strongly-convex-strongly-concave objectives that carry nonsmooth regularizers and linear coupling constraints.

What carries the argument

The implicit proximal point update subproblem, solved at each iteration by semismooth Newton methods with Armijo line search, together with variance reduction on the stochastic oracles.

If this is right

  • The algorithm handles nonsmooth regularizers and linear coupling constraints without requiring explicit smoothing or penalty reformulations.
  • Global q-linear convergence holds for the iterates to the saddle point in expectation.
  • Global r-linear convergence holds for the multipliers to the multiplier set in expectation.
  • Variance reduction allows the method to use either with-replacement or without-replacement sampling while preserving the linear rates.
  • Numerical tests on machine learning problems indicate improved computational efficiency and more robust step-size behavior than existing methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same proximal-point structure could be tested on problems whose strong-convexity parameters are unknown or time-varying.
  • The linear rates suggest that the method may scale to distributed settings where linear constraints represent consensus or resource-sharing conditions.
  • Because each subproblem is solved only to moderate accuracy by the semismooth Newton solver, the overall iteration count may remain modest even when the dimension grows.

Load-bearing premise

The objective function must be strongly convex and strongly concave.

What would settle it

Apply the algorithm to a concrete strongly-convex-strongly-concave minimax instance with known linear constraints and nonsmooth regularizers; if the observed convergence of iterates is slower than linear or fails to hold in expectation over repeated runs, the claimed rates are falsified.

Figures

Figures reproduced from arXiv: 2605.23488 by Jiani Wang, Kehan Zhu, Yu-Hong Dai.

Figure 1
Figure 1. Figure 1: Comparison of network flow performance under differ￾ent parameter settings 6.2. Linear regression. In this section, we consider the well-known linear regres￾sion problem [11] with joint linear constraints as follows (6.1) min x∈Rn max y∈Rm f(x, y) = 1 m " − 1 2 ∥y∥ 2 − b T y + 1 N X N i=1 y T Kix # + λ 2 ∥x∥ 2 subject to Ax + By + c = 0p, where we generate a comprehensive dataset of 500,000 random instance… view at source ↗
Figure 2
Figure 2. Figure 2: Performance comparison under different settings. Note that O represents the number of repeated experiments. (a) Comparison of step size (m = 15, b0 = 5) (b) Comparison of inner iterations (α = 5, b0 = 5) [PITH_FULL_IMAGE:figures/full_fig_p027_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Performance comparison under different settings . It is observed that excessive inner iterations do not strictly imply better overall efficiency due to the trade-off between calculation precision and computational cost. References 1. Aryeh Akhavan, Massimiliano Pontil, and Alexandre Tsybakov, Distributed zero-order op￾timization under adversarial noise, Advances in Neural Information Processing Systems, vo… view at source ↗
read the original abstract

This paper presents a novel approach to solving large-scale minimax problems with nonsmooth regularizers. We propose a stochastic implicit proximal point algorithm with variance reduction techniques where stochastic oracles are selected in two cases -- with or without replacement. The semismooth Newton methods with Armijo line search is used to solve the implicit proximal point update subproblem in each iteration. The algorithm efficiently handles the strongly-convex-strongly-concave objective function with nonsmooth regularizers and coupling linear equations, which is proved to exhibit global q-linear convergence of the iterations to the saddle point and global r-linearly convergence of the multipliers to the multiplier set in expectation. Numerical experiments on machine learning problems demonstrate the superiority of the proposed method over state-of-the-art algorithms in terms of both computational efficiency and selection of the step sizes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a stochastic implicit proximal point algorithm incorporating variance reduction (with or without replacement sampling) for linearly constrained stochastic minimax problems with nonsmooth regularizers. Semismooth Newton methods with Armijo line search are used to solve the implicit proximal subproblems at each iteration. Under the assumption that the objective is strongly convex-strongly concave, the algorithm is claimed to achieve global q-linear convergence of the iterates to the saddle point and global r-linear convergence of the multipliers to the multiplier set, both in expectation. Numerical experiments on machine learning problems are presented to show superiority over state-of-the-art methods in computational efficiency and step-size selection.

Significance. If the stated convergence guarantees can be rigorously established, the work would contribute a practically relevant method for large-scale stochastic minimax problems with linear coupling constraints, which appear frequently in machine learning. The combination of implicit proximal-point updates, variance reduction, and semismooth Newton subproblem solvers could address limitations in step-size selection and handling of nonsmooth terms that affect existing stochastic methods.

major comments (2)
  1. Abstract: the central claims of global q-linear convergence of the iterates and r-linear convergence of the multipliers in expectation are asserted without any indication of the proof strategy, error bounds, or the precise role of the variance-reduction sampling (with/without replacement). Because these rates are load-bearing for the paper's contribution, the absence of even a high-level proof outline prevents verification of the result from the provided information.
  2. Abstract: the claim that the algorithm 'efficiently handles' the problem class relies on the use of semismooth Newton methods for the proximal subproblems, yet no discussion is given of the conditions under which the Newton steps are well-defined or globally convergent when the regularizers are nonsmooth and the constraints are linear; this assumption appears load-bearing for both theory and implementation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and indicate planned revisions to the manuscript.

read point-by-point responses
  1. Referee: Abstract: the central claims of global q-linear convergence of the iterates and r-linear convergence of the multipliers in expectation are asserted without any indication of the proof strategy, error bounds, or the precise role of the variance-reduction sampling (with/without replacement). Because these rates are load-bearing for the paper's contribution, the absence of even a high-level proof outline prevents verification of the result from the provided information.

    Authors: The abstract provides a concise summary. The proof strategy, error bounds derived from strong convexity-strong concavity, and the precise role of variance reduction (with and without replacement) in controlling stochastic errors to obtain the linear rates in expectation are developed in detail in Section 3. We will revise the introduction to include a high-level proof outline for improved accessibility and verification. revision: yes

  2. Referee: Abstract: the claim that the algorithm 'efficiently handles' the problem class relies on the use of semismooth Newton methods for the proximal subproblems, yet no discussion is given of the conditions under which the Newton steps are well-defined or globally convergent when the regularizers are nonsmooth and the constraints are linear; this assumption appears load-bearing for both theory and implementation.

    Authors: We agree that the conditions for the semismooth Newton solver merit explicit discussion. Section 2 describes the proximal subproblem solver and notes that Armijo line search is used. The well-definedness of the steps follows from convexity of the regularizers together with the linear constraints, while global convergence of the Newton iterations holds under semismoothness. We will expand Section 2.3 with a dedicated paragraph stating these conditions and citing relevant results on semismooth Newton methods. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper proposes a stochastic implicit proximal point algorithm for linearly constrained stochastic minimax problems and states that global q-linear convergence of iterations and r-linear convergence of multipliers are proved under strong convexity-concavity with nonsmooth regularizers. These are standard convergence claims for proximal-point-type methods on strongly convex-concave problems, derived from the algorithm definition and semismooth Newton subproblem solves rather than by construction from fitted inputs or self-citations. No load-bearing self-citation chains, self-definitional reductions, or renamed empirical patterns appear in the abstract or description; the derivation chain remains self-contained against external benchmarks for such algorithms.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No specific free parameters, axioms, or invented entities can be identified from the abstract alone.

pith-pipeline@v0.9.0 · 5669 in / 1066 out tokens · 27048 ms · 2026-05-25T04:33:25.413478+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 1 internal anchor

  1. [1]

    34, 2021, pp

    Aryeh Akhavan, Massimiliano Pontil, and Alexandre Tsybakov,Distributed zero-order op- timization under adversarial noise, Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 10209–10220

  2. [2]

    2, Stanford University Press Stanford, 1958

    Kenneth Joseph Arrow, Leonid Hurwicz, Hirofumi Uzawa, Hollis Burnley Chenery, Selmer Johnson, and Samuel Karlin,Studies in linear and non-linear programming, vol. 2, Stanford University Press Stanford, 1958

  3. [3]

    Amir Beck,First-order methods in optimization, SIAM, 2017

  4. [4]

    J Fr´ ed´ eric Bonnans and Alexander Shapiro,Perturbation analysis of optimization problems, Springer Science & Business Media, 2013

  5. [5]

    1, 120–145

    Antonin Chambolle and Thomas Pock,A first-order primal-dual algorithm for convex prob- lems with applications to imaging, Journal of mathematical imaging and vision40(2011), no. 1, 120–145. 28 KEHAN ZHU, JIANI WANG, AND YU-HONG DAI

  6. [6]

    ,An introduction to continuous optimization for imaging, Acta Numerica25(2016), 161–319

  7. [7]

    1, 253–287

    ,On the ergodic convergence rates of a first-order primal–dual algorithm, Mathemat- ical Programming159(2016), no. 1, 253–287

  8. [8]

    4, 1779–1814

    Yunmei Chen, Guanghui Lan, and Yuyuan Ouyang,Optimal primal-dual methods for a class of saddle point problems, SIAM Journal on Optimization24(2014), no. 4, 1779–1814

  9. [9]

    3, 2883–2916

    Yu-Hong Dai, Jiani Wang, and Liwei Zhang,Optimality conditions and numerical algorithms for a class of linearly constrained minimax optimization problems, SIAM Journal on Opti- mization34(2024), no. 3, 2883–2916

  10. [10]

    Cong Dang and Guanghui Lan,Randomized first-order methods for saddle point optimization, arXiv preprint arXiv:1409.8625 (2014)

  11. [11]

    Simon S Du and Wei Hu,Linear convergence of the primal-dual gradient method for convex- concave saddle point problems without strong convexity, The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, 2019, pp. 196–205

  12. [12]

    Darina Dvinskikh, Vladislav Tominin, Yaroslav Tominin, and Alexander Gasnikov,Gradient- free optimization for non-smooth minimax problems with maximum value of adversarial noise, arXiv preprint arXiv:2202.06114 (2022)

  13. [13]

    1, 293–318

    Jonathan Eckstein and Dimitri P Bertsekas,On the douglas—rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical programming 55(1992), no. 1, 293–318

  14. [14]

    3, 131–137

    Francisco Facchinei,Minimization of sc1 functions and the maratos effect, Operations Re- search Letters17(1995), no. 3, 131–137

  15. [15]

    1765– 1773

    Xinzhe Fu and Eytan Modiano,Network interdiction using adversarial traffic flows, IEEE INFOCOM 2019-IEEE Conference on Computer Communications, IEEE, 2019, pp. 1765– 1773

  16. [16]

    Rafal Goebel and R Tyrrell Rockafellar,Local strong convexity and local lipschitz continuity of the gradient of convex functions, Journal of Convex Analysis15(2008), no. 2, 263

  17. [17]

    2, 622–637

    Deren Han, Defeng Sun, and Liwei Zhang,Linear rate convergence of the alternating direc- tion method of multipliers for convex composite programming, Mathematics of Operations Research43(2018), no. 2, 622–637

  18. [18]

    4, 2745–2772

    Kevin Huang and Shuzhong Zhang,New first-order algorithms for stochastic variational in- equalities, SIAM Journal on Optimization32(2022), no. 4, 2745–2772

  19. [19]

    Sashank J Reddi, Suvrit Sra, Barnabas Poczos, and Alexander J Smola,Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization, Advances in neural information processing systems29(2016)

  20. [20]

    1, 17–58

    Anatoli Juditsky, Arkadi Nemirovski, and Claire Tauvel,Solving variational inequalities with stochastic mirror-prox algorithm, Stochastic Systems1(2011), no. 1, 17–58

  21. [21]

    2, 242–246

    Jerzy Kyparisis,On uniqueness of kuhn-tucker multipliers in nonlinear programming, Math- ematical Programming32(1985), no. 2, 242–246

  22. [22]

    2, 1059–1091

    Ming-Jun Lai and Wotao Yin,Augmented\ell 1 and nuclear-norm models with a globally linearly convergent algorithm, SIAM Journal on Imaging Sciences6(2013), no. 2, 1059–1091

  23. [23]

    1, 433–458

    Xudong Li, Defeng Sun, and Kim-Chuan Toh,A highly efficient semismooth newton aug- mented lagrangian method for solving lasso problems, SIAM Journal on Optimization28 (2018), no. 1, 433–458

  24. [24]

    2738–2779

    Tianyi Lin, Chi Jin, and Michael I Jordan,Near-optimal algorithms for minimax optimization, Conference on learning theory, PMLR, 2020, pp. 2738–2779

  25. [25]

    15, 4051–4064

    Qing Ling, Wei Shi, Gang Wu, and Alejandro Ribeiro,Dlm: Decentralized linearized alter- nating direction method of multipliers, IEEE Transactions on Signal Processing63(2015), no. 15, 4051–4064

  26. [26]

    Sharon L Lohr,Sampling: design and analysis, Chapman and Hall/CRC, 2021

  27. [27]

    Luo Luo, Guangzeng Xie, Tong Zhang, and Zhihua Zhang,Near optimal stochastic al- gorithms for finite-sum unbalanced convex-concave minimax optimization, arXiv preprint arXiv:2106.01761 (2021)

  28. [28]

    1, 208–223

    Kaj Madsen and Hans Schjaer-Jacobsen,Linearly constrained minimax optimization, Math- ematical Programming14(1978), no. 1, 208–223

  29. [29]

    thesis, Technische Universit¨ at M¨ unchen, 2016

    Andre Milzarek,Numerical methods and second order theory for nonsmooth problems, Ph.D. thesis, Technische Universit¨ at M¨ unchen, 2016. A STOCHASTIC IMPLICIT PROXIMAL POINT ALGORITHM FOR SOLVING LINEARLY CONSTRAINED STOCHASTIC MINIMAX PROBLEMS 29

  30. [30]

    1, 1157–1185

    Andre Milzarek, Fabian Schaipp, and Michael Ulbrich,A semismooth newton stochastic prox- imal point algorithm with variance reduction, SIAM Journal on Optimization34(2024), no. 1, 1157–1185

  31. [31]

    1, 229–251

    Arkadi Nemirovski,Prox-method with rate of convergence o (1/t) for variational inequali- ties with lipschitz continuous monotone operators and smooth convex-concave saddle point problems, SIAM Journal on Optimization15(2004), no. 1, 229–251

  32. [32]

    4, 1574–1609

    Arkadi Nemirovski, Anatoli Juditsky, Guanghui Lan, and Alexander Shapiro,Robust stochas- tic approximation approach to stochastic programming, SIAM Journal on optimization19 (2009), no. 4, 1574–1609

  33. [33]

    1, 153–197

    MJD Powell,On the number of iterations of karmarkar’s algorithm for linear programming, Mathematical Programming62(1993), no. 1, 153–197

  34. [34]

    1, 353–367

    Liqun Qi and Jie Sun,A nonsmooth version of newton’s method, Mathematical programming 58(1993), no. 1, 353–367

  35. [35]

    2, 905–912

    Javier Salmeron, Kevin Wood, and Ross Baldick,Analysis of electric grid security under terrorist threat, IEEE Transactions on power systems19(2004), no. 2, 905–912

  36. [36]

    3, 523–542

    Alexander Shapiro and Anton Kleywegt,Minimax analysis of stochastic problems, Optimiza- tion Methods and Software17(2002), no. 3, 523–542

  37. [37]

    1949–1987

    J Cole Smith, Mike Prince, and Joseph Geunes,Modern network interdiction problems and algorithms, Handbook of combinatorial optimization, Springer, 2013, pp. 1949–1987

  38. [38]

    4, 2675–2702

    Ioannis Tsaknakis, Mingyi Hong, and Shuzhong Zhang,Minimax problems with coupled linear constraints: Computational complexity and duality, SIAM Journal on Optimization33(2023), no. 4, 2675–2702

  39. [39]

    Michael Ulbrich,Semismooth newton methods for variational inequalities and constrained optimization problems in function spaces, SIAM, 2011

  40. [40]

    1, 78–80

    Gerd Wachsmuth,On licq and the uniqueness of lagrange multipliers, Operations Research Letters41(2013), no. 1, 78–80

  41. [41]

    Wenhan Xian, Feihu Huang, Yanfu Zhang, and Heng Huang,A faster decentralized algorithm for nonconvex minimax problems, Advances in Neural Information Processing Systems34 (2021), 25865–25877

  42. [42]

    4, 2057–2075

    Lin Xiao and Tong Zhang,A proximal stochastic gradient method with progressive variance reduction, SIAM Journal on Optimization24(2014), no. 4, 2057–2075

  43. [43]

    Zi Xu, Jun Shen, Zhe Wang, and Yu-Hong Dai,Zeroth-order alternating randomized gradi- ent projection algorithms for general nonconvex-concave minimax problems, arXiv preprint arXiv:2108.00473 (2021)

  44. [44]

    1, 635–706

    Zi Xu, Hongchao Zhang, Yang Xu, and Guanghui Lan,A unified single-loop alternating gra- dient projection algorithm for nonconvex-concave and convex-nonconcave minimax problems, Mathematical Programming201(2023), no. 1, 635–706

  45. [45]

    3, 331–366

    Liuqin Yang, Defeng Sun, and Kim-Chuan Toh,Sdpnal+: a majorized semismooth newton- cg augmented lagrangian method for semidefinite programming with nonnegative constraints, Mathematical Programming Computation7(2015), no. 3, 331–366

  46. [46]

    2, 1443–1473

    Renbo Zhao,Accelerated stochastic algorithms for convex-concave saddle-point problems, Mathematics of Operations Research47(2022), no. 2, 1443–1473

  47. [47]

    4, 1737–1765

    Xin-Yuan Zhao, Defeng Sun, and Kim-Chuan Toh,A newton-cg augmented lagrangian method for semidefinite programming, SIAM Journal on Optimization20(2010), no. 4, 1737–1765. School of Mathematical Sciences, Beijing University of Posts and Telecommunica- tions, Beijing, China Email address:zhukehan@bupt.edu.cn School of Mathematical Sciences & Key Laboratory ...