Recognition: 2 theorem links
Prox-PEP: A Proximal Partial Exact Penalty Algorithm for Weakly Convex Stochastic Nonlinear Programming
Pith reviewed 2026-05-11 01:50 UTC · model grok-4.3
The pith
Prox-PEP achieves O(T^{-1/4}) expected complexity for approximate KKT points in weakly convex stochastic nonlinear programs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Prox-PEP is a proximal method with strongly convex quadratic subproblems that handles nonlinear equality constraints through a partial exact penalty reformulation in weakly convex stochastic nonlinear programming. By designing second-order approximation matrices that guarantee strong convexity of each subproblem, and by using an equality penalty parameter that increases monotonically to a fixed threshold and then stays constant, the method achieves O(T^{-1/4}) average expected oracle complexity for epsilon-KKT stationarity, bounding the squared gradient norm of the Lagrangian's Moreau envelope together with constraint violations and complementarity. Under light-tailed martingale noise, it further yields an O(T^{-1/8}) high-probability bound on the gradient norm of the Moreau envelope, and O(T^{-1/4}) high-probability bounds on constraint violations and complementarity.
What carries the argument
The quadratic approximation matrices within the augmented Lagrangian subproblems, chosen to make each subproblem strongly convex, combined with the dynamic equality penalty parameter strategy.
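The stationarity measure that these choices are meant to control is the standard Moreau-envelope construction for weakly convex functions (as in Davis and Drusvyatskiy [4]). A sketch of the usual definitions, with generic parameters λ and ρ rather than the paper's exact constants:

```latex
% Moreau envelope of a \rho-weakly convex function f, with 0 < \lambda < 1/\rho:
\varphi_\lambda(x) \;=\; \min_{y} \Big\{ f(y) + \tfrac{1}{2\lambda}\,\|y - x\|^2 \Big\},
\qquad
\hat{x} \;=\; \operatorname{prox}_{\lambda f}(x)
\;=\; \operatorname{arg\,min}_{y} \Big\{ f(y) + \tfrac{1}{2\lambda}\,\|y - x\|^2 \Big\}.
% The stationarity measure is the envelope's gradient norm:
\|\nabla \varphi_\lambda(x)\| \;=\; \tfrac{1}{\lambda}\,\|x - \hat{x}\|.
```

A small gradient norm certifies that x lies close to the proximal point x̂, which is itself nearly stationary for f, so the measure is meaningful even when f is nonsmooth.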
If this is right
- The method supplies a concrete algorithm for finding approximate stationary points in stochastic nonlinear programs under weak convexity.
- Both average expected and high-probability convergence guarantees become available for the gradient of the Lagrangian Moreau envelope and for constraint violation measures.
- The dynamic penalty schedule keeps the equality penalty bounded while still driving constraint satisfaction.
- Strong convexity of every quadratic subproblem is guaranteed by the choice of approximation matrices, enabling reliable numerical solution at each step.
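As a concrete illustration of the last point, here is a minimal Python sketch, not the paper's construction, of how a possibly indefinite second-order approximation can be shifted so every quadratic subproblem has a uniform strong-convexity modulus mu. The function names and the eigenvalue-shift rule are illustrative assumptions:

```python
import numpy as np

def regularize(H, mu=1e-2):
    """Shift a (possibly indefinite) approximation matrix H so that its
    smallest eigenvalue is at least mu, yielding a uniformly strongly
    convex quadratic model.  Illustrative only; the paper designs its
    approximation matrices by a different rule."""
    lam_min = np.linalg.eigvalsh(H)[0]       # smallest eigenvalue of H
    shift = max(0.0, mu - lam_min)           # how far H falls short of mu
    return H + shift * np.eye(H.shape[0])

def quadratic_step(g, H, mu=1e-2):
    """Minimize the model g^T d + 0.5 d^T B d with B = regularize(H, mu).
    Strong convexity of B makes the minimizer unique: d = -B^{-1} g."""
    B = regularize(H, mu)
    return -np.linalg.solve(B, g)
```

Because the modulus mu is fixed in advance, every subproblem along the iteration is strongly convex with the same constant, which is the property the uniform descent analysis needs.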
Where Pith is reading between the lines
- The Moreau-envelope stationarity measure could serve as a practical stopping criterion in implementations even when the original problem is nonsmooth.
- Similar quadratic-approximation and dynamic-penalty ideas may transfer to deterministic weakly convex problems to obtain comparable rates without stochastic assumptions.
- The O(T^{-1/8}) high-probability result suggests the algorithm could be useful in settings where tail bounds on noise are available but variance is not uniformly bounded.
- Extensions that relax the light-tailed noise assumption while preserving the same rate would broaden applicability to heavier-tailed stochastic programs.
Load-bearing premise
The objective and constraint functions are weakly convex and the stochastic noise satisfies light-tailed martingale assumptions.
What would settle it
Running Prox-PEP on a weakly convex instance and observing that the squared Moreau-envelope gradient norm does not decay at the claimed O(T^{-1/4}) expected rate, or that high-probability bounds fail under light-tailed noise.
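One hedged way to run that check, assuming one logs the ergodic squared Moreau-envelope gradient norm at several horizons T, is a log-log slope fit; a fitted slope far from -1/4 would be evidence against the claimed rate. The function below is an illustrative sketch, not part of the paper:

```python
import numpy as np

def fitted_rate(T_values, measure_values):
    """Least-squares slope of log(measure) versus log(T).
    A slope near -0.25 is consistent with the claimed O(T^{-1/4})
    decay of the squared Moreau-envelope gradient norm."""
    logT = np.log(np.asarray(T_values, dtype=float))
    logm = np.log(np.asarray(measure_values, dtype=float))
    slope, _intercept = np.polyfit(logT, logm, 1)
    return slope
```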
Original abstract
This paper considers stochastic optimization problems with weakly convex objective and constraint functions. We propose Prox-PEP, a proximal method equipped with quadratic subproblems. To handle nonlinear equality constraints, we employ an exact penalty approach, transforming them into inequality constraints with auxiliary slack variables. At each iteration, we construct quadratic approximations for both the objective and the constraint functions to facilitate efficient subproblem computation. By carefully designing the second-order approximation matrices, the subproblem constructed via the augmented Lagrangian function is strictly guaranteed to be strongly convex. Furthermore, we adopt a dynamic strategy for the equality penalty parameter: it monotonically increases up to a predefined threshold and remains constant thereafter. Building upon this algorithmic framework, we establish comprehensive asymptotic complexities. We prove that Prox-PEP achieves an $\mathcal{O}(T^{-1/4})$ average expected oracle complexity for $\epsilon$-KKT stationarity, specifically bounding the squared norm of the gradient of the Moreau envelope of the Lagrangian function, alongside constraint violations and complementarity conditions. Additionally, under standard light-tailed martingale noise assumptions, we derive an $\mathcal{O}(T^{-1/8})$ high-probability convergence bound for the norm of the gradient of the Lagrangian's Moreau envelope, as well as $\mathcal{O}(T^{-1/4})$ high-probability bounds for both constraint violations and complementarity conditions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper proposes Prox-PEP, a proximal partial exact penalty algorithm for stochastic nonlinear programming problems with weakly convex objective and constraint functions. The algorithm constructs quadratic approximations to the functions at each iteration to form subproblems based on the augmented Lagrangian, with the approximation matrices designed to ensure the subproblems are strictly strongly convex. Equality constraints are handled by transforming them into inequalities using slack variables and applying a partial exact penalty approach where the penalty parameter increases monotonically to a fixed threshold. The main theoretical contribution is the establishment of asymptotic complexity bounds: an O(T^{-1/4}) average expected oracle complexity for achieving epsilon-KKT stationarity, measured by the squared norm of the gradient of the Moreau envelope of the Lagrangian, as well as bounds on constraint violations and complementarity conditions. Additionally, under light-tailed martingale noise assumptions, O(T^{-1/8}) high-probability bounds are derived for the gradient norm and O(T^{-1/4}) for the other measures.
Significance. If the results hold, this work is significant for providing complexity guarantees in stochastic weakly convex optimization with nonlinear equality constraints, a setting where standard convex methods do not apply and constraint handling is nontrivial. The explicit matrix design ensuring strong convexity of subproblems and the stabilized penalty strategy are constructive strengths that support practical implementation while enabling the analysis. The use of the Moreau envelope gradient norm as the stationarity measure is well-suited to the weakly convex regime and aligns with recent literature on nonconvex stochastic optimization.
major comments (2)
- [§3.2] §3.2, quadratic approximation construction: the claim that the chosen matrices guarantee strict strong convexity of the augmented-Lagrangian subproblem independent of the stochastic oracle calls must be shown with an explicit uniform lower bound on the strong-convexity modulus; without this, the descent lemma used in the complexity proof may not hold uniformly.
- [Theorem 4.1] Theorem 4.1 (expected complexity): the ergodic bound on the squared Moreau-envelope gradient norm is derived under the assumption that the penalty parameter stabilizes after finitely many iterations; the analysis should explicitly bound the contribution of the transient phase where the penalty is still increasing, as this phase length depends on the initial penalty and the threshold.
minor comments (3)
- [Abstract] The abstract uses 'average expected oracle complexity' without clarifying whether this is E[ (1/T) sum ... ] or (1/T) sum E[...]; a brief sentence in the introduction would remove ambiguity.
- [§2] Notation for the weak-convexity modulus and the Lipschitz constants of the gradients should be introduced in §2 before they appear in the algorithm description and theorems.
- [§5] In the high-probability section, the dependence of the O(T^{-1/8}) bound on the failure probability delta should be stated explicitly (including any log(1/delta) factors) to allow direct comparison with other high-probability results in the literature.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment of significance, and constructive comments. We address each major comment below and will incorporate the suggested clarifications into the revised manuscript.
Point-by-point responses
-
Referee: [§3.2] §3.2, quadratic approximation construction: the claim that the chosen matrices guarantee strict strong convexity of the augmented-Lagrangian subproblem independent of the stochastic oracle calls must be shown with an explicit uniform lower bound on the strong-convexity modulus; without this, the descent lemma used in the complexity proof may not hold uniformly.
Authors: We appreciate this observation. Section 3.2 constructs the approximation matrices (incorporating the penalty threshold and a sufficiently large regularization term) so that the augmented-Lagrangian subproblem is strictly strongly convex with modulus independent of the stochastic oracle. While the manuscript states that strong convexity is guaranteed, we agree that an explicit uniform lower bound (e.g., in terms of the minimum eigenvalue of the designed matrix and the fixed penalty threshold) is not isolated as a separate lemma. In the revision we will add a short lemma deriving this uniform modulus explicitly from the matrix design, which will directly support the uniform descent lemma invoked in the complexity analysis. revision: yes
-
Referee: [Theorem 4.1] Theorem 4.1 (expected complexity): the ergodic bound on the squared Moreau-envelope gradient norm is derived under the assumption that the penalty parameter stabilizes after finitely many iterations; the analysis should explicitly bound the contribution of the transient phase where the penalty is still increasing, as this phase length depends on the initial penalty and the threshold.
Authors: We agree that the transient phase merits explicit treatment. The algorithm increases the penalty parameter monotonically until it reaches a fixed threshold (chosen larger than any Lagrange multiplier bound), after which it remains constant; the number of transient iterations is therefore finite and bounded by (threshold minus initial value) divided by the increment size. The ergodic average in Theorem 4.1 already includes all iterations, but the proof currently relies on eventual stabilization without isolating the transient contribution. In the revision we will add a short paragraph bounding the transient phase separately: its total contribution to the summed stationarity measure is O(1) (independent of T), which is absorbed into the O(T^{-1/4}) ergodic rate without changing the asymptotic order. revision: yes
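The schedule the authors describe, monotone increase to a threshold followed by a constant phase, can be sketched as follows. The increment rule and the numerical values are illustrative assumptions, not the paper's exact update:

```python
def penalty_schedule(beta0, increment, threshold, T):
    """Increase the equality-penalty parameter by a fixed increment per
    iteration until it reaches `threshold`, then hold it constant.
    The transient phase lasts at most ceil((threshold - beta0)/increment)
    iterations, so its contribution to the summed stationarity measure
    is O(1), independent of T."""
    betas, beta = [], beta0
    for _ in range(T):
        betas.append(beta)
        beta = min(beta + increment, threshold)
    return betas
```

For example, `penalty_schedule(1.0, 0.5, 3.0, 10)` stabilizes at the threshold 3.0 after four increments, matching the finite-transient argument in the response above.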
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper constructs Prox-PEP via explicit algorithmic choices (quadratic approximations chosen to enforce strong convexity of augmented-Lagrangian subproblems, monotonic increase of the equality penalty to a fixed threshold, and slack-variable exact-penalty reformulation), then derives the stated O(T^{-1/4}) expected and O(T^{-1/8}) high-probability bounds on Moreau-envelope stationarity, constraint violation, and complementarity directly from those choices plus standard weak-convexity and light-tailed martingale assumptions. No load-bearing step reduces to a self-definition, a fitted parameter renamed as a prediction, or a self-citation chain; the complexity statements are obtained by standard analysis of the proximal updates and are not tautological with the inputs. The derivation is therefore self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- equality penalty threshold
axioms (2)
- domain assumption: Objective and constraint functions are weakly convex
- domain assumption: Stochastic noise satisfies light-tailed martingale assumptions
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "Prox-PEP achieves an O(T^{-1/4}) average expected oracle complexity for ε-KKT stationarity, bounding the squared norm of the gradient of the Moreau envelope of the Lagrangian"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "quadratic approximations ... augmented Lagrangian ... weakly convex objective and constraint functions"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] A. Beck, First-Order Methods in Optimization, SIAM, Philadelphia, 2017.
- [2] D. Boob, Q. Deng and G. Lan, Stochastic first-order methods for convex and nonconvex functional constrained optimization, Mathematical Programming, 197 (2023), 215–279.
- [3] F. E. Curtis, M. J. O'Neill and D. P. Robinson, Worst-case complexity of an SQP method for nonlinear equality constrained stochastic optimization, Mathematical Programming, 205 (2024), 431–483.
- [4] D. Davis and D. Drusvyatskiy, Stochastic model-based minimization of weakly convex functions, SIAM Journal on Optimization, 29:1 (2019), 207–239.
- [5] D. Drusvyatskiy and A. S. Lewis, Error bounds, quadratic growth, and linear convergence of proximal methods, Mathematics of Operations Research, 43:3 (2018), 919–948.
- [6] S. Ghadimi and G. Lan, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: a generic algorithmic framework, SIAM Journal on Optimization, 22 (2012), 1469–1492.
- [7] S. Ghadimi and G. Lan, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms, SIAM Journal on Optimization, 23:4 (2013), 2061–2089.
- [8] S. Ghadimi, G. Lan and H. Zhang, Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, Mathematical Programming, 155:1 (2016), 267–305.
- [9]
- [10] G. Lan, First-Order and Stochastic Optimization Methods for Machine Learning, Springer Series in the Data Sciences, Springer, Cham, Switzerland, 2020.
- [11] G. Lan, An optimal method for stochastic composite optimization, Mathematical Programming, 133:1 (2012), 365–397.
- [12] G. Lan, A. Nemirovski and A. Shapiro, Validation analysis of mirror descent stochastic approximation method, Mathematical Programming, 134:2 (2012), 425–458.
- [13]
- [14]
- [15] A. Nemirovski, A. Juditsky, G. Lan and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19 (2009), 1574–1609.
- [16] B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, 30 (1992), 838–855.
- [17] H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical Statistics, 22 (1951), 400–407.
- [18] S. Shalev-Shwartz, Online learning and online convex optimization, Foundations and Trends in Machine Learning, 4:2 (2011), 107–194.
- [19] Q. Shi, X. Wang and H. Wang, A momentum-based linearized augmented Lagrangian method for nonconvex constrained stochastic optimization, Mathematics of Operations Research, published online, 23 Jan 2025, https://doi.org/10.1287/moor.2022.0193.
- [20] L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research, 11 (2010), 2543–2596.
- [21] Y. Xu, Primal-dual stochastic gradient method for convex programs with many functional constraints, SIAM Journal on Optimization, 30:2 (2020), 1664–1692.
- [22]
- [23]