Pith · machine review for the scientific record

arxiv: 2605.07165 · v1 · submitted 2026-05-08 · 🧮 math.OC

Recognition: 2 Lean theorem links

Prox-PEP: A Proximal Partial Exact Penalty Algorithm for Weakly Convex Stochastic Nonlinear Programming

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:50 UTC · model grok-4.3

classification 🧮 math.OC
keywords: proximal algorithm · exact penalty · weakly convex · stochastic optimization · nonlinear programming · KKT stationarity · oracle complexity · Moreau envelope

The pith

Prox-PEP achieves O(T^{-1/4}) expected complexity for approximate KKT points in weakly convex stochastic nonlinear programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Prox-PEP, a proximal partial exact penalty method for stochastic optimization problems where the objective and constraints are weakly convex. It transforms equality constraints into inequalities using auxiliary slack variables and builds quadratic approximations for efficient subproblem solving. The design ensures subproblems are strongly convex and uses a dynamic penalty parameter that increases to a fixed threshold. This framework delivers an O(T^{-1/4}) average expected oracle complexity for reaching epsilon-KKT stationarity, along with high-probability bounds under light-tailed noise assumptions.
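To make the mechanics concrete, the loop below sketches the flavor of such a scheme on a toy one-dimensional problem. Everything here is invented for illustration (the objective, the linear penalty schedule, and all step parameters); it is not the paper's actual construction. Linearizing the smooth part while keeping the exact penalty rho*|x - 1| yields a strongly convex subproblem that soft-thresholding solves in closed form:

```python
import math
import random

def soft(z, t):
    """Soft-thresholding: the prox operator of t*|.|."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def prox_pep_1d(T=500, mu=10.0, rho0=0.5, drho=0.1, rho_max=2.0, seed=0):
    """Toy 1-D sketch (hypothetical parameters, not the paper's algorithm):
    minimize f(x) = 0.5*(x - 2)^2 subject to x = 1, via an exact penalty
    rho*|x - 1| whose weight rises to a fixed threshold and then holds.
    Each subproblem  min_x  g*(x - x_k) + rho*|x - 1| + (mu/2)*(x - x_k)^2
    is strongly convex and solved in closed form by soft-thresholding."""
    rng = random.Random(seed)
    x = 0.0
    for k in range(T):
        rho = min(rho0 + k * drho, rho_max)     # dynamic penalty: increase, then hold
        g = (x - 2.0) + 0.01 * rng.gauss(0, 1)  # stochastic gradient of the objective
        y = x - g / mu                          # step on the linearized smooth part
        x = 1.0 + soft(y - 1.0, rho / mu)       # prox of (rho/mu)*|. - 1| at y
    return x
```

With rho below 1 the penalized minimizer sits at 2 - rho, so the iterates reach the feasible point x = 1 only after the penalty climbs past the multiplier magnitude; this mirrors the role the threshold plays in the paper's schedule.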

Core claim

Prox-PEP is a proximal method with quadratic subproblems and an exact penalty approach for handling nonlinear equality constraints in weakly convex stochastic nonlinear programming. By designing second-order approximation matrices to guarantee strong convexity and employing a monotonically increasing then constant equality penalty parameter, the method establishes O(T^{-1/4}) average expected oracle complexity for epsilon-KKT stationarity, bounding the squared gradient norm of the Lagrangian's Moreau envelope, constraint violations, and complementarity. Under light-tailed martingale noise, it further yields an O(T^{-1/8}) high-probability bound on the gradient norm of the Moreau envelope and O(T^{-1/4}) high-probability bounds on constraint violations and complementarity conditions.

What carries the argument

The quadratic approximation matrices within the augmented Lagrangian subproblems, chosen to make each subproblem strongly convex, combined with the dynamic equality penalty parameter strategy.

If this is right

  • The method supplies a concrete algorithm for finding approximate stationary points in stochastic nonlinear programs under weak convexity.
  • Both average expected and high-probability convergence guarantees become available for the gradient of the Lagrangian Moreau envelope and for constraint violation measures.
  • The dynamic penalty schedule keeps the equality penalty bounded while still driving constraint satisfaction.
  • Strong convexity of every quadratic subproblem is guaranteed by the choice of approximation matrices, enabling reliable numerical solution at each step.
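The last point follows from a generic recipe (shown here as an illustration of the idea, not the paper's specific matrix design): shifting a possibly indefinite symmetric model Hessian H by tau*I, with tau chosen from its smallest eigenvalue, makes the quadratic subproblem mu-strongly convex.

```python
import numpy as np

# Generic regularization recipe (illustration only, not the paper's design):
# shift a possibly indefinite symmetric model Hessian H so that the
# quadratic subproblem becomes mu-strongly convex.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
H = (A + A.T) / 2.0                     # symmetric, generally indefinite
mu = 0.1                                # target strong-convexity modulus
lam_min = float(np.linalg.eigvalsh(H).min())
tau = max(0.0, -lam_min) + mu           # shift size: tau + lam_min >= mu
H_reg = H + tau * np.eye(5)             # now H_reg >= mu * I
```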

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Moreau-envelope stationarity measure could serve as a practical stopping criterion in implementations even when the original problem is nonsmooth.
  • Similar quadratic-approximation and dynamic-penalty ideas may transfer to deterministic weakly convex problems to obtain comparable rates without stochastic assumptions.
  • The O(T^{-1/8}) high-probability result suggests the algorithm could be useful in settings where tail bounds on noise are available but variance is not uniformly bounded.
  • Extensions that relax the light-tailed noise assumption while preserving the same rate would broaden applicability to heavier-tailed stochastic programs.
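On the first point, the Moreau-envelope measure is cheap to evaluate for simple problems. The sketch below uses a brute-force grid search for the prox and a made-up weakly convex function (purely illustrative, no machinery from the paper): the envelope gradient is (x - prox_{lam*phi}(x)) / lam, and a small norm certifies near-stationarity of phi even where phi is nonsmooth.

```python
import numpy as np

def moreau_grad(phi, x, lam=0.5):
    """Gradient of the Moreau envelope of phi at x, computed as
    (x - prox_{lam*phi}(x)) / lam, with the prox found by brute-force
    grid search (illustrative only; requires lam < 1/rho for a
    rho-weakly-convex phi so that the prox subproblem is convex)."""
    grid = np.linspace(x - 5.0, x + 5.0, 200001)
    y = grid[np.argmin(phi(grid) + (grid - x) ** 2 / (2.0 * lam))]
    return (x - y) / lam

# phi(z) = |z| - 0.2*z^2 is nonsmooth and 0.4-weakly convex; 0 is stationary.
phi = lambda z: np.abs(z) - 0.2 * z ** 2
```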

Load-bearing premise

The objective and constraint functions are weakly convex and the stochastic noise satisfies light-tailed martingale assumptions.

What would settle it

Running Prox-PEP on a weakly convex instance and observing that the squared Moreau-envelope gradient norm does not decay at the claimed O(T^{-1/4}) expected rate, or that high-probability bounds fail under light-tailed noise.
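One way to operationalize that test (a generic diagnostic, with synthetic data standing in for actual runs): record the stationarity measure at several horizons T and fit the slope of log(measure) against log(T). A decay of O(T^{-1/4}) corresponds to a slope near -0.25.

```python
import numpy as np

def fitted_rate(Ts, measures):
    """Least-squares slope of log(measure) vs log(T); for data decaying
    like T^(-1/4) the slope should come out close to -0.25."""
    slope, _ = np.polyfit(np.log(Ts), np.log(measures), 1)
    return float(slope)

# Synthetic sanity check on data that follows the claimed rate exactly:
Ts = np.array([1e2, 1e3, 1e4, 1e5])
measures = 3.0 * Ts ** -0.25
```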

read the original abstract

This paper considers stochastic optimization problems with weakly convex objective and constraint functions. We propose Prox-PEP, a proximal method equipped with quadratic subproblems. To handle nonlinear equality constraints, we employ an exact penalty approach, transforming them into inequality constraints with auxiliary slack variables. At each iteration, we construct quadratic approximations for both the objective and the constraint functions to facilitate efficient subproblem computation. By carefully designing the second-order approximation matrices, the subproblem constructed via the augmented Lagrangian function is strictly guaranteed to be strongly convex. Furthermore, we adopt a dynamic strategy for the equality penalty parameter: it monotonically increases up to a predefined threshold and remains constant thereafter. Building upon this algorithmic framework, we establish comprehensive asymptotic complexities. We prove that Prox-PEP achieves an $\mathcal{O}(T^{-1/4})$ average expected oracle complexity for $\epsilon$-KKT stationarity, specifically bounding the squared norm of the gradient of the Moreau envelope of the Lagrangian function, alongside constraint violations and complementarity conditions. Additionally, under standard light-tailed martingale noise assumptions, we derive an $\mathcal{O}(T^{-1/8})$ high-probability convergence bound for the norm of the gradient of the Lagrangian's Moreau envelope, as well as $\mathcal{O}(T^{-1/4})$ high-probability bounds for both constraint violations and complementarity conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. This paper proposes Prox-PEP, a proximal partial exact penalty algorithm for stochastic nonlinear programming problems with weakly convex objective and constraint functions. The algorithm constructs quadratic approximations to the functions at each iteration to form subproblems based on the augmented Lagrangian, with the approximation matrices designed to ensure the subproblems are strictly strongly convex. Equality constraints are handled by transforming them into inequalities using slack variables and applying a partial exact penalty approach where the penalty parameter increases monotonically to a fixed threshold. The main theoretical contribution is the establishment of asymptotic complexity bounds: an O(T^{-1/4}) average expected oracle complexity for achieving epsilon-KKT stationarity, measured by the squared norm of the gradient of the Moreau envelope of the Lagrangian, as well as bounds on constraint violations and complementarity conditions. Additionally, under light-tailed martingale noise assumptions, O(T^{-1/8}) high-probability bounds are derived for the gradient norm and O(T^{-1/4}) for the other measures.

Significance. If the results hold, this work is significant for providing complexity guarantees in stochastic weakly convex optimization with nonlinear equality constraints, a setting where standard convex methods do not apply and constraint handling is nontrivial. The explicit matrix design ensuring strong convexity of subproblems and the stabilized penalty strategy are constructive strengths that support practical implementation while enabling the analysis. The use of the Moreau envelope gradient norm as the stationarity measure is well-suited to the weakly convex regime and aligns with recent literature on nonconvex stochastic optimization.

major comments (2)
  1. [§3.2] Quadratic approximation construction: the claim that the chosen matrices guarantee strict strong convexity of the augmented-Lagrangian subproblem independent of the stochastic oracle calls must be shown with an explicit uniform lower bound on the strong-convexity modulus; without this, the descent lemma used in the complexity proof may not hold uniformly.
  2. [Theorem 4.1] Expected complexity: the ergodic bound on the squared Moreau-envelope gradient norm is derived under the assumption that the penalty parameter stabilizes after finitely many iterations; the analysis should explicitly bound the contribution of the transient phase where the penalty is still increasing, as this phase length depends on the initial penalty and the threshold.
minor comments (3)
  1. [Abstract] The abstract uses 'average expected oracle complexity' without clarifying whether the bound governs the ergodic average of the stationarity measure or an iterate sampled uniformly at random; a brief sentence in the introduction would remove the ambiguity.
  2. [§2] Notation for the weak-convexity modulus and the Lipschitz constants of the gradients should be introduced in §2 before they appear in the algorithm description and theorems.
  3. [§5] In the high-probability section, the dependence of the O(T^{-1/8}) bound on the failure probability delta should be stated explicitly (including any log(1/delta) factors) to allow direct comparison with other high-probability results in the literature.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment of significance, and constructive comments. We address each major comment below and will incorporate the suggested clarifications into the revised manuscript.

read point-by-point responses
  1. Referee: [§3.2] Quadratic approximation construction: the claim that the chosen matrices guarantee strict strong convexity of the augmented-Lagrangian subproblem independent of the stochastic oracle calls must be shown with an explicit uniform lower bound on the strong-convexity modulus; without this, the descent lemma used in the complexity proof may not hold uniformly.

    Authors: We appreciate this observation. Section 3.2 constructs the approximation matrices (incorporating the penalty threshold and a sufficiently large regularization term) so that the augmented-Lagrangian subproblem is strictly strongly convex with modulus independent of the stochastic oracle. While the manuscript states that strong convexity is guaranteed, we agree that an explicit uniform lower bound (e.g., in terms of the minimum eigenvalue of the designed matrix and the fixed penalty threshold) is not isolated as a separate lemma. In the revision we will add a short lemma deriving this uniform modulus explicitly from the matrix design, which will directly support the uniform descent lemma invoked in the complexity analysis. revision: yes

  2. Referee: [Theorem 4.1] Expected complexity: the ergodic bound on the squared Moreau-envelope gradient norm is derived under the assumption that the penalty parameter stabilizes after finitely many iterations; the analysis should explicitly bound the contribution of the transient phase where the penalty is still increasing, as this phase length depends on the initial penalty and the threshold.

    Authors: We agree that the transient phase merits explicit treatment. The algorithm increases the penalty parameter monotonically until it reaches a fixed threshold (chosen larger than any Lagrange multiplier bound), after which it remains constant; the number of transient iterations is therefore finite and bounded by (threshold minus initial value) divided by the increment size. The ergodic average in Theorem 4.1 already includes all iterations, but the proof currently relies on eventual stabilization without isolating the transient contribution. In the revision we will add a short paragraph bounding the transient phase separately: its total contribution to the summed stationarity measure is O(1) (independent of T), which is absorbed into the O(T^{-1/4}) ergodic rate without changing the asymptotic order. revision: yes
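The transient-length claim is simple arithmetic. Assuming a linear increment rule rho_k = min(rho0 + k*drho, rho_max) (an assumption for illustration; the paper's exact update rule is not quoted here), the number of increasing iterations is fixed independently of T:

```python
import math

def transient_length(rho0, rho_max, drho):
    """Iterations until a linearly increasing penalty reaches its threshold,
    under the assumed schedule rho_k = min(rho0 + k*drho, rho_max).
    Independent of the horizon T, hence an O(1) contribution overall."""
    return math.ceil((rho_max - rho0) / drho)
```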

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper constructs Prox-PEP via explicit algorithmic choices (quadratic approximations chosen to enforce strong convexity of augmented-Lagrangian subproblems, monotonic increase of the equality penalty to a fixed threshold, and slack-variable exact-penalty reformulation), then derives the stated O(T^{-1/4}) expected and O(T^{-1/8}) high-probability bounds on Moreau-envelope stationarity, constraint violation, and complementarity directly from those choices plus standard weak-convexity and light-tailed martingale assumptions. No load-bearing step reduces to a self-definition, a fitted parameter renamed as a prediction, or a self-citation chain; the complexity statements are obtained by standard analysis of the proximal updates and are not tautological with the inputs. The derivation is therefore self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on domain assumptions standard in stochastic optimization plus the specific algorithmic construction of Prox-PEP; no new entities are postulated and only one tunable threshold appears.

free parameters (1)
  • equality penalty threshold
    Predefined upper limit to which the dynamic penalty parameter increases and at which it then remains constant; required for the asymptotic analysis.
axioms (2)
  • domain assumption: Objective and constraint functions are weakly convex.
    Invoked throughout to justify the quadratic approximations and stationarity measures.
  • domain assumption: Stochastic noise satisfies light-tailed martingale assumptions.
    Used to obtain the high-probability convergence bounds.

pith-pipeline@v0.9.0 · 5539 in / 1552 out tokens · 52788 ms · 2026-05-11T01:50:56.676641+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

  1. A. Beck, First-Order Methods in Optimization, SIAM, Philadelphia, 2017.
  2. D. Boob, Q. Deng and G. Lan, Stochastic first-order methods for convex and nonconvex functional constrained optimization, Mathematical Programming, 197 (2023), 215–279.
  3. F. E. Curtis, M. J. O'Neill and D. P. Robinson, Worst-case complexity of an SQP method for nonlinear equality constrained stochastic optimization, Mathematical Programming, 205 (2024), 431–483.
  4. D. Davis and D. Drusvyatskiy, Stochastic model-based minimization of weakly convex functions, SIAM Journal on Optimization, 29:1 (2019), 207–239.
  5. D. Drusvyatskiy and A. S. Lewis, Error bounds, quadratic growth, and linear convergence of proximal methods, Mathematics of Operations Research, 43:3 (2018), 919–948.
  6. S. Ghadimi and G. Lan, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: a generic algorithmic framework, SIAM Journal on Optimization, 22 (2012), 1469–1492.
  7. S. Ghadimi and G. Lan, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms, SIAM Journal on Optimization, 23:4 (2013), 2061–2089.
  8. S. Ghadimi, G. Lan and H. Zhang, Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, Mathematical Programming, 155:1 (2016), 267–305.
  9. L. Jin and X. Wang, A stochastic primal-dual method for a class of nonconvex constrained optimization, Computational Optimization and Applications, 83 (2022), 143–180.
  10. G. Lan, First-Order and Stochastic Optimization Methods for Machine Learning, Springer Series in the Data Sciences, Springer, Cham, Switzerland, 2020.
  11. G. Lan, An optimal method for stochastic composite optimization, Mathematical Programming, 133:1 (2012), 365–397.
  12. G. Lan, A. Nemirovski and A. Shapiro, Validation analysis of mirror descent stochastic approximation method, Mathematical Programming, 134:2 (2012), 425–458.
  13. G. Lan and Z. Zhou, Algorithms for stochastic optimization with expectation constraints, Computational Optimization and Applications, 76 (2020), 461–498.
  14. Z. Li, P.-Y. Chen, S. Liu, S. Lu and Y. Xu, Stochastic inexact augmented Lagrangian method for nonconvex expectation constrained optimization, Computational Optimization and Applications, 87 (2024), 117–147.
  15. A. Nemirovski, A. Juditsky, G. Lan and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19 (2009), 1574–1609.
  16. B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, 30 (1992), 838–855.
  17. H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical Statistics, 22 (1951), 400–407.
  18. S. Shalev-Shwartz, Online learning and online convex optimization, Foundations and Trends in Machine Learning, 4:2 (2011), 107–194.
  19. Q. Shi, X. Wang and H. Wang, A momentum-based linearized augmented Lagrangian method for nonconvex constrained stochastic optimization, Mathematics of Operations Research, published online, 23 Jan 2025, https://doi.org/10.1287/moor.2022.0193.
  20. L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research, 11 (2010), 2543–2596.
  21. Y. Xu, Primal-dual stochastic gradient method for convex programs with many functional constraints, SIAM Journal on Optimization, 30:2 (2020), 1664–1692.
  22. H. Yu, M. J. Neely and X. Wei, Online convex optimization with stochastic constraints, arXiv:1708.03741v1, 2017.
  23. L. Zhang, Y. Zhang, X. Xiao and J. Wu, Stochastic approximation proximal method of multipliers for convex stochastic programming, Mathematics of Operations Research, 48:1 (2023), 177–193.