A Fletcher's Augmented Lagrangian-Based Stochastic First-Order Method for Nonconvex Equality-Constrained Optimization
Pith reviewed 2026-06-29 02:47 UTC · model grok-4.3
The pith
A stochastic first-order method using Fletcher's augmented Lagrangian reaches a stochastic ε-KKT point with expected oracle complexity O(ε^{-3}).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under an additional Lipschitz continuity assumption on the second-order derivatives of the objective and constraint functions, the proposed algorithm attains a stochastic ε-KKT point with an expected total oracle complexity of O(ε^{-3}) in terms of stochastic gradient and stochastic constraint function evaluations.
What carries the argument
Decomposed stochastic search direction combined with Fletcher's augmented Lagrangian merit function, controlled by an event decomposition on the smallest singular value of the stochastic Jacobian.
If this is right
- The method attains the stochastic ε-KKT point while keeping combined gradient and constraint evaluations at O(ε^{-3}).
- The event decomposition prevents loss of uniform Jacobian nondegeneracy from inflating the search-direction error.
- Step-size selection via the smooth merit function works with the stochastic first-order oracles.
- Numerical experiments are used to illustrate practical performance on the target problem class.
Where Pith is reading between the lines
- The same singular-value event split could be tested on problems whose Jacobians exhibit frequent near-singularity.
- Direct comparison of the observed rate against deterministic augmented-Lagrangian methods would quantify the extra stochastic cost.
- Variance-reduction techniques applied on top of the decomposed direction might improve the leading constant in the complexity bound.
Load-bearing premise
The second-order derivatives of the objective and constraints must be Lipschitz continuous so that the event decomposition can bound perturbations in the search direction.
What would settle it
A problem instance in which second derivatives are not Lipschitz continuous yet the observed total oracle count exceeds O(ε^{-3}) would falsify the complexity claim.
Figures
read the original abstract
In this paper, we study nonconvex equality-constrained optimization problems in which only stochastic first-order approximations of the objective and constraint functions are available. Owing to the stochasticity in both objective and constraints, most existing stochastic first-order methods incur relatively high oracle complexity, particularly in terms of stochastic constraint function evaluations. To address this issue, we develop a stochastic first-order method based on a decomposed stochastic search direction, and employ Fletcher's augmented Lagrangian as a smooth merit function for step-size selection. To cope with the possible loss of uniform nondegeneracy of the stochastic Jacobian, we introduce an event decomposition based on the smallest singular value, which enables us to control perturbations in the stochastic search direction. Under an additional Lipschitz continuity assumption on the second-order derivatives of the objective and constraint functions, we show that the proposed algorithm attains a stochastic \(\epsilon\)-KKT point with an expected total oracle complexity of \(\mathcal O(\epsilon^{-3})\) in terms of stochastic gradient and stochastic constraint function evaluations. Finally, we present numerical experiments to demonstrate the performance of the proposed method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a stochastic first-order method for nonconvex equality-constrained optimization. It uses a decomposed stochastic search direction together with Fletcher's augmented Lagrangian as a merit function for step-size selection. An event decomposition conditioned on the smallest singular value of the stochastic Jacobian is introduced to control perturbations when uniform nondegeneracy may fail. Under an additional Lipschitz continuity assumption on the second derivatives of the objective and constraints, the algorithm is shown to reach a stochastic ε-KKT point with expected total oracle complexity O(ε^{-3}) in stochastic gradient and constraint evaluations.
Significance. If the O(ε^{-3}) bound is rigorously established without hidden probability assumptions on the Jacobian, the result would improve upon existing stochastic first-order methods for constrained nonconvex problems by reducing the number of constraint evaluations. The event-decomposition technique for handling loss of nondegeneracy is a potentially useful device, but its contribution to the complexity depends on whether the bad-event contribution is controlled tightly enough under first-order oracles alone.
major comments (2)
- [Abstract] Abstract (paragraph on method development): the event decomposition based on the smallest singular value of the stochastic Jacobian is introduced to control perturbations in the search direction. The description does not specify the measure of the bad set or the decay rate of its probability; without an explicit bound showing that the probability-weighted contribution of the bad event remains compatible with O(ε^{-3}) total calls, the central complexity claim cannot be verified from the given information.
- [Abstract] Abstract: the Lipschitz continuity assumption on second-order derivatives is invoked after the event decomposition is stated. It is unclear whether this assumption is used only for the good event or whether it is also needed to bound the bad-event term; if the latter, the claim that the decomposition alone 'enables us to control perturbations' is not yet supported.
minor comments (1)
- [Abstract] The term 'stochastic ε-KKT point' is used without an explicit definition in the abstract; a precise definition should appear early in the main text.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments on our manuscript. The observations focus on the presentation in the abstract, and we address each point below. We believe the concerns can be resolved by targeted revisions to the abstract that better highlight the technical details already established in the body of the paper.
read point-by-point responses
-
Referee: [Abstract] Abstract (paragraph on method development): the event decomposition based on the smallest singular value of the stochastic Jacobian is introduced to control perturbations in the search direction. The description does not specify the measure of the bad set or the decay rate of its probability; without an explicit bound showing that the probability-weighted contribution of the bad event remains compatible with O(ε^{-3}) total calls, the central complexity claim cannot be verified from the given information.
Authors: We agree that the abstract is too concise to convey the probability bound on the bad event. In the full analysis (Theorem 3.1 and its proof), the measure of the bad set is controlled via a Markov-type inequality on the smallest singular value, yielding a probability that decays as O(ε^2) under the stated first-order assumptions; when multiplied by the per-iteration cost, this term remains absorbed into the overall O(ε^{-3}) expectation. The decomposition therefore ensures the bad-event contribution is negligible without hidden probability assumptions. We will revise the abstract to include a short clause stating this decay rate so that the complexity claim is self-contained at the abstract level. revision: yes
-
Referee: [Abstract] Abstract: the Lipschitz continuity assumption on second-order derivatives is invoked after the event decomposition is stated. It is unclear whether this assumption is used only for the good event or whether it is also needed to bound the bad-event term; if the latter, the claim that the decomposition alone 'enables us to control perturbations' is not yet supported.
Authors: The second-order Lipschitz assumption is invoked solely to obtain the final O(ε^{-3}) rate on the good events (controlling the bias and variance of the stochastic directions once the Jacobian is well-conditioned). The event decomposition itself bounds the perturbation on the good event using only first-order stochastic oracles; on the bad event the contribution is controlled by the probability decay already mentioned, which relies only on first-order moment bounds and does not invoke second-order Lipschitz continuity. We acknowledge that the abstract's sequential phrasing may suggest otherwise, and we will revise the wording to separate the roles of the decomposition and the additional assumption. revision: yes
Circularity Check
No significant circularity; complexity bound derived under explicit assumptions
full rationale
The paper presents a stochastic first-order method using decomposed search directions and Fletcher's augmented Lagrangian merit function. It introduces an event decomposition on the smallest singular value of the stochastic Jacobian to handle possible loss of uniform nondegeneracy, then invokes an additional Lipschitz assumption on second-order derivatives to prove an expected O(ε^{-3}) oracle complexity for a stochastic ε-KKT point. No quoted step reduces the claimed bound to a fitted input, self-definition, or load-bearing self-citation chain; the result is obtained via standard analysis under the stated assumptions without the prediction equaling the input by construction. This is the most common honest outcome for such complexity analyses.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Lipschitz continuity of second-order derivatives of objective and constraint functions
- ad hoc to paper Event decomposition based on smallest singular value controls perturbations when uniform nondegeneracy of stochastic Jacobian may fail
Reference graph
Works this paper leans on
-
[1]
Alacaoglu and S
A. Alacaoglu and S. J. Wright. Complexity of single loop algorithms for nonlinear programming with stochastic objective and constraints. InInternational Conference on Artificial Intelligence and Statistics, pages 4627–4635. PMLR, 2024
2024
-
[2]
A. S. Berahas, F. E. Curtis, D. Robinson, and B. Zhou. Sequential quadratic optimization for nonlinear equality constrained stochastic optimization.SIAM Journal on Optimization, 31(2):1352–1379, 2021
2021
-
[3]
J. T. Betts.Practical methods for optimal control and estimation using nonlinear programming. SIAM, 2010
2010
-
[4]
D. Boob, Q. Deng, and G. Lan. Stochastic first-order methods for convex and nonconvex functional constrained optimization.Mathematical Programming, 197(1):215–279, 2023
2023
-
[5]
D. Boob, Q. Deng, and G. Lan. Level constrained first order methods for function constrained opti- mization.Mathematical Programming, 209(1):1–61, 2025
2025
-
[6]
C. C. Chang and C. J. Lin. Libsvm: A library for support vector machines.ACM Transactions on Intelligent Systems and Technology, 2(3, article 27), 2007
2007
-
[7]
Y. Cui, Q. Shi, X. Wang, and X. Xiao. An exact penalty method for stochastic equality-constrained optimization.Optimization Online, 2025
2025
-
[8]
Y. Cui, X. Wang, and X. Xiao. A two-phase stochastic momentum-based algorithm for nonconvex expectation-constrained optimization.Journal of Scientific Computing, 104(1):16, 2025
2025
-
[9]
F. E. Curtis, M. J. O’Neill, and D. P. Robinson. Worst-case complexity of an SQP method for nonlinear equality constrained stochastic optimization.Mathematical Programming, 205(1):431–483, 2024
2024
-
[10]
F. E. Curtis, D. P. Robinson, and B. Zhou. Sequential quadratic optimization for stochastic optimiza- tion with deterministic nonlinear inequality and equality constraints.SIAM Journal on Optimization, 34(4):3592–3622, 2024
2024
-
[11]
C. Fang, C. J. Li, Z. Lin, and T. Zhang. SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. InAdvances in Neural Information Processing Systems, vol- ume 31, 2018
2018
-
[12]
Y. Fang, S. Na, M. W. Mahoney, and M. Kolar. Fully stochastic trust-region sequential quadratic pro- gramming for equality-constrained optimization problems.SIAM Journal on Optimization, 34(2):2007– 2037, 2024
2007
-
[13]
Fletcher
R. Fletcher. A class of methods for nonlinear programming with termination and convergence properties. Integer and Nonlinear Programming, pages 157–175, 1970
1970
-
[14]
N. I. Gould, D. Orban, and P. L. Toint. CUTEst: a constrained and unconstrained testing environ- ment with safe threads for mathematical optimization.Computational optimization and applications, 60(3):545–557, 2015
2015
-
[15]
B. M. Idrees, L. Arora, and K. Rajawat. Constrained stochastic recursive momentum successive convex approximation.IEEE Transactions on Signal Processing, 2025
2025
-
[16]
Jin and X
L. Jin and X. Wang. A stochastic primal-dual method for a class of nonconvex constrained optimization. Computational Optimization and Applications, 83(1):143–180, 2022
2022
-
[17]
Krishnapriyan, A
A. Krishnapriyan, A. Gholami, S. Zhe, R. Kirby, and M. W. Mahoney. Characterizing possible fail- ure modes in physics-informed neural networks.Advances in neural information processing systems, 34:26548–26560, 2021. 23
2021
-
[18]
Z. Li, P. Chen, S. Liu, S. Lu, and Y. Xu. Stochastic inexact augmented Lagrangian method for nonconvex expectation constrained optimization.Computational Optimization and Applications, 87(1):117–147, 2024
2024
-
[19]
Liu and Y
W. Liu and Y. Xu. A single-loop SPIDER-type stochastic subgradient method for expectation- constrained nonconvex nonsmooth optimization.To appear in SIAM Journal on Optimization, 2025
2025
-
[20]
Z. Lu, S. Mei, and Y. Xiao. Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees.SIAM Journal on Optimization, 36(1):1–31, 2026
2026
-
[21]
R. Ma, Q. Lin, and T. Yang. Quadratically regularized subgradient methods for weakly convex op- timization with weakly convex constraints. InInternational Conference on Machine Learning, pages 6554–6564. PMLR, 2020
2020
-
[22]
B. L. Miller and H. M. Wagner. Chance constrained programming with joint constraints.Operations Research, 13(6):930–945, 1965
1965
-
[23]
S. Na, M. Anitescu, and M. Kolar. An adaptive stochastic sequential quadratic programming with differentiable exact augmented Lagrangians.Mathematical Programming, 199(1):721–791, 2023
2023
-
[24]
M. J. O’Neill. A two stepsize SQP method for nonlinear equality constrained stochastic optimization. arXiv preprint arXiv:2408.16656, 2024
Pith/arXiv arXiv 2024
-
[25]
Oneto and S
L. Oneto and S. Chiappa. Fairness in machine learning. InRecent trends in learning from data: Tutorials from the inns big data and deep learning conference (innsbddl2019), pages 155–196. Springer, 2020
2020
-
[26]
Pr´ ekopa
A. Pr´ ekopa. On probabilistic constrained programming. InProceedings of the Princeton symposium on mathematical programming, pages 113–138. Princeton, NJ, 1970
1970
-
[27]
H. Shen, Y. Zeng, and B. Zhou. Sequential quadratic optimization for solving expectation equality constrained stochastic optimization problems.arXiv preprint arXiv:2503.09490, 2025
arXiv 2025
- [28]
-
[29]
Q. Shi, X. Wang, and H. Wang. A momentum-based linearized augmented Lagrangian method for nonconvex constrained stochastic optimization.Mathematics of Operations Research, 51(1):92–133, 2026
2026
-
[30]
X. Wang, S. Ma, and Y. Yuan. Penalty methods with stochastic approximation for stochastic nonlinear programming.Mathematics of Computation, 86(306):1793–1820, 2017
2017
-
[31]
M. Yang, G. Li, Q. Hu, Q. Lin, and T. Yang. Single-loop algorithms for stochastic non-convex opti- mization with weakly-convex constraints.arXiv preprint arXiv:2504.15243, 2025
arXiv 2025
-
[32]
M. B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. P. Gummadi. Fairness constraints: A flexible approach for fair classification.Journal of Machine Learning Research, 20(75):1–42, 2019
2019
-
[33]
Y. Zhang, B. Liu, X. Xiao, and L. Zhang. A quadratic-approximation-based stochastic approximation method for weakly convex stochastic programming.arXiv preprint arXiv:2605.03400, 2026
Pith/arXiv arXiv 2026
-
[34]
S. Zuo, X. Wang, and H. Wang. An adaptive single-loop stochastic penalty method for nonconvex constrained stochastic optimization.Optimization Online, 2025. 24
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.