A Single-Loop Regularized Newton Method for Nonconvex-Strongly-Concave Minimax Optimization

Bohao Ma; Junyu Zhang; Nachuan Xiao

arxiv: 2606.07003 · v1 · pith:JWQDS3H5new · submitted 2026-06-05 · 🧮 math.OC

A Single-Loop Regularized Newton Method for Nonconvex-Strongly-Concave Minimax Optimization

Bohao Ma , Nachuan Xiao , Junyu Zhang This is my paper

Pith reviewed 2026-06-27 21:30 UTC · model grok-4.3

classification 🧮 math.OC

keywords minimax optimizationnonconvex-strongly-concaveNewton methodsingle-loop algorithmregularized minimizationcubic regularizationvariance reductionstochastic optimization

0 comments

The pith

A single-loop regularized Newton method achieves the optimal global complexity for nonconvex-strongly-concave minimax problems while attaining local superlinear convergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a single-loop second-order method for minimax problems where one player minimizes a nonconvex function and the other maximizes a strongly concave one. It converts the problem into an equivalent regularized minimization task and applies an adaptive cubic-quadratic model to handle the Hessian's non-Lipschitz behavior. This design removes the need for inner loops that solve the maximization to high accuracy, matching the best global iteration complexity of double-loop methods. It also delivers a local superlinear convergence rate and better stochastic complexities through variance reduction.

Core claim

The authors propose a single-loop framework for smooth nonconvex-strongly-concave minimax problems via an equivalent regularized minimization reformulation. Using a new adaptive cubic-quadratic majorization to dynamically absorb non-Lipschitz components of the reformulated Hessian, they obtain a regularized Newton method with O(ε^{-1.5}) global iteration complexity and automatic local superlinear rate in the deterministic setting, and O(ε^{-3}) gradient plus O(ε^{-2}) Hessian complexities in the stochastic setting via recursive variance reduction, improving prior double-loop results by O(ε^{-0.5}) factors.

What carries the argument

The adaptive cubic-quadratic majorization applied to the Hessian of the regularized minimization reformulation, which enables the single-loop structure without inner-loop bottlenecks.

If this is right

Matches the O(ε^{-1.5}) global iteration complexity of existing double-loop second-order methods for deterministic problems.
Automatically achieves a local superlinear convergence rate unavailable in double-loop approaches due to inner-loop accuracy requirements.
Achieves O(ε^{-3}) gradient and O(ε^{-2}) Hessian complexities in stochastic settings, improving double-loop methods by O(ε^{-0.5}) factors.
Significantly outperforms benchmark methods in both deterministic and stochastic experiments, including under mild conditioning.
Closes a gap in stochastic second-order methods for nonconvex minimization as a byproduct.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The single-loop design may allow second-order methods to handle more ill-conditioned minimax problems without first-order inner solvers.
The reformulation technique could be adapted to other classes of minimax or saddle-point problems beyond strong concavity.
The variance reduction integration suggests potential for similar improvements in other stochastic second-order optimization algorithms.
Experimental speedups indicate practical advantages for large-scale applications where double-loop overhead is costly.

Load-bearing premise

The nonconvex-strongly-concave minimax problem admits an equivalent regularized minimization reformulation whose Hessian can be dynamically majorized by an adaptive cubic-quadratic model without changing the solution set or adding new stationary points.

What would settle it

An instance of a nonconvex-strongly-concave minimax problem where applying the regularized reformulation either alters the stationary points or where the method's iteration count exceeds O(ε^{-1.5}) on deterministic instances.

Figures

Figures reproduced from arXiv: 2606.07003 by Bohao Ma, Junyu Zhang, Nachuan Xiao.

**Figure 2.** Figure 2: CPU time versus condition number in the deterministic robust-regression experiments over [PITH_FULL_IMAGE:figures/full_fig_p024_2.png] view at source ↗

**Figure 3.** Figure 3: Stochastic robust-regression results for [PITH_FULL_IMAGE:figures/full_fig_p025_3.png] view at source ↗

**Figure 4.** Figure 4: CPU time versus condition number in the stochastic robust-regression experiment over [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗

read the original abstract

For smooth nonconvex-strongly-concave minimax problems, existing second-order methods share a common double-loop structure where the inner maximization is solved to sufficiently high accuracy before each second-order step. First-order methods are often adopted in the inner loops for scalability, but they also undermine the condition-insensitivity of second-order methods, limiting these methods to instances with mild conditioning. To resolve this issue, we propose a novel single-loop framework based on an equivalent regularized minimization reformulation of the original problem. By deriving a new adaptive cubic-quadratic majorization to dynamically absorb the non-Lipschitz components of the reformulated Hessian, we establish a regularized Newton method with robust theoretical guarantees across multiple settings. For deterministic problems, our single-loop method matches the $\mathcal{O}(\varepsilon^{-1.5})$ global iteration complexity of the existing double-loop second-order methods, while automatically achieving a local superlinear rate that is unavailable in existing works due to the inner-loop bottleneck. For the stochastic setting, we achieve $\mathcal{O}(\varepsilon^{-3})$ gradient and $\mathcal{O}(\varepsilon^{-2})$ Hessian complexities by integrating a recursive variance reduction, strictly improving those of the double-loop methods by $\mathcal{O}(\varepsilon^{-0.5})$ factors. In both deterministic and stochastic experiments, our methods significantly outperform the benchmarks, offering substantial speedups over the double-loop methods even under mildly conditioned instances. As a byproduct of our analysis, we close a gap in stochastic second-order methods for nonconvex minimization, where the best known result contains a nontrivial technical issue.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The single-loop claim hinges on whether the regularized reformulation exactly preserves the original problem's stationary points without introducing extras.

read the letter

The paper presents a single-loop regularized Newton method for nonconvex-strongly-concave minimax problems. It claims to match the global iteration complexity of double-loop second-order methods while adding local superlinear convergence and better stochastic complexities.

The new element is the equivalent regularized minimization reformulation combined with an adaptive cubic-quadratic majorization that handles the non-Lipschitz parts of the Hessian dynamically. This lets them avoid the inner maximization loop entirely. They integrate recursive variance reduction for the stochastic case to get those improved rates.

The paper does well on the experimental side. It shows substantial speedups over double-loop baselines in both deterministic and stochastic settings, even when conditioning is not too bad. As a byproduct, it closes a gap in the stochastic second-order literature for nonconvex minimization.

The potential issue is the load-bearing assumption that the regularization preserves the solution set exactly and does not create new stationary points. The abstract asserts this, but if the full paper does not provide a clear derivation showing how the regularization parameter is chosen to maintain equivalence, then the theoretical guarantees do not transfer back to the original problem. The majorization technique also needs to be checked to ensure it does not change the critical point condition.

This work is for optimization researchers focused on minimax problems that arise in adversarial machine learning. Readers looking for algorithms that avoid inner loops and have strong rates will find it useful. It deserves a serious referee because it targets a practical bottleneck in existing methods with concrete complexity improvements.

I recommend sending it out for peer review after verifying the reformulation details.

Referee Report

3 major / 2 minor

Summary. The paper proposes a single-loop regularized Newton method for smooth nonconvex-strongly-concave minimax optimization. It introduces an equivalent regularized minimization reformulation of the original problem, derives a new adaptive cubic-quadratic majorization to handle the non-Lipschitz components of the reformulated Hessian, and develops a regularized Newton algorithm with global iteration complexity O(ε^{-1.5}) in the deterministic setting (matching existing double-loop second-order methods) and O(ε^{-3}) gradient / O(ε^{-2}) Hessian complexity in the stochastic setting (improving double-loop methods by O(ε^{-0.5}) factors), plus an automatic local superlinear rate. Experiments show practical speedups.

Significance. If the reformulation equivalence and majorization validity hold without introducing extraneous stationary points, the result would be significant: it removes the inner-loop bottleneck of prior second-order minimax methods while preserving condition-insensitive rates and adding superlinear local convergence, and closes a noted gap in stochastic second-order nonconvex minimization. The single-loop structure and variance-reduced stochastic extension are technically novel contributions.

major comments (3)

[§3] §3 (Regularized Reformulation): The central claim that the regularized minimization problem is exactly equivalent to the original minimax problem (same stationary points, no new critical points introduced by the regularization parameter) is load-bearing for all complexity transfers. The manuscript asserts this equivalence but the provided derivation does not explicitly verify that the stationarity condition ∇_x L_λ(x,y*) = 0 coincides with the original minimax stationarity (e.g., via explicit KKT equivalence or showing that any critical point of the regularized objective satisfies the original saddle-point condition for the chosen λ).
[§4] §4 (Adaptive Majorization): The adaptive cubic-quadratic majorization is claimed to dynamically absorb non-Lipschitz terms of the reformulated Hessian while preserving the critical-point condition. The error bound between the majorizing model and the true Hessian must be shown to vanish at stationary points without altering the Newton direction's descent property for the original (non-regularized) objective; the current argument appears to rely on the strong-concavity parameter but lacks an explicit verification that the majorization constant choice does not bias the stationary set.
[Theorem 5.1] Theorem 5.1 / Corollary 5.2 (Deterministic Complexity): The O(ε^{-1.5}) global rate and local superlinear rate are derived under the reformulation; if the equivalence in §3 fails to hold exactly, these rates apply only to the auxiliary regularized problem and do not transfer to the original minimax instance, undermining the comparison to double-loop baselines.

minor comments (2)

[Algorithm 1] Notation for the regularization parameter λ is introduced without an explicit range or selection rule in the main algorithm statement; clarify whether λ is fixed or adapted and how it interacts with the strong-concavity modulus.
[§6] The stochastic variance-reduction analysis cites a recursive estimator but does not include a self-contained lemma bounding the bias introduced by the majorization under stochastic Hessian estimates.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and constructive feedback on the equivalence and majorization arguments. We address each major comment below with clarifications and commit to strengthening the explicit verifications in the revised manuscript.

read point-by-point responses

Referee: [§3] §3 (Regularized Reformulation): The central claim that the regularized minimization problem is exactly equivalent to the original minimax problem (same stationary points, no new critical points introduced by the regularization parameter) is load-bearing for all complexity transfers. The manuscript asserts this equivalence but the provided derivation does not explicitly verify that the stationarity condition ∇_x L_λ(x,y*) = 0 coincides with the original minimax stationarity (e.g., via explicit KKT equivalence or showing that any critical point of the regularized objective satisfies the original saddle-point condition for the chosen λ).

Authors: We appreciate this observation. The equivalence follows because the regularization is applied only to the x-variable and is constructed so that its gradient vanishes identically at any point where the original minimax stationarity condition holds (due to the strong-concavity-induced relation between y* and x). Any critical point of the regularized problem therefore satisfies the original first-order saddle-point condition, and conversely. To make this fully explicit we will insert a dedicated lemma (with KKT-style verification) in §3 of the revision. revision: yes
Referee: [§4] §4 (Adaptive Majorization): The adaptive cubic-quadratic majorization is claimed to dynamically absorb non-Lipschitz terms of the reformulated Hessian while preserving the critical-point condition. The error bound between the majorizing model and the true Hessian must be shown to vanish at stationary points without altering the Newton direction's descent property for the original (non-regularized) objective; the current argument appears to rely on the strong-concavity parameter but lacks an explicit verification that the majorization constant choice does not bias the stationary set.

Authors: The adaptive choice of the cubic and quadratic coefficients is made precisely so that the modeling error is proportional to the distance to stationarity and therefore vanishes at any stationary point; the strong-concavity parameter enters only the adaptation rule and does not shift the location of critical points. The descent property is inherited by the original objective because the majorizer is used only to generate the Newton step while the actual line search is performed on the regularized (hence equivalent) objective. We will add an explicit paragraph after the majorization definition proving that the stationary sets coincide and that the error term is zero at stationarity. revision: yes
Referee: [Theorem 5.1] Theorem 5.1 / Corollary 5.2 (Deterministic Complexity): The O(ε^{-1.5}) global rate and local superlinear rate are derived under the reformulation; if the equivalence in §3 fails to hold exactly, these rates apply only to the auxiliary regularized problem and do not transfer to the original minimax instance, undermining the comparison to double-loop baselines.

Authors: Once the equivalence lemma requested in comment 1 is added, the complexity statements transfer verbatim to the original problem. The local superlinear rate likewise carries over because the Newton step is taken on an objective whose critical points are identical to those of the original minimax problem. No change to the theorem statements themselves is required beyond the supporting lemma in §3. revision: no

Circularity Check

0 steps flagged

No circularity: derivation relies on explicit reformulation and majorization analysis

full rationale

The paper constructs its single-loop method from an explicit regularized minimization reformulation of the minimax problem together with a derived adaptive cubic-quadratic majorization of the Hessian. The claimed iteration complexities follow from standard analysis of this construction rather than from any fitted parameter, self-citation chain, or definitional renaming. No equation or claim in the abstract reduces the stated rates or the equivalence property to quantities defined by the outputs themselves. The equivalence is asserted as a property of the reformulation and is not shown to be tautological within the given text.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard domain assumptions for the problem class plus the existence of an equivalent regularized reformulation whose Hessian admits the stated adaptive majorization.

free parameters (1)

regularization parameter
The reformulation introduces a regularization term whose value must be set to maintain equivalence and enable the majorization; its selection rule is not visible in the abstract.

axioms (1)

domain assumption The minimax objective is smooth, nonconvex in the minimization variable and strongly concave in the maximization variable.
This is the explicit problem class for which the single-loop method and complexity guarantees are claimed.

pith-pipeline@v0.9.1-grok · 5819 in / 1492 out tokens · 35347 ms · 2026-06-27T21:30:55.916381+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 1 linked inside Pith

[1]

On the convergence of the proximal algorithm for nonsmooth functions involving analytic features.Math

Hedy Attouch and J ´erˆome Bolte. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features.Math. Program., 116(1-2, Ser. B):5–16, 2009

2009
[2]

Proximal alternating min- imization and projection methods for nonconvex problems: an approach based on the Kurdyka- Łojasiewicz inequality.Math

H ´edy Attouch, J´erˆome Bolte, Patrick Redont, and Antoine Soubeyran. Proximal alternating min- imization and projection methods for nonconvex problems: an approach based on the Kurdyka- Łojasiewicz inequality.Math. Oper. Res., 35(2):438–457, 2010

2010
[3]

Convergence of descent methods for semi- algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods.Math

Hedy Attouch, J ´erˆome Bolte, and Benar Fux Svaiter. Convergence of descent methods for semi- algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods.Math. Program., 137(1-2, Ser. A):91–129, 2013

2013
[4]

Beaton and John W

Albert E. Beaton and John W. Tukey. The fitting of power series, meaning polynomials, illus- trated on band-spectroscopic data.Technometrics, 16(2):147–185, 1974

1974
[5]

Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Math

J ´erˆome Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Math. Program., 146(1-2, Ser. A):459–494, 2014

2014
[6]

D. L. Burkholder, B. J. Davis, and R. F. Gundy. Integral inequalities for convex functions of operators on martingales. InProceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory, pages 223– 240, 1972

1970
[7]

Sharp inequalities for martingales and stochastic integrals.Ast´ erisque, 157(158):75–94, 1988

Donald L Burkholder. Sharp inequalities for martingales and stochastic integrals.Ast´ erisque, 157(158):75–94, 1988. 26

1988
[8]

convex until proven guilty

Yair Carmon, John C Duchi, Oliver Hinder, and Aaron Sidford. “convex until proven guilty”: Dimension-free acceleration of gradient descent on non-convex functions. InInternational confer- ence on machine learning, pages 654–663. PMLR, 2017

2017
[9]

Adaptive cubic regularisation methods for unconstrained optimization

Coralia Cartis, Nicholas IM Gould, and Philippe L Toint. Adaptive cubic regularisation methods for unconstrained optimization. part i: motivation, convergence and numerical results.Math. Program., 127(2):245–295, 2011

2011
[10]

Adaptive cubic regularisation methods for unconstrained optimization

Coralia Cartis, Nicholas IM Gould, and Philippe L Toint. Adaptive cubic regularisation methods for unconstrained optimization. part ii: worst-case function-and derivative-evaluation complex- ity.Math. Program., 130(2):295–319, 2011

2011
[11]

Improving stochastic cubic newton with momentum

El Mahdi Chayti, Nikita Doikov, and Martin Jaggi. Improving stochastic cubic newton with momentum. InProceedings of The 28th International Conference on Artificial Intelligence and Statistics. PMLR, 2025

2025
[12]

A homogeneous second-order descent ascent algo- rithm for nonconvex-strongly concave minimax problems.arXiv preprint arXiv:2602.14058, 2026

Jia-Hao Chen, Zi Xu, and Hui-Ling Zhang. A homogeneous second-order descent ascent algo- rithm for nonconvex-strongly concave minimax problems.arXiv preprint arXiv:2602.14058, 2026

arXiv 2026
[13]

Near-optimal nonconvex-strongly-convex bilevel optimization with fully first-order oracles.Journal of Machine Learning Research, 26(109):1–56, 2025

Lesi Chen, Yaohua Ma, and Jingzhao Zhang. Near-optimal nonconvex-strongly-convex bilevel optimization with fully first-order oracles.Journal of Machine Learning Research, 26(109):1–56, 2025

2025
[14]

A cubic regularization approach for finding local minimax points in nonconvex minimax optimization.Trans

Ziyi Chen, Zhengyang Hu, Qunwei Li, Zhe Wang, and Yi Zhou. A cubic regularization approach for finding local minimax points in nonconvex minimax optimization.Trans. Mach. Learn. Res., 2023

2023
[15]

Generative adversarial networks.Commun

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Commun. ACM, 63(11):139–144, 2020

2020
[16]

On gradients of functions definable in o-minimal structures.Ann

Krzysztof Kurdyka. On gradients of functions definable in o-minimal structures.Ann. Inst. Fourier (Grenoble), 48(3):769–783, 1998

1998
[17]

A fully first-order method for stochastic bilevel optimization

Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, and Robert D Nowak. A fully first-order method for stochastic bilevel optimization. InInternational Conference on Machine Learning, pages 18083–18113. PMLR, 2023

2023
[18]

Ching-pei Lee and Stephen J. Wright. Inexact variable metric stochastic block-coordinate descent for regularized optimization.J. Optim. Theory Appl., 185(1):151–187, 2020

2020
[19]

Generalized metric subregularity with applications to high-order regularized newton methods.arXiv preprint arXiv:2406.13207, 2024

Guoyin Li, Boris Mordukhovich, and Jiangxing Zhu. Generalized metric subregularity with applications to high-order regularized newton methods.arXiv preprint arXiv:2406.13207, 2024

arXiv 2024
[20]

Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization

Xiang Li, Junchi Yang, and Niao He. Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization. InThe Eleventh International Conference on Learning Representations (ICLR 2023), 2023

2023
[21]

On gradient descent ascent for nonconvex-concave minimax problems

Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex-concave minimax problems. InInternational conference on machine learning, pages 6083–6093. PMLR, 2020

2020
[22]

Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization.J

Tianyi Lin, Chi Jin, and Michael I Jordan. Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization.J. Mach. Learn. Res., 26(11):1–45, 2025

2025
[23]

Institut des Hautes Etudes Scientifiques, 1965

Stanislaw Łojasiewicz.Ensembles semi-analytiques. Institut des Hautes Etudes Scientifiques, 1965

1965
[24]

Finding second-order stationary points in nonconvex- strongly-concave minimax optimization.Adv

Luo Luo, Yujun Li, and Cheng Chen. Finding second-order stationary points in nonconvex- strongly-concave minimax optimization.Adv. Neural Inf. Process. Syst., 35:36667–36679, 2022. 27

2022
[25]

Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in neural infor- mation processing systems, 33:20566–20577, 2020

Luo Luo, Haishan Ye, Zhichao Huang, and Tong Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in neural infor- mation processing systems, 33:20566–20577, 2020

2020
[26]

Line-search and adaptive step sizes for nonconvex- strongly-concave minimax optimization, 2026

Bohao Ma, Nachuan Xiao, and Junyu Zhang. Line-search and adaptive step sizes for nonconvex- strongly-concave minimax optimization, 2026

2026
[27]

Tight analysis of extra- gradient and optimistic gradient methods for nonconvex minimax problems.Adv

Pouria Mahdavinia, Yuyang Deng, Haochuan Li, and Mehrdad Mahdavi. Tight analysis of extra- gradient and optimistic gradient methods for nonconvex minimax problems.Adv. Neural Inf. Process. Syst., 35:31213–31225, 2022

2022
[28]

Cubic regularization of newton method and its global per- formance.Mathematical programming, 108(1):177–205, 2006

Yurii Nesterov and Boris T Polyak. Cubic regularization of newton method and its global per- formance.Mathematical programming, 108(1):177–205, 2006

2006
[29]

Sarah: A novel method for ma- chine learning problems using stochastic recursive gradient

Lam M Nguyen, Jie Liu, Katya Scheinberg, and Martin Tak ´aˇc. Sarah: A novel method for ma- chine learning problems using stochastic recursive gradient. InInternational conference on machine learning, pages 2613–2621. PMLR, 2017

2017
[30]

Regularization properties of adversarially-trained linear regression.Adv

Antonio Ribeiro, Dave Zachariah, Francis Bach, and Thomas Sch ¨on. Regularization properties of adversarially-trained linear regression.Adv. Neural Inf. Process. Syst., 36:23658–23670, 2023

2023
[31]

A newton-cg algorithm with complex- ity guarantees for smooth unconstrained optimization.Math

Cl ´ement W Royer, Michael O’Neill, and Stephen J Wright. A newton-cg algorithm with complex- ity guarantees for smooth unconstrained optimization.Math. Program., 180(1):451–488, 2020

2020
[32]

Distribution- ally robust logistic regression.Adv

Soroosh Shafieezadeh Abadeh, Peyman M Mohajerin Esfahani, and Daniel Kuhn. Distribution- ally robust logistic regression.Adv. Neural Inf. Process. Syst., 28, 2015

2015
[33]

Certifying some distribu- tional robustness with principled adversarial training.arXiv preprint arXiv:1710.10571, 2017

Aman Sinha, Hongseok Namkoong, Riccardo Volpi, and John Duchi. Certifying some distribu- tional robustness with principled adversarial training.arXiv preprint arXiv:1710.10571, 2017

arXiv 2017
[34]

A globally convergent newton method for solving strongly monotone variational inequalities.Mathematical programming, 58(1):369–383, 1993

Kouichi Taji, Masao Fukushima, and Toshihide Ibaraki. A globally convergent newton method for solving strongly monotone variational inequalities.Mathematical programming, 58(1):369–383, 1993

1993
[35]

Stochastic cubic regularization for fast nonconvex optimization.Advances in neural information processing systems, 31, 2018

Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, and Michael I Jordan. Stochastic cubic regularization for fast nonconvex optimization.Advances in neural information processing systems, 31, 2018

2018
[36]

Gradient norm regularization second-order algorithms for solving nonconvex-strongly concave minimax problems.Journal of Scientific Computing, 105(2):41, 2025

Junlin Wang and Zi Xu. Gradient norm regularization second-order algorithms for solving nonconvex-strongly concave minimax problems.Journal of Scientific Computing, 105(2):41, 2025

2025
[37]

Efficient first order method for saddle point problems with higher order smoothness.SIAM J

Nuozhou Wang, Junyu Zhang, and Shuzhong Zhang. Efficient first order method for saddle point problems with higher order smoothness.SIAM J. Optim., 34(4):3342–3370, 2024

2024
[38]

Stochastic variance-reduced cubic regu- larization for nonconvex optimization

Zhe Wang, Yi Zhou, Yingbin Liang, and Guanghui Lan. Stochastic variance-reduced cubic regu- larization for nonconvex optimization. InThe 22nd International Conference on Artificial Intelligence and Statistics, pages 2731–2740. PMLR, 2019

2019
[39]

A unified single-loop alternating gradient projection algorithm for nonconvex–concave and convex–nonconcave minimax problems.Math

Zi Xu, Huiling Zhang, Yang Xu, and Guanghui Lan. A unified single-loop alternating gradient projection algorithm for nonconvex–concave and convex–nonconcave minimax problems.Math. Program., 201(1):635–706, 2023

2023
[40]

Second-order bilevel optimization with accelerated convergence rates.arXiv preprint arXiv:2605.06431, 2026

Sheng Yang, Chengchang Liu, Lesi Chen, and John Lui. Second-order bilevel optimization with accelerated convergence rates.arXiv preprint arXiv:2605.06431, 2026

Pith/arXiv arXiv 2026
[41]

Faster stochastic cubic regularized new- ton methods with momentum.arXiv preprint arXiv:2507.13003, 2025

Yiming Yang, Chuan He, Xiao Wang, and Zheng Peng. Faster stochastic cubic regularized new- ton methods with momentum.arXiv preprint arXiv:2507.13003, 2025

arXiv 2025
[42]

A homogeneous second-order descent method for nonconvex optimization.Mathematics of Operations Research, 2025

Chuwen Zhang, Chang He, Yuntian Jiang, Chenyu Xue, Bo Jiang, Dongdong Ge, and Yinyu Ye. A homogeneous second-order descent method for nonconvex optimization.Mathematics of Operations Research, 2025. 28

2025
[43]

Adaptive stochastic variance reduction for sub- sampled newton method with cubic regularization.INFORMS Journal on Optimization, 4(1):45– 64, 2022

Junyu Zhang, Lin Xiao, and Shuzhong Zhang. Adaptive stochastic variance reduction for sub- sampled newton method with cubic regularization.INFORMS Journal on Optimization, 4(1):45– 64, 2022

2022
[44]

Stochastic recursive variance-reduced cubic regularization methods

Dongruo Zhou and Quanquan Gu. Stochastic recursive variance-reduced cubic regularization methods. InInternational conference on artificial intelligence and statistics, pages 3980–3990. PMLR, 2020. 29

2020

[1] [1]

On the convergence of the proximal algorithm for nonsmooth functions involving analytic features.Math

Hedy Attouch and J ´erˆome Bolte. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features.Math. Program., 116(1-2, Ser. B):5–16, 2009

2009

[2] [2]

Proximal alternating min- imization and projection methods for nonconvex problems: an approach based on the Kurdyka- Łojasiewicz inequality.Math

H ´edy Attouch, J´erˆome Bolte, Patrick Redont, and Antoine Soubeyran. Proximal alternating min- imization and projection methods for nonconvex problems: an approach based on the Kurdyka- Łojasiewicz inequality.Math. Oper. Res., 35(2):438–457, 2010

2010

[3] [3]

Convergence of descent methods for semi- algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods.Math

Hedy Attouch, J ´erˆome Bolte, and Benar Fux Svaiter. Convergence of descent methods for semi- algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods.Math. Program., 137(1-2, Ser. A):91–129, 2013

2013

[4] [4]

Beaton and John W

Albert E. Beaton and John W. Tukey. The fitting of power series, meaning polynomials, illus- trated on band-spectroscopic data.Technometrics, 16(2):147–185, 1974

1974

[5] [5]

Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Math

J ´erˆome Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Math. Program., 146(1-2, Ser. A):459–494, 2014

2014

[6] [6]

D. L. Burkholder, B. J. Davis, and R. F. Gundy. Integral inequalities for convex functions of operators on martingales. InProceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory, pages 223– 240, 1972

1970

[7] [7]

Sharp inequalities for martingales and stochastic integrals.Ast´ erisque, 157(158):75–94, 1988

Donald L Burkholder. Sharp inequalities for martingales and stochastic integrals.Ast´ erisque, 157(158):75–94, 1988. 26

1988

[8] [8]

convex until proven guilty

Yair Carmon, John C Duchi, Oliver Hinder, and Aaron Sidford. “convex until proven guilty”: Dimension-free acceleration of gradient descent on non-convex functions. InInternational confer- ence on machine learning, pages 654–663. PMLR, 2017

2017

[9] [9]

Adaptive cubic regularisation methods for unconstrained optimization

Coralia Cartis, Nicholas IM Gould, and Philippe L Toint. Adaptive cubic regularisation methods for unconstrained optimization. part i: motivation, convergence and numerical results.Math. Program., 127(2):245–295, 2011

2011

[10] [10]

Adaptive cubic regularisation methods for unconstrained optimization

Coralia Cartis, Nicholas IM Gould, and Philippe L Toint. Adaptive cubic regularisation methods for unconstrained optimization. part ii: worst-case function-and derivative-evaluation complex- ity.Math. Program., 130(2):295–319, 2011

2011

[11] [11]

Improving stochastic cubic newton with momentum

El Mahdi Chayti, Nikita Doikov, and Martin Jaggi. Improving stochastic cubic newton with momentum. InProceedings of The 28th International Conference on Artificial Intelligence and Statistics. PMLR, 2025

2025

[12] [12]

A homogeneous second-order descent ascent algo- rithm for nonconvex-strongly concave minimax problems.arXiv preprint arXiv:2602.14058, 2026

Jia-Hao Chen, Zi Xu, and Hui-Ling Zhang. A homogeneous second-order descent ascent algo- rithm for nonconvex-strongly concave minimax problems.arXiv preprint arXiv:2602.14058, 2026

arXiv 2026

[13] [13]

Near-optimal nonconvex-strongly-convex bilevel optimization with fully first-order oracles.Journal of Machine Learning Research, 26(109):1–56, 2025

Lesi Chen, Yaohua Ma, and Jingzhao Zhang. Near-optimal nonconvex-strongly-convex bilevel optimization with fully first-order oracles.Journal of Machine Learning Research, 26(109):1–56, 2025

2025

[14] [14]

A cubic regularization approach for finding local minimax points in nonconvex minimax optimization.Trans

Ziyi Chen, Zhengyang Hu, Qunwei Li, Zhe Wang, and Yi Zhou. A cubic regularization approach for finding local minimax points in nonconvex minimax optimization.Trans. Mach. Learn. Res., 2023

2023

[15] [15]

Generative adversarial networks.Commun

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Commun. ACM, 63(11):139–144, 2020

2020

[16] [16]

On gradients of functions definable in o-minimal structures.Ann

Krzysztof Kurdyka. On gradients of functions definable in o-minimal structures.Ann. Inst. Fourier (Grenoble), 48(3):769–783, 1998

1998

[17] [17]

A fully first-order method for stochastic bilevel optimization

Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, and Robert D Nowak. A fully first-order method for stochastic bilevel optimization. InInternational Conference on Machine Learning, pages 18083–18113. PMLR, 2023

2023

[18] [18]

Ching-pei Lee and Stephen J. Wright. Inexact variable metric stochastic block-coordinate descent for regularized optimization.J. Optim. Theory Appl., 185(1):151–187, 2020

2020

[19] [19]

Generalized metric subregularity with applications to high-order regularized newton methods.arXiv preprint arXiv:2406.13207, 2024

Guoyin Li, Boris Mordukhovich, and Jiangxing Zhu. Generalized metric subregularity with applications to high-order regularized newton methods.arXiv preprint arXiv:2406.13207, 2024

arXiv 2024

[20] [20]

Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization

Xiang Li, Junchi Yang, and Niao He. Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization. InThe Eleventh International Conference on Learning Representations (ICLR 2023), 2023

2023

[21] [21]

On gradient descent ascent for nonconvex-concave minimax problems

Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex-concave minimax problems. InInternational conference on machine learning, pages 6083–6093. PMLR, 2020

2020

[22] [22]

Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization.J

Tianyi Lin, Chi Jin, and Michael I Jordan. Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization.J. Mach. Learn. Res., 26(11):1–45, 2025

2025

[23] [23]

Institut des Hautes Etudes Scientifiques, 1965

Stanislaw Łojasiewicz.Ensembles semi-analytiques. Institut des Hautes Etudes Scientifiques, 1965

1965

[24] [24]

Finding second-order stationary points in nonconvex- strongly-concave minimax optimization.Adv

Luo Luo, Yujun Li, and Cheng Chen. Finding second-order stationary points in nonconvex- strongly-concave minimax optimization.Adv. Neural Inf. Process. Syst., 35:36667–36679, 2022. 27

2022

[25] [25]

Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in neural infor- mation processing systems, 33:20566–20577, 2020

Luo Luo, Haishan Ye, Zhichao Huang, and Tong Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in neural infor- mation processing systems, 33:20566–20577, 2020

2020

[26] [26]

Line-search and adaptive step sizes for nonconvex- strongly-concave minimax optimization, 2026

Bohao Ma, Nachuan Xiao, and Junyu Zhang. Line-search and adaptive step sizes for nonconvex- strongly-concave minimax optimization, 2026

2026

[27] [27]

Tight analysis of extra- gradient and optimistic gradient methods for nonconvex minimax problems.Adv

Pouria Mahdavinia, Yuyang Deng, Haochuan Li, and Mehrdad Mahdavi. Tight analysis of extra- gradient and optimistic gradient methods for nonconvex minimax problems.Adv. Neural Inf. Process. Syst., 35:31213–31225, 2022

2022

[28] [28]

Cubic regularization of newton method and its global per- formance.Mathematical programming, 108(1):177–205, 2006

Yurii Nesterov and Boris T Polyak. Cubic regularization of newton method and its global per- formance.Mathematical programming, 108(1):177–205, 2006

2006

[29] [29]

Sarah: A novel method for ma- chine learning problems using stochastic recursive gradient

Lam M Nguyen, Jie Liu, Katya Scheinberg, and Martin Tak ´aˇc. Sarah: A novel method for ma- chine learning problems using stochastic recursive gradient. InInternational conference on machine learning, pages 2613–2621. PMLR, 2017

2017

[30] [30]

Regularization properties of adversarially-trained linear regression.Adv

Antonio Ribeiro, Dave Zachariah, Francis Bach, and Thomas Sch ¨on. Regularization properties of adversarially-trained linear regression.Adv. Neural Inf. Process. Syst., 36:23658–23670, 2023

2023

[31] [31]

A newton-cg algorithm with complex- ity guarantees for smooth unconstrained optimization.Math

Cl ´ement W Royer, Michael O’Neill, and Stephen J Wright. A newton-cg algorithm with complex- ity guarantees for smooth unconstrained optimization.Math. Program., 180(1):451–488, 2020

2020

[32] [32]

Distribution- ally robust logistic regression.Adv

Soroosh Shafieezadeh Abadeh, Peyman M Mohajerin Esfahani, and Daniel Kuhn. Distribution- ally robust logistic regression.Adv. Neural Inf. Process. Syst., 28, 2015

2015

[33] [33]

Certifying some distribu- tional robustness with principled adversarial training.arXiv preprint arXiv:1710.10571, 2017

Aman Sinha, Hongseok Namkoong, Riccardo Volpi, and John Duchi. Certifying some distribu- tional robustness with principled adversarial training.arXiv preprint arXiv:1710.10571, 2017

arXiv 2017

[34] [34]

A globally convergent newton method for solving strongly monotone variational inequalities.Mathematical programming, 58(1):369–383, 1993

Kouichi Taji, Masao Fukushima, and Toshihide Ibaraki. A globally convergent newton method for solving strongly monotone variational inequalities.Mathematical programming, 58(1):369–383, 1993

1993

[35] [35]

Stochastic cubic regularization for fast nonconvex optimization.Advances in neural information processing systems, 31, 2018

Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, and Michael I Jordan. Stochastic cubic regularization for fast nonconvex optimization.Advances in neural information processing systems, 31, 2018

2018

[36] [36]

Gradient norm regularization second-order algorithms for solving nonconvex-strongly concave minimax problems.Journal of Scientific Computing, 105(2):41, 2025

Junlin Wang and Zi Xu. Gradient norm regularization second-order algorithms for solving nonconvex-strongly concave minimax problems.Journal of Scientific Computing, 105(2):41, 2025

2025

[37] [37]

Efficient first order method for saddle point problems with higher order smoothness.SIAM J

Nuozhou Wang, Junyu Zhang, and Shuzhong Zhang. Efficient first order method for saddle point problems with higher order smoothness.SIAM J. Optim., 34(4):3342–3370, 2024

2024

[38] [38]

Stochastic variance-reduced cubic regu- larization for nonconvex optimization

Zhe Wang, Yi Zhou, Yingbin Liang, and Guanghui Lan. Stochastic variance-reduced cubic regu- larization for nonconvex optimization. InThe 22nd International Conference on Artificial Intelligence and Statistics, pages 2731–2740. PMLR, 2019

2019

[39] [39]

A unified single-loop alternating gradient projection algorithm for nonconvex–concave and convex–nonconcave minimax problems.Math

Zi Xu, Huiling Zhang, Yang Xu, and Guanghui Lan. A unified single-loop alternating gradient projection algorithm for nonconvex–concave and convex–nonconcave minimax problems.Math. Program., 201(1):635–706, 2023

2023

[40] [40]

Second-order bilevel optimization with accelerated convergence rates.arXiv preprint arXiv:2605.06431, 2026

Sheng Yang, Chengchang Liu, Lesi Chen, and John Lui. Second-order bilevel optimization with accelerated convergence rates.arXiv preprint arXiv:2605.06431, 2026

Pith/arXiv arXiv 2026

[41] [41]

Faster stochastic cubic regularized new- ton methods with momentum.arXiv preprint arXiv:2507.13003, 2025

Yiming Yang, Chuan He, Xiao Wang, and Zheng Peng. Faster stochastic cubic regularized new- ton methods with momentum.arXiv preprint arXiv:2507.13003, 2025

arXiv 2025

[42] [42]

A homogeneous second-order descent method for nonconvex optimization.Mathematics of Operations Research, 2025

Chuwen Zhang, Chang He, Yuntian Jiang, Chenyu Xue, Bo Jiang, Dongdong Ge, and Yinyu Ye. A homogeneous second-order descent method for nonconvex optimization.Mathematics of Operations Research, 2025. 28

2025

[43] [43]

Adaptive stochastic variance reduction for sub- sampled newton method with cubic regularization.INFORMS Journal on Optimization, 4(1):45– 64, 2022

Junyu Zhang, Lin Xiao, and Shuzhong Zhang. Adaptive stochastic variance reduction for sub- sampled newton method with cubic regularization.INFORMS Journal on Optimization, 4(1):45– 64, 2022

2022

[44] [44]

Stochastic recursive variance-reduced cubic regularization methods

Dongruo Zhou and Quanquan Gu. Stochastic recursive variance-reduced cubic regularization methods. InInternational conference on artificial intelligence and statistics, pages 3980–3990. PMLR, 2020. 29

2020