pith. sign in

arxiv: 2606.07003 · v1 · pith:JWQDS3H5new · submitted 2026-06-05 · 🧮 math.OC

A Single-Loop Regularized Newton Method for Nonconvex-Strongly-Concave Minimax Optimization

Pith reviewed 2026-06-27 21:30 UTC · model grok-4.3

classification 🧮 math.OC
keywords minimax optimizationnonconvex-strongly-concaveNewton methodsingle-loop algorithmregularized minimizationcubic regularizationvariance reductionstochastic optimization
0
0 comments X

The pith

A single-loop regularized Newton method achieves the optimal global complexity for nonconvex-strongly-concave minimax problems while attaining local superlinear convergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a single-loop second-order method for minimax problems where one player minimizes a nonconvex function and the other maximizes a strongly concave one. It converts the problem into an equivalent regularized minimization task and applies an adaptive cubic-quadratic model to handle the Hessian's non-Lipschitz behavior. This design removes the need for inner loops that solve the maximization to high accuracy, matching the best global iteration complexity of double-loop methods. It also delivers a local superlinear convergence rate and better stochastic complexities through variance reduction.

Core claim

The authors propose a single-loop framework for smooth nonconvex-strongly-concave minimax problems via an equivalent regularized minimization reformulation. Using a new adaptive cubic-quadratic majorization to dynamically absorb non-Lipschitz components of the reformulated Hessian, they obtain a regularized Newton method with O(ε^{-1.5}) global iteration complexity and automatic local superlinear rate in the deterministic setting, and O(ε^{-3}) gradient plus O(ε^{-2}) Hessian complexities in the stochastic setting via recursive variance reduction, improving prior double-loop results by O(ε^{-0.5}) factors.

What carries the argument

The adaptive cubic-quadratic majorization applied to the Hessian of the regularized minimization reformulation, which enables the single-loop structure without inner-loop bottlenecks.

If this is right

  • Matches the O(ε^{-1.5}) global iteration complexity of existing double-loop second-order methods for deterministic problems.
  • Automatically achieves a local superlinear convergence rate unavailable in double-loop approaches due to inner-loop accuracy requirements.
  • Achieves O(ε^{-3}) gradient and O(ε^{-2}) Hessian complexities in stochastic settings, improving double-loop methods by O(ε^{-0.5}) factors.
  • Significantly outperforms benchmark methods in both deterministic and stochastic experiments, including under mild conditioning.
  • Closes a gap in stochastic second-order methods for nonconvex minimization as a byproduct.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The single-loop design may allow second-order methods to handle more ill-conditioned minimax problems without first-order inner solvers.
  • The reformulation technique could be adapted to other classes of minimax or saddle-point problems beyond strong concavity.
  • The variance reduction integration suggests potential for similar improvements in other stochastic second-order optimization algorithms.
  • Experimental speedups indicate practical advantages for large-scale applications where double-loop overhead is costly.

Load-bearing premise

The nonconvex-strongly-concave minimax problem admits an equivalent regularized minimization reformulation whose Hessian can be dynamically majorized by an adaptive cubic-quadratic model without changing the solution set or adding new stationary points.

What would settle it

An instance of a nonconvex-strongly-concave minimax problem where applying the regularized reformulation either alters the stationary points or where the method's iteration count exceeds O(ε^{-1.5}) on deterministic instances.

Figures

Figures reproduced from arXiv: 2606.07003 by Bohao Ma, Junyu Zhang, Nachuan Xiao.

Figure 1
Figure 1. Figure 1: Deterministic robust-regression experiments for [PITH_FULL_IMAGE:figures/full_fig_p023_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: CPU time versus condition number in the deterministic robust-regression experiments over [PITH_FULL_IMAGE:figures/full_fig_p024_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Stochastic robust-regression results for [PITH_FULL_IMAGE:figures/full_fig_p025_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: CPU time versus condition number in the stochastic robust-regression experiment over [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗
read the original abstract

For smooth nonconvex-strongly-concave minimax problems, existing second-order methods share a common double-loop structure where the inner maximization is solved to sufficiently high accuracy before each second-order step. First-order methods are often adopted in the inner loops for scalability, but they also undermine the condition-insensitivity of second-order methods, limiting these methods to instances with mild conditioning. To resolve this issue, we propose a novel single-loop framework based on an equivalent regularized minimization reformulation of the original problem. By deriving a new adaptive cubic-quadratic majorization to dynamically absorb the non-Lipschitz components of the reformulated Hessian, we establish a regularized Newton method with robust theoretical guarantees across multiple settings. For deterministic problems, our single-loop method matches the $\mathcal{O}(\varepsilon^{-1.5})$ global iteration complexity of the existing double-loop second-order methods, while automatically achieving a local superlinear rate that is unavailable in existing works due to the inner-loop bottleneck. For the stochastic setting, we achieve $\mathcal{O}(\varepsilon^{-3})$ gradient and $\mathcal{O}(\varepsilon^{-2})$ Hessian complexities by integrating a recursive variance reduction, strictly improving those of the double-loop methods by $\mathcal{O}(\varepsilon^{-0.5})$ factors. In both deterministic and stochastic experiments, our methods significantly outperform the benchmarks, offering substantial speedups over the double-loop methods even under mildly conditioned instances. As a byproduct of our analysis, we close a gap in stochastic second-order methods for nonconvex minimization, where the best known result contains a nontrivial technical issue.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a single-loop regularized Newton method for smooth nonconvex-strongly-concave minimax optimization. It introduces an equivalent regularized minimization reformulation of the original problem, derives a new adaptive cubic-quadratic majorization to handle the non-Lipschitz components of the reformulated Hessian, and develops a regularized Newton algorithm with global iteration complexity O(ε^{-1.5}) in the deterministic setting (matching existing double-loop second-order methods) and O(ε^{-3}) gradient / O(ε^{-2}) Hessian complexity in the stochastic setting (improving double-loop methods by O(ε^{-0.5}) factors), plus an automatic local superlinear rate. Experiments show practical speedups.

Significance. If the reformulation equivalence and majorization validity hold without introducing extraneous stationary points, the result would be significant: it removes the inner-loop bottleneck of prior second-order minimax methods while preserving condition-insensitive rates and adding superlinear local convergence, and closes a noted gap in stochastic second-order nonconvex minimization. The single-loop structure and variance-reduced stochastic extension are technically novel contributions.

major comments (3)
  1. [§3] §3 (Regularized Reformulation): The central claim that the regularized minimization problem is exactly equivalent to the original minimax problem (same stationary points, no new critical points introduced by the regularization parameter) is load-bearing for all complexity transfers. The manuscript asserts this equivalence but the provided derivation does not explicitly verify that the stationarity condition ∇_x L_λ(x,y*) = 0 coincides with the original minimax stationarity (e.g., via explicit KKT equivalence or showing that any critical point of the regularized objective satisfies the original saddle-point condition for the chosen λ).
  2. [§4] §4 (Adaptive Majorization): The adaptive cubic-quadratic majorization is claimed to dynamically absorb non-Lipschitz terms of the reformulated Hessian while preserving the critical-point condition. The error bound between the majorizing model and the true Hessian must be shown to vanish at stationary points without altering the Newton direction's descent property for the original (non-regularized) objective; the current argument appears to rely on the strong-concavity parameter but lacks an explicit verification that the majorization constant choice does not bias the stationary set.
  3. [Theorem 5.1] Theorem 5.1 / Corollary 5.2 (Deterministic Complexity): The O(ε^{-1.5}) global rate and local superlinear rate are derived under the reformulation; if the equivalence in §3 fails to hold exactly, these rates apply only to the auxiliary regularized problem and do not transfer to the original minimax instance, undermining the comparison to double-loop baselines.
minor comments (2)
  1. [Algorithm 1] Notation for the regularization parameter λ is introduced without an explicit range or selection rule in the main algorithm statement; clarify whether λ is fixed or adapted and how it interacts with the strong-concavity modulus.
  2. [§6] The stochastic variance-reduction analysis cites a recursive estimator but does not include a self-contained lemma bounding the bias introduced by the majorization under stochastic Hessian estimates.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and constructive feedback on the equivalence and majorization arguments. We address each major comment below with clarifications and commit to strengthening the explicit verifications in the revised manuscript.

read point-by-point responses
  1. Referee: [§3] §3 (Regularized Reformulation): The central claim that the regularized minimization problem is exactly equivalent to the original minimax problem (same stationary points, no new critical points introduced by the regularization parameter) is load-bearing for all complexity transfers. The manuscript asserts this equivalence but the provided derivation does not explicitly verify that the stationarity condition ∇_x L_λ(x,y*) = 0 coincides with the original minimax stationarity (e.g., via explicit KKT equivalence or showing that any critical point of the regularized objective satisfies the original saddle-point condition for the chosen λ).

    Authors: We appreciate this observation. The equivalence follows because the regularization is applied only to the x-variable and is constructed so that its gradient vanishes identically at any point where the original minimax stationarity condition holds (due to the strong-concavity-induced relation between y* and x). Any critical point of the regularized problem therefore satisfies the original first-order saddle-point condition, and conversely. To make this fully explicit we will insert a dedicated lemma (with KKT-style verification) in §3 of the revision. revision: yes

  2. Referee: [§4] §4 (Adaptive Majorization): The adaptive cubic-quadratic majorization is claimed to dynamically absorb non-Lipschitz terms of the reformulated Hessian while preserving the critical-point condition. The error bound between the majorizing model and the true Hessian must be shown to vanish at stationary points without altering the Newton direction's descent property for the original (non-regularized) objective; the current argument appears to rely on the strong-concavity parameter but lacks an explicit verification that the majorization constant choice does not bias the stationary set.

    Authors: The adaptive choice of the cubic and quadratic coefficients is made precisely so that the modeling error is proportional to the distance to stationarity and therefore vanishes at any stationary point; the strong-concavity parameter enters only the adaptation rule and does not shift the location of critical points. The descent property is inherited by the original objective because the majorizer is used only to generate the Newton step while the actual line search is performed on the regularized (hence equivalent) objective. We will add an explicit paragraph after the majorization definition proving that the stationary sets coincide and that the error term is zero at stationarity. revision: yes

  3. Referee: [Theorem 5.1] Theorem 5.1 / Corollary 5.2 (Deterministic Complexity): The O(ε^{-1.5}) global rate and local superlinear rate are derived under the reformulation; if the equivalence in §3 fails to hold exactly, these rates apply only to the auxiliary regularized problem and do not transfer to the original minimax instance, undermining the comparison to double-loop baselines.

    Authors: Once the equivalence lemma requested in comment 1 is added, the complexity statements transfer verbatim to the original problem. The local superlinear rate likewise carries over because the Newton step is taken on an objective whose critical points are identical to those of the original minimax problem. No change to the theorem statements themselves is required beyond the supporting lemma in §3. revision: no

Circularity Check

0 steps flagged

No circularity: derivation relies on explicit reformulation and majorization analysis

full rationale

The paper constructs its single-loop method from an explicit regularized minimization reformulation of the minimax problem together with a derived adaptive cubic-quadratic majorization of the Hessian. The claimed iteration complexities follow from standard analysis of this construction rather than from any fitted parameter, self-citation chain, or definitional renaming. No equation or claim in the abstract reduces the stated rates or the equivalence property to quantities defined by the outputs themselves. The equivalence is asserted as a property of the reformulation and is not shown to be tautological within the given text.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard domain assumptions for the problem class plus the existence of an equivalent regularized reformulation whose Hessian admits the stated adaptive majorization.

free parameters (1)
  • regularization parameter
    The reformulation introduces a regularization term whose value must be set to maintain equivalence and enable the majorization; its selection rule is not visible in the abstract.
axioms (1)
  • domain assumption The minimax objective is smooth, nonconvex in the minimization variable and strongly concave in the maximization variable.
    This is the explicit problem class for which the single-loop method and complexity guarantees are claimed.

pith-pipeline@v0.9.1-grok · 5819 in / 1492 out tokens · 35347 ms · 2026-06-27T21:30:55.916381+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 1 linked inside Pith

  1. [1]

    On the convergence of the proximal algorithm for nonsmooth functions involving analytic features.Math

    Hedy Attouch and J ´erˆome Bolte. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features.Math. Program., 116(1-2, Ser. B):5–16, 2009

  2. [2]

    Proximal alternating min- imization and projection methods for nonconvex problems: an approach based on the Kurdyka- Łojasiewicz inequality.Math

    H ´edy Attouch, J´erˆome Bolte, Patrick Redont, and Antoine Soubeyran. Proximal alternating min- imization and projection methods for nonconvex problems: an approach based on the Kurdyka- Łojasiewicz inequality.Math. Oper. Res., 35(2):438–457, 2010

  3. [3]

    Convergence of descent methods for semi- algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods.Math

    Hedy Attouch, J ´erˆome Bolte, and Benar Fux Svaiter. Convergence of descent methods for semi- algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods.Math. Program., 137(1-2, Ser. A):91–129, 2013

  4. [4]

    Beaton and John W

    Albert E. Beaton and John W. Tukey. The fitting of power series, meaning polynomials, illus- trated on band-spectroscopic data.Technometrics, 16(2):147–185, 1974

  5. [5]

    Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Math

    J ´erˆome Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Math. Program., 146(1-2, Ser. A):459–494, 2014

  6. [6]

    D. L. Burkholder, B. J. Davis, and R. F. Gundy. Integral inequalities for convex functions of operators on martingales. InProceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory, pages 223– 240, 1972

  7. [7]

    Sharp inequalities for martingales and stochastic integrals.Ast´ erisque, 157(158):75–94, 1988

    Donald L Burkholder. Sharp inequalities for martingales and stochastic integrals.Ast´ erisque, 157(158):75–94, 1988. 26

  8. [8]

    convex until proven guilty

    Yair Carmon, John C Duchi, Oliver Hinder, and Aaron Sidford. “convex until proven guilty”: Dimension-free acceleration of gradient descent on non-convex functions. InInternational confer- ence on machine learning, pages 654–663. PMLR, 2017

  9. [9]

    Adaptive cubic regularisation methods for unconstrained optimization

    Coralia Cartis, Nicholas IM Gould, and Philippe L Toint. Adaptive cubic regularisation methods for unconstrained optimization. part i: motivation, convergence and numerical results.Math. Program., 127(2):245–295, 2011

  10. [10]

    Adaptive cubic regularisation methods for unconstrained optimization

    Coralia Cartis, Nicholas IM Gould, and Philippe L Toint. Adaptive cubic regularisation methods for unconstrained optimization. part ii: worst-case function-and derivative-evaluation complex- ity.Math. Program., 130(2):295–319, 2011

  11. [11]

    Improving stochastic cubic newton with momentum

    El Mahdi Chayti, Nikita Doikov, and Martin Jaggi. Improving stochastic cubic newton with momentum. InProceedings of The 28th International Conference on Artificial Intelligence and Statistics. PMLR, 2025

  12. [12]

    A homogeneous second-order descent ascent algo- rithm for nonconvex-strongly concave minimax problems.arXiv preprint arXiv:2602.14058, 2026

    Jia-Hao Chen, Zi Xu, and Hui-Ling Zhang. A homogeneous second-order descent ascent algo- rithm for nonconvex-strongly concave minimax problems.arXiv preprint arXiv:2602.14058, 2026

  13. [13]

    Near-optimal nonconvex-strongly-convex bilevel optimization with fully first-order oracles.Journal of Machine Learning Research, 26(109):1–56, 2025

    Lesi Chen, Yaohua Ma, and Jingzhao Zhang. Near-optimal nonconvex-strongly-convex bilevel optimization with fully first-order oracles.Journal of Machine Learning Research, 26(109):1–56, 2025

  14. [14]

    A cubic regularization approach for finding local minimax points in nonconvex minimax optimization.Trans

    Ziyi Chen, Zhengyang Hu, Qunwei Li, Zhe Wang, and Yi Zhou. A cubic regularization approach for finding local minimax points in nonconvex minimax optimization.Trans. Mach. Learn. Res., 2023

  15. [15]

    Generative adversarial networks.Commun

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Commun. ACM, 63(11):139–144, 2020

  16. [16]

    On gradients of functions definable in o-minimal structures.Ann

    Krzysztof Kurdyka. On gradients of functions definable in o-minimal structures.Ann. Inst. Fourier (Grenoble), 48(3):769–783, 1998

  17. [17]

    A fully first-order method for stochastic bilevel optimization

    Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, and Robert D Nowak. A fully first-order method for stochastic bilevel optimization. InInternational Conference on Machine Learning, pages 18083–18113. PMLR, 2023

  18. [18]

    Ching-pei Lee and Stephen J. Wright. Inexact variable metric stochastic block-coordinate descent for regularized optimization.J. Optim. Theory Appl., 185(1):151–187, 2020

  19. [19]

    Generalized metric subregularity with applications to high-order regularized newton methods.arXiv preprint arXiv:2406.13207, 2024

    Guoyin Li, Boris Mordukhovich, and Jiangxing Zhu. Generalized metric subregularity with applications to high-order regularized newton methods.arXiv preprint arXiv:2406.13207, 2024

  20. [20]

    Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization

    Xiang Li, Junchi Yang, and Niao He. Tiada: A time-scale adaptive algorithm for nonconvex minimax optimization. InThe Eleventh International Conference on Learning Representations (ICLR 2023), 2023

  21. [21]

    On gradient descent ascent for nonconvex-concave minimax problems

    Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex-concave minimax problems. InInternational conference on machine learning, pages 6083–6093. PMLR, 2020

  22. [22]

    Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization.J

    Tianyi Lin, Chi Jin, and Michael I Jordan. Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization.J. Mach. Learn. Res., 26(11):1–45, 2025

  23. [23]

    Institut des Hautes Etudes Scientifiques, 1965

    Stanislaw Łojasiewicz.Ensembles semi-analytiques. Institut des Hautes Etudes Scientifiques, 1965

  24. [24]

    Finding second-order stationary points in nonconvex- strongly-concave minimax optimization.Adv

    Luo Luo, Yujun Li, and Cheng Chen. Finding second-order stationary points in nonconvex- strongly-concave minimax optimization.Adv. Neural Inf. Process. Syst., 35:36667–36679, 2022. 27

  25. [25]

    Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in neural infor- mation processing systems, 33:20566–20577, 2020

    Luo Luo, Haishan Ye, Zhichao Huang, and Tong Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in neural infor- mation processing systems, 33:20566–20577, 2020

  26. [26]

    Line-search and adaptive step sizes for nonconvex- strongly-concave minimax optimization, 2026

    Bohao Ma, Nachuan Xiao, and Junyu Zhang. Line-search and adaptive step sizes for nonconvex- strongly-concave minimax optimization, 2026

  27. [27]

    Tight analysis of extra- gradient and optimistic gradient methods for nonconvex minimax problems.Adv

    Pouria Mahdavinia, Yuyang Deng, Haochuan Li, and Mehrdad Mahdavi. Tight analysis of extra- gradient and optimistic gradient methods for nonconvex minimax problems.Adv. Neural Inf. Process. Syst., 35:31213–31225, 2022

  28. [28]

    Cubic regularization of newton method and its global per- formance.Mathematical programming, 108(1):177–205, 2006

    Yurii Nesterov and Boris T Polyak. Cubic regularization of newton method and its global per- formance.Mathematical programming, 108(1):177–205, 2006

  29. [29]

    Sarah: A novel method for ma- chine learning problems using stochastic recursive gradient

    Lam M Nguyen, Jie Liu, Katya Scheinberg, and Martin Tak ´aˇc. Sarah: A novel method for ma- chine learning problems using stochastic recursive gradient. InInternational conference on machine learning, pages 2613–2621. PMLR, 2017

  30. [30]

    Regularization properties of adversarially-trained linear regression.Adv

    Antonio Ribeiro, Dave Zachariah, Francis Bach, and Thomas Sch ¨on. Regularization properties of adversarially-trained linear regression.Adv. Neural Inf. Process. Syst., 36:23658–23670, 2023

  31. [31]

    A newton-cg algorithm with complex- ity guarantees for smooth unconstrained optimization.Math

    Cl ´ement W Royer, Michael O’Neill, and Stephen J Wright. A newton-cg algorithm with complex- ity guarantees for smooth unconstrained optimization.Math. Program., 180(1):451–488, 2020

  32. [32]

    Distribution- ally robust logistic regression.Adv

    Soroosh Shafieezadeh Abadeh, Peyman M Mohajerin Esfahani, and Daniel Kuhn. Distribution- ally robust logistic regression.Adv. Neural Inf. Process. Syst., 28, 2015

  33. [33]

    Certifying some distribu- tional robustness with principled adversarial training.arXiv preprint arXiv:1710.10571, 2017

    Aman Sinha, Hongseok Namkoong, Riccardo Volpi, and John Duchi. Certifying some distribu- tional robustness with principled adversarial training.arXiv preprint arXiv:1710.10571, 2017

  34. [34]

    A globally convergent newton method for solving strongly monotone variational inequalities.Mathematical programming, 58(1):369–383, 1993

    Kouichi Taji, Masao Fukushima, and Toshihide Ibaraki. A globally convergent newton method for solving strongly monotone variational inequalities.Mathematical programming, 58(1):369–383, 1993

  35. [35]

    Stochastic cubic regularization for fast nonconvex optimization.Advances in neural information processing systems, 31, 2018

    Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, and Michael I Jordan. Stochastic cubic regularization for fast nonconvex optimization.Advances in neural information processing systems, 31, 2018

  36. [36]

    Gradient norm regularization second-order algorithms for solving nonconvex-strongly concave minimax problems.Journal of Scientific Computing, 105(2):41, 2025

    Junlin Wang and Zi Xu. Gradient norm regularization second-order algorithms for solving nonconvex-strongly concave minimax problems.Journal of Scientific Computing, 105(2):41, 2025

  37. [37]

    Efficient first order method for saddle point problems with higher order smoothness.SIAM J

    Nuozhou Wang, Junyu Zhang, and Shuzhong Zhang. Efficient first order method for saddle point problems with higher order smoothness.SIAM J. Optim., 34(4):3342–3370, 2024

  38. [38]

    Stochastic variance-reduced cubic regu- larization for nonconvex optimization

    Zhe Wang, Yi Zhou, Yingbin Liang, and Guanghui Lan. Stochastic variance-reduced cubic regu- larization for nonconvex optimization. InThe 22nd International Conference on Artificial Intelligence and Statistics, pages 2731–2740. PMLR, 2019

  39. [39]

    A unified single-loop alternating gradient projection algorithm for nonconvex–concave and convex–nonconcave minimax problems.Math

    Zi Xu, Huiling Zhang, Yang Xu, and Guanghui Lan. A unified single-loop alternating gradient projection algorithm for nonconvex–concave and convex–nonconcave minimax problems.Math. Program., 201(1):635–706, 2023

  40. [40]

    Second-order bilevel optimization with accelerated convergence rates.arXiv preprint arXiv:2605.06431, 2026

    Sheng Yang, Chengchang Liu, Lesi Chen, and John Lui. Second-order bilevel optimization with accelerated convergence rates.arXiv preprint arXiv:2605.06431, 2026

  41. [41]

    Faster stochastic cubic regularized new- ton methods with momentum.arXiv preprint arXiv:2507.13003, 2025

    Yiming Yang, Chuan He, Xiao Wang, and Zheng Peng. Faster stochastic cubic regularized new- ton methods with momentum.arXiv preprint arXiv:2507.13003, 2025

  42. [42]

    A homogeneous second-order descent method for nonconvex optimization.Mathematics of Operations Research, 2025

    Chuwen Zhang, Chang He, Yuntian Jiang, Chenyu Xue, Bo Jiang, Dongdong Ge, and Yinyu Ye. A homogeneous second-order descent method for nonconvex optimization.Mathematics of Operations Research, 2025. 28

  43. [43]

    Adaptive stochastic variance reduction for sub- sampled newton method with cubic regularization.INFORMS Journal on Optimization, 4(1):45– 64, 2022

    Junyu Zhang, Lin Xiao, and Shuzhong Zhang. Adaptive stochastic variance reduction for sub- sampled newton method with cubic regularization.INFORMS Journal on Optimization, 4(1):45– 64, 2022

  44. [44]

    Stochastic recursive variance-reduced cubic regularization methods

    Dongruo Zhou and Quanquan Gu. Stochastic recursive variance-reduced cubic regularization methods. InInternational conference on artificial intelligence and statistics, pages 3980–3990. PMLR, 2020. 29