
arxiv: 2412.18929 · v5 · submitted 2024-12-25 · 🧮 math.OC

Alternating Gradient-Type Algorithm for Bilevel Optimization with Inexact Lower-Level Solutions via Moreau Envelope-based Reformulation

Pith reviewed 2026-05-23 07:23 UTC · model grok-4.3

classification 🧮 math.OC
keywords bilevel optimization · alternating gradient algorithm · Moreau envelope · inexact lower-level solutions · convergence analysis · hyperparameter selection · Kurdyka-Łojasiewicz property

The pith

An alternating gradient algorithm converges to stationary points in bilevel optimization by allowing inexact lower-level solutions through a Moreau envelope reformulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops AGILS, an alternating gradient-type method for bilevel problems whose lower level is a convex composite model. It reformulates the problem using the Moreau envelope so that each iteration can use an approximate lower-level solution instead of an exact one. Convergence to stationary points is established in general, and sequential convergence follows when the objective satisfies the Kurdyka-Łojasiewicz property. The approach targets applications such as hyperparameter selection for regularized regression models, where repeated exact inner solves are expensive. Experiments on a toy problem and on sparse group Lasso hyperparameter tuning illustrate practical performance.
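For orientation, the reformulation described above can be written out in its standard form. This is a sketch under assumed notation, not the paper's own statement: the symbols $F$ (upper-level objective), $\varphi$ (lower-level objective, convex in its second argument), $\gamma > 0$ (envelope parameter), and $\theta^*_\gamma$ (the proximal point) are introduced here for illustration only.

```latex
% Bilevel problem (assumed notation)
\min_{x,\,y}\; F(x,y)
\quad \text{s.t.} \quad
y \in \operatorname*{arg\,min}_{\theta}\; \varphi(x,\theta)

% Moreau envelope of the lower-level objective
v_\gamma(x,y) := \min_{\theta}\;
\Big\{ \varphi(x,\theta) + \tfrac{1}{2\gamma}\,\|\theta - y\|^2 \Big\}

% Moreau envelope-based reformulation
\min_{x,\,y}\; F(x,y)
\quad \text{s.t.} \quad
\varphi(x,y) - v_\gamma(x,y) \le 0

% Gradient of the envelope in y, for fixed x
\nabla_y v_\gamma(x,y) = \tfrac{1}{\gamma}\,\big(y - \theta^*_\gamma(x,y)\big),
\qquad
\theta^*_\gamma(x,y) := \operatorname*{arg\,min}_{\theta}\;
\Big\{ \varphi(x,\theta) + \tfrac{1}{2\gamma}\,\|\theta - y\|^2 \Big\}
```

Taking $\theta = y$ in the definition shows $v_\gamma \le \varphi$ everywhere, so the constraint can only hold with equality; for convex $\varphi(x,\cdot)$ that happens exactly at lower-level minimizers. The envelope gradient depends on the inner problem only through the proximal point, so an approximate $\theta$ yields an approximate gradient, which is presumably where inexact lower-level solutions enter the iteration.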

Core claim

The AGILS algorithm, obtained from a Moreau envelope reformulation of the bilevel problem, converges to stationary points without requiring exact lower-level solutions at every iteration; under the Kurdyka-Łojasiewicz property it also converges sequentially.
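As a reminder of what the Kurdyka-Łojasiewicz property asks for, the standard textbook form of the inequality is sketched below. The notation ($f$, $\bar{x}$, $\psi$, $\eta$) is assumed here and may differ from the paper's.

```latex
% f is proper and lower semicontinuous, \bar{x} \in \operatorname{dom} \partial f.
% f has the KL property at \bar{x} if there exist \eta > 0, a neighbourhood U of \bar{x},
% and a concave \psi : [0,\eta) \to [0,\infty) with \psi(0) = 0, \psi \in C^1(0,\eta), \psi' > 0,
% such that
\psi'\big(f(x) - f(\bar{x})\big)\,\operatorname{dist}\big(0, \partial f(x)\big) \;\ge\; 1
\quad \text{for all } x \in U \text{ with } f(\bar{x}) < f(x) < f(\bar{x}) + \eta .

% Worked example: f(x) = x^2, \bar{x} = 0, \psi(s) = \sqrt{s}
\psi'\big(x^2\big)\,|f'(x)| \;=\; \frac{1}{2|x|}\cdot 2|x| \;=\; 1 \;\ge\; 1 .
```

The worked example shows the inequality holding with equality for a one-dimensional quadratic, which is the simplest instance of the semi-algebraic functions for which the property is known to hold.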

What carries the argument

The Moreau envelope-based reformulation, which recasts the bilevel problem so that an alternating gradient scheme can solve it while tolerating inexact inner solves.

If this is right

  • Bilevel hyperparameter tuning can be performed without solving the lower-level problem to high accuracy at each outer iteration.
  • Convergence guarantees apply whenever the lower level satisfies the stated convexity and composite structure.
  • Sequential convergence holds on problems obeying the Kurdyka-Łojasiewicz inequality.
  • The method extends the class of bilevel solvers that remain practical when inner problems are themselves iterative.
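The idea of stopping an inner solve early can be illustrated with a minimal Python sketch. It is not the AGILS algorithm: it assumes a Lasso-type lower level, and the function names, the proximal-gradient inner solver, and the residual-based stopping rule are all hypothetical choices made here for illustration. It computes an approximate proximal point and the corresponding approximate Moreau envelope gradient in y.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def inexact_proximal_point(A, b, lam, y, gamma, tol, max_iter=10_000):
    """Approximately minimize over theta
        0.5 * ||A theta - b||^2 + lam * ||theta||_1 + ||theta - y||^2 / (2 * gamma)
    by proximal gradient, stopping once the fixed-point residual is <= tol.
    Returns the approximate minimizer and the number of iterations used.
    (Hypothetical helper; the stopping rule is an assumption, not the paper's criterion.)
    """
    # Lipschitz constant of the gradient of the smooth part.
    lipschitz = np.linalg.norm(A, 2) ** 2 + 1.0 / gamma
    step = 1.0 / lipschitz
    theta = np.array(y, dtype=float)
    iters = 0
    for iters in range(1, max_iter + 1):
        grad = A.T @ (A @ theta - b) + (theta - y) / gamma
        theta_next = soft_threshold(theta - step * grad, step * lam)
        residual = np.linalg.norm(theta_next - theta) / step
        theta = theta_next
        if residual <= tol:
            break
    return theta, iters


def envelope_gradient_y(y, theta, gamma):
    """Approximate gradient of the Moreau envelope in y, built from an inexact theta."""
    return (y - theta) / gamma


# Toy data with A = I, where the exact proximal point is known in closed form.
A = np.eye(3)
b = np.array([3.0, -0.2, 1.0])
y = np.zeros(3)
theta, iters = inexact_proximal_point(A, b, lam=0.5, y=y, gamma=1.0, tol=1e-8)
exact = soft_threshold((b + y) / 2.0, 0.25)
print(theta, exact, envelope_gradient_y(y, theta, 1.0), iters)
```

Loosening `tol` can only reduce the number of inner iterations, at the cost of a less accurate envelope gradient; how that error must be controlled for the outer iteration to converge is exactly what the paper's analysis addresses.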

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same reformulation idea might be tested on lower levels that are only weakly convex or satisfy error bounds short of full convexity.
  • Runtime comparisons against exact inner-solve baselines would quantify the efficiency gain on larger regression models.
  • The KL-based sequential convergence could be checked on problems whose objective is known to satisfy the property, such as certain quadratic or piecewise-linear cases.

Load-bearing premise

The lower-level problem must be a convex composite optimization model.

What would settle it

An instance of the bilevel problem whose lower level is convex composite, for which AGILS is run with the stated step-size rules and the iterates fail to approach any stationary point.

Figures

Figures reproduced from arXiv: 2412.18929 by Jin Zhang, Lezhi Zhang, Shangzhi Zeng, Xiaoning Bai.

Figure 1: Effectiveness of the inexact criterion in AGILS: comparison with two extreme variants
Figure 2: Iteration and computational time of AGILS on the toy example with varying dimensions
Original abstract

In this paper, we study a class of bilevel optimization problems where the lower-level problem is a convex composite optimization model, which arises in various applications, including bilevel hyperparameter selection for regularized regression models. To solve these problems, we propose an Alternating Gradient-type algorithm with Inexact Lower-level Solutions (AGILS) based on a Moreau envelope-based reformulation of the bilevel optimization problem. The proposed algorithm does not require exact solutions of the lower-level problem at each iteration, improving computational efficiency. We prove the convergence of AGILS to stationary points and, under the Kurdyka-Łojasiewicz (KL) property, establish its sequential convergence. Numerical experiments, including a toy example and a bilevel hyperparameter selection problem for the sparse group Lasso model, demonstrate the effectiveness of the proposed AGILS.
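To make the sparse group Lasso lower level mentioned in the abstract more concrete, the following Python sketch evaluates the proximal operator of its penalty, the building block a proximal-gradient inner solver would call. The function name, the unweighted group penalty, and the assumption that the groups partition the coordinates are illustrative choices made here; the paper's exact model may weight the groups differently.

```python
import numpy as np


def prox_sparse_group_lasso(v, groups, lam1, lam2, step=1.0):
    """Proximal operator of step * (lam1 * ||.||_1 + lam2 * sum_g ||._g||_2).

    `groups` is a list of index lists that partition the coordinates of v
    (an assumption of this sketch).
    """
    v = np.asarray(v, dtype=float)
    # Step 1: elementwise soft-thresholding handles the l1 term.
    u = np.sign(v) * np.maximum(np.abs(v) - step * lam1, 0.0)
    out = np.zeros_like(u)
    # Step 2: groupwise shrinkage handles the l2 term on each block;
    # blocks whose norm is at most step * lam2 stay at zero.
    for idx in groups:
        block = u[idx]
        norm = np.linalg.norm(block)
        if norm > step * lam2:
            out[idx] = (1.0 - step * lam2 / norm) * block
    return out


result = prox_sparse_group_lasso(
    [3.0, -0.5, 4.0, 0.2], [[0, 1], [2, 3]], lam1=1.0, lam2=1.0
)
print(result)
```

The two-step structure (soft-threshold first, then shrink each group) relies on the known fact that the proximal operator of the sum of an l1 norm and an l2 norm on the same block is the composition of the two individual operators. In a hyperparameter selection setting, `lam1` and `lam2` would be the upper-level variables.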

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes the Alternating Gradient-type algorithm with Inexact Lower-level Solutions (AGILS) for bilevel optimization problems in which the lower level is a convex composite optimization model. The approach relies on a Moreau envelope-based reformulation that permits inexact lower-level solves at each iteration. The authors prove convergence of the generated sequence to stationary points of the reformulated problem and establish sequential convergence under the Kurdyka-Łojasiewicz property. Numerical results on a toy example and on bilevel hyperparameter selection for the sparse group Lasso are presented to illustrate practical performance.

Significance. If the stated convergence results hold, the work supplies a practical algorithmic framework for bilevel problems that avoids the computational cost of exact lower-level solves. The Moreau-envelope reformulation is a technically natural device for handling inexactness, and the KL-based sequential convergence result strengthens the analysis beyond mere stationarity. The hyperparameter-selection application is a relevant and timely test case.

minor comments (3)
  1. §2.2, Definition 2.3: the precise relationship between the original bilevel value function and the Moreau-envelope reformulation could be stated as a formal proposition rather than left implicit in the surrounding text.
  2. Algorithm 1, line 8: the tolerance schedule for the inexact lower-level solver is described only verbally; an explicit rule (e.g., a summable sequence) would make the implementation unambiguous.
  3. Table 1: the reported CPU times do not indicate whether the lower-level solver tolerance was held fixed across methods or adapted; a short clarifying sentence would aid reproducibility.
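The explicit tolerance rule suggested in the second comment could look like the following Python sketch. It is one possible summable schedule, not the rule used in the paper; the function name and the parameters `eps0` and `power` are hypothetical.

```python
def summable_tolerances(eps0, power, num_iters):
    """Return eps_k = eps0 / (k + 1) ** power for k = 0, ..., num_iters - 1.

    The infinite series sum_k eps_k converges exactly when power > 1,
    so smaller powers are rejected.
    """
    if eps0 <= 0 or power <= 1:
        raise ValueError("need eps0 > 0 and power > 1 for a summable schedule")
    return [eps0 / (k + 1) ** power for k in range(num_iters)]


schedule = summable_tolerances(eps0=1.0, power=2.0, num_iters=1000)
print(schedule[:3], sum(schedule))
```

With `power=2.0` the partial sums stay below pi^2 / 6 times `eps0`, so the accumulated inner-solve error remains bounded however many outer iterations are run. Whether summability is the right requirement for AGILS depends on the inexactness criterion actually used in the paper.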

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives an alternating gradient-type algorithm (AGILS) from a Moreau-envelope reformulation of a bilevel problem whose lower level is a convex composite model, then proves convergence to stationary points (and sequential convergence under the KL property) via standard arguments on the reformulated problem. No step reduces by construction to a fitted input, self-definition, or load-bearing self-citation; the claims rest on forward mathematical analysis rather than renaming or circular re-use of the target result itself. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the convexity and composite structure of the lower-level problem plus standard assumptions needed for the Moreau envelope to be well-defined and differentiable; no free parameters or invented entities are visible from the abstract.

axioms (1)
  • Domain assumption: the lower-level problem is a convex composite optimization model. This is stated explicitly in the abstract as the class of problems studied.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Optimality Conditions and Numerical Algorithms for a Class of Minimax Bilevel Optimization Problems

    math.OC 2026-04 unverdicted novelty 5.0

    Optimality conditions are established for minimax bilevel problems via KKT reconstruction, and projected gradient multi-step ascent-descent algorithms are proposed that achieve ε-KKT solutions in O(ε^{-3} log(ε^{-1}))...
