arxiv: 2605.11476 · v1 · submitted 2026-05-12 · 🧮 math.OC · stat.ML

Recognition: 3 theorem links

· Lean Theorem

A Barrier-Metric First-Order Method for Linearly Constrained Bilevel Optimization

Paul Grigas, Tenglong Hong

Pith reviewed 2026-05-13 02:11 UTC · model grok-4.3

classification 🧮 math.OC stat.ML

keywords bilevel optimizationlogarithmic barrierfirst-order methodlinear constraintsstationarity ratesDikin geometryproxy gradient

0 comments

The pith

Logarithmic barrier smoothing of the lower polyhedron lets a first-order proxy-gradient method reach stationarity rates of O(K to the -2/3) without Hessian inversions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a first-order algorithm for bilevel problems whose lower level has a fixed polyhedral constraint set. It replaces the lower problem with a logarithmic barrier smoothing that makes the overall upper objective differentiable and avoids active-set nonsmoothness. The method then runs a proxy-gradient iteration that only evaluates upper and lower gradients plus the explicit barrier Hessian. A local Dikin-style analysis in the barrier metric supplies curvature bounds that justify step-size schedules keeping iterates away from the boundary. This produces the stated non-asymptotic stationarity guarantees together with explicit control on the smoothing bias.

Core claim

For the barrier-smoothed bilevel objective the algorithm attains stationarity rates of O(K^{-2/3}) in the deterministic case and O(K^{-2/5}) when only the upper-level objective is corrupted by bounded stochastic noise, after K outer iterations, together with quantitative bounds showing that the bias introduced by the barrier parameter can be driven to zero at a controlled rate.

What carries the argument

The barrier metric (local Dikin geometry) that supplies an oracle-free, uniformly positive curvature scale near the moving lower-level centers and thereby justifies the barrier-aware step-size and barrier-parameter schedules.

If this is right

Upper-level gradients can be computed without inverting the lower-level Hessian or solving auxiliary linear systems at each iteration.
Active-set changes at the lower level no longer produce nondifferentiability in the upper objective because the barrier smoothing is maintained throughout.
The same first-order framework extends immediately to the case of bounded stochastic noise only in the upper objective, albeit with a slower rate.
Explicit, quantitative control on the approximation error is obtained simply by letting the barrier parameter decrease at a rate tied to the outer iteration count.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The Dikin-geometry argument may extend to other barrier or interior-point smoothings of nonsmooth bilevel or min-max problems.
The explicit dependence of the schedules on problem constants suggests that an adaptive or line-search variant could be developed to remove the need for a priori knowledge of those constants.
Because the lower-level set is required only to be polyhedral, the same smoothing-plus-metric idea might apply directly to other linearly constrained nonsmooth optimization tasks beyond bilevel settings.

Load-bearing premise

The lower-level feasible set is a fixed polyhedron and the prescribed step-size and barrier schedules keep every iterate inside a locally well-behaved region where the barrier metric supplies uniform curvature control.

What would settle it

A simple linearly constrained bilevel instance whose observed outer-iteration stationarity gap after K steps fails to improve at rate K^{-2/3} or better when the barrier parameter is decreased according to the paper's schedule.

Figures

Figures reproduced from arXiv: 2605.11476 by Paul Grigas, Tenglong Hong.

**Figure 2.** Figure 2: Boundary stability experiment for the exact lower tracker. (a) The exact barrierized center [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗

**Figure 3.** Figure 3: Congestion-toll benchmark with bottleneck tightness [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: Constrained MNIST hypercleaning with label corruption probability [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: Interior stability experiment for the exact lower tracker. (a) The Euclidean tracker oscillates [PITH_FULL_IMAGE:figures/full_fig_p057_5.png] view at source ↗

**Figure 6.** Figure 6: Congestion-toll benchmark with bottleneck tightness [PITH_FULL_IMAGE:figures/full_fig_p058_6.png] view at source ↗

**Figure 7.** Figure 7: Congestion-toll benchmark with bottleneck tightness [PITH_FULL_IMAGE:figures/full_fig_p059_7.png] view at source ↗

**Figure 8.** Figure 8: Constrained MNIST hypercleaning with label corruption probability [PITH_FULL_IMAGE:figures/full_fig_p060_8.png] view at source ↗

read the original abstract

We study bilevel optimization with a fixed polyhedral lower feasible set. Such problems are challenging for two reasons: active-set changes can make the upper objective nonsmooth, and existing hypergradient methods typically require lower-Hessian inversions or equivalent linear solves, which are computationally expensive. To address these issues, we adopt a logarithmic barrier smoothing of the lower problem to obtain a differentiable approximation of the constrained bilevel objective, and develop a proxy-gradient algorithm for the resulting barrier-smoothed surrogate. The algorithm uses only gradients of the upper and lower objectives; its only second-order object is the explicit logarithmic barrier Hessian determined by the fixed polyhedral constraints. Barrier smoothing restores differentiability, but Euclidean smoothness constants are not uniformly bounded near the boundary. We therefore develop a local Dikin-geometry analysis in which the barrier-metric provides an oracle-free curvature scale near the moving lower centers. This leads to barrier-aware schedules that keep the iterates inside locally well-behaved regions. For the barrier-smoothed objective, we prove stationarity rates of $\widetilde{O}(K^{-2/3})$ in the deterministic setting and $\widetilde{O}(K^{-2/5})$ under upper-level-only bounded stochastic noise after $K$ outer iterations, together with quantitative bias control as the barrier parameter decreases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

read the letter

This paper gives a first-order proxy-gradient method for polyhedrally constrained bilevel problems via log-barrier smoothing and local analysis in the barrier metric, but the step-size schedules depend on a priori constants that are not estimated by the algorithm. They smooth the lower level with a logarithmic barrier to restore differentiability of the upper objective, then run an algorithm that uses only gradients of the upper and lower objectives plus the explicit barrier Hessian. The analysis works in the local Dikin geometry induced by the barrier metric to get uniform curvature control near the moving lower-level centers, which yields stationarity rates of order K to the minus two-thirds deterministic and K to the minus two-fifths with bounded upper-level noise, plus quantitative bias control as the barrier parameter shrinks.

Referee Report

2 major / 2 minor

Summary. The paper develops a first-order proxy-gradient method for bilevel optimization problems where the lower-level feasible set is a fixed polyhedron. By applying logarithmic barrier smoothing to the lower-level problem, the upper-level objective becomes differentiable. The method uses only upper and lower gradients, with the barrier Hessian as the only second-order information. Using a local analysis in the Dikin geometry induced by the barrier metric, the authors derive step-size and barrier schedules that maintain iterates in regions with controlled curvature. They prove stationarity rates of O(K^{-2/3}) for the deterministic case and O(K^{-2/5}) under bounded stochastic noise in the upper level after K iterations, along with bounds on the bias introduced by the barrier smoothing as the parameter tends to zero.

Significance. If the convergence analysis holds, this provides a computationally attractive approach to bilevel optimization by avoiding lower-level Hessian inversions or solves, which are common bottlenecks in hypergradient methods. The barrier-metric analysis is a strength, allowing local control near the boundary without global Euclidean smoothness. The explicit rates and quantitative bias control as the barrier parameter decreases are valuable for understanding the approximation quality. This could be useful for applications with polyhedral constraints.

major comments (2)

[§3 (Algorithm and Schedules)] The barrier-aware step-size and barrier-parameter schedules (defined via the problem constants in the algorithm description) are constructed from a priori Lipschitz constants, strong-convexity modulus, and a lower bound on distance to the polyhedral boundary. These are used to enforce that all outer iterates remain inside the region where the logarithmic barrier metric supplies uniform self-concordance and Hessian bounds relative to the moving lower centers. Since the algorithm provides no estimation or adaptive mechanism for these constants, the uniform curvature control required for the descent lemma is not guaranteed to hold in practice.
[§4 (Convergence Analysis)] The proof of the deterministic stationarity rate O(K^{-2/3}) (and the stochastic O(K^{-2/5}) variant) relies on the descent inequality holding uniformly because the schedules keep iterates in the locally well-behaved Dikin region. If the actual distance to the boundary is smaller than the a priori lower bound used in the schedule, the Hessian bound relative to the barrier metric fails and the exponent derivation no longer applies.

minor comments (2)

[§2 (Problem Formulation)] The notation for the proxy gradient and the explicit form of the barrier Hessian could be stated more explicitly in one location to improve readability.
[Assumptions section] A single consolidated list of all assumptions (including noise bounds and geometry constants) would make cross-referencing with the rate proofs easier.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments on the algorithmic schedules and convergence analysis. We respond to each major comment below.

read point-by-point responses

Referee: [§3 (Algorithm and Schedules)] The barrier-aware step-size and barrier-parameter schedules (defined via the problem constants in the algorithm description) are constructed from a priori Lipschitz constants, strong-convexity modulus, and a lower bound on distance to the polyhedral boundary. These are used to enforce that all outer iterates remain inside the region where the logarithmic barrier metric supplies uniform self-concordance and Hessian bounds relative to the moving lower centers. Since the algorithm provides no estimation or adaptive mechanism for these constants, the uniform curvature control required for the descent lemma is not guaranteed to hold in practice.

Authors: We appreciate the referee's observation on the dependence on a priori constants. The analysis is non-adaptive and assumes these quantities (Lipschitz constants, strong-convexity modulus, and a positive lower bound δ on distance to the polyhedral boundary) are known or conservatively bounded, which is standard for deriving explicit rates in first-order methods. The schedules are constructed precisely from these constants to keep iterates inside the Dikin region with uniform curvature control relative to the barrier metric. In practice, a conservative (under)estimate of δ yields smaller steps that preserve the guarantees, though convergence may slow. We will add a remark in the revised Section 3 discussing conservative selection of these constants and noting that adaptive or line-search extensions could be considered in future work. revision: partial
Referee: [§4 (Convergence Analysis)] The proof of the deterministic stationarity rate O(K^{-2/3}) (and the stochastic O(K^{-2/5}) variant) relies on the descent inequality holding uniformly because the schedules keep iterates in the locally well-behaved Dikin region. If the actual distance to the boundary is smaller than the a priori lower bound used in the schedule, the Hessian bound relative to the barrier metric fails and the exponent derivation no longer applies.

Authors: The proof shows that when the a priori lower bound δ is chosen no larger than the true distance to the boundary, the barrier-aware schedules ensure all iterates remain inside a Dikin ball of controlled radius around the moving lower centers. This maintains the uniform Hessian bounds relative to the barrier metric, validating the descent inequality and the resulting rate exponents. Selecting δ larger than the actual distance would violate the curvature control, but the analysis treats δ as a valid lower bound that the user must supply (analogous to known smoothness parameters in standard first-order analyses). The local Dikin geometry is what permits handling non-uniform Euclidean smoothness without stronger global assumptions. revision: no

Circularity Check

0 steps flagged

No circularity: rates follow from standard analysis under explicit geometric assumptions

full rationale

The paper's central claims are stationarity rates for the barrier-smoothed bilevel objective, obtained by applying descent lemmas and noise bounds inside a local Dikin geometry induced by the logarithmic barrier. The step-size and barrier schedules are constructed explicitly from problem constants (Lipschitz moduli, strong-convexity parameters, distance to polyhedral boundary) that are treated as fixed inputs to the analysis; the rates are then derived as consequences of keeping iterates inside the region where those constants yield uniform self-concordance and Hessian bounds. No equation reduces a claimed prediction to a fitted quantity by construction, no uniqueness theorem is imported from the authors' prior work, and no ansatz is smuggled via self-citation. The derivation therefore remains self-contained against the stated assumptions rather than tautological.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that the lower feasible set is a fixed polyhedron and on the existence of step-size and barrier schedules that keep iterates in regions where the local Dikin geometry supplies bounded curvature; the barrier parameter itself is a user-chosen smoothing constant that must be driven to zero.

free parameters (1)

barrier parameter
User-chosen smoothing constant that is decreased to control approximation bias; its schedule affects both convergence and final accuracy.

axioms (2)

domain assumption Lower-level feasible set is a fixed polyhedron
Explicitly stated as the problem setting; enables the explicit logarithmic barrier Hessian.
domain assumption Upper-level stochastic noise is bounded
Required for the stochastic rate; stated in the abstract.

pith-pipeline@v0.9.0 · 5527 in / 1412 out tokens · 60090 ms · 2026-05-13T02:11:42.360308+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
We adopt a logarithmic barrier smoothing... local Dikin-geometry analysis in which the barrier-metric provides an oracle-free curvature scale... barrier-aware schedules... rates of O(K^{-2/3})... O(K^{-2/5})
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear
Dikin metric induced by the barrier Hessian... self-concordant metric change
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear
fixed polyhedral lower feasible set... barrier parameter decreases

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages

[1]

International conference on machine learning , pages=

Model-agnostic meta-learning for fast adaptation of deep networks , author=. International conference on machine learning , pages=. 2017 , organization=

work page 2017
[2]

International conference on machine learning , pages=

Learning to reweight examples for robust deep learning , author=. International conference on machine learning , pages=. 2018 , organization=

work page 2018
[3]

International conference on machine learning , pages=

Bilevel programming for hyperparameter optimization and meta-learning , author=. International conference on machine learning , pages=. 2018 , organization=

work page 2018
[4]

Transportation science , volume=

A bilevel model for toll optimization on a multicommodity transportation network , author=. Transportation science , volume=. 2001 , publisher=

work page 2001
[5]

Advances in neural information processing systems , volume=

Task-based end-to-end model learning in stochastic optimization , author=. Advances in neural information processing systems , volume=

work page
[6]

predict, then optimize

Smart “predict, then optimize” , author=. Management Science , volume=. 2022 , publisher=

work page 2022
[8]

International conference on machine learning , pages=

Bilevel optimization: Convergence analysis and enhanced design , author=. International conference on machine learning , pages=. 2021 , organization=

work page 2021
[9]

Annals of operations research , volume=

An overview of bilevel optimization , author=. Annals of operations research , volume=. 2007 , publisher=

work page 2007
[10]

SIAM Journal on Optimization , volume=

A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic , author=. SIAM Journal on Optimization , volume=. 2023 , publisher=

work page 2023
[11]

Advances in Neural Information Processing Systems , volume=

Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems , author=. Advances in Neural Information Processing Systems , volume=

work page
[12]

International Conference on Artificial Intelligence and Statistics , pages=

A single-timescale method for stochastic bilevel optimization , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2022 , organization=

work page 2022
[13]

Advances in Neural Information Processing Systems , volume=

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms , author=. Advances in Neural Information Processing Systems , volume=

work page
[14]

Advances in neural information processing systems , volume=

A near-optimal algorithm for stochastic bilevel optimization via double-momentum , author=. Advances in neural information processing systems , volume=

work page
[15]

Advances in Neural Information Processing Systems , volume=

Provably faster algorithms for bilevel optimization , author=. Advances in Neural Information Processing Systems , volume=

work page
[16]

Achieving

Yang, Yifan and Xiao, Peiyao and Ji, Kaiyi , journal=. Achieving

work page
[17]

Liu, Bo and Ye, Mao and Wright, Stephen and Stone, Peter and Liu, Qiang , journal=

work page
[18]

International Conference on Machine Learning , pages=

A fully first-order method for stochastic bilevel optimization , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023
[19]

International Conference on Machine Learning , pages=

Linearly constrained bilevel optimization: A smoothed implicit gradient approach , author=. International Conference on Machine Learning , pages=. 2023 , organization=

work page 2023
[20]

Advances in neural information processing systems , volume=

First-order methods for linearly constrained bilevel optimization , author=. Advances in neural information processing systems , volume=

work page
[22]

The Twelfth International Conference on Learning Representations , year=

Constrained Bi-Level Optimization: Proximal Lagrangian Value Function Approach and Hessian-free Algorithm , author=. The Twelfth International Conference on Learning Representations , year=

work page
[23]

Advances in Neural Information Processing Systems , volume=

A Primal-Dual-Assisted Penalty Approach to Bilevel Optimization with Coupled Constraints , author=. Advances in Neural Information Processing Systems , volume=

work page
[24]

ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

An implicit gradient method for constrained bilevel problems using barrier approximation , author=. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2023 , organization=

work page 2023
[26]

Deng, Li , journal=. The. 2012 , publisher=

work page 2012
[27]

Some theoretical aspects of road traffic research

Road paper. Some theoretical aspects of road traffic research. , author=. Proceedings of the institution of civil engineers , volume=. 1952 , publisher=

work page 1952
[28]

Studies in the Economics of Transportation , author=

work page
[29]

Mathematical programming , volume=

Analysis of the Frank--Wolfe method for convex composite optimization involving a logarithmically-homogeneous barrier , author=. Mathematical programming , volume=. 2023 , publisher=

work page 2023
[30]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

A Single-Loop First-Order Algorithm for Linearly Constrained Bilevel Optimization , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

work page
[31]

Advances in neural information processing systems , volume=

Differentiable convex optimization layers , author=. Advances in neural information processing systems , volume=

work page
[32]

Differentiable convex optimization layers

Akshay Agrawal, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, and J Zico Kolter. Differentiable convex optimization layers. Advances in neural information processing systems, 32, 2019

work page 2019
[33]

Studies in the economics of transportation

Martin Beckmann, Charles B McGuire, and Christopher B Winsten. Studies in the economics of transportation. Technical report, 1956

work page 1956
[34]

A bilevel model for toll optimization on a multicommodity transportation network

Luce Brotcorne, Martine Labb \'e , Patrice Marcotte, and Gilles Savard. A bilevel model for toll optimization on a multicommodity transportation network. Transportation science, 35 0 (4): 0 345--358, 2001

work page 2001
[35]

Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems

Tianyi Chen, Yuejiao Sun, and Wotao Yin. Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems. Advances in Neural Information Processing Systems, 34: 0 25294--25307, 2021

work page 2021
[36]

A single-timescale method for stochastic bilevel optimization

Tianyi Chen, Yuejiao Sun, Quan Xiao, and Wotao Yin. A single-timescale method for stochastic bilevel optimization. In International Conference on Artificial Intelligence and Statistics, pages 2466--2488. PMLR, 2022

work page 2022
[37]

An overview of bilevel optimization

Beno \^ t Colson, Patrice Marcotte, and Gilles Savard. An overview of bilevel optimization. Annals of operations research, 153 0 (1): 0 235--256, 2007

work page 2007
[38]

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms

Mathieu Dagr \'e ou, Pierre Ablin, Samuel Vaiter, and Thomas Moreau. A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. Advances in Neural Information Processing Systems, 35: 0 26698--26710, 2022

work page 2022
[39]

The MNIST database of handwritten digit images for machine learning research [best of the web]

Li Deng. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE signal processing magazine, 29 0 (6): 0 141--142, 2012

work page 2012
[40]

Task-based end-to-end model learning in stochastic optimization

Priya Donti, Brandon Amos, and J Zico Kolter. Task-based end-to-end model learning in stochastic optimization. Advances in neural information processing systems, 30, 2017

work page 2017
[41]

predict, then optimize

Adam N Elmachtoub and Paul Grigas. Smart “predict, then optimize”. Management Science, 68 0 (1): 0 9--26, 2022

work page 2022
[42]

Model-agnostic meta-learning for fast adaptation of deep networks

Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pages 1126--1135. PMLR, 2017

work page 2017
[43]

Bilevel programming for hyperparameter optimization and meta-learning

Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, and Massimiliano Pontil. Bilevel programming for hyperparameter optimization and meta-learning. In International conference on machine learning, pages 1568--1577. PMLR, 2018

work page 2018
[44]

Saeed Ghadimi and Mengdi Wang

Saeed Ghadimi and Mengdi Wang. Approximation methods for bilevel programming. arXiv preprint arXiv:1802.02246, 2018

work page arXiv 2018
[45]

A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic

Mingyi Hong, Hoi-To Wai, Zhaoran Wang, and Zhuoran Yang. A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic. SIAM Journal on Optimization, 33 0 (1): 0 147--180, 2023

work page 2023
[46]

Bilevel optimization: Convergence analysis and enhanced design

Kaiyi Ji, Junjie Yang, and Yingbin Liang. Bilevel optimization: Convergence analysis and enhanced design. In International conference on machine learning, pages 4882--4892. PMLR, 2021

work page 2021
[47]

Tenorio, Fernando Real-Rojas, Antonio G

Liuyuan Jiang, Quan Xiao, Victor M. Tenorio, Fernando Real-Rojas, Antonio G. Marques, and Tianyi Chen. A primal-dual-assisted penalty approach to bilevel optimization with coupled constraints. In Advances in Neural Information Processing Systems, volume 37, pages 95026--95066, 2024 a

work page 2024
[48]

A barrier function approach for bilevel optimization with coupled lower-level constraints: Formulation, approximation and algorithms

Xiaotian Jiang, Jiaxiang Li, Mingyi Hong, and Shuzhong Zhang. A barrier function approach for bilevel optimization with coupled lower-level constraints: Formulation, approximation and algorithms. arXiv preprint arXiv:2410.10670, 2024 b

work page arXiv 2024
[49]

A near-optimal algorithm for stochastic bilevel optimization via double-momentum

Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang, and Zhuoran Yang. A near-optimal algorithm for stochastic bilevel optimization via double-momentum. Advances in neural information processing systems, 34: 0 30271--30283, 2021

work page 2021
[50]

Linearly constrained bilevel optimization: A smoothed implicit gradient approach

Prashant Khanduri, Ioannis Tsaknakis, Yihua Zhang, Jia Liu, Sijia Liu, Jiawei Zhang, and Mingyi Hong. Linearly constrained bilevel optimization: A smoothed implicit gradient approach. In International Conference on Machine Learning, pages 16291--16325. PMLR, 2023

work page 2023
[51]

First-order methods for linearly constrained bilevel optimization

Guy Kornowski, Swati Padmanabhan, Kai Wang, Jimmy Zhang, and Suvrit Sra. First-order methods for linearly constrained bilevel optimization. Advances in neural information processing systems, 37: 0 141417--141460, 2024

work page 2024
[52]

A fully first-order method for stochastic bilevel optimization

Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, and Robert D Nowak. A fully first-order method for stochastic bilevel optimization. In International Conference on Machine Learning, pages 18083--18113. PMLR, 2023

work page 2023
[53]

BOME! bilevel optimization made easy: A simple first-order approach

Bo Liu, Mao Ye, Stephen Wright, Peter Stone, and Qiang Liu. BOME! bilevel optimization made easy: A simple first-order approach. Advances in neural information processing systems, 35: 0 17248--17262, 2022

work page 2022
[54]

Bridging constraints and stochasticity: A fully first-order method for stochastic bilevel optimization with linear constraints

Cac Phan and Kai Wang. Bridging constraints and stochasticity: A fully first-order method for stochastic bilevel optimization with linear constraints. arXiv preprint arXiv:2511.09845, 2025

work page arXiv 2025
[55]

Learning to reweight examples for robust deep learning

Mengye Ren, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. Learning to reweight examples for robust deep learning. In International conference on machine learning, pages 4334--4343. PMLR, 2018

work page 2018
[56]

A single-loop first-order algorithm for linearly constrained bilevel optimization

Wei Shen, Jiawei Zhang, Minhui Huang, and Cong Shen. A single-loop first-order algorithm for linearly constrained bilevel optimization. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

work page 2025
[57]

An implicit gradient method for constrained bilevel problems using barrier approximation

Ioannis Tsaknakis, Prashant Khanduri, and Mingyi Hong. An implicit gradient method for constrained bilevel problems using barrier approximation. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1--5. IEEE, 2023

work page 2023
[58]

Road paper

John Glen Wardrop. Road paper. some theoretical aspects of road traffic research. Proceedings of the institution of civil engineers, 1 0 (3): 0 325--362, 1952

work page 1952
[59]

Provably faster algorithms for bilevel optimization

Junjie Yang, Kaiyi Ji, and Yingbin Liang. Provably faster algorithms for bilevel optimization. Advances in Neural Information Processing Systems, 34: 0 13670--13682, 2021

work page 2021
[60]

Achieving O ( ^ -1.5 ) complexity in hessian/jacobian-free stochastic bilevel optimization

Yifan Yang, Peiyao Xiao, and Kaiyi Ji. Achieving O ( ^ -1.5 ) complexity in hessian/jacobian-free stochastic bilevel optimization. Advances in Neural Information Processing Systems, 36: 0 39491--39503, 2023

work page 2023
[61]

Constrained bi-level optimization: Proximal lagrangian value function approach and hessian-free algorithm

Wei Yao, Chengming Yu, Shangzhi Zeng, and Jin Zhang. Constrained bi-level optimization: Proximal lagrangian value function approach and hessian-free algorithm. In The Twelfth International Conference on Learning Representations, 2024

work page 2024
[62]

Analysis of the frank--wolfe method for convex composite optimization involving a logarithmically-homogeneous barrier

Renbo Zhao and Robert M Freund. Analysis of the frank--wolfe method for convex composite optimization involving a logarithmically-homogeneous barrier. Mathematical programming, 199 0 (1): 0 123--163, 2023

work page 2023