
arxiv: 2606.21153 · v1 · pith:R7VOFMSInew · submitted 2026-06-19 · 🧮 math.OC · cs.LG

DUET: Decentralized Bilevel Optimization without Lower-Level Strong Convexity

Pith reviewed 2026-06-26 14:03 UTC · model grok-4.3

classification 🧮 math.OC cs.LG
keywords: decentralized bilevel optimization, diminishing quadratic regularization, lower-level strong convexity, hypergradient, gradient tracking, KKT-stationary point, data heterogeneity, multi-agent systems

The pith

DUET enables decentralized bilevel optimization without lower-level strong convexity by adding diminishing quadratic regularization to the lower-level objective.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DUET as a single-loop decentralized algorithm that removes the standard requirement of lower-level strong convexity for bilevel problems. It does this by adding a diminishing quadratic regularization term to the lower-level objective and by using gradient tracking to manage data heterogeneity across agents. Under relaxed assumptions on the lower-level problem, DUET reaches approximate KKT-stationary points with iteration complexity O(1/T^{1-5p-(11/4)τ}), where p and τ control the lower-level learning rate and averaging, respectively. A sympathetic reader would care because most existing decentralized bilevel methods require a strongly convex lower level, which narrows their use in practical multi-agent settings.

Core claim

To the authors' knowledge, DUET is the first decentralized bilevel optimization method to guarantee convergence to approximate KKT-stationary points without lower-level strong convexity. It achieves this by applying diminishing quadratic regularization to the lower-level objective, together with gradient tracking to handle data heterogeneity.

What carries the argument

Diminishing quadratic regularization added to the lower-level objective, which produces a well-defined hypergradient and stationarity measure without requiring strong convexity.
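To make the mechanism concrete, here is a minimal sketch on a hypothetical two-dimensional lower-level problem g(x, y) = ½(y₁ + y₂ - x)², which is convex but not strongly convex in y: every point on the line y₁ + y₂ = x is a minimizer, and the Hessian in y is singular. Adding (μ/2)‖y‖² with μ > 0 makes that Hessian invertible, so both the regularized solution and the implicit-function hypergradient dy*/dx are unique. The toy problem, the μ sequence, and the function names are assumptions for illustration. They are not the paper's DUET update or its regularization schedule.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]


def solve_2x2(h: List[List[float]], r: Vec2) -> Vec2:
    """Solve h @ y = r for a 2x2 system by Cramer's rule."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    if abs(det) < 1e-12:
        raise ZeroDivisionError("lower-level Hessian is singular")
    y1 = (r[0] * h[1][1] - h[0][1] * r[1]) / det
    y2 = (h[0][0] * r[1] - h[1][0] * r[0]) / det
    return (y1, y2)


def ll_hessian(mu: float) -> List[List[float]]:
    # Hessian in y of g(x, y) + (mu / 2) * ||y||^2 for the toy g(x, y) = 0.5 * (y1 + y2 - x)^2.
    return [[1.0 + mu, 1.0], [1.0, 1.0 + mu]]


def regularized_solution(x: float, mu: float) -> Vec2:
    # First-order condition (y1 + y2 - x) + mu * y_k = 0 for k = 1, 2, i.e. H y = (x, x).
    return solve_2x2(ll_hessian(mu), (x, x))


def implicit_dy_dx(mu: float) -> Vec2:
    # Implicit function theorem: dy*/dx = -H^{-1} (d/dx grad_y g), and d/dx grad_y g = (-1, -1).
    return solve_2x2(ll_hessian(mu), (1.0, 1.0))


if __name__ == "__main__":
    x = 2.0
    for mu in [1.0, 0.1, 0.01, 0.001]:  # hypothetical diminishing sequence
        print(mu, regularized_solution(x, mu), implicit_dy_dx(mu))
    try:
        regularized_solution(x, 0.0)  # mu = 0: no strong convexity, singular Hessian
    except ZeroDivisionError as err:
        print("mu = 0:", err)
```

In this toy, each component of the regularized solution equals x/(2 + μ) and each component of dy*/dx equals 1/(2 + μ). Both are uniquely defined for every μ > 0 and tend to the minimum-norm minimizer (x/2, x/2) and slope 1/2 as μ → 0. At μ = 0 the linear system has no unique solution.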

If this is right

  • Convergence to approximate KKT points holds under the paper's relaxed lower-level assumptions rather than strong convexity.
  • Gradient tracking inside DUET addresses data heterogeneity without a central server.
  • The iteration complexity scales as O(1/T^{1-5p-(11/4)τ}), with explicit dependence on the lower-level learning-rate and averaging parameters.
  • The algorithm applies to multi-agent systems performing local bilevel tasks in a fully decentralized manner.
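Gradient tracking can be illustrated on a single-level toy problem. The sketch below is a generic gradient-tracking iteration, not DUET's bilevel update. It runs on five agents in a ring, and each agent holds a hypothetical local quadratic f_i(x) = ½(x - b_i)² with a different b_i to mimic heterogeneous data. Each agent mixes its iterate with its neighbours' through a doubly stochastic matrix W and steps along a tracker s_i that estimates the network-average gradient, so all agents settle on the global minimizer, the mean of the b_i. The ring topology, the uniform 1/3 weights, the step size, and the function names are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def ring_mixing_matrix(n: int) -> Matrix:
    # Symmetric, doubly stochastic weights: 1/3 on self and on each ring neighbour (n >= 3).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] += 1.0 / 3.0
    return w


def mix(w: Matrix, v: List[float]) -> List[float]:
    # One round of neighbour averaging: (W v)_i = sum_j W_ij v_j.
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def gradient_tracking(b: List[float], alpha: float = 0.1, iters: int = 500) -> List[float]:
    n = len(b)
    w = ring_mixing_matrix(n)

    def local_grads(x: List[float]) -> List[float]:
        # Gradient of f_i(x) = 0.5 * (x - b_i)^2 at agent i's own iterate.
        return [x_i - b_i for x_i, b_i in zip(x, b)]

    x = [0.0] * n
    g = local_grads(x)
    s = list(g)  # trackers start at the local gradients
    for _ in range(iters):
        # x_i <- sum_j W_ij x_j - alpha * s_i
        x_new = [mx - alpha * s_i for mx, s_i in zip(mix(w, x), s)]
        g_new = local_grads(x_new)
        # s_i <- sum_j W_ij s_j + grad f_i(x_i new) - grad f_i(x_i old)
        s = [ms + gn - go for ms, gn, go in zip(mix(w, s), g_new, g)]
        x, g = x_new, g_new
    return x


if __name__ == "__main__":
    b = [1.0, -2.0, 4.0, 0.5, 3.5]  # heterogeneous local targets; their mean is 1.4
    print(gradient_tracking(b))
```

If s_i is replaced by the raw local gradient (plain decentralized gradient descent), the agents keep a steady-state disagreement whenever the step size is constant and the b_i differ. That heterogeneity bias is what the tracker removes.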

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The regularization technique could be tested on lower-level problems that are non-convex but still satisfy the paper's relaxed conditions.
  • Removing the strong-convexity assumption opens the method to bilevel tasks arising in modern machine-learning models where convexity rarely holds.
  • The same diminishing-regularization idea might be portable to other decentralized or distributed bilevel settings beyond the ones studied here.

Load-bearing premise

Relaxed assumptions on the lower-level problem suffice to make the diminishing quadratic regularization produce a well-defined hypergradient and stationarity measure.

What would settle it

A counterexample or numerical run in which the hypergradient becomes undefined or stationarity fails to hold for any choice of the diminishing regularization schedule.

Figures

Figures reproduced from arXiv: 2606.21153 by Jia Liu, Songtao Lu, Yingbin Liang, Zhen Qin, Zhuqing Liu.

Figure 1. The gradients of the variables x and y, and the objective values of the UL and LL problems. [Extracted panel text: test accuracy vs. epochs for DUET (GT = 1), DSGT (GT = 1), and DSGD, panel (a) i.i.d.; a further panel adds DUET and DSGT with GT = 0; remainder truncated.]
Figure 2. Test accuracy on the meta-learning problem with a 5-agent network on MNIST. [Extracted panel text: test accuracy vs. epochs for DUET and DSGT with p = 0.3, 0.5, and 0.7, panel (a) i.i.d.; a second panel is truncated.]
Figure 4. The norms of x and y. The decentralized stochastic gradient descent (DSGD) approach is used as the baseline for the i.i.d. case: it first updates θ by gradient descent, then uses the updated θ to compute the gradient with respect to x, and finally updates x via SGD. We compare the performance of DUET and DSGT in both i.i.d. and non-i.i.d. settings. Figures 5 and 6 illustrate the train loss and accuracy results for both…
Figure 5. Comparisons between DUET and DSGT in the i.i.d. data scenario on the meta-learning problem with a 5-agent network on MNIST. [Extracted panel text: (a) train loss vs. epochs and a train-accuracy panel for DUET and DSGT with GT = 1 and GT = 0; remainder truncated.]
Figure 6. Comparisons between DUET and DSGT in the non-i.i.d. data scenario on the meta-learning problem with a 5-agent network on MNIST.
Figure 7. Label distributions of data heterogeneity across nodes for the non-i.i.d. case on the meta-learning …
Figure 8. Comparisons between DUET and DSGT in the i.i.d. data scenario on the meta-learning problem with a 10-agent network on MNIST. [Extracted panel text: (a) train loss vs. epochs and a train-accuracy panel for DUET (GT = 1), DSGT (GT = 1), and DSGD; remainder truncated.]
Figure 9. Comparisons between DUET and DSGT in the i.i.d. data scenario on the meta-learning problem with a 50-agent network on MNIST. The following table summarizes the parameter settings for the DUET, DSGT, and DSGD algorithms under both i.i.d. and non-i.i.d. cases, illustrating the diverse configurations tested in our experiments. E.3 Decentralized hyperparameter optimization with real-world data: In this secti…
Figure 10. Comparisons between DUET and DSGT on the hyperparameter optimization problem with a 10-agent network on FashionMNIST. Parameter settings for the DUET, DSGT, and DSGD algorithms on this problem:
  Algorithm | UL learning rate | LL learning rate | Parameters (µ, p)
  DUET      | 0.001            | 0.1              | (0.1, 1/5)
  DSGT      | 0.1              | 0.01             | (0.9, 1/5)
  DSGD      | … (truncated in source)
Original abstract

Decentralized bilevel optimization (DBO) provides a powerful framework for multi-agent systems to solve local bilevel tasks in a decentralized fashion without the need for a central server. However, most existing DBO methods rely on lower-level strong convexity (LLSC) to guarantee unique solutions and a well-defined hypergradient for stationarity measure, hindering their applicability in many practical scenarios not satisfying LLSC. To overcome this limitation, we introduce a new single-loop DBO algorithm called diminishing quadratically-regularized bilevel decentralized optimization (DUET), which eliminates the need for LLSC by introducing a diminishing quadratic regularization to the lower-level (LL) objective. We show that DUET achieves an iteration complexity of $O(1/T^{1-5p-\frac{11}{4}\tau})$ for approximate KKT-stationary point convergence under relaxed assumptions, where $p$ and $\tau $ are control parameters for LL learning rate and averaging, respectively. In addition, our DUET algorithm incorporates gradient tracking to address data heterogeneity, a key challenge in DBO settings. To the best of our knowledge, this is the first work to tackle DBO without LLSC under decentralized settings with data heterogeneity. Numerical experiments validate the theoretical findings and demonstrate the practical effectiveness of our proposed algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes DUET, a single-loop decentralized bilevel optimization algorithm that applies a diminishing quadratic regularization to the lower-level objective. This removes the need for lower-level strong convexity (LLSC) while preserving a well-defined hypergradient and enabling convergence to approximate KKT-stationary points. The claimed iteration complexity is O(1/T^{1-5p-(11/4)τ}) under relaxed assumptions, with gradient tracking incorporated to handle data heterogeneity across agents. Numerical experiments are presented to validate the theory, and the work claims to be the first to address DBO without LLSC in decentralized heterogeneous settings.

Significance. If the derivation and assumptions hold, the result would meaningfully extend decentralized bilevel optimization beyond the LLSC regime that limits most prior DBO methods. The combination of diminishing regularization with gradient tracking directly targets practical multi-agent scenarios, and the explicit (parameter-dependent) complexity bound provides a concrete benchmark. The absence of machine-checked proofs or fully reproducible code is noted but does not diminish the potential impact if the analysis is correct.

major comments (2)
  1. [§3 and §4] §3 (Assumptions) and §4 (Convergence Analysis): the relaxed assumptions replacing LLSC are invoked to guarantee a well-defined hypergradient and the KKT stationarity measure, yet the manuscript does not explicitly list or compare them to standard LLSC conditions; this is load-bearing for the central claim that the complexity bound holds without LLSC.
  2. [Theorem 1] Theorem 1 (or equivalent complexity statement): the exponent 1-5p-(11/4)τ depends on the control parameters p and τ; the manuscript should clarify the admissible range of these parameters and whether the bound remains meaningful (positive exponent) under the relaxed assumptions without additional hidden restrictions.
minor comments (2)
  1. [§2 and §4] Notation for the diminishing regularization parameter and the stationarity measure should be introduced once and used consistently across the algorithm description and analysis sections.
  2. [§5] The experimental section would benefit from an explicit statement of how the relaxed assumptions are satisfied (or approximated) in the chosen test problems.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive recommendation of minor revision and the constructive comments. We address each major comment below.

Point-by-point responses
  1. Referee: [§3 and §4] §3 (Assumptions) and §4 (Convergence Analysis): the relaxed assumptions replacing LLSC are invoked to guarantee a well-defined hypergradient and the KKT stationarity measure, yet the manuscript does not explicitly list or compare them to standard LLSC conditions; this is load-bearing for the central claim that the complexity bound holds without LLSC.

    Authors: We agree that an explicit listing and comparison would strengthen the presentation. In the revised manuscript we will insert a dedicated paragraph in §3 that enumerates the relaxed assumptions and provides a direct side-by-side comparison with the classical lower-level strong-convexity condition, clarifying how the diminishing quadratic regularization guarantees a well-defined hypergradient and a meaningful KKT stationarity measure without invoking LLSC. revision: yes

  2. Referee: [Theorem 1] Theorem 1 (or equivalent complexity statement): the exponent 1-5p-(11/4)τ depends on the control parameters p and τ; the manuscript should clarify the admissible range of these parameters and whether the bound remains meaningful (positive exponent) under the relaxed assumptions without additional hidden restrictions.

    Authors: The parameters must satisfy p > 0, τ > 0, and 5p + (11/4)τ < 1 to obtain a positive exponent. These ranges are admissible under the relaxed assumptions because the convergence analysis relies only on the diminishing regularization schedule and the gradient-tracking mechanism, not on LLSC. We will revise the statement of Theorem 1 (and the surrounding discussion in §4) to state the admissible ranges explicitly and to confirm that the exponent remains positive throughout this range with no additional hidden restrictions. revision: yes
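The admissible range quoted in this response can be checked mechanically. The sketch below takes the exponent e = 1 - 5p - (11/4)τ from the stated bound O(1/T^e) and tests the condition p > 0, τ > 0, 5p + (11/4)τ < 1. As a rough illustration it also reports the smallest T with 1/T^e ≤ ε, namely T ≥ ε^(-1/e), ignoring the constant hidden in the O(·). Dropping that constant is an assumption, and the helper names are hypothetical rather than taken from the paper.

```python
import math
from fractions import Fraction


def rate_exponent(p: Fraction, tau: Fraction) -> Fraction:
    # Exponent e in the stated bound O(1 / T^e), with e = 1 - 5p - (11/4) tau.
    return 1 - 5 * p - Fraction(11, 4) * tau


def is_admissible(p: Fraction, tau: Fraction) -> bool:
    # Range quoted in the rebuttal: p > 0, tau > 0 and 5p + (11/4) tau < 1.
    return p > 0 and tau > 0 and rate_exponent(p, tau) > 0


def iterations_for(eps: float, p: Fraction, tau: Fraction) -> int:
    # Smallest integer T with T^(-e) <= eps, ignoring the constant hidden in O(.).
    if not is_admissible(p, tau):
        raise ValueError("exponent is not positive; the bound gives no decay")
    e = float(rate_exponent(p, tau))
    return math.ceil(eps ** (-1.0 / e))


if __name__ == "__main__":
    p, tau = Fraction(1, 20), Fraction(1, 10)
    print(rate_exponent(p, tau))  # 19/40
    print(iterations_for(0.1, p, tau))
```

Using exact fractions keeps the boundary case exact. For instance, p = 1/5 makes 5p = 1, so the exponent is non-positive for every τ ≥ 0 and the bound gives no decay.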

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces DUET via diminishing quadratic regularization on the lower-level objective to remove the LLSC assumption, then derives an iteration complexity bound O(1/T^{1-5p-11/4 τ}) for approximate KKT-stationary points under explicitly relaxed assumptions, with gradient tracking for heterogeneity. The bound is stated as a function of tunable control parameters p and τ rather than any fitted quantity; no equations reduce by construction to inputs, no self-citation chains are load-bearing for the central claim, and the derivation is presented as independent theoretical analysis. The result is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; free parameters are the two control parameters mentioned, and the key domain assumption is the set of relaxed lower-level conditions.

free parameters (2)
  • p
    Control parameter for lower-level learning rate that appears in the complexity exponent.
  • τ
    Control parameter for averaging that appears in the complexity exponent.
axioms (1)
  • domain assumption Relaxed assumptions on the lower-level problem that replace lower-level strong convexity
    Invoked to guarantee a well-defined hypergradient and to support the KKT-stationarity convergence claim.

pith-pipeline@v0.9.1-grok · 5768 in / 1234 out tokens · 17149 ms · 2026-06-26T14:03:54.272982+00:00 · methodology


Reference graph

Works this paper leans on

84 extracted references · 3 canonical work pages

  1. [1]

    Proceedings of International Conference on Machine Learning , pages =

    Risheng Liu and Yaohua Liu and Wei Yao and Shangzhi Zeng and Jin Zhang , title =. Proceedings of International Conference on Machine Learning , pages =

  2. [2]

    arXiv preprint arXiv:2301.00712 , year=

    On bilevel optimization without lower-level strong convexity , author=. arXiv preprint arXiv:2301.00712 , year=

  3. [3]

    Proceedings of International Conference on Artificial Intelligence and Statistics , pages=

    A conditional gradient-based method for simple bilevel optimization with convex lower-level problem , author=. Proceedings of International Conference on Artificial Intelligence and Statistics , pages=

  4. [6]

    Optimization Letters , year=

    Decentralized bilevel optimization , author=. Optimization Letters , year=

  5. [7]

    Advances in Neural Information Processing Systems , year=

    A stochastic linearized augmented Lagrangian method for decentralized bilevel optimization , author=. Advances in Neural Information Processing Systems , year=

  6. [8]

    Advances in neural information processing systems , volume=

    Decentralized gossip-based stochastic bilevel optimization over communication networks , author=. Advances in neural information processing systems , volume=

  7. [9]

    International Conference on Artificial Intelligence and Statistics , pages=

    On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=

  8. [10]

    2018 , publisher=

    Lectures on Convex Optimization , author=. 2018 , publisher=

  9. [12]

    Proceedings of the 38th International Conference on Machine Learning , pages =

    Bilevel Optimization: Convergence Analysis and Enhanced Design , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

  10. [13]

    Proceedings of the Neural Information Processing Systems (NeurIPS) , year=

    A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum , author=. Proceedings of the Neural Information Processing Systems (NeurIPS) , year=

  11. [14]

    Journal of Machine Learning Research , year =

    Kaiyi ji and Yingbin Liang , title =. Journal of Machine Learning Research , year =

  12. [15]

    Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS) , year=

    A Single-Timescale Method for Stochastic Bilevel Optimization , author=. Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS) , year=

  13. [16]

    Advances in Neural Information Processing Systems , volume=

    A framework for bilevel optimization that enables stochastic and global variance reduction algorithms , author=. Advances in Neural Information Processing Systems , volume=. 2022 , organization=

  14. [17]

    Proceedings of the International Conference on Machine Learning (ICML) , year=

    A Generic First-Order Algorithmic Framework for Bi-Level Programming Beyond Lower-Level Singleton , author=. Proceedings of the International Conference on Machine Learning (ICML) , year=

  15. [18]

    Proceedings of the 36th International Conference on Neural Information Processing Systems , articleno =

    Ji, Kaiyi and Liu, Mingrui and Liang, Yingbin and Ying, Lei , title =. Proceedings of the 36th International Conference on Neural Information Processing Systems , articleno =. 2024 , isbn =

  16. [21]

    SIAM Journal on Optimization , volume =

    Shoham Sabach and Shimrit Shtern , title =. SIAM Journal on Optimization , volume =

  17. [22]

    Xu , title =

    H.-K. Xu , title =. Journal of Mathematical Analysis and Applications , volume =. 2004 , doi =

  18. [23]

    2020 , eprint=

    Improved Bilevel Model: Fast and Optimal Algorithm with Theoretical Guarantee , author=. 2020 , eprint=

  19. [24]

    IEEE INFOCOM 2023-IEEE Conference on Computer Communications , pages=

    DIAMOND: Taming Sample and Communication Complexities in Decentralized Bilevel Optimization , author=. IEEE INFOCOM 2023-IEEE Conference on Computer Communications , pages=. 2023 , organization=

  20. [25]

    Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing , pages=

    Interact: Achieving low sample and communication complexities in decentralized bilevel learning over networks , author=. Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing , pages=

  21. [26]

    Proceedings of the 40th International Conference on Machine Learning , pages =

    Prometheus: Taming Sample and Communication Complexities in Constrained Decentralized Stochastic Bilevel Learning , author =. Proceedings of the 40th International Conference on Machine Learning , pages =

  22. [27]

    Proceedings of the 41st International Conference on Machine Learning , pages =

    Distributed Bilevel Optimization with Communication Compression , author =. Proceedings of the 41st International Conference on Machine Learning , pages =. 2024 , editor =

  23. [28]

    arXiv preprint arXiv:2312.14690 , year=

    Distributed Stochastic Bilevel Optimization: Improved Complexity and Heterogeneity Analysis , author=. arXiv preprint arXiv:2312.14690 , year=

  24. [29]

    and Kingsbury, Brian and Horesh, Lior , booktitle=

    Lu, Songtao and Cui, Xiaodong and Squillante, Mark S. and Kingsbury, Brian and Horesh, Lior , booktitle=. Decentralized Bilevel Optimization for Personalized Client Learning , year=

  25. [30]

    Proceedings of the 40th International Conference on Machine Learning , volume=

    Decentralized stochastic bilevel optimization with improved per-iteration complexity , author=. Proceedings of the 40th International Conference on Machine Learning , volume=. 2023 , organization=

  26. [31]

    SIAM Journal on Optimization , year=

    A two-timescale framework for bilevel optimization: Complexity analysis and application to actor-critic , author=. SIAM Journal on Optimization , year=

  27. [32]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Bi-level actor-critic for multi-agent coordination , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  28. [33]

    Advances in Neural Information Processing Systems , volume=

    Smooth bilevel programming for sparse regularization , author=. Advances in Neural Information Processing Systems , volume=

  29. [34]

    2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) , pages=

    BOML: A modularized bilevel optimization library in Python for meta-learning , author=. 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) , pages=. 2021 , organization=

  30. [35]

    Advances in neural information processing systems , volume=

    Meta-learning with implicit gradients , author=. Advances in neural information processing systems , volume=

  31. [36]

    Proceedings of Thirty Seventh Conference on Learning Theory , pages =

    On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis , author =. Proceedings of Thirty Seventh Conference on Learning Theory , pages =. 2024 , editor =

  32. [37]

    Advances in Neural Information Processing Systems , volume=

    Automatic and harmless regularization with constrained and lexicographic optimization: A dynamic barrier approach , author=. Advances in Neural Information Processing Systems , volume=

  33. [38]

    2024 , eprint=

    A Single-Loop Algorithm for Decentralized Bilevel Optimization , author=. 2024 , eprint=

  34. [39]

    NeurIPS 2021 , year=

    Bi-objective trade-off with dynamic barrier gradient descent , author=. NeurIPS 2021 , year=

  35. [40]

    Neural Computation , volume=

    Dictionary learning algorithms for sparse representation , author=. Neural Computation , volume=

  36. [41]

    IEEE Transactions on Signal Processing , volume=

    Dictionary learning for sparse approximations with the majorization method , author=. IEEE Transactions on Signal Processing , volume=. 2009 , publisher=

  37. [42]

    Proceedings of the 30th ACM International Conference on Information & Knowledge Management , pages=

    PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models , author=. Proceedings of the 30th ACM International Conference on Information & Knowledge Management , pages=. 2021 , organization=

  38. [43]

    IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

    Dictionary learning for sparse coding: Algorithms and convergence analysis , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=. 2015 , publisher=

  39. [44]

    ArXiv , year=

    DoCoM-SGT: Doubly Compressed Momentum-assisted Stochastic Gradient Tracking Algorithm for Communication Efficient Decentralized Learning , author=. ArXiv , year=

  40. [45]

    Proceedings of the AAAI Conference on Artificial Intelligence , year=

    Bi-Level Actor-Critic for Multi-Agent Coordination , author=. Proceedings of the AAAI Conference on Artificial Intelligence , year=. doi:10.1609/aaai.v34i05.6226 , pages=

  41. [47]

    The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

    CoBo: Collaborative Learning via Bilevel Optimization , author=. The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

  42. [48]

    Distributed Subgradient Methods for Multi-Agent Optimization , year=

    Nedic, Angelia and Ozdaglar, Asuman , journal=. Distributed Subgradient Methods for Multi-Agent Optimization , year=

  43. [49]

    Advances in neural information processing systems , volume=

    Bome! bilevel optimization made easy: A simple first-order approach , author=. Advances in neural information processing systems , volume=

  44. [50]

    The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

    Penalty-based Methods for Simple Bilevel Optimization under H\"olderian Error Bounds , author=. The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

  45. [52]

    The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

    An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization , author=. The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

  46. [53]

    2024 American Control Conference (ACC) , year=

    Achieving Optimal Complexity Guarantees for a Class of Bilevel Convex Optimization Problems , author=. 2024 American Control Conference (ACC) , year=

  47. [54]

    Projection-free methods for stochastic simple bilevel optimization with convex lower-level problem

    Jincheng Cao, Ruichen Jiang, Nazanin Abolfazli, Erfan Yazdandoost Hamedani, and Aryan Mokhtari. Projection-free methods for stochastic simple bilevel optimization with convex lower-level problem. arXiv preprint arXiv:2308.07536, 2023

  48. [55]

    An accelerated gradient method for convex smooth simple bilevel optimization

    Jincheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, and Aryan Mokhtari. An accelerated gradient method for convex smooth simple bilevel optimization. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=aFOdln7jBV

  49. [56]

    On finding small hyper-gradients in bilevel optimization: Hardness results and improved analysis

    Lesi Chen, Jing Xu, and Jingzhao Zhang. On finding small hyper-gradients in bilevel optimization: Hardness results and improved analysis. In Shipra Agrawal and Aaron Roth (eds.), Proceedings of Thirty Seventh Conference on Learning Theory, volume 247 of Proceedings of Machine Learning Research, pp.\ 947--980. PMLR, 30 Jun--03 Jul 2024 a . URL https://proc...

  50. [57]

    Penalty-based methods for simple bilevel optimization under h\"olderian error bounds

    Pengyu Chen, Xu Shi, Rujun Jiang, and Jiulin Wang. Penalty-based methods for simple bilevel optimization under h\"olderian error bounds. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024 b . URL https://openreview.net/forum?id=oQ1Zj9iH88

  51. [58]

    X. Chen, M. Huang, S. Ma, and K. Balasubramanian. Decentralized stochastic bilevel optimization with improved per-iteration complexity. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pp.\ 4641--4671. PMLR, 2023

  52. [59]

    Decentralized bilevel optimization

    Xuxing Chen, Minhui Huang, and Shiqian Ma. Decentralized bilevel optimization. Optimization Letters, 2022. URL https://api.semanticscholar.org/CorpusID:249626492

  53. [60]

    Dagréou, P

    M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau. A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. In Advances in Neural Information Processing Systems, volume 35, pp.\ 26698--26710. Curran Associates, Inc., 2022

  54. [61]

    A single-loop algorithm for decentralized bilevel optimization, 2024

    Youran Dong, Shiqian Ma, Junfeng Yang, and Chao Yin. A single-loop algorithm for decentralized bilevel optimization, 2024. URL https://arxiv.org/abs/2311.08945

  55. [62]

    On the convergence of distributed stochastic bilevel optimization algorithms over a network

    Hongchang Gao, Bin Gu, and My T Thai. On the convergence of distributed stochastic bilevel optimization algorithms over a network. In International Conference on Artificial Intelligence and Statistics, pp.\ 9238--9281. PMLR, 2023

  56. [63]

    Approximation methods for bilevel programming

    Saeed Ghadimi and Mengdi Wang. Approximation methods for bilevel programming. arXiv preprint arXiv:1802.02246, 2018

  57. [64]

    Cobo: Collaborative learning via bilevel optimization

    Diba Hashemi, Lie He, and Martin Jaggi. Cobo: Collaborative learning via bilevel optimization. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=SjQ1iIqpfU

  58. [65]

    Distributed bilevel optimization with communication compression

    Yutong He, Jie Hu, Xinmeng Huang, Songtao Lu, Bin Wang, and Kun Yuan. Distributed bilevel optimization with communication compression. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp (eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proce...

  59. [66]

    Lower bounds and accelerated algorithms for bilevel optimization

    Kaiyi ji and Yingbin Liang. Lower bounds and accelerated algorithms for bilevel optimization. Journal of Machine Learning Research, 24 0 (22): 0 1--56, 2023. URL http://jmlr.org/papers/v24/21-0949.html

  60. [67]

    Bilevel optimization: Convergence analysis and enhanced design

    Kaiyi Ji, Junjie Yang, and Yingbin Liang. Bilevel optimization: Convergence analysis and enhanced design. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.\ 4882--4892. PMLR, 18--24 Jul 2021. URL https://proceedings.mlr.press/v139/ji21c.html

  61. [68]

    Will bilevel optimizers benefit from loops

    Kaiyi Ji, Mingrui Liu, Yingbin Liang, and Lei Ying. Will bilevel optimizers benefit from loops. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS '22, Red Hook, NY, USA, 2024. Curran Associates Inc. ISBN 9781713871088

  62. [69]

    A conditional gradient-based method for simple bilevel optimization with convex lower-level problem

    Ruichen Jiang, Nazanin Abolfazli, Aryan Mokhtari, and Erfan Yazdandoost Hamedani. A conditional gradient-based method for simple bilevel optimization with convex lower-level problem. In Proceedings of International Conference on Artificial Intelligence and Statistics, pp.\ 10305--10323, 2023

  63. [70]

    Decentralized bilevel optimization over graphs: Loopless algorithmic update and transient iteration complexity

    Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, and Kun Yuan. Decentralized bilevel optimization over graphs: Loopless algorithmic update and transient iteration complexity. arXiv preprint arXiv:2402.03167, 2024

  64. [71]

    Improved bilevel model: Fast and optimal algorithm with theoretical guarantee, 2020

    Junyi Li, Bin Gu, and Heng Huang. Improved bilevel model: Fast and optimal algorithm with theoretical guarantee, 2020. URL https://arxiv.org/abs/2009.00690

  65. [72]

    Bome! bilevel optimization made easy: A simple first-order approach

    Bo Liu, Mao Ye, Stephen Wright, Peter Stone, and Qiang Liu. Bome! bilevel optimization made easy: A simple first-order approach. Advances in neural information processing systems, 35:17248--17262, 2022a

  66. [73]

    Averaged method of multipliers for bi-level optimization without lower-level strong convexity

    Risheng Liu, Yaohua Liu, Wei Yao, Shangzhi Zeng, and Jin Zhang. Averaged method of multipliers for bi-level optimization without lower-level strong convexity. In Proceedings of International Conference on Machine Learning, pp. 21839--21866, 2023a

  67. [74]

    Boml: A modularized bilevel optimization library in python for meta-learning

    Y. Liu and R. Liu. Boml: A modularized bilevel optimization library in python for meta-learning. In 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1--2. IEEE, 2021

  68. [75]

    Interact: Achieving low sample and communication complexities in decentralized bilevel learning over networks

    Zhuqing Liu, Xin Zhang, Prashant Khanduri, Songtao Lu, and Jia Liu. Interact: Achieving low sample and communication complexities in decentralized bilevel learning over networks. In Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pp. 61--70, 2022b

  69. [76]

    Prometheus: Taming sample and communication complexities in constrained decentralized stochastic bilevel learning

    Zhuqing Liu, Xin Zhang, Prashant Khanduri, Songtao Lu, and Jia Liu. Prometheus: Taming sample and communication complexities in constrained decentralized stochastic bilevel learning. In Proceedings of the 40th International Conference on Machine Learning, pp. 22420--22453, 2023b

  70. [77]

    A stochastic linearized augmented Lagrangian method for decentralized bilevel optimization

    Songtao Lu, Siliang Zeng, Xiaodong Cui, Mark Squillante, Lior Horesh, Brian Kingsbury, Jia Liu, and Mingyi Hong. A stochastic linearized augmented Lagrangian method for decentralized bilevel optimization. Advances in Neural Information Processing Systems, 2022

  71. [78]

    First-order penalty methods for bilevel optimization

    Zhaosong Lu and Sanyou Mei. First-order penalty methods for bilevel optimization. arXiv preprint arXiv:2301.01716, 2023

  72. [79]

    Convex bi-level optimization problems with nonsmooth outer objective function

    Roey Merchav and Shoham Sabach. Convex bi-level optimization problems with nonsmooth outer objective function. SIAM Journal on Optimization, 33(4):3114--3142, 2023. doi:10.1137/22M1533608. URL https://doi.org/10.1137/22M1533608

  73. [80]

    Distributed subgradient methods for multi-agent optimization

    Angelia Nedic and Asuman Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48--61, 2009. doi:10.1109/TAC.2008.2009515

  74. [81]

    Lectures on Convex Optimization, volume 137

    Yurii Nesterov. Lectures on Convex Optimization, volume 137. Springer, 2018

  75. [82]

    Distributed stochastic bilevel optimization: Improved complexity and heterogeneity analysis

    Youcheng Niu, Jinming Xu, Ying Sun, Yan Huang, and Li Chai. Distributed stochastic bilevel optimization: Improved complexity and heterogeneity analysis, 2023

  76. [83]

    Smooth bilevel programming for sparse regularization

    Clarice Poon and Gabriel Peyré. Smooth bilevel programming for sparse regularization. Advances in Neural Information Processing Systems, 34:1543--1555, 2021

  77. [84]

    Diamond: Taming sample and communication complexities in decentralized bilevel optimization

    Peiwen Qiu, Yining Li, Zhuqing Liu, Prashant Khanduri, Jia Liu, Ness B Shroff, Elizabeth Serena Bentley, and Kurt Turck. Diamond: Taming sample and communication complexities in decentralized bilevel optimization. In IEEE INFOCOM 2023-IEEE Conference on Computer Communications, pp. 1--10. IEEE, 2023

  78. [85]

    Meta-learning with implicit gradients

    Aravind Rajeswaran, Chelsea Finn, Sham M Kakade, and Sergey Levine. Meta-learning with implicit gradients. Advances in neural information processing systems, 32, 2019

  79. [86]

    A first order method for solving convex bilevel optimization problems

    Shoham Sabach and Shimrit Shtern. A first order method for solving convex bilevel optimization problems. SIAM Journal on Optimization, 27(2):640--660, 2017

  80. [87]

    Achieving optimal complexity guarantees for a class of bilevel convex optimization problems

    Sepideh Samadi, Daniel Burbano, and Farzad Yousefian. Achieving optimal complexity guarantees for a class of bilevel convex optimization problems. 2024 American Control Conference (ACC), pp. 2206--2211, 2023. URL https://api.semanticscholar.org/CorpusID:264305773
