Local LMO is a new projection-free method that achieves the convergence rates of projected gradient descent for constrained optimization by using local linear minimization oracles over small balls.
arXiv preprint arXiv:2211.14103 (2022)
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
citing papers explorer
- Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle
  Local LMO is a new projection-free method that achieves the convergence rates of projected gradient descent for constrained optimization by using local linear minimization oracles over small balls.
- Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets
  Optimistic bilevel optimization with manifold lower-level minimizers is differentiable if the optimistic selection is unique, yielding a pseudoinverse hyper-gradient and a convergent HG-MS algorithm whose rate depends on intrinsic manifold dimension.
- Can outcome communication explain Bell nonlocality?
  For any qubit-qudit state under all projective measurements, an LHV model with outcome communication exists if and only if a standard LHV model without communication exists.
- An Efficient Algorithm for Minimizing Ordered Norms in Fractional Load Balancing
  A randomized (1+ε)-approximation algorithm for ordered-norm load balancing uses O((n+d)(ε^{-2} + log log d) log(n+d)) linear-oracle calls via follow-the-regularized-leader prices and martingale progress analysis.
- MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs
  MaskPro learns categorical distributions over groups of M weights to generate exact (N:M) sparsity via N-way sampling without replacement and stabilizes training with a moving average tracker of loss residuals.
- Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning
  AdaNAGED combines zeroth-order gradient-free training, automatic parameter adaptation, and LMO-based non-Euclidean geometry with claimed convergence guarantees, demonstrated on OPT-1.3B fine-tuning.
- Frank-Wolfe Algorithms for (L0, L1)-smooth functions
  Proposes an (L0, L1)-Frank-Wolfe algorithm and an adaptive variant, claiming convergence rates superior to those of classical Frank-Wolfe for (L0, L1)-smooth objectives.
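
Several entries above rely on a linear minimization oracle (LMO) and compare against classical Frank-Wolfe. For orientation, here is a minimal sketch of the classical Frank-Wolfe iteration only; it is not the (L0, L1) variant, Local LMO, or any other method from the listed papers. The feasible set (the probability simplex), the quadratic objective, the target vector `b`, and the step size 2/(t+2) are illustrative assumptions chosen to keep the example self-contained.

```python
def lmo_simplex(grad):
    """LMO over the probability simplex: the minimizer of <grad, s>
    is the vertex at the coordinate with the smallest gradient entry."""
    i = min(range(len(grad)), key=lambda j: grad[j])
    s = [0.0] * len(grad)
    s[i] = 1.0
    return s


def frank_wolfe(grad_f, x0, steps=200):
    """Classical Frank-Wolfe with the standard open-loop step size 2/(t+2).
    Every iterate is a convex combination of feasible points, so no
    projection is needed."""
    x = list(x0)
    for t in range(steps):
        g = grad_f(x)
        s = lmo_simplex(g)            # one LMO call per iteration
        gamma = 2.0 / (t + 2.0)
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


# Hypothetical objective: f(x) = 0.5 * ||x - b||^2, with b inside the simplex.
b = [0.2, 0.5, 0.3]


def grad_f(x):
    return [xi - bi for xi, bi in zip(x, b)]


x = frank_wolfe(grad_f, [1.0, 0.0, 0.0], steps=2000)
print(x)  # expected to be close to b, with entries summing to 1
```

The iterate stays feasible by construction, which is the sense in which such methods are called projection-free: the only access to the constraint set is through the oracle call.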