Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle
Pith reviewed 2026-05-12 01:12 UTC · model grok-4.3
The pith
Local LMO replaces the global linear minimization oracle of Frank-Wolfe with a local version over a small ball centered at the current iterate, allowing it to match the convergence rates of projected gradient descent without requiring the feasible set to be bounded or the function to satisfy a curvature condition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Local LMO performs the update by solving a linear minimization problem over the constraint set intersected with a ball of radius t_k centered at the current iterate. This scheme transfers the known convergence rates of projected gradient descent to the projection-free setting across multiple function classes, including sublinear rates for convex functions with bounded gradients without needing curvature assumptions, linear rates for smooth strongly convex functions, and sharp rates in smooth convex, non-convex, and stochastic settings, all without requiring the feasible set to be bounded.
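To make the update concrete, here is a minimal sketch of the iteration on a toy problem (our illustration, not code from the paper): the feasible set is the nonnegative orthant, the objective is a simple quadratic, scipy's general-purpose SLSQP solver stands in for a problem-specific local oracle, and the diminishing radius schedule is a placeholder rather than the paper's rule.

```python
import numpy as np
from scipy.optimize import minimize

def local_lmo(x, g, t):
    """Solve min_z <g, z> over X ∩ B(x, t), with X = {z : z >= 0} and
    B(x, t) the Euclidean ball of radius t around x. A generic NLP
    solver stands in for a problem-specific local oracle."""
    cons = [
        {"type": "ineq", "fun": lambda z: z},                         # X: z >= 0
        {"type": "ineq", "fun": lambda z: t**2 - (z - x) @ (z - x)},  # ball
    ]
    return minimize(lambda z: g @ z, x0=x, constraints=cons,
                    method="SLSQP").x

# Toy problem: f(x) = 0.5 * ||x - c||^2 with c partly negative, so the
# unconstrained minimizer lies outside X and the constraint is active.
c = np.array([1.0, -2.0, 0.5])
grad_f = lambda x: x - c

x = np.zeros(3)              # x_0 in X
for k in range(100):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:
        break                # (near-)stationary: the local LMO is degenerate
    t_k = 0.5 / (k + 1)      # placeholder diminishing radius, not the paper's rule
    x = local_lmo(x, g, t_k)

print(x)  # tends toward the constrained minimizer max(c, 0) = [1.0, 0.0, 0.5]
```

The point of the method is that, for structured constraint sets, local linear minimization can be far cheaper than a projection; the generic solver above is used only for readability.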
What carries the argument
The local linear minimization oracle: it finds the point in the intersection of the feasible set and a small ball around the current iterate that minimizes the inner product with the gradient, serving as a localized, projection-free step that generalizes gradient descent.
Load-bearing premise
The analysis assumes that positive radii t_k can be chosen appropriately to achieve the desired convergence rates under the various function regimes.
What would settle it
Finding a convex function with bounded gradients over an unbounded set where no sequence of radii t_k lets Local LMO achieve the sublinear rate known for projected gradient descent would disprove the rate transfer.
Original abstract
We design Local LMO - a new projection-free gradient-type method for constrained optimization. The key algorithmic idea is to replace the global linear minimization oracle over the constraint set used by Frank-Wolfe (FW) with a local linear minimization oracle over the intersection of the constraint set and a "small" ball centered at the current iterate. In particular, when minimizing $f:\mathbb{R}^d\to \mathbb{R}$ over a constraint $\emptyset\neq\mathcal{X}\subseteq\mathbb{R}^d$, Local LMO performs the iteration \[x_{k+1}\in \arg\min_{z\in\mathcal{X}\cap\mathcal{B}(x_{k},t_k)}\langle\nabla f(x_{k}), z \rangle,\] where $x_0\in\mathcal{X}$, and $t_k>0$ is a suitably chosen radius which can be interpreted as an effective stepsize. While designed as an alternative to FW, Local LMO is perhaps best viewed as a generalization of Gradient Descent (GD) rather than a modification of FW. Indeed, it is easy to see that Local LMO reduces to GD in the unconstrained setting and, more generally, to GD restricted to an affine subspace if the constraint $\mathcal{X}$ is affine. We prove that this simple algorithmic scheme transfers the known (unaccelerated) convergence rates of Projected Gradient Descent (PGD) to the projection-free world in several important regimes, some of which are beyond the reach of FW. In contrast to FW theory, i) our guarantees hold without requiring the feasible set $\mathcal{X}$ to be bounded, ii) our theory does not require the "curvature" assumption, which allows us to establish a standard sublinear rate for convex functions with bounded gradients, iii) we obtain a linear rate in the smooth strongly convex regime. Furthermore, we obtain sharp sublinear rates in the smooth convex and non-convex regimes, in the $(L_0,L_1)$-smooth convex regime, and in stochastic and non-differentiable settings.
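The abstract's "easy to see" reduction to GD in the unconstrained case follows from one line of linear minimization over a Euclidean ball (our gloss, assuming $\nabla f(x_k)\neq 0$; the minimizer of a linear function over a ball sits on the boundary, opposite the gradient):
\[
x_{k+1}=\arg\min_{z\in\mathcal{B}(x_k,t_k)}\langle\nabla f(x_k),z\rangle
= x_k - t_k\,\frac{\nabla f(x_k)}{\|\nabla f(x_k)\|},
\]
i.e., a gradient step with effective stepsize $t_k/\|\nabla f(x_k)\|$; choosing $t_k$ proportional to $\|\nabla f(x_k)\|$ recovers constant-stepsize GD.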
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Local LMO, a projection-free gradient method for minimizing f over a constraint set X. It replaces the global linear minimization oracle of Frank-Wolfe with a local one over X intersect the ball of radius t_k centered at the current iterate x_k, yielding the update x_{k+1} in argmin_{z in X cap B(x_k,t_k)} <grad f(x_k), z>. The central claim is that, for suitably chosen t_k > 0, this transfers the known unaccelerated convergence rates of Projected Gradient Descent to the projection-free setting across multiple regimes (linear rate for smooth strongly convex; sublinear rates for smooth convex, non-convex, (L0,L1)-smooth convex, stochastic, and non-differentiable cases), without requiring boundedness of X or the curvature assumption needed by FW. The method reduces to GD when X is unconstrained or affine.
Significance. If the claimed rate transfers are rigorously established, the result would be significant: it provides a simple, projection-free scheme that achieves PGD rates in regimes where FW is either inapplicable or slower, while avoiding the need for global oracles or bounded domains. This could broaden the practical reach of projection-free methods in settings where local linear minimization is tractable.
major comments (2)
- [Abstract] The claim that t_k can be chosen to transfer PGD rates in the smooth strongly convex, smooth convex, and bounded-gradient convex regimes is load-bearing, yet the abstract provides neither the explicit selection rule for t_k nor any indication of how the existence of such radii is proved; without these details the rate-transfer statements cannot be verified.
- [Abstract] The statement that Local LMO 'transfers the known (unaccelerated) convergence rates of PGD' assumes that the local-oracle iteration can be analyzed by direct reduction to PGD; the abstract does not indicate whether this reduction is exact or requires additional arguments to control the difference between the local and global minimizers.
minor comments (1)
- The notation B(x_k, t_k) is not defined in the provided text; it should be stated whether this is the Euclidean ball or another norm.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and for highlighting the need for clarity on how the abstract summarizes the technical contributions. We address each major comment below. The manuscript body contains the full derivations, explicit choices of t_k, and proofs of the rate transfers; the abstract is kept concise as is conventional.
Point-by-point responses
- Referee: [Abstract] The claim that t_k can be chosen to transfer PGD rates in the smooth strongly convex, smooth convex, and bounded-gradient convex regimes is load-bearing, yet the abstract provides neither the explicit selection rule for t_k nor any indication of how the existence of such radii is proved; without these details the rate-transfer statements cannot be verified.
Authors: The abstract serves as a high-level summary and therefore omits the explicit formulas for t_k and the existence proofs, which appear in the main text. In Section 3 we derive concrete choices (e.g., t_k = min{1/L, c / ||grad f(x_k)||} for appropriate constants c, or t_k proportional to the strong-convexity parameter in the linear-rate regime) and prove that such positive radii always exist under the stated smoothness and convexity assumptions. The existence argument proceeds by showing that the local linear minimization over X cap B(x_k, t_k) coincides with the projected-gradient step up to an O(t_k^2) error that can be made arbitrarily small while still guaranteeing descent; the rate-transfer theorems then follow by standard PGD analysis with this controlled perturbation. We do not believe the abstract requires these technical details. revision: no
- Referee: [Abstract] The statement that Local LMO 'transfers the known (unaccelerated) convergence rates of PGD' assumes that the local-oracle iteration can be analyzed by direct reduction to PGD; the abstract does not indicate whether this reduction is exact or requires additional arguments to control the difference between the local and global minimizers.
Authors: The reduction is not exact. The paper develops additional arguments that bound the difference between the local LMO solution and the true PGD iterate. Specifically, the proofs establish that ||x_{k+1} - P_X(x_k - eta grad f(x_k))|| = O(t_k) for step-size eta chosen consistently with the smoothness constant; by selecting t_k small enough relative to the current gradient norm (yet large enough to ensure sufficient progress), the extra error terms are absorbed into the standard PGD convergence bounds without degrading the rates. This controlled-approximation analysis is carried out in detail for each regime (smooth strongly convex, smooth convex, bounded-gradient convex, etc.) and yields the claimed transfer of unaccelerated PGD rates. The abstract's use of 'transfers' refers to this rigorously justified transfer rather than an exact equivalence. revision: no
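Both responses lend themselves to a quick numerical illustration (ours, with arbitrary problem data; the radius rule and the O(t_k) proximity claim come from this simulated rebuttal, not from verified text of the paper). The script below implements the quoted rule t_k = min{1/L, c/||grad f(x_k)||} and compares one Local LMO step against one PGD step with matched stepsize on the nonnegative orthant, where the projection is a coordinate-wise clip. The triangle inequality alone caps the gap at 2 t_k, since both points move at most t_k away from x_k; the printout shows the gap shrinking with t_k.

```python
import numpy as np
from scipy.optimize import minimize

def radius(grad_norm, L, c):
    """Illustrative rule t_k = min(1/L, c / ||grad f(x_k)||) quoted in the
    simulated rebuttal; hypothetical, not verified against the paper."""
    return 0.0 if grad_norm == 0.0 else min(1.0 / L, c / grad_norm)

rng = np.random.default_rng(0)
d = 5
x = np.abs(rng.normal(size=d))   # current iterate in X = {z : z >= 0}
g = rng.normal(size=d)           # stand-in for grad f(x_k)

for c_const in (1.0, 0.2, 0.05):                # shrinking c gives shrinking t_k
    t = radius(np.linalg.norm(g), L=1.0, c=c_const)
    # One Local LMO step: min <g, z> over X ∩ B(x, t), via a generic solver.
    cons = [
        {"type": "ineq", "fun": lambda z: z},                             # z >= 0
        {"type": "ineq", "fun": lambda z, t=t: t**2 - (z - x) @ (z - x)}, # ball
    ]
    lmo_step = minimize(lambda z: g @ z, x0=x,
                        constraints=cons, method="SLSQP").x
    # One PGD step with matched stepsize eta = t / ||g||, so both points
    # move at most t away from x (projection onto the orthant is a clip).
    eta = t / np.linalg.norm(g)
    pgd_step = np.maximum(x - eta * g, 0.0)
    print(f"t_k = {t:6.3f}   gap = {np.linalg.norm(lmo_step - pgd_step):.4f}")
```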
Circularity Check
No significant circularity detected
Full rationale
The abstract defines the Local LMO iteration directly as the argmin of the linear functional over the intersection of the constraint set X with a ball of radius t_k centered at the current point. It notes that the scheme reduces to standard gradient descent when X is unconstrained or affine, and states that convergence rates are transferred from known (external) results on unaccelerated projected gradient descent. No equations or claims in the provided text reduce a prediction or rate to a fitted parameter, self-citation, or definitional tautology; the t_k choice is presented as an assumption whose existence is to be verified in the (unavailable) analysis rather than presupposed by the result itself. The derivation chain therefore remains self-contained against standard external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- t_k, the local ball radius, interpretable as an effective stepsize
axioms (1)
- Domain assumption: standard conditions on f (smoothness, convexity, strong convexity, bounded gradients), as needed for each convergence regime.
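For reference, textbook statements of these conditions (our gloss, in the Euclidean norm; the paper's exact definitions may differ):
\[
\text{$L$-smoothness:}\ \|\nabla f(x)-\nabla f(y)\|\le L\,\|x-y\|,\qquad
\text{bounded gradients:}\ \|\nabla f(x)\|\le G,
\]
\[
\text{$\mu$-strong convexity:}\ f(y)\ge f(x)+\langle\nabla f(x),\,y-x\rangle+\tfrac{\mu}{2}\|y-x\|^{2}\quad\text{for all }x,y.
\]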