pith. machine review for the scientific record. sign in

arxiv: 2605.08850 · v1 · submitted 2026-05-09 · 🧮 math.OC · cs.LG· stat.ML

Recognition: no theorem link

Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle

Hanmin Li, Kaja Gruntkowska, Peter Richt\'arik

Pith reviewed 2026-05-12 01:12 UTC · model grok-4.3

classification 🧮 math.OC cs.LGstat.ML
keywords projection-free optimizationlinear minimization oracleFrank-Wolfeconstrained optimizationgradient descentconvergence ratesconvex optimization
0
0 comments X

The pith

Local LMO replaces the global linear minimization oracle of Frank-Wolfe with a local version over a small ball, allowing it to match the convergence rates of projected gradient descent without assuming a bounded feasible set or function

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Local LMO, a projection-free method for constrained optimization that uses a local linear minimization oracle instead of the global one in Frank-Wolfe. This local oracle minimizes the linear approximation over the intersection of the constraint set and a ball of radius t_k around the current point, where t_k acts as an effective stepsize. The method reduces to standard gradient descent when there are no constraints or when the constraint is an affine subspace. The authors prove that Local LMO achieves the same unaccelerated convergence rates as projected gradient descent in regimes including convex functions with bounded gradients, smooth strongly convex functions, and others, even when the feasible set is unbounded.

Core claim

Local LMO performs the update by solving a linear minimization problem over the constraint set intersected with a ball of radius t_k centered at the current iterate. This scheme transfers the known convergence rates of projected gradient descent to the projection-free setting across multiple function classes, including sublinear rates for convex functions with bounded gradients without needing curvature assumptions, linear rates for smooth strongly convex functions, and sharp rates in smooth convex, non-convex, and stochastic settings, all without requiring the feasible set to be bounded.

What carries the argument

The local linear minimization oracle, which finds the point in the intersection of the feasible set and a small ball that minimizes the inner product with the gradient, serving as a localized projection-free step that generalizes gradient descent.

Load-bearing premise

The analysis assumes that positive radii t_k can be chosen appropriately to achieve the desired convergence rates under the various function regimes.

What would settle it

Finding a convex function with bounded gradients over an unbounded set where no sequence of radii t_k lets Local LMO achieve the sublinear rate known for projected gradient descent would disprove the rate transfer.

Figures

Figures reproduced from arXiv: 2605.08850 by Hanmin Li, Kaja Gruntkowska, Peter Richt\'arik.

Figure 1
Figure 1. Figure 1: Illustration of Local LMO dynamics for an L–smooth (L = 100) and µ–strongly convex (µ = 1) quadratic f : R 2 → R, and a box constraint of radius 0.5 centered at c = (−1, 0) (left) and c = (1, 1) (right), with the radius rule tk = θ∥xk − x⋆∥, where θ = 2√ µL/(L + µ) ≈ 0.19802 (see Theorem 3.5). Shown: 100 iterates {xk} of Local LMO and the corresponding balls B(xk, tk). Note that ∥xk+1 − xk∥ = tk for all k … view at source ↗
Figure 2
Figure 2. Figure 2: Semi-log plots of the squared distance (left) or distance (right) to the constrained minimizer. Left: convergence behaviour of Local LMO, PGD, and FW over the first 100 iterations, including the geometric upper bound ( (L−µ)/(L+µ)) 2k ∥x0 − x⋆∥ 2 . Right: comparison of the adaptive radius rule tk = θ∥xk − x⋆∥ with geometric radius schedules tk = θ∥x0 − x⋆∥q k for 10 values of q ∈ [0.8, 0.95]. let X = [2, 4… view at source ↗
Figure 3
Figure 3. Figure 3: Two-dimensional illustration of Local LMO with the radius rule tk = θ∥xk − x⋆∥. The plot shows the contour lines of the quadratic objective, the feasible box X = [2, 4] × [2, 4] with center c = (3, 3), the unconstrained minimizer (0, 0), the constrained minimizer x⋆, the first few trust-region balls, and 15 iterates {xk}. In this case, L = 100, µ = 1 and θ ≈ 0.19802. Although this estimate is only a best-i… view at source ↗
Figure 4
Figure 4. Figure 4: Semi-log plot of the gradient-difference quantity [PITH_FULL_IMAGE:figures/full_fig_p067_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Semi-log plot of the squared distance to the constrained minimizer for [PITH_FULL_IMAGE:figures/full_fig_p067_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Semi-log plot of the squared distance to the constrained minimizer for [PITH_FULL_IMAGE:figures/full_fig_p068_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Semi-log plot comparing the adaptive radius rule [PITH_FULL_IMAGE:figures/full_fig_p069_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Illustration of Local LMO dynamics for an L–smooth (L = 100) and µ–strongly convex (µ = 1) quadratic f : R 2 → R, and three ball constraints of radius 1 (with ℓ1, ℓ2 and ℓ∞ ball geometries), with the Local LMO radius rule tk = θ∥xk − x⋆∥, where θ ≈ 0.19802 (see Theorem 3.5). Shown: 100 iterates {xk} of Local LMO and the corresponding balls B(xk, tk). Note that ∥xk+1 − xk∥ = tk for all k (Theorem 3.1(ii) sa… view at source ↗
read the original abstract

We design Local LMO - a new projection-free gradient-type method for constrained optimization. The key algorithmic idea is to replace the global linear minimization oracle over the constraint set used by Frank-Wolfe (FW) with a local linear minimization oracle over the intersection of the constraint set and a "small" ball centered at the current iterate. In particular, when minimizing $f:\mathbb{R}^d\to \mathbb{R}$ over a constraint $\emptyset\neq\mathcal{X}\subseteq\mathbb{R}^d$, Local LMO performs the iteration \[x_{k+1}\in \arg\min_{z\in\mathcal{X}\cap\mathcal{B}(x_{k},t_k)}\langle\nabla f(x_{k}), z \rangle,\] where $x_0\in\mathcal{X}$, and $t_k>0$ is a suitably chosen radius which can be interpreted as an effective stepsize. While designed as an alternative to FW, Local LMO is perhaps best viewed as a generalization of Gradient Descent (GD) rather than a modification of FW. Indeed, it is easy to see that Local LMO reduces to GD in the unconstrained setting and, more generally, to GD restricted to an affine subspace if the constraint $\mathcal{X}$ is affine. We prove that this simple algorithmic scheme transfers the known (unaccelerated) convergence rates of Projected Gradient Descent (PGD) to the projection-free world in several important regimes, some of which are beyond the reach of FW. In contrast to FW theory, i) our guarantees hold without requiring the feasible set $\mathcal{X}$ to be bounded, ii) our theory does not require the "curvature" assumption, which allows us to establish a standard sublinear rate for convex functions with bounded gradients, iii) we obtain a linear rate in the smooth strongly convex regime. Furthermore, we obtain sharp sublinear rates in the smooth convex and non-convex regimes, in the $(L_0,L_1)$-smooth convex regime, and in stochastic and non-differentiable settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Local LMO, a projection-free gradient method for minimizing f over a constraint set X. It replaces the global linear minimization oracle of Frank-Wolfe with a local one over X intersect the ball of radius t_k centered at the current iterate x_k, yielding the update x_{k+1} in argmin_{z in X cap B(x_k,t_k)} <grad f(x_k), z>. The central claim is that, for suitably chosen t_k > 0, this transfers the known unaccelerated convergence rates of Projected Gradient Descent to the projection-free setting across multiple regimes (linear rate for smooth strongly convex; sublinear rates for smooth convex, non-convex, (L0,L1)-smooth convex, stochastic, and non-differentiable cases), without requiring boundedness of X or the curvature assumption needed by FW. The method reduces to GD when X is unconstrained or affine.

Significance. If the claimed rate transfers are rigorously established, the result would be significant: it provides a simple, projection-free scheme that achieves PGD rates in regimes where FW is either inapplicable or slower, while avoiding the need for global oracles or bounded domains. This could broaden the practical reach of projection-free methods in settings where local linear minimization is tractable.

major comments (2)
  1. [Abstract] Abstract: the claim that t_k can be chosen to transfer PGD rates in the smooth strongly convex, smooth convex, and bounded-gradient convex regimes is load-bearing, yet the abstract provides neither the explicit selection rule for t_k nor any indication of how the existence of such radii is proved; without these details the rate-transfer statements cannot be verified.
  2. [Abstract] Abstract: the statement that Local LMO 'transfers the known (unaccelerated) convergence rates of PGD' assumes that the local-oracle iteration can be analyzed by direct reduction to PGD; the abstract does not indicate whether this reduction is exact or requires additional arguments to control the difference between the local and global minimizers.
minor comments (1)
  1. The notation B(x_k, t_k) is not defined in the provided text; it should be stated whether this is the Euclidean ball or another norm.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and for highlighting the need for clarity on how the abstract summarizes the technical contributions. We address each major comment below. The manuscript body contains the full derivations, explicit choices of t_k, and proofs of the rate transfers; the abstract is kept concise as is conventional.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that t_k can be chosen to transfer PGD rates in the smooth strongly convex, smooth convex, and bounded-gradient convex regimes is load-bearing, yet the abstract provides neither the explicit selection rule for t_k nor any indication of how the existence of such radii is proved; without these details the rate-transfer statements cannot be verified.

    Authors: The abstract serves as a high-level summary and therefore omits the explicit formulas for t_k and the existence proofs, which appear in the main text. In Section 3 we derive concrete choices (e.g., t_k = min{1/L, c / ||grad f(x_k)||} for appropriate constants c, or t_k proportional to the strong-convexity parameter in the linear-rate regime) and prove that such positive radii always exist under the stated smoothness and convexity assumptions. The existence argument proceeds by showing that the local linear minimization over X cap B(x_k, t_k) coincides with the projected-gradient step up to an O(t_k^2) error that can be made arbitrarily small while still guaranteeing descent; the rate-transfer theorems then follow by standard PGD analysis with this controlled perturbation. We do not believe the abstract requires these technical details. revision: no

  2. Referee: [Abstract] Abstract: the statement that Local LMO 'transfers the known (unaccelerated) convergence rates of PGD' assumes that the local-oracle iteration can be analyzed by direct reduction to PGD; the abstract does not indicate whether this reduction is exact or requires additional arguments to control the difference between the local and global minimizers.

    Authors: The reduction is not exact. The paper develops additional arguments that bound the difference between the local LMO solution and the true PGD iterate. Specifically, the proofs establish that ||x_{k+1} - P_X(x_k - eta grad f(x_k))|| = O(t_k) for step-size eta chosen consistently with the smoothness constant; by selecting t_k small enough relative to the current gradient norm (yet large enough to ensure sufficient progress), the extra error terms are absorbed into the standard PGD convergence bounds without degrading the rates. This controlled-approximation analysis is carried out in detail for each regime (smooth strongly convex, smooth convex, bounded-gradient convex, etc.) and yields the claimed transfer of unaccelerated PGD rates. The abstract's use of 'transfers' refers to this rigorously justified transfer rather than an exact equivalence. revision: no

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract defines the Local LMO iteration directly as the argmin of the linear functional over the intersection of the constraint set X with a ball of radius t_k centered at the current point. It notes that the scheme reduces to standard gradient descent when X is unconstrained or affine, and states that convergence rates are transferred from known (external) results on unaccelerated projected gradient descent. No equations or claims in the provided text reduce a prediction or rate to a fitted parameter, self-citation, or definitional tautology; the t_k choice is presented as an assumption whose existence is to be verified in the (unavailable) analysis rather than presupposed by the result itself. The derivation chain therefore remains self-contained against standard external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Analysis relies on standard optimization assumptions for different regimes (smoothness, convexity, strong convexity, bounded gradients) and the existence of suitable t_k; no new entities are postulated.

free parameters (1)
  • t_k
    Radius of the local ball, interpreted as effective stepsize and chosen suitably for each regime.
axioms (1)
  • domain assumption Standard assumptions on f (smoothness, convexity, strong convexity, bounded gradients) as needed for each convergence regime.
    Invoked to establish rates matching those of PGD.

pith-pipeline@v0.9.0 · 5658 in / 1372 out tokens · 46283 ms · 2026-05-12T01:12:48.496562+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · 2 internal anchors

  1. [1]

    Proximit\'

    Moreau, Jean-Jacques , journal =. Proximit\'. 1965 , pages =

  2. [2]

    arXiv preprint arXiv:2202.08711 , title =

    J\'er\^. arXiv preprint arXiv:2202.08711 , title =

  3. [3]

    Complexity bounds for primal-dual methods mini-mizing the model of objective function

    Yurii Nesterov , journal =. Complexity bounds for primal-dual methods mini-mizing the model of objective function. , 2017 , year =

  4. [4]

    A Conditional Gradient Framework for Composite Convex Minimizationwith Applications to Semidefinite Programming , year =

    Alp Yurtsever and Olivier Fercoq and Francesco Locatello and Volkan Cevher , booktitle =. A Conditional Gradient Framework for Composite Convex Minimizationwith Applications to Semidefinite Programming , year =

  5. [5]

    Naval Research Logistics Quarterly , volume =

    Frank, Marguerite and Wolfe, Philip , title =. Naval Research Logistics Quarterly , volume =

  6. [6]

    Proceedings of the 30th International Conference on Machine Learning (ICML) , pages =

    Martin Jaggi , title =. Proceedings of the 30th International Conference on Machine Learning (ICML) , pages =

  7. [7]

    Advances in Neural Information Processing Systems , pages =

    Dan Garber and Elad Hazan , title =. Advances in Neural Information Processing Systems , pages =

  8. [8]

    Advances in Neural Information Processing Systems , volume=

    Faster projection-free convex optimization over the spectrahedron , author=. Advances in Neural Information Processing Systems , volume=

  9. [9]

    2004 , publisher=

    Convex optimization , author=. 2004 , publisher=

  10. [10]

    Journal of Machine Learning Research , volume=

    Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals , author=. Journal of Machine Learning Research , volume=

  11. [11]

    The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

    Fast Projection-Free Approach (without Optimization Oracle) for Optimization over Compact Convex Set , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

  12. [12]

    Dai, Josef and Pan, Xuehai and Sun, Ruiyang and Ji, Jiaming and Xu, Xinbo and Liu, Mickel and Wang, Yizhou and Yang, Yaodong , journal=. Safe

  13. [13]

    Advances in Neural Information Processing Systems , volume=

    Constrained optimization to train neural networks on critical and under-represented classes , author=. Advances in Neural Information Processing Systems , volume=

  14. [14]

    arXiv preprint arXiv:2601.04849 , year=

    Stability of Constrained Optimization Models for Structured Signal Recovery , author=. arXiv preprint arXiv:2601.04849 , year=

  15. [15]

    Real-Time Convex Optimization in Signal Processing , volume =

    Mattingley, John and Boyd, Stephen , year =. Real-Time Convex Optimization in Signal Processing , volume =. Signal Processing Magazine, IEEE , doi =

  16. [16]

    On portfolio optimization: Imposing the right constraints , journal =

    Patrick Behr and Andre Guettler and Felix Miebs , keywords =. On portfolio optimization: Imposing the right constraints , journal =. 2013 , issn =

  17. [17]

    Hathaway , title =

    Richard J. Hathaway , title =. The Annals of Statistics , number =. 1985 , doi =

  18. [18]

    Density Estimation under Constraints , urldate =

    Peter Hall and Brett Presnell , journal =. Density Estimation under Constraints , urldate =

  19. [19]

    Advances in Neural Information Processing Systems , volume=

    Empirical risk minimization under fairness constraints , author=. Advances in Neural Information Processing Systems , volume=

  20. [20]

    Deep neural network training with

    Pokutta, Sebastian and Spiegel, Christoph and Zimmer, Max , journal=. Deep neural network training with

  21. [21]

    Safety-constrained reinforcement learning with a distributional safety critic , volume =

    Yang, Qisong and Simão, Thiago and Tindemans, Simon and Spaan, Matthijs , year =. Safety-constrained reinforcement learning with a distributional safety critic , volume =. Machine Learning , doi =

  22. [22]

    arXiv preprint arXiv:1309.5550 , year=

    The complexity of large-scale convex programming under a linear optimization oracle , author=. arXiv preprint arXiv:1309.5550 , year=

  23. [23]

    Some comments on

    Gu\'. Some comments on. Math. Program. , month = may, pages =. 1986 , issue_date =

  24. [24]

    and Polyak, Boris T

    Levitin, Evgeny S. and Polyak, Boris T. , year =. Constrained Minimization Methods , volume =. USSR Computational Mathematics and Mathematical Physics , doi =

  25. [25]

    Keller Jordan and Yuchen Jin and Vlado Boza and Jiacheng You and Franz Cesista and Laker Newhouse and Jeremy Bernstein , title =

  26. [26]

    Training Deep Learning Models with Norm-Constrained

    Pethick, Thomas and Xie, Wanyun and Antonakopoulos, Kimon and Zhu, Zhenyu and Silveti-Falls, Antonio and Cevher, Volkan , journal=. Training Deep Learning Models with Norm-Constrained

  27. [27]

    Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of lmo-based Optimizers for LLMs)

    Riabinin, Artem and Shulgin, Egor and Gruntkowska, Kaja and Richt. Gluon: Making. arXiv preprint arXiv:2505.13416 , year=

  28. [28]

    Gruntkowska, Kaja and Li, Hanmin and Rane, Aadi and Richt. The. arXiv preprint arXiv:2502.02002 , year=

  29. [29]

    Gruntkowska, Kaja and Richt. Non-. arXiv preprint arXiv:2510.00823 , year=

  30. [30]

    Mathematical Programming , year =

    Nesterov, Yurii , title =. Mathematical Programming , year =

  31. [31]

    1970 , publisher=

    Integer and Nonlinear Programming , author=. 1970 , publisher=

  32. [32]

    On the global linear convergence of

    Lacoste-Julien, Simon and Jaggi, Martin , journal=. On the global linear convergence of

  33. [33]

    Sinkhorn barycenters with free support via

    Luise, Giulia and Salzo, Saverio and Pontil, Massimiliano and Ciliberto, Carlo , journal=. Sinkhorn barycenters with free support via

  34. [34]

    arXiv preprint arXiv:2312.11200 , year=

    Solving the optimal experiment design problem with mixed-integer convex methods , author=. arXiv preprint arXiv:2312.11200 , year=

  35. [35]

    Don't Be Greedy, Just Relax!

    Roux, Christophe and Zimmer, Max and d'Aspremont, Alexandre and Pokutta, Sebastian , journal=. Don't Be Greedy, Just Relax!

  36. [36]

    Efficient Image and Video Co-localization with F rank- W olfe Algorithm

    Joulin, Armand and Tang, Kevin and Fei-Fei, Li. Efficient Image and Video Co-localization with F rank- W olfe Algorithm. Computer Vision -- ECCV 2014. 2014

  37. [37]

    A Unified Continuous Greedy Algorithm for Submodular Maximization , year=

    Feldman, Moran and Naor, Joseph and Schwartz, Roy , booktitle=. A Unified Continuous Greedy Algorithm for Submodular Maximization , year=

  38. [38]

    Mathematical Programming , volume=

    Submodular functions: from discrete to continuous domains , author=. Mathematical Programming , volume=. 2019 , publisher=

  39. [39]

    Block-coordinate

    Lacoste-Julien, Simon and Jaggi, Martin and Schmidt, Mark and Pletscher, Patrick , booktitle=. Block-coordinate. 2013 , organization=

  40. [40]

    Stochastic

    N. Stochastic. International Conference on Machine Learning , pages=. 2020 , organization=

  41. [41]

    Improved local models and new

    Designolle, S. Improved local models and new. Physical Review Research , volume=. 2023 , publisher=

  42. [42]

    Lower Bounds for

    Halbey, Jannis and Deza, Daniel and Zimmer, Max and Roux, Christophe and Stellato, Bartolomeo and Pokutta, Sebastian , journal=. Lower Bounds for

  43. [43]

    arXiv preprint arXiv:2211.14103 , year=

    Conditional gradient methods , author=. arXiv preprint arXiv:2211.14103 , year=

  44. [44]

    arXiv preprint arXiv:2602.22608 , year=

    Lower Bounds for Linear Minimization Oracle Methods Optimizing over Strongly Convex Sets , author=. arXiv preprint arXiv:2602.22608 , year=

  45. [45]

    1994 , doi =

    Nesterov, Yurii and Nemirovskii, Arkadii , title =. 1994 , doi =

  46. [46]

    , title =

    Wright, Stephen J. , title =. 1997 , doi =

  47. [47]

    SIAM Journal on Optimization , volume=

    A linearly convergent variant of the conditional gradient algorithm under strong convexity, with applications to online and stochastic optimization , author=. SIAM Journal on Optimization , volume=. 2016 , publisher=

  48. [48]

    J. B. Rosen , journal =. The Gradient Projection Method for Nonlinear Programming. Part I. Linear Constraints , urldate =

  49. [49]

    and Gomory, Ralph E

    Gilmore, Paul C. and Gomory, Ralph E. , year =. A Linear Programming Approach to the Cutting Stock Problem I , volume =. Oper Res , doi =

  50. [50]

    New Variants of Bundle Methods , volume =

    Lemar. New Variants of Bundle Methods , volume =. Math. Program. , doi =. 1995 , month =

  51. [51]

    Geometric Algorithms and Combinatorial Optimization , publisher =

    Gr. Geometric Algorithms and Combinatorial Optimization , publisher =. 1993 , doi =

  52. [52]

    Sparse Approximate Solutions to Semidefinite Programs

    Hazan, Elad. Sparse Approximate Solutions to Semidefinite Programs. LATIN 2008: Theoretical Informatics. 2008

  53. [53]

    Nemirovski, Arkadi and Yudin, David , title =

  54. [54]

    Bertsekas , title =

    Dimitri P. Bertsekas , title =

  55. [55]

    1997 , issn =

    Exponentiated Gradient versus Gradient Descent for Linear Predictors , journal =. 1997 , issn =

  56. [56]

    Pokutta, Sebastian , journal=. The. 2024 , publisher=

  57. [57]

    Convergence Rate of

    Schmidt, Mark , year=. Convergence Rate of

  58. [58]

    Wright , title =

    Jorge Nocedal and Stephen J. Wright , title =. 2006 , edition =

  59. [59]

    2018 , publisher=

    Lectures on convex optimization , author=. 2018 , publisher=

  60. [60]

    Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =

    Moondra, Jai and Mortagy, Hassan and Gupta, Swati , title =. Proceedings of the 35th International Conference on Neural Information Processing Systems , articleno =. 2021 , isbn =

  61. [61]

    arXiv preprint arXiv:1603.00522 , year=

    Solving combinatorial games using products, projections and lexicographically optimal bases , author=. arXiv preprint arXiv:1603.00522 , year=

  62. [62]

    , publisher =

    Polyak, Boris T. , publisher =. Introduction to optimization , year =

  63. [63]

    2017 , isbn =

    Beck, Amir , title =. 2017 , isbn =

  64. [64]

    Efficient

    Yu, Adams Wei and Su, Hao and Fei-Fei, Li , journal=. Efficient

  65. [65]

    arXiv preprint arXiv:1309.1541 , year=

    Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application , author=. arXiv preprint arXiv:1309.1541 , year=

  66. [66]

    2008 , isbn =

    Duchi, John and Shalev-Shwartz, Shai and Singer, Yoram and Chandra, Tushar , title =. 2008 , isbn =. doi:10.1145/1390156.1390191 , booktitle =

  67. [67]

    , journal=

    Rutkowski, Krzysztof E. , journal=. Closed-form expressions for projectors onto polyhedral sets in. 2017 , publisher=

  68. [68]

    The Thirteenth International Conference on Learning Representations , year=

    Methods for Convex (L\_0, L\_1) -Smooth Optimization: Clipping, Acceleration, and Adaptivity , author=. The Thirteenth International Conference on Learning Representations , year=

  69. [69]

    Vyguzov, AA and Stonyakin, FS , journal=. Frank-

  70. [70]

    Journal of the Royal Statistical Society: Series B (Methodological) , volume =

    Tibshirani, Robert , title =. Journal of the Royal Statistical Society: Series B (Methodological) , volume =. 1996 , month =

  71. [71]

    SIAM review , volume=

    Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization , author=. SIAM review , volume=. 2010 , publisher=

  72. [72]

    and Ng, Andrew Y

    Blei, David M. and Ng, Andrew Y. and Jordan, Michael I. , title =. J. Mach. Learn. Res. , month = mar, pages =. 2003 , issue_date =

  73. [73]

    Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers.arXiv preprint arXiv:2508.10480, 2025

    Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers , author=. arXiv preprint arXiv:2508.10480 , year=

  74. [74]

    International Conference on Machine Learning , pages=

    On orthogonality and learning recurrent networks with long term dependencies , author=. International Conference on Machine Learning , pages=. 2017 , organization=

  75. [75]

    Advances in Neural Information Processing Systems , volume=

    Can we gain more from orthogonality regularizations in training deep networks? , author=. Advances in Neural Information Processing Systems , volume=

  76. [76]

    Explaining and Harnessing Adversarial Examples

    Explaining and harnessing adversarial examples , author=. arXiv preprint arXiv:1412.6572 , year=

  77. [77]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    Towards deep learning models resistant to adversarial attacks , author=. arXiv preprint arXiv:1706.06083 , year=

  78. [78]

    Artificial Intelligence and Statistics , pages=

    Fairness constraints: Mechanisms for fair classification , author=. Artificial Intelligence and Statistics , pages=. 2017 , organization=

  79. [79]

    Proceedings of the 35th International Conference on Machine Learning , pages =

    A Reductions Approach to Fair Classification , author =. Proceedings of the 35th International Conference on Machine Learning , pages =. 2018 , editor =

  80. [80]

    Optimization Algorithms on Matrix Manifolds

    Absil, Pierre-Antoine and Mahony, Robert and Sepulchre, Rodolphe. Optimization Algorithms on Matrix Manifolds

Showing first 80 references.