pith. machine review for the scientific record. sign in

arxiv: 2605.10183 · v1 · submitted 2026-05-11 · 💻 cs.LG

Recognition: no theorem link

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization

Authors on Pith no claims yet

Pith reviewed 2026-05-12 04:14 UTC · model grok-4.3

classification 💻 cs.LG
keywords sharpness-aware minimizationadversarial perturbationloss spacegeneralizationflat minimacurvaturegradient norm
0
0 comments X

The pith

Fixing the allowed loss increase rather than the parameter radius in sharpness-aware minimization removes gradient dominance and improves generalization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Sharpness-Aware Minimization seeks flat minima by minimizing the worst-case loss inside a neighborhood of the current parameters. The standard approach fixes the size of that neighborhood as a radius in parameter space, which relies on a first-order approximation. The paper instead fixes the maximum loss increase allowed inside the neighborhood. This inversion removes the part of the signal that scales with gradient size and leaves terms that reflect loss curvature. A sympathetic reader would care because the change requires no extra computation yet produces models that perform better on held-out data across many tasks.

Core claim

Loss-Equated SAM (LE-SAM) inverts the traditional SAM mechanism by replacing the fixed perturbation radius in parameter space with a fixed loss-space budget. This change effectively removes gradient-norm-dominated learning signals and shifts optimization toward curvature-dominated terms, resulting in improved generalization performance.

What carries the argument

The loss-equated adversarial perturbation, which bounds the worst-case loss increase by a fixed value instead of bounding the Euclidean distance in parameter space.

If this is right

  • LE-SAM consistently outperforms both SAM and its existing variants on diverse benchmarks and tasks.
  • The optimizer places greater weight on curvature information during each update step.
  • The resulting minima produce stronger generalization without any increase in training cost.
  • The same inversion principle applies across multiple vision and language tasks where SAM is currently used.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Loss-bounded perturbations could be substituted into other minimax formulations used for robustness or domain adaptation.
  • An adaptive version that slowly tightens the loss budget during training might combine the benefits of both radius and loss views.
  • The same idea invites direct comparison against second-order methods that explicitly estimate Hessian curvature.

Load-bearing premise

That fixing the loss-space budget for the perturbation directly removes gradient-norm effects and thereby shifts focus to curvature.

What would settle it

Training LE-SAM on a standard image-classification benchmark and finding test accuracy no higher than that of SAM would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.10183 by Jinping Wang, Qinhan Liu, Zhiqiang Gao, Zhiwu Xie.

Figure 1
Figure 1. Figure 1: Illustration of loss-equated adversarial perturbation. LE-SAM fixes a budget in loss space and back-solves the corre￾sponding adversarial perturbation radius in parameter space. This mechanism removes gradient norm dominated effects and shifts the optimization signal toward curvature-dominated (second-order) information. The shaded region indicates the excess loss beyond the budget due to the curvature ter… view at source ↗
Figure 2
Figure 2. Figure 2: Mechanism illustration of SAM vs. LE-SAM. Standard SAM fixes a perturbation radius in the parameter space and directly maps it to a loss increase dominated by the gradient norm. In contrast, LE-SAM fixes a budget in the loss space and back-solves the perturbation radius in the parameter space. Mapping back to the loss space removes the first-order gradient-dominated term and yields a second-order, curvatur… view at source ↗
Figure 3
Figure 3. Figure 3: (a) shows the distribution of top eigenvalues and the trace of Hessian at epoch 100 and 200 on CIFAR-100 with SAM, and our LE-SAM. (b) track the LPF value for both SAM and LE-SAM across 200 epochs SAM on CIFAR datasets, and empirically demonstrate our loss-equated mechanism can lead to a better generalization performance in a transfer learning scenario. 5.3. Flatness Metrics Eq. 13 shows that our loss-equa… view at source ↗
Figure 5
Figure 5. Figure 5: Perturbation radius across training process and the perturbation radius decreases, when optimization approaches a minimizer and gradients shrink again, ρ rises at the late stage of training. Beyond Mainly Late-Stage Gains of SAM Recent work on SAM points out that SAM works mainly in the late stage of training (Zhou et al., 2025). This phenomenon occurs since the practical surrogate of SAM contains a domina… view at source ↗
Figure 4
Figure 4. Figure 4: Visualization of loss landscape for SGD, SAM, and our LE-SAM 0 25 50 75 100 125 150 175 200 Epoch 0.0 0.1 0.2 0.3 0.4 P e r t u r b a t i o n R a d i u s Batch-wise Min-Max Mean [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 6
Figure 6. Figure 6: Test accuracy trajectories under whole run vs. late￾stage switch training. We compare enabling SAM and LE-SAM throughout training against activating them after epoch 160 (SGD then SAM, SGD then LE-SAM). We apply all the methods to CIFAR-100 with ResNet-18. Model SAM LE-SAM LE-SAM+ ResNet-18 80.17 81.51 82.01 WRN-28-10 83.42 84.91 85.06 PyramidNet-110 84.46 85.91 86.19 [PITH_FULL_IMAGE:figures/full_fig_p00… view at source ↗
Figure 7
Figure 7. Figure 7: Sensitivity analysis of loss budget σ [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗
read the original abstract

Sharpness-Aware Minimization (SAM) improves generalization by minimizing the worst-case loss within a fixed parameter-space radius neighborhood. SAM and its variants mainly rely on a first-order linearized surrogate, while flat minima are inherently a second-order (curvature) notion.We revisit this mismatch and propose Loss-Equated SAM (LE-SAM), which inverts the traditional SAM mechanism that fixed perturbation radius with a fixed loss-space budget,effectively removing gradient-norm-dominated learning signals and shifting optimization toward curvature-dominated terms. Extensive experiments across diverse benchmarks and tasks demonstrate the strong generalization ability of LESAM that consistently outperforms SAM and even its variants, achieving the state-of-the-art performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Loss-Equated SAM (LE-SAM) as an inversion of standard Sharpness-Aware Minimization (SAM): instead of minimizing the worst-case loss inside a fixed parameter-space radius, it fixes a loss-space budget for the adversarial perturbation. This change is asserted to eliminate gradient-norm-dominated signals and emphasize curvature-dominated terms. Extensive experiments across benchmarks and tasks are reported to show that LE-SAM consistently outperforms SAM and its variants, reaching state-of-the-art generalization.

Significance. If the claimed mechanistic shift from gradient-norm to curvature dominance is rigorously derived and the empirical gains prove robust to controls for hyper-parameter tuning and implementation details, the work could refine the design of sharpness-aware optimizers and improve generalization bounds in deep learning. The empirical breadth is a potential strength, but the absence of a supporting derivation in the abstract leaves the central rationale unverified.

major comments (2)
  1. [Abstract] Abstract: the central claim that fixing a loss-space budget 'effectively removes gradient-norm-dominated learning signals and shifting optimization toward curvature-dominated terms' is presented as an immediate consequence of the inversion, yet no equation, first-order approximation, or update rule for the perturbation (e.g., arg min_ε ||ε|| s.t. L(θ+ε)−L(θ)=constant or its linearization) is supplied. This derivation is load-bearing for the mechanistic explanation and must be provided before the curvature-shift rationale can be evaluated.
  2. [Abstract] Abstract / Experiments: the assertion of 'state-of-the-art performance' and 'strong generalization ability' is stated without reference to specific tables, metrics, error bars, or ablation controls for the loss-budget hyper-parameter. Without these details the empirical claim cannot be assessed for statistical significance or confounding factors.
minor comments (2)
  1. [Abstract] Inconsistent acronym usage: 'LE-SAM' and 'LESAM' appear interchangeably; standardize to one form throughout.
  2. [Abstract] The phrase 'inverts the traditional SAM mechanism' is used without a concise contrast equation or pseudocode showing how the new perturbation differs from the standard SAM ascent step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We provide point-by-point responses to the major comments and are prepared to revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that fixing a loss-space budget 'effectively removes gradient-norm-dominated learning signals and shifting optimization toward curvature-dominated terms' is presented as an immediate consequence of the inversion, yet no equation, first-order approximation, or update rule for the perturbation (e.g., arg min_ε ||ε|| s.t. L(θ+ε)−L(θ)=constant or its linearization) is supplied. This derivation is load-bearing for the mechanistic explanation and must be provided before the curvature-shift rationale can be evaluated.

    Authors: The full manuscript in Section 3 derives the perturbation under the fixed loss budget. Using a first-order Taylor expansion, L(θ + ε) ≈ L(θ) + ∇L · ε = L(θ) + δ, leading to the minimal ||ε|| perturbation ε = (δ / ||∇L||^2) ∇L. This explicitly shows the inverse dependence on the gradient norm, diminishing gradient-norm dominance and highlighting curvature effects in higher-order terms. We will add a concise version of this approximation to the abstract in the revision. revision: yes

  2. Referee: [Abstract] Abstract / Experiments: the assertion of 'state-of-the-art performance' and 'strong generalization ability' is stated without reference to specific tables, metrics, error bars, or ablation controls for the loss-budget hyper-parameter. Without these details the empirical claim cannot be assessed for statistical significance or confounding factors.

    Authors: While the abstract is a high-level summary, the full manuscript details the empirical results in Tables 1-6, reporting mean performance metrics with standard deviations across multiple runs on various benchmarks, along with ablations for the loss-budget hyperparameter in Section 4. We will revise the abstract to include references to key tables and figures to support the state-of-the-art claim. revision: yes

Circularity Check

0 steps flagged

No circularity; proposal framed as independent inversion without equations or self-referential reductions.

full rationale

The provided abstract and description introduce LE-SAM by inverting SAM's fixed-radius perturbation to a fixed loss-space budget, asserting that this removes gradient-norm signals and emphasizes curvature. No equations, update rules, or derivations are supplied that would allow inspection for self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The text contains no self-citations at all, and the central claim is presented as a direct mechanistic consequence rather than a re-expression of prior fitted quantities or ansatzes. Per the rules, absence of any quotable reduction to inputs by construction means the derivation (such as it is) is self-contained; this is the expected honest non-finding when no load-bearing circular steps exist.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the proposal rests on the unelaborated claim that loss-budget inversion removes gradient-norm signals.

pith-pipeline@v0.9.0 · 5418 in / 975 out tokens · 43963 ms · 2026-05-12T04:14:06.892847+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 2 internal anchors

  1. [1]

    Langley , title =

    P. Langley , title =. Proceedings of the 17th International Conference on Machine Learning (ICML 2000) , address =. 2000 , pages =

  2. [2]

    Proceedings of the IEEE international conference on computer vision , pages=

    Deeper, broader and artier domain generalization , author=. Proceedings of the IEEE international conference on computer vision , pages=

  3. [3]

    arXiv preprint arXiv:2402.15152 , year=

    On the duality between sharpness-aware minimization and adversarial training , author=. arXiv preprint arXiv:2402.15152 , year=

  4. [4]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    Towards deep learning models resistant to adversarial attacks , author=. arXiv preprint arXiv:1706.06083 , year=

  5. [5]

    2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) , pages=

    Erm++: An improved baseline for domain generalization , author=. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) , pages=. 2025 , organization=

  6. [6]

    International Conference on Artificial Intelligence and Statistics , pages=

    Low-pass filtering sgd for recovering flat optima in the deep learning optimization landscape , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2022 , organization=

  7. [7]

    T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

  8. [8]

    M. J. Kearns , title =

  9. [9]

    Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

  10. [10]

    R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000

  11. [11]

    Suppressed for Anonymity , author=

  12. [12]

    Newell and P

    A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

  13. [13]

    A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

  14. [14]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Cr-sam: Curvature regularized sharpness-aware minimization , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  15. [15]

    International Conference on Learning Representations , year=

    Sharpness-aware Minimization for Efficiently Improving Generalization , author=. International Conference on Learning Representations , year=

  16. [16]

    Proceedings of the 38th International Conference on Machine Learning , pages =

    ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

  17. [17]

    International Conference on Learning Representations , year=

    Surrogate Gap Minimization Improves Sharpness-Aware Training , author=. International Conference on Learning Representations , year=

  18. [18]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , month =

    Li, Tao and Zhou, Pan and He, Zhengbao and Cheng, Xinwen and Huang, Xiaolin , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , month =. 2024 , pages =

  19. [19]

    2025 , eprint=

    Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization , author=. 2025 , eprint=

  20. [20]

    International Conference on Learning Representations , year=

    Efficient Sharpness-aware Minimization for Improved Training of Neural Networks , author=. International Conference on Learning Representations , year=

  21. [21]

    2024 , eprint=

    Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy , author=. 2024 , eprint=

  22. [22]

    The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

    A Single-Step, Sharpness-Aware Minimization is All You Need to Achieve Efficient and Accurate Sparse Training , author=. The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

  23. [23]

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , month =

    He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , month =

  24. [24]

    Wide Residual Networks, in: Proceedings of the British Machine Vision Conference (BMVC), pp

    Wide Residual Networks , author=. Proceedings of the British Machine Vision Conference (BMVC) , publisher=. 2016 , month=. doi:10.5244/C.30.87 , isbn=

  25. [25]

    Deep Pyramidal Residual Networks , year=

    Han, Dongyoon and Kim, Jiwhan and Kim, Junmo , booktitle=. Deep Pyramidal Residual Networks , year=

  26. [26]

    2009 , publisher=

    Learning multiple layers of features from tiny images , author=. 2009 , publisher=

  27. [27]

    ImageNet: A large-scale hierarchical image database , year=

    Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Kai Li and Li Fei-Fei , booktitle=. ImageNet: A large-scale hierarchical image database , year=

  28. [28]

    1987 , publisher=

    Introduction to optimization , author=. 1987 , publisher=

  29. [29]

    PyTorch: An Imperative Style, High-Performance Deep Learning Library , doi =

    Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Köpf, Andreas and Yang, Edward and DeVito, Zach and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and C...

  30. [30]

    2018 , eprint=

    Visualizing the Loss Landscape of Neural Nets , author=. 2018 , eprint=

  31. [31]

    arXiv preprint arXiv:1905.00313 , year=

    Revisiting the Polyak step size , author=. arXiv preprint arXiv:1905.00313 , year=

  32. [32]

    Bert: Pre-training of deep bidirectional transformers for language understanding , author=. Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers) , pages=

  33. [33]

    Advances in neural information processing systems , volume=

    Language models are few-shot learners , author=. Advances in neural information processing systems , volume=

  34. [34]

    On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

    On large-batch training for deep learning: Generalization gap and sharp minima , author=. arXiv preprint arXiv:1609.04836 , year=

  35. [35]

    Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

    Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data , author=. arXiv preprint arXiv:1703.11008 , year=

  36. [36]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Bbn: Bilateral-branch network with cumulative learning for long-tailed visual recognition , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  37. [37]

    International conference on learning representations , year=

    Exploring balanced feature spaces for representation learning , author=. International conference on learning representations , year=

  38. [38]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Targeted supervised contrastive learning for long-tailed recognition , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  39. [39]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Improving calibration for long-tailed recognition , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  40. [40]

    International Conference on Machine Learning , pages=

    Feature directions matter: Long-tailed learning via rotated balanced representation , author=. International Conference on Machine Learning , pages=. 2023 , organization=

  41. [41]

    Advances in neural information processing systems , volume=

    Inducing neural collapse in imbalanced learning: Do we really need a learnable classifier at the end of deep neural network? , author=. Advances in neural information processing systems , volume=

  42. [42]

    arXiv preprint arXiv:2505.01660 , year=

    Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification , author=. arXiv preprint arXiv:2505.01660 , year=

  43. [43]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Contrastive learning based hybrid networks for long-tailed image classification , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  44. [44]

    Hessian-based Analysis of Large Batch Training and Robustness to Adversaries , url =

    Yao, Zhewei and Gholami, Amir and Lei, Qi and Keutzer, Kurt and Mahoney, Michael W , booktitle =. Hessian-based Analysis of Large Batch Training and Robustness to Adversaries , url =

  45. [45]

    Avron and S

    Avron, Haim and Toledo, Sivan , title =. 2011 , issue_date =. doi:10.1145/1944345.1944349 , journal =

  46. [46]

    1996 , issn =

    Some large-scale matrix computation problems , journal =. 1996 , issn =. doi:https://doi.org/10.1016/0377-0427(96)00018-0 , url =

  47. [47]

    , booktitle=

    Yao, Zhewei and Gholami, Amir and Keutzer, Kurt and Mahoney, Michael W. , booktitle=. PyHessian: Neural Networks Through the Lens of the Hessian , year=

  48. [48]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Class-balanced loss based on effective number of samples , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  49. [49]

    2025 , eprint=

    Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training , author=. 2025 , eprint=

  50. [50]

    Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , month =

    Jian, Zhongquan and Chen, Yanhao and Wang, Yancheng and Yao, Junfeng and Wang, Meihong and Wu, Qingqiang , title =. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , month =. 2025 , pages =

  51. [51]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Global and local mixture consistency cumulative learning for long-tailed visual recognitions , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  52. [52]

    International conference on machine learning , pages=

    Beyond synthetic noise: Deep learning on controlled noisy labels , author=. International conference on machine learning , pages=. 2020 , organization=

  53. [53]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Swin transformer: Hierarchical vision transformer using shifted windows , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

  54. [54]

    Advances in Neural Information Processing Systems , volume=

    Adaptive sgd with polyak stepsize and line-search: Robust convergence and variance reduction , author=. Advances in Neural Information Processing Systems , volume=

  55. [55]

    arXiv preprint arXiv:2406.04142 , year=

    Stochastic Polyak step-sizes and momentum: Convergence guarantees and practical performance , author=. arXiv preprint arXiv:2406.04142 , year=

  56. [56]

    International Conference on Learning Representations , year=

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , author=. International Conference on Learning Representations , year=

  57. [57]

    Momentum-

    Marlon Becker and Frederick Altrock and Benjamin Risse , booktitle=. Momentum-. 2025 , url=

  58. [58]

    Advances in Neural Information Processing Systems , editor=

    Sharpness-Aware Training for Free , author=. Advances in Neural Information Processing Systems , editor=. 2022 , url=

  59. [59]

    2022 , eprint=

    Fisher SAM: Information Geometry and Sharpness Aware Minimisation , author=. 2022 , eprint=

  60. [60]

    and van Loan, Charles F

    Golub, Gene H. and van Loan, Charles F. , biburl =. Matrix Computations , url =

  61. [61]

    Simplifying neural nets by discovering flat minima , year =

    Hochreiter, Sepp and Schmidhuber, J\". Simplifying neural nets by discovering flat minima , year =

  62. [62]

    International Conference on Learning Representations , year=

    On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , author=. International Conference on Learning Representations , year=

  63. [63]

    Proceedings of the 34th International Conference on Machine Learning , pages =

    Sharp Minima Can Generalize For Deep Nets , author =. Proceedings of the 34th International Conference on Machine Learning , pages =. 2017 , editor =

  64. [64]

    Journal of Machine Learning Research , year =

    John Duchi and Elad Hazan and Yoram Singer , title =. Journal of Machine Learning Research , year =

  65. [65]

    Proceedings of Thirty Third Conference on Learning Theory , pages =

    Complexity Guarantees for Polyak Steps with Momentum , author =. Proceedings of Thirty Third Conference on Learning Theory , pages =. 2020 , editor =

  66. [66]

    2024 , eprint=

    Exact Convergence rate of the subgradient method by using Polyak step size , author=. 2024 , eprint=

  67. [67]

    2021 , eprint=

    Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence , author=. 2021 , eprint=

  68. [68]

    2024 , eprint=

    Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution , author=. 2024 , eprint=

  69. [69]

    2024 , eprint=

    A Universal Class of Sharpness-Aware Minimization Algorithms , author=. 2024 , eprint=

  70. [70]

    2024 , eprint=

    On the Duality Between Sharpness-Aware Minimization and Adversarial Training , author=. 2024 , eprint=

  71. [71]

    2023 , eprint=

    Sharpness-Aware Minimization Alone can Improve Adversarial Robustness , author=. 2023 , eprint=

  72. [72]

    Remote Sensing , VOLUME =

    Dong, Mingrong and Yang, Yixuan and Zeng, Kai and Wang, Qingwang and Shen, Tao , TITLE =. Remote Sensing , VOLUME =. 2024 , NUMBER =

  73. [73]

    2022 , eprint=

    Sharpness-Aware Minimization with Dynamic Reweighting , author=. 2022 , eprint=

  74. [74]

    2024 , eprint=

    Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term , author=. 2024 , eprint=