arXiv preprint arXiv:1808.05671 , year=

On the convergence of adaptive gradient methods for nonconvex optimization , author= · 2018 · arXiv 1808.05671

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate

math.OC · 2026-04-09 · unverdicted · novelty 8.0

Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.

Why Muon Outperforms Adam: A Curvature Perspective

cs.LG · 2026-06-03 · conditional · novelty 7.0

Muon outperforms Adam by reducing curvature penalty via lower Normalized Directional Sharpness, as shown via Taylor approximation on LLM training and proven on stylized quadratic problems with heterogeneous curvature.

On the Convergence of Muon and Beyond

cs.LG · 2025-09-19 · unverdicted · novelty 7.0

Muon-MVR2 attains the optimal anytime convergence rate of ~O(T^{-1/3}) in stochastic non-convex settings under horizon-free schedules.

Adaptive Federated Optimization

cs.LG · 2020-02-29 · unverdicted · novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

Muon Learns More Robust and Transferable Features than Adam

cs.LG · 2026-06-08 · unverdicted · novelty 5.0

Muon learns more robust and transferable features than Adam and SGD, shown via corruption robustness tests, transfer experiments, layer-wise probes, effective rank measurements, and a theoretical proof on margins in a multi-component classification problem.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Adaptive Federated Optimization cs.LG · 2020-02-29 · unverdicted · none · ref 196
Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

arXiv preprint arXiv:1808.05671 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer