LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.
International Conference on Machine Learning , pages=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5verdicts
UNVERDICTED 5representative citing papers
Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.
Muon succeeds by guaranteeing local step-size optimality rather than by tracking any ideal global geometry, as random-spectrum and quasi-norm variants match its performance on language models.
SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.
Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.
citing papers explorer
-
Muon is Not That Special: Random or Inverted Spectra Work Just as Well
Muon succeeds by guaranteeing local step-size optimality rather than by tracking any ideal global geometry, as random-spectrum and quasi-norm variants match its performance on language models.