Kakade , booktitle=

3 Pith papers cite this work. Polarity classification is still indexing.

citation summary

  • years: 2026 (2), 2025 (1)

  • verdicts: unverdicted (3)

  • roles: background (1)

  • polarities: background (1)

representative citing papers

Muon is Scalable for LLM Training

cs.LG · 2025-02-24 · unverdicted · novelty 6.0

Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.

citing papers explorer

  • Direct From Darwin: Deriving Advanced Optimizers From Evolutionary First Principles cs.NE · 2026-05-06 · unverdicted · none · ref 43 · 2 links

    SGD, approximations of Newton's method, natural gradient descent, and Adam are proven compatible with evolutionary dynamics when augmented with DLS noise, turning them into valid in silico simulations of asexual Darwinian evolution.

  • Adapt or Forget: Provable Tradeoffs Between Adam and SGD in Nonstationary Optimization stat.ML · 2026-05-05 · unverdicted · none · ref 16

    Adam's adaptive preconditioning and first-moment averaging improve high-probability tracking error in noise-dominated nonstationary regimes but can increase it under strong drift, where SGD achieves a smaller floor, with explicit beta-dependent bounds.

  • Muon is Scalable for LLM Training cs.LG · 2025-02-24 · unverdicted · none · ref 83

    Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.