arXiv preprint arXiv:2405.16002
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers: FOAM adaptively controls damping and update frequency in Shampoo based on staleness-oriented error approximation to cut wall-clock time while preserving convergence.
citing papers explorer
- Characterizing Optimizer-Dependent Training Dynamics Through Hessian Eigenvector Displacement and Localization
  Hessian eigenvector displacement and inverse participation ratio metrics show SGD stabilizing leading curvature directions while Adam causes more reorganization and parameter localization in MLP training.
- FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo
  FOAM adaptively controls damping and update frequency in Shampoo based on staleness-oriented error approximation to cut wall-clock time while preserving convergence.