PACE is a clipped per-coordinate controller added to AdamW that improves the limiting error of the returned iterate average in both quadratic analysis and LM experiments.
The AdEMAMix optimizer: Better, faster, older.arXiv preprint arXiv:2409.03137, 2024
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Lower bounds establish that heavy-ball momentum extends the compute-efficient batch-size window by sqrt(kappa) over SGD in linear regression, with accelerated SGD showing spectrum-dependent CE-serial runtime tradeoffs.
citing papers explorer
-
Training for the Model You Return: Improving Optimization for Iterate-Averaged Language Models
PACE is a clipped per-coordinate controller added to AdamW that improves the limiting error of the returned iterate average in both quadratic analysis and LM experiments.
-
Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods
Lower bounds establish that heavy-ball momentum extends the compute-efficient batch-size window by sqrt(kappa) over SGD in linear regression, with accelerated SGD showing spectrum-dependent CE-serial runtime tradeoffs.