NAPTS extends APTS with non-monotone acceptance and nonlinear Schwarz preconditioning to reduce CPU time by 30% and rejected steps by two-thirds while preserving accuracy in neural network training.
arXiv preprint arXiv:2111.04949
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2). Verdicts: unverdicted (2).
Citing papers explorer
- The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model
A scaling law model derived from roofline analysis and a speedup-based efficiency factor predicts training energy for BERT models across GPU parallelism configurations.