arXiv preprint arXiv:2111.04949

A survey and empirical evaluation of parallel deep learning frameworks · 2021 · arXiv 2111.04949

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

A Non-Monotone Preconditioned Trust-Region Method for Neural Network Training

math.OC · 2026-05-14 · unverdicted · novelty 5.0

NAPTS extends APTS with non-monotone acceptance and nonlinear Schwarz preconditioning to reduce CPU time by 30% and rejected steps by two-thirds while preserving accuracy in neural network training.

The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model

cs.LG · 2026-06-22 · unverdicted · novelty 4.0

A scaling law model derived from roofline analysis and a speedup-based efficiency factor predicts training energy for BERT models across GPU parallelism configurations.

citing papers explorer

Showing 1 of 1 citing paper after filters.

The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model

cs.LG · 2026-06-22 · unverdicted · none · ref 8

A scaling law model derived from roofline analysis and a speedup-based efficiency factor predicts training energy for BERT models across GPU parallelism configurations.
