
arxiv: 1606.03401 · v1 · submitted 2016-06-10 · 💻 cs.NE · cs.LG


Memory-Efficient Backpropagation Through Time

classification 💻 cs.NE cs.LG
keywords memory, algorithm, computational, time, approach, backpropagation, bptt, budget

We propose a novel approach to reduce the memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance the trade-off between caching intermediate results and recomputing them. The algorithm can fit tightly within almost any user-set memory budget while finding an optimal execution policy that minimizes the computational cost. Computational devices have limited memory capacity, and maximizing computational performance under a fixed memory budget is a practical use case. We provide asymptotic computational upper bounds for various regimes. The algorithm is particularly effective for long sequences. For sequences of length 1000, our algorithm saves 95% of memory usage while using only one third more time per iteration than standard BPTT.
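
The caching-versus-recomputation trade-off described in the abstract can be illustrated with a small dynamic program. The sketch below is not the paper's formulation; it uses a simplified cost model chosen for illustration. The assumptions are: the function name `min_forward_steps` is hypothetical, only hidden states are cached, the initial hidden state of a segment is always available and does not count against the budget, `m` is the number of additional hidden-state slots, and cost is measured in forward steps only.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_forward_steps(t: int, m: int) -> int:
    """Minimum number of forward steps needed to backpropagate through
    a segment of t time steps with m free hidden-state slots.

    Illustrative cost model (an assumption, not taken from the paper):
    the state at the start of the segment is available for free, and
    every backward step needs one forward step from the preceding state.
    """
    if t <= 1:
        # Zero steps cost nothing; a single step needs one forward pass.
        return t
    if m == 0:
        # No slots: recompute from the segment start for every step,
        # costing t + (t - 1) + ... + 1 forward steps.
        return t * (t + 1) // 2
    # Advance y steps, cache that state in one slot, handle the right
    # part with the remaining m - 1 slots, then free the slot and
    # handle the left part with all m slots.
    return min(
        y + min_forward_steps(t - y, m - 1) + min_forward_steps(y, m)
        for y in range(1, t)
    )


if __name__ == "__main__":
    # Show how the recomputation cost falls as the memory budget grows.
    for slots in (0, 1, 2, 4, 8):
        print(slots, min_forward_steps(30, slots))
```

Under this model, a larger budget never increases the cost, and with no slots the cost grows quadratically in the sequence length. The recursion is memoized, so it evaluates on the order of t²·m subproblem candidates; very long sequences would need an iterative table to avoid Python's recursion limit.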

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    cs.LG 2017-01 accept novelty 8.0

    A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.