pith. machine review for the scientific record.

arxiv: 1409.2329 · v5 · submitted 2014-09-08 · 💻 cs.NE

Recognition: unknown

Recurrent Neural Network Regularization

Authors on Pith: no claims yet
classification 💻 cs.NE
keywords: neural · dropout · lstms · networks · recurrent · regularization · rnns · tasks
read the original abstract

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.
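The correction the abstract refers to is that dropout is applied only to the non-recurrent connections (the inputs flowing between stacked LSTM layers), while the recurrent hidden-to-hidden path is left untouched, so the memory state is not corrupted at every timestep. A minimal NumPy sketch of that idea, with illustrative weight shapes and names not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: gates computed from the (possibly dropped-out)
    input x and the *untouched* recurrent state h."""
    z = W @ x + U @ h + b            # all four gate pre-activations at once
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_new = f * c + i * np.tanh(g)
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def dropout(x, p, rng):
    """Inverted dropout: zero units with probability p, rescale survivors."""
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask

rng = np.random.default_rng(0)
n = 8                                # hidden size (illustrative)
# random weights for a hypothetical 2-layer LSTM
params = [(0.1 * rng.normal(size=(4 * n, n)),   # W: input -> gates
           0.1 * rng.normal(size=(4 * n, n)),   # U: hidden -> gates
           np.zeros(4 * n))
          for _ in range(2)]

h = [np.zeros(n), np.zeros(n)]
c = [np.zeros(n), np.zeros(n)]
x = rng.normal(size=n)
p = 0.5

for layer, (W, U, b) in enumerate(params):
    # dropout ONLY on the non-recurrent input into each layer;
    # the recurrent h/c paths pass through intact
    x_in = dropout(x, p, rng)
    h[layer], c[layer] = lstm_step(x_in, h[layer], c[layer], W, U, b)
    x = h[layer]                     # feeds the next layer up

print(x.shape)
```

Over many timesteps the loop above would reuse the same `h`/`c` lists without ever masking them, which is exactly what distinguishes this scheme from naively dropping out the recurrent connections.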

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    cs.LG 2017-01 accept novelty 8.0

    A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.

  2. Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

    cs.AI 2026-05 unverdicted novelty 7.0

    MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles that achieves better few-shot multimodal time series classification than baselines on 12 be...

  3. SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition

    cs.HC 2026-05 unverdicted novelty 7.0

    SIGMA-ASL is a multimodal dataset with 93,545 word-level ASL clips from Kinect RGB-D, mmWave radar, and dual IMUs, plus benchmarking protocols for single- and multi-modal recognition.

  4. Pointer Sentinel Mixture Models

    cs.CL 2016-09 conditional novelty 7.0

    Pointer sentinel-LSTM mixes context copying with softmax prediction to reach 70.9 perplexity on Penn Treebank using fewer parameters than standard LSTMs.