pith. machine review for the scientific record.

arxiv: 1409.2329 · v5 · submitted 2014-09-08 · 💻 cs.NE

Recognition: unknown

Recurrent Neural Network Regularization

Authors on Pith: no claims yet
classification 💻 cs.NE
keywords: neural · dropout · lstms · networks · recurrent · regularization · rnns · tasks
read the original abstract

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.
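The correction the abstract refers to is that dropout is applied only to the non-recurrent connections (the inputs flowing between stacked LSTM layers), while the recurrent hidden-to-hidden path is left untouched, so the memory state is not corrupted at every timestep. A minimal NumPy sketch of that idea, with illustrative weight shapes and names not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: gates computed from the (possibly dropped-out)
    input x and the *untouched* recurrent state h."""
    z = W @ x + U @ h + b            # all four gate pre-activations at once
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_new = f * c + i * np.tanh(g)
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def dropout(x, p, rng):
    """Inverted dropout: zero units with probability p, rescale survivors."""
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask

rng = np.random.default_rng(0)
n = 8                                # hidden size (illustrative)
# random weights for a hypothetical 2-layer LSTM
params = [(0.1 * rng.normal(size=(4 * n, n)),   # W: input -> gates
           0.1 * rng.normal(size=(4 * n, n)),   # U: hidden -> gates
           np.zeros(4 * n))
          for _ in range(2)]

h = [np.zeros(n), np.zeros(n)]
c = [np.zeros(n), np.zeros(n)]
x = rng.normal(size=n)
p = 0.5

for layer, (W, U, b) in enumerate(params):
    # dropout ONLY on the non-recurrent input into each layer;
    # the recurrent h/c paths pass through intact
    x_in = dropout(x, p, rng)
    h[layer], c[layer] = lstm_step(x_in, h[layer], c[layer], W, U, b)
    x = h[layer]                     # feeds the next layer up

print(x.shape)
```

Over many timesteps the loop above would reuse the same `h`/`c` lists without ever masking them, which is exactly what distinguishes this scheme from naively dropping out the recurrent connections.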

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    cs.LG 2017-01 accept novelty 8.0

    A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.

  2. Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

    cs.AI 2026-05 unverdicted novelty 7.0

    MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles that achieves better few-shot multimodal time series classification than baselines on 12 be...

  3. SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition

    cs.HC 2026-05 unverdicted novelty 7.0

    SIGMA-ASL is a multimodal dataset with 93,545 word-level ASL clips from Kinect RGB-D, mmWave radar, and dual IMUs, plus benchmarking protocols for single- and multi-modal recognition.

  4. Pointer Sentinel Mixture Models

    cs.CL 2016-09 conditional novelty 7.0

    Pointer sentinel-LSTM mixes context copying with softmax prediction to reach 70.9 perplexity on Penn Treebank using fewer parameters than standard LSTMs.