pith. sign in

hub

On the difficulty of training Recurrent Neural Networks

17 Pith papers cite this work. Polarity classification is still indexing.

17 Pith papers citing it
abstract

There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We validate empirically our hypothesis and proposed solutions in the experimental section.

hub tools

citation-role summary

background 3

citation-polarity summary

roles

background 3

polarities

background 3

clear filters

representative citing papers

In Defense of Information Leakage in Concept-based Models

cs.LG · 2026-06-09 · conditional · novelty 7.0

Concept-based models can use controlled 'benign' information leakage to remain accurate and intervenable under real-world concept incompleteness by reframing their training objective.

Adaptive Federated Optimization

cs.LG · 2020-02-29 · unverdicted · novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

PaLM: Scaling Language Modeling with Pathways

cs.CL · 2022-04-05 · accept · novelty 6.0

PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.

Multimodal and Multi-view Models for Emotion Recognition

cs.CL · 2019-06-24 · unverdicted · novelty 5.0

Multimodal training with attention and contrastive multi-view learning improves both combined and acoustic-only emotion recognition on IEMOCAP over prior acoustic baselines.

Autoencoding sensory substitution

q-bio.NC · 2019-07-14 · unverdicted · novelty 4.0

Deep recurrent autoencoders convert images to shortened audio signals that incorporate hearing models, enabling above-chance hand posture discrimination and object reaching after a few hours of training instead of months.

citing papers explorer

Showing 2 of 2 citing papers after filters.