On the difficulty of training Recurrent Neural Networks

Pascanu, R · 2012 · cs.LG · arXiv 1211.5063

17 Pith papers cite this work. Polarity classification is still indexing.


abstract

There are two widely known issues with properly training Recurrent Neural Networks, the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We validate empirically our hypothesis and proposed solutions in the experimental section.
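The gradient norm clipping strategy named in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which a gradient whose L2 norm exceeds a threshold is rescaled to have exactly that norm. The function name `clip_by_norm` and the threshold value are illustrative assumptions, not taken from this page.

```python
import math


def clip_by_norm(grad, threshold):
    """Return a copy of grad whose L2 norm is at most threshold.

    Hypothetical helper for illustration: grad is a flat list of floats.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        # Rescale so the direction is kept and the norm equals threshold.
        scale = threshold / norm
        return [g * scale for g in grad]
    # Gradients already within the threshold pass through unchanged.
    return list(grad)


# A gradient of norm 5 is shrunk to norm 1; its direction is preserved.
clipped = clip_by_norm([3.0, 4.0], threshold=1.0)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
```

Rescaling the whole vector, rather than clipping each component independently, keeps the update direction unchanged and only limits the step size.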

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

AI-enabled gravitational-waves searches for binary neutron stars at optimal sensitivity

astro-ph.HE · 2026-07-01 · unverdicted · novelty 8.0

Aframe neural network achieves matched-filter sensitivity for binary neutron star GW searches at lower computational cost using heterodyning and a single GPU.

Coherent-State Propagation: A Computational Framework for Simulating Bosonic Quantum Systems

quant-ph · 2026-04-21 · unverdicted · novelty 8.0

Coherent-state propagation enables quasi-polynomial classical simulation of bosonic circuits with logarithmically many Kerr gates at exponentially small trace-distance error, with polynomial runtime in the weak-nonlinearity regime.

In Defense of Information Leakage in Concept-based Models

cs.LG · 2026-06-09 · conditional · novelty 7.0

Concept-based models can use controlled 'benign' information leakage to remain accurate and intervenable under real-world concept incompleteness by reframing their training objective.

Geometry-Induced Long-Range Correlations in Recurrent Neural Network Quantum States

quant-ph · 2026-04-09 · conditional · novelty 7.0

Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansatze.

Compact Spin-Charge Separated Neural Quantum States for Valence-Bond States

cond-mat.str-el · 2026-06-15 · unverdicted · novelty 6.0

A compact NQS architecture for VBS and doped sVBS states reaches high fidelity with fewer parameters than standard baselines by using solvable-point-guided designs and explicit spin-hole sector separation.

Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

S-Adam modulates updates via an LGI-based damping term and proves almost-sure convergence to Clarke stationary points at O(1/sqrt(T)) while reporting accuracy gains on CIFAR-100 and TinyImageNet.

Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks

cs.LG · 2024-04-03 · unverdicted · novelty 6.0

NEON provides uncertainty-aware operator learning for composite Bayesian optimization in function spaces using a single network, achieving claimed SOTA with orders of magnitude fewer parameters than ensembles.

Adaptive Federated Optimization

cs.LG · 2020-02-29 · unverdicted · novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

PaLM: Scaling Language Modeling with Pathways

cs.CL · 2022-04-05 · accept · novelty 6.0

PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

cs.CL · 2016-09-26 · accept · novelty 6.0

GNMT deploys 8-layer LSTMs with attention, wordpieces, low-precision inference, and coverage-penalized beam search to match state-of-the-art on WMT'14 En-Fr and En-De while cutting translation errors by 60% in human evaluations.

Physics-informed convolutional neural networks for fluid flow through porous media

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

A physics-informed CNN predicts pore-scale velocity fields from geometry and serves as a warm-start to accelerate Lattice-Boltzmann solvers in over 90% of tested cases.

Multimodal and Multi-view Models for Emotion Recognition

cs.CL · 2019-06-24 · unverdicted · novelty 5.0

Multimodal training with attention and contrastive multi-view learning improves both combined and acoustic-only emotion recognition on IEMOCAP over prior acoustic baselines.

Inferring identified hadron production in $pp$ collisions with physics-informed machine learning at the LHC

hep-ph · 2026-05-09 · unverdicted · novelty 5.0

A physics-informed neural network infers pT spectra of pi, K, p, Lambda, and Ks in unmeasured rapidity regions from PYTHIA8 pp collisions at 13.6 TeV, achieving 1.5-5.83% yield uncertainties while reproducing yield ratios and freeze-out parameters.

Preventing overfitting in deep learning using differential privacy

cs.LG · 2026-03-12 · unverdicted · novelty 4.0

Differential privacy techniques can help prevent overfitting and improve generalization in deep neural networks.

Autoencoding sensory substitution

q-bio.NC · 2019-07-14 · unverdicted · novelty 4.0

Deep recurrent autoencoders convert images to shortened audio signals that incorporate hearing models, enabling above-chance hand posture discrimination and object reaching after a few hours of training instead of months.

On Inductive Biases in Deep Reinforcement Learning

cs.LG · 2019-07-05 · unverdicted · novelty 4.0

Adaptive replacements for domain-specific components in deep RL agents can yield better learning on new tasks without additional tuning.

A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence

cs.LG · 2026-04-22 · unverdicted · novelty 4.0

A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.

citing papers explorer

Showing 17 of 17 citing papers.

AI-enabled gravitational-waves searches for binary neutron stars at optimal sensitivity
astro-ph.HE · 2026-07-01 · unverdicted · none · ref 83 · internal anchor
Aframe neural network achieves matched-filter sensitivity for binary neutron star GW searches at lower computational cost using heterodyning and a single GPU.

Coherent-State Propagation: A Computational Framework for Simulating Bosonic Quantum Systems
quant-ph · 2026-04-21 · unverdicted · none · ref 155
Coherent-state propagation enables quasi-polynomial classical simulation of bosonic circuits with logarithmically many Kerr gates at exponentially small trace-distance error, with polynomial runtime in the weak-nonlinearity regime.

In Defense of Information Leakage in Concept-based Models
cs.LG · 2026-06-09 · conditional · none · ref 43 · internal anchor
Concept-based models can use controlled 'benign' information leakage to remain accurate and intervenable under real-world concept incompleteness by reframing their training objective.

Geometry-Induced Long-Range Correlations in Recurrent Neural Network Quantum States
quant-ph · 2026-04-09 · conditional · none · ref 45
Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansatze.

Compact Spin-Charge Separated Neural Quantum States for Valence-Bond States
cond-mat.str-el · 2026-06-15 · unverdicted · none · ref 52 · internal anchor
A compact NQS architecture for VBS and doped sVBS states reaches high fidelity with fewer parameters than standard baselines by using solvable-point-guided designs and explicit spin-hole sector separation.

Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization
cs.LG · 2026-05-28 · unverdicted · none · ref 3 · internal anchor
S-Adam modulates updates via an LGI-based damping term and proves almost-sure convergence to Clarke stationary points at O(1/sqrt(T)) while reporting accuracy gains on CIFAR-100 and TinyImageNet.

Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks
cs.LG · 2024-04-03 · unverdicted · none · ref 33 · internal anchor
NEON provides uncertainty-aware operator learning for composite Bayesian optimization in function spaces using a single network, achieving claimed SOTA with orders of magnitude fewer parameters than ensembles.

Adaptive Federated Optimization
cs.LG · 2020-02-29 · unverdicted · none · ref 192 · internal anchor
Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

PaLM: Scaling Language Modeling with Pathways
cs.CL · 2022-04-05 · accept · none · ref 108
PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
cs.CL · 2016-09-26 · accept · none · ref 33
GNMT deploys 8-layer LSTMs with attention, wordpieces, low-precision inference, and coverage-penalized beam search to match state-of-the-art on WMT'14 En-Fr and En-De while cutting translation errors by 60% in human evaluations.

Physics-informed convolutional neural networks for fluid flow through porous media
cs.LG · 2026-05-18 · unverdicted · none · ref 28 · internal anchor
A physics-informed CNN predicts pore-scale velocity fields from geometry and serves as a warm-start to accelerate Lattice-Boltzmann solvers in over 90% of tested cases.

Multimodal and Multi-view Models for Emotion Recognition
cs.CL · 2019-06-24 · unverdicted · none · ref 21 · internal anchor
Multimodal training with attention and contrastive multi-view learning improves both combined and acoustic-only emotion recognition on IEMOCAP over prior acoustic baselines.

Inferring identified hadron production in $pp$ collisions with physics-informed machine learning at the LHC
hep-ph · 2026-05-09 · unverdicted · none · ref 52
A physics-informed neural network infers pT spectra of pi, K, p, Lambda, and Ks in unmeasured rapidity regions from PYTHIA8 pp collisions at 13.6 TeV, achieving 1.5-5.83% yield uncertainties while reproducing yield ratios and freeze-out parameters.

Preventing overfitting in deep learning using differential privacy
cs.LG · 2026-03-12 · unverdicted · none · ref 30 · internal anchor
Differential privacy techniques can help prevent overfitting and improve generalization in deep neural networks.

Autoencoding sensory substitution
q-bio.NC · 2019-07-14 · unverdicted · none · ref 208 · internal anchor
Deep recurrent autoencoders convert images to shortened audio signals that incorporate hearing models, enabling above-chance hand posture discrimination and object reaching after a few hours of training instead of months.

On Inductive Biases in Deep Reinforcement Learning
cs.LG · 2019-07-05 · unverdicted · none · ref 8 · internal anchor
Adaptive replacements for domain-specific components in deep RL agents can yield better learning on new tasks without additional tuning.

A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
cs.LG · 2026-04-22 · unverdicted · none · ref 41
A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.
