A Critical Review of Recurrent Neural Networks for Sequence Learning
Countless learning tasks require dealing with sequential data. Image captioning, speech synthesis, and music generation all require that a model produce outputs that are sequences. In other domains, such as time series prediction, video analysis, and musical information retrieval, a model must learn from inputs that are sequences. Interactive tasks, such as translating natural language, engaging in dialogue, and controlling a robot, often demand both capabilities. Recurrent neural networks (RNNs) are connectionist models that capture the dynamics of sequences via cycles in the network of nodes. Unlike standard feedforward neural networks, recurrent networks retain a state that can represent information from an arbitrarily long context window. Although recurrent neural networks have traditionally been difficult to train, and often contain millions of parameters, recent advances in network architectures, optimization techniques, and parallel computation have enabled successful large-scale learning with them. In recent years, systems based on long short-term memory (LSTM) and bidirectional (BRNN) architectures have demonstrated ground-breaking performance on tasks as varied as image captioning, language translation, and handwriting recognition. In this survey, we review and synthesize the research that over the past three decades first yielded and then made practical these powerful learning models. When appropriate, we reconcile conflicting notation and nomenclature. Our goal is to provide a self-contained explication of the state of the art together with a historical perspective and references to primary research.
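As an illustration of the abstract's point that a recurrent network retains a state across time steps, here is a minimal sketch of a simple (Elman-style) recurrent layer in pure Python. It is not taken from the survey. The function names `rnn_step` and `run_rnn`, the tanh nonlinearity, and the toy weights are all assumptions chosen for illustration.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h

def run_rnn(inputs, W_xh, W_hh, b_h):
    """Feed a sequence through the layer, carrying the hidden state forward."""
    h = [0.0] * len(b_h)  # initial state: all zeros
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Hypothetical toy weights: 1 input unit, 2 hidden units.
# Unit 0 reads the input; unit 1 reads only the previous hidden state.
W_xh = [[1.0], [0.0]]
W_hh = [[0.0, 0.0], [1.0, 1.0]]
b_h = [0.0, 0.0]

# A single nonzero input at the first step, then silence.
states = run_rnn([[1.0], [0.0], [0.0]], W_xh, W_hh, b_h)
for t, h in enumerate(states, start=1):
    print(t, [round(v, 4) for v in h])
```

With these toy weights, hidden unit 1 stays nonzero at the third step even though the input has been zero since the first step, so the state still carries a trace of the earlier input. A feedforward layer applied to each input independently would have no such trace.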
Forward citations
Cited by 19 Pith papers
-
LongSpike: Fractional Order Spiking State Space Models for Efficient Long Sequence Learning
LongSpike integrates fractional-order state-space modeling into spiking neural networks, enabling better long-sequence performance than prior SNNs on LRA, WikiText-103, and Speech Commands benchmarks while retaining s...
-
SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression
SELF-EMO lets LLMs bootstrap better emotion recognition and expression via self-play, data flywheel filtering with smoothed IoU rewards, and SELF-GRPO reinforcement learning, yielding SOTA gains on IEMOCAP, MELD, and ...
-
Geometry-Induced Long-Range Correlations in Recurrent Neural Network Quantum States
Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansatze.
-
Learning to learn with quantum neural networks via classical neural networks
Classical RNNs trained on small instances provide parameter initializations for QAOA and VQE that reduce total optimization iterations and generalize across problem sizes.
-
Topological Neural Dynamics: A Neuron-wise Framework for Sequence Modeling
TND models sequences via independent neuron evolution on a directed graph and outperforms RNN, LSTM, CfC, and Transformer baselines on Pong behavior cloning with over 3x more consecutive catches.
-
Topological Neural Dynamics: A Neuron-wise Framework for Sequence Modeling
TND is a neuron-wise dynamical system on a directed graph that achieves 17.47 consecutive catches in Pong behavior cloning, more than three times the strongest baseline.
-
Topological Neural Dynamics: A Neuron-wise Framework for Sequence Modeling
TND shifts to neuron-wise dynamics on explicit graph topology and achieves 17.47 mean consecutive catches in Pong behavior cloning, more than 3x the strongest baseline.
-
Multi-Fidelity Learning with Shallow Recurrent Decoders for Reactor Physics
Shallow Recurrent Decoders map point-kinetics time series to multi-group diffusion solutions on a benchmark reactor geometry.
-
A Data-Driven Parametric Reduced-Order Chemical Kinetics Model Derived from Atomistic Simulations
A parametric autoencoder with non-negativity and softmax constraints learns interpretable latent chemical components and couples them to kinetics and heat release for improved reduced-order modeling of decomposition.
-
Listen to Rhythm, Choose Movements: Autoregressive Multimodal Dance Generation via Diffusion and Mamba with Decoupled Dance Dataset
LRCM is a new multimodal diffusion model with audio and text Conformers plus Motion Temporal Mamba for generating long, coherent dance sequences from rhythm and descriptions using a decoupled dataset.
-
Interpretable and Steerable Sequence Learning via Prototypes
ProSeNet learns a sparse set of prototypes for case-based explanations in deep sequence models, matches state-of-the-art accuracy on several tasks, and supports manual prototype refinement by non-experts.
-
ASTEROID: A Spatiotemporal Information Transformer for Forecasting Multi-Step Time Series of Molecular Dynamics
ASTEROID is a spatiotemporal Transformer that predicts multi-step MD atomic coordinates with claimed higher accuracy and lower cost than iterative simulation on quantum-derived datasets.
-
Selective Correlation Based Knowledge Distillation for Ground Reaction Force Estimation
Selective Correlation Based Knowledge Distillation trains smaller models to accurately estimate ground reaction forces from wearable insole sensors by focusing on temporal features in correlation maps for efficient kn...
-
Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective
CmIR uses causal inference to separate invariant causal representations from spurious ones in multimodal data, improving generalization under distribution shifts and noise via invariance, mutual information, and recon...
-
Leveraging Convolutional Sparse Autoencoders for Robust Movement Classification from Low-Density sEMG
Convolutional sparse autoencoder on two-channel sEMG delivers 94.3% multi-subject F1 for six gestures, 92.3% after few-shot transfer to unseen subjects, and 90% after incremental extension to ten classes.
-
Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning
Develops and tests a model-based RL controller with post-training for gait in a tendon-driven soft quadruped, reporting improved efficiency and robustness over benchmarks.
-
EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness
EGI integrates four existing AI components for real-time multimodal emotion monitoring and feedback in simulated agile meetings, reporting 10% WER and improved self-awareness for Scrum Masters.
-
Predicting Drug Responses by Propagating Interactions through Text-Enhanced Drug-Gene Networks
A text-enhanced drug-gene network is constructed from articles and data, with edge embeddings estimated from cell line records to enable explainable drug sensitivity predictions at 94.74% accuracy.
-
From Handcrafted Features to Functional Edge Learning: Evolution of EEG Seizure Detection Frameworks
A review arguing that Kolmogorov-Arnold Networks address key limitations of deep learning models for EEG-based seizure detection through improved interpretability and efficiency.