Resurrecting recurrent neural networks for long sequences

Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De · 2023

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.

Learning reveals invisible structure in low-rank RNNs

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting function.

Parallel-in-Time Training of Recurrent Neural Networks for Dynamical Systems Reconstruction

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

GTF-DEER augments the DEER framework with Generalized Teacher Forcing to allow effective parallel training of nonlinear recurrent models on extremely long sequences, improving dynamical systems reconstruction for data with long time scales.

citing papers explorer

Showing 3 of 3 citing papers.

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences cs.LG · 2026-05-06 · unverdicted · none · ref 29
In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.
Learning reveals invisible structure in low-rank RNNs cs.LG · 2026-05-05 · unverdicted · none · ref 39
Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting function.
Parallel-in-Time Training of Recurrent Neural Networks for Dynamical Systems Reconstruction cs.LG · 2026-05-12 · unverdicted · none · ref 55
GTF-DEER augments the DEER framework with Generalized Teacher Forcing to allow effective parallel training of nonlinear recurrent models on extremely long sequences, improving dynamical systems reconstruction for data with long time scales.

Resurrecting recurrent neural networks for long sequences

fields

years

verdicts

representative citing papers

citing papers explorer