Resurrecting recurrent neural networks for long sequences
3 Pith papers cite this work.
Fields: cs.LG (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
-
How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences
In linear recurrent models, the infinite-width description of signal propagation remains accurate only for depths t much smaller than sqrt(n), where n is the width. A critical regime appears at t ~ c sqrt(n), where finite-width effects emerge, and these effects dominate for larger t.
-
Learning reveals invisible structure in low-rank RNNs
Learning in low-rank RNNs reduces to an exact low-dimensional ODE system in overlap space, where loss-invisible overlaps encode training history without affecting the function the network computes.
-
Parallel-in-Time Training of Recurrent Neural Networks for Dynamical Systems Reconstruction
GTF-DEER augments the DEER framework with Generalized Teacher Forcing to enable effective parallel training of nonlinear recurrent models on extremely long sequences, improving dynamical systems reconstruction for data with long time scales.