Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges such as slow inference, vanishing gradients, and difficulty capturing long-term dependencies. In backpropagation-through-time settings, these issues are tightly coupled with the large, sequential computational graph that results from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates, thereby shortening the effective size of the computational graph. The model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models. Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/ .
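The abstract does not spell out the gating mechanism, so the following is only a minimal sketch of the general idea: a binary gate decides at each step whether to run the recurrent cell or copy the previous state, and an optional penalty proportional to the number of updates stands in for the budget constraint. The toy scalar tanh cell, the 0.5 threshold, the rule that accumulates update probability while skipping, and every name here (`skip_rnn_forward`, `w_p`, `b_p`, `budget_weight`) are assumptions made for illustration, not the released implementation. Training details, such as how gradients pass through the binary gate, are omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rnn_cell(state, x, w_s, w_x):
    # Toy scalar tanh cell; a stand-in for any base RNN cell (assumption).
    return math.tanh(w_s * state + w_x * x)


def skip_rnn_forward(xs, w_s=0.5, w_x=1.0, w_p=1.0, b_p=0.0, budget_weight=0.0):
    """Forward pass of a toy skip-update RNN (illustrative sketch only).

    Returns the per-step states, the binary update gates, and a budget
    cost equal to budget_weight times the number of state updates.
    """
    state = 0.0
    u_tilde = 1.0  # update probability; starts at 1 so the first step updates
    states, gates = [], []
    for x in xs:
        update = u_tilde >= 0.5  # binarize the update probability
        if update:
            state = rnn_cell(state, x, w_s, w_x)
        # When skipping, the previous state is copied unchanged.
        delta = sigmoid(w_p * state + b_p)
        if update:
            u_tilde = delta  # reset after an update
        else:
            # Accumulate probability so a skipped state is eventually refreshed.
            u_tilde = u_tilde + min(delta, 1.0 - u_tilde)
        states.append(state)
        gates.append(1 if update else 0)
    budget_cost = budget_weight * sum(gates)
    return states, gates, budget_cost


if __name__ == "__main__":
    states, gates, cost = skip_rnn_forward([1.0] * 5, b_p=-10.0, budget_weight=0.1)
    print(gates, cost)
```

In this sketch, a strongly negative `b_p` makes the model skip most steps, while a strongly positive one recovers an ordinary RNN that updates at every step. The budget cost would be added to the task loss during training to push the gate toward fewer updates.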

citation-role summary

background 1

citation-polarity summary

fields

cs.LG 2 · cs.CL 1

roles

background 1

polarities

background 1

representative citing papers

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

cs.CL · 2023-10-03 · conditional · novelty 6.0

FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention heads, yielding substantial memory savings with negligible quality loss.

citing papers explorer

Showing 3 of 3 citing papers.

  • LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling · cs.LG · 2026-05-11 · unverdicted · none · ref 28

    LeapTS reformulates forecasting as adaptive multi-horizon scheduling via hierarchical control and NCDEs, delivering at least 7.4% better performance and 2.6-5.3x faster inference than Transformer baselines while adapting to non-stationary dynamics.

  • Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs · cs.CL · 2023-10-03 · conditional · none · ref 69 · internal anchor

    FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention heads, yielding substantial memory savings with negligible quality loss.

  • ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network · cs.LG · 2019-06-28 · unverdicted · none · ref 2 · internal anchor

    ARMIN introduces auto-addressing via hidden states and a novel RNN cell to produce a lighter recurrent memory network with lower overhead than existing MANNs or vanilla LSTMs.