pith. machine review for the scientific record. sign in

arxiv: 1610.06258 · v3 · submitted 2016-10-20 · 📊 stat.ML · cs.LG· cs.NE

Recognition: unknown

Using Fast Weights to Attend to the Recent Past

Jimmy Ba , Geoffrey Hinton , Volodymyr Mnih , Joel Z. Leibo , Catalin Ionescu

Authors on Pith no claims yet
classification 📊 stat.ML cs.LGcs.NE
keywords weightsneuralfastpastrecentactivitiesartificialnetworks
0
0 comments X
read the original abstract

Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs. There is no good reason for this restriction. Synapses have dynamics at many different time-scales and this suggests that artificial neural networks might benefit from variables that change slower than activities but much faster than the standard weights. These "fast weights" can be used to store temporary memories of the recent past and they provide a neurally plausible way of implementing the type of attention to the past that has recently proved very helpful in sequence-to-sequence models. By using fast weights we can avoid the need to store copies of neural activity patterns.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning, Fast and Slow: Towards LLMs That Adapt Continually

    cs.LG 2026-05 unverdicted novelty 6.0

    Fast-Slow Training combines slow parameter updates with fast context optimization to achieve up to 3x better sample efficiency, higher performance, less forgetting, and preserved plasticity in continual LLM learning.

  2. In-Place Test-Time Training

    cs.LG 2026-04 conditional novelty 6.0

    In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.