Memory Aware Synapses: Learning what (not) to forget

Francesca Babiloni; Marcus Rohrbach; Mohamed Elhoseiny; Rahaf Aljundi; Tinne Tuytelaars

arxiv: 1711.09601 · v4 · pith:5ZADS3OKnew · submitted 2017-11-27 · 💻 cs.CV · cs.AI · stat.ML

keywords: learning, knowledge, network, importance, important, parameters, tasks, aware

Abstract

Humans can learn in a continuous manner. Old, rarely utilized knowledge can be overwritten by new incoming information, while important, frequently used knowledge is prevented from being erased. In artificial learning systems, lifelong learning so far has focused mainly on accumulating knowledge over tasks and overcoming catastrophic forgetting. In this paper, we argue that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively. Inspired by neuroplasticity, we propose a novel approach for lifelong learning, coined Memory Aware Synapses (MAS). It computes the importance of the parameters of a neural network in an unsupervised and online manner. Given a new sample which is fed to the network, MAS accumulates an importance measure for each parameter of the network, based on how sensitive the predicted output function is to a change in this parameter. When learning a new task, changes to important parameters can then be penalized, effectively preventing important knowledge related to previous tasks from being overwritten. Further, we show an interesting connection between a local version of our method and Hebb's rule, which is a model for the learning process in the brain. We test our method on a sequence of object recognition tasks and on the challenging problem of learning an embedding for predicting <subject, predicate, object> triplets. We show state-of-the-art performance and, for the first time, the ability to adapt the importance of the parameters based on unlabeled data towards what the network needs (not) to forget, which may vary depending on test conditions.
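The abstract describes the mechanism only in words. The sketch below illustrates it for a single linear layer F(x) = W x, under assumptions that are not stated in the abstract itself: output sensitivity is measured as the gradient of the squared L2 norm of the output, importance is the mean absolute gradient over unlabeled samples, and the penalty is a quadratic term weighted by importance. The function names `mas_importance` and `mas_penalty`, the strength `lam`, and the random data are hypothetical; this is an illustrative toy, not the authors' implementation.

```python
import numpy as np


def mas_importance(W, X):
    """Importance for one linear layer F(x) = W @ x (illustrative sketch).

    Assumption: sensitivity is the gradient of ||W x||^2 w.r.t. W,
    which equals 2 * outer(W x, x). Importance is the mean absolute
    gradient over unlabeled samples (rows of X).
    """
    omega = np.zeros_like(W)
    for x in X:                       # online accumulation, no labels needed
        y = W @ x
        omega += np.abs(2.0 * np.outer(y, x))
    return omega / len(X)


def mas_penalty(W, W_star, omega, lam):
    """Quadratic penalty lam * sum(omega * (W - W_star)**2) and its gradient."""
    diff = W - W_star
    value = lam * np.sum(omega * diff ** 2)
    grad = 2.0 * lam * omega * diff
    return value, grad


rng = np.random.default_rng(0)
W_star = rng.normal(size=(3, 4))          # parameters after the previous task
X_unlabeled = rng.normal(size=(100, 4))   # hypothetical unlabeled samples
omega = mas_importance(W_star, X_unlabeled)

# Hebbian reading: for this single linear layer the importance equals
# 2 * mean |y_i * x_j|, i.e. it grows with the co-activation of
# output neuron i and input neuron j.
Y = X_unlabeled @ W_star.T
hebb = np.mean(np.abs(Y[:, :, None] * X_unlabeled[:, None, :]), axis=0)
assert np.allclose(omega, 2.0 * hebb)

# While learning a new task, the penalty gradient is added to the task
# gradient, so parameters with large omega are pulled back toward W_star.
W = W_star + 0.1 * rng.normal(size=W_star.shape)
value, grad = mas_penalty(W, W_star, omega, lam=1.0)
```

Because importance is computed from the output alone, no labels are required. That is why, in this sketch, it can be accumulated on whatever unlabeled data the deployed model actually sees. For a single linear layer the global and layer-local measures coincide. In a deeper network they differ, and the Hebbian correspondence applies to the local variant.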

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining
cs.LG 2026-05 unverdicted novelty 6.0

DG-Hard uses Donoho-Gavish hard thresholding on the fine-tuning weight delta to separate task-aligned signal from noise-like residual, recovering damaged capabilities while preserving target-task gains.
Compressive Transformers for Long-Range Sequence Modelling
cs.LG 2019-11 unverdicted novelty 6.0

Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.
MANGO: Meta-Adaptive Network Gradient Optimization for Online Continual Learning
cs.LG 2026-05 unverdicted novelty 5.0

MANGO combines gradient-gating and meta-learned regularization to balance stability and plasticity in single-pass online continual learning, reporting state-of-the-art accuracy on CLEAR-10, CIFAR-100, and Tiny-ImageNet.
Efficient Task Adaptation in Large Language Models via Selective Parameter Optimization
cs.CL 2026-04 unverdicted novelty 3.0

The paper claims a selective fine-tuning method that identifies and freezes core parameters to mitigate catastrophic forgetting in LLMs while improving domain adaptation, shown in experiments with GPT-J and LLaMA-3.