Efficient Lifelong Learning with A-GEM

Arslan Chaudhry; Marc'Aurelio Ranzato; Marcus Rohrbach; Mohamed Elhoseiny

arXiv: 1812.00420 · v2 · pith:PL2DBDWXnew · submitted 2018-12-02 · cs.LG · stat.ML

classification: cs.LG · stat.ML

keywords: learning, a-gem, lifelong, tasks, efficiency, efficient, evaluation, even

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches in terms of sample complexity and computational and memory cost. Towards this end, we first introduce a new and more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which matches or even exceeds the performance of GEM, while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms, including A-GEM, can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency.
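
The abstract does not spell out how A-GEM differs from GEM in practice. As a rough sketch of the projection step described in the A-GEM paper: the gradient on the current mini-batch is compared against a single reference gradient computed on a batch sampled from episodic memory, and it is projected only when the two conflict (negative dot product). The function name `agem_project`, the flattened-vector representation, and the `eps` guard are assumptions made for illustration, not the authors' code.

```python
import numpy as np


def agem_project(grad, grad_ref, eps=1e-12):
    """Return a gradient that does not conflict with the reference gradient.

    grad     : flattened gradient of the loss on the current task's batch
    grad_ref : flattened gradient of the loss on a batch from episodic memory
    eps      : hypothetical guard against a near-zero reference gradient
    """
    grad = np.asarray(grad, dtype=float)
    grad_ref = np.asarray(grad_ref, dtype=float)

    dot = float(grad @ grad_ref)
    if dot >= 0.0:
        # No conflict: the proposed update does not increase the memory loss
        # to first order, so it is used unchanged.
        return grad.copy()

    ref_sq = float(grad_ref @ grad_ref)
    if ref_sq < eps:
        # Degenerate reference gradient: nothing meaningful to project onto.
        return grad.copy()

    # Remove the component of grad that points against grad_ref.
    return grad - (dot / ref_sq) * grad_ref


if __name__ == "__main__":
    g = np.array([1.0, -2.0])      # current-task gradient
    g_ref = np.array([0.0, 1.0])   # episodic-memory gradient
    print(agem_project(g, g_ref))
```

Because only one dot product and one vector update are needed per step, this sketch is consistent with the abstract's claim that A-GEM is close in cost to regularization-based methods, whereas GEM keeps a separate constraint per previous task.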

Forward citations

Cited by 23 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
cs.AI 2023-06 conditional novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning
cs.CV 2026-05 unverdicted novelty 7.0

SPA unlocks patch-level features in CLIP for class-incremental learning via semantic-guided selection and optimal transport alignment with class descriptions, plus projectors and pseudo-feature replay to reduce forgetting.
DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts
cs.LG 2026-05 accept novelty 7.0

DRIFT is a benchmark for task-free continual graph learning under continuous distribution shifts, demonstrating that standard methods degrade without task boundary information.
MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound
cs.LG 2026-05 unverdicted novelty 7.0

MIST fixes unreliable splits in streaming decision trees for class-incremental learning by replacing Hoeffding-style bounds with a K-independent McDiarmid radius on Gini, plus Bayesian parent-to-child inheritance and ...
MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound
cs.LG 2026-05 unverdicted novelty 7.0

MIST fixes unreliable splits in streaming decision trees for class-incremental learning by using a K-independent McDiarmid bound on Gini impurity, Bayesian moment projection for knowledge transfer, and KLL quantile sk...
Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 7.0

TeLAPA maintains archives of behaviorally diverse yet competent policies aligned in a shared latent space to preserve plasticity and enable faster recovery after interference in continual reinforcement learning.
SLE-FNO: Single-Layer Extensions for Task-Agnostic Continual Learning in Fourier Neural Operators
cs.LG 2026-03 unverdicted novelty 7.0

SLE-FNO achieves zero forgetting and strong plasticity-stability balance in continual learning for FNO surrogate models of pulsatile blood flow by adding minimal single-layer extensions across four out-of-distribution tasks.
Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory
cs.LG 2026-05 unverdicted novelty 6.0

SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.
DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts
cs.LG 2026-05 unverdicted novelty 6.0

DRIFT benchmark shows substantial performance degradation for continual graph learning methods under task-free continuous distribution shifts modeled via Gaussian mixtures.
Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning
cs.LG 2026-05 unverdicted novelty 6.0

Muon-OGD introduces a spectral-norm constrained orthogonal projection method solved via dual iterations and Newton-Schulz approximations to improve stability-plasticity trade-off in sequential LLM adaptation.
CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning
cs.LG 2026-05 unverdicted novelty 6.0

CRAFT is a continual learning method for LLMs that applies low-rank interventions on hidden states, unified by KL divergence for routing similar tasks, regularizing against forgetting, and merging updates, showing red...
Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
cs.LG 2026-05 unverdicted novelty 6.0

Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.
Tracking Adaptation Time: Metrics for Temporal Distribution Shift
cs.LG 2026-04 unverdicted novelty 6.0

Three complementary metrics are introduced to distinguish model adaptation from intrinsic data difficulty under temporal distribution shift.
TACO: Temporal Consensus Optimization for Continual Neural Mapping
cs.RO 2026-02 unverdicted novelty 6.0

TACO reformulates neural implicit mapping as temporal consensus optimization to enable continual adaptation to scene changes without data replay or storage.
Routing-Based Continual Learning for Multimodal Large Language Models
cs.LG 2025-11 unverdicted novelty 6.0

Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.
Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts
cs.LG 2025-06 unverdicted novelty 6.0

MoRAM frames continual learning as incremental addition of rank-1 adapters viewed as self-activating key-value associative memory units in a mixture-of-experts setup.
Scalable and Efficient Continual Learning from Demonstration via a Hypernetwork-generated Stable Dynamics Model
cs.RO 2023-11 unverdicted novelty 6.0

A hypernetwork generates clock-augmented stable neural ODEs (sNODEs) for scalable continual learning from demonstration, achieving O(N) training time via stochastic regularization while outperforming baselines on LfD ...
CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning
cs.LG 2026-05 unverdicted novelty 5.0

CP-MoE uses a transient expert, consistency-preserving routing bias, and guided regularization to reduce catastrophic forgetting in MoE-based LLMs and VLMs while preserving cross-task transfer, reporting SOTA on Super...
Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning
cs.LG 2026-05 unverdicted novelty 5.0

Muon-OGD integrates Muon-style spectral-norm geometry with orthogonal gradient constraints to improve the stability-plasticity trade-off during sequential LLM adaptation.
HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning
cs.AI 2026-05 unverdicted novelty 5.0

HEDP uses energy regularization inspired by Helmholtz free energy plus hybrid energy-distance weighting in prompts to improve domain selection and achieve a 2.57% accuracy gain on benchmarks like CORe50 while mitigati...
CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning
cs.LG 2026-05 unverdicted novelty 5.0

CRAFT is a continual learning method for LLMs that learns low-rank interventions on hidden representations, using a unified KL-divergence objective to handle task routing by output divergence, forgetting control via p...
BRAIN: Bias-Mitigation Continual Learning Approach to Vision-Brain Understanding
cs.CV 2025-08 unverdicted novelty 5.0

BRAIN uses bias-mitigation continual learning with a new de-bias contrastive loss and angular forgetting mitigation to achieve SOTA performance on vision-brain understanding benchmarks despite brain signal inconsisten...
Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition
cs.LG 2025-07 unverdicted novelty 5.0

MDM-OC encodes fine-tuned models as deltas, projects them into orthogonal subspaces, and merges via gradient optimization to enable interference-free continual learning with reversible unmerging.