
Efficient Lifelong Learning with A-GEM

Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny · 2018 · cs.LG · arXiv 1812.00420

Mixed citation behavior. Most common role is background (67%).

21 Pith papers citing it


abstract

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches, in terms of sample complexity, computational and memory cost. Towards this end, we first introduce a new and a more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which enjoys the same or even better performance as GEM, while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms including A-GEM can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency.
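
The abstract does not spell out the A-GEM update itself. As a rough illustration only, the sketch below shows the gradient projection usually associated with A-GEM: a single reference gradient is computed on a batch sampled from episodic memory, and the current-task gradient is projected only when it conflicts with that reference (negative dot product). The function name `agem_project` and the flat NumPy vectors are assumptions made for this example, not code from the paper.

```python
import numpy as np

def agem_project(grad, ref_grad):
    """Sketch of an A-GEM style projection (hypothetical helper).

    grad:     flattened gradient on the current task's mini-batch
    ref_grad: flattened gradient on a batch sampled from episodic memory
    """
    dot = float(np.dot(grad, ref_grad))
    # No conflict with the memory gradient: use the gradient unchanged.
    if dot >= 0.0:
        return grad.copy()
    # Conflict: remove the component of grad that points against ref_grad.
    # dot < 0 implies ref_grad is non-zero, so the division is safe.
    ref_sq = float(np.dot(ref_grad, ref_grad))
    return grad - (dot / ref_sq) * ref_grad

# Toy usage with made-up 2-D gradients.
conflicting = agem_project(np.array([1.0, -2.0]), np.array([0.0, 1.0]))
aligned = agem_project(np.array([1.0, 2.0]), np.array([0.0, 1.0]))
```

Because only one dot product and one vector update are needed per step, this kind of projection avoids solving a constrained problem over every previous task, which is consistent with the efficiency claim in the abstract.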


citation-role summary

background 5 · method 1

citation-polarity summary

background 4 · unclear 1 · use method 1

representative citing papers

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

cs.AI · 2023-06-05 · conditional · novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

SPA unlocks patch-level features in CLIP for class-incremental learning via semantic-guided selection and optimal transport alignment with class descriptions, plus projectors and pseudo-feature replay to reduce forgetting.

MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

MIST fixes unreliable splits in streaming decision trees for class-incremental learning by replacing Hoeffding-style bounds with a K-independent McDiarmid radius on Gini, plus Bayesian parent-to-child inheritance and per-leaf quantile sketches.

SLE-FNO: Single-Layer Extensions for Task-Agnostic Continual Learning in Fourier Neural Operators

cs.LG · 2026-03-20 · unverdicted · novelty 7.0

SLE-FNO achieves zero forgetting and strong plasticity-stability balance in continual learning for FNO surrogate models of pulsatile blood flow by adding minimal single-layer extensions across four out-of-distribution tasks.

Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.

DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts

cs.LG · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

DRIFT benchmark shows substantial performance degradation for continual graph learning methods under task-free continuous distribution shifts modeled via Gaussian mixtures.

Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning

cs.LG · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

Muon-OGD introduces a spectral-norm constrained orthogonal projection method solved via dual iterations and Newton-Schulz approximations to improve stability-plasticity trade-off in sequential LLM adaptation.

Routing-Based Continual Learning for Multimodal Large Language Models

cs.LG · 2025-11-03 · unverdicted · novelty 6.0

Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.

Scalable and Efficient Continual Learning from Demonstration via a Hypernetwork-generated Stable Dynamics Model

cs.RO · 2023-11-06 · unverdicted · novelty 6.0

A hypernetwork generates clock-augmented stable neural ODEs (sNODEs) for scalable continual learning from demonstration, achieving O(N) training time via stochastic regularization while outperforming baselines on LfD tasks up to 26 skills and 32 dimensions.

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.

Tracking Adaptation Time: Metrics for Temporal Distribution Shift

cs.LG · 2026-04-08 · unverdicted · novelty 6.0

Three complementary metrics are introduced to distinguish model adaptation from intrinsic data difficulty under temporal distribution shift.

SHARP: Sleep-based Hierarchical Accelerated Replay for Long Range Non-Stationary Temporal Pattern Recognition

cs.AI · 2026-05-30 · unverdicted · novelty 5.0

SHARP separates memory accumulation from pattern recognition and uses accelerated offline replay of structured traces to achieve exponentially growing effective context at linear compute cost while learning non-stationary streams.

Mask the Target: A Plug-and-Play Regularizer Against LoRA Forgetting

cs.CL · 2026-05-28 · unverdicted · novelty 5.0

A plug-and-play KL regularizer that masks the target token and renormalizes probabilities to improve the learning-forgetting trade-off in LoRA adaptation of LLMs.

CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

CP-MoE uses a transient expert, consistency-preserving routing bias, and guided regularization to reduce catastrophic forgetting in MoE-based LLMs and VLMs while preserving cross-task transfer, reporting SOTA on SuperNI and gains on VQA v2.

BRAIN: Bias-Mitigation Continual Learning Approach to Vision-Brain Understanding

cs.CV · 2025-08-25 · unverdicted · novelty 5.0

BRAIN uses bias-mitigation continual learning with a new de-bias contrastive loss and angular forgetting mitigation to achieve SOTA performance on vision-brain understanding benchmarks despite brain signal inconsistencies across sessions.

Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition

cs.LG · 2025-07-28 · unverdicted · novelty 5.0

MDM-OC encodes fine-tuned models as deltas, projects them into orthogonal subspaces, and merges via gradient optimization to enable interference-free continual learning with reversible unmerging.

HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

HEDP uses energy regularization inspired by Helmholtz free energy plus hybrid energy-distance weighting in prompts to improve domain selection and achieve a 2.57% accuracy gain on benchmarks like CORe50 while mitigating catastrophic forgetting.

CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning

cs.LG · 2026-05-07 · unverdicted · novelty 5.0 · 2 refs

CRAFT is a continual learning method for LLMs that learns low-rank interventions on hidden representations, using a unified KL-divergence objective to handle task routing by output divergence, forgetting control via prior-state regularization, and intervention merging.

TACO: Temporal Consensus Optimization for Continual Neural Mapping

cs.RO · 2026-02-04

Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts

cs.LG · 2025-06-26

Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning

cs.LG · 2026-04-16

citing papers explorer

Showing 21 of 21 citing papers.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning · cs.AI · 2023-06-05 · conditional · none · ref 12
Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning · cs.CV · 2026-05-13 · unverdicted · none · ref 5 · internal anchor
MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound · cs.LG · 2026-05-12 · unverdicted · none · ref 13 · 2 links · internal anchor
SLE-FNO: Single-Layer Extensions for Task-Agnostic Continual Learning in Fourier Neural Operators · cs.LG · 2026-03-20 · unverdicted · none · ref 63 · internal anchor
Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory · cs.LG · 2026-05-14 · unverdicted · none · ref 3 · internal anchor
DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts · cs.LG · 2026-05-13 · unverdicted · none · ref 10 · 2 links · internal anchor
Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning · cs.LG · 2026-05-09 · unverdicted · none · ref 7 · 2 links · internal anchor
Routing-Based Continual Learning for Multimodal Large Language Models · cs.LG · 2025-11-03 · unverdicted · none · ref 8 · internal anchor
Scalable and Efficient Continual Learning from Demonstration via a Hypernetwork-generated Stable Dynamics Model · cs.RO · 2023-11-06 · unverdicted · none · ref 44 · internal anchor
Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting · cs.LG · 2026-05-04 · unverdicted · none · ref 73
Tracking Adaptation Time: Metrics for Temporal Distribution Shift · cs.LG · 2026-04-08 · unverdicted · none · ref 11
SHARP: Sleep-based Hierarchical Accelerated Replay for Long Range Non-Stationary Temporal Pattern Recognition · cs.AI · 2026-05-30 · unverdicted · none · ref 1 · internal anchor
Mask the Target: A Plug-and-Play Regularizer Against LoRA Forgetting · cs.CL · 2026-05-28 · unverdicted · none · ref 9 · internal anchor
CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning · cs.LG · 2026-05-18 · unverdicted · none · ref 2 · internal anchor
BRAIN: Bias-Mitigation Continual Learning Approach to Vision-Brain Understanding · cs.CV · 2025-08-25 · unverdicted · none · ref 69 · internal anchor
Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition · cs.LG · 2025-07-28 · unverdicted · none · ref 18 · internal anchor
HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning · cs.AI · 2026-05-07 · unverdicted · none · ref 79
CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning · cs.LG · 2026-05-07 · unverdicted · none · ref 2 · 2 links
TACO: Temporal Consensus Optimization for Continual Neural Mapping · cs.RO · 2026-02-04 · unreviewed · ref 7 · internal anchor
Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts · cs.LG · 2025-06-26 · unreviewed · ref 11 · internal anchor
Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning · cs.LG · 2026-04-16 · unreviewed · ref 5
