On Tiny Episodic Memories in Continual Learning
In continual learning (CL), an agent learns from a stream of tasks, leveraging prior experience to transfer knowledge to future tasks. It is an ideal framework for decreasing the amount of supervision required by existing learning algorithms. But for successful knowledge transfer, the learner needs to remember how to perform previous tasks. One way to give the learner the ability to perform tasks seen in the past is to keep a small memory, dubbed episodic memory, that stores a few examples from previous tasks, and then to replay these examples when training on future tasks. In this work, we empirically analyze the effectiveness of a very small episodic memory in a CL setup where each training example is seen only once. Surprisingly, across four rather different supervised learning benchmarks adapted to CL, a very simple baseline that jointly trains on both examples from the current task and examples stored in the episodic memory significantly outperforms specifically designed CL approaches with and without episodic memory. Interestingly, we find that repetitive training on even tiny memories of past tasks does not harm generalization; on the contrary, it improves it, with gains between 7% and 17% when the memory is populated with a single example per class.
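The simple replay baseline described in the abstract (single pass over the stream, each mini-batch trained jointly with examples drawn from a tiny episodic memory) can be sketched as follows. This is a minimal sketch, not the authors' code. The per-class FIFO ring buffer used as the memory-update rule, the 1:1 mix of current and replayed examples, and the names `TinyEpisodicMemory`, `train_single_pass`, and `update_fn` (a hypothetical stand-in for one optimizer step on a batch) are all assumptions.

```python
import random
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Hashable, List, Sequence, Tuple

Example = Tuple[object, Hashable]  # (input, label)


class TinyEpisodicMemory:
    """Keeps at most `per_class` examples for every label seen so far.

    Assumption: a per-class FIFO ring buffer is the update rule; the
    abstract does not say how the memory is populated.
    """

    def __init__(self, per_class: int = 1, seed: int = 0) -> None:
        self.per_class = per_class
        self.slots: Dict[Hashable, Deque[Example]] = defaultdict(
            lambda: deque(maxlen=self.per_class))
        self.rng = random.Random(seed)

    def add(self, example: Example) -> None:
        # deque(maxlen=...) evicts the oldest entry of that class when full.
        self.slots[example[1]].append(example)

    def __len__(self) -> int:
        return sum(len(q) for q in self.slots.values())

    def sample(self, k: int) -> List[Example]:
        # Uniform sample without replacement over everything stored.
        pool = [ex for q in self.slots.values() for ex in q]
        return self.rng.sample(pool, min(k, len(pool)))


def train_single_pass(
    stream: Sequence[Sequence[Example]],         # one list of examples per task
    update_fn: Callable[[List[Example]], None],  # hypothetical: one gradient step
    memory: TinyEpisodicMemory,
    batch_size: int = 10,
) -> None:
    for task in stream:
        for start in range(0, len(task), batch_size):
            current = list(task[start:start + batch_size])
            # Joint training: the current mini-batch plus an equally sized
            # sample replayed from the episodic memory, in a single step.
            replay = memory.sample(batch_size)
            update_fn(current + replay)
            # Each stream example is seen once, then offered to the memory.
            for ex in current:
                memory.add(ex)
```

With `per_class=1` the memory holds exactly one example per class, matching the smallest setting mentioned in the abstract. For example, streaming two tasks of two classes each with `batch_size=2` passes a 2-example batch for the first task (memory still empty) and a 4-example batch for the second, which replays the stored "task 1" examples alongside the new ones.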
Forward citations
Cited by 35 Pith papers
-
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
-
MedCRP-CL: Continual Medical Image Segmentation via Bayesian Nonparametric Semantic Modality Discovery
MedCRP-CL discovers semantic modalities online via CRP from text prompts and maintains modality-specific LoRA adapters with intra-modality EWC, achieving 73.3% Dice and 4.1% forgetting on 16 tasks while using 6x fewer...
-
Continual Learning of Domain-Invariant Representations
Introduces replay-based continual learning with sequential invariance alignment to learn domain-invariant representations, outperforming baselines on generalization to unseen domains across six datasets in vision, med...
-
DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts
DRIFT is a benchmark for task-free continual graph learning under continuous distribution shifts, demonstrating that standard methods degrade without task boundary information.
-
KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks
KAN-CL cuts catastrophic forgetting by 88-93% on Split-CIFAR-10/5T and Split-CIFAR-100/10T by anchoring KAN parameters at per-knot granularity while matching baseline accuracy.
-
Online Continual Learning with Dynamic Label Hierarchies
HALO improves online continual learning under evolving label hierarchies by adaptively combining classification heads regularized with organized learnable prototypes for better adaptation and reduced forgetting.
-
MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound
MIST fixes unreliable splits in streaming decision trees for class-incremental learning by using a K-independent McDiarmid bound on Gini impurity, Bayesian moment projection for knowledge transfer, and KLL quantile sk...
-
MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound
MIST fixes unreliable splits in streaming decision trees for class-incremental learning by replacing Hoeffding-style bounds with a K-independent McDiarmid radius on Gini, plus Bayesian parent-to-child inheritance and ...
-
StrLoRA: Towards Streaming Continual Visual Instruction Tuning for MLLMs
StrLoRA is a regularized two-stage expert routing method for streaming CVIT that selects experts via textual instructions and applies token-wise cross-modal weighting with historical routing alignment.
-
Continual Learning for fMRI-Based Brain Disorder Diagnosis via Functional Connectivity Matrices Generative Replay
A structure-aware VAE generates realistic FC matrices for replay, combined with multi-level knowledge distillation and hierarchical contextual bandit sampling, to enable continual fMRI-based brain disorder diagnosis a...
-
Direct Discrepancy Replay: Distribution-Discrepancy Condensation and Manifold-Consistent Replay for Continual Face Forgery Detection
A replay method for continual face forgery detection condenses real-fake distribution discrepancies into compact maps and synthesizes compatible samples from current real faces to reduce forgetting under tight memory ...
-
SLE-FNO: Single-Layer Extensions for Task-Agnostic Continual Learning in Fourier Neural Operators
SLE-FNO achieves zero forgetting and strong plasticity-stability balance in continual learning for FNO surrogate models of pulsatile blood flow by adding minimal single-layer extensions across four out-of-distribution tasks.
-
Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns
TRC² is a brain-inspired decoder-only architecture that localizes fast plasticity and uses thalamic and hippocampal pathways to substantially reduce cumulative forgetting in sequential language model training on strea...
-
Modality-Inconsistent Continual Learning of Multimodal Large Language Models
The paper introduces the MICL scenario for MLLMs with modality and task shifts and proposes MoInCL using pseudo-target generation and instruction-based distillation, reporting gains over continual learning baselines o...
-
Privacy Leakage via Output Label Space and Differentially Private Continual Learning
Identifies output label space as a privacy side-channel in DP continual learning, formalizes DP for CL, and demonstrates two mitigation methods yielding higher accuracy than prior work.
-
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA
LoRA adapters should be scaled by 1/sqrt(rank) rather than 1/rank to stabilize learning and enable effective use of higher ranks during fine-tuning of large language models.
-
Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining
DG-Hard uses Donoho-Gavish hard thresholding on the fine-tuning weight delta to separate task-aligned signal from noise-like residual, recovering damaged capabilities while preserving target-task gains.
-
PMF-CL: Pareto-Minimal-Forgetting Continual Learner for Conflicting Tasks
PMF-CL derives Pareto-optimal solutions for continual learning on conflicting tasks, yielding memory-efficient algorithms for linear regression and quadratically bounded losses with static O(d^2) memory.
-
Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory
SeqMem-Eval reveals that high final accuracy in sequential LLM memory tasks often coexists with substantial forgetting and negative transfer, exposing stability-adaptability trade-offs hidden by standard aggregate metrics.
-
TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale
TFGN is an architectural overlay for transformers enabling task-free, replay-free continual pre-training across heterogeneous domains at LLM scale with near-zero backward transfer and high gradient orthogonality.
-
Continual Fine-Tuning of Large Language Models via Program Memory
ProCL organizes LoRA adapters into input-conditioned program memory slots that combine with a distributed adapter to improve retention and reduce forgetting in continual LLM fine-tuning.
-
DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts
DRIFT benchmark shows substantial performance degradation for continual graph learning methods under task-free continuous distribution shifts modeled via Gaussian mixtures.
-
Critical Patch-Aware Sparse Prompting with Decoupled Training for Continual Learning on the Edge
CPS-Prompt delivers 1.6x gains in peak memory, training time, and energy on edge hardware for continual learning while staying within 2% accuracy of top prompt-based baselines.
-
Continual Few-shot Adaptation for Synthetic Fingerprint Detection
A continual few-shot adaptation method combining binary cross-entropy and supervised contrastive losses with replay achieves a good trade-off between fast adaptation to unseen synthetic fingerprint styles and retentio...
-
Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning
LifeLong-RFT applies chunking-level on-policy reinforcement learning with Quantized Action Consistency Reward, Continuous Trajectory Alignment Reward, and Format Compliance Reward to fine-tune VLA models, achieving a ...
-
CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion
CLARE is an exemplar-free continual learning framework for VLAs that autonomously expands modular adapters based on feature similarity and uses autoencoder routing for label-free deployment.
-
Data-Free Class-Incremental Gesture Recognition with Prototype-Guided Pseudo Feature Replay
A data-free class-incremental learning method for gesture recognition using prototype-guided pseudo feature replay with four components that achieves 11.8% and 12.8% mean global accuracy gains on SHREC 2017 3D and Ego...
-
Don't Forget the Critic: Value-Based Data Rehearsal for Multi-Cyclic Continual Reinforcement Learning
Qreg+NWLU improves forgetting mitigation and knowledge transfer in value-based multi-cyclic CRL by using dynamic Q-value rehearsal and immediate regularization instead of waiting after the first task.
-
MANGO: Meta-Adaptive Network Gradient Optimization for Online Continual Learning
MANGO combines gradient-gating and meta-learned regularization to balance stability and plasticity in single-pass online continual learning, reporting state-of-the-art accuracy on CLEAR-10, CIFAR-100, and Tiny-ImageNet.
-
HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning
HEDP uses energy regularization inspired by Helmholtz free energy plus hybrid energy-distance weighting in prompts to improve domain selection and achieve a 2.57% accuracy gain on benchmarks like CORe50 while mitigati...
-
CoMemNet: Contrastive Sampling with Memory Replay Network for Continual Traffic Prediction
CoMemNet is a dual-branch continual learning model for dynamic traffic networks that combines contrastive sampling via Wasserstein features and memory replay to achieve SOTA performance while mitigating forgetting.
-
BRAIN: Bias-Mitigation Continual Learning Approach to Vision-Brain Understanding
BRAIN uses bias-mitigation continual learning with a new de-bias contrastive loss and angular forgetting mitigation to achieve SOTA performance on vision-brain understanding benchmarks despite brain signal inconsisten...
-
On-Device Continual Learning with Dual-Stage Buffer and Dynamic Loss for Point-of-Care Pneumonia Diagnosis
PneumoNet uses a lightweight CNN, dual-stage balanced buffer, and dynamic class-weighted loss for domain-incremental learning on simulated PneumoniaMNIST shifts, reporting 86.6% accuracy and 1.4% forgetting.
-
Domain Incremental Learning for Pandemic-Resilient Chest X-Ray Analysis
A class-aware replay method for domain-incremental learning reaches 88.66% average accuracy on five simulated domain shifts in PneumoniaMNIST, beating standard replay, fine-tuning, and joint training.
-
Face-D²CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection
Face-D²CL fuses spatial and frequency features and uses dual continual learning to reduce forgetting while adapting to new DeepFakes, cutting average error rates by 60.7% and raising unseen-domain AUC by 7.9% over prior SOTA.