Canonical reference

Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. · 2017

Canonical reference. 80% of citing Pith papers cite this work as background.

16 Pith papers citing it

citation-role summary

background 4 · baseline 1

citation-polarity summary

background 4 · baseline 1

representative citing papers

Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

SPA unlocks patch-level features in CLIP for class-incremental learning via semantic-guided selection and optimal transport alignment with class descriptions, plus projectors and pseudo-feature replay to reduce forgetting.

DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts

cs.LG · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

DRIFT is a benchmark modeling continual graph data streams as time-varying mixtures of latent task distributions via Gaussian parameterization, revealing substantial performance degradation in existing continual learning methods under task-free continuous drift.

Characterizing and Correcting Effective Target Shift in Online Learning

stat.ML · 2026-05-08 · unverdicted · novelty 7.0

Online kernel regression equals offline regression with shifted targets; correcting the targets lets online learning match offline performance and outperform true targets in continual image classification.

Continual Learning for fMRI-Based Brain Disorder Diagnosis via Functional Connectivity Matrices Generative Replay

q-bio.TO · 2026-04-15 · conditional · novelty 7.0

A structure-aware VAE generates realistic FC matrices for replay, combined with multi-level knowledge distillation and hierarchical contextual bandit sampling, to enable continual fMRI-based brain disorder diagnosis across sequentially arriving multi-site data without catastrophic forgetting.

Efficient Continual Learning in Language Models via Thalamically Routed Cortical Columns

cs.LG · 2026-02-25 · unverdicted · novelty 7.0

TRC² is a brain-inspired decoder-only architecture that localizes fast plasticity and uses thalamic and hippocampal pathways to substantially reduce cumulative forgetting in sequential language model training on streams like C4, WikiText-103, and GSM8K.

Learning to Discover at Test Time

cs.LG · 2026-01-22 · unverdicted · novelty 7.0

TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.

Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

SAE-FT uses a sparse autoencoder on pre-trained CLIP visual representations to regularize fine-tuning by penalizing changes to semantically meaningful features, aiming for robust performance on ImageNet and distribution shifts.

Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Forgetting in LLM continual post-training is a geometry conflict between task-induced covariance structures and the evolving model state, controlled by gating Wasserstein barycenter merging on measured conflict.

Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning

cs.LG · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

Muon-OGD introduces a spectral-norm-constrained orthogonal projection method, solved via dual iterations and Newton-Schulz approximations, to improve the stability-plasticity trade-off in sequential LLM adaptation.

Cortex-Inspired Continual Learning: Unsupervised Instantiation and Recovery of Functional Task Networks

cs.LG · 2026-04-27 · unverdicted · novelty 6.0

FTN achieves near-zero forgetting on continual learning benchmarks by isolating task subnetworks via self-organizing binary masks generated through gradient descent, smoothing, and k-winner-take-all.

$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning

cs.LG · 2025-09-20 · conditional · novelty 6.0

λ-Orthogonality regularization enables distribution-specific adaptation of representations via affine transformations while retaining original learned structures.

Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

A state distribution view of post-training shows that on-policy supervision from the learner itself can outperform fixed-dataset SFT and preserve retention better than aggressive supervised updates.

FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning

cs.LG · 2026-05-10 · unverdicted · novelty 5.0

FLAME is an MoE architecture using modality-specific routers and low-rank compression of expert knowledge to support efficient continual multimodal multi-task learning while reducing catastrophic forgetting.

Memory-Efficient Continual Learning with CLIP Models

cs.LG · 2026-05-05 · unverdicted · novelty 5.0

A per-class loss reweighting scheme based on distributional robustness allows CLIP models to perform class-incremental and domain-incremental learning with minimal memory while limiting forgetting on CIFAR-100, ImageNet1K, and DomainNet.

HoReN: Normalized Hopfield Retrieval for Large-Scale Sequential Model Editing

cs.LG · 2026-05-02 · unverdicted · novelty 5.0 · 2 refs

HoReN is a parameter-preserving editor that wraps an MLP with a Hopfield codebook memory and scales to 50K sequential edits on ZsRE while maintaining performance above 0.93.

Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift

cs.CV · 2025-05-26 · unverdicted · novelty 5.0

Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.

citing papers explorer

Showing 4 of 4 citing papers after filters.

Characterizing and Correcting Effective Target Shift in Online Learning

stat.ML · 2026-05-08 · unverdicted · none · ref 10

Online kernel regression equals offline regression with shifted targets; correcting the targets lets online learning match offline performance and outperform true targets in continual image classification.

Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning

cs.LG · 2026-05-09 · unverdicted · none · ref 1 · 2 links

Muon-OGD introduces a spectral-norm-constrained orthogonal projection method, solved via dual iterations and Newton-Schulz approximations, to improve the stability-plasticity trade-off in sequential LLM adaptation.

FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning

cs.LG · 2026-05-10 · unverdicted · none · ref 32

FLAME is an MoE architecture using modality-specific routers and low-rank compression of expert knowledge to support efficient continual multimodal multi-task learning while reducing catastrophic forgetting.

HoReN: Normalized Hopfield Retrieval for Large-Scale Sequential Model Editing

cs.LG · 2026-05-02 · unverdicted · none · ref 11 · 2 links

HoReN is a parameter-preserving editor that wraps an MLP with a Hopfield codebook memory and scales to 50K sequential edits on ZsRE while maintaining performance above 0.93.
