Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13): 3521–3526, 2017

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu + 2 more · 2017 · Proceedings of the National Academy of Sciences · DOI 10.1073/pnas.1611835114

29 Pith papers cite this work, alongside 5,466 external citations. Polarity classification is still indexing.

5,466 external citations · Crossref

representative citing papers

Streaming Adversarial Robustness in Fuzzy ARTMAP: Mechanism-Aligned Evaluation, Progressive Training, and Interpretable Diagnostics

cs.LG · 2026-05-07 · conditional · novelty 7.0

Fuzzy ARTMAP models are highly vulnerable to a new white-box attack aligned with their category competition, but progressive selective training yields stronger replay-free robustness than offline adversarial training under adaptive evaluation.

Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

cs.RO · 2026-04-29 · unverdicted · novelty 7.0 · 2 refs

A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

SLICE applies gradient surgery via projection and truncated SVD to initialize LoRA adapters, yielding better stability-plasticity trade-offs on continual learning benchmarks including adversarial task sequences.

Early Data Exposure Improves Robustness to Subsequent Fine-Tuning

cs.LG · 2026-05-12 · conditional · novelty 6.0

Early mixing of post-training data into pretraining improves retention of acquired capabilities after subsequent fine-tuning in language models.

Not How Many, But Which: Parameter Placement in Low-Rank Adaptation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.

DynaMiCS: Fine-tuning LLMs with Performance Constraints using Dynamic Mixtures

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

DynaMiCS uses short probing runs to build a slope matrix of cross-domain effects and solves a constrained optimization over mixture weights to improve targets while respecting performance bounds on constrained domains.

Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

cs.AI · 2026-05-10 · unverdicted · novelty 6.0

Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution under GPT-5.1.

RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

RareCP improves interval efficiency for time series conformal prediction by retrieving and weighting regime-specific calibration examples while adapting to drift and maintaining coverage.

Rotation-Preserving Supervised Fine-Tuning

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.

Adaptive Data Compression and Reconstruction for Memory-Bounded EEG Continual Learning

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

ADaCoRe enables memory-bounded UICL for EEG by compressing and reconstructing signals while preserving key morphologies, outperforming baselines with gains of at least +2.7 and +15.3 ACC on ISRUC and FACED datasets.

A Meta Reinforcement Learning Approach to Goals-Based Wealth Management

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.

EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure

cs.NI · 2026-05-01 · unverdicted · novelty 6.0

EASE closes three residual anchors in federated multimodal unlearning using bilateral displacement, cosine-sine decomposition, and forget lock, achieving near-retrain performance on forget and retain data.

NORACL: Neurogenesis for Oracle-free Resource-Adaptive Continual Learning

cs.LG · 2026-04-29 · unverdicted · novelty 6.0

NORACL dynamically grows network capacity via neurogenesis-inspired signals to achieve oracle-level continual learning performance without pre-specifying architecture size.

Fine-Tuning Regimes Define Distinct Continual Learning Problems

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

The relative rankings of continual learning methods are not preserved across different fine-tuning regimes defined by trainable parameter depth.

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.

AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

AIM applies modality-specific masks to balance stability and plasticity in asymmetric VLMs, achieving SOTA average performance and reduced forgetting on continual VQA v2 and GQA while preserving generalization to novel compositions.

Physics-Informed Neural Networks for Methane Sorption: Cross-Gas Transfer Learning, Ensemble Collapse Under Physics Constraints, and Monte Carlo Dropout Uncertainty Quantification

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

A PINN transfer learning framework for coal methane sorption reaches R²=0.932 on held-out data with 227% improvement over classical isotherms and identifies Monte Carlo Dropout as the best uncertainty method while ensembles degrade under shared physics constraints.

Parameter-efficient Quantum Multi-task Learning

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

QMTL uses shared VQC encoding plus task-specific quantum ansatz heads to achieve linear parameter scaling with the number of tasks while matching or exceeding classical multi-task baselines on three benchmarks.

Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover

cs.AI · 2026-04-09 · unverdicted · novelty 6.0

Heavy supervised fine-tuning on formal math suppresses tool-calling in Goedel-Prover-V2 from 89.4% to near 0%, but 100 Lean agentic traces restore it to 83.8% on the Berkeley Function Calling Leaderboard with in-domain gains on ProofNet.

Silent Collapse in Recursive Learning Systems

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Recursive learning systems undergo silent collapse of internal distributions, preceded by entropy contraction, representation freezing, and tail erosion, which the MTR framework can monitor and avert.

Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports

cs.AI · 2026-04-21 · unverdicted · novelty 5.0

SFT followed by GRPO improves LLM accuracy and reasoning recall in disease classification from radiology reports on three radiologist-annotated datasets.

Gyan: An Explainable Neuro-Symbolic Language Model

cs.CL · 2026-05-06 · unverdicted · novelty 4.0 · 2 refs

Gyan is a novel explainable non-transformer language model that achieves SOTA results on multiple datasets by mimicking human-like compositional context and world models.

citing papers explorer

Showing 29 of 29 citing papers.

Streaming Adversarial Robustness in Fuzzy ARTMAP: Mechanism-Aligned Evaluation, Progressive Training, and Interpretable Diagnostics
cs.LG · 2026-05-07 · conditional · none · ref 14
Fuzzy ARTMAP models are highly vulnerable to a new white-box attack aligned with their category competition, but progressive selective training yields stronger replay-free robustness than offline adversarial training under adaptive evaluation.

Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
cs.RO · 2026-04-29 · unverdicted · none · ref 30 · 2 links
A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
cs.LG · 2026-04-22 · unverdicted · none · ref 147
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning
cs.LG · 2026-05-12 · unverdicted · none · ref 12
SLICE applies gradient surgery via projection and truncated SVD to initialize LoRA adapters, yielding better stability-plasticity trade-offs on continual learning benchmarks including adversarial task sequences.

Early Data Exposure Improves Robustness to Subsequent Fine-Tuning
cs.LG · 2026-05-12 · conditional · none · ref 8
Early mixing of post-training data into pretraining improves retention of acquired capabilities after subsequent fine-tuning in language models.

Not How Many, But Which: Parameter Placement in Low-Rank Adaptation
cs.LG · 2026-05-12 · unverdicted · none · ref 45
Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.

DynaMiCS: Fine-tuning LLMs with Performance Constraints using Dynamic Mixtures
cs.LG · 2026-05-11 · unverdicted · none · ref 3
DynaMiCS uses short probing runs to build a slope matrix of cross-domain effects and solves a constrained optimization over mixture weights to improve targets while respecting performance bounds on constrained domains.

Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation
cs.AI · 2026-05-10 · unverdicted · none · ref 20
Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution under GPT-5.1.

RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction
cs.LG · 2026-05-09 · unverdicted · none · ref 40
RareCP improves interval efficiency for time series conformal prediction by retrieving and weighting regime-specific calibration examples while adapting to drift and maintaining coverage.

Rotation-Preserving Supervised Fine-Tuning
cs.LG · 2026-05-08 · unverdicted · none · ref 83
RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.

Adaptive Data Compression and Reconstruction for Memory-Bounded EEG Continual Learning
cs.LG · 2026-05-04 · unverdicted · none · ref 9
ADaCoRe enables memory-bounded UICL for EEG by compressing and reconstructing signals while preserving key morphologies, outperforming baselines with gains of at least +2.7 and +15.3 ACC on ISRUC and FACED datasets.

A Meta Reinforcement Learning Approach to Goals-Based Wealth Management
cs.LG · 2026-05-04 · unverdicted · none · ref 271
MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
cs.LG · 2026-05-04 · unverdicted · none · ref 46
Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.

EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure
cs.NI · 2026-05-01 · unverdicted · none · ref 21
EASE closes three residual anchors in federated multimodal unlearning using bilateral displacement, cosine-sine decomposition, and forget lock, achieving near-retrain performance on forget and retain data.

NORACL: Neurogenesis for Oracle-free Resource-Adaptive Continual Learning
cs.LG · 2026-04-29 · unverdicted · none · ref 11
NORACL dynamically grows network capacity via neurogenesis-inspired signals to achieve oracle-level continual learning performance without pre-specifying architecture size.

Fine-Tuning Regimes Define Distinct Continual Learning Problems
cs.LG · 2026-04-23 · unverdicted · none · ref 10
The relative rankings of continual learning methods are not preserved across different fine-tuning regimes defined by trainable parameter depth.

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
cs.LG · 2026-04-22 · unverdicted · none · ref 211
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.

AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning
cs.CV · 2026-04-16 · unverdicted · none · ref 22
AIM applies modality-specific masks to balance stability and plasticity in asymmetric VLMs, achieving SOTA average performance and reduced forgetting on continual VQA v2 and GQA while preserving generalization to novel compositions.

Physics-Informed Neural Networks for Methane Sorption: Cross-Gas Transfer Learning, Ensemble Collapse Under Physics Constraints, and Monte Carlo Dropout Uncertainty Quantification
cs.LG · 2026-04-15 · unverdicted · none · ref 65
A PINN transfer learning framework for coal methane sorption reaches R²=0.932 on held-out data with 227% improvement over classical isotherms and identifies Monte Carlo Dropout as the best uncertainty method while ensembles degrade under shared physics constraints.

Parameter-efficient Quantum Multi-task Learning
cs.LG · 2026-04-15 · unverdicted · none · ref 16
QMTL uses shared VQC encoding plus task-specific quantum ansatz heads to achieve linear parameter scaling with the number of tasks while matching or exceeding classical multi-task baselines on three benchmarks.

Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover
cs.AI · 2026-04-09 · unverdicted · none · ref 3
Heavy supervised fine-tuning on formal math suppresses tool-calling in Goedel-Prover-V2 from 89.4% to near 0%, but 100 Lean agentic traces restore it to 83.8% on the Berkeley Function Calling Leaderboard with in-domain gains on ProofNet.

Silent Collapse in Recursive Learning Systems
cs.LG · 2026-05-14 · unverdicted · none · ref 13
Recursive learning systems undergo silent collapse of internal distributions, preceded by entropy contraction, representation freezing, and tail erosion, which the MTR framework can monitor and avert.

Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports
cs.AI · 2026-04-21 · unverdicted · none · ref 22
SFT followed by GRPO improves LLM accuracy and reasoning recall in disease classification from radiology reports on three radiologist-annotated datasets.

Gyan: An Explainable Neuro-Symbolic Language Model
cs.CL · 2026-05-06 · unverdicted · none · ref 33 · 2 links
Gyan is a novel explainable non-transformer language model that achieves SOTA results on multiple datasets by mimicking human-like compositional context and world models.

MPCS: Neuroplastic Continual Learning via Multi-Component Plasticity and Topology-Aware EWC
cs.LG · 2026-05-04 · unverdicted · none · ref 4
MPCS integrates eleven plasticity mechanisms and reaches a Normalized Efficiency Score of 94.2 on a 31-task benchmark, with ablations showing that removing EWC and Hebbian updates yields higher performance at lower cost.

Transparent and Controllable Recommendation Filtering via Multimodal Multi-Agent Collaboration
cs.IR · 2026-04-19 · unverdicted · none · ref 16
A multi-agent multimodal system with fact-grounded adjudication and a dynamic two-tier preference graph cuts false positives in content filtering by 74.3% and nearly doubles F1-score versus text-only baselines while supporting user-driven Delta adjustments.

Adaptive Unknown Fault Detection and Few-Shot Continual Learning for Condition Monitoring in Ultrasonic Metal Welding
cs.LG · 2026-04-15 · unverdicted · none · ref 34
The method detects unknown faults in ultrasonic metal welding at 96% accuracy and incorporates new fault types from only five labeled samples to reach 98% classification accuracy.

The Dynamic Gist-Based Memory Model (DGMM): A Memory-Centric Architecture for Artificial Intelligence
cs.AI · 2026-05-04 · unverdicted · none · ref 12
DGMM is proposed as an explicit graph-structured memory architecture for AI that enables persistent episodic memory, cue-based recall, and context-dependent interpretation without retraining.

Efficient Task Adaptation in Large Language Models via Selective Parameter Optimization
cs.CL · 2026-04-18 · unverdicted · none · ref 2
The paper claims a selective fine-tuning method that identifies and freezes core parameters to mitigate catastrophic forgetting in LLMs while improving domain adaptation, shown in experiments with GPT-J and LLaMA-3.
