MIST fixes unreliable splits in streaming decision trees for class-incremental learning by using a K-independent McDiarmid bound on Gini impurity, Bayesian moment projection for knowledge transfer, and KLL quantile sketches for adaptive leaf predictions.
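A minimal sketch of the kind of confidence-bounded split test this summary describes, assuming a generic Hoeffding-style deviation bound as a stand-in for MIST's K-independent McDiarmid bound on Gini impurity; the function names, the value_range, and the delta confidence parameter are illustrative, not MIST's actual interface.

```python
import math

# Confidence-bounded split test in the spirit of streaming decision trees.
# NOTE: the bound below is a generic Hoeffding-style placeholder, not the
# K-independent McDiarmid bound derived in the MIST paper.

def gini(counts):
    """Gini impurity of a class-count vector."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_bound(value_range, n, delta):
    """Deviation bound for n samples at confidence 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, n, delta=1e-6, value_range=1.0):
    """Split only when the best candidate beats the runner-up by more than
    the bound, so the observed ordering is unlikely to flip with more data."""
    return (gain_best - gain_second) > split_bound(value_range, n, delta)
```

The point of such a bound is that a leaf commits to a split only when the impurity-gain gap between its two best candidates is too large to be a small-sample artifact.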
citation dossier
Goodfellow, I. J., Mirza, M., Xiao, D., Courville, A., and Bengio, Y. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211, 2013.
why this work matters in Pith
Pith has found this work cited in 16 reviewed papers. Its strongest current cluster is cs.LG (10 papers), and the largest review-status bucket among citing papers is UNVERDICTED (15 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
representative citing papers
Hebatron is the first open-weight Hebrew MoE LLM adapted from Nemotron-3, reaching 73.8% on Hebrew reasoning benchmarks while activating only 3B parameters per pass and supporting 65k-token context.
Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training in models with 20M-150M parameters, and by 31-40% at the 1B-parameter scale.
TOFU loss mitigates the narrowing of generative diversity in LLMs after supervised fine-tuning by addressing neglect of low-frequency patterns and forgetting of prior knowledge.
NORACL dynamically grows network capacity via neurogenesis-inspired signals to achieve oracle-level continual learning performance without pre-specifying architecture size.
FTN achieves near-zero forgetting on continual learning benchmarks by isolating task subnetworks via self-organizing binary masks generated through gradient descent, smoothing, and k-winner-take-all.
Different valid temporal partitions of the same streaming dataset can produce materially different rankings and performance numbers for continual learning methods.
Discrete decentralized learning dynamics on manifolds converge uniformly to an overdamped Langevin SDE whose stationary states produce orthogonally disentangled, linearly separable features.
Parameter-difference and model-inversion attacks can identify forgotten classes after machine unlearning on standard image datasets.
Supervised fine-tuning with LoRA on rational benchmark forecasts corrects extrapolation bias out-of-sample in LLM predictions for controlled experiments and cross-sectional stock returns.
MetaGPT embeds human SOPs into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.
Muon-OGD integrates Muon-style spectral-norm geometry with orthogonal gradient constraints to improve the stability-plasticity trade-off during sequential LLM adaptation.
Online generalised predictive coding (ODEM) tracks latent states in nonlinear and chaotic generative models by separating temporal scales for fast Bayesian belief updating and slow parameter learning.
Learning rate decay during SFT increases pretrained model sharpness, which exacerbates catastrophic forgetting and causes overtraining in LLMs.
Gradient consistency regularization and entropy-driven dynamic distillation improve accuracy by up to 5% in long-tailed incremental learning, with strong gains in majority-to-minority task ordering.
MPCS integrates eleven plasticity mechanisms and reaches a Normalized Efficiency Score of 94.2 on a 31-task benchmark, with ablations showing that removing EWC and Hebbian updates yields higher performance at lower cost.
citing papers explorer
-
MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound
MIST fixes unreliable splits in streaming decision trees for class-incremental learning by using a K-independent McDiarmid bound on Gini impurity, Bayesian moment projection for knowledge transfer, and KLL quantile sketches for adaptive leaf predictions.
-
HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model
Hebatron is the first open-weight Hebrew MoE LLM adapted from Nemotron-3, reaching 73.8% on Hebrew reasoning benchmarks while activating only 3B parameters per pass and supporting 65k-token context.
-
Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training in models with 20M-150M parameters, and by 31-40% at the 1B-parameter scale.
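As a rough illustration of what a flat-minima intervention looks like in code, here is a generic SAM-style two-pass update; this is a sketch under the assumption that "sharpness-aware pretraining" resembles standard SAM, and rho, the loss_fn signature, and the single L2-ball perturbation are illustrative choices rather than the paper's recipe.

```python
import torch

def sam_step(model, loss_fn, batch, optimizer, rho=0.05):
    """One sharpness-aware update: ascend to a nearby worst-case point,
    then apply the gradient measured there."""
    optimizer.zero_grad()

    # First pass: gradient at the current weights.
    loss_fn(model, batch).backward()
    params = [p for p in model.parameters() if p.grad is not None]

    # Climb to the approximate worst case inside an L2 ball of radius rho.
    with torch.no_grad():
        grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params])) + 1e-12
        eps = [rho * p.grad / grad_norm for p in params]
        for p, e in zip(params, eps):
            p.add_(e)

    # Second pass: the gradient at the perturbed point drives the real update.
    optimizer.zero_grad()
    loss_fn(model, batch).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
```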
-
Diversity in Large Language Models under Supervised Fine-Tuning
TOFU loss mitigates the narrowing of generative diversity in LLMs after supervised fine-tuning by addressing neglect of low-frequency patterns and forgetting of prior knowledge.
-
NORACL: Neurogenesis for Oracle-free Resource-Adaptive Continual Learning
NORACL dynamically grows network capacity via neurogenesis-inspired signals to achieve oracle-level continual learning performance without pre-specifying architecture size.
-
Cortex-Inspired Continual Learning: Unsupervised Instantiation and Recovery of Functional Task Networks
FTN achieves near-zero forgetting on continual learning benchmarks by isolating task subnetworks via self-organizing binary masks generated through gradient descent, smoothing, and k-winner-take-all.
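A minimal sketch of the k-winner-take-all step in that pipeline, assuming per-unit scores are already available; the score source, the smoothing stage, and FTN's per-task bookkeeping are not reproduced, and kwta_mask is an illustrative name.

```python
import torch

def kwta_mask(scores: torch.Tensor, k: int) -> torch.Tensor:
    """Binary {0,1} mask keeping the k highest-scoring units per row."""
    winners = scores.topk(k, dim=-1).indices
    mask = torch.zeros_like(scores)
    mask.scatter_(-1, winners, 1.0)
    return mask

# Example: keep 2 of 8 units, then gate activations with the mask.
scores = torch.randn(1, 8)
mask = kwta_mask(scores, k=2)
gated = scores * mask  # only the 2 winners pass through
```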
-
Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability
Different valid temporal partitions of the same streaming dataset can produce materially different rankings and performance numbers for continual learning methods.
-
Continuous Limits of Coupled Flows in Representation Learning
Discrete decentralized learning dynamics on manifolds converge uniformly to an overdamped Langevin SDE whose stationary states produce orthogonally disentangled, linearly separable features.
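For reference, the generic overdamped Langevin SDE and its Gibbs stationary density that such a limit statement targets; this plain Euclidean form omits the paper's manifold constraints and the coupling between decentralized learners, and U and beta are generic symbols rather than the paper's notation.

```latex
% Overdamped Langevin dynamics and the Gibbs stationary density (generic form).
\[
  d\theta_t = -\nabla U(\theta_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t,
  \qquad
  \rho_\infty(\theta) \propto \exp\!\bigl(-\beta\,U(\theta)\bigr).
\]
```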
-
Label Leakage Attacks in Machine Unlearning: A Parameter and Inversion-Based Approach
Parameter-difference and model-inversion attacks can identify forgotten classes after machine unlearning on standard image datasets.
-
Debiasing LLMs by Fine-tuning
Supervised fine-tuning with LoRA on rational benchmark forecasts corrects extrapolation bias out-of-sample in LLM predictions for controlled experiments and cross-sectional stock returns.
-
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
MetaGPT embeds human SOPs into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.
-
Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning
Muon-OGD integrates Muon-style spectral-norm geometry with orthogonal gradient constraints to improve the stability-plasticity trade-off during sequential LLM adaptation.
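A sketch of the orthogonal-gradient half of that combination, in the style of classic OGD: remove from the new task's gradient any component lying along stored old-task gradient directions. The Muon-style spectral-norm geometry is not shown, and the basis bookkeeping here is deliberately naive.

```python
import torch

def project_orthogonal(grad, basis):
    """Project a gradient onto the orthogonal complement of a list of
    unit-norm, flattened old-task gradient directions."""
    g = grad.flatten().clone()
    for b in basis:
        g -= (g @ b) * b   # subtract the component along each stored direction
    return g.view_as(grad)
```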
-
Online Generalised Predictive Coding
Online generalised predictive coding (ODEM) tracks latent states in nonlinear and chaotic generative models by separating temporal scales for fast Bayesian belief updating and slow parameter learning.
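A toy two-timescale loop for a linear generative model, to make the fast/slow split concrete; this assumes x is approximately W z with a squared-error energy, and none of ODEM's generalised coordinates, Bayesian belief updates, or chaotic-model handling is reproduced.

```python
import numpy as np

def two_timescale_step(x, z, W, eta_fast=0.1, eta_slow=1e-3, inner_steps=20):
    """Fast loop infers the latent state z; slow update adapts parameters W."""
    for _ in range(inner_steps):          # fast timescale: belief/state update
        err = x - W @ z
        z = z + eta_fast * (W.T @ err)
    err = x - W @ z
    W = W + eta_slow * np.outer(err, z)   # slow timescale: parameter learning
    return z, W
```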
-
(How) Learning Rates Regulate Catastrophic Overtraining
Learning rate decay during SFT increases pretrained model sharpness, which exacerbates catastrophic forgetting and causes overtraining in LLMs.
-
Dynamic Distillation and Gradient Consistency for Robust Long-Tailed Incremental Learning
Gradient consistency regularization and entropy-driven dynamic distillation improve accuracy by up to 5% in long-tailed incremental learning, with strong gains in majority-to-minority task ordering.
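One plausible, heavily simplified reading of "entropy-driven dynamic distillation": weight each sample's distillation term by the old model's predictive certainty. The actual weighting rule and the gradient-consistency regularizer in the paper are not reproduced; the temperature T and the 1/(1+entropy) weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def entropy_weighted_distill(student_logits, teacher_logits, T=2.0):
    """Per-sample KL(teacher || student), down-weighted where the teacher is uncertain."""
    p_t = F.softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    log_p_t = p_t.clamp_min(1e-12).log()
    kl = (p_t * (log_p_t - log_p_s)).sum(dim=-1)
    entropy = -(p_t * log_p_t).sum(dim=-1)
    weight = 1.0 / (1.0 + entropy)        # assumed weighting, not the paper's
    return (weight * kl).mean()
```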
-
MPCS: Neuroplastic Continual Learning via Multi-Component Plasticity and Topology-Aware EWC
MPCS integrates eleven plasticity mechanisms and reaches a Normalized Efficiency Score of 94.2 on a 31-task benchmark, with ablations showing that removing EWC and Hebbian updates yields higher performance at lower cost.
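For context on the ablated component, a minimal sketch of the standard EWC penalty (Fisher-weighted distance to the previous task's weights); MPCS's topology-aware variant and its other plasticity mechanisms are not reproduced, and the fisher/old_params dictionaries are assumed to be precomputed.

```python
import torch

def ewc_penalty(model, old_params, fisher, lam=1.0):
    """0.5 * lam * sum_i F_i * (theta_i - theta*_i)^2 over named parameters."""
    loss = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss
```

Adding this term to the new-task loss pulls high-Fisher (important) weights back toward their old values while leaving low-Fisher weights free to adapt.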