MIST fixes unreliable splits in streaming decision trees for class-incremental learning by using a K-independent McDiarmid bound on Gini impurity, Bayesian moment projection for knowledge transfer, and KLL quantile sketches for adaptive leaf predictions.
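The split-reliability idea in the summary above can be sketched as a distribution-free split test: only split a streaming-tree leaf when the gap between the best and second-best Gini gain exceeds a McDiarmid-style deviation bound. This is an illustrative sketch, not MIST's actual procedure; the constant `c` and the exact form of the bound are assumptions.

```python
import math

def gini(counts):
    """Gini impurity of a class-count vector."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def mcdiarmid_bound(n, delta, c=1.0):
    """Generic McDiarmid-style deviation bound after n observations,
    assuming bounded differences of c/n per sample. The constant used
    by MIST itself is not specified here."""
    return c * math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n, delta=1e-6):
    """Split only when the observed gain gap beats the bound, so the
    chosen split attribute is correct with probability >= 1 - delta."""
    return (best_gain - second_gain) > mcdiarmid_bound(n, delta)
```

With many samples the bound shrinks and a clear gain gap triggers a split; with few samples the same gap is treated as noise and the leaf keeps accumulating statistics.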
An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211
16 Pith papers cite this work. Polarity classification is still indexing.
Representative citing papers
Hebatron is the first open-weight Hebrew MoE LLM adapted from Nemotron-3, reaching 73.8% on Hebrew reasoning benchmarks while activating only 3B parameters per pass and supporting 65k-token context.
Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.
TOFU loss mitigates the narrowing of generative diversity in LLMs after supervised fine-tuning by countering both the neglect of low-frequency patterns and the forgetting of prior knowledge.
NORACL dynamically grows network capacity via neurogenesis-inspired signals to achieve oracle-level continual learning performance without pre-specifying architecture size.
FTN achieves near-zero forgetting on continual learning benchmarks by isolating task subnetworks via self-organizing binary masks generated through gradient descent, smoothing, and k-winner-take-all.
Different valid temporal partitions of the same streaming dataset can produce materially different rankings and performance numbers for continual learning methods.
Discrete decentralized learning dynamics on manifolds converge uniformly to an overdamped Langevin SDE whose stationary states produce orthogonally disentangled, linearly separable features.
Parameter-difference and model-inversion attacks can identify forgotten classes after machine unlearning on standard image datasets.
Supervised fine-tuning with LoRA on rational benchmark forecasts corrects extrapolation bias out-of-sample in LLM predictions for controlled experiments and cross-sectional stock returns.
MetaGPT embeds human SOPs into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.
Muon-OGD integrates Muon-style spectral-norm geometry with orthogonal gradient constraints to improve the stability-plasticity trade-off during sequential LLM adaptation.
Online generalised predictive coding (ODEM) tracks latent states in nonlinear and chaotic generative models by separating temporal scales for fast Bayesian belief updating and slow parameter learning.
Learning rate decay during SFT increases pretrained model sharpness, which exacerbates catastrophic forgetting and causes overtraining in LLMs.
Gradient consistency regularization and entropy-driven dynamic distillation improve accuracy by up to 5% in long-tailed incremental learning, with strong gains in majority-to-minority task ordering.
MPCS integrates eleven plasticity mechanisms and reaches a Normalized Efficiency Score of 94.2 on a 31-task benchmark, with ablations showing that removing EWC and Hebbian updates yields higher performance at lower cost.
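Several entries above rely on binary subnetwork masks; the FTN summary in particular mentions k-winner-take-all selection. A minimal sketch of a k-WTA mask follows, assuming the per-unit scores have already been produced elsewhere (in FTN-style methods they would come from gradient descent on mask logits plus smoothing; here they are just an input).

```python
import numpy as np

def kwta_mask(scores, k):
    """Binary mask keeping the k largest-scoring units
    (k-winner-take-all). All other units are zeroed out, which is
    how mask-based methods isolate a task subnetwork."""
    mask = np.zeros_like(scores, dtype=np.float32)
    topk = np.argpartition(scores, -k)[-k:]  # indices of k largest scores
    mask[topk] = 1.0
    return mask

scores = np.array([0.1, 0.9, -0.3, 0.7, 0.2])
mask = kwta_mask(scores, k=2)
# units 1 and 3 (the two largest scores) are kept
```

Applying such a mask elementwise to a layer's activations or weights freezes the unselected units for the current task, which is what gives these methods their near-zero forgetting.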