Stable Routing for Mixture-of-Experts in Class-Incremental Learning

Da-Wei Zhou; Lijun Zhang; Quan Cheng; Zirui Guo

arxiv: 2605.17571 · v1 · pith:4NSPAS7Gnew · submitted 2026-05-17 · 💻 cs.CV · cs.LG

Stable Routing for Mixture-of-Experts in Class-Incremental Learning

Zirui Guo , Quan Cheng , Da-Wei Zhou , Lijun Zhang This is my paper

Pith reviewed 2026-05-20 13:10 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords class-incremental learningmixture-of-expertsstable routingrouting alignmentcontinual learningexpert expansionknowledge preservationcatastrophic forgetting

0 comments

The pith

Aligning old-class routing to historical distributions stabilizes expandable mixture-of-experts in class-incremental learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that adding new experts to a mixture-of-experts model during class-incremental learning causes the router to reassign old-class samples, disrupting previously learned expert compositions even when old experts remain frozen. StaR-MoE counters this by adding sensitivity-aware routing alignment that pulls current routing for old classes back toward their historical distributions and by applying asymmetric capacity regularization that promotes use of the new experts for new classes. If the approach works, models can keep growing their expert pool while retaining prior accuracy and gaining on new classes. A sympathetic reader cares because this isolates routing stability as a controllable factor that existing MoE-CIL methods have left unaddressed.

Core claim

Expandable mixture-of-experts architectures for class-incremental learning require two complementary properties: stable old-class routing to preserve knowledge and sufficient capacity utilization for new-class adaptation. StaR-MoE realizes these properties through sensitivity-aware routing alignment, which matches the router's present behavior on old classes to historical routing distributions via sensitivity-guided constraints, together with asymmetric capacity regularization that encourages effective use of the expanded expert pool without eroding class-specific specialization.

What carries the argument

Sensitivity-aware routing alignment, which enforces that the current router's assignments for old-class samples remain close to their historical distributions by means of sensitivity-guided constraints.

If this is right

StaR-MoE raises both average accuracy and last-task accuracy above prior state-of-the-art methods on four standard class-incremental learning benchmarks.
Old-class knowledge remains more intact because expert assignments for those classes change less after new experts are introduced.
New classes still receive adequate expert capacity because the regularization term remains asymmetric and does not force every sample onto old experts.
The framework demonstrates that routing-level interventions can be added on top of frozen-expert expandable MoE without redesigning the underlying architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sensitivity-guided alignment idea could be tested in other continual settings where model components are added incrementally, such as expanding transformer layers or growing neural architecture search outputs.
If routing drift turns out to be the dominant interference mechanism, monitoring router sensitivity might serve as a general diagnostic for forgetting in any expert-based or modular continual-learning system.
The approach leaves open whether similar constraints could be applied at the feature level rather than the routing level when experts are not strictly additive.

Load-bearing premise

The main source of interference is routing drift from expert expansion, and constraining the router to match historical distributions for old classes will preserve knowledge without preventing effective learning of new classes.

What would settle it

On any standard CIL benchmark, measure the change in routing probability vectors for old-class samples before and after each expert addition; if StaR-MoE does not produce measurably smaller changes than an unconstrained MoE baseline, or if removing the alignment term eliminates the reported accuracy gains, the central claim is falsified.

Figures

Figures reproduced from arXiv: 2605.17571 by Da-Wei Zhou, Lijun Zhang, Quan Cheng, Zirui Guo.

**Figure 1.** Figure 1: Illustration of routing drift in SEMA. Black lines mark expert expansion; non-zero routing probabilities to the left of each marker indicate later experts activated for earlier tasks. We refer to this structural interference problem as routing drift, where expert expansion alters the computational pathways of learned classes. Routing drift therefore provides a structural explanation for forgetting in exp… view at source ↗

**Figure 2.** Figure 2: Overview of the StaR-MoE framework. prior work [Chen et al., 2022, He et al., 2022], we freeze the ViT backbone and insert lightweight adapters as experts in parallel with the MLP blocks. Let x l ∈ R d denote the input feature to the MLP block at the l-th layer. The adapter function Al (x l ) comprises a down-projection Wl down ∈ R d×r that maps features to a bottleneck dimension r, followed by a ReLU acti… view at source ↗

**Figure 3.** Figure 3: Performance curves of different methods on ImageNet-R, ImageNet-A, and VTAB. We [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Further analysis. (a) Task-wise routing probabilities in StaR-MoE on ImageNet-A. Low [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: t-SNE visualization of router-input distributions across MoE layers on VTAB. Samples [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗

**Figure 6.** Figure 6: Additional experimental results. (a) Running time comparison on CIFAR-100 and ImageNet [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗

**Figure 7.** Figure 7: Impact of MoE design choices. (a) Effect of the number of active experts [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗

read the original abstract

Class-incremental learning (CIL) requires models to learn new classes sequentially while preserving prior knowledge. Recently, approaches that combine pre-trained models with mixture-of-experts (MoE) have received increasing attention in CIL: they typically expand experts during learning and employ a router to assign weights across experts. However, existing MoE methods often overlook routing drift induced by expert expansion. Once new experts are introduced, the router may reassign samples from earlier classes to newly added experts, thereby perturbing previously established expert compositions and causing interference even when old experts remain frozen. We argue that expandable MoE in CIL requires two complementary properties: stable old-class routing for knowledge preservation and sufficient capacity utilization for new-class adaptation. To this end, we propose Stable Routing for MoE (StaR-MoE), a routing-level framework for expandable MoE in CIL. By incorporating sensitivity-aware routing alignment, StaR-MoE aligns current old-class routing behavior with historical routing distributions through sensitivity-guided constraints. Complementarily, StaR-MoE introduces asymmetric capacity regularization to encourage effective utilization of the expanded expert pool without compromising class-specific routing specialization. Extensive experiments across four standard CIL benchmarks demonstrate that StaR-MoE consistently improves both average and last accuracy over state-of-the-art methods, highlighting the importance of stable routing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

StaR-MoE adds sensitivity-aware alignment and asymmetric regularization to stabilize routing when expanding experts in class-incremental learning, with consistent benchmark gains.

read the letter

The paper's core contribution is a routing-level fix for expandable MoE in CIL. It identifies that adding new experts can shift routing away from old-class samples even when prior experts stay frozen, then counters this with sensitivity-guided alignment to historical routing distributions plus asymmetric capacity regularization to preserve room for new classes. The abstract reports steady lifts in both average and last accuracy over prior methods on four standard CIL benchmarks. That pairing of stability and adaptation constraints is not something I recall from the cited prior work, so the specific combination looks new at the routing level. The empirical pattern supports the claim that addressing drift helps without killing plasticity. The argument stays grounded in the practical problem rather than overclaiming generality. The main limitation is that the abstract gives no equations, no ablation breakdowns, and no error bars or variance numbers, so it is hard to judge how much each piece drives the result or whether the sensitivity measure is robust across datasets. If the full paper shows clean ablations and the gains hold under different expansion schedules, the contribution strengthens; if the improvements shrink once you control for total compute or router capacity, the story weakens. This work is for people already building or tuning MoE systems inside continual learning pipelines. It is not a foundational shift but a useful incremental step that deserves referee attention because the problem it targets is real and the reported results are coherent enough to merit detailed review.

Referee Report

1 major / 3 minor

Summary. The paper proposes StaR-MoE, a routing-level framework for expandable mixture-of-experts models in class-incremental learning. It identifies routing drift from expert expansion as a source of interference and introduces two components: sensitivity-aware routing alignment, which constrains current old-class routing to match historical distributions, and asymmetric capacity regularization, which promotes utilization of the expanded expert pool. Experiments across four standard CIL benchmarks report consistent gains in both average and last accuracy over prior state-of-the-art methods.

Significance. If the empirical results hold under closer scrutiny, the work is significant for highlighting routing stability as a distinct and actionable requirement in MoE-based CIL. The paired design of alignment for preservation and asymmetric regularization for plasticity offers a practical balance that prior expandable-MoE approaches appear to have under-emphasized. The multi-benchmark evaluation provides direct support for the central claim that stable routing reduces interference without sacrificing adaptation.

major comments (1)

[§3.2] §3.2 (sensitivity-aware routing alignment): the claim that aligning to historical routing distributions preserves knowledge without unduly restricting plasticity rests on the untested premise that historical router outputs for old classes remain near-optimal after expert expansion; an ablation that varies the alignment strength hyper-parameter and reports both old-class routing consistency (e.g., expert-assignment overlap) and new-class accuracy would directly test whether the constraint is load-bearing or merely additive.

minor comments (3)

[Tables 1–2] Table 1 and Table 2: error bars or standard deviations across the reported runs are not shown; adding them would allow readers to judge whether the modest last-accuracy gains (typically 1–3 points) are statistically reliable.
[§4.3] §4.3 (asymmetric capacity regularization): the precise form of the asymmetry (e.g., the weighting between old and new experts) is described only qualitatively; an explicit equation or pseudocode would improve reproducibility.
[Figure 3] Figure 3: the visualization of routing distributions before and after alignment would benefit from a quantitative metric (e.g., KL divergence) in the caption to make the visual comparison more precise.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive evaluation and the specific suggestion to strengthen the analysis of sensitivity-aware routing alignment. We will incorporate the recommended ablation in the revised manuscript.

read point-by-point responses

Referee: [§3.2] §3.2 (sensitivity-aware routing alignment): the claim that aligning to historical routing distributions preserves knowledge without unduly restricting plasticity rests on the untested premise that historical router outputs for old classes remain near-optimal after expert expansion; an ablation that varies the alignment strength hyper-parameter and reports both old-class routing consistency (e.g., expert-assignment overlap) and new-class accuracy would directly test whether the constraint is load-bearing or merely additive.

Authors: We agree that an explicit ablation varying the alignment strength would provide stronger empirical support for the claim that historical routing distributions remain near-optimal for old classes post-expansion. In the revision we will add a study that sweeps the alignment strength hyper-parameter, reporting old-class routing consistency (expert-assignment overlap) together with new-class accuracy. This will clarify whether the constraint is load-bearing for the observed gains. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper introduces StaR-MoE as a new routing-level framework that adds sensitivity-aware routing alignment (to match historical distributions) and asymmetric capacity regularization (to enable new-class adaptation). These are presented as complementary design choices motivated by the problem of routing drift, not as re-derivations or fits of prior quantities. The abstract and described properties contain no equations that reduce predictions to fitted inputs by construction, nor load-bearing self-citations that substitute for independent justification. Empirical gains across four CIL benchmarks are reported as external validation rather than tautological outcomes. The derivation remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The two new mechanisms (sensitivity-aware routing alignment and asymmetric capacity regularization) are introduced as design choices whose precise formulations and any hidden hyperparameters remain unspecified.

pith-pipeline@v0.9.0 · 5766 in / 1298 out tokens · 77050 ms · 2026-05-20T13:10:06.344817+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 1 internal anchor

[1]

Proceedings of the

Deep Residual Learning for Image Recognition , author =. Proceedings of the

work page
[2]

An Image is Worth 16x16 Words:

Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil , booktitle =. An Image is Worth 16x16 Words:

work page
[3]

Psychology of Learning and Motivation , volume =

Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , author =. Psychology of Learning and Motivation , volume =

work page
[4]

2024 , publisher =

Class-Incremental Learning: A Survey , author =. 2024 , publisher =

work page 2024
[5]

2024 , publisher =

A Comprehensive Survey of Continual Learning: Theory, Method and Application , author =. 2024 , publisher =

work page 2024
[6]

2022 , publisher =

Class-Incremental Learning: Survey and Performance Evaluation on Image Classification , author =. 2022 , publisher =

work page 2022
[7]

Nature Machine Intelligence , volume =

Three Types of Incremental Learning , author =. Nature Machine Intelligence , volume =. 2022 , publisher =

work page 2022
[8]

Proceedings of the

Learning to Prompt for Continual Learning , author =. Proceedings of the

work page
[9]

Wang, Zifeng and Zhang, Zizhao and Ebrahimi, Sayna and Sun, Ruoxi and Zhang, Han and Lee, Chen-Yu and Ren, Xiaoqi and Su, Guolong and Perot, Vincent and Dy, Jennifer and Pfister, Tomas , booktitle =

work page
[10]

Dynamic Mixture of Curriculum

Ge, Chendi and Wang, Xin and Zhang, Zeyang and Chen, Hong and Fan, Jiapei and Huang, Longtao and Xue, Hui and Zhu, Wenwu , booktitle =. Dynamic Mixture of Curriculum

work page
[11]

Proceedings of the

Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning , author =. Proceedings of the

work page
[12]

S-Prompts Learning with Pre-Trained Transformers: An

Wang, Yabin and Huang, Zhiwu and Hong, Xiaopeng , booktitle =. S-Prompts Learning with Pre-Trained Transformers: An

work page
[13]

Liang, Yan-Shuo and Li, Wu-Jun , booktitle =

work page
[14]

Wu, Yichen and Piao, Hongming and Huang, Long-Kai and Wang, Renzhen and Li, Wanhua and Pfister, Hanspeter and Meng, Deyu and Ma, Kede and Wei, Ying , booktitle =

work page
[15]

and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu , booktitle =

Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu , booktitle =

work page
[16]

Theory on

Li, Hongbo and Lin, Sen and Duan, Lingjie and Liang, Yingbin and Shroff, Ness , booktitle =. Theory on

work page
[17]

Smith, James Seale and Karlinsky, Leonid and Gutta, Vyshnavi and Cascante-Bonilla, Paola and Kim, Donghyun and Arbelle, Assaf and Panda, Rameswar and Feris, Rogerio and Kira, Zsolt , booktitle =

work page
[18]

Deep Learning Using Rectified Linear Units (

Agarap, Abien Fred , journal =. Deep Learning Using Rectified Linear Units (

work page
[19]

International Conference on Learning Representations , year =

Towards a Unified View of Parameter-Efficient Transfer Learning , author =. International Conference on Learning Representations , year =

work page
[20]

Proceedings of the

Expert Gate: Lifelong Learning with a Network of Experts , author =. Proceedings of the

work page
[21]

Advances in Neural Information Processing Systems 33 , pages =

Dark Experience for General Continual Learning: A Strong, Simple Baseline , author =. Advances in Neural Information Processing Systems 33 , pages =

work page
[22]

2017 , publisher =

Learning without Forgetting , author =. 2017 , publisher =

work page 2017
[23]

Proceedings of the

Douillard, Arthur and Ram. Proceedings of the

work page
[24]

Proceedings of the

Task-Agnostic Guided Feature Expansion for Class-Incremental Learning , author =. Proceedings of the

work page
[25]

International Conference on Learning Representations , year =

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , author =. International Conference on Learning Representations , year =

work page
[26]

He, Jiangpeng and Duan, Zhihao and Zhu, Fengqing , booktitle =

work page
[27]

Learning Multiple Layers of Features from Tiny Images , author =

work page
[28]

Proceedings of the

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , author =. Proceedings of the

work page
[29]

Proceedings of the

Natural Adversarial Examples , author =. Proceedings of the

work page
[30]

Proceedings of the

Moment Matching for Multi-Source Domain Adaptation , author =. Proceedings of the

work page
[31]

International Journal of Computer Vision , volume =

Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity Are All You Need , author =. International Journal of Computer Vision , volume =. 2025 , publisher =

work page 2025
[32]

Proceedings of the

Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning , author =. Proceedings of the

work page
[33]

Proceedings of the

Emerging Properties in Self-Supervised Vision Transformers , author =. Proceedings of the

work page
[34]

, booktitle =

Rebuffi, Sylvestre-Alvise and Kolesnikov, Alexander and Sperl, Georg and Lampert, Christoph H. , booktitle =

work page
[35]

A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

A large-scale study of representation learning with the visual task adaptation benchmark , author=. arXiv preprint arXiv:1910.04867 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1910
[36]

Proceedings of the European Conference on Computer Vision , pages =

Memory Aware Synapses: Learning What (Not) to Forget , author =. Proceedings of the European Conference on Computer Vision , pages =

work page
[37]

Proceedings of the National Academy of Sciences , volume=

Overcoming catastrophic forgetting in neural networks , author=. Proceedings of the National Academy of Sciences , volume=. 2017 , publisher=

work page 2017
[38]

Proceedings of the 34th International Conference on Machine Learning , pages=

Continual learning through synaptic intelligence , author=. Proceedings of the 34th International Conference on Machine Learning , pages=

work page
[39]

Proceedings of the 42nd International Conference on Machine Learning , year =

Addressing Imbalanced Domain-Incremental Learning through Dual-Balance Collaborative Experts , author =. Proceedings of the 42nd International Conference on Machine Learning , year =

work page
[40]

Advances in Neural Information Processing Systems 36 , pages =

Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-Optimality , author =. Advances in Neural Information Processing Systems 36 , pages =

work page
[41]

Zhang, Gengwei and Wang, Liyuan and Kang, Guoliang and Chen, Ling and Wei, Yunchao , booktitle =

work page
[42]

Proceedings of the

Dual-Teacher Class-Incremental Learning with Data-Free Generative Replay , author =. Proceedings of the

work page
[43]

Proceedings of the

Adaptive Plasticity Improvement for Continual Learning , author =. Proceedings of the

work page
[44]

Proceedings of the European Conference on Computer Vision , pages =

Visual Prompt Tuning , author =. Proceedings of the European Conference on Computer Vision , pages =

work page
[45]

Proceedings of the

Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters , author =. Proceedings of the

work page
[46]

International Conference on Learning Representations , year =

Lifelong Learning with Dynamically Expandable Networks , author =. International Conference on Learning Representations , year =

work page
[47]

Proceedings of the 36th International Conference on Machine Learning , pages=

Learn to grow: A continual structure learning framework for overcoming catastrophic forgetting , author=. Proceedings of the 36th International Conference on Machine Learning , pages=

work page
[48]

2021 , publisher =

Sokar, Ghada and Mocanu, Decebal Constantin and Pechenizkiy, Mykola , journal =. 2021 , publisher =

work page 2021
[49]

Advances in Neural Information Processing Systems 30 , pages =

Gradient Episodic Memory for Continual Learning , author =. Advances in Neural Information Processing Systems 30 , pages =

work page
[50]

Advances in Neural Information Processing Systems 35 , pages =

Exploring Example Influence in Continual Learning , author =. Advances in Neural Information Processing Systems 35 , pages =

work page
[51]

Trends in Cognitive Sciences , volume =

Catastrophic Forgetting in Connectionist Networks , author =. Trends in Cognitive Sciences , volume =. 1999 , publisher =

work page 1999
[52]

Psychological Review , volume =

Why There Are Complementary Learning Systems in the Hippocampus and Neocortex: Insights from the Successes and Failures of Connectionist Models of Learning and Memory , author =. Psychological Review , volume =. 1995 , publisher =

work page 1995
[53]

Class-Incremental Learning with

Huang, Linlan and Cao, Xusheng and Lu, Haori and Liu, Xialei , booktitle =. Class-Incremental Learning with

work page
[54]

Guo, Xiaohan and Cai, Yusong and Liu, Zejia and Wang, Zhengning and Pan, Lili and Li, Hongliang , journal =

work page
[55]

and Ba, Jimmy , journal=

Kingma, Diederik P. and Ba, Jimmy , journal=

work page
[56]

Chen, Shoufa and Ge, Chongjian and Tong, Zhan and Wang, Jiangliu and Song, Yibing and Wang, Jue and Luo, Ping , booktitle=

work page
[57]

, booktitle=

Nair, Vinod and Hinton, Geoffrey E. , booktitle=. Rectified linear units improve

work page
[58]

Journal of Machine Learning Research , volume =

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity , author =. Journal of Machine Learning Research , volume =

work page
[59]

Huai, Tianyu and Zhou, Jie and Wu, Xingjiao and Chen, Qin and Bai, Qingchun and Zhou, Ze and He, Liang , booktitle =

work page
[60]

Advances in Neural Information Processing Systems 37 , pages =

Mixture of Experts Meets Prompt-Based Continual Learning , author =. Advances in Neural Information Processing Systems 37 , pages =

work page
[61]

Advances in Neural Information Processing Systems 13 , pages =

Incorporating Second-Order Functional Knowledge for Better Option Pricing , author =. Advances in Neural Information Processing Systems 13 , pages =

work page
[62]

Transactions on Machine Learning Research , issn=

Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA , author=. Transactions on Machine Learning Research , issn=

work page
[63]

Sun, Hai-Long and Zhou, Da-Wei and Ye, Han-Jia and Zhan, De-Chuan , journal =

work page
[64]

International Conference on Learning Representations , year =

Divide and Not Forget: Ensemble of Selectively Trained Experts in Continual Learning , author =. International Conference on Learning Representations , year =

work page
[65]

Sun, Hai-Long and Zhou, Da-Wei and Zhao, Hanbin and Gan, Le and Zhan, De-Chuan and Ye, Han-Jia , booktitle=

work page
[66]

International Conference on Learning Representations , year =

One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-Based Continual Learning , author =. International Conference on Learning Representations , year =

work page
[67]

Proceedings of the

Prototype augmentation and self-supervision for incremental learning , author=. Proceedings of the

work page
[68]

Visualizing data using t-

Van der Maaten, Laurens and Hinton, Geoffrey , journal=. Visualizing data using t-

work page

[1] [1]

Proceedings of the

Deep Residual Learning for Image Recognition , author =. Proceedings of the

work page

[2] [2]

An Image is Worth 16x16 Words:

Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil , booktitle =. An Image is Worth 16x16 Words:

work page

[3] [3]

Psychology of Learning and Motivation , volume =

Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , author =. Psychology of Learning and Motivation , volume =

work page

[4] [4]

2024 , publisher =

Class-Incremental Learning: A Survey , author =. 2024 , publisher =

work page 2024

[5] [5]

2024 , publisher =

A Comprehensive Survey of Continual Learning: Theory, Method and Application , author =. 2024 , publisher =

work page 2024

[6] [6]

2022 , publisher =

Class-Incremental Learning: Survey and Performance Evaluation on Image Classification , author =. 2022 , publisher =

work page 2022

[7] [7]

Nature Machine Intelligence , volume =

Three Types of Incremental Learning , author =. Nature Machine Intelligence , volume =. 2022 , publisher =

work page 2022

[8] [8]

Proceedings of the

Learning to Prompt for Continual Learning , author =. Proceedings of the

work page

[9] [9]

Wang, Zifeng and Zhang, Zizhao and Ebrahimi, Sayna and Sun, Ruoxi and Zhang, Han and Lee, Chen-Yu and Ren, Xiaoqi and Su, Guolong and Perot, Vincent and Dy, Jennifer and Pfister, Tomas , booktitle =

work page

[10] [10]

Dynamic Mixture of Curriculum

Ge, Chendi and Wang, Xin and Zhang, Zeyang and Chen, Hong and Fan, Jiapei and Huang, Longtao and Xue, Hui and Zhu, Wenwu , booktitle =. Dynamic Mixture of Curriculum

work page

[11] [11]

Proceedings of the

Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning , author =. Proceedings of the

work page

[12] [12]

S-Prompts Learning with Pre-Trained Transformers: An

Wang, Yabin and Huang, Zhiwu and Hong, Xiaopeng , booktitle =. S-Prompts Learning with Pre-Trained Transformers: An

work page

[13] [13]

Liang, Yan-Shuo and Li, Wu-Jun , booktitle =

work page

[14] [14]

Wu, Yichen and Piao, Hongming and Huang, Long-Kai and Wang, Renzhen and Li, Wanhua and Pfister, Hanspeter and Meng, Deyu and Ma, Kede and Wei, Ying , booktitle =

work page

[15] [15]

and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu , booktitle =

Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu , booktitle =

work page

[16] [16]

Theory on

Li, Hongbo and Lin, Sen and Duan, Lingjie and Liang, Yingbin and Shroff, Ness , booktitle =. Theory on

work page

[17] [17]

Smith, James Seale and Karlinsky, Leonid and Gutta, Vyshnavi and Cascante-Bonilla, Paola and Kim, Donghyun and Arbelle, Assaf and Panda, Rameswar and Feris, Rogerio and Kira, Zsolt , booktitle =

work page

[18] [18]

Deep Learning Using Rectified Linear Units (

Agarap, Abien Fred , journal =. Deep Learning Using Rectified Linear Units (

work page

[19] [19]

International Conference on Learning Representations , year =

Towards a Unified View of Parameter-Efficient Transfer Learning , author =. International Conference on Learning Representations , year =

work page

[20] [20]

Proceedings of the

Expert Gate: Lifelong Learning with a Network of Experts , author =. Proceedings of the

work page

[21] [21]

Advances in Neural Information Processing Systems 33 , pages =

Dark Experience for General Continual Learning: A Strong, Simple Baseline , author =. Advances in Neural Information Processing Systems 33 , pages =

work page

[22] [22]

2017 , publisher =

Learning without Forgetting , author =. 2017 , publisher =

work page 2017

[23] [23]

Proceedings of the

Douillard, Arthur and Ram. Proceedings of the

work page

[24] [24]

Proceedings of the

Task-Agnostic Guided Feature Expansion for Class-Incremental Learning , author =. Proceedings of the

work page

[25] [25]

International Conference on Learning Representations , year =

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , author =. International Conference on Learning Representations , year =

work page

[26] [26]

He, Jiangpeng and Duan, Zhihao and Zhu, Fengqing , booktitle =

work page

[27] [27]

Learning Multiple Layers of Features from Tiny Images , author =

work page

[28] [28]

Proceedings of the

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , author =. Proceedings of the

work page

[29] [29]

Proceedings of the

Natural Adversarial Examples , author =. Proceedings of the

work page

[30] [30]

Proceedings of the

Moment Matching for Multi-Source Domain Adaptation , author =. Proceedings of the

work page

[31] [31]

International Journal of Computer Vision , volume =

Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity Are All You Need , author =. International Journal of Computer Vision , volume =. 2025 , publisher =

work page 2025

[32] [32]

Proceedings of the

Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning , author =. Proceedings of the

work page

[33] [33]

Proceedings of the

Emerging Properties in Self-Supervised Vision Transformers , author =. Proceedings of the

work page

[34] [34]

, booktitle =

Rebuffi, Sylvestre-Alvise and Kolesnikov, Alexander and Sperl, Georg and Lampert, Christoph H. , booktitle =

work page

[35] [35]

A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

A large-scale study of representation learning with the visual task adaptation benchmark , author=. arXiv preprint arXiv:1910.04867 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1910

[36] [36]

Proceedings of the European Conference on Computer Vision , pages =

Memory Aware Synapses: Learning What (Not) to Forget , author =. Proceedings of the European Conference on Computer Vision , pages =

work page

[37] [37]

Proceedings of the National Academy of Sciences , volume=

Overcoming catastrophic forgetting in neural networks , author=. Proceedings of the National Academy of Sciences , volume=. 2017 , publisher=

work page 2017

[38] [38]

Proceedings of the 34th International Conference on Machine Learning , pages=

Continual learning through synaptic intelligence , author=. Proceedings of the 34th International Conference on Machine Learning , pages=

work page

[39] [39]

Proceedings of the 42nd International Conference on Machine Learning , year =

Addressing Imbalanced Domain-Incremental Learning through Dual-Balance Collaborative Experts , author =. Proceedings of the 42nd International Conference on Machine Learning , year =

work page

[40] [40]

Advances in Neural Information Processing Systems 36 , pages =

Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-Optimality , author =. Advances in Neural Information Processing Systems 36 , pages =

work page

[41] [41]

Zhang, Gengwei and Wang, Liyuan and Kang, Guoliang and Chen, Ling and Wei, Yunchao , booktitle =

work page

[42] [42]

Proceedings of the

Dual-Teacher Class-Incremental Learning with Data-Free Generative Replay , author =. Proceedings of the

work page

[43] [43]

Proceedings of the

Adaptive Plasticity Improvement for Continual Learning , author =. Proceedings of the

work page

[44] [44]

Proceedings of the European Conference on Computer Vision , pages =

Visual Prompt Tuning , author =. Proceedings of the European Conference on Computer Vision , pages =

work page

[45] [45]

Proceedings of the

Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters , author =. Proceedings of the

work page

[46] [46]

International Conference on Learning Representations , year =

Lifelong Learning with Dynamically Expandable Networks , author =. International Conference on Learning Representations , year =

work page

[47] [47]

Proceedings of the 36th International Conference on Machine Learning , pages=

Learn to grow: A continual structure learning framework for overcoming catastrophic forgetting , author=. Proceedings of the 36th International Conference on Machine Learning , pages=

work page

[48] [48]

2021 , publisher =

Sokar, Ghada and Mocanu, Decebal Constantin and Pechenizkiy, Mykola , journal =. 2021 , publisher =

work page 2021

[49] [49]

Advances in Neural Information Processing Systems 30 , pages =

Gradient Episodic Memory for Continual Learning , author =. Advances in Neural Information Processing Systems 30 , pages =

work page

[50] [50]

Advances in Neural Information Processing Systems 35 , pages =

Exploring Example Influence in Continual Learning , author =. Advances in Neural Information Processing Systems 35 , pages =

work page

[51] [51]

Trends in Cognitive Sciences , volume =

Catastrophic Forgetting in Connectionist Networks , author =. Trends in Cognitive Sciences , volume =. 1999 , publisher =

work page 1999

[52] [52]

Psychological Review , volume =

Why There Are Complementary Learning Systems in the Hippocampus and Neocortex: Insights from the Successes and Failures of Connectionist Models of Learning and Memory , author =. Psychological Review , volume =. 1995 , publisher =

work page 1995

[53] [53]

Class-Incremental Learning with

Huang, Linlan and Cao, Xusheng and Lu, Haori and Liu, Xialei , booktitle =. Class-Incremental Learning with

work page

[54] [54]

Guo, Xiaohan and Cai, Yusong and Liu, Zejia and Wang, Zhengning and Pan, Lili and Li, Hongliang , journal =

work page

[55] [55]

and Ba, Jimmy , journal=

Kingma, Diederik P. and Ba, Jimmy , journal=

work page

[56] [56]

Chen, Shoufa and Ge, Chongjian and Tong, Zhan and Wang, Jiangliu and Song, Yibing and Wang, Jue and Luo, Ping , booktitle=

work page

[57] [57]

, booktitle=

Nair, Vinod and Hinton, Geoffrey E. , booktitle=. Rectified linear units improve

work page

[58] [58]

Journal of Machine Learning Research , volume =

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity , author =. Journal of Machine Learning Research , volume =

work page

[59] [59]

Huai, Tianyu and Zhou, Jie and Wu, Xingjiao and Chen, Qin and Bai, Qingchun and Zhou, Ze and He, Liang , booktitle =

work page

[60] [60]

Advances in Neural Information Processing Systems 37 , pages =

Mixture of Experts Meets Prompt-Based Continual Learning , author =. Advances in Neural Information Processing Systems 37 , pages =

work page

[61] [61]

Advances in Neural Information Processing Systems 13 , pages =

Incorporating Second-Order Functional Knowledge for Better Option Pricing , author =. Advances in Neural Information Processing Systems 13 , pages =

work page

[62] [62]

Transactions on Machine Learning Research , issn=

Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA , author=. Transactions on Machine Learning Research , issn=

work page

[63] [63]

Sun, Hai-Long and Zhou, Da-Wei and Ye, Han-Jia and Zhan, De-Chuan , journal =

work page

[64] [64]

International Conference on Learning Representations , year =

Divide and Not Forget: Ensemble of Selectively Trained Experts in Continual Learning , author =. International Conference on Learning Representations , year =

work page

[65] [65]

Sun, Hai-Long and Zhou, Da-Wei and Zhao, Hanbin and Gan, Le and Zhan, De-Chuan and Ye, Han-Jia , booktitle=

work page

[66] [66]

International Conference on Learning Representations , year =

One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-Based Continual Learning , author =. International Conference on Learning Representations , year =

work page

[67] [67]

Proceedings of the

Prototype augmentation and self-supervision for incremental learning , author=. Proceedings of the

work page

[68] [68]

Visualizing data using t-

Van der Maaten, Laurens and Hinton, Geoffrey , journal=. Visualizing data using t-

work page