Sparse Orthogonal Parameters Tuning for Continual Learning

Hai-Jian Ke; Jia-Yu Yao; Kun-Peng Ning; Li Yuan; Yong-Hong Tian; Yu-Yang Liu

arxiv: 2411.02813 · v3 · pith:IHSPOL72new · submitted 2024-11-05 · 💻 cs.LG

Pith reviewed 2026-05-23 18:03 UTC · model grok-4.3

classification 💻 cs.LG

keywords continual learning · sparse orthogonal parameters · pre-trained models · catastrophic forgetting · parameter tuning · delta parameters

The pith

Merging sparse orthogonal parameters from multiple tasks prevents catastrophic forgetting when adapting pre-trained models to streaming data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines continual learning where pre-trained models must handle successive tasks without losing prior knowledge. It introduces SoTU, which tunes sparse orthogonal parameters so that updates from different tasks can be combined without interference. The approach is tested on standard benchmarks and works without redesigning classifiers or other components. A reader would care because it simplifies adaptation to new data while keeping earlier performance intact through the orthogonality property.

Core claim

The paper claims that knowledge from multiple domains can be transformed into a fusion of orthogonal delta parameters, and that this fusion allows models to maintain feature representations across streaming tasks without catastrophic forgetting, yielding strong results on diverse continual learning benchmarks as a plug-and-play method.

What carries the argument

The fusion of orthogonal delta parameters obtained from sparse tuning on successive tasks.
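
A minimal sketch of what such a fusion could look like, assuming flat NumPy vectors as stand-ins for model weights, random masking of each task's delta, and plain summation as the merge operator. The function names `random_mask` and `merge_deltas` are hypothetical, and the text here does not say whether the paper rescales the surviving entries, so no rescaling is applied.

```python
import numpy as np

def random_mask(delta, mask_rate, rng):
    """Zero out a random fraction `mask_rate` of the entries of `delta`."""
    keep = rng.random(delta.shape) >= mask_rate
    return delta * keep

def merge_deltas(theta_pre, task_thetas, mask_rate, seed=0):
    """Add the sparsified delta of every task onto the pre-trained weights.

    Assumption of this sketch: the merge is a plain sum of masked deltas.
    """
    rng = np.random.default_rng(seed)
    merged = theta_pre.copy()
    for theta_t in task_thetas:
        delta = theta_t - theta_pre          # task-specific delta parameters
        merged += random_mask(delta, mask_rate, rng)
    return merged

# Toy usage: three "fine-tuned" weight vectors around a shared starting point.
rng = np.random.default_rng(1)
theta_pre = np.zeros(1000)
task_thetas = [theta_pre + rng.normal(size=1000) for _ in range(3)]
merged = merge_deltas(theta_pre, task_thetas, mask_rate=0.9)
print(np.count_nonzero(merged), "of", merged.size, "weights changed")
```

With a high masking rate most coordinates are touched by at most one task, which is the situation the pith describes as deltas combining without interference.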

If this is right

SoTU serves as a plug-and-play adapter for any pre-trained model on streaming data.
Optimal feature representations emerge without the need for complex classifier designs.
The method succeeds across multiple standard continual learning benchmarks.
Orthogonality in the delta parameters preserves prior task performance during fusion.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same orthogonal fusion principle could be tested on prompt-based or other parameter-efficient methods.
Limits of the approach could be probed by scaling to models with billions of parameters or longer task sequences.
Orthogonality might interact with regularization techniques to further reduce interference.
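
The question about longer task sequences can be made concrete with a small calculation. The sketch below assumes each task keeps every coordinate independently with probability `keep_prob` (one minus the masking rate); this independence is an assumption of the sketch, not a statement from the paper. The hypothetical helper `collision_fraction` returns the expected share of coordinates kept by two or more tasks, which is where merged deltas can collide.

```python
def collision_fraction(keep_prob, num_tasks):
    """Expected fraction of coordinates kept by at least two tasks,
    assuming independent Bernoulli masks with the same keep probability."""
    kept_by_none = (1 - keep_prob) ** num_tasks
    kept_by_one = num_tasks * keep_prob * (1 - keep_prob) ** (num_tasks - 1)
    return 1 - kept_by_none - kept_by_one

for num_tasks in (2, 10, 50):
    print(num_tasks, round(collision_fraction(0.1, num_tasks), 4))
```

Under these assumptions the collision share grows with the number of tasks even at a fixed 90% masking rate, which is one reason long task sequences are a natural stress test.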

Load-bearing premise

The effectiveness of SoTU lies in the transformation of knowledge learned from multiple domains into the fusion of orthogonal delta parameters.

What would settle it

An experiment that merges the same parameters without enforcing orthogonality and measures whether performance on earlier tasks drops sharply compared with the orthogonal case.
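
One reason this control matters is that random sparsity and orthogonality are hard to separate. As a back-of-envelope calculation (an assumption of this note, not a formula from the paper), take two fixed dense deltas $a$ and $b$ and independent Bernoulli masks $m$ and $m'$ that keep each coordinate with probability $q = 1 - p$, where $p$ is the masking rate. Approximating each norm by the square root of its expected square gives:

```latex
\begin{align*}
\mathbb{E}\big[\langle m \odot a,\; m' \odot b \rangle\big] &= q^2 \langle a, b \rangle, \\
\mathbb{E}\big[\lVert m \odot a \rVert^2\big] &= q \lVert a \rVert^2, \\
\cos(m \odot a,\; m' \odot b) &\approx \frac{q^2 \langle a, b \rangle}{q\, \lVert a \rVert\, \lVert b \rVert} = q \cos(a, b).
\end{align*}
```

At a 90% masking rate, $q = 0.1$, so the cosine similarity shrinks by roughly a factor of ten however aligned the dense deltas were. A control that shares one mask across tasks (so $m = m'$) would keep sparsity high while leaving the cosine roughly at $\cos(a, b)$, which is one way to vary orthogonality without varying sparsity.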

Figures

Figures reproduced from arXiv: 2411.02813 by Hai-Jian Ke, Jia-Yu Yao, Kun-Peng Ning, Li Yuan, Yong-Hong Tian, Yu-Yang Liu.

**Figure 1.** The sparse orthogonal characteristic of pre-trained parameters. We randomly sample delta parameters and merge them from multiple domains into one feature extractor with different masking rates. We show that merging high-sparsity deltas can maintain comparable or even superior performance on all three tasks, while low sparsity will cause serious parameter collisions, resulting in poor performance (right o…

**Figure 2.** The framework of SoTU. Facing the continual tasks, we first fine-tune the pre-trained ViT and obtain the task-specific …

**Figure 3.** The cosine similarity from $\Delta\hat{\theta}_1$ to $\Delta\hat{\theta}_{10}$ with 10% to 90% masking rates in CIFAR100 for 10 tasks, where the value in the i-th row and j-th column represents the similarity between $\Delta\hat{\theta}_i$ and $\Delta\hat{\theta}_j$. We demonstrate that high-sparsity deltas (randomly masking 90% of delta parameters in Figure 3e) are more orthogonal to each other. Note that $W_k$ will not participate in the following processes, and its role is onl…

**Figure 4.** Continual learning performance comparison with PTM-based methods on three difficult datasets, …

**Figure 5.** The visualization of attention maps in the delta masking and merging processes. Randomly masking delta parameters can …

**Figure 6.** The characteristics of delta masking and merging.
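
The kind of measurement described in the Figure 3 caption can be approximated on synthetic data. The sketch below is illustrative only: the toy deltas share a common component so that they start out correlated, each is masked independently at random, and the mean off-diagonal cosine similarity is reported per masking rate. All function names are hypothetical, and nothing here reproduces the paper's CIFAR100 numbers.

```python
import numpy as np

def cosine_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def masked_similarity(deltas, mask_rate, seed=0):
    """Mask each delta independently at random, then compare all pairs."""
    rng = np.random.default_rng(seed)
    keep = rng.random(deltas.shape) >= mask_rate
    return cosine_matrix(deltas * keep)

def mean_off_diagonal(sim):
    """Average similarity between distinct deltas."""
    n = sim.shape[0]
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))

# Toy deltas: a shared component makes the dense deltas correlated.
rng = np.random.default_rng(0)
shared = rng.normal(size=20000)
deltas = np.stack([shared + rng.normal(size=20000) for _ in range(4)])

for rate in (0.1, 0.5, 0.9):
    sim = masked_similarity(deltas, rate)
    print(f"mask rate {rate:.1f}: mean off-diagonal cosine {mean_off_diagonal(sim):.3f}")
```

On this toy data the off-diagonal similarity falls as the masking rate rises, in line with the caption's statement that high-sparsity deltas are more orthogonal to each other.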

Original abstract

Continual learning methods based on pre-trained models (PTM) have recently gained attention which adapt to successive downstream tasks without catastrophic forgetting. These methods typically refrain from updating the pre-trained parameters and instead employ additional adapters, prompts, and classifiers. In this paper, we from a novel perspective investigate the benefit of sparse orthogonal parameters for continual learning. We found that merging sparse orthogonality of models learned from multiple streaming tasks has great potential in addressing catastrophic forgetting. Leveraging this insight, we propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning). We hypothesize that the effectiveness of SoTU lies in the transformation of knowledge learned from multiple domains into the fusion of orthogonal delta parameters. Experimental evaluations on diverse CL benchmarks demonstrate the effectiveness of the proposed approach. Notably, SoTU achieves optimal feature representation for streaming data without necessitating complex classifier designs, making it a Plug-and-Play solution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SoTU tries to simplify continual learning on PTMs by merging sparse orthogonal deltas instead of using prompts or classifiers, but the paper does not isolate whether orthogonality is what actually prevents interference.

The main takeaway is that the authors fine-tune the pre-trained model on each new task, keep a sparse subset of the resulting delta parameters, and then fuse those deltas to handle streaming data without catastrophic forgetting. They position SoTU as a plug-and-play method that skips the usual complex classifier designs. Experiments on standard CL benchmarks are reported as showing solid results, which is the concrete evidence the paper offers. That framing is at least a practical simplification worth checking if you work in parameter-efficient continual learning.

The approach draws on existing ideas in orthogonal regularization and adapter tuning, so the contribution sits more in the specific fusion setup for sequential tasks than in a brand-new primitive.

The soft spot is exactly the one the stress test flags: there is no ablation or derivation that separates the orthogonality constraint from sparsity or from the merging step itself. Without that, it is hard to know whether the claimed benefit comes from the stated mechanism or from other choices in the implementation. The abstract also gives no equations for the orthogonality enforcement or the fusion operator, which leaves the central hypothesis under-supported.

This paper is aimed at people already working on efficient adaptation for vision or language models in continual settings. A reader who wants a lighter alternative to prompt-heavy or classifier-heavy methods could find the experiments useful to build on. I would send it to peer review so the ablations and implementation details can be examined directly.

Referee Report

3 major / 1 minor

Summary. The paper proposes SoTU (Sparse Orthogonal Parameters Tuning), a continual learning method for pre-trained models. Where existing PTM-based methods typically avoid updating the pre-trained parameters and instead add adapters, prompts, and classifiers, SoTU works with sparse delta parameters. It claims that merging sparse orthogonal delta parameters learned across streaming tasks mitigates catastrophic forgetting. The central hypothesis is that effectiveness arises from transforming multi-domain knowledge into fused orthogonal deltas. Experiments on diverse CL benchmarks are reported to show that SoTU yields optimal feature representations for streaming data without requiring complex classifier designs, positioning it as a plug-and-play solution.

Significance. If the orthogonality-based fusion mechanism can be isolated and shown to outperform sparsity or merging alone, the approach would offer a lightweight, parameter-efficient alternative to existing PTM-based CL methods. The plug-and-play aspect without complex classifiers could simplify deployment in streaming settings. However, the current presentation supplies no equations, derivations, or controlled ablations, so the significance cannot yet be assessed beyond the abstract-level assertion.

major comments (3)

[Abstract] The hypothesis that effectiveness 'lies in the transformation of knowledge learned from multiple domains into the fusion of orthogonal delta parameters' is stated without any derivation, equation, or formal argument showing why orthogonality (versus sparsity alone or alternative merging operators) is required to prevent task interference.
[Method] (inferred from abstract description) No ablation or controlled comparison is described that isolates the orthogonality constraint from the sparsity or merging mechanics; without this, performance gains cannot be attributed to the claimed orthogonal-fusion mechanism rather than other factors.
[Experiments] (inferred from abstract) The claim of 'optimal feature representation ... without necessitating complex classifier designs' is unsupported by any reported baselines, metrics, error bars, or quantitative comparisons in the provided text, leaving the plug-and-play assertion uninspectable.

minor comments (1)

[Abstract] The sentence 'We from a novel perspective investigate' is grammatically awkward and should be rephrased for readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and outline the revisions we will undertake to improve clarity and support for our claims.

Referee: [Abstract] The hypothesis that effectiveness 'lies in the transformation of knowledge learned from multiple domains into the fusion of orthogonal delta parameters' is stated without any derivation, equation, or formal argument showing why orthogonality (versus sparsity alone or alternative merging operators) is required to prevent task interference.

Authors: The abstract provides a high-level summary of the hypothesis. The full manuscript describes the SoTU method, including how sparse orthogonal delta parameters are learned and merged across tasks. To strengthen the presentation, we will add a formal discussion or derivation in the revised version that explains why the orthogonality constraint helps mitigate task interference compared to sparsity or other merging strategies alone.
Revision: yes

Referee: [Method] (inferred from abstract description) No ablation or controlled comparison is described that isolates the orthogonality constraint from the sparsity or merging mechanics; without this, performance gains cannot be attributed to the claimed orthogonal-fusion mechanism rather than other factors.

Authors: We agree that isolating the orthogonality component is valuable for attributing performance gains. Our current experiments focus on overall effectiveness, but we will incorporate targeted ablation studies in the revision comparing SoTU against variants that remove the orthogonality constraint or employ alternative merging operators.
Revision: yes

Referee: [Experiments] (inferred from abstract) The claim of 'optimal feature representation ... without necessitating complex classifier designs' is unsupported by any reported baselines, metrics, error bars, or quantitative comparisons in the provided text, leaving the plug-and-play assertion uninspectable.

Authors: The provided text consists of the abstract, which summarizes the results. The full manuscript includes experimental evaluations on diverse CL benchmarks with quantitative comparisons. In the revision, we will make the supporting baselines, metrics, and results more explicit to substantiate the claims regarding optimal feature representations and the plug-and-play nature of the approach.
Revision: yes

Circularity Check

0 steps flagged

No derivation chain or equations present; claims rest on empirical hypothesis and experiments.

full rationale

The provided abstract and context contain no equations, derivations, or load-bearing mathematical steps. The central statement is explicitly labeled a hypothesis ('We hypothesize that the effectiveness of SoTU lies in the transformation of knowledge learned from multiple domains into the fusion of orthogonal delta parameters') rather than a derived result. No self-citations, fitted parameters renamed as predictions, or ansatzes are visible. The method is presented as a plug-and-play empirical approach validated on benchmarks, with no reduction of outputs to inputs by construction. This is the common case of a method paper whose claims are not mathematically self-referential.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be identified or audited.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We hypothesize that the effectiveness of SoTU lies in the transformation of knowledge learned from multiple domains into the fusion of orthogonal delta parameters."

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "merging high-sparsity deltas ... parameter conflicts decrease and the model performance significantly improves"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 3 internal anchors

[1]

Chen, C. P. 1996. A rapid supervised learning neural network for function interpolation and approximation. IEEE Transactions on Neural Networks, 7(5): 1220--1230

work page 1996
[2]

Chen, Z.; and Liu, B. 2018. Lifelong machine learning. Synth. Lect. Artif. Intell. Mach. Learn., 12(3): 1--207

work page 2018
[3]

De Lange, M.; Aljundi, R.; Masana, M.; Parisot, S.; Jia, X.; Leonardis, A.; Slabaugh, G.; and Tuytelaars, T. 2021. A continual learning survey: Defying forgetting in classification tasks. IEEE transactions on pattern analysis and machine intelligence, 44(7): 3366--3385

work page 2021
[4]

De Lange, M.; Aljundi, R.; Masana, M.; Parisot, S.; Jia, X.; Leonardis, A., Leonardis; Slabaugh, G.; and Tuytelaars, T. 2022. A Continual Learning Survey: Defying Forgetting in Classification Tasks. IEEE Trans. Pattern Anal. Mach. Intell., 44(7): 3366--3385

work page 2022
[5]

Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, 248--255. Ieee

work page 2009
[6]

Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929

work page internal anchor Pith review Pith/arXiv arXiv 2020
[7]

Frankle, J.; and Carbin, M. 2018. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635

work page internal anchor Pith review Pith/arXiv arXiv 2018
[8]

Gao, Q.; Zhao, C.; Sun, Y.; Xi, T.; Zhang, G.; Ghanem, B.; and Zhang, J. 2023. A unified continual learning framework with general parameter-efficient tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 11483--11493

work page 2023
[9]

Hendrycks, D.; Basart, S.; Mu, N.; Kadavath, S.; Wang, F.; Dorundo, E.; Desai, R.; Zhu, T.; Parajuli, S.; Guo, M.; et al. 2021 a . The many faces of robustness: A critical analysis of out-of-distribution generalization. In Proceedings of the IEEE/CVF international conference on computer vision, 8340--8349

work page 2021
[10]

Hendrycks, D.; Zhao, K.; Basart, S.; Steinhardt, J.; and Song, D. 2021 b . Natural adversarial examples. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 15262--15271

work page 2021
[11]

Janson, P.; Zhang, W.; Aljundi, R.; and Elhoseiny, M. 2022. A simple baseline that questions the use of pretrained-models in continual learning. arXiv preprint arXiv:2210.04428

work page arXiv 2022
[12]

A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al

Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A. A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. 2017. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. U.S.A.., 114(13): 3521--3526

work page 2017
[13]

Krizhevsky, A.; Hinton, G.; et al. 2009. Learning multiple layers of features from tiny images

work page 2009
[14]

Li, Z.; and Hoiem, D. 2018. Learning without Forgetting. IEEE Trans. Pattern Anal. Mach. Intell., 40(12): 2935--2947

work page 2018
[15]

Liu, Y.; Cong, Y.; Sun, G.; Zhang, T.; Dong, J.; and Liu, H. 2021. L3DOC: Lifelong 3D Object Classification. IEEE Transactions on Image Processing, 30: 7486--7498

work page 2021
[16]

Lopez-Paz, D.; and Ranzato, M. 2017. Gradient episodic memory for continual learning. Advances in neural information processing systems, 30

work page 2017
[17]

Mai, Z.; Li, R.; Jeong, J.; Quispe, D.; Kim, H.; and Sanner, S. 2022. Online continual learning in image classification: An empirical survey. Neurocomputing, 469: 28--51

work page 2022
[18]

McCloskey, M.; and Cohen, N. J. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. Psychol. Learn. Motiv., 24: 109--165

work page 1989
[19]

D.; Gong, D.; Parvaneh, A.; Abbasnejad, E.; and van den Hengel, A

McDonnell, M. D.; Gong, D.; Parvaneh, A.; Abbasnejad, E.; and van den Hengel, A. 2024. Ranpac: Random projections and pre-trained models for continual learning. Advances in Neural Information Processing Systems, 36

work page 2024
[20]

D.; McKilliam, R

McDonnell, M. D.; McKilliam, R. G.; and de Chazal, P. 2016. On the importance of pair-wise feature correlations for image classification. In 2016 International Joint Conference on Neural Networks (IJCNN), 2290--2297. IEEE

work page 2016
[21]

Pelosin, F. 2022. Simpler is better: off-the-shelf continual learning through pretrained backbones. arXiv preprint arXiv:2205.01586

work page arXiv 2022
[22]

Rebuffi, S.-A.; Kolesnikov, A.; Sperl, G.; and Lampert, C. H. 2017 a . icarl: Incremental classifier and representation learning. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2001--2010

work page 2017
[23]

Rebuffi, S.-A.; Kolesnikov, A.; Sperl, G.; and Lampert, C. H. 2017 b . iCaRL: Incremental Classifier and Representation Learning. In IEEE Conf. Comput. Vis. Pattern Recog

work page 2017
[24]

F.; Kraaijveld, M

Schmidt, W. F.; Kraaijveld, M. A.; Duin, R. P.; et al. 1992. Feed forward neural networks with random weights. In International conference on pattern recognition, 1--1. IEEE Computer Society Press

work page 1992
[25]

S.; Karlinsky, L.; Gutta, V.; Cascante-Bonilla, P.; Kim, D.; Arbelle, A.; Panda, R.; Feris, R.; and Kira, Z

Smith, J. S.; Karlinsky, L.; Gutta, V.; Cascante-Bonilla, P.; Kim, D.; Arbelle, A.; Panda, R.; Feris, R.; and Kira, Z. 2023. Coda-prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11909--11919

work page 2023
[26]

Sun, H.-L.; Zhou, D.-W.; Ye, H.-J.; and Zhan, D.-C. 2023. Pilot: A pre-trained model-based continual learning toolbox. arXiv preprint arXiv:2309.07117

work page arXiv 2023
[27]

Wah, C.; Branson, S.; Welinder, P.; Perona, P.; and Belongie, S. 2011. The caltech-ucsd birds-200-2011 dataset

work page 2011
[28]

Wang, F.-Y.; Zhou, D.-W.; Ye, H.-J.; and Zhan, D.-C. 2022 a . Foster: Feature boosting and compression for class-incremental learning. In European conference on computer vision, 398--414. Springer

work page 2022
[29]

Wang, L.; Xie, J.; Zhang, X.; Huang, M.; Su, H.; and Zhu, J. 2024. Hierarchical decomposition of prompt-based continual learning: Rethinking obscured sub-optimality. Advances in Neural Information Processing Systems, 36

work page 2024
[30]

Wang, Y.; Ma, Z.; Huang, Z.; Wang, Y.; Su, Z.; and Hong, X. 2023. Isolation and impartial aggregation: A paradigm of incremental learning without interference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 10209--10217

work page 2023
[31]

Wang, Z.; Zhang, Z.; Ebrahimi, S.; Sun, R.; Zhang, H.; Lee, C.-Y.; Ren, X.; Su, G.; Perot, V.; Dy, J.; et al. 2022 b . Dualprompt: Complementary prompting for rehearsal-free continual learning. In European Conference on Computer Vision, 631--648. Springer

work page 2022
[32]

Wang, Z.; Zhang, Z.; Lee, C.-Y.; Zhang, H.; Sun, R.; Ren, X.; Su, G.; Perot, V.; Dy, J.; and Pfister, T. 2022 c . Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 139--149

work page 2022
[33]

Yan, S.; Xie, J.; and He, X. 2021. Der: Dynamically expandable representation for class incremental learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 3014--3023

work page 2021
[34]

Yoon, J.; Yang, E.; Lee, J.; and Hwang, S. J. 2018. Lifelong Learning with Dynamically Expandable Networks. In Int. Conf. Learn. Represent

work page 2018
[35]

Yu, L.; Yu, B.; Yu, H.; Huang, F.; and Li, Y. 2023. Language models are super mario: Absorbing abilities from homologous models as a free lunch. arXiv preprint arXiv:2311.03099

work page arXiv 2023
[36]

A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

Zhai, X.; Puigcerver, J.; Kolesnikov, A.; Ruyssen, P.; Riquelme, C.; Lucic, M.; Djolonga, J.; Pinto, A. S.; Neumann, M.; Dosovitskiy, A.; et al. 2019. A large-scale study of representation learning with the visual task adaptation benchmark. arXiv preprint arXiv:1910.04867

work page internal anchor Pith review Pith/arXiv arXiv 2019
[37]

Zhang, G.; Wang, L.; Kang, G.; Chen, L.; and Wei, Y. 2023. Slca: Slow learner with classifier alignment for continual learning on a pre-trained model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 19148--19158

work page 2023
[38]

Zhou, D.-W.; Sun, H.-L.; Ning, J.; Ye, H.-J.; and Zhan, D.-C. 2024. Continual Learning with Pre-Trained Models: A Survey. arXiv preprint arXiv:2401.16386

work page arXiv 2024
[39]

Zhou, D.-W.; Wang, Q.-W.; Qi, Z.-H.; Ye, H.-J.; Zhan, D.-C.; and Liu, Z. 2023 a . Deep class-incremental learning: A survey. arXiv preprint arXiv:2302.03648

work page arXiv 2023
[40]

Zhou, D.-W.; Wang, Q.-W.; Ye, H.-J.; and Zhan, D.-C. 2022. A model or 603 exemplars: Towards memory-efficient class-incremental learning. arXiv preprint arXiv:2205.13218

work page arXiv 2022
[41]

Zhou, D.-W.; Ye, H.-J.; Zhan, D.-C.; and Liu, Z. 2023 b . Revisiting class-incremental learning with pre-trained models: Generalizability and adaptivity are all you need. arXiv preprint arXiv:2303.07338

work page arXiv 2023
[42]

, " * write output.state after.block = add.period write newline

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.a...

work page
[43]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[1] [1]

Chen, C. P. 1996. A rapid supervised learning neural network for function interpolation and approximation. IEEE Transactions on Neural Networks, 7(5): 1220--1230

work page 1996

[2] [2]

Chen, Z.; and Liu, B. 2018. Lifelong machine learning. Synth. Lect. Artif. Intell. Mach. Learn., 12(3): 1--207

work page 2018

[3] [3]

De Lange, M.; Aljundi, R.; Masana, M.; Parisot, S.; Jia, X.; Leonardis, A.; Slabaugh, G.; and Tuytelaars, T. 2021. A continual learning survey: Defying forgetting in classification tasks. IEEE transactions on pattern analysis and machine intelligence, 44(7): 3366--3385

work page 2021

[4] [4]

De Lange, M.; Aljundi, R.; Masana, M.; Parisot, S.; Jia, X.; Leonardis, A., Leonardis; Slabaugh, G.; and Tuytelaars, T. 2022. A Continual Learning Survey: Defying Forgetting in Classification Tasks. IEEE Trans. Pattern Anal. Mach. Intell., 44(7): 3366--3385

work page 2022

[5] [5]

Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, 248--255. Ieee

work page 2009

[6] [6]

Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929

work page internal anchor Pith review Pith/arXiv arXiv 2020

[7] [7]

Frankle, J.; and Carbin, M. 2018. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635

work page internal anchor Pith review Pith/arXiv arXiv 2018

[8] [8]

Gao, Q.; Zhao, C.; Sun, Y.; Xi, T.; Zhang, G.; Ghanem, B.; and Zhang, J. 2023. A unified continual learning framework with general parameter-efficient tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 11483--11493

work page 2023

[9] [9]

Hendrycks, D.; Basart, S.; Mu, N.; Kadavath, S.; Wang, F.; Dorundo, E.; Desai, R.; Zhu, T.; Parajuli, S.; Guo, M.; et al. 2021 a . The many faces of robustness: A critical analysis of out-of-distribution generalization. In Proceedings of the IEEE/CVF international conference on computer vision, 8340--8349

work page 2021

[10] [10]

Hendrycks, D.; Zhao, K.; Basart, S.; Steinhardt, J.; and Song, D. 2021 b . Natural adversarial examples. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 15262--15271

work page 2021

[11] [11]

Janson, P.; Zhang, W.; Aljundi, R.; and Elhoseiny, M. 2022. A simple baseline that questions the use of pretrained-models in continual learning. arXiv preprint arXiv:2210.04428

work page arXiv 2022

[12] [12]

A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al

Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A. A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. 2017. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. U.S.A.., 114(13): 3521--3526

work page 2017

[13] [13]

Krizhevsky, A.; Hinton, G.; et al. 2009. Learning multiple layers of features from tiny images

work page 2009

[14] [14]

Li, Z.; and Hoiem, D. 2018. Learning without Forgetting. IEEE Trans. Pattern Anal. Mach. Intell., 40(12): 2935--2947

work page 2018

[15] [15]

Liu, Y.; Cong, Y.; Sun, G.; Zhang, T.; Dong, J.; and Liu, H. 2021. L3DOC: Lifelong 3D Object Classification. IEEE Transactions on Image Processing, 30: 7486--7498

work page 2021

[16] [16]

Lopez-Paz, D.; and Ranzato, M. 2017. Gradient episodic memory for continual learning. Advances in neural information processing systems, 30

work page 2017

[17] [17]

Mai, Z.; Li, R.; Jeong, J.; Quispe, D.; Kim, H.; and Sanner, S. 2022. Online continual learning in image classification: An empirical survey. Neurocomputing, 469: 28--51

work page 2022

[18] [18]

McCloskey, M.; and Cohen, N. J. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. Psychol. Learn. Motiv., 24: 109--165

work page 1989

[19] [19]

D.; Gong, D.; Parvaneh, A.; Abbasnejad, E.; and van den Hengel, A

McDonnell, M. D.; Gong, D.; Parvaneh, A.; Abbasnejad, E.; and van den Hengel, A. 2024. Ranpac: Random projections and pre-trained models for continual learning. Advances in Neural Information Processing Systems, 36

work page 2024

[20] [20]

D.; McKilliam, R

McDonnell, M. D.; McKilliam, R. G.; and de Chazal, P. 2016. On the importance of pair-wise feature correlations for image classification. In 2016 International Joint Conference on Neural Networks (IJCNN), 2290--2297. IEEE

work page 2016

[21] [21]

Pelosin, F. 2022. Simpler is better: off-the-shelf continual learning through pretrained backbones. arXiv preprint arXiv:2205.01586

work page arXiv 2022

[22] [22]

Rebuffi, S.-A.; Kolesnikov, A.; Sperl, G.; and Lampert, C. H. 2017 a . icarl: Incremental classifier and representation learning. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2001--2010

work page 2017

[23] [23]

Rebuffi, S.-A.; Kolesnikov, A.; Sperl, G.; and Lampert, C. H. 2017 b . iCaRL: Incremental Classifier and Representation Learning. In IEEE Conf. Comput. Vis. Pattern Recog

work page 2017

[24] [24]

F.; Kraaijveld, M

Schmidt, W. F.; Kraaijveld, M. A.; Duin, R. P.; et al. 1992. Feed forward neural networks with random weights. In International conference on pattern recognition, 1--1. IEEE Computer Society Press

work page 1992

[25] [25]

S.; Karlinsky, L.; Gutta, V.; Cascante-Bonilla, P.; Kim, D.; Arbelle, A.; Panda, R.; Feris, R.; and Kira, Z

Smith, J. S.; Karlinsky, L.; Gutta, V.; Cascante-Bonilla, P.; Kim, D.; Arbelle, A.; Panda, R.; Feris, R.; and Kira, Z. 2023. Coda-prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11909--11919

work page 2023

[26] [26]

Sun, H.-L.; Zhou, D.-W.; Ye, H.-J.; and Zhan, D.-C. 2023. PILOT: A pre-trained model-based continual learning toolbox. arXiv preprint arXiv:2309.07117

[27]

Wah, C.; Branson, S.; Welinder, P.; Perona, P.; and Belongie, S. 2011. The Caltech-UCSD Birds-200-2011 dataset

[28]

Wang, F.-Y.; Zhou, D.-W.; Ye, H.-J.; and Zhan, D.-C. 2022a. FOSTER: Feature boosting and compression for class-incremental learning. In European Conference on Computer Vision, 398--414. Springer

[29]

Wang, L.; Xie, J.; Zhang, X.; Huang, M.; Su, H.; and Zhu, J. 2024. Hierarchical decomposition of prompt-based continual learning: Rethinking obscured sub-optimality. Advances in Neural Information Processing Systems, 36

[30]

Wang, Y.; Ma, Z.; Huang, Z.; Wang, Y.; Su, Z.; and Hong, X. 2023. Isolation and impartial aggregation: A paradigm of incremental learning without interference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 10209--10217

[31]

Wang, Z.; Zhang, Z.; Ebrahimi, S.; Sun, R.; Zhang, H.; Lee, C.-Y.; Ren, X.; Su, G.; Perot, V.; Dy, J.; et al. 2022b. DualPrompt: Complementary prompting for rehearsal-free continual learning. In European Conference on Computer Vision, 631--648. Springer

[32]

Wang, Z.; Zhang, Z.; Lee, C.-Y.; Zhang, H.; Sun, R.; Ren, X.; Su, G.; Perot, V.; Dy, J.; and Pfister, T. 2022c. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 139--149

[33]

Yan, S.; Xie, J.; and He, X. 2021. DER: Dynamically expandable representation for class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3014--3023

[34]

Yoon, J.; Yang, E.; Lee, J.; and Hwang, S. J. 2018. Lifelong Learning with Dynamically Expandable Networks. In Int. Conf. Learn. Represent

[35]

Yu, L.; Yu, B.; Yu, H.; Huang, F.; and Li, Y. 2023. Language models are Super Mario: Absorbing abilities from homologous models as a free lunch. arXiv preprint arXiv:2311.03099

[36]

Zhai, X.; Puigcerver, J.; Kolesnikov, A.; Ruyssen, P.; Riquelme, C.; Lucic, M.; Djolonga, J.; Pinto, A. S.; Neumann, M.; Dosovitskiy, A.; et al. 2019. A large-scale study of representation learning with the visual task adaptation benchmark. arXiv preprint arXiv:1910.04867

[37]

Zhang, G.; Wang, L.; Kang, G.; Chen, L.; and Wei, Y. 2023. SLCA: Slow learner with classifier alignment for continual learning on a pre-trained model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 19148--19158

[38]

Zhou, D.-W.; Sun, H.-L.; Ning, J.; Ye, H.-J.; and Zhan, D.-C. 2024. Continual Learning with Pre-Trained Models: A Survey. arXiv preprint arXiv:2401.16386

[39]

Zhou, D.-W.; Wang, Q.-W.; Qi, Z.-H.; Ye, H.-J.; Zhan, D.-C.; and Liu, Z. 2023a. Deep class-incremental learning: A survey. arXiv preprint arXiv:2302.03648

[40]

Zhou, D.-W.; Wang, Q.-W.; Ye, H.-J.; and Zhan, D.-C. 2022. A model or 603 exemplars: Towards memory-efficient class-incremental learning. arXiv preprint arXiv:2205.13218

[41]

Zhou, D.-W.; Ye, H.-J.; Zhan, D.-C.; and Liu, Z. 2023b. Revisiting class-incremental learning with pre-trained models: Generalizability and adaptivity are all you need. arXiv preprint arXiv:2303.07338
