pith. machine review for the scientific record.

arxiv: 2605.07512 · v1 · submitted 2026-05-08 · 💻 cs.CV

Recognition: 2 theorem links


Hierarchical Dual-Subspace Decoupling for Continual Learning in Vision-Language Models


Pith reviewed 2026-05-11 02:38 UTC · model grok-4.3

classification 💻 cs.CV
keywords continual learning · class-incremental learning · vision-language models · subspace decoupling · parameter drift · catastrophic forgetting · singular value decomposition

The pith

Decomposing parameter updates into general and task-specific subspaces reduces interference and forgetting in vision-language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that sequential tasks produce parameter updates occupying overlapping low-rank subspaces, which creates cross-task interference and causes catastrophic forgetting in vision-language models. It introduces the Hierarchical Dual-Subspace Decoupling (HDSD) framework, which uses a lightweight Feature Modulation Module to split the parameter space explicitly into general and task-specific parts. A General Fusion Module then identifies stable, transferable knowledge through relative-change evaluation with an adaptive threshold, while a Hierarchical Learning Module applies singular value decomposition with scaling to keep updates inside separate subspace scales. Experiments on standard class-incremental benchmarks show this combination yields state-of-the-art retention of prior knowledge alongside acquisition of new classes. A reader would care because the method shifts focus from restricting the size of updates to controlling their geometric structure.

Core claim

From a subspace perspective, updates induced by different tasks tend to lie in multiple overlapping low-rank subspaces, leading to cross-task subspace interference and severe forgetting. The Hierarchical Dual-Subspace Decoupling framework explicitly decomposes the parameter space into general and task-specific subspaces via a lightweight Feature Modulation Module, evaluates relative parameter changes across tasks with an adaptive threshold in the General Fusion Module to capture stable knowledge, and performs structured parameter decomposition via singular value decomposition together with a scaling mechanism in the Hierarchical Learning Module to constrain updates within distinct subspace scales.
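The paper does not spell out the HLM's decomposition in equation form here, so the following is only a rough illustration of the general idea, not the paper's method: a layer's parameter update can be split via truncated SVD into a dominant low-rank part (a candidate "general" component) and a residual (a candidate "task-specific" component).

```python
import numpy as np

def split_update_by_svd(delta_w, k):
    """Split a parameter update into a dominant part (top-k singular
    directions) and the residual via truncated SVD. Illustrative only:
    the paper's HLM additionally applies a scaling mechanism that is
    not reproduced here.
    """
    U, s, Vt = np.linalg.svd(delta_w, full_matrices=False)
    general = (U[:, :k] * s[:k]) @ Vt[:k, :]  # top-k reconstruction
    specific = delta_w - general              # residual component
    return general, specific

# Toy layer update: a rank-2 signal plus small noise
rng = np.random.default_rng(0)
delta = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 64)) \
        + 0.01 * rng.normal(size=(64, 64))
gen, spec = split_update_by_svd(delta, k=2)
# The residual norm is far smaller than the dominant part's norm
print(np.linalg.norm(gen), np.linalg.norm(spec))
```

The two parts sum exactly back to the original update, so any constraint applied to one subspace leaves the other untouched, which is the property the decoupling argument relies on.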

What carries the argument

A Hierarchical Dual-Subspace Decoupling (HDSD) framework that splits parameters through a Feature Modulation Module, then applies a General Fusion Module for adaptive stable-knowledge selection and a Hierarchical Learning Module for SVD-based, scale-constrained decomposition.

If this is right

  • Models acquire new classes while retaining performance on earlier ones by preserving transferable knowledge in the general subspace.
  • Parameter drift is limited because updates are forced into distinct scale-separated subspaces rather than allowed to overlap freely.
  • The method achieves state-of-the-art results on conventional class-incremental learning benchmarks for vision-language models.
  • Cross-task interference is lowered through explicit decomposition instead of implicit regularization alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same subspace-overlap diagnosis could be tested in continual learning settings for large language models where parameter drift is also observed.
  • The dual decomposition could be combined with existing regularization or replay methods to produce additive gains in retention.
  • Running the modules on longer task sequences would reveal whether the low-rank subspace assumption holds as task count grows.

Load-bearing premise

That updates from different tasks occupy overlapping low-rank subspaces which can be explicitly separated into general and task-specific components to reduce interference.
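This premise is measurable. A minimal sketch (our construction; no such diagnostic appears in the text provided) computes the principal angles between the column spaces of two tasks' update matrices: near-zero angles indicate shared, overlapping directions, while angles near 90° indicate separable subspaces.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column
    spaces of A and B, via the SVD of Qa.T @ Qb (Bjorck-Golub)."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))

rng = np.random.default_rng(1)
shared = rng.normal(size=(100, 3))  # directions both tasks update
task1 = np.hstack([shared, rng.normal(size=(100, 2))])
task2 = np.hstack([shared, rng.normal(size=(100, 2))])
angles = principal_angles(task1, task2)
# First three angles are near zero (shared subspace);
# the rest are close to pi/2 (nearly orthogonal random directions)
print(np.round(np.degrees(angles), 1))
```

If the premise holds, task updates measured this way would show many small principal angles; if the split is effective, the task-specific components should show large ones.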

What would settle it

A direct measurement on a standard benchmark: if the decomposition modules leave subspace overlap and parameter drift unchanged, or if accuracy on previous classes is unaffected when the dual-subspace split is removed, the core claim fails.

Figures

Figures reproduced from arXiv: 2605.07512 by Cheng Deng, Kun Wei, Mengxin Qin, Xiang Zhang, Xu Yang.

Figure 1. Subspace overlap on CIFAR-100 (B0 Inc10). Vision–language pre-trained models (VLMs) like CLIP [3] have emerged as a promising paradigm for CIL.
Figure 2. Overview of the proposed HDSD; the FMM consists of the GFM and the HLM.
Figure 3. Summary of the proposed approach; (a) the overall structure features a Feature Modulation Module.
Figure 4. The training phase of the proposed hierarchical approach in HLM.
Figure 5. The test phase of the proposed hierarchical approach in HLM.
Figure 6. Accuracy curves of RAPF and the proposed method on ImageNet-R.
Figure 7. Performance under different threshold values on ImageNet-100 (B0 Inc10); the threshold τ used in the General Fusion Module (GFM) is set from the distribution of the relative parameter change Γ.
Figure 8. Additional accuracy curves on ImageNet-100 and CIFAR-100 under different base-session settings.
read the original abstract

Class-incremental learning aims to continuously acquire new knowledge while preserving previously learned information, thereby mitigating catastrophic forgetting. Existing methods primarily restrict parameter updates but often overlook their structural properties in high-dimensional spaces. From a subspace perspective, updates induced by different tasks tend to lie in multiple overlapping low-rank subspaces, leading to cross-task subspace interference and severe forgetting. To address this issue, we propose HDSD, a Hierarchical Dual-Subspace Decoupling framework for continual learning in vision-language models. Specifically, we introduce a lightweight Feature Modulation Module (FMM) that explicitly decomposes the parameter space into general and task-specific subspaces. Building on this design, we develop two complementary components. First, a General Fusion Module (GFM) evaluates relative parameter changes across tasks and uses an adaptive threshold to capture stable and transferable knowledge. Second, a Hierarchical Learning Module (HLM) performs structured parameter decomposition via Singular Value Decomposition (SVD) and uses a scaling mechanism to constrain updates within distinct subspace scales. Together, these designs reduce subspace interference and parameter drift. Extensive experiments on conventional benchmarks show that HDSD achieves state-of-the-art results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes HDSD, a Hierarchical Dual-Subspace Decoupling framework for class-incremental continual learning in vision-language models. It posits that task-induced parameter updates occupy overlapping low-rank subspaces causing interference and forgetting, and introduces three modules to address this: the Feature Modulation Module (FMM) for explicit decomposition into general and task-specific subspaces, the General Fusion Module (GFM) that applies an adaptive threshold to retain stable knowledge, and the Hierarchical Learning Module (HLM) that uses SVD-based decomposition plus a scaling mechanism to constrain updates at different subspace scales. The central claim is that these designs reduce subspace interference and parameter drift, yielding state-of-the-art results on standard benchmarks.

Significance. If the mechanistic claims and empirical gains are substantiated, the work could meaningfully advance continual learning by shifting focus from generic regularization to explicit structural decomposition of parameter updates in high-dimensional VLM spaces. The subspace perspective is a potentially useful lens, but its impact hinges on demonstrating that the proposed modules causally mitigate interference rather than acting through incidental regularization.

major comments (3)
  1. [Abstract] The premise that 'updates induced by different tasks tend to lie in multiple overlapping low-rank subspaces, leading to cross-task subspace interference' is asserted without any supporting diagnostic (e.g., principal angles between task subspaces, cosine similarity of update directions, or effective rank of task gradients). Consequently, it is unclear whether the reported gains arise from the dual-subspace decoupling or from the generic effects of the adaptive threshold and scaling.
  2. [Method] (FMM/GFM/HLM) The adaptive threshold in GFM and the scaling mechanism in HLM are listed as free parameters, yet no derivation, optimization procedure, or ablation isolating their contribution to subspace separation is provided. This leaves open the possibility that performance improvements are driven by added constraints rather than the claimed hierarchical dual-subspace structure.
  3. [Experiments] The SOTA claim is stated without reference to specific tables, error bars, or module-wise ablations. Without quantitative evidence that interference metrics improve post-decomposition (or that baselines with equivalent regularization do not match the gains), the causal link between the proposed modules and reduced forgetting remains unverified.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by naming the specific benchmarks and providing at least one quantitative result (e.g., average accuracy or forgetting rate) to ground the SOTA assertion.
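One of the diagnostics the referee asks for, effective rank, has a standard soft definition: the exponential of the entropy of the normalized singular values (Roy and Vetterli, 2007). A sketch, again our construction rather than anything reported in the paper:

```python
import numpy as np

def effective_rank(M, eps=1e-12):
    """Effective rank = exp(entropy of normalized singular values);
    a soft count of how many directions an update really uses."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]                       # drop numerically-zero mass
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(2)
low_rank = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 50))
full_rank = rng.normal(size=(50, 50))
# A rank-4 update scores near 4; an unstructured one scores far higher
print(effective_rank(low_rank), effective_rank(full_rank))
```

A low effective rank for per-task updates would support the low-rank half of the paper's premise; overlap between tasks still needs a separate measurement such as principal angles.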

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback. We agree that stronger empirical grounding for the subspace interference premise, clearer justification for the hyperparameters, and more explicit experimental reporting would strengthen the manuscript. Below we respond point-by-point to the major comments and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract] The premise that 'updates induced by different tasks tend to lie in multiple overlapping low-rank subspaces, leading to cross-task subspace interference' is asserted without any supporting diagnostic (e.g., principal angles between task subspaces, cosine similarity of update directions, or effective rank of task gradients). Consequently, it is unclear whether the reported gains arise from the dual-subspace decoupling or from the generic effects of the adaptive threshold and scaling.

    Authors: We acknowledge that the abstract states the subspace-overlap premise without accompanying diagnostics. This observation originated from preliminary gradient analyses we performed during method development. In the revised manuscript we will add a dedicated diagnostic subsection (likely in Section 3 or the appendix) that reports (i) principal angles between the dominant subspaces of task-specific updates, (ii) average cosine similarity of update directions across consecutive tasks, and (iii) effective rank of the task gradients before and after each module. These metrics will be computed on the same benchmarks used for the main results, thereby directly linking the claimed interference reduction to the performance gains. revision: yes

  2. Referee: [Method] The adaptive threshold in GFM and the scaling mechanism in HLM are listed as free parameters, yet no derivation, optimization procedure, or ablation isolating their contribution to subspace separation is provided. This leaves open the possibility that performance improvements are driven by added constraints rather than the claimed hierarchical dual-subspace structure.

    Authors: The adaptive threshold in GFM is computed on-the-fly from the relative magnitude of parameter changes across tasks (specifically, the ratio of Frobenius norms of consecutive task updates), while the scaling factors in HLM are derived from the singular-value spectrum obtained by SVD on the task-specific subspace. Both are therefore data-dependent rather than purely free hyperparameters; their only tunable aspect is a small set of scaling coefficients that we select via grid search on a 5% held-out validation split of the first task. In the revision we will (i) provide a short stability-analysis derivation showing why these choices preserve general knowledge, (ii) report the exact search ranges and selected values, and (iii) add an ablation table that isolates the contribution of each mechanism to the measured subspace-separation metrics (principal angles and overlap ratios). revision: yes

  3. Referee: [Experiments] The SOTA claim is stated without reference to specific tables, error bars, or module-wise ablations. Without quantitative evidence that interference metrics improve post-decomposition (or that baselines with equivalent regularization do not match the gains), the causal link between the proposed modules and reduced forgetting remains unverified.

    Authors: We agree that the current experimental section would benefit from more explicit cross-references and additional controls. In the revised version we will: (a) explicitly cite the main result tables (currently Tables 1–3) when claiming SOTA performance, (b) report mean and standard deviation over three random seeds for all methods, (c) add a module-wise ablation study that measures both accuracy and the same interference diagnostics (principal angles, cosine similarity, effective rank) before and after each component, and (d) include a controlled comparison against baselines that receive equivalent regularization strength but lack the explicit dual-subspace decomposition. These additions will make the causal contribution of the hierarchical decoupling clearer. revision: yes
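The rebuttal describes the GFM threshold as data-dependent, set from the distribution of the relative parameter change Γ via a percentile q (this matches Figure 7's description). The elementwise definition of Γ below is our assumption; the paper may define it per layer or per module instead.

```python
import numpy as np

def stable_mask(prev_w, new_w, q=90.0):
    """Flag parameters whose relative change falls below an adaptive,
    percentile-based threshold tau; these would be treated as stable
    (general) knowledge. Elementwise Gamma is an assumption here.
    """
    gamma = np.abs(new_w - prev_w) / (np.abs(prev_w) + 1e-8)
    tau = np.percentile(gamma, q)  # adaptive threshold from Gamma's distribution
    return gamma <= tau

rng = np.random.default_rng(3)
w0 = rng.normal(size=(8, 8))
w1 = w0 + 0.01 * rng.normal(size=(8, 8))  # small drift on most entries
w1[0, 0] += 5.0                           # one parameter changes a lot
mask = stable_mask(w0, w1, q=90.0)
# The outlier is flagged unstable; roughly 90% of entries pass as stable
print(mask[0, 0], mask.mean())
```

Because τ adapts to Γ's own distribution, the same q keeps a fixed fraction of parameters regardless of the overall update scale, which is plausibly why the authors call it data-dependent rather than a free hyperparameter.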

Circularity Check

0 steps flagged

No significant circularity; method is an architectural response to an independently stated subspace assumption

full rationale

The paper states an assumption about overlapping low-rank subspaces causing interference, then directly proposes HDSD with FMM (decomposition into general/task-specific subspaces), GFM (adaptive threshold for stable knowledge), and HLM (SVD plus scaling) as mitigation. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. Results are reported on external conventional benchmarks, keeping the derivation self-contained and falsifiable without reduction to its own inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 3 invented entities

The central claim rests on a domain assumption about subspace overlap plus several newly introduced modules whose effectiveness is not independently evidenced in the abstract.

free parameters (2)
  • adaptive threshold
    Used in the General Fusion Module to capture stable knowledge; value selection mechanism not specified.
  • scaling mechanism parameters
    Applied in the Hierarchical Learning Module to constrain updates at distinct subspace scales.
axioms (1)
  • domain assumption Updates induced by different tasks tend to lie in multiple overlapping low-rank subspaces, leading to cross-task subspace interference
    Directly stated in the abstract as the motivation for the approach.
invented entities (3)
  • Feature Modulation Module (FMM) no independent evidence
    purpose: Explicitly decomposes the parameter space into general and task-specific subspaces
    Lightweight module introduced as the foundation of the framework.
  • General Fusion Module (GFM) no independent evidence
    purpose: Evaluates relative parameter changes across tasks using an adaptive threshold
    New component for capturing stable knowledge.
  • Hierarchical Learning Module (HLM) no independent evidence
    purpose: Performs structured parameter decomposition via SVD and applies scaling to constrain updates
    New component for hierarchical subspace handling.

pith-pipeline@v0.9.0 · 5503 in / 1390 out tokens · 38293 ms · 2026-05-11T02:38:08.071627+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

  1. [1] Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017.
  2. [2] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2001–2010, 2017.
  3. [3] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
  4. [4] Da-Wei Zhou, Yuanhan Zhang, Yan Wang, Jingyi Ning, Han-Jia Ye, De-Chuan Zhan, and Ziwei Liu. Learning without forgetting for vision-language models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
  5. [5] Linlan Huang, Xusheng Cao, Haori Lu, and Xialei Liu. Class-incremental learning with CLIP: Adaptive representation adjustment and parameter fusion. In European Conference on Computer Vision, pages 214–231. Springer, 2024.
  6. [6] Marc Masana, Xialei Liu, Bartłomiej Twardowski, Mikel Menta, Andrew D Bagdanov, and Joost Van De Weijer. Class-incremental learning: survey and performance evaluation on image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):5513–5533, 2022.
  7. [7] Prithviraj Dhar, Rajat Vikram Singh, Kuan-Chuan Peng, Ziyan Wu, and Rama Chellappa. Learning without memorizing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5138–5146, 2019.
  8. [8] Sahil Nokhwal and Nirman Kumar. RTRA: Rapid training of regularization-based approaches in continual learning. In 2023 10th International Conference on Soft Computing & Machine Intelligence (ISCMI), pages 188–192. IEEE, 2023.
  9. [9] Zhicheng Sun, Yadong Mu, and Gang Hua. Regularizing second-order influences for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20166–20175, 2023.
  10. [10] Yuanhang Zhang, Zhidi Lin, Yiyong Sun, Feng Yin, and Carsten Fritsche. Regularization-based efficient continual learning in deep state-space models. In 2024 27th International Conference on Information Fusion (FUSION), pages 1–8. IEEE, 2024.
  11. [11] Kun Wei, Da Chen, Yuhong Li, Xu Yang, Cheng Deng, and Dacheng Tao. Incremental embedding learning with disentangled representation translation. IEEE Transactions on Neural Networks and Learning Systems, 35(3):3821–3833, 2022.
  12. [12] Ameya Prabhu, Philip HS Torr, and Puneet K Dokania. GDumb: A simple approach that questions our progress in continual learning. In European Conference on Computer Vision, pages 524–540. Springer, 2020.
  13. [13] Liyuan Wang, Bo Lei, Qian Li, Hang Su, Jun Zhu, and Yi Zhong. Triple-memory networks: A brain-inspired method for continual learning. IEEE Transactions on Neural Networks and Learning Systems, 33(5):1925–1934, 2021.
  14. [14] Sumohana Channappayya, Bheemarjuna Reddy Tamma, et al. Augmented memory replay-based continual learning approaches for network intrusion detection. Advances in Neural Information Processing Systems, 36:17156–17169, 2023.
  15. [15] Yuxuan Li, Tianxin Xie, Chenang Liu, and Zhangyue Shi. Pseudo replay-based class continual learning for online new category anomaly detection in advanced manufacturing. IISE Transactions, 57(12):1407–1421, 2025.
  16. [16] Yanan Gu, Xu Yang, Kun Wei, and Cheng Deng. Not just selection, but exploration: Online class-incremental continual learning via dual view consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7442–7451, 2022.
  17. [17] Liyuan Wang, Xingxing Zhang, Qian Li, Jun Zhu, and Yi Zhong. CoSCL: Cooperation of small continual learners is stronger than a big one. In European Conference on Computer Vision, pages 254–271. Springer, 2022.
  18. [18] Binbin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, and Xiaodan Liang. Continual object detection via prototypical task correlation guided gating mechanism. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9255–9264, 2022.
  19. [19] Peiyan Zhang, Yuchen Yan, Chaozhuo Li, Senzhang Wang, Xing Xie, Guojie Song, and Sunghun Kim. Continual learning on dynamic graphs via parameter isolation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 601–611, 2023.
  20. [20] Yabin Wang, Zhiheng Ma, Zhiwu Huang, Yaowei Wang, Zhou Su, and Xiaopeng Hong. Isolation and impartial aggregation: A paradigm of incremental learning without interference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 10209–10217, 2023.
  21. [21] Xi Wang, Xu Yang, Kun Wei, Yanan Gu, and Cheng Deng. Class incremental learning via contrastive complementary augmentation. IEEE Transactions on Image Processing, 2025.
  22. [22] Zangwei Zheng, Mingyuan Ma, Kai Wang, Ziheng Qin, Xiangyu Yue, and Yang You. Preventing zero-shot transfer degradation in continual learning of vision-language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19125–19136, 2023.
  23. [23] Tyler L Hayes and Christopher Kanan. Lifelong machine learning with deep streaming linear discriminant analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 220–221, 2020.
  24. [24] Yu-Ming Tang, Yi-Xing Peng, and Wei-Shi Zheng. When prompt-based incremental learning does not meet strong pretraining. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1706–1716, 2023.
  25. [25] Baoquan Zhang, Xutao Li, Yunming Ye, Zhichao Huang, and Lisai Zhang. Prototype completion with primitive knowledge for few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3754–3762, 2021.
  26. [26] Qiankun Gao, Chen Zhao, Yifan Sun, Teng Xi, Gang Zhang, Bernard Ghanem, and Jian Zhang. A unified continual learning framework with general parameter-efficient tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11483–11493, 2023.
  27. [27] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  28. [28] Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8340–8349, 2021.
  29. [29] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 2012.
  30. [30] Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, and Tomas Pfister. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 139–149, 2022.
  31. [31] Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, et al. DualPrompt: Complementary prompting for rehearsal-free continual learning. In European Conference on Computer Vision, pages 631–648. Springer, 2022.
  32. [32] James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, and Zsolt Kira. CODA-Prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11909–11919, 2023.
  33. [33] Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye, De-Chuan Zhan, and Ziwei Liu. Revisiting class-incremental learning with pre-trained models: Generalizability and adaptivity are all you need. International Journal of Computer Vision, 133(3):1012–1032, 2025.
  34. [34] Da-Wei Zhou, Kai-Wen Li, Jingyi Ning, Han-Jia Ye, Lijun Zhang, and De-Chuan Zhan. External knowledge injection for CLIP-based class-incremental learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3314–3325, 2025.
  35. [35] Lan Li, Tao Hu, Da-Wei Zhou, Jia-Qi Yang, Han-Jia Ye, and De-Chuan Zhan. BOFA: Bridge-layer orthogonal low-rank fusion for CLIP-based class-incremental learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 22967–22975, 2026.