Continual Learning for Multiple Modalities

Eunwoo Kim; Hyundong Jin

arxiv: 2503.08064 · v2 · pith:OWIRQHIYnew · submitted 2025-03-11 · 💻 cs.CV · cs.AI

keywords: modalities, knowledge, continual, learning, multiple, information, learned, modality

Abstract

Continual learning aims to learn knowledge of tasks observed in sequential time steps while mitigating the forgetting of previously learned knowledge. Existing methods were designed to learn a single modality (e.g., image) over time, which limits their applicability in scenarios involving multiple modalities. In this work, we propose a novel continual learning framework that accommodates multiple modalities (image, video, audio, depth, and text). We train a model to align various modalities with text, leveraging its rich semantic information. However, this increases the risk of forgetting previously learned knowledge, a risk exacerbated by the differing input traits across tasks. To alleviate the overwriting of previous knowledge of modalities, we propose a framework that consolidates intra-modal knowledge while incorporating relevant inter-modal information. This is achieved by self-regulating shifts in learned representations so that novel knowledge is gradually integrated into the information retained across modalities. Simultaneously, the framework mitigates inter-modal interference by selectively integrating knowledge from previously encountered modalities based on their mutual relevance. Furthermore, we introduce a strategy to re-align modality embeddings, effectively addressing biased alignment between modalities. We evaluate the proposed method in a wide range of continual learning scenarios using multiple datasets with different modalities. Extensive experiments demonstrate that our method outperforms existing methods in these scenarios, regardless of whether the identity of the modality is given.
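
The abstract says each modality is aligned with text but does not state the training objective. As a rough illustration only, the sketch below assumes a standard contrastive (InfoNCE-style) loss between paired modality and text embeddings. This is a swapped-in, commonly used technique, not the authors' method. The function names, the temperature value, and the toy data are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def text_alignment_loss(modality_emb, text_emb, temperature=0.07):
    """One-directional InfoNCE loss (assumed objective, not from the paper).

    Row i of modality_emb is treated as the positive pair of row i of text_emb;
    all other rows of text_emb act as negatives.
    """
    m = l2_normalize(modality_emb)
    t = l2_normalize(text_emb)
    logits = m @ t.T / temperature                       # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # mean negative log-likelihood of positives


# Toy data: 4 text embeddings and modality embeddings that nearly match them.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 32))
aligned = text + 0.01 * rng.normal(size=(4, 32))
shuffled = aligned[::-1]  # break the pairing by reversing the row order

loss_aligned = text_alignment_loss(aligned, text)
loss_shuffled = text_alignment_loss(shuffled, text)
print(f"aligned: {loss_aligned:.4f}  shuffled: {loss_shuffled:.4f}")
```

Correctly paired embeddings should yield a lower loss than mismatched ones. In a continual setting, a loss of this kind would be minimized for each newly arriving modality, which is where the forgetting risk described in the abstract arises.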

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Hidden Forgetting in Continual Multimodal Learning: When Accuracy Survives but Grounding Fails
cs.AI 2026-07 unverdicted novelty 7.0

RCL preserves evidence-reliance in continual multimodal learning to reduce hidden forgetting beyond standard accuracy metrics.