Reinforcement fine-tuning naturally mitigates forgetting in continual post-training
4 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 4 in 2026, all currently unverdicted. Four representative citing papers are listed in the explorer below.
Citing papers explorer
-
Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning
RaPO reduces catastrophic forgetting in visual continual learning by shaping rewards around policy drift and stabilizing advantages with cross-task exponential moving averages during reinforcement fine-tuning of multimodal models.
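As a rough illustration of the two mechanisms named above, here is a minimal sketch of cross-task EMA advantage normalization paired with a drift-shaped reward. The class and function names, the EMA decay, the absolute log-ratio drift measure, and the penalty weight are all illustrative assumptions, not RaPO's published formulation.

```python
import torch

# Minimal sketch, under assumed forms for both pieces; RaPO's exact
# statistics and drift measure may differ.

class EmaAdvantageNormalizer:
    """Normalize rewards with running statistics kept across tasks."""

    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.mean = 0.0
        self.var = 1.0

    def update(self, rewards: torch.Tensor) -> torch.Tensor:
        # Fold this batch into the cross-task running mean/variance,
        # then return normalized advantages.
        self.mean = self.decay * self.mean + (1 - self.decay) * rewards.mean().item()
        self.var = self.decay * self.var + (1 - self.decay) * rewards.var(unbiased=False).item()
        return (rewards - self.mean) / (self.var ** 0.5 + 1e-8)


def drift_shaped_reward(task_reward: torch.Tensor,
                        logp_new: torch.Tensor,
                        logp_old: torch.Tensor,
                        drift_weight: float = 0.1) -> torch.Tensor:
    # Subtract a penalty for how far the current policy has drifted from
    # the pre-update policy on each sample (a per-sample KL surrogate).
    return task_reward - drift_weight * (logp_new - logp_old).abs()
```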
-
Preserving Foundational Capabilities in Flow-Matching VLAs through Conservative SFT
ConSFT prevents catastrophic forgetting when fine-tuning flow-matching VLAs by dynamically scaling gradients based on model confidence, retaining over 20% more pre-trained capability than standard SFT without requiring prior data or reference networks.
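A minimal sketch of what confidence-based gradient scaling could look like; the confidence proxy (the likelihood exp(-loss) the model assigns to the target) and the direction of the scaling are assumptions, since the summary specifies neither.

```python
import torch

# Minimal sketch, assuming confidence is approximated by the likelihood
# the model assigns to the target, i.e. exp(-per_sample_loss). ConSFT's
# actual confidence measure and scaling rule for flow-matching VLAs may
# differ; this only illustrates the shape of the mechanism.

def conservative_sft_loss(per_sample_loss: torch.Tensor) -> torch.Tensor:
    # Detach the scale so gradients flow only through the raw loss.
    # Low-confidence samples (large loss) get their gradients damped,
    # keeping single hard examples from dragging the model far from its
    # pre-trained behavior. The scale comes from the model's own loss,
    # so no prior data or reference network is needed.
    confidence = torch.exp(-per_sample_loss).detach()
    return (confidence * per_sample_loss).mean()
```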
-
Rotation-Preserving Supervised Fine-Tuning
RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.
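One plausible form of such a penalty is sketched below: it measures how much the fine-tuned weight maps the pretrained right singular subspace outside the pretrained left singular subspace. The rank k, the one-sided projection, and the squared-Frobenius form are assumptions, not RPSFT's exact regularizer.

```python
import torch

# Minimal sketch, under assumed penalty form; subspaces are computed
# once from the pretrained weight and then held fixed.

def pretrained_subspaces(w0: torch.Tensor, k: int):
    # Top-k singular subspaces of the pretrained weight w0.
    u0, _, v0h = torch.linalg.svd(w0, full_matrices=False)
    return u0[:, :k], v0h[:k, :].T

def rotation_penalty(w: torch.Tensor, u0: torch.Tensor, v0: torch.Tensor) -> torch.Tensor:
    wv = w @ v0                      # action of W on span(V0): (out, k)
    leaked = wv - u0 @ (u0.T @ wv)   # component leaving span(U0)
    return leaked.pow(2).sum()       # zero iff span(U0) is preserved
```

Added to the SFT loss with a small coefficient, a penalty like this leaves directions outside the top-k subspaces free to adapt while discouraging rotation of the loss-sensitive ones.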
-
CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning
CRAFT is a continual learning method for LLMs that learns low-rank interventions on hidden representations, using a unified KL-divergence objective to handle task routing (via output divergence), forgetting control (via prior-state regularization), and intervention merging.
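A minimal sketch of the two most code-like pieces, under stated assumptions: an additive, zero-initialized low-rank edit on hidden states, and a KL term toward the frozen prior-state model's outputs. Routing and merging are omitted here, and CRAFT's unified objective may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch; module shape and zero-init are illustrative choices.

class LowRankIntervention(nn.Module):
    """Additive low-rank edit on hidden representations."""

    def __init__(self, hidden_dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(hidden_dim, rank, bias=False)
        self.up = nn.Linear(rank, hidden_dim, bias=False)
        nn.init.zeros_(self.up.weight)  # starts as the identity map

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.down(h))


def prior_state_kl(new_logits: torch.Tensor, prior_logits: torch.Tensor) -> torch.Tensor:
    # KL(prior || new): penalizes the intervened model for drifting
    # from the frozen prior-state model's predictive distribution.
    return F.kl_div(
        F.log_softmax(new_logits, dim=-1),
        F.log_softmax(prior_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
```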