pith. sign in

hub

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

24 Pith papers cite this work. Polarity classification is still indexing.

24 Pith papers citing it
abstract

With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often suffer limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' for MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resultant method that we call saliency unlearning (SalUn) narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting data points). To the best of our knowledge, SalUn is the first principled MU approach that can effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation tasks. As highlighted below, For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not. Codes are available at https://github.com/OPTML-Group/Unlearn-Saliency. (WARNING: This paper contains model outputs that may be offensive in nature.)

hub tools

citation-role summary

background 3 baseline 1

citation-polarity summary

years

2026 22 2025 2

representative citing papers

Machine Unlearning for Masked Diffusion Language Models

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.

Is your algorithm unlearning or untraining?

cs.LG · 2026-04-09 · conditional · novelty 7.0

Machine unlearning conflates reversing the influence of specific training examples (untraining) with removing the full underlying distribution or behavior (unlearning).

CURE:Circuit-Aware Unlearning for LLM-based Recommendation

cs.IR · 2026-04-04 · unverdicted · novelty 7.0

CURE disentangles LLM recommendation circuits into forget-specific, retain-specific, and task-shared modules with tailored update rules to achieve more effective unlearning than weighted baselines.

Class Unlearning via Depth-Aware Removal of Forget-Specific Directions

cs.CV · 2026-04-16 · unverdicted · novelty 6.0 · 2 refs

DAMP performs one-shot class unlearning by depth-aware projection removal of forget-specific directions, producing forgetting behavior closer to retraining from scratch than prior methods on image classification tasks.

BARRIER: Bounded Activation Regions for Robust Information Erasure

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

BARRIER applies interval arithmetic to SVD-based activation projections to create bounded forget regions that enable aggressive unlearning while providing formal protection for retain distributions via tail bounds on functional drift.

citing papers explorer

Showing 24 of 24 citing papers.