arXiv preprint arXiv:1911.03030 (2019)
22 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
  NPO enables stable unlearning of 50%+ training data in LLMs on TOFU by making collapse exponentially slower than gradient ascent, preserving sensible outputs where prior methods fail.
- Interference-Aware Multi-Task Unlearning
  Introduces interference-aware multi-task unlearning with task-aware gradient projection and instance-level gradient orthogonalization, reducing interference scores by 30.3% and 52.9% on vision benchmarks.
- Erase Persona, Forget Lore: Benchmarking Multimodal Copyright Unlearning in Large Vision Language Models
  CoVUBench is the first benchmark framework for evaluating multimodal copyright unlearning in LVLMs via synthetic data, systematic variations, and a dual protocol for forgetting efficacy and utility preservation.
- Shape of Memory: a Geometric Analysis of Machine Unlearning in Second-Order Optimizers
  Second-order optimizers retain residual geometric memory in their state after unlearning that first-order metrics miss, and only controlled eigendecay perturbations fully erase it.
- Representation-Guided Parameter-Efficient LLM Unlearning
  REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.
- WIN-U: Woodbury-Informed Newton-Unlearning as a retain-free Machine Unlearning Framework
  WIN-U delivers a retain-free unlearning update that approximates the gold-standard retrained model via a Woodbury-informed Newton step using only forget-set curvature information.
- PrivEraserVerify: Efficient, Private, and Verifiable Federated Unlearning
  PrivEraserVerify unifies efficiency via adaptive checkpointing, privacy via layer-adaptive DP, and verifiability via fingerprints in federated unlearning, claiming 2-3x faster performance than retraining with formal guarantees.
- Label Leakage Attacks in Machine Unlearning: A Parameter and Inversion-Based Approach
  Parameter-difference and model-inversion attacks can identify forgotten classes after machine unlearning on standard image datasets.
- Jellyfish: Zero-Shot Federated Unlearning Scheme with Knowledge Disentanglement
  Jellyfish enables zero-shot federated unlearning through synthetic proxy data generation, channel-restricted knowledge disentanglement, and a composite loss with repair to forget target data while retaining model utility.
- Forget-It-All: Multi-Concept Machine Unlearning via Concept-Aware Neuron Masking
  FIA uses contrastive concept saliency and temporal-spatial neuron identification to build unified masks that erase multiple target concepts while preserving general generation quality in diffusion models.
- POUR: A Provably Optimal Method for Unlearning Representations via Neural Collapse
  POUR derives a provably optimal forgetting operator by showing that orthogonal projections of simplex equiangular tight frames remain ETFs in lower dimensions, enabling representation-level unlearning with closed-form and distillation variants.
- Exploring Nonlinear Pathway in Parameter Space for Machine Unlearning
  MCU applies mode connectivity to trace nonlinear unlearning pathways in parameter space, adds a parameter mask and adaptive penalty, and produces a range of unlearning models that plug into existing methods.
- TOFU: A Task of Fictitious Unlearning for LLMs
  TOFU is a new benchmark with synthetic profiles and metrics demonstrating that existing unlearning algorithms for LLMs fail to achieve effective forgetting of targeted information.
- SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
  SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
- Incentivizing User Data Contributions for LLM Improvement under Withdrawal Rights
  Withdrawal rights paired with centralized cost-based assignment prevent subsidy waste by collecting data only when the improvement threshold is sustainably reachable, turning infeasible cases into null outcomes.
- Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation
  A complete pipeline for federated unlearning via knowledge distillation for efficient removal and a GAN-integrated classifier for visual evaluation of forgetting capacity.
- Machine Unlearning on Pre-trained Models by Residual Feature Alignment Using LoRA
  A LoRA-based residual feature alignment method for efficient machine unlearning on pre-trained models by targeting zero residuals on retained data and shifted residuals on unlearned data.
- AdaProb: Efficient Machine Unlearning via Adaptive Probability
  AdaProb performs machine unlearning by substituting final-layer output probabilities with optimized uniform pseudo-probabilities and updating model weights.
- Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
  Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.
- DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection
  Dual-space semantic-character mutations on prompts achieve higher misuse success rates against DeepSeek than single-space attacks alone.
- ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models
- High-Dimensional Statistics: Reflections on Progress and Open Problems