NPO enables stable unlearning of 50%+ training data in LLMs on TOFU by making collapse exponentially slower than gradient ascent, preserving sensible outputs where prior methods fail.
arXiv preprint arXiv:2310.10683 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Probe-geometry alignment erases cross-sequence memorization signatures in LLMs below chance using per-depth rank-one activation interventions with negligible impact on zero-shot capabilities.
REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.
A penalty-based bi-level optimization framework for machine unlearning that decorrelates forget and retention gradients via inner maximization and restores utility via outer minimization, with convergence guarantees and improved trade-offs on vision and language benchmarks.
MSA performs data unlearning in LLMs by arithmetic operations on prior model checkpoints to remove targeted datapoint influence, with experiments showing competitive or better results than existing unlearning methods.
citing papers explorer
-
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
NPO enables stable unlearning of 50%+ training data in LLMs on TOFU by making collapse exponentially slower than gradient ascent, preserving sensible outputs where prior methods fail.
-
Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance
Probe-geometry alignment erases cross-sequence memorization signatures in LLMs below chance using per-depth rank-one activation interventions with negligible impact on zero-shot capabilities.
-
Representation-Guided Parameter-Efficient LLM Unlearning
REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.
-
OFMU: Optimization-Driven Framework for Machine Unlearning
A penalty-based bi-level optimization framework for machine unlearning that decorrelates forget and retention gradients via inner maximization and restores utility via outer minimization, with convergence guarantees and improved trade-offs on vision and language benchmarks.
-
Revisiting the Past: Data Unlearning with Model State History
MSA performs data unlearning in LLMs by arithmetic operations on prior model checkpoints to remove targeted datapoint influence, with experiments showing competitive or better results than existing unlearning methods.