pith. machine review for the scientific record.

arxiv: 2605.11592 · v1 · submitted 2026-05-12 · 💻 cs.LG · cs.AI · cs.CR

Recognition: 2 theorem links · Lean Theorem

SoK: Unlearnability and Unlearning for Model Dememorization

Derui Wang, Mengying Zhang, Minhui Xue, Ruoxi Sun, Shuang Hao, Xiaoyu Xia

Pith reviewed 2026-05-13 01:40 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CR
keywords unlearnability · machine unlearning · model dememorization · data privacy · certified unlearning · systematization of knowledge · model forgetting

The pith

Unlearnability and unlearning both produce only shallow dememorization of sensitive data, but certified unlearning supplies the first theoretical guarantee on forgetting depth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper integrates two stages of defense against models memorizing private information: unlearnability, which adds invisible perturbations to training data to lower its learnability, and unlearning, which erases acquired knowledge from a trained model. It demonstrates that both techniques currently deliver only shallow dememorization, allowing recovery of the data under minor weight changes, and that they interfere with each other in practice. The work supplies a single taxonomy covering both families of methods, runs empirical tests that expose their robustness limits and mutual effects, and derives the first formal bound on how deeply certified unlearning can erase information. A reader interested in machine-learning privacy would care because these findings indicate where current safeguards fall short and what is required to reach a reliably forgotten state for sensitive knowledge.
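
To make the data-release half of that pairing concrete, below is a minimal sketch of error-minimizing "unlearnability" noise in the spirit of Unlearnable Examples [80]. It assumes a small PyTorch classifier and illustrative hyper-parameters (eps, alpha, steps); it is not the paper's exact procedure, only the general recipe the unlearnability family builds on.

```python
# Minimal sketch of error-minimizing "unlearnability" noise (in the spirit of
# Unlearnable Examples [80]); hyper-parameters are illustrative, not the paper's.
import torch
import torch.nn.functional as F

def craft_unlearnability_noise(model, x, y, eps=8 / 255, alpha=2 / 255, steps=20):
    """Per-sample noise that *minimizes* the training loss, so the perturbed
    data looks already learned and contributes little training signal."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # descend the loss w.r.t. the input
            delta.clamp_(-eps, eps)              # keep the perturbation imperceptible
        delta.grad.zero_()
        model.zero_grad()                        # discard stray parameter gradients
    return delta.detach()
```

The sign of the update is the only difference from a standard PGD adversarial attack: the noise descends the training loss instead of ascending it.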

Core claim

Unlearnability at the data-release stage and unlearning at the post-training stage share the goal of dememorization, yet both exhibit shallow effects that fail under perturbations; input noise from unlearnability can impair later unlearning, while unlearning can restore knowledge hidden by unlearnability; certified unlearning, however, yields the first provable bound on dememorization depth.

What carries the argument

The theoretical guarantee on dememorization depth for models that have undergone certified unlearning, which formally bounds the extent to which sensitive information can be removed from model parameters.
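
The review does not reproduce the theorem itself, so the following is only a hedged sketch of the kind of condition such a guarantee plausibly builds on: the standard (ε, δ)-certified-removal requirement in the style of Guo et al. [69]. The symbols (a measurement A of the released weights, an outcome set S, θ_unlearn and θ_retrain for the unlearned and retrained-from-scratch models) are illustrative notation, not the paper's.

```latex
% Standard (\epsilon,\delta)-certified-removal condition (Guo et al. [69]);
% an illustrative foundation, not the paper's depth theorem.
\forall S:\quad
\Pr\!\left[A(\theta_{\mathrm{unlearn}}) \in S\right]
  \le e^{\epsilon}\,\Pr\!\left[A(\theta_{\mathrm{retrain}}) \in S\right] + \delta
\quad\text{and}\quad
\Pr\!\left[A(\theta_{\mathrm{retrain}}) \in S\right]
  \le e^{\epsilon}\,\Pr\!\left[A(\theta_{\mathrm{unlearn}}) \in S\right] + \delta .
```

On this review's reading, a depth bound would additionally have to constrain what A can extract not only from the released weights but from any weights within a small perturbation radius of them, which is what separates the certified case from the shallow dememorization observed empirically.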

If this is right

  • Perturbations introduced by unlearnability reduce the effectiveness of subsequent unlearning steps.
  • Unlearning can recover domain-level knowledge that unlearnability had attempted to conceal.
  • Deeper immemorization of sensitive data requires combining unlearnability and unlearning under formal certification rather than using either in isolation.
  • Without certification, both families of methods leave models vulnerable to recovery of the target data under small weight perturbations.
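
To ground the last point in this list, below is a minimal relearning probe, assuming a PyTorch model plus loaders for a benign calibration set and the forgotten data. It is not the paper's recovery attack (Figures 10 to 18 report those); it is only a hedged sketch of the underlying idea: a brief fine-tune acts as a small weight perturbation, and a rebound in accuracy on the forget set signals shallow dememorization.

```python
# Minimal probe for shallow dememorization: briefly fine-tune an unlearned
# model on benign data (a small weight perturbation) and check whether
# accuracy on the supposedly forgotten data rebounds. Illustrative only;
# not the paper's recovery-attack protocol.
import torch
import torch.nn.functional as F

@torch.no_grad()
def accuracy(model, loader):
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / max(total, 1)

def recovery_probe(unlearned_model, benign_loader, forget_loader, lr=1e-3, epochs=1):
    before = accuracy(unlearned_model, forget_loader)
    opt = torch.optim.SGD(unlearned_model.parameters(), lr=lr)
    unlearned_model.train()
    for _ in range(epochs):
        for x, y in benign_loader:              # no forgotten data is used here
            opt.zero_grad()
            F.cross_entropy(unlearned_model(x), y).backward()
            opt.step()
    after = accuracy(unlearned_model, forget_loader)
    return before, after                        # large (after - before) => shallow forgetting
```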

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Sequential application of unlearnability followed by certified unlearning could be tested as a practical pipeline for stronger end-to-end privacy (a minimal sketch follows this list).
  • The shallow-dememorization phenomenon may appear in related privacy tools such as differential privacy, suggesting a broader pattern to investigate.
  • The taxonomy and guarantee could be extended to federated or continual-learning settings where data removal requests arrive incrementally.
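
For the first extension above, a minimal sketch of what a sequential pipeline could look like. Everything here is assumption: the craft_unlearnability_noise helper from the earlier sketch stands in for the data-release stage, and the forget step (gradient ascent on the forget set plus Gaussian weight noise) is only a placeholder for a real certified-unlearning procedure such as Newton-update removal [69] or noisy SGD [36]; it carries no certification itself.

```python
# Hypothetical two-stage pipeline: unlearnability at data release, then an
# unlearning step after training. The forget step is a placeholder, NOT a
# certified procedure; see [36, 69] for actual certified unlearning.
import torch
import torch.nn.functional as F

def release_protected(model, x, y):
    # Stage 1: the data owner perturbs samples before release
    # (reuses craft_unlearnability_noise from the sketch above).
    return x + craft_unlearnability_noise(model, x, y), y

def forget(model, forget_loader, lr=1e-2, sigma=1e-3):
    # Stage 2: the trainer later honours a deletion request.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for x, y in forget_loader:
        opt.zero_grad()
        (-F.cross_entropy(model(x), y)).backward()     # gradient ascent on the forget set
        opt.step()
    with torch.no_grad():
        for p in model.parameters():
            p.add_(sigma * torch.randn_like(p))        # illustrative noise, no guarantee
    return model
```

The paper's own finding that the two stages interfere (input noise can impair unlearning, and unlearning can restore what the noise hid) is exactly what such a pipeline would need to measure.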

Load-bearing premise

That the empirical evaluations of leading methods and the observed interplay between unlearnability and unlearning generalize beyond the specific datasets and models tested in the study.

What would settle it

A counterexample in which a model processed by certified unlearning still permits reconstruction of the supposedly forgotten data at a level exceeding the derived depth bound would falsify the theoretical guarantee.
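
One standard way to operationalize "permits reconstruction" is a loss-threshold membership-inference test, the same family of measurement the MIA figures (6 to 8) report. The sketch below is a hedged, minimal version assuming a PyTorch model and loaders of forgotten and held-out non-member examples; the median-loss calibration is illustrative and is not the paper's evaluation protocol.

```python
# Minimal loss-threshold membership-inference test on the forget set.
# The calibration (median non-member loss) is illustrative only.
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_example_losses(model, loader):
    model.eval()
    losses = []
    for x, y in loader:
        losses.append(F.cross_entropy(model(x), y, reduction="none"))
    return torch.cat(losses)

def mia_advantage(model, forget_loader, nonmember_loader):
    forget = per_example_losses(model, forget_loader)
    nonmem = per_example_losses(model, nonmember_loader)
    threshold = nonmem.median()                  # calibrate on non-members
    tpr = (forget < threshold).float().mean()    # forgotten data flagged as "member"
    fpr = (nonmem < threshold).float().mean()    # roughly 0.5 by construction
    return (tpr - fpr).item()                    # > 0 suggests residual memorization
```

A positive advantage on a model that certified unlearning says should be indistinguishable from retraining, at a level beyond what the depth bound allows, would be the counterexample described above.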

Figures

Figures reproduced from arXiv: 2605.11592 by Derui Wang, Mengying Zhang, Minhui Xue, Ruoxi Sun, Shuang Hao, Xiaoyu Xia.

Figure 1: An overview of the model dememorization framework within the ML model development lifecycle.
Figure 2: Shallow unlearnability and shallow unlearning re…
Figure 3: The taxonomy of model dememorization. For samples 𝑥𝑖 from the same class, the corresponding 𝛿𝑖 can be shared, yielding class-wise unlearnability noise. Similar to E-Max, E-Min methods optimize perturbations through gradient-based procedures. However, they have been applied more broadly across data modalities, including images [52, 74], text [101, 217], and audio [121, 207], as well as across tasks such as …
Figure 4: The test accuracy (%) of ViT-Tiny trained on un…
Figure 6: MIA on OPS (left: class-level; right: subset-level).
Figure 7: MIA on PUE (left: class-level; right: subset-level).
Figure 8: MIA on TUE (left: class-level; right: subset-level).
Figure 9: Parametric robustness of unlearnability perturba…
Figure 10: Recovery attack against unlearned classifiers trained on UE-s from Table 14. Top: Recovery attack against the…
Figure 11: Recovery attack against unlearned classifiers using FT trained on UE-s from Table 14.
Figure 12: Recovery attack against unlearned models trained on Regtext.
Figure 13: Recovery attacks against unlearned classifiers trained on TUE from Table 15. Top: Recovery attack against the…
Figure 14: Recovery attacks against unlearned classifiers trained on PUE from Table 16. Top: Recovery attack against the…
Figure 15: Recovery attacks against unlearned classifiers trained on OPS from Table 17. Top: Recovery attack against the…
Figure 16: Recovery attack against unlearned classifiers using FT trained on TUE from Table 15.
Figure 17: Recovery attack against unlearned classifiers using FT trained on PUE from Table 16.
Figure 18: Recovery attack against unlearned classifiers using FT trained on OPS from Table 17.
Figure 19: The test accuracy (%) of ResNet-18 trained on vary…
read the original abstract

Advanced model dememorization methods, including availability poisoning (unlearnability) and machine unlearning, are emerging as key safeguards against data misuse in machine learning (ML). At the training stage, unlearnability embeds imperceptible perturbations into data before release to reduce learnability. At the post-training stage, unlearning removes previously acquired information from models to prevent unauthorized disclosure or use. While both defenses aim to preserve the right to withhold knowledge, their vulnerabilities and shared foundations remain unclear. Specifically, both unlearnability and unlearning suffer from issues such as shallow dememorization, leading to falsely claimed data learnability reduction or forgetting in the presence of weight perturbations. Moreover, input perturbations may affect the effectiveness of downstream unlearning, while unlearning may inadvertently recover domain knowledge hidden by unlearnability. This interplay calls for deeper investigation. Finally, there is a lack of formal guarantees to provide theoretical insights into current defenses against shallow dememorization. In this Systematization of Knowledge, we present the first integrated analysis of model dememorization approaches leveraging unlearnability and unlearning. Our contributions are threefold: (i) a unified taxonomy of unlearnability and scalable unlearning methods; (ii) an empirical evaluation revealing the robustness, interplay, and shallow dememorization of leading methods; and (iii) the first theoretical guarantee on dememorization depth for models processed through certified unlearning. These results lay the foundation for unifying dememorization mechanisms across the ML lifecycle to achieve a deeper immemor state for sensitive knowledge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript is a Systematization of Knowledge (SoK) on model dememorization that integrates availability poisoning (unlearnability) at training time with post-training machine unlearning. It contributes (i) a unified taxonomy of unlearnability and scalable unlearning methods, (ii) empirical evaluations of leading methods that examine robustness, the interplay between the two defenses, and the problem of shallow dememorization, and (iii) the first theoretical guarantee on dememorization depth for models processed by certified unlearning.

Significance. If the empirical results on shallow dememorization and cross-stage interplay are reproducible and the theoretical guarantee holds under realistic conditions, the work would provide a valuable organizing framework for dememorization research across the ML lifecycle. The systematization itself organizes a fragmented literature; the attempt to supply a formal depth bound is a positive step toward moving beyond purely empirical claims.

major comments (2)
  1. [§5 (Theoretical Guarantee)] The main theorem establishing the dememorization-depth bound assumes exact certification (perfect influence removal or exact privacy parameters). Certified unlearning procedures in practice rely on approximations (influence-function estimates, finite-sample DP, or gradient-based surrogates). The manuscript does not show that the bound survives these approximations, which directly weakens its applicability to the shallow-dememorization phenomenon identified in the empirical sections.
  2. [§4 (Empirical Evaluation)] The claims that leading unlearnability and unlearning methods exhibit shallow dememorization and that input perturbations affect downstream unlearning rest on evaluations whose baselines, statistical controls, and exact metrics are not fully specified. Without these details it is impossible to assess whether the reported interplay generalizes or is an artifact of the chosen datasets and models.
minor comments (2)
  1. [Abstract] The abstract states that the work presents 'the first theoretical guarantee' but does not indicate the precise form of the bound or the key assumptions; adding one sentence would improve clarity.
  2. [§2 (Preliminaries)] Notation for 'dememorization depth' is introduced without an explicit equation reference in the early sections; a forward pointer to the definition used in the theorem would help readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of our SoK. We address each major comment below and will revise the manuscript accordingly to improve rigor and clarity.

read point-by-point responses
  1. Referee: [§5 (Theoretical Guarantee)] The main theorem establishing the dememorization-depth bound assumes exact certification (perfect influence removal or exact privacy parameters). Certified unlearning procedures in practice rely on approximations (influence-function estimates, finite-sample DP, or gradient-based surrogates). The manuscript does not show that the bound survives these approximations, which directly weakens its applicability to the shallow-dememorization phenomenon identified in the empirical sections.

    Authors: We agree that the main theorem is stated under exact certification. This provides a clean first formal guarantee on dememorization depth, consistent with the theoretical literature on certified unlearning. To address the gap, the revised manuscript will include an extended analysis (new subsection in §5) that propagates approximation errors from influence estimates and finite-sample DP relaxations into the depth bound, yielding a relaxed but still non-trivial guarantee. This will directly connect the theory to the shallow-dememorization observations in the empirical sections. revision: yes

  2. Referee: [§4 (Empirical Evaluation)] The claims that leading unlearnability and unlearning methods exhibit shallow dememorization and that input perturbations affect downstream unlearning rest on evaluations whose baselines, statistical controls, and exact metrics are not fully specified. Without these details it is impossible to assess whether the reported interplay generalizes or is an artifact of the chosen datasets and models.

    Authors: We acknowledge that §4 would benefit from greater explicitness. In the revision we will expand the experimental protocol subsection to specify: (i) exact baseline implementations and hyper-parameters, (ii) the statistical tests and multiple-comparison corrections used, (iii) precise definitions of all metrics for shallow dememorization and cross-stage interplay, and (iv) additional controls and sensitivity checks across datasets and model scales. These additions will allow readers to evaluate generalizability directly. revision: yes

Circularity Check

0 steps flagged

No circularity: SoK paper presents taxonomy, evaluation, and guarantee without self-referential derivations.

full rationale

The paper is a systematization of knowledge offering a unified taxonomy, empirical study of methods, and a claimed first theoretical guarantee on dememorization depth under certified unlearning. No equations, predictions, or first-principles results are shown that reduce by construction to author-defined inputs, fitted parameters, or self-citation chains. The guarantee is positioned as a novel contribution based on analysis of existing certified unlearning procedures rather than tautological redefinition. Standard citations to prior unlearning and poisoning literature do not constitute load-bearing self-referential justification per the enumerated patterns. The work remains self-contained against external benchmarks without any reduction of claims to its own fitted values or renamed ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard ML assumptions about model training dynamics and the existence of certified unlearning procedures; no new free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Certified unlearning procedures exist and can be applied to trained models
    Invoked when stating the theoretical guarantee on dememorization depth.

pith-pipeline@v0.9.0 · 5594 in / 1139 out tokens · 34142 ms · 2026-05-13T01:40:01.976955+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

242 extracted references · 242 canonical work pages · 4 internal anchors

  1. [1] 2016. GDPR Article 17: Right to Erasure. gdpr-info.eu. https://gdpr-info.eu/art-17-gdpr/
  2. [2] 2023. California Consumer Privacy Act (CCPA) & CPRA Overview. California Department of Justice. https://www.oag.ca.gov/privacy/ccpa
  3. [3] 2024. CCPA/CPRA Regulations. California Privacy Protection Agency. https://cppa.ca.gov/regulations/
  4. [4] 2024. Regulation (EU) 2024/1689: Artificial Intelligence Act. Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng
  5. [5] Anna Ablove, Shreyas Chandrashekaran, Xiao Qiang, and Roya Ensafi. 2026. Characterizing the Implementation of Censorship Policies in Chinese LLM Services. In NDSS
  6. [6] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023)
  7. [7] Sk Miraj Ahmed, Umit Yigit Basaran, Dripta S Raychaudhuri, Arindam Dutta, Rohit Kundu, Fahim Faisal Niloy, Basak Guler, and Amit K Roy-Chowdhury
  8. [8] Towards Source-Free Machine Unlearning. In CVPR
  9. [9] Silas Alberti, Kenan Hasanaliyev, Manav Shah, and Stefano Ermon. 2025. Data Unlearning in Diffusion Models. In ICLR
  10. [10] Nasser Aldaghri, Hessam Mahdavifar, and Ahmad Beirami. 2021. Coded machine unlearning. IEEE Access 9 (2021), 88137–88150
  11. [11] Youssef Allouah, Rachid Guerraoui, and Sanmi Koyejo. 2026. Distributional Machine Unlearning via Selective Data Removal. In ICLR
  12. [12] Youssef Allouah, Joshua Kazdan, Rachid Guerraoui, and Sanmi Koyejo. 2025. The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning. In ICLR
  13. [13] Sadia Asif and Mohammad Mohammadi Amiri. 2026. OFMU: Optimization-Driven Framework for Machine Unlearning. In ICLR
  14. [14] George-Octavian Bărbulescu and Peter Triantafillou. 2024. To each (textual sequence) its own: improving memorized-data unlearning in large language models. In ICML
  15. [15] Umit Yigit Basaran, Sk Miraj Ahmed, Amit Roy-Chowdhury, and Basak Guler
  16. [16] A Certified Unlearning Approach without Access to Source Data. In ICML
  17. [17] Shristi Das Biswas, Arani Roy, and Kaushik Roy. 2025. Cure: Concept unlearning via orthogonal representation editing in diffusion models. In NeurIPS
  18. [18] Jacob L Block, Aryan Mokhtari, and Sanjay Shakkottai. 2025. Machine Unlearning under Overparameterization. In NeurIPS
  19. [19] Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. 2021. Machine unlearning. In IEEE SP
  20. [20] Alexander Brown, Nenad Tomasev, Jan Freyberg, Yuan Liu, Alan Karthikesalingam, and Jessica Schrouff. 2023. Detecting shortcut learning for fair medical AI using shortcut testing. Nature Communications 14, 1 (2023), 4314
  21. [21] Nhung Bui, Xinyang Lu, Rachael Hwee Ling Sim, See-Kiong Ng, and Bryan Kian Hsiang Low. 2026. How to Cure Newton for Unlearning Neural Networks? An Empirical Study from the Hessian Perspective. In ICLR
  22. [22] Bochuan Cao, Changjiang Li, Ting Wang, Jinyuan Jia, Bo Li, and Jinghui Chen
  23. [23] Impress: Evaluating the resilience of imperceptible perturbations against unauthorized data usage in diffusion-based generative ai. In NeurIPS
  24. [24] Sungmin Cha, Sungjun Cho, Dasol Hwang, and Moontae Lee. 2025. Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs. In ICLR
  25. [25] Chaochao Chen, Jiaming Zhang, Yuyuan Li, and Zhongxuan Han. 2024. One for all: A universal generator for concept unlearnability via multi-modal alignment. In ICML
  26. [26] Hang Chen, Jiaying Zhu, Xinyu Yang, and Wenya Wang. 2026. CLUE: Conflict-guided Localization for LLM Unlearning Framework. In ICLR
  27. [27] Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. 2023. Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary. In CVPR
  28. [28] Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, and Yang Zhang. 2022. Graph unlearning. In CCS
  29. [29] Sizhe Chen, Geng Yuan, Xinwen Cheng, Yifan Gong, Minghai Qin, Yanzhi Wang, and Xiaolin Huang. 2023. Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors. In ICLR
  30. [30] Tianqi Chen, Shujian Zhang, and Mingyuan Zhou. 2025. Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models. In ICLR
  31. [31] Xinrui Chen, Xu Cao, Jianhao Zhang, Pinlong Zhao, Di Gao, and Ou Wu. 2026. Robust LLM Unlearning via Post Judgment and Multi-round Thinking. In ICLR
  32. [32] Jiali Cheng, George Dasoulas, Huan He, Chirag Agarwal, and Marinka Zitnik
  33. [33] GNNDelete: A General Strategy for Unlearning in Graph Neural Networks. In ICLR
  34. [34] Jingpu Cheng, Ping Liu, Qianxiao Li, and Chi Zhang. 2026. Machine Unlearning under Retain–Forget Entanglement. In ICLR
  35. [35] Xinwen Cheng, Zhehao Huang, Wenxin Zhou, Zhengbao He, Ruikai Yang, Yingwen Wu, and Xiaolin Huang. 2026. Remaining-data-free machine unlearning by suppressing sample contribution. In ICLR
  36. [36] Eli Chien, Haoyu Wang, Ziang Chen, and Pan Li. 2024. Certified machine unlearning via noisy stochastic gradient descent. In NeurIPS
  37. [37] Eli Chien, Haoyu Wang, Ziang Chen, and Pan Li. 2024. Langevin unlearning: A new perspective of noisy gradient descent for machine unlearning. In NeurIPS
  38. [38] Somnath Basu Roy Chowdhury, Krzysztof Marcin Choromanski, Arijit Sehanobish, Kumar Avinava Dubey, and Snigdha Chaturvedi. 2025. Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning. In ICLR
  39. [39] Kaiyuan Deng, Gen Li, Yang Xiao, Bo Hui, and Xiaolong Ma. 2026. Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models. In ICLR
  40. [40] Zonglin Di, Sixie Yu, Yevgeniy Vorobeychik, and Yang Liu. 2025. Adversarial Machine Unlearning. In ICLR
  41. [41] Zonglin Di, Zhaowei Zhu, Jinghan Jia, Jiancheng Liu, Zafar Takhirov, Bo Jiang, Yuanshun Yao, Sijia Liu, and Yang Liu. 2026. Label smoothing improves machine unlearning. In ICLR
  42. [42] Jingfeng Zhang Di Zhao, Hongsheng Hu, Philippe Fournier-Viger, Gillian Dobbie, and Yun Sing Koh. 2026. Unlearning during Training: Domain-Specific Gradient Ascent for Domain Generalization. In ICLR
  43. [43] Chenlu Ding, Jiancan Wu, Yancheng Yuan, Jinda Lu, Kai Zhang, Alex Su, Xiang Wang, and Xiangnan He. 2025. Unified Parameter-Efficient Unlearning for LLMs. In ICLR
  44. [44] Junhao Dong, Hao Zhu, Yifei Zhang, Xinghua Qu, Yew-Soon Ong, and Piotr Koniusz. 2025. Machine unlearning via task simplex arithmetic. In NeurIPS
  45. [45] Alexey Dosovitskiy. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
  46. [46] Yonatan Dukler, Benjamin Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, and Stefano Soatto. 2023. Safe: Machine unlearning with shard graphs. In ICCV
  47. [47] Cynthia Dwork. 2006. Differential privacy. In International Colloquium on Automata, Languages, and Programming
  48. [48] Taha Entesari, Arman Hatami, Rinat Khaziev, Anil Ramakrishna, and Mahyar Fazlyab. 2025. Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models. In NeurIPS
  49. [49] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. 2024. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. In ICML
  50. [50] Simone Facchiano, Stefano Saravalle, Matteo Migliarini, Edoardo De Matteis, Alessio Sampieri, Andrea Pilzer, Emanuele Rodolà, Indro Spinelli, Luca Franco, and Fabio Galasso. 2026. Video unlearning via low-rank refusal vector. In ICLR
  51. [51] Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, and Sijia Liu. 2025. Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning. In NeurIPS
  52. [52] Chongyu Fan, Jiancheng Liu, Yihua Zhang, Dennis Wei, Eric Wong, and Sijia Liu. 2024. SalUn: Empowering Machine Unlearning via Gradient-Based Weight Saliency in Both Image Classification and Generation. In International Conference on Learning Representations
  53. [53] Bin Fang, Bo Li, Shuang Wu, Shouhong Ding, Ran Yi, and Lizhuang Ma. 2024. Re-thinking data availability attacks against deep neural networks. In CVPR
  54. [54] XiaoHua Feng, Yuyuan Li, Chaochao Chen, Li Zhang, Longfei Li, Jun Zhou, and Xiaolin Zheng. 2025. Controllable Unlearning for Image-to-Image Generative Models via 𝜖-Constrained Optimization. In ICLR
  55. [55] Liam Fowl, Micah Goldblum, Ping-yeh Chiang, Jonas Geiping, Wojciech Czaja, and Tom Goldstein. 2021. Adversarial examples make strong poisons. In NeurIPS
  56. [56] Shaopeng Fu, Fengxiang He, Yang Liu, Li Shen, and Dacheng Tao. 2022. Robust Unlearnable Examples: Protecting Data Privacy Against Adversarial Learning. In ICLR
  57. [57] Chongyang Gao, Lixu Wang, Kaize Ding, Chenkai Weng, Xiao Wang, and Qi Zhu. 2025. On Large Language Model Continual Unlearning. In ICLR
  58. [58] Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A Wichmann. 2020. Shortcut learning in deep neural networks. Nature Machine Intelligence 2, 11 (2020), 665–673
  59. [59] Kristian Georgiev, Roy Rinberg, Sung Min Park, Shivam Garg, Andrew Ilyas, Aleksander Madry, and Seth Neel. 2025. Attribute-to-delete: Machine unlearning via datamodel matching. In ICLR
  60. [60] David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, and Vardan Papyan
  61. [61] Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches. In ICML
  62. [62] Vignesh Gokul and Shlomo Dubnov. 2024. Poscuda: Position based convolution for unlearnable audio datasets. arXiv preprint arXiv:2401.02135 (2024)
  63. [63] Aditya Golatkar, Alessandro Achille, Avinash Ravichandran, Marzia Polito, and Stefano Soatto. 2021. Mixed-privacy forgetting in deep networks. In CVPR
  64. [64] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. 2020. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In CVPR
  65. [65] Chen Gong, Kecen Li, Jin Yao, and Tianhao Wang. 2025. TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents. In NDSS
  66. [66] Xueluan Gong, Yuji Wang, Yanjiao Chen, Haocheng Dong, Yiming Li, Mengyuan Sun, Shuaike Li, Qian Wang, and Chen Chen. 2025. Armor: Shielding unlearnable examples against data augmentation. arXiv preprint arXiv:2501.08862 (2025)
  67. [67] Laura Graves, Vineel Nagisetty, and Vijay Ganesh. 2021. Amnesiac machine learning. In AAAI
  68. [68] Hanlin Gu, Hong Xi Tae, Lixin Fan, and Chee Seng Chan. 2026. Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting Without Disclosure. In ICLR
  69. [69] Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens Van Der Maaten. 2020. Certified data removal from machine learning models. In ICML
  70. [70] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. 2025. Deepseek-r1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv preprint arXiv:2501.12948 (2025)
  71. [71] Varun Gupta, Christopher Jung, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, and Chris Waites. 2021. Adaptive machine unlearning. In NeurIPS
  72. [72] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778
  73. [73] Pengfei He, Han Xu, Jie Ren, Yingqian Cui, Shenglai Zeng, Hui Liu, Charu C Aggarwal, and Jiliang Tang. 2024. Sharpness-Aware Data Poisoning Attack. In ICLR
  74. [74] Robert Hönig, Javier Rando, Nicholas Carlini, and Florian Tramèr. 2025. Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI. In ICLR
  75. [75] Hsiang Hsu, Pradeep Niroula, Zichang He, Ivan Brugere, Freddy Lecue, and Chun-Fu Chen. 2025. The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples. In NeurIPS
  76. [76] Jinwei Hu, Zhenglin Huang, Xiangyu Yin, Wenjie Ruan, Guangliang Cheng, Yi Dong, and Xiaowei Huang. 2025. FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model. In NeurIPS
  77. [77] Shengyuan Hu, Yiwei Fu, Steven Wu, and Virginia Smith. 2025. Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning. In ICLR
  78. [78] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger
  79. [79] Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4700–4708
  80. [80] Hanxun Huang, Xingjun Ma, Sarah Monazam Erfani, James Bailey, and Yisen Wang. 2021. Unlearnable Examples: Making Personal Data Unexploitable. In ICLR

Showing first 80 references.