Pith. Machine review for the scientific record.

arxiv: 2605.02196 · v2 · submitted 2026-05-04 · 💻 cs.LG

Recognition: no theorem link

DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:46 UTC · model grok-4.3

classification 💻 cs.LG
keywords machine unlearning · quantization · privacy · large language models · model robustness · INT4 · forgetting attack

The pith

INT4 quantization restores data that machine unlearning removed at higher precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that machine unlearning evaluations conducted at bfloat16 precision do not guarantee privacy once models are deployed at the lower INT4 precision common in production. It shows that this quantization step can recover up to 22 times more of the supposedly forgotten content across multiple unlearning methods and datasets. The work identifies a three-way tension where strong forgetting, retained utility, and robustness to quantization cannot be achieved simultaneously with existing techniques. It introduces a new training objective that produces models with stable durability across BF16, INT8, and INT4 precisions.

Core claim

Quantization to INT4 induces a recovery attack (QRA) that revives training data removed by unlearning, even when the model passes audits at BF16. No existing method satisfies the FA-RA-Q-INT4 trilemma of simultaneous forgetting, utility, and quantization robustness. A new sharpness-aware objective using straight-through estimator gradients yields the first method with a stable (0.047, {BF16, INT8, INT4}) durability certificate.

What carries the argument

The quantization recovery attack (QRA) observed under adapter-space INT4 quantization in the NF4+LoRA regime, together with the sharpness-aware forgetting objective that propagates gradients through the INT4 rounding operation.
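The gradient mechanism can be sketched in a few lines. The quantizer below is a generic symmetric INT4 rounder, not the paper's NF4 scheme, and all names are illustrative rather than taken from the authors' code.

```python
def fake_quant_int4(w: float, scale: float) -> float:
    """Forward pass: scale, round to the nearest of 16 signed levels
    ([-8, 7]), then rescale -- a generic symmetric INT4 quantizer,
    assumed here for illustration in place of NF4."""
    q = max(-8, min(7, round(w / scale)))
    return q * scale

def ste_backward(upstream_grad: float) -> float:
    """Backward pass under the straight-through estimator: the
    non-differentiable rounding is treated as the identity, so the
    gradient passes through unchanged."""
    return upstream_grad
```

In a real training loop this pair would be wrapped in a custom autograd function (e.g. `torch.autograd.Function`) so the forgetting loss is evaluated on quantized weights while gradients update the full-precision copies.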

Load-bearing premise

The measured recovery of forgotten content is caused by the quantization step itself rather than by interactions with the chosen unlearning algorithms, metrics, or datasets.

What would settle it

Running the same seven unlearning methods on LLaMA-3-8B-Instruct with the TOFU, MUSE-News, and WikiBio-WPU datasets and finding that INT4 outputs show no increase in membership or extraction metrics over the BF16 baseline.

Figures

Figures reproduced from arXiv: 2605.02196 by Abdullah Ahmad Khan, Ferdous Sohel.

Figure 2. INT4 recovery attack across all baselines. Left: Q-INT8 = 0 (grey) for every method, while Q-INT4 (colored) is catastrophic for methods that actually forget; the certification threshold is shown as a dashed line. Right: Q-INT4/FA recovery ratio for methods with FA < 0.05. GradDiff achieves the best forgetting quality yet has the worst INT4 fragility (18.9×): state-of-the-art forgetting does not imply quantization robustness. view at source ↗
Original abstract

Machine unlearning aims to remove specified training data to satisfy privacy regulations such as GDPR. However, existing evaluations assume identical precision at unlearning and deployment, overlooking that production LLMs are deployed at low-bit precision. We show that INT4 quantization systematically restores forgotten content even when models pass compliance audits at bfloat16 (BF16); we term this the quantization recovery attack (QRA). We conduct the first systematic study of unlearning robustness under adapter-space INT4 quantization in the NF4+LoRA regime, evaluating seven methods on LLaMA-3-8B-Instruct across TOFU, MUSE-News, and WikiBio-WPU. INT8 is benign; INT4 induces recovery of up to 22x, worsening with dataset difficulty. We identify the FA-RA-Q-INT4 trilemma: no method simultaneously achieves strong forgetting, high utility, and quantization robustness. A dense Pareto sweep reveals a sharp phase transition: once robustness is achieved, retain accuracy collapses regardless of further tuning. To address this, we propose DURABLEUN-SAF (Sharpness-Aware Forgetting), a quantization-aware objective using Straight-Through Estimator gradients through INT4 rounding. DURABLEUN-SAF is the only method to achieve a stable empirical (0.047, {BF16, INT8, INT4})-durability certificate: Q-INT4 = 0.043 ± 0.002, cert rate = 3/3, versus SalUn's cert rate of 1/3 at its own published hyperparameters. We call for Q-INT4 to be adopted as a standard evaluation metric alongside FA and RA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims that INT4 quantization systematically restores forgotten content in machine-unlearned LLMs (up to 22x recovery) even when models pass BF16 compliance audits, terming this the quantization recovery attack (QRA). It presents the first systematic evaluation of seven unlearning methods via LoRA adapters on LLaMA-3-8B-Instruct across TOFU, MUSE-News, and WikiBio-WPU datasets, finds INT8 benign but INT4 problematic, identifies a FA-RA-Q-INT4 trilemma with a sharp phase transition in Pareto fronts, and proposes DURABLEUN-SAF (a sharpness-aware forgetting objective using straight-through estimator gradients through INT4 rounding) that alone achieves a stable empirical durability certificate across BF16/INT8/INT4. The work advocates adopting Q-INT4 as a standard metric alongside forgetting and utility.

Significance. If the central results hold after addressing isolation concerns, the work is significant for highlighting a practical gap in unlearning evaluations that ignore deployment quantization. The empirical measurements across multiple methods and datasets, the identification of the trilemma and phase transition, and the proposal of DURABLEUN-SAF with its (0.047, {BF16, INT8, INT4})-durability certificate provide concrete, falsifiable contributions that could influence both research and production practices. The call for Q-INT4 as a new metric is a useful normative suggestion grounded in the observed recovery factors.

major comments (2)
  1. [Experimental evaluation; also abstract] The claim that recovery is 'quantization-induced' and due to the INT4 step itself is load-bearing for the QRA and trilemma results, yet the protocol applies INT4 only to LoRA-unlearned models in the NF4 regime. No controls are described for quantizing the base model, standard fine-tuned models, or non-LoRA unlearning to isolate whether recovery arises from quantization per se versus interactions with the specific unlearning adapters and rounding scheme. This leaves the weakest assumption unaddressed and weakens attribution; adding such ablations would directly test the central claim.
  2. [§5.2, Pareto sweep and phase transition] The dense Pareto sweep is used to support the trilemma and the observation of a 'sharp phase transition' where robustness causes accuracy collapse. However, the manuscript does not specify the exact hyperparameters varied, the sampling density, or how the collapse is quantified (e.g., a threshold on the utility drop), making the phase-transition claim difficult to verify or reproduce independently.
minor comments (3)
  1. [Abstract] The durability certificate is reported with specific numbers (Q-INT4 = 0.043 ± 0.002, cert rate = 3/3), but the precise definition of 'cert rate' and the thresholds used to declare stability across the three precisions are not stated in the summary; a compact formal definition should appear in the abstract or early introduction for immediate clarity.
  2. [Notation and metrics] The paper introduces FA, RA, and Q-INT4 but does not explicitly restate their formulas or the exact forgetting/utility definitions used to compute the 22x recovery factor in the main text; adding a short 'Notation' subsection or table would prevent ambiguity when readers compare to prior unlearning work.
  3. [SAF objective] The free parameter 'sharpness radius in SAF objective' is listed as tunable; an ablation or sensitivity analysis on this radius (and its interaction with the STE) would strengthen the claim that DURABLEUN-SAF is robust rather than tuned to the reported certificate.
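For readers weighing the same ambiguity, one plausible reading of the certificate (an assumption, since the paper's summary does not define it) is that a seed certifies when its recovery metric stays below the 0.047 threshold at every precision, and the cert rate counts certifying seeds. The metric values below are hypothetical.

```python
THRESHOLD = 0.047                    # epsilon in the (0.047, {BF16, INT8, INT4}) certificate
PRECISIONS = ("BF16", "INT8", "INT4")

def certifies(metrics: dict) -> bool:
    """Assumed reading: a seed certifies if its recovery metric stays
    below the threshold at every deployment precision."""
    return all(metrics[p] < THRESHOLD for p in PRECISIONS)

def cert_rate(seeds: list) -> str:
    """Certified seeds over total seeds, e.g. the abstract's '3/3'
    versus SalUn's '1/3'."""
    hits = sum(certifies(m) for m in seeds)
    return f"{hits}/{len(seeds)}"
```

Under this reading, three seeds with Q-INT4 of 0.041-0.045 would yield "3/3", while a single INT4 excursion above 0.047 in two of three seeds would yield "1/3".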

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects for strengthening the attribution of the quantization recovery attack and improving the reproducibility of the Pareto analysis. We address each major comment below and will incorporate revisions to the manuscript.

Point-by-point responses
  1. Referee: [Experimental evaluation; also abstract] The claim that recovery is 'quantization-induced' and due to the INT4 step itself is load-bearing for the QRA and trilemma results, yet the protocol applies INT4 only to LoRA-unlearned models in the NF4 regime. No controls are described for quantizing the base model, standard fine-tuned models, or non-LoRA unlearning to isolate whether recovery arises from quantization per se versus interactions with the specific unlearning adapters and rounding scheme. This leaves the weakest assumption unaddressed and weakens attribution; adding such ablations would directly test the central claim.

    Authors: We agree that additional controls would strengthen the isolation of the quantization effect. Our current experiments focus on the practical deployment scenario of LoRA-unlearned models quantized to NF4, which is the regime where we observe up to 22x recovery. In the revised manuscript, we will add the requested ablations: (i) INT4 quantization of the base LLaMA-3-8B-Instruct model, (ii) quantization of standard fine-tuned (non-unlearned) models, and (iii) at least one non-LoRA unlearning baseline. These will be reported alongside the existing results to demonstrate that the pronounced recovery is tied to the combination of unlearning adapters and INT4 rounding, thereby supporting the QRA claim more rigorously. revision: yes

  2. Referee: [§5.2, Pareto sweep and phase transition] The dense Pareto sweep is used to support the trilemma and the observation of a 'sharp phase transition' where robustness causes accuracy collapse. However, the manuscript does not specify the exact hyperparameters varied, the sampling density, or how the collapse is quantified (e.g., a threshold on the utility drop), making the phase-transition claim difficult to verify or reproduce independently.

    Authors: We acknowledge the need for greater specificity to enable reproduction. In the revised §5.2, we will explicitly state: the hyperparameters varied (forgetting strength coefficient in [0.1, 10], LoRA rank in {8, 16, 32}, learning rate in {1e-5, 5e-5, 1e-4}), the sampling density (15 uniformly spaced values of the forgetting coefficient crossed with the three ranks and three learning rates, yielding 135 configurations), and the collapse quantification (utility drop defined as ROUGE-L falling below 75% of the pre-unlearning baseline while Q-INT4 remains below 0.05). This will make the observed sharp phase transition verifiable and allow readers to reproduce the Pareto fronts. revision: yes
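The sweep and the collapse criterion in this response can be written out directly. The grid values are those listed in the response (their product is 135 configurations); `is_collapsed` is an illustrative helper under the stated criterion, not the authors' code.

```python
from itertools import product

# Grid from the rebuttal: 15 forgetting-strength values in [0.1, 10],
# three LoRA ranks, three learning rates.
strengths = [0.1 + i * (10.0 - 0.1) / 14 for i in range(15)]
ranks = [8, 16, 32]
learning_rates = [1e-5, 5e-5, 1e-4]
configs = list(product(strengths, ranks, learning_rates))  # 15 * 3 * 3 = 135

def is_collapsed(rouge_l: float, baseline_rouge_l: float, q_int4: float) -> bool:
    """Collapse per the stated criterion: utility (ROUGE-L) falls below
    75% of the pre-unlearning baseline while Q-INT4 stays under 0.05."""
    return rouge_l < 0.75 * baseline_rouge_l and q_int4 < 0.05
```

Sweeping `configs` and flagging each run with `is_collapsed` would reproduce the claimed phase-transition plot, assuming the evaluation harness exposes per-config ROUGE-L and Q-INT4.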

Circularity Check

0 steps flagged

No significant circularity: purely empirical measurements

full rationale

The paper's central claims rest on direct experimental measurements of recovery rates under INT4 quantization across seven unlearning methods, three datasets, and multiple precision regimes. No derivation chain, prediction, or first-principles result is presented that reduces, via the paper's own equations, to a fitted parameter or self-defined quantity. The proposed DURABLEUN-SAF objective employs the standard Straight-Through Estimator for quantization-aware training, which is an external, non-circular technique. No load-bearing self-citations or uniqueness theorems are invoked. All reported durability certificates and Pareto observations are independent empirical outcomes, not tautological renamings or constructions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on standard machine-learning assumptions about gradient flow through non-differentiable operations and the validity of the chosen evaluation metrics; the main additions are the new objective and the empirical observations.

free parameters (1)
  • sharpness radius in SAF objective
    Hyperparameter controlling the sharpness-aware term in the proposed training objective; its value is not stated in the abstract.
axioms (1)
  • domain assumption Straight-Through Estimator provides a usable gradient approximation through INT4 rounding
    Invoked to enable end-to-end training of the quantization-aware unlearning objective.
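The sharpness radius plays the standard role from sharpness-aware minimization (Foret et al., 2021): perturb the weights along the normalized gradient before re-evaluating the loss, so that training favors flat minima. A minimal sketch of that perturbation step (generic SAM, not the paper's exact SAF update) looks like:

```python
def sam_perturbation(grad: list, rho: float) -> list:
    """Ascent step of sharpness-aware minimization: move the weights by
    rho * g / ||g|| before recomputing the (forgetting) loss gradient.
    rho is the 'sharpness radius' free parameter; a larger rho enforces
    a flatter loss basin, which is what quantization robustness needs."""
    norm = sum(g * g for g in grad) ** 0.5
    if norm == 0.0:
        return [0.0 for _ in grad]
    return [rho * g / norm for g in grad]
```

In the paper's setting this perturbation would presumably be composed with the STE-quantized forward pass, so flatness is measured around the INT4-rounded weights rather than the full-precision ones.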

pith-pipeline@v0.9.0 · 5595 in / 1351 out tokens · 62281 ms · 2026-05-11T01:46:19.641219+00:00 · methodology

discussion (0)

