Recognition: no theorem link
DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning
Pith reviewed 2026-05-11 01:46 UTC · model grok-4.3
The pith
INT4 quantization restores data that machine unlearning removed at higher precision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
INT4 quantization induces what the paper terms the quantization recovery attack (QRA), reviving training data removed by unlearning even when the model passes audits at BF16. No existing method escapes the FA-RA-Q-INT4 trilemma: none simultaneously achieves strong forgetting, high utility, and quantization robustness. A new sharpness-aware objective using straight-through estimator gradients yields the first method with a stable (0.047, {BF16, INT8, INT4})-durability certificate.
What carries the argument
The quantization recovery attack (QRA) observed under adapter-space INT4 quantization in the NF4+LoRA regime, together with the sharpness-aware forgetting objective that propagates gradients through the INT4 rounding operation.
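The mechanism is easiest to see in miniature. Below is a minimal sketch, with hypothetical scale and weight values, of how symmetric round-to-nearest INT4 can erase a small unlearning update; real NF4 uses non-uniform levels and per-group scales, so this only illustrates the rounding argument.

```python
def int4_code(w, scale):
    """Symmetric round-to-nearest INT4: integer code clipped to [-8, 7]."""
    return max(-8, min(7, round(w / scale)))

scale = 0.1          # hypothetical per-group quantization scale
w_original = 0.52    # a weight before unlearning
w_unlearned = 0.47   # after a small unlearning update of -0.05

# At BF16 the two weights differ, so a BF16 audit sees the unlearned
# behaviour; after INT4 rounding both collapse onto the same level,
# and the unlearning update is silently undone.
assert int4_code(w_original, scale) == int4_code(w_unlearned, scale) == 5
```

Whenever the unlearning perturbation is smaller than half a quantization step, round-to-nearest maps the unlearned weight back to the original level, which is the intuition behind the QRA.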
Load-bearing premise
The measured recovery of forgotten content is caused by the quantization step itself rather than by interactions with the chosen unlearning algorithms, metrics, or datasets.
What would settle it
Running the same seven unlearning methods on LLaMA-3-8B-Instruct with the TOFU, MUSE-News, and WikiBio-WPU datasets and finding that INT4 outputs show no increase in membership or extraction metrics over the BF16 baseline.
read the original abstract
Machine unlearning aims to remove specified training data to satisfy privacy regulations such as GDPR. However, existing evaluations assume identical precision at unlearning and deployment, overlooking that production LLMs are deployed at low-bit precision. We show that INT4 quantization systematically restores forgotten content even when models pass compliance audits at bfloat16 (BF16); we term this the quantization recovery attack (QRA). We conduct the first systematic study of unlearning robustness under adapter-space INT4 quantization in the NF4+LoRA regime, evaluating seven methods on LLaMA-3-8B-Instruct across TOFU, MUSE-News, and WikiBio-WPU. INT8 is benign; INT4 induces recovery of up to 22x, worsening with dataset difficulty. We identify the FA-RA-Q-INT4 trilemma: no method simultaneously achieves strong forgetting, high utility, and quantization robustness. A dense Pareto sweep reveals a sharp phase transition: once robustness is achieved, retain accuracy collapses regardless of further tuning. To address this, we propose DURABLEUN-SAF (Sharpness-Aware Forgetting), a quantization-aware objective using Straight-Through Estimator gradients through INT4 rounding. DURABLEUN-SAF is the only method to achieve a stable empirical (0.047, {BF16, INT8, INT4})-durability certificate: Q-INT4 = 0.043 ± 0.002, cert rate = 3/3, versus SalUn's cert rate of 1/3 at its own published hyperparameters. We call for Q-INT4 to be adopted as a standard evaluation metric alongside FA and RA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that INT4 quantization systematically restores forgotten content in machine-unlearned LLMs (up to 22x recovery) even when models pass BF16 compliance audits, terming this the quantization recovery attack (QRA). It presents the first systematic evaluation of seven unlearning methods via LoRA adapters on LLaMA-3-8B-Instruct across TOFU, MUSE-News, and WikiBio-WPU datasets, finds INT8 benign but INT4 problematic, identifies a FA-RA-Q-INT4 trilemma with a sharp phase transition in Pareto fronts, and proposes DURABLEUN-SAF (a sharpness-aware forgetting objective using straight-through estimator gradients through INT4 rounding) that alone achieves a stable empirical durability certificate across BF16/INT8/INT4. The work advocates adopting Q-INT4 as a standard metric alongside forgetting and utility.
Significance. If the central results hold after addressing isolation concerns, the work is significant for highlighting a practical gap in unlearning evaluations that ignore deployment quantization. The empirical measurements across multiple methods and datasets, the identification of the trilemma and phase transition, and the proposal of DURABLEUN-SAF with its (0.047, {BF16, INT8, INT4})-durability certificate provide concrete, falsifiable contributions that could influence both research and production practices. The call for Q-INT4 as a new metric is a useful normative suggestion grounded in the observed recovery factors.
major comments (2)
- [Experimental Evaluation] (and abstract): The claim that recovery is 'quantization-induced' and due to the INT4 step itself is load-bearing for the QRA and trilemma results, yet the protocol applies INT4 only to LoRA-unlearned models in the NF4 regime. No controls are described for quantizing the base model, standard fine-tuned models, or non-LoRA unlearning to isolate whether recovery arises from quantization per se versus interactions with the specific unlearning adapters and rounding scheme. This leaves the weakest assumption unaddressed and weakens attribution; adding such ablations would directly test the central claim.
- [§5.2] (Pareto sweep and phase transition): The dense Pareto sweep is used to support the trilemma and the observation of a 'sharp phase transition' where robustness causes accuracy collapse. However, the manuscript does not specify the exact hyperparameters varied, the sampling density, or how the collapse is quantified (e.g., threshold on utility drop), making the phase-transition claim difficult to verify or reproduce independently.
minor comments (3)
- [Abstract] The durability certificate is reported with specific numbers (Q-INT4 = 0.043 ± 0.002, cert rate = 3/3) but the precise definition of 'cert rate' and the thresholds used to declare stability across the three precisions are not stated in the summary; a compact formal definition should appear in the abstract or early introduction for immediate clarity.
- Notation and metrics: The paper introduces FA, RA, and Q-INT4 but does not explicitly restate their formulas or the exact forgetting/utility definitions used to compute the 22x recovery factor in the main text; adding a short 'Notation' subsection or table would prevent ambiguity when readers compare to prior unlearning work.
- The free parameter 'sharpness radius in SAF objective' is listed as tunable; an ablation or sensitivity analysis on this radius (and its interaction with the STE) would strengthen the claim that DURABLEUN-SAF is robust rather than tuned to the reported certificate.
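Since the review flags the missing definition of the certificate, one plausible reading can be sketched as code; the threshold semantics and per-seed aggregation below are assumptions for illustration, not the paper's stated definition, and all Q values are hypothetical.

```python
def certified(q_by_precision, eps=0.047):
    """(eps, S)-durability under one plausible reading: the forgetting
    gap Q stays within eps at every precision in S = {BF16, INT8, INT4}."""
    return all(q <= eps for q in q_by_precision.values())

# 'cert rate = 3/3' would then mean the certificate holds for every seed.
runs = [  # hypothetical per-seed Q values consistent with 0.043 +/- 0.002
    {"BF16": 0.041, "INT8": 0.043, "INT4": 0.045},
    {"BF16": 0.040, "INT8": 0.044, "INT4": 0.043},
    {"BF16": 0.042, "INT8": 0.041, "INT4": 0.046},
]
assert sum(certified(r) for r in runs) == 3   # cert rate 3/3
```

Under this reading, SalUn's reported 1/3 would mean only one of three seeds stays within eps at all three precisions.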
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects for strengthening the attribution of the quantization recovery attack and improving the reproducibility of the Pareto analysis. We address each major comment below and will incorporate revisions to the manuscript.
read point-by-point responses
-
Referee: [Experimental Evaluation] (and abstract): The claim that recovery is 'quantization-induced' and due to the INT4 step itself is load-bearing for the QRA and trilemma results, yet the protocol applies INT4 only to LoRA-unlearned models in the NF4 regime. No controls are described for quantizing the base model, standard fine-tuned models, or non-LoRA unlearning to isolate whether recovery arises from quantization per se versus interactions with the specific unlearning adapters and rounding scheme. This leaves the weakest assumption unaddressed and weakens attribution; adding such ablations would directly test the central claim.
Authors: We agree that additional controls would strengthen the isolation of the quantization effect. Our current experiments focus on the practical deployment scenario of LoRA-unlearned models quantized to NF4, which is the regime where we observe up to 22x recovery. In the revised manuscript, we will add the requested ablations: (i) INT4 quantization of the base LLaMA-3-8B-Instruct model, (ii) quantization of standard fine-tuned (non-unlearned) models, and (iii) at least one non-LoRA unlearning baseline. These will be reported alongside the existing results to demonstrate that the pronounced recovery is tied to the combination of unlearning adapters and INT4 rounding, thereby supporting the QRA claim more rigorously. revision: yes
-
Referee: [§5.2] (Pareto sweep and phase transition): The dense Pareto sweep is used to support the trilemma and the observation of a 'sharp phase transition' where robustness causes accuracy collapse. However, the manuscript does not specify the exact hyperparameters varied, the sampling density, or how the collapse is quantified (e.g., threshold on utility drop), making the phase-transition claim difficult to verify or reproduce independently.
Authors: We acknowledge the need for greater specificity to enable reproduction. In the revised §5.2, we will explicitly state: the hyperparameters varied (forgetting strength coefficient in [0.1, 10], LoRA rank in {8, 16, 32}, learning rate in {1e-5, 5e-5, 1e-4}), the sampling density (15 uniformly spaced values of the forgetting coefficient crossed with the three ranks and three learning rates, yielding 135 configurations), and the collapse quantification (utility drop defined as ROUGE-L falling below 75% of the pre-unlearning baseline while Q-INT4 remains below 0.05). This will make the observed sharp phase transition verifiable and allow readers to reproduce the Pareto fronts. revision: yes
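Under the thresholds the rebuttal proposes, the collapse test is mechanical. A sketch follows; the function and variable names are ours, and the example scores are hypothetical.

```python
def collapsed(rouge_l, q_int4, rouge_baseline,
              util_frac=0.75, q_thresh=0.05):
    """Phase-transition collapse per the rebuttal's criterion: robustness
    achieved (Q-INT4 below threshold) while ROUGE-L utility has fallen
    below util_frac of the pre-unlearning baseline."""
    return q_int4 < q_thresh and rouge_l < util_frac * rouge_baseline

baseline = 0.60
assert collapsed(0.40, 0.04, baseline)        # robust, utility collapsed
assert not collapsed(0.55, 0.04, baseline)    # robust, utility retained
assert not collapsed(0.40, 0.20, baseline)    # not robust, no collapse call
```

A sharp phase transition would then show up as nearly every robust configuration in the sweep satisfying this predicate.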
Circularity Check
No significant circularity: purely empirical measurements
full rationale
The paper's central claims rest on direct experimental measurements of recovery rates under INT4 quantization across seven unlearning methods, three datasets, and multiple precision regimes. No derivation chain, prediction, or first-principles result is presented that reduces, by the paper's own equations, to a fitted parameter or self-defined quantity. The proposed DURABLEUN-SAF objective employs the standard Straight-Through Estimator for quantization-aware training, which is an external, non-circular technique. No load-bearing self-citations or uniqueness theorems are invoked. All reported durability certificates and Pareto observations are independent empirical outcomes, not tautological renamings or constructions.
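The externality of the STE is worth making concrete. Below is a toy sketch of straight-through gradients through rounding on a scalar weight with a uniform grid; DURABLEUN-SAF's actual objective, sharpness term, and NF4 grid are not reproduced here.

```python
def quantize(w, scale=0.1):
    """Round-to-nearest quantization onto a uniform INT4-style grid."""
    return scale * max(-8, min(7, round(w / scale)))

# Toy loss L = 0.5 * (quantize(w) - target)^2. The forward pass is
# piecewise constant, so the true gradient w.r.t. w is zero almost
# everywhere; the STE instead copies dL/dq straight through to w.
w, target, lr = 0.52, 0.0, 0.2
for _ in range(20):
    dL_dq = quantize(w) - target   # gradient at the quantized output
    w -= lr * dL_dq                # STE: treat d(quantize)/dw as 1
assert quantize(w) == 0.0          # the quantized output reaches the target
```

This is why the STE counts as an external axiom in the ledger below: it is a gradient approximation borrowed from quantization-aware training, not a construct of the paper.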
Axiom & Free-Parameter Ledger
free parameters (1)
- sharpness radius in SAF objective
axioms (1)
- domain assumption: the Straight-Through Estimator provides a usable gradient approximation through INT4 rounding
Reference graph
Works this paper leans on
- [1] Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In 2015 IEEE Symposium on Security and Privacy, pages 463–480. IEEE, 2015.
- [2] Regulation (EU) 2016/679 of the European Parliament and of the Council (General Data Protection Regulation). Regulation (EU), 679(2016), 2016.
- [3] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
- [4] Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B. arXiv preprint arXiv:2310.06825, 2023.
- [5] Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy, pages 141–159. IEEE, 2021.
- [6] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9304–9312, 2020.
- [7] Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, and Eleni Triantafillou. Towards unbounded machine unlearning. Advances in Neural Information Processing Systems, 36:1957–1987, 2023.
- [8] Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. Negative preference optimization: From catastrophic collapse to effective unlearning. arXiv preprint arXiv:2404.05868, 2024.
- [9] Chongyu Fan, Jiancheng Liu, Yihua Zhang, Eric Wong, Dennis Wei, and Sijia Liu. SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation. In International Conference on Learning Representations, 2024.
- [10] Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, et al. The WMDP benchmark: Measuring and reducing malicious use with unlearning. In Proceedings of Machine Learning Research, 2024.
- [11] Junfeng Fang, Houcheng Luo, Kun Wang, Ruobing Li, Xiang Wang, Aixin Zhang, and Xiangnan He. AlphaEdit: Null-space constrained knowledge editing for language models. In International Conference on Learning Representations, 2024.
- [12] Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, and J. Zico Kolter. TOFU: A task of fictitious unlearning for LLMs. In First Conference on Language Modeling, 2024.
- [13] Yuanshun Yao, Xiaojun Xu, and Yang Liu. Large language model unlearning. In Advances in Neural Information Processing Systems, 2024.
- [14] Jiaao Chen and Diyi Yang. Unlearn what you want to forget: Efficient unlearning for LLMs, 2023. URL https://arxiv.org/abs/2310.20150.
- [15] Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, and Minjoon Seo. Knowledge unlearning for mitigating privacy risks in language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14389–14408, 2023.
- [16] Ronen Eldan and Mark Russinovich. Who's Harry Potter? Approximate unlearning in LLMs, 2023. URL https://arxiv.org/abs/2310.02238.
- [17] Vikram S. Chundawat, Ayush K. Tarun, Murari Mandal, and Mohan Kankanhalli. Zero-shot machine unlearning. IEEE Transactions on Information Forensics and Security, 18:2345–2354, 2023.
- [18] Seth Neel, Aaron Roth, and Saeed Sharifi-Malvajerdi. Descent-to-delete: Gradient-based methods for machine unlearning. In International Conference on Algorithmic Learning Theory, pages 931–962. PMLR, 2021.
- [19] Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11516–11524, 2021.
- [20] Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. GPTQ: Accurate post-training quantization for generative pre-trained transformers. In International Conference on Learning Representations, 2023.
- [21] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems, 36, 2023.
- [22] Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Guangxuan Xiao, and Song Han. AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration. GetMobile: Mobile Computing and Communications, 28(4):12–17, January 2025. doi: 10.1145/3714983.3714987.
- [23] Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, and Kurt Keutzer. SqueezeLLM: Dense-and-sparse quantization. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024.
- [25] Zhiwei Zhang, Fali Wang, Xiaomin Li, Zongyu Wu, Xianfeng Tang, Hui Liu, Qi He, Wenpeng Yin, and Suhang Wang. Catastrophic failure of LLM unlearning via quantization. In International Conference on Learning Representations, pages 54940–54963, 2025. arXiv:2410.16454.
- [26] Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. In International Conference on Learning Representations, 2023.
- [27] Thanh Tam Nguyen, Thanh Trung Huynh, Zhao Ren, Phi Le Nguyen, Alan Wee-Chung Liew, Hongzhi Yin, and Quoc Viet Hung Nguyen. A survey of machine unlearning. ACM Transactions on Intelligent Systems and Technology, 16(5):1–46, 2025.
- [28] Jie Xu, Zihan Wu, Cong Wang, and Xiaohua Jia. Machine unlearning: Solutions and challenges. IEEE Transactions on Emerging Topics in Computational Intelligence, 2023.
- [29] Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, and Meng Jiang. Towards safer large language models through machine unlearning. In Findings of the Association for Computational Linguistics: ACL 2024, pages 1817–1829, 2024.
- [30] Guangyao Dou, Zheyuan Liu, Qing Lyu, Kaize Ding, and Eric Wong. Avoiding copyright infringement via large language model unlearning. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 5176–5200, 2025.
- [31] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35:17359–17372, 2022.
- [32] Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, and Ningyu Zhang. Editing large language models: Problems, methods, and opportunities. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10222–10240, 2023.
- [33] Martin Pawelczyk, Seth Neel, and Himabindu Lakkaraju. In-context unlearning: Language models as few-shot unlearners. arXiv preprint arXiv:2310.07579, 2023.
- [34] Yoichi Ishibashi and Hidetoshi Shimodaira. Knowledge sanitization of large language models. arXiv preprint arXiv:2309.11852, 2023. URL https://arxiv.org/abs/2309.11852.
- [35] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2704–2713, 2018.
- [36] Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. Journal of Machine Learning Research, 18:1–30, 2018.
- [37] Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding, 2016. URL https://arxiv.org/abs/1510.00149.
- [38] Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra S. Modha. Learned step size quantization. In International Conference on Learning Representations, 2020.
- [39] Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
- [40] Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. Sharpness-aware minimization for efficiently improving generalization. In International Conference on Learning Representations, 2021.
- [41] Jamie Hayes, Ilia Shumailov, Eleni Triantafillou, Amr Khalifa, and Nicolas Papernot. Inexact unlearning needs more careful evaluations to avoid a false sense of privacy. In 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pages 497–519. IEEE, 2025.
- [42] Qizhou Wang, Jin Peng Zhou, Zhanke Zhou, Saebyeol Shin, Bo Han, and Kilian Q. Weinberger. Rethinking LLM unlearning objectives: A gradient perspective and go beyond. In International Conference on Learning Representations (ICLR), 2025. URL https://openreview.net/forum?id=huo8MqVH6t.
- [43] Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, and Dylan Hadfield-Menell. Eight methods to evaluate robust unlearning in LLMs. CoRR, abs/2402.16835, 2024.
- [44] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.
- [45] Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramèr. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy, pages 1897–1914. IEEE, 2022.
- [46] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320. PMLR, 2019.
- [47] Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, and Chiyuan Zhang. MUSE: Machine unlearning six-way evaluation for language models. In International Conference on Learning Representations, 2025.
- [48] Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. Alpaca: A strong, replicable instruction-following model. Stanford Center for Research on Foundation Models, https://crfm.stanford.edu/2023/03/13/alpaca.html, 2023.
[49]
Claims Answer: [Yes] Justification: All central claims of the paper are explicitly grounded in empirical evidence and are consistently supported across multiple independent experiments. The identification of INT4 quantization as a recovery attack is demonstrated through controlled comparisons between pre- and post-quantization model behaviors, showing sys...
-
[50]
Limitations Answer: [Yes] Justification: The paper provides a transparent and comprehensive discussion of its lim - itations, ensuring that the scope of conclusions is clearly bounded. First, the observed degradation in retain accuracy (RA) under strong unlearning is explicitly ackn owledged, and is framed as an inherent trade -off rather than a flaw of t...
-
[51]
Theory assumptions and proofs Answer: [Yes] Justification: All theoretical components are presented with explicit assumptions and a clearly delineated scope. The recovery bound proposition is derived under an L-smoothness assumption on the loss function, which is standard in optimization theory and is explicitly stated prior to the derivation. The proof i...
-
[52]
Experimental result reproducibility 16 ± Answer: [Yes] Justification: The paper provides a fully reproducible experimental pipeline with all nec - essary details disclosed. Section 3 specifies the complete training and evaluation protocol, including dataset preprocessing, model initialization, and evaluation metrics. All hyperpa- rameters, including learn...
-
[53]
Open access to data and code Answer: [Yes] Justification: All code required to reproduce the experiments is provided in an anonymized supplementary archive, ensuring compliance with double-blind review policies. The reposi- tory includes a complete implementation of the proposed method, baseline methods, and evaluation scripts. A structured README file gu...
-
[54]
Experimental setting/details Answer: [Yes] Justification: The experimental setup is described in sufficient detail to allow exact replica- tion and critical evaluation. The paper specifies the model architecture, including the base LLM and any modifications introduced by the method. Training procedures, including the number of epochs, optimization algorit...
-
[55]
Experiment statistical significance Answer: [Yes] Justification: The paper reports statistical measures to ensure that results are not due to random variation. Multi-seed experiments are conducted using three independent random seeds, and results are reported as mean standard deviation. This applies to both the proposed method and key baselines such as Sa...
-
[56]
Ablations Answer: [Yes] Justification: The paper includes a comprehensive ablation study to isolate the contribution of each component of the proposed method. Key components, such as warm -up schedul- ing, straight -through estimator (STE) coverage, and the regularization parameter λ, are systematically varied. Each ablation experiment is conducted under ...
-
[57]
Safeguards Answer: [N/A] Justification: The paper focuses on defensive techniques for improving the robustness of machine unlearning and does not introduce new attack mechanisms intended for misuse. While it identifies quantization as a potential recovery pathway, this is framed in the context of vulnerability analysis and mitigation. No tools or datasets...
-
[58]
Licenses Answer: [Yes] Justification: All datasets, models, and libraries used in the paper comply with their respec- tive licenses. The LLaMA-3 model is used under the Meta Research License, and access is obtained through authorized channels. The TOFU dataset is released under the MIT license and is freely accessible. The bits-and-bytes library used for ...
-
[59]
New assets Answer: [Yes] Justification: The paper introduces new code assets available in the supplementary material. These include implementations of the proposed DurableUn method, evaluation scripts, and baseline integrations. The codebase is documented with a README file that provides step-by-step instructions for reproducing the issue. Dependencies ar...
-
[60]
Crowdsourcing / human subjects Answer: [N/A]
-
[61]
It is not used as a generative tool in the methodology or for producing results
LLM usage Answer: [N/A] Justification: The LLaMA-3-8B model is used purely as an experimental subject to evaluate unlearning and quantization effects. It is not used as a generative tool in the methodology or for producing results. No human-facing outputs are generated using the model. Therefore, no additional disclosure related to LLM usage is required
discussion (0)