FedQUIT: On-Device Federated Unlearning via a Quasi-Competent Virtual Teacher
Pith reviewed 2026-05-23 21:46 UTC · model grok-4.3
The pith
FedQUIT lets clients unlearn their data on-device in federated learning by distilling from a modified global model without extra protocol assumptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FedQUIT achieves unlearning in federated learning by having the requesting client train its local model via knowledge distillation from a virtual teacher. The teacher is obtained by manipulating the global model's outputs on forget data: true-class confidence is penalized while the relationships among non-true classes are preserved. This removes the influence of the client's data without additional assumptions beyond FedAvg.
What carries the argument
The quasi-competent virtual teacher created by selective output manipulation on forget data inside a teacher-student distillation loop where the client's local model is the student.
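The output manipulation described above can be illustrated with a minimal sketch. It assumes one particular rule that this page does not spell out: the true-class probability is replaced by a fixed value `v`, and the remaining mass is shared among the non-true classes in their original proportions. The names `softmax`, `virtual_teacher`, and `v` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def virtual_teacher(global_logits, true_class, v=0.0):
    """Hypothetical teacher output for one forget sample.

    Assumption (not confirmed by the source): the true-class probability is
    set to a fixed value v, and the remaining mass (1 - v) is split among the
    non-true classes in the proportions the global model assigned to them.
    """
    probs = softmax(global_logits)
    non_true_mass = 1.0 - probs[true_class]
    n_other = len(probs) - 1
    teacher = []
    for k, p in enumerate(probs):
        if k == true_class:
            teacher.append(v)
        elif non_true_mass > 0.0:
            # Rescaling by a common factor keeps non-true ratios unchanged.
            teacher.append((1.0 - v) * p / non_true_mass)
        else:
            # Degenerate case: the global model put all mass on the true class.
            teacher.append((1.0 - v) / n_other)
    return teacher
```

Because every non-true probability is multiplied by the same factor, the ratio between any two non-true classes is the same in the teacher as in the global model, which is one simple reading of "preserving non-true class relationships".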
Load-bearing premise
Penalizing true-class confidence on forget data while preserving non-true class relationships in the global model is enough to make the client model forget without harming its overall generalization under standard FedAvg.
What would settle it
A centralized evaluation showing that the updated global model still achieves high accuracy when tested on the forget client's data after FedQUIT completes would falsify the unlearning claim.
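A minimal sketch of such a check follows. It assumes a `predict` callable and a list of `(input, label)` pairs for the forget client. Comparing against a reference model that never saw the forget data, and the `tolerance` value, are illustrative choices rather than the paper's evaluation protocol.

```python
def accuracy(predict, samples):
    """Fraction of (x, y) pairs for which predict(x) == y."""
    if not samples:
        raise ValueError("samples must be non-empty")
    correct = sum(1 for x, y in samples if predict(x) == y)
    return correct / len(samples)


def unlearning_is_suspect(forget_acc, reference_acc, tolerance=0.05):
    """Flag a run whose forget-set accuracy exceeds a reference by more than tolerance.

    Assumption: reference_acc is the forget-set accuracy of a model that never
    trained on the forget client's data (for example, one retrained without it).
    """
    return forget_acc - reference_acc > tolerance
```

Usage would be to compute `accuracy` on the forget set for both the unlearned global model and the reference model, then pass the two numbers to `unlearning_is_suspect`.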
Original abstract
Federated Learning (FL) enables the collaborative training of machine learning models without requiring centralized collection of user data. To comply with the right to be forgotten, FL clients should be able to request the removal of their data contributions from the global model. In this paper, we propose FedQUIT, a novel unlearning algorithm that operates directly on the devices of clients that request to remove their contribution. Our method leverages knowledge distillation to remove the influence of the target client's data from the global model while preserving its generalization ability. FedQUIT adopts a teacher-student framework, where a modified version of the current global model serves as a virtual teacher and the client's model acts as the student. The virtual teacher is obtained by manipulating the global model's outputs on forget data, penalizing the confidence assigned to the true class while preserving relationships among outputs of non-true classes, to simultaneously induce forgetting and retain useful knowledge. As a result, FedQUIT achieves unlearning without making any additional assumption over the standard FedAvg protocol. Evaluation across diverse datasets, data heterogeneity levels, and model architectures shows that FedQUIT achieves superior or comparable unlearning efficacy compared to six state-of-the-art methods, while significantly reducing cumulative communication and computational overhead relative to retraining from scratch.
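The teacher-student step in the abstract can be sketched as a per-sample distillation loss. This assumes the common Kullback-Leibler formulation of knowledge distillation with no temperature scaling; the abstract does not state which loss FedQUIT uses, and the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def distillation_loss(student_logits, teacher_probs):
    """KL(teacher || student) for a single sample.

    Classes to which the teacher assigns zero probability contribute nothing,
    following the usual convention 0 * log(0) = 0.
    """
    log_q = log_softmax(student_logits)
    loss = 0.0
    for p, lq in zip(teacher_probs, log_q):
        if p > 0.0:
            loss += p * (math.log(p) - lq)
    return loss
```

Minimizing this loss on forget samples pulls the student's predictions toward the teacher's distribution, so a teacher with low true-class confidence pushes the student away from the true class.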
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes FedQUIT, an on-device federated unlearning algorithm for the right to be forgotten in FL. It uses a teacher-student knowledge distillation setup where the client's local model is the student and a modified version of the current global model serves as a virtual teacher; the teacher is created by manipulating outputs on forget data to penalize true-class confidence while preserving relative non-true class outputs. The method claims to achieve effective unlearning under the standard FedAvg protocol with no extra assumptions, and reports superior or comparable unlearning efficacy to six SOTA baselines across datasets, heterogeneity levels, and architectures, while cutting cumulative communication and compute relative to retraining from scratch.
Significance. If the core heuristic is shown to reliably excise client influence without degrading retain-set performance or requiring protocol changes, the result would be significant for practical deployment of unlearning in federated systems, as it avoids the high cost of full retraining and operates locally on the requesting client.
major comments (2)
- [Abstract / §3] Abstract (and §3, virtual-teacher construction): the central claim that FedQUIT requires 'no additional assumption over the standard FedAvg protocol' is load-bearing, yet the manuscript provides no derivation or bound showing why penalizing only the true-class logit on forget samples (while keeping non-true relationships) suffices to remove the client's contribution after aggregation; under non-IID FedAvg the global outputs on a single client's forget set may already be entangled with other clients' data, so the heuristic risks leaving residual influence or harming generalization on retain data.
- [Evaluation] Evaluation sections: the reported superiority over six baselines is presented without explicit controls for the exact definition of the manipulation parameter, statistical significance across runs, or ablation isolating the effect of the non-true-class preservation step; without these, it is unclear whether the claimed reduction in overhead is robust or an artifact of the chosen heterogeneity levels.
minor comments (2)
- [§3] Notation for the output manipulation (e.g., how the penalized logits are exactly computed and scaled) should be formalized with an equation rather than prose description.
- [Related Work] The paper should include a short related-work table contrasting FedQUIT's on-device, no-extra-assumption property against the six baselines.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below.
Point-by-point responses
- Referee: [Abstract / §3] Abstract (and §3, virtual-teacher construction): the central claim that FedQUIT requires 'no additional assumption over the standard FedAvg protocol' is load-bearing, yet the manuscript provides no derivation or bound showing why penalizing only the true-class logit on forget samples (while keeping non-true relationships) suffices to remove the client's contribution after aggregation; under non-IID FedAvg the global outputs on a single client's forget set may already be entangled with other clients' data, so the heuristic risks leaving residual influence or harming generalization on retain data.
  Authors: FedQUIT performs unlearning entirely locally on the requesting client using only the global model received under standard FedAvg, with no protocol changes or extra information required from the server. The virtual teacher is constructed by a local manipulation that reduces true-class confidence on forget samples while preserving relative outputs among non-true classes; this is presented as a practical heuristic rather than a theoretically bounded procedure. Experiments across non-IID partitions show effective unlearning without retain-set degradation, supporting that the approach does not introduce new assumptions. We will revise §3 and the discussion to clarify the heuristic motivation and explicitly note the absence of a formal bound on residual influence.
  Revision: partial
- Referee: [Evaluation] Evaluation sections: the reported superiority over six baselines is presented without explicit controls for the exact definition of the manipulation parameter, statistical significance across runs, or ablation isolating the effect of the non-true-class preservation step; without these, it is unclear whether the claimed reduction in overhead is robust or an artifact of the chosen heterogeneity levels.
  Authors: We agree that the current evaluation would benefit from these controls. The revised manuscript will report the precise manipulation parameter values, include mean and standard deviation over multiple independent runs to establish statistical significance, and add an ablation isolating the non-true-class preservation component. These additions will demonstrate that the overhead reductions hold across the tested heterogeneity levels.
  Revision: yes
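The multi-run reporting promised in this response could be computed along the following lines. This is only a sketch: `summarize_runs` is a hypothetical helper, and the choice of the sample standard deviation is an assumption.

```python
import statistics


def summarize_runs(values):
    """Return (mean, sample standard deviation) of one metric over independent runs."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return statistics.mean(values), statistics.stdev(values)
```

A mean and standard deviation describe the spread across runs. A claim of statistical significance would additionally need a test comparing two methods' run-level results.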
- A formal derivation or bound establishing that the local heuristic suffices to excise client influence after aggregation under non-IID FedAvg.
Circularity Check
No load-bearing circularity; unlearning heuristic presented as independent of fitted inputs or self-citations
full rationale
The paper claims FedQUIT works under unmodified FedAvg with a virtual teacher obtained by output manipulation on forget data. No equations, self-citations, or ansatzes are shown reducing the unlearning efficacy or the 'no additional assumptions' claim to a quantity defined inside the paper. Evaluation uses external baselines and diverse datasets. This matches the default non-circular outcome for a method whose central step is a stated heuristic rather than a derived identity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard assumptions of the FedAvg protocol are sufficient for the unlearning procedure to succeed without further modeling choices.
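For context, the aggregation step of FedAvg that this axiom refers to is a sample-count-weighted average of client parameters. The sketch below represents each model as a flat list of floats for simplicity; the function name `fedavg` and this flat representation are illustrative assumptions.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

In this picture, the unlearning client's locally distilled model enters the average like any other client update, which is why no protocol change is needed on the server side.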
Forward citations
Cited by 1 Pith paper
- Asynchronous Federated Unlearning with Invariance Calibration for Medical Imaging
AFU-IC decouples client unlearning from global federated training in medical imaging and adds server-side invariance calibration to prevent relearning of erased data.
Reference graph
Works this paper leans on
- [1] Manaar Alam, Hithem Lamri, and Michail Maniatakos. Get rid of your trail: Remotely erasing backdoors in federated learning. arXiv preprint arXiv:2304.10638, 2023.
- [2] Paolo Bellavista, Luca Foschini, and Alessio Mora. Decentralised Learning in Federated Deployment Environments: A System-Level Survey. ACM Computing Surveys (CSUR), 54(1):1–38, 2021.
- [3] Cristian Buciluǎ, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 535–541, 2006.
- [4] Xiaoyu Cao, Jinyuan Jia, Zaixi Zhang, and Neil Zhenqiang Gong. Fedrecover: Recovering from poisoning attacks in federated learning using historical information. In 2023 IEEE Symposium on Security and Privacy (SP), pages 1366–1383, 2023.
- [5] Vikram S Chundawat, Ayush K Tarun, Murari Mandal, and Mohan Kankanhalli. Can bad teaching induce forgetting? unlearning in deep networks using an incompetent teacher. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7210–7217, 2023.
- [6] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc’aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, et al. Large scale distributed deep networks. In Advances in neural information processing systems, pages 1223–1231, 2012.
- [7] European Parliament and Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council, 2016.
- [8] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9304–9312, 2020.
- [9] Hanlin Gu, WinKent Ong, Chee Seng Chan, and Lixin Fan. Ferrari: federated feature unlearning via optimizing feature sensitivity. Advances in Neural Information Processing Systems, 37:24150–24180, 2025.
- [10] Kuangpu Guo, Yuhe Ding, Jian Liang, Ran He, Zilei Wang, and Tieniu Tan. Not all minorities are equal: Empty-class-aware distillation for heterogeneous federated learning. arXiv preprint arXiv:2401.02329, 2024.
- [11] Xintong Guo, Pengfei Wang, Sen Qiu, Wei Song, Qiang Zhang, Xiaopeng Wei, and Dongsheng Zhou. FAST: Adopting Federated Unlearning to Eliminating Malicious Terminals at Server Side. IEEE Transactions on Network Science and Engineering, pages 1–14, 2023.
- [12] Anisa Halimi, Swanand Kadhe, Ambrish Rawat, and Nathalie Baracaldo. Federated unlearning: How to efficiently erase a client in fl? arXiv preprint arXiv:2207.05521,
- [13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- [14] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- [15] Tzu-Ming Harry Hsu, Hang Qi, and Matthew Brown. Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335,
- [16] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. Foundations and trends® in machine learning, 14(1–2):1–210, 2021.
- [17] Jinkyu Kim, Geeho Kim, and Bohyung Han. Multi-level branched regularization for federated learning. In International Conference on Machine Learning, pages 11058–11073. PMLR, 2022.
- [18] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
- [19] Gihun Lee, Minchan Jeong, Yongjin Shin, Sangmin Bae, and Se-Young Yun. Preservation of the global knowledge by not-true distillation in federated learning. In Advances in Neural Information Processing Systems, 2022.
- [20] Gaoyang Liu, Xiaoqiang Ma, Yang Yang, Chen Wang, and Jiangchuan Liu. Federaser: Enabling efficient client-level data removal from federated learning models. In 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), pages 1–10, 2021.
- [21] Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, Sijia Liu, et al. Model sparsity can simplify machine unlearning. Advances in Neural Information Processing Systems, 36, 2024.
- [22] Yi Liu, Lei Xu, Xingliang Yuan, Cong Wang, and Bo Li. The right to be forgotten in federated learning: An efficient realization with rapid retraining. In IEEE INFOCOM 2022-IEEE Conference on Computer Communications, pages 1749–
- [23] Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, and Shiming Ge. Federated learning with label-masking distillation. In Proceedings of the 31st ACM International Conference on Multimedia, pages 222–232, 2023.
- [24] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.
- [25] Alessio Mora, Irene Tenison, Paolo Bellavista, and Irina Rish. Knowledge distillation in federated learning: A practical guide. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, pages 8188–8196. International Joint Conferences on Artificial Intelligence Organization, 2024. Survey Track.
- [26] Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, and Hugh Brendan McMahan. Adaptive federated optimization. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.
- [27] Nicolò Romandini, Alessio Mora, Carlo Mazzocca, Rebecca Montanari, and Paolo Bellavista. Federated unlearning: A survey on methods, design guidelines, and evaluation metrics. IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2024.
- [28] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017.
- [29] Liwei Song and Prateek Mittal. Systematic evaluation of privacy risks of machine learning models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2615–2632,
- [30] Liwei Song, Reza Shokri, and Prateek Mittal. Privacy risks of securing machine learning models against adversarial examples. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, November 11-15, 2019, pages 241–257. ACM,
- [31] Junxiao Wang, Song Guo, Xin Xie, and Heng Qi. Federated Unlearning via Class-Discriminative Pruning. In Proceedings of the ACM Web Conference 2022, page 622–632, New York, NY, USA, 2022. Association for Computing Machinery.
- [32] Chen Wu, Sencun Zhu, and Prasenjit Mitra. Federated unlearning with knowledge distillation. arXiv preprint arXiv:2201.09441, 2022.
- [33] Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077–12090, 2021.
- [34] Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, and Philip S Yu. Machine unlearning: A survey. ACM Computing Surveys, 56(1):1–36, 2023.
- [35] Dezhong Yao, Wanning Pan, Yutong Dai, Yao Wan, Xiaofeng Ding, Hai Jin, Zheng Xu, and Lichao Sun. Local-global knowledge distillation in heterogeneous federated learning with non-iid data. arXiv preprint arXiv:2107.00051,
- [36] Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st computer security foundations symposium (CSF), pages 268–282. IEEE,
- [37] FedQUIT: On-Device Federated Unlearning via a Quasi-Competent Virtual Teacher. Supplementary Material
- [38] Inspiring Observations. We aim to design an FU method that operates on-device and fully adheres to the FL privacy requirements. This entails that the method would have direct access only to the unlearning client’s data, while the rest of the data in the federation (the retain data) would not be available for use in the unlearning algorithms, except via...
- [39] Similarly Inspired Work in FL. The mechanisms that we present in this paper use a student-teacher framework locally at FL clients to retain the good knowledge from the original model while selectively scrubbing the contributions to forget. We use a crafted version of the FL global model as the teacher, serving as a natural proxy for the retain data that...
- [40] Extended Description of Table 1. Historical information. This refers to whether a method requires storing and accessing historical data, such as the complete history of per-client updates, which is typically maintained by the parameter server. Note that: (1) Linking per-client update histories to specific clients requesting unlearning undermines FL’s pri...
- [41] Infrastructure and Libraries. We run all the experiments on a machine with Ubuntu 22.04, equipped with 64 GB of RAM and one NVIDIA RTX A5000 GPU (32GB memory). We developed the code in Python using Python libraries; in our code repository, we provide the instructions to exactly reproduce our Python environment
- [42] Membership Inference Attacks. In this section, we briefly describe Shokri’s attack and Yeom’s attack that we use in the experimental results. In general, MIA metrics reflect the information leakage of training algorithms about individual members of the training corpus. A lower MIA success rate implies less information about D_u in w_u. Song’s MIA [29]. T...
- [43] Baselines in Experiments. Natural baseline (federated baseline): During the unlearning routine, there is no explicit unlearning. The unlearned model has the exact same model parameters of the [Figure: per-client label distribution; axes Client and Label; panel (a) CIFAR-10 (Non-IID, α = 0.3)] ...
- [44] Hyper-parameter Tuning and Preprocessing. In this Section, we report the hyper-parameter tuning of the various methods we used. 14.1. Regular Training (Federated Settings). CIFAR-10/CIFAR-100 Data Distribution. Figure 6 shows the label distribution across clients for the federated CIFAR-10 and CIFAR-100. ResNet-18 on CIFAR-10/CIFAR-100. We used a standa...
- [45] Further Experimental Results. In this Section, we include further experimental results that are mentioned in the main paper but excluded for the sake of space. [Figure: Test Accuracy (%) versus Recovery Rounds; curves: Original Model, Retrained Model, FedQUIT, PGA] ...