FedQUIT: On-Device Federated Unlearning via a Quasi-Competent Virtual Teacher

Alessio Mora; Andrea Passarella; Lorenzo Valerio; Paolo Bellavista

arXiv: 2408.07587 · v4 · submitted 2024-08-14 · 💻 cs.LG · cs.DC

Pith reviewed 2026-05-23 21:46 UTC · model grok-4.3

Keywords: federated learning · machine unlearning · knowledge distillation · on-device computation · data privacy · FedAvg protocol · forget data

The pith

FedQUIT lets clients unlearn their data on-device in federated learning by distilling from a modified global model without extra protocol assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FedQUIT, a method for on-device unlearning in federated learning that lets a client remove the influence of its own data from the shared global model. It does this through knowledge distillation where the client model learns from a virtual teacher created by altering the global model's output probabilities on the client's forget data. The alteration lowers confidence in the true labels while keeping the relative ordering among the other classes intact. This approach requires no changes to the standard FedAvg aggregation rule and produces unlearning results that match or beat six existing methods while cutting communication and compute costs versus full retraining from scratch.
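FedQUIT is claimed to leave the standard FedAvg aggregation rule [24] untouched. For reference, here is a minimal sketch of that rule: the server replaces the global model with the sample-weighted average of the client models it receives. Models are reduced to flat lists of floats, and the client values and sample counts are hypothetical. The sketch does not model how the unlearning client's model is handed back to the server.

```python
from typing import Dict, List

# A toy "model": parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], num_samples: List[int]) -> Params:
    """FedAvg aggregation: average client models weighted by local dataset size."""
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("need one sample count per client model")
    total = sum(num_samples)
    aggregated: Params = {}
    for name, first in client_params[0].items():
        aggregated[name] = [
            sum(params[name][i] * n for params, n in zip(client_params, num_samples)) / total
            for i in range(len(first))
        ]
    return aggregated


# Hypothetical round with three clients holding 1, 1 and 2 samples.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [0.0, 0.0]}]
print(fedavg(clients, [1, 1, 2]))  # {'w': [1.0, 1.5]}
```

Nothing in this rule depends on how a client produced its model. This is what makes a purely local unlearning step compatible with an unmodified server.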

Core claim

FedQUIT achieves unlearning in federated learning as follows. The requesting client builds a virtual teacher by manipulating the global model's outputs on forget data: it penalizes true-class confidence while preserving the relationships among non-true classes. It then uses this teacher to train its local model via knowledge distillation. This removes its data's influence without assumptions beyond FedAvg.
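The Figure 2 caption states that FedQUIT-Logits sets the true-class logit to a fixed value v. Below is a minimal sketch of that teacher construction for a single forget sample, in plain Python. Some parts are assumptions:

- The logits and the choice v = 0.0 are hypothetical.
- The truncated caption may hide further details, such as temperature scaling.
- The sketch covers only this logit-overwrite variant, although the paper mentions several FedQUIT variants.

Because only the true-class logit changes, the ratio between any two non-true probabilities stays the same as in the global model.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def virtual_teacher_probs(global_logits: List[float], true_class: int, v: float) -> List[float]:
    """Teacher distribution for one forget sample: the true-class logit is overwritten by v.

    Non-true logits are copied unchanged, so ratios between non-true probabilities
    match those of the global model's softmax.
    """
    modified = list(global_logits)
    modified[true_class] = v
    return softmax(modified)


global_logits = [4.0, 1.0, 0.5, -1.0]  # hypothetical global-model logits; true class is 0
p_global = softmax(global_logits)
p_teacher = virtual_teacher_probs(global_logits, true_class=0, v=0.0)  # v is illustrative
print("true-class prob, global vs teacher:", round(p_global[0], 3), round(p_teacher[0], 3))
```

With v below the original true-class logit (here 4.0), the teacher is less confident in the true label. It still ranks the other classes exactly as the global model does.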

What carries the argument

The quasi-competent virtual teacher created by selective output manipulation on forget data inside a teacher-student distillation loop where the client's local model is the student.
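The distillation side can be illustrated in the same way. In FedQUIT the student is the client's local network. Here, as a simplification, the student's logits on one forget sample are treated as free parameters. They are updated with the gradient of the standard knowledge-distillation loss KL(teacher || student) [14] at temperature 1. Whether FedQUIT uses exactly this loss form and temperature is an assumption of the sketch, and the teacher probabilities are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q); q must be strictly positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_step(student_logits: List[float], teacher_probs: List[float], lr: float) -> List[float]:
    """One gradient step on KL(teacher || softmax(student)).

    For a normalized teacher, d/dz KL(p_t || softmax(z)) = softmax(z) - p_t,
    so each step moves the student's distribution toward the teacher's.
    """
    q = softmax(student_logits)
    return [z - lr * (qi - pi) for z, qi, pi in zip(student_logits, q, teacher_probs)]


teacher = [0.17, 0.47, 0.29, 0.07]  # hypothetical virtual-teacher output on one forget sample
student = [4.0, 1.0, 0.5, -1.0]     # student starts from (hypothetical) global-model logits
before = kl_divergence(teacher, softmax(student))
for _ in range(500):
    student = distill_step(student, teacher, lr=0.5)
after = kl_divergence(teacher, softmax(student))
print(f"KL before = {before:.3f}, after = {after:.6f}")
```

In the real method, the loss is averaged over the forget set and backpropagated into all network weights. That is also why the effect on retain data is an empirical question, not a property of this toy.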

Load-bearing premise

Penalizing true-class confidence on forget data while preserving non-true class relationships in the global model is enough to make the client model forget without harming its overall generalization under standard FedAvg.
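The non-true-class preservation that this premise relies on follows directly from the softmax when only the true-class logit is overwritten, as the Figure 2 caption describes for FedQUIT-Logits. Here is a short derivation in our own notation: z(x) is the global model's logit vector, g(x) = softmax(z(x)) is its output probability, y is the true label, and v is the fixed value.

```latex
\begin{aligned}
z'_c(x) &= \begin{cases} v, & c = y,\\ z_c(x), & c \neq y, \end{cases}
\qquad g'_c(x) = \frac{e^{z'_c(x)}}{\sum_{k} e^{z'_k(x)}},\\[4pt]
\frac{g'_i(x)}{g'_j(x)} &= e^{z'_i(x) - z'_j(x)} = e^{z_i(x) - z_j(x)} = \frac{g_i(x)}{g_j(x)}
\qquad \text{for all } i, j \neq y,\\[4pt]
g'_y(x) &= \frac{1}{1 + \sum_{c \neq y} e^{z_c(x) - v}}
\;<\; g_y(x) = \frac{1}{1 + \sum_{c \neq y} e^{z_c(x) - z_y(x)}}
\iff v < z_y(x).
\end{aligned}
```

The last line shows that the overwrite penalizes the true class only when v is below the original true-class logit. The derivation concerns the teacher alone. It does not show that distillation removes the client's influence after aggregation, which the referee report treats as open.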

What would settle it

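The unlearning claim would be falsified by an evaluation showing the following: after FedQUIT completes, the updated global model's accuracy on the forget client's data stays close to the original model's, rather than falling to the level of a model retrained from scratch without that client.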
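A model retrained without the client still generalizes to that client's data. The table text extracted with Figure 9 reports 57.80% forget accuracy for the retrained model. For that reason, a falsification test compares against the retrained reference rather than against zero. Below is a minimal sketch of such a check; the helper names are hypothetical.

One detail in the table is unexplained. Its parenthetical gaps (1.86 for FedQUIT) differ from the difference of the mean accuracies (58.50 − 57.80 = 0.70). This suggests the paper averages per-client gaps, but that reading is an inference.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Top-1 accuracy in percent."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and aligned")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def forget_gap(unlearned_acc: float, retrained_acc: float) -> float:
    """Distance, in percentage points, from the retrained-from-scratch reference."""
    return abs(unlearned_acc - retrained_acc)


def closer_to_retrained(unlearned_acc: float, original_acc: float, retrained_acc: float) -> bool:
    """True if forget accuracy after unlearning is nearer the retrained model than the original."""
    return forget_gap(unlearned_acc, retrained_acc) < abs(unlearned_acc - original_acc)


# Forget-set accuracies (%) from the table text extracted with Figure 9.
print(forget_gap(58.50, 57.80))                  # FedQUIT vs. retrained
print(closer_to_retrained(58.50, 84.25, 57.80))  # True
```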

Figures

Figures reproduced from arXiv: 2408.07587 by Alessio Mora, Andrea Passarella, Lorenzo Valerio, Paolo Bellavista.

**Figure 2.** FedQUIT-Logits. Extracted body text: "…datasets supposed to exist. Next, we explain our original approaches in full detail. FedQUIT-Logits. Indicating (for ease of notation) the global model at round t as w_t and its output probability as g_t(x), we design a modified output probability g'_t(x). Without loss of generality, we will omit the t index to simplify notation. FedQUIT-Logits sets to a fixed value of v the true-class log…"

**Figure 3.** Test accuracy degradation after unlearning a client's data.

**Figure 4.** Visualization of FedQUIT and PGA performance across settings. A smaller polygon indicates better unlearning effectiveness.

**Figure 5.** Test Accuracy (Left) and Forget Accuracy (Right) for a representative client on CIFAR-100, ResNet-18, Non-IID, E = 1. FedQUIT minimizes test accuracy loss, demonstrating more selective removal of client contributions. Extracted body text: "Performance consistency during the recovery phase. Due to its lower initial degradation, FedQUIT maintains a more functional global model throughout the recovery phase…"

**Figure 6.** Label distribution across clients (0–9) for CIFAR-10.

**Figure 7.** Test Accuracy (Left) and Forget Accuracy (Right) for a representative client on CIFAR-100, ResNet-18, Non-IID, E = 1. Panels: test accuracy (%) and forget accuracy (%) vs. recovery rounds (0–15); curves for Original Model, Retrained Model, FedQUIT, and PGA.

**Figure 8.** Test Accuracy (Left) and Forget Accuracy (Right) for a representative client. Setting: ResNet-18, CIFAR-100, IID, E = 1. Extracted appendix text: "MiT-B0 on CIFAR-100. FedQUIT (all variants), Incompetent Teacher: 1 unlearning epoch, learning rate 5e-4, AdamW optimizer, local batch size 32. PGA [12]: we report in…"

**Figure 9.** Test Accuracy (Left) and Forget Accuracy (Right) for a representative client. Setting: MiT-B0, CIFAR-100, Non-IID, E = 1.

Table text extracted with this figure (last row truncated in the source):

| Algorithm | Rounds (↓) | CR (↑) | Test Acc. | Forget Acc. | MIA [29] |
|---|---|---|---|---|---|
| Original | | | 75.03 ±0.00 | 84.25 ±5.63 | 77.03 ±8.57 |
| Retrained | | | 73.30 ±0.78 | 57.80 ±5.29 | 48.59 ±4.96 |
| PGA [12] | 6.9 ±4.28 | 7.2× | 73.60 ±0.36 | 62.15 (4.35 ±3.28) | 53.57 (4.98 ±3.67) |
| FedQUIT | 8.78 ±2.54 | 5.7× | 73.42 ±0.89 | 58.50 (1.86 ±1.46) | 48.86 (2… |
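The extracted caption does not define CR. Suppose CR is read as the ratio of the rounds needed to retrain from scratch to the recovery rounds after unlearning (an assumption). Then multiplying the two columns recovers the retraining budget each row implies, and both rows land near the same value of about 50 rounds. This is only approximate, since the reported values are means and may have been computed per client.

```python
# Assumption: CR = (rounds needed to retrain from scratch) / (recovery rounds after unlearning).
# The extracted caption does not define CR; the values below are copied from the table text.
rows = {
    "PGA [12]": (6.9, 7.2),   # (mean recovery rounds, CR)
    "FedQUIT": (8.78, 5.7),
}

implied_retrain_rounds = {name: rounds * cr for name, (rounds, cr) in rows.items()}
for name, total in implied_retrain_rounds.items():
    print(f"{name}: about {total:.1f} rounds to retrain from scratch")
```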

**Figure 10.** Extracted text: "…report the degradation after unlearning and before recovery for the centralized setting. Natural baseline (federated) results…"

Original abstract

Federated Learning (FL) enables the collaborative training of machine learning models without requiring centralized collection of user data. To comply with the right to be forgotten, FL clients should be able to request the removal of their data contributions from the global model. In this paper, we propose FedQUIT, a novel unlearning algorithm that operates directly on the client devices that request the removal of their contribution. Our method leverages knowledge distillation to remove the influence of the target client's data from the global model while preserving its generalization ability. FedQUIT adopts a teacher-student framework, where a modified version of the current global model serves as a virtual teacher and the client's model acts as the student. The virtual teacher is obtained by manipulating the global model's outputs on forget data, penalizing the confidence assigned to the true class while preserving relationships among outputs of non-true classes, to simultaneously induce forgetting and retain useful knowledge. As a result, FedQUIT achieves unlearning without making any additional assumption over the standard FedAvg protocol. Evaluation across diverse datasets, data heterogeneity levels, and model architectures shows that FedQUIT achieves superior or comparable unlearning efficacy compared to six state-of-the-art methods, while significantly reducing cumulative communication and computational overhead relative to retraining from scratch.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

FedQUIT gives a workable on-device unlearning trick by tweaking only the true-class logit in a virtual teacher, with experiments showing lower overhead than retraining and results at least as good as six baselines.

The core idea here is a teacher-student setup where the global model is lightly altered on forget samples (penalize the true-class output but leave the rest of the logit vector alone) and then distilled into the client's local model. This runs entirely on the device requesting deletion and sticks to plain FedAvg with no protocol changes. That is the actual increment over prior unlearning work in FL: a narrow, targeted manipulation instead of full retraining or extra assumptions about client data or server behavior. The experiments cover several datasets, heterogeneity levels, and architectures. They report that the method matches or exceeds the six compared approaches while cutting cumulative communication and compute relative to retraining from scratch. That combination of on-device operation and measured overhead reduction is the practical part worth noting. The justification for the logit tweak itself stays heuristic. There is no derivation showing why preserving non-true-class relationships is enough to excise one client's influence after aggregation, especially when non-IID data means the global outputs already mix contributions. The stress-test worry about residual influence under heterogeneity is reasonable on paper, but the reported results across heterogeneous partitions suggest the heuristic holds up in the tested regimes. If the full experiments include proper controls for statistical significance and clear definitions of the manipulation strength, the empirical case is solid enough for the claim. This is the kind of incremental systems paper that people building privacy-compliant FL deployments will want to read. It is not foundational, but the concrete mechanism and the cost numbers make it worth a referee's time rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The paper proposes FedQUIT, an on-device federated unlearning algorithm for the right to be forgotten in FL. It uses a teacher-student knowledge distillation setup where the client's local model is the student and a modified version of the current global model serves as a virtual teacher; the teacher is created by manipulating outputs on forget data to penalize true-class confidence while preserving relative non-true class outputs. The method claims to achieve effective unlearning under the standard FedAvg protocol with no extra assumptions, and reports superior or comparable unlearning efficacy to six SOTA baselines across datasets, heterogeneity levels, and architectures, while cutting cumulative communication and compute relative to retraining from scratch.

Significance. If the core heuristic is shown to reliably excise client influence without degrading retain-set performance or requiring protocol changes, the result would be significant for practical deployment of unlearning in federated systems, as it avoids the high cost of full retraining and operates locally on the requesting client.

major comments (2)

[Abstract / §3, virtual-teacher construction] The central claim that FedQUIT requires 'no additional assumption over the standard FedAvg protocol' is load-bearing. Yet the manuscript provides no derivation or bound showing why penalizing only the true-class logit on forget samples (while keeping non-true relationships) suffices to remove the client's contribution after aggregation. Under non-IID FedAvg, the global outputs on a single client's forget set may already be entangled with other clients' data, so the heuristic risks leaving residual influence or harming generalization on retain data.
[Evaluation] The reported superiority over six baselines is presented without three controls: an explicit definition of the manipulation parameter, statistical significance across runs, and an ablation isolating the effect of the non-true-class preservation step. Without these, it is unclear whether the claimed reduction in overhead is robust or an artifact of the chosen heterogeneity levels.

minor comments (2)

[§3] Notation for the output manipulation (e.g., how the penalized logits are exactly computed and scaled) should be formalized with an equation rather than prose description.
[Related Work] The paper should include a short related-work table contrasting FedQUIT's on-device, no-extra-assumption property against the six baselines.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback. We address each major comment below.

Referee (major comment 1, [Abstract / §3]): the 'no additional assumption over the standard FedAvg protocol' claim lacks a derivation or bound. Nothing shows that penalizing only the true-class logit on forget samples removes the client's contribution after non-IID aggregation without residual influence or harm to retain-data generalization.

Authors: FedQUIT performs unlearning entirely locally on the requesting client using only the global model received under standard FedAvg, with no protocol changes or extra information required from the server. The virtual teacher is constructed by a local manipulation that reduces true-class confidence on forget samples while preserving relative outputs among non-true classes; this is presented as a practical heuristic rather than a theoretically bounded procedure. Experiments across non-IID partitions show effective unlearning without retain-set degradation, supporting that the approach does not introduce new assumptions. We will revise §3 and the discussion to clarify the heuristic motivation and explicitly note the absence of a formal bound on residual influence. revision: partial
Referee (major comment 2, [Evaluation]): the comparison with six baselines lacks an explicit definition of the manipulation parameter, statistical significance across runs, and an ablation of the non-true-class preservation step. Without these, the overhead reduction may be an artifact of the chosen heterogeneity levels.

Authors: We agree that the current evaluation would benefit from these controls. The revised manuscript will report the precise manipulation parameter values, include mean and standard deviation over multiple independent runs to establish statistical significance, and add an ablation isolating the non-true-class preservation component. These additions will demonstrate that the overhead reductions hold across the tested heterogeneity levels. revision: yes
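Mean and standard deviation over runs summarize spread, but comparing two methods across seeds usually also needs a test statistic. Below is a minimal, dependency-free sketch of Welch's unequal-variance t-test. The per-seed values are hypothetical, and it is not claimed to be the analysis the authors will run. Turning t into a p-value needs a t-distribution CDF; for example, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator) over independent runs."""
    return statistics.mean(values), statistics.stdev(values)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (needs >= 2 runs each)."""
    mean_a, std_a = mean_std(a)
    mean_b, std_b = mean_std(b)
    var_a, var_b = std_a ** 2 / len(a), std_b ** 2 / len(b)
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-seed forget-accuracy gaps (percentage points) for two methods.
method_a = [1.2, 2.5, 0.9, 1.8, 2.1]
method_b = [4.0, 5.1, 3.2, 4.8, 4.4]
print("A: %.2f ± %.2f" % mean_std(method_a))
print("t = %.2f, df = %.1f" % welch_t(method_a, method_b))
```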

standing simulated objections not resolved

A formal derivation or bound establishing that the local heuristic suffices to excise client influence after aggregation under non-IID FedAvg.

Circularity Check

0 steps flagged

No load-bearing circularity; unlearning heuristic presented as independent of fitted inputs or self-citations

full rationale

The paper claims FedQUIT works under unmodified FedAvg with a virtual teacher obtained by output manipulation on forget data. No equations, self-citations, or ansatzes are shown reducing the unlearning efficacy or the 'no additional assumptions' claim to a quantity defined inside the paper. Evaluation uses external baselines and diverse datasets. This matches the default non-circular outcome for a method whose central step is a stated heuristic rather than a derived identity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the effectiveness of the described output manipulation under standard federated averaging assumptions. The abstract names no explicit free parameters or invented entities. However, the Figure 2 caption indicates that FedQUIT-Logits sets the true-class logit to a fixed value v, which is at least one tunable penalty strength not detailed in the abstract.
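The Figure 2 caption indicates that FedQUIT-Logits overwrites the true-class logit with a fixed value v, which makes v one such tunable parameter. A short sweep with hypothetical logits shows how the virtual teacher's true-class probability depends on it. The probability increases with v, and setting v to the original true-class logit reproduces the global model's confidence. So only values below that logit actually penalize the true class. The expression is written as 1 / (1 + Σ exp(z_c − v)) to avoid overflow for large v.

```python
import math
from typing import List


def teacher_true_prob(global_logits: List[float], true_class: int, v: float) -> float:
    """True-class probability after overwriting the true-class logit with v."""
    others = sum(math.exp(z - v) for c, z in enumerate(global_logits) if c != true_class)
    return 1.0 / (1.0 + others)


logits = [4.0, 1.0, 0.5, -1.0]  # hypothetical global-model logits; true class is 0
for v in (-2.0, 0.0, 2.0, 4.0):
    print(f"v = {v:+.1f} -> teacher true-class probability {teacher_true_prob(logits, 0, v):.3f}")
```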

axioms (1)

Domain assumption: Standard assumptions of the FedAvg protocol are sufficient for the unlearning procedure to succeed without further modeling choices.
The abstract explicitly states that FedQUIT makes no additional assumptions over FedAvg.

pith-pipeline@v0.9.0 · 5755 in / 1482 out tokens · 47738 ms · 2026-05-23T21:46:46.693216+00:00

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Asynchronous Federated Unlearning with Invariance Calibration for Medical Imaging
cs.LG 2026-04 unverdicted novelty 5.0

AFU-IC decouples client unlearning from global federated training in medical imaging and adds server-side invariance calibration to prevent relearning of erased data.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] Manaar Alam, Hithem Lamri, and Michail Maniatakos. Get rid of your trail: Remotely erasing backdoors in federated learning. arXiv preprint arXiv:2304.10638, 2023.

[2] Paolo Bellavista, Luca Foschini, and Alessio Mora. Decentralised Learning in Federated Deployment Environments: A System-Level Survey. ACM Computing Surveys (CSUR), 54(1):1–38, 2021.

[3] Cristian Buciluǎ, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 535–541, 2006.

[4] Xiaoyu Cao, Jinyuan Jia, Zaixi Zhang, and Neil Zhenqiang Gong. Fedrecover: Recovering from poisoning attacks in federated learning using historical information. In 2023 IEEE Symposium on Security and Privacy (SP), pages 1366–1383, 2023.

[5] Vikram S Chundawat, Ayush K Tarun, Murari Mandal, and Mohan Kankanhalli. Can bad teaching induce forgetting? unlearning in deep networks using an incompetent teacher. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7210–7217, 2023.

[6] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc’aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, et al. Large scale distributed deep networks. In Advances in neural information processing systems, pages 1223–1231, 2012.

[7] European Parliament and Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council, 2016.

[8] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9304–9312, 2020.

[9] Hanlin Gu, WinKent Ong, Chee Seng Chan, and Lixin Fan. Ferrari: federated feature unlearning via optimizing feature sensitivity. Advances in Neural Information Processing Systems, 37:24150–24180, 2025.

[10] Kuangpu Guo, Yuhe Ding, Jian Liang, Ran He, Zilei Wang, and Tieniu Tan. Not all minorities are equal: Empty-class-aware distillation for heterogeneous federated learning. arXiv preprint arXiv:2401.02329, 2024.

[11] Xintong Guo, Pengfei Wang, Sen Qiu, Wei Song, Qiang Zhang, Xiaopeng Wei, and Dongsheng Zhou. FAST: Adopting Federated Unlearning to Eliminating Malicious Terminals at Server Side. IEEE Transactions on Network Science and Engineering, pages 1–14, 2023.

[12] Anisa Halimi, Swanand Kadhe, Ambrish Rawat, and Nathalie Baracaldo. Federated unlearning: How to efficiently erase a client in fl? arXiv preprint arXiv:2207.05521,

[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[14] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.

[15] Tzu-Ming Harry Hsu, Hang Qi, and Matthew Brown. Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335,

[16] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. Foundations and trends® in machine learning, 14(1–2):1–210, 2021.

[17] Jinkyu Kim, Geeho Kim, and Bohyung Han. Multi-level branched regularization for federated learning. In International Conference on Machine Learning, pages 11058–11073. PMLR, 2022.

[18] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.

[19] Gihun Lee, Minchan Jeong, Yongjin Shin, Sangmin Bae, and Se-Young Yun. Preservation of the global knowledge by not-true distillation in federated learning. In Advances in Neural Information Processing Systems, 2022.

[20] Gaoyang Liu, Xiaoqiang Ma, Yang Yang, Chen Wang, and Jiangchuan Liu. Federaser: Enabling efficient client-level data removal from federated learning models. In 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), pages 1–10, 2021.

[21] Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, PRANAY SHARMA, Sijia Liu, et al. Model sparsity can simplify machine unlearning. Advances in Neural Information Processing Systems, 36, 2024.

[22] Yi Liu, Lei Xu, Xingliang Yuan, Cong Wang, and Bo Li. The right to be forgotten in federated learning: An efficient realization with rapid retraining. In IEEE INFOCOM 2022-IEEE Conference on Computer Communications, pages 1749–

[23] Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, and Shiming Ge. Federated learning with label-masking distillation. In Proceedings of the 31st ACM International Conference on Multimedia, pages 222–232, 2023.

[24] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.

[25] Alessio Mora, Irene Tenison, Paolo Bellavista, and Irina Rish. Knowledge distillation in federated learning: A practical guide. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, pages 8188–8196. International Joint Conferences on Artificial Intelligence Organization, 2024. Survey Track.

[26] Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, and Hugh Brendan McMahan. Adaptive federated optimization. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.

[27] Nicolò Romandini, Alessio Mora, Carlo Mazzocca, Rebecca Montanari, and Paolo Bellavista. Federated unlearning: A survey on methods, design guidelines, and evaluation metrics. IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2024.

[28] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017.

[29] Liwei Song and Prateek Mittal. Systematic evaluation of privacy risks of machine learning models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2615–2632,

[30] Liwei Song, Reza Shokri, and Prateek Mittal. Privacy risks of securing machine learning models against adversarial examples. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, November 11-15, 2019, pages 241–257. ACM, 2019.

[31] Junxiao Wang, Song Guo, Xin Xie, and Heng Qi. Federated Unlearning via Class-Discriminative Pruning. In Proceedings of the ACM Web Conference 2022, page 622–632, New York, NY, USA, 2022. Association for Computing Machinery.

[32] Chen Wu, Sencun Zhu, and Prasenjit Mitra. Federated unlearning with knowledge distillation. arXiv preprint arXiv:2201.09441, 2022.

[33] Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077–12090, 2021.

[34] Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, and Philip S Yu. Machine unlearning: A survey. ACM Computing Surveys, 56(1):1–36, 2023.

[35] Dezhong Yao, Wanning Pan, Yutong Dai, Yao Wan, Xiaofeng Ding, Hai Jin, Zheng Xu, and Lichao Sun. Local-global knowledge distillation in heterogeneous federated learning with non-iid data. arXiv preprint arXiv:2107.00051,

[36] Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st computer security foundations symposium (CSF), pages 268–282. IEEE, 2018.

[37] FedQUIT: On-Device Federated Unlearning via a Quasi-Competent Virtual Teacher, Supplementary Material

[38] Inspiring Observations. We aim to design an FU method that operates on-device and fully adheres to the FL privacy requirements. This entails that the method would have direct access only to the unlearning client's data, while the rest of the data in the federation (the retain data) would not be available for use in the unlearning algorithms, except via...

[39] Similarly Inspired Work in FL. The mechanisms that we present in this paper use a student-teacher framework locally at FL clients to retain the good knowledge from the original model while selectively scrubbing the contributions to forget. We use a crafted version of the FL global model as the teacher, serving as a natural proxy for the retain data that cannot be directly accessed in FL.

[40] Extended Description of Table 1. Historical information. This refers to whether a method requires storing and accessing historical data, such as the complete history of per-client updates, which is typically maintained by the parameter server. Note that: (1) Linking per-client update histories to specific clients requesting unlearning undermines FL's pri...

[41] Infrastructure and Libraries. We run all the experiments on a machine with Ubuntu 22.04, equipped with 64 GB of RAM and one NVIDIA RTX A5000 as GPU (32GB memory). We developed the code with Python and with Python libraries; in our code repository, we provide the instructions to exactly reproduce our Python environment.

[42] Membership Inference Attacks. In this section, we briefly describe Shokri's attack and Yeom's attack that we use in the experimental results. In general, MIA metrics reflect the information leakage of training algorithms about individual members of the training corpus. A lower MIA success rate implies less information about D_u in w_u. Song's MIA [29]. T...

[43] Baselines in Experiments. Natural baseline (federated baseline): During the unlearning routine, there is no explicit unlearning. The unlearned model has the exact same model parameters of the... [Figure 6 panel: (a) CIFAR-10 (Non-IID, α = 0.3), label counts per client for clients 0–9.] ...

[44] Hyper-parameter Tuning and Pre-processing. In this Section, we report the hyper-parameter tuning of the various methods we used. 14.1. Regular Training (Federated Settings). CIFAR-10/CIFAR-100 Data Distribution. Figure 6 shows the label distribution across clients for the federated CIFAR-10 and CIFAR-100. ResNet-18 on CIFAR-10/CIFAR-100. We used a standa...

[45] Further Experimental Results. In this Section, we include further experimental results that are mentioned in the main paper but excluded for the sake of space. [Plot residue: test accuracy (%) vs. recovery rounds (0–8) in two panels, with curves for Original Model, Retrained Model, FedQUIT, and PGA.] ...

[1] Manaar Alam, Hithem Lamri, and Michail Maniatakos. Get rid of your trail: Remotely erasing backdoors in federated learning. arXiv preprint arXiv:2304.10638, 2023.

[2] Paolo Bellavista, Luca Foschini, and Alessio Mora. Decentralised Learning in Federated Deployment Environments: A System-Level Survey. ACM Computing Surveys (CSUR), 54(1):1–38, 2021.

[3] Cristian Buciluǎ, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 535–541, 2006.

[4] Xiaoyu Cao, Jinyuan Jia, Zaixi Zhang, and Neil Zhenqiang Gong. Fedrecover: Recovering from poisoning attacks in federated learning using historical information. In 2023 IEEE Symposium on Security and Privacy (SP), pages 1366–1383, 2023.

[5] Vikram S Chundawat, Ayush K Tarun, Murari Mandal, and Mohan Kankanhalli. Can bad teaching induce forgetting? unlearning in deep networks using an incompetent teacher. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7210–7217, 2023.

[6] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc’aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, et al. Large scale distributed deep networks. In Advances in neural information processing systems, pages 1223–1231, 2012.

[7] European Parliament and Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council, 2016.

[8] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9304–9312, 2020.

[9] Hanlin Gu, WinKent Ong, Chee Seng Chan, and Lixin Fan. Ferrari: federated feature unlearning via optimizing feature sensitivity. Advances in Neural Information Processing Systems, 37:24150–24180, 2025.

[10] Kuangpu Guo, Yuhe Ding, Jian Liang, Ran He, Zilei Wang, and Tieniu Tan. Not all minorities are equal: Empty-class-aware distillation for heterogeneous federated learning. arXiv preprint arXiv:2401.02329, 2024.

[11] Xintong Guo, Pengfei Wang, Sen Qiu, Wei Song, Qiang Zhang, Xiaopeng Wei, and Dongsheng Zhou. FAST: Adopting Federated Unlearning to Eliminating Malicious Terminals at Server Side. IEEE Transactions on Network Science and Engineering, pages 1–14, 2023.

[12] Anisa Halimi, Swanand Kadhe, Ambrish Rawat, and Nathalie Baracaldo. Federated unlearning: How to efficiently erase a client in FL? arXiv preprint arXiv:2207.05521.

[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[14] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.

[15] Tzu-Ming Harry Hsu, Hang Qi, and Matthew Brown. Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335.

[16] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. Foundations and trends® in machine learning, 14(1–2):1–210, 2021.

[17] Jinkyu Kim, Geeho Kim, and Bohyung Han. Multi-level branched regularization for federated learning. In International Conference on Machine Learning, pages 11058–11073. PMLR, 2022.

[18] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.

[19] Gihun Lee, Minchan Jeong, Yongjin Shin, Sangmin Bae, and Se-Young Yun. Preservation of the global knowledge by not-true distillation in federated learning. In Advances in Neural Information Processing Systems, 2022.

[20] Gaoyang Liu, Xiaoqiang Ma, Yang Yang, Chen Wang, and Jiangchuan Liu. Federaser: Enabling efficient client-level data removal from federated learning models. In 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), pages 1–10, 2021.

[21] Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, Sijia Liu, et al. Model sparsity can simplify machine unlearning. Advances in Neural Information Processing Systems, 36, 2024.

[22] Yi Liu, Lei Xu, Xingliang Yuan, Cong Wang, and Bo Li. The right to be forgotten in federated learning: An efficient realization with rapid retraining. In IEEE INFOCOM 2022-IEEE Conference on Computer Communications, pages 1749–

[23] Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, and Shiming Ge. Federated learning with label-masking distillation. In Proceedings of the 31st ACM International Conference on Multimedia, pages 222–232, 2023.

[24] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.

[25] Alessio Mora, Irene Tenison, Paolo Bellavista, and Irina Rish. Knowledge distillation in federated learning: A practical guide. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, pages 8188–8196. International Joint Conferences on Artificial Intelligence Organization, 2024. Survey Track.

[26] Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, and Hugh Brendan McMahan. Adaptive federated optimization. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.

[27] Nicolò Romandini, Alessio Mora, Carlo Mazzocca, Rebecca Montanari, and Paolo Bellavista. Federated unlearning: A survey on methods, design guidelines, and evaluation metrics. IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2024.

[28] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017.

[29] Liwei Song and Prateek Mittal. Systematic evaluation of privacy risks of machine learning models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2615–2632.

[30] Liwei Song, Reza Shokri, and Prateek Mittal. Privacy risks of securing machine learning models against adversarial examples. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, November 11-15, 2019, pages 241–257. ACM, 2019.

[31] Junxiao Wang, Song Guo, Xin Xie, and Heng Qi. Federated Unlearning via Class-Discriminative Pruning. In Proceedings of the ACM Web Conference 2022, pages 622–632, New York, NY, USA, 2022. Association for Computing Machinery.

[32] Chen Wu, Sencun Zhu, and Prasenjit Mitra. Federated unlearning with knowledge distillation. arXiv preprint arXiv:2201.09441, 2022.

[33] Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077–12090, 2021.

[34] Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, and Philip S Yu. Machine unlearning: A survey. ACM Computing Surveys, 56(1):1–36, 2023.

[35] Dezhong Yao, Wanning Pan, Yutong Dai, Yao Wan, Xiaofeng Ding, Hai Jin, Zheng Xu, and Lichao Sun. Local-global knowledge distillation in heterogeneous federated learning with non-iid data. arXiv preprint arXiv:2107.00051.

[36] Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st computer security foundations symposium (CSF), pages 268–282. IEEE, 2018.

[37] FedQUIT: On-Device Federated Unlearning via a Quasi-Competent Virtual Teacher. Supplementary Material

[38] Inspiring Observations. We aim to design an FU method that operates on-device and fully adheres to the FL privacy requirements. This entails that the method would have direct access only to the unlearning client’s data, while the rest of the data in the federation (the retain data) would not be available for use in the unlearning algorithms, except via...

[39] Similarly Inspired Work in FL. The mechanisms that we present in this paper use a student-teacher framework locally at FL clients to retain the good knowledge from the original model while selectively scrubbing the contributions to forget. We use a crafted version of the FL global model as the teacher, serving as a natural proxy for the retain data that...
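One way to "craft" the global model into a teacher is to lower its true-class confidence on a forget sample while leaving the relative scores of the other classes untouched. A simple way to do that is to overwrite only the true-class logit before the softmax. The following is a minimal sketch under that assumption, not the paper's exact formulation. The replacement value `v` (by default the mean of the non-true logits), the temperature, and the helper names are hypothetical. The local (student) model would then be trained on the forget data to minimize the distillation loss against this teacher.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional distillation temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def virtual_teacher_logits(global_logits, true_label, v=None):
    """Copy of the global model's logits with the true-class logit replaced.

    Only the true-class entry changes, so the probability ratios among all
    non-true classes are preserved after the softmax. `v` defaults to the
    mean of the non-true logits (a hypothetical choice for illustration).
    """
    others = [z for i, z in enumerate(global_logits) if i != true_label]
    if v is None:
        v = sum(others) / len(others)
    teacher = list(global_logits)
    teacher[true_label] = v
    return teacher


def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return sum(pt * (math.log(pt) - math.log(max(ps, 1e-300)))
               for pt, ps in zip(p_t, p_s) if pt > 0)


g = [4.0, 1.0, 2.0, 0.5]           # toy global-model logits; class 0 is the true label
t = virtual_teacher_logits(g, true_label=0)
print(softmax(g)[0], softmax(t)[0])  # true-class probability drops for the teacher
```

This pure-Python version only computes loss values. An actual on-device step would compute the same KL term with a deep-learning framework's autograd, and Hinton-style distillation often also scales it by the squared temperature.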


[40] Extended Description of Table 1. Historical information: This refers to whether a method requires storing and accessing historical data, such as the complete history of per-client updates, which is typically maintained by the parameter server. Note that: (1) Linking per-client update histories to specific clients requesting unlearning undermines FL’s pri...

[41] Infrastructure and Libraries. We run all the experiments on a machine with Ubuntu 22.04, equipped with 64 GB of RAM and one NVIDIA RTX A5000 as GPU (32 GB memory). We developed the code with Python and with Python libraries; in our code repository, we provide the instructions to exactly reproduce our Python environment.
