Recognition: no theorem link
FedSurrogate: Backdoor Defense in Federated Learning via Layer Criticality and Surrogate Replacement
Pith reviewed 2026-05-13 02:30 UTC · model grok-4.3
The pith
FedSurrogate replaces malicious updates with downscaled surrogates from similar benign clients to defend federated learning against backdoors while keeping false positives below 10 percent under non-IID data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FedSurrogate identifies security-critical layers via directional divergence analysis and performs selective clustering on those layers alone, concentrating the detection signal in a low-dimensional subspace. A bidirectional soft-filtering stage then screens trusted clients for residual contamination and rescues false positives, and confirmed malicious updates are replaced with downscaled surrogate updates from structurally similar benign clients, neutralizing adversarial influence while preserving gradient diversity and achieving superior main-task accuracy.
What carries the argument
Surrogate replacement of malicious updates by downscaled versions from structurally similar benign clients, after layer criticality is isolated via directional divergence analysis and bidirectional gradient alignment filtering.
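The layer-criticality signal described above can be sketched as follows. The specific statistic (mean pairwise cosine distance between clients' per-layer update directions), the per-layer flattening, and the top-k cutoff are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def directional_divergence(U):
    """Mean pairwise cosine distance between the rows of U,
    where each row is one client's flattened update for a layer.
    Higher = client directions disagree more on this layer."""
    V = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    cos = V @ V.T                                   # pairwise cosine similarities
    n = len(V)
    mean_off_diag = (cos.sum() - np.trace(cos)) / (n * (n - 1))
    return 1.0 - mean_off_diag

def critical_layers(per_layer_updates, top_k=2):
    """Rank layers by directional divergence and keep the top_k
    as the security-critical subspace used for clustering."""
    scores = {name: directional_divergence(U)
              for name, U in per_layer_updates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A layer whose client update directions disagree sharply (for example, because some clients push its gradient the opposite way) scores high and is selected for the clustering stage.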
Load-bearing premise
Structurally similar benign clients will always exist and supply surrogate updates that neutralize attacks without degrading the global model, and directional divergence will reliably identify critical layers even under extreme non-IID heterogeneity.
What would settle it
A federated training run on a dataset partition where no benign client shares structural similarity with the malicious ones, checking whether attack success rate rises above 2 percent or false-positive rate exceeds 10 percent.
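One standard way to engineer such a partition is extreme Dirichlet label skew. The sketch below is a conventional non-IID partitioner; the `alpha` parameter and the split scheme are chosen for illustration rather than taken from the paper:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with per-class proportions
    drawn from Dirichlet(alpha). Small alpha -> extreme label skew,
    one way to approach the 'no structurally similar benign client'
    regime described above."""
    rng = np.random.default_rng(seed)
    shards = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for shard, part in zip(shards, np.split(idx, cuts)):
            shard.extend(part.tolist())
    return shards
```

With `alpha` well below 1, most clients hold only one or two classes, so a malicious client's data distribution may have no close benign counterpart.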
read the original abstract
Federated Learning remains highly susceptible to backdoor attacks--malicious clients inject targeted behaviours into the global model. Existing defenses suffer from substantial false-positive rates under realistic non-independent and identically distributed (non-IID) data, incorrectly flagging benign clients and degrading model accuracy even when adversaries are correctly identified. We present FedSurrogate, a novel backdoor defense that addresses this limitation by combining bidirectional gradient alignment filtering with layer-adaptive anomaly detection. FedSurrogate performs selective clustering on security-critical layers identified via directional divergence analysis, concentrating the detection signal on a low-dimensional subspace. A bidirectional soft-filtering stage screens trusted clients for residual contamination while rescuing false positives from suspects, substantially reducing misclassifications under heterogeneous conditions. Rather than removing confirmed malicious updates, FedSurrogate replaces them with downscaled surrogate updates from structurally similar benign clients, preserving gradient diversity while neutralising adversarial influence. Extensive evaluations demonstrate that FedSurrogate maintains false-positive rates below 10% across all datasets and attack types, compared to 31-32% for the nearest comparably effective baseline, while achieving superior main-task accuracy and maintaining attack success rates below 2.1% across all tested datasets and attack types under challenging non-IID settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents FedSurrogate, a backdoor defense for federated learning. It combines bidirectional gradient alignment filtering with layer-adaptive anomaly detection via directional divergence analysis to identify security-critical layers, followed by selective clustering on those layers. Confirmed malicious updates are replaced (rather than removed) with downscaled surrogate updates drawn from structurally similar benign clients, with the goal of reducing false-positive rates under non-IID data while preserving gradient diversity, main-task accuracy, and keeping attack success rates low.
Significance. If the empirical claims hold under rigorous scrutiny, FedSurrogate would constitute a practical advance over existing defenses by lowering false-positive rates from the 31-32% range of the nearest comparably effective baseline to below 10% across datasets and attack types, while reporting superior main-task accuracy and attack success rates below 2.1% even in challenging non-IID regimes. The surrogate-replacement strategy is a distinctive contribution that avoids the accuracy penalty of outright removal; however, its effectiveness rests on the untested premise that sufficiently similar benign clients exist in the critical-layer subspace.
major comments (2)
- [Method (Surrogate Replacement and Bidirectional Gradient Alignment)] The surrogate-replacement step (described in the method as replacing confirmed malicious updates with downscaled surrogates from clients identified via bidirectional gradient alignment and selective clustering on layers flagged by directional divergence) is load-bearing for the central performance claims. The manuscript treats the existence of sufficiently similar benign clients as given across all tested attack types and non-IID regimes, yet provides no explicit quantification of similarity thresholds, no failure-case analysis when client distributions diverge sharply in the critical-layer subspace, and no ablation showing what happens to FPR, ASR, or accuracy when no adequate surrogate is available.
- [Abstract and Experimental Evaluations] The abstract and evaluation summary assert that 'extensive evaluations demonstrate' FPR below 10% and ASR below 2.1% 'across all datasets and attack types' with superior main-task accuracy under non-IID conditions, but the provided text supplies no concrete information on datasets, attack implementations, client counts, non-IID heterogeneity parameters, number of runs, statistical significance tests, or error bars. Without these details the superiority claims cannot be assessed and the comparison to the 31-32% baseline FPR is unverifiable.
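For concreteness, the surrogate-replacement step under scrutiny could look like the sketch below. Reading "structurally similar" as highest cosine similarity to the flagged client's update on the critical-layer coordinates, and the 0.5 downscaling factor, are assumptions the manuscript leaves unspecified:

```python
import numpy as np

def surrogate_replace(updates, malicious, critical_idx, scale=0.5):
    """Replace each confirmed-malicious update with a downscaled copy
    of the most similar benign update, similarity measured only on the
    critical-layer coordinates `critical_idx`.
    `updates`: dict client_id -> flat update vector."""
    benign = {c: u for c, u in updates.items() if c not in malicious}
    out = dict(benign)
    for m in malicious:
        q = updates[m][critical_idx]
        def sim(u):
            v = u[critical_idx]
            return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12))
        donor = max(benign, key=lambda c: sim(benign[c]))
        out[m] = scale * benign[donor]
    return out
```

The referee's concern maps directly onto this sketch: when every benign `sim` value is low, `max` still returns a donor, so without an explicit similarity threshold the method silently substitutes a poor surrogate.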
minor comments (1)
- [Abstract] The abstract refers to 'challenging non-IID settings' without specifying the degree of heterogeneity (e.g., Dirichlet parameter or label skew) or the concrete datasets, which would aid immediate contextualization of the reported numbers.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. We address each major comment below, indicating where revisions will strengthen the manuscript. We agree that additional details and analyses are needed to fully substantiate the claims.
read point-by-point responses
- Referee: [Method (Surrogate Replacement and Bidirectional Gradient Alignment)] The surrogate-replacement step (described in the method as replacing confirmed malicious updates with downscaled surrogates from clients identified via bidirectional gradient alignment and selective clustering on layers flagged by directional divergence) is load-bearing for the central performance claims. The manuscript treats the existence of sufficiently similar benign clients as given across all tested attack types and non-IID regimes, yet provides no explicit quantification of similarity thresholds, no failure-case analysis when client distributions diverge sharply in the critical-layer subspace, and no ablation showing what happens to FPR, ASR, or accuracy when no adequate surrogate is available.
Authors: We agree that the current manuscript lacks explicit quantification of similarity thresholds (e.g., cosine similarity in the critical-layer subspace) and dedicated failure-case analysis. In the revision we will add: (1) the precise similarity metric and threshold used for surrogate selection, (2) histograms of similarity scores across all experiments, and (3) an ablation that progressively reduces surrogate availability (by increasing non-IID heterogeneity or removing the most similar clients) and reports resulting changes in FPR, ASR, and main-task accuracy. This will directly test the premise that sufficiently similar benign clients exist and quantify robustness when they do not. revision: yes
- Referee: [Abstract and Experimental Evaluations] The abstract and evaluation summary assert that 'extensive evaluations demonstrate' FPR below 10% and ASR below 2.1% 'across all datasets and attack types' with superior main-task accuracy under non-IID conditions, but the provided text supplies no concrete information on datasets, attack implementations, client counts, non-IID heterogeneity parameters, number of runs, statistical significance tests, or error bars. Without these details the superiority claims cannot be assessed and the comparison to the 31-32% baseline FPR is unverifiable.
Authors: The full experimental protocol (datasets, attack implementations, client counts, Dirichlet non-IID parameters, number of runs, and error bars) appears in Section 4, but we acknowledge that the abstract and summary paragraphs do not restate these parameters clearly enough for standalone assessment. In the revision we will expand the abstract to include key experimental settings, add a concise experimental-setup table in the main text, and report statistical significance (paired t-tests) for the FPR and ASR improvements versus the 31-32% baseline. This will make the superiority claims directly verifiable without requiring the reader to consult supplementary material. revision: yes
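The surrogate-availability ablation proposed in the responses above could be instrumented along these lines. The similarity metric and the choice to track the best remaining donor as top donors are removed are assumed details, not the authors' stated protocol:

```python
import numpy as np

def donor_similarity_curve(mal_update, benign_updates, critical_idx, max_removed=3):
    """Ablation sketch: progressively remove the most similar benign
    clients and record the cosine similarity of the best remaining
    donor on the critical-layer coordinates. A steep drop means
    surrogate quality hinges on a handful of clients."""
    q = mal_update[critical_idx]
    def cos(u):
        v = u[critical_idx]
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12))
    ranked = sorted(benign_updates.values(), key=cos, reverse=True)
    return [cos(ranked[k]) for k in range(min(max_removed + 1, len(ranked)))]
```

Reporting this curve alongside FPR/ASR at each removal step would directly test the load-bearing premise that an adequate surrogate always exists.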
Circularity Check
No circularity; empirical defense with independent evaluations
full rationale
The paper describes an empirical backdoor defense combining gradient alignment, directional divergence for layer selection, selective clustering, and surrogate replacement from benign clients. No equations, derivations, or first-principles claims appear in the provided text that reduce to fitted parameters or self-citations by construction. The central performance claims rest on external baseline comparisons and evaluations across datasets/attacks rather than any self-referential reduction. This is the expected non-finding for a purely empirical method paper.
Axiom & Free-Parameter Ledger
free parameters (2)
- surrogate downscaling factor
- clustering and divergence thresholds
axioms (2)
- domain assumption: Security-critical layers can be identified via directional divergence analysis of gradients
- domain assumption: Structurally similar benign clients exist and their updates can serve as effective surrogates
Reference graph
Works this paper leans on
- [1] Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., Shmatikov, V.: How to backdoor federated learning. In: International Conference on Artificial Intelligence and Statistics. pp. 2938–2948. PMLR (2020)
- [2] Blanchard, P., El Mhamdi, E.M., Guerraoui, R., Stainer, J.: Machine learning with adversaries: Byzantine tolerant gradient descent. In: Advances in Neural Information Processing Systems 30 (2017)
- [3] Cao, X., Fang, M., Liu, J., Gong, N.Z.: FLTrust: Byzantine-robust federated learning via trust bootstrapping. arXiv preprint arXiv:2012.13995 (2020)
- [4] Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21(1), 6 (2020)
- [5] Deng, L.: The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine 29(6), 141–142 (2012)
- [6] Fang, M., Cao, X., Jia, J., Gong, N.: Local model poisoning attacks to Byzantine-robust federated learning. In: 29th USENIX Security Symposium (USENIX Security 20). pp. 1605–1622 (2020)
- [7] Fraboni, Y., Vidal, R., Kameni, L., Lorenzi, M.: Clustered sampling: Low-variance and improved representativity for clients selection in federated learning. In: International Conference on Machine Learning. pp. 3407–3416. PMLR (2021)
- [8] Fung, C., Yoon, C.J., Beschastnikh, I.: Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866 (2018)
- [9] Ghosh, A., Chung, J., Yin, D., Ramchandran, K.: An efficient framework for clustered federated learning. In: Advances in Neural Information Processing Systems 33, 19586–19597 (2020)
- [10] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
- [11] He, W., Huang, W., Yang, B., Liu, S., Ye, M.: SPMC: Self-purifying federated backdoor defense via margin contribution. In: Forty-second International Conference on Machine Learning (2025)
- [12] Kabir, E., Song, Z., Rashid, M.R.U., Mehnaz, S.: FLShield: a validation based federated learning framework to defend against poisoning attacks. In: 2024 IEEE Symposium on Security and Privacy (SP). pp. 2572–2590. IEEE (2024)
- [13] Khan, M.A., Shejwalkar, V., Houmansadr, A., Anwar, F.M.: On the pitfalls of security evaluation of robust federated learning. In: 2023 IEEE Security and Privacy Workshops (SPW). pp. 57–68. IEEE (2023)
- [14] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
- [15] Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V.: Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems 2, 429–450 (2020)
- [16] McInnes, L., Healy, J., Astels, S., et al.: hdbscan: Hierarchical density based clustering. Journal of Open Source Software 2(11), 205 (2017)
- [17] McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics. pp. 1273–1282. PMLR (2017)
- [18] Naseri, M., Han, Y., De Cristofaro, E.: BadVFL: Backdoor attacks in vertical federated learning. In: 2024 IEEE Symposium on Security and Privacy (SP). pp. 2013–2028. IEEE (2024)
- [19] Nguyen, T.D., Rieger, P., Chen, H., Yalame, H., Möllering, H., Fereidooni, H., Marchal, S., Miettinen, M., Mirhoseini, A., Zeitouni, S., et al.: FLAME: Taming backdoors in federated learning. In: 31st USENIX Security Symposium (USENIX Security 22). pp. 1415–1432 (2022)
- [20] Nguyen, T.D., Nguyen, A.D., Nguyen, T.H., Wong, K.S., Pham, H.H., Nguyen, T.T., Le Nguyen, P.: FedGrad: Mitigating backdoor attacks in federated learning through local ultimate gradients inspection. In: 2023 International Joint Conference on Neural Networks (IJCNN). pp. 1–10. IEEE (2023)
- [21] Qin, Z., Chen, F., Zhi, C., Yan, X., Deng, S.: Resisting backdoor attacks in federated learning via bidirectional elections and individual perspective. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 14677–14685 (2024)
- [22]
- [23] Tolpegin, V., Truex, S., Gursoy, M.E., Liu, L.: Data poisoning attacks against federated learning systems. In: European Symposium on Research in Computer Security. pp. 480–501. Springer (2020)
- [24] Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
- [25] Xie, C., Huang, K., Chen, P.Y., Li, B.: DBA: Distributed backdoor attacks against federated learning. In: International Conference on Learning Representations (2019)
- [26] Xu, J., Zhang, Z., Hu, R.: Detecting backdoor attacks in federated learning via direction alignment inspection. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 20654–20664 (2025)
- [27] Zhang, Z., Panda, A., Song, L., Yang, Y., Mahoney, M., Mittal, P., Kannan, R., Gonzalez, J.: Neurotoxin: Durable backdoors in federated learning. In: International Conference on Machine Learning. pp. 26429–26446. PMLR (2022)
- [28] Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., Chandra, V.: Federated learning with non-IID data. arXiv preprint arXiv:1806.00582 (2018)
- [29] Zhuang, H., Yu, M., Wang, H., Hua, Y., Li, J., Yuan, X.: Backdoor federated learning by poisoning backdoor-critical layers. In: International Conference on Learning Representations. vol. 2024, pp. 40241–40266 (2024)

Appendix (Ablation Studies): chart of main-task accuracy (%) compared across Clustering only (85.95%), + Rescue, and + Surrogate Replacement.
discussion (0)