pith. machine review for the scientific record.

arxiv: 2605.11122 · v1 · submitted 2026-05-11 · 💻 cs.CR · cs.LG

Recognition: no theorem link

FedSurrogate: Backdoor Defense in Federated Learning via Layer Criticality and Surrogate Replacement

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 02:30 UTC · model grok-4.3

classification 💻 cs.CR · cs.LG
keywords federated learning · backdoor defense · surrogate replacement · non-IID data · layer criticality · gradient alignment · anomaly detection

The pith

FedSurrogate replaces malicious updates with downscaled surrogates from similar benign clients to defend federated learning against backdoors while keeping false positives below 10 percent under non-IID data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FedSurrogate to counter backdoor attacks where malicious clients insert targeted behaviors into a shared global model. It first locates security-critical layers through directional divergence analysis, then applies bidirectional gradient alignment to filter anomalies and rescue misflagged benign clients. Confirmed malicious updates are not discarded but swapped for downscaled versions drawn from structurally similar trusted clients. This preserves gradient diversity and model performance. A sympathetic reader would care because prior defenses often flag too many good clients in realistic heterogeneous settings, lowering overall accuracy even when they catch the attackers.
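The three stages above can be sketched end to end. Everything concrete here is an assumption layered on the summary: the directional-divergence score, the cosine-alignment test, the threshold `zeta`, the downscaling factor `gamma`, and the donor-selection rule are illustrative stand-ins for the paper's actual criteria, which this pith does not specify.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two flattened update vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def critical_layers(updates, k=2):
    """Rank layers by a simple directional-divergence proxy: normalize each
    client's per-layer update, average the unit vectors, and treat layers
    where the mean direction has low norm (clients disagree) as critical.
    `updates` maps client -> {layer_name: flat ndarray}."""
    names = next(iter(updates.values())).keys()
    scores = {}
    for name in names:
        dirs = [u[name] / (np.linalg.norm(u[name]) + 1e-12) for u in updates.values()]
        scores[name] = 1.0 - np.linalg.norm(np.mean(dirs, axis=0))
    return sorted(scores, key=scores.get, reverse=True)[:k]

def defend(updates, zeta=0.0, gamma=0.5, k=2):
    """Flag clients whose critical-layer update misaligns with the cohort
    mean, then replace each flagged update with a downscaled copy of the
    most similar unflagged client's update (the surrogate)."""
    layers = critical_layers(updates, k)
    concat = {c: np.concatenate([u[l] for l in layers]) for c, u in updates.items()}
    mean_vec = np.mean(list(concat.values()), axis=0)
    flagged = {c for c, v in concat.items() if cosine(v, mean_vec) < zeta}
    benign = [c for c in updates if c not in flagged]
    repaired = {}
    for c in updates:
        if c in flagged and benign:
            donor = max(benign, key=lambda b: cosine(concat[b], concat[c]))
            repaired[c] = {l: gamma * v for l, v in updates[donor].items()}
        else:
            repaired[c] = updates[c]
    return repaired, flagged
```

The surrogate step is the distinctive part: a flagged client's slot in aggregation is filled with `gamma` times a benign donor's update, so gradient diversity is retained instead of simply shrinking the cohort.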

Core claim

FedSurrogate performs selective clustering on security-critical layers identified via directional divergence analysis, combines it with bidirectional soft-filtering to screen trusted clients and rescue false positives, and replaces confirmed malicious updates with downscaled surrogate updates from structurally similar benign clients, thereby neutralizing adversarial influence while preserving gradient diversity and achieving superior main-task accuracy.

What carries the argument

Surrogate replacement of malicious updates by downscaled versions from structurally similar benign clients, after layer criticality is isolated via directional divergence analysis and bidirectional gradient alignment filtering.

Load-bearing premise

Structurally similar benign clients will always exist and supply surrogate updates that neutralize attacks without degrading the global model, and directional divergence will reliably identify critical layers even under extreme non-IID heterogeneity.

What would settle it

A federated training run on a dataset partition where no benign client shares structural similarity with the malicious ones, checking whether attack success rate rises above 2 percent or false-positive rate exceeds 10 percent.
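One standard way to construct such a partition (an assumption; the paper's exact partitioning recipe is not given in this summary) is Dirichlet label skew: each class's samples are split across clients with Dirichlet-distributed proportions, and driving α toward zero gives clients near-disjoint label sets, starving malicious clients of structurally similar benign peers.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet
    proportions; small alpha -> highly skewed (non-IID) partitions."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Draw this class's split proportions, then cut the index list.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients
```

Running the proposed test would mean sweeping α downward and recording the round at which the attack success rate crosses 2% or the false-positive rate crosses 10%.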

Figures

Figures reproduced from arXiv: 2605.11122 by Fatima Z. Abacha, Lucas C. Cordeiro, Mustafa A. Mustafa, Sin G. Teo, Yuanxiang Wu.

Figure 1. Overview of the FedSurrogate defense pipeline. Stage 1 performs density…
Figure 2. Sensitivity of FedSurrogate's detection performance to the threshold ζ on CIFAR-10 (pdr = 0.3).
Figure 3. Scalability of FedSurrogate on CIFAR-10 with ResNet-18 as the number of clients n increases from 20 to 100, holding MCR = 0.2 and pdr = 0.3. FedSurrogate ASR (right axis) remains below 2.5% across the full sweep, while FedSurrogate MTA (left axis) tracks the attack-free FedAvg baseline closely, with the gap fluctuating between 1.21 and 3.87 percentage points and showing no upward trend. The MTA degradatio…
Figure 4. Ablation study of FedSurrogate components on CIFAR-10 under the centralized backdoor attack (CBA). Each component contributes incrementally to MTA, with the rescue stage providing the largest single gain (+1.56%). Effect of donor selection strategy…
Figure 5. Robustness of FedSurrogate against increasing Malicious Client Ratio (MCR) on CIFAR-10 under the CBA attack (Dirichlet α = 0.5, n = 20 clients). The defense maintains ASR below 2% for MCR ≤ 35%, enters a reduced-effectiveness regime at MCR = 40% (ASR = 15.32%) where the attack partially succeeds but remains substantially mitigated relative to the undefended case, and becomes ineffective at MCR = 45% (ASR = 74.…
read the original abstract

Federated Learning remains highly susceptible to backdoor attacks--malicious clients inject targeted behaviours into the global model. Existing defenses suffer from substantial false-positive rates under realistic non-independent and identically distributed (non-IID) data, incorrectly flagging benign clients and degrading model accuracy even when adversaries are correctly identified. We present FedSurrogate, a novel backdoor defense that addresses this limitation by combining bidirectional gradient alignment filtering with layer-adaptive anomaly detection. FedSurrogate performs selective clustering on security-critical layers identified via directional divergence analysis, concentrating the detection signal on a low-dimensional subspace. A bidirectional soft-filtering stage screens trusted clients for residual contamination while rescuing false positives from suspects, substantially reducing misclassifications under heterogeneous conditions. Rather than removing confirmed malicious updates, FedSurrogate replaces them with downscaled surrogate updates from structurally similar benign clients, preserving gradient diversity while neutralising adversarial influence. Extensive evaluations demonstrate that FedSurrogate maintains false-positive rates below 10% across all datasets and attack types, compared to 31-32% for the nearest comparably effective baseline, while achieving superior main-task accuracy and maintaining attack success rates below 2.1% across all tested datasets and attack types under challenging non-IID settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents FedSurrogate, a backdoor defense for federated learning. It combines bidirectional gradient alignment filtering with layer-adaptive anomaly detection via directional divergence analysis to identify security-critical layers, followed by selective clustering on those layers. Confirmed malicious updates are replaced (rather than removed) with downscaled surrogate updates drawn from structurally similar benign clients, with the goal of reducing false-positive rates under non-IID data while preserving gradient diversity, main-task accuracy, and keeping attack success rates low.

Significance. If the empirical claims hold under rigorous scrutiny, FedSurrogate would constitute a practical advance over existing defenses by lowering false-positive rates from the 31-32% range of the nearest comparably effective baseline to below 10% across datasets and attack types, while reporting superior main-task accuracy and attack success rates below 2.1% even in challenging non-IID regimes. The surrogate-replacement strategy is a distinctive contribution that avoids the accuracy penalty of outright removal; however, its effectiveness rests on the untested premise that sufficiently similar benign clients exist in the critical-layer subspace.

major comments (2)
  1. [Method (Surrogate Replacement and Bidirectional Gradient Alignment)] The surrogate-replacement step (described in the method as replacing confirmed malicious updates with downscaled surrogates from clients identified via bidirectional gradient alignment and selective clustering on layers flagged by directional divergence) is load-bearing for the central performance claims. The manuscript treats the existence of sufficiently similar benign clients as given across all tested attack types and non-IID regimes, yet provides no explicit quantification of similarity thresholds, no failure-case analysis when client distributions diverge sharply in the critical-layer subspace, and no ablation showing what happens to FPR, ASR, or accuracy when no adequate surrogate is available.
  2. [Abstract and Experimental Evaluations] The abstract and evaluation summary assert that 'extensive evaluations demonstrate' FPR below 10% and ASR below 2.1% 'across all datasets and attack types' with superior main-task accuracy under non-IID conditions, but the provided text supplies no concrete information on datasets, attack implementations, client counts, non-IID heterogeneity parameters, number of runs, statistical significance tests, or error bars. Without these details the superiority claims cannot be assessed and the comparison to the 31-32% baseline FPR is unverifiable.
minor comments (1)
  1. [Abstract] The abstract refers to 'challenging non-IID settings' without specifying the degree of heterogeneity (e.g., Dirichlet parameter or label skew) or the concrete datasets, which would aid immediate contextualization of the reported numbers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. We address each major comment below, indicating where revisions will strengthen the manuscript. We agree that additional details and analyses are needed to fully substantiate the claims.

read point-by-point responses
  1. Referee: [Method (Surrogate Replacement and Bidirectional Gradient Alignment)] The surrogate-replacement step (described in the method as replacing confirmed malicious updates with downscaled surrogates from clients identified via bidirectional gradient alignment and selective clustering on layers flagged by directional divergence) is load-bearing for the central performance claims. The manuscript treats the existence of sufficiently similar benign clients as given across all tested attack types and non-IID regimes, yet provides no explicit quantification of similarity thresholds, no failure-case analysis when client distributions diverge sharply in the critical-layer subspace, and no ablation showing what happens to FPR, ASR, or accuracy when no adequate surrogate is available.

    Authors: We agree that the current manuscript lacks explicit quantification of similarity thresholds (e.g., cosine similarity in the critical-layer subspace) and dedicated failure-case analysis. In the revision we will add: (1) the precise similarity metric and threshold used for surrogate selection, (2) histograms of similarity scores across all experiments, and (3) an ablation that progressively reduces surrogate availability (by increasing non-IID heterogeneity or removing the most similar clients) and reports resulting changes in FPR, ASR, and main-task accuracy. This will directly test the premise that sufficiently similar benign clients exist and quantify robustness when they do not. revision: yes

  2. Referee: [Abstract and Experimental Evaluations] The abstract and evaluation summary assert that 'extensive evaluations demonstrate' FPR below 10% and ASR below 2.1% 'across all datasets and attack types' with superior main-task accuracy under non-IID conditions, but the provided text supplies no concrete information on datasets, attack implementations, client counts, non-IID heterogeneity parameters, number of runs, statistical significance tests, or error bars. Without these details the superiority claims cannot be assessed and the comparison to the 31-32% baseline FPR is unverifiable.

    Authors: The full experimental protocol (datasets, attack implementations, client counts, Dirichlet non-IID parameters, number of runs, and error bars) appears in Section 4, but we acknowledge that the abstract and summary paragraphs do not restate these parameters clearly enough for standalone assessment. In the revision we will expand the abstract to include key experimental settings, add a concise experimental-setup table in the main text, and report statistical significance (paired t-tests) for the FPR and ASR improvements versus the 31-32% baseline. This will make the superiority claims directly verifiable without requiring the reader to consult supplementary material. revision: yes
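The paired t-test the authors propose is simple enough to state directly. This sketch computes the paired t statistic over per-seed metric pairs; the metric values fed to it below are invented for illustration, not taken from the paper.

```python
import math

def paired_t(xs, ys):
    """Paired t statistic for matched per-run metrics, e.g. the FPR of
    FedSurrogate vs. a baseline across random seeds. t = mean(d) / se(d),
    where d are the per-seed differences."""
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

A strongly negative t on FPR pairs (defense minus baseline) would support the claimed drop from the 31–32% baseline range to below 10%.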

Circularity Check

0 steps flagged

No circularity; empirical defense with independent evaluations

full rationale

The paper describes an empirical backdoor defense combining gradient alignment, directional divergence for layer selection, selective clustering, and surrogate replacement from benign clients. No equations, derivations, or first-principles claims appear in the provided text that reduce to fitted parameters or self-citations by construction. The central performance claims rest on external baseline comparisons and evaluations across datasets/attacks rather than any self-referential reduction. This is the expected non-finding for a purely empirical method paper.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The method depends on several domain assumptions about layer importance and client similarity plus likely tunable parameters for clustering and scaling; no invented physical entities.

free parameters (2)
  • surrogate downscaling factor
    Mention of downscaled surrogate updates implies a scaling hyperparameter whose value is not stated in the abstract.
  • clustering and divergence thresholds
    Selective clustering and directional divergence analysis require thresholds or cluster counts that are typically fitted or chosen.
axioms (2)
  • domain assumption Security-critical layers can be identified via directional divergence analysis of gradients
    Central to the layer-adaptive detection stage.
  • domain assumption Structurally similar benign clients exist and their updates can serve as effective surrogates
    Required for the replacement strategy to preserve diversity without reintroducing attack effects.

pith-pipeline@v0.9.0 · 5536 in / 1425 out tokens · 67112 ms · 2026-05-13T02:30:49.582350+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 1 internal anchor

  1. Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., Shmatikov, V.: How to backdoor federated learning. In: International Conference on Artificial Intelligence and Statistics. pp. 2938–2948. PMLR (2020)
  2. Blanchard, P., El Mhamdi, E.M., Guerraoui, R., Stainer, J.: Machine learning with adversaries: Byzantine tolerant gradient descent. Advances in Neural Information Processing Systems 30 (2017)
  3. Cao, X., Fang, M., Liu, J., Gong, N.Z.: FLTrust: Byzantine-robust federated learning via trust bootstrapping. arXiv preprint arXiv:2012.13995 (2020)
  4. Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21(1), 6 (2020)
  5. Deng, L.: The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine 29(6), 141–142 (2012)
  6. Fang, M., Cao, X., Jia, J., Gong, N.: Local model poisoning attacks to Byzantine-robust federated learning. In: 29th USENIX Security Symposium (USENIX Security 20). pp. 1605–1622 (2020)
  7. Fraboni, Y., Vidal, R., Kameni, L., Lorenzi, M.: Clustered sampling: Low-variance and improved representativity for clients selection in federated learning. In: International Conference on Machine Learning. pp. 3407–3416. PMLR (2021)
  8. Fung, C., Yoon, C.J., Beschastnikh, I.: Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866 (2018)
  9. Ghosh, A., Chung, J., Yin, D., Ramchandran, K.: An efficient framework for clustered federated learning. Advances in Neural Information Processing Systems 33, 19586–19597 (2020)
  10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
  11. He, W., Huang, W., Yang, B., Liu, S., Ye, M.: SPMC: Self-purifying federated backdoor defense via margin contribution. In: Forty-second International Conference on Machine Learning (2025)
  12. Kabir, E., Song, Z., Rashid, M.R.U., Mehnaz, S.: FLShield: A validation based federated learning framework to defend against poisoning attacks. In: 2024 IEEE Symposium on Security and Privacy (SP). pp. 2572–2590. IEEE (2024)
  13. Khan, M.A., Shejwalkar, V., Houmansadr, A., Anwar, F.M.: On the pitfalls of security evaluation of robust federated learning. In: 2023 IEEE Security and Privacy Workshops (SPW). pp. 57–68. IEEE (2023)
  14. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
  15. Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V.: Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems 2, 429–450 (2020)
  16. McInnes, L., Healy, J., Astels, S., et al.: hdbscan: Hierarchical density based clustering. J. Open Source Softw. 2(11), 205 (2017)
  17. McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics. pp. 1273–1282. PMLR (2017)
  18. Naseri, M., Han, Y., De Cristofaro, E.: BadVFL: Backdoor attacks in vertical federated learning. In: 2024 IEEE Symposium on Security and Privacy (SP). pp. 2013–2028. IEEE (2024)
  19. Nguyen, T.D., Rieger, P., Chen, H., Yalame, H., Möllering, H., Fereidooni, H., Marchal, S., Miettinen, M., Mirhoseini, A., Zeitouni, S., et al.: FLAME: Taming backdoors in federated learning. In: 31st USENIX Security Symposium (USENIX Security 22). pp. 1415–1432 (2022)
  20. Nguyen, T.D., Nguyen, A.D., Nguyen, T.H., Wong, K.S., Pham, H.H., Nguyen, T.T., Le Nguyen, P.: FedGrad: Mitigating backdoor attacks in federated learning through local ultimate gradients inspection. In: 2023 International Joint Conference on Neural Networks (IJCNN). pp. 01–10. IEEE (2023)
  21. Qin, Z., Chen, F., Zhi, C., Yan, X., Deng, S.: Resisting backdoor attacks in federated learning via bidirectional elections and individual perspective. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 14677–14685 (2024)
  22. Sun, Z., Kairouz, P., Suresh, A.T., McMahan, H.B.: Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963 (2019)
  23. Tolpegin, V., Truex, S., Gursoy, M.E., Liu, L.: Data poisoning attacks against federated learning systems. In: European Symposium on Research in Computer Security. pp. 480–501. Springer (2020)
  24. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
  25. Xie, C., Huang, K., Chen, P.Y., Li, B.: DBA: Distributed backdoor attacks against federated learning. In: International Conference on Learning Representations (2019)
  26. Xu, J., Zhang, Z., Hu, R.: Detecting backdoor attacks in federated learning via direction alignment inspection. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 20654–20664 (2025)
  27. Zhang, Z., Panda, A., Song, L., Yang, Y., Mahoney, M., Mittal, P., Kannan, R., Gonzalez, J.: Neurotoxin: Durable backdoors in federated learning. In: International Conference on Machine Learning. pp. 26429–26446. PMLR (2022)
  28. Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., Chandra, V.: Federated learning with non-IID data. arXiv preprint arXiv:1806.00582 (2018)
  29. Zhuang, H., Yu, M., Wang, H., Hua, Y., Li, J., Yuan, X.: Backdoor federated learning by poisoning backdoor-critical layers. In: International Conference on Learning Representations. vol. 2024, pp. 40241–40266 (2024)