pith. machine review for the scientific record.

arxiv: 2604.10013 · v2 · submitted 2026-04-11 · 📊 stat.ME · math.OC

Recognition: unknown

Toward Exact Convergence in Byzantine-Robust Decentralized Learning: A Statistical Identification Approach

Changliang Zou, Chengde Qian, Siyuan Zhang, Xin Liu


Pith reviewed 2026-05-10 16:35 UTC · model grok-4.3

classification 📊 stat.ME math.OC
keywords Byzantine-robust learning · decentralized optimization · statistical identification · sample splitting · false discovery rate · stochastic gradient descent · network connectivity · convergence analysis

The pith

Identifying Byzantine machines with sample-splitting statistics allows decentralized learning to achieve exact convergence at the optimal rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Most defenses against Byzantine attacks in decentralized learning rely on robust aggregation, which biases updates and leaves steady-state errors. This paper instead uses active identification to detect and remove malicious nodes before optimization. A detection procedure based on sample-splitting score statistics controls the false discovery rate and prunes attackers without strong assumptions on data distributions. Once malicious nodes are removed, the remaining normal nodes regain enough connectivity to run standard optimization steps. The resulting algorithm then reaches the same order-optimal convergence rate as ordinary decentralized stochastic gradient methods.

Core claim

The central claim is that a detect-then-optimize pipeline with precise statistical identification of Byzantine machines recovers connectivity among honest nodes, enabling DRSGD-ByMI (Decentralized Rescaled Stochastic Gradient Descent with Byzantine Machine Identification) to match the convergence rate of standard decentralized first-order methods even when malicious machines are present.

What carries the argument

The p-value-free detection procedure using sample-splitting score statistics that prunes malicious nodes while preserving network connectivity for subsequent optimization.
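
To make "p-value-free detection with sample-splitting score statistics" concrete, here is a minimal sketch in the spirit of the ByMI procedure (reference [28] below). Everything in it is an illustrative assumption rather than the paper's exact construction: the score pairs each node's two half-sample gradients so that honest nodes yield statistics symmetric about zero, and a Barber-Candès-style data-driven threshold turns the signed scores into an FDR-controlled selection without computing p-values.

    import numpy as np

    def split_scores(g1, g2):
        # g1, g2: (n_nodes, d) gradient estimates from the two halves of each
        # node's sample. For an honest node the two halves fluctuate
        # independently around the common mean, so the inner product of their
        # residuals is symmetric about zero (given sign-symmetric noise); a
        # Byzantine node shifting both halves the same way scores large and
        # positive. Illustrative statistic, not the paper's exact one.
        m = np.median(np.vstack([g1, g2]), axis=0)  # crude robust center
        return np.sum((g1 - m) * (g2 - m), axis=1)

    def bc_threshold(w, q=0.1):
        # "p-value-free" data-driven threshold in the Barber-Candes style:
        # estimate the FDP at t by counting mirrored negative scores, and stop
        # at the smallest t whose estimate falls below the target level q.
        for t in np.sort(np.abs(w)):
            if t > 0 and (1 + np.sum(w <= -t)) / max(np.sum(w >= t), 1) <= q:
                return t
        return np.inf  # nothing is selected

    def detect(g1, g2, q=0.1):
        w = split_scores(g1, g2)
        return np.flatnonzero(w >= bc_threshold(w, q))  # nodes flagged Byzantine

The design choice that makes this "p-value-free" is the mirrored count in bc_threshold: symmetry of null scores lets the negative tail stand in for the unknown null distribution, so no distributional assumption beyond that symmetry is needed.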

If this is right

  • The decentralized network recovers sufficient connectivity among normal nodes after pruning.
  • DRSGD-ByMI achieves the same order-optimal convergence rate as standard decentralized stochastic first-order methods.
  • The identification mechanism controls the false discovery rate without restrictive distributional assumptions.
  • Numerical experiments confirm the theoretical convergence results for robust decentralized learning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This suggests that identification-based methods could outperform aggregation rules in other distributed settings where graph connectivity matters.
  • Applying the detection repeatedly might allow handling of time-varying or adaptive Byzantine behaviors.
  • The framework implies a general principle that removing identified outliers early can avoid bias accumulation in iterative optimization.

Load-bearing premise

The statistical test can correctly find and remove malicious machines while keeping the honest ones connected enough to optimize together.

What would settle it

A network simulation with known malicious nodes where the identification procedure either fails to control false discoveries or leaves the honest subgraph too disconnected, causing the observed convergence rate to fall below the claimed order-optimal level.
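
A hedged mock-up of exactly this experiment, continuing the detection sketch above; the topology, attack model, and all constants (50 nodes, 10 attackers, Erdős–Rényi edges with p = 0.3) are assumptions of the illustration, not the paper's setup:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, n_byz, q = 50, 30, 10, 0.1

    # Half-sample gradient estimates; Byzantine nodes shift both halves by a
    # common adversarial direction (one simple attack among many possible).
    mu = rng.normal(size=d)
    g1 = mu + rng.normal(size=(n, d))
    g2 = mu + rng.normal(size=(n, d))
    byz = rng.choice(n, n_byz, replace=False)
    shift = 3.0 * rng.normal(size=d)
    g1[byz] += shift
    g2[byz] += shift

    flagged = detect(g1, g2, q)  # detection sketch from above
    fdp = len(np.setdiff1d(flagged, byz)) / max(len(flagged), 1)
    power = np.isin(byz, flagged).mean()
    print(f"FDP = {fdp:.2f}, detected {power:.0%} of attackers")

    # Post-pruning connectivity of the honest subgraph: the algebraic
    # connectivity (second-smallest Laplacian eigenvalue) must stay positive
    # and bounded away from zero for the claimed rate to survive.
    A = rng.random((n, n)) < 0.3
    A = np.triu(A, 1)
    A = (A | A.T).astype(float)
    keep = np.setdiff1d(np.setdiff1d(np.arange(n), byz), flagged)
    sub = A[np.ix_(keep, keep)]
    lap = np.diag(sub.sum(axis=1)) - sub
    lam2 = np.sort(np.linalg.eigvalsh(lap))[1]
    print(f"algebraic connectivity after pruning: {lam2:.3f}")

If the printed FDP exceeds the target q across repetitions, or lam2 collapses toward zero, the experiment refutes the pipeline's two load-bearing steps in turn.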

Figures

Figures reproduced from arXiv: 2604.10013 by Changliang Zou, Chengde Qian, Siyuan Zhang, Xin Liu.

Figure 1
Figure 1. The framework of our proposed method: DRSGD-ByMI. [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. The condition–theorem framework of DRSGD-ByMI. [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
Figure 3
Figure 3. FDP, Pa and optimality gap of ByMI-type methods over sr when ϱ = 0.2 under Scenario A, with d = 30 (top row) and d = 80 (bottom row). Panels plot FDP (%), Pa (%), and the optimality gap against the significance level for the UBAR-, IOS-, and BALANCE-based ByMI variants. view at source ↗
Figure 4
Figure 4. FDP, Pa and optimality gap of ByMI-type methods over ϱ under Scenario B, with d = 30 (top row) and d = 80 (bottom row). [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗
Figure 5
Figure 5. Comparison of DRSGD-ByMI with corresponding decentralized Byzantine-robust stochastic algorithms. [PITH_FULL_IMAGE:figures/full_fig_p022_5.png] view at source ↗
Figure 6
Figure 6. Comparison of DRSGD-ByMI with corresponding decentralized Byzantine-robust stochastic algorithms. [PITH_FULL_IMAGE:figures/full_fig_p022_6.png] view at source ↗
Figure 7
Figure 7. Comparison of DRSGD-ByMI with corresponding decentralized Byzantine-robust stochastic algorithms. [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗
read the original abstract

To defend against Byzantine attacks in decentralized learning, most existing methods rely on robust aggregation rules to mitigate the influence of malicious machines. However, these strategies inherently introduce bias, leading to inexact convergence with non-vanishing steady-state errors. In this paper, we propose a strategic shift from passive aggregation to active identification by introducing the Decentralized Rescaled Stochastic Gradient Descent with Byzantine Machine Identification (DRSGD-ByMI) framework. The core of our approach is an identification-based "detect-then-optimize" pipeline, where a p-value-free detection procedure is developed to accurately prune malicious nodes from the network. By leveraging sample-splitting score statistics, this identification mechanism achieves false discovery rate control without requiring restrictive distributional assumptions. We theoretically demonstrate that this precise identification allows the decentralized network to recover sufficient connectivity among the normal nodes, thereby enabling DRSGD-ByMI to match, even in the presence of Byzantine machines, the same order-optimal convergence rate as standard decentralized stochastic first-order methods. Numerical experiments validate our theoretical results and demonstrate the effectiveness of DRSGD-ByMI for decentralized robust learning problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes the DRSGD-ByMI framework for Byzantine-robust decentralized learning. It introduces a detect-then-optimize pipeline that first applies a p-value-free detection procedure based on sample-splitting score statistics to prune malicious nodes while controlling the false discovery rate (FDR), then runs decentralized rescaled SGD on the remaining network. The central theoretical claim is that this precise identification recovers sufficient connectivity among the normal nodes, enabling the method to achieve the same order-optimal convergence rate O(1/√T) as standard decentralized stochastic first-order methods, even with Byzantine machines present. The abstract states that numerical experiments validate the theory and demonstrate effectiveness.
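
For reference, the benchmark being matched is the textbook nonconvex guarantee for decentralized SGD, stated in the usual stationarity metric over the honest objective (this is the standard form of the bound, not a formula quoted from the paper):

    \min_{0 \le t \le T} \mathbb{E}\,\bigl\|\nabla f(\bar{x}_t)\bigr\|^2 = \mathcal{O}\!\bigl(1/\sqrt{T}\bigr), \qquad f(x) = \frac{1}{|\mathcal{H}|} \sum_{i \in \mathcal{H}} f_i(x),

where 𝓗 denotes the set of honest nodes; the paper's claim is that after identification the optimization effectively runs on the honest subnetwork, so this order is preserved despite the Byzantine machines.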

Significance. If the connectivity preservation argument holds with high probability, the work would be significant for shifting Byzantine defense from biased robust aggregation to active statistical identification, thereby achieving unbiased exact convergence at optimal rates. The p-value-free FDR control without strong distributional assumptions is a notable technical feature, and the overall approach addresses a key limitation in existing decentralized robust methods.

major comments (1)
  1. [Theoretical analysis of connectivity recovery] The theoretical demonstration (referenced in the abstract as showing that 'precise identification allows the decentralized network to recover sufficient connectivity among the normal nodes') relies on FDR control of the sample-splitting score test to guarantee that the induced subgraph on normal nodes retains a positive spectral gap or bounded mixing time. However, FDR only controls the expected number of false discoveries and does not provide a high-probability bound preventing the removal of a critical cut-set of normal nodes that could disconnect the graph or drive algebraic connectivity to zero. This gap directly undermines the subsequent claim of recovering the exact O(1/√T) rate; an explicit high-probability connectivity guarantee (e.g., via minimum-degree assumptions on the original normal subgraph or concentration on the pruned graph's Cheeger constant) is required in the main theorem.
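
The referee's distinction can be made explicit in one line: FDR control bounds an expectation, not a tail, so by itself it yields only a Markov-type guarantee (this restatement is editorial, not the paper's):

    \mathrm{FDR} = \mathbb{E}[\mathrm{FDP}] \le q \quad\Longrightarrow\quad \mathbb{P}(\mathrm{FDP} \ge \varepsilon) \le q/\varepsilon,

which is vacuous for the small ε needed to rule out pruning a critical cut-set of normal nodes; hence the request for a separate high-probability bound on false removals.
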
minor comments (2)
  1. [Abstract] The abstract would benefit from explicitly listing the network topology assumptions (e.g., initial connectivity of normal nodes) and the maximum number of Byzantine nodes tolerated relative to total nodes.
  2. [Numerical experiments] The experimental section should include more details on the specific network topologies tested, the fraction of Byzantine nodes, and quantitative metrics for post-pruning connectivity (e.g., algebraic connectivity values).

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive assessment of our work's significance and for the detailed major comment. We address the concern regarding the high-probability connectivity guarantee below and will revise the manuscript to close this gap.

read point-by-point responses
  1. Referee: [Theoretical analysis of connectivity recovery] The theoretical demonstration (referenced in the abstract as showing that 'precise identification allows the decentralized network to recover sufficient connectivity among the normal nodes') relies on FDR control of the sample-splitting score test to guarantee that the induced subgraph on normal nodes retains a positive spectral gap or bounded mixing time. However, FDR only controls the expected number of false discoveries and does not provide a high-probability bound preventing the removal of a critical cut-set of normal nodes that could disconnect the graph or drive algebraic connectivity to zero. This gap directly undermines the subsequent claim of recovering the exact O(1/√T) rate; an explicit high-probability connectivity guarantee (e.g., via minimum-degree assumptions on the original normal subgraph or concentration on the pruned graph's Cheeger constant) is required in the main theorem.

    Authors: We agree that FDR control alone yields an expectation bound and does not automatically deliver a high-probability guarantee against disconnection. In the revision we will strengthen the analysis by adding a concentration argument (via Bernstein-type bounds on the sample-splitting score statistics) showing that the number of false discoveries among normal nodes is O(log n) with probability 1-δ. Combined with an explicit minimum-degree assumption on the normal subgraph (which we will state clearly), this implies that the algebraic connectivity of the pruned normal graph remains bounded below by a positive constant with high probability. The main theorem and its proof will be updated to reflect this, and the abstract will be revised for precision. revision: yes
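
One way the promised pieces could combine (this outline, including the use of Fiedler's vertex-deletion inequality, is an editorial reconstruction, not the authors' proof): if at most |S| = O(log n) normal nodes are falsely removed with probability 1 - δ, then iterating Fiedler's bound λ₂(G - v) ≥ λ₂(G) - 1 over the removed set gives

    \lambda_2\bigl(G_{\mathcal{H}} \setminus S\bigr) \;\ge\; \lambda_2\bigl(G_{\mathcal{H}}\bigr) - |S| \;\ge\; \lambda_2\bigl(G_{\mathcal{H}}\bigr) - C\log n \quad \text{with probability } 1-\delta,

so the pruned honest subgraph keeps positive algebraic connectivity whenever λ₂(G_𝓗) exceeds C log n, as on sufficiently dense or expander-like topologies; the minimum-degree assumption the authors propose would play exactly this role.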

Circularity Check

0 steps flagged

No significant circularity; derivation is logically prior and self-contained

full rationale

The paper's core chain is a detect-then-optimize pipeline: sample-splitting score statistics achieve FDR-controlled identification of Byzantine nodes (without distributional assumptions), which is then used to prune the graph and recover sufficient connectivity among normal nodes for DRSGD-ByMI to attain the standard O(1/√T) rate of decentralized SGD. This identification step is presented as logically prior to the rate analysis, with no equations or claims showing the convergence rate being fitted to data, redefined in terms of itself, or reduced by construction to the detection procedure. No self-citations are load-bearing for the central theorem, and the structure does not rename a known result or smuggle an ansatz. The derivation therefore remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim depends on the effectiveness of the new detection rule and the post-identification connectivity of normal nodes; these are introduced without independent evidence in the abstract.

axioms (2)
  • domain assumption After pruning identified Byzantine nodes, the subgraph induced by normal nodes retains sufficient connectivity for decentralized SGD to achieve its standard convergence rate.
    This connectivity recovery is invoked to justify matching the order-optimal rate of clean decentralized methods.
  • domain assumption The sample-splitting score statistics admit false-discovery-rate control without restrictive distributional assumptions on the data or gradients.
    Stated as a property of the proposed detection procedure.
invented entities (1)
  • DRSGD-ByMI framework and its p-value-free detection procedure · no independent evidence
    purpose: To identify and remove Byzantine machines so that exact convergence is restored.
    Core novel component introduced by the paper; no external falsifiable handle is provided in the abstract.

pith-pipeline@v0.9.0 · 5495 in / 1452 out tokens · 39088 ms · 2026-05-10T16:35:35.393397+00:00 · methodology


Reference graph

Works this paper leans on

44 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1] Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, and Blake Woodworth. Lower bounds for non-convex stochastic optimization. Mathematical Programming, 199(1):165–214, 2023.

  2. [2] Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. Advances in Neural Information Processing Systems, 30, 2017.

  3. [3] Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Being robust (in high dimensions) can be practical. In International Conference on Machine Learning, pages 999–1008. PMLR, 2017.

  4. [4] Ilias Diakonikolas, Daniel M. Kane, and Ankit Pensia. Outlier robust mean estimation with subgaussian rates via stability. Advances in Neural Information Processing Systems, 33:1830–1840, 2020.

  5. [5] Paul Erdős. Graph theory and probability. Canadian Journal of Mathematics, 11:34–38, 1959.

  6. [6] Cheng Fang, Zhixiong Yang, and Waheed U. Bajwa. BRIDGE: Byzantine-resilient decentralized gradient descent. IEEE Transactions on Signal and Information Processing over Networks, 8:610–626, 2022.

  7. [7] Minghong Fang, Zifan Zhang, Hairi, Prashant Khanduri, Jia Liu, Songtao Lu, Yuchen Liu, and Neil Gong. Byzantine-robust decentralized federated learning. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS '24), pages 2874–2888, New York, NY, USA, 2024. Association for Computing Machinery.

  8. [8] Stanislav Fort, Jie Ren, and Balaji Lakshminarayanan. Exploring the limits of out-of-distribution detection. Advances in Neural Information Processing Systems, 34:7068–7081, 2021.

  9. [9] Alan Frieze and Michał Karoński. Introduction to Random Graphs. Cambridge University Press, 2015.

  10. [10] Shangwei Guo, Tianwei Zhang, Han Yu, Xiaofei Xie, Lei Ma, Tao Xiang, and Yang Liu. Byzantine-resilient decentralized stochastic gradient descent. IEEE Transactions on Circuits and Systems for Video Technology, 32(6):4096–4106, 2021.

  11. [11] Nirupam Gupta, Thinh T. Doan, and Nitin H. Vaidya. Byzantine fault-tolerance in decentralized optimization under 2f-redundancy. In 2021 American Control Conference (ACC), pages 3632–3637. IEEE, 2021.

  12. [12] Lie He, Sai Praneeth Karimireddy, and Martin Jaggi. Byzantine-robust decentralized learning via ClippedGossip. arXiv preprint arXiv:2202.01545, 2022.

  13. [13] Peter J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1):73–101, 1964.

  14. [14] Bhavya Kailkhura, Swastik Brahma, and Pramod K. Varshney. Data falsification attacks on consensus-based detection systems. IEEE Transactions on Signal and Information Processing over Networks, 3(1):145–158, 2016.

  15. [15] Bhavya Kailkhura, Priyadip Ray, Deepak Rajan, Anton Yen, Peter Barnes, and Ryan Goldhahn. Byzantine-resilient locally optimum detection using collaborative autonomous networks. In 2017 IEEE 7th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 1–5. IEEE, 2017.

  16. [16] Kananart Kuwaranancharoen and Shreyas Sundaram. On the geometric convergence of Byzantine-resilient distributed optimization algorithms. SIAM Journal on Optimization, 35(1):210–239, 2025.

  17. [17] Kananart Kuwaranancharoen, Lei Xin, and Shreyas Sundaram. Byzantine-resilient distributed optimization of multi-dimensional functions. In 2020 American Control Conference (ACC), pages 4399–4404. IEEE, 2020.

  18. [18] Kananart Kuwaranancharoen, Lei Xin, and Shreyas Sundaram. Scalable distributed optimization of multi-dimensional functions despite Byzantine adversaries. IEEE Transactions on Signal and Information Processing over Networks, 2024.

  19. [19] Kevin A. Lai, Anup B. Rao, and Santosh Vempala. Agnostic estimation of mean and covariance. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 665–674. IEEE, 2016.

  20. [20] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

  21. [21] Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, and Ji Liu. Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. Advances in Neural Information Processing Systems, 30, 2017.

  22. [22] Van Sy Mai and Eyad H. Abed. Distributed optimization over weighted directed graphs using row stochastic matrix. In 2016 American Control Conference (ACC), pages 7165–7170. IEEE, 2016.

  23. [23] Jie Peng, Weiyu Li, and Qing Ling. Byzantine-robust decentralized stochastic optimization over static and time-varying networks. Signal Processing, 183:108020, 2021.

  24. [24] Jie Peng, Weiyu Li, and Qing Ling. Byzantine-robust decentralized stochastic optimization with stochastic gradient noise-independent learning error. arXiv preprint arXiv:2308.05292, 2023.

  25. [25] Jie Peng and Qing Ling. Byzantine-robust decentralized stochastic optimization. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5935–5939. IEEE, 2020.

  26. [26] V. V. Petrov. On probabilities of moderate deviations. Journal of Mathematical Sciences, 109(6):2189–2191, 2002.

  27. [27] S. Unnikrishna Pillai, Torsten Suel, and Seunghun Cha. The Perron–Frobenius theorem: some of its applications. IEEE Signal Processing Magazine, 22(2):62–75, 2005.

  28. [28] Chengde Qian, Mengyuan Wang, Haojie Ren, and Changliang Zou. ByMI: Byzantine machine identification with false discovery rate control. In Forty-first International Conference on Machine Learning, 2024.

  29. [29] Nikhil Ravi and Anna Scaglione. Detection and isolation of adversaries in decentralized optimization for non-strongly convex objectives. IFAC-PapersOnLine, 52(20):381–386, 2019.

  30. [30] Yang Shen and Shaofu Yang. A heavy-ball distributed optimization algorithm over digraphs with row-stochastic matrices. In 2020 39th Chinese Control Conference (CCC), pages 4977–4982. IEEE, 2020.

  31. [31] Lili Su and Jiaming Xu. Securing distributed gradient descent in high dimensional statistical learning. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(1):1–41, 2019.

  32. [32] Zhaoxian Wu, Tianyi Chen, and Qing Ling. Byzantine-resilient decentralized stochastic optimization with robust aggregation rules. IEEE Transactions on Signal Processing, 71:3179–3195, 2023.

  33. [33] Chenguang Xi, Van Sy Mai, Ran Xin, Eyad H. Abed, and Usman A. Khan. Linear convergence in optimization over directed graphs with row-stochastic matrices. IEEE Transactions on Automatic Control, 63(10):3558–3565, 2018.

  34. [34] Qi Xia, Zeyi Tao, Zijiang Hao, and Qun Li. FABA: An algorithm for fast aggregation against Byzantine attacks in distributed neural networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), pages 4824–4830, 2019.

  35. [35] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.

  36. [36] Lin Xiao, Stephen Boyd, and Sanjay Lall. Distributed average consensus with time-varying Metropolis weights. Automatica, 1, 2006.

  37. [37] Cong Xie, Oluwasanmi Koyejo, and Indranil Gupta. Fall of empires: Breaking Byzantine-tolerant SGD by inner product manipulation. In Uncertainty in Artificial Intelligence, pages 261–270. PMLR, 2020.

  38. [38] Wei Xu, Zhengqing Li, and Qing Ling. Robust decentralized dynamic optimization at presence of malfunctioning agents. Signal Processing, 153:24–33, 2018.

  39. [39] Caiyi Yang and Javad Ghaderi. Byzantine-robust decentralized learning via remove-then-clip aggregation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19):21735–21743, March 2024.

  40. [40] Zhixiong Yang and Waheed U. Bajwa. ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning. IEEE Transactions on Signal and Information Processing over Networks, 5(4):611–627, 2019.

  41. [41] Dong Yin, Yudong Chen, Ramchandran Kannan, and Peter Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In International Conference on Machine Learning, pages 5650–5659. PMLR, 2018.

  42. [42] Kun Yuan, Xinmeng Huang, Yiming Chen, Xiaohan Zhang, Yingya Zhang, and Pan Pan. Revisiting optimal convergence rate for smooth and non-convex stochastic decentralized optimization. Advances in Neural Information Processing Systems, 35:36382–36395, 2022.

  43. [43] Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt. Robust estimation via generalized quasi-gradients. Information and Inference: A Journal of the IMA, 11(2):581–636, 2022.

  44. [44] Banghua Zhu, Lun Wang, Qi Pang, Shuai Wang, Jiantao Jiao, Dawn Song, and Michael I. Jordan. Byzantine-robust federated learning with optimal statistical rates. In International Conference on Artificial Intelligence and Statistics, pages 3151–3178. PMLR, 2023.