Toward Exact Convergence in Byzantine-Robust Decentralized Learning: A Statistical Identification Approach
Pith reviewed 2026-05-10 16:35 UTC · model grok-4.3
The pith
Identifying Byzantine machines with sample-splitting statistics allows decentralized learning to achieve exact convergence at the optimal rate.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a detect-then-optimize pipeline with precise statistical identification of Byzantine machines recovers connectivity among honest nodes, enabling DRSGD-ByMI to match the convergence rate of standard decentralized first-order methods even when malicious machines are present.
What carries the argument
The p-value-free detection procedure using sample-splitting score statistics that prunes malicious nodes while preserving network connectivity for subsequent optimization.
If this is right
- The decentralized network recovers sufficient connectivity among normal nodes after pruning.
- DRSGD-ByMI achieves the same order-optimal convergence rate as standard decentralized stochastic first-order methods.
- The identification mechanism controls the false discovery rate without restrictive distributional assumptions.
- Numerical experiments confirm the theoretical convergence results for robust decentralized learning.
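The detection step of the detect-then-optimize pipeline can be sketched in a few lines. This is a hypothetical simplification, not the paper's actual ByMI construction: we assume each machine has a scalar score that is roughly symmetric about zero for honest machines and large and positive for Byzantine ones, and use a knockoff-style symmetric threshold to emulate p-value-free FDR control at level `alpha`.

```python
def fdr_threshold(scores, alpha=0.2):
    """Pick the smallest t whose estimated false-discovery proportion
    (#{score <= -t} + 1) / #{score >= t} is at most alpha.
    Honest scores are assumed symmetric about zero, so the negative tail
    estimates the number of honest machines above t (no p-values needed)."""
    for t in sorted(abs(s) for s in scores if s != 0):
        neg = sum(1 for s in scores if s <= -t)
        pos = sum(1 for s in scores if s >= t)
        if pos > 0 and (neg + 1) / pos <= alpha:
            return t
    return float("inf")  # no threshold controls FDR: flag nothing

def detect_byzantine(scores, alpha=0.2):
    """Return indices of machines flagged as Byzantine."""
    t = fdr_threshold(scores, alpha)
    return [i for i, s in enumerate(scores) if s >= t]
```

A clearly separated positive cluster gets flagged while scores near zero are left alone; the `+1` correction makes the rule conservative at small network sizes, as knockoff-type thresholds are.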
Where Pith is reading between the lines
- This suggests that identification-based methods could outperform aggregation rules in other distributed settings where graph connectivity matters.
- Applying the detection repeatedly might allow handling of time-varying or adaptive Byzantine behaviors.
- The framework implies a general principle that removing identified outliers early can avoid bias accumulation in iterative optimization.
Load-bearing premise
The statistical test can correctly find and remove malicious machines while keeping the honest ones connected enough to optimize together.
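A toy check makes the premise concrete (illustrative code, not from the paper): on a 6-node ring, pruning one node leaves the rest connected, but pruning two well-placed nodes splits the honest machines into two components that can no longer reach consensus.

```python
def connected_after_pruning(adjacency, removed):
    """Breadth-first check that the surviving nodes form one connected
    component after the identified set `removed` is pruned.
    adjacency: dict mapping node -> set of neighbour nodes."""
    keep = [v for v in adjacency if v not in removed]
    if not keep:
        return False
    seen, stack = {keep[0]}, [keep[0]]
    while stack:
        v = stack.pop()
        for w in adjacency[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(keep)

# 6-node ring: each node i talks to its two neighbours
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
```

Pruning `{3}` leaves a connected path; pruning `{1, 4}` cuts the ring into two pairs, which is exactly the failure mode the premise rules out.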
What would settle it
A network simulation with known malicious nodes in which the identification procedure either fails to control false discoveries or leaves the honest subgraph too disconnected, causing the observed convergence rate to fall short of the claimed order-optimal O(1/√T) level.
Original abstract
To defend against Byzantine attacks in decentralized learning, most existing methods rely on robust aggregation rules to mitigate the influence of malicious machines. However, these strategies inherently introduce bias, leading to inexact convergence with non-vanishing steady-state errors. In this paper, we propose a strategic shift from passive aggregation to active identification by introducing the Decentralized Rescaled Stochastic Gradient Descent with Byzantine Machine Identification (DRSGD-ByMI) framework. The core of our approach is an identification-based ``detect-then-optimize'' pipeline, where a p-value-free detection procedure is developed to accurately prune malicious nodes from the network. By leveraging sample-splitting score statistics, this identification mechanism achieves false discovery rate control without requiring restrictive distributional assumptions. We theoretically demonstrate that this precise identification allows the decentralized network to recover sufficient connectivity among the normal nodes, thereby enabling DRSGD-ByMI to match, even in the presence of Byzantine machines, the same order-optimal convergence rate as standard decentralized stochastic first-order methods. Numerical experiments validate our theoretical results and demonstrate the effectiveness of DRSGD-ByMI for decentralized robust learning problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the DRSGD-ByMI framework for Byzantine-robust decentralized learning. It introduces a detect-then-optimize pipeline that first applies a p-value-free detection procedure based on sample-splitting score statistics to prune malicious nodes while controlling the false discovery rate (FDR), then runs decentralized rescaled SGD on the remaining network. The central theoretical claim is that this precise identification recovers sufficient connectivity among the normal nodes, enabling the method to achieve the same order-optimal convergence rate O(1/√T) as standard decentralized stochastic first-order methods, even with Byzantine machines present. The abstract states that numerical experiments validate the theory and demonstrate effectiveness.
Significance. If the connectivity preservation argument holds with high probability, the work would be significant for shifting Byzantine defense from biased robust aggregation to active statistical identification, thereby achieving unbiased exact convergence at optimal rates. The p-value-free FDR control without strong distributional assumptions is a notable technical feature, and the overall approach addresses a key limitation in existing decentralized robust methods.
major comments (1)
- [Theoretical analysis of connectivity recovery] The theoretical demonstration (referenced in the abstract as showing that 'precise identification allows the decentralized network to recover sufficient connectivity among the normal nodes') relies on FDR control of the sample-splitting score test to guarantee that the induced subgraph on normal nodes retains a positive spectral gap or bounded mixing time. However, FDR only controls the expected number of false discoveries and does not provide a high-probability bound preventing the removal of a critical cut-set of normal nodes that could disconnect the graph or drive algebraic connectivity to zero. This gap directly undermines the subsequent claim of recovering the exact O(1/√T) rate; an explicit high-probability connectivity guarantee (e.g., via minimum-degree assumptions on the original normal subgraph or concentration on the pruned graph's Cheeger constant) is required in the main theorem.
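The quantity at stake here can be audited numerically: the algebraic connectivity λ₂ (Fiedler value) of the pruned honest subgraph is positive if and only if that subgraph is connected, and the referee's worry is precisely that FDR control alone does not keep λ₂ bounded away from zero. A dependency-free sketch of how one might measure it (our illustration, not the paper's code), using power iteration on c·I − L projected away from the all-ones vector:

```python
def algebraic_connectivity(adjacency, iters=1000):
    """Estimate the Fiedler value lambda_2 of the graph Laplacian L = D - A.
    Power iteration on c*I - L, re-projected each step onto the complement
    of the all-ones vector, converges to the eigenvalue c - lambda_2."""
    nodes = sorted(adjacency)
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    deg = {v: len(adjacency[v]) for v in nodes}
    c = 2.0 * max(deg.values())  # c >= lambda_max(L), so c*I - L is PSD

    def apply(x):  # y = (c*I - L) x
        return [c * x[idx[v]] - deg[v] * x[idx[v]]
                + sum(x[idx[w]] for w in adjacency[v]) for v in nodes]

    x = [((37 * i + 11) % 19) - 9.0 for i in range(n)]  # fixed start vector
    for _ in range(iters):
        m = sum(x) / n
        x = [xi - m for xi in x]          # project out the constant vector
        y = apply(x)
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0:
            return 0.0
        x = [v / norm for v in y]
    m = sum(x) / n
    x = [xi - m for xi in x]
    y = apply(x)
    num = sum(a * b for a, b in zip(x, y))  # Rayleigh quotient = c - lambda_2
    den = sum(a * a for a in x)
    return c - num / den
```

On a 4-node ring this returns 2 (the exact λ₂), on a 3-node path 1, and on two disconnected edges 0, matching the known Laplacian spectra of those graphs.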
minor comments (2)
- [Abstract] The abstract would benefit from explicitly listing the network topology assumptions (e.g., initial connectivity of normal nodes) and the maximum number of Byzantine nodes tolerated relative to total nodes.
- [Numerical experiments] The experimental section should include more details on the specific network topologies tested, the fraction of Byzantine nodes, and quantitative metrics for post-pruning connectivity (e.g., algebraic connectivity values).
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work's significance and for the detailed major comment. We address the concern regarding the high-probability connectivity guarantee below and will revise the manuscript to close this gap.
Point-by-point responses
-
Referee: [Theoretical analysis of connectivity recovery] The theoretical demonstration (referenced in the abstract as showing that 'precise identification allows the decentralized network to recover sufficient connectivity among the normal nodes') relies on FDR control of the sample-splitting score test to guarantee that the induced subgraph on normal nodes retains a positive spectral gap or bounded mixing time. However, FDR only controls the expected number of false discoveries and does not provide a high-probability bound preventing the removal of a critical cut-set of normal nodes that could disconnect the graph or drive algebraic connectivity to zero. This gap directly undermines the subsequent claim of recovering the exact O(1/√T) rate; an explicit high-probability connectivity guarantee (e.g., via minimum-degree assumptions on the original normal subgraph or concentration on the pruned graph's Cheeger constant) is required in the main theorem.
Authors: We agree that FDR control alone yields an expectation bound and does not automatically deliver a high-probability guarantee against disconnection. In the revision we will strengthen the analysis by adding a concentration argument (via Bernstein-type bounds on the sample-splitting score statistics) showing that the number of false discoveries among normal nodes is O(log n) with probability 1-δ. Combined with an explicit minimum-degree assumption on the normal subgraph (which we will state clearly), this implies that the algebraic connectivity of the pruned normal graph remains bounded below by a positive constant with high probability. The main theorem and its proof will be updated to reflect this, and the abstract will be revised for precision. revision: yes
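The promised concentration step might look as follows (our reading of the rebuttal, not the authors' actual proof). Let $F=\sum_{i\in\mathcal{H}} Z_i$ count false discoveries among the $n_h$ honest machines, with indicators $Z_i\in\{0,1\}$ and $\mathbb{E}[Z_i]\le p$. Under an independence assumption that the sample-splitting construction would need to justify, Bernstein's inequality gives

```latex
\Pr\bigl(F \ge \mathbb{E}[F] + t\bigr)
  \le \exp\!\left(-\frac{t^2/2}{n_h\, p(1-p) + t/3}\right),
```

so when $n_h p = O(\log n)$, taking $t \asymp \log n$ yields $F = O(\log n)$ with probability $1 - n^{-c}$. A vertex-connectivity assumption $\kappa = \Omega(\log n)$ on the honest subgraph would then guarantee that removing these $O(\log n)$ honest nodes cannot disconnect it, which is the high-probability statement the referee asks for.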
Circularity Check
No significant circularity; derivation is logically prior and self-contained
full rationale
The paper's core chain is a detect-then-optimize pipeline: sample-splitting score statistics achieve FDR-controlled identification of Byzantine nodes (without distributional assumptions), which is then used to prune the graph and recover sufficient connectivity among normal nodes for DRSGD-ByMI to attain the standard O(1/√T) rate of decentralized SGD. This identification step is presented as logically prior to the rate analysis, with no equations or claims showing the convergence rate being fitted to data, redefined in terms of itself, or reduced by construction to the detection procedure. No self-citations are load-bearing for the central theorem, and the structure does not rename a known result or smuggle an ansatz. The derivation therefore remains independent of its inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption After pruning identified Byzantine nodes, the subgraph induced by normal nodes retains sufficient connectivity for decentralized SGD to achieve its standard convergence rate.
- domain assumption The sample-splitting score statistics admit false-discovery-rate control without restrictive distributional assumptions on the data or gradients.
invented entities (1)
- DRSGD-ByMI framework and its p-value-free detection procedure (no independent evidence)
Reference graph
Works this paper leans on
- [1] Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, and Blake Woodworth. Lower bounds for non-convex stochastic optimization. Mathematical Programming, 199(1):165–214, 2023.
- [2] Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. Advances in Neural Information Processing Systems, 30, 2017.
- [3] Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Being robust (in high dimensions) can be practical. In International Conference on Machine Learning, pages 999–1008. PMLR, 2017.
- [4] Ilias Diakonikolas, Daniel M. Kane, and Ankit Pensia. Outlier robust mean estimation with subgaussian rates via stability. Advances in Neural Information Processing Systems, 33:1830–1840, 2020.
- [5] Paul Erdős. Graph theory and probability. Canadian Journal of Mathematics, 11:34–38, 1959.
- [6] Cheng Fang, Zhixiong Yang, and Waheed U. Bajwa. BRIDGE: Byzantine-resilient decentralized gradient descent. IEEE Transactions on Signal and Information Processing over Networks, 8:610–626, 2022.
- [7] Minghong Fang, Zifan Zhang, Hairi, Prashant Khanduri, Jia Liu, Songtao Lu, Yuchen Liu, and Neil Gong. Byzantine-robust decentralized federated learning. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, CCS '24, pages 2874–2888, New York, NY, USA, 2024. Association for Computing Machinery.
- [8] Stanislav Fort, Jie Ren, and Balaji Lakshminarayanan. Exploring the limits of out-of-distribution detection. Advances in Neural Information Processing Systems, 34:7068–7081, 2021.
- [9] Alan Frieze and Michał Karoński. Introduction to Random Graphs. Cambridge University Press, 2015.
- [10] Shangwei Guo, Tianwei Zhang, Han Yu, Xiaofei Xie, Lei Ma, Tao Xiang, and Yang Liu. Byzantine-resilient decentralized stochastic gradient descent. IEEE Transactions on Circuits and Systems for Video Technology, 32(6):4096–4106, 2021.
- [11] Nirupam Gupta, Thinh T. Doan, and Nitin H. Vaidya. Byzantine fault-tolerance in decentralized optimization under 2f-redundancy. In 2021 American Control Conference (ACC), pages 3632–3637. IEEE, 2021.
- [12] Lie He, Sai Praneeth Karimireddy, and Martin Jaggi. Byzantine-robust decentralized learning via clipped gossip. arXiv preprint arXiv:2202.01545, 2022.
- [13] Peter J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1):73–101, 1964.
- [14] Bhavya Kailkhura, Swastik Brahma, and Pramod K. Varshney. Data falsification attacks on consensus-based detection systems. IEEE Transactions on Signal and Information Processing over Networks, 3(1):145–158, 2016.
- [15] Bhavya Kailkhura, Priyadip Ray, Deepak Rajan, Anton Yen, Peter Barnes, and Ryan Goldhahn. Byzantine-resilient locally optimum detection using collaborative autonomous networks. In 2017 IEEE 7th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 1–5. IEEE, 2017.
- [16] Kananart Kuwaranancharoen and Shreyas Sundaram. On the geometric convergence of byzantine-resilient distributed optimization algorithms. SIAM Journal on Optimization, 35(1):210–239, 2025.
- [17] Kananart Kuwaranancharoen, Lei Xin, and Shreyas Sundaram. Byzantine-resilient distributed optimization of multi-dimensional functions. In 2020 American Control Conference (ACC), pages 4399–4404. IEEE, 2020.
- [18] Kananart Kuwaranancharoen, Lei Xin, and Shreyas Sundaram. Scalable distributed optimization of multi-dimensional functions despite byzantine adversaries. IEEE Transactions on Signal and Information Processing over Networks, 2024.
- [19] Kevin A. Lai, Anup B. Rao, and Santosh Vempala. Agnostic estimation of mean and covariance. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 665–674. IEEE, 2016.
- [20] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- [21] Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, and Ji Liu. Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. Advances in Neural Information Processing Systems, 30, 2017.
- [22] Van Sy Mai and Eyad H. Abed. Distributed optimization over weighted directed graphs using row stochastic matrix. In 2016 American Control Conference (ACC), pages 7165–7170. IEEE, 2016.
- [23] Jie Peng, Weiyu Li, and Qing Ling. Byzantine-robust decentralized stochastic optimization over static and time-varying networks. Signal Processing, 183:108020, 2021.
- [24] Jie Peng, Weiyu Li, and Qing Ling. Byzantine-robust decentralized stochastic optimization with stochastic gradient noise-independent learning error. arXiv preprint arXiv:2308.05292, 2023.
- [25] Jie Peng and Qing Ling. Byzantine-robust decentralized stochastic optimization. In ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5935–5939. IEEE, 2020.
- [26] V. V. Petrov. On probabilities of moderate deviations. Journal of Mathematical Sciences, 109(6):2189–2191, 2002.
- [27] S. Unnikrishna Pillai, Torsten Suel, and Seunghun Cha. The Perron–Frobenius theorem: Some of its applications. IEEE Signal Processing Magazine, 22(2):62–75, 2005.
- [28] Chengde Qian, Mengyuan Wang, Haojie Ren, and Changliang Zou. ByMI: Byzantine machine identification with false discovery rate control. In Forty-first International Conference on Machine Learning, 2024.
- [29] Nikhil Ravi and Anna Scaglione. Detection and isolation of adversaries in decentralized optimization for non-strongly convex objectives. IFAC-PapersOnLine, 52(20):381–386, 2019.
- [30] Yang Shen and Shaofu Yang. A heavy-ball distributed optimization algorithm over digraphs with row-stochastic matrices. In 2020 39th Chinese Control Conference (CCC), pages 4977–4982. IEEE, 2020.
- [31] Lili Su and Jiaming Xu. Securing distributed gradient descent in high dimensional statistical learning. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(1):1–41, 2019.
- [32] Zhaoxian Wu, Tianyi Chen, and Qing Ling. Byzantine-resilient decentralized stochastic optimization with robust aggregation rules. IEEE Transactions on Signal Processing, 71:3179–3195, 2023.
- [33] Chenguang Xi, Van Sy Mai, Ran Xin, Eyad H. Abed, and Usman A. Khan. Linear convergence in optimization over directed graphs with row-stochastic matrices. IEEE Transactions on Automatic Control, 63(10):3558–3565, 2018.
- [34] Qi Xia, Zeyi Tao, Zijiang Hao, and Qun Li. FABA: An algorithm for fast aggregation against byzantine attacks in distributed neural networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 4824–4830. International Joint Conferences on Artificial Intelligence Organization, July 2019.
- [35] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
- [36] Lin Xiao, Stephen Boyd, and Sanjay Lall. Distributed average consensus with time-varying Metropolis weights. Automatica, 1, 2006.
- [37] Cong Xie, Oluwasanmi Koyejo, and Indranil Gupta. Fall of empires: Breaking byzantine-tolerant SGD by inner product manipulation. In Uncertainty in Artificial Intelligence, pages 261–270. PMLR, 2020.
- [38] Wei Xu, Zhengqing Li, and Qing Ling. Robust decentralized dynamic optimization at presence of malfunctioning agents. Signal Processing, 153:24–33, 2018.
- [39] Caiyi Yang and Javad Ghaderi. Byzantine-robust decentralized learning via remove-then-clip aggregation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19):21735–21743, March 2024.
- [40] Zhixiong Yang and Waheed U. Bajwa. ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning. IEEE Transactions on Signal and Information Processing over Networks, 5(4):611–627, 2019.
- [41] Dong Yin, Yudong Chen, Ramchandran Kannan, and Peter Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In International Conference on Machine Learning, pages 5650–5659. PMLR, 2018.
- [42] Kun Yuan, Xinmeng Huang, Yiming Chen, Xiaohan Zhang, Yingya Zhang, and Pan Pan. Revisiting optimal convergence rate for smooth and non-convex stochastic decentralized optimization. Advances in Neural Information Processing Systems, 35:36382–36395, 2022.
- [43] Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt. Robust estimation via generalized quasi-gradients. Information and Inference: A Journal of the IMA, 11(2):581–636, 2022.
- [44] Banghua Zhu, Lun Wang, Qi Pang, Shuai Wang, Jiantao Jiao, Dawn Song, and Michael I. Jordan. Byzantine-robust federated learning with optimal statistical rates. In International Conference on Artificial Intelligence and Statistics, pages 3151–3178. PMLR, 2023.