
arxiv: 2509.17398 · v2 · submitted 2025-09-22 · 💻 cs.NI

Optimizing Split Federated Learning with Unstable Client Participation

Pith reviewed 2026-05-18 15:19 UTC · model grok-4.3

classification 💻 cs.NI
keywords split federated learning · unstable client participation · convergence upper bound · client sampling · model splitting · failure-aware optimization · edge AI training

The pith

The paper speeds up split federated learning under unstable client participation by deriving a convergence bound that accounts for upload, download, and aggregation failures, then jointly optimizing client sampling and the model split point to minimize that bound.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Split federated learning distributes computation between edge devices and a server to train large models without sending raw data. Real networks introduce instability through activation upload failures, gradient download failures, and model aggregation failures, which earlier methods ignored by assuming every client participates perfectly. The paper derives the first convergence upper bound that incorporates these three failure types. It then formulates a joint optimization problem over client sampling decisions and model split points to minimize the bound, and supplies an efficient algorithm that solves the problem to optimality. Simulations on EMNIST and CIFAR-10 show the resulting training outperforms prior benchmarks that do not model failures.

Core claim

The central claim is that the first convergence upper bound for split federated learning under unstable participation is obtained by explicitly modeling activation uploading failures, gradient downloading failures, and model aggregation failures; minimizing this bound through joint optimization of which clients to sample and where to split the model produces training that converges reliably despite participation instability.

What carries the argument

The convergence upper bound, which combines the effects of activation upload failures, gradient download failures, and aggregation failures, serves as the objective minimized over the joint client-sampling and model-splitting decision variables.
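
To make the three failure modes concrete, here is a minimal Python sketch of how one sampled client's round could be modeled. It assumes the three failures are independent Bernoulli events checked in sequence; that independence assumption, the function names, and the example probabilities (0.1, 0.2, 0.05) are illustrative choices and are not taken from the paper.

```python
import random


def success_probability(p_up, p_down, p_agg):
    """Chance that a sampled client completes a round.

    Assumes (hypothetically) that upload, download, and aggregation
    failures are independent of one another.
    """
    return (1.0 - p_up) * (1.0 - p_down) * (1.0 - p_agg)


def simulate_round(p_up, p_down, p_agg, rng):
    """Return the first stage that fails for one client, or 'ok'."""
    if rng.random() < p_up:
        return "upload_failed"        # activations never reach the server
    if rng.random() < p_down:
        return "download_failed"      # gradients never reach the client
    if rng.random() < p_agg:
        return "aggregation_failed"   # client-side model misses aggregation
    return "ok"


# Illustrative probabilities only; not values reported in the paper.
rng = random.Random(0)
trials = 100_000
completed = sum(simulate_round(0.1, 0.2, 0.05, rng) == "ok" for _ in range(trials))
empirical = completed / trials
expected = success_probability(0.1, 0.2, 0.05)
print(f"empirical={empirical:.3f} expected={expected:.3f}")
```

Under this simplified model, the fraction of completed rounds should approach the product of the three per-stage success probabilities, which is one way to see why a client with a single unreliable stage can still contribute little to training.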

If this is right

  • Training loss decreases at a rate bounded by the minimized expression even when clients drop activations or gradients.
  • The optimal sampling policy automatically favors clients whose reliability and compute capacity best match the chosen split point.
  • Communication overhead and computation load are traded off explicitly rather than assumed perfect.
  • The efficient solver returns the exact minimum of the bound without exhaustive search over all possible splits and subsets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same failure-aware bound could be used to set dynamic participation thresholds in non-split federated learning.
  • Real-time estimation of the three failure rates from network logs would let the optimizer adapt splits on the fly.
  • Extending the bound to heterogeneous device capabilities would reveal whether split points should also depend on per-client compute variance.
  • The framework indicates that ignoring participation failures systematically underestimates the communication budget needed for target accuracy.
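
The second extension above (estimating the three failure rates from network logs) could be prototyped along the following lines. This is only a sketch: the exponential moving average, the class name `FailureRateEstimator`, the stage labels, and the log format are all assumptions made for illustration, not anything described in the paper.

```python
class FailureRateEstimator:
    """Tracks per-stage failure rates with an exponential moving average.

    Hypothetical helper: stage names and the smoothing factor `alpha`
    are illustrative assumptions.
    """

    STAGES = ("upload", "download", "aggregation")

    def __init__(self, alpha=0.1, prior=0.5):
        self.alpha = alpha
        # Start every stage at the same prior failure rate.
        self.rates = {stage: prior for stage in self.STAGES}

    def update(self, stage, failed):
        """Blend one observed outcome (True = failure) into the estimate."""
        observation = 1.0 if failed else 0.0
        old = self.rates[stage]
        self.rates[stage] = (1.0 - self.alpha) * old + self.alpha * observation
        return self.rates[stage]


# Made-up log entries: (stage, failed?) pairs for one client.
log = [("upload", False), ("upload", True), ("download", False), ("aggregation", False)]
estimator = FailureRateEstimator(alpha=0.1, prior=0.5)
for stage, failed in log:
    estimator.update(stage, failed)
print(estimator.rates)
```

A larger `alpha` reacts faster to changing network conditions but gives noisier estimates; whether such estimates are accurate enough to re-optimize split points online is exactly the open question this bullet raises.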

Load-bearing premise

The three failure probabilities used in the bound accurately describe the actual participation behavior during training.

What would settle it

Measure the observed convergence rate of the optimized split federated learning run on a testbed that enforces the modeled failure probabilities and check whether the measured rate stays below the derived upper bound.

Figures

Figures reproduced from arXiv: 2509.17398 by Dusit Niyato, Hongyang Du, Wei Wei, Xianhao Chen, Xihui Liu, Zheng Lin.

Figure 1: (a) SFL scenario exists uploading, downloading and aggregation
Figure 2: Split federated learning with unstable clients. The main challenge for tackling unstable client participation lies in that failure can occur at different
Figure 3: Evaluating SFL training performance on CIFAR-10 and EMNIST using ResNet-50 with unstable clients under different model splitting and client
Figure 4: The impact of uploading failure p_i, downloading failure φ_i, aggregation failure a_i, and estimation errors on these parameters on test accuracy. coefficients of variation (CV) into the modeled drop probabilities [60], [61]. As shown in
Figure 5: The impact of number of selected clients
Figure 6: Ablation experiments for model splitting scheme on the CIFAR-10
Original abstract

To enable training of large artificial intelligence (AI) models at the network edge, split federated learning (SFL) has emerged as a promising approach by distributing computation between edge devices and a server. However, while unstable network environments pose significant challenges to SFL, prior schemes often overlook such an effect by assuming perfect client participation, rendering them impractical for real-world scenarios. In this work, we develop an optimization framework for SFL with unstable client participation. We theoretically derive the first convergence upper bound for SFL with unstable client participation by considering activation uploading failures, gradient downloading failures, and model aggregation failures. Based on the theoretical results, we formulate a joint optimization problem for client sampling and model splitting to minimize the upper bound. We then develop an efficient solution approach to solve the problem optimally. Extensive simulations on EMNIST and CIFAR-10 demonstrate the superiority of our proposed framework compared to existing benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to derive the first convergence upper bound for split federated learning (SFL) incorporating activation uploading failures, gradient downloading failures, and model aggregation failures due to unstable client participation. It then sets up a joint optimization problem over client sampling probabilities and the model split point to minimize this upper bound, develops an efficient optimal solver for the problem, and shows through extensive simulations on EMNIST and CIFAR-10 that the resulting framework outperforms existing benchmarks.

Significance. This work tackles a relevant practical challenge in deploying SFL at the edge under realistic network instability. Deriving a convergence bound that explicitly models the three failure modes and using it to guide joint sampling and splitting decisions is a reasonable extension of existing FL theory. The provision of an efficient solver and positive simulation results on standard datasets are strengths. However, the overall significance depends on how well the bound serves as a proxy for real-world performance improvements, which requires further substantiation.

major comments (2)
  1. [§4, Theorem 1] The upper bound is derived by folding the three failure probabilities into the standard smoothness-based contraction term as multiplicative factors. Because the joint optimization objective is defined directly as minimization of this bound, the lack of any reported check (e.g., correlation between bound value and observed per-round loss decrease across the simulated failure regimes) makes it impossible to confirm that the minimizer of the bound is the configuration that actually accelerates convergence.
  2. [§6, Figures 3–4 and Table 2] Superior final accuracy and wall-clock convergence are reported versus benchmarks. However, no comparison is given against alternative sampling/splitting pairs that produce strictly higher bound values; without this, the simulations do not rule out the possibility that a different configuration (with a looser bound) would have performed equally well or better empirically.
minor comments (2)
  1. [§2] A short paragraph contrasting the three explicit failure modes modeled here with the partial-participation assumptions in prior SFL papers would help readers locate the novelty.
  2. [Notation table] The relationship between the cut-layer index, per-client compute load, and communication volume is stated in text but would be clearer if summarized in a single equation or small table.
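
The relationship raised in the last minor comment (cut-layer index versus client compute load and communication volume) can be illustrated with a small Python sketch. The layer profile below is made up, and the cost model is a simplifying assumption: the client runs every layer before the cut, and each sample costs one activation upload plus one gradient download of the same size at the cut layer. None of the names or numbers come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    flops: float            # forward-pass cost of this layer (arbitrary units)
    activation_bytes: int   # size of this layer's output per sample


def split_costs(layers, cut):
    """Costs when the first `cut` layers run on the client.

    Returns (client_flops, server_flops, comm_bytes_per_sample).
    Assumes the gradient sent back has the same size as the activation
    sent up, which holds because both share the cut-layer output shape.
    """
    if not 1 <= cut < len(layers):
        raise ValueError("cut must leave at least one layer on each side")
    client_flops = sum(layer.flops for layer in layers[:cut])
    server_flops = sum(layer.flops for layer in layers[cut:])
    comm_bytes = 2 * layers[cut - 1].activation_bytes
    return client_flops, server_flops, comm_bytes


# Hypothetical three-layer profile, for illustration only.
layers = [Layer("conv1", 2.0, 100), Layer("conv2", 3.0, 50), Layer("fc", 1.0, 10)]
for cut in range(1, len(layers)):
    print(cut, split_costs(layers, cut))
```

In this toy profile a deeper cut raises client compute while shrinking the per-sample traffic, which is the kind of trade-off a single summary table in the paper would make explicit.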

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the presentation of our theoretical and empirical results.

  1. Referee: [§4, Theorem 1] The upper bound is derived by folding the three failure probabilities into the standard smoothness-based contraction term as multiplicative factors. Because the joint optimization objective is defined directly as minimization of this bound, the lack of any reported check (e.g., correlation between bound value and observed per-round loss decrease across the simulated failure regimes) makes it impossible to confirm that the minimizer of the bound is the configuration that actually accelerates convergence.

    Authors: We appreciate the referee's point regarding the validation of the bound as a proxy for convergence performance. The upper bound in Theorem 1 is obtained by extending standard smoothness-based analysis to account for the three types of failures through multiplicative factors on the contraction term. Although an explicit correlation analysis was not included in the original manuscript, our extensive simulations on EMNIST and CIFAR-10 show consistent improvements in both final accuracy and wall-clock time when using the bound-minimizing configurations. To provide direct evidence, we will revise the manuscript to include a new figure or table that plots or tabulates the relationship between the bound value and the observed per-round loss reduction for various failure probabilities and configurations. This addition will help confirm that lower bound values correspond to faster empirical convergence. revision: yes

  2. Referee: [§6, Figures 3–4 and Table 2] Superior final accuracy and wall-clock convergence are reported versus benchmarks. However, no comparison is given against alternative sampling/splitting pairs that produce strictly higher bound values; without this, the simulations do not rule out the possibility that a different configuration (with a looser bound) would have performed equally well or better empirically.

    Authors: We agree that comparing against configurations with higher bound values would provide stronger evidence for the effectiveness of bound minimization. In the current simulations, we compare against existing benchmarks from the literature, which do not perform joint optimization of sampling and splitting. In the revised version, we will add ablation experiments that include alternative sampling probabilities and split points yielding higher values of the convergence bound. We will demonstrate that these alternatives lead to inferior performance in terms of convergence speed and accuracy under the same failure regimes, thereby supporting that the minimizer of the bound indeed yields superior empirical results. revision: yes
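
The check requested by the referee and promised in both responses (does a lower bound value go with faster empirical convergence across configurations?) could be run as a rank correlation. The sketch below implements Spearman's coefficient in plain Python. The two input lists are made-up placeholders, not results from the paper, and the choice of Spearman rather than another statistic is an assumption.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Placeholder numbers: one entry per (sampling, split) configuration.
bound_values = [1.0, 2.0, 3.0, 4.0]        # hypothetical bound per configuration
loss_decrease = [0.9, 0.7, 0.4, 0.1]       # hypothetical per-round loss drop
print(spearman(bound_values, loss_decrease))
```

A coefficient close to -1 on real measurements would support using the bound as a proxy, while a value near zero would suggest that minimizing the bound does not reliably pick the fastest configuration.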

Circularity Check

0 steps flagged

Derivation of convergence bound followed by bound-minimization is standard and non-circular

full rationale

The paper derives a convergence upper bound incorporating activation-upload, gradient-download, and aggregation failures as multiplicative factors under standard smoothness/bounded-gradient assumptions, then formulates a joint optimization problem whose objective is exactly that bound. This is a conventional theoretical workflow and does not reduce any claimed result to a self-definition, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. No equations or steps in the provided abstract and description exhibit the specific reductions required to flag circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; specific free parameters, axioms, and invented entities cannot be extracted. The framework appears to rest on standard federated-learning convergence assumptions plus new failure-probability models whose details are not provided.

pith-pipeline@v0.9.0 · 5693 in / 1046 out tokens · 37017 ms · 2026-05-18T15:19:25.468301+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FluxShard: Motion-Aware Feature Cache Reuse for Collaborative Video Analytics in Mobile Edge Computing

    cs.NI 2026-05 unverdicted novelty 7.0

    FluxShard uses per-block motion vectors and a Receptive Field Alignment Principle to manage feature cache reuse in edge-cloud video analytics, delivering 32.6-83.8% lower latency and 14.9-64.0% lower energy than basel...

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    2023 edge AI technology report,

    Wevolver, “2023 edge AI technology report,” 2023. [Online]. Available: https://www.wevolver.com/article/2023-edge-ai-technology-report

  2. [2]

    MADRL-based model partitioning, aggregation control, and resource allocation for cloud-edge-device collaborative split federated learning,

    W. Fan, P. Chen, X. Chun, and Y. Liu, “MADRL-based model partitioning, aggregation control, and resource allocation for cloud-edge-device collaborative split federated learning,” IEEE Transactions on Mobile Computing, vol. 24, no. 6, pp. 5324–5341, May 2025

  3. [3]

    Split learning over wireless networks: Parallel design and resource management,

    W. Wu, M. Li, K. Qu, C. Zhou, X. Shen, W. Zhuang, X. Li, and W. Shi, “Split learning over wireless networks: Parallel design and resource management,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 4, pp. 1051–1066, Feb. 2023

  4. [4]

    Pipelining split learning in multi-hop edge networks,

    W. Wei, Z. Lin, T. Li, X. Li, and X. Chen, “Pipelining split learning in multi-hop edge networks,” arXiv preprint arXiv:2505.04368, May 2025

  5. [5]

    Spectrum breathing: Protecting over-the-air federated learning against interference,

    Z. Wang, K. Huang, and Y. C. Eldar, “Spectrum breathing: Protecting over-the-air federated learning against interference,” IEEE Trans. Wireless Commun., vol. 23, no. 8, pp. 10 058–10 071, 2024

  6. [6]

    3U: Joint design of UAV-USV-UUV networks for cooperative target hunting,

    W. Wei, J. Wang, Z. Fang, J. Chen, Y. Ren, and Y. Dong, “3U: Joint design of UAV-USV-UUV networks for cooperative target hunting,” IEEE Transactions on Vehicular Technology, vol. 72, no. 3, pp. 4085–4090, Mar. 2023

  7. [7]

    Underwater differential game: Finite-time target hunting task with communication delay,

    W. Wei, J. Wang, J. Du, Z. Fang, C. Jiang, and Y. Ren, “Underwater differential game: Finite-time target hunting task with communication delay,” in IEEE International Conference on Communications, Seoul, Korea, May 2022, pp. 3989–3994

  8. [8]

    Differential game-based deep reinforcement learning in underwater target hunting task,

    W. Wei, J. Wang, J. Du, Z. Fang, Y. Ren, and C. L. P. Chen, “Differential game-based deep reinforcement learning in underwater target hunting task,” IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 1, pp. 462–474, Jan. 2025

  9. [9]

    NVIDIA Jetson Xavier,

    NVIDIA, “NVIDIA Jetson Xavier,” 2019. [Online]. Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-xavier-series/

  10. [10]

    Split Learning for Health: Distributed Deep Learning Without Sharing Raw Patient Data,

    P. Vepakomma, O. Gupta, T. Swedish, and R. Raskar, “Split Learning for Health: Distributed Deep Learning Without Sharing Raw Patient Data,” in ICLR workshop on AI for social good, New Orleans, LA, USA, Apr. 2019, pp. 1–7

  11. [11]

    Split Learning in 6G Edge Networks,

    Z. Lin, G. Qu, X. Chen, and K. Huang, “Split Learning in 6G Edge Networks,” IEEE Wireless Communications, vol. 31, no. 4, pp. 170–176, Aug. 2024

  12. [12]

    Pairingfl: Efficient federated learning with model splitting and client pairing,

    Z. Yao, J. Qi, Y. Xu, Y. Liao, H. Xu, and L. Wang, “Pairingfl: Efficient federated learning with model splitting and client pairing,” IEEE Transactions on Networking, vol. 33, no. 4, pp. 1811–1825, May 2025

  13. [13]

    Efficient parallel split learning over resource-constrained wireless edge networks,

    Z. Lin, G. Zhu, Y. Deng, X. Chen, Y. Gao, K. Huang, and Y. Fang, “Efficient parallel split learning over resource-constrained wireless edge networks,” IEEE Transactions on Mobile Computing, vol. 23, no. 10, pp. 9224–9239, 2024

  14. [14]

    Leo-split: A semi-supervised split learning framework over leo satellite networks,

    Z. Lin, Y. Zhang, Z. Chen, Z. Fang, C. Wu, X. Chen, Y. Gao, and J. Luo, “Leo-split: A semi-supervised split learning framework over leo satellite networks,” IEEE Transactions on Mobile Computing, 2025

  15. [15]

    Federated learning: Challenges, methods, and future directions,

    T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50–60, May 2020

  16. [16]

    Fedsn: A federated learning framework over heterogeneous leo satellite networks,

    Z. Lin, Z. Chen, Z. Fang, X. Chen, X. Wang, and Y. Gao, “Fedsn: A federated learning framework over heterogeneous leo satellite networks,” IEEE Transactions on Mobile Computing, 2024

  17. [17]

    Federated Learning: Strategies for Improving Communication Efficiency

    J. Konečný, “Federated learning: Strategies for improving communication efficiency,” arXiv preprint arXiv:1610.05492, 2016

  18. [18]

    Splitfed: When federated learning meets split learning,

    C. Thapa, P. C. M. Arachchige, S. Camtepe, and L. Sun, “Splitfed: When federated learning meets split learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, Feb. 2022, pp. 8485–8493

  19. [19]

    Hasfl: Heterogeneity-aware split federated learning over edge computing systems,

    Z. Lin, Z. Chen, X. Chen, W. Ni, and Y. Gao, “Hasfl: Heterogeneity-aware split federated learning over edge computing systems,” arXiv preprint arXiv:2506.08426, 2025

  20. [20]

    Wireless distributed learning: A new hybrid split and federated learning approach,

    X. Liu, Y. Deng, and T. Mahmoodi, “Wireless distributed learning: A new hybrid split and federated learning approach,” IEEE Transactions on Wireless Communications, vol. 22, no. 4, pp. 2650–2665, Apr. 2023

  21. [21]

    Hsplitlora: A heterogeneous split parameter-efficient fine-tuning framework for large language models,

    Z. Lin, Y. Zhang, Z. Chen, Z. Fang, X. Chen, P. Vepakomma, W. Ni, J. Luo, and Y. Gao, “Hsplitlora: A heterogeneous split parameter-efficient fine-tuning framework for large language models,” arXiv preprint arXiv:2505.02795, 2025

  22. [22]

    Federated learning in mobile edge networks: A comprehensive survey,

    W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, and C. Miao, “Federated learning in mobile edge networks: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 22, no. 3, pp. 2031–2063, Apr. 2020

  23. [23]

    Federated learning under heterogeneous and correlated client availability,

    A. Rodio, F. Faticanti, O. Marfoq, G. Neglia, and E. Leonardi, “Federated learning under heterogeneous and correlated client availability,” IEEE/ACM Transactions on Networking, vol. 32, no. 2, pp. 1451–1460, Apr. 2024

  24. [24]

    FedMeld: A model-dispersal federated learning framework for space-ground integrated networks,

    Q. Chen, X. Chen, and K. Huang, “FedMeld: A model-dispersal federated learning framework for space-ground integrated networks,” arXiv preprint arXiv:2412.17231, Dec. 2024

  25. [25]

    Accelerating federated learning with model segmentation for edge networks,

    M. Hu, J. Zhang, X. Wang, S. Liu, and Z. Lin, “Accelerating federated learning with model segmentation for edge networks,” IEEE Transactions on Green Communications and Networking, 2024

  26. [26]

    Smart split-federated learning over noisy channels for embryo image segmentation,

    Z. H. Kafshgari, I. V. Bajić, and P. Saeedi, “Smart split-federated learning over noisy channels for embryo image segmentation,” in IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodes Island, Greece, Jun. 2023, pp. 1–5

  27. [27]

    SplitFed resilience to packet loss: Where to split, that is the question,

    C. Shiranthika, Z. H. Kafshgari, P. Saeedi, and I. V. Bajić, “SplitFed resilience to packet loss: Where to split, that is the question,” in Medical Image Computing and Computer Assisted Intervention Workshops, Vancouver, BC, Canada, Dec. 2023, pp. 367–377

  28. [28]

    Communication-efficient learning of deep networks from decentralized data,

    B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics, Florida, USA, Apr. 2017, pp. 1273–1282

  29. [29]

    Cooperative SGD: a unified framework for the design and analysis of local-update sgd algorithms,

    J. Wang and G. Joshi, “Cooperative SGD: a unified framework for the design and analysis of local-update sgd algorithms,” Journal of Machine Learning Research, vol. 22, no. 1, pp. 9709–9758, Jan. 2021

  30. [30]

    Graph oracle models, lower bounds, and gaps for parallel stochastic optimization,

    B. E. Woodworth, J. Wang, A. Smith, B. McMahan, and N. Srebro, “Graph oracle models, lower bounds, and gaps for parallel stochastic optimization,” in Advances in Neural Information Processing Systems, Montréal, Canada, Dec. 2018, pp. 1–11

  31. [31]

    Federated learning over wireless networks: Convergence analysis and resource allocation,

    C. T. Dinh, N. H. Tran, M. N. H. Nguyen, C. S. Hong, W. Bao, A. Y. Zomaya, and V. Gramoli, “Federated learning over wireless networks: Convergence analysis and resource allocation,” IEEE/ACM Transactions on Networking, vol. 29, no. 1, pp. 398–409, Feb. 2021

  32. [32]

    Client selection in federated learning: Convergence analysis and power-of-choice selection strategies,

    Y. J. Cho, J. Wang, and G. Joshi, “Client selection in federated learning: Convergence analysis and power-of-choice selection strategies,” arXiv preprint arXiv:2010.01243, Oct. 2020

  33. [33]

    Optimal client sampling for federated learning,

    W. Chen, S. Horvath, and P. Richtarik, “Optimal client sampling for federated learning,” arXiv preprint arXiv:2010.13723, Oct. 2022

  34. [34]

    Federated learning under importance sampling,

    E. Rizk, S. Vlaski, and A. H. Sayed, “Federated learning under importance sampling,” IEEE Transactions on Signal Processing, vol. 70, pp. 5381–5396, Sep. 2022

  35. [35]

    Towards understanding biased client selection in federated learning,

    Y. Jee Cho, J. Wang, and G. Joshi, “Towards understanding biased client selection in federated learning,” in Proceedings of International Conference on Artificial Intelligence and Statistics, vol. 151. Virtual Conference: PMLR, Mar. 2022, pp. 10 351–10 375

  36. [36]

    Clustered sampling: Low-variance and improved representativity for clients selection in federated learning,

    Y. Fraboni, R. Vidal, L. Kameni, and M. Lorenzi, “Clustered sampling: Low-variance and improved representativity for clients selection in federated learning,” in International Conference on Machine Learning, Virtual, Jul. 2021, pp. 3407–3416

  37. [37]

    Heterogeneity-guided client sampling: Towards fast and efficient Non-IID federated learning,

    H. Chen and H. Vikalo, “Heterogeneity-guided client sampling: Towards fast and efficient Non-IID federated learning,” in Advances in Neural Information Processing Systems, Vancouver, Canada, Dec. 2024, pp. 65 525–65 561

  38. [38]

    Tackling system and statistical heterogeneity for federated learning with adaptive client sampling,

    B. Luo, W. Xiao, S. Wang, J. Huang, and L. Tassiulas, “Tackling system and statistical heterogeneity for federated learning with adaptive client sampling,” in IEEE Conference on Computer Communications, London, United Kingdom, May 2022, pp. 1739–1748

  39. [39]

    Eiffel: Efficient and fair scheduling in adaptive federated learning,

    A. Sultana, M. M. Haque, L. Chen, F. Xu, and X. Yuan, “Eiffel: Efficient and fair scheduling in adaptive federated learning,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4282–4294, Jun. 2022

  40. [40]

    Ultra-low-latency edge inference for distributed sensing,

    Z. Wang, A. E. Kalør, Y. Zhou, P. Popovski, and K. Huang, “Ultra-low-latency edge inference for distributed sensing,” IEEE Trans. Wireless Commun., 2024, early access, 2025

  41. [41]

    Revisiting outage for edge inference systems,

    Z. Wang, Q. Zeng, H. Zheng, and K. Huang, “Revisiting outage for edge inference systems,” 2025. [Online]. Available: https://arxiv.org/abs/2504.03686

  42. [42]

    Adaptsfl: Adaptive split federated learning in resource-constrained edge networks,

    Z. Lin, G. Qu, W. Wei, X. Chen, and K. K. Leung, “Adaptsfl: Adaptive split federated learning in resource-constrained edge networks,” IEEE Transactions on Networking, pp. 1–16, early access 2025

  43. [43]

    Hierarchical split federated learning: Convergence analysis and system optimization,

    Z. Lin, W. Wei, Z. Chen, C.-T. Lam, X. Chen, Y. Gao, and J. Luo, “Hierarchical split federated learning: Convergence analysis and system optimization,” IEEE Transactions on Mobile Computing, vol. 24, no. 10, pp. 9352–9367, Oct. 2025

  44. [44]

    Accelerating split federated learning over wireless communication networks,

    C. Xu, J. Li, Y. Liu, Y. Ling, and M. Wen, “Accelerating split federated learning over wireless communication networks,” IEEE Transactions on Wireless Communications, vol. 23, no. 6, pp. 5587–5599, Jun. 2024

  45. [45]

    Unleashing the tiger: Inference attacks on split learning,

    D. Pasquini, G. Ateniese, and M. Bernaschi, “Unleashing the tiger: Inference attacks on split learning,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. New York, NY, USA: Association for Computing Machinery, Nov. 2021, pp. 2113–2129

  46. [46]

    SCAFFOLD: Stochastic controlled averaging for federated learning,

    S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, “SCAFFOLD: Stochastic controlled averaging for federated learning,” in International Conference on Machine Learning, vol. 119. Virtual: PMLR, Jul. 2020, pp. 5132–5143

  47. [47]

    On the convergence of fedavg on Non-IID data,

    X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergence of fedavg on Non-IID data,” in Proceedings of the International Conference on Learning Representation, Addis Ababa, Ethiopia, Apr. 2020, pp. 1–11

  48. [48]

    Achieving linear speedup with partial worker participation in non-iid federated learning,

    H. Yang, M. Fang, and J. Liu, “Achieving linear speedup with partial worker participation in non-iid federated learning,” in International Conference on Learning Representations, Vienna, Austria, May 2021, pp. 1–11

  49. [49]

    Resource constrained vehicular edge federated learning with highly mobile connected vehicles,

    M. F. Pervej, R. Jin, and H. Dai, “Resource constrained vehicular edge federated learning with highly mobile connected vehicles,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 6, pp. 1825–1844, May 2023

  50. [50]

    Convergence analysis of split federated learning on heterogeneous data,

    P. Han, C. Huang, G. Tian, M. Tang, and X. Liu, “Convergence analysis of split federated learning on heterogeneous data,” arXiv preprint arXiv:2402.15166, 2025

  51. [51]

    On the convergence of local stochastic compositional gradient descent with momentum,

    H. Gao, J. Li, and H. Huang, “On the convergence of local stochastic compositional gradient descent with momentum,” in Proceedings of International Conference on Machine Learning, Baltimore, USA, Jul. 2022, pp. 7017–7035

  52. [52]

    Optimal batch-size control for low-latency federated learning with device heterogeneity,

    H. Yang, Z. Wang, and K. Huang, “Optimal batch-size control for low-latency federated learning with device heterogeneity,” 2025. [Online]. Available: https://arxiv.org/abs/2507.15601

  53. [53]

    Adaptive heterogeneous client sampling for federated learning over wireless networks,

    B. Luo, W. Xiao, S. Wang, J. Huang, and L. Tassiulas, “Adaptive heterogeneous client sampling for federated learning over wireless networks,” IEEE Transactions on Mobile Computing, vol. 23, no. 10, pp. 9663–9677, Oct. 2024

  54. [54]

    A bisection method for systems of nonlinear equations,

    A. Eiger, K. Sikorski, and F. Stenger, “A bisection method for systems of nonlinear equations,” ACM Transactions on Mathematical Software, vol. 10, no. 4, pp. 367–377, Dec. 1984

  55. [55]

    EMNIST: an extension of MNIST to handwritten letters

    G. Cohen, S. Afshar, J. Tapson, and A. van Schaik, “EMNIST: an extension of MNIST to handwritten letters,” arXiv preprint arXiv:1702.05373, Mar. 2017

  56. [56]

    Gradient-based learning applied to document recognition,

    Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998

  57. [57]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, Dec. 2016, pp. 770–778

  58. [58]

    Towards optimal heterogeneous client sampling in multi-model federated learning,

    H. Zhang, Z. Gong, Z. Li, M. Siew, C. Joe-Wong, and R. El-Azouzi, “Towards optimal heterogeneous client sampling in multi-model federated learning,” arXiv preprint arXiv:2504.05138, Apr. 2025

  59. [59]

    Adaptive federated learning in resource constrained edge computing systems,

    S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, “Adaptive federated learning in resource constrained edge computing systems,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1205–1221, Jun. 2019

  60. [60]

    DTS: A simulator to estimate the training time of distributed deep neural networks,

    W. J. Robinson M., F. Esposito, and M. A. Zuluaga, “DTS: A simulator to estimate the training time of distributed deep neural networks,” in International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, Nice, France, Mar. 2023, pp. 17–24

  61. [61]

    Modeling forecast errors for microgrid operation using Gaussian process regression,

    Y. Yoo and S. Jung, “Modeling forecast errors for microgrid operation using Gaussian process regression,” Scientific Reports, vol. 14, no. 1, p. 2166, Jan. 2024