pith. machine review for the scientific record. sign in

arxiv: 2605.11815 · v1 · submitted 2026-05-12 · 💻 cs.LG

Recognition: 2 theorem links

· Lean Theorem

Fed-BAC: Federated Bandit-Guided Additive Clustering in Hierarchical Federated Learning

Muddesar Iqbal, Satwat Bashir, Tasos Dagiuklas

Authors on Pith no claims yet

Pith reviewed 2026-05-13 07:08 UTC · model grok-4.3

classification 💻 cs.LG
keywords hierarchical federated learningbandit algorithmscluster personalizationnon-IID dataclient selectionedge computingadditive decomposition
0
0 comments X

The pith

A two-level bandit framework with additive cluster personalization lets hierarchical federated learning jointly optimize server assignments and client selection, delivering large accuracy gains under non-IID data with only 80 percent client

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that existing hierarchical federated learning methods cannot jointly handle cluster assignment and client selection when data is heterogeneous. Fed-BAC addresses this by running contextual bandits at the cloud to assign servers to clusters and Thompson Sampling at each edge server to pick high-value clients, while an additive decomposition shares a global model across clusters and keeps cluster-specific heads for local variation. The result is measured on CIFAR-10, SVHN, and Fashion-MNIST under both moderate and severe Dirichlet partitions. A reader should care because the approach reduces the number of clients that must participate, speeds convergence, and raises accuracy most when data distributions diverge most.

Core claim

Fed-BAC integrates additive cluster personalization with a two-level bandit framework: contextual bandits at the cloud learn server-to-cluster assignments while Thompson Sampling at each edge server identifies high-contributing clients. The additive decomposition enables sharing of knowledge between groups through a globally aggregated network, while cluster-specific networks capture distribution variations. On three classification benchmarks under moderate (alpha=0.5) and severe (alpha=0.1) Dirichlet non-IID partitioning, this yields distributed accuracy gains of up to +35.5pp over HierFAVG and +8.4pp over IFCA, requires only 80 percent client participation, converges 1.5 to 4.8 times更快, is

What carries the argument

Two-level bandit framework (contextual bandits at cloud for server-to-cluster assignment plus Thompson Sampling at edge servers for client selection) combined with additive decomposition of global and cluster-specific networks.

Load-bearing premise

The two-level bandit framework combined with additive decomposition can reliably identify high-value clusters and clients without introducing selection bias under the tested Dirichlet partitions.

What would settle it

Running the method on a new dataset whose heterogeneity structure differs markedly from the tested Dirichlet partitions and observing that accuracy gains and convergence speedups both disappear would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.11815 by Muddesar Iqbal, Satwat Bashir, Tasos Dagiuklas.

Figure 1
Figure 1. Figure 1: Fed-BAC system architecture. LinUCB at the cloud assigns servers to clusters; TS at each edge server selects [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Distributed accuracy vs. communication rounds (exponentially smoothed). Top row: moderate heterogeneity ( [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Cluster dynamics on Fashion-MNIST (α= 0.5, 100 rounds). the shared global component captures transferable structure that isolated clustering discards. Cluster dynamics. Fed-BAC’s LinUCB bandit reassigns servers actively in early rounds and stabilizes as the bandit converges (Figs. 3–5). On Fashion-MNIST (initialized as one cluster), Fed-BAC discovers 7–8 active clusters versus IFCA’s static 3; on CIFAR-10 … view at source ↗
Figure 4
Figure 4. Figure 4: Cluster dynamics on CIFAR-10 (α= 0.5, 200 rounds). 0 20 40 60 80 100 120 140 0 200 400 600 Round Cumulative Reassignments Fed-BAC reassign. IFCA reassign. 0 5 10 Active Clusters Fed-BAC clusters IFCA clusters [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Cluster dynamics on SVHN (α= 0.5, 150 rounds). Cross-server fairness. Table IV reports per-server accuracy distributions. At α= 0.5, Fed-BAC achieves the lowest cross￾server variance on SVHN and Fashion-MNIST (σ = 1.41 and 2.05) and halves IFCA’s variance on CIFAR-10, while its worst server (69.54%) exceeds every other method’s mean. Under severe heterogeneity (α= 0.1), HierFAVG’s σ increases sharply (2.10… view at source ↗
read the original abstract

Hierarchical federated learning (HFL) leverages edge servers for partial aggregation in edge computing. Yet existing FL methods lack mechanisms for jointly optimizing cluster assignment and client selection under data heterogeneity. This paper proposes Fed-BAC, which integrates additive cluster personalization with a two-level bandit framework: contextual bandits at the cloud learn server-to-cluster assignments, while Thompson Sampling at each edge server identifies high-contributing clients. The additive decomposition enables the sharing of knowledge between groups through a globally aggregated network, while cluster-specific networks capture distribution variations. Across three classification benchmarks (CIFAR-10, SVHN, Fashion-MNIST) under moderate ($\alpha = 0.5$) and severe ($\alpha = 0.1$) Dirichlet non-IID partitioning, Fed-BAC achieves distributed accuracy gains of up to +35.5pp over HierFAVG and +8.4pp over IFCA, while requiring only 80% client participation, converging 1.5 to 4.8$\times$ faster depending on dataset and accuracy target, and improving cross-server fairness. These gains are further validated at 5$\times$ deployment scale on CIFAR-10. The advantage of Fed-BAC increases with heterogeneity severity, confirming that additive cluster personalization becomes increasingly valuable as data distributions diverge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Fed-BAC for hierarchical federated learning, combining additive cluster personalization (global network plus cluster-specific heads) with a two-level bandit mechanism: contextual bandits at the cloud server for assigning edge servers to clusters, and Thompson Sampling at each edge server for selecting high-contributing clients. Empirical results on CIFAR-10, SVHN, and Fashion-MNIST under Dirichlet non-IID partitions (α=0.5 and α=0.1) claim accuracy gains up to +35.5pp over HierFAVG and +8.4pp over IFCA at 80% client participation, 1.5–4.8× faster convergence, improved fairness, and robustness at 5× scale.

Significance. If the empirical gains and fairness improvements are reproducible with proper statistical controls, the work offers a practical mechanism for client and cluster selection in HFL that reduces communication while handling severe heterogeneity. The additive decomposition provides a clean way to share global knowledge without full personalization overhead, which could influence future edge-computing FL deployments.

major comments (3)
  1. [§3.2] §3.2 (Thompson Sampling at edge servers): the reward signal (local loss/accuracy after partial training) is not shown to be unbiased with respect to client data difficulty or label imbalance; under α=0.1 Dirichlet partitions this risks preferential selection of easier clients, potentially biasing the global network while cluster heads overfit to remaining hard clients. No regret bound or posterior convergence analysis is provided to support stationarity.
  2. [§5.1] §5.1 and Table 2: reported accuracy gains (+35.5pp, +8.4pp) and convergence speedups lack error bars, standard deviations across runs, or statistical significance tests; without these the headline claims cannot be assessed for reliability, especially given the scale-up experiment on CIFAR-10.
  3. [§4.3] §4.3 (additive decomposition): the claim that the global network plus cluster heads reliably separates shared and distribution-specific knowledge is not accompanied by any formal decomposition bound or ablation isolating the contribution of the bandit-driven selection versus the additive structure itself.
minor comments (2)
  1. [Figure 1] Figure 1: the diagram of the two-level bandit flow would benefit from explicit arrows showing how the contextual bandit output feeds into edge-server Thompson Sampling.
  2. [§3.1] Notation: the symbols for cluster assignment probabilities and client selection probabilities are introduced without a consolidated table, making cross-references in §3.1–3.3 harder to follow.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on Fed-BAC. We address each major comment point by point below, indicating the revisions we will incorporate to improve the manuscript.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Thompson Sampling at edge servers): the reward signal (local loss/accuracy after partial training) is not shown to be unbiased with respect to client data difficulty or label imbalance; under α=0.1 Dirichlet partitions this risks preferential selection of easier clients, potentially biasing the global network while cluster heads overfit to remaining hard clients. No regret bound or posterior convergence analysis is provided to support stationarity.

    Authors: We acknowledge the concern that the reward signal (local loss reduction after partial training) may not be fully unbiased under severe heterogeneity. The signal is chosen to reflect marginal contribution to model improvement rather than absolute accuracy, which helps prioritize clients providing useful gradients. Empirical results on α=0.1 partitions still show gains in both accuracy and fairness, suggesting the mechanism is robust in practice. No regret bound is derived because the primary focus is empirical validation in HFL; we will add a dedicated limitations paragraph discussing potential selection bias and stationarity assumptions in the revised version. revision: partial

  2. Referee: [§5.1] §5.1 and Table 2: reported accuracy gains (+35.5pp, +8.4pp) and convergence speedups lack error bars, standard deviations across runs, or statistical significance tests; without these the headline claims cannot be assessed for reliability, especially given the scale-up experiment on CIFAR-10.

    Authors: We agree that statistical rigor is necessary to substantiate the reported gains. In the revised manuscript we will rerun all experiments with 5 independent random seeds, report mean ± standard deviation for accuracy and convergence metrics, add error bars to figures and Table 2, and include paired t-test p-values to confirm statistical significance of the improvements over baselines. revision: yes

  3. Referee: [§4.3] §4.3 (additive decomposition): the claim that the global network plus cluster heads reliably separates shared and distribution-specific knowledge is not accompanied by any formal decomposition bound or ablation isolating the contribution of the bandit-driven selection versus the additive structure itself.

    Authors: The additive structure is motivated by the goal of sharing a common representation while allowing cluster-specific adaptation. While we do not provide a formal decomposition bound, we will add an ablation study in the revision that compares (i) full Fed-BAC, (ii) additive heads without bandit selection, and (iii) bandit selection without additive heads, to quantify the individual contributions of each component. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical bandit integration with independent experimental validation

full rationale

The paper presents Fed-BAC as an algorithmic method that combines contextual bandits at the cloud level with Thompson Sampling at edge servers for client/cluster selection, plus additive decomposition for personalization. All performance claims (+35.5pp accuracy, faster convergence, fairness at 80% participation) are supported solely by empirical results on CIFAR-10, SVHN, and Fashion-MNIST under Dirichlet non-IID partitions. No equations, first-principles derivations, or predictions are shown that reduce by construction to fitted inputs or self-citations. The method is described as a practical integration of existing bandit techniques with HFL, without load-bearing self-citation chains or self-definitional steps. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; all technical details remain at the level of high-level description.

pith-pipeline@v0.9.0 · 5537 in / 1126 out tokens · 47724 ms · 2026-05-13T07:08:03.037158+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 1 internal anchor

  1. [1]

    Communication-efficient learning of deep networks from decentralized data,

    B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” inArtificial Intelligence and Statistics, pp. 1273–1282, PMLR, 2017

  2. [2]

    Advances and open problems in federated learning,

    P. Kairouz, H. B. McMahan, B. Avent,et al., “Advances and open problems in federated learning,”Foundations and Trends in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021

  3. [3]

    Federated optimization in heterogeneous networks,

    T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V . Smith, “Federated optimization in heterogeneous networks,” inProceedings of Machine Learning and Systems, vol. 2, pp. 429–450, 2020

  4. [4]

    Client-edge-cloud hierarchical federated learning,

    L. Liu, J. Zhang, S. Song, and K. B. Letaief, “Client-edge-cloud hierarchical federated learning,” inIEEE International Conference on Communications (ICC), pp. 1–6, IEEE, 2020

  5. [5]

    Hierarchical federated learning across heterogeneous cellular networks,

    M. S. H. Abad, E. Ozfatura, D. G ¨und¨uz, and O. Ercetin, “Hierarchical federated learning across heterogeneous cellular networks,” inIEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8866–8870, IEEE, 2020

  6. [6]

    Topology-aware federated learning in edge computing: A comprehensive survey,

    A. Wu, Y . Sun, Y . Zhu, J. Gu, J. Yang, and F. Wei, “Topology-aware federated learning in edge computing: A comprehensive survey,”ACM Computing Surveys, vol. 56, no. 12, pp. 1–40, 2024

  7. [7]

    Demystifying why local aggregation helps: Convergence analysis of hierarchical sgd,

    J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V . Poor, “Demystifying why local aggregation helps: Convergence analysis of hierarchical sgd,” Advances in Neural Information Processing Systems, vol. 35, pp. 13251– 13265, 2022

  8. [8]

    Communication-efficient federated learning via knowledge distillation,

    C. Wu, F. Wu, L. Lyu, Y . Huang, and X. Xie, “Communication-efficient federated learning via knowledge distillation,”Nature Communications, vol. 13, no. 1, p. 2032, 2022

  9. [9]

    Data-free knowledge distillation for het- erogeneous federated learning,

    Z. Zhu, J. Hong, and J. Zhou, “Data-free knowledge distillation for het- erogeneous federated learning,” inInternational Conference on Machine Learning, pp. 12878–12889, PMLR, 2021

  10. [10]

    Fast heterogeneous federated learning with hybrid client selection,

    D. Song, A. Zhou, W. Sun, B. Li, and W. Li, “Fast heterogeneous federated learning with hybrid client selection,” inUncertainty in Artificial Intelligence, pp. 2006–2015, PMLR, 2023

  11. [11]

    Feder- ated learning with non-iid data,

    Y . Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V . Chandra, “Feder- ated learning with non-iid data,” inNeurIPS Workshop on Federated Learning, 2018

  12. [12]

    The non-iid data quagmire of decentralized machine learning,

    K. Hsieh, A. Phanishayee, O. Mutlu, and P. Gibbons, “The non-iid data quagmire of decentralized machine learning,” inInternational Conference on Machine Learning, pp. 4387–4398, PMLR, 2020

  13. [13]

    An efficient framework for clustered federated learning,

    A. Ghosh, J. Chung, D. Yin, and K. Ramchandran, “An efficient framework for clustered federated learning,” inAdvances in Neural Information Processing Systems, vol. 33, pp. 19586–19597, 2020

  14. [14]

    Clustered federated learning: Model-agnostic distributed multitask optimization under privacy con- straints,

    F. Sattler, K.-R. M ¨uller, and W. Samek, “Clustered federated learning: Model-agnostic distributed multitask optimization under privacy con- straints,”IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 8, pp. 3710–3722, 2021

  15. [15]

    Towards understanding biased client selection in federated learning,

    Y . J. Cho, J. Wang, and G. Joshi, “Towards understanding biased client selection in federated learning,” inInternational Conference on Artificial Intelligence and Statistics, pp. 10351–10375, PMLR, 2022

  16. [16]

    Oort: Effi- cient federated learning via guided participant selection,

    F. Lai, X. Zhu, H. V . Madhyastha, and M. Chowdhury, “Oort: Effi- cient federated learning via guided participant selection,” inUSENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 19–35, 2021

  17. [17]

    Multi- armed bandit-based client scheduling for federated learning,

    W. Xia, T. Q. Quek, K. Guo, W. Wen, H. H. Yang, and H. Zhu, “Multi- armed bandit-based client scheduling for federated learning,”IEEE Transactions on Wireless Communications, vol. 19, no. 11, pp. 7108– 7123, 2020

  18. [18]

    Ditto: Fair and robust federated learning through personalization,

    T. Li, S. Hu, A. Beirami, and V . Smith, “Ditto: Fair and robust federated learning through personalization,” inInternational Conference on Machine Learning, pp. 6357–6368, PMLR, 2021

  19. [19]

    Adaptive personalized federated learning.arXiv preprint arXiv:2003.13461, 2020

    Y . Deng, M. M. Kamani, and M. Mahdavi, “Adaptive personalized federated learning,”arXiv preprint arXiv:2003.13461, 2020

  20. [20]

    Towards personalized federated learning,

    A. Z. Tan, H. Yu, L. Cui, and Q. Yang, “Towards personalized federated learning,”IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 12, pp. 9587–9603, 2022

  21. [21]

    Structured federated learning through clustered additive modeling,

    J. Ma, T. Zhou, G. Long, J. Jiang, and C. Zhang, “Structured federated learning through clustered additive modeling,” inAdvances in Neural Information Processing Systems, vol. 36, 2023

  22. [22]

    A contextual-bandit approach to personalized news article recommendation,

    L. Li, W. Chu, J. Langford, and R. E. Schapire, “A contextual-bandit approach to personalized news article recommendation,” inProceedings of the 19th International Conference on World Wide Web, pp. 661–670, 2010

  23. [23]

    A tutorial on Thompson sampling,

    D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, “A tutorial on Thompson sampling,”Foundations and Trends in Machine Learning, vol. 14, no. 1–2, pp. 1–96, 2020

  24. [24]

    Bandit-based communication-efficient client selection strategies for federated learn- ing,

    Y . J. Cho, S. Gupta, G. Joshi, and O. Ya ˘gan, “Bandit-based communication-efficient client selection strategies for federated learn- ing,” in54th Asilomar Conference on Signals, Systems, and Computers, pp. 1066–1069, IEEE, 2020

  25. [25]

    Context-aware online client selection for hierarchical federated learning,

    Z. Qu, R. Duan, L. Chen, J. Xu, Z. Lu, and Y . Liu, “Context-aware online client selection for hierarchical federated learning,”IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4353–4367, 2022

  26. [26]

    Client selection in hierarchical feder- ated learning,

    S. Trindade and N. da Fonseca, “Client selection in hierarchical feder- ated learning,”IEEE Internet of Things Journal, 2024

  27. [27]

    Chpfl: Clustered adaptive hierarchical federated learning for edge-level personalization,

    L. Song, J. Li, H. Jiang, S. Wei, and Y . Guo, “Chpfl: Clustered adaptive hierarchical federated learning for edge-level personalization,”High- Confidence Computing, vol. 5, no. 3, p. 100279, 2025

  28. [28]

    Personalizing federated learning for hierarchical edge networks with non-IID data,

    S. Lee, O. Tavallaie, S. Chen, K. Thilakarathna, S. Seneviratne, A. N. Toosi, and A. Y . Zomaya, “Personalizing federated learning for hierarchical edge networks with non-IID data,”arXiv preprint arXiv:2504.08872, 2025

  29. [29]

    Federated learning with hierarchical clustering of local updates to improve training on non-IID data,

    C. Briggs, Z. Fan, and P. Andras, “Federated learning with hierarchical clustering of local updates to improve training on non-IID data,” inIEEE International Joint Conference on Neural Networks (IJCNN), pp. 1–9, IEEE, 2020

  30. [30]

    Interaction- aware Gaussian weighting for clustered federated learning,

    A. Licciardi, D. Leo, E. Fan `ı, B. Caputo, and M. Ciccone, “Interaction- aware Gaussian weighting for clustered federated learning,” inInterna- tional Conference on Machine Learning (ICML), 2025

  31. [31]

    Clustered federated learning with hierarchical knowl- edge distillation,

    S. Ahmadet al., “Clustered federated learning with hierarchical knowl- edge distillation,”arXiv preprint arXiv:2512.10443, 2025

  32. [32]

    Exploiting shared representations for personalized federated learning,

    L. Collins, H. Hassani, A. Mokhtari, and S. Shakkottai, “Exploiting shared representations for personalized federated learning,” inInterna- tional Conference on Machine Learning, pp. 2089–2099, PMLR, 2021

  33. [33]

    FedCSAD: Federated learning with contextual client selection and confidence-weighted multi-teacher knowledge distillation in power equipment inspection,

    T. Luet al., “FedCSAD: Federated learning with contextual client selection and confidence-weighted multi-teacher knowledge distillation in power equipment inspection,” inAlgorithms and Architectures for Parallel Processing (ICA3PP), vol. 16382 ofLecture Notes in Computer Science, Springer, 2026

  34. [34]

    Measuring the effects of non- identical data distribution for federated visual classification,

    T.-M. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non- identical data distribution for federated visual classification,” inNeurIPS Workshop on Federated Learning, 2019

  35. [35]

    Analysis of Thompson sampling for the multi-armed bandit problem,

    S. Agrawal and N. Goyal, “Analysis of Thompson sampling for the multi-armed bandit problem,” inConference on Learning Theory, pp. 39.1–39.26, JMLR, 2012

  36. [36]

    Learning multiple layers of features from tiny images,

    A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” tech. rep., University of Toronto, 2009

  37. [37]

    Reading digits in natural images with unsupervised feature learning,

    Y . Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y . Ng, “Reading digits in natural images with unsupervised feature learning,” in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011

  38. [38]

    Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

    H. Xiao, K. Rasul, and R. V ollgraf, “Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms,”arXiv preprint arXiv:1708.07747, 2017