Recognition: 2 theorem links · Lean Theorem
Fed-BAC: Federated Bandit-Guided Additive Clustering in Hierarchical Federated Learning
Pith reviewed 2026-05-13 07:08 UTC · model grok-4.3
The pith
A two-level bandit framework with additive cluster personalization lets hierarchical federated learning jointly optimize server assignments and client selection, delivering large accuracy gains under non-IID data with only 80 percent client participation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Fed-BAC integrates additive cluster personalization with a two-level bandit framework: contextual bandits at the cloud learn server-to-cluster assignments while Thompson Sampling at each edge server identifies high-contributing clients. The additive decomposition enables sharing of knowledge between groups through a globally aggregated network, while cluster-specific networks capture distribution variations. On three classification benchmarks under moderate (alpha=0.5) and severe (alpha=0.1) Dirichlet non-IID partitioning, this yields distributed accuracy gains of up to +35.5pp over HierFAVG and +8.4pp over IFCA, requires only 80 percent client participation, converges 1.5 to 4.8 times faster depending on dataset and accuracy target, and improves cross-server fairness; these gains are further validated at 5x deployment scale on CIFAR-10.
What carries the argument
Two-level bandit framework (contextual bandits at cloud for server-to-cluster assignment plus Thompson Sampling at edge servers for client selection) combined with additive decomposition of global and cluster-specific networks.
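To make the additive decomposition concrete, here is a minimal sketch, assuming a PyTorch-style model in which the global network and a cluster-specific head produce logits that are summed, matching the form ŷ = h(x; Θ_global) + f(x; Θ_k). The layer sizes and head architecture are illustrative, not taken from the paper.

```python
# Minimal sketch of additive cluster personalization: prediction is the sum
# of a shared global network and a cluster-specific head. Architecture
# details here are placeholder assumptions.
import torch
import torch.nn as nn

class AdditiveClusterModel(nn.Module):
    """y_hat = h(x; theta_global) + f(x; theta_k) for a client in cluster k."""

    def __init__(self, in_dim: int, num_classes: int, num_clusters: int):
        super().__init__()
        # Globally aggregated network h: shared across all clusters.
        self.global_net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, num_classes)
        )
        # One cluster-specific network f per cluster, capturing
        # distribution variations on top of the shared component.
        self.cluster_heads = nn.ModuleList(
            nn.Linear(in_dim, num_classes) for _ in range(num_clusters)
        )

    def forward(self, x: torch.Tensor, cluster_id: int) -> torch.Tensor:
        return self.global_net(x) + self.cluster_heads[cluster_id](x)
```

Under this structure, only the global_net parameters would be aggregated at the cloud, while each cluster head would be aggregated only among the clients assigned to that cluster.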
Load-bearing premise
The two-level bandit framework combined with additive decomposition can reliably identify high-value clusters and clients without introducing selection bias under the tested Dirichlet partitions.
What would settle it
Running the method on a new dataset whose heterogeneity structure differs markedly from the tested Dirichlet partitions and observing that accuracy gains and convergence speedups both disappear would falsify the central claim.
Original abstract
Hierarchical federated learning (HFL) leverages edge servers for partial aggregation in edge computing. Yet existing FL methods lack mechanisms for jointly optimizing cluster assignment and client selection under data heterogeneity. This paper proposes Fed-BAC, which integrates additive cluster personalization with a two-level bandit framework: contextual bandits at the cloud learn server-to-cluster assignments, while Thompson Sampling at each edge server identifies high-contributing clients. The additive decomposition enables the sharing of knowledge between groups through a globally aggregated network, while cluster-specific networks capture distribution variations. Across three classification benchmarks (CIFAR-10, SVHN, Fashion-MNIST) under moderate ($\alpha = 0.5$) and severe ($\alpha = 0.1$) Dirichlet non-IID partitioning, Fed-BAC achieves distributed accuracy gains of up to +35.5pp over HierFAVG and +8.4pp over IFCA, while requiring only 80% client participation, converging 1.5 to 4.8$\times$ faster depending on dataset and accuracy target, and improving cross-server fairness. These gains are further validated at 5$\times$ deployment scale on CIFAR-10. The advantage of Fed-BAC increases with heterogeneity severity, confirming that additive cluster personalization becomes increasingly valuable as data distributions diverge.
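For readers unfamiliar with the partitioning scheme named in the abstract, the following is a minimal sketch of label-based Dirichlet non-IID splitting in the style of Hsu et al. [34]; the function name and seed handling are our own, and smaller alpha (e.g. 0.1) produces more severe skew.

```python
# Sketch of label-based Dirichlet non-IID partitioning: for each class, a
# Dir(alpha) draw decides what fraction of that class each client receives.
import numpy as np

def dirichlet_partition(labels: np.ndarray, num_clients: int,
                        alpha: float, seed: int = 0) -> list[np.ndarray]:
    """Split sample indices across clients; smaller alpha => more skew."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Per-client proportions for this class, drawn from Dir(alpha).
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part)
    return [np.array(ix) for ix in client_indices]
```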
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Fed-BAC for hierarchical federated learning, combining additive cluster personalization (global network plus cluster-specific heads) with a two-level bandit mechanism: contextual bandits at the cloud server for assigning edge servers to clusters, and Thompson Sampling at each edge server for selecting high-contributing clients. Empirical results on CIFAR-10, SVHN, and Fashion-MNIST under Dirichlet non-IID partitions (α=0.5 and α=0.1) claim accuracy gains up to +35.5pp over HierFAVG and +8.4pp over IFCA at 80% client participation, 1.5–4.8× faster convergence, improved fairness, and robustness at 5× scale.
Significance. If the empirical gains and fairness improvements are reproducible with proper statistical controls, the work offers a practical mechanism for client and cluster selection in HFL that reduces communication while handling severe heterogeneity. The additive decomposition provides a clean way to share global knowledge without full personalization overhead, which could influence future edge-computing FL deployments.
major comments (3)
- [§3.2] §3.2 (Thompson Sampling at edge servers): the reward signal (local loss/accuracy after partial training) is not shown to be unbiased with respect to client data difficulty or label imbalance; under α=0.1 Dirichlet partitions this risks preferential selection of easier clients, potentially biasing the global network while cluster heads overfit to remaining hard clients. No regret bound or posterior convergence analysis is provided to support stationarity (a minimal sketch of this selection dynamic appears after the minor comments).
- [§5.1] §5.1 and Table 2: reported accuracy gains (+35.5pp, +8.4pp) and convergence speedups lack error bars, standard deviations across runs, or statistical significance tests; without these the headline claims cannot be assessed for reliability, especially given the scale-up experiment on CIFAR-10.
- [§4.3] §4.3 (additive decomposition): the claim that the global network plus cluster heads reliably separates shared and distribution-specific knowledge is not accompanied by any formal decomposition bound or ablation isolating the contribution of the bandit-driven selection versus the additive structure itself.
minor comments (2)
- [Figure 1] Figure 1: the diagram of the two-level bandit flow would benefit from explicit arrows showing how the contextual bandit output feeds into edge-server Thompson Sampling.
- [§3.1] Notation: the symbols for cluster assignment probabilities and client selection probabilities are introduced without a consolidated table, making cross-references in §3.1–3.3 harder to follow.
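As flagged in the first major comment, bandit selection on raw reward signals can starve hard clients. Below is a minimal Gaussian Thompson Sampling loop for client selection; the priors and the loss-reduction reward are illustrative assumptions, not the paper's specification.

```python
# Minimal Gaussian Thompson Sampling for client selection. If "easy" clients
# systematically yield larger loss reductions, their posterior means grow and
# they are sampled more often -- exactly the bias the referee asks about.
import numpy as np

rng = np.random.default_rng(0)
num_clients, budget = 20, 8          # select 8 of 20 clients per round
mu = np.zeros(num_clients)           # posterior means of per-client reward
counts = np.ones(num_clients)        # pseudo-counts (unit-variance prior)

def select_clients() -> np.ndarray:
    # Sample one plausible reward per client, pick the top `budget`.
    samples = rng.normal(mu, 1.0 / np.sqrt(counts))
    return np.argsort(samples)[-budget:]

def update(client: int, reward: float) -> None:
    # Conjugate Gaussian update: running mean of observed loss reductions.
    counts[client] += 1.0
    mu[client] += (reward - mu[client]) / counts[client]
```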
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on Fed-BAC. We address each major comment point by point below, indicating the revisions we will incorporate to improve the manuscript.
Point-by-point responses
-
Referee: [§3.2] §3.2 (Thompson Sampling at edge servers): the reward signal (local loss/accuracy after partial training) is not shown to be unbiased with respect to client data difficulty or label imbalance; under α=0.1 Dirichlet partitions this risks preferential selection of easier clients, potentially biasing the global network while cluster heads overfit to remaining hard clients. No regret bound or posterior convergence analysis is provided to support stationarity.
Authors: We acknowledge the concern that the reward signal (local loss reduction after partial training) may not be fully unbiased under severe heterogeneity. The signal is chosen to reflect marginal contribution to model improvement rather than absolute accuracy, which helps prioritize clients providing useful gradients. Empirical results on α=0.1 partitions still show gains in both accuracy and fairness, suggesting the mechanism is robust in practice. No regret bound is derived because the primary focus is empirical validation in HFL; we will add a dedicated limitations paragraph discussing potential selection bias and stationarity assumptions in the revised version. revision: partial
-
Referee: [§5.1] §5.1 and Table 2: reported accuracy gains (+35.5pp, +8.4pp) and convergence speedups lack error bars, standard deviations across runs, or statistical significance tests; without these the headline claims cannot be assessed for reliability, especially given the scale-up experiment on CIFAR-10.
Authors: We agree that statistical rigor is necessary to substantiate the reported gains. In the revised manuscript we will rerun all experiments with 5 independent random seeds, report mean ± standard deviation for accuracy and convergence metrics, add error bars to figures and Table 2, and include paired t-test p-values to confirm statistical significance of the improvements over baselines (a sketch of this protocol appears after the responses). revision: yes
-
Referee: [§4.3] §4.3 (additive decomposition): the claim that the global network plus cluster heads reliably separates shared and distribution-specific knowledge is not accompanied by any formal decomposition bound or ablation isolating the contribution of the bandit-driven selection versus the additive structure itself.
Authors: The additive structure is motivated by the goal of sharing a common representation while allowing cluster-specific adaptation. While we do not provide a formal decomposition bound, we will add an ablation study in the revision that compares (i) full Fed-BAC, (ii) additive heads without bandit selection, and (iii) bandit selection without additive heads, to quantify the individual contributions of each component. revision: partial
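The second response commits to repeated seeds, error bars, and paired t-tests. A minimal sketch of that protocol using SciPy follows; the seed count and accuracies are placeholders, not the paper's results.

```python
# Sketch of the promised statistical protocol: mean +/- std over independent
# seeds, plus a paired t-test against a baseline run with the same seeds.
import numpy as np
from scipy import stats

def compare(method_acc: np.ndarray, baseline_acc: np.ndarray) -> None:
    """Per-seed final accuracies, paired by position (same seed per index)."""
    diff = method_acc - baseline_acc
    t, p = stats.ttest_rel(method_acc, baseline_acc)
    print(f"method   {method_acc.mean():.2f} ± {method_acc.std(ddof=1):.2f}")
    print(f"baseline {baseline_acc.mean():.2f} ± {baseline_acc.std(ddof=1):.2f}")
    print(f"mean gain {diff.mean():+.2f}pp, paired t = {t:.2f}, p = {p:.4f}")

# Hypothetical usage with 5 seeds (numbers are placeholders):
compare(np.array([71.2, 70.8, 71.5, 70.9, 71.1]),
        np.array([65.0, 64.2, 65.7, 64.8, 65.3]))
```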
Circularity Check
No circularity: empirical bandit integration with independent experimental validation
Full rationale
The paper presents Fed-BAC as an algorithmic method that combines contextual bandits at the cloud level with Thompson Sampling at edge servers for client/cluster selection, plus additive decomposition for personalization. All performance claims (+35.5pp accuracy, faster convergence, fairness at 80% participation) are supported solely by empirical results on CIFAR-10, SVHN, and Fashion-MNIST under Dirichlet non-IID partitions. No equations, first-principles derivations, or predictions are shown that reduce by construction to fitted inputs or self-citations. The method is described as a practical integration of existing bandit techniques with HFL, without load-bearing self-citation chains or self-definitional steps. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · status: unclear · matched text: additive cluster personalization ... ŷ = h(x; Θ_global) + f(x; Θ_k)
Reference graph
Works this paper leans on
- [1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics, pp. 1273–1282, PMLR, 2017.
- [2] P. Kairouz, H. B. McMahan, B. Avent, et al., "Advances and open problems in federated learning," Foundations and Trends in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021.
- [3] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, "Federated optimization in heterogeneous networks," in Proceedings of Machine Learning and Systems, vol. 2, pp. 429–450, 2020.
- [4] L. Liu, J. Zhang, S. Song, and K. B. Letaief, "Client-edge-cloud hierarchical federated learning," in IEEE International Conference on Communications (ICC), pp. 1–6, IEEE, 2020.
- [5] M. S. H. Abad, E. Ozfatura, D. Gündüz, and O. Ercetin, "Hierarchical federated learning across heterogeneous cellular networks," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8866–8870, IEEE, 2020.
- [6] A. Wu, Y. Sun, Y. Zhu, J. Gu, J. Yang, and F. Wei, "Topology-aware federated learning in edge computing: A comprehensive survey," ACM Computing Surveys, vol. 56, no. 12, pp. 1–40, 2024.
- [7] J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor, "Demystifying why local aggregation helps: Convergence analysis of hierarchical SGD," Advances in Neural Information Processing Systems, vol. 35, pp. 13251–13265, 2022.
- [8] C. Wu, F. Wu, L. Lyu, Y. Huang, and X. Xie, "Communication-efficient federated learning via knowledge distillation," Nature Communications, vol. 13, no. 1, p. 2032, 2022.
- [9] Z. Zhu, J. Hong, and J. Zhou, "Data-free knowledge distillation for heterogeneous federated learning," in International Conference on Machine Learning, pp. 12878–12889, PMLR, 2021.
- [10] D. Song, A. Zhou, W. Sun, B. Li, and W. Li, "Fast heterogeneous federated learning with hybrid client selection," in Uncertainty in Artificial Intelligence, pp. 2006–2015, PMLR, 2023.
- [11] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, "Federated learning with non-IID data," in NeurIPS Workshop on Federated Learning, 2018.
- [12] K. Hsieh, A. Phanishayee, O. Mutlu, and P. Gibbons, "The non-IID data quagmire of decentralized machine learning," in International Conference on Machine Learning, pp. 4387–4398, PMLR, 2020.
- [13] A. Ghosh, J. Chung, D. Yin, and K. Ramchandran, "An efficient framework for clustered federated learning," in Advances in Neural Information Processing Systems, vol. 33, pp. 19586–19597, 2020.
- [14] F. Sattler, K.-R. Müller, and W. Samek, "Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 8, pp. 3710–3722, 2021.
- [15] Y. J. Cho, J. Wang, and G. Joshi, "Towards understanding biased client selection in federated learning," in International Conference on Artificial Intelligence and Statistics, pp. 10351–10375, PMLR, 2022.
- [16] F. Lai, X. Zhu, H. V. Madhyastha, and M. Chowdhury, "Oort: Efficient federated learning via guided participant selection," in USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 19–35, 2021.
- [17] W. Xia, T. Q. Quek, K. Guo, W. Wen, H. H. Yang, and H. Zhu, "Multi-armed bandit-based client scheduling for federated learning," IEEE Transactions on Wireless Communications, vol. 19, no. 11, pp. 7108–7123, 2020.
- [18] T. Li, S. Hu, A. Beirami, and V. Smith, "Ditto: Fair and robust federated learning through personalization," in International Conference on Machine Learning, pp. 6357–6368, PMLR, 2021.
- [19] Y. Deng, M. M. Kamani, and M. Mahdavi, "Adaptive personalized federated learning," arXiv preprint arXiv:2003.13461, 2020.
- [20] A. Z. Tan, H. Yu, L. Cui, and Q. Yang, "Towards personalized federated learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 12, pp. 9587–9603, 2022.
- [21] J. Ma, T. Zhou, G. Long, J. Jiang, and C. Zhang, "Structured federated learning through clustered additive modeling," in Advances in Neural Information Processing Systems, vol. 36, 2023.
- [22] L. Li, W. Chu, J. Langford, and R. E. Schapire, "A contextual-bandit approach to personalized news article recommendation," in Proceedings of the 19th International Conference on World Wide Web, pp. 661–670, 2010.
- [23] D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, "A tutorial on Thompson sampling," Foundations and Trends in Machine Learning, vol. 11, no. 1, pp. 1–96, 2018.
- [24] Y. J. Cho, S. Gupta, G. Joshi, and O. Yağan, "Bandit-based communication-efficient client selection strategies for federated learning," in 54th Asilomar Conference on Signals, Systems, and Computers, pp. 1066–1069, IEEE, 2020.
- [25] Z. Qu, R. Duan, L. Chen, J. Xu, Z. Lu, and Y. Liu, "Context-aware online client selection for hierarchical federated learning," IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 12, pp. 4353–4367, 2022.
- [26] S. Trindade and N. da Fonseca, "Client selection in hierarchical federated learning," IEEE Internet of Things Journal, 2024.
- [27] L. Song, J. Li, H. Jiang, S. Wei, and Y. Guo, "Chpfl: Clustered adaptive hierarchical federated learning for edge-level personalization," High-Confidence Computing, vol. 5, no. 3, p. 100279, 2025.
- [28] S. Lee, O. Tavallaie, S. Chen, K. Thilakarathna, S. Seneviratne, A. N. Toosi, and A. Y. Zomaya, "Personalizing federated learning for hierarchical edge networks with non-IID data," arXiv preprint arXiv:2504.08872, 2025.
- [29] C. Briggs, Z. Fan, and P. Andras, "Federated learning with hierarchical clustering of local updates to improve training on non-IID data," in IEEE International Joint Conference on Neural Networks (IJCNN), pp. 1–9, IEEE, 2020.
- [30] A. Licciardi, D. Leo, E. Fanì, B. Caputo, and M. Ciccone, "Interaction-aware Gaussian weighting for clustered federated learning," in International Conference on Machine Learning (ICML), 2025.
- [31] S. Ahmad et al., "Clustered federated learning with hierarchical knowledge distillation," arXiv preprint arXiv:2512.10443, 2025.
- [32] L. Collins, H. Hassani, A. Mokhtari, and S. Shakkottai, "Exploiting shared representations for personalized federated learning," in International Conference on Machine Learning, pp. 2089–2099, PMLR, 2021.
- [33] T. Lu et al., "FedCSAD: Federated learning with contextual client selection and confidence-weighted multi-teacher knowledge distillation in power equipment inspection," in Algorithms and Architectures for Parallel Processing (ICA3PP), vol. 16382 of Lecture Notes in Computer Science, Springer, 2026.
- [34] T.-M. H. Hsu, H. Qi, and M. Brown, "Measuring the effects of non-identical data distribution for federated visual classification," in NeurIPS Workshop on Federated Learning, 2019.
- [35] S. Agrawal and N. Goyal, "Analysis of Thompson sampling for the multi-armed bandit problem," in Conference on Learning Theory, pp. 39.1–39.26, JMLR, 2012.
- [36] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," tech. rep., University of Toronto, 2009.
- [37] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
- [38] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms," arXiv preprint arXiv:1708.07747, 2017.