
arXiv: 2606.07621 · v1 · pith:EENY2EUJ · submitted 2026-05-30 · 💻 cs.LG · cs.AI · cs.DC

HASA: Subnet Allocation for Compute-Constrained Model-Heterogeneous Federated Learning

Pith reviewed 2026-06-28 19:28 UTC · model grok-4.3

classification: 💻 cs.LG · cs.AI · cs.DC
keywords: federated learning · model heterogeneity · subnet allocation · heterogeneity-aware · compute-constrained · client personalization · edge devices · next-word prediction

The pith

In federated learning, allocating wider subnets to clients with higher data heterogeneity raises mean client accuracy from 13.82% to 14.32% over uniform allocation on a seven-client next-word prediction benchmark, and improves worst-client accuracy, under a fixed compute budget.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HASA, which allocates differently sized subnets of a shared model to federated learning clients based on how heterogeneous their local data is. Total compute cost is held fixed across allocation policies so that comparisons are fair. Experiments on a next-word prediction task show higher average client accuracy and better results for the worst-performing clients than uniform allocation or other baselines. An ablation confirms that the direction of allocation matters: reversing it hurts performance. A second study, on image classification, indicates that success hinges on the score correctly identifying clients that benefit from wider models.

Core claim

HASA is a train-only rule that assigns subnet widths based on client heterogeneity scores computed from local training data while enforcing a fixed size-weighted compute budget. On an article-title next-word prediction benchmark with seven clients, HASA improves unweighted mean client test accuracy over uniform allocation across 10 matched seeds, increasing mean client test accuracy from 13.82 percent to 14.32 percent, and improves worst-client accuracy on average. In a matched-budget comparison with representative partial-training baselines, HASA achieves the strongest worst-client and tail-client accuracy on this benchmark.

What carries the argument

The HASA allocation rule, which computes a heterogeneity score from each client's local data to decide its subnet width while holding total compute fixed.
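A minimal sketch of a budget-matched allocation rule of this kind follows. The abstract gives neither HASA's score-to-width mapping nor its cost model, so everything here is an assumption: the hypothetical function `allocate_widths`, the linear mapping from normalized scores to width preferences, and a cost model in which a client with `n` local samples training at width multiplier `w` costs `n * w**2` (quadratic because width scaling shrinks both the input and output dimensions of dense layers). A single scale factor is found by bisection so that the total cost equals that of giving every client the same `uniform_width`; `reverse=True` mimics the directionality ablation.

```python
from typing import Sequence


def allocate_widths(
    scores: Sequence[float],
    sizes: Sequence[int],
    uniform_width: float = 0.5,
    min_ratio: float = 0.25,
    reverse: bool = False,
) -> list[float]:
    """Hypothetical score-to-width mapping under a fixed size-weighted budget.

    Assumed cost model: client k costs sizes[k] * width_k**2, and the budget is
    the cost of giving every client `uniform_width` (0 < uniform_width <= 1).
    """
    if len(scores) != len(sizes) or not scores:
        raise ValueError("scores and sizes must be non-empty and equal length")
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    norm = [(s - lo) / span for s in scores]  # normalized to [0, 1]
    if reverse:  # directionality ablation: heterogeneous clients get less width
        norm = [1.0 - v for v in norm]
    # Relative width preference in [min_ratio, 1]
    pref = [min_ratio + (1.0 - min_ratio) * v for v in norm]

    budget = sum(n * uniform_width ** 2 for n in sizes)

    def cost(scale: float) -> float:
        return sum(n * min(1.0, scale * p) ** 2 for n, p in zip(sizes, pref))

    # Bisection on a global scale; at 1 / min_ratio every width is clipped to 1
    lo_c, hi_c = 0.0, 1.0 / min_ratio
    for _ in range(100):
        mid = (lo_c + hi_c) / 2
        if cost(mid) < budget:
            lo_c = mid
        else:
            hi_c = mid
    scale = (lo_c + hi_c) / 2
    return [min(1.0, scale * p) for p in pref]
```

For example, `allocate_widths([0.1, 0.5, 0.9], [100, 100, 100])` returns widths that increase with the score while costing as much as three width-0.5 subnets; equal scores fall back to uniform allocation. Continuous widths are a simplification, since many subnet-based federated learning systems use a small set of discrete width levels.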

If this is right

  • Mean client test accuracy rises by half a percentage point over uniform allocation.
  • Worst-client and tail-client accuracies become the highest among matched-budget partial-training methods.
  • Reversing the allocation direction, so heterogeneous clients receive smaller subnets, reduces both mean and tail performance.
  • The gains hold in a cross-domain image-classification setting only when the score aligns with the need for extra width.
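The client-level metrics behind these bullets can be made concrete with a short sketch. `client_accuracy_summary` and its `tail_k` parameter are hypothetical: the abstract does not define "tail-client accuracy", so the mean of the `tail_k` lowest client accuracies is used here as one plausible reading. The last lines restate the reported change from 13.82% to 14.32% in absolute and relative terms.

```python
from statistics import mean


def client_accuracy_summary(accs: list[float], tail_k: int = 2) -> dict[str, float]:
    """Unweighted mean, worst-client, and (assumed) tail accuracy."""
    if not 1 <= tail_k <= len(accs):
        raise ValueError("tail_k must be between 1 and the number of clients")
    ranked = sorted(accs)
    return {
        "mean": mean(accs),             # every client counts equally
        "worst": ranked[0],             # lowest single-client accuracy
        "tail": mean(ranked[:tail_k]),  # mean of the tail_k lowest clients
    }


# Reported headline change (percent accuracy), absolute and relative
baseline, hasa = 13.82, 14.32
abs_gain_pp = hasa - baseline       # percentage points
rel_gain = abs_gain_pp / baseline   # fraction of the baseline accuracy
print(f"{abs_gain_pp:.2f} pp, {rel_gain:.1%} relative")  # prints: 0.50 pp, 3.6% relative
```

At this accuracy level, a 0.5 pp gain is roughly a 3.6% relative improvement.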

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could extend to dynamic client pools if heterogeneity scores are recomputed periodically from recent local batches.
  • Combining the allocation rule with client-specific fine-tuning after subnet training might further close the gap to full-model personalization.
  • Because the score is computed from local data alone, the method could apply directly to privacy-sensitive domains where central data inspection is impossible.

Load-bearing premise

The heterogeneity score must accurately reflect each client's need for additional model width.
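One way to probe this premise, along the lines of the referee's second major comment, is a rank correlation between each client's heterogeneity score and a direct measure of its need for width, for example the accuracy a client gains when trained at full width rather than on a narrow subnet. That measure and the helper names below are assumptions for illustration. The sketch computes Spearman's rho as the Pearson correlation of average ranks, so ties are handled; with only seven clients, even a sizable rho is weak evidence on its own.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5
```

A clearly positive rho between scores and width gains would support the premise; a rho near zero would suggest the score is not tracking need for width.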

What would settle it

An experiment on the same benchmark in which replacing the heterogeneity score with random values still yields the reported accuracy gains would show the score is not driving the improvement.
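This control can be sketched as a randomization check. Drawing the "random values" as a random permutation of the real scores is one choice among several (independent uniform draws would be another); permuting keeps the set of score values fixed and only breaks the link between a client and its score, while the allocation rule still enforces the compute budget. `train_and_eval` is a hypothetical callable standing in for a full federated training run that returns mean client test accuracy, and the returned p-value is the usual Monte Carlo estimate (count + 1) / (trials + 1).

```python
import random
from typing import Callable, Sequence


def random_score_control(
    scores: Sequence[float],
    train_and_eval: Callable[[list[float]], float],
    n_trials: int = 20,
    seed: int = 0,
) -> tuple[float, list[float], float]:
    """Return (observed metric, control metrics, Monte Carlo p-value)."""
    if n_trials < 1:
        raise ValueError("n_trials must be positive")
    rng = random.Random(seed)
    observed = train_and_eval(list(scores))
    controls: list[float] = []
    for _ in range(n_trials):
        shuffled = list(scores)
        rng.shuffle(shuffled)  # same score values, random client assignment
        controls.append(train_and_eval(shuffled))
    # Share of controls at least as good as the real scores, with +1 smoothing
    p_value = (sum(c >= observed for c in controls) + 1) / (n_trials + 1)
    return observed, controls, p_value
```

If shuffled scores routinely match the real ones, the score is not what drives the gain. Each trial is a full training run, so in practice `n_trials` is bounded by compute.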

Figures

Figures reproduced from arXiv: 2606.07621 by Ahmed M. Abdelmoniem, Amir Hossein Shahdadian, Christian Herglotz, Mahdi Taheri, Samira Nazari.

Figure 2: Overview of HASA: train-only heterogeneity scoring, budgeted width mapping, and federated training with client-specific subnets.
Figure 3: Client-level test accuracy on the article-title benchmark over 10
Original abstract

Edge services increasingly use federated learning to personalize on-device models while keeping sensitive data local. In practice, deployments must handle heterogeneity in both client resources and local data distributions. Model-heterogeneous federated learning lowers client cost by allowing each client to train a subnet of a shared supernet, but most subnet-allocation policies are driven by device constraints and do not explicitly account for statistical heterogeneity. This paper proposes Heterogeneity-Aware Subnet Allocation (HASA), a train-only rule that assigns subnet widths based on client heterogeneity scores computed from local training data while enforcing a fixed size-weighted compute budget. This design enables budget-matched comparisons with alternative allocation policies. On an article-title next-word prediction benchmark with seven clients, HASA improves unweighted mean client test accuracy over uniform allocation across 10 matched seeds, increasing mean client test accuracy from 13.82 percent to 14.32 percent, and improves worst-client accuracy on average. In a matched-budget comparison with representative partial-training baselines, HASA achieves the strongest worst-client and tail-client accuracy on this benchmark. A directionality ablation shows that assigning smaller subnets to more heterogeneous clients degrades both mean and tail performance. A cross-domain image-classification study further shows that the effectiveness of heterogeneity-aware allocation depends on how well the heterogeneity score reflects clients' need for additional model width.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Heterogeneity-Aware Subnet Allocation (HASA), a train-only rule that assigns subnet widths in model-heterogeneous federated learning according to client heterogeneity scores computed from local data while enforcing a fixed size-weighted compute budget. On a seven-client article-title next-word prediction benchmark it reports an increase in unweighted mean client test accuracy from 13.82% to 14.32% versus uniform allocation across ten matched seeds, together with improved worst-client and tail-client accuracy relative to representative partial-training baselines. A directionality ablation and a cross-domain image-classification study are included; the latter is presented as evidence that effectiveness depends on how well the heterogeneity score tracks clients' need for additional model width.

Significance. If the heterogeneity score reliably tracks clients' capacity demand, the budget-matched design would allow principled allocation of limited compute in statistically heterogeneous FL deployments and could improve tail performance without raising total cost. The explicit statement that effectiveness hinges on score quality and the use of matched-budget comparisons are strengths that facilitate future verification.

major comments (2)
  1. [Abstract] The central performance claims (a mean accuracy lift of 0.5 pp and the strongest worst- and tail-client accuracy) rest on a single benchmark with only seven clients and ten seeds. No error bars, statistical tests, or descriptions of how the heterogeneity score is computed or how data and seeds were selected are supplied, yet these details are load-bearing for attributing the gains to the allocation rule rather than to sampling variance.
  2. [Abstract] The directionality ablation tests only the sign of the allocation (smaller subnets to more heterogeneous clients) but supplies no direct evidence that the heterogeneity score correlates with clients' actual need for model width (e.g., via correlation with data-complexity or optimization-difficulty metrics), even though the paper itself states that effectiveness depends on this correlation.
minor comments (1)
  1. [Abstract] The abstract does not give the formula or a precise definition of the heterogeneity score; stating it would improve the reproducibility of the allocation rule.
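For the first major comment, a paired test fits the design because the 10 seeds are matched: each seed yields one HASA run and one uniform-allocation run. The sketch below is an exact two-sided sign-flip permutation test on the per-seed differences in mean client accuracy; choosing it over a paired t-test or a Wilcoxon signed-rank test is an assumption, and the per-seed numbers themselves are not reported on this page. With 10 seeds there are only 2**10 = 1024 sign patterns, so full enumeration is cheap.

```python
from itertools import product


def paired_sign_flip_pvalue(a: list[float], b: list[float]) -> float:
    """Exact two-sided sign-flip test on paired differences a[i] - b[i].

    Under the null of no difference, each paired difference is equally
    likely to carry either sign, so all 2**n sign patterns are enumerated.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need equal-length, non-empty paired results")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    extreme = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # tolerance for floating-point round-off
            extreme += 1
    return extreme / 2 ** n
```

If HASA beat uniform allocation on every seed by the same margin, the smallest attainable p-value would be 2/1024, about 0.002.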

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and the strength of supporting evidence. We will revise the abstract to incorporate additional details on the experimental setup and clarify the role of the ablations and cross-domain study. The responses below address each major comment.

Point-by-point responses
  1. Referee: [Abstract] The central performance claims (a mean accuracy lift of 0.5 pp and the strongest worst- and tail-client accuracy) rest on a single benchmark with only seven clients and ten seeds. No error bars, statistical tests, or descriptions of how the heterogeneity score is computed or how data and seeds were selected are supplied, yet these details are load-bearing for attributing the gains to the allocation rule rather than to sampling variance.

    Authors: We agree the abstract is concise and will expand it in revision to briefly describe the heterogeneity score computation from local data and to note that all results are averaged over 10 matched seeds with the same data partitioning. Error bars will be referenced (they appear in the main figures) and we will add a short statement on the benchmark scale as a limitation. The seven-client setting was selected to permit fine-grained per-client analysis under controlled conditions; we do not claim broad generalizability from this benchmark alone. revision: yes

  2. Referee: [Abstract] The directionality ablation tests only the sign of the allocation (smaller subnets to more heterogeneous clients) but supplies no direct evidence that the heterogeneity score correlates with clients' actual need for model width (e.g., via correlation with data-complexity or optimization-difficulty metrics), even though the paper itself states that effectiveness depends on this correlation.

    Authors: The directionality ablation shows that reversing the allocation rule harms both mean and tail performance, which supports the chosen sign. We acknowledge it does not include explicit correlation coefficients with data-complexity metrics. The cross-domain image-classification experiment is presented precisely to illustrate that gains appear only when the heterogeneity score tracks clients' need for width; we will revise the abstract to make this linkage more explicit and to note that the score's validity is evidenced by the conditional effectiveness across domains rather than by the ablation alone. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation or claims

Full rationale

The paper defines HASA as an allocation rule that computes heterogeneity scores directly from local training data and assigns subnet widths under a fixed compute budget. Reported gains (13.82% to 14.32% mean accuracy) are empirical measurements on held-out test data across seeds, with directionality ablation and cross-domain checks. No equations, predictions, or first-principles results reduce to fitted inputs by construction; the score-to-width mapping is a deterministic function of externally computed inputs, not self-referential. No self-citation chains or uniqueness theorems are invoked to justify the central mechanism. The load-bearing assumption (score quality) is stated explicitly as an empirical precondition rather than derived internally.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The heterogeneity score itself is an implicit modeling choice whose definition and validation are not supplied.
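To make the missing definition concrete, here is one hypothetical heterogeneity score; it is not the paper's score, which is not given. It measures the Jensen-Shannon divergence between a client's local label distribution (next words, for the next-word prediction task) and a reference distribution. The reference is itself an assumption: it could be uniform over the label set or an aggregate histogram shared by the server. Scores lie in [0, ln 2], with 0 meaning the client's labels match the reference exactly.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Mapping


def js_divergence(p: Mapping[Hashable, float], q: Mapping[Hashable, float]) -> float:
    """Jensen-Shannon divergence in nats, bounded in [0, ln 2]."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a: Mapping[Hashable, float]) -> float:
        # m[k] >= a[k] / 2 > 0 wherever a[k] > 0, so the log is well defined
        return sum(a[k] * math.log(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def label_distribution(labels: Iterable[Hashable]) -> dict[Hashable, float]:
    """Empirical label frequencies from a client's local training data."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("client has no training labels")
    return {k: c / total for k, c in counts.items()}


def heterogeneity_score(
    local_labels: Iterable[Hashable], reference: Mapping[Hashable, float]
) -> float:
    """Hypothetical score: divergence of local labels from the reference."""
    return js_divergence(label_distribution(local_labels), reference)
```

A score of this form could feed a width-allocation rule directly; whether it tracks a client's need for extra width is exactly the empirical question the referee raises.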

pith-pipeline@v0.9.1-grok · 5794 in / 1156 out tokens · 19292 ms · 2026-06-28T19:28:30.999404+00:00 · methodology

