FedeKD: Energy-Based Gating for Robust Federated Knowledge Distillation under Heterogeneous Settings
Pith reviewed 2026-05-08 15:01 UTC · model grok-4.3
The pith
FedeKD uses energy-based gating to turn private-proxy disagreement into per-sample trust weights for knowledge transfer.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FedeKD maintains high-capacity private models locally and lightweight proxy models that are aggregated globally. The energy-based gating mechanism computes trust weights from the disagreement between private and proxy predictions on each sample, allowing the distillation loss to be weighted so that reliable samples receive more guidance from the proxy while unreliable ones are downweighted. This sample-wise approach reduces negative transfer under heterogeneous client conditions while preserving strong predictive performance.
What carries the argument
The energy-based gating mechanism, which converts the disagreement between a client's private model and the aggregated global proxy into a per-sample trust weight that scales the proxy's contribution to the private model's update.
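The gating idea can be sketched as follows. This is a minimal illustration, not the paper's formulation: the abstract does not give the energy function, so the choice of KL divergence as the disagreement energy, the exponential mapping to a weight, and the names `trust_weight`, `gated_distillation_loss`, and `beta` are all assumptions made here for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def trust_weight(private_logits, proxy_logits, beta=1.0):
    # ASSUMPTION: energy = KL(private || proxy), weight = exp(-beta * energy).
    # The paper's actual energy function may differ.
    energy = max(0.0, kl(softmax(private_logits), softmax(proxy_logits)))
    return math.exp(-beta * energy)

def gated_distillation_loss(private_batch, proxy_batch, beta=1.0):
    """Mean over the batch of w_i * KL(proxy_i || private_i)."""
    total = 0.0
    for priv, prox in zip(private_batch, proxy_batch):
        w = trust_weight(priv, prox, beta)  # per-sample trust in the proxy
        total += w * kl(softmax(prox), softmax(priv))
    return total / len(private_batch)

# Identical predictions give full trust; opposite predictions give low trust.
agree = trust_weight([2.0, 0.0, -1.0], [2.0, 0.0, -1.0])
disagree = trust_weight([2.0, 0.0, -1.0], [-1.0, 0.0, 2.0])
```

Under this assumed form, `beta` controls how sharply trust decays with disagreement; `beta = 0` recovers uniform weighting, which is the natural ablation baseline.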
Load-bearing premise
The amount of disagreement between a private model and the aggregated proxy on a given sample is a reliable signal of whether the proxy's knowledge is trustworthy for that sample.
What would settle it
An ablation showing that the samples with high private-proxy disagreement are precisely those where the proxy is correct, and that performance drops when those samples receive low trust weights rather than uniform weights.
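One way such a check could be run is sketched below, assuming per-sample disagreement scores and proxy-correctness labels are available on an evaluation split. The function name `proxy_accuracy_by_disagreement` and the median split are illustrative choices, not part of the paper.

```python
def proxy_accuracy_by_disagreement(disagreement, proxy_correct):
    """Split samples at the median disagreement and return the proxy's
    accuracy on the (low-disagreement, high-disagreement) halves."""
    order = sorted(range(len(disagreement)), key=lambda i: disagreement[i])
    half = len(order) // 2
    low, high = order[:half], order[half:]

    def accuracy(indices):
        return sum(proxy_correct[i] for i in indices) / len(indices)

    return accuracy(low), accuracy(high)

# Toy data (hypothetical): 1 means the proxy predicted the sample correctly.
low_acc, high_acc = proxy_accuracy_by_disagreement(
    [0.1, 0.9, 0.2, 0.8], [1, 0, 1, 1]
)
```

If the proxy's accuracy on the high-disagreement half were as high as or higher than on the low-disagreement half, the premise that disagreement signals unreliability would be in doubt.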
Original abstract
Federated learning (FL) operates in heterogeneous environments, where variations in data distributions and asymmetric model design often result in negative transfer. While federated knowledge distillation (FKD) avoids direct model parameter sharing, existing methods typically rely on public datasets or assume that transferred knowledge is uniformly reliable, which limits their robustness in practice. This paper presents FedeKD, a reliability-aware FKD framework that makes sample-wise trust estimation an explicit component of knowledge transfer, without relying on additional public data. Each client maintains a high-capacity private model for local learning and a lightweight shared proxy model for cross-client knowledge exchange. During training, proxy models are aggregated on the server to form a global proxy, which is then used to guide updates of the private models. At the core of FedeKD is an energy-based gating mechanism that converts task-specific private-proxy disagreement into sample-wise trust weights for backward distillation. This mechanism enables sample-wise weighting of knowledge transfer, where the proxy model contributes more to reliable samples while down-weighting unreliable ones. Extensive experiments on six real-world datasets demonstrate that FedeKD significantly reduces negative transfer under heterogeneous settings while maintaining strong predictive performance.
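The server-side step in the abstract (aggregating client proxies into a global proxy) can be illustrated with a short sketch. The abstract does not state the aggregation rule, so the sample-size-weighted average below (in the style of FedAvg) and the name `aggregate_proxies` are assumptions; proxy parameters are represented as flat lists of floats for simplicity.

```python
def aggregate_proxies(client_params, client_sizes):
    """Weighted average of client proxy parameters.

    ASSUMPTION: weights are proportional to local dataset size;
    the paper may use a different rule.
    """
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical clients holding 1 and 3 samples respectively.
global_proxy = aggregate_proxies([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the lightweight proxy parameters would pass through this step; the private models stay on the clients.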
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes FedeKD, a federated knowledge distillation framework for heterogeneous settings that avoids public data by maintaining per-client high-capacity private models and lightweight shared proxy models. Proxies are aggregated server-side into a global proxy; an energy-based gating mechanism then converts per-sample disagreement between the local private model and global proxy into trust weights that modulate backward distillation from proxy to private model, with the goal of down-weighting unreliable samples and thereby reducing negative transfer.
Significance. If the gating mechanism reliably distinguishes sample trustworthiness from mere heterogeneity-induced divergence, the method would offer a practical, public-data-free route to robust FKD. The six-dataset experimental claim is potentially impactful for real-world FL deployments, but its significance hinges on whether the reported gains are attributable to the energy-based weighting rather than to the private/proxy architecture alone.
major comments (2)
- [§3] §3 (Energy-based gating): The central claim that private-proxy disagreement is a reliable proxy for sample unreliability is load-bearing for the entire weighting scheme, yet the manuscript provides no external calibration (held-out labels, synthetic noise injection, or public anchor set) to validate this mapping. In non-IID regimes the global proxy is expected to diverge precisely on the client-specific samples; treating that divergence as low trust risks systematically suppressing the very data that distinguishes the client, which would undermine rather than enhance robustness.
- [§4] §4 (Experiments): The abstract asserts that FedeKD “significantly reduces negative transfer” across six real-world datasets, but the reported results lack (i) a clear list of baselines with their hyper-parameter budgets, (ii) statistical significance tests or error bars across multiple random seeds, and (iii) an ablation isolating the contribution of the energy gate versus the private/proxy split itself. Without these, it is impossible to determine whether the claimed gains are reproducible or attributable to the proposed mechanism.
minor comments (2)
- [§3] Notation for the energy function and the resulting trust weights should be introduced with explicit equations and a short derivation showing how the scalar temperature or threshold parameters are chosen (or shown to be insensitive).
- [§4] Figure captions and table headers should explicitly state the heterogeneity level (Dirichlet α or label skew) used in each experiment so that readers can map results to the claimed robustness regime.
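For readers unfamiliar with the Dirichlet α convention mentioned in the last comment, the sketch below shows one common way label-skewed client splits are generated. It is a generic illustration using only the standard library, not the paper's partitioning code; `dirichlet_partition` and its parameters are hypothetical names.

```python
import random

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients with per-class Dirichlet(alpha) proportions.
    Smaller alpha produces stronger label skew."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Normalised Gamma(alpha, 1) draws form a Dirichlet(alpha) sample.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        if total == 0.0:  # guard against underflow for very small alpha
            props = [1.0 / num_clients] * num_clients
        else:
            props = [g / total for g in gammas]
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            if c == num_clients - 1:
                end = len(idxs)  # last client takes the remainder
            else:
                end = max(start, min(len(idxs), round(cum * len(idxs))))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

labels = [i % 10 for i in range(1000)]  # toy labels, 10 balanced classes
parts = dirichlet_partition(labels, num_clients=5, alpha=0.1, seed=42)
```

Stating the value of `alpha` (or the label-skew rule) next to each result is what lets a reader map a table to a heterogeneity regime.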
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below, providing clarifications and committing to specific revisions that strengthen the validation of the gating mechanism and the experimental reporting.
- Referee: [§3] §3 (Energy-based gating): The central claim that private-proxy disagreement is a reliable proxy for sample unreliability is load-bearing for the entire weighting scheme, yet the manuscript provides no external calibration (held-out labels, synthetic noise injection, or public anchor set) to validate this mapping. In non-IID regimes the global proxy is expected to diverge precisely on the client-specific samples; treating that divergence as low trust risks systematically suppressing the very data that distinguishes the client, which would undermine rather than enhance robustness.
  Authors: We appreciate the referee's emphasis on validating the core assumption of the energy-based gating. The mechanism uses the energy score derived from private-proxy output disagreement to modulate distillation weights, with the private model continuing to train unweighted on all local data; this separation ensures client-specific samples are not suppressed in the primary learning process. While the current manuscript does not include external calibration (as the setting precludes public data or held-out labels), the design is motivated by the observation that proxy divergence often signals unreliable transferred knowledge rather than pure heterogeneity. We will revise §3 to include a more explicit theoretical derivation of the energy function and add synthetic experiments injecting controlled label noise to empirically correlate disagreement levels with known unreliability. This is a partial revision, as we cannot add post-hoc external anchors to the existing real-world datasets but can provide supporting analysis.
  revision: partial
- Referee: [§4] §4 (Experiments): The abstract asserts that FedeKD “significantly reduces negative transfer” across six real-world datasets, but the reported results lack (i) a clear list of baselines with their hyper-parameter budgets, (ii) statistical significance tests or error bars across multiple random seeds, and (iii) an ablation isolating the contribution of the energy gate versus the private/proxy split itself. Without these, it is impossible to determine whether the claimed gains are reproducible or attributable to the proposed mechanism.
  Authors: We agree that the experimental presentation requires these enhancements for reproducibility and attribution. In the revised manuscript we will: (i) add a table enumerating all baselines with their exact hyper-parameter settings and resource budgets, (ii) report all metrics as mean ± standard deviation over at least five random seeds together with appropriate statistical significance tests, and (iii) introduce an ablation study that compares the full FedeKD against a uniform-weighting variant (no energy gate) and against the private/proxy architecture without any distillation. These additions will directly isolate the gating contribution and address the referee's concern about attributing gains to the proposed mechanism.
  revision: yes
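The reporting promised in item (ii) could look like the sketch below. The accuracy values are made-up placeholders, not results from the paper, and the exact paired sign-flip permutation test is one reasonable choice among several; the authors do not say which test they will use.

```python
import itertools
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation) across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_sign_flip_pvalue(a, b):
    """Exact two-sided sign-flip permutation test on paired per-seed scores."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total

# HYPOTHETICAL per-seed accuracies for a gated run and a uniform-weighting ablation.
gated = [0.80, 0.82, 0.81, 0.83, 0.79]
uniform = [0.78, 0.79, 0.80, 0.80, 0.77]
mean_gated, std_gated = summarize(gated)
p_value = paired_sign_flip_pvalue(gated, uniform)
```

With exactly five seeds this test cannot return a two-sided p-value below 2/32 = 0.0625, even when every seed favours one method, so "at least five" seeds may need to be more than five if a 0.05 threshold is intended.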
Circularity Check
No significant circularity; method is a defined design choice
The paper introduces FedeKD as a new framework whose core energy-based gating is explicitly constructed to map private-proxy disagreement to sample-wise trust weights. This is a modeling decision rather than a derivation that reduces to its own inputs by construction. No load-bearing self-citations, fitted parameters renamed as predictions, or uniqueness theorems imported from prior author work are present in the provided abstract or description. The mechanism is self-contained as an ansatz for weighting, with experimental validation on external datasets serving as the independent check. Tunable scalars in the energy function, if any, represent standard hyperparameter choices rather than circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Proxy model aggregation on the server produces a globally useful knowledge signal for local private models.