FedVSSAM: Mitigating Flatness Incompatibility in Sharpness-Aware Federated Learning
Pith reviewed 2026-05-12 04:47 UTC · model grok-4.3 · Recognition: 2 Lean theorem links
The pith
FedVSSAM mitigates flatness incompatibility in sharpness-aware federated learning by anchoring local searches to a variance-suppressed global direction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that flatness incompatibility arises from data heterogeneity and the friendly adversary phenomenon and is amplified by local updates and partial device participation. FedVSSAM counters this by constructing a variance-suppressed adjusted direction and applying it uniformly in local flatness search, local descent, and global aggregation, thereby anchoring both perturbation and update steps to a stable global direction rather than purely local signals. The method supplies non-convex convergence guarantees and proves that the mean-square deviation between the adjusted direction and the global gradient remains controlled.
What carries the argument
The variance-suppressed adjusted direction, which blends local SAM perturbations with global gradient information to suppress variance and align local flatness searches with the global objective.
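As a concrete sketch of this mechanism, the snippet below blends a server-broadcast global direction into both the SAM perturbation and the descent step. The blend rule, the name `global_dir`, and the coefficient `beta` are illustrative assumptions (in the spirit of variance suppression [31] and client-level momentum [58]); the paper's exact construction is not quoted in this review.

```python
import torch

def fedvssam_local_step(model, loss_fn, batch, global_dir,
                        beta=0.9, rho=0.05, lr=0.01):
    # Hypothetical FedVSSAM-style step: one adjusted direction is used for the
    # flatness search, and its perturbed-gradient analogue for the descent.
    x, y = batch
    params = [p for p in model.parameters() if p.requires_grad]

    # (1) Local stochastic gradient on the current batch.
    grads = torch.autograd.grad(loss_fn(model(x), y), params)

    # (2) Variance-suppressed adjusted direction: pull the noisy local gradient
    #     toward the shared global direction (assumed convex blend).
    adj = [beta * d + (1 - beta) * g for d, g in zip(global_dir, grads)]
    norm = torch.sqrt(sum((a ** 2).sum() for a in adj)) + 1e-12

    # (3) Local flatness search: perturb along the adjusted direction.
    with torch.no_grad():
        for p, a in zip(params, adj):
            p.add_(rho * a / norm)

    # (4) Gradient at the perturbed point, blended the same way.
    grads_p = torch.autograd.grad(loss_fn(model(x), y), params)
    adj_p = [beta * d + (1 - beta) * g for d, g in zip(global_dir, grads_p)]

    # (5) Undo the perturbation, then descend along the adjusted direction.
    with torch.no_grad():
        for p, a, ap in zip(params, adj, adj_p):
            p.sub_(rho * a / norm)
            p.sub_(lr * ap)
```

The server would then aggregate client updates and refresh `global_dir` each round; reusing the same blended direction in steps (3) and (5) is what the review means by anchoring.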
If this is right
- Non-convex convergence guarantees hold for FedVSSAM.
- The mean-square deviation between the adjusted direction and the global gradient is provably bounded.
- The method outperforms standard SAM and other baselines across diverse federated settings with varying heterogeneity and participation rates.
- Consistent use of the adjusted direction in perturbation, descent, and aggregation steps directly reduces the identified structural incompatibility.
Where Pith is reading between the lines
- The variance-control technique may generalize to other client-drift correction methods in distributed optimization by enforcing directional consistency rather than gradient averaging alone.
- If the adjusted direction reduces effective drift, it could lower the number of communication rounds required to reach target accuracy under heterogeneity.
- The result points to direction inconsistency across clients as a distinct bottleneck separate from gradient magnitude divergence.
Load-bearing premise
That a variance-suppressed adjusted direction constructed from local information can simultaneously resolve the local–global flatness mismatch without introducing bias or slowing convergence, even under arbitrary heterogeneity levels.
What would settle it
An experiment in which, under high data heterogeneity, the mean-square deviation between the adjusted direction and the global gradient fails to decrease, or global test accuracy shows no improvement over standard SAM. Either outcome would falsify the central claim.
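A minimal sketch of the diagnostic such an experiment would track, assuming each client's adjusted direction and the global gradient are available as flattened tensors:

```python
import torch

def mean_square_deviation(adjusted_dirs, global_grad):
    # Average squared distance between per-client adjusted directions and the
    # global gradient; the falsification test would plot this across rounds
    # and heterogeneity levels (e.g., decreasing Dirichlet alpha).
    devs = [torch.sum((d - global_grad) ** 2) for d in adjusted_dirs]
    return torch.stack(devs).mean()
```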
Original abstract
Sharpness-aware minimization (SAM) is an effective method for improving the generalization of federated learning (FL) by steering local training toward flat minima. Under data heterogeneity, however, device-side SAM searches for locally flat basins that are incompatible with the flat region preferred by the global objective. We identify this structural failure mode as flatness incompatibility, which explains why improving local flatness alone may provide limited training and generalization improvement for the global model. We reveal that flatness incompatibility arises from data heterogeneity and the friendly adversary phenomenon, and is further amplified by local updates and partial device participation. To mitigate this issue, we propose Federated Learning with variance-suppressed sharpness-aware minimization (FedVSSAM), which constructs a variance-suppressed adjusted direction and uses it consistently in local flatness search, local descent, and global update. FedVSSAM anchors both perturbation and update directions to a more stable global direction, instead of correcting only an isolated local perturbation. We establish non-convex convergence guarantees of FedVSSAM and prove that the mean-square deviation between the adjusted direction and the global gradient is effectively controlled. Experiments demonstrate that FedVSSAM mitigates flatness incompatibility and outperforms the baselines across diverse FL settings.
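For orientation, the device-side SAM step the abstract builds on (Foret et al. [14]) perturbs along the normalized local gradient before descending:

\[
\epsilon_i^{\star} = \rho\,\frac{\nabla F_i(\theta)}{\|\nabla F_i(\theta)\|},
\qquad
\theta \leftarrow \theta - \eta\,\nabla F_i\!\left(\theta + \epsilon_i^{\star}\right).
\]

Flatness incompatibility, as the abstract frames it, is that these per-client perturbation targets seek locally flat basins that need not coincide with the flat region preferred by the global objective \(F = \frac{1}{N}\sum_{i=1}^{N} F_i\).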
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies 'flatness incompatibility' in sharpness-aware minimization (SAM) for federated learning (FL) under data heterogeneity, where local flat minima conflict with the global objective due to heterogeneity, friendly-adversary effects, local updates, and partial participation. It proposes FedVSSAM, which constructs a variance-suppressed adjusted direction from local information and applies it consistently for local perturbation, descent, and global update to anchor to a stable global direction. The manuscript claims non-convex convergence guarantees for FedVSSAM along with a proof that the mean-square deviation between this adjusted direction and the global gradient is controlled, and reports that experiments show mitigation of the incompatibility with outperformance over baselines in diverse FL settings.
Significance. If the convergence guarantees and mean-square deviation control hold under the claimed arbitrary heterogeneity levels, the work would meaningfully advance SAM applications in FL by addressing a structural mismatch that standard local SAM does not resolve. The consistent application of the adjusted direction across phases and the explicit identification of flatness incompatibility as a distinct failure mode represent conceptual contributions; reproducible experiments across settings would further strengthen the case for practical impact in heterogeneous FL.
major comments (3)
- [§4, Theorem 1 and surrounding lemmas] The claimed non-convex convergence rate and the bound on the mean-square deviation of the variance-suppressed direction from the global gradient appear to rely on controlling local gradient variance. It is unclear whether these bounds remain independent of the heterogeneity parameter under the arbitrary heterogeneity levels asserted in the abstract and §3.2, or whether they implicitly require a bounded-dissimilarity assumption, as in standard FL analyses.
- [§3.3, FedVSSAM Algorithm] The variance-suppressed adjusted direction is constructed from local information only (even if aggregated). Under partial participation and high heterogeneity, the friendly-adversary phenomenon could make the deviation term scale with the heterogeneity measure, undermining both the convergence claim and the mitigation of flatness incompatibility unless additional assumptions or clipping are introduced.
- [§5, Experiments] While outperformance is reported, the description lacks explicit quantification of the heterogeneity levels (e.g., Dirichlet α values), the number of local steps, and the participation rates used to stress-test the deviation control; without these, it is difficult to confirm that the method succeeds precisely where flatness incompatibility is most severe.
minor comments (2)
- [Abstract, §1] The abstract and introduction introduce 'flatness incompatibility' as a new term; a brief formal definition or equation characterizing the incompatibility (e.g., a difference of sharpness measures; see the sketch after this list) would aid readability.
- [§3] Notation for the adjusted direction (e.g., how variance suppression is exactly formulated) should be introduced earlier and used consistently in both the algorithm box and the proof.
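One illustrative formalization, not quoted from the paper (though the Δ_FI notation matches the bound excerpted in the theorem-links section below), would take flatness incompatibility as the gap between global sharpness and averaged local sharpness:

\[
\Delta_{\mathrm{FI}}(\theta,\rho)
= \max_{\|\epsilon\|\le\rho} F(\theta+\epsilon)
- \frac{1}{N}\sum_{i=1}^{N} \max_{\|\epsilon_i\|\le\rho} F_i(\theta+\epsilon_i),
\qquad F = \frac{1}{N}\sum_{i=1}^{N} F_i.
\]

Under this reading, device-side SAM drives each local maximum down while leaving the global term, and hence the gap, uncontrolled under heterogeneity.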
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the convergence analysis, algorithmic design, and experimental reporting. We address each major comment point by point below.
point-by-point responses
Referee: [§4, Theorem 1 and surrounding lemmas] The claimed non-convex convergence rate and the bound on the mean-square deviation of the variance-suppressed direction from the global gradient appear to rely on controlling local gradient variance. It is unclear whether these bounds remain independent of the heterogeneity parameter under the arbitrary heterogeneity levels asserted in the abstract and §3.2, or whether they implicitly require a bounded-dissimilarity assumption, as in standard FL analyses.
Authors: The non-convex convergence rate in Theorem 1 and the mean-square deviation bound are derived without invoking bounded dissimilarity. The variance-suppressed adjusted direction is constructed so that its deviation from the global gradient is bounded by a suppression term whose expectation is controlled independently of the heterogeneity parameter; the proof relies on this property holding for arbitrary heterogeneity levels, as stated in §3.2. We will add a short clarifying remark after Theorem 1 to make this independence explicit. revision: partial
Referee: [§3.3, FedVSSAM Algorithm] The variance-suppressed adjusted direction is constructed from local information only (even if aggregated). Under partial participation and high heterogeneity, the friendly-adversary phenomenon could make the deviation term scale with the heterogeneity measure, undermining both the convergence claim and the mitigation of flatness incompatibility unless additional assumptions or clipping are introduced.
Authors: The variance suppression is specifically introduced to counteract the friendly-adversary effect by damping local variance in the adjusted direction. The mean-square deviation control proved in §4 holds under partial participation because the global aggregation of these suppressed directions anchors the update; the bound does not scale with heterogeneity and requires no extra clipping or assumptions beyond those already stated. revision: no
Referee: [§5, Experiments] While outperformance is reported, the description lacks explicit quantification of the heterogeneity levels (e.g., Dirichlet α values), the number of local steps, and the participation rates used to stress-test the deviation control; without these, it is difficult to confirm that the method succeeds precisely where flatness incompatibility is most severe.
Authors: We agree that explicit parameter values will strengthen the experimental section. The reported results used Dirichlet α ∈ {0.1, 0.5, 1.0}, 5 local steps, and participation rates of 10% and 20%. We will insert these values into the experimental setup description in the revised manuscript. revision: yes
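For reproducibility, a standard Dirichlet label split of the kind such experiments rely on (following Hsu et al. [18]) can be sketched as below; the function name and seed handling are illustrative, not the paper's code:

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    # For each class, draw client proportions from Dirichlet(alpha) and
    # allocate that class's samples accordingly; smaller alpha = more skew.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        cuts = (np.cumsum(rng.dirichlet(alpha * np.ones(num_clients)))[:-1]
                * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_idx[cid].extend(part.tolist())
    return client_idx
```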
Circularity Check
No circularity: the convergence and deviation-control claims are presented as independent derivations that do not reduce to the method's definition or to self-citations.
full rationale
The abstract and available text claim non-convex convergence guarantees plus a proof that the mean-square deviation of the variance-suppressed adjusted direction from the global gradient is controlled. No equations, self-citations, fitted parameters renamed as predictions, or ansatzes are quoted that would make these results hold by construction of the method. The central premise (variance suppression from local information mitigating flatness incompatibility) is not shown to reduce to a tautology or to a prior self-citation chain. This is the normal case of a self-contained derivation whose validity rests on external verification of the (unprovided) proof rather than on internal circularity.
Axiom & Free-Parameter Ledger
invented entities (1)
- flatness incompatibility · no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relevance unclear · matched text: "constructs a variance-suppressed adjusted direction and uses it consistently in local flatness search, local descent, and global update"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relevance unclear · matched text: E[Δ_FI(θ,ρ)] ≤ (3ρ²/N) Σᵢ E[∥δᵢ(θ)∥² + 4∥ζ_{i,ϕᵢ}(θ)∥²] + (3/4)L²ρ⁴
Reference graph
Works this paper leans on
- [1] Durmus Alp Emre Acar, Yue Zhao, Ramon Matas Navarro, Matthew Mattina, Paul N. Whatmough, and Venkatesh Saligrama. Federated learning based on dynamic regularization. In International Conference on Learning Representations (ICLR), 2021.
- [2] Maksym Andriushchenko and Nicolas Flammarion. Towards understanding sharpness-aware minimization. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 639–668. PMLR, 2022.
- [3] Alan Bain and Dan Crisan. Fundamentals of Stochastic Filtering. Springer, 2009.
- [4] Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloé Kiddon, Jakub Konečný, Stefano Mazzocchi, Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, and Jason Roselander. Towards federated learning at scale: System design. In Proceedings of Machine Learning and Systems (MLSys), 2019.
- [5] Debora Caldarola, Barbara Caputo, and Marco Ciccone. Improving generalization in federated learning by seeking flat minima. In Computer Vision – ECCV 2022, pages 654–672. Springer Nature Switzerland, 2022.
- [6] Debora Caldarola, Pietro Cagnasso, Barbara Caputo, and Marco Ciccone. Beyond local sharpness: Communication-efficient global sharpness-aware minimization for federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25187–25197, 2025.
- [7] Ziheng Cheng, Xinmeng Huang, Pengfei Wu, and Kun Yuan. Momentum benefits non-iid federated learning simply and provably. In The Twelfth International Conference on Learning Representations, 2024.
- [8] Liam Collins, Hamed Hassani, Aryan Mokhtari, and Sanjay Shakkottai. Exploiting shared representations for personalized federated learning. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 2089–2099. PMLR, 2021.
- [9] Rong Dai, Xun Yang, Yan Sun, Li Shen, Xinmei Tian, Meng Wang, and Yongdong Zhang. FedGAMMA: Federated learning with global sharpness-aware minimization. IEEE Transactions on Neural Networks and Learning Systems, 35(12):17479–17492, 2024.
- [10] Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, and Vincent Tan. Efficient sharpness-aware minimization for improved training of neural networks. In International Conference on Learning Representations, 2022.
- [11] Gintare Karolina Dziugaite and Daniel M. Roy. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. In Proceedings of the International Conference on Machine Learning Workshop, 2017.
- [12] Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. In Advances in Neural Information Processing Systems, volume 33, pages 3557–3568, 2020.
- [13] Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya Zhang, Masashi Sugiyama, and Yanfeng Wang. Locally estimated global perturbations are better than local perturbations for federated sharpness-aware minimization. In Forty-first International Conference on Machine Learning, 2024.
- [14] Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. Sharpness-aware minimization for efficiently improving generalization. In International Conference on Learning Representations, 2021.
- [15] Yujie Gu, Richeng Jin, Zhaoyang Zhang, and Huaiyu Dai. Gradient compression may hurt generalization: A remedy by synthetic data guided sharpness aware minimization. arXiv preprint arXiv:2602.11584, 2026.
- [16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
- [17] Sepp Hochreiter and Jürgen Schmidhuber. Flat minima. Neural Computation, 9(1):1–42, 1997.
- [18] Tzu-Ming Harry Hsu, Hang Qi, and Matthew Brown. Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335, 2019.
- [19] Jifei Hu, Yanli Li, Huayong Xie, Lijun Xu, Hang Zhang, and Xinqiang Zhou. Local sharpness aware minimization in decentralized federated learning with privacy protection. Expert Systems with Applications, page 131510, 2026.
- [20] Yiding Jiang, Behnam Neyshabur, Hossein Mobahi, Dilip Krishnan, and Samy Bengio. Fantastic generalization measures and where to find them. In International Conference on Learning Representations, 2020.
- [21] Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1–2):1–210, 2021.
- [22] Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh. SCAFFOLD: Stochastic controlled averaging for federated learning. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 5132–5143. PMLR, 2020.
- [23] Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. In International Conference on Learning Representations, 2017.
- [24] Geeho Kim, Jinkyu Kim, and Bohyung Han. Communication-efficient federated learning with accelerated client gradient. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12385–12394, 2024.
- [25] Minyoung Kim, Da Li, Shell X. Hu, and Timothy Hospedales. Fisher SAM: Information geometry and sharpness aware minimisation. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 11148–11161. PMLR, 2022.
- [26] Yoon Kim. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1746–1751, 2014.
- [27] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [28] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, University of Toronto, Toronto, ON, Canada, 2009.
- [29] Jungmin Kwon, Jeongseop Kim, Hyunseo Park, and In Kwon Choi. ASAM: Adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 5905–5914. PMLR, 2021.
- [30] Taehwan Lee and Sung Whan Yoon. Rethinking the flat minima searching in federated learning. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 27037–27071. PMLR, 2024.
- [31] Bingcong Li and Georgios Giannakis. Enhancing sharpness-aware optimization through variance suppression. In Advances in Neural Information Processing Systems, volume 36, pages 70861–70879. Curran Associates, Inc., 2023.
- [32] Qinbin Li, Bingsheng He, and Dawn Song. Model-contrastive federated learning. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10708–10717, 2021.
- [33] Qinbin Li, Yiqun Diao, Quan Chen, and Bingsheng He. Federated learning on non-iid data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), pages 965–978, 2022.
- [34] Tao Li, Pan Zhou, Zhengbao He, Xinwen Cheng, and Xiaolin Huang. Friendly sharpness-aware minimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5631–5640, 2024.
- [35] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated optimization in heterogeneous networks. In Proceedings of Machine Learning and Systems, volume 2, pages 429–450, 2020.
- [36] Tian Li, Shengyuan Hu, Ahmad Beirami, and Virginia Smith. Ditto: Fair and robust federated learning through personalization. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research. PMLR, 2021.
- [38] Tianle Li, Yongzhi Huang, Linshan Jiang, Chang Liu, Qipeng Xie, Wenfeng Du, Lu Wang, and Kaishun Wu. FedWMSAM: Fast and flat federated learning via weighted momentum and sharpness-aware minimization. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026.
- [39] Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. On the convergence of FedAvg on non-iid data. arXiv preprint arXiv:1907.02189, 2019.
- [40] Xiaoxiao Li, Meirui Jiang, Xiaofei Zhang, Michael Kamp, and Qi Dou. FedBN: Federated learning on non-iid features via local batch normalization. In International Conference on Learning Representations (ICLR), 2021.
- [41] Yuhang Li, Tong Liu, Yangguang Cui, Ming Hu, and Xiaoqiang Li. One arrow, two hawks: Sharpness-aware minimization for federated learning via global model trajectory. In Forty-second International Conference on Machine Learning, 2025.
- [42] Wei Yang Bryan Lim, Nguyen Cong Luong, Dinh Thai Hoang, Yutao Jiao, Ying-Chang Liang, Qiang Yang, Dusit Niyato, and Chunyan Miao. Federated learning in mobile edge networks: A comprehensive survey. IEEE Communications Surveys & Tutorials, 22(3):2031–2063, 2020.
- [43] Yong Liu, Siqi Mai, Xiangning Chen, Cho-Jui Hsieh, and Yang You. Towards efficient and scalable sharpness-aware minimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12360–12370, 2022.
- [44] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 1273–1282. PMLR, 2017.
- [45] Mehryar Mohri, Gary Sivek, and Ananda Theertha Suresh. Agnostic federated learning. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 4615–4625. PMLR, 2019.
- [46] Zhe Qu, Xingyu Li, Rui Duan, Yao Liu, Bo Tang, and Zhuo Lu. Generalized federated learning via sharpness aware minimization. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 18250–18280. PMLR, 2022.
- [47] Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, and H. Brendan McMahan. Adaptive federated optimization. In International Conference on Learning Representations (ICLR), 2021.
- [48] Yifan Shi, Yingqi Liu, Kang Wei, Li Shen, Xueqian Wang, and Dacheng Tao. Make landscape flatter in differentially private federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24552–24562, 2023.
- [49] Yan Sun, Li Shen, Shixiang Chen, Liang Ding, and Dacheng Tao. Dynamic regularized sharpness aware minimization in federated learning: Approaching global consistency and smooth landscape. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 32991–33013. PMLR, 2023.
- [50] Yan Sun, Li Shen, Tiansheng Huang, Liang Ding, and Dacheng Tao. FedSpeed: Larger local interval, less communication round, and higher generalization accuracy. arXiv preprint arXiv:2302.10429, 2023.
- [51] Canh T. Dinh, Nguyen Tran, and Josh Nguyen. Personalized federated learning with Moreau envelopes. In Advances in Neural Information Processing Systems, volume 33, pages 21394–21405, 2020.
- [52] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.
- [53] Hongyi Wang, Mikhail Yurochkin, Yuekai Sun, Dimitris Papailiopoulos, and Yasaman Khazaeni. Federated learning with matched averaging. In International Conference on Learning Representations (ICLR), 2020.
- [54] Jianyu Wang, Qinghua Liu, Hao Liang, Gauri Joshi, and H. V. Poor. A novel framework for the analysis and design of heterogeneous federated learning. IEEE Transactions on Signal Processing, 69:5234–5249, 2021.
- [55] Kang Wei, Jun Li, Ming Ding, Chuan Ma, Hang Su, Bo Zhang, and H. Vincent Poor. User-level privacy-preserving federated learning: Analysis and performance optimization. IEEE Transactions on Mobile Computing, 21(9):3388–3401, 2022.
- [56] Kaiyue Wen, Tengyu Ma, and Zhiyuan Li. How does sharpness-aware minimization minimize sharpness? In OPT 2022: Optimization for Machine Learning (NeurIPS 2022 Workshop), 2022.
- [57] Bingnan Xiao, Xichen Yu, Wei Ni, Xin Wang, and H. V. Poor. Over-the-air federated learning: Status quo, open challenges, and future directions. Fundamental Research, 5(4):1710–1724, 2025.
- [58] Jing Xu, Sen Wang, Liwei Wang, and Andrew Chi-Chih Yao. FedCM: Federated learning with client-level momentum. arXiv preprint arXiv:2106.10874, 2021.
- [59] Jihun Yun and Eunho Yang. Riemannian SAM: Sharpness-aware minimization on Riemannian manifolds. In Advances in Neural Information Processing Systems, volume 36, 2023.
- [60] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
- [61] Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015.
- [62] Xingxuan Zhang, Renzhe Xu, Han Yu, Hao Zou, and Peng Cui. Gradient norm aware minimization seeks first-order flatness and improves generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20247–20257, 2023.
- [63] Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas Chandra. Federated learning with non-iid data. arXiv preprint arXiv:1806.00582, 2018.
- [64] Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha C. Dvornek, Sekhar Tatikonda, James S. Duncan, and Ting Liu. Surrogate gap minimization improves sharpness-aware training. In International Conference on Learning Representations (ICLR), 2022.