Recognition: 1 theorem link
DP-FedAdamW: An Efficient Optimizer for Differentially Private Federated Large Models
Pith reviewed 2026-05-15 20:00 UTC · model grok-4.3
The pith
DP-FedAdamW provides an unbiased second-moment estimator for AdamW in differentially private federated learning, enabling linearly accelerated convergence without heterogeneity assumptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW's behavior under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates to curb client drift; on this basis we establish an unbiased second-moment estimator, prove linearly accelerated convergence without heterogeneity assumptions, and provide tighter (ε,δ)-DP guarantees.
What carries the argument
A stabilized, bias-corrected second-moment estimator, combined with a local-global alignment mechanism, handles DP noise and client drift in federated AdamW updates.
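The mechanism described above can be sketched in code. This is a minimal, hypothetical rendering assuming coordinate-wise additive Gaussian DP noise with known standard deviation sigma; the function name `dp_adamw_step` and all hyperparameter defaults are illustrative, not the authors' implementation.

```python
import numpy as np

def dp_adamw_step(w, g, m, v, sigma, t, lr=1e-3, b1=0.9, b2=0.999,
                  wd=0.01, eps=1e-8, rng=None):
    """One hypothetical DP-AdamW local step (a sketch, not the paper's code).

    Assumes the DP mechanism adds Gaussian noise of known std `sigma`
    to the gradient, and debiases the second moment by subtracting
    sigma**2 before the square root, as the rebuttal describes.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    g_priv = g + rng.normal(0.0, sigma, size=g.shape)   # DP perturbation
    m = b1 * m + (1 - b1) * g_priv                      # first moment
    # E[g_priv**2] = g**2 + sigma**2, so g_priv**2 - sigma**2 is an
    # unbiased estimate of g**2; clamp at 0 to keep v nonnegative.
    v = b2 * v + (1 - b2) * np.maximum(g_priv**2 - sigma**2, 0.0)
    m_hat = m / (1 - b1**t)                             # Adam bias correction
    v_hat = v / (1 - b2**t)
    # Decoupled weight decay, as in AdamW.
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```

A local-global alignment term (pulling each client's update toward the last global descent direction) would sit on top of this step; it is omitted here because the review text does not specify its exact form.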
If this is right
- Convergence rate accelerates linearly even with heterogeneous client data.
- Tighter differential privacy guarantees are achieved compared to prior DPFL methods.
- Improved performance on large models such as Swin-Base transformers and ResNet-18 under privacy budgets like ε=1.
- Effective for both language and vision tasks in federated settings.
- Outperforms state-of-the-art by 5.83% on Tiny-ImageNet with Swin-Base.
Where Pith is reading between the lines
- The method could support training of even larger foundation models in privacy-sensitive distributed environments.
- Similar stabilization techniques might apply to other adaptive optimizers in DPFL scenarios.
- Real-world deployments might see reduced communication rounds due to faster convergence.
- Further work could explore extensions to other privacy mechanisms beyond (ε,δ)-DP.
Load-bearing premise
The proposed stabilizations and bias corrections fully resolve the amplified variance and client drift issues induced by DP noise and data heterogeneity in practice for large-scale models.
What would settle it
Observing persistent bias in the second-moment estimator or sub-linear convergence in an experiment applying DP-FedAdamW to a large language or vision model under DP constraints would disprove the claims.
Original abstract
Balancing convergence efficiency and robustness under Differential Privacy (DP) is a central challenge in Federated Learning (FL). While AdamW accelerates training and fine-tuning of large-scale models, we find that directly applying it to Differentially Private FL (DPFL) suffers from three major issues: (i) data heterogeneity and privacy noise jointly amplify the variance of the second-moment estimator, (ii) DP perturbations bias the second-moment estimator, and (iii) DP amplifies AdamW's sensitivity to local overfitting, worsening client drift. We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates to the global descent to curb client drift. Theoretically, we establish an unbiased second-moment estimator and prove a linearly accelerated convergence rate without any heterogeneity assumption, while providing tighter $(\varepsilon,\delta)$-DP guarantees. Our empirical results demonstrate the effectiveness of DP-FedAdamW across language and vision Transformers and ResNet-18. On Tiny-ImageNet (Swin-Base, $\varepsilon=1$), DP-FedAdamW outperforms the state-of-the-art (SOTA) by 5.83%. The code is available in the Appendix.
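The "DP perturbations" the abstract refers to are, in standard DPFL practice, per-sample gradient clipping followed by Gaussian noise (DP-SGD style). A minimal sketch of that mechanism, with the function name and parameters hypothetical rather than taken from the paper:

```python
import numpy as np

def gaussian_mechanism(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient to L2 norm `clip_norm`, sum, add
    Gaussian noise with std noise_multiplier * clip_norm, and average.
    A standard DP-SGD-style sketch, not the paper's exact procedure."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_sample_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_sample_grads)
```

It is exactly this injected noise that inflates and biases AdamW's second-moment estimate: the expected square of a noised gradient exceeds the square of the true gradient by the noise variance.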
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DP-FedAdamW as the first AdamW-based optimizer for differentially private federated learning (DPFL). It identifies three issues when applying AdamW directly to DPFL—amplified variance in the second-moment estimator from heterogeneity and noise, DP-induced bias in that estimator, and worsened client drift from local overfitting—and addresses them via variance stabilization, bias removal, and local-update alignment to the global descent. The central claims are an unbiased second-moment estimator, a proof of linearly accelerated convergence without heterogeneity assumptions, and tighter (ε,δ)-DP guarantees. Experiments report gains on vision and language Transformers plus ResNet-18, including a 5.83% improvement over SOTA on Tiny-ImageNet (Swin-Base, ε=1).
Significance. If the theoretical claims hold, the work would be significant for enabling efficient, theoretically grounded AdamW optimization in DPFL of large models. The absence of a heterogeneity assumption in the convergence result is a notable strength relative to prior DPFL analyses, and the empirical gains on modern architectures suggest practical relevance for privacy-preserving training of Transformers.
major comments (2)
- [Abstract and §4] Abstract and §4 (Theoretical Analysis): the claims of an unbiased second-moment estimator and linearly accelerated convergence without heterogeneity assumptions are load-bearing for the paper's contribution, yet the provided text supplies no derivation steps, explicit assumptions on the noise model, or handling of the DP perturbation in the second-moment term; without these details the central theoretical result cannot be verified.
- [§5] §5 (Experiments): the reported 5.83% gain on Tiny-ImageNet (Swin-Base, ε=1) is presented as outperforming SOTA, but the text gives no protocol details, variance across runs, or ablation isolating the contribution of each stabilization component; this leaves the empirical support for the practical claims unverifiable.
minor comments (2)
- [Abstract] The code-availability statement appears only in the abstract; it should be repeated with a concrete link or appendix reference in the main text.
- [§3] Notation for the stabilized second-moment estimator and the bias-correction term should be introduced with explicit equations before the convergence theorem is stated.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve verifiability of both the theoretical claims and empirical results.
Point-by-point responses
Referee: [Abstract and §4] Abstract and §4 (Theoretical Analysis): the claims of an unbiased second-moment estimator and linearly accelerated convergence without heterogeneity assumptions are load-bearing for the paper's contribution, yet the provided text supplies no derivation steps, explicit assumptions on the noise model, or handling of the DP perturbation in the second-moment term; without these details the central theoretical result cannot be verified.
Authors: We acknowledge that §4 states the main results concisely. The full derivation of the unbiased second-moment estimator (via explicit bias-correction for the Gaussian DP noise) and the linear convergence proof (under smoothness and bounded-variance assumptions only) appear in Appendix A. The noise model is standard additive Gaussian perturbation from the DP mechanism, and the second-moment term is corrected before the square-root operation. We will insert key derivation steps and an assumptions paragraph into the main §4 text. revision: yes
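The unbiasedness step the authors describe can be written out. This is a sketch under the stated assumption of coordinate-wise Gaussian noise with known variance $\sigma^2$:

```latex
\tilde g_t = g_t + z_t, \quad z_t \sim \mathcal{N}(0, \sigma^2 I)
\;\Longrightarrow\;
\mathbb{E}\!\left[\tilde g_t \odot \tilde g_t\right] = g_t \odot g_t + \sigma^2 \mathbf{1}
\;\Longrightarrow\;
\mathbb{E}\!\left[\tilde g_t \odot \tilde g_t - \sigma^2 \mathbf{1}\right] = g_t \odot g_t .
```

Applying this correction inside the exponential moving average, before the square root, is what can make the second-moment estimator unbiased; the square root is nonlinear, so the debiasing must happen first.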
Referee: [§5] §5 (Experiments): the reported 5.83% gain on Tiny-ImageNet (Swin-Base, ε=1) is presented as outperforming SOTA, but the text gives no protocol details, variance across runs, or ablation isolating the contribution of each stabilization component; this leaves the empirical support for the practical claims unverifiable.
Authors: We agree that additional details are needed for reproducibility. The revised version will add: a complete experimental protocol subsection (§5.1) covering hyperparameters, DP noise calibration, and client sampling; mean and standard deviation over five independent runs for all reported numbers; and an ablation study (§5.4) that isolates the contribution of variance stabilization, bias removal, and local-update alignment. These changes will substantiate the 5.83% gain. revision: yes
Circularity Check
No significant circularity detected in derivation chain
full rationale
The paper introduces DP-FedAdamW to address variance amplification, DP bias in second-moment estimates, and client drift via stabilization, bias removal, and local-global alignment. The core theoretical results, an unbiased second-moment estimator and linearly accelerated convergence without heterogeneity assumptions, are stated as proven outcomes with tighter DP bounds. No equations, definitions, or self-citations in the abstract or claims reduce these results to fitted parameters, self-definitions, or prior author work by construction. The derivation chain remains self-contained against external benchmarks, with no load-bearing self-citation or smuggled ansatz.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard assumptions for proving convergence rates in stochastic optimization
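For concreteness, these "standard assumptions" typically comprise $L$-smoothness and bounded stochastic-gradient variance. The ledger does not spell them out, so the following is the usual form rather than the paper's exact statement:

```latex
\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\| \quad \text{($L$-smoothness)}, \qquad
\mathbb{E}\,\|\nabla F(x;\xi) - \nabla f(x)\|^2 \le \sigma_g^2 \quad \text{(bounded variance)}.
```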
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged: unclear
Why unclear: Pith could not determine the relation between the paper passage and the cited Recognition theorem.
Linked passage: "We propose DP-FedAdamW... stabilizing second-moment variance, removing DP-induced bias, and aligning local updates... unbiased second-moment estimator and prove a linearly accelerated convergence rate"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.