pith. machine review for the scientific record.

arxiv: 2602.19945 · v1 · submitted 2026-02-23 · 💻 cs.LG · cs.AI

Recognition: 1 theorem link · Lean Theorem

DP-FedAdamW: An Efficient Optimizer for Differentially Private Federated Large Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 20:00 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords differential privacy · federated learning · AdamW optimizer · large language models · convergence analysis · client drift · second moment estimation

The pith

DP-FedAdamW provides an unbiased second-moment estimator for AdamW in differentially private federated learning, enabling linearly accelerated convergence without heterogeneity assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the challenge of using adaptive optimizers such as AdamW in federated learning under differential privacy constraints for large models. Applying AdamW directly inflates the variance of its moment estimates, since data heterogeneity and privacy noise compound; DP perturbations also bias the second-moment estimator and worsen client drift. DP-FedAdamW stabilizes the variance, corrects the bias, and aligns local updates with the global descent direction to mitigate these issues. The authors prove that this yields an unbiased second-moment estimator and a linearly accelerated convergence rate that requires no assumptions on cross-client data heterogeneity, along with tighter privacy accounting. Experiments on vision and language transformers show consistent gains over existing methods.
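To make the noise source concrete, below is a minimal sketch of the standard per-client Gaussian mechanism (clip, then add noise, as in DP-SGD and DP-FedAvg) that DPFL methods of this kind build on. Function and variable names are illustrative assumptions, not the paper's code, and the paper's exact clipping and noising placement may differ.

```python
import numpy as np

def privatize_update(delta: np.ndarray, clip_norm: float, noise_mult: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Standard Gaussian mechanism on a client's model delta (illustrative)."""
    # Clip the update to L2 norm at most clip_norm, bounding per-client sensitivity.
    norm = np.linalg.norm(delta)
    clipped = delta * min(1.0, clip_norm / (norm + 1e-12))
    # Add isotropic Gaussian noise with std sigma = noise_mult * clip_norm;
    # this per-coordinate noise variance is what inflates second-moment estimates.
    noise = rng.normal(0.0, noise_mult * clip_norm, size=delta.shape)
    return clipped + noise

rng = np.random.default_rng(0)
private_delta = privatize_update(rng.normal(size=1000), clip_norm=1.0,
                                 noise_mult=1.0, rng=rng)
```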

Core claim

We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates with the global descent direction to curb client drift; theoretically, it establishes an unbiased second-moment estimator, a linearly accelerated convergence rate without heterogeneity assumptions, and tighter (ε,δ)-DP guarantees.

What carries the argument

The stabilized, bias-corrected second-moment estimator, combined with a local-global alignment mechanism, handles DP noise and client drift in federated AdamW updates.
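As a sketch of what such an estimator can look like, assume the DP noise on each gradient coordinate is additive Gaussian with known variance sigma2 (the standard mechanism sketched above); subtracting that variance from the squared noisy gradient before the square root then debiases the second moment. This is an illustration of the general idea, not the paper's exact algorithm, and all names are hypothetical.

```python
import numpy as np

def adamw_step_debiased(w, m, v, g_noisy, sigma2, lr=1e-3,
                        beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """One AdamW-style step on a DP-noised gradient with a debiased second moment.

    For z ~ N(0, sigma2), E[(g + z)^2] = g^2 + sigma2, so subtracting sigma2
    yields an unbiased estimate of g^2. Sketch only; Adam's (1 - beta^t)
    bias-correction factors are omitted for brevity.
    """
    m = beta1 * m + (1 - beta1) * g_noisy
    sq_debiased = np.maximum(g_noisy**2 - sigma2, 0.0)  # clamp keeps v >= 0
    v = beta2 * v + (1 - beta2) * sq_debiased
    w = w - lr * (m / (np.sqrt(v) + eps) + wd * w)  # decoupled weight decay
    return w, m, v
```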

If this is right

  • Convergence rate accelerates linearly even with heterogeneous client data.
  • Tighter differential privacy guarantees are achieved compared to prior DPFL methods.
  • Improved performance on large models such as Swin-Base transformers and ResNet-18 under privacy budgets like ε=1.
  • Effective for both language and vision tasks in federated settings.
  • Outperforms state-of-the-art by 5.83% on Tiny-ImageNet with Swin-Base.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method could support training of even larger foundation models in privacy-sensitive distributed environments.
  • Similar stabilization techniques might apply to other adaptive optimizers in DPFL scenarios.
  • Real-world deployments might see reduced communication rounds due to faster convergence.
  • Further work could explore extensions to other privacy mechanisms beyond (ε,δ)-DP.

Load-bearing premise

The proposed stabilizations and bias corrections fully resolve the amplified variance and client drift issues induced by DP noise and data heterogeneity in practice for large-scale models.
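The alignment component is described only qualitatively in the available text. As a generic illustration of drift control, here is a FedProx-style proximal pull toward the global model, used purely as a stand-in; the paper instead aligns local updates with the global descent direction, which this sketch does not reproduce.

```python
import numpy as np

def local_step_with_drift_control(w_local, w_global, grad, lr=1e-2, mu=0.1):
    """One local step with a proximal drift penalty (stand-in, not the paper's rule).

    The extra term mu * (w_local - w_global) pulls the local iterate back
    toward the global model, limiting how far clients drift apart.
    """
    return w_local - lr * (grad + mu * (w_local - w_global))
```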

What would settle it

Observing persistent bias in the second-moment estimator or sub-linear convergence in an experiment applying DP-FedAdamW to a large language or vision model under DP constraints would disprove the claims.
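A falsification test of the unbiasedness half of that claim is cheap to run on synthetic data. The Monte Carlo sketch below, with hypothetical values, checks that the naive squared noisy gradient overshoots the true second moment by the noise variance while the corrected estimator does not; persistent bias in the corrected column would be the failure mode described above.

```python
import numpy as np

# Monte Carlo check: with additive Gaussian noise of known variance sigma2,
# naive mean((g+z)^2) ~ g^2 + sigma2, corrected estimate ~ g^2.
rng = np.random.default_rng(1)
g, sigma2, trials = 0.3, 4.0, 200_000
noisy = g + rng.normal(0.0, np.sqrt(sigma2), size=trials)
naive = np.mean(noisy**2)   # biased upward by ~sigma2
corrected = naive - sigma2  # approximately unbiased for g^2
print(f"true g^2={g**2:.4f}  naive={naive:.4f}  corrected={corrected:.4f}")
```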

Figures

Figures reproduced from arXiv: 2602.19945 by Jin Liu, Junkang Liu, Ning Xi, Yinbin Miao.

Figure 1: Illustration of our DP-FedAdamW. It aggregates the …
Figure 2: An illustration of the local update in DP-FedAdamW, which corrects client drift through global update guidance.
Figure 3: Training on CIFAR-100, Swin-Tiny, σ=1, α=0.1. (a) Non-IID (DPFL) causes high variance in the second-moment estimator across clients of DP-LocalAdamW. (b) DP-LocalAdamW suffers from more severe client drift than FedAvg and LocalAdamW.
Figure 4: Histogram for DP-LocalAdamW, CIFAR-10, Swin-Tiny, …
Figure 5: Test accuracy (%) on CIFAR-100 using ResNet-18 and Swin-Tiny under the Dirichlet …
read the original abstract

Balancing convergence efficiency and robustness under Differential Privacy (DP) is a central challenge in Federated Learning (FL). While AdamW accelerates training and fine-tuning in large-scale models, we find that directly applying it to Differentially Private FL (DPFL) suffers from three major issues: (i) data heterogeneity and privacy noise jointly amplify the variance of second-moment estimator, (ii) DP perturbations bias the second-moment estimator, and (iii) DP amplify AdamW sensitivity to local overfitting, worsening client drift. We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates to the global descent to curb client drift. Theoretically, we establish an unbiased second-moment estimator and prove a linearly accelerated convergence rate without any heterogeneity assumption, while providing tighter $(\varepsilon,\delta)$-DP guarantees. Our empirical results demonstrate the effectiveness of DP-FedAdamW across language and vision Transformers and ResNet-18. On Tiny-ImageNet (Swin-Base, $\varepsilon=1$), DP-FedAdamW outperforms the state-of-the-art (SOTA) by 5.83\%. The code is available in Appendix.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes DP-FedAdamW as the first AdamW-based optimizer for differentially private federated learning (DPFL). It identifies three issues when applying AdamW directly to DPFL—amplified variance in the second-moment estimator from heterogeneity and noise, DP-induced bias in that estimator, and worsened client drift from local overfitting—and addresses them via variance stabilization, bias removal, and local-update alignment to the global descent. The central claims are an unbiased second-moment estimator, a proof of linearly accelerated convergence without heterogeneity assumptions, and tighter (ε,δ)-DP guarantees. Experiments report gains on vision and language Transformers plus ResNet-18, including a 5.83% improvement over SOTA on Tiny-ImageNet (Swin-Base, ε=1).

Significance. If the theoretical claims hold, the work would be significant for enabling efficient, theoretically grounded AdamW optimization in DPFL of large models. The absence of a heterogeneity assumption in the convergence result is a notable strength relative to prior DPFL analyses, and the empirical gains on modern architectures suggest practical relevance for privacy-preserving training of Transformers.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Theoretical Analysis): the claims of an unbiased second-moment estimator and linearly accelerated convergence without heterogeneity assumptions are load-bearing for the paper's contribution, yet the provided text supplies no derivation steps, explicit assumptions on the noise model, or handling of the DP perturbation in the second-moment term; without these details the central theoretical result cannot be verified.
  2. [§5] §5 (Experiments): the reported 5.83% gain on Tiny-ImageNet (Swin-Base, ε=1) is presented as outperforming SOTA, but the text gives no protocol details, variance across runs, or ablation isolating the contribution of each stabilization component; this leaves the empirical support for the practical claims unverifiable.
minor comments (2)
  1. [Abstract] The code-availability statement appears only in the abstract; it should be repeated with a concrete link or appendix reference in the main text.
  2. [§3] Notation for the stabilized second-moment estimator and the bias-correction term should be introduced with explicit equations before the convergence theorem is stated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve verifiability of both the theoretical claims and empirical results.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Theoretical Analysis): the claims of an unbiased second-moment estimator and linearly accelerated convergence without heterogeneity assumptions are load-bearing for the paper's contribution, yet the provided text supplies no derivation steps, explicit assumptions on the noise model, or handling of the DP perturbation in the second-moment term; without these details the central theoretical result cannot be verified.

    Authors: We acknowledge that §4 states the main results concisely. The full derivation of the unbiased second-moment estimator (via explicit bias correction for the Gaussian DP noise) and the linear convergence proof (under smoothness and bounded-variance assumptions only) appear in Appendix A. The noise model is standard additive Gaussian perturbation from the DP mechanism, and the second-moment term is corrected before the square-root operation; the standard identity behind such a correction is sketched after these responses. We will insert key derivation steps and an assumptions paragraph into the main §4 text. revision: yes

  2. Referee: [§5] §5 (Experiments): the reported 5.83% gain on Tiny-ImageNet (Swin-Base, ε=1) is presented as outperforming SOTA, but the text gives no protocol details, variance across runs, or ablation isolating the contribution of each stabilization component; this leaves the empirical support for the practical claims unverifiable.

    Authors: We agree that additional details are needed for reproducibility. The revised version will add: a complete experimental protocol subsection (§5.1) covering hyperparameters, DP noise calibration, and client sampling; mean and standard deviation over five independent runs for all reported numbers; and an ablation study (§5.4) that isolates the contribution of variance stabilization, bias removal, and local-update alignment. These changes will substantiate the 5.83% gain. revision: yes
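For the record, the standard computation behind the Gaussian-noise bias correction described in response 1 runs as follows (our notation, not necessarily the paper's):

```latex
% For z ~ N(0, sigma^2 I) independent of g, coordinate-wise:
\mathbb{E}\big[(g+z)\odot(g+z)\big]
  = g\odot g + 2\,g\odot\mathbb{E}[z] + \mathbb{E}[z\odot z]
  = g\odot g + \sigma^{2}\mathbf{1},
% hence the corrected statistic
\hat{v} := (g+z)\odot(g+z) - \sigma^{2}\mathbf{1}
% satisfies E[v_hat] = g (.) g, i.e., it is unbiased for the true second moment.
```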

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper introduces DP-FedAdamW to address variance amplification, DP bias in second-moment estimates, and client drift via stabilization, bias removal, and local-global alignment. The core theoretical results, an unbiased second-moment estimator and linearly accelerated convergence without heterogeneity assumptions, are stated as proven outcomes with tighter DP bounds. Nothing in the abstract or stated claims reduces these results to fitted parameters, self-referential definitions, or prior work by the same authors. The derivation chain is self-contained relative to external benchmarks, with no load-bearing self-citation or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, no specific free parameters, ad-hoc axioms, or invented entities are described; the work relies on standard optimization analysis assumptions.

axioms (1)
  • standard math: Standard assumptions for proving convergence rates in stochastic optimization (rendered below).
    Invoked implicitly for the linear convergence claim.
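For concreteness, the two assumptions such convergence proofs typically invoke are rendered below (our formulation; the paper's exact statement may differ):

```latex
% L-smoothness of each local objective F_i:
\|\nabla F_i(x) - \nabla F_i(y)\| \le L\,\|x - y\| \quad \forall x, y;
% bounded variance of local stochastic gradients:
\mathbb{E}\big[\|\nabla f_i(x;\xi) - \nabla F_i(x)\|^{2}\big] \le \sigma_g^{2}.
```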

pith-pipeline@v0.9.0 · 5527 in / 1219 out tokens · 46796 ms · 2026-05-15T20:00:53.752350+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 6 internal anchors

  1. [1]

    Deep learning with differential privacy

    Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318, 2016.

  2. [2]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.

  3. [3]

    A numerical DEs perspective on unfolded linearized ADMM networks for inverse problems

    Weixin An, Yingjie Yue, Yuanyuan Liu, Fanhua Shang, and Hongying Liu. A numerical DEs perspective on unfolded linearized ADMM networks for inverse problems. In Proceedings of the 30th ACM International Conference on Multimedia, pages 5065–5073, 2022.

  4. [4]

    Robust and faster zeroth-order minimax optimization: complexity and applications

    Weixin An, Yuanyuan Liu, Fanhua Shang, and Hongying Liu. Robust and faster zeroth-order minimax optimization: complexity and applications. Advances in Neural Information Processing Systems, 37:37050–37069, 2024.

  5. [5]

    DEs-inspired accelerated unfolded linearized ADMM networks for inverse problems

    Weixin An, Yuanyuan Liu, Fanhua Shang, Hongying Liu, and Licheng Jiao. DEs-inspired accelerated unfolded linearized ADMM networks for inverse problems. IEEE Transactions on Neural Networks and Learning Systems, 36(3):5319–5333, 2024.

  6. [6]

    Toward communication efficient adaptive gradient method

    Xiangyi Chen, Xiaoyun Li, and Ping Li. Toward communication efficient adaptive gradient method. In Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference, pages 119–128, 2020.

  7. [7]

    Fair federated learning under domain skew with local consistency and domain diversity

    Yuhang Chen, Wenke Huang, and Mang Ye. Fair federated learning under domain skew with local consistency and domain diversity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12077–12086, 2024.

  8. [8]

    Traffic prediction and load balancing routing algorithm based on deep Q-network for SD-IoT

    Qiao Ding, Nanyu Li, Heng Ding, Jian Wang, Tao Li, Yongqing Chen, Yantuan Xian, and Junyang Chen. Traffic prediction and load balancing routing algorithm based on deep Q-network for SD-IoT. Advanced Engineering Informatics, 68:103596, 2025.

  9. [9]

    Enhancing news classification: Domain-specific guided pretraining based on adaptive selective masking

    Qiao Ding, Heng Ding, Jian Wang, Yantuan Xian, Tao Li, Nanyu Li, Tao Fang, and Junyang Chen. Enhancing news classification: Domain-specific guided pretraining based on adaptive selective masking. Knowledge-Based Systems, page 115516, 2026.

  10. [10]

    Differential privacy

    Cynthia Dwork. Differential privacy. In International Colloquium on Automata, Languages, and Programming, pages 1–12. Springer, 2006.

  11. [11]

    The algorithmic foundations of differential privacy

    Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.

  12. [12]

    Refiner: Data refining against gradient leakage attacks in federated learning

    Mingyuan Fan, Cen Chen, Chengyu Wang, Xiaodan Li, and Wenmeng Zhou. Refiner: Data refining against gradient leakage attacks in federated learning. In 34th USENIX Security Symposium (USENIX Security 25), pages 3005–3024, 2025.

  13. [13]

    MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset

    Chen Feng and Ioannis Patras. MaskCon: Masked contrastive learning for coarse-labelled dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

  14. [14]

    SSR: An Efficient and Robust Framework for Learning with Unknown Label Noise

    Chen Feng, Georgios Tzimiropoulos, and Ioannis Patras. SSR: An efficient and robust framework for learning with unknown label noise. In 33rd British Machine Vision Conference (BMVC), 2022.

  15. [15]

    CLIPCleaner: Cleaning Noisy Labels with CLIP

    Chen Feng, Georgios Tzimiropoulos, and Ioannis Patras. CLIPCleaner: Cleaning noisy labels with CLIP. In The 32nd ACM International Conference on Multimedia (ACM MM), 2024.

  16. [16]

    NoiseBox: Towards More Efficient and Effective Learning with Noisy Labels

    Chen Feng, Georgios Tzimiropoulos, and Ioannis Patras. NoiseBox: Towards more efficient and effective learning with noisy labels. IEEE Transactions on Circuits and Systems for Video Technology, 2024.

  17. [17]

    PROSAC: Provably safe certification for machine learning models under adversarial attacks

    Chen Feng, Ziquan Liu, Zhuo Zhi, Ilija Bogunovic, Carsten Gerner-Beuerle, and Miguel R. D. Rodrigues. PROSAC: Provably safe certification for machine learning models under adversarial attacks. In The 39th Annual AAAI Conference on Artificial Intelligence (AAAI) [Oral], 2025.

  18. [18]

    Unveiling open-set noise: Theoretical insights into label noise

    Chen Feng, Nicu Sebe, Georgios Tzimiropoulos, Miguel R. D. Rodrigues, and Ioannis Patras. Unveiling open-set noise: Theoretical insights into label noise. In The 33rd ACM International Conference on Multimedia (ACM MM), 2025.

  19. [19]

    Noisy but valid: Robust statistical evaluation of LLMs with imperfect judges

    Chen Feng, Minghe Shen, Ananth Balashankar, Carsten Gerner-Beuerle, and Miguel R. D. Rodrigues. Noisy but valid: Robust statistical evaluation of LLMs with imperfect judges. In The Fourteenth International Conference on Learning Representations (ICLR), 2026.

  20. [20]

    Deconstructing the failure of ideal noise correction: A three-pillar diagnosis

    Chen Feng, Zhuo Zhi, Zhao Huang, Jiawei Ge, Ling Xiao, Nicu Sebe, Georgios Tzimiropoulos, and Ioannis Patras. Deconstructing the failure of ideal noise correction: A three-pillar diagnosis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

  21. [21]

    Sharpness-aware minimization for efficiently improving generalization

    Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. Sharpness-aware minimization for efficiently improving generalization. In ICLR, 2021.

  22. [22]

    Differentially private federated learning: A systematic review

    Jie Fu, Yuan Hong, Xinpeng Ling, Leixia Wang, Xun Ran, Zhiyu Sun, Wendy Hui Wang, Zhili Chen, and Yang Cao. Differentially private federated learning: A systematic review. arXiv preprint arXiv:2405.08299, 2024.

  23. [23]

    Differentially Private Federated Learning: A Client Level Perspective

    Robin C Geyer, Tassilo Klein, and Moin Nabi. Differentially private federated learning: A client level perspective. arXiv preprint arXiv:1712.07557, 2017.

  24. [24]

    DP-BREM: Differentially-Private and Byzantine-Robust federated learning with client momentum

    Xiaolan Gu, Ming Li, and Li Xiong. DP-BREM: Differentially-Private and Byzantine-Robust federated learning with client momentum. In 34th USENIX Security Symposium (USENIX Security 25), pages 3065–3082, 2025.

  25. [25]

    Measuring the effects of non-identical data distribution for federated visual classification

    Tzu-Ming Harry Hsu, Hang Qi, and Matthew Brown. Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335, 2019.

  26. [26]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, Minneapolis, Minnesota, 2019.

  27. [27]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

  28. [28]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

  29. [29]

    Heavy-tailed class imbalance and why Adam outperforms gradient descent on language models

    Frederik Kunstner, Alan Milligan, Robin Yadav, Mark Schmidt, and Alberto Bietti. Heavy-tailed class imbalance and why Adam outperforms gradient descent on language models. In Advances in Neural Information Processing Systems, pages 30106–30148. Curran Associates, Inc., 2024.

  30. [30]

    Tiny ImageNet visual recognition challenge

    Ya Le and Xuan Yang. Tiny ImageNet visual recognition challenge, 2015. Stanford CS231N.

  31. [31]

    SepPrune: Structured pruning for efficient deep speech separation

    Yuqi Li, Kai Li, Xin Yin, Zhifei Yang, Junhao Dong, Zeyu Dong, Chuanguang Yang, Yingli Tian, and Yao Lu. SepPrune: Structured pruning for efficient deep speech separation. arXiv preprint arXiv:2505.12079, 2025.

  32. [32]

    Frequency-aligned knowledge distillation for lightweight spatiotemporal forecasting

    Yuqi Li, Chuanguang Yang, Hansheng Zeng, Zeyu Dong, Zhulin An, Yongjun Xu, Yingli Tian, and Hao Wu. Frequency-aligned knowledge distillation for lightweight spatiotemporal forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7262–7272, 2025.

  33. [33]

    A comprehensive survey of interaction techniques in 3D scene generation

    Yuqi Li, Siwei Meng, Chuanguang Yang, Weilun Feng, Junming Liu, Zhulin An, Yikai Wang, and Yingli Tian. A comprehensive survey of interaction techniques in 3D scene generation. Authorea Preprints, 2026.

  34. [34]

    Differentially private federated learning with Laplacian smoothing

    Zhicong Liang, Bao Wang, Quanquan Gu, Stanley Osher, and Yuan Yao. Differentially private federated learning with Laplacian smoothing. Applied and Computational Harmonic Analysis, 72:101660, 2024.

  35. [35]

    Convex relaxation for robust vanishing point estimation in Manhattan world

    Bangyan Liao, Zhenjun Zhao, Haoang Li, Yi Zhou, Yingping Zeng, Hao Li, and Peidong Liu. Convex relaxation for robust vanishing point estimation in Manhattan world. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 15823–15832, 2025.

  36. [36]

    The health-wealth gradient in labor markets: Integrating health, insurance, and social metrics to predict employment density

    Dingyuan Liu, Qiannan Shen, and Jiaci Liu. The health-wealth gradient in labor markets: Integrating health, insurance, and social metrics to predict employment density. Computation, 14(1):22, 2026.

  37. [37]

    Cross-silo federated learning with record-level personalized differential privacy

    Junxu Liu, Jian Lou, Li Xiong, Jinfei Liu, and Xiaofeng Meng. Cross-silo federated learning with record-level personalized differential privacy. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, pages 303–317, 2024.

  38. [38]

    FedBCGD: Communication-efficient accelerated block coordinate gradient descent for federated learning

    Junkang Liu, Fanhua Shang, Yuanyuan Liu, Hongying Liu, Yuangang Li, and YunXiang Gong. FedBCGD: Communication-efficient accelerated block coordinate gradient descent for federated learning. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 2955–2963, 2024.

  39. [39]

    Improving generalization in federated learning with highly heterogeneous data via momentum-based stochastic controlled weight averaging

    Junkang Liu, Yuanyuan Liu, Fanhua Shang, Hongying Liu, Jin Liu, and Wei Feng. Improving generalization in federated learning with highly heterogeneous data via momentum-based stochastic controlled weight averaging. In Forty-second International Conference on Machine Learning, 2025.

  40. [40]

    Consistency of local and global flatness for federated learning

    Junkang Liu, Fanhua Shang, Yuxuan Tian, Hongying Liu, and Yuanyuan Liu. Consistency of local and global flatness for federated learning. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 3875–3883, New York, NY, USA, 2025. Association for Computing Machinery.

  41. [41]

    Consistency of local and global flatness for federated learning

    Junkang Liu, Fanhua Shang, Yuxuan Tian, Hongying Liu, and Yuanyuan Liu. Consistency of local and global flatness for federated learning. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 3875–3883, 2025.

  42. [42]

    FedMuon: Accelerating federated learning with matrix orthogonalization

    Junkang Liu, Fanhua Shang, Junchao Zhou, Hongying Liu, Yuanyuan Liu, and Jin Liu. FedMuon: Accelerating federated learning with matrix orthogonalization. arXiv preprint arXiv:2510.27403, 2025.

  43. [43]

    FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models

    Junkang Liu, Fanhua Shang, Kewen Zhu, Hongying Liu, Yuanyuan Liu, and Jin Liu. FedAdamW: A communication-efficient optimizer with convergence and generalization guarantees for federated large models. arXiv preprint arXiv:2510.27486, 2025.

  44. [44]

    DP-FedPGN: Finding global flat minima for differentially private federated learning via penalizing gradient norm

    Junkang Liu, Yuxuan Tian, Fanhua Shang, Yuanyuan Liu, Hongying Liu, Junchao Zhou, and Daorui Ding. DP-FedPGN: Finding global flat minima for differentially private federated learning via penalizing gradient norm. arXiv preprint arXiv:2510.27504, 2025.

  45. [45]

    Kill a bird with two stones: Closing the convergence gaps in non-strongly convex optimization by directly accelerated SVRG with double compensation and snapshots

    Yuanyuan Liu, Fanhua Shang, Weixin An, Hongying Liu, and Zhouchen Lin. Kill a bird with two stones: Closing the convergence gaps in non-strongly convex optimization by directly accelerated SVRG with double compensation and snapshots. In International Conference on Machine Learning, pages 14008–14035. PMLR, 2022.

  46. [46]

    A single-loop accelerated extra-gradient difference algorithm with improved complexity bounds for constrained minimax optimization

    Yuanyuan Liu, Fanhua Shang, Weixin An, Junhao Liu, Hongying Liu, and Zhouchen Lin. A single-loop accelerated extra-gradient difference algorithm with improved complexity bounds for constrained minimax optimization. Advances in Neural Information Processing Systems, 36:61699–61711, 2023.

  47. [47]

    Swin Transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.

  48. [48]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov, Frank Hutter, et al. Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101, 2017.

  49. [49]

    Communication-efficient learning of deep networks from decentralized data

    Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.

  50. [50]

    Improving global generalization and local personalization for federated learning

    Lei Meng, Zhuang Qi, Lei Wu, Xiaoyu Du, Zhaochuan Li, Lizhen Cui, and Xiangxu Meng. Improving global generalization and local personalization for federated learning. IEEE Transactions on Neural Networks and Learning Systems, 36(1):76–87, 2025.

  51. [51]

    Rényi differential privacy

    Ilya Mironov. Rényi differential privacy. In Proc. IEEE Computer Security Foundations Symposium (CSF), pages 263–275, 2017.

  52. [52]

    Differentially private federated learning on heterogeneous data

    Maxence Noble, Aurélien Bellet, and Aymeric Dieuleveut. Differentially private federated learning on heterogeneous data. In International Conference on Artificial Intelligence and Statistics, pages 10110–10145. PMLR, 2022.

  53. [53]

    Laplacian smoothing gradient descent

    Stanley Osher, Bao Wang, Penghang Yin, Xiyang Luo, Farzin Barekat, Minh Pham, and Alex Lin. Laplacian smoothing gradient descent. Research in the Mathematical Sciences, 9(3):55, 2022.

  54. [54]

    Federated learning for science: A survey on the path to a trustworthy collaboration ecosystem

    Xin Qi, Meixuan Li, Sijin Zhou, Wei Feng, and Zhuang Qi. Federated learning for science: A survey on the path to a trustworthy collaboration ecosystem. Authorea Preprints.

  55. [55]

    Federated learning in oncology: bridging artificial intelligence innovation and privacy protection

    Xin Qi, Tao Xu, Chengrun Dang, Zhuang Qi, Lei Meng, and Han Yu. Federated learning in oncology: bridging artificial intelligence innovation and privacy protection. Information Fusion, page 104154, 2026.

  56. [56]

    Cross-silo prototypical calibration for federated learning with non-IID data

    Zhuang Qi, Lei Meng, Zitan Chen, Han Hu, Hui Lin, and Xiangxu Meng. Cross-silo prototypical calibration for federated learning with non-IID data. In Proceedings of the 31st ACM International Conference on Multimedia, pages 3099–3107, 2023.

  57. [57]

    On the convergence of Adam and beyond

    Sashank J Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of Adam and beyond. In International Conference on Learning Representations, 2018.

  58. [58]

    On the Convergence of Adam and Beyond

    Sashank J Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237, 2019.

  59. [59]

    Adaptive federated optimization

    Sashank J Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, and Hugh Brendan McMahan. Adaptive federated optimization. In International Conference on Learning Representations, 2021.

  60. [60]

    AI-enhanced disaster risk prediction with explainable SHAP analysis: A multi-class classification approach using XGBoost

    Qiannan Shen and Jing Zhang. AI-enhanced disaster risk prediction with explainable SHAP analysis: A multi-class classification approach using XGBoost. 2025. Preprint, Version 1, posted December 31, 2025.

  61. [61]

    MFTformer: Meteorological-frequency-temporal transformer with block-aligned fusion for traffic flow prediction

    Qiannan Shen and Jing Zhang. MFTformer: Meteorological-frequency-temporal transformer with block-aligned fusion for traffic flow prediction. Research Square, 2026. Preprint, doi:10.21203/rs.3.rs-8770196/v1.

  62. [62]

    Make landscape flatter in differentially private federated learning

    Yifan Shi, Yingqi Liu, Kang Wei, Li Shen, Xueqian Wang, and Dacheng Tao. Make landscape flatter in differentially private federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24552–24562, 2023.

  63. [63]

    High-recall deep learning: A gated recurrent unit approach to bank account fraud detection on imbalanced data

    Wenxi Sun, Zhichun Qi, and Qiannan Shen. High-recall deep learning: A gated recurrent unit approach to bank account fraud detection on imbalanced data. 2025.

  64. [64]

    Objective over architecture: Fraud detection under extreme imbalance in bank account opening

    Wenxi Sun, Qiannan Shen, Yijun Gao, Qinkai Mao, Tongsong Qi, and Shuo Xu. Objective over architecture: Fraud detection under extreme imbalance in bank account opening. Computation, 13(12):290, 2025.

  65. [65]

    Efficient federated learning via local adaptive amended optimizer with linear speedup

    Yan Sun, Li Shen, Hao Sun, Liang Ding, and Dacheng Tao. Efficient federated learning via local adaptive amended optimizer with linear speedup. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12):14453–14464, 2023.

  66. [66]

    LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition

    Zhonglin Sun, Chen Feng, Ioannis Patras, and Georgios Tzimiropoulos. LAFS: Landmark-based facial self-supervised learning for face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

  67. [67]

    GLUE: A multi-task benchmark and analysis platform for natural language understanding

    Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 353–355, 2018.

  68. [68]

    DC-SGD: Differentially private SGD with dynamic clipping through gradient norm distribution estimation

    Chengkun Wei, Weixian Li, Gong Chen, and Wenzhi Chen. DC-SGD: Differentially private SGD with dynamic clipping through gradient norm distribution estimation. IEEE Transactions on Information Forensics and Security, 2025.

  69. [69]

    Faster adaptive federated learning

    Xidong Wu, Feihu Huang, Zhengmian Hu, and Heng Huang. Faster adaptive federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 10379–10387, 2023.

  70. [70]

    Implicit bias of AdamW: ℓ∞-norm constrained optimization

    Shuo Xie and Zhiyuan Li. Implicit bias of AdamW: ℓ∞-norm constrained optimization. In International Conference on Machine Learning, pages 54488–54510. PMLR, 2024.

  71. [71]

    Dual defense: Enhancing privacy and mitigating poisoning attacks in federated learning

    Runhua Xu, Shiqi Gao, Chao Li, James Joshi, and Jianxin Li. Dual defense: Enhancing privacy and mitigating poisoning attacks in federated learning. Advances in Neural Information Processing Systems, 37:70476–70498, 2024.

  72. [72]

    From risk to resilience: Towards assessing and mitigating the risk of data reconstruction attacks in federated learning

    Xiangrui Xu, Zhize Li, Yufei Han, Bin Wang, Jiqiang Liu, and Wei Wang. From risk to resilience: Towards assessing and mitigating the risk of data reconstruction attacks in federated learning. In 34th USENIX Security Symposium (USENIX Security 25), pages 3141–3160, 2025.

  73. [73]

    PrivateFL: Accurate, differentially private federated learning via personalized data transformation

    Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. PrivateFL: Accurate, differentially private federated learning via personalized data transformation. In 32nd USENIX Security Symposium (USENIX Security 23), pages 1595–1612, 2023.

  74. [74]

    Why transformers need Adam: A Hessian perspective

    Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, and Zhiquan Luo. Why transformers need Adam: A Hessian perspective. Advances in Neural Information Processing Systems, 37:131786–131823, 2024.

  75. [75]

    BALF: Simple and efficient blur aware local feature detector

    Zhenjun Zhao. BALF: Simple and efficient blur aware local feature detector. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3362–3372, 2024.

  76. [76]

    Benchmark for evaluating initialization of visual-inertial odometry

    Zhenjun Zhao and Ben M Chen. Benchmark for evaluating initialization of visual-inertial odometry. In 2023 42nd Chinese Control Conference (CCC), pages 3935–3940. IEEE, 2023.

  77. [77]

    Advances in global solvers for 3D vision

    Zhenjun Zhao, Heng Yang, Bangyan Liao, Yingping Zeng, Shaocheng Yan, Yingdong Gu, Peidong Liu, Yi Zhou, Haoang Li, and Javier Civera. Advances in global solvers for 3D vision. arXiv preprint arXiv:2602.14662, 2026.

  78. [78]

    Towards understanding convergence and generalization of AdamW

    Pan Zhou, Xingyu Xie, Zhouchen Lin, and Shuicheng Yan. Towards understanding convergence and generalization of AdamW. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(9):6486–6493, 2024.

  79. [79]

    Supplementary material: more implementation details and results (internal anchor)

    Table note: "Depth" denotes the total number of layers (blocks for ResNet/Swin, encoder layers for ViT/RoBERTa), and "Stages" … Section 9.1, More Results, Swin-Tiny/Base on CIFAR-10: Table 10 reports the averaged test accuracy on CIFAR-10 for six different DPFL methods evaluated on Swin-Tiny and Swin-Base, under two data heterogeneity levels (Dirichlet α=0.6 and α=0.1). Overall, DP-FedAdamW consistently achieves the best performance, while Swin-Base is markedly more…

  80. [80]

    DP-LocalAdamW Algorithm (internal anchor)

    For completeness, we provide in Algorithm 2 the full local training procedure of DP-LocalAdamW.

Showing first 80 references.