FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models
abstract
AdamW has become one of the most effective optimizers for training large-scale models, and we have also observed its effectiveness in federated learning (FL). However, directly applying AdamW in federated settings poses significant challenges: (1) due to data heterogeneity, AdamW often yields high variance in the second-moment estimate $\boldsymbol{v}$; (2) local overfitting of AdamW may cause client drift; and (3) reinitializing the moment estimates ($\boldsymbol{m}$, $\boldsymbol{v}$) at each round slows down convergence. To address these challenges, we propose the first \underline{Fed}erated \underline{AdamW} algorithm, called \texttt{FedAdamW}, for training and fine-tuning various large models. \texttt{FedAdamW} aligns local updates with the global update using both a \textbf{local correction mechanism} and decoupled weight decay to mitigate local overfitting. \texttt{FedAdamW} also efficiently aggregates the \texttt{mean} of the second-moment estimates to reduce their variance and reinitialize them. Theoretically, we prove that \texttt{FedAdamW} achieves a linear speedup convergence rate of $\mathcal{O}(\sqrt{(L \Delta \sigma_l^2)/(S K R \epsilon^2)}+(L \Delta)/R)$ without a \textbf{heterogeneity assumption}, where $S$ is the number of participating clients per round, $K$ is the number of local iterations, and $R$ is the total number of communication rounds. We also employ a PAC-Bayesian generalization analysis to explain the effectiveness of decoupled weight decay in local training. Empirically, we validate \texttt{FedAdamW} on language and vision Transformer models: compared to several baselines, it significantly reduces communication rounds and improves test accuracy. The code is available at https://github.com/junkangLiu0/FedAdamW.
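To make the three ingredients described in the abstract concrete, below is a minimal NumPy sketch of one possible round structure: clients run local AdamW with decoupled weight decay, the server averages the local models and the mean of the second-moment estimates (used to warm-start $\boldsymbol{v}$ instead of reinitializing it to zero), and a drift-correction term is fed into each client's gradients. This is an illustration under stated assumptions, not the paper's reference implementation: the SCAFFOLD-flavored correction and its scaling, the toy quadratic clients, and the full-vector averaging of $\boldsymbol{v}$ are all stand-ins (the paper stresses communication efficiency, so its actual second-moment aggregation may be more compact), and Adam's bias correction is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def quad_grad(x, A, b):
    """Gradient of a client's toy quadratic loss 0.5 * ||A @ x - b||^2."""
    return A.T @ (A @ x - b)

def client_update(x0, v0, corr, grad_fn, K=10, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """K local AdamW steps. m is reinitialized locally; v is warm-started
    from the server-aggregated mean. `corr` is a drift-correction term
    added to each local gradient (a SCAFFOLD-flavored stand-in, not
    necessarily the paper's exact rule). Bias correction is omitted."""
    x, m, v = x0.copy(), np.zeros_like(x0), v0.copy()
    for _ in range(K):
        g = grad_fn(x) + corr
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Decoupled weight decay: shrink x directly rather than through g.
        x -= lr * (m / (np.sqrt(v) + eps) + wd * x)
    return x, v

# Toy federation: S clients with heterogeneous quadratic objectives.
d, S, R, K, lr = 5, 4, 50, 10, 0.01
clients = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(S)]
x = np.zeros(d)
v_mean = np.zeros(d)                     # aggregated mean of second moments
corrs = [np.zeros(d) for _ in range(S)]  # per-client drift corrections

for _ in range(R):
    xs, vs, deltas = [], [], []
    for i, (A, b) in enumerate(clients):
        xi, vi = client_update(x, v_mean, corrs[i],
                               lambda z: quad_grad(z, A, b), K=K, lr=lr)
        xs.append(xi); vs.append(vi); deltas.append(x - xi)
    x = np.mean(xs, axis=0)        # FedAvg-style model aggregation
    v_mean = np.mean(vs, axis=0)   # mean of second moments, reused next round
    mean_delta = np.mean(deltas, axis=0)
    # Nudge each client toward the global update direction next round
    # (hypothetical scaling; the paper's correction may differ).
    corrs = [(mean_delta - deltas[i]) / (K * lr) for i in range(S)]

avg_loss = np.mean([0.5 * np.sum((A @ x - b) ** 2) for A, b in clients])
print(f"final average loss: {avg_loss:.4f}")
```

On the stated rate, the dominant $\sqrt{1/(SKR)}$ term is what the linear-speedup claim refers to: for a fixed target, the required number of communication rounds $R$ shrinks proportionally as more clients $S$ participate or more local steps $K$ are taken per round.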
citing papers
- Fusion and Alignment Enhancement with Large Language Models for Tail-item Sequential Recommendation
  FAERec fuses collaborative ID embeddings with LLM semantic embeddings using adaptive gating and dual-level alignment to enhance tail-item sequential recommendations.
- From Selection to Scheduling: Federated Geometry-Aware Correction Makes Exemplar Replay Work Better under Continual Dynamic Heterogeneity
  FEAT mitigates representation collapse and prediction bias in federated continual learning by aligning feature angular similarities to shared Equiangular Tight Frame prototypes and removing task-irrelevant directional components from embeddings.
- Personalized Federated Learning for Gradient Alignment
  pFLAlign uses two gradient alignment mechanisms derived from PAC-Bayesian analysis to reduce variance in local training and distortion in aggregation, yielding state-of-the-art personalization in federated learning.