pith. machine review for the scientific record.

arxiv: 2604.19336 · v1 · submitted 2026-04-21 · 💻 cs.LG · math.OC

Recognition: unknown

FedSEA: Achieving Benefit of Parallelization in Federated Online Learning

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 03:40 UTC · model grok-4.3

classification 💻 cs.LG · math.OC
keywords federated learning · online learning · regret bounds · convex optimization · data heterogeneity · stochastic gradient descent · parallelization · adversarial setting

The pith

Under a stochastically extended adversary, federated online learning achieves regret bounds that improve with more parallel clients when temporal data variation stays mild relative to gradient noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper extends online federated learning by introducing a stochastically extended adversary in which the loss function stays fixed across clients and time while the adversary independently picks data distributions for each client at each step. It proposes FedSEA, an algorithm that performs online stochastic gradient descent locally at the clients and periodically aggregates the iterates at the server. The analysis produces network regret bounds of order sqrt(T) for smooth convex losses and log(T) for smooth strongly convex losses, while separating the contributions of spatial heterogeneity across clients from temporal heterogeneity over time. The central result is that these bounds allow the overall regret to decrease as the number of clients grows, provided temporal variation remains small compared to the variance of the stochastic gradients.
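To make the procedure concrete, here is a minimal sketch of local online SGD with periodic server averaging under an adversary that picks per-client data each round; the step size, aggregation period, and the `sample_batch` / `grad` callables are illustrative placeholders, not the paper's actual FedSEA specification.

```python
import numpy as np

def local_osgd_periodic_avg(sample_batch, grad, dim, num_clients,
                            horizon, local_steps=5, lr=0.1):
    """Sketch: one online SGD step per client per round, with the server
    averaging the client iterates every `local_steps` rounds.

    sample_batch(k, t) -> data chosen for client k at round t (the adversary's pick)
    grad(x, z)         -> stochastic gradient of the fixed loss at x on data z
    """
    x = np.zeros((num_clients, dim))            # one iterate per client
    history = []
    for t in range(horizon):
        for k in range(num_clients):
            z = sample_batch(k, t)              # distribution varies per client and per round
            x[k] -= lr * grad(x[k], z)          # local online SGD step on the fixed loss
        if (t + 1) % local_steps == 0:
            x[:] = x.mean(axis=0)               # periodic aggregation at the server
        history.append(x.copy())
    return history
```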

Core claim

The paper establishes that in the SEA setting the proposed algorithm achieves an O(sqrt(T)) network regret bound for smooth convex losses and an O(log T) bound for smooth strongly convex losses over horizon T. These bounds explicitly decompose the effects of spatial heterogeneity across clients and temporal heterogeneity over time. The analysis identifies a regime of mild temporal variation relative to stochastic gradient variance in which the network regret decreases with the number of parallel clients, improving on standard worst-case results that show no benefit from parallelization.
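For orientation, a common way to write the network regret that such bounds refer to; the exact definition and comparator used by the paper may differ, so this is only a schematic.

```latex
% Schematic network regret over K clients and horizon T (form assumed,
% not quoted from the paper): x_{k,t} is client k's iterate, \ell the
% fixed loss, and D_{k,t} the adversary's distribution for client k at round t.
\mathrm{Reg}_T \;=\;
\sum_{t=1}^{T} \frac{1}{K}\sum_{k=1}^{K}
  \mathbb{E}_{z \sim D_{k,t}}\!\big[\ell(x_{k,t};z)\big]
\;-\;
\min_{x \in \mathcal{X}} \sum_{t=1}^{T} \frac{1}{K}\sum_{k=1}^{K}
  \mathbb{E}_{z \sim D_{k,t}}\!\big[\ell(x;z)\big].
```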

What carries the argument

The stochastically extended adversary (SEA), which keeps the loss function identical across clients and time steps while letting the adversary choose independent data distributions for each client at each time; this structure allows the regret proof to isolate spatial and temporal heterogeneity and locate the regime where adding clients reduces total regret.

If this is right

  • Network regret scales as O(sqrt(T)) for smooth convex losses.
  • Network regret scales as O(log T) for smooth strongly convex losses.
  • Regret decreases with the number of clients when temporal variation is mild relative to stochastic gradient variance.
  • The separate impacts of spatial and temporal data heterogeneity appear explicitly in the derived bounds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The separation of heterogeneity types could be reused in other distributed online settings that track slowly varying streams.
  • System designers could monitor estimated temporal variation to decide when adding more clients will yield lower cumulative regret.
  • The same mild-variation condition might be tested in non-convex or non-smooth variants to see whether parallelization benefits persist.

Load-bearing premise

The loss function remains fixed across all clients and all time steps, even as the adversary independently selects different data distributions for each client at each time.
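Written out with assumed notation (not copied from the paper), the premise is simply that one loss function ℓ is shared everywhere and only the data distributions move.

```latex
% One fixed loss \ell for every client k and round t; only D_{k,t} varies.
f_{k,t}(x) \;=\; \mathbb{E}_{z \sim D_{k,t}}\big[\ell(x;z)\big],
\qquad \ell \ \text{fixed for all } k \in [K],\ t \in [T],
\qquad D_{k,t}\ \text{chosen independently by the adversary.}
```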

What would settle it

Run the algorithm under the SEA model while systematically increasing the scale of temporal changes in the per-client distributions until that scale exceeds the stochastic gradient variance, then verify whether the observed network regret ceases to decrease with added clients.
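A minimal simulation of that test under assumed ingredients (a quadratic loss ℓ(x; z) = ½‖x − z‖², Gaussian per-client means with a tunable drift scale, and local SGD with periodic averaging); none of this is the paper's experimental protocol, only one way to probe the claimed regime.

```python
import numpy as np

def sea_regret(num_clients, horizon, drift_scale, noise_std=1.0,
               lr=0.05, local_steps=5, dim=5, seed=0):
    """Network regret of local SGD + periodic averaging under a toy SEA adversary.

    Fixed loss l(x; z) = 0.5 * ||x - z||^2; the adversary drifts each client's
    mean by a random step of scale `drift_scale` every round, while the
    per-sample noise (gradient noise) has scale `noise_std`.
    """
    rng = np.random.default_rng(seed)
    means = rng.normal(size=(num_clients, dim))
    x = np.zeros((num_clients, dim))
    samples, alg_loss = [], 0.0
    for t in range(horizon):
        means += drift_scale * rng.normal(size=means.shape)    # temporal variation
        z = means + noise_std * rng.normal(size=means.shape)   # per-client samples
        alg_loss += 0.5 * np.sum((x - z) ** 2, axis=1).mean()
        samples.append(z)
        x -= lr * (x - z)                                      # local SGD on the fixed quadratic loss
        if (t + 1) % local_steps == 0:
            x[:] = x.mean(axis=0)                              # periodic server aggregation
    z_all = np.concatenate(samples)                            # shape (horizon * K, dim)
    x_star = z_all.mean(axis=0)                                # best fixed point in hindsight
    comparator = (0.5 * np.sum((z_all - x_star) ** 2, axis=1)
                  .reshape(horizon, num_clients).mean(axis=1).sum())
    return alg_loss - comparator

# Sweep the drift scale against the gradient-noise scale and check whether
# adding clients keeps lowering the regret once drift dominates.
for drift in (0.0, 0.05, 0.2, 1.0):
    print(drift, [round(sea_regret(K, 2000, drift), 1) for K in (1, 4, 16)])
```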

Figures

Figures reproduced from arXiv: 2604.19336 by Harekrushna Sahu, Pranay Sharma, Pratik Jawanpuria.

Figure 1. Illustration of online federated learning in load forecasting in smart grids. The substations can be modeled as different clients, each with its unique load distribution. The curves above show the average hourly electricity load (in million kWh) in 4 regions of the US during a typical day in a few selected months (Source: U.S. Energy Information Administration). The spatial variations in the load are evide…
read the original abstract

Online federated learning (OFL) has emerged as a popular framework for decentralized decision-making over continuous data streams without compromising client privacy. However, the adversary model assumed in standard OFL typically precludes any potential benefits of parallelization. Further, it fails to adequately capture the different sources of statistical variation in OFL problems. In this paper, we extend the OFL paradigm by integrating a stochastically extended adversary (SEA). Under this framework, the loss function remains fixed across clients over time. However, the adversary dynamically and independently selects the data distribution for each client at each time. We propose the \algoOFL{} algorithm to solve this problem, which utilizes online stochastic gradient descent at the clients, along with periodic global aggregation via the server. We establish bounds on the global network regret over a time horizon \(T\) for two classes of functions: (1) for smooth and convex losses, we prove an \(\mathcal{O}(\sqrt{T})\) bound, and (2) for smooth and strongly convex losses, we prove an \(\mathcal{O}(\log T)\) bound. Through careful analysis, we quantify the individual impact of both spatial (across clients) and temporal (over time) data heterogeneity on the regret bounds. Consequently, we identify a regime of mild temporal variation (relative to stochastic gradient variance), where the network regret improves with parallelization. Hence, in the SEA setting, our results improve the existing pessimistic worst-case results in online federated learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces the stochastically extended adversary (SEA) model for online federated learning, in which the loss function is fixed across clients and time while an adversary independently selects per-client data distributions at each round. It proposes the FedSEA algorithm, which performs local online stochastic gradient descent at clients with periodic global aggregation at the server. For smooth convex losses the network regret is bounded by O(sqrt(T)); for smooth strongly convex losses the bound is O(log T). The analysis decomposes the regret into stochastic variance, spatial heterogeneity, and temporal drift terms, and identifies a regime of mild temporal variation (relative to per-client gradient variance) in which the variance term contracts by a factor of 1/K, yielding a net improvement in regret with increased parallelism.
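As a reading aid only, one plausible shape for the decomposition described above, with placeholder symbols (σ² for per-client gradient variance, ζ² for spatial heterogeneity, V_T for cumulative temporal drift); the paper's actual theorem statements and constants are not reproduced here.

```latex
% Schematic, not the paper's bound: the variance term contracts by 1/K,
% while spatial heterogeneity and temporal drift enter additively.
\mathrm{Reg}_T \;\lesssim\;
\underbrace{\sqrt{\tfrac{\sigma^2}{K}\, T}}_{\text{stochastic variance}}
\;+\;
\underbrace{c_1\, \zeta^2}_{\text{spatial heterogeneity}}
\;+\;
\underbrace{c_2\, V_T}_{\text{temporal drift}} .
% Adding clients keeps helping while the drift term stays small relative to
% the variance term ("mild temporal variation").
```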

Significance. If the stated regret bounds hold, the work is significant because it supplies a concrete, falsifiable regime in which online federated learning benefits from parallelization, in contrast to the pessimistic worst-case results that dominate the existing literature. The explicit decomposition of regret into variance, spatial, and temporal components, together with the identification of the mild-drift threshold that triggers the 1/K contraction, provides a refined theoretical tool for understanding statistical variation in streaming federated settings and could guide practical choices of aggregation frequency and client participation.

minor comments (3)
  1. The abstract refers to the algorithm as “algoOFL” (presumably a LaTeX macro); replace with the consistent name “FedSEA” used in the title and body.
  2. The key modeling assumption that the loss function itself remains fixed while only the distributions vary should be stated explicitly in the introduction (rather than only in the abstract) to make the distinction from standard OFL adversary models immediately clear.
  3. In the regret analysis, the precise threshold on the temporal-drift coefficient relative to stochastic gradient variance that guarantees the 1/K improvement should be stated as a corollary or remark immediately after the main theorems, so that the “mild temporal variation” regime is quantitatively delineated.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. The referee accurately captures the SEA model, the FedSEA algorithm, the O(sqrt(T)) and O(log T) regret bounds, the decomposition into variance/spatial/temporal terms, and the identification of the mild temporal variation regime in which parallelism yields a 1/K improvement. We are encouraged by the recognition that this supplies a falsifiable regime where online federated learning can benefit from parallelization, in contrast to existing worst-case analyses.

Circularity Check

0 steps flagged

No significant circularity detected in derivation

full rationale

The paper extends the online federated learning setting with a new SEA adversary model (fixed loss function, client- and time-varying distributions chosen independently by an adversary) and derives network regret bounds for FedSEA using standard online SGD analysis augmented with explicit spatial-heterogeneity and temporal-drift terms. The O(sqrt(T)) bound for smooth convex losses and O(log T) bound for smooth strongly convex losses are obtained by decomposing the regret into variance, heterogeneity, and drift components and applying the usual telescoping and smoothness arguments; the parallelization benefit in the mild-temporal-variation regime follows directly from the 1/K contraction of the variance term when temporal drift is small relative to per-client gradient variance. No step reduces to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation; the derivation remains self-contained against the external body of online convex optimization results.
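For readers less familiar with the "usual telescoping" step mentioned here, the single-round inequality in standard (projected) online gradient descent is the textbook identity below; it is quoted as background, not as the paper's own proof.

```latex
% One round of online gradient descent x_{t+1} = x_t - \eta g_t with
% stochastic gradient g_t (\mathbb{E}[g_t \mid x_t] = \nabla f(x_t)), convex f:
\mathbb{E}\big[f(x_t) - f(x^\star)\big]
\;\le\; \mathbb{E}\big[\langle \nabla f(x_t),\, x_t - x^\star\rangle\big]
\;\le\; \frac{\mathbb{E}\|x_t - x^\star\|^2 - \mathbb{E}\|x_{t+1} - x^\star\|^2}{2\eta}
\;+\; \frac{\eta}{2}\,\mathbb{E}\|g_t\|^2 .
% Summing over t telescopes the distance terms; \eta \propto 1/\sqrt{T}
% then yields the O(\sqrt{T}) rate.
```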

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on standard smoothness and convexity assumptions plus the new SEA modeling choice; no free parameters are fitted and no new physical entities are postulated.

axioms (1)
  • domain assumption Loss functions are smooth and convex (or strongly convex)
    Invoked to obtain the O(sqrt(T)) and O(log T) regret bounds under the SEA model; the standard forms of these conditions are sketched just after the ledger.
invented entities (1)
  • Stochastically Extended Adversary (SEA) no independent evidence
    purpose: Models fixed loss but independently chosen data distributions per client per time step
    Core modeling extension that enables the parallelization-benefit regime; no external falsifiable prediction supplied.
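
For completeness, the standard textbook forms of the two conditions invoked in the ledger (the paper's exact constants are not reproduced here):

```latex
% L-smoothness and \mu-strong convexity of f on a convex set \mathcal{X}:
\|\nabla f(x) - \nabla f(y)\| \;\le\; L\,\|x - y\|
\quad\text{(smooth)},
\qquad
f(y) \;\ge\; f(x) + \langle \nabla f(x),\, y - x\rangle + \tfrac{\mu}{2}\|y - x\|^2
\quad\text{(strongly convex)},
\qquad \forall\, x,y \in \mathcal{X}.
```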

pith-pipeline@v0.9.0 · 5573 in / 1316 out tokens · 32182 ms · 2026-05-10T03:40:52.089541+00:00 · methodology

discussion (0)

