FIRMA: FIbonacci Ring Model Aggregation for Privacy-preserving Federated Learning

Rachid Hedjam

arxiv: 2605.22898 · v1 · pith:TUPPHUTPnew · submitted 2026-05-21 · 💻 cs.LG


Pith reviewed 2026-05-25 05:29 UTC · model grok-4.3

classification 💻 cs.LG

keywords federated learning · privacy-preserving · ring topology · Fibonacci weighting · server-free · decentralized aggregation · heterogeneous data · model aggregation


The pith

A Fibonacci-weighted ring aggregation protocol enables server-free federated learning with permanently private classification heads.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FIRMA as a family of protocols that perform model aggregation over a ring topology without any central server. Fibonacci sequence weights create asymmetric blending of neighbors while classification heads stay hidden from all peers. Later versions add accuracy-based suppression of weak neighbors, a permutation step to improve class diversity around the ring, and calibrated self-retention. Experiments across four benchmarks and seven heterogeneity settings show the full protocol exceeds standard averaging on every label-skew case and leads other server-free methods in most Dirichlet settings.

Core claim

FIRMA shows that server-free ring aggregation using Fibonacci directional bias, combined with accuracy-gated suppression and 2-opt ring permutation for diversity, produces a protocol that satisfies a convergence bound and delivers higher accuracy than FedAvg in all twelve label-skew configurations, with the largest gain of 20.7 percentage points on CIFAR-10 at K=1, while remaining Pareto-dominant among server-free approaches under Dirichlet heterogeneity.

What carries the argument

The fibflpp protocol, which blends models along a ring using Fibonacci weights, accuracy gating, 2-opt permutation, and multiple gossip passes to achieve coverage and retention.
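To make the blending step concrete, here is a minimal sketch of one Fibonacci-weighted gossip pass over a ring. It uses the weight pair α = 1/φ and β = 1/φ² that the paper quotes. Everything else is an assumption made for illustration: the function name `ring_blend`, the choice of giving α to the left neighbour and β to the right, the self-retention fraction `rho`, and the representation of each client's shared body as a flat list of floats. Classification heads are simply never passed in, which is how the sketch models their staying private.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
ALPHA = 1 / PHI        # about 0.618, weight assumed for the left neighbour
BETA = 1 / PHI ** 2    # about 0.382, weight assumed for the right neighbour


def ring_blend(bodies, rho=0.5):
    """One gossip pass over shared-body parameters arranged in ring order.

    bodies: list of equal-length lists of floats, one per client.
    rho:    hypothetical self-retention fraction (not taken from the paper).
    Heads are not an argument: in this sketch they never leave the client.
    """
    n = len(bodies)
    blended = []
    for i in range(n):
        own = bodies[i]
        left = bodies[(i - 1) % n]
        right = bodies[(i + 1) % n]
        blended.append([
            rho * o + (1 - rho) * (ALPHA * l + BETA * r)
            for o, l, r in zip(own, left, right)
        ])
    return blended


if __name__ == "__main__":
    # Three clients, one shared parameter each.
    print(ring_blend([[1.0], [2.0], [3.0]]))
```

Because α + β = 1, each client's new body is a convex combination of its own and its two neighbours' bodies, and under this particular assignment the ring-wide mean of each parameter is unchanged by a pass.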

If this is right

The protocol satisfies a proven convergence rate bound.
Classification heads remain permanently private from all other clients.
Global ring coverage is obtained with K_g = ⌈N/2⌉ gossip passes.
Among server-free protocols, fibflpp is Pareto-dominant under Dirichlet heterogeneity and achieves the highest accuracy in 17 of 28 total configurations.
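The coverage item above can be checked with a small simulation. The sketch assumes that in each gossip pass every client receives from both of its ring neighbours, so the set of clients whose information has reached it grows by one hop in each direction. That exchange model and the helper name `passes_for_full_coverage` are assumptions; the paper's exact definition of a pass is not reproduced on this page.

```python
import math


def passes_for_full_coverage(n):
    """Count passes until every client has indirectly heard from all n clients."""
    reach = [{i} for i in range(n)]
    passes = 0
    while any(len(r) < n for r in reach):
        # Each client merges what its left and right neighbours already hold.
        reach = [reach[(i - 1) % n] | reach[i] | reach[(i + 1) % n]
                 for i in range(n)]
        passes += 1
    return passes


# Under the stated assumption, ceil(N/2) passes are always enough.
for n in range(1, 12):
    assert passes_for_full_coverage(n) <= math.ceil(n / 2)
```

In this model ⌈N/2⌉ passes are sufficient for every ring size tested; the simulation says nothing about whether the bound stays efficient for large N, which is the open question raised further down.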

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same directional bias and gating logic could be tested on non-ring communication graphs.
The privacy guarantee might extend to additional layers if the same suppression rule is applied.
Larger client counts could be examined to check whether the N/2 coverage rule remains efficient.

Load-bearing premise

The Fibonacci directional bias combined with accuracy-gated suppression and 2-opt ring permutation will preserve both convergence guarantees and permanent head privacy without introducing new leakage vectors or convergence failures under the tested heterogeneity regimes.

What would settle it

A run on one of the twelve label-skew configurations in which fibflpp accuracy falls below FedAvg or any peer obtains information about another client's classification head.

Figures

Figures reproduced from arXiv: 2605.22898 by Rachid Hedjam.

**Figure 2.** Illustrates the ring topology and information flow. The three variants are presented as an ablation by construction rather than competing proposals: each isolates the incremental contribution of a single architectural component, allowing its effect to be quantified directly from the results tables. FIBFL and FIBFL+ additionally serve as lightweight alternatives for short-budget federations where FIBFL++’s …

**Figure 3.** Per-round mean top-1 accuracy on CIFAR-10 (…

**Figure 4.** Per-round mean top-1 accuracy on Fashion-MNIST (…

**Figure 5.** Per-round mean top-1 accuracy on MNIST-60k (…

**Figure 6.** Per-round mean top-1 accuracy on MNIST-1797 (…

**Figure 7.** CIFAR-10 worst-to-best accuracy ranking across all 7 scenarios. Gold border = best method; outlined bars = …

**Figure 8.** Fashion-MNIST worst-to-best ranking. FIBFL++ achieves unbroken 2nd-place across all 7 scenarios. (color online) 3) MNIST-60k. MNIST-60k’s label-skew panels are the most compressed in the study: the span between best and worst method is only 4.7 pp at K=3 (0.937–0.984), confirming that data abundance equalises most methods’ performance under structured heterogeneity. FIBFL++ again achieves 2nd place in all …

**Figure 9.** MNIST-60k worst-to-best ranking. Data abundance compresses label-skew spans to a maximum of 4.7 pp; …

**Figure 10.** MNIST-1797 worst-to-best ranking. The LS …

**Figure 11.** CIFAR-10 convergence profiling. (a) R50%. (b) Plat. σ (lower = more stable). (color online) rounds of accumulated global knowledge, with only 2 recovery rounds remaining in the 10-round budget. FIBFL+’s Dir 0.8 Plat. σ=0.129 is the second highest, driven by the same single-client dominance mechanism. FIBFL++ avoids both failures entirely: σ≤0.012 across all 7 MNIST-1797 scenarios, a 10–12× improvement ove…

**Figure 12.** MNIST-1797 convergence profiling (N=5, R=10). Plat. σ capped at 0.145; annotated values exceed this threshold. FIBFL’s LS K=3 (σ=0.140) is the global study maximum. (color online) 6.7 Ring Topology and Accuracy–Fairness Trade-off We use Fashion-MNIST as the representative dataset for the ring saving analysis: it occupies the middle of the complexity spectrum and its savings span the full qualitative range…

**Figure 13.** Panel (a) shows the 2-opt ring savings on Fashion-MNIST. Under IID, savings are near-zero (0.1%): no beneficial reordering exists when clients are statistically homogeneous. As heterogeneity increases, savings grow monotonically: Dir α=0.8 (35.9%), Dir α=0.5 (41.0%), Dir α=0.1 (79.1%). Label-skew savings are high at K=1 and K=2 (77.2% and 63.3%) and drop at K=3 (29.6%) as broader per-client class coverage reduc…

**Figure 14.** Multi-dimensional performance radar aggregated across all four datasets.

Original abstract

Federated learning protocols face a structural trilemma: canonical server-based aggregation~\cite{mcmahan2017} creates a single point of failure and gradient inversion risk; decentralised ring-gossip alternatives~\cite{hu2019segmented} expose classification heads to semi-honest peers via uninformed uniform weights; and personalised methods~\cite{collins2021exploiting} reintroduce central aggregation. No existing protocol simultaneously achieves server-free operation, permanently private heads, ring topology, and principled asymmetric neighbour weighting. We propose FIRMA (\textbf{FI}bonacci \textbf{R}ing \textbf{M}odel \textbf{A}ggregation), a family of three progressively enhanced federated learning protocols: 1) \fibfl\ establishes the foundation: server-free ring aggregation with Fibonacci-weighted neighbour blending and permanently private classification heads. 2) \fibflp\ augments this with accuracy-gated neighbour suppression, selectively down-weighting poorly-converged peers while preserving the Fibonacci directional bias. 3) \fibflpp, the full system, completes the family with a 2-opt ring permutation that maximises adjacent-client class diversity, global ring coverage via $K_g{=}\lceil N/2\rceil$ gossip passes, and cosine-annealed self-retention calibration. We establish a convergence rate bound and three supporting propositions governing normalisation, coverage, retention, and diversity optimality. Systematic experiments across 28 configurations -- four benchmarks crossed with seven heterogeneity regimes -- demonstrate that \fibflpp\ surpasses \fedavg\ in all 12 label-skew configurations, with a peak advantage of $+20.7$\,pp on CIFAR-10 at $K{=}1$. Under Dirichlet heterogeneity, \fibflpp\ is the Pareto-dominant method among all server-free protocols, achieving the highest accuracy in 17 of 28 configurations.
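The 2-opt ring permutation mentioned in the abstract can be sketched as a local search that reverses ring segments whenever doing so lowers the total similarity between adjacent clients. The similarity measure used here (`overlap`, the shared probability mass of two label distributions) and the function names are hypothetical stand-ins; the abstract only says that the permutation maximises adjacent-client class diversity, so treat this as one plausible instance of the idea rather than the paper's objective.

```python
def overlap(p, q):
    """Hypothetical similarity: shared probability mass of two label distributions."""
    return sum(min(a, b) for a, b in zip(p, q))


def ring_cost(order, dists):
    """Total similarity between neighbours when clients sit in `order` on a ring."""
    n = len(order)
    return sum(overlap(dists[order[i]], dists[order[(i + 1) % n]])
               for i in range(n))


def two_opt_ring(dists):
    """Classic 2-opt: keep reversing segments while the ring cost strictly drops."""
    order = list(range(len(dists)))
    n = len(order)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if ring_cost(candidate, dists) < ring_cost(order, dists) - 1e-12:
                    order, improved = candidate, True
    return order


if __name__ == "__main__":
    # Two clients hold only class 0, two hold only class 1.
    dists = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    best = two_opt_ring(dists)
    print(best, ring_cost(list(range(4)), dists), ring_cost(best, dists))
```

Lower cost means more diverse neighbours, so minimising this cost is equivalent to maximising diversity under the assumed measure. 2-opt is a heuristic and only guarantees a local optimum in general.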

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FIRMA introduces a concrete server-free ring FL protocol with Fibonacci weighting and private heads that reports gains over FedAvg on label-skew data, but the convergence claims and experiments rest on unshown derivations and missing controls.


The main point is that this paper puts forward FIRMA as a family of ring-based protocols that try to hit server-free operation, permanently private heads, and asymmetric neighbour weighting at the same time. The core idea combines Fibonacci directional bias for blending, accuracy-gated suppression in one variant, and a 2-opt permutation plus multiple gossip passes in the fullest version. That specific bundle does not appear in the cited prior ring-gossip or personalised FL work, so the protocol family itself is the new element on offer.

The experiments run across 28 configurations on four benchmarks and seven heterogeneity regimes, and they show fibflpp beating FedAvg in every one of the 12 label-skew cases, with a reported peak lift of 20.7 points on CIFAR-10. That is a clear empirical signal worth noting for people who care about decentralised setups under Dirichlet or label skew.

The soft spots sit in the theory and the experimental reporting. The abstract states a convergence rate bound plus propositions on normalisation, coverage, retention, and diversity optimality, yet supplies none of the derivation steps or the exact assumptions under which the bound is supposed to hold. The performance figures come without error bars, without any statement on whether they are single runs or averaged, and without ablations that would show what the gating, the 2-opt step, or the cosine annealing actually add. The stress-test concern about the Fibonacci bias and suppression not introducing new convergence failures or leakage vectors therefore lands, because nothing in the supplied material lets a reader check whether the asymmetric weights preserve the conditions needed for the claimed rate.

This work is aimed at researchers already working on ring or gossip-based federated learning who want a concrete alternative that keeps heads private. It deserves a serious referee to inspect the missing derivations, request the ablations, and verify the statistical reporting, rather than a desk reject.

Referee Report

3 major / 2 minor

Summary. The paper proposes FIRMA, a family of three progressively enhanced server-free federated learning protocols (fibfl, fibflp, fibflpp) using Fibonacci-weighted ring aggregation to achieve server-free operation, permanently private classification heads, ring topology, and asymmetric neighbour weighting. It claims to establish a convergence rate bound along with three supporting propositions on normalisation, coverage, retention, and diversity optimality. Experiments across 28 configurations (four benchmarks with seven heterogeneity regimes) report that fibflpp outperforms FedAvg in all 12 label-skew settings (peak +20.7 pp on CIFAR-10 at K=1) and is Pareto-dominant among server-free protocols under Dirichlet heterogeneity.

Significance. If the stated convergence bound holds under the claimed conditions and the empirical Pareto dominance is reproducible with proper statistical controls, the work would address a genuine trilemma in federated learning by combining decentralised ring topology with privacy-preserving heads and principled weighting. The absence of machine-checked proofs or external verification means the significance hinges entirely on the soundness of the unelaborated propositions.

major comments (3)

[Abstract] Abstract: The convergence rate bound and three propositions on normalisation, coverage, retention, and diversity optimality are asserted without derivation details, explicit assumptions (e.g., Lipschitz or bounded-gradient conditions), or proof sketches. This is load-bearing because the central claim that fibflpp simultaneously delivers the bound, coverage, and permanent head privacy rests on these propositions.
[Abstract / Experimental setup] Experimental results: Performance numbers (including the +20.7 pp advantage and dominance in 17 of 28 configurations) are presented without stating whether they derive from single runs or multiple averaged runs, and without error bars or statistical tests. This directly affects the reliability of the claim that fibflpp surpasses FedAvg in all label-skew configurations.
[Method description] Method: The post-hoc components (accuracy-gated neighbour suppression, 2-opt ring permutation, cosine-annealed self-retention) are introduced without ablation studies isolating their effects on convergence or privacy. If any component violates the implicit conditions needed for the rate bound, both the theoretical guarantee and the Pareto-dominance claim are undermined.

minor comments (2)

[Abstract] The abstract cites mcmahan2017, hu2019segmented, and collins2021exploiting but the manuscript should ensure the full reference list is complete and consistent.
[Abstract] Notation for K_g = ceil(N/2) and the exact definition of the Fibonacci directional bias should be clarified at first use to avoid ambiguity for readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications on the theoretical claims, experimental reporting, and methodological components while committing to revisions that strengthen the manuscript without misrepresenting its contributions.


Referee: [Abstract] Abstract: The convergence rate bound and three propositions on normalisation, coverage, retention, and diversity optimality are asserted without derivation details, explicit assumptions (e.g., Lipschitz or bounded-gradient conditions), or proof sketches. This is load-bearing because the central claim that fibflpp simultaneously delivers the bound, coverage, and permanent head privacy rests on these propositions.

Authors: The full manuscript contains a theoretical analysis section that derives the convergence rate bound under standard assumptions of L-smooth loss functions and bounded stochastic gradients, which are stated explicitly there along with the three supporting propositions on normalisation, coverage, retention, and diversity optimality. Proof sketches appear in the appendix. To address the concern, we will expand the main text with explicit assumption statements, move key derivation steps forward, and include a high-level proof outline in the revised abstract and method sections. revision: yes
Referee: [Abstract / Experimental setup] Experimental results: Performance numbers (including the +20.7 pp advantage and dominance in 17 of 28 configurations) are presented without stating whether they derive from single runs or multiple averaged runs, and without error bars or statistical tests. This directly affects the reliability of the claim that fibflpp surpasses FedAvg in all label-skew configurations.

Authors: The reported figures derive from single runs per configuration, which is common in federated learning literature given computational constraints. We agree this reduces statistical robustness. In revision we will explicitly state the single-run nature, add error bars from multiple random seeds for the primary label-skew and Dirichlet results, and include paired statistical tests (e.g., Wilcoxon) for the key comparisons against FedAvg. revision: yes
Referee: [Method description] Method: The post-hoc components (accuracy-gated neighbour suppression, 2-opt ring permutation, cosine-annealed self-retention) are introduced without ablation studies isolating their effects on convergence or privacy. If any component violates the implicit conditions needed for the rate bound, both the theoretical guarantee and the Pareto-dominance claim are undermined.

Authors: These components are constructed to preserve the normalisation, coverage, and retention conditions underlying the convergence bound: accuracy-gated suppression modulates only low-performing neighbours while retaining the Fibonacci directional bias; 2-opt permutation maximises diversity subject to the ring topology; cosine annealing ensures self-retention decays consistently with the retention proposition. We omitted ablations due to space limits but will add them to the supplementary material in revision, reporting isolated effects on both accuracy and privacy leakage metrics. revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation chain is self-contained


The provided abstract and description establish a convergence rate bound plus propositions on normalisation, coverage, retention and diversity optimality for the Fibonacci-weighted ring protocols, but contain no equations, fitted parameters renamed as predictions, self-citations used as load-bearing uniqueness theorems, or ansatzes smuggled via prior author work. All cited prior art (McMahan 2017, Hu 2019, Collins 2021) is external. No step reduces by construction to its own inputs; the central claims rest on independent empirical comparisons to FedAvg and other server-free baselines across 28 configurations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The convergence bound and propositions are referenced but not derived, so the ledger remains empty pending full text.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Constants · phi_golden_ratio, phi_fixed_point · matches

MATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.

Passage: the golden-ratio Fibonacci identity 1/φ+1/φ²=1 (φ=(1+√5)/2) provides a naturally normalised, asymmetric, and parameter-free weight pair (α,β) for ring gossip, requiring no additional hyperparameters

IndisputableMonolith.Cost.FunctionalEquation · washburn_uniqueness_aczel, Jcost · matches

Passage: α=1/φ≈0.618, β=1/φ²≈0.382 ... α+β=1 ... the Fibonacci pair α>β introduces an imaginary component that strictly reduces |λk|

IndisputableMonolith.Foundation.BranchSelection · RCLCombiner_isCoupling_iff · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Passage: Corollary 4.2 (Fibonacci Directional Bias Preserved) ... aL ∈ [α/2, (α+1)/2]
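For readers who want the one-line reason behind the normalisation quoted in the passages above, the identity 1/φ + 1/φ² = 1 follows directly from the defining equation of the golden ratio. The derivation below is standard algebra and is not copied from the paper; α and β are the weight symbols used in the quoted passages.

```latex
\begin{align*}
\varphi &= \frac{1+\sqrt{5}}{2}
  \quad\Longrightarrow\quad \varphi^{2} = \varphi + 1, \\
\text{dividing by } \varphi^{2}:\qquad
1 &= \frac{1}{\varphi} + \frac{1}{\varphi^{2}} = \alpha + \beta, \\
\alpha &= \frac{1}{\varphi} \approx 0.618, \qquad
\beta = \frac{1}{\varphi^{2}} \approx 0.382, \qquad
\frac{\alpha}{\beta} = \varphi .
\end{align*}
```

The last ratio shows where the asymmetry comes from: the larger weight exceeds the smaller one by exactly a factor of φ, with no tunable hyperparameter involved.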

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 5 internal anchors

[1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1273–1282, 2017.

[2] L. Collins, H. Hassani, A. Mokhtari, and S. Shakkottai, “Exploiting shared representations for personalized federated learning,” in Proceedings of the International Conference on Machine Learning (ICML), pp. 2089–2099, 2021.

[3] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” Proceedings of Machine Learning and Systems, vol. 2, pp. 429–450, 2020.

[4] S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, “SCAFFOLD: Stochastic controlled averaging for federated learning,” in Proceedings of the International Conference on Machine Learning (ICML), pp. 5132–5143, 2020.

[5] J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor, “Tackling the objective inconsistency problem in heterogeneous federated optimization,” Advances in Neural Information Processing Systems, vol. 33, pp. 7611–7623, 2020.

[6] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergence of FedAvg on non-IID data,” arXiv preprint arXiv:1907.02189, 2019.

[7] P. P. Liang, T. Liu, L. Ziyin, N. B. Allen, R. P. Auerbach, D. Brent, R. Salakhutdinov, and L.-P. Morency, “Think Locally, Act Globally: Federated Learning with Local and Global Representations,” arXiv preprint arXiv:2001.01523, Jan. 2020.

[8] C. T. Dinh, N. Tran, and J. Nguyen, “Personalized federated learning with Moreau envelopes,” Advances in Neural Information Processing Systems, vol. 33, pp. 21394–21405, 2020.

[9] T. Li, S. Hu, A. Beirami, and V. Smith, “Ditto: Fair and robust federated learning through personalization,” in Proc. ICML, vol. 139, pp. 6357–6368, 2021.

[10] A. Fallah, A. Mokhtari, and A. Ozdaglar, “Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach,” Advances in Neural Information Processing Systems, vol. 33, pp. 3557–3568, 2020.

[11] Y. Deng, M. M. Kamani, and M. Mahdavi, “Adaptive personalized federated learning,” arXiv preprint arXiv:2003.13461, 2020.

[12] O. Marfoq, G. Neglia, A. Bellet, L. Kameni, and R. Vidal, “Federated multi-task learning under a mixture of distributions,” Advances in Neural Information Processing Systems, vol. 34, pp. 15434–15447, 2021.

[13] C. Hu, J. Jiang, and Z. Wang, “Decentralized federated learning: A segmented gossip approach,” arXiv preprint arXiv:1908.07782, 2019.

[14] Z. Wang, Y. Hu, S. Yan, Z. Wang, R. Hou, and C. Wu, “Efficient ring-topology decentralized federated learning with deep generative models for medical data in e-healthcare systems,” Electronics, vol. 11, no. 10, p. 1548, May 2022.

[15] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” Systems & Control Letters, vol. 53, no. 1, pp. 65–78, 2004.

[16] X. Lian, C. Zhang, H. Zhang, C.-J. Hsieh, W. Zhang, and J. Liu, “Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent,” Advances in Neural Information Processing Systems, vol. 30, 2017.

[17] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, “Federated learning with non-IID data,” arXiv preprint arXiv:1806.00582, 2018.

[18] T.-M. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non-identical data distribution for federated visual classification,” arXiv preprint arXiv:1909.06335, 2019.

[19] B. Zhao, K. R. Mopuri, and H. Bilen, “iDLG: Improved deep leakage from gradients,” arXiv preprint arXiv:2001.02610, 2020.

[20] J. Geiping, H. Bauermeister, H. Dröge, and M. Moeller, “Inverting gradients - how easy is it to break privacy in federated learning?” Advances in Neural Information Processing Systems, vol. 33, pp. 16937–16947, 2020.

[21] I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” arXiv preprint arXiv:1608.03983, 2016.

[22] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[23] G. A. Croes, “A method for solving traveling-salesman problems,” Operations Research, vol. 6, no. 6, pp. 791–812, 1958.

[24] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.
