FedSPDnet: Geometry-Aware Federated Deep Learning with SPDnet
Pith reviewed 2026-05-08 09:58 UTC · model grok-4.3
The pith
Federated SPDnet preserves Stiefel manifold constraints during parameter averaging by using projections or retractions instead of Euclidean means.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SPDnet processes symmetric positive definite matrices under Stiefel manifold constraints on selected layers. In the federated setting the authors substitute Euclidean averaging with ProjAvg, which projects the arithmetic mean back onto the Stiefel manifold, or RLAvg, which lifts parameters to the tangent space, averages there, and retracts. This geometric aggregation produces higher F1 scores than federated EEGnet on EEG motor imagery benchmarks, greater robustness to changes in federation size and partial participation, and lower parameter communication per round.
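The paper gives no pseudocode in this summary; as a minimal numpy sketch of the ProjAvg idea, the projection of the Euclidean mean onto the Stiefel manifold can be taken as the polar factor from a thin SVD (the nearest orthonormal-column matrix in Frobenius norm). Function names here are illustrative, not the authors' API.

```python
import numpy as np

def proj_stiefel(M):
    """Project M onto the Stiefel manifold St(n, p) via its polar factor:
    for the thin SVD M = U S V^T, the nearest matrix with orthonormal
    columns in Frobenius norm is U V^T."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def proj_avg(client_weights):
    """ProjAvg sketch: Euclidean mean of client parameters, projected
    back onto the Stiefel manifold."""
    mean = np.mean(client_weights, axis=0)
    return proj_stiefel(mean)

# Toy check: the raw mean of Stiefel points leaves the manifold,
# but the projected average is orthonormal again.
rng = np.random.default_rng(0)
clients = [proj_stiefel(rng.standard_normal((6, 3))) for _ in range(4)]
W = proj_avg(clients)
print(np.allclose(W.T @ W, np.eye(3)))  # True: orthonormality restored
```

The SVD-based polar projection is one standard choice; the paper may use a different projection, but any valid one must return a point with `W.T @ W = I`.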
What carries the argument
ProjAvg and RLAvg, the two aggregation strategies that project or retract arithmetic means to respect Stiefel manifold constraints on the parameters of SPDnet.
Load-bearing premise
That projecting arithmetic means onto the Stiefel manifold or approximating tangent-space averaging via retractions and liftings will preserve enough geometric information to produce measurable gains in learning performance over Euclidean averaging.
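The tangent-space route can be sketched with standard Stiefel tools (in the style of Absil et al.): lift each client parameter to the tangent space at a reference point, average there, then retract. The exact lifting/retraction pair used by the authors is not specified in this summary; the projection-based lift and QR retraction below are illustrative stand-ins.

```python
import numpy as np

def qr_retract(X):
    """QR-based retraction onto the Stiefel manifold: Q factor with the
    sign convention that diag(R) > 0."""
    Q, R = np.linalg.qr(X)
    return Q * np.sign(np.diag(R))  # flip columns so diag(R) is positive

def tangent_lift(W_ref, W):
    """Approximate lift of W to the tangent space at W_ref: project the
    Euclidean difference onto T_{W_ref} St(n, p),
    P(Z) = Z - W_ref * sym(W_ref^T Z)."""
    Z = W - W_ref
    sym = 0.5 * (W_ref.T @ Z + Z.T @ W_ref)
    return Z - W_ref @ sym

def rl_avg(W_ref, client_weights):
    """RLAvg sketch: lift, average in the tangent space, retract."""
    xi = np.mean([tangent_lift(W_ref, W) for W in client_weights], axis=0)
    return qr_retract(W_ref + xi)

# Toy usage: perturbed clients around a reference point.
rng = np.random.default_rng(1)
W_ref = qr_retract(rng.standard_normal((5, 2)))
clients = [qr_retract(W_ref + 0.1 * rng.standard_normal((5, 2))) for _ in range(3)]
W = rl_avg(W_ref, clients)
print(np.allclose(W.T @ W, np.eye(2)))  # True: result stays on the manifold
```

Because the output of the retraction is orthonormal by construction, the premise reduces to whether the lift-average-retract composition loses too much information relative to an exact Riemannian barycenter.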
What would settle it
If the EEG motor imagery simulations show no improvement or a decrease in F1 score and robustness metrics when using ProjAvg or RLAvg compared with standard Euclidean averaging, the claimed advantage of the geometry-aware methods would be falsified.
Original abstract
We introduce two federated learning frameworks for the classical SPDnet model operating on symmetric positive definite (SPD) matrices with Stiefel-constrained parameters. Unlike standard Euclidean averaging, which violates orthogonality, our approach preserves geometric structure through two efficient aggregation strategies: ProjAvg, projecting arithmetic means onto the Stiefel manifold, and RLAvg, approximating tangent-space averaging via retractions and liftings. Both methods are computationally efficient, independent of the optimizer, and enable scalable federated learning for signal processing applications whose features are SPD matrices. Simulations on EEG motor imagery benchmarks show that FedSPDnet outperforms federated EEGnet in F1 score and robustness to federation and partial participation, while using fewer parameters per communication round.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FedSPDnet, two federated learning frameworks for the SPDnet model on symmetric positive definite (SPD) matrices with Stiefel-constrained parameters. It proposes ProjAvg (projecting arithmetic means onto the Stiefel manifold) and RLAvg (approximating tangent-space averaging via retractions and liftings) as geometry-preserving alternatives to Euclidean averaging. Simulations on EEG motor imagery benchmarks claim that FedSPDnet outperforms federated EEGnet in F1 score and robustness to federation and partial participation while using fewer parameters per communication round.
Significance. If the geometric aggregation methods can be shown to drive the gains independently of the SPD representation, the work would address a relevant gap in federated learning for manifold-constrained models and could benefit signal-processing applications. The efficiency claims and independence from the optimizer are potentially useful strengths, but the current evidence does not yet establish these contributions.
major comments (2)
- [Simulations on EEG motor imagery benchmarks] Simulations section (EEG motor imagery benchmarks): performance is reported only versus federated EEGnet, whose Euclidean CNN architecture differs fundamentally from SPDnet. No controlled comparison using identical SPDnet weights with naive Euclidean averaging (plus post-hoc orthogonalization) is provided, so gains cannot be attributed to ProjAvg/RLAvg rather than the SPD feature choice itself.
- [Simulations on EEG motor imagery benchmarks] Experimental design (abstract and simulations): no details on data splits, error bars, number of runs, statistical tests, or partial-participation protocol are given, preventing verification of the robustness and F1-score claims.
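The control the first comment asks for could be sketched as follows (a hypothetical ablation, not anything reported in the paper): average the Stiefel-constrained SPDnet weights naively in Euclidean space, then orthogonalize post hoc, e.g. via QR. Any remaining gap to ProjAvg/RLAvg would then be attributable to the geometric aggregation rather than the SPD features.

```python
import numpy as np

def euclidean_avg(client_weights):
    """Naive Euclidean averaging: the raw arithmetic mean, which in
    general leaves the Stiefel manifold (columns lose orthonormality)."""
    return np.mean(client_weights, axis=0)

def posthoc_orthogonalize(M):
    """Post-hoc repair via QR: replace M by its (sign-fixed) Q factor."""
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))

# Demonstration that the raw mean violates the constraint, which is
# exactly what the proposed ablation would have to repair.
rng = np.random.default_rng(2)
def rand_stiefel(n, p):
    return posthoc_orthogonalize(rng.standard_normal((n, p)))
clients = [rand_stiefel(6, 3) for _ in range(4)]
M = euclidean_avg(clients)
print(np.allclose(M.T @ M, np.eye(3)))  # typically False: constraint violated
print(np.allclose(posthoc_orthogonalize(M).T @ posthoc_orthogonalize(M),
                  np.eye(3)))           # True after the repair
```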
minor comments (1)
- [Abstract] The phrase 'robustness to federation' in the abstract is ambiguous; clarify whether it refers to the number of clients, client heterogeneity, or another quantity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight opportunities to better isolate the contribution of our geometric aggregation methods and to improve experimental transparency. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: Simulations section (EEG motor imagery benchmarks): performance is reported only versus federated EEGnet, whose Euclidean CNN architecture differs fundamentally from SPDnet. No controlled comparison using identical SPDnet weights with naive Euclidean averaging (plus post-hoc orthogonalization) is provided, so gains cannot be attributed to ProjAvg/RLAvg rather than the SPD feature choice itself.
Authors: We acknowledge that the current experiments compare against federated EEGnet rather than applying Euclidean averaging directly to SPDnet. While the primary contribution is a complete federated framework for SPDnet that outperforms a standard CNN-based baseline, we agree that an ablation isolating the aggregation strategy on identical SPDnet weights would strengthen attribution to ProjAvg and RLAvg. We will add this controlled comparison in the revised manuscript, reporting results for naive Euclidean averaging followed by post-hoc orthogonalization on the same model and datasets. revision: yes
-
Referee: Experimental design (abstract and simulations): no details on data splits, error bars, number of runs, statistical tests, or partial-participation protocol are given, preventing verification of the robustness and F1-score claims.
Authors: We agree that these details are essential for reproducibility and verification. In the revised manuscript we will expand the experimental section to specify the data splits (including per-client train/test partitions for the EEG motor imagery benchmarks), report error bars as standard deviations over a stated number of independent runs, include appropriate statistical tests (e.g., paired t-tests) for the reported F1-score differences, and describe the partial-participation protocol (client sampling fraction per round and its impact on robustness). revision: yes
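The significance check promised in this response could look like the following sketch, using scipy's paired t-test on per-run F1 scores. The arrays are illustrative placeholders, not results from the paper; pairing assumes each run of the two methods shares a seed and data split.

```python
import numpy as np
from scipy.stats import ttest_rel

# Illustrative per-run F1 scores (placeholders, NOT the paper's numbers):
# each entry is one independent federated run with a shared seed/split.
f1_fedspdnet = np.array([0.81, 0.79, 0.83, 0.80, 0.82])
f1_fedeegnet = np.array([0.76, 0.77, 0.78, 0.75, 0.77])

# Paired t-test: runs are matched by seed, so a paired test is appropriate.
t_stat, p_value = ttest_rel(f1_fedspdnet, f1_fedeegnet)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Mean and standard deviation over runs, as promised for the error bars.
print(f"FedSPDnet F1: {f1_fedspdnet.mean():.3f} "
      f"+/- {f1_fedspdnet.std(ddof=1):.3f}")
```

With only a handful of runs a nonparametric alternative (e.g. a Wilcoxon signed-rank test) may be more defensible; the revision should state which test is used and why.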
Circularity Check
No circularity in derivation chain; methods defined geometrically and tested empirically
full rationale
The paper introduces ProjAvg (arithmetic mean projected to Stiefel) and RLAvg (retraction/lifting tangent approximation) as direct constructions from manifold geometry to preserve orthogonality, then reports empirical F1 gains versus federated EEGnet on EEG benchmarks. No equation reduces a claimed prediction to a fitted input by construction, no self-citation is load-bearing for the central premise, and no ansatz or uniqueness result is smuggled in. The performance comparison is external and falsifiable rather than tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math The Stiefel manifold admits efficient projections and retractions that preserve orthogonality while allowing approximate averaging in the tangent space.
Reference graph
Works this paper leans on
- [1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics, PMLR, 2017, pp. 1273–1282.
- [2] H. Yu, S. Yang, and S. Zhu, "Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 5693–5700.
- [3] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, "Federated optimization in heterogeneous networks," Proceedings of Machine Learning and Systems, vol. 2, pp. 429–450, 2020.
- [4] X. Yuan and P. Li, "On convergence of FedProx: Local dissimilarity invariant bounds, non-smoothness and beyond," Advances in Neural Information Processing Systems, vol. 35, pp. 10752–10765, 2022.
- [5] S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, "SCAFFOLD: Stochastic controlled averaging for federated learning," in International Conference on Machine Learning, PMLR, 2020, pp. 5132–5143.
- [6] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
- [7] N. Boumal, An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023.
- [8] Z. Huang and L. Van Gool, "A Riemannian network for SPD matrix learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, 2017.
- [9] D. Brooks, O. Schwander, F. Barbaresco, J.-Y. Schneider, and M. Cord, "Riemannian batch normalization for SPD neural networks," Advances in Neural Information Processing Systems, vol. 32, 2019.
- [10] R. J. Kobler, J.-i. Hirayama, Q. Zhao, and M. Kawanabe, "SPD domain-specific batch normalization to crack interpretable unsupervised domain adaptation in EEG," in NeurIPS, 2022.
- [11] D. Jafuno, A. Mian, G. Ginolhac, and N. Stelzenmuller, "Classification of buried objects from ground penetrating radar images by using second order deep learning models," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025.
- [12] J. Li and S. Ma, "Federated learning on Riemannian manifolds," arXiv preprint arXiv:2206.05668, 2022.
- [13] J. Zhang, J. Hu, A. M.-C. So, and M. Johansson, "Nonconvex federated learning on compact smooth submanifolds with heterogeneous data," Advances in Neural Information Processing Systems, vol. 37, pp. 109817–109844, 2024.
- [14] Z. Huang, W. Huang, P. Jawanpuria, and B. Mishra, "Federated learning on Riemannian manifolds with differential privacy," arXiv preprint arXiv:2404.10029, 2024.
- [15] Z. Huang, W. Huang, P. Jawanpuria, and B. Mishra, "Riemannian federated learning via averaging gradient stream," arXiv preprint arXiv:2409.07223, 2024.
- [16] F. Bouchard, N. Laurent, S. Said, and N. Le Bihan, "Beyond R-barycenters: An effective averaging method on Stiefel and Grassmann manifolds," IEEE Signal Processing Letters, vol. 32, pp. 1950–1954, 2025.
- [17] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, vol. 1.
- [18] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
- [19] P. Ablin, S. Vary, B. Gao, and P.-A. Absil, "Infeasible deterministic, stochastic, and variance-reduction algorithms for optimization under orthogonality constraints," Journal of Machine Learning Research, vol. 25, no. 389, pp. 1–38, 2024.
- [20] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, "Multiclass brain–computer interface classification by Riemannian geometry," IEEE Transactions on Biomedical Engineering, vol. 59, no. 4, pp. 920–928, 2012.
- [21] P. Li, J. Xie, Q. Wang, and W. Zuo, "Is second-order information helpful for large-scale visual recognition?" in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2070–2078.
- [22] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger, "A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update," Journal of Neural Engineering, vol. 15, no. 3, p. 031005, 2018.
- [23] I. Carrara, B. Aristimunha, M.-C. Corsi, R. Y. de Camargo, S. Chevallier, and T. Papadopoulo, "Geometric neural network based on phase space for BCI-EEG decoding," Journal of Neural Engineering, vol. 22, no. 1, p. 016049, 2025.
- [24] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, "EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces," Journal of Neural Engineering, vol. 15, no. 5, p. 056013, 2018.
- [25] S. Chevallier, I. Carrara, B. Aristimunha, P. Guetschel, S. Sedlar, B. Lopes, S. Velut, S. Khazem, and T. Moreau, "The largest EEG-based BCI reproducibility study for open science: the MOABB benchmark," arXiv preprint arXiv:2404.15319, 2024.
- [26] L. Prechelt, "Early stopping—but when?" in Neural Networks: Tricks of the Trade, Springer, 1998, pp. 55–69.
- [27] X. Li, M. Jiang, X. Zhang, M. Kamp, and Q. Dou, "FedBN: Federated learning on non-IID features via local batch normalization," arXiv preprint arXiv:2102.07623, 2021.