pith. machine review for the scientific record

arxiv: 2604.22494 · v1 · submitted 2026-04-24 · 📊 stat.ML · cs.LG

Recognition: unknown

FedSPDnet: Geometry-Aware Federated Deep Learning with SPDnet

Ammar Mian, Florent Bouchard, Guillaume Ginolhac, Thibault Pautrel


Pith reviewed 2026-05-08 09:58 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords federated learning · SPD matrices · Stiefel manifold · EEG classification · geometric deep learning · manifold projection · signal processing

The pith

Federated SPDnet preserves Stiefel manifold constraints during parameter averaging by using projections or retractions instead of Euclidean means.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that two new aggregation methods can carry out federated learning on SPDnet models without violating the orthogonality constraints on their parameters. Standard Euclidean averaging mixes parameters in a way that breaks the manifold structure and forfeits the geometric advantages of symmetric positive definite matrix features. If the methods succeed, they would support distributed training for privacy-sensitive tasks such as brain signal classification, where raw data cannot be centralized. Simulations on motor imagery EEG benchmarks indicate higher F1 scores than a federated EEGnet baseline and greater stability even when only some clients participate.

Core claim

SPDnet processes symmetric positive definite matrices with Stiefel manifold constraints on selected layers. In the federated setting the authors replace Euclidean averaging with ProjAvg, which projects the arithmetic mean of client parameters back onto the Stiefel manifold, or RLAvg, which lifts parameters to a common tangent space, averages there, and retracts back to the manifold. On EEG motor imagery benchmarks this geometric aggregation yields higher F1 scores than federated EEGnet, greater robustness to changes in federation size and to partial participation, and fewer parameters communicated per round.
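The abstract defines ProjAvg as projecting the arithmetic mean onto the Stiefel manifold. A standard choice of projection, used here as an assumption since the abstract does not fix one, is the nearest point in Frobenius norm, i.e., the orthogonal polar factor from a thin SVD; `proj_avg` and `stiefel_project` are illustrative names, not the paper's API. A minimal NumPy sketch:

```python
import numpy as np

def stiefel_project(M):
    """Nearest point on the Stiefel manifold St(n, p) in Frobenius norm.

    For a full-rank n x p matrix M (n >= p), this is the orthogonal polar
    factor U @ Vt from the thin SVD M = U @ diag(s) @ Vt.
    """
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def proj_avg(client_weights):
    """ProjAvg: Euclidean mean of client weight matrices, projected to St(n, p)."""
    return stiefel_project(np.mean(client_weights, axis=0))
```

Applied to orthonormal client matrices, the projected mean is again orthonormal, which is exactly the constraint that plain Euclidean averaging breaks.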

What carries the argument

ProjAvg and RLAvg, the two aggregation strategies that project or retract arithmetic means to respect Stiefel manifold constraints on the parameters of SPDnet.

Load-bearing premise

That projecting arithmetic means onto the Stiefel manifold or approximating tangent-space averaging via retractions and liftings will preserve enough geometric information to produce measurable gains in learning performance over Euclidean averaging.
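RLAvg is described as approximating tangent-space averaging via retractions and liftings. One common realization on the Stiefel manifold, assumed here because the abstract does not fix the maps, lifts each client matrix by projecting its difference from a reference point (for instance the previous global model) onto the tangent space at that point, averages the tangent vectors, and maps back with a QR retraction; all names below are illustrative:

```python
import numpy as np

def sym(A):
    return 0.5 * (A + A.T)

def tangent_project(X, Z):
    """Orthogonal projection of Z onto the tangent space of St(n, p) at X."""
    return Z - X @ sym(X.T @ Z)

def qr_retract(X, V):
    """QR retraction: map the tangent vector V at X back onto the manifold."""
    Q, R = np.linalg.qr(X + V)
    # Flip column signs so R has a positive diagonal (keeps the map continuous).
    return Q * np.sign(np.diag(R))

def rl_avg(X_ref, client_weights):
    """RLAvg sketch: lift clients to the tangent space at X_ref, average, retract."""
    V_bar = np.mean([tangent_project(X_ref, W - X_ref) for W in client_weights],
                    axis=0)
    return qr_retract(X_ref, V_bar)
```

The premise above is that this lift-average-retract composition stays close enough to the true Riemannian mean for the learning gains to survive, while costing only a QR decomposition per aggregated layer.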

What would settle it

If the EEG motor imagery simulations show no improvement or a decrease in F1 score and robustness metrics when using ProjAvg or RLAvg compared with standard Euclidean averaging, the claimed advantage of the geometry-aware methods would be falsified.

Figures

Figures reproduced from arXiv: 2604.22494 by Ammar Mian, Florent Bouchard, Guillaume Ginolhac, Thibault Pautrel.

Figure 1. General SPDnet architecture [8]: a chain of … view at source ↗
Figure 2. EEGnet overall pipeline. The extraction also captured Algorithm 1:

  Algorithm 1 FedSPDnet
  Input: number of rounds T, number of clients per round M, number of local epochs E
  1: Initialize randomly θ0 = (W1,0, …, WL,0, ξ0, β0)
  2: for t = 0, …, T − 1 do
  3:   Randomly sample St ⊂ C with |St| = M
  4:   for each client i ∈ St in parallel do
  5:     Initialize local SPDnet with θt
  6:     Train SPDnet over E epochs to get θt^(i)
  7:     Send θt^(i) to the server
  8: …

view at source ↗
Figure 3. Convergence on Weibo2014 under full (solid) and half (dashed) … view at source ↗
Figure 4. Convergence on PhysionetMI under full (solid) and half (dashed) … view at source ↗
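Algorithm 1 samples M clients per round, trains each locally for E epochs from the current global parameters, and aggregates server-side. A schematic of one round, with `local_train` and `aggregate` as hypothetical placeholders for the paper's local optimizer and its ProjAvg/RLAvg aggregation:

```python
import numpy as np

def fed_round(theta, clients, num_sampled, local_train, aggregate, rng):
    """One FedSPDnet round (lines 3-8 of Algorithm 1): sample a subset of
    clients, train each locally from the global parameters theta, then
    aggregate the returned parameters layer by layer on the server."""
    sampled = rng.choice(len(clients), size=num_sampled, replace=False)
    local_thetas = [local_train(clients[i], theta) for i in sampled]
    # Stiefel-constrained layers would use ProjAvg or RLAvg here; Euclidean
    # parameters (e.g., the final classifier) can use a plain arithmetic mean.
    return {name: aggregate(name, [th[name] for th in local_thetas])
            for name in theta}
```

Dispatching the aggregation by parameter name is one way to realize the per-layer choice; the paper's actual interface may differ.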
original abstract

We introduce two federated learning frameworks for the classical SPDnet model operating on symmetric positive definite (SPD) matrices with Stiefel-constrained parameters. Unlike standard Euclidean averaging, which violates orthogonality, our approach preserves geometric structure through two efficient aggregation strategies: ProjAvg, projecting arithmetic means onto the Stiefel manifold, and RLAvg, approximating tangent-space averaging via retractions and liftings. Both methods are computationally efficient, independent of the optimizer, and enable scalable federated learning for signal processing applications whose features are SPD matrices. Simulations on EEG motor imagery benchmarks show that FedSPDnet outperforms federated EEGnet in F1 score and robustness to federation and partial participation, while using fewer parameters per communication round.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces FedSPDnet, two federated learning frameworks for the SPDnet model on symmetric positive definite (SPD) matrices with Stiefel-constrained parameters. It proposes ProjAvg (projecting arithmetic means onto the Stiefel manifold) and RLAvg (approximating tangent-space averaging via retractions and liftings) as geometry-preserving alternatives to Euclidean averaging. Simulations on EEG motor imagery benchmarks claim that FedSPDnet outperforms federated EEGnet in F1 score and robustness to federation and partial participation while using fewer parameters per communication round.

Significance. If the geometric aggregation methods can be shown to drive the gains independently of the SPD representation, the work would address a relevant gap in federated learning for manifold-constrained models and could benefit signal-processing applications. The efficiency claims and independence from the optimizer are potentially useful strengths, but the current evidence does not yet establish these contributions.

major comments (2)
  1. [Simulations on EEG motor imagery benchmarks] Simulations section (EEG motor imagery benchmarks): performance is reported only versus federated EEGnet, whose Euclidean CNN architecture differs fundamentally from SPDnet. No controlled comparison using identical SPDnet weights with naive Euclidean averaging (plus post-hoc orthogonalization) is provided, so gains cannot be attributed to ProjAvg/RLAvg rather than the SPD feature choice itself.
  2. [Simulations on EEG motor imagery benchmarks] Experimental design (abstract and simulations): no details on data splits, error bars, number of runs, statistical tests, or partial-participation protocol are given, preventing verification of the robustness and F1-score claims.
minor comments (1)
  1. [Abstract] The phrase 'robustness to federation' in the abstract is ambiguous; clarify whether it refers to number of clients, client heterogeneity, or another quantity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to better isolate the contribution of our geometric aggregation methods and to improve experimental transparency. We address each major comment below and will revise the manuscript accordingly.

point-by-point responses
  1. Referee: Simulations section (EEG motor imagery benchmarks): performance is reported only versus federated EEGnet, whose Euclidean CNN architecture differs fundamentally from SPDnet. No controlled comparison using identical SPDnet weights with naive Euclidean averaging (plus post-hoc orthogonalization) is provided, so gains cannot be attributed to ProjAvg/RLAvg rather than the SPD feature choice itself.

    Authors: We acknowledge that the current experiments compare against federated EEGnet rather than applying Euclidean averaging directly to SPDnet. While the primary contribution is a complete federated framework for SPDnet that outperforms a standard CNN-based baseline, we agree that an ablation isolating the aggregation strategy on identical SPDnet weights would strengthen attribution to ProjAvg and RLAvg. We will add this controlled comparison in the revised manuscript, reporting results for naive Euclidean averaging followed by post-hoc orthogonalization on the same model and datasets. revision: yes

  2. Referee: Experimental design (abstract and simulations): no details on data splits, error bars, number of runs, statistical tests, or partial-participation protocol are given, preventing verification of the robustness and F1-score claims.

    Authors: We agree that these details are essential for reproducibility and verification. In the revised manuscript we will expand the experimental section to specify the data splits (including per-client train/test partitions for the EEG motor imagery benchmarks), report error bars as standard deviations over a stated number of independent runs, include appropriate statistical tests (e.g., paired t-tests) for the reported F1-score differences, and describe the partial-participation protocol (client sampling fraction per round and its impact on robustness). revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain; methods defined geometrically and tested empirically

full rationale

The paper introduces ProjAvg (arithmetic mean projected to Stiefel) and RLAvg (retraction/lifting tangent approximation) as direct constructions from manifold geometry to preserve orthogonality, then reports empirical F1 gains versus federated EEGnet on EEG benchmarks. No equation reduces a claimed prediction to a fitted input by construction, no self-citation is load-bearing for the central premise, and no ansatz or uniqueness result is smuggled in. The performance comparison is external and falsifiable rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard properties of the Stiefel manifold and Riemannian retractions; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)
  • standard math The Stiefel manifold admits efficient projections and retractions that preserve orthogonality while allowing approximate averaging in the tangent space.
    Invoked to justify that ProjAvg and RLAvg are both geometrically valid and computationally cheap.

pith-pipeline@v0.9.0 · 5426 in / 1373 out tokens · 63474 ms · 2026-05-08T09:58:33.304378+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

27 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
  2. [2] H. Yu, S. Yang, and S. Zhu, "Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 5693–5700.
  3. [3] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, "Federated optimization in heterogeneous networks," Proceedings of Machine Learning and Systems, vol. 2, pp. 429–450, 2020.
  4. [4] X. Yuan and P. Li, "On convergence of FedProx: Local dissimilarity invariant bounds, non-smoothness and beyond," Advances in Neural Information Processing Systems, vol. 35, pp. 10752–10765, 2022.
  5. [5] S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, "SCAFFOLD: Stochastic controlled averaging for federated learning," in International Conference on Machine Learning. PMLR, 2020, pp. 5132–5143.
  6. [6] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
  7. [7] N. Boumal, An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023.
  8. [8] Z. Huang and L. Van Gool, "A Riemannian network for SPD matrix learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, 2017.
  9. [9] D. Brooks, O. Schwander, F. Barbaresco, J.-Y. Schneider, and M. Cord, "Riemannian batch normalization for SPD neural networks," Advances in Neural Information Processing Systems, vol. 32, 2019.
  10. [10] R. J. Kobler, J.-i. Hirayama, Q. Zhao, and M. Kawanabe, "SPD domain-specific batch normalization to crack interpretable unsupervised domain adaptation in EEG," in NeurIPS, 2022.
  11. [11] D. Jafuno, A. Mian, G. Ginolhac, and N. Stelzenmuller, "Classification of buried objects from ground penetrating radar images by using second order deep learning models," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025.
  12. [12] J. Li and S. Ma, "Federated learning on Riemannian manifolds," arXiv preprint arXiv:2206.05668, 2022.
  13. [13] J. Zhang, J. Hu, A. M.-C. So, and M. Johansson, "Nonconvex federated learning on compact smooth submanifolds with heterogeneous data," Advances in Neural Information Processing Systems, vol. 37, pp. 109817–109844, 2024.
  14. [14] Z. Huang, W. Huang, P. Jawanpuria, and B. Mishra, "Federated learning on Riemannian manifolds with differential privacy," arXiv preprint arXiv:2404.10029, 2024.
  15. [15] Z. Huang, W. Huang, P. Jawanpuria, and B. Mishra, "Riemannian federated learning via averaging gradient stream," arXiv preprint arXiv:2409.07223, 2024.
  16. [16] F. Bouchard, N. Laurent, S. Said, and N. Le Bihan, "Beyond R-barycenters: An effective averaging method on Stiefel and Grassmann manifolds," IEEE Signal Processing Letters, vol. 32, pp. 1950–1954, 2025.
  17. [17] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
  18. [18] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
  19. [19] P. Ablin, S. Vary, B. Gao, and P.-A. Absil, "Infeasible deterministic, stochastic, and variance-reduction algorithms for optimization under orthogonality constraints," Journal of Machine Learning Research, vol. 25, no. 389, pp. 1–38, 2024.
  20. [20] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, "Multiclass brain–computer interface classification by Riemannian geometry," IEEE Transactions on Biomedical Engineering, vol. 59, no. 4, pp. 920–928, 2012.
  21. [21] P. Li, J. Xie, Q. Wang, and W. Zuo, "Is second-order information helpful for large-scale visual recognition?" in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2070–2078.
  22. [22] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger, "A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update," Journal of Neural Engineering, vol. 15, no. 3, p. 031005, 2018.
  23. [23] I. Carrara, B. Aristimunha, M.-C. Corsi, R. Y. de Camargo, S. Chevallier, and T. Papadopoulo, "Geometric neural network based on phase space for BCI-EEG decoding," Journal of Neural Engineering, vol. 22, no. 1, p. 016049, 2025.
  24. [24] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, "EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces," Journal of Neural Engineering, vol. 15, no. 5, p. 056013, 2018.
  25. [25] S. Chevallier, I. Carrara, B. Aristimunha, P. Guetschel, S. Sedlar, B. Lopes, S. Velut, S. Khazem, and T. Moreau, "The largest EEG-based BCI reproducibility study for open science: the MOABB benchmark," arXiv preprint arXiv:2404.15319, 2024.
  26. [26] L. Prechelt, "Early stopping—but when?" in Neural Networks: Tricks of the Trade. Springer, 1998, pp. 55–69.
  27. [27] X. Li, M. Jiang, X. Zhang, M. Kamp, and Q. Dou, "FedBN: Federated learning on non-IID features via local batch normalization," arXiv preprint arXiv:2102.07623, 2021.