FedSPDnet: Geometry-Aware Federated Deep Learning with SPDnet
Pith reviewed 2026-05-08 09:58 UTC · model grok-4.3
The pith
Federated SPDnet preserves Stiefel manifold constraints during parameter averaging by using projections or retractions instead of Euclidean means.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SPDnet processes symmetric positive definite matrices under Stiefel manifold constraints on selected layers. In the federated setting the authors substitute Euclidean averaging with ProjAvg, which projects the arithmetic mean back onto the Stiefel manifold, or RLAvg, which lifts parameters to the tangent space, averages there, and retracts. This geometric aggregation produces higher F1 scores than federated EEGnet on EEG motor imagery benchmarks, greater robustness to changes in federation size and partial participation, and lower parameter communication per round.
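The paper gives no pseudocode in this summary; as a minimal numpy sketch of the ProjAvg idea, the projection of the Euclidean mean onto the Stiefel manifold can be taken as the polar factor from a thin SVD (the nearest orthonormal-column matrix in Frobenius norm). Function names here are illustrative, not the authors' API.

```python
import numpy as np

def proj_stiefel(M):
    """Project M onto the Stiefel manifold St(n, p) via its polar factor:
    for the thin SVD M = U S V^T, the nearest matrix with orthonormal
    columns in Frobenius norm is U V^T."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def proj_avg(client_weights):
    """ProjAvg sketch: Euclidean mean of client parameters, projected
    back onto the Stiefel manifold."""
    mean = np.mean(client_weights, axis=0)
    return proj_stiefel(mean)

# Toy check: the raw mean of Stiefel points leaves the manifold,
# but the projected average is orthonormal again.
rng = np.random.default_rng(0)
clients = [proj_stiefel(rng.standard_normal((6, 3))) for _ in range(4)]
W = proj_avg(clients)
print(np.allclose(W.T @ W, np.eye(3)))  # True: orthonormality restored
```

The SVD-based polar projection is one standard choice; the paper may use a different projection, but any valid one must return a point with `W.T @ W = I`.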
What carries the argument
ProjAvg and RLAvg, the two aggregation strategies that project or retract arithmetic means to respect Stiefel manifold constraints on the parameters of SPDnet.
Load-bearing premise
That projecting arithmetic means onto the Stiefel manifold or approximating tangent-space averaging via retractions and liftings will preserve enough geometric information to produce measurable gains in learning performance over Euclidean averaging.
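The tangent-space route can be sketched with standard Stiefel tools (in the style of Absil et al.): lift each client parameter to the tangent space at a reference point, average there, then retract. The exact lifting/retraction pair used by the authors is not specified in this summary; the projection-based lift and QR retraction below are illustrative stand-ins.

```python
import numpy as np

def qr_retract(X):
    """QR-based retraction onto the Stiefel manifold: Q factor with the
    sign convention that diag(R) > 0."""
    Q, R = np.linalg.qr(X)
    return Q * np.sign(np.diag(R))  # flip columns so diag(R) is positive

def tangent_lift(W_ref, W):
    """Approximate lift of W to the tangent space at W_ref: project the
    Euclidean difference onto T_{W_ref} St(n, p),
    P(Z) = Z - W_ref * sym(W_ref^T Z)."""
    Z = W - W_ref
    sym = 0.5 * (W_ref.T @ Z + Z.T @ W_ref)
    return Z - W_ref @ sym

def rl_avg(W_ref, client_weights):
    """RLAvg sketch: lift, average in the tangent space, retract."""
    xi = np.mean([tangent_lift(W_ref, W) for W in client_weights], axis=0)
    return qr_retract(W_ref + xi)

# Toy usage: perturbed clients around a reference point.
rng = np.random.default_rng(1)
W_ref = qr_retract(rng.standard_normal((5, 2)))
clients = [qr_retract(W_ref + 0.1 * rng.standard_normal((5, 2))) for _ in range(3)]
W = rl_avg(W_ref, clients)
print(np.allclose(W.T @ W, np.eye(2)))  # True: result stays on the manifold
```

Because the output of the retraction is orthonormal by construction, the premise reduces to whether the lift-average-retract composition loses too much information relative to an exact Riemannian barycenter.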
What would settle it
If the EEG motor imagery simulations show no improvement or a decrease in F1 score and robustness metrics when using ProjAvg or RLAvg compared with standard Euclidean averaging, the claimed advantage of the geometry-aware methods would be falsified.
Original abstract
We introduce two federated learning frameworks for the classical SPDnet model operating on symmetric positive definite (SPD) matrices with Stiefel-constrained parameters. Unlike standard Euclidean averaging, which violates orthogonality, our approach preserves geometric structure through two efficient aggregation strategies: ProjAvg, projecting arithmetic means onto the Stiefel manifold, and RLAvg, approximating tangent-space averaging via retractions and liftings. Both methods are computationally efficient, independent of the optimizer, and enable scalable federated learning for signal processing applications whose features are SPD matrices. Simulations on EEG motor imagery benchmarks show that FedSPDnet outperforms federated EEGnet in F1 score and robustness to federation and partial participation, while using fewer parameters per communication round.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FedSPDnet, two federated learning frameworks for the SPDnet model on symmetric positive definite (SPD) matrices with Stiefel-constrained parameters. It proposes ProjAvg (projecting arithmetic means onto the Stiefel manifold) and RLAvg (approximating tangent-space averaging via retractions and liftings) as geometry-preserving alternatives to Euclidean averaging. Simulations on EEG motor imagery benchmarks claim that FedSPDnet outperforms federated EEGnet in F1 score and robustness to federation and partial participation while using fewer parameters per communication round.
Significance. If the geometric aggregation methods can be shown to drive the gains independently of the SPD representation, the work would address a relevant gap in federated learning for manifold-constrained models and could benefit signal-processing applications. The efficiency claims and independence from the optimizer are potentially useful strengths, but the current evidence does not yet establish these contributions.
major comments (2)
- [Simulations on EEG motor imagery benchmarks] Simulations section (EEG motor imagery benchmarks): performance is reported only versus federated EEGnet, whose Euclidean CNN architecture differs fundamentally from SPDnet. No controlled comparison using identical SPDnet weights with naive Euclidean averaging (plus post-hoc orthogonalization) is provided, so gains cannot be attributed to ProjAvg/RLAvg rather than the SPD feature choice itself.
- [Simulations on EEG motor imagery benchmarks] Experimental design (abstract and simulations): no details on data splits, error bars, number of runs, statistical tests, or partial-participation protocol are given, preventing verification of the robustness and F1-score claims.
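The control the first comment asks for could be sketched as follows (a hypothetical ablation, not anything reported in the paper): average the Stiefel-constrained SPDnet weights naively in Euclidean space, then orthogonalize post hoc, e.g. via QR. Any remaining gap to ProjAvg/RLAvg would then be attributable to the geometric aggregation rather than the SPD features.

```python
import numpy as np

def euclidean_avg(client_weights):
    """Naive Euclidean averaging: the raw arithmetic mean, which in
    general leaves the Stiefel manifold (columns lose orthonormality)."""
    return np.mean(client_weights, axis=0)

def posthoc_orthogonalize(M):
    """Post-hoc repair via QR: replace M by its (sign-fixed) Q factor."""
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))

# Demonstration that the raw mean violates the constraint, which is
# exactly what the proposed ablation would have to repair.
rng = np.random.default_rng(2)
def rand_stiefel(n, p):
    return posthoc_orthogonalize(rng.standard_normal((n, p)))
clients = [rand_stiefel(6, 3) for _ in range(4)]
M = euclidean_avg(clients)
print(np.allclose(M.T @ M, np.eye(3)))  # typically False: constraint violated
print(np.allclose(posthoc_orthogonalize(M).T @ posthoc_orthogonalize(M),
                  np.eye(3)))           # True after the repair
```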
minor comments (1)
- [Abstract] The phrase 'robustness to federation' in the abstract is ambiguous; clarify whether it refers to the number of clients, client heterogeneity, or another quantity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight opportunities to better isolate the contribution of our geometric aggregation methods and to improve experimental transparency. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: Simulations section (EEG motor imagery benchmarks): performance is reported only versus federated EEGnet, whose Euclidean CNN architecture differs fundamentally from SPDnet. No controlled comparison using identical SPDnet weights with naive Euclidean averaging (plus post-hoc orthogonalization) is provided, so gains cannot be attributed to ProjAvg/RLAvg rather than the SPD feature choice itself.
Authors: We acknowledge that the current experiments compare against federated EEGnet rather than applying Euclidean averaging directly to SPDnet. While the primary contribution is a complete federated framework for SPDnet that outperforms a standard CNN-based baseline, we agree that an ablation isolating the aggregation strategy on identical SPDnet weights would strengthen attribution to ProjAvg and RLAvg. We will add this controlled comparison in the revised manuscript, reporting results for naive Euclidean averaging followed by post-hoc orthogonalization on the same model and datasets. revision: yes
-
Referee: Experimental design (abstract and simulations): no details on data splits, error bars, number of runs, statistical tests, or partial-participation protocol are given, preventing verification of the robustness and F1-score claims.
Authors: We agree that these details are essential for reproducibility and verification. In the revised manuscript we will expand the experimental section to specify the data splits (including per-client train/test partitions for the EEG motor imagery benchmarks), report error bars as standard deviations over a stated number of independent runs, include appropriate statistical tests (e.g., paired t-tests) for the reported F1-score differences, and describe the partial-participation protocol (client sampling fraction per round and its impact on robustness). revision: yes
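The significance check promised in this response could look like the following sketch, using scipy's paired t-test on per-run F1 scores. The arrays are illustrative placeholders, not results from the paper; pairing assumes each run of the two methods shares a seed and data split.

```python
import numpy as np
from scipy.stats import ttest_rel

# Illustrative per-run F1 scores (placeholders, NOT the paper's numbers):
# each entry is one independent federated run with a shared seed/split.
f1_fedspdnet = np.array([0.81, 0.79, 0.83, 0.80, 0.82])
f1_fedeegnet = np.array([0.76, 0.77, 0.78, 0.75, 0.77])

# Paired t-test: runs are matched by seed, so a paired test is appropriate.
t_stat, p_value = ttest_rel(f1_fedspdnet, f1_fedeegnet)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Mean and standard deviation over runs, as promised for the error bars.
print(f"FedSPDnet F1: {f1_fedspdnet.mean():.3f} "
      f"+/- {f1_fedspdnet.std(ddof=1):.3f}")
```

With only a handful of runs a nonparametric alternative (e.g. a Wilcoxon signed-rank test) may be more defensible; the revision should state which test is used and why.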
Circularity Check
No circularity in derivation chain; methods defined geometrically and tested empirically
full rationale
The paper introduces ProjAvg (arithmetic mean projected to Stiefel) and RLAvg (retraction/lifting tangent approximation) as direct constructions from manifold geometry to preserve orthogonality, then reports empirical F1 gains versus federated EEGnet on EEG benchmarks. No equation reduces a claimed prediction to a fitted input by construction, no self-citation is load-bearing for the central premise, and no ansatz or uniqueness result is smuggled in. The performance comparison is external and falsifiable rather than tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math The Stiefel manifold admits efficient projections and retractions that preserve orthogonality while allowing approximate averaging in the tangent space.
Reference graph
Works this paper leans on
- [1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics, PMLR, 2017, pp. 1273–1282.
- [2] H. Yu, S. Yang, and S. Zhu, "Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 5693–5700.
- [3] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, "Federated optimization in heterogeneous networks," Proceedings of Machine Learning and Systems, vol. 2, pp. 429–450, 2020.
- [4] X. Yuan and P. Li, "On convergence of FedProx: Local dissimilarity invariant bounds, non-smoothness and beyond," Advances in Neural Information Processing Systems, vol. 35, pp. 10752–10765, 2022.
- [5] S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, "SCAFFOLD: Stochastic controlled averaging for federated learning," in International Conference on Machine Learning, PMLR, 2020, pp. 5132–5143.
- [6] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
- [7] N. Boumal, An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023.
- [8] Z. Huang and L. Van Gool, "A Riemannian network for SPD matrix learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, 2017.
- [9] D. Brooks, O. Schwander, F. Barbaresco, J.-Y. Schneider, and M. Cord, "Riemannian batch normalization for SPD neural networks," Advances in Neural Information Processing Systems, vol. 32, 2019.
- [10] R. J. Kobler, J.-i. Hirayama, Q. Zhao, and M. Kawanabe, "SPD domain-specific batch normalization to crack interpretable unsupervised domain adaptation in EEG," in NeurIPS, 2022.
- [11] D. Jafuno, A. Mian, G. Ginolhac, and N. Stelzenmuller, "Classification of buried objects from ground penetrating radar images by using second order deep learning models," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025.
- [12] J. Li and S. Ma, "Federated learning on Riemannian manifolds," arXiv preprint arXiv:2206.05668, 2022.
- [13] J. Zhang, J. Hu, A. M.-C. So, and M. Johansson, "Nonconvex federated learning on compact smooth submanifolds with heterogeneous data," Advances in Neural Information Processing Systems, vol. 37, pp. 109817–109844, 2024.
- [14] Z. Huang, W. Huang, P. Jawanpuria, and B. Mishra, "Federated learning on Riemannian manifolds with differential privacy," arXiv preprint arXiv:2404.10029, 2024.
- [15] Z. Huang, W. Huang, P. Jawanpuria, and B. Mishra, "Riemannian federated learning via averaging gradient stream," arXiv preprint arXiv:2409.07223, 2024.
- [16] F. Bouchard, N. Laurent, S. Said, and N. Le Bihan, "Beyond R-barycenters: An effective averaging method on Stiefel and Grassmann manifolds," IEEE Signal Processing Letters, vol. 32, pp. 1950–1954, 2025.
- [17] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, vol. 1.
- [18] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
- [19] P. Ablin, S. Vary, B. Gao, and P.-A. Absil, "Infeasible deterministic, stochastic, and variance-reduction algorithms for optimization under orthogonality constraints," Journal of Machine Learning Research, vol. 25, no. 389, pp. 1–38, 2024.
- [20] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, "Multiclass brain–computer interface classification by Riemannian geometry," IEEE Transactions on Biomedical Engineering, vol. 59, no. 4, pp. 920–928, 2012.
- [21] P. Li, J. Xie, Q. Wang, and W. Zuo, "Is second-order information helpful for large-scale visual recognition?" in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2070–2078.
- [22] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger, "A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update," Journal of Neural Engineering, vol. 15, no. 3, p. 031005, 2018.
- [23] I. Carrara, B. Aristimunha, M.-C. Corsi, R. Y. de Camargo, S. Chevallier, and T. Papadopoulo, "Geometric neural network based on phase space for BCI-EEG decoding," Journal of Neural Engineering, vol. 22, no. 1, p. 016049, 2025.
- [24] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, "EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces," Journal of Neural Engineering, vol. 15, no. 5, p. 056013, 2018.
- [25] S. Chevallier, I. Carrara, B. Aristimunha, P. Guetschel, S. Sedlar, B. Lopes, S. Velut, S. Khazem, and T. Moreau, "The largest EEG-based BCI reproducibility study for open science: the MOABB benchmark," arXiv preprint arXiv:2404.15319, 2024.
- [26] L. Prechelt, "Early stopping—but when?" in Neural Networks: Tricks of the Trade, Springer, 1998, pp. 55–69.
- [27] X. Li, M. Jiang, X. Zhang, M. Kamp, and Q. Dou, "FedBN: Federated learning on non-IID features via local batch normalization," arXiv preprint arXiv:2102.07623, 2021.