pith. machine review for the scientific record.

arxiv: 2605.05959 · v1 · submitted 2026-05-07 · 💻 cs.AI · cs.DC · cs.LG

Recognition: unknown

From Coordinate Matching to Structural Alignment: Rethinking Prototype Alignment in Heterogeneous Federated Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 10:51 UTC · model grok-4.3

classification 💻 cs.AI · cs.DC · cs.LG
keywords heterogeneous federated learning · prototype alignment · structural alignment · coordinate alignment · FedSAF · model heterogeneity

The pith

In heterogeneous federated learning, prototype alignment succeeds when it matches inter-class relations rather than exact coordinates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that existing prototype-based approaches to heterogeneous federated learning reuse coordinate-wise matching, which forces every client to optimize inside the same global feature subspace. This works when all models share the same architecture but becomes counterproductive once clients use different feature extractors, because the shared subspace suppresses each client's private capacity. The authors separate the useful goal of preserving semantic relations between classes from the unnecessary goal of enforcing a common coordinate basis. Their method therefore aligns only the relative structure among prototypes, letting each client retain its own representation space while still exchanging class-level semantics. Experiments on standard benchmarks confirm that this change produces higher accuracy than prior prototype methods.
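
To make the contrast concrete, the coordinate alignment the paper critiques penalizes element-wise distance between a client's class prototypes and the broadcast global prototypes. A minimal sketch in PyTorch-style code, with illustrative names that are not taken from the paper:

    import torch
    import torch.nn.functional as F

    def coordinate_alignment_loss(local_protos, global_protos, mode="mse"):
        # Element-wise "coordinate matching": pull each client's class
        # prototypes onto the global prototypes. Shapes: [num_classes, dim].
        if mode == "mse":
            return F.mse_loss(local_protos, global_protos)
        # Cosine variant: drive each class's cosine similarity toward 1.
        return (1.0 - F.cosine_similarity(local_protos, global_protos, dim=1)).mean()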

Core claim

Coordinate alignment couples two objectives that should be separate: matching inter-class semantic structure, which aids classification, and forcing a shared feature basis, which is harmful under model heterogeneity. Structural alignment removes the second objective by matching relational properties such as inter-class similarities or distances instead of absolute positions, allowing each client's feature extractor to remain distinct while still benefiting from global class relations.
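
A hedged sketch of the structural alternative, assuming the relation operator is pairwise cosine similarity among prototypes (the form the simulated rebuttal below attributes to Section 3); the function names are ours, not the paper's:

    import torch
    import torch.nn.functional as F

    def relation_matrix(protos):
        # S(P): pairwise cosine similarities among class prototypes.
        # Depends only on angles between classes, not absolute positions.
        p = F.normalize(protos, dim=1)
        return p @ p.T  # [num_classes, num_classes]

    def structural_alignment_loss(local_protos, global_protos):
        # Match inter-class structure; each client's feature basis stays free.
        diff = relation_matrix(local_protos) - relation_matrix(global_protos)
        return (diff ** 2).mean()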

What carries the argument

A structural alignment objective that matches inter-class relational structure across clients, instead of absolute coordinate positions in the embedding space.

Load-bearing premise

Inter-class relational structure can be aligned across clients without any shared coordinate basis, and doing so is always more useful than coordinate matching when feature extractors differ.
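
The invariance half of this premise is checkable in isolation: a cosine relation matrix is unchanged by any orthogonal change of basis, while coordinate distances are not. A small self-contained probe under that assumption (the "always more useful" half remains an empirical question):

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    P = torch.randn(10, 64)                      # 10 class prototypes, dim 64
    Q, _ = torch.linalg.qr(torch.randn(64, 64))  # random orthogonal basis change
    P_rot = P @ Q                                # same geometry, new coordinates

    def rel(p):
        p = F.normalize(p, dim=1)
        return p @ p.T

    print(torch.dist(rel(P), rel(P_rot)).item())  # ~0: class relations preserved
    print(torch.dist(P, P_rot).item())            # large: coordinates disagree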

What would settle it

An experiment on the same heterogeneous benchmarks in which, with all other components held fixed, structural alignment produces equal or lower accuracy than coordinate alignment; that outcome would refute the core claim.

Figures

Figures reproduced from arXiv: 2605.05959 by Guogang Zhu, Jianwei Niu, Jiayuan Zhang, Shaojie Tang, Xinghao Wu, Xuefeng Liu.

Figure 1: Comparison of effective dimensionality in homogeneous and heterogeneous …
Figure 3: Comprehensive comparison between coordinate alignment (MSE, …
Figure 4: Overview of FedSAF. (1) Federated Pipeline: The server constructs global class prototypes Pg by aggregating uploaded local prototypes and subsequently broadcasts Pg to heterogeneous clients. (2) Local Training on Client i: For a given mini-batch, the client extracts representations z = fi(x) and forms batch-wise local prototypes Pi. A structure operator S(·) is applied to compute the semantic structures of …
Figure 5: Accuracy improvement (%) over the no-alignment baseline (…
Figure 6: Per-client test accuracy on CIFAR-10 and CIFAR-100 under Dir(0.1) …
Figure 7: t-SNE visualization of local prototypes under MSE (coordinate …
Figure 8: Test accuracy (%) of existing methods when combining with GCSA.
Figure 9: The test accuracy curve of different methods on CIFAR-10 and …
Figure 10: Effect of local epoch E on different methods.
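
The Figure 4 caption describes a standard prototype-exchange round with the alignment loss swapped. A schematic of the two communication-side steps under our reading of the caption; mean aggregation is an assumption, since the excerpt does not state the rule:

    import torch

    def server_aggregate(client_protos):
        # Global prototypes Pg: average the clients' per-class prototypes
        # (assumed rule), then broadcast Pg to the heterogeneous clients.
        return torch.stack(client_protos).mean(dim=0)  # [num_classes, dim]

    def batch_prototypes(z, labels, num_classes):
        # Batch-wise local prototypes Pi: mean representation z = f_i(x)
        # per class present in the mini-batch, as in the caption.
        protos = torch.zeros(num_classes, z.shape[1])
        for c in range(num_classes):
            mask = labels == c
            if mask.any():
                protos[c] = z[mask].mean(dim=0)
        return protos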
read the original abstract

Heterogeneous federated learning (HtFL) aims to enable collaboration among clients that differ in both data distributions and model architectures. Prototype-based methods, which communicate class-level feature centers (prototypes) instead of full model parameters, have recently shown strong potential for HtFL. Existing prototype-based HtFL methods typically reuse the MSE-based or cosine-based alignment mechanism developed for homogeneous FL when aligning client-specific representations with global prototypes. These approaches are essentially coordinate alignment, where representations of clients are forced to match the global prototypes in the embedding space in an element-wise manner. Such alignment implicitly assumes that all clients should map their representations into the feature subspace defined by the global prototypes. This assumption is reasonable in homogeneous FL, where all clients share the same feature extractor. However, it becomes problematic in HtFL, since heterogeneous feature extractors naturally induce client-specific feature subspaces, and forcing all clients to optimize within a single global subspace unnecessarily suppresses their learning capacity. We observe that coordinate alignment implicitly couples two distinct objectives: aligning inter-class semantic structure, which is directly beneficial for classification, and enforcing a shared feature basis, which is unnecessary and even harmful under model heterogeneity. Building on this insight, we design FedSAF, which shifts the alignment objective from absolute coordinates to inter-class relational structure. We demonstrate that structural alignment consistently outperforms coordinate alignment in heterogeneous settings. Experiments on multiple benchmarks show that our structural alignment outperforms state-of-the-art prototype-based HtFL methods by up to 3.52%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper argues that existing prototype-based methods in heterogeneous federated learning (HtFL) rely on coordinate alignment (MSE or cosine similarity) of client prototypes to global ones, which implicitly enforces a shared feature basis unsuitable for heterogeneous model architectures. It proposes FedSAF to instead align inter-class relational structures, claiming this decouples beneficial semantic alignment from harmful coordinate constraints and yields consistent gains, outperforming state-of-the-art prototype-based HtFL methods by up to 3.52% across benchmarks.

Significance. If the performance improvements are shown to arise specifically from the structural alignment objective, the work offers a conceptually useful reframing of prototype alignment in HtFL that could guide future methods toward preserving client-specific feature subspaces while still transferring class relations. The distinction between coordinate and structural objectives is a clear contribution, though its significance hinges on whether experiments isolate this factor from other design elements.

major comments (2)
  1. [Experiments] Experiments section: the reported gains of up to 3.52% are not supported by an ablation that fixes all other FedSAF components (prototype computation, optimization schedule, regularization) and reverts only the alignment loss to standard MSE/cosine coordinate matching. Without this control, it is impossible to attribute improvements to the claimed shift from coordinate to structural alignment rather than confounding factors.
  2. [Method] Method section: no explicit loss formulation or derivation is provided for the structural alignment objective (e.g., how inter-class relations are quantified and optimized independently of absolute coordinates), which is load-bearing for the central claim that coordinate alignment couples two distinct objectives.
minor comments (1)
  1. [Abstract] Abstract: the claim of 'multiple benchmarks' is not accompanied by any enumeration of datasets, model architectures, or heterogeneity settings, which hinders immediate assessment of the scope of the empirical results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the experimental isolation of our core contribution and to improve the explicitness of the method. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: the reported gains of up to 3.52% are not supported by an ablation that fixes all other FedSAF components (prototype computation, optimization schedule, regularization) and reverts only the alignment loss to standard MSE/cosine coordinate matching. Without this control, it is impossible to attribute improvements to the claimed shift from coordinate to structural alignment rather than confounding factors.

    Authors: We agree that the current experiments do not include a controlled ablation that holds every other FedSAF component fixed while swapping only the alignment loss back to coordinate matching. Such an ablation is required to isolate the effect of the structural objective. In the revised manuscript we will add this experiment on the primary benchmarks, reporting accuracy deltas when the structural loss is replaced by MSE and by cosine similarity under identical prototype computation, optimization schedule, and regularization settings (a sketch of such a loss-swap harness appears after these point-by-point responses). revision: yes

  2. Referee: [Method] Method section: no explicit loss formulation or derivation is provided for the structural alignment objective (e.g., how inter-class relations are quantified and optimized independently of absolute coordinates), which is load-bearing for the central claim that coordinate alignment couples two distinct objectives.

    Authors: We accept that the manuscript would benefit from a more self-contained mathematical presentation. The structural alignment objective is described in Section 3 as alignment of inter-class relation matrices (pairwise cosine similarities among prototypes), but the explicit loss expression and its derivation are not written out. In the revision we will insert a dedicated subsection containing (i) the precise loss formula, (ii) the derivation showing how the objective depends only on relative angles and is invariant to client-specific linear transformations of the feature space, and (iii) a short argument clarifying the separation from coordinate constraints. revision: yes
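
For concreteness, the loss promised in response 2 would plausibly read as follows; this is our reconstruction from the rebuttal's wording (pairwise cosine relation matrices compared in Frobenius norm), not an equation quoted from the paper:

    % Relation operator: pairwise cosine similarities among the prototypes.
    \[
      S(P)_{jk} = \frac{\langle p_j, p_k \rangle}{\lVert p_j \rVert \, \lVert p_k \rVert},
      \qquad
      \mathcal{L}_{\mathrm{struct}}(P_i, P_g) = \bigl\lVert S(P_i) - S(P_g) \bigr\rVert_F^2 .
    \]

Because S(P) is built from normalized inner products, it is invariant to any orthogonal client-specific transform of the feature space, which is exactly the separation from coordinate constraints that response 2 promises to derive.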
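
The control requested in major comment 1 is likewise mechanical to state: hold the training loop fixed and swap only the alignment term. A sketch of such a harness, reusing the loss functions sketched earlier; run_federated_training is a hypothetical entry point, not the authors' API:

    # Everything except the alignment loss is held fixed, so any accuracy
    # delta is attributable to the alignment objective alone.
    ALIGNMENT_LOSSES = {
        "mse":        coordinate_alignment_loss,
        "cosine":     lambda l, g: coordinate_alignment_loss(l, g, mode="cosine"),
        "structural": structural_alignment_loss,
    }

    for name, align_loss in ALIGNMENT_LOSSES.items():
        acc = run_federated_training(align_loss=align_loss, seed=0)  # hypothetical
        print(f"{name}: {acc:.2f}% test accuracy")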

Circularity Check

0 steps flagged

No circularity: new structural alignment objective introduced independently of prior fitted results

full rationale

The paper proposes FedSAF by shifting the alignment loss from coordinate matching (MSE/cosine on prototypes) to inter-class relational structure. This is presented as a design choice motivated by an observation about heterogeneous feature subspaces, not as a mathematical derivation or re-expression of any fitted quantity. No equations reduce a prediction to an input by construction, no parameters are fitted on a subset and then called a prediction, and no self-citation chain is invoked to justify uniqueness or force the method. The central claim rests on experimental comparisons rather than tautological re-labeling of existing results. The derivation chain is therefore self-contained and is tested against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that inter-class relational structure is both sufficient for classification and preferable to coordinate matching under model heterogeneity. No free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption: Inter-class relational structure is directly beneficial for classification and can be aligned independently of client-specific feature bases.
    This premise is invoked to justify replacing coordinate alignment with structural alignment in heterogeneous settings.

pith-pipeline@v0.9.0 · 5590 in / 1173 out tokens · 30630 ms · 2026-05-08T10:51:16.557353+00:00 · methodology

discussion (0)

