Generalizing the Geometry of Model Merging Through Fréchet Averages
Pith reviewed 2026-05-08 03:09 UTC · model grok-4.3
The pith
Model merging requires Fréchet averaging on symmetry-respecting manifolds rather than direct parameter averages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Merging is Fréchet averaging on an appropriate manifold: the merged parameters are those that minimize the sum of geodesic distances to the source models. The decisive design choice is therefore the triple of metric, manifold, and distance approximation, which together define what it means for two models to be close. Under simplifying assumptions the construction contains Fisher merging as a special case. For LoRA adapters the relevant manifold is a quotient space induced by the scaling symmetry, and the paper derives a practical algorithm for that geometry that improves on existing LoRA merge heuristics.
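In symbols (our notation; the paper's own symbols are not reproduced on this page): given source models θ₁, …, θ_k on a manifold M with geodesic distance d_M, the merged model is

θ* = argmin_{θ ∈ M} Σᵢ d_M(θ, θᵢ).

The abstract's phrasing uses a sum of geodesic distances; the classical Fréchet mean minimizes squared distances, and the unsquared variant is the Riemannian geometric median, so "Fréchet averaging" here should be read broadly.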
What carries the argument
Fréchet averaging: the operation of selecting the point on the manifold that minimizes the sum of geodesic distances to the given models, thereby inheriting invariance from the chosen geometry.
If this is right
- Fisher merging emerges automatically once the manifold is taken to be the one induced by the Fisher information metric and the distance is linearized (see the sketch after this list).
- LoRA merges performed on the quotient manifold avoid the scaling ambiguity that current ad-hoc methods must patch after the fact.
- Any architectural symmetry group can in principle be accommodated by selecting the corresponding quotient or covering manifold before averaging.
- The quality of the merge is controlled by how faithfully the distance approximation respects the chosen geometry rather than by the averaging step itself.
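To make the Fisher bullet concrete: with the local quadratic approximation d(θ, θᵢ)² ≈ (θ − θᵢ)ᵀ Fᵢ (θ − θᵢ), the summed objective is quadratic and its minimizer is θ* = (Σᵢ Fᵢ)⁻¹ Σᵢ Fᵢ θᵢ, the Fisher merging estimator. A minimal sketch with the usual diagonal-Fisher simplification (our illustration, not the paper's code):

```python
import numpy as np

def fisher_merge(params, fishers, eps=1e-12):
    """Fisher-weighted parameter averaging.

    params, fishers: shape (k, n) arrays; row i holds model i's flattened
    parameters and its diagonal Fisher estimates. Minimizes
    sum_i (theta - theta_i)^T F_i (theta - theta_i), whose closed form is
    a per-coordinate Fisher-weighted mean.
    """
    w = fishers + eps                      # guard against zero curvature
    return (w * params).sum(axis=0) / w.sum(axis=0)
```

With identical Fishers this collapses to the plain parameter average, which is one way to read "contains Fisher merging" as a strict generalization.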
Where Pith is reading between the lines
- The same geometry-first approach could be applied to merging in continual learning or federated settings where parameter symmetries recur across tasks or clients.
- Optimization libraries for manifold-valued data could be reused directly for model merging once the manifold is fixed.
- Different downstream tasks might favor different manifolds for the same base models, suggesting that merge geometry should be chosen after seeing the target metric.
Load-bearing premise
That a manifold and metric can be chosen to encode the relevant symmetries while still permitting tractable approximation of geodesics and the resulting average.
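The premise can be made operational: given log/exp maps and a distance for the chosen manifold, the average is computed iteratively. A Weiszfeld-style fixed-point sketch for the sum-of-distances objective, assuming hypothetical log, exp, and dist callables (not a specific library's API):

```python
import numpy as np

def frechet_point(points, log, exp, dist, iters=100, eps=1e-9):
    """Weiszfeld-style iteration for argmin_x sum_i dist(x, p_i).

    log(x, p): tangent vector at x pointing toward p.
    exp(x, v): endpoint of the geodesic from x with initial velocity v.
    dist(x, p): geodesic distance. All three come from the chosen geometry.
    """
    # One unweighted averaging step first, so the iteration does not start
    # exactly on a data point (where the 1/dist weights blow up).
    x = points[0]
    x = exp(x, sum(log(x, p) for p in points) / len(points))
    for _ in range(iters):
        w = [1.0 / max(dist(x, p), eps) for p in points]
        v = sum(wi * log(x, p) for wi, p in zip(w, points)) / sum(w)
        x = exp(x, v)
    return x

# Euclidean sanity check: log/exp/dist reduce to vector arithmetic and the
# result is the classical geometric median of the three points.
pts = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
m = frechet_point(pts, log=lambda x, p: p - x, exp=lambda x, v: x + v,
                  dist=lambda x, p: np.linalg.norm(p - x))
```

Everything manifold-specific lives in log, exp, and dist, which is exactly why the premise puts the burden on tractable approximations of those maps.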
What would settle it
A concrete counter-example: take two models related by a known non-trivial symmetry (such as a scaling of LoRA factors) and show that the Fréchet average on the proposed manifold yields a merged model whose downstream performance is no better than naive averaging on that symmetry orbit.
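The symmetry in that counter-example is easy to exhibit numerically: for any invertible S, the LoRA factors (B S, S⁻¹ A) represent the same adapter B A, yet naive factor-wise averaging is not constant on that orbit. A toy numpy illustration (ours, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(8, 2))    # LoRA "up" factor
A = rng.normal(size=(2, 8))    # LoRA "down" factor
s = 3.0
B2, A2 = B * s, A / s          # rescaled factors: the same adapter

print(np.allclose(B @ A, B2 @ A2))     # True: one point on the quotient
Bm, Am = (B + B2) / 2, (A + A2) / 2    # naive factor-wise average
print(np.allclose(Bm @ Am, B @ A))     # False: averaging left the orbit
```

A quotient-manifold average, by construction, would return a representative of the same adapter here.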
Original abstract
Model merging aims to combine multiple models into one without additional training. Naïve parameter-space averaging can be fragile under architectural symmetries, as their geometry does not take them into account. In this work we show that not only the geometry, but also the averaging procedure itself, must be symmetry-invariant to achieve symmetry-aware merges. Consequently, we propose a general solution: merging as Fréchet averaging, i.e., selecting parameters that minimize a sum of geodesic distances on an appropriate manifold. In this view, the key design choice is the overall geometry, i.e., the choice of metric, manifold, and distance approximation, that determines what it means for two models to be "close". We show that Fréchet averaging, combined with simplifying assumptions, contains Fisher merging. Building on this, we examine the particular case of low-rank adapters (LoRA), whose symmetries induce a distinct geometry: that of a quotient manifold. We outline the limitations of current LoRA merging methods, propose a practical algorithm for this setting, and show how they compare with other commonly used approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that naive parameter-space averaging for model merging is fragile under architectural symmetries because it fails to respect the underlying geometry. It argues that both the geometry and the averaging procedure must be symmetry-invariant, and proposes merging as Fréchet averaging: selecting parameters that minimize the sum of geodesic distances on an appropriate manifold. The key design choice is the metric, manifold, and distance approximation. The paper shows that Fréchet averaging contains Fisher merging under simplifying assumptions, and develops a practical algorithm for the LoRA case using quotient manifold geometry induced by symmetries, comparing it to existing methods.
Significance. If the central claims hold, this provides a principled geometric generalization of model merging that respects symmetries, extending Fisher merging and offering a new approach for LoRA adapters via quotient manifolds. It could lead to more robust merged models in practice by making the averaging procedure itself invariant.
Major comments (2)
- [Abstract] The claim that 'Fréchet averaging, combined with simplifying assumptions, contains Fisher merging' is central to the generalization argument, but the manuscript provides no explicit derivation or list of the simplifying assumptions. Without this, it is impossible to verify whether the geodesic-distance minimization reduces to Fisher merging while preserving the required symmetry invariance.
- [LoRA quotient manifold section] The proposal requires that the chosen metric and distance approximation commute with the symmetry-group action to ensure the averaging procedure is symmetry-invariant. No proof, error bounds, or verification is given that the tractability approximations (necessary for computing the Fréchet mean) do not break equivariance, making the operational symmetry-awareness formal rather than guaranteed.
Minor comments (2)
- [Introduction] The introduction could more clearly distinguish the contribution of the Fréchet averaging procedure itself from the choice of manifold geometry.
- Notation for the manifold, metric, and geodesic distance approximation should be introduced with explicit definitions before the main claims.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which helps clarify the presentation of our geometric framework for model merging. We agree that additional explicit derivations and verifications will strengthen the manuscript and plan to incorporate them in the revision.
Point-by-point responses
Referee: [Abstract] The claim that 'Fréchet averaging, combined with simplifying assumptions, contains Fisher merging' is central to the generalization argument, but the manuscript provides no explicit derivation or list of the simplifying assumptions. Without this, it is impossible to verify whether the geodesic-distance minimization reduces to Fisher merging while preserving the required symmetry invariance.
Authors: We acknowledge that the connection was presented at a high level in the main text. In the revised manuscript we will add an explicit appendix containing the full step-by-step derivation. The simplifying assumptions we will list are: (i) the manifold is taken to be the parameter space equipped with the Fisher-Rao metric, (ii) the geodesic distance is locally approximated by the KL divergence between predictive distributions, and (iii) the models lie in a neighborhood where higher-order curvature terms can be neglected. Under these conditions the Fréchet mean reduces exactly to the Fisher merging objective while the invariance properties are preserved by construction of the metric. Revision: yes.
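For readers who want the shape of that reduction (our sketch under assumptions (i)–(iii), not text from the paper): the second-order expansion of the KL divergence gives

2·KL(p_θ ‖ p_{θᵢ}) ≈ (θ − θᵢ)ᵀ Fᵢ (θ − θᵢ),

so, taking the squared-distance form of the objective, the sum becomes Σᵢ (θ − θᵢ)ᵀ Fᵢ (θ − θᵢ), whose stationary point (Σᵢ Fᵢ)⁻¹ Σᵢ Fᵢ θᵢ is exactly the Fisher merging solution.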
Referee: [LoRA quotient manifold section] The proposal requires that the chosen metric and distance approximation commute with the symmetry-group action to ensure the averaging procedure is symmetry-invariant. No proof, error bounds, or verification is given that the tractability approximations (necessary for computing the Fréchet mean) do not break equivariance, making the operational symmetry-awareness formal rather than guaranteed.
Authors: We agree that a formal guarantee is desirable. The quotient metric is defined to be invariant under the group action by construction, and the first-order Taylor approximation of the distance is equivariant to first order. In the revision we will add (a) a short proof sketch showing that the approximation commutes with the group action up to O(ε²) terms, where ε is the step size, (b) explicit error bounds derived from the sectional curvature of the quotient manifold, and (c) additional synthetic experiments that quantify the residual invariance error after merging. We will also state the precise conditions under which exact equivariance holds and note the practical regimes where the approximation error remains negligible. Revision: partial.
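One concrete way to get invariance "by construction" (a standard balancing trick, sketched by us; not necessarily the paper's algorithm) is to map each adapter to a canonical representative of its positive-scaling orbit before averaging:

```python
import numpy as np

def balance_lora(B, A, eps=1e-12):
    """Canonical representative of the positive-scaling orbit of (B, A).

    For any positive diagonal D, (B @ D, inv(D) @ A) is the same adapter;
    equalizing per-rank column norms of B and row norms of A picks one
    representative, so later averaging cannot see the arbitrary scaling.
    """
    b = np.linalg.norm(B, axis=0)          # column norms of B, shape (r,)
    a = np.linalg.norm(A, axis=1)          # row norms of A, shape (r,)
    s = np.sqrt(a / np.maximum(b, eps))
    return B * s, A / s[:, None]
```

Averaging balanced factors is then unaffected by how each source adapter happened to be scaled, which is the operational content the referee asks the authors to prove for their actual approximation.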
Circularity Check
No significant circularity; the Fréchet averaging proposal is a self-contained generalization
Full rationale
The paper's derivation chain proposes merging as Fréchet averaging on a symmetry-respecting manifold (e.g., quotient manifold for LoRA) and shows that this framework contains Fisher merging under simplifying assumptions on the geometry. This is presented as a design choice of metric, manifold, and distance approximation rather than a reduction to fitted inputs or prior self-citations. No equations or claims reduce by construction to the target result itself; the central claim relies on standard manifold geometry applied to model merging without load-bearing self-referential steps or renaming of known results.
Axiom & Free-Parameter Ledger
Free parameters (1)
- Choice of metric, manifold, and distance approximation
Axioms (1)
- Domain assumption: model parameters can be viewed as points on a manifold whose geodesic distances respect architectural symmetries