pith. machine review for the scientific record.

arxiv: 2604.27155 · v2 · submitted 2026-04-29 · 💻 cs.LG

Recognition: unknown

Generalizing the Geometry of Model Merging Through Fréchet Averages

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 03:09 UTC · model grok-4.3

classification 💻 cs.LG
keywords model merging · Fréchet averaging · manifold geometry · symmetry invariance · LoRA · Fisher merging · geodesic distance · quotient manifold

The pith

Model merging requires Fréchet averaging on symmetry-respecting manifolds rather than direct parameter averages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that naive averaging of model parameters breaks down under common architectural symmetries because it treats parameters as if they live in flat space without structure. It proposes instead that merging amounts to Fréchet averaging: locate the parameter set whose total geodesic distance to the input models is smallest on a manifold whose metric encodes the symmetries. This view recovers Fisher merging when the manifold and distance are chosen appropriately and supplies a concrete algorithm for low-rank adapters whose quotient manifold arises from their built-in scaling symmetries. A reader cares because many practical merges today silently lose performance precisely when symmetries are present, and the framework gives a principled way to restore it by changing the geometry instead of the models.

Core claim

Merging is Fréchet averaging on an appropriate manifold: the merged parameters are those that minimize the sum of geodesic distances to the source models. The decisive design choice is therefore the triple of metric, manifold, and distance approximation, which together define what it means for two models to be close. Under simplifying assumptions the construction contains Fisher merging as a special case. For LoRA adapters the relevant manifold is a quotient space induced by the scaling symmetry, and the paper derives a practical algorithm for that geometry that improves on existing LoRA merge heuristics.
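The Fisher-merging special case has a compact concrete form: under a diagonal Fisher approximation, the linearized objective reduces to a precision-weighted parameter average. The NumPy sketch below is an editorial illustration of that reduced objective only, not the paper's algorithm; the toy parameter vectors and Fisher weights are hypothetical.

```python
import numpy as np

def fisher_merge(params, fishers):
    """Diagonal Fisher merging: a coordinate-wise precision-weighted
    average, theta* = (sum_i F_i * theta_i) / (sum_i F_i)."""
    num = sum(F * p for p, F in zip(params, fishers))
    den = sum(fishers)
    return num / den

# Two toy "models" with per-parameter (diagonal) Fisher weights.
p1, p2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
F1, F2 = np.array([3.0, 1.0]), np.array([1.0, 3.0])

merged = fisher_merge([p1, p2], [F1, F2])
print(merged)  # [0.75 0.75]: each coordinate leans toward the model
               # whose Fisher weight (confidence) is larger there
```

With equal Fisher weights the formula collapses to the plain parameter mean, which is the flat-geometry baseline the paper argues against when symmetries are present.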

What carries the argument

Fréchet averaging: the operation of selecting the point on the manifold that minimizes the sum of geodesic distances to the given models, which thereby inherits its invariance from the chosen geometry.
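As a toy illustration of the averaging operation itself, here is the standard fixed-point iteration for the (squared-distance) Fréchet mean on the unit sphere. The paper's manifolds, metrics, and distance approximations differ, so treat this as a sketch of the averaging step only, on an assumed toy geometry.

```python
import numpy as np

def log_map(p, q):
    """Tangent vector at p pointing toward q on the unit sphere,
    with length equal to the geodesic (arc) distance."""
    d = np.arccos(np.clip(p @ q, -1.0, 1.0))
    if d < 1e-12:
        return np.zeros_like(p)
    v = q - (p @ q) * p          # project q onto the tangent space at p
    return d * v / np.linalg.norm(v)

def exp_map(p, v):
    """Follow the geodesic from p in direction v for length |v|."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return p
    return np.cos(n) * p + np.sin(n) * (v / n)

def frechet_mean(points, iters=50, step=1.0):
    """Gradient descent on the sum of squared geodesic distances."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        grad = np.mean([log_map(mu, q) for q in points], axis=0)
        mu = exp_map(mu, step * grad)
    return mu

p1, p2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
mu = frechet_mean([p1, p2])
print(mu)  # the geodesic midpoint [0.7071 0.7071 0.], which stays on
           # the sphere, unlike the normalized Euclidean average's input
```

The point of the toy: the result is obtained intrinsically (log map, tangent average, exp map), so whatever invariances the geometry has are inherited by the average, which is the property the paper needs.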

If this is right

  • Fisher merging emerges automatically once the manifold is taken to be the one induced by the Fisher information metric and the distance is linearized.
  • LoRA merges performed on the quotient manifold avoid the scaling ambiguity that current ad-hoc methods must patch after the fact.
  • Any architectural symmetry group can in principle be accommodated by selecting the corresponding quotient or covering manifold before averaging.
  • The quality of the merge is controlled by how faithfully the distance approximation respects the chosen geometry rather than by the averaging step itself.
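The scaling ambiguity in the LoRA bullet is easy to exhibit numerically. In an adapter W = B @ A, rescaling (B, A) to (sB, A/s) leaves W unchanged, yet factor-wise averaging of two such parameterizations of the same adapter leaves the orbit. The shapes below are hypothetical; this is an editorial sketch of the failure mode, not of the paper's remedy.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(8, 2))   # LoRA "up" factor
A = rng.normal(size=(2, 8))   # LoRA "down" factor
s = 5.0

# Two parameterizations of the SAME adapter (same symmetry orbit):
W = B @ A
assert np.allclose(W, (B * s) @ (A / s))

# Factor-wise ("naive") averaging of the two parameterizations:
W_naive = ((B + B * s) / 2) @ ((A + A / s) / 2)

# The merge has left the orbit: W is rescaled by (1 + s)^2 / (4 s).
print(np.allclose(W_naive, W))                          # False
print(np.allclose(W_naive, W * (1 + s) ** 2 / (4 * s))) # True
```

So merging identical models can change the resulting function, purely as an artifact of parameterization; averaging on the quotient manifold is, in the paper's framing, what removes this dependence.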

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same geometry-first approach could be applied to merging in continual learning or federated settings where parameter symmetries recur across tasks or clients.
  • Optimization libraries for manifold-valued data could be reused directly for model merging once the manifold is fixed.
  • Different downstream tasks might favor different manifolds for the same base models, suggesting that merge geometry should be chosen after seeing the target metric.

Load-bearing premise

That a manifold and metric can be chosen to encode the relevant symmetries while still permitting tractable approximation of geodesics and the resulting average.

What would settle it

A concrete counter-example: take two models related by a known non-trivial symmetry (such as a scaling of LoRA factors) and show that the Fréchet average on the proposed manifold yields a merged model whose downstream performance is no better than naive averaging on that symmetry orbit.

Figures

Figures reproduced from arXiv: 2604.27155 by Felix Dangel, Marvin F. da Silva, Mohammed Adnan, Sageev Oore.

Figure 1
Figure 1. Visual comparison of different model merging approaches, highlighting failure scenarios due to symmetry unawareness. Left: naive averaging steps off the orbit because it uses the wrong geometry. Right: even worse, naive merging is ambiguous and lands on different orbits depending on parameterization. Geodesic merging always stays on the same orbit because it uses a symmetry-invariant notion of averaging.
Figure 2
Figure 2. Geodesic merging in a toy two-parameter setting with …
read the original abstract

Model merging aims to combine multiple models into one without additional training. Naïve parameter-space averaging can be fragile under architectural symmetries, as their geometry does not take them into account. In this work we show that not only the geometry, but also the averaging procedure itself, must be symmetry-invariant to achieve symmetry-aware merges. Consequently, we propose a general solution: merging as Fréchet averaging, i.e., selecting parameters that minimize a sum of geodesic distances on an appropriate manifold. In this view, the key design choice is the overall geometry, i.e., the choice of metric, manifold, and distance approximation, that determines what it means for two models to be "close". We show that Fréchet averaging, combined with simplifying assumptions, contains Fisher merging. Building on this, we examine the particular case of low-rank adapters (LoRA), whose symmetries induce a distinct geometry: that of a quotient manifold. We outline the limitations of current LoRA merging methods, propose a practical algorithm for this setting, and show how they compare with other commonly used approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that naive parameter-space averaging for model merging is fragile under architectural symmetries because it fails to respect the underlying geometry. It argues that both the geometry and the averaging procedure must be symmetry-invariant, and proposes merging as Fréchet averaging: selecting parameters that minimize the sum of geodesic distances on an appropriate manifold. The key design choice is the metric, manifold, and distance approximation. The paper shows that Fréchet averaging contains Fisher merging under simplifying assumptions, and develops a practical algorithm for the LoRA case using quotient manifold geometry induced by symmetries, comparing it to existing methods.

Significance. If the central claims hold, this provides a principled geometric generalization of model merging that respects symmetries, extending Fisher merging and offering a new approach for LoRA adapters via quotient manifolds. It could lead to more robust merged models in practice by making the averaging procedure itself invariant.

major comments (2)
  1. [Abstract] The claim that 'Fréchet averaging, combined with simplifying assumptions, contains Fisher merging' is central to the generalization argument, but the manuscript provides no explicit derivation or list of the simplifying assumptions. Without this, it is impossible to verify whether the geodesic-distance minimization reduces to Fisher merging while preserving the required symmetry invariance.
  2. [LoRA quotient manifold section] The proposal requires that the chosen metric and distance approximation commute with the symmetry group action to ensure the averaging procedure is symmetry-invariant. No proof, error bounds, or verification is given that the tractability approximations (necessary for computing the Fréchet mean) do not break equivariance, making the operational symmetry-awareness formal rather than guaranteed.
minor comments (2)
  1. [Introduction] The introduction could more clearly distinguish the contribution of the Fréchet averaging procedure itself from the choice of manifold geometry.
  2. Notation for the manifold, metric, and geodesic distance approximation should be introduced with explicit definitions before the main claims.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the presentation of our geometric framework for model merging. We agree that additional explicit derivations and verifications will strengthen the manuscript and plan to incorporate them in the revision.

read point-by-point responses
  1. Referee: [Abstract] The claim that 'Fréchet averaging, combined with simplifying assumptions, contains Fisher merging' is central to the generalization argument, but the manuscript provides no explicit derivation or list of the simplifying assumptions. Without this, it is impossible to verify whether the geodesic-distance minimization reduces to Fisher merging while preserving the required symmetry invariance.

    Authors: We acknowledge that the connection was presented at a high level in the main text. In the revised manuscript we will add an explicit appendix containing the full step-by-step derivation. The simplifying assumptions we will list are: (i) the manifold is taken to be the parameter space equipped with the Fisher-Rao metric, (ii) the geodesic distance is locally approximated by the KL divergence between predictive distributions, and (iii) the models lie in a neighborhood where higher-order curvature terms can be neglected. Under these conditions the Fréchet mean reduces exactly to the Fisher merging objective while the invariance properties are preserved by construction of the metric. revision: yes

  2. Referee: [LoRA quotient manifold section] The proposal requires that the chosen metric and distance approximation commute with the symmetry group action to ensure the averaging procedure is symmetry-invariant. No proof, error bounds, or verification is given that the tractability approximations (necessary for computing the Fréchet mean) do not break equivariance, making the operational symmetry-awareness formal rather than guaranteed.

    Authors: We agree that a formal guarantee is desirable. The quotient metric is defined to be invariant under the group action by construction, and the first-order Taylor approximation of the distance is equivariant to first order. In the revision we will add (a) a short proof sketch showing that the approximation commutes with the group action up to O(ε²) terms where ε is the step size, (b) explicit error bounds derived from the sectional curvature of the quotient manifold, and (c) additional synthetic experiments that quantify the residual invariance error after merging. We will also state the precise conditions under which exact equivariance holds and note the practical regimes where the approximation error remains negligible. revision: partial
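The "residual invariance error" experiment the rebuttal promises can be sketched in a few lines. Averaging the symmetry-invariant products W_i = B_i @ A_i is used here as a crude stand-in for quotient-manifold averaging (it can exceed the adapter rank, so it is not a practical LoRA merge and is not the authors' algorithm); it only shows what exact equivariance under independent factor rescalings looks like, next to the naive baseline. Shapes and scale factors are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
B1, A1 = rng.normal(size=(4, 2)), rng.normal(size=(2, 4))
B2, A2 = rng.normal(size=(4, 2)), rng.normal(size=(2, 4))

def naive(Ba, Aa, Bb, Ab):
    """Factor-wise averaging in the ambient (flat) parameter space."""
    return ((Ba + Bb) / 2) @ ((Aa + Ab) / 2)

def invariant(Ba, Aa, Bb, Ab):
    """Average the invariant products W_i = B_i @ A_i — a crude
    stand-in for quotient-manifold averaging (rank may grow)."""
    return (Ba @ Aa + Bb @ Ab) / 2

# Rescale each model independently along its own symmetry orbit.
s1, s2 = 3.0, 0.2
naive_before = naive(B1, A1, B2, A2)
naive_after = naive(B1 * s1, A1 / s1, B2 * s2, A2 / s2)
inv_before = invariant(B1, A1, B2, A2)
inv_after = invariant(B1 * s1, A1 / s1, B2 * s2, A2 / s2)

print(np.allclose(naive_after, naive_before))  # False: depends on parameterization
print(np.allclose(inv_after, inv_before))      # True: invariant by construction
```

A real invariance audit would replace `invariant` with the paper's quotient-manifold algorithm and report the norm of the residual as the approximation coarsens.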

Circularity Check

0 steps flagged

No significant circularity; the Fréchet averaging proposal is a self-contained generalization.

full rationale

The paper's derivation chain proposes merging as Fréchet averaging on a symmetry-respecting manifold (e.g., quotient manifold for LoRA) and shows that this framework contains Fisher merging under simplifying assumptions on the geometry. This is presented as a design choice of metric, manifold, and distance approximation rather than a reduction to fitted inputs or prior self-citations. No equations or claims reduce by construction to the target result itself; the central claim relies on standard manifold geometry applied to model merging without load-bearing self-referential steps or renaming of known results.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The paper relies on the assumption of an appropriate geometry for model parameters that accounts for symmetries, which is not derived but chosen as the key design choice.

free parameters (1)
  • choice of metric, manifold, and distance approximation
    Determines what it means for two models to be close; central to the Fréchet averaging proposal.
axioms (1)
  • domain assumption: Model parameters can be viewed as points on a manifold with geodesic distances that respect architectural symmetries
    Invoked to define the Fréchet average and symmetry-invariant merging.

pith-pipeline@v0.9.0 · 5495 in / 1261 out tokens · 69623 ms · 2026-05-08T03:09:27.545864+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

6 extracted references · 5 canonical work pages · 1 internal anchor

  1. Boumal, N. An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023.

  2. Dinh, L., Pascanu, R., Bengio, S., and Bengio, Y. Sharp minima can generalize for deep nets, 2017.

  3. Ilharco, G., et al. Editing Models with Task Arithmetic, 2022. arXiv:2212.09849.

  4. Lee, J. and Jung, S. Huber means on Riemannian manifolds, 2025. arXiv:2407.15764.

  5. Pennec, X. Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25:127–154, 2006.

  6. Wortsman, M., Ilharco, G., Gadre, S. Y., Roelofs, R., Gontijo-Lopes, R., Morcos, A. S., Namkoong, H., Farhadi, A., Carmon, Y., Kornblith, S., and Schmidt, L. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. arXiv e-prints, 2022.