Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success

Bo Zhao; Emanuele Rodolà; Luca Zhou; Rose Yu

arxiv: 2601.22285 · v8 · pith:G5PHVFIDnew · submitted 2026-01-29 · 💻 cs.LG

Luca Zhou, Bo Zhao, Rose Yu, Emanuele Rodolà

Pith reviewed 2026-05-21 13:34 UTC · model grok-4.3

classification 💻 cs.LG

keywords: model merging, mergeability, gradient alignment, interpretable metrics, fine-tuned models, multi-task models, model compatibility

The pith

Gradient alignment metrics are the most reliable predictors of success when merging separately fine-tuned models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks what determines whether two fine-tuned models can be combined into one that retains accuracy on both original tasks. It builds an architecture-agnostic framework that measures a collection of pairwise properties, such as gradient distances, and uses L1-regularized linear regression to find which properties best track post-merge performance. Across five different merging methods the authors observe that success drivers differ by model architecture and by method, yet gradient alignment metrics stand out as the most consistent signals of compatibility. The work reframes mergeability as something that depends on the specific merging technique and the pair of tasks rather than an intrinsic trait of the models alone. These results supply a practical diagnostic for choosing which models to merge and point toward training procedures that could improve merge outcomes.

Core claim

Mergeability is not an intrinsic property of individual models but depends on the chosen merging method and the partner tasks. By regressing normalized post-merge accuracy on a set of interpretable pairwise metrics with L1 regularization, the authors show that gradient alignment metrics emerge as the most fundamental and consistent signals of compatibility across methods, even though individual methods exhibit distinct patterns of which metrics matter most.

What carries the argument

A collection of interpretable pairwise metrics, including gradient L2 distance, fed into L1-regularized linear optimization to rank their correlation with post-merge normalized accuracy.
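A minimal sketch of this kind of ranking, under stated assumptions: the objective is the standard lasso form (1/2n)·||y − Xw||² + λ·||w||₁, solved by cyclic coordinate descent with no intercept (the target is assumed centred). The data are synthetic, the metric names other than gradient L2 distance are hypothetical placeholders, and the value of λ is arbitrary. It illustrates the technique and is not the authors' implementation.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of rows (one per model pair); y is the centred target.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iters):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w

# Synthetic stand-in data: each row is one model pair, each column one metric.
# The second and third metric names are hypothetical placeholders.
random.seed(0)
metric_names = ["gradient_l2_distance", "weight_l2_distance", "task_vector_cosine"]
X = [[random.gauss(0, 1) for _ in metric_names] for _ in range(200)]
# In this toy setup only the first metric drives the target.
y = [-0.8 * row[0] + random.gauss(0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, lam=0.1)
ranking = sorted(zip(metric_names, weights), key=lambda p: abs(p[1]), reverse=True)
for name, weight in ranking:
    print(f"{name:>22s}  {weight:+.3f}")
```

Ranking by absolute coefficient only makes sense when the metric columns are on comparable scales, which is why the toy features are drawn from the same distribution.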

If this is right

Success drivers vary by architecture and merging method, with only 64 percent average overlap in the top five metrics.
Methods such as TIES display distinct metric fingerprints that differ from the broader consensus.
Gradient alignment provides a stable diagnostic signal that can be checked before attempting a merge.
The framework supplies a foundation for designing merge-aware fine-tuning procedures.
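The overlap and sign-agreement figures can be made concrete with a small sketch. The definitions below are assumptions, since this page does not state how the paper computes them: top-k overlap is taken as |A ∩ B| / k averaged over all pairs of fits, and sign agreement as the share of metrics, nonzero in both fits, whose coefficients have the same sign. The method names, metric names, and coefficient values are hypothetical.

```python
from itertools import combinations

def top_k(coefs, k):
    """Names of the k metrics with the largest absolute coefficient."""
    return set(sorted(coefs, key=lambda m: abs(coefs[m]), reverse=True)[:k])

def sign_agreement(a, b):
    """Share of metrics, nonzero in both fits, whose coefficient signs match."""
    shared = [m for m in a if m in b and a[m] != 0 and b[m] != 0]
    if not shared:
        return 0.0
    return sum((a[m] > 0) == (b[m] > 0) for m in shared) / len(shared)

def mean_over_pairs(fits, score):
    """Average a pairwise score over every unordered pair of fits."""
    pairs = list(combinations(fits.values(), 2))
    return sum(score(a, b) for a, b in pairs) / len(pairs)

# Hypothetical regression coefficients for three merging methods.
fits = {
    "method_a": {"metric_1": -0.9, "metric_2": 0.4, "metric_3": 0.1, "metric_4": 0.0},
    "method_b": {"metric_1": -0.7, "metric_2": 0.1, "metric_3": 0.5, "metric_4": 0.0},
    "method_c": {"metric_1": -0.2, "metric_2": -0.6, "metric_3": 0.3, "metric_4": 0.0},
}

k = 2
overlap = mean_over_pairs(fits, lambda a, b: len(top_k(a, k) & top_k(b, k)) / k)
agreement = mean_over_pairs(fits, sign_agreement)
print(f"mean top-{k} overlap: {overlap:.3f}")
print(f"mean sign agreement: {agreement:.3f}")
```

In this toy example every pair of methods shares exactly one of its top two metrics, so the mean overlap is one half, while the signs disagree only on `metric_2`.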

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Fine-tuning algorithms could be modified to explicitly encourage gradient alignment between models intended for later merging.
The same metrics might help rank candidate models for merging without running the full merge operation first.
The observed consistency of gradient signals could be tested on larger-scale models or on tasks outside the current image and language domains.
If gradient alignment proves central, it may also explain compatibility limits in related operations such as model ensembling or knowledge distillation.

Load-bearing premise

The chosen set of pairwise metrics and the L1-regularized linear model capture the actual drivers of merge success rather than merely fitting correlations in the studied cases.

What would settle it

Finding a new collection of models or merging methods in which gradient alignment metrics show no or negative correlation with post-merge accuracy while other metrics dominate.

Original abstract

Model merging combines knowledge from separately fine-tuned models, yet the factors driving its success remain poorly understood. While recent work treats mergeability as an intrinsic property of the models, we show with an architecture-agnostic framework that it fundamentally depends on both the merging method and the partner tasks. Using L1-regularized linear optimization over a set of interpretable pairwise metrics (e.g., gradient L_2 distance), we uncover properties correlating with post-merge normalized accuracy across five merging methods. We find architecture- and method-specific variation in success drivers (64.0% average top-5 metric overlap; 79.3% sign agreement), with certain methods, notably TIES, exhibiting distinct "fingerprints" that diverge from the broader consensus. Crucially, however, gradient alignment metrics consistently emerge as the most fundamental signals of compatibility. These findings provide a diagnostic foundation for understanding mergeability and motivate future merge-aware fine-tuning strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Gradient alignment metrics predict merge success via L1 regression on pairwise stats, with method-specific patterns, but the evidence is correlational and may reflect proxies rather than fundamentals.

Colleague, the one or two things to know: this paper shows that gradient alignment metrics are the most consistent predictors of successful model merges, using L1-regularized regression on pairwise properties, and that different merging methods have their own patterns in what matters. What is new is a framework that works across architectures and applies established regression to predict merge accuracy from metrics like gradient distances. This extends the merging literature by focusing on diagnostics instead of new algorithms. It does well in demonstrating that success depends on the method and tasks, not just the models, and in highlighting things like TIES having a distinct fingerprint. The reported overlap and agreement stats give a concrete sense of the variation.

The soft spots are that the central findings are correlations from regression fits, and the abstract lacks specifics on data scale or validation, which tempers how strongly we can take the 'fundamental' claim for gradient alignment. If the metrics are not orthogonal, the L1 penalty could be selecting proxies for task similarity rather than true drivers, as the stress-test suggests. The 64% top-5 overlap indicates the results are not fully stable across setups.

This is for people doing practical model merging or studying efficient ways to reuse fine-tuned models. A reader interested in applied ML techniques would get value from the predictive framework, even if they adapt the metrics. It deserves a serious referee because the approach is clear and the topic is relevant to current practices in large model reuse. I'd recommend sending it to peer review, with notes to strengthen the experimental details and check for multicollinearity in the metrics.

Referee Report

2 major / 1 minor

Summary. The paper presents an architecture-agnostic framework showing that model mergeability depends on the merging method and partner tasks. Through L1-regularized linear optimization on interpretable pairwise metrics (e.g., gradient L_2 distance), it identifies correlations with post-merge normalized accuracy for five methods. Key results include 64.0% average top-5 metric overlap and 79.3% sign agreement across methods, with gradient alignment metrics emerging as the most fundamental signals despite architecture- and method-specific variations.

Significance. Should the findings prove robust, this work contributes to the field by offering interpretable diagnostics for merge success. It highlights the role of gradient alignment and method-specific fingerprints, which could guide the development of better merging techniques and fine-tuning strategies. The empirical regression approach is a positive step toward understanding mergeability beyond black-box observations.

major comments (2)

The claim in the abstract that gradient alignment metrics 'consistently emerge as the most fundamental signals of compatibility' is not fully supported by the reported L1-regularized regression results. The 64.0% top-5 overlap and architecture/method-specific variation (including distinct TIES fingerprints) indicate that these metrics may act as correlated proxies for task similarity; without orthogonality checks or ablation of the metric pool, the high weights do not establish mechanistic drivers.
The central empirical claims rely on aggregate statistics from L1 fits, yet the manuscript provides no details on dataset size, number of pairwise comparisons, controls for confounding variables, or validation procedures (e.g., cross-validation for the regularization coefficient). This directly affects the reliability of the identified predictors and the conclusion that gradient alignment is fundamental.
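The orthogonality check asked for in the first comment could start from something as simple as the sketch below, which flags metric pairs whose Pearson correlation across model pairs is high. The metric names beyond gradient L2 distance, the values, and the 0.9 threshold are all hypothetical; this is an illustration of the check, not the authors' analysis.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

def correlated_pairs(metrics, threshold=0.9):
    """Return (metric_a, metric_b, r) for every pair with |r| >= threshold."""
    names = sorted(metrics)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(metrics[a], metrics[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical metric values measured on five model pairs.
metrics = {
    "gradient_l2_distance": [0.2, 0.5, 0.9, 1.4, 1.8],
    "gradient_cosine": [0.9, 0.7, 0.4, 0.1, -0.2],
    "weight_l2_distance": [1.0, 0.2, 0.9, 0.3, 0.7],
}

flagged = correlated_pairs(metrics)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r}")
```

When two metrics are this strongly correlated, an L1 penalty tends to keep one and drop the other somewhat arbitrarily, which is the proxy concern raised above.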

minor comments (1)

Clarify the exact definition and computation of 'normalized accuracy' and 'post-merge normalized accuracy' when introducing the regression target to ensure reproducibility across the five methods.
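For orientation, one convention that is common in the model merging literature divides the merged model's accuracy on each task by the accuracy of the model fine-tuned on that task, then averages over the tasks in the pair. The sketch below assumes that convention; the paper's own definition may differ, and the task names and accuracy values are hypothetical.

```python
def normalized_accuracy(merged_acc, finetuned_acc):
    """Merged-model accuracy as a fraction of the single-task fine-tuned accuracy."""
    if finetuned_acc <= 0:
        raise ValueError("fine-tuned accuracy must be positive")
    return merged_acc / finetuned_acc

def pair_score(merged, finetuned):
    """Average normalized accuracy over the tasks of one merged pair."""
    return sum(normalized_accuracy(merged[t], finetuned[t]) for t in merged) / len(merged)

# Hypothetical accuracies for one merged pair of models.
merged = {"task_a": 0.72, "task_b": 0.81}
finetuned = {"task_a": 0.90, "task_b": 0.90}

score = pair_score(merged, finetuned)
print(f"post-merge normalized accuracy: {score:.3f}")
```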

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us identify areas where the manuscript can be strengthened. We respond to each major comment below and commit to revisions that enhance transparency and address interpretability concerns while preserving the integrity of our reported findings.

Referee: The claim in the abstract that gradient alignment metrics 'consistently emerge as the most fundamental signals of compatibility' is not fully supported by the reported L1-regularized regression results. The 64.0% top-5 overlap and architecture/method-specific variation (including distinct TIES fingerprints) indicate that these metrics may act as correlated proxies for task similarity; without orthogonality checks or ablation of the metric pool, the high weights do not establish mechanistic drivers.

Authors: We appreciate the referee's careful reading and the distinction drawn between correlational importance and mechanistic drivers. The L1-regularized regressions were performed independently per merging method and architecture, with gradient alignment metrics (such as gradient L2 distance) receiving the highest absolute coefficients in the majority of fits; this pattern, together with the 79.3% sign agreement across methods, is the basis for the abstract statement. We already emphasize the 64.0% top-5 overlap and the distinct TIES fingerprint as evidence of method-specific variation. Nevertheless, we agree that the current results do not include explicit checks for multicollinearity or ablation of the metric set. In the revision we will add (i) a correlation matrix among all candidate metrics and (ii) an ablation experiment that removes the gradient-alignment features and reports the resulting drop in R^2 and predictor stability. These additions will allow us to qualify the language in the abstract and discussion more precisely.
revision: yes

Referee: The central empirical claims rely on aggregate statistics from L1 fits, yet the manuscript provides no details on dataset size, number of pairwise comparisons, controls for confounding variables, or validation procedures (e.g., cross-validation for the regularization coefficient). This directly affects the reliability of the identified predictors and the conclusion that gradient alignment is fundamental.

Authors: We acknowledge that the manuscript draft omits several key experimental details required for full reproducibility and assessment of statistical reliability. The underlying data consist of all pairwise combinations of fine-tuned models drawn from the benchmark tasks and architectures described in Section 4; however, the exact count of these pairs, any explicit controls for task similarity, and the procedure used to select the L1 regularization strength are not stated. In the revised manuscript we will insert a new subsection (likely in Section 3 or 4) that reports: the total number of pairwise comparisons, the cross-validation scheme (including the number of folds and the grid or search method for the regularization coefficient), and any steps taken to mitigate confounding factors such as task difficulty or model size. These additions will directly support the reliability of the aggregate statistics and the prominence of gradient-alignment predictors.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical regression on independent metrics

The paper computes pairwise metrics such as gradient L2 distance directly from model parameters and gradients, then fits an L1-regularized linear model to observed post-merge normalized accuracy values obtained from separate merging experiments. The claim that gradient alignment metrics are the most fundamental signals follows from the resulting regression coefficients and overlap statistics across methods, without any step that defines merge success in terms of the metrics or renames a fitted parameter as an independent prediction. The derivation relies on external empirical measurements and standard regression techniques rather than reducing to self-definition or self-citation chains.
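As a rough illustration of what a pairwise gradient metric looks like, the sketch below computes the L2 distance between two flattened gradient vectors, plus their cosine similarity as one possible alignment measure. Both the cosine measure and the toy vectors are assumptions for illustration; this page does not say where the gradients are evaluated or which alignment metrics beyond gradient L2 distance the paper uses.

```python
import math

def l2_distance(grad_a, grad_b):
    """Euclidean distance between two flattened gradient vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(grad_a, grad_b)))

def cosine_similarity(grad_a, grad_b):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    norm_a = math.sqrt(sum(a * a for a in grad_a))
    norm_b = math.sqrt(sum(b * b for b in grad_b))
    return dot / (norm_a * norm_b)

# Toy flattened gradients for three hypothetical tasks.
grad_task_a = [3.0, 4.0]
grad_task_b = [4.0, 3.0]    # points in nearly the same direction as task A
grad_task_c = [-3.0, -4.0]  # points in the opposite direction to task A

print(l2_distance(grad_task_a, grad_task_b), cosine_similarity(grad_task_a, grad_task_b))
print(l2_distance(grad_task_a, grad_task_c), cosine_similarity(grad_task_a, grad_task_c))
```

Because both quantities depend only on the two models' gradients and not on any merge outcome, they can be computed before attempting a merge.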

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard statistical assumptions for linear regression and the sufficiency of the chosen pairwise metrics; no new physical entities are postulated.

free parameters (1)

L1 regularization coefficient
The strength of the L1 penalty is a tunable hyperparameter whose value affects which metrics are selected as predictors.
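A small sketch of why this coefficient matters. In the special case of an orthonormal design, and only then, the lasso solution is the soft-thresholded least-squares coefficient, so raising the penalty drops metrics one by one. The least-squares coefficients and penalty values below are hypothetical, and real metric sets are unlikely to be orthonormal, so this only illustrates the mechanism.

```python
def soft_threshold(value, lam):
    """Lasso coefficient under an orthonormal design, given the least-squares value."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def selected_metrics(ols_coefs, lam):
    """Metrics whose coefficient stays nonzero at penalty strength lam."""
    return [m for m, c in ols_coefs.items() if soft_threshold(c, lam) != 0.0]

# Hypothetical least-squares coefficients for three metrics.
ols_coefs = {"metric_a": -0.60, "metric_b": 0.25, "metric_c": 0.08}

for lam in (0.05, 0.10, 0.30, 0.70):
    print(f"lambda = {lam:.2f}: {selected_metrics(ols_coefs, lam)}")
```

The set of selected predictors shrinks from three metrics to none as the penalty grows, which is why the procedure for choosing the coefficient affects which metrics are reported as important.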

axioms (1)

domain assumption: A linear relationship exists between the selected pairwise metrics and post-merge normalized accuracy.
The use of linear optimization presupposes that the relationship can be approximated linearly.
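Written out, the assumption and the fitted objective might look as follows. The notation is introduced here for illustration and is not taken from the paper: a_{ij} is the post-merge normalized accuracy for the model pair (i, j), m_k(i, j) is the k-th pairwise metric, N is the number of pairs, and λ is the L1 regularization coefficient.

```latex
a_{ij} \approx w_0 + \sum_{k=1}^{K} w_k \, m_k(i, j),
\qquad
\min_{w_0,\, w} \;
\frac{1}{2N} \sum_{(i,j)} \Bigl( a_{ij} - w_0 - \sum_{k=1}^{K} w_k \, m_k(i, j) \Bigr)^{2}
+ \lambda \sum_{k=1}^{K} \lvert w_k \rvert
```

The penalty term drives some of the w_k exactly to zero, which is what lets the fit act as a selector over the metric pool.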

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "gradient alignment metrics consistently emerge as the most fundamental signals of compatibility... L1-regularized linear optimization over a set of interpretable pairwise metrics (e.g., gradient L2 distance)"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "subspace overlap and gradient alignment metrics consistently exhibit the highest normalized contribution"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
cs.LG · 2026-05 · unverdicted · novelty 7.0
Low-rank pre-training methods converge to geometrically and spectrally distinct basins from full-rank training and from each other, even at similar validation perplexity.

Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
cs.LG · 2026-05 · unverdicted · novelty 7.0
Low-rank pre-training methods converge to geometrically and spectrally distinct basins and show diverging activations compared to full-rank training at 60M-350M scales.

Model Merging: Foundations and Algorithms
cs.LG · 2026-05 · unverdicted · novelty 6.0
New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.