arxiv: 2507.00037 · v2 · pith:6TNMQRRAnew · submitted 2025-06-18 · 💻 cs.LG · cs.AI

Model Fusion via Retrofitting

keywords: fusion, model, across, data, existing, methods, non-iid, trained

Model fusion seeks to combine independently trained neural networks into a single model without retraining, but is complicated by representational divergence arising from permutation invariance, random initialization, and heterogeneous training data. Existing methods struggle particularly in zero-shot settings under non-IID data distributions, and are often limited to specific architectures or pairwise fusion. We introduce a neuron-centric family of fusion algorithms that frames fusion as a principled representation-matching problem: intermediate neurons across parent models are grouped into target representations, which the fused model's corresponding sub-networks are then trained to approximate. Unlike prior work, our approach incorporates neuron attribution scores to bias alignment toward salient features, and can be applied to any architecture modularizable as a DAG of levels -- empirically validated on VGGs, ResNets, and ViTs. Experiments across standard benchmarks show consistent improvements over existing fusion methods, with the largest gains in zero-shot and non-IID scenarios. Code is available at https://github.com/AndrewSpano/model-fusion-via-retrofitting.
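The representation-matching idea in the abstract can be illustrated with a toy sketch. It is not the authors' algorithm (that is in the linked repository). It assumes a single linear level, two parent models, importance-weighted k-means as the grouping step, and mean absolute activation as a stand-in for neuron attribution scores. The names `weighted_kmeans` and `fit_linear_level` are hypothetical.

```python
import numpy as np

def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Cluster the rows of `points`; `weights` gives each row's importance."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every neuron to its nearest centroid (squared Euclidean).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                # Importance-weighted mean, so salient neurons pull harder.
                centroids[j] = np.average(points[mask], axis=0, weights=weights[mask])
    return centroids, labels

def fit_linear_level(inputs, targets):
    """Least-squares weights W such that inputs @ W approximates targets."""
    W, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return W

# Toy setup (assumption): parent B is a permuted, slightly noisy copy of parent A.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))                    # shared inputs to the level
W_a = rng.normal(size=(8, 4))                    # parent A weights
W_b = W_a[:, rng.permutation(4)] + 0.01 * rng.normal(size=(8, 4))

# Each row of `neurons` is one parent neuron's activations over the batch.
neurons = np.concatenate([X @ W_a, X @ W_b], axis=1).T   # shape (8, 256)
scores = np.abs(neurons).mean(axis=1)            # stand-in attribution scores

# Group neurons into 4 target representations, then fit the fused level.
targets, labels = weighted_kmeans(neurons, scores, k=4)
W_fused = fit_linear_level(X, targets.T)         # shape (8, 4)
```

In this linear toy case the fused level reproduces the targets exactly, because each target is a weighted average of linear functions of `X`. With nonlinear levels the fit would instead be an approximation obtained by training.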

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Partial Fusion of Neural Networks: Efficient Tradeoffs Between Ensembles and Weight Aggregation

    cs.LG 2026-05 unverdicted novelty 7.0

    Partial fusion interpolates between neural network ensembles and weight aggregation by only fusing the most similar neurons identified via partial optimal transport, enabling flexible cost-performance tradeoffs.