arxiv: 2405.07813 · v1 · pith:J5R52NFAnew · submitted 2024-05-13 · 💻 cs.LG · cs.CV

Localizing Task Information for Improved Model Merging and Compression

classification 💻 cs.LG cs.CV
keywords task · merging · model · approaches · multi-task · performance · tasks · weights

Model merging and task arithmetic have emerged as promising, scalable approaches for merging multiple single-task checkpoints into one multi-task model, but their applicability is limited by significant performance loss. Previous works have linked these drops to interference in the weight space and erasure of important task-specific features. In this work we show instead that the information required to solve each task is still preserved after merging, as different tasks mostly use non-overlapping sets of weights. We propose TALL-masks, a method to identify these task supports given a collection of task vectors, and show that one can retrieve >99% of the single-task accuracy by applying our masks to the multi-task vector, effectively compressing the individual checkpoints. We study the statistics of intersections among the constructed masks and reveal the existence of selfish and catastrophic weights, i.e., parameters that are important exclusively to one task, and parameters that are irrelevant to all tasks but detrimental to multi-task fusion, respectively. For this reason, we propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches. Our experiments on vision and NLP benchmarks with up to 20 tasks show that Consensus Merging consistently improves existing approaches. Furthermore, our proposed compression scheme reduces storage from 57Gb to 8.2Gb while retaining 99.7% of the original performance.
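To make the ideas in the abstract concrete, here is a minimal sketch of per-task masking and consensus filtering on toy task vectors. The abstract does not state the masking criterion, so the thresholding rule used here (keep a weight for task t when its own update is at least `lam` times the combined update of the other tasks) is an assumption for illustration, as are the function names, the scaling factor `lam`, and the agreement threshold `k`. Plain Python lists stand in for model weights.

```python
from typing import Dict, List

Vector = List[float]
Mask = List[bool]


def multi_task_vector(task_vectors: Dict[str, Vector]) -> Vector:
    """Task arithmetic: sum the task vectors element-wise."""
    return [sum(column) for column in zip(*task_vectors.values())]


def task_mask(tau_t: Vector, tau_mtl: Vector, lam: float = 1.0) -> Mask:
    """ASSUMED rule: keep weight i for this task when
    |tau_t[i]| >= lam * |tau_mtl[i] - tau_t[i]|."""
    return [abs(t) >= lam * abs(m - t) for t, m in zip(tau_t, tau_mtl)]


def consensus_mask(masks: List[Mask], k: int = 2) -> Mask:
    """Keep only weights selected by at least k task masks.
    With k=2 this drops weights used by exactly one task ("selfish")
    and weights used by no task ("catastrophic")."""
    return [sum(column) >= k for column in zip(*masks)]


def apply_mask(vector: Vector, mask: Mask) -> Vector:
    """Zero out every entry of the vector that the mask does not keep."""
    return [v if keep else 0.0 for v, keep in zip(vector, mask)]


# Toy task vectors (fine-tuned weights minus pre-trained weights).
task_vectors = {
    "a": [2.0, 1.0, 0.5, 0.0],
    "b": [0.0, 1.0, 0.5, -2.0],
    "c": [0.0, -0.5, 0.5, 0.0],
}

tau_mtl = multi_task_vector(task_vectors)
masks = {name: task_mask(tau, tau_mtl) for name, tau in task_vectors.items()}

# Per-task use: mask the shared multi-task vector for task "a".
tau_a_masked = apply_mask(tau_mtl, masks["a"])

# Merging use: keep only weights that at least two tasks agree on.
consensus = consensus_mask(list(masks.values()), k=2)
merged = apply_mask(tau_mtl, consensus)

print(tau_mtl)       # [2.0, 1.5, 1.5, -2.0]
print(masks["a"])    # [True, True, False, False]
print(consensus)     # [False, True, False, False]
print(merged)        # [0.0, 1.5, 0.0, 0.0]
```

In this toy setup, storing one multi-task vector plus a binary mask per task replaces storing every fine-tuned checkpoint, which is the compression idea the abstract describes. In the example, weight 0 is selected only by task a, weight 2 by no task, and both are removed by the consensus step with `k=2`.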

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. PACT: Preserving Anchored Cores in Task-vectors for Model Merging

    cs.LG · 2026-06 · unverdicted · novelty 6.0

    PACT preserves load-bearing wall dimensions from pre-trained weights inside task vectors to reduce conflicts and improve merged model performance.

  2. Dynamic Model Merging Made Slim

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    DiDi-Merging achieves dynamic model merging performance matching or exceeding prior methods while using only 1.24x to 1.4x the parameters of a single fine-tuned model.

  3. SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention

    cs.AI · 2025-06 · unverdicted · novelty 6.0

    SEAT preserves epistemic abstention in LLMs during knowledge adaptation via sparse tuning and entity-perturbed KL regularization, yielding 18-101% better abstention on unknown queries while retaining near-perfect know...

  4. Robust Zero-Shot Generalization for Open-Vocabulary Action Recognition via Task Arithmetic

    cs.CV · 2026-06 · unverdicted · novelty 5.0

    Merging task vectors extracted from fine-tuned OVAR models yields superior zero-shot generalization in out-of-distribution settings compared to the pre-trained base model.