Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

27 Pith papers cite this work. Polarity classification is still indexing.

abstract

Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and more than ten machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions. A comprehensive list of papers about model merging is available at https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications.
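To illustrate why merging needs neither raw training data nor heavy computation, here is a minimal sketch of one of the simplest merging schemes, weighted parameter averaging. It is not taken from the survey itself. The `StateDict` type, the `merge_weighted_average` function, and the toy parameter names are hypothetical, and the sketch assumes all models share the same architecture, so their parameter names and shapes match.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def merge_weighted_average(
    models: Sequence[StateDict], weights: Sequence[float]
) -> StateDict:
    """Merge same-architecture models by a weighted average of their parameters."""
    if not models:
        raise ValueError("need at least one model")
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    names = models[0].keys()
    for model in models:
        if model.keys() != names:
            raise ValueError("models must share the same parameter names")

    merged: StateDict = {}
    for name in names:
        length = len(models[0][name])
        if any(len(model[name]) != length for model in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted mean across models; only the weights are read,
        # no training data or gradient computation is involved.
        merged[name] = [
            sum(w * model[name][i] for model, w in zip(models, weights)) / total
            for i in range(length)
        ]
    return merged


# Toy usage with two hypothetical fine-tuned checkpoints.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = merge_weighted_average([model_a, model_b], weights=[1.0, 1.0])
```

The whole operation is a single pass over the parameters, which is why it is cheap compared with retraining. More elaborate methods of the kind the survey covers generally change how the per-parameter combination is computed.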

citation-role summary

background: 1 · dataset: 1

years

2026: 19 · 2025: 8

representative citing papers

Dynamic Model Merging Made Slim

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

DiDi-Merging achieves dynamic model merging performance matching or exceeding prior methods while using only 1.24x to 1.4x the parameters of a single fine-tuned model.

Can Heterogeneous Language Models Be Fused?

cs.AI · 2026-04-02 · unverdicted · novelty 6.0

HeteroFusion fuses heterogeneous LLMs via topology-based alignment and conflict-aware denoising, outperforming merging and ensemble baselines in cross-family and multi-source settings.

Token-Level LLM Collaboration via FusionRoute

cs.AI · 2026-01-08 · unverdicted · novelty 6.0

FusionRoute augments token-level expert routing with a trainable complementary logit generator to expand the policy class and recover optimal decoding under mild conditions, outperforming prior collaboration and merging methods on reasoning and generation benchmarks.

TRINITY: An Evolved LLM Coordinator

cs.LG · 2025-12-04 · unverdicted · novelty 6.0

A compact 0.6B-parameter coordinator with a 10K-parameter head uses evolutionary strategy to dynamically delegate roles to LLMs, achieving SOTA results such as 86.2% on LiveCodeBench.

Muon is Scalable for LLM Training

cs.LG · 2025-02-24 · unverdicted · novelty 6.0

Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.

Differentially Private Model Merging

cs.LG · 2026-04-22 · unverdicted · novelty 5.0

Post-processing via random selection or linear combination of differentially private models allows meeting arbitrary target privacy parameters without additional training.

Domain-Adaptive Model Merging Across Disconnected Modes

cs.DC · 2026-03-06 · unverdicted · novelty 5.0

DMM merges highly divergent domain-specific models without data sharing by synthesizing pseudo-data from normalization statistics and distilling knowledge, achieving state-of-the-art performance on unimodal and multimodal benchmarks.
