Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

Yang, E · 2024 · arXiv 2408.07666

10 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Differentially Private Model Merging

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Post-processing via random selection or linear combination generates differentially private models for arbitrary privacy parameters from pre-trained models on the same dataset.

Understanding and Enforcing Weight Disentanglement in Task Arithmetic

cs.AI · 2026-04-18 · unverdicted · novelty 7.0

Task-Feature Specialization explains weight disentanglement in task arithmetic and leads to orthogonality, which OrthoReg enforces to enhance performance of model composition methods.

From OSS to Open Source AI: an Exploratory Study of Collaborative Development Paradigm Divergence

cs.SE · 2026-04-10 · conditional · novelty 7.0

Open source AI shows lower collaboration intensity, reduced direct contributions, and a shift toward adaptive use rather than joint improvement compared to traditional OSS.

Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem

cs.CL · 2026-04-16 · unverdicted · novelty 6.0

Treating retention as the dominant task and using constructive gradient synthesis like SAGO allows LLM unlearning to achieve higher general performance recovery without weakening the forgetting effect.

Zero-Shot Synthetic-to-Real Handwritten Text Recognition via Task Analogies

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

A method learns synthetic-to-real parameter corrections from source languages and transfers them to target languages without any real target data, improving HTR across five languages and six models.

Muon is Scalable for LLM Training

cs.LG · 2025-02-24 · unverdicted · novelty 6.0

Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.

ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

ORBIT preserves foundational language capabilities during generative retrieval fine-tuning by using origin-regulated weight averaging to constrain parameter drift beyond a distance threshold.

Black-Box Optimization of Mixed Binary-Continuous Variables: Challenges and Opportunities in Evolutionary Model Merging

cs.NE · 2026-05-12 · unverdicted · novelty 5.0

Data flow space model merging is formalized as a mixed binary-continuous black-box optimization problem, where a structured approach respecting variable dependencies achieves 6.7% higher accuracy and 51.4% smaller search space than unstructured methods on real language models.

Can Continual Pre-training Bridge the Performance Gap between General-purpose and Specialized Language Models in the Medical Domain?

cs.CL · 2026-04-21 · unverdicted · novelty 5.0

Continual pre-training on a German medical corpus lets 7B models close much of the performance gap with 24B general models on medical benchmarks, though merging introduces some language mixing and verbosity.

World Simulation with Video Foundation Models for Physical AI

cs.CV · 2025-10-28 · unverdicted · novelty 4.0

Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.

citing papers explorer

Differentially Private Model Merging
cs.LG · 2026-04-22 · unverdicted · none · ref 7

Understanding and Enforcing Weight Disentanglement in Task Arithmetic
cs.AI · 2026-04-18 · unverdicted · none · ref 47

From OSS to Open Source AI: an Exploratory Study of Collaborative Development Paradigm Divergence
cs.SE · 2026-04-10 · conditional · none · ref 135

Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem
cs.CL · 2026-04-16 · unverdicted · none · ref 2

Zero-Shot Synthetic-to-Real Handwritten Text Recognition via Task Analogies
cs.CV · 2026-04-08 · unverdicted · none · ref 76

Muon is Scalable for LLM Training
cs.LG · 2025-02-24 · unverdicted · none · ref 66

ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging
cs.CL · 2026-05-12 · unverdicted · none · ref 24

Black-Box Optimization of Mixed Binary-Continuous Variables: Challenges and Opportunities in Evolutionary Model Merging
cs.NE · 2026-05-12 · unverdicted · none · ref 3

Can Continual Pre-training Bridge the Performance Gap between General-purpose and Specialized Language Models in the Medical Domain?
cs.CL · 2026-04-21 · unverdicted · none · ref 21

World Simulation with Video Foundation Models for Physical AI
cs.CV · 2025-10-28 · unverdicted · none · ref 89
