Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
10 Pith papers cite this work.
citing papers explorer
-
Differentially Private Model Merging
Post-processing pre-trained models from the same dataset via random selection or linear combination generates differentially private models for arbitrary privacy parameters.
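Privacy is preserved here by the post-processing property of differential privacy: any function of already-private models is itself private. A minimal sketch of the two mechanics, assuming the pre-trained models already carry DP guarantees; all names below are illustrative, not the paper's API, and how the resulting privacy parameters are computed is the paper's subject.

```python
import numpy as np

def postprocess_dp_models(param_list, seed=None):
    """Generate new models from DP-trained ones by pure post-processing."""
    rng = np.random.default_rng(seed)
    params = np.stack(param_list)                 # (n_models, dim)

    # Strategy 1: randomly select one of the pre-trained models.
    selected = params[rng.integers(len(params))]

    # Strategy 2: take a random convex (linear) combination.
    weights = rng.dirichlet(np.ones(len(params)))
    combined = weights @ params
    return selected, combined
```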
-
Understanding and Enforcing Weight Disentanglement in Task Arithmetic
Task-Feature Specialization explains weight disentanglement in task arithmetic and leads to orthogonality, which OrthoReg enforces to improve the performance of model-composition methods.
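To make the orthogonality idea concrete, here is a hypothetical regularizer in the spirit of OrthoReg: it penalizes pairwise cosine overlap between task vectors (the differences theta_t - theta_0). The paper's exact loss may differ.

```python
import torch

def orthogonality_penalty(task_vectors: torch.Tensor) -> torch.Tensor:
    """Mean squared pairwise cosine similarity between task vectors.

    task_vectors: (T, D), each row a flattened theta_t - theta_0.
    Driving this toward zero pushes the task vectors toward mutual
    orthogonality, the property OrthoReg enforces.
    """
    v = torch.nn.functional.normalize(task_vectors, dim=1)
    gram = v @ v.T                                # cosine similarities
    off_diag = gram - torch.diag(torch.diagonal(gram))
    t = v.shape[0]
    return off_diag.pow(2).sum() / (t * (t - 1))
```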
-
From OSS to Open Source AI: an Exploratory Study of Collaborative Development Paradigm Divergence
Compared with traditional OSS, open-source AI shows lower collaboration intensity, fewer direct contributions, and a shift toward adaptive use rather than joint improvement.
-
Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem
Treating retention as the dominant task and using constructive gradient synthesis, as in SAGO, lets LLM unlearning recover more general performance without weakening the forgetting effect.
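The paper's exact update rule isn't reproduced here; the sketch below shows one common constructive scheme (in the spirit of gradient-surgery methods) under the stated asymmetry, with retention treated as the dominant task. Function and variable names are assumptions.

```python
import torch

def asymmetric_unlearning_step(g_retain: torch.Tensor,
                               g_forget: torch.Tensor) -> torch.Tensor:
    """Synthesize an update with retention as the dominant task.

    g_retain, g_forget: flattened 1-D gradients. If the forgetting
    gradient conflicts with retention (negative dot product), remove
    its component along the retention direction so forgetting never
    degrades retention. Illustrative, not SAGO's exact rule.
    """
    dot = torch.dot(g_forget, g_retain)
    if dot < 0:
        g_forget = g_forget - (dot / g_retain.pow(2).sum()) * g_retain
    return g_retain + g_forget
```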
-
Zero-Shot Synthetic-to-Real Handwritten Text Recognition via Task Analogies
A method learns synthetic-to-real parameter corrections from source languages and transfers them to target languages without any real target data, improving HTR across five languages and six models.
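A hedged sketch of the underlying task analogy: the synthetic-to-real correction measured on a source language (where real data exists) is applied to a target model trained only on synthetic data. The raw parameter difference below is an assumption for illustration; the paper learns the corrections rather than using a plain difference.

```python
def transfer_synth_to_real(theta_synth_src, theta_real_src, theta_synth_tgt):
    """Apply a source language's synthetic-to-real correction to a target.

    All arguments are dicts mapping parameter names to arrays/tensors.
    Analogy: theta_real_tgt ~ theta_synth_tgt + (theta_real_src - theta_synth_src).
    """
    return {name: theta_synth_tgt[name]
                  + (theta_real_src[name] - theta_synth_src[name])
            for name in theta_synth_tgt}
```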
-
Muon is Scalable for LLM Training
The Muon optimizer, augmented with weight decay and per-update scaling, achieves roughly 2x computational efficiency over AdamW for large LLMs, demonstrated via Moonlight, a MoE model with 3B activated of 16B total parameters trained on 5.7T tokens.
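For intuition, a hedged sketch of a Muon-style step on one 2-D weight matrix: the momentum-averaged gradient is approximately orthogonalized by a Newton-Schulz iteration, then applied with decoupled weight decay and an update-RMS-matching scale, the two additions the paper makes for scale. The cubic iteration and constants below are simplifications of the reference implementation, not the official code.

```python
import torch

def muon_style_step(weight, grad, buf, lr=0.02, beta=0.95, wd=0.1, steps=5):
    """One simplified Muon-style update for a 2-D weight (run under no_grad)."""
    buf.mul_(beta).add_(grad)                      # momentum accumulation
    X = buf / (buf.norm() + 1e-7)                  # normalize singular values <= 1
    for _ in range(steps):                         # classic cubic Newton-Schulz;
        X = 1.5 * X - 0.5 * (X @ X.T) @ X          # Muon uses a tuned quintic
    scale = 0.2 * max(weight.shape) ** 0.5         # match AdamW's typical update RMS
    weight.mul_(1 - lr * wd).sub_(lr * scale * X)  # decoupled weight decay + step
    return weight, buf
```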
-
ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging
ORBIT preserves foundational language capabilities during generative retrieval fine-tuning by using origin-regulated weight averaging to constrain parameter drift beyond a distance threshold.
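A hypothetical reading of the mechanism as code: parameters whose drift from the pre-fine-tuning origin exceeds a distance threshold are averaged back toward the origin weights. The values of tau and alpha and the per-tensor granularity are assumptions, not ORBIT's published settings.

```python
import torch

def origin_regulated_average(theta_ft, theta_origin, tau=1.0, alpha=0.5):
    """Weight-average a fine-tuned model back toward its origin.

    theta_ft / theta_origin: dicts of name -> tensor. Only tensors
    whose drift exceeds tau are pulled back, constraining parameter
    drift while leaving small, presumably task-relevant updates alone.
    """
    merged = {}
    for name, w in theta_ft.items():
        w0 = theta_origin[name]
        drift = (w - w0).norm()
        merged[name] = alpha * w + (1 - alpha) * w0 if drift > tau else w
    return merged
```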
-
Black-Box Optimization of Mixed Binary-Continuous Variables: Challenges and Opportunities in Evolutionary Model Merging
Data-flow-space model merging is formalized as a mixed binary-continuous black-box optimization problem; on real language models, a structured approach that respects variable dependencies achieves 6.7% higher accuracy with a 51.4% smaller search space than unstructured methods.
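To see why the problem is mixed and dependency-structured, consider a hypothetical encoding of a data-flow-space candidate: a binary gate per layer plus a continuous coefficient that only matters when its gate is on. A structured optimizer can skip the inactive continuous dimensions, which is where the search-space reduction comes from. The encoding below is an illustration, not the paper's exact one.

```python
import numpy as np

def sample_candidate(rng: np.random.Generator, n_layers: int):
    """Sample one merge candidate in a mixed binary-continuous space.

    gates:  binary, whether each layer participates in the data flow.
    scales: continuous mixing coefficients, only active where gate=1,
            so they depend on the binary variables.
    """
    gates = rng.integers(0, 2, size=n_layers)
    scales = rng.uniform(0.0, 1.0, size=n_layers) * gates
    return gates, scales
```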
-
Can Continual Pre-training Bridge the Performance Gap between General-purpose and Specialized Language Models in the Medical Domain?
Continual pre-training on a German medical corpus lets 7B models close much of the performance gap with 24B general-purpose models on medical benchmarks, though merging introduces some language mixing and verbosity.
-
World Simulation with Video Foundation Models for Physical AI
Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.