Fine-tuning neural PDE operators to regime endpoints reveals a physical direction in weight space that CCM uses to compose accurate merged models for new or extrapolated regimes from metadata or short prefixes.
Adamerging: Adaptive model merging for multi-task learning
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 9roles
background 1polarities
background 1representative citing papers
Merging breaks MoE routing via softmax sensitivity; HARC uses Hessian curvature for closed-form router calibration that improves merged model performance without retraining.
CRANE applies magnitude thresholding, a Conservative Taylor Gate, and Graduated Sigmoidal Projection to the Thinking-Instruct delta to improve code agent pass rates on Roo-Eval, SWE-bench-Verified, and Terminal-Bench while preserving efficiency.
PivotMerge merges heterogeneous multimodal pre-trained models via shared-space decomposition to filter conflicts and layer-wise weights based on alignment contributions, outperforming baselines on multimodal benchmarks.
Empirical scaling laws for LLM merging show a size-dependent floor and 1/k-like tail in cross-entropy loss that holds across architectures and merging methods.
ReLoRA reduces time-to-readiness for LoRA adapters on updated LLMs by up to 8.9x through adaptive Bayesian initialization and scheduled regularization while improving accuracy by up to 4.6%.
M2A uses null-space model merging to combine mathematical and agentic reasoning in LLMs, raising SWE-Bench Verified performance from 44.0% to 51.2% on Qwen3-8B without retraining.
Post-processing via random selection or linear combination of differentially private models allows meeting arbitrary target privacy parameters without additional training.
RETROFIT enables continual learning for malware detection and binary summarization by retrospective-free parameter merging with low-rank sparse updates and confidence-guided arbitration, improving retention and generalization without historical data.
citing papers explorer
-
Discovering Physical Directions in Weight Space: Composing Neural PDE Experts
Fine-tuning neural PDE operators to regime endpoints reveals a physical direction in weight space that CCM uses to compose accurate merged models for new or extrapolated regimes from metadata or short prefixes.
-
When Model Merging Breaks Routing: Training-Free Calibration for MoE
Merging breaks MoE routing via softmax sensitivity; HARC uses Hessian curvature for closed-form router calibration that improves merged model performance without retraining.
-
ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services
ReLoRA reduces time-to-readiness for LoRA adapters on updated LLMs by up to 8.9x through adaptive Bayesian initialization and scheduled regularization while improving accuracy by up to 4.6%.
-
Differentially Private Model Merging
Post-processing via random selection or linear combination of differentially private models allows meeting arbitrary target privacy parameters without additional training.
-
Retrofit: Continual Learning with Controlled Forgetting for Binary Security Detection and Analysis
RETROFIT enables continual learning for malware detection and binary summarization by retrospective-free parameter merging with low-rank sparse updates and confidence-guided arbitration, improving retention and generalization without historical data.