Fine-tuning neural PDE operators to regime endpoints reveals a physical direction in weight space that CCM uses to compose accurate merged models for new or extrapolated regimes from metadata or short prefixes.
Adamerging: Adaptive model merging for multi-task learning
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 9roles
background 1polarities
background 1representative citing papers
Merging breaks MoE routing via softmax sensitivity; HARC uses Hessian curvature for closed-form router calibration that improves merged model performance without retraining.
CRANE applies magnitude thresholding, a Conservative Taylor Gate, and Graduated Sigmoidal Projection to the Thinking-Instruct delta to improve code agent pass rates on Roo-Eval, SWE-bench-Verified, and Terminal-Bench while preserving efficiency.
PivotMerge merges heterogeneous multimodal pre-trained models via shared-space decomposition to filter conflicts and layer-wise weights based on alignment contributions, outperforming baselines on multimodal benchmarks.
Empirical scaling laws for LLM merging show a size-dependent floor and 1/k-like tail in cross-entropy loss that holds across architectures and merging methods.
ReLoRA reduces time-to-readiness for LoRA adapters on updated LLMs by up to 8.9x through adaptive Bayesian initialization and scheduled regularization while improving accuracy by up to 4.6%.
M2A uses null-space model merging to combine mathematical and agentic reasoning in LLMs, raising SWE-Bench Verified performance from 44.0% to 51.2% on Qwen3-8B without retraining.
Post-processing via random selection or linear combination of differentially private models allows meeting arbitrary target privacy parameters without additional training.
RETROFIT enables continual learning for malware detection and binary summarization by retrospective-free parameter merging with low-rank sparse updates and confidence-guided arbitration, improving retention and generalization without historical data.
citing papers explorer
-
M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models
M2A uses null-space model merging to combine mathematical and agentic reasoning in LLMs, raising SWE-Bench Verified performance from 44.0% to 51.2% on Qwen3-8B without retraining.