PARAMΔ upcycles dense models to MoE for per-language experts and grafts post-training deltas to enable data-efficient language expansion while preserving original capabilities.
Unlocking efficient long-to-short llm reasoning with model merging.arXiv preprint arXiv:2503.20641
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
polarities
background 2representative citing papers
TokenHD uses a scalable data synthesis engine and importance-weighted training to create token-level hallucination detectors that work on free-form text and scale from 0.6B to 8B parameters, outperforming larger reasoning models.
ASTOR improves a single code LLM across four tasks by 9.0-9.5% over the best specialist and 7.5-12.8% over prior multi-task RL baselines via utility-driven data scheduling and adaptive KL regularization.
CoT compression frequently introduces trustworthiness regressions with method-specific degradation profiles; a proposed normalized efficiency score and alignment-aware DPO variant reduce length by 19.3% with smaller trustworthiness loss.
The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.
citing papers explorer
-
A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$\Delta$ Integration into Upcycled MoE
PARAMΔ upcycles dense models to MoE for per-language experts and grafts post-training deltas to enable data-efficient language expansion while preserving original capabilities.
-
Scalable Token-Level Hallucination Detection in Large Language Models
TokenHD uses a scalable data synthesis engine and importance-weighted training to create token-level hallucination detectors that work on free-form text and scale from 0.6B to 8B parameters, outperforming larger reasoning models.
-
Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs
ASTOR improves a single code LLM across four tasks by 9.0-9.5% over the best specialist and 7.5-12.8% over prior multi-task RL baselines via utility-driven data scheduling and adaptive KL regularization.
-
Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression
CoT compression frequently introduces trustworthiness regressions with method-specific degradation profiles; a proposed normalized efficiency score and alignment-aware DPO variant reduce length by 19.3% with smaller trustworthiness loss.
-
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.