Expert upcycling duplicates experts in an existing MoE checkpoint and continues pre-training to match fixed-size baseline performance with 32% less compute.
Reuse, don't retrain: A recipe for continued pretraining of language models
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5representative citing papers
Opt-Laws predicts LLM final training loss from LR schedules via SDE-derived convergence and escape features, with 94% Top-2 hit rate on held-out schedules and F1=0.92 for divergence detection.
WildFeedback extracts preference pairs from in-situ user feedback in LLM conversations to fine-tune models for better alignment with real user preferences.
Domain-adapted LLMs and SLMs do not consistently outperform general models on STRIDE threat classification for 5G, with decoding strategies and model scale affecting validity but gains remaining insufficient for reliable use.
Phoenix-VL 1.5 Medium is a 123B-parameter natively multimodal model that reaches state-of-the-art results on Singapore multimodal, legal, and policy benchmarks after localized training on 1T+ tokens while staying competitive on global benchmarks.
citing papers explorer
-
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
Expert upcycling duplicates experts in an existing MoE checkpoint and continues pre-training to match fixed-size baseline performance with 32% less compute.
-
Optimization Hyper-parameter Laws for Large Language Models
Opt-Laws predicts LLM final training loss from LR schedules via SDE-derived convergence and escape features, with 94% Top-2 hit rate on held-out schedules and F1=0.92 for divergence detection.
-
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback
WildFeedback extracts preference pairs from in-situ user feedback in LLM conversations to fine-tune models for better alignment with real user preferences.
-
Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights
Domain-adapted LLMs and SLMs do not consistently outperform general models on STRIDE threat classification for 5G, with decoding strategies and model scale affecting validity but gains remaining insufficient for reliable use.
-
Phoenix-VL 1.5 Medium Technical Report
Phoenix-VL 1.5 Medium is a 123B-parameter natively multimodal model that reaches state-of-the-art results on Singapore multimodal, legal, and policy benchmarks after localized training on 1T+ tokens while staying competitive on global benchmarks.