LoopMoE is a looped MoE language model that outperforms matched vanilla MoE on 8 of 9 downstream benchmarks at 3B scale and continues to outperform at 9B scale under strictly controlled budgets.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
Optimal hyperparameters for LLM continued pre-training follow predictable scaling laws derived from proxy models, enabling a two-stage framework that predicts settings from compute budget and checkpoint state to reduce search overhead by 90%.
Proposes a three-term scaling law for model size, training steps and batch size that recovers optimal batch size scaling and can be fitted using fewer runs by incorporating suboptimal batch sizes.
citing papers explorer
- LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling
  LoopMoE is a looped MoE language model that outperforms matched vanilla MoE on 8 of 9 downstream benchmarks at 3B scale and continues to outperform at 9B scale under strictly controlled budgets.
- Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training
  Optimal hyperparameters for LLM continued pre-training follow predictable scaling laws derived from proxy models, enabling a two-stage framework that predicts settings from compute budget and checkpoint state to reduce search overhead by 90%.
- How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size
  Proposes a three-term scaling law for model size, training steps and batch size that recovers optimal batch size scaling and can be fitted using fewer runs by incorporating suboptimal batch sizes.