Standard top-k routers in MoE language models often select suboptimal routes for difficult tokens, and updating only the final router layer raises pass@K on AIME and HMMT benchmarks across multiple models.
Advances in Neural Information Processing Systems , volume=
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3representative citing papers
DMoA is a differentiable multi-agent framework for LLMs that uses recurrent context-aware routing and predictive entropy for test-time adaptation, claiming SOTA results on 9 benchmarks with efficiency and robustness.
LLM serving requires mathematical optimization and algorithms with provable guarantees rather than generic heuristics that fail unpredictably on LLM workloads.
citing papers explorer
-
When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models
Standard top-k routers in MoE language models often select suboptimal routes for difficult tokens, and updating only the final router layer raises pass@K on AIME and HMMT benchmarks across multiple models.
-
Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models
DMoA is a differentiable multi-agent framework for LLMs that uses recurrent context-aware routing and predictive entropy for test-time adaptation, claiming SOTA results on 9 benchmarks with efficiency and robustness.
-
Position: LLM Serving Needs Mathematical Optimization and Algorithmic Foundations, Not Just Heuristics
LLM serving requires mathematical optimization and algorithms with provable guarantees rather than generic heuristics that fail unpredictably on LLM workloads.