φ-balancing is a convex optimization method for population-level expert balance in MoE training that derives an online EMA adjustment and outperforms heuristic baselines.
Latent prototype routing: Achieving near-perfect load balancing in mixture-of-experts
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Routers in SMoE models form geometric alignments with their experts through shared gradient directions, enabling effective specialization that auxiliary load-balancing losses tend to disrupt.
DODOCO measurements show MoE routing imbalance is intrinsic to architecture and real text, not correctable by EP scaling or represented by mock tokens, forming two persistent Gini bands.
citing papers explorer
No citing papers match the current filters.