pith:OZUPR4BV
$\phi$-Balancing for Mixture-of-Experts Training
Mixture-of-experts models achieve population-level expert balance by minimizing a strictly convex potential of the expected routing distribution.
arxiv:2605.15403 v1 · 2026-05-14 · cs.LG · math.OC · stat.ML
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{OZUPR4BVTWDMYSSOE6XPJ2ALWW}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Across large-scale pretraining and downstream fine-tuning, φ-balancing consistently outperforms prior Switch-style and loss-free baselines, demonstrating more stable and effective expert utilization.
These claims assume that minimizing the chosen strictly convex potential of the expected routing distribution produces the desired population-level balance, and that the EMA-based online approximation via mirror descent faithfully tracks the population objective without introducing new bias (abstract, paragraph on derivation).
φ-balancing is a convex optimization method for population-level expert balance in MoE training that derives an online EMA adjustment and outperforms heuristic baselines.
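The idea summarized above can be illustrated with a small NumPy sketch. It is not the paper's algorithm: this record does not give the potential, the update rule, or the hyperparameters, so everything below is an assumption chosen for illustration. The sketch assumes the potential is the negative entropy φ(p) = Σ p_i log p_i (strictly convex on the simplex interior), that balance is steered through a per-expert additive router bias, and that the expected routing distribution is tracked by an EMA of batch means. The names `softmax`, `neg_entropy_grad`, `balance_step`, `beta`, and `lr` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def neg_entropy_grad(p, eps=1e-12):
    # Gradient of the ASSUMED potential phi(p) = sum_i p_i * log(p_i).
    return np.log(p + eps) + 1.0

def balance_step(bias, ema, logits, beta=0.9, lr=0.1):
    """One illustrative online step (assumed update, not taken from the paper).

    bias:   (E,) per-expert additive router bias
    ema:    (E,) running estimate of the expected routing distribution
    logits: (T, E) router logits for a batch of T tokens
    """
    probs = softmax(logits + bias)            # per-token routing distribution
    batch_mean = probs.mean(axis=0)           # batch estimate of the expectation
    ema = beta * ema + (1.0 - beta) * batch_mean
    grad = neg_entropy_grad(ema)
    # Softmax ignores constant shifts, so only the centered gradient matters.
    bias = bias - lr * (grad - grad.mean())
    return bias, ema

# Toy demo: a hypothetical router whose logits favor expert 0.
rng = np.random.default_rng(0)
num_experts = 4
skew = np.array([2.0, 0.0, 0.0, 0.0])
bias = np.zeros(num_experts)
ema = np.full(num_experts, 1.0 / num_experts)

initial_load = softmax(rng.normal(size=(256, num_experts)) + skew).mean(axis=0)
for _ in range(1000):
    logits = rng.normal(size=(256, num_experts)) + skew
    bias, ema = balance_step(bias, ema, logits)

print("initial load:", initial_load.round(3))
print("final EMA load:", ema.round(3))
print("learned bias:", bias.round(3))
```

Under these assumptions the update has a fixed point only where all entries of the EMA are equal, which is the uniform distribution that minimizes negative entropy on the simplex. A different strictly convex potential would change only `neg_entropy_grad`.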
Receipt and verification
| First computed | 2026-05-20T00:00:56.875852Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
7668f8f0359d86cc4a4e27aef4e80bb5ba698b537c1265cabd5e7de8bbcb095f
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/OZUPR4BVTWDMYSSOE6XPJ2ALWW \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7668f8f0359d86cc4a4e27aef4e80bb5ba698b537c1265cabd5e7de8bbcb095f
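The hashing step of the command above can also be written as a standalone Python sketch, which may be easier to reuse in a script. It mirrors the same serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding). The helper names `canonical_sha256` and `verify` and the two demo records are hypothetical. The demo records are small stand-ins, not the real canonical record, so no real hash is asserted here.

```python
import hashlib
import json

def canonical_sha256(record):
    # Serialize exactly as in the shell one-liner: sorted keys, no whitespace,
    # non-ASCII characters kept as-is, then UTF-8 encode and hash.
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def verify(record, expected_hex):
    # Compare against a published hex digest, ignoring letter case.
    return canonical_sha256(record) == expected_hex.strip().lower()

# Stand-in records with the same content but different key order.
demo_a = {"source": {"kind": "arxiv", "id": "2605.15403", "version": 1},
          "schema_version": "1.0"}
demo_b = {"schema_version": "1.0",
          "source": {"version": 1, "id": "2605.15403", "kind": "arxiv"}}

digest = canonical_sha256(demo_a)
print(digest)
print(canonical_sha256(demo_b) == digest)  # key order does not affect the hash
```

To check the real record, one would pass the parsed `canonical_record` object from the API response to `verify` together with the canonical hash listed on this page.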
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4fc87225d0bf537713fef270d3fb111cfc24792dc12e6e3522fcc78d008b868f",
    "cross_cats_sorted": [
      "math.OC",
      "stat.ML"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T20:39:28Z",
    "title_canon_sha256": "3562498ca70109c3ba26eca3a5d2c962940053ae63d5d47f750bc534399aff91"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15403",
    "kind": "arxiv",
    "version": 1
  }
}