pith:AXFU3DUT
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Transformer language models can learn to dynamically allocate compute to select tokens at each layer.
arxiv:2404.02258 v1 · 2024-04-02 · cs.LG · cs.CL
Record completeness
Claims
Not only do models trained in this way learn to dynamically allocate compute, they do so efficiently. These models match baseline performance for equivalent FLOPs and wall-clock training time, but require a fraction of the FLOPs per forward pass, and can be upwards of 50% faster to step during post-training sampling.
This result rests on the assumption that a learned top-k router can reliably identify which tokens merit full processing at each layer without degrading overall model capacity or introducing training instabilities, and that this holds across model scales and tasks.
Mixture-of-Depths enables transformers to dynamically allocate compute by routing only the top-k tokens through each layer's full computations, matching baseline performance with a fraction of the FLOPs per forward pass and up to 50% faster sampling.
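The top-k routing described above can be illustrated with a small sketch. This is not the authors' implementation: the names `mod_layer`, `router_w`, and `toy_block` are hypothetical, the router is assumed to be a single linear projection that yields one scalar score per token, and the block is a stand-in for a layer's attention and MLP computation. Routed tokens receive a residual update scaled by their router score; all other tokens skip the block unchanged.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mod_layer(tokens, router_w, block, k):
    """Route only the k highest-scoring tokens through `block` (illustrative).

    tokens   : list of token vectors (lists of floats)
    router_w : hypothetical router weight vector, one scalar score per token
    block    : function mapping a list of selected tokens to a list of updates
    k        : capacity, i.e. how many tokens this layer processes
    """
    scores = [dot(t, router_w) for t in tokens]
    # Indices of the k largest router scores, restored to sequence order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    routed = sorted(ranked[:k])

    # Only the routed tokens pay for the block's computation.
    updates = block([tokens[i] for i in routed])

    out = [list(t) for t in tokens]  # unrouted tokens pass through unchanged
    for i, update in zip(routed, updates):
        # Residual update, scaled by the router score so the router gets gradient.
        out[i] = [x + scores[i] * u for x, u in zip(tokens[i], update)]
    return out, routed


def toy_block(selected):
    """Stand-in for attention + MLP: doubles each selected token."""
    return [[2.0 * v for v in tok] for tok in selected]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
router_w = [0.5, 0.25]
out, routed = mod_layer(tokens, router_w, toy_block, k=2)
print(routed)  # indices of the tokens that went through the block
print(out)
```

Because `k` is fixed ahead of time, the amount of computation per layer is known in advance even though the identity of the routed tokens changes with the input.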
References
Formal links
Cited by
Receipt and verification
| First computed | 2026-05-17T23:38:15.410613Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
05cb4d8e938fe4fa196f57bdfa4ea8c7598877576765f04649a8f2cca081e4fa
Aliases
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/AXFU3DUTR7SPUGLPK667UTVIY5 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 05cb4d8e938fe4fa196f57bdfa4ea8c7598877576765f04649a8f2cca081e4fa
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "20196f2ef34c9194cb315bc7bc2d6b7e36cc76b23f47901f1dbbbc05b493a687",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-04-02T19:28:11Z",
    "title_canon_sha256": "ec4a74a99a0bfb9a2c6146a1ee7e2a6ddcd656c28232813ecf9f44bf5d9f3bf9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2404.02258",
    "kind": "arxiv",
    "version": 1
  }
}
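The shell pipeline in the verification step can also be mirrored offline. The sketch below assumes the record shown above is the exact `canonical_record` that the API returns, and it applies the same serialization as the one-liner (sorted keys, compact separators, no ASCII escaping) before hashing with SHA-256. The function name `canonical_hash` and the constant `PUBLISHED` are illustrative, not part of any Pith API.

```python
import hashlib
import json


def canonical_hash(record):
    """SHA-256 over the compact, key-sorted JSON form of `record`."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()  # UTF-8, matching the shell one-liner
    return hashlib.sha256(payload).hexdigest()


# Record copied from the page; assumed identical to the API response.
record = {
    "metadata": {
        "abstract_canon_sha256": "20196f2ef34c9194cb315bc7bc2d6b7e36cc76b23f47901f1dbbbc05b493a687",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2024-04-02T19:28:11Z",
        "title_canon_sha256": "ec4a74a99a0bfb9a2c6146a1ee7e2a6ddcd656c28232813ecf9f44bf5d9f3bf9",
    },
    "schema_version": "1.0",
    "source": {"id": "2404.02258", "kind": "arxiv", "version": 1},
}

# Hash published on the page, used only for comparison.
PUBLISHED = "05cb4d8e938fe4fa196f57bdfa4ea8c7598877576765f04649a8f2cca081e4fa"

digest = canonical_hash(record)
print(digest)
print("matches published hash:", digest == PUBLISHED)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the displayed JSON, so only a change to the record's values would alter it.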