citation dossier
Le, Oriol Vinyals, and Wojciech Zaremba
Pith papers citing it: 1
reference links: 1
top field: cs.LG · 1 paper
top verdict bucket: ACCEPT · 1 paper
why this work matters in Pith
Pith has found this work cited in 1 reviewed paper. Its strongest current cluster is cs.LG (1 paper), and the largest review-status bucket among citing papers is ACCEPT (1 paper). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
fields
cs.LG · 1
years
2017 · 1
verdicts
ACCEPT · 1
representative citing papers
citing papers explorer
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  A noisy top-k gated mixture-of-experts layer, applied between stacked LSTM layers, scales models to 137B parameters at sub-linear computational cost and beats state-of-the-art results on language modeling and machine translation.
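The blurb above summarizes the paper's gating mechanism: each input's gate logits receive input-dependent Gaussian noise, only the k largest are kept, and a softmax over those produces sparse mixture weights, so only k experts run per example. Below is a minimal PyTorch sketch of that noisy top-k gating step only; it omits expert dispatch and the load-balancing losses, and the class name NoisyTopKGate and the d_model / num_experts / k parameters are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKGate(nn.Module):
    """Sketch of noisy top-k gating: softmax over the k largest noisy logits."""
    def __init__(self, d_model: int, num_experts: int, k: int = 4):
        super().__init__()
        self.k = k
        self.w_gate = nn.Linear(d_model, num_experts, bias=False)   # clean gate logits
        self.w_noise = nn.Linear(d_model, num_experts, bias=False)  # per-expert noise scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        clean = self.w_gate(x)
        # Input-dependent Gaussian noise perturbs the logits before the top-k cut.
        noise_std = F.softplus(self.w_noise(x))
        logits = clean + torch.randn_like(clean) * noise_std
        # Keep only the k largest logits; the rest are set to -inf so the softmax
        # assigns them zero weight, which keeps per-example compute sub-linear
        # in the number of experts.
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        masked = torch.full_like(logits, float("-inf")).scatter(-1, topk_idx, topk_vals)
        return F.softmax(masked, dim=-1)  # sparse mixture weights, shape [..., num_experts]

# Usage sketch: only k of num_experts weights per row are non-zero.
gate = NoisyTopKGate(d_model=512, num_experts=16, k=4)
x = torch.randn(8, 512)   # a batch of hidden states
weights = gate(x)         # shape (8, 16), 4 non-zero entries per row
```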