Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov · 1929

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

cs.LG · 2021-01-11 · accept · novelty 7.0

Switch Transformers use top-1 expert routing in a Mixture of Experts setup to scale to trillion-parameter language models with constant compute and up to 4x speedup over T5-XXL.

citing papers explorer

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

cs.LG · 2021-01-11 · accept · none · ref 32
