Rae, Anna Potapenko, Siddhant M
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: unverdicted.

2 representative citing papers:
- Rethinking Attention with Performers
  Performers approximate full-rank softmax attention in Transformers via FAVOR+ random features for linear complexity, with theoretical guarantees of unbiased estimation and competitive results on pixel, text, and protein tasks. (A sketch of the random-feature trick follows this list.)
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
  ALiBi enables transformers trained on length-1024 sequences to extrapolate to length-2048 with the same perplexity as a sinusoidal model trained on 2048, while training 11% faster and using 11% less memory. (A sketch of the linear-bias computation also follows below.)
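The Performer TL;DR above names the mechanism but not the trick. Below is a minimal NumPy sketch of the positive-random-feature estimator at the core of FAVOR+, where phi(x) = exp(omega.x - |x|^2/2)/sqrt(m) makes phi(q).phi(k) an unbiased estimate of the softmax kernel exp(q.k), shown in a single-head, non-causal setting. The names `phi` and `favor_attention`, the feature count, and the omission of the paper's orthogonal-feature and numerical-stability refinements are my simplifications, not the authors' implementation.

```python
import numpy as np

def phi(x, omega):
    """Positive random features: phi(x) = exp(omega @ x - |x|^2 / 2) / sqrt(m).

    E[phi(q) . phi(k)] = exp(q . k), so the dot product of features is an
    unbiased estimate of the (unnormalized) softmax kernel.
    """
    m = omega.shape[0]
    proj = x @ omega.T                                    # (n, m)
    sq_norm = np.sum(x ** 2, axis=-1, keepdims=True) / 2  # (n, 1)
    return np.exp(proj - sq_norm) / np.sqrt(m)

def favor_attention(Q, K, V, num_features=256, seed=0):
    """Non-causal softmax attention estimate in O(n*m*d) instead of O(n^2*d)."""
    n, d = Q.shape
    omega = np.random.default_rng(seed).standard_normal((num_features, d))
    # Fold the 1/sqrt(d) softmax temperature into the inputs:
    # exp(q . k / sqrt(d)) = exp((q / d**0.25) . (k / d**0.25)).
    Qp = phi(Q / d ** 0.25, omega)                        # (n, m)
    Kp = phi(K / d ** 0.25, omega)                        # (n, m)
    # Associativity gives linear complexity: (Qp Kp^T) V = Qp (Kp^T V),
    # so the n x n attention matrix is never materialized.
    context = Kp.T @ V                                    # (m, d)
    normalizer = Qp @ Kp.sum(axis=0)                      # (n,) softmax row sums
    return (Qp @ context) / normalizer[:, None]
```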
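Likewise, a minimal sketch of ALiBi's recipe: no position embeddings at all, just a per-head linear penalty on attention logits proportional to the query-key distance. The helper names are hypothetical, and the slope schedule shown is the paper's for power-of-two head counts (the paper adjusts it for other head counts).

```python
import numpy as np

def alibi_slopes(num_heads):
    # Geometric slope schedule from the paper (power-of-two head counts):
    # for 8 heads the slopes are 1/2, 1/4, ..., 1/256.
    return np.array([2.0 ** (-8.0 * (h + 1) / num_heads)
                     for h in range(num_heads)])

def alibi_scores(Q, K):
    """Causal attention logits with ALiBi biases. Q, K: (heads, seq, dim)."""
    num_heads, n, d = Q.shape
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)         # (h, n, n)
    idx = np.arange(n)
    dist = idx[:, None] - idx[None, :]                     # i - j: distance into the past
    bias = -alibi_slopes(num_heads)[:, None, None] * dist  # farther keys, bigger penalty
    mask = np.where(dist >= 0, 0.0, -np.inf)               # causal mask: no future keys
    return scores + bias + mask                            # softmax over last axis follows
```

Softmax over the last axis of the returned logits gives the attention weights; because the penalty grows with distance, nearby keys dominate, which is the property that lets a model trained on short sequences run at longer lengths.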