Gaussian Error Linear Units (GELUs)
The GELU activation, xΦ(x), outperforms ReLU and ELU on computer vision, NLP, and speech tasks by weighting inputs by their value rather than gating them by sign.
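For concreteness, here is a minimal NumPy sketch of the idea (function names are illustrative, not from the paper's code). The exact GELU multiplies the input by the standard Gaussian CDF Φ(x), written via the error function; the paper also gives a tanh-based approximation. Unlike ReLU, small negative inputs are scaled down rather than zeroed out.

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard Gaussian CDF,
    # expressed through the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # The paper's tanh approximation:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def relu(x):
    # ReLU gates by sign: the input passes only if it is positive.
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(gelu(x))   # negative inputs are attenuated smoothly, not hard-zeroed
print(relu(x))   # negative inputs are zeroed outright
```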
1 Pith paper cites this work; polarity classification is still indexing.
Fields (1): cs.LG
Years (1): 2016
Verdicts (1): CONDITIONAL
Representative citing paper (1): Adjusting for Dropout Variance in Batch Normalization and Weight Initialization