Demystifying GELU
Python Code for GELU activation function

In this tutorial we aim to comprehensively explain how the Gaussian Error Linear Unit (GELU) activation works.

Can we combine regularization and activation functions? In 2016, Dan Hendrycks and Kevin Gimpel published a paper that has since been updated 4 times. In it, the authors introduced a new activation function, the Gaussian Error Linear Unit (GELU).

The motivation behind GELU activation is to bridge stochastic regularizers, such as dropout, with non-linearities, i.e., activation functions.
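To make this concrete, here is a minimal sketch of GELU in plain Python. It assumes the standard definition GELU(x) = x · Φ(x), where Φ is the standard normal CDF, and also shows the tanh approximation given in the paper. The function names `gelu` and `gelu_tanh` are our own choices for illustration, not part of any library.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Compare the exact form and the approximation on a few inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  exact={gelu(x):+.4f}  tanh={gelu_tanh(x):+.4f}")
```

One way to read the formula: Φ(x) acts like the probability of keeping the input, so large positive inputs pass through almost unchanged while large negative inputs are scaled toward 0. That is where the link to dropout-style regularizers comes from.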

Dropout regularization stochastically multiplies a neuron’s inputs with 0, randomly…

