Last Updated on August 30, 2022 by Editorial Team

Author(s): Poulinakis Kon


Is GELU the ReLU Successor?

Photo by Willian B. on Unsplash

Can we combine regularization and activation functions? In 2016, Dan Hendrycks and Kevin Gimpel published a paper exploring exactly that; it has since been revised four times. In it, the authors introduced a new activation function: the Gaussian Error Linear Unit (GELU).

Demystifying GELU

The motivation behind GELU is to bridge stochastic regularizers, such as dropout, with non-linearities, i.e., activation functions.

Dropout regularization stochastically multiplies a neuron’s inputs by 0, randomly rendering them inactive. ReLU activation, on the other hand, deterministically multiplies inputs by 0 or 1, depending on the input’s value.
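To make the contrast concrete, here is a minimal NumPy sketch (an illustration, not code from the paper): dropout applies a random, input-independent zero-one mask, while ReLU applies a deterministic, input-dependent one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([-1.5, -0.3, 0.0, 0.7, 2.1])

# Dropout: each input is zeroed with probability p, regardless of its value.
p = 0.5
dropout_mask = (rng.random(x.shape) >= p).astype(x.dtype)  # random 0/1 mask
dropout_out = x * dropout_mask / (1.0 - p)                 # inverted-dropout scaling

# ReLU: the mask is 1 exactly when the input is positive.
relu_mask = (x > 0).astype(x.dtype)                        # deterministic 0/1 mask
relu_out = x * relu_mask

print(dropout_out, relu_out)
```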

GELU merges both functionalities by multiplying inputs by a value between 0 and 1. The value of this zero-one mask, while stochastically determined, also depends on the input’s value.

Mathematically, GELU is formulated as GELU(x) = x·Φ(x) = x·P(X ≤ x), where X ~ N(0, 1).

Φ(x) is the cumulative distribution function (CDF) of the standard normal distribution. This choice stems from the fact that neuron inputs tend to follow a normal distribution, especially when Batch Normalization is used. So, essentially, GELU has a higher probability of dropping a neuron (multiplying by 0) as x decreases, since P(X ≤ x) becomes smaller. Take a moment to let this sink in: the transformation applied by GELU is stochastically determined, yet it depends on the input’s value through Φ(x).
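As a minimal sketch (assuming only the Python standard library), the exact GELU can be written via the error function, since Φ(x) = 0.5·(1 + erf(x/√2)):

```python
import math

def gelu(x: float) -> float:
    """Exact GELU(x) = x * Φ(x), with Φ the standard normal CDF."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # P(X <= x) for X ~ N(0, 1)
    return x * phi
```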

Figure 1: The Gaussian Error Linear Unit (μ=0, σ=1), the Rectified Linear Unit, and the Exponential Linear Unit (ELU) (α=1). Source [1]

Observe how GELU(x) stays near zero for very negative values of x, since the CDF P(X ≤ x) is almost 0. Around x = -2, however, P(X ≤ x) starts increasing, and GELU(x) begins to deviate from zero. For positive values, P(X ≤ x) moves closer to 1, so GELU(x) approximates ReLU(x). In the figure below, the red line represents the CDF of the standard normal distribution N(0,1), i.e., P(X ≤ x).

Figure 2: Cumulative distribution functions for different Gaussian distributions. The red line represents the CDF of the standard normal N(0,1). Source: Wikipedia [2].
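To make the curves above concrete, here is a quick numeric check (a small sketch using the exact formulation): deep in the negative tail GELU is essentially zero, around x = -2 it starts deviating from zero, and for positive x it closely tracks x.

```python
from math import erf, sqrt

def gelu(x):
    return x * 0.5 * (1.0 + erf(x / sqrt(2.0)))

for x in (-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0):
    print(f"GELU({x:+.1f}) = {gelu(x):+.4f}")
# e.g. GELU(-4.0) ≈ -0.0001, GELU(-2.0) ≈ -0.0455, GELU(+2.0) ≈ +1.9545
```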

Approximations

GELU can also be approximated, if greater feedforward speed is worth the cost of exactness, through the formulas

GELU(x) ≈ 0.5·x·(1 + tanh[√(2/π)·(x + 0.044715·x³)])   or   GELU(x) ≈ x·σ(1.702·x).
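The sketch below (standard library only, error bounds are approximate) compares the exact GELU with the two approximations; the tanh-based form is noticeably tighter than the sigmoid-based one.

```python
import math

def gelu_exact(x):
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):   # tanh-based approximation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def gelu_sigmoid(x):  # sigmoid-based approximation: x * sigmoid(1.702 * x)
    return x / (1.0 + math.exp(-1.702 * x))

xs = [i / 10.0 for i in range(-60, 61)]
print(max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs))     # on the order of 1e-3 or smaller
print(max(abs(gelu_exact(x) - gelu_sigmoid(x)) for x in xs))  # roughly 1e-2
```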

Variations

The GELU can also be modified by using different CDFs. For example, if the logistic distribution CDF σ(x) is used, we get the Sigmoid Linear Unit (SiLU), x·σ(x). Moreover, we could pick the CDF of a general N(μ, σ²) distribution, with μ and σ being learnable parameters.
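As a small sketch of the SiLU variation (standard library only): swapping the Gaussian CDF Φ(x) for the logistic CDF σ(x) yields x·σ(x), which is the sigmoid-based approximation above without the 1.702 scaling.

```python
import math

def silu(x: float) -> float:
    """Sigmoid Linear Unit: x * σ(x), i.e. GELU with the logistic CDF."""
    return x / (1.0 + math.exp(-x))
```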

Advantages

The authors of [1] experimented with GELU against the ReLU and ELU activation functions on benchmark datasets covering three tasks: computer vision (CIFAR-10/100 classification), natural language processing (Twitter part-of-speech tagging), and audio phoneme recognition (TIMIT frame classification).

Throughout their experiments, they observed a consistent improvement in accuracy when using GELU compared to ReLU and ELU. In detail:

The table above presents the test error rates on four datasets. GELU consistently achieves the lowest test error rate, making it a promising alternative to the ReLU and ELU activations.

An Interesting Fact

The well-known paper “An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale” [3], which popularized Vision Transformers, uses the GELU activation inside the MLP of the transformer encoder block (section 3.1). This suggests that GELU is considered a strong option by leading researchers.
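For context, here is a minimal PyTorch sketch of a ViT-style encoder MLP block with GELU between its two linear layers (the widths and dropout rate are illustrative assumptions, not the exact ViT configuration):

```python
import torch.nn as nn

def vit_mlp_block(dim: int = 768, hidden_dim: int = 3072, dropout: float = 0.1) -> nn.Sequential:
    """Two linear layers with a GELU non-linearity in between, as in a ViT encoder MLP."""
    return nn.Sequential(
        nn.Linear(dim, hidden_dim),
        nn.GELU(),                  # GELU activation inside the MLP
        nn.Dropout(dropout),
        nn.Linear(hidden_dim, dim),
        nn.Dropout(dropout),
    )
```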

REFERENCES

[1] D. Hendrycks and K. Gimpel, Gaussian Error Linear Units (GELUs), arXiv:1606.08415.

[2] Normal distribution, Wikipedia: https://en.wikipedia.org/wiki/Normal_distribution

[3] A. Dosovitskiy et al., An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale, arXiv:2010.11929.

Thanks for reading, feel free to reach out!

My Links: Medium | LinkedIn | GitHub

