Introduction To Pooling Layers In CNN

Last Updated on August 16, 2022 by Editorial Team

Author(s): Rafay Qayyum

Originally published on Towards AI, the World’s Leading AI and Technology News and Media Company.

A convolutional neural network (CNN) is a special type of artificial neural network. It is commonly used for image recognition and processing because it can recognize patterns in images, which removes the need to extract features from visual data manually. A CNN learns by sliding filters of a fixed size over an image. This lets it learn features directly from the data while also gaining a degree of translation invariance.

The typical structure of a CNN consists of three basic types of layers:

  1. Convolutional layers: These layers generate feature maps by sliding filters over the input image and recognizing patterns in it.
  2. Pooling layers: These layers downsample the feature maps to introduce translation invariance, which helps reduce overfitting of the CNN model.
  3. Fully connected (dense) layer: The output layer contains as many units as there are classes and uses an output activation function such as “softmax” or “sigmoid”.
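Here is a minimal Keras sketch of this three-part structure. The input shape (28×28 grayscale images), the 16 filters, the ReLU activation and the 10 classes are assumptions chosen only for illustration. They are not values from any particular model.

```python
import tensorflow as tf

NUM_CLASSES = 10  # assumption: a 10-class problem with 28x28 grayscale inputs

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    # 1. Convolutional layer: 16 filters of size 3x3 produce 16 feature maps
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
    # 2. Pooling layer: 2x2 max pooling with stride 2 halves height and width
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
    # 3. Fully connected output layer: one unit per class, softmax activation
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.summary()
```

With these assumed sizes, the 3×3 convolution turns each 28×28 input into a 26×26×16 feature map. The 2×2 max pooling shrinks it to 13×13×16, and the dense layer maps the 2,704 flattened values to 10 class probabilities.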

What are Pooling layers?

Pooling layers are one of the building blocks of convolutional neural networks. Convolutional layers extract features from images, and pooling layers consolidate the features the CNN has learned. Their purpose is to gradually shrink the spatial dimensions of the representation, which reduces the number of parameters and the amount of computation in the network.
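The following back-of-the-envelope calculation in plain Python makes the savings concrete. The sizes are hypothetical: a 26×26 feature map with 16 channels feeding a 10-unit dense layer. The helper `pooled_size` is our own and assumes "valid" pooling with no padding. The pooling layer itself has no trainable weights, so the savings come entirely from the smaller input it passes to the next layer.

```python
def pooled_size(n, pool_size=2, stride=2):
    """Length of one spatial axis after 'valid' pooling (no padding)."""
    return (n - pool_size) // stride + 1

def dense_params(height, width, channels, units):
    """Weights plus biases of a Dense layer fed a flattened height x width x channels map."""
    return height * width * channels * units + units

# Hypothetical sizes: a 26x26 map with 16 channels feeding 10 output units
h, w, c, units = 26, 26, 16, 10

without_pooling = dense_params(h, w, c, units)                         # 108170
with_pooling = dense_params(pooled_size(h), pooled_size(w), c, units)  # 27050
print(without_pooling, with_pooling)
```

A 2×2 pool with stride 2 keeps one value out of every four. As a result, the weight count of the following dense layer drops by roughly a factor of four.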

Why are Pooling layers needed?

The feature maps produced by the filters of convolutional layers are location-dependent: they record the precise positions of features in the input. For example, if an object in an image shifts slightly, its activations in the feature map shift with it, and the later layers might no longer recognize it. Pooling layers provide “translational invariance”, which makes the CNN approximately invariant to small translations. Even if the input of the CNN is shifted slightly, the CNN can still recognize the features in it.

“In all cases, pooling helps to make the representation become approximately invariant to small translations of the input. Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do not change.” (Deep Learning by Ian Goodfellow et al., 2016, page 342)
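This NumPy sketch illustrates the quote using a hypothetical 4×4 feature map that contains a single activation. Moving that activation by one pixel within the same 2×2 pooling window leaves the max-pooled output unchanged. Moving it into a different window changes the output, which is why the invariance is only approximate and holds only for small shifts. The helper `max_pool_2x2` is our own and assumes both map dimensions are even.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on an (H, W) array; H and W must be even."""
    h, w = fmap.shape
    # Split into (row_block, row_in_block, col_block, col_in_block) and reduce each block
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

original = np.zeros((4, 4))
original[0, 0] = 1.0     # a detected feature in the top-left corner

shifted = np.zeros((4, 4))
shifted[1, 1] = 1.0      # same feature, moved one pixel down and right (same pool)

far_shifted = np.zeros((4, 4))
far_shifted[2, 2] = 1.0  # moved two pixels: now it falls into a different pool

print(np.array_equal(max_pool_2x2(original), max_pool_2x2(shifted)))      # True
print(np.array_equal(max_pool_2x2(original), max_pool_2x2(far_shifted)))  # False
```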

How do pooling layers achieve this? A pooling layer is added after one or more convolutional layers, as in the CNN structure described above. It downsamples the output of those layers by sliding a window (the pool) of a given size across it with a given stride. For each window, it computes the maximum or the average of the values inside it.
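The sliding-window procedure can be sketched in a few lines of NumPy. This is an illustrative, unoptimized implementation for a single 2D feature map with no padding. The function name `pool2d` and its `mode` argument are our own, not a library API. Each output dimension is (input size - pool size) // stride + 1.

```python
import numpy as np

def pool2d(x, pool_size=2, stride=2, mode="max"):
    """Slide a pool_size x pool_size window over a 2D array with the given stride
    and reduce each window to its maximum ("max") or mean ("avg").
    No padding: windows that would run past the edge are skipped."""
    reduce = {"max": np.max, "avg": np.mean}[mode]
    out_h = (x.shape[0] - pool_size) // stride + 1
    out_w = (x.shape[1] - pool_size) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the current window
            out[i, j] = reduce(x[r:r + pool_size, c:c + pool_size])
    return out

feature_map = np.array([[3., 2., 0., 0.],
                        [0., 7., 1., 3.],
                        [5., 2., 3., 0.],
                        [0., 9., 2., 3.]])
print(pool2d(feature_map, mode="max"))  # values [[7, 3], [9, 3]]
print(pool2d(feature_map, mode="avg"))  # values [[3, 1], [4, 2]]
```

For this 4×4 matrix, these values match what the MaxPool2D and AveragePooling2D layers produce in the TensorFlow examples in this article. Passing stride=1 instead makes the windows overlap and yields a 3×3 output.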

Two types of pooling are commonly used:

  1. Max pooling: This selects the maximum value from every pool. Max pooling retains the most prominent features of the feature map, and the resulting image is sharper than the original image.
  2. Average pooling: This computes the average of every pool. Average pooling retains the average values of the features in the feature map. It smooths the image while keeping the essence of the features in it.
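The following NumPy comparison shows the difference in behavior. It uses a hypothetical 4×4 feature map with one strong, isolated activation and one uniform patch. Reshaping to (2, 2, 2, 2) splits the map into four non-overlapping 2×2 pools. This shortcut assumes the pool size equals the stride and divides the map evenly.

```python
import numpy as np

fmap = np.array([[0., 0., 0., 0.],
                 [0., 8., 0., 0.],   # one strong, isolated activation
                 [0., 0., 1., 1.],
                 [0., 0., 1., 1.]])  # one uniform 2x2 patch

# Axes after reshape: (pool_row, row_in_pool, pool_col, col_in_pool)
pools = fmap.reshape(2, 2, 2, 2)
max_pooled = pools.max(axis=(1, 3))   # values [[8, 0], [0, 1]]
avg_pooled = pools.mean(axis=(1, 3))  # values [[2, 0], [0, 1]]
print(max_pooled)
print(avg_pooled)
```

Max pooling keeps the isolated peak at full strength, while average pooling dilutes it to a quarter of its value. In the uniform patch, both methods give the same result.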

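Let’s explore how pooling layers work using TensorFlow. First, create a NumPy array and reshape it to (1, 4, 4, 1), that is, (batch, height, width, channels), which is the layout Keras 2D pooling layers expect by default.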

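import numpy as np
import tensorflow as tf

# A 4x4 single-channel input shaped as (batch, height, width, channels)
matrix = np.array([[3., 2., 0., 0.],
                   [0., 7., 1., 3.],
                   [5., 2., 3., 0.],
                   [0., 9., 2., 3.]]).reshape(1, 4, 4, 1)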

Max Pooling

Create a MaxPool2D layer with pool_size=2 and strides=2, and apply it to the matrix to get the max-pooled output as a tensor. The layer moves over the matrix in steps of 2 and takes the maximum of each 2×2 pool. Print the shape of the resulting tensor, then use tf.squeeze to remove the dimensions of size 1 from its shape.

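# Take the maximum of each 2x2 window, moving 2 steps at a time
max_pooling = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
max_pooled_matrix = max_pooling(matrix)
print(max_pooled_matrix.shape)        # (1, 2, 2, 1)
print(tf.squeeze(max_pooled_matrix))  # drop the size-1 dimensions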

Average Pooling

Create an AveragePooling2D layer with the same pool_size and strides of 2, and apply it to the matrix. The layer moves over the matrix in steps of 2 and computes the average of each 2×2 pool. Print the shape of the result, then use tf.squeeze to remove all dimensions of size 1 so the output is easier to read.

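# Average each 2x2 window, moving 2 steps at a time
average_pooling = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)
average_pooled_matrix = average_pooling(matrix)
print(average_pooled_matrix.shape)        # (1, 2, 2, 1)
print(tf.squeeze(average_pooled_matrix))  # drop the size-1 dimensions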

The GIF below shows how these pooling layers move over the input matrix and compute the maximum or the average for max pooling and average pooling, respectively.

Max pooling and average pooling being performed (Source)

Global Pooling Layers

Global pooling layers often replace the Flatten and fully connected layers of the classifier. In that design, the model ends with a convolutional layer that produces as many feature maps as there are target classes. It then applies global average pooling to each feature map, reducing each one to a single value.
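Here is a minimal NumPy sketch of such a classification head. The 7×7 spatial size and the three classes are hypothetical, and random numbers stand in for the output of a trained final convolutional layer. Global average pooling turns each class's feature map into one score, and a softmax turns those scores into probabilities. No trainable weights are involved in between.

```python
import numpy as np

num_classes = 3  # hypothetical number of target classes
rng = np.random.default_rng(0)

# Stand-in for the final conv layer's output: one 7x7 feature map per class
class_maps = rng.standard_normal((7, 7, num_classes))

# Global average pooling: collapse each 7x7 map to a single score
scores = class_maps.mean(axis=(0, 1))  # shape (num_classes,)

# Softmax turns the scores into class probabilities
probs = np.exp(scores - scores.max())
probs /= probs.sum()

# For comparison: a Flatten + Dense head on the same maps would need
# 7*7*3 inputs * 3 units + 3 biases = 444 trainable parameters
flatten_dense_params = class_maps.size * num_classes + num_classes
print(scores.shape, probs.round(3), flatten_dense_params)
```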

Create the same NumPy array but with a different shape, (1, 2, 2, 4): a 2×2 grid with 4 channels. With the previous shape, (1, 4, 4, 1), there would be only one channel, and the global pooling layers would reduce the whole matrix to a single value.

matrix = np.array([[[3., 2., 0., 0.],
                    [0., 7., 1., 3.]],
                   [[5., 2., 3., 0.],
                    [0., 9., 2., 3.]]]).reshape(1, 2, 2, 4)

Global Average Pooling

Given a tensor of shape h×w×n, the Global Average Pooling layer outputs one value for each of the n feature maps. That value summarizes the presence of the feature across all h×w positions. Instead of downsampling patches of the input feature map, the layer reduces the whole h×w map to one value by taking the average.
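Written out, global average pooling computes one number per feature map. Here x_{i,j,k} denotes the value at row i, column j of feature map k; this is our notation, not the article's. In the (1, 2, 2, 4) array above, channel k = 2 (counting from 1) holds the values 2, 7, 2 and 9, which gives the worked example on the second line.

```latex
\begin{aligned}
y_k &= \frac{1}{h\,w} \sum_{i=1}^{h} \sum_{j=1}^{w} x_{i,j,k}, \qquad k = 1, \dots, n \\
y_2 &= \tfrac{1}{4}\,(2 + 7 + 2 + 9) = 5
\end{aligned}
```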

global_average_pooling=tf.keras.layers.GlobalAveragePooling2D()
global_average_pooled_matrix=global_average_pooling(matrix)
print(global_average_pooled_matrix)
The output of the GlobalAveragePooling2D layer

Global Max Pooling

Given a tensor of shape h×w×n, the Global Max Pooling layer outputs one value for each of the n feature maps. That value summarizes the presence of the feature across all h×w positions. Instead of downsampling patches of the input feature map, the layer reduces the whole h×w map to one value by taking the maximum.
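As a formula, with x_{i,j,k} denoting the value at row i, column j of feature map k (our notation), global max pooling keeps the largest value in each feature map. For channel k = 2 of the (1, 2, 2, 4) array above, which holds 2, 7, 2 and 9, the result is 9.

```latex
\begin{aligned}
y_k &= \max_{1 \le i \le h,\; 1 \le j \le w} x_{i,j,k}, \qquad k = 1, \dots, n \\
y_2 &= \max(2, 7, 2, 9) = 9
\end{aligned}
```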

global_max_pooling=tf.keras.layers.GlobalMaxPool2D()
global_max_pooled_matrix=global_max_pooling(matrix)
print(global_max_pooled_matrix)
The output of the GlobalMaxPool2D layer
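Because the array uses the channels-last layout, both global pooling layers amount to reducing over the height and width axes (1 and 2). The following NumPy sketch cross-checks this under that assumption. The per-channel values in the comments were worked out by hand from the array above. The (1, 4) result shape matches the (batch, channels) output of the Keras layers with their default settings.

```python
import numpy as np

matrix = np.array([[[3., 2., 0., 0.],
                    [0., 7., 1., 3.]],
                   [[5., 2., 3., 0.],
                    [0., 9., 2., 3.]]]).reshape(1, 2, 2, 4)

# Reduce over height (axis 1) and width (axis 2); batch and channel axes remain
global_avg = matrix.mean(axis=(1, 2))  # per channel: 2, 5, 1.5, 1.5
global_max = matrix.max(axis=(1, 2))   # per channel: 5, 9, 3, 3
print(global_avg.shape, global_avg)
print(global_max.shape, global_max)
```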

Conclusion

In general, pooling layers are useful when you want to detect an object in an image regardless of its exact position. Adding pooling layers to a CNN model tends to reduce overfitting, improve efficiency and shorten training times. Max pooling draws out the most prominent features of an image, while average pooling smooths the image and retains the essence of its features. Global pooling layers often replace the Flatten or Dense output layers.

For more detail, read the Keras pooling layers API documentation and Chapter 5 of Deep Learning with Python by François Chollet. Also, check out CNN Explainer for an intuitive explanation of how a CNN model works.


Introduction To Pooling Layers In CNN was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
