
Introduction To Pooling Layers In CNN

Last Updated on August 16, 2022 by Editorial Team

Author(s): Rafay Qayyum

Originally published on Towards AI, the world’s leading AI and technology news and media company.

A Convolutional Neural Network (CNN) is a special type of artificial neural network commonly used for image recognition and processing because of its ability to recognize patterns in images. It eliminates the need to extract features from visual data manually: a CNN learns by sliding filters of some size over an image, picking up features from the data while also maintaining translation invariance.

The typical structure of a CNN consists of three basic layers:

  1. Convolutional layers: These layers generate a feature map by sliding a filter over the input image and recognizing patterns in it.
  2. Pooling layers: These layers downsample the feature map to introduce translation invariance, which reduces the overfitting of the CNN model.
  3. Fully connected dense layer: This layer contains the same number of units as the number of classes and an output activation function such as “softmax” or “sigmoid”.
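As a quick illustration of how these three layers fit together, the shape arithmetic below traces an image through a 3×3 convolution, a 2×2 pooling layer, and a flattening step before the dense output. This is a hedged sketch: the 28×28 input size, 8 filters, and stride choices are illustrative assumptions, not values from this article.

```python
# Trace tensor shapes through a minimal conv -> pool -> flatten stack.
# Sizes (28x28 input, 8 filters) are illustrative assumptions only.

def conv_out(size, kernel, stride=1):
    # "valid" convolution (no padding): output size formula
    return (size - kernel) // stride + 1

def pool_out(size, pool, stride):
    # pooling output size formula (no padding)
    return (size - pool) // stride + 1

h = w = 28                             # input image: 28 x 28 x 1
h = w = conv_out(h, kernel=3)          # after 3x3 conv: 26 x 26 x 8
h = w = pool_out(h, pool=2, stride=2)  # after 2x2 pool: 13 x 13 x 8
flat = h * w * 8                       # units fed into the dense layer
print(h, w, flat)                      # 13 13 1352
```

Note how the pooling layer alone cuts each spatial dimension in half, which is where most of the parameter savings in the dense layer come from.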

What are Pooling layers?

Pooling layers are one of the building blocks of Convolutional Neural Networks. Where convolutional layers extract features from images, pooling layers consolidate the features learned by the CNN. Their purpose is to gradually shrink the spatial dimension of the representation to minimize the number of parameters and computations in the network.

Why are Pooling layers needed?

The feature map produced by the filters of convolutional layers is location-dependent: if an object in an image has shifted a bit, it might not be recognizable by the convolutional layer, because the feature map records the precise positions of features in the input. What pooling layers provide is translation invariance, which makes the CNN robust to translations: even if the input of the CNN is translated, the CNN will still be able to recognize the features in it.

In all cases, pooling helps to make the representation become approximately invariant to small translations of the input. Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do not change — Page 342, Deep Learning by Ian Goodfellow, 2016.
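A tiny NumPy sketch of this idea: two patches that differ only by a one-pixel shift of the feature produce the same max-pooled value, because the feature stays inside the same pooling window. The patches are invented for illustration, not taken from the article.

```python
import numpy as np

# The "feature" (value 9) sits at two different positions
# inside the same 2x2 pooling window.
patch_a = np.array([[0., 9.],
                    [0., 0.]])
patch_b = np.array([[9., 0.],
                    [0., 0.]])

# Max pooling over the whole 2x2 window ignores the shift:
print(patch_a.max(), patch_b.max())  # 9.0 9.0
```

Invariance only holds for translations smaller than the pooling window; a shift that moves the feature into a neighboring pool will still change the output.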

How do pooling layers achieve that? A pooling layer is added after the convolutional layer(s), as seen in the structure of a CNN above. It downsamples the output of the convolutional layers by sliding a filter of some size over the feature map with some stride and calculating the maximum or average of the input.

Two types of pooling are commonly used:

  1. Max pooling: This works by selecting the maximum value from every pool. Max pooling retains the most prominent features of the feature map, and the returned image is sharper than the original image.
  2. Average pooling: This works by taking the average of every pool. Average pooling retains the average values of the features in the feature map; it smooths the image while keeping the essence of its features.
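Before turning to TensorFlow, here is a plain-NumPy sketch of both operations on the same 4×4 matrix used in the examples below. The reshape trick assumes the input size is exactly divisible by the 2×2 pool.

```python
import numpy as np

matrix = np.array([[3., 2., 0., 0.],
                   [0., 7., 1., 3.],
                   [5., 2., 3., 0.],
                   [0., 9., 2., 3.]])

# Split the 4x4 matrix into 2x2 blocks: axes 1 and 3 index
# positions inside each block, axes 0 and 2 index the blocks.
blocks = matrix.reshape(2, 2, 2, 2)

max_pooled = blocks.max(axis=(1, 3))   # max of each 2x2 pool
avg_pooled = blocks.mean(axis=(1, 3))  # average of each 2x2 pool

print(max_pooled)  # [[7. 3.]
                   #  [9. 3.]]
print(avg_pooled)  # [[3. 1.]
                   #  [4. 2.]]
```

The TensorFlow layers below compute exactly these two results, just on a 4-D (batch, height, width, channels) tensor.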

Let’s explore how pooling layers work using TensorFlow. First import NumPy and TensorFlow, then create a NumPy array and reshape it into the 4-D (batch, height, width, channels) format that Keras pooling layers expect.

import numpy as np
import tensorflow as tf

matrix = np.array([[3., 2., 0., 0.],
                   [0., 7., 1., 3.],
                   [5., 2., 3., 0.],
                   [0., 9., 2., 3.]]).reshape(1, 4, 4, 1)

Max Pooling

Create a MaxPool2D layer with pool_size=2 and strides=2, and apply it to the matrix: the layer will slide over the matrix, computing the maximum of each 2×2 pool with a stride of 2, and return the max-pooled output as a tensor. Print the shape of the tensor, and use tf.squeeze to remove the dimensions of size 1 from its shape.

max_pooling = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
max_pooled_matrix = max_pooling(matrix)
print(max_pooled_matrix.shape)        # (1, 2, 2, 1)
print(tf.squeeze(max_pooled_matrix))  # [[7. 3.]
                                      #  [9. 3.]]

Average Pooling

Create an AveragePooling2D layer with the same pool_size=2 and strides=2, and apply it to the matrix: the layer will slide over the matrix, computing the average of each 2×2 pool with a stride of 2. Print the shape of the output, and use tf.squeeze to make it readable by removing all dimensions of size 1.

average_pooling = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)
average_pooled_matrix = average_pooling(matrix)
print(average_pooled_matrix.shape)        # (1, 2, 2, 1)
print(tf.squeeze(average_pooled_matrix))  # [[3. 1.]
                                          #  [4. 2.]]

The GIF here shows how these pooling layers go through the input matrix and compute the maximum or average for max pooling and average pooling, respectively.

Max Pooling and Average Pooling being performed — Source

Global Pooling Layers

Global pooling layers often replace the classifier’s fully connected or Flatten layer. The model instead ends with a convolutional layer that produces as many feature maps as there are target classes, and a global average pooling layer reduces each feature map to a single value.
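A NumPy sketch of that idea (the shapes and values here are invented for illustration): suppose the final convolutional layer yields one 2×2 feature map per class; global average pooling then collapses each map to a single per-class score, with no Flatten or Dense layer needed.

```python
import numpy as np

n_classes = 4
# Pretend output of a final conv layer: one 2x2 feature map per class.
feature_maps = np.arange(2 * 2 * n_classes, dtype=float).reshape(2, 2, n_classes)

# Global average pooling: average each h x w map down to one value,
# giving a vector with one entry per class.
class_scores = feature_maps.mean(axis=(0, 1))
print(class_scores.shape)  # (4,)
print(class_scores)        # [6. 7. 8. 9.]
```

A softmax over this vector would then give the class probabilities directly.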

Create the same NumPy array, but reshape it to (1, 2, 2, 4) so that it contains four channels. (If we kept the (1, 4, 4, 1) shape from above, the global pooling layers would reduce the whole input to a single value.)

matrix = np.array([[[3., 2., 0., 0.],
                    [0., 7., 1., 3.]],
                   [[5., 2., 3., 0.],
                    [0., 9., 2., 3.]]]).reshape(1, 2, 2, 4)

Global Average Pooling

Given a tensor of shape h×w×n, a Global Average Pooling layer outputs a single value per h×w feature map that summarizes the presence of its feature. Instead of downsampling patches of the input feature map, it downsizes each whole h×w map to one value by taking its average.

global_average_pooling = tf.keras.layers.GlobalAveragePooling2D()
global_average_pooled_matrix = global_average_pooling(matrix)
print(global_average_pooled_matrix)  # one value per channel: [[2.  5.  1.5 1.5]]
The output of the GlobalAveragePooled layer

Global Max Pooling

Given the same tensor of shape h×w×n, the Global Max Pooling layer outputs a single value per h×w feature map that summarizes the presence of its feature. Instead of downsampling patches of the input feature map, it downsizes each whole h×w map to one value by taking its maximum.

global_max_pooling = tf.keras.layers.GlobalMaxPool2D()
global_max_pooled_matrix = global_max_pooling(matrix)
print(global_max_pooled_matrix)  # one value per channel: [[5. 9. 3. 3.]]
The output of the GlobalMaxPooled layer

Conclusion

In general, pooling layers are useful when you want to detect an object in an image regardless of its position in the image. Adding pooling layers reduces overfitting and makes a CNN model more efficient and faster to train. While max pooling draws out the most prominent features of an image, average pooling smooths the image while retaining the essence of its features. Global pooling layers often replace the Flatten or Dense output layers.

Read Keras Pooling layers API and Chapter 5 of Deep Learning with Python by François Chollet for detailed information. Also, check CNN Explainer for an intuitive explanation of a CNN model.

