Introduction To Pooling Layers In CNN
Last Updated on August 16, 2022 by Editorial Team
Author(s): Rafay Qayyum
A Convolutional Neural Network (CNN) is a special type of artificial neural network that is widely used for image recognition and processing because of its ability to recognize patterns in images. It eliminates the need to extract features from visual data manually: it learns by sliding filters of some size over the images, capturing the features in the data while also providing a degree of translation invariance.
The typical structure of a CNN consists of three basic layers (a minimal sketch in code follows the list):
- Convolutional layer: These layers generate a feature map by sliding a filter over the input image and recognizing patterns in it.
- Pooling layers: These layers downsample the feature map to introduce translation invariance, which reduces the overfitting of the CNN model.
- Fully Connected Dense Layer: This layer contains the same number of units as the number of classes and an output activation function such as "softmax" or "sigmoid".
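To make this structure concrete, here is a minimal sketch of such a model in Keras. The input shape, filter counts, and 10-class output are assumptions for illustration, not values from this article.

import tensorflow as tf

model = tf.keras.Sequential([
    # Convolutional layer: slides 3x3 filters over the image to produce feature maps
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    # Pooling layer: downsamples each feature map by taking the max over 2x2 windows
    tf.keras.layers.MaxPool2D(pool_size=2),
    tf.keras.layers.Flatten(),
    # Fully connected layer: one unit per class, with a softmax output activation
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()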
What are Pooling layers?
Pooling layers are one of the building blocks of Convolutional Neural Networks. Where convolutional layers extract features from images, pooling layers consolidate the features learned by the CNN. Their purpose is to gradually shrink the representation's spatial dimension to minimize the number of parameters and computations in the network.
Why are Pooling layers needed?
The feature map produced by the filters of convolutional layers is location-dependent: it records the precise positions of features in the input. For example, if an object in an image has shifted a bit, it might not be recognizable by the convolutional layer. Pooling layers provide "translation invariance," which makes the CNN invariant to translations, i.e., even if the input of the CNN is translated, the CNN will still be able to recognize the features in the input.
"In all cases, pooling helps to make the representation become approximately invariant to small translations of the input. Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do not change." (Page 342, Deep Learning by Ian Goodfellow, 2016)
How do pooling layers achieve that? A pooling layer is added after the convolutional layer(s), as seen in the structure of a CNN above. It downsamples the output of the convolutional layers by sliding a window of some size over it with some stride and computing the maximum or average of the values in each window, as the sketch below illustrates.
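Here is a small sketch of that approximate invariance, using a 1×8 input whose values are hypothetical. Shifting the bright pixels one step to the right changes only one of the four pooled outputs, and the strongest activation survives in the same pooled cell.

import numpy as np
import tensorflow as tf

# (batch, height, width, channels) = (1, 1, 8, 1)
original = np.array([0., 0., 8., 8., 0., 0., 0., 0.]).reshape(1, 1, 8, 1)
shifted = np.array([0., 0., 0., 8., 8., 0., 0., 0.]).reshape(1, 1, 8, 1)

pool = tf.keras.layers.MaxPool2D(pool_size=(1, 2), strides=(1, 2))
print(tf.squeeze(pool(original)))  # values: [0. 8. 0. 0.]
print(tf.squeeze(pool(shifted)))   # values: [0. 8. 8. 0.]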
Two types of pooling are commonly used:
- Max pooling: This works by selecting the maximum value from every pool. Max pooling retains the most prominent features of the feature map, and the returned image is sharper than the original image.
- Average pooling: This works by computing the average of each pool. Average pooling retains the average values of the features in the feature map. It smoothes the image while keeping the essence of the features.
Let's explore how pooling layers work using TensorFlow. Import NumPy and TensorFlow, then create a NumPy array and reshape it.
import numpy as np
import tensorflow as tf

# A 4x4 input reshaped to (batch, height, width, channels) = (1, 4, 4, 1)
matrix = np.array([[3., 2., 0., 0.],
                   [0., 7., 1., 3.],
                   [5., 2., 3., 0.],
                   [0., 9., 2., 3.]]).reshape(1, 4, 4, 1)
Max Pooling
Create a MaxPool2D layer with pool_size=2 and strides=2, and apply it to the matrix to get the max-pooled output in tensor form. The max pooling layer will go through the matrix, computing the max of each 2×2 pool with a jump of 2. Print the shape of the tensor, and use tf.squeeze to remove the dimensions of size 1 from its shape.
max_pooling = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
max_pooled_matrix = max_pooling(matrix)
print(max_pooled_matrix.shape)        # (1, 2, 2, 1)
print(tf.squeeze(max_pooled_matrix))  # values: [[7. 3.], [9. 3.]]
Average Pooling
Create an AveragePooling2D layer with the same pool_size and strides of 2, and apply it to the matrix. The average pooling layer will go through the matrix, computing the average of each 2×2 pool with a jump of 2. Print the shape of the output, and use tf.squeeze to make it readable by removing all dimensions of size 1.
average_pooling = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)
average_pooled_matrix = average_pooling(matrix)
print(average_pooled_matrix.shape)        # (1, 2, 2, 1)
print(tf.squeeze(average_pooled_matrix))  # values: [[3. 1.], [4. 2.]]
The GIF here shows how these pooling layers go through the input matrix and compute the maximum or the average for max pooling and average pooling, respectively.
Global Pooling Layers
Global pooling layers often replace the classifier's fully connected or Flatten layer. In that setup, the model instead ends with a convolutional layer that produces as many feature maps as there are target classes, and global average pooling combines each feature map into a single value.
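As a sketch of this idea (the layer sizes, input shape, and 10-class assumption are illustrative, not from this article), a classifier head built this way looks like:

import tensorflow as tf

num_classes = 10  # assumed number of target classes

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPool2D(pool_size=2),
    # One feature map per target class...
    tf.keras.layers.Conv2D(num_classes, 3, activation="relu"),
    # ...and global average pooling collapses each map into a single value,
    # giving one score per class instead of a Flatten + Dense head
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Activation("softmax"),
])
model.summary()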
Create a NumPy array with the same values but a different shape: reshaping to (1, 2, 2, 4) gives a 2×2 feature map with 4 channels. A global pooling layer will reduce each of those channels to one value.
# The same 16 values, reshaped to (batch, height, width, channels) = (1, 2, 2, 4)
matrix = np.array([[[3., 2., 0., 0.],
                    [0., 7., 1., 3.]],
                   [[5., 2., 3., 0.],
                    [0., 9., 2., 3.]]]).reshape(1, 2, 2, 4)
Global Average Pooling
Given a tensor of shape h×w×n, the Global Average Pooling layer outputs one value per feature map, summarizing the presence of each of the n features. Instead of downsizing patches of the input feature map, it downsizes each whole h×w map into a single value by taking the average.
global_average_pooling = tf.keras.layers.GlobalAveragePooling2D()
global_average_pooled_matrix = global_average_pooling(matrix)
print(global_average_pooled_matrix)  # per-channel averages: [[2. 5. 1.5 1.5]]
Global Max Pooling
Likewise, for a tensor of shape h×w×n, the Global Max Pooling layer outputs one value per feature map, summarizing the presence of each of the n features. Instead of downsizing patches of the input feature map, it downsizes each whole h×w map into a single value by taking the maximum.
global_max_pooling = tf.keras.layers.GlobalMaxPool2D()
global_max_pooled_matrix = global_max_pooling(matrix)
print(global_max_pooled_matrix)  # per-channel maxima: [[5. 9. 3. 3.]]
Conclusion
In general, pooling layers are useful when you want to detect an object in an image regardless of its position. Adding pooling layers reduces overfitting, increases efficiency, and speeds up training of a CNN model. While max pooling draws out the most prominent features of an image, average pooling smoothes the image while retaining the essence of its features. Global pooling layers often replace the Flatten or Dense output layers.
Read the Keras pooling layers API documentation and Chapter 5 of Deep Learning with Python by François Chollet for more detail. Also, check out CNN Explainer for an intuitive explanation of a CNN model.