Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Read by thought-leaders and decision-makers around the world. Phone Number: +1-650-246-9381 Email: [email protected]
228 Park Avenue South New York, NY 10003 United States
Website: Publisher: https://towardsai.net/#publisher Diversity Policy: https://towardsai.net/about Ethics Policy: https://towardsai.net/about Masthead: https://towardsai.net/about
Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Founders: Roberto Iriondo, , Job Title: Co-founder and Advisor Works for: Towards AI, Inc. Follow Roberto: X, LinkedIn, GitHub, Google Scholar, Towards AI Profile, Medium, ML@CMU, FreeCodeCamp, Crunchbase, Bloomberg, Roberto Iriondo, Generative AI Lab, Generative AI Lab Denis Piffaretti, Job Title: Co-founder Works for: Towards AI, Inc. Louie Peters, Job Title: Co-founder Works for: Towards AI, Inc. Louis-François Bouchard, Job Title: Co-founder Works for: Towards AI, Inc. Cover:
Towards AI Cover
Logo:
Towards AI Logo
Areas Served: Worldwide Alternate Name: Towards AI, Inc. Alternate Name: Towards AI Co. Alternate Name: towards ai Alternate Name: towardsai Alternate Name: towards.ai Alternate Name: tai Alternate Name: toward ai Alternate Name: toward.ai Alternate Name: Towards AI, Inc. Alternate Name: towardsai.net Alternate Name: pub.towardsai.net
5 stars – based on 497 reviews

Frequently Used, Contextual References

TODO: Remember to copy unique IDs whenever it needs used. i.e., URL: 304b2e42315e

Resources

Unlock the full potential of AI with Building LLMs for Productionβ€”our 470+ page guide to mastering LLMs with practical projects and expert insights!

Publication

Convolutional Neural Networks for Dummies
Computer Vision   Deep Learning

Convolutional Neural Networks for Dummies

Last Updated on July 23, 2020 by Editorial Team

Author(s): Daksh Trehan

Deep Learning, ComputerΒ Vision

A perfect guide to Convolution NeuralΒ Networks

A notification pops on your Social media handle saying, somebody uploaded a picture that might have you inΒ it.

Boom! How did itΒ happen?

Photo by Patrick Fore onΒ Unsplash

This is the magic of Image Classification.

Convolution Neural Networks(CNN) lies under the umbrella of Deep Learning. They are utilized in operations involving Computer Vision. Nowadays since the range of AI is expanding enormously, we can easily locate Convolution operation going around us. These play an important role ranging from Facebook Automatic Tagging, Face Unlock, OCR to driverless cars.

Image Classification takes an input(image in this case) and outputs a class/probability of that input belonging to a particular class with the help of a Deep learning algorithm i.e.Β CNN.

Why CNN?

According to the Universal Approximation theorem, Deep Neural Networks are powerful function approximators. And to the best, these can improve their accuracies using Backpropagation.

But due to the employment of so many parameters, Deep Neural Network often is very sensitive to overfitting that can even lead to the Gradient Vanish issue because of long chains. Additionally, Deep Neural Networks can’t handle variations such as translation, rotation, illuminations inΒ images.

Therefore, to tackle this problem we developed a Deep Learning algorithm: Convolution NeuralΒ Network.

Image Classification

How do humans observe anΒ image?

We have mastered our brains to determine objectsΒ quickly.

Now, how does itΒ happens?

Photo by Saifullah Munqad onΒ Unsplash

There are certain features that our brain relates an object to. Like for illustration, you were given an orange and an orange-colored ball. Both are identical, but still, the brain can accurately distinguish them, as we have trained it that surface of ball might be smooth but that of orange fruit is littleΒ rough.

Now for many years, we have trained our brains to differentiate between objects precisely. And same is the idea behind CNN i.e. to train our model to learn features fromΒ images.

How does a computer observes anΒ image?

When we observe an image, we scrutinize various aspects, like edges of objects, the color of an object because we have been trained to do so from our childhood.

But, when we input an image to our model, for it, it is nothing but a collection of matrices.

For our model, an image is a matrix of pixels. If we pass a colored image it will be a 3D array with RGB channels ranging from 0 to 255 if we pass black and white image, then we will get a 2D array with binaryΒ values.

So, when we talk about classification, we want our model to accurately define these variations so that, the output we get is as minutiae as possible.

For every feature, we have a different matrix representing its characteristics.

How CNNΒ works?

The basic pipeline for CNN is asΒ follows:

  • Input anΒ image.
  • Perform Convolution operation to get an activation map.
  • Apply the pooling layer to make our modelΒ robust.
  • Activation function(mostly ReLu) is applied to avoid non-linearity.
  • Flatten the last output into one linearΒ vector.
  • The vector is passed to a fully connected artificial neuralΒ network.
  • The fully connected layer will provide a probability for each class that we’reΒ after.
  • Repeat the process to get well defined trained weights and feature detectors.

Convolution Layer

Convolution filter sweeps out features from an image i.e. it tries to learn from an image. It computes a dot product between the filter value and the image pixel values to form a convolution layer.

The values in the filter matrix are updated every time backpropagation is implemented. However, the dimensions of the filter matrix are determined explicitly by the programmer.

Convolution operation

Each filter applied to 3D input will give a 2D output and by combining all those 2D outputs we get a 3D finalΒ output.

More the number of filters, the more the number of features learned by ourΒ model.

When we use the convolution filter, the dimensions of the output images are reduced to (h-f+1, w-f+1, c) from (h, w, c) and that is obvious because we can’t keep kernel at corners as it will cross input boundary.

To retain the dimensions of the output image, we use padding, in padding we follow adding zeros to everyΒ channel.

The bigger the kernel size, the larger the padding is required.

Padding, Source

Alongside padding, we often employ stride. In convolution, to get an activation map we usually skip one pixel either downwards or sideward, but if we wish to skip custom numbers of pixel we can useΒ stride.

Stride, Source

The more the stride, the less the output dimension.

Output afterΒ Stride

If we perform both stride and padding, the output dimensions wouldΒ be:-

Output after Stride &Β Padding

Pooling Layer

To create our model more robust and feature detector, CNN replaces output with a max summary to curtail data size and processing time. This allows the programmer to distinguish features with a higherΒ impact.

Max Pooling,Β Source

Max pooling takes two hyperparameters: stride and size. The stride will determine the number of values to be skipped and the size will determine the area of every skipped valueΒ pool.

Activation Function (ReLU andΒ Sigmoid)

After applying convolution and pooling operation, activation functions are introduced to pertain nonlinearity for values x>0 and returns 0 if it does not meet the condition.

This method has been effective to solve diminishing gradient issues. Very small weights will remain as 0 after the ReLU activation function.

Fully Connected Layer

CNN Architecture, Source

At this point, we are ready to add an artificial neural network to our convolutional neuralΒ network.

This is the final layer, we will input the flattened feature output to a column vector. To wrap up, we will feed our final flattened output to the softmax activation function that will assign a probability for each class. Every node in the previous layer is connected to the last layer and represents which distinct label toΒ output.

The final output,Β Source

How does CNN perform Backpropagation?

We use backpropagation to make our model learn better weights and biases to help optimize the performance of the model. This is a recursive process that helps our network to learnΒ better.

Suppose, you have two output classes, one for a cat and another one for a dog. The cat neurons are certain, they will be triggered when some particular features like β€œwhiskers”, β€œcat-iris”, β€œsmall and light body” are identified.

Through lots of iterations, the cat neuron learns that when certain features fire up, the image is aΒ cat.

Photo by Manja Vitolic onΒ Unsplash

Similarly, dog neurons are active when some features like β€œbig wet nose”, β€œfloppy ears”, β€œheavy body” is triggered.

Photo by Victor Grabarczyk onΒ Unsplash

And as our model progress and it is trained recursively, now we can expect them to distinguish relevant and significant features for different classes, thus providing greater weight to those so that those characteristics can make a remarkable impact on ourΒ output.

Making CNNΒ Better

Sadly, CNN is not perfect and is often immune to overfitting if not supervised in a properΒ manner.

Albeit, some of its issues can be resolved as discussed:

Dataset is relatively small

When the dataset is small, it is at ease to overfit. e.g. if you only show black dogs as dogs, then next time it wouldn’t recognize a white dog as a dog, because for our model the dog can be only black(which is obviously wrong andΒ racist!)

Photo by Cristina Anne Costello onΒ Unsplash

Therefore, in such a case, it is advisable to artificially boost the diversity and number of training examples. One way of doing this is to add image augmentations and creating new variants. These include translating images and creating dimension changes such as zoom, crop, flips, shifts,Β etc.

Image Augmentation, Source

Over Memorization

The more layers we have, the more the features are learned and this promotes memorization and inhibits generalize. The more you train your model, the more likely it becomes too specialized.

This can be avoided if we could reduce the complexity by removing a few hidden layers and neurons per layer. This technique is known asΒ Dropout.

The idea is to drop some of the random neurons with every epoch, the final result would be average of allΒ output.

Standard NN v/s NN with Dropout,Β Source.

The neurons that aren’t participating in the computation function, their weights won’t be updated using backpropagation.

Conclusion

Hopefully, this article will help you to understand about Convolution Neural Networks in the best possible way and also assist you to its practical usage.

Please leave your suggestions and feedback. Just like you, I am still learning how to become a better Data Scientist and Engineer. Please help me improve so that I could help you better in my subsequent article releases.

As always, thank you so much for reading, and please share this article if you found itΒ useful!

Feel free toΒ connect:

LinkedIn ~ https://www.linkedin.com/in/dakshtrehan/

Instagram ~ https://www.instagram.com/_daksh_trehan_/

Github ~ https://github.com/dakshtrehan

Follow for further Machine Learning/ Deep LearningΒ blogs.

Medium ~ https://medium.com/@dakshtrehan

Want to learnΒ more?

Detecting COVID-19 Using DeepΒ Learning

The Inescapable AI Algorithm: TikTok

Diving Deep into DeepΒ Learning

An insider’s guide to Cartoonization using MachineΒ Learning

Why are YOU responsible for George Floyd’s Murder and Delhi CommunalΒ Riots?

Why Choose Random Forest and Not DecisionΒ Trees

Clustering: What it is? When to useΒ it?

Start off your ML Journey with k-Nearest Neighbors

Naive Bayes Explained

Activation Functions Explained

Parameter Optimization Explained

Gradient Descent Explained

Logistic Regression Explained

Linear Regression Explained

Determining Perfect Fit for your MLΒ Model

Cheers!


Convolutional Neural Networks for Dummies was originally published in Towards AIβ€Šβ€”β€ŠMultidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

Feedback ↓