The Architecture and Implementation of LeNet-5
Deep Learning

Last Updated on January 6, 2023 by Editorial Team

Last Updated on July 30, 2020 by Editorial Team

Author(s): Vaibhav Khandelwal

Demystifying LeNet-5, one of the oldest convolutional neural network architectures

Photo from Faz.net

LeNet-5 is an early convolutional neural network architecture, published in 1998 by the French-American computer scientist Yann André LeCun together with Léon Bottou, Yoshua Bengio, and Patrick Haffner. It was developed for recognizing handwritten and machine-printed characters, and many later deep learning models build on its ideas.

Original Image published in [LeCun et al., 1998]

The architecture consists of 7 layers in total (not counting the input): 2 convolution layers (C1 and C3), each followed by an average pooling layer (S2 and S4), then a flattening convolution layer (C5), a dense fully connected layer (F6), and finally a fully connected output layer with a softmax classifier.

Input Layer

Taking an MNIST-style digit image as the example, the network input is a (32×32) grayscale image. It passes through the first convolution layer, which has 6 feature maps (filters) with a (5×5) kernel and a stride of 1. The input pixel values are normalized so that the white background corresponds to -0.1 and the black foreground to 1.175, which makes the mean approximately 0 and the variance approximately 1.

The input layer is not counted as part of the LeNet-5 network structure because, traditionally, the input layer is not considered one of the layers of the network hierarchy.
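The input normalization described in this section can be sketched as a simple linear rescaling. This is a minimal sketch, assuming 8-bit pixels where 0 is the background and 255 is full foreground ink (an assumption about how the image is stored); the function name `normalize_pixel` is only illustrative.

```python
def normalize_pixel(value, background=-0.1, foreground=1.175):
    """Linearly map an 8-bit pixel to the LeNet-5 input range.

    Assumption: 0 encodes the background and 255 encodes the foreground ink.
    """
    return background + (value / 255.0) * (foreground - background)


print(normalize_pixel(0))    # background level, -0.1
print(normalize_pixel(255))  # foreground level, about 1.175
```

Because most pixels of a digit image are background, the slightly negative background value is what pulls the mean of the whole image toward 0.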

First Layer

Convolving the input image with 6 filters changes the dimension from (32x32x1) to (28x28x6), which gives us the first layer, C1. The single input channel becomes 6 channels because 6 filters are applied to the input image. The spatial size shrinks from 32 to 28 because the (5×5) kernel is applied without padding (P = 0).

Image by author

> Calculations for the First Layer

  • Filter size = f = 5 x 5
  • No. of filters = 6
  • Strides = S = 1
  • Padding = P = 0
  • Output feature map size = 28 x 28
  • No. of neurons = 28*28*6 = 4,704

In Convolution, filter values are trainable parameters.

  • No. of learning parameters = (Weights + Bias) per filter * No. of filters

= (5 * 5 + 1) * 6 = 156

where 5 * 5 = 25 are the weights of each filter, there is 1 bias per filter, and we have a total of 6 filters.

  • No. of connections = 156 * 28 * 28 = 122,304

> Detailed description:

  1. The first convolution operation is applied to the input image (using 6 convolution kernels of size 5 x 5) to obtain the 6 C1 feature maps, each of size 28 x 28. The size is obtained from (N-f+2P)/S+1; since P=0 and S=1 here, we use N-f+1 throughout this article. Therefore, the output size after the convolution is 32 - 5 + 1 = 28.
  2. Let’s take a look at the number of parameters needed. The size of the convolution kernel is 5 x 5, and there are 6 * (5 * 5 + 1) = 156 parameters in total, where the +1 indicates that each kernel has a bias.
  3. For the convolutional layer C1, each pixel in C1 is connected to 5 * 5 input pixels and 1 bias, so there are 156 * 28 * 28 = 122,304 connections in total. Although there are 122,304 connections, only 156 parameters need to be learned, thanks to weight sharing.
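The C1 arithmetic above can be checked with a few lines of Python. This is a small sketch; the helper names `conv_output_size` and `conv_layer_stats` are illustrative and not part of any library.

```python
def conv_output_size(n, f, p=0, s=1):
    """Spatial output size of a convolution: (N - f + 2P) / S + 1."""
    return (n - f + 2 * p) // s + 1


def conv_layer_stats(n, f, in_channels, filters, p=0, s=1):
    """Return (output size, trainable parameters, connections) for a conv layer."""
    out = conv_output_size(n, f, p, s)
    params = (f * f * in_channels + 1) * filters  # weights + 1 bias per filter
    connections = params * out * out              # every output position reuses the same weights
    return out, params, connections


print(conv_layer_stats(n=32, f=5, in_channels=1, filters=6))  # (28, 156, 122304)
```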

Second Layer

The second layer is an average pooling layer, S2, with a filter size of (2×2) and a stride of 2, so the resulting image dimension decreases to (14x14x6). Each unit in each feature map of S2 is connected to a (2 x 2) neighborhood in the corresponding feature map in C1.

Image by author

> Calculations for the Second Layer

  • Filter size = f = 2 x 2
  • No. of filters = 6
  • Strides = S = 2
  • Padding = P = 0
  • Output feature map size = 14 x 14
  • No. of neurons = 14*14*6 = 1,176

The 4 inputs from the corresponding feature map in C1 are added together in a unit of S2, then multiplied by a trainable coefficient, and a trainable bias is added. The result is then passed through a sigmoidal activation function to give the result Q.

Image by author

  • No. of learning parameters = (Coefficient + Bias) * No. of filters

= (1 + 1) * 6 = 12

where the first 1 is the weight (coefficient) applied to the 2 x 2 receptive field of the pooling unit, and the second 1 is the bias.

  • No. of connections = (2*2 + 1)*14*14*6 = 5,880

> Detailed description:

  1. The pooling operation follows immediately after the first convolution. Pooling is performed using 2 * 2 kernels, and 6 S2 feature maps of size 14 * 14 are obtained.
  2. The S2 pooling layer takes the average of the pixels in each 2 * 2 area of C1, multiplies it by a weight coefficient, adds an offset (bias), and then passes the result through the activation function.
  3. So each pooling feature map has two trainable parameters, and thus there are 2*6 = 12 trainable parameters in total, but there are 5*14*14*6 = 5,880 connections.
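The subsampling step described above can be sketched in NumPy for a single feature map. This is a minimal sketch under stated assumptions: `tanh` stands in for the sigmoidal activation, the coefficient and bias values are arbitrary placeholders rather than trained values, and the function name `subsample` is illustrative.

```python
import numpy as np


def subsample(feature_map, coefficient, bias):
    """LeNet-5 style 2x2 subsampling (stride 2) of one feature map."""
    h, w = feature_map.shape
    # Group pixels into non-overlapping 2x2 blocks and add the 4 values in each block.
    block_sums = feature_map.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    # One trainable coefficient and one trainable bias are shared by the whole map.
    pre_activation = coefficient * block_sums + bias
    # Assumption: tanh is used as the sigmoidal squashing function.
    return np.tanh(pre_activation)


c1_map = np.random.default_rng(0).normal(size=(28, 28))  # placeholder C1 feature map
s2_map = subsample(c1_map, coefficient=0.25, bias=0.0)
print(s2_map.shape)  # (14, 14)
```

With a coefficient of 0.25 the sum of the 4 inputs becomes their average, which is why the same layer can be described either as "adding" or as "averaging" the 2 x 2 area.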

Third Layer

In the third layer, we apply 16 filters with a kernel size of (5×5) to S2, resulting in a convolution layer C3 with 16 feature maps. This convolution changes the dimension of the image from (14 x 14 x 6) in S2 to (10 x 10 x 16) in C3.

Image by author

> Calculations for the Third Layer

  • Filter size = f = 5 x 5
  • No. of filters = 16
  • Strides = S = 1
  • Padding = P = 0
  • Output feature map size = 10 x 10
  • No. of neurons = 10*10*16 = 1,600

As we can see, the input S2 has 6 feature maps and the output C3 has 16 feature maps, so there is no one-to-one mapping between input and output maps. Instead, each unit in each feature map of C3 is connected to several (5 x 5) neighborhoods at identical locations in a subset of S2’s feature maps.

Selecting different combinations of input feature maps from S2 allows more new features to be extracted.

The different combinations of feature maps taken from S2 are shown in the figure below:

  1. Taking inputs from every contiguous subset of 3 feature maps from S2: the first 6 feature maps of C3 are made with this combination.
  2. Taking inputs from every contiguous subset of 4 feature maps from S2: the next 6 feature maps of C3 are made with this combination.
  3. Taking inputs from the discontinuous subsets of 4 feature maps from S2: the next 3 feature maps of C3 are made with this combination.
  4. Taking all the feature maps: the last feature map of C3 is made with this combination.

Original Image published in [LeCun et al., 1998]

  • No. of learning parameters = (Parameters in combination type-1) + (Parameters in combination type-2) + (Parameters in combination type-3) + (Parameters in combination type-4)

= [6 * (5*5*3 + 1)] + [6 * (5*5*4 + 1)] + [3 * (5*5*4 + 1)] + [1 * (5*5*6 + 1)]

= 456 + 606 + 303 + 151 = 1,516

NOTE: In the above calculation, the numbers 3, 4, 4, and 6 multiplied by 5*5 inside the parentheses are the depths, i.e. the number of S2 feature maps that each filter reads from.

  • No. of connections = 1,516 * (10*10) = 151,600
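The C3 parameter count above can be reproduced by encoding the four combination types as (number of C3 feature maps, number of S2 maps each one reads from). This is a short sketch; the names `C3_GROUPS` and `c3_parameter_count` are illustrative.

```python
# (number of C3 feature maps, number of S2 feature maps each one is connected to)
C3_GROUPS = [(6, 3), (6, 4), (3, 4), (1, 6)]


def c3_parameter_count(groups=C3_GROUPS, f=5):
    """Sum (f*f*depth weights + 1 bias) over every C3 feature map."""
    return sum(n_maps * (f * f * depth + 1) for n_maps, depth in groups)


params = c3_parameter_count()
connections = params * 10 * 10  # each C3 map has 10 x 10 output positions
print(params, connections)  # 1516 151600
```

For comparison, connecting every C3 map to all 6 S2 maps would need (5*5*6 + 1) * 16 = 2,416 parameters, so the sparse connection scheme also reduces the parameter count.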

Fourth Layer

In the fourth layer, we again apply an average pooling layer with a filter size of (2×2) and a stride of 2. The output of this pooling layer, S4, therefore has the dimension (5x5x16). Each unit in each feature map of S4 is connected to a (2 x 2) neighborhood in the corresponding feature map in C3.

Image by author

> Calculations for the Fourth Layer

  • Filter size = f = 2 x 2
  • No. of filters = 16
  • Strides = S = 2
  • Padding = P = 0
  • Output feature map size = 5 x 5
  • No. of neurons = 5*5*16 = 400
  • No. of learning parameters = (Coefficient + Bias) * No. of filters

= (1 + 1) * 16 = 32

where the first 1 is the weight (coefficient) applied to the 2 x 2 receptive field of the pooling unit, and the second 1 is the bias.

  • No. of connections = (2*2 + 1)*5*5*16 = 2,000

This completes 2 convolution operations and 2 pooling operations.
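The neuron, parameter, and connection counts of both pooling layers follow the same pattern, which can be written once and applied to S2 and S4. This is a small sketch; `pooling_layer_stats` is an illustrative helper name.

```python
def pooling_layer_stats(n, channels, f=2, s=2):
    """Return (output size, neurons, trainable parameters, connections)."""
    out = (n - f) // s + 1
    neurons = out * out * channels
    params = (1 + 1) * channels                        # one coefficient + one bias per feature map
    connections = (f * f + 1) * out * out * channels   # f*f inputs + 1 bias per output unit
    return out, neurons, params, connections


print(pooling_layer_stats(n=28, channels=6))   # S2: (14, 1176, 12, 5880)
print(pooling_layer_stats(n=10, channels=16))  # S4: (5, 400, 32, 2000)
```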

Fifth Layer

In the fifth layer, we have a fully connected convolution layer C5 with 120 units. Each unit of C5 is connected to a (5 x 5) neighborhood on all 16 of S4’s feature maps, i.e. every unit of C5 is connected to all the feature maps of S4, and thus C5 is known as a fully connected convolution layer.

C5 is called a “fully connected convolution layer” rather than simply a “fully connected layer” because, if the input to LeNet-5 were made larger with everything else kept constant, the feature maps in C5 would be larger than (1 x 1).

The output of the fourth layer has dimensions (5x5x16), so it has 5x5x16 = 400 nodes in total. These 400 nodes are connected to the 120 nodes of C5 as a dense, fully connected network.

Image by author

> Calculations for the Fifth Layer

  • Filter size = f = 5 x 5
  • No. of filters = 120
  • Strides = S = 1
  • Padding = P = 0
  • Output feature map size = 1 x 1
  • No. of neurons = 1*1*120 = 120
  • No. of learning parameters = (5*5*16 + 1)*120 = 48,120
  • No. of connections = 48,120*1*1 = 48,120
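To see why C5 behaves like a dense layer on a 5 x 5 input, the NumPy sketch below computes it both ways and compares the results. The arrays are random placeholders rather than trained weights, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
s4 = rng.normal(size=(5, 5, 16))            # S4 output: height, width, channels
kernels = rng.normal(size=(5, 5, 16, 120))  # 120 filters, each of size 5 x 5 x 16
biases = rng.normal(size=120)

# A 5 x 5 kernel fits on a 5 x 5 map in exactly one position (no padding),
# so each filter produces a single value, i.e. a 1 x 1 feature map.
c5_conv = np.einsum("hwc,hwcf->f", s4, kernels) + biases

# The same computation written as a dense layer on the flattened 400-vector.
c5_dense = s4.reshape(400) @ kernels.reshape(400, 120) + biases

print(c5_conv.shape, np.allclose(c5_conv, c5_dense))  # (120,) True
print(kernels.size + biases.size)                     # 48120
```

With a larger network input, S4 would be larger than 5 x 5, the kernel would fit in more than one position, and the two computations would no longer coincide.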

Sixth Layer

The sixth layer, F6, consists of 84 neurons fully connected to C5. Here the dot product between the input vector and the weight vector is computed, and a bias is added to it. The result is then passed through a sigmoidal activation function.

Image by author

> Calculations for the Sixth Layer

  • Input: C5 with 120 neurons
  • Output: F6 with 84 neurons
  • No. of learning parameters = (120*84) + 84 = 10,164
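Adding up the trainable parameters computed so far, from C1 through F6, gives a running total for the network. The short sketch below only re-uses the per-layer figures derived in this article; the output layer is not included.

```python
# Trainable parameters per layer, as derived in the calculations above.
layer_params = {
    "C1": (5 * 5 * 1 + 1) * 6,     # 156
    "S2": (1 + 1) * 6,             # 12
    "C3": 1516,                    # sparse S2 -> C3 connection scheme
    "S4": (1 + 1) * 16,            # 32
    "C5": (5 * 5 * 16 + 1) * 120,  # 48,120
    "F6": 120 * 84 + 84,           # 10,164
}

for name, count in layer_params.items():
    print(f"{name}: {count:,}")

total = sum(layer_params.values())
print(f"Total (C1 to F6): {total:,}")  # Total (C1 to F6): 60,000
```

Most of the parameters sit in C5 and F6, i.e. in the densely connected part of the network rather than in the early convolution layers.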

The number of neurons in the F6 layer is chosen as 84, corresponding to a 7 x 12 bitmap in which -1 means white and 1 means black, so the black-and-white bitmap of each symbol corresponds to a code. Such a representation is useful for recognizing strings of characters taken from the printable ASCII set. Characters that look similar and are easily confused, such as uppercase O, lowercase o, and 0, will have similar output codes.

The ASCII encoding set is as follows:

Original Image published in [LeCun et al., 1998]

And finally, we have a fully connected softmax output layer with 10 possible values corresponding to the digits from 0 to 9.

Image by author

So we have the “softmax” activation function on the output layer, because softmax gives the probability of occurrence of each output class at the end, while the other layers we saw use “tanh” as the activation function.
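As a reminder of what the softmax output layer computes, the sketch below turns 10 raw scores (one per digit) into probabilities that sum to 1. The scores are made-up placeholder values, not outputs of a trained network.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow and does not change the result.
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Placeholder scores for the digits 0 to 9 (illustrative values only).
scores = [0.1, 0.3, 2.5, 0.2, 0.0, -1.0, 0.4, 0.1, 0.9, -0.5]
probabilities = softmax(scores)
predicted_digit = probabilities.index(max(probabilities))
print(predicted_digit)  # 2
```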

We are now venturing into coding territory.

Implementation of LeNet-5 using Keras

Before we start implementing LeNet-5 in code, there are a few key points to keep in mind:

  1. The input used by LeCun was of size (32 x 32), but we will be using the MNIST dataset, in which the image size is (28 x 28). Thus, our input size will be (28 x 28).
  2. When LeCun applied the third convolution, i.e. C5, its input size was (5 x 5). Because our network input is smaller than the one LeCun used, the input to C5 in our case would be (4 x 4). Applying a convolution with a (5 x 5) filter to this input would give an invalid output size (4 - 5 + 1 = 0, reported as a negative dimension size error), which is not possible, and hence we’ll apply Flatten() after S4.
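The two points above can be verified by tracing the spatial size through the layers for both input sizes. This is a small sketch using the N-f+1 rule for the (5 x 5) convolutions and halving for the (2 x 2) pooling layers; `trace_sizes` is an illustrative helper name.

```python
def trace_sizes(n):
    """Spatial size after each layer for an n x n input (no padding)."""
    c1 = n - 5 + 1   # 5 x 5 convolution, stride 1
    s2 = c1 // 2     # 2 x 2 pooling, stride 2
    c3 = s2 - 5 + 1
    s4 = c3 // 2
    c5 = s4 - 5 + 1  # size a 5 x 5 convolution would produce on S4
    return {"C1": c1, "S2": s2, "C3": c3, "S4": s4, "C5": c5}


print(trace_sizes(32))  # {'C1': 28, 'S2': 14, 'C3': 10, 'S4': 5, 'C5': 1}
print(trace_sizes(28))  # {'C1': 24, 'S2': 12, 'C3': 8, 'S4': 4, 'C5': 0}
```

With the (28 x 28) input, C5 would end up with size 0, which is why the (4 x 4 x 16) output of S4 is flattened into 256 values and fed to a dense layer instead.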

Importing Libraries

Loading the dataset and performing train-test split

Checking the sizes of train and test split

The output of the above code will be as follows:

Shapes of train and test split

Performing reshaping operations - Converting into 4-D

Normalizing the values of the image - Converting in between 0 and 1

One-hot encoding the labels
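The three preprocessing steps above (reshaping to 4-D, scaling to the range 0 to 1, and one-hot encoding) can be sketched with NumPy. This is a minimal sketch, not the original listing: it uses small random placeholder arrays with MNIST-like shapes instead of the real dataset, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data shaped like a tiny MNIST batch (assumption: the real images
# and labels would come from the dataset loader).
x = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
y = rng.integers(0, 10, size=8)

# 1. Reshape to 4-D: (samples, height, width, channels).
x_4d = x.reshape(-1, 28, 28, 1)

# 2. Scale pixel values from 0..255 to the range 0..1.
x_scaled = x_4d.astype("float32") / 255.0

# 3. One-hot encode the labels: row i of the identity matrix encodes digit i.
y_one_hot = np.eye(10, dtype="float32")[y]

print(x_scaled.shape, y_one_hot.shape)  # (8, 28, 28, 1) (8, 10)
```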

Building the Model Architecture
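A minimal sketch of the model, assuming the Keras API bundled with TensorFlow and the (28 x 28) input with Flatten() after S4 described in the key points above. The function name `build_lenet5` is illustrative, and this is one reasonable way to write the network rather than a reproduction of the original listing. Note that standard Keras layers differ from the paper in two ways: `AveragePooling2D` has no trainable coefficient or bias, and the second `Conv2D` connects every output map to all 6 input maps.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_lenet5(input_shape=(28, 28, 1), num_classes=10):
    """LeNet-5 style network adapted to 28 x 28 inputs."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(6, kernel_size=5, strides=1, activation="tanh"),   # C1: 24 x 24 x 6
        layers.AveragePooling2D(pool_size=2, strides=2),                 # S2: 12 x 12 x 6
        layers.Conv2D(16, kernel_size=5, strides=1, activation="tanh"),  # C3: 8 x 8 x 16
        layers.AveragePooling2D(pool_size=2, strides=2),                 # S4: 4 x 4 x 16
        layers.Flatten(),                                                # 256 values
        layers.Dense(120, activation="tanh"),                            # replaces C5
        layers.Dense(84, activation="tanh"),                             # F6
        layers.Dense(num_classes, activation="softmax"),                 # output layer
    ])


model = build_lenet5()
model.summary()
```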

Summary of the model

  • There are approximately 45 thousand trainable parameters here, as can be seen in the following image.

Model Summary
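The figure of roughly 45 thousand parameters can be cross-checked by hand. The sketch below assumes the network is built from standard Keras layers on a (28 x 28) input, i.e. parameter-free average pooling, a second convolution connected to all 6 input maps, and Flatten() after S4 feeding 4*4*16 = 256 values into the first dense layer; these are assumptions about the model, and the dictionary keys are only labels.

```python
# Trainable parameters of the 28 x 28 Keras variant (assumed layer layout).
keras_variant_params = {
    "conv 5x5, 1 -> 6": (5 * 5 * 1 + 1) * 6,        # 156
    "conv 5x5, 6 -> 16": (5 * 5 * 6 + 1) * 16,      # 2,416
    "dense 256 -> 120": 4 * 4 * 16 * 120 + 120,     # 30,840
    "dense 120 -> 84": 120 * 84 + 84,               # 10,164
    "dense 84 -> 10": 84 * 10 + 10,                 # 850
}

total_keras_params = sum(keras_variant_params.values())
print(f"{total_keras_params:,}")  # 44,426
```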

Compilation of the model

And finally, a bit on evaluating your model.

Finding the loss and accuracy of the model

The output of loss and accuracy is as follows:

Loss and Accuracy of the model - Image by author

📌 To get the complete code of LeNet-5 or any other network, visit my GitHub repository.

References:

[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-Based Learning Applied to Document Recognition (1998), Proceedings of the IEEE

Thanks for reading. I hope this blog has helped you with both the coding and the understanding of the architecture. 😃


The Architecture and Implementation of LeNet-5 was originally published in Towards AI - Multidisciplinary Science Journal on Medium.
