Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Read by thought-leaders and decision-makers around the world. Phone Number: +1-650-246-9381 Email: [email protected]
228 Park Avenue South New York, NY 10003 United States
Website: Publisher: https://towardsai.net/#publisher Diversity Policy: https://towardsai.net/about Ethics Policy: https://towardsai.net/about Masthead: https://towardsai.net/about
Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Founders: Roberto Iriondo, , Job Title: Co-founder and Advisor Works for: Towards AI, Inc. Follow Roberto: X, LinkedIn, GitHub, Google Scholar, Towards AI Profile, Medium, ML@CMU, FreeCodeCamp, Crunchbase, Bloomberg, Roberto Iriondo, Generative AI Lab, Generative AI Lab Denis Piffaretti, Job Title: Co-founder Works for: Towards AI, Inc. Louie Peters, Job Title: Co-founder Works for: Towards AI, Inc. Louis-François Bouchard, Job Title: Co-founder Works for: Towards AI, Inc. Cover:
Towards AI Cover
Logo:
Towards AI Logo
Areas Served: Worldwide Alternate Name: Towards AI, Inc. Alternate Name: Towards AI Co. Alternate Name: towards ai Alternate Name: towardsai Alternate Name: towards.ai Alternate Name: tai Alternate Name: toward ai Alternate Name: toward.ai Alternate Name: Towards AI, Inc. Alternate Name: towardsai.net Alternate Name: pub.towardsai.net
5 stars – based on 497 reviews

Frequently Used, Contextual References

TODO: Remember to copy unique IDs whenever it needs used. i.e., URL: 304b2e42315e

Resources

Take our 85+ lesson From Beginner to Advanced LLM Developer Certification: From choosing a project to deploying a working product this is the most comprehensive and practical LLM course out there!

Publication

Do You Know VGG 16?
Latest

Do You Know VGG 16?

Last Updated on January 16, 2023 by Editorial Team

Author(s): Chinmay Bhalerao

Originally published on Towards AI.

VGG 16 from scratch with python implementation

Image created byΒ Dalle.2

β€œBy far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.”

β€” Eliezer Yudkowsky

There is always an architecture or framework that comes into the picture and changes the whole scenario of that domain. VGG 16 was the first CNN architecture with design principles and systematic layer planning. Many other architectures are dependent on VGG and use this structure as aΒ base.

Let's startΒ with,

Why do we require Convolutional neural networks and their architectures? Why can't we simply use ANN or multi-layer perceptron forΒ images?

The following are theΒ reasons:

  1. We have to send the image as a flattened one for each pixel which will act as an input. But when the image is flattened, there is a loss of spatial information. [Spatial information: information related to objects’ topography, geometry, reference, etc.]
  2. Handling so many pixels as weights and biases will cause a problem of vanishing gradient, and that will cause major trouble while dealing with imageΒ data.
  3. Even if we generate a feature map with MLP but little translation will ruin everything. in CNN, we have options like Pooling which we usedΒ for

a. Downsample featureΒ map

b. Model’s invariance to local translations

Due to these reasons, we require CNN and its architectures.

On top of CNN, many architectures are built over time periods, like AlexNet, ZFNet[ A Bigger AlexNet], GoogLeNet, Residual Networks, and manyΒ more…

In this blog, let's understand VGG and its variations fromΒ scratch.

What isΒ VGG?

It is a typical deep Convolutional Neural Network (CNN) design with numerous layers, and the abbreviation VGG stands for Visual GeometryΒ Group.

Source:Papers withΒ code

The term β€œdeep” describes the number of layers, with VGG-16 or VGG-19 having 16 or 19 convolutional layers, respectively. Innovative object identification models are built using the VGG architecture. The VGGNet, created as a deep neural network, outperforms benchmarks on a variety of tasks and datasets outside of ImageNet. It also remains one of the most often used image recognition architectures today.

The VGG model was first proposed by Andrew Zisserman and Karen Simonyan in 2013, and a prototype was made for the 2014 ImageNet Challenge. They were a part of Oxford’s Visual Geometry GroupΒ (VGG).

The 16 convolutional layers of VGGNet-16 are quite appealing, and their architecture is very consistent. It has several filters but only 3×3 convolutions, like AlexNet. On 4 GPUs, it can be trained for two to threeΒ weeks.

Source: Lecture notes of Justin Johnson of Michigan university

Let's see each layer inΒ detail.

So the 16 or 19 in VGG16 or VGG19 stands for the number of layers in the architecture. Authors refer to the paper as a deeper network is better than the longer one. Rather than randomly adding layers to the networks, the authors decided to set a few rules for theΒ design.

All convolutional layers must contain 3×3 stride 1 padΒ 1

All max pooling layers should have a 2×2Β stride

After pooling, double the number ofΒ channels

This made design standard.

Source: Papers withΒ code

Let's start the scratch coding by importing libraries.

import cv2
import keras
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from keras.layers import Dense, Activation, Flatten
from keras.preprocessing.image import ImageDataGenerator
tf.random.set_seed(0)

Now we will read and extract imageΒ data

trdata = ImageDataGenerator()
traindata = trdata.flow_from_directory(directory="C:/Users/ChinmayB/BPRoject/train",target_size=(224,224))
tsdata = ImageDataGenerator()
testdata = tsdata.flow_from_directory(directory="C:/Users/ChinmayB/BPRoject/test", target_size=(224,224))

We will use a sequential modelΒ here.

model = keras.Sequential()

Let's write the actual first block for theΒ network.

######## BLOCK1 ########

model.add(Conv2D(64, kernel_size = (3,3),padding = 'same',activation = "relu",input_shape = (224,224,3)))
model.add(Conv2D(64, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

Now in the same way we will construct all blocks andΒ layers

######## BLOCK1 ########

model.add(Conv2D(64, kernel_size = (3,3),padding = 'same',activation = "relu",input_shape = (224,224,3)))
model.add(Conv2D(64, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

######## BLOCK2 ########

model.add(Conv2D(128, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(Conv2D(128, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))


######## BLOCK3 ########

model.add(Conv2D(256, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(Conv2D(256, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(Conv2D(256, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

######## BLOCK4 ########

model.add(Conv2D(512, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(Conv2D(512, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(Conv2D(512, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

######## BLOCK5 ########

model.add(Conv2D(512, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(Conv2D(512, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(Conv2D(512, kernel_size = (3,3),padding = 'same',activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

######## Dense ########

model.add(Flatten())
model.add(Dense(4096,activation = 'relu'))
model.add(Dense(4096,activation = 'relu'))
model.add(Dense(3, activation = "softmax"))

After writing all layers in the network, at last, we will build theΒ model.

model.build()
model.summary()

We can see the total config file and how many parameters weΒ have.

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 224, 224, 64) 1792

conv2d_1 (Conv2D) (None, 224, 224, 64) 36928

max_pooling2d (MaxPooling2D (None, 112, 112, 64) 0
)

conv2d_2 (Conv2D) (None, 112, 112, 128) 73856

conv2d_3 (Conv2D) (None, 112, 112, 128) 147584

max_pooling2d_1 (MaxPooling (None, 56, 56, 128) 0
2D)

conv2d_4 (Conv2D) (None, 56, 56, 256) 295168

conv2d_5 (Conv2D) (None, 56, 56, 256) 590080

conv2d_6 (Conv2D) (None, 56, 56, 256) 590080

max_pooling2d_2 (MaxPooling (None, 28, 28, 256) 0
2D)

conv2d_7 (Conv2D) (None, 28, 28, 512) 1180160

conv2d_8 (Conv2D) (None, 28, 28, 512) 2359808

conv2d_9 (Conv2D) (None, 28, 28, 512) 2359808

max_pooling2d_3 (MaxPooling (None, 14, 14, 512) 0
2D)

conv2d_10 (Conv2D) (None, 14, 14, 512) 2359808

conv2d_11 (Conv2D) (None, 14, 14, 512) 2359808

conv2d_12 (Conv2D) (None, 14, 14, 512) 2359808

max_pooling2d_4 (MaxPooling (None, 7, 7, 512) 0
2D)

flatten (Flatten) (None, 25088) 0

dense (Dense) (None, 4096) 102764544

dense_1 (Dense) (None, 4096) 16781312

dense_2 (Dense) (None, 3) 12291

Lastly, we added a plain neural network or flattened network as it will be our decision node. This was the first network that had principles for designing.

result = model.predict(np.array([img]))
print(result)

There are a lot of advantages of VGG against its predecessors.

Source: Lecture notes of Justin Johnson of Michigan university

The community now views it as the best option for extracting characteristics from photos. The weight configuration of the VGGNet is openly accessible and has been employed as a standard feature extractor in numerous different applications and challenges. As per its paper, VGG 16 is able to classify 1000 images of 1000 different categories with 92.7% accuracy.

If you have found this article insightful

If you found this article insightful, follow me on Linkedin and medium. you can also subscribe to get notified when I publish articles. Let’s create a community! Thanks for yourΒ support!

If you want to support meΒ :

As Your following and clapping is the most important thing, but you can also support me by buying coffee.Β COFFEE.

You can also read my blogs relatedΒ to

Signing off,

CHINMAY BHALERAO


Do You Know VGG 16? was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.

Published via Towards AI

Feedback ↓