Improve Performance of the Deep Neural Model (Part-1)
Last Updated on July 25, 2023 by Editorial Team
Author(s): Akash Dawari
Originally published on Towards AI.
Note: Before you start reading this article, please click the link below to see the code and the different visualization graphs, which will help you better understand the article and the concepts.
In this article, we will discuss the following questions and try to find the answers to them.
- What are the problems faced by a deep neural model?
- How do we solve these problems and improve the performance of the model?
What are the problems faced by a deep neural model?
Many data scientists say that if we provide enough neurons, enough data, and a well-constructed deep neural architecture, then the model can learn or fit almost any complex function and make relevant predictions. We want to achieve that goal using minimal resources. But a deep neural network has two big problems.
- A deep neural network has a tendency to overfit, which results in poor predictions on new, unseen data points.
- A deep neural network takes a lot of time to train and consumes a lot of resources during training.
How do we solve these problems and improve the performance of the model?
The most common solutions to the problems discussed above would be:
- Tuning the neurons in each layer of the deep neural model.
- Tuning the total number of hidden layers of the model.
- Increase the amount of data.
But in this article, we will not discuss those. Rather, we will focus on some specific techniques for solving those two problems.
Techniques to solve overfitting:
- Early stopping
- Dropout
- Regularization (L1 & L2)
What is Early Stopping?
In a deep neural network, the model starts learning and is able to reduce the loss in every epoch. But after a certain number of epochs, the model starts overfitting, which means the training loss keeps reducing while the validation loss stays constant or does not reduce as fast as the training loss.
As you can see in the above picture, the training should stop at 60 epochs, but because we train for more epochs than that, the model overfits the data.
So, the common solution is to find an optimal number of epochs to train the model. This is not as easy as it may sound, because we cannot train our model over and over again to find the optimal epoch number, as that takes a lot of time and resources.
Here comes the technique of Early Stopping. In this technique, we try to stop the model before it starts to overfit the data by breaking out of the epoch loop. The core idea is to monitor and compare the training and validation metrics: if the validation metric stops improving while the training metric keeps improving, training is stopped. That is why this technique is called early stopping, as it stops the training before all the epochs are completed.
As you can see in the above picture, the problem of overfitting is solved, and we are also able to save resources and time. This technique can be used easily with Keras and TensorFlow. We just have to create an early-stopping callback in Keras and apply it while training the model, as shown below:
from tensorflow.keras.callbacks import EarlyStopping

callback = EarlyStopping(
    monitor="val_loss",          # validation metric to watch
    min_delta=0,                 # minimum change that counts as an improvement
    patience=20,                 # number of epochs to wait after the last improvement
    verbose=1,
    mode="auto",
    baseline=None,
    restore_best_weights=False,  # set to True to roll back to the best weights seen
)
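To activate the callback, pass it to model.fit() while training. The model, training data, and epoch count below are placeholders for your own setup:

history = model.fit(
    X_train, y_train,
    validation_split=0.2,  # early stopping monitors the loss on this validation split
    epochs=500,            # an upper bound; training stops earlier once the callback triggers
    callbacks=[callback],
)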
What is Dropout?
The dropout technique is another effective way to reduce the overfitting problem faced by a deep neural model. In this technique, we shut down some random neurons in every hidden layer (by zeroing out their outputs, which would otherwise become inputs to the neurons of the next layer) in every epoch during training. This helps the model learn properly and avoid overfitting the training data. The ratio that decides how many neurons are turned off in each layer is called the dropout rate. We can add a dropout layer after each layer and decide its dropout rate as well.
Let's see how the dropout technique helps in avoiding overfitting.
As we can clearly see, this model is overfitting the training data. Now, if we apply dropout to the same model with the same architecture, how does it perform?
So, by analyzing the above picture, it is clear that dropout really reduces the overfitting problem of the model. We can apply dropout very easily using Keras and TensorFlow, as shown in the code below:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(250, activation='relu', input_dim=2))
model.add(Dropout(0.7))  # randomly zero out 70% of this layer's outputs during training
model.add(Dense(250, activation='relu'))
model.add(Dropout(0.8))
model.add(Dense(250, activation='relu'))
model.add(Dropout(0.6))
model.add(Dense(250, activation='relu'))
model.add(Dropout(0.8))
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.7))
model.add(Dense(1, activation=keras.activations.sigmoid))
Note: While predicting, all neurons participate. But there is a catch: in the original dropout formulation, the weights are multiplied by (1 − p), where p is the dropout rate, at prediction time. Keras handles this for you using the equivalent "inverted dropout", which scales the surviving activations by 1 / (1 − p) during training, so no extra adjustment is needed at inference.
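As a minimal illustration of this behavior (assuming TensorFlow's Keras API), a Dropout layer is only active when it is called with training=True; at inference time it passes its input through unchanged:

import numpy as np
from tensorflow import keras

layer = keras.layers.Dropout(0.5)
x = np.ones((1, 6), dtype="float32")

print(layer(x, training=True))   # roughly half the values are zeroed; the rest are scaled up by 1 / (1 - 0.5)
print(layer(x, training=False))  # identical to the input: dropout is disabled at inference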
What is Regularization?
Regularization is a technique to prevent overfitting in a deep neural network. In this technique, we add a penalty term to the cost function to constrain how much the model learns. The penalty term reduces the values of the weights (it pushes the weight values toward zero).
There are two commonly used types of regularization:
- Lasso (L1)
- Ridge (L2)
Lasso (L1)
In this regularization, we add the sum of the absolute values of all weights to the loss function. The formula is shown below:
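A standard way of writing it, with λ as the regularization strength and w_i as the model weights, is:

$$L_{\text{new}} = L_{\text{original}} + \lambda \sum_{i} |w_i|$$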
Ridge (L2)
In this regularization, we add the sum of the squares of all weights to the loss function. The formula is shown below:
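In the same notation, the L2-regularized loss is:

$$L_{\text{new}} = L_{\text{original}} + \lambda \sum_{i} w_i^2$$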
Let's see how the regularization technique helps to prevent overfitting.
As we can see, this model is suffering from an overfitting problem. Let's apply regularization to the same model and see the result.
We can easily apply regularization using Keras and TensorFlow, as shown in the code below:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(250, activation='relu', input_dim=2))
model.add(Dense(250, activation='relu', kernel_regularizer=keras.regularizers.L2()))  # L2 penalty on this layer's weights
model.add(Dense(250, activation='relu', kernel_regularizer=keras.regularizers.L2()))
model.add(Dense(250, activation='relu', kernel_regularizer=keras.regularizers.L2()))
model.add(Dense(100, activation='relu', kernel_regularizer=keras.regularizers.L2()))
model.add(Dense(1, activation=keras.activations.sigmoid))
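keras.regularizers.L2() uses a default regularization factor of 0.01; you can pass a value explicitly to tune the strength of the penalty (the 0.001 below is only an illustrative value):

model.add(Dense(250, activation='relu',
                kernel_regularizer=keras.regularizers.L2(0.001)))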
Note: We apply regularization only to the weights, not to the biases. L2 is the most widely used regularization in deep neural models.
This is the end of the article. Please wait for its second part, where we will discuss how to reduce the training time of a deep neural model. Also, for the Jupyter notebook that contains the coding part, please click the link below.
Articles_Blogs_Content/Improve your deep learning model.ipynb at main ·…
github.com
Like and share if you find this article helpful. Also, follow me on Medium for more content related to Machine Learning and Deep Learning.
Published via Towards AI