Striking the Right Balance: Understanding Underfitting and Overfitting in Machine Learning Models

Last Updated on August 1, 2023 by Editorial Team

Author(s): Shivamshinde

Originally published on Towards AI.

This article will explain the basic concept of overfitting and underfitting from the machine learning and deep learning perspective.

Photo by Ag PIC on Unsplash

Seeing underfitting and overfitting as a problem

Everyone working on a machine learning problem wants their model to perform as well as possible. Sometimes, however, it does not. Its accuracy might be lower than ideal, or it might be higher than ideal. In machine learning, both of these are considered problems.

It is easy to see why lower-than-ideal accuracy is a problem, but why should higher-than-ideal accuracy be a problem too?

Sometimes a model finds relationships in meaningless parts of the data, such as irrelevant features or noise, and this is where the extra accuracy comes from. Let’s understand this with an example.

Suppose we are training a model that predicts a person’s salary. Our data has four features: the person’s name, education, experience, and skill set. Common sense tells us that a person’s name does not affect their salary. If we nevertheless use the name as one of the features, the model might find some spurious relationship between name and salary in the training data. That relationship can add extra accuracy on the training data, but it does not carry over to people the model has never seen. This is the more-than-ideal accuracy, and in such cases the model has been trained incorrectly.
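The salary example can be sketched numerically. The snippet below is a minimal illustration on synthetic data, not the author's code: the salary formula, the sample sizes, and the one-hot "name" columns are all assumptions made for the demonstration. It fits one least-squares model on experience only and another that also gets a unique name column per training person.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 40, 40


def make_people(n):
    """Synthetic people: salary depends on experience plus random noise."""
    experience = rng.uniform(0, 20, size=n)
    salary = 30_000 + 2_500 * experience + rng.normal(0, 5_000, size=n)
    return experience, salary


def mse(y_true, y_pred):
    """Mean squared error as a plain float."""
    return float(np.mean((y_true - y_pred) ** 2))


exp_train, sal_train = make_people(n_train)
exp_test, sal_test = make_people(n_test)

# Model A: intercept + experience only.
Xa_train = np.column_stack([np.ones(n_train), exp_train])
Xa_test = np.column_stack([np.ones(n_test), exp_test])
wa, *_ = np.linalg.lstsq(Xa_train, sal_train, rcond=None)

# Model B: the same columns plus a one-hot "name" column per training person.
# Test people have names the model never saw, so their name columns are all zero.
Xb_train = np.column_stack([Xa_train, np.eye(n_train)])
Xb_test = np.column_stack([Xa_test, np.zeros((n_test, n_train))])
wb, *_ = np.linalg.lstsq(Xb_train, sal_train, rcond=None)

results = {
    "A_train": mse(sal_train, Xa_train @ wa),
    "A_test": mse(sal_test, Xa_test @ wa),
    "B_train": mse(sal_train, Xb_train @ wb),
    "B_test": mse(sal_test, Xb_test @ wb),
}
for label, value in results.items():
    print(f"{label}: {value:,.0f}")
```

With the name columns, the model can memorize every training salary, so its training error drops to essentially zero, while its error on new people stays large. The near-perfect training score comes from the names, not from anything that generalizes.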

Basic terminologies

Before diving into the topics, let’s understand two different kinds of errors that are necessary to understand underfitting and overfitting.

Bias error: In this article, the bias error is the error we measure with the trained model on the training data, that is, on the same data that was used to train the model. The error can be any metric, such as mean squared error or mean absolute error. Strictly speaking, bias is the error caused by overly simple assumptions in the model; the training error is used here as a practical proxy for it.
Variance error: The variance error is the error we measure with the trained model on the test data. Again, any metric can be used, but we use the same metric as for the bias so that the two values can be compared. Strictly speaking, variance is the model’s sensitivity to the particular training sample it saw; a test error much higher than the training error is its practical symptom.

Note that the ideal condition of our trained model is having low bias and low variance.

What are overfitting and underfitting in everyday life?

Let’s say you are visiting a foreign country and the taxi driver rips you off. You might be tempted to say that all the taxi drivers in that country are greedy. This is what we call over-generalization.

Over-generalization can also happen to trained machine learning and deep learning models, where it is known as overfitting.

Similarly, under-generalization is known as underfitting.

What does overfitting mean from a machine learning perspective?

We say our model suffers from overfitting if it has low bias and high variance.

Overfitting happens when the model is too complex relative to the amount and noisiness of the training data.
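To make "too complex relative to the amount and noisiness of the training data" concrete, here is a small sketch on synthetic data. The sine target, the noise level, and the polynomial degrees are illustrative assumptions; `Polynomial.fit` from NumPy performs an ordinary least-squares polynomial fit.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)


def sample(n):
    """Noisy samples of sin(2*pi*x) on [0, 1]."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    return x, y


def mse(y_true, y_pred):
    """Mean squared error as a plain float."""
    return float(np.mean((y_true - y_pred) ** 2))


x_train, y_train = sample(20)    # small, noisy training set
x_test, y_test = sample(200)     # larger held-out test set

errors = {}
for degree in (1, 3, 15):
    # A higher degree means more parameters, i.e. a more complex model.
    poly = Polynomial.fit(x_train, y_train, degree)
    errors[degree] = (mse(y_train, poly(x_train)), mse(y_test, poly(x_test)))
    print(f"degree {degree:2d}: train MSE = {errors[degree][0]:.3f}, "
          f"test MSE = {errors[degree][1]:.3f}")
```

The degree-15 polynomial has almost as many parameters as there are training points, so it can follow the noise: its training error is the lowest of the three, yet its test error is much higher than its training error. That combination is the low-bias, high-variance pattern described above.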

Possible solutions to the overfitting issue

1. Simplify the model in one of the following ways:

a. Select a machine learning model with fewer parameters.

b. Reduce the number of features (columns) used for training the model.

c. Constrain the model (using regularization methods).

2. Gather more training data.

3. Reduce the noise in the data. The noise could be some errors in the data or the presence of outliers, etc.

4. Use early stopping
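As an illustration of constraining a model with regularization, the sketch below uses L2 (ridge) regularization on polynomial features. This is one common regularization method chosen for the example, not something the article prescribes; the data, the degree, and the helper names `fit_ridge` and `train_mse` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small noisy dataset and a flexible degree-9 polynomial feature expansion.
x = rng.uniform(-1.0, 1.0, size=15)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.3, size=15)
X = np.vander(x, N=10, increasing=True)  # columns: 1, x, x^2, ..., x^9


def fit_ridge(X, y, lam):
    """Closed-form ridge regression: minimize ||Xw - y||^2 + lam * ||w||^2.

    For simplicity the intercept column is penalized as well.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def train_mse(w):
    """Training mean squared error for weight vector w."""
    return float(np.mean((X @ w - y) ** 2))


w_weak = fit_ridge(X, y, lam=1e-6)   # almost unconstrained
w_strong = fit_ridge(X, y, lam=1.0)  # clearly constrained

print("weight norm:", np.linalg.norm(w_weak), "->", np.linalg.norm(w_strong))
print("train MSE:  ", train_mse(w_weak), "->", train_mse(w_strong))
```

A larger penalty shrinks the weights and raises the training error slightly. The model gives up some of its ability to fit the training noise, which is the intended trade when the unconstrained model overfits.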

What is underfitting?

Underfitting happens when a machine learning model is not able to capture the relationship between the independent and dependent features. In other words, an underfit model has a high error on both the training data and the test data, which in the terms defined above means high bias and high variance. There might be several reasons behind this.

Possible solutions to an underfitting issue

Use a more complex model that can capture the relationship between the independent and dependent features.
Relax the constraints on the model, i.e., reduce the regularization.
Try to obtain more training data, although this usually helps less than a more capable model when the model itself is too simple.
Train the model for longer, for example by increasing the number of epochs.
Clean the data to reduce the noise.
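The effect of training for more epochs can be seen even on a tiny problem. The following sketch runs plain full-batch gradient descent on a one-feature linear model; the data, the learning rate, and the epoch counts are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic regression data: y = 3x + 2 plus a little noise.
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=100)


def train(epochs, learning_rate=0.05):
    """Full-batch gradient descent on MSE for the model y_hat = w * x + b.

    Returns the training MSE after the given number of epochs.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        error = w * x + b - y
        w -= learning_rate * 2.0 * np.mean(error * x)
        b -= learning_rate * 2.0 * np.mean(error)
    return float(np.mean((w * x + b - y) ** 2))


loss_short = train(epochs=5)
loss_long = train(epochs=500)
print(f"training MSE after   5 epochs: {loss_short:.4f}")
print(f"training MSE after 500 epochs: {loss_long:.4f}")
```

After only 5 epochs the model is still far from the data even though it is perfectly capable of fitting it. This is underfitting caused by too little training rather than by a model that is too simple.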

Let’s see what overfitting and underfitting look like using some plots

Let’s use the red-wine-quality dataset to understand the concepts of underfitting and overfitting.

Underfitting:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
import tensorflow.keras.layers as tfl
from tensorflow.keras.models import Model

sns.set_style('darkgrid')

## Reading the data
wine = pd.read_csv('wine.csv')

## Splitting the data into independent and dependent features
X = wine.drop('quality', axis=1)
y = wine['quality']

## Splitting the data into train and test, keeping the class proportions equal in both
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

## Creating a deliberately small model: a single hidden layer with 6 units
## (the variable is named `inputs` to avoid shadowing Python's built-in `input`)
inputs = tfl.Input(shape=X.shape[1:])
hidden1 = tfl.Dense(6, activation='relu')(inputs)
## 10 output units: sparse_categorical_crossentropy expects integer labels in the range 0-9
output = tfl.Dense(10, activation='softmax')(hidden1)
model = Model(inputs=[inputs], outputs=[output])

## Compiling the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

## Training on the training set; the test set is only used to compute validation metrics
history = model.fit(X_train, y_train, epochs=150, validation_data=(X_test, y_test))

## Visualizing the train and test accuracy per epoch
plt.plot(history.history['accuracy'], color='red', label='train accuracy')
plt.plot(history.history['val_accuracy'], color='blue', label='test accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()

Observe the above plot. We can see that the accuracy of the trained model on both the training data and the test data is below 55%, which is quite low. So our model, in this case, is suffering from underfitting. This occurs because the model is too simple.

Overfitting:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
import tensorflow.keras.layers as tfl
from tensorflow.keras.models import Model

sns.set_style('darkgrid')

## Reading the data
wine = pd.read_csv('wine.csv')

## Splitting the data into independent and dependent features
X = wine.drop('quality', axis=1)
y = wine['quality']

## Splitting the data into train and test, keeping the class proportions equal in both
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

## Creating a much larger model: four hidden layers with 100 units each
## (the variable is named `inputs` to avoid shadowing Python's built-in `input`)
inputs = tfl.Input(shape=X.shape[1:])
hidden1 = tfl.Dense(100, activation='relu')(inputs)
hidden2 = tfl.Dense(100, activation='relu')(hidden1)
hidden3 = tfl.Dense(100, activation='relu')(hidden2)
hidden4 = tfl.Dense(100, activation='relu')(hidden3)
## 10 output units: sparse_categorical_crossentropy expects integer labels in the range 0-9
output = tfl.Dense(10, activation='softmax')(hidden4)
model = Model(inputs=[inputs], outputs=[output])

## Compiling the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

## Training on the training set; the test set is only used to compute validation metrics
history = model.fit(X_train, y_train, epochs=150, validation_data=(X_test, y_test))

## Visualizing the train and test accuracy per epoch
plt.plot(history.history['accuracy'], color='red', label='train accuracy')
plt.plot(history.history['val_accuracy'], color='blue', label='test accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()

After observing the above plot, one can tell that the gap between the two curves widens as we move towards the right side (i.e., as the number of epochs increases). This means that as we train for more epochs, the training accuracy keeps increasing while the test accuracy does not. This kind of situation is considered overfitting. Such a model doesn’t generalize well to the test data or to new data.
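One way to act on a widening gap like this is early stopping, mentioned earlier as a remedy for overfitting. The function below is a framework-independent sketch of the idea; the name `best_epoch_with_patience` and the validation losses are made up for illustration.

```python
def best_epoch_with_patience(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs; `best_epoch` is where it was lowest.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Patience never ran out: training used every epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: they improve, then drift upward as overfitting sets in.
val_losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.57, 0.61, 0.65]
stop_epoch, best_epoch = best_epoch_with_patience(val_losses, patience=3)
print(stop_epoch, best_epoch)  # prints "6 3"
```

In Keras, the same idea is available as the `tf.keras.callbacks.EarlyStopping` callback, for example `EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)` passed to `model.fit` through the `callbacks` argument. If early stopping is driven by the test set, the test accuracy is no longer an unbiased estimate, so a separate validation split is usually preferable.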

We need to train the model so that it gives good enough accuracy on both the training data and the test data. Such a model sits in the middle ground between underfitting and overfitting.

I hope you liked the article. If you have any thoughts on it, please let me know. Any constructive feedback is highly appreciated.

Connect with me on LinkedIn.

Mail me at shivamshinde92722@gmail.com

Have a great day!

Published via Towards AI
