Understanding Machine Learning Performance Metrics

Last Updated on July 25, 2023 by Editorial Team

Author(s): Pranay Rishith

Originally published on Towards AI.

Evaluating the Effectiveness of Your Models

I’m sure you’re familiar with machine learning, so let’s not discuss it further. There are several types of machine learning, such as supervised and unsupervised learning. Let’s focus on supervised learning.

Supervised learning involves training a computer with labeled data in order to predict new data points. It is further divided into two subcategories: regression and classification. Each has its own performance metrics.

Classification

Classification is a supervised learning algorithm, which means that the model is trained using labeled data to predict the class to which a given data point belongs. The dependent variable is categorical, meaning it has two or more classes.

Examples of classification include determining whether an email is spam or not.

Let’s discuss some performance metrics of classification.

Confusion Matrix

Seeing the name don’t get confused U+1F602. In machine learning, a classification problem is used to define something as either 0 or 1. But how do we find out if the model has correctly classified something? This is where the confusion matrix comes into play. A confusion matrix is a [0,1] matrix or table which explains comprehensively how good our model is.

The figure above is a confusion matrix. Let’s break it down. This matrix has two-row headings (Predicted YES and Predicted NO) and two-column headings (Actual YES and Actual NO). The Actual heading refers to the classes defined in the dataset, while the Predicted refers to the classes predicted.

Let’s take an example and, if possible, use it throughout the article.

The data is provided solely for illustrative purposes.

The table above displays four features, with Enjoy Sport as the dependent variable to be predicted. The Enjoy Sport column is the actual column in the confusion matrix, while the predicted values are placed in the predicted row.

As you can see, a confusion matrix has four values: true positives, true negatives, false positives, and false negatives. Let’s dive deeper.

True Positive

This value indicates the number of data points the model correctly predicted as positive when compared to the actual values. In other words, the model predicted “YES” when the actual value was “YES”.

True Negative

This value indicates the number of data points the model correctly predicted as negative when compared to the actual values. In other words, the model predicted “NO” when the actual value was “NO”.

False Positive (ERROR)

This value indicates the number of data points the model has incorrectly predicted as negative when compared to the actual values. For example, the model predicted “NO” when the actual value was “YES”.

False Negative (ERROR)

This value indicates the number of data points the model has incorrectly predicted as positive when compared to the actual values. For example, the model predicted “YES” when the actual value was “NO”.

The Confusion Matrix can provide insight into the quality of the model by showing how many correct and incorrect predictions it has made. It is particularly useful for assessing the following few performance metrics.

Accuracy

Accuracy is a metric that measures how well a model predicts the correct output for given data points. It is calculated by dividing the number of correct predictions made by the model by the total number of predictions made.

For example, consider the above example: accuracy is the percentage of correct predictions, regardless of whether they are “yes” or “no,” out of the total number of predictions.

Mathematically, accuracy is calculated as follows:

Accuracy = (correct predictions) / (total number of predictions)

For example, if the total number of predictions is 100 and the model predicts 73 correctly, then the accuracy is 73%.

If we use a confusion matrix, then we can get a much more simple and significant formula:

Precision

This is a metric that defines how accurately the model has predicted with respect to a specific class. It is calculated by dividing the number of true positives by the total number of positive predictions made.

It is calculated as,

where,

TP refers to the number of values where the model predicted YES, and the actual value was also YES.
FP is the number of values where the model predicted YES, but the actual value was NO.

Recall

This measure defines how many positive classes the model can correctly identify. It is calculated by dividing the number of true positive predictions by the total number of actual positives.

It is calculated as,

where,

TP: the number of values where the model predicted YES and the actual value was also YES.
FN: the number of values where the model predicted NO, but the actual value was YES.

F1 score

This metric combines recall and precision and is calculated as the harmonic mean of the two. It is mainly used to evaluate the effectiveness of a classification model.

This can be calculated as,

F1 scores range from 0 to 1, with higher values indicating better performance. A perfect score is achieved when both recall and precision are 1.

Regression

Regression is a supervised learning algorithm, which means that the model is trained using labeled data to predict a value. The dependent variable is continuous, meaning it is a real number.

Examples of regression include determining the price of a house in the US (which is a real number).

Let’s discuss some performance metrics of regression.

Mean Squared Error (MSE)

This is the most commonly used loss function for regression models. It is calculated by taking the average of the squared differences between the predicted and actual values.

The formula is as follows,

where,

n is the number of data points
y_pred is the predicted value
y is the actual value

The Mean Squared Error (MSE) is a non-negative value. A lower value indicates a better fit of the regression model; values close to 0 indicate the best fit, while values of 1 or more indicate a poor fit.

MSE is most often used as a loss function when training a machine learning model for regression tasks, as it can be minimized using gradient descent.

Mean Absolute Error (MAE)

This is a metric used to evaluate the performance of a model in regression tasks. It is calculated by taking the average of the absolute differences between the predicted and actual values. The absolute error of a single data point is calculated as the absolute difference between the predicted and actual values.

The formula is as follows:

where,

n is the number of data points
y_pred is the predicted value
y is the actual value

The Mean Absolute Error (MAE) is a non-negative value. The lower the value, the better the regression model. If the MAE is close to 0, it indicates the best fit; values of 1 or more indicate a poor fit.

Root Mean Squared Error (RMSE)

This metric is used to evaluate the performance of a regression model. It is calculated by taking the square root of the average of the squared differences between the predicted and actual values.

The Mean Squared Error (MSE) is the average of the squared differences between the predicted and actual values. The Root Mean Squared Error (RMSE) is the square root of this value.

The formula is as follows,

where,

n is the number of data points
y_pred is the predicted value
y is the actual value

The Root Mean Squared Error (RMSE) is a non-negative value. A lower RMSE indicates a better fit of the regression model; values close to 0 indicate the best fit, while values of 1 or more indicate a poor fit. Compared to the Mean Squared Error (MSE), RMSE tends to have smaller values, making it easier to interpret.

R-Squared (R²)

R-squared is a statistical measure that indicates how well a regression model fits the data. An ideal R-squared value is 1, meaning the model fits perfectly. The closer the R-squared value is to 1, the better the model fits the data.

The total sum of squares is calculated by the summation of squares of perpendicular distance between data points and the average line.

The residual sum of squares (SSres) is calculated by summing the squares of the perpendicular distances between data points and the best-fitted line.

R-Squared’s formula:

Where,

SSres is the residual sum of squares
SStot is the total sum of squares.

The goodness of fit of regression models can be evaluated using the R-square method. The higher the value of the R-square, the better the model is.

I hope I’ve helped you understand some fundamental concepts about performance measurements. If you enjoy this content, giving it some clapsU+1F44F will give me a little extra motivation.

You can reach me at:

LinkedIn: https://www.linkedin.com/in/pranay16/

Github: https://github.com/pranayrishith16

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Frequently Used, Contextual References

Resources

Publication

Understanding Machine Learning Performance Metrics

Author(s): Pranay Rishith

Evaluating the Effectiveness of Your Models

Classification

Confusion Matrix

Accuracy

Precision

Recall

F1 score

Regression

Mean Squared Error (MSE)

Mean Absolute Error (MAE)

Root Mean Squared Error (RMSE)

R-Squared (R²)

Feedback ↓ Cancel reply

Popular posts

Best Laptops for Deep Learning, Machine Learning (ML), and Data Science for 2023

Best Workstations for Deep Learning, Data Science, and Machine Learning (ML) for 2022

Descriptive Statistics for Data-driven Decision Making with Python

Best Machine Learning (ML) Books - Free and Paid - Editorial Recommendations for 2022

Best Data Science Books - Free and Paid - Editorial Recommendations for 2022

Updates

Recent Posts

Accelerating Drug Approvals Using Advanced RAG

How AI is Transforming Evaluation Practices

The Potential Consciousness of AI: Simulating Awareness and Emotion for Enhanced Interaction

TAI #136: DeepSeek-R1 Challenges OpenAI-o1 With ~30x Cheaper Open-Source Reasoning Model

Mastering Data Scaling: The Only Guide You’ll Ever Need (Straight from My Journey)

The World’s Leading AI and Technology Publication.

Company

CONTACT US

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Publication

Understanding Machine Learning Performance Metrics

Author(s): Pranay Rishith

Evaluating the Effectiveness of Your Models

Classification

Confusion Matrix

Accuracy

Precision

Recall

F1 score

Regression

Mean Squared Error (MSE)

Mean Absolute Error (MAE)

Root Mean Squared Error (RMSE)

R-Squared (R²)

Related posts

Feedback ↓ Cancel reply

Popular posts

Updates

Recent Posts

The World’s Leading AI and Technology Publication.

Company

CONTACT US

GDPR CCPA Statement