
The Essential Guide to ML Evaluation Metrics for Regression

Author(s): Ayo Akinkugbe

Originally published on Towards AI.

Photo by Europeana on Unsplash

Introduction

Machine learning models are only as good as our ability to measure them. Though a perfect model isn’t always possible, a good enough model is. But how do we determine what counts as good enough for an ML model? This is where evaluation metrics come into play. There are various metrics for various scenarios, and sometimes for specific tasks and models. In production ML systems, choosing the right metric is important. ML models can be designed to perform a variety of tasks, ranging from regression and classification to unsupervised learning, generative tasks, and reinforcement learning.

Regression is a fundamental machine learning task used to predict continuous outcomes based on one or more predictor variables. Common examples of regression tasks include forecasting sales, predicting house prices, or estimating patient recovery times. Choosing the right metric to evaluate a model’s performance is important. This post provides an exhaustive exploration of regression evaluation metrics, simplifying each with practical case studies. This post covers the following metrics:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Mean Squared Log Error (MSLE)
  • Root Mean Squared Error (RMSE)
  • Root Mean Squared Log Error (RMSLE)
  • Mean Absolute Percentage Error (MAPE)
  • Symmetric Mean Absolute Percentage Error (sMAPE)
  • Weighted Mean Absolute Percentage Error (wMAPE)
  • Mean Absolute Scaled Error (MASE)
  • Mean Squared Prediction Error (MSPE)
  • Mean Directional Accuracy (MDA)
  • Median Absolute Deviation (MAD)
  • R² Score (Coefficient of Determination)
  • D² Absolute Error Score
  • Mean Poisson Deviance (MPD)
  • Mean Gamma Deviance (MGD)
  • Explained Variance Score
Photo by Birmingham Museums Trust on Unsplash

Mean Absolute Error (MAE)

MAE measures the average absolute difference between predicted values and actual values. It gives equal weight to all errors, regardless of their direction.

MAE = (1/n) · Σ |yᵢ − ŷᵢ|, where n is the number of observations, yᵢ is the actual value, and ŷᵢ is the predicted value.

Imagine you’re predicting house prices. MAE tells you, on average, how many dollars off your predictions are, regardless of whether you overestimated or underestimated.
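As a quick sketch, MAE can be computed with scikit-learn’s mean_absolute_error; the prices below are purely hypothetical:

from sklearn.metrics import mean_absolute_error

# Hypothetical actual and predicted house prices, in dollars
y_true = [310_000, 455_000, 290_000, 520_000]
y_pred = [300_000, 480_000, 275_000, 510_000]

mae = mean_absolute_error(y_true, y_pred)
print(f"MAE: ${mae:,.0f}")  # average absolute miss, in the same units as the target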

Case Study:

A real estate company built a model to predict home prices in Seattle. Their model had an MAE of $25,000. This means, on average, their predictions were off by $25,000 (either too high or too low).

Use When:

  • You want errors to be interpreted in the same units as the output variable
  • Large and small errors should be treated with equal importance
  • Outliers should not have an outsized influence on your evaluation

Significance:

MAE is intuitive and easy to explain to non-technical stakeholders. It’s particularly useful in business contexts where the actual magnitude of error matters, such as financial forecasting.

Mean Squared Error (MSE)

MSE measures the average of the squared differences between predicted and actual values. It penalizes larger errors more heavily than smaller ones.

If you’re predicting delivery times, MSE will penalize a prediction that’s off by 20 minutes much more than one that’s off by 5 minutes.
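A minimal sketch with scikit-learn’s mean_squared_error, using made-up delivery times in minutes:

from sklearn.metrics import mean_squared_error

# Hypothetical delivery times, in minutes
y_true = [30, 45, 60, 25]
y_pred = [35, 40, 80, 27]

mse = mean_squared_error(y_true, y_pred)
print(f"MSE: {mse:.1f} minutes squared")  # the single 20-minute miss dominates the average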

Case Study:

A logistics company developed a model to predict package delivery times. Their model had an MSE of 225, meaning the average squared error was 225 minutes². This indicates that some predictions had relatively large errors, which were heavily penalized by the squaring operation.

Use When:

  • Larger errors are more problematic than smaller ones
  • You’re particularly concerned about outliers
  • You’re doing mathematical optimization (MSE has nice mathematical properties which include continuity and a well-behaved derivative)

Significance:

MSE is widely used in statistical modeling and machine learning. Its mathematical properties make it suitable for optimization algorithms, but its squared units can make interpretation challenging for non-technical audiences.

Mean Squared Log Error (MSLE)

MSLE applies the natural logarithm (typically log(1 + x), so that zero values are handled) to the actual and predicted values before calculating the mean squared error. It penalizes underestimation more than overestimation.

If you’re predicting sales volumes, MSLE will penalize you more for predicting 50 units when the actual was 100 (underestimation) than for predicting 150 when the actual was 100 (overestimation).
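The asymmetry is easy to see in a small sketch with scikit-learn’s mean_squared_log_error; the numbers are hypothetical:

from sklearn.metrics import mean_squared_log_error

y_true = [100]

# The absolute error is 50 units in both cases, but underestimation costs more
print(mean_squared_log_error(y_true, [50]))   # ~0.47
print(mean_squared_log_error(y_true, [150]))  # ~0.16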

Case Study:

An e-commerce platform used MSLE to evaluate their sales forecasting model. With highly variable sales volumes (from 10 to 10,000 units), MSLE helped them focus on the relative errors rather than absolute differences, which would have been dominated by high-volume products.

Use When:

  • Target variable spans multiple orders of magnitude
  • You care more about relative errors than absolute errors
  • Underestimation is more problematic than overestimation
  • Data is right-skewed

Significance:

MSLE is particularly valuable for datasets with exponential growth patterns or when the target variable has a wide range of values. It’s commonly used in sales, revenue, and count predictions.

Root Mean Squared Error (RMSE)

RMSE is simply the square root of the Mean Squared Error. It brings the error metric back to the same units as the original data.

If you’re predicting temperature, RMSE tells you the typical size of your error in degrees, but with larger errors penalized more heavily.
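A minimal sketch, taking the square root of scikit-learn’s mean_squared_error on hypothetical temperatures (newer scikit-learn releases also expose a root_mean_squared_error helper):

import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical temperatures, in degrees Celsius
y_true = [21.0, 18.5, 25.0, 30.0]
y_pred = [20.0, 19.0, 22.0, 31.5]

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"RMSE: {rmse:.2f} degrees")  # back in the original units, with large misses weighted more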

Case Study:

A weather forecasting service evaluated their temperature prediction model using RMSE. Their model achieved an RMSE of 2.5°C, meaning that while most predictions were close, some significant misses occurred that increased the overall error.

Use When:

  • You want error in the same units as your target variable (unlike MSE)
  • Larger errors should be penalized more than smaller ones
  • You need a metric that’s widely recognized in your field

Significance:

RMSE is one of the most popular regression metrics and is often the default choice in many applications. It combines the mathematical advantages of MSE with the interpretability of having the same units as the target variable.

Root Mean Squared Log Error (RMSLE)

RMSLE is the square root of the Mean Squared Log Error. It maintains the properties of MSLE but returns values in a scale that’s closer to the original data.

If you’re predicting product demand, RMSLE helps you understand your typical relative error while penalizing underestimation more heavily than overestimation. It is useful when over-forecasting is less costly than under-forecasting.
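Since scikit-learn does not ship RMSLE directly, a common sketch is to take the square root of mean_squared_log_error (the demand figures below are hypothetical):

import numpy as np
from sklearn.metrics import mean_squared_log_error

# Hypothetical product demand spanning very different scales
y_true = [12, 950, 10_300]
y_pred = [15, 800, 11_000]

rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))
print(f"RMSLE: {rmsle:.3f}")  # roughly the typical relative (log-scale) error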

Case Study:

In a Kaggle competition for store sales prediction, RMSLE was used as the evaluation metric. This allowed competitors to focus on getting the relative scale of predictions right across both high-volume and low-volume products.

Use When:

  • Dealing with data that spans multiple orders of magnitude
  • You want to penalize underestimation more than overestimation
  • You want a more interpretable version of MSLE

Significance:

RMSLE is particularly important in competitions and applications where the target variable has an exponential or power-law distribution, such as sales forecasting, population predictions, or epidemic modeling.

Photo by British Library on Unsplash

Mean Absolute Percentage Error (MAPE)

MAPE measures the average percentage difference between predicted and actual values. It expresses error as a percentage of the actual value.

If you’re forecasting revenue, MAPE tells you the average percentage by which your predictions missed the mark.
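A minimal sketch with scikit-learn’s mean_absolute_percentage_error, which returns a fraction rather than a percentage (the revenue figures are hypothetical):

from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical weekly revenue
y_true = [120_000, 95_000, 150_000]
y_pred = [110_000, 100_000, 160_000]

mape = mean_absolute_percentage_error(y_true, y_pred)
print(f"MAPE: {mape:.1%}")  # the :.1% format converts the returned fraction to a percentage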

Case Study:

A retail chain used MAPE to evaluate their revenue forecasting model. With a MAPE of 12%, they knew that, on average, their weekly revenue predictions were off by 12% of the actual revenue.

Use When:

  • You want to understand error in percentage terms
  • Comparing performance across different scales
  • Communicating results to business stakeholders

Significance:

MAPE is highly intuitive for business contexts where percentage errors are more meaningful than absolute errors. However, it has limitations when actual values are close to or equal to zero.

Symmetric Mean Absolute Percentage Error (sMAPE)

sMAPE is a variation of MAPE that treats over-forecasting and under-forecasting more symmetrically. It uses the average of the actual and predicted values in the denominator.

sMAPE gives you a percentage error that doesn’t unfairly penalize overestimation or underestimation.
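scikit-learn has no built-in sMAPE, so here is a small NumPy sketch of one common formulation (definitions vary slightly across sources; the data is hypothetical):

import numpy as np

def smape(y_true, y_pred):
    # Absolute error divided by the average of |actual| and |predicted|, expressed as a percentage
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    return np.mean(np.abs(y_true - y_pred) / denom) * 100

print(smape([100, 200, 0.5], [110, 180, 0.7]))  # bounded above by 200%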

Case Study:

In the M4 Forecasting Competition, sMAPE was one of the primary evaluation metrics. It allowed fair comparison of forecasting methods across diverse datasets with different scales and characteristics.

Use When:

  • You need a percentage error that treats overestimation and underestimation more equally
  • Your actual values might be close to or equal to zero
  • Comparing different forecasting methods

Significance:

sMAPE addresses some of the mathematical limitations of MAPE, particularly when dealing with values near zero or when comparing methods that tend to bias in different directions.

Weighted Mean Absolute Percentage Error (wMAPE)

wMAPE calculates the sum of all absolute errors divided by the sum of all actual values. It effectively weights errors by the magnitude of the actual values.

wMAPE gives more importance to errors in predicting larger values than smaller values.
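A small NumPy sketch of one common wMAPE formulation (the demand numbers are hypothetical):

import numpy as np

def wmape(y_true, y_pred):
    # Sum of absolute errors divided by the sum of (absolute) actual values, as a percentage
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true)) * 100

# One high-volume and one low-volume product: the high-volume error dominates the score
print(wmape([10_000, 50], [9_000, 100]))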

Case Study:

A manufacturing company used wMAPE to evaluate their inventory forecasting model. This metric gave more weight to high-volume products, which had a greater impact on their overall inventory costs.

Use When:

  • Errors in larger values are more important than errors in smaller values
  • You want to avoid the division-by-zero problem in MAPE
  • Aggregating errors across multiple items with different scales

Significance:

wMAPE is particularly useful in supply chain, inventory management, and financial forecasting where the impact of errors is proportional to the magnitude of the values being predicted.

Mean Absolute Scaled Error (MASE)

MASE compares your model’s performance to a naive forecast (typically using the previous value as the prediction). It scales the errors relative to the naive method’s performance.

MASE = MAE of the model / MAE of the naive forecast, where the denominator is the mean absolute error of the one-step naive forecast on the historical data.

MASE tells you how much better (or worse) your model is compared to simply using the last observed value as your prediction.
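A small NumPy sketch of one common MASE formulation, scaling the model’s MAE by the MAE of a one-step naive forecast on the training series (all numbers are hypothetical):

import numpy as np

def mase(y_true, y_pred, y_train):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    y_train = np.asarray(y_train, float)
    naive_mae = np.mean(np.abs(np.diff(y_train)))  # error of "predict the previous value"
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

history = [100, 102, 101, 105, 107]           # hypothetical historical values
print(mase([110, 112], [108, 113], history))  # values below 1 beat the naive forecast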

Case Study:

A financial services company evaluated their stock price prediction model using MASE. With a MASE of 0.85, they knew their model was 15% better than simply using yesterday’s price as today’s prediction.

Use When:

  • You want to compare performance against a simple benchmark
  • Dealing with time series data
  • Your data has seasonal patterns or trends
  • Other percentage-based errors (like MAPE) are problematic due to zero or near-zero values

Significance:

MASE provides a scale-free error metric that works well across different datasets and avoids the mathematical problems of some other metrics. It’s particularly valuable in time series forecasting.

Mean Squared Prediction Error (MSPE)

MSPE is essentially the same as MSE but is sometimes used specifically in the context of out-of-sample prediction evaluation. MSPE measures how well your model predicts new, unseen data points.
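A minimal sketch of the idea on synthetic data: fit on a training split, then compute the squared error only on the held-out test split:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Purely synthetic data for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

mspe = mean_squared_error(y_test, model.predict(X_test))  # MSE on unseen data
print(f"MSPE (out-of-sample MSE): {mspe:.3f}")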

Case Study:

A healthcare analytics team used MSPE to evaluate their patient readmission risk model on a test dataset. The MSPE helped them understand how well their model would generalize to new patients.

Use When:

  • Specifically evaluating predictive performance on test data
  • Larger errors should be penalized more heavily
  • Distinguishing between in-sample fit and out-of-sample prediction

Significance:

While mathematically identical to MSE, the term MSPE emphasizes the focus on prediction performance rather than model fit, which is an important distinction in applied machine learning.

Photo by Birmingham Museums Trust on Unsplash

Mean Directional Accuracy (MDA)

MDA measures the percentage of times that your model correctly predicts the direction of change (up or down) compared to the previous value. For instance, if you’re predicting stock prices, MDA tells you how often your model correctly predicts whether the price will go up or down, regardless of the magnitude.

MDA averages an indicator over the time steps: the indicator equals 1 when the predicted direction of change matches the actual direction, and 0 otherwise.
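There is no standard library function for MDA, so here is a small NumPy sketch of one common formulation, comparing each prediction’s implied direction against the previous actual value (the prices are hypothetical):

import numpy as np

def mda(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    actual_dir = np.sign(np.diff(y_true))         # did the series actually move up or down?
    pred_dir = np.sign(y_pred[1:] - y_true[:-1])  # which way did the model say it would move?
    return np.mean(actual_dir == pred_dir)

prices = [100, 103, 101, 104, 102]  # hypothetical actual prices
preds  = [101, 102, 104, 105, 101]  # hypothetical predictions
print(mda(prices, preds))           # 0.75: direction correct on 3 of 4 steps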

Case Study:

An investment firm evaluated their market trend prediction model using MDA. With an MDA of 68%, they knew their model correctly predicted market direction more than two-thirds of the time, which was valuable for trading strategies even if the exact price predictions weren’t perfect.

Use When:

  • The direction of change is more important than the exact value
  • In financial forecasting and trading models
  • Evaluating trend predictions

Significance:

MDA is particularly important in financial applications where predicting the direction correctly can be more valuable than predicting the exact magnitude. It focuses on a different aspect of predictive performance than error-based metrics.

Median Absolute Deviation (MAD)

MAD measures the median of the absolute deviations from the median of the errors. It’s a robust statistic that’s less influenced by outliers than mean-based metrics.

MAD tells you the typical size of your error, but isn’t skewed by occasional very large errors.
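A small NumPy sketch following the definition above (the travel times are hypothetical); note that scikit-learn’s related median_absolute_error instead takes the median of the absolute errors directly:

import numpy as np

def mad(y_true, y_pred):
    # Median of the absolute deviations of the errors from their own median
    errors = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return np.median(np.abs(errors - np.median(errors)))

# Hypothetical travel times with one extreme event
y_true = [30, 32, 29, 31, 90]
y_pred = [31, 30, 30, 33, 35]
print(mad(y_true, y_pred))  # barely affected by the single extreme error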

Case Study:

A traffic prediction system used MAD to evaluate performance because occasional extreme traffic events (accidents, sports games) would skew mean-based metrics. MAD provided a more stable measure of typical prediction accuracy.

Use When:

  • Data contains outliers
  • You want a robust measure of typical error
  • Median is a better measure of central tendency than the mean for your data

Significance:

MAD is particularly valuable in domains where outliers are common but not the primary focus of prediction quality, such as traffic prediction, demand forecasting with occasional spikes, or any domain with heavy-tailed error distributions.

Mean Poisson Deviance (MPD)

MPD is a specialized metric based on the Poisson probability distribution. It’s appropriate for count data where the variance is expected to equal the mean.

MPD is designed specifically for evaluating predictions of count data, like the number of customer arrivals, disease cases, or product sales.
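A minimal sketch with scikit-learn’s mean_poisson_deviance on hypothetical daily case counts (predictions must be strictly positive):

from sklearn.metrics import mean_poisson_deviance

# Hypothetical daily case counts and model predictions
y_true = [4, 7, 0, 12, 3]
y_pred = [3.5, 8.0, 0.5, 10.0, 4.0]

print(mean_poisson_deviance(y_true, y_pred))  # lower is better; 0 means a perfect fit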

Case Study:

An epidemiology team used MPD to evaluate their model predicting the number of new disease cases across different regions. This metric was appropriate because disease counts typically follow a Poisson distribution.

Use When

  • Predicting count data (non-negative integers)
  • Variance of data is approximately equal to the mean
  • In fields like epidemiology, call center management, or inventory of discrete items

Significance:

MPD is derived from statistical theory and provides an appropriate loss function for Poisson-distributed data. It’s particularly important in fields where count data prediction is common.

Mean Gamma Deviance (MGD)

MGD is based on the Gamma probability distribution and is appropriate for continuous, positive data with variance proportional to the square of the mean.

MGD is designed for evaluating predictions of positive, continuous quantities where larger values have larger variability, such as insurance claim amounts or rainfall volumes.
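A minimal sketch with scikit-learn’s mean_gamma_deviance on hypothetical claim amounts (both actuals and predictions must be strictly positive):

from sklearn.metrics import mean_gamma_deviance

# Hypothetical insurance claim amounts, in dollars
y_true = [1_200.0, 450.0, 9_800.0, 2_300.0]
y_pred = [1_000.0, 500.0, 7_500.0, 2_600.0]

print(mean_gamma_deviance(y_true, y_pred))  # lower is better; sensitive to relative errors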

Case Study:

An insurance company used MGD to evaluate their model for predicting claim amounts. Since larger claims naturally had more variability, MGD provided a more appropriate evaluation than metrics assuming constant variance.

Use When:

  • Predicting positive, continuous values
  • Variance of data increases with the mean
  • In fields like insurance, hydrology, or finance dealing with skewed distributions

Significance:

MGD is derived from statistical theory for Gamma-distributed data. It’s particularly valuable in domains where the coefficient of variation (standard deviation divided by mean) is roughly constant.

R² Score (Coefficient of Determination)

R² measures the proportion of variance in the dependent variable that is predictable from the independent variables. It ranges from 0 to 1 (or can be negative for very poor models).

R² tells you what percentage of the variation in the target variable the model explains. An R² of 0.7 means your model explains 70% of the variation in the data.
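A minimal sketch with scikit-learn’s r2_score (the prices are hypothetical):

from sklearn.metrics import r2_score

# Hypothetical house prices, in thousands of dollars
y_true = [310, 455, 290, 520, 380]
y_pred = [300, 480, 275, 510, 395]

print(r2_score(y_true, y_pred))  # share of variance explained; 1.0 is a perfect fit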

Case Study:

A team analyzing factors affecting house prices built a regression model with an R² of 0.82. They could confidently state that their model, which included features like square footage, neighborhood, and number of bedrooms, explained 82% of the variation in house prices in their dataset.

Use When:

  • You want to understand how much of the variance your model captures
  • Comparing different models on the same dataset
  • Communicating model performance to stakeholders familiar with statistics

Significance:

R² is perhaps the most widely recognized regression metric across fields. It provides an intuitive scale from 0 to 1 (though it can be negative for very poor models), making it easy to interpret. However, it can be misleadingly high when overfitting or when using many features.

D² Absolute Error Score

D² is similar to R² but uses absolute errors instead of squared errors. It measures the improvement over predicting the median (rather than the mean).

D² tells you how much better your model is compared to simply predicting the median value for every observation.
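A minimal sketch with scikit-learn’s d2_absolute_error_score, available in recent scikit-learn releases (the recovery times are hypothetical):

from sklearn.metrics import d2_absolute_error_score

# Hypothetical patient recovery times, in days
y_true = [10, 14, 21, 9, 30]
y_pred = [12, 15, 18, 10, 26]

print(d2_absolute_error_score(y_true, y_pred))  # 1.0 is perfect; 0.0 matches "always predict the median"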

Case Study:

A healthcare researcher developed a model to predict patient recovery times. With a D² of 0.65, they could explain that their model reduced the absolute error by 65% compared to simply predicting the median recovery time for all patients.

Use When:

  • Data contains outliers that would overly influence R²
  • Median is a better central tendency measure than the mean for your data
  • You want a metric based on absolute errors rather than squared errors

Significance:

D² provides an alternative to R² that is more robust to outliers and may better represent model performance for skewed distributions. It’s particularly useful in fields where absolute errors are more interpretable than squared errors.

Explained Variance Score

The Explained Variance Score measures the proportion of variance in the dependent variable that is explained by the model. It’s similar to R² but focuses specifically on variance explanation.

This metric shows how much of the variability in the target variable a model captures, without penalizing systematic bias as heavily as R².
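The difference from R² shows up when predictions carry a constant offset, as in this small sketch with scikit-learn’s explained_variance_score (the temperatures are hypothetical):

from sklearn.metrics import explained_variance_score, r2_score

# Hypothetical temperatures: predictions track the pattern but sit 2 degrees low
y_true = [20.0, 25.0, 22.0, 28.0]
y_pred = [18.0, 23.0, 20.0, 26.0]

print(explained_variance_score(y_true, y_pred))  # 1.0: the pattern of variation is fully captured
print(r2_score(y_true, y_pred))                  # lower, because the constant bias is penalized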

Case Study:

A climate scientist developed a model to predict temperature variations. The model had an Explained Variance Score of 0.75, indicating it captured 75% of the temperature variability, even though it consistently predicted temperatures that were slightly lower than actual (a systematic bias that would reduce R²).

Use When:

  • You want to focus on capturing variance patterns rather than absolute prediction accuracy
  • Systematic bias is less important than capturing the patterns of variation
  • Comparing models that might have different systematic biases

Significance:

Explained Variance provides insight into how well your model captures the patterns in your data, even if there are systematic offsets. It can be particularly useful when the pattern of variation is more important than the absolute values.

Conclusion

Regression metrics are not one-size-fits-all. The appropriate choice depends on your data characteristics, the specific problem you’re solving, and the needs of your stakeholders. By understanding the strengths, weaknesses, and appropriate use cases for each metric, you can make more informed decisions when evaluating and improving your regression models.

It’s often valuable to consider multiple metrics simultaneously, as they can provide complementary insights into a model’s performance. A model that performs well across several relevant metrics is likely to be more robust and useful in real-world applications.

It is also important to note that the field of predictive modeling continues to evolve, with new metrics and variations being developed to address specific challenges.


Published via Towards AI
