Unboxing the Concept of Drift in Machine Learning

Last Updated on July 17, 2023 by Editorial Team

Author(s): Supreet Kaur

Originally published on Towards AI.

Machine learning drift is a common phenomenon that occurs after a machine learning model is deployed to production. If it is not monitored closely and mitigated in time, it can degrade the overall performance of your model.

This article provides an overview of machine learning drift, describes its main types, and covers some practical techniques for detecting and mitigating it.

What is Machine Learning Drift?

Machine learning and AI models are built on the assumption that historical data is an accurate representation of the future. In a fast-changing world, this is rarely the case. The COVID-19 pandemic and the Russia-Ukraine war are two examples of unprecedented events that affected model predictions.

Drift is a phenomenon where a model's performance degrades over time: the performance observed in production falls below the performance measured during training. The decline can be gradual or sudden.

Types of Model Drift

The two main types of Model Drift are as follows:

Concept Drift: Concept Drift occurs when the relationship between the input variables and the target variable changes, even though the input data itself may look the same. A common cause is a change in user behavior. One example is the COVID-19 pandemic, which changed buyer behavior: people started purchasing more hand sanitizers and masks and spent less on travel. A consumer-focused model trained before the pandemic could not have predicted this behavior, so its accuracy decreased.
Data Drift: Data Drift occurs when the statistical properties of the input or output data change.

Data drift can further be divided into two types:

Label Drift: This occurs when the distribution of the output data shifts. For example, in a model that predicts whether an applicant should receive a credit card, label drift occurs if a much larger proportion of credit-worthy applications starts showing up.

Feature Drift: This occurs when the distribution of the input data shifts. For example, if income is one of the input variables of the credit card model described above, feature drift occurs if most applicants' incomes increase or decrease.
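
To make the distinction between these types concrete, here is a minimal sketch based on the credit card example. The toy data, the income threshold of 50, and the helper names are all hypothetical and chosen only for illustration. It compares the mean income (a feature property), the approval rate (a label property), and the accuracy of the original approval rule under each kind of drift.

```python
from statistics import mean

# Hypothetical toy data: (income in thousands, approved flag).
# The labels follow an assumed rule: approve if income >= 50.
training = [(30, 0), (40, 0), (55, 1), (60, 1), (45, 0), (70, 1)]

# Feature drift: every income rises by 20, the approval rule is unchanged.
# Note that the approval rate (the label distribution) shifts as a side effect.
feature_drifted = [(income + 20, int(income + 20 >= 50)) for income, _ in training]

# Concept drift: same incomes, but the bar for approval moved from 50 to 65.
concept_drifted = [(income, int(income >= 65)) for income, _ in training]


def summarize(rows):
    """Return (mean income, approval rate) for a list of (income, approved) rows."""
    incomes = [income for income, _ in rows]
    labels = [label for _, label in rows]
    return mean(incomes), mean(labels)


def old_rule_accuracy(rows, threshold=50):
    """Accuracy of the original 'approve if income >= threshold' rule on rows."""
    hits = sum(int(income >= threshold) == label for income, label in rows)
    return hits / len(rows)


for name, rows in [
    ("training", training),
    ("feature drift", feature_drifted),
    ("concept drift", concept_drifted),
]:
    print(name, summarize(rows), old_rule_accuracy(rows))
```

In this toy setup, the feature-drifted data has a different mean income and approval rate, but the original rule still classifies it correctly. The concept-drifted data has exactly the same incomes as the training data, yet the original rule's accuracy drops because the relationship between income and approval has changed.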

Causes of Drift

Changes in user behavior: User behavior commonly evolves over time, which changes the input data. This eventually shows up in the model performance.
Biased data: Data Drift can occur due to bias in the input data. By “bias” I mean that your training data might favor one population over another. This can also cause the model to be biased, leading to inaccurate model predictions.
Training data is not an accurate representation: It is possible that the data used to train the model is not an accurate representation of the data it sees in production, which can lead to Data Drift. For example, you might have used consumer data from the USA to train your model but launched the product in India; because users in each market have their own patterns, you will observe model deterioration.

Detecting Drift

The obvious way to detect drift is to monitor the model's accuracy in production. However, accuracy can be hard to calculate in some cases, for example when the true labels arrive late or not at all. In such cases, you can use alternative methods; two are described below:

Kolmogorov-Smirnov (K-S) test: The K-S test is a nonparametric test that can be used to compare the training data with post-training (production) data. The null hypothesis states that both datasets come from the same distribution. If the null hypothesis is rejected, we can conclude that the data has drifted.
Population Stability Index (PSI): PSI is another metric that detects population changes over time. As a common rule of thumb, PSI < 0.1 means no significant population change, 0.1 ≤ PSI < 0.2 means a moderate change, and PSI ≥ 0.2 means a significant population change.
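
Both checks can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not a production implementation: the function names are hypothetical, the K-S decision uses the large-sample critical value rather than an exact p-value, and the PSI uses ten equal-width bins over the reference range (quantile-based bins are another common choice). In practice, a library routine such as `scipy.stats.ks_2samp` would usually replace the hand-written K-S code.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def ks_rejects_null(reference, current, alpha=0.05):
    """Large-sample approximation: reject 'same distribution' if the
    statistic exceeds c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index over equal-width bins of the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical feature values: an evenly spread reference sample and a
# production sample in which every value has shifted upward by 30.
reference = [i / 10 for i in range(1000)]
shifted = [x + 30 for x in reference]

print(ks_statistic(reference, shifted), ks_rejects_null(reference, shifted))
print(psi(reference, reference), psi(reference, shifted))
```

For the shifted sample, the K-S statistic is far above the critical value and the PSI is well beyond the 0.2 threshold, so both checks flag the feature as drifted. Comparing the reference sample with itself gives a statistic and a PSI of zero.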

Dealing with Drift

Drift is an inevitable phenomenon, so it is better to be prepared and deploy the following mechanisms to detect it well in advance, giving you enough time to mitigate it.

Monitor the model: The model's performance is bound to change over time. A drop in performance doesn't always mean that the relationship between the input variables and the output has changed; it may simply mean that the model was not trained on this particular segment of data, so it doesn't know how to act on it. Hence, monitoring the model is necessary. Companies can develop their own frameworks to do that, or they can integrate existing tools such as Amazon SageMaker and Deepchecks.
Training and test data should be consistent: Training and test data should be in sync. Check that both cover the same time period and a similar location.
Retraining and redeployment: A scenario could exist where the only option is to retrain the model, and it is imperative to be prepared for it. At this point, it might make sense to analyze the feature importance and add or remove the features that are the leading cause of drift.
Data monitoring: Sudden changes in the data are one of the causes of data drift. It is vital to have data quality mechanisms in place that could flag issues with the data. This will also help you backtrack the data drift issue and assist in faster capture and remediation.
Unboxing the black box: Explainable AI and responsible AI are gaining popularity because they help you understand the model output. With such frameworks in place, you can get to the root of a shift in model performance much faster. Open-source frameworks you can leverage include AI Explainability 360 (AIX360) by IBM and the What-If Tool by Google. There are also popular techniques such as LIME (Local Interpretable Model-agnostic Explanations).
Data Quality Checks: It’s crucial to have Data Quality Checks in place. Sometimes the drift can be caused by deteriorating data quality. There could be bias in data causing the model performance to decay over time.
Developing Statistical Metrics: Model performance metrics can be used to track the performance of supervised learning models. Metrics such as the area under the ROC curve (AUC) can be put in place and tracked over time.
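
The monitoring and retraining ideas above can be combined into a simple check: compute a performance metric on each new batch of labeled production data and flag the model for retraining when the metric falls too far below the baseline measured at training time. The sketch below is one hedged way to do this. The function names, the example batches, and the tolerance of 0.05 are hypothetical, and the check assumes true labels are available for the batch. A library routine such as `sklearn.metrics.roc_auc_score` could replace the hand-written AUC.

```python
def roc_auc(labels, scores):
    """ROC AUC computed as the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def needs_retraining(baseline_auc, batch_auc, tolerance=0.05):
    """Flag a batch whose AUC is more than `tolerance` below the baseline."""
    return baseline_auc - batch_auc > tolerance


# Hypothetical labeled batches: true labels and the model's scores.
baseline_auc = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
batch_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

print(baseline_auc, batch_auc, needs_retraining(baseline_auc, batch_auc))
```

The tolerance is a design choice: a small value catches drift early but raises more false alarms, while a large value reduces noise at the cost of reacting later.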

Drift can seem to be a challenging problem to solve. However, with the proper mechanisms in place, it can be curbed and dealt with as it occurs.

Connect with Supreet Kaur on LinkedIn.

This blog was posted on the official website of symbl.ai. Thanks to the team for their support.
