Biggest Problem With ML Systems Today

Last Updated on August 11, 2022 by Editorial Team

Author(s): Astha Puri

Originally published on Towards AI.

Photo by Rain Bennett on Unsplash

So you built a machine learning model and deployed it in production. Most models built today never make it to production, so in this scenario we are among the rare few whose ML system is actually out there in the world. Hurray!

Only to see its performance worsen with time. Why? Why is it degrading? We checked everything. It performed great on every model evaluation metric there is. And still. The boss is annoyed, customers are complaining, and the business is heading to doom...

Welcome to the concept of drift.

Photo by Khamkéo Vilaysing on Unsplash

Drift happens when the operational environment of your ML system changes. The model was trained in one kind of environment; if that environment changes fundamentally over time, the model no longer predicts correctly. Hence the progression toward poorer and poorer performance.

How many times have you heard this at your company: ‘Yes, there was a data scientist before you who wanted to work on models and built this amazing model, which everyone was sold on. Then they left the company for more $$$. Their model no longer works, and now everyone is skeptical about using data science/ML altogether.’

If the model never performed well, a gazillion things could be the cause. But if it performed well at one point and you notice it degrading over time, drift is something you might want to look at seriously.

There are 2 types of drift that commonly occur:

  • Concept drift
  • Data drift (also called covariate shift or feature drift)

1. Concept Drift

This happens when something changes in the underlying environment in which the model was built, so that the relationship between the inputs and the outcome being predicted is no longer the one the model learned.

Consider digital and online scams, for example. Fraudsters constantly adapt to changing security and protection protocols online. Likewise, the definition of what is considered spam has evolved over time. These are fundamental changes in the environment.
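To make the idea concrete, here is a toy sketch rather than a real fraud model. It assumes a single hypothetical risk score `x` and a frozen model that learned "positive when x > 0.5". The inputs are drawn the same way before and after; only the true rule moves (from 0.5 to 0.7, both values chosen purely for illustration), which is what makes this concept drift.

```python
import random

rng = random.Random(42)


def model(x):
    """Frozen model: at training time it learned 'positive when x > 0.5'."""
    return x > 0.5


def accuracy(xs, labels):
    """Share of inputs where the frozen model agrees with the true label."""
    hits = sum(model(x) == y for x, y in zip(xs, labels))
    return hits / len(xs)


# Inputs come from the same distribution in both periods: no feature drift.
xs_then = [rng.random() for _ in range(5000)]
xs_now = [rng.random() for _ in range(5000)]

# Only the concept changes: the true boundary moves from 0.5 to 0.7
# (illustrative numbers, not taken from any real system).
labels_then = [x > 0.5 for x in xs_then]
labels_now = [x > 0.7 for x in xs_now]

acc_then = accuracy(xs_then, labels_then)
acc_now = accuracy(xs_now, labels_now)
print(f"accuracy before: {acc_then:.2f}, after: {acc_now:.2f}")
```

Because the inputs themselves look unchanged, a check that only compares feature distributions would likely miss this kind of drift; the drop shows up only once fresh labels are available.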

To fix a drift like this, we need to change the model design itself. We can:

  • add new features
  • change the ML model
  • or do both of the above

2. Data Drift

This affects the feature extraction part of ML modeling: the distribution of the input features shifts away from what the model saw in training. It can usually be fixed by retraining the model on new data.

Anything that degrades data quality can lead to data drift, for example badly calibrated sensors. Another example is an NLP model trained on millennial language that can no longer predict well because, with time, Gen Z have grown up and now generate a lot of data in Gen Z lingo.

It is a very real-world problem, and drift will happen. Unlike other problems in ML, such as overfitting or data leaks, there is nothing you can do to prevent it.
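The sensor case can be sketched in a few lines. This is a toy illustration under loud assumptions: a hypothetical temperature sensor, an invented "overheating above 30 degrees" label, an invented calibration offset of 8 degrees, and a one-feature threshold classifier standing in for the real model. It also assumes fresh labelled data is available for retraining.

```python
import random
import statistics

rng = random.Random(7)


def make_data(n, sensor_offset):
    """Hypothetical sensor: reading = true temperature + calibration offset."""
    rows = []
    for _ in range(n):
        true_temp = rng.uniform(10.0, 50.0)
        label = true_temp > 30.0  # assumed ground truth: "overheating"
        rows.append((true_temp + sensor_offset, label))
    return rows


def fit_threshold(rows):
    """Train a one-feature classifier: cut halfway between the class means."""
    pos = statistics.mean([r for r, y in rows if y])
    neg = statistics.mean([r for r, y in rows if not y])
    return (pos + neg) / 2


def accuracy(threshold, rows):
    """Share of rows where 'reading > threshold' matches the label."""
    return sum((r > threshold) == y for r, y in rows) / len(rows)


train = make_data(2000, sensor_offset=0.0)          # well-calibrated sensor
drifted_train = make_data(2000, sensor_offset=8.0)  # sensor now reads 8 too high
drifted_test = make_data(2000, sensor_offset=8.0)

old_threshold = fit_threshold(train)
new_threshold = fit_threshold(drifted_train)

acc_stale = accuracy(old_threshold, drifted_test)      # old model, new data
acc_retrained = accuracy(new_threshold, drifted_test)  # retrained on new data
print(f"stale model: {acc_stale:.2f}, retrained model: {acc_retrained:.2f}")
```

The model design stays the same here; only the training data is refreshed, which is the kind of fix described above for data drift.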

Good news: we can win the battle against drift

Photo by GR Stocks on Unsplash

Just as blood is a very good sign that you’ve been physically hurt, your model performing badly and its accuracy going down is a very good indicator that drift has happened in some form.
Is it a very good indicator? Yes.
Is it the best timing? No.

You want to identify drift before things have blown up and customers are complaining or, worse, churning. An additional drawback of waiting until drift shows up in model accuracy is that, even after identifying it, we don’t know which type of drift it is. So the operational impact is higher, and problem-solving takes longer.

A better approach might be to ‘monitor’ the model and identify early signs. This can be done by monitoring the intermediate stages of the model’s outputs, at the following points:

  • Feature extraction stage: by comparing the baseline distribution of each feature to its current distribution. If the gap is large, there might be a potential drift. This analysis can be done using statistical tests such as the Kolmogorov-Smirnov (K-S) test.
  • Modeling stage: we can compare basic patterns that appear in various layers of the model. An auxiliary model can be trained to distinguish patterns seen under normal conditions from those that are outliers relative to the training data.
  • Checking confidence: keeping a tab on the confidence of the model in its results can be a good indicator. If the confidence starts going down, it could be a potential flag for drift.
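The feature-distribution check in the first bullet can be sketched as follows. This is a minimal, standard-library-only version of the two-sample K-S test, using the usual large-sample critical value; the function names, the sample sizes, and the 0.05 significance level are assumptions for illustration. In practice, a library routine such as `scipy.stats.ks_2samp` would be the more likely choice.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    d = 0.0
    for x in a + b:  # the gap can only peak at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def drift_detected(baseline, current, alpha=0.05):
    """Flag drift when D exceeds the large-sample critical value for level alpha."""
    n, m = len(baseline), len(current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(baseline, current) > critical


rng = random.Random(0)
baseline = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # feature at training time
shifted = [rng.gauss(0.5, 1.0) for _ in range(2000)]   # production, mean moved by 0.5

print("D statistic:", round(ks_statistic(baseline, shifted), 3))
print("drift flagged:", drift_detected(baseline, shifted))
```

The same comparison could, in principle, be pointed at the model's confidence scores instead of a raw feature, which would cover the third bullet as well. With many features, running one test per feature will produce some false alarms by chance, so the threshold may need adjusting.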

Summary

Drift is inevitable. Even though it is something we cannot prevent during our modeling, there are still ways to detect it and fix it. The key lies in early detection so that it can be solved before it affects business operations, customer experience, reviews, and revenue.


Biggest Problem With ML Systems Today was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
