Can We Reach Google’s MLOps Level 2 With Solely Self-hosted OSS?

Last Updated on October 5, 2024 by Editorial Team

Author(s): Houssem Ben Braiek

Originally published on Towards AI.

Photo by Jef Willemyns on Unsplash

That’s not my answer; it’s what ChatGPT told me, as you can see below.

Figure 1: Disclosure of my discussion with ChatGPT

In May 2021, Google released a whitepaper on MLOps that introduced me to the concept of MLOps maturity levels. Back then, MLOps was mainly a concern for big tech companies already running ML models in production, as they began facing challenges and needed practices to reduce chaos and stabilize their products. Nowadays, several startups and large tech companies have introduced tools to assist practitioners in adopting these practices at scale for the development of machine learning models.

Depending on your initial software expertise and setup, there are generally three approaches to adopt these tools: First, cloud providers offer end-to-end ML engineering platforms that follow MLOps practices — perfect for corporations that rely on the cloud and want to add AI features to their products. Second, there’s the option of using various SaaS tools, each specialized in a specific MLOps function. This requires a bit more knowledge, as you need to integrate these tools into your internal ML or data science workflows. Lastly, there’s the open-source route, where you can self-host and integrate these tools with your existing (often open-source) ML engineering stack. This option requires mastery of MLOps fundamentals and a willingness to climb the learning curve. However, it offers full control over your data and models, and often delivers a higher ROI once the platform is set up.

Keep in mind, there’s no black-and-white answer here. You might still choose to use some SaaS tools, depending on your case. In this blog, though, I’ll focus on how to reach MLOps maturity level 2 with a fully self-hosted, open-source stack. The good news? Most of these tools also have SaaS versions if you prefer to rely on third parties for certain steps.

Next, I’ll introduce a simple ML application, and we’ll work on increasing its MLOps maturity levels. This way, we’ll cover the minimum practices and tools needed to meet the requirements outlined by the whitepaper for each maturity level.

As I mentioned earlier, the third option requires that companies be comfortable with software engineering and already have an internal ML engineering stack in place before considering self-hosted OSS. Our starting point will be a production-ready software system that leverages an ML model.

In Figure 2, you can see a telemarketing application that filters prospects based on their profiles — such as income, family situation, and previous campaign records from the campaign database — along with the current socio-economic landscape. It employs an ML-based scoring technique to enhance conversion chances, allowing telemarketers to focus on high-potential prospects who are more likely to become customers.

Figure 2: ML-powered telemarketing web application (No MLOps)

This application is built as a web platform that utilizes the company’s campaign database along with socio-economic data retrieved periodically from specialized APIs. It leverages a machine learning REST API service that exposes a model to score prospects based on the current socio-economic trends. This REST service hosts a Scikit-learn-based model trained on historical datasets from previous campaigns and their corresponding socio-economic data.

We assume that the ML engineering team has successfully completed all the necessary steps — from data extraction and preprocessing to feature engineering, model optimization, validation, and deployment — following best practices. The logos of Docker and Kubernetes highlight the company’s commitment to DevOps practices. Both the web application and the ML service are containerized and deployed using an orchestrator, allowing for easy management, cross-platform deployment, and scalability. This setup ensures loose coupling between the two applications, enabling deployment patterns like canary releases, where we can route, say, 5% of prediction requests to a new model version deployed in parallel.

The additional logo for GitHub Actions indicates that CI/CD pipelines are established to ensure all development efforts conform to the software engineering practices set by the company. This systematic delivery pipeline facilitates the deployment of new features and models with every release.

So, you might think that after this introduction, we’re all set with a production-ready AI software system that we can evolve and improve over time. This brings us to the question: What can MLOps actually do for us?

Before diving into how we’ll apply MLOps levels, it’s important to take a moment to understand why we might not get very far with ML, even with all this solid software engineering in place.

The ML system consists of three key components: data, code, and model. So far, we’ve focused on versioning and managing the code. However, the data used and the model produced are two artifacts — one being the input and the other the output of the ML engineering process — that aren’t well versioned, managed, or synced with the code and the production system. This lack of synchronization makes reproducibility challenging, and evolving the model often feels like starting from scratch, with only the code being reusable.

Additionally, there are numerous hyperparameters to experiment with for each combination of code, data, and model, all in pursuit of finding the best trio to deploy. The costs associated with running experiments and extracting lessons learned can be substantial, given the computational power and expertise required. Relying on teams to keep track of these experiments in their own ways may boost productivity, but it won’t enable the company to scale effectively.

Furthermore, the ML model engineering typically occurs on a snapshot of production data, which is expected to represent all future data — something that will never be the case. Realistically, we must accept that this model will become obsolete over time due to data or concept drift, necessitating frequent retraining and redeployment alongside new features or bug fixes. Thus, having CI/CD in place won’t be much help; the ML engineering team still has to perform the entire engineering process to produce the new optimized model before launching CI/CD to ensure the ML service uses the latest model weights. It’s like having only the last mile ready for us while still needing to walk all the miles each time.

These challenges are exactly what MLOps practices and tools aim to help practitioners overcome. Rather than tackling everything at once, you can think of MLOps levels as a way to gradually shift your practices from manual to automated, allowing you to manage the complexity of your ML engineering platform more effectively and address challenges based on priority.

This approach aligns with the foundational principle of ML engineering: “start simple and build complexity over time.” You might recall this from your ML courses, where we always encourage training a simple model on your dataset before diving into state-of-the-art neural networks. It’s the same principle we’re adopting here.

By now, you’re either convinced that MLOps is essential or at least somewhat persuaded — hey, that’s perfectly fine! Don’t just take my word for it. Let me convince you by concretely demonstrating how MLOps addresses these challenges in the following sections.

To achieve MLOps Level 0, you must first recognize that the AI system consists of code, data, and models. On one hand, data versioning is crucial for incrementally versioning datasets and keeping track of all the snapshots retrieved and constructed for developing ML models. DVC (Data Version Control) is an open-source tool that integrates with Git, making it easy to version your data and synchronize it with your code versions. On the other hand, you need a model registry to version models that are selected for online testing, currently in production, or archived after failing an online test or being in production for a while. The registered model should be linked to a specific dataset version and code version to ensure overall synchronization among the three components. MLflow offers an open-source model registry that can be self-hosted and comes with many features.
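
To make this concrete, here is a minimal sketch of how a pinned snapshot of the campaign data could be read back through DVC's Python API, assuming the dataset was already versioned with DVC in a Git repository; the repository URL, file path, and tag are purely illustrative.

```python
# Minimal sketch: reading an exact, versioned snapshot of the campaign data
# through DVC's Python API. Assumes the dataset was previously versioned
# (e.g. `dvc add data/campaigns.csv`) and the resulting .dvc file committed
# to Git; the repository URL, path, and revision below are illustrative.
import io

import pandas as pd
import dvc.api

raw = dvc.api.read(
    path="data/campaigns.csv",                          # file tracked by DVC
    repo="https://github.com/acme/telemarketing-ml",    # hypothetical repo
    rev="v1.2.0",                                       # Git tag/commit = dataset version
    mode="r",
)
campaigns = pd.read_csv(io.StringIO(raw))
print(campaigns.shape)
```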

Secondly, controlling the chaos of experimentation and avoiding the ephemerality of these valuable artifacts is essential. An experiment tracking tool is a must to track, record, and compare ML experiments during model selection and hyperparameter tuning across different iterations. We can utilize the MLflow experiment tracking tool to ensure that no experiments are wasted and that we maximize the value of all experiments conducted throughout the process. Using MLflow for both experiment tracking and model registry has the advantage of linking the registered model to the experiment that led to its creation, making it possible to track alternative experiments conducted during the engineering of that registered model. Plus, fewer tools mean less to learn and set up.
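
As a rough illustration, the sketch below tracks a training run and registers the resulting model with MLflow so that the model version stays linked to its experiment, code commit, and data version; the tracking URI, experiment name, tags, and hyperparameters are placeholders, not values from this article.

```python
# Minimal sketch: tracking a training run and registering the model in MLflow.
# The tracking URI, tags, and hyperparameters are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # self-hosted server (assumed)
mlflow.set_experiment("prospect-scoring")

X_train, y_train = make_classification(n_samples=500, n_features=10, random_state=0)

with mlflow.start_run():
    mlflow.set_tag("git_commit", "abc1234")     # code version (illustrative)
    mlflow.set_tag("dvc_data_rev", "v1.2.0")    # data version (illustrative)
    mlflow.log_params({"n_estimators": 200, "learning_rate": 0.05})

    model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05)
    model.fit(X_train, y_train)
    mlflow.log_metric("train_accuracy", model.score(X_train, y_train))

    # Log and register the model in one step; the registry keeps version history.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="prospect-scorer",
    )
```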

Thirdly, creating a REST API to expose ML models built with ML frameworks has become a repetitive software development task that is prone to bugs and can be costly to scale. This necessity has led to the emergence of ML model serving, which involves using tools specifically designed to establish a serving layer for your models built with supported ML frameworks. This serving layer ensures optimal resource usage and achieves the best possible latency, bandwidth, failure recovery, and security defenses based on the chosen deployment environment. For our use case with scikit-learn as the ML framework and MLflow as the model registry, I selected the MLServer tool, which enables high-performance and scalable serving of scikit-learn models registered in MLflow.
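
To give a sense of how little glue this requires, here is a sketch that generates an MLServer model-settings.json pointing at the model registered in MLflow and shows how the server would be started; the model name, registry stage, and tracking setup are assumptions, and exact support for registry URIs depends on your MLflow and MLServer versions.

```python
# Minimal sketch: configuring MLServer to serve the scikit-learn model
# registered in MLflow via the mlserver-mlflow runtime. Names and URIs are
# illustrative; adjust to your own registry and tracking setup.
import json

model_settings = {
    "name": "prospect-scorer",
    "implementation": "mlserver_mlflow.MLflowRuntime",
    "parameters": {
        # Any MLflow model URI should work here, e.g. a runs:/ URI or a
        # registry URI like the one below (assumes MLFLOW_TRACKING_URI is set).
        "uri": "models:/prospect-scorer/Production",
    },
}

with open("model-settings.json", "w") as f:
    json.dump(model_settings, f, indent=2)

# Then start the serving layer from the same directory:
#   mlserver start .
# MLServer exposes the model over REST and gRPC using the V2 inference protocol.
```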

Figure 3 illustrates the evolution of our ML platform after enabling MLOps maturity Level 0.

Figure 3: ML-powered telemarketing web application (at MLOps Level 0)

Now, let’s move on to MLOps Level 1, where we’ll tackle the challenge of model obsolescence over time. Retraining the model on new data is essential because the IID (independently and identically distributed) assumption on production data simply won’t hold. This means that a sample from production data, no matter how large, cannot represent all future data. As we discussed earlier, CI/CD pipelines can’t fully address this issue; they merely ensure that the best-fitted model, as selected by the team, is registered for use by the MLServer — nothing more, nothing less. Instead, Continuous Training (CT) of the ML model is the key to avoiding this inevitable obsolescence.

Implementing CT requires a shift in mindset, transforming the focus from the ML model to the ML pipeline itself. This is crucial because we cannot enable CT if manual steps and human intervention are still needed to deliver new models for novel production data. ML engineers must develop an ML pipeline that processes every production data sample to produce a new, best-fitted model ready for online testing, such as through canary deployment. However, this isn’t as simple as just wrapping the ML engineering code into a runnable package for any production dataset. The ML engineering code comprises multiple interconnected steps with different expected inputs and outputs. There are also loops over certain steps based on their results, and failures or non-optimal outcomes need to be managed.

To implement this ML engineering workflow effectively, we should use a workflow orchestration framework like Prefect, which allows us to ensure production readiness with minimal refactoring of the code. However, there are two manual steps in conventional ML engineering that require human oversight: data validation and model validation before running experiments. Automating these steps purely through code is risky; if the data doesn’t meet minimum expectations, subsequent steps can either crash or fail silently, resulting in suboptimal models. Therefore, we should employ appropriate data validation and model validation tools that can translate our data and model requirements into metrics-driven validators that can run within a Prefect workflow. For our application, we can use Great Expectations for data validation and Deepchecks for model validation, creating an end-to-end automated pipeline for our prospect scoring ML model.
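
To give you an idea of the shape of such a pipeline, here is a stripped-down Prefect flow; the validation tasks are deliberate placeholders standing in for a Great Expectations checkpoint and a Deepchecks suite, and the data, model, and acceptance threshold are illustrative.

```python
# Minimal sketch of the continuous-training pipeline as a Prefect flow.
# The validation tasks are placeholders: in a real pipeline they would run a
# Great Expectations checkpoint and a Deepchecks suite.
import pandas as pd
from prefect import flow, task
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


@task
def extract_data() -> pd.DataFrame:
    # Stand-in for pulling the latest campaign + socio-economic snapshot.
    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    df = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])
    df["converted"] = y
    return df


@task
def validate_data(df: pd.DataFrame) -> pd.DataFrame:
    # Placeholder for a Great Expectations checkpoint: fail fast if the
    # snapshot does not meet minimum expectations.
    assert not df.empty and df["converted"].isin([0, 1]).all()
    return df


@task
def train_model(df: pd.DataFrame):
    X, y = df.drop(columns="converted"), df["converted"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = GradientBoostingClassifier().fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return model, auc


@task
def validate_model(model_and_auc, threshold: float = 0.8) -> bool:
    # Placeholder for a Deepchecks model-evaluation suite plus a hard
    # acceptance gate before registration.
    _, auc = model_and_auc
    return auc >= threshold


@flow
def prospect_scoring_pipeline():
    df = validate_data(extract_data())
    result = train_model(df)
    if validate_model(result):
        # Register the candidate in MLflow here (see the earlier sketch).
        print("Candidate model accepted for registration")
    else:
        print("Candidate model rejected")


if __name__ == "__main__":
    prospect_scoring_pipeline()
```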

Figure 4 illustrates our application after placing the ML pipeline at the center of ML engineering.

Figure 4: ML-powered telemarketing web application (at MLOps Level 1)

I know what you were looking at… Yes, I’ve added a monitoring solution using EvidentlyAI, connected to a MongoDB NoSQL database, which is suitable for handling the prospects data. Monitoring is crucial at this MLOps maturity level because we need to keep an eye on data drift and model degradation to trigger retraining, rather than relying on periodic workflow jobs that can incur unnecessary costs and risks. Plus, it’s not always feasible to have ground truth labels for retraining the model, so logging input data and model predictions becomes essential. Last but not least, a solid monitoring solution allows practitioners to oversee the ML system’s performance and receive notifications if degradation occurs and persists, even after retraining attempts.
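
As a quick illustration, the sketch below runs an Evidently drift report against logged production data and decides whether to trigger retraining; it assumes the Report API of Evidently's 0.4.x releases (newer versions reorganize the API), and the file paths stand in for data read back from MongoDB.

```python
# Minimal sketch: checking for dataset drift with Evidently and using the
# result to trigger retraining. Assumes the 0.4.x Report API; the paths and
# result-dict keys may differ in other versions.
import pandas as pd
from evidently.report import Report
from evidently.metrics import DatasetDriftMetric

reference = pd.read_csv("data/training_snapshot.csv")  # training-time snapshot (illustrative path)
current = pd.read_csv("data/logged_requests.csv")      # inputs logged by the service (illustrative path)

report = Report(metrics=[DatasetDriftMetric()])
report.run(reference_data=reference, current_data=current)

result = report.as_dict()["metrics"][0]["result"]
if result.get("dataset_drift"):
    # e.g. kick off the Prefect flow from the previous sketch
    print("Data drift detected: trigger retraining")
```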

At this level, we effectively address the challenges I introduced earlier. Most of the MLOps tools you may have heard about are in use here. Importantly, there are numerous alternatives to these tools, even while adhering to the initial conditions of open-source and self-hosted deployment options. The tools selected were chosen for their ease of use, maturity, feature richness, and their specific focus or effectiveness on one or a few MLOps practices.

However, this is not the end; we still have another level to reach. I’ve included the Docker and Kubernetes logos to emphasize that self-hosting is implemented correctly, following DevOps practices. Additionally, we have developed and tested a new software solution, the ML pipeline, as well as another one for monitoring.

Simply put, MLOps level 2 is a return to the roots, emphasizing the need for CI/CD for the newly added software solutions, namely the ML pipeline and the ML monitoring service. This ensures that these solutions continue to evolve, incorporating new features and improvements to meet business requirements. Beyond that, companies typically end up developing several ML pipelines, each producing its own model for a different business objective. Scaling and maintaining these pipelines within an MLOps platform requires the application of software engineering and DevOps practices, just like any other software product; it also facilitates the reuse of components and modules across the organization’s different machine learning pipelines.
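
To make that concrete, here is the kind of unit test that would run in the CI/CD pipeline (for example, in GitHub Actions) on every change to the pipeline code; the feature-engineering step under test is hypothetical, not part of any library.

```python
# Minimal sketch: treating the ML pipeline as a software product by unit
# testing a reusable feature-engineering step in CI. The function under test
# is hypothetical.
import pandas as pd


def add_income_per_dependent(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical reusable pipeline step shared across the company's pipelines.
    out = df.copy()
    out["income_per_dependent"] = out["income"] / (out["dependents"] + 1)
    return out


def test_add_income_per_dependent_handles_zero_dependents():
    df = pd.DataFrame({"income": [50_000.0], "dependents": [0]})
    result = add_income_per_dependent(df)
    assert result.loc[0, "income_per_dependent"] == 50_000.0
```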

Figure 5 shows this subtle addition to the previous schema, with the changes highlighted in red.

Figure 5: ML-powered telemarketing web application (at MLOps Level 2)

To conclude, I want to emphasize that the essence of MLOps lies in its practices, which form the foundation for effective machine learning in operations; the tools are just there to help you adopt the practices more easily. There’s a reason why a lot of posts around MLOps make it sound like it’s all about using a specific tool or a set of tools. Let me leave it up to you to figure out what that might be…

Published via Towards AI
