What is MLOps?
Last Updated on March 30, 2022 by Editorial Team
Author(s): Jeff Holmes
Originally published on Towards AI.
Guide to the MLOps process
MLOps is a set of methods and techniques to deploy and maintain machine learning (ML) models in production, reliably and efficiently. Thus, MLOps is the intersection of Machine Learning, DevOps, and Data Engineering.
Saying that MLOps is in a state of flux would be an understatement. The best advice I can give on MLOps is to hire someone who can see the “big picture”. Any competent software engineer can learn to use a particular MLOps platform, since doing so does not require an advanced degree. A common interviewing mistake, therefore, is to focus on the minutiae of a particular platform (AWS, GCP, Databricks, MLflow, etc.). The ideal MLOps engineer has experience with several MLOps platforms. In fact, deep experience with only a single platform is likely to become a liability, since 1) few companies stay with one cloud platform over time and 2) there are currently no standards for MLOps.
In fact, a major problem with MLOps is the lack of standards. Thus, each platform tends to use different terminology. Hopefully, SEI or IEEE will soon publish an AI Engineering guide to standardize the terminology similar to SWEBOK.
In general, the ML model process involves eight stages (which may also be preceded by data collection and/or data labeling):
- Data preparation
- Feature engineering
- Model design
- Model training and optimization
- Model evaluation
- Model deployment
- Model serving
- Model monitoring
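The eight stages above can be sketched as a simple pipeline of functions. This is a minimal, hypothetical sketch on a toy linear model; the function names and the dict-based hand-off between stages are assumptions, not a standard API:

```python
# Minimal sketch of the ML model stages as a pipeline of functions.
# All names and the dict-based hand-off between stages are illustrative.

def prepare_data(raw):
    # Data preparation: split the raw records into train/test sets
    cut = int(len(raw) * 0.8)
    return {"train": raw[:cut], "test": raw[cut:]}

def engineer_features(data):
    # Feature engineering: derive (input, target) pairs from prepared data
    return {split: [(x, x * x) for x in rows] for split, rows in data.items()}

def design_model():
    # Model design: here, a trivial linear model y = w * x
    return {"w": 0.0}

def train(model, features):
    # Model training and optimization: fit w by averaging the target/input ratio
    pairs = features["train"]
    model["w"] = sum(y / x for x, y in pairs if x) / max(1, len(pairs))
    return model

def evaluate(model, features):
    # Model evaluation: mean absolute error on the held-out split
    pairs = features["test"]
    return sum(abs(model["w"] * x - y) for x, y in pairs) / max(1, len(pairs))

def deploy(model):
    # Model deployment/serving: expose a callable prediction endpoint
    return lambda x: model["w"] * x

def run_pipeline(raw):
    data = prepare_data(raw)
    features = engineer_features(data)
    model = train(design_model(), features)
    mae = evaluate(model, features)  # model monitoring would track this over time
    return deploy(model), mae
```

In a real pipeline each stage would be a separately versioned, separately scheduled step, and the monitoring stage would recompute the evaluation metric on live traffic rather than a static test split.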
In contrast, ModelOps is focused on managing the full software development life cycle (SDLC) of a variety of AI models including machine learning, knowledge graphs, rules, optimization, natural language, and agent-based models.
The Machine Learning Workflow
Machine learning requires experimenting with a wide range of datasets, data preparation, and algorithms to build a model that maximizes some target metric(s). Once a model has been built, the next step would be to deploy the final model to a production system, monitor model performance, and continuously retrain the model on new data and compare it with alternative models.
Thus, being productive with machine learning can be challenging for several reasons:
- It is difficult to keep track of experiments. When working with files on a laptop or in a notebook, how can we tell which data, code, and parameters were used to obtain a particular result?
- It is difficult to reproduce code. Even if we carefully track the code versions and parameters, we still need to capture the entire environment (such as library dependencies) to reproduce the same results. This is even more challenging when working in a team or if we want to run the same code at scale on another platform (such as the cloud).
- There is no standard way to package and deploy models. Every data science team develops its own approach for each ML library that is used, so the link between the model and the code and parameters is often lost.
- There is no central store to manage models (versions and stage transitions). Without a central place to collaborate and manage the model lifecycle, data science teams will encounter challenges managing model stages.
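The tracking and reproducibility challenges above come down to recording, for every run, what produced a given result. A minimal sketch in plain Python (the record layout, field names, and `toy_train` function are assumptions for illustration, not any particular tool's format):

```python
import hashlib
import json
import platform
import sys
import time

# Minimal sketch of experiment tracking: each run records the parameters,
# a hash of the training code, and the environment alongside the resulting
# metrics, so a result can later be traced back to what produced it.

def track_run(train_fn, params, code_text):
    record = {
        "run_id": hashlib.sha1(f"{time.time()}".encode()).hexdigest()[:8],
        "params": params,
        # Hashing the source text pins the code version for this run
        "code_version": hashlib.sha1(code_text.encode()).hexdigest(),
        # Capturing the environment helps reproduce the run elsewhere
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.system(),
        },
    }
    record["metrics"] = train_fn(**params)
    return record

# Hypothetical training function: returns the metrics for this run.
def toy_train(lr, epochs):
    return {"loss": 1.0 / (lr * epochs)}

run = track_run(toy_train, {"lr": 0.1, "epochs": 10}, code_text="def toy_train(...)")
print(json.dumps(run, indent=2))
```

A real tracker would also record library dependency versions and pointers to the exact dataset, and store the records centrally rather than in memory.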
Although individual ML libraries provide solutions to some of these problems (such as model serving), you usually want to try multiple ML libraries to get the best results. An MLOps tool allows you to train, reuse, and deploy models with any library and package them into reproducible steps that other data scientists can use as a “black box” without needing to know which libraries you are using.
Keep in mind that trying to retrofit or apply MLOps piecemeal is a common mistake and is generally considered an antipattern.
Thus, an MLOps platform must provide at least five features to help manage the ML workflow:
- Tracking: an API for logging parameters, code versions, metrics, and artifacts when running machine learning code, and for visualizing the results later.
- Projects: a standard format for packaging reusable ML code.
- Models: a convention for packaging ML models in multiple flavors and a variety of tools to help in deploying them.
- Registry: a centralized model store, set of APIs, and UI for collaboratively managing the full lifecycle of ML models.
- Scalability: designed to scale to large data sets, large output files, and a large number of experiments.
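To make the Registry feature concrete, here is a minimal sketch of a model registry with versioning and stage transitions. The stage names (None, Staging, Production, Archived) follow a common convention; the class and method names are assumptions, not a real platform's API:

```python
# Minimal sketch of a model registry: versioned models with stage
# transitions (None -> Staging -> Production -> Archived).

class ModelRegistry:
    STAGES = ("None", "Staging", "Production", "Archived")

    def __init__(self):
        self._models = {}  # name -> list of version records

    def register(self, name, artifact):
        # Each registration of the same name creates a new version
        versions = self._models.setdefault(name, [])
        versions.append({"version": len(versions) + 1,
                         "artifact": artifact,
                         "stage": "None"})
        return versions[-1]["version"]

    def transition(self, name, version, stage):
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        record = self._models[name][version - 1]
        # Only one version of a model may be in Production at a time
        if stage == "Production":
            for other in self._models[name]:
                if other["stage"] == "Production":
                    other["stage"] = "Archived"
        record["stage"] = stage

    def get_production(self, name):
        # Serving code asks the registry which version is live
        for record in self._models[name]:
            if record["stage"] == "Production":
                return record
        return None
```

A typical use: register version 1 and promote it to Production; later, register version 2, validate it in Staging, then promote it, which automatically archives version 1.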
In summary, MLOps is a set of methods and techniques to deploy and maintain machine learning (ML) models. Since MLOps currently has no defined standards, it is important to see the “big picture”: the key concepts, stages, features, and challenges described above.
References
- MLflow Concepts
- MLOps Principles