What is MLOps?
Author(s): Jeff Holmes MS MSCS
Originally published on Towards AI.
MLOps is a set of methods and techniques to deploy and maintain machine learning (ML) models in production reliably and efficiently. In practice, MLOps sits at the intersection of Machine Learning, DevOps, and Data Engineering (Figure 1).
Background
Saying that MLOps is in a state of flux would be an understatement [7]. The best advice I can give on MLOps is to try to hire someone who can see the "big picture". Any competent software engineer can learn to use a particular MLOps platform, since doing so does not require an advanced degree. Therefore, a common mistake when interviewing applicants is to focus on the minutiae of a particular platform (AWS, GCP, Databricks, MLflow, etc.). The ideal MLOps engineer has experience with several MLOps and/or DevOps platforms. In fact, deep experience with only a single platform will most likely be problematic in the future, since most companies seldom stay with one cloud platform over time and there are currently no standards for MLOps.
A major problem with MLOps is the lack of standards, which means that each platform tends to use different terminology. Hopefully, the SEI or IEEE will soon publish an AI engineering guide that standardizes the terminology, similar to SWEBOK. For now, I would recommend learning MLflow, since it is open-source and seems to be very popular.
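As a quick taste of what MLflow provides, the minimal sketch below logs the parameters and a metric for a single training run; the experiment name, parameter names, and metric value are illustrative placeholders, not part of any particular project.

```python
# Minimal MLflow tracking sketch; names and values are illustrative.
import mlflow

mlflow.set_experiment("churn-model-experiments")  # hypothetical experiment name

with mlflow.start_run(run_name="baseline"):
    # Record the inputs that produced this result so the run can be traced later.
    mlflow.log_param("n_estimators", 200)
    mlflow.log_param("max_depth", 8)

    # ... train and evaluate a model here ...
    validation_auc = 0.87  # placeholder metric value

    mlflow.log_metric("val_auc", validation_auc)
```

Running `mlflow ui` afterwards lets you browse and compare the logged runs in a web interface.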
Many people use the term "pipeline" in MLOps, which can be confusing since a pipeline is a computer science term that refers to a linear sequence with a single input and output. A better definition would use a directed acyclic graph (DAG), since the process may not be linear. Thus, the term workflow is a better description of the many kinds of processes that can be involved at any stage of the MLOps SDLC.
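To make the distinction concrete, here is a minimal sketch (with illustrative stage names) that represents a workflow as a DAG using Python's standard-library graphlib and prints one valid execution order. Unlike a linear pipeline, the graph can branch and rejoin.

```python
# A workflow as a directed acyclic graph (DAG) rather than a linear pipeline.
# Stage names are illustrative; graphlib is in the Python standard library (3.9+).
from graphlib import TopologicalSorter

# Each key lists the stages it depends on.
workflow = {
    "feature_engineering": {"data_preparation"},
    "data_validation":     {"data_preparation"},   # branches are allowed in a DAG
    "train_model":         {"feature_engineering", "data_validation"},
    "evaluate_model":      {"train_model"},
    "deploy_model":        {"evaluate_model"},
    "monitor_model":       {"deploy_model"},
}

# One valid execution order that respects every dependency.
print(list(TopologicalSorter(workflow).static_order()))
```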
The Machine Learning Model Process
In general, the ML model process involves eight stages (Figures 2 and 3), which may also include data collection and/or data labeling [1]; a minimal code sketch of the core stages follows the list:
- Data preparation
- Feature engineering
- Model design
- Model training and optimization
- Model evaluation
- Model deployment
- Model serving
- Model monitoring
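As a rough illustration, the sketch below strings the first few stages together on a synthetic dataset using scikit-learn. The function names, data, and hyperparameters are illustrative assumptions, not a prescribed implementation; deployment, serving, and monitoring are only indicated in a comment.

```python
# Illustrative skeleton of the core stages on synthetic data (scikit-learn assumed installed).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def prepare_data(n_rows: int = 1_000):
    """Data preparation: here, just generate a synthetic tabular dataset."""
    rng = np.random.default_rng(42)
    X = rng.normal(size=(n_rows, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y


def engineer_features(X):
    """Feature engineering: add a simple interaction feature."""
    return np.hstack([X, X[:, [0]] * X[:, [1]]])


def train_and_evaluate(X, y):
    """Model training, optimization (fixed hyperparameters here), and evaluation."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


X, y = prepare_data()
model, acc = train_and_evaluate(engineer_features(X), y)
print(f"test accuracy: {acc:.3f}")  # deployment, serving, and monitoring would follow
```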
In contrast, ModelOps is focused on managing the full software development life cycle (SDLC) of a variety of AI models, including machine learning, knowledge graphs, rules, optimization, natural language, and agent-based models (Figure 4).
The Machine Learning Workflow
Machine learning requires experimenting with a wide range of datasets, data preparation, and algorithms to build a model that maximizes some target metric(s). Once a model has been built, the next step would be to deploy the final model to a production system, monitor model performance, and continuously retrain the model on new data and compare it with alternative models.
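A hedged sketch of that retrain-and-compare step, assuming scikit-learn models and AUC as the target metric (the function name, model class, and metric choice are all illustrative), might look like this:

```python
# Retrain-and-compare gate: promote a candidate only if it beats the production model.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score


def retrain_and_compare(production_model, X_new, y_new, X_holdout, y_holdout):
    """Retrain a candidate on new data and keep whichever model scores higher
    on the same fresh holdout set."""
    candidate = GradientBoostingClassifier(random_state=0).fit(X_new, y_new)
    cand_auc = roc_auc_score(y_holdout, candidate.predict_proba(X_holdout)[:, 1])
    prod_auc = roc_auc_score(y_holdout, production_model.predict_proba(X_holdout)[:, 1])
    return candidate if cand_auc > prod_auc else production_model
```

A real promotion gate would typically also check statistical significance, latency, and business constraints before swapping models.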
Therefore, being productive with machine learning for real-world applications can be challenging for several reasons [3]:
- It is difficult to keep track of experiments. When working with files on a laptop or in a notebook, how do you tell which data, code, and parameters were used to obtain a particular result?
- It is difficult to reproduce code. Even if we carefully track the code versions and parameters, we still need to capture the entire environment (such as library dependencies) to reproduce the same results. This is even more challenging when working in a team or when running the same code at scale on another platform (such as the cloud); see the sketch after this list.
- There is no standard way to package and deploy models. Every data science team develops its own approach for each ML library that is used, so the link between the model and the code and parameters is often lost.
- There is no central store to manage models (versions and stage transitions). Without a central place to collaborate and manage the model lifecycle, data science teams will encounter challenges managing model stages.
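One common way to close the reproducibility gap is to record the code version and library environment with every run. The sketch below assumes MLflow, a git repository, and pip on the PATH; all three are assumptions rather than requirements, and MLflow can capture some of this automatically.

```python
# Hedged sketch: attach the code version and library environment to a run so the
# result can be reproduced later. File names and the use of git/pip are assumptions.
import subprocess
import mlflow

with mlflow.start_run():
    # Record exactly which code produced this run (assumes the project is a git repo).
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    mlflow.set_tag("git_commit", commit)

    # Capture the installed library versions and attach them to the run as an artifact.
    with open("requirements.txt", "w") as f:
        f.write(subprocess.check_output(["pip", "freeze"], text=True))
    mlflow.log_artifact("requirements.txt")
```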
Although individual ML libraries provide solutions to some of these problems (such as model serving), you usually want to try multiple ML libraries to get the best results. An MLOps tool allows you to train, reuse, and deploy models with any library and package them into reproducible steps that other data scientists can use as a "black box" without needing to know which libraries you are using.
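For example, MLflow's generic pyfunc interface lets a consumer load and call a logged model without knowing which library produced it. A minimal sketch, using a scikit-learn model purely as a stand-in:

```python
# Library-agnostic packaging sketch: log a scikit-learn model, then load it back
# through the generic pyfunc interface, which hides the underlying library.
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LinearRegression

X, y = np.arange(10).reshape(-1, 1), np.arange(10) * 2.0
model = LinearRegression().fit(X, y)

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, artifact_path="model")

# Any consumer can now load and use the model without knowing it is scikit-learn.
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded.predict(X[:3]))
```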
Keep in mind that trying to retrofit or apply MLOps piecemeal is a common mistake and would actually be considered a software design antipattern [5][6]. Surprisingly, some companies such as Nvidia are currently trying to do just this en masse across all their software development projects, which is not feasible and will likely prove problematic.
Thus, an MLOps platform must provide at least five features to help manage the ML workflow [3]:
- Tracking: an API for logging parameters, code versions, metrics, and artifacts when running machine learning code and later visualizing the results.
- Projects: a standard format for packaging reusable ML code.
- Models: a convention for packaging ML models in multiple flavors and a variety of tools to help in deploying them.
- Registry: a centralized model store, set of APIs, and UI for collaboratively managing the full lifecycle of ML models (see the registry sketch after this list).
- Scalability: designed to scale to large data sets, large output files, and a large number of experiments.
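As an example of the Registry feature, the sketch below registers a logged model and moves it into a lifecycle stage with MLflow. It assumes a database-backed tracking store (here, a local SQLite file), the model name is illustrative, and the stage-transition API varies somewhat between MLflow versions.

```python
# Hedged Model Registry sketch; assumes a database-backed tracking store.
import mlflow
import mlflow.sklearn
import numpy as np
from mlflow.tracking import MlflowClient
from sklearn.linear_model import LogisticRegression

# The registry requires a database-backed store; a local SQLite file is one option.
mlflow.set_tracking_uri("sqlite:///mlflow.db")

# Train and log a tiny model so there is something to register.
X, y = np.random.rand(20, 2), np.array([0, 1] * 10)
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(LogisticRegression().fit(X, y), artifact_path="model")

# Register the logged model under a name in the central model store.
result = mlflow.register_model(f"runs:/{run.info.run_id}/model", "churn-classifier")

# Move the new version through lifecycle stages (API naming varies by MLflow version).
client = MlflowClient()
client.transition_model_version_stage(
    name="churn-classifier", version=result.version, stage="Staging"
)
```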
Conclusion
MLOps is a set of methods and techniques to deploy and maintain machine learning (ML) models in production. However, MLOps does not yet have any defined standards, which is important to keep in mind. Seeing the "big picture" means understanding the key concepts, stages, features, and challenges of MLOps.
References
[1] J. S. Damji and M. Galarnyk, "Considerations for Deploying Machine Learning Models in Production," Towards Data Science, Nov. 19, 2021.
[2] B. Rogojan, "What Is MLOps And Why Your Team Should Implement It," SMB Lite, Nov. 25, 2020.
[3] "MLflow Concepts," MLflow Documentation, Last accessed: Aug. 19, 2022.
[4] L. Visengeriyeva, A. Kammer, I. Bär, A. Kniesz, and M. Plöd, "MLOps Principles," ml-ops.org, Last accessed: Aug. 19, 2022.
[5] P. P. Ippolito, "Design Patterns in Machine Learning for MLOps," Towards Data Science, Jan. 12, 2022.
[6] Abhijith C, "How (not) to do MLOps," Towards Data Science, Jan. 10, 2022.
[7] M. Eric, "MLOps Is a Mess But That's to be Expected," KDnuggets, Mar. 25, 2022.
[8] "MLflow Quickstart," MLflow Documentation, Last accessed: Aug. 19, 2022.
[9] Kedion, "Managing Machine Learning Lifecycles with MLflow," Medium, Jan. 25, 2022.
J. Demsar, T. Curk, A. Erjavec, C. Gorup, T. Hocevar, M. Milutinovic, M. Mozina, M. Polajnar, M. Toplak, A. Staric, M. Stajdohar, L. Umek, L. Zagar, J. Zbontar, M. Zitnik, and B. Zupan, "Orange QuickStart," Orange Documentation, Last accessed: Aug. 19, 2022.
Y. Prakash, "A Quick Guide To ML Experiment Tracking - With Weights & Biases," Towards Data Science, Mar. 7, 2022.
wallaroo.ai, "How to Evaluate Different Machine Learning Deployment Solutions," Medium, Mar. 3, 2022.
A. Lamberti, "4 MLOps Tools To Deploy Your Machine Learning Model," Artificialis, Jan. 5, 2022.