Serving Python Machine Learning Models With Ease
Last Updated on April 19, 2022 by Editorial Team
Author(s): Ed Shee
Originally published on Towards AI.

Ever trained a new model and just wanted to use it through an API straight away? Sometimes you don't want to bother writing Flask code or containerizing your model and running it in Docker. If that sounds like you, you definitely want to check out MLServer. It's a Python-based inference server that recently went GA, and what's really neat about it is that it's also a high-performance server designed for production environments. That means that, by serving models locally, you are running them in the same environment they will be in when they get to production.
This blog walks you through how to use MLServer, using a couple of image models as examples.
Dataset
The dataset we're going to work with is the Fashion MNIST dataset. It contains 70,000 greyscale images of clothing, each 28×28 pixels, across 10 different classes (top, dress, coat, trousers, etc.).
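To make the image layout concrete, here is a minimal sketch of how one flattened sample maps back onto the 28×28 grid. It assumes each sample is stored as 784 pixel values in row-major order; the to_grid helper is an illustrative name, and the all-zero sample is made up purely to show the shapes involved.

```python
IMAGE_SIDE = 28  # each image is 28x28 pixels


def to_grid(flat_pixels):
    """Reshape 784 flat pixel values into 28 rows of 28 (assumes row-major order)."""
    expected = IMAGE_SIDE * IMAGE_SIDE
    if len(flat_pixels) != expected:
        raise ValueError(f"expected {expected} pixels, got {len(flat_pixels)}")
    return [
        flat_pixels[row * IMAGE_SIDE:(row + 1) * IMAGE_SIDE]
        for row in range(IMAGE_SIDE)
    ]


# Hypothetical sample: an all-black image, just to show the shapes involved.
grid = to_grid([0] * 784)
print(len(grid), len(grid[0]))
```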
If you want to reproduce the code from this blog, make sure you download the files and extract them into a folder named data. They have been omitted from the GitHub repo because they are quite large.
Training the Scikit-learn Model
First up, we're going to train a support vector machine (SVM) model using the scikit-learn framework. We'll then save the model to a file named Fashion_MNIST.joblib.
import time

import joblib
import pandas as pd
from sklearn import svm

# Load training data
train = pd.read_csv('../../data/fashion-mnist_train.csv', header=0)
y_train = train['label']
X_train = train.drop(['label'], axis=1)
classifier = svm.SVC(kernel="poly", degree=4, gamma=0.1)

# Train model
start = time.time()
classifier.fit(X_train.values, y_train.values)
end = time.time()
exec_time = end - start
print(f'Execution time: {exec_time} seconds')

# Save model (the file name must match the "uri" in model-settings.json)
joblib.dump(classifier, "Fashion_MNIST.joblib")
Note: The SVM algorithm is not particularly well suited to large datasets because its training time grows at least quadratically with the number of samples. Depending on your hardware, the model in this example will take a couple of minutes to train.
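As a rough back-of-the-envelope illustration of what quadratic scaling means, the sketch below estimates how much longer training takes as the sample count grows. The exponent of 2 is an assumed lower bound rather than a measurement, and the 10,000-row subsample is hypothetical, so treat the result as indicative only.

```python
def relative_training_cost(n_small, n_large, exponent=2):
    """Estimate how many times slower training on n_large samples is than on
    n_small samples, assuming cost grows as n ** exponent (an assumption)."""
    return (n_large / n_small) ** exponent


# Going from a hypothetical 10,000-row subsample to 60,000 training rows.
cost_ratio = relative_training_cost(10_000, 60_000)
print(cost_ratio)
```

If training takes too long on your machine, fitting on a subsample first is a cheap way to check the pipeline end to end.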
Serving the Scikit-learn Model
OK, so we've now got a saved model file, Fashion_MNIST.joblib. Let's take a look at how we can serve it using MLServer.
First up, we need to install MLServer.
pip install mlserver
The additional runtimes are optional but make life really easy when serving models. We'll install the Scikit-Learn and XGBoost ones too:
pip install mlserver-sklearn mlserver-xgboost
You can find details on all of the inference runtimes in the MLServer documentation.
Once we've done that, all we need to do is add two configuration files:
- settings.json – This contains the configuration for the server itself.
- model-settings.json – As the name suggests, this file contains the configuration for the model we want to run.
For our settings.json file it's enough to just define a single parameter:
{
    "debug": "true"
}
The model-settings.json file requires a few more bits of info as it needs to know about the model we're trying to serve:
{
    "name": "fashion-sklearn",
    "implementation": "mlserver_sklearn.SKLearnModel",
    "parameters": {
        "uri": "./Fashion_MNIST.joblib",
        "version": "v1"
    }
}
The name parameter should be self-explanatory. It gives MLServer a unique identifier, which is particularly useful when serving multiple models (we'll come to that in a bit). The implementation defines which pre-built server, if any, to use. It is heavily coupled to the machine learning framework used to train your model. In our case, we trained the model using scikit-learn, so we're going to use the scikit-learn implementation for MLServer. For the model parameters, we just need to provide the location of our model file as well as a version number.
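If you prefer to generate these files from a script, the sketch below writes both of them with Python's standard json module. The values are the ones shown above; the write_configs helper is an illustrative name rather than part of MLServer, and the demo writes into a temporary directory so it does not touch your project.

```python
import json
import tempfile
from pathlib import Path

SERVER_SETTINGS = {"debug": "true"}

MODEL_SETTINGS = {
    "name": "fashion-sklearn",
    "implementation": "mlserver_sklearn.SKLearnModel",
    "parameters": {
        "uri": "./Fashion_MNIST.joblib",
        "version": "v1",
    },
}


def write_configs(target_dir):
    """Write settings.json and model-settings.json into target_dir (hypothetical helper)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    (target / "settings.json").write_text(json.dumps(SERVER_SETTINGS, indent=2))
    (target / "model-settings.json").write_text(json.dumps(MODEL_SETTINGS, indent=2))
    return target


# Demo: write to a throwaway directory and read the model config back.
with tempfile.TemporaryDirectory() as tmp:
    written = write_configs(tmp)
    reloaded_model = json.loads((written / "model-settings.json").read_text())
    reloaded_server = json.loads((written / "settings.json").read_text())

print(reloaded_model["name"])
```

Calling write_configs(".") next to the saved model file should produce the same two files as above.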
That's it, two small config files and we're ready to serve our model using the command:
mlserver start .
Boom, we've now got our model running on a production-ready server locally. It's now ready to accept requests over HTTP and gRPC (default ports 8080 and 8081 respectively).
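Before sending predictions, it can be handy to confirm that the server is actually up. The sketch below checks the server readiness endpoint defined by the V2 inference protocol, which is the protocol the REST URLs in this post follow. It assumes the default HTTP port mentioned above and uses only the standard library; the function names are illustrative.

```python
import urllib.request


def readiness_url(host="localhost", port=8080):
    """Build the V2 protocol server-readiness URL (default HTTP port assumed)."""
    return f"http://{host}:{port}/v2/health/ready"


def server_is_ready(url, timeout=2.0):
    """Return True if the server answers the readiness check with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts and HTTP error responses.
        return False


ready_url = readiness_url()
print(ready_url)
```

Calling server_is_ready(ready_url) should return True once the server has finished starting up, and False while nothing is listening on that port.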
Testing the Model
Now that our model is up and running, let's send some requests to see it in action.
To make predictions on our model, we need to send a POST request to the following URL:
http://localhost:8080/v2/models/<MODEL_NAME>/versions/<VERSION>/infer
That means that, to access the scikit-learn model that we trained earlier, we need to replace MODEL_NAME with fashion-sklearn and VERSION with v1.
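A small helper makes that substitution explicit. This is only a convenience sketch: infer_url is an illustrative name, and the host and port defaults assume the local server described above.

```python
def infer_url(model_name, version, host="localhost", port=8080):
    """Build the inference URL for a given model name and version."""
    return f"http://{host}:{port}/v2/models/{model_name}/versions/{version}/infer"


sklearn_endpoint = infer_url("fashion-sklearn", "v1")
print(sklearn_endpoint)
```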
The code below shows how to import the test data, make a request to the model server and then compare the result with the actual label:
import pandas as pd
import requests

# Import test data, grab the first row and corresponding label
test = pd.read_csv('../../data/fashion-mnist_test.csv', header=0)
y_test = test['label'][0:1]
X_test = test.drop(['label'], axis=1)[0:1]

# Prediction request parameters
inference_request = {
    "inputs": [
        {
            "name": "predict",
            "shape": X_test.shape,
            "datatype": "FP64",
            "data": X_test.values.tolist()
        }
    ]
}
endpoint = "http://localhost:8080/v2/models/fashion-sklearn/versions/v1/infer"

# Make request and print response
response = requests.post(endpoint, json=inference_request)
print(response.text)
print(y_test.values)
When running the test.py code above we get the following response from MLServer:
{
    "model_name": "fashion-sklearn",
    "model_version": "v1",
    "id": "31c3fa70-2e56-49b1-bcec-294452dbe73c",
    "parameters": null,
    "outputs": [
        {
            "name": "predict",
            "shape": [
                1
            ],
            "datatype": "INT64",
            "parameters": null,
            "data": [
                0
            ]
        }
    ]
}
You'll notice that MLServer has generated a request id and automatically added metadata about the model and version that was used to serve our request. Capturing this kind of metadata is super important once our model gets to production; it allows us to log every request for audit and troubleshooting purposes.
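As a sketch of what that logging could look like, the snippet below pulls the identifying fields out of a response and emits them through Python's logging module. The audit_record helper and the logger name are hypothetical; the response is a trimmed copy of the one shown above.

```python
import json
import logging

logger = logging.getLogger("inference-audit")  # hypothetical logger name


def audit_record(response_body):
    """Extract the fields worth keeping for audit from a V2 inference response."""
    return {
        "request_id": response_body["id"],
        "model_name": response_body["model_name"],
        "model_version": response_body["model_version"],
    }


# Trimmed copy of the response returned by MLServer in the example above.
response_body = json.loads("""
{
  "model_name": "fashion-sklearn",
  "model_version": "v1",
  "id": "31c3fa70-2e56-49b1-bcec-294452dbe73c",
  "outputs": [{"name": "predict", "shape": [1], "datatype": "INT64", "data": [0]}]
}
""")

record = audit_record(response_body)
logger.info("served inference request: %s", json.dumps(record))
```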
You might also notice that MLServer has returned an array for outputs. In our request, we only sent one row of data but MLServer also handles batch requests and returns them together. You can even use a technique called adaptive batching to optimize the way multiple requests are handled in production environments.
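To illustrate how a batch is expressed in a single request, here is a minimal sketch that packs several rows into one input. The build_inference_request helper is an illustrative name; the field values mirror the request used earlier, and the two tiny rows stand in for real 784-pixel samples.

```python
def build_inference_request(rows, name="predict", datatype="FP64"):
    """Pack a list of equally sized rows into a single inference request."""
    if not rows:
        raise ValueError("at least one row is required")
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same number of values")
    return {
        "inputs": [
            {
                "name": name,
                "shape": [len(rows), n_cols],  # [batch size, values per row]
                "datatype": datatype,
                "data": rows,
            }
        ]
    }


# Two hypothetical 3-value rows; a real request would carry 784 values per row.
batch_request = build_inference_request([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
print(batch_request["inputs"][0]["shape"])
```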
In our example above, the model's prediction can be found in outputs[0].data, which shows that the model has labeled this sample with the category 0 (the value 0 corresponds to the category t-shirt/top). The true label for that sample was a 0 too, so the model got this prediction correct!
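Reading the prediction out of the response and turning it into a human-readable class can be sketched as follows. The predicted_labels helper is an illustrative name, the class names follow the standard Fashion MNIST labelling, and the response is a trimmed copy of the one above.

```python
# Standard Fashion MNIST class names, indexed by label value.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def predicted_labels(response_body):
    """Map each predicted class index in the first output to its class name."""
    return [CLASS_NAMES[int(index)] for index in response_body["outputs"][0]["data"]]


# Trimmed copy of the response shown above.
response_body = {
    "outputs": [
        {"name": "predict", "shape": [1], "datatype": "INT64", "data": [0]}
    ]
}

labels = predicted_labels(response_body)
print(labels)
```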
Training the XGBoost Model
Now that we've seen how to create and serve a single model using MLServer, let's take a look at how we'd handle multiple models trained in different frameworks.
We'll be using the same Fashion MNIST dataset but, this time, we'll train an XGBoost model instead.
import time

import pandas as pd
import xgboost as xgb

# Load training data
train = pd.read_csv('../../data/fashion-mnist_train.csv', header=0)
y_train = train['label']
X_train = train.drop(['label'], axis=1)
dtrain = xgb.DMatrix(X_train.values, label=y_train.values)

# Train model
params = {
    'max_depth': 5,
    'eta': 0.3,
    'verbosity': 1,
    'objective': 'multi:softmax',
    'num_class': 10
}
num_round = 50
start = time.time()
bstmodel = xgb.train(params, dtrain, num_round, evals=[(dtrain, 'label')], verbose_eval=10)
end = time.time()
exec_time = end - start
print(f'Execution time: {exec_time} seconds')

# Save model
bstmodel.save_model('Fashion_MNIST.json')
The code above, used to train the XGBoost model, is similar to the code we used earlier to train the scikit-learn model but this time our model has been saved in an XGBoost-compatible format as Fashion_MNIST.json.
Serving Multiple Models
One of the cool things about MLServer is that it supports multi-model serving. This means that you don't have to create or run a new server for each ML model you want to deploy. Using the models we built above, we'll use this feature to serve them both at once.
When MLServer starts up, it will search the directory (and any subdirectories) for model-settings.json files. If you've got multiple model-settings.json files then it'll automatically serve them all.
Note: you still only need a single settings.json (server config) file in the root directory.
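If you want to preview which models a given folder would expose, the sketch below walks a directory tree looking for model-settings.json files and reads the name from each. It only mimics the discovery behaviour described above using the standard library; it is not MLServer's own implementation, and discover_models is a hypothetical helper. The demo builds a throwaway tree in a temporary directory.

```python
import json
import tempfile
from pathlib import Path


def discover_models(root):
    """Return the sorted model names found in model-settings.json files under root."""
    names = []
    for settings_path in Path(root).rglob("model-settings.json"):
        settings = json.loads(settings_path.read_text())
        names.append(settings["name"])
    return sorted(names)


# Demo on a throwaway directory tree with two hypothetical model folders.
with tempfile.TemporaryDirectory() as tmp:
    for folder, model_name in [("sklearn", "fashion-sklearn"), ("xgboost", "fashion-xgboost")]:
        model_dir = Path(tmp) / "models" / folder
        model_dir.mkdir(parents=True)
        (model_dir / "model-settings.json").write_text(json.dumps({"name": model_name}))
    found = discover_models(tmp)

print(found)
```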
Here's a breakdown of my directory structure for reference:
.
├── data
│   ├── fashion-mnist_test.csv
│   └── fashion-mnist_train.csv
├── models
│   ├── sklearn
│   │   ├── Fashion_MNIST.joblib
│   │   ├── model-settings.json
│   │   ├── test.py
│   │   └── train.py
│   └── xgboost
│       ├── Fashion_MNIST.json
│       ├── model-settings.json
│       ├── test.py
│       └── train.py
├── README.md
├── settings.json
└── test_models.py
Notice that there are two model-settings.json files – one for the scikit-learn model and one for the XGBoost model.
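The XGBoost configuration is not shown in this post, so the following is a guess at what it could look like, written as a Python dict and dumped to JSON. The model name and version are the ones used when loading and querying the model in this post, and the file name comes from the training script. The implementation path is an assumption: it is the runtime class shipped with the mlserver-xgboost package.

```python
import json

# Assumed contents of models/xgboost/model-settings.json (not shown in the post).
xgboost_model_settings = {
    "name": "fashion-xgboost",
    "implementation": "mlserver_xgboost.XGBoostModel",  # assumed runtime class
    "parameters": {
        "uri": "./Fashion_MNIST.json",
        "version": "v1",
    },
}

xgboost_settings_json = json.dumps(xgboost_model_settings, indent=2)
print(xgboost_settings_json)
```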
We can now just run mlserver start . and it will start handling requests for both models.
[mlserver] INFO - Loaded model 'fashion-sklearn' succesfully.
[mlserver] INFO - Loaded model 'fashion-xgboost' succesfully.
Testing Accuracy of Multiple Models
With both models now up and running on MLServer, we can use the samples from our test set to validate how accurate each of our models is.
The following code sends a batch request (containing the full test set) to each of the models and then compares the predictions received to the true labels. Doing this across the whole test set gives us a reasonably good measure of each model's accuracy, which gets printed at the end.
import json

import pandas as pd
import requests

# Import the test data and split the data from the labels
test = pd.read_csv('./data/fashion-mnist_test.csv', header=0)
y_test = test['label']
X_test = test.drop(['label'], axis=1)

# Build the inference request
inference_request = {
    "inputs": [
        {
            "name": "predict",
            "shape": X_test.shape,
            "datatype": "FP64",
            "data": X_test.values.tolist()
        }
    ]
}

# Send the prediction request to the relevant model, compare responses to the test labels and calculate accuracy
def infer(model_name, version):
    endpoint = f"http://localhost:8080/v2/models/{model_name}/versions/{version}/infer"
    response = requests.post(endpoint, json=inference_request)

    # Calculate accuracy
    correct = 0
    for i, prediction in enumerate(json.loads(response.text)['outputs'][0]['data']):
        if y_test[i] == prediction:
            correct += 1
    accuracy = correct / len(y_test)
    print(f'Model Accuracy for {model_name}: {accuracy}')

infer("fashion-xgboost", "v1")
infer("fashion-sklearn", "v1")
The results show that the XGBoost model slightly outperforms the SVM scikit-learn one:
Model Accuracy for fashion-xgboost: 0.8953
Model Accuracy for fashion-sklearn: 0.864
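The accuracy figure is simply the share of predictions that match the true labels. As a minimal standalone sketch of that calculation (the accuracy helper and the sample values are illustrative, not taken from the test set):

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that equal the corresponding labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for predicted, actual in zip(predictions, labels) if predicted == actual)
    return correct / len(labels)


# Hypothetical predictions for five samples, four of which are right.
sample_accuracy = accuracy([0, 1, 2, 3, 4], [0, 1, 2, 3, 9])
print(sample_accuracy)
```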
Summary
Hopefully, by now you've gained an understanding of how easy it is to serve models using MLServer. For further info, it's worth reading the docs and taking a look at the examples for different frameworks.
All of the code from this example can be found in the accompanying GitHub repo.
Serving Python Machine Learning Models With Ease was originally published in Towards AI on Medium.