Deploying Custom Detectron2 Models with a REST API: A Step-by-Step Guide.
Author(s): Gennaro Daniele Acciaro
Originally published on Towards AI.
In the life of a Machine Learning Engineer, training a model is only half the battle.
Indeed, even a neural network that accurately predicts all the test data remains useless unless it's made accessible to the world.
Model deployment is the process of making a model accessible and usable in production environments, where it can generate predictions and provide real-time insights to end users. It's an essential skill for every ML or AI engineer.
In this guide, we'll walk through the process of deploying a custom model trained using the Detectron2 framework.
🤖 What is Detectron2?
Detectron2 is a powerful library for object detection and segmentation, built on PyTorch and developed by Meta. It provides an excellent framework for training and deploying your custom models. With Detectron2, you can easily build and fine-tune neural networks to accurately detect and segment objects in images and videos.
The library offers many pre-trained models and state-of-the-art algorithms, making it a popular choice among machine learning engineers and researchers. Whether you're working on computer vision tasks or building applications that require object detection capabilities, Detectron2 provides the tools and flexibility you need to achieve accurate and efficient results.
For more information, refer to the official GitHub repository: Detectron2 GitHub
Fine-tune a Detectron2 model
We're going to assume you already have a fine-tuned model ready to be deployed. If you don't have a model yet, don't worry! The amazing Detectron2 team has provided an official Colab tutorial to help you out: Detectron2 Colab Tutorial
In this article, we will specifically focus on how to deploy the model trained using that Colab, fine-tuned to segment balloons 🎈.
⚠️ Important: we need to extract two files (class_names.txt and config.yaml) from the trained model. We will need these files shortly.
So, run the following snippet after the training of the model has finished:
import os

from detectron2.data import MetadataCatalog

# Save the class names of the training dataset
# (cfg is the config object used for training in the Colab notebook)
class_names = MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes
class_names_path: str = os.path.join(cfg.OUTPUT_DIR, "class_names.txt")
with open(class_names_path, "w") as f:
    for item in class_names:
        f.write("%s\n" % item)
print(f"\033[92m 🎉 Class names file saved in: {class_names_path} 🎉 \033[00m")

# Save the config yaml
config_yaml_path: str = os.path.join(cfg.OUTPUT_DIR, "config.yaml")
with open(config_yaml_path, "w") as file:
    file.write(cfg.dump())
print(f"\033[92m 🎉 Config file saved in: {config_yaml_path} 🎉 \033[00m")
Now you can download the following files (located in the /output folder) from Colab:
class_names.txt
config.yaml
model_final.pth
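You can grab them through the Colab file browser, or, if you prefer to do it programmatically, with a minimal sketch using the google.colab.files helper (assuming the default ./output directory from the tutorial):

from google.colab import files

# Download the three artifacts produced by the training notebook
for name in ["class_names.txt", "config.yaml", "model_final.pth"]:
    files.download(f"./output/{name}")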
Deploying your model with TorchServe
In this guide, we will use TorchServe to serve our model. By the end, you will have a fully functional REST API that can deliver predictions from your custom Detectron2 model, ready to be integrated into your applications.
What is TorchServe?
TorchServe is an open-source tool designed to facilitate the efficient deployment of PyTorch models. It streamlines the process of exposing models via a REST API, manages models, handles inference requests, and collects logs and metrics. TorchServe supports simultaneous serving of multiple models, dynamic batching, and adapts to different deployment environments, making it an optimal choice for serving models at production scale.
More details here: TorchServe GitHub, TorchServe Website.
An overview of the deployment process
The deployment process involves the following steps:
- Installation of the required packages for TorchServe.
- Review of the handler: this component is responsible for processing inference requests through the API.
- Generation of the .mar package for the model.
- Deployment of the model: we use the TorchServe CLI commands to complete the deployment.
Step 0: Installing the requirements
Please note: in this guide, we assume that you already have Detectron2 installed. If not, you can follow the official guide: https://detectron2.readthedocs.io/en/latest/tutorials/install.html
After installing Detectron2, we can install everything we need to serve the models with the following commands:
pip install torch-model-archiver
pip install pyyaml torch torchvision captum nvgpu
pip install torchserve
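As a quick sanity check that the CLI is available, you can print the installed version (the exact number will depend on your environment):

torchserve --version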
Step 1: The handler
One of the key components of TorchServe is the handler.
This is a Python file responsible for loading the model into memory and managing the entire inference pipeline, including preprocessing, inference, and postprocessing.
For our tutorial, you can find the handler file at the following link: medium-repo/deploying_custom_detectron2_models_with_torchserve/model_handler.py at main · vargroup-datascience/medium-repo
If you're deploying your own model, don't forget to change the parameters 😊.
Copy and paste this handler into a new file named model_handler.py.
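To give you a sense of what such a handler looks like before opening the repository, here is a minimal sketch (not the exact file from the link) built on TorchServe's BaseHandler and Detectron2's DefaultPredictor. It assumes the three files config.yaml, model_final.pth, and class_names.txt are the ones we will pack into the archive in the next step:

import io
import os

import numpy as np
import torch
from PIL import Image
from ts.torch_handler.base_handler import BaseHandler

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor


class ModelHandler(BaseHandler):
    """Loads the Detectron2 model and runs preprocessing, inference, and postprocessing."""

    def initialize(self, context):
        # model_dir is the folder where TorchServe unpacks the .mar archive
        model_dir = context.system_properties.get("model_dir")

        cfg = get_cfg()
        cfg.merge_from_file(os.path.join(model_dir, "config.yaml"))
        cfg.MODEL.WEIGHTS = os.path.join(model_dir, "model_final.pth")
        cfg.MODEL.DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

        with open(os.path.join(model_dir, "class_names.txt")) as f:
            self.class_names = [line.strip() for line in f if line.strip()]

        self.predictor = DefaultPredictor(cfg)
        self.initialized = True

    def preprocess(self, data):
        # The request body contains the raw bytes of the image sent by the client
        image_bytes = data[0].get("data") or data[0].get("body")
        image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
        # DefaultPredictor expects a BGR numpy array by default
        return np.ascontiguousarray(np.asarray(image)[:, :, ::-1])

    def inference(self, image):
        return self.predictor(image)

    def postprocess(self, outputs):
        instances = outputs["instances"].to("cpu")
        result = {
            "classes": [self.class_names[i] for i in instances.pred_classes.tolist()],
            "scores": instances.scores.tolist(),
            "boxes": instances.pred_boxes.tensor.tolist(),
        }
        # TorchServe expects a list with one entry per request in the batch
        return [result]

The output format of postprocess here is only an example; the handler in the linked repository may structure its response differently.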
Step 2: The .mar file
TorchServe requires a .mar file to function, which contains the model weights, inference code, and necessary dependencies.
It is worth noting that a .mar file can be uploaded to a server that is already running, without restarting it. This allows us to update our models in production without interrupting service. This aspect will not be discussed in detail in this guide.
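For the curious, this hot registration is a single call to TorchServe's management API (which listens on port 8081 by default), assuming the new .mar file is already in the model store, along these lines:

curl -X POST "http://localhost:8081/models?url=my_d2_model.mar&initial_workers=1"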
At this point, we have our four files:
class_names.txt
config.yaml
model_final.pth
model_handler.py
Next, let's create the archive that will contain all the files. This archive, named my_d2_model.mar, can be created using the following command:
torch-model-archiver --model-name my_d2_model --version 1.0 --handler model_handler.py --serialized-file model_final.pth --extra-files config.yaml,class_names.txt -f
This command creates a my_d2_model.mar file containing all the files needed by the model; if a model with the same name and version already exists, you will need to increment the version.
Once executed, you should see a my_d2_model.mar file in the current folder.
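Under the hood, a .mar file is an ordinary zip archive, so if you want to double-check that all four files ended up inside, you can list its contents:

unzip -l my_d2_model.mar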
Step 3: Deploy the model
To use the server, we need to create a model_store folder where the models will be stored.
mkdir model_store
Then we copy the file my_d2_model.mar into the folder model_store:
Linux/Mac OS:
cp my_d2_model.mar model_store
Windows:
copy my_d2_model.mar model_store
At this point, we can start the server with the following command:
torchserve --start --model-store model_store --models my_d2_model=my_d2_model.mar --disable-token-auth
To test whether the server has started correctly, we ping the server with the command:
$ curl http://localhost:8080/ping
If all goes well, we will get this response:
{
"status": "Healthy"
}
Furthermore, to check whether the deployment was successful, we can run this command, which returns the models currently active on the server.
$ curl http://localhost:8081/models
The output that we expect is as follows:
{
"models": [
{
"modelName": "my_d2_model",
"modelUrl": "my_d2_model.mar"
}
]
}
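If you want more detail about a specific model (its workers, status, batch size, and so on), the management API also exposes a per-model endpoint:

$ curl http://localhost:8081/models/my_d2_model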
At this point we are ready to use our model.
Step 4: Access the API
Now that the server is running and the model is deployed, you can interact with the model using the provided API endpoint.
To get predictions from the deployed model, you can use the following curl command. This command sends an image file to the model, and the server returns the prediction results.
$ curl http://127.0.0.1:8080/predictions/my_d2_model -T img.jpg
Alternatively, you can use any HTTP client of your choice (such as Postman, Python's requests library, or even a custom application) to interact with the API and send the image for predictions.
If the request is successful, the server processes the image using the deployed model and returns a JSON response with the prediction results.
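For example, here is a minimal sketch of the same request made with Python's requests library (img.jpg and the endpoint match the curl example above):

import requests

# Send the raw image bytes as the request body (the same thing curl -T does)
with open("img.jpg", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8080/predictions/my_d2_model",
        data=f.read(),
    )

print(response.status_code)
print(response.json())  # the prediction results returned by the handler

When you are done experimenting, the server can be shut down with torchserve --stop.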
Conclusions
In conclusion, this guide has illustrated the step-by-step process of deploying a custom Detectron2 model using a REST API via TorchServe.
We started by installing the necessary dependencies, then created a custom handler to manage the inference pipeline, and finally generated a .mar file containing the model and its dependencies.
By using TorchServe, it was possible to launch a server capable of exposing the model through the API, allowing for real-time inference. This procedure not only lets you put machine learning models into production quickly, but also lets you update them without interrupting the service.
All of this can be useful for integrating machine learning models into one's own applications, ensuring a robust and scalable deployment.
Published via Towards AI