
Deploying Custom Detectron2 Models with a REST API: A Step-by-Step Guide.

Author(s): Gennaro Daniele Acciaro

Originally published on Towards AI.

An image generated using Midjourney

In the life of a Machine Learning Engineer, training a model is only half the battle.

Indeed, even a neural network that accurately predicts all the test data remains useless unless it is made accessible to the world.

Model deployment is the process of making a model accessible and usable in production environments, where it can generate predictions and provide real-time insights to end users. It is an essential skill for every ML or AI engineer.

In this guide, we’ll walk through the process of deploying a custom model trained using the Detectron2 framework.

🤖 What is Detectron2?

Image taken from the official Colab for Detectron2 training.

Detectron2 is a powerful library for object detection and segmentation, built on PyTorch and developed by Meta. It provides an excellent framework for training and deploying your custom models. With Detectron2, you can easily build and fine-tune neural networks to accurately detect and segment objects in images and videos.

The library offers many pre-trained models and state-of-the-art algorithms, making it a popular choice among machine learning engineers and researchers. Whether you’re working on computer vision tasks or building applications that require object detection capabilities, Detectron2 provides the tools and flexibility you need to achieve accurate and efficient results.
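
As a quick illustration of how little code inference takes, here is a minimal sketch that runs a pre-trained Mask R-CNN from the Detectron2 model zoo (the model choice, score threshold, and file name are illustrative):

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Load a pre-trained instance segmentation model from the model zoo
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # keep detections above 50% confidence

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("input.jpg"))  # Detectron2 expects a BGR image
print(outputs["instances"].pred_classes)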

For more information, refer to the official GitHub repository: Detectron2 GitHub

Fine-tune a Detectron2 model

We’re going to assume you already have a fine-tuned model ready to be deployed. If you don’t have a model yet, don’t worry! The amazing Detectron2 team has provided an official Colab tutorial to help you out: Detectron2 Colab Tutorial

In this article, we will focus specifically on deploying the model trained in that Colab, which is fine-tuned to segment balloons 🎈.

⚠️ Important: we need to extract two files (class_names.txt and config.yaml) from the trained model. We will need these files shortly.

To do so, run this snippet after the training has finished:

import os
from detectron2.data import MetadataCatalog

# `cfg` is the config object used for training in the Colab
# Save the class names of the training dataset
class_names = MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes
class_names_path: str = os.path.join(cfg.OUTPUT_DIR, "class_names.txt")

with open(class_names_path, "w") as f:
    for item in class_names:
        f.write("%s\n" % item)

print(f"\033[92m 🎉 Class names file saved in: {class_names_path} 🎉 \033[00m")

# Save the config yaml
config_yaml_path: str = os.path.join(cfg.OUTPUT_DIR, "config.yaml")
with open(config_yaml_path, "w") as file:
    file.write(cfg.dump())

print(f"\033[92m 🎉 Config file saved in: {config_yaml_path} 🎉 \033[00m")

Now you can download the following files (located in the /output folder) from Colab:

  • class_names.txt
  • config.yaml
  • model_final.pth
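
If you are working in Colab, one quick way to grab them is the google.colab helper (a sketch, assuming the default ./output directory used by the tutorial):

from google.colab import files

# Download the three files from the Colab runtime to your machine
for name in ["class_names.txt", "config.yaml", "model_final.pth"]:
    files.download(f"output/{name}")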

Deploying your model with TorchServe

In this guide, we will use TorchServe to serve our model. By the end, you will have a fully functional REST API that delivers predictions from your custom Detectron2 model, ready to be integrated into your applications.

What is TorchServe?

TorchServe is an open-source tool designed to facilitate the efficient deployment of PyTorch models. It streamlines the process of exposing models via a REST API, manages models, handles inference requests, and collects logs and metrics. TorchServe supports simultaneous serving of multiple models, dynamic batching, and adapts to different deployment environments, making it an optimal choice for serving models at production scale.

More details here: TorchServe GitHub, TorchServe Website.

An overview of the deployment process

The deployment process involves the following steps:

  1. Installation of the required packages for TorchServe.
  2. Review of the handler: this component is responsible for processing inference requests through the API.
  3. Generation of the .mar package for the model.
  4. Deployment of the model: we use the TorchServe CLI to complete the deployment.

Step 0: Installing the requirements

Please note: In this guide we assume that you already have Detectron2 installed. If not, you can follow the official guide: https://detectron2.readthedocs.io/en/latest/tutorials/install.html

After installing Detectron2, we can install everything we need to serve the models with the following commands:

pip install torch-model-archiver 
pip install pyyaml torch torchvision captum nvgpu
pip install torchserve

Step 1: The handler

One of the key components of TorchServe is the handler.

This is a Python file responsible for loading the model into memory and managing the entire inference pipeline, including preprocessing, inference, and postprocessing.

For our tutorial, you can find the handler file at the following link: medium-repo/deploying_custom_detectron2_models_with_torchserve/model_handler.py at main · vargroup-datascience/medium-repo

If you’re deploying your own model, don’t forget to change the parameters 😊.

Copy and paste this handler into a new file named model_handler.py.
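
For reference, here is a minimal sketch of what such a handler can look like. The linked file is the source of truth; the names and response format below are illustrative, not the exact repository code.

import os
import numpy as np
import cv2
from ts.torch_handler.base_handler import BaseHandler
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

class ModelHandler(BaseHandler):
    def initialize(self, context):
        # model_dir is where TorchServe unpacks the contents of the .mar file
        model_dir = context.system_properties.get("model_dir")

        cfg = get_cfg()
        cfg.merge_from_file(os.path.join(model_dir, "config.yaml"))
        cfg.MODEL.WEIGHTS = os.path.join(model_dir, "model_final.pth")
        cfg.MODEL.DEVICE = "cpu"  # or "cuda" if a GPU is available

        with open(os.path.join(model_dir, "class_names.txt")) as f:
            self.class_names = [line.strip() for line in f]

        self.predictor = DefaultPredictor(cfg)
        self.initialized = True

    def handle(self, data, context):
        # The request body contains the raw image bytes sent by the client
        image_bytes = data[0].get("data") or data[0].get("body")
        image = cv2.imdecode(np.frombuffer(image_bytes, np.uint8), cv2.IMREAD_COLOR)

        outputs = self.predictor(image)
        instances = outputs["instances"].to("cpu")

        # Return one JSON-serializable object per request in the batch
        return [{
            "classes": [self.class_names[i] for i in instances.pred_classes.tolist()],
            "scores": instances.scores.tolist(),
            "boxes": instances.pred_boxes.tensor.tolist(),
        }]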

Step 2: The .mar file

TorchServe requires a .mar file to function, which contains the model weights, inference code, and necessary dependencies.

It is interesting to note that a .mar file can be uploaded to a server that is already running, without restarting it. This allows us to update our models in production without interrupting service, although that aspect will not be discussed in detail in this guide.
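
As a quick pointer, a new archive placed in the model store can be registered on a running server through TorchServe's management API, for example:

$ curl -X POST "http://localhost:8081/models?url=my_d2_model.mar"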

torch-model-archiver creates an archive file (.mar) containing a pre-trained PyTorch model, its configuration files and any dependencies.

At this point, we have our four files:

  • class_names.txt
  • config.yaml
  • model_final.pth
  • model_handler.py

Next, let’s create the archive that will contain all the files. This archive, named my_d2_model.mar, can be created using the following command:

torch-model-archiver --model-name my_d2_model --version 1.0 --handler model_handler.py --serialized-file model_final.pth --extra-files config.yaml,class_names.txt -f
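
Here, --model-name sets the archive's name, --version tags it, --serialized-file points to the trained weights, --handler to our custom handler, --extra-files bundles the additional files the handler needs at runtime, and -f overwrites any existing archive with the same name.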

This command creates a my_d2_model.mar file containing everything the model needs; if a model with the same name and version is already registered, you will need to increment the version.

Once executed, you should see a my_d2_model.mar file in the current folder.

Step 3: Deploy the model

To use the server, we need to create a model_store folder where the models will be stored.

mkdir model_store

Then we copy the file my_d2_model.mar into the folder model_store:

Linux/macOS:

cp my_d2_model.mar model_store

Windows:

copy my_d2_model.mar model_store

At this point, we can start the server with the following command:

torchserve --start --model-store model_store --models my_d2_model=my_d2_model.mar --disable-token-auth
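
By default, TorchServe exposes its inference API on port 8080 and its management API on port 8081. The --disable-token-auth flag turns off token authorization, so we can call the APIs without an auth token; this is convenient for local testing but not recommended in production.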

To test whether the server has started correctly, we ping the server with the command:

$ curl http://localhost:8080/ping

If all goes well, we will get this response:

{
  "status": "Healthy"
}

Furthermore, to test whether the deployment was successful, we run the following command, which returns the models currently active on the server:

$ curl http://localhost:8081/models 

The output that we expect is as follows:

{
  "models": [
    {
      "modelName": "my_d2_model",
      "modelUrl": "my_d2_model.mar"
    }
  ]
}

At this point we are ready to use our model.

Step 4: Access the API

Now that the server is running and the model is deployed, you can interact with the model using the provided API endpoint.

To get predictions from the deployed model, you can use the following curl command. This command sends an image file to the model, and the server returns the prediction results.

$ curl http://127.0.0.1:8080/predictions/my_d2_model -T img.jpg

Alternatively, you can use any HTTP client of your choice (such as Postman, Python’s requests library, or even a custom application) to interact with the API and send the image for predictions.
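
For example, here is a minimal sketch using Python's requests library (assuming the server from Step 3 is running locally):

import requests

# Send the raw image bytes to the predictions endpoint
with open("img.jpg", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8080/predictions/my_d2_model",
        data=f.read(),
    )

response.raise_for_status()
print(response.json())  # predicted classes, scores, boxes, etc.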

If the request is successful, the server processes the image using the deployed model and returns a JSON response with the prediction results.

Conclusions

In conclusion, this guide has illustrated the step-by-step process of deploying a custom Detectron2 model using a REST API via TorchServe.

We started by installing the necessary dependencies, then created a custom handler to manage the inference pipeline, and finally generated a .mar file containing the model and its dependencies.

Using TorchServe, we launched a server that exposes the model through an API, enabling real-time inference. This procedure not only lets you put machine learning models into production quickly, but also lets you update them without interrupting service.

All of this is useful for integrating machine learning models into your own applications, ensuring a robust and scalable deployment.


Published via Towards AI
