Easing up the process of TensorFlow 2.0 Object Detection API and TensorRT
Last Updated on January 6, 2023 by Editorial Team
Author(s): Abhishek Annamraju
Computer Vision
Detailed steps to train your own object detector with Monk's TF-Object-Detection-API, optimize using TensorRT, and run inference on GPU systems
The entire code is available as a Jupyter notebook in the Monk Object Detection Library.
Every computer vision engineer takes up an open-source library with the goal of using it on a custom dataset, and one of the most used libraries for object detection is TensorFlow, thanks to its ever-expanding model zoo. TensorFlow recently added TF 2.0 support to its Object Detection API. Using the TensorFlow Object Detection API for custom object detection, and further optimizing the model using TensorRT, is a lengthy, time-consuming process that is prone to errors.
To overcome the issues that are usually faced and to reduce the workload on the developer, namely:
* modifying tfrecord examples and arranging data in rigid formats to fit in custom data
* updating config files
* using the right files to train the engine
* converting trained checkpoints to other formats for inferencing
* searching for the right ways to run inference
* optimizing the model using TensorRT, etc.
we integrated the TF Object Detection API with the low-code Monk AI toolkit.
With it, developers can easily:
✔ convert custom datasets to tf-records
✔ update config files using pythonic syntax
✔ train the engine and export to different inference formats
✔ infer using either checkpoints or saved-model formats
✔ optimize the model using TensorRT engines for faster inference
Issues faced while training a custom dataset using TF 2.0 Object Detection API
Along with the procedural steps, we mention the issues a developer or researcher usually faces, and why we decided to ease up the process using a low-code open-source library.
Step 1: Installing prerequisites and compiling models.
* Training with the older versions 2.0.0 and 2.1.0 resulted in the error
AttributeError: module 'tensorflow_core.keras.utils' has no attribute 'register_keras_serializable'
* Training with version 2.2.0 also resulted in the following errors
AttributeError: 'CollectiveAllReduceExtended' object has no attribute '_cfer_fn_cache'
* Version 2.3.0 runs without any errors and is compatible with Colab too.
Hence v2.3.0 was selected; an upgrade to v2.4.0 will be done as soon as it is released, since tf-lite conversion is error-prone in v2.3.0.
Step 2: Dataset settings
* The TF models repository's dataset_tools provides examples for public datasets such as COCO, VOC, OID, etc., but not all public datasets are annotated in these formats.
* Once annotated in one of these formats, the data needs to be arranged in a structure that can be fed into the files present in object_detection/dataset_tools/, or these files have to be modified to ingest custom data into the pipeline.
To make this process easier, we added a simple parser to convert annotations to VOC format data and further to tfrecords.
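Under the hood, generating a tfrecord boils down to encoding each image and its boxes into a tf.train.Example with the feature keys the Object Detection API expects. The sketch below is a minimal illustration of that step; the paths, the label-map dictionary, and the file layout are assumptions for this example and not Monk's actual parser.

```python
# Minimal sketch: one VOC-style annotation -> one tf.train.Example.
# Paths, label map, and file layout are illustrative, not Monk's exact parser.
import os
import xml.etree.ElementTree as ET
import tensorflow as tf

label_map = {"car": 1, "person": 2}  # illustrative class ids

def voc_to_example(img_path, xml_path):
    root = ET.parse(xml_path).getroot()
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    xmins, xmaxs, ymins, ymaxs, names, labels = [], [], [], [], [], []
    for obj in root.findall("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        xmins.append(float(box.find("xmin").text) / width)   # normalized coords
        xmaxs.append(float(box.find("xmax").text) / width)
        ymins.append(float(box.find("ymin").text) / height)
        ymaxs.append(float(box.find("ymax").text) / height)
        names.append(name.encode("utf8"))
        labels.append(label_map[name])
    with tf.io.gfile.GFile(img_path, "rb") as f:
        encoded_jpg = f.read()
    feature = {
        "image/encoded": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpg])),
        "image/format": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"jpeg"])),
        "image/height": tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        "image/width": tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        "image/object/bbox/xmin": tf.train.Feature(float_list=tf.train.FloatList(value=xmins)),
        "image/object/bbox/xmax": tf.train.Feature(float_list=tf.train.FloatList(value=xmaxs)),
        "image/object/bbox/ymin": tf.train.Feature(float_list=tf.train.FloatList(value=ymins)),
        "image/object/bbox/ymax": tf.train.Feature(float_list=tf.train.FloatList(value=ymaxs)),
        "image/object/class/text": tf.train.Feature(bytes_list=tf.train.BytesList(value=names)),
        "image/object/class/label": tf.train.Feature(int64_list=tf.train.Int64List(value=labels)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Write all examples into a single tfrecord file (illustrative folder names).
with tf.io.TFRecordWriter("train.record") as writer:
    for name in os.listdir("Images"):
        example = voc_to_example(os.path.join("Images", name),
                                 os.path.join("Annotations", name.rsplit(".", 1)[0] + ".xml"))
        writer.write(example.SerializeToString())
```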
Step 3: Model and config file + train
* Weights and config files have to be downloaded from the model zoo, and the data structure has to be arranged in the required format.
* After this, the config file elements have to be updated. A set of more than 25 elements has to be changed, including dataset details, base feature extraction details, checkpoint details, optimizer details, and tf-record details.
* Once the config details are updated, training can be done.
To avoid manual changes to config files and folder structure formats, a pythonic API wrapper was created.
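For reference, this is roughly what the wrapper automates: the stock object_detection way of editing pipeline.config programmatically through its protos. The field values below (class count, paths, batch size) are placeholders.

```python
# Programmatically editing pipeline.config via the object_detection protos.
# Values shown are placeholders; the field names are the usual SSD ones.
import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

config_path = "pipeline.config"  # downloaded alongside the model-zoo checkpoint

pipeline = pipeline_pb2.TrainEvalPipelineConfig()
with tf.io.gfile.GFile(config_path, "r") as f:
    text_format.Merge(f.read(), pipeline)

pipeline.model.ssd.num_classes = 10
pipeline.train_config.batch_size = 24
pipeline.train_config.fine_tune_checkpoint = "checkpoint/ckpt-0"
pipeline.train_config.fine_tune_checkpoint_type = "detection"
pipeline.train_input_reader.label_map_path = "label_map.pbtxt"
pipeline.train_input_reader.tf_record_input_reader.input_path[:] = ["train.record"]
pipeline.eval_input_reader[0].label_map_path = "label_map.pbtxt"
pipeline.eval_input_reader[0].tf_record_input_reader.input_path[:] = ["val.record"]

with tf.io.gfile.GFile(config_path, "w") as f:
    f.write(text_format.MessageToString(pipeline))
```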
Step 4: Export the model to saved-model format and run inference
* Checkpoint files can then be converted to the "Saved Model" format.
* General issues here include the conversion of SSD-FPN and ResNet-based models.
* Running inference on checkpoint files requires the object detection model builders, whereas saved-model '.pb' formats can be loaded with TF's saved-model loading function.
* Creating a graph function over the saved_model before running inference usually tends to speed up the process (a short loading-and-inference sketch is shown below).
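As a minimal sketch (paths are illustrative), loading the exported SavedModel, running detection, and tracing the call into a graph looks roughly like this; the output dictionary keys follow the Object Detection API's exported signature.

```python
# Minimal sketch: load the exported SavedModel, run detection on one image,
# and wrap the call in a tf.function for faster repeated inference.
import numpy as np
import tensorflow as tf

detect_fn = tf.saved_model.load("export_dir/saved_model")   # illustrative path

image = tf.io.decode_image(tf.io.read_file("test.jpg"), channels=3)
input_tensor = tf.expand_dims(tf.cast(image, tf.uint8), 0)  # shape [1, H, W, 3]

detections = detect_fn(input_tensor)
boxes = detections["detection_boxes"][0].numpy()    # normalized [ymin, xmin, ymax, xmax]
scores = detections["detection_scores"][0].numpy()
classes = detections["detection_classes"][0].numpy().astype(np.int32)

# Tracing the loaded model into a graph usually speeds up repeated calls.
graph_fn = tf.function(lambda x: detect_fn(x))
_ = graph_fn(input_tensor)  # first call traces the graph; later calls are faster
```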
Step 5: TensorRT model conversion and inference
* The saved model can then be optimized to run on different NVIDIA GPU machines.
* One issue with TensorRT is that the library version on the development system (the one where you train and convert the model) and on the deployment system (the one where you deploy your model) has to be the same.
* The next issue with TRT is that it optimizes the model based on the compute capability of the GPU being used. In simple terms, a model optimized on Colab cannot run on a Jetson Nano board. Hence a model needs to be just converted, and then built at runtime on the deployment machine.
* TensorRT models can be converted only from saved_model ('.pb') files, and the conversion generates an optimized model in the same format, which then needs to be inferred on (a conversion sketch is shown below).
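A minimal TF-TRT conversion sketch, using the converter bundled with TF 2.x; the precision mode and paths are illustrative, and INT8 additionally needs a calibration input function passed to convert().

```python
# Minimal sketch of TF-TRT optimization of an exported SavedModel.
# Precision mode and paths are illustrative.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode="FP16")
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="export_dir/saved_model",
    conversion_params=params,
)
converter.convert()                # rewrites the graph with TRT ops
converter.save("trt_saved_model")  # output is again a SavedModel (.pb)
```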
Let's get started with the process
Note: The entire code is available as a Jupyter notebook at the Monk Object Detection Library. Here, I have mentioned only the important steps involved in the entire work.
Step 1: Installation
For local or cloud-based systems:
This will install
* Pre-requisite libraries such as numpy, scipy, opencv, pillow, lxml, etc
* Tensorflow 2.3.0 and Tensorflow-models-python-2.2.0
* Tensorflow Object Detection API
Similarly, it can also be installed on Colab.
Step 2: Data preparation
The dataset needs to be in the simple Pascal VOC format, as mentioned below.
To convert your dataset from COCO or any other format, check these detailed tutorials.
For this tutorial, we used the BDD100K on-road object detection dataset (credits).
Complete steps to download the data and convert it to Pascal VOC format are mentioned in this Jupyter notebook on GitHub.
Step 3: System parameter setup
→ Load the detector
→ List all the models. At present, 26 different models are supported.
→ Set the training and validation data with params
* Set the batch size as per the GPU memory available. A size of 24 fits well on an AWS p3.2xlarge instance with a V100 GPU (16 GB VRAM).
→ Create the TF records
* Batch size, number of classes, and tf_record details will all be saved automatically in the config file.
→ Select a model from the model list
* For this example, we have selected SSD ResNet50 with a Feature Pyramid Network; it takes an input image of shape 640x640x3 (RGB image).
→ Set all other hyperparameters
* Set the training steps; an ideal value would be to train for 100K steps on large datasets.
* An optimal initial learning rate for all models can be set around 0.01, whereas ssd_mobilenet_v2 and faster_rcnn_inception models can take even higher learning rates.
→ Set the output inference graph path and TensorRT params
* TensorRT optimization is optional.
* TensorRT optimization supports 3 types of optimizations: FP32, FP16, and INT8.
* Floating-point quantizations are useful for boards like the Jetson Nano.
* INT8 optimization builds at the time of conversion, while the other two build on the deployment machine. The build can be thought of as the actual optimization being applied to the model. As mentioned above, a TensorRT model should be built on the machine where you want to deploy it, or on a machine with the same TensorRT library version and CUDA compute capability.
* Hence, it is advised to run INT8 optimization on the deployment machine itself.
* Since, for this example, we trained and ran inference on the same AWS p3.2xlarge instance, we went for INT8 quantization (a sketch of how these setup calls fit together follows below).
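The calls below sketch how these settings fit together in a low-code wrapper. The class and method names are hypothetical stand-ins used only to illustrate the flow described above; the linked Jupyter notebook is the authoritative reference for the library's actual API.

```python
# Illustrative sketch of the setup flow described above.
# Class/method names below are hypothetical stand-ins; see the linked notebook
# for the library's exact API.
from train_detector import Detector  # hypothetical import path

gtf = Detector()
gtf.list_models()                                          # 26 supported models

gtf.set_train_dataset("train/Images", "train/Annotations",
                      "classes.txt", batch_size=24)        # 24 fits a 16 GB V100
gtf.set_val_dataset("val/Images", "val/Annotations")
gtf.create_tfrecord(data_output_dir="data_tfrecord")       # also updates the config

gtf.set_model_params(model_name="ssd_resnet50_v1_fpn_640x640")
gtf.set_hyper_params(num_train_steps=100000, lr=0.01)

gtf.export_params(output_directory="export_dir")
gtf.TensorRT_Optimization_Params(conversion_type="INT8",
                                 trt_dir="trt_dir")        # optional step
```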
Step 4: Training and model export
→ To train, run the provided training command.
* Since it runs a TF engine, running it as a wrapper inside a Jupyter notebook results in a system exit.
* For that reason, a script named train.py is provided.
* Once training is completed, checkpoint files will be saved in the output_directory that you mentioned in the hyperparameter setup command.
→ To export the trained model to saved_model ('.pb') format, run the provided export command.
* Since it runs a TF engine, running it as a wrapper inside a Jupyter notebook results in a system exit.
* For that reason, a script named export.py is provided.
* The output .pb file will be saved in the export_directory that you set in the export_params setup command (a minimal sketch of driving both scripts from a notebook follows below).
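Because the scripts must run outside the notebook process, one simple way to drive them from a notebook is via subprocess. This is a minimal sketch assuming train.py and export.py sit in the working directory.

```python
# Run the provided scripts as separate processes so the TF engine's exit
# does not kill the notebook kernel. Adjust paths to your setup.
import subprocess

subprocess.run(["python", "train.py"], check=True)    # trains and writes checkpoints
subprocess.run(["python", "export.py"], check=True)   # exports checkpoints to saved_model (.pb)
```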
Step 5: Inference and speed benchmark before optimization
→ Load the detector
→ Load the trained model
* Load from the saved_model in the exported directory
→ Run inference on a single image
→ Run the benchmark speed analysis
* Without optimization, these are the results on an AWS p3.2xlarge instance (a sketch of a comparable timing loop is shown after the numbers below):
Average Image loading time : 0.0121 sec
Average Inference time : 0.0347 sec
Result extraction time : 0.0848 sec
total_repetitions : 100
total_time : 3.4712 sec
images_per_sec : 28
latency_mean : 34.7123 ms
latency_median : 34.9255 ms
latency_min : 32.2594 ms
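The numbers above come from the library's benchmark utility; a roughly comparable timing loop with plain TF could look like the sketch below (paths and repetition count are illustrative).

```python
# Roughly how such a latency benchmark can be reproduced with plain TF
# (illustrative; the library's own utility produced the numbers above).
import time
import numpy as np
import tensorflow as tf

detect_fn = tf.saved_model.load("export_dir/saved_model")
image = tf.io.decode_image(tf.io.read_file("test.jpg"), channels=3)
input_tensor = tf.expand_dims(tf.cast(image, tf.uint8), 0)

_ = detect_fn(input_tensor)  # warm-up call, excluded from timing

latencies = []
for _ in range(100):
    start = time.time()
    _ = detect_fn(input_tensor)
    latencies.append((time.time() - start) * 1000.0)  # milliseconds

print("latency_mean   : %.4f ms" % np.mean(latencies))
print("latency_median : %.4f ms" % np.median(latencies))
print("latency_min    : %.4f ms" % np.min(latencies))
print("images_per_sec : %d" % int(1000.0 / np.mean(latencies)))
```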
Step 6: Optimize using TensorRT 6
→ Install TensorRT 6
* Visit the NVIDIA TensorRT page to download TRT 6.
* Download the packages from the TensorRT website depending upon your OS and CUDA version.
→ To optimize using TensorRT, run the provided optimization command.
* Since it runs a TF engine, running it as a wrapper inside a Jupyter notebook results in a system exit.
* For that reason, a script named optimize.py is provided.
* Run it over the exported saved_model.
Step 7: Inference and speed benchmark after optimization
→ Load the detector
→ Load the trained model
* Load from the optimized saved_model
→ Run inference on a single image
→ Run the benchmark speed analysis
* After optimization, these are the results on an AWS p3.2xlarge instance:
Average Image loading time : 0.0117 sec
Average Inference time : 0.0169 sec
Result extraction time : 0.0822 sec
total_repetitions : 100
total_time : 1.6907 sec
images_per_sec : 59
latency_mean : 16.9070 ms
latency_median : 16.8167 ms
latency_min : 16.2708 ms
Conclusion
With the Monk Object Detection Library, one can easily:
✔ convert custom datasets to tf-records
✔ update config files using pythonic syntax
✔ train the engine and export to different inference formats
✔ infer using either checkpoints or saved-model formats
✔ optimize the model using TensorRT engines for faster inference
✔ nearly double the images processed per second post optimization
The entire code is available on GitHub at Monk Object Detection Library
Happy Coding!!
Appendix 1
More About Monk Object Detection Library