Real-time Vehicle Detection with 50 HD Frames/sec on an AMD GPU
Last Updated on July 25, 2023 by Editorial Team
Author(s): Rohit Sharma
Originally published on Towards AI.
What is Vehicle detection?
Vehicle detection is a part of traffic surveillance that involves detecting all types of vehicles in a live traffic feed, including cars, vans, trucks, bicyclists, etc. Approaches range from image processing methods like HOG (Histogram of Oriented Gradients) and SIFT (Scale-Invariant Feature Transform) to deep learning object detectors like R-CNN, SSD, and YOLO.
Here we focus on deep learning object detection models because of their superior accuracy. A neural network is a graph of operators (like convolution and ReLU) and their parameters (weights and biases). An object detection network is a special subclass trained to locate objects in an image or a video frame. The input to object detection is a clear image of an object. This image is passed to the software, which outputs the position of the object as a bounding box surrounding it, as shown in the picture above.
How to choose a detection model?
Performance and accuracy are the two cornerstones of an object detection model. The mAP (mean Average Precision) is a popular metric for measuring the accuracy of object detectors.
While models like SSD and R-FCN are more accurate, YOLO stands out for delivering the highest frames per second on live video.
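To make the mAP metric mentioned above more concrete, here is a minimal sketch of how it is commonly computed. It assumes boxes are given as (x_min, y_min, x_max, y_max) tuples and that each detection has already been marked as a true or false positive (typically by checking its overlap, or IoU, with a ground-truth box against a threshold). The function names are illustrative and do not come from the application's source; benchmark suites differ in their exact interpolation rules.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(scored_hits, num_ground_truth):
    """Area under the precision-recall curve for one class.

    scored_hits: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of labeled objects of this class.
    """
    if num_ground_truth == 0 or not scored_hits:
        return 0.0
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing as recall grows (the usual envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For example, `average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)` scores a detector that found both ground-truth objects but also produced one false alarm in between.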
Why do we need a high-end GPU?
Our main goal is to find an object detector that can deliver over 30 frames per second on an HD-resolution video feed with a modern GPU. The picture below, from a Google research paper, gives some sense of the accuracy (mAP) vs. speed (ms) tradeoff on a desktop with 32GB RAM, an Intel Xeon E5-1650 v2 processor, and an Nvidia GeForce GTX Titan X GPU card.
The figure above makes it clear that very few models reach 30 fps, which corresponds to a budget of about 33 ms per frame.
With our hardware limited to 16GB of memory, an AMD Ryzen 8-core processor, and a Radeon Instinct MI25 GPU, we chose YOLOv2 as a starting point.
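The relationship between a frame-rate target and the per-frame time budget is simple arithmetic, sketched below. The helper names are illustrative only and are not part of the application.

```python
def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def meets_target(latency_ms, target_fps=30):
    """True if a per-frame latency is fast enough for the target rate."""
    return latency_ms <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    for fps in (30, 50):
        print(f"{fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
```

A model whose end-to-end latency per frame exceeds this budget cannot sustain the target rate on a live feed, no matter how accurate it is.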
Application Design
The picture below is a high-level architecture of a typical traffic vision application.
The next picture shows the lower-level software components of our application, the bottom three of which are provided by the AMD MIVisionX set of tools.
How to translate the model to use the GPU?
As mentioned above, a neural network is a graph of operators (like convolution and ReLU) and their parameters (weights and biases). Links to download the YOLOv2 graph and its parameters are given below:
This is a raw graph, which must be translated into a set of instructions that can use the CPU cores and GPUs on an AMD desktop. The AMD MIVisionX package provides a simple way to translate it for MI-xx GPU-based systems using the AMD OpenVX and OpenCL libraries. AMD OpenVX is a low-level library for accelerating computer vision applications, whereas OpenCL is a framework for writing programs that execute across heterogeneous platforms such as CPUs and GPUs.
The MIVisionX model is generated by the model conversion process shown in the picture below. This process converts YOLOv2 into an MIVisionX model, packaged as an OpenVX library (with a .so extension) ready for execution on x86-based CPU cores and MI-xx GPUs.
Application Front End
Once we have the model (a dynamic library), we wrap it as a Python package. The MIVisionX model is the most compute-intensive part of the application and is executed on the GPU. We implemented the rest of the components as Python modules that run on the host CPU, as shown in the picture below.
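The split between GPU inference and host-side Python can be pictured with a minimal sketch of a front-end loop. This is not the application's actual code: `run_pipeline`, `detect`, and `handle` are hypothetical names, and `detect` stands in for whatever callable the wrapped MIVisionX model exposes. The loop also reports the measured frame rate.

```python
import time


def run_pipeline(frames, detect, handle):
    """Run each frame through `detect`, pass the result to `handle`,
    and return the measured frames per second.

    frames: any iterable of frames (for example, decoded camera images).
    detect: hypothetical callable wrapping the GPU model; returns boxes.
    handle: host-side callable for drawing, tracking, or logging.
    """
    count = 0
    start = time.perf_counter()
    for frame in frames:
        boxes = detect(frame)   # compute-intensive step (GPU in the real app)
        handle(frame, boxes)    # remaining work stays on the host CPU
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in detector that always reports one box, for illustration only.
    def fake_detect(frame):
        return [(0, 0, 10, 10, "car")]

    seen = []
    fps = run_pipeline(range(100), fake_detect, lambda f, b: seen.append(len(b)))
    print(f"processed {len(seen)} frames at {fps:.0f} fps")
```

Keeping the detector behind a plain callable like this is one way to let the host-side modules be developed and tested without a GPU present.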
How to Deploy
Although we developed it as a desktop application with an IP webcam as the live feed, it can be extended to run in the cloud or fog as an HTTP service. Such a service would be expected to use Node.js/JavaScript to support other types of cameras, including mobile phones and traffic cams, and to run in a V8-supported browser.
Results
This app runs at 50 high-definition (1920×1080) frames/sec on an AMD Ryzen desktop with an MI25 AI accelerator.
You can download the source on GitHub. More information is available in the references below for curious souls like me.
References
- Traffic Vision app
- Vehicle Detection and Tracking Techniques
- Measuring Traffic Speed
- yoloV2 paper
- Tiny Yolo aka Darknet reference network
- MiVisionX Setup
- AMD OpenVX
- Deep Learning experts