Real-time Vehicle Detection with 50 HD Frames/sec on an AMD GPU

Last Updated on July 25, 2023 by Editorial Team

Author(s): Rohit Sharma

Originally published on Towards AI.

Real-time Vehicle Detection with 50 frames/sec of HD resolution

What is Vehicle detection?

Vehicle detection is the part of traffic surveillance that detects all types of vehicles in a live traffic feed, including cars, vans, trucks, and bicyclists. Approaches range from classical image processing methods such as HOG (Histogram of Oriented Gradients) and SIFT (Scale-Invariant Feature Transform) to deep learning object detectors such as R-CNN, SSD, and YOLO.

Here we focus on deep learning object detection models because of their superior accuracy. A neural network is a graph of operators (such as convolution and ReLU) and their parameters (the weight and bias matrices). An object detection network is a special subclass trained to locate objects in an image or a video frame. The input to object detection is a clear image containing an object. The image is passed to the software, which outputs the object's position as a bounding box surrounding it, as shown in the picture above.
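To make the "graph of operators plus parameters" idea concrete, here is a minimal sketch in plain Python. It is not how YOLO or MIVisionX represent a network. The function names (`conv1d`, `relu`, `run_graph`) and the toy one-dimensional data are assumptions chosen only for illustration.

```python
from typing import Callable, List

Vector = List[float]


def conv1d(signal: Vector, kernel: Vector, bias: float) -> Vector:
    """Valid (no padding) 1-D sliding-window product plus a bias term."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values: Vector) -> Vector:
    """ReLU keeps positive values and clamps negatives to zero."""
    return [max(0.0, v) for v in values]


def run_graph(graph: List[Callable[[Vector], Vector]], x: Vector) -> Vector:
    """Run the operators in order, feeding each output to the next operator."""
    for op in graph:
        x = op(x)
    return x


# The graph is the ordered list of operators; the kernel and bias are the
# parameters (weights) bound to the convolution operator.
graph = [
    lambda x: conv1d(x, kernel=[1.0, -1.0], bias=0.0),
    relu,
]

output = run_graph(graph, [1.0, 3.0, 2.0, 5.0])
print(output)  # [0.0, 1.0, 0.0]
```

A real detector chains many such operators over 2-D image tensors, but the structure is the same: fixed operators, learned parameters, and data flowing through the graph.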

How to choose a detection model?

Performance and accuracy are the two cornerstones of an object detection model. mAP (mean Average Precision) is a popular metric for measuring the accuracy of object detectors.
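The article does not spell out how mAP is computed. In common practice, a predicted box counts as correct when its intersection-over-union (IoU) with a ground-truth box exceeds a threshold, and precision is then averaged over recall levels and classes. The sketch below shows only the IoU building block. The `(x_min, y_min, x_max, y_max)` box layout and the function name `iou` are assumptions for illustration.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])

    # Width and height collapse to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
print(iou(predicted, ground_truth))  # overlap area 1 over union area 7
```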

Overview of mAP (mean Average Precision) scores for different object detection models

While models like SSD and R-FCN are more accurate, YOLO stands out with the highest frames-per-second performance on live video.

Why do we need a high-end GPU?

Our main pursuit is to find an object detector that can deliver over 30 frames per second on an HD-resolution video feed with a modern GPU. The figure below, from a Google Research paper, gives a sense of the accuracy (mAP) vs. speed (ms) trade-off on a desktop with 32GB RAM, an Intel Xeon E5-1650 v2 processor, and an Nvidia GeForce GTX Titan X GPU card.

Speed/accuracy trade-offs for modern convolutional object detectors (Source and Reference: Google Research paper https://arxiv.org/pdf/1611.10012.pdf)

The figure above makes it clear that very few models reach 30 fps, which corresponds to a budget of about 33 ms per frame.
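The conversion between a frame rate and a per-frame time budget is simple arithmetic: the budget in milliseconds is 1000 divided by the frame rate. A small sketch follows; the helper names `frame_budget_ms` and `meets_target` are assumptions for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def meets_target(latency_ms: float, target_fps: float = 30.0) -> bool:
    """True if a model's per-frame latency fits within the target frame rate."""
    return latency_ms <= frame_budget_ms(target_fps)


print(round(frame_budget_ms(30), 1))  # 33.3
print(meets_target(25.0))             # True: 25 ms fits in a 33.3 ms budget
print(meets_target(40.0))             # False: 40 ms is too slow for 30 fps
```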

With our hardware limited to 16GB of memory, an 8-core AMD Ryzen processor, and a Radeon Instinct MI25 GPU, we chose YOLOv2 as a starting point.

Application Design

The picture below is a high-level architecture of a typical traffic vision application.

The high-level design of a traffic vision app

The next picture shows the lower-level software components of our application; the bottom three are provided by the AMD MIVisionX toolkit.

Lower level components of traffic vision application

How to translate the model to use the GPU?

As mentioned above, a neural network is a graph of operators (such as convolution and ReLU) and their parameters (the weight and bias matrices). Links to download the YOLOv2 graph and its parameters are given below:

  1. Yolo V2 network
  2. Yolo V2 weights

This is a raw graph, which must be translated into a set of instructions that can use the CPU cores and GPUs on an AMD desktop. The AMD MIVisionX package provides a simple way to translate it for MI-xx GPU based systems using the AMD OpenVX and OpenCL libraries. AMD OpenVX is a low-level library for accelerating computer vision applications, whereas OpenCL is a framework for writing programs that execute across heterogeneous platforms such as CPUs and GPUs.

The MIVisionX model is generated by the model conversion process shown in the picture below. This process converts YOLOv2 into an MIVisionX model, an OpenVX library (with a .so extension) ready for execution on x86-based CPU cores and MI-xx GPUs.

YOLO model conversion process for vision applications on AMD-based systems

Application Front End

Once we have the model (a dynamic library), we wrap it as a Python package. The MIVisionX model is the most compute-intensive part of the application and is executed on the GPU. We implemented the rest of the components as Python modules that execute on the host CPU, as shown in the picture below.
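One common way to wrap a dynamic library for Python is the standard `ctypes` module. The sketch below is an illustration under stated assumptions, not the application's actual wrapper: the class name `DetectorModel`, the exported symbol `run_inference`, and its C signature are all hypothetical, and the real library generated by MIVisionX may expose a different interface.

```python
import ctypes
import os
from typing import List, Sequence


class DetectorModel:
    """Thin Python wrapper around a compiled model library (hypothetical API)."""

    def __init__(self, library_path: str) -> None:
        if not os.path.isfile(library_path):
            raise FileNotFoundError(f"model library not found: {library_path}")
        # ctypes.CDLL loads a shared object (.so) into the Python process.
        self._lib = ctypes.CDLL(library_path)
        # HYPOTHETICAL entry point, assumed to have the C signature:
        #   int run_inference(const float *in, size_t n_in,
        #                     float *out, size_t n_out);
        self._run = self._lib.run_inference
        self._run.argtypes = [
            ctypes.POINTER(ctypes.c_float), ctypes.c_size_t,
            ctypes.POINTER(ctypes.c_float), ctypes.c_size_t,
        ]
        self._run.restype = ctypes.c_int

    def infer(self, pixels: Sequence[float], output_size: int) -> List[float]:
        """Copy pixels into a C buffer, run the model, return the raw output."""
        in_buf = (ctypes.c_float * len(pixels))(*pixels)
        out_buf = (ctypes.c_float * output_size)()
        status = self._run(in_buf, len(pixels), out_buf, output_size)
        if status != 0:
            raise RuntimeError(f"inference failed with status {status}")
        return list(out_buf)
```

Keeping the wrapper this thin leaves the GPU-bound work inside the library, while capture, pre-processing, and drawing stay in ordinary Python modules on the host CPU.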

Traffic vision front end components

How to Deploy

Although we developed it as a desktop application with an IP webcam as the live feed, one can extend it to run in the cloud or fog as an HTTP service. Such a service is expected to use Node.js/JavaScript to support other types of cameras, including mobile phones and traffic cams, and to run in a V8-supported browser.

Traffic Surveillance as a cloud service

Results

This app runs at 50 high-definition (1920×1080) frames/sec on an AMD Ryzen desktop with an MI25 AI accelerator.
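To put that figure in perspective, the short calculation below works out how many pixels per second the pipeline handles at this rate. The raw bandwidth line assumes uncompressed 8-bit RGB frames (3 bytes per pixel), which is an assumption for illustration and not something stated about the application.

```python
WIDTH, HEIGHT = 1920, 1080  # HD resolution from the result above
FPS = 50                    # measured frame rate from the result above
BYTES_PER_PIXEL = 3         # ASSUMPTION: uncompressed 8-bit RGB

pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS
raw_bytes_per_second = pixels_per_second * BYTES_PER_PIXEL

print(pixels_per_frame)      # 2073600
print(pixels_per_second)     # 103680000
print(raw_bytes_per_second)  # 311040000, roughly 311 MB/s of raw pixel data
```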

Chris Pratt's reaction to our result was one of shock. 😲

You can download the source on GitHub. For curious souls like me, more information is available in the references below.

References

  1. Traffic Vision app
  2. Vehicle Detection and Tracking Techniques
  3. Measuring Traffic Speed
  4. YOLOv2 paper
  5. Tiny YOLO, aka the Darknet reference network
  6. MIVisionX setup
  7. AMD OpenVX
  8. Deep Learning experts
