Computer Vision

Yolov3 CPU Inference Performance Comparison — Onnx, OpenCV, Darknet

Author(s): Matan Kleyman

Opencv, Darknet, Onnxruntime Object Detection Frameworks | Image by author

Choosing the right inference framework for real-time object detection applications has become a significant challenge, especially when models must run on low-powered devices. In this article you will learn how to choose the best inference detector for your needs and discover the large performance gain it can give you.

Usually, we tend to focus on lightweight model architectures when we aim to deploy models on CPU or mobile devices, while neglecting the search for a fast inference engine.

During my research on fast inference on CPU devices, I tested various frameworks that offer a stable Python API. Today we will focus on the Onnxruntime, OpenCV DNN and Darknet frameworks and measure them in terms of performance (running time) and accuracy.

We will use two common Object Detection Models for the performance measurement:

Yolov3
image_size = 480*480
classes = 98
BFLOPS = 87.892

Tiny-Yolov3
image_size = 1024*1024
classes = 98
BFLOPS = 46.448

Both models were trained using AlexeyAB’s Darknet Framework on custom data.

Now let’s walk through running inference with the detectors we want to test.

Darknet Detector

Darknet is the official framework for training YOLO (You Only Look Once) Object-Detection Models.

Furthermore, it can run inference on models in the *.weights file format, which is the same format that training outputs.

There are two methods for inferencing:

  • Multiple images (darknet prompts for each image path):
darknet detector test cfg/ cfg/yolov3.cfg yolov3.weights -thresh 0.25
  • One image:
darknet detector test cfg/ cfg/yolov3.cfg yolov3.weights dog.png
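For scripted runs, the same invocation can be assembled and launched from Python. The sketch below is only an illustration: the function names, the `binary` default and the `data_path` argument (the *.data file passed right after `detector test`) are assumptions, and the paths must be replaced with your own.

```python
import subprocess
import time


def build_darknet_command(data_path, cfg_path, weights_path, image_path=None,
                          thresh=0.25, binary="darknet"):
    """Build the argument list for `darknet detector test`.

    data_path is a placeholder for your own *.data file (hypothetical name).
    When image_path is None, darknet asks for image paths interactively.
    """
    cmd = [binary, "detector", "test", data_path, cfg_path, weights_path,
           "-thresh", str(thresh)]
    if image_path is not None:
        cmd.append(image_path)
    return cmd


def run_darknet(cmd):
    """Run darknet once and return (elapsed_seconds, stdout).

    The elapsed time covers the whole process, including model loading,
    so it is not directly comparable to a forward-pass-only measurement.
    """
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return time.perf_counter() - start, result.stdout
```

A call such as `run_darknet(build_darknet_command("my.data", "cfg/yolov3.cfg", "yolov3.weights", "dog.png"))` would then run a single image, assuming the `darknet` binary is on your PATH.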

OpenCV DNN Detector

Opencv-DNN is a module of the well-known OpenCV library, which is commonly used in the computer vision field. Darknet claims that opencv-dnn is “the fastest inference implementation of YOLOv4/v3 on CPU Devices” because of its efficient C/C++ implementation.

Loading darknet weights into opencv-dnn is straightforward thanks to its convenient Python API.

This is a code snippet of E2E Inference:
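A minimal sketch of end-to-end inference with opencv-dnn could look like the following. It is an illustration under assumptions, not the exact code used for the benchmark: the function names, the default input size, the NMS threshold of 0.45 and the decoding of each output row as [cx, cy, w, h, objectness, class scores...] are choices made here for the example.

```python
def to_pixel_box(cx, cy, w, h, img_w, img_h):
    """Convert a normalized center-format box to pixel [x, y, width, height]."""
    box_w = int(w * img_w)
    box_h = int(h * img_h)
    x = int(cx * img_w - box_w / 2)
    y = int(cy * img_h - box_h / 2)
    return [x, y, box_w, box_h]


def detect_opencv(image_path, cfg_path, weights_path, input_size=(480, 480),
                  conf_thresh=0.25, nms_thresh=0.45):
    """Run a Darknet model through opencv-dnn on CPU; return (class_id, score, box)."""
    import cv2  # imported lazily so to_pixel_box works without OpenCV installed
    import numpy as np

    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    img_h, img_w = image.shape[:2]

    # Scale pixels to [0, 1], resize to the network input, BGR -> RGB.
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, input_size, swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, scores, class_ids = [], [], []
    for row in np.vstack(outputs):
        class_scores = row[5:]
        class_id = int(np.argmax(class_scores))
        score = float(class_scores[class_id])
        if score >= conf_thresh:
            boxes.append(to_pixel_box(row[0], row[1], row[2], row[3], img_w, img_h))
            scores.append(score)
            class_ids.append(class_id)

    # Non-maximum suppression; reshape handles both old and new return shapes.
    keep = np.array(cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)).reshape(-1)
    return [(class_ids[i], scores[i], boxes[i]) for i in keep]
```

For the Tiny-Yolov3 model described earlier, `input_size` would be `(1024, 1024)` instead of the default.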

Onnxruntime Detector

Onnxruntime is maintained by Microsoft and claims to achieve dramatically faster inference thanks to its built-in optimizations and its ONNX model file format.

As you can see in the next image, it supports various flavors and technologies.

In our comparison we will use the Python / x64 / CPU flavor.

ONNX Format defines a common set of operators — the building blocks of machine learning and deep learning models — and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.

Converting Darknet weights to ONNX weights

In order to run inference with Onnxruntime, we have to convert the *.weights format to the *.onnx format.

We will use a repository which was created specifically for converting darknet *.weights format into *.pt (PyTorch) and *.onnx (ONNX Format).


  • Clone the repo and install the requirements.
  • Run with your cfg & weights & img_size arguments.
python yolov3.cfg yolov3.weights 1024 1024
  • A yolov3.onnx file will be created in the yolov3.weights directory.
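After the conversion, it can be useful to confirm that the file landed where expected and that it is a structurally valid ONNX model. The sketch below is an optional check and not part of the converter: both helper names are made up for this example, and it assumes the output keeps the stem of the weights file, as in the yolov3.weights to yolov3.onnx case above.

```python
from pathlib import Path


def expected_onnx_path(weights_path):
    """Return the path where the converted model is expected to appear.

    Assumes the converter writes <stem>.onnx next to the *.weights file.
    """
    return Path(weights_path).with_suffix(".onnx")


def check_onnx_model(onnx_path):
    """Validate the ONNX file and return the names of its graph inputs."""
    import onnx  # imported lazily; requires the `onnx` package

    model = onnx.load(str(onnx_path))
    onnx.checker.check_model(model)  # raises if the model is malformed
    return [graph_input.name for graph_input in model.graph.input]
```

For example, `check_onnx_model(expected_onnx_path("yolov3.weights"))` would raise an error if the exported graph is invalid.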

***Keep in mind that there is a minor drop in accuracy (~0.1% mAP) when running inference with the ONNX format, due to the conversion process. The converter imitates darknet functionality in PyTorch but is not flawless.***

***Feel free to open issues or PRs in order to support conversion for darknet architectures other than yolov3.***

After we successfully converted our model to ONNX format we can run inference using Onnxruntime.

Below you can find a code snippet of an E2E Inference:
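A minimal sketch of inference with Onnxruntime on CPU could look like the following. It is an illustration, not the exact benchmark code: the function names are made up here, and the preprocessing (resize, BGR to RGB, scaling to [0, 1], NCHW layout) is an assumption about what the converted model expects and should be checked against your converter.

```python
def preprocess(image_bgr, width, height):
    """Turn a BGR image into a float32 NCHW blob (assumed input layout)."""
    import cv2  # lazy imports: only needed when real images are processed
    import numpy as np

    resized = cv2.resize(image_bgr, (width, height))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    blob = rgb.astype(np.float32) / 255.0
    return np.expand_dims(blob.transpose(2, 0, 1), axis=0)  # HWC -> NCHW


def create_session(onnx_path):
    """Create an Onnxruntime session restricted to the CPU execution provider."""
    import onnxruntime as ort

    return ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])


def run_forward(session, image_blob):
    """Run one forward pass and return the raw outputs in graph order."""
    input_name = session.get_inputs()[0].name
    output_names = [output.name for output in session.get_outputs()]
    return session.run(output_names, {input_name: image_blob})
```

The raw outputs still need post-processing (confidence filtering and non-maximum suppression), and their exact layout depends on the converter that produced the *.onnx file.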

Performance Comparison

Congratulations, we have gone through all of the technicalities and you should now have sufficient knowledge for inferencing with each one of the detectors.

Now let’s address our main goal — Performance Comparison.

Performance was measured separately for each of the models mentioned above (Yolov3, Tiny-Yolov3) on a PC CPU, an Intel i7 9th Gen.

For opencv and onnxruntime, we measure only the execution time of forward propagation in order to isolate it from pre- and post-processing.

These lines were profiled:

  1. Opencv
layers_result =

2. Onnxruntime

layers_result = session.run([output_name_1, output_name_2], {input_name: image_blob})
layers_result = np.concatenate([layers_result[1], layers_result[0]], axis=1)

3. Darknet

darknet detector test cfg/ cfg/yolov3.cfg yolov3.weights -thresh 0.25
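One way to time only the forward pass, as described above, is to prepare all inputs first and wrap nothing but the forward call in a timer. This is a sketch of that idea, not the profiler actually used for the article; `time_forward_passes` and `forward_fn` are hypothetical names.

```python
import time


def time_forward_passes(forward_fn, blobs):
    """Time only the forward call for each prepared input.

    forward_fn is any callable that runs one forward pass on one blob,
    e.g. a small wrapper around net.forward or session.run.
    Returns a list with the elapsed seconds per image.
    """
    timings = []
    for blob in blobs:
        start = time.perf_counter()
        forward_fn(blob)
        timings.append(time.perf_counter() - start)
    return timings
```

Because the blobs are built before the loop starts, image loading and preprocessing do not leak into the measurement, and `sum(timings)` gives the total inference time.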

The Verdict


Yolov3 was tested on 400 unique images.

  1. The ONNX detector is the fastest at running inference on our Yolov3 model: it is 43% faster than opencv-dnn, which is considered to be one of the fastest detectors available.
Yolov3 Total Inference Time — Created by Matan Kleyman

2. Average Time Per Image:

Yolov3 Avg time per image — Created by Matan Kleyman
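Both reported metrics can be derived from total running times. The sketch below shows one possible way to compute them. The definition of "faster" used here (reduction in running time relative to the baseline) is an assumption, since the article does not state the formula, and both function names are made up for this example.

```python
def average_time_per_image(total_seconds, num_images):
    """Average inference time per image, in seconds."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return total_seconds / num_images


def percent_faster(baseline_seconds, candidate_seconds):
    """Reduction in running time relative to the baseline, in percent.

    Assumed definition: (baseline - candidate) / baseline * 100.
    """
    return (baseline_seconds - candidate_seconds) / baseline_seconds * 100.0
```

With this definition, a detector that needs 8 seconds where the baseline needs 10 seconds comes out roughly 20% faster (made-up numbers, used only to illustrate the formula).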


Tiny-Yolov3 was tested on 600 unique images.

  1. Here as well, the ONNX detector is superior: on our Tiny-Yolov3 model it is 33% faster than opencv-dnn.
Tiny-Yolov3 Total Inference Time — Created by Matan Kleyman

2. Average Time Per Image:

Tiny-Yolov3 Avg time per image — Created by Matan Kleyman


  1. We have seen that onnxruntime runs inference significantly faster than opencv-dnn.
  2. We managed to run Yolov3 in less time than Tiny-Yolov3, even though Yolov3 is much larger!
  3. We have the necessary tools to convert a model that was trained in darknet into a *.onnx format.

That is all for this article. I hope you find it useful. If so, please give it a clap. For any questions and suggestions, feel free to connect with me on LinkedIn.

Thanks for reading!

~ Matan

Yolov3 CPU Inference Performance Comparison — Onnx, OpenCV, Darknet was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
