

Computer Vision

Detection of Soccer Players from Thermal Images using Monk AI

Author(s): Kushagra Awasthi


Making computer vision easy with Monk, a low-code deep learning tool and a unified wrapper for computer vision.


In this tutorial, we will build an object detection application using a thermal image dataset recorded on an indoor soccer field. The application lets us count the players on the field at any given moment, so it can also be used in target tracking. It is especially helpful for tracking several people at once when they move quickly and erratically and wear similar uniforms. Monk's object detection toolkit lets us build and deploy the model with low-code syntax, and its one-line installation of different deep learning pipelines makes the work easier.

Create real-world Object Detection applications using Monk

  • Wheat Detection in Field
  • Trash Detection
  • Object Detection in Bad Light

About Dataset

The thermal soccer dataset is available on Kaggle. It was captured with thermal cameras, which allow better segmentation and protect the privacy of people in public facilities.
The dataset contains four 30-second video sequences of 8 people playing soccer in an indoor arena. The video was captured with AXIS Q1922 thermal cameras at a resolution of 640×480 pixels and 25 fps. The three camera images are stitched into one image of 1920×480 pixels.
The videos are manually annotated for tracking.
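The sketch below checks the arithmetic in the dataset description. Three 640×480 frames placed side by side give a 1920×480 image. Four 30-second sequences at 25 fps give 3,000 frames.

It assumes the stitch is a plain side-by-side concatenation. The widths add up exactly, but the dataset description does not say how the stitching was done. The arrays are dummy placeholders, not real thermal frames.

```python
import numpy as np

# Per-camera frame geometry from the dataset description (rows = height, cols = width).
FRAME_H, FRAME_W = 480, 640
NUM_CAMERAS = 3

# Dummy grayscale frames standing in for the three camera views (assumption: no overlap).
frames = [np.zeros((FRAME_H, FRAME_W), dtype=np.uint8) for _ in range(NUM_CAMERAS)]

# Side-by-side stitching concatenates along the width axis.
stitched = np.hstack(frames)
print(stitched.shape)  # (480, 1920): 480 rows high, 1920 columns wide

# Total frames across the dataset: 4 sequences x 30 seconds x 25 fps.
total_frames = 4 * 30 * 25
print(total_frames)  # 3000
```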

Thermal Soccer Dataset

Table of Contents

1. Installation Instructions

2. Use the trained model to detect soccer players

3. Training your own detector using the MMDetection wrapper

   • Training

4. Inference


The first step is to set up the MONK AI toolkit and its dependencies on the platform we are working on. I am using Google Colab as my environment.

Use an already trained model for detection.

The MONK toolkit also lets us use pre-trained models to demonstrate our applications. Here I use a model that I pre-trained myself to detect soccer players in thermal images.

Downloading the pre-trained model folder and using it to infer some test images.

Loading the model parameters from the pre-trained model folder.

Using the predict function, we will predict the bounding boxes of soccer players in some test images.
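Once we have bounding boxes, counting the players on the field is a short post-processing step. The sketch below is a minimal illustration under stated assumptions.

The record layout `(x1, y1, x2, y2, score, label)`, the class name `"person"`, the helper `count_players`, and the 0.5 threshold are all hypothetical. This is not the structure Monk's predict function returns, so adapt it to whatever your predictions actually contain.

```python
from typing import List, Tuple

# Hypothetical detection record: (x1, y1, x2, y2, confidence score, class label).
Detection = Tuple[float, float, float, float, float, str]


def count_players(
    detections: List[Detection],
    score_threshold: float = 0.5,  # assumed cut-off; tune on validation images
    label: str = "person",         # hypothetical class name
) -> int:
    """Count detections of one class whose confidence meets the threshold."""
    return sum(
        1
        for (_x1, _y1, _x2, _y2, score, lbl) in detections
        if lbl == label and score >= score_threshold
    )


dets = [
    (10, 20, 40, 90, 0.92, "person"),
    (300, 25, 330, 100, 0.81, "person"),
    (700, 30, 725, 95, 0.35, "person"),  # low-confidence box, dropped at 0.5
]
print(count_players(dets))                       # 2
print(count_players(dets, score_threshold=0.3))  # 3
```

Raising the threshold trades missed players for fewer false positives. This is worth checking on the thermal images, where warm background objects can produce spurious boxes.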

Inferred images

Training a Custom Detector

The first step in training a custom detector is to convert the dataset from VOC format to MONK format. Before that, we need a properly structured VOC dataset, which we prepare with the following steps:

  • Download the dataset from Kaggle onto your local system.
  • Move the images from the separate folders into one common folder.
  • Select all the images and rename the first one to "img" (on Windows, renaming a multi-file selection this way names the files img (1), img (2), and so on).
  • Upload this image folder to your Google Drive and mount the drive in the notebook.
  • Download the XML annotation files from Kaggle, create a separate XML file for each image, and save these files in a separate folder.
  • These steps ensure that every image is matched with the correct labels after the dataset is converted from VOC to MONK format.
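The first three steps above are done by hand in the tutorial. The sketch below shows how they could be scripted instead, as a minimal example.

It assumes:

  • image files end in `.jpg`, `.jpeg`, or `.png`;
  • the target naming scheme mimics the Windows bulk rename (`img (1).jpg`, `img (2).jpg`, ...).

The function name `gather_images` is hypothetical. Files are taken in sorted name order. If the original frame names are not zero-padded, check that this order matches the order of the annotations.

```python
import shutil
from pathlib import Path

# Assumed image extensions present in the downloaded folders.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def gather_images(src_dirs, dst_dir, prefix="img"):
    """Copy images from several folders into one folder with sequential names.

    Returns a dict mapping each original path (as str) to its new file name,
    which is needed later to attach the right annotations to each image.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    mapping = {}
    index = 1
    for src in src_dirs:
        # Sorted order keeps numbering deterministic across runs.
        for path in sorted(Path(src).iterdir()):
            if path.suffix.lower() not in IMAGE_EXTS:
                continue  # skip non-image files such as XML or notes
            new_name = f"{prefix} ({index}){path.suffix.lower()}"
            shutil.copy2(path, dst / new_name)
            mapping[str(path)] = new_name
            index += 1
    return mapping
```

Keeping the returned mapping is the point of the exercise. It records which original frame each `img (N)` file came from, which is exactly what the label matching in the last step depends on.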

Saving images in an image directory “Persons” in the root directory.

Creating separate XML files for each image and saving them in a separate directory “Person_bbox1” in the root directory.
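The original snippet for this step is not reproduced here, and neither is the layout of the Kaggle XML files. The sketch below covers only the output side: it builds a minimal Pascal VOC annotation for one image from a list of boxes.

The helper name `make_voc_xml`, the `(label, xmin, ymin, xmax, ymax)` input tuples, and the default `depth=3` are assumptions. Parsing the source XML into those tuples depends on its actual structure.

```python
import xml.etree.ElementTree as ET


def make_voc_xml(filename, width, height, boxes, folder="Persons", depth=3):
    """Build a minimal Pascal VOC annotation string for one image.

    boxes: list of (label, xmin, ymin, xmax, ymax) in pixel coordinates.
    depth=3 assumes the frames are saved as 3-channel images (assumption).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = str(depth)
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        # VOC stores integer pixel coordinates; round any float inputs.
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(int(round(value)))
    return ET.tostring(root, encoding="unicode")


xml_text = make_voc_xml(
    "img (1).jpg", 1920, 480,
    [("person", 10, 20, 40, 90), ("person", 300.4, 25, 330, 100)],
)
# Each string would then be written to Person_bbox1/<image stem>.xml.
```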

Similarly, the XML files for the images in the other three folders are saved in the annotation folder in the root directory.
Now that the VOC dataset is ready, we will convert it to MONK format.
But first, what is MONK format?

MONK format


To convert our data to the MONK format shown above, we run the code snippet below.

Convert dataset from VOC to MONK TYPE
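Since the original conversion snippet is not reproduced here, the following is a minimal stand-in written with the standard library. It assumes the MONK CSV has two columns, `ID` (image file name) and `Label` (all boxes of that image, each written as `x1 y1 x2 y2 label`, separated by spaces). Compare it against the MONK format figure above before relying on it.

The function names are hypothetical, and class names must not contain spaces for this encoding to work.

```python
import csv
import io
import xml.etree.ElementTree as ET


def voc_to_monk_row(xml_text):
    """Turn one VOC annotation into an (ID, Label) pair in the assumed MONK layout."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    parts = []
    for obj in root.findall("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = [bndbox.findtext(tag) for tag in ("xmin", "ymin", "xmax", "ymax")]
        parts.append(" ".join(coords + [label]))
    return filename, " ".join(parts)


def write_monk_csv(xml_texts, out):
    """Write a header plus one row per annotation to a text stream."""
    writer = csv.writer(out)
    writer.writerow(["ID", "Label"])
    for xml_text in xml_texts:
        writer.writerow(voc_to_monk_row(xml_text))


sample = (
    "<annotation><filename>img (1).jpg</filename>"
    "<object><name>person</name><bndbox><xmin>10</xmin><ymin>20</ymin>"
    "<xmax>40</xmax><ymax>90</ymax></bndbox></object>"
    "<object><name>person</name><bndbox><xmin>300</xmin><ymin>25</ymin>"
    "<xmax>330</xmax><ymax>100</ymax></bndbox></object></annotation>"
)
buf = io.StringIO()
write_monk_csv([sample], buf)
print(buf.getvalue())
```

Packing every box of an image into a single `Label` cell keeps the CSV at one row per image, which is what makes the ID-to-label matching prepared earlier straightforward.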

The generated CSV file looks as shown below.



Next, the MONK dataset is converted to COCO format, which is what the object detector uses for training. In COCO format, the bounding-box annotations for every image are stored in a JSON file. In addition, a classes.txt file lists all the object classes that can appear in an image.

Converting dataset from MONK to COCO type
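The sketch below shows what the MONK-to-COCO step produces, written with the standard library rather than Monk's converter. It assumes:

  • the `ID,Label` MONK layout described earlier;
  • a caller-supplied `image_sizes` dict, since the CSV does not store image dimensions;
  • category ids numbered from 1 in sorted class order.

It follows the COCO convention of storing each box as `[x, y, width, height]`. Monk's own tool may differ in details such as id numbering.

```python
import csv
import io
import json


def monk_to_coco(csv_text, image_sizes):
    """Convert an assumed MONK CSV (ID, Label) to a COCO-style dict plus class list.

    image_sizes: dict mapping file name -> (width, height).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Every 5th token starting at index 4 is a class name.
    classes = sorted({tok for row in rows for tok in row["Label"].split()[4::5]})
    cat_ids = {name: i + 1 for i, name in enumerate(classes)}

    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": cat_ids[c], "name": c} for c in classes],
    }
    ann_id = 1
    for img_id, row in enumerate(rows, start=1):
        width, height = image_sizes[row["ID"]]
        coco["images"].append(
            {"id": img_id, "file_name": row["ID"], "width": width, "height": height}
        )
        tokens = row["Label"].split()
        for i in range(0, len(tokens), 5):
            x1, y1, x2, y2 = map(float, tokens[i:i + 4])
            w, h = x2 - x1, y2 - y1
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[tokens[i + 4]],
                "bbox": [x1, y1, w, h],  # COCO uses top-left corner + size
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco, classes


def save_coco(coco, classes, json_path, classes_path):
    """Write the annotation JSON and a classes.txt with one class per line."""
    with open(json_path, "w") as f:
        json.dump(coco, f)
    with open(classes_path, "w") as f:
        f.write("\n".join(classes) + "\n")
```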


Once the dataset is in COCO format, we can proceed to the final step: training our detector with the MMDetection wrapper class.

Importing the Detector module.

Now, we will update the dataset parameters, model parameters, hyperparameters, and training parameters for our detector.

Now, we are all set to start training our model.


Once training is complete, we can run inference on some images to check the accuracy and efficiency of our model.

Setting up the model parameters for inference using the checkpoint from the latest epoch of the trained model.
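To pick the latest epoch, it helps to select the checkpoint by epoch number rather than by file name. The reason is that a plain string sort puts `epoch_10.pth` before `epoch_9.pth`.

The sketch below assumes MMDetection's default `epoch_N.pth` checkpoint naming. The work-directory path Monk's wrapper writes to is not shown in this article, so `work_dir` is a placeholder, and `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# Matches MMDetection-style epoch checkpoints such as "epoch_12.pth" (assumed naming).
EPOCH_RE = re.compile(r"^epoch_(\d+)\.pth$")


def latest_checkpoint(work_dir) -> Optional[Path]:
    """Return the epoch checkpoint with the highest epoch number, or None."""
    best_epoch, best_path = -1, None
    for path in Path(work_dir).iterdir():
        match = EPOCH_RE.match(path.name)
        # Compare numerically so that epoch_10 beats epoch_9.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```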

Now, we will infer an image.

Inferred Image


So, we saw how MONK's low-code syntax makes it easy to build a soccer player detection application from a thermal image dataset. This type of application supports real-time tracking of people and, because it uses thermal images, also protects their privacy. Thermal cameras can see at night and even in severe weather, which makes them very useful for target detection. This application could also be used by security forces for surveillance. For more such applications, refer to the Application Model Zoo of the MONK object detection library.

The tutorial is available on GitHub.

Detection of Soccer Players from Thermal Images using Monk AI was originally published in Towards AI — Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.
