Understanding Pascal VOC and COCO Annotations for Object Detection

Last Updated on June 5, 2020 by Editorial Team

Author(s): Pushkar Pushp

Introduction

In the previous blog, we created both COCO and Pascal VOC datasets for object detection and segmentation. Here we take a deep dive into the annotation formats of these two datasets.

Pascal VOC

PASCAL (Pattern Analysis, Statistical Modelling, and Computational Learning) is a Network of Excellence funded by the EU. It ran the Visual Object Classes (VOC) challenge from 2005 until 2012.

The file structure obtained after exporting annotations from VoTT is described below.

The four components are Annotations, ImageSets, JPEGImages, and pascal_label_map.pbxt.

We will come to Annotations last. JPEGImages is the folder containing the original images. ImageSets provides two txt files, train and valid, inside its Main folder; they list the images in the training and validation sets.

The pascal_label_map.pbxt contains the id and name of the object to be detected.

pascal_label_map.pbxt

item {
  id: 1
  name: 'cricketers'
}
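To read such a label map programmatically, a minimal sketch is shown here. It assumes the file only contains simple `item { id ... name ... }` entries like the one above and uses a regular expression rather than a real protobuf parser; the function name `parse_label_map` is hypothetical.

```python
import re

# Label-map text copied from the example above.
LABEL_MAP_TEXT = """
item {
  id: 1
  name: 'cricketers'
}
"""

def parse_label_map(text):
    """Return {id: name} for every simple item block (regex sketch, not a full parser)."""
    pattern = re.compile(
        r"item\s*\{\s*id:\s*(\d+)\s*name:\s*['\"]([^'\"]+)['\"]\s*\}"
    )
    return {int(item_id): name for item_id, name in pattern.findall(text)}

label_map = parse_label_map(LABEL_MAP_TEXT)
print(label_map)  # {1: 'cricketers'}
```

A regex keeps the sketch dependency-free; it would not handle extra fields or a different field order.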

The last and most important component is Annotations: one annotation file is created for each image in the given folder.

Each annotation is stored as an XML file. Let's look into one sample XML file.

The XML output contains the following components, or tags.

folder: the folder that contains the images.

filename: the relative path of the annotated image.

path: the absolute path of the output file after annotation.

size: the width and height of the image in pixels, plus depth, the number of channels. For an RGB image depth is 3; for a black-and-white image it is 1.

object: contains name, pose, truncated, and difficult.

  • name: the name of the object being annotated; in our case it is cricketers.
  • pose: the orientation of the object (left, right, etc.).
  • truncated: 1 if the object extends beyond the bounding box (for example, it is cut off at the image boundary), else 0.
  • difficult: 1 if the object is hard to recognize and is therefore ignored during evaluation, else 0.

bndbox: the bounding box, given by its top-left corner (xmin, ymin) and its bottom-right corner (xmax, ymax).
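The tags described above can be read with Python's standard `xml.etree.ElementTree` module. The following is a minimal sketch: the sample XML, its file names, and its numbers are hypothetical values made up for illustration, and `parse_voc` is a helper name introduced here, not part of VoTT or the VOC toolkit.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample annotation; all values are placeholders for illustration.
SAMPLE_XML = """
<annotation>
  <folder>Annotation</folder>
  <filename>sample.jpg</filename>
  <path>/tmp/output/sample.jpg</path>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>cricketers</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>100</xmin>
      <ymin>50</ymin>
      <xmax>300</xmax>
      <ymax>400</ymax>
    </bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return (image_info, objects) extracted from one Pascal VOC XML string."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    image_info = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "depth": int(size.findtext("depth")),
    }
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Coordinates may be written as floats, so go through float() first.
        bbox = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append({
            "name": obj.findtext("name"),
            "truncated": int(obj.findtext("truncated")),
            "difficult": int(obj.findtext("difficult")),
            "bbox": bbox,
        })
    return image_info, objects

image_info, objects = parse_voc(SAMPLE_XML)
print(image_info)
print(objects)
```

To parse a file on disk instead of a string, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.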

COCO

COCO stands for Common Objects in Context. The dataset contains 91 object types with 2.5 million labeled instances across 328,000 images.

COCO is a dataset for object detection, segmentation, and captioning. Its features include:

  • Object segmentation
  • Recognition in context
  • Superpixel stuff segmentation

COCO stores annotations in JSON format, unlike Pascal VOC, which uses XML.

The official COCO documentation states that it has five annotation types: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning.

Image source: COCO official site, http://cocodataset.org/

The basic data structure of COCO annotations is the same for all five types. Unlike Pascal VOC, which creates one annotation file per image, COCO uses a single annotation file for all images.

Let's have a closer look at the annotation file.

The four basic components are info, images, annotations, and license.

info: contains the standard information/description of the dataset.

images: all details about each individual image, as listed in the figure above.

license: consists of the licenses applicable to the images.
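As a rough illustration of this layout, the sketch below builds a tiny COCO-style dictionary and round-trips it through JSON. Every value is a hypothetical placeholder. Note that in the official format the top-level key for licenses is `licenses`, and object detection files also carry a top-level `categories` list.

```python
import json

# Hypothetical, minimal COCO-style skeleton; all values are placeholders.
coco = {
    "info": {
        "description": "Cricket dataset (hypothetical)",
        "version": "1.0",
        "year": 2020,
    },
    "licenses": [
        {"id": 1, "name": "Example license", "url": ""},
    ],
    "images": [
        {"id": 1, "file_name": "sample.jpg", "width": 640, "height": 480, "license": 1},
    ],
    "annotations": [],  # one entry per object, across all images
    "categories": [],   # one entry per object class
}

# A single JSON document holds the annotations for the whole dataset.
text = json.dumps(coco, indent=2)
loaded = json.loads(text)
print(sorted(loaded.keys()))
# ['annotations', 'categories', 'images', 'info', 'licenses']
```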

Here we will focus on object detection annotations.

annotations: to understand these, let's first understand what categories are.

Categories contains a list of all object categories, each with a unique id and the supercategory it belongs to. For instance, individual players belong to the supercategory cricketers, and each player has a unique id.

"categories": [
{"supercategory": "cricketers", "id": 1, "name": "Sachin"},
{"supercategory": "cricketers", "id": 2, "name": "Stokes"},
{"supercategory": "umpire", "id": 3, "name": "Bucknor"},
{"supercategory": "cricketers", "id": 4, "name": "Waugh"},
{"supercategory": "umpire", "id": 5, "name": "Taufel"}
]
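In practice the categories list is usually turned into lookup tables. The sketch below, using the same five entries, maps each id to its name and groups names by supercategory; the variable names are illustrative only.

```python
import json
from collections import defaultdict

# The categories list from the example above, as valid JSON.
CATEGORIES_JSON = """
[
  {"supercategory": "cricketers", "id": 1, "name": "Sachin"},
  {"supercategory": "cricketers", "id": 2, "name": "Stokes"},
  {"supercategory": "umpire", "id": 3, "name": "Bucknor"},
  {"supercategory": "cricketers", "id": 4, "name": "Waugh"},
  {"supercategory": "umpire", "id": 5, "name": "Taufel"}
]
"""

categories = json.loads(CATEGORIES_JSON)

# category_id -> name, used to label each annotation.
id_to_name = {cat["id"]: cat["name"] for cat in categories}

# supercategory -> list of category names, in file order.
by_supercategory = defaultdict(list)
for cat in categories:
    by_supercategory[cat["supercategory"]].append(cat["name"])

print(id_to_name[3])                   # Bucknor
print(dict(by_supercategory))
# {'cricketers': ['Sachin', 'Stokes', 'Waugh'], 'umpire': ['Bucknor', 'Taufel']}
```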

The number of annotations equals the total number of objects present in the entire image dataset.

Original image on left and mask on right.

The mask is converted into COCO annotation.

The annotation consists of segmentation, iscrowd, image_id, category_id, id, bbox, and area.

[{'segmentation': [[236.0,616.5,610.5,...,613.5,233.0,612.5,236.0,616.5]],
'iscrowd': 0,
'image_id': 1,
'category_id': 1,
'id': 1,
'bbox': (199.5, 126.5, 474.0, 490.0),
'area': 103944.0}]
  • segmentation: a list of polygon vertices [x1, y1, x2, y2, ...] for a single object; for a cluster of objects, run-length encoding (RLE) is used. RLE stores repeating values by the number of times they repeat.
  • iscrowd: 0 for a single object and 1 for a cluster of objects.
  • image_id: unique id corresponding to the image in the dataset.
  • category_id: the id of the category the object belongs to.
  • id: unique id for each annotation.
  • area: area of the segmented region in pixels. In the example above it is 103944.0, smaller than the bounding box area of 474.0 × 490.0 = 232260.0.
  • bbox: a list containing [top left x position, top left y position, width, height]
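Because COCO stores a box as [x, y, width, height] while Pascal VOC stores (xmin, ymin, xmax, ymax), converting between the two is a common step. The sketch below applies the conversion to the bbox from the example annotation and also illustrates the idea behind run-length encoding on a flat list of 0/1 values. The function names are hypothetical, and `rle_counts` is a simplified illustration: COCO's own RLE flattens the mask in column-major order and is normally produced with the pycocotools library.

```python
def coco_to_voc_bbox(bbox):
    """Convert COCO [x, y, width, height] to Pascal VOC (xmin, ymin, xmax, ymax)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)

def voc_to_coco_bbox(box):
    """Convert Pascal VOC (xmin, ymin, xmax, ymax) to COCO (x, y, width, height)."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)

def rle_counts(bits):
    """Run lengths of a flat 0/1 sequence, starting with the run of zeros.

    Simplified sketch: if the sequence starts with 1, the first count is 0.
    """
    counts = []
    current, run = 0, 0
    for bit in bits:
        if bit == current:
            run += 1
        else:
            counts.append(run)
            current, run = bit, 1
    counts.append(run)
    return counts

# bbox and area taken from the example annotation above.
annotation = {"bbox": (199.5, 126.5, 474.0, 490.0), "area": 103944.0}

voc_box = coco_to_voc_bbox(annotation["bbox"])
print(voc_box)                    # (199.5, 126.5, 673.5, 616.5)
print(voc_to_coco_bbox(voc_box))  # (199.5, 126.5, 474.0, 490.0)

# The bounding box area is larger than the segmentation area.
bbox_area = annotation["bbox"][2] * annotation["bbox"][3]
print(bbox_area, annotation["area"])  # 232260.0 103944.0

print(rle_counts([0, 0, 1, 1, 1, 0]))  # [2, 3, 1]
```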

In the next blog, we will read about various object detection techniques and put both the COCO and Pascal VOC data to use.

😊


Understanding Pascal VOC and COCO Annotations for Object Detection was originally published in Towards AI - Multidisciplinary Science Journal on Medium.

Published via Towards AI
