

Interview Questions: Object Detection

Last Updated on July 20, 2023 by Editorial Team

Author(s): Akula Hemanth Kumar

Originally published on Towards AI.

Photo by pisauikan on Unsplash

I am currently searching for a job as a computer vision engineer. In this article, I try to share the things I have learned along the way. I would like to thank Jonathan for his awesome object detection series.

“This is for my personal reference. If you find any mistakes, please comment and I will correct them.”

📌 What is the loss function in YOLO? [src]

💡 YOLO uses a sum of squared errors between the predictions and the ground truth to calculate the loss. The loss function is composed of:

  • The Classification loss.
  • The Localization loss (errors between the predicted boundary box and the ground truth).
  • The Confidence loss (the objectness of the box).

Loss function = classification loss + localization loss + confidence loss
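As a toy sketch of this composition (not the full YOLOv1 implementation — the real loss sums over all grid cells and boxes, and takes the square root of width/height, which is omitted here; `lambda_coord` and `lambda_noobj` are the weighting factors from the YOLO paper):

```python
def sse(a, b):
    # sum of squared errors between two equal-length sequences
    return sum((x - y) ** 2 for x, y in zip(a, b))

def yolo_cell_loss(pred, target, lambda_coord=5.0, lambda_noobj=0.5):
    # pred/target: dicts with 'box' = (x, y, w, h), 'conf' = objectness score,
    # 'classes' = per-class probabilities; target['obj'] marks cells containing an object
    if target["obj"]:
        localization = lambda_coord * sse(pred["box"], target["box"])
        confidence = (pred["conf"] - target["conf"]) ** 2
        classification = sse(pred["classes"], target["classes"])
        return classification + localization + confidence
    # cells without an object only contribute a down-weighted confidence term
    return lambda_noobj * (pred["conf"] - 0.0) ** 2
```

Note how a cell with no object is penalized only on its confidence, and only with weight `lambda_noobj`, so the many empty cells do not dominate training.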

📌 What is the advantage of two-stage methods? [src]

💡 Two-stage methods like R-CNN first propose a small set of candidate object locations and then use a convolutional neural network to classify each candidate as one of the object classes or as background. Because the proposal stage filters out most of the background, the classifier works on a much smaller, less imbalanced set of regions, which generally yields higher accuracy.

📌 What is the main problem faced with single-shot methods? [src][src]

💡 Single-shot methods like SSD suffer from extreme class imbalance: the vast majority of candidate boxes cover background. SSD resamples the ratio of object-class to background-class examples during training (hard negative mining) so that training is not overwhelmed by the image background.

📌 What is Focal Loss in RetinaNet? [src]

💡 Focal Loss helps deal with class imbalance. Focal loss (FL) reduces the loss for well-classified examples. So whenever the model is already good at detecting background, the loss for those easy background examples shrinks, and training re-emphasizes the hard object examples.
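A minimal sketch of binary focal loss for a single prediction, with the α and γ defaults from the RetinaNet paper, alongside plain cross-entropy for comparison:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # p: predicted foreground probability; y: 1 for object, 0 for background
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples toward zero
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def cross_entropy(p, y):
    # plain cross-entropy, for comparison
    return -math.log(p if y == 1 else 1.0 - p)
```

An easy background example (p = 0.01, y = 0) contributes almost nothing under focal loss, while plain cross-entropy still charges it a non-trivial amount — multiplied over tens of thousands of background boxes, that difference is what rebalances training.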

📌 What is the loss function in SSD? [src]

💡 SSD’s loss function is a combination of two critical components:

  • Confidence Loss: This measures how confident the network is in the objectness of the computed bounding box. Categorical cross-entropy is used to compute this loss.
  • Location Loss: This measures how far the network’s predicted bounding boxes are from the ground-truth boxes in the training set. Smooth L1 loss is used here (per the SSD paper), which is less sensitive to outlier boxes than a plain L2 norm.

ssd_loss = confidence_loss + alpha * location_loss

The alpha term helps us to balance the contribution of the location loss.
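A toy single-box sketch of this combination (the actual SSD loss sums over many matched default boxes and normalizes by the number of matches):

```python
import math

def smooth_l1(x):
    # Smooth L1: quadratic near zero, linear for large errors
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def ssd_loss(class_probs, true_class, pred_box, true_box, alpha=1.0):
    # confidence loss: categorical cross-entropy on the predicted class distribution
    confidence_loss = -math.log(class_probs[true_class])
    # location loss: Smooth L1 over the four box offsets
    location_loss = sum(smooth_l1(p - t) for p, t in zip(pred_box, true_box))
    return confidence_loss + alpha * location_loss
```

With `alpha = 1.0` the two terms contribute equally; raising it makes the detector prioritize box placement over classification confidence.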

📌 What is FPN? [src]

💡 Feature Pyramid Network (FPN) is a feature extractor designed around a feature-pyramid concept to improve accuracy and speed. An image first passes through the CNN backbone (the bottom-up pathway), yielding semantically rich but spatially coarse final layers. To regain resolution, FPN creates a top-down pathway by upsampling these feature maps. While the top-down pathway helps detect objects of varying sizes, spatial positions may be skewed by the upsampling, so lateral connections are added between the original feature maps and the corresponding reconstructed layers to improve object localization. It remains one of the leading ways to detect objects at multiple scales, and detectors such as YOLOv3 and Faster R-CNN have been built on this technique.
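A minimal NumPy sketch of the top-down pathway with lateral additions. It assumes each backbone map has already been projected to the same channel width (done with 1×1 convolutions in the real network, and the 3×3 smoothing convolutions are omitted), with maps ordered fine to coarse:

```python
import numpy as np

def fpn_top_down(c_maps):
    # c_maps: backbone feature maps ordered fine -> coarse, e.g. [C3, C4, C5],
    # each a 2-D array here for simplicity (channel dimension omitted)
    p = [None] * len(c_maps)
    p[-1] = c_maps[-1]  # the coarsest map starts the top-down pathway
    for i in range(len(c_maps) - 2, -1, -1):
        # nearest-neighbour 2x upsampling of the coarser pyramid level
        upsampled = np.kron(p[i + 1], np.ones((2, 2)))
        # lateral connection: element-wise addition with the backbone map
        p[i] = c_maps[i] + upsampled
    return p
```

Each output level thus mixes coarse semantics (from above) with fine localization (from the lateral backbone map).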

📌 Why do we use data augmentation? [src]

💡 Data augmentation is a technique for synthesizing new data by modifying existing data in such a way that the target is either unchanged or changed in a known way. Data augmentation is important for improving accuracy. Common techniques include flipping, cropping, adding noise, and color distortion.
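For detection, "changed in a known way" means the box labels must be transformed along with the pixels. A small sketch for a horizontal flip with `(x1, y1, x2, y2)` boxes:

```python
def hflip_box(box, img_width):
    # flip a (x1, y1, x2, y2) box horizontally: y-coordinates are unchanged,
    # x-coordinates mirror around the vertical center line of the image
    x1, y1, x2, y2 = box
    return (img_width - x2, y1, img_width - x1, y2)
```

Note that the left/right edges swap roles under the mirror, which is why `x2` feeds the new `x1`.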

Data augmentation gives a significant performance improvement for SSD300 (the ablation study in the SSD paper shows a clear mAP gain from its augmentation pipeline).

📌 What is the advantage of SSD over Faster R-CNN? [src]

💡 SSD speeds up the process by removing the need for the region proposal network (RPN) used in Faster R-CNN: it predicts boxes and class scores in a single forward pass.

📌 What are the metrics used for object detection? [src]

💡 mAP (mean Average Precision) is a popular metric for measuring the accuracy of object detectors. Average precision is the average of the precision values over recall values from 0 to 1 (the area under the precision-recall curve); mAP averages this value over all object classes.
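One common way to compute this average is the 11-point interpolation used by PASCAL VOC (a sketch; COCO-style AP uses a finer interpolation and averages over IoU thresholds as well):

```python
def average_precision(recalls, precisions):
    # 11-point interpolated AP: average the best precision achievable
    # at recall >= t, for t in {0.0, 0.1, ..., 1.0}
    ap = 0.0
    for t in (i / 10 for i in range(11)):
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        ap += max(candidates) if candidates else 0.0
    return ap / 11
```

A perfect detector (precision 1.0 at recall 1.0) scores AP = 1.0; a detector that never gets past 50% recall loses all the recall points above 0.5 outright.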

📌 What is NMS? [src]

💡 Non-Max Suppression (NMS) is a technique used in many computer vision object detection algorithms. It is a class of algorithms that selects one bounding box out of many overlapping bounding boxes for a single class.

NMS implementation:

  1. Sort the prediction confidence scores in decreasing order.
  2. Starting from the top score, discard the current prediction if any previously kept prediction of the same class has IoU > threshold (0.5 is a common choice) with it.
  3. Repeat the above step until all predictions are checked.
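The steps above can be sketched as a greedy, per-class procedure (boxes are `(x1, y1, x2, y2)` tuples):

```python
def iou(a, b):
    # intersection-over-union of two (x1, y1, x2, y2) boxes
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    # 1. sort prediction indices by confidence score, descending
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)      # 2. keep the highest-scoring remaining box...
        keep.append(best)
        # ...and drop everything that overlaps it beyond the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep                  # 3. repeat until all predictions are checked
```

Production libraries expose the same operation (e.g. `torchvision.ops.nms`), usually vectorized for speed.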

📌 What is IoU? [src]

💡 NMS relies on the concept of Intersection over Union (IoU). IoU is the area of the intersection of two bounding boxes divided by the area of their union, computed between the ground-truth bounding box and the predicted bounding box.
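A small standalone sketch, with boxes as `(x1, y1, x2, y2)` corner coordinates:

```python
def iou(box_a, box_b):
    # boxes as (x1, y1, x2, y2), with x2 > x1 and y2 > y1
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```

IoU ranges from 0 (no overlap) to 1 (identical boxes); detection benchmarks typically count a prediction as correct when IoU with the ground truth exceeds 0.5.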

📌 When do you say that an object detection method is efficient?

💡 The efficiency of a detector is commonly measured by the number of floating-point operations (FLOPs) it needs per inference, typically reported alongside its accuracy.

In accuracy-vs-FLOPs comparisons (such as the figure in the EfficientDet paper), models like EfficientDet reach high accuracy with fewer FLOPs, so we can say they are efficient.

📌 Some questions about hands-on experience with custom object detection

💡 Try out the Monk Object Detection Library:


A one-stop repository for low-code easily-installable object detection pipelines. …

I am extremely passionate about computer vision and deep learning. I am an open-source contributor to Monk Libraries. Give us ⭐️ on our GitHub repo if you like Monk.

You can also see my other writings at:

Akula Hemanth Kumar – Medium


