YOLO V5 — Explained and Demystified

Last Updated on July 24, 2023 by Editorial Team

YOLO V5 — Model Architecture and Technical Details Explanation

After my previous article on YOLOv5, I received many messages and questions about what is different in YOLOv5, along with other related technical questions.

Therefore, I decided to write another article to explain some technical details used in YOLOv5.

YOLO v5 comes in four versions, and I will cover the ‘s’ (small) version. If you read this article thoroughly, you will find that the other versions do not differ much, apart from the model layers/architecture and the number of parameters.

In this article, I will cover the following most important details and aspects of the YOLOv5 implementation.

YOLO v5 Model Architecture
Activation Function
Optimization Function
Cost Function or Loss Function
Weights, Biases, Parameters, Gradients, and Final Model Summary

NOTE: YOLO v5 is still in development and receives frequent updates from Ultralytics, so the developers may change some aspects in the future. This article therefore applies specifically to the initial release of YOLOv5. However, I will try to update this article or add new ones for subsequent releases.

Let’s move to the technical discussion.

YOLO v5 Model Architecture

Like any other single-stage object detector, YOLO v5 has three important parts:

Model Backbone
Model Neck
Model Head

The Model Backbone is mainly used to extract important features from the given input image. In YOLO v5, CSP (Cross Stage Partial) networks are used as the backbone to extract rich, informative features from an input image.

CSPNet has shown significant improvement in processing time with deeper networks. The comparison figure linked below illustrates this; for more information about CSPNet, visit its GitHub repository [2].

Source: https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/fig/cmp3.png

The Model Neck is mainly used to generate feature pyramids. Feature pyramids help models generalize well across object scales: they help identify the same object at different sizes and scales.

Feature pyramids are very useful and help models to perform well on unseen data. There are other models that use different types of feature pyramid techniques like FPN, BiFPN, PANet, etc.

In YOLO v5, PANet is used as the neck to obtain feature pyramids. For more information on feature pyramids, refer to the following link.

Understanding Feature Pyramid Networks for object detection (FPN), medium.com
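To make the idea of a feature pyramid more concrete, here is a minimal sketch of the grid sizes produced at different scales. The 640x640 input size and the strides of 8, 16, and 32 are my assumptions for illustration; they are common YOLO settings but are not stated in this article.

```python
# Assumptions (not from the article): a square 640x640 input and
# three pyramid levels with strides 8, 16, and 32.
def pyramid_grid_sizes(image_size=640, strides=(8, 16, 32)):
    """Return {stride: (rows, cols)} for a square input image."""
    return {s: (image_size // s, image_size // s) for s in strides}


grids = pyramid_grid_sizes()
for stride, (rows, cols) in grids.items():
    # A small stride gives a fine grid (good for small objects);
    # a large stride gives a coarse grid (good for large objects).
    print(f"stride {stride:>2}: {rows}x{cols} grid, "
          f"each cell covers {stride}x{stride} pixels")
```

The fine grid helps with small objects and the coarse grid with large ones, which is why the same object can be picked up at different sizes.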

The Model Head is mainly used to perform the final detection step. It applies anchor boxes to the features and generates the final output vectors with class probabilities, objectness scores, and bounding boxes.

In YOLO v5, the model head is the same as in the previous YOLO V3 and V4 versions.

Additionally, I am attaching the final model architecture for the small version of YOLO v5. You can find it here.
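As a rough illustration of the output vectors the head produces, the sketch below counts the predictions for one image. The 80 classes, 3 anchors per grid cell, 640x640 input, and strides of 8, 16, and 32 are assumptions I picked for the example, not values taken from this article.

```python
# Assumptions (not from the article): 80 classes, 3 anchor boxes per
# grid cell, a 640x640 input, and detection strides of 8, 16, and 32.
def head_output_shape(num_classes=80, anchors_per_cell=3,
                      image_size=640, strides=(8, 16, 32)):
    """Return (total_boxes, values_per_box) for one image."""
    # Each box carries 4 box coordinates, 1 objectness score,
    # and one probability per class.
    values_per_box = 4 + 1 + num_classes
    total_boxes = sum(anchors_per_cell * (image_size // s) ** 2
                      for s in strides)
    return total_boxes, values_per_box


total_boxes, values_per_box = head_output_shape()
print(f"{total_boxes} candidate boxes, {values_per_box} values each")
```

Most of these candidate boxes are discarded later by confidence thresholding and non-maximum suppression.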

Activation Function

The choice of activation function is crucial in any deep neural network. Many activation functions have been introduced recently, such as Leaky ReLU, Mish, and Swish.

The YOLO v5 authors decided to go with the Leaky ReLU and sigmoid activation functions.

In YOLO v5, the Leaky ReLU activation function is used in the middle/hidden layers, and the sigmoid activation function is used in the final detection layer. You can verify it here.
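For readers who want to see what these two functions do, here is a minimal pure-Python sketch. The negative slope of 0.1 is an assumption for illustration; the article does not state the value used in the model.

```python
import math


def leaky_relu(x, negative_slope=0.1):
    """Pass positive values through; scale negative values by a small slope.

    The slope of 0.1 is an illustrative assumption.
    """
    return x if x >= 0 else negative_slope * x


def sigmoid(x):
    """Squash any real number into the range [0, 1] (numerically stable)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


for value in (-2.0, 0.0, 2.0):
    print(f"x={value:+.1f}  leaky_relu={leaky_relu(value):+.3f}  "
          f"sigmoid={sigmoid(value):.3f}")
```

Leaky ReLU keeps a small gradient for negative inputs, which suits hidden layers, while sigmoid maps the final layer's raw outputs to scores between 0 and 1.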

Optimization Function

For the optimization function in YOLO v5, we have two options:

SGD
Adam

In YOLO v5, the default optimization function for training is SGD.

However, you can change it to Adam by using the “--adam” command-line argument.
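The sketch below shows how a flag like this can switch between the two optimizers, and how their update rules differ on a single parameter. It is a simplified illustration and not the YOLOv5 training code: the helper names, the learning rates, and the toy problem are all my own assumptions.

```python
import argparse
import math


def choose_optimizer(argv):
    """Return 'Adam' when --adam is passed, otherwise the default 'SGD'."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--adam", action="store_true",
                        help="use Adam instead of SGD")
    return "Adam" if parser.parse_args(argv).adam else "SGD"


def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update for a scalar parameter."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity


def adam_step(param, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (t is the 1-based step)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# Toy problem (not from the article): minimize f(x) = x**2, gradient 2*x.
x_sgd, velocity = 5.0, 0.0
for _ in range(200):
    x_sgd, velocity = sgd_momentum_step(x_sgd, 2 * x_sgd, velocity)

x_adam, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    x_adam, m, v = adam_step(x_adam, 2 * x_adam, m, v, t, lr=0.1)

print(choose_optimizer([]), choose_optimizer(["--adam"]))
print(f"SGD: {x_sgd:.6f}  Adam: {x_adam:.6f}")
```

SGD applies one learning rate to the accumulated gradient, whereas Adam rescales each step by a running estimate of the gradient's magnitude.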

Cost Function or Loss Function

In the YOLO family, a compound loss is calculated from the objectness score, the class probability score, and the bounding box regression score.
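A compound loss of this kind can be sketched as a weighted sum of its three parts. This article does not say which bounding box regression loss is used, so the sketch uses plain 1 - IoU as a stand-in; the gain values and function names are also assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def compound_loss(box_loss, obj_loss, cls_loss,
                  box_gain=1.0, obj_gain=1.0, cls_gain=1.0):
    """Weighted sum of the three parts (gains are illustrative)."""
    return box_gain * box_loss + obj_gain * obj_loss + cls_gain * cls_loss


predicted = (0.0, 0.0, 2.0, 2.0)
target = (1.0, 1.0, 3.0, 3.0)
box_loss = 1.0 - iou(predicted, target)  # stand-in box regression loss
total = compound_loss(box_loss, obj_loss=0.3, cls_loss=0.2)
print(f"IoU={iou(predicted, target):.4f}  total loss={total:.4f}")
```

The gains let training emphasize one part over the others, for example box accuracy over classification.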

Ultralytics uses PyTorch's Binary Cross-Entropy with Logits loss function (BCEWithLogitsLoss) to calculate the loss for the class probability and objectness scores.

There is also an option to use the Focal Loss function to calculate the loss. You can train with Focal Loss by setting the fl_gamma hyper-parameter to a value greater than zero.
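To show how these two losses relate, here is a minimal pure-Python sketch for a single prediction. It is not the PyTorch or YOLOv5 implementation; the gamma of 1.5 and alpha of 0.25 are assumed defaults chosen for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logit, target):
    """Stable binary cross-entropy on a raw logit (target is 0.0 or 1.0)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def focal_bce_with_logits(logit, target, gamma=1.5, alpha=0.25):
    """Focal loss: BCE scaled down for well-classified examples.

    gamma and alpha are illustrative assumptions.
    """
    p = sigmoid(logit)
    p_t = target * p + (1.0 - target) * (1.0 - p)          # prob. of true class
    alpha_t = target * alpha + (1.0 - target) * (1.0 - alpha)
    return alpha_t * (1.0 - p_t) ** gamma * bce_with_logits(logit, target)


# A confident correct prediction (easy) vs. a confident wrong one (hard).
easy = focal_bce_with_logits(4.0, 1.0)
hard = focal_bce_with_logits(-4.0, 1.0)
print(f"easy example: {easy:.6f}  hard example: {hard:.6f}")
```

With a gamma of zero the modulating factor becomes 1, so the focal loss reduces to alpha-weighted binary cross-entropy.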

Weights, Biases, Parameters, Gradients, and Final Model Summary

To look closely at weights, biases, shapes, and parameters at each layer in the YOLOv5-small model, refer to the following information.

Source: https://gist.github.com/mihir135/969d78149b724b7684e327a1672da667

You can also refer to the following brief summary of the YOLO v5 small model.

Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients
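To put the parameter count in perspective, the short sketch below estimates the size of the weights in memory. It assumes every parameter is stored as a 32-bit or 16-bit float, and it uses the rounded count from the summary above, so treat the result as an approximation.

```python
# Parameter count as printed in the model summary (rounded by the printout).
num_parameters = int(7.46816e+06)


def weights_size_mb(parameter_count, bytes_per_parameter=4):
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return parameter_count * bytes_per_parameter / 1e6


print(f"parameters: {num_parameters:,}")
print(f"float32: {weights_size_mb(num_parameters):.2f} MB")
print(f"float16: {weights_size_mb(num_parameters, 2):.2f} MB")
```

A saved checkpoint may be larger than this estimate if it also stores optimizer state or other training metadata.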

Hopefully, this helps you understand YOLO v5 better. To close, I would like to thank you for reading.

Feel free to contact me with any doubts, queries, or suggestions.

References:

[1] https://github.com/ultralytics/yolov5

[2] https://github.com/WongKinYiu/CrossStagePartialNetworks

Contributions:

Mayur Patel

Feel free to connect:

LinkedIn : https://www.linkedin.com/in/mihir-rajput/

Instagram : https://www.instagram.com/ai_dev_/

Github : https://github.com/mihir135/

Thanks and Cheers!


Published via Towards AI

Author(s): Mihir Rajput
