EfficientDet: When Object Detection Meets Scalability and Efficiency

Last Updated on July 20, 2023 by Editorial Team

Author(s): Aniket Maurya

Originally published on Towards AI.

EfficientDet, a highly efficient and scalable state of the art object detection model developed by Google Research, Brain Team. It is not just a single model. It has a family of detectors which achieve better accuracy with an order-of-magnitude fewer parameters and FLOPS than previous object detectors.

EfficientDet paper has mentioned its 7 family members.

Comparison of EfficientDet detectors[0–6] with other SOTA object detection models.

Quick Overview of the Paper

EfficientNet is the backbone architecture used in the model. EfficientNet is also written by the same authors at Google. Conventional CNN models arbitrarily scaled network dimensions- width, depth and resolution. EfficientNet uniformly scales each dimension with a fixed set of scaling coefficients. It surpassed SOTA accuracy with 10x efficiency.
BiFPN: While fusing (applying residual or skip connections) different input features, most of the works simply summed them up without any distinction. Since both input features are at the different resolutions they don’t equally contribute to the fused output layer. The paper proposes a weighted bi-directional feature pyramid network (BiFPN), which introduces learnable weights to learn the importance of different input features.
Compound Scaling: For higher accuracy previous object detection models relied on — bigger backbone or larger input image sizes. Compound Scaling is a method that uses a simple compound coefficient φ to jointly scale-up all dimensions of the backbone network, BiFPN network, class/box network, and resolution.

Combining EfficientNet backbones with our propose BiFPN and compound scaling, we have developed a new family of object detectors, named EfficientDet, which consistently achieve better accuracy with an order-of-magnitude fewer parameters and FLOPS than previous object detectors.

BiFPN

Conventional FPN (Feature Pyramid Network) is limited by the one-way information flow. PANet added an extra bottom-up path for information flow. PANet achieved better accuracy but with the cost and more parameters and computations. The paper proposed several optimizations for cross-scale connections:

Remove Nodes that only have one input edge.
If a node has only one input edge with no feature fusion, then it will have less contribution to the feature network that aims at fusing different features.
Add an extra edge from the original input to output node if they are at the same level, in order to fuse more features without adding much cost.
Treat each bidirectional (top-down & bottom-up) path as one feature network layer, and repeat the same layer multiple times to enable more high-level feature fusion.

Weighted Feature Fusion

While multi-scale fusion, input features are not simply summed up. The authors proposed to add additional weight for each input during feature fusion and let the network to learn the importance of each input feature. Out of three weighted fusion approaches —
Unbounded fusion:

Source: https://arxiv.org/abs/1911.09070

Where W is a learnable weight that can be a scalar (per-feature), a vector (per-channel), or a multi-dimensional tensor (per-pixel). Since the scalar weight is unbounded, it could potentially cause training instability. So, Softmax-based fusion was tried for normalized weights.
Softmax-based fusion:

As softmax normalizes the weights to be the probability of range 0 to 1 which can denote the importance of each input. The softmax leads to a slowdown on GPU.
Fast normalized fusion:

Є is added for numeric stability. It is 30% faster on GPU and gave almost as accurate results as softmax.

Final BiFPN integrates both the bidirectional cross-scale connections and the fast normalized fusion.

EfficientDet Architecture

EfficientDet follows one-stage-detection paradigm. A pre-trained EfficientNet backbone is used with BiFPN as the feature extractor. BiFPNN takes {P3, P4, P5, P6, P7} features from the EfficientNet backbone network and repeatedly applies bidirectional feature fusion.
The fused features are fed to a class and bounding box network for predicting object class and bounding box.

Hope you liked the article.

U+1F449 Twitter: https://twitter.com/aniketmaurya
U+1F449 Mail: [email protected]

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Frequently Used, Contextual References

Resources

Publication

EfficientDet: When Object Detection Meets Scalability and Efficiency

Author(s): Aniket Maurya

Quick Overview of the Paper

BiFPN

Weighted Feature Fusion

EfficientDet Architecture

EfficientDet: Scalable and Efficient Object Detection

Model efficiency has become increasingly important in computer vision. In this paper, we systematically study various…

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for…

Path Aggregation Network for Instance Segmentation

The way that information propagates in neural networks is of great importance. In this paper, we propose Path…

Deep Residual Learning for Image Recognition

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of…

Feedback ↓ Cancel reply

Popular posts

Best Laptops for Deep Learning, Machine Learning (ML), and Data Science for 2023

Best Workstations for Deep Learning, Data Science, and Machine Learning (ML) for 2022

Descriptive Statistics for Data-driven Decision Making with Python

Best Machine Learning (ML) Books - Free and Paid - Editorial Recommendations for 2022

Best Data Science Books - Free and Paid - Editorial Recommendations for 2022

Updates

Recent Posts

LAI #66: Information Theory for People in a Hurry

🔎 Decoding LLM Pipeline — Step 1: Input Processing & Tokenization

Meta to Launch Its Own In-House AI Chip

I Built an AI Money Coach in Python — Here’s How You Can Too (Step-by-Step Guide!)

ChatGPT Now Works Natively in Xcode and VS Code

The World’s Leading AI and Technology Publication.

Company

CONTACT US

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Publication

EfficientDet: When Object Detection Meets Scalability and Efficiency

Author(s): Aniket Maurya

Quick Overview of the Paper

BiFPN

Weighted Feature Fusion

EfficientDet Architecture

EfficientDet: Scalable and Efficient Object Detection

Model efficiency has become increasingly important in computer vision. In this paper, we systematically study various…

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for…

Path Aggregation Network for Instance Segmentation

The way that information propagates in neural networks is of great importance. In this paper, we propose Path…

Deep Residual Learning for Image Recognition

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of…

Related posts

Feedback ↓ Cancel reply

Popular posts

Updates

Recent Posts

The World’s Leading AI and Technology Publication.

Company

CONTACT US

GDPR CCPA Statement