ResNeXt: from scratch
Author(s): Tanmay Debnath
ResNeXt follows the simple concept of "divide and conquer" and is often referred to as an extended version of ResNet. Some of its important applications are in biomedical engineering, especially in bioimaging. Here, I am going to explore the making of ResNeXt from scratch.
Modules: PyTorch, CUDA (Optional)
If you are confused about how to install PyTorch on your system, you might want to check out this link here. It will help you! Moving forward…
ResNeXt
The ResNeXt architecture is quite similar to the ResNet architecture. If you want to know about the ResNet architecture, please head in this direction. ResNeXt is a deep convolutional architecture designed to learn rich, hierarchical image features. How do we get started, then?…
import torch
import torch.nn as nn
This initial block of code imports the PyTorch library into the Python environment. These are pretty hefty architectures and hence require a lot of computation. The architecture expects the system to have good specifications (in terms of CPU and GPU capacity) to be able to complete its task in a reasonable time.
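If you want to take advantage of a GPU when one is present, a small device check like the one below is a common PyTorch pattern (this snippet is my addition, not part of the original code); the model and tensors can later be moved with .to(device).
# Run on the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)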
Now, if you are new to Python and want to understand the basics of these large CNN architectures, this post might not be productive for you: this definition of ResNeXt uses a lot of inheritance and class instances, which can be hard even for experienced programmers. It can be quite dazzling for newbies, so I would ask you to first go through the basics of OOP.
Comfortable enough with OOP? Let's move forward…
class resnext_block(nn.Module):
    def __init__(self, in_channels, cardinality, bwidth, idt_downsample=None, stride=1):
        super(resnext_block, self).__init__()
        self.expansion = 2
        out_channels = cardinality * bwidth
        # 1x1 convolution: project the input onto the grouped width
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # 3x3 grouped convolution: 'cardinality' parallel paths of 'bwidth' channels each
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, groups=cardinality, stride=stride, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 convolution: expand back up by the expansion factor
        self.conv3 = nn.Conv2d(out_channels, out_channels*self.expansion, kernel_size=1, stride=1, padding=0)
        self.bn3 = nn.BatchNorm2d(out_channels*self.expansion)
        self.relu = nn.ReLU()
        self.identity_downsample = idt_downsample
Starting off with the definition of the block, we define the components that will be required as we build up the structure. This is just the initialization phase: whenever the class is instantiated, the first thing it does is set up these modules with their defined specifications.
One thing you might notice, if you have studied ResNet and the ResNeXt research paper, is that we now have the cardinality and the base-width of the groups defined in the constructor. We didn't define these in ResNet because here we are dividing the structure into parallel paths, stacking them side by side, and then aggregating them. The cardinality defines the number of groups in the architecture, and together with the base-width it determines the number of output channels (out_channels = cardinality * base-width).
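To make this concrete, here is a small illustrative comparison (a sketch of my own, not part of the model). With cardinality = 32 and base-width = 4, the block operates on 32 * 4 = 128 channels, and the grouped 3×3 convolution splits them into 32 independent paths of 4 channels each, which is far cheaper than a dense 3×3 convolution over all 128 channels:
# Grouped 3x3 conv: 32 paths of 4 channels each -> 128 * 4 * 3 * 3 weights
grouped = nn.Conv2d(128, 128, kernel_size=3, groups=32, padding=1)
# Dense 3x3 conv over all channels -> 128 * 128 * 3 * 3 weights
dense = nn.Conv2d(128, 128, kernel_size=3, padding=1)
print(sum(p.numel() for p in grouped.parameters()))  # 4,736
print(sum(p.numel() for p in dense.parameters()))    # 147,584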
What to do next?…
    def forward(self, x):
        identity = x
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)
        # Project the skip connection when the shapes no longer match
        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)
        x += identity
        x = self.relu(x)
        return x
The forward function is called on every pass through the block. It applies all the modules in an orderly fashion, as described in the research paper. You can see the similarity with the forward function in the ResNet blog.
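As a quick sanity check (this snippet is my addition, not from the original post), we can push a random tensor through a single block. With in_channels=64, cardinality=32, and base-width 4, the block outputs 32 * 4 * 2 = 256 channels, so the skip path needs a 1×1 projection:
# Skip-path projection from 64 to 256 channels
downsample = nn.Sequential(nn.Conv2d(64, 256, kernel_size=1),
                           nn.BatchNorm2d(256))
block = resnext_block(64, cardinality=32, bwidth=4, idt_downsample=downsample)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 256, 56, 56])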
Presenting the ResNeXt architecture…
class ResNeXt(nn.Module):
    def __init__(self, resnext_block, layers, cardinality, bwidth, img_channels, num_classes):
        super(ResNeXt, self).__init__()
        self.in_channels = 64
        # Stem: 7x7 convolution followed by max pooling, as in ResNet
        self.conv1 = nn.Conv2d(img_channels, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.cardinality = cardinality
        self.bwidth = bwidth
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # ResNeXt Layers
        self.layer1 = self._layers(resnext_block, layers[0], stride=1)
        self.layer2 = self._layers(resnext_block, layers[1], stride=2)
        self.layer3 = self._layers(resnext_block, layers[2], stride=2)
        self.layer4 = self._layers(resnext_block, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1,1))
        # self.bwidth has been doubled by each _layers call above, so this
        # matches the channel count coming out of layer4
        self.fc = nn.Linear(self.cardinality * self.bwidth, num_classes)
We come to the formal definition of the ResNeXt architecture. Again, this is the initialization phase. You might notice that a _layers method is called here even though it has not been defined yet. It will be defined as we go through the next steps…
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        # Flatten to (batch_size, channels) before the classifier
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)
        return x
Here comes the forward function, and with it our ResNeXt model is officially defined. We structure the forward pass as it is laid out in the paper: the initial stem of Conv, BatchNorm, ReLU, and max-pooling layers, followed by the four residual stages, average pooling, and the final classifier.
    def _layers(self, resnext_block, no_residual_blocks, stride):
        identity_downsample = None
        out_channels = self.cardinality * self.bwidth
        layers = []
        # The first block of a stage changes the spatial size and/or channel
        # count, so the skip connection needs a 1x1 projection
        if stride != 1 or self.in_channels != out_channels * 2:
            identity_downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels*2, kernel_size=1, stride=stride),
                nn.BatchNorm2d(out_channels*2))
        layers.append(resnext_block(self.in_channels, self.cardinality, self.bwidth, identity_downsample, stride))
        self.in_channels = out_channels * 2
        # The remaining blocks of the stage keep the shape unchanged
        for _ in range(no_residual_blocks - 1):
            layers.append(resnext_block(self.in_channels, self.cardinality, self.bwidth))
        # Double the base-width for the next stage
        self.bwidth *= 2
        return nn.Sequential(*layers)
So, we have defined everything, and it's time to check whether our implementation is correct. We can use the code block given below to build the architecture.
def ResNeXt50(img_channels=3, num_classes=1000, cardinality=32, bwidth=4):
    # [3, 4, 6, 3] blocks per stage gives the ResNeXt-50 (32x4d) variant
    return ResNeXt(resnext_block, [3,4,6,3], cardinality, bwidth, img_channels, num_classes)
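As a quick smoke test (again, my addition), we can instantiate the network and run a random batch through it; a standard 224×224 input should come out as one logit per class:
model = ResNeXt50(img_channels=3, num_classes=1000)
x = torch.randn(2, 3, 224, 224)
print(model(x).shape)  # torch.Size([2, 1000])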
And we are done! That was a hefty job to get through. Thanks for sticking with it to the end. This is an important architecture in the CNN family, and understanding it is genuinely difficult! If you need further help, see the sections below…
Help is on the way!
If you still feel that you need the entire code, then please go to this link here.
References
Please check out the original work of the researchers: Xie et al., "Aggregated Residual Transformations for Deep Neural Networks" (arXiv:1611.05431).