

Computer Vision   Research

ResNeXt: from scratch

Author(s): Tanmay Debnath



ResNeXt follows a simple concept of ‘divide and conquer’: each block splits its work across several parallel paths and then merges the results. It is often described as an extended version of ‘ResNet’. Some of its important applications are in biomedical engineering, especially in bioimaging. Here, I am going to explore the “making of ResNeXt: from scratch.”

Modules: PyTorch, CUDA (Optional)

If you are not sure how to install PyTorch on your system, you might want to check out this link here. It should help you! Moving forward…


The ResNeXt architecture is quite similar to the ResNet architecture. If you want to know about ResNet, then please head in this direction. ResNeXt is a deep convolutional network whose main task is to learn progressively richer features from images. How do we get started then?…

import torch
import torch.nn as nn

This initial block of code imports the PyTorch library into the Python environment. These are pretty hefty architectures and hence require a lot of computation. You will need a system with good specifications (in terms of CPU and GPU capacity) to train the model in a reasonable time.

Now, if you are new to Python and want to understand the basics of these large CNN architectures, this might not be productive for you, because this definition of ‘ResNeXt’ relies heavily on inheritance and class instances, which can be hard to follow even for experienced programmers. This can be quite daunting for newcomers, so I would ask you to first go through the basics of OOP.

Comfortable enough with OOP? Let’s move forward…

class resnext_block(nn.Module):
    def __init__(self, in_channels, cardinality, bwidth, idt_downsample=None, stride=1):
        super(resnext_block, self).__init__()
        self.expansion = 2
        # Internal width of the block: `cardinality` groups of `bwidth` channels each
        out_channels = cardinality * bwidth
        # 1x1 convolution: project the input to the internal width
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # 3x3 grouped convolution: one group per parallel path
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, groups=cardinality, stride=stride, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 convolution: expand to the block's output width
        self.conv3 = nn.Conv2d(out_channels, out_channels*self.expansion, kernel_size=1, stride=1, padding=0)
        self.bn3 = nn.BatchNorm2d(out_channels*self.expansion)
        self.relu = nn.ReLU()
        # Optional projection applied to the shortcut so its shape matches the output
        self.identity_downsample = idt_downsample

We start with the definition of the building block. In its constructor, we define the components that the rest of the structure needs. This is just the initialization phase: whenever we create an instance of the class, the first thing it does is build these modules with the specifications given here.

If you have studied ResNet and the research paper for ResNeXt, you will notice that the constructor above takes two extra arguments: the cardinality and the base width of the groups. ResNet has neither, because ResNeXt divides the 3×3 convolution into parallel groups that sit side by side and then merges their outputs. Here, ‘cardinality’ is the number of groups and the base width is the number of channels per group, so together they fix the internal width of the block: out_channels = cardinality * bwidth.
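
To make the effect of cardinality concrete, here is a minimal sketch that counts the weights of the block's 3×3 convolution with and without grouping. The helper name `conv_weight_count` is made up for this illustration. It relies only on the fact that a grouped `nn.Conv2d` stores a weight of shape `(out_channels, in_channels // groups, k, k)`, and it ignores the bias.

```python
def conv_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a 2D convolution (bias not counted).

    Hypothetical helper: with grouping, each output channel only sees
    in_channels // groups input channels.
    """
    assert in_channels % groups == 0 and out_channels % groups == 0
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


cardinality, bwidth = 32, 4            # the ResNeXt-50 (32x4d) setting used later
out_channels = cardinality * bwidth    # internal width of the first stage

dense = conv_weight_count(out_channels, out_channels, 3)
grouped = conv_weight_count(out_channels, out_channels, 3, groups=cardinality)

print("internal width:", out_channels)
print("3x3 weights without groups:", dense)
print("3x3 weights with groups:   ", grouped)
print("reduction factor:", dense // grouped)
```

At the same width, the grouped convolution needs `cardinality` times fewer weights, which is the budget ResNeXt spends on a wider block.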

What to do next?…

    def forward(self, x):
        identity = x
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)

        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)

        x += identity
        x = self.relu(x)
        return x

The ‘forward’ method is not run at initialization; PyTorch calls it whenever the block is applied to an input tensor. It applies the layers in the order described in the research paper and adds the input (downsampled if necessary) back to the result before the final ReLU. You can find a similar ‘forward’ function in the ResNet blog.
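
The role of `identity_downsample` is easiest to see with shapes. The sketch below is an illustration under assumed sizes (a 64-channel input and a 256-channel branch output at 56×56, as in the first block of the first stage); the random `branch_out` tensor merely stands in for the output of `bn3`.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)            # block input, kept as the "identity"
branch_out = torch.randn(1, 256, 56, 56)  # stand-in for the output of conv3/bn3

# Adding the raw input fails: 64 channels cannot be broadcast to 256.
try:
    branch_out + x
    shapes_match = True
except RuntimeError:
    shapes_match = False

# A 1x1 convolution (followed by batch norm) projects the shortcut to the
# branch's shape, which is the job of identity_downsample in the block.
projection = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=1),
    nn.BatchNorm2d(256),
)
out = branch_out + projection(x)

print("raw shapes match:", shapes_match)
print("output shape:", tuple(out.shape))
```

When a stage also halves the resolution, the same projection would use `stride=2` so that the spatial size of the shortcut matches as well.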

Presenting you the ResNeXt architecture…

class ResNeXt(nn.Module):
    # The block class is passed in as `resnext_block` so that the argument is
    # the one actually used when building the layers below.
    def __init__(self, resnext_block, layers, cardinality, bwidth, img_channels, num_classes):
        super(ResNeXt, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(img_channels, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.cardinality = cardinality
        self.bwidth = bwidth
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # ResNeXt Layers
        self.layer1 = self._layers(resnext_block, layers[0], stride=1)
        self.layer2 = self._layers(resnext_block, layers[1], stride=2)
        self.layer3 = self._layers(resnext_block, layers[2], stride=2)
        self.layer4 = self._layers(resnext_block, layers[3], stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1,1))
        # _layers doubles self.bwidth after every stage, so at this point
        # cardinality * bwidth equals the channel count coming out of layer4.
        self.fc = nn.Linear(self.cardinality * self.bwidth, num_classes)

We now come to the formal definition of the ‘ResNeXt’ architecture. Again, this is the initialization phase. You might notice that it calls a ‘_layers’ method that has not been defined yet. We will define it as we go through the next steps…

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)
        return x

Here comes the ‘forward’ function, which completes the data path of our ResNeXt class. We structure it as defined in the paper: the initial convolution, batch norm, ReLU, and max-pooling stage, then the four ResNeXt stages, and finally average pooling and the fully connected classifier. The last missing piece is the ‘_layers’ method.
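
As a rough check of this forward pass, the sketch below traces the spatial size of the feature map through the network using the standard output-size formula for convolution and pooling layers. The 224×224 input and the helper name `conv_out_size` are assumptions made for illustration; each stage is represented only by the strided 3×3 convolution in its first block, since the other layers keep the size unchanged.

```python
def conv_out_size(size, kernel_size, stride, padding):
    """Output height/width of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 224  # assumed input resolution
trace = []

# Stem: 7x7 convolution with stride 2, then 3x3 max pooling with stride 2
size = conv_out_size(size, kernel_size=7, stride=2, padding=3)
trace.append(("conv1", size))
size = conv_out_size(size, kernel_size=3, stride=2, padding=1)
trace.append(("maxpool", size))

# Stages: only the first block's 3x3 convolution uses the stage stride
for name, stride in [("layer1", 1), ("layer2", 2), ("layer3", 2), ("layer4", 2)]:
    size = conv_out_size(size, kernel_size=3, stride=stride, padding=1)
    trace.append((name, size))

for name, side in trace:
    print(f"{name}: {side}x{side}")
```

The adaptive average pooling then reduces whatever is left to 1×1, which is why the classifier does not depend on the input resolution.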

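    def _layers(self, resnext_block, no_residual_blocks, stride):
        identity_downsample = None
        out_channels = self.cardinality * self.bwidth
        layers = []

        if stride != 1 or self.in_channels != out_channels * 2:
            # The original line is cut off after `kernel_size=1,`. The stride and
            # the BatchNorm2d below are the usual ResNet-style completion (assumed).
            identity_downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels*2, kernel_size=1, stride=stride),
                nn.BatchNorm2d(out_channels*2))

        # First block of the stage: may change the resolution and channel count
        layers.append(resnext_block(self.in_channels, self.cardinality, self.bwidth, identity_downsample, stride))
        self.in_channels = out_channels * 2

        # Remaining blocks keep the shape, so they need no downsample
        for i in range(no_residual_blocks - 1):
            layers.append(resnext_block(self.in_channels, self.cardinality, self.bwidth))

        # Double the base width for the next stage
        self.bwidth *= 2

        return nn.Sequential(*layers)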
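
The bookkeeping in ‘_layers’ is easy to lose track of, so here is a small sketch that replays only its channel arithmetic in plain Python, assuming the 32×4d setting (cardinality 32, base width 4) and the block expansion of 2 used in this article. It builds no layers; the variable names mirror the attributes above.

```python
cardinality, bwidth = 32, 4   # assumed ResNeXt-50 (32x4d) setting
expansion = 2                 # matches self.expansion in resnext_block
in_channels = 64              # channels after the stem (conv1)

rows = []
for name in ("layer1", "layer2", "layer3", "layer4"):
    out_channels = cardinality * bwidth      # internal (grouped) width
    block_out = out_channels * expansion     # channels leaving the stage
    rows.append((name, in_channels, out_channels, block_out))
    in_channels = block_out                  # mirrors self.in_channels = out_channels * 2
    bwidth *= 2                              # mirrors self.bwidth *= 2

# After the fourth stage bwidth has been doubled once more, which is why
# nn.Linear(cardinality * bwidth, num_classes) matches the layer4 output.
fc_in = cardinality * bwidth

for name, cin, width, cout in rows:
    print(f"{name}: in={cin:4d}  internal={width:4d}  out={cout:4d}")
print("fc input features:", fc_in)
```

The stages output 256, 512, 1024, and 2048 channels, and the final `fc_in` equals the last of these only because of the extra doubling after the fourth stage.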

So, we have defined everything, and it’s time to check whether the architecture works as expected. The function below builds the ResNeXt-50 configuration, which we can use to test it.

def ResNeXt50(img_channels=3, num_classes=1000, cardinality=32, bwidth=4):
    return ResNeXt(resnext_block, [3,4,6,3], cardinality, bwidth, img_channels, num_classes)
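
One simple way to test a classifier like this is to push a random batch through it and check the output shape. The sketch below is a minimal version of that idea. The helper `check_output_shape` is hypothetical, and a tiny stand-in network is used only so that the snippet runs on its own; it is not the ResNeXt model.

```python
import torch
import torch.nn as nn


def check_output_shape(model, img_channels=3, num_classes=1000, batch=2, size=224):
    """Return True if the model maps (batch, C, H, W) images to (batch, num_classes)."""
    model.eval()                      # use running statistics in batch norm
    with torch.no_grad():             # no gradients needed for a shape check
        out = model(torch.randn(batch, img_channels, size, size))
    return tuple(out.shape) == (batch, num_classes)


# Tiny stand-in classifier, used only to keep this example self-contained
tiny = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(8, 10),
)

ok = check_output_shape(tiny, num_classes=10, size=32)
print("output shape is correct:", ok)
```

With the classes from this article in scope, the same check would be `check_output_shape(ResNeXt50())`. If a GPU is available, the model and the input can both be moved to it with `.to("cuda")` first.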

And we are done! That was a hefty job to go through. Thanks for sticking with it to the end. This is an important architecture among CNNs, and understanding it is difficult indeed! If you need further help, see the sections below…

Help is on the way!

If you still feel that you need the entire code, then please go to this link here.


Please check out the original works of the researchers.

ResNeXt: from scratch was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
