Fine Tuning Pytorch ViT for CIFAR10

Last Updated on November 5, 2024 by Editorial Team

Author(s): Ahmad Mustapha

Originally published on Towards AI.

In the previous article, we created a ViT model from scratch and trained it on the CIFAR10 dataset. However, the model's accuracy peaked at 67% without deliberate hyperparameter tuning. This is expected: the original authors of ViT noted that these models perform modestly compared to CNNs when trained on small datasets. When scaled to large datasets, however, they start to match or even surpass CNNs. That is why it is recommended to fine-tune ViT models that have been pretrained on large datasets such as ImageNet, and this is exactly what we will do in this post.

The Training Loop

We start by writing the boilerplate code for training and testing any model on the CIFAR10 dataset. You will notice that we resize the images to 224 in both the training and testing transforms, even though the original CIFAR10 images are 32×32. This is because the pretrained model we will load from PyTorch expects 224×224 inputs, since it was trained on ImageNet.

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import CIFAR10

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Resize CIFAR10 images from 32x32 to 224x224 to match the pretrained ViT input size.
transform_train = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

train_set = CIFAR10(root='./datasets', train=True, download=True, transform=transform_train)
test_set = CIFAR10(root='./datasets', train=False, download=True, transform=transform_test)

train_loader = DataLoader(train_set, shuffle=True, batch_size=64)
test_loader = DataLoader(test_set, shuffle=False, batch_size=64)

n_epochs = 10
lr = 0.0001

# `model` is the ViT we load and adapt in the next section.
optimizer = Adam(model.parameters(), lr=lr)
criterion = CrossEntropyLoss()

for epoch in range(n_epochs):
    train_loss = 0.0
    for i, batch in enumerate(train_loader):
        x, y = batch
        x, y = x.to(device), y.to(device)
        y_hat = model(x)
        loss = criterion(y_hat, y)

        batch_loss = loss.detach().cpu().item()
        train_loss += batch_loss / len(train_loader)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if i % 100 == 0:
            print(f"Batch {i}/{len(train_loader)} loss: {batch_loss:.03f}")

    print(f"Epoch {epoch + 1}/{n_epochs} loss: {train_loss:.03f}")

Loading The Model

Now we have to load the vit_b_16 model from torchvision.models. All ViT models available in torchvision are listed in the torchvision models documentation. If you check the list, you will find several models with labels such as b, l, and h. These labels correspond to the model size: base, large, and huge. The architectures of these models are exactly the ones published in the first ViT paper, An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale. The number following these labels, such as 16, 32, or 14, corresponds to the patch size the model uses. All of these models have been trained on ImageNet. The model is not pretrained by default; to make sure we load pretrained weights, we have to pass the weights argument as ViT_B_16_Weights.IMAGENET1K_V1.

from torchvision.models import ViT_B_16_Weights, vit_b_16

# Load the base ViT with 16x16 patches, pretrained on ImageNet.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

By default, this model outputs logits for 1,000 classes because it was trained on ImageNet. However, our dataset contains only 10 classes, so we need to change the head of the model from 1,000 to 10 logits. The outermost layer of the loaded model is the "heads" layer, a sequential module that contains a single linear layer. To adapt the model, we simply assign a new Sequential with one Linear layer to "heads", preserving the input features of the original layer and replacing the output features with 10.

from torch import nn

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

# Replace the 1000-class ImageNet head with a 10-class head for CIFAR10.
model.heads = nn.Sequential(
    nn.Linear(model.heads.head.in_features, 10)
)
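To double-check the surgery (a quick sanity check, not part of the original article), you can print the head and run a dummy forward pass; the new head should map the 768-dimensional ViT-B features to 10 logits:

print(model.heads)
# Sequential(
#   (0): Linear(in_features=768, out_features=10, bias=True)
# )

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)   # one fake 224x224 RGB image
    print(model(dummy).shape)             # expected: torch.Size([1, 10])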

Rather than training all the transformer blocks in the loaded model, we can freeze every layer except the last transformer layer and the new head. This makes the fine-tuning procedure far less compute-intensive. Finally, we move the model to the GPU device and train it with the training loop from earlier.

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

model.heads = nn.Sequential(
    nn.Linear(model.heads.head.in_features, 10)
)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the last encoder layer and the head
for param in model.encoder.layers[-1].parameters():
    param.requires_grad = True
for param in model.heads.parameters():
    param.requires_grad = True
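Moving the model to the GPU is mentioned above but not shown; here is a minimal sketch of that step, together with a quick count of how many parameters actually remain trainable after freezing (the variable names are just for illustration):

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
# Only the last encoder block and the new 10-class head should be trainable.

Note that even though the optimizer was built with model.parameters(), the frozen parameters never receive gradients and are skipped during the update step; alternatively, you could pass only the trainable parameters to Adam.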

Testing Loop

We finally test our model on the CIFAR10 test set. You will find that the model reaches a very high accuracy even after training for only one epoch. This is thanks to the powerful features the model learned while being trained on ImageNet.

from tqdm import tqdm

model.eval()  # disable dropout for evaluation
with torch.no_grad():
    correct, total = 0, 0
    test_loss = 0.0
    for batch in tqdm(test_loader, desc="Testing"):
        x, y = batch
        x, y = x.to(device), y.to(device)
        y_hat = model(x)
        loss = criterion(y_hat, y)
        test_loss += loss.detach().cpu().item() / len(test_loader)

        correct += torch.sum(torch.argmax(y_hat, dim=1) == y).detach().cpu().item()
        total += len(x)

print(f"Test loss: {test_loss:.2f}")
print(f"Test accuracy: {correct / total * 100:.2f}%")
