
Fine-Tuning PyTorch ViT for CIFAR10

Last Updated on November 5, 2024 by Editorial Team

Author(s): Ahmad Mustapha

Originally published on Towards AI.

In the previous article, we created a ViT model from scratch and trained it on the CIFAR10 dataset. However, the model's accuracy peaked at 67% without deliberate hyperparameter tuning. This is expected: the original authors of ViT noted that these models perform modestly compared to CNNs when trained on small datasets, but become on par with CNNs, or even better, when trained on large datasets. That is why it is recommended to fine-tune ViT models that have been pretrained on large datasets such as ImageNet, and this is exactly what we will do in this post.

The Training Loop

We start by writing the boilerplate code for training and testing any model on the CIFAR10 dataset. Notice that the training and testing transforms resize the images to 224, even though the original CIFAR10 image size is 32. This is because the pretrained model we will load from PyTorch expects 224×224 inputs, since it was trained on ImageNet.

import torch
from torch import nn
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import CIFAR10

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Resize CIFAR10 images from 32x32 to 224x224 and normalize with CIFAR10 statistics.
transform_train = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

train_set = CIFAR10(root='./datasets', train=True, download=True, transform=transform_train)
test_set = CIFAR10(root='./datasets', train=False, download=True, transform=transform_test)

train_loader = DataLoader(train_set, shuffle=True, batch_size=64)
test_loader = DataLoader(test_set, shuffle=False, batch_size=64)

n_epochs = 10
lr = 0.0001

# `model` is the ViT we load in the next section, already moved to `device`.
optimizer = Adam(model.parameters(), lr=lr)
criterion = CrossEntropyLoss()

for epoch in range(n_epochs):
    train_loss = 0.0
    for i, batch in enumerate(train_loader):
        x, y = batch
        x, y = x.to(device), y.to(device)
        y_hat = model(x)
        loss = criterion(y_hat, y)

        batch_loss = loss.detach().cpu().item()
        train_loss += batch_loss / len(train_loader)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if i % 100 == 0:
            print(f"Batch {i}/{len(train_loader)} loss: {batch_loss:.03f}")

    print(f"Epoch {epoch + 1}/{n_epochs} loss: {train_loss:.03f}")

Loading The Model

Now we load the ViT_b_16 model from torchvision.models. All ViT models available in torchvision are listed in the torchvision documentation. You will find several variants labeled b, l, and h, which correspond to the model sizes: base, large, and huge. These architectures are exactly the ones published in the original ViT paper, An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale. The number that follows the label, such as 16, 32, or 14, corresponds to the patch size the model uses. All of these models have been pretrained on ImageNet. We start by loading the model. The default model is not pretrained, so to make sure we load pretrained weights, we pass the weights argument as ViT_B_16_Weights.IMAGENET1K_V1.

from torchvision.models import ViT_B_16_Weights, vit_b_16

# Load a ViT-B/16 backbone with ImageNet-1k pretrained weights.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
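If you want to double-check what the pretrained weights expect, the weights enum exposes its preprocessing pipeline and metadata. A minimal sketch (assuming torchvision 0.13+, where this weights API exists):

weights = ViT_B_16_Weights.IMAGENET1K_V1

# The bundled preprocessing shows the 224x224 input size the model expects.
print(weights.transforms())

# The metadata confirms the model was trained on 1000 ImageNet classes.
print(len(weights.meta["categories"]))  # 1000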

By default, this model outputs logits for 1000 classes because it was trained on ImageNet. However, our dataset contains only 10 classes, so we need to change the head of the model from 1000 to 10 logits. The outer layer of the loaded model is the “heads” layer, a sequential layer that includes only one linear layer. To adapt the model, we simply assign a new sequential layer to “heads”, preserving the input features of the original linear layer and replacing the output features with 10.

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

# Replace the 1000-class ImageNet head with a 10-class head for CIFAR10.
model.heads = nn.Sequential(
    nn.Linear(model.heads.head.in_features, 10)
)
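As a quick check (a hypothetical snippet, not part of the original article), we can run a dummy batch through the modified model and confirm the new head produces 10 logits per image:

# Forward a random batch of two 224x224 RGB images through the modified model.
with torch.no_grad():
    dummy = torch.randn(2, 3, 224, 224)
    out = model(dummy)
print(out.shape)  # expected: torch.Size([2, 10])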

Rather than training all the transformer blocks in the loaded model, we can freeze every layer except the last transformer encoder layer and the new head. By doing this, we make the fine-tuning procedure much less compute intensive. We finally move the model to the GPU device and train it using the previous training loop.

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

model.heads = nn.Sequential(
    nn.Linear(model.heads.head.in_features, 10)
)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the last encoder layer and the head
for param in model.encoder.layers[-1].parameters():
    param.requires_grad = True
for param in model.heads.parameters():
    param.requires_grad = True

# Move the model to the GPU (if available) before running the training loop above.
model = model.to(device)
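To see how much this freezing reduces the training cost, here is a small sketch (my addition, assuming the model defined above) that counts trainable versus total parameters:

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,} / {total:,} "
      f"({100 * trainable / total:.1f}%)")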

Testing Loop

We finally test our model on the CIFAR10 test set. You will find that the model reaches very high accuracy even after training for only a single epoch. This is because of the powerful features the model learned while being pretrained on ImageNet.

from tqdm import tqdm

model.eval()  # switch to evaluation mode for testing
with torch.no_grad():
    correct, total = 0, 0
    test_loss = 0.0
    for batch in tqdm(test_loader, desc="Testing"):
        x, y = batch
        x, y = x.to(device), y.to(device)
        y_hat = model(x)
        loss = criterion(y_hat, y)
        test_loss += loss.detach().cpu().item() / len(test_loader)

        # Count correct top-1 predictions.
        correct += torch.sum(torch.argmax(y_hat, dim=1) == y).detach().cpu().item()
        total += len(x)

print(f"Test loss: {test_loss:.2f}")
print(f"Test accuracy: {correct / total * 100:.2f}%")


Published via Towards AI
