APTOS 2019 Blindness Detection — Playing around with ResNeXts and Progressive NASNet
Last Updated on July 24, 2023 by Editorial Team
Author(s): Luka Chkhetiani
Originally published on Towards AI.
A while ago, Kaggle announced a challenge: APTOS 2019 Blindness Detection, with the tagline “Detect diabetic retinopathy to stop blindness before it’s too late.”
The goal of the competition is to predict the severity of diabetic retinopathy (DR) on a scale of 0–4:
0 - No DR
1 - Mild
2 - Moderate
3 - Severe
4 - Proliferative DR
The training set consists of 3662 images across 5 classes. Note that the dataset is not well balanced, so we’ll use a few tricks to get the most out of it, including augmentation and freezing and unfreezing layers.
I tried several models, such as DenseNet, ResNet50/101, Inception v3/v4, and even ResNeXt-101-32x8d, an architecture with 88M parameters and 82.2% top-1 / 96.4% top-5 ImageNet accuracy. None of them got past 66.2% accuracy on the submission set, even though some of them showed >95.0% accuracy on validation.
So I sat down and analyzed the dataset and a couple of the models.
My conclusion: the big models overfitted and the small models underfitted, no matter how well I tuned them.
I searched through the available models once again and settled on:
- ResNeXt 50 32x4d: {see paper here}
- PNASNet 5 Large: {see paper here}
I decided to try both models.
I added tensorboardX logging, which generates a ‘runs’ directory during execution, so we can visualize the training process.
torch.nn.DataParallel?
No, I’m a victim of Google Colab for now.
My PyTorch code for ResNeXt50 training:
For PNASNet 5 Large:
I initially used Cadene’s implementation and then adapted it to my own fine-tuning code by simply loading the model.
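Loading a pretrained model for fine-tuning mostly comes down to swapping its final classification layer for one with five outputs. Below is a minimal sketch, assuming the head is a single `nn.Linear` stored under a known attribute name (`fc` in torchvision’s ResNeXt, `last_linear` in Cadene’s pretrainedmodels). `replace_head` and `TinyBackbone` are hypothetical names used only for illustration, and the tiny stand-in network replaces the real pretrained weights so the example runs without downloads.

```python
import torch
from torch import nn

NUM_CLASSES = 5  # DR grades 0-4


def replace_head(model: nn.Module, head_name: str, num_classes: int = NUM_CLASSES) -> nn.Module:
    """Swap the final linear layer for a freshly initialised one."""
    old_head = getattr(model, head_name)
    if not isinstance(old_head, nn.Linear):
        raise TypeError(f"{head_name} is not an nn.Linear layer")
    setattr(model, head_name, nn.Linear(old_head.in_features, num_classes))
    return model


class TinyBackbone(nn.Module):
    """Stand-in for a pretrained network (assumption, not the real model)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU())
        self.fc = nn.Linear(16, 1000)  # ImageNet-style 1000-way head

    def forward(self, x):
        return self.fc(self.features(x))


model = replace_head(TinyBackbone(), "fc")
logits = model(torch.randn(2, 3, 8, 8))
print(logits.shape)  # one score per DR grade for each image
```

With a real network, the same call would be made on the loaded model, passing `"fc"` or `"last_linear"` depending on the implementation.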
Training Set
After combining the datasets, the number of images per class looks like this:
Total - 3662 images
0 - No DR : 1805 images
1 - Mild : 370 images
2 - Moderate : 999 images
3 - Severe : 193 images
4 - Proliferative DR : 295 images
There’s a huge difference between the class sizes. We’ll use augmentation, but augmenting images can only help so much with this kind of imbalance.
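To put numbers on the imbalance, the short sketch below computes each class’s share of the data and its size relative to the largest class. It only uses the per-class counts listed above; the variable names are illustrative.

```python
# Per-class image counts as listed above.
class_counts = {
    "0 - No DR": 1805,
    "1 - Mild": 370,
    "2 - Moderate": 999,
    "3 - Severe": 193,
    "4 - Proliferative DR": 295,
}

total = sum(class_counts.values())
largest = max(class_counts.values())

for name, count in class_counts.items():
    share = 100 * count / total
    ratio = largest / count  # how many times smaller than the biggest class
    print(f"{name:<22} {count:>5} images  {share:5.1f}%  1:{ratio:.1f}")
```

The largest class (No DR) is roughly nine times the size of the smallest one (Severe), which is worth keeping in mind when reading the accuracy figures later on.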
A simple script to sort the images into per-class folders after unzipping them would be:
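A minimal sketch of such a script, assuming the competition’s labels file has `id_code` and `diagnosis` columns and that the images are stored as `<id_code>.png`. The function name, the folder names, and the choice to copy rather than move the files are assumptions for illustration.

```python
import csv
import shutil
from pathlib import Path


def sort_images_by_class(csv_path, image_dir, output_dir, extension=".png"):
    """Copy every image into output_dir/<diagnosis>/ based on the labels CSV."""
    image_dir, output_dir = Path(image_dir), Path(output_dir)
    copied = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            source = image_dir / f"{row['id_code']}{extension}"
            if not source.exists():
                continue  # skip labels whose image file is missing
            class_dir = output_dir / row["diagnosis"]
            class_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, class_dir / source.name)
            copied += 1
    return copied
```

Calling, for example, `sort_images_by_class("train.csv", "train_images", "data/train")` (hypothetical paths) produces one folder per grade, a layout that torchvision’s `ImageFolder` dataset can read directly.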
Data Augmentation
The dataset is not very rich: we have 3662 images in total. So I’m going to use PyTorch’s (torchvision) data augmentation transforms, such as:
transforms.RandomVerticalFlip(p=0.5),
transforms.RandomRotation((0,360), center=None),
transforms.RandomHorizontalFlip(p=0.5)
- RandomVerticalFlip: flips the image vertically with probability 0.5.
- RandomRotation: rotates the image by a random angle in the 0–360 degree range (basically allowing a full rotation).
- RandomHorizontalFlip: flips the image horizontally with probability 0.5.
PyTorch’s data transforms work well here. They are applied on the fly every time an image is loaded, so in each epoch the model sees a freshly randomized combination of these three augmentations for every image.
Data Resizing
ResNeXt is trained on images resized to 299×299, but PNASNet requires 331×331 inputs, so I’ll modify the code accordingly.
Little UX:
In case we want the process to run in the background, and not have a laptop/PC up all night, we can use nohup.
The command will be:
nohup python3 train.py &
We can view the captured stdout via:
cat nohup.out
Or, tail them via:
tail -f nohup.out
Additionally, I really didn’t want to mess with ngrok, so I used Python’s subprocess module to download the TensorBoard logs from the remote machine every once in a while and refresh the local copy.
from time import sleep
import subprocess

# Copy the remote TensorBoard run into the local runs/ folder every 20 seconds.
for i in range(10000):
    subprocess.run('scp user@ip_address:~/APTOS/runs/Aug30_21-58-43/* runs/', shell=True)
    sleep(20)
Let’s start training.
Plan:
- Train the ResNeXt for 3 and PNASNet for 2 epochs with learning rate 1e-3.
- Train the ResNeXt and PNASNet for 3 epochs with learning rate 1e-4
- ResNeXt: freeze all layers except layers 3 and 4, decrease the lr to 1e-5, then unfreeze the remaining blocks step by step, decreasing the lr by 10x at each step.
- PNASNet: freeze everything except cell_9, cell_10 and cell_11 and continue training with lr 1e-5. Afterward, unfreeze the remaining cells step by step, annealing the lr by 10x at each step.
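Freezing in PyTorch amounts to switching off `requires_grad` for the parameters that should stay fixed and handing only the remaining ones to the optimizer. A minimal sketch follows, using a toy model whose block names mirror torchvision’s ResNeXt layout. The helper names are hypothetical, and both the Adam optimizer and keeping the classifier head `fc` trainable are assumptions, since the text names neither.

```python
import torch
from torch import nn


def freeze_all_but(model: nn.Module, trainable_blocks):
    """Enable gradients only for top-level blocks named in trainable_blocks."""
    trainable_blocks = set(trainable_blocks)
    for name, child in model.named_children():
        for param in child.parameters():
            param.requires_grad = name in trainable_blocks


def make_optimizer(model: nn.Module, lr: float):
    """Optimise only the parameters that are currently unfrozen."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)  # optimizer choice is an assumption


# Toy model standing in for the real network; only the block names matter here.
model = nn.Sequential()
for block in ["layer1", "layer2", "layer3", "layer4", "fc"]:
    model.add_module(block, nn.Linear(4, 4))

freeze_all_but(model, ["layer3", "layer4", "fc"])
optimizer = make_optimizer(model, lr=1e-5)
print([n for n, p in model.named_parameters() if p.requires_grad])
```

For PNASNet in Cadene’s implementation, the same helper would be called with `["cell_9", "cell_10", "cell_11", "last_linear"]`. The optimizer has to be rebuilt after every unfreezing step so that it picks up the newly trainable parameters.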
Why?
While playing around with these two models, I noticed that they reached convincing accuracy in the first two epochs with lr 1e-3, but if training continued with the same parameters, accuracy and loss hovered around the same values.
Decreasing the lr by 10x helps a lot to increase accuracy and keep the loss going down. Four epochs are plenty for the whole model to get to know the dataset and learn enough features.
After that, training only the deepest layers, which are responsible for the higher-level features, refines the model and speeds up training. In other words: I give the model some time to familiarize itself with the dataset, and afterward concentrate it on the most informative features.
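The staged part of the plan can be written down as an explicit schedule. The sketch below is one reading of it, assuming that one block is unfrozen per stage, deepest first, and that the lr drops by 10x at every stage. The function and block names are illustrative.

```python
def staged_schedule(blocks, initially_trainable, start_lr=1e-5, lr_factor=0.1):
    """Yield (trainable_blocks, lr) pairs, unfreezing one block per stage.

    `blocks` is ordered from the input side to the output side; the deepest
    frozen block is unfrozen first.
    """
    trainable = list(initially_trainable)
    frozen = [b for b in blocks if b not in trainable]
    lr = start_lr
    yield list(trainable), lr
    while frozen:
        trainable.insert(0, frozen.pop())  # unfreeze the deepest frozen block
        lr *= lr_factor                    # anneal the lr by 10x
        yield list(trainable), lr


resnext_blocks = ["layer1", "layer2", "layer3", "layer4"]
schedule = staged_schedule(resnext_blocks, ["layer3", "layer4"])
for stage, (trainable, lr) in enumerate(schedule, start=1):
    print(f"stage {stage}: lr={lr:.0e} trainable={trainable}")
```

Each stage would then be trained for a few epochs before moving on to the next one. How many epochs to spend per stage is a tuning decision that this sketch leaves open.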
1st Part
Learning rate 1e-3, all layers, 2 epochs.
PNASNet 5 Large
Epoch 1/50.. Train loss: 3.503.. Test loss: 5.913.. Test accuracy: 0.011
Epoch 1/50.. Train loss: 1.449.. Test loss: 1.084.. Test accuracy: 0.697
Epoch 1/50.. Train loss: 0.806.. Test loss: 0.673.. Test accuracy: 0.745
Epoch 1/50.. Train loss: 0.630.. Test loss: 0.606.. Test accuracy: 0.776
Epoch 1/50.. Train loss: 0.681.. Test loss: 0.591.. Test accuracy: 0.773
Epoch 2/50.. Train loss: 0.610.. Test loss: 0.546.. Test accuracy: 0.796
Epoch 2/50.. Train loss: 0.711.. Test loss: 0.564.. Test accuracy: 0.792
Epoch 2/50.. Train loss: 0.474.. Test loss: 0.582.. Test accuracy: 0.790
Epoch 2/50.. Train loss: 0.484.. Test loss: 0.539.. Test accuracy: 0.807
Epoch 2/50.. Train loss: 0.555.. Test loss: 0.527.. Test accuracy: 0.811
Epoch 3/50.. Train loss: 0.559.. Test loss: 0.520.. Test accuracy: 0.814
Epoch 3/50.. Train loss: 0.455.. Test loss: 0.507.. Test accuracy: 0.821
Epoch 3/50.. Train loss: 0.572.. Test loss: 0.486.. Test accuracy: 0.822
Epoch 3/50.. Train loss: 0.408.. Test loss: 0.520.. Test accuracy: 0.831
Epoch 3/50.. Train loss: 0.546.. Test loss: 0.466.. Test accuracy: 0.829
ResNeXt 50 32x4d
Epoch 1/50.. Train loss: 2.787.. Test loss: 3.990.. Test accuracy: 0.254
Epoch 1/50.. Train loss: 1.460.. Test loss: 0.893.. Test accuracy: 0.684
Epoch 1/50.. Train loss: 0.722.. Test loss: 0.654.. Test accuracy: 0.738
Epoch 2/50.. Train loss: 0.817.. Test loss: 0.640.. Test accuracy: 0.771
Epoch 2/50.. Train loss: 0.590.. Test loss: 0.533.. Test accuracy: 0.807
Epoch 2/50.. Train loss: 0.517.. Test loss: 0.496.. Test accuracy: 0.820
Epoch 3/50.. Train loss: 0.579.. Test loss: 0.559.. Test accuracy: 0.772
Epoch 3/50.. Train loss: 0.470.. Test loss: 0.493.. Test accuracy: 0.809
Epoch 3/50.. Train loss: 0.512.. Test loss: 0.477.. Test accuracy: 0.836
2nd Part
All Layers
PNASNet 5 Large
Epoch 1/50.. Train loss: 0.472.. Test loss: 0.402.. Test accuracy: 0.848
Epoch 1/50.. Train loss: 0.452.. Test loss: 0.370.. Test accuracy: 0.857
Epoch 1/50.. Train loss: 0.474.. Test loss: 0.383.. Test accuracy: 0.840
Epoch 1/50.. Train loss: 0.402.. Test loss: 0.383.. Test accuracy: 0.855
Epoch 2/50.. Train loss: 0.552.. Test loss: 0.398.. Test accuracy: 0.848
Epoch 2/50.. Train loss: 0.453.. Test loss: 0.378.. Test accuracy: 0.867
Epoch 2/50.. Train loss: 0.431.. Test loss: 0.391.. Test accuracy: 0.846
Epoch 2/50.. Train loss: 0.307.. Test loss: 0.379.. Test accuracy: 0.857
Epoch 3/50.. Train loss: 0.453.. Test loss: 0.378.. Test accuracy: 0.855
Epoch 3/50.. Train loss: 0.372.. Test loss: 0.376.. Test accuracy: 0.851
Epoch 3/50.. Train loss: 0.429.. Test loss: 0.380.. Test accuracy: 0.862
Epoch 3/50.. Train loss: 0.408.. Test loss: 0.385.. Test accuracy: 0.855
ResNeXt 50
Epoch 1/50.. Train loss: 0.230.. Test loss: 0.423.. Test accuracy: 0.837
Epoch 1/50.. Train loss: 0.454.. Test loss: 0.416.. Test accuracy: 0.846
Epoch 1/50.. Train loss: 0.462.. Test loss: 0.410.. Test accuracy: 0.841
Epoch 2/50.. Train loss: 0.458.. Test loss: 0.416.. Test accuracy: 0.837
Epoch 2/50.. Train loss: 0.437.. Test loss: 0.386.. Test accuracy: 0.854
Epoch 2/50.. Train loss: 0.407.. Test loss: 0.406.. Test accuracy: 0.846
Epoch 3/50.. Train loss: 0.425.. Test loss: 0.401.. Test accuracy: 0.845
Epoch 3/50.. Train loss: 0.424.. Test loss: 0.394.. Test accuracy: 0.836
Epoch 3/50.. Train loss: 0.552.. Test loss: 0.398.. Test accuracy: 0.846
After freezing & unfreezing layers and tuning the lr
PNASNet 5 Large
Epoch 1/50.. Train loss: 0.220.. Test loss: 0.344.. Test accuracy: 0.884
Epoch 1/50.. Train loss: 0.477.. Test loss: 0.361.. Test accuracy: 0.873
Epoch 1/50.. Train loss: 0.428.. Test loss: 0.345.. Test accuracy: 0.881
Epoch 1/50.. Train loss: 0.410.. Test loss: 0.358.. Test accuracy: 0.872
Epoch 1/50.. Train loss: 0.403.. Test loss: 0.356.. Test accuracy: 0.874
Epoch 1/50.. Train loss: 0.414.. Test loss: 0.332.. Test accuracy: 0.884
ResNeXt 50
Epoch 1/50.. Train loss: 0.176.. Test loss: 0.354.. Test accuracy: 0.876
Epoch 1/50.. Train loss: 0.332.. Test loss: 0.370.. Test accuracy: 0.865
Epoch 2/50.. Train loss: 0.401.. Test loss: 0.361.. Test accuracy: 0.865
Epoch 2/50.. Train loss: 0.376.. Test loss: 0.366.. Test accuracy: 0.866
Epoch 2/50.. Train loss: 0.342.. Test loss: 0.354.. Test accuracy: 0.870
Epoch 3/50.. Train loss: 0.399.. Test loss: 0.372.. Test accuracy: 0.870
Epoch 3/50.. Train loss: 0.330.. Test loss: 0.349.. Test accuracy: 0.875
Results
We have 87.5% accuracy with ResNeXt 50 and 88.4% with PNASNet 5.
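These figures are the last “Test accuracy” values in the logs of the final stage. A small sketch of how such log lines could be parsed to pull them out automatically; the regular expression simply follows the log format shown above, and the function name is illustrative.

```python
import re

LOG_PATTERN = re.compile(
    r"Epoch (\d+)/(\d+)\.\. Train loss: (\d+\.\d+)\.\. "
    r"Test loss: (\d+\.\d+)\.\. Test accuracy: (\d+\.\d+)"
)


def parse_log(text):
    """Return one dict per log line that matches the training output format."""
    records = []
    for match in LOG_PATTERN.finditer(text):
        epoch, _, train_loss, test_loss, accuracy = match.groups()
        records.append({
            "epoch": int(epoch),
            "train_loss": float(train_loss),
            "test_loss": float(test_loss),
            "accuracy": float(accuracy),
        })
    return records


# Last two ResNeXt 50 lines from the final stage above.
sample = """
Epoch 3/50.. Train loss: 0.399.. Test loss: 0.372.. Test accuracy: 0.870
Epoch 3/50.. Train loss: 0.330.. Test loss: 0.349.. Test accuracy: 0.875
"""
records = parse_log(sample)
print(f"final accuracy: {records[-1]['accuracy']:.1%}")  # final accuracy: 87.5%
```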
Basically, I tried SOTA models, mediocre ones, and lastly two of the top architectures for image classification.
And as we know, models don’t perform in real life exactly the way they do in test/validation.
Anyway, tuning two models overnight was fun. Lastly, I’ll make predictions on the actual test (submission) set, and we’ll see the results.
Prediction
My inference code for the prediction part.
If we’re using Google Colab for prediction, note that tqdm is sometimes not a great option: it refreshes stdout on every update, and the page can crash. I always try tqdm first, and if it’s not working well, I remove it from the loop and write my own progress output instead. We should also make sure to delete the *.csv file after the first try, or the new predictions will be appended to it.
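Both issues can be handled in the code that writes the predictions. The sketch below prints one short progress line every N images instead of using a live progress bar, and opens the CSV in write mode so that a rerun overwrites the old file rather than appending to it. The `id_code` and `diagnosis` columns follow the competition’s submission format; the function name, the demo file name, and the fake predictions are assumptions for illustration.

```python
import csv
import os
import tempfile


def write_predictions(predictions, csv_path, report_every=100):
    """Write (id_code, diagnosis) rows, overwriting any earlier file.

    Opening with mode "w" truncates the file, so rerunning never appends
    to stale predictions.
    """
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id_code", "diagnosis"])
        for i, (id_code, diagnosis) in enumerate(predictions, start=1):
            writer.writerow([id_code, diagnosis])
            if i % report_every == 0:
                # One short line every N images instead of a live progress bar.
                print(f"{i} images processed", flush=True)
    return csv_path


# Fake predictions stand in for the model's output.
fake_predictions = ((f"image_{i:03d}", i % 5) for i in range(250))
out_path = os.path.join(tempfile.gettempdir(), "submission_demo.csv")
write_predictions(fake_predictions, out_path)
```

In a real run, `predictions` would be a generator that yields the image id and the predicted grade for each test image as the model processes it.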
Kaggle
While trying to make a submission after the prediction part, Kaggle made me furious. Basically, they had a bug in the kernel that threw a submission error every time I tried to submit the predictions.
After searching for a while, I found one Kaggler’s comment that actually helped me.
P.S. Turning off GPU and Internet in the kernel helped as well, in addition to downgrading the Python Docker image to versions 1–7 on Kaggle.
I’m going to borrow their code for the submission part:
Final Results:
ResNeXt 50
PNASNet 5
When I checked why PNASNet worked so poorly, I noticed many 0’s and 2’s and a couple of 1’s among the predictions in the submission, and absolutely no 3’s or 4’s.
PNASNet overfitted to classes 0 and 2, because they hold most of the data, and made poor predictions for the other classes, or none at all.
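This kind of collapse is easy to spot by counting how often each grade is predicted before submitting. A minimal sketch with made-up predictions that mimic the pattern described above; the function name and the example list are illustrative, not the actual submission.

```python
from collections import Counter


def class_distribution(predictions, num_classes=5):
    """Count how often each grade is predicted, including grades never predicted."""
    counts = Counter(predictions)
    return {grade: counts.get(grade, 0) for grade in range(num_classes)}


# Made-up predictions: mostly 0s and 2s, a couple of 1s, no 3s or 4s.
example = [0, 0, 2, 0, 2, 1, 2, 0, 0, 2, 1, 0]
distribution = class_distribution(example)
missing = [grade for grade, n in distribution.items() if n == 0]
print(distribution)
print("never predicted:", missing)
```

Comparing this distribution with the class shares of the training set gives a quick sanity check: grades that make up a visible share of the training data but never appear in the predictions point to the kind of overfitting seen here.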
ResNeXt 50, on the other hand, turned out to work well for just one night of tuning.
There are 3 more days until the challenge closes, and I’m planning to try more promising approaches as soon as I have time.
Hope you enjoyed it!
UPDATE:
By cropping the images in the dataset and training the last layers longer, I got a 4.1% improvement.
2nd UPDATE
Mean color subtraction gave a 2.0% improvement.
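The update does not say which variant of mean color subtraction was used. As one simple interpretation, the sketch below subtracts each image’s own per-channel mean and shifts the result back to mid-gray so it remains a valid 8-bit image. The function name and the random test image are assumptions; another common variant subtracts a local average (a blurred copy of the image) instead of a global mean.

```python
import numpy as np


def subtract_mean_color(image):
    """Subtract the per-channel mean of an H x W x 3 uint8 image.

    The result is shifted back by 128 and clipped so it stays a valid image.
    """
    pixels = image.astype(np.float32)
    channel_mean = pixels.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 3)
    centered = pixels - channel_mean + 128.0
    return np.clip(centered, 0, 255).astype(np.uint8)


# A random image stands in for a fundus photograph.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
result = subtract_mean_color(image)
print(result.shape, result.dtype)
```

The idea behind this step is to remove differences in overall color and brightness between photographs, so that the model focuses on structure rather than on how an image was exposed.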