

APTOS 2019 Blindness Detection — Playing around with ResNeXts and Progressive NASNet


Last Updated on July 24, 2023 by Editorial Team

Author(s): Luka Chkhetiani

Originally published on Towards AI.


A while ago, Kaggle announced a challenge: APTOS 2019 Blindness Detection, with the tagline “Detect diabetic retinopathy to stop blindness before it’s too late.”

The goal of the competition is to predict the severity of diabetic retinopathy on a scale of 0–4.

0 - No DR
1 - Mild
2 - Moderate
3 - Severe
4 - Proliferative DR

The training set consists of 3661 images in 5 classes. Note that the dataset is not well balanced, so we’ll use a couple of tricks to get the best possible outcome, including augmentation and freezing and unfreezing layers.

I tried several models, such as DenseNet, ResNet50/101, Inception v3/v4, and even ResNeXt-101 32x8d, an architecture with 88M parameters and 82.2% top-1 / 96.4% top-5 accuracy. None of them was able to surpass 66.2% accuracy on the submission set, even though some of them showed >95.0% accuracy on validation.

So I sat down and analyzed the dataset and a couple of models.
The takeaway: big models overfitted and small models underfitted, no matter how well I tuned them.

I searched through the models once again and narrowed the choice down to two: ResNeXt 50 and PNASNet 5 Large.

I decided to try both models.

I used tensorboardX, which generates a ‘runs’ directory after execution, so we can visualize the training process.

No, I’m a victim of Google Colab for now.

My PyTorch code for ResNeXt50 training:

For PNASNet 5 Large:

I initially used Cadene’s implementation, and then adapted it to my own fine-tuning code by simply loading the model.

Training Set

After combining the datasets, the number of images per class looks like this:

Total - 3661 images

0 - No DR : 1805 images
1 - Mild : 370 images
2 - Moderate : 999 images
3 - Severe : 193 images
4 - Proliferative DR : 295 images

There’s a huge difference between the classes. We’ll use augmentation, but augmenting images can only help so much.
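To put a number on that imbalance, here is a small sketch that turns the per-class counts listed above into shares and a largest-to-smallest ratio. The variable names are my own; only the counts come from the list above.

```python
# Per-class image counts as listed above.
class_counts = {
    "0 - No DR": 1805,
    "1 - Mild": 370,
    "2 - Moderate": 999,
    "3 - Severe": 193,
    "4 - Proliferative DR": 295,
}

# Share of each class relative to all listed images.
listed_total = sum(class_counts.values())
class_shares = {name: count / listed_total for name, count in class_counts.items()}

# How many times larger the biggest class is than the smallest one.
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

for name, share in class_shares.items():
    print(f"{name:<22} {share:6.1%}")
print(f"largest / smallest class: {imbalance_ratio:.2f}x")
```

The "No DR" class alone accounts for roughly half of the images, while "Severe" is the rarest class by a wide margin.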

A simple script to sort the images into class folders after unzipping them would be:
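The original snippet did not survive in this copy of the post, so the following is only a minimal sketch of such a script, not the author’s code. It assumes a labels CSV with `id_code` and `diagnosis` columns and images named `<id_code>.png`; the function name and all paths are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def sort_images_by_class(csv_path, image_dir, output_dir, extension=".png"):
    """Copy each image into output_dir/<diagnosis>/ based on the labels CSV.

    Assumes the CSV has `id_code` and `diagnosis` columns (an assumption;
    adjust to your label file) and that images are named <id_code><extension>.
    Returns the number of images copied.
    """
    image_dir, output_dir = Path(image_dir), Path(output_dir)
    copied = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            source = image_dir / f"{row['id_code']}{extension}"
            if not source.exists():
                continue  # skip labels without a matching image file
            class_dir = output_dir / row["diagnosis"]
            class_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, class_dir / source.name)
            copied += 1
    return copied
```

A call such as `sort_images_by_class("train.csv", "train_images", "sorted")` (hypothetical paths) would produce one folder per class, which is the layout that folder-based dataset loaders typically expect.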

Data Augmentation

The dataset is not very rich: we have 3661 images in total. So I’m going to use PyTorch’s data augmentation techniques, such as:


transforms.RandomRotation((0,360), center=None),


Randomly flipping images vertically with 0.5 probability.
Random rotation within the 0–360 degree range (effectively allowing a full rotation).
Randomly flipping images horizontally with 0.5 probability.

PyTorch’s data transformations work well here. They are applied randomly each time an image is loaded, so in every epoch the model sees a freshly augmented version of each image, combining up to 3 different augmentations.

Data Resizing

ResNeXt is trained on images resized to 299×299, but PNASNet requires 331×331 inputs, so I’m going to modify the code accordingly.

Little UX:

In case we want the process to run in the background, and not have a laptop/PC up all night, we can use nohup.

The command will be:

nohup python3 train.py &

We can view the stdout via:

cat nohup.out

Or, tail them via:

tail -f nohup.out

Additionally, I really didn’t want to mess with ngrok, so I used Python’s subprocess module to download the TensorBoard output every once in a while and refresh it.

from time import sleep
import subprocess

# Copy the remote TensorBoard run files every 20 seconds.
for i in range(10000):
    subprocess.run('scp user@ip_address:~/APTOS/runs/Aug30_21-58-43/* runs/', shell=True)
    sleep(20)

Let’s start training.


  • Train ResNeXt for 3 epochs and PNASNet for 2 epochs with learning rate 1e-3.
  • Train ResNeXt and PNASNet for 3 epochs with learning rate 1e-4.
  • ResNeXt: freeze all layers except 3 and 4, decrease the lr to 1e-5, then unfreeze the blocks step by step while decreasing the lr by 10x.
  • PNASNet: freeze everything except cell_9, cell_10, and cell_11 and continue training at lr 1e-5. Afterward, unfreeze the cells step by step and anneal the lr by 10x.


While playing around with those two models, I noticed that they reached convincing accuracy in the first two epochs with lr 1e-3, but if training continued with the same parameters, accuracy and loss hovered around the same values.
Decreasing the lr by 10x helps a lot to increase accuracy and keep the loss decreasing. 4 epochs are enough for the whole model to get to know the dataset and learn enough features.

After that, training only the deepest layers, which are responsible for higher-level features, makes the model more refined and speeds up the training procedure. In other words: I give the model some time to familiarize itself with the dataset and study it, and afterward concentrate it on the most powerful components of the dataset.
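The freeze-and-anneal schedule described above can be sketched roughly as follows. This is not the training code from the post: the helper names, the staged unfreezing order, and the inclusion of the `fc` classifier head are my assumptions. The helper relies only on `named_parameters()` and the `requires_grad` flag, which is the interface PyTorch modules expose, and the block names assume torchvision-style ResNet naming (`layer1` … `layer4`).

```python
def set_trainable(model, trainable_prefixes):
    """Freeze all parameters except those under the given module name prefixes.

    Relies only on `named_parameters()` and the `requires_grad` flag, i.e. the
    interface that `torch.nn.Module` exposes. Returns the trainable names.
    """
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in model.named_parameters():
        # Match "layer3" and "layer3.0.conv1.weight", but not "layer30.weight".
        keep = any(name == p or name.startswith(p + ".") for p in prefixes)
        param.requires_grad = keep
        if keep:
            trainable.append(name)
    return trainable


def annealed_lrs(start_lr, stages, factor=10.0):
    """Learning rate per stage, divided by `factor` at every new stage."""
    return [start_lr / factor ** stage for stage in range(stages)]


# Hypothetical unfreezing order for ResNeXt 50, deepest blocks first.
# "fc" (the classifier head) is an assumption; the text only says
# "freeze all layers except 3 and 4".
RESNEXT_STAGES = [
    ["layer3", "layer4", "fc"],
    ["layer2", "layer3", "layer4", "fc"],
    ["layer1", "layer2", "layer3", "layer4", "fc"],
]
```

With a real model, one would loop over `zip(RESNEXT_STAGES, annealed_lrs(1e-5, 3))`, call `set_trainable(model, prefixes)` for each stage, and rebuild the optimizer over the parameters that still have `requires_grad` set. For PNASNet the prefixes would be `cell_9`, `cell_10`, `cell_11` plus whatever the classifier layer is called in the implementation you load.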

1st Part

Learning rate 1e-3, all layers, 2 epochs.

PNASNet 5 Large

Epoch 1/50.. Train loss: 3.503.. Test loss: 5.913.. Test accuracy: 0.011 
Epoch 1/50.. Train loss: 1.449.. Test loss: 1.084.. Test accuracy: 0.697
Epoch 1/50.. Train loss: 0.806.. Test loss: 0.673.. Test accuracy: 0.745
Epoch 1/50.. Train loss: 0.630.. Test loss: 0.606.. Test accuracy: 0.776
Epoch 1/50.. Train loss: 0.681.. Test loss: 0.591.. Test accuracy: 0.773
Epoch 2/50.. Train loss: 0.610.. Test loss: 0.546.. Test accuracy: 0.796
Epoch 2/50.. Train loss: 0.711.. Test loss: 0.564.. Test accuracy: 0.792
Epoch 2/50.. Train loss: 0.474.. Test loss: 0.582.. Test accuracy: 0.790
Epoch 2/50.. Train loss: 0.484.. Test loss: 0.539.. Test accuracy: 0.807
Epoch 2/50.. Train loss: 0.555.. Test loss: 0.527.. Test accuracy: 0.811
Epoch 3/50.. Train loss: 0.559.. Test loss: 0.520.. Test accuracy: 0.814
Epoch 3/50.. Train loss: 0.455.. Test loss: 0.507.. Test accuracy: 0.821
Epoch 3/50.. Train loss: 0.572.. Test loss: 0.486.. Test accuracy: 0.822
Epoch 3/50.. Train loss: 0.408.. Test loss: 0.520.. Test accuracy: 0.831
Epoch 3/50.. Train loss: 0.546.. Test loss: 0.466.. Test accuracy: 0.829

ResNeXt 50 32x4d

Epoch 1/50.. Train loss: 2.787.. Test loss: 3.990.. Test accuracy: 0.254 
Epoch 1/50.. Train loss: 1.460.. Test loss: 0.893.. Test accuracy: 0.684
Epoch 1/50.. Train loss: 0.722.. Test loss: 0.654.. Test accuracy: 0.738
Epoch 2/50.. Train loss: 0.817.. Test loss: 0.640.. Test accuracy: 0.771
Epoch 2/50.. Train loss: 0.590.. Test loss: 0.533.. Test accuracy: 0.807
Epoch 2/50.. Train loss: 0.517.. Test loss: 0.496.. Test accuracy: 0.820
Epoch 3/50.. Train loss: 0.579.. Test loss: 0.559.. Test accuracy: 0.772
Epoch 3/50.. Train loss: 0.470.. Test loss: 0.493.. Test accuracy: 0.809
Epoch 3/50.. Train loss: 0.512.. Test loss: 0.477.. Test accuracy: 0.836

2nd Part

All Layers

PNASNet 5 Large

Epoch 1/50.. Train loss: 0.472.. Test loss: 0.402.. Test accuracy: 0.848 
Epoch 1/50.. Train loss: 0.452.. Test loss: 0.370.. Test accuracy: 0.857
Epoch 1/50.. Train loss: 0.474.. Test loss: 0.383.. Test accuracy: 0.840
Epoch 1/50.. Train loss: 0.402.. Test loss: 0.383.. Test accuracy: 0.855
Epoch 2/50.. Train loss: 0.552.. Test loss: 0.398.. Test accuracy: 0.848
Epoch 2/50.. Train loss: 0.453.. Test loss: 0.378.. Test accuracy: 0.867
Epoch 2/50.. Train loss: 0.431.. Test loss: 0.391.. Test accuracy: 0.846
Epoch 2/50.. Train loss: 0.307.. Test loss: 0.379.. Test accuracy: 0.857
Epoch 3/50.. Train loss: 0.453.. Test loss: 0.378.. Test accuracy: 0.855
Epoch 3/50.. Train loss: 0.372.. Test loss: 0.376.. Test accuracy: 0.851
Epoch 3/50.. Train loss: 0.429.. Test loss: 0.380.. Test accuracy: 0.862
Epoch 3/50.. Train loss: 0.408.. Test loss: 0.385.. Test accuracy: 0.855

ResNeXt 50

Epoch 1/50.. Train loss: 0.230.. Test loss: 0.423.. Test accuracy: 0.837 
Epoch 1/50.. Train loss: 0.454.. Test loss: 0.416.. Test accuracy: 0.846
Epoch 1/50.. Train loss: 0.462.. Test loss: 0.410.. Test accuracy: 0.841
Epoch 2/50.. Train loss: 0.458.. Test loss: 0.416.. Test accuracy: 0.837
Epoch 2/50.. Train loss: 0.437.. Test loss: 0.386.. Test accuracy: 0.854
Epoch 2/50.. Train loss: 0.407.. Test loss: 0.406.. Test accuracy: 0.846
Epoch 3/50.. Train loss: 0.425.. Test loss: 0.401.. Test accuracy: 0.845
Epoch 3/50.. Train loss: 0.424.. Test loss: 0.394.. Test accuracy: 0.836
Epoch 3/50.. Train loss: 0.552.. Test loss: 0.398.. Test accuracy: 0.846

After freezing & unfreezing layers and tuning the lr

PNASNet 5 Large

Epoch 1/50.. Train loss: 0.220.. Test loss: 0.344.. Test accuracy: 0.884 
Epoch 1/50.. Train loss: 0.477.. Test loss: 0.361.. Test accuracy: 0.873
Epoch 1/50.. Train loss: 0.428.. Test loss: 0.345.. Test accuracy: 0.881
Epoch 1/50.. Train loss: 0.410.. Test loss: 0.358.. Test accuracy: 0.872
Epoch 1/50.. Train loss: 0.403.. Test loss: 0.356.. Test accuracy: 0.874
Epoch 1/50.. Train loss: 0.414.. Test loss: 0.332.. Test accuracy: 0.884

ResNeXt 50

Epoch 1/50.. Train loss: 0.176.. Test loss: 0.354.. Test accuracy: 0.876 
Epoch 1/50.. Train loss: 0.332.. Test loss: 0.370.. Test accuracy: 0.865
Epoch 2/50.. Train loss: 0.401.. Test loss: 0.361.. Test accuracy: 0.865
Epoch 2/50.. Train loss: 0.376.. Test loss: 0.366.. Test accuracy: 0.866
Epoch 2/50.. Train loss: 0.342.. Test loss: 0.354.. Test accuracy: 0.870
Epoch 3/50.. Train loss: 0.399.. Test loss: 0.372.. Test accuracy: 0.870
Epoch 3/50.. Train loss: 0.330.. Test loss: 0.349.. Test accuracy: 0.875


We have 87.5% accuracy with ResNeXt 50 and 88.4% with PNASNet 5.

Basically, I’ve tried SOTA models, mediocre ones, and lastly two of the top architectures for image classification.
And, as we know, models don’t work exactly the same way in real life as they do in test/validation procedures.

Anyway, tuning two models overnight was fun. Lastly, I’ll make predictions on the actual test (submission) set, and we’ll see the results.


My inference for the prediction part.

In case we’re using Google Colab for prediction, note that tqdm is sometimes not a great option: it refreshes the stdout on every update, and the page can crash. I always try tqdm first, and if it’s not working well, I remove it from the loop and write my own version of the progress output. We should also make sure to delete the *.csv file after the first try, or the new predictions will be appended to it.
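As an illustration of that workaround, here is a minimal sketch of a prediction loop that prints progress only every few images instead of using tqdm, and opens the CSV in write mode so a rerun overwrites the file rather than appending to it. The function name, the `predict_fn` callable, and the `id_code`/`diagnosis` column names are assumptions for the example, not the inference code from the post.

```python
import csv


def write_predictions(image_ids, predict_fn, csv_path, log_every=100):
    """Write one prediction per image to csv_path, overwriting any old file.

    `predict_fn` is a hypothetical callable that maps an image id to a class
    index; in practice it would load the image and run the model on it.
    """
    total = len(image_ids)
    # Mode "w" truncates an existing file, so reruns never append duplicates.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id_code", "diagnosis"])
        for i, image_id in enumerate(image_ids, start=1):
            writer.writerow([image_id, int(predict_fn(image_id))])
            # Sparse progress output instead of a constantly refreshing bar.
            if i % log_every == 0 or i == total:
                print(f"predicted {i}/{total}")
```

Printing one line per `log_every` images keeps the notebook output small, which is the point of dropping tqdm here.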



While trying to make a submission after the prediction part, Kaggle made me furious. They had a bug in the kernel that threw a submission error every time I tried to submit the predictions.
After searching for a while, I saw one Kaggler’s comment that actually helped me.

P.S. Turning off the GPU and Internet in the kernel helped as well, in addition to downgrading the Python Docker image to versions 1–7 on Kaggle.

I’m going to borrow their code for the submission part:

Borrowed from https://www.kaggle.com/kinnachen’s comment

Final Results:

ResNeXt 50


When I checked why PNASNet worked so poorly, I noticed many 0’s and 2’s and a couple of 1’s among the predictions in the submission, and absolutely no 3’s or 4’s.
PNASNet overfitted on classes 0 and 2, since they hold most of the data, and made poor predictions for the other classes, or none at all.
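A check like this is easy to script. The sketch below counts how often each class appears in a submission file and lists the classes that were never predicted. The function names and the `diagnosis` column name are assumptions for illustration.

```python
import csv
from collections import Counter


def prediction_counts(csv_path, column="diagnosis", num_classes=5):
    """Count predictions per class, including classes that never appear."""
    with open(csv_path, newline="") as f:
        counts = Counter(int(row[column]) for row in csv.DictReader(f))
    return {label: counts.get(label, 0) for label in range(num_classes)}


def missing_classes(counts):
    """Return the class labels that were never predicted."""
    return [label for label, n in counts.items() if n == 0]
```

If `missing_classes(prediction_counts("submission.csv"))` (hypothetical path) returns something like `[3, 4]`, the model has collapsed onto the majority classes in the way described above.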

And ResNeXt 50 turned out to work well for just one night of tuning.

There are 3 more days until the challenge closes, and I’m planning to try more promising approaches as soon as I have time.

Hope you enjoyed it!


By cropping the dataset and training the last layers longer, I gained a 4.1% improvement.


Mean color subtraction gave a 2.0% improvement.


Published via Towards AI
