Kaggling: A Journey of Past Competitions — Part 1
Last Updated on August 30, 2020 by Editorial Team
Author(s): Rashmi Margani
Techniques learned from past competitions. Stay tuned with me for more.
1. Jigsaw Multilingual Toxic Comment Classification Competition
In the Jigsaw competition, cross-validation, preprocessing, and postprocessing carried a lot of the importance.
- Pseudo-labeling: we saw a performance improvement when we used test-set predictions as training data; the intuition is that this helps the models learn the test-set distribution. Using all test-set predictions as soft labels worked better than any other version of pseudo-labeling (e.g., hard labels or confidence-thresholded pseudo-labels). Towards the end of the competition, this gave a minor but material boost on the LB. To learn more about pseudo-labeling, see the explanation by Chris, from whom I learned a lot. A minimal sketch of the soft-label version is given after this list.
- Validation: we used k-fold CV with the validation set as a hold-out, but as we refined the test predictions and used pseudo-labels plus the validation set for training, the validation metric became noisy to the point where we relied primarily on the public LB score.
- Postprocessing: exploiting the history of submissions to tweak the test-set predictions. Tracking the per-sample delta of predictions between successful submissions, averaging those deltas, and nudging the predictions in the same direction paved the way for the winning solution; a rough sketch also follows below.
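As a rough illustration of the soft-label pseudo-labeling described above, here is a minimal sketch; the file and column names are assumptions, not the team's exact pipeline.

```python
import pandas as pd

# Hypothetical file names for illustration only.
train = pd.read_csv("train.csv")            # columns: comment_text, toxic (0/1)
test = pd.read_csv("test.csv")              # column:  comment_text
test_preds = pd.read_csv("submission.csv")  # column:  toxic (predicted probabilities)

# Soft pseudo-labels: keep the predicted probabilities as-is instead of
# thresholding them into hard 0/1 labels.
pseudo = test.copy()
pseudo["toxic"] = test_preds["toxic"].values

# Train on the original labels plus the soft-labeled test set so the model
# also sees the test-set distribution.
train_plus_pseudo = pd.concat([train, pseudo], ignore_index=True)
```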
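The submission-history postprocessing can be sketched like this; the nudge factor and file names are assumptions, since the exact weighting is not public.

```python
import numpy as np
import pandas as pd

# Two earlier submissions whose LB score improved from sub_a to sub_b,
# plus the current best submission (hypothetical file names).
sub_a = pd.read_csv("submission_old.csv")["toxic"].values
sub_b = pd.read_csv("submission_new.csv")["toxic"].values
current = pd.read_csv("submission_best.csv")["toxic"].values

# Per-sample delta between two successful submissions; in practice several
# such deltas are averaged before nudging.
delta = sub_b - sub_a
nudge = 0.1  # assumed small factor, tuned against the public LB
postprocessed = np.clip(current + nudge * delta, 0.0, 1.0)
```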
2. TReNDS Neuroimaging Competition
In this competition, reading the MRI data was a bit tedious, so preprocessing the constructed features and postprocessing the predictions played a major role.
- Preprocessing: adding a bias to different columns in the test set to bring it closer to the train set. Linear models performed well in this competition, so adding biases was expected to help a lot (at least for the linear models). There are many ways to find suitable biases; we minimized the Kolmogorov-Smirnov (KS) test statistic between the train and test distributions, as sketched after this list. The KS statistic itself is explained in more detail in a dedicated notebook.
- Postprocessing: the same logic as the preprocessing, i.e. calculating the KS statistic and finding the best-fitting shift, but applied to the predictions. The effect of postprocessing was small: only about 5e-5 on both public and private scores.
- TL;DR: incremental PCA for the 3D images (sketched below), and offsets for the test features (as we did in ION).
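A minimal sketch of fitting a per-column bias by minimizing the KS statistic between train and test; the search range and the column name are assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def best_bias(train_col: np.ndarray, test_col: np.ndarray) -> float:
    """Grid-search the shift that makes the shifted test column closest
    to the train column in terms of the KS statistic."""
    candidates = np.linspace(-1.0, 1.0, 401)   # assumed search range
    stats = [ks_2samp(train_col, test_col + b).statistic for b in candidates]
    return float(candidates[int(np.argmin(stats))])

# Hypothetical usage on one feature column:
# bias = best_bias(train["IC_01"].values, test["IC_01"].values)
# test["IC_01"] += bias
```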
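Incremental PCA lets you decompose the flattened 3D maps chunk by chunk instead of loading everything into memory at once; the shapes, batch sizes, and component count below are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

n_voxels = 52 * 63 * 53                 # assumed 3D volume shape, flattened
ipca = IncrementalPCA(n_components=50)  # assumed number of components

def subject_batches(n_batches: int = 10, batch_size: int = 100):
    """Hypothetical generator yielding (batch_size, n_voxels) arrays."""
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        yield rng.random((batch_size, n_voxels), dtype=np.float32)

for batch in subject_batches():
    ipca.partial_fit(batch)             # fit the decomposition chunk by chunk

# Project a batch onto the learned components to get tabular features.
features = ipca.transform(next(subject_batches(1)))
```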
3. ALASKA2 Image Steganalysis Competition
In ALASKA2, image normalization, augmentation, and modifying the EfficientNet architecture were the most important steps for getting valid predictions.
- Image normalization: the image channel distributions differed between train and test, and applying local image normalization brought CV and LB closer; a simple version is sketched after this list.
- Augmentation: most models used only standard flips and transpose; some also added cutout, and some added tiny random noise. Using test-time augmentation over the same transforms was helpful (see the TTA sketch below).
- EfficientNet architecture: we started by fitting models only on fold 0, but then switched to fitting models "blindly" on the full data, as CV convergence was always stable for us. When unsure, we re-fitted a new model on a single fold, checked whether it overfit, and then trusted the full fits; this also meant we had to blend a bit blindly. We fitted vanilla EfficientNets, but also got a huge CV boost when changing the stride of the first layer to (1, 1), as other contestants did too (see the sketch below). This keeps the models on the full resolution for longer, which is particularly helpful because a lot of the information about the manipulation lies between neighboring pixels.
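One simple reading of "local image normalization" is per-image, per-channel standardization; this is an assumption rather than the exact scheme used.

```python
import numpy as np

def local_normalize(image: np.ndarray) -> np.ndarray:
    """Normalize an (H, W, C) image by its own per-channel mean and std."""
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True) + 1e-6
    return (image - mean) / std
```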
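Test-time augmentation over the same flips and transpose might look like this; averaging softmax outputs is my assumption.

```python
import torch

def predict_tta(model: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Average predictions over H/V flips and transpose for square images
    shaped (batch, channels, height, width)."""
    views = [
        images,
        torch.flip(images, dims=[-1]),   # horizontal flip
        torch.flip(images, dims=[-2]),   # vertical flip
        images.transpose(-1, -2),        # transpose H and W
    ]
    with torch.no_grad():
        preds = [torch.softmax(model(view), dim=1) for view in views]
    return torch.stack(preds).mean(dim=0)
```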
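The stride change itself is a one-liner on top of a timm EfficientNet; the specific variant and the number of classes are assumptions here, not the team's exact configuration.

```python
import timm

# Vanilla EfficientNet, then force the stem convolution to stride (1, 1)
# so the early layers keep the full image resolution.
model = timm.create_model("efficientnet_b2", pretrained=True, num_classes=4)
model.conv_stem.stride = (1, 1)   # default is (2, 2)
```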
4. Prostate cANcer graDe Assessment (PANDA) Competition
In this challenge, the work was more about data preprocessing, postprocessing, augmentation, the model, and the loss function.
- Data preprocessing/postprocessing: these were almost the same. The slide image is cut into tiles, but instead of using a MIL method the tiles are glued into one big image. The tile size is 128×128 and a total of 144 tiles are extracted from each slide, so gluing them yields a 1536×1536 image (see the gluing sketch after this list). Before gluing, the foreground of each tile is extracted and resized back to 128×128, much like an autoencoder.
- Augmentation: H&E colour jitter, random contrast/brightness, random saturation, image transpose, random horizontal/vertical flips, and shift (50 pixels) / rotate (10 degrees) / scale (0.05). Setting up the hyperparameters was crucial; a possible albumentations version is sketched below.
- Model: EfficientNet-B3 + GeM pooling + 0.3 dropout + a 128-unit dense layer + a 1-unit dense layer as the regression head (sketched below).
- Loss function: MSE loss on the Karolinska data and Huber loss with delta 1 on the Radboud data (see the sketch below).
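A sketch of the tile-gluing step, assuming the 144 tiles have already been selected and foreground-extracted.

```python
import numpy as np

def glue_tiles(tiles: np.ndarray, grid: int = 12, tile_size: int = 128) -> np.ndarray:
    """Glue (grid * grid) tiles of shape (tile_size, tile_size, 3) into one
    big image, e.g. 144 tiles of 128x128 into a 1536x1536 image."""
    image = np.zeros((grid * tile_size, grid * tile_size, 3), dtype=tiles.dtype)
    for idx, tile in enumerate(tiles):
        row, col = divmod(idx, grid)
        image[row * tile_size:(row + 1) * tile_size,
              col * tile_size:(col + 1) * tile_size] = tile
    return image
```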
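A possible albumentations translation of the listed augmentations; the probabilities, and the use of HueSaturationValue as a stand-in for H&E colour jitter and random saturation, are assumptions.

```python
import albumentations as A

transform = A.Compose([
    A.Transpose(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.5),
    A.HueSaturationValue(p=0.5),   # stand-in for H&E colour jitter / saturation
    A.ShiftScaleRotate(shift_limit=50 / 1536, scale_limit=0.05,
                       rotate_limit=10, p=0.5),
])
# augmented = transform(image=glued_image)["image"]
```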
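A minimal sketch of the model head, using timm for the backbone; the GeM implementation and anything beyond the layers listed above are assumptions.

```python
import timm
import torch
import torch.nn as nn

class GeM(nn.Module):
    """Generalized mean pooling over the spatial dimensions."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1) * p)
        self.eps = eps

    def forward(self, x):
        return x.clamp(min=self.eps).pow(self.p).mean(dim=(-2, -1)).pow(1.0 / self.p)

class PandaRegressor(nn.Module):
    """EfficientNet-B3 backbone + GeM + 0.3 dropout + 128-unit dense + 1-unit head."""
    def __init__(self):
        super().__init__()
        # num_classes=0 and global_pool="" keep the raw feature map from timm.
        self.backbone = timm.create_model("efficientnet_b3", pretrained=True,
                                          num_classes=0, global_pool="")
        self.pool = GeM()
        self.head = nn.Sequential(
            nn.Dropout(0.3),
            nn.Linear(self.backbone.num_features, 128),
            nn.ReLU(),
            nn.Linear(128, 1),   # regression output (ISUP grade)
        )

    def forward(self, x):
        return self.head(self.pool(self.backbone(x))).squeeze(1)
```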
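And a sketch of the provider-dependent loss; the masking scheme is an assumption, only the MSE/Huber split by provider comes from the write-up above.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
huber = nn.HuberLoss(delta=1.0)

def panda_loss(pred: torch.Tensor, target: torch.Tensor,
               is_radboud: torch.Tensor) -> torch.Tensor:
    """MSE on Karolinska slides, Huber (delta=1) on Radboud slides.

    `is_radboud` is a boolean tensor marking which provider each slide came from.
    """
    loss = torch.zeros((), device=pred.device)
    if (~is_radboud).any():
        loss = loss + mse(pred[~is_radboud], target[~is_radboud])
    if is_radboud.any():
        loss = loss + huber(pred[is_radboud], target[is_radboud])
    return loss
```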