Building Complex Image Augmentation Pipelines with Tensorflow
Author(s): Dimitre Oliveira
Using the Tensorflow data module to build a complex image augmentation pipeline.
If you want to train your models with Tensorflow in the most efficient way, you should probably use TFRecords and the Tensorflow data module to build your pipelines. Depending on the requirements and constraints of your application, using them might be a necessity rather than just an option; the good news is that Tensorflow has made both of them pretty clean and easy to use.
In this article, we will go through a simple yet efficient way of building pipelines with complex combinations of data augmentation using the Tensorflow data module.
One of the options I mentioned that can improve your models' training is to use TFRecords. TFRecord is a simple format provided by Tensorflow for storing data. I am not going into too much detail about TFRecords because they are not the focus of this article, but if you want to learn more, check out this tutorial from Tensorflow.
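As a quick illustration (a minimal sketch rather than the full tutorial; the feature names and file name below are just placeholders), writing an (image, label) pair to a TFRecord file looks roughly like this:
import tensorflow as tf

# Placeholder sample: a dummy uint8 image and an integer label.
image = tf.zeros([64, 64, 3], dtype=tf.uint8)
label = 0

def serialize_example(image_bytes, label):
    # Wrap the encoded image and the label into a tf.train.Example.
    feature = {
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

with tf.io.TFRecordWriter('train.tfrec') as writer:
    writer.write(serialize_example(tf.io.encode_jpeg(image).numpy(), label))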
The information provided here can be applied to train models with Tensorflow on any hardware. I am going to use TPUs as the target hardware because if you are using TPUs, you are probably already trying to make the most of your resources, and you would need to use the Tensorflow data module anyway.
Data augmentation with Tensorflow
We will begin by taking a look at how data augmentation is done in the official data augmentation tutorial by Tensorflow.
# Data augmentation function
def augment(image, label):
    image = tf.image.random_crop(image, size=[IMG_SIZE, IMG_SIZE, 3])
    image = tf.image.random_brightness(image, max_delta=0.5)
    image = tf.clip_by_value(image, 0, 1)
    return image, label

# Tensorflow data pipeline
train_ds = (
    train_ds
    .shuffle(1000)
    .map(augment, num_parallel_calls=AUTOTUNE)
    .batch(batch_size)
    .prefetch(AUTOTUNE)
)
As we can see in the augment function, it applies a sequence of transformations to the images: first it takes a random crop, then applies random brightness, and finally clips the values to keep them between 0 and 1.
Following Tensorflow best practices, a data augmentation function is usually applied to the data pipeline by a map operation.
The problem with the approach above is how the transformations are applied to the images: you are basically just stacking them sequentially. In general, you will need some control over what is applied and how. Let me describe a few scenarios to make my point.
Scenario 1:
Your data may benefit from advanced data augmentation techniques like Cutout, Mixup, or CutMix. If you are familiar with how they work, you know that for each sample you will probably apply only one of them.
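A minimal sketch of that idea, assuming cutout, mixup, and cutmix are helper functions you have defined elsewhere (Mixup and CutMix also need a second sample, usually handled at the batch level): draw a single random number per sample and route it to at most one of the techniques.
def one_of_advanced(image, label):
    # One uniform draw decides which (if any) advanced augmentation is applied.
    p = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    if p > .75:
        image, label = cutmix(image, label)   # hypothetical helper
    elif p > .5:
        image, label = mixup(image, label)    # hypothetical helper
    elif p > .25:
        image = cutout(image)                 # hypothetical helper
    # otherwise: leave the sample untouched
    return image, label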
Scenario 2:
You might want to use many "pixel-level" augmentations; by pixel-level I mean transformations like brightness, gamma adjustment, contrast, or saturation. Lighter variations of those transformations can usually be safely used on many different datasets, but applying all of them at once might change your images too much and end up disturbing the model training.
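For instance, a minimal sketch of giving each pixel-level transformation its own chance of being applied, so only a subset of them touches any given image (the thresholds and ranges below are arbitrary choices, not recommendations):
def pixel_level_augment(image):
    # Each transformation is applied independently with a 50% chance.
    if tf.random.uniform([], 0, 1.0) > .5:
        image = tf.image.random_brightness(image, max_delta=0.1)
    if tf.random.uniform([], 0, 1.0) > .5:
        image = tf.image.random_contrast(image, 0.8, 1.2)
    if tf.random.uniform([], 0, 1.0) > .5:
        image = tf.image.random_saturation(image, 0.7, 1.3)
    return image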
So what could be done?
If you are familiar with data augmentation for computer vision tasks, you might have heard of libraries like Imgaug or Albumentations. If not, here are two examples from the Albumentations library of how it does data augmentation:
from albumentations import (
    Compose, OneOf, RandomRotate90, Flip, Transpose, IAAAdditiveGaussianNoise,
    GaussNoise, MotionBlur, MedianBlur, Blur, OpticalDistortion, GridDistortion,
    IAAPiecewiseAffine, CLAHE, IAASharpen, IAAEmboss, RandomBrightnessContrast,
    HueSaturationValue,
)

def augment(p=0.5):
    return Compose([
        RandomRotate90(),
        Flip(),
        Transpose(),
        OneOf([
            IAAAdditiveGaussianNoise(),
            GaussNoise(),
        ], p=0.2),
        OneOf([
            MotionBlur(p=0.2),
            MedianBlur(blur_limit=3, p=0.1),
            Blur(blur_limit=3, p=0.1),
        ], p=0.2),
        OneOf([
            OpticalDistortion(p=0.3),
            GridDistortion(p=0.1),
            IAAPiecewiseAffine(p=0.3),
        ], p=0.2),
        OneOf([
            CLAHE(clip_limit=2),
            IAASharpen(),
            IAAEmboss(),
            RandomBrightnessContrast(),
        ], p=0.3),
        HueSaturationValue(p=0.3),
    ], p=p)

augmented_image = augment(p=0.9)(image=image)['image']
We can clearly see that Albumentations provides a much more flexible way of applying different transformations to images. You can apply them sequentially, like in the Tensorflow tutorial, but you can also use operations like "OneOf" to choose only one among a group of transformations to be applied, and, most importantly, you can control the probability that each transformation has of being applied.
It is worth noting that the transformations these libraries use are heavily optimized to run as fast as possible; Albumentations even has a benchmark.
The best of both worlds would be to use a library like Albumentations, which is very efficient and already implements a lot of different transformations, together with our Tensorflow data pipeline, but unfortunately that is not possible. So what can we do?
Complex data augmentations with Tensorflow
Actually, with some creativity, we can build data augmentation functions that are pretty close to the ones provided by Albumentations, using only Tensorflow code, so they can run on TPUs integrated with Tensorflow pipelines. Here is a simple example:
def augment(image):
    p_spatial = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_rotate = tf.random.uniform([], 0, 1.0, dtype=tf.float32)

    # Flips
    if p_spatial >= .2:
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_flip_up_down(image)

    # Rotates
    if p_rotate > .75:
        image = tf.image.rot90(image, k=3) # rotate 270°
    elif p_rotate > .5:
        image = tf.image.rot90(image, k=2) # rotate 180°
    elif p_rotate > .25:
        image = tf.image.rot90(image, k=1) # rotate 90°

    return image
Great! This function has all the things that we liked about Albumentations and is pure Tensorflow. Let's check:
✓ Apply transformations sequentially.
✓ "OneOf" type of transformation (grouping).
✓ Control the probability of applying a transformation.
Let's break down what is going on in this function.
First, we define two variables, p_spatial and p_rotate, and assign probabilities to them. Those probabilities are sampled from a random uniform distribution, which means that all numbers in the interval [0, 1] have the same chance of being sampled.
Then we have two different types of transformations that we want to apply, flips and rotates; they have different semantics, so they belong to different groups.
For the flips transformations, if p_spatial is greater than .2 we apply two random flip transformations; in other words, there is an 80% chance of applying those two random flips.
For the rotates transformations, we use more control. This is similar to the "OneOf" from Albumentations because we apply only one of those transformations: each of them has a 25% chance of being applied, and there is also a 25% chance of applying nothing at all. We need this kind of control here because there is no point in rotating the image 90° three times, then two more times, and so on.
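To use this function, we just map it over the dataset, as in the first example; assuming train_ds yields (image, label) pairs and AUTOTUNE and batch_size are defined as before, it looks something like:
train_ds = (
    train_ds
    .shuffle(1000)
    # Apply the augmentation only to the image, keeping the label untouched.
    .map(lambda image, label: (augment(image), label), num_parallel_calls=AUTOTUNE)
    .batch(batch_size)
    .prefetch(AUTOTUNE)
)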
Using this idea, you can build data augmentation functions that are a lot more complex than this one. Here is an example that I used for the SIIM-ISIC Melanoma Classification Kaggle competition:
def data_augment(image):
    p_rotation = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_rotate = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_cutout = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_shear = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_crop = tf.random.uniform([], 0, 1.0, dtype=tf.float32)

    if p_shear > .2:
        if p_shear > .6:
            image = transform_shear(image, config['HEIGHT'], shear=20.)
        else:
            image = transform_shear(image, config['HEIGHT'], shear=-20.)

    if p_rotation > .2:
        if p_rotation > .6:
            image = transform_rotation(image, config['HEIGHT'], rotation=45.)
        else:
            image = transform_rotation(image, config['HEIGHT'], rotation=-45.)

    if p_crop > .2:
        image = data_augment_crop(image)

    if p_rotate > .2:
        image = data_augment_rotate(image)

    image = data_augment_spatial(image)
    image = tf.image.random_saturation(image, 0.7, 1.3)
    image = tf.image.random_contrast(image, 0.8, 1.2)
    image = tf.image.random_brightness(image, 0.1)

    if p_cutout > .5:
        image = data_augment_cutout(image)

    return image

def data_augment_spatial(image):
    p_spatial = tf.random.uniform([], 0, 1.0, dtype=tf.float32)

    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)

    if p_spatial > .75:
        image = tf.image.transpose(image)

    return image

def data_augment_rotate(image):
    p_rotate = tf.random.uniform([], 0, 1.0, dtype=tf.float32)

    if p_rotate > .66:
        image = tf.image.rot90(image, k=3) # rotate 270°
    elif p_rotate > .33:
        image = tf.image.rot90(image, k=2) # rotate 180°
    else:
        image = tf.image.rot90(image, k=1) # rotate 90°

    return image

def data_augment_crop(image):
    p_crop = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    crop_size = tf.random.uniform([], int(config['HEIGHT']*.7), config['HEIGHT'], dtype=tf.int32)

    if p_crop > .5:
        image = tf.image.random_crop(image, size=[crop_size, crop_size, config['CHANNELS']])
    else:
        if p_crop > .4:
            image = tf.image.central_crop(image, central_fraction=.7)
        elif p_crop > .2:
            image = tf.image.central_crop(image, central_fraction=.8)
        else:
            image = tf.image.central_crop(image, central_fraction=.9)

    image = tf.image.resize(image, size=[config['HEIGHT'], config['WIDTH']])
    return image
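The helpers transform_shear, transform_rotation, and data_augment_cutout are defined in the complete code linked below. Just to illustrate the kind of thing data_augment_cutout does, here is a minimal pure-Tensorflow cutout sketch (not the competition implementation): it zeroes out one random square patch, building the mask with coordinate comparisons so tensor shapes stay static.
def simple_cutout(image, height, width, patch_size=50):
    # Random top-left corner for the square patch to erase.
    y = tf.random.uniform([], 0, height - patch_size, dtype=tf.int32)
    x = tf.random.uniform([], 0, width - patch_size, dtype=tf.int32)
    # Mask that is 0 inside the patch and 1 elsewhere.
    rows = tf.range(height)[:, tf.newaxis]
    cols = tf.range(width)[tf.newaxis, :]
    inside = (rows >= y) & (rows < y + patch_size) & (cols >= x) & (cols < x + patch_size)
    mask = tf.cast(~inside, image.dtype)
    return image * mask[:, :, tf.newaxis]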
I will also leave two links to complete code examples using a similar approach.
- Complete code for the example above
- Introductory notebook for advanced augmentation with Tensorflow
If you want to check out how to build a complete Tensorflow pipeline to train models on TPUs, here is a cool article that I have written: "Efficiently Using TPU for Image Classification".
To learn even more take a look at the references:
- Tensorflow TFRecords tutorial
- Tensorflow data module documentation
- Tensorflow data module tutorial
- Better performance with the tf.data API
- Tensorflow data augmentation tutorial
- Efficiently Using TPU for Image Classification
- TPU-speed data pipelines