

Building Complex Image Augmentation Pipelines with Tensorflow

Last Updated on November 3, 2020 by Editorial Team

Author(s): Dimitre Oliveira


Using the Tensorflow data module to build a complex image augmentation pipeline.

If you want to train your models with Tensorflow in the most efficient way, you should probably use TFRecords and the Tensorflow data module to build your pipelines. Depending on the requirements and constraints of your application, using them might even be a necessity rather than an option. The good news is that Tensorflow has made both of them pretty clean and easy to use.

In this article, we will go through a simple yet efficient way of building pipelines with complex combinations of data augmentation using the Tensorflow data module.

One of the options I mentioned that could improve your models’ training is to use TFRecords, a simple format provided by Tensorflow for storing data. I am not going into too many details about TFRecords because they are not the focus of this article, but if you want to learn more, check out the TFRecord tutorial from Tensorflow.
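For readers who haven't used TFRecords yet, here is a minimal sketch of a round trip: one image and its label are serialized as a `tf.train.Example`, written to a file, and parsed back with the Tensorflow data module. The feature names `'image'` and `'label'`, the PNG encoding, and the temporary file path are assumptions for illustration, not a required schema.

```python
import os
import tempfile

import tensorflow as tf

def serialize_example(image, label):
    """Encode a uint8 HxWx3 image as PNG bytes and store it next to its label."""
    feature = {
        'image': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[tf.io.encode_png(image).numpy()])),
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

def parse_example(serialized):
    """Inverse of serialize_example; runs inside Dataset.map."""
    spec = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.io.decode_png(parsed['image'], channels=3)
    return image, parsed['label']

# A random 32x32 RGB image as stand-in data
original = tf.cast(tf.random.uniform([32, 32, 3], 0, 256, dtype=tf.int32), tf.uint8)
path = os.path.join(tempfile.mkdtemp(), 'sample.tfrec')

# Write one record, then read it back through a tf.data pipeline
with tf.io.TFRecordWriter(path) as writer:
    writer.write(serialize_example(original, 7))

dataset = tf.data.TFRecordDataset(path).map(parse_example)
decoded_image, decoded_label = next(iter(dataset))
```

PNG is lossless, so the decoded image matches the original exactly; in a real pipeline you would typically write many examples per file and add the augmentation `map` right after `parse_example`.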

The information provided here can be applied to train models with Tensorflow on any hardware. I am going to use TPUs as the target hardware because if you are using TPUs, you are probably already trying to make the most of your resources, and you would need to use the Tensorflow data module anyway.

Data augmentation with Tensorflow

First, let’s take a look at how data augmentation is done in the official Tensorflow data augmentation tutorial.

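import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
IMG_SIZE = 180  # example value for the crop size

# Data augmentation function
def augment(image, label):
  image = tf.image.random_crop(image, size=[IMG_SIZE, IMG_SIZE, 3])
  image = tf.image.random_brightness(image, max_delta=0.5)
  image = tf.clip_by_value(image, 0, 1)
  return image, label
# Tensorflow data pipeline (train_ds is an existing tf.data.Dataset of (image, label) pairs)
train_ds = (
    train_ds
    .map(augment, num_parallel_calls=AUTOTUNE)
)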

As we can see in the augment function, it applies a sequence of transformations to the images: first it takes a random crop, then it applies a random brightness change, and finally it clips the values to keep them between 0 and 1.

Following Tensorflow best practices, a data augmentation function is usually applied to the data pipeline by a map operation.
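To see the `map` step in context, here is a minimal, runnable sketch of a full tf.data pipeline built around the tutorial's `augment` function. The synthetic images, `IMG_SIZE = 64` and `BATCH_SIZE = 8` are assumptions for illustration; in practice the dataset would come from your image files or TFRecords.

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
IMG_SIZE = 64    # assumed crop size
BATCH_SIZE = 8   # assumed batch size

def augment(image, label):
    # Same sequential transformations as the tutorial's function
    image = tf.image.random_crop(image, size=[IMG_SIZE, IMG_SIZE, 3])
    image = tf.image.random_brightness(image, max_delta=0.5)
    image = tf.clip_by_value(image, 0, 1)
    return image, label

# Synthetic stand-in data: 32 float images in [0, 1), larger than the crop
images = tf.random.uniform([32, 80, 80, 3])
labels = tf.range(32)

train_ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .shuffle(32)
    .map(augment, num_parallel_calls=AUTOTUNE)  # augment each sample
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)

image_batch, label_batch = next(iter(train_ds))
```

Calling `map` before `batch` means each image gets its own random crop and brightness change, and `prefetch` lets the pipeline prepare the next batch while the model trains on the current one.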

The problem with the approach above is how the transformations are applied to the images: you are basically just stacking them sequentially. Generally, you will need some control over what is applied and how. Let me describe a few scenarios to make my point.

Scenario 1:

Your data may benefit from advanced data augmentation techniques like Cutout, Mixup, or CutMix. If you are familiar with how they work, you know that for each sample you will probably apply only one of them.

Scenario 2:

You might want to use many “pixel-level” augmentations. By pixel-level I mean transformations like brightness, gamma, contrast, or saturation adjustments. Lighter variations of those transformations can usually be used safely on many different datasets, but applying all of them at once might change your images too much and end up disturbing the model training.

So what could be done?

If you are familiar with data augmentation for computer vision tasks, you might have heard of libraries like Imgaug or Albumentations. If not, here is an example of how Albumentations can do data augmentation:

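from albumentations import (
    Compose, OneOf, RandomRotate90, HorizontalFlip, GaussNoise, ISONoise,
    MotionBlur, MedianBlur, Blur, OpticalDistortion, GridDistortion,
    CLAHE, RandomBrightnessContrast,
)

def augment(p=0.5):
    # Illustrative transforms; each OneOf applies a single member, with probability p
    return Compose([
        RandomRotate90(),
        HorizontalFlip(),
        OneOf([GaussNoise(), ISONoise()], p=0.2),
        OneOf([
            MotionBlur(p=0.2),
            MedianBlur(blur_limit=3, p=0.1),
            Blur(blur_limit=3, p=0.1),
        ], p=0.2),
        OneOf([OpticalDistortion(), GridDistortion()], p=0.2),
        OneOf([CLAHE(clip_limit=2), RandomBrightnessContrast()], p=0.3),
    ], p=p)

augmented_image = augment()(image=image)['image']  # image: a NumPy uint8 array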

We can clearly see that Albumentations provides a much more flexible way of applying different transformations to images. You can apply them sequentially, like in the Tensorflow tutorial, but you can also use operations like “OneOf” to choose only one among a group of transformations, and, most importantly, you can control the probability of each transformation being applied.
It is worth noting that the transformations these libraries use are heavily optimized to run as fast as possible; Albumentations even has a benchmark.

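The best of both worlds would be to use a library like Albumentations, which is very efficient and already implements a lot of different transformations, within our Tensorflow data pipeline. Unfortunately, that is not really possible: these libraries work on NumPy arrays rather than Tensorflow tensors, so they cannot run as part of a graph-based pipeline like the ones used with TPUs. So what can we do?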

Complex data augmentations with Tensorflow

Actually, with some creativity we can build data augmentation functions that are pretty close to the ones provided by Albumentations using only Tensorflow code, so they can run on TPUs, integrated with Tensorflow pipelines. Here is a simple example:

import tensorflow as tf

def augment(image):
    # Independent random draws in [0, 1) that decide which groups are applied
    p_spatial = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_rotate = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    # Flips (80% chance of applying this group)
    if p_spatial >= .2:
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_flip_up_down(image)
    # Rotates (only one of them, 25% chance each; 25% chance of no rotation)
    if p_rotate > .75:
        image = tf.image.rot90(image, k=3) # rotate 270º
    elif p_rotate > .5:
        image = tf.image.rot90(image, k=2) # rotate 180º
    elif p_rotate > .25:
        image = tf.image.rot90(image, k=1) # rotate 90º

    return image

Great! This function has all the things we liked about Albumentations and is pure Tensorflow. Let’s check:
— [x] Apply transformations sequentially.
— [x] “OneOf” type of transformation (grouping).
— [x] Control the probability of applying a transformation.
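You might notice that `if p_spatial >= .2` compares a tensor rather than a Python number. This works because tf.data traces the function passed to `map` as a graph and uses AutoGraph to rewrite `if`/`elif` statements on tensors into `tf.cond`; when called eagerly, the comparison simply evaluates to a boolean. The sketch below applies a copy of `augment` to synthetic images and checks that flips and rotations only rearrange pixels. The images are square by assumption: `rot90` swaps height and width, so non-square inputs would come out with varying shapes.

```python
import tensorflow as tf

def augment(image):
    # Copy of the augment function above
    p_spatial = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_rotate = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    if p_spatial >= .2:
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_flip_up_down(image)
    if p_rotate > .75:
        image = tf.image.rot90(image, k=3)
    elif p_rotate > .5:
        image = tf.image.rot90(image, k=2)
    elif p_rotate > .25:
        image = tf.image.rot90(image, k=1)
    return image

# 16 square synthetic images (32x32 RGB) with values in [0, 1)
images = tf.random.uniform([16, 32, 32, 3])

# Inside map the function runs as a graph; AutoGraph turns the
# tensor-valued if/elif statements into tf.cond
augmented = list(tf.data.Dataset.from_tensor_slices(images).map(augment))

# Eager call: the comparisons evaluate to concrete booleans
single = augment(images[0])
```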

Let’s break down what is going on in this function.

First, we define two variables, p_spatial and p_rotate, and assign them probabilities sampled from a uniform distribution over [0, 1), which means that every number in that interval has the same chance of being sampled.
Then we have two different types of transformations that we want to apply, flips and rotations. They have different semantics, so they belong to different groups.
For the flip transformations, if p_spatial is at least .2 we apply two random flip transformations; in other words, there is an 80% chance of applying those two random flips, and each of them then flips the image with a 50% chance of its own.
For the rotations we need more control. This is similar to “OneOf” from Albumentations because we apply only one of those transformations: each of them has a 25% chance of being applied, and there is also a 25% chance of applying nothing at all. We need this kind of control here because there is no point in rotating the image 90° three times, then two more times, and so on, since stacked rotations just add up to another rotation.
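The if/elif chain works, but it gets verbose as groups grow. One possible way to package the “OneOf” idea into a reusable function is `tf.switch_case`, which runs one branch selected by an integer index. Below is a minimal sketch; the helper names `one_of`, `rotate_one_of` and `pixel_one_of` and the `p_none` parameter are our own, not part of Tensorflow or Albumentations. `pixel_one_of` shows how the same helper could address Scenario 2 by applying at most one light pixel-level change.

```python
import tensorflow as tf

def one_of(image, transforms, p_none=0.0):
    """Apply exactly one callable from `transforms`, each with equal probability,
    or return the image unchanged with probability `p_none` (0 <= p_none < 1).

    Hypothetical helper, not a Tensorflow API: it mimics Albumentations' OneOf
    with tf.switch_case, so it also works inside Dataset.map.
    """
    n = len(transforms)
    u = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    # Split [0, 1 - p_none) into n equal slots; anything above maps to index n
    index = tf.cast(tf.math.floor(u / ((1.0 - p_none) / n)), tf.int32)
    index = tf.minimum(index, n)
    branches = [lambda fn=fn: fn(image) for fn in transforms]
    branches.append(lambda: image)  # index n: leave the image untouched
    return tf.switch_case(index, branches)

def rotate_one_of(image):
    # Same distribution as the rotation chain above: 90, 180 or 270 degrees
    # with 25% chance each, plus a 25% chance of no rotation
    return one_of(image, [lambda x: tf.image.rot90(x, k=1),
                          lambda x: tf.image.rot90(x, k=2),
                          lambda x: tf.image.rot90(x, k=3)], p_none=0.25)

def pixel_one_of(image):
    # Scenario 2: at most one light pixel-level change per image
    return one_of(image, [lambda x: tf.image.random_brightness(x, 0.1),
                          lambda x: tf.image.random_contrast(x, 0.8, 1.2),
                          lambda x: tf.image.random_saturation(x, 0.7, 1.3)],
                  p_none=0.5)
```

In this sketch every transform in a group gets the same share of probability; weighting them differently would need another way of mapping `u` to an index, for example cumulative thresholds like the ones in the if/elif chain.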

Using this idea, you can build data augmentation functions that are a lot more complex than this one. Here is an example that I used for the SIIM-ISIC Melanoma Classification Kaggle competition:

import tensorflow as tf

# config is a global dict with 'HEIGHT', 'WIDTH' and 'CHANNELS' keys; the helpers
# transform_shear, transform_rotation and data_augment_cutout are not shown here
# (they are part of the complete code linked below).
def data_augment(image):
    p_rotation = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_rotate = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_cutout = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_shear = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    p_crop = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    if p_shear > .2:
        if p_shear > .6:
            image = transform_shear(image, config['HEIGHT'], shear=20.)
        else:
            image = transform_shear(image, config['HEIGHT'], shear=-20.)
    if p_rotation > .2:
        if p_rotation > .6:
            image = transform_rotation(image, config['HEIGHT'], rotation=45.)
        else:
            image = transform_rotation(image, config['HEIGHT'], rotation=-45.)

    if p_crop > .2:
        image = data_augment_crop(image)

    if p_rotate > .2:
        image = data_augment_rotate(image)
    image = data_augment_spatial(image)
    image = tf.image.random_saturation(image, 0.7, 1.3)
    image = tf.image.random_contrast(image, 0.8, 1.2)
    image = tf.image.random_brightness(image, 0.1)
    if p_cutout > .5:
        image = data_augment_cutout(image)
    return image

def data_augment_spatial(image):
    p_spatial = tf.random.uniform([], 0, 1.0, dtype=tf.float32)

    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    if p_spatial > .75:
        image = tf.image.transpose(image)

    return image

def data_augment_rotate(image):
    p_rotate = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    if p_rotate > .66:
        image = tf.image.rot90(image, k=3) # rotate 270º
    elif p_rotate > .33:
        image = tf.image.rot90(image, k=2) # rotate 180º
    else:
        image = tf.image.rot90(image, k=1) # rotate 90º

    return image

def data_augment_crop(image):
    p_crop = tf.random.uniform([], 0, 1.0, dtype=tf.float32)
    crop_size = tf.random.uniform([], int(config['HEIGHT']*.7), config['HEIGHT'], dtype=tf.int32)
    if p_crop > .5:
        # Random crop with a side between 70% and 100% of the height
        image = tf.image.random_crop(image, size=[crop_size, crop_size, config['CHANNELS']])
    else:
        # Otherwise, a central crop keeping 70%, 80% or 90% of each side
        if p_crop > .4:
            image = tf.image.central_crop(image, central_fraction=.7)
        elif p_crop > .2:
            image = tf.image.central_crop(image, central_fraction=.8)
        else:
            image = tf.image.central_crop(image, central_fraction=.9)
    # Resize back to the model's input size
    image = tf.image.resize(image, size=[config['HEIGHT'], config['WIDTH']])

    return image
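The `data_augment_cutout` helper called in `data_augment` is not shown in this article (it is part of the complete code linked below). To give a rough idea of what a pure-Tensorflow Cutout can look like, here is a minimal sketch under our own assumptions: a single square patch of fixed size, filled with zeros, on an image at least `size` pixels on each side. The function name `cutout` and its `size` parameter are hypothetical; this is not the author's implementation.

```python
import tensorflow as tf

def cutout(image, size=32):
    """Zero out one random square patch of side `size` in an HxWxC image.

    Hypothetical sketch; assumes H >= size and W >= size.
    """
    height = tf.shape(image)[0]
    width = tf.shape(image)[1]
    # Top-left corner of the patch, sampled so the patch fits inside the image
    y0 = tf.random.uniform([], 0, height - size + 1, dtype=tf.int32)
    x0 = tf.random.uniform([], 0, width - size + 1, dtype=tf.int32)
    rows = tf.range(height)[:, tf.newaxis]  # shape [H, 1]
    cols = tf.range(width)[tf.newaxis, :]   # shape [1, W]
    inside = (rows >= y0) & (rows < y0 + size) & (cols >= x0) & (cols < x0 + size)
    # Broadcast the [H, W] mask over channels; pixels inside the patch become 0
    mask = tf.cast(~inside, image.dtype)[:, :, tf.newaxis]
    return image * mask
```

Because it only uses tensor ops, the sketch runs inside `Dataset.map`; common variants fill the patch with noise or the mean pixel value, or cut out several patches.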

Here are two links to complete code examples that use a similar approach:
— Complete code for the example above
— Introductory notebook for advanced augmentation with Tensorflow

If you want to see how to build a complete Tensorflow pipeline to train models on TPUs, check out my article “Efficiently Using TPU for Image Classification”.

To learn even more, take a look at these references:
— Tensorflow TFRecords tutorial
— Tensorflow data module documentation
— Tensorflow data module tutorial
— Better performance with the tf.data API
— Tensorflow data augmentation tutorial
— Efficiently Using TPU for Image Classification
— TPU-speed data pipelines
