The Magic of Variational Autoencoders (VAE), Where Creativity Begins! 🎨

Last Updated on June 3, 2024 by Editorial Team

Author(s): JAIGANESAN

Originally published on Towards AI.

Photo by Google DeepMind: pexels.com

I’m assuming you’re already familiar with the basics of autoencoders, convolution, transpose convolution, and latent dimensions, and how they work. If not, I highly recommend checking out my previous article, “Autoencoder is Simple 😃”, to get up to speed. Otherwise, this article won’t make sense to you 😟.

Autoencoder is Simple 😲 !

Discover the elegance in image compression and reconstruction with autoencoders, where simplicity meets…

medium.com

In this article, we’re going to dive into the world of Variational Autoencoders (VAEs) and explore what sets them apart from traditional autoencoders. I’ll focus solely on the reparameterization trick and the loss function that make a VAE different from an autoencoder. So, let’s dive in!

In this article, I will intentionally use certain sentences and words repeatedly to ensure that my message is clear.

Image 1: Image Created by the author.

So, what’s the magic behind VAEs?

In autoencoders, the latent vector is fed directly into the decoder without any changes. But a VAE introduces a twist: a reparameterization trick that adds an element of randomness to the latent space. This subtle change forces the decoder to be more robust in reconstructing the original image, allowing it to generate new images that are similar to, or slightly different from, the input images.

Unlike autoencoders, which have deterministic layers, Variational Autoencoders (VAEs) introduce a bit of randomness into the mix. The encoder output is fed into two linear layers that compute a mean vector and a (log-)variance vector. To generate the actual latent vector, the model samples from this distribution using an additional random component called epsilon. This touch of randomness is what sets VAEs apart and makes them capable of generating new images.

Image 2 : Image created by the author

The latent vector, Z, follows a standard normal distribution, Z ~ N(0, 1), meaning it has a mean of zero and a standard deviation of one (the regularization term forces the latent space towards a normal distribution). But here’s the clever part: the mean and standard deviation are parameterized, meaning they are produced by linear layers with learnable weights (W1, W2). During training, these weights are adjusted to optimize the model.

Image 3 : Image created by the author
self.fc_mu = nn.Linear(2048*4*4, latent_dim) # Mean vector from Encoder output
self.fc_logvar = nn.Linear(2048*4*4, latent_dim) # Variance Vector from Encoder output

By introducing this probabilistic element, VAEs can generate new, diverse samples that are similar to the input data, making them incredibly powerful for tasks like image generation and data augmentation.
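
To make this concrete, here is a minimal generation sketch. It assumes a trained model object exposing the decoder_input linear layer and decoder network shown later in this article, plus an assumed latent_dim of 128; these names are illustrative, not the exact implementation.

import torch

num_samples = 16
latent_dim = 128  # assumed latent size, for illustration only

with torch.no_grad():
    z = torch.randn(num_samples, latent_dim)   # sample latent vectors from N(0, 1)
    h = model.decoder_input(z)                 # project to the decoder input size (2048*4*4)
    h = h.view(num_samples, 2048, 4, 4)        # reshape into 4 x 4 x 2048 feature maps
    new_images = model.decoder(h)              # decode into 64 x 64 x 3 images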

VAE Optimization: Unraveling the Encoder and Decoder

In a Variational Autoencoder (VAE), the encoder and decoder play crucial roles in learning and generating images.

The Encoder’s Job: Learning the Latent Space

Image 4: Created by the author

The encoder computes q_phi(z|x), which represents the probabilistic distribution of the latent variable z given the input image x. In other words, it learns to map the input images to a probabilistic distribution in the latent space.

# Encoder Input : 64 X 64 X 3 ( Input Image )
self.encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),      # output dimension: 64 X 64 X 32
    nn.ReLU(),
    nn.Conv2d(32, 128, kernel_size=3, stride=1, padding=1),    # 64 X 64 X 128
    nn.ReLU(),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),   # 32 X 32 X 256
    nn.ReLU(),
    nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1),   # 16 X 16 X 512
    nn.ReLU(),
    nn.Conv2d(512, 1024, kernel_size=4, stride=2, padding=1),  # 8 X 8 X 1024
    nn.ReLU(),
    nn.Conv2d(1024, 2048, kernel_size=4, stride=2, padding=1), # 4 X 4 X 2048
    nn.ReLU(),
)
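
In the forward pass, the encoder’s 4 x 4 x 2048 output is flattened and passed through the two linear layers shown earlier to produce the parameters of q_phi(z|x). Here is a minimal sketch; the method name encode is my own shorthand, not necessarily the exact implementation.

def encode(self, x):
    h = self.encoder(x)          # feature maps of shape (batch, 2048, 4, 4)
    h = h.view(h.size(0), -1)    # flatten to a vector of size 2048*4*4
    mu = self.fc_mu(h)           # mean vector of q_phi(z|x)
    logvar = self.fc_logvar(h)   # log-variance vector of q_phi(z|x)
    return mu, logvar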

The Decoder’s Job: Reconstructing the Input Space

Image 5: Created by the author

The decoder computes p_theta(x|z), which represents the probabilistic distribution of the input space x (reconstructed images) given the latent variable z. This means it learns to generate images from the latent space.

# Decoder Input : 4 X 4 X 2048
self.decoder = nn.Sequential(
    nn.ConvTranspose2d(2048, 1024, kernel_size=4, stride=2, padding=1), # Output dimension : 8 X 8 X 1024
    nn.ReLU(),
    nn.ConvTranspose2d(1024, 512, kernel_size=4, stride=2, padding=1),  # 16 X 16 X 512
    nn.ReLU(),
    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),   # 32 X 32 X 256
    nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),   # 64 X 64 X 128
    nn.ReLU(),
    nn.ConvTranspose2d(128, 32, kernel_size=3, stride=1, padding=1),    # 64 X 64 X 32
    nn.ReLU(),
    nn.ConvTranspose2d(32, 3, kernel_size=3, stride=1, padding=1),      # 64 X 64 X 3 (Reconstructed image)
    nn.Sigmoid()
)
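
If you want to sanity-check the shapes noted in the comments above, you can push a dummy batch through both networks. This is only a quick sketch and assumes the two nn.Sequential blocks are attributes of a model instance.

import torch

with torch.no_grad():
    dummy = torch.randn(1, 3, 64, 64)   # one fake 64 x 64 RGB image
    feats = model.encoder(dummy)        # expected shape: (1, 2048, 4, 4)
    recon = model.decoder(feats)        # expected shape: (1, 3, 64, 64)
    print(feats.shape, recon.shape)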

The Learnable Parameters: The phi and theta symbols denote the learnable parameters in the encoder and decoder networks, including weights, kernels, and biases.

The standard VAE loss:

When training a Variational Autoencoder (VAE), the loss function we need to optimize has two crucial components: the reconstruction loss and the regularization term. For the reconstruction loss, I have used Mean Squared Error (MSE), while for the regularization term, I used KL divergence.

import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    """
    Reconstruction term: MSE between the reconstructed and original images.
    Regularization term: KL divergence, which measures how far the latent
    distribution is from a standard normal distribution.
    """
    recon_loss = F.mse_loss(x_recon, x, reduction='sum')
    kl_divergence = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl_divergence

L(phi, theta, x) = Reconstruction Loss + Regularization Term

The reconstruction loss is all about measuring how well the VAE can reconstruct the input images, which gives us an idea of how similar the reconstructed images are to the originals.

D( q_phi(z|x) || p(z) )

The regularization term ( D), on the other hand, is where things get really interesting. This term is calculated using the Kullback-Leibler (KL) divergence, which measures the difference between the learned approximate posterior distribution q_phi(z|x) and the prior distribution p(z).

Why do we use the KL divergence? Because KL divergence quantifies the difference between two probability distributions.

The common choice of prior is a standard normal Gaussian:

p (z) = N( mean = 0, std = 1 )
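
For a diagonal Gaussian posterior and this standard normal prior, the KL divergence has the closed form used in the loss function above. The following sketch is purely illustrative and simply checks that closed form against torch.distributions on random values.

import torch
from torch.distributions import Normal, kl_divergence

mu = torch.randn(4, 128)       # hypothetical mean vectors
logvar = torch.randn(4, 128)   # hypothetical log-variance vectors
std = torch.exp(0.5 * logvar)

# Closed-form KL( N(mu, std^2) || N(0, 1) ), summed over all dimensions
kl_closed = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# The same quantity computed with torch.distributions
kl_lib = kl_divergence(Normal(mu, std), Normal(0.0, 1.0)).sum()

print(torch.allclose(kl_closed, kl_lib))   # True, up to numerical precision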

Think of it like a penalty term that encourages the VAE to keep its posterior distribution (the distribution of the latent vectors) close to the prior distribution, which is usually a Gaussian with zero mean and unit variance. It pushes the latent vectors to be roughly standard normal and keeps them from diverging too far, which is also why the two vectors are called the mean and variance vectors.

By doing so, the regularization term helps the VAE avoid overfitting and promotes a more robust representation of the data (images). It’s like a gentle nudge that keeps the VAE on track, ensuring it doesn’t get too caught up in fitting the training data perfectly but instead learns to generalize well to new, unseen data.

So, what’s the intuition behind using regularization and a normal prior in VAEs?

Well, we’re trying to achieve two key properties: continuity and completeness.

Continuity: We want points that are close together in the latent space to correspond to similar inputs. This means that if we move slightly in the latent space, the decoded output should change smoothly and gradually. Think of it like a continuous spectrum of images, where similar images are clustered together.

Completeness: We also want to ensure that sampling from the latent space produces meaningful and coherent content. When we decode a sample from the latent space, we want every pixel in the reconstructed image to be in the correct place, making sense of the overall picture. This means that the VAE should be able to generate a diverse range of images that are all plausible and realistic.

By using regularization with a normal prior, we’re able to enforce an information gradient in the latent space. This means that the VAE learns to represent the input data in a way that’s both continuous and complete. The normal prior helps to β€œpush” the learned representation towards a more structured and meaningful organization, making it easier to generate new samples that are similar to the training images.
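
One way to see continuity in practice is to interpolate between two latent vectors and decode every intermediate point; with a well-regularized latent space, the decoded images change smoothly. This is a rough sketch that assumes a trained model and two already-encoded latent vectors z1 and z2 of shape (1, latent_dim).

import torch

with torch.no_grad():
    for alpha in torch.linspace(0, 1, steps=8):
        z = (1 - alpha) * z1 + alpha * z2   # move gradually from z1 towards z2
        h = model.decoder_input(z)          # project to the decoder input size
        h = h.view(-1, 2048, 4, 4)          # reshape into decoder feature maps
        img = model.decoder(h)              # decoded image changes smoothly with alpha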

Is everything OK now? No, we still have a problem: the challenge of backpropagation with stochasticity (randomness).

So, how do we handle backpropagation when there’s stochasticity involved? Well, it turns out to be a major problem: we can’t backpropagate gradients through sampling layers, which are essentially layers that introduce randomness.

Backpropagation requires a complete deterministic pipeline, where each network and layer behaves in a deterministic way. This is essential for the gradient descent algorithm to work and update the model’s parameters.

However, when we introduce stochasticity through sampling layers, we break this deterministic chain. The randomness in these layers makes it impossible to compute the gradients accurately, which means we can’t apply the backpropagation algorithm as usual.
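
You can see the problem directly in PyTorch: a plain sample drawn from a distribution is detached from the computation graph, so no gradient flows back to the mean and standard deviation, while a reparameterized sample keeps the gradient path alive. A small illustrative sketch:

import torch
from torch.distributions import Normal

mu = torch.zeros(3, requires_grad=True)
std = torch.ones(3, requires_grad=True)

z_plain = Normal(mu, std).sample()      # detached: gradients cannot reach mu and std
z_reparam = Normal(mu, std).rsample()   # reparameterized as mu + std * eps: gradients flow

print(z_plain.requires_grad)    # False
print(z_reparam.requires_grad)  # True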

This is where the reparameterization trick comes in, which we’ll discuss next. It’s a clever workaround that allows us to approximate the gradients and still train the VAE using backpropagation.

Reparameterizing the Sampling Layer:

Image 6: Created by the author

So, how do we deal with the stochasticity in the sampling layer? Well, we can reparameterize it in a way that allows us to backpropagate through the network.

We can represent the sampling layer as Z ~ N(mean, std), where the latent vector is built from a fixed mean vector plus a standard deviation vector scaled by a random constant drawn from a prior distribution (in this case, a standard normal distribution).

By doing so, we can rewrite Z as Z = mean + std * epsilon, where epsilon ~ N(0, 1). This is the reparameterization trick!

def reparameterize(self, mu, logvar):
    """
    The latent vector is created from the mean, the standard deviation, and
    epsilon, where epsilon ~ N(0, 1). This helps regularize the latent space
    by pushing it towards a normal distribution.
    """
    std = torch.exp(0.5 * logvar)   # convert log-variance to standard deviation
    eps = torch.randn_like(std)     # random noise drawn from N(0, 1)
    return mu + eps * std           # reparameterized latent vector

By separating the randomness (represented by epsilon) from the deterministic nodes and layers, we can now backpropagate through the network. This allows us to update the weights and kernels using optimizers.
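
With the sampling layer reparameterized, the VAE trains end-to-end like any other network. Here is a minimal training-step sketch; model, dataloader, and the assumption that model(x) returns (x_recon, mu, logvar) are illustrative placeholders, while vae_loss is the function defined above.

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for x, _ in dataloader:                      # batches of 64 x 64 x 3 images
    x_recon, mu, logvar = model(x)           # encoder -> reparameterization -> decoder
    loss = vae_loss(x_recon, x, mu, logvar)  # reconstruction + KL regularization
    optimizer.zero_grad()
    loss.backward()                          # gradients flow through mu + eps * std
    optimizer.step()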

Variational Autoencoders (VAEs) have a wide range of applications, from image generation and data augmentation to anomaly detection, feature learning from images, image-to-image translation tasks, material discovery, and chemical synthesis planning. The list goes on! VAEs have become a pivotal tool in many areas of research and industry.

One last thing I want to mention is the transformation from the latent vector to the VAE decoder input:

Image 7: Image created by the author
Image 8: Image created by the author

Image 7 shows how, with the help of a linear layer, the latent vector is projected to the decoder input size. It is then reshaped into the decoder’s input dimensions (Image 8).

self.decoder_input = nn.Linear(latent_dim, 2048*4*4) # Latent to decoder input size

Note: The weights in the linear layers W1, W2, and W3 are learnable parameters, which means they are updated during training. For simplicity, I did not use any bias in my example.
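
Putting these last pieces together, here is a sketch of how the projection and reshape sit in the decoding step; the method name decode is my own shorthand rather than the exact implementation from the Kaggle notebook linked below.

def decode(self, z):
    h = self.decoder_input(z)     # linear projection: latent_dim -> 2048*4*4
    h = h.view(-1, 2048, 4, 4)    # reshape into the decoder's input feature maps
    return self.decoder(h)        # reconstructed 64 x 64 x 3 image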

If you have time, please check out my VAE implementation on Kaggle.

If you don’t understand linear projection, I highly recommend you read my previous article about Neural networks 👽 😃

The Randomness gives Creativity in VAE 💡 🚀

I believe I have made some sense of Variational Autoencoders (VAEs). If you found my article useful 👍, give it a 👏! Feel free to follow for more insights. If you don’t understand, take some time and read it again; it will make more sense.

Let’s also stay in touch on 🔗LinkedIn🌏❤️ to keep the conversation going!

References :

  1. https://direct.mit.edu/neco/article/34/1/1/107911/Predictive-Coding-Variational-Autoencoders-and
  2. http://introtodeeplearning.com/2019/materials/2019_6S191_L4.pdf
  3. https://theaisummer.com/latent-variable-models/

Published via Towards AI