The Magic of Variational Autoencoders (VAE), Where Creativity Begins! 🎨
Author(s): JAIGANESAN
Originally published on Towards AI.
I'm assuming you're already familiar with the basics of autoencoders, convolution, transpose convolution, and latent dimensions, and how they work. If not, I highly recommend checking out my previous article, "Autoencoder is Simple 😃", to get up to speed. Otherwise, this article won't make much sense to you 😟.
Autoencoder is Simple 😲! Discover the elegance in image compression and reconstruction with autoencoders, where simplicity meets… (medium.com)
In this article, we're going to dive into the world of Variational Autoencoders (VAEs) and explore what sets them apart from traditional autoencoders. I'll focus solely on the reparameterization trick and the loss function that make a VAE different from an autoencoder. So, let's dive in!
In this article, I will intentionally use certain sentences and words repeatedly to ensure that my message is clear.
So, what's the magic behind VAEs?
In an autoencoder, the latent vector is fed directly into the decoder without any changes. A VAE introduces a twist: the reparameterization trick adds an element of randomness to the latent space. This subtle change forces the decoder to be more robust when reconstructing the original image, allowing it to generate new images that are similar to, or slightly different from, the input images.
Unlike autoencoders, whose layers are deterministic, Variational Autoencoders (VAEs) introduce a bit of randomness into the mix. The encoder output is fed into two linear layers that produce a mean vector and a log-variance vector. To generate the actual latent vector, the model samples from this distribution using an additional random component called epsilon. This touch of randomness is what sets VAEs apart and makes them capable of generating new images.
The latent vector Z is encouraged to follow a standard normal distribution, Z ~ N(0, 1), meaning it has a mean of zero and a standard deviation of one (the regularization term pushes the latent space toward a normal distribution). But here's the clever part: the mean and standard deviation are parameterized, meaning they are produced by linear layers with learnable weights (W1, W2). During training, these weights are adjusted to optimize the model.
self.fc_mu = nn.Linear(2048*4*4, latent_dim) # Mean vector from Encoder output
self.fc_logvar = nn.Linear(2048*4*4, latent_dim) # Log-variance vector from Encoder output
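As a rough sketch of how these two layers are used (the encode helper below is my own illustration, not the author's exact code, and latent_dim is whatever dimension you pick, e.g., 128), the 4 X 4 X 2048 encoder feature map is flattened before being projected to the mean and log-variance vectors:

def encode(self, x):
    h = self.encoder(x)           # (batch, 2048, 4, 4) feature map from the encoder shown below
    h = h.view(h.size(0), -1)     # flatten to (batch, 2048*4*4)
    mu = self.fc_mu(h)            # (batch, latent_dim) mean vector
    logvar = self.fc_logvar(h)    # (batch, latent_dim) log-variance vector
    return mu, logvar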
By introducing this probabilistic element, VAEs can generate new, diverse samples that are similar to the input data, making them incredibly powerful for tasks like image generation and data augmentation.
VAE Optimization: Unraveling the Encoder and Decoder
In a Variational Autoencoder (VAE), the encoder and decoder play crucial roles in learning and generating images.
The Encoder's Job: Learning the Latent Space
The encoder computes q_phi(z|x), which represents the probabilistic distribution of the latent variable z given the input image x. In other words, it learns to map the input images to a probabilistic distribution in the latent space.
# Encoder Input : 64 X 64 X 3 ( Input Image )
self.encoder = nn.Sequential(
nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1), # output dimension: 64 X 64 X 32
nn.ReLU(),
nn.Conv2d(32, 128, kernel_size=3, stride=1, padding=1), # 64 X 64 X 128
nn.ReLU(),
nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), # 32 X 32 X 256
nn.ReLU(),
nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1), # 16 X 16 X 512
nn.ReLU(),
nn.Conv2d(512, 1024, kernel_size=4, stride=2, padding=1), # 8 X 8 X 1024
nn.ReLU(),
nn.Conv2d(1024, 2048, kernel_size=4, stride=2, padding=1), # 4 X 4 X 2048
nn.ReLU(),
)
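If you want to sanity-check the shape comments above, a quick illustrative forward pass with a dummy batch should end in the 4 X 4 X 2048 feature map (model here is my placeholder for an instance of the VAE class that holds this encoder):

import torch

x = torch.randn(1, 3, 64, 64)   # one dummy 64 X 64 RGB image
h = model.encoder(x)            # pass it through the encoder
print(h.shape)                  # expected: torch.Size([1, 2048, 4, 4])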
The Decoder's Job: Reconstructing the Input Space
The decoder computes p_theta(x|z), which represents the probabilistic distribution of the input space x (reconstructed images) given the latent variable z. This means it learns to generate images from the latent space.
# Decoder Input : 4 X 4 X 2048
self.decoder = nn.Sequential(
nn.ConvTranspose2d(2048, 1024, kernel_size=4, stride=2, padding=1), # Output dimension : 8 X 8 X 1024
nn.ReLU(),
nn.ConvTranspose2d(1024, 512, kernel_size=4, stride=2, padding=1), # 16 X 16 X 512
nn.ReLU(),
nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1), # 32 X 32 X 256
nn.ReLU(),
nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1), # 64 X 64 X 128
nn.ReLU(),
nn.ConvTranspose2d(128, 32, kernel_size=3, stride=1, padding=1), # 64 X 64 X 32
nn.ReLU(),
nn.ConvTranspose2d(32, 3, kernel_size=3, stride=1, padding=1), # 64 X 64 X 3 (Reconstructed image)
nn.Sigmoid()
)
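Similarly, feeding a dummy 4 X 4 X 2048 tensor through the decoder should give back a 64 X 64 RGB image (again, model is a placeholder instance of the VAE class):

import torch

h = torch.randn(1, 2048, 4, 4)   # dummy decoder input
x_recon = model.decoder(h)       # pass it through the decoder
print(x_recon.shape)             # expected: torch.Size([1, 3, 64, 64])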
The Learnable Parameters: The phi and theta symbols denote the learnable parameters in the encoder and decoder networks, including weights, kernels, and biases.
The Standard VAE Loss:
When training a Variational Autoencoder (VAE), the loss function we need to optimize has two crucial components: the reconstruction loss and the regularization term. For the reconstruction loss, I have used Mean Squared Error (MSE), while for the regularization term, I used KL divergence.
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    """
    Reconstruction term: MSE between the reconstructed and the original images.
    Regularization term: KL divergence, which measures how far the learned
    latent distribution is from a standard normal distribution.
    """
    recon_loss = F.mse_loss(x_recon, x, reduction='sum')
    kl_divergence = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl_divergence
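To see where this loss sits in training, here is a hedged sketch of a single training step. The names VAE and dataloader, and the forward signature (returning the reconstruction along with mu and logvar), are my assumptions, not the author's exact code:

model = VAE(latent_dim=128)                  # hypothetical instantiation
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for x, _ in dataloader:                      # batches of 64 X 64 RGB images
    x_recon, mu, logvar = model(x)           # encode -> reparameterize -> decode
    loss = vae_loss(x_recon, x, mu, logvar)  # reconstruction + regularization
    optimizer.zero_grad()
    loss.backward()                          # works thanks to the reparameterization trick (below)
    optimizer.step()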
L(phi, theta, x) = Reconstruction Loss + Regularization Term
The reconstruction loss is all about measuring how well the VAE can reconstruct the input images, which gives us an idea of how similar the reconstructed images are to the originals.
D ( q_phi (z | x) || p(z) )
The regularization term (D), on the other hand, is where things get really interesting. This term is calculated using the Kullback-Leibler (KL) divergence, which measures the difference between the learned approximate posterior distribution q_phi(z|x) and the prior distribution p(z).
Why do we use the KL divergence? Because KL divergence quantifies the difference between two probability distributions.
The Common Choice of Prior is a Standard Normal Gaussian:
p(z) = N(mean = 0, std = 1)
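For a Gaussian posterior N(mean, std^2) and this standard normal prior, the KL divergence has a simple closed form, and it is exactly what the kl_divergence line in the loss function above computes (summing over the latent dimensions):

KL( N(mean, std^2) || N(0, 1) ) = -0.5 * sum( 1 + log(std^2) - mean^2 - std^2 )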
Think of it like a penalty term that encourages the VAE to keep its posterior distribution (the distribution of the latent vectors) close to the prior distribution, which is usually a Gaussian with zero mean and unit variance. In other words, it enforces the latent vectors to stay roughly standard normal and not diverge too far. This is also why the two vectors get their names, mean and variance.
By doing so, the regularization term helps the VAE avoid overfitting and promotes a more robust representation of the data (images). It's like a gentle nudge that keeps the VAE on track, ensuring it doesn't get too caught up in fitting the training data perfectly but instead learns to generalize well to new, unseen data.
So, what's the intuition behind using regularization and a normal prior in VAEs?
Well, we're trying to achieve two key properties: continuity and completeness.
Continuity: We want points that are close together in the latent space to correspond to similar content once decoded. This means that if we move slightly in the latent space, the decoded output should change smoothly and gradually. Think of it like a continuous spectrum of images, where similar images are clustered together.
Completeness: We also want to ensure that sampling from the latent space produces meaningful and coherent content. When we decode a sample from the latent space, we want every pixel in the reconstructed image to be in the right place, so the overall picture makes sense. This means the VAE should be able to generate a diverse range of images that are all plausible and realistic.
By using regularization with a normal prior, we're able to enforce an information gradient in the latent space. This means the VAE learns to represent the input data in a way that is both continuous and complete. The normal prior helps to "push" the learned representation toward a more structured and meaningful organization, making it easier to generate new samples that are similar to the training images.
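As a hedged illustration of these two properties (model, latent_dim, and the decode helper are my placeholders; a sketch of such a decode step appears near the end of this article), completeness means a random draw from N(0, 1) should decode to a plausible image, and continuity means interpolating between two latent vectors should give a smooth transition between images:

import torch

# Completeness: any sample from the standard normal prior decodes to a plausible image.
z = torch.randn(1, latent_dim)
new_image = model.decode(z)                  # (1, 3, 64, 64)

# Continuity: points between two latent vectors decode to a smooth blend of images.
z1, z2 = torch.randn(1, latent_dim), torch.randn(1, latent_dim)
frames = [model.decode((1 - a) * z1 + a * z2) for a in torch.linspace(0, 1, steps=5)]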
Is everything OK now? No, we still have a problem: the challenge of backpropagation with stochasticity (randomness).
So, how do we handle backpropagation when there's stochasticity involved? Well, it turns out that's a major problem: we can't backpropagate gradients through sampling layers, which are essentially layers that introduce randomness.
Backpropagation requires a fully deterministic pipeline, where every network and layer behaves deterministically. This is essential for the gradient descent algorithm to work and update the model's parameters.
However, when we introduce stochasticity through sampling layers, we break this deterministic chain. The randomness in these layers means we cannot compute gradients through them, so we can't apply the backpropagation algorithm as usual.
This is where the reparameterization trick comes in, which we'll discuss next. It's a clever workaround that lets gradients flow through the sampling step so we can still train the VAE using backpropagation.
Reparameterizing the Sampling Layer:
So, how do we deal with the stochasticity in the sampling layer? Well, we can reparameterize it in a way that allows us to backpropagate through the network.
We can represent the sampling layer as Z ~ N(mean, std), where the mean is a fixed, deterministic vector and the standard deviation vector is scaled by a random constant drawn from a prior distribution (in this case, a standard normal distribution).
By doing so, we can rewrite Z as Z = mean + std * epsilon, where epsilon ~ N(0, 1). This is the reparameterization trick!
def reparameterize(self, mu, logvar):
    """
    Build the latent vector from the mean, the standard deviation, and epsilon.
    Epsilon ~ N(0, 1) carries all the randomness, while mu and std stay
    deterministic; this helps keep the latent space close to a normal
    distribution and lets gradients flow through the sampling step.
    """
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std
By separating the randomness (represented by epsilon) from the deterministic nodes and layers, we can now backpropagate through the network. This allows us to update the weights and kernels using an optimizer.
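A quick, purely illustrative way to convince yourself of this is to run the reparameterization step on toy tensors and check that gradients reach mu and logvar even though epsilon is random:

import torch

mu = torch.zeros(4, requires_grad=True)      # toy mean vector
logvar = torch.zeros(4, requires_grad=True)  # toy log-variance vector

std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)                  # all the randomness lives here
z = mu + eps * std                           # reparameterized latent vector

z.sum().backward()                           # backpropagate through the sampling step
print(mu.grad, logvar.grad)                  # both gradients are populated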
Variational Autoencoders (VAEs) have a wide range of applications, from image generation and data augmentation to anomaly detection, feature learning from images, image-to-image translation, material discovery, and chemical synthesis planning. The list goes on! VAEs have become a pivotal tool in many areas of research and industry.
One last thing I want to mention is the transformation from the latent vector to the VAE decoder input: with the help of a linear layer, the latent vector is projected to the decoder input size and then reshaped into the decoder input dimensions (4 X 4 X 2048).
self.decoder_input = nn.Linear(latent_dim, 2048*4*4) # Latent to decoder input size
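Putting the pieces together, a minimal sketch of the decode step could look like this (the decode method name is mine; the reshape follows the 4 X 4 X 2048 decoder input described above):

def decode(self, z):
    h = self.decoder_input(z)       # project latent vector to (batch, 2048*4*4)
    h = h.view(-1, 2048, 4, 4)      # reshape into the decoder input dimensions
    return self.decoder(h)          # (batch, 3, 64, 64) reconstructed image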
Note: The weights in the linear layers W1, W2, and W3 are learnable parameters, which means they are updated during training. For simplicity, I did not use any bias in my example.
If you have time, please check out my VAE implementation on Kaggle.
If you don't understand linear projection, I highly recommend you read my previous article about Neural networks 👽 😃
The Randomness gives Creativity in VAE 💡 🚀
I believe I have made some sense of the Variational Autoencoder (VAE). If you found my article useful 👍, give it a 👏! Feel free to follow for more insights. If something doesn't click, take some time and read it again; it will start to make sense.
Let's also stay in touch on 🔗 LinkedIn 🌏 ❤️ to keep the conversation going!