Diffusion Models — my “second?” artist.

Last Updated on July 17, 2023 by Editorial Team

Author(s): Albert Nguyen

Originally published on Towards AI.

Diffusion models are among the most popular algorithms in deep learning. They are widely used in many applications, such as image generation, object detection, and text-to-image generation. In this article, I will explain how diffusion models work, following the paper Denoising Diffusion Probabilistic Models (DDPM).

The main idea of the algorithm is to combine two stochastic processes:

Forward process (Diffusion process)
Reverse process

The forward process is a fixed Markov chain, while the reverse process is a learned Markov chain whose transitions are parameterized by a neural network, typically a U-Net, that generates the images. The following sections go into detail about the two processes.

The two processes in detail

If you are not familiar with stochastic processes, the following may give you a headache. I will do my best to explain it intuitively.

The forward process

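The forward process is a fixed Markov chain with Gaussian transitions, governed by a variance schedule β_1, …, β_T. At each time step t, the process scales the current image by √(1 − β_t) and adds Gaussian noise with variance β_t:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t \mathbf{I}\right)$$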
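To make the forward process concrete, here is a minimal NumPy sketch that runs the Markov chain step by step, sampling x_t = √(1 − β_t)·x_{t−1} + √β_t·ε with ε ~ N(0, I). It assumes the linear schedule from β_1 = 10⁻⁴ to β_T = 0.02 with T = 1000 used in the DDPM paper; the helper names (`linear_beta_schedule`, `forward_step`) and the toy 32×32 "image" of ones are my own choices for illustration.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Variances beta_1..beta_T increasing linearly (assumed DDPM-style schedule)."""
    return np.linspace(beta_start, beta_end, T)

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
betas = linear_beta_schedule()
x = np.ones((32, 32))            # toy "image" where every pixel equals 1
trajectory = [x]
for beta_t in betas:             # recursively apply the fixed Markov chain
    x = forward_step(x, beta_t, rng)
    trajectory.append(x)

# After T steps the original signal is essentially gone: x_T looks like N(0, I) noise.
print(trajectory[0].mean(), trajectory[-1].mean(), trajectory[-1].std())
```

Each step shrinks the old image a little and mixes in a little fresh noise, which is why the pixel mean drifts from 1 toward 0 while the spread grows toward 1.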

Intuitive explanation

Say each image is a data point; then your entire dataset forms a distribution. The transition q(x_t | x_{t-1}) tells us the distribution of the next state x_t given the current state x_{t-1}, so the forward process can be run by repeatedly sampling from this transition. Mathematically, there is also a closed form that jumps straight from x_0 to any x_t:

$$q(x_t \mid x_0) = \mathcal{N}\left(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1-\bar{\alpha}_t)\mathbf{I}\right)$$

where α_t := 1 − β_t and ᾱ_t := ∏_{s=1}^{t} α_s is the cumulative product of α_1 through α_t. This means we can sample x_t directly from x_0 given the variance schedule, without simulating the intermediate steps. Because every α_t < 1, ᾱ_t approaches 0 when T is large enough, so x_T is approximately distributed as a standard multivariate Gaussian.
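The closed form can be checked numerically. This sketch, again assuming the linear DDPM schedule (β from 10⁻⁴ to 0.02, T = 1000), samples x_t in one shot and compares the empirical mean and variance against √ᾱ_t·x_0 and 1 − ᾱ_t. The function name `q_sample` and the use of many copies of a single pixel value are illustrative assumptions; note that the array index `t` is 0-based, so index 99 corresponds to x_100.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # assumed linear schedule (DDPM paper values)
alphas = 1.0 - betas                   # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)        # alpha_bar_t = alpha_1 * ... * alpha_t

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) directly; t is a 0-based index into the schedule."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

rng = np.random.default_rng(0)
x0 = np.full(100_000, 2.0)             # many copies of one pixel value, to measure statistics
x_mid = q_sample(x0, 99, rng)          # jump straight to x_100 without 100 separate steps

print("expected mean", np.sqrt(alpha_bars[99]) * 2.0, "got", x_mid.mean())
print("expected var ", 1.0 - alpha_bars[99], "got", x_mid.var())
print("alpha_bar_T  ", alpha_bars[-1])  # tiny, so x_T is essentially N(0, I)
```

This one-shot sampling is what makes training cheap: we never need to simulate the chain step by step to get a noisy x_t.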

The reverse process

This process is our deep learning model: you sample some random noise and get an image. BUT how does it work? As I mentioned above, we can treat our dataset as a distribution. Hence, if we find some way to sample a data point from it, we'll get a real-looking image.

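The task for our model, then, is to learn to sample from the data distribution by reversing the forward process. Specifically, during generation the model's input x_T is drawn from a standard Gaussian, and the model reverses the process step by step with learned Gaussian transitions:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$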
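Here is a minimal sketch of the generation loop. It follows the sampling procedure in the DDPM paper, where the network predicts the added noise ε_θ(x_t, t), the mean is μ_θ = (x_t − β_t/√(1 − ᾱ_t)·ε_θ)/√α_t, and the variance is fixed to σ_t² = β_t. The schedule is the same assumed linear one as before, and `eps_model` is a hypothetical placeholder that always returns zeros; in practice it would be the trained U-Net, so the output here is just noise, not a face.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x_t, t):
    """Hypothetical placeholder for the trained noise predictor eps_theta(x_t, t)."""
    return np.zeros_like(x_t)

def p_sample_loop(shape, rng):
    """Start from x_T ~ N(0, I) and apply p_theta(x_{t-1} | x_t) T times."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):                # t = T-1, ..., 0 (0-based indices)
        eps = eps_model(x, t)
        # Mean of p_theta(x_{t-1} | x_t) under the noise-prediction parameterization
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)  # sigma_t^2 = beta_t
        else:
            x = mean                            # no noise is added at the last step
    return x

rng = np.random.default_rng(0)
sample = p_sample_loop((3, 32, 32), rng)        # one RGB "image"
print(sample.shape)  # (3, 32, 32)
```

Generation costs T network evaluations, one per step, which is why sampling from diffusion models is slow compared with a single forward pass of a GAN generator.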

Training

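Given an input image x_0, the forward process samples a noisy version x_t at a random time step t.
The sampled x_t is then fed into the model, which tries to predict the noise that was added to the image (in the DDPM parameterization; from that prediction the clean image can be recovered).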

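The loss is then computed by:

$$L = \mathbb{E}_q\left[ D_{KL}\left(q(x_T \mid x_0) \,\|\, p(x_T)\right) + \sum_{t>1} D_{KL}\left(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\right) - \log p_\theta(x_0 \mid x_1) \right]$$

This can be read as a sum of KL divergences between the transitions of the two processes: each term compares the forward process's posterior q(x_{t-1} | x_t, x_0) with the model's learned transition p_θ(x_{t-1} | x_t). In practice, this formula is simplified to:

$$L_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon}\left[\left\| \epsilon - \epsilon_\theta\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, t\right) \right\|^2\right]$$

where ε ~ N(0, I) is the noise added to x_0 and ε_θ is the network's prediction of it.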

It takes quite a lot of math and some reparameterization to get from the first formula to the second, so I leave the derivation to those interested in reading the paper.
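The training steps above boil down to a few lines. This sketch computes one Monte Carlo estimate of the simplified loss for a batch: pick a random time step per image, add noise with the closed-form forward process, and measure the squared error between the true and predicted noise. The schedule, the random stand-in "images" scaled to [-1, 1], and the all-zero `eps_model` are assumptions for illustration; a real setup would use a U-Net and let a framework such as PyTorch backpropagate through this loss. It averages over pixels instead of summing, a common scaling that only changes the loss by a constant factor.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)           # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)

def eps_model(x_t, t):
    """Hypothetical placeholder for the U-Net eps_theta(x_t, t); always predicts zero noise."""
    return np.zeros_like(x_t)

def simple_loss(x0_batch, rng):
    """One Monte Carlo estimate of L_simple = E ||eps - eps_theta(x_t, t)||^2 (mean over pixels)."""
    batch = x0_batch.shape[0]
    t = rng.integers(0, T, size=batch)                               # random time step per image
    eps = rng.standard_normal(x0_batch.shape)                        # noise the model must predict
    ab = alpha_bars[t].reshape(batch, *([1] * (x0_batch.ndim - 1)))  # broadcast over pixels
    x_t = np.sqrt(ab) * x0_batch + np.sqrt(1.0 - ab) * eps           # closed-form forward process
    pred = eps_model(x_t, t)
    return np.mean((eps - pred) ** 2)

rng = np.random.default_rng(0)
x0_batch = rng.uniform(-1.0, 1.0, size=(16, 3, 32, 32))  # stand-in for images scaled to [-1, 1]
loss = simple_loss(x0_batch, rng)
print(loss)  # close to 1.0 for the all-zero predictor, since E[eps^2] = 1
```

A model that has learned nothing scores about 1.0 here; training pushes the loss below that by making ε_θ match the injected noise.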

Why KL Divergence?

KL divergence measures how far one distribution is from another (strictly speaking it is not a distance, since it is not symmetric). What we want is for the distribution of generated images to be similar to the distribution of images in the dataset. By minimizing the KL divergence, the distribution of generated images is pushed toward the real distribution.
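Because the transitions in both processes are Gaussian, the KL terms in the loss can be computed in closed form. Here is a small sketch for the one-dimensional case using the standard formula KL(N(μ_p, σ_p²) ‖ N(μ_q, σ_q²)) = log(σ_q/σ_p) + (σ_p² + (μ_p − μ_q)²)/(2σ_q²) − 1/2; the function name `kl_gaussian` is my own. It also shows that the KL is zero for identical distributions and changes when the two arguments are swapped.

```python
import math

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, sigma_p^2) and Q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)

print(kl_gaussian(0.0, 1.0, 0.0, 1.0))  # identical distributions -> 0.0
print(kl_gaussian(0.0, 1.0, 1.0, 2.0))  # ≈ 0.4431
print(kl_gaussian(1.0, 2.0, 0.0, 1.0))  # ≈ 1.3069, swapping the arguments changes the value
```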

Pondering: KL divergence is not the only way to measure how far apart two distributions are. Indeed, when training GANs for a similar task, we sometimes use the “earth mover's distance” (Wasserstein loss). So why is KL divergence chosen here?

Train a model to generate a celeb face.

I used the excellent lucidrains/denoising-diffusion-pytorch implementation to train a model on the CelebA dataset on Kaggle; the code is in my Kaggle notebook.

THANK YOU

This is my first article sharing the knowledge I gained during my internship. I hope it helps others. If you found this helpful but think something is missing, please share your thoughts in the comments. I would very much appreciate it.
