GAN: Is Diffusion All You Need?

Last Updated on June 21, 2022 by Editorial Team. Author(s): Kevin Berlemont, PhD. Originally published on Towards AI.

Illustration of a diffusion process on a cat picture (adapted from [4]).

Eight years ago, one of the most promising approaches to generative modeling emerged: Generative Adversarial Networks (GANs) [1]. Since then, considerable progress has been made in both techniques and results. These models went from generating blurry faces to generating realistic high-definition pictures under various constraints.

Samples generated from the model in amazon-research/gan-control ("This package provides a PyTorch implementation of 'GAN-Control: Explicitly Controllable GANs', ICCV 2021", github.com) [2].

Although these improvements are impressive, such models are still not ready for widespread public use. Indeed, to be successful, generative models should satisfy the generative learning trilemma [4]. They need to generate high-quality samples, for example when applied to image synthesis. They need good mode coverage and sample diversity, as this reduces the negative social impacts of such models. And they need fast, inexpensive sampling, for applications such as real-time image editing or real-time synthesis.

I will first give an overview of what a GAN is, along with the advantages and drawbacks of such models. In the second part, I will explain a new trend in deep generative models: diffusion models. Finally, I will highlight very recent research results that propose a new approach to combining GANs with diffusion models.

What is a GAN?

The goal of a GAN is to generate new, unseen data that resembles a specific dataset.
It does so by trying to learn a model of the true, unknown underlying data distribution from the samples. In other words, these networks are implicit models that try to learn a specific statistical distribution [6].

What was innovative about GANs was the way they learn to achieve this goal: they learn an implicit model through a two-player game. The structure is the following: a discriminator that learns to distinguish between real and generated data, and a generator that learns to generate data that fools the discriminator.

Schematic of a GAN with the generator that feeds input to the discriminator.

In other words, the generator has to produce high-resolution images realistic enough to fool the discriminator, which acts like a teacher network. One big difference from autoencoder models is that the generator is not trained with an explicit distribution as its output.

The loss function of the model can be decomposed into two terms: one that quantifies whether the discriminator correctly predicts that real data is real, and one that quantifies whether it correctly predicts that generated data is generated. This quantity is then minimized by the generator under the best possible discriminator, which gives the minimax objective of [1]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

where D(x) is the probability the discriminator assigns to x being real, and G(z) is a sample generated from noise z drawn from p_z.

Thus, GANs can be seen as distance-minimization models: if the discriminator is optimal, training the generator amounts to minimizing a divergence between the true and the generated distribution. In practice, different divergences can be used, giving rise to different ways of training GANs.

However, while the loss function of a GAN is easy to define, its learning dynamics are hard to follow because they consist of a trade-off between the generator and the discriminator. In addition, there is no guarantee that learning converges. As a result, GAN models are challenging to train, and issues such as vanishing gradients and mode collapse (when the generated samples lack diversity) are common.
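To make the divergence view concrete, here is a minimal sketch on discrete distributions in plain Python (no neural networks involved). It evaluates the two-term GAN value V(D, G) of [1] and checks the known result from that paper: the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), and plugging it in gives V(D*, G) = -log 4 + 2 · JSD(p_data ‖ p_g), where JSD is the Jensen-Shannon divergence. The toy distributions and function names are illustrative assumptions, not part of the original article.

```python
import math


def value_function(p_data, p_gen, disc):
    """GAN value V(D, G) for discrete distributions given as lists of probabilities.

    First term: log-probability that the discriminator labels real data as real.
    Second term: log-probability that it labels generated data as generated.
    """
    real_term = sum(p * math.log(d) for p, d in zip(p_data, disc) if p > 0)
    fake_term = sum(q * math.log(1.0 - d) for q, d in zip(p_gen, disc) if q > 0)
    return real_term + fake_term


def optimal_discriminator(p_data, p_gen):
    # D*(x) = p_data(x) / (p_data(x) + p_gen(x)); the value where both are 0 is irrelevant.
    return [p / (p + q) if p + q > 0 else 0.5 for p, q in zip(p_data, p_gen)]


def kl(p, m):
    # Kullback-Leibler divergence KL(p || m), skipping zero-probability outcomes of p.
    return sum(pi * math.log(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def jensen_shannon(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


p_data = [0.5, 0.3, 0.2, 0.0]  # toy "real" distribution over four outcomes
p_gen = [0.1, 0.2, 0.3, 0.4]   # toy generator distribution
d_star = optimal_discriminator(p_data, p_gen)

v_star = value_function(p_data, p_gen, d_star)
identity = -math.log(4) + 2 * jensen_shannon(p_data, p_gen)
print(v_star, identity)  # the two numbers agree up to floating-point rounding
```

When the generator matches the data exactly, D* is 0.5 everywhere and V reaches its minimum of -log 4, which is why minimizing V under an optimal discriminator is equivalent to minimizing the Jensen-Shannon divergence.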
Diffusion Models

Diffusion models have been designed with the goal of solving the training-convergence issues of GANs. The idea behind these models is that a diffusion process amounts to a loss of information due to the gradual addition of noise (Gaussian noise is added at every timestep of the diffusion process). The goal of such a model is to learn the impact of noise on the information available in the sample, in other words, how much the diffusion process reduces the available information. If a model can learn this, it should be able to reverse the loss of information and retrieve the original sample.

A denoising diffusion model does exactly this. It consists of two processes: a forward and a reverse diffusion process. In the forward diffusion process, Gaussian noise is added successively until the data is pure noise [7]. For the reverse diffusion process, a neural network is trained to learn the conditional probability distributions that reverse the noise, step by step.

Schematic of a diffusion model with the forward and reverse diffusion process.

These types of networks have been successful at generative modeling, as can be seen in the following figure (model from [3]):

Samples generated with the prompt “a dragon on oil canvas” with the diffusion model from CompVis/latent-diffusion: High-Resolution Image Synthesis with Latent Diffusion Models (github.com).

However, to generate a sample, the reverse diffusion chain has to be traversed over a large number of steps, each requiring a pass through the network. This is computationally expensive and makes sampling from these models slow. Despite their ability to generate high-quality, realistic images, their real-world adoption is thus limited.

Diffusion-GAN

Various methods have been suggested to reduce the sampling cost of diffusion-based models.
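The forward and reverse processes described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the exact method of [3] or [7]: it uses the DDPM-style formulation (Ho et al., 2020) with a closed-form forward process and an ancestral sampler, a linear noise schedule chosen for illustration, and, instead of a trained neural network, a hypothetical "oracle" noise predictor that happens to be exact for one-dimensional Gaussian data. All names (DATA_MEAN, DATA_STD, oracle_eps, q_sample, p_sample_loop) are illustrative. The thing to notice is the loop in p_sample_loop: one batch of samples needs T = 1000 sequential denoiser evaluations.

```python
import math
import random

# Assumption: linear beta schedule from 1e-4 to 0.02 over T = 1000 steps (DDPM-style);
# the article itself does not specify a schedule.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []  # alpha_bar_t = product of alphas up to step t
running = 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)


def q_sample(x0, t, rng):
    """Forward process in closed form: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1.0 - alpha_bars[t]) * eps


# Hypothetical stand-in for the trained network: for 1-D Gaussian data N(m, s^2),
# the ideal noise prediction E[eps | x_t] has a closed form, so no training is needed.
DATA_MEAN, DATA_STD = 2.0, 0.5


def oracle_eps(x_t, t):
    ab = alpha_bars[t]
    var_t = ab * DATA_STD ** 2 + (1.0 - ab)  # variance of x_t under the forward process
    return math.sqrt(1.0 - ab) * (x_t - math.sqrt(ab) * DATA_MEAN) / var_t


def p_sample_loop(n, rng, eps_model=oracle_eps):
    """Reverse process (ancestral sampling): start from pure noise, apply T denoising steps."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    steps = 0
    for t in reversed(range(T)):
        steps += 1  # one denoiser evaluation per step if the n samples form one batch
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise added on the final step
        xs = [
            (x - coef * eps_model(x, t)) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
            for x in xs
        ]
    return xs, steps


rng = random.Random(0)
noised = [q_sample(3.0, T - 1, rng) for _ in range(2000)]  # x0 = 3.0 is almost fully erased
samples, steps = p_sample_loop(1000, rng)
mean = sum(samples) / len(samples)
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))
print(f"abar_T = {alpha_bars[-1]:.1e}, steps = {steps}, mean = {mean:.3f}, std = {std:.3f}")
```

With the oracle, the sample mean and standard deviation should land close to DATA_MEAN and DATA_STD, but only after walking the full chain; replacing oracle_eps with a trained network keeps the same sequential loop, which is where the sampling cost comes from.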
One of the most intuitive ways to reduce this cost is to reduce the number of denoising steps in the reverse process of diffusion models [4]. Generative diffusion models typically assume that the denoising distribution can be modeled by a Gaussian distribution. The issue is that this assumption holds only for small denoising steps, which forces the generative process to use a huge number of denoising steps and makes it impractical.

The most recent approach combines deep generative models (GANs) and diffusion models in the following way. Instead of minimizing the divergence between real and diffused data at the end of the process, it minimizes the divergence between the diffused real data …
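Why the Gaussian denoising assumption only holds for small steps can be illustrated with a hypothetical toy setup (not taken from [4]): data that takes the values -1 and +1 with equal probability, noised with a variance-preserving forward process x_t = sqrt(ᾱ_t) x0 + sqrt(1 - ᾱ_t) ε. For this toy, the exact denoising distribution q(x_s | x_t), with s < t, is a mixture of two Gaussians with equal variance, and such a mixture with equal weights is bimodal only when its means are more than two standard deviations apart. The sketch compares one small step near the noisy end of the chain with one large jump from almost pure noise to almost clean data; the ᾱ values and function names are arbitrary illustrative choices.

```python
import math


def denoising_mixture(abar_s, abar_t, x_t=0.0):
    """Exact q(x_s | x_t) for toy data x0 in {-1, +1} with equal prior weights.

    Returns (weight of the x0 = +1 component, mean_plus, mean_minus, shared variance).
    """
    alpha = abar_t / abar_s  # transition coefficient from step s to step t
    # Posterior variance of x_s given (x_t, x0); identical for both components.
    var = (1 - abar_s) * (1 - alpha) / (1 - abar_t)

    def mean(x0):
        return var * (math.sqrt(abar_s) * x0 / (1 - abar_s)
                      + math.sqrt(alpha) * x_t / (1 - alpha))

    # P(x0 = +1 | x_t), using x_t | x0 ~ N(sqrt(abar_t) * x0, 1 - abar_t).
    logit = 2 * math.sqrt(abar_t) * x_t / (1 - abar_t)
    w_plus = 1 / (1 + math.exp(-logit))
    return w_plus, mean(1.0), mean(-1.0), var


def separation(abar_s, abar_t):
    """Distance between the two component means, in units of the component std."""
    _, m_plus, m_minus, var = denoising_mixture(abar_s, abar_t)
    return abs(m_plus - m_minus) / math.sqrt(var)


small = separation(abar_s=0.0101, abar_t=0.01)  # one small step near the noise end
large = separation(abar_s=0.99, abar_t=0.01)    # one large jump from noise to almost clean data
print(small)  # ≈ 0.02: the two components overlap, close to a single Gaussian
print(large)  # ≈ 19.9: two well-separated modes, far from Gaussian
```

A single Gaussian fits the small step well but cannot represent the large jump, which is the motivation for replacing the Gaussian denoising assumption when the number of steps is reduced.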