Diffusers: Python Library for AI-Generated Images
Last Updated on July 17, 2023 by Editorial Team
Author(s): Muhammad Arham
Originally published on Towards AI.
This article shows the basic usage of Hugging Face's Diffusers library, which can be used to generate images with AI through code.
Introduction
The Diffusers library, maintained by Hugging Face, is a go-to library for generative AI. It provides multiple Stable Diffusion pipelines for images and audio, along with several other useful utilities.
AUTOMATIC1111's Stable Diffusion WebUI, available on GitHub, is a popular choice for people who prefer a GUI. For deployment purposes, however, a GUI may not be the best option, and many developers use the Diffusers library to build full-blown generative AI applications instead. This article showcases the setup and usage of the Diffusers library for generating images with Stable Diffusion.
Pre-requisites
First, set up a fresh environment for the Diffusers library. A fresh environment is not strictly necessary, but it helps avoid dependency clashes between pre-installed packages and the versions required by Diffusers and its associated libraries. You can use either a Python virtual environment or a Conda environment.
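For example, either kind of environment can be created like this (a minimal sketch; the environment name diffusers-env is an arbitrary choice):
# Python virtual environment
python -m venv diffusers-env
source diffusers-env/bin/activate
# or a Conda environment
conda create -n diffusers-env python=3.10
conda activate diffusers-env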
In the new environment, run the following commands to set up the required packages.
python -m pip install "diffusers[torch]"
python -m pip install transformers
For documentation and source code, refer to the GitHub repositories of the Diffusers and Transformers libraries. Both are maintained by Hugging Face, with new functionality and updates pushed frequently.
Code
The Diffusers library provides a basic text-to-image pipeline. Other available pipelines, including image-to-image, inpainting, and ControlNet, are used in a similar way. This article focuses on the basic Stable Diffusion text-to-image pipeline.
Import Relevant Libraries
from diffusers import StableDiffusionPipeline
import torch
Setup Hyperparameters and Constants
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
PROMPT = 'hyperrealistic portrait of a man as astronaut, portrait, well lit, cyberpunk,'
MODEL_ID = 'stabilityai/stable-diffusion-2-1'
DEVICE selects the hardware to run on. If an NVIDIA GPU is detected, it is used, which speeds up inference considerably; otherwise, inference falls back to the CPU.
PROMPT is the text description that will be passed to the pipeline.
MODEL_ID identifies the pre-trained model to fetch from Hugging Face. Multiple Stable Diffusion models are available on the Hugging Face Model Hub. Stable Diffusion 2.1 is the most recent release and produces 768×768 outputs; earlier versions such as Stable Diffusion 1.5 and 2.0 are also available, along with fine-tuned models trained for specific styles, such as anime or photorealistic images.
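For example, switching to the 1.5 checkpoint only requires changing the model identifier; the rest of the code stays the same (the repository ID below refers to the publicly hosted Stable Diffusion 1.5 weights):
MODEL_ID = 'runwayml/stable-diffusion-v1-5'  # 512x512 base model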
Initialize Pipeline
pipe = StableDiffusionPipeline.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16
).to(DEVICE)
We use torch.float16 precision instead of the default float32. Half precision reduces GPU memory usage and makes inference more efficient.
The from_pretrained method fetches the pre-trained model from Hugging Face. It downloads and caches all required components, such as the text encoder, UNet, and variational autoencoder, and returns a StableDiffusionPipeline object.
We then move the pipeline to the hardware that will be used for inference, either the GPU or the CPU.
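Note that float16 is primarily intended for GPU inference; on CPU, some operations may not support half precision. Below is a minimal sketch that picks the dtype based on the detected device (this adjustment is not part of the original code):
# Assumption: fall back to full precision when no GPU is available.
dtype = torch.float16 if DEVICE == 'cuda' else torch.float32
pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype).to(DEVICE)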
Inference
result = pipe(PROMPT, num_inference_steps=50, guidance_scale=7).images[0]
We pass the required parameters to the forward call of the StableDiffusionPipeline object. It returns a StableDiffusionPipelineOutput object defined in the Diffusers library, which contains a list of generated images. For our use case, we only fetch the first generated image.
The num_inference_steps argument sets the total number of denoising steps. A higher number generally gives better results, since the UNet denoises the image for longer, at the cost of slower generation.
The guidance_scale argument controls how strongly the prompt conditions the output image. A lower guidance scale means the model pays less attention to the prompt, so the output may not reflect it closely, but the model has more creative freedom and the generated images show more variation. A higher guidance scale keeps the output closer to the prompt.
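If you want reproducible or more tightly controlled outputs, the pipeline also accepts a seeded generator and a negative prompt (a minimal sketch; the seed value and negative prompt text below are arbitrary examples):
# A seeded generator makes sampling deterministic for a given set of arguments.
generator = torch.Generator(device=DEVICE).manual_seed(42)
result = pipe(
    PROMPT,
    negative_prompt='blurry, low quality',  # things the model should avoid
    num_inference_steps=50,
    guidance_scale=7,
    generator=generator,
).images[0]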
The generated image is a PIL Image object. So the Pillow library can be used to save or post-process the image further.
result.save('result.png')
Output
The result is a 768×768 image based on the prompt provided; the exact dimensions match the default size the pre-trained model was trained on. However, the dimensions can be changed by passing the height and width keyword arguments during inference.
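For instance, a wider image could be requested as follows (a sketch; height and width should be multiples of 8, and resolutions far from the model's training size can reduce quality):
result = pipe(PROMPT, height=768, width=1024, num_inference_steps=50, guidance_scale=7).images[0]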
Complete Code
from diffusers import StableDiffusionPipeline
import torch
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
PROMPT = 'hyperrealistic portrait of a man as astronaut, portrait, well lit, cyberpunk,'
MODEL_ID = 'stabilityai/stable-diffusion-2-1'
pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16).to(DEVICE)
result = pipe(PROMPT, num_inference_steps=50, guidance_scale=7).images[0]
result.save('result.png')
Conclusion
This article highlighted the basic usage of the Diffusers library, which can be used to deploy generative AI applications. Multiple other pipelines are available for different use cases, and each exposes a similar API, so switching between them usually requires only small changes to the inference arguments.
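As a rough illustration of that similarity, an image-to-image run only swaps the pipeline class and adds an input image. The sketch below is not from the original article, and the input file path is hypothetical:
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image
import torch

img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    'stabilityai/stable-diffusion-2-1', torch_dtype=torch.float16
).to('cuda')

init_image = Image.open('input.png').convert('RGB')  # hypothetical input image
# strength controls how much the input image is altered (0 = keep input, 1 = ignore input)
result = img2img(
    prompt='hyperrealistic portrait of a man as astronaut, cyberpunk',
    image=init_image,
    strength=0.75,
    guidance_scale=7,
).images[0]
result.save('img2img_result.png')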
Published via Towards AI