
An Insider’s guide to Cartoonization using Machine Learning

Author(s): Daksh Trehan

Deep Learning, Computer Vision

Implementing white-box models to “cartoonize” real high-quality images.

Cartoonization is a classic art form in its own right, but Machine Learning keeps expanding into almost every realm, including this one. In this article we will look at a method called ‘White-box Cartoonization’ that reconstructs a photo in an animation style by extracting expressive elements separately, which makes the whole pipeline controllable and flexible.

Results obtained using “White-box-Cartoonization”, Source

Introduction to the Model

Most of us are fond of cartoons, and they were probably an integral part of your childhood too. Beyond those fond memories, drawing them might even be a career choice for some of us.

Machine Learning, meanwhile, is constantly evolving and expanding into almost every field. Research by Xinrui Wang and Jinze Yu [1] now lets us cartoonize real high-quality images with relatively little training.

The process of converting real-life, high-quality pictures into convincing cartoon scenes is known as cartoonization.

Earlier work on the same task relied on black-box models, which achieve good accuracy but degrade stylization quality and produce bad cases. Because every cartoon workflow emphasizes different features, these variations have a noticeable effect on black-box models.

To overcome the drawbacks of those earlier models, more emphasis was placed on human painting behavior and on cartoon images of different styles, and a white-box model was developed instead.

The model decomposes images into three different cartoon representations, which in turn guide the network optimization to generate cartoonized images.

  1. Surface Representation: It extracts the smooth surfaces of the image as a weighted low-frequency component, in which the color composition and surface texture are retained while edges, fine textures, and details are smoothed away.
  2. Structure Representation: It captures global structural information and sparse color blocks. Segments produced by an adaptive coloring step on top of the Felzenszwalb algorithm [2] yield the flat, sparse visual effect of celluloid-style cartoons.
  3. Texture Representation: It retains painted details and edges. The three-channel image is converted to a single-channel intensity map that preserves texture while discarding color and luminance, echoing the way an artist first draws a line sketch with contours and only then applies color. (A rough sketch of all three extractions follows this list.)
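To make the three representations concrete, here is a minimal Python sketch of how each one could be extracted, assuming OpenCV (with the ximgproc contrib module), scikit-image, and NumPy are installed. The parameter values and the mean-color filling step are illustrative simplifications, not the exact settings used in the paper.

```python
# Minimal sketch of the three cartoon representations (illustrative only).
import cv2
import numpy as np
from skimage.segmentation import felzenszwalb

def surface_representation(img):
    # Edge-preserving smoothing: keeps the weighted low-frequency component
    # (color composition, smooth surfaces) while suppressing fine texture.
    # guidedFilter lives in the opencv-contrib ximgproc module.
    return cv2.ximgproc.guidedFilter(guide=img, src=img, radius=5, eps=2e-2)

def structure_representation(img, scale=100):
    # Felzenszwalb segmentation gives sparse color blocks; each segment is
    # filled with its mean color (a simple stand-in for the paper's adaptive
    # coloring) to mimic celluloid-style flat shading.
    segments = felzenszwalb(img, scale=scale, sigma=0.8, min_size=50)
    out = np.zeros_like(img, dtype=np.float32)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        out[mask] = img[mask].mean(axis=0)
    return out.astype(img.dtype)

def texture_representation(img):
    # Random color shift: collapse the 3-channel image to a single-channel
    # intensity map with randomized channel weights, so downstream layers see
    # edges and painted details independently of color and luminance.
    weights = np.random.uniform(0.1, 0.9, size=3)
    weights /= weights.sum()
    gray = (img.astype(np.float32) * weights).sum(axis=2, keepdims=True)
    return np.repeat(gray, 3, axis=2).astype(img.dtype)
```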

The extracted representations are fed into a Generative Adversarial Network (GAN) framework, which optimizes the problem and makes the solution more flexible and diverse.


Proposed Approach

Preprocessing

Along with the three representations above, preprocessing is an important part of the model. It smooths the image, filters its features, converts it into sketches, and translates the output from one domain to another. With this related work in place, we can be reasonably confident that the model's output retains the highest-quality features.

Before vs After, Source
Original vs Guided filter vs Bilateral filter, Source
Output produced by NPR, Source
GAN Model Architecture, Source
Paired vs Unpaired data, Source
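As a small illustration of the smoothing comparison in the figures above, the hedged snippet below applies a bilateral filter and a guided filter with OpenCV (the guided filter requires the opencv-contrib ximgproc module). The file names and filter parameters are placeholders.

```python
# Edge-preserving smoothing with OpenCV (illustrative parameters).
import cv2

img = cv2.imread("photo.jpg")  # hypothetical input image

# Bilateral filter: smooths flat regions while preserving strong edges.
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

# Guided filter: a related edge-preserving smoother, used for the surface
# representation in this article's pipeline.
guided = cv2.ximgproc.guidedFilter(guide=img, src=img, radius=8, eps=1e-2)

cv2.imwrite("bilateral.jpg", bilateral)
cv2.imwrite("guided.jpg", guided)
```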

To continue, we separate the image features so that the network learns each of them under its own objective, which makes the process more robust.

Output of CycleGAN (unpaired image-to-image translation), Source

Understanding the whole model

Model layout, Source

The input image is decomposed into three parts, namely the surface, structure, and texture representations. A GAN model with a generator G and two discriminators, Ds and Dt, is introduced. Ds distinguishes surface features extracted from model outputs from those of real cartoons, while Dt separates textural information of model outputs from that of cartoons. To extract high-level features and to impose a spatial constraint on global content between the outputs and the provided paired cartoons, a pre-trained VGG network is used.
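The sketch below shows, in PyTorch style, how the generator objective described above might be assembled from the surface, texture, and content terms. The official repository is written in TensorFlow, names such as disc_surface, disc_texture, guided_filter, color_shift, and vgg_features are placeholders, and the loss weights are illustrative; the structure and smoothness terms of the full paper are omitted for brevity.

```python
# Hedged sketch of a generator objective combining the three branches.
import torch
import torch.nn.functional as F

def generator_loss(photo, generator, disc_surface, disc_texture,
                   vgg_features, guided_filter, color_shift,
                   w_surface=0.1, w_texture=1.0, w_content=200.0):
    fake = generator(photo)

    # Surface branch: Ds judges edge-preserving smoothed versions of the output.
    logits_s = disc_surface(guided_filter(fake))
    loss_surface = F.binary_cross_entropy_with_logits(
        logits_s, torch.ones_like(logits_s))

    # Texture branch: Dt judges single-channel, randomly color-shifted versions.
    logits_t = disc_texture(color_shift(fake))
    loss_texture = F.binary_cross_entropy_with_logits(
        logits_t, torch.ones_like(logits_t))

    # Content term: pre-trained VGG features keep the output close to the input,
    # preserving global structure.
    loss_content = F.l1_loss(vgg_features(fake), vgg_features(photo))

    return (w_surface * loss_surface + w_texture * loss_texture
            + w_content * loss_content)
```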

The Workflow

The workflow for our model.

Model Architecture

Model Architecture, Source

Since the model is a GAN, it uses both a generator and a discriminator. The generator is a fully convolutional network that downsamples with stride-2 convolutions and consists of three kinds of layers: convolution, Leaky ReLU, and bilinear resize layers.
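A compact sketch of a generator in this spirit is shown below. It is written in PyTorch rather than the original TensorFlow, and the layer counts and channel widths are illustrative, not the exact configuration of the paper.

```python
# Illustrative generator: stride-2 convolutions down, bilinear resize up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.down1 = nn.Conv2d(3, ch, 3, stride=2, padding=1)       # 1/2 resolution
        self.down2 = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)  # 1/4 resolution
        self.mid = nn.Sequential(
            nn.Conv2d(ch * 2, ch * 2, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch * 2, ch * 2, 3, padding=1), nn.LeakyReLU(0.2),
        )
        self.up1 = nn.Conv2d(ch * 2, ch, 3, padding=1)
        self.up2 = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        h = F.leaky_relu(self.down1(x), 0.2)
        h = F.leaky_relu(self.down2(h), 0.2)
        h = self.mid(h)
        # Bilinear resize layers recover the original resolution without the
        # checkerboard artifacts of transposed convolutions.
        h = F.interpolate(h, scale_factor=2, mode="bilinear", align_corners=False)
        h = F.leaky_relu(self.up1(h), 0.2)
        h = F.interpolate(h, scale_factor=2, mode="bilinear", align_corners=False)
        return torch.tanh(self.up2(h))
```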

For the discriminator network, we employ PatchGAN, where the last layer is a convolution layer. Each pixel in the output feature map corresponds to a patch of the input image, and each patch is judged on whether it belongs to a cartoon image or to generated output. PatchGAN sharpens the discriminator's focus on local detail and speeds up training.
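A hedged sketch of such a PatchGAN-style discriminator follows. The channel widths are again illustrative; the key property is simply that the final layer is a convolution, so each output pixel scores one input patch.

```python
# Illustrative PatchGAN-style discriminator: convolutional output map,
# one logit per input patch (no global pooling or fully connected head).
import torch.nn as nn

def patch_discriminator(in_ch=3, ch=32):
    return nn.Sequential(
        nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(ch * 2, ch * 4, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(ch * 4, 1, 3, padding=1),  # per-patch logits
    )
```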

Output

Outputs of the different representations, Source
Outputs produced by Model, Source

Code

The code for the model can be found at:

dakshtrehan/White-box-Cartoonization

And

SystemErrorWang/White-box-Cartoonization

Conclusion

Hopefully, this article has helped you understand how images can be cartoonized using Machine Learning and will also assist you in putting it to practical use.

As always, thank you so much for reading, and please share this article if you found it useful!

References

[1] Xinrui Wang and Jinze Yu. Learning to Cartoonize Using White-Box Cartoon Representations. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.

[2] Pedro F Felzenszwalb and Daniel P Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, 2004.

[3] Huikai Wu, Shuai Zheng, Junge Zhang, and Kaiqi Huang. Fast end-to-end trainable guided filter. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1838–1847, 2018.

[4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[5] Kaiming He, Jian Sun, and Xiaoou Tang. Guided image filtering. In Proceedings of the European Conference on Computer Vision (ECCV), 2010.

[6] Non-photorealistic rendering

[7] CycleGAN: How Machine Learning Learns Unpaired Image-To-Image Translation

[8] Minimum spanning tree-based segmentation

[9] Overview of GAN Structure

[10] Cewu Lu, Li Xu, and Jiaya Jia. Combining sketch and tone for pencil drawing production. In Proceedings of the Symposium on Non-Photorealistic Animation and Rendering, pages 65–73. Eurographics Association, 2012.

Feel free to connect:

LinkedIn ~ https://www.linkedin.com/in/dakshtrehan/

Instagram ~ https://www.instagram.com/_daksh_trehan_/

Github ~ https://github.com/dakshtrehan

Follow for further Machine Learning/ Deep Learning blogs.

Medium ~ https://medium.com/@dakshtrehan

Want to learn more?

Detecting COVID-19 Using Deep Learning

The Inescapable AI Algorithm: TikTok

Why are YOU responsible for George Floyd’s Murder and Delhi Communal Riots?

Why Choose Random Forest and Not Decision Trees

Clustering: What it is? When to use it?

Start off your ML Journey with k-Nearest Neighbors

Naive Bayes Explained

Activation Functions Explained

Parameter Optimization Explained

Gradient Descent Explained

Logistic Regression Explained

Linear Regression Explained

Determining Perfect Fit for your ML Model

Cheers!


