How to Speed Up Inference by Up to 9x on an x86 CPU with PyTorch

Author(s): Nour Islam Mokhtari

The complete guide on how to achieve some impressive results with a few lines of code!

Image generated using StableDiffusion

Quantization in deep learning refers to reducing the number of bits used to represent a model's weights and biases (and often its activations), for example by storing them as 8-bit integers instead of 32-bit floating-point numbers. It’s a technique used to compress models and make them more efficient to deploy, especially on resource-constrained devices like mobile phones, edge devices, and embedded systems.

Image from the NVIDIA website
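To make the idea concrete, here is a minimal sketch in plain Python of one common scheme: affine (scale and zero-point) quantization of floats to signed 8-bit integers. It only illustrates the concept and is not how PyTorch implements quantization internally. The function names `quantize_int8` and `dequantize` and the sample weights are made up for this example.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers using a scale and a zero point."""
    lo, hi = min(values), max(values)
    # Widen the range so that 0.0 is always exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    # 256 integer levels cover the range [lo, hi]; fall back to 1.0 if the range is empty.
    scale = (hi - lo) / 255 or 1.0
    # The zero point is the integer that the float value 0.0 maps to.
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(v / scale) + zero_point))  # clamp to the int8 range
        for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the 8-bit integers."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [-1.2, -0.3, 0.0, 0.45, 2.1]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

for original, integer, approx in zip(weights, q, restored):
    print(f"{original:+.4f} -> {integer:4d} -> {approx:+.4f}")
```

Each weight now fits in one byte instead of four, at the cost of a small rounding error bounded by roughly one quantization step (`scale`). That trade-off between size and precision is the core of quantization.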

