5 Types of ML Accelerators
Last Updated on November 2, 2022 by Editorial Team
Author(s): Luhui Hu
Originally published on Towards AI.
Comprehensive overview of machine learning accelerators for training and serving
The past decade has been the era of deep learning. We have been thrilled by a nonstop run of milestones, from AlphaGo to DALL-E 2 and more. And we have lost count of the AI-powered things in our daily lives, including Alexa devices, ad recommendations, warehouse robots, self-driving cars, and more.
Recent years have seen exponential growth in the scale of deep learning models. It is not news that the Wu Dao 2.0 model contains 1.75 trillion parameters, or that it takes about 25 days to train GPT-3 on 240 ml.p4d.24xlarge instances of the SageMaker training platform.
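To make these numbers concrete, here is a back-of-the-envelope sketch. The parameter count, instance count, and training time come from the paragraph above. The 8 GPUs per instance, 40 GB of memory per GPU, and 2 bytes per parameter (fp16) are my assumptions for illustration, and the estimate covers weights only, not gradients or optimizer state.

```python
import math

PARAMS = 1_750_000_000_000        # Wu Dao 2.0 parameter count (from the text)
BYTES_PER_PARAM_FP16 = 2          # assumption: half-precision weights
GPU_MEMORY_BYTES = 40 * 10**9     # assumption: 40 GB of memory per GPU

INSTANCES = 240                   # ml.p4d.24xlarge instances (from the text)
GPUS_PER_INSTANCE = 8             # assumption: 8 GPUs per instance
TRAINING_DAYS = 25                # from the text


def weight_bytes(params, bytes_per_param):
    """Memory needed to hold the weights alone."""
    return params * bytes_per_param


def min_gpus_for_weights(total_bytes, gpu_bytes):
    """Smallest number of GPUs whose combined memory fits the weights."""
    return math.ceil(total_bytes / gpu_bytes)


def gpu_hours(instances, gpus_per_instance, days):
    """Total GPU-hours consumed by a training run."""
    return instances * gpus_per_instance * days * 24


total = weight_bytes(PARAMS, BYTES_PER_PARAM_FP16)
print(f"fp16 weights: {total / 1e12:.1f} TB")
print(f"GPUs needed just to hold them: {min_gpus_for_weights(total, GPU_MEMORY_BYTES)}")
print(f"GPU-hours for the GPT-3 run: {gpu_hours(INSTANCES, GPUS_PER_INSTANCE, TRAINING_DAYS):,}")
```

Under these assumptions, the weights alone exceed the memory of any single device by a wide margin, which is why scalability and efficiency dominate the discussion that follows.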
But training and serving become increasingly challenging as deep learning evolves. With the growth of deep learning models, scalability and efficiency are the two major challenges for both.
Are deep learning systems stuck in a rut?
No! I introduced distributed parallel training for scaling out training in two earlier articles, on model parallelism and on distributed parallel training. I also covered ML compilers for accelerating training and serving.
Are these all the solutions under the umbrella of deep learning?
No! Here I’ll summarize five primary types of ML accelerators or accelerating areas.
Understand ML Lifecycle in AI Engineering
Before fully covering ML accelerators, let’s first revisit the ML lifecycle.
The ML lifecycle is a lifecycle of data and models. Data is food for ML and can determine model quality. Every area in the lifecycle is full of opportunities for acceleration.
MLOps can automate the process of ML model deployment and serving. But it is limited to the horizontal process of the AI workflow and, being operational in nature, cannot fundamentally improve training and serving.
AI engineering, far beyond MLOps, can holistically (both horizontally and vertically) engineer the ML workflow and the architecture of training and serving. Furthermore, it can accelerate serving and training through effective orchestration of the entire ML lifecycle.
Based on the holistic ML lifecycle with AI engineering, there are five primary types of ML accelerators (or accelerating areas): hardware accelerators, AI computing platforms, AI frameworks, ML compilers, and cloud services. Please see their relationship diagram below.
We can see hardware accelerators and AI frameworks are the mainstream of acceleration. But recently, ML compilers, AI computing platforms, and ML cloud services have become increasingly important.
Let’s take a closer look at them below.
1. AI Frameworks
We cannot skip choosing the right AI framework when talking about accelerating ML training and serving. Sadly, there is no perfect or best AI framework for all use cases. Three AI frameworks widely used in research and production are TensorFlow, PyTorch, and JAX. They lead from different perspectives, such as ease of use, production maturity, and scalability.
TensorFlow: TensorFlow is the flagship AI framework. It has dominated the deep learning open-source community since the beginning. TensorFlow Serving is a well-defined, mature serving platform. TensorFlow.js and TensorFlow Lite are also mature options for the web and IoT.
But because deep learning was still in early exploration, TensorFlow 1.x was all about building static graphs in a very non-Pythonic way. This stood in the way of instant evaluation, which PyTorch offered through its “eager” mode, and that allowed PyTorch to ramp up quickly in research. TensorFlow 2.x tried to catch up by making eager execution the default, but unfortunately, upgrading from TensorFlow 1.x to 2.x can be brutal.
TensorFlow also adopted Keras as an easier-to-use high-level API and introduced the XLA (Accelerated Linear Algebra) optimizing compiler to improve low-level speed.
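The difference between the static-graph and eager styles is easier to see in a toy. The sketch below is plain Python, not TensorFlow or PyTorch code. The names Node, placeholder, and run are hypothetical helpers I made up to mimic the "define the graph first, run it later" workflow.

```python
class Node:
    """A deferred operation in a toy static graph (not a TensorFlow class)."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # function applied to evaluated inputs, or None
        self.inputs = inputs  # upstream nodes
        self.name = name      # set only for placeholders

    def __add__(self, other):
        return Node(lambda a, b: a + b, (self, other))

    def __mul__(self, other):
        return Node(lambda a, b: a * b, (self, other))


def placeholder(name):
    """A named input whose value is supplied only at run time."""
    return Node(None, name=name)


def run(node, feed):
    """Evaluate the graph recursively, looking up placeholders in `feed`."""
    if node.op is None:
        return feed[node.name]
    args = [run(n, feed) for n in node.inputs]
    return node.op(*args)


# Static-graph style: describe the computation first, run it later.
x = placeholder("x")
w = placeholder("w")
y = x * w + x                      # nothing is computed here; y is a graph
static_result = run(y, {"x": 3.0, "w": 2.0})

# Eager style: ordinary Python, values are available immediately.
xe, we = 3.0, 2.0
eager_result = xe * we + xe

print(static_result, eager_result)
```

Both styles compute the same value, but in the static style you cannot inspect y until the graph is run, which is roughly why debugging and experimentation felt less Pythonic.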
PyTorch: With its eager mode and Pythonic approach, PyTorch is a significant force in today’s deep learning world, from research to production. In addition to TorchServe, PyTorch integrates with framework-agnostic platforms such as Kubeflow. PyTorch’s popularity is also tied to the success of Hugging Face’s Transformers library, which was built on PyTorch first.
JAX: Google rolled out JAX, built on device-accelerated NumPy and JIT (just-in-time) compilation. It is a more native framework for deep learning that is rapidly gaining traction in research, as PyTorch did a few years ago. But, as Google notes, it is not yet an “official” Google product.
2. Hardware Accelerators
We could write a lengthy article on hardware accelerators alone. Undoubtedly, NVIDIA’s GPUs ignited the acceleration of DL training, though they were initially intended for graphics rendering in video cards.
The popularity of graphics cards for neural network training exploded after the advent of general-purpose GPUs. These GP-GPUs could execute arbitrary code, not just rendering subroutines. NVIDIA’s CUDA programming language provided a way to write this arbitrary code in a C-like language. With their relatively convenient programming model, massive parallelism, and high memory bandwidth, GP-GPUs now offer an ideal platform for neural network programming.
Today, NVIDIA offers GPUs for desktops, laptops, workstations, mobile workstations, consoles, and data centers.
Following the success of NVIDIA’s GPUs, there has been no lack of challengers, such as AMD’s GPUs and Google’s TPU ASICs.
3. AI Computing Platforms
As described above, the speed of ML training and serving depends significantly on hardware (e.g., GPUs and TPUs). The software layers that drive this hardware (that is, AI computing platforms) therefore become critical for performance. There are two well-known ones: CUDA and OpenCL.
CUDA: CUDA (Compute Unified Device Architecture) is a parallel programming platform released in 2007 by NVIDIA. It is designed for graphics processors and supports a vast array of general-purpose applications on GPUs. CUDA is a proprietary API that supports only NVIDIA’s GPUs, starting with the Tesla architecture. CUDA-capable graphics cards include the GeForce 8 series and later, Tesla, and Quadro.
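To give a feel for the programming model, here is a minimal sketch in plain Python that imitates how a CUDA-style kernel maps threads to data. It is not CUDA code and runs sequentially. The launch helper and the kernel signature are hypothetical stand-ins, and only the global-index formula (block index times block size plus thread index) mirrors the real convention.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Body executed by one simulated thread."""
    i = block_idx * block_dim + thread_idx   # global index, as in CUDA
    if i < len(out):                         # guard threads past the end
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Sequential stand-in for a kernel launch; a GPU runs threads in parallel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(a)

threads_per_block = 4
# Ceiling division so every element is covered by some thread.
blocks = (len(a) + threads_per_block - 1) // threads_per_block
launch(vector_add_kernel, blocks, threads_per_block, a, b, out)

print(out)
```

Each simulated thread handles one element. On a real GPU, thousands of such threads run concurrently, which is where the massive parallelism comes from.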
OpenCL: OpenCL (Open Computing Language) was initially developed by Apple and is now maintained by the Khronos Group for heterogeneous computing across CPUs, GPUs, DSPs, and other types of processors. This portable language is adaptable enough to allow each hardware platform to achieve high performance, including NVIDIA’s GPUs.
NVIDIA is now OpenCL 3.0 conformant, and OpenCL 3.0 is available in R465 and later drivers. Using the OpenCL API, people can launch compute kernels on a GPU, written in a limited subset of the C programming language.
4. ML Compilers
ML compilers play a vital role in accelerating training and serving, and they can significantly improve the efficiency of large-scale model serving. There are many popular compilers and compiler infrastructures, such as Apache TVM, LLVM, Google MLIR, TensorFlow XLA, Meta Glow, PyTorch nvFuser, and Intel PlaidML. Please refer to my earlier article on ML compilers for more details.
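One optimization commonly associated with ML compilers is operator fusion: merging several element-wise operations into a single pass so that intermediate results are never materialized. The plain-Python sketch below only illustrates the idea and is not the output or API of any compiler listed above. The function names are hypothetical.

```python
def unfused(xs, scale, bias):
    """Three separate passes, each producing an intermediate list."""
    scaled = [x * scale for x in xs]          # pass 1: multiply
    shifted = [s + bias for s in scaled]      # pass 2: add
    return [max(0.0, v) for v in shifted]     # pass 3: ReLU


def fused(xs, scale, bias):
    """One pass computing multiply, add, and ReLU per element."""
    return [max(0.0, x * scale + bias) for x in xs]


xs = [-2.0, -0.5, 0.0, 1.5]
print(unfused(xs, 2.0, 1.0))
print(fused(xs, 2.0, 1.0))
```

Both functions return the same values, but the fused version reads each input once and writes each output once. On an accelerator, that reduction in memory traffic is often where the speedup comes from.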
5. ML Cloud Services
ML cloud platforms and services provide managed ML platforms in the cloud. They can improve efficiency in several ways.
Take Amazon SageMaker, for example. It is a leading ML cloud platform service. SageMaker provides extensive features for the ML lifecycle, from preparing to building, training/tuning, and deploying/managing.
It improves training and serving efficiency in many ways, for instance, multi-model endpoints on GPU, cost-effective training using heterogeneous clusters, and proprietary Graviton processors suited for CPU-based ML inference.
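The intuition behind a multi-model endpoint can be sketched as a cache of loaded models that share one serving container. The toy below is my own illustration, assuming a least-recently-used eviction policy. ModelCache and fake_loader are hypothetical names, and none of this is the SageMaker API.

```python
from collections import OrderedDict


class ModelCache:
    """Toy LRU cache of loaded models; not the SageMaker API."""

    def __init__(self, capacity, loader):
        self.capacity = capacity        # max models held in memory at once
        self.loader = loader            # callable: model name -> model object
        self.loaded = OrderedDict()     # least recently used entry comes first
        self.load_count = 0             # how many (expensive) loads happened

    def get(self, name):
        if name in self.loaded:
            self.loaded.move_to_end(name)        # mark as most recently used
            return self.loaded[name]
        if len(self.loaded) >= self.capacity:
            self.loaded.popitem(last=False)      # evict least recently used
        self.loaded[name] = self.loader(name)
        self.load_count += 1
        return self.loaded[name]


def fake_loader(name):
    """Hypothetical loader: returns a function standing in for a model."""
    return lambda x: f"{name}({x})"


cache = ModelCache(capacity=2, loader=fake_loader)
results = [cache.get(n)(1) for n in ["a", "b", "a", "c", "b"]]

print(results)
print("loads:", cache.load_count, "resident:", list(cache.loaded))
```

Sharing one endpoint across many models trades occasional load latency for much better hardware utilization when most models receive little traffic.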
Training and serving become more and more challenging as DL scales, and improving their efficiency is a complex problem. Based on the ML lifecycle, there are five areas to accelerate ML training and serving: AI frameworks, hardware accelerators, computing platforms, ML compilers, and cloud services. AI engineering can orchestrate all of these together, applying engineering principles for comprehensive efficiency.
5 Types of ML Accelerators was originally published in Towards AI on Medium.