5 Types of ML Accelerators
Last Updated on November 2, 2022 by Editorial Team
Author(s): Luhui Hu
Comprehensive overview of machine learning accelerators for training and serving
The past decade has been the era of deep learning. We have been thrilled by relentless milestones, from AlphaGo to DALL-E 2, and we can hardly count how many AI-powered products have entered our daily lives, including Alexa devices, ad recommendations, warehouse robots, self-driving cars, and more.
Recent years have also seen exponential growth in the scale of deep learning models. It is no longer news that the Wu Dao 2.0 model contains 1.75 trillion parameters, or that it takes about 25 days to train GPT-3 on 240 ml.p4d.24xlarge instances of the SageMaker training platform.
But training and serving become increasingly challenging as deep learning models evolve. Scalability and efficiency are the two major challenges that come with this growth.
Are deep learning systems stuck in a rut?
No! In my earlier two articles, model parallelism and distributed parallel training, I introduced distributed parallel training for scaling out training, and I also shared ML compilers for accelerating training and serving.
Are these all the solutions under the deep learning umbrella?
No! Here, I’ll summarize the five primary types of ML accelerators (or accelerating areas).
Understanding the ML Lifecycle in AI Engineering
Before fully covering ML accelerators, let’s first look at the ML lifecycle.
The ML lifecycle is the lifecycle of data and models. Data is the food of ML and can determine model quality. Every stage of the lifecycle is full of opportunities for acceleration.
MLOps can automate the process of ML model deployment and serving. But because of its operational nature, it is limited to the horizontal process of the AI workflow and cannot fundamentally improve training and serving.
AI engineering goes far beyond MLOps: it can holistically (both horizontally and vertically) engineer the ML workflow and the architecture of training and serving. Furthermore, it can accelerate training and serving through effective orchestration across the entire ML lifecycle.
Based on this holistic ML lifecycle with AI engineering, there are five primary types of ML accelerators (or accelerating areas): hardware accelerators, AI computing platforms, AI frameworks, ML compilers, and cloud services. Please see their relationship diagram below.
Hardware accelerators and AI frameworks remain the mainstream of acceleration, but ML compilers, AI computing platforms, and ML cloud services have recently become increasingly important.
Let’s take a closer look at each of them below.
1. AI Frameworks
We cannot skip choosing the right AI framework when talking about accelerating ML training and serving. Sadly, there is no perfect or best AI framework for all use cases. The three AI frameworks widely used in research and production are TensorFlow, PyTorch, and JAX, and each leads from a different perspective, such as ease of use, production maturity, or scalability.
TensorFlow: TensorFlow is the flagship AI framework and has dominated the deep learning open-source community since the beginning. TensorFlow Serving is a well-defined, mature platform, and TensorFlow.js and TensorFlow Lite are also ripe for the web and IoT.
But due to the limitations of early deep learning exploration, TensorFlow 1.x was all about building static graphs in a very non-Pythonic way. That ruled out instant evaluation in an “eager” mode, which is exactly what allowed PyTorch to ramp up quickly in research. TensorFlow 2.x tried to catch up, but unfortunately, upgrading from TensorFlow 1.x to 2.x was brutal.
TensorFlow also introduced Keras for easier high-level use and the XLA (Accelerated Linear Algebra) optimizing compiler to improve low-level speed.
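As a quick illustration, here is a minimal sketch, assuming TensorFlow 2.x, that pairs a small Keras model with XLA compilation via tf.function(jit_compile=True); the model and shapes are illustrative only:
```python
# Minimal sketch: Keras for the high-level API, XLA for low-level speed.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1),
])

@tf.function(jit_compile=True)  # ask TensorFlow to compile this call with XLA
def predict(x):
    return model(x)

x = tf.random.normal([8, 32])
print(predict(x).shape)  # (8, 1); the first call triggers XLA compilation
```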
PyTorch: With its eager mode and Pythonic approach, PyTorch is a significant force in today’s deep learning world, from research to production. In addition to TorchServe, PyTorch integrates with framework-agnostic platforms such as Kubeflow. PyTorch’s popularity is also tied to the early success of Hugging Face’s Transformers library.
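To see why eager mode resonated with researchers, here is a minimal sketch, assuming PyTorch is installed: operations execute immediately, and autograd records them on the fly, with no separate graph-building step.
```python
# Minimal sketch of PyTorch eager mode.
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()  # executes immediately; autograd records the ops
y.backward()        # gradients computed from the recorded operations
print(x.grad)       # equals 2 * x
```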
JAX: Google rolled out JAX, built on device-accelerated NumPy and JIT (just-in-time) compilation. It is a more native framework for deep learning that is rapidly gaining traction in research, much as PyTorch did a few years ago. That said, as Google itself states, JAX is not an “official” Google product yet.
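A minimal sketch of what that means in practice, assuming JAX is installed: jax.numpy mirrors the NumPy API, jax.jit compiles a function with XLA, and jax.grad differentiates it.
```python
# Minimal sketch: NumPy-style code, JIT-compiled and differentiated by JAX.
import jax
import jax.numpy as jnp

@jax.jit  # compiled with XLA on the first call
def loss(w, x):
    return jnp.sum((x @ w) ** 2)

grad_fn = jax.grad(loss)  # d(loss)/dw via automatic differentiation

w = jnp.ones((4,))
x = jnp.arange(12.0).reshape(3, 4)
print(loss(w, x), grad_fn(w, x))
```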
2. Hardware Accelerators
We could write a lengthy article on hardware accelerators alone. Undoubtedly, NVIDIA’s GPUs ignited the acceleration of deep learning training, even though they were initially designed as video cards.
The popularity of graphics cards for neural network training exploded after the advent of general-purpose GPUs (GP-GPUs), which can execute arbitrary code, not just rendering subroutines. NVIDIA’s CUDA programming language provides a way to write this arbitrary code in a C-like language. With their relatively convenient programming model, massive parallelism, and high memory bandwidth, GP-GPUs now offer an ideal platform for neural network programming.
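To make the idea concrete without writing CUDA C itself, here is a hedged sketch of the same kernel-programming model using Numba’s Python bindings for CUDA (Numba and a CUDA-capable GPU are assumed; the kernel and sizes are illustrative, not NVIDIA’s C-based CUDA):
```python
# Illustrative GP-GPU kernel via Numba's CUDA support: arbitrary code, not
# a rendering subroutine, executed across many GPU threads in parallel.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)   # this thread's global index
    if i < out.size:   # guard threads past the end of the array
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)  # Numba handles host/device copies
```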
Today, NVIDIA supports a range of GPUs from desktop to mobile, workstations, mobile workstations, consoles, and data centers.
With the success of NVIDIA’s GPUs, there has been no lack of successors along the way, such as AMD’s GPUs, Google’s TPU ASICs, etc.
3. AI Computing Platforms
As described above, the speed of ML training and serving depends significantly on hardware (e.g., GPUs and TPUs). The software layers that drive that hardware, that is, AI computing platforms, are therefore critical for performance. There are two well-known ones: CUDA and OpenCL.
CUDA: CUDA (Compute Unified Device Architecture) is a parallel programming paradigm released by NVIDIA in 2007. It is designed for graphics processors and a vast array of general-purpose GPU applications. CUDA is a proprietary API that supports only NVIDIA GPUs, from the Tesla architecture onward; CUDA-supported graphics cards include the GeForce 8 series, Tesla, and Quadro.
OpenCL: OpenCL (Open Computing Language) was initially developed by Apple and is now maintained by the Khronos Group for heterogeneous computing, including CPUs, GPUs, DSPs, and other types of processors. This portable language is adaptable enough to let each hardware platform achieve high performance, including NVIDIA’s GPUs.
NVIDIA is now OpenCL 3.0 conformant, available on R465 and later drivers. Using the OpenCL API, one can launch compute kernels, written in a limited subset of the C programming language, on a GPU.
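For illustration, here is a minimal PyOpenCL sketch (PyOpenCL and an OpenCL driver are assumed) that launches exactly such a C-subset kernel from Python:
```python
# Minimal sketch: an OpenCL kernel, written in a C subset, launched from Python.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()   # pick any available OpenCL device
queue = cl.CommandQueue(ctx)

src = """
__kernel void vector_add(__global const float *a,
                         __global const float *b,
                         __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""
prog = cl.Program(ctx, src).build()

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
out = np.empty_like(a)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

prog.vector_add(queue, a.shape, None, a_buf, b_buf, out_buf)
cl.enqueue_copy(queue, out, out_buf)  # read the result back to the host
```
The same kernel source runs unchanged on any conformant device, which is OpenCL’s portability selling point.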
4. ML Compilers
ML compilers play a vital role in accelerating training and serving, and they can significantly improve the efficiency of large-scale model serving. There are many popular compilers, such as Apache TVM, LLVM, Google MLIR, TensorFlow XLA, Meta Glow, PyTorch nvFuser, and Intel PlaidML. Please refer to ML Compilers for more details.
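As one concrete taste of compiler-style acceleration, the sketch below uses TorchScript, the graph-capture entry point behind PyTorch fuser backends such as nvFuser; it is illustrative only, and actual speedups depend on the model and hardware:
```python
# Minimal sketch: capture a Python function as a TorchScript graph so the
# runtime can optimize and fuse its elementwise operations.
import torch

def gelu_like(x):
    # several pointwise ops in a row, a natural fusion candidate
    return 0.5 * x * (1.0 + torch.tanh(0.79788456 * (x + 0.044715 * x ** 3)))

scripted = torch.jit.script(gelu_like)  # compile to a TorchScript graph

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
out = scripted(x)  # early calls trigger profiling and fusion
```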
5. ML Cloud Services
ML cloud platforms and services manage ML workloads in the cloud, and they can optimize for efficiency in several ways.
Take Amazon SageMaker, a leading ML cloud platform service, for example. SageMaker provides extensive features across the ML lifecycle, from data preparation to building, training/tuning, and deploying/managing models.
It optimizes training and serving efficiency in many ways, for instance, multi-model endpoints on GPUs, cost-effective training on heterogeneous clusters, and proprietary Graviton processors suited to CPU-based ML inference.
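For a flavor of the developer experience, here is a hedged sketch using the SageMaker Python SDK; the IAM role, S3 paths, entry script, and versions below are placeholders, not working resources:
```python
# Hypothetical SageMaker training job; all identifiers are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

session = sagemaker.Session()

estimator = PyTorch(
    entry_point="train.py",        # placeholder training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    framework_version="1.12",      # example PyTorch version
    py_version="py38",
    instance_count=2,              # scale out across instances
    instance_type="ml.p4d.24xlarge",  # the GPU instance type cited earlier
    sagemaker_session=session,
)

estimator.fit({"train": "s3://my-bucket/train"})  # placeholder S3 channel
```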
Final Remark
Deep learning training and serving become more and more challenging as they scale, and improving their efficiency is a sophisticated undertaking. Based on the ML lifecycle, there are five areas in which to accelerate ML training and serving: AI frameworks, hardware accelerators, AI computing platforms, ML compilers, and cloud services. AI engineering can orchestrate all of them together, with engineering principles, for comprehensive efficiency.