Microsoft Research Introduces Not One, Not Two, But Four New AI Compilers
Last Updated on November 6, 2023 by Editorial Team
Author(s): Jesus Rodriguez
Originally published on Towards AI.
I recently started an AI-focused educational newsletter that already has over 160,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:
TheSequence | Jesus Rodriguez | Substack
The best source to stay up-to-date with the developments in machine learning, artificial intelligence, and data…
thesequence.substack.com
Compilers are seeing a renaissance in the era of generative AI. In this context, a compiler is responsible for translating a neural network architecture into executable code for a specific hardware topology. Both of these areas, model and hardware architectures, have seen an explosion of innovation, regularly rendering existing AI compilers obsolete.
The challenges in AI compilation are many, from hardware acceleration to computation and memory efficiency. Microsoft Research has been at the forefront of AI compiler research, and recently it unveiled a quartet of cutting-edge AI compilers, each tailored to address a specific challenge in the realm of deep neural networks (DNNs). The list includes the following compilers:
· Rammer: For parallelism
· Roller: For computation
· Welder: For memory
· Grinder: For control flow and hardware acceleration
Let's dive into each one.
Rammer: Pioneering Parallel Hardware Utilization
Deep neural networks (DNNs) have become integral to various intelligence tasks, ranging from image classification to natural language processing. To harness their power, a plethora of computing devices is employed, including CPUs, GPUs, and specialized DNN accelerators. A critical factor influencing DNN computation efficiency is scheduling, the process that dictates the order in which computational tasks run on hardware. Conventional AI compilers often represent DNN computation as a data flow graph whose nodes symbolize DNN operators, each scheduled to run on the accelerator independently. This methodology, however, introduces significant scheduling overhead and underutilizes hardware resources.
Enter Rammer, a DNN compiler that envisions the scheduling space as a two-dimensional plane on which computational tasks are bricks of varied shapes and sizes. Rammer's mission is to arrange these bricks snugly on the plane, like constructing a seamless wall: no gaps are allowed, which maximizes hardware utilization and execution speed. Rammer effectively acts as a compactor within this spatial domain, placing DNN program bricks on the accelerator's different computing units at compile time and thus mitigating runtime scheduling overhead. Additionally, Rammer introduces novel hardware-independent abstractions for computing tasks and hardware accelerators, broadening the scheduling space and enabling more efficient schedules.
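To make the brick metaphor concrete, here is a minimal Python sketch of the packing idea. It is not Rammer's actual implementation or API; the RTask class, the greedy placement policy, and the abstract time units are all illustrative assumptions.

```python
# Toy sketch of the "bricks on a 2D plane" idea, assuming a made-up
# RTask type and a greedy placement policy. This is NOT Rammer's real
# implementation; it only illustrates compile-time packing of tasks
# onto compute units so that no runtime scheduler is needed.
from dataclasses import dataclass

@dataclass
class RTask:
    name: str
    units_needed: int  # compute units the task occupies (brick width)
    duration: int      # abstract time steps it runs for (brick height)

def pack_schedule(rtasks, num_units):
    """Place each task at the earliest time where enough contiguous
    compute units are free, filling the (units x time) plane tightly."""
    free_at = [0] * num_units  # first time step each unit becomes free
    schedule = []
    for task in sorted(rtasks, key=lambda t: -t.duration):
        best_start, best_unit = None, None
        for u in range(num_units - task.units_needed + 1):
            start = max(free_at[u:u + task.units_needed])
            if best_start is None or start < best_start:
                best_start, best_unit = start, u
        for u in range(best_unit, best_unit + task.units_needed):
            free_at[u] = best_start + task.duration
        schedule.append((task.name, best_unit, best_start))
    return schedule

tasks = [RTask("conv1", 4, 3), RTask("bn1", 2, 1), RTask("relu1", 2, 1)]
for name, unit, start in pack_schedule(tasks, num_units=8):
    print(f"{name}: compute units {unit}+, starts at t={start}")
```

Because every placement decision is made ahead of time, the accelerator never pauses to ask "what runs next?", which is the overhead the real compiler is designed to eliminate.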
Roller: Enhancing Computational Efficiency
Accelerators boasting parallel computing units and intricate memory hierarchies necessitate a systematic approach to data transfer. Data must ascend through memory layers, partitioned into smaller bricks at each step, before reaching the top-level processor for computation. The challenge lies in partitioning and filling memory space with large bricks to optimize memory utilization and efficiency. The prevailing approach employs machine learning to search for brick-partitioning strategies, which requires evaluating numerous search steps on the accelerator; this lengthy process can take days or even weeks for a full AI model.
Roller expedites compilation while maintaining optimal computation efficiency. At its core, Roller embodies a concept akin to the operation of a road roller: it deposits high-dimensional tensor data onto a two-dimensional memory structure, much like skillfully tiling a floor, discerning the ideal tile sizes from the memory's specific attributes. Simultaneously, Roller encapsulates the tensor shape so that it harmonizes with the hardware nuances of the underlying accelerator. This strategic alignment constrains the range of shape options, significantly streamlining the compilation process while still producing highly efficient code.
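As a rough illustration of this alignment idea, the Python sketch below enumerates only tile shapes that are multiples of a hypothetical memory-transaction width and picks the one that best fills a hypothetical shared-memory budget. The constants and function names are assumptions for illustration, not Roller's real interface; the point is that aligning candidates to hardware granularity shrinks the search space from thousands of shapes to a handful.

```python
# Minimal sketch of Roller's tile-selection idea under assumed hardware
# constants (48 KB of fp32 shared memory, 32-float transactions). Not
# Roller's real algorithm: real cost models also weigh compute and
# reuse, while this toy only maximizes fast-memory utilization.
SHARED_MEM_FLOATS = 48 * 1024 // 4   # hypothetical shared-memory budget
TRANSACTION_WIDTH = 32               # floats moved per memory transaction

def aligned_tile_candidates(max_dim=256):
    """Yield (rows, cols) tiles whose columns are multiples of the
    transaction width, so every load is a full, coalesced transaction."""
    for rows in (2**k for k in range(1, 9)):  # 2, 4, ..., 256
        for cols in range(TRANSACTION_WIDTH, max_dim + 1, TRANSACTION_WIDTH):
            yield rows, cols

def pick_tile():
    """Among aligned tiles that fit, pick the one filling fast memory best."""
    best, best_util = None, 0.0
    for rows, cols in aligned_tile_candidates():
        footprint = rows * cols
        if footprint > SHARED_MEM_FLOATS:
            continue
        util = footprint / SHARED_MEM_FLOATS
        if util > best_util:
            best, best_util = (rows, cols), util
    return best, best_util

tile, util = pick_tile()
print(f"chosen tile {tile}, shared-memory utilization {util:.0%}")
```

Because the candidates are derived from the hardware's own granularity rather than learned by trial and error on the device, selection takes milliseconds instead of the days or weeks a search-based compiler can require.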
Welder: Streamlining Memory Access
As DNN models demand increasingly high-fidelity data and the computing cores of modern hardware accelerators grow faster, memory bandwidth has surfaced as a bottleneck. To counter this, the deep learning compiler Welder comprehensively optimizes memory access efficiency across the end-to-end DNN model. DNN computation involves multiple stages in which input data is divided into blocks that traverse different operators and memory layers. Welder transforms this process into an efficient assembly line, welding different operators and data blocks together and reducing memory access traffic at the lower memory layers.
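The following toy Python sketch illustrates the welding idea under simplifying assumptions: NumPy arrays stand in for the slow and fast memory tiers, and two elementwise operators (a ReLU and a scale) are connected at the block level so the intermediate result never round-trips through the slow tier. This is a conceptual sketch, not Welder's actual mechanism, and the tile size is an arbitrary stand-in.

```python
# Conceptual sketch of "welding" adjacent operators at the tile level.
# NumPy stands in for memory tiers; TILE is a hypothetical on-chip
# block size, not a Welder parameter.
import numpy as np

TILE = 1024  # elements processed per on-chip block (illustrative)

def unfused(x):
    """Each operator reads and writes the full array: the intermediate
    makes an extra round trip through slow memory."""
    y = np.maximum(x, 0)  # relu materializes all of y
    return y * 2.0        # scale reads all of y back again

def welded(x):
    """Operators welded per block: each relu output block is consumed
    immediately by the scale while still 'on chip'."""
    out = np.empty_like(x)
    for i in range(0, x.size, TILE):
        block = x[i:i + TILE]          # bring one block into fast memory
        block = np.maximum(block, 0)   # relu on the block
        out[i:i + TILE] = block * 2.0  # scale reuses the block in place
    return out

x = np.random.randn(8 * TILE).astype(np.float32)
assert np.allclose(unfused(x), welded(x))  # same result, less traffic
```

The results are identical; what changes is where the intermediate lives, which is exactly the kind of traffic reduction the compiler targets at the lower memory layers.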
Grinder: Mastering Control Flow Execution
In AI computation, complex control logic sometimes accompanies data block movement. Current AI compilers predominantly focus on the efficiency of data flow execution and neglect efficient support for control flow. Grinder bridges this gap by integrating control flow directly into data flow, enabling both to execute efficiently on accelerators. It unifies the representation of AI models through uTask, a novel abstraction, and leverages heuristic strategies to optimize control flow execution across the hardware's parallelism levels. Grinder can thus move control flow into device kernels, optimizing performance across control flow boundaries.
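Here is a purely illustrative Python sketch of why moving control flow into the device kernel pays off: a host-side loop incurs a launch and synchronization overhead on every iteration, while a loop fused into a single kernel pays it once. The cost constant and function names are stand-ins, not Grinder's uTask API.

```python
# Illustrative sketch (not Grinder's real abstraction) of host-side vs.
# device-side control flow. LAUNCH_OVERHEAD is an abstract stand-in for
# the cost of launching and synchronizing one kernel.
LAUNCH_OVERHEAD = 1  # abstract cost units per kernel launch

def host_side_loop(state, steps):
    """Control flow on the host: one kernel launch per loop iteration,
    as when a compiler only understands data flow inside each kernel."""
    cost = 0
    for _ in range(steps):           # e.g. an RNN stepped from the host
        state = state * 0.9 + 1.0    # the "kernel" body
        cost += LAUNCH_OVERHEAD      # overhead paid on every iteration
    return state, cost

def device_side_loop(state, steps):
    """Control flow moved inside one kernel: a single launch runs the
    whole loop, so scheduling overhead no longer scales with steps."""
    cost = LAUNCH_OVERHEAD           # overhead paid once
    for _ in range(steps):
        state = state * 0.9 + 1.0
    return state, cost

s1, c1 = host_side_loop(1.0, 100)
s2, c2 = device_side_loop(1.0, 100)
assert s1 == s2 and c2 < c1
print(f"same result, launch cost {c1} vs {c2}")
```

The computation is unchanged; only the boundary between host and device moves, which is the essence of optimizing across control flow boundaries.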
In summary, Microsoft Research's quartet of AI compilers (Rammer, Roller, Welder, and Grinder) paves the way for enhanced DNN workload optimization, memory access efficiency, and control flow execution on hardware accelerators, marking a significant leap forward in AI compiler technology.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI