Microsoft Research Introduces Not One, Not Two, But Four New AI Compilers

Last Updated on November 6, 2023 by Editorial Team

Author(s): Jesus Rodriguez

Originally published on Towards AI.

I recently started an AI-focused educational newsletter that already has over 160,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:

TheSequence | Jesus Rodriguez | Substack

thesequence.substack.com

Compilers are seeing a renaissance in the era of generative AI. In the context of AI, a compiler is responsible for translating a neural network architecture into executable code for a specific hardware target. Both of those areas, model architectures and hardware architectures, have seen an explosion of innovation that regularly makes existing AI compilers obsolete.

The challenges in AI compilation are many, from hardware acceleration to computation and memory efficiency. Microsoft Research has been at the forefront of AI compiler research, and it recently unveiled a quartet of cutting-edge AI compilers, each tailored to a specific challenge in running deep neural networks (DNNs). The list includes the following compilers:

· Rammer: For parallelism

· Roller: For computation

· Welder: For memory

· Grinder: For control flow and hardware acceleration

Let’s dive into each one.

Rammer: Pioneering Parallel Hardware Utilization

Deep neural networks (DNNs) have become integral to a wide range of intelligent tasks, from image classification to natural language processing. Running them relies on many kinds of computing devices, including CPUs, GPUs, and specialized DNN accelerators. A critical factor in DNN computation efficiency is scheduling, the process that decides the order in which computational tasks run on the hardware. Conventional AI compilers often represent a DNN computation as a data flow graph whose nodes are DNN operators, and they schedule each operator to run on the accelerator independently. This approach, however, introduces significant scheduling overhead and leaves hardware resources underutilized.

Enter Rammer, a DNN compiler that models the scheduling space as a two-dimensional plane. Computational tasks are treated like bricks of varied shapes and sizes, and Rammer's goal is to arrange them snugly on the plane, like building a seamless wall. Leaving as few gaps as possible maximizes hardware utilization and execution speed. Rammer effectively acts as a compactor in this space, placing the bricks of a DNN program onto the accelerator's different computing units ahead of time and thereby reducing runtime scheduling overhead. Rammer also introduces new hardware-independent abstractions for computing tasks and hardware accelerators, which broaden the scheduling space and enable more efficient schedules.
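The brick-packing idea can be illustrated with a toy scheduler. This is a minimal sketch, not Rammer's implementation: it assumes the operators are independent of each other (for example, parallel branches of a model) and that every task's cost is known in advance, and it uses the classic longest-first greedy heuristic as a stand-in for Rammer's own scheduling policy. `Task`, `schedule_per_operator`, and `schedule_packed` are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    op: str    # name of the operator this brick belongs to
    cost: int  # estimated execution time in arbitrary units (hypothetical)


def schedule_per_operator(ops, num_units):
    """Baseline: operators run one after another. Each operator's tasks are
    spread round-robin over the units, and the next operator cannot start
    until the slowest unit finishes, so idle units are wasted capacity."""
    total = 0
    for tasks in ops:
        loads = [0] * num_units
        for i, task in enumerate(tasks):
            loads[i % num_units] += task.cost
        total += max(loads)
    return total


def schedule_packed(ops, num_units):
    """Packed schedule: pool the bricks of all (independent) operators and
    place each one, longest first, on the currently least-loaded unit.
    Returns the makespan and the ahead-of-time placement per unit."""
    bricks = sorted((t for tasks in ops for t in tasks), key=lambda t: -t.cost)
    heap = [(0, unit) for unit in range(num_units)]  # (current load, unit id)
    placement = {unit: [] for unit in range(num_units)}
    for brick in bricks:
        load, unit = heapq.heappop(heap)
        placement[unit].append(brick)
        heapq.heappush(heap, (load + brick.cost, unit))
    return max(load for load, _ in heap), placement


op_a = [Task("A", 4)] * 3  # operator A splits into three bricks
op_b = [Task("B", 4)]      # independent operator B is a single brick
print(schedule_per_operator([op_a, op_b], num_units=4))  # 8
makespan, placement = schedule_packed([op_a, op_b], num_units=4)
print(makespan)  # 4
```

In this example, running the operators separately leaves three units idle while B runs, whereas packing fills all four units at once and halves the makespan. Because the placement is computed before execution, no scheduling decisions remain to be made at run time.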

Roller: Enhancing Computational Efficiency

Accelerators with many parallel computing units and deep memory hierarchies require data to be moved in a systematic way. Data must climb through the memory layers, being partitioned into smaller bricks at each step, before it reaches the top-level processor for computation. The challenge lies in choosing partitions whose bricks fill each memory layer as fully as possible, to maximize memory utilization and efficiency. Current approaches use machine learning to search for partitioning strategies, and every candidate in that search must be evaluated on the accelerator. This lengthy process can take days or even weeks to compile a full AI model.

Roller speeds up compilation while maintaining high computational efficiency. At its core, Roller works like a road roller: it smoothly lays high-dimensional tensor data onto a two-dimensional memory structure, much like tiling a floor, and picks tile sizes based on the properties of each memory layer. At the same time, Roller shapes the tiles to match the hardware characteristics of the underlying accelerator. Constraining the range of candidate shapes this way greatly narrows the search, which shortens compilation while still producing highly efficient code.
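Roller's key move, restricting tile shapes to ones that line up with the hardware so that only a small candidate set has to be considered, can be sketched for a matrix multiplication. This is a toy under stated assumptions, not Roller's algorithm: the 16-element alignment, the 8192-element memory capacity, and the reuse formula are all hypothetical, and the brute-force loop over aligned shapes stands in for Roller's own tile-construction procedure. Each candidate is scored with an analytical formula instead of being compiled and timed on the accelerator.

```python
from itertools import product


def aligned_sizes(limit, align):
    # Only consider tile edges that are multiples of the hardware alignment.
    return range(align, limit + 1, align)


def footprint(tm, tn, tk):
    # Elements resident in fast memory: an A tile, a B tile, and the C accumulator.
    return tm * tk + tk * tn + tm * tn


def reuse(tm, tn, tk):
    # Multiply-adds performed per element loaded from slower memory in one k-step.
    return (tm * tn * tk) / (tm * tk + tk * tn)


def pick_tile(M, N, K, capacity, align=16):
    """Return the aligned (tm, tn, tk) tile for C[M, N] = A[M, K] @ B[K, N]
    with the best data reuse that still fits in `capacity` elements."""
    best_key, best_tile = None, None
    for tm, tn, tk in product(aligned_sizes(M, align),
                              aligned_sizes(N, align),
                              aligned_sizes(K, align)):
        if footprint(tm, tn, tk) > capacity:
            continue  # tile would overflow this memory layer
        # Prefer higher reuse; break ties with the smaller footprint.
        key = (reuse(tm, tn, tk), -footprint(tm, tn, tk))
        if best_key is None or key > best_key:
            best_key, best_tile = key, (tm, tn, tk)
    return best_tile


print(pick_tile(128, 128, 128, capacity=8192))  # (64, 80, 16)
```

For 128 x 128 x 128 matrices, alignment leaves only 512 candidate shapes, and none of them requires a measurement on the hardware; that is the kind of pruning that lets a constrained approach avoid days of on-device search.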

Welder: Streamlining Memory Access

As DNN models consume ever higher-fidelity data and modern hardware accelerators ship ever faster computing cores, memory bandwidth has become a bottleneck. To counter this, Welder, a deep learning compiler, optimizes memory access efficiency across the end-to-end DNN model. Execution involves multiple stages in which input data is divided into blocks that flow through different operators and memory layers. Welder turns this process into an efficient assembly line, welding different operators and data blocks together so that less memory traffic reaches the lower-level memory layers.
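The traffic savings from welding operators together can be shown by counting slow-memory accesses for a two-operator pipeline processed tile by tile. This is a minimal sketch assuming each element load or store costs one unit of traffic and each tile fits entirely in fast memory; the operators and function names are hypothetical, and Welder's own scheduling covers far more general operator graphs than this fusion of two elementwise steps.

```python
def scale_shift(v):
    return 2 * v + 1      # hypothetical operator 1


def relu(v):
    return max(0.0, v)    # hypothetical operator 2


def run_unfused(x, tile):
    """Each operator makes its own pass, so the intermediate result is
    written to and then read back from slow memory."""
    traffic = 0
    tmp = [0.0] * len(x)
    for s in range(0, len(x), tile):
        block = x[s:s + tile]
        traffic += len(block)                    # load input tile
        tmp[s:s + tile] = [scale_shift(v) for v in block]
        traffic += len(block)                    # store intermediate tile
    out = [0.0] * len(x)
    for s in range(0, len(x), tile):
        block = tmp[s:s + tile]
        traffic += len(block)                    # reload intermediate tile
        out[s:s + tile] = [relu(v) for v in block]
        traffic += len(block)                    # store output tile
    return out, traffic


def run_fused(x, tile):
    """Welded pipeline: each tile passes through both operators while it sits
    in fast memory, so only the input and final output touch slow memory."""
    traffic = 0
    out = [0.0] * len(x)
    for s in range(0, len(x), tile):
        block = x[s:s + tile]
        traffic += len(block)                    # load input tile
        out[s:s + tile] = [relu(scale_shift(v)) for v in block]
        traffic += len(block)                    # store output tile
    return out, traffic


x = [-3.0, -1.0, 0.0, 2.0] * 4
print(run_unfused(x, tile=4)[1], run_fused(x, tile=4)[1])  # 64 32
```

Both versions compute the same result, but the fused one halves the slow-memory traffic because the intermediate tensor never leaves fast memory.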

Grinder: Mastering Control Flow Execution

In AI computation, complex control logic sometimes accompanies the movement of data blocks. Current AI compilers focus predominantly on data flow execution efficiency and offer little efficient support for control flow. Grinder bridges this gap by integrating control flow into data flow, so that both can execute efficiently on accelerators. It unifies the representation of AI models through uTask, a new abstraction, and uses heuristic strategies to optimize control flow execution at different levels of hardware parallelism. Grinder moves control flow into device kernels, which enables optimizations across control flow boundaries.
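Why moving control flow into kernels helps can be illustrated with a loop that runs until its values shrink below a tolerance. This toy model is an assumption-laden sketch, not Grinder's uTask machinery: the hypothetical `Device` class only counts kernel launches and device-to-host reads, and `step` is a hypothetical stand-in for real operators. Keeping the loop on the host costs one launch and one synchronization per iteration; running the whole loop inside a single kernel removes those round trips.

```python
class Device:
    """Toy accelerator model that only counts host/device interactions."""

    def __init__(self):
        self.launches = 0
        self.syncs = 0

    def launch(self, kernel, *args):
        self.launches += 1
        return kernel(*args)

    def read_back(self, value):
        self.syncs += 1   # device-to-host copy so the host can branch on it
        return value


def step(x):
    # One loop body: halve every element (stands in for a DNN operator).
    return [v / 2 for v in x]


def host_loop(dev, x, tol):
    # Control flow on the host: launch, copy the condition back, then decide.
    while True:
        x = dev.launch(step, x)
        if dev.read_back(max(abs(v) for v in x)) < tol:
            return x


def loop_kernel(x, tol):
    # Control flow inside the kernel: no host round trip per iteration.
    while True:
        x = step(x)
        if max(abs(v) for v in x) < tol:
            return x


def device_loop(dev, x, tol):
    return dev.launch(loop_kernel, x, tol)


host_dev, kernel_dev = Device(), Device()
host_loop(host_dev, [8.0, -4.0], tol=1.0)
device_loop(kernel_dev, [8.0, -4.0], tol=1.0)
print(host_dev.launches, host_dev.syncs, kernel_dev.launches, kernel_dev.syncs)  # 4 4 1 0
```

Both versions return the same values; the difference is that the host-driven loop pays a launch and a synchronization on every iteration, a cost that grows with the trip count.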

In summary, Microsoft Research's quartet of AI compilers (Rammer, Roller, Welder, and Grinder) paves the way for better parallel scheduling, faster compilation, more efficient memory access, and efficient control flow execution on hardware accelerators, marking a significant step forward in AI compiler technology.

Published via Towards AI
