All About Efficiency Metrics in ML
Last Updated on June 3, 2024 by Editorial Team
Author(s): Nandini Tengli
Originally published on Towards AI.
Recently, I have been working on optimizing ML models to improve their runtime efficiency on hardware. This article is a comprehensive list of popular efficiency metrics used to evaluate Machine Learning models. We will review what each metric means and how to calculate it.
- MACs
- FLOPs and FLOPS
- OPs and OPS
- Number of Parameters
- Number of Peak Activations
- Model Size
- Latency
- Throughput
A convention to note: a lowercase s next to an acronym (e.g., MACs) denotes the plural, while an uppercase S (e.g., FLOPS) denotes a rate of FLOPs per second.
These are the notations used:
MAC operations: Multiply-Accumulate operations
- in a Matrix-Vector Multiplication (MV): multiplying an (m × n) matrix by a length-n vector takes m · n MACs
- in a General Matrix-Matrix Multiplication (GEMM): multiplying an (m × n) matrix by an (n × k) matrix takes m · n · k MACs
To find the total number of MACs in a Neural Network, calculate the MACs for each layer and then add them all up.
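The per-layer counting above can be sketched in a few lines of Python. The layer shapes below (a 784 → 256 → 10 MLP) are an assumption for illustration, not from the article:

```python
def mv_macs(m, n):
    # Matrix-vector multiply: an (m x n) matrix times a length-n vector
    # performs m * n multiply-accumulate operations.
    return m * n

def gemm_macs(m, n, k):
    # GEMM: an (m x n) matrix times an (n x k) matrix performs
    # m * n * k multiply-accumulate operations.
    return m * n * k

# Hypothetical 2-layer MLP, 784 -> 256 -> 10 (shapes assumed for illustration).
# Each linear layer on a single input vector is a matrix-vector multiply.
layers = [(256, 784), (10, 256)]
total_macs = sum(mv_macs(m, n) for m, n in layers)
print(total_macs)  # 203264
```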
FLOP: Floating point operations
FLOPS: Floating point operations (FLOPs)/second
- One Multiply-Accumulate is 2 FLOPs (one multiply and one add)
- so if a NN has 724 M MACs, it performs 2 × 724 M = 1,448 M FLOPs ≈ 1.45 GFLOPs
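The conversion is a one-liner; here it is applied to the 724 M MACs example:

```python
def macs_to_flops(macs):
    # Each multiply-accumulate counts as one multiply plus one add: 2 FLOPs.
    return 2 * macs

flops = macs_to_flops(724 * 10**6)
print(flops / 1e9)  # 1.448 (GFLOPs)
```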
Number of OPs: Number of Operations.
This is a more general term than FLOP, and it is useful when we're talking about Neural Networks whose weights/activations are not floating-point values (e.g., when they are quantized).
- OPS = Operations (OPs)/second
Number of Parameters: Number of elements in the weight tensors of a Neural Network.
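Counting parameters is just summing the elements of each weight (and bias) tensor. A minimal sketch, again assuming a hypothetical 784 → 256 → 10 MLP:

```python
def linear_params(in_features, out_features, bias=True):
    # A linear layer's weight tensor has out_features * in_features elements,
    # plus out_features bias terms if a bias is used.
    return out_features * in_features + (out_features if bias else 0)

# Hypothetical MLP: 784 -> 256 -> 10 (shapes assumed for illustration)
n_params = linear_params(784, 256) + linear_params(256, 10)
print(n_params)  # 203530
```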
Number of Activations & Peak Number Activations:
Refers to the number of neurons, since activations are essentially the "neurons" of the Neural Network. Activation memory is the bottleneck when it comes to inference on IoT devices.
For a convolutional feature map, the number of activations in a layer is C × H × W (C: channels, H: height, W: width).
To find the total activations in the Neural Network, just add the activations of each layer!
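A quick sketch of both totals, using assumed feature-map shapes (not from the article). Note this takes the simplified view that peak memory is set by the single largest layer; a tighter estimate would also account for a layer's input and output being live at the same time:

```python
def conv_activations(c, h, w):
    # Activations produced by one layer: channels * height * width.
    return c * h * w

# Hypothetical feature-map shapes (C, H, W) for a small conv net
shapes = [(16, 32, 32), (32, 16, 16), (64, 8, 8)]
per_layer = [conv_activations(c, h, w) for c, h, w in shapes]

total_activations = sum(per_layer)  # 28672
peak_activations = max(per_layer)   # 16384: largest single layer
```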
Model Size: measures the storage size for the weights, for the given Neural Network.
Measured in megabytes (MB), kilobytes (KB), or bits
In general, assuming the whole neural network uses the same data type: Model Size = Number of Parameters × Bit Width per Parameter
Example:
- A model has 61 million parameters, all stored in 32-bit precision: 61 M × 32 bits = 61 M × 4 bytes = 244 MB
- If the 61 million parameters are instead stored in 8 bits: 61 M × 1 byte = 61 MB
This is how Quantization helps when you want to reduce model size
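The size formula above can be checked directly; this reproduces the 61 M-parameter example at 32-bit and 8-bit precision:

```python
def model_size_mb(num_params, bits_per_param):
    # Size in bytes = params * bits / 8; convert to megabytes (1 MB = 10**6 bytes).
    return num_params * bits_per_param / 8 / 1e6

print(model_size_mb(61_000_000, 32))  # 244.0 MB at 32-bit precision
print(model_size_mb(61_000_000, 8))   # 61.0 MB after 8-bit quantization
```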
Latency: measures the delay of a specific task
For instance, if the amount of time taken to process 1 frame is 2.6 ms, the latency of the network is 2.6 ms
- The higher the latency, the slower the model
- we are aiming for low latency
Calculating Latency:
Latency ≈ max(T_compute, T_memory), where T_compute = Number of Operations (NN) / OPS (processor) and T_memory = Data Movement (NN) / Memory Bandwidth (processor), assuming computation and memory access can overlap.
As seen from the above equations, some parameters are Neural-Network dependent (subscripted with NN) and some of them are processor-dependent.
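A minimal sketch of this roofline-style estimate. The workload and hardware numbers below are assumptions for illustration, not measurements from the article:

```python
def latency_estimate(ops_nn, bytes_moved_nn, ops_per_sec, bytes_per_sec):
    # Roofline-style bound: if compute and memory transfers overlap,
    # latency is limited by whichever takes longer.
    t_compute = ops_nn / ops_per_sec           # NN-dependent / processor-dependent
    t_memory = bytes_moved_nn / bytes_per_sec  # NN-dependent / processor-dependent
    return max(t_compute, t_memory)

# Assumed numbers: 1.45 GOPs and 244 MB moved per inference,
# on a processor with 1 TOPS compute and 100 GB/s memory bandwidth.
latency_s = latency_estimate(1.45e9, 244e6, 1e12, 100e9)
# Here t_memory (2.44 ms) dominates t_compute (1.45 ms): memory-bound.
```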
Throughput: measures the rate at which data is processed
For instance, a network that processes 77.4 frames per second has a throughput of 77.4 frames/s; processing only 6 frames/s would be a lower throughput.
- we are aiming for a high throughput
Higher throughput does not necessarily translate to lower latency, and lower latency does not necessarily translate to higher throughput.
As we can see in the image above, setup a) on the left has lower latency (50 ms per image) but lower throughput, while setup b) has much higher throughput but also higher latency (100 ms per image).
This is how parallel processing increases throughput (increase the number of processors to increase the number of items processed per second). However, reducing latency is not as simple.
One way we could reduce latency would be overlapping the data load with the compute. For instance, if the data load for the 2nd layer overlaps with the computing of the 1st layer, it will reduce the latency of processing each frame through the network.
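Both effects can be sketched with simple arithmetic. The per-layer load/compute times below are assumptions for illustration:

```python
def overlapped_layer_time(t_load, t_compute):
    # If loading layer i+1's weights overlaps with computing layer i,
    # each step costs max(load, compute) instead of load + compute.
    return max(t_load, t_compute)

# Parallelism raises throughput without changing per-item latency:
# one processor at 100 ms/image -> 10 images/s; four in parallel -> 40 images/s,
# yet each image still takes 100 ms.
throughput_single = 1 / 0.100   # 10.0 images/s
throughput_parallel = 4 / 0.100  # 40.0 images/s

# Overlapping data load with compute reduces latency per frame.
# Assumed (load_ms, compute_ms) per layer:
layers = [(0.4, 1.0), (0.3, 0.8), (0.5, 1.2)]
sequential_ms = sum(l + c for l, c in layers)                      # ~4.2 ms
overlapped_ms = sum(overlapped_layer_time(l, c) for l, c in layers)  # ~3.0 ms
```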
Energy Efficiency: Memory references are far more expensive than compute operations; a DRAM access can cost orders of magnitude more energy than an arithmetic operation. As you can see from the graph below: