All About Efficiency Metrics in ML
Last Updated on June 3, 2024 by Editorial Team
Author(s): Nandini Tengli
Originally published on Towards AI.
Recently, I have been working on optimizing ML models to improve their runtime efficiency on hardware. This article is a comprehensive list of popular efficiency metrics used to evaluate Machine Learning models. We will review what each metric means and how to calculate it.
- MACs
- FLOPs and FLOPS
- OPs and OPS
- Number of Parameters
- Number of Peak Activations
- Model Size
- Latency
- Throughput
A convention to note: a lowercase s next to an acronym (e.g., MACs) denotes the plural, while an uppercase S (e.g., FLOPS) denotes a rate of FLOPs per second.
These are the notations used:
MAC operations: Multiply-Accumulate operations
- in a Matrix-Vector Multiplication (MV): multiplying an (m × n) matrix by a length-n vector takes m · n MACs
- in a General Matrix-Matrix Multiplication (GEMM): multiplying an (m × n) matrix by an (n × k) matrix takes m · n · k MACs
To find the total number of MACs in a Neural Network, calculate the MACs for each layer and then add them all up.
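The per-layer counting above can be sketched in a few lines of Python. The layer shapes below (a 784 → 256 → 10 MLP) are an assumption for illustration, not from the article:

```python
def mv_macs(m, n):
    # Matrix-vector multiply: an (m x n) matrix times a length-n vector
    # performs m * n multiply-accumulate operations.
    return m * n

def gemm_macs(m, n, k):
    # GEMM: an (m x n) matrix times an (n x k) matrix performs
    # m * n * k multiply-accumulate operations.
    return m * n * k

# Hypothetical 2-layer MLP, 784 -> 256 -> 10 (shapes assumed for illustration).
# Each linear layer on a single input vector is a matrix-vector multiply.
layers = [(256, 784), (10, 256)]
total_macs = sum(mv_macs(m, n) for m, n in layers)
print(total_macs)  # 203264
```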
FLOP: Floating point operations
FLOPS: Floating point operations (FLOPs)/second
- One Multiply-Accumulate is 2 FLOPs (one multiply and one add)
- so if a NN has 724 M MACs, it performs 2 × 724 M = 1,448 M FLOPs ≈ 1.45 GFLOPs
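The conversion is a one-liner; here it is applied to the 724 M MACs example:

```python
def macs_to_flops(macs):
    # Each multiply-accumulate counts as one multiply plus one add: 2 FLOPs.
    return 2 * macs

flops = macs_to_flops(724 * 10**6)
print(flops / 1e9)  # 1.448 (GFLOPs)
```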
Number of OPs: Number of Operations.
This is a more general term than FLOP, and it is useful when we're talking about Neural Networks whose weights/activations are not floating-point values (e.g., when they are quantized).
- OPS = Operations (OPs)/second
Number of Parameters: Number of elements in the weight tensors of a Neural Network.
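Counting parameters is just summing the elements of each weight (and bias) tensor. A minimal sketch, again assuming a hypothetical 784 → 256 → 10 MLP:

```python
def linear_params(in_features, out_features, bias=True):
    # A linear layer's weight tensor has out_features * in_features elements,
    # plus out_features bias terms if a bias is used.
    return out_features * in_features + (out_features if bias else 0)

# Hypothetical MLP: 784 -> 256 -> 10 (shapes assumed for illustration)
n_params = linear_params(784, 256) + linear_params(256, 10)
print(n_params)  # 203530
```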
Number of Activations & Peak Number Activations:
Refers to the number of neurons, since activations are essentially the "neurons" of the Neural Network. Activation memory is the bottleneck when it comes to inference on IoT devices.
For a convolutional feature map, the number of activations in a layer is C × H × W (C: channels, H: height, W: width).
To find the total activations in the Neural Network, just add the activations of each layer!
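A quick sketch of both totals, using assumed feature-map shapes (not from the article). Note this takes the simplified view that peak memory is set by the single largest layer; a tighter estimate would also account for a layer's input and output being live at the same time:

```python
def conv_activations(c, h, w):
    # Activations produced by one layer: channels * height * width.
    return c * h * w

# Hypothetical feature-map shapes (C, H, W) for a small conv net
shapes = [(16, 32, 32), (32, 16, 16), (64, 8, 8)]
per_layer = [conv_activations(c, h, w) for c, h, w in shapes]

total_activations = sum(per_layer)  # 28672
peak_activations = max(per_layer)   # 16384: largest single layer
```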
Model Size: measures the storage size for the weights, for the given Neural Network.
Measured in megabytes (MB), kilobytes (KB), or bits
In general, assuming the whole neural network uses the same data type: Model Size = Number of Parameters × Bit Width per Parameter
Example:
- A model has 61 million parameters, all stored in 32-bit precision: 61 M × 32 bits = 61 M × 4 bytes = 244 MB
- If the 61 million parameters are instead stored in 8 bits: 61 M × 1 byte = 61 MB
This is how Quantization helps when you want to reduce model size
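The size formula above can be checked directly; this reproduces the 61 M-parameter example at 32-bit and 8-bit precision:

```python
def model_size_mb(num_params, bits_per_param):
    # Size in bytes = params * bits / 8; convert to megabytes (1 MB = 10**6 bytes).
    return num_params * bits_per_param / 8 / 1e6

print(model_size_mb(61_000_000, 32))  # 244.0 MB at 32-bit precision
print(model_size_mb(61_000_000, 8))   # 61.0 MB after 8-bit quantization
```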
Latency: measures the delay of a specific task
For instance, if the amount of time taken to process 1 frame is 2.6 ms, the latency of the network is 2.6 ms
- The higher the latency, the slower the model
- we are aiming for low latency
Calculating Latency:
Latency ≈ max(T_compute, T_memory), where T_compute = Number of Operations (NN) / OPS (processor) and T_memory = Data Movement (NN) / Memory Bandwidth (processor), assuming computation and memory access can overlap.
As seen from the above equations, some parameters are Neural-Network dependent (subscripted with NN) and some of them are processor-dependent.
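A minimal sketch of this roofline-style estimate. The workload and hardware numbers below are assumptions for illustration, not measurements from the article:

```python
def latency_estimate(ops_nn, bytes_moved_nn, ops_per_sec, bytes_per_sec):
    # Roofline-style bound: if compute and memory transfers overlap,
    # latency is limited by whichever takes longer.
    t_compute = ops_nn / ops_per_sec           # NN-dependent / processor-dependent
    t_memory = bytes_moved_nn / bytes_per_sec  # NN-dependent / processor-dependent
    return max(t_compute, t_memory)

# Assumed numbers: 1.45 GOPs and 244 MB moved per inference,
# on a processor with 1 TOPS compute and 100 GB/s memory bandwidth.
latency_s = latency_estimate(1.45e9, 244e6, 1e12, 100e9)
# Here t_memory (2.44 ms) dominates t_compute (1.45 ms): memory-bound.
```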
Throughput: measures the rate at which data is processed
For instance, a network that processes 77.4 frames per second has a throughput of 77.4 frames/s; processing only 6 frames/s would be a lower throughput.
- we are aiming for a high throughput
Higher throughput does not necessarily translate to lower latency, and lower latency does not necessarily translate to higher throughput.
As we can see in the image above, setup a) on the left has lower latency (50 ms per image) but lower throughput, while setup b) has much higher throughput but also higher latency (100 ms per image).
This is how parallel processing increases throughput (increase the number of processors to increase the number of items processed per second). However, reducing latency is not as simple.
One way we could reduce latency would be overlapping the data load with the compute. For instance, if the data load for the 2nd layer overlaps with the computing of the 1st layer, it will reduce the latency of processing each frame through the network.
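Both effects can be sketched with simple arithmetic. The per-layer load/compute times below are assumptions for illustration:

```python
def overlapped_layer_time(t_load, t_compute):
    # If loading layer i+1's weights overlaps with computing layer i,
    # each step costs max(load, compute) instead of load + compute.
    return max(t_load, t_compute)

# Parallelism raises throughput without changing per-item latency:
# one processor at 100 ms/image -> 10 images/s; four in parallel -> 40 images/s,
# yet each image still takes 100 ms.
throughput_single = 1 / 0.100   # 10.0 images/s
throughput_parallel = 4 / 0.100  # 40.0 images/s

# Overlapping data load with compute reduces latency per frame.
# Assumed (load_ms, compute_ms) per layer:
layers = [(0.4, 1.0), (0.3, 0.8), (0.5, 1.2)]
sequential_ms = sum(l + c for l, c in layers)                      # ~4.2 ms
overlapped_ms = sum(overlapped_layer_time(l, c) for l, c in layers)  # ~3.0 ms
```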
Energy Efficiency: Memory references are far more expensive than compute operations; a DRAM access can cost orders of magnitude more energy than an arithmetic operation. As you can see from the graph below: