All About Efficiency Metrics in ML

Last Updated on June 3, 2024 by Editorial Team

Author(s): Nandini Tengli

Originally published on Towards AI.

Recently, I have been working on optimizing ML models to improve their runtime efficiency on hardware. This article is a comprehensive list of the popular metrics used to evaluate the efficiency of machine learning models. We will review what each one means and how to calculate it.

  1. MACs
  2. FLOPs and FLOPS
  3. OPs and OPS
  4. Number of Parameters
  5. Number of Peak Activations
  6. Model Size
  7. Latency
  8. Throughput

A convention to note: a lowercase s after an acronym (e.g., MACs) denotes the plural, while an uppercase S (e.g., FLOPS) denotes a rate, i.e., FLOPs per second.

The notations used are shown in the image below:

[Image: Notations used]

MAC operations: Multiply-Accumulate operations. A single MAC computes a ← a + (b × c), i.e., one multiply followed by one add.

  • in a Matrix-Vector Multiplication (MV) of an m × n matrix with an n-vector: m · n MACs
  • in a General Matrix-Matrix Multiplication (GEMM) of an m × n matrix with an n × k matrix: m · n · k MACs

[Image: Table for how to calculate the number of MAC operations for each type of layer in a Neural Network — https://hanlab.mit.edu/courses/2023-fall-65940]
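Since the table above appears as an image in the original, here is a sketch of the standard per-layer MAC counts, following the conventions of the cited MIT 6.5940 course (c_i/c_o: input/output channels; k_h × k_w: kernel size; h_o × w_o: output feature map; batch size 1):

```latex
\begin{aligned}
\text{Linear:}\quad & c_o \cdot c_i\\
\text{2D Convolution:}\quad & c_i \cdot k_h \cdot k_w \cdot h_o \cdot w_o \cdot c_o\\
\text{Grouped Convolution (} g \text{ groups):}\quad & \tfrac{c_i}{g} \cdot k_h \cdot k_w \cdot h_o \cdot w_o \cdot c_o\\
\text{Depthwise Convolution:}\quad & k_h \cdot k_w \cdot h_o \cdot w_o \cdot c_o
\end{aligned}
```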

To find the total number of MACs in a Neural Network, calculate the MACs for each layer and then add them all up.
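As a concrete illustration, here is a minimal Python sketch of this layer-by-layer accounting (the two-layer network and all of its dimensions below are hypothetical):

```python
def linear_macs(c_in: int, c_out: int) -> int:
    # One MAC per (input feature, output feature) pair.
    return c_in * c_out

def conv2d_macs(c_in: int, c_out: int, k_h: int, k_w: int,
                h_out: int, w_out: int) -> int:
    # Each of the c_out * h_out * w_out output elements needs
    # c_in * k_h * k_w multiply-accumulates.
    return c_out * h_out * w_out * c_in * k_h * k_w

# Hypothetical 2-layer network: a 3x3 conv on a 32x32 RGB image,
# followed by a linear classifier head.
total_macs = (conv2d_macs(3, 16, 3, 3, 32, 32)   # 442,368 MACs
              + linear_macs(16 * 32 * 32, 10))   # 163,840 MACs
print(f"Total MACs: {total_macs:,}")             # 606,208
```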

FLOP: Floating-Point Operation

FLOPS: Floating point operations (FLOPs)/second

  • One Multiply-Accumulate is 2 FLOPs (one multiply and one add)
  • so if a Neural Network has 724 M MACs, it performs 2 × 724 M = 1,448 M FLOPs ≈ 1.45 GFLOPs

Number of OPs: Number of Operations.

This is a more general term than FLOP, and it is useful when we are talking about Neural Networks whose weights/activations are not floating-point values (e.g., when they are quantized).

  • OPS = Operations (OPs)/second
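To make the FLOPs-versus-FLOPS distinction concrete, here is a small sketch that counts the FLOPs of a GEMM analytically and then measures the FLOPS actually achieved by a NumPy matmul (the matrix sizes are arbitrary):

```python
import time
import numpy as np

m = n = k = 1024
a = np.random.rand(m, k).astype(np.float32)
b = np.random.rand(k, n).astype(np.float32)

macs = m * n * k   # one MAC per (row, column, inner-index) triple
flops = 2 * macs   # each MAC = 1 multiply + 1 add

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

print(f"Work done:     {flops / 1e9:.2f} GFLOPs")
print(f"Achieved rate: {flops / elapsed / 1e9:.2f} GFLOPS")
```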

Number of Parameters: Number of elements in the weight tensors of a Neural Network.

[Image: Calculating the number of parameters for each type of layer — https://hanlab.mit.edu/courses/2023-fall-65940]
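In practice, deep learning frameworks make this count easy to obtain. For example, a minimal PyTorch sketch (the toy model, and the assumption of a 32 × 32 input for sizing the linear layer, are mine):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 16*3*3*3 + 16 = 448 params
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),                 # 16384*10 + 10 = 163,850 params
)

num_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {num_params:,}")       # 164,298
```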

Number of Activations & Peak Number of Activations:

This essentially refers to the number of neurons, since activations are the outputs of the 'neurons' in the Neural Network. Activation memory is often the memory bottleneck for inference on IoT devices.

For a layer whose output has shape C × H × W (C: channels, H: height, W: width), the number of activations of that layer is C · H · W.

To find the total activations in the Neural Network, just add the activations of each layer!
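A minimal PyTorch sketch of this bookkeeping, using forward hooks to record the size of each layer's output (the toy model and input shape are hypothetical):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
)

activations = []

def count_activations(module, inputs, output):
    # Number of activations per sample = C * H * W of the output.
    activations.append((module.__class__.__name__, output[0].numel()))

handles = [m.register_forward_hook(count_activations) for m in model]
model(torch.randn(1, 3, 32, 32))  # one 32x32 RGB image
for h in handles:
    h.remove()

for name, count in activations:
    print(f"{name}: {count:,} activations")
print(f"Total: {sum(c for _, c in activations):,}")
print(f"Peak:  {max(c for _, c in activations):,}")
```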

Model Size: measures the storage required for the weights of a given Neural Network.

It is typically measured in MB (megabytes), KB (kilobytes), or bits.

In general, assuming the whole neural network uses the same data type:

Model Size = Number of Parameters × Bit Width

Example:

  • A model with 61 million parameters, all stored in 32-bit precision: 61 M × 32 bits = 1,952 Mbit = 244 MB
  • If the same 61 million parameters are instead stored in 8-bit precision: 61 M × 8 bits = 488 Mbit = 61 MB

This is how quantization helps when you want to reduce model size.
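The same arithmetic as a tiny Python helper (decimal megabytes, i.e., 10^6 bytes, assumed):

```python
def model_size_mb(num_params: int, bit_width: int) -> float:
    """Model size in MB, assuming every weight uses the same bit width."""
    return num_params * bit_width / 8 / 1e6

print(model_size_mb(61_000_000, 32))  # 244.0 MB (fp32)
print(model_size_mb(61_000_000, 8))   #  61.0 MB (int8)
```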

Latency: measures the delay for completing a specific task.

For instance, if the amount of time taken to process 1 frame is 2.6 ms, the latency of the network is 2.6 ms

  • The higher the latency, the slower the model
  • we are aiming for low latency

Calculating Latency:

Following the cited MIT course, latency can be approximated as:

Latency ≈ max(T_computation, T_memory)

T_computation = Number of MACs (NN) / (MACs per second of the processor)

T_memory = T_weights + T_activations, where, for example, T_weights = Model Size (NN) / Memory Bandwidth (processor)

As seen from the above equations, some parameters are Neural-Network dependent (subscripted with NN) and some are processor-dependent.
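A minimal sketch of this estimate in Python; all the network and hardware numbers below are hypothetical:

```python
def estimate_latency(macs_nn, weight_bytes_nn, activation_bytes_nn,
                     macs_per_s, mem_bandwidth_bytes_per_s):
    """Roofline-style latency estimate: the slower of compute and memory.

    The *_nn arguments are Neural-Network dependent; the last two
    arguments are processor-dependent.
    """
    t_compute = macs_nn / macs_per_s
    t_memory = (weight_bytes_nn + activation_bytes_nn) / mem_bandwidth_bytes_per_s
    return max(t_compute, t_memory)

# Hypothetical: 724 M MACs, 15 MB of weights, 5 MB of activations,
# on a processor with 2 TMAC/s of compute and 25 GB/s of memory bandwidth.
latency_s = estimate_latency(724e6, 15e6, 5e6, 2e12, 25e9)
print(f"Estimated latency: {latency_s * 1e3:.2f} ms")  # memory-bound: 0.80 ms
```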

Throughput: measures the rate at which data is processed

For instance, if a network processes 77.4 frames in a second, its throughput is 77.4 frames/s; a network that processes only 6 frames/s has a lower throughput.

  • we are aiming for a high throughput

Higher throughput does not necessarily translate to lower latency, and lower latency does not necessarily translate to higher throughput.

[Image: Left: Latency = 50 ms, Throughput = 20 images/s. Right: Latency = 100 ms, Throughput = 40 images/s — https://hanlab.mit.edu/courses/2023-fall-65940]

As we can see in the image above, the setup on the left has lower latency (50 ms per image) but lower throughput, while the setup on the right has much higher throughput but also higher latency (100 ms per image).

This is how parallel processing increases throughput: increasing the number of processors increases the number of items processed per second. Reducing latency, however, is not as simple.

One way to reduce latency is to overlap data loading with compute. For instance, if loading the data for the 2nd layer overlaps with the computation of the 1st layer, the latency of processing each frame through the network decreases.
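To see this trade-off empirically, here is a small NumPy sketch that times a toy 'model' (a single dense layer) at different batch sizes; the layer size and batch sizes are arbitrary:

```python
import time
import numpy as np

w = np.random.rand(4096, 4096).astype(np.float32)  # toy "model" weights

def benchmark(batch_size: int, n_iters: int = 20):
    x = np.random.rand(batch_size, 4096).astype(np.float32)
    start = time.perf_counter()
    for _ in range(n_iters):
        _ = x @ w
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / n_iters * 1e3           # time per batch
    throughput = batch_size * n_iters / elapsed    # items per second
    print(f"batch={batch_size:3d}: latency={latency_ms:7.2f} ms/batch, "
          f"throughput={throughput:8.0f} items/s")

benchmark(1)   # lowest latency per batch, lower throughput
benchmark(32)  # higher latency per batch, usually higher throughput
```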

Energy Efficiency: memory references are very expensive compared to compute operations, as you can see from the graph below:

[Image: energy cost per operation — source: https://hanlab.mit.edu/courses/2023-fall-65940]
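For a rough sense of scale, the widely cited Horowitz (ISSCC 2014) estimates for a 45 nm process, which graphs like this one typically draw on, put a 32-bit DRAM read at roughly 640 pJ versus about 0.9 pJ for a 32-bit floating-point add. This orders-of-magnitude gap is why reducing memory traffic (fewer parameters, fewer activations, lower bit widths) is central to energy-efficient ML.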
