MAP in Object Detection: I Bet You’ll Remember This Forever!
Last Updated on September 29, 2025 by Editorial Team
Author(s): Debasish Das
Originally published on Towards AI.

Hey there! 👋
Ever trained an object detection model and wondered, “Is this thing actually any good?” Welcome to the club!
If terms like MAP, AP, and IoU make your brain go “404 error,” you’re in the right place. We’re about to break down these scary-sounding metrics into bite-sized pieces that actually make sense.
What you’ll master in 10 minutes:
— 🎯 IoU: How well do predictions overlap with truth?
— 🔄 NMS: Eliminating duplicate detections
— ⚖️ Precision/Recall: The accuracy trade-off
— 📊 AP: Single-class performance metric
— 🏆 MAP: Overall model evaluation score
I believe images often explain things better than text; a good visual representation is the easiest way to learn any complex problem.
No PhD required, just grab your coffee and let’s make these metrics your friends! ☕
Ready? Let’s dive in!

Object Detection Intro
Basically, an object detection model identifies pre-defined object classes in an image by predicting bounding boxes around them. Popular examples include R-CNN, YOLO, and Detectron.
To evaluate the performance of an object detection model, we use various metrics, including Intersection over Union (IoU), Average Precision (AP), mean Average Precision (mAP), and other evaluation measures.
What is IoU?
Intersection over Union (IoU), also known as the Jaccard Index, is a key metric in computer vision, particularly in object detection and image segmentation. It measures the alignment between a predicted bounding box or segmentation mask and the ground truth boundary of an object.

- Find the overlap (intersection area) between predicted and ground truth boxes
- Find the total area both boxes cover together (union area)
- Divide overlap by total area
IoU Formula:
IoU = Area of Intersection / Area of Union
- Intersection Area: The overlapping region shared by both the predicted and ground truth bounding boxes.
- Union Area: The total area covered by both bounding boxes combined, accounting for the overlapping region only once.
This calculation results in an IoU score between 0 and 1, where:
- 0: Indicates no overlap between the predicted and ground truth bounding boxes.
- 1: Represents a perfect match, signifying complete overlap.
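To make this concrete, here is a minimal Python sketch of the IoU calculation for two axis-aligned boxes. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates; the function name and example values are my own illustration, not from any particular library.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Intersection area is zero when the boxes do not overlap
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

    # Union counts the overlapping region only once
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0 (perfect match)
```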

Understanding NMS (Non-maximum Suppression)

Object detection models often generate multiple bounding boxes for the same object, especially when the model is not perfectly accurate. These overlapping boxes can lead to inaccurate counts and reduced precision in object detection.
NMS addresses this issue by selecting the best bounding box for each object and discarding the redundant ones. It works by:
NMS Algorithm:
- Perform NMS separately for each class
- Sort the list of predictions based on their confidence score
- Remove the prediction with the highest confidence score from the prediction list and add it to the detection list.
- Remove all remaining predictions that have a high overlap (IoU) with the selected highest-confidence prediction (see the sketch below).
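Here is a minimal sketch of that greedy loop for a single class, reusing the iou() helper from the IoU section above. The (confidence, box) tuple format and the 0.5 overlap threshold are illustrative assumptions.

```python
def nms(predictions, iou_threshold=0.5):
    """Greedy NMS for one class. predictions: list of (confidence, box) tuples."""
    # Sort by confidence, highest first
    predictions = sorted(predictions, key=lambda p: p[0], reverse=True)
    detections = []
    while predictions:
        # Keep the highest-confidence prediction
        best = predictions.pop(0)
        detections.append(best)
        # Discard remaining predictions that overlap it too much
        predictions = [p for p in predictions
                       if iou(p[1], best[1]) < iou_threshold]
    return detections
```

For a multi-class detector you would call nms() once per class, exactly as step 1 of the algorithm says.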
Algorithm Example:
Step 1: In the image below, there are two men and one car visible. However, there are several predicted boxes for the same object, which is a significant problem.

Step 2: Our NMS algorithm separates the classes and sorts the predicted bounding boxes based on the confidence score.

Step 3: For each class, we take the prediction with the highest confidence score and add it to the output detection list. We then remove any boxes whose overlap with this selected box exceeds a pre-defined overlap ratio. We repeat this process for every object class until the prediction list is empty.
In our example, we first select the Person prediction with the highest confidence and then eliminate all remaining Person predictions that have significant overlap with it.




Finally, we obtain our output detection list from the image using the NMS algorithm.
IoU Threshold, Confidence Threshold
IoU threshold decides if a predicted bounding box matches the ground truth based on overlap: predictions with IoU above the chosen value (like 0.5 or 0.75) are counted as true positives.
Confidence threshold determines which model predictions are accepted: raising it means only high-confidence detections are kept, which typically increases precision but lowers recall because more detections are filtered out.
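As a quick illustration (the prediction format and the 0.5 value below are assumptions for the sketch, not taken from any specific library):

```python
# Each prediction: (class_name, confidence, box)
predictions = [
    ("person", 0.91, (12, 30, 80, 200)),
    ("person", 0.42, (15, 28, 85, 210)),
    ("car",    0.77, (120, 60, 300, 180)),
]

confidence_threshold = 0.5  # raise this for higher precision, lower recall
accepted = [p for p in predictions if p[1] >= confidence_threshold]
print(len(accepted))  # 2 -- the 0.42 person box is filtered out
```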
Precision vs Recall Explained
- Accuracy, Precision, Recall:
Accuracy: Accuracy measures the proportion of total predictions that the model got correct. Accuracy can be misleading in cases of imbalanced datasets, where one class is much more frequent than the other.
Precision: “Of all the instances the model predicted as positive, how many were actually positive?” Precision is useful when the cost of a false positive is high.
Recall: “Of all the instances that were actually positive, how many did the model correctly identify?” Recall is important when the cost of a false negative is high.

Precision Formula:
Precision = TP / (TP + FP)
Recall Formula:
Recall = TP / (TP + FN)
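In code these formulas are one-liners; the counts below are made-up numbers purely to show the arithmetic.

```python
tp, fp, fn = 8, 2, 4                  # hypothetical confusion-matrix counts

precision = tp / (tp + fp)            # 8 / 10 = 0.80
recall = tp / (tp + fn)               # 8 / 12 ≈ 0.67
print(f"precision={precision:.2f}, recall={recall:.2f}")
```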

High precision indicates accurate predictions, but low recall means that not every object is detected.

High recall indicates that every object in the image is detected, but low precision results in many bounding boxes that are false positives.

Essentially, we want a model that has high recall and high precision so we can make accurate predictions.
- Average Precision (AP) and mean Average Precision (mAP):
Average Precision (AP) is a metric for evaluating object detection models that quantifies the trade-off between Precision and Recall across a range of confidence scores for a specific class, ultimately representing the area under the precision-recall curve.
The area under the Precision-Recall (P-R) curve, known as Average Precision (AP), serves as a single metric to summarize the performance of an object detection model for one class. A model that maintains high precision at all levels of recall will achieve a high AP score, while a model that yields high precision on only a subset of its detections will not perform as well.
In multi-class object detection challenges, such as PASCAL VOC, mean Average Precision (mAP) is commonly utilized to evaluate model performance. mAP is calculated as the mean of AP across all classes.
Difference Between AP and mAP:
AP (Average Precision) measures a model’s performance for a single object class, while mAP (mean Average Precision) is the average of the AP values across all object classes in a multi-class scenario. AP summarizes the precision-recall curve for one class, while mAP provides an overall evaluation of the model’s performance on all classes by averaging these individual AP scores.
Calculate AP and mAP:
1. Generate the prediction scores using the model.
In the image below, we have two classes: Person and Car, to detect using an object detection model like YOLO.

The image below shows the ground truth bounding boxes used for evaluating the model’s detection performance.

Here, the red boxes represent the objects predicted by the YOLO model. We now need to identify the number of distinct object classes and the confidence score (prediction score) for each of the predicted boxes.

2. Convert the prediction scores to class labels.
We can now see the different classes and confidence scores of each box. In the image below, we have two classes: Person and Car. Next, we will look at how to calculate the confusion matrix.

3. Calculate the confusion matrix — TP, FP, TN, FN.
In this step, we will learn how to calculate true positives and false positives. We begin by setting an Intersection over Union (IoU) threshold; in our case, we have chosen 0.65.
Next, we select one class from the predictions and arrange it in decreasing order based on the prediction scores. For this example, we will focus on the Person class and choose the bounding box (BBOX) with the highest prediction score.
After selecting the highest prediction score bounding box, we calculate the IoU for each ground truth bounding box corresponding to the person class. We take the highest IoU value from this calculation. If the IoU score exceeds the threshold (in this case, 0.65), the predicted bounding box is considered a True Positive (TP). For instance, since we obtained an IoU score of 0.71, which is greater than 0.65, this predicted box is classified as a True Positive.

Next, we analyse the second-highest-scoring predicted box from the Person class and calculate its IoU in the same way, classifying the result as either a True Positive (TP) or a False Positive (FP). In this case, the IoU score falls below the 0.65 threshold (around 0.59), so this predicted box is classified as a False Positive.

We will calculate the True Positives (TP) and False Positives (FP) for the Person Class. Then, we will create cumulative TP and FP columns from top to bottom.

4. The model evaluation helper metrics — IoU, Confusion Matrix, Precision, and Recall.
Now we calculate Precision and Recall.
precision = (True Positive) / (True Positive + False Positive)
recall = (True Positive) / num(Ground Truth boxes)
Let us take the 3rd row of the table below. We calculate
precision = 2 / (2 + 1) = 0.67 and recall = 2 / 5 = 0.4, where 5 is the number of ground-truth Person boxes in the image.
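Putting steps 3 and 4 together, here is a small sketch of how the cumulative table for one class could be built. The TP/FP flags and the 5 ground-truth boxes mirror the Person-class example above, but the exact values are only illustrative.

```python
# Detections for one class, already sorted by confidence.
# True = counted as TP (IoU above threshold), False = FP.
is_tp = [True, False, True, True, False]
num_ground_truth = 5  # ground-truth Person boxes in the image

tp_cum = fp_cum = 0
for rank, flag in enumerate(is_tp, start=1):
    tp_cum += flag
    fp_cum += not flag
    precision = tp_cum / (tp_cum + fp_cum)
    recall = tp_cum / num_ground_truth
    print(f"row {rank}: precision={precision:.2f}, recall={recall:.2f}")

# Row 3 gives precision = 2/3 ≈ 0.67 and recall = 2/5 = 0.40, matching the example.
```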

5. Plot the Precision-Recall curve.
Every pair of Precision and Recall values is plotted in a 2D coordinate system, with Recall on the x-axis and Precision on the y-axis. Once we have the (Recall, Precision) pairs for the five points, we will draw the graph on the 2D coordinate system and find the area under the curve. This area represents the Average Precision (AP) for the person class.


6. Calculate Average Precision (AP) using the PASCAL VOC 11-point interpolation method.
- Identify 11 equally spaced recall values from 0 to 1: 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0.
- For each of these 11 recall points, find the maximum precision value from that point onwards in the P-R curve. This is the “interpolated precision”.
- Formula: Precision[i] = max(Precision[i+1], Precision[i]), applied from the last recall point backwards along the curve.
- For instance, for recall point 0.3, you would find the highest precision among all points on the curve with a recall of 0.3 or greater.
- Sum the 11 interpolated precision values found in the previous step.

- Divide the sum by 11 to get the Average Precision.
- AP = (1/11) × Σ (Interpolated Precision at Recall = 0.0, 0.1, ..., 1.0), as sketched in the code below.
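A minimal sketch of the 11-point procedure, assuming you already have the (recall, precision) pairs from the P-R curve as two lists (here the values from the cumulative-table sketch above are reused):

```python
def voc_ap_11_point(recalls, precisions):
    """PASCAL VOC 11-point interpolated Average Precision for one class."""
    recall_points = (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
    total = 0.0
    for r in recall_points:
        # Interpolated precision: best precision at any recall >= r (0 if none)
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        total += max(candidates) if candidates else 0.0
    return total / 11.0


# (recall, precision) pairs from the Person-class example
recalls = [0.2, 0.2, 0.4, 0.6, 0.6]
precisions = [1.0, 0.5, 0.67, 0.75, 0.6]
print(voc_ap_11_point(recalls, precisions))  # ≈ 0.545
```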

7. Find mean Average Precision (mAP) by averaging APs.
To calculate the mean Average Precision (mAP), we determine the Average Precision for each class in the dataset and average them: mAP = (1/n) × Σ AP_i, where n is the total number of classes present in the dataset.
IoU threshold explanation (why 0.5, 0.65, 0.75 matter):
IoU threshold is a decision boundary that determines whether your model’s prediction is considered “correct” or “wrong”.
If IoU ≥ Threshold → ✅ True Positive (Good Detection)
If IoU < Threshold → ❌ False Positive (Bad Detection)
Common Threshold Values:
- IoU ≥ 0.5: Loose evaluation — Detection considered good if 50% overlap
- IoU ≥ 0.7: Stricter evaluation — Requires 70% overlap
- IoU ≥ 0.9: Very strict — Demands near-perfect alignment
Real-World Impact:
- Lower Threshold (0.5): More detections marked as "correct". Higher precision/recall scores. Good for general object detection.
- Higher Threshold (0.7–0.9): Fewer detections marked as "correct". Lower precision/recall scores. Better for precise applications (medical imaging, autonomous driving).
mAP@0.5:0.95 explanation:
mAP@0.5:0.95 (COCO evaluation) is calculated by first measuring Average Precision (AP) for each object class at ten different Intersection over Union (IoU) thresholds (from 0.5 to 0.95 in steps of 0.05). For every class, AP is the area under the precision-recall curve considering only detections with IoU above the current threshold. After getting AP values for all thresholds and all classes, they are averaged: first across thresholds for each class, then across all classes. This gives a single score that reflects balanced model performance for both loose and tight localization — all in one number.
mAP@0.5:0.95 is like getting grades from 10 different teachers (each with different strictness levels) and taking the average. It gives a more honest, balanced evaluation of your model’s true performance! 🎯
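Conceptually, the COCO metric is just a double average: AP over ten thresholds, then over classes. The sketch below assumes a hypothetical ap_for_class(class_id, iou_threshold) function that returns the AP for one class at one IoU threshold; it is not a real COCO API call.

```python
def map_50_95(class_ids, ap_for_class):
    """COCO-style mAP@0.5:0.95 from a per-class, per-threshold AP function."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]   # 0.50, 0.55, ..., 0.95
    per_class = []
    for cls in class_ids:
        aps = [ap_for_class(cls, t) for t in thresholds]
        per_class.append(sum(aps) / len(aps))          # average over thresholds
    return sum(per_class) / len(per_class)             # then average over classes
```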
Real-World Example

In the third column, mAP@0.5 indicates that an IoU threshold of 0.5 was used when calculating the Average Precision (AP) for each class. This single number summarizes how well your object detection model performs; that is the significance of mean Average Precision (mAP).
Conclusion
Congratulations! You’ve just mastered the essential object detection evaluation metrics. 🎉
Now you understand that IoU measures overlap quality, NMS eliminates duplicate detections, Precision tells you how accurate your predictions are, Recall shows how many objects you found, AP summarizes performance per class through the precision-recall curve, and mAP gives you the overall model score by averaging AP across all classes.
Remember: mAP@0.5 is lenient evaluation, mAP@0.75 is stricter, and mAP@0.5:0.95 (COCO standard) averages performance across 10 different IoU thresholds for the most balanced assessment. When you see your YOLO model reporting mAP = 0.68, you now know exactly what that number means and how it was calculated!
Key takeaway: These metrics aren’t just numbers — they’re your guide to understanding whether your object detection model is ready for real-world deployment. Higher mAP means better overall performance, but always consider your specific application needs when choosing IoU thresholds.
Now go forth and evaluate your models like a pro!
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
If you have any questions, feel free to ask me. Buy some coffee🍵 for me.
Thank you for visiting! I plan to add more questions in the future. If you enjoyed this, please follow me on Medium for more updates.
Check Out My Other Blogs
- ML Model Evaluation Made Simple: Accuracy vs Precision vs Recall
- CNNs Explained: How Convolutional Neural Networks Actually Work
- PyTorch Mastery: Complete Deep Learning Guide (2025 Edition)
- Master LangChain in 2025: From RAG to Tools (Complete Guide)