
Six Warnings You Ignore That Might Put Your Image Classification Dataset at Risk

Last Updated on October 22, 2021 by Editorial Team

Author(s): Gaurav Sharma

Deep Learning

“Opportunity never knocks twice,” as the saying goes. In the hands of image annotators, this concise checklist will help data scientists address gaps in their training datasets that were neglected or overlooked during the image cleaning process.

The obligation of an image annotator working on an image classification assignment is not just to complete the picture labelling task at hand, but also to alert data scientists to the following warning signs, which, if not handled promptly, may introduce unanticipated risks into the dataset.

1. There is an excessive amount of “duplication”

Duplication means that many pictures in the dataset recur, either within the same class or across classes.

It might be due to a variety of factors, such as the data scientist scraping the same page of photos numerous times, or identical photographs appearing on two distinct web pages.

Alternatively, the open dataset that the data scientist handed to the labelling team for custom labels was not properly cleaned.

Whatever the reason, repeated images make it difficult for the machine learning model to generalize, because it keeps learning the same information.
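One lightweight way for a labelling team to surface exact duplicates is to hash the raw bytes of each image file and group files that share a digest. The sketch below uses only the standard library; the one-sub-folder-per-class directory layout is an assumption, not something the article prescribes.

```python
import hashlib
from pathlib import Path
from collections import defaultdict

def find_exact_duplicates(dataset_dir):
    """Group image files by the SHA-256 hash of their raw bytes.

    Returns a dict mapping each digest that occurs more than once
    to the list of file paths sharing it (i.e., exact duplicates).
    Assumes a layout like dataset_dir/<class_name>/<image files>.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(dataset_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(str(path))
    # Keep only digests shared by two or more files.
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Note that byte-level hashing only catches exact copies; resized or re-encoded duplicates would need a perceptual hash (for example, the third-party imagehash library) instead.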

2. Images that are fuzzy, unless the entire dataset is blurry

When dealing with a computer vision use case, the machine learning model will be unable to extract descriptive information or characteristics about the object of interest from fuzzy or pixelated pictures, due to the lack of visual clarity.

As a result, labellers must inform the data scientist of the situation and let them take the appropriate action.

But here’s the catch: if the whole dataset is fuzzy, it’s possible that the data scientist is working on a production use case that requires blurry images; in that instance, just confirm with the data scientist before raising the alarm.
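A common heuristic for flagging blurry images is the variance of the Laplacian: sharp images have strong edges and therefore a high-variance Laplacian response, while blurry ones do not. This is a minimal NumPy sketch on a 2-D grayscale array; the threshold value is an assumption to be tuned per dataset, and in production one would typically use OpenCV's `cv2.Laplacian` instead.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian of a 2-D grayscale array.

    Low values mean few strong edges, i.e. a likely blurry image.
    """
    g = np.asarray(gray, dtype=np.float64)
    # 4-neighbour Laplacian: up + down + left + right - 4 * centre,
    # computed on the interior pixels via array slicing.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def is_blurry(gray, threshold=100.0):
    """Flag an image as blurry when its Laplacian variance falls below
    an (assumed, dataset-specific) threshold."""
    return laplacian_variance(gray) < threshold
```

A labeller-side script could run this over a sample of the dataset: if only a few images fall below the threshold they are candidates to report, whereas if nearly all of them do, that is the "whole dataset is fuzzy" case above.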

3. There are too many instances that are unclear

Any machine learning model is only as good as the quality of the inputs it is given for learning a certain task.

If the data scientist gives the annotation team a dataset with too many ambiguous instances, such as those seen in the figure below, the data labellers just need to express their concerns to the data scientist and ask him or her for the next best set of instructions.

4. Bias in the dataset toward a specific class

This is the warning that demands the most vigilance from data labellers.

That is why data labellers should keep this in mind while labelling image classification datasets, or any other computer vision dataset.

If they see that one class has an excessive number of images in comparison to the other class or classes, they must notify the data science team as soon as possible. Otherwise, this dataset will be used to create a machine learning model that favours the class with the most pictures over the others.

In other words, the machine learning model will favour that specific class, and once such an AI model is deployed, the bias can result in lost income or a public relations setback.
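A quick way for a labelling team to spot this kind of skew is to count images per class folder and compare the largest class against the smallest. The folder-per-class layout and the imbalance ratio of 3 below are assumptions for illustration, not fixed rules; the right cut-off is project-specific.

```python
from pathlib import Path

def class_counts(dataset_dir):
    """Count files in each immediate sub-folder, treating every
    sub-folder of dataset_dir as one class."""
    return {
        sub.name: sum(1 for f in sub.iterdir() if f.is_file())
        for sub in Path(dataset_dir).iterdir() if sub.is_dir()
    }

def is_imbalanced(counts, max_ratio=3.0):
    """Flag the dataset when the largest class has more than
    max_ratio times the images of the smallest class
    (max_ratio is an assumed, project-specific cut-off)."""
    sizes = [n for n in counts.values() if n > 0]
    if len(sizes) < 2:
        return False
    return max(sizes) / min(sizes) > max_ratio
```

Running this before labelling starts gives the annotation team concrete numbers to put in front of the data science team, rather than a vague impression that one class "looks bigger".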

5. The item of interest or the class to be labelled appears to be blurry

This situation is seen more often at the class level than at the individual picture level. So, while doing the picture labelling task, if the data labeller notices that the object of interest, or the class(es) to be classified, looks hazy or indistinct across the image, they should simply notify the data scientist about it and seek his or her opinion on how to proceed.

The data science team may then decide to replace or delete those pictures from the ongoing collection.

6. The designated object of interest or class is only partially visible.

“Half knowledge is dangerous,” as the saying goes, and this holds for every computer vision dataset in the world. If the object is not clearly visible, it can hamper the overall result.

In this case, the image annotator should notify the data scientist, so that she or he can take the necessary steps to address these missing-context pictures in the image classification dataset.

Endnote

I hope that the next time a data scientist hands out an image classification assignment, he or she will pass these warning signs on to the data annotation team. That will ultimately help machine learning teams across organizations build datasets that offer a genuine, complete picture of the objects of interest. Cogito Tech LLC provides accurate, high-quality training datasets for ML and AI models.


Six Warnings You Ignore That Might Put Your Image Classification Dataset at Risk was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

