70+ Image Classification Datasets from different industry domains — Part 2

Last Updated on February 6, 2021 by Editorial Team

Author(s): Abhishek Annamraju

A list of single- and multi-class image classification datasets (with Colab notebooks for training and inference) for exploring and experimenting with different algorithms.

Free to use Image. Credits

In part 1 of this two-part blog series, a list of object detection datasets was presented. In this second part, a list of image classification datasets is provided, along with training and inference code.

An object recognition system involves localizing an object of interest and then tagging it with a label. An image classification system can be considered as an application that attaches single or multiple tags to an image, for example,
* Determining whether a picture shows a dog or a cat
* Distinguishing a cancerous cell from a normal one
* Attaching multiple tags based on daylight time (day, night, evening), scene type (indoor, garden area, on-road), quality of the image, etc
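
The difference between attaching a single tag and attaching multiple tags shows up in how raw model scores are turned into labels. The sketch below is a minimal, dependency-free illustration and not code from the tutorials: the scores are made up, and the function names `single_label` and `multi_label` are hypothetical. It assumes the common convention of softmax plus argmax for one tag per image, and an independent sigmoid with a threshold for each tag when several tags may apply.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(value):
    """Squash one raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))

def single_label(logits, classes):
    """Single-label case: classes compete, the most probable one wins."""
    probs = softmax(logits)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best]

def multi_label(logits, tags, threshold=0.5):
    """Multi-label case: every tag is decided on its own."""
    return [tag for tag, v in zip(tags, logits) if sigmoid(v) >= threshold]

# Hypothetical raw scores, invented for illustration only.
print(single_label([2.1, -0.3], ["dog", "cat"]))
print(multi_label([1.4, -2.0, 0.7, -0.5], ["day", "night", "on-road", "indoor"]))
```

The threshold of 0.5 is only a starting point; in practice it is usually tuned per tag on a validation set.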

One tackles an object recognition problem using complex algorithms such as SSD, EfficientDet, Mask R-CNN, YOLO, RetinaNet, etc. When taking on an image classification challenge, by contrast, you depend more on the neural network architecture (CNNs most of the time), such as DenseNets, ResNets, MobileNets, VGG nets, etc. You may approach the training using transfer learning, where you start from a model pre-trained on a large dataset, so that it has already learned how to extract important features from an image, and then fine-tune it on your own data. Or, you design your own network and train it from scratch.
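
To make the transfer learning idea concrete, here is a toy sketch of its feature-extraction flavor, written in plain Python so it runs without any deep learning library. Everything in it is an assumption made for illustration: `frozen_features` is a hand-written stand-in for a pre-trained CNN backbone, the six 2-D points stand in for images, and the new "head" is a small logistic regression. The point it shows is that the backbone is never updated and only the head is trained.

```python
import math

def frozen_features(point):
    """Stand-in for a pre-trained backbone: fixed, never updated."""
    x, y = point
    return [x + y, x - y]

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def train_head(samples, labels, epochs=100, lr=0.5):
    """Fit only the new classification head (logistic regression)."""
    # Features are computed once because the backbone is frozen.
    feats = [frozen_features(s) for s in samples]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, label in zip(feats, labels):
            score = sum(w * v for w, v in zip(weights, f)) + bias
            error = label - sigmoid(score)
            grad_w = [g + error * v for g, v in zip(grad_w, f)]
            grad_b += error
        # Gradient ascent on the log-likelihood, averaged over the batch.
        weights = [w + lr * g / len(feats) for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / len(feats)
    return weights, bias

def predict(point, weights, bias):
    feats = frozen_features(point)
    score = sum(w * v for w, v in zip(weights, feats)) + bias
    return 1 if score > 0 else 0

# Made-up, linearly separable toy data.
samples = [(1, 1), (2, 1), (1, 2), (-1, -1), (-2, -1), (-1, -2)]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_head(samples, labels)
correct = sum(predict(s, weights, bias) == l for s, l in zip(samples, labels))
accuracy = correct / len(samples)
print(accuracy)
```

With a real framework the same pattern applies: load a pre-trained network, freeze its layers, swap in a new final layer sized for your classes, and train only that layer (optionally unfreezing more layers later for fine-tuning).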

As mentioned in part 1, it is really important to test your theoretical knowledge on datasets from different domains. The way you handle a medical imaging dataset tends to differ from the way you handle a dataset of fashion products.

Our open-source team at Monk Computer Vision Org compiled this list of image classification datasets and created a short tutorial for each of them, so that you can use these datasets and try out different transfer learning experiments with varied hyperparameters.
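
One simple way to organise experiments with varied hyperparameters is to enumerate every combination up front. The sketch below is only an illustration: the search space, its values, and the helper name `expand_grid` are assumptions and do not come from the Monk tutorials.

```python
from itertools import product

# Hypothetical search space; pick values that suit your dataset.
search_space = {
    "backbone": ["resnet18", "densenet121", "mobilenet_v2"],
    "learning_rate": [0.01, 0.001],
    "freeze_backbone": [True, False],
}

def expand_grid(space):
    """Return one config dict per combination of hyperparameter values."""
    keys = list(space)
    combos = product(*(space[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]

experiments = expand_grid(search_space)
print(len(experiments))  # 3 * 2 * 2 = 12 configurations
for config in experiments[:3]:
    print(config)
```

Each resulting dictionary can then be handed to whatever training function you use, and the validation scores compared across runs.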

In this blog, datasets from the following industries are listed:

★ Art
★ Agriculture
★ Automobile and Advanced Driver Assistance Systems
★ Fashion
★ Food and Groceries
★ Wildlife
★ Sports
★ Satellite Imaging
★ Medical Imaging and Healthcare
★ Security and Surveillance
★ Scene type understanding

... and much more!

The complete list is available in one place on GitHub, with associated usage instructions and training code.

Automobile and ADAS Related Datasets

A) German Traffic Sign Classification Dataset

Demo

* Goal — To classify traffic sign types
* Application — Essential for autonomous vehicles and ADAS, which must recognize traffic signs to allow smooth traffic passage
* Details — 50K+ images with 40+ classes
* How to utilize the dataset and create a classifier using Pytorch’s Resnext pipeline

B) Driver distraction monitoring dataset

Demo

* Goal — To monitor driver activities
* Application — Essential for alerting the driver of any distractions while driving
* Details — 20K+ images with 10+ classes of distraction such as talking on the phone, operating radio, etc
* How to utilize the dataset and create a classifier using Keras’s Resnet pipeline

C) Vehicle Make Model Type Classification

Demo

* Goal — To classify vehicles by make, model and body type
* Application — Essential for traffic analysis and for automated taxation at toll booths
* Details — 50K+ images with multi class type labels
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

D) MIO-TCD Vehicle Type Classification from Traffic Cam Videos

Demo

* Goal — To classify vehicle types captured by CCTV traffic cams
* Application — Essential for traffic analysis
* Details — 130K+ images with 11 vehicle type classes
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

Animals Related Datasets

A) Simple Cats & Dogs Dataset

* Goal — To differentiate between images of dogs and cats
* Application — Sorting a large database of images
* Details — 10K+ images with 2 classes
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

B) Monkey Species Classification Dataset

Demo

* Goal — To classify images into 10 different monkey species
* Application — Sorting a large database of images, tracking endangered species
* Details — 1K images spread over 10 classes
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

C) Stanford Dog Breed Dataset

Demo

* Goal — To classify images into 120 different dog breeds
* Application — Sorting a large database of images, tracking different breeds
* Details — 20K+ images spread over 120 classes
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

D) Oregon Wildlife Classification Dataset

Demo

* Goal — To classify wild animals into 20 different types
* Application — Tracking animals in the wild
* Details — 10K+ images spread over 20 classes
* How to utilize the dataset and create a classifier using Pytorch’s Densenet Pipeline

E) 225 Bird Species Dataset

Demo

* Goal — To classify different bird species
* Application — Tracking birds in the wild, auto-tagging images with species names
* Details — 30K+ images spread over 225 classes
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

* Another such bird species dataset and associated training code

F) Snake Species Classification Dataset

* Goal — To classify different snake species
* Application — Tracking snakes in the wild, monitoring endangered species
* Details — 240K+ images spread over 700+ classes
* How to utilize the dataset and create a classifier using Pytorch’s Resnet Pipeline

G) Butterfly Species Classification Dataset

Demo

* Goal — To classify different butterfly species
* Application — Tracking butterflies in the wild, monitoring endangered species
* Details — 2K+ images spread over 50+ classes
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

Medical Imaging Related Datasets

A) Malarial Cellular Image Dataset

Demo

* Goal — To detect if a cell is infected with malaria or not
* Application — Early detection of presence of malaria in cells
* Details — 25K+ images with 2 different classes
* How to utilize the dataset and create a classifier using Pytorch’s Densenet Pipeline

B) Skin Cancer Mnist HAM10000 Dataset

Demo

* Goal — To classify dermatoscopic images of pigmented skin lesions by lesion type
* Application — Early detection of skin cancer
* Details — 10K images with 10+ different classes
* How to utilize the dataset and create a classifier using Pytorch’s Resnet Pipeline

C) Blood Cell Sub-Type Classification

Demo

* Goal — To detect blood cell components
* Application — Automated classification helps in expediting certain pathological processes
* Details — 3K images for 4 classes of cell types — Eosinophil, Lymphocyte, Monocyte, and Neutrophil
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline
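
Before training on a dataset like this, the class names have to be mapped to integer ids. A minimal sketch follows, assuming the common convention (used, for example, by folder-based loaders such as torchvision's `ImageFolder`) of sorting class names alphabetically. The helper name `build_class_index` is hypothetical, and the tutorial's own pipeline may assign ids differently.

```python
def build_class_index(class_names):
    """Map class names to integer ids in sorted (alphabetical) order."""
    return {name: idx for idx, name in enumerate(sorted(class_names))}

# The four cell types listed above, in arbitrary input order.
cell_types = ["Neutrophil", "Eosinophil", "Monocyte", "Lymphocyte"]
class_to_idx = build_class_index(cell_types)

# Reverse mapping, handy for turning a predicted id back into a name.
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

print(class_to_idx)
print(idx_to_class[2])
```

Keeping both mappings next to the trained model avoids mislabelled predictions when the same classes are later loaded in a different order.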

D) Pneumonia Chest X-Ray Dataset

Demo Image. Credits

* Goal — To differentiate between normal and pneumonia chest X-rays
* Application — Quick initial testing for early diagnosis
* Details — 5K+ images for 2 different classes
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

* Another related dataset is Covid Chest X-Ray Dataset and associated training code

E) Breast Histopathology Image Dataset

Demo

* Goal — To detect instances of Invasive Ductal Carcinoma
* Application — Quick initial testing for early diagnosis
* Details — 5K+ images for 2 different classes
* How to utilize the dataset and create a classifier using Keras’s Mobilenet V2 Pipeline

F) Retinal OCT Image Dataset

Demo. Credits

* Goal — To classify retinal OCT images into 4 categories: NORMAL, CNV (choroidal neovascularization), DME (diabetic macular edema), DRUSEN
* Application — Quick initial testing for early diagnosis
* Details — 80K+ images for 4 different classes
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

G) APTOS Blindness Detection Dataset

Sample Image. Credits

* Goal — To detect severity of blindness based on images captured using fundus photography
* Application — Quick initial testing for early diagnosis
* Details — 3.5K+ images for 5 different classes
* How to utilize the dataset and create a classifier using Keras’s Resnet Pipeline

* Another similar dataset is Diabetic Retinopathy Dataset and training code on it

H) Cataract Detection Dataset

Demo

* Goal — To detect presence of cataract and glaucoma
* Application — Quick initial testing for early diagnosis
* Details — 1K+ images for 4 different classes
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

I) Intel and MobileODT Cervical Cancer Dataset

Credits

* Goal — To detect presence of cervical cancer
* Application — Cervical cancer is easy to prevent if caught in its pre-cancerous stage
* Details — 1K images for 3 classes — type-1, type-2, type-3
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

J) Human Protein Atlas Image Classification Dataset

Demo. Credits

* Goal — To predict protein organelle localization. The dataset comprises 27 different cell types of highly different morphology, which affect the protein patterns of the different organelles.
* Application — Identify a protein’s location(s) from a high-throughput image
* Details — 120K images for 27 different classes
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

K) Runmila AI Institute & minoHealth AI Labs Tuberculosis Classification Dataset

Demo

* Goal — To predict presence of tuberculosis in Chest X-Ray scans.
* Application — Quick detection can help with early diagnosis
* Details — 1K images for 2 different classes
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

Retail and Groceries Related Datasets

A) Food vs Non-Food Image Dataset

Demo

* Goal — To classify images with presence of food or not.
* Application — Auto-tag images for search and retrieval
* Details — 5K images for 2 different classes
* How to utilize the dataset and create a classifier using Mxnet’s Mobilenet V3 Pipeline

B) Freiburg Groceries Dataset

Demo

* Goal — To classify different grocery items in the image.
* Application — Auto-tag images for quick-checkout
* Details — 5K images for 25 different classes of products
* How to utilize the dataset and create a classifier using Mxnet’s Mobilenet V3 Pipeline

C) Fashion Product Image Dataset

Demo

* Goal — To add multiple tags to different fashion product items in the image.
* Application — Auto-tag images for better search and retrieval
* Details — 44K images with multiple tags per image
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

D) Apparel Images Dataset

Demo

* Goal — To classify different apparel items in the image.
* Application — Auto-tag images for better search and retrieval
* Details — 10K images with 20+ single label tags
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

E) Zalando Store Fashion Image Dataset

Demo

* Goal — To classify different clothing items in the image.
* Application — Auto-tag images for better search and retrieval
* Details — 10K+ images spread over 6 different types of clothing
* How to utilize the dataset and create a classifier using Keras’s VGG-Net Pipeline

F) Food-101 Dataset

Demo

* Goal — To classify different food items in images.
* Application — Auto-tag images for social media posts
* Details — 101K images, 1K images for each of the 101 different classes
* How to utilize the dataset and create a classifier using Mxnet’s VGG-Net Pipeline
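
Because every class here has the same number of images, it is worth keeping that balance when carving out a validation set. The sketch below shows one way to do a stratified split in plain Python. It is an illustration under assumptions: the helper name `stratified_split`, the 20% validation fraction, and the three-class toy label list are all made up and are not taken from the tutorial.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices) with the same class mix in both."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(round(len(indices) * val_fraction))
        val.extend(indices[:cut])
        train.extend(indices[cut:])
    return sorted(train), sorted(val)

# Toy stand-in for a balanced dataset: 3 classes with 10 images each.
labels = [name for name in ("pizza", "sushi", "waffles") for _ in range(10)]
train_idx, val_idx = stratified_split(labels, val_fraction=0.2)
print(len(train_idx), len(val_idx))
```

Splitting per class rather than over the whole index list guarantees that no class ends up under-represented in validation by chance.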

Agriculture Related Dataset

A) Rice (Leaf) Disease Detection Dataset

Demo

* Goal — To detect different rice plant diseases.
* Application — Early and accurate detection is essential to take necessary measures in saving the rest of the crop
* Details — 2K+ images spread over 3 types of diseases — Brown spots, Hispa, and Leaf Blast
* How to utilize the dataset and create a classifier using Pytorch’s VGG-Net Pipeline

B) Broad Leaved Dock Image Dataset

Demo

* Goal — To detect presence of broad leaved docks in images.
* Application — Automated detection helps weed sprayers target the right locations
* Details — 2K+ images with 2 different classes
* How to utilize the dataset and create a classifier using Pytorch’s Densenet Pipeline

C) DeepWeeds Weed type Classification Dataset

Demo

* Goal — To recognize different weed species.
* Application — Automated detection helps weed sprayers target the right locations
* Details — 17K+ images with 8 different classes of weed.
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

D) Leaf Snap Dataset

Demo

* Goal — To recognize different plant species.
* Application — Automated visual recognition helps with monitoring different species as per the need
* Details — 500+ images with 10+ different classes of plant species.
* How to utilize the dataset and create a classifier using Pytorch’s Resnet Pipeline

E) Plant Pathology FGVC7 Dataset

Demo

* Goal — To recognize plant diseases.
* Application — Early and accurate detection is essential to take necessary measures in saving the rest of the crop
* Details — 3.5K+ images with 5+ different classes of plant diseases.
* How to utilize the dataset and create a classifier using Keras’ Resnet Pipeline

F) Aerial Cactus Identification Dataset

Reference Image. Credits

* Goal — To detect presence of cactus spread in satellite images.
* Application — Helps with understanding spread of cacti over arid and desert regions
* Details — 17K+ 32×32 patch images with 2 different classes.
* How to utilize the dataset and create a classifier using Pytorch’s Densenet Pipeline

G) Invasive species monitoring dataset

Demo

* Goal — To detect presence of invasive weed species Hydrangea in forests.
* Application — Early detection helps with taking proper measure to keep the forest ecosystem in balance
* Details — 2.5K+ images with 2 different classes.
* How to utilize the dataset and create a classifier using Keras’ Resnet Pipeline

H) Plant Seedling Dataset

Demo

* Goal — To classify different plant species with images of seedlings and find presence of weed.
* Application — Early detection helps with taking proper measures to remove the unwanted plants
* Details — 2.5K+ images with 12 different classes of plants.
* How to utilize the dataset and create a classifier using Keras’ Mobilenet V2 Pipeline

I) CGIAR Plant Disease Classification Dataset

Demo

* Goal — To detect different leaf and stem diseases.
* Application — Early detection helps with taking proper measures to save rest of the vegetation
* Details — 500+ images with 4 different classes of diseases.
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

J) Plant Village Plant Disease Classification

Demo

* Goal — To detect different leaf and stem diseases.
* Application — Early detection helps with taking proper measures to save rest of the vegetation
* Details — 1K+ images with 10 different classes of diseases.
* How to utilize the dataset and create a classifier using Pytorch’s Resnet Pipeline

K) CGIAR Wheat Growth Prediction Dataset

Demo

* Goal — To monitor different stages of wheat growth.
* Application — Helps to keep track of yield
* Details — 15K+ images with weekly labelled stages of crop growth.
* How to utilize the dataset and create a classifier using Pytorch’s Resnet Pipeline

L) Swedish Leaf Type Classification Dataset

Demo

* Goal — To classify different leaf species.
* Application — Monitor different species and take actions for their growth
* Details — 1K+ images with 10+ leaf types.
* How to utilize the dataset and create a classifier using Pytorch’s Densenet Pipeline

M) PlantDoc Plant Disease Dataset

Demo

* Goal — To classify different plant diseases.
* Application — Early Detection helps with taking proper measures to save the crop.
* Details — 2.5K+ images spread over 17 disease types.
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

Art and Animation Related Datasets

A) Lego Brick Type Classification Dataset

Demo

* Goal — To classify different Lego brick types.
* Application — Monitor Lego bricks in a production line.
* Details — 1K+ images spread over 5 Lego brick types.
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

B) Architectural element type classification dataset

Demo

* Goal — To classify different architectural structure types.
* Application — Auto-tag natural images.
* Details — 10K+ images spread over 10 different types of structures.
* How to utilize the dataset and create a classifier using Mxnet’s Mobilenet V1 Pipeline

C) Pokemon Classification Dataset

Demo

* Goal — To classify different Pokemon types.
* Details — 7K+ images spread over 20 different types of Pokemon.
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

D) Simpsons Character Dataset

Demo

* Goal — To classify different Simpsons characters.
* Details — 5K+ images spread over 14 different characters.
* How to utilize the dataset and create a classifier using Pytorch’s Vgg-Net Pipeline

E) Art Type Classification

Demo

* Goal — To classify different art types — paintings, drawings, sculpture, iconography, graphic art.
* Details — 5K+ images with 5 different classes.
* How to utilize the dataset and create a classifier using Pytorch’s Alexnet Pipeline

F) Hackerearth Autogala Competition Dataset

Demo

* Goal — To auto-tag images and classify into food items, attire, design and decorative items, etc.
* Application — Auto-tagging helps with better search and retrieval
* Details — 4.5K+ images with 4 different classes.
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

Scene Type Understanding Dataset

A) Weather & Daylight Type Classification Dataset

Demo

* Goal — To classify images as per the weather.
* Application — Auto-tagging helps with better search and retrieval
* Details — 1K+ images with 5+ different classes: sunrise, rainy, cloudy, evening, night, etc
* How to utilize the dataset and create a classifier using Pytorch’s Wide-Resnet Pipeline

B) Intel Image Classification Dataset

Demo

* Goal — To classify based on the place at which it was taken.
* Application — Auto-tagging helps with better search and retrieval
* Details — 25K+ images with 6 different classes: urban area, forest, glacier, mountains, sea, etc
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

C) Places-365 Scene Recognition Dataset

Demo

* Goal — To classify based on the place at which it was taken.
* Application — Auto-tagging helps with better search and retrieval
* Details — 20K+ images with a broad set of 365 scene types
* How to utilize the dataset and create a classifier using Mxnet’s Vgg16 Pipeline

D) House Room Type Classification

Demo

* Goal — To classify different house areas.
* Application — Tagging helps with attaching price tags to housing properties
* Details — 5K+ images with 6 different scene types
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

* Another dataset for scene type recognition but for on-road adas imagery

E) UIUC Sport Event Type Classification

Demo

* Goal — To classify different sport events.
* Application — Auto-Tagging for better search and retrieval
* Details — 5K+ images with 6 different scene types
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

Satellite Imagery Related Datasets

A) Planet Understanding Amazon Dataset

Demo

* Goal — To add multi-label tags based on weather and land-use
* Application — Monitor areas of the Amazon forests
* Details — 40K+ images with multiple tags such as cloudy, clear, forest, rivers, agriculture, etc
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline
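
For a multi-label dataset like this one, each image's set of tags is typically turned into a fixed-length 0/1 target vector before training. Here is a minimal sketch of that multi-hot encoding. The vocabulary below reuses only the five example tags named above (the real dataset has more), and the helper names `multi_hot` and `decode` are hypothetical.

```python
def multi_hot(tags, vocabulary):
    """Encode a set of tags as a 0/1 vector aligned with the vocabulary."""
    present = set(tags)
    unknown = present - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if tag in present else 0 for tag in vocabulary]

def decode(vector, vocabulary):
    """Turn a 0/1 vector back into the list of tag names."""
    return [tag for tag, flag in zip(vocabulary, vector) if flag]

# Partial, illustrative vocabulary built from the example tags.
vocabulary = ["cloudy", "clear", "forest", "rivers", "agriculture"]
vector = multi_hot(["clear", "forest", "agriculture"], vocabulary)
print(vector)  # [0, 1, 1, 0, 1]
print(decode(vector, vocabulary))
```

A classifier trained on such targets outputs one score per tag, which is then thresholded independently rather than picking a single winner.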

B) HistAreal V1.0 Land Usage Classification Dataset

Demo

* Goal — To classify satellite imagery patches based on land-use
* Application — Monitor and keep track of how land is being used, as well as keep track of water reserve areas
* Details — 10K+ images with multiple tags such as urban, forest, rivers, agriculture, etc
* How to utilize the dataset and create a classifier using Mxnet’s Vgg-Net Pipeline

* Another such dataset focusses on monitoring coffee fields in Brazil, associated training code

C) UC Merced Land Use Classification Dataset

Demo

* Goal — To classify satellite imagery patches based on land-use
* Application — Monitor and keep track of how land is being used, as well as keep track of water reserve areas
* Details — 2K+ images with multiple tags such as urban, forest, rivers, agriculture, etc
* How to utilize the dataset and create a classifier using Pytorch’s Resnet Pipeline

Other Datasets

A) CIFAR-10 Dataset

Demo

* Goal — To classify images based on their contents
* Details — 60K images with 10 classes
* How to utilize the dataset and create a classifier using Pytorch’s Vgg-Net Pipeline

More datasets along similar lines
* Cifar-100 Dataset and associated training code
* STL-10 Dataset and associated training code
* Caltech-256 Dataset and associated training code using Pytorch’s ShuffleNet Pipeline
* Natural-Images-10 Dataset and associated training code

B) Hand-Written Math Symbol Dataset

Demo

* Goal — To classify hand-written math symbols
* Application — Digitize hand written math text
* Details — 1K+ 45×45 sized images with 20+ different symbol classes
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

C) Face Mask Dataset

Demo

* Goal — To classify whether people are wearing face masks or not
* Application — Monitor whether proper protection is being used
* Details — 500+ images with 2 different classes
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

D) American Sign Language Alphabet Dataset

Demo

* Goal — To classify the different letters of the ASL alphabet
* Application — Basic element in complete sign language recognition demo
* Details — 85K+ images with 26 different alphabet classes
* How to utilize the dataset and create a classifier using Mxnet’s Densenet Pipeline

E) Yoga-82 Pose Estimation Dataset

Demo

* Goal — To classify different yoga poses
* Application — First step in analysing different pose estimations
* Details — 20K+ images with 82 different yoga pose classes
* How to utilize the dataset and create a classifier using Pytorch’s Densenet Pipeline

F) Hackerearth Dance Pose Identification Challenge

Demo

* Goal — To classify different dance styles
* Details — 300+ images with 8 different dance style classes
* How to utilize the dataset and create a classifier using Pytorch’s Resnet Pipeline

G) Bill Type Classification Dataset

Demo

* Goal — To classify different bill receipts
* Application — First step in reading different bill receipts
* Details — 500K+ images spread over 4 different classes of receipts
* How to utilize the dataset and create a classifier using Pytorch’s Resnext Pipeline

H) IEEE Camera Model Type Classification Dataset

Demo

* Goal — To predict which phone’s camera the images were taken with
* Details — 500+ images spread over 4 different classes
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

I) Fire Presence Detection Dataset

Demo

* Goal — To detect presence of fire in images
* Application — Early detection is critical for saving life and property
* Details — 500+ images from different scenarios
* How to utilize the dataset and create a classifier using Keras’ Densenet Pipeline

J) Bengali text Grapheme Classification Dataset

Demo

* Goal — To detect grapheme root, vowel diacritic and consonant diacritic type in reading Bengali language text
* Application — Crucial step in OCR and NLP applications
* Details — 10K+ images with multi-class labels
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline
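
Since every image here carries three labels at once (grapheme root, vowel diacritic, and consonant diacritic), it helps to score each component separately as well as jointly. The sketch below is an illustration only: the `GraphemeLabel` tuple, the helper `component_accuracy`, and all class ids are made up and not taken from the dataset or the tutorial.

```python
from collections import namedtuple

# One label per component; the integer ids are invented for this example.
GraphemeLabel = namedtuple("GraphemeLabel", ["root", "vowel", "consonant"])

def component_accuracy(predictions, targets):
    """Accuracy per component, plus the share of fully correct samples."""
    total = len(targets)
    scores = {}
    for field in GraphemeLabel._fields:
        hits = sum(
            getattr(p, field) == getattr(t, field)
            for p, t in zip(predictions, targets)
        )
        scores[field] = hits / total
    exact = sum(p == t for p, t in zip(predictions, targets))
    scores["exact_match"] = exact / total
    return scores

targets = [
    GraphemeLabel(15, 9, 5),
    GraphemeLabel(159, 0, 0),
    GraphemeLabel(22, 3, 5),
    GraphemeLabel(53, 2, 2),
]
predictions = [
    GraphemeLabel(15, 9, 5),
    GraphemeLabel(159, 0, 1),   # consonant diacritic wrong
    GraphemeLabel(22, 3, 5),
    GraphemeLabel(50, 2, 2),    # grapheme root wrong
]
scores = component_accuracy(predictions, targets)
print(scores)
```

Looking at the per-component numbers shows which of the three outputs is holding back the exact-match score.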

K) Russian Handwritten Digits Classification Dataset

Demo

* Goal — To detect different Russian language characters
* Application — Crucial step in OCR and NLP applications
* Details — 2K+ images with 30+ different labels
* How to utilize the dataset and create a classifier using Mxnet’s Resnet Pipeline

L) Microsoft Image Understanding Dataset

Demo

* Goal — To detect important elements in images
* Application — Auto-tag images for better indexing
* Details — 2K+ images with 5+ different labels
* How to utilize the dataset and create a classifier using Pytorch’s Resnet Pipeline

M) Office Home Dataset

Demo

* Goal — To classify common objects found in offices and homes
* Application — Auto-tag images for better indexing
* Details — 5K+ images with 10+ different object labels
* How to utilize the dataset and create a classifier using Pytorch’s Vgg-Net Pipeline

Appendix

For more details on the tutorials visit our Github page

Tutorial credits go to all the open-source contributors at the Monk Image Classification Library


70+ Image Classification Datasets from different industry domains — Part 2 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

the keywords in h3 h4 and p tags content with achor tag content.innerHTML = content.innerHTML.replace(re, newLink); } function articleFilter(keyword, anchorTag) { //gets all the articles var articles = document.querySelectorAll('article'); //if its zero or less then there are no articles if (articles.length > 0) { for (let x = 0; x < articles.length; x++) { //articles to skip is an array in which there are ids of articles which should not get effected //if the current article's id is also in that array then do not call search and replace with its data if (!articleIdsToSkip.includes(articles[x].id)) { //search and replace is called on articles which should get effected searchAndReplace(keyword, anchorTag, articles[x].id, key); } else { console.log( `Cannot replace the keywords in article with id ${articles[x].id}` ); } } } else { console.log('No articles found.'); } } let key; //not part of script, added for (key in keywordsAndLinks) { //key is the object in keywords and links object i.e ds, ml, ai for (let i = 0; i < keywordsAndLinks[key].keywords.length; i++) { //keywordsAndLinks[key].keywords is the array of keywords for key (ds, ml, ai) //keywordsAndLinks[key].keywords[i] is the keyword and keywordsAndLinks[key].link is the link //keyword and link is sent to searchreplace where it is then replaced using regular expression and replace function articleFilter( keywordsAndLinks[key].keywords[i], keywordsAndLinks[key].link ); } } function cleanLinks() { // (making smal functions is for DRY) this function gets the links and only keeps the first 2 and from the rest removes the anchor tag and replaces it with its text function removeLinks(links) { if (links.length > 1) { for (let i = 2; i < links.length; i++) { links[i].outerHTML = links[i].textContent; } } } //arrays which will contain all the achor tags found with the class (ds-link, ml-link, ailink) in each article inserted using search and replace let dslinks; let mllinks; let ailinks; let nllinks; let deslinks; let 
tdlinks; let iaslinks; let llinks; let pbplinks; let mlclinks; const content = document.querySelectorAll('article'); //all articles content.forEach((c) => { //to skip the articles with specific ids if (!articleIdsToSkip.includes(c.id)) { //getting all the anchor tags in each article one by one dslinks = document.querySelectorAll(`#${c.id} .entry-content a.ds-link`); mllinks = document.querySelectorAll(`#${c.id} .entry-content a.ml-link`); ailinks = document.querySelectorAll(`#${c.id} .entry-content a.ai-link`); nllinks = document.querySelectorAll(`#${c.id} .entry-content a.ntrl-link`); deslinks = document.querySelectorAll(`#${c.id} .entry-content a.des-link`); tdlinks = document.querySelectorAll(`#${c.id} .entry-content a.td-link`); iaslinks = document.querySelectorAll(`#${c.id} .entry-content a.ias-link`); mlclinks = document.querySelectorAll(`#${c.id} .entry-content a.mlc-link`); llinks = document.querySelectorAll(`#${c.id} .entry-content a.l-link`); pbplinks = document.querySelectorAll(`#${c.id} .entry-content a.pbp-link`); //sending the anchor tags list of each article one by one to remove extra anchor tags removeLinks(dslinks); removeLinks(mllinks); removeLinks(ailinks); removeLinks(nllinks); removeLinks(deslinks); removeLinks(tdlinks); removeLinks(iaslinks); removeLinks(mlclinks); removeLinks(llinks); removeLinks(pbplinks); } }); } //To remove extra achor tags of each category (ds, ml, ai) and only have 2 of each category per article cleanLinks(); */ //Recommended Articles var ctaLinks = [ /* ' ' + '

Subscribe to our AI newsletter!

' + */ '

Take our 85+ lesson From Beginner to Advanced LLM Developer Certification: From choosing a project to deploying a working product this is the most comprehensive and practical LLM course out there!

'+ '

Towards AI has published Building LLMs for Production—our 470+ page guide to mastering LLMs with practical projects and expert insights!

' + '
' + '' + '' + '

Note: Content contains the views of the contributing authors and not Towards AI.
Disclosure: This website may contain sponsored content and affiliate links.
