Explainable Defect Detection Using Convolutional Neural Networks: Case Study

Last Updated on January 17, 2022 by Editorial Team

Author(s): Olga Chernytska

Deep Learning

Train an object detection model without any bounding box labels. This post shows the power of Explainable AI.

Image by Author

Despite being extremely accurate, neural networks are not widely used in domains where prediction explainability is a requirement, such as medicine, banking, and education.

In this tutorial, I'll show you how to overcome this explainability limitation for Convolutional Neural Networks: by exploring, inspecting, processing, and visualizing the feature maps produced by deep network layers. We will go through the approach and discuss how to apply it to a real-world task: Defect Detection.

I've created a GitHub repository for this project, where you can find all the data preparation, model, training, and evaluation scripts.

Contents
- Task
- Training Pipeline
- Inference Pipeline
- Evaluation
- Conclusion

Task

You are given a 400-image dataset that contains images of good items (labeled as class 'Good') and items with a defect (labeled as class 'Anomaly'). The dataset is imbalanced, with more samples of good images than defective ones. The item in the image may be of literally any type and complexity: a bottle, cable, pill, tile, leather, zipper, etc. Below is an example of how the dataset may look.

Image 1. Subset "Pill" from the MVTec Anomaly Detection Dataset. Image by Author

Your task is to build a model that classifies images into the 'Good'/'Anomaly' classes and returns a bounding box for the defect if the image is classified as 'Anomaly'. Even though this may look like a typical object detection task, there is an issue: we have no labels for the bounding boxes.

Fortunately, this task is solvable.

Image 2. The model is expected to predict the class 'Good'/'Anomaly' and localize the defect region for the 'Anomaly' class. No bounding boxes are provided during training, only class labels. Image by Author

Training Pipeline

Disclosure: I am not sharing my real commercial project, but showing how to explain classification model predictions in general, so this can be used in many domains and tasks, not only manufacturing but medicine as well. Also, do not expect high accuracy here, because this is my quick pet project. But you are free to use my results as a starting point for your project, invest more time, and achieve the accuracy you need :)

Data Preparation

For all my experiments, I've used the MVTec Anomaly Detection Dataset (pay attention: it is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, which means it cannot be used for commercial purposes).

The dataset includes 15 subsets of different item types, such as Bottle, Cable, Pill, Leather, and Tile; each subset has 300–400 images in total, each labeled as 'Good'/'Anomaly'.

Image 3. Samples from the MVTec Anomaly Detection Dataset: upper row, good images; lower row, images with defective items. Image Source

As a data preprocessing step, resize images to 224×224 pixels to speed up training. Images in most subsets are 1024×1024, but because the defects are also large, we may resize to a lower resolution without sacrificing model accuracy.

Consider using data augmentations. In general, appropriate data augmentations are ALWAYS beneficial for your model (BTW, check my post on data augmentation to learn more).

But let's assume that when deployed to production, our model will "see" data in exactly the same format as the dataset we have now. So if images are centered, scaled, and rotated (as in the Capsule and Cable subsets), we may not need any data augmentations at all, because test images are expected to also be centered, scaled, and rotated. However, if images are not rotated (but only centered and scaled), as in the Screw and Metal Nut subsets, adding rotation as a preprocessing step to the training pipeline helps the model learn better. (A minimal transform sketch follows Image 4 below.)

Image 4. Examples of subsets with preprocessed (centered, scaled, and rotated) and non-preprocessed (not rotated) images. Visualizing the mean image helps you understand whether the images in a subset are preprocessed or not. Image by Author
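
Here is a minimal sketch of what this preprocessing could look like, assuming a torchvision-based pipeline (the library choice and the rotation range are my illustrative assumptions, not necessarily the exact setup used in the repository):

```python
import torchvision.transforms as T

IMG_SIZE = 224  # resize the ~1024x1024 originals down to 224x224

# For subsets whose images are already centered, scaled, and rotated
# (e.g., Capsule, Cable): resizing alone may be enough.
base_transform = T.Compose([
    T.Resize((IMG_SIZE, IMG_SIZE)),
    T.ToTensor(),
])

# For subsets that are centered and scaled but NOT rotated
# (e.g., Screw, Metal Nut): add a random rotation during training.
train_transform = T.Compose([
    T.Resize((IMG_SIZE, IMG_SIZE)),
    T.RandomRotation(degrees=180),  # any orientation is plausible here
    T.ToTensor(),
])
```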

Split the data into train/test parts. Ideally, we would like train, validation, and test parts: to train the model, tune hyperparameters, and evaluate model accuracy, respectively. But we have only 300–400 images, so let's put 80% of the images into the train set and 20% into the test set. For small datasets, we may perform 5-fold cross-validation to make sure the evaluation results are robust.

When dealing with an imbalanced dataset, the train/test split should be performed in a stratified manner, so the train and test parts contain the same share of both classes, 'Good' and 'Anomaly'. Additionally, if you have information on the defect types (such as scratch, crack, etc.), it's better to stratify on defect types as well, so the train and test parts contain the same share of items with scratches/cracks.
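
A minimal sketch of such a stratified 80/20 split, assuming scikit-learn (the variable names and counts below are placeholders for illustration):

```python
from sklearn.model_selection import train_test_split

# Placeholder data: image paths and class labels (0 = 'Good', 1 = 'Anomaly').
image_paths = [f"img_{i:03d}.png" for i in range(400)]
labels = [0] * 300 + [1] * 100  # imbalanced, as in the MVTec subsets

train_paths, test_paths, y_train, y_test = train_test_split(
    image_paths, labels,
    test_size=0.2,      # 20% of the images go into the test set
    stratify=labels,    # keep the 'Good'/'Anomaly' ratio equal in both parts
    random_state=42,    # reproducible split
)
```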

Model

Let's take VGG16 pre-trained on ImageNet and change its classification head: replace the Flatten and Dense layers with Global Average Pooling and a single Dense layer. I'll explain in the "Inference Pipeline" section why we need these particular layers.

(I found this approach in the paper Learning Deep Features for Discriminative Localization [1]. In this post, I'll go through all the important steps described in the paper.)

We train the model as a typical 2-class classification model. The model outputs a 2-dimensional vector containing the probabilities for the classes 'Good' and 'Anomaly' (the approach should also work with a 1-dimensional output, feel free to try).

Image 5. Original VGG16 architecture vs. the custom one. Image by Author
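
Here is a minimal PyTorch sketch of this custom architecture. The layer choices follow the description above; the class name, the pre-trained weights argument, and the decision to return the feature maps alongside the scores are my assumptions, not necessarily how the repository implements it:

```python
import torch
import torch.nn as nn
from torchvision import models

class CustomVGG(nn.Module):
    """VGG16 convolutional backbone + Global Average Pooling + one Dense layer."""

    def __init__(self, n_classes: int = 2):
        super().__init__()
        # Pre-trained VGG16 features, cut before the final max-pooling layer,
        # so the Conv5-3 output (after ReLU) is 512 x 14 x 14 for 224x224 input.
        self.features = models.vgg16(weights="IMAGENET1K_V1").features[:-1]
        self.classification_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),    # Global Average Pooling: (N, 512, 1, 1)
            nn.Flatten(),               # (N, 512)
            nn.Linear(512, n_classes),  # single Dense layer: (N, 2)
        )

    def forward(self, x):
        feature_maps = self.features(x)                  # (N, 512, 14, 14)
        scores = self.classification_head(feature_maps)  # class scores
        return scores, feature_maps                      # keep maps for the heatmap
```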

During training, the first 10 convolutional layers are frozen; we train only the classification head and the last 3 convolutional layers. That's because our dataset is too small to fine-tune the whole model. The loss is cross-entropy; the optimizer is Adam with a learning rate of 0.0001.

I've experimented with different subsets of the MVTec Anomaly Detection Dataset. I've trained the model with batch_size=10 for at most 10 epochs, with early stopping when the train set accuracy reaches 98%. To deal with the imbalanced dataset, we may apply loss weighting: a higher weight for 'Anomaly' class images and a lower one for 'Good'.
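
A sketch of this training setup, continuing the CustomVGG example above. The freezing loop counts Conv2d layers inside VGG16's features module; the 1:3 class weights are the ones mentioned in the Evaluation section below:

```python
import torch
import torch.nn as nn

model = CustomVGG(n_classes=2)

# Freeze the first 10 convolutional layers (layers without parameters,
# like ReLU and pooling, are unaffected); the last 3 conv layers and
# the classification head stay trainable.
conv_seen = 0
for layer in model.features:
    if isinstance(layer, nn.Conv2d):
        conv_seen += 1
    if conv_seen <= 10:
        for p in layer.parameters():
            p.requires_grad = False

# Weighted cross-entropy: mistakes on the rarer 'Anomaly' class cost 3x more.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 3.0]))
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```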

Inference Pipeline

During inference, we want not only to classify an image into the 'Good'/'Anomaly' classes but also to get a bounding box for the defect if the image is classified as 'Anomaly'.

For this reason, in inference mode the model outputs the class probabilities as well as a heatmap, which will later be processed into a bounding box. The heatmap is created from the feature maps of the deep layers.

Step 1. Take all feature maps from the Conv5-3 layer, after ReLU activation. For a single input, there will be 512 feature maps of size 14×14 (the 224×224 input image is halved by each of the 4 preceding pooling layers: 224 / 2^4 = 14).

Image 6. Feature maps from layer Conv5-3 (after ReLU activation). There are 512 feature maps in total, each of size 14×14; only some of them are visualized. Image by Author

Step 2. Sum up all 512 feature maps from the Conv5-3 layer, each multiplied by the Dense layer weight that affected the calculation of the 'Anomaly' class score. Look carefully at Images 7 and 8 to understand this step.

Image 7. Detailed architecture of the last model layers (classification head). Image by Author
Image 8. The final heatmap is calculated as the sum of the Conv5-3 layer heatmaps, each multiplied by the Dense layer weight that affected the 'Anomaly' class score. Image by Author

Why so? Now you'll see why the classification head should have a Global Average Pooling layer and a Dense layer. Such an architecture makes it possible to trace which feature maps (and by how much) affected the final prediction and made it the 'Anomaly' class.

Each feature map (an output of layer Conv5-3; see Image 6) highlights some regions in the input image. The Global Average Pooling layer represents each feature map as a single number (we may think of it as a 1-D embedding). The Dense layer calculates the scores (and probabilities) for the classes 'Good' and 'Anomaly' by multiplying each embedding by the corresponding weight. This flow is shown in Image 7.

So the Dense layer weights represent how much each feature map affects the scores for the 'Good' and 'Anomaly' classes (we are interested in the 'Anomaly' class score only). And summing up the feature maps from layer Conv5-3, each multiplied by the corresponding Dense layer weight, makes a lot of sense.
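
Here is a minimal sketch of this weighted sum, continuing the CustomVGG example above (the input batch and the class index are placeholders; torch.einsum does the per-channel weighting and summation in one call):

```python
import torch

ANOMALY = 1  # assumed index of the 'Anomaly' class in the Dense layer output

model.eval()
images = torch.randn(1, 3, 224, 224)  # placeholder input batch

with torch.no_grad():
    scores, feature_maps = model(images)  # feature_maps: (N, 512, 14, 14)
    probs = torch.softmax(scores, dim=1)  # 'Good'/'Anomaly' probabilities
    # Dense layer weights for the 'Anomaly' class: shape (512,)
    w_anomaly = model.classification_head[-1].weight[ANOMALY]
    # Weighted sum over the 512 channels -> one 14x14 heatmap per image
    heatmap = torch.einsum("nchw,c->nhw", feature_maps, w_anomaly)
```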

Interestingly, using Global Average Pooling rather than Global Max Pooling is crucial for making the model find the whole object. Here is what the original paper, Learning Deep Features for Discriminative Localization, says:

"We believe that Global Average Pooling loss encourages the network to identify the extent of the object as compared to Global Max Pooling which encourages it to identify just one discriminative part. This is because, when doing the average of a map, the value can be maximized by finding all discriminative parts of an object as all low activations reduce the output of the particular map. On the other hand, for Global Max Pooling, low scores for all image regions except the most discriminative one do not impact the score as you just perform a max."

Image 9. The final heatmap is calculated by summing the feature maps, each multiplied by the Dense layer weight that affected the 'Anomaly' class score. We may guess that feature maps such as 139 and 181 have large positive weights during summation, feature map 172 has a large negative weight, and feature map 127 probably has a weight close to 0, so it doesn't affect how the final heatmap looks. Image by Author

Step 3. The next step is to upsample the heatmap to match the input image size of 224×224. Bilinear upsampling works fine, as would any other upsampling method.
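
A one-call sketch of this upsampling step, continuing the example above:

```python
import torch.nn.functional as F

# Bilinearly upsample the 14x14 heatmap to the 224x224 input size.
heatmap_full = F.interpolate(
    heatmap.unsqueeze(1),   # (N, 1, 14, 14): interpolate expects a channel dim
    size=(224, 224),
    mode="bilinear",
    align_corners=False,
).squeeze(1)                # back to (N, 224, 224)
```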

Coming back to the model output: the model returns probabilities for the classes 'Good' and 'Anomaly' and a heatmap that shows which pixels were important when calculating the 'Anomaly' score. The model always returns the heatmap, no matter whether it classified the image as 'Good' or 'Anomaly'; when the class is 'Good', we simply ignore the heatmap.

Image 10. The model in inference mode should output the 'Anomaly' class heatmap. Image by Author

The heatmaps look quite good (see Image 11) and explain which region made the model decide that the image belongs to the 'Anomaly' class. We may stop here, or (as I promised) process the heatmap into a bounding box.

Image 11. Heatmaps for some 'Anomaly' class images. Image by Author

From heatmaps to bounding boxes. You may come up with several approaches here; I'll show you the simplest one. In most cases, it works pretty well. (A code sketch follows Image 12 below.)

1. First, normalize the heatmap so all values are in the range [0, 1].

2. Select a threshold and apply it to the heatmap, so all values larger than the threshold are transformed into 1s and all smaller values into 0s. The larger the threshold, the smaller the bounding box will be. I like how the results look with a threshold in the range [0.7, 0.9].

3. We assume that the region of 1s is a single dense region. Then plot a bounding box around it by finding the argmin and argmax along the height and width dimensions.

However, note that this approach can return only a single bounding box (by definition), so it will fail if the image has multiple defective regions.

Image 12. How to process the heatmap into a bounding box. Image by Author
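
Here is a minimal NumPy sketch of these three steps (the function name and default threshold are illustrative):

```python
import numpy as np

def heatmap_to_bbox(heatmap: np.ndarray, threshold: float = 0.8):
    """Return (x_min, y_min, x_max, y_max), or None if nothing passes the threshold."""
    # 1. Normalize the heatmap so all values are in [0, 1].
    hm = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
    # 2. Binarize: values above the threshold become 1s, the rest 0s.
    mask = hm > threshold
    if not mask.any():
        return None
    # 3. Treat the 1s as a single dense region and box it with argmin/argmax.
    ys, xs = np.where(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()
```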

Evaluation

Let's evaluate the approach on 5 subsets from the MVTec Anomaly Detection Dataset: Hazelnut, Leather, Cable, Toothbrush, and Pill.

For each subset, I trained a separate model; 20% of the images were selected as a test set, randomly and in a stratified manner. No data augmentations were used. I applied class weighting in the loss function: 1 for the 'Good' class and 3 for 'Anomaly', because in most subsets there are 3 times more good images than anomalous ones. The model was trained for at most 10 epochs, with early stopping if the train set accuracy reached 98%. Here is my notebook with the training script.

Below are the evaluation results. The train set size for the subsets is 80–400 images. Balanced accuracy is between 81.7% and 95.5%. Some subsets, such as Hazelnut and Leather, are easier for the models to learn, while Pill is a relatively hard subset.

Image 13. Evaluation results for 5 subsets from the MVTec Anomaly Detection Dataset. Image by Author
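
Balanced accuracy is the mean of the per-class recalls, so the majority 'Good' class cannot inflate the score on an imbalanced test set. A minimal sketch with scikit-learn (the labels below are made-up toy values, not my actual predictions):

```python
from sklearn.metrics import balanced_accuracy_score

# Toy labels for illustration only (0 = 'Good', 1 = 'Anomaly').
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0]

# Mean of per-class recalls: (5/6 + 2/3) / 2 = 0.75
print(balanced_accuracy_score(y_true, y_pred))
```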

That's it for the numbers; now let's see what the predictions look like. In most cases, the model produces a correct class prediction and a precise bounding box if the class is 'Anomaly'. However, there are some errors: either an incorrect class prediction, or a wrong bounding box location when the class is correctly predicted as 'Anomaly'.

Image 14. Hazelnut subset: Predictions on the Test Set. Image by Author
Image 15. Leather subset: Predictions on the Test Set. Image by Author
Image 16. Cable subset: Predictions on the Test Set. Image by Author
Image 17. Toothbrush subset: Predictions on the Test Set. Image by Author
Image 18. Pill subset: Predictions on the Test Set.

Conclusion

In this post, I wanted to show you that neural networks are not the black-box algorithms some may think, but are quite explainable when you know where to look 🙂 The approach described here is one of many ways to explain your model's predictions.

Of course, the model is not that accurate, mostly because it is my quick pet project. But if you work on a similar task, feel free to take my results as a starting point, invest more time, and get the accuracy you need.

I am open-sourcing the code for this project in this GitHub repository. Feel free to use my results as a starting point for your project 🙂

What’s next?

If you'd like to improve the accuracy of this anomaly detection model, adding data augmentation is the place to start. I recommend reading my post, Complete Guide to Data Augmentation for Computer Vision. There you'll find how to use data augmentations to benefit your model, not harm it 🙂

In case you are interested in case studies, check my tutorial, Gentle Introduction to 2D Hand Pose Estimation: Approach Explained.

And follow me on Twitter or Telegram so you don't miss my new posts 🙂

References

[1] Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba: Learning Deep Features for Discriminative Localization; in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. pdf

[2] Paul Bergmann, Kilian Batzner, Michael Fauser, David Sattlegger, Carsten Steger: The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: International Journal of Computer Vision, January 2021. pdf

[3] Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger: MVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019. pdf

