

Underwater Object Segmentation Using MonkAI
Computer Vision


Last Updated on January 6, 2023 by Editorial Team

Author(s): Omkar Tupe

Photo by NOAA on Unsplash


Table of contents

  1. About the project
  2. Features of Monk AI
  3. Segmentation
  4. Unet
  5. Inference based on an already trained model
  6. Training
  7. Inference (post-training)
  8. Conclusion
  9. References

1. About the project

This project focuses on segmenting different underwater objects such as animals, plants, plastic, and ROVs (Remotely Operated Vehicles) using Unet [1] through the low-code Monk toolkit [2]. An automatic river or sea trash-cleaning system needs a proper understanding of the different objects present in the water, and this project helps develop such a system on a small scale. Through this blog, I will share some insights about MonkAI and how it can be used to simplify object segmentation and build other computer vision applications.

Tutorial available on Github

2. Features of Monk AI [5]

  • Quick mode for beginners
  • PyTorch, MXNet, Keras, TensorFlow, etc. can be accessed with a common syntax.
  • Standard workflows for simple transfer-learning applications
  • For competition and hackathon participants: the hassle-free setup makes prototyping faster and easier

3. Segmentation [4]

Segmentation tells us the category of each pixel, which reveals both the location and the shape of each object in an image. Image segmentation produces a pixel-wise mask of the image and finds applications in medical imaging, self-driving cars, and satellite imaging.
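To make "a pixel-wise mask" concrete, here is a toy example (the class ids and the tiny 4x4 mask are made up for illustration, not taken from the project's dataset):

```python
import numpy as np

# Toy 4x4 label mask: each pixel holds a class id instead of a color.
# Illustrative ids: 0 = background, 1 = ROV, 4 = trash.
mask = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [4, 4, 0, 0],
    [4, 4, 0, 0],
])

# Location: where a class occurs. Shape/extent: how many pixels it covers.
rov_pixels = np.argwhere(mask == 1)     # coordinates of the ROV pixels
trash_area = int((mask == 4).sum())     # pixel count of the trash object

print(rov_pixels.tolist())  # [[0, 2], [0, 3], [1, 2], [1, 3]]
print(trash_area)           # 4
```

Reading the mask back like this is exactly how per-class statistics (and later, metrics like IoU) are computed from a segmentation output.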

A. Semantic segmentation

Semantic segmentation labels each pixel of an image with the class of what is being represented. The following picture helps illustrate the difference between object detection, semantic segmentation, and instance segmentation.

Object Detection vs Semantic Segmentation vs Instance Segmentation

B. Instance segmentation

For example, in the image above there are 3 people, technically 3 instances of the class "Person". Semantic segmentation does not differentiate between the instances of a particular class, whereas instance segmentation does.

4. Unet

Unet [1] was developed by Olaf Ronneberger et al. for biomedical image segmentation. The architecture contains two paths. The first path, the encoder or contraction path, captures the context of the image; it consists of a traditional stack of convolutional and max-pooling layers. The second path, the decoder or symmetric expanding path, enables precise localization using transposed convolutions.

In the original paper, the UNET is described as follows:
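To make the two paths concrete, here is a minimal one-level encoder-decoder sketch in PyTorch with a single skip connection. It is a toy illustration of the Unet idea, not the full architecture from the paper or the exact model Monk builds:

```python
import torch
import torch.nn as nn

class TinyUnet(nn.Module):
    """One-level U-Net-style network: one down step, one up step, one skip."""
    def __init__(self, in_ch=3, n_classes=5):
        super().__init__()
        # Contraction path: convolutions capture context, pooling downsamples.
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        # Expanding path: a transposed convolution upsamples back to the
        # original resolution for precise localization.
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.dec = nn.Sequential(
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),  # 32 = 16 up + 16 skip
            nn.Conv2d(16, n_classes, 1),                 # per-pixel class scores
        )

    def forward(self, x):
        skip = self.enc(x)
        x = self.bottleneck(self.pool(skip))
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)   # skip connection from the encoder
        return self.dec(x)

out = TinyUnet()(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 5, 64, 64]): per-pixel logits, input-sized
```

The output has the same spatial size as the input, with one channel of logits per class, which is what makes pixel-wise training possible.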

5. Inference based on an already trained model

A. Installation instructions

For training the network, a CUDA GPU is preferable (Google Colab provides one), but a local device or a Kaggle notebook also works. Now we will set up the MonkAI toolkit and its dependencies on Colab.

B. Inference (pre-trained model)

We need to import the required libraries for inference and set some hyperparameters, along with a dictionary of classes.
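The class dictionary could be sketched as a plain Python mapping; the pixel values below follow the assignment used for this project's masks (0 background, then the four main categories), though the exact dictionary format Monk expects may differ:

```python
# Pixel value assigned to each category in the mask images.
classes_dict = {
    "background": 0,
    "rov": 1,
    "plant": 2,
    "animal": 3,
    "trash": 4,
}

# The inverse mapping is handy when decoding a predicted mask back to names.
id_to_name = {v: k for k, v in classes_dict.items()}
print(id_to_name[4])  # trash
```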

Downloading the pre-trained model.

Now we will define the model, backbone, and path for the pre-trained model.

From the unzipped folder, we are using some images for inference purposes.

Inference-1

Inference-2

6. Training

We are using a dataset from the data repository [3].

Time to download our dataset.

Monk directory

Before generating the mask images, we need to check whether the dataset is balanced or not.

"Everyone wants to be perfect. So why should our dataset not be perfect? Let's make it perfect."

In the given dataset, we can easily see that the data is highly imbalanced, which hurts generalized accuracy. To achieve approximately equal accuracy for all classes, we should have an equal number of objects from each class.

From the above stats, we can see that rov and trash objects in particular far outnumber those of the other classes. For demonstration purposes, we are using 4 classes: plant, rov, animal, and trash.

We choose 20 objects from each category and group them according to the main category.

The above discussion is implemented through code onΒ Github.
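A toy sketch of that balancing step, with made-up annotation counts (the real implementation reads the dataset's label files; see the Github code for the actual version):

```python
import random
from collections import defaultdict

random.seed(0)

# Toy annotation list: (subcategory, image_id) pairs. Counts are illustrative.
annotations = (
    [("rov", i) for i in range(300)]
    + [("trash_plastic", i) for i in range(250)]
    + [("animal_fish", i) for i in range(40)]
    + [("plant", i) for i in range(60)]
)

# Bucket objects by subcategory.
by_subcat = defaultdict(list)
for subcat, img in annotations:
    by_subcat[subcat].append((subcat, img))

# Keep at most 20 objects per subcategory, grouped under the main category.
balanced = defaultdict(list)
for subcat, objs in by_subcat.items():
    picked = random.sample(objs, min(20, len(objs)))
    main = subcat.split("_")[0]          # e.g. trash_plastic -> trash
    balanced[main].extend(picked)

print({k: len(v) for k, v in balanced.items()})
# {'rov': 20, 'trash': 20, 'animal': 20, 'plant': 20}
```

With several subcategories per main category, capping each subcategory at 20 is what yields roughly 150 objects per main category, as noted below.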

4 main categories

  1. ROV
  2. Plant
  3. Animal: animal_eel, animal_crab, animal_etc, animal_fish, animal_shells, animal_starfish
  4. Trash: trash_etc, trash_fabric, trash_fishing_gear, trash_metal, trash_paper, trash_plastic, trash_rubber, trash_wood

So in the final count, we have approximately 150 objects for each main category. Now, based on the balanced data, we will make mask images of the selected images. We assign the pixel values as follows:

0- Background

1- ROV

2- Plant

3- Animal

4- Trash

We have 443 images with a total of 580 trainable objects, which implies that some images contain more than one object.

Now we will generate mask images based on the selected images (code). For segmentation training, we need paths for the original images as well as the mask images. The class dictionary has 5 categories with pixel values; we exclude the background from training, as we are only interested in the main categories.

Monk provides a wide range of backbones; we use efficientnetb3, one of the recommended choices, with the Unet model and an image size of (384, 384). We set the learning rate to 0.0001 and train for 120 epochs (for the detailed implementation, please check the file). An IoU of 0.45 is achieved.
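IoU (intersection over union) compares a predicted mask against the ground-truth mask: the pixels both masks agree on, divided by the pixels covered by either. A toy computation for a single binary class (the tiny masks below are made up for illustration):

```python
import numpy as np

# Toy ground-truth and predicted binary masks for one class.
gt   = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
pred = np.array([[1, 1, 1, 0],
                 [1, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])

intersection = np.logical_and(gt, pred).sum()   # pixels both masks mark
union        = np.logical_or(gt, pred).sum()    # pixels either mask marks
iou = intersection / union
print(round(float(iou), 3))  # 0.667
```

An IoU of 1.0 would mean a perfect overlap; the 0.45 reported above is averaged over the trained classes.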

7. Inference (post-training)

Now we want to examine the results of our trained model. The procedure is similar to the pre-trained case, but we use our own trained model, so the model path is different.

  1. Set the inference engine
  2. Define the classes
  3. Provide some images for testing

Test image-1

Test image-2

We can observe good results from the above test images. You can find more results on Github.

8. Conclusion

Compared to other categories, rov objects cover a larger area in the images, so we have more rov pixels for training, and the results are slightly biased towards rov. These results were obtained by simultaneously adjusting a large number of hyperparameters, which usually takes a long time; thanks to Monk, we completed this challenging task within a considerably small time frame, creating segmentation pipelines with just a few lines of code. Trying out multiple pairs of backbones and models can also help to get better results. Overall, Monk AI is a great library that considerably simplifies computer vision tasks. You can find the code for this article here.

For more examples of detection and segmentation, please visit the application model zoo.

Thanks for reading! I hope you find this article informative and useful. Do share your feedback in the comments section!

9. References

  1. Unet: https://arxiv.org/abs/1505.04597
  2. Monk AI: https://github.com/Tessellate-Imaging/Monk_Object_Detection
  3. Dataset: https://conservancy.umn.edu/handle/11299/214865
  4. Segmentation: https://towardsdatascience.com/computer-vision-instance-segmentation-with-mask-r-cnn-7983502fcad1
  5. Features of Monk AI: https://devpost.com/software/monkai


Underwater Object Segmentation Using MonkAI was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
