What Makes Computer Vision AI Development So Risky?
Last Updated on July 17, 2023 by Editorial Team
Author(s): Sumit Singh
Originally published on Towards AI.
Reports suggest 80% of computer vision development effort goes directly into labeling large volumes of data.
But if you talk to ML practitioners, they will unanimously tell you it is the worst part of the computer vision data pipeline.
No one wants to do it. For ML heads this turns into a sticky situation: labeling is the most time-consuming, challenging, and crucial part of the pipeline, yet it is the least interesting task for their team members.
So what options do they have?
Most teams generally have two options:
1. Hire an in-house team of annotators and domain experts who can help the ML team generate high-quality training data in large volumes.
2. Hire a third-party annotation BPO to do it for the team.
Why isn't option 1 always easy?
Because hiring a large team takes time and effort, and you are never sure how long you can keep them.
It also comes at a huge cost.
So option 2 it is. Simple, right?
No! Contracting a third-party BPO can take hiring and training off your plate, but it can jeopardize the company's proprietary data, and you may lose visibility into the annotation work.
And ensuring the quality of the training data produced is a huge effort in itself, one that ML heads simply cannot outsource.
Making the right decision is critical, and ML heads generally have to wait a few quarters to learn whether the decision was right or wrong.
We don't have to emphasize that a wrong decision can cost these ML heads their jobs, and in some cases the whole team gets shelved.
That's what makes computer vision AI development so risky.
Let's assess the scenario in detail
What are the factors ML heads must consider?
- A large amount of data is required to train the models, which can be difficult and expensive to acquire. But you cannot avoid it.
- Another is the need for high computational power, which can be costly and time-consuming to implement.
- Additionally, there are many technical challenges that must be overcome, such as dealing with variability in lighting conditions and image resolution and ensuring that the models are robust and able to generalize to new situations.
- Data privacy and security, if ML heads decide to outsource labeling to a BPO or choose a platform for that matter.
- Ensuring that annotated data meets the guidelines, because bad data means bad models.
- Finally, there are also ethical and legal considerations, such as ensuring that the models are not biased and do not violate privacy laws.
How to choose wisely?
It is very subjective and varies from case to case. Most often, the answer is not either option on its own but a combination of both.
Here are the practices successful teams generally follow:
- Start Small: At the initial stage, the team should assess open-source models to build a proof-of-concept. They should try to gather data internally or from open datasets to get a small amount of training data to train the model (see the first sketch after this list).
- Start looking at tools and BPOs from day 1: The team may not find enough training data or a ready-to-use model in open source, so they should start engaging with tool providers on small ticket sizes and keep them engaged. This helps them scale quickly once the POC succeeds.
- Keep the better vendors engaged: Assess multiple platforms and benchmark them.
- Don't rely too much on model tuning: Don't hire in-house data scientists hoping they will tune the model into better predictions. Most experienced ML heads embrace the data-centric AI approach to improve their models.
- Look for Subject Matter Experts now: SMEs are the ones who help you generate ground truth and help the team assess the quality of training data.
- Prepare a guideline as objective as possible: With the help of SMEs, build a guideline that steers third-party annotators to label data correctly (see the second sketch after this list). Spend as much time as possible on this.
- Keep the data secured at all times: Ask tool providers for proper information-security documentation. Prefer a provider that can manage both the platform and the human workforce itself; it helps ML heads run the project more efficiently. Have a proper NDA with the vendors.
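For the "Start Small" step, here is a minimal sketch of what a proof-of-concept can look like: fine-tuning a pretrained open-source classifier on a small, internally labeled image folder. The dataset path, the choice of ResNet-18 from torchvision, and the hyperparameters are illustrative assumptions, not a prescribed setup.

```python
# Minimal proof-of-concept sketch (assumes torch/torchvision are installed and a
# small labeled dataset is arranged as data/train/<class_name>/*.jpg --
# the path, model, and hyperparameters are placeholders).
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("data/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)

# Start from pretrained weights so a small dataset is enough to sanity-check the idea.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):  # a handful of epochs is enough for a POC signal
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```

If a small POC like this already shows a usable signal, scaling the data pipeline (option 1, 2, or a mix) becomes a much better-informed bet.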
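And for the guideline step, one way to keep a guideline objective is to express as much of it as possible in a machine-readable form that annotators, reviewers, and QA scripts can all share. The schema below is purely hypothetical; the classes, thresholds, and rules are placeholders meant to illustrate the level of precision worth aiming for, not real requirements.

```python
# Hypothetical annotation guideline expressed as a label schema.
# Every field is illustrative; real classes, thresholds, and rules should
# come from the SMEs and the actual use case.
LABEL_SCHEMA = {
    "task": "bounding_box",
    "classes": {
        "vehicle": {
            "definition": "Any motorized road vehicle, including partially visible ones.",
            "min_box_pixels": 20 * 20,   # boxes smaller than this are skipped
            "label_if_occluded": True,   # label when roughly 30%+ is visible
        },
        "pedestrian": {
            "definition": "A person on foot; exclude people inside vehicles.",
            "min_box_pixels": 15 * 15,
            "label_if_occluded": True,
        },
    },
    "edge_cases": [
        "Reflections, posters, and billboard images are NOT labeled.",
        "If unsure between two classes, flag the image for SME review instead of guessing.",
    ],
    "quality_target": {
        "inter_annotator_agreement_iou": 0.8,  # checked on a sampled subset
    },
}
```

Writing the rules this precisely makes it possible to audit outsourced annotations against the guideline instead of against individual annotators' judgment.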
Conclusion
Computer vision AI development is highly experimental and full of surprises. It takes years of effort to build the right model, one that can scale and deliver business value to the organization.
Follow the process outlined above to improve your chances of success.