Striking the Right Balance
Last Updated on July 4, 2022 by Editorial Team
Author(s): Supreet Kaur
Originally published on Towards AI, the World's Leading AI and Technology News and Media Company.
ML or no ML?
In an era where Artificial Intelligence and ML are becoming second nature to organizations, it is sometimes essential to step back and ask whether machine learning is actually relevant to your use case. Data Scientists and leaders often find themselves at a crossroads: build and deploy a machine learning model, or stick with simple data analytics.
This blog will serve as an end-to-end guide to making this tough yet crucial decision. It is divided into two parts: the first covers the elements to assess when deciding whether you have what you need to leverage machine learning algorithms; the second guides you through choosing between a simple and a complex machine learning model.
Let’s start with the first part. Ask yourself the questions below to see if you have the right ingredients to build a machine learning model.
1. Availability of Data: We have all heard about “Garbage in, garbage out.” Data is an essential element in deciding the success of your model. So, it is imperative to thoroughly analyze what data is available for your relevant use case. Building a model is a data-intensive process, especially if it is a deep learning model. It is vital to analyze all the data sources in your respective data warehouses to check whether you can build efficient data pipelines from them.
2. Quality of Data: Not just the quantity of data, but quality plays a critical role in deciding if you can opt for the AI route. It is of utmost necessity to perform data quality checks for sparsity, null values, accuracy, etc.
If internal data is absent, you can always explore the option of proxy or synthetic data.
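As a concrete starting point, the sparsity and null-value checks mentioned above can be sketched in a few lines of pandas. The column names, values, and the 30% missingness threshold below are purely illustrative assumptions, not prescriptions.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with some missing values.
df = pd.DataFrame({
    "age": [34, np.nan, 45, 29, np.nan, 52],
    "income": [72000, 58000, np.nan, 61000, 90000, np.nan],
    "clicked": [1, 0, 0, 1, 1, 0],
})

# Fraction of missing values per column (a basic sparsity check).
null_ratio = df.isna().mean()

# Flag columns whose missingness exceeds an illustrative 30% threshold.
too_sparse = null_ratio[null_ratio > 0.30].index.tolist()
print(null_ratio)
print("Columns too sparse to use as-is:", too_sparse)
```

Checks like this, run early against each candidate source, make the "do we have enough usable data" conversation concrete rather than anecdotal.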
3. Resources: A successful model-building project requires a strong team of Data Scientists, ML/data engineers, software developers, strategists, etc. Hiring the right people is very important to ensure the successful delivery of your ML/AI project. Beyond human resources, you should also check the budget for the platforms and cloud technologies needed to deploy your project successfully.
4. Regulatory Challenges: If you are part of a highly regulated industry, it is crucial to understand the regulatory challenges you might face down the line and the approvals you need to deploy your model in production. Regulatory authorities might require you to build explainable models or perform fairness and bias testing on your model. You should account for that time in your model-building process.
5. Complexity of Problem and Measurable Outcome: The most important, and often overlooked, question is how complex the problem really is. This is not something that should or can be determined in isolation. Have a thorough conversation with your business partners and stakeholders about the business impact of the problem, along with the metric you are trying to move. That metric could be anything from user retention to click-through rate.
Sometimes it might be a simple statistical technique that could solve the use case at hand, but once you marry all these elements, you will be able to make a thoughtful and calculated decision. ML deployment can be unpredictable, but at least you would know that you did your due diligence before going down the path of AI.
You might also be at a crossroads on whether to go with ML or statistical models. ML aims to find patterns within the data to make meaningful predictions. If, instead, you want to understand the relationships between variables and how significant those relationships are, it is best to stick with statistical models. There are other differences as well: statistical models rest on a few underlying assumptions that may not hold for industrial-scale data, though that can vary depending on your use case.
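To make the statistical side of that distinction concrete, here is a minimal sketch using `scipy.stats.linregress` on synthetic data (the true slope of 2.0 and the noise level are assumptions for illustration): a statistical fit reports not just a slope but also a p-value for how significant the relationship is.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)  # assumed true slope = 2.0

# A statistical fit surfaces both the estimate and its significance.
result = linregress(x, y)
print(f"slope={result.slope:.2f}, p-value={result.pvalue:.1e}")
```

An ML pipeline would typically stop at the prediction; the p-value and confidence in the slope are what the statistical route buys you.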
Now let’s come to the second part of the blog. Suppose we decide to go the ML route; the question becomes whether to choose a simple or a complex model. No one size fits all, so I cannot advocate choosing one over the other. But you can make a calculated decision by analyzing the elements below.
1. Start with a simple model: It is always helpful to start with vanilla models. This step helps you assess model performance and gives you a baseline number. It is also preferable to present a diverse array of models to your stakeholders so they can choose among them.
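The cheapest possible baseline is worth writing down explicitly. A minimal sketch, assuming a binary-classification setting with illustrative labels: predict the majority class everywhere and record the accuracy any real model must beat.

```python
import numpy as np

# Illustrative binary labels; in practice these come from your dataset.
y = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])

# Majority-class baseline: the simplest possible "model".
majority_class = np.bincount(y).argmax()
baseline_accuracy = np.mean(y == majority_class)
print(f"majority class = {majority_class}, baseline accuracy = {baseline_accuracy:.2f}")
```

If a complex model cannot clearly beat this number, the added complexity is not paying for itself.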
2. Accuracy vs. Interpretability: Data Scientists often deal with this tradeoff. Complex models often provide higher accuracy, but can you build an explainable model around them? It is of utmost importance that your senior leadership trusts your results, so you must unbox the black box for them. It is also essential for Data Scientists to understand the driver behind the decision. But if accuracy is your focus, feel free to skip this step.
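One reason simple models score well on interpretability: with a linear fit, each learned coefficient can be read directly as a driver behind the prediction. The feature names and the assumed ground-truth effects below are hypothetical, chosen only to illustrate the readout.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
features = ["tenure_months", "support_tickets", "monthly_spend"]  # hypothetical
X = rng.normal(size=(n, 3))
# Assumed ground truth: tenure helps the outcome, support tickets hurt it.
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.3, size=n)

# Fit a plain linear model and rank features by coefficient magnitude.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
drivers = sorted(zip(features, coef), key=lambda t: abs(t[1]), reverse=True)
for name, c in drivers:
    print(f"{name}: {c:+.2f}")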
3. Bias vs. Variance Tradeoff: This is a known but often encountered situation. We all know that complex models can lead to overfitting while simple models can lead to underfitting, so it is crucial to find the right balance. Some practical ways you might combat this problem are tuning your hyperparameters, ensuring that your data is balanced, performing appropriate test and train division, etc.
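The overfitting-versus-underfitting contrast above can be made concrete with two extreme models on a held-out split; the data and the split below are illustrative assumptions. A constant predictor (high bias) has the same error everywhere, while a memorizing nearest-neighbor predictor (high variance) is perfect on training data but not on unseen points.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=40)

# A simple train/test division.
train, test = np.arange(30), np.arange(30, 40)

# High-bias model: always predict the training mean (underfits).
mean_pred = y[train].mean()
bias_test_mse = np.mean((y[test] - mean_pred) ** 2)

# High-variance model: memorize the training set, predict the nearest neighbor.
def nn_predict(x_query):
    return y[train][np.argmin(np.abs(x[train] - x_query))]

var_train_mse = np.mean([(nn_predict(xi) - yi) ** 2 for xi, yi in zip(x[train], y[train])])
var_test_mse = np.mean([(nn_predict(xi) - yi) ** 2 for xi, yi in zip(x[test], y[test])])
print(f"memorizer: train MSE = {var_train_mse:.3f}, test MSE = {var_test_mse:.3f}")
print(f"constant model: test MSE = {bias_test_mse:.3f}")
```

The gap between the memorizer's zero training error and its test error is exactly the overfitting that hyperparameter tuning and a proper train/test split are meant to catch.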
4. Speed or Training Time: Higher accuracy often means higher training time. Simple algorithms like linear regression are easy to implement and quick to run, while complicated algorithms take more training time. You might be able to work around the training time, but that would require more computational resources, so again this tradeoff is specific to your organization.
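A rough illustration of the training-time point, under assumed synthetic data: the same linear fit obtained in one closed-form solve versus many iterations of batch gradient descent (standing in for a more expensive iterative procedure). Absolute times vary by machine; the relative cost is what matters.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))
w_true = rng.normal(size=20)
y = X @ w_true + rng.normal(scale=0.1, size=5000)

# Simple route: one closed-form least-squares solve.
t0 = time.perf_counter()
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
t_ls = time.perf_counter() - t0

# "Complex" route: batch gradient descent for many epochs.
t0 = time.perf_counter()
w_gd = np.zeros(20)
lr = 0.01
for _ in range(2000):
    grad = X.T @ (X @ w_gd - y) / len(y)
    w_gd -= lr * grad
t_gd = time.perf_counter() - t0

print(f"lstsq: {t_ls:.4f}s, gradient descent: {t_gd:.4f}s")
```

Both routes recover essentially the same weights here; the iterative one simply pays far more compute to get there, which is the tradeoff to weigh when accuracy gains demand heavier training.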
Creating and deploying an ML model is a cumbersome process, so please use a project management tool to track progress, raise issues proactively, and ensure everyone is on the same page.
Keeping everyone informed might seem daunting at first, but it avoids a lot of rework and overhead later in the process. As complicated as this process is, once you can see and measure the impact of the analysis or your model, all the hassle will seem worthwhile.
Better data always beats better algorithms, so there is value in making appropriate investments to build and store efficient data pipelines.