
Top Three Ways To Get The Most Out Of Your Machine Learning Project

Author(s): Bipin Biddappa P K

Machine Learning, Opinion

And here is why you should be doing them right away

Photo by Icons8 team on Unsplash

Introduction

A machine learning model learns from the data it is given to make better decisions, but sometimes the model's performance is not good enough and we need more out of it.

You may also run into obstacles that keep you from improving the model further, such as a lack of data, a lack of knowledge, or, occasionally, a lack of time.

This is where amateur Data Scientists give up, and where a master Data Scientist proves his/her worth by pushing the model even further.

Do you also want to become a master Data Scientist? Then this article will help your model step up its game, so you get better performance out of it and distinguish yourself from the amateurs.

First, make sure your data isn't lying to you!

Yes, you read that right. Your data may often contain unknown or missing values that need to be addressed first; otherwise, these gaps can mislead your model and reduce its performance.

Missing values can also bias your model toward one outcome over the other possible outcomes, which is something you want to avoid if you want better performance.

But what are some of the ways to deal with this?

  1. Random guessing: this arbitrary method isn't usually recommended and should only be used when you are confident about what the missing values should be.
  2. Average it: replace each missing value with the average of the values present in that feature. This method isn't ideal either, as it can drastically reduce the variability of the data.
  3. Listwise deletion: delete every entry (row) that has any missing data. Before using this method, make sure that not too many entries have missing data, so you don't lose information that could have helped the model make better predictions.
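Checking for missing data and applying the last two strategies can be sketched with pandas. This is a minimal illustration on a small made-up DataFrame (the column names `age`, `income` and `label` are hypothetical): it first measures how much data is missing, then applies mean imputation and listwise deletion, and finally compares the variance of `age` before and after imputation to show the loss of variability mentioned in point 2.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: NaN marks a missing value.
df = pd.DataFrame({
    "age": [25, np.nan, 35, 45, np.nan, 55],
    "income": [40_000, 52_000, np.nan, 61_000, 58_000, 75_000],
    "label": [0, 1, 0, 1, 1, 1],
})

# How much is missing? Count NaNs per column and the share of affected rows.
missing_per_column = df.isna().sum()
share_of_rows_with_missing = df.isna().any(axis=1).mean()
print(missing_per_column)
print(share_of_rows_with_missing)      # 0.5

# Strategy 2 ("Average it"): replace each NaN with its column's mean.
mean_imputed = df.fillna(df.mean(numeric_only=True))

# Strategy 3 (listwise deletion): drop every row containing at least one NaN.
listwise = df.dropna()
print(len(listwise))                   # 3 of the original 6 rows survive

# Mean imputation shrinks the spread of the feature (sample variance, ddof=1).
print(df["age"].var())                 # ~166.67, computed on observed values only
print(mean_imputed["age"].var())       # 100.0
```

Note how half of the rows disappear under listwise deletion in this toy example, which is exactly the situation point 3 warns about.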

If one model doesn't work for you, then use an army of models!

There is always something magical about unity, and your model can use this magic too. Any Data Scientist worth their salt has heard of Ensemble methods, which rest on the idea that the collective decision of several models is often better than the decision of any single model.

To explain further: in an Ensemble method, several weak learners are trained on the same dataset to reach a decision, and a single strong learner is built on top of them that considers the opinions of all those weak learners (for example, through a majority vote) to produce its own decision.
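As a minimal sketch of this idea, scikit-learn's VotingClassifier combines several models by a majority ("hard") vote. The choice of learners below (a logistic regression, a depth-1 decision tree or "stump", and Gaussian naive Bayes) and the synthetic dataset from make_classification are assumptions for illustration, not a recommended recipe. The last lines reproduce the vote by hand to show that the ensemble's decision is nothing more than the combined opinions of its members.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (stand-in for your own dataset).
X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Three deliberately simple learners, each trained on the same training data.
learners = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("stump", DecisionTreeClassifier(max_depth=1, random_state=1)),
    ("nb", GaussianNB()),
]

ensemble = VotingClassifier(estimators=learners, voting="hard")
ensemble.fit(X_train, y_train)

for name, model in ensemble.named_estimators_.items():
    print(f"{name}: {model.score(X_test, y_test):.3f}")
print(f"majority vote: {ensemble.score(X_test, y_test):.3f}")

# Reproduce the hard vote by hand: each fitted learner casts one vote per sample.
votes = np.array([model.predict(X_test) for model in ensemble.estimators_])
manual_majority = (votes.sum(axis=0) >= 2).astype(int)  # at least 2 of 3 vote "1"
```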

Take scikit-learn's RandomForestClassifier, for instance: it trains several decision trees and combines their results into a single output, which makes it a perfect example of an Ensemble method.
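To see that the forest's output really is built from its trees, the sketch below fits a RandomForestClassifier on synthetic data (an assumption; swap in your own `X` and `y`) and checks that averaging the class probabilities of the individual trees stored in `estimators_` reproduces the forest's own `predict_proba`, which is how scikit-learn documents the forest's predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (stand-in for your own dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# The fitted forest keeps its individual decision trees in `estimators_`.
n_trees = len(forest.estimators_)

# Average the class probabilities predicted by each tree for a few samples...
tree_probas = np.mean([tree.predict_proba(X[:5]) for tree in forest.estimators_], axis=0)

# ...and compare with the forest's own prediction for the same samples.
forest_probas = forest.predict_proba(X[:5])
print(n_trees, np.allclose(tree_probas, forest_probas))
```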

Now, why would you do that, you ask?

  1. Ensemble methods help reduce overfitting, the scenario where your model memorizes the training data rather than learning from it. An overfit model scores high accuracy on the training data but underperforms on the test data.
  2. Ensemble methods often boost model performance, because the mistakes of individual weak learners tend to partly cancel out when their decisions are combined.
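Point 1 can be checked with a quick experiment: a fully grown single decision tree fits its training data perfectly but scores noticeably lower on held-out data, while a random forest trained on the same split tends to generalize better. This is a sketch on synthetic data with 10% label noise (`flip_y=0.1`, an assumption chosen to make overfitting visible); the exact numbers will differ on real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise, which an unconstrained tree will memorize.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "single tree": DecisionTreeClassifier(random_state=42),  # no depth limit
    "random forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # accuracy on data the model has seen
    test_acc = model.score(X_test, y_test)     # accuracy on unseen data
    results[name] = (train_acc, test_acc)
    print(f"{name:>13}: train={train_acc:.3f}  test={test_acc:.3f}")
```

A large gap between the train and test columns is the signature of overfitting; comparing the test column across the two models shows how much the ensemble helps on this particular dataset.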

These are some of the reasons why Ensemble methods are so widely used in Kaggle competitions to squeeze better results out of models.

Don't be shy, play with the model's hyper-parameters

Hyper-parameters are the settings of your model that you choose before training starts, such as the number of trees in a random forest or the maximum depth of each tree. Unlike the model's parameters, they are not learned from the data during training, but choosing good values helps the model perform better in its testing phase.

Although you could set the right hyper-parameters directly, that requires significant domain knowledge and experience. For this article, I am going to assume you don't have those insights up front.

So what is the next best thing you can do, you ask?

You can use scikit-learn's GridSearchCV to search for the right hyper-parameters. It evaluates the model with cross-validation for every combination of values in a grid you specify and keeps the best-performing combination, giving you a model that is optimally tuned within that grid.
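A minimal sketch of that search with scikit-learn, assuming a RandomForestClassifier as the model and synthetic data in place of your own. The grid of candidate values is an illustrative assumption: each of its 2 × 2 = 4 combinations is scored with 5-fold cross-validation on the training set, and the best one is then refit and checked on held-out test data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (stand-in for your own dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Candidate hyper-parameter values (illustrative, not tuned recommendations).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],  # None lets each tree grow until its leaves are pure
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # tries every combination, then refits the best one

print(search.best_params_)     # winning combination
print(search.best_score_)      # its mean cross-validated accuracy
test_accuracy = search.score(X_test, y_test)  # uses the refit best estimator
print(test_accuracy)
```

Because the number of fits grows multiplicatively with every hyper-parameter you add to the grid, it pays to start with a coarse grid and refine it around the best values.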

Read more about GridSearchCV here

Here comes a bonus tip!

Feature engineering is your friend, don't ignore it!

In feature engineering, you choose the right features to train your model on. Sometimes this creates new features: when you use one-hot encoding, for instance, a single categorical column becomes one new column per category. It can also mean omitting features that add nothing to your model's output. For example, datasets often contain an ID column, which only identifies each row and tells the model nothing useful about the target; such features can always be dropped, reducing the burden on your model.
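Both ideas from this paragraph, dropping an uninformative ID column and creating new features with one-hot encoding, can be sketched with pandas. The DataFrame and its column names (`id`, `color`, `size_cm`) are hypothetical.

```python
import pandas as pd

# Hypothetical dataset: "id" only labels the rows, "color" is categorical.
df = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "color": ["red", "green", "red", "blue"],
    "size_cm": [10.5, 7.2, 11.0, 8.3],
})

# Omit the identifier column: it carries no information about the target.
features = df.drop(columns=["id"])

# One-hot encode "color": each category becomes its own 0/1 column.
features = pd.get_dummies(features, columns=["color"], dtype=int)
print(features)
```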


Top Three Ways To Get The Most Out Of Your Machine Learning Project was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
