Site icon Towards AI

The Azure ML Algorithm Cheat Sheet

Author(s): Lawrence Alaso Krukrubo

Cloud Computing, Machine Learning

Understanding how to choose the best ML models

A common question often asked in Data Science is:-

Which machine learning algorithm should I use?

While there is no Magic-Algorithm that solves all business problems with zero errors, the algorithm you select should depend on two distinct parts of your Data Science scenario…

img_credit
  1. What do you want to do with your data?: Specifically, what is the business question you want to answer by learning from your past data?
  2. What are the requirements of your Data Science scenario?: Specifically, what is the accuracy, training time, linearity, number of parameters, and number of features your solution supports?

The ML Algorithm cheat sheet helps you choose the best machine learning algorithm for your predictive analytics solution. Your decision is driven by both the nature of your data and the goal you want to achieve with your data.

The Machine Learning Algorithm Cheat-sheet was designed by Microsoft Azure Machine Learning (AML), to specifically answer this question:-

What do you want to do with your data?

The Data Science Methodology:

I must state here that we need to have a solid understanding of the iterative system of methods that guide Data Scientists on the ideal approach to solving problems using the Data Science Methodology. Otherwise, we may never fully understand the essence of the ML Algorithm Cheat-sheet.

The Azure Machine Learning Algorithm Cheat-sheet:

The AML cheat-sheet is designed to serve as a starting point, as we try to choose the right model for predictive or descriptive analysis. It is based on the fact that there is simply no substitute for understanding the principles of each algorithm and the system that generated your data.

The AML Algorithm cheat-sheet can be downloaded here.

The AML Algorithm cheat-sheet | img_credit

Overview of the AML Algorithm Cheat-sheet:

The Cheat-sheet covers a broad library of algorithms from classification, recommender systems, clustering, anomaly detection, regression, and text analytics families.

Every machine learning algorithm has its own style or inductive bias. So, for a specific problem, several algorithms may be appropriate, and one algorithm may be a better fit than others.

But it’s not always possible to know beforehand which is the best fit. Therefore, In cases like these, several algorithms are listed together in the Cheat-sheet. An appropriate strategy would be to compare the performance of related algorithms and choose the best-befitting to the requirements of the business problem and data science scenario.

Bear in mind that the machine learning process is a highly iterative process.

AML Algorithm Cheat-sheet Applications:

1. Text Analytics:

If the solution requires extracting information from text, then text analytics can help derive high-quality information, to answer questions like:-

What information is in this text?

Text-based algorithms listed in the AML Cheat-sheet include the following:

2. Regression:

We may need to make predictions on future continuous values such as the rate-of-infections and so on… These can help us answer questions like:-

How much or how many?

Regression algorithms listed in the AML Cheat-sheet include the following:

3. Recommenders:

Well, just like Netflix and Medium, we can generate recommendations for our users or clients, by using algorithms that perform remarkably well on content and collaborative filtering tasks. These algorithms can help answer questions like:-

What will they be interested in?

The Recommender algorithm listed in the AML Cheat-sheet includes:-

  1. Predict ratings for a given user and item.
  2. Recommend items to a user

The SVD Recommender also has the following features:- Collaborative filtering, better performance with lower cost by reducing the dimensionality.

4. Clustering:

If we want to seek out the hidden structures in our data and to separate similar data points into intuitive groups, then we can use clustering algorithms to answer questions like:-

How is this organized?

The Clustering algorithm listed in the AML Cheat-sheet include:-

  1. Detecting abnormal data
  2. Clustering text documents
  3. Analyzing datasets before we use other classification or regression methods.

5. Anomaly Detection:

This technique is useful as we try to identify and predict rare or unusual data points. For example in IoT data, we could use anomaly-detection to detect and raise an alarm as we analyze the logs-data of a machine. This could be used to identify strange IP addresses or unusually high attempts to access the system or any other anomaly that could pose a serious threat.

Anomaly detection can be used to answer questions like:-

Is this weird, is this abnormal?

The Anomaly detection algorithm listed in the AML Cheat-sheet include:-

The PCA-Based Anomaly Detection module solves the problem by analyzing available features to determine what constitutes a “normal” class. The module then applies distance metrics to identify cases that represent anomalies.

This approach lets you train a model by using existing imbalanced data. PCA records fast training times.

6. Multi-Class Classification:

Often, we may need to pick the right answers from complex questions with multiple possible answers. For tasks like these, we need a Multi-class classification algorithm. This can help us answer questions like:-

Is this A or B or C or D?

Multi-class algorithms listed in the AML Cheat-sheet include the following:

Usually a Binary-Classifier, but in Multi-class logistic regression, the algorithm is used to predict multiple outcomes.

Features: Fast training times, linear model.

Between the input and output layers, you can insert multiple hidden layers. Most predictive tasks can be accomplished easily with only one or a few hidden layers. However, recent research has shown that deep neural networks (DNN) with many layers can be effective in complex tasks such as image or speech recognition, with successive layers used to model increasing levels of semantic depth.

Features: Accuracy, long training times.

Decision trees, in general, are non-parametric models, meaning they support data with varied distributions. In each tree, a sequence of simple tests is run for each class, increasing the levels of a tree structure until a leaf node (decision) is reached.

Features: Accuracy, fast training times.

Any binary classifier can be used as the basis for a one-versus-all model

Features: Depends on the two-class classifier.

A boosted decision tree is an ensemble learning method in which the second tree corrects for the errors of the first tree, the third tree corrects for the errors of the first and second trees, and so forth. Predictions are based on the ensemble of trees together.

Features: Non-parametric, fast training times, and scalable.

7. Binary Classification:

Binary classification tasks are the most common classification tasks. These often involve a yes or no, true or false, type of response. Binary classification algorithms help us to answer questions like:-

Is this A or B?

Binary-classifier algorithms listed in the AML Cheat-sheet include the following:

Features: Under 100 features, linear model.

Features: Fast training, linear model.

Features: Accurate, fast training.

Features: Fast training, linear model.

Features: Accurate, fast training, large memory footprint.

Classification using neural networks is a supervised learning method, and therefore requires a tagged dataset, which includes a label column. For example, you could use this neural network model to predict binary outcomes such as whether or not a patient has a certain disease, or whether a machine is likely to fail within a specified window of time.

Features: Accurate, long training times.

8. Image Classification:

If the analysis requires extracting information from images, then computer vision algorithms can help us to derive high-quality information, to answer questions like:-

What does this image represent?

The computer vision algorithms listed in the AML Cheat-sheet include:-

Features: High accuracy, better efficiency.

Summary:

The Azure Machine Learning Algorithm Cheat Sheet helps you with the first consideration: What do you want to do with your data? On the Machine Learning Algorithm Cheat Sheet, look for a task you want to do, and then find an Azure Machine Learning designer algorithm for the predictive analytics solution.

The Azure Machine Learning experience is quite intuitive and easy to grasp. The Azure Machine Learning designer is a drag-and-drop visual interface that makes it engaging and fun to build ML pipelines, assemble algorithms and run iterative ML jobs, build, train and deploy models all within the Azure portal. Once deployed, your models can be consumed by authorized, external, third-party applications in real-time.

Next Steps:

After deciding on the right model to choose for the business problem, using the Azure Machine Learning Cheat-sheet, the next step is to answer the second question:-

To get the best possible outcome for these metrics, kindly go through the details at the Azure Machine Learning site.

Cheers!!

About Me:

Lawrence is a Data Specialist at Tech Layer, passionate about fair and explainable AI and Data Science. I hold both the Data Science Professional and Advanced Data Science Professional certifications from IBM. After earning the IBM Data Science Explainability badge, my mission is to promote Fairness and Explainability in AI… I love to code up my functions from scratch as much as possible. I love to learn and experiment…And I have a bunch of Data and AI certifications and I’ve written several highly recommended articles.

Feel free to connect with me on:-

Github

Linkedin

Twitter


The Azure ML Algorithm Cheat Sheet was originally published in Towards AI — Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

Exit mobile version