
Understanding Tree Models

Last Updated on December 20, 2021 by Editorial Team

Author(s): Nirmal Maheshwari

Originally published on Towards AI.

Machine Learning

Life is full of decisions, and we eventually weigh which option to take using some form of logical analysis. In this series of blogs, we will make ourselves comfortable with two extremely popular machine learning models: decision trees and random forests.

For this blog, we will limit ourselves to decision trees and then move on to Random Forest, one of the most used algorithms in the industry. We will also cover the coding side of things by solving a real-world example. So let's start.

The Intuition Behind Decision Trees

With high interpretability and an intuitive algorithm, decision trees mimic the human decision-making process and excel in dealing with categorical data. Unlike other algorithms such as logistic regression or SVMs, decision trees do not look for a linear relationship between the independent variables and the target variable. Rather, they can be used to model highly nonlinear data.

Decision trees are a supervised learning algorithm, and you can easily explain all the factors leading to a particular decision or prediction. Hence, they are easily understood by business people. Let us understand this with the example of a loan approval system below.

Loan Dataset

Based on this historical data, when a new record comes in we need to decide whether that person is eligible for a loan or not. Let's visualize the decision tree for the above dataset.

Loan Approval Decision Tree

You can see that decision trees use a very natural decision-making process: asking a series of questions in a nested if-then-else structure.

At each node, you ask a question to further split the data held by that node. If the test passes, you go left; otherwise, you go right. We keep splitting at each node until we get a pure subset or reach a decision.
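To make the nested if-then-else idea concrete, here is a minimal Python sketch of such a decision rule. The feature names (employment type, income level) and the branching logic are assumptions for illustration, since the actual loan dataset appears only as an image above.

def approve_loan(employment_type, income_level):
    # Each "if" is a decision node; each "return" is a leaf of the tree.
    if employment_type == "salaried":
        if income_level in ("medium", "high"):
            return "approved"
        return "rejected"
    else:  # self-employed / unemployed
        if income_level == "high":
            return "approved"
        return "rejected"

print(approve_loan("salaried", "medium"))  # approved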

Interpreting a Decision Tree

Let's understand the above in terms of a decision tree.

Decision Tree

At every decision node, a test is conducted on one of the features in the dataset; if the test is passed, we move to one side of the tree, otherwise to the other. The outcome of the test takes us down one of the branches of the tree, and for a classification problem each leaf contains one of the classes.

A binary decision tree is one where each test splits the dataset into two parts, like deciding on the basis of employment type in the above example. If there is an attribute with, say, 4 possible values and we want to branch on each value, it becomes a multiway decision tree.

Decision trees are easy to interpret: you can almost always identify the various factors that lead to a given decision. Trees are often underestimated for their ability to relate the predictor variables to the predictions. As a rule of thumb, if interpretability by laymen is what you're looking for in a model, decision trees should be at the top of your list.
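As a quick illustration of this interpretability, the sketch below fits scikit-learn's DecisionTreeClassifier on a tiny, made-up loan dataset and prints the learned rules with export_text; the feature values and labels are invented purely for demonstration.

from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny hypothetical loan dataset: [is_salaried, high_income] -> loan approved (1) or not (0).
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)

# export_text prints the nested if-then-else rules the tree has learned.
print(export_text(tree, feature_names=["is_salaried", "high_income"]))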

Regression with Decision Trees

In regression problems, a decision tree splits the data into multiple subsets. The difference between decision tree classification and decision tree regression is that in regression each leaf predicts a numeric value rather than a class label, typically the mean of the target values in that leaf (some variants fit a small regression model at each leaf instead). For this blog, we will keep ourselves limited to classification problems, but I will mention certain points for regression wherever required.
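And a minimal regression counterpart with scikit-learn's DecisionTreeRegressor, on made-up numbers, just to show that each leaf ends up predicting a constant value (the mean of its training targets):

from sklearn.tree import DecisionTreeRegressor

# Hypothetical 1-D data whose target roughly follows a step function of x.
X = [[1], [2], [3], [10], [11], [12]]
y = [5.0, 5.5, 5.2, 20.0, 21.0, 19.5]

reg = DecisionTreeRegressor(max_depth=1, random_state=0)
reg.fit(X, y)

print(reg.predict([[2], [11]]))  # roughly [5.23, 20.17], the two leaf means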

Algorithms for Decision Tree Construction

Let us understand how a decision tree decides which attribute to split on first.

Concept of Homogeneity

In general, the rule says that we should try to split a node such that the resulting nodes are as homogeneous as possible, i.e., all the rows in a node belong to a single class after the split. If the split is on income level in the above example, try to pick a rule such that all the data points that pass the rule have one label (i.e., the node is as homogeneous as possible), and those that don't have another.

From the above, we arrive at the general rule below:

Homogeneity

The rule says that we keep splitting the data as long as its homogeneity is below a certain threshold: you go step by step, picking an attribute and splitting the data such that homogeneity increases after every split. You stop splitting when the resulting leaves are sufficiently homogeneous. What counts as sufficiently homogeneous? You define the level of homogeneity at which the tree should stop splitting further. Let's look at some specific methods used to measure homogeneity.
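In rough Python, the splitting loop described above might look like the sketch below. Here homogeneity is simply the fraction of rows belonging to the majority class, and the split search is brute force; the Gini index and entropy discussed next are the measures you would normally plug in instead.

from collections import Counter

def homogeneity(labels):
    # Fraction of rows belonging to the most common class in this node.
    return Counter(labels).most_common(1)[0][1] / len(labels)

def build_tree(rows, labels, threshold=0.9):
    # Stop splitting once the node is sufficiently homogeneous (or cannot be split further).
    if homogeneity(labels) >= threshold or len(set(map(tuple, rows))) == 1:
        return {"leaf": Counter(labels).most_common(1)[0][0]}

    best = None
    for f in range(len(rows[0])):              # try every attribute...
        for v in set(r[f] for r in rows):      # ...and every rule "attribute == value"
            yes = [i for i, r in enumerate(rows) if r[f] == v]
            no = [i for i in range(len(rows)) if rows[i][f] != v]
            if not yes or not no:
                continue
            # Weighted homogeneity of the two resulting nodes.
            score = (len(yes) * homogeneity([labels[i] for i in yes]) +
                     len(no) * homogeneity([labels[i] for i in no])) / len(rows)
            if best is None or score > best[0]:
                best = (score, f, v, yes, no)

    _, f, v, yes, no = best
    return {"rule": (f, v),
            "yes": build_tree([rows[i] for i in yes], [labels[i] for i in yes], threshold),
            "no": build_tree([rows[i] for i in no], [labels[i] for i in no], threshold)}

print(build_tree([("salaried", "high"), ("salaried", "low"), ("unemployed", "high")],
                 ["approved", "rejected", "rejected"]))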

Let us try to understand all the homogeneity measures with a hypothetical dataset example.

We need to predict, for a certain group of people, whether they will be able to play cricket or not, depending on their age and gender.

Example

In the above example, should we split on age or on gender to decide whether a person will be able to play cricket? Let's use a homogeneity measure to make the decision.

Gini Index

The Gini index, as used here, is the sum of the squares of the probabilities of the various labels in the data set.

Gini index of a data set = Σ p_i² (the sum of squared label probabilities over the k labels)

The total Gini index, if you split on gender, will be:

Gini Index(gender) = (fraction of total observations in male-node)*Gini index of male-node + (fraction of total observations in female-node)*Gini index of female-node.

Gini Index for Gender Split

Similarly, we can calculate the Gini index if we split on age.

Gini Index for Age Split

Assume that you have a data set with 2 class labels. If the data set is completely homogeneous (all the data points belong to label 1), then the probability of finding a data point corresponding to label 2 will be 0 and that of label 1 will be 1. So p1 = 1 and p2 = 0. In such a case the Gini index equals 1, its highest possible value. The higher the homogeneity, the higher the Gini index.

So, if you had to choose between the two splits, age and gender: the Gini index of splitting on gender is higher than that of splitting on age, so you go ahead and split on gender.
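A minimal Python sketch of the Gini index as defined here (the sum of squared class probabilities, so higher means more homogeneous; note that the "Gini impurity" used in libraries such as scikit-learn is 1 minus this quantity). The class counts per node below are hypothetical, since the actual counts for the gender and age splits are shown only in the images above.

def gini_index(labels):
    # Sum of squared class probabilities; equals 1.0 for a perfectly pure node.
    n = len(labels)
    return sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_of_split(partitions):
    # Weighted sum of the Gini index of each child node, as in the formula above.
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini_index(p) for p in partitions)

# Hypothetical "plays cricket" labels in each node after the two candidate splits.
male, female = ["yes"] * 8 + ["no"] * 2, ["yes"] * 2 + ["no"] * 8
young, old = ["yes"] * 6 + ["no"] * 4, ["yes"] * 4 + ["no"] * 6

print(gini_of_split([male, female]))  # 0.68 -> more homogeneous split
print(gini_of_split([young, old]))    # 0.52 -> less homogeneous split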

Entropy and Information Gain

Another homogeneity measure is information gain. The idea is to use the notion of entropy. Entropy quantifies the degree of disorder in the data; for a two-class problem its value, like the Gini index, varies from 0 to 1.

Entropy is given by:

ε[D] = −Σ p_i log2(p_i), summed over the k labels i

where p_i is the probability of finding a point with label i (the same as in the Gini index), k is the number of different labels, and ε[D] is the entropy of the data set D.

Information Gain = ε[D] − ε[D_A], i.e., the entropy of the original data set minus the weighted sum of the entropies of the partitions after you make a split (D_A denotes the data set partitioned on attribute A).

Information Gain
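A minimal Python sketch of these two quantities, using base-2 logarithms so that entropy is measured in bits:

import math

def entropy(labels):
    # ε[D] = -Σ p_i * log2(p_i), summed over the distinct labels in D.
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(parent, partitions):
    # ε[D] minus the weighted sum of the entropies of the partitions.
    total = len(parent)
    weighted = sum(len(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - weighted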

Let's consider an example. You have four data points, of which two belong to class label '1' and the other two to class label '2'. You split the points such that the left partition has the two data points belonging to label '1' and the right partition has the other two belonging to label '2'. Let's assume that you split on some attribute called 'A'.

  1. The entropy of the original (parent) data set is ε[D] = −[(2/4)*log2(2/4) + (2/4)*log2(2/4)] = 1.0.
  2. The entropy of the partitions after splitting is ε[D_A] = −0.5*log2(2/2) − 0.5*log2(2/2) = 0.
  3. The information gain after splitting is Gain[D, A] = ε[D] − ε[D_A] = 1.0.

So, the information gain after splitting the original data set on attribute 'A' is 1.0; the higher the information gain, the more homogeneous the data after the split.
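Using the entropy and information_gain functions sketched above, these numbers can be checked directly:

parent = ["1", "1", "2", "2"]
left, right = ["1", "1"], ["2", "2"]

print(entropy(parent))                          # 1.0
print(information_gain(parent, [left, right]))  # 1.0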

What about splits when the output variable is continuous? You calculate the R2 of the data sets (before and after splitting) in a manner similar to what you do for linear regression models, and you split the data such that the R2 of the partitions obtained after splitting is greater than that of the original (parent) data set. In other words, the fit of the model should be as 'good' as possible after splitting.
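As a rough sketch of that idea (assuming, as is common for regression trees, that each node predicts the mean of its target values), you can compare the squared-error fit of the parent node with that of the candidate partitions; the relative improvement plays the role of the R2 comparison described above.

def sse(targets):
    # Sum of squared errors around the node mean (the node's constant prediction).
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)

def fit_improvement(parent, partitions):
    # Close to 1.0 when the partitions fit the data much better than the parent alone.
    return (sse(parent) - sum(sse(p) for p in partitions)) / sse(parent)

print(fit_improvement([5, 6, 20, 21], [[5, 6], [20, 21]]))  # about 0.996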

I hope this blog helps you understand decision trees in a simple way. In the next blog in the series, we will discuss the hyperparameters related to decision trees and also find out "What is so random about Random Forest? :)"


Link to my Medium profile:

https://commondatascientist.medium.com/medium-to-bond-de3c81e193b8

