Why Choose Random Forest and Not Decision Trees
Last Updated on January 6, 2023 by Editorial Team

Author(s): Daksh Trehan

A concise guide to Decision Trees and Random Forest.

Decision trees belong to the family of supervised classification algorithms. They perform quite well on classification problems, the decision path is relatively easy to interpret, and the algorithm is fast and simple.

The ensemble version of the Decision Tree is the Random Forest.

Table of Contents

  1. Decision Trees
  • Introduction to Decision Trees
  • How do Decision Trees work?
  • Decision Trees Implementation from scratch
  • Pros & Cons of Decision Trees

  2. Random Forest
  • Introduction to Random Forest
  • How does Random Forest work?
  • Scikit-learn implementation for Random Forest
  • Pros & Cons of Random Forest

Decision Trees

Introduction to Decision Trees

A decision tree is a simple tree-like structure consisting of nodes and branches. At each node, the data is split based on one of the input features, generating two or more branches as output. This iterative process increases the number of generated branches and partitions the original data. It continues until a node is generated where all or almost all of the data belong to the same class and further splits, or branches, are no longer possible.

This whole process generates a tree-like structure. The first splitting node is called the root node. The end nodes are called leaves and are associated with a class label. The paths from the root to the leaf produce the classification rules.

Suppose you’re an employee and you want to eat some processed food.

Your course of action will depend on several circumstances.

If you aren’t hungry, you won’t spend money on junk food. But if you are hungry, the choices change. Your next move depends on the next circumstance, i.e., whether you have bought lunch or not.

If you don’t have lunch, your action will depend solely on your next consideration, i.e., whether it is month-end or not. If it is the last few days of the month, you will consider skipping the meal; otherwise, you will go ahead and buy it.

Decision Trees come into play when several choices are involved in arriving at a decision, and you must choose accordingly to get a favorable outcome.

Tree-based learning algorithms are considered among the best and most widely used supervised learning methods. They give predictive models good accuracy, stability, and ease of interpretation, and, unlike many contemporaries, they also work well on non-linear relationships. Decision Tree algorithms are referred to as CART (Classification and Regression Trees).

How do Decision Trees work?

There are two components of Decision Trees:

  • Entropy: It is regarded as the randomness of the system. It is a measure of node purity or impurity.

Entropy is maximum when p = 0.5, i.e., both outcomes are equally likely.

  • Information gain: It is a reduction in entropy. It is the difference between the starting node’s uncertainty and the weighted impurity of the two child nodes.

Information gain helps us find the root node for our decision tree: the feature whose split yields the maximum Information Gain is chosen for the root node, as it resolves the most uncertainty.

We first split on the feature with the highest information gain and repeat this process recursively until all child nodes are pure or until the information gain is zero.

Goal of a Decision Tree: Maximize Information Gain and Minimize Entropy.
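For reference, the standard definitions are as follows, where $p_i$ is the fraction of samples of class $i$ at node $S$, and $S_v$ is the subset of $S$ sent to child $v$ by a split on feature $A$:

$$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i, \qquad IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$

For a two-class node this reduces to $H = -p\log_2 p - (1-p)\log_2(1-p)$, which peaks at 1 bit when $p = 0.5$, as noted above.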

Let’s say we have a sample of 60 students with three variables: Gender (Boy/Girl), Class (XI/XII), and Weight (50 to 100 kg). 15 of these 60 play football in their leisure time. Now, we want to create a model to predict who will play football during free time. To do so, we need to segregate the students who play football in their leisure time based on the most significant input variable among the three.

This is where the decision tree comes into play. It will classify the students based on the values of all three variables and identify the variable that creates the best homogeneous sets of students.

Using a Decision Tree, we can easily solve our problem and classify students, based on their traits, according to whether they will prefer playing football in their leisure time or not.

Decision Trees Implementation from scratch

Dividing the data into parts, calculating Entropy, and then calculating Information Gain:
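Here is a minimal NumPy sketch of such an implementation; the helper names `entropy` and `information_gain` are illustrative, not the author’s original code:

```python
import numpy as np

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions at a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    """Entropy of the parent minus the weighted entropy of the children
    produced by splitting on a categorical feature."""
    weighted_child_entropy = 0.0
    for v in np.unique(feature_values):
        mask = feature_values == v
        # Weight each child's entropy by the fraction of samples it receives.
        weighted_child_entropy += mask.mean() * entropy(labels[mask])
    return entropy(labels) - weighted_child_entropy

# Quick check: a perfectly informative split recovers all the parent entropy.
y = np.array([0, 0, 1, 1])
x = np.array(["a", "a", "b", "b"])
print(entropy(y))              # 1.0 (maximum for two balanced classes)
print(information_gain(y, x))  # 1.0
```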

Scikit-learn implementation
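As a hedged sketch of what this looks like, the snippet below generates random data shaped like the 60-student example above (the data itself is made up for illustration) and fits scikit-learn’s DecisionTreeClassifier:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Synthetic stand-in for the 60-student example:
# Gender (0 = Boy, 1 = Girl), Class (0 = XI, 1 = XII), Weight (50-100 kg).
X = np.column_stack([
    rng.integers(0, 2, 60),    # Gender
    rng.integers(0, 2, 60),    # Class
    rng.uniform(50, 100, 60),  # Weight
])
y = (rng.random(60) < 0.25).astype(int)  # 1 = plays football (about 15 of 60)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)

print(clf.predict([[0, 1, 62.0]]))  # prediction for one new student
print(clf.feature_importances_)     # which variable splits the data best?
```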

Visualizing your Decision Tree
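Continuing from the `clf` fitted above, scikit-learn’s built-in plot_tree helper draws the tree; a minimal sketch:

```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

fig, ax = plt.subplots(figsize=(12, 6))
plot_tree(
    clf,  # the fitted classifier from the previous snippet
    feature_names=["Gender", "Class", "Weight"],
    class_names=["No football", "Football"],
    filled=True,  # shade nodes by their majority class
    ax=ax,
)
plt.show()
```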

Pros & Cons of Decision Trees

Pros

  • Easy to interpret.
  • Handles both categorical and continuous data well.
  • Works well on large datasets.
  • Not sensitive to outliers.
  • Non-parametric in nature.

Cons

  • Prone to overfitting.
  • Trees can grow quite large, making pruning necessary.
  • No guarantee of an optimal tree.
  • Lower prediction accuracy than many other machine learning algorithms.
  • Calculations can become complex when there are many class variables.
  • High variance (the model can change significantly with small changes in the training data).

Random Forest

Introduction to Random Forest

Random Forest is another powerful and widely used supervised learning algorithm. It allows quick identification of significant information from vast datasets. Its biggest advantage is that it combines the outputs of many decision trees to arrive at a solution.

It is an ensemble algorithm, i.e., it combines the results of multiple classifiers of the same or different kinds.


Suppose you want to go on a vacation but are baffled about the destination, so you decide to ask your friend Lakshay for advice. Lakshay will ask about your last vacation, whether you liked it or not, and what you did there. To give a precise answer, he might even inquire about your preferences, and based on your replies, he will provide a recommendation. Here, Lakshay is using the Decision Tree technique: his feedback is based on your responses.

But you think Lakshay’s advice might be biased, so you ask Meghna (another friend) the same question. She, too, comes up with a recommendation, but you again consider it a dicey choice. You iterate this process, asking “n” friends the same question. Now you have some places recommended by several friends. You collect all the votes, aggregate them, and decide to go to the place with the most votes. Here, you are using the Random Forest technique.

The deeper a Decision Tree grows, the more prone it is to overfitting, because it becomes increasingly specific to the training data. Random Forest tackles this by combining the Decision Tree’s simplicity with accuracy through randomness.

Random Forest = Decision Tree’s simplicity * Accuracy through Randomness

How does Random Forest work?

Assume “m” features in our dataset:

  1. Randomly choose “k” features, where k < m.
  2. Among the k features, find the root node by choosing the feature with the highest Information Gain.
  3. Split the node into child nodes.
  4. Repeat the previous steps “n” times.
  5. You end up with a forest of “n” trees.
  6. Aggregate the results of all the Decision Trees, e.g., by majority vote. Combined with training each tree on a bootstrap sample (sampling with replacement), this procedure is known as bagging; see the sketch below.
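As a rough illustration of those six steps, here is a minimal sketch that borrows scikit-learn’s DecisionTreeClassifier as the base learner; the helper names `fit_forest` and `predict_forest` are my own, and labels are assumed to be 0/1:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=10, k_features=2, seed=0):
    """Steps 1-5: train n_trees trees, each on a bootstrap sample
    restricted to k randomly chosen features."""
    rng = np.random.default_rng(seed)
    n_samples, m_features = X.shape
    forest = []
    for _ in range(n_trees):
        feats = rng.choice(m_features, size=k_features, replace=False)  # step 1: k < m
        rows = rng.choice(n_samples, size=n_samples, replace=True)      # bootstrap sample
        tree = DecisionTreeClassifier(criterion="entropy")              # steps 2-3
        tree.fit(X[np.ix_(rows, feats)], y[rows])
        forest.append((tree, feats))
    return forest

def predict_forest(forest, X):
    """Step 6: aggregate per-tree predictions by majority vote (0/1 labels)."""
    votes = np.stack([tree.predict(X[:, feats]) for tree, feats in forest])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Note that production implementations, including scikit-learn’s, resample the k candidate features at every split rather than once per tree; the per-tree sampling here simply mirrors the step list above.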

Scikit-learn implementation for Random Forest
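In practice, scikit-learn’s built-in class does all of this; a minimal sketch, reusing the synthetic student data (`X`, `y`) from the earlier example:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    max_features="sqrt",  # size of the random feature subset at each split
    criterion="entropy",
    random_state=0,
)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))  # mean accuracy on held-out students
```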

Pros & Cons of Random Forest

Pros:

  • Robust to outliers.
  • Works well with non-linear data.
  • Lower risk of overfitting.
  • Runs efficiently on large datasets.
  • Typically better accuracy than a single decision tree and many other classifiers.

Cons:

  • Random forests can be biased when dealing with categorical variables, favoring those with many levels.
  • Slow training.
  • Not well suited to problems with many sparse features, where linear methods tend to perform better.

Conclusion

Hopefully, this article helps you understand Decision Trees and Random Forest in the best possible way and assists you in their practical usage.

As always, thanks so much for reading, and please share this article if you found it useful!

Feel free to connect:

LinkedIn ~ https://www.linkedin.com/in/dakshtrehan/

Instagram ~ https://www.instagram.com/_daksh_trehan_/

Github ~ https://github.com/dakshtrehan

Follow for further Machine Learning / Deep Learning blogs.

Medium ~ https://medium.com/@dakshtrehan

Want to learn more?

Detecting COVID-19 Using Deep Learning

The Inescapable AI Algorithm: TikTok

Why are YOU responsible for George Floyd’s Murder and Delhi Communal Riots?

Clustering: What is it? When to use it?

Start off your ML Journey with k-Nearest Neighbors

Naive Bayes Explained

Activation Functions Explained

Parameter Optimization Explained

Gradient Descent Explained

Logistic Regression Explained

Linear Regression Explained

Determining Perfect Fit for your ML Model

Cheers!


Why Choose Random Forest and Not Decision Trees was originally published in Towards AI - Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
