Decision Tree Splitting: Entropy vs. Misclassification Error

Last Updated on October 26, 2022 by Editorial Team

Author(s): Poojatambe


Why is entropy preferred over misclassification error for decision tree splitting?

Image by Adobe Stock

A decision tree is built with a top-down, greedy, recursive partitioning approach: the feature space is split into regions repeatedly until the resulting regions are (nearly) homogeneous. Each split corresponds to a question asked about one attribute.

To split the tree at each step, we need to choose the attribute that maximizes the decrease in loss from the parent node to its children. Defining a suitable loss function is therefore an important step.

Here, we will look at entropy and misclassification error, and answer why misclassification error is not used for splitting.

Entropy

Entropy is a concept from information theory used to measure the uncertainty, or impurity, in a set of examples. The ID3 tree algorithm uses entropy and information gain as the loss function to choose the splitting attribute at each step.

Consider a dataset with C classes. The cross-entropy for region R is calculated as follows:
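Entropy(R) = − Σc Pc * log2(Pc), with the sum running over the C classes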

Where Pc = the proportion of examples in region R that belong to class c.

For a two-class problem, the entropy ranges from 0 to 1. An entropy of zero indicates that the region is pure, i.e., homogeneous.
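As a quick illustration, here is a minimal Python sketch of this calculation (the function name and example counts are illustrative, not from the original article):

```python
import numpy as np

def entropy(counts):
    """Entropy of a region, given the per-class sample counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()      # class proportions Pc
    p = p[p > 0]                   # drop empty classes (0 * log2(0) is taken as 0)
    return -np.sum(p * np.log2(p))

print(entropy([900, 100]))   # ~0.469: mostly one class, low impurity
print(entropy([500, 500]))   # 1.0: even split, maximum impurity for two classes
print(entropy([200, 0]))     # -0.0, i.e. zero: a pure node
```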

Misclassification error

The misclassification loss is the fraction of samples that would be misclassified if every sample in region R were assigned to the region’s majority class; hence it depends only on the majority-class proportion. Consider C target classes, and let Pc be the proportion of samples in region R that belong to class c.

The misclassification loss is computed as follows:
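Error(R) = 1 − max(Pc), where the maximum is taken over the C classes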

For a two-class problem, the misclassification error ranges from 0 to 0.5.
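A matching sketch for the misclassification error (again, the name and counts are illustrative):

```python
import numpy as np

def misclassification_error(counts):
    """Misclassification error of a region, given the per-class sample counts."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - p.max()    # fraction of samples outside the majority class

print(misclassification_error([900, 100]))   # ~0.1
print(misclassification_error([500, 500]))   # 0.5: maximum for two classes
print(misclassification_error([200, 0]))     # 0.0: a pure node
```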

Entropy vs Misclassification Error

The attribute used to split the tree is the one that maximizes the decrease in loss from the parent region to the children nodes (equivalently, minimizes the weighted children loss). This decrease is called information gain and is given as follows:
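Gain = Loss(parent) − Σk (Nk / Nparent) * Loss(childk), where Nk is the number of samples in child k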

To calculate loss, we need to define a suitable loss function. Let’s compare entropy and misclassification loss with the help of an example.

Consider 900 “positive” samples and 100 “negative” samples. Let’s assume the X1 attribute is used for splitting at the parent node. Consider the following decision tree with unequal distribution of data samples after splitting.

Decision Tree

The split produces one pure node with 200 “positive” samples and an impure node with 700 “positive” and 100 “negative” samples.

With entropy as the loss function, the parent loss is about 0.469. One child is pure, so its entropy is zero, while the impure child has a non-zero entropy of about 0.544.

A decision tree with entropy values

Using the information gain formula, the loss reduction from the parent to the children regions is calculated as:

Gain = Entropy(parent) - [Entropy(left child)*(No. of samples in left child/No. of samples in parent) + Entropy(right child)*(No. of samples in right child/No. of samples in parent)]

Gain = 0.469 - [0.544*(800/1000) + 0*(200/1000)]

Gain = 0.034

With misclassification error as the loss function, the parent loss is 0.1; the pure child has zero error, and the impure child has an error of 0.125.

A decision tree with misclassification loss

The information gain is calculated as,

Gain = ME(parent) - [ME(left child)*(No. of samples in left child/No. of samples in parent) + ME(right child)*(No. of samples in right child/No. of samples in parent)]

Gain = (100/1000) - [(100/800)*(800/1000) + 0*(200/1000)]

Gain = 0

From the above gain values, we can see that the misclassification error yields no information gain, so the tree stops growing and the region is not split further. With entropy, however, the split gives a positive gain, and the tree can keep partitioning until the leaf nodes are reached and the entropy value becomes zero.
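The whole comparison can be reproduced in a few lines of Python; this is a minimal sketch, with illustrative helper names:

```python
import numpy as np

def entropy(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def misclassification_error(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - p.max()

def gain(loss_fn, parent_counts, children_counts):
    """Information gain: parent loss minus the size-weighted children loss."""
    n_parent = sum(sum(c) for c in children_counts)
    weighted_children = sum(loss_fn(c) * sum(c) / n_parent for c in children_counts)
    return loss_fn(parent_counts) - weighted_children

parent = [900, 100]                    # 900 positive, 100 negative
children = [[700, 100], [200, 0]]      # impure node and pure node after the split

print(gain(entropy, parent, children))                   # ~0.034: positive gain
print(gain(misclassification_error, parent, children))   # ~0 (up to float rounding): no gain
```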

Let’s see why from a geometrical perspective.

Entropy and misclassification error graphs.

The above graphs are plotted with the assumption of an even split of the data into two nodes. The entropy curve is strictly concave, so the weighted loss of the children is always less than that of the parent (unless both children end up with the same class proportions). The misclassification error curve is piecewise linear, so the children’s weighted loss can equal the parent’s loss, as in the example above.
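A quick numerical check of this argument, assuming an even split into two equally sized children whose class proportions average to the parent’s (the proportions below are purely illustrative):

```python
import numpy as np

def entropy_p(p):
    """Binary entropy as a function of the positive-class proportion p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def misclass_p(p):
    """Binary misclassification error as a function of p."""
    return min(p, 1 - p)

p1, p2 = 0.875, 0.55            # children's positive-class proportions (both > 0.5)
parent = (p1 + p2) / 2          # even split, so the parent proportion is the midpoint

# Entropy is strictly concave: the parent's loss exceeds the children's average.
print(entropy_p(parent), (entropy_p(p1) + entropy_p(p2)) / 2)
# Misclassification error is linear on this side of 0.5: parent and children's
# average are equal (up to floating-point rounding), so the split gains nothing.
print(misclass_p(parent), (misclass_p(p1) + misclass_p(p2)) / 2)
```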

Therefore, compared to entropy, the misclassification loss is much less sensitive to changes in the class probabilities, which is why entropy is generally used when building decision trees for classification.

The Gini impurity has the same concave nature as entropy and is likewise preferred over misclassification loss for building decision trees.
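In practice, libraries expose these impurity measures directly. For example, scikit-learn’s DecisionTreeClassifier accepts criterion="entropy" or criterion="gini", while misclassification error is not offered as a splitting criterion; a quick sketch on the iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion="entropy" splits by information gain; criterion="gini" uses Gini impurity.
# Misclassification error is not available as a splitting criterion.
tree_entropy = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
tree_gini = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

print(tree_entropy.get_depth(), tree_gini.get_depth())
```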

References

Check my previous stories:

1. Image Classifier with Streamlit

2. Everything about Focal Loss

Happy Learning!!

