Demystifying Decision Trees
Explained from scratch, step by step

Some time ago, I found myself having to explain tree-based algorithms to someone with a solid mathematics background… but zero knowledge of data science. So I decided to skip the classic toy datasets and start completely from scratch, with a handful of 2-dimensional points.

So, after some basic imports

```python
import pandas as pd
from sklearn import tree
import seaborn as sns
```

I drew six points on a sheet of paper, four blue and two orange:

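```python
from IPython.display import display  # built in when running inside Jupyter

# Six 2-D points and their classes (0 = orange, 1 = blue)
X = [[0, 0], [1, 1], [2, 1], [1, 2], [2, 2], [0.5, 1.5]]
Y = [0, 1, 1, 1, 1, 0]

# One table with the coordinates (columns 0 and 1) and the target
example = pd.DataFrame(X)
example["target"] = Y
display(example)

# Color each point by its class and plot it
colors = example["target"].map({0: "orange", 1: "blue"})
sns.scatterplot(data=example, x=0, y=1, c=colors, s=200)
```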
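As a quick preview of where this is heading, here is a minimal sketch that fits scikit-learn's `DecisionTreeClassifier` on these same six points and prints the splits it learns. It assumes the library defaults (Gini impurity, no depth limit); the feature names `"x"` and `"y"` are just labels chosen for readability, and this is not necessarily the exact setup used in the rest of the walkthrough.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# The same six points and labels as above (0 = orange, 1 = blue)
X = [[0, 0], [1, 1], [2, 1], [1, 2], [2, 2], [0.5, 1.5]]
Y = [0, 1, 1, 1, 1, 0]

# Default settings: Gini impurity and no depth limit, so the tree keeps
# splitting until every leaf contains a single class
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

# Print the learned splits as nested if/else rules
rules = export_text(clf, feature_names=["x", "y"])
print(rules)

# With no depth limit and distinct points, the tree reproduces
# the training labels exactly
train_accuracy = clf.score(X, Y)
print(train_accuracy)
```

Because nothing stops the tree from growing, it fits the training points perfectly; the interesting part is *which* thresholds it picks, which is what the step-by-step explanation unpacks.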

