
Decision Trees in Machine Learning (ML) with Python Tutorial

Last Updated on October 21, 2021 by Editorial Team

Source: Image by Bela Geletneky from Pixabay

Diving into decision trees in machine learning (ML) with Python

Author(s): Saniya Parveez, Roberto Iriondo

This tutorial’s code is available on GitHub, and its full implementation is also available on Google Colab.

Table of Contents

  1. Introduction
  2. Decision Tree Example
  3. Building a Decision Tree
  4. Node Impurity
  5. Entropy
  6. Gini
  7. Overfitting in Decision Tree Learning
  8. Pruning
  9. Advantages and Disadvantages of Decision-tree-based Classification
  10. Code Implementation
  11. Advanced Decision Trees
  12. Conclusion

Introduction

A decision tree is a vital and popular tool for classification and prediction problems in machine learning, statistics, and data mining [4]. It describes rules that can be interpreted by humans and applied in a knowledge system such as a database. Fundamentally, a decision tree T encodes d (a classifier or regression function) in the form of a tree structure with the following elements:

  • Decision node: It defines a test on a single attribute.
  • Leaf node: It shows the value of the target attribute.
  • Edge: It represents one outcome of a test, i.e., a split on one attribute.
  • Path: It is a conjunction of tests that leads to the final decision.

Decision trees are also known by other names:

  • Tree classifier.
  • Divide and conquer strategy.
  • Hierarchical classifier.
  • Multistage classifier.

A decision tree classifies cases by commencing at the tree’s root and traversing the tree until a leaf node is reached.
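For intuition, here is a minimal sketch of that root-to-leaf traversal as plain Python; the feature names and thresholds are hypothetical:

# Each `if` is a decision node; each `return` is a leaf node.
def classify(example):
    if example["outlook"] == "sunny":       # root: test a single attribute
        if example["humidity"] > 70:        # decision node
            return "stay in"                # leaf node
        return "go out"                     # leaf node
    return "go out"                         # leaf node

print(classify({"outlook": "sunny", "humidity": 85}))  # -> stay in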

Figure 1: A decision tree.

A decision tree uses nodes and leaves to make a decision.

Figure 2: A vector classification.

Representation of the classification above in the form of a decision tree:

Figure 3: Decision tree for classification.

Decision Tree Example

Problem Statement

Predict the result of the basketball game between two different teams — team1 and team2.

List of Usable Knowledge or Attributes related to the Game

  • Did Peter play center or forward?
  • What was the location of the game — home or away?
  • What was the start time of the game?
  • Was the opponent’s center tall or not?

Historical Data

Figure 4: Historical data for a basketball game.

Data for Prediction

Figure 5: Data for the prediction of the basketball game.

Therefore, based on the historical and prediction data in figures 4 and 5:

  • The task is to generalize the learned rules to new data.
  • It is a classification problem.

Building a Decision Tree

Decision tree learning involves learning a sequence of if/else questions that gets us to the “true” answer as quickly as possible. These questions are also called tests. The algorithm searches over all possible tests and finds the one that is most informative about the target variable.

How to grow a decision tree?

  • The top node is also called the root, and it represents the whole dataset.
  • A decision tree progressively splits the training set into smaller and smaller subsets.
  • If the test is “true” — a point is assigned to the left node; otherwise, it is assigned to the right node.
  • A leaf whose data points all share the same target value is called pure [1].
  • A prediction on a new data point is made by examining which region of the partition of the feature space the point lies in and then predicting the majority target in that region [1].
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)
# Fit an unrestricted tree and print train/test accuracy (figure 6):
tree_clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(tree_clf.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(tree_clf.score(X_test, y_test)))
Figure 6: Accuracy of the training and testing datasets.
  • If we do not restrict a decision tree’s depth, it can become arbitrarily deep and complex (a depth-limited remedy is sketched after this list).
  • As shown in figure 6, the accuracy on the training set is 100%: because the leaves are pure, the tree has grown deep enough to memorize all the labels in the training data perfectly.
  • Such trees can grow vast and are hard to understand, and larger trees are typically less accurate than smaller trees [2].
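As a minimal sketch of the remedy, reusing the train/test split above: capping the depth pre-prunes the tree so the leaves are no longer forced to be pure (max_depth=4 is an arbitrary illustrative choice, not a recommendation):

pruned_clf = DecisionTreeClassifier(max_depth=4, random_state=0)
pruned_clf.fit(X_train, y_train)
# Training accuracy drops below 100%, but test accuracy typically improves.
print("Accuracy on training set: {:.3f}".format(pruned_clf.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(pruned_clf.score(X_test, y_test)))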

Code implementation to create a leaf of a decision tree (the leaf is created based on ml_task: regression or classification):

import numpy as np

def create_leaf(data, ml_task):
    label_column = data[:, -1]  # labels are stored in the last column

    if ml_task == "regression":
        leaf = np.mean(label_column)  # regression: predict the mean target
    else:
        # classification: predict the majority class
        unique_classes, counts_unique_classes = np.unique(label_column, return_counts=True)
        i = counts_unique_classes.argmax()
        leaf = unique_classes[i]

    return leaf

The main aim of a decision tree is to select appropriate features for splitting the tree into subparts; an algorithm such as ID3 performs this splitting in the background.

Node Impurity

Node impurity measures the homogeneity within a node. A node is impure if its cases take more than one value for the response; it is pure (impurity = 0) if all instances have the same value for the response or target variable.

These are the two most popular methods for measuring node impurity:

  • Entropy.
  • Gini.

The best split is selected based on the degree of impurity of the child nodes. Node impurity is 0 when all patterns at the node are of the same category. Impurity becomes maximum when all the classes at node N are equally likely.

Code snippet to check for purity:

def check_purity(data):
    label_column = data[:, -1]
    unique_classes = np.unique(label_column)

    # a node is pure if only one class label remains
    return len(unique_classes) == 1

Entropy

In a decision tree, entropy measures the impurity, disorder, or uncertainty in a set of data. It controls how a decision tree splits the data and therefore influences how the tree draws its boundaries.

The equation of entropy, where pᵢ is the proportion of class i in the data, is shown below:

Entropy = −Σᵢ pᵢ log2(pᵢ)

Figure 7: Equation for entropy.

Code snippet to calculate entropy:

def get_entropy(data):
    label_column = data[:, -1]
    _, counts = np.unique(label_column, return_counts=True)
    probabilities = counts / counts.sum()
    entropy = sum(probabilities * -np.log2(probabilities))

    return entropy

A simple example of entropy:

Let’s consider a bag under two different scenarios:

  • Bag A has 100 green balls. Peter wants to choose a green ball from this bag. Here, Bag A has 0 entropy because it implies 0 impurities or total purity.
  • We replace 40 green balls in bag A with red balls, and similarly, we replace 10 green balls with black balls. Now, John wants to choose a green ball from this bag. In this case, the probability of drawing a green ball will drop down from 1.0 to 0.5 due to the increase in the bag’s impurity.

Shannon’s entropy model uses the logarithm function with base 2, log2(P(x)), to measure the entropy.
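As a quick check of the second scenario (50 green, 40 red, and 10 black balls), here is a minimal sketch computing its entropy with Shannon’s formula:

import numpy as np

# Class proportions: 50 green, 40 red, 10 black out of 100 balls.
probabilities = np.array([0.5, 0.4, 0.1])
entropy = -np.sum(probabilities * np.log2(probabilities))
print(round(entropy, 3))  # ~1.361 bits: impurity rose from 0 (all-green bag)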

Shannon’s Information Theory

There are only two classes — Yes, No.

In our example, t is a set of messages sent to a receiver, which must guess their class:

  • If p(Yes | t) = 1 (resp., p(No | t) = 1), then the receiver guesses a new example as Yes (resp., No); no message needs to be sent.
  • If p(Yes | t) = p(No | t) = 0.5, then the receiver cannot guess and must be told the class of a new example, requiring a one-bit message.
  • If 0 < p(Yes | t) < 1 (and p ≠ 0.5), then the receiver needs less than one bit on average to know the class of a new example.

Information Gain

Information Gain measures how much information a feature provides about the class.

Information Gain is significant in a Decision tree due to the points below:

  • It is the primary criterion the decision tree algorithm uses to build the tree.
  • The decision tree always tries to maximize information gain.
  • The attribute with the highest information gain is tested or split on first.

The figure below shows the equation for Information Gain: the entropy of the parent node minus the weighted average entropy of the children nodes,

Gain = Entropy(parent) − Σⱼ (nⱼ / n) Entropy(childⱼ)

Figure 8: The equation of information gain.
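A minimal sketch of this equation, assuming the get_entropy helper defined earlier in this tutorial:

def information_gain(parent, children):
    # `parent` and each child are 2-D arrays whose last column holds labels,
    # the same layout get_entropy expects.
    n = len(parent)
    weighted_child_entropy = sum(
        len(child) / n * get_entropy(child) for child in children
    )
    return get_entropy(parent) - weighted_child_entropy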

Example of Entropy Calculation

Consider a road on which vehicles drive; the road has multiple features, such as grade, bumpiness, and speed limit.

This is the dataset:

Figure 9: Road dataset.

Features:

  • Grade
  • Bumpiness
  • Speed Limit

Label:

  • Speed

Total number of observations: 4

Calculate the entropy of Grade Feature:

Take labels as a parent node like SSFF → Slow Slow Fast Fast.

The entropy of SSFF:

P(Slow) = 2/4 = 0.5

P(Fast) = 2/4 = 0.5

So, the entropy of SSFF:

The entropy of parent = −{0.5 log2(0.5) + 0.5 log2(0.5)} = −{−0.5 + (−0.5)}

Hence, the entropy of the parent = 1

Next, to find the grade feature’s information gain, split the parent node by the grade feature, as shown in figure 10.

Figure 10: The split of the parent node by the grade feature.

Calculate the entropy of both (left and right) children node SSF and F, respectively.

The entropy of node F = 0 (Note: 0 because all are from the same class)

The entropy of node SSF:

P(Slow) = 2/3 ≈ 0.667

P(Fast) = 1/3 ≈ 0.333

So, the entropy of node SSF = −{0.667 log2(0.667) + 0.333 log2(0.333)}

= −{−0.39 + (−0.53)} ≈ 0.918

Figure 11: Number of nodes in the parent node and children nodes.
Figure 12: The equation to calculate the entropy of the children nodes.
Figure 13: The entropy of the child node with a weighted average.

The weighted average entropy of the children = (3/4)(0.918) + (1/4)(0) ≈ 0.689, so the information gain of the grade feature = 1 − 0.689 ≈ 0.311.
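The same calculation, reproduced in a few lines of Python as a check (a minimal sketch; the labels are encoded as the strings "S" and "F"):

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

parent = np.array(["S", "S", "F", "F"])      # SSFF
left, right = parent[:3], parent[3:]         # SSF and F after the split
weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
print(round(entropy(parent), 3))             # 1.0
print(round(entropy(parent) - weighted, 3))  # ~0.311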

For a binary classification problem, entropy ranges between 0 and 1.

Gini

Like entropy, the Gini index is a splitting criterion in decision trees: the tree computes the gain of each candidate split from it and uses the result to decide how to split a node. Gini measures the impurity of a node.

For a binary problem, Gini ranges between 0 and 0.5. Gini impurity is often preferred over entropy for selecting the best features because it is cheaper to compute [3].

The equation of the Gini index to measure impurity, where pᵢ is the proportion of class i at the node:

Gini = 1 − Σᵢ pᵢ²

Figure 14: Equation of Gini.
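A minimal sketch of the Gini computation, following the same data layout as the get_entropy snippet above:

import numpy as np

def get_gini(data):
    # Gini impurity of the labels in the last column: 1 - sum(p_i^2).
    label_column = data[:, -1]
    _, counts = np.unique(label_column, return_counts=True)
    probabilities = counts / counts.sum()
    return 1 - np.sum(probabilities ** 2)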

The split criterion in Gini index:

  • It assumes that there exist several possible split values for each attribute.
  • All attributes are assumed to be continuous-valued.
  • It can be modified for categorical attributes.

Overfitting in Decision Tree Learning

Overfitting is a severe problem in machine learning that degrades a model’s performance on unseen data. A decision tree can also overfit, due to the issues below:

  • The tree grows too deep.
  • The number of instances per node becomes very small as the tree is built.

We use pruning to avoid overfitting in decision trees.

Pruning

Pruning is the process of adjusting the decision tree to minimize misclassification errors. It is the inverse of splitting.

There are two ways to perform Pruning:

  • Pre-pruning.
  • Post-pruning.

Pruning identifies and removes branches that reflect noise or outliers.

Complete Tree

A complete tree is one that has stopped growing. Growth stops according to the rules below:

  • If all the records belong to the same class, then stop expanding a node.
  • If all the records have the same attribute values, then stop expanding a node.

Pre-pruning

In the pre-pruning approach, the tree does not grow entirely; it follows an early stopping rule (a scikit-learn sketch follows the list below). Pre-pruning:

  • Stops the algorithm before it becomes a fully-grown tree.
  • Stops if all instances belong to the same class.
  • Stops if all attribute values are the same.
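In scikit-learn, pre-pruning corresponds to the stopping hyperparameters of DecisionTreeClassifier; a minimal sketch (the threshold values here are arbitrary examples, not recommendations):

from sklearn.tree import DecisionTreeClassifier

# Each argument below is an early-stopping rule applied while the tree grows.
pre_pruned = DecisionTreeClassifier(
    max_depth=4,           # stop splitting after four levels
    min_samples_split=20,  # do not split a node with fewer than 20 samples
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    random_state=0,
)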

Post-pruning

Post-pruning is the most popular approach for avoiding overfitting in decision trees. It addresses overfitting with the steps below (a scikit-learn sketch follows the list):

  • Completely grow the decision tree.
  • Follow a bottom-up approach to trim the nodes of the decision tree.
  • If the generalization error improves after trimming, replace the sub-tree with a leaf node.
  • The class label of the new leaf node is determined by the majority class in the sub-tree.
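scikit-learn implements a form of post-pruning, minimal cost-complexity pruning, through the ccp_alpha parameter. Below is a minimal sketch that selects the pruning strength on a held-out split (the Iris data here is just an illustrative choice):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Grow the full tree, then compute the candidate pruning strengths (alphas).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Larger ccp_alpha prunes more aggressively; keep the tree that
# generalizes best to the held-out data.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda clf: clf.score(X_test, y_test),
)
print(best.ccp_alpha, best.get_depth())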

Advantages and Disadvantages of Decision-tree-based Classification

These are the advantages and disadvantages of decision-tree-based classification:

Advantages

  • It is inexpensive to build.
  • It gives outstanding accuracy.
  • It is very fast at classifying unknown records.
  • It is straightforward to interpret for small trees.
  • It can handle both continuous and symbolic attributes.
  • It performs acceptably on noisy data.

Disadvantages

  • It can run into system memory issues because the data needs to fit in memory.
  • It requires retraining when new data arrives.
  • Its decision boundaries are restricted to being axis-parallel.

Code Implementation

For this example, we will be using the Iris dataset. Here we show the code implementation of a decision tree step by step:

Import necessary libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import tree
%matplotlib inline

Read Iris dataset:

data = pd.read_csv('Iris.csv')
data
Figure 15: Iris dataset.

The shape of Iris data:

data.shape

Column names of the Iris dataset:

col_names = ['id', 'sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
data.columns = col_names
col_names

Columns of the Iris dataset

Drop the “id” column from the dataset:

data = data.drop(['id'], axis=1)

Check the head of the Iris dataset:

data.head()
Figure 16: Iris dataset.

Get the Iris dataset information:

data.info()
Figure 17: Iris dataset information.

Get counts:

data['species'].value_counts()

Get target columns:

target_col = ['species']

Get the value of X and y:

X = data.drop(['species'], axis=1)
y = data['species']

The split of the dataset:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Decision tree classifier:

from sklearn.tree import DecisionTreeClassifier

Apply decision tree classification with Gini index:

clf_gini = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)

Fit to the model:

clf_gini.fit(X_train, y_train)
Figure 18: Decision tree classifier.

Prediction:

y_pred_gini = clf_gini.predict(X_test)
y_pred_gini
Figure 19: Prediction.

Get accuracy with criterion Gini index:

from sklearn.metrics import accuracy_score

# y_pred_gini are the predicted class labels for the test set.
print('Model accuracy score with criterion gini index: {0:0.4f}'.format(accuracy_score(y_test, y_pred_gini)))
Figure 20: Model accuracy.

Check for overfitting and underfitting:

print('Training set score: {:.4f}'.format(clf_gini.score(X_train, y_train)))
print('Test set score: {:.4f}'.format(clf_gini.score(X_test, y_test)))
Figure 21: Training and test dataset score.

Plot decision tree:

plt.figure(figsize=(12,8))
tree.plot_tree(clf_gini.fit(X_train, y_train))
Figure 22: Decision tree graph of the Iris dataset.

Advanced Decision Trees

Another crucial improvement on decision trees is the random forest. Random forests are an ensemble supervised learning technique: they combine many decision trees, hence the name forest. A single decision tree has a high chance of overfitting; aggregating the predictions of many randomized trees reduces this risk and keeps the generalization error low.
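As a minimal sketch, here is how a random forest could replace the single tree from the Iris example above (X_train, X_test, y_train, y_test as defined in the Code Implementation section):

from sklearn.ensemble import RandomForestClassifier

# 100 randomized trees: each is fit on a bootstrap sample of the training
# data and considers a random subset of features at every split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print('Test set score: {:.4f}'.format(forest.score(X_test, y_test)))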

Conclusion

Decision trees are widely used and are among the most common methods in predictive modeling. They are very easy to understand and yield interpretable predictions. They work more efficiently with discrete attributes, but there is a high chance that these trees suffer from error propagation.

Decision trees are also not sensitive to outliers because the partitioning happens based on the proportion of samples within the split ranges and not on absolute values [5]. They are quite intuitive and easy to explain to non-technical users.

Another helpful property of decision trees is that non-linear relationships between the parameters do not affect the tree’s performance. Predictions are also fast, even on high-dimensional data.


DISCLAIMER: The views expressed in this article are those of the author(s) and do not represent the views of Carnegie Mellon University, nor other companies (directly or indirectly) associated with the author(s). These writings do not intend to be final products, yet rather a reflection of current thinking, along with being a catalyst for discussion and improvement.

Published via Towards AI

Resources

GitHub repository.

Google Colab implementation.

References

[1] Introduction to Machine Learning with Python: A Guide for Data Scientists, 1st Edition, Andreas C. Müller and Sarah Guido, https://towardsai.net/p/data-science/best-data-science-books-free-and-paid-data-science-book-recommendations-b519046dcca5#5ed1

[2] Decision Tree Algorithm, Comp328 Tutorial 1, Kai Zhang, Slide Share, https://www.slideshare.net/Ami_Surati/decision-tree-51573521

[3] Gini Impurity and Entropy in Decision Tree — ML, Geeks for Geeks, https://www.geeksforgeeks.org/gini-impurity-and-entropy-in-decision-tree-ml/

[4] Decision Tree, Wikipedia, https://en.wikipedia.org/wiki/Decision_tree

[5] Decision Trees, Modern Information Processing, 1st Edition (Print Book & E-Book), Bernadette Bouchon-Meunier, Giulianella Coletti, and Ronald R. Yager, ISBN 9780444520753, 9780080461694, https://www.elsevier.com/books/T/A/9780444520753
