
Descriptive Statistics for Data Science: Explained

Last Updated on January 1, 2021 by Editorial Team

Author(s): Suhas V S

Data Science, Statistics

A detailed walkthrough of various aspects of descriptive statistics using Python.

Photo by Chris Liverani on Unsplash

It is often said that a data scientist must be able to understand all types of data, both numerical and categorical. That ability is built by learning the different aspects of “Statistics”.

Statistics is mainly divided into 2 parts:

  1. Descriptive
  2. Inferential

Two important terms to understand before moving any further are population and sample.

The population refers to the complete set of observations (rows) and features (columns). It is usually very large in real-world problems.

The sample is a subset of the population on which statistical studies are conducted in order to understand the population it comes from.

From the understanding of population and sample, we will try to derive the definitions of descriptive and inferential statistics.

Descriptive Statistics – The study of the sample, in which we compute different measures (mean, median, variance, …) and examine how the existing features depend on one another.

Inferential Statistics – After studying the measures and relationships in the sample, we try to generalize them to the whole population. This may mean estimating the mean of a certain numerical feature or testing a hypothesized relationship between two or more features.

In summary, to make any assumption or estimate (inferential) about a large population, it is crucial to understand the measures (descriptive) of its sample. The two are distinct techniques, but they go hand in hand when solving a real-world problem.

Descriptive Statistics:

1. Characteristics:

We will deal mostly with the different measures that are important for developing statistical acumen.

1.1 Measure of Central Tendency:

There are 3 measures of central tendency, namely mean, median, and mode. They are so called because each one represents the data by a central value, which allows us to make meaningful interpretations.

Mean:

The mean is the average of the numerical values of a feature. When the list contains extreme values, the mean is pulled towards them, so it is not recommended in such cases. The workaround is either to use the median, which we will see in the next block, or to trim the extreme values on either side and then calculate the mean. Figure 1 shows the numpy function to calculate the mean.

Figure 1

If we replace 5 with a more extreme value of 50, the mean changes from 3 to 12. Refer to Figure 2.

Figure 2

If we trim one value on either side of “b”, we ignore the extreme values and take the mean of the remaining data points. Refer to Figure 3 and see the mean return to 3.

Figure 3

Note: Import numpy as np and scipy.stats to perform the above operations.
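Since Figures 1 to 3 are screenshots, here is a minimal sketch of what they describe. The lists `a` and `b` are assumptions chosen to match the numbers quoted in the text (means of 3 and 12); the original figures may have used different values.

```python
import numpy as np
from scipy import stats

# Assumed data: chosen so the means match the values quoted in the text.
a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 50]  # 5 replaced by the extreme value 50

mean_a = np.mean(a)  # 3.0
mean_b = np.mean(b)  # 12.0, pulled towards the extreme value

# Trim 20% of the observations (one of five values) from each end,
# then average what is left: [2, 3, 4].
trimmed_mean_b = stats.trim_mean(b, proportiontocut=0.2)  # 3.0

print(mean_a, mean_b, trimmed_mean_b)
```

Note that `proportiontocut` is a fraction of the data, not a count, so the value to pass depends on the length of the list.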

Median:

The median is the middle value of the sorted data points, dividing them into 2 equal halves, as opposed to the average used for the mean. This makes the median robust to extreme values, because their magnitude does not enter the calculation.

For example, refer to Figure 4 where, despite the presence of an extreme value of 50, the central tendency value (median) is 3. This is the same output we saw for the trimmed mean in Figure 3.

Figure 4
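As a sketch of the call Figure 4 most likely shows, the median can be computed with `np.median`. The list `b` is the same assumed data as in the mean example, not necessarily the author's.

```python
import numpy as np

# Assumed data containing one extreme value.
b = [1, 2, 3, 4, 50]

median_b = np.median(b)  # 3.0, unaffected by the size of the extreme value
mean_b = np.mean(b)      # 12.0, for comparison

print(median_b, mean_b)
```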

Mode:

The mode is the most frequently occurring element in the group. Refer to Figure 5, where “3” is the mode because it is repeated with the highest count of 2.

Figure 5
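One way to find both the mode and its count is the standard library `collections.Counter`; `scipy.stats.mode` offers the same information for arrays. The list below is a hypothetical one in which 3 occurs twice, matching the description of Figure 5.

```python
from collections import Counter

# Hypothetical data in which 3 appears twice and everything else once.
values = [1, 2, 3, 3, 4, 5]

# most_common(1) returns a list with the single (value, count) pair
# that has the highest count.
mode_value, mode_count = Counter(values).most_common(1)[0]

print(mode_value, mode_count)  # 3 2
```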

1.2 Measure of Location

In this part, we will look at a measure called the “quantile”, a cut point that divides the sorted numerical data into parts that may or may not be equal halves. If the data is divided into 4 equal parts, the quantiles are called “quartiles”, which is what we will concentrate on here. Figure 6 is a pictorial representation of the quartiles using a “boxplot”.

Figure 6(Source: Stackoverflow)

The above box plot has 5 important points:

a) Min (Q0): This is the lowest value in the numerical dataset.

b) Lower Quartile (Q1): This is the point below which 25% of the data points lie.

c) Median (Q2): This is the point below which 50% of the data points lie; it divides the numerical data into 2 halves.

d) Upper Quartile (Q3): This is the point below which 75% of the data points lie.

e) Max (Q4): This is the highest value in the numerical dataset.

The difference between Q3 and Q1 gives the range covered by the middle 50% of the values in the dataset. This difference is called the “Inter-Quartile Range (IQR)”.

Hence, IQR=Q3-Q1

Let us see how this concept can be realized in Python with an example. Refer to Figures 7 and 8 shown below.

Figure 7
Figure 8
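A minimal sketch of the five points and the IQR using `np.percentile`. The data is hypothetical, and numpy's default linear interpolation is assumed; other interpolation rules can give slightly different quartiles on small samples.

```python
import numpy as np

# Hypothetical data: the integers 1 to 9.
data = np.arange(1, 10)

# Q0 (min), Q1, Q2 (median), Q3, Q4 (max)
q0, q1, q2, q3, q4 = np.percentile(data, [0, 25, 50, 75, 100])

iqr = q3 - q1  # spread of the middle 50% of the values

print(q0, q1, q2, q3, q4)  # 1.0 3.0 5.0 7.0 9.0
print(iqr)                 # 4.0
```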

1.3 Measures of Dispersion

This characteristic of descriptive statistics gives an idea of the spread of the data. That is, it tells us how far the data points deviate from their mean. Several measures account for dispersion, such as variance, standard deviation, and coefficient of variation. We will take each one up and study its features.

Variance:

The variance measures how far each number in the set is from the mean, and hence from every other number in the set. Variance is denoted by the symbol σ² (sigma squared). The formula is given by:

Figure 9(Source: Investopedia)

Standard Deviation(σ):

This measure does the same job of telling us how far the data points are from the mean. It is by definition the square root of the variance. If the data points are further from the mean, there is a higher deviation within the dataset. Hence, the more spread out the data, the higher the standard deviation.

Figure 10(Source: Investopedia)

Coefficient of Variation:

This measure expresses the deviation of the data relative to its mean, giving a unit-free idea of dispersion. It is the ratio of the standard deviation to the mean.

Figure 11(Source: Investopedia)

It is often used in “Stock Market Analysis” to weigh risk against return, where the “mean” is usually taken as the “return” and the standard deviation as the “risk”. The higher the mean, the higher the return; the higher the standard deviation, the higher the risk.

Example: Let us put the knowledge gained so far to work and find out which player is more consistent. Refer to Figure 12.

Player1 = [100, 20, 30, 40, 50]

Player2 = [50, 56, 60, 42, 55, 65]

Figure 12
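The comparison can be sketched as follows, using the two score lists from the text. The population formulas (`ddof=0`, numpy's default) are an assumption; the original figure may have used the sample versions, which would change the numbers slightly but not the conclusion.

```python
import numpy as np

player1 = np.array([100, 20, 30, 40, 50])
player2 = np.array([50, 56, 60, 42, 55, 65])

def summarize(scores):
    """Return mean, variance, standard deviation and coefficient of variation."""
    mean = scores.mean()
    variance = scores.var()  # population variance (ddof=0)
    std = scores.std()       # square root of the variance
    cv = std / mean          # unit-free measure of spread
    return mean, variance, std, cv

mean1, var1, std1, cv1 = summarize(player1)
mean2, var2, std2, cv2 = summarize(player2)

print(f"Player1: mean={mean1:.2f}, std={std1:.2f}, cv={cv1:.2f}")
print(f"Player2: mean={mean2:.2f}, std={std2:.2f}, cv={cv2:.2f}")

# The lower coefficient of variation marks the more consistent player.
more_consistent = "Player1" if cv1 < cv2 else "Player2"
print(more_consistent)  # Player2
```

Player2 has a coefficient of variation of about 0.13 against about 0.58 for Player1, so Player2 is the more consistent of the two.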

Z-Score:

It also gives information about the spread of the data around the mean. Specifically, it is the distance of a data point from the mean, expressed in units of the standard deviation.

Figure 13

If z=0, it means that the data point is exactly equal to the mean value.

If z=1, it means that the data point is one standard deviation above the mean.

Note: The function in python for z-score: scipy.stats.zscore()
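A short sketch comparing a manual z-score with `scipy.stats.zscore`. The data is hypothetical; by default `zscore` uses the population standard deviation (`ddof=0`), which the manual version mirrors.

```python
import numpy as np
from scipy import stats

# Hypothetical data.
data = np.array([1, 2, 3, 4, 5])

# Manual z-score: distance from the mean in units of standard deviation.
z_manual = (data - data.mean()) / data.std()

# Library equivalent.
z_scipy = stats.zscore(data)

print(z_manual)
print(z_scipy)
```

The value equal to the mean (3) gets z = 0, and the standardized values themselves have mean 0 and standard deviation 1.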

1.4 Measures of Shape

When we are dealing with numerical data, it is important to understand the shape of the distribution, which in turn helps in making more accurate statistical judgments. There are two measures of shape, namely skewness and kurtosis.

Skewness:

It is the degree of asymmetry of the numerical data. If we plot the distribution, skewness tells us the direction and magnitude of its departure from symmetry. The longer the tail on one side, the more extreme values the dataset has on that side.

The different types of skewness are given in the table below.

Figure 14
Figure 15(Source: MathisFun)

Kurtosis:

Similar to skewness, kurtosis gives information about the shape of the distribution, traditionally described as the sharpness of its peak. It is also used to find out how heavy the tails of the distribution are.

Many statisticians say that kurtosis is the study of the peakedness of the data distribution. Others feel that this definition is not completely accurate. One of them is Dr. Donald Wheeler, whose definition reads, “The kurtosis parameter is a measure of the combined weight of the tails relative to the rest of the distribution.”

Figure 16
Figure 17(Source: Tutorialspoint)

Note: We have to import the pandas package to use the built-in skew and kurt functions. Let us see an example. Here, for a list “a”, the skewness and kurtosis values are negative.

Figure 18
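The list “a” from Figure 18 is not recoverable from the text, so the sketch below uses hypothetical series to show how the sign of `Series.skew()` follows the direction of the tail, and how `Series.kurt()` reports excess kurtosis (0 for a normal distribution).

```python
import pandas as pd

# Hypothetical data, not the list used in the original figure.
symmetric = pd.Series([1, 2, 3, 4, 5])
right_tailed = pd.Series([1, 1, 2, 2, 3, 10])  # long tail on the right
left_tailed = -right_tailed                    # mirror image, tail on the left

skew_symmetric = symmetric.skew()  # 0 for perfectly symmetric data
skew_right = right_tailed.skew()   # positive
skew_left = left_tailed.skew()     # negative

# Excess kurtosis: negative here, since the flat series has light tails.
kurt_symmetric = symmetric.kurt()

print(skew_symmetric, skew_right, skew_left, kurt_symmetric)
```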

1.5 Covariance and Correlation

Variance measures the spread of the values of one numerical variable around its mean. To measure how 2 numerical variables vary together, we use “covariance”. It ranges from -infinity to +infinity, and its sign tells whether the two variables tend to move in the same or in opposite directions.

Figure 19(Source: Byjus)

Since covariance ranges from -infinity to +infinity and depends on the units and magnitudes of the variables, it becomes harder to compare values as the number of variables increases. Refer to Figure 21. To overcome this shortcoming, we normalize the covariance by the standard deviations of the two variables, which gives the correlation, a value between -1 and +1. Refer to Figure 22.

Figure 20(Source: Great Learning)

Note: corr=-1 means perfectly correlated in the negative direction, and corr=+1 means perfectly correlated in the positive direction. Intermediate negative and positive values indicate correspondingly weaker relationships.

Getting covariance and correlation using pandas:

Figure 21(covariance using pandas)
Figure 22( Correlation using pandas and seaborn)
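A minimal sketch with `DataFrame.cov()` and `DataFrame.corr()`. The column names and values are hypothetical; the third column rescales the second to show that covariance changes with units while correlation does not. The seaborn heatmap from Figure 22 is left out.

```python
import pandas as pd

# Hypothetical data: hours studied and exam score.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 64, 70],
})
# Same scores expressed in different units (multiplied by 10).
df["score_x10"] = df["score"] * 10

cov_matrix = df.cov()    # pairwise sample covariances
corr_matrix = df.corr()  # pairwise Pearson correlations in [-1, +1]

cov_hours_score = cov_matrix.loc["hours", "score"]        # 11.25
cov_hours_scaled = cov_matrix.loc["hours", "score_x10"]   # 112.5
corr_hours_score = corr_matrix.loc["hours", "score"]
corr_hours_scaled = corr_matrix.loc["hours", "score_x10"]

print(cov_matrix)
print(corr_matrix)
```

The covariance grows tenfold after rescaling, while the correlation stays the same, which is why correlation is easier to compare across variables.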

2. Probability and Bayes’ Theorem

When we deal with numbers and events, we often ask about the chance of an event happening when an experiment is tried “n” times. We need to quantify the degree of uncertainty of the event, and this is where an understanding of probability gives meaningful insight.

By definition,

Figure 23(Source: MathisFun)

Example: Probability of getting the value 3 on a die (only one trial)

Here, as we all know, a die has 6 faces and the number 3 appears on only one of them. The probability of the event is therefore 1/6.

Terminologies:

Sample space:

It is the set of all possible outcomes of an experiment repeated for “n” trials.

Sample Space size = (O)^n

where O = number of mutually exclusive and exhaustive outcomes of a single trial; n = number of trials.

Example: A coin toss experiment repeated 2 times.

Sample space size = (2)² = 4 and space = [HH, HT, TH, TT]
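The sample space and the counting definition of probability can be sketched with `itertools.product`. The helper names `sample_space` and `probability` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

def sample_space(outcomes, n_trials):
    """All ordered results of repeating a trial n_trials times."""
    return list(product(outcomes, repeat=n_trials))

def probability(space, event):
    """Favorable outcomes divided by total outcomes (equally likely outcomes)."""
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))

# Two coin tosses: 2 ** 2 = 4 outcomes.
coin_space = sample_space("HT", 2)
print(["".join(o) for o in coin_space])  # ['HH', 'HT', 'TH', 'TT']

# One roll of a die: probability of getting a 3.
die_space = sample_space(range(1, 7), 1)
p_three = probability(die_space, lambda o: o[0] == 3)
print(p_three)  # 1/6
```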

Mutually Exclusive Events: Two events A and B are said to be mutually exclusive if they cannot occur at the same time, that is, the occurrence of A excludes the occurrence of B.

Example: When you toss a coin, getting a head and getting a tail are mutually exclusive, as only one of them can appear on an ideal coin.

Independent Events: Two events A and B are said to be independent if the occurrence of A is in no way influenced by the occurrence of B. Likewise, the occurrence of B is in no way influenced by the occurrence of A.

Example: When you toss a fair coin twice, getting a head on the first toss does not affect the chance of getting a head on the second toss.

Rules of Probability:

There are several rules when it comes to probability, but we will concentrate on the ones that relate to the terminologies above. Refer to Figure 24.

Figure 24
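Figure 24 is not reproduced here, so the following is only a sketch of the two standard rules that go with the terms above, under the assumption that these are the ones the figure lists: the addition rule for mutually exclusive events and the multiplication rule for independent events. Both are checked by enumerating outcomes.

```python
from fractions import Fraction
from itertools import product

def probability(space, event):
    """Favorable outcomes divided by total outcomes (equally likely outcomes)."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

# Addition rule, mutually exclusive events on one die roll:
# P(A or B) = P(A) + P(B)
die = list(range(1, 7))
p_one = probability(die, lambda r: r == 1)
p_two = probability(die, lambda r: r == 2)
p_one_or_two = probability(die, lambda r: r in (1, 2))

# Multiplication rule, independent events on two coin tosses:
# P(A and B) = P(A) * P(B)
tosses = list(product("HT", repeat=2))
p_first_head = probability(tosses, lambda t: t[0] == "H")
p_second_head = probability(tosses, lambda t: t[1] == "H")
p_both_heads = probability(tosses, lambda t: t == ("H", "H"))

print(p_one_or_two, p_one + p_two)                  # 1/3 1/3
print(p_both_heads, p_first_head * p_second_head)   # 1/4 1/4
```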

Types of Probability:

  1. Marginal Probability: The term marginal comes from the fact that these probabilities appear in the margins of a contingency table (also called a joint probability table). The marginal probability of one variable (X) is obtained by summing the joint probabilities over all values of the other variable (Y).

  2. Joint Probability: Joint probability is the chance of an outcome of X happening together with an outcome of Y in the contingency table.

  3. Conditional Probability: The probability of an outcome of X happening given that an outcome of Y has already happened, and vice versa.

Figure 25

An Example to understand the probability types:

Figure 26

Figure 25 is a contingency table for students passing/failing in Subject A and Subject B.

The marginal probabilities are given by,

Figure 27

The joint probabilities are given by,

Figure 28

The conditional probabilities are given by,

Figure 29
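The counts in the original contingency table are not available in the text, so the sketch below uses hypothetical counts for 100 students to show how the three kinds of probability are read off such a table with pandas.

```python
import pandas as pd

# Hypothetical counts (NOT the numbers from the original figure).
# Rows: result in Subject A, columns: result in Subject B.
counts = pd.DataFrame(
    {"B_pass": [50, 10], "B_fail": [20, 20]},
    index=["A_pass", "A_fail"],
)
total = counts.to_numpy().sum()  # 100 students

# Joint probabilities: each cell divided by the grand total.
joint = counts / total

# Marginal probabilities: row and column sums of the joint table.
marginal_a = joint.sum(axis=1)  # P(A_pass), P(A_fail)
marginal_b = joint.sum(axis=0)  # P(B_pass), P(B_fail)

# Conditional probability: P(A_pass | B_pass) = P(A_pass and B_pass) / P(B_pass)
p_a_pass_given_b_pass = joint.loc["A_pass", "B_pass"] / marginal_b["B_pass"]

print(joint)
print(marginal_a)
print(marginal_b)
print(round(p_a_pass_given_b_pass, 4))  # 0.8333
```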

Odds:

The odds of an event is the ratio of the number of favorable outcomes to the number of unfavorable outcomes. It is another way of expressing probability, so the probability can be derived from the odds: odds of a:b in favor correspond to a probability of a/(a+b).

Figure 30
Figure 31

Example: The odds in favor of John shooting a target are 14:11. What is the probability of John shooting the target?

Figure 32
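The conversion for John's example can be sketched in a few lines. The function name `odds_to_probability` is hypothetical.

```python
from fractions import Fraction

def odds_to_probability(favorable, unfavorable):
    """Convert odds of favorable:unfavorable into a probability."""
    return Fraction(favorable, favorable + unfavorable)

# Odds in favor of John hitting the target are 14:11.
p_hit = odds_to_probability(14, 11)

print(p_hit, float(p_hit))  # 14/25 0.56
```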

Bayes’ Theorem:

Bayes’ theorem is an extension of conditional probability in which we calculate the posterior probability from the already known prior probability, the evidence, and the likelihood of the event happening.

Figure 33(Source: Great Learning)

Example: A test for TB disease is 60% accurate when a person has the disease and 99% accurate when a person does not have the disease. If 0.01% of the population has TB disease, what is the probability that a person chosen randomly from the population who tests positive for the disease actually has the disease?

Figure 34
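A sketch of the calculation, assuming “60% accurate when a person has the disease” means P(positive | disease) = 0.60 and “99% accurate when a person does not have the disease” means P(negative | no disease) = 0.99. The variable names are illustrative.

```python
# Prior: share of the population with TB (0.01%).
prevalence = 0.0001

# Likelihoods, under the interpretation stated above.
p_pos_given_disease = 0.60
p_neg_given_healthy = 0.99
p_pos_given_healthy = 1 - p_neg_given_healthy  # false positive rate

# Evidence: overall probability of testing positive.
p_positive = (p_pos_given_disease * prevalence
              + p_pos_given_healthy * (1 - prevalence))

# Posterior: P(disease | positive) by Bayes' theorem.
p_disease_given_positive = p_pos_given_disease * prevalence / p_positive

print(f"{p_disease_given_positive:.4%}")  # roughly 0.6%
```

Even after a positive test, the probability of having the disease is only about 0.6%, because the disease is so rare that false positives far outnumber true positives.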

3. Probability Distributions

A probability distribution, in simple terms, describes how probability is spread over the different outcomes of an event. We need to understand a very important term, “Random Variable”, before going any further, as the entire concept of probability distributions is built around it.

What is “Random Variable”?

In a random experiment, a variable that takes different values depending on the result of the experiment is called a “Random Variable”. A random variable is discrete if it has a finite or countable number of possible outcomes that can be listed. A random variable is continuous if it can take any value within a given interval, that is, an uncountable number of possible outcomes. It is usually denoted by “X”.

Figure 35(Source: Great Learning)

Types of Probability distributions

Depending on the nature of the random variable whether it is discrete/continuous, we have 2 types of probability distributions.

  1. Discrete Probability Distribution
  2. Continuous Probability Distribution

Discrete Probability Distribution:

Probability Mass Function (PMF): The probability at each discrete value of the random variable X is called the “PMF”.

Cumulative Distribution Function (CDF) and Survival Function (SF) accumulate the individual PMF values in opposite directions. The CDF adds them from left to right up to and including the given value of the random variable, P(X ≤ x). The SF adds them from right to left down to but excluding that value, P(X > x).

Therefore, SF+CDF=1

Example: A coin is tossed 3 times. The random variable X= number of heads.

Figure 36

Let us realize the pmf, cdf, and sf.

Figure 37
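The three functions for the coin example can be sketched by enumerating the 8 equally likely outcomes, without relying on any distribution library. This is one possible realization and may differ from the code shown in Figure 37.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# All outcomes of 3 tosses, and the number of heads in each.
outcomes = list(product("HT", repeat=3))
head_counts = Counter(o.count("H") for o in outcomes)

values = range(4)  # X can be 0, 1, 2 or 3 heads

# PMF: P(X = x)
pmf = {x: Fraction(head_counts[x], len(outcomes)) for x in values}

# CDF: P(X <= x), a running sum of the PMF from left to right.
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

# SF: P(X > x), so that SF + CDF = 1 at every x.
sf = {x: 1 - cdf[x] for x in values}

for x in values:
    print(x, pmf[x], cdf[x], sf[x])
```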

We will discuss 2 types of discrete probability distributions namely Binomial and Poisson.

Binomial Distribution:

When do we use Binomial distribution?

a) When the number of trials of the experiment is “finite”.

b) When there are only 2 possible outcomes in each trial.

c) When the trials are independent and the probability of success is the same in each trial.

Figure 38:(Source: Great Learning)

mean or expected value = n*p

variance=n*p*(1-p)

Let us see how we can realize binomial distribution in python using scipy.stats library.

Problem: The percentage of orders filled correctly at Wendy’s was approximately 86.8%. Suppose that you go to the drive-through window at Wendy’s and place an order. Two friends of yours independently place orders at the drive-through window at the same Wendy’s.

What are the probabilities that,

a) all three orders are filled correctly?

b) none of the three are filled correctly?

c) at least two of the three orders will be filled correctly?

d) What are the mean and standard deviation of the number of orders filled correctly?

e) Plot the binomial distribution function.

Since each order has only 2 possible outcomes (filled correctly or not filled correctly), the orders are independent, and the number of trials is 3, we can use the binomial distribution.

Figure 39
Figure 40
Figure 41
Figure 42
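A sketch of parts (a) to (d) using `scipy.stats.binom`, which may differ in detail from the code in Figures 39 to 42. The plot for part (e) is omitted; `pmf_values` holds the heights that a bar chart (for example with matplotlib) would show.

```python
import numpy as np
from scipy import stats

n_orders = 3
p_correct = 0.868  # probability that a single order is filled correctly

orders = stats.binom(n=n_orders, p=p_correct)

p_all_three = orders.pmf(3)     # a) P(X = 3)
p_none = orders.pmf(0)          # b) P(X = 0)
p_at_least_two = orders.sf(1)   # c) P(X > 1) = P(X >= 2)
mean_correct = orders.mean()    # d) n * p
std_correct = orders.std()      # d) sqrt(n * p * (1 - p))

# e) values to plot: P(X = 0), ..., P(X = 3)
pmf_values = orders.pmf(np.arange(n_orders + 1))

print(f"a) {p_all_three:.4f}")     # 0.6540
print(f"b) {p_none:.4f}")          # 0.0023
print(f"c) {p_at_least_two:.4f}")  # 0.9523
print(f"d) mean={mean_correct:.3f}, std={std_correct:.3f}")
```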

Poisson Distribution:

It is a probability distribution used to model the number of occurrences of an event over a specified period of time. When the number of trials approaches “infinity” while the probability of success in each trial approaches “zero”, the binomial distribution becomes impractical to work with, so we need a different technique described only by the average rate of occurrence. This is where we use the “Poisson distribution”.

Figure 43(Source: Great Learning)

Let us see how we can realize Poisson distribution in python using scipy.stats library.

Problem: A Life Insurance agent sells on average 3 life insurance policies per week.

Use the Poisson law to calculate the probability that in a given week, he will sell

a) Some policies?

b) 2 or more but less than 5 policies?

c) Plot the Poisson distribution function.

Figure 44
Figure 45
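A sketch of parts (a) and (b) with `scipy.stats.poisson`, reading “some policies” as at least one policy, which is an assumption about the author's intent. The plot for part (c) is omitted; `pmf_values` holds the first few bar heights.

```python
import numpy as np
from scipy import stats

rate = 3  # average number of policies sold per week
sales = stats.poisson(mu=rate)

# a) P(X >= 1) = P(X > 0)
p_some = sales.sf(0)

# b) P(2 <= X < 5) = P(X <= 4) - P(X <= 1)
p_two_to_four = sales.cdf(4) - sales.cdf(1)

# c) values to plot: P(X = 0), ..., P(X = 10)
pmf_values = sales.pmf(np.arange(11))

print(f"a) {p_some:.4f}")         # 0.9502
print(f"b) {p_two_to_four:.4f}")  # 0.6161
```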

Continuous Probability Distribution:

For a continuous probability distribution, the random variable is continuous in nature. Below are the assumptions,

a. The probability at any single point is 0. Hence, we use the area under the curve as the probability.

b. We consider the probability over an interval of the random variable.

The cdf and sf work similarly to what we have seen for discrete probability distributions, with the probability density function (pdf) taking the place of the pmf.

We will look at the most important continuous distribution, the Normal distribution.

Normal Distribution:

It is described by the Gaussian, or bell-shaped, curve, which is symmetric about its center, where the mean, median, and mode are equal. The heights and weights of people, rainfall data, and many other numerical measurements around us approximately follow a normal distribution.

Figure 46

where σ = standard deviation, μ = population mean, and e and π are constants.

Figure 47

From Figure 47, we can say that about 68.3% of the data points lie within -1σ and +1σ, about 95.4% lie within -2σ and +2σ, and about 99.7% lie within -3σ and +3σ.
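These coverage figures can be checked with the standard normal CDF from `scipy.stats.norm`; a short sketch:

```python
from scipy import stats

# Probability that a normal variable falls within k standard deviations
# of its mean: cdf(k) - cdf(-k) on the standard normal.
coverage = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within ±{k}σ: {share:.2%}")
# within ±1σ: 68.27%
# within ±2σ: 95.45%
# within ±3σ: 99.73%
```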

Let us see how we can realize Normal distribution in python using scipy.stats library.

Problem: The mean salary of Data Scientists working in Chennai, India is calculated to be 7,00,000 INR with a standard deviation of 90,000 INR. The random variable, the salary of Data Scientists, follows a normal distribution.

a) What is the probability that a Data Scientist in Chennai has a salary of more than 10,00,000 INR?

b) What is the probability that a Data Scientist in Chennai has a salary between 6,00,000 & 9,00,000 INR?

c) What is the probability that a Data Scientist in Chennai has a salary less than 4,00,000 INR?

Figure 48
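A sketch of the three parts with `scipy.stats.norm`, which may differ in detail from the code in Figure 48. The amounts are written without the Indian digit grouping (7,00,000 INR = 700000).

```python
from scipy import stats

mean_salary = 700_000  # INR
std_salary = 90_000    # INR
salary = stats.norm(loc=mean_salary, scale=std_salary)

# a) P(salary > 10,00,000)
p_above_10_lakh = salary.sf(1_000_000)

# b) P(6,00,000 < salary < 9,00,000)
p_between_6_and_9_lakh = salary.cdf(900_000) - salary.cdf(600_000)

# c) P(salary < 4,00,000)
p_below_4_lakh = salary.cdf(400_000)

print(f"a) {p_above_10_lakh:.6f}")
print(f"b) {p_between_6_and_9_lakh:.4f}")
print(f"c) {p_below_4_lakh:.6f}")
```

Parts (a) and (c) give the same value of roughly 0.0004, because 10,00,000 and 4,00,000 are equally far from the mean and the normal curve is symmetric. Part (b) comes to roughly 0.85.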

This concludes the concepts of descriptive statistics and their realization using various Python libraries. A strong understanding of these topics is essential for handling the complex problems that arise in inferential analysis. I have tried to include as many topics as I thought would help. I will write about inferential statistics as a continuation of this part. Till then, happy reading!


Descriptive Statistics for Data Science: Explained was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

Feedback ↓

Sign Up for the Course
`; } else { console.error('Element with id="subscribe" not found within the page with class "home".'); } } }); // Remove duplicate text from articles /* Backup: 09/11/24 function removeDuplicateText() { const elements = document.querySelectorAll('h1, h2, h3, h4, h5, strong'); // Select the desired elements const seenTexts = new Set(); // A set to keep track of seen texts const tagCounters = {}; // Object to track instances of each tag elements.forEach(el => { const tagName = el.tagName.toLowerCase(); // Get the tag name (e.g., 'h1', 'h2', etc.) // Initialize a counter for each tag if not already done if (!tagCounters[tagName]) { tagCounters[tagName] = 0; } // Only process the first 10 elements of each tag type if (tagCounters[tagName] >= 2) { return; // Skip if the number of elements exceeds 10 } const text = el.textContent.trim(); // Get the text content const words = text.split(/\s+/); // Split the text into words if (words.length >= 4) { // Ensure at least 4 words const significantPart = words.slice(0, 5).join(' '); // Get first 5 words for matching // Check if the text (not the tag) has been seen before if (seenTexts.has(significantPart)) { // console.log('Duplicate found, removing:', el); // Log duplicate el.remove(); // Remove duplicate element } else { seenTexts.add(significantPart); // Add the text to the set } } tagCounters[tagName]++; // Increment the counter for this tag }); } removeDuplicateText(); */ // Remove duplicate text from articles function removeDuplicateText() { const elements = document.querySelectorAll('h1, h2, h3, h4, h5, strong'); // Select the desired elements const seenTexts = new Set(); // A set to keep track of seen texts const tagCounters = {}; // Object to track instances of each tag // List of classes to be excluded const excludedClasses = ['medium-author', 'post-widget-title']; elements.forEach(el => { // Skip elements with any of the excluded classes if (excludedClasses.some(cls => el.classList.contains(cls))) { return; // Skip 
this element if it has any of the excluded classes } const tagName = el.tagName.toLowerCase(); // Get the tag name (e.g., 'h1', 'h2', etc.) // Initialize a counter for each tag if not already done if (!tagCounters[tagName]) { tagCounters[tagName] = 0; } // Only process the first 10 elements of each tag type if (tagCounters[tagName] >= 10) { return; // Skip if the number of elements exceeds 10 } const text = el.textContent.trim(); // Get the text content const words = text.split(/\s+/); // Split the text into words if (words.length >= 4) { // Ensure at least 4 words const significantPart = words.slice(0, 5).join(' '); // Get first 5 words for matching // Check if the text (not the tag) has been seen before if (seenTexts.has(significantPart)) { // console.log('Duplicate found, removing:', el); // Log duplicate el.remove(); // Remove duplicate element } else { seenTexts.add(significantPart); // Add the text to the set } } tagCounters[tagName]++; // Increment the counter for this tag }); } removeDuplicateText(); //Remove unnecessary text in blog excerpts document.querySelectorAll('.blog p').forEach(function(paragraph) { // Replace the unwanted text pattern for each paragraph paragraph.innerHTML = paragraph.innerHTML .replace(/Author\(s\): [\w\s]+ Originally published on Towards AI\.?/g, '') // Removes 'Author(s): XYZ Originally published on Towards AI' .replace(/This member-only story is on us\. Upgrade to access all of Medium\./g, ''); // Removes 'This member-only story...' 
}); //Load ionic icons and cache them if ('localStorage' in window && window['localStorage'] !== null) { const cssLink = 'https://code.ionicframework.com/ionicons/2.0.1/css/ionicons.min.css'; const storedCss = localStorage.getItem('ionicons'); if (storedCss) { loadCSS(storedCss); } else { fetch(cssLink).then(response => response.text()).then(css => { localStorage.setItem('ionicons', css); loadCSS(css); }); } } function loadCSS(css) { const style = document.createElement('style'); style.innerHTML = css; document.head.appendChild(style); } //Remove elements from imported content automatically function removeStrongFromHeadings() { const elements = document.querySelectorAll('h1, h2, h3, h4, h5, h6, span'); elements.forEach(el => { const strongTags = el.querySelectorAll('strong'); strongTags.forEach(strongTag => { while (strongTag.firstChild) { strongTag.parentNode.insertBefore(strongTag.firstChild, strongTag); } strongTag.remove(); }); }); } removeStrongFromHeadings(); "use strict"; window.onload = () => { /* //This is an object for each category of subjects and in that there are kewords and link to the keywods let keywordsAndLinks = { //you can add more categories and define their keywords and add a link ds: { keywords: [ //you can add more keywords here they are detected and replaced with achor tag automatically 'data science', 'Data science', 'Data Science', 'data Science', 'DATA SCIENCE', ], //we will replace the linktext with the keyword later on in the code //you can easily change links for each category here //(include class="ml-link" and linktext) link: 'linktext', }, ml: { keywords: [ //Add more keywords 'machine learning', 'Machine learning', 'Machine Learning', 'machine Learning', 'MACHINE LEARNING', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, ai: { keywords: [ 'artificial intelligence', 'Artificial intelligence', 'Artificial Intelligence', 'artificial Intelligence', 'ARTIFICIAL INTELLIGENCE', ], //Change your 
article link (include class="ml-link" and linktext) link: 'linktext', }, nl: { keywords: [ 'NLP', 'nlp', 'natural language processing', 'Natural Language Processing', 'NATURAL LANGUAGE PROCESSING', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, des: { keywords: [ 'data engineering services', 'Data Engineering Services', 'DATA ENGINEERING SERVICES', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, td: { keywords: [ 'training data', 'Training Data', 'training Data', 'TRAINING DATA', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, ias: { keywords: [ 'image annotation services', 'Image annotation services', 'image Annotation services', 'image annotation Services', 'Image Annotation Services', 'IMAGE ANNOTATION SERVICES', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, l: { keywords: [ 'labeling', 'labelling', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, pbp: { keywords: [ 'previous blog posts', 'previous blog post', 'latest', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, mlc: { keywords: [ 'machine learning course', 'machine learning class', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, }; //Articles to skip let articleIdsToSkip = ['post-2651', 'post-3414', 'post-3540']; //keyword with its related achortag is recieved here along with article id function searchAndReplace(keyword, anchorTag, articleId) { //selects the h3 h4 and p tags that are inside of the article let content = document.querySelector(`#${articleId} .entry-content`); //replaces the "linktext" in achor tag with the keyword that will be searched and replaced let newLink = anchorTag.replace('linktext', keyword); //regular expression to search keyword var re = new RegExp('(' + keyword + ')', 'g'); //this replaces 
Note: Content contains the views of the contributing authors and not Towards AI.
Disclosure: This website may contain sponsored content and affiliate links.
