K-Nearest Neighbors (KNN) Algorithm Tutorial — Machine Learning Basics

Last Updated on July 14, 2021 by Editorial Team

Author(s): Sujan Shirol, Husna Sayedi, Roberto Iriondo

The image is a screenshot of the COVIDCAST application, an open-source project by the Delphi Research Group at Carnegie Mellon University.

Diving into K-nearest neighbor, a fundamental classical machine learning (ML) algorithm

Acknowledgments: This work was led by Sujan Shirol, supervised by Husna Sayedi, and reviewed and edited by Roberto Iriondo. We used natural language optimization to improve the experience and sentiment of this article for our readers. Please let us know if you have any feedback on whether you would like to see more of this optimization.


The k-nearest neighbor algorithm, commonly known as the KNN algorithm, is a simple yet effective supervised machine learning algorithm for classification and regression. This article covers the KNN algorithm, its applications, pros and cons, the math behind it, and its implementation in Python. Please make sure to check the entire implementation from this tutorial on either Google Colab or GitHub to aid your reading.

Let’s understand what is meant by classification and regression. Classification has been hardwired into us since we were kids: the first word we said was probably either ‘dad’ or ‘mom.’ How would a baby know who their ‘mom’ or ‘dad’ is? We may have seen the video, or heard from our parents, how happy and excited they were to teach us to identify them: they pointed at each other every day until we understood the difference.

This method of teaching by pointing, example after example, is known as supervised learning. Fundamentally, supervised learning is learning a function that maps an input to an output based on example input-output pairs [9].

The features by which a baby classifies are most likely facial hair and voice; these are known as the independent variables/inputs, while the dependent variable/output is binary, pretty much yes or no. This entire process can be called model training, and we will learn more sophisticated ways to implement it in a form that machines can understand. Similarly, when the dependent variable/output is a continuous value rather than a category, the task is known as a regression problem.

What is the K-Nearest Neighbors (KNN) Algorithm?

The KNN algorithm is a major classical machine learning algorithm that focuses on the distance from new unclassified/unlabeled data points to existing classified/labeled data points. Think of it as the process of entering a college in the middle of an academic year for whatever reason. As we can see, there may be already like-minded groups formed by the students among themselves, and it is just a matter of time for us to figure out what group we would fit in — particularly, which group we would feel more connected to, or in other words, less distanced from.

Figure 1: Colors of the three flower categories in our example dataset, red, yellow, and green.

We already have labeled data points from our dataset. We will plot them on a 2-dimensional graph; these data points belong to three categories represented by red, green, and yellow colors, as shown in figure 1.

Next, we consider a new unlabeled data point, represented by the black cross mark. It is now time to determine which of the three categories this new data point belongs to. First, we pick a value known as the k-value; the k-value tells us how many nearest points to look at from the new unlabeled data point. Consider the k-value to be 5. Next, we calculate the distance from the unlabeled data point to every data point on the graph and select the top 5 shortest distances.

Figure 2: Measuring the distance between the categories from our dataset.

Among the top 5 nearest data points, 3 belong to category red, 1 belongs to category green, and 1 belongs to category yellow. These are known as the k (5) nearest neighbors of the new data point. It is now evident that the new data point belongs to category red, as most of its nearest neighbors are from category red.

Figure 3: Seeing how the new data point belongs to the category red from our example dataset.
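
To make the voting procedure concrete, here is a minimal from-scratch sketch; this is not the article's implementation, and the 2-D points and color labels are made up to mirror figures 1-3.

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=5):
    # Euclidean distance from the new point to every labeled point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Made-up 2-D points for the three flower categories
X_train = np.array([[1, 2], [2, 1], [2, 2], [6, 6], [7, 6], [6, 7], [1, 6], [2, 7]])
y_train = np.array(['red', 'red', 'red', 'green', 'green', 'green', 'yellow', 'yellow'])

print(knn_classify(X_train, y_train, np.array([2.5, 2.5])))  # 'red'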

Similarly, in a regression problem, the aim is to predict a new data point’s value, not the category to which it belongs. Again, to illustrate, plot a 2-dimensional graph consisting of data points from the given dataset. Since it is a 2-dimensional graph, there are 2 features for each data point. The x-axis represents feature-1, and the y-axis represents feature-2.

Next, we introduce a new data point for which only the feature-1 value is known, and we need to predict the feature-2 value. We take a k-value of 5 and get the 5 nearest neighboring points from the new data point. The predicted value of feature-2 for the new data point is the mean of the feature-2 of 5 nearest neighbors.

Figure 4: Our example introduces a new data point to illustrate the k-nearest neighbor (KNN) algorithm.
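
Likewise, a minimal sketch of the regression variant, again with made-up one-feature data rather than the article's: the prediction is simply the mean of feature-2 over the k nearest neighbors by feature-1.

import numpy as np

def knn_regress(x1_train, x2_train, x1_new, k=5):
    # Distance along feature-1 only, since feature-2 is unknown for the new point
    distances = np.abs(x1_train - x1_new)
    nearest = np.argsort(distances)[:k]
    # The prediction is the mean feature-2 value of the k nearest neighbors
    return x2_train[nearest].mean()

# Made-up data where feature-2 grows roughly with feature-1
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.2, 8.1])

print(knn_regress(x1, x2, x1_new=4.5))  # mean feature-2 of the 5 closest points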

When to use KNN?

  1. The KNN algorithm can compete with the most accurate models, so we can use it for applications that require high accuracy but do not require a human-readable model [11].
  2. When the provided dataset for our task is small.
  3. When the data is labeled correctly and the predicted value must be among the given labels: if the categories are 1, 2, and 3, the prediction should be one of them, never some other category.
  4. KNN can be used to solve regression, classification, or search problems.

Pros and Cons of Using KNN

Pros

  • The algorithm is straightforward and easy to implement, since it requires only two parameters: a k-value and a distance function.
  • A good value of k will make the algorithm robust to noise.
  • It learns a nonlinear decision boundary.
  • There are almost no assumptions about the given data; the only assumption is that nearby/similar instances belong to the same category.
  • It is a non-parametric approach. No model fitting/training is required. The data speaks for itself.
  • Since model training is not required, it is easy to update the dataset.

Cons

  • Inefficient for large datasets, since the distance to every existing point has to be calculated each time the algorithm encounters a new data point.
  • KNN assumes that similar data points are close to each other, so the model is susceptible to outliers. A few outliers from a particular category can pull a new data point toward that category even when it actually belongs to a different one.
  • It cannot handle imbalanced data well. When the data is imbalanced, i.e., far more data belongs to one category than to the others, the algorithm is biased toward the majority category, so imbalance needs to be handled explicitly (see the sketch after this list).
  • If our dataset requires a large K, the computational expense of the algorithm increases.
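
One possible mitigation for the outlier and imbalance issues above (our suggestion, not something this article covers further) is distance-weighted voting, which scikit-learn exposes through the weights parameter of KNeighborsClassifier. A minimal sketch:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# weights='distance' weighs each neighbor's vote by the inverse of its
# distance, softening the pull of far-away outliers and, to a degree,
# of an over-represented class.
knn_weighted = KNeighborsClassifier(n_neighbors=5, weights='distance')
knn_weighted.fit(X, y)
print(knn_weighted.predict(X[:1]))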

Diving Into the Math Behind KNN

As discussed, the algorithm calculates the distance from the new data point to each existing data point. The question is, how is the distance measured? There are three methods, which we will discuss in this work.

Figure 5: A representation of the Minkowski distance.

Minkowski Distance:

a) The Minkowski Distance is a generalized distance function in a normed vector space. The vector space must meet the following requirements:

  1. Zero vector: the zero vector has length 0.
  2. Scalar factor: multiplying a vector by a positive scalar changes only its length, not its direction.
  3. Triangle inequality: the shortest distance between any two given points is a straight line.
Figure 6: Formula for the Minkowski distance.

b) Special cases (a short sketch follows this list):

  1. When p=1 — this is the Manhattan Distance
  2. When p=2 — this is the Euclidean Distance
  3. When p=infinity — this is the Chebyshev Distance
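
The helper below (our own illustration, not from the article) implements the Minkowski formula from figure 6 and checks the three special cases on the points (0, 0) and (3, 4):

import numpy as np

def minkowski(a, b, p):
    # Minkowski distance: (sum of |a_i - b_i|^p) ** (1/p)
    diff = np.abs(np.asarray(a) - np.asarray(b))
    if np.isinf(p):
        return diff.max()  # Chebyshev: the limit as p approaches infinity
    return (diff ** p).sum() ** (1 / p)

a, b = [0, 0], [3, 4]
print(minkowski(a, b, 1))       # 7.0 -- Manhattan
print(minkowski(a, b, 2))       # 5.0 -- Euclidean
print(minkowski(a, b, np.inf))  # 4   -- Chebyshev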

Euclidean distance

In mathematics, the Euclidean distance between two points in Euclidean space is the length of the line segment between the two points. It can be calculated from the points’ Cartesian coordinates using the Pythagorean theorem, and is therefore occasionally called the Pythagorean distance [10].

To calculate the distance between two points (x1, y1) and (x2, y2) on a 2-dimensional plane, we use the following formula:

Figure 7: Euclidean distance formula.

Manhattan distance

The Manhattan distance calculation is similar to that of the Euclidean distance, the only difference being that we sum the absolute values of the coordinate differences instead of taking the square root of the sum of squared differences. By taking absolute values, we are no longer calculating the shortest (straight-line) distance between two points, as the Euclidean distance does.

Fundamentally, the Euclidean distance represents “flying from one point to another,” while the Manhattan distance represents “traveling from one point to another” in a city, following the pathways or roads.

Figure 8: Manhattan distance formula.

That said, the most commonly used method for calculating distance is the Euclidean distance formula. One reason is that the Euclidean distance measures the straight-line separation between points in any direction, whereas the Manhattan distance only accumulates movement along the vertical and horizontal axes.
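
To tie the choice of metric back to KNN itself: scikit-learn's KNeighborsClassifier defaults to metric='minkowski' with p=2 (the Euclidean case), so switching to the Manhattan distance is a one-parameter change. A minimal sketch, using the same Iris data and split as the implementation below:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

for p, name in [(1, 'Manhattan'), (2, 'Euclidean')]:
    # p selects the Minkowski exponent; p=2 is the sklearn default
    knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=p)
    knn.fit(X_train, y_train)
    print(name, knn.score(X_test, y_test))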

How to Choose the Right Value for K?

Choosing K is unique to every dataset, and there is no standard statistical method to compute the optimal K value. We want to choose a K value that reduces errors. As we increase K, our predictions become more stable due to averaging or majority voting. However, if K is too large, the error rate increases again because the model underfits. In other words, a small K yields low bias and high variance (higher complexity), while a large K yields high bias and low variance. That being said, there are a few different methods to try:

  1. Domain knowledge: As noted previously, K is highly data-dependent. For example, if one analyzes a distinct flower species dataset, it is easy to see that K should be that specific number of flower species.
  2. Cross-Validation: This well-known technique is useful for comparing accuracy measures across a range of K values, e.g., K from 1 to 10. It entails splitting the training set into training and validation sets to tune K and find the optimal value (a sketch follows this list).
  3. Square Root: A simple method to try when one has little domain knowledge about the data is to take the square root of the number of data points in the training set.
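
As promised, a sketch of the cross-validation route using scikit-learn's cross_val_score; the 10-fold setup and the 1-25 range of candidate K values are our choices for illustration.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 10-fold cross-validated accuracy for each candidate K
k_values = range(1, 26)
cv_scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10).mean()
             for k in k_values]

best_k = k_values[int(np.argmax(cv_scores))]
print(best_k, max(cv_scores))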

Implementation of KNN in Python

For the implementation of the KNN algorithm, we will be using the Iris dataset. The Iris dataset is a collection of morphologic variations of Iris flowers of three related species: Setosa, Versicolor, and Virginica. The observed morphologic variations are sepal length, sepal width, petal length, and petal width.

Scikit-learn (sklearn) is a Python library that features various classification, regression, and clustering algorithms, and it also ships the Iris dataset as sample data. We will import the necessary libraries: NumPy, Pandas, Matplotlib, and Seaborn.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.datasets import load_iris

# Load the Iris data into a Pandas DataFrame with a 'target' column
data = load_iris()
iris = pd.DataFrame(data.data, columns=data.feature_names)
iris['target'] = data.target.astype(float)

The table below shows what the first 5 rows of the data look like once we have them in a Pandas DataFrame. The target values are 0.0, 1.0, and 2.0, representing Setosa, Versicolor, and Virginica, respectively.

Figure 9: Table showing the first 5 rows of our data in a Pandas DataFrame.

Since KNN is sensitive to outliers and imbalanced data, it is very important to check for both and handle them. Plotting a count plot of the target variable to check for imbalance shows 50 samples of each flower type; consequently, the data is perfectly balanced.

sns.countplot(x='target', data=iris)

Figure 10: Count plot of the target variable, showing 50 samples of each flower type.

Checking for outliers using box plots, there do not seem to be many outliers that need handling.

for feature in ['sepal length (cm)', 'sepal width (cm)',
                'petal length (cm)', 'petal width (cm)']:
    sns.boxplot(x='target', y=feature, data=iris)
    plt.show()
Figure 11: Box plots of each feature, grouped by the target variable.

Next, we split the data into training and testing sets to measure how accurate the model is. The model will be trained on the training set, a randomly selected 60% of the original data, and then evaluated on the testing set, the remaining 40%. Before splitting, it is essential to separate the feature/independent variables from the target/dependent variable.

X = iris.drop(['target'], axis=1)
y = iris['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

We build the initial model with a k-value of 1, meaning only the single nearest neighbor is considered when classifying a new data point. Internally, the distances from the new data point to all existing data points are calculated and sorted from smallest to largest, along with their respective classes. Since the k-value is 1, the class (target value) of the first instance in the sorted array determines the new data point's class. As we can see, we obtain a decent accuracy score of 91.6%. However, the optimal k-value still needs to be selected.

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
Output: 0.9166666666666666

To find the optimal k-value with the cross-validation method, we calculate the accuracy for k-values ranging from 1 to 25 and choose the optimal one. The accuracy score ranges approximately between 86% and 96%. As observed, the accuracy score starts from a low value, reaches a peak at some point, stays approximately constant for a while, and then drops again. The range where the score remains constant for some time can be taken as an optimal k-value for the given dataset.

k_range = list(range(1, 26))
scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))

plt.plot(k_range, scores)
plt.xlabel('Value of k')
plt.ylabel('Accuracy Score')
plt.title('Accuracy Scores for different values of k')
plt.show()
Figure 12: Accuracy scores for different values of k.

Finally, we introduce a new unlabeled data point whose class, the flower type, we need to predict from its morphologic features. We will build the model with a k-value of 11.

knn = KNeighborsClassifier(n_neighbors=11)
knn.fit(iris.drop(['target'], axis=1), iris['target'])

# New measurement: sepal length, sepal width, petal length, petal width (cm)
X_new = np.array([[1, 2.9, 10, 0.2]])
prediction = knn.predict(X_new)
print(prediction)

if prediction[0] == 0.0:
    print('Setosa')
elif prediction[0] == 1.0:
    print('Versicolor')
else:
    print('Virginica')
Output: [2.]
Virginica

KNN Applications

From forecasting epidemics [2] and the economy to information retrieval [4] [5], recommender systems [3], data compression, and healthcare [1], the k-nearest neighbors (KNN) algorithm has become fundamental to many applications. KNN is known for being one of the most straightforward supervised machine learning algorithms, and its implementations are primarily used for regression and classification tasks, as we have discussed.

One of the most significant use cases for the k-nearest neighbor algorithm is recommendation systems [3] [6]. A simple Google search turns up several promising articles on implementing recommender systems with KNN [7], mainly due to KNN’s ability to surface similar recommendations for a particular set of items.

For instance, imagine a group of users with diverse tastes in movies. A recommendation system compares the users’ profiles to find sets of users with similar taste. If two users rate two or more items similarly, an item that the first user enjoys is likely to be enjoyable to the second user as well.
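
A minimal sketch of that comparison using scikit-learn's NearestNeighbors on a hypothetical user-by-movie rating matrix (every number here is invented for illustration):

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical ratings: rows are users, columns are movies
ratings = np.array([
    [5, 4, 0, 1],  # user 0
    [4, 5, 1, 0],  # user 1: a taste similar to user 0
    [1, 0, 5, 4],  # user 2: a very different taste
])

# Compare user profiles by cosine distance and find the closest one
nn = NearestNeighbors(n_neighbors=2, metric='cosine').fit(ratings)
distances, indices = nn.kneighbors(ratings[0:1])

# indices[0][0] is user 0 itself; indices[0][1] is the most similar other
# user, whose liked items become candidate recommendations for user 0
print('Most similar user to user 0:', indices[0][1])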

Similarly, KNN can be of use during classification tasks [8].

Conclusion

KNN is a highly effective, simple, and easy-to-implement supervised machine learning algorithm that can be used for classification and regression problems. The model works by calculating the distances between the point being predicted and a selected number of nearest examples, K.

For a classification problem, the label becomes the majority vote of the nearest K points. For a regression problem, the label becomes the average of the nearest K points. Whenever a prediction is being made, the model searches the entire training set to find the K-most similar examples to label the original prediction point.

The major drawback of this algorithm is that as the amount of data increases, so do the computational expense and the prediction time. However, if the dataset we are working with is modestly sized (like the Iris dataset), KNN is an easy and straightforward algorithm to implement, as there is no need to build a model, tune many parameters, or make additional assumptions about the model.


DISCLAIMER: The views expressed in this article are those of the author(s) and do not represent the views of any company (directly or indirectly) associated with the author(s). This work does not intend to be a final product, yet rather a reflection of current thinking, along with being a catalyst for discussion and improvement.

All images are from the author(s) unless stated otherwise.

Published via Towards AI

Resources

YouTube Tutorial.

References

[1] Prediction of COVID-19 Possibilities using KNN Classification Algorithm, (2021). Retrieved 14 January 2021, from https://assets.researchsquare.com/files/rs-70985/v2_stamped.pdf

[2] Prediction Model for Influenza KNN Algorithm Based Classification on Twitter Data, Kavitha, et al., (2021). Retrieved 14 January 2021, from http://ijseas.com/volume1/v1i7/ijseas20150712.pdf

[3] Subramaniyaswamy, V., & Logesh, R. (2017). Adaptive KNN based Recommender System through Mining of User Preferences. Wireless Personal Communications, 97(2), 2229–2247. DOI: 10.1007/s11277-017-4605-5

[4] Introduction to Information Retrieval, Chris Manning and Pandu Nayak, Stanford University, (2021). Retrieved 14 January 2021, from https://web.stanford.edu/class/cs276/handouts/lecture12-textcat.pdf

[5] Nearest neighbor search. (2021). Retrieved 14 January 2021, from https://en.wikipedia.org/wiki/Nearest_neighbor_search

[6] Recommendation System Based on Collaborative Filtering, Zheng Wen, Stanford University, (2021). Retrieved 14 January 2021, from http://cs229.stanford.edu/proj2008/Wen-RecommendationSystemBasedOnCollaborativeFiltering.pdf

[7] recommendation systems using KNN — Google Search. (2021). Retrieved 14 January 2021, from https://mktg.best/a1t-9

[8] K-nearest neighbors algorithm. (2021). Retrieved 14 January 2021, from https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

[9] Supervised learning, Wikipedia, (2021). Retrieved from https://en.wikipedia.org/wiki/Supervised_learning#cite_note-1

[10] Euclidean distance, Wikipedia, (2021). Retrieved from https://en.wikipedia.org/wiki/Euclidean_distance

[11] IBM Knowledge Center, (2021). Retrieved from https://www.ibm.com/support/knowledgecenter/SSCJDQ/com.ibm.swg.im.dashdb.analytics.doc/doc/r_knn_usage.html

