Explainable Clustering — The Introduction of Recursive Embedding and Clustering and its Application

Last Updated on January 3, 2025 by Editorial Team

Author(s): Yuki Shizuya

Originally published on Towards AI.

The explainable clustering method overview

Recently, I read an interesting article from Spotify titled “Recursive Embedding and Clustering” [1]. According to the blog, Spotify developed an explainable clustering method and uses it to study user clusters. Although the method would be helpful for many other companies, no implementation code is provided. So, in this blog, I will explain the technique step by step with a Python implementation, and I hope it helps anyone who wants to treat clustering in an explainable way.

Table of Contents

  1. Introduction to the recursive embedding and clustering method
  2. Real-world data application: the TikTok user profile dataset

1. Introduction to the recursive embedding and clustering method

Spotify has around 600 million users worldwide. A deeper understanding of its users is essential for providing a better user experience, but at that scale it is challenging. Moreover, although user segmentation by clustering is an effective method, the sheer number of users also makes it difficult to explain or examine why a given cluster exists. Therefore, Spotify developed an explainable clustering method to find significant clusters and explain why those clusters exist. The overall architecture is shown below.

The overall architecture of the explainable clustering method

It mainly comprises two stages.

  1. Find the low-dimensional representation from the original data using UMAP and cluster it using HDBSCAN. Examine the clusters’ characteristics.
  2. Train XGBoost using the raw data as input and the HDBSCAN cluster labels as targets, and understand each feature’s contribution using SHAP.

Let’s dive into it step by step.

For the first stage, we utilize UMAP and HDBSCAN to construct understandable clusters. Let’s quickly recap UMAP and HDBSCAN.

1.1 UMAP

UMAP image adapted from the original paper [2]

Uniform Manifold Approximation and Projection (UMAP) is a non-linear dimensionality reduction algorithm. UMAP learns the manifold structure of the input data in its high-dimensional space and finds a low-dimensional representation that preserves that structure as much as possible. It belongs to the same family of methods as t-SNE. One advantage of UMAP is that it is more robust at preserving the high-dimensional structure than t-SNE because of its loss function (for your information, t-SNE uses KL divergence, while UMAP uses a cross-entropy loss). You can see more details in this blog [3].

Thus, we utilize UMAP as the dimensionality reduction algorithm to reduce the input data to two or three dimensions.
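As a minimal sketch of what this step looks like with the umap-learn package (the feature matrix here is a random placeholder, not the real data):

import numpy as np
import umap

# Placeholder feature matrix: 1,000 samples with 10 features
X = np.random.rand(1000, 10)

# Reduce to 2 dimensions; n_neighbors and the metric are the knobs we tune later
reducer = umap.UMAP(n_components=2, n_neighbors=15, metric='euclidean', random_state=42)
embedding = reducer.fit_transform(X)  # shape: (1000, 2)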

1.2 HDBSCAN

HDBSCAN clustering result adapted from the official documentation

Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) is a density-based clustering method. It is fast and robust and finds meaningful clusters if there are any. The algorithm overview from the official guide is below.

  1. Transform the space according to the density/sparsity.
  2. Build the minimum spanning tree of the distance-weighted graph.
  3. Construct a cluster hierarchy of connected components.
  4. Condense the cluster hierarchy based on minimum cluster size.
  5. Extract the stable clusters from the condensed tree.

Therefore, we utilize HDBSCAN to find dense clusters based on the UMAP representation. We can assign labels using HDBSCAN and check the features of clusters by examining the raw data.
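As a minimal sketch of this step on the 2-D UMAP embedding (assuming the standalone hdbscan package; scikit-learn >= 1.3 also ships an equivalent sklearn.cluster.HDBSCAN):

import hdbscan

# Cluster the 2-D UMAP embedding; points that do not belong to any dense region get the label -1
clusterer = hdbscan.HDBSCAN(min_cluster_size=10)
cluster_labels = clusterer.fit_predict(embedding)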

In summary, we conduct two processes in the first stage.

  1. Condense the high-dimensional input data into a low-dimensional representation using UMAP.
  2. Find clusters in the low-dimensional UMAP representation using HDBSCAN.

After this procedure, we can visualize the clusters and check their details using the raw data.

1.3 XGBoost + SHAP

XGBoost is a popular gradient-boosted tree algorithm. It solves many data science problems quickly and accurately. SHAP, on the other hand, is an explainable AI method that works with any machine learning algorithm; we can use it to check the contribution of each feature. Since we don’t want to rerun the time-consuming first stage many times, in the second stage we train an XGBoost model using the high-dimensional raw data as input and the HDBSCAN labels as targets. Note that we use XGBoost as a one-versus-all model for each cluster, which allows very fast, accurate training and a precise understanding of feature importance. The illustration of this process is as follows:

The illustration of the second stage process

We train one XGBoost classifier per label and obtain detailed feature importances using SHAP.
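In compact form, this stage could look like the sketch below, assuming X is the raw feature matrix and cluster_labels holds the HDBSCAN labels from the previous step (the full implementation appears in the next section):

import shap
from xgboost import XGBClassifier

# One-versus-all: a separate binary classifier per cluster k, trained on the raw features
k = 1
y = (cluster_labels == k).astype(int)

clf = XGBClassifier(objective='binary:logistic')
clf.fit(X, y)

# SHAP values tell us which raw features drive membership in cluster k
shap_values = shap.Explainer(clf, X)(X)
shap.plots.bar(shap_values)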

So far, we have gone through the algorithmic perspective of this method. In the next section, let’s see the power of this method using real-world data!

2. Real-world data application: the TikTok user profile dataset

In this last section, I will guide you through applying the recursive embedding and clustering method to the TikTok user profile dataset [4]. As its name suggests, the dataset contains user information, and each row represents one user. Let’s implement it step by step.

As the first step, you need to set up an environment. I used Ubuntu 20.04 and Python 3.10. First, I create a virtual environment using conda.

conda create --name rclustering python==3.10 -y
conda activate rclustering
conda install pip
## Optional: to avoid installing libraries into the local environment,
## check which pip will be used to store libraries
which pip
# I use /opt/conda/envs/rclustering/bin/pip in my environment.

Next, you need to install the following libraries.

conda install -c conda-forge umap-learn=0.5.6 -y
conda install -c plotly plotly=5.23.0 -y
conda install -c conda-forge pandas=2.2.2 -y
conda install -c conda-forge nbformat -y
conda install -c conda-forge py-xgboost -y
conda install -c conda-forge shap -y
conda install -c conda-forge matplotlib

We’ve finished setting up the environment, so let’s look at the TikTok user profile dataset. I will attach the whole code I used later.

The TikTok user profile dataset contains 1,000 rows and 19 columns. The first five rows are shown below.

import pandas as pd

df = pd.read_csv('../TikTok profiles dataset (Public web data).csv')
df.head()
The first five rows of the data

Besides the columns that are unrelated in this setting (timestamp, biolink, create_time, id, top_videos, URL, and profile_pic_url), the dataset contains the following columns.

  • account_id : Unique identifier for each account.
  • nickname : account name.
  • biography : account’s biography.
  • awg_engagement_rate : average engagement rate.
  • comment_engagement_rate
  • like_engagement_rate
  • is_verified : the verified status on TikTok.
  • followers : the number of followers.
  • following : the number of following creators.
  • likes : the number of likes the user has obtained.
  • videos_count : the number of videos.

The dataset distribution is shown below.

from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(rows=3, cols=2)
viz_columns = ['awg_engagement_rate', 'comment_engagement_rate', 'like_engagement_rate', 'followers', 'following', 'videos_count']

for i in range(len(viz_columns)):
    # Map the i-th column to a subplot position (2 columns x 3 rows)
    x = i % 2 + 1
    y = i // 2 + 1

    col = viz_columns[i]
    values = df[col]

    fig.add_trace(go.Histogram(x=values, name=viz_columns[i]), row=y, col=x)
fig.show()
The distribution of TikTok user profile dataset

As you can see, the distribution of all features tends to be right-skewed, and each feature’s scale varies.

For the next step, we need to preprocess the data. In this case, we can drop account_id, nickname, and biography for simplicity. If you want to use text information such as nicknames and biographies, you would need a language model to encode it. Moreover, we have one categorical variable, is_verified, which we encode using LabelEncoder. Before proceeding with UMAP, we also need to rescale the input variables using StandardScaler.
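A minimal sketch of this preprocessing is shown below; the variable names (sample_df, scaler, scaled_columns, scaled_features) are my assumptions, chosen so they match the later snippets:

from sklearn.preprocessing import LabelEncoder, StandardScaler

# Drop the identifier/text columns mentioned above (the other unrelated columns
# listed earlier would be dropped in the same way)
sample_df = df.drop(columns=['account_id', 'nickname', 'biography']).copy()

# Encode the single categorical variable
sample_df['is_verified'] = LabelEncoder().fit_transform(sample_df['is_verified'])

# Rescale the input variables so that no single feature dominates the distances
scaled_columns = ['awg_engagement_rate', 'comment_engagement_rate', 'like_engagement_rate',
                  'is_verified', 'followers', 'following', 'likes', 'videos_count']
scaler = StandardScaler()
sample_df[scaled_columns] = scaler.fit_transform(sample_df[scaled_columns])
scaled_features = sample_df[scaled_columns].values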

Now, let’s apply UMAP. UMAP has some hyperparameters that we need to adjust; “metric” and “neighbor_num” tend to make the biggest difference in the result. For your information, “metric” defines how the distance between points is measured, and “neighbor_num” balances local and global structure. In this blog, I vary the metric and neighbor_num to find a distinguishable low-dimensional representation (a small loop for comparing several metrics is sketched after the figure below). The following graphs show the results of UMAP.

import numpy as np
import pandas as pd
import umap
import plotly.express as px

def apply_umap(features: np.array, random_state: int = 7, metric: str = 'euclidean', neighbor_num: int = 15):
    reducer = umap.UMAP(random_state=random_state, n_neighbors=neighbor_num, metric=metric)
    embedding = reducer.fit_transform(features)

    return reducer, embedding

def visualize_sample(df: pd.DataFrame, embedding: np.array, metric: str):
    df['umap_feat1'] = embedding[:, 0].reshape(-1, 1)
    df['umap_feat2'] = embedding[:, 1].reshape(-1, 1)

    # scaled_columns comes from the preprocessing step above
    fig = px.scatter(df, x='umap_feat1', y='umap_feat2', hover_data=scaled_columns, title=f'{metric}')
    fig.show()

metric = 'euclidean'
_, embedding = apply_umap(scaled_features, neighbor_num=10, metric=metric)
visualize_sample(sample_df, embedding, metric)
The UMAP result for each metric
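The snippet above runs a single metric; the comparison across metrics shown in the figure can be reproduced by wrapping the same two calls in a loop (the metric list here is an illustrative assumption):

# Try a few distance metrics and visually compare the resulting embeddings
for metric in ['euclidean', 'manhattan', 'cosine', 'correlation']:
    _, embedding = apply_umap(scaled_features, neighbor_num=10, metric=metric)
    visualize_sample(sample_df, embedding, metric)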

The embedding using the Euclidean metric shows reasonably distinguishable groups, so I chose the Euclidean one as the HDBSCAN input.

The next step is the HDBSCAN part. HDBSCAN also has hyperparameters, such as the minimum number of points assigned to one cluster. Note that you need to adjust the hyperparameters to suit your data. The HDBSCAN clustering result is shown below.

# HDBSCAN (assuming scikit-learn >= 1.3; the standalone hdbscan package offers the same interface)
from sklearn.cluster import HDBSCAN

hdb = HDBSCAN(min_cluster_size=10)
hdb.fit(embedding)
labels = hdb.labels_

sample_df['HDBSCAN_label'] = labels.reshape(-1, 1)
# Change the label type to string for better visualization in plotly
sample_df['HDBSCAN_label'] = sample_df['HDBSCAN_label'].astype(str)

fig = px.scatter(sample_df, x='umap_feat1', y='umap_feat2', color='HDBSCAN_label', hover_data=scaled_columns)
fig.show()
(Left) HDBSCAN result for all data, (Right) HDBSCAN result for data except for unassigned data

The left chart shows the HDBSCAN result using all data, and the right one shows the result excluding the data points not assigned to any cluster. HDBSCAN found 23 clusters in this data. Examining the clusters thoroughly, I found that they have the following characteristics; some of the main clusters are described below.

  • Cluster 0: verified influencers.
  • Cluster 1: non-verified accounts with many followers but a low engagement rate.
  • Cluster 4: an ordinary number of followers and following, with a higher engagement level.
  • Cluster 13: accounts whose following count is larger than their follower count.

To find the characteristics of a cluster, I visualize the feature distributions within the target cluster. For example, we can do it as follows:

# Cluster 1
cls1_df = sample_df[sample_df['HDBSCAN_label'] == '1']

fig = make_subplots(rows=3, cols=2)
viz_columns = ['awg_engagement_rate', 'comment_engagement_rate', 'like_engagement_rate', 'followers', 'following', 'videos_count']

for i in range(len(viz_columns)):
    x = i % 2 + 1
    y = i // 2 + 1

    col = viz_columns[i]
    values = cls1_df[col]

    fig.add_trace(go.Histogram(x=values, name=viz_columns[i]), row=y, col=x)
fig.show()
The visualization result for the feature distributions of cluster 1

The above example shows cluster 1’s feature distributions. As you can see, this cluster has many followers and low values for the three engagement features.

Now, let’s train an XGBoost classifier for each cluster to verify that these assumptions are correct. In this blog, I skip training on the small clusters because they tend to be difficult to train on and produce less business value than the bigger ones. The following figures show the results for the bigger clusters.

import os

import numpy as np
import pandas as pd
import xgboost
from xgboost import XGBClassifier
import shap
import matplotlib.pyplot as pl
from sklearn.model_selection import train_test_split
from sklearn.utils import class_weight
from sklearn.metrics import classification_report

# To cut off the smaller clusters.
# If you want to force a specific cluster through, add its label to the ok_labels list.
def label_check(df_length: int, label: str, minimum_num: int = 40, ok_labels: list = []):
    if label in ok_labels:
        return True

    if label == '-1':  # noise points that HDBSCAN did not assign to any cluster
        return False

    if df_length < minimum_num:
        return False
    else:
        return True


minimum_num = 30  # the minimum cluster size
ok_labels = ['9']
labels = sample_df['HDBSCAN_label'].unique().tolist()

os.makedirs('models', exist_ok=True)

for label in labels:
    df_length = len(sample_df[sample_df['HDBSCAN_label'] == label])

    if not label_check(df_length, label, minimum_num=minimum_num, ok_labels=ok_labels):
        print(f'skip Label {label}')
        continue

    print(f'processing labels {label}')
    # One-versus-all target: 1 for the current cluster, 0 for everything else
    sample_df['label'] = 0
    sample_df.loc[sample_df['HDBSCAN_label'] == label, 'label'] = 1

    X_train, X_test, y_train, y_test = train_test_split(sample_df[scaled_columns].values,
                                                        sample_df['label'],
                                                        stratify=sample_df['label'],
                                                        test_size=0.05)
    print(np.unique(y_train, return_counts=True)[-1])
    train_df = pd.DataFrame(X_train, columns=scaled_columns)
    test_df = pd.DataFrame(X_test, columns=scaled_columns)

    # Balance the positive/negative classes, which are heavily skewed in one-versus-all training
    classes_weights = class_weight.compute_sample_weight(
        class_weight='balanced',
        y=y_train
    )

    tree = XGBClassifier(n_estimators=2, max_depth=2, objective='binary:logistic')
    # Sample weights are passed to fit(), not to the constructor
    tree.fit(train_df, y_train, sample_weight=classes_weights)

    preds = tree.predict(X_test)

    print(f"\nLabel{label}: Classification Report:")
    print(classification_report(y_test, preds))

    xgboost.plot_importance(tree)
    pl.title(f"Label{label}: xgboost.plot_importance(model)")
    pl.show()

    explainer = shap.Explainer(tree, train_df)
    shap_values = explainer(train_df)

    shap.plots.bar(shap_values)

    tree.save_model(f'models/xgboost-{label}.model')
The XGBoost + SHAP result for cluster 1 and 4
The XGBoost + SHAP result for cluster 9 and 13

According to the results, XGBoost tends to focus on four features: “videos_count,” “following,” “followers,” and “awg_engagement_rate.” Furthermore, the explanation from SHAP matches my earlier assumptions. Therefore, XGBoost + SHAP gives you a better understanding of the clusters found in the first stage. Note that the models might overfit if your data size is small, so you may need to add some regularization to prevent it.
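As one option, XGBoost exposes several regularization knobs; the values below are illustrative assumptions, not the settings used above:

# A few common XGBoost regularization levers for small clusters (illustrative values)
tree = XGBClassifier(
    n_estimators=2,
    max_depth=2,               # keep the trees shallow
    min_child_weight=5,        # require more evidence per leaf
    subsample=0.8,             # row subsampling per tree
    colsample_bytree=0.8,      # feature subsampling per tree
    reg_lambda=1.0,            # L2 regularization on leaf weights
    reg_alpha=0.5,             # L1 regularization on leaf weights
    objective='binary:logistic',
)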

Once you have finished training XGBoost, you can reuse the models whenever new data arrives. Isn’t that convenient? You now have a powerful tool to easily explain to your stakeholders why each cluster exists.
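As a sketch of that reuse (assuming a hypothetical new DataFrame new_df with the same columns, already encoded, plus the fitted StandardScaler and scaled_columns from the preprocessing step), a saved per-cluster model can be reloaded like this:

import pandas as pd
from xgboost import XGBClassifier

# Load the saved one-versus-all model for cluster 1 and score new users
clf = XGBClassifier()
clf.load_model('models/xgboost-1.model')

new_features = pd.DataFrame(scaler.transform(new_df[scaled_columns]), columns=scaled_columns)
membership_prob = clf.predict_proba(new_features)[:, 1]  # probability of belonging to cluster 1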

In this blog, I introduced the explainable clustering method developed by Spotify. It is simple but powerful, and I hope it helps you achieve good user segmentation or similar tasks. The code I used is attached below.

References

[1] Pereira, G., Recursive Embedding and Clustering, Spotify R&D Engineering

[2] McInnes, L., Healy, J., Melville, J., UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, arXiv

[3] Oskolkov, N., Why UMAP is Superior over t-SNE, Towards Data Science

[4] Kumar, M., TikTok User Profiles Dataset, Kaggle

Published via Towards AI
