
A Visual Walkthrough of DeepSeek's Multi-Head Latent Attention (MLA) 🧟‍♂️

Last Updated on June 23, 2024 by Editorial Team

Author(s): JAIGANESAN

Originally published on Towards AI.


Exploring the bottleneck in GPU utilization and the Multi-Head Latent Attention implementation in DeepSeek-V2.

Image by Vilius Kukanauskas from Pixabay

In this article, we'll explore two key topics. First, we'll discuss the bottleneck problems that transformer-based Large Language Models (LLMs) encounter during training and inference.

Then, we'll delve into one specific bottleneck in LLM architectures, the KV cache, and how DeepSeek's innovative approach, Multi-Head Latent Attention, addresses it.

Disclaimer 🛑: This article is not sponsored by or affiliated with DeepSeek in any way. The content is based on publicly available information.

The topics we'll be covering in this article are:

✨ The bottleneck problem in GPU processing

🪄 Multi-Head Latent Attention (MLA)

1. The bottleneck problem in GPU processing 🦈

In recent years, there has been a surge in research and investment in Graphics Processing Units (GPUs). In fact, when I wrote this article, I came across the news that NVIDIA had become the second most valuable public company, with a staggering valuation of $3 trillion.

But why is that? The answer lies in the fact that AI models need to run enormous numbers of mathematical operations, and GPUs make those operations run much faster. They have become incredibly efficient at performing calculations, and, unsurprisingly, they're in high demand.

GPUs have become extremely fast: they can perform calculations at an incredible rate, measured in floating-point operations per second (FLOPs).

But there's an issue: the speed of computation has far outpaced memory bandwidth (GB/s), the speed at which data is transferred between different memory areas on the GPU. This mismatch creates a bottleneck that slows down the entire process.

To help illustrate the bottleneck problem, let's take a look at Images 1 and 2.

Image 1: Source: nvidia.com

Image 1 shows the specifications of NVIDIA's A100 GPU. As you can see, this powerful GPU can perform an impressive 19.5 TFLOPS (tera floating-point operations per second) in FP32 mode. However, its memory bandwidth is limited to around 2 TB/s. These are two different quantities, but they are closely connected.

This highlights a crucial point: the bottleneck isn't always about how many operations we can perform, but rather how quickly we can move data between different parts of the GPU. If the number and size of data transfers increase, latency increases with them. The size and number of tensors involved in our calculations therefore play a significant role.

For example, computing the same operation on the same tensor multiple times can be faster than computing that operation on different tensors of the same size, because the GPU has to move each new tensor into place, which slows things down. So our goal shouldn't just be to reduce the number of operations we perform (as the KV cache, MQA, and GQA do) but also to minimize the memory accesses and transfers we need to make.
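To put rough numbers on this, here is a quick back-of-the-envelope sketch using the A100 figures from Image 1 (the real break-even point depends on the kernel and data type, so treat this only as an order-of-magnitude estimate):

# Rough compute-vs-bandwidth balance for the A100 in Image 1.
peak_flops = 19.5e12          # FP32: 19.5 TFLOPS
bandwidth = 2.0e12            # HBM bandwidth: ~2 TB/s, in bytes per second

flops_per_byte = peak_flops / bandwidth
print(f"~{flops_per_byte:.0f} FLOPs can run in the time it takes to move one byte")

# A kernel that performs fewer FLOPs per byte of data it touches is memory-bound:
# the compute units sit idle while waiting for data to arrive.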

Image 2 will help clarify these concepts and give you a better understanding of how these operations work: it illustrates how the attention operation happens on a GPU.

Image 2: Attention Operation in GPUs. Created by author

Now that we've explored the bottleneck problem, it's clear that it can significantly increase latency. To tackle this issue and reduce latency during inference, researchers have proposed various approaches, including FlashAttention, multi-query attention, grouped-query attention, the KV cache, the rolling buffer KV cache, and more. Even the recently popularized Mamba architecture addresses this problem indirectly.

Take a closer look at Image 1 and you'll notice that the GPU memory is 80 GB of HBM2e (High Bandwidth Memory, 2nd generation enhanced). However, when training and running inference on large language models (whose parameter counts have grown exponentially in recent years, especially with the rise of LLMs and multi-modal models), this HBM can quickly become a limiting factor, creating a bottleneck and increasing latency.

1.1 So, what bottleneck problem does DeepSeek address? 😵‍💫😵

As we're all aware, the KV cache is an inference-time technique that reduces the computational load of the attention mechanism (in the vanilla transformer architecture from the "Attention Is All You Need" paper). It works by caching the key and value vectors of previous tokens so they can be reused when generating the next token. However, in real-life scenarios where we deal with long sequences, the KV cache becomes extremely large and memory-intensive. This limits the maximum batch size and sequence length, creating a bottleneck.
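To get a feel for how quickly this cache grows, here is a rough sketch with assumed numbers (a 4096-dimensional model with 32 decoder layers, FP16 storage, a 32k-token context, and batch size 8; these are illustrative values, not any particular model's settings):

# Rough KV-cache size for vanilla multi-head attention, under assumed settings.
d_model = 4096        # per layer, each token caches a key and a value of this size
n_layers = 32
bytes_per_elem = 2    # FP16
seq_len = 32_000
batch_size = 8

cache_bytes = 2 * d_model * n_layers * bytes_per_elem * seq_len * batch_size  # 2x: keys + values
print(f"KV cache: {cache_bytes / 1e9:.0f} GB")   # ~134 GB, well beyond a single A100's 80 GB of HBM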

In an effort to solve this bottleneck and reduce latency, the researchers at DeepSeek came up with a new approach called Multi-Head Latent Attention. This method aims to mitigate the bottleneck problem and speed up inference.

2. Multi-Head Latent Attention (MLA) 🧞‍♂️🧞‍♂️

I assume you are familiar with self-attention and multi-head attention in large language models. If not, the article below covers these concepts.

Large Language Model (LLM) 🤖: In and Out

Delving into the Architecture of LLM: Unraveling the Mechanics Behind Large Language Models like GPT, LLAMA, etc.

pub.towardsai.net

In simple terms, the breakthrough the researchers made with Multi-Head Latent Attention is reducing space complexity (memory usage) in order to reduce time complexity, ultimately leading to lower latency.

First, a short description of what a latent vector is: a latent vector represents an underlying, unobservable factor that influences or contributes to a model's observed data or outcome.

In some models, like Mistral, the dimension of the model is set to 4096. This means that all vectors, layer outputs, and embedding dimensions will be the same size as the model dimension.

However, the researchers working on DeepSeek took a different approach: they reduced the dimension in which the KV cache is stored.

For example, they shrank the vector dimension from 4096 to 1024. This allows the KV cache to be stored in 1024 dimensions, while the other layers still use the original model dimension. Let's dive deeper into how this works, both mathematically and visually.

Image 3: Multi-Head Latent Attention in DeepSeekV2 [1]

The architecture of multi-head latent attention is shown in Image 3. This architecture shows how the latent vector is created and how the hidden input is passed through the RoPE (Rotary Positional Embedding) module.

2.1 Low-Rank Projection

Image 4: Key-Value Down And Up Projection. Source: DeepSeekV2 Research paper [1]
""" I want to Give a Latent vector illustration with model dimension of 4096 
and Latent dimension of 1024 """

# dim = 4096 , latent_dim = 1024
self.c^KV_t = nn.Linear(dim, latent_dim, bias=False) # Equation 1
self.k^C_t = nn.Linear(latent_dim, dim, bias=False) # Equation 2
self.v^C_t = nn.Linear(latent_dim, dim, bias=False) # Equation 3

The core idea behind MLA (Multi-Head Latent Attention) is low-rank joint compression of the keys and values to reduce the KV cache.

Note: (1) In this example, I've used a latent vector size of 1024 for simplicity. (2) The numbers you see in the illustration images are just random examples and don't represent the actual numbers in the model. (3) The weight matrices shown in the images and equations are learnable parameters, which means they are updated during backpropagation in training.

c^KV_t is the compressed latent vector for keys and values (Equation 1): a lower-dimensional representation of the original key and value vectors. As mentioned earlier, its dimension d_c (the compressed dimension) will be something like 1024, which is small compared to the model dimension. This latent vector is what gets cached during inference.

In Equation 1 from Image 4, the down-projection weight matrix W^DKV projects the input vector h_t from the model dimension (4096) to the latent dimension (1024).

Image 5: Visual Representation of Equation 1 in Image 4. Created by author

Image 5 illustrates Equation 1 from Image 4. I took 9 tokens with a model dimension of 4096.

When we're making predictions with our model (inference), we don't need to store the original high-dimensional key and value vectors. Instead, we store the compressed latent key-value vectors, c^KV_t. This is much more efficient because each c^KV_t vector has only 1024 elements, compared to the original 4096 (the model dimension). This approach reduces the amount of memory we need and also speeds up the process.
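Continuing the running example from Image 5 (9 tokens, model dimension 4096, latent dimension 1024), the down-projection is just one linear layer; a minimal sketch:

import torch
import torch.nn as nn

dim, latent_dim, n_tokens = 4096, 1024, 9
W_DKV = nn.Linear(dim, latent_dim, bias=False)   # down-projection (Equation 1)

h_t = torch.randn(n_tokens, dim)                 # hidden inputs, shape (9, 4096)
c_kv = W_DKV(h_t)                                # compressed latent KV vectors, shape (9, 1024)
print(c_kv.shape)                                # torch.Size([9, 1024]) -- this is what gets cached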

Equations 2 and 3 in Image 4 are used to up-project the compressed key and value representation c^KV_t. This means we take the compressed latent vector and project it back into the model dimension.

The resulting key and value vectors, k^C_t and v^C_t, are now in the model dimension (d_h * n_h). We achieve this up-projection using the linear-layer weight matrices W^UK and W^UV.

But during inference, the up-projection matrices W^UK and W^UV can be absorbed (by the associative law of matrix multiplication) into the query projection matrix W^Q and the output projection matrix W^O, eliminating the need to compute the keys and values explicitly.
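This absorption really is just associativity: instead of up-projecting the cached latent vector into a key and then taking its dot product with the query, we can fold W^UK into the query side once and attend directly over the cached latent vectors. A tiny single-head numerical check (shapes are illustrative):

import torch

d_h, d_c = 128, 1024                      # head dimension and latent dimension (illustrative)
q = torch.randn(d_h)                      # a query vector for one head
W_UK = torch.randn(d_h, d_c)              # up-projection matrix for keys
c_kv = torch.randn(d_c)                   # a cached compressed latent vector

# Standard order: up-project the key, then take the dot product with the query.
score_a = q @ (W_UK @ c_kv)

# Absorbed order: fold W_UK into the query once, then attend over the latent cache.
score_b = (q @ W_UK) @ c_kv

print(torch.allclose(score_a, score_b, rtol=1e-3))  # True (up to floating-point error):
                                                    # the full keys never need to be materialized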

This approach is different from the traditional method, where we project the key and value vectors separately for each head using weight matrices such as W^Q, W^K, and W^V. Instead, we project c^KV_t directly into the heads' keys and values. I will illustrate this with the image below.

Image 6: Up Projection of Key and Value. Created by author

From Image 6 you can see that the lower-dimensional KV representation c^KV_t is projected into 32 key vectors and 32 value vectors, each of size (9, 128), using the two weight matrices W^UK and W^UV. Each of W^UK and W^UV has size (d_h * n_h) x d_c, where d_c is the latent vector dimension. These keys and values are then used in multi-head attention.

d_h → head dimension, n_h → number of heads

Note: I projected back into the model dimension here, but we can up-project into any dimension (d_h * n_h) as needed. That choice affects the attention layer's output projection, so the output weight matrix may need to be adjusted to project back into the model dimension.
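Here is a shape-level sketch of the up-projection in Image 6, with the same illustrative sizes (9 tokens, latent dimension 1024, 32 heads of dimension 128):

import torch
import torch.nn as nn

n_tokens, d_c, n_h, d_h = 9, 1024, 32, 128

W_UK = nn.Linear(d_c, n_h * d_h, bias=False)    # up-projection for keys (Equation 2)
W_UV = nn.Linear(d_c, n_h * d_h, bias=False)    # up-projection for values (Equation 3)

c_kv = torch.randn(n_tokens, d_c)               # cached latent vectors, (9, 1024)

k = W_UK(c_kv).view(n_tokens, n_h, d_h)         # 32 key heads, each a (9, 128) matrix
v = W_UV(c_kv).view(n_tokens, n_h, d_h)         # 32 value heads, each a (9, 128) matrix
print(k.shape, v.shape)                         # torch.Size([9, 32, 128]) each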

Now we need to look at query compression.

Image 7: Query Down and Up Projection. Source: DeepSeekV2 Research paper[1]

In Image 7, Equation 4 shows that the query is compressed into the latent vector space (from 4096 to 1024), and Equation 5 is the up-projection (from 1024 to d_h * n_h).

Equations 4 and 5 work the same way as the illustrations you saw in Images 5 and 6.

So far we have seen the down-projection and up-projection of the query, key, and value vectors. But there is one important topic left: how we integrate Rotary Positional Embedding into these vectors.

2.2 Decoupled Rotary Positional Embedding

RoPE (Rotary Positional Embedding) introduces positional information into the keys and queries by rotating the vectors in multi-dimensional space. RoPE encodes both relative and absolute position.

I will give a simple illustration of how RoPE rotates the vectors.

Image 8: Vector rotation in 2D space. Source: RoFormer research paper [4]

Image 8 gives the equation for how a vector is rotated in 2-dimensional space. m is the position of the token/vector in the sequence, and theta is the angle (the same for all tokens/vectors). The rotation matrix, with entries cos(m * theta) and sin(m * theta), is usually denoted R.

Image 9: RoPE in 2-dimensional space. Created by author

In Image 9, I took the example sequence "we enjoy music". In this sequence, "we" is the 1st token, so its position m is 1, and "enjoy" and "music" have positions 2 and 3. Theta is the same for all vectors. In most implementations (Python code), sequence positions start at 0; I start from 1 here just to illustrate how vector rotation happens in 2 dimensions.

With RoPE, we only rotate the vector. A vector has magnitude and direction; RoPE changes only the direction, by the angle m * theta, while the magnitude remains the same. This gives both absolute positional information (the position of that particular token) and relative positional information (its position with respect to the other tokens in the sequence) to the query and key vectors, which enhances the attention output.
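A tiny sketch of the 2-dimensional case from Images 8 and 9: each token's vector is rotated by m * theta, and the rotation changes only its direction, never its magnitude (theta here is an arbitrary example value):

import math
import torch

theta = 0.1                                  # example angle, shared by all tokens

def rotate_2d(x, m):
    """Rotate a 2-D vector x by the angle m * theta (m = token position)."""
    a = m * theta
    rot = torch.tensor([[math.cos(a), -math.sin(a)],
                        [math.sin(a),  math.cos(a)]])
    return rot @ x

x = torch.tensor([3.0, 4.0])                              # example 2-D embedding, magnitude 5
for word, m in [("we", 1), ("enjoy", 2), ("music", 3)]:   # positions as in Image 9
    print(word, rotate_2d(x, m).norm().item())            # magnitude stays ~5.0; only direction changes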

We rotate only the queries and keys, not the value vectors. Rotary Positional Embedding thus enhances the attention mechanism between inputs. The vanilla transformer architecture has a problem here: it adds the positional information (a sinusoidal function) to the word embeddings, which changes the magnitude of every vector, so even identical words end up with different vectors (different magnitudes) in the input.

With RoPE, the magnitude of identical words doesn't change after rotation. The rotation itself carries the positional information, so the attention between vectors produces an enhanced (semantic and syntactic) output. And because the value vectors don't go through any rotation (they remain the true word embeddings), the attention layer output retains rich information about the sequence of tokens.

But there is a problem: RoPE is position-sensitive for both keys and queries. If we applied RoPE inside Equations 1 and 2 (Image 4), it would be coupled with position-sensitive weight matrices. As a result, W^UK could no longer be absorbed into W^Q, because matrix multiplication does not commute with the rotation. We would have to recompute the keys for all prefix tokens during inference, which is inefficient.

To address this, a decoupled RoPE strategy is proposed. It uses additional multi-head queries and a shared key to carry the RoPE information separately from the compressed queries and keys. These decoupled RoPE queries and keys are then concatenated with the up-projected queries and keys.

Image 10: Decoupled RoPE applied to Keys and Queries and Concatenation

Image 10: W^QR and W^KR are the weight matrices that produce the decoupled queries and key, as shown in Equations 6 and 7. RoPE(.) denotes the operation that applies the RoPE rotation matrices, which you can compare with Image 8. Equations 8 and 9 denote the concatenation of the RoPE-rotated (decoupled) queries and key with the up-projected query and key vectors.

Note: RoPE doesn't change the magnitude or dimension of a vector. The weight matrices W^QR and W^KR transform the input into vectors compatible with q^C_t and k^C_t.

q^R_t and k^R_t have a smaller dimension than the latent vectors: (d^R_h * n_h) for the queries (n_h RoPE vectors) and d^R_h for the key. q^R_t is computed from the compressed query vector c^Q_t, while k^R_t is computed from the input vector h_t itself.
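A shape-only sketch of the concatenation in Equations 8 and 9, with illustrative sizes (32 heads of dimension 128 and a decoupled RoPE dimension of 64; DeepSeek-V2's actual d^R_h may differ):

import torch

n_tokens, n_h, d_h, d_h_rope = 9, 32, 128, 64   # illustrative sizes

q_c = torch.randn(n_tokens, n_h, d_h)           # up-projected ("content") queries
k_c = torch.randn(n_tokens, n_h, d_h)           # up-projected ("content") keys

q_r = torch.randn(n_tokens, n_h, d_h_rope)      # decoupled RoPE queries (one per head)
k_r = torch.randn(n_tokens, 1, d_h_rope)        # decoupled RoPE key (shared across heads)
k_r = k_r.expand(-1, n_h, -1)                   # broadcast the shared key to every head

q = torch.cat([q_c, q_r], dim=-1)               # Equation 8: (9, 32, 192)
k = torch.cat([k_c, k_r], dim=-1)               # Equation 9: (9, 32, 192)
print(q.shape, k.shape)                         # queries and keys share a dimension, so attention works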

The concatenation preserves the positional information in both the queries and the key. The query and key vectors must have the same dimension for the attention operation to work, as shown in Image 11.

Image 11: Attention mechanism. Source: DeepSeekV2 Research Paper [1]

Image 11: u_t is simply the attention layer output in the model dimension, and W^O is the attention layer's output projection matrix.

2.3 Comparing the key-value cache with other methods 🤼

To show that Multi-Head Latent Attention works better, the researchers at DeepSeek compared the KV cache per token across different attention mechanisms, as shown in Image 12. MLA requires only a small amount of memory for its KV cache, yet achieves stronger performance than MHA (Multi-Head Attention).

Image 12: Comparison of KV cache per token. Source: DeepSeekV2 Research Paper[1]

Image 12: n_h denotes the number of attention heads, d_h is the dimension of each attention head, l is the number of decoder layers in the LLM architecture, and n_g is the number of groups in grouped-query attention.
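Using these quantities, here is a rough per-token comparison under assumed, illustrative values (n_h = 32, d_h = 128, l = 32, n_g = 8, a latent dimension d_c = 1024, and a decoupled RoPE key dimension d^R_h = 64; these are not DeepSeek-V2's exact settings, and the MLA entry simply counts the two cached quantities described above, c^KV_t and the shared RoPE key):

# Elements cached per token for each attention variant, under assumed sizes.
n_h, d_h, l, n_g = 32, 128, 32, 8     # heads, head dim, decoder layers, GQA groups (illustrative)
d_c, d_h_rope = 1024, 64              # MLA latent dim and decoupled RoPE key dim (illustrative)

cache = {
    "MHA": 2 * n_h * d_h * l,         # full keys and values for every head
    "GQA": 2 * n_g * d_h * l,         # keys/values shared within each group
    "MQA": 2 * 1 * d_h * l,           # a single shared key/value head
    "MLA": (d_c + d_h_rope) * l,      # compressed latent vector + shared RoPE key
}
for name, elems in cache.items():
    print(f"{name}: {elems:,} elements per token")
# MLA caches far less than MHA while, per the paper, matching or exceeding its quality.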

Complete computation process of MLA (summary); a compact code sketch follows the list:

📌 Equation 4 (Image 7) → The input vector h_t is projected into the latent dimension (the compressed version for queries).

📌 Equation 5 (Image 7) → The latent query vector is then projected into multiple queries (one per head).

📌 Equation 6 (Image 10) → To capture positional information for the queries, the researchers use decoupled Rotary Positional Embeddings to create positional query vectors.

📌 Equation 8 (Image 10) → These positional vectors are concatenated with the queries.

📌 Equation 1 (Image 4) → The input vector h_t is projected into the latent dimension (the compressed version for keys and values) [cached during inference].

📌 Equation 2 (Image 4) → The latent key-value vector is projected into the keys k^C_t using a linear-layer weight matrix.

📌 Equation 7 (Image 10) → For positional information on the key side, RoPE is applied to the input vector h_t to create a shared positional key [cached during inference].

📌 Equation 9 (Image 10) → This positional key is concatenated with the keys.

📌 Equation 3 (Image 4) → The latent key-value vector is projected into the values v^C_t using a linear-layer weight projection matrix.

📌 Equation 10 (Image 11) → The attention mechanism runs in all heads.

📌 Equation 11 (Image 11) → The attention heads are concatenated, and this concatenated output is linearly projected using the weight matrix W^O.

📌 The final output is then fed into the MoE (fine-grained expert and shared expert isolation) layer.
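Putting the summary above together, here is a compact, simplified sketch of a single MLA layer under the illustrative sizes used throughout this article. It is only a sketch: the RoPE rotation itself, the KV caching and weight-absorption logic, normalization, and DeepSeek-V2's actual hyperparameters are all omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedMLA(nn.Module):
    """A stripped-down MLA layer: no KV caching, no weight absorption, no RoPE rotation."""
    def __init__(self, dim=4096, n_h=32, d_h=128, d_c=1024, d_cq=1024, d_r=64):
        super().__init__()
        self.n_h, self.d_h, self.d_r = n_h, d_h, d_r
        # Query path (Equations 4, 5, 6)
        self.W_DQ = nn.Linear(dim, d_cq, bias=False)         # compress queries
        self.W_UQ = nn.Linear(d_cq, n_h * d_h, bias=False)   # up-project queries
        self.W_QR = nn.Linear(d_cq, n_h * d_r, bias=False)   # decoupled RoPE queries
        # Key/value path (Equations 1, 2, 3, 7)
        self.W_DKV = nn.Linear(dim, d_c, bias=False)         # compress keys/values (cached at inference)
        self.W_UK = nn.Linear(d_c, n_h * d_h, bias=False)    # up-project keys
        self.W_UV = nn.Linear(d_c, n_h * d_h, bias=False)    # up-project values
        self.W_KR = nn.Linear(dim, d_r, bias=False)          # shared decoupled RoPE key (cached at inference)
        # Output projection (Equation 11)
        self.W_O = nn.Linear(n_h * d_h, dim, bias=False)

    def forward(self, h):                                    # h: (seq_len, dim)
        seq = h.shape[0]
        c_q = self.W_DQ(h)                                   # Equation 4
        c_kv = self.W_DKV(h)                                 # Equation 1
        q = self.W_UQ(c_q).view(seq, self.n_h, self.d_h)     # Equation 5
        k = self.W_UK(c_kv).view(seq, self.n_h, self.d_h)    # Equation 2
        v = self.W_UV(c_kv).view(seq, self.n_h, self.d_h)    # Equation 3
        # Decoupled positional parts (the RoPE rotation itself is omitted for brevity)
        q_r = self.W_QR(c_q).view(seq, self.n_h, self.d_r)   # Equation 6
        k_r = self.W_KR(h).view(seq, 1, self.d_r).expand(-1, self.n_h, -1)  # Equation 7
        q = torch.cat([q, q_r], dim=-1)                      # Equation 8
        k = torch.cat([k, k_r], dim=-1)                      # Equation 9
        # Attention in every head with a causal mask (Equation 10)
        scores = torch.einsum('qhd,khd->hqk', q, k) / (self.d_h + self.d_r) ** 0.5
        causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        attn = F.softmax(scores.masked_fill(causal, float('-inf')), dim=-1)
        out = torch.einsum('hqk,khd->qhd', attn, v).reshape(seq, -1)
        return self.W_O(out)                                 # Equation 11, then on to the MoE layer

mla = SimplifiedMLA()
print(mla(torch.randn(9, 4096)).shape)                       # torch.Size([9, 4096])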

I have written a series of articles exploring new, innovative approaches in LLMs. This article is a continuation of my previous article, where I explored DeepSeek's innovation in the Mixture of Experts (fine-grained expert and shared expert isolation), which solves the knowledge hybridity and knowledge redundancy problems in existing MoEs. If you want to know more, check out that article:

Revolutionizing AI with DeepSeekMoE: Fine-grained Expert and Shared Expert Isolation 🧞‍♂️

Optimizing MoE with fine-grained and shared expert isolation for enhanced precision and efficiency in Large Language…

pub.towardsai.net

That concludes this article. I hope it has helped you understand how MLA works. If anything is still unclear, reread it or read the original research paper; I am simply doing my best to share my perspective and what I have understood.

Thanks for reading this article 🤩. If you found it useful 👍, give it a clap 👏😉! Feel free to follow for more insights.

Let's keep the conversation going! Feel free to connect with me on LinkedIn: www.linkedin.com/in/jaiganesan-n/ 🌏❤️

and join me on this exciting journey of exploring AI innovations and their potential to shape our world.

References:

[1] DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, et al., DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2024), research paper (arXiv)

[2] Flash Attention Conceptual Guide, Huggingface.co

[3] Exploring the GPU Architecture, vmware.com

[4] Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu, RoFormer: Enhanced Transformer with Rotary Position Embedding (2021), research paper (arXiv)
