Transformer Architecture Part 2

Last Updated on September 18, 2024 by Editorial Team

Author(s): Sachinsoni

Originally published on Towards AI.

In the first part of this series (Transformer Architecture Part-1), we explored the Transformer Encoder, which is essential for capturing complex patterns in input data. However, for tasks like machine translation, text generation, and other sequence-to-sequence applications, the Transformer Decoder plays a crucial role. The decoder generates meaningful output sequences by leveraging both the encoder’s representations and the previously generated outputs. It achieves this through a combination of masked self-attention, cross-attention with the encoder, and feed-forward networks. In this blog, we’ll dive into the architecture of the Transformer Decoder and how it enables powerful sequence generation across various applications.

Before we dive into the details of Transformer Architecture, I want to extend my heartfelt gratitude to my mentor, Nitish Sir. His exceptional guidance and teachings on the CampusX YouTube channel have been instrumental in shaping my understanding of this complex topic. With his support, I’ve embarked on this journey of exploration and learning, and I’m excited to share my insights with you all. Thank you, Nitish Sir, for being an inspiration and mentor!

In this blog, we'll dive deep into the Transformer Decoder architecture specifically from the perspective of training. During training, the behavior of the decoder is slightly different compared to inference. While the architecture remains the same in both cases, during training the decoder processes all target tokens in parallel (a non-autoregressive, teacher-forced mode), whereas at inference it becomes autoregressive, generating one token at a time. This blog will focus entirely on the decoder’s architecture during the training phase. The inference behavior will be covered in a subsequent post.

To simplify the understanding of the decoder, I'll begin by presenting an overview of the architecture and then gradually move into a detailed breakdown of each component and how they are interconnected. This step-by-step approach will ensure a clear understanding of the decoder's structure and functionality during training.

Now, imagine the Transformer as a large box containing two smaller boxes inside—an encoder and a decoder.

image by CampusX

The encoder is responsible for processing the input data, while the decoder takes the encoder's output and generates the final output sequence. While this is a simplified version, the reality is that both the encoder and decoder consist of multiple layers, often six, stacked on top of each other. Each layer of the encoder and decoder follows the same architecture, but the parameters and weights within them vary.

The decoder, like the encoder, consists of multiple blocks, and understanding the structure of a single block is crucial. As a quick recap, each encoder block has two key components: a self-attention mechanism and a feed-forward neural network.

Internal structure of a single encoder block

While the architecture of these blocks is the same, their internal parameters are different, similar to how identical phones may have different apps installed for different users.

Let’s now break down the decoder architecture. Much like the encoder, the decoder is composed of six stacked blocks. Each of these blocks consists of three key components: the masked self-attention layer, the cross-attention layer (also known as the encoder-decoder attention), and the feed-forward neural network.

Internal structure of a single decoder block

Remember, we don’t have just one decoder block; instead, there are six such blocks stacked sequentially. The output from the first decoder block becomes the input to the second, and so on, until we reach the sixth block. The final output from the sixth block is passed to the output layer, where we get the final prediction.

Flow of information through the 6 decoder blocks

I know this might seem a bit overwhelming at first, but don’t worry. We’ll go through this architecture step by step, using an example that will make everything clearer. To simplify our exploration, I’ve broken it down into three parts:

  1. Input preparation: Before anything enters the decoder, it’s essential to understand how the input is prepared. We’ll discuss the steps involved in transforming the input so that it can be processed by the decoder.
  2. Inside the decoder block: This is where we’ll spend the most time. We’ll explore in detail what happens within a single decoder block. Once you understand the internal workings of one block, the same logic applies to all six, as their architectures are identical. The only differences are in their learned parameters.
  3. Output generation: Finally, we’ll examine what happens in the output layer once the decoder has finished processing.

For our deep dive, we’ll use a machine translation task as an example — specifically, translating from English to Hindi. Let’s assume we’re working with a training dataset containing English-Hindi sentence pairs. For simplicity, let’s take a single example where the English sentence is “We are friends” and its corresponding Hindi translation is “हम दोस्त हैं”.

Here’s a critical point to remember: the encoder will have already processed the English sentence before the decoder even begins its work. The encoder’s job is to generate contextual embeddings for each token in the sentence, which are then passed to the decoder. The decoder will use these embeddings as it generates the translated Hindi output.

1. Input Preparation

Let’s start with the input preparation phase of the decoder. This phase involves four key operations: shifting, tokenization, embedding, and positional encoding. The purpose of this input block is to take the output sentence (in our case, the Hindi sentence “हम दोस्त हैं”) and process it so that it can be fed into the first block of the decoder.

Here’s a more detailed explanation of these steps:

  1. Right Shifting: The first operation is right shifting. In this step, we shift the target sentence one position to the right by adding a special token, called the start token, at the beginning. This start token marks the beginning of the output sequence, so that each position can later be predicted from the tokens that come before it. So, our transformed input now becomes: <START> हम दोस्त हैं.
  2. Tokenization: The next step is tokenization. This is where we break down the sentence into individual tokens. Tokenization can be done at various levels (words, bigrams, or n-grams), but for simplicity, we’ll use word-level tokenization. After tokenizing, we get the following four tokens: <START>, हम, दोस्त, and हैं.
  3. Embedding: Once we have the tokens, the next step is to convert them into numerical representations that the machine can process. This is where the embedding layer comes in. The embedding layer takes each token and generates a corresponding vector. In the original transformer paper, each vector has 512 dimensions. So, for our tokens, we’ll have the following embeddings:
  • <START> corresponds to vector E1 (512-dimensional),
  • हम corresponds to vector E2 (512-dimensional),
  • दोस्त corresponds to vector E3 (512-dimensional),
  • हैं corresponds to vector E4 (512-dimensional).

At this point, we’ve successfully transformed our tokens into machine-readable vectors, but we still face one issue: we haven’t encoded any information about the order of the tokens (i.e., which token comes first, second, etc.).

Input Preparation Workflow diagram for decoder block

4. Positional Encoding: To address this, we use positional encoding, which helps the model understand the order of the words in the sentence. Positional encoding generates a unique vector for each position in the sentence. For example:

  • Position 1 gets vector P1 (512-dimensional),
  • Position 2 gets vector P2 (512-dimensional),
  • Position 3 gets vector P3 (512-dimensional),
  • Position 4 gets vector P4 (512-dimensional).

At this stage, we have two sets of vectors: the embedding vectors and the positional encoding vectors.

Next, we simply add these two sets of vectors together. For example:

  • The embedding vector for <START> (E1) is added to the positional vector for the first position (P1),
  • The embedding vector for हम (E2) is added to the positional vector for the second position (P2), and so on.

Since both sets of vectors have 512 dimensions, they can be added together easily. The result is our final set of input vectors: X1, X2, X3, and X4. These vectors correspond to the tokens in the sentence and encode both their meanings and their positions.

These final vectors, X1, X2, X3, and X4, are what we send into the first block of the decoder.
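To make these steps concrete, here is a minimal NumPy sketch of the input block. It assumes a hypothetical four-word vocabulary, a randomly initialised embedding table, and the sinusoidal positional encoding from the original paper; a real model would learn the embedding table during training.

```python
import numpy as np

d_model = 512                                    # embedding size from the original paper
tokens = ["<START>", "हम", "दोस्त", "हैं"]        # right-shifted, tokenized target sentence

# Hypothetical toy vocabulary and a randomly initialised embedding table
vocab = {tok: i for i, tok in enumerate(tokens)}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))

# Embedding: look up E1..E4 (one 512-dimensional vector per token)
E = embedding_table[[vocab[t] for t in tokens]]  # shape (4, 512)

# Sinusoidal positional encoding from the original paper: P1..P4
def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model)[None, :]              # (1, d_model)
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

P = positional_encoding(len(tokens), d_model)    # shape (4, 512)

# Final decoder inputs X1..X4: element-wise sum of embeddings and positions
X = E + P                                        # shape (4, 512)
print(X.shape)                                   # (4, 512)
```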

I hope this clarifies how the input block works and how the input sentence is prepared for processing in the decoder.

2. Inside the Decoder Block

Now that we have our input vectors X1, X2, X3, and X4, we feed them into the decoder block. The first component to process these vectors is the masked multi-head attention block.

If you’ve read my blog on masked self-attention, you might recall that the masked multi-head attention works almost the same as regular multi-head attention. The only significant difference is the masking.

Let me quickly explain how this works. For each input vector (X1, X2, X3, X4), a corresponding contextual embedding vector is generated, like so:

  • Z1 is generated for X1,
  • Z2 is generated for X2,
  • Z3 is generated for X3,
  • Z4 is generated for X4.

However, the key difference here is that while generating Z1, we only consider the <START> token (X1) and ignore the rest (X2, X3, X4). While generating Z2, we consider both <START> and हम (X1 and X2) and ignore दोस्त and हैं (X3 and X4). Similarly, for Z3, we consider <START>, हम, and दोस्त (X1, X2, X3), but not हैं (X4). Finally, when generating Z4, we take all tokens into account (X1, X2, X3, X4).

This process is what makes it masked attention, as each vector is generated while only considering previous tokens, not future ones.
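Here is a minimal, single-head NumPy sketch of this masking idea, using random 512×512 weight matrices so that the outputs Z1–Z4 stay 512-dimensional; the actual model uses 8 heads of 64 dimensions each and concatenates them.

```python
import numpy as np

def masked_self_attention(X, Wq, Wk, Wv):
    """Single-head masked self-attention over X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])               # (seq_len, seq_len)

    # Causal mask: position i may only attend to positions <= i,
    # so future positions get -inf before the softmax.
    seq_len = X.shape[0]
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # Z1..Z4

rng = np.random.default_rng(0)
d_model = 512
X = rng.normal(size=(4, d_model))                         # X1..X4 from the input block
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.02 for _ in range(3))
Z = masked_self_attention(X, Wq, Wk, Wv)                  # shape (4, 512)
```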

If you’d like to understand this in more detail, I recommend revisiting my blog on masked self-attention, where I explain the masking process and how the output is derived.

Now, if we look back at our main diagram, we can see that the output of the masked multi-head attention block is fed into the next layer: the add & normalize block.

In this block, the first operation is addition. The question is: what exactly are we adding? The answer is simple. We add the output of the masked multi-head attention block (Z1, Z2, Z3, Z4) to the original input vectors (X1, X2, X3, X4). This happens because of the residual connection or skip connection that bypasses the multi-head attention. The original input vectors are passed along this path and then added to the output of the multi-head attention.

Looking at the diagram, you’ll notice that X1, X2, X3, and X4 are passed not only into the multi-head attention but also through this residual connection. So now, we add:

  • Z1 to X1,
  • Z2 to X2,
  • Z3 to X3,
  • Z4 to X4.

All of these vectors are 512-dimensional, so the addition operation is straightforward. After adding, we get a new set of vectors, which we can call Z1', Z2', Z3', and Z4'.

image by CampusX

Next, we perform the normalization step. The type of normalization used here is layer normalization. Layer normalization calculates the mean (µ) and standard deviation (σ) of each vector and uses them to rescale that vector so that its values have zero mean and unit variance (optionally followed by a learned scale and shift). This keeps the activations in a consistent range and helps the training process remain stable.

We use layer normalization because, during the previous operations (attention, additions, etc.), we might have generated large numbers, which could destabilize the training process. By normalizing the vectors, we ensure that the entire process remains stable.
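A minimal sketch of this add & normalize step, assuming random stand-in values for X and Z and omitting the learned scale and shift parameters that a real layer-normalization layer also carries:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalise each 512-d vector with its own mean and standard deviation."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 512))      # original inputs X1..X4
Z = rng.normal(size=(4, 512))      # masked-attention outputs Z1..Z4 (stand-in values)

Z_norm = layer_norm(X + Z)         # residual connection, then layer normalization
print(Z_norm.mean(axis=-1))        # ~0 for each vector
print(Z_norm.std(axis=-1))         # ~1 for each vector
```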

At this point, we have reached the add & normalize block. The output from this step is ready, and we can now pass it to the next block in the decoder, which is the cross-attention block.

Now, let’s move on to the cross-attention block, which is probably the most interesting part of the entire decoder architecture.

Here’s why: the cross-attention block allows interaction between the input sequence (for example, an English sentence) and the output sequence (like a Hindi sentence). Essentially, for each token in your Hindi sentence, you compute a similarity score with every token in the English sentence. This is where the magic of cross-attention happens!

You’ll notice that this block takes two inputs:

  1. One input comes from the masked attention block, which we discussed earlier.
  2. The second input comes from the encoder.

If you refer to the overall diagram, you’ll see that once the encoder finishes its work, it passes its output to this stage in the decoder.

To summarize, the cross-attention block works just like a normal multi-head attention block, with one big difference: instead of a single input sequence, cross-attention uses two sequences.

  • The first sequence is your English sentence (coming from the encoder).
  • The second sequence is your Hindi sentence (coming from the previous step of the decoder).

This is crucial because, for attention, you need three sets of vectors: query, key, and value.

  • The query vectors come from the decoder (based on the Hindi sentence).
  • The key and value vectors come from the encoder (based on the English sentence).
image by CampusX

Once you have the query, key, and value vectors, the rest of the process is exactly like a normal self-attention block.

Let’s look at the diagram again. Previously, we received Z1_norm, Z2_norm, Z3_norm, and Z4_norm from the masked attention block. Now, we’ll send these vectors into the cross-attention block. But that’s not all — we also feed the encoder embeddings into this block.

From the Z1_norm, Z2_norm vectors, we’ll extract the query vectors. From the encoder embeddings, we’ll extract the key and value vectors. Once you have all these vectors, the attention mechanism runs as usual. The result is a new contextual embedding vector for each token in your output sentence.
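Here is a minimal single-head NumPy sketch of cross-attention, assuming a 3-token encoder output for “We are friends” and random weight matrices; the only structural difference from self-attention is that the keys and values come from the encoder output.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, Wq, Wk, Wv):
    """Queries from the decoder, keys/values from the encoder output."""
    Q = decoder_states @ Wq                    # (4, 512): one query per Hindi token
    K = encoder_states @ Wk                    # (3, 512): one key per English token
    V = encoder_states @ Wv                    # (3, 512)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # (4, 3): Hindi-vs-English similarities
    return softmax(scores) @ V                 # Zc1..Zc4, shape (4, 512)

rng = np.random.default_rng(0)
d_model = 512
Z_norm = rng.normal(size=(4, d_model))         # Z1_norm..Z4_norm from the previous step
enc_out = rng.normal(size=(3, d_model))        # encoder embeddings for "We are friends"
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.02 for _ in range(3))
Zc = cross_attention(Z_norm, enc_out, Wq, Wk, Wv)   # shape (4, 512)
```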

For example, if your output sentence has four tokens (as we do here), you’ll get contextual embedding vectors for each token:

  • Zc1 for token 1,
  • Zc2 for token 2,
  • Zc3 for token 3,
  • Zc4 for token 4.

The “c” here stands for cross-attention, indicating that these embeddings are the output of the cross-attention block.

image by CampusX

At this point, we have reached the cross-attention block’s output, and once again, there is an add & normalize block right after it.

So, the question arises: what are we adding this time? It’s simple — we add the cross-attention output (Zc1, Zc2, Zc3, Zc4) to the output of the previous masked attention step (Z1_norm, Z2_norm, etc.).

Notice that we not only sent Z1_norm, Z2_norm to the cross-attention block, but we also passed them through a residual connection to this point, where they are added to the cross-attention output. This is still fine because all vectors are 512-dimensional, so there’s no problem with the addition operation.

After performing the addition, we get a new set of vectors, which we can call Zc1′, Zc2′, and so on.

The final step here is layer normalization, which ensures that the output vectors are normalized. After normalization, we get our new set of vectors Zc1_norm, Zc2_norm, and so on, which are ready to be passed to the next stage of the decoder.

At this point, we’ve reached the feed-forward block, which is the next component in our decoder journey. The Zc1_norm, Zc2_norm, Zc3_norm, and Zc4_norm vectors now need to be passed through a feed-forward neural network.

Let me first explain the architecture of this feed-forward network. Interestingly, the architecture is exactly the same as the one we saw in the encoder.

This feed-forward neural network consists of two layers:

  1. The first layer has 2048 neurons, and its activation function is ReLU.
  2. The second layer has 512 neurons, and its activation function is linear.

When it comes to parameters, the feed-forward neural network expects an input of 512 dimensions. The first layer has weights of shape 512×2048 and biases of size 2048. We’ll call these weights W1 and the biases b1. Similarly, the second layer has weights of shape 2048×512 and biases of size 512, which we’ll refer to as W2 and b2.

Now, since we have four vectors of 512 dimensions each, we create a batch. By combining these vectors, we create a matrix of shape 4×512. This entire batch is passed through the feed-forward network simultaneously.

The first operation that takes place is the matrix multiplication of the input matrix Z (of shape 4×512) with W1 (of shape 512×2048), followed by adding the bias b1. Mathematically, this can be represented as:

Z·W1 + b1

The resulting matrix has a shape of 4×2048. After this, we apply the ReLU activation function, which introduces non-linearity.

Next, the output is passed through the second layer, where we perform the dot product with W2 ​ and add the bias b2. The result is a matrix of shape 4×512, giving us four vectors of size 512 each, just like we had before.

So, we started with four vectors of size 512, passed them through the feed-forward network, and ended up with four vectors of the same size. The key difference is that non-linearity has been introduced in the process due to the ReLU activation function.

After this operation, we now have four vectors, which we’ll call y1, y2, y3, y4.
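A minimal NumPy sketch of this feed-forward step, with randomly initialised W1, b1, W2, b2 standing in for the learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 512, 2048

W1 = rng.normal(size=(d_model, d_ff)) * 0.02   # first layer: 512 -> 2048
b1 = np.zeros(d_ff)
W2 = rng.normal(size=(d_ff, d_model)) * 0.02   # second layer: 2048 -> 512
b2 = np.zeros(d_model)

Z = rng.normal(size=(4, d_model))              # batch of Zc1_norm..Zc4_norm

hidden = np.maximum(0, Z @ W1 + b1)            # (4, 2048): ReLU non-linearity
Y = hidden @ W2 + b2                           # (4, 512): back to the model dimension
print(Y.shape)                                 # (4, 512): y1..y4
```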

image by CampusX

At this point, we are here in the diagram, at the output of the feed-forward block. Again, there’s an add and normalize operation ahead. So, we take the output of the feed-forward network and add it to its input Zc1_norm, Zc2_norm, and so on, using a residual connection. Since all these vectors are of size 512, we can perform element-wise addition.

After performing the addition, we normalize the result using layer normalization, as we’ve done after every major operation. This gives us the final set of vectors y1_norm, y2_norm, y3_norm, y4_norm, which are still 512-dimensional vectors.

image by CampusX

With this, we have reached this point in the decoder, and these four vectors are the output of the first decoder block.

I hope this explanation makes it clear how a single decoder block functions from start to finish!

As you already know, we don’t just have one decoder block. In total, there are six decoder blocks. So, once you get the output from the first decoder block, which is y1_norm, y2_norm, y3_norm, and so on, you send this output directly to the second decoder block. In the second decoder block, the same operations will be executed as in the first block. The only difference is that the parameters will be different, but the operations will remain identical.

Again, you will apply masked multi-head attention, followed by normalization. After that, you’ll apply cross-attention and normalize again. Then, you’ll pass the result through the feed-forward layer, followed by another normalization step. After these operations, you’ll reach this point, and you’ll get another set of vectors, similar to how you got y1_norm, y2_norm, and so on from the first block. These vectors will then be passed on to the third decoder block.

This process continues until you reach the sixth decoder block. Finally, the output of the sixth decoder block will be the final output.

image by CampusX

In the diagram, I’ve shown that you’re currently in decoder one, and you’ve just received the output of the first decoder block. After that, there are five more decoder layers, or blocks, to process. As you go through each of these blocks, you will eventually get the final output, which I’ve denoted as yf1_norm, yf2_norm, yf3_norm, and so on, where “f” stands for “final.”
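Putting the pieces together, here is a compact NumPy sketch of six decoder blocks applied in sequence, each with its own randomly initialised parameters. It mirrors the sub-layer order described above (masked self-attention, cross-attention, feed-forward, each followed by add & normalize) but, for brevity, omits multi-head splitting and the learned layer-norm parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-6):
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def attention(q_in, kv_in, p, causal=False):
    Q, K, V = q_in @ p["Wq"], kv_in @ p["Wk"], kv_in @ p["Wv"]
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if causal:  # mask out future positions for masked self-attention
        scores = np.where(np.triu(np.ones_like(scores, dtype=bool), k=1), -np.inf, scores)
    return softmax(scores) @ V

def decoder_block(x, enc_out, p):
    x = layer_norm(x + attention(x, x, p["self"], causal=True))      # masked self-attention + add & norm
    x = layer_norm(x + attention(x, enc_out, p["cross"]))            # cross-attention + add & norm
    ffn = np.maximum(0, x @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]   # feed-forward (ReLU, then linear)
    return layer_norm(x + ffn)                                       # final add & norm

rng = np.random.default_rng(0)
d_model, d_ff, n_blocks = 512, 2048, 6

def init_block():
    proj = lambda: {k: rng.normal(size=(d_model, d_model)) * 0.02 for k in ("Wq", "Wk", "Wv")}
    return {"self": proj(), "cross": proj(),
            "W1": rng.normal(size=(d_model, d_ff)) * 0.02, "b1": np.zeros(d_ff),
            "W2": rng.normal(size=(d_ff, d_model)) * 0.02, "b2": np.zeros(d_model)}

blocks = [init_block() for _ in range(n_blocks)]   # six blocks, each with its own parameters

x = rng.normal(size=(4, d_model))                  # X1..X4 from the input block
enc_out = rng.normal(size=(3, d_model))            # encoder output for "We are friends"

for p in blocks:                                   # output of block i becomes input of block i+1
    x = decoder_block(x, enc_out, p)

print(x.shape)                                     # (4, 512): yf1_norm..yf4_norm
```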

3. Output Generation

To understand how this works, we need to look at the output portion, which consists of two parts: a Linear Layer and a Softmax Layer. This is similar to the output layer of a feed-forward neural network.

image by CampusX

Here’s how it works:

You have a single layer that takes a 512-dimensional input. The number of neurons in this layer is V, where V is a predefined number based on the vocabulary size of the Hindi words in your dataset. Let’s break this down:

We’re translating from English to Hindi. On one side, you have English sentences, and on the other side, you have their corresponding Hindi translations. For example, let’s take a dataset of 5000 sentence pairs. You’ll go through the Hindi side of the dataset and count all the unique words. For example, “बढ़िया” is a unique word, “हम” is a unique word, “दोस्त” is a unique word, and so on. You won’t count a word again if it has already appeared, like “हम” being repeated in another sentence.

Once you have the unique words, you form the vocabulary for the Hindi words. Suppose, in your dataset of 5000 Hindi sentences, there are 10,000 unique words. This 10,000 will be the value of V. The number of neurons in this output layer will be equal to the size of your Hindi vocabulary, i.e., 10,000 neurons.

Each neuron represents one unique Hindi word. The first neuron corresponds to the first word in your vocabulary, such as “मैं,” the second neuron corresponds to “बढ़िया,” and so on, until the 10,000th neuron.

Now, how are the weights structured?

  • Each input has 512 dimensions.
  • There are 10,000 neurons in the output layer.

Thus, you’ll have weights of size 512 x 10,000 and also 10,000 bias values (one for each neuron).

At this point, you have four vectors — one for each token. For instance:

  1. First vector for the token <START>
  2. Second vector for the token “हम”
  3. Third vector for the token “दोस्त”
  4. Fourth vector for the token “हैं”

Since the input to this layer can only be in 512 dimensions, you stack these four vectors into a matrix of size 4 x 512. You can now pass this entire matrix into the layer in batch form, enabling parallel processing for all tokens. But for simplicity, let’s focus on just one token.

Assume we send only the vector corresponding to the <START> token into this layer. The input would be of size 1 x 512. What happens next?

The vector yf1_norm is multiplied by the weights W3, and the bias is added. Here, W3 is of size 512 x 10,000. The result will be a vector of size 1 x 10,000, giving you 10,000 numbers — one for each neuron. For example:

  • Neuron 1 outputs 3
  • Neuron 2 outputs 6
  • Neuron 3 outputs some other value, and so on.

These values are not normalized yet, so the next step is to apply Softmax, which normalizes the numbers such that their sum equals 1. This forms a probability distribution. For example, after applying Softmax, you might get:

  • Neuron 1 (word: “मैं”) has a probability of 0.01
  • Neuron 2 (word: “बढ़िया”) has a probability of 0.02
  • Neuron 3 (word: “हम”) has the highest probability, say 0.25.

Since “हम” has the highest probability, it will be chosen as the output word at the position of the <START> token.

Next, you do the same for the other vectors yf2_norm, yf3_norm, and so on, multiplying them by the weights W3 and adding the bias, then applying Softmax to get the output for each token.

For example:

  • yf2_norm might correspond to the word “दोस्त” with the highest probability.

Thus, the final decoded sentence could be something like: “हम दोस्त हैं”.
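Finally, a minimal sketch of this output layer, assuming a toy vocabulary size V of 10,000 and random weights W3: a linear projection from 512 dimensions to one score per vocabulary word, followed by a softmax and an argmax over the vocabulary at each position.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size = 512, 10_000              # V = 10,000 unique Hindi words (toy assumption)

W3 = rng.normal(size=(d_model, vocab_size)) * 0.02
b3 = np.zeros(vocab_size)

yf_norm = rng.normal(size=(4, d_model))        # yf1_norm..yf4_norm from the sixth decoder block

logits = yf_norm @ W3 + b3                     # (4, 10000): one score per vocabulary word
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1

predicted_ids = probs.argmax(axis=-1)          # most probable Hindi word index per position
print(predicted_ids.shape)                     # (4,)
```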

This is how the decoder architecture works. I hope this explanation makes things clear!

Complete Workflow and internal structure of a single decoder

References

Research Paper: Attention Is All You Need (Vaswani et al., 2017)

YouTube Video: https://youtu.be/DI2_hrAulYo?si=JftfuLcPNqIKWdNA

I trust this blog has enriched your understanding of Transformer Architecture. If you found value in this content, I invite you to stay connected for more insightful posts. Your time and interest are greatly appreciated. Thank you for reading!


Published via Towards AI
