

Transformer Architecture Part 1

Last Updated on September 18, 2024 by Editorial Team

Author(s): Sachinsoni

Originally published on Towards AI.

In recent years, transformers have revolutionized the world of deep learning, powering everything from language models to vision tasks. If you’ve followed my previous blogs, you’re already familiar with some of the key components like self-attention, multi-head attention, layer normalization, and positional encoding. These building blocks form the core of how transformers excel at handling sequential data. In this blog, I’ll tie everything together and take you on a deeper dive into the complete architecture, showing how these components work in harmony to create models that outperform traditional neural networks.

Before we dive into the details of transformer architecture, I want to extend my heartfelt gratitude to my mentor, Nitish Sir. His exceptional guidance and teachings on the CampusX YouTube channel have been instrumental in shaping my understanding of this complex topic. With his support, I’ve embarked on this journey of exploration and learning, and I’m excited to share my insights with you all. Thank you, Nitish Sir, for being an inspiration and mentor!

Let’s start the journey of understanding the Transformer architecture from its core. The diagram often used to represent the Transformer can seem overwhelming at first glance. It includes both the encoder and decoder, but when you break it down, it becomes much easier to understand.

At a high level, the Transformer consists of two main components: the encoder and the decoder. Think of the Transformer as a large box containing two smaller boxes — one for the encoder and one for the decoder. This is the most simplified view of the architecture.

image by CampusX

But there’s a bit more complexity here. If you look closely at the diagram, you’ll notice that each of these boxes isn’t just a single block. The encoder and decoder are actually composed of multiple blocks — six encoder blocks and six decoder blocks, according to the original paper Attention Is All You Need. This number was achieved through experimentation, giving the best results for various tasks.

Now, here’s the key: all the encoder blocks are identical, and so are the decoder blocks. This means that once you understand the structure of one encoder block, you understand them all! So, the next logical step is to focus on understanding a single encoder block in detail, as the rest will follow from there.

Now, let’s dive into a single encoder block. When we zoom in, we see that each encoder block consists of two main components: a self-attention block and a feed-forward neural network. If you’ve read my previous blogs, you should already be familiar with self-attention and feed-forward neural networks.

So, what’s inside every encoder block? It’s simple: each block has a self-attention module and a feed-forward neural network. This is true for all six encoder blocks — they are identical, meaning once you understand one, you understand them all.

image by CampusX

But how do these blocks work together? The actual architecture of an encoder block includes additional components like add & norm layers and residual connections. These ensure the flow of information remains smooth as it passes through each block.

The input data, typically a batch of sentences, enters the first encoder block, undergoes processing, and the output moves to the next encoder block. This process continues across all six encoder blocks, with the final output being passed to the decoder. Each block processes the data similarly, making the entire architecture highly efficient and structured.

Flow of data: the output of one encoder block is the input to the next encoder block

Before diving into the main parts of the encoder, it’s crucial to understand the input block, where three essential operations are performed. These steps take place before the input is fed into the encoder.

  1. Tokenization: The first operation is tokenization. If you’re familiar with NLP, you’ll know that tokenization is the process of splitting a sentence into tokens. In this case, we are performing word-level tokenization, where each word in the sentence is broken down into individual tokens. For instance, if our sentence is “How are you?”, it gets tokenized into “How”, “are”, and “you”.
  2. Text Vectorization (Embedding): After tokenization, the words are passed through a process called text vectorization, where each word is converted into a numerical vector. This is essential because machines can’t process raw text — they need numerical representations. We use word embeddings to map each word to a vector. In our case, every word is represented as a 512-dimensional vector. For example, “How” becomes a vector of 512 numbers, “are” gets its own vector, and “you” gets another.
  3. Positional Encoding: Even though we have vectorized the words, there’s a problem: we don’t know the order of the words. Knowing the sequence of words is vital in understanding the context, as the position of each word in the sentence impacts its meaning. This is where positional encoding comes in.

Positional encoding generates a 512-dimensional vector for each word’s position in the sentence. For instance, the first word (“How”) gets a positional vector, the second word (“are”) gets another positional vector, and the third word (“you”) gets yet another. Each positional vector has the same dimensionality as the word embedding (512 dimensions).

This image illustrates how a raw input sentence is transformed into the format required by the encoder block.

Finally, we add these positional vectors to the corresponding word embedding vectors. So, the word embedding for “How” is added to its positional vector, and similarly for “are” and “you.” After this addition, we get new vectors — let’s call them X1, X2, and X3 — which represent the position-aware embeddings for each word in the sentence.
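
To make this concrete, here is a minimal sketch in PyTorch of the input block with the sizes used in the paper. The token IDs and vocabulary size below are made up purely for illustration; a real system would use a trained tokenizer and embedding table.

```python
import torch

d_model = 512                      # embedding size used in the paper
seq_len = 3                        # "How", "are", "you"

# Hypothetical token IDs for the three words (real IDs come from a tokenizer/vocabulary)
token_ids = torch.tensor([[12, 7, 95]])                 # shape: (1, 3)

# Word embeddings: each token becomes a 512-dimensional vector
embedding = torch.nn.Embedding(num_embeddings=10000, embedding_dim=d_model)
word_vectors = embedding(token_ids)                     # shape: (1, 3, 512)

# Sinusoidal positional encoding, as in "Attention Is All You Need"
pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)   # (3, 1)
i = torch.arange(0, d_model, 2, dtype=torch.float)            # even dimensions
angles = pos / torch.pow(10000, i / d_model)                  # (3, 256)
pe = torch.zeros(seq_len, d_model)
pe[:, 0::2] = torch.sin(angles)
pe[:, 1::2] = torch.cos(angles)

# X1, X2, X3: position-aware embeddings, still 512-dimensional each
x = word_vectors + pe.unsqueeze(0)                      # shape: (1, 3, 512)
print(x.shape)                                          # torch.Size([1, 3, 512])
```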

Once we have the positional encodings combined with the input embeddings, the next step is to pass these vectors through the first encoder block. In this section, we’ll focus on two key operations happening in the encoder: Multi-head Attention and Normalization.

Multi-head Attention:

At the core of the transformer architecture is the multi-head attention mechanism, which is applied to the input vectors. As a reminder, the input vectors are still of 512 dimensions each.

The input vectors are initially fed into the multi-head attention block, which is created by combining multiple self-attention mechanisms. Self-attention allows the model to understand contextual relationships between words by focusing on other words in the sentence when generating a vector for a particular word.

For instance, in a sentence like:

  • “The bank approved the loan.”
  • “He sat by the river bank.”

The word “bank” is used in different contexts in these two sentences. Initially, the embedding vectors for “bank” would be the same, but self-attention adjusts these vectors based on the surrounding words. In the first sentence, “bank” refers to a financial institution, while in the second, it refers to the side of a river. Self-attention ensures that the model can distinguish between these two meanings.
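
As a quick refresher on the mechanism from the earlier blogs: self-attention projects each word vector into queries, keys, and values, then mixes the value vectors using softmax-weighted dot products. Here is a minimal single-head sketch, with randomly initialized projections purely for illustration:

```python
import math
import torch

d_model = 512
x = torch.randn(3, d_model)        # X1, X2, X3 stacked: one row per word

# Learned projection matrices (randomly initialized here, just to show the flow)
w_q = torch.nn.Linear(d_model, d_model, bias=False)
w_k = torch.nn.Linear(d_model, d_model, bias=False)
w_v = torch.nn.Linear(d_model, d_model, bias=False)

q, k, v = w_q(x), w_k(x), w_v(x)   # each: (3, 512)

# Scaled dot-product attention: softmax(QK^T / sqrt(d)) V
scores = q @ k.T / math.sqrt(d_model)      # (3, 3) word-to-word similarity scores
weights = torch.softmax(scores, dim=-1)    # each row sums to 1
z = weights @ v                            # (3, 512) context-aware vectors
```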

Now, instead of relying on just one self-attention mechanism, multi-head attention runs multiple self-attention operations in parallel. This allows the model to focus on different aspects of the sentence simultaneously, creating a more diverse and context-aware representation of the input.

So, when the multi-head attention block processes the first word (let’s call it X1), it outputs a new vector (Z1), which is still 512 dimensions but now contextually enriched. Similarly, when the second word X2 (e.g., “are”) and third word X3 (e.g., “you”) are processed, they produce Z2 and Z3, respectively.

An important detail is that throughout this process, the dimensionality remains consistent at 512 dimensions.
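
To see that the shapes really are preserved, here is a minimal sketch using PyTorch's built-in multi-head attention module with the paper's configuration (8 heads over 512 dimensions). The three-word input is just a stand-in for X1, X2, X3; the batch size of 1 is arbitrary.

```python
import torch

d_model, num_heads = 512, 8
x = torch.randn(1, 3, d_model)     # X1, X2, X3 for one sentence: (batch, seq, dim)

mha = torch.nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads,
                                  batch_first=True)

# Self-attention: queries, keys, and values all come from the same input
z, attn_weights = mha(x, x, x)

print(z.shape)                     # torch.Size([1, 3, 512]) -> Z1, Z2, Z3
```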

Residual Connection and Addition:

Once we get the output vectors Z1, Z2, Z3 from the multi-head attention block, we move on to the next part: the add and normalize step.

At this stage, we introduce a residual connection. The idea behind a residual connection is to bypass the multi-head attention output and carry the original input vectors X1, X2, X3 forward. These input vectors are added to their corresponding multi-head attention outputs. So, for each word, we add its original embedding to its context-aware embedding:

  • Z1 + X1
  • Z2 + X2
  • Z3 + X3

The result is a new set of vectors: Z1’, Z2’, Z3’, each of which is still 512 dimensions but now contains both the original input information and the context from multi-head attention.

Layer Normalization:

For each vector, like Z1’, which contains 512 numbers, we calculate the mean and standard deviation of those numbers. Using these two statistics, we normalize all the 512 values, bringing them into a standardized range. This process is repeated for the other vectors, Z2’ and Z3’, ensuring that all vectors are consistently normalized. Additionally, gamma (γ) and beta (β) parameters are applied during this normalization process, but I’ve covered that in detail in the layer normalization blog.

The result of this operation is a set of normalized vectors:

  • Z1_norm, Z2_norm, and Z3_norm.

Each of these vectors remains 512 dimensions, but the values are now contained within a smaller, well-defined range.

image by CampusX
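
A minimal sketch of this add & norm step is shown below. The x and z tensors are random stand-ins for the position-aware embeddings and the attention outputs, and nn.LayerNorm carries the learnable gamma and beta mentioned above.

```python
import torch

d_model = 512
x = torch.randn(1, 3, d_model)     # X1, X2, X3 (position-aware embeddings)
z = torch.randn(1, 3, d_model)     # Z1, Z2, Z3 (multi-head attention output)

layer_norm = torch.nn.LayerNorm(d_model)   # holds gamma (weight) and beta (bias)

z_residual = z + x                          # residual connection: Z1+X1, Z2+X2, Z3+X3
z_norm = layer_norm(z_residual)             # normalize each 512-dimensional vector

print(z_norm.shape)                         # torch.Size([1, 3, 512])
```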

Why Normalize?

The key question is, why do we need to normalize these vectors?

The answer is straightforward: stabilizing the training process. Without normalization, the output of multi-head attention, such as Z1, Z2, Z3, could exist in any range, as there’s no limit on the values produced by the self-attention mechanism. For instance, since self-attention involves multiplying numbers and performing various mathematical operations, the resulting values can vary widely. This unpredictability can destabilize training because neural networks perform best when the numbers they work with are in a small, consistent range.

By normalizing the vectors, we ensure that they remain in a stable range, which helps with the overall training process. When we add the original input vectors (X1, X2, X3) to the attention outputs (Z1, Z2, Z3), the numbers could become even larger. Hence, layer normalization is crucial to bring them back into a manageable range.

The Role of the Residual Connection:

Another question you might have is: why do we use this residual connection (or skip connection) to add the original inputs back after the multi-head attention block?

The purpose of this addition is to enable a residual connection, which helps in gradient flow during training and allows the model to learn more effectively without vanishing gradients.

Feed Forward Network:

After layer normalization, the normalized vectors, Z1_norm, Z2_norm, and Z3_norm, are passed through a Feed Forward Neural Network (FFNN). Let’s break down its architecture as described in the research paper:

  • The input layer is not counted as part of the neural network, but it receives the 512-dimensional input vectors.
  • The feed-forward network consists of two layers:
  1. First layer with 2048 neurons and a ReLU activation function.
  2. Second layer with 512 neurons and a linear activation function.

Weights and Biases in the FFNN:

  • The weights between the input and the first layer form a 512 × 2048 matrix, represented as W1.
  • Each of the 2048 neurons in the first layer has its own bias, represented collectively as B1.
  • The weights between the first and second layer form a 2048 × 512 matrix, represented as W2.
  • Each of the 512 neurons in the second layer has its own bias, represented collectively as B2.

Processing the Input:

The input vectors Z1_norm, Z2_norm, and Z3_norm can be imagined as stacked together to form a 3 × 512 matrix, where each row corresponds to one vector. This matrix is then fed into the FFNN.

  1. The input matrix is multiplied by the weights W1 and the bias B1 is added.
  2. A ReLU activation is applied to introduce non-linearity.
  3. The output is a 3 × 2048 matrix, representing the expanded dimensionality.
  4. This matrix is multiplied by the weights W2 and bias B2 is added, resulting in a 3 × 512 matrix.

Essentially, the dimensionality of the input vectors is first increased from 512 to 2048, and then reduced back to 512.

image by CampusX
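
Here is a minimal sketch of this feed-forward network with the paper's sizes (512 → 2048 → 512); the 3 × 512 input stands in for Z1_norm, Z2_norm, and Z3_norm stacked row-wise:

```python
import torch

d_model, d_ff = 512, 2048

ffn = torch.nn.Sequential(
    torch.nn.Linear(d_model, d_ff),    # W1 (512 x 2048) and bias B1
    torch.nn.ReLU(),                   # non-linearity
    torch.nn.Linear(d_ff, d_model),    # W2 (2048 x 512) and bias B2
)

z_norm = torch.randn(3, d_model)       # Z1_norm, Z2_norm, Z3_norm stacked: (3, 512)
y = ffn(z_norm)                        # Y1, Y2, Y3: back to (3, 512)
print(y.shape)                         # torch.Size([3, 512])
```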

Why Increase and Then Reduce Dimensionality?

You might wonder, what’s the benefit of first increasing the dimensionality and then reducing it again? The key benefit comes from the ReLU activation in the first layer, which introduces non-linearities into the model. This allows the FFNN to learn more complex patterns than it could with a simple linear transformation.

Final Output of the FFNN:

The final result is a set of three vectors, each with 512 dimensions, similar to the input. Let’s call these vectors Y1, Y2, and Y3.

Add & Normalize:

After the feed-forward network processes the input, we obtain three vectors, Y1, Y2, and Y3, each with a dimensionality of 512. These correspond to the output of the feed-forward network.

Now, we perform an add operation. The original input vectors Z1_norm, Z2_norm, and Z3_norm are bypassed and added to the output vectors Y1, Y2, and Y3, respectively. This results in a new set of vectors, which we’ll call Y1', Y2', and Y3'. All these vectors are still 512-dimensional.

The purpose of this addition is to enable a residual connection, which helps in gradient flow during training and allows the model to learn more effectively without vanishing gradients.

Layer Normalization:

After the addition, layer normalization is applied again, just like we did earlier in the transformer block. Each of the vectors Y1', Y2', and Y3' undergoes normalization to ensure that the values are scaled properly, stabilizing the learning process. The resulting vectors are Y1_norm, Y2_norm, and Y3_norm.

image by CampusX

Next Encoder Block:

These normalized vectors Y1_norm, Y2_norm, and Y3_norm are then passed as inputs to the next encoder block. This is similar to how the original input vectors X1, X2, and X3 were fed into the first encoder block.

In the next encoder block, the same operations will occur:

  1. Multi-head attention will be applied.
  2. Add & normalize will follow.
  3. The output will then be processed through another feed-forward network.
  4. Again, we’ll have an add & normalize step before passing the vectors to the next encoder block.

This process is repeated across a total of six encoder blocks, after which the output is passed to the decoder portion of the transformer. We’ll cover the decoder architecture in upcoming blogs. I hope you now have a clear understanding of the transformer’s encoder architecture. Below is a brief view of the internal structure of an encoder:

The internal structure of an encoder of a Transformer
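
To tie the pieces together, here is a simplified sketch of one encoder block and a stack of six, mirroring the flow described above (multi-head attention → add & norm → feed-forward network → add & norm). It is only an illustration, not a production implementation: it omits dropout, padding masks, and the tokenization and positional-encoding front end.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        z, _ = self.mha(x, x, x)          # multi-head self-attention
        x = self.norm1(x + z)             # add & norm (residual connection)
        y = self.ffn(x)                   # feed-forward network
        return self.norm2(x + y)          # add & norm again

# Six blocks with identical architecture, each holding its own parameters
encoder = nn.ModuleList([EncoderBlock() for _ in range(6)])

out = torch.randn(1, 3, 512)              # position-aware embeddings X1, X2, X3
for block in encoder:
    out = block(out)                      # output of one block feeds the next
print(out.shape)                          # torch.Size([1, 3, 512])
```

Notice that each of the six blocks is constructed separately, so each has its own weights and biases, which is exactly the point made in the note below.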

Important Note:

1. Unique Parameters in Each Encoder Block:

One key point to remember is that while the architecture of each encoder block remains the same, the parameters (such as the weights and biases in the attention and feed-forward layers) are unique to each encoder block. Each encoder block has its own set of learned parameters that are adjusted independently during backpropagation.

2. Why Use Feed-Forward Neural Networks (FFNs)?

When you look at the workings of multi-head attention, you’ll notice that all operations — such as computing the dot products between queries, keys, and values — are linear. This is great for capturing contextual embeddings, but sometimes the data may have non-linear complexities that can’t be fully captured by linear transformations alone.

This is where the feed-forward neural network comes into play. By using an activation function like ReLU, the FFN introduces non-linearity, allowing the model to better handle more complex data patterns.

Even though this is the general understanding, it is important to note that the exact role of FFNs in transformers remains a bit of a gray area. As of now, research is still ongoing, and new insights are emerging. One interesting paper I came across suggests that feed-forward layers in transformers act as key-value memory storage. This paper highlights how FFNs might play a more important role than we currently understand.

References:

Research Paper: Attention Is All You Need

YouTube Video: https://youtu.be/Vs87qcdm8l0?si=aO-EAnqjwytHm14h

I trust this blog has enriched your understanding of Transformer encoder architecture. If you found value in this content, I invite you to stay connected for more insightful posts. Your time and interest are greatly appreciated. Thank you for reading!


Published via Towards AI
