
Revolutionizing AI with DeepSeekMoE: Fine-grained Expert and Shared Expert isolation 🧞‍♂️

Author(s): JAIGANESAN

Originally published on Towards AI.


Image by Imaginium from Pixabay

In this article, we’re going to dive into the world of DeepSeek’s MoE architecture and explore how it differs from Mistral MoE. We’ll also discuss the problem it addresses in the typical MoE architecture and how it solves that problem.

If you already have a solid understanding of LLMs and MoE, feel free to skip the recommendations below and continue reading this article 😊.

But if you're new to this topic, I highly recommend checking out my previous articles on Large Language Models (LLMs) and Mixture of Experts (MoE). I've written a series of articles to help you understand these complex concepts.

If you're not familiar with LLMs and MoE, start with my first article, Large Language Models: In and Out, where I explain the basic architecture of LLMs and how they work. Then, move on to Breaking Down Mistral 7B, which breaks down the Mistral architecture and its components. Finally, read Mixture of Experts and Mistral's Sparse Mixture of Experts, which delves into the world of MoE and Sparse MoE. Together, these give a visual walkthrough of the LLM and Mistral architectures, from embedding to prediction.

Large Language Model (LLM) 🤖: In and Out

Delving into the Architecture of LLM: Unraveling the Mechanics Behind Large Language Models like GPT, LLAMA, etc.

pub.towardsai.net

Breaking down Mistral 7B ⚡🍨

Exploring Mistral’s Rotary positional Embedding, Sliding Window Attention, KV Cache with rolling buffer, and…

pub.towardsai.net

The architecture of Mistral’s Sparse Mixture of Experts (S〽️⭕E)

Exploring Feed Forward Networks, Gating Mechanism, Mixture of Experts (MoE), and Sparse Mixture of Experts (SMoE).

pub.towardsai.net

In this article, we’ll be exploring the following topics in-depth:

⚡ What problems does DeepSeek's MoE address, and what solutions does it offer?

⚡ How does DeepSeek's expert architecture differ from Mistral's expert architecture?

⚡ Fine-grained expert architecture

⚡ Shared expert isolation architecture

Let’s dive in and get started!

1. What problems does DeepSeek's MoE address, and what solutions does it offer? 🤠

Despite the promising results of the existing Mixture of Experts (MoE) architecture, two major limitations remained, which DeepSeek researchers set out to address: knowledge hybridity and knowledge redundancy.

New solutions bring new kinds of problems to solve.

So, what is knowledge hybridity in MoE? In simple terms, it’s the integration and blending of different forms, sources, and types of knowledge. This means combining insights from various fields or domains to solve common problems.

The problem with knowledge hybridity in MoE is that existing architectures often have a limited number of experts (for example, 8, 12, or 16; Mistral has only 8). As a result, the tokens assigned to a specific expert are likely to cover diverse knowledge areas. Each designated expert therefore has to assemble vastly different types of knowledge in its parameters, which can be hard to utilize simultaneously. In other words, a single expert has to handle many kinds of background knowledge at once, which is difficult.

The root of the issue lies in the training data itself, which often contains a mix of knowledge from different backgrounds. This forces each expert to cover multiple areas at once, which can be inefficient and sometimes even inadequate. For example, solving a single problem might require knowledge from several backgrounds, but with only a limited number of activated experts, the model may not be able to produce good predictions or solve the problem.

Another issue with the existing Mixture of Experts (MoE) systems is knowledge redundancy. This occurs when multiple experts learn the same things and store them in their parameters.

For instance, tokens assigned to different experts may require a common piece of knowledge. As a result, these experts may end up learning the same knowledge and storing it in their parameters. The same information is duplicated across multiple experts, which wastes parameters and is inefficient.

To solve the issues of knowledge hybridity and redundancy, DeepSeek proposes two innovative solutions: fine-grained experts and shared expert isolation. But before we dive into these methods, we should understand the changes DeepSeek researchers made to the expert (feed-forward network) architecture, how it differs from the typical expert architecture, and how it lays the groundwork for these new solutions.

2. How does DeepSeek's expert architecture differ from Mistral's expert architecture? 🔎

DeepSeek didn't use any magic to solve the problems of knowledge hybridity and redundancy. Instead, they simply changed their perspective on the expert architecture. To understand how, let's take a closer look at the Mistral expert architecture.

Note: To illustrate fine-grained experts and shared expert isolation, I compare them with the Mistral MoE architecture.

import torch
import torch.nn as nn
from dataclasses import dataclass

@dataclass
class ModelArgs:  # minimal stand-in for Mistral's ModelArgs, so the snippet runs standalone
    dim: int = 4096          # model (embedding) dimension
    hidden_dim: int = 14336  # FFN intermediate (hidden) dimension

class FeedForward(nn.Module):
    def __init__(self, args: ModelArgs):
        super().__init__()
        self.w1 = nn.Linear(args.dim, args.hidden_dim, bias=False)  # Swish-activated branch
        self.w2 = nn.Linear(args.hidden_dim, args.dim, bias=False)  # output projection
        self.w3 = nn.Linear(args.dim, args.hidden_dim, bias=False)  # linear gating branch

    def forward(self, x) -> torch.Tensor:
        # SwiGLU: Swish-activated w1(x) multiplied element-wise with gate w3(x); w2 projects back
        return self.w2(nn.functional.silu(self.w1(x)) * self.w3(x))

# SwiGLU = nn.functional.silu(self.w1(x)) * self.w3(x)
# self.w3(x) acts as the gating mechanism
# Swish activation (beta=1) = nn.functional.silu(self.w1(x))
# This introduces element-wise non-linearity while preserving the high magnitude of the vector.
Image 1: Mistral’s FFN or Expert Input, hidden layer, and output dimension. Created by author

The expert in Mistral is a SwiGLU FFN, with a hidden layer size of 14,336. If we break down the architecture, as shown in Image 1 and the code snippet above, we can calculate the number of parameters in each expert.

Expert parameter count = (2 input-to-hidden weight matrices) + (1 hidden-to-output weight matrix)

= 2 × (14336 × 4096) + (4096 × 14336)
= 117,440,512 + 58,720,256 = 176,160,768 ≈ 17.6 crore (about 176 million) parameters.

Parameters in one decoder's MoE layer = number of experts × parameters per expert = 8 × 176,160,768 = 1,409,286,144 ≈ 1.4 billion parameters.
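As a quick sanity check, here is a small snippet (reusing the FeedForward module and the ModelArgs stand-in from the code above) that counts these parameters directly:

args = ModelArgs(dim=4096, hidden_dim=14336)
expert = FeedForward(args)

per_expert = sum(p.numel() for p in expert.parameters())
print(f"{per_expert:,}")      # 176,160,768 parameters in one expert
print(f"{8 * per_expert:,}")  # 1,409,286,144 parameters across the 8 experts of one MoE layer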

Here's the interesting part: what if we split each expert into two, keeping the total number of parameters the same? Each new expert would then have around 8.8 crore (88 million) parameters. To do this, we simply halve the hidden layer size, creating two experts of equal size.

Here is the parameter count after splitting an expert:

Fine-grained expert 😉 parameter count = (2 input-to-hidden weight matrices) + (1 hidden-to-output weight matrix)

= 2 × (7168 × 4096) + (4096 × 7168)

= 88,080,384 ≈ 8.8 crore (about 88 million) parameters.

Image 2: Expert after splitting the Intermediate dimension of FFN. Created by author

In the existing MoE, each expert's hidden size is 14,336; after the split, each expert's hidden size is 7,168. DeepSeekMoE calls these new experts fine-grained experts. By splitting the existing experts, they've changed the game. But how does this solve the problems of knowledge hybridity and redundancy? We'll explore that next.

3. Fine-Grained Expert Segmentation 🦸‍♂️🦸‍♂️🦸‍♂️🦸‍♂️…🦸‍♂️

As shown in the illustration, researchers have divided an expert into multiple, finer-grained experts without changing the number of parameters. This is done by splitting the intermediate hidden dimension of the feed-forward network (FFN).

The beauty of this approach is that it doesn't increase the computational load but allows more experts to be activated. This, in turn, enables a more flexible and adaptable combination of activated experts. As a result, diverse knowledge can be decomposed more precisely across different experts, while each expert retains a higher level of specialization. Combining more activated experts gives more flexibility and more accurate responses.

For example, some tokens play important roles across different knowledge backgrounds. With fine-grained segmentation, multiple experts can each specialize in their own aspect of such a token. Otherwise, a limited number of experts would have to cover all the knowledge about tokens that span different backgrounds.

The tokens "if", "while", and "function" appear in code, reasoning, common knowledge, and even mathematics, because code, reasoning, and mathematics are closely connected.

Image 3: Fine-Grained Expert Segmentation. Created by author

As shown in Image 3, the Mistral architecture uses 8 (N) experts, whereas this new approach uses 16 (2N) experts, doubling the expert count. However, the total number of parameters remains the same, as the quick check below confirms.
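Here is a minimal sketch of that bookkeeping, using the dimensions from our running example (it also confirms the earlier claim that the computational load doesn't increase, given the top-4-of-16 routing discussed next):

def params(hidden, dim=4096):
    # one SwiGLU expert holds three weight matrices: w1, w3 (dim -> hidden) and w2 (hidden -> dim)
    return 3 * dim * hidden

print(8 * params(14336), 16 * params(7168))  # 1409286144 1409286144: total parameters unchanged
print(2 * params(14336), 4 * params(7168))   # 352321536 352321536: activated parameters per token unchanged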

Image 4: Fine-Grained Expert Segmentation Equation. Source: DeepSeekMoE Research Paper[1]

Let's take a closer look at the mathematical representation of fine-grained expert segmentation, as shown in Image 4. Here, u_t represents the input tensor: for example, with 9 input tokens and a model dimension of 4096, the input tensor u_t has shape (9, 4096).

The variable m plays a crucial role in this equation. It determines how many fine-grained experts each original expert is split into. In other words, mN is the total number of fine-grained experts, while mK is the number of top experts selected for each token.

The token-to-expert affinity is denoted by s_i,t, and the gate value g_i,t is sparse: only mK out of mN values are non-zero. Finally, h_t represents the output hidden state.
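Since the equation image may not be legible here, this is my LaTeX transcription of the fine-grained expert segmentation equations from the paper [1] (my own rendering of the notation described above, where e_i is the learnable centroid of the i-th expert):

h_t = \sum_{i=1}^{mN} g_{i,t} \, \mathrm{FFN}_i(u_t) + u_t

g_{i,t} = \begin{cases} s_{i,t}, & s_{i,t} \in \mathrm{Topk}(\{ s_{j,t} \mid 1 \le j \le mN \}, \, mK) \\ 0, & \text{otherwise} \end{cases}

s_{i,t} = \mathrm{Softmax}_i\left( u_t^{\top} e_i \right)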

In the Mistral architecture, the top 2 experts are selected for each token, whereas in this new approach, the top 4 experts are chosen. This difference is significant: existing architectures can only draw on a token's knowledge through the top 2 experts, limiting their ability to solve a particular problem or generate a sequence; otherwise, the two selected experts have to cover everything about the token, which can cost accuracy. In contrast, with more fine-grained experts, this new approach enables more accurate and targeted knowledge acquisition.

In existing Mixture of Experts (MoE) architectures, each token is routed to the top 2 of 8 experts. This means there are only 28 possible combinations (8 choose 2) of experts that a token can be routed to.

In contrast, fine-grained MoE architectures have a significant advantage when it comes to combination flexibility. With 16 experts and each token routed to 4 of them, there are 1820 possible combinations (16 choose 4). This increased flexibility leads to more accurate results, as the model can explore a far wider range of expert combinations to find the best fit for each token.
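You can verify these counts with Python's standard library:

import math
print(math.comb(8, 2))   # 28: possible top-2 selections from 8 experts
print(math.comb(16, 4))  # 1820: possible top-4 selections from 16 fine-grained experts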

This combination flexibility is a key benefit of fine-grained MoE architectures, allowing them to give better results than existing MoE models.

4. Shared Expert Isolation 🦈

The Shared Expert Isolation approach activates a fixed set of fine-grained experts for all tokens. Every token passes through these shared experts, which are designed to capture and consolidate common knowledge across various concepts.

For example, when the training data covers a wide range of concepts and knowledge backgrounds, such as history, politics, mathematics, coding, reasoning, literature, and more, a common thread runs through them all: they are all written in English. The shared expert learns to write good content with proper grammar and flow in English, enabling it to generate a coherent sequence of content.

Meanwhile, other experts are activated based on the token, contributing their specialized knowledge in areas like math, reasoning, or coding. The combination of the shared expert and these fine-grained experts ultimately produces a well-structured sequence.

Image 5: Fine-grained expert segmentation + shared expert Isolation

By compressing common knowledge into shared experts, the redundancy among the other experts is significantly reduced. Previously, each expert had to learn how to construct English words into a sequence, so the same knowledge was duplicated across their parameters.

Now, this task is handled by the shared expert, freeing up the other experts to focus on their specific areas of specialization. As a result, fine-grained experts can specialize more intensely in their respective areas.
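To make the mechanism concrete, here is a minimal, illustrative sketch of a DeepSeekMoE-style layer (not the official implementation; the class name and the per-token loop are my own, FeedForward and ModelArgs come from the earlier snippet, and the counts follow our running example of 16 fine-grained experts with 4 activated, 2 of them shared):

import torch
import torch.nn as nn

class DeepSeekMoELayer(nn.Module):
    """Sketch only: shared experts plus top-k routed fine-grained experts."""
    def __init__(self, dim=4096, hidden_dim=7168, n_experts=16, n_shared=2, top_k=4):
        super().__init__()
        args = ModelArgs(dim, hidden_dim)
        self.shared = nn.ModuleList(FeedForward(args) for _ in range(n_shared))
        self.routed = nn.ModuleList(FeedForward(args) for _ in range(n_experts - n_shared))
        self.gate = nn.Linear(dim, n_experts - n_shared, bias=False)
        self.k = top_k - n_shared  # shared experts count toward the activated budget

    def forward(self, u: torch.Tensor) -> torch.Tensor:  # u: (tokens, dim)
        h = u.clone()                      # residual connection
        for expert in self.shared:         # every token passes through every shared expert
            h = h + expert(u)
        s = self.gate(u).softmax(dim=-1)   # token-to-expert affinity s_{i,t}
        g, idx = s.topk(self.k, dim=-1)    # sparse gate: only top-k values are non-zero
        for t in range(u.size(0)):         # naive per-token loop, kept for readability
            for w, i in zip(g[t], idx[t].tolist()):
                h[t] = h[t] + w * self.routed[i](u[t])
        return h

With these defaults, each token activates 2 shared plus 2 routed fine-grained experts, so the activated parameter count stays at roughly the Mistral top-2-of-8 level; a real implementation would batch tokens by expert rather than looping.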

Image 6: Shared Expert Isolation Equations. Source: DeepSeekMoE Research paper [1]

Let’s compare the mathematical representations of Fine-Grained MoE (Image 4) and Shared Expert Isolation (Image 6).

One key difference between the two is the introduction of K_s, which represents the number of shared experts in Image 6. This is in contrast to Image 4, which doesn’t have shared experts.

Another important difference is the token-to-expert affinity, denoted by s_i,t. In Image 4, this affinity is computed over all mN fine-grained experts, with the top mK selected. In Image 6, however, routing happens only over the non-shared experts: the top (mK − K_s) experts are chosen from the remaining (mN − K_s) routed experts, since the K_s shared experts are always active. This means the way tokens are assigned to experts changes depending on the number of shared experts.
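For reference, here is my LaTeX transcription of the shared expert isolation equations in Image 6 (again, my own rendering of the paper's notation [1]):

h_t = \sum_{i=1}^{K_s} \mathrm{FFN}_i(u_t) + \sum_{i=K_s+1}^{mN} g_{i,t} \, \mathrm{FFN}_i(u_t) + u_t

g_{i,t} = \begin{cases} s_{i,t}, & s_{i,t} \in \mathrm{Topk}(\{ s_{j,t} \mid K_s + 1 \le j \le mN \}, \, mK - K_s) \\ 0, & \text{otherwise} \end{cases}

The first sum runs over the K_s always-active shared experts; routing selects only mK − K_s of the remaining fine-grained experts.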

These architectural innovations in DeepSeekMoE create opportunities to train a highly parameter-efficient MoE language model, where each expert is highly specialized and can contribute its unique expertise to generate accurate and informative responses.

In conclusion, we've seen the evolution of the typical feed-forward network over time in this series of articles. Starting from the plain feed-forward network, it transformed into a Mixture of Experts, then a sparse MoE, followed by fine-grained MoE, and finally shared expert isolation. Each new approach has paved the way for other innovative solutions to tackle real-world problems in AI.

On a philosophical note 📝, I'd like to touch upon human nature 🤷‍♂️🤷‍♀️. Our inherent desire for more 🏃🏽, our wanting-more attitude, drives innovation. Humans have an innate tendency to identify problems and strive to solve them, much like what's happening in the AI world. As time progresses, we can expect researchers to uncover more problems and develop solutions to address them. This relentless pursuit of improvement is what propels us forward, and it's exciting to think about what the future holds for AI.

Thanks for reading this article 🤩. If you found my article useful 👍, give it a clap 👏😉! Feel free to follow for more insights.

Let's keep the conversation going! Feel free to connect with me on LinkedIn (www.linkedin.com/in/jaiganesan-n/) 🌏❤️ and join me on this exciting journey of exploring AI innovations and their potential to shape our world.

References:

[1] Damai Dai, Chengqi Deng, Chenggang Zhao, R.X. Xu, Huazuo Gao, et al., DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (2024), research paper (arXiv).

[2] DeepSeek-AI: Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, et al., DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2024), research paper (arXiv).

[3] Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, et al., Mixtral of Experts (2024), research paper (arXiv).

[4] William Fedus, Jeff Dean, Barret Zoph, A Review of Sparse Expert Models in Deep Learning (2022), research paper (arXiv).

[5] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, et al., Attention Is All You Need (2017), research paper (arXiv).
