Enhancing Multi-Layer Perceptron Performance: Demystifying Optimizers

Author(s): Anand Raj

Originally published on Towards AI.

Introduction

Optimizers are algorithms or methods used to adjust the attributes of a model, such as its weights and learning rate, in order to minimize the error or loss function during the process of training a machine learning model. The main objective of an optimizer is to find the optimal set of parameters that result in the best performance of the model on the given dataset.

Optimizers in 3D. Image source: TDS

Gradient Descent

Gradient Descent (GD) is a first-order optimization algorithm used to minimize the cost function in machine learning and optimization problems. It iteratively updates the parameters of a model in the direction of the steepest descent of the cost function with respect to those parameters. The algorithm works by calculating the gradient of the cost function at a particular point and then updating the parameters in the opposite direction of the gradient. GD can be computationally expensive, especially for large datasets, as it requires storing and processing the entire dataset in memory for each iteration. Hence, Stochastic Gradient Descent (SGD) was introduced. SGD processes one sample or mini-batch at a time. SGD tends to converge faster and is more computationally efficient, especially for large datasets.

Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is an optimization algorithm used to minimize a cost function in machine learning. Unlike traditional Gradient Descent (GD), which computes gradients using the entire dataset, SGD updates model parameters using only one training example (or a small subset, called a mini-batch) at a time. This makes SGD more computationally efficient and suitable for large datasets.

Mathematical Formulation for SGD.

SGD introduces more noise in parameter updates compared to GD, which can result in more erratic convergence behavior and slower convergence in some cases. Hence, we introduce the concept of momentum.

Code implementation of Optimizers for SGD.
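
The snippet referenced above is not reproduced here; as a stand-in, here is a minimal sketch using the tf.keras optimizer API (the Keras reference listed at the end of the article). The learning rate is an illustrative value, not a tuned recommendation.

import tensorflow as tf

# Plain stochastic gradient descent: one update per mini-batch,
# theta <- theta - learning_rate * gradient, with no momentum or adaptivity.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# Typical usage: model.compile(optimizer=optimizer, loss="mse")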

Momentum + SGD

Momentum is an optimization algorithm used to accelerate Gradient Descent (GD) and its variants, such as Stochastic Gradient Descent (SGD), by introducing a momentum term that smooths out the update process and helps overcome local minima. The momentum algorithm maintains a moving average of the gradients and updates the parameters in a direction that aligns with the accumulated gradients.

Mathematical Formulation for Momentum + SGD.

Momentum is a powerful optimization algorithm that accelerates convergence and helps overcome local minima by introducing momentum into the parameter updates. However, it requires careful tuning of the momentum parameter and may exhibit inertia or increased memory usage in some cases.

Image source: PapersWithCode
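
As a hedged illustration (not the article's own snippet), classical momentum is exposed in Keras through the momentum argument of the SGD optimizer; the values shown are illustrative.

import tensorflow as tf

# Classical momentum: a decaying moving average of past gradients
# (the velocity) is folded into every update to smooth the trajectory.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)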

Nesterov Accelerated Gradient

Nesterov Accelerated Gradient (NAG) is an optimization algorithm that improves upon the standard Momentum method by taking into account the future gradient. It allows the algorithm to “look ahead” before making a step in the parameter space, resulting in faster convergence and better performance, especially in the presence of noisy gradients.

Mathematical Formulation for Nesterov Accelerated Gradient (NAG)

NAG offers faster convergence, improved accuracy, and less oscillation compared to standard Momentum, but it requires careful tuning of hyperparameters and involves additional computational complexity.

Image Source: TDS
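
Again as a sketch rather than the original code, Keras enables Nesterov momentum with the nesterov flag on the same SGD optimizer; hyperparameters are illustrative.

import tensorflow as tf

# Nesterov momentum: the gradient is evaluated at the look-ahead
# position, enabled via the nesterov flag on the SGD optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)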

Adagrad

Adagrad, short for Adaptive Gradient Algorithm, is an optimization algorithm designed to adjust the learning rate for each parameter adaptively based on its historical gradients. It is particularly useful in settings where different parameters have vastly different scales or where the gradients of some parameters are sparse.

Mathematical Formulation for Adagrad.

As Adagrad accumulates the squared gradients in the denominator, the learning rates for each parameter decrease monotonically over time. This can lead to excessively small learning rates for parameters associated with frequently occurring features, resulting in slow or premature convergence. Adagrad can also be inefficient on non-convex problems and has relatively high memory requirements.

Code implementation of Optimizers for Adagrad.
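
A minimal Keras sketch, not the linked implementation; the hyperparameter values shown are the library-style defaults and purely illustrative.

import tensorflow as tf

# Adagrad divides the step size by the square root of the accumulated
# squared gradients, so frequently updated parameters take smaller steps.
optimizer = tf.keras.optimizers.Adagrad(
    learning_rate=0.01, initial_accumulator_value=0.1, epsilon=1e-7
)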

RMSprop

RMSprop, short for Root Mean Square Propagation, is an optimization algorithm commonly used for training deep neural networks. It addresses some of the limitations of the Adagrad algorithm by adapting the learning rates dynamically based on the magnitude of recent gradients.

Mathematical Formulation for RMSProp.

It offers advantages such as adaptive learning rates, efficient handling of sparse gradients, and improved convergence speed but may suffer from hyperparameter sensitivity and increased memory requirements.

Code implementation of Optimizers for RMSProp.
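
A minimal sketch with the tf.keras API; the rho and epsilon values are illustrative defaults, not recommendations from the article.

import tensorflow as tf

# RMSprop replaces Adagrad's ever-growing sum with an exponentially
# decaying average of squared gradients, controlled by rho.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-7)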

Adadelta

Adadelta is an adaptive learning rate optimization algorithm that aims to address some of the limitations of other adaptive algorithms like Adagrad and RMSprop. It dynamically adapts the learning rates based on the magnitude of recent gradients and accumulated gradients over time.

Mathematical Formulation for Adadelta.

It offers advantages such as no need for a learning rate hyperparameter, memory efficiency, and effective handling of sparse gradients but may be sensitive to initialization parameters and involve computational overhead.

Code implementation of Optimizers for Adadelta.
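
A minimal sketch; the learning_rate=1.0 setting follows the convention of the original Adadelta formulation and is illustrative only.

import tensorflow as tf

# Adadelta rescales each update by a running average of past updates,
# so its original formulation needs no hand-tuned learning rate;
# Keras still exposes learning_rate as an overall multiplier.
optimizer = tf.keras.optimizers.Adadelta(learning_rate=1.0, rho=0.95, epsilon=1e-7)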

Adafactor

Adafactor is an adaptive learning rate optimization algorithm that is particularly suitable for training deep learning models with large sparse datasets. It adapts the learning rate based on the statistics of the gradients and the parameter updates.

Mathematical Formulation for Adafactor.

Its advantages include adaptivity, memory efficiency, and suitability for handling large sparse datasets, but it may require hyperparameter tuning and involve additional computational complexity. While Adafactor shares some similarities with Adagrad in terms of adaptivity and gradient scaling, it introduces several modifications and improvements to enhance its performance, particularly for large-scale optimization problems with sparse datasets.

Code implementation of Optimizers for Adafactor.
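
A minimal sketch, assuming a recent TensorFlow/Keras release that ships tf.keras.optimizers.Adafactor; the learning rate is illustrative.

import tensorflow as tf

# Adafactor stores factored (row and column) second-moment statistics
# instead of a full per-parameter accumulator, saving memory on large
# weight matrices.
optimizer = tf.keras.optimizers.Adafactor(learning_rate=0.001)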

Follow-the-Regularized-Leader

FTRL, which stands for Follow-the-Regularized-Leader, is an optimization algorithm commonly used in machine learning for training linear models, particularly in settings where the data is sparse or high-dimensional. FTRL optimizes the regularized loss function by maintaining an adaptive learning rate for each feature.

Mathematical Formulation for FTRL.

FTRL (Follow The Regularized Leader) is a powerful online learning algorithm widely used in large-scale machine learning applications, particularly in recommendation systems and online advertising. Its advantages include sparse updates for memory efficiency, built-in L1 and L2 regularization for preventing overfitting, adaptive learning rates based on feature frequency and magnitude, and robustness to noisy and sparse data. However, FTRL comes with certain disadvantages. It involves complex computations compared to simpler algorithms like SGD, requiring careful parameter tuning and potentially leading to higher computational overhead. Understanding and interpreting FTRL can be challenging, making it less intuitive compared to traditional optimization methods. Additionally, FTRL may not be suitable for all optimization problems, particularly those that are non-convex or involve highly non-linear relationships between features and targets.

Code implementation of Optimizers for FTRL.
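
A minimal sketch of the Keras Ftrl optimizer; the regularization strengths are placeholders that only show where the L1/L2 arguments go, not tuned values.

import tensorflow as tf

# FTRL-Proximal with per-coordinate learning rates and built-in
# L1/L2 regularization, aimed at sparse, high-dimensional linear models.
optimizer = tf.keras.optimizers.Ftrl(
    learning_rate=0.001,
    l1_regularization_strength=0.01,
    l2_regularization_strength=0.01,
)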

Adam

The Adam optimizer, short for Adaptive Moment Estimation, is an adaptive learning rate optimization algorithm that computes individual adaptive learning rates for different parameters. It combines the advantages of two other popular optimization algorithms, AdaGrad and RMSProp, by incorporating both first and second moment estimates to adaptively adjust the learning rates.

The Adam optimizer offers several advantages, including adaptive learning rates that adjust for each parameter individually, leading to faster convergence rates and robust performance across various optimization tasks. Its ability to handle noisy or sparse gradients and non-stationary objectives makes it suitable for complex optimization problems. However, these advantages come with some trade-offs. Adam requires additional memory to store separate adaptive learning rates for each parameter, which can be a limitation for memory-constrained environments. Moreover, its performance is sensitive to the choice of hyperparameters, such as learning rate and momentum parameters, and finding the optimal settings may require careful tuning. Additionally, the theoretical properties of Adam are not well understood, particularly in non-convex and non-smooth optimization scenarios, which can make its behavior unpredictable in certain cases.

Code implementation of Optimizers for Adam.
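
A minimal sketch with the commonly cited default hyperparameters; illustrative, not prescriptive.

import tensorflow as tf

# Adam keeps exponential moving averages of the gradient (beta_1) and
# the squared gradient (beta_2) to scale each parameter's step size.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7
)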

AdamW

AdamW is a variant of the Adam optimizer that incorporates weight decay directly into the update step, decoupling it from the adaptive gradient scaling that weakens L2 regularization in the original Adam optimizer. The mathematical formula for AdamW is similar to Adam, but it adds the weight decay term to the parameter update step, ensuring that the weight decay penalty is applied consistently during optimization. This modification helps stabilize training and prevents the model’s parameters from growing too large, improving generalization performance. The advantages of AdamW include faster convergence and better generalization compared to Adam, especially in scenarios with large-scale datasets or complex models. However, like Adam, AdamW requires careful tuning of hyperparameters such as the learning rate and momentum parameters to achieve optimal performance. Additionally, the computational overhead of AdamW may be slightly higher than with traditional weight decay methods, as it requires additional calculations during optimization. Despite these considerations, AdamW is widely used in deep learning applications and has demonstrated effectiveness in improving training stability and performance.

Code implementation of Optimizers for AdamW.
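
A minimal sketch, assuming a TensorFlow/Keras version that includes tf.keras.optimizers.AdamW (older releases provided it via TensorFlow Addons instead); values are illustrative.

import tensorflow as tf

# AdamW: the same moment estimates as Adam, but weight decay is applied
# directly to the weights (decoupled) instead of through the gradient.
optimizer = tf.keras.optimizers.AdamW(learning_rate=0.001, weight_decay=0.004)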

Adamax

Adamax is a variant of the Adam optimization algorithm that extends it to be more memory efficient and stable when dealing with large gradients. It is particularly useful in deep learning applications where large datasets and complex models are common.

Mathematical Formulation for Adamax.

Its advantages include memory efficiency, stability with large gradients, and adaptive learning rates, but it may require tuning of hyperparameters and involve additional computational complexity.

Code implementation of Optimizers for Adamax.
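
A minimal sketch with illustrative, default-style hyperparameters.

import tensorflow as tf

# Adamax tracks a running infinity norm (maximum) of past gradients
# in place of Adam's squared-gradient average.
optimizer = tf.keras.optimizers.Adamax(learning_rate=0.001, beta_1=0.9, beta_2=0.999)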

Nadam

The Nadam optimizer, short for “Nesterov-accelerated Adaptive Moment Estimation,” is an extension of Adam that combines it with Nesterov momentum. Like Adam, Nadam computes adaptive learning rates for each parameter, but it also incorporates the Nesterov accelerated gradient (NAG) look-ahead for faster convergence.

Mathematical Formulation for Nadam.

The Nadam optimizer combines the benefits of Adam and Nesterov momentum, leading to faster convergence and better generalization. It handles non-stationary objectives and sparse gradients well and automatically adjusts the learning rate for each parameter. Its drawbacks include the need to tune hyperparameters such as the learning rate, momentum parameters, and epsilon, and on some tasks it may underperform simpler optimizers because of its added computational complexity.

Code implementation of Optimizers for Nadam.
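
A minimal sketch with illustrative hyperparameters.

import tensorflow as tf

# Nadam: Adam's adaptive moment estimates combined with a
# Nesterov-style look-ahead on the momentum term.
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)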

Lion

The Lion optimizer is a stochastic gradient descent variant that uses the sign operator to control the magnitude of each update, unlike adaptive optimizers such as Adam that rely on second-moment estimates. This makes Lion more memory-efficient, as it only keeps track of the momentum.

Image source: lionOptimizer

Code implementation of Optimizers for Lion.
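
A minimal sketch, assuming a recent Keras release that includes a Lion implementation (tf.keras.optimizers.Lion); the small learning rate reflects the common practice of using lower rates with sign-based updates and is illustrative only.

import tensorflow as tf

# Lion keeps only a momentum estimate and applies sign(update), so every
# step has the same per-coordinate magnitude and the optimizer stores
# roughly half the state of Adam.
optimizer = tf.keras.optimizers.Lion(learning_rate=1e-4)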

Loss Scale Optimizer

The Loss Scale Optimizer is a technique used in deep learning to mitigate the issue of gradient underflow or overflow, particularly when training with very small or very large gradient values. This approach involves dynamically adjusting the loss scale during training to maintain numerical stability and prevent numerical precision issues, such as vanishing or exploding gradients.

The basic idea behind the Loss Scale Optimizer is to scale the loss function by a certain factor, referred to as the loss scale factor. This factor is typically adjusted dynamically during training based on the magnitude of the gradients encountered. When gradients become too small or too large, the loss scale factor is adjusted accordingly to bring the gradients back within a manageable range.

By using the Loss Scale Optimizer, deep learning models can be trained more effectively and efficiently, as it helps to prevent numerical instability issues that can hinder convergence and degrade performance. However, it’s important to note that implementing the Loss Scale Optimizer requires careful tuning and experimentation to determine the optimal scaling strategy for a given model and dataset.

Overall, the Loss Scale Optimizer is a valuable technique for improving the numerical stability of deep learning training algorithms and enabling the training of more complex and deeper neural networks.

Code implementation of Optimizers for Loss Scale Optimizer.
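
A minimal sketch of wrapping a base optimizer with Keras' dynamic loss scaling for mixed-precision training; the choice of Adam as the inner optimizer is arbitrary.

import tensorflow as tf

# Dynamic loss scaling: the loss is multiplied by a scale factor before
# backpropagation, and the gradients are unscaled (and checked for
# overflow) before the wrapped optimizer applies them.
base_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(base_optimizer)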

Learning rate schedules API

Learning rate schedules in machine learning refer to the strategy of adjusting the learning rate during training to optimize the performance of the model. Learning rate schedules can help improve convergence, prevent overfitting, and achieve better generalization. Various learning rate schedules are available in machine learning libraries such as TensorFlow, PyTorch, and Keras, often provided through dedicated APIs.

Here’s an example of how learning rate schedules are typically implemented in TensorFlow and PyTorch:

TensorFlow:

In TensorFlow, learning rate schedules can be implemented using the tf.keras.optimizers.schedules module. You can define a learning rate schedule and pass it to the optimizer during model compilation. Here's an example of using the ExponentialDecay schedule:

import tensorflow as tf

initial_learning_rate = 0.1
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate, decay_steps=10000, decay_rate=0.96, staircase=True
)

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)

PyTorch:

In PyTorch, learning rate schedules can be implemented using the torch.optim.lr_scheduler module. You can define a scheduler and attach it to the optimizer. Here's an example of using the StepLR scheduler:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler

model = nn.Linear(10, 1)  # placeholder model; use your own nn.Module here
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
# Call scheduler.step() once per epoch, after optimizer.step(), to apply the decay.

In this example, the learning rate will be multiplied by 0.1 every 30 epochs.

These are just a few examples of learning rate schedules available in TensorFlow and PyTorch. Depending on your specific use case, you may choose different schedules such as ReduceLROnPlateau, CosineAnnealingLR, CyclicLR, etc. Each schedule has its own parameters and usage patterns, so it's essential to experiment and choose the one that best suits your needs.

Comparison of All Optimizers: Which One Should You Choose, and When?

Choosing the right optimizer depends on various factors such as the nature of the problem, the architecture of the neural network, the dataset size, and computational resources available. Here’s a brief comparison of popular optimizers and when to choose them:

Stochastic Gradient Descent (SGD):

  • Use SGD when you have a large dataset and limited computational resources.
  • It’s a good baseline optimizer for simple models and linear regression.

Adam:

  • Adam is suitable for most deep learning tasks due to its adaptive learning rate and momentum.
  • It works well with large datasets and complex architectures.
  • However, it may not converge well with small datasets or on problems with sparse gradients.

AdamW:

  • Choose AdamW when training deep neural networks to prevent overfitting.
  • It incorporates weight decay directly into the parameter update step, improving generalization performance.

Nadam:

  • Nadam combines Nesterov momentum with the adaptive learning rate of Adam.
  • It often achieves faster convergence than Adam on deep neural networks.

RMSprop:

  • Use RMSprop when dealing with non-stationary objectives or noisy gradients.
  • It divides the learning rate by an exponentially decaying average of past squared gradients, adapting it separately for each parameter, which makes it suitable for online and non-stationary settings and often lets it converge faster than vanilla SGD.

Adagrad:

  • Adagrad is effective for sparse data and convex optimization problems.
  • However, it may accumulate the squared gradients over time, leading to a diminishing learning rate.

Adadelta:

  • Adadelta is an extension of Adagrad that addresses its diminishing learning rate issue.
  • It dynamically adjusts the learning rate based on a moving window of gradient updates, making it suitable for long training sessions.

Adamax:

  • Adamax is a variant of Adam that replaces the L2 norm with the infinity norm.
  • It is less sensitive to the choice of learning rate hyperparameters.

Adafactor:

  • Adafactor is an adaptive learning rate method designed specifically for the training of deep neural networks.
  • It adapts the learning rate per parameter, using statistics based on the current gradient.

Choosing the right optimizer involves experimentation and tuning hyperparameters based on the specific characteristics of your dataset and model. It’s often recommended to start with Adam or AdamW as they generally perform well across a wide range of tasks and then experiment with other optimizers to see if they offer any improvements for your particular problem.

Comparing different Optimizers. Image Source: AnalyticsVidhya

References

  1. Code implementation for all Optimizers: Keras.
  2. Lion Optimizer.
  3. Optimizers.
  4. A Visual Explanation of Gradient Descent Methods.

Hope you liked the article!

You may reach out to me on LinkedIn.


Published via Towards AI
