MLOps 4: Data in production

Last Updated on July 25, 2023 by Editorial Team

Author(s): Akhil Theerthala

Originally published on Towards AI.

Hi everyone! This is Akhil Theerthala, back again with the MLOps series. I had to take a break last week because of a fest at my university. Returning to the series: we have already seen the different phases of the machine learning project lifecycle and explored the standard procedures for modeling and deployment.
Finally, after painstakingly going through three separate parts on modeling, it is time to move to the next phase. This is the data phase, where we focus on defining the data we use to develop a model, along with some tips for labeling and cleaning it.

Lifecycle of a Machine Learning Project. Source: Deeplearning.AI, licensed under the Creative Commons Attribution-ShareAlike 2.0 license.

Why is defining data hard?

In article 1 of this series, we described this phase as the step where we define the data needed for the problem, label it, and pre-process it as required. However, checking whether the data is defined and labeled correctly is a tedious task.
The main reason is that there are many possible labels and definitions, none of which is 100% correct; each has its own reasoning. Let us look at this through examples, starting with unstructured data.

Iguana Detection

Let’s say we have an image of two iguanas, one facing away from the camera and one turned slightly towards it.

Image of an Iguana. Source: Deeplearning.AI, licensed under the Creative Commons Attribution-ShareAlike 2.0 license.

Now, we ask three people to label the iguanas in that image by drawing bounding boxes around them, and we get these responses:

(a) Bounding boxes are drawn around the visible portions of both iguanas. (b) Since the tail of the backward-facing iguana extends behind the other one, the box covers the entire iguana. (c) Both iguanas are bounded by a single box. Source: Deeplearning.AI, licensed under the Creative Commons Attribution-ShareAlike 2.0 license.

Any one of the three labeling conventions above would work on its own, and the model would give reasonable results. However, what happens when we have a mixture of these different kinds of labels, each backed by different logic? The model gets confused, and we get subpar results.

This is not a big problem when the dataset is small, since we can correct the labels quickly. However, imagine a dataset with more than 10,000 images and several possible labeling conventions; verifying them takes a lot of time.
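
Before scaling up labeling, it helps to check how much the labelers actually agree. Below is a minimal sketch (not from the original article; the box coordinates and the 0.5 threshold are my own illustrative choices) that compares bounding boxes from two labelers with intersection-over-union and flags likely convention mismatches:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Boxes drawn by two labelers for the same image (hypothetical coordinates).
labeler_1 = [(10, 20, 120, 200), (130, 40, 260, 210)]
labeler_2 = [(12, 25, 118, 195), (10, 20, 260, 210)]  # second box merges both iguanas

# Flag boxes where the labelers clearly disagree (low best-match IoU).
for box in labeler_1:
    best = max(iou(box, other) for other in labeler_2)
    if best < 0.5:
        print(f"Possible labeling-convention mismatch for box {box} (best IoU = {best:.2f})")
```

Images flagged this way are exactly the ones worth discussing when writing the labeling rules described later in this article.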

Let’s take a look at a second example with structured data.

Suppose you run a business like LinkedIn that lets people apply to different jobs after signing up with their email address, first name, and a few other details. After a few months, you launch an app to reach more people, but this also causes some of your existing customers to create duplicate accounts. Now it is your job to clean up the duplicates so that you can run further analysis.

During this maintenance, you find two entries that look like this:

An example of a duplicate entry.

At first glance, you can guess that this is a duplicate entry and should be removed. But there is also that slight nagging uncertainty from your inner perfectionist: these might be two entirely different people with the same name living in the same locality.

And in some rare cases, that does happen. All we know is that it is possible for two people named Nova Ng to live in the same area. In such cases, what should you do?

Should you remove the entry to keep the data clean, or should you leave it on the chance that these are two different people?

Different people will decide differently in such cases; some might merge the entries, while others might not. Similar ambiguities show up in other structured-data problems, such as bot/spam account identification, stolen account identification, and so on.
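
In practice, a first pass at this is often a simple rule-based check that surfaces candidate duplicates for review rather than deleting anything automatically. A rough pandas sketch, assuming hypothetical column names and a name-plus-zip matching rule:

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": [101, 102, 103],
    "first_name": ["Nova", "Nova", "Akhil"],
    "email": ["nova.ng@gmail.com", "nova_ng@outlook.com", "akhil@example.com"],
    "zip_code": ["10003", "10003", "500001"],
})

# Normalize the fields we match on, then group by (name, zip) as a candidate key.
users["key"] = (
    users["first_name"].str.strip().str.lower() + "|" + users["zip_code"].str.strip()
)
candidates = users[users.duplicated("key", keep=False)]

# These are *candidate* duplicates only: two different people named Nova Ng can
# live in the same area, so the final merge/keep decision follows the agreed rules.
print(candidates[["user_id", "first_name", "email", "zip_code"]])
```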

Therefore, before we process this data, we set up some basic ground rules for labeling, which are iteratively changed and updated over the course of data cleaning. When defining this set of rules, we generally brainstorm the different things that can go wrong. You remember Murphy’s first law,

Anything that can go wrong will go wrong.

That is very true when defining data (at least in my experience working with different datasets). So here are a few questions you can ask yourself when setting up your rules.

  • What is the input X?
    — For images, it helps to know the lighting conditions, the contrast, the resolution, and so on.
    — For structured data, we should know which features are available and how relevant they are to the project.
  • What is the target y?
    — We should clarify what kind of output we want from the model.
    — We should also brainstorm what could go wrong during labeling so that clear instructions can be defined (a rough sketch of such a spec follows this list).
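
One lightweight way to record the answers to these questions is a small, versioned spec that the whole team can review before labeling starts. The structure below is purely illustrative; the field names and example values are my own, not from the article:

```python
from dataclasses import dataclass, field

@dataclass
class LabelingSpec:
    """Written answers to the X/y questions, versioned alongside the data."""
    input_description: str                                     # what X is and how it is captured
    input_quality_notes: list = field(default_factory=list)   # lighting, resolution, ...
    target_description: str = ""                               # what y should encode
    labeling_rules: list = field(default_factory=list)         # agreed edge-case decisions

spec = LabelingSpec(
    input_description="Phone photos of iguanas, 1080p, outdoor lighting",
    input_quality_notes=["low-contrast images allowed", "motion blur rejected"],
    target_description="One bounding box per visible iguana",
    labeling_rules=["occluded parts are NOT included in the box",
                    "overlapping animals get separate boxes"],
)
```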

However, we must remember that the best practices for handling one type of data can contradict the best practices for another. Hence, we need a framework that accounts for the variety of data used in machine learning projects.

Major types of data problems

With so many kinds of data, it helps to generalize the categories of data problems. One common method is a 2×2 grid: one axis is the type of data we are working with, and the other is the size of the dataset.

Major types of data problems and examples. Source: Deeplearning.AI, licensed under the Creative Commons Attribution-ShareAlike 2.0 license.

In the image above, we can see examples of each major type of data problem. In general, unstructured data is commonly labeled manually by humans, and, as discussed in article 3, we can obtain more of it using data augmentation. Structured data, on the other hand, is harder to obtain in larger quantities and even harder to label manually.

When we have a relatively small dataset, having clean and consistent labels is critical for training a good model. However, it is more challenging to get clean labels for massive datasets. Hence, the emphasis is generally placed on data processing for large datasets.

Why is having consistent labels important in small data problems?

Let’s say we have a dataset with five examples, where we are trying to map the battery voltage to the rotor speed (rpm) of a remote-controlled helicopter.

Example for a small data problem with noise. Source: Deeplearning.AI, licensed under the Creative Commons Attribution-ShareAlike 2.0 license.

It is tough to fit a model to this data well, as we can clearly see some noise in the dataset (the point in the middle, where the speed is clearly low). Now, imagine the same dataset without the noise.

Example for a small data problem without noise. Source: Deeplearning.AI, licensed under the Creative Commons Attribution-ShareAlike 2.0 license.

Here we can fit a model to the data confidently and easily. Some ongoing research problems in mechanical engineering, such as defect identification, also work with only a few examples; I have seen a friend working on such a project. The key in such problems is to keep the labels consistent and clean.
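
To see concretely how much a single noisy point matters when there are only five examples, you can fit the same simple model to both versions of the data. A toy sketch in numpy (the numbers are made up to mimic the figures):

```python
import numpy as np

voltage = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rpm_clean = np.array([210., 400., 615., 800., 1010.])   # roughly linear
rpm_noisy = rpm_clean.copy()
rpm_noisy[2] = 300.                                      # one mislabeled/noisy point

for name, rpm in [("clean", rpm_clean), ("noisy", rpm_noisy)]:
    slope, intercept = np.polyfit(voltage, rpm, deg=1)   # fit a straight line
    residuals = rpm - (slope * voltage + intercept)
    print(f"{name}: slope={slope:.1f}, intercept={intercept:.1f}, "
          f"max residual={np.abs(residuals).max():.1f}")
```

With only five points, that one noisy label visibly shifts the fitted slope, which is exactly why label consistency matters so much in small-data problems.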

If the labeling instructions are unclear, the labels will inevitably be ambiguous. One thing to note is that big data problems can also contain small data challenges.

For example, say we are collecting data to develop a product recommendation system for supermarkets. An average store carries products in thousands of different categories, and many products have very few sales, so there is very little user-interaction data for them. We need to ensure that this small amount of data per item, spread over a long tail of items, has consistent labeling.

Improving label consistency

One way to improve label consistency is to have clear, well-defined instructions. To get there, we can ask multiple labelers to label the same examples, and when there is a disagreement,

  • We can identify it and discuss the definition of y to reach a new agreed definition.
  • We might also encounter cases where we need to consider changing X as it might not have enough information.

After identifying the issues with the labels, we can use that information to update the labeling instructions. This process can be repeated until it is hard to increase agreement significantly.

Some results from such an iterative process can include,

  • Standardizing the labels so that they become consistent.
    — For example, merging different conventions for text transcription into a single clean form.
  • Merging some of the classes in y to get cleaner labels.
    — For example, merging two different categories of scratches into a single category called “Scratch”.
  • Adding a new label to capture uncertainty.
    — In the scratch example above, we can assign the scratches values between 0 and 1 to indicate the class.
    — Or, in the transcription example, we can add a new Boolean tag called ‘unintelligible’.

We can also have a voting mechanism to make the labels more consistent.
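
A simple version of such a voting mechanism collects the labels from several labelers per example, keeps the majority label, and flags examples without a clear majority for discussion. A minimal sketch with hypothetical labels:

```python
from collections import Counter

# Labels assigned to the same examples by three labelers (hypothetical).
labels_per_example = {
    "img_001": ["scratch", "scratch", "scratch"],
    "img_002": ["scratch", "dent", "scratch"],
    "img_003": ["dent", "scratch", "none"],   # no agreement -> revisit instructions
}

for example_id, votes in labels_per_example.items():
    label, count = Counter(votes).most_common(1)[0]
    if count > len(votes) // 2:
        print(f"{example_id}: consensus label = {label}")
    else:
        print(f"{example_id}: no majority, send back for discussion")
```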

Human Level Performance

We have already seen in previous articles why Human-Level Performance (HLP) is important. One more interesting and important use of HLP is to estimate what is possible for the project.

Let us look at a scenario to understand this better. Say you have ground-truth labels for your project, and your supervisor has given you the task of developing a model with 99% accuracy. To check whether 99% is even possible, it is essential to see how well a human performs on the task.

Suppose the HLP is only 85%; then it will be very hard to develop a model that achieves 99% accuracy. This is one advantage of establishing a baseline before diving into modeling without any direction. However, this entire premise depends on what the ground-truth label actually is.

When the ground truth is truly objective (for example, an externally verified measurement), HLP measures how a person performs compared with the model. However, when the ground-truth label is itself defined by another human, HLP is really a measure of how well one person predicts another person’s label.

When estimating HLP, if it turns out to be much worse than perfect performance, one of the most common causes is ambiguity in the labeling instructions.
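
When the reference labels come from another labeler, a rough HLP estimate is simply the fraction of examples on which a second labeler agrees with those reference labels. A small sketch with hypothetical labels:

```python
# Reference labels (treated as ground truth) vs. a second labeler's labels.
reference = ["cat", "dog", "dog", "cat", "bird", "dog"]
labeler   = ["cat", "dog", "cat", "cat", "bird", "dog"]

hlp = sum(r == l for r, l in zip(reference, labeler)) / len(reference)
print(f"Estimated HLP: {hlp:.2%}")   # ~83% here, so a 99% accuracy target looks unrealistic
```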

In summary,

  • In academia, HLP is used to establish a respectable benchmark to beat in support of a publication.
  • For business or product requirements, HLP is a more reasonable target than an arbitrary number like 99% or 98%.

However, this baseline can also be misused to claim ML superiority over Human-Level Performance in settings where the real goal is to build a practical application.

Until now, we have seen how to define the data, set baselines, and handle the common issues in that process. Let us now look at how we actually obtain the data.

Some common tips for obtaining data:

  • Get into the modeling iteration as fast as possible in the initial stages of the project. Based on the error analysis, we can go back and decide whether we need more data than we have already collected.
  • Instead of searching for initial data indefinitely, set a small window, such as a week, and try to find as much quality data as possible within it. (If you have experience with a similar project, you will know more precisely how much time you need.)
  • Make a data inventory, where you brainstorm the list of possible data sources along with the cost of obtaining the data, the time needed, and the quantity that can be obtained, and make decisions based on that (a toy inventory sketch follows this list). Other vital factors are data quality, privacy, and regulatory constraints.
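
The inventory itself can be as simple as a small table that you sort by whichever constraint matters most. A toy sketch (the sources and numbers below are invented for illustration):

```python
import pandas as pd

# Hypothetical inventory of candidate data sources for a speech project.
inventory = pd.DataFrame([
    {"source": "Owned/existing data",  "hours": 100,  "cost_usd": 0,     "days_to_obtain": 1},
    {"source": "Crowdsourced reading", "hours": 1000, "cost_usd": 10000, "days_to_obtain": 14},
    {"source": "Commercial purchase",  "hours": 6000, "cost_usd": 60000, "days_to_obtain": 30},
])
inventory["usd_per_hour"] = inventory["cost_usd"] / inventory["hours"]

# Sort by whatever constraint matters most right now (here: time to obtain).
print(inventory.sort_values("days_to_obtain"))
```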

There is no single best data source for all projects; it depends on the application, the quality of data needed, and the source chosen. For a project like factory inspection or medical image diagnosis, we might have to use in-house data or purchase data from factories, and then have subject-matter experts write the labeling instructions or do the labeling themselves. For recommender systems, no matter the source, the data is generally hard to label. Many factors therefore go into choosing the data source and the right people to label it.

Managing Data Pipelines

Data pipelines, or cascades, refer to the multiple steps of data processing that happen before the final data required for modeling is ready. When we talk about data processing, though, we need to be more precise, because there are actually two places where incoming data is processed:

  1. Pre-processing for modeling: This phase is relatively simple. We run a few pre-processing steps, whether computer scripts or manual processing, and use the processed data to train an ML model.
  2. Pre-processing the input data in production: Input data from users should be processed so that it matches the distribution of the data the model was trained on. Exactly replicating the training-time processing in production is often only partially possible, so we try to approximate the distributions as closely as we can (see the sketch after this list).
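
A common way to keep the two consistent is to factor the preprocessing into a single function (or saved transformer) that both the training script and the serving code import, together with any statistics computed on the training data. A minimal sketch, with assumed feature names:

```python
import numpy as np

def preprocess(record, stats):
    """Shared preprocessing used both at training time and in production.

    `stats` (e.g., means/stds) is computed once on the training data and
    saved, so production inputs are scaled with the *same* parameters.
    """
    voltage = (record["voltage"] - stats["voltage_mean"]) / stats["voltage_std"]
    return np.array([voltage], dtype=np.float32)

# Training time: compute stats on the training data and persist them with the model.
train_voltages = np.array([3.1, 3.4, 3.7, 3.9, 4.2])
stats = {"voltage_mean": train_voltages.mean(), "voltage_std": train_voltages.std()}

# Production time: reuse the exact same function and the saved stats.
features = preprocess({"voltage": 3.8}, stats)
```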

In practice, it is recommended to follow two phases for any ML project: the POC (proof-of-concept) phase, as discussed in the previous articles, and the production phase. During the POC phase, it is fine to do whatever processing is necessary to get the model working, without worrying about replicability (though you should keep extensive notes).

After the project’s utility is established, we should use more sophisticated tools to make the data pipeline replicable. Libraries and tools such as TensorFlow Transform and Apache Beam are recommended, depending on preference.
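
For instance, with TensorFlow Transform the preprocessing is written once as a `preprocessing_fn`; the analyzer results it needs (means, vocabularies, and so on) are baked into a transform graph that is applied identically at training time and at serving time. A minimal sketch with assumed feature names:

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Run by a Beam pipeline at training time; the resulting transform graph
    is then attached to the serving model, so production inputs go through
    exactly the same steps."""
    return {
        "voltage_scaled": tft.scale_to_z_score(inputs["voltage"]),
        "device_model_id": tft.compute_and_apply_vocabulary(inputs["device_model"]),
    }
```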

Metadata, Data Provenance, and Lineage

Most production pipelines are much larger and more complex than the simple systems we have looked at so far. For each application, data has to move through several systems while we try to maintain the quality of the results.

A sample pipeline for a job recommendation model. Source: Deeplearning.AI, licensed under the Creative Commons Attribution-ShareAlike 2.0 license.

The image above is an example where multiple systems are linked and trained so that we can produce a job recommendation. Far more complex systems, with even deeper trees, are currently used in production. Now imagine that we discover an error in one of the initial steps (say, in the spam dataset) and need to change it. What should we do?

To go back and fix this problem, we need to work through files developed by different people and teams. Hence, to keep the system maintainable, it helps to track data provenance and lineage:

  • Data provenance — where the data comes from.
  • Data lineage — the sequence of steps in the pipeline.

For managing data pipelines and performing error analysis, it is also recommended to maintain extensive metadata for each step of the project. For example,

  • In a visual inspection project, the data would be pictures of different devices, while the metadata stores the time the photo was taken, the factory where it was taken, the device model, the inspector ID, and so on.
  • In a speech recognition system, the data would be audio recorded on mobile phones, while the metadata would include the phone model, the version of the VAD model, the network conditions during audio transmission, and so on.

Maintaining and tracking metadata, data provenance, and data lineage is recommended for better analysis and a more sustainable model pipeline.
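
In its simplest form, this can mean writing a metadata record next to every example (or every pipeline step) so that provenance and lineage questions can be answered later. A hypothetical sketch for the visual-inspection example above (all field names and values are illustrative):

```python
import json
from datetime import datetime, timezone

example_metadata = {
    "image_path": "inspections/device_123/img_0042.png",
    "captured_at": datetime(2023, 2, 1, 9, 30, tzinfo=timezone.utc).isoformat(),
    "factory": "plant-A",
    "device_model": "X200",
    "inspector_id": "insp-17",
    "labeling_instructions_version": "v3",          # which rules were in force
    "upstream_dataset": "raw_camera_dump_2023-02",  # provenance
}

# Store it next to the example so later error analysis can slice by any field.
with open("img_0042.meta.json", "w") as f:
    json.dump(example_metadata, f, indent=2)
```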

Man! Even writing about this topic felt dry! In actual practice, it is much more interesting because you can directly see the different problems arising in the data (though it can be frustrating too). If you think this is a drag, imagine how many hours were spent training the models behind everyone’s current favorite, ChatGPT! When I think about that, my appreciation for OpenAI rises to a new level.

Even in this short article, you can probably find plenty of grammatically ambiguous statements and punctuation that needs correcting. Yet these people have trained a model that can generate statements without such ambiguity while keeping the punctuation accurate.

Anyway! I had originally planned to split this long article into two parts, but I wanted to make up for last week’s delay and cover the entire phase in one go so that I would not fall behind schedule. I hope you liked this one, and if you haven’t read my previous articles in this series, I hope you will go and check them out. Meanwhile, follow me for more frequent updates, and stay tuned for more articles!

By the way, feel free to check out my website, NeuroNuts! I plan to add all my ideas and some more detailed articles there!

Originally published at https://www.neuronuts.in on February 5, 2023.

MLOps 3.3: Data-Centric Approach for Machine learning modelling.

Learn about the “Data-centric” machine learning modelling approach and its standard procedures here.

pub.towardsai.net

NeuroNuts

For those with a burning desire to put their enthusiasm for ML and Data Science to a good use, there must be clear and…

www.neuronuts.in

Akhil Theerthala – Medium

Read writing from Akhil Theerthala on Medium. Machine Learning Enthusiast | Data Enthusiast | AI Enthusiast. Every…

medium.com


Published via Towards AI
