The Principles of Data-Centric AI Development

Last Updated on March 28, 2022 by Editorial Team

Author(s): Team Snorkel

The Future of Data-Centric AI Talk Series

Background

Alex Ratner is CEO and co-founder of Snorkel AI and an Assistant Professor of Computer Science at the University of Washington. He recently joined the Future of Data-Centric AI event, where he presented the principles of data-centric AI and discussed where the field is headed. If you would like to watch his presentation in full, you can find it on YouTube.

Below is a lightly edited transcript of the presentation.

What is “data-centric AI”?

The notion of data-centric AI might sound a little tautological to many. When we as a field say “AI” nowadays, we are talking primarily about machine learning, which is by definition about data and always has been. So the first question is: What is “data-centric AI,” and why is the industry’s increasing focus on it new and different? A good way to answer that is to start by contrasting data-centric AI with what has been the focus of machine-learning development for many years: model-centric AI. In model-centric AI development, the data is more or less a fixed input, and the majority of your development time, as an AI/ML engineer, is spent iterating on the model.

In a data-centric AI development cycle, data is instead the central object you iteratively develop; that is, you spend relatively more of your time labeling, managing, slicing, augmenting, and curating the data, with the model itself remaining relatively more fixed.

While artificial intelligence has mainly focused on ML models, the engineers actually putting models into production know that focusing on the data is crucial.

Data-Centric AI vs. Model-Centric AI

In the model-centric approach to development, there is frequently the sense that the dataset is something “outside” or “before” the actual AI development process. Machine learning development (at least the predominant supervised learning kind) starts with a training dataset: a collection of “ground truth” labeled data points that your model learns from or fits to. In a traditional model-centric ML development process, the training data is treated as a fixed input that is exogenous to the machine-learning development process. When, for example, you start your academic experiment against one of the benchmark datasets like ImageNet, your training data is something you download as a static file. After that, any new iterations of your project result from changes to the model (at least in the broadest sense). This process includes things like feature engineering, algorithm design, and bespoke architecture design, all of which are about iterating on and developing the model. In other words, you are really “living” in the model and treating the data as a static artifact.

The tectonic shift to a data-centric approach is as much a shift in the focus of the machine-learning community and culture as a technological or methodological shift. “Data-centric” in this sense means you are now spending relatively more of your time on labeling, managing, slicing, augmenting, and curating the data, with the model itself relatively more fixed.

The key idea is that data is the primary arbiter of success or failure and is, therefore, the key focus of iterative development. It is important to note that this is not an either/or binary between data-centric and model-centric approaches. Successful AI requires both well-conceived models and good data.

The Principles of Data-Centric AI Development

1. AI development today increasingly centers around the data, especially training data.

As mentioned above, until recent years, machine-learning development had been almost entirely model-centric, where the data was mostly imagined as “outside” the process. Even just five years ago, the primary toolkit and focus of development for almost every single machine-learning team was centered around:

  • Feature engineering, i.e., selecting specific attributes or features of the data that the model is actually seeing and learning from.
  • Model architecture design, meaning the actual structure of whatever weights or parameterizations of those features that get fed into the model as input.
  • Training algorithm design.

More recently, though, the industry has begun to exhibit a major shift to much more powerful, automated, but also data-hungry representation learning models. We often call these “deep learning” models. Rather than, say, thousands of free parameters that need to be learned from your data, there are sometimes hundreds of millions. So, despite their power and utility, these models need a great deal more labeled training data to reach their peak level of performance.

An increasing diversity of tasks and data modalities are all being handled by an ever smaller and more unified set of model architectures that are more push-button, accessible, and powerful than ever before. But they are also far more data hungry and far less practical to modify.

Excitingly, deep learning model architectures are increasingly convergent and commoditized, which means that they are far more accessible than models from years or decades ago. An increasing diversity of tasks and data modalities are all being handled by an ever smaller and more stable set of model architectures. But as a result, they are far less practical for users to modify.

And even as these black-box models grow more powerful, they are that much hungrier for data. Because of this, your training data (its volume, but also its quality, management, distribution, sampling, etc.) is more and more the primary arbiter of success. Looking at the most recent literature in the field [1], it seems clear that most progress on state-of-the-art benchmark tasks comes from finding creative ways to collect more data, augment it, and then transform or boost it to use it more effectively. To meaningfully improve machine learning technology, data now has to become your primary focus.

As a result, most of the key operations that machine-learning and AI development teams used to spend most of their time on (feature engineering, model architecture design, training algorithm design, and so on) are no longer so prominent or time-intensive. Instead, your team’s largest chunk of time is spent on collecting, augmenting, and managing training data. That presents a conundrum: while data is the key emerging interface to developing AI today, it is likewise the key bottleneck that constrains progress. This challenge brings us to the second principle of data-centric AI.

2. Data-centric AI needs to be programmatic

Given the growing prominence of data-hungry machine-learning models, how teams interface with their data needs to be something much more efficient than manually labeling and curating one data point at a time.

Building AI applications today often requires virtual armies of human labelers, and that kind of investment and labor requirement is almost always a non-starter for private, high-expertise, and rapidly changing real-world settings. Far from hours or days, it can take multiple person-years for data to actually be ready for machine learning development.

As one example, the Stanford AI Lab Snorkel research project partnered with Stanford Medicine to research using machine learning to rapidly classify and triage chest x-rays [2]. Building the actual ML models took only a day or two using open-source libraries, and the differences in results between models were quite minimal, usually less than one point. Whichever state-of-the-art model we fed the data to, it made little difference to the accuracy of our results. In contrast, it took between eight and fourteen person-months of manual labeling by our radiology and medical partners to label the training data originally, and the quality of the labeled training data we fed the models was immensely impactful, making eight- or nine-point differences. This underscores the theme: while the model is still a critical part of the machine-learning process, the highest leverage point for improvements is the data and how it is managed, partitioned, and augmented.

But again, getting that labeled data took the equivalent of at least eight person-months. This highlights the fundamental challenges of training data:

  • Real-world use cases require subject matter experts (SMEs) for labeling. For example, for a usable medical or clinical dataset, you need medical doctors or professionals doing the labeling. For usable legal datasets, you need qualified lawyers. Often these are SMEs who already have little time to devote to manual labeling tasks.
  • Real-world data is private and proprietary to a particular organization or enterprise. It cannot simply be exported or put in open source for others to use, modify, or learn from.
  • Real-world data and objectives often change rapidly, including the data distribution coming in and the modeling objectives for which you’re actually building your model. As a result, you frequently have to re-label data.

Manual data labeling and curation, then, is essentially a non-starter for most real-world organizations, even for the largest of them. And this is before machine learning teams run into the ethical and governance challenges of manually labeling training data. How do we inspect or correct the biases that human labelers bring to the table? How do we govern or audit a dataset of hundreds of millions of hand-labeled data points? How do we trace model errors back to the part of the dataset the model learned them from? Solving these critical challenges with large, manually labeled training datasets is a practical nightmare for organizations. In fact, it can be a bigger problem than the one you were trying to solve in the first place.

The way Snorkel AI solves the manual-data-labeling problem is programmatic labeling. For a simple example, you might ask an SME to just write some keywords or phrases, and then label data points with lines of code, rather than laboriously labeling each data point, one by one, manually.
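
To make the idea concrete, here is a minimal Python sketch of keyword-based labeling functions. It is illustrative only: the label constants, the keyword lists, and every function name are assumptions made for this example, not Snorkel Flow's API.

```python
from typing import Callable, List, Sequence

# Hypothetical label scheme: a labeling function may also decline to vote.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Keywords an SME might supply (made up for illustration).
ABNORMAL_TERMS = ("fracture", "opacity", "effusion")
NORMAL_TERMS = ("no acute", "unremarkable")


def lf_abnormal_terms(text: str) -> int:
    """Vote POSITIVE when any abnormal keyword appears, else abstain."""
    lowered = text.lower()
    return POSITIVE if any(term in lowered for term in ABNORMAL_TERMS) else ABSTAIN


def lf_normal_terms(text: str) -> int:
    """Vote NEGATIVE when any normal-finding phrase appears, else abstain."""
    lowered = text.lower()
    return NEGATIVE if any(term in lowered for term in NORMAL_TERMS) else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], int]] = [lf_abnormal_terms, lf_normal_terms]


def apply_lfs(texts: Sequence[str],
              lfs: Sequence[Callable[[str], int]] = LABELING_FUNCTIONS) -> List[List[int]]:
    """Return one row of votes per text, one column per labeling function."""
    return [[lf(text) for lf in lfs] for text in texts]


reports = [
    "Small left pleural effusion.",
    "Lungs are clear. No acute findings.",
    "Follow-up recommended in six months.",
]
votes = apply_lfs(reports)
```

A few lines of code label every matching data point at once, and changing a keyword relabels the whole dataset. Note that a phrase such as "no acute fracture" would trigger both functions, so conflicting votes still have to be reconciled somehow.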

At Snorkel AI, our goal is to create a platform for rapid, data-centric, iterative AI development. In other words, it revolves around modifying, labeling, and managing your data. We call it Snorkel Flow. It has four basic steps:

Snorkel Flow then serves as “Supervision Middleware” for diverse sources of input including patterns, models, knowledge bases and ontologies, and more.

One way of looking at this overall process: You can take some of the best of what rules-based inputs provide (efficiency and transparency of specification, modifiability, auditability, etc.) and bridge that with the generalization capabilities of modern machine learning techniques, including transfer, self-, and semi-supervised approaches.
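
One simple way to turn several rule-like votes into training labels for a downstream model is a majority vote. The sketch below uses it purely as a stand-in: the talk does not specify how votes are combined, and the function names, the `ABSTAIN` convention, and the toy vote matrix are all assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

ABSTAIN = -1  # hypothetical marker for "this source has no opinion"


def majority_vote(votes: Sequence[int]) -> int:
    """Return the most common non-abstain vote, or ABSTAIN on no votes or a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # sources disagree evenly, so leave the point unlabeled
    return ranked[0][0]


def build_training_set(texts: Sequence[str],
                       vote_matrix: Sequence[Sequence[int]]) -> List[Tuple[str, int]]:
    """Keep only the data points that received a usable label."""
    pairs = []
    for text, votes in zip(texts, vote_matrix):
        label = majority_vote(votes)
        if label != ABSTAIN:
            pairs.append((text, label))
    return pairs


# Toy input: four data points, three programmatic sources each.
texts = ["doc a", "doc b", "doc c", "doc d"]
vote_matrix = [
    [1, 1, -1],    # two sources agree on class 1
    [0, -1, 0],    # two sources agree on class 0
    [1, 0, -1],    # tie, so the point is dropped
    [-1, -1, -1],  # no source fired, so the point is dropped
]
training_set = build_training_set(texts, vote_matrix)
```

The resulting pairs can then be fed to whatever model you would have trained on hand labels, which is where the generalization beyond the rules comes from.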

The key idea here, which we’ve provided theoretical rates for, is that you can then scale up with unlabeled data at the same rate as with adding labeled data. In other words, you can dump more unlabeled data (which are too expensive to label) into these programmatic labeling approaches, thus taking advantage of the volume of unlabeled data present in documents or network signals and actually get similar scaling benefits in terms of model performance.

Snorkel Flow “closes the loop” by rapidly identifying and correcting error modes in the data and models just by writing and editing labeling functions (LFs), allowing you to adapt rapidly and iteratively to changing real-world conditions.
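
As a rough sketch of what one turn of that loop might look like, the code below scores each labeling function against a small hand-labeled development set so that weak or narrow functions stand out as candidates for editing. The function name, the choice of coverage and accuracy as metrics, and the toy data are assumptions for illustration, not a description of Snorkel Flow's analysis tools.

```python
from typing import Dict, List, Optional, Sequence

ABSTAIN = -1  # hypothetical marker for "labeling function did not fire"


def lf_summary(vote_matrix: Sequence[Sequence[int]],
               gold_labels: Sequence[int]) -> List[Dict[str, Optional[float]]]:
    """Compute coverage and empirical accuracy for each labeling function.

    vote_matrix has one row per data point and one column per labeling function.
    """
    n_points = len(vote_matrix)
    n_lfs = len(vote_matrix[0])
    summary = []
    for j in range(n_lfs):
        # Only the points where this function actually voted count toward accuracy.
        voted = [(row[j], gold) for row, gold in zip(vote_matrix, gold_labels)
                 if row[j] != ABSTAIN]
        coverage = len(voted) / n_points
        accuracy = sum(v == g for v, g in voted) / len(voted) if voted else None
        summary.append({"lf": j, "coverage": coverage, "accuracy": accuracy})
    return summary


# Toy development set: four hand-labeled points, two labeling functions.
dev_votes = [[1, -1], [1, 0], [-1, 0], [1, 1]]
dev_gold = [1, 0, 0, 1]
report = lf_summary(dev_votes, dev_gold)
```

Here the first function is wrong on one of the three points it labels, which suggests tightening its rule and rerunning the analysis.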

Across 50+ peer-reviewed publications over the years, and in customer case studies from Snorkel AI [3], the Snorkel approach has offered an empirically proven way to accelerate AI: it saves person-months and even years, at or above quality parity, on a diverse set of applications.

Snorkel Flow has implemented this data-centric approach across the entire workflow for AI because “data-centric” AI is much more than just labeling. It also includes Transformation Functions (TFs), Slicing Functions (SFs), and more.
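
The talk names these operators without defining them. The sketch below assumes that a transformation function returns an augmented copy of a data point and that a slicing function flags a subset of the data to monitor separately; the synonym table, function names, and toy predictions are all hypothetical.

```python
import random
from typing import Callable, Dict, Optional, Sequence

# Hypothetical synonym table an SME might provide for augmentation.
SYNONYMS: Dict[str, str] = {"small": "minor", "large": "sizable"}


def tf_swap_synonym(text: str, rng: random.Random) -> str:
    """Transformation function: replace one known word with its synonym."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    if not candidates:
        return text  # nothing to augment, return the input unchanged
    i = rng.choice(candidates)
    words[i] = SYNONYMS[words[i].lower()]
    return " ".join(words)


def sf_short_text(text: str, max_words: int = 4) -> bool:
    """Slicing function: flag very short inputs as a slice of interest."""
    return len(text.split()) <= max_words


def slice_accuracy(texts: Sequence[str], preds: Sequence[int], golds: Sequence[int],
                   sf: Callable[[str], bool]) -> Optional[float]:
    """Accuracy restricted to the data points the slicing function selects."""
    hits = [p == g for t, p, g in zip(texts, preds, golds) if sf(t)]
    return sum(hits) / len(hits) if hits else None


texts = ["a small nodule", "no abnormality seen in either lung field"]
augmented = [tf_swap_synonym(t, random.Random(0)) for t in texts]
short_acc = slice_accuracy(texts, preds=[1, 0], golds=[0, 0], sf=sf_short_text)
```

Reporting accuracy per slice rather than only in aggregate makes it easier to see which part of the data needs more attention.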

Finally, we come to the third main principle of data-centric AI.

3. Data-centric AI needs to be collaborative with subject-matter experts

For AI to be effective and safe, the SME who actually knows how to label and curate the data has to be included in the loop, and data-centric AI enables this.

In the traditional way of doing things, SME labelers and ML engineers/data scientists are disconnected. With Snorkel Flow, SMEs and MLEs collaborate as a fundamental part of the process.

When the SME who actually knows how to label and curate the data is included in the loop, it makes for a much better AI platform. Here are three reasons why:

First, including the SME in the loop allows you to directly inject expertise into the model, rather than the model trying to infer features or heuristics that the SME already knows.

Second, Snorkel Flow enables you to leverage subject matter expertise that has already been encoded, e.g., in knowledge bases, ontologies, legacy heuristics, and rules. Snorkel Flow can use already-codified (potentially discordant) sources of structured expert knowledge for programmatic supervision.

Finally, and most importantly: including SMEs in the loop is the only real way to ensure that AI models are effective, safe, and ethical, i.e., actually aligned with the output goals and principles of the domain in which the SMEs are uniquely expert.

Crucially, all these benefits of SME / data scientist collaboration are far easier to attain in a data-centric approach, since you are meeting at the common ground of data as your centerpoint for iteration and development.

Summary:

The traditional model-centric approach to ML has been tremendously successful and has brought the field to a place in which the models themselves are ever more downloadable, commoditized, and, above all, widely accessible. But the newer, powerful, “deep-learning” models are now so data-hungry that not only have datasets and manual labeling of training data become unwieldy, there are diminishing returns to be had in terms of how much progress can be made iterating only on the model. The answer to pushing AI forward now and over coming years can be found in a data-centric approach.

With data-centric AI development, teams spend much more time labeling, managing, and augmenting data, because data quality and quantity are increasingly the key to successful results. Data should thus be the primary focus of iteration. There are three main principles to keep in mind with a data-centric approach:

  1. As models become more user-friendly and commoditized, the progress of AI development increasingly centers around the quality of the training data that AI models learn from, and the ability to iterate on this data in an agile and transparent way, rather than around feature engineering, model architecture, or algorithm design.
  2. Data-centric AI must be programmatic in order to cope with the volume of training data that today’s deep-learning models require, and with the practical difficulty of repeatedly and manually getting these labels in most real-world contexts. Manually labeling millions of data points is simply not practical. Instead, a programmatic process for labeling, managing, augmenting, cleaning, and iterating on the data is the crucial determiner of progress.
  3. Data-centric AI should treat SMEs as integral to the development process. Including SMEs in the loop who actually understand how to label and curate your data allows data scientists to inject SME expertise directly into the model. Once done, this expert knowledge can be codified and deployed for programmatic supervision.

As the field of ML progresses, successful AI will continue to involve both well-built models and well-engineered data. But because of the sophistication of today’s models, the biggest returns moving forward will emerge from approaches that prioritize the data. And if data is increasingly the key arbiter of success or failure, data has to be the focus of iterative development moving forward. There are many exciting advances in this emerging field of data-centric AI, both here now and to come on the road ahead!

If you’d like to watch Alex’s full presentation, you can find it on the Snorkel AI YouTube channel. We encourage you to subscribe to receive updates or follow us on Twitter, LinkedIn, Facebook, or Instagram.

[1] “GitHub - HazyResearch/data-centric-ai: Resources for Data-Centric AI”. 2021. https://github.com/HazyResearch/data-centric-ai.
[2] Dunnmon, Jared A., Alexander J. Ratner, Khaled Saab, Nishith Khandwala, Matthew Markert, Hersh Sagreiya, Roger Goldman, et al. 2020. “Cross-Modal Data Programming Enables Rapid Medical Machine Learning”. Patterns 1 (2): 100019. doi:10.1016/j.patter.2020.100019.
[3] “Research Papers”. 2022. https://snorkel.ai/resources/research-papers/.


The Principles of Data-Centric AI Development was originally published at Snorkel AI on January 25, 2022.

Feedback ↓

Sign Up for the Course
`; } else { console.error('Element with id="subscribe" not found within the page with class "home".'); } } }); // Remove duplicate text from articles /* Backup: 09/11/24 function removeDuplicateText() { const elements = document.querySelectorAll('h1, h2, h3, h4, h5, strong'); // Select the desired elements const seenTexts = new Set(); // A set to keep track of seen texts const tagCounters = {}; // Object to track instances of each tag elements.forEach(el => { const tagName = el.tagName.toLowerCase(); // Get the tag name (e.g., 'h1', 'h2', etc.) // Initialize a counter for each tag if not already done if (!tagCounters[tagName]) { tagCounters[tagName] = 0; } // Only process the first 10 elements of each tag type if (tagCounters[tagName] >= 2) { return; // Skip if the number of elements exceeds 10 } const text = el.textContent.trim(); // Get the text content const words = text.split(/\s+/); // Split the text into words if (words.length >= 4) { // Ensure at least 4 words const significantPart = words.slice(0, 5).join(' '); // Get first 5 words for matching // Check if the text (not the tag) has been seen before if (seenTexts.has(significantPart)) { // console.log('Duplicate found, removing:', el); // Log duplicate el.remove(); // Remove duplicate element } else { seenTexts.add(significantPart); // Add the text to the set } } tagCounters[tagName]++; // Increment the counter for this tag }); } removeDuplicateText(); */ // Remove duplicate text from articles function removeDuplicateText() { const elements = document.querySelectorAll('h1, h2, h3, h4, h5, strong'); // Select the desired elements const seenTexts = new Set(); // A set to keep track of seen texts const tagCounters = {}; // Object to track instances of each tag // List of classes to be excluded const excludedClasses = ['medium-author', 'post-widget-title']; elements.forEach(el => { // Skip elements with any of the excluded classes if (excludedClasses.some(cls => el.classList.contains(cls))) { return; // Skip 
this element if it has any of the excluded classes } const tagName = el.tagName.toLowerCase(); // Get the tag name (e.g., 'h1', 'h2', etc.) // Initialize a counter for each tag if not already done if (!tagCounters[tagName]) { tagCounters[tagName] = 0; } // Only process the first 10 elements of each tag type if (tagCounters[tagName] >= 10) { return; // Skip if the number of elements exceeds 10 } const text = el.textContent.trim(); // Get the text content const words = text.split(/\s+/); // Split the text into words if (words.length >= 4) { // Ensure at least 4 words const significantPart = words.slice(0, 5).join(' '); // Get first 5 words for matching // Check if the text (not the tag) has been seen before if (seenTexts.has(significantPart)) { // console.log('Duplicate found, removing:', el); // Log duplicate el.remove(); // Remove duplicate element } else { seenTexts.add(significantPart); // Add the text to the set } } tagCounters[tagName]++; // Increment the counter for this tag }); } removeDuplicateText(); //Remove unnecessary text in blog excerpts document.querySelectorAll('.blog p').forEach(function(paragraph) { // Replace the unwanted text pattern for each paragraph paragraph.innerHTML = paragraph.innerHTML .replace(/Author\(s\): [\w\s]+ Originally published on Towards AI\.?/g, '') // Removes 'Author(s): XYZ Originally published on Towards AI' .replace(/This member-only story is on us\. Upgrade to access all of Medium\./g, ''); // Removes 'This member-only story...' 
}); //Load ionic icons and cache them if ('localStorage' in window && window['localStorage'] !== null) { const cssLink = 'https://code.ionicframework.com/ionicons/2.0.1/css/ionicons.min.css'; const storedCss = localStorage.getItem('ionicons'); if (storedCss) { loadCSS(storedCss); } else { fetch(cssLink).then(response => response.text()).then(css => { localStorage.setItem('ionicons', css); loadCSS(css); }); } } function loadCSS(css) { const style = document.createElement('style'); style.innerHTML = css; document.head.appendChild(style); } //Remove elements from imported content automatically function removeStrongFromHeadings() { const elements = document.querySelectorAll('h1, h2, h3, h4, h5, h6, span'); elements.forEach(el => { const strongTags = el.querySelectorAll('strong'); strongTags.forEach(strongTag => { while (strongTag.firstChild) { strongTag.parentNode.insertBefore(strongTag.firstChild, strongTag); } strongTag.remove(); }); }); } removeStrongFromHeadings(); "use strict"; window.onload = () => { /* //This is an object for each category of subjects and in that there are kewords and link to the keywods let keywordsAndLinks = { //you can add more categories and define their keywords and add a link ds: { keywords: [ //you can add more keywords here they are detected and replaced with achor tag automatically 'data science', 'Data science', 'Data Science', 'data Science', 'DATA SCIENCE', ], //we will replace the linktext with the keyword later on in the code //you can easily change links for each category here //(include class="ml-link" and linktext) link: 'linktext', }, ml: { keywords: [ //Add more keywords 'machine learning', 'Machine learning', 'Machine Learning', 'machine Learning', 'MACHINE LEARNING', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, ai: { keywords: [ 'artificial intelligence', 'Artificial intelligence', 'Artificial Intelligence', 'artificial Intelligence', 'ARTIFICIAL INTELLIGENCE', ], //Change your 
article link (include class="ml-link" and linktext) link: 'linktext', }, nl: { keywords: [ 'NLP', 'nlp', 'natural language processing', 'Natural Language Processing', 'NATURAL LANGUAGE PROCESSING', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, des: { keywords: [ 'data engineering services', 'Data Engineering Services', 'DATA ENGINEERING SERVICES', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, td: { keywords: [ 'training data', 'Training Data', 'training Data', 'TRAINING DATA', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, ias: { keywords: [ 'image annotation services', 'Image annotation services', 'image Annotation services', 'image annotation Services', 'Image Annotation Services', 'IMAGE ANNOTATION SERVICES', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, l: { keywords: [ 'labeling', 'labelling', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, pbp: { keywords: [ 'previous blog posts', 'previous blog post', 'latest', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, mlc: { keywords: [ 'machine learning course', 'machine learning class', ], //Change your article link (include class="ml-link" and linktext) link: 'linktext', }, }; //Articles to skip let articleIdsToSkip = ['post-2651', 'post-3414', 'post-3540']; //keyword with its related achortag is recieved here along with article id function searchAndReplace(keyword, anchorTag, articleId) { //selects the h3 h4 and p tags that are inside of the article let content = document.querySelector(`#${articleId} .entry-content`); //replaces the "linktext" in achor tag with the keyword that will be searched and replaced let newLink = anchorTag.replace('linktext', keyword); //regular expression to search keyword var re = new RegExp('(' + keyword + ')', 'g'); //this replaces 
the keywords in h3 h4 and p tags content with achor tag content.innerHTML = content.innerHTML.replace(re, newLink); } function articleFilter(keyword, anchorTag) { //gets all the articles var articles = document.querySelectorAll('article'); //if its zero or less then there are no articles if (articles.length > 0) { for (let x = 0; x < articles.length; x++) { //articles to skip is an array in which there are ids of articles which should not get effected //if the current article's id is also in that array then do not call search and replace with its data if (!articleIdsToSkip.includes(articles[x].id)) { //search and replace is called on articles which should get effected searchAndReplace(keyword, anchorTag, articles[x].id, key); } else { console.log( `Cannot replace the keywords in article with id ${articles[x].id}` ); } } } else { console.log('No articles found.'); } } let key; //not part of script, added for (key in keywordsAndLinks) { //key is the object in keywords and links object i.e ds, ml, ai for (let i = 0; i < keywordsAndLinks[key].keywords.length; i++) { //keywordsAndLinks[key].keywords is the array of keywords for key (ds, ml, ai) //keywordsAndLinks[key].keywords[i] is the keyword and keywordsAndLinks[key].link is the link //keyword and link is sent to searchreplace where it is then replaced using regular expression and replace function articleFilter( keywordsAndLinks[key].keywords[i], keywordsAndLinks[key].link ); } } function cleanLinks() { // (making smal functions is for DRY) this function gets the links and only keeps the first 2 and from the rest removes the anchor tag and replaces it with its text function removeLinks(links) { if (links.length > 1) { for (let i = 2; i < links.length; i++) { links[i].outerHTML = links[i].textContent; } } } //arrays which will contain all the achor tags found with the class (ds-link, ml-link, ailink) in each article inserted using search and replace let dslinks; let mllinks; let ailinks; let nllinks; let deslinks; let 
tdlinks; let iaslinks; let llinks; let pbplinks; let mlclinks; const content = document.querySelectorAll('article'); //all articles content.forEach((c) => { //to skip the articles with specific ids if (!articleIdsToSkip.includes(c.id)) { //getting all the anchor tags in each article one by one dslinks = document.querySelectorAll(`#${c.id} .entry-content a.ds-link`); mllinks = document.querySelectorAll(`#${c.id} .entry-content a.ml-link`); ailinks = document.querySelectorAll(`#${c.id} .entry-content a.ai-link`); nllinks = document.querySelectorAll(`#${c.id} .entry-content a.ntrl-link`); deslinks = document.querySelectorAll(`#${c.id} .entry-content a.des-link`); tdlinks = document.querySelectorAll(`#${c.id} .entry-content a.td-link`); iaslinks = document.querySelectorAll(`#${c.id} .entry-content a.ias-link`); mlclinks = document.querySelectorAll(`#${c.id} .entry-content a.mlc-link`); llinks = document.querySelectorAll(`#${c.id} .entry-content a.l-link`); pbplinks = document.querySelectorAll(`#${c.id} .entry-content a.pbp-link`); //sending the anchor tags list of each article one by one to remove extra anchor tags removeLinks(dslinks); removeLinks(mllinks); removeLinks(ailinks); removeLinks(nllinks); removeLinks(deslinks); removeLinks(tdlinks); removeLinks(iaslinks); removeLinks(mlclinks); removeLinks(llinks); removeLinks(pbplinks); } }); } //To remove extra achor tags of each category (ds, ml, ai) and only have 2 of each category per article cleanLinks(); */ //Recommended Articles var ctaLinks = [ /* ' ' + '

Subscribe to our AI newsletter!

' + */ '

Take our 85+ lesson From Beginner to Advanced LLM Developer Certification: From choosing a project to deploying a working product this is the most comprehensive and practical LLM course out there!

'+ '

Towards AI has published Building LLMs for Production—our 470+ page guide to mastering LLMs with practical projects and expert insights!

' + '
' + '' + '' + '

Note: Content contains the views of the contributing authors and not Towards AI.
Disclosure: This website may contain sponsored content and affiliate links.
