Data Scraping in the Spotlight: Are Language Models Overstepping by Training on Everyone’s Content?

Last Updated on July 24, 2023 by Editorial Team

Author(s): Viggy Balagopalakrishnan

Originally published on Towards AI.

While scraping enabled models to get where they are, cleanly sourced data is going to become more and more important

As I wrapped up the research for this piece and was about to start writing, OpenAI had a perfect announcement to go with it — they are temporarily disabling the “Browse with Bing” feature on ChatGPT. If you haven’t used it before, this is a feature available to paying Plus users. Plus gives you access to primarily two things:

  • Browse with Bing — By default, ChatGPT does not connect to real-time website data (eg. if you ask it what are upcoming Marvel movies in 2023, it won’t give you an answer because its training data stops at Sept 2021). Browse with Bing goes beyond this limitation by leveraging real-time information from sites across the web, which OpenAI now gets access to given their partnership with Microsoft Bing
  • Plugins — These are integrations built into ChatGPT by independent companies to expose their capabilities through ChatGPT’s UI (eg. OpenTable lets you search for restaurant reservations, Kayak lets you search for flights from within ChatGPT if you use their plugins); These are experimental at this point and are “cool” features but users haven’t really found them useful (yet).

Therefore, Browse with Bing is particularly important for ChatGPT because its biggest competitor Google Bard has the ability to use real-time data from Google Search. See example responses from ChatGPT vs Bard for the Marvel movies in 2023:

ChatGPT 3.5 (left) vs Google Bard (right) for real-time information (Source: created by author)

So, you can see why it’s non-trivial for OpenAI to disable Browse with Bing (even temporarily). The reasoning is what’s interesting:

We have learned that the ChatGPT Browse beta can occasionally display content in ways we don’t want. For example, if a user specifically asks for a URL’s full text, it might inadvertently fulfill this request. As of July 3, 2023, we’ve disabled the Browse with Bing beta feature out of an abundance of caution while we fix this in order to do right by content owners. We are working to bring the beta back as quickly as possible, and appreciate your understanding!

It’s interesting because it brings a larger issue into the spotlight: companies like OpenAI and Google are using large amounts of data to train their models, but it’s unclear whether they have permission to use this data and how they are compensating creators and content platforms for it.

In this article, we’ll unpack a few things:

  1. What are Large Language Models (LLMs) and why do they need data?
  2. Where are they getting this data from?
  3. Why should companies like OpenAI, Google care about how they source data?
  4. What strategies are content platforms adopting to respond to this?

At the end of the article, you will hopefully walk away with a fuller picture of this rapidly evolving topic. Let’s dive in.

What are Large Language Models and why do they need data?

We’ll start with a simple explainer of how Machine Learning models work. Let’s say you want to predict how late your upcoming flight’s arrival will be. A very basic version can be human guesswork (eg. if the weather sucks or if the airline sucks, it’s likely late). If you want to make that more reliable, you can take real data on flight arrival times and pattern match it against various factors (eg. how arrival times relate to airline, destination airport, temperature, rainfall etc.).

Now you can take this one step further, use the data and create a math equation to predict this. For example: Delay minutes = A * airline reliability score + B * busy-ness of an airport + C * amount of rainfall. How do you calculate A, B, C? By using the large volume of past arrival time data you have and doing some math on it.

This equation in math terms is called a “regression” and is one of the most commonly used basic machine learning models. Note that the model is basically a math formula comprising “features” (eg. airline reliability score, busy-ness of an airport, amount of rainfall) and “weights” (eg. A, B, C, which show how much each feature contributes to the prediction).
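To make the "doing some math on it" step concrete, here is a minimal sketch of fitting A, B and C with ordinary least squares, using only the Python standard library. Everything in it is an assumption for illustration: the flight history is made up, the function names (`fit_weights`, `predict_delay`, `solve_linear_system`) are hypothetical, and the delays are generated from known weights so the fit can be checked. Real models use far more data and far more features.

```python
# Minimal sketch: recover the weights A, B, C of
#   delay = A * airline_score + B * airport_busyness + C * rainfall
# from past flights. All numbers below are made up for illustration.


def solve_linear_system(matrix, vector):
    """Solve matrix @ x = vector with Gaussian elimination and partial pivoting."""
    n = len(vector)
    # Build an augmented matrix so row operations update both sides together.
    aug = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest pivot to keep the arithmetic stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution, from the last unknown to the first.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def fit_weights(features, targets):
    """Least-squares fit via the normal equations (X^T X) w = X^T y."""
    n_features = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(n_features)]
           for i in range(n_features)]
    xty = [sum(row[i] * y for row, y in zip(features, targets))
           for i in range(n_features)]
    return solve_linear_system(xtx, xty)


def predict_delay(weights, flight):
    """Apply the formula: sum of weight * feature."""
    return sum(w * x for w, x in zip(weights, flight))


# Hypothetical history: (airline reliability score, airport busyness, rainfall).
past_flights = [
    (1.0, 2.0, 0.0),
    (3.0, 1.0, 5.0),
    (2.0, 4.0, 1.0),
    (5.0, 3.0, 2.0),
    (4.0, 5.0, 8.0),
]
# Delays generated from known weights A=-2, B=5, C=1.5 so the fit can be checked.
true_weights = (-2.0, 5.0, 1.5)
past_delays = [predict_delay(true_weights, f) for f in past_flights]

weights = fit_weights(past_flights, past_delays)
print("Estimated A, B, C:", [round(w, 3) for w in weights])
print("Predicted delay:", round(predict_delay(weights, (2.0, 3.0, 4.0)), 1))
```

Because the made-up delays contain no noise, the fit recovers the weights almost exactly. With real flight data the estimates would only approximate the underlying relationship, which is where having more data helps.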

The same concept can be extended to other more complex models, like “neural networks” (which you might have heard of in the context of deep learning) or Large Language Models (often abbreviated to LLMs, these are the underlying models for text-based AI products such as Google Search, ChatGPT and Google Bard).

We won’t go into too much detail, but each of these models, including LLMs, is a combination of “features” and “weights”. The most performant models have the best combination of features and weights, and the way to get to that combination is through training with a TON of data. Generally, the more data you have, the more performant the model tends to be. Therefore, having a massive volume of data is critical, and companies that train these models need to source this data.

Where are they getting this data from?

Data sources can be broadly categorized into:

  1. Open Source Data: These are high volume data sources that are typically available for commercial purposes, including LLM training. Examples of large open source data include Wikipedia, CommonCrawl (an open repository of web crawl data), Project Gutenberg (free eBooks), BookCorpus (free books written by unpublished authors) to name a few.
  2. Independent Content Websites: These include a broad set of websites such as news publications (think Washington Post, the Guardian), creator-specific platforms (think Kickstarter, Patreon, Medium) and user-generated content platforms (think Reddit, Twitter). These typically have more restrictive policies when it comes to scraping their content, especially if it’s used for commercial purposes.

In an ideal world, LLM companies would explicitly list out all the data sources they have used or scraped, and do so in compliance with the policies of whoever owns the content. However, several of them have been non-transparent about it, the biggest offender being OpenAI (maker of ChatGPT). Google published one dataset it used for training, called C4. The Washington Post put together a neat analysis of this data; here are the top 30 sources based on their analysis:

Source: Washington Post’s analysis of top data sources going into Google’s model training

Most of this data was acquired from scraping and content platforms contend that this data was scraped in violation of their terms of use. They are clearly unhappy about it, especially given the amount of upside the LLM companies are able to capture from the data.
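As a rough illustration of what respecting a site's stated crawl policy can look like in code, the sketch below checks a `robots.txt` file with Python's standard `urllib.robotparser` module. Note the assumptions: the article itself discusses terms of use, not `robots.txt`, and passing a `robots.txt` check does not by itself mean a site's terms of use or copyright are satisfied. The crawler names (`ExampleLLMBot`, `OtherBot`), the `example.com` URLs, the policy text and the helper `is_allowed` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is blocked from the whole site,
# everyone else is blocked only from /private/.
robots_txt = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_text, user_agent, url):
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network call is needed.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("ExampleLLMBot", "OtherBot"):
    for url in ("https://example.com/articles/1", "https://example.com/private/x"):
        print(agent, url, is_allowed(robots_txt, agent, url))
```

In a real crawler the policy would be fetched from the site itself (for example with `set_url` and `read` on the same parser class) before any page is requested.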

Why should companies like OpenAI, Google care about how they source data?

Okay, content providers are complaining. So what? Should companies with LLM products care about this, besides wanting to be “fair” out of the goodness of their hearts?

Data sourcing is becoming increasingly critical for two major reasons.

Legal Complications: Companies developing LLMs are starting to find themselves embroiled in lawsuits from content creators and publishers who believe their data was used without permission. Legal battles can be costly and tarnish the reputation of the companies involved. Case in point:

  1. Microsoft, GitHub, and OpenAI are being sued for allegedly violating copyright law by reproducing open-source code using AI
  2. Getty Images sues AI art generator Stable Diffusion
  3. AI art tools Stable Diffusion and Midjourney targeted with copyright lawsuit

[side note: Stable Diffusion and Midjourney are AI image generators, not language generators, and are therefore not “LLMs”, but the principles of what constitutes a model and how it is trained are the same]

Making headway with Enterprise Customers: Enterprise customers employing LLMs or their derivatives need to be assured of the legitimacy of the training data. They do not want to face legal challenges due to the data sourcing practices of the LLMs they use, especially if they cannot pass on the liability of those lawsuits to the LLM providers.

Can you really build effective models with all of these messy data sourcing constraints? That’s a fair question. A masterclass in applying these principles is the recent announcement of Adobe Firefly (it’s a cool product and in open beta, you can play around with it) — the product has a wide set of features including Text to image, i.e. you can type a line of text and it will generate an image for you.

Adobe Firefly features (Source: Adobe website)

What makes Firefly a great example is:

  • Adobe only uses images that are part of Adobe Stock that they already have the licenses for, plus open source images that are not license restricted. In addition, they have also announced that they want to build generative AI in a way that enables creators to monetize their talents and that they will announce a compensation model for Adobe Stock contributors once Firefly is out of beta
  • Adobe will indemnify its customers for Firefly outputs (starting with the text to image feature). If you haven’t heard the term “indemnify” before, in simple terms, Adobe is saying they are confident they have cleanly sourced the data going into their models and are therefore willing to cover any litigation that might come up if someone sues an Adobe customer for using Firefly output.

One criticism of the clean data sourcing approach has been that it will hurt the quality of output generated by the models. The opposite side of that argument is that high quality data owned by content providers can provide better quality input to model training (garbage in, garbage out is real when it comes to model training). In the image below, left is an output from Adobe Firefly, right is from OpenAI’s Dall-E. If you compare the two, they are quite similar and Firefly’s output is arguably more realistic, which suggests that high quality generative models can be built off of just cleanly sourced data.

Adobe Firefly output (left) vs Dall-E output (right) (Source: created by author)

What strategies are content platforms adopting to respond to this?

Several companies that have large volumes of content have come out strongly expressing that they intend to charge AI companies for using their data. It’s important to note that most of them have not taken an anti-AI stance (i.e. they are not saying AI is going to take over our business, so we are shutting down access to content). They are mostly pushing for a commercial construct that defines how access to this data will occur and how they will get compensated for it.

StackOverflow, arguably the most popular forum that programmers use when they need help, plans to begin charging large AI developers for access to the 50 million questions and answers on its service. StackOverflow CEO Prashanth Chandrasekar laid out some reasonable arguments:

  • The additional revenue will be vital to ensuring StackOverflow can keep attracting users and maintaining high-quality information, which will also help future chatbots by generating new knowledge on the platform
  • StackOverflow will continue to license data for free to some people and companies, and is only looking to charge companies developing LLMs for commercial purposes
  • He argues that LLM developers are violating Stack Overflow’s terms of service, which he believes falls under a Creative Commons license that requires anyone later using the data to mention where it came from (which LLMs don’t do)

Reddit came out with a similar announcement (alongside their controversial changes to API pricing that shut down several third party apps). Reddit CEO Steve Huffman told the Times “The Reddit corpus of data is really valuable but we don’t need to give all of that value to some of the largest companies in the world for free”.

Twitter stopped free access to their APIs earlier this year, and also announced a recent change that limits the number of tweets a user can see in a day, in an attempt to prevent unauthorized scraping of data. Though the execution and rollout of the policies leave much to be desired, the intent is clear that they do not intend to provide free data access for commercial purposes.

Another group that has come out with a united front and critique of LLMs is news organizations. The News/Media Alliance (NMA), which represents publishers in print and digital media in the US, has published what they are calling AI principles. While there isn’t much tactical detail here, the message they are trying to get across is clear:

GAI (Generative AI) developers and deployers should not use publisher IP without permission, and publishers should have the right to negotiate for fair compensation for use of their IP by these developers.

Negotiating written, formal agreements is therefore necessary.

The fair use doctrine does not justify the unauthorized use of publisher content, archives and databases for and by GAI systems. Any previous or existing use of such content without express permission is a violation of copyright law.

Again, their argument is not to shut these systems down but to have commercial agreements in place so that this data is used in compliance with copyright law. They also argue that compensation frameworks (for example, licensing) already exist in the market today and therefore will not slow innovation.

Conclusion

This is just the beginning. Platforms with high volumes of content are likely to seek compensation for their data. Even companies that have not yet announced this intent but already have other forms of data licensing programs (eg. LinkedIn, Foursquare, Reuters) are likely to adapt them for AI/LLM companies.

Though this development may seem like a hindrance to innovation, it is a necessary step for the long-term sustainability of content platforms. By ensuring they are compensated fairly, content creators can continue to produce quality content, which in turn will feed into making LLMs more effective.

Thank you for reading! If you liked this piece, do consider subscribing to the Unpacked newsletter where I publish weekly in-depth analyses of current tech and business topics. You can also follow me on Twitter @viggybala. Best, Viggy.

Published via Towards AI
