
Natural Language Processing: Concepts and Workflow

Last Updated on November 3, 2020 by Editorial Team

Author(s): Bala Priya C

Photo by Amador Loureiro on Unsplash

With the huge influx of unstructured text data from social media platforms, forums, and a wealth of documents, distilling the information these sources contain is challenging because of the inherent complexity involved in processing them. Natural Language Processing (NLP) helps greatly in processing, analyzing, and understanding these sources to gain meaningful insights. With recent advances in computing and easier access to computing resources, certain Deep Learning models have achieved state-of-the-art (SOTA) results on some of the most challenging NLP tasks. The NLP series by the Women Who Code Data Science track gives learners a comprehensive learning path, starting from the basics of NLP and gradually introducing advanced concepts such as Deep Learning approaches to NLP tasks. This blog post covers module 1 of the series.

What is NLP?

Natural language processing (NLP) can be considered a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language; in particular, how to program computers to process and analyze large amounts of natural language data. With interesting applications such as text classification, sentiment analysis, machine translation, speech to text, text to speech, and so on, NLP has evolved over the past few decades from rule-based approaches and statistical techniques to the AI-powered applications of the recent past.

Image Source: https://pegus.digital/business-applications-of-nlp/

Interesting use cases of NLP

Let’s take a look at some of the common use cases of NLP.

  • Machine Translation: Machine Translation is the task of automatically converting one natural language into another while preserving the meaning of the input text and producing fluent text in the output language. However, machine translation comes with inherent challenges, such as the ambiguity and diversity of natural language discussed later in this post.
  • Text Classification: Text Classification is the process of assigning tags or categories to text according to its content; It’s a fundamental problem in NLP and can be done either manually (tedious, time-consuming, and susceptible to human error) or by leveraging ML techniques.
  • Sentiment Analysis: Sentiment Analysis is the contextual mining of text which identifies and extracts subjective information in the source text, such as recognizing polarity (positive, negative, neutral), identifying emotions, etc. A typical example is in the e-commerce industry, where mining and analyzing reviews to gain insights on customer satisfaction and experience, and to identify potential areas for improvement, is important. (A minimal polarity-scoring sketch follows this list.)
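To make the sentiment-analysis idea concrete, here is a minimal sketch using NLTK's VADER analyzer; the choice of library and the sample sentence are assumptions for illustration, not part of the original walk-through.

import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Score the polarity of a made-up review; the 'compound' score summarizes
# positive/negative/neutral contributions into a single value in [-1, 1].
sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("The delivery was fast and the product works great!"))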

Virtual assistants such as Siri, Alexa and Cortana; Google Translate, Speech to text and text to speech converters are all cool NLP applications that we use in our everyday lives!

Challenges in understanding natural language

Natural language has such great diversity, and every language has its own rich grammar and uniqueness. The following are some of the inherent challenges that arise in NLP tasks.

Ambiguity

Ambiguity is an intrinsic characteristic of human conversations and is particularly challenging in Natural Language Understanding scenarios, where the same expression may map to several forms that are relevant both in natural language and in the AI system we have programmed. In AI theory, the process of handling ambiguity is called disambiguation.

Synonymity

Synonymity stems from the fact that we can express the same idea with different terms (which are also dependent on the specific context); For example, ‘big’ and ‘large’ have a similar meaning when referring to sizes, whereas ‘large’ doesn’t make sense when used as a qualifier to the word ‘sister.’

Co-reference

Co-reference is the process of finding all expressions that refer to the same entity in a text. Co-reference resolution is an important step for many higher-level NLP tasks that involve natural language understanding and is often instrumental in improving the performance of neural architectures like RNNs and LSTMs.

Syntactic Rules

Knowledge about the structure and syntax of the language is often helpful, and some of the typical parsing techniques for understanding text syntax are described below.

  • POS Tagging: Parts of speech (POS) are specific lexical categories to which words are assigned, based on their role and context in a given sentence.
Illustration of POS tagging (Image Source)

For example, in the example sentence above, “The brown fox is quick, and he is jumping over the lazy dog,” the abbreviations denote the following parts of speech: DET: Determiner, ADJ: Adjective, N: Noun, V: Verb, CONJ: Conjunction (coordinating), PRON: Pronoun, ADV: Adverb. (A minimal tagging sketch follows this list of parsing techniques.)

  • Shallow Parsing/Chunking: Shallow parsing, also known as chunking, is a method of analyzing the structure of a sentence and breaking it down into its smallest constituents, which usually are tokens such as words, and then grouping them together into phrases.
Example of Shallow Parsing/Chunking (Image Source)
  • Constituency Parsing: Constituency parsing aims to extract a constituency-based parse tree from a sentence. The parse tree represents its syntactic structure according to a phrase structure grammar.
Example of Constituency Parsing (Image Source)
  • Dependency Parsing: Dependency parsing is the task of analyzing the grammatical structure of a sentence by establishing relationships between “head” words and the words which modify those heads.
Example of Dependency Parsing (Image Source)
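As a minimal illustration of the POS tagging step above, here is a sketch using NLTK's default tagger (an assumption for this example). Note that NLTK returns Penn Treebank tags such as DT, JJ, and NN rather than the coarse labels used in the illustration.

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

sentence = "The brown fox is quick, and he is jumping over the lazy dog"
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# e.g., [('The', 'DT'), ('brown', 'JJ'), ('fox', 'NN'), ('is', 'VBZ'), ...]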

Generic NLP Workflow

Standard NLP Workflow (Image Source)

The standard workflow for an NLP problem comprises the above-shown steps. The first step is usually text wrangling and pre-processing on the corpus of documents, followed by parsing and basic exploratory data analysis. As the next step, we look at representing text with word embeddings and subsequent feature engineering, followed by choosing the model depending on whether we’re looking at a supervised/unsupervised learning problem. As with any ML workflow, the final stage involves model evaluation and deployment. This module covers the initial steps of text pre-processing and EDA.

Text pre-processing and Exploratory Data Analysis (EDA)

Significance of EDA

Exploratory Data Analysis (EDA) is the process of exploring data, generating insights, verifying assumptions, and revealing underlying hidden patterns in the data. Through EDA, we can get a basic description of the data, visualize it, and identify patterns and potential challenges in using it.

Text preprocessing

  • Contraction Mapping/Expanding Contractions: Contractions are a shortened version of words or a group of words, quite common in both spoken and written language. In English they are frequent, such as “I will” to “I’ll”, “I have” to “I’ve”, “do not” to “don’t”, etc. Mapping these contractions to their expanded form helps in text standardization.
  • Tokenization: Tokenization is the process of separating a piece of text into smaller units called tokens. Given a document, tokens can be sentences, words, subwords, or even characters depending on the application.
  • Noise cleaning: Special characters and symbols contribute to extra noise in unstructured text. Using regular expressions to remove them or using tokenizers, which do the pre-processing step of removing punctuation marks and other special characters, is recommended.
  • Spell-checking: Documents in a corpus are prone to spelling errors; In order to make the text clean for the subsequent processing, it is a good practice to run a spell checker and fix the spelling errors before moving on to the next steps.
  • Stopwords Removal: Stop words are those words which are very common and often less significant. Hence, removing these is a pre-processing step as well. This can be done explicitly by retaining only those words in the document which are not in the list of stop words or by specifying the stop word list as an argument in CountVectorizer or TfidfVectorizer methods when getting Bag-of-Words(BoW)/TF-IDF scores for the corpus of text documents.
  • Stemming/Lemmatization: Both stemming and lemmatization are methods to reduce words to their base form. While stemming follows certain rules to truncate words to their base form, often resulting in words that are not lexicographically correct, lemmatization always results in base forms that are lexicographically correct. However, stemming is a lot faster than lemmatization. Hence, whether to stem or lemmatize depends on whether the application needs quick pre-processing or more accurate base forms. (A small comparison sketch follows this list.)
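To make the stop-word and stemming/lemmatization points concrete, here is a minimal sketch assuming NLTK and scikit-learn are installed; the toy corpus is made up for illustration.

import nltk
nltk.download('wordnet')
from nltk.stem import PorterStemmer, WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

# Stemming truncates by rule; lemmatization returns a dictionary word.
print(PorterStemmer().stem("studies"))            # 'studi'
print(WordNetLemmatizer().lemmatize("studies"))   # 'study'

# Common English stop words can be dropped directly when computing TF-IDF scores.
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(["this is a tiny toy corpus", "stop words are removed here"])
print(X.shape)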

Implementation

Let’s walk through the steps of EDA and text pre-processing in Google Colab.

Dataset

The dataset used is the SMS Spam Collection Data Set, a public collection of labeled SMS messages that have been collected for mobile phone spam research, available in the UCI ML repository.

Basic EDA

Let’s first load in the data using the pandas library. Setting header=None ensures that the first row of our data will not be interpreted as the column names of the data frame. Let’s also call the head() method on the dataframe to check the first few records of our data.

import pandas as pd
sms = pd.read_table('/content/SMSSpamCollection', header=None)
sms.head()

We can use the describe() function to obtain various summary statistics that exclude NaN values.

sms.describe()

From the outputs above, we have a collection of 5,572 SMS messages in English, serving as training examples. The first column is the target variable containing the class labels, which tells us whether a message is spam or ham (not spam). The second column is the SMS message itself, stored as a string. Since the target variable contains discrete values, this is a classification task. Let’s start by placing the target variable in its own table and checking how the two classes are distributed.

y = sms[0]
y.value_counts()

# Output
# ham     4825
# spam     747
# Name: 0, dtype: int64

We see that there are far fewer training examples for spam than for ham. This is a typical class imbalance problem and should be accounted for in the subsequent analysis.

We need to encode the class labels in the target variable as numbers to ensure compatibility with some models in scikit-learn. Let’s use LabelEncoder and set ‘spam’ = 1 and ‘ham’ = 0. LabelEncoder is part of scikit-learn’s preprocessing utilities.

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y_enc = le.fit_transform(y)
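As a quick sanity check using the encoder fitted above, the learned classes and the mapping back to strings can be inspected:

print(le.classes_)                      # ['ham' 'spam'] -> 'ham' is encoded as 0, 'spam' as 1
print(y_enc[:5])                        # the first few encoded labels
print(le.inverse_transform(y_enc[:5]))  # map the integers back to the original strings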

We now store all the SMS text data in raw_text. The isnull() method of pandas can be used to gain insights on missing values. The given dataset, however, is complete, and there are no missing values.
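A minimal sketch of those two steps; the column index assumes the unnamed two-column frame loaded earlier.

raw_text = sms[1]            # column 1 (the second, unnamed column) holds the SMS message strings
print(sms.isnull().sum())    # per-column count of missing values; expected to be 0 for this dataset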

Basic Visualization

There are a couple of basic visualizations we can do. The first is to display the length of all the records. To do this, we must first label the columns with their appropriate titles and add a column to the dataframe that contains the length.

import matplotlib.pyplot as plt
import seaborn as sns
sms.columns=['label', 'msg']
sms["length"] = sms["msg"].apply(len)
sms.head()
sns.distplot(sms["length"], kde=False)
Plot showing the distribution of message lengths
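As an optional follow-up sketch (assuming the 'label' and 'length' columns created above; distplot is deprecated in newer seaborn releases, where histplot is the equivalent), the two classes can be plotted separately to see whether spam messages tend to be longer:

sns.distplot(sms[sms["label"] == "ham"]["length"], kde=False, label="ham")
sns.distplot(sms[sms["label"] == "spam"]["length"], kde=False, label="spam")
plt.legend()
plt.show()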

Text Pre-processing

Let’s now apply the pre-processing steps discussed above to our dataset.

Step 1: Contraction Mapping

Let’s install and import the contractions library and apply it to the message strings to expand contractions, if any.

!pip install contractions
import contractions

# Add a new column to our dataframe called "no_contract";
# apply a lambda function to the "msg" field which expands contractions.
sms['no_contract'] = sms['msg'].apply(lambda x: [contractions.fix(word) for word in x.split()])

Since the text with contractions expanded should be tokenized separately, let’s convert the lists back to strings and examine the dataframe again.

sms["msg_str"] = [' '.join(map(str, l)) for l in sms['no_contract']]
sms.head()
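A quick sanity check on a single made-up string, independent of the dataframe:

print(contractions.fix("I'll be there, don't worry"))
# expected: "I will be there, do not worry"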

Step 2: Tokenization

As discussed above, to tokenize the document into words, let’s install and import the NLTK library and apply the word_tokenize() method on each of the message strings.

import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

sms['tokenized'] = sms['msg_str'].apply(word_tokenize)
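For intuition, here is word_tokenize applied to one made-up, spam-like string (punkt must already be downloaded, as above):

print(word_tokenize("Free entry in 2 a weekly comp! Text WIN to claim."))
# -> ['Free', 'entry', 'in', '2', 'a', 'weekly', 'comp', '!', 'Text', 'WIN', 'to', 'claim', '.']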

Step 3: Noise Cleaning - removing special characters and punctuation, and lowercasing the text.

Let’s convert all text to lower case and subsequently remove punctuation.

sms['lower'] = sms['tokenized'].apply(lambda x: [word.lower() for word in x])

import string
punc = string.punctuation
sms['no_punc'] = sms['lower'].apply(lambda x: [word for word in x if word not in punc])
sms.head()
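Note that the membership check above only drops tokens that are a single punctuation character; punctuation attached to a word (e.g., 'win!!') survives. A regex-based alternative could strip punctuation inside tokens as well; the no_punc_re column name below is just an illustrative choice, not part of the original walk-through.

import re

sms['no_punc_re'] = sms['lower'].apply(
    lambda x: [re.sub(r'[^\w\s]', '', word) for word in x if re.sub(r'[^\w\s]', '', word)]
)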

Step 4: Spell Checking

Let’s go over a simple example of using pyspellchecker, a library for determining if a word is misspelled and what the likely correct spelling would be based on word frequency.

!pip install pyspellchecker

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))
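If the corrections were to be applied to the whole corpus, a sketch could look like the following; the correct_tokens helper and the spell_checked column are illustrative assumptions, and since pyspellchecker works word by word, this pass can be slow on thousands of messages.

def correct_tokens(tokens):
    # Replace only the tokens the checker flags as unknown; keep the rest unchanged.
    unknown = spell.unknown(tokens)
    return [spell.correction(w) if w in unknown else w for w in tokens]

# sms['spell_checked'] = sms['no_punc'].apply(correct_tokens)  # optional: slow on the full dataset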

Step 5: Identifying and removing stopwords

To identify and remove stopwords, we need to import the NLTK stopwords corpus, set our stopword list to English, and then retain only those words in our text that are not in the list.

nltk.download('stopwords')
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))

sms['stopwords_removed'] = sms['no_punc'].apply(lambda x: [word for word in x if word not in stop_words])
sms.head()

Step 6: POS tagging and Lemmatization

To apply lemmatization to our data, we first have to apply part-of-speech tags; in other words, determine the part of speech for each word.

nltk.download('averaged_perceptron_tagger')
sms['pos_tags'] = sms['stopwords_removed'].apply(nltk.tag.pos_tag)

NLTK’s lemmatizer requires POS tags to be converted to wordnet’s format. We’ll write a function that makes the conversion.

nltk.download('wordnet') 
from nltk.corpus import wordnet

def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN

sms['wordnet_pos'] = sms['pos_tags'].apply(lambda x: [(word, get_wordnet_pos(pos_tag)) for (word, pos_tag) in x])
sms.head()

We may now call WordNetLemmatizer on the POS-tagged data. The lemmatize method takes two parameters: the word and its tag, in wordnet form.

from nltk.stem import WordNetLemmatizer
wnl = WordNetLemmatizer()
sms['lemmatized'] = sms['wordnet_pos'].apply(lambda x: [wnl.lemmatize(word, tag) for word, tag in x])
sms.head()
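As a small follow-up (not part of the original walk-through; lemma_str is an illustrative column name), the lemmatized tokens can be joined back into strings so they can later be fed to vectorizers such as CountVectorizer or TfidfVectorizer:

sms['lemma_str'] = sms['lemmatized'].apply(lambda x: ' '.join(x))
sms.head()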

Now that we’ve pre-processed our text data so that it can be used for further steps in the pipeline, let’s save the cleaned data to a CSV file.

sms.to_csv('sms_spam_collection.csv')

The Google Colab notebook for the above implementation can be found in this repo.

The recording of the webinar can be found on YouTube.
