Demystifying AI for everyone: Part 1 - NLP Basics

Last Updated on July 19, 2023 by Editorial Team

Author(s): Himanshu Joshi

Originally published on Towards AI.

In the age of ChatGPT, let's start with the basics

Over the years, we humans have devised ways to communicate effectively with each other. One of the most widely used is speech. We speak with each other in various languages, Ex:- English, German, French, Hindi, etc.

Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that helps computers understand and process human language.

We use NLP to build language models so that machines can work with human language. Ex:- GPT-3 is the third generation of OpenAI’s Generative Pre-trained Transformer language models, and ChatGPT is built on models from the same family.

But hey, why do we even care about learning NLP??

That’s because, knowingly or unknowingly, we all use NLP in our day-to-day lives.

Have you ever wondered how we get those auto-correction suggestions while typing messages, or how Google Lens reads the words written in an image?

Features like these are powered, at least in part, by NLP. So let's see a few use cases.

Natural Language Processing (NLP) use cases:

Sentiment Analysis: This is the process of understanding the sentiment of the person speaking/writing.

Ex:- Analysis of tweets/reviews of customers to understand what they feel about a company’s products.

Document Summarization: This is used to summarize large blocks of text.

Ex:- A book summary or a summary of customer feedback.

Language Translation: Translate text from one language to another.

Ex:- English to Japanese or vice versa.

Speech-to-text & Text-to-speech: These are used to convert audio into text, or text into audio. The transcribed text can then be fed to computers for further processing.

Ex:- Amazon Alexa

There are many other use cases, but I hope these few give you the gist.

So in this article, let's touch upon how machines understand text data:-

Computers understand only binary information: 1s and 0s. In short, they work with numerical information.

Hence, we need to first convert text data to numerical format so that we can feed it into various NLP machine learning models for the above-mentioned use cases.

But even before we convert text to numbers, we need to clean the text data and structure it in the proper format.

Following are the steps that are generally used in the text preprocessing pipeline (some steps can be omitted based on the context of the problem):-

Remove white spaces (extra spaces in the text, usually present due to formatting issues)
Remove punctuation
Remove numbers
Remove stop words (common words that carry little information because they appear in almost all documents, Ex:- a, an, of, the, etc.)
Remove symbols (Ex:- @, <, $, %, etc.)
Lowercase all words
Perform stemming/lemmatization on all words (Ex:- Runs, Running, Run all become run)
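To make the steps above concrete, here is a minimal sketch in plain Python. Treat it as an illustration and not as the way any particular library does it: the function names `preprocess` and `naive_stem` are made up for this example, the stop-word list is deliberately tiny, the stemmer is a toy that only strips a few suffixes, and the regular expression assumes plain English (ASCII) text.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "of", "the", "is", "and", "to", "in"}


def naive_stem(word):
    """Toy stemmer: strips a few common suffixes. Not a real stemming algorithm."""
    for suffix in ("ning", "ing", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply the cleaning steps listed above and return the cleaned text."""
    text = text.lower()                      # lowercase all words
    text = re.sub(r"\d+", " ", text)         # remove numbers
    text = re.sub(r"[^a-z\s]", " ", text)    # remove punctuation and symbols
    words = text.split()                     # splitting also drops extra white space
    words = [w for w in words if w not in STOP_WORDS]  # remove stop words
    words = [naive_stem(w) for w in words]   # crude stand-in for stemming
    return " ".join(words)


cleaned = preprocess("The   runner RUNS 5 miles, running @ $3 a day!")
print(cleaned)  # runner run mile run day
```

Notice that "RUNS" and "running" both end up as "run", which is the whole point of the stemming step. A real stemmer or lemmatizer handles far more cases than this toy version.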

As I mentioned earlier, this is just an example of a standard preprocessing pipeline; it should be customized on a project-by-project basis.

After this, we need to tokenise the documents. Tokenisation is the process of breaking up text documents into smaller units called tokens, most commonly words.
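Here is a small sketch of word-level tokenisation using only Python's standard library. The `tokenize` function and the sample sentences are made up for illustration, and the pattern used (runs of letters, digits, or apostrophes) is just one simple choice; real tokenizers handle many more edge cases.

```python
import re


def tokenize(document):
    """Split a document into word tokens (runs of letters, digits, or apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", document)


documents = [
    "NLP helps computers process language",
    "Computers only understand numbers",
]

# One list of tokens per document.
tokenized = [tokenize(doc) for doc in documents]
print(tokenized[0])  # ['NLP', 'helps', 'computers', 'process', 'language']
```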

So now our input data can be laid out as a table in which every word becomes one column and every document (sentence) is a row.

This input is then used for vectorization.

Vectorization is nothing but converting words into vectors (lists of numbers) so that computers can work with them.
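As a rough illustration, one of the simplest ways to vectorize is to count how often each word appears in each document. The sketch below does this with plain Python; the helper names `build_vocabulary` and `count_vectors` and the sample documents are assumptions made for this example. Each row of the result is one document and each column is one word.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Collect every distinct word across all documents, in sorted order."""
    return sorted({token for doc in tokenized_docs for token in doc})


def count_vectors(tokenized_docs, vocabulary):
    """Turn each document into a list of word counts, one entry per vocabulary word."""
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vectors


# Two already-tokenized documents (hypothetical sample data).
docs = [["nlp", "is", "fun"], ["nlp", "is", "nlp"]]

vocab = build_vocabulary(docs)
vectors = count_vectors(docs, vocab)

print(vocab)    # ['fun', 'is', 'nlp']
print(vectors)  # [[1, 1, 1], [0, 1, 2]]
```

The second document never mentions "fun", so its first column is 0, and it mentions "nlp" twice, so its last column is 2.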

And voila, you have understood the basics, or I might even say, the core of NLP.

There are many Vectorization techniques:-

Bag of Words (BOW)
TF-IDF
Word Embeddings

This is a topic that will require a whole article, so I will cover this in the next article.

Hope you enjoyed this post; I have tried to explain it in a very simple manner.

All the above-mentioned steps are handled by NLP libraries such as NLTK, spaCy, and scikit-learn, so you rarely need to code them from scratch.

I remember when I first started learning NLP, I had a fear of everything. But when I actually started taking an interest, it was very easy.

Just try to keep learning and take small steps towards NLP. I promise nothing is difficult if you are willing to apply yourself.

All the best in your journey. Onwards and Upwards people…


Published via Towards AI
