Demystifying AI for everyone: Part 1 - NLP Basics

Last Updated on July 19, 2023 by Editorial Team

Author(s): Himanshu Joshi

Originally published on Towards AI.

In the age of ChatGPT, let's start with the basics

Over the years, we humans have devised ways to communicate effectively with each other. One of the most widely used is speech: we speak with each other in various languages, such as English, German, French, and Hindi.

Natural Language Processing (NLP) is the part of Artificial Intelligence (AI) that helps computers understand and process human language.

Just as we use languages to understand each other, we use NLP to build language models so that machines can understand us. Ex:- GPT-3 is the third generation of OpenAI's Generative Pre-trained Transformer language models, and ChatGPT is a chatbot built on models from that family.

But hey, why do we even care about learning NLP?

That's because, knowingly or unknowingly, we all use NLP in our day-to-day lives.

Have you ever wondered how we get those auto-correction suggestions while typing messages, or how Google Lens reads the words written in an image?

Both rely, at least in part, on NLP. So let's look at a few use cases.

Natural Language Processing (NLP) use cases:

Sentiment Analysis: This is the process of identifying the sentiment (for example positive, negative, or neutral) expressed by the person speaking or writing.

Ex:- Analyzing customers' tweets or reviews to understand how they feel about a company's products.

Document Summarization: This is used to summarize large blocks of text.

Ex:- A book summary or a summary of customer feedback.

Language Translation: This is used to translate text from one language to another.

Ex:- English to Japanese or vice versa.

Speech-to-text & Text-to-speech:- These are used to convert audio into text, or text into audio. The transcribed text can then be fed to computers for further processing.

Ex:- Amazon Alexa

There are many other use cases, but I hope this gives you the gist of a few.

So in this article, let's touch upon how machines understand text data:-

Computers understand only binary information, that is, 1s and 0s. In short, they understand numerical information.

Hence, we need to first convert text data to numerical format so that we can feed it into various NLP machine learning models for the above-mentioned use cases.
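As a small illustration of the binary point above, here is a minimal sketch of how a short string is stored as numbers and bits. The sample text "Hi" is just a made-up example, and these character codes say nothing about meaning, which is why NLP models need a more useful numerical representation.

```python
# Sketch: how the computer already stores a short piece of text as numbers.
text = "Hi"  # made-up sample text

# Each character is stored as one or more bytes (numbers from 0 to 255).
codes = list(text.encode("utf-8"))

# The same numbers written out as 8-bit binary strings.
bits = [format(code, "08b") for code in codes]

print(codes)  # [72, 105]
print(bits)   # ['01001000', '01101001']
```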

But even before we convert text to numbers, we need to clean the text data and structure it in the proper format.

The following steps are generally used in a text preprocessing pipeline (some steps can be omitted depending on the context of the problem):-

  • Remove extra white space (additional spaces in the text, usually present due to formatting issues)
  • Remove punctuation
  • Remove numbers
  • Remove stop words (common words which won't give much information as they are present in almost all documents Ex:- a, an, of, the, etc…)
  • Remove symbols (Ex:- @, <, $, %, etc…)
  • Lowercase all words
  • Perform stemming/lemmatization on all words (Ex:- Runs, Running, Run all become run)

As I mentioned earlier, this is just an example of a standard, general preprocessing pipeline; it should be customized on a project-by-project basis.
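The steps listed above can be sketched in plain Python using only the standard library. This is a rough illustration, not a production pipeline: the STOP_WORDS set is a hypothetical, deliberately tiny list, and naive_stem is a crude made-up helper standing in for a real stemmer or lemmatizer.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"a", "an", "of", "the", "is", "and", "in", "to"}


def naive_stem(word):
    """Crude suffix stripping, for illustration only (not a real stemmer)."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if word[-1] == word[-2]:  # collapse a doubled letter: "runn" -> "run"
            word = word[:-1]
    elif word.endswith("s") and len(word) > 3:
        word = word[:-1]
    return word


def preprocess(text):
    text = text.lower()                # lowercase all words
    text = re.sub(r"\d+", " ", text)   # remove numbers
    # Replace punctuation and symbols (@, <, $, %, ...) with spaces.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    words = text.split()               # split() also drops extra white space
    words = [w for w in words if w not in STOP_WORDS]  # remove stop words
    return [naive_stem(w) for w in words]              # crude stemming


print(preprocess("Runs, Running, Run"))  # ['run', 'run', 'run']
```

The order of the steps is a design choice in this sketch: lowercasing first means the stop-word check does not need to worry about capitalised words such as "The".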

After this, we need to tokenise the documents. Tokenisation is the process of breaking up text documents into smaller units called tokens, typically individual words.
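A minimal sketch of word-level tokenisation, assuming a simple regular expression is good enough for the text at hand. The tokenize function name is made up for this example; libraries offer more careful tokenisers that handle contractions, URLs, and other edge cases.

```python
import re


def tokenize(document):
    # \w+ matches runs of letters, digits, and underscores,
    # so punctuation and white space act as separators.
    return re.findall(r"\w+", document.lower())


print(tokenize("NLP helps computers understand text."))
# ['nlp', 'helps', 'computers', 'understand', 'text']
```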

So now our input data would look something like this: every word becomes one column, and every document (sentence) is a row.

This input is then used for vectorization.

Vectorization is simply the conversion of words into numerical vectors so that computers can work with them.
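To make that layout concrete, here is a small sketch with two made-up documents. It assumes each cell holds the number of times the word occurs in the document; the original post does not say what the cells contain, so treat that as an assumption.

```python
# Two made-up, already-cleaned documents.
documents = ["nlp is fun", "machines learn nlp"]

# Tokenise each document into a list of words.
tokenized = [doc.split() for doc in documents]

# One column per distinct word, in alphabetical order.
vocabulary = sorted({word for doc in tokenized for word in doc})

# One row per document; each cell counts how often the word appears in it.
rows = [[doc.count(word) for word in vocabulary] for doc in tokenized]

print(vocabulary)  # ['fun', 'is', 'learn', 'machines', 'nlp']
for row in rows:
    print(row)
# [1, 1, 0, 0, 1]
# [0, 0, 1, 1, 1]
```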

And voilà, you have understood the basics, or I might even say the core, of NLP.

There are many Vectorization techniques:-

  • Bag of Words (BOW)
  • TF-IDF (Term Frequency-Inverse Document Frequency)
  • Word Embeddings

This topic needs a whole article of its own, so I will cover it in the next one.

Hope you enjoyed this post; I have tried to explain it in a very simple manner.

Libraries such as NLTK, spaCy, and scikit-learn take care of most of the above-mentioned steps, so you rarely need to code them from scratch.

I remember when I first started learning NLP, I had a fear of everything. But when I actually started taking an interest, it was very easy.

Just try to keep learning and take small steps towards NLP. I promise nothing is difficult if you are willing to apply yourself.

All the best in your journey. Onwards and upwards, people!

Published via Towards AI