Demystifying AI for everyone: Part 1 -NLP Basics

Last Updated on August 1, 2023 by Editorial Team

Author(s): Himanshu Joshi

Originally published on Towards AI.

In the age of ChatGPT, let's start with the basics

Over the years, we humans have devised ways to communicate effectively with each other. One of the most common ways is speech: we speak with each other in many languages, such as English, German, French, and Hindi.

Demystifying AI for everyone: Part 1 -NLP Basics — Photo by Alexandra on Unsplash

Natural Language Processing (NLP) is the branch of Artificial Intelligence (AI) that helps computers understand and process human language.

Just as humans use languages to communicate, we use NLP to build language models so that machines can understand human language. Ex: GPT-3 is the third generation of OpenAI’s Generative Pre-trained Transformer (GPT) language models.

But hey, why should we even care about learning NLP?

That’s because, knowingly or unknowingly, we all use NLP in our day-to-day lives.

Have you ever wondered how we get those auto-correction suggestions while typing messages, or how Google Lens reads the words in an image?

NLP powers much of this. So let's look at a few use cases.

Natural Language Processing (NLP) use cases:

Sentiment Analysis: Identifying the sentiment (for example, positive, negative, or neutral) of the person speaking or writing.

Ex: Analyzing customers' tweets or reviews to understand how they feel about a company’s products.

Document Summarization: Condensing long blocks of text into a short summary.

Ex: A book summary or a summary of customer feedback.

Language Translation: Translating text from one language to another.

Ex: English to Japanese, or vice versa.

Speech-to-text & Text-to-speech: Speech-to-text transcribes audio into text, and text-to-speech turns text into audio. The transcribed text can then be fed to computers for further processing.

Ex: Amazon Alexa

There are many other use cases, but I hope this gives you the gist.

In this article, let's look at how machines understand text data.

Computers understand only binary information, 1s and 0s; in short, numerical information.

Hence, we first need to convert text data into a numerical format so that we can feed it into NLP machine learning models for the use cases above.
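To make this concrete: text is actually already stored as numbers inside a computer, as this small Python sketch shows (the sample string is an arbitrary choice). The catch is that these character codes say nothing about meaning; "dog" and "puppy" get completely unrelated codes, which is why NLP needs smarter ways of turning text into numbers.

```python
text = "NLP"

# Each character is stored as a number (here, its UTF-8 byte value)...
codes = list(text.encode("utf-8"))
# ...and each of those numbers is ultimately stored as 8 bits.
bits = [format(code, "08b") for code in codes]

print(codes)  # [78, 76, 80]
print(bits)   # ['01001110', '01001100', '01010000']
```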

But even before we convert text to numbers, we need to clean the text data and structure it in the proper format.

The following steps are commonly used in a text preprocessing pipeline (some can be omitted depending on the context of the problem):

Remove extra white space (often left over from formatting issues)
Remove punctuation
Remove numbers
Remove stop words (common words such as a, an, of, and the, which carry little information because they appear in almost every document)
Remove symbols (Ex: @, <, $, %)
Lowercase all words
Perform stemming/lemmatization to reduce words to their root form (Ex: runs, running, and run all become run)

As mentioned earlier, this is just an example of a general preprocessing pipeline; it should be customized on a project-by-project basis.
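Here is a minimal sketch of such a pipeline in plain Python, using only the standard library. Two simplifications are assumptions of this sketch, not part of the steps above: the stop-word list is a tiny hand-picked set, and `toy_stem` is a crude suffix stripper rather than a real stemmer (such as NLTK's `PorterStemmer`) or lemmatizer. It also lowercases first, so that stop words like "The" are caught.

```python
import re

# Tiny hand-picked stop-word list, for illustration only; real projects
# usually take a curated list from a library such as NLTK or spaCy.
STOP_WORDS = {"a", "an", "and", "are", "at", "in", "is", "it", "of", "on", "the", "to"}


def toy_stem(word: str) -> str:
    """Crude suffix stripper (NOT the Porter algorithm), just enough for
    the article's example: runs / running / run -> run."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" -> "run".
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiouls":
                word = word[:-1]
            break
    return word


def preprocess(text: str) -> list[str]:
    # Lowercase first so that "The" and "the" count as the same word.
    text = text.lower()
    # Replace punctuation, symbols (@, <, $, % ...) and digits with spaces
    # (only ASCII letters a-z and white space survive).
    text = re.sub(r"[^a-z\s]", " ", text)
    # split() with no arguments also collapses runs of extra white space.
    words = text.split()
    # Drop stop words, then stem whatever is left.
    return [toy_stem(w) for w in words if w not in STOP_WORDS]


print(preprocess("  The dog runs @ 10:30,   and the cats are running!  "))
# ['dog', 'run', 'cat', 'run']
```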

After this, we need to tokenize the documents. Tokenization is the process of breaking text documents into smaller chunks called tokens, usually words.
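A minimal tokenization sketch, assuming English text: a regular expression splits each document into word tokens. The `tokenize` helper and the sample documents are hypothetical; library tokenizers such as NLTK's `word_tokenize` or spaCy handle trickier cases like contractions and abbreviations.

```python
import re


def tokenize(document: str) -> list[str]:
    # \w+ matches runs of letters, digits and underscores; everything else
    # (spaces, punctuation) acts as a boundary between tokens.
    return re.findall(r"\w+", document.lower())


docs = ["I love NLP.", "NLP loves me, too!"]
print([tokenize(d) for d in docs])
# [['i', 'love', 'nlp'], ['nlp', 'loves', 'me', 'too']]
```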

Our input data can now be arranged as a table: every word becomes one column, and every document (sentence) is a row.

This input is then used for vectorization.

Vectorization simply means converting words (or whole documents) into numerical vectors so that computers can process them.
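A minimal sketch of the simplest vectorization technique, a count-based Bag of Words, in plain Python. It builds the kind of table described above: one column per unique word, one row per document, and each cell counts how often that word appears in that document. The function name `bag_of_words` and the toy documents are illustrative assumptions; in practice, scikit-learn's `CountVectorizer` does the same job.

```python
def bag_of_words(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    # Columns: every unique word across all documents, sorted for a stable order.
    vocabulary = sorted({word for doc in docs for word in doc})
    # Rows: one vector per document, counting each vocabulary word in it.
    matrix = [[doc.count(word) for word in vocabulary] for doc in docs]
    return vocabulary, matrix


docs = [["dog", "run", "dog"], ["cat", "run"]]
vocab, matrix = bag_of_words(docs)
print(vocab)   # ['cat', 'dog', 'run']
print(matrix)  # [[0, 2, 1], [1, 0, 1]]
```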

And voilà, you have understood the basics, and I might even say the core, of NLP.

There are many vectorization techniques, including:

Bag of Words (BoW)
TF-IDF (Term Frequency-Inverse Document Frequency)
Word Embeddings

This topic deserves a whole article of its own, so I will cover it in the next one.
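As a small preview, here is one common TF-IDF variant sketched in plain Python, assuming already-tokenized documents. It multiplies each word's term frequency (its share of the document's tokens) by its inverse document frequency, log(N / df), where N is the number of documents and df is how many of them contain the word, so words that appear in every document score zero. Libraries use slightly different formulas (scikit-learn's `TfidfVectorizer`, for example, smooths the idf by default), so treat this as an illustration rather than a reference implementation; the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    n_docs = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            # tf = count / document length; idf = log(N / df)
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


docs = [["nlp", "is", "fun"], ["nlp", "is", "hard"]]
for row in tf_idf(docs):
    print({w: round(s, 3) for w, s in row.items()})
# {'nlp': 0.0, 'is': 0.0, 'fun': 0.231}
# {'nlp': 0.0, 'is': 0.0, 'hard': 0.231}
```

Because "nlp" and "is" appear in both documents, they get a score of zero, while the words that distinguish each document ("fun", "hard") get positive weights.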

I hope you enjoyed this post; I have tried to explain everything as simply as possible.

Libraries such as NLTK, spaCy, and scikit-learn take care of all the steps above, so you don't need to code them from scratch.

When I first started learning NLP, everything felt intimidating. But once I actually started taking an interest, it became much easier.

Just try to keep learning and take small steps towards NLP. I promise nothing is difficult if you are willing to apply yourself.

All the best on your journey. Onwards and upwards, people!


Published via Towards AI
