Master LLMs with our FREE course in collaboration with Activeloop & Intel Disruptor Initiative. Join now!

Publication

A Chatbot With the Least Number of Lines Of Code
Latest

A Chatbot With the Least Number of Lines Of Code

Last Updated on September 30, 2022 by Editorial Team

Author(s): Chinmay Bhalerao

Originally published on Towards AI the World’s Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses.

Chatbot and NLP in the simplest form

Photo by Alex Knight on Unsplash

Natural language processing involves creating meaningful sentences and phrases using Natural Language. Text Realization, Sentence Planning, and Text Planning are all involved. Planning a text includes locating the pertinent information in a knowledge base. Planning a sentence involves selecting the necessary words, creating meaningful phrases, and establishing the sentence’s mood. The process of translating a sentence plan into text is called text realization. The chatbot is a computer program that attempts to have a conversation with a human user in natural language. A chatbot is one of the biggest applications of Natural Language Processing [NLP].NLP is divided into three basic parts :

  • Natural language generation(NLG)
  • Natural language understanding(NLU)
  • Natural language interaction(NLI)

The aim of this blog is to make a workable chatbot with very few lines of code. Let's start building.

Importing libraries:

I am assuming you already have the python interpreter installed. Importing basic libraries such as pandas and NumPy for data handling and data manipulations. Then I used nltk [Natural language tool kit] for word and sentence data operations.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import nltk
import random
import string
import warnings
warnings.filterwarnings('ignore')

Lets download ‘punkt’ pakage. It creates a list of sentences from a text by utilizing an unsupervised algorithm to find words that start sentences, collocations, and abbreviations.

nltk.download('punkt',quiet = True)

Dataset for chatbot:

I am creating this chatbot completely dedicated to Asthma. Because there might be many doubts regarding Asthma in people and the dedicated bot is the best way to resolve doubts. This can be treated as a medical bot.

For this particular chatbot, the dataset is a text file that includes all information related to Asthma. We can use web scrapping here directly but for the sake of simplicity, I already web-scraped data regarding asthma from government and trustest websites which are mentioned below:

  1. Mayo Clinic
  2. World Health Organization [WHO]
  3. Centre for disease control and prevention [CDCP]

Although I took only 3 websites but you can take as many as you can to make predictions related to words more accurate. After scrapping, data will be stored in a single text file and supplied for further processing.

with open("Asthma_data_1.txt",'r',encoding = 'utf8') as f:
Data = f.read()
print(Data)

As preprocessing is very important for data sets to make them purer and state-of-the-art for the processing of models for better results. Natural language processing allows us to do tokenization and lemmatization.

Tokenization: Tokenized words in each sentence into their root word so that they will work as chunks to process.

Lemmatization: Lemmatization works the same as tokenization, but the difference is it converts into a word that has meaning. This means converting them into smaller chunks and then making them useful or understandable words. So we are doing tokenization here. We can directly import sent_tokenize from nltk.

# Tokenizing the data
from nltk import sent_tokenize
Data = sent_tokenize(Data)
Data
This is how your data will get tokenized [Image by Author]

Now it's time to work on the initial conversation. We will assign a few words that we use in daily life to start a conversation. It's like greetings. There are a lot of scopes to improve the following code, like adding as many as greeting you can, then making greetings in lower case to make it easy for processing. but for now, I am just assigning a few words to show the demo.

# This function generates the greeting responce
Corpus = Data
def greeting_responce(Text):
# Lowering the text
Text = Text.lower()

# Bot greetings responce
bot_greeings = ['Namaskar','Hi','Hey there','Namaskar','Hello',]

# User greetings
user_greetings = ['hi','hey','hello','ola','greetings','wassup','namaskar']

for word in Text.split():
if word in user_greetings:
return random.choice(bot_greeings)

Defining the function for index sort

def index_sort(list_var):

length = len(list_var)

list_index = list(range(0,length))

x = list_var
for i in range(length):
for j in range(length):
if x[list_index[i]] > x[list_index[j]]:
#Swap
Temp = list_index[i]
list_index[i] = list_index[j]
list_index[j] = Temp

return list_index

Now we reached the final stage of creating a chatbot. This is particularly to map the similarity between a word that the user asked for and its most similar word from the database. We are using cosine-similarity here.

#Create bot response
from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
def bot_response(user_input):
user_input=user_input.lower()
Corpus.append(user_input)
bot_response=''
cm=TfidfVectorizer().fit_transform(Corpus)
similarity_scores=cosine_similarity(cm[-1],cm)
similarity_scores_list=similarity_scores.flatten()
index=index_sort(similarity_scores_list)
index=index[1:]

response_flag=0


j = 0
for i in range(len(index)):
if similarity_scores_list[index[i]]>0.0:
bot_response=bot_response+''+ Corpus[index[i+1]] + ' ' + Corpus[index[i+2]] + ' ' + Corpus[index[i+3]]
response_flag=1
j=j+1
if j > 2:
break

if response_flag==0:
bot_response=bot_response+''+"I apologize I dont understand"

Corpus.remove(user_input)
return bot_response

Interaction :

Now the final chunk of code is where you can start the chat with the chatbot. You can write basic statements before chatting to know the user what this chatbot is doing.

#Start the chat with Chatbot
print("Doc Bot: Hii....!!!, I am here to help you.")
print(" I am Doctor for short, I will tell about your doubts regarding 'Asthma'. ")
print("NOTE: My suggestions are only the results of google search and only for knowledge purpose.Please seek medical attention and Medication in case of Asthma")
exit_list=['exit','bye','see you later','quit','break','no','thanks','thanks alot']
while(True):
user_input=input("You : ")
if user_input.lower() in exit_list:
print("Doc Bot: Ok, Thanks, Bye...")
print(" Stay Home, Stay Safe!!!")
break
elif user_input.lower() in ['ok','hmm','its ok']:
print('Doc Bot: Can I tell you more about it')
elif user_input.lower() in ['yes','ok']:
print('Doc Bot:'+bot_response(user_input))
else:
if greeting_responce(user_input) != None:
print('Doc Bot:'+ greeting_responce(user_input))
else:
print("Doc Bot:"+bot_response(user_input))
Chat results [Image by Author]

There is a wide variety of words that can be encountered and not known by Bot. This is where data plays an important role. Medical bots are tested very drastically before production as they give very precious information that must be verified by the proper person. The next strategy can be to use a GUI tool or a hosting website to host this bot. That's completely the user’s choice. So with very bare minimum lines of code, we created a simple chatbot.

How to test the bot?

The way that I followed for testing is creating a google form and sending it to all your near-ones/colleagues/friends. The form will ask them about their doubts regarding Asthma and the weirdest and strangest questions you can test on the bot. After answers, you can understand if more data is required or not! And that will be real-life testing for a bot.

If you find this insightful

If you found this article insightful, follow me on Linkedin and medium. you can also subscribe to get notified when I publish articles. Let's create a community! Thanks for your support!


A Chatbot With the Least Number of Lines Of Code was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Join thousands of data leaders on the AI newsletter. It’s free, we don’t spam, and we never share your email address. Keep up to date with the latest work in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓