Zomato Sentiment Analysis

Last Updated on July 15, 2023 by Editorial Team

Author(s): Roli Trivedi

Originally published on Towards AI.

A Journey through EDA and Data Preparation

In this article, we will define the objective, load the data, perform Exploratory Data Analysis (EDA), and prepare the data.

Steps to be followed for our model:

1. Define the objective of the problem statement.
2. Data Gathering
3. Exploratory Data Analysis (EDA)
   EDA is where we use techniques to understand the data better and to present it visually to others.
4. Data Preparation
   Data might not be in the correct format, and there might be outliers or missing values, so you need to scan the dataset for inconsistencies and fix them.
   NOTE: Data preparation and EDA go hand in hand.
5. Build Machine Learning model
6. Model Evaluation & Optimization
7. Prediction / Deployment

Now, it’s time to dive into the implementation!

Objective

The project aims to analyze Zomato restaurant data in India to understand customer sentiments through reviews and visualize the data for insights.

Load Libraries

Dataset: Link

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt 
%matplotlib inline

import warnings 
warnings.filterwarnings("ignore")

import datetime as dt
from wordcloud import WordCloud  # assumed completion of the truncated import

Load data

review = pd.read_csv("Zomato Restaurant reviews.csv")

Exploratory Data Analysis

More than anything, EDA is a state of mind. It is the first step to perform before making any changes to the dataset. The process involves summarizing, visualizing, and getting deeply acquainted with the important traits of the dataset.

What does data look like?

review.sample(5): to get 5 random records
review.tail(): to get the last 5 records
review.head(): to get the first 5 records
The Metadata column contains the number of reviews and followers.
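As a minimal sketch of these three calls, the snippet below uses a small stand-in DataFrame with hypothetical values instead of the real CSV. Note that pandas exposes random sampling as DataFrame.sample; random_state is only set here to make the draw reproducible.

```python
import pandas as pd

# Hypothetical stand-in for the review data (the real file is much larger).
review = pd.DataFrame({
    "Restaurant": ["A", "A", "B", "B", "C", "C", "D"],
    "Rating": ["5", "4", "Like", "3.5", "1", "2", "4.5"],
})

first_rows = review.head()                       # first 5 rows by default
last_rows = review.tail()                        # last 5 rows by default
random_rows = review.sample(5, random_state=0)   # 5 random rows, reproducible

print(first_rows)
print(last_rows)
print(random_rows)
```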

How big is the data?

There are 10,000 records (reviews) with 7 features.
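The size of a DataFrame is usually read from its shape attribute. The sketch below uses a toy frame with hypothetical values, so it reports 3 rows and 2 columns rather than the figures quoted for the real dataset.

```python
import pandas as pd

# Toy frame with hypothetical values; only the shape matters here.
review = pd.DataFrame({
    "Restaurant": ["A", "B", "C"],
    "Review": ["Good", "Bad", "Okay"],
})

n_rows, n_cols = review.shape   # shape is a (rows, columns) tuple
print(f"{n_rows} records with {n_cols} features")
```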

Columns in our dataset

What is the data type of columns?

review.info() gives us the non-null count and the data type of each column. It also reports how much memory the DataFrame occupies.
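A short sketch of DataFrame.info on a toy frame with hypothetical values. By default the summary is printed; passing buf captures it as text, which is done here only so the output can be inspected programmatically.

```python
import io
import pandas as pd

# Hypothetical stand-in data with one missing rating.
review = pd.DataFrame({
    "Restaurant": ["A", "B", "C"],
    "Rating": ["5", None, "Like"],
    "Pictures": [0, 2, 1],
})

review.info()  # prints column names, non-null counts, dtypes, memory usage

# Capture the same summary as a string instead of printing it.
buffer = io.StringIO()
review.info(buf=buffer)
summary = buffer.getvalue()
```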

What does our data look like mathematically?
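A statistical summary is typically obtained with DataFrame.describe. The sketch below assumes a toy frame with hypothetical values: by default only numeric columns are summarized, while include="all" also reports count, unique, top, and freq for text columns.

```python
import pandas as pd

# Hypothetical stand-in data: one text column, one numeric column.
review = pd.DataFrame({
    "Rating": ["5", "4", "Like", None],
    "Pictures": [0, 2, 1, 1],
})

numeric_summary = review.describe()            # numeric columns only
full_summary = review.describe(include="all")  # text columns as well

print(numeric_summary)
print(full_summary)
```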

Are there any duplicates?

We have 36 duplicates. Let’s have a look at the duplicated rows.

Since all the duplicated rows hold null values, we can drop them later during preprocessing.
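One way to count and inspect duplicates is DataFrame.duplicated. This is a sketch on hypothetical data, not the author's exact code: the default call flags later copies of an earlier row, and keep=False flags every copy so the full set can be displayed.

```python
import pandas as pd

# Hypothetical stand-in data: rows 0 and 1 are identical and mostly null.
review = pd.DataFrame({
    "Restaurant": ["A", "A", "A", "B"],
    "Reviewer": [None, None, "X", "Y"],
    "Review": [None, None, "Good", "Bad"],
})

n_duplicates = review.duplicated().sum()                # later copies only
duplicate_rows = review[review.duplicated(keep=False)]  # every copy

print(n_duplicates)
print(duplicate_rows)
```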

Are there any missing values?

We have many null values.
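Missing values per column are commonly counted with isnull().sum(). The frame below is a hypothetical stand-in, so the counts differ from those of the real dataset.

```python
import pandas as pd

# Hypothetical stand-in data with two missing reviews.
review = pd.DataFrame({
    "Restaurant": ["A", "B", "C"],
    "Review": ["Good", None, None],
    "Pictures": [0, 1, 2],
})

missing_per_column = review.isnull().sum()   # null count for each column
columns_with_nulls = missing_per_column[missing_per_column > 0].index.tolist()

print(missing_per_column)
print(columns_with_nulls)
```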

Check Unique Values for each variable in the review dataset.

The Rating column has 10 unique values, so let’s look at them.

Customers give ratings of 1, 1.5, 2, 2.5, …, 5, plus the value ‘Like’. There are also some missing values.
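A sketch of how unique values can be inspected, using hypothetical data. DataFrame.nunique counts distinct non-null values per column, and Series.unique lists them; the missing value is dropped first here so the list can be sorted.

```python
import pandas as pd

# Hypothetical stand-in data with a repeated rating and a missing one.
review = pd.DataFrame({
    "Restaurant": ["A", "A", "B", "B", "C"],
    "Rating": ["5", "4.5", "Like", None, "5"],
})

unique_counts = review.nunique()   # distinct non-null values per column
rating_values = sorted(review["Rating"].dropna().unique().tolist())

print(unique_counts)
print(rating_values)
```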

Findings:

The rating should be numeric, but the column also contains the value ‘Like’, so it is stored as an object data type.
Timings are provided in text format, making them an object data type.
We have duplicate rows, but since they hold null values, we can eliminate them.
The dataset consists of a total of 10,000 reviews, encompassing 7 features.
Every column except the restaurant name and the number of pictures posted contains null values.
Based on the review dataset’s description, we can deduce that 100 restaurants have received customer reviews.
The rating can be treated as a categorical variable ranging from 1 to 5. We can replace missing values with the median rating for that specific restaurant. Since ‘Like’ is not a rating, we can replace it with a rating of 4, as it indicates that people liked the food.
The Pictures column takes 36 distinct values.
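The idea of filling a missing rating with the median rating of the same restaurant can be sketched with groupby and transform. This is an illustration on hypothetical data that assumes the Rating column has already been converted to numbers; it is not a step taken from the preparation code in this article.

```python
import pandas as pd

# Hypothetical stand-in data: ratings already numeric, two of them missing.
review = pd.DataFrame({
    "Restaurant": ["A", "A", "A", "B", "B", "B"],
    "Rating": [5.0, 3.0, None, 2.0, None, 4.0],
})

# Median rating of each restaurant, aligned row by row with the original frame.
restaurant_median = review.groupby("Restaurant")["Rating"].transform("median")

# Fill only the missing ratings with their restaurant's median.
review["Rating"] = review["Rating"].fillna(restaurant_median)
print(review)
```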

Data Preparation

You may have noticed that we did not do any cleaning or transformation during the EDA section. However, we have determined what needs to be cleaned and how.
Note: Feature Engineering is a data preprocessing step. It turns raw data into more meaningful data that an ML model can understand.

Drop duplicate values as they are null

review.drop_duplicates(inplace = True, keep = False)

inplace = True: modifies the DataFrame in place rather than creating a new one.
keep = False: drops every copy of a duplicated row, including the first occurrence.
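The effect of the keep parameter is easiest to see side by side. The sketch below uses hypothetical data with one duplicated, null-valued row pair.

```python
import pandas as pd

# Hypothetical stand-in data: rows 0 and 1 are identical.
review = pd.DataFrame({
    "Restaurant": ["A", "A", "B"],
    "Review": [None, None, "Good"],
})

keep_first = review.drop_duplicates()            # keeps one copy of the pair
keep_none = review.drop_duplicates(keep=False)   # drops both copies

print(len(keep_first), len(keep_none))
```

With keep=False nothing from the duplicated group survives, which suits this dataset because the duplicated rows carry no review content.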

Replace Rating ‘Like’ with rating 4 and convert the column to float type
Note: Series.str gives access to the values of a Series as strings, so that string methods can be applied to them.

review['Rating']=review['Rating'].str.replace("Like",'4').astype('float')
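To illustrate what this line does, here is the same replacement on a small hypothetical Series. Missing entries pass through str.replace unchanged and become NaN after the conversion to float.

```python
import pandas as pd

# Hypothetical ratings stored as text, including 'Like' and a missing value.
rating = pd.Series(["5", "Like", "3.5", None])

# Swap the text label for '4', then convert the whole column to float.
rating_numeric = rating.str.replace("Like", "4").astype("float")
print(rating_numeric)
```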

Fill null values in the ‘Followers’ column with 0

review['Followers'] = review['Followers'].fillna(0)  # assign back; inplace on a column selection may not update the DataFrame

Convert the column “Time” to datetime and extract the hour and year

Now only the ‘Review’ column has null values, so we can drop those rows, as we do not need records without a review. (Since only a few values were missing, dropping them does not noticeably affect the dataset.)
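Dropping rows that lack a review can be sketched with dropna and its subset argument, shown here on hypothetical data. Resetting the index afterwards is optional and only keeps the row labels contiguous.

```python
import pandas as pd

# Hypothetical stand-in data with one missing review.
review = pd.DataFrame({
    "Restaurant": ["A", "B", "C"],
    "Review": ["Good", None, "Bad"],
})

# Drop only the rows where 'Review' is null; other columns are not checked.
review = review.dropna(subset=["Review"]).reset_index(drop=True)
print(review)
```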

Splitting the metadata into Reviews and Followers
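The article does not show the code for this step. The sketch below is one possible approach, assuming the Metadata values look like "1 Review , 2 Followers"; that format, the regular expressions, and the column name Reviews are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical Metadata values; the real format is assumed, not confirmed.
review = pd.DataFrame({
    "Metadata": [
        "1 Review , 2 Followers",
        "3 Reviews , 10 Followers",
        "1 Review",          # no follower count present
        None,                # missing metadata
    ]
})

# Pull the number that precedes each label; rows without a match become NaN.
review["Reviews"] = (
    review["Metadata"].str.extract(r"(\d+)\s+Reviews?", expand=False).astype("float")
)
review["Followers"] = (
    review["Metadata"].str.extract(r"(\d+)\s+Followers?", expand=False).astype("float")
)
print(review)
```

Rows with no follower count end up as NaN in Followers, which is why a fill with 0 is a natural follow-up.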

Replacing Missing Values in “Followers” with 0 and converting Time to date time, extracting Hour and year

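# Assign back; inplace on a column selection may not update the DataFrame
review['Followers'] = review['Followers'].fillna(0)
# Parse the text timestamps, then pull out the hour and the year
review['Time'] = pd.to_datetime(review['Time'])
review['Hour'] = review['Time'].dt.hour
review['Year'] = review['Time'].dt.year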

Average rating and the total number of reviews given to restaurants

avg_rating = (
    review.groupby('Restaurant')
    .agg({'Rating': 'mean', 'Reviewer': 'count'})
    .reset_index()
    .rename(columns={'Reviewer': 'Total_Review'})
)
avg_rating
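To show the shape of the result, here is the same aggregation on a small hypothetical frame. Note that 'count' counts non-null Reviewer entries, so rows without a reviewer name would not be included in Total_Review.

```python
import pandas as pd

# Hypothetical stand-in data with numeric ratings.
review = pd.DataFrame({
    "Restaurant": ["A", "A", "B"],
    "Reviewer": ["X", "Y", "Z"],
    "Rating": [5.0, 3.0, 4.0],
})

avg_rating = (
    review.groupby("Restaurant")
    .agg({"Rating": "mean", "Reviewer": "count"})   # mean rating, review count
    .reset_index()
    .rename(columns={"Reviewer": "Total_Review"})
)
print(avg_rating)
```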

“Thank you for joining me on this journey! Stay tuned for my upcoming updates as we dive deeper into the world of Zomato sentiment analysis. Exciting things are yet to come, so stay connected for the next steps and insights. Together, we’ll unravel the hidden stories within the data. See you soon!”

Thanks for reading! If you enjoyed this piece and would like to read more of my work, please consider following me on Medium. I look forward to sharing more with you in the future.

Published via Towards AI
