Automatic Movie Review System Using Sentimental Analysis For Positive or Negative Reviews

Last Updated on July 20, 2023 by Editorial Team

A great article to understand the basics of sentimental analysis and data scraping by reviewing movies.

In this article, we will use machine learning to perform sentimental analysis of reviews available on the IMDB website for any given movie and then decide whether to watch that film or not. It is a good project to understand the basics of NLP as well as Data Scraping. If you are in the field of machine learning for quite a long time, then most probably you can skip this tutorial.

The workflow or methodology that we will use consists of four main parts:

Installing all dependencies and required files
Model Development(Naive Bayes)
Scraping reviews of a particular movie
Predicting the sentiment of each review and deciding whether to watch it or not.

Prerequisites

I’m assuming that you are familiar with the python programming language and you have python 3 installed in your system.

Installing required packages

You can simply use pip install pakage_name for the given packages. The packages that you need to install before start coding are:

selenium — for web scraping and automating the scrolling of the website. In selenium, you also need to download chomedriver.exe for using chrome automatically.
nltk — for performing natural language processing tasks and model training.
bs4 — for BeautifulSoup that is used for parsing Html page.
lxml — it is the package that is used to process Html and XML with python.
urllib — for requesting a webpage.
sklearn(optional) — Used for saving the trained model.

Let’s Start Coding

Create a python file for training and predicting the reviews. Don’t worry, we will scrap these reviews later in a different file.

Model Development

Firstly, we have to download all the necessary data like movie_reviews on which our model will train and some other data like stopwords, punkt that nltk have used in our code. If nltk requires some more data then it will notify you with the error. The following lines will download the data.

import nltk
nltk.download("punkt")
nltk.download("movie_reviews")
nltk.download("stopwords")

Now we will import all the required packages and files. Here, movie_reviews is our training and testing data, stopwords are words like is,the,of that does not contribute to the training. We have used shuffle for shuffling training and testing data. NaiveBayes is the classifier mostly used for NLP(Natural Language Processing) .word_tokenizer is used to divide the text into smaller parts called tokens.

The MAIN_scrap_movies_reviews is our python file for scraping data and reviews_extract is the function that performs that task.

from nltk.corpus import movie_reviews
from nltk.corpus import stopwords
from random import shuffle
import string
from nltk import NaiveBayesClassifier
from nltk import classify
from nltk import word_tokenize
from MAIN_scrap_movies_reviews import reviews_extract
from sklearn.externals import joblib

The bag_words() takes the review and removes stopwords and punctuations from it as they don’t contribute to the training of the model and return the dictionary with every word as key and true as the value.

def bag_words(words):
 global stop_words
 stop_words = stopwords.words('english')
 clean = []
 for i in words:
 if i not in stop_words and i not in string.punctuation:
 clean.append(i)
 dictionary = dict([word, True] for word in clean)
 return dictionary

We have created the function TrainingAndTesting() that trains and check the accuracy of the model. There are two empty lists used to store positive and negative reviews. Then, we iterate in the movie_reviews column fileidsand if it contains posthen that row or review is stored in a positive review list and vice versa. After that, we iterate through each review stored in pos_reviews and call the bag_words function that in turn gives the dictionary containing review words (removed stopwords and punctuations) and associate poswith it representing that this is the positive review. The same goes for negative reviews. Then we shuffle and split the data for training and testing. We have simply used NaiveBayes classifier for this sentimental analysis. The training and testing are simple with inbuilt functions of nltk.

The last line here is used to store the trained model so that there is no need to train the model every time you review the movie. Also, you can simply return the classifier and use that instead in predicting phase.

def TrainingAndTesting():
 pos_review = []
 neg_review = []
 
 for fileid in movie_reviews.fileids('pos'):
 pos_review.append(movie_reviews.words(fileid)) for fileid in movie_reviews.fileids('neg'):
 neg_review.append(movie_reviews.words(fileid)) pos_set = []
 for word in pos_review:
 pos_set.append((bag_words(word),'pos')) neg_set = []
 for word in neg_review:
 neg_set.append((bag_words(word),'neg')) shuffle(pos_set)
 shuffle(neg_set)
 test_set = pos_set[:200]+neg_set[:200]
 train_set = pos_set[200:]+neg_set[200:] classifier = NaiveBayesClassifier.train(train_set)
 acc = classify.accuracy(classifier,test_set)
 print(acc)
 joblib.dump(classifier,'imdb_movies_reviews.pkl')

Now we will create another file for scraping reviews from IMDB or also you can define the function in the same file.

Scraping reviews of a particular movie

Importing the required packages. Uses of all these are defined in the starting.

from selenium import webdriver
import urllib.request as url
import bs4
import time

In this function, we take the movie name as input from the user then we take the content of that particular web page using urlopen. Now we will parse the content using BeautifulSoup and find the link of the first movie result on the page and move to that link then into the comment section, click on load more comments. This is the main reason for using selenium otherwise we will stick to only 5 or 6 preloaded comments. After clicking for specific times (20) in our case, we will find the text of every review and append it into the list and return that list.

def reviews_extract():
 movie = input('Enter name of movie:')
 movie = movie.lower() web = url.urlopen("https://www.imdb.com/find?ref_=nv_sr_fn&q="+movie)
 page1 = bs4.BeautifulSoup(web,'lxml')
 b = page1.find('td',class_='result_text')
 href = b.a['href']
 web2 = url.urlopen("https://www.imdb.com"+href)
 page2 = bs4.BeautifulSoup(web2,'lxml')
 c = page2.find('div',class_='user-comments')
 temp = []
 for a in c.find_all('a',href =True):
 g =(a['href'])
 temp.append(g)
 d = temp[-1]
 driver = webdriver.Chrome('C:\\Users\\dell\\Desktop\\chromedriver.exe')
 driver.get("https://www.imdb.com"+d)
 for i in range(20):
 try:
 loadMoreButton = driver.find_element_by_class_name('load-more-data')
 loadMoreButton.click()
 time.sleep(1)
 except Exception as e:
 print(e)
 break
 web3 = driver.page_source
 page3 = bs4.BeautifulSoup(web3,'lxml') e = page3.find('div',class_='lister-list') e1 = e.find_all('a',class_='title') user_reviews = []
 for i in e1:
 raw = (i.text)
 user_reviews.append(raw.replace('\n',''))
 driver.quit()
 print(user_reviews)
 print(len(user_reviews))
 return user_reviews,movie

Predicting the sentiment of each review

After scraping, we move back to the previous python file. Now, here we will predict the sentiment of each scraped review with the already trained model. In the code, firstly we have loaded the model that we have saved earlier and also call the reviews_extract function defined above to get reviews. After that, we process every review i.e. tokenizing, remove stop words, and convert the review into the required format. Then we predict its sentiment if it neg increases the count of n and if pos increase p and percent of positive reviews is calculated.

def predicting():
 classifier = joblib.load('imdb_movies_reviews.pkl')
 reviews_film, movie = reviews_extract()
 testing = reviews_film tokens = []
 for i in testing:
 tokens.append(word_tokenize(i)) set_testing = []
 for i in tokens:
 set_testing.append(bag_words(i)) final = []
 for i in set_testing:
 final.append(classifier.classify(i)) n = 0
 p = 0
 for i in final:
 if i == 'neg':
 n+= 1
 else:
 p+= 1
 pos_per = (p / len(final)) * 100 return movie,pos_per,len(final)

Then, in the end, we call the required functions and if the positive percentage is greater than 60%, we recommend that movie to our friends.

TrainingAndTesting()
movie,positive_per,total_reviews = predicting()
print('The film {} has got {} percent positive reviews'.format(movie, round(positive_per)))
if positive_per > 60:
 print('overall impression of movie is good ')
else:
 print('overall impression of movie is bad ')

My Results

I was able to get the accuracy of the model around 78% and here is the screenshot of my result. Here 225 are the number of reviews that are analyzed.

The described project is for the beginners, hence I have not used advanced techniques like RNN(Recurrent Neural Networks). The only focus of this article was to provide knowledge and starting phase projects in the field of machine learning.

Thank you for your precious time.U+1F60AAnd I hope you like this tutorial.

You can find the source code for the same at my Github repo.

Check out my article on Visualise Interesting Sorting Algorithms With Python

Visualize Interesting Sorting Algorithms With Python

There are various types of sorting algorithms out there and sometimes it becomes very difficult to understand their…

medium.com

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Frequently Used, Contextual References

Resources

Publication

Automatic Movie Review System Using Sentimental Analysis For Positive or Negative Reviews

Author(s): Pushkara Sharma

Natural Language Processing

A great article to understand the basics of sentimental analysis and data scraping by reviewing movies.

Prerequisites

Installing required packages

Let’s Start Coding

Model Development

Scraping reviews of a particular movie

Predicting the sentiment of each review

My Results

Visualize Interesting Sorting Algorithms With Python

There are various types of sorting algorithms out there and sometimes it becomes very difficult to understand their…

Feedback ↓ Cancel reply

Popular posts

Best Laptops for Deep Learning, Machine Learning (ML), and Data Science for 2023

Best Workstations for Deep Learning, Data Science, and Machine Learning (ML) for 2022

Descriptive Statistics for Data-driven Decision Making with Python

Best Machine Learning (ML) Books - Free and Paid - Editorial Recommendations for 2022

Best Data Science Books - Free and Paid - Editorial Recommendations for 2022

Updates

Recent Posts

AI Safety on a Budget: Your Guide to Free, Open-Source Tools for Implementing Safer LLMs

AI Safety on a Budget: Your Guide to Free, Open-Source Tools for Implementing Safer LLMs

AI Safety on a Budget: Your Guide to Free, Open-Source Tools for Implementing Safer LLMs

You Can Now Call ChatGPT From Your Phone

You Can Now Call ChatGPT From Your Phone

The World’s Leading AI and Technology Publication.

Company

CONTACT US

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Publication

Automatic Movie Review System Using Sentimental Analysis For Positive or Negative Reviews

Author(s): Pushkara Sharma

A great article to understand the basics of sentimental analysis and data scraping by reviewing movies.

Prerequisites

Installing required packages

Let’s Start Coding

Model Development

Scraping reviews of a particular movie

Predicting the sentiment of each review

My Results

Visualize Interesting Sorting Algorithms With Python

There are various types of sorting algorithms out there and sometimes it becomes very difficult to understand their…

Related posts

Feedback ↓ Cancel reply

Popular posts

Updates

Recent Posts

The World’s Leading AI and Technology Publication.

Company

CONTACT US

GDPR CCPA Statement