Sentiment Analysis in Python Using VADER
Last Updated on July 25, 2023 by Editorial Team
Author(s): Mahesh Tiwari, PhD
Originally published on Towards AI.
Welcome to our next blog post in the series on sentiment analysis! Today, we will explore VADER, a lexicon- and rule-based sentiment analysis method available in Python through the vaderSentiment library.
The sentiment analysis was performed on tweets about the movie "Extraction 2". You can find the data on kaggle.com, and you can download it from this link. I also wrote a blog post about how to get data from Twitter for sentiment analysis (click me) and another one about cleaning the text before analyzing it (click me). There's also a blog post that explains how to do sentiment analysis using the TextBlob library in Python (click me).
VADER (Valence Aware Dictionary and Sentiment Reasoner)
VADER is a sentiment analysis tool that uses a sentiment lexicon, a dictionary specifically designed for sentiment analysis, to determine the sentiment expressed in a text. The lexicon consists of words or phrases with their accompanying sentiment ratings. VADER assigns a score to each word in its sentiment lexicon to determine if it is positive or negative. When analyzing a text, VADER breaks it down into individual words and checks each word against its sentiment lexicon. Based on the scores assigned to the words, VADER calculates the overall sentiment score for the text.
It also considers grammatical rules like intensifiers and negations, which can alter a word's meaning. VADER looks for modifiers next to each word, so that "very good" scores higher than "good" and "not good" scores as negative. Accounting for these contextual valence shifters lets VADER capture sentiment more accurately. Here is how it works:
- VADER adds up the sentiment scores of the individual words in the text, adjusting each one for intensity and handling both positive and negative expressions.
- The outcome is a set of four scores: the proportions of the text that are negative, neutral, and positive, plus a compound score that normalizes the overall sentiment to a range from -1 (most negative) to +1 (most positive).
- Based on the compound score, the text can be categorized as positive, negative, or neutral. A common convention treats scores of 0.05 or higher as positive and scores of -0.05 or lower as negative.
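To make the steps above more concrete, here is a toy sketch of a lexicon-based scorer. It is not VADER's actual implementation: the lexicon entries, the booster and negation weights, and the name toy_compound are all illustrative assumptions. It only shows the general idea of looking words up, adjusting for nearby intensifiers and negations, and squashing the total into the range -1 to +1.

```python
import math

# Toy lexicon: words and valence values are illustrative assumptions,
# not entries copied from VADER's real lexicon.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "boring": -1.3}
# Intensifiers strengthen the word that follows them (assumed weight).
TOY_BOOSTERS = {"very": 0.3, "really": 0.3}
TOY_NEGATIONS = {"not", "never", "no"}
NEGATION_FACTOR = -0.74  # assumed factor that flips and dampens a negated word


def toy_compound(text, alpha=15.0):
    """Sum adjusted word valences, then squash the total into (-1, 1)."""
    words = text.lower().split()
    total = 0.0
    for i, word in enumerate(words):
        valence = TOY_LEXICON.get(word)
        if valence is None:
            continue  # word carries no sentiment in the toy lexicon
        previous = words[max(0, i - 2):i]  # look back up to two words
        for prev in previous:
            if prev in TOY_BOOSTERS:
                # Push the score further from zero in its own direction.
                valence += math.copysign(TOY_BOOSTERS[prev], valence)
        if any(prev in TOY_NEGATIONS for prev in previous):
            valence *= NEGATION_FACTOR
        total += valence
    # Normalize so long texts cannot run past -1 or +1.
    return total / math.sqrt(total * total + alpha)


for sample in ["the movie was good", "the movie was very good", "the movie was not good"]:
    print(sample, "->", round(toy_compound(sample), 3))
```

In this sketch the intensified sentence scores higher than the plain one, and the negated sentence comes out below zero, which is the behavior the rules above are meant to produce.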
The first step is to import the necessary libraries and load the data. The CSV file "cleaned_tweets_extraction.csv" is read with the datatable library, which offers efficient data handling capabilities, and then converted into a pandas DataFrame.
Note: To install VADER we can use
pip install vaderSentiment
#import the necessary libraries
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt
import datatable as dt
# Read the CSV with datatable, then convert it to a pandas DataFrame
data = dt.fread('./cleaned_tweets_extraction.csv')
df = data.to_pandas()
Below is the main code snippet that is used for sentiment analysis.
#Create an instance of the VADER sentiment analyzer
analyzer = SentimentIntensityAnalyzer()
#Define a function to perform sentiment analysis using VADER
def get_sentiment(tokens):
    sentiment = analyzer.polarity_scores(tokens)
    compound_score = sentiment['compound']
    return compound_score
# Apply the function to the 'tokens' column of the DataFrame
df['sentiment'] = df['tokens'].apply(get_sentiment)
# Print the DataFrame with sentiment scores
print(df['sentiment'])
The above code adds a new column, "sentiment", which holds the VADER compound score for each tweet.
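If you want a text label rather than a number, the compound score can be mapped to a category. The sketch below uses the commonly cited cutoffs of 0.05 and -0.05; the function name label_sentiment and the threshold parameter are my own choices for illustration.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Quick check on a few hand-picked scores
for score in (0.6, 0.0, -0.3):
    print(score, "->", label_sentiment(score))
```

Assuming the DataFrame from above, this could be applied with df['sentiment'].apply(label_sentiment) and stored in a new column of your choice.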
Visualization
Next, the matplotlib library is imported as plt. The code uses pd.cut with bins=3 to split the observed range of compound scores into three equal-width intervals labeled Negative, Neutral, and Positive, counts the tweets in each interval, and stores the counts in the sentiment_counts variable. Then, a bar plot is created.
import matplotlib.pyplot as plt
# Count the occurrences of each sentiment
sentiment_counts = pd.cut(df['sentiment'], bins=3, labels=['Negative', 'Neutral', 'Positive']).value_counts()
# Plot the sentiments
plt.bar(sentiment_counts.index, sentiment_counts.values)
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.title('Sentiment Analysis')
plt.show()
# Print the number of counts for each sentiment
for sentiment, count in sentiment_counts.items():
    print(f"{sentiment}: {count}")
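One thing worth keeping in mind about pd.cut with bins=3: it divides the observed range of scores into three equal-width intervals, so the category boundaries depend on the data rather than on fixed cutoffs. The sketch below shows roughly where those boundaries fall. The helper equal_width_edges and the example minimum and maximum are assumptions for illustration, and pd.cut also pads the outer edge slightly so the minimum is included, which this sketch ignores.

```python
def equal_width_edges(lowest, highest, n_bins=3):
    """Return the n_bins + 1 edges of equal-width bins between two values."""
    width = (highest - lowest) / n_bins
    return [lowest + i * width for i in range(n_bins + 1)]


# Hypothetical score range; real data will have its own minimum and maximum.
edges = equal_width_edges(-0.9, 0.9)
print([round(edge, 2) for edge in edges])
```

With a range like this, the middle "Neutral" interval spans roughly -0.3 to 0.3, which is much wider than a fixed band of -0.05 to 0.05, so the counts from the two approaches can differ noticeably.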
Conclusion
Based on the sentiment analysis results obtained using VADER for the Extraction 2 movie data scraped from Twitter, we have the following sentiment counts:
- Neutral: 5324
- Positive: 3010
- Negative: 1665
The sentiment analysis suggests that Twitter users' reactions to the movie Extraction 2 were mostly neutral. Of the 9,999 tweets analyzed, 5,324 (about 53%) fell in the neutral category. A further 3,010 tweets (about 30%) were positive, conveying satisfaction and positive experiences. The remaining 1,665 tweets (about 17%) were negative, expressing complaints, unfavorable evaluations, or unhappiness with certain elements. Overall, the data points to a broadly balanced viewpoint that includes both positive and negative opinions, with positive tweets outnumbering negative ones.
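To put the counts above in proportion, here is a short sketch that converts them into percentage shares. The counts are the ones reported in this post; the variable names are just for illustration.

```python
# Sentiment counts reported above
counts = {"Neutral": 5324, "Positive": 3010, "Negative": 1665}

total = sum(counts.values())
# Share of each category as a percentage, rounded to one decimal place
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share}%")
```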