Sentiment Analysis in Python Using VADER

Last Updated on July 25, 2023 by Editorial Team

Author(s): Mahesh Tiwari, PhD

Originally published on Towards AI.

Welcome to the next blog post in our series on sentiment analysis! Today, we will explore VADER, a lexicon- and rule-based sentiment analysis tool that is available as a Python library.

Photo by Tim Mossholder on Unsplash

The sentiment analysis was done for the movie “Extraction 2” using Twitter data collected for this series. The data is available on kaggle.com. Earlier posts in the series cover how to get data from Twitter for sentiment analysis, how to clean the text before analyzing it, and how to do sentiment analysis using the TextBlob library in Python.

VADER (Valence Aware Dictionary and Sentiment Reasoner)

VADER is a sentiment analysis tool that uses a sentiment lexicon, a dictionary specifically designed for sentiment analysis, to determine the sentiment expressed in a text. The lexicon consists of words or phrases with their accompanying sentiment ratings. VADER assigns a score to each word in its sentiment lexicon to determine if it is positive or negative. When analyzing a text, VADER breaks it down into individual words and checks each word against its sentiment lexicon. Based on the scores assigned to the words, VADER calculates the overall sentiment score for the text.
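To make the lexicon lookup concrete, here is a minimal sketch in plain Python. The tiny dictionary and its ratings are made up for illustration and are not VADER's real lexicon; the function names are hypothetical as well.

```python
# Toy lexicon with made-up valence ratings (NOT VADER's real values).
TOY_LEXICON = {"great": 3.0, "good": 2.0, "boring": -1.5, "terrible": -2.5}


def word_scores(text):
    """Return (word, score) pairs for the words found in the toy lexicon."""
    words = text.lower().split()
    return [(w, TOY_LEXICON[w]) for w in words if w in TOY_LEXICON]


def raw_score(text):
    """Sum the valence ratings of all lexicon words in the text."""
    return sum(score for _, score in word_scores(text))


example = "Great action but a boring plot"
print(word_scores(example))  # words that matched the lexicon
print(raw_score(example))    # their summed rating
```

Words that are not in the lexicon simply contribute nothing, which is why the quality of the lexicon matters so much for this kind of method.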

It also considers grammatical rules like intensifiers and negations, which can alter a word’s meaning. By looking at the context and word interactions, VADER searches for modifiers that change the meaning of neighboring words. Being aware of these contextual valence shifters lets VADER capture emotion more accurately. Here is how it works:

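  1. VADER adds up the sentiment scores of the individual words in the text, adjusting each score for intensifiers and negations so that both positive and negative expressions are handled.
  2. The outcome is four scores: neg, neu, and pos give the proportions of the text that are negative, neutral, and positive, and compound is the adjusted sum normalized to lie between -1 (most negative) and +1 (most positive).
  3. Based on the compound score, the text is then categorized as positive, negative, or neutral. A common convention is to treat scores of 0.05 or higher as positive and scores of -0.05 or lower as negative.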
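The steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not VADER's implementation: the lexicon, the intensifier list, and the negation list are tiny made-up stand-ins, and the function names are hypothetical. The three constants are the values I understand the vaderSentiment package to use, so treat them as assumptions. The real library also handles punctuation, capitalization, and idioms, which are left out here.

```python
import math

# Toy valence ratings, made up for illustration (not VADER's lexicon).
TOY_LEXICON = {"good": 2.0, "bad": -2.0}
BOOSTERS = {"very", "really"}  # hypothetical, tiny intensifier list
NEGATIONS = {"not", "never"}   # hypothetical, tiny negation list

# Assumed to match vaderSentiment's constants; verify against the package.
BOOST = 0.293       # added to a word's score after an intensifier
NEG_SCALAR = -0.74  # multiplies a word's score after a negation
ALPHA = 15          # normalization constant for the compound score


def toy_compound(text):
    """Sum the adjusted word scores and squash the sum into (-1, 1)."""
    words = text.lower().split()
    total = 0.0
    for i, word in enumerate(words):
        score = TOY_LEXICON.get(word)
        if score is None:
            continue  # word is not in the lexicon
        prev = words[i - 1] if i > 0 else ""
        if prev in BOOSTERS:
            # Push the score further from zero, in its own direction.
            score += BOOST if score > 0 else -BOOST
        elif prev in NEGATIONS:
            # Flip the sign and dampen the strength.
            score *= NEG_SCALAR
        total += score
    return total / math.sqrt(total * total + ALPHA)


def label(compound, threshold=0.05):
    """Map a compound score to a category using the common 0.05 cutoffs."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


for sentence in ["good", "very good", "not good", "the movie"]:
    score = toy_compound(sentence)
    print(sentence, round(score, 3), label(score))
```

In this sketch only the word directly before a lexicon word is checked, which is enough to see an intensifier raise a score and a negation flip it.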

The first step is to import the necessary libraries and load the data. The code reads the CSV file ‘cleaned_tweets_extraction.csv’ with the datatable library, which offers efficient data handling capabilities, and then converts the result into a pandas DataFrame.

Note: To install VADER we can use pip install vaderSentiment

#import the necessary libraries
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import matplotlib.pyplot as plt
import datatable as dt
#reading
data = dt.fread('./cleaned_tweets_extraction.csv')
df = data.to_pandas()

Below is the main code snippet that is used for sentiment analysis.


#Create an instance of the VADER sentiment analyzer
analyzer = SentimentIntensityAnalyzer()

#Define a function to perform sentiment analysis using VADER
def get_sentiment(tokens):
    # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound'
    sentiment = analyzer.polarity_scores(tokens)
    compound_score = sentiment['compound']
    return compound_score

# Apply the function to the 'tokens' column of the DataFrame
df['sentiment'] = df['tokens'].apply(get_sentiment)

# Print the DataFrame with sentiment scores
print(df['sentiment'])

The above code adds a new column, ‘sentiment’, which holds the compound score of each tweet.

Addition of sentiment column

Visualization

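Next, the matplotlib library is imported as plt, and the scores are grouped into sentiment categories. pd.cut with bins=3 splits the observed range of the ‘sentiment’ column into three equal-width intervals, labeled Negative, Neutral, and Positive, and value_counts stores the number of tweets in each interval in the sentiment_counts variable. Then, a bar plot is created.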

import matplotlib.pyplot as plt

# Count the occurrences of each sentiment
sentiment_counts = pd.cut(df['sentiment'], bins=3, labels=['Negative', 'Neutral', 'Positive']).value_counts()

# Plot the sentiments
plt.bar(sentiment_counts.index, sentiment_counts.values)
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.title('Sentiment Analysis')
plt.show()

# Print the number of counts for each sentiment
for sentiment, count in sentiment_counts.items():
    print(f"{sentiment}: {count}")

Sentiment analysis using VADER in Python.
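It helps to see what equal-width binning does to the category boundaries. The sketch below imitates the idea behind pd.cut with bins=3 in plain Python. It is an approximation under stated assumptions: pandas also widens the range slightly so the extreme values fall inside a bin, which is ignored here, and the function name and sample scores are hypothetical.

```python
import math


def equal_width_labels(scores, labels=("Negative", "Neutral", "Positive")):
    """Assign each score to one of len(labels) equal-width bins over [min, max]."""
    low, high = min(scores), max(scores)
    width = (high - low) / len(labels)
    result = []
    for s in scores:
        # Bins are right-inclusive; the minimum always goes in the first bin.
        index = 0 if s == low else math.ceil((s - low) / width) - 1
        # Clamp to guard against floating-point overshoot at the maximum.
        result.append(labels[min(index, len(labels) - 1)])
    return result


# Hypothetical scores whose range is not symmetric around zero.
sample = [-0.2, 0.0, 0.3, 0.7, 1.0]
print(list(zip(sample, equal_width_labels(sample))))
```

With this sample the bin edges fall at roughly -0.2, 0.2, 0.6, and 1.0, so a score of 0.0 lands in the Negative bin. The boundaries depend on the smallest and largest scores in the data, which is worth keeping in mind when reading the counts.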

Conclusion

Based on the sentiment analysis results obtained using VADER for the Extraction 2 movie data scraped from Twitter, we have the following sentiment counts:

  • Neutral: 5324
  • Positive: 3010
  • Negative: 1665

The sentiment analysis shows that Twitter users’ reactions to the movie Extraction 2 were predominantly neutral. Of the 9999 tweets, 5324 (about 53%) fell in the neutral category, indicating neither strong praise nor strong criticism. A total of 3010 tweets (about 30%) were positive, conveying satisfaction and good experiences. However, 1665 tweets (about 17%) were negative, expressing complaints, unfavorable evaluations, or unhappiness with certain elements. Overall, the data points to a mixed reception in which positive tweets outnumber negative ones.
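The share of each category can be checked from the counts listed above with a few lines of Python. The variable names are my own.

```python
# Counts reported in the conclusion above.
counts = {"Neutral": 5324, "Positive": 3010, "Negative": 1665}

total = sum(counts.values())
# Percentage of all tweets in each category, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)
print(shares)
```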

FOLLOW ME to be part of my Data Analyst Journey on Medium.

Let’s get connected on Twitter, or you can email me for project collaboration, knowledge sharing, or guidance.
