Sentiment Analysis of Yelp Restaurant Reviews in Real Time
Last Updated on March 13, 2024 by Editorial Team
Author(s): Sara M.
Originally published on Towards AI.
Your Guide to Real-Time Sentiment Analysis
In the bustling world of modern-day dining experiences, customer feedback holds a paramount role in shaping the success and reputation of restaurants. The ability to harness the power of sentiment analysis on Yelp restaurant reviews emerges as a pivotal tool for restaurateurs and food enthusiasts alike. By delving into the sentiments expressed within these reviews, a nuanced understanding of customer satisfaction, preferences, and criticisms can be unveiled.
By leveraging NiFi for data automation, processing and distribution, Kafka for data streaming, Spark for real-time analysis, and NLTK for sentiment analysis, we will unravel the seamless integration of these technologies to decode the sentiments hidden within each review.
We will use the Yelp API to get user reviews.
So, before we start, you need to first get the bearer token to use the API. Find more details here.
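Before wiring everything into NiFi, you can check that your bearer token works with a minimal sketch of the call the pipeline will automate. The endpoint follows the Yelp Fusion API; API_KEY is a placeholder you must replace with your own token:

```python
import json
import urllib.request

API_KEY = "YOUR_BEARER_TOKEN"  # placeholder: paste your own Yelp bearer token here
BUSINESS_ID = "pink-mamma-paris"
URL = f"https://api.yelp.com/v3/businesses/{BUSINESS_ID}/reviews"

def fetch_reviews():
    # GET the reviews endpoint with the bearer token in the Authorization header
    req = urllib.request.Request(URL, headers={"Authorization": f"Bearer {API_KEY}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["reviews"]
```

Calling fetch_reviews() returns the list under the "reviews" key of the API response, which is exactly the JSON array NiFi will split in the next step.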
The Data Pipeline to Implement
As shown in the diagram above, here is what the pipeline does:
- Yelp reviews are fetched via HTTPS calls.
- Apache NiFi ingests and processes the data.
- Processed data is sent to Apache Kafka.
- Apache Spark consumes the Kafka data.
- Sentiment analysis is performed using Python's NLTK library.
Set-Up
As usual, setting up the environment is easy, since a single command creates all the services we need.
Let's docker-compose up!
Since I have used a similar docker-compose file and talked about Kafka, Spark, AKHQ, and NiFi on different occasions before, I will not go into details this time.
For more details about these services, you can refer to my previous articles.
Hands on : PySpark + Kafka Streaming + OpenAI
In this comprehensive guide, we will see how to set up Kafka Producer and Kafka Consumer with PySpark and OpenAIβ¦
pub.towardsai.net
Exactly-once Semantic Kafka with Nifi Finally Enabled !
Step-by-step guide to implement real-time analysis with Exactly-once delivery enabled
medium.com
Step 1: Ingest reviews in Kafka with NiFi
https://www.yelp.com/biz/pink-mamma-paris
The data flow consists of the following steps:
1. A GET request to the Yelp API to retrieve reviews about the restaurant pink-mamma-paris.
2. Split the JSON array to get each individual review.
3. Create the attributes "review", "user", and "reviewId".
4. Add the attributes to the content to generate the JSON that we want to send to the topic.
5. Send the JSON to the topic "reviews".
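To make the flow concrete, here is a plain-Python sketch of what the transformation steps do before publishing to the topic. The sample payload is a hypothetical stand-in shaped like a Yelp reviews response:

```python
import json

# Hypothetical sample shaped like the Yelp API reviews response
sample = {
    "reviews": [
        {"id": "abc123", "text": "Great pasta!", "user": {"name": "Jo L."}},
        {"id": "def456", "text": "Long wait for a table.", "user": {"name": "Brandi B."}},
    ]
}

def to_kafka_messages(payload):
    # Split the JSON array, then build one {reviewId, review, user}
    # message per review, as the NiFi flow does before sending to "reviews"
    return [
        json.dumps({"reviewId": r["id"], "review": r["text"], "user": r["user"]["name"]})
        for r in payload["reviews"]
    ]

messages = to_kafka_messages(sample)
print(messages[0])
```

Each resulting string is one Kafka record, matching the schema we will declare on the Spark side.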
I will not go into the implementation details; you can directly use my template here.
Don't forget to add the bearer token so that the InvokeHTTP processor can work.
We start the process, and within a few seconds, the topic "reviews" starts receiving data.
Step 2: In Spark, read the Data Stream of reviews from Kafka & Analyze
Let's first consume the data from the "reviews" topic:
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk
KAFKA_BOOTSTRAP_SERVERS = "kafka:9092"
KAFKA_TOPIC_SOURCE = "reviews"
spark = SparkSession.builder.appName("sentiment_analysis_reviews").getOrCreate()
# Reduce logging
spark.sparkContext.setLogLevel("WARN")
df = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP_SERVERS) \
    .option("subscribe", KAFKA_TOPIC_SOURCE) \
    .option("startingOffsets", "earliest") \
    .load()
json_schema = StructType([
StructField("reviewId", StringType()),
StructField("review", StringType()),
StructField("user", StringType())
])
review_df = df.select(from_json(col("value").cast("string"), json_schema).alias("value"))
review_df.printSchema()
Above, we created a schema (json_schema) that represents the format of the incoming data, and we use the Spark session to read the stream directly. We subscribe to the "reviews" topic to retrieve the reviews and cast the value as a string.
Next, we will start sentiment analysis.
I will use NLTK's VADER tool for this analysis. You can use other models if you want.
NLTK's VADER tool is designed to work well with social media text, including text with emojis, slang, and informal expressions.
So, to use the library, we need to download the "vader_lexicon" and put it in the volume that we mounted, /nltk_data (see the docker-compose file above).
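A small helper for that download step; the /nltk_data path is an assumption matching the volume mount described above:

```python
import nltk

NLTK_DATA_DIR = "/nltk_data"  # assumption: the volume mounted in docker-compose

def ensure_vader(download_dir=NLTK_DATA_DIR):
    # Fetch the VADER lexicon so SentimentIntensityAnalyzer can load it at runtime
    nltk.download("vader_lexicon", download_dir=download_dir)
```

Run it once before submitting the streaming job, so every Spark worker that mounts the volume can find the lexicon.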
# analyze sentiment
def analyze_sentiment(review):
    # I chose the NLTK model; you can replace it with another sentiment analysis model
    sia = SentimentIntensityAnalyzer()
    sentiment_scores = sia.polarity_scores(review)
    return str(sentiment_scores)

sentiment_udf = udf(analyze_sentiment, StringType())
# Apply the sentiment analysis UDF to the "review" field
processed_stream_df = review_df.withColumn("sentiment", sentiment_udf(col("value.review")))
# Select the desired columns to output
output_df = processed_stream_df.select(
    col("value.reviewId"),
    col("value.review"),
    col("value.user"),
    col("sentiment")
)
Here, we created an analysis function that performs sentiment analysis on the text and returns the sentiment scores as a string, then applied it to the "review" column of the dataframe.
The polarity_scores method returns a dictionary containing the sentiment scores:
- neg: the negative sentiment score.
- neu: the neutral sentiment score.
- pos: the positive sentiment score.
- compound: a composite score that sums all the lexicon ratings, normalized to be between -1 (most extreme negative) and +1 (most extreme positive).
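For intuition about why the compound score stays between -1 and +1: VADER's reference implementation normalizes the summed lexicon ratings with x/√(x² + α), where α = 15. A minimal sketch of that normalization:

```python
import math

def normalize(score, alpha=15):
    # VADER-style normalization: maps an unbounded sum of lexicon
    # ratings into the open interval (-1, 1)
    return score / math.sqrt(score * score + alpha)

print(normalize(0.0))  # a neutral sum stays at 0.0
print(normalize(6.0))  # a strongly positive sum approaches +1
```

Large positive or negative sums asymptotically approach +1 or -1 without ever reaching them, which is why even glowing reviews top out just below 1.0.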
# Write the stream to the console
query = output_df \
.writeStream \
.outputMode("append") \
.format("console") \
.option("truncate", "false") \
.start()
query.awaitTermination()
Finally, we write the result to the console.
Letβs run the script, and see what happens:
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 /apps/spark-streaming.py
Hereβs how to interpret these scores for each review:
Review 1 (Jo L.)
- Scores: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
- Interpretation: This review is considered entirely neutral, as indicated by a neu score of 1.0. There are no detected positive or negative sentiments in the text provided to the analyzer, resulting in a compound score of 0.0. This might indicate that the review text mainly contains factual statements or lacks emotional language.
Review 2 (Brandi B.)
- Scores: {'neg': 0.0, 'neu': 0.565, 'pos': 0.435, 'compound': 0.9468}
- Interpretation: This review has a strong positive sentiment. The pos (positive) score is 0.435, indicating that a significant portion of the text is positive. The neu (neutral) score is 0.565, showing that the rest of the review is considered neutral. The compound score of 0.9468 is high and positive, reinforcing the interpretation that the overall sentiment of the review is very positive.
Review 3 (Tracy P.)
- Scores: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
- Interpretation: Similar to the first review, this one is also considered entirely neutral, with a neu score of 1.0 and a compound score of 0.0. This suggests that the review may be descriptive or informational without expressing clear positive or negative feelings.
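To turn these raw scores into labels, a common convention (suggested by the VADER authors, not part of this article's pipeline) thresholds the compound score at ±0.05:

```python
def label_sentiment(compound):
    # Common VADER thresholds: >= 0.05 positive, <= -0.05 negative, else neutral
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(label_sentiment(0.9468))  # review 2 -> "positive"
print(label_sentiment(0.0))    # reviews 1 and 3 -> "neutral"
```

Such labels are easier to aggregate and chart than raw dictionaries when the results are later pushed to a dashboard.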
Now What?
Having successfully implemented real-time sentiment analysis on Yelp reviews and visualized the initial results in the console, weβve laid the groundwork for a data-driven approach to understanding customer feedback.
The process of extracting, analyzing, and interpreting sentiment from user-generated content offers invaluable insights into customer satisfaction and public perception. However, the journey doesnβt end with raw sentiment scores. The next step involves leveraging these results for deeper analysis, trend identification, and actionable business intelligence.
This leads us to the next step: integrating our sentiment analysis data with analytical tools like Grafana and Kibana for advanced visualization and monitoring.
In a future article, we will explore how to take the sentiment analysis data from Spark and integrate it with Grafana or Kibana, which can unlock the full potential of sentiment analysis in driving business strategy and customer satisfaction.
Cleaning up
Don't forget to remove the running containers:
docker-compose down
If you have any questions, feedback, or would like to share your experiences, please feel free to reach out in the comments section.
Clap my article 50 times 👏, that will really help me out and boost this article to others ✍🏻❤️.
Thank you 🫶!
Want to connect?
References
Sentiment Analysis Using VADER
VADER( Valence Aware Dictionary for Sentiment Reasoning) is an NLTK module that provides sentiment scores based on theβ¦
www.analyticsvidhya.com
- My previous articles about Kafka, Spark, and NiFi
Hands on : PySpark + Kafka Streaming + OpenAI
In this comprehensive guide, we will see how to set up Kafka Producer and Kafka Consumer with PySpark and OpenAIβ¦
pub.towardsai.net
Exactly-once Semantic Kafka with Nifi Finally Enabled !
Step-by-step guide to implement real-time analysis with Exactly-once delivery enabled
medium.com
Published via Towards AI