Sentiment Analysis of Yelp Restaurants Reviews in Real-Time

Last Updated on March 13, 2024 by Editorial Team

Author(s): Sara M.

Originally published on Towards AI.

Image by rawpixel.com from Freepik

Your Guide to Real-Time Sentiment Analysis

In the bustling world of modern-day dining experiences, customer feedback holds a paramount role in shaping the success and reputation of restaurants. The ability to harness the power of sentiment analysis on Yelp restaurant reviews emerges as a pivotal tool for restaurateurs and food enthusiasts alike. By delving into the sentiments expressed within these reviews, a nuanced understanding of customer satisfaction, preferences, and criticisms can be unveiled.

By leveraging NiFi for data automation, processing and distribution, Kafka for data streaming, Spark for real-time analysis, and NLTK for sentiment analysis, we will unravel the seamless integration of these technologies to decode the sentiments hidden within each review.

We will use the Yelp API to get user reviews.

Before we start, you first need to obtain a bearer token to use the API. You can find more details here.

The Data Pipeline to Implement

Photo by Author

As shown in the diagram above, here is what the pipeline does:

  1. Yelp reviews are fetched via HTTPS calls.
  2. Apache NiFi ingests and processes the data.
  3. Processed data is sent to Apache Kafka.
  4. Apache Spark consumes the Kafka data.
  5. Sentiment analysis is performed using Python’s NLTK library.

Set-Up

As usual, setting up the environment is easy: a single command creates all the services we need.

Let’s docker-compose up 🙂
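For orientation, a minimal compose sketch for this stack might look like the following. This is illustrative only, not the exact file I used: image tags, ports, and environment variables are assumptions you should adapt.

```yaml
version: "3"
services:
  zookeeper:
    image: bitnami/zookeeper:latest
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka:
    image: bitnami/kafka:latest
    depends_on: [zookeeper]
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
  nifi:
    image: apache/nifi:latest
    ports: ["8443:8443"]
  spark:
    image: bitnami/spark:3.5.0
    volumes:
      - ./apps:/apps          # streaming script
      - ./nltk_data:/nltk_data  # VADER lexicon volume
```

The key design point is the shared network name (`kafka:9092`) that NiFi and Spark both use to reach the broker, and the `/nltk_data` volume mounted into the Spark container.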

Since I have used a similar docker-compose file and covered Kafka, Spark, AKHQ, and NiFi on previous occasions, I will not go into the details this time.

For more details about the used services, you can refer to my previous articles.

  • Hands on : PySpark + Kafka Streaming + OpenAI (pub.towardsai.net)
  • Exactly-once Semantic Kafka with Nifi Finally Enabled ! (medium.com)

Step 1: Ingest reviews into Kafka with NiFi

Data flow implemented in NiFi _ Image by Author

The data flow consists of the following steps:

  1. A GET request to the Yelp API to retrieve reviews about the restaurant pink-mamma-paris (https://www.yelp.com/biz/pink-mamma-paris)
  2. Split the JSON array to extract each review
  3. Create the attributes “review”, “user”, and “reviewId”
  4. Add the attributes to the content to build the JSON that we want to send to the topic
  5. Send the JSON to the topic “reviews”
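To make the flow concrete, here is a small Python sketch of the kind of JSON message the flow publishes to the “reviews” topic. The field values are illustrative, not real Yelp data:

```python
import json

# Illustrative message, matching the three attributes extracted in NiFi
message = {
    "reviewId": "abc123",                # hypothetical review id
    "review": "The pasta was amazing!",  # review text
    "user": "Jane D.",                   # reviewer name
}

# This is the payload that would be sent to the "reviews" topic
payload = json.dumps(message)
print(payload)
```

Each Kafka record carries one review, so the Spark consumer can parse every message independently with a fixed schema.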

I will not go into the implementation details; you can directly use my template here.

Don’t forget to add the bearer token so that InvokeHTTP can work.

invokeHTTP properties _ Image by Author
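For reference, the InvokeHTTP call is equivalent to the request sketched below. The endpoint path follows the Yelp Fusion API’s reviews endpoint; the token value is a placeholder you must replace with your own key:

```python
# Sketch of the HTTP request that NiFi's InvokeHTTP processor performs.
# BEARER_TOKEN is a placeholder, not a real key.
BEARER_TOKEN = "YOUR_YELP_API_KEY"
BUSINESS_ID = "pink-mamma-paris"

url = f"https://api.yelp.com/v3/businesses/{BUSINESS_ID}/reviews"
headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}

# With the requests library, the equivalent call would be:
#   response = requests.get(url, headers=headers)
#   reviews = response.json()["reviews"]
print(url)
```

In NiFi, the `Authorization` header above is what you configure as a processor property, as shown in the screenshot.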

We start the process, and within a few seconds, the topic “reviews” starts receiving data.

Step 2: In Spark, read the data stream of reviews from Kafka and analyze it

Let’s first consume the data from the “reviews” topic:

from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk

KAFKA_BOOTSTRAP_SERVERS = "kafka:9092"
KAFKA_TOPIC_SOURCE = "reviews"

spark = SparkSession.builder.appName("sentiment_analysis_reviews").getOrCreate()
# Reduce logging
spark.sparkContext.setLogLevel("WARN")

# Read the stream of messages from the "reviews" topic
df = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP_SERVERS) \
    .option("subscribe", KAFKA_TOPIC_SOURCE) \
    .option("startingOffsets", "earliest") \
    .load()

# Schema of the incoming JSON messages
json_schema = StructType([
    StructField("reviewId", StringType()),
    StructField("review", StringType()),
    StructField("user", StringType())
])

# Parse the Kafka message value (bytes) into structured columns
review_df = df.select(from_json(col("value").cast("string"), json_schema).alias("value"))

review_df.printSchema()

Above, we created a schema (json_schema) that represents the format of the incoming data, and we use the Spark session to read the stream directly. We connect to the “reviews” topic to retrieve the reviews and cast the value as a string.

Next, we will start sentiment analysis.

I will use NLTK’s VADER tool for this analysis; you can use other models if you prefer.

NLTK’s VADER tool is designed to work well with social media text, including text with emojis, slang, and informal expressions.

So, to use the library, we need to download the “vader_lexicon” and put it in the volume we mounted at “/nltk_data” (see the docker-compose file above).
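One way to do that, assuming NLTK is already installed in the container, is NLTK’s downloader CLI:

```shell
# Download the VADER lexicon into the mounted /nltk_data volume
python -m nltk.downloader -d /nltk_data vader_lexicon
```

NLTK looks up `/nltk_data` by default when resolving resources, so mounting the volume at that path avoids extra configuration.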


# Analyze the sentiment of a single review
def analyze_sentiment(review):
    # I chose the NLTK model; you can replace it with another sentiment analysis model
    sia = SentimentIntensityAnalyzer()
    sentiment_scores = sia.polarity_scores(review)
    return str(sentiment_scores)


# Wrap the analysis function in a Spark UDF
sentiment_udf = udf(analyze_sentiment, StringType())

# Apply the sentiment analysis UDF to the "review" field
processed_stream_df = review_df.withColumn("sentiment", sentiment_udf(col("value.review")))

# Select the desired columns to output
output_df = processed_stream_df.select(
    col("value.reviewId"),
    col("value.review"),
    col("value.user"),
    col("sentiment")
)

Here, we created an analysis function that runs VADER on a review and returns the sentiment scores as a string, then applied it as a UDF to the “review” column of the dataframe.

The polarity_scores method returns a dictionary containing the sentiment scores. The scores include:

  • neg: the negative sentiment score.
  • neu: the neutral sentiment score.
  • pos: the positive sentiment score.
  • compound: a composite score that sums all the lexicon ratings, normalized to be between -1 (most extreme negative) and +1 (most extreme positive).
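The normalization behind the compound score can be sketched in a few lines. VADER maps the raw lexicon-rating sum x into [-1, 1] as x / sqrt(x² + α), with α = 15 in its implementation:

```python
import math

def vader_normalize(score_sum, alpha=15):
    """Map a raw lexicon-rating sum into [-1, 1], as VADER's compound score does."""
    return score_sum / math.sqrt(score_sum ** 2 + alpha)

# A strongly positive sum approaches +1, a strongly negative one approaches -1
print(vader_normalize(10))   # close to +1
print(vader_normalize(0))    # exactly 0.0
print(vader_normalize(-10))  # close to -1
```

This explains why compound never quite reaches ±1: the α term keeps the denominator strictly larger than |x|.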

# Write the stream to the console
query = output_df \
    .writeStream \
    .outputMode("append") \
    .format("console") \
    .option("truncate", "false") \
    .start()

query.awaitTermination()

Finally, we write the result to the console.

Let’s run the script, and see what happens:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 /apps/spark-streaming.py

Here’s how to interpret these scores for each review:

Review 1 (Jo L.)

  • Scores: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
  • Interpretation: This review is considered entirely neutral, as indicated by a neu score of 1.0. There are no detected positive or negative sentiments in the text provided to the analyzer, resulting in a compound score of 0.0. This might indicate that the review text mainly contains factual statements or lacks emotional language.

Review 2 (Brandi B.)

  • Scores: {'neg': 0.0, 'neu': 0.565, 'pos': 0.435, 'compound': 0.9468}
  • Interpretation: This review has a strong positive sentiment. The pos (positive) score is 0.435, indicating a significant portion of the text is positive. The neu (neutral) score is 0.565, showing that the rest of the review is considered neutral. The compound score of 0.9468 is high and positive, reinforcing the interpretation that the overall sentiment of the review is very positive.

Review 3 (Tracy P.)

  • Scores: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
  • Interpretation: Similar to the first review, this one is also considered entirely neutral with a neu score of 1.0 and a compound score of 0.0. This suggests that the review may be descriptive or informational without expressing clear positive or negative feelings.
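A common convention for turning these scores into labels, used in VADER’s own documentation and a reasonable default here, is to threshold the compound score at ±0.05:

```python
def label_sentiment(compound, threshold=0.05):
    """Classify a VADER compound score using the conventional +/-0.05 cutoffs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Applied to the three reviews above:
print(label_sentiment(0.0))     # Jo L.     -> neutral
print(label_sentiment(0.9468))  # Brandi B. -> positive
print(label_sentiment(0.0))     # Tracy P.  -> neutral
```

Such a labeling step could be added to the Spark pipeline as a second UDF if you want categorical output instead of raw scores.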

Now What?

Having successfully implemented real-time sentiment analysis on Yelp reviews and visualized the initial results in the console, we’ve laid the groundwork for a data-driven approach to understanding customer feedback.

The process of extracting, analyzing, and interpreting sentiment from user-generated content offers invaluable insights into customer satisfaction and public perception. However, the journey doesn’t end with raw sentiment scores. The next step involves leveraging these results for deeper analysis, trend identification, and actionable business intelligence.

This leads us to the next step: integrating our sentiment analysis data with analytical tools like Grafana and Kibana for advanced visualization and monitoring.

In a future article, we will explore how to take the sentiment analysis data from Spark and integrate it with Grafana or Kibana, which can unlock the full potential of sentiment analysis in driving business strategy and customer satisfaction.

Cleaning up

Don’t forget to remove the running containers:

docker-compose down

If you have any questions, feedback, or would like to share your experiences, please feel free to reach out in the comments section.


References

  • Sentiment Analysis Using VADER (www.analyticsvidhya.com)

  • My previous articles about Kafka, Spark, and NiFi:
  • Hands on : PySpark + Kafka Streaming + OpenAI (pub.towardsai.net)
  • Exactly-once Semantic Kafka with Nifi Finally Enabled ! (medium.com)
