Azure Cognitive Services Sentiment Analysis v3.0 using Databricks PySpark

Last Updated on July 19, 2023 by Editorial Team

Author(s): Rory McManus

Originally published on Towards AI.

Cloud Computing, Natural Language Processing

Azure Cognitive Services Text Analytics is a useful tool for quickly evaluating a text data set for positive or negative sentiment. For example, a service provider can classify reviews as positive or negative and rank them by the sentiment score detected.

As more and more businesses rely on electronic communications with their clients, understanding the overall sentiment attached to your product, service or image has never been more important. Sentiment analysis allows companies to automatically detect sentiment in any text (reviews, insurance claims, triaging, etc.) in a fast and highly scalable way.

My latest project was with a property management company. The aim was to use the sentiment scores from client feedback on properties to identify and prioritise major issues, which enabled quicker resolution of those issues and improved customer service.

Today I’m going to go through how to use Azure Cognitive Services Text Analytics from a Databricks PySpark notebook to analyze the sentiment of COVID-19 tweets, returning sentiment scores and a label indicating whether each tweet is positive or negative.

What is Azure Cognitive Services Text Analytics?

Cognitive Services are a set of machine learning algorithms that Microsoft has developed to solve problems in the field of Artificial Intelligence (AI). Developers can consume these algorithms through standard REST calls over the Internet to the Cognitive Services APIs in their Apps, Websites, or Workflows.

For this article, we will focus on the Text Analytics API Sentiment Analysis feature, which evaluates the text and returns sentiment scores and labels for each document and sentence. This is useful for detecting positive and negative sentiment, in the languages the service supports, in social media, client reviews, discussion forums, and more.

Consuming the Sentiment Analysis API using PySpark.

To analyse text and return a sentiment analysis for our data, we need the code to complete the following steps:

1. Import a dataset with a text column.
2. Set a parameter to identify the input dataset text column name, making our code dynamic.
3. Set the Azure Cognitive Services API endpoint and key.
4. Create the input DataFrame ready for the API post, with only the id, language and text columns.
5. Convert the DataFrame to JSON ready for the API post.
6. Post the JSON document to the Sentiment Analysis API.
7. Flatten the JSON API response into a DataFrame with rows and columns.
8. Join the DataFrame with the original dataset to produce the final dataset and display it for analysis.

Steps

1. Add the required imports to your PySpark notebook and create the input DataFrame by importing a COVID-19 tweet dataset.
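The original notebook cell is not reproduced here, so the following is only a minimal sketch of this step in plain Python rather than PySpark. The sample data and the column names `user_name` and `text` are assumptions made for illustration; the real dataset's file name and schema are not given in this article.

```python
import csv
import io

# Hypothetical two-row sample standing in for the COVID-19 tweet dataset.
# In a Databricks notebook you would more likely read the real file with
# spark.read.csv(...) into a DataFrame instead.
SAMPLE_CSV = """user_name,text
alice,"Vaccines are rolling out, feeling hopeful!"
bob,"Another lockdown. This is exhausting."
"""


def load_rows(csv_text):
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = load_rows(SAMPLE_CSV)
for row in rows:
    print(row)
```

Each dictionary plays the role of one DataFrame row, which keeps the later steps easy to follow outside a Spark cluster.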

2. Create a text column parameter and set it to the name of the column you want analyzed.
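As a small sketch of this step, the parameter can be a plain variable that is validated against the dataset's columns before anything is sent to the API. The column list and the helper `require_column` are hypothetical and are not taken from the original notebook.

```python
def require_column(columns, name):
    """Return the column name if it exists, otherwise fail early."""
    if name not in columns:
        raise ValueError(f"Column '{name}' not found. Available: {list(columns)}")
    return name


# Hypothetical column names for the tweet dataset.
available_columns = ["user_name", "date", "text"]

# The single place to change when pointing the code at another dataset.
text_column = require_column(available_columns, "text")
print(text_column)
```

Failing early on a misspelled column name is cheaper than discovering it after a paid API call.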

3. For the purpose of this demonstration, we will set the Sentiment Analysis API parameters manually. Please be aware that a more secure method would be to store the key in Azure Key Vault.
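A minimal sketch of the API settings follows. It reads the endpoint and key from environment variables as one step up from hard-coding them; the variable names `TEXT_ANALYTICS_ENDPOINT` and `TEXT_ANALYTICS_KEY` and the example endpoint are assumptions, and the placeholder defaults must be replaced with your own resource values. The route `/text/analytics/v3.0/sentiment` is the v3.0 sentiment path.

```python
import os

# Route of the v3.0 sentiment operation, appended to the resource endpoint.
SENTIMENT_ROUTE = "/text/analytics/v3.0/sentiment"


def build_sentiment_url(endpoint):
    """Join the resource endpoint and the sentiment route without double slashes."""
    return endpoint.rstrip("/") + SENTIMENT_ROUTE


def load_api_settings(env=None):
    """Read endpoint and key from a mapping (defaults to the process environment).

    The variable names and default placeholders are hypothetical.
    """
    env = os.environ if env is None else env
    endpoint = env.get(
        "TEXT_ANALYTICS_ENDPOINT", "https://example.cognitiveservices.azure.com"
    )
    subscription_key = env.get("TEXT_ANALYTICS_KEY", "<your-subscription-key>")
    return {
        "endpoint": endpoint,
        "url": build_sentiment_url(endpoint),
        "subscription_key": subscription_key,
    }


settings = load_api_settings()
print(settings["url"])  # the key is deliberately not printed
```

On Databricks, a secret scope backed by Azure Key Vault would be the more secure source for the key than an environment variable.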

4. The payload to the API consists of a list of JSON documents, each an object containing an id, a language and a text attribute. The text attribute stores the text to be analyzed, the language is the language of the text, and the id can be any value that is unique within the request. Therefore we need to add an id column and select only the id, language and text columns for the API payload.
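The shape of each document can be sketched in plain Python as below. This is a stand-in for the DataFrame step rather than the original PySpark code; the sample rows, the function name `build_documents` and the default language `"en"` are assumptions.

```python
def build_documents(rows, text_column, language="en"):
    """Keep only what the API needs: a unique id, a language and the text."""
    documents = []
    for index, row in enumerate(rows, start=1):
        documents.append(
            {
                "id": str(index),  # ids are sent as strings
                "language": language,
                "text": row[text_column],
            }
        )
    return documents


# Hypothetical input rows; every other column is dropped from the payload.
rows = [
    {"user_name": "alice", "text": "Vaccines are rolling out, feeling hopeful!"},
    {"user_name": "bob", "text": "Another lockdown. This is exhausting."},
]
documents = build_documents(rows, text_column="text")
for document in documents:
    print(document)
```

The generated id must also be kept alongside the original rows, because it is the only key available later for joining the API results back to the input.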

5. Convert the DataFrame dfCog into a DataFrame holding a JSON string in the format the API expects.
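The request body wraps the documents in a top-level `documents` array. The sketch below builds that JSON string in plain Python and also splits the documents into batches, since the service limits how many documents one request may carry. The batch size of 10 is an assumption, so check the current service limits for your API version.

```python
import json


def to_payloads(documents, batch_size=10):
    """Serialise documents into one JSON string per batch of at most batch_size."""
    payloads = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        payloads.append(json.dumps({"documents": batch}))
    return payloads


# Hypothetical documents in the id / language / text shape described above.
documents = [
    {"id": "1", "language": "en", "text": "Vaccines are rolling out, feeling hopeful!"},
    {"id": "2", "language": "en", "text": "Another lockdown. This is exhausting."},
    {"id": "3", "language": "en", "text": "Working from home again today."},
]
payloads = to_payloads(documents, batch_size=2)
for payload in payloads:
    print(payload)
```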

6. Post the JSON payload to the API, passing in the subscription_key, endpoint and document.
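A hedged sketch of the POST using only the Python standard library is shown below; the original notebook may well have used a different HTTP client. The key travels in the `Ocp-Apim-Subscription-Key` header. The function names, the example endpoint and the key placeholder are assumptions, and only the request construction is exercised here, so nothing is sent over the network until `post_sentiment` is called with real values.

```python
import json
import urllib.request


def build_sentiment_request(subscription_key, endpoint, document):
    """Build (but do not send) a POST request for the v3.0 sentiment route."""
    url = endpoint.rstrip("/") + "/text/analytics/v3.0/sentiment"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url, data=document.encode("utf-8"), headers=headers, method="POST"
    )


def post_sentiment(subscription_key, endpoint, document, timeout=30):
    """Send the request and return the parsed JSON response."""
    request = build_sentiment_request(subscription_key, endpoint, document)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder values: replace with your own resource endpoint and key.
document = json.dumps(
    {"documents": [{"id": "1", "language": "en", "text": "Stay safe everyone!"}]}
)
request = build_sentiment_request(
    "<your-subscription-key>",
    "https://example.cognitiveservices.azure.com",
    document,
)
print(request.get_method(), request.full_url)
```

With a real endpoint and key, `post_sentiment(subscription_key, endpoint, document)` would return the response as a Python dictionary; an invalid key or malformed payload surfaces as a `urllib.error.HTTPError`.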

7. Now that the response has been returned as JSON, we must flatten it into rows and columns.
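The flattening can be sketched as follows, assuming the v3.0 response shape in which each entry under `documents` carries an `id`, a `sentiment` label and a `confidenceScores` object. The sample response is hand-written for illustration and is not real service output; the function name is hypothetical.

```python
def flatten_response(response):
    """Turn a sentiment response into one flat row per document."""
    rows = []
    for doc in response.get("documents", []):
        scores = doc.get("confidenceScores", {})
        rows.append(
            {
                "id": doc["id"],
                "sentiment": doc.get("sentiment"),
                "positive": scores.get("positive"),
                "neutral": scores.get("neutral"),
                "negative": scores.get("negative"),
            }
        )
    return rows


# Illustrative response only: the values are invented, not returned by the API.
sample_response = {
    "documents": [
        {
            "id": "1",
            "sentiment": "positive",
            "confidenceScores": {"positive": 0.97, "neutral": 0.02, "negative": 0.01},
            "sentences": [],
        },
        {
            "id": "2",
            "sentiment": "negative",
            "confidenceScores": {"positive": 0.03, "neutral": 0.07, "negative": 0.90},
            "sentences": [],
        },
    ],
    "errors": [],
}
sentiment_rows = flatten_response(sample_response)
for row in sentiment_rows:
    print(row)
```

Documents the service could not process are reported under `errors` rather than `documents`, so it is worth checking that list before trusting the row count.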

8. Finally, we can join the analyzed dataset to the input dataset, drop the added id column, and display the final output.
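A plain-Python sketch of the join is given below. It behaves like a left join on the id, so input rows with no API result are kept, and it removes the helper id afterwards. The row contents and the function name `join_and_drop_id` are assumptions for illustration.

```python
def join_and_drop_id(input_rows, sentiment_rows, id_column="id"):
    """Left-join sentiment results onto the input rows, then drop the id."""
    sentiment_by_id = {row[id_column]: row for row in sentiment_rows}
    final_rows = []
    for row in input_rows:
        merged = dict(row)
        merged.update(sentiment_by_id.get(row[id_column], {}))
        merged.pop(id_column, None)  # the id was only needed for the join
        final_rows.append(merged)
    return final_rows


# Hypothetical inputs: the original rows with the added id, and flattened results.
input_rows = [
    {"id": "1", "text": "Vaccines are rolling out, feeling hopeful!"},
    {"id": "2", "text": "Another lockdown. This is exhausting."},
]
sentiment_rows = [
    {"id": "1", "sentiment": "positive", "positive": 0.97, "neutral": 0.02, "negative": 0.01},
    {"id": "2", "sentiment": "negative", "positive": 0.03, "neutral": 0.07, "negative": 0.90},
]
final_rows = join_and_drop_id(input_rows, sentiment_rows)
for row in final_rows:
    print(row)
```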

The final result provides an overall sentiment label for each tweet, together with confidence scores between 0.0 and 1.0 for the positive, neutral and negative classes. A higher score indicates greater confidence that the text belongs to that class.

I have turned this into a reusable PySpark function. If you would like a copy, please drop me a message and I can send you a link to my private Git repo.

I hope this was helpful in saving you time understanding Azure Cognitive Sentiment Analysis and PySpark. Any thoughts, questions, corrections, and suggestions are very welcome 🙂

If you liked this article, here are some other articles you may enjoy:

Databricks PySpark Type 2 SCD Function for Azure Synapse Analytics

Slowly Changing Dimensions (SCD) is a commonly used dimensional modeling technique used in data warehousing to capture…

Databricks: Upsert to Azure SQL using PySpark

An Upsert is an RDBMS feature that allows a DML statement’s author to automatically either insert a row, or if the row…
