Starbucks Sales Analysis – Part 1
Last Updated on December 28, 2021 by Editorial Team
Author(s): Abhishek Jana
Originally published on Towards AI.
An in-depth look at Starbucks sales data!
Every dataset tells a story! As part of Udacity's Data Science Nanodegree program, I was fortunate enough to have a look at Starbucks' sales data. In this capstone project, I was free to analyze the data in my own way. So, in this blog, I will try to explain what I did.
Dataset Overview
The data was created to give an overview of the following:
- How people's purchase decisions respond to different promotional offers.
- How the three types of offers, BOGO (buy one get one), discount, and informational, influence purchases.
- How the offers influence a particular group of people.
There are 3 files in the dataset:
profile.json
Rewards program users (17000 users x 5 fields)
- gender: (categorical) M, F, O, or null
- age: (numeric) missing value encoded as 118
- id: (string/hash) id of each user
- became_member_on: (date) format YYYYMMDD
- income: (numeric)
portfolio.json
Offers sent during the 30-day test period (10 offers x 6 fields)
- reward: (numeric) money awarded for the amount spent
- channels: (list) web, email, mobile, social
- difficulty: (numeric) money required to be spent to receive a reward
- duration: (numeric) time for the offer to be open, in days
- offer_type: (string) BOGO, discount, informational
- id: (string/hash) id of the offer
transcript.json
Event log (306648 events x 4 fields)
- person: (string/hash)
- event: (string) offer received, offer viewed, transaction, offer completed
- value: (dictionary) different values depending on event type
  - offer id: (string/hash) not associated with any "transaction"
  - amount: (numeric) money spent in "transaction"
  - reward: (numeric) money gained from "offer completed"
- time: (numeric) hours after the start of the test
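Because the value field is a dictionary whose keys depend on the event type, it helps to flatten it into ordinary columns before analysis. The following is a minimal sketch of one way to do that with pandas. The three sample rows are made up to match the schema above, and the column name `offer_id` is my own choice, not something fixed by the dataset.

```python
import json
import pandas as pd

# Made-up sample rows in the transcript.json schema (not real data).
raw = """\
{"person": "p1", "event": "offer received", "value": {"offer id": "o1"}, "time": 0}
{"person": "p1", "event": "transaction", "value": {"amount": 9.5}, "time": 6}
{"person": "p1", "event": "offer completed", "value": {"offer id": "o1", "reward": 5}, "time": 6}
"""
records = [json.loads(line) for line in raw.splitlines()]
transcript = pd.DataFrame(records)

# Expand each dictionary into columns; keys missing for an event become NaN.
values = pd.DataFrame(transcript["value"].tolist())
values = values.rename(columns={"offer id": "offer_id"})  # hypothetical column name

# Put the expanded columns next to person, event, and time.
flat = pd.concat([transcript.drop(columns="value"), values], axis=1)
print(flat)
```

After this step, transaction rows carry an amount and no offer id, while offer rows carry an offer id and, for completions, a reward.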
Problem Statement
There are three main questions I attempted to answer.
- What is the spending pattern based on offer type and demographics?
- How to recommend coupons/offers to current customers based on their spending pattern?
- How to recommend coupons/offers to new customers?
Data Analysis
From the portfolio.json file, I found out that there are 10 offers of 3 different types: BOGO, Discount, Informational.
BOGO: For the buy-one-get-one offer, a user needs to spend a threshold amount to earn a reward equal to that threshold.
Discount: In this offer, a user needs to spend a certain amount to get a discount.
Informational: This type of offer has no discount or minimum amount to spend.
To redeem the offers one has to spend 0, 5, 7, 10, or 20 dollars.
The profile.json file holds information on 17000 unique people. The data has some null values: looking at it, we can say that some people did not disclose their gender, age, or income. That's why the gender and income columns have the same number of null values, and the corresponding rows in the age column have 118 as the age.
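The pattern described above (null gender, null income, and age 118 appearing on the same rows) can be checked directly. Here is a minimal sketch on a made-up four-row profile table. The values are illustrative only, and converting the placeholder age to NaN is one possible choice, not necessarily what the original notebook did.

```python
import pandas as pd

# Made-up rows in the profile.json schema (not real data).
profile = pd.DataFrame({
    "gender": ["M", None, "F", None],
    "age": [35, 118, 62, 118],
    "id": ["a", "b", "c", "d"],
    "became_member_on": [20170212, 20180426, 20160101, 20170804],
    "income": [72000.0, None, 91000.0, None],
})

# Null counts per column.
print(profile.isna().sum())

# Do missing gender, missing income, and age == 118 fall on the same rows?
placeholder_age = profile["age"] == 118
same_rows = bool(
    (profile["gender"].isna() == profile["income"].isna()).all()
    and (profile["gender"].isna() == placeholder_age).all()
)
print("Missing values coincide:", same_rows)

# Treat the placeholder age as missing so it does not distort averages.
profile["age"] = profile["age"].mask(placeholder_age)

# Parse the YYYYMMDD integers into real dates.
profile["became_member_on"] = pd.to_datetime(
    profile["became_member_on"].astype(str), format="%Y%m%d"
)
```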
Distribution of the profile data
The mean age is roughly the same across genders.
As we can see, the age data is nearly Gaussian (slightly right-skewed) with 118 as an outlier, whereas the income data is right-skewed.
The transcript.json data has the transaction details of the 17000 unique people. Four types of events are registered: transaction, offer received, offer viewed, and offer completed.
The value column holds the offer id, the transaction amount, or the reward, depending on the event type.
Data Preprocessing
To answer the first question (What is the spending pattern based on offer type and demographics?), I will rearrange the data files and break the question into a few sub-questions.
The sub-questions are:
- What are the popular offers?
- How are offers utilized among different genders?
- How do transactions vary with gender, age, and income?
Firstly, I merged the portfolio.json, profile.json, and transcript.json files to add the demographic information and offer information for better visualization. So my new dataset had the following columns:
'person', 'event', 'value', 'time', 'gender', 'age', 'income', 'date'.
Also, I changed the "null" gender to "Unknown" to make it a new category.
Let's recap the columns for better understanding:
- person(category): 17000 unique users.
- event(category): 4 unique categories: offer completed, offer received, offer viewed, and transaction.
- value(category/numeric): when event = "transaction", value is numeric; otherwise it is categorical, with offer ids as categories.
- time(numeric): 0 is the start of the experiment.
- gender(category): 4 unique categories: Male, Female, Other, and Unknown.
- age(numeric): numeric column with 118 being unknown or outlier.
- income(numeric): numeric column with some null values corresponding to 118 age.
- date: date of the transaction.
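The merge and the "Unknown" relabeling described above can be sketched as follows. This is a simplified illustration on made-up rows that only joins the profile side. The join keys (`person` on the event log, `id` on the profile) follow the field lists earlier in this post, and the left join is my assumption about how to keep every event.

```python
import pandas as pd

# Made-up rows (not real data).
profile = pd.DataFrame({
    "id": ["a", "b"],
    "gender": ["F", None],
    "age": [41, 118],
    "income": [65000.0, None],
})
transcript = pd.DataFrame({
    "person": ["a", "b", "a"],
    "event": ["offer received", "transaction", "transaction"],
    "time": [0, 12, 18],
})

# Left join keeps every event and attaches the user's demographics to it.
merged = transcript.merge(
    profile, how="left", left_on="person", right_on="id"
).drop(columns="id")

# Give users with undisclosed gender their own category.
merged["gender"] = merged["gender"].fillna("Unknown")
print(merged)
```

Keeping "Unknown" as a category, rather than dropping those rows, lets the later plots show how users with no demographic data behave.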
What are the popular types of offers?
We can plot what percentage of the distributed offers were BOGO, Discount, and Informational, and then find out what percentage of the offers were received, viewed, and completed.
To do so, I separated the offer data from the transaction data (event = "transaction").
We can see that informational offers don't need to be completed. BOGO and Discount offers were distributed evenly, but:
- BOGO offers were viewed more than discount offers.
- Discount offers were completed more.
So, discount offers were more popular in terms of completion.
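One way to compute these percentages is to count events per offer type and divide by the number of offers received. The sketch below uses a handful of made-up events, so the numbers do not reflect the real data. It also assumes each event row already carries an `offer_type` column from the merge with the portfolio, and that rates are measured per event rather than per unique offer.

```python
import pandas as pd

# Made-up offer events (transactions already removed).
offers = pd.DataFrame({
    "offer_type": [
        "bogo", "bogo", "bogo",
        "discount", "discount", "discount", "discount",
        "informational", "informational",
    ],
    "event": [
        "offer received", "offer viewed", "offer completed",
        "offer received", "offer viewed", "offer completed", "offer received",
        "offer received", "offer viewed",
    ],
})

# Rows: offer type, columns: event, cells: number of events.
counts = pd.crosstab(offers["offer_type"], offers["event"])

# Share of each offer type among all offers sent out.
type_share = counts["offer received"] / counts["offer received"].sum() * 100

# Viewed and completed events as a percentage of received, per offer type.
rates = counts[["offer viewed", "offer completed"]].div(
    counts["offer received"], axis=0
) * 100

print(type_share)
print(rates)
```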
How offers are utilized among different genders?
Since there is no offer completion for an "informational" offer, we can ignore the rows containing "informational" offers when relating offers viewed to offers completed.
From the "Average offer received by gender" plot, we see that the average number of offers received per person is nearly the same across genders.
The "distribution of offers by Gender" plot shows, for each gender, the percentage of received offers that were viewed and the percentage of received offers that were completed.
We see that:
- Customers in the Other group viewed the most offers.
- Male customers viewed the fewest offers.
- Female customers completed the most offers.
- The Unknown group completed the fewest offers.
We can say that, given an offer, the chance of redeeming it is higher among the Female and Other genders!
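The two plots above rest on three quantities per gender: offers received per person, the view rate, and the completion rate. A minimal sketch of how they could be computed is shown here. The events are made up, and the column names assume the merged table described earlier with an added `offer_type` column.

```python
import pandas as pd

# Made-up offer events with demographics attached (not real data).
events = pd.DataFrame({
    "person": ["p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3"],
    "gender": ["F", "F", "F", "M", "M", "M", "M", "M"],
    "offer_type": ["bogo", "bogo", "bogo", "discount", "bogo", "discount",
                   "informational", "informational"],
    "event": ["offer received", "offer viewed", "offer completed",
              "offer received", "offer received", "offer viewed",
              "offer received", "offer viewed"],
})

# Informational offers cannot be completed, so leave them out.
non_info = events[events["offer_type"] != "informational"]

# Event counts per gender.
counts = non_info.groupby(["gender", "event"]).size().unstack(fill_value=0)

# Average number of offers received per person, by gender.
people = non_info.groupby("gender")["person"].nunique()
received_per_person = counts["offer received"] / people

# Percentage of received offers that were viewed or completed, by gender.
rates = counts[["offer viewed", "offer completed"]].div(
    counts["offer received"], axis=0
) * 100

print(received_per_person)
print(rates)
```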
How do transactions vary with gender, age, and income?
From the transaction data, let's try to find out how gender, age, and income relate to the average transaction amount.
We can see the expected trend in age and income vs. expenditure: mean expenditure increases with both age and income.
- In the gender plot, we see women tend to spend the most, and the group with no demographic data (Unknown gender) tends to spend the least.
- There's a positive correlation between age and average spending.
- People with higher incomes spend more.
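These comparisons come down to grouped means and a correlation. The sketch below shows one way to compute them on made-up transactions. The age bands are hypothetical bin edges chosen for illustration, and rows with the placeholder age 118 are excluded from the age and income views.

```python
import pandas as pd

# Made-up transactions with demographics attached (not real data).
tx = pd.DataFrame({
    "gender": ["F", "M", "F", "Unknown", "M", "O"],
    "age": [62, 28, 45, 118, 51, 39],
    "income": [95000.0, 41000.0, 70000.0, None, 58000.0, 66000.0],
    "amount": [24.0, 6.5, 15.0, 2.0, 11.0, 14.0],
})

# Average transaction amount per gender, highest first.
mean_by_gender = tx.groupby("gender")["amount"].mean().sort_values(ascending=False)

# Drop the placeholder age before looking at age and income.
known = tx[tx["age"] != 118]

# Average amount per age band (hypothetical bin edges).
age_band = pd.cut(known["age"], bins=[17, 35, 50, 65, 101])
mean_by_age = known.groupby(age_band, observed=True)["amount"].mean()

# Pearson correlation of age and income with the transaction amount.
corr = known[["age", "income", "amount"]].corr()["amount"]

print(mean_by_gender)
print(mean_by_age)
print(corr)
```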
Conclusion
So, in conclusion, to answer the question "What is the spending pattern based on offer type and demographics?"
The possible answer is:
- Although BOGO offers were viewed more, Discount offers were more popular in terms of completion.
- Given an offer, the chance of redeeming the offer is higher among the Female and Other genders!
- Women tend to spend the most.
- Spending increases with age and income.
In part 2 of this blog, I will explain:
- How to recommend coupons/offers to current customers based on their spending pattern?
- How to recommend coupons/offers to new customers?
A link to part 2 of this blog can be found here.
The GitHub repository of this project can be found here.