
Designing a Promotional Strategy for Alcoholic Drinks in Russia

Last Updated on January 6, 2023 by Editorial Team

Author(s): Abid Ali Awan

 

Originally published on Towards AI.

The main goal is to find the next ten locations similar to Saint Petersburg using unsupervised learning.

Photo by Elevate on Unsplash

According to a 2011 report by the World Health Organization, alcohol consumption in Russia remains among the highest in the world, which makes it a promising place to run a beverage business. People love their drinks, and a company that owns a chain of stores across Russia selling a variety of alcoholic drinks wants to invest in marketing campaigns. The company recently ran a wine promotion in Saint Petersburg that was very successful. Due to the cost to the business, it isn’t possible to run the promotion in all regions. In this project, we are going to analyze our data, fix missing values, visualize the data, train the clustering model, and finally visualize our results.

Data

The marketing team has provided you with historical sales volumes per capita for several different drink types.

The dataset is available at Alcohol Consumption in Russia (1998–2016) | Kaggle under the Creative Commons CC0 1.0 Universal License.

  • “year”: year (1998–2016)
  • “region”: the name of a federal subject of Russia. It could be an oblast, republic, krai, autonomous okrug, federal city, or the single autonomous oblast
  • “wine”: sale of wine in liters by year per capita
  • “beer”: sale of beer in liters by year per capita
  • “vodka”: sale of vodka in liters by year per capita
  • “champagne”: sale of champagne in liters by year per capita
  • “brandy”: sale of brandy in liters by year per capita

Loading Dataset

We have used pandas to load the .csv dataset. It is a pretty small dataset containing yearly (1998–2016) alcohol consumption (beer, champagne, brandy, wine, vodka) for each of the 85 regions.

There are 1,615 samples, which is logical as we have 19 years of data and 85 regions (19 × 85 = 1,615). Beer is leading the game: its mean value is 51.3 liters by year per capita, and the second-highest is vodka at 11.81 liters by year per capita, which is not even close to beer. This means people prefer beer as a go-to beverage. Beer also has the highest standard deviation, which means that its demand is not stable and can fluctuate with time, whereas champagne and brandy, with the lowest standard deviations, are a pretty safe bet if you want to start a low-risk business.
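The loading and summary step can be sketched as follows. This is a minimal sketch, not the author's notebook code: the real file name is not given in the article, so the example reads a tiny made-up sample from memory, and `load_sales` is a hypothetical helper name.

```python
import io

import pandas as pd

# Tiny made-up sample with the same columns as the dataset described above.
# The numbers are illustrative only, not real sales figures.
SAMPLE_CSV = """year,region,wine,beer,vodka,champagne,brandy
1998,Region A,1.9,8.8,3.4,0.3,0.1
1998,Region B,3.3,19.2,11.3,1.1,0.1
1999,Region A,2.1,10.0,3.6,0.4,0.1
1999,Region B,3.5,21.0,11.0,1.2,0.2
"""


def load_sales(source):
    """Read the sales table from a file path or a file-like object."""
    return pd.read_csv(source)


df = load_sales(io.StringIO(SAMPLE_CSV))

# With complete data, rows = years x regions (19 x 85 = 1,615 in the article).
n_years = df["year"].nunique()
n_regions = df["region"].nunique()

# Mean and standard deviation per drink, as discussed in the text.
summary = df.drop(columns="year").describe()
print(df.shape, n_years * n_regions)
print(summary.loc[["mean", "std"]])
```

With the real data, passing the path of the downloaded Kaggle file to `load_sales` should give the 1,615-row table described above.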

We can explore all the unique regions within the dataset.

Correlation

There is a high correlation between champagne and brandy, which makes them even more attractive. If you promote champagne, brandy sales are likely to rise along with champagne sales, which would make it a win-win situation.
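One way to check which pair of drinks is most strongly correlated is sketched below. The data is made up so that champagne and brandy move together, and `strongest_pair` is a hypothetical helper, not something taken from the original notebook.

```python
import numpy as np
import pandas as pd

DRINKS = ["wine", "beer", "vodka", "champagne", "brandy"]


def strongest_pair(df, columns=DRINKS):
    """Return the most positively correlated pair of columns and its coefficient."""
    corr = df[columns].corr()
    # Keep only the upper triangle so each pair appears once and the
    # diagonal (always 1.0) is ignored.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs.idxmax(), pairs.max()


# Made-up values; brandy is an exact multiple of champagne here.
demo = pd.DataFrame({
    "wine": [5.0, 3.0, 6.0, 4.0, 5.5],
    "beer": [40.0, 55.0, 45.0, 60.0, 50.0],
    "vodka": [12.0, 10.0, 11.0, 9.0, 13.0],
    "champagne": [1.0, 1.5, 2.0, 2.5, 3.0],
    "brandy": [0.2, 0.3, 0.4, 0.5, 0.6],
})

pair, value = strongest_pair(demo)
print(pair, round(value, 3))
```

A high coefficient only says the two series move together across regions and years; it does not by itself show that promoting one drink causes sales of the other to rise.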

Missing Values

We will be using the pandas dataframe background_gradient style to display the number and percentage of missing values. It seems like all drinks columns have missing values, and the highest is brandy with 66.

We are going to use the fillna function with the pad method to fill missing values with the previous values in a column. After this step, there are no missing values left in our dataset.
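The missing-value summary and the fill step could look roughly like this. Two things here are assumptions rather than the author's code: recent pandas releases deprecate `fillna(method="pad")` in favor of `ffill()`, so the sketch uses `ffill()`, and it fills within each region so that a value is never carried over from a different region. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Count and percentage of missing values per column."""
    counts = df.isna().sum()
    return pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})


def fill_forward(df, columns):
    """Forward-fill gaps per region, in year order (assumed variant)."""
    filled = df.sort_values(["region", "year"]).copy()
    filled[columns] = filled.groupby("region")[columns].ffill()
    return filled


# Made-up sample with a few gaps.
demo = pd.DataFrame({
    "year": [1998, 1999, 2000, 1998, 1999, 2000],
    "region": ["A", "A", "A", "B", "B", "B"],
    "wine": [1.0, np.nan, 3.0, 4.0, 5.0, np.nan],
    "brandy": [0.1, 0.2, np.nan, 0.4, np.nan, np.nan],
})

summary = missing_summary(demo)
filled = fill_forward(demo, ["wine", "brandy"])
print(summary)
print(filled)
```

Note that a forward fill cannot repair a gap in the first year of a region, because there is no earlier value to copy; such rows would need a backward fill or another strategy.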

Geo Location

For geolocation, we need coordinates to display statistics on the map. For that we need:

  • geopy -> Nominatim
  • creating a user agent to connect to the geopy server.
  • creating lat and lon functions to extract latitude and longitude using the name of the place.
  • taking value counts of the region column, resetting the index, and then renaming the columns.
  • applying both functions to geo['region']
  • exporting the file as “russian_geo.csv”

This process takes 5 minutes to run, so we are just going to save the results in a .csv file and later merge it with our main database.
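The geocoding steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the user agent string, the one-second delay, and the helper names are hypothetical, and a stand-in geocoder with made-up behavior is used for the demo so the example runs without network access.

```python
from types import SimpleNamespace

import pandas as pd


def build_geocoder(user_agent="alcohol-russia-demo", delay_seconds=1.0):
    """Return a rate-limited Nominatim geocode callable (needs geopy and network access)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    return RateLimiter(geolocator.geocode, min_delay_seconds=delay_seconds)


def geocode_regions(regions, geocode):
    """Look up each unique region name once and collect latitude and longitude."""
    rows = []
    for name in dict.fromkeys(regions):  # de-duplicate, keep order
        location = geocode(name)  # returns None when nothing is found
        rows.append({
            "region": name,
            "lat": location.latitude if location is not None else None,
            "lon": location.longitude if location is not None else None,
        })
    return pd.DataFrame(rows)


def fake_geocode(name):
    """Stand-in for the real service, used only for this demo."""
    if name == "Saint Petersburg":
        return SimpleNamespace(latitude=59.9, longitude=30.3)
    return None


demo_geo = geocode_regions(["Saint Petersburg", "Saint Petersburg", "Nowhere"], fake_geocode)
print(demo_geo)

# With the real service (slow, one request per region):
# geo = geocode_regions(df["region"], build_geocoder())
# geo.to_csv("russian_geo.csv", index=False)
```

Passing the geocoder in as an argument keeps the slow network call separate from the table-building logic, which is also why caching the result in a .csv file works well here.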

Image by Author
  • loading the geolocation dataset.
  • merging it with the main dataset.
  • grouping by “region” and taking the mean.
  • sorting values by “beer” in descending order.

We are going to use the df_geo dataset to plot the total alcohol consumption on the Plotly map.
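The merge, group, and sort steps listed above might be written like this. It is a sketch on made-up data; only the `df_geo` name and the “region” and “beer” columns come from the article, the rest is assumed.

```python
import pandas as pd

# Made-up sales and coordinates for two regions.
sales = pd.DataFrame({
    "year": [1998, 1999, 1998, 1999],
    "region": ["Region A", "Region A", "Region B", "Region B"],
    "wine": [2.0, 4.0, 5.0, 7.0],
    "beer": [30.0, 50.0, 60.0, 80.0],
})
geo = pd.DataFrame({
    "region": ["Region A", "Region B"],
    "lat": [55.0, 60.0],
    "lon": [37.0, 30.0],
})

# Attach coordinates to every sales row, then average over the years.
merged = sales.merge(geo, on="region", how="left")
df_geo = (
    merged.drop(columns="year")
    .groupby("region", as_index=False)
    .mean()
    .sort_values("beer", ascending=False)
    .reset_index(drop=True)
)
print(df_geo)
```

Because latitude and longitude are constant within a region, taking the mean leaves them unchanged while collapsing the yearly sales into one row per region.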

Map

In this section, we are going to plot the total alcohol consumption per region on the map.

  • we have created a new column named “total” which sums up all the drinks columns.
  • we have also created a text column that contains a caption that will be displayed on the map.
  • we are dividing our dataset into three categories based on rank: the first contains the top 10, the second contains the 11th to 21st, and the third contains the rest of the regions. The ranking is based on alcohol consumption per region.
  • we are going to use the Plotly sample code for plotting bubble maps.

The top 10 and 11–21 groups show no patterns. They are all over the place, but we can see a pattern in the rest of the regions. You can also explore the different regions by hovering your mouse over them and zooming out to observe more regions.
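The data preparation behind the map can be sketched as below. The caption format, the “group” column name, and the helper name are assumptions made for illustration; the three rank bands follow the description above.

```python
import pandas as pd


def add_map_columns(df_geo, columns):
    """Add the total, a hover caption, and a rank-based group to each region."""
    out = df_geo.copy()
    out["total"] = out[columns].sum(axis=1)
    out["text"] = (
        out["region"] + ": " + out["total"].round(1).astype(str) + " liters per capita"
    )
    # Rank 1 is the region with the highest total consumption.
    rank = out["total"].rank(method="first", ascending=False)
    out["group"] = pd.cut(
        rank,
        bins=[0, 10, 21, float("inf")],
        labels=["top 10", "11-21", "rest"],
    )
    return out


# 25 made-up regions with increasing consumption.
demo = pd.DataFrame({
    "region": [f"Region {i}" for i in range(25)],
    "wine": [float(i) for i in range(25)],
    "beer": [2.0 * i for i in range(25)],
})
mapped = add_map_columns(demo, ["wine", "beer"])
group_sizes = mapped["group"].value_counts()
print(group_sizes)
```

Each group could then be drawn as its own trace on a Plotly bubble map (for example with `plotly.graph_objects.Scattergeo`), using the “lat” and “lon” columns for position and the “text” column for the hover caption.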

Alcohol Consumption Past Trend

In this section, we are going to explore different types of drinks and their consumption trend over the past 19 years.

As we can see, beer consumption rose until 2007, then became steady, and has been declining since 2011. Wine, champagne, and brandy consumption are lower than beer, but they are steady. Vodka demand increased until 2002 and has been declining slowly and steadily since then.
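A trend table like the one behind this reading can be built by averaging each drink over all regions per year. The sketch below uses made-up numbers and only two drinks; the variable names are illustrative.

```python
import pandas as pd

# Made-up sales for two regions over three years.
demo = pd.DataFrame({
    "year": [1998, 1999, 2000, 1998, 1999, 2000],
    "region": ["A", "A", "A", "B", "B", "B"],
    "beer": [10.0, 30.0, 20.0, 20.0, 40.0, 30.0],
    "vodka": [9.0, 8.0, 7.0, 11.0, 10.0, 9.0],
})

# One row per year, one column per drink: the national average trend.
yearly = demo.groupby("year")[["beer", "vodka"]].mean()

# Year in which each drink peaked.
peak_year = yearly.idxmax()
print(yearly)
print(peak_year)
```

Calling `yearly.plot()` (or passing the long-format table to a plotting library) would draw one line per drink, which is the kind of chart the paragraph above describes.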

The safest bet is to launch a campaign on either brandy or champagne, but wine can be profitable as it has more consumption per capita and it is stable.

Animation

You can interact with the figure below and observe the change in demand by clicking on the play button. This is a simple and attractive way to present your data as a story in front of the marketing manager.

It is fun too 😊

K-means Clusters

Clustering is a subcategory of unsupervised learning, where there are no targets available in the training data. It is the task of grouping a set of objects so that objects in the same cluster are more similar to each other than to objects in other clusters. K-means is widely used for clustering in many data science applications and is especially useful if you need to quickly discover insights from unlabeled data.

First, we need to find how many clusters there are in our data. Before that, we need to create our X: we group the dataset by “region” and take the average values. Then we run scikit-learn KMeans for cluster counts from 1 to 9. There are 2 elbows in our line plot, and we will check both of them. The elbow method tells us to select the number of clusters at which there is a significant change in inertia.

We have discovered elbows at 2 and at 3.
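The elbow search described above can be sketched as follows. The data is synthetic (three well-separated blobs), and the `n_init` and `random_state` settings are assumptions chosen to make the run repeatable, not values taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_curve(X, k_values=range(1, 10), random_state=0):
    """Fit KMeans for each k and return the list of inertia values."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias


# Synthetic stand-in for the per-region averages: three clear groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(30, 2)) for c in centers])

inertias = inertia_curve(X)

# How much inertia falls when one more cluster is added; an elbow is where
# this drop suddenly becomes small.
drops = -np.diff(inertias)
print(np.round(inertias, 1))
print(np.round(drops, 1))
```

On this toy data the drop from 2 to 3 clusters is large and the drop from 3 to 4 is small, so the elbow sits at 3. Real data is rarely this clean, which is why the article inspects both candidate values.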

Let’s check the n=2 cluster.

It seems quite clear, but we cannot stop here; we also need to check n=3 in the next section.

Now checking the n=3 cluster.

I think 3 clusters are better and we are going to divide our dataset based on it. The clusters are created based on alcohol consumption.
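Assigning each region to one of three K-means clusters could look like the sketch below. The data is made up, `cluster_regions` is a hypothetical helper, and the “Labels” column name is borrowed from the renaming step later in the article.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_regions(df, columns, n_clusters=3, random_state=0):
    """Average each region over the years, then label regions with KMeans."""
    X = df.groupby("region")[columns].mean()
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(X)
    return X.assign(Labels=labels)


# Made-up regions at three consumption levels (low, medium, high).
levels = {"A": 10.0, "B": 11.0, "C": 50.0, "D": 52.0, "E": 90.0, "F": 91.0}
rows = [
    {"year": year, "region": region, "beer": level, "wine": level / 10}
    for region, level in levels.items()
    for year in (1998, 1999)
]
demo = pd.DataFrame(rows)

clustered = cluster_regions(demo, ["beer", "wine"])
print(clustered)
```

The numeric labels themselves are arbitrary: cluster 0 is not necessarily the lowest or highest group, so they need to be mapped to meaningful names in a later step.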

Cluster Visualization

In this section, we are going to compare the clusters on various columns, and as you can see, the best results are shown for beer vs. wine.

Clusters Swarmplot

Let’s visualize our results using a swarm plot.

We can clearly see the orange has the highest average consumption in all drinks categories. They all are following similar rules of ranking. The orange is high, the blue is medium and the green is low. We are going to use this to pick the top ten regions and products.

Total drinks swarm plot

To summarize our findings, let’s visualize total alcohol consumption and clusters. It is now clear that the hypothesis was true: the orange cluster sits at the top of the ladder.

Hierarchy Cluster

We will be using scipy spatial distance_matrix to calculate the distance from each point to every other point of a dataset. The function distance_matrix requires two inputs, and we are passing X and X.

We will then use scipy hierarchy linkage using the average method to create links between each cluster, which later will be used to plot the dendrogram.
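The two steps above can be sketched as follows, on synthetic data. One detail is an assumption on my part rather than something the article states: SciPy's `linkage` treats a two-dimensional input as raw observations, so the square distance matrix is first converted to the condensed form with `squareform` before linking.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial import distance_matrix
from scipy.spatial.distance import squareform

# Synthetic stand-in for the per-region averages: four tight groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(5, 2)) for c in centers])

# Pairwise Euclidean distances between every pair of points.
D = distance_matrix(X, X)

# Average linkage on the condensed (upper-triangle) distances.
Z = linkage(squareform(D, checks=False), method="average")

# Cut the tree into at most four flat clusters.
labels = fcluster(Z, t=4, criterion="maxclust")
print(D.shape, Z.shape, sorted(set(labels)))
```

The linkage matrix `Z` has one row per merge and is what `scipy.cluster.hierarchy.dendrogram` takes as input when drawing the tree.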

Dendrogram

Hierarchical clustering is typically visualized as a dendrogram, where every unique cluster is represented with a different color. You can see how each node is connected, forming this hierarchy of clusters.

It seems like we have 4 hierarchical clusters, which are distinguished by different colors.

Hierarchy Agglomerative Clustering

Using Agglomerative clustering, we are going to divide our dataset into 4 clusters. To learn more about Agglomerative Clustering check this link

The 4 clusters are almost the same as the 3 K-means clusters, but the medium cluster is divided. We will be using both K-means and hierarchical clusters to determine optimum locations for a marketing campaign.
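A comparison of this kind can be sketched with scikit-learn as below. The data is synthetic, and the average linkage setting is an assumption carried over from the SciPy step; the article does not say which linkage its agglomerative model uses.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans

# Synthetic stand-in for the per-region averages: four tight groups.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(5, 2)) for c in centers])

# Four hierarchical clusters and three K-means clusters on the same points.
h_labels = AgglomerativeClustering(n_clusters=4, linkage="average").fit_predict(X)
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Cross-tabulate the two labelings to see which clusters overlap.
table = pd.crosstab(
    pd.Series(km_labels, name="kmeans"),
    pd.Series(h_labels, name="hierarchical"),
)
print(table)
```

A K-means row that spreads over two hierarchical columns corresponds to a cluster that the hierarchical model has split in two, which is the situation the paragraph above describes for the medium cluster.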

Categorizing Clusters

It’s time to add both the K-means cluster and the hierarchical cluster to our main database.

  1. Creating a new column “H_Pop” and adding the H cluster prediction.
  2. Renaming numerical values to categories based on popularity.
  3. Filtering to see the top 2 values.

We are going to do the same with the K-means clusters:

  1. Renaming the column from “Labels” to “KM_Pop”.
  2. Creating a new column “Total_Drinks” and adding the total drinks values.
  3. Renaming numerical values to categories based on popularity.
  4. Filtering to see the top 11 values, sorted by total alcohol consumption.
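Renaming numeric cluster ids to popularity categories can be sketched as below. Ordering clusters by their mean “Total_Drinks” is an assumption about how popularity is decided, the helper name is hypothetical, and the data is made up; the “High”, “Medium”, and “Low” names follow the wording used elsewhere in the article.

```python
import pandas as pd


def name_clusters_by_total(df, label_col, names, total_col="Total_Drinks"):
    """Map cluster ids to names, ordered from highest to lowest mean total."""
    order = (
        df.groupby(label_col)[total_col]
        .mean()
        .sort_values(ascending=False)
        .index
    )
    mapping = dict(zip(order, names))
    return df[label_col].map(mapping)


# Made-up regions with numeric K-means labels.
demo = pd.DataFrame({
    "region": ["R1", "R2", "R3", "R4", "R5", "R6"],
    "KM_Pop": [2, 2, 0, 0, 1, 1],
    "Total_Drinks": [100.0, 90.0, 50.0, 55.0, 20.0, 25.0],
})
demo["KM_Pop"] = name_clusters_by_total(demo, "KM_Pop", ["High", "Medium", "Low"])
print(demo)
```

Deriving the mapping from the data, rather than hard-coding it, keeps the names correct even if a rerun of the clustering assigns different numeric ids.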

Results

We have finally spotted 10 regions similar to Saint Petersburg for launching a new marketing campaign. We have selected the cluster that is “High” in the K-means clustering and “Top” in the hierarchical clustering, then sorted the values by “Total_Drinks” to get the top ten regions for the campaign.

For the final results, we are going to sort our values by wine consumption to get a cluster similar to Saint Petersburg, where the company’s wine marketing campaign was successful.
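The selection logic described above can be sketched as follows. The column names come from the article, while the helper name, the “Second” category, and all values are made up for illustration.

```python
import pandas as pd


def pick_targets(df, n=10):
    """Keep regions that are High (K-means) and Top (hierarchical), then rank them."""
    mask = (df["KM_Pop"] == "High") & (df["H_Pop"] == "Top")
    # First keep the n regions with the highest total consumption...
    top = df[mask].sort_values("Total_Drinks", ascending=False).head(n)
    # ...then order them by wine consumption for the wine campaign.
    return top.sort_values("wine", ascending=False)


# Made-up regions; only R1 and R2 are in both leading clusters.
demo = pd.DataFrame({
    "region": ["R1", "R2", "R3", "R4"],
    "KM_Pop": ["High", "High", "Medium", "High"],
    "H_Pop": ["Top", "Top", "Top", "Second"],
    "Total_Drinks": [120.0, 110.0, 80.0, 100.0],
    "wine": [6.0, 9.0, 5.0, 7.0],
})
targets = pick_targets(demo, n=2)
print(targets)
```

Requiring agreement between the two clusterings is a conservative choice: a region has to look like a high-consumption market under both methods before it makes the shortlist.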

These are the regions that we should target next for promoting wine and other alcoholic beverages:

  1. Vologda Oblast
  2. Komi Republic
  3. Leningrad Oblast
  4. Smolensk Oblast
  5. Sverdlovsk Oblast
  6. Moscow
  7. Kamchatka Krai
  8. Ivanovo Oblast
  9. Yaroslavl Oblast
  10. Sevastopol

Conclusion

I had fun playing around with various unsupervised clustering algorithms and plotting values on the map. In this project, we have learned how to analyze the data, fill missing values, plot values on the map, create various seaborn visualizations, and finally use clustering algorithms to pick the top ten regions for marketing campaigns. It is easy to promote beer, but its consumption stopped growing after 2007 and has since declined, so the best hope is to promote more stable products such as wine and brandy. The company should focus on the “Vologda Oblast” region, as it has the highest wine consumption in the Top cluster, and after a successful launch it should move on to the remaining nine places listed above. We do not know what the current year is, and it is pretty hard to predict clusters using time series, so I have simplified the problem by taking the mean value.

Thank you for reading my notebook and don’t forget to upvote 👆.


Learning Resource

  1. Clustering Agglomerative process | Towards Data Science
  2. Topic 7. Unsupervised learning: PCA and clustering | Kaggle
  3. Unsupervised Learning and Data Clustering | by Sanatan Mishra | Towards Data Science
  4. Courses | DataCamp Learn
  5. ✨ Introducing Plotly Express ✨ | by plotly | Plotly | Medium
  6. seaborn: statistical data visualization | seaborn 0.11.2 documentation (pydata.org)
  7. Restaurants Sales During COVID (EDA) 👨‍🍳😲 | Kaggle

Future Works

  • In this project, I haven’t used time-series data to determine the clusters. For future work, I will use a time series model to predict clusters similar to Saint Petersburg.
  • Next, I will explore various options in geospatial analysis.
  • I will normalize data using various scaling tools and observe the difference.
  • Finally, I will do a comparison between various algorithmic results.

About Author

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models and researching the latest AI technologies. Currently testing AI products at PEC-PITC, whose work later gets approved for human trials, such as the Breast Cancer Classifier.


Designing a Promotional Strategy for Alcoholic Drinks in Russia was originally published in Towards AI on Medium.

 
