Why Apriori Algorithm Is Not Applicable for All Types of Products/Stores?

Last Updated on July 18, 2023 by Editorial Team

Author(s): OneByyTwo

Originally published on Towards AI.

The Apriori algorithm is the most sought-after tool for conducting Market Basket Analysis. However, it is not applicable to all types of products or stores.

A note from the authors

Dear readers, before you read this article, please note that we assume you have a general idea of Market Basket Analysis and the Apriori algorithm. If not, we suggest you take a look at the following quick 7-minute article by Eliana Grosof. Thank you for your time and interest!

Apriori Machine Learning Algorithm, Explained: A powerful yet simple ML algorithm for generating recommendations (medium.com)

Image Source: https://intellipaat.com/blog/data-science-apriori-algorithm/

Purpose

We started this study intending to find some out-of-the-box association rules. This intent was fueled by famous rules such as [beer, diapers] or [beer, table fan] that we had come across. However, over the course of the study, we realized that Apriori does not apply to all kinds of datasets. We also learned some useful concepts and techniques that make conducting Market Basket Analysis much easier.

Datasets Used

We used a dataset containing event data [view, add to cart, purchase] from an e-commerce electronics platform, covering all electronics brands. The intent was to identify uncommon rules that affect the purchase of several products. Since Samsung and Apple together accounted for 57% of the data, we focused only on purchases of these two brands.

Link to datasets

Electronics e-commerce platform dataset: eCommerce behavior data from multi-category store | Kaggle

Grocery store dataset (introduced later in this article): Market Basket Analysis Data | Kaggle

Key Terms for this article

A “transaction” refers to the purchase of one or more items in a single user session. Each “transaction” has a unique user session ID.

A “purchase” refers to the purchase of one unit of a single item. Multiple “purchases” can share a user session ID.

Journey

After removing the “view” and “add-to-cart” records, we assume that each row in the dataset represents the purchase of one unit of an item. These individual purchases are grouped by user session ID, and each group forms one transaction.
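The grouping step described above can be sketched in plain Python. This is an illustration only, not the authors' code: the rows, the column order and the event label `"cart"` are hypothetical, and a set is used per session because Apriori only needs to know whether an item is present in a basket, not how many units were bought.

```python
from collections import defaultdict

# Hypothetical event rows: (user_session, event_type, product).
events = [
    ("s1", "view", "phone"),
    ("s1", "cart", "phone"),
    ("s1", "purchase", "phone"),
    ("s1", "purchase", "case"),
    ("s2", "purchase", "tablet"),
    ("s3", "view", "laptop"),
]

def build_transactions(rows):
    """Keep purchase rows only and group the purchased products by session ID."""
    baskets = defaultdict(set)
    for session_id, event_type, product in rows:
        if event_type == "purchase":
            baskets[session_id].add(product)
    return dict(baskets)

transactions = build_transactions(events)
for session_id, basket in sorted(transactions.items()):
    print(session_id, sorted(basket))
```

Session `s3` disappears because it contains no purchase, and session `s2` becomes a single-item transaction, which is the kind of basket that dominates the electronics data.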

We used the Apriori algorithm in Python to conduct the Market Basket Analysis separately for Apple and Samsung. Because so many transactions contained only a single item, we had to lower our metric thresholds to several decimal places.

Code for defining a function for the Apriori association rule generator
Code for running the Apriori algorithm on the Samsung dataset
Code for running the Apriori algorithm on the Samsung dataset using a lower support threshold
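The code behind the captions above is not reproduced in this text. As a stand-in, here is a minimal plain-Python sketch of the frequent-itemset part of Apriori. It is not the authors' implementation; the function name `apriori`, the toy baskets and the threshold are assumptions chosen for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Toy baskets (hypothetical data).
toy = [{"bread", "milk"}, {"bread", "eggs"}, {"bread", "milk", "eggs"}, {"milk"}]
frequent = apriori(toy, min_support=0.5)
```

With a data set made up mostly of single-item baskets, almost no pair reaches a conventional `min_support`, which is why the threshold had to be pushed down so far.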

As a result, all our metric values were very low, and we couldn’t find any significant association rules between any two given items. To validate that our approach was correct, we ran the same code on a grocery store dataset designed specifically for Market Basket Analysis.

Results

We understood how confidence is calculated, but we couldn’t shake the feeling that confidence is just an arbitrary measure of the likelihood of item B being purchased given that item A is purchased. For example, with “Dill” as the antecedent, the confidence of “Eggs” being purchased was 0.39. With “Eggs” as the antecedent, the confidence of “Dill” being purchased was approximately 0.41.

Code for generating association rules for Grocery dataset

Hence, confidence alone is not always a reliable indicator of whether the purchase of item B depends on the purchase of item A. Moreover, the lift value is the same in both directions for any two given items, so by that measure the order in which the items appear in the rule makes no difference.
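To make the difference between the two metrics concrete, the sketch below computes support, confidence and lift from their standard definitions on a handful of invented baskets. The items and counts are hypothetical and do not reproduce the grocery results quoted above.

```python
def support(baskets, items):
    """Fraction of baskets that contain all of the given items."""
    items = set(items)
    return sum(items <= b for b in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """support(A and B) / support(A): depends on which side is the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

def lift(baskets, antecedent, consequent):
    """confidence(A -> B) / support(B): the same value in both directions."""
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)

# Toy baskets (hypothetical): pasta is in 3 of 5, sauce in 4 of 5, both in 2 of 5.
baskets = [
    {"pasta", "sauce"},
    {"pasta", "sauce"},
    {"pasta"},
    {"sauce"},
    {"sauce", "bread"},
]

conf_ps = confidence(baskets, {"pasta"}, {"sauce"})  # 2/3, about 0.67
conf_sp = confidence(baskets, {"sauce"}, {"pasta"})  # 2/4 = 0.5
lift_ps = lift(baskets, {"pasta"}, {"sauce"})        # about 0.83
lift_sp = lift(baskets, {"sauce"}, {"pasta"})        # about 0.83
```

Confidence changes when the rule is reversed because the denominator changes from the support of one item to the support of the other. Lift divides by both supports, so it comes out the same either way (up to floating-point rounding).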

To see whether the results would change, we went back to our original dataset and removed all single-item transactions. This time we included transactions for all brands, not only Apple and Samsung.

The algorithm returned rules with massive lift values and significant confidence values. Surprisingly, many different rules with various itemsets had a support of 0.000205, which was the highest support value among all the rules. We realized that these itemsets were only different combinations of the same purchases and represented the same transactions. Only 12 of the 58435 transactions contained the items with a support of 0.000205, and hence no significant association rule could be established between these items.
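The arithmetic behind this observation is short enough to check directly. In the sketch below, the two counts come from the article; the four-item basket and its item names are hypothetical, and the reasoning assumes those items appear in no other transaction.

```python
from itertools import combinations

total_transactions = 58435   # from the article
matching_transactions = 12   # from the article

support = matching_transactions / total_transactions
print(round(support, 6))  # 0.000205

# Hypothetical basket that is repeated in all 12 matching transactions.
basket = ("item_a", "item_b", "item_c", "item_d")

# Every subset with at least two items occurs in exactly the same 12
# transactions, so each one gets the same support value.
itemsets = [
    combo
    for size in range(2, len(basket) + 1)
    for combo in combinations(basket, size)
]
print(len(itemsets))  # 11 (6 pairs + 4 triples + 1 quadruple)
```

One repeated basket therefore produces many itemsets, and even more rules, that all share a single support value and describe the same small group of transactions.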

Conclusion

We concluded that the Apriori algorithm is not applicable to all kinds of datasets. It is suitable where there is a high chance of multiple products being purchased together, for example, in grocery stores, sports equipment stores, or department stores. Since electronics are high-priced items, very few transactions contain multiple products purchased together. Therefore, in such cases, Apriori is not useful for finding significant association rules.

Moreover, even where Apriori is applicable, the most important metric to consider is support, because a high support value indicates that a large share of transactions contains the given combination of products. Then, if the lift value is greater than 1, we can conclude that the association rule is significant and explore it further for greater revenue generation.
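One way to apply this ordering of metrics is to screen rules on support first and only then on lift. The sketch below assumes rules are stored as dictionaries; the rule names, metric values and the `min_support` threshold are all hypothetical.

```python
# Hypothetical rules with precomputed metrics.
rules = [
    {"rule": "A -> B", "support": 0.05, "lift": 1.8},
    {"rule": "C -> D", "support": 0.0002, "lift": 40.0},
    {"rule": "E -> F", "support": 0.08, "lift": 0.9},
]

def select_rules(rules, min_support, min_lift=1.0):
    """Keep rules with enough support and lift above min_lift, highest support first."""
    kept = [
        r for r in rules
        if r["support"] >= min_support and r["lift"] > min_lift
    ]
    return sorted(kept, key=lambda r: r["support"], reverse=True)

selected = select_rules(rules, min_support=0.01)
print([r["rule"] for r in selected])  # ['A -> B']
```

The rule with a lift of 40 is dropped because its support is tiny, which mirrors the electronics result: a huge lift built on a dozen transactions is not a dependable rule.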

To view the detailed code for this project, please visit OneByyTwo/A_Apriori-Project.

You can connect with us on LinkedIn at Vaibhav Gupta & Nishit Vyas or email us at OnyByyTwo

Published via Towards AI