Why Apriori Algorithm Is Not Applicable for All Types of Products/Stores?
Last Updated on July 18, 2023 by Editorial Team
Author(s): OneByyTwo
Originally published on Towards AI.
The Apriori algorithm is one of the most popular tools for conducting Market Basket Analysis. However, it is not applicable to all types of products or stores.
A note from the authors
Dear readers, before you go through our article, please note that we assume you have a general idea of Market Basket Analysis and the Apriori algorithm. If not, we suggest you take a look at the following quick 7-minute article by Eliana Grosof. Thank you for your time and interest!
Apriori Machine Learning Algorithm, Explained: A powerful yet simple ML algorithm for generating recommendations (medium.com)
Purpose
We started the study with the intent of finding some out-of-the-box association rules. This intent was fueled by famous rules such as [beer, diapers] or [beer, table fan] that we had come across. However, over the course of this study, we realized that Apriori does not apply to all kinds of datasets. We also learned some useful concepts and techniques that make conducting Market Basket Analysis much easier.
Datasets Used
We used a dataset containing event data [view, add to cart, purchase] from an e-commerce electronics platform, covering all electronics brands. The intent was to identify uncommon rules that affect the purchase of several products. Since Samsung and Apple together accounted for 57% of the data, we focused only on the purchases made for these two brands.
Link to datasets
Electronics e-commerce platform dataset: eCommerce behavior data from multi-category store | Kaggle
Grocery store dataset (introduced later in this article): Market Basket Analysis Data | Kaggle
Key Terms for this article
A "transaction" refers to the purchase of one or more items. Each "transaction" has a unique user session ID.
A "purchase" refers to the purchase of a single unit of one item. Multiple "purchases" can share a common user session ID.
Journey
After removing the "view" and "add-to-cart" records, we assume each row in the dataset pertains to the purchase of a single unit of one item. These individual purchases are grouped by user session ID, and each group forms one transaction.
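The grouping step can be sketched in a few lines. This is only an illustration under assumptions: the rows, the tuple layout `(session_id, event_type, product)`, and the helper name `build_transactions` are hypothetical and do not come from the Kaggle dataset or from our project code. Using a set per session also assumes that buying the same product twice in one session counts once.

```python
from collections import defaultdict

# Hypothetical event rows: (session_id, event_type, product).
events = [
    ("s1", "view", "phone"),
    ("s1", "cart", "phone"),
    ("s1", "purchase", "phone"),
    ("s1", "purchase", "case"),
    ("s2", "view", "laptop"),
    ("s2", "purchase", "laptop"),
    ("s3", "view", "tv"),
]


def build_transactions(rows):
    """Keep purchase rows only and group the products by session ID."""
    baskets = defaultdict(set)
    for session_id, event_type, product in rows:
        if event_type == "purchase":
            baskets[session_id].add(product)
    return dict(baskets)


transactions = build_transactions(events)
for session_id, items in transactions.items():
    print(session_id, sorted(items))
```

Session `s3` disappears entirely because it contains no purchase, which is the behavior we want before handing the baskets to Apriori.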
We used the Apriori algorithm in Python to conduct the Market Basket Analysis separately for Apple and Samsung. Because most transactions contained only a single item, we had to lower our metric thresholds to several decimal places before any rules appeared.
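To see why the thresholds matter, here is a small pure-Python sketch of the level-wise frequent itemset search that Apriori performs. It is not the code we ran for the project; the function name `apriori_frequent_itemsets` and the toy baskets are assumptions made for illustration.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Toy baskets dominated by single-item transactions (hypothetical data).
baskets = [
    {"phone"}, {"laptop"}, {"phone", "case"}, {"tv"}, {"phone"},
    {"laptop"}, {"tv"}, {"phone", "case"}, {"headphones"}, {"tablet"},
]

print(apriori_frequent_itemsets(baskets, min_support=0.5))
print(apriori_frequent_itemsets(baskets, min_support=0.2))
```

With a threshold of 0.5 nothing qualifies, and only after lowering it does the pair of phone and case show up. On a real dataset with tens of thousands of mostly single-item baskets, the same effect pushes the threshold down by several orders of magnitude.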
As a result, all our metric values were very low, and we couldn't find any significant association rules between any two given items. To validate that our approach was correct, we ran the same code on a grocery store dataset meant specifically for Market Basket Analysis.
Results
We understood how confidence is calculated, but at first we couldn't shake the feeling that it is just an arbitrary measure of the likelihood of item B being purchased if item A is purchased. For example, we found that with "Dill" as the antecedent, the confidence of "Eggs" being purchased was 0.39. With "Eggs" as the antecedent, the confidence of "Dill" being purchased was approximately 0.41. The two values differ because confidence divides the support of the pair by the support of the antecedent alone, and "Dill" and "Eggs" do not appear in the same number of transactions.
Hence, confidence on its own does not tell us whether the purchase of item B depends on the purchase of item A. Lift, by contrast, is symmetric: it has the same value for a rule and for its reverse, so the direction of the rule makes no difference to the lift.
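A short worked example makes the asymmetry of confidence and the symmetry of lift concrete. The counts below are hypothetical: they were picked only so that the two confidence values land near 0.39 and 0.41, and they are not the actual counts from the grocery dataset.

```python
def confidence(count_antecedent, count_both):
    """confidence(A -> B) = count(A and B) / count(A)."""
    return count_both / count_antecedent


def lift(count_a, count_b, count_both, n_transactions):
    """lift(A, B) = support(A and B) / (support(A) * support(B))."""
    return (count_both * n_transactions) / (count_a * count_b)


# Hypothetical counts, NOT taken from the grocery dataset.
n = 1000      # total transactions
dill = 400    # transactions containing Dill
eggs = 380    # transactions containing Eggs
both = 156    # transactions containing both

conf_dill_to_eggs = confidence(dill, both)
conf_eggs_to_dill = confidence(eggs, both)
lift_dill_eggs = lift(dill, eggs, both, n)
lift_eggs_dill = lift(eggs, dill, both, n)

print(f"confidence(Dill -> Eggs) = {conf_dill_to_eggs:.2f}")
print(f"confidence(Eggs -> Dill) = {conf_eggs_to_dill:.2f}")
print(f"lift in both directions  = {lift_dill_eggs:.4f}, {lift_eggs_dill:.4f}")
```

The numerator is the same in both confidence values; only the denominator changes, which is why the two directions disagree while the lift does not.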
To see whether the results would change, we went back to our original dataset and eliminated all single-item transactions. This time we included transactions for all brands, not only Apple and Samsung.
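Dropping single-item transactions is a one-line filter once the baskets are grouped by session. The sketch below uses hypothetical session IDs and products, and the helper name `drop_single_item` is ours for illustration only.

```python
# Hypothetical baskets keyed by session ID.
transactions = {
    "s1": {"phone", "case"},
    "s2": {"laptop"},
    "s3": {"tv", "soundbar", "cable"},
    "s4": {"tablet"},
}


def drop_single_item(baskets):
    """Keep only the baskets that contain at least two distinct items."""
    return {sid: items for sid, items in baskets.items() if len(items) >= 2}


multi_item = drop_single_item(transactions)
share_kept = len(multi_item) / len(transactions)
print(sorted(multi_item), f"share kept: {share_kept:.0%}")
```

Note that this filter changes the denominator of every support value, so supports computed before and after it are not directly comparable.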
The algorithm returned rules with massive lift values and significant confidence values. Surprisingly, many different rules with various itemsets shared the same support of 0.000205 (the highest support value among all the rules). We realized that these itemsets were only different combinations of the same purchases and represented the same transactions. Only 12 of the 58,435 transactions contained these items (12 / 58,435 ≈ 0.000205), and hence no significant association rule could be established between them.
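The arithmetic behind this observation is easy to check. The sketch below verifies the support value and then counts how many rules a single repeated basket can generate. The four-item basket and its item names are hypothetical, since the actual basket size is not stated here.

```python
from itertools import combinations

n_transactions = 58435
matching_transactions = 12
support = matching_transactions / n_transactions
print(f"support = {support:.6f}")

# Hypothetical basket: four items that only ever appear together.
basket = ("item_a", "item_b", "item_c", "item_d")


def rules_from_basket(items):
    """Enumerate every antecedent/consequent split of every subset of size >= 2."""
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    rules.append((antecedent, consequent))
    return rules


rules = rules_from_basket(basket)
print(len(rules), "rules, all backed by the same", matching_transactions, "transactions")
```

If the items occur only in those identical baskets, every one of these rules reports the same support, which is the pattern we saw in the output.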
Conclusion
We concluded that the Apriori algorithm is not applicable to all kinds of datasets. It is suitable where there is a high chance of multiple products being purchased together, for example in grocery stores, sports equipment stores, or department stores. Since electronics are high-priced items, very few transactions contain multiple products purchased together. Therefore, in such cases, Apriori is not useful for finding significant association rules.
Moreover, even where Apriori is applicable, the first metric to consider is the support, because a high support value means that a large number of transactions contain the given combination of products. Then, if the lift value is greater than 1, the items are purchased together more often than they would be if they were independent, and we can explore that rule further for greater revenue generation.
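The "support first, then lift" screening described above can be sketched as a simple filter. The rule records, the threshold of 0.01, and the helper name `shortlist` are all hypothetical and serve only to illustrate the idea.

```python
# Hypothetical rule records; the numbers are illustrative only.
rules = [
    {"rule": "A -> B", "support": 0.0400, "confidence": 0.50, "lift": 1.8},
    {"rule": "C -> D", "support": 0.0002, "confidence": 0.90, "lift": 40.0},
    {"rule": "E -> F", "support": 0.0500, "confidence": 0.30, "lift": 0.9},
]


def shortlist(candidates, min_support):
    """Keep rules with enough support and lift > 1, highest support first."""
    kept = [
        r for r in candidates
        if r["support"] >= min_support and r["lift"] > 1
    ]
    return sorted(kept, key=lambda r: r["support"], reverse=True)


selected = shortlist(rules, min_support=0.01)
for r in selected:
    print(r["rule"], r["support"], r["lift"])
```

The rule `C -> D` has by far the highest lift but is discarded because it rests on very few transactions, which mirrors what happened with our electronics data.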
To view the detailed code for this project, please visit OneByyTwo/A_Apriori-Project.
You can connect with us on LinkedIn at Vaibhav Gupta & Nishit Vyas or email us at OnyByyTwo