Why Apriori Algorithm Is Not Applicable for All Types of Products/Stores?

Last Updated on July 18, 2023 by Editorial Team

Author(s): OneByyTwo

Originally published on Towards AI.

The Apriori algorithm is one of the most sought-after tools for conducting Market Basket Analysis. However, it is not applicable to all types of products or stores.

A note from the authors

Dear readers, before you go through our article, please note that we assume you have a general idea of Market Basket Analysis and the Apriori algorithm. If not, we suggest you take a look at the following quick 7-minute article by Eliana Grosof. Thank you for your time and interest!

Apriori Machine Learning Algorithm, Explained: A powerful yet simple ML algorithm for generating recommendations (medium.com)

Image Source: https://intellipaat.com/blog/data-science-apriori-algorithm/

Purpose

We started the study with the intent of finding some out-of-the-box association rules. This intent was fueled by famous rules such as [beer, diapers] or [beer, table fan] that we had come across. However, over the course of this study, we realized that Apriori does not apply to all kinds of datasets. We also learned some concepts and techniques that make conducting Market Basket Analysis much easier.

Datasets Used

We used a dataset containing event data [view, add to cart, purchase] from an e-commerce electronics platform, covering all electronics brands. The intent was to identify uncommon rules that affect the purchase of several products. Since Samsung and Apple collectively constituted 57% of the data, we focused on the purchases that took place for these brands only.

Link to datasets

Electronics e-commerce platform dataset: eCommerce behavior data from multi-category store | Kaggle

Grocery store dataset (introduced later in this article): Market Basket Analysis Data | Kaggle

Key Terms for this article

A “transaction” refers to the purchase of one or more items. Each “transaction” has a unique user session ID.

A “purchase” refers to the purchase of one unit of a single item. Multiple “purchases” can share a common user session ID.

Journey

After removing “view” and “add-to-cart” records, we assume each row in the dataset pertains to the purchase of one unit of an item. These individual purchases are grouped by user session ID, and each group forms one transaction.
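The grouping step described above can be sketched in plain Python. This is a minimal illustration and not the project code: the field names `event_type`, `user_session`, and `product_id`, as well as the sample rows, are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical event rows; field names and values are assumed for illustration.
events = [
    {"event_type": "view", "user_session": "s1", "product_id": "phone_a"},
    {"event_type": "purchase", "user_session": "s1", "product_id": "phone_a"},
    {"event_type": "purchase", "user_session": "s1", "product_id": "case_a"},
    {"event_type": "cart", "user_session": "s2", "product_id": "tv_b"},
    {"event_type": "purchase", "user_session": "s2", "product_id": "tv_b"},
]


def build_transactions(rows):
    """Keep purchase rows only and group their items by session ID."""
    baskets = defaultdict(list)
    for row in rows:
        if row["event_type"] == "purchase":
            baskets[row["user_session"]].append(row["product_id"])
    return dict(baskets)


transactions = build_transactions(events)
print(transactions)
```

Here session `s1` becomes a two-item transaction, while `s2` is a single-item transaction, the kind that cannot contribute to any association rule.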

We used the Apriori algorithm in Python to conduct the Market Basket Analysis separately for Apple and Samsung. Because many transactions contained only a single item, we had to lower our metric thresholds to several decimal places.

Code for defining a function for the Apriori association rule generator
Code for running the Apriori algorithm on the Samsung dataset
Code for running the Apriori algorithm on the Samsung dataset using a lower support threshold
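For readers who want to see what the support threshold does, here is a small, self-contained sketch of Apriori's frequent-itemset search. It is written from the textbook description of the algorithm and is not the code used in this project; the function name `apriori`, the parameter `min_support`, and the toy baskets are our own illustrative choices.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Toy baskets, invented for this example.
toy = [["milk", "bread"], ["milk", "bread", "eggs"], ["milk"], ["tv"]]
result = apriori(toy, min_support=0.5)
strict = apriori(toy, min_support=0.8)
```

With `min_support=0.5` the sketch keeps `{milk}`, `{bread}`, and `{milk, bread}`; with `min_support=0.8` nothing survives. When most baskets hold a single item, every multi-item support is tiny, which is why a threshold must be pushed down to several decimal places before any itemset appears.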

As a result, all our metric values were very low, and we couldn’t find any significant association rules between any two given items. To validate that our approach was correct, we ran the same code on a grocery store dataset meant specifically for Market Basket Analysis.

Results

We understood how confidence is calculated, but we couldn’t shake the feeling that confidence is just an arbitrary measure of the likelihood of item B being purchased if item A is purchased. For example, we found that if “Dill” is purchased first, the confidence of “Eggs” being purchased was 0.39. At the same time, if “Eggs” are purchased first, the confidence of “Dill” being purchased was approximately 0.41.

Code for generating association rules for Grocery dataset

Hence, confidence is not always a reliable measure of whether the chances of item B being purchased depend on the purchase of item A. Moreover, since the lift value is the same in both directions for any two given items, the order in which the items are purchased should not make a difference.
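The difference between the two metrics can be checked with a short sketch that uses the standard definitions: confidence(A → B) = support(A, B) / support(A), and lift(A → B) = support(A, B) / (support(A) × support(B)). The baskets below are invented for illustration and are not taken from the grocery dataset; `rule_metrics` is a hypothetical helper name.

```python
def rule_metrics(baskets, a, b):
    """Support, confidence, and lift for the rule a -> b."""
    n = len(baskets)
    count_a = sum(a in basket for basket in baskets)
    count_b = sum(b in basket for basket in baskets)
    count_ab = sum(a in basket and b in basket for basket in baskets)
    return {
        "support": count_ab / n,
        "confidence": count_ab / count_a,
        "lift": (count_ab * n) / (count_a * count_b),
    }


# Made-up baskets: 3 with both items, 2 with dill only,
# 1 with eggs only, 4 with neither.
baskets = ([{"dill", "eggs"}] * 3 + [{"dill"}] * 2
           + [{"eggs"}] * 1 + [{"milk"}] * 4)

forward = rule_metrics(baskets, "dill", "eggs")
backward = rule_metrics(baskets, "eggs", "dill")
print(forward["confidence"], backward["confidence"])
print(forward["lift"], backward["lift"])
```

Confidence divides by the count of the left-hand item only, so it changes when the rule is reversed. Lift divides by both item counts, so swapping the two items leaves it unchanged.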

To see whether the results would change, we went back to our original dataset and eliminated all single-item transactions. This time we included transactions for all brands, not only Apple and Samsung.

The algorithm returned rules with massive lift values and significant confidence values. Surprisingly, many different rules with various itemsets shared the same support of 0.000205 (the highest support value among all the rules). We realized that these itemsets were only different combinations of the same purchases and represented the same transactions. There were only 12 such transactions for items with the support of 0.000205 out of a total of 58435 transactions, and hence no significant association rule could be established between these items.
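A quick back-of-the-envelope check shows why such rules look impressive without being meaningful. The two counts below come from the paragraph above; the assumption that items A and B appear in those 12 transactions and nowhere else is ours, made only to illustrate the effect.

```python
total_transactions = 58435  # from the article
shared_count = 12           # from the article

support = shared_count / total_transactions

# Assumption for illustration: A and B occur only in these 12 baskets,
# so each item's individual support equals the joint support.
support_a = support_b = support

confidence = support / support_a
lift = support / (support_a * support_b)

print(round(support, 6))
print(confidence)
print(round(lift, 1))
```

Under this assumption confidence is 1 and lift equals 58435 / 12, several thousand, even though the rule rests on just 12 baskets. That is why a large lift needs to be read together with the support.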

Conclusion

We concluded that the Apriori algorithm is not applicable to all kinds of datasets. It is suitable where there is a high chance of multiple products being purchased together, for example, in grocery stores, sports equipment stores, or department stores. Since electronics are high-priced items, very few transactions contain multiple products purchased together. Therefore, in such cases, Apriori is not useful for finding significant association rules.

Moreover, even where Apriori is applicable, the most important metric to consider is support, because a high support value indicates a high number of transactions for a given combination of products. Then, if the lift value is greater than 1, we can conclude that the association rule is significant and explore it further for greater revenue generation.
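As a rough sketch of this screening order (support first, then lift), the helper below filters a list of rules. The rule values, the `min_support` cut-off, and the name `shortlist_rules` are all hypothetical and chosen only for the example.

```python
def shortlist_rules(rules, min_support):
    """Keep rules with enough support and lift above 1, best support first."""
    kept = [r for r in rules
            if r["support"] >= min_support and r["lift"] > 1]
    return sorted(kept, key=lambda r: r["support"], reverse=True)


# Made-up rules for illustration.
rules = [
    {"rule": "A -> B", "support": 0.000205, "lift": 4869.6},
    {"rule": "C -> D", "support": 0.04, "lift": 1.3},
    {"rule": "E -> F", "support": 0.06, "lift": 0.9},
    {"rule": "G -> H", "support": 0.05, "lift": 1.1},
]

shortlist = shortlist_rules(rules, min_support=0.01)
print([r["rule"] for r in shortlist])
```

The rule with the enormous lift is dropped because its support is negligible, and the rule with lift below 1 is dropped even though its support is the highest.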

To view the detailed code for this project, please visit OneByyTwo/A_Apriori-Project.

You can connect with us on LinkedIn at Vaibhav Gupta & Nishit Vyas or email us at OnyByyTwo


Published via Towards AI
