# Data Science - Apriori Algorithm in Python- Market Basket Analysis

- What Is Association Rule Mining?
- What Is an Apriori Algorithm?
- How Does the Apriori Algorithm Work?
- Support
- Confidence
- Lift
- Conviction
- Hands-on: Apriori Algorithm in Python- Market Basket Analysis
- Problem Statement
- Dataset
- Environment Setup
- Implementation
- Limitations of Apriori Algorithm
- Improvements
- Applications of Apriori Algorithm
- What Did We Learn?

**Introduction to Apriori Algorithm in Python**

The Apriori algorithm is a classical **algorithm** in **Data Mining** that is used for mining frequent itemsets and deriving association rules from them.


**What is Association Rule Mining?**

As mentioned before, the Apriori algorithm is used for association rule mining. Association rule mining is a technique to identify frequent patterns and associations among a set of items.

A typical example is understanding customer buying habits. By finding correlations and associations between the different items that customers place in their ‘shopping basket,’ recurring patterns can be derived.

Say Joshua goes to the supermarket to buy a bottle of wine, and he grabs a couple of bags of chips as well. The manager notices that it is not only Joshua: people often tend to buy wine and chips together. After finding this pattern, the manager starts placing these items together and notices an increase in sales.

This process of identifying an association between products/items is called association rule mining. Many algorithms have been developed to implement it, and the Apriori algorithm is one of the most popular and widely used among them. Let us discuss what the Apriori algorithm is.


## What Is an Apriori Algorithm?

The Apriori algorithm rests on one property: any subset of a frequent itemset must itself be frequent.

Say, a transaction containing {wine, chips, bread} also contains {wine, bread}. So, according to the principle of Apriori, if {wine, chips, bread} is frequent, then {wine, bread} must also be frequent.


**How Does the Apriori Algorithm Work?**

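The key concept in the Apriori algorithm is that all subsets of a frequent itemset are frequent. Equivalently, for any infrequent itemset, all its supersets must also be infrequent. This is what lets the algorithm discard large candidate itemsets without counting them: as soon as one of their subsets is found to be infrequent, they cannot be frequent either.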
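The pruning that this property allows can be sketched in a few lines of Python. The helper name `has_infrequent_subset` and the set of frequent pairs below are illustrative assumptions, not part of any library: a candidate is discarded as soon as one of its smaller subsets is known to be infrequent.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """Return True if some (k-1)-item subset of `candidate` is not frequent.

    `frequent_smaller` holds the itemsets of size k-1 (as frozensets)
    that are already known to be frequent.
    """
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )

# Hypothetical input: suppose these pairs are known to be frequent.
frequent_pairs = {
    frozenset({"wine", "bread"}),
    frozenset({"wine", "chips"}),
    frozenset({"wine", "milk"}),
    frozenset({"bread", "milk"}),
}

# Every pair inside {wine, bread, milk} is frequent, so it still has to be counted.
keep = not has_infrequent_subset(("wine", "bread", "milk"), frequent_pairs)

# {chips, bread} is not in the frequent pairs, so {wine, chips, bread} can be skipped.
prune = has_infrequent_subset(("wine", "chips", "bread"), frequent_pairs)

print(keep, prune)  # True True
```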


Let us try and understand the working of an Apriori algorithm with the help of a very famous business scenario, market basket analysis.

Here is a dataset consisting of six transactions in an hour. Each transaction is a combination of 0s and 1s, where 0 represents the absence of an item and 1 represents the presence of it.

Transaction ID | Wine | Chips | Bread | Milk |
---|---|---|---|---|
1 | 1 | 1 | 1 | 1 |
2 | 1 | 0 | 1 | 1 |
3 | 0 | 0 | 1 | 1 |
4 | 0 | 1 | 0 | 0 |
5 | 1 | 1 | 1 | 1 |
6 | 1 | 1 | 0 | 1 |

We can find multiple rules from this scenario. For example, in a transaction of wine, chips, and bread, if wine and chips are bought, then customers also buy bread.

{wine, chips} => {bread}

In order to select the interesting rules out of the many possible rules in this small business scenario, we will be using the following measures:

- Support
- Confidence
- Lift
- Conviction


**Support**

Support of item x is nothing but the ratio of the number of transactions in which item x appears to the total number of transactions.

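i.e.,

$$\text{Support}(\text{wine}) = \frac{\text{number of transactions containing wine}}{\text{total number of transactions}}$$

$$\text{Support}(\text{wine}) = \frac{4}{6} = 0.66667$$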
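The same calculation can be sketched in Python. The `transactions` list is the six-row table above written as sets of item names, and the function name `support` is an illustrative choice; exact fractions are used so the ratio is not blurred by rounding.

```python
from fractions import Fraction

# The six transactions from the table above, one set of items per row.
transactions = [
    {"wine", "chips", "bread", "milk"},
    {"wine", "bread", "milk"},
    {"bread", "milk"},
    {"chips"},
    {"wine", "chips", "bread", "milk"},
    {"wine", "chips", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

wine_support = support({"wine"}, transactions)
print(wine_support, float(wine_support))  # 2/3 0.6666666666666666
```

The function accepts itemsets of any size, so `support({"wine", "chips"}, transactions)` gives the support of a pair in the same way.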

**Confidence**

Confidence (x => y) signifies the likelihood of the item y being purchased when item x is purchased. This method takes into account the popularity of item x.

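i.e.,

$$\text{Conf}(\{\text{wine, chips}\} \Rightarrow \{\text{bread}\}) = \frac{\text{Support}(\{\text{wine, chips, bread}\})}{\text{Support}(\{\text{wine, chips}\})}$$

$$\text{Conf}(\{\text{wine, chips}\} \Rightarrow \{\text{bread}\}) = \frac{2/6}{3/6} = 0.667$$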
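As a rough sketch, confidence can be computed straight from the 0/1 table. The names `ITEMS`, `ROWS`, `count`, and `confidence` are illustrative; the rows are copied from the six-transaction table above.

```python
from fractions import Fraction

ITEMS = ("wine", "chips", "bread", "milk")
# Rows of the 0/1 table, one tuple per transaction.
ROWS = [
    (1, 1, 1, 1),
    (1, 0, 1, 1),
    (0, 0, 1, 1),
    (0, 1, 0, 0),
    (1, 1, 1, 1),
    (1, 1, 0, 1),
]
# Turn each row of flags into the set of items that are present.
transactions = [{item for item, flag in zip(ITEMS, row) if flag} for row in ROWS]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def confidence(x, y):
    """conf(x => y) = count(x and y together) / count(x)."""
    return Fraction(count(set(x) | set(y)), count(x))

conf = confidence({"wine", "chips"}, {"bread"})
print(conf, round(float(conf), 3))  # 2/3 0.667
```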

**Lift**

Lift (x => y) is nothing but the ‘interestingness’ or the likelihood of the item y being purchased when item x is sold. Unlike confidence (x => y), this method takes into account the popularity of the item y.

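i.e.,

$$\text{lift}(\{\text{wine, chips}\} \Rightarrow \{\text{bread}\}) = \frac{\text{Support}(\{\text{wine, chips, bread}\})}{\text{Support}(\{\text{wine, chips}\}) \times \text{Support}(\{\text{bread}\})}$$

$$\text{lift}(\{\text{wine, chips}\} \Rightarrow \{\text{bread}\}) = \frac{2/6}{(3/6) \times (4/6)} = 1$$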

- Lift (x => y) = 1 means that there is no correlation within the itemset.
- Lift (x => y) > 1 means that there is a positive correlation within the itemset, i.e., products in the itemset, x and y, are more likely to be bought together.
- Lift (x => y) < 1 means that there is a negative correlation within the itemset, i.e., products in itemset, x and y, are unlikely to be bought together.
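The lift calculation and its three cases can be sketched as follows. The function names `lift` and `interpret` are illustrative, and the support values are read off the six-transaction table.

```python
from fractions import Fraction

def lift(support_xy, support_x, support_y):
    """lift(x => y) = support(x and y together) / (support(x) * support(y))."""
    return support_xy / (support_x * support_y)

def interpret(value):
    """Map a lift value to the three cases listed above."""
    if value > 1:
        return "positive correlation"
    if value < 1:
        return "negative correlation"
    return "no correlation"

# {wine, chips, bread} appears in 2 of 6 rows, {wine, chips} in 3, {bread} in 4.
value = lift(Fraction(2, 6), Fraction(3, 6), Fraction(4, 6))
print(value, interpret(value))  # 1 no correlation
```

Using `Fraction` instead of floats keeps the comparison with 1 exact.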

**Conviction**

Conviction of a rule can be defined as follows:

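$$\text{conv}(x \Rightarrow y) = \frac{1 - \text{Support}(y)}{1 - \text{Conf}(x \Rightarrow y)}$$

i.e.,

$$\text{conv}(\{\text{wine, chips}\} \Rightarrow \{\text{bread}\}) = \frac{1 - \text{Support}(\{\text{bread}\})}{1 - \text{Conf}(\{\text{wine, chips}\} \Rightarrow \{\text{bread}\})} = \frac{1 - 0.667}{1 - 0.667} = 1$$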

Its value range is [0, +∞].

- Conv(x => y) = 1 means that x has no relation with y.
- The greater the conviction, the higher the interest in the rule.
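A minimal sketch of the conviction calculation, assuming the usual convention that a rule with a confidence of exactly 1 gets infinite conviction (the function name `conviction` is illustrative):

```python
import math
from fractions import Fraction

def conviction(support_y, confidence_xy):
    """conv(x => y) = (1 - support(y)) / (1 - conf(x => y)).

    When the confidence is exactly 1 the denominator is zero; this sketch
    assumes the conviction is then treated as infinite.
    """
    if confidence_xy == 1:
        return math.inf
    return (1 - support_y) / (1 - confidence_xy)

# {wine, chips} => {bread}: support(bread) = 4/6, confidence = 2/3.
value = conviction(Fraction(4, 6), Fraction(2, 3))
print(value)  # 1
```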

Now that we know the methods to find out the interesting rules, let us go back to the example. Before we get started, let us fix the support threshold to 50 percent.


**Step 1: Create a frequency table of all the items that occur in all transactions**

Item | Frequency |
---|---|
Wine | 4 |
Chips | 4 |
Bread | 4 |
Milk | 5 |


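**Step 2: Find the significant items based on the support threshold**

Support threshold = 50 percent of 6 transactions = 3

Item | Frequency |
---|---|
Wine | 4 |
Chips | 4 |
Bread | 4 |
Milk | 5 |

All four items appear in at least three transactions, so none of them is dropped.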
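Steps 1 and 2 can be sketched with `collections.Counter`. The variable names are illustrative, and the sketch assumes an item is kept when its count is at least the threshold of 3.

```python
from collections import Counter

transactions = [
    {"wine", "chips", "bread", "milk"},
    {"wine", "bread", "milk"},
    {"bread", "milk"},
    {"chips"},
    {"wine", "chips", "bread", "milk"},
    {"wine", "chips", "milk"},
]
MIN_COUNT = 3  # 50 percent of the six transactions

# Step 1: count how many transactions each item appears in.
item_counts = Counter(item for t in transactions for item in t)

# Step 2: keep the items whose count meets the threshold.
frequent_items = {item: n for item, n in item_counts.items() if n >= MIN_COUNT}

print(sorted(frequent_items.items()))
# [('bread', 4), ('chips', 4), ('milk', 5), ('wine', 4)]
```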

**Step 3: From the significant items, make possible pairs irrespective of the order**

Itemset | Frequency |
---|---|
Wine, Chips | 3 |
Wine, Bread | 3 |
Wine, Milk | 4 |
Chips, Bread | 2 |
Chips, Milk | 3 |
Bread, Milk | 4 |
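The pair counts in this step can be reproduced with `itertools.combinations`, which yields each unordered pair exactly once. The variable names are illustrative.

```python
from itertools import combinations

transactions = [
    {"wine", "chips", "bread", "milk"},
    {"wine", "bread", "milk"},
    {"bread", "milk"},
    {"chips"},
    {"wine", "chips", "bread", "milk"},
    {"wine", "chips", "milk"},
]
frequent_items = ["wine", "chips", "bread", "milk"]  # all four passed Step 2

# Count, for every unordered pair, the transactions that contain both items.
pair_counts = {
    pair: sum(1 for t in transactions if set(pair) <= t)
    for pair in combinations(frequent_items, 2)
}

for pair, n in pair_counts.items():
    print(pair, n)
```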

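**Step 4: Again, find the significant itemsets based on the support threshold**

Itemset | Frequency |
---|---|
Wine, Chips | 3 |
Wine, Bread | 3 |
Wine, Milk | 4 |
Chips, Milk | 3 |
Bread, Milk | 4 |

Only {Chips, Bread}, with a frequency of 2, falls below the threshold of 3 and is dropped.

**Step 5: Now, make sets of three items that are bought together based on the significant itemsets from Step 4**

A three-item set is considered only if every pair inside it survived Step 4, so any set containing both Chips and Bread is ruled out.

Itemset | Frequency |
---|---|
Wine, Chips, Milk | 3 |
Wine, Bread, Milk | 3 |

**{Wine, Chips, Milk}** and **{Wine, Bread, Milk}** are the only significant three-item sets we get from the given data. No four-item set qualifies, because {Wine, Chips, Bread, Milk} contains the infrequent pair {Chips, Bread}. In real-world scenarios, we would have dozens of items to build rules from, and we might have to go on to four- or five-item itemsets.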
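The whole level-by-level procedure can be sketched as one small function. This is a minimal illustration rather than an optimized implementation. The name `apriori_counts` is hypothetical, and the sketch assumes an itemset is kept when its count is at least the threshold (3 of 6 transactions here).

```python
from itertools import combinations

def apriori_counts(transactions, min_count):
    """Return {itemset: count} for every itemset with count >= min_count."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune
        # every candidate that has an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

transactions = [
    {"wine", "chips", "bread", "milk"},
    {"wine", "bread", "milk"},
    {"bread", "milk"},
    {"chips"},
    {"wine", "chips", "bread", "milk"},
    {"wine", "chips", "milk"},
]

result = apriori_counts(transactions, min_count=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The loop stops on its own once no candidate of the next size survives pruning.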

**Now that we have seen an example of how the Apriori algorithm works, let us implement it with the help of the Python programming language.**

## Hands-on: Apriori Algorithm in Python- Market Basket Analysis

**Problem Statement**

The manager of a retail store is trying to find out an association rule between six items, to figure out which items are more often bought together so that he can keep the items together in order to increase sales.

**Dataset**

Below is the transaction data from Day 1. This dataset contains 6 items and 22 transaction records.


**Environment Setup:**

Before we move forward, we need to install the ‘apyori’ package first.

`pip install apyori`


**Market Basket Analysis Implementation within Python**

With the help of the **apyori** package, we will be implementing the Apriori algorithm in order to help the manager in **market basket analysis**.

**Step 1: Import the libraries**

**Step 2: Load the dataset**

**Step 3: Have a glance at the records**

**Step 4: Look at the shape**

**Step 5: Convert Pandas DataFrame into a list of lists**

**Step 6: Build the Apriori model**

**Step 7: Print out the number of rules**

**Step 8: Have a glance at the rule**

The support value for the first rule is 0.5. This number is calculated by dividing the number of transactions containing ‘Milk,’ ‘Bread,’ and ‘Butter’ by the total number of transactions.

The confidence level for the rule is 0.846, which shows that out of all the transactions that contain both ‘Milk’ and ‘Bread,’ 84.6% contain ‘Butter’ too.

The lift of 1.241 tells us that ‘Butter’ is 1.241 times more likely to be bought by the customers who buy both ‘Milk’ and ‘Bread’ compared to the default likelihood of the sale of ‘Butter.’
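As a quick consistency check, the three reported figures can be tied back to whole-number transaction counts. The counts of 13 and 15 below are inferred from the rounded values and the 22 records, not read from the dataset, so treat them as an assumption:

$$
\text{support} = \frac{11}{22} = 0.5, \qquad
\text{confidence} = \frac{11}{13} \approx 0.846, \qquad
\text{lift} = \frac{11/13}{15/22} = \frac{242}{195} \approx 1.241
$$

Under this reading, 11 transactions contain all three items, 13 contain both ‘Milk’ and ‘Bread,’ and 15 contain ‘Butter.’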


**Limitations of Apriori Algorithm**

Despite its simplicity, the Apriori algorithm has some limitations, including:

- It spends a lot of time generating and counting candidates when there are many frequent itemsets.
- Its efficiency drops when a large number of transactions has to be processed with limited memory.
- It requires high computation power because it needs to scan the entire database repeatedly.


**Improvements**

Following are the ways to improve the efficiency of the algorithm:

- Use hash-based techniques to reduce the number of candidate itemsets that have to be counted.
- Drop transactions that contain no frequent itemsets from further scans (transaction reduction).
- Partition the data: any itemset that is frequent in the whole database must be frequent in at least one partition.
- Mine a random sample of the data to gain speed, at the cost of some accuracy.
- Use dynamic itemset counting to introduce new candidate itemsets while the database is being scanned.
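The partitioning idea can be illustrated on the six-transaction example. This is a sketch under stated assumptions: the brute-force helper `frequent_itemsets` is hypothetical and only suitable for tiny data, and the split into two halves is arbitrary. It relies on the fact that an itemset that is frequent overall must be frequent in at least one partition, so the local results form a complete candidate list that one final scan can confirm.

```python
from fractions import Fraction
from itertools import combinations

def frequent_itemsets(transactions, min_ratio):
    """Brute-force search: itemsets contained in at least `min_ratio` of rows."""
    items = sorted({item for t in transactions for item in t})
    found = set()
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            hits = sum(1 for t in transactions if set(candidate) <= t)
            if Fraction(hits, len(transactions)) >= min_ratio:
                found.add(frozenset(candidate))
    return found

transactions = [
    {"wine", "chips", "bread", "milk"},
    {"wine", "bread", "milk"},
    {"bread", "milk"},
    {"chips"},
    {"wine", "chips", "bread", "milk"},
    {"wine", "chips", "milk"},
]
min_ratio = Fraction(1, 2)

# Pass 1: mine each partition on its own with the same relative threshold.
partitions = [transactions[:3], transactions[3:]]
candidates = set().union(*(frequent_itemsets(p, min_ratio) for p in partitions))

# Pass 2: one scan of the full data confirms which candidates are frequent.
confirmed = {
    c for c in candidates
    if Fraction(sum(1 for t in transactions if c <= t), len(transactions)) >= min_ratio
}

# The confirmed set matches what mining the full data directly would give.
print(confirmed == frequent_itemsets(transactions, min_ratio))  # True
```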

**Applications of Apriori Algorithm**

Some of the popular applications of the algorithm are:

- Used in forest departments to understand the intensity and probability of forest fires.
- Used by Google and other search engines for their auto-complete features.
- Used in healthcare to analyze patient databases and predict which patients might develop high blood pressure, diabetes, or other common diseases.
- Used to categorize students based on their specialties and performance to improve their academic performance.
- E-commerce websites use it in their recommendation systems to provide a better user experience.

**What Did We Learn?**

In this tutorial, we learned what association rule mining is, what the Apriori algorithm is, and, with the help of an example, how the Apriori algorithm works. In the end, we built an Apriori model in the Python programming language for market basket analysis.
