
I am new to this area and its terminology, so please feel free to correct me if I go wrong somewhere. I have two datasets like this:

Dataset 1:

A B C 0 E
A 0 C 0 0
A 0 C D E
A 0 C 0 E

The way I interpret this is that, at some point in time, (A,B,C,E) occurred together, and so did (A,C), (A,C,D,E), etc.

Dataset 2:

5A 1B 5C  0 2E
4A  0 5C  0  0
2A  0 1C 4D 4E
3A  0 4C  0 3E

The way I interpret this is that, at some point in time, 5 occurrences of A, 1 occurrence of B, 5 occurrences of C and 2 occurrences of E happened, and so on.

I am trying to find which items occur together and, if possible, the cause and effect behind this. I don't understand how to go about using both datasets (or whether one is enough). A good tutorial would help, but my primary question is which dataset to use and how to proceed in (i) building frequent itemsets and (ii) building association rules between them.

Can someone point me to practical tutorials/examples (preferably in Python), or at least briefly explain how to approach this problem?

1 Answer


Some theoretical facts about association rules:

  1. Association rules are a type of undirected data mining that finds patterns in the data where the target is not specified beforehand. Whether the patterns make sense is left to human interpretation.
  2. The goal of association rules is to detect relationships or associations between specific values of categorical variables in large data sets.
  3. Such rules can be interpreted as, for example, "70% of the customers who buy wine and cheese also buy grapes". A rule like this describes co-occurrence, not cause and effect.
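
To make the "70% of the customers" reading concrete, here is a minimal sketch that computes support and confidence by hand on Dataset 1 from the question. The function names `support` and `confidence` are my own and not from any library, and the rule {A, C} -> {E} is just an illustrative pick.

```python
# Dataset 1 from the question, one set of items per row (transaction).
transactions = [
    {"A", "B", "C", "E"},
    {"A", "C"},
    {"A", "C", "D", "E"},
    {"A", "C", "E"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / float(len(transactions))

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Illustrative rule: {A, C} -> {E}
rule_support = support({"A", "C", "E"}, transactions)
rule_confidence = confidence({"A", "C"}, {"E"}, transactions)
print("{A, C} -> {E}: support=%.2f confidence=%.2f" % (rule_support, rule_confidence))
```

Here the rule would read as "75% of the rows that contain A and C also contain E". Note that only presence or absence is used, so the counts in Dataset 2 would have to be reduced to "count > 0" first, which gives back exactly Dataset 1.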

Use the Orange data mining library, which has good support for association rules.

Usage example:

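# Legacy Orange 2.x API (Python 2).
import orange

# Load the data set named "item" (for the sparse inducer, typically a .basket file).
items = orange.ExampleTable("item")

# Induce association rules with a minimum support of 10%.
rules = orange.AssociationRulesSparseInducer(items, support=0.1)

# Print support, confidence and the rule itself.
for r in rules:
    print("%5.3f %5.3f %s" % (r.support, r.confidence, r))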
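
If Orange is not available, the two steps the question asks about, (i) frequent itemsets and (ii) association rules, can be sketched in plain Python. This is a brute-force illustration on Dataset 1, not the algorithm Orange uses; `frequent_itemsets`, `association_rules` and the two thresholds are names and values I chose for the example. It enumerates every candidate itemset, so it is only suitable for a handful of distinct items.

```python
from itertools import combinations

# Dataset 1 from the question, one set of items per row (transaction).
transactions = [
    {"A", "B", "C", "E"},
    {"A", "C"},
    {"A", "C", "D", "E"},
    {"A", "C", "E"},
]

def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support (brute force)."""
    items = sorted(set().union(*transactions))
    n = float(len(transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                result[candidate] = count / n
    return result

def association_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in itemsets.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                conf = sup / itemsets[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules

itemsets = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.8)
for antecedent, consequent, sup, conf in rules:
    print("%5.3f %5.3f %s -> %s" % (sup, conf, sorted(antecedent), sorted(consequent)))
```

For realistic data sizes a proper Apriori or FP-growth implementation would be the better choice, since the number of candidate itemsets grows exponentially with the number of items.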

Another neat way to handle this type of problem is to use a Bayesian network, in particular by treating it as a Bayesian network structure learning problem. Once you have learned the network, you will be able to efficiently answer queries such as p(A=1 | B=0, C=1) and so on.
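
To show what such a query means, the sketch below estimates a conditional probability directly by counting rows of Dataset 1 written as 0/1 indicators. This is not Bayesian network structure learning; it is only the empirical quantity that a learned network would approximate from a factorized model. The helper `conditional_probability` is a hypothetical name of my own.

```python
# Dataset 1 as 0/1 indicators, one row per observation.
columns = ["A", "B", "C", "D", "E"]
rows = [
    (1, 1, 1, 0, 1),
    (1, 0, 1, 0, 0),
    (1, 0, 1, 1, 1),
    (1, 0, 1, 0, 1),
]

def conditional_probability(target, given, rows, columns):
    """Estimate P(target | given) by counting rows; None if no row matches `given`."""
    index = {name: i for i, name in enumerate(columns)}

    def matches(row, assignment):
        return all(row[index[name]] == value for name, value in assignment.items())

    conditioned = [r for r in rows if matches(r, given)]
    if not conditioned:
        return None  # the condition never occurs in the data
    hits = sum(1 for r in conditioned if matches(r, target))
    return hits / float(len(conditioned))

# p(A=1 | B=0, C=1) and p(E=1 | B=0, C=1), estimated from the four rows
p_a = conditional_probability({"A": 1}, {"B": 0, "C": 1}, rows, columns)
p_e = conditional_probability({"E": 1}, {"B": 0, "C": 1}, rows, columns)
print("p(A=1 | B=0, C=1) = %.3f" % p_a)
print("p(E=1 | B=0, C=1) = %.3f" % p_e)
```

With only four rows these estimates are very rough, and raw counting breaks down as soon as a condition is rare or unseen, which is where a learned network structure becomes useful.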

