*What do the customers buy? Which products are bought together?* With these two short questions the field of association rule (AR) mining makes its appearance. In this field of ML, the original aim was to find associations and correlations between the different items that customers place in their shopping carts. More generally, the goal of AR mining is to **find frequent and interesting patterns, associations, correlations, or causal structures among sets of items** or elements in large databases, and to express these relationships as association **rules**. AR mining is an important part of the unsupervised learning paradigm, so the algorithm **does not** rely on an expert to teach it during the training stage.

*Why might AR mining be so important?* Many commercial applications generate huge amounts of **unlabeled** data (just think of *Facebook* for a moment), so our favorite classifier system will **not work** in this environment. With AR mining we can exploit such databases and extract useful information, and rule-based systems are already well known to us...

Note that there are two main types of ARs: (1) binary or qualitative rules, and (2) quantitative rules. For this article I will use qualitative rules due to their simplicity.

**Association Rule Mining with Apriori**

Apriori is the most famous AR miner out there. With this algorithm we can find the *frequent itemsets*, that is, those itemsets whose support in the database meets a minimum threshold. Its simplicity can be summarized by the following pseudocode:

```
k := 1
generate frequent itemsets (support count >= min support count) of length 1
repeat until no new frequent itemsets are found:
    k := k + 1
    generate candidate itemsets of size k from the frequent itemsets of size k - 1
    compute the support of each candidate by scanning through the database
```
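The steps above can be sketched in Python. This is a minimal illustration rather than an optimized implementation; the function name `apriori` and its parameters are my own choices, and support is measured as an absolute transaction count:

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return every frequent itemset (as a frozenset) with its support count.

    transactions: list of sets of items.
    min_support_count: absolute minimum number of transactions.
    """
    # Length-1 candidates: every distinct item in the database.
    items = {item for t in transactions for item in t}
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Count the support of each candidate by scanning the database.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support_count}
        frequent.update(level)
        # Join step: build (k+1)-candidates from pairs of frequent k-itemsets.
        k += 1
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent
```

For example, on a toy database of five transactions with a minimum support count of 2, the call `apriori([{'B','C','D'}, {'B','C'}, {'B','D'}, {'C','D'}, {'B','C','D'}], 2)` returns all seven itemsets over B, C, and D, including the triple *{B,C,D}* with support 2.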

But this article is focused on rule generation. *How do we obtain the association rules from the previously found frequent itemsets?* That part is definitely tricky. To obtain the desired rules we have to enumerate all the possible splits (without repetition) of every frequent itemset and check whether the confidence of each rule formed is greater than or equal to a minimum threshold, the minimum confidence. Let's see this with the following example: suppose we have the frequent itemset *{B,C,D}*, where *B*, *C*, and *D* are items obtained from the transaction database. How many rules will this itemset generate? The answer is a maximum of six rules: because we work with qualitative rules, there are (2^k)-2 possible rules, where k is the size of the itemset; in our case k = 3. The rules are:

*BC => D, BD => C, CD => B, B => CD, C => BD*, and *D => BC*. The idea is the following:
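To make the (2^k)-2 count concrete, here is a small Python helper (the name `rules_from_itemset` is my own) that enumerates every antecedent/consequent split of a frequent itemset:

```python
from itertools import combinations

def rules_from_itemset(itemset):
    """Enumerate every antecedent => consequent split of a frequent itemset.

    A set of size k yields 2**k - 2 qualitative rules: the antecedent is any
    non-empty proper subset, and the consequent is everything else.
    """
    items = frozenset(itemset)
    rules = []
    for size in range(1, len(items)):            # antecedent sizes 1 .. k-1
        for antecedent in combinations(sorted(items), size):
            consequent = items - frozenset(antecedent)
            rules.append((frozenset(antecedent), consequent))
    return rules
```

Calling it on *{B,C,D}* produces exactly the six splits listed above.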

```
for itemsInAntecedent := 1 to k
    first := getFirstItemFromItemset
    element := first
    do
        do
            setConsequentItemActive( first )
            for j := 1 to itemsInAntecedent
                element := getNextElementFromItemset( element )
                setConsequentItemActive( element )
            endfor
            inverseAntecedentFromConsequentInRule
            setRuleConfidence
            if minConf <= ruleConfidence then
                store rule
            endif
        while first <> lastItemFromItemset and itemsInAntecedent > 1
        first := getNextElementFromItemset( first )
        element := first
    while first <> lastItemFromItemset
    if itemsInAntecedent = 1 and numElements > 2 then
        setConsequentItemActive( lastElement )
        setRuleConfidence
        if minConf <= ruleConfidence then
            store rule
        endif
    endif
endfor
```
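In a high-level language there is a much simpler route than walking the itemset with pointers: enumerate the splits directly. The sketch below is my own simplification, not a transcription of the pseudocode above; it assumes a `frequent` dictionary produced by an Apriori pass (itemset -> support count). Since every non-empty proper subset of a frequent itemset is itself frequent, the antecedent's support is always available for the confidence ratio:

```python
from itertools import combinations

def generate_rules(frequent, min_conf):
    """Generate rules A => C with confidence support(A u C) / support(A).

    frequent: dict mapping frozenset itemset -> support count.
    min_conf: minimum confidence threshold in [0, 1].
    """
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue                              # singletons produce no rules
        for size in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), size):
                ante = frozenset(ante)
                # The antecedent is frequent too, so its count is in the dict.
                conf = support / frequent[ante]
                if conf >= min_conf:
                    rules.append((ante, itemset - ante, conf))
    return rules
```

On the toy counts from the earlier example (pairs with support 3, items with support 4, the triple with support 2) and a minimum confidence of 0.7, only the six rules drawn from the size-2 itemsets survive, each with confidence 0.75.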

As can be appreciated, the algorithm is tricky enough to keep a PhD hopeful wondering for a couple of days how to obtain the desired rules from scratch.

**References**

R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. Technical Report, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, September 1994.
