Chapter 4
Data Mining with Association Rules
4.1 When is association rule analysis useful?
An appeal of market analysis comes from the clarity and utility of its results, which are in the form of association rules. There is an intuitive appeal to a market analysis because it expresses how tangible products and services relate to each other and how they tend to group together. A rule like “if a customer purchases three-way calling, then that customer will also purchase call waiting” is clear. Even better, it suggests a specific course of action, like bundling three-way calling with call waiting into a single service package. While association rules are easy to understand, they are not always useful. The following three rules are examples of real rules generated from real data:
On Thursdays, grocery store consumers often purchase diapers and beer together.
Customers who purchase maintenance agreements are very likely to purchase large appliances.
When a new hardware store opens, one of the most commonly sold items is toilet rings.
These three examples illustrate the three common types of rules produced by association rule analysis: the useful, the trivial, and the inexplicable.
The useful rule contains high-quality, actionable information. In fact, once the pattern is found, it is often not hard to justify. The rule about diapers and beer on Thursdays suggests that on Thursday evenings, young couples prepare for the weekend by stocking up on diapers for the infants and beer for dad (who, for the sake of argument, we stereotypically assume is watching football on Sunday with a six-pack). By locating its own brand of diapers near the aisle containing the beer, the store can increase sales of a high-margin product. Because the rule is easily understood, it suggests plausible causes, leading to other interventions: placing other baby products within sight of the beer so customers do not “forget” anything, and putting other leisure foods, like potato chips and pretzels, near the baby products.
Trivial results are already known by anyone at all familiar with the business. The second example, “Customers who purchase maintenance agreements are very likely to purchase large appliances”, is a trivial rule. In fact, we already know that customers purchase maintenance agreements and large appliances at the same time. Why else would they purchase maintenance agreements? The maintenance agreements are advertised with large appliances and rarely sold separately. This rule, though, was based on analyzing hundreds of thousands of point-of-sale transactions from Sears. Although it is valid and well-supported in the data, it is still useless. Similar results abound: people who buy 2-by-4s also purchase nails; customers who purchase paint buy paint brushes; oil and oil filters are purchased together, as are hamburgers and hamburger buns, and charcoal and lighter fluid.
A subtler problem falls into the same category. A seemingly interesting result, like the fact that people who buy the three-way calling option on their local telephone service almost always buy call waiting, may be the result of marketing programs and product bundles. In the case of telephone service options, three-way calling is typically bundled with call waiting, so it is difficult to order it separately. In this case, the analysis is not producing actionable results; it is producing already acted-upon results. Although this is a danger for any data mining technique, association rule analysis is particularly susceptible to reproducing the success of previous marketing campaigns because of its dependence on unsummarized point-of-sale data, exactly the same data that defines the success of the campaign. Results from association rule analysis may simply be measuring the success of previous marketing campaigns.
Inexplicable results seem to have no explanation and do not suggest a course of action. The third pattern (“When a new hardware store opens, one of the most commonly sold items is toilet rings”) is intriguing, tempting us with a new fact but providing information that does not give insight into consumer behavior or the merchandise, or suggest further actions. In this case, a large hardware company discovered the pattern for new store openings, but did not figure out how to profit from it. Many items are on sale during the store openings, but the toilet rings stand out. More investigation might give some explanation: Is the discount on toilet rings much larger than for other products? Are they consistently placed in a high-traffic area for store openings but hidden at other times? Is the result an anomaly from a handful of stores? Are they difficult to find at other times? Whatever the cause, it is doubtful that further analysis of just the association rule data can give a credible explanation.
4.2 How does association rule analysis work?
Association rule analysis starts with transactions containing one or more products or service offerings and some rudimentary information about the transaction. For the purpose of analysis, we call the products and service offerings items. Table 4.1 illustrates five transactions in a grocery store that carries five products. These transactions are simplified to include only the items purchased. How to use information like the date and time and whether the customer used cash will be discussed later in this chapter. Each of these transactions gives us information about which products are purchased with which other products. Using this data, we can create a co-occurrence table that tells the number of times that any pair of products was purchased together (see Table 4.2). For instance, by looking at the box where the “Soda” row intersects the “OJ” column, we see that two transactions contain both soda and orange juice. The values along the diagonal (for instance, the value in the “OJ” column and the “OJ” row) represent the number of transactions containing just that item.
Table 4.1: Grocery point-of-sale transactions
The co-occurrence table contains some simple patterns:
OJ and soda are more likely to be purchased together than any other two items.
Detergent is never purchased with window cleaner or milk.
Milk is never purchased with soda or detergent.
These simple observations are examples of associations and may suggest a formal rule like: “If a customer purchases soda, then the customer also purchases orange juice”.
For now, we defer discussion of how we find this rule automatically. Instead, we ask the question: How good is this rule? In the data, two of the five transactions include both soda and orange juice. These two transactions support the rule. Another way of expressing this is as a percentage. The support for the rule is two out of five, or 40 percent.
Items OJ Cleaner Milk Soda Detergent
Table 4.2: Co-occurrence of products
Since both of the transactions that contain soda also contain orange juice, there is a high degree of confidence in the rule as well. In fact, every transaction that contains soda also contains orange juice, so the rule “if soda, then orange juice” has a confidence of 100 percent. We are less confident about the inverse rule, “if orange juice, then soda”, because of the four transactions with orange juice, only two also have soda. Its confidence, then, is just 50 percent. More formally, confidence is the ratio of the number of transactions supporting the rule to the number of transactions where the conditional part of the rule holds. Another way of saying this is that confidence is the ratio of the number of transactions with all the items to the number of transactions with just the “if” items.
4.3 The basic process of mining association rules
The basic process for association rule analysis involves three important concerns:
Choosing the right set of items.
Generating rules by deciphering the counts in the co-occurrence matrix.
Overcoming the practical limits imposed by thousands or tens of thousands of items appearing in combinations large enough to be interesting.
Choosing the Right Set of Items. The data used for association rule analysis is typically the detailed transaction data captured at the point of sale. Gathering and using this data is a critical part of applying association rule analysis, depending crucially on the items chosen for analysis. What constitutes a particular item depends on the business need. Within a grocery store where there are tens of thousands of products on the shelves, a frozen pizza might be considered an item for analysis purposes, regardless of its toppings (extra cheese, pepperoni, or mushrooms), its crust (extra thick, whole wheat, or white), or its size. So, the purchase of a large whole wheat vegetarian pizza contains the same “frozen pizza” item as the purchase of a single-serving pepperoni pizza with extra cheese. A sample of such transactions at this summarized level might look like Table 4.3.
pizza milk sugar apples coffee
Table 4.3: Transactions with more summarized items
On the other hand, the manager of frozen foods or a chain of pizza restaurants may be very interested in the particular combinations of toppings that are ordered. He or she might decompose a pizza order into constituent parts, as shown in Table 4.4.
cheese onions peppers mush olives
Table 4.4: Transactions with more detailed items
At some later point in time, the grocery store may become interested in more detail in its transactions, so the single “frozen pizza” item would no longer be sufficient. Or, the pizza restaurants might broaden their menu choices and become less interested in all the different toppings. The items of interest may change over time. This can pose a problem when trying to use historical data if the transaction data has been summarized.
Choosing the right level of detail is a critical consideration for the analysis. If the transaction data in the grocery store keeps track of every type, brand, and size of frozen pizza (which probably account for several dozen products), then all these items need to map to the “frozen pizza” item for analysis.
Taxonomies Help to Generalize Items. In the real world, items have product codes and stock-keeping unit codes (SKUs) that fall into hierarchical categories, called a taxonomy. When approaching a problem with association rule analysis, what level of the taxonomy is the right one to use? This brings up issues such as:
Are large fries and small fries the same product?
Is the brand of ice cream more relevant than its flavor?
Which is more important: the size, style, pattern, or designer of clothing?
Is the energy-saving option on a large appliance indicative of customer behavior?
The number of combinations to consider grows very fast as the number of items used in the analysis increases. This suggests using items from higher levels of the taxonomy, for example “frozen desserts” instead of “ice cream”. On the other hand, the more specific the items are, the more likely the results are to be actionable. Knowing what sells with a particular brand of frozen pizza, for instance, can help in managing the relationship with the producer. One compromise is to use more general items initially, then to repeat the rule generation to home in on more specific items. As the analysis focuses on more specific items, use only the subset of transactions containing those items.
The complexity of a rule refers to the number of items it contains. The more items in the transactions, the longer it takes to generate rules of a given complexity. So, the desired complexity of the rules also determines how specific or general the items should be. In some circumstances, customers do not make large purchases. For instance, customers purchase relatively few items at any one time at a convenience store or through some catalogs, so looking for rules containing four or more items may apply to very few transactions and be a wasted effort. In other cases, like in a supermarket, the average transaction is larger, so more complex rules are useful.
Moving up the taxonomy hierarchy reduces the number of items. Dozens or hundreds of items may be reduced to a single generalized item, often corresponding to a single department or product line. An item like a pint of Ben & Jerry’s Cherry Garcia gets generalized to “ice cream” or “frozen desserts”. Instead of investigating “orange juice”, investigate “fruit juices”. Instead of looking at 2 percent milk, map it to “dairy products”. Often, the appropriate level of the hierarchy ends up matching a department with a product-line manager, so using generalized items has the practical effect of finding interdepartmental relationships. Because the structure of the organization is likely to hide relationships between departments, these relationships are more likely to be actionable. Generalized items also help find rules with sufficient support. There will be many times as many transactions supported by higher levels of the taxonomy as by lower levels.
Just because some items are generalized does not mean that all items need to move up to the same level. The appropriate level depends on the item, on its importance for producing actionable results, and on its frequency in the data. For instance, in a department store, big-ticket items (like appliances) might stay at a low level in the hierarchy while less expensive items (such as books) might be higher. This hybrid approach is also useful when looking at individual products. Since there are often thousands of products in the data, generalize everything except for the product or products of interest.
Association rule analysis produces the best results when the items occur in roughly the same number of transactions in the data. This helps prevent rules from being dominated by the most common items. Taxonomies can help here. Roll up rare items to higher levels in the taxonomy so they become more frequent. More common items may not have to be rolled up at all.
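Rolling items up a taxonomy amounts to a lookup applied to every transaction before counting. The sketch below is a minimal illustration under stated assumptions: the `TAXONOMY` mapping, the product names, and the `keep` parameter (which models the hybrid approach of leaving products of interest at their detailed level) are all hypothetical.

```python
# Hypothetical one-level taxonomy: detailed product -> generalized item.
TAXONOMY = {
    "Cherry Garcia pint": "ice cream",
    "vanilla ice cream quart": "ice cream",
    "orange juice": "fruit juices",
    "apple juice": "fruit juices",
    "2 percent milk": "dairy products",
}

def generalize(transaction, taxonomy, keep=frozenset()):
    """Replace each item by its generalized item.

    Items listed in `keep` stay at their detailed level (the hybrid
    approach); items missing from the taxonomy are left unchanged.
    """
    return {
        item if item in keep else taxonomy.get(item, item)
        for item in transaction
    }

basket = {
    "Cherry Garcia pint",
    "vanilla ice cream quart",
    "orange juice",
    "2 percent milk",
}

print(sorted(generalize(basket, TAXONOMY)))
# ['dairy products', 'fruit juices', 'ice cream']

print(sorted(generalize(basket, TAXONOMY, keep={"orange juice"})))
# ['dairy products', 'ice cream', 'orange juice']
```

Because the result is a set, two different ice cream products in one basket collapse into a single “ice cream” item, which is what the co-occurrence counts need.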
Generating Rules from All This Data. Calculating the number of times that a given combination of items appears in the transaction data is well and good, but a combination of items is not a rule. Sometimes, just the combination is interesting in itself, as in the diaper, beer, and Thursday example. But in other circumstances, it makes more sense to find an underlying rule. What is a rule? A rule has two parts, a condition and a result, and is usually represented as a statement:
If condition then result
If the rule says,
If 3-way calling then call-waiting
we read it as: “if a customer has 3-way calling, then the customer also has call-waiting”. In practice, the most actionable rules have just one item as the result. So, a rule like
If diapers and Thursday, then beer
is more useful than
If Thursday, then diapers and beer
Constructs like the co-occurrence table provide the information about which combinations of items occur most commonly in the transactions. For the sake of illustration, let’s say the most common combination has three items, A, B, and C. The only rules to consider are those with all three items in the rule and with exactly one item in the result:
If A and B, then C
If A and C, then B
If B and C, then A
What about their confidence level? Confidence is the ratio of the number of transactions with all the items in the rule to the number of transactions with just the items in the condition. What is confidence really saying? Saying that the rule “if B and C then A” has a confidence of 0.33 is equivalent to saying that when B and C appear in a transaction, there is a 33 percent chance that A also appears in it. That is, one time in three A occurs with B and C, and the other two times, A does not.
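Enumerating the single-result rules for a combination and scoring each by confidence can be sketched as follows. The six transactions are invented purely for illustration (they are built so that “if B and C then A” comes out at one in three), and the function names are assumptions of this sketch.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= basket)

def single_result_rules(itemset):
    """Yield (condition, result) pairs with exactly one item in the result."""
    items = sorted(itemset)
    for result in items:
        condition = tuple(i for i in items if i != result)
        yield condition, result

def confidence(transactions, condition, result):
    """Transactions with all the items over transactions with the condition."""
    all_items = set(condition) | {result}
    return count(transactions, all_items) / count(transactions, condition)

# Hypothetical data, invented for illustration only.
transactions = [
    {"A", "B", "C"},
    {"B", "C"},
    {"B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"A"},
]

for condition, result in single_result_rules({"A", "B", "C"}):
    conf = confidence(transactions, condition, result)
    print(f"If {' and '.join(condition)}, then {result}: confidence {conf:.2f}")
# If B and C, then A: confidence 0.33
# If A and C, then B: confidence 0.50
# If A and B, then C: confidence 0.50
```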
The most confident rule is the best rule, so we are tempted to choose “if B and C then A”. But there is a problem. This rule actually does worse than simply guessing that A appears in the transaction. A occurs in 45 percent of the transactions, but the rule only gives 33 percent confidence. The rule does worse than just randomly guessing. This suggests another measure called improvement. Improvement tells how much better a rule is at predicting the result than just assuming the result in the first place. It is given by the following formula:
\[
\text{improvement} = \frac{p(\text{condition and result})}{p(\text{condition})\, p(\text{result})}
\]
When improvement is greater than 1, then the resulting rule is better at predicting the result than random chance. When it is less than 1, it is worse. The rule “if A then B” is 1.31 times better at predicting when B is in a transaction than randomly guessing. In this case, as in many cases, the best rule actually contains fewer items than other rules being considered. When improvement is less than 1, negating the result produces a better rule.
If the rule
If B and C then A
has a confidence of 0.33, then the rule
If B and C then NOT A
has a confidence of 0.67. Since A appears in 45 percent of the transactions, it does NOT occur in 55 percent of them. Applying the same improvement measure shows that the improvement of this new rule is 1.22 (0.67/0.55). The negative rule is useful. The rule “If A and B then NOT C” has an improvement of 1.33, better than any of the other rules. Rules are generated from the basic probabilities available in the co-occurrence table. Useful rules have an improvement that is greater than 1. When the improvement scores are low, you can increase them by negating the rules. However, you may find that negated rules are not as useful as the original association rules when it comes to acting on the results.
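The arithmetic behind improvement and negated rules can be checked with a short sketch. It uses only the two figures quoted in the text (a confidence of 0.33 for “if B and C then A” and A appearing in 45 percent of transactions); the function names are assumptions. Since confidence is p(condition and result) divided by p(condition), improvement equals confidence divided by p(result).

```python
def improvement(p_condition_and_result, p_condition, p_result):
    """Improvement as defined in the text."""
    return p_condition_and_result / (p_condition * p_result)

def improvement_from_confidence(confidence, p_result):
    """Same measure, using confidence = p(condition and result) / p(condition)."""
    return confidence / p_result

confidence_a = 0.33  # confidence of "if B and C then A" (from the text)
p_a = 0.45           # share of transactions containing A (from the text)

# Original rule: worse than guessing, because improvement < 1.
print(round(improvement_from_confidence(confidence_a, p_a), 2))          # 0.73

# Negated rule "if B and C then NOT A": confidence 0.67, p(NOT A) = 0.55.
print(round(improvement_from_confidence(1 - confidence_a, 1 - p_a), 2))  # 1.22
```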
Overcoming Practical Limits. Generating association rules is a multi-step process. The general algorithm is:
Generate the co-occurrence matrix for single items.
Generate the co-occurrence matrix for two items. Use this to find rules with two items.
Generate the co-occurrence matrix for three items. Use this to find rules with three items.
And so on.
For instance, in the grocery store that sells orange juice, milk, detergent, soda, and window cleaner, the first step calculates the counts for each of these items. During the second step, the following counts are created:
OJ and milk, OJ and detergent, OJ and soda, OJ and cleaner
Milk and detergent, milk and soda, milk and cleaner
Detergent and soda, detergent and cleaner
Soda and cleaner
This is a total of 10 counts. The third pass takes all combinations of three items, and so on. Of course, each of these stages may require a separate pass through the data, or multiple stages can be combined into a single pass by considering different numbers of combinations at the same time.
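The staged counting can be sketched with the standard library. The example first confirms that five items give 10 pairs, then counts combinations of several sizes in a single pass over the data, as the paragraph above suggests is possible. The transactions are hypothetical and the function name `count_in_one_pass` is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations

items = ["OJ", "Milk", "Detergent", "Soda", "Cleaner"]
print(len(list(combinations(items, 2))))  # 10

def count_in_one_pass(transactions, sizes):
    """Count every combination of each requested size in one pass."""
    counts = {size: Counter() for size in sizes}
    for basket in transactions:
        ordered = sorted(basket)  # fixed order so each combination has one key
        for size in sizes:
            counts[size].update(combinations(ordered, size))
    return counts

# Hypothetical transactions, for illustration only.
transactions = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Cleaner"},
]

counts = count_in_one_pass(transactions, sizes=(1, 2, 3))
print(counts[1][("OJ",)])                     # 4
print(counts[2][("OJ", "Soda")])              # 2
print(counts[3][("Detergent", "OJ", "Soda")]) # 1
```

Combining stages saves passes over the data but counts every combination, including ones that a staged approach with pruning would have discarded.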
Although it is not obvious when there are just five items, increasing the number of items in the combinations requires exponentially more computation. This results in exponentially growing run times, and long, long waits, when considering combinations with more than three or four items. The solution is pruning. Pruning is a technique for reducing the number of items and combinations of items being considered at each step. At each stage, the algorithm throws out a certain number of combinations that do not meet some threshold criterion.
The most common pruning mechanism is called minimum support pruning. Recall that support refers to the number of transactions in the database where the rule holds. Minimum support pruning requires that a rule hold on a minimum number of transactions. For instance, if there are 1 million transactions and the minimum support is 1 percent, then only rules supported by at least 10,000 transactions are of interest. This makes sense, because the purpose of generating these rules is to pursue some sort of action (such as putting own-brand diapers in the same aisle as beer), and the action must affect enough transactions to be worthwhile.
The minimum support constraint has a cascading effect. Say we are considering a rule with four items in it, like
If A, B, and C, then D
Using minimum support pruning, this rule has to be true on at least 10,000 transactions in the data. It follows that:
A must appear in at least 10,000 transactions; and,
B must appear in at least 10,000 transactions; and,
C must appear in at least 10,000 transactions; and,
D must appear in at least 10,000 transactions
In other words, minimum support pruning eliminates items that do not appear in enough transactions! There are two ways to do this. The first way is to eliminate the items from consideration. The second way is to use the taxonomy to generalize the items so the resulting generalized items meet the threshold criterion.
The threshold criterion applies to each step in the algorithm. The minimum threshold also implies that:
A and B must appear together in at least 10,000 transactions; and,
A and C must appear together in at least 10,000 transactions; and,
A and D must appear together in at least 10,000 transactions;
And so on
Each step of the calculation of the co-occurrence table can eliminate combinations of items that do not meet the threshold, reducing its size and the number of combinations to consider during the next pass. The best choice for minimum support depends on the data and the situation. It is also possible to vary the minimum support as the algorithm progresses. For instance, using different levels at different stages you can find uncommon combinations of common items (by decreasing the support level for successive steps) or relatively common combinations of uncommon items (by increasing the support level). Varying the minimum support helps to find actionable rules, so the rules generated are not all like finding that peanut butter and jelly are often purchased together.
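The cascading effect of minimum support pruning can be sketched as a level-wise search: each pass counts only combinations whose smaller sub-combinations all survived the previous pass. This is a minimal sketch in the spirit of what is commonly called the Apriori approach, not necessarily the exact procedure the text has in mind; the function name, the fixed threshold, and the transactions are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_count, max_size=3):
    """Level-wise search with minimum support pruning (fixed threshold)."""
    baskets = [frozenset(b) for b in transactions]

    # Pass 1: count single items and drop those below the threshold.
    item_counts = Counter(item for b in baskets for item in b)
    frequent = {frozenset([i]): c for i, c in item_counts.items() if c >= min_count}
    result = dict(frequent)

    size = 2
    while frequent and size <= max_size:
        surviving_items = set().union(*frequent)
        counts = Counter()
        for b in baskets:
            kept = sorted(b & surviving_items)  # ignore pruned items
            for combo in combinations(kept, size):
                # Count a combination only if every smaller
                # sub-combination met the threshold in the previous pass.
                if all(frozenset(sub) in frequent
                       for sub in combinations(combo, size - 1)):
                    counts[frozenset(combo)] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        size += 1
    return result

# Hypothetical transactions, for illustration only.
transactions = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Cleaner"},
]

found = frequent_itemsets(transactions, min_count=2)
for itemset, n in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a threshold of two transactions, milk is dropped in the first pass, and the three-item combination of detergent, OJ, and soda is never counted because detergent and soda do not appear together often enough. Varying the threshold by stage, as described above, would mean passing a different `min_count` for each value of `size`.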
4.4 The problem of large datasets
A typical fast-food restaurant offers several dozen items on its menu; say there are 100. To use probabilities to generate association rules, counts have to be calculated for each combination of items. The number of combinations of a given size tends to grow exponentially. A combination with three items might be a small fries, cheeseburger, and medium diet Coke. On a menu with 100 items, how many combinations are there with three menu items? There are 161,700! (This is based on the binomial formula from mathematics.) On the other hand, a typical supermarket has at least 10,000 different items in stock, and more typically 20,000 or 30,000.
Calculating the support, confidence, and improvement quickly gets out of hand as the number of items in the combinations grows. There are almost 50 million possible combinations of two items in the grocery store and over 100 billion combinations of three items. Although computers are getting faster and cheaper, it is still very expensive to calculate the counts for this number of combinations. Calculating the counts for five or more items is prohibitively expensive. The use of taxonomies reduces the number of items to a manageable size.
The number of transactions is also very large. In the course of a year, a decent-size chain of supermarkets will generate tens of millions of transactions. Each of these transactions consists of one or more items, often several dozen at a time. So, determining if a particular combination of items is present in a particular transaction may require a bit of effort, multiplied a million-fold for all the transactions.
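The combination counts quoted in this section follow from the binomial coefficient and can be checked directly. The sketch below assumes Python 3.8 or later for `math.comb` and takes 10,000 items as the supermarket size, the lower bound mentioned in the text.

```python
from math import comb

# Three-item combinations on a 100-item menu.
print(comb(100, 3))     # 161700

# A supermarket with 10,000 items.
print(comb(10_000, 2))  # 49995000 (almost 50 million pairs)
print(comb(10_000, 3))  # 166616670000 (over 100 billion triples)
```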
4.5 Strengths and Weaknesses of Association Rules Analysis
4.5.1 The strengths of association rule analysis
The strengths of association rule analysis are:
It produces clear and understandable results
It supports undirected data mining
It works on variable-length data
The computations it uses are simple to understand
Results Are Clearly Understood. The results of association rule analysis are association rules; these are readily expressed as English or as a statement in a query language such as SQL. The expression of patterns in the data as “if-then” rules makes the results easy to understand and facilitates turning the results into action. In some circumstances, merely the set of related items is of interest and rules do not even need to be produced.
Association Rule Analysis Is Strong for Undirected Data Mining. Undirected data mining is very important when you approach a large set of data and do not know where to begin. Association rule analysis is an appropriate technique, when it can be applied, to analyze data and to get a start. Most data mining techniques are not primarily used for undirected data mining. Association rule analysis, on the other hand, is used in this case and provides clear results.
Association Rule Analysis Works on Variable-Length Data. Association rule
analysis can handle variable-length data without the need for summarization Other techniques tend to require records in a fixed format, which is not a natural way to