Application of Unexpected Association Rule Mining

Part of the document: IT training intelligent knowledge a study beyond data mining shi, zhang, tian li 2015 06 14 (pages 87 - 93)

4.3 Research on Unexpected Association Rule Mining

4.3.4 Application of Unexpected Association Rule Mining

In this subsection, the algorithm is validated on an example in three respects:

(1) whether a rule's degree of unexpectedness can filter redundant rules effectively;

(2) whether the algorithm can obtain results at the different concept levels specified by users;

(3) whether the knowledge base can be accumulated and reused while the algorithm runs.

Finally, the application of feedback-style association rules to product promotion is analyzed.

4.3.4.1 Extraction of Domain Knowledge

(1) Domain Knowledge of Data Preprocessing: Knowledge of Attribute Selection

Suppose users are interested only in the relationships among 20 kinds of food in the supermarket. In the data preprocessing stage we therefore adopt one piece of attribute-selection domain knowledge covering these 20 items. This knowledge restricts the algorithm to processing only the 20 kinds of food specified.

(2) Domain Knowledge in the Algorithm Running Stage: Knowledge of Concept Hierarchy

We can obtain the concept tree by consulting relevant experts and studying the relationships among the different kinds of commodities. Using the methods introduced above, the tree can be recorded in an XML file as concept-hierarchy knowledge and then used to guide the algorithm to mine data at different concept levels.
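As a rough illustration of how a concept tree stored in XML could be read back, the sketch below parses a small fragment and looks up the ancestor of an item at a chosen level. The XML layout (the `concept` element and its `name` attribute), the helper names `parent_map` and `ancestor_at_level`, and the fragment of the tree shown are all assumptions made for this example; the section does not give the actual file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout: element and attribute names are assumptions,
# and only a small illustrative fragment of a concept tree is shown.
CONCEPT_XML = """
<concept name="Food">
  <concept name="snacks">
    <concept name="Chinese style">
      <concept name="soft cakes"/>
      <concept name="cookies"/>
    </concept>
  </concept>
</concept>
"""


def parent_map(xml_text):
    """Return a dict mapping each concept name to the name of its parent."""
    root = ET.fromstring(xml_text)
    parents = {}
    for node in root.iter("concept"):
        for child in node.findall("concept"):
            parents[child.get("name")] = node.get("name")
    return parents


def ancestor_at_level(item, level, parents):
    """Return the ancestor of item at the given depth (root = level 1).

    If the item itself sits above the requested level, the item is returned.
    """
    path = [item]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    path.reverse()  # ordered from the root down to the item
    return path[min(level, len(path)) - 1]


PARENTS = parent_map(CONCEPT_XML)
```

With this fragment, `ancestor_at_level("cookies", 3, PARENTS)` returns the third-level concept above "cookies", which is how a mining run restricted to one concept level could map raw items to the concepts it works on.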

75 4.3 Research on Unexpected Association Rule Mining of Designed …

(3) Knowledge in the Assessment Stage: Rules Mined Before

Initially, the knowledge base holds no rules about this market. After the first rule has been mined, every later rule is filtered by its degree of unexpectedness relative to the knowledge base, which consists of the rules mined earlier. The results can be obtained in two ways: with or without the unexpectedness filter.

4.3.4.2 The Features of the Data and the Constraints in the Algorithm Running Process

(1) Features of the Data

This chapter studies 42463 records covering 20 kinds of supermarket food. All data in the set are Boolean. The concept hierarchy of the commodities is shown in Fig. 4.10.

Fig. 4.10 shows a concept tree with 5 levels; the non-equivalent parts of the tree have been processed to make the levels equivalent. Because all data are Boolean, the “OR” operator is used when lifting items to a higher concept level. For instance, if “soft cake” = 1 and “cookie” = 1, then “Chinese” = 1; if “soft cake” = 0 and “cookie” = 0, then “Chinese” = 0. Under “OR”, “Chinese” = 1 also holds when only one of the two items equals 1.
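The “OR” lifting described above can be sketched in a few lines. This is a minimal illustration, assuming each record is a dict of 0/1 flags; the `CHILDREN` mapping contains only the “Chinese” example from the text, and the function name `roll_up` is hypothetical.

```python
# Illustrative parent -> children mapping; only the "Chinese" example comes
# from the text, and the record layout (dict of 0/1 flags) is an assumption.
CHILDREN = {"Chinese": ["soft cake", "cookie"]}


def roll_up(record, children):
    """Add one 0/1 flag per parent concept, computed as the OR of its children."""
    lifted = dict(record)
    for parent, kids in children.items():
        lifted[parent] = int(any(record.get(kid, 0) for kid in kids))
    return lifted


both = roll_up({"soft cake": 1, "cookie": 1}, CHILDREN)     # "Chinese" becomes 1
neither = roll_up({"soft cake": 0, "cookie": 0}, CHILDREN)  # "Chinese" becomes 0
only_one = roll_up({"soft cake": 1, "cookie": 0}, CHILDREN) # "Chinese" becomes 1
```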

(2) The Constraints of the Algorithm

The following constraints must be satisfied in the mining process:

First, the thresholds of support, confidence and lift are 8 %, 80 % and 1.1, respectively.

Second, the number of items in a rule's antecedent is limited to fewer than 5.

Finally, the accepted error of all the attributes is 0.1. The unexpectedness thresholds for the condition, the result and the integrity are all 0.9, while that for confidence is 0.7.
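The first two constraints can be expressed as a simple predicate over candidate rules. The sketch below is only an illustration: the `Rule` class and `passes_constraints` function are hypothetical names, treating the thresholds as inclusive is an assumption, and the unexpectedness thresholds are left out because the measure itself is not defined in this subsection.

```python
from dataclasses import dataclass

# Thresholds quoted in the text, expressed as fractions.
MIN_SUPPORT = 0.08
MIN_CONFIDENCE = 0.80
MIN_LIFT = 1.1
MAX_ANTECEDENT_ITEMS = 4  # "fewer than 5" items in the antecedent


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: str
    support: float
    confidence: float
    lift: float


def passes_constraints(rule):
    """Check the three thresholds and the antecedent size.

    Treating the thresholds as inclusive (>=) is an assumption.
    """
    return (
        rule.support >= MIN_SUPPORT
        and rule.confidence >= MIN_CONFIDENCE
        and rule.lift >= MIN_LIFT
        and len(rule.antecedent) <= MAX_ANTECEDENT_ITEMS
    )


# One rule taken from Table 4.1 (level 3) and one made-up rule with a low lift.
table_rule = Rule(frozenset({"Condiment"}), "Chinese style snacks", 0.144, 0.80, 1.257)
weak_rule = Rule(frozenset({"water"}), "Dairy drink", 0.20, 0.85, 1.05)
```

Here `passes_constraints(table_rule)` is true, while the made-up `weak_rule` is rejected because its lift is below 1.1.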

4.3.4.3 Experimental Results

The mining process was carried out from the first to the fourth concept level; the results are shown in Table 4.1.

From the table we can draw the following conclusions: the results of the first and second levels are the same because their tree structures are nearly the same; the third level yields the most rules, while the fourth yields only one.

The three aspects discussed above are validated in this experiment:

First, the filtering process is effective. At the third concept level, after applying the measure of unexpectedness, the number of rules is reduced from the original 35 to 12, which validates the effectiveness of the filter.

Second, different results can be obtained at different concept levels.

Finally, domain knowledge can be accumulated and reused. In the third-level result, the number of rules increases from 0 to 12, and the stored rules influence the assessment of later rules.

4 Domain Driven Intelligent Knowledge Discovery 76

4.3.4.4 The Analysis of Experimental Results in Product Promotion

(1) Support, Confidence and Their Applications in Goods Promotion

For a rule A→B, support represents the probability that both A and B occur, i.e. $SP = P(A \cup B)$. Confidence represents the probability that B occurs given that A occurs, i.e. $CF = P(A \cup B) / P(A)$. Support measures the universality of a rule: the higher the support, the larger the group of customers to which the rule applies, and so the greater the impact of a promotion based on it. Confidence measures the stability of a rule: the higher the confidence, the greater the probability that the rule holds. The two measures must be combined. If a rule has high support but the probability that it holds is very low, it is clearly not an effective rule. Conversely, if a rule has very high confidence but only a small percentage of customers are affected, it has little practical significance for promotion. Most of the time, we need to weigh the two against each other.

Fig. 4.10  A Concept Hierarchy Tree (figure not reproduced; node labels recoverable from the figure: Food snacks, Drinks, Cooking, Staple food, Milk, Beverage flavor, Delicate beverage, Fusi, Condiment, Meat, Chinese style, western style, Cheese, Small dessert, Cream cakes, Cookies, Soft cakes, Rice, yogurt, coke, juice, beer, coffee, Water, Frozen Vegetables, Mutton, Pork, Cooking oil, Sauce, Tuna)

But it is not enough to consider only support and confidence, because a frequent rule is not necessarily interesting. For example, suppose 85 % of all customers buy milk, but only 80 % of the customers who buy beer also buy milk. With a confidence threshold of 80 %, the rule “beer → milk” would be accepted, but it is obviously misleading, so lift also needs to be calculated: $lift = P(A \cup B) / [P(A) P(B)]$.

Table 4.1  Results of Association Rules

Concept | Support threshold (%) | Support value (%) | Confidence threshold (%) | Confidence value (%) | Lift threshold | Lift value | Rules
1 | 5 | 6 | 80 | 80.2 | 1.1 | 1.306 | water ^ cookies → Fresh milk
2 | 5 | 6 | 80 | 80.2 | 1.1 | 1.306 | water ^ cookies → Fresh milk
3 | 8 | 8.1 | 80 | 85.9 | 1.1 | 1.381 | Milk taste drink ^ condiment → Chinese style snacks
3 | 8 | 8.7 | 80 | 81.9 | 1.1 | 1.151 | Chinese style snacks ^ condiment → Milk taste drink
3 | 8 | 8.4 | 80 | 84.2 | 1.1 | 1.202 | Beverage flavor ^ Western cookies → Dairy drink
3 | 8 | 10.2 | 80 | 83.5 | 1.1 | 1.188 | Chinese style snacks ^ Western cookies → Dairy drink
3 | 8 | 10.5 | 80 | 81.5 | 1.1 | 1.288 | Dairy drink ^ Western cookies → Chinese style snacks
3 | 8 | 8.6 | 80 | 82.9 | 1.1 | 1.317 | Beverage flavor ^ fusi → Chinese style snacks
3 | 8 | 8.4 | 80 | 82.0 | 1.1 | 1.299 | Beverage flavor ^ western cookies → Chinese style snacks
3 | 8 | 12.8 | 80 | 82.0 | 1.1 | 1.160 | Chinese style snacks ^ water → Dairy drink
3 | 8 | 8.6 | 80 | 81.5 | 1.1 | 1.151 | Beverage flavor ^ fusi → Dairy drink
3 | 8 | 17.7 | 80 | 81.2 | 1.1 | 1.146 | Chinese style snacks ^ Beverage flavor → Dairy drink
3 | 8 | 11.1 | 80 | 80.2 | 1.1 | 1.128 | Beverage flavor ^ water → Dairy drink
3 | 8 | 14.4 | 80 | 80.0 | 1.1 | 1.257 | Condiment → Chinese style snacks
4 | 10 | 26.6 | 80 | 84.9 | 1.1 | 1.154 | Drink ^ Cooking → food snacks


It can alternatively be written as $lift = P(B \mid A) / P(B)$, because $P(A \cup B) = P(A) P(B \mid A)$. This form clearly expresses the meaning of lift, namely the relationship between A and B. If lift > 1, A and B are positively correlated; if lift = 1, A and B are independent; if lift < 1, A and B are negatively correlated. In the example above, the lift of “beer → milk” is 0.80 / 0.85 ≈ 0.94 < 1, so beer and milk are negatively correlated and the rule is not interesting. Lift can thus effectively identify some uninteresting rules. Based on these considerations, the experiment considers support, confidence and lift together. Table 4.1 lists the rules that pass all three indexes at the same time, but which rules to use in practice requires balancing the three indexes.
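The three indexes can be computed directly from transaction counts. The sketch below is a minimal illustration, assuming each transaction is a Python set of item names; the function name `rule_metrics` and the toy baskets are made up, with the counts chosen so that 85 % of baskets contain milk and 80 % of beer baskets contain milk, as in the example above.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # baskets containing A
    n_b = sum(1 for t in transactions if b <= t)         # baskets containing B
    n_ab = sum(1 for t in transactions if (a | b) <= t)  # baskets containing both
    if n_a == 0 or n_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)
    return support, confidence, lift


# Toy data: 100 baskets, 85 with milk, 20 with beer, 16 with both.
baskets = (
    [{"beer", "milk"}] * 16
    + [{"beer"}] * 4
    + [{"milk"}] * 69
    + [{"bread"}] * 11
)
support, confidence, lift = rule_metrics(baskets, {"beer"}, {"milk"})
```

On this toy data the confidence reaches the 80 % threshold, yet the lift is about 0.94, below 1, which is exactly the situation in which confidence alone would admit a misleading rule.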

(2) Analysis of the Effect of Feedback Rules

This chapter refers to pairs of rules analogous to Milk taste drink ^ condiment → Chinese style snacks and Chinese style snacks ^ condiment → Milk taste drink as feedback rules.
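One structural reading of this definition is that two rules form a feedback pair when they share the same context items and each rule's consequent appears in the other's antecedent. The sketch below searches for such pairs; this reading, the function names, and the choice of five rules copied from Table 4.1 as input are assumptions for illustration, not the authors' procedure.

```python
def is_feedback_pair(rule1, rule2):
    """True if the rules have the form X ^ C -> Y and Y ^ C -> X."""
    ante1, cons1 = rule1
    ante2, cons2 = rule2
    return (
        cons1 in ante2
        and cons2 in ante1
        and ante1 - {cons2} == ante2 - {cons1}  # same shared context C
    )


def feedback_pairs(rules):
    """Return all unordered feedback pairs among (antecedent, consequent) rules."""
    rules = [(frozenset(ante), cons) for ante, cons in rules]
    return [
        (rules[i], rules[j])
        for i in range(len(rules))
        for j in range(i + 1, len(rules))
        if is_feedback_pair(rules[i], rules[j])
    ]


# A subset of the level-3 rules from Table 4.1, with names copied literally.
RULES = [
    ({"Milk taste drink", "condiment"}, "Chinese style snacks"),
    ({"Chinese style snacks", "condiment"}, "Milk taste drink"),
    ({"Chinese style snacks", "Western cookies"}, "Dairy drink"),
    ({"Dairy drink", "Western cookies"}, "Chinese style snacks"),
    ({"Beverage flavor", "water"}, "Dairy drink"),
]
PAIRS = feedback_pairs(RULES)
```

Under this reading, the subset contains two feedback pairs: the pair named in the text and the pair built around Western cookies.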

Few scholars have discussed the application of this kind of feedback rule. An intuitive view is that the second rule is redundant compared with the first. Milk taste drink ^ condiment → Chinese style snacks means that customers who buy dairy drinks and condiments will also buy Chinese style snacks, so the only thing the merchant needs to do is place these three kinds of goods together. Chinese style snacks ^ condiment → Milk taste drink means that customers who buy Chinese style snacks and condiments at the same time will also buy milk drinks, so the merchant's choice is again to place the three kinds of goods together. Seen this way, the second rule provides no new knowledge.

But this view needs further discussion: a rule that is useless in one situation can be useful in another. Although the two rules are redundant for commodity placement, they are valuable in the field of goods promotion.

Promotion means getting customers, by some means, to buy commodities they did not intend to buy; the condition for its success is that the promotion is aimed at an accurately targeted group. If some people already like a certain kind of goods, promoting it to them only costs the mall a great deal without reward. Even if an effect is obtained in the short term, it is merely consumption borrowed from the future and does not form an absolute growth in sales. For instance, suppose a person drinks milk three times a day at a cost of 10 RMB. Even if he buys more milk than usual because the price has fallen by 10 %, he will not drink milk six times a day; and because he has bought more, he will buy less after the promotion ends, which just offsets the impact of the promotion. The mall, however, has given up 10 % of the usual price and gained nothing. So the targets of a sales promotion should be those customers who do not often buy the goods: the increase in their consumption during the promotion is an absolute increase in consumption.

The feedback rules above can be used to identify suitable targets for promotion.

Take the test results as an example. Suppose the market wants to promote dairy drinks. From the rules Chinese style snacks ^ condiment → Milk taste drink (type A customers), Beverage flavor ^ Western cookies → Dairy drink (type B customers), Chinese style snacks ^ Western cookies → Dairy drink (type C customers), Chinese style snacks ^ water → Dairy drink (type D customers), Beverage flavor ^ fusi → Dairy drink (type E customers), Chinese style snacks ^ Beverage flavor → Dairy drink (type F customers) and Beverage flavor ^ water → Dairy drink (type G customers), we know that the seven types of customers A, B, C, D, E, F and G will buy milk drinks even without a promotion, so the targets of the promotion should lie outside these seven types. In the extreme case (assuming no overlap among the seven types), the seven types account for more than 70 % of all customers; excluding these seven types, who are not sensitive to the promotion, can save a great deal of promotion cost and thus improve the efficiency of the sales promotion.
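The exclusion step can be sketched as a filter over customer baskets: a customer is kept as a promotion target only if the basket matches none of the seven antecedents. The antecedents below are copied from the rules listed above; the basket layout (customer id mapped to a set of concept-level items), the customer ids and the function name `promotion_targets` are assumptions for illustration.

```python
# Antecedents of the seven rules whose consequent is the dairy/milk drink.
DAIRY_RULE_ANTECEDENTS = [
    {"Chinese style snacks", "condiment"},        # type A
    {"Beverage flavor", "Western cookies"},       # type B
    {"Chinese style snacks", "Western cookies"},  # type C
    {"Chinese style snacks", "water"},            # type D
    {"Beverage flavor", "fusi"},                  # type E
    {"Chinese style snacks", "Beverage flavor"},  # type F
    {"Beverage flavor", "water"},                 # type G
]


def promotion_targets(baskets, antecedents):
    """Return ids of customers whose basket matches none of the antecedents."""
    return [
        customer
        for customer, items in baskets.items()
        if not any(antecedent <= items for antecedent in antecedents)
    ]


# Made-up baskets at the concept level used by the rules.
BASKETS = {
    "c1": {"Chinese style snacks", "condiment", "Dairy drink"},  # type A
    "c2": {"Staple food"},                                       # no rule applies
    "c3": {"Beverage flavor", "water"},                          # type G
}
TARGETS = promotion_targets(BASKETS, DAIRY_RULE_ANTECEDENTS)
```

In this toy case only customer "c2" remains as a promotion target, since the other two already match a rule that predicts a dairy drink purchase.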

But the analysis is not over: how to design the promotion is an important question. The most commonly used method is to reduce the price of milk drinks. This is indeed an effective way to raise sales, but it cannot separate the seven types of customers from the target customer group, and it will cause a great deal of wasted promotion cost. From another point of view, if the market stipulates that “customers who buy milk drinks and condiments at the same time enjoy a more favorable price”, then the number of customers who, driven by the price, buy both dairy drinks and condiments will increase to a certain degree. From the rule Dairy drink ^ condiment → Chinese style snacks we can infer that the number of customers who buy Chinese style snacks will increase, and then, by its inverse rules Chinese style snacks ^ condiment → Milk taste drink, Chinese style snacks ^ Western cookies → Dairy drink and Chinese style snacks ^ Beverage flavor → Dairy drink, it can be seen that the number of customers who buy milk beverages will increase.

That is, some customers who were not going to buy milk beverages also buy dairy drinks under the effect of these rules. These customers, in turn, may be attracted by the promotion rule “customers who buy milk drinks and condiments at the same time enjoy a preferential price” to go on and buy condiments they were not going to buy, so that the rule Chinese style snacks ^ condiment → Milk taste drink enters a new round of growth, and the cycle continues. In this way the purpose of the sales promotion is achieved while the seven types of customers who are not sensitive to the promotion are effectively excluded, which greatly reduces the cost of the sales promotion. It can be seen that feedback rules are not redundant rules in business but have important application value.

This part carries out an empirical study, on supermarket data, of the domain-knowledge-driven unexpected association rule mining algorithm with specifiable concept levels. The test results show that:

(1) The unexpectedness index can effectively filter redundant rules and, to some extent, guarantee the novelty of the rules.

(2) The domain-knowledge-driven unexpected association rule mining algorithm with specifiable concept levels can mine at the specific levels the user specifies and obtain results closer to the user's needs.

(3) The domain-knowledge-driven unexpected association rule mining algorithm with specifiable concept levels can realize the accumulation and reuse of knowledge.

This part then analyzes some evaluation indexes of association rules from the viewpoint of real applications, and analyzes the important role of feedback rules in goods promotion. The conclusion is that feedback rules can help a sales promotion act more accurately on the target customers and thereby greatly increase the effect of the goods promotion.
