Learning Management Marketing and Customer Support_3 ppt


278 Chapter 8

Furthermore, there is a general pattern of zip codes increasing from East to West. Codes that start with 0 are in New England and Puerto Rico; those beginning with 9 are on the west coast. This suggests a distance function that approximates geographic distance by looking at the high-order digits of the zip code:

■■ dzip(A,B) = 0.0 if the zip codes are identical

■■ dzip(A,B) = 0.1 if the first three digits are identical (e.g., “20008” and “20015”)

■■ dzip(A,B) = 0.5 if the first digits are identical (e.g., “95050” and “98125”)

■■ dzip(A,B) = 1.0 if the first digits are not identical (e.g., “02138” and “94704”)
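The four rules above can be written as a short function. This is only a sketch: the name `dzip` follows the text, and it assumes zip codes are passed as five-character strings so that leading zeros (as in “02138”) are preserved.

```python
def dzip(a: str, b: str) -> float:
    """Approximate distance between two zip codes from their leading digits."""
    if a == b:
        return 0.0          # identical zip codes
    if a[:3] == b[:3]:
        return 0.1          # first three digits identical
    if a[0] == b[0]:
        return 0.5          # only the first digit identical
    return 1.0              # first digits differ


print(dzip("20008", "20015"))
print(dzip("95050", "98125"))
print(dzip("02138", "94704"))
```

The rules are checked from most specific to least specific, so each pair falls into the first rule it satisfies.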

Of course, if geographic distance were truly of interest, a better approach would be to look up the latitude and longitude of each zip code in a table and calculate the distances that way (it is possible to get this information for the United States from www.census.gov). For many purposes, however, geographic proximity is not nearly as important as some other measure of similarity. 10011 and 10031 are both in Manhattan, but from a marketing point of view, they don’t have much else in common, because one is an upscale downtown neighborhood and the other is a working-class Harlem neighborhood. On the other hand, 02138 and 94704 are on opposite coasts, but are likely to respond very similarly to direct mail from a political action committee, since they are for Cambridge, MA and Berkeley, CA, respectively.

This is just one example of how the choice of a distance metric depends on the data mining context. There are additional examples of distance and similarity measures in Chapter 11, where they are applied to clustering.

When a Distance Metric Already Exists

There are some situations where a distance metric already exists, but is difficult to spot. These situations generally arise in one of two forms. Sometimes, a function already exists that provides a distance measure that can be adapted for use in MBR. The news story case study provides a good example of adapting an existing function, the relevance feedback score, for use as a distance function.

Other times, there are fields that do not appear to capture distance, but can be pressed into service. An example of such a hidden distance field is solicitation history. Two customers who were chosen for a particular solicitation in the past are “close,” even though the reasons why they were chosen may no longer be available; two who were not chosen are close, but not as close; and one that was chosen and one that was not are far apart. The advantage of this metric is that it can incorporate previous decisions, even if the basis for the decisions is no longer available. On the other hand, it does not work well for customers who were not around during the original solicitation, so some sort of neutral weighting must be applied to them.

Considering whether the original customers responded to the solicitation can extend this function further, resulting in a solicitation metric like:

■■ dsolicitation(A, B) = 0, when A and B both responded to the solicitation

■■ dsolicitation

■■ dsolicitation(A, B) = 0.2, when neither A nor B was chosen, but both were available in the data

■■ dsolicitation

■■ dsolicitation(A, B) = 0.3, when one or both were not considered

■■ dsolicitation(A, B) = 1.0, when one was chosen and the other was not

Of course, the particular values are not sacrosanct; they are only meant as a guide for measuring similarity and showing how previous information and response histories can be incorporated into a distance function.

The Combination Function: Asking the Neighbors for the Answer

The distance function is used to determine which records comprise the neighborhood. This section presents different ways to combine data gathered from those neighbors to make a prediction. At the beginning of this chapter, we estimated the median rent in the town of Tuxedo by taking an average of the median rents in similar towns. In that example, averaging was the combination function. This section explores other methods of canvassing the neighborhood.

The Basic Approach: Democracy

One common combination function is for the k nearest neighbors to vote on an answer: “democracy” in data mining. When MBR is used for classification, each neighbor casts its vote for its own class. The proportion of votes for each class is an estimate of the probability that the new record belongs to the corresponding class. When the task is to assign a single class, it is simply the one with the most votes. When there are only two categories, an odd number of neighbors should be polled to avoid ties. As a rule of thumb, use c+1 neighbors when there are c categories to ensure that at least one class has a plurality.


… determine if the new record is active or inactive by using different values of k for two distance functions, deuclid and dnorm (Table 8.13).

The question marks indicate that no prediction has been made due to a tie among the neighbors. Notice that different values of k do affect the classification. This suggests using the percentage of neighbors in agreement to provide the level of confidence in the prediction (Table 8.14).

Table 8.12 Customers with Attrition History

d sum 4,3,5,2,1 Y,Y,N,Y,N yes yes yes yes yes

Table 8.14 Attrition Prediction with Confidence

        K = 1       K = 2       K = 3      K = 4      K = 5

dsum    yes, 100%   yes, 100%   yes, 67%   yes, 75%   yes, 60%
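The voting and confidence calculation can be sketched in a few lines. The neighbor classes below are the Y,Y,N,Y,N sequence listed for dsum above, ordered from nearest to farthest; the function name `vote` and the choice to return `None` on a tie are assumptions made for illustration.

```python
from collections import Counter


def vote(neighbor_classes, k):
    """Majority vote among the k nearest neighbors.

    Returns (winner, confidence), or (None, None) when the top two
    classes are tied and no prediction can be made.
    """
    counts = Counter(neighbor_classes[:k]).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None, None
    winner, votes = counts[0]
    return winner, votes / k


# Neighbor classes ordered by increasing distance (dsum row: Y,Y,N,Y,N).
dsum_neighbors = ["yes", "yes", "no", "yes", "no"]

for k in range(1, 6):
    winner, confidence = vote(dsum_neighbors, k)
    print(k, winner, round(100 * confidence))
```

With this input the loop reproduces the dsum confidences of Table 8.14: 100, 100, 67, 75, and 60 percent.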


The confidence level works just as well when there are more than two categories. However, with more categories, there is a greater chance that no single category will have a majority vote. One of the key assumptions about MBR (and data mining in general) is that the training set provides sufficient information for predictive purposes. If the neighborhoods of new cases consistently produce no obvious choice of classification, then the data simply may not contain the necessary information, and the choice of dimensions and possibly of the training set needs to be reevaluated. By measuring the effectiveness of MBR on the test set, you can determine whether the training set has a sufficient number of examples.

WARNING MBR is only as good as the training set it uses. To measure whether the training set is effective, measure the results of its predictions on the test set using two, three, and four neighbors. If the results are inconclusive or inaccurate, then the training set is not large enough or the dimensions and distance metrics chosen are not appropriate.

Weighted Voting

Weighted voting is similar to voting in the previous section except that the neighbors are not all created equal; it is more like shareholder democracy than one-person, one-vote. The size of the vote is inversely proportional to the distance from the new record, so closer neighbors have stronger votes than neighbors farther away do. To prevent problems when the distance might be 0, it is common to add 1 to the distance before taking the inverse. Adding 1 also makes all the votes between 0 and 1.

Table 8.15 applies weighted voting to the previous example. The “yes, customer will become inactive” vote is the first; the “no, this is a good customer” vote is second.

Weighted voting has introduced enough variation to prevent ties. The confidence level can now be calculated as the ratio of winning votes to total votes (Table 8.16).

Table 8.15 Attrition Prediction with Weighted Voting


Table 8.16 Confidence with Weighted Voting

        K = 1       K = 2       K = 3      K = 4      K = 5

dsum    yes, 100%   yes, 100%   yes, 69%   yes, 76%   yes, 62%

In this case, weighting the votes has only a small effect on the results and the confidence. The effect of weighting is largest when some neighbors are considerably farther away than others.

Weighting can also be applied to estimation by replacing the simple average of neighboring values with an average weighted by distance. This approach is used in collaborative filtering systems, as described in the following section.

Collaborative Filtering: A Nearest Neighbor Approach to Making Recommendations

Neither of the authors considers himself a country music fan, but one of them is the proud owner of an autographed copy of an early Dixie Chicks CD. The Chicks, who did not yet have a major record label, were performing in a local bar one day and some friends who knew them from Texas made a very enthusiastic recommendation. The performance was truly memorable, featuring Martie Erwin’s impeccable Bluegrass fiddle, her sister Emily on a bewildering variety of other instruments (most, but not all, with strings), and the seductive vocals of Laura Lynch (who also played a stand-up electric bass). At the break, the band sold and autographed a self-produced CD that we still like better than the one that later won them a Grammy. What does this have to do with nearest neighbor techniques? Well, it is a human example of collaborative filtering. A recommendation from trusted friends will cause one to try something one otherwise might not try.

Collaborative filtering is a variant of memory-based reasoning particularly well suited to the application of providing personalized recommendations. A collaborative filtering system starts with a history of people’s preferences. The distance function determines similarity based on overlap of preferences: people who like the same thing are close. In addition, votes are weighted by distances, so the votes of closer neighbors count more for the recommendation. In other words, it is a technique for finding music, books, wine, or anything else that fits into the existing preferences of a particular person by using the judgments of a peer group selected for their similar tastes. This approach is also called social information filtering.


Collaborative filtering automates the process of using word-of-mouth to decide whether one would like something. Knowing that lots of people liked something is not enough. Who liked it is also important. Everyone values some recommendations more highly than others. The recommendation of a close friend whose past recommendations have been right on target may be enough to get you to go see a new movie even if it is in a genre you generally dislike. On the other hand, an enthusiastic recommendation from a friend who thinks Ace Ventura: Pet Detective is the funniest movie ever made might serve to warn you off one you might otherwise have gone to see.

Preparing recommendations for a new customer using an automated collaborative filtering system has three steps:

1. Building a customer profile by getting the new customer to rate a selection of items such as movies, songs, or restaurants

2. Comparing the new customer’s profile with the profiles of other customers using some measure of similarity

3. Using some combination of the ratings of customers with similar profiles to predict the rating that the new customer would give to items he or she has not yet rated

The following sections examine each of these steps in a bit more detail.

Building Profiles

One challenge with collaborative filtering is that there are often far more items to be rated than any one person is likely to have experienced or be willing to rate. That is, profiles are usually sparse, meaning that there is little overlap among the users’ preferences for making recommendations. Think of a user profile as a vector with one element per item in the universe of items to be rated. Each element of the vector represents the profile owner’s rating for the corresponding item on a scale of –5 to 5, with 0 indicating neutrality and null values for no opinion.

If there are thousands or tens of thousands of elements in the vector and each customer decides which ones to rate, any two customers’ profiles are likely to end up with few overlaps. On the other hand, forcing customers to rate a particular subset may miss interesting information, because ratings of more obscure items may say more about the customer than ratings of common ones. A fondness for the Beatles is less revealing than a fondness for Mose Allison.

A reasonable approach is to have new customers rate a list of the twenty or so most frequently rated items (a list that might change over time) and then free them to rate as many additional items as they please.


Comparing Profiles

Once a customer profile has been built, the next step is to measure its distance from other profiles. The most obvious approach would be to treat the profile vectors as geometric points and calculate the Euclidean distance between them, but many other distance measures have been tried. Some give higher weight to agreement when users give a positive rating, especially when most users give negative ratings to most items. Still others apply statistical correlation tests to the ratings vectors.
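As a rough illustration of the Euclidean option, the sketch below stores each sparse profile as a dictionary from item to rating (–5 to 5, with `None` for no opinion) and measures distance only over the items both people rated. Restricting the calculation to co-rated items is an assumption made here to cope with sparsity; the ratings and the name `profile_distance` are hypothetical.

```python
import math


def profile_distance(p, q):
    """Euclidean distance over the items rated in both profiles.

    Returns None when the two profiles have no rated item in common,
    since no distance can be computed from an empty overlap.
    """
    shared = [
        item for item in p
        if item in q and p[item] is not None and q[item] is not None
    ]
    if not shared:
        return None
    return math.sqrt(sum((p[item] - q[item]) ** 2 for item in shared))


# Hypothetical profiles; None means the person gave no opinion.
a = {"Apocalypse Now": 4, "Osmosis Jones": -2, "American Pie 2": None}
b = {"Apocalypse Now": 1, "Osmosis Jones": 2, "Plan 9 From Outer Space": 5}

print(profile_distance(a, b))
```

Only Apocalypse Now and Osmosis Jones are rated by both people, so the distance is the square root of 3² + 4².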

Making Predictions

The final step is to use some combination of nearby profiles in order to come up with estimated ratings for the items that the customer has not rated. One approach is to take a weighted average where the weight is inversely proportional to the distance. The example shown in Figure 8.7 illustrates estimating the rating that Nathaniel would give to Planet of the Apes based on the opinions of his neighbors, Simon and Amelia.

[Figure: rating profiles for Peter and Jenny over Crouching Tiger, Osmosis Jones, Apocalypse Now, Vertical Ray of Sun, Planet of the Apes, American Pie 2, and Plan 9 From Outer Space, with ratings of –1 and –4 shown]

Figure 8.7 The predicted rating for Planet of the Apes is –2.66


Simon, who is distance 2 away, gave that movie a rating of –1. Amelia, who is distance 4 away, gave that movie a rating of –4. No one else’s profile is close enough to Nathaniel’s to be included in the vote. Because Amelia is twice as far away as Simon, her vote counts only half as much as his. The estimate for Nathaniel’s rating is weighted by the distance:

(1⁄2 (–1) + 1⁄4 (–4)) / (1⁄2 + 1⁄4) = –1.5 / 0.75 = –2
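The same arithmetic can be expressed as a small function, using the distances and ratings given in the text for Simon and Amelia. The name `predict_rating` is hypothetical, and the sketch assumes every distance is greater than zero, since a weight of 1/distance is undefined at zero.

```python
def predict_rating(neighbors):
    """neighbors: list of (distance, rating) pairs with distance > 0.

    Returns the average rating weighted by 1 / distance.
    """
    weights = [1.0 / distance for distance, _ in neighbors]
    weighted_sum = sum(w * rating for w, (_, rating) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


# Simon: distance 2, rating -1. Amelia: distance 4, rating -4.
print(predict_rating([(2, -1), (4, -4)]))  # -2.0
```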

A good collaborative filtering system gives its users a chance to comment on the predictions and adjust the profile accordingly. In this example, if Nathaniel rents the video of Planet of the Apes despite the prediction that he will not like it, he can then enter an actual rating of his own. If it turns out that he really likes the movie and gives it a rating of 4, his new profile will be in a slightly different neighborhood and Simon’s and Amelia’s opinions will count less for Nathaniel’s next recommendation.

Lessons Learned

Memory-based reasoning is a powerful data mining technique that can be used to solve a wide variety of data mining problems involving classification or estimation. Unlike other data mining techniques that use a training set of pre-classified data to create a model and then discard the training set, for MBR, the training set essentially is the model.

Choosing the right training set is perhaps the most important step in MBR. The training set needs to include sufficient numbers of examples of all possible classifications. This may mean enriching it by including a disproportionate number of instances for rare classifications in order to create a balanced training set with roughly the same number of instances for all categories. A training set that includes only instances of bad customers will predict that all customers are bad. In general, the training set should have at least thousands, if not hundreds of thousands or millions, of examples.

MBR is a k-nearest neighbors approach. Determining which neighbors are near requires a distance function. There are many approaches to measuring the distance between two records. The careful choice of an appropriate distance function is a critical step in using MBR. The chapter introduced an approach to creating an overall distance function by building a distance function for each field and normalizing it. The normalized field distances can then be combined in a Euclidean fashion or summed to produce a Manhattan distance.

When the Euclidean method is used, a large difference in any one field is enough to cause two records to be considered far apart. The Manhattan method is more forgiving: a large difference on one field can more easily be offset by close values on other fields. A validation set can be used to pick the best distance function for a given model set by applying all candidates to see which produces better results. Sometimes, the right choice of neighbors depends on modifying the distance function to favor some fields over others. This is easily accomplished by incorporating weights into the distance function.
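The contrast between the two ways of combining field distances can be seen in a short sketch. It assumes each field distance has already been normalized to the range 0 to 1; the function names, the example values, and the way the optional weights are applied (multiplying each field's term) are illustrative assumptions.

```python
import math


def euclidean(field_distances, weights=None):
    """Combine normalized field distances in a Euclidean fashion."""
    weights = weights or [1.0] * len(field_distances)
    return math.sqrt(sum(w * d ** 2 for w, d in zip(weights, field_distances)))


def manhattan(field_distances, weights=None):
    """Combine normalized field distances by (weighted) summation."""
    weights = weights or [1.0] * len(field_distances)
    return sum(w * d for w, d in zip(weights, field_distances))


# Hypothetical pairs of records: one differs a lot on a single field,
# the other differs a little on every field.
one_big_gap = [0.9, 0.0, 0.0]
several_small_gaps = [0.3, 0.3, 0.3]

print(euclidean(one_big_gap), euclidean(several_small_gaps))
print(manhattan(one_big_gap), manhattan(several_small_gaps))
```

Summation treats the two pairs as essentially equally far apart, while the Euclidean combination places the pair with one large difference noticeably farther apart. Passing `weights` lets some fields count more than others.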

The next question is the number of neighbors to choose. Once again, investigating different numbers of neighbors using the validation set can help determine the optimal number. There is no right number of neighbors. The number depends on the distribution of the data and is highly dependent on the problem being solved.

The basic combination function, weighted voting, does a good job for categorical data, using weights inversely proportional to distance. The analogous operation for estimating numeric values is a weighted average.

One good application for memory-based reasoning is making recommendations. Collaborative filtering is an approach to making recommendations that works by grouping people with similar tastes together using a distance function that can compare two lists of user-supplied ratings. Recommendations for a new person are calculated using a weighted average of the ratings of his or her nearest neighbors.


Market Basket Analysis and Association Rules

Each customer purchases a different set of products, in different quantities, at different times. Market basket analysis uses the information about what customers purchase to provide insight into who they are and why they make certain purchases. Market basket analysis provides insight into the merchandise by telling us which products tend to be purchased together and which are most amenable to promotion. This information is actionable: it can suggest new store layouts; it can determine which products to put on special; it can indicate when to issue coupons, and so on. When this data can be tied to individual customers through a loyalty card or Web site registration, it becomes even more valuable.

The data mining technique most closely allied with market basket analysis is the automatic generation of association rules. Association rules represent patterns in the data without a specified target. As such, they are an example of undirected data mining. Whether the patterns make sense is left to human interpretation.


288 Chapter 9

[Figure callouts]

In this shopping basket, the shopper purchased a quart of orange juice, some bananas, dish detergent, some window cleaner, and a six pack of soda.

Is soda typically purchased with bananas? Does the brand of soda make a difference?

How do the demographics of the neighborhood affect what customers buy?

What should be in the basket but is not?

Are window cleaning products purchased when detergent and orange juice are bought together?

Figure 9.1 Market basket analysis helps you understand customers as well as items that are purchased together

Association rules were originally derived from point-of-sale data that describes what products are purchased together. Although their roots are in analyzing point-of-sale transactions, association rules can be applied outside the retail industry to find relationships among other types of “baskets.” Some examples of potential applications are:

■■ Items purchased on a credit card, such as rental cars and hotel rooms, provide insight into the next product that customers are likely to purchase

■■ Optional services purchased by telecommunications customers (call waiting, call forwarding, DSL, speed call, and so on) help determine how to bundle these services together to maximize revenue

■■ Banking services used by retail customers (money market accounts, CDs, investment services, car loans, and so on) identify customers likely to want other services

■■ Unusual combinations of insurance claims can be a sign of fraud and can spark further investigation

■■ Medical patient histories can give indications of likely complications based on certain combinations of treatments

Association rules often fail to live up to expectations. In our experience, for instance, they are not a good choice for building cross-selling models in industries such as retail banking, because the rules end up describing previous marketing promotions. Also, in retail banking, customers typically start with a checking account and then a savings account. Differentiation among products does not appear until customers have more products. This chapter covers the pitfalls as well as the uses of association rules.

The chapter starts with an overview of market basket analysis, including more basic analyses of market basket data that do not require association rules. It then dives into association rules, explaining how they are derived. The chapter then continues with ways to extend association rules to include other facets of the market basket analysis.

Defining Market Basket Analysis

Market basket analysis does not refer to a single technique; it refers to a set of business problems related to understanding point-of-sale transaction data. The most common technique is association rules, and much of this chapter delves into that subject. Before talking about association rules, this section talks about market basket data.

Three Levels of Market Basket Data

Market basket data is transaction data that describes three fundamentally different entities:

[Figure: entity diagram with the following labels]

CUSTOMER: CUSTOMER ID, NAME, ADDRESS, etc.

ORDER: ORDER ID, CUSTOMER ID, ORDER DATE, PAYMENT TYPE, TOTAL VALUE, SHIP DATE, SHIPPING COST, etc.

Line item: LINE ITEM ID, PRODUCT ID, QUANTITY, GIFT WRAP FLAG, TAXABLE FLAG

PRODUCT: PRODUCT ID, CATEGORY, SUBCATEGORY

Figure 9.2 A data model for transaction-level market basket data typically has three tables, one for the customer, one for the order, and one for the order line
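One way to picture the three tables is as three record types linked by identifiers. The sketch below uses Python dataclasses with field names adapted from the labels in Figure 9.2; the types, defaults, and the choice to nest line items inside the order are assumptions made for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Customer:
    customer_id: int
    name: str
    address: str


@dataclass
class LineItem:
    line_item_id: int
    product_id: int          # link to a product reference table
    quantity: int
    gift_wrap: bool = False
    taxable: bool = True


@dataclass
class Order:
    order_id: int
    customer_id: int         # ties the order to a Customer, when known
    order_date: date
    payment_type: str
    total_value: float
    line_items: List[LineItem] = field(default_factory=list)


# Hypothetical example: one order with two line items.
order = Order(1, 42, date(2004, 1, 5), "VISA", 25.0,
              [LineItem(1, 1001, 2), LineItem(2, 1002, 1)])
print(sum(item.quantity for item in order.line_items))
```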


The order is the fundamental data structure for market basket data. An order represents a single purchase event by a customer. This might correspond to a customer ordering several products on a Web site or to a customer purchasing a basket of groceries or to a customer buying several items from a catalog. This includes the total amount of the purchase, additional shipping charges, payment type, and whatever other data is relevant about the transaction. Sometimes the transaction is given a unique identifier. Sometimes the unique identifier needs to be cobbled together from other data. In one example, we needed to combine four fields to get an identifier for purchases in a store: the timestamp when the customer paid, chain ID, store ID, and lane ID.
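A composite identifier of this kind might be assembled as follows. The four inputs are the fields named in the text; the function name `order_key` and the string layout are hypothetical, and any format works as long as it keeps the four parts distinguishable.

```python
from datetime import datetime


def order_key(paid_at: datetime, chain_id: int, store_id: int, lane_id: int) -> str:
    """Build a purchase identifier from payment timestamp, chain, store, and lane."""
    return f"{chain_id}-{store_id}-{lane_id}-{paid_at:%Y%m%d%H%M%S}"


print(order_key(datetime(2004, 3, 1, 14, 5, 9), 7, 112, 3))
```

The key is only as unique as the timestamp resolution allows: two payments in the same lane within the same second would collide.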

Individual items in the order are represented separately as line items. This data includes the price paid for the item, the number of items, whether tax should be charged, and perhaps the cost (which can be used for calculating margin). The item table also typically has a link to a product reference table, which provides more descriptive information about each product. This descriptive information should include the product hierarchy and other information that might prove valuable for analysis.

The customer table is an optional table and should be available when a customer can be identified, for example, on a Web site that requires registration or when the customer uses an affinity card during the transaction. Although the customer table may have interesting fields, the most powerful element is the ID itself, because this can tie transactions together over time.

Tracking customers over time makes it possible to determine, for instance, which grocery shoppers “bake from scratch,” something of keen interest to the makers of flour as well as prepackaged cake mixes. Such customers might be identified from the frequency of their purchases of flour, baking powder, and similar ingredients, the proportion of such purchases to the customer’s total spending, and the lack of interest in prepackaged mixes and ready-to-eat desserts. Of course, such ingredients may be purchased at different times and in different quantities, making it necessary to tie together multiple transactions over time.

All three levels of market basket data are important. For instance, to understand orders, there are some basic measures:

■■ What is the average number of orders per customer?

■■ What is the average number of unique items per order?

■■ What is the average number of items per order?

■■ For a given product, what is the proportion of customers who have ever purchased the product?

■■ For a given product, what is the average number of orders per customer that include the item?

■■ For a given product, what is the average quantity purchased in an order when the product is purchased?
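Several of the measures above can be computed in a single pass over order-line data. The sketch below is a minimal illustration using only the standard library; the tuple layout, the sample rows, and the function name `basic_measures` are hypothetical, and it covers five of the six questions.

```python
from collections import defaultdict

# Hypothetical order lines: (customer_id, order_id, product_id, quantity).
lines = [
    ("c1", "o1", "flour", 2), ("c1", "o1", "sugar", 1),
    ("c1", "o2", "flour", 1),
    ("c2", "o3", "soda", 6),
]


def basic_measures(lines, product):
    orders_by_customer = defaultdict(set)   # customer -> order ids
    products_by_order = defaultdict(set)    # order -> distinct products
    units_by_order = defaultdict(int)       # order -> total units
    product_qty_by_order = {}               # orders containing `product`
    buyers = set()                          # customers who ever bought it
    for customer, order, prod, qty in lines:
        orders_by_customer[customer].add(order)
        products_by_order[order].add(prod)
        units_by_order[order] += qty
        if prod == product:
            buyers.add(customer)
            product_qty_by_order[order] = product_qty_by_order.get(order, 0) + qty
    n_orders = len(products_by_order)
    n_customers = len(orders_by_customer)
    return {
        "orders_per_customer": n_orders / n_customers,
        "unique_items_per_order":
            sum(len(p) for p in products_by_order.values()) / n_orders,
        "items_per_order": sum(units_by_order.values()) / n_orders,
        "share_of_customers_buying": len(buyers) / n_customers,
        "avg_quantity_when_bought":
            (sum(product_qty_by_order.values()) / len(product_qty_by_order)
             if product_qty_by_order else 0.0),
    }


measures = basic_measures(lines, "flour")
print(measures)
```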

These measures give broad insight into the business. In some cases, there are few repeat customers, so the average number of orders per customer is close to 1; this suggests a business opportunity to increase the number of sales per customer. Or, the number of products per order may be close to 1, suggesting an opportunity for cross-selling during the process of making an order.

It can be useful to compare these measures to each other. We have found that the number of orders is often a useful way of differentiating among customers; good customers clearly order more often than not-so-good customers. Figure 9.3 attempts to look at the breadth of the customer relationship (the number of unique items ever purchased) by the depth of the relationship (the number of orders) for customers who purchased more than one item. This data is from a small specialty retailer. The biggest bubble shows that many customers who purchase two products do so at the same time. There is also a surprisingly large bubble showing that a sizeable number of customers purchase the same product in two orders. Better customers (at least those who returned multiple times) tend to purchase a greater diversity of goods. However, some of them are returning and buying the same thing they bought the first time. How can the retailer encourage customers to come back and buy more and different products? Market basket analysis cannot answer the question, but it can at least motivate asking it and perhaps provide hints that might help.


Order Characteristics

Customer purchases have additional interesting characteristics. For instance, the average order size varies by time and region, and it is useful to keep track of these to understand changes in the business environment. Such information is often available in reporting systems, because it is easily summarized. Some information, though, may need to be gleaned from transaction-level data. Figure 9.4 breaks down transactions by the size of the order and the credit card used for payment (Visa, MasterCard, or American Express) for another retailer. The first thing to notice is that the larger the order, the larger the average purchase amount, regardless of the credit card being used. This is reassuring. Also, the use of one credit card type, American Express, is consistently associated with larger orders, an interesting finding about these customers.

For Web purchases and mail-order transactions, additional information may also be gathered at the point of sale:

■■ Did the order use gift wrap?

■■ Is the order going to the same address as the billing address?

■■ Did the purchaser accept or decline a particular cross-sell offer?

Of course, gathering information at the point of sale and having it available for analysis are two different things. However, gift giving and responsiveness to cross-sell offers are two very useful things to know about customers. Finding patterns with this information requires collecting the information in the first place (at the call center or through the online interface) and then moving it to a data mining environment.

[Chart; horizontal axis: Number of Items Purchased]

Figure 9.4 This chart shows the average amount spent by credit card type based on the number of items in the order for one particular retailer


Item Popularity

What are the most popular items? This is a question that can usually be answered by looking at inventory curves, which can be generated without having to work with transaction-level data. However, knowing the sales of an individual item is only the beginning. There are related questions:

■■

purchasers?

■■ How has the popularity of particular items changed over time?

■■ How does the popularity of an item vary regionally?

The first three questions are particularly interesting because they may suggest ideas for growing customer relationships. Association rules can provide answers to these questions, particularly when used with virtual items to represent the size of the order or the number of orders a customer has made.

The last two questions bring up the dimensions of time and geography, which are very important for applications of market basket analysis. Different products have different affinities in different regions, something that retailers are very familiar with. It is also possible to use association rules to start to understand these areas, by introducing virtual items for region and seasonality.
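A virtual item is simply an extra entry added to a basket so that an attribute such as region or season can appear in rules alongside real products. The sketch below shows one possible encoding; the `region=` and `season=` labels, the function names, and the use of northern-hemisphere three-month seasons are all assumptions made for illustration.

```python
from datetime import date

SEASONS = ["winter", "spring", "summer", "fall"]


def season_of(d: date) -> str:
    # Dec-Feb -> winter, Mar-May -> spring, Jun-Aug -> summer, Sep-Nov -> fall
    return SEASONS[(d.month % 12) // 3]


def with_virtual_items(items, region, order_date):
    """Return the basket's items plus virtual items for region and season."""
    return set(items) | {f"region={region}", f"season={season_of(order_date)}"}


basket = with_virtual_items(["soda", "bananas"], "northeast", date(2004, 7, 4))
print(sorted(basket))
```

Once the virtual items are in the basket, any procedure that looks for items appearing together will treat them like ordinary products.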

TIP Time and geography are two of the most important attributes of market basket data, because they often point to the exact marketing conditions at the time of the sale.

Tracking Marketing Interventions

As discussed in Chapter 5, looking at individual products over time can provide a good understanding of what is happening with the product. Including marketing interventions along with the product sales over time, as in Figure 9.5, makes it possible to see the effect of the interventions. The chart shows a sales curve for a particular product. Prior to the intervention, sales are hovering at 50 units per week. After the intervention, they peak at about seven or eight times that amount, before gently sliding down over the next six or seven weeks. Using such charts, it can be possible to measure the response to the marketing effort.


Figure 9.5 Showing marketing interventions and product sales on the same chart makes it possible to see effects of marketing efforts

Such analysis does not require looking at individual market baskets; daily or weekly summaries of product sales are sufficient. However, it does require knowing when marketing interventions take place, and sometimes getting such a calendar is the biggest challenge. One of the questions that such a chart can answer is the effect of the intervention. A challenge in answering this question is determining whether the additional sales are incremental or are made by customers who would purchase the product anyway at some later time. Market basket data can start to answer this question. In addition to looking at the volume of sales after an intervention, we can also look at the number of baskets containing the item. If the number of customers is not increasing, there is evidence that existing customers are simply stocking up on the item at a lower cost.
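The comparison of sales volume against basket counts can be sketched as two simple ratios. The weekly figures, the tuple layout, and the name `intervention_summary` are hypothetical; the sketch also assumes there is at least one week on each side of the intervention.

```python
def intervention_summary(weekly, intervention_week):
    """weekly: list of (week, units_sold, baskets_containing_item).

    Returns the ratio of average weekly units, and of average weekly
    baskets, after the intervention to before it.
    """
    before = [(u, b) for w, u, b in weekly if w < intervention_week]
    after = [(u, b) for w, u, b in weekly if w >= intervention_week]

    def average(rows, index):
        return sum(row[index] for row in rows) / len(rows)

    return {
        "unit_lift": average(after, 0) / average(before, 0),
        "basket_lift": average(after, 1) / average(before, 1),
    }


# Hypothetical weekly summaries for one product; intervention in week 3.
weekly = [(1, 50, 40), (2, 50, 40), (3, 200, 44), (4, 150, 44)]
summary = intervention_summary(weekly, intervention_week=3)
print(summary)
```

In this made-up case units rise 3.5 times while the number of baskets rises only about 10 percent, the pattern the text associates with existing customers stocking up.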

A related question is whether discounting results in additional sales of other products. Association rules can help answer this question by finding combinations of products that include those being promoted during the period of the promotion. Similarly, we might want to know if the average size of orders increases or decreases after an intervention. These are examples of questions where more detailed transaction-level data is important.

Clustering Products by Usage

Perhaps one of the most interesting questions is what groups of products often appear together. Such groups of products are very useful for making recommendations to customers: customers who have purchased some of the products may be interested in the rest of them (Chapter 8 talks about product

Title	Learning Management Marketing and Customer Support
School	University of Data Science
Major	Data Mining
Type	PowerPoint presentation
Year of publication	2023
City	Unknown

Format
Pages	34
File size	1.27 MB