John Wiley & Sons: Data Mining Techniques for Marketing, Sales


Chapter 11

Using Thematic Clusters to Adjust Zone Boundaries

The goal of the clustering project was to validate editorial zones that already existed. Each editorial zone consisted of a set of towns, each assigned to one of the four clusters described above. The next step was to manually increase each zone’s purity by swapping towns with adjacent zones. For example, Table 11.1 shows that all of the towns in the City zone are in Cluster 1B except Brookline, which is in Cluster 2. In the neighboring West 1 zone, all the towns are in Cluster 2 except Waltham and Watertown, which are in Cluster 1B. Swapping Brookline into West 1 and Watertown and Waltham into City would make both editorial zones pure, in the sense that all the towns in each zone would share the same cluster assignment. The new West 1 would be all Cluster 2, and the new City would be all Cluster 1B. As the map in Figure 11.12 shows, the new zones are still geographically contiguous.

Having editorial zones composed of similar towns makes it easier for the Globe to provide sharper editorial focus in its localized content, which should lead to higher circulation and better advertising sales.

Table 11.1 Towns in the City and West 1 Editorial Zones


Automatic cluster detection is an undirected data mining technique that can be used to learn about the structure of complex databases. By breaking complex datasets into simpler clusters, automatic clustering can be used to improve the performance of more directed techniques. By choosing different distance measures, automatic clustering can be applied to almost any kind of data. It is as easy to find clusters in collections of news stories or insurance claims as in astronomical or financial data.

Clustering algorithms rely on a similarity metric of some kind to indicate whether two records are close or distant. Often, a geometric interpretation of distance is used, but there are other possibilities, some of which are more appropriate when the records to be clustered contain non-numeric data.

One of the most popular algorithms for automatic cluster detection is K-means. The K-means algorithm is an iterative approach to finding K clusters based on distance. The chapter also introduced several other clustering algorithms. Gaussian mixture models are a variation on the K-means idea that allows for overlapping clusters. Divisive clustering builds a tree of clusters by successively dividing an initial large cluster. Agglomerative clustering starts with many small clusters and gradually combines them until there is only one cluster left. Divisive and agglomerative approaches allow the data miner to use external criteria to decide which level of the resulting cluster tree is most useful for a particular application.
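To make the iterative idea behind K-means concrete, here is a minimal pure-Python sketch. It assumes Euclidean distance and seeds the centroids with the first K records. That seeding is a simplification: real tools usually pick random or k-means++ seeds. The names `k_means` and `euclidean` are hypothetical, not a library API.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Geometric distance between two numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    # Simplifying assumption: seed with the first k records.
    centroids = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no record changed clusters, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned records.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

centroids, labels = k_means([(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)], 2)
print(centroids, labels)
```

On this toy data the two well-separated groups end up in different clusters after a few passes of assign-and-update.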

This chapter introduced some technical measures for cluster fitness, but the most important measure for clustering is how useful the clusters turn out to be for furthering some business goal.

Chapter 12

quently associated with marketing.

If so, this is a shame. Survival analysis, which is also called time-to-event analysis, is nothing to worry about. Exactly the opposite: survival analysis is very valuable for understanding customers. Although the roots and terminology come from medical research and failure analysis in manufacturing, the concepts are tailor-made for marketing. Survival tells us when to start worrying about customers doing something important, such as ending their relationship. It tells us which factors are most correlated with the event. Hazards and survival curves also provide snapshots of customers and their life cycles, answering questions such as: “How much should we worry that this customer is going to leave in the near future?” or “This customer has not made a purchase recently; is it time to start worrying that the customer will not return?”

The survival approach is centered on the most important facet of customer behavior: tenure. How long customers have been around provides a wealth of information, especially when tied to particular business problems. How long customers will remain customers in the future is a mystery, but a mystery that past customer behavior can help illuminate. Almost every business recognizes the value of customer loyalty. As we see later in this chapter, a guiding principle of loyalty (that the longer customers stay around, the less likely they are to stop at any particular point in time) is really a statement about hazards.

The world of marketing is a bit different from the world of medical research. For one thing, the consequences of our actions are much less dire: a patient may die from poor treatment, whereas the consequences in marketing are merely measured in dollars and cents. Another important difference is the volume of data. The largest medical studies have a few tens of thousands of participants, and many draw conclusions from just a few hundred. When trying to determine mean time between failures (MTBF) or mean time to failure (MTTF), manufacturing lingo for how long to wait until an expensive piece of machinery breaks down, conclusions are often based on no more than a few dozen failures.

In the world of customers, tens of thousands is the lower limit, since customer databases often contain data on millions of customers and former customers. Much of the statistical background of survival analysis is focused on extracting every last bit of information out of a few hundred data points. In data mining applications, the volumes of data are so large that statistical concerns about confidence and accuracy are replaced by concerns about managing large volumes of data.

The importance of survival analysis is that it provides a way of understanding time-to-event characteristics, such as:

■■ When a customer is likely to leave

■■ The next time a customer is likely to migrate to a new customer segment

■■ The next time a customer is likely to broaden or narrow the customer relationship

■■ The factors in the customer relationship that increase or decrease likely tenure

■■ The quantitative effect of various factors on customer tenure

These insights into customers feed directly into the marketing process. They make it possible to understand how long different groups of customers are likely to be around, and hence how profitable these segments are likely to be. They make it possible to forecast numbers of customers, taking into account both new acquisition and the decline of the current base. Survival analysis also makes it possible to determine which factors, both those at the beginning of customers’ relationships and later experiences, have the biggest effect on customers’ staying around the longest. And the analysis can be applied to things other than the end of the customer tenure, making it possible to determine when another event, such as a customer returning to a Web site, is no longer likely to occur.

A good place to start with survival is visualizing customer retention, which is a rough approximation of survival. After this discussion, we move on to hazards, the building blocks of survival. These are in turn combined into survival curves, which are similar to retention curves but more useful. The chapter ends with a discussion of Cox proportional hazards regression and other applications of survival analysis. Along the way, the chapter provides particular applications of survival in the business context. As with all statistical methods, there is a depth to survival that goes far beyond this introductory chapter, which consciously tries to avoid the complex mathematics underlying these techniques.

Customer Retention

Customer retention is a concept familiar to most businesses that are concerned about their customers, so it is a good place to start. Retention is actually a close approximation to survival, especially when considering a group of customers who all start at about the same time. Retention provides a familiar framework to introduce some key concepts of survival analysis, such as customer half-life and average truncated customer tenure.

Calculating Retention

How long do customers stay around? This seemingly simple question becomes more complicated when applied to the real world. Understanding customer retention requires two pieces of information:

■■ When each customer started

■■ When each customer stopped

The difference between these two values is the customer tenure, a good measurement of customer retention.
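Here is a small illustration of the start/stop arithmetic: a tenure in days computed from two dates. It goes one step beyond the text by assuming a hypothetical as-of date for customers who have not stopped yet. Their tenure is measured up to that date and flagged as not stopped. The function name `tenure_days` is illustrative, not an established API.

```python
from datetime import date
from typing import Optional, Tuple

def tenure_days(start: date, stop: Optional[date], as_of: date) -> Tuple[int, bool]:
    """Return (tenure in days, stopped flag) for one customer."""
    if stop is None:
        # Still active: tenure so far, measured up to the as-of date.
        return (as_of - start).days, False
    # Stopped: tenure is simply stop date minus start date.
    return (stop - start).days, True

print(tenure_days(date(2003, 1, 1), date(2003, 3, 1), as_of=date(2004, 1, 1)))
print(tenure_days(date(2003, 1, 1), None, as_of=date(2004, 1, 1)))
```

Keeping the stopped flag alongside the tenure matters later, because active customers cannot simply be treated as if they had left on the as-of date.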

Any reasonable database that purports to be about customers should have this data readily accessible. Of course, marketing databases are rarely simple. There are two challenges with these concepts. The first challenge is deciding what counts as a start and a stop, a decision that often depends on the type of business and the available data. The second challenge is technical: finding these start and stop dates in the available data may be less obvious than it first appears.

For subscription and account-based businesses, start and stop dates are well understood. Customers start magazine subscriptions at a particular point in time and end them when they no longer want to pay for the magazine. Customers sign up for telephone service, a banking account, ISP service, cable service, an insurance policy, or electricity service on a particular date and cancel on another date. In all of these cases, the beginning and end of the relationship are well defined.

Other businesses do not have such a continuous relationship. This is particularly true of transactional businesses, such as retailers, Web portals, and catalogers, where each customer’s purchases (or visits) are spread out over time, or may be one-time only. The beginning of the relationship is clear: usually the first purchase or visit to a Web site. The end is more difficult but is sometimes created through business rules. For instance, a customer who has not made a purchase in the previous 12 months may be considered lapsed. Customer retention analysis can produce useful results based on these definitions. A similar area of application is determining the point in time after which a customer is no longer likely to return (there is an example of this later in the chapter).

The technical side can be more challenging. Consider magazine subscriptions. Do customers start on the date when they sign up for the subscription? Do customers start when the magazine first arrives, which may be several weeks later? Or do they start when the promotional period is over and they start paying?

Although all three questions are interesting aspects of the customer relationship, the focus is usually on the economic aspects of the relationship. Costs and/or revenue begin when the account starts being used (that is, on the issue date of the magazine) and end when the account stops. For understanding customers, it is definitely interesting to have the original contact date and time, in addition to the first issue date (are customers who sign up on weekdays different from customers who sign up on weekends?), but this is not the beginning of the economic relationship. As for the end of the promotional period, this is really an initial condition, or time-zero covariate, on the customer relationship. When the customer signs up, the initial promotional period is known. Survival analysis can take advantage of such initial conditions for refining models.

What a Retention Curve Reveals

Once tenures can be calculated, they can be plotted on a retention curve, which shows the proportion of customers that are retained for a particular period of time. This is actually a cumulative histogram, because customers who have tenures of 3 months are included in the proportions for 1 month and 2 months. Hence, a retention curve always starts at 100 percent.
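For customers who all started at the same time, a retention curve can be computed directly from their tenures. In this minimal sketch (the function name is hypothetical), the value at tenure t is the share of customers whose tenure is at least t. That definition is what makes the curve cumulative and forces it to start at 100 percent. It assumes every customer has either stopped or been observed through the whole horizon.

```python
from typing import List

def retention_curve(tenures: List[int], horizon: int) -> List[float]:
    """retention[t] = share of customers whose tenure is at least t, for t = 0..horizon."""
    n = len(tenures)
    # A customer with tenure 3 is counted at t = 0, 1, 2 and 3: the cumulative property.
    return [sum(1 for x in tenures if x >= t) / n for t in range(horizon + 1)]

print(retention_curve([1, 2, 3, 3], horizon=4))
```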

For now, let’s assume that all customers start at the same time. Figure 12.1, for instance, compares the retention of two groups of customers who started at about the same point in time 10 years ago. The points on the curve show the proportion of customers who were retained for 1 year, for 2 years, and so on. Such a curve starts at 100 percent and gradually slopes downward. When a retention curve represents customers who all started at about the same time, as in this case, it is a close approximation to the survival curve.

Differences in retention among different groups are clearly visible in the chart. These differences can be quantified. The simplest measure is to look at retention at particular points in time. After 10 years, for instance, 24 percent of the regular customers are still around, and only about a third of the regular customers even make it to 5 years. Premium customers do much better. Over half make it to 5 years, and 42 percent have a customer lifetime of at least 10 years.

Figure 12.1 Retention curves show that high-end customers stay around longer. (Horizontal axis: tenure, in months after start.)

Another way to compare the different groups is by asking how long it takes for half the customers to leave: the customer half-life (although the statistical term is the median customer lifetime). The median is a useful measure because the few customers who have very long or very short lifetimes do not affect it. In general, medians are not sensitive to a few outliers.

Figure 12.2 illustrates how to find the customer half-life using a retention curve. This is the point where exactly 50 percent of the customers remain, which is where the 50 percent horizontal grid line intersects the retention curve. The customer half-life for the two groups shows a much starker difference than the 10-year survival: the premium customers have a median lifetime of close to 7 years, whereas the regular customers have a median a bit over 2 years.
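Reading the half-life off the chart amounts to finding where the curve crosses 0.5. The sketch below assumes the retention values are sampled once per period, starting at tenure 0. It uses linear interpolation between the two points that straddle 50 percent, which is an assumption of this sketch; a reader of the chart would simply eyeball the crossing. The function name is hypothetical.

```python
from typing import List, Optional

def half_life(retention: List[float]) -> Optional[float]:
    """Tenure at which retention first falls to 50 percent, or None if it never does."""
    if retention and retention[0] <= 0.5:
        return 0.0
    for t in range(1, len(retention)):
        if retention[t] <= 0.5:
            prev = retention[t - 1]  # still above 0.5 here
            # Linear interpolation between the two points that straddle 50 percent.
            return (t - 1) + (prev - 0.5) / (prev - retention[t])
    return None  # the curve never reaches 50 percent within the observed window

print(half_life([1.0, 0.8, 0.6, 0.4, 0.2]))
```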

Finding the Average Tenure from a Retention Curve

The customer half-life is useful for comparisons and easy to calculate, so it is a valuable tool. It does not, however, answer an important question: “How much, on average, were customers worth during this period of time?” Answering this question requires having an average customer worth per time and an average retention for all the customers. The median cannot provide this information because the median only describes what happens to the one customer in the middle: the customer at exactly the 50 percent rank. A question about average customer worth requires an estimate of the average remaining lifetime for all customers.

There is an easy way to find the average remaining lifetime: average customer lifetime during the period is the area under the retention curve. There is a clever way of visualizing this calculation, which Figure 12.3 walks through.

Figure 12.2 The median customer lifetime is where the retention curve crosses the 50 percent point.

First, imagine that the customers all lie down with their feet lined up on the left. Their heads represent their tenure, so there are customers of all different heights (or widths, because they are horizontal) for all different tenures. For the sake of visualization, the longer-tenured customers lie at the bottom holding up the shorter-tenured ones. The line that connects their noses counts the number of customers who are retained for a particular period of time (remember the assumption that all customers started at about the same point in time). The area under this curve is the sum of all the customers’ tenures, since every customer lying horizontally is being counted. Dividing the vertical axis by the total count produces a retention curve: instead of a count, there is a percentage. The area under the curve is now the total tenure divided by the count of customers. Voilà: the average customer tenure during the period of time covered by the chart.

TIP The area under the customer retention curve is the average customer lifetime for the period of time covered by the curve.

This simple observation explains how to obtain an estimate of the average customer lifetime. There is one caveat when some customers are still active: the average is really an average for the period of time under the retention curve. Consider the earlier retention curves in this chapter. These retention curves were for 10 years, so the area under the curves is an estimate of the average customer lifetime during the first 10 years of the relationship. For customers who are still active at 10 years, there is no way of knowing whether they will all leave at 10 years plus one day or all stick around for another century. For this reason, it is not possible to determine the real average until all customers have left.
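The truncation can be made concrete with a few invented tenures. In this sketch every customer contributes at most the horizon, so the answer depends on the window chosen. The longest tenures stand in for customers who might still be active. The function name is hypothetical.

```python
from typing import List

def truncated_mean_tenure(tenures: List[int], horizon: int) -> float:
    """Average tenure within the first `horizon` periods (the truncated mean)."""
    # Anyone who lasts beyond the horizon contributes exactly `horizon`:
    # what happens to them afterward is unknown within this window.
    return sum(min(x, horizon) for x in tenures) / len(tenures)

tenures_in_years = [2, 5, 12, 30]  # invented values for illustration
print(truncated_mean_tenure(tenures_in_years, 10))  # 10-year window
print(truncated_mean_tenure(tenures_in_years, 40))  # a longer window gives a different answer
```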

Figure 12.3 Average customer tenure is calculated from the area under the retention curve. (Panels: a group of customers with different tenures stacked on top of each other, one bar per customer; at each point in time, the edges count the customers active at that time; the sum of all the areas is the sum of all the customer tenures; the area under the retention curve is the average customer tenure.)

This value, called the truncated mean lifetime by statisticians, is very useful. As shown in Figure 12.4, the better customers have an average 10-year lifetime of 6.1 years; the other group has an average of 3.7 years. If, on average, a customer is worth, say, $100 per year, then the premium customers are worth $610 - $370 = $240 more than the regular customers during the 10 years after they start, or about $24 per year. This $24 might represent the return on a retention program designed specifically for the premium customers, or it might give an upper limit on how much to budget for such retention programs.
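The dollar comparison above is simple arithmetic. This sketch uses the text's illustrative $100 per year and the averages from Figure 12.4; the function name `value_gap` is hypothetical.

```python
from typing import Tuple

def value_gap(value_per_year: float, avg_years_a: float, avg_years_b: float,
              horizon_years: float) -> Tuple[float, float]:
    """Extra value of group A over group B within the horizon: (total, per year)."""
    total = value_per_year * avg_years_a - value_per_year * avg_years_b
    return total, total / horizon_years

# Figures from the text: $100 per year, 6.1 vs. 3.7 average years in a 10-year window.
total, per_year = value_gap(100.0, 6.1, 3.7, 10)
print(f"${total:.0f} over 10 years, about ${per_year:.0f} per year")
```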

Looking at Retention as Decay

Although we don’t generally advocate comparing customers to radioactive materials, the comparison is useful for understanding retention. Think of customers as a lump of uranium that is slowly, radioactively decaying into lead. Our “good” customers are the uranium; the ones who have left are the lead. Over time, the amount of uranium left in the lump looks something like our retention curves, with the perhaps subtle difference that the timeframe for uranium is measured in billions of years, as opposed to smaller time scales.

Figure 12.4 Average customer lifetime for different groups of customers can be compared using the areas under the retention curves. (Annotation: average 10-year tenure of high-end customers = 73 months, or 6.1 years.)

One very useful characteristic of the uranium is that we know (or, more precisely, scientists have determined how to calculate) exactly how much uranium is going to survive after a certain amount of time. They are able to do this because they have built mathematical models that describe radioactive decay, and these have been verified experimentally.

Radioactive materials have a process of decay described as exponential decay. What this means is that, in every period of the same length, the same proportion of the remaining uranium turns into lead, regardless of how much time has passed. The most common form of uranium, for instance, has a half-life of about 4.5 billion years. So, about half the lump of uranium has turned into lead after this time. After another 4.5 billion years, half the remaining uranium will decay, leaving only a quarter of the original lump as uranium and three-quarters as lead.

WARNING Exponential decay has many useful properties for predicting beyond the range of observations. Unfortunately, customers hardly ever exhibit exponential decay.

What makes exponential decay so nice is that the decay fits a simple equation. Using this equation, it is possible to determine how much uranium is around at any given point in time. Wouldn’t it be nice to have such an equation for customer retention?

It would be very nice, but it is unlikely, as shown in the example in the sidebar “Parametric Approaches Do Not Work.”

To shed some light on the issue, let’s imagine a world where customers did exhibit exponential decay. For the purposes of discussion, these customers have a half-life of 1 year. Of 100 customers starting on a particular date, exactly 50 are still active 1 year later. After 2 years, 25 are active and 75 have stopped. Exponential decay would make it easy to forecast the number of customers in the future.


DETERMINING THE AREA UNDER THE RETENTION CURVE

Finding the area under the retention curve may seem like a daunting mathematical effort. Fortunately, this is not the case at all.

The retention curve consists of a series of points; each point represents the retention after 1 year, 2 years, 3 years, and so on. In this case, tenure is measured in years; the units might also be days, weeks, or months. Each point has a value between 0 and 1, because the points represent the proportion of the customers retained up to that point in time.

The following figure shows the retention curve with a rectangle holding up each point. The base of the rectangle has a length of one (measured in the units of the horizontal axis). The height is the proportion retained. The area under the curve is the sum of the areas of these rectangles.

Circumscribing each point with a rectangle makes it clear how to calculate the area under the retention curve. (Horizontal axis: tenure, in years.)

The area of each rectangle (base times height) is simply the proportion retained. The sum of all the rectangles, then, is just the sum of all the retention values in the curve, an easy calculation in a spreadsheet. Voilà: an easy way to calculate the area, and quite an interesting observation as well. The sum of the retention values (as proportions between 0 and 1) is the average customer lifetime. Notice also that each rectangle has a width of one time unit, in whatever units the horizontal axis uses. So, the average is also expressed in the units of the horizontal axis.
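The rectangle argument in this sidebar can be checked numerically. In this sketch the names are hypothetical and the tenures are invented. Retention after t periods is the share of customers with tenure of at least t. The sum of those values matches the plain average tenure when every customer has stopped within the window.

```python
from typing import List

def retention_after(tenures: List[int], periods: int) -> List[float]:
    """Share of customers retained after 1, 2, ..., `periods` time units."""
    n = len(tenures)
    return [sum(1 for x in tenures if x >= t) / n for t in range(1, periods + 1)]

def area_under_curve(retention: List[float]) -> float:
    # Each rectangle is one time unit wide, so its area equals the retention value.
    return sum(retention)

tenures = [1, 2, 3, 3]  # invented; all customers have stopped by period 3
curve = retention_after(tenures, 3)
print(curve, area_under_curve(curve), sum(tenures) / len(tenures))
```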


PARAMETRIC APPROACHES DO NOT WORK

It is tempting to try to fit some known function to the retention curve. This approach is called parametric statistics, because a few parameters describe the shape of the function. The power of this approach is that we can use it to estimate what happens in the future.

The line is the most common shape for such a function. For a line, there are two parameters: the slope of the line and where it intersects the Y-axis.

Another common shape is a parabola, which has an additional X² term, so a parabola has three parameters. The exponential that describes radioactive decay actually has only one parameter, the half-life.

The following figure shows part of a retention curve. This retention curve is for the first 7 years of data.

The figure also shows three best-fit curves. Notice that all of these curves fit the values quite well. The statistical measure of fit is R², which varies from 0 to 1. Values over 0.9 are quite good, so by standard statistical measures, all these curves fit very, very well.

It is easy to fit parametric curves to a retention curve.

The real question, though, is not how well these curves fit the data in the range used to define them. We want to know how well these curves work beyond the original 53-week range.

The following figure answers this question. It extrapolates the curves ahead another 5 years. Quickly, the curves diverge from the actual values, and the difference seems to grow the further out we go.
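The divergence described in this sidebar is easy to reproduce. This sketch is an illustration only. The retention curve 1/(1 + 0.3t) is invented; its hazards fall with tenure, as real customer hazards often do. Only a straight line is fitted, by ordinary least squares, and all names are hypothetical. On years 0 through 7 the line reaches an R² of about 0.9, yet extrapolated to year 12 it predicts negative retention.

```python
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares line; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

def invented_retention(t: float) -> float:
    """Made-up retention curve for illustration; not data from the book."""
    return 1 / (1 + 0.3 * t)

years = list(range(8))  # fit on years 0..7 only
intercept, slope, r2 = fit_line(years, [invented_retention(t) for t in years])
print(f"R^2 on years 0-7: {r2:.3f}")
for t in (7, 10, 12):
    # The fitted line keeps falling at a constant rate; this retention curve flattens out.
    print(t, round(intercept + slope * t, 3), round(invented_retention(t), 3))
```

The sidebar's claim covers the parabola and the exponential as well; this sketch only checks the line.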


parameters, would fit the observed retention curve very well and continue

example does illustrate the challenges of using a parametric approach for approximating survival curves directly, and it is consistent with our experience even when using more data points. Functions that provide a good fit to the retention curve turn out to diverge pretty quickly.

Another way of describing this is that the customers who have been around for 1 year are going to behave just like new customers. Consider a group of 100 customers of various tenures: 50 leave in the following year, regardless of the tenure of the customers at the beginning of the year. Exponential decay says that half are going to leave regardless of their initial tenure. That means that customers who have been around for a while are no more loyal than newer customers. However, it is often the case that customers who have been around for a while are actually better customers than new customers. For whatever reason, longer-tenured customers have stuck around in the past and are probably a bit less likely than new customers to leave in the future. Exponential decay is a bad situation, because it assumes the opposite: that the tenure of the customer relationship has no effect on the rate at which customers are leaving. (The worst-case scenario would have longer-term customers leaving at consistently higher rates than newer customers: the “familiarity breeds contempt” scenario.)


The preceding discussion on retention curves serves to show how useful retention curves are. These curves are quite simple to understand, but only in terms of their data. There is no general shape, no parametric form, no grand theory of customer decay. The data is the message.

Hazard probabilities extend this idea. As discussed here, they are an example of a nonparametric statistical approach: letting the data speak instead of finding a special function to speak for it. Empirical hazard probabilities simply let the historical data determine what is likely to happen, without trying to fit data to some preconceived form. They also provide insight into customer retention and make it possible to produce a refinement of retention curves called survival curves.

The Basic Idea

A hazard probability answers the following question:

Assume that a customer has survived for a certain length of time, so the customer’s tenure is t. What is the probability that the customer leaves before t+1? Another way to phrase this is: the hazard at time t is the risk of losing customers between time t and time t+1. As we discuss hazards in more detail, it may sometimes be useful to refer to this definition. As with many seemingly simple ideas, hazards have significant consequences.
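Written as a formula, the definition is a conditional probability. Here T denotes the tenure at which a customer stops; this notation is assumed for illustration, not taken from the book. The empirical estimate on the right divides the customers who stopped at tenure t by everyone still at risk at t:

```latex
h(t) = P\left(T < t + 1 \mid T \ge t\right)
     \approx \frac{\#\{\text{customers who stopped at tenure } t\}}
                  {\#\{\text{customers with tenure} \ge t\}}
```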

To provide an example of hazards, let’s step outside the world of business for a moment and consider life tables, which describe the probability of someone dying at a particular age. Table 12.1 shows this data for the U.S. population in 2000.

Table 12.1 Hazards for Mortality in the United States in 2000, Shown as a Life Table


one is about 55 years old does the risk rise as high as it is during the first year. This is a characteristic shape of some hazard functions and is called the bathtub shape. The hazards start high, remain low for a long time, and then gradually increase again. Figure 12.5 illustrates the bathtub shape using this data.

Figure 12.5 A bathtub-shaped hazard function starts high, plummets, and then gradually increases again. (Horizontal axis: age in years, in ranges from 0-1 through 70-74.)


The second is the total number of customers who could have stopped during this period, also called the population at risk. This consists of all customers whose tenure is greater than or equal to t, including those who stopped at time t. The hazard probability is the ratio of these two numbers, and being a probability, the hazard is always between 0 and 1. These hazard calculations are provided by life table functions in statistical software such as SAS and SPSS. It is also possible to do the calculations in a spreadsheet using data directly from a customer database.
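The life-table calculation described above fits in a few lines. This is a sketch, not the SAS or SPSS implementation. It assumes each customer is summarized as an integer tenure plus a flag saying whether the customer has stopped. The first of the two numbers falls outside this excerpt, so the sketch also assumes the numerator is the count of customers who stopped at tenure t, the usual life-table numerator. Names and sample data are hypothetical.

```python
import math
from typing import List, Tuple

def empirical_hazards(customers: List[Tuple[int, bool]], max_tenure: int) -> List[float]:
    """Hazard at each tenure t = stops at t / population at risk at t.

    customers: (tenure, stopped) pairs; active customers carry stopped=False,
    so they count toward the population at risk but never as stops.
    """
    hazards = []
    for t in range(max_tenure + 1):
        # Population at risk: everyone whose tenure is >= t, including those who stopped at t.
        at_risk = sum(1 for tenure, _ in customers if tenure >= t)
        # Customers who actually stopped with tenure exactly t.
        stops = sum(1 for tenure, stopped in customers if tenure == t and stopped)
        # Undefined (NaN) when nobody is left at risk.
        hazards.append(stops / at_risk if at_risk else math.nan)
    return hazards

sample = [(0, True), (1, True), (1, False), (2, True), (3, False), (3, True)]
print(empirical_hazards(sample, 3))
```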

One caveat: for the calculation to be accurate, every customer included in the population count must have had the opportunity to stop at that particular time. This is a property of the data used to calculate the hazards, rather than of the method of calculation. In most cases, this is not a problem, because hazards are calculated from all customers or from some subset based on initial conditions (such as initial product or campaign). There is no problem when a customer is included in the population count up to that customer’s tenure and the customer could have stopped on any day before then and still be in the data set.

An example of what not to do is to take a subset of customers who have stopped during some period of time, say in the past year. What is the problem? Consider a customer who stopped yesterday with 2 years of tenure. This customer is included in all the population counts for the first year of hazards. However, the customer could not have stopped during the first year of tenure: such a stop would have been more than a year in the past and would have excluded the customer from the data set. Because customers who could not have stopped are included in the population counts, the population counts are too big, making the initial hazards too low. Later in the chapter, an alternative method is explained to address this issue.

WARNING To get accurate hazards and survival curves, use groups of

When populations are large, there is no need to worry about statistical ideas such as confidence and standard error. However, when the populations are small (as they are in medical research studies or in some business applications), the confidence interval may become an issue. What this means is that a hazard of, say, 5 percent might really be somewhere between 4 percent and 6 percent. When working with smallish populations (say, fewer than a few thousand), it might be a good idea to use statistical methods that provide
