A Different Kind of Risk: Fraud
The main focus of this chapter has been on predicting risk for default on a payment. And the methodology translates very well to predicting the risk of claims for insurance. There is another type of risk that also erodes profits: the risk of fraud. Losses due to fraud cost companies and ultimately consumers millions of dollars a year. And the threat is increasing as more and more consumers use credit cards, telecommunications, and the Internet for personal and business transactions.
Figure 10.11 Tabulation of risk scores
Jaya Kolhatkar, Director of Fraud Management for Amazon.com, discusses the mechanics of developing fraud models and the importance of proper implementation:
Fraud in the e-tailing world has increased rapidly over the past two years. Being a virtual marketplace, most of the fraud checks in the physical retail world do not apply. Primarily, fraud is committed through the use of credit cards. There are several effective fraud management tools available from the credit card associations like address verification system, fraud scores, etc.; however, these are not enough for a rigorous fraud management system.
As Amazon.com moved from selling just books to books, music, video, consumer electronics, etc., fraud losses increased. In order to control these losses a two-pronged approach was developed. The two components were data analysis/model building and operations/investigations.
The data analysis component is the backbone of the fraud system. It is important to underscore that because fraud rates within the population are so low, a blend of two or more modeling techniques seems to work best at isolating fraud orders within the smallest percentage of the population. We have used logistic regression, decision trees, etc., in combination to create effective fraud models. Low fraud rates also impact data preparation/analysis. It is easy to misjudge spurious data for a new fraud trend.
Another important issue to keep in mind at this stage of model development is model implementation. Because we function in a real-time environment and cannot allow the scoring process to be a bottleneck in our order fulfillment process, we need to be very parsimonious in our data selection. While, at first, this seems to be very limiting from a variable selection point of view, we have found that using a series of scorecards built on progressively larger sets of variables and implemented on progressively smaller populations is very effective.
Given the "customer-centric" philosophy of Amazon.com, no order (except perhaps the most blatant fraud order) is cancelled without manual intervention. Even the best predictive system is inherently prone to misclassification. We use data analysis and optimization techniques to help the investigations staff home in on the right set of orders.
To be effective in reducing fraud losses over a long period of time, the fraud models need to be constantly updated to capture any new patterns in fraud behavior.
This methodology works well for any industry seeking to limit risk. If you've ordered a new Internet service, the company probably looked at your credit report. They may have even evaluated your application based on a risk score similar to the one we just developed. The same methodology works well to predict the risk of claims. You would substitute claims data for risk data and gather predictive variables from the customer database and overlay data.
I hope you're not too stuffed. We have a couple more recipes to go. In the next chapter, I demonstrate how to build a churn model. Bon appetit!
Churn models, also known as retention or attrition models, predict the probability of customer attrition. Because attrition has such a powerful impact on profitability, many companies are making these models the main focus of their customer loyalty program. In this chapter, I begin with a discussion of the importance of customer loyalty and its effect on profits in a number of industries. The remainder of the chapter details the development of a churn model that predicts the effect of a rate increase on credit card balances. The steps are familiar. I begin by defining the objective. Then I prepare the variables, process the model, and validate. I wrap up the chapter with some options for implementing a churn model and the effect on overall customer profitability.
Customer Loyalty
As I just mentioned, the main advantage of a retention program is economics. If you have $1 to spend on marketing, you would be much better off spending it on customer retention than customer acquisition. Why? It's much more expensive to attract a new customer than to retain a current one. Also, loyal customers tend to be less price-sensitive.
The airline industry is very adept at building customer loyalty. The more you fly with one airline, the more benefits you receive. Many other industries have followed the pattern with loyalty cards and incentives for repeat business. The gambling industry has embraced customer profiling and target modeling to identify and provide benefits for their most profitable customers. Credit card banks have affinity cards with everything from schools to pet clubs. These added benefits and incentives are essential for survival since most companies are learning that it is difficult to survive by competing on price alone. Building customer loyalty by creating additional value is becoming the norm in many industries.
Defining the Objective
For many industries, defining the objective is simple. You can have only one long-distance provider. If you switch, it's a complete gain for one company and a complete loss for the other. This is also true for energy providers; you have one source for your electrical power. Insurance customers generally patronize one company for certain types, if not all, of their insurance. For some industries, though, it's not so simple. For example, a catalog company may hope you are a loyal customer, but it doesn't really know what you are spending with its competitors. This is true for most retailers.
Credit card banks have exceptional challenges in this area due to the combination of stiff competition and industry dynamics. For the most part, the only profitable customers are the "revolvers," or those customers that carry a balance. "Silent attrition" occurs when customers pay down their balances without closing their accounts. Pure transactors, or customers that pay their balance every month, are profitable only if their monthly purchases are above a certain amount.
This chapter's recipe details the steps for building a model to predict attrition or churn for credit card customers following a rate increase. Rowan Royal Bank has a modest portfolio of 1.2 million customers. Its interest rates or APRs (annual percentage rates) are lower than the industry average, especially on its high-risk customers. But before it increases rates on the entire group of high-risk customers, the bank wants to predict which customers are highly rate-sensitive. In other words, it wants to determine which customers have a high probability of shifting balances away following a rate increase. For these customers, the increase in interest revenue may be offset by losses due to balance attrition.
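The trade-off between a higher rate and a lower balance can be made concrete with a simplified break-even condition. This is only a sketch: it assumes revenue is interest alone (APR times balance) and ignores fees, funding costs, and credit losses. The symbols and the example rates are illustrative and are not taken from Rowan Royal Bank's data.

```latex
% r_0, r_1: APR before and after the increase (assumed symbols)
% B_0, B_1: balance before the increase and balance retained afterward
\[
\Delta R = r_1 B_1 - r_0 B_0 > 0
\quad\Longleftrightarrow\quad
\frac{B_1}{B_0} > \frac{r_0}{r_1}
\qquad (B_0 > 0,\ r_1 > 0)
\]
% Hypothetical rates: r_0 = 0.14 and r_1 = 0.18
\[
\frac{r_0}{r_1} = \frac{0.14}{0.18} \approx 0.778
\]
```

Under these hypothetical rates, a customer would have to keep more than about 78% of the original balance for the increase to add interest revenue. Customers likely to fall below that share are the rate-sensitive ones the model is meant to find.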
By definition, the opposite of customer retention or loyalty is customer attrition or churn. Measuring attrition is easy. Defining an attritor is the challenging part. There are many factors to consider. For example, how many months do you want to consider? Or do you compare lost balances to a beginning balance in a given month or the average of several months? Do you take a straight percentage drop in balances? If so, is this meaningful for someone with a very low beginning balance? In other words, the definition should not just describe some arbitrary action that ensures a strong model. The definition should be actionable and meaningful to the business goals. See the accompanying sidebar for Shree Pragada's discussion on the significance of this definition.
Defining Attrition to Optimize Profits
Shree Pragada, Vice President of Customer Acquisition at Fleet Credit Card Bank, discusses the effect of the
definition of attritors on profitability.
The emphasis in model development is usually just on the model performance measures and not much on the model usage. In addition to building a statistically sound model, an analyst should focus on the business application of the model.
The following is an example from the financial services industry. A business manager requests a model to identify balance attritors. If the analyst were to build a model just to satisfy the request, he or she would define the objective to identify just balance attritors. But further inquiry into the application of the model reveals that the attrition probabilities will be applied to customers' account balances to estimate the level of balance at attrition risk, and eventually used in a customer profitability system for targeting a marketing promotion. The analyst would now change the objective to predicting balance attritors with the emphasis on attritors with significant account balances. As the financial impact of attrition is the final goal, such a change in the definition of the dependent variable will improve the effectiveness of the model in the business strategies.
The logical choice in this modeling exercise was to build a logistic model to predict the likelihood of attrition. The exercise also involved comparing several definitions of the dependent variable. For simplicity, we will focus on only two dependent variable definitions: one with the balance cutoff and the other without.
Definition: % Reprice Balance Attrition, the dependent variable, is the percent reduction in balance:
% Balance Attrition = 1 – Fraction of Pre-Event Balances Remaining
Business analysis reveals that most accounts tend to be unprofitable when more than 75% of the pre-reprice balances are paid off. Therefore, a binary variable is defined using this 75% balance attrition cutoff:
Dependent Variable: = 1 if % Balance Attrition GT 75%
                    = 0 otherwise
Fraction of balances left is defined as the Average of the Three-Month Balances Post-Event over the Average Annual Balance Pre-Event.
As the goal of this model is to predict the probability of "balance" attrition, the definition of the dependent variable has been altered to focus on the magnitude (or dollar amount) of balance attrition in addition to the likelihood of attrition. By doing this, customers with a high percent of attrition but with a marginal amount of balance attrition (dollar amount) will be treated as nonattritors. As a result, we can be more confident that we are modeling deliberate and significant balance attrition and not just swings in balance level that may not be related to reprice. The modified definition is:
Dependent Variable: = 1 if % Balance Attrition GT 75% and Dollar Amount GT $1,000
                    = 0 otherwise
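A short worked example may help show how the two definitions differ. The two accounts below are hypothetical, and the dollar amount is assumed here to be the pre-event average balance minus the post-event average balance; the sidebar does not spell that out.

```latex
% A: percent balance attrition as defined in the sidebar
% D: dollar amount of balance attrition (assumed: pre-event minus post-event average)
\[
A = 1 - \frac{\bar{B}_{\mathrm{post}}}{\bar{B}_{\mathrm{pre}}},
\qquad
D = \bar{B}_{\mathrm{pre}} - \bar{B}_{\mathrm{post}}
\]
% Hypothetical account X: low balance, large percentage drop
\[
\bar{B}_{\mathrm{pre}} = 400,\ \bar{B}_{\mathrm{post}} = 50:
\quad A = 1 - \tfrac{50}{400} = 0.875,\quad D = 350
\]
% Hypothetical account Y: large balance, large percentage drop
\[
\bar{B}_{\mathrm{pre}} = 5000,\ \bar{B}_{\mathrm{post}} = 1000:
\quad A = 1 - \tfrac{1000}{5000} = 0.80,\quad D = 4000
\]
```

Both accounts exceed the 75% cutoff, so the first definition flags both as attritors. Under the modified definition only account Y qualifies, because account X lost just $350, which is below the $1,000 threshold.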
Table 11.1 summarizes the model measurement statistics and percent of attritors in the top 10 of 20 segments of the Cumulative Gains tables:
Table 11.1 Comparison of Dependent Variable Definition–Minimum Percentage
Columns: MODEL | DESCRIPTION | # / % OF ACCOUNTS CATEGORIZED AS ATTRITORS OF THE TOTAL SAMPLE OF 53,877 A/CS
To the surprise of the Implementation/Targeting groups, Model 1 was recommended despite its lesser strength in identifying attritors. Because the model is used to understand the financial impact as a result of attrition, the dependent variable in Model 1 was changed to focus on attritors with significant account balances (over $1,000 in this case). This lowered the ability of the model to identify the likelihood of attrition, but it significantly improved the rank ordering of attritors with significant account balances, as evident in Table 11.2.
Table 11.2 Comparison of Dependent Variable Definition–Minimum
Columns: MODEL | DESCRIPTION | # / % OF ACCOUNTS CATEGORIZED AS ATTRITORS OF THE TOTAL SAMPLE OF 53,877 A/CS | RANK OF MAX | IN THE TOP 50% OF THE POP | % OF TOTAL LOST DOLLARS SEPARATED IN THE TOP 50%
The data for modeling was randomly selected from the high-risk section of Rowan Royal Bank's customer portfolio. The attrition rate is almost 24%, so further sampling wasn't necessary. Prior work with attrition modeling had narrowed the field of eligible variables. In fact, a couple of the variables are actually scores from other models. Figure 11.1 shows the list of variables. Note: The term FRUT is commonly used in the credit card industry and stands for Financial Revolving Unsecured Trade.
I am defining an attritor using the definition developed by Shree Pragada in the above sidebar. The variable pre3moav equals the average balance for the three months prior to the rate increase. The variable pst3moav equals the average balance for the three-month period beginning the fourth month following the rate increase.
Figure 11.1 List of variables
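data ch11.rowan; /* assumed: the data step header and the dollattr line are not shown in the source */
 set ch11.rowan;
 dollattr = pre3moav - pst3moav; /* assumed definition: dollars of balance lost */
 if dollattr > 1000 and dollattr/pre3moav > 0.75 then attrite = 1; /* ratio is a fraction, so the 75% cutoff is 0.75 */
 else attrite = 0;
run;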
Preparing the Variables
This turns out to be one of the easiest recipes because I have relatively few variables. I begin by looking at the continuous variables using a program similar to the one I used in chapter 10. I'll follow up with the categorical variables using standard frequencies.
Continuous Variables
I begin with PROC MEANS to see if the continuous variables have missing or extreme values (outliers). Figure 11.2 displays the output.
proc means data=ch11.rowan maxdec=2;
run;
Figure 11.2 Means on continuous variables
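As a minimal sketch of the same check, the step below asks PROC MEANS explicitly for the count of missing values alongside the minimum and maximum. The data set work.meansdemo and its values are made up for illustration; only the variable names are borrowed from the chapter.

```sas
/* Hypothetical sample data: one missing value and one extreme value */
data work.meansdemo;
   input hibal12m numnewac;
   datalines;
1200 1
. 0
98000 3
2500 2
;
run;

/* N and NMISS show how many values are present or missing;
   MIN and MAX make extreme values easy to spot */
proc means data=work.meansdemo n nmiss min max mean maxdec=2;
   var hibal12m numnewac;
run;
```

In this toy data, hibal12m would show one missing value and a maximum far above the other balances. For the Rowan Royal Bank data itself, the relevant output is the one in Figure 11.2.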
All the variables seem to be within range with no missing values. The next step is to look for segmentation opportunities and find the best form of each continuous variable. The following macro is a slight variation on the macro in chapter 10. It processes all continuous variables (listed at the bottom of the macro). The var parameter takes the full variable name, and svar takes a three-letter nickname for the variable that is used to create prefixes throughout the program. The %INCLUDE command accesses the program that creates the transformations and runs the logistic regression to determine the best final variable forms. That program is named transf:
%macro cont (var, svar);
title "Evaluation of &var";
proc univariate data=ch11.rowan noprint;
var &var;
output out=ch11.&svar.data pctlpts= 10 20 30 40 50 60 70 80 90 99 100
pctlpre=&svar;
run;
<<CODE IN THIS SECTION IS SIMILAR TO TRANSFORMATION CODE IN CHAPTER 10>>
proc freq data=&svar.dset;
Figure 11.3 shows the decile segmentation for the variable avbalfru. Notice how the attrition rate flattens out in deciles 3 through 5. This is a likely place for a binary split to create a segmentation variable. In Figure 11.4, notice how the variable avb_20 was selected as the second best-fitting transformation.
Figure 11.3 Decile segmentation for average balance on FRUTs
Table 11.3 lists all the continuous variables and their top two transformations. These will be used in the final model processing.
Categorical Variables
The following PROC FREQ step calculates the attrition rate for every level of each categorical variable:
proc freq data=ch11.rowan;
table attrite*(age_lt25 autoloan child donate gender hh_ind
homeown mailord marital mortgage popdens region retired sgle_fam
somecoll ssn_ind)/ chisq missing;
run;
Figure 11.5 displays the output for population density (popdens). The attrition rates are very different for each level, so I will create indicator variables for three of the four levels and allow the fourth level to be the default. I repeat this process for every categorical variable. The following code transforms the categorical variables into numeric form for use in the model:
Figure 11.4 Logistic output for average balance on FRUTs
if gender = 'I' then gender = 'U';
west = (region in ('west', ' '));
married = (marital = 'M');
single = (marital = 'S');
unkngend = (gender = 'U');
if age_lt25 = 'Y' then age_d = 1;
Table 11.3 List of Continuous Variable Transformations
VARIABLE   DESCRIPTION                            TRANSFORMATION 1   TRANSFORMATION 2
numnewac   Number of New Accts in Last 6 Months   num_sq             num_exp
hibal12m   High Balance in Last 12 Months         hib_20             hib_sqrt
nccblgt0   Number of Cards with Bal > $0          ncc_90             ncc_curi
pacctobl   Percent of Accts with $0 Balance       pac_90             pac_sini
pbalshar   Percent of Total Balances with Us      pba_10             pba_cu
if hh_ind = 'H' then hhind_d = 1;
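The indicator-variable approach described for popdens (three indicators, with the fourth level as the default) can be sketched as follows. The level codes 'A' through 'D', the data set names, and the indicator names popdnsA, popdnsB, and popdnsC are hypothetical; the chapter text does not list the actual popdens values.

```sas
/* Hypothetical sample data: the popdens level codes are assumptions */
data work.popdemo;
   input popdens $;
   datalines;
A
B
C
D
;
run;

data work.popdemo2;
   set work.popdemo;
   /* A comparison in parentheses evaluates to 1 when true and 0 when false */
   popdnsA = (popdens = 'A');
   popdnsB = (popdens = 'B');
   popdnsC = (popdens = 'C');
   /* Level 'D' is the default level: all three indicators are 0 */
run;

proc print data=work.popdemo2;
run;
```

Leaving one level without its own indicator avoids a redundant variable in the logistic regression, since that level is already identified when the other three indicators are all zero.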