Lecturer: VAN CHAU
CHAPTER 7 & 11
Decision Analytic Thinking
Data Science for Business
Chapter 7
01 Evaluating Classifiers
02 Plain Accuracy and Its Problems
03 The Confusion Matrix
04 Problems with Unbalanced Classes
05 Problems with Unequal Costs and Benefits
06 A Key Analytical Framework: Expected Value
07 Using Expected Value to Frame Classifier Use
08 Using Expected Value to Frame Classifier Evaluation
09 Evaluation, Baseline Performance, and Implications for Investments in Data
1 Evaluating Classifiers
1. Binary classification, for which the classes often are simply called "positive" and "negative."
2. How shall we evaluate how well such a model performs?
3. In Chapter 5 we discussed how for evaluation we should use a holdout test set to assess the generalization performance of the model. But how should we measure generalization performance?
4. Let's look at the basic confusion matrix first.
1 Evaluating Classifiers (cont.)
24 Evaluation Metrics for Binary Classification (And When to Use Them)
1 Confusion Matrix
2 False positive rate | Type-I error
3 False negative rate | Type-II error
4 True negative rate | Specificity
5 Negative predictive value
6 False discovery rate
7 True positive rate | Recall | Sensitivity
8 Positive predictive value | Precision
…
21 Cumulative gain chart
22 Lift curve | Lift chart
23 Kolmogorov-Smirnov plot
24 Kolmogorov-Smirnov statistic
2 Plain Accuracy and Its Problems
1. Up to this point we have assumed that some simple metric, such as classifier error rate or accuracy, was being used to measure a model's performance.
Accuracy = (Number of correct decisions) / (Total number of decisions) = (TP + TN) / (TP + TN + FP + FN)
Error rate = 1 − Accuracy
2. Accuracy is a common evaluation metric that is often used in data mining studies because it reduces classifier performance to a single number and it is very easy to measure.
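As a minimal sketch in Python (the cell counts below are invented for illustration), both metrics follow directly from the four confusion-matrix cells:

```python
# Minimal sketch: accuracy and error rate from confusion-matrix cell counts.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    return 1 - accuracy(tp, tn, fp, fn)

# Illustrative counts (assumed, not from the text)
print(accuracy(tp=50, tn=40, fp=5, fn=5))    # 0.9
print(error_rate(tp=50, tn=40, fp=5, fn=5))  # 0.1
```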
2 Plain Accuracy and Its Problems (cont.)
Let's try calculating accuracy for the following model that classified 100 tumors as
malignant (the positive class) or benign (the negative class):
Accuracy =0.91
Accuracy comes out to 0.91, or 91% (91 correct predictions out of 100 total examples)
That means our tumor classifier is doing a great job of identifying malignancies, right?
Of the 91 benign tumors, the model correctly identifies 90 as benign That's good However, of the 9
malignant tumors, the model only correctly identifies 1 as malignant—a terrible outcome, as 8 out of
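The same arithmetic in a short Python sketch (the cell counts TP = 1, TN = 90, FP = 1, FN = 8 are read off the example above) shows how high accuracy can hide a terrible recall:

```python
tp, tn, fp, fn = 1, 90, 1, 8           # the 100-tumor example above

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)                # fraction of malignancies found

print(accuracy)  # 0.91 -- looks great
print(recall)    # ~0.11 -- 8 of 9 malignancies missed
```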
3 The Confusion Matrix (CM)
1. A confusion matrix for a problem involving n classes is an n × n matrix with the columns labeled with actual classes and the rows labeled with predicted classes.
2. A confusion matrix separates out the decisions made by the classifier, making explicit how one class is being confused for another. In this way different sorts of errors may be dealt with separately. Here is the 2 × 2 confusion matrix:

                Actual: p          Actual: n
Predicted: Y    True positives     False positives
Predicted: N    False negatives    True negatives
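As an illustration, here is a minimal sketch using scikit-learn (an assumed dependency; the label vectors are invented). Note that scikit-learn's convention is the transpose of the one above: it puts actual classes on the rows and predicted classes on the columns:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes (1 = positive)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # classifier decisions

# With labels=[1, 0]: row 0 = actual positive, column 0 = predicted positive
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
(tp, fn), (fp, tn) = cm
print(cm)   # [[3 1]
            #  [1 3]]
```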
3 The Confusion Matrix (cont.)
1 In the table on the right side, there are 4 terms we need to pay attention to:
True Positive (TP): patients who are presumed to have an illness are indeed carriers.
True Negative (TN): patients who are presumed to have no disease are truly healthy.
False Positive (FP): patients who are presumed to have an illness are in fact healthy.
False Negative (FN): patients who are presumed to have no illness are actually carriers.
FP and FN are also known as Type I error and Type II error, respectively.
3 The Confusion Matrix (cont.)
2 With the CM, we will calculate two important quantities, Precision and Recall.
Precision: the ratio of people who actually have the disease to all cases predicted positive. In other words, how many of the model's "positive" predictions are actually true? Precision = TP / (TP + FP).
Recall (also called Sensitivity): of those who actually have the disease, how many does our model correctly predict? Recall = TP / (TP + FN).
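A minimal sketch of both formulas in plain Python:

```python
def precision(tp, fp):
    # Of everything the model flagged positive, how much is truly positive?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of everything truly positive, how much did the model catch?
    return tp / (tp + fn)
```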
3 The Confusion Matrix (cont.)
Note:
True / False indicates whether what we predicted matched reality; Positive / Negative indicates the predicted class.
Type I Error (False Positive) and Type II Error (False Negative)
(Image sources: medium.com; Effect Size FAQs by Paul Ellis)
3 The Confusion Matrix (cont.)
For example, we predict the test results of 1,000 patients to see if they are pregnant or not. Here's what our model predicts:
The model predicts 90 patients as pregnant, and all of those predictions are correct.
The model predicts 910 patients as not pregnant, but 900 of them actually are pregnant.

                 Actual (Yes)           Actual (No)
Predicted (Yes)  90 (True Positive)     0 (False Positive)
Predicted (No)   900 (False Negative)   10 (True Negative)
3 The Confusion Matrix (cont.)
Precision high / Recall low:

                 Actual (Yes)           Actual (No)
Predicted (Yes)  90 (True Positive)     0 (False Positive)    Precision = 90/90 = 100%
Predicted (No)   900 (False Negative)   10 (True Negative)    Recall = 90/990 ≈ 9%
3 The Confusion Matrix (cont.)
Precision low / Recall high:

                 Actual (Yes)           Actual (No)
Predicted (Yes)  90 (True Positive)     900 (False Positive)  Precision = 90/990 ≈ 9%
Predicted (No)   10 (False Negative)    10 (True Negative)    Recall = 90/100 = 90%
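Checking both examples in a short Python sketch (numbers taken directly from the two tables above):

```python
def precision(tp, fp): return tp / (tp + fp)
def recall(tp, fn):    return tp / (tp + fn)

# First model: only flags the surest cases
print(precision(tp=90, fp=0),   recall(tp=90, fn=900))  # 1.0, ~0.09
# Second model: flags almost everyone
print(precision(tp=90, fp=900), recall(tp=90, fn=10))   # ~0.09, 0.9
```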
4 Problems with Unbalanced Classes
1 As an example of how we need to think carefully about model evaluation, consider a classification problem where one class is rare.
2 This is a common situation in applications, because classifiers often are used to sift through a large population of normal or uninteresting entities in order to find a relatively small number of unusual ones; for example, looking for defrauded customers, checking an assembly line for defective parts, or targeting consumers who actually would respond to an offer.
3 Because the unusual or interesting class is rare among the general population, the class distribution is unbalanced or skewed.
4 Problems with Unbalanced Classes (cont.)
4. Unfortunately, as the class distribution becomes more skewed, evaluation based on accuracy breaks down.
5. Consider a domain where the classes appear in a 999:1 ratio. A simple rule - always choose the most prevalent class - gives 99.9% accuracy.
6. Skews of 1:100 are common in fraud detection, and skews greater than 1:10⁶ have been reported in other classifier learning applications.
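A quick sanity check of point 5 in Python (the 999:1 split comes from the text; the "classifier" simply always predicts the majority class):

```python
# A trivial majority-class "classifier" on 999:1 data
n_negative, n_positive = 999, 1

correct = n_negative             # every negative is classified correctly
total = n_negative + n_positive
print(correct / total)           # 0.999 -- yet zero positives are found
```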
4 Problems with Unbalanced Classes (cont.)
7 Consider the MegaTelCo cellular-churn example: prediction model results. Me: 80% accuracy; coworker: 37% accuracy.
8 We need more info about the data.
9 We need to know the proportion of churn in the population we are considering.
10 Suppose the baseline churn rate is approximately 10% per month. So if we simply classify everyone as negative (not churn), we could achieve a base rate accuracy of 90%!
11 Digging deeper, I discover my coworker and I evaluated on two different datasets. My coworker calculated the accuracy on a representative sample from the population, whereas I created artificially balanced datasets for training and testing.
5 Problems with Unequal Costs & Benefits
1 Another problem with simple classification accuracy as a metric is that it makes no distinction between false positive and false negative errors.
2 These are typically very different kinds of errors with very different costs, because the classifications have consequences of differing severity.
3 Medical diagnosis domain, cancer example:
False Positive (FP): a healthy patient is incorrectly diagnosed with cancer, causing needless worry and further tests.
False Negative (FN): a patient who actually has cancer is told she is healthy, delaying treatment, which can be far more costly.
5 Problems with Unequal Costs & Benefits (cont.)
4 Whatever costs you might decide for each, it is unlikely they
would be equal; and the errors should be counted separately regardless.
5 Ideally, we should estimate the cost or benefit of each decision
a classifier can make.
6 Once aggregated, these will produce an expected profit (or
expected benefit or expected cost) estimate for the classifier.
6 A Key Analytical Framework: Expected Value
1. The expected value computation provides a framework that is extremely useful in organizing thinking about data-analytic problems.
2. It decomposes data-analytic thinking into:
(i) the structure of the problem,
(ii) the elements of the analysis that can be extracted from the data, and
(iii) the elements of the analysis that need to be acquired from other sources.
3. The general form of an expected value calculation:
EV = p(x₁) · v(x₁) + p(x₂) · v(x₂) + p(x₃) · v(x₃) + …
Equation 7-1. The general form of an expected value calculation
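A minimal Python sketch of Equation 7-1 (the function name and interface are my own):

```python
def expected_value(outcomes):
    """outcomes: iterable of (probability, value) pairs whose probabilities sum to 1."""
    return sum(p * v for p, v in outcomes)

print(expected_value([(0.5, 10), (0.5, -2)]))  # 4.0
```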
6 A Key Analytical Framework: Expected Value (cont.)
Each xᵢ is a possible decision outcome; p(xᵢ) is its probability and v(xᵢ) is its value. The probabilities often can be estimated from the data (ii), but the business values often need to be acquired from other sources (iii). As we will see in Chapter 11, data-driven modeling may help estimate business values, but usually the values must come from external domain knowledge.
6 A Key Analytical Framework: Expected Value (cont.)
Example of Expected Value (Multiple Events)
You are a financial analyst in a development company. Your manager just asked you to assess the viability of future development projects and select the most promising one. According to estimates:
Project A, upon completion, shows a probability of 0.4 to achieve a value of $2 million and a probability of 0.6 to achieve a value of $500,000.
Project B shows a probability of 0.3 to be valued at $3 million and a probability of 0.7 to be valued at $200,000 upon completion.
6 A Key Analytical Framework: Expected Value (cont.)
Example of Expected Value (Multiple Events)
In order to select the right project, you need to calculate the expected value of each project and compare them. The EV can be calculated in the following way:
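Plugging in the estimates above:
EV(Project A) = 0.4 · $2,000,000 + 0.6 · $500,000 = $800,000 + $300,000 = $1,100,000
EV(Project B) = 0.3 · $3,000,000 + 0.7 · $200,000 = $900,000 + $140,000 = $1,040,000
Project A has the higher expected value, so it is the more promising project.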
7 Using Expected Value to Frame Classifier Use
1 In targeted marketing, for example, we may want to assign each consumer a class of likely responder versus not likely responder; then we could target the likely responders.
2 Unfortunately, for targeted marketing the probability of response for any individual consumer is often very low, so no consumer may seem like a likely responder.
3 However, with the expected value framework we can see the crux of the problem.
4 Consider that we have an offer for a product that, for simplicity, is only available via this offer. If the offer is not made to a consumer, the consumer will not buy the product.
7 Using Expected Value to Frame Classifier Use (cont.)
5 To be concrete, let's say that a consumer buys the product for $200 and our product-related costs are $100. To target the consumer with the offer, we also incur a cost. Let's say that we mail some flashy marketing materials, and the overall cost including postage is $1, yielding a value (profit) of vR = $99 if the consumer responds (buys the product).
6 Now, what about vNR, the value to us if the consumer does not respond? We still mailed the marketing materials, incurring a cost of $1, or equivalently a benefit of vNR = −$1.
7 Using Expected Value to Frame Classifier Use (cont.)
Expected benefit of targeting = pR(x) · vR + [1 − pR(x)] · vNR
7 Using Expected Value to Frame Classifier Use (cont.)
We should target consumer x as long as the expected benefit of targeting is positive:
pR(x) · $99 − [1 − pR(x)] · $1 > 0
A little rearranging of the equation gives us a decision rule: target a given consumer x only if:
pR(x) · $99 > [1 − pR(x)] · $1
pR(x) > 0.01
With these example values, we should target the consumer as long as the estimated probability of responding is greater than 1%.
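The decision rule as a minimal Python sketch (the defaults carry the example's vR = $99 and vNR = −$1; the function name is my own):

```python
def should_target(p_response, v_r=99.0, v_nr=-1.0):
    """Target consumer x iff the expected benefit of targeting is positive."""
    return p_response * v_r + (1 - p_response) * v_nr > 0

print(should_target(0.005))  # False -- below the 1% break-even probability
print(should_target(0.02))   # True
```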
8 Using Expected Value to Frame Classifier Evaluation
1 We need to evaluate the set of decisions made by a model when applied to a set of examples. Such an evaluation is necessary in order to compare one model to another.
2 It is likely that each model will make some decisions better than the other model. What we care about is, in aggregate, how well does each model do: what is its expected value?
3 We can use the expected value framework just described to determine the best decisions for each particular model, and then use the expected value in a different way to compare the models.
8 Using Expected Value to Frame Classifier Evaluation (cont.)
A consumer who is predicted to churn but actually doesn't churn? This corresponds to the cell (Y, n).
8 Using Expected Value to Frame Classifier Evaluation (cont.)
Costs and benefits
8 Using Expected Value to Frame Classifier Evaluation (cont.)
b(predicted, actual)
A true positive is a consumer who is offered the product and buys it. The benefit in this case is the profit from the revenue ($200) minus the product-related costs ($100) and the mailing costs ($1), so b(Y, p) = 99.
A false positive occurs when we classify a consumer as a likely responder and therefore target her, but she does not respond. We've said that the cost of preparing and mailing the marketing materials is a fixed cost of $1 per consumer. The benefit in this case is negative: b(Y, n) = −1.
A false negative is a consumer who was predicted not to be a likely responder (so was not offered the product), but who would have bought it if offered. In this case, no money was spent and nothing was gained, so b(N, p) = 0.
A true negative is a consumer who was not offered a deal and who would not have bought it even if it had been offered. The benefit in this case is zero (no profit but no cost), so b(N, n) = 0.
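The four cells as a small Python lookup table (a sketch mirroring the b(predicted, actual) notation above):

```python
# Benefit matrix b(predicted, actual) from the targeted-marketing example
benefit = {
    ("Y", "p"): 99,   # true positive: $200 revenue - $100 product costs - $1 mailing
    ("Y", "n"): -1,   # false positive: $1 mailing wasted
    ("N", "p"):  0,   # false negative: nothing spent, nothing gained
    ("N", "n"):  0,   # true negative: no cost, no profit
}
print(benefit[("Y", "p")])  # 99
```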
8 Using Expected Value to Frame Classifier Evaluation (cont.)
1. A common way of expressing expected profit is to factor out the probabilities of seeing each class, often referred to as the class priors.
2. The class priors, p(p) and p(n), specify the likelihood of seeing positive and negative instances, respectively.
3. Factoring these out allows us to separate the influence of class imbalance from the fundamental predictive power of the model.
Alternative calculation:
8 Using Expected Value to Frame Classifier Evaluation (cont.)
A rule of basic probability is: p(x, y) = p(y) · p(x | y).
Each of these is weighted by the probability that we see that sort of example. So, if positive examples are very rare, their contribution to the overall expected profit will be correspondingly small.
Equation 7-2. Expected profit equation with priors p(p) and p(n) factored:
Expected profit = p(p) · [p(Y|p) · b(Y, p) + p(N|p) · b(N, p)] + p(n) · [p(N|n) · b(N, n) + p(Y|n) · b(Y, n)]
8 Using Expected Value to Frame Classifier Evaluation (cont.)
Table 7-5. Our sample confusion matrix (raw counts)
Returning to the targeted marketing example, what is the expected profit of the model learned? We can calculate it using Equation 7-2:
Expected profit = 0.55 · [0.92 · b(Y, p) + 0.08 · b(N, p)] + 0.45 · [0.86 · b(N, n) + 0.14 · b(Y, n)]
= 0.55 · [0.92 · 99 + 0.08 · 0] + 0.45 · [0.86 · 0 + 0.14 · (−1)]
= 0.55 · 91.08 − 0.45 · 0.14
≈ $50.03 per consumer
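Equation 7-2 as a minimal Python sketch (the rates and benefit values are the ones quoted above; the function and parameter names are my own):

```python
def expected_profit(p_pos, p_y_given_p, p_n_given_n,
                    b_yp=99.0, b_np=0.0, b_nn=0.0, b_yn=-1.0):
    """Equation 7-2: expected profit with the class priors factored out."""
    p_neg = 1 - p_pos
    return (p_pos * (p_y_given_p * b_yp + (1 - p_y_given_p) * b_np)
            + p_neg * (p_n_given_n * b_nn + (1 - p_n_given_n) * b_yn))

print(expected_profit(p_pos=0.55, p_y_given_p=0.92, p_n_given_n=0.86))  # ~50.03
```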