Data Science for Business (chapter7&11)v1


Page 1

CHAPTER 7 & 11

Decision-Analytic Thinking

Data Science for Business

Lecturer: VAN CHAU

Page 2

Chapter 7

01 Evaluating Classifiers
02 Plain Accuracy and Its Problems
03 The Confusion Matrix
04 Problems with Unbalanced Classes
05 Problems with Unequal Costs and Benefits
06 A Key Analytical Framework: Expected Value
07 Using Expected Value to Frame Classifier Use
08 Using Expected Value to Frame Classifier Evaluation
09 Evaluation, Baseline Performance, and Implications for Investments in Data

Page 3

1 Evaluating Classifiers

1. Binary classification, for which the classes often are simply called "positive" and "negative."

2. How shall we evaluate how well such a model performs?

3. In Chapter 5 we discussed how, for evaluation, we should use a holdout test set to assess the generalization performance of the model. But how should we measure generalization performance?

4. Let's look at the basic confusion matrix first.

Page 4

1 Evaluating Classifiers (cont.)

24 Evaluation Metrics for Binary Classification (And When to Use Them)

1 Confusion Matrix

2 False positive rate | Type-I error

3 False negative rate | Type-II error

4 True negative rate | Specificity

5 Negative predictive value

6 False discovery rate

7 True positive rate | Recall | Sensitivity

8 Positive predictive value | Precision

21 Cumulative gain chart

22 Lift curve | Lift chart

23 Kolmogorov-Smirnov plot

24 Kolmogorov-Smirnov statistic

https://neptune.ai/blog/evaluation-metrics-binary-classification

Page 5

2 Plain Accuracy and Its Problems

1. Up to this point we have assumed that some simple metric, such as classifier error rate or accuracy, was being used to measure a model's performance.

Accuracy = Number of correct decisions made / Total number of decisions made

Error rate = 1 - Accuracy

2. Accuracy is a common evaluation metric that is often used in data mining studies because it reduces classifier performance to a single number and it is very easy to measure.

Page 6

2 Plain Accuracy and Its Problems(cont.)

Let's try calculating accuracy for the following model that classified 100 tumors as

malignant (the positive class) or benign (the negative class):
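The slide's per-tumor figures are not reproduced here, so the sketch below uses assumed confusion-matrix counts (1 TP, 90 TN, 1 FP, 8 FN out of 100 tumors) purely for illustration:

```python
# Accuracy = correct decisions / total decisions, computed from
# confusion-matrix counts. The counts below are assumed for illustration,
# not the slide's actual tumor figures.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# 100 tumors: 1 true positive, 90 true negatives, 1 false positive, 8 false negatives
acc = accuracy(tp=1, tn=90, fp=1, fn=8)
print(acc)                # 0.91
print(round(1 - acc, 2))  # error rate: 0.09
```

Note that 0.91 looks respectable even though the model finds only 1 of the 9 malignant tumors, which is exactly the problem the next sections address.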

Page 7

3 The Confusion Matrix (CM)

1. A confusion matrix for a problem involving n classes is an n × n matrix with the columns labeled with actual classes and the rows labeled with predicted classes.

2. A confusion matrix separates out the decisions made by the classifier, making explicit how one class is being confused for another. In this way, different sorts of errors may be dealt with separately. Here is the 2 × 2 confusion matrix.
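A minimal sketch of building such a matrix, following the slide's convention (rows = predicted class, columns = actual class); the labels and sample vectors are hypothetical:

```python
from collections import Counter

def confusion_matrix(y_pred, y_true, labels=("Y", "N")):
    """2 x 2 confusion matrix: rows = predicted class, columns = actual class."""
    counts = Counter(zip(y_pred, y_true))
    return [[counts[(p, a)] for a in labels] for p in labels]

# Hypothetical predictions vs. ground truth
y_true = ["Y", "Y", "N", "N", "N", "Y"]
y_pred = ["Y", "N", "N", "Y", "N", "Y"]
cm = confusion_matrix(y_pred, y_true)
# cm[0][0]=TP, cm[0][1]=FP, cm[1][0]=FN, cm[1][1]=TN
print(cm)  # [[2, 1], [1, 2]]
```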

Page 8

3 The Confusion Matrix (cont.)

1 In the table on the right side, there are 4 terms we need to pay attention to:

True Positive (TP): people predicted to have an illness are indeed carriers.
True Negative (TN): people presumed to have no disease are truly healthy.
False Positive (FP): people predicted to have an illness are in fact healthy.
False Negative (FN): people presumed to have no illness are actually sick.

FP and FN are also known as Type I and Type II errors, respectively.

Page 9

3 The Confusion Matrix (cont.)

2 With the CM, we will calculate two important quantities, Precision and Recall.

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

Precision: the ratio of people who actually have the disease to all predicted positive cases. In other words, how many positive predictions are actually true?

Recall (also called Sensitivity): of those who actually have the disease, how many of them are correctly predicted by our model? In other words, how many of the actual positives does our model find?
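The two ratios above can be sketched directly from confusion-matrix counts; the counts here are hypothetical, chosen only to exercise the formulas:

```python
def precision(tp, fp):
    # Of all positive predictions, how many were truly positive?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all actual positives, how many did the model find?
    return tp / (tp + fn)

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=4))     # ~0.667
```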

Page 10

3 The Confusion Matrix (cont.)

Note:

True / False: indicates whether what we predicted is correct or not.

Positive / Negative: indicates what we predicted (yes or no).

Source of image: Effect Size FAQs by Paul Ellis, via medium.com
Type I Error (False Positive) and Type II Error (False Negative)

Page 11

3 The Confusion Matrix (cont.)

For example, predict the test results of 1000 patients to see if they are pregnant or not. Here is what our model predicts:

- 90 patients are predicted pregnant, and all of these predictions are correct.
- 910 patients are predicted not pregnant, but in fact 900 of them are pregnant.

                 Actual (Yes)           Actual (No)
Predicted (Yes)  90 (True Positive)     0 (False Positive)
Predicted (No)   900 (False Negative)   10 (True Negative)

Precision = 90 / (90 + 0) = 100%
Recall = 90 / (90 + 900) ≈ 9%
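Checking the slide's figures in code makes the trade-off concrete: precision is perfect, yet recall (and even plain accuracy) is terrible.

```python
# Counts from the slide's table: 90 TP, 0 FP, 900 FN, 10 TN (1000 patients)
tp, fp, fn, tn = 90, 0, 900, 10
precision = tp / (tp + fp)                    # 90 / 90  = 1.0  (100%)
recall = tp / (tp + fn)                       # 90 / 990 ≈ 0.09 (9%)
accuracy = (tp + tn) / (tp + fp + fn + tn)    # 100 / 1000 = 0.1
print(precision, round(recall, 2), accuracy)  # 1.0 0.09 0.1
```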

Page 12


3 The Confusion Matrix (cont.)

Precision low / Recall high:

                 Actual (Yes)          Actual (No)
Predicted (Yes)  90 (True Positive)    900 (False Positive)
Predicted (No)   10 (False Negative)   10 (True Negative)

Precision = 90 / (90 + 900) = 9%
Recall = 90 / (90 + 10) = 90%

Page 14

4 Problems with Unbalanced Classes

1 As an example of how we need to think carefully about model evaluation, consider a classification problem where one class is rare.

2 Classifiers are often used to sift through a large population of normal or uninteresting entities in order to find a relatively small number of unusual ones: for example, looking for defrauded customers, checking an assembly line for defective parts, or targeting consumers who actually would respond to an offer.

3 Because the unusual or interesting class is rare in the population, the class distribution is unbalanced or skewed.

Page 15

4 Problems with Unbalanced Classes (cont.)

4. Unfortunately, as the class distribution becomes more skewed, evaluation based on accuracy breaks down.

5. Consider a domain where the classes appear in a 999:1 ratio. A simple rule - always choose the most prevalent class - gives 99.9% accuracy.

6. Skews of 1:100 are common in fraud detection, and skews greater than 1:10⁶ have been reported in other classifier learning applications.

Page 16

4 Problems with Unbalanced Classes (cont.)

7 Consider the MegaTelCo cellular-churn example: prediction model results. Me: 80% accuracy; coworker: 37% accuracy.

8 We need more info about the data.

9 We need to know the proportion of churn in the population we are considering.

10 Suppose the baseline churn rate is approximately 10% per month. So if we simply classify everyone as negative (not churn), we could achieve a base rate accuracy of 90%!

11 Digging deeper, I discover my coworker and I evaluated on two different datasets. My coworker calculated the accuracy on a representative sample from the population, whereas I created artificially balanced datasets for training and testing.

Page 17
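The 90% base-rate claim from the churn example is easy to verify: a trivial classifier that predicts "not churn" for everyone is right exactly as often as the majority class occurs.

```python
# Baseline: with a 10% monthly churn rate, classifying everyone as
# "not churn" is correct for every non-churner and wrong for every churner.
population = 1000
churners = int(0.10 * population)    # 10% base rate -> 100 churners
correct = population - churners      # the trivial classifier gets these right
baseline_accuracy = correct / population
print(baseline_accuracy)             # 0.9 -> 90% accuracy with zero skill
```

Any model evaluated on this population must beat 90% accuracy before it demonstrates any predictive power at all.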


5 Problems with Unequal Costs & Benefits

1 Another problem with simple classification accuracy as a metric is that it makes no distinction between false positive and false negative errors.

2 These are typically very different kinds of errors with very different costs, because the classifications have consequences.

Page 19

5 Problems with Unequal Costs & Benefits (cont.)

4 Whatever costs you might decide for each, it is unlikely they

would be equal; and the errors should be counted separately regardless.

5 Ideally, we should estimate the cost or benefit of each decision

a classifier can make.

6 Once aggregated, these will produce an expected profit (or

expected benefit or expected cost) estimate for the classifier.

Page 20

6 A Key Analytical Framework: Expected Value

1. The expected value computation provides a framework that is extremely useful in organizing thinking about data-analytic problems.

2. It decomposes data-analytic thinking into:

 (i) the structure of the problem,
 (ii) the elements of the analysis that can be extracted from the data, and
 (iii) the elements of the analysis that need to be acquired from other sources.

3. The general form of an expected value calculation:

EV = p(x1) · v(x1) + p(x2) · v(x2) + p(x3) · v(x3) + ...

Equation 7-1. The general form of an expected value calculation

Page 21

6 A Key Analytical Framework: Expected Value (cont.)

Each x_i is a possible decision outcome; p(x_i) is its probability and v(x_i) is its value. The probabilities often can be estimated from the data, but the business values often need to be acquired from other sources. As we will see in Chapter 11, data-driven modeling may help estimate business values, but usually the values must come from external domain knowledge.

Page 22

6 A Key Analytical Framework: Expected Value (cont.)


Example of Expected Value (Multiple Events)

You are a financial analyst in a development company. Your manager just asked you to assess the viability of future development projects and select the most promising one. According to estimates:

Project A, upon completion, shows a probability of 0.4 to achieve a value of $2 million and a probability of 0.6 to achieve a value of $500,000.

Project B shows a probability of 0.3 to be valued at $3 million and a probability of 0.7 to be valued at $200,000 upon completion.

Page 23

6 A Key Analytical Framework: Expected Value (cont.)

Example of Expected Value (Multiple Events)

In order to select the right project, you need to calculate the expected value of each project and compare the values with each other. The EV can be calculated in the following way:
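Applying Equation 7-1 to the two projects described above:

```python
def expected_value(outcomes):
    # outcomes: (probability, value) pairs for one project; probabilities sum to 1
    return sum(p * v for p, v in outcomes)

ev_a = expected_value([(0.4, 2_000_000), (0.6, 500_000)])  # ~ $1,100,000
ev_b = expected_value([(0.3, 3_000_000), (0.7, 200_000)])  # ~ $1,040,000
print(ev_a > ev_b)  # True -> Project A has the higher expected value
```

Even though Project B's best case ($3 million) is larger, weighting each outcome by its probability shows Project A is the better bet in expectation.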

Page 24

7 Using Expected Value to Frame Classifier Use

1 In targeted marketing, for example, we may want to assign each consumer a class of likely responder versus not likely responder; then we could target the likely responders.

2 Unfortunately, for targeted marketing the probability of response for any individual consumer is often very low, so no consumer may seem like a likely responder.

3 However, with the expected value framework we can see the crux of the problem.

4 Consider that we have an offer for a product that, for simplicity, is only available via this offer. If the offer is not made to a consumer, the consumer will not buy the product.

Page 25

7 Using Expected Value to Frame Classifier Use (cont.)

5 To be concrete, let's say that a consumer buys the product for $200 and our product-related costs are $100. To target the consumer with the offer, we also incur a cost. Let's say that we mail some flashy marketing materials, and the overall cost including postage is $1, yielding a value (profit) of vR = $99 if the consumer responds (buys the product).

6 Now, what about vNR, the value to us if the consumer does not respond? We still mailed the marketing materials, incurring a cost of $1, or equivalently a benefit of -$1.

Page 26


7 Using Expected Value to Frame Classifier Use (cont.)

The expected benefit of targeting consumer x is positive when:

pR(x) · $99 - [1 - pR(x)] · $1 > 0

A little rearranging of the equation gives us a decision rule. Target a given customer x only if:

pR(x) · $99 > [1 - pR(x)] · $1

which simplifies to pR(x) > 0.01.

With these example values, we should target the consumer as long as the estimated probability of responding is greater than 1%.
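The decision rule can be sketched directly from the expected-value condition, using the vR = $99 and vNR = -$1 values from the slides:

```python
def should_target(p_response, v_r=99.0, v_nr=-1.0):
    # Expected value of targeting: p * vR + (1 - p) * vNR; target if positive
    return p_response * v_r + (1 - p_response) * v_nr > 0

print(should_target(0.005))  # False: below the 1% break-even threshold
print(should_target(0.02))   # True: expected value of targeting is positive
```

Note that the threshold is a property of the costs and benefits, not of the model: change the mailing cost or the profit margin and the 1% break-even point moves.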

Page 28

8 Using Expected Value to Frame Classifier Evaluation

1 We need to evaluate the set of decisions made by a model when applied to a set of examples. Such an evaluation is necessary in order to compare one model to another.

2 It is likely that each model will make some decisions better than the other model. What we care about is, in aggregate, how well does each model do: what is its expected value?

3 We can use the expected value framework just described to determine the best decisions for each particular model, and then use the expected value in a different way to compare the models.

Page 29

8 Using Expected Value to Frame Classifier Evaluation (cont.)

A consumer predicted to churn who actually doesn't churn? (Y, n)

Page 30

8 Using Expected Value to Frame Classifier Evaluation (cont.)

Page 31

8 Using Expected Value to Frame Classifier Evaluation (cont.)

Costs and benefits

Page 32

8 Using Expected Value to Frame Classifier Evaluation (cont.)

b(predicted, actual)

A true positive is a consumer who is offered the product and buys it. The benefit in this case is the profit from the revenue ($200) minus the product-related costs ($100) and the mailing costs ($1), so b(Y, p) = 99.

A false positive occurs when we classify a consumer as a likely responder and therefore target her, but she does not respond. We've said that the cost of preparing and mailing the marketing materials is a fixed cost of $1 per consumer. The benefit in this case is negative: b(Y, n) = -1.

A false negative is a consumer who was predicted not to be a likely responder (so was not offered the product), but who would have bought it if offered. In this case, no money was spent and nothing was gained, so b(N, p) = 0.

A true negative is a consumer who was not offered a deal and who would not have bought it even if it had been offered. The benefit in this case is zero (no profit but no cost), so b(N, n) = 0.

Page 33

8 Using Expected Value to Frame Classifier Evaluation (cont.)

1. A common way of expressing expected profit is to factor out the probabilities of seeing each class, often referred to as the class priors.

2. The class priors, p(p) and p(n), specify the likelihood of seeing positive and negative instances, respectively.

3. Factoring these out allows us to separate the influence of class imbalance from the fundamental predictive power of the model.

Alternative calculation:

Page 34

8 Using Expected Value to Frame Classifier Evaluation (cont.)

A rule of basic probability is:

p(x, y) = p(y) · p(x | y)

Expected profit = p(Y|p) · p(p) · b(Y,p) + p(N|p) · p(p) · b(N,p) +
                  p(N|n) · p(n) · b(N,n) + p(Y|n) · p(n) · b(Y,n)

- Each of these terms is weighted by the probability that we see that sort of example.
- So, if positive examples are very rare, their contribution to the overall expected profit will be correspondingly small.

Factoring out the class priors p(y) and p(n), we get the final equation:

Expected profit = p(p) · [p(Y|p) · b(Y,p) + p(N|p) · c(N,p)] +
                  p(n) · [p(N|n) · b(N,n) + p(Y|n) · c(Y,n)]

Equation 7-2. Expected profit equation with priors p(p) and p(n) factored out
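The two forms above are algebraically identical, which is easy to check in code. The confusion-matrix counts below are assumed for illustration (they are not the slide's table); the benefits are the ones derived earlier: b(Y,p) = 99, b(Y,n) = -1, b(N,p) = 0, b(N,n) = 0.

```python
# Expected profit computed two ways from an assumed confusion matrix.
tp, fn, fp, tn = 56, 7, 5, 42           # hypothetical counts
total = tp + fn + fp + tn
b_yp, b_np, b_nn, b_yn = 99, 0, 0, -1   # benefits from the marketing example

# Joint-probability form: each cell's rate times its benefit
joint = (tp/total)*b_yp + (fn/total)*b_np + (tn/total)*b_nn + (fp/total)*b_yn

# Equation 7-2 form: class priors factored out; p(Y|p) = tp/(tp+fn), etc.
p_p, p_n = (tp + fn) / total, (fp + tn) / total
factored = p_p * ((tp/(tp+fn))*b_yp + (fn/(tp+fn))*b_np) \
         + p_n * ((tn/(fp+tn))*b_nn + (fp/(fp+tn))*b_yn)

print(round(joint, 2), round(factored, 2))  # both forms agree
```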

Page 35

8 Using Expected Value to Frame Classifier Evaluation (cont.)

Table 7-5. Our sample confusion matrix (raw counts)

Returning to the targeted marketing example, what is the expected profit of the model learned? We can calculate it using Equation 7-2:

Page 36

9 Evaluation, Baseline Performance, and Implications for Investments in Data

1. Up to this point we have talked about model evaluation in isolation.

2. Nevertheless, another fundamental notion in data science is: it is important to consider carefully what would be a reasonable baseline against which to compare model performance.

3. The answer of course depends on the actual application, and coming up with suitable baselines is one task for the business understanding phase of the data mining process.

4. Principle: a good baseline is simple, but not simplistic.

Posted: 15/04/2021, 10:29