
Statistics for Business: Decision Making and Analysis, Robert Stine and Dean Foster, Chapter 22



Regression Diagnostics

Chapter 22


22.1 Problem 1: Changing Variation

Although regression analysis allows the use of prices of homes of different sizes to estimate the price of a home of a specific size, prices tend to be more variable for larger homes. How does this affect the simple regression model (SRM)?

• Consider how to recognize and fix three potential problems affecting regression models: changing variation in the data, outliers, and dependence among observations.


Scatterplot: Price ($000) vs. Home Size (Sq Ft)

Both the average and the standard deviation of price increase with home size.


SRM Results: Home Price Example


Fixed Costs, Marginal Costs, and Variable Costs

• The estimated intercept (50.598687, in thousands of dollars) can be interpreted as the fixed cost of a home, about $50,600.

• The 95% confidence interval for the intercept (after rounding) is -$4,000 to $105,000.

• Since it includes zero, this interval is not a precise estimate of the fixed cost.


• The slope (0.1594259, in thousands of dollars per square foot) estimates the marginal cost of an additional square foot of space, about $159 per square foot.

• The 95% confidence interval for the slope (after rounding) is $135,000 to $183,500 per 1,000 square feet.

• It can be interpreted as the average difference in home price associated with an additional 1,000 square feet.
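The unit conversions behind these interpretations can be sketched in a few lines of Python. The two coefficients are the ones reported on the slides; the function name and the 2,000 square foot home are illustrative assumptions, not part of the original example.

```python
# Coefficients reported on the slides; price is measured in thousands of dollars.
INTERCEPT = 50.598687   # estimated fixed cost ($000)
SLOPE = 0.1594259       # estimated marginal cost ($000 per square foot)

def predicted_price(sq_ft: float) -> float:
    """Fitted price, in thousands of dollars, for a home of sq_ft square feet."""
    return INTERCEPT + SLOPE * sq_ft

# The same slope expressed in dollars per square foot and per 1,000 square feet.
per_sq_ft_dollars = SLOPE * 1000
per_1000_sq_ft_dollars = per_sq_ft_dollars * 1000

print(f"marginal cost: ${per_sq_ft_dollars:,.2f} per sq ft")
print(f"marginal cost: ${per_1000_sq_ft_dollars:,.0f} per 1,000 sq ft")
# Hypothetical 2,000 sq ft home, chosen only for illustration.
print(f"fitted price:  ${predicted_price(2000) * 1000:,.0f}")
```

The per-1,000-square-foot figure falls inside the rounded interval of $135,000 to $183,500 quoted for the slope.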


Detecting Differences in Variation

• Based on the scatterplot, the association between home price and size appears linear.

• There is little concern about lurking variables since the sample of homes is from the same neighborhood.

• The similar variances condition is not satisfied.


The fan-shaped appearance of the residual plot indicates changing variances.
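A numeric counterpart to the fan shape is to compare the spread of the residuals for smaller and larger homes. The sketch below uses simulated data, not the textbook sample: the sizes, the coefficients, and the error standard deviation proportional to size are all assumptions made for illustration.

```python
import random
import statistics

def fit_line(x, y):
    """Least squares intercept and slope for a simple regression of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(1)
# Simulated homes: the error SD grows in proportion to size (assumed).
sizes = [random.uniform(800, 3500) for _ in range(200)]
prices = [50 + 0.16 * s + random.gauss(0, 0.03 * s) for s in sizes]

b0, b1 = fit_line(sizes, prices)
residuals = [p - (b0 + b1 * s) for s, p in zip(sizes, prices)]

# Split the residuals at the median size and compare their spread.
cutoff = statistics.median(sizes)
small = [e for s, e in zip(sizes, residuals) if s <= cutoff]
large = [e for s, e in zip(sizes, residuals) if s > cutoff]
sd_small, sd_large = statistics.stdev(small), statistics.stdev(large)

print(f"residual SD, smaller homes: {sd_small:.1f}")
print(f"residual SD, larger homes:  {sd_large:.1f}")
```

A clearly larger residual SD in the second group is the numeric signature of the fan shape; splitting at the median is one arbitrary choice of grouping.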


Side-by-side boxplots confirm that variances increase with home size.


• Heteroscedastic: errors that have different amounts of variation

• Homoscedastic: errors that have equal amounts of variation


Consequences of Different Variation

• Prediction intervals are too narrow or too wide.

• Confidence intervals for the slope and intercept are not reliable.

• Hypothesis tests regarding β0 and β1 are not reliable.


The 95% prediction intervals are too wide for small homes and too narrow for large homes.


Fixing the Problem: Revise the Model

• If F represents the fixed cost and M the marginal cost, the equation of the SRM becomes

$$\text{Price} = F + M \cdot \text{SqFt} + \varepsilon$$


• Divide both sides of the equation by the number of square feet and simplify:

$$\frac{\text{Price}}{\text{SqFt}} = M + F \cdot \frac{1}{\text{SqFt}} + \frac{\varepsilon}{\text{SqFt}}$$


• The response variable becomes price per square foot and the explanatory variable becomes the reciprocal of the number of square feet.

• The marginal cost M is the intercept and the slope is F, the fixed cost.

• The residuals have similar variances.
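The revised model can be tried on simulated data. This is a minimal sketch, assuming errors whose standard deviation is proportional to home size; the values of F, M, and the sizes are hypothetical, not the textbook sample. Regressing price per square foot on the reciprocal of size should return an intercept near M, a slope near F, and residuals with similar spread for small and large homes.

```python
import random
import statistics

def fit_line(x, y):
    """Least squares intercept and slope for a simple regression of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(2)
F_TRUE, M_TRUE = 50.0, 0.16   # assumed fixed ($000) and marginal ($000/sq ft) costs
sizes = [random.uniform(800, 3500) for _ in range(200)]
prices = [F_TRUE + M_TRUE * s + random.gauss(0, 0.03 * s) for s in sizes]

# Revised model: response is price per square foot, explanatory variable is 1/size.
price_per_sqft = [p / s for p, s in zip(prices, sizes)]
recip_size = [1 / s for s in sizes]
m_hat, f_hat = fit_line(recip_size, price_per_sqft)   # intercept ~ M, slope ~ F

residuals = [y - (m_hat + f_hat * x) for x, y in zip(recip_size, price_per_sqft)]
cutoff = statistics.median(sizes)
sd_small = statistics.stdev([e for s, e in zip(sizes, residuals) if s <= cutoff])
sd_large = statistics.stdev([e for s, e in zip(sizes, residuals) if s > cutoff])

print(f"estimated marginal cost M: {m_hat:.3f} ($000 per sq ft)")
print(f"estimated fixed cost F:    {f_hat:.1f} ($000)")
print(f"residual SD ratio (large/small homes): {sd_large / sd_small:.2f}")
```

Dividing by size works here because the simulated error SD is proportional to size; with a different variance pattern another transformation might be needed.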


Boxplots confirm homoscedastic errors


prices into fixed and variable costs to better prepare for negotiations with realtors.


4M Example 22.1:

ESTIMATING HOME PRICES

Method

The data consist of a sample of 94 homes for sale in Seattle. The explanatory variable is the reciprocal of home size and the response is price per square foot. The scatterplot shows a linear association and there are no obvious lurking variables.


Mechanics

The evidently independent, similar variances, and nearly normal conditions are met.


The SRM results.


The 95% confidence interval for the intercept is [136.8182 to 178.6878] and the 95% confidence interval for the slope is [18,592.36 to 89,181.64].


Message

Prices for homes in this Seattle neighborhood run about $140 to $180 per square foot, on average. Average fixed costs associated with the purchase are in the range $19,000 to $89,000, with 95% confidence.


Comparing Models with Different Responses

Even though the revised model has a smaller r²,

• It provides more reliable and narrower confidence intervals for fixed and variable costs; and

• It provides more sensible prediction intervals.



22.2 Problem 2: Leveraged Outliers

Consider a Contractor’s Bid on a Project

A contractor is bidding on a project to construct an 875 square-foot addition to a home.

• If he bids too low, he loses money on the project.

• If he bids too high, he does not get the job.


Contractor Data for n=30 Similar Projects

Note that all but one of his previous projects are


Contractor Example

• His one project at 900 square feet is an outlier.

• It is also a leveraged observation, as it pulls the regression line in its direction.

• Leveraged: an observation in regression that has an unusually small or large value of the explanatory variable compared with the other observations
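Leverage can be quantified. The slides do not give a formula, so the sketch below uses the standard simple-regression leverage, h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)², as an assumption about what one might compute. The 29 smaller project sizes are hypothetical stand-ins; only the 900 square foot project comes from the example.

```python
def leverages(x):
    """Leverage of each observation in a simple regression on x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Hypothetical sizes (sq ft) for 29 small projects, plus the 900 sq ft project.
sizes = [150 + 8 * i for i in range(29)] + [900]
h = leverages(sizes)

most_leveraged = max(range(len(sizes)), key=lambda i: h[i])
print(f"largest leverage: {h[most_leveraged]:.2f} at {sizes[most_leveraged]} sq ft")
print(f"average leverage: {sum(h) / len(h):.3f}")
```

In simple regression the leverages always sum to 2, so the average is 2/n; an observation whose leverage is many times that average, like the 900 square foot project here, has a large pull on the fitted line.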


Consequences of an Outlier

• To see the consequences of an outlier, fit the least squares regression line both with and without it.

• Use the standard errors obtained without including the outlier to compare estimates.
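The with-and-without comparison can be sketched as follows. The project data are simulated and hypothetical (29 small projects plus one large, unusually cheap project at 900 square feet); only the procedure, comparing the two fits in units of the standard errors from the fit without the outlier, follows the slides.

```python
import math
import random

def fit_with_se(x, y):
    """Least squares fit; returns (b0, b1, se_b0, se_b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))            # SD of the residuals
    se_b1 = s_e / math.sqrt(sxx)
    se_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return b0, b1, se_b0, se_b1

random.seed(7)
# Hypothetical projects: size in sq ft, cost in $000.
sizes = [150 + 8 * i for i in range(29)]
costs = [5 + 0.05 * s + random.gauss(0, 1.5) for s in sizes]
outlier_size, outlier_cost = 900, 35.0        # large project, cheaper than the trend

b0_wo, b1_wo, se_b0_wo, se_b1_wo = fit_with_se(sizes, costs)
b0_w, b1_w, _, _ = fit_with_se(sizes + [outlier_size], costs + [outlier_cost])

# Express each change in units of the standard error from the fit without the outlier.
shift_intercept = (b0_w - b0_wo) / se_b0_wo
shift_slope = (b1_w - b1_wo) / se_b1_wo
print(f"intercept shifts by {shift_intercept:+.2f} standard errors")
print(f"slope shifts by     {shift_slope:+.2f} standard errors")
```

In this simulation, as in the contractor example, including the outlier raises the estimated fixed cost and lowers the estimated marginal cost; the sizes of the shifts depend on the made-up data.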


Consequences for the Contractor Example


• Including the outlier shifts the estimated fixed cost up by about 1.5 standard errors.

• Including the outlier shifts the estimated marginal cost down by about 1.56 standard errors.


Prediction intervals when the outlier is included


Prediction intervals when the outlier is not included


Fixing the Problem: More Information

• If the outlier describes what is expected the next time under the same conditions, then it should be included.

• In the contractor example, more information is needed to decide whether to include or exclude the outlier.


22.3 Problem 3: Dependent Errors and Time Series

Detecting Dependence

• With time series data, plot residuals versus time to look for a pattern indicating dependence in the errors.

• Use the Durbin-Watson statistic to test for correlation between adjacent residuals (known as autocorrelation).


The Durbin-Watson Statistic

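• Tests the null hypothesis $H_0: \rho_\varepsilon = 0$

• Is calculated from the residuals $e_1, e_2, \ldots, e_n$ as follows:

$$D = \frac{(e_2 - e_1)^2 + (e_3 - e_2)^2 + \cdots + (e_n - e_{n-1})^2}{e_1^2 + e_2^2 + \cdots + e_n^2}$$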
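The statistic is simple to compute directly from a list of residuals in time order. The function name and the two toy residual sequences below are illustrative assumptions; they are not data from the chapter.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Residuals that drift slowly (positive autocorrelation) give D well below 2.
drifting = [-2, -1, 0, 1, 2]
# Residuals that flip sign each period (negative autocorrelation) give D above 2.
alternating = [1, -1, 1, -1]

print(durbin_watson(drifting))      # 0.4
print(durbin_watson(alternating))   # 3.0
```

D is approximately 2(1 - r), where r is the correlation between adjacent residuals, which is why values near 2 are consistent with no lag-one autocorrelation.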


• Use the p-value provided by software or a table (portion shown below) to draw a conclusion.


Consequences of Dependence

• If there is positive autocorrelation in the errors, the estimated standard errors are too small.

• The estimated slope and intercept are less precise than suggested by the output.

• The best remedy is to incorporate the dependence into the regression model.


4M Example 22.2:

CELL PHONE SUBSCRIBERS

Motivation

The rate of growth is captured by taking the ¼ power of the number of subscribers.


Method

Use simple regression to predict the future number of subscribers. The quarter power of the number of subscribers, in millions, is the response. The explanatory variable is time. The scatterplot shows a linear association. Other lurking variables may be present, however, such as technology and marketing.


Mechanics

The least squares equation is

$$\text{Estimated Subscribers}^{1/4} = -317.4 + 0.16 \cdot \text{Date}$$
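To turn a fitted value back into a number of subscribers, raise it to the fourth power. The sketch below assumes Date is the calendar year and uses the rounded coefficients printed on the slide, so its output is only illustrative; predictions from the unrounded coefficients would differ noticeably because the fourth power magnifies rounding error.

```python
# Rounded coefficients from the slide; Date is assumed to be the calendar year.
INTERCEPT = -317.4
SLOPE = 0.16

def estimated_subscribers(date: float) -> float:
    """Back-transform the fitted quarter power to subscribers, in millions."""
    quarter_power = INTERCEPT + SLOPE * date
    return quarter_power ** 4

# Illustrative year only; not a value taken from the chapter.
print(f"{estimated_subscribers(2000.0):.1f} million subscribers")
```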


The timeplot of residuals and D = 0.11 indicate that the independence condition is not satisfied. Also …


Message

Using a novel transformation, the historical trend can be summarized as

$$\text{Estimated Subscribers}^{1/4} = -317.4 + 0.16 \cdot \text{Date}$$

However, since the conditions for the SRM are not satisfied, we cannot quantify the uncertainty for predictions.


Best Practices

• Make sure that your model makes sense.

• Plan to change your model if it does not match the data.

• Report the presence of any outliers and how you handle them.

Pitfalls

• Do not rely on summary statistics like r² to pick the best model.

• Don’t compare r² between regression models unless the response is the same.

• Do not check for normality until you get the right equation.


Pitfalls (Continued)

• Don’t think that your data are independent if the Durbin-Watson statistic is close to 2.

• Never forget to look at plots of the data and model.

Date posted: 10/01/2018, 16:01
