Statistics for Business and Economics, 7th Edition, by Paul Newbold: Chapter 12


Page 1

Statistics for Business and Economics

7th Edition

Chapter 12

Multiple Regression

Page 2

Chapter Goals

After completing this chapter, you should be able to:

 Apply multiple regression analysis to business decision-making situations

 Analyze and interpret the computer output for a multiple regression model

 Perform a hypothesis test for all regression coefficients or for a subset of coefficients

 Fit and interpret nonlinear regression models

 Incorporate qualitative variables into the regression

model by using dummy variables

 Discuss model specification and analyze residuals

Page 3

The Multiple Regression Model

Idea: Examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi)

Multiple regression model with k independent variables:

yi = β0 + β1 x1i + β2 x2i + … + βk xki + εi

12.1

Page 4

Multiple Regression Equation

The coefficients of the multiple regression model are estimated using sample data

Multiple regression equation with k independent variables:

ŷi = b0 + b1 x1i + b2 x2i + … + bk xki

where b0 is the estimated intercept and b1, b2, …, bk are the estimated slope coefficients

In this chapter we will always use a computer to obtain the regression slope coefficients and other regression summary measures.
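The slides defer the computation to software; as a rough illustration of what the software does, here is a minimal pure-Python sketch that estimates the coefficients by solving the normal equations (A'A)b = A'y with Gaussian elimination. The five-point dataset is hypothetical, constructed from y = 1 + 2*x1 + 3*x2 so the recovered coefficients can be checked exactly.

```python
def fit_ols(X, y):
    """Estimate b0..bk for y = b0 + b1*x1 + ... + bk*xk (ordinary least squares)."""
    n, k = len(y), len(X[0]) + 1
    A = [[1.0] + list(row) for row in X]  # design matrix with intercept column
    # Normal equations: (A'A) b = A'y
    AtA = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    Aty = [sum(A[i][p] * y[i] for i in range(n)) for p in range(k)]
    # Solve the k x k system by Gaussian elimination with partial pivoting
    M = [AtA[p] + [Aty[p]] for p in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (M[r][k] - sum(M[r][c] * b[c] for c in range(r + 1, k))) / M[r][r]
    return b

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (exact fit, no noise)
b = fit_ols([(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)], [9, 8, 19, 18, 26])
```

In practice you would use a statistics package rather than hand-rolled elimination, but the normal-equations view makes clear that the coefficients are the solution of a small linear system.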

Page 5

Multiple Regression Equation
(continued)

Two variable model:

ŷ = b0 + b1 x1 + b2 x2

(Figure: the fitted regression plane in (x1, x2, y) space; the plane's slope along x1 is b1 and its slope along x2 is b2)

Page 6

Multiple Regression Model

Two variable model:

The best fit equation, ŷ = b0 + b1 x1 + b2 x2, is found by minimizing the sum of squared errors, Σe²

(Figure: a sample observation (x1i, x2i, yi), its predicted value ŷi on the fitted plane, and the residual e = yi - ŷi)

Page 7

Standard Multiple Regression Assumptions

 The values xi and the error terms εi are independent

 The error terms are random variables with mean 0 and a constant variance, σ²:

E[εi] = 0 and E[εi²] = σ² for (i = 1, …, n)

(The constant variance property is called homoscedasticity)

Page 8

Standard Multiple Regression Assumptions
(continued)

 The random error terms, εi, are not correlated with one another, so that

E[εi εj] = 0 for all i ≠ j

 It is not possible to find a set of numbers, c0, c1, …, ck, not all equal to zero, such that

c0 + c1 x1i + c2 x2i + … + ck xki = 0

(that is, the independent variables are not perfectly collinear)

Page 9

Example:
2 Independent Variables

 A distributor of frozen dessert pies wants to evaluate factors thought to influence demand

 Dependent variable: Pie sales (units per week)

 Independent variables: Price (in $)
Advertising ($100's)

 Data are collected for 15 weeks

Page 10

Pie Sales Example

Week | Pie Sales | Price ($) | Advertising ($100s)

(table of 15 weekly observations; the numeric values were not preserved in this extraction)

Page 11

Estimating a Multiple Linear Regression Equation

 Excel can be used to generate the coefficients and measures of goodness of fit for multiple regression

 Data / Data Analysis / Regression

12.2

Page 12

Multiple Regression Output

(Excel regression output table; the estimated equation it reports is:)

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

Page 13

The Multiple Regression Equation

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

b 1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of

changes due to advertising

b 2 = 74.131: sales will increase, on average,

by 74.131 pies per week for each $100 increase in

advertising, net of the effects of changes due to price

where

Sales is in number of pies per week

Price is in $

Advertising is in $100’s.

Page 14

Coefficient of Determination, R²

 Reports the proportion of total sample variability in y explained by all x variables taken together

R² = SSR / SST = regression sum of squares / total sum of squares

12.3

Page 15

Coefficient of Determination, R²
(continued)

R² = SSR / SST = 29460.0 / 56493.3 = .52148

52.1% of the variation in pie sales is explained by the variation in price and advertising
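The R² figure can be verified directly from the two sums of squares:

```python
SSR = 29460.0  # regression sum of squares (from the Excel output)
SST = 56493.3  # total sum of squares
R2 = SSR / SST
print(round(R2, 5))  # 0.52148
```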

Page 16

Estimation of Error Variance

 Consider the population regression model

yi = β0 + β1 x1i + β2 x2i + … + βK xKi + εi

 The unbiased estimate of the variance of the errors is

s²e = Σ e²i / (n - K - 1) = SSE / (n - K - 1)

where ei = yi - ŷi

 The square root of the variance, se, is called the standard error of the estimate

Page 17

Estimation of Error Variance
(continued)

se = 47.46

The magnitude of this value can be compared to the average y value
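Using SSE = SST - SSR from the pie-sales output (n = 15 weeks, K = 2 independent variables), the standard error of the estimate works out as:

```python
import math

SST, SSR = 56493.3, 29460.0
SSE = SST - SSR            # error sum of squares = 27033.3
n, K = 15, 2               # 15 weekly observations, 2 independent variables
s2_e = SSE / (n - K - 1)   # unbiased estimate of the error variance
s_e = math.sqrt(s2_e)      # standard error of the estimate, about 47.46
```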

Page 18

Adjusted Coefficient of Determination, adjusted R²

 R² never decreases when a new X variable is added to the model, even if the new variable is not an important predictor variable

 This can be a disadvantage when comparing models

 What is the net effect of adding a new variable?

 We lose a degree of freedom when a new x variable is added

 Did the new x variable add enough explanatory power to offset the loss of one degree of freedom?

Page 19

Adjusted Coefficient of Determination, adjusted R²
(continued)

 Used to correct for the fact that adding non-relevant independent variables will still reduce the error sum of squares

adjusted R² = 1 - (SSE / (n - K - 1)) / (SST / (n - 1))

(where n = sample size, K = number of independent variables)

 Adjusted R² provides a better comparison between multiple regression models with different numbers of independent variables

 Penalizes excessive use of unimportant independent variables
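Plugging the pie-sales sums of squares into this formula reproduces the adjusted R² of 44.2% reported in the output:

```python
SST, SSR = 56493.3, 29460.0
SSE = SST - SSR
n, K = 15, 2
# Adjusted R-squared penalizes each additional independent variable
adj_R2 = 1 - (SSE / (n - K - 1)) / (SST / (n - 1))
```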

Page 20

Adjusted Coefficient of Determination, adjusted R²
(continued)

adjusted R² = .44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables

Page 21

Coefficient of Multiple Correlation

 The coefficient of multiple correlation is the correlation between the predicted value and the observed value of the dependent variable

 Is the square root of the multiple coefficient of

determination

 Used as another measure of the strength of the linear relationship between the dependent variable and the independent variables

 Comparable to the correlation between Y and X in

simple regression

R = r(ŷ, y) = √R²
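Continuing the pie-sales example, the coefficient of multiple correlation follows from the R² computed earlier:

```python
import math

R2 = 29460.0 / 56493.3     # coefficient of determination
R = math.sqrt(R2)          # coefficient of multiple correlation, about 0.722
```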

Page 22

Evaluating Individual Regression Coefficients

 Use t-tests for individual coefficients

 Shows if a specific independent variable is conditionally important

12.4

Page 23

Evaluating Individual Regression Coefficients
(continued)

H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist between xj and y)

Test statistic:

t = (bj - 0) / s(bj)   (df = n - k - 1)

Page 24

Evaluating Individual Regression Coefficients
(continued)

t-value for Price is t = -2.306, with p-value .0398

t-value for Advertising is t = 2.855, with p-value .0145

Page 25

The test statistic for each variable falls in the rejection region (p-values < .05)

There is evidence that both Price and Advertising affect pie sales at α = .05

From Excel output: Reject H0 for each variable

Page 26

Confidence Interval Estimate for the Slope

Confidence interval limits for the population slope βj:

bj ± t(n-K-1, α/2) s(bj)

where t has (n - K - 1) d.f.

Example: Form a 95% confidence interval for the effect of changes in price (x1) on pie sales:

-24.975 ± (2.1788)(10.832)

So the interval is -48.576 < β1 < -1.374

Here, t has (15 - 2 - 1) = 12 d.f.
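The interval arithmetic for the price slope can be checked directly:

```python
b1, t_crit, s_b1 = -24.975, 2.1788, 10.832  # slope, t critical value, standard error (from the slide)
margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin      # approx. (-48.576, -1.374)
```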

Page 27

Confidence Interval Estimate for the Slope
(continued)

Confidence interval for the population slope βj

Example: Excel output also reports these interval endpoints:

            Coefficients  Standard Error     Lower 95%   Upper 95%
Intercept      306.52619       114.25389  …   57.58835   555.46404
Price          -24.97509        10.83213  …  -48.57626    -1.37392
Advertising     74.13096        25.96732  …   17.55303   130.70888

Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of $1 in the selling price

Page 28

Test on All Coefficients

 F-test for overall significance of the model

 Shows if there is a linear relationship between all of the X variables considered together and Y

Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)

12.5

Page 29

F-Test for Overall Significance
(continued)

Test statistic:

F = MSR / MSE = (SSR / K) / (SSE / (n - K - 1))

where F has K (numerator) and (n - K - 1) (denominator) degrees of freedom

Decision rule: reject H0 if F > F(K, n-K-1, α)

Page 30

F-Test for Overall Significance
(continued)

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386

With 2 and 12 degrees of freedom
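The F statistic follows from the sums of squares already computed for the pie-sales model:

```python
SSR, SSE = 29460.0, 27033.3
K, n = 2, 15
MSR = SSR / K             # mean square regression = 14730.0
MSE = SSE / (n - K - 1)   # mean square error, about 2252.8
F = MSR / MSE             # about 6.5386, with 2 and 12 d.f.
```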

Page 31

F-Test for Overall Significance
(continued)

Critical value: F(α = .05) = 3.885 with 2 and 12 d.f.

Since F = 6.5386 > 3.885, reject H0

There is evidence that at least one independent variable affects Y

Page 32

Tests on a Subset of Regression Coefficients

 Consider a multiple regression model involving variables xj and zj, and the null hypothesis that the z variable coefficients are all zero:

yi = β0 + β1 x1i + … + βK xKi + α1 z1i + … + αr zri + εi

H0: α1 = α2 = … = αr = 0
H1: at least one of αj ≠ 0   (j = 1, …, r)

Page 33

Tests on a Subset of Regression Coefficients
(continued)

 Goal: compare the error sum of squares for the complete model with the error sum of squares for the restricted model

 First run a regression for the complete model and obtain SSE

 Next run a restricted regression that excludes the z variables (the number of variables excluded is r) and obtain the restricted error sum of squares SSE(r)

 Compute the F statistic and apply the decision rule for a significance level α:

F = ((SSE(r) - SSE) / r) / s²e

Reject H0 if F > F(r, n-K-r-1, α)
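A sketch of the subset-test computation; the restricted-model SSE and the number of excluded variables below are hypothetical, chosen only to show the mechanics:

```python
SSE_restricted = 35000.0   # hypothetical SSE from the restricted model (z variables dropped)
SSE_full = 27033.3         # SSE from the complete pie-sales model
r = 2                      # hypothetical number of z variables excluded
s2_e = SSE_full / 12       # error variance estimate from the complete model
# F measures how much worse the fit gets, per excluded variable, relative to s2_e
F = ((SSE_restricted - SSE_full) / r) / s2_e
```

If this F exceeds the F(r, n-K-r-1, α) critical value, the excluded z variables jointly matter.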

Page 34

Prediction

 Given a population regression model

yi = β0 + β1 x1i + β2 x2i + … + βK xKi + εi   (i = 1, 2, …, n)

 then, given a new observation of a data point (x1,n+1, x2,n+1, …, xK,n+1), the best linear unbiased forecast of yn+1 is

ŷn+1 = b0 + b1 x1,n+1 + b2 x2,n+1 + … + bK xK,n+1

 It is risky to forecast for new X values outside the range of the data used to estimate the model coefficients, because we do not have data to support that the linear model extends beyond the observed range.

12.6

Page 35

Using The Equation to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62

Predicted sales is 428.62 pies

Note that Advertising is in $100's, so $350 means that x2 = 3.5
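The prediction arithmetic can be reproduced directly:

```python
price = 5.50        # selling price in $
advertising = 3.5   # $350 of advertising, measured in $100s
sales = 306.526 - 24.975 * price + 74.131 * advertising  # about 428.62 pies
```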

Page 36

Nonlinear Regression Models

 The relationship between the dependent variable and an independent variable may not be linear

 Can review the scatter plot to check for non-linear relationships

 Example: Quadratic model

Y = β0 + β1X + β2X² + ε

 The second independent variable is the square of the first variable

12.7

Page 37

Quadratic Regression Model

Model form:

yi = β0 + β1 x1i + β2 x²1i + εi

where:
β0 = Y intercept
β1 = regression coefficient for linear effect of X on Y
β2 = regression coefficient for quadratic effect on Y
εi = random error in Y for observation i

Page 38

Linear vs Nonlinear Fit

Linear fit does not give random residuals

Nonlinear fit gives random residuals

Page 39

Quadratic Regression Model
(continued)

yi = β0 + β1 x1i + β2 x²1i + εi

Quadratic models may be considered when the scatter diagram takes on one of the following shapes:

(Figure: four panels of Y vs X1 curves, one for each sign combination: β1 < 0, β2 > 0; β1 > 0, β2 > 0; β1 < 0, β2 < 0; β1 > 0, β2 < 0)

β1 = the coefficient of the linear term
β2 = the coefficient of the squared term

Page 40

Testing for Significance: Quadratic Effect

 Compare the linear regression estimate

ŷ = b0 + b1 x1

 with the quadratic regression estimate

ŷ = b0 + b1 x1 + b2 x²1

 Hypotheses

H0: β2 = 0 (The quadratic term does not improve the model)
H1: β2 ≠ 0 (The quadratic term improves the model)

Page 41

Testing for Significance: Quadratic Effect
(continued)

Hypotheses

H0: β2 = 0 (The quadratic term does not improve the model)
H1: β2 ≠ 0 (The quadratic term improves the model)

 The test statistic is

t = (b2 - β2) / s(b2),   with d.f. = n - 3

where:
b2 = squared term slope coefficient
β2 = hypothesized slope (zero)
s(b2) = standard error of the slope b2

Page 42

Testing for Significance: Quadratic Effect
(continued)

 Compare R² from the simple regression to R² from the quadratic model:

 If R² from the quadratic model is larger, the quadratic model is a better model

Page 43

Example: Quadratic Model

 Purity increases as filter time increases:

Purity | Filter Time

(data table not preserved in this extraction)

Page 44

Example: Quadratic Model
(continued)

 Simple regression results:

t statistic, F statistic, and R² are all high, but the residuals are not random:

Page 45

Example: Quadratic Model
(continued)

Coefficients | Standard Error | t Stat | P-value

(coefficient values not preserved in this extraction)

The quadratic term is significant and improves the model: R² is higher and se is lower, and the residuals are now random

Page 46

The Log Transformation

The Multiplicative Model:

 Original multiplicative model:

Y = β0 X1^β1 X2^β2 ε

 Transformed multiplicative model:

log(Y) = log(β0) + β1 log(X1) + β2 log(X2) + log(ε)
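The transformation can be checked numerically: with hypothetical parameter values (and the error term ignored), the log of the multiplicative model equals the linear-in-logs form exactly:

```python
import math

# Hypothetical parameter and data values, used only to verify the identity
b0, b1, b2 = 2.0, 0.5, 1.5
x1, x2 = 4.0, 9.0
y = b0 * x1 ** b1 * x2 ** b2                                   # multiplicative form
log_y = math.log(b0) + b1 * math.log(x1) + b2 * math.log(x2)   # linear-in-logs form
```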

Page 47

Interpretation of coefficients

For the multiplicative model, when both dependent and independent variables are logged:

 The coefficient of the independent variable Xk can be interpreted as follows: a 1 percent change in Xk leads to an estimated bk percentage change in the average value of Y

 bk is the elasticity of Y with respect to a change in Xk

Page 48

Dummy Variables

 A dummy variable is a categorical independent variable with two levels:

 yes or no, on or off, male or female

 recorded as 0 or 1

 Regression intercepts are different if the variable is significant

 If more than two levels, the number of dummy variables needed is (number of levels - 1)

12.8

Page 49

Dummy Variable Example

ŷ = b0 + b1 x1 + b2 x2

Let:
y = Pie Sales
x1 = Price
x2 = Holiday (X2 = 1 if a holiday occurred during the week; X2 = 0 if there was no holiday that week)

Page 50

Dummy Variable Example
(continued)

Holiday (x2 = 1):    ŷ = b0 + b1 x1 + b2 (1) = (b0 + b2) + b1 x1
No Holiday (x2 = 0): ŷ = b0 + b1 x1 + b2 (0) = b0 + b1 x1

Same slope b1; different intercepts

If H0: β2 = 0 is rejected, then "Holiday" has a significant effect on pie sales

Page 51

Interpreting the Dummy Variable Coefficient

Example:

Sales = 300 - 30(Price) + 15(Holiday)

where
Sales: number of pies sold per week
Price: pie price in $
Holiday: 1 if a holiday occurred during the week; 0 if no holiday occurred

b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price
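With the fitted equation above, the dummy variable shifts only the intercept: at the same price, predicted sales in a holiday week exceed those in a non-holiday week by exactly b2 = 15 pies.

```python
def predicted_sales(price, holiday):
    # Fitted equation from the slide: Sales = 300 - 30(Price) + 15(Holiday)
    return 300 - 30 * price + 15 * holiday

with_holiday = predicted_sales(5.0, 1)     # 165.0 pies
without_holiday = predicted_sales(5.0, 0)  # 150.0 pies
```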

Page 52

Interaction Between Explanatory Variables

 Hypothesizes interaction between pairs of x variables

 Response to one x variable may vary at different levels of another x variable

 Contains a two-way cross product term:

ŷ = b0 + b1 x1 + b2 x2 + b3 x3
  = b0 + b1 x1 + b2 x2 + b3 (x1 x2)
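A small sketch of how the cross product term changes the slope on x1; the fitted coefficients here are hypothetical:

```python
# Hypothetical fitted coefficients for y-hat = b0 + b1*x1 + b2*x2 + b3*(x1*x2)
b0, b1, b2, b3 = 10.0, 2.0, 3.0, 1.5

def slope_on_x1(x2):
    # Collecting terms in x1: y-hat = (b0 + b2*x2) + (b1 + b3*x2)*x1
    return b1 + b3 * x2

slope_x2_0 = slope_on_x1(0)   # slope when x2 = 0
slope_x2_1 = slope_on_x1(1)   # slope when x2 = 1; differs by b3
```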

Page 53

Effect of Interaction

 Given:

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + ε

 Without the interaction term, the effect of X1 on Y is measured by β1

 With the interaction term, the effect of X1 on Y is measured by β1 + β3 X2, since

Y = β0 + β2 X2 + (β1 + β3 X2) X1 + ε

 The effect changes as X2 changes

Page 55

Significance of Interaction Term

 The coefficient b3 is an estimate of the difference in the coefficient of x1 when x2 = 1 compared to when x2 = 0

 The t statistic for b3 can be used to test the hypothesis

H0: β3 = 0 | β1 ≠ 0, β2 ≠ 0
H1: β3 ≠ 0 | β1 ≠ 0, β2 ≠ 0

 If we reject the null hypothesis we conclude that there is a difference in the slope coefficient for the two subgroups

Page 56

Multiple Regression Assumptions

Errors (residuals) from the regression model:

ei = (yi - ŷi)

Assumptions:
 The errors are normally distributed
 Errors have a constant variance
 The model errors are independent

12.9

Page 58

Chapter Summary

 Developed the multiple regression model

 Tested the significance of the multiple regression model

 Discussed adjusted R²

 Tested individual regression coefficients

 Tested individual regression coefficients

 Tested portions of the regression model

 Used quadratic terms and log transformations in

regression models

 Explained dummy variables

 Evaluated interaction effects

 Discussed using residual plots to check model

assumptions

Posted: 10/01/2018, 16:03