Slide 1: Statistics for Business and Economics, 7th Edition
Chapter 12: Multiple Regression
Slide 2: Chapter Goals
After completing this chapter, you should be able to:
- Apply multiple regression analysis to business decision-making situations
- Analyze and interpret the computer output for a multiple regression model
- Perform a hypothesis test for all regression coefficients or for a subset of coefficients
- Fit and interpret nonlinear regression models
- Incorporate qualitative variables into the regression model by using dummy variables
- Discuss model specification and analyze residuals
Slide 3: The Multiple Regression Model (12.1)
Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (X_i).
Multiple regression model with K independent variables:
  y_i = β0 + β1·x_1i + β2·x_2i + … + βK·x_Ki + ε_i
Slide 4: Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with K independent variables:
  ŷ_i = b0 + b1·x_1i + b2·x_2i + … + bK·x_Ki
where b0 is the estimated intercept and b1, …, bK are the estimated slope coefficients.
In this chapter we will always use a computer to obtain the regression slope coefficients and other regression summary measures.
Slide 5: Multiple Regression Equation (continued)
Two-variable model: ŷ = b0 + b1·x1 + b2·x2
[Figure: the fitted equation is a plane over the (x1, x2) axes; b1 is the slope for variable x1 and b2 is the slope for variable x2]
Slide 6: Multiple Regression Model (continued)
Two-variable model: the best-fit equation, ŷ = b0 + b1·x1 + b2·x2, is found by minimizing the sum of squared errors, Σe².
[Figure: a sample observation y_i, its predicted value ŷ_i on the fitted plane, and the residual e_i = y_i − ŷ_i]
Slide 7: Standard Multiple Regression Assumptions
- The x_ji terms are fixed numbers, or they are realizations of random variables X_j that are independent of the error terms ε_i.
- The error terms are random variables with mean 0 and constant variance:
  E[ε_i] = 0 and E[ε_i²] = σ² for i = 1, …, n
  (The constant variance property is called homoscedasticity.)
Slide 8: Standard Multiple Regression Assumptions (continued)
- The random error terms, ε_i, are not correlated with one another, so that
  E[ε_i ε_j] = 0 for all i ≠ j
- It is not possible to find a set of numbers c1, c2, …, cK, not all zero, such that
  c1·x_1i + c2·x_2i + … + cK·x_Ki = 0
  (that is, the independent variables are not linearly dependent on one another).
Slide 9: Example: 2 Independent Variables
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
- Dependent variable: pie sales (units per week)
- Independent variables: price (in $) and advertising (in $100s)
Slide 10: Pie Sales Example
[Data table: weekly observations with columns Week | Pie Sales | Price ($) | Advertising ($100s); the data rows are not reproduced here]
Slide 11: Estimating a Multiple Linear Regression Equation (12.2)
Excel can be used to generate the coefficients and measures of goodness of fit for multiple regression:
Data / Data Analysis / Regression
Slide 12: Multiple Regression Output
The estimated equation from the regression output:
  Sales = 306.526 − 24.975(Price) + 74.131(Advertising)
Slide 13: The Multiple Regression Equation
  Sales = 306.526 − 24.975(Price) + 74.131(Advertising)
where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.
- b1 = −24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
- b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
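As a sketch of how such an equation can be estimated outside Excel, the least-squares coefficients can be computed with NumPy. The weekly values below are hypothetical, generated exactly from the fitted equation (they are not the textbook's dataset), so least squares recovers the reported coefficients:

```python
import numpy as np

# Hypothetical weekly data generated (noise-free) from
# Sales = 306.526 - 24.975*Price + 74.131*Advertising,
# so the least-squares fit recovers these coefficients.
price = np.array([5.5, 7.5, 8.0, 8.0, 6.8, 7.5, 4.5, 6.4, 7.0, 5.0])
advertising = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0])
sales = 306.526 - 24.975 * price + 74.131 * advertising

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(price), price, advertising])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(b)  # intercept, price slope, advertising slope
```

In practice a statistics package reports these coefficients together with standard errors and goodness-of-fit measures, as the Excel output on the following slides does.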
Slide 14: Coefficient of Determination, R² (12.3)
R² reports the fraction of total sample variability in y explained by all x variables taken together:
  R² = SSR / SST = regression sum of squares / total sum of squares
Slide 15: Coefficient of Determination, R² (continued)
  R² = SSR / SST = 29460.0 / 56493.3 = .52148
52.1% of the variation in pie sales is explained by the variation in price and advertising.
Slide 16: Estimation of Error Variance
Consider the population regression model
  y_i = β0 + β1·x_1i + β2·x_2i + … + βK·x_Ki + ε_i
The unbiased estimate of the variance of the errors is
  s_e² = Σ e_i² / (n − K − 1) = SSE / (n − K − 1)
where e_i = y_i − ŷ_i. The square root of the variance, s_e, is called the standard error of the estimate.
Slide 17: Standard Error, s_e
The magnitude of s_e can be compared to the average y value to judge whether the typical prediction error is large or small.
Slide 18: Adjusted Coefficient of Determination, R̄²
- R² never decreases when a new x variable is added to the model, even if the new variable is not an important predictor variable. This can be a disadvantage when comparing models.
- What is the net effect of adding a new variable? We lose a degree of freedom when a new x variable is added. Did the new x variable add enough explanatory power to offset the loss of one degree of freedom?
Slide 19: Adjusted Coefficient of Determination, R̄² (continued)
Used to correct for the fact that adding non-relevant independent variables will still reduce the error sum of squares:
  R̄² = 1 − [SSE / (n − K − 1)] / [SST / (n − 1)]
(where n = sample size, K = number of independent variables)
- Adjusted R² provides a better comparison between multiple regression models with different numbers of independent variables.
- It penalizes excessive use of unimportant independent variables.
Slide 20: Adjusted R² for the Pie Sales Example
  R̄² = .442
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
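The goodness-of-fit numbers for the pie sales example follow directly from SSR and SST (SSE is obtained as SST − SSR); a minimal check in Python:

```python
import math

SST = 56493.3    # total sum of squares (from the output)
SSR = 29460.0    # regression sum of squares
SSE = SST - SSR  # error sum of squares
n, K = 15, 2     # 15 weekly observations, 2 independent variables

r2 = SSR / SST                                       # coefficient of determination
adj_r2 = 1 - (SSE / (n - K - 1)) / (SST / (n - 1))   # adjusted R-squared
s_e = math.sqrt(SSE / (n - K - 1))                   # standard error of the estimate

print(round(r2, 5), round(adj_r2, 3), round(s_e, 2))
```

Note how the adjustment divides each sum of squares by its degrees of freedom before forming the ratio, which is what penalizes extra variables.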
Slide 21: Coefficient of Multiple Correlation
The coefficient of multiple correlation is the correlation between the predicted value and the observed value of the dependent variable:
  R = r(ŷ, y) = √R²
- It is the square root of the multiple coefficient of determination.
- It is used as another measure of the strength of the linear relationship between the dependent variable and the independent variables.
- It is comparable to the correlation between Y and X in simple regression.
Slide 22: Evaluating Individual Regression Coefficients (12.4)
Use t-tests for individual coefficients: each test shows whether a given x variable is conditionally important, given the other variables in the model.
Slide 23: Evaluating Individual Regression Coefficients (continued)
Test statistic for H0: β_j = 0:
  t = b_j / s_{b_j}   (df = n − K − 1)
Slide 24: Evaluating Individual Regression Coefficients (continued)
- t-value for Price is t = −2.306, with p-value .0398
- t-value for Advertising is t = 2.855, with p-value .0145
Slide 25: Evaluating Individual Regression Coefficients (continued)
From the Excel output: the test statistic for each variable falls in the rejection region (both p-values < .05), so we reject H0 for each variable.
There is evidence that both Price and Advertising affect pie sales at α = .05.
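These t statistics and two-tailed p-values can be reproduced from the coefficients and standard errors in the Excel output using SciPy's t distribution; a quick sketch:

```python
from scipy import stats

df = 15 - 2 - 1  # n - K - 1 = 12 degrees of freedom

# Coefficients and standard errors from the regression output
b_price, s_price = -24.97509, 10.83213
b_adv, s_adv = 74.13096, 25.96732

t_price = b_price / s_price                  # test statistic for H0: beta_1 = 0
t_adv = b_adv / s_adv                        # test statistic for H0: beta_2 = 0
p_price = 2 * stats.t.sf(abs(t_price), df)   # two-tailed p-value
p_adv = 2 * stats.t.sf(abs(t_adv), df)

print(round(t_price, 3), round(p_price, 4))
print(round(t_adv, 3), round(p_adv, 4))
```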
Slide 26: Confidence Interval Estimate for the Slope
Confidence interval limits for the population slope β_j:
  b_j ± t_{n−K−1, α/2} · s_{b_j}
where t has (n − K − 1) d.f.; here, t has (15 − 2 − 1) = 12 d.f.
Example: form a 95% confidence interval for the effect of changes in price (x1) on pie sales:
  −24.975 ± (2.1788)(10.832)
So the interval is −48.576 < β1 < −1.374.
Slide 27: Confidence Interval Estimate for the Slope (continued)
Example: the Excel output also reports these interval endpoints:

               Coefficients  Standard Error  …  Lower 95%   Upper 95%
  Intercept      306.52619       114.25389   …   57.58835   555.46404
  Price          -24.97509        10.83213   …  -48.57626    -1.37392
  Advertising     74.13096        25.96732   …   17.55303   130.70888

Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of $1 in the selling price.
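The interval endpoints in the Excel output can be reproduced with SciPy; this sketch recomputes the 95% limits for the Price slope:

```python
from scipy import stats

df = 15 - 2 - 1                  # 12 degrees of freedom
t_crit = stats.t.ppf(0.975, df)  # about 2.1788 for a 95% interval

b1, s_b1 = -24.97509, 10.83213   # Price coefficient and its standard error
lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(round(lower, 3), round(upper, 3))
```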
Slide 28: Test on All Coefficients (12.5)
The F-test for overall significance shows whether there is a linear relationship between all of the X variables considered together and Y.
  H0: β1 = β2 = … = βK = 0 (no linear relationship)
  H1: at least one β_i ≠ 0 (at least one independent variable affects Y)
Slide 29: F-Test for Overall Significance
Test statistic:
  F = MSR / MSE = [SSR / K] / [SSE / (n − K − 1)]
where F has K (numerator) and (n − K − 1) (denominator) degrees of freedom.
Decision rule: reject H0 if F > F_{K, n−K−1, α}.
Slide 30: F-Test for Overall Significance (continued)
  F = MSR / MSE = 14730.0 / 2252.8 = 6.5386
with 2 and 12 degrees of freedom.
Slide 31: F-Test for Overall Significance (continued)
Critical value: F_{2,12,.05} = 3.885. Since F = 6.5386 > 3.885, reject H0.
There is evidence that at least one independent variable affects Y.
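The F statistic and its critical value can be checked from the sums of squares already reported; a small sketch using SciPy's F distribution:

```python
from scipy import stats

SST, SSR = 56493.3, 29460.0
SSE = SST - SSR
n, K = 15, 2

MSR = SSR / K             # mean square due to regression
MSE = SSE / (n - K - 1)   # mean square error
F = MSR / MSE             # overall-significance test statistic

# Critical value at alpha = .05 with (K, n - K - 1) degrees of freedom
F_crit = stats.f.ppf(0.95, K, n - K - 1)
print(round(F, 4), round(F_crit, 3))
```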
Slide 32: Tests on a Subset of Regression Coefficients
Consider a multiple regression model involving variables x_j and z_j:
  y_i = β0 + β1·x_1i + … + βK·x_Ki + α1·z_1i + … + αr·z_ri + ε_i
and the null hypothesis that the z variable coefficients are all zero:
  H0: α1 = α2 = … = αr = 0
  H1: at least one α_j ≠ 0   (j = 1, …, r)
Slide 33: Tests on a Subset of Regression Coefficients (continued)
Goal: compare the error sum of squares for the complete model with the error sum of squares for the restricted model.
- First run a regression for the complete model and obtain SSE.
- Next run a restricted regression that excludes the z variables (the number of variables excluded is r) and obtain the restricted error sum of squares SSE(r).
- Compute the F statistic and apply the decision rule for significance level α:
  reject H0 if F = [(SSE(r) − SSE) / r] / s_e² > F_{r, n−K−r−1, α}
Slide 34: Predictions
Given a population regression model
  y_i = β0 + β1·x_1i + β2·x_2i + … + βK·x_Ki + ε_i   (i = 1, 2, …, n)
then, given a new observation of a data point (x_{1,n+1}, x_{2,n+1}, …, x_{K,n+1}), the best linear unbiased forecast of y_{n+1} is
  ŷ_{n+1} = b0 + b1·x_{1,n+1} + b2·x_{2,n+1} + … + bK·x_{K,n+1}
It is risky to forecast for new X values outside the range of the data used to estimate the model coefficients, because we do not have data to support that the linear model extends beyond the observed range.
Slide 35: Using the Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
  Sales = 306.526 − 24.975(Price) + 74.131(Advertising)
        = 306.526 − 24.975(5.50) + 74.131(3.5)
        = 428.62
Predicted sales is 428.62 pies. Note that Advertising is in $100s, so $350 means Advertising = 3.5.
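The calculation above is simple enough to script; a one-function sketch of the fitted equation:

```python
def predict_sales(price, advertising_100s):
    """Predicted weekly pie sales from the fitted equation."""
    return 306.526 - 24.975 * price + 74.131 * advertising_100s

# $350 of advertising is 3.5 hundreds of dollars
print(round(predict_sales(5.50, 3.5), 2))  # 428.62
```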
Slide 36: Nonlinear Regression Models (12.7)
The relationship between the dependent variable and an independent variable may not be linear; some nonlinear relationships can still be fit within the regression framework.
Example: quadratic model
  Y = β0 + β1·X + β2·X² + ε
The second independent variable is the square of the first variable.
Slide 37: Quadratic Regression Model
Model form:
  Y_i = β0 + β1·X_1i + β2·X_1i² + ε_i
where:
  β0 = Y intercept
  β1 = regression coefficient for the linear effect of X on Y
  β2 = regression coefficient for the quadratic effect on Y
  ε_i = random error in Y for observation i
Slide 38: Linear vs. Nonlinear Fit
[Figure: two residual plots. A linear fit to curved data does not give random residuals; a nonlinear fit gives random residuals]
Slide 39: Quadratic Regression Model (continued)
Quadratic models may be considered when the scatter diagram takes on one of the following shapes:
[Figure: four scatter shapes of Y against X1 for the model Y_i = β0 + β1·X_1i + β2·X_1i² + ε_i, covering the sign combinations β1 < 0 or β1 > 0 with β2 > 0 or β2 < 0; β1 is the coefficient of the linear term, β2 the coefficient of the squared term]
Slide 40: Testing for Significance: Quadratic Effect
Compare the linear regression estimate with the quadratic regression estimate.
Hypotheses:
  H0: β2 = 0 (the quadratic term does not improve the model)
  H1: β2 ≠ 0 (the quadratic term improves the model)
Slide 41: Testing for Significance: Quadratic Effect (continued)
Hypotheses:
  H0: β2 = 0 (the quadratic term does not improve the model)
  H1: β2 ≠ 0 (the quadratic term improves the model)
The test statistic is
  t = (b2 − β2) / s_{b2}   (df = n − 3)
where:
  b2 = squared-term slope coefficient
  β2 = hypothesized slope (zero)
  s_{b2} = standard error of the slope
Slide 42: Testing for Significance: Quadratic Effect (continued)
If H0: β2 = 0 is rejected, there is evidence that the quadratic model is a better model.
Slide 43: Example: Quadratic Model
Purity increases as filter time increases:
[Data table with columns Purity | Filter Time; the data rows are not reproduced here]
Slide 44: Example: Quadratic Model (continued)
Simple (linear) regression results: the t statistic, F statistic, and R² are all high, but the residuals are not random.
Slide 45: Example: Quadratic Model (continued)
[Regression output table with columns Coefficients | Standard Error | t Stat | P-value; the rows are not reproduced here]
The quadratic term is significant and improves the model: R² is higher and s_e is lower, and the residuals are now random.
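A sketch of the linear-vs-quadratic comparison with NumPy's polyfit, using hypothetical data with a built-in quadratic pattern (not the chapter's purity dataset):

```python
import numpy as np

# Hypothetical data with a clear quadratic pattern
x = np.linspace(1, 10, 20)
rng = np.random.default_rng(0)
y = 5 + 2 * x + 0.8 * x**2 + rng.normal(0, 1, x.size)

# Fit a straight line and a quadratic; compare residual sums of squares
lin = np.polyfit(x, y, 1)
quad = np.polyfit(x, y, 2)
sse_lin = np.sum((y - np.polyval(lin, x)) ** 2)
sse_quad = np.sum((y - np.polyval(quad, x)) ** 2)
print(sse_quad < sse_lin)  # True: the quadratic term improves the fit
```

Because the quadratic model nests the linear one, its SSE can never be larger; the t-test on b2 (previous slides) is what tells us whether the improvement is statistically significant.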
Slide 46: The Log Transformation
The multiplicative model:
- Original multiplicative model:
    Y = β0 · X1^β1 · X2^β2 · ε
- Transformed multiplicative model:
    log(Y) = log(β0) + β1·log(X1) + β2·log(X2) + log(ε)
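A sketch of the transformation in NumPy: the data below are hypothetical, generated exactly from a multiplicative model with exponents 1.5 and 0.8, so a linear least-squares fit on the logged variables recovers those exponents:

```python
import numpy as np

# Hypothetical multiplicative data: Y = 2 * X1^1.5 * X2^0.8 (no noise)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 2.0 * x1**1.5 * x2**0.8

# Regress log(Y) on log(X1) and log(X2) with an intercept
X = np.column_stack([np.ones_like(x1), np.log(x1), np.log(x2)])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(np.round(b[1:], 3))  # the slope coefficients (elasticities)
```

The fitted intercept estimates log(β0); exponentiating it recovers β0 itself.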
Slide 47: Interpretation of Coefficients
For the multiplicative model, when both the dependent and independent variables are logged, the coefficient of the independent variable X_k can be interpreted as follows: a 1 percent change in X_k leads to an estimated β_k percent change in the average value of Y (β_k is the elasticity of Y with respect to X_k).
Slide 48: Dummy Variables (12.8)
- A dummy variable is a categorical independent variable with two levels: yes or no, on or off, male or female. It is recorded as 0 or 1.
- The regression intercepts are different if the dummy variable is significant.
- For a categorical variable with more levels, the number of dummy variables needed is (number of levels − 1).
Slide 49: Dummy Variable Example
  ŷ = b0 + b1·x1 + b2·x2
Let:
  y = pie sales
  x1 = price
  x2 = holiday (X2 = 1 if a holiday occurred during the week; X2 = 0 if there was no holiday that week)
Slide 50: Dummy Variable Example (continued)
  Holiday (x2 = 1):     ŷ = b0 + b1·x1 + b2·(1) = (b0 + b2) + b1·x1
  No holiday (x2 = 0):  ŷ = b0 + b1·x1 + b2·(0) = b0 + b1·x1
The two lines have the same slope, b1, but different intercepts. If H0: β2 = 0 is rejected, "Holiday" has a significant effect on pie sales.
Slide 51: Interpreting the Dummy Variable Coefficient
Example:
  Sales = 300 − 30(Price) + 15(Holiday)
where:
  Sales: number of pies sold per week
  Price: pie price in $
  Holiday: 1 if a holiday occurred during the week, 0 if no holiday occurred
Sales are estimated to be 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price.
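The dummy-coefficient interpretation can be checked directly: holding price fixed, the predicted difference between a holiday and a non-holiday week equals the Holiday coefficient:

```python
def sales(price, holiday):
    # holiday = 1 if a holiday occurred during the week, else 0
    return 300 - 30 * price + 15 * holiday

# Same price, with and without a holiday: the gap is the dummy coefficient
diff = sales(6.0, 1) - sales(6.0, 0)
print(diff)  # 15
```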
Slide 52: Interaction Between Explanatory Variables
Hypothesizes interaction between pairs of x variables: the response to one x variable may vary at different levels of another x variable. The model contains a cross-product term:
  ŷ = b0 + b1·x1 + b2·x2 + b3·x3
    = b0 + b1·x1 + b2·x2 + b3·(x1·x2)
Slide 53: Effect of Interaction
Given
  E[Y] = β0 + β1·X1 + β2·X2 + β3·X1·X2
       = β0 + β2·X2 + (β1 + β3·X2)·X1
the effect of X1 on Y depends on the level of X2: the slope of X1 is β1 + β3·X2.
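Rewriting E[Y] = β0 + β1·X1 + β2·X2 + β3·X1·X2 as β0 + β2·X2 + (β1 + β3·X2)·X1 shows that the slope of X1 depends on X2; a tiny sketch with hypothetical coefficient values:

```python
# Hypothetical coefficients for E[Y] = b0 + b1*X1 + b2*X2 + b3*X1*X2
b0, b1, b2, b3 = 1.0, 2.0, 3.0, 4.0

def slope_x1(x2):
    # The slope of X1 at a given level of X2 is b1 + b3*x2
    return b1 + b3 * x2

print(slope_x1(0), slope_x1(1))  # 2.0 6.0
```

With b3 ≠ 0 the regression lines for different X2 levels are not parallel, which is exactly what the interaction term captures.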
Slide 55: Significance of Interaction Term
The coefficient b3 is an estimate of the difference in the coefficient of x1 when x2 = 1 compared to when x2 = 0. The t statistic for b3 can be used to test the hypothesis
  H0: β3 = 0, given β1 ≠ 0 and β2 ≠ 0
  H1: β3 ≠ 0, given β1 ≠ 0 and β2 ≠ 0
If we reject the null hypothesis, we conclude that there is a difference in the slope coefficient for the two subgroups.
Slide 56: Multiple Regression Assumptions (12.9)
Errors (residuals) from the regression model:
  e_i = y_i − ŷ_i
The residuals are used to check the error-term assumptions listed earlier (zero mean, constant variance, no correlation).
Slide 58: Chapter Summary
- Developed the multiple regression model
- Tested the significance of the multiple regression model
- Discussed adjusted R² (R̄²)
- Tested individual regression coefficients
- Tested portions of the regression model
- Used quadratic terms and log transformations in regression models
- Explained dummy variables
- Evaluated interaction effects
- Discussed using residual plots to check model assumptions