Chapter 4 Linear Regression and Correlation Analysis
1 Introduction to regression analysis
Regression analysis is used to:
- Describe a relationship between two variables in mathematical terms.
- Predict the value of a dependent variable based on the value of at least one independent variable.
- Explain the impact of changes in an independent variable on the dependent variable.
Independent variable: the variable used to explain the dependent variable.
Names for y and x in regression
- y: regressand, effect variable, explained variable
- x: regressors, causal variables, explanatory variables
Simple Linear Regression Model
- Only one independent variable, x.
- The relationship between x and y is described by a linear function.
- Changes in y are assumed to be caused by changes in x.
Types of regression models (figure panels): positive linear relationship, negative linear relationship, non-linear relationship, no relationship.
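Population Linear Regression
The population regression model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where y is the dependent variable, x is the independent variable, $\beta_0$ is the population y-intercept, $\beta_1$ is the population slope coefficient, and $\varepsilon$ is the random error term, or residual. The term $\beta_0 + \beta_1 x$ is the linear component and $\varepsilon$ is the random error component.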
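To make the population model concrete, the sketch below simulates observations from $y = \beta_0 + \beta_1 x + \varepsilon$. The parameter values, the uniform range for x, and the normally distributed errors are hypothetical choices for illustration, not values taken from the slides.

```python
import random

# Hypothetical population parameters (illustration only).
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0

def simulate(n, seed=0):
    """Draw n points from y = BETA0 + BETA1 * x + eps, with eps ~ N(0, SIGMA^2) (assumed)."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    eps = [rng.gauss(0.0, SIGMA) for _ in range(n)]
    # Linear component plus random error component.
    y = [BETA0 + BETA1 * xi + ei for xi, ei in zip(x, eps)]
    return x, y, eps

x, y, eps = simulate(200)
print(sum(eps) / len(eps))  # sample mean of the errors, expected to be near 0
```

In practice only x and y are observed; $\beta_0$, $\beta_1$ and the individual errors are unknown, which is why they have to be estimated from the sample.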
Linear regression assumptions:
- The underlying relationship between the x variable and the y variable is linear.
- The probability distribution of the errors has constant variance.
Figure: population linear regression line $y = \beta_0 + \beta_1 x$, showing the random error $\varepsilon$ for a given x value.
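Estimated Regression Model
The sample regression line provides an estimate of the population regression line:
$$\hat{y} = b_0 + b_1 x$$
where $\hat{y}$ is the estimated (or predicted) y value, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and x is the independent variable. The individual random error terms $e_i$ have a mean of zero.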
Least Squares Criterion
$b_0$ and $b_1$ are obtained by finding the values that minimize the sum of the squared residuals:
$$\sum e^2 = \sum (y - \hat{y})^2 = \sum \big(y - (b_0 + b_1 x)\big)^2$$
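The criterion can be checked by brute force: evaluate the sum of squared residuals for many candidate $(b_0, b_1)$ pairs and keep the smallest. The sketch below does this on a coarse grid with hypothetical data; it only illustrates the criterion, since in practice $b_0$ and $b_1$ come from the closed-form least squares equations.

```python
def sse(b0, b1, x, y):
    """Sum of squared residuals: sum of (y - (b0 + b1 * x))^2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data (illustration only).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

# Candidate intercepts -1.0..1.0 and slopes 1.0..3.0, in steps of 0.1.
candidates = [(i / 10, j / 10) for i in range(-10, 11) for j in range(10, 31)]
best_b0, best_b1 = min(candidates, key=lambda b: sse(b[0], b[1], x, y))
print(best_b0, best_b1)  # 0.0 1.9
```

For this data the grid happens to contain the exact least squares solution; on other data a grid search only approximates it.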
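The Least Squares Equations
The formulas for $b_1$ and $b_0$ are:
$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
or, equivalently,
$$b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
and
$$b_0 = \bar{y} - b_1 \bar{x}$$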
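A minimal Python sketch of the least squares equations: it computes $b_1$ with the deviation form $\sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$, cross-checks it against the computational (sums) form, and then sets $b_0 = \bar{y} - b_1 \bar{x}$. The data are hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) using the deviation form of b1 and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def slope_from_sums(x, y):
    """Equivalent computational form: (sum xy - sum x * sum y / n) / (sum x^2 - (sum x)^2 / n)."""
    n = len(x)
    num = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    den = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    return num / den

x = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 8.0]
b0, b1 = least_squares(x, y)
print(b0, b1, slope_from_sums(x, y))
```

Both forms give the same slope; the deviation form is usually the safer choice numerically, while the sums form mirrors hand calculation but can lose precision when the x values are large.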
Interpretation of the Slope and the Intercept
- $b_0$ is the estimated average value of y when the value of x is zero.
- $b_1$ is the estimated change in the average value of y as a result of a one-unit change in x.
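The two interpretations can be read directly off a fitted line. The sketch below uses hypothetical estimates $b_0 = 50$ and $b_1 = 0.12$ for a house-price model (price in thousands of dollars, size in square feet); they are not fitted to the slides' data.

```python
# Hypothetical estimates for illustration (not fitted to the slides' data):
# y = house price in thousands of dollars, x = size in square feet.
b0, b1 = 50.0, 0.12

def predict(x):
    """Estimated average price (thousands of dollars) for a house of x square feet."""
    return b0 + b1 * x

print(predict(0))                     # b0: estimated average y when x = 0
print(predict(1801) - predict(1800))  # b1: change in estimated average y for a one-unit change in x
```

With these hypothetical numbers, $b_1 = 0.12$ thousand dollars means about 120 dollars per additional square foot.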
Example: A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected.
Sample Data for House Price Model
Table columns: House Price in $1000s (y) and Square Feet (x).
Least Squares Regression Properties
- The sum of the residuals from the least squares regression line is 0: $\sum (y - \hat{y}) = 0$.
- The sum of the squared residuals is a minimum: $\sum (y - \hat{y})^2$ is minimized.
- The simple regression line always passes through the mean of the y variable and the mean of the x variable: $\bar{y} = b_0 + b_1 \bar{x}$.
- The least squares coefficients are unbiased estimates of $\beta_0$ and $\beta_1$.
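The first three properties can be checked numerically on any sample. The sketch below fits a line to hypothetical data and confirms that the residuals sum to about zero, that the line passes through $(\bar{x}, \bar{y})$, and that nudging either coefficient increases the sum of squared residuals. Unbiasedness is a statement about repeated sampling and cannot be checked from a single sample.

```python
def least_squares(x, y):
    """Simple regression estimates (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

def sse(b0, b1, x, y):
    """Sum of squared residuals for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data
y = [1.2, 1.9, 3.2, 3.8, 5.1]
b0, b1 = least_squares(x, y)
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(sum(residuals))              # about 0, up to floating-point rounding
print(b0 + b1 * x_bar - y_bar)     # about 0: the line passes through (x_bar, y_bar)
# Nudging either coefficient can only increase the sum of squared residuals.
print(sse(b0, b1, x, y) < min(sse(b0 + 0.01, b1, x, y), sse(b0, b1 - 0.01, x, y)))  # True
```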
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. The coefficient of determination is also called R-squared and is denoted as $R^2$.
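$$R^2 = \frac{RSS}{TSS}, \qquad 0 \le R^2 \le 1$$
where $TSS = \sum (y - \bar{y})^2$ is the total sum of squares and $RSS = \sum (\hat{y} - \bar{y})^2$ is the regression sum of squares.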
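Examples of approximate $R^2$ values (figure): $R^2 = 1$. All points lie on the regression line, a perfect linear relationship between x and y, so all of the variation in y is explained by variation in x.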
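Examples of approximate $R^2$ values (figure): $0 < R^2 < 1$. A weaker linear relationship between x and y: some, but not all, of the variation in y is explained by variation in x.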
Examples of approximate $R^2$ values (figure): $R^2 = 0$. No linear relationship between x and y: none of the variation in y is explained by variation in x.
Coefficient of determination
$$R^2 = \frac{RSS}{TSS}$$
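A short sketch of $R^2 = RSS/TSS$ for a simple least squares line, with RSS taken as the regression sum of squares $\sum (\hat{y} - \bar{y})^2$ as in these notes (many texts use RSS for the residual sum of squares instead, so check the convention). The data are hypothetical.

```python
def r_squared(x, y):
    """R^2 = RSS / TSS for a simple least squares line (RSS = regression sum of squares)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    rss = sum((yh - y_bar) ** 2 for yh in y_hat)   # regression sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
    return rss / tss

x = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 8.0]
print(r_squared(x, y))                       # 18.05 / 18.75, about 0.963
print(r_squared(x, [3.0, 5.0, 7.0, 9.0]))    # points exactly on a line: about 1.0
```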
2 Correlation analysis
Correlation is a technique used to measure the strength of the relationship between two variables. The stronger the correlation, the stronger the relationship and the better the regression line fits the data; the weaker the correlation, the worse the fit.
Scatter plot examples (figures).
The correlation coefficient (r)
The correlation coefficient is used to measure the strength of the linear relationship between two variables. The product moment correlation coefficient is calculated using the formula:
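$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$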
where r is the simple correlation coefficient and n is the number of (x, y) pairs.
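A minimal sketch of the product moment correlation using the deviation form of the formula. The data are hypothetical; the extra calls illustrate that r is unit free (rescaling x leaves it unchanged) and reaches -1 for points on a downward-sloping line.

```python
import math

def pearson_r(x, y):
    """Product moment correlation coefficient, deviation form."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 8.0]
print(pearson_r(x, y))                              # about 0.981
print(pearson_r([100 * v for v in x], y))           # same value up to rounding: r is unit free
print(pearson_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0]))  # -1.0: perfect negative linear relationship
```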
Features of r
- Unit free.
- Ranges between -1 and 1.
- The closer to -1, the stronger the negative linear relationship.
- The closer to 1, the stronger the positive linear relationship.
- The closer to 0, the weaker the linear relationship.
Examples of approximate r values (figure panels include r = +0.3 and r = +1).
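Example calculation
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$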
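A hand calculation with this formula usually tabulates $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and $\sum y^2$ first. The sketch below follows the same steps on hypothetical data and returns those column sums alongside r.

```python
import math

def r_from_sums(x, y):
    """Correlation via the computational formula; also returns the column sums used."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return num / den, (sum_x, sum_y, sum_xy, sum_x2, sum_y2)

x = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 8.0]
r, sums = r_from_sums(x, y)
print(sums)  # (10.0, 19.0, 57.0, 30.0, 109.0)
print(r)     # 38 / sqrt(20 * 75), about 0.981
```

This form is convenient by hand, but with large values the subtractions in the numerator and denominator can lose precision, so software typically prefers the deviation form.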
Estimate $b_0$ and $b_1$.
Linear regression equation
… and $b_1$?
Coefficient of determination and correlation coefficient
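In simple linear regression the two measures are linked: $R^2 = r^2$, and r has the same sign as $b_1$. The sketch below checks this on hypothetical data with a negative slope.

```python
import math

def r2_and_r(x, y):
    """Return (R^2 = RSS/TSS, r from the product moment formula) for simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)              # TSS
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)    # regression sum of squares
    return rss / syy, sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [9.0, 7.5, 6.8, 4.9, 4.1]   # decreasing, so b1 < 0
r2, r = r2_and_r(x, y)
print(r2, r * r)  # equal up to rounding
print(r)          # negative, matching the sign of b1
```

This identity holds for one independent variable; in multiple regression $R^2$ is not the square of any single simple correlation with y.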
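The Multiple Regression Model
Idea: examine the linear relationship between one dependent variable (y) and two or more independent variables ($x_i$).
Population model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
Estimated multiple regression model:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
where $b_0$ is the estimated intercept and $b_1, \dots, b_k$ are the estimated slope coefficients.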
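Estimates $b_0, b_1, b_2, \dots, b_k$
The least squares estimates solve the normal equations:
$$\begin{aligned}
\sum y &= n b_0 + b_1 \sum x_1 + b_2 \sum x_2 + \cdots + b_k \sum x_k \\
\sum x_1 y &= b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + \cdots + b_k \sum x_1 x_k \\
\sum x_2 y &= b_0 \sum x_2 + b_1 \sum x_2 x_1 + b_2 \sum x_2^2 + \cdots + b_k \sum x_2 x_k \\
&\ \ \vdots \\
\sum x_k y &= b_0 \sum x_k + b_1 \sum x_k x_1 + b_2 \sum x_k x_2 + \cdots + b_k \sum x_k^2
\end{aligned}$$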
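The normal equations form a $(k+1) \times (k+1)$ linear system. The sketch below builds that system from the data and solves it with Gaussian elimination (partial pivoting) in plain Python. The data are hypothetical and generated without noise from $y = 1 + 2x_1 + 3x_2$, so the solution should recover those coefficients.

```python
def normal_equations(xs, y):
    """Build the normal-equation system for the columns [1, x1, ..., xk]."""
    n = len(y)
    cols = [[1.0] * n] + [list(col) for col in xs]   # column of 1s gives the b0 terms
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    c = [sum(a * yi for a, yi in zip(ci, y)) for ci in cols]
    return A, c

def solve(A, c):
    """Solve A b = c by Gaussian elimination with partial pivoting."""
    m = len(c)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for j in range(col, m + 1):
                M[r][j] -= factor * M[col][j]
    b = [0.0] * m
    for i in range(m - 1, -1, -1):   # back substitution
        b[i] = (M[i][m] - sum(M[i][j] * b[j] for j in range(i + 1, m))) / M[i][i]
    return b

# Hypothetical noise-free data from y = 1 + 2*x1 + 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(solve(*normal_equations([x1, x2], y)))   # approximately [1.0, 2.0, 3.0]
```

Numerical libraries usually avoid forming the normal equations explicitly and use QR or similar factorizations for better stability; the direct approach is shown here because it mirrors the equations on the slide.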
Figure: two-variable model $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$, with $b_1$ the slope for variable $x_1$ and $b_2$ the slope for variable $x_2$.
Figure: sample observations $(x_{1i}, x_{2i}, y_i)$ around the fitted model. The best-fit equation $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$ is found by minimizing the sum of squared errors, $\sum e^2$.
Multiple Regression Assumptions
Errors (residuals) from the regression model: $e = y - \hat{y}$
- The errors are normally distributed.
- The mean of the errors is zero.
- Errors have a constant variance.
- The model errors are independent.
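The residuals $e = y - \hat{y}$ are the sample counterparts of the model errors. The sketch below fits a two-variable model to hypothetical data (solving the centered normal equations by Cramer's rule, a choice made here for brevity) and computes the residuals with their mean and spread.

```python
import statistics

def fit_two_predictors(x1, x2, y):
    """Least squares (b0, b1, b2) for y_hat = b0 + b1*x1 + b2*x2, via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2          # must be nonzero (predictors not collinear)
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]     # hypothetical data
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [9.1, 7.8, 19.2, 17.9, 29.0, 26.1]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
residuals = [yi - (b0 + b1 * a + b2 * b) for a, b, yi in zip(x1, x2, y)]   # e = y - y_hat
print(statistics.mean(residuals))    # about 0 by construction of least squares
print(statistics.pstdev(residuals))  # spread of the residuals
```

A near-zero residual mean is guaranteed by least squares with an intercept, so it says little about the assumptions on its own; normality, constant variance and independence are usually assessed with residual plots or formal tests, which this sketch does not perform.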
Example data table with columns: Week, Pie Sales, Price ($), Advertising ($100s).
Estimated (predicted) regression equation:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$
Multiple Coefficient of Determination
Reports the proportion of total variation in y explained by all x variables taken together:
$$R^2 = \frac{RSS}{TSS} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
Multiple correlation (R)
Multiple correlation provides a measure of the overall strength of the relationship between the dependent variable and the independent variables. It is defined as the positive square root of the coefficient of determination:
$$R = \sqrt{R^2}$$
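The sketch below computes the multiple $R^2 = RSS/TSS$ and $R = \sqrt{R^2}$ for a two-variable least squares fit on hypothetical data. It also checks a known identity: for least squares with an intercept, R equals the simple correlation between y and the fitted values $\hat{y}$.

```python
import math

def fit_two_predictors(x1, x2, y):
    """Least squares (b0, b1, b2) via centered normal equations (Cramer's rule)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1, d2, dy = [v - m1 for v in x1], [v - m2 for v in x2], [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def pearson_r(a, b):
    """Product moment correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]     # hypothetical data
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [9.1, 7.8, 19.2, 17.9, 29.0, 26.1]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
y_hat = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

y_bar = sum(y) / len(y)
rss = sum((yh - y_bar) ** 2 for yh in y_hat)   # regression sum of squares
tss = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
r2 = rss / tss
R = math.sqrt(r2)                              # multiple correlation
print(r2, R)
print(pearson_r(y, y_hat))                     # equals R up to rounding
```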
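Correlation matrix
Provides measures of the strength of the relationship between the dependent variable and each independent variable, and between pairs of independent variables:
$$\begin{array}{c|ccc} & y & x_1 & x_2 \\ \hline y & 1 & & \\ x_1 & r_{x_1 y} & 1 & \\ x_2 & r_{x_2 y} & r_{x_1 x_2} & 1 \end{array}$$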
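A sketch that builds the correlation matrix for y, $x_1$ and $x_2$ from hypothetical data and prints its lower triangle. Besides each predictor's correlation with y, the $r_{x_1 x_2}$ entry shows how strongly the two independent variables are related to each other.

```python
import math

def pearson_r(a, b):
    """Product moment correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

# Hypothetical data.
data = {
    "y": [9.1, 7.8, 19.2, 17.9, 29.0, 26.1],
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
}
names = list(data)
matrix = {(p, q): pearson_r(data[p], data[q]) for p in names for q in names}

# Print the lower triangle as a small lower-triangular table.
print("     " + "".join(f"{q:>8}" for q in names))
for i, p in enumerate(names):
    cells = "".join(f"{matrix[(p, q)]:8.3f}" for q in names[: i + 1])
    print(f"{p:<5}{cells}")
```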