Slide 1: Statistics for Business and Economics
7th Edition
Chapter 11
Simple Regression
Slide 2: Chapter Goals
After completing this chapter, you should be able to:
equation for a set of data
Slide 4: Overview of Linear Models
An equation can be fit to show the best linear
relationship between two variables:
$$Y = \beta_0 + \beta_1 X$$
where Y is the dependent variable and
X is the independent variable;
$\beta_0$ is the Y-intercept and $\beta_1$ is the slope
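Slide 5: Least Squares Regression
Estimates for the coefficients $\beta_0$ and $\beta_1$ are found
using a least squares regression technique
The least squares regression line, based on sample data, is
$$\hat{y} = b_0 + b_1 x$$
where $b_1$ is the slope of the line and $b_0$ is the
y-intercept:
$$b_1 = \frac{\mathrm{Cov}(x, y)}{s_x^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$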
Slide 6: Introduction to Regression Analysis
Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to explain
(also called the endogenous variable)
Independent variable: the variable used to explain the dependent variable
(also called the exogenous variable)
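Slide 7: Linear Regression Model
The relationship between X and Y is described by a linear function
Changes in Y are assumed to be caused by changes in X
The population linear regression model is
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where $\beta_0$ and $\beta_1$ are the population model coefficients and $\varepsilon_i$ is a random error term.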
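Slide 8: Simple Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
$Y_i$: dependent variable
$X_i$: independent variable
$\beta_0$, $\beta_1$: population intercept and slope coefficients
$\varepsilon_i$: random error term
$\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component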
Slide 9: Simple Linear Regression Model (continued)
[Figure: population regression line $Y = \beta_0 + \beta_1 X$, showing the random error $\varepsilon_i$ for a given $X_i$ value]
Slide 10: Simple Linear Regression Equation
The simple linear regression equation provides an
estimate of the population regression line:
$$\hat{y}_i = b_0 + b_1 x_i$$
$\hat{y}_i$: estimated (or predicted) y value for observation i
$b_0$: estimate of the regression intercept
$b_1$: estimate of the regression slope
$x_i$: value of x for observation i
The individual random error terms $e_i$ have a mean of zero:
$$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$$
Slide 11: Least Squares Estimators
$b_0$ and $b_1$ are obtained by finding the values
of $b_0$ and $b_1$ that minimize the sum of the squared differences between $y$ and $\hat{y}$:
$$\min \mathrm{SSE} = \min \sum e_i^2 = \min \sum (y_i - \hat{y}_i)^2 = \min \sum \left[ y_i - (b_0 + b_1 x_i) \right]^2$$
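As a sketch of why this minimization has a closed-form solution (a standard calculus step that the slide does not spell out), setting the two partial derivatives of SSE to zero gives:

```latex
\begin{aligned}
\frac{\partial\,\mathrm{SSE}}{\partial b_0}
  &= -2\sum_{i=1}^{n}\bigl[y_i - (b_0 + b_1 x_i)\bigr] = 0
  &&\Rightarrow\quad b_0 = \bar{y} - b_1\bar{x},\\
\frac{\partial\,\mathrm{SSE}}{\partial b_1}
  &= -2\sum_{i=1}^{n} x_i\bigl[y_i - (b_0 + b_1 x_i)\bigr] = 0
  &&\Rightarrow\quad b_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}.
\end{aligned}
```

The second result follows after substituting $b_0 = \bar{y} - b_1\bar{x}$ into the second equation.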
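Slide 12: Least Squares Estimators (continued)
The slope coefficient estimator is
$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{Cov}(x, y)}{s_x^2} = r_{xy} \frac{s_y}{s_x}$$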
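The estimator formulas can be sketched in plain Python. The ten data pairs below are an assumption: the sample data table did not survive in this copy of the slides, and these are the values commonly circulated with this house price example. The function name `least_squares` is hypothetical. With these values the fit matches the coefficients quoted in the Excel output (b0 ≈ 98.248, b1 ≈ 0.10977).

```python
# Least squares slope and intercept via b1 = Cov(x, y) / s_x^2 and b0 = ybar - b1 * xbar.
# ASSUMPTION: these (square feet, price in $1000s) pairs are not in the slide text;
# they are an assumed reconstruction of the 10-house sample.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares(x, y):
    """Return (b0, b1) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and sample variance, both with an n - 1 divisor.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = cov_xy / var_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(square_feet, price)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
```

The n - 1 divisors cancel in the ratio, so the same slope results from the raw sums of cross-products and squares.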
Slide 13: Finding the Least Squares Equation
The coefficients $b_0$ and $b_1$, and other
regression results in this chapter, will be found using a computer
Hand calculations are tedious
Statistical routines are built into Excel
Other statistical analysis software can be used
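Slide 14: Linear Regression Model Assumptions
The true relationship form is linear (Y is a linear function
of X, plus random error)
The error terms are random variables with mean 0 and constant variance $\sigma^2$
(the constant variance property is called homoscedasticity):
$$E[\varepsilon_i] = 0 \quad \text{and} \quad E[\varepsilon_i^2] = \sigma^2 \quad \text{for } (i = 1, \ldots, n)$$
The random error terms $\varepsilon_i$ are not correlated with one
another, so that
$$E[\varepsilon_i \varepsilon_j] = 0 \quad \text{for all } i \neq j$$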
Slide 15: Interpretation of the Slope and the Intercept
$b_0$ is the estimated average value of y when the value of x is zero (if x = 0 is
in the range of observed x values)
$b_1$ is the estimated change in the average value of y as a result of a one-unit change in x
Slide 16: Simple Linear Regression Example
A real estate agent wishes to examine the
relationship between the selling price of a home and its size (measured in square feet)
A random sample of 10 houses is selected
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet
Slide 17: Sample Data for House Price Model
[Table: house price in $1000s (y) and square feet (x) for the 10 sampled houses]
Slide 18: Graphical Presentation
House price model: scatter plot
[Figure: scatter plot of house price ($1000s) against square feet]
Slide 19: Regression Using Excel
Excel will be used to generate the coefficients and measures of goodness of fit for regression
Data / Data Analysis / Regression
Slide 20: Regression Using Excel (continued)
Data / Data Analysis / Regression
Provide desired input:
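Slide 21: Excel Output
Slide 22: Excel Output (continued)
house price = 98.24833 + 0.10977 (square feet)
Slide 23
[Figure: house price scatter plot with the estimated regression line]
house price = 98.24833 + 0.10977 (square feet)
Slide 24
house price = 98.24833 + 0.10977 (square feet)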
Slide 25: Interpretation of the Slope Coefficient, $b_1$
$b_1$ measures the estimated change in the
average value of Y as a result of a
one-unit change in X
house price = 98.24833 + 0.10977 (square feet)
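A small worked example of this interpretation, using the slope from the Excel output. Because price is measured in $1000s, the slope is converted to dollars. The variable names are illustrative only.

```python
# Slope from the Excel output: change in price ($1000s) per additional square foot.
b1 = 0.10977

# Convert from $1000s to dollars to read the slope in everyday units.
dollars_per_sq_ft = b1 * 1000
# Estimated change in average price for a 100 square foot difference in size.
dollars_per_100_sq_ft = dollars_per_sq_ft * 100

print(f"${dollars_per_sq_ft:.2f} per additional square foot")
print(f"${dollars_per_100_sq_ft:,.2f} per additional 100 square feet")
```

So each additional square foot is associated with an estimated increase of about $109.77 in the average house price.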
Slide 26: Measures of Variation
Total variation is made up of two parts:
$$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$$
$$\mathrm{SST} = \sum (y_i - \bar{y})^2, \qquad \mathrm{SSR} = \sum (\hat{y}_i - \bar{y})^2, \qquad \mathrm{SSE} = \sum (y_i - \hat{y}_i)^2$$
Slide 27: Measures of Variation (continued)
SST = total sum of squares
Measures the variation of the $y_i$ values around their mean, $\bar{y}$
SSR = regression sum of squares
Explained variation attributable to the linear relationship between x and y
SSE = error sum of squares
Variation attributable to factors other than the linear relationship between x and y
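The decomposition can be checked numerically. This is a minimal sketch in plain Python. The data are an assumption: the sample data table is missing from this copy of the slides, and these are the values commonly used with this example. With them, SST and SSR match the figures that appear in the Excel output.

```python
# ASSUMPTION: data pairs are an assumed reconstruction of the 10-house sample,
# not taken from the slide text.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least squares fit.
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in square_feet]

# The three sums of squares.
sst = sum((y - y_bar) ** 2 for y in price)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)              # explained by the regression
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))   # unexplained (error)

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```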
Slide 29: Coefficient of Determination, $R^2$
The coefficient of determination is the portion
of the total variation in the dependent variable that is explained by variation in the
independent variable
The coefficient of determination is also called
R-squared and is denoted as $R^2$
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
note: $0 \le R^2 \le 1$
Slide 30: Examples of Approximate $r^2$ Values
[Figure: example scatter plots of Y against X]
Slide 31: Examples of Approximate $r^2$ Values
[Figure: example scatter plots of Y against X]
Slide 32: Examples of Approximate $r^2$ Values
$r^2 = 0$
No linear relationship between X and Y:
The value of Y does not depend on X (None of the variation in Y is explained
by variation in X)
[Figure: scatter plot of Y against X with $r^2 = 0$]
Slide 33
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{18934.9348}{32600.5000} = 0.58082$$
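The same ratio as a short Python check, using only the two sums of squares quoted in the Excel output:

```python
# Sums of squares as quoted in the Excel output.
ssr = 18934.9348   # regression sum of squares
sst = 32600.5000   # total sum of squares

r_squared = ssr / sst
print(f"R^2 = {r_squared:.5f}")
print(f"{r_squared:.1%} of the variation in price is explained by square feet")
```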
Slide 34: Correlation and $R^2$
The coefficient of determination, $R^2$, for a
simple regression is equal to the simple correlation squared:
$$R^2 = r_{xy}^2$$
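Going the other way, the correlation can be recovered from $R^2$ if the sign is taken from the slope, since squaring discards the direction of the relationship. A minimal sketch using the house price figures from the Excel output:

```python
import math

r_squared = 0.58082   # coefficient of determination from the Excel output
b1 = 0.10977          # estimated slope; supplies the sign of the correlation

# r has the same sign as the slope, so copy the sign of b1 onto sqrt(R^2).
r = math.copysign(math.sqrt(r_squared), b1)
print(f"r = {r:.4f}")
```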
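Slide 35
An estimator for the variance of the population model error is
$$\hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{\mathrm{SSE}}{n - 2}$$
and the standard error of the estimate is $s_e = \sqrt{s_e^2}$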
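For the house price example, the standard error of the estimate can be derived from the sums of squares in the Excel output, with SSE obtained as SST minus SSR. A minimal sketch:

```python
import math

sst = 32600.5000   # total sum of squares (Excel output)
ssr = 18934.9348   # regression sum of squares (Excel output)
n = 10             # number of houses in the sample

sse = sst - ssr                # error sum of squares
s_e_squared = sse / (n - 2)    # two estimated parameters, b0 and b1
s_e = math.sqrt(s_e_squared)

print(f"SSE = {sse:.4f}, s_e^2 = {s_e_squared:.4f}, s_e = {s_e:.2f}")
```

The result, about 41.33 (in $1000s), agrees with the $41.33K figure used when the slides discuss the size of $s_e$.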
Slide 37: Comparing Standard Errors
[Figure: two panels, each with Y on the vertical axis]
$s_e$ is a measure of the variation of observed y
values from the regression line
The magnitude of $s_e$ should always be judged relative to the size
of the y values in the sample data; e.g., s_e = $41.33K is moderately small relative to the house prices in the sample
Slide 38: Inferences About the Regression Model
The variance of the regression slope coefficient ($b_1$) is estimated by
$$s_{b_1}^2 = \frac{s_e^2}{\sum (x_i - \bar{x})^2} = \frac{s_e^2}{(n - 1) s_x^2}$$
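A sketch of this calculation for the house price example. The slides do not give $\sum (x_i - \bar{x})^2$, so it is backed out here from the identity $\mathrm{SSR} = b_1^2 \sum (x_i - \bar{x})^2$. That step is an assumption of this example, not something the slides do, and it inherits the rounding in the quoted slope.

```python
import math

# Values quoted in the Excel output.
b1 = 0.10977        # estimated slope
ssr = 18934.9348    # regression sum of squares
sst = 32600.5000    # total sum of squares
n = 10              # sample size

s_e_squared = (sst - ssr) / (n - 2)   # s_e^2 = SSE / (n - 2)

# ASSUMPTION: sum((x_i - xbar)^2) is recovered from SSR = b1^2 * sum((x_i - xbar)^2)
# because the raw data table is not available in this copy of the slides.
sxx = ssr / b1 ** 2

var_b1 = s_e_squared / sxx
s_b1 = math.sqrt(var_b1)
print(f"s_b1 = {s_b1:.5f}")
```

The result is close to the standard error of 0.03297 reported for Square Feet in the Excel output.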
Slide 40: Comparing Standard Errors of the Slope
$s_{b_1}$ is a measure of the variation in the slope of regression lines from different possible samples
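Slide 41: Inference about the Slope: t Test
t test for a population slope
Null and alternative hypotheses
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic
$$t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad \text{d.f.} = n - 2$$
where:
$b_1$ = regression slope coefficient
$\beta_1$ = hypothesized slope
$s_{b_1}$ = standard error of the slope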
Slide 42: Inference about the Slope: t Test (continued)
Estimated Regression Equation:
house price = 98.25 + 0.1098 (sq. ft.)
The slope of this model is 0.1098
Does square footage of the house affect its sales price?
Slide 43: Inferences about the Slope: t Test Example
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
$$t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
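The test can be sketched in a few lines of Python using the Excel figures. The critical value 2.306 is the two-sided 5% point of Student's t with 8 degrees of freedom, taken from a standard t table rather than from the slides. The variable names are illustrative.

```python
# Figures from the Excel output.
b1 = 0.10977         # estimated slope
s_b1 = 0.03297       # standard error of the slope
beta1_null = 0.0     # hypothesized slope under H0
n = 10               # sample size, so d.f. = n - 2 = 8

t_stat = (b1 - beta1_null) / s_b1

# ASSUMPTION: critical value read from a standard t table (8 d.f., alpha/2 = .025);
# it does not appear in the slide text.
t_crit = 2.306
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, critical value = {t_crit}, reject H0: {reject_h0}")
```

Here the statistic exceeds the critical value, which is consistent with the P-value of 0.01039 in the Excel output being below .05.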
Slide 44: Inferences about the Slope
Slide 45: Inferences about the Slope
Slide 46: Confidence Interval Estimate for the Slope
Confidence Interval Estimate of the Slope:
$$b_1 \pm t_{n-2,\,\alpha/2}\, s_{b_1}, \qquad \text{d.f.} = n - 2$$
Excel Printout for House Prices:
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
At the 95% level of confidence, the confidence interval for
the slope is (0.0337, 0.1858)
Slide 47: Confidence Interval Estimate for the Slope (continued)
Since the units of the house price variable are
$1000s, we are 95% confident that the average impact on sales price is between $33.70 and
$185.80 per square foot of house size
This 95% confidence interval does not include 0
Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance
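The interval in the Excel printout can be reproduced from the slope and its standard error. This sketch assumes the t table value 2.306 for 8 degrees of freedom, which is not shown in the slides.

```python
# Figures from the Excel printout.
b1 = 0.10977      # estimated slope ($1000s per square foot)
s_b1 = 0.03297    # standard error of the slope

# ASSUMPTION: t critical value for 8 d.f. and 95% confidence, from a standard t table.
t_crit = 2.306

half_width = t_crit * s_b1
lower = b1 - half_width
upper = b1 + half_width

print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")
print(f"In dollars per square foot: ${lower * 1000:.2f} to ${upper * 1000:.2f}")

# The interval excludes 0, matching the conclusion of a significant relationship.
excludes_zero = lower > 0 or upper < 0
```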
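Slide 48: F-Test for Significance
F test statistic:
$$F = \frac{\mathrm{MSR}}{\mathrm{MSE}}, \qquad \mathrm{MSR} = \frac{\mathrm{SSR}}{k}, \qquad \mathrm{MSE} = \frac{\mathrm{SSE}}{n - k - 1}$$
where k is the number of independent variables in the regression model
Slide 49
$$F = \frac{\mathrm{MSR}}{\mathrm{MSE}} = \frac{18934.9348}{1708.1957} = 11.08$$
With 1 and 8 degrees of freedom
P-value for the F-Test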
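Slide 50: F-Test for Significance (continued)
$\alpha = .05$
Test statistic:
$$F = \frac{\mathrm{MSR}}{\mathrm{MSE}} = 11.08$$
Critical Value:
F = 5.32
Because F = 11.08 exceeds the critical value 5.32, the null hypothesis $H_0: \beta_1 = 0$ is rejected at $\alpha = .05$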
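A sketch of the F test arithmetic from the quoted sums of squares. It also checks the simple-regression relationship that F equals the square of the slope's t statistic. That equality is a known property of regression with one independent variable, added here as an illustration rather than taken from the slides.

```python
# Figures quoted in the Excel output.
ssr = 18934.9348   # regression sum of squares
sst = 32600.5000   # total sum of squares
n = 10             # sample size
k = 1              # number of independent variables
t_stat = 3.32938   # t statistic for the slope

sse = sst - ssr
msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

f_crit = 5.32      # critical value given on the slide (1 and 8 d.f., alpha = .05)
reject_h0 = f_stat > f_crit

print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.2f}")
print(f"t^2 = {t_stat ** 2:.2f}, reject H0: {reject_h0}")
```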
Slide 51
The regression equation can be used to
predict a value for y, given a particular x
For a specified value, $x_{n+1}$, the predicted
value is
$$\hat{y}_{n+1} = b_0 + b_1 x_{n+1}$$
Slide 52: Predictions Using Regression Analysis
Predict the price for a house with 2000 square feet:
house price = 98.25 + 0.1098 (sq.ft.)
= 98.25 + 0.1098(2000)
= 317.85
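The same prediction as a small function, using the rounded coefficients from the slide. The function name `predict_price` is hypothetical.

```python
def predict_price(square_feet, b0=98.25, b1=0.1098):
    """Predicted house price in $1000s from the estimated regression equation."""
    return b0 + b1 * square_feet


predicted = predict_price(2000)
print(f"Predicted price: {predicted:.2f} ($1000s), i.e. ${predicted * 1000:,.0f}")
```

As the next slide cautions, the function should only be applied to sizes within the range of the sampled houses.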
Slide 53: Relevant Data Range
When using a regression model for prediction, only predict within the relevant range of data
Slide 54: Estimating Mean Values and Predicting Individual Values
Goal: Form intervals around $\hat{y}$ to express
uncertainty about the value of y for a given x
Slide 55: Confidence Interval for the Average Y, Given X
Confidence interval estimate for the
expected value of y given a particular $x_{n+1}$
Confidence interval for $E(Y_{n+1} \mid X_{n+1})$:
$$\hat{y}_{n+1} \pm t_{n-2,\,\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
Notice that the formula involves the term $(x_{n+1} - \bar{x})^2$,
so the size of the interval varies according to the distance of $x_{n+1}$ from the mean, $\bar{x}$
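Slide 56: Prediction Interval for an Individual Y, Given X
Interval estimate for an individual value $y_{n+1}$:
$$\hat{y}_{n+1} \pm t_{n-2,\,\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$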
Slide 57: Estimation of Mean Values: Example
$$\hat{y}_{n+1} \pm t_{n-2,\,\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
Slide 58: Estimation of Individual Values: Example
Find the 95% prediction interval for an individual
house with 2,000 square feet
Prediction Interval Estimate for $y_{n+1}$
$$\hat{y}_{n+1} \pm t_{n-2,\,\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 317.85 \pm 102.28$$
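Both interval formulas can be sketched together. Two things here are assumptions rather than slide content: the ten data pairs (the sample data table is missing from this copy, and these are the values commonly used with this example) and the t table value 2.306 for 8 degrees of freedom. The function name `interval_half_widths` is hypothetical. Because the code uses unrounded coefficients, its point prediction is about 317.78 rather than the slide's 317.85, which comes from the rounded equation.

```python
import math

# ASSUMPTION: assumed reconstruction of the 10-house sample (square feet, price in $1000s).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def interval_half_widths(x, y, x_new, t_crit):
    """Return (y_hat, mean CI half-width, prediction interval half-width) at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    # Term shared by both intervals: 1/n + (x_new - xbar)^2 / sum((x_i - xbar)^2).
    h = 1 / n + (x_new - x_bar) ** 2 / sxx
    y_hat = b0 + b1 * x_new
    mean_hw = t_crit * s_e * math.sqrt(h)        # interval for E(Y | X)
    pred_hw = t_crit * s_e * math.sqrt(1 + h)    # interval for an individual y
    return y_hat, mean_hw, pred_hw


# ASSUMPTION: 2.306 is the standard t table value for 8 d.f. at 95% confidence.
y_hat, mean_hw, pred_hw = interval_half_widths(square_feet, price, 2000, 2.306)
print(f"y_hat = {y_hat:.2f}")
print(f"mean value interval: +/- {mean_hw:.2f}")
print(f"individual value interval: +/- {pred_hw:.2f}")
```

The prediction interval is always wider than the interval for the mean, because of the extra 1 under the square root.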
Slide 59: Correlation Analysis
Correlation analysis is used to measure the
strength of the association (linear relationship) between two variables
relationship
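Slide 60
Sample correlation coefficient:
$$r = \frac{s_{xy}}{s_x s_y} \qquad \text{where} \qquad s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$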
Slide 61: Hypothesis Test for Correlation
To test the null hypothesis of no linear
association,
$$H_0: \rho = 0$$
the test statistic follows the Student's t
distribution with (n - 2) degrees of freedom:
$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$$
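A sketch of this statistic for the house price example. The correlation is taken as the positive square root of the $R^2$ reported in the Excel output, which is an inference of this example (the slope is positive), not a value quoted in the slides.

```python
import math

r_squared = 0.58082          # from the Excel output
n = 10                       # sample size

# ASSUMPTION: r is the positive root because the estimated slope is positive.
r = math.sqrt(r_squared)

t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"r = {r:.4f}, t = {t_stat:.3f} with {n - 2} degrees of freedom")
```

In simple regression this t statistic coincides with the t statistic for the slope (3.329 in the Excel output), so the two tests lead to the same conclusion.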
Slide 63: Graphical Analysis
The linear regression model is based on
minimizing the sum of squared errors
If outliers exist, their potentially large squared
errors may have a strong influence on the fitted regression line
Be sure to examine your data graphically for
outliers and extreme points
Decide, based on your model and logic, whether the extreme points should remain or be removed
Slide 64: Chapter Summary
Introduced the linear regression model
Reviewed correlation and the assumptions of
linear regression
Discussed estimating the simple linear
regression coefficients
Described measures of variation
Described inference about the slope
Addressed estimation of mean values and
prediction of individual values