Chapter 13 - Linear Regression and Correlation. After studying this chapter you will be able to: identify a relationship between variables on a scatter diagram; measure and interpret the degree of relationship by a coefficient of correlation; and conduct a test of hypothesis about the coefficient of correlation in a population.
When you have completed this chapter, you will be able to:
Conduct a test of hypothesis for a regression model and each coefficient of regression
The Coefficient of Correlation, r
… Is a measure of the strength of the relationship between two variables
… It requires interval or ratio-scaled data
… It can range from -1.00 to 1.00
… Values of -1.00 or 1.00 indicate perfect and strong correlation
… Values close to 0.0 indicate weak correlation
… Negative values indicate an inverse relationship and positive values indicate a direct relationship
Chart 13-6
Chart 13.4
How Income and Well-Being of Canadians are Related (1971-97)
Estimate r: r = 0.7415
The Coefficient of Determination
… is represented by $r^2$
… is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X)
Dan Ireland, the student body president, is concerned about the cost of textbooks to students.
He believes there is a relationship between the number of pages in a text and the selling price of the book.
To provide insight into the problem, he selects a sample of eight (8) textbooks.
Book #   Pages   Price ($)
[Scatter Diagram, Excel printout: Pages (400 to 800) on the horizontal axis, Price in $ (60 to 100) on the vertical axis]
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
Correlation Coefficient
Σx      Σy    Σxy       Σx²         Σy²
4 900   636   397 200   3 150 000   51 606
The correlation coefficient is r = 0.614.
This indicates a moderate association between the number of pages and the selling price.
$$ r = \frac{8(397\,200) - (4\,900)(636)}{\sqrt{\left[8(3\,150\,000) - (4\,900)^2\right]\left[8(51\,606) - (636)^2\right]}} = 0.614 $$
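The substitution above can be checked with a few lines of Python. This is a minimal sketch that works only from the summary sums quoted in the text, since the eight individual (pages, price) pairs are not reproduced here; the function name `correlation_from_sums` is an illustrative choice, not part of the original material.

```python
import math

# Summary sums for the eight sampled textbooks, as quoted in the text.
n = 8
sum_x = 4_900        # sum of pages
sum_y = 636          # sum of prices ($)
sum_xy = 397_200
sum_x2 = 3_150_000
sum_y2 = 51_606


def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r from summary sums (hypothetical helper, computational formula)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
r_squared = r ** 2  # coefficient of determination

print(f"r   = {r:.3f}")
print(f"r^2 = {r_squared:.3f}")
```

Squaring r gives the coefficient of determination: roughly 38 percent of the variation in price is accounted for by the variation in the number of pages.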
Let's test the hypothesis that there is no correlation in the population. Use a .02 significance level.
H0 is rejected if t > 3.143 or if t < -3.143. There are 6 df, found by n - 2 = 8 - 2 = 6.
Step 5
Compute the test statistic and make a decision
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{.614\sqrt{8-2}}{\sqrt{1-(.614)^2}} = 1.905 $$
Because 1.905 lies between -3.143 and 3.143, H0 is not rejected.
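The test statistic and the decision rule can be reproduced as follows. This is a sketch under the values stated in the text: r = .614, n = 8, and the two-tailed critical value 3.143 for the .02 level with 6 df is taken from the text rather than computed. The variable names are illustrative.

```python
import math

r = 0.614            # sample correlation coefficient from the text
n = 8                # number of textbooks sampled
t_critical = 3.143   # two-tailed critical value quoted in the text (.02 level, 6 df)

df = n - 2                                       # degrees of freedom
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Two-tailed rule: reject H0 only if t falls outside (-t_critical, +t_critical).
reject_h0 = abs(t_stat) > t_critical

print(f"df = {df}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With these inputs the computed t stays inside the critical bounds, so the sample does not provide enough evidence, at the .02 level, of a correlation in the population.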
We use the independent variable (X) to estimate the dependent variable (Y)
Regression Equation
y = a + bx
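Σx      Σy    Σxy       Σx²         Σy²
4 900   636   397 200   3 150 000   51 606

$$ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{8(397\,200) - (4\,900)(636)}{8(3\,150\,000) - (4\,900)^2} = 0.05143 $$
$$ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{636}{8} - 0.05143\left(\frac{4\,900}{8}\right) = \$48 $$
y = a + bx = 48 + 0.05143x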
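The least-squares slope and intercept for the textbook example can be computed directly from the summary sums. The sketch below assumes only the sums quoted in the text (n = 8, Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000); the variable names are illustrative.

```python
# Summary sums for the eight sampled textbooks, as quoted in the text.
n = 8
sum_x = 4_900        # sum of pages
sum_y = 636          # sum of prices ($)
sum_xy = 397_200
sum_x2 = 3_150_000

# Slope: change in estimated price ($) per additional page.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: mean price minus slope times mean pages.
a = sum_y / n - b * sum_x / n

print(f"b = {b:.5f}")
print(f"a = {a:.2f}")
print(f"Regression equation: y = {a:.0f} + {b:.5f}x")
```

Read this way, each additional page is associated with an increase of about five cents in the estimated selling price.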
Using Excel
Click on XY (Scatter)
INPUT DATA range, then Click Next
Complete INPUTTING of TITLES, then Click Next and Click Finish
To remove the Legend on the right side… Right mouse click and Click on Clear
To add a Regression Line and equation to this scatter plot … then CLICK on the OPTIONS TAB
Check EQUATION and R-squared Value
You can now interpret your results!
Concerned about the y intercept? Formatting the axes resulted in a distortion of the y intercept.
Using Excel: DATA ANALYSIS
INPUT NEEDS … Click OK
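The standard error of estimate:
$$ s_e = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n-2}} = \sqrt{\frac{51\,606 - 48(636) - 0.05143(397\,200)}{8-2}} = 10.408 $$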
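The standard error of estimate can be checked from the same summary sums. The sketch below computes it twice, as an illustration: once with the rounded coefficients used on the slide (a = 48, b = 0.05143) and once with the unrounded least-squares slope. The small gap between the two results is purely a rounding effect; the variable names are illustrative.

```python
import math

# Summary sums for the eight sampled textbooks, as quoted in the text.
n = 8
sum_x = 4_900
sum_y = 636
sum_xy = 397_200
sum_x2 = 3_150_000
sum_y2 = 51_606


def standard_error(a, b):
    """Standard error of estimate from summary sums (hypothetical helper)."""
    sse = sum_y2 - a * sum_y - b * sum_xy   # unexplained variation
    return math.sqrt(sse / (n - 2))


# Rounded coefficients, as substituted on the slide.
s_e_rounded = standard_error(48.0, 0.05143)

# Unrounded least-squares coefficients.
b_exact = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a_exact = sum_y / n - b_exact * sum_x / n
s_e_exact = standard_error(a_exact, b_exact)

print(f"s_e with rounded coefficients:   {s_e_rounded:.3f}")
print(f"s_e with unrounded coefficients: {s_e_exact:.3f}")
```

Both versions round to $10.41, the figure quoted for the standard error of estimate.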
Assumptions Underlying Linear Regression
… For each value of x, there is a group of y values, and these y values are normally distributed.
… The means of these normal distributions of y values all lie on the straight line of regression.
… The standard deviations of these normal distributions are equal.
… The y values are statistically independent. This means that in the selection of a sample the y values chosen for a particular x value do not depend on the y values for any other x values.
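The confidence interval for the mean value of y for a given value of x is given by:
$$ y' \pm t\,s_e\sqrt{\frac{1}{n} + \frac{(x-\bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} $$
where y' is the value estimated from the regression equation. For x = 800 pages:
$$ 89.14 \pm 2.447(10.408)\sqrt{\frac{1}{8} + \frac{(800-612.5)^2}{3\,150\,000 - \frac{(4\,900)^2}{8}}} = 89.14 \pm 15.31 $$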
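The prediction interval for an individual value of y for a given value of x is given by:
$$ y' \pm t\,s_e\sqrt{1 + \frac{1}{n} + \frac{(x-\bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} $$
where y' is the value estimated from the regression equation. For x = 800 pages:
$$ 89.14 \pm 2.447(10.408)\sqrt{1 + \frac{1}{8} + \frac{(800-612.5)^2}{3\,150\,000 - \frac{(4\,900)^2}{8}}} = 89.14 \pm 29.72 $$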
The estimated selling price for a book with 800 pages is $89.14.
The standard error of estimate is $10.41.
The 95 percent confidence interval for all books with 800 pages is $89.14 ± $15.31. This means the limits are between $73.83 and $104.45.
The 95 percent prediction interval for a particular book with 800 pages is $89.14 ± $29.72. This means the limits are between $59.42 and $118.86.
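The two intervals for an 800-page book can be reproduced with the sketch below. It assumes the rounded values quoted in the text: a = 48, b = 0.05143, a standard error of estimate of 10.408, and t = 2.447 for 95 percent confidence with 6 df (taken from the text, not computed). The variable names are illustrative.

```python
import math

# Values quoted in the text (rounded as on the slides).
n = 8
sum_x = 4_900
sum_x2 = 3_150_000
a, b = 48.0, 0.05143   # regression coefficients
s_e = 10.408           # standard error of estimate
t_value = 2.447        # t for 95 percent confidence, 6 df
x_new = 800            # pages in the book of interest

y_est = a + b * x_new                    # estimated selling price
x_bar = sum_x / n                        # mean number of pages
sxx = sum_x2 - sum_x ** 2 / n            # sum of squared deviations of x
distance_term = 1 / n + (x_new - x_bar) ** 2 / sxx

# Half-widths: the prediction interval adds 1 under the root for
# the variation of an individual book around the mean.
ci_half = t_value * s_e * math.sqrt(distance_term)
pi_half = t_value * s_e * math.sqrt(1 + distance_term)

print(f"Estimate: {y_est:.2f}")
print(f"95% CI: {y_est - ci_half:.2f} to {y_est + ci_half:.2f}")
print(f"95% PI: {y_est - pi_half:.2f} to {y_est + pi_half:.2f}")
```

The prediction interval is wider than the confidence interval because it covers a single book rather than the mean price of all 800-page books.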
Regression Analysis: Price versus Pages
The regression equation is
Predicted Values for New Observations
New Obs   Fit     SE Fit   95.0% CI           95.0% PI
1         89.14   6.26     (73.82, 104.46)    (59.41, 118.88)
EXCEL output: Price vs. Pages
This completes Chapter 13