
Part of the document: Introduction to Business Statistics by Ronald Weiers, J. Brian Gray, 7th edition (pages 580-585)

THE SIMPLE LINEAR REGRESSION MODEL

2. Regardless of the value of x, the standard deviation of the distribution of y values about the population regression line is the same.

3. The y values are statistically independent of each other. For example, if a given y value happens to exceed $\mu_{y.x} = \beta_0 + \beta_1 x_i$, this does not affect the probability that the next y value observed will also exceed $\mu_{y.x} = \beta_0 + \beta_1 x_i$.

Figure 15.1 shows the variation of y values above and below a population regression line. There is a “family” of such distributions (one for each possible x value). Each distribution has $\mu_{y.x}$ as its mean, and the standard deviations are the same ($\sigma_{y.x}$).

The three assumptions can also be expressed in terms of the error, or residual, component ($\varepsilon_i$) in the simple linear regression model: (1) For any given value of x, the population of $\varepsilon_i$ values will be normally distributed with a mean of zero and a standard deviation of $\sigma_{y.x}$; (2) this standard deviation will be the same regardless of the value of x; and (3) the $\varepsilon_i$ values are statistically independent of each other.
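The three assumptions about the error component can be illustrated with a small simulation. This is only a sketch: the population values `BETA0`, `BETA1`, and `SIGMA`, the two x values, and the helper `simulate_y` are hypothetical choices made for illustration and do not come from the text.

```python
import random
import statistics

def simulate_y(x_values, beta0, beta1, sigma, rng):
    """Return one y per x, built as beta0 + beta1*x + a normal error term."""
    ys = []
    for x in x_values:
        # Each error is an independent draw with mean 0 and the same
        # standard deviation, whatever the value of x.
        error = rng.gauss(0.0, sigma)
        ys.append(beta0 + beta1 * x + error)
    return ys

# Hypothetical population parameters (assumptions, not from the text).
BETA0, BETA1, SIGMA = 10.0, 2.0, 1.5

rng = random.Random(42)                    # fixed seed for a repeatable run
x_values = [5.0] * 2000 + [15.0] * 2000    # many observations at two x values
y_values = simulate_y(x_values, BETA0, BETA1, SIGMA, rng)

# Recover the errors by subtracting the population regression line.
errors = [y - (BETA0 + BETA1 * x) for x, y in zip(x_values, y_values)]
errors_low_x, errors_high_x = errors[:2000], errors[2000:]

for label, errs in (("x = 5", errors_low_x), ("x = 15", errors_high_x)):
    print(label,
          "mean error:", round(statistics.mean(errs), 3),
          "std dev:", round(statistics.stdev(errs), 3))
```

In a run like this, the mean error at each x should be close to zero and the two standard deviations should both be close to `SIGMA`, which is what assumptions (1) and (2) describe.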


Based on the sample data, the y-intercept and slope of the population regression line can be estimated. The result is the sample regression line:

Sample regression line:

$\hat{y} = b_0 + b_1 x$

where

$\hat{y}$ = the estimated value of the dependent variable ($y$) for a given value of $x$

$b_0$ = the y-intercept; this is the value of $y$ where the line intersects the y-axis whenever $x = 0$

$b_1$ = the slope of the regression line

$x$ = a value for the independent variable

The cap (ˆ) over the y indicates that it is an estimate of the (unknown) “true” value of y. The equation is completely described by the y-intercept ($b_0$) and slope ($b_1$), which are sample estimates of their population counterparts, $\beta_0$ and $\beta_1$, respectively. An infinite number of possible equations can be fitted to a given scatter diagram, and each equation will have a unique combination of values for $b_0$ and $b_1$. However, only one equation will be the “best fit” as defined by the least-squares criterion we are going to use.

The Least-Squares Criterion

The least-squares criterion requires that the sum of the squared deviations between y values in the scatter diagram and y values predicted by the equation be minimized. In symbolic terms:

FIGURE 15.1 For any given value of x, the y values are assumed to be normally distributed about the population regression line, $\mu_{y.x} = \beta_0 + \beta_1 x$, and to have the same standard deviation, $\sigma_{y.x}$. The mean of each distribution is E(y) for the given x. The regression line based on sample data is an estimate of this “true” line. Likewise, $s_{y.x}$ is our sample estimate of $\sigma_{y.x}$.

FIGURE 15.2 Using the least-squares criterion, the line fitted in part (b) is a better fit to the data than the line in part (a).

(a) Scatter diagram and estimation line $\hat{y} = 7 + 2x$; the sum of the squared deviations is $c^2 + d^2 + e^2 + f^2 + g^2 = 100$.

(b) The same scatter diagram and estimation line $\hat{y} = 1 + 3x$; the sum of the squared deviations is $c^2 + d^2 + e^2 + f^2 + g^2 = 67$.

Least-squares criterion for determining the best-fit equation:

The equation must be such that $\sum (y_i - \hat{y}_i)^2$ is minimized

where

$y_i$ = the observed value of y for the given value of x

$\hat{y}_i$ = the predicted value of y for that x value, as determined from the regression equation

To show how the least-squares criterion works, consider parts (a) and (b) of Figure 15.2. In part (a) the sum of the squared deviations between observed and predicted y values is 100.0, while in part (b) the sum is only 67.0. According to the least-squares criterion, the line in part (b) is a better fit to the data than the line in part (a).
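The same kind of comparison can be sketched in code. The data points behind Figure 15.2 are not listed in the text, so this sketch instead uses the five dexterity and productivity observations from the chapter's example, together with two hypothetical candidate lines chosen only for illustration; the function name `sum_squared_deviations` is likewise an assumption.

```python
def sum_squared_deviations(xs, ys, intercept, slope):
    """Sum of (observed y - predicted y)^2 for the line intercept + slope*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = intercept + slope * x
        total += (y - predicted) ** 2
    return total

# Dexterity test scores (x) and units produced per hour (y) from the example.
xs = [12, 14, 17, 16, 11]
ys = [55, 63, 67, 70, 51]

# Two hypothetical candidate lines (not taken from the text).
sse_a = sum_squared_deviations(xs, ys, intercept=10.0, slope=3.5)
sse_b = sum_squared_deviations(xs, ys, intercept=20.0, slope=3.0)

print("candidate (a):", sse_a)   # 53.5
print("candidate (b):", sse_b)   # 26.0
```

Under the least-squares criterion, candidate (b) is the better fit of the two because its sum of squared deviations is smaller. Neither candidate is claimed to be the best possible line.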

Determining the Least-Squares Regression Line

Equations have been developed for proceeding from a set of data to the least-squares regression line. They are based on the methods of calculus and provide values for $b_0$ and $b_1$ such that the least-squares criterion is met. The least-squares regression line may also be referred to as the least-squares regression equation or as simply the regression line:

Least-squares regression line, $\hat{y} = b_0 + b_1 x$:

• Slope

$b_1 = \dfrac{\left(\sum x_i y_i\right) - n\bar{x}\bar{y}}{\left(\sum x_i^2\right) - n\bar{x}^2}$

where $n$ = number of data points

• y-intercept

$b_0 = \bar{y} - b_1 \bar{x}$

With the slope determined, we take advantage of the fact that the least-squares regression equation passes through the point ($\bar{x}$, $\bar{y}$). The equation for finding the y-intercept ($b_0 = \bar{y} - b_1 \bar{x}$) is just a rearrangement of $\bar{y} = b_0 + b_1 \bar{x}$. (Note: If you are using a pocket calculator, you may wish to redefine the units so as to reduce the number of digits in the data before applying these and other formulas in the chapter. For example, by converting from dollars to millions of dollars, $30,500,000 may be expressed as $30.5 million. For some pocket calculators, this can help avoid a “blow-up” in the number of digits when calculating summations of products or squares.)
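The text states that these formulas come from calculus without showing the steps. One standard route, sketched here as an illustration rather than as the authors' own derivation, is to set the partial derivatives of the sum of squared deviations to zero. The symbol $S$ for that sum is notation introduced only for this sketch, and the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
S(b_0, b_1) &= \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 \\[4pt]
\frac{\partial S}{\partial b_0} = -2 \sum \left( y_i - b_0 - b_1 x_i \right) = 0
  &\;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\[4pt]
\frac{\partial S}{\partial b_1} = -2 \sum x_i \left( y_i - b_0 - b_1 x_i \right) = 0
  &\;\Longrightarrow\; \sum x_i y_i = b_0 \, n\bar{x} + b_1 \sum x_i^2 \\[4pt]
\text{Substituting } b_0 = \bar{y} - b_1 \bar{x}: \quad
\sum x_i y_i - n\bar{x}\bar{y} = b_1 \left( \sum x_i^2 - n\bar{x}^2 \right)
  &\;\Longrightarrow\;
  b_1 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}
\end{align*}
```

The first condition is the statement that the fitted line passes through ($\bar{x}$, $\bar{y}$), and the last line matches the slope formula given above.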

EXAMPLE

Regression Equation

A production manager has compared the dexterity test scores of five assembly-line employees with their hourly productivity. The data are in CX15DEX and shown here.

Employee    x, Score on Dexterity Test    y, Units Produced in One Hour
A           12                            55
B           14                            63
C           17                            67
D           16                            70
E           11                            51

SOLUTION

The calculations necessary for determining the slope and y-intercept of the regression equation are shown in Table 15.1. Once the slope ($b_1 = 3.0$) has been determined, this value is substituted into the equation for the y-intercept, and $b_0$ is found to be 19.2. The least-squares regression equation, shown in the scatter diagram of Figure 15.3, is $\hat{y} = 19.2 + 3.0x$.


TABLE 15.1 Data and calculations for determining the least-squares regression line for the example involving dexterity test score (x) and units produced per hour (y).

Data and Preliminary Calculations

Employee    x_i, Score on Dexterity Test    y_i, Units Produced in One Hour    x_i y_i    x_i^2    y_i^2
A           12                              55                                 660        144      3025
B           14                              63                                 882        196      3969
C           17                              67                                 1139       289      4489
D           16                              70                                 1120       256      4900
E           11                              51                                 561        121      2601
Sum         70                              306                                4362       1006     18,984

$\sum x_i = 70$, $\sum y_i = 306$, $\sum x_i y_i = 4362$, $\sum x_i^2 = 1006$, $\sum y_i^2 = 18{,}984$

$\bar{x} = 70/5 = 14.0$    $\bar{y} = 306/5 = 61.2$

Calculations for Slope and y-Intercept of Least-Squares Regression Line

slope, $b_1 = \dfrac{\left(\sum x_i y_i\right) - n\bar{x}\bar{y}}{\left(\sum x_i^2\right) - n\bar{x}^2} = \dfrac{4362 - 5(14.0)(61.2)}{1006 - 5(14.0)^2} = \dfrac{78.0}{26.0} = 3.0$

y-intercept, $b_0 = \bar{y} - b_1\bar{x} = 61.2 - 3.0(14.0) = 61.2 - 42.0 = 19.2$

The least-squares regression line is $\hat{y} = 19.2 + 3.0x$

where $\hat{y}$ = estimated units produced per hour and $x$ = score on manual dexterity test
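The hand calculations in Table 15.1 can be cross-checked with a short script. This is a minimal sketch that applies the chapter's slope and intercept formulas directly; the function name `least_squares_line` and the variable names are assumptions made for illustration.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) using the slope and y-intercept formulas above."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    sum_x2 = sum(x * x for x in xs)               # sum of x_i squared
    denominator = sum_x2 - n * x_bar ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = (sum_xy - n * x_bar * y_bar) / denominator
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Data from Table 15.1.
scores = [12, 14, 17, 16, 11]   # x: score on dexterity test
units = [55, 63, 67, 70, 51]    # y: units produced in one hour

b0, b1 = least_squares_line(scores, units)
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}")   # b0 = 19.2, b1 = 3.0
```

Rounded to one decimal place, the result agrees with the table: a slope of 3.0 and a y-intercept of 19.2.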

FIGURE 15.3 Scatter diagram and least-squares regression line for the data of Table 15.1. Horizontal axis: x = score on dexterity test. Vertical axis: y = productivity (units/hour). Regression line for estimation: $\hat{y} = 19.2 + 3.0x$.

The slope of the regression line is positive, suggesting a direct relationship between dexterity test score and productivity. The value of the slope ($b_1 = 3.0$) indicates that each one-point increase in the dexterity test score will increase the estimated productivity by 3.0 units per hour.
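The slope interpretation can be illustrated by evaluating the fitted line at neighboring test scores. This is a small sketch using the coefficients from the example; the function name `estimated_units` is hypothetical, and the scores are kept inside the range observed in the sample (11 to 17).

```python
B0, B1 = 19.2, 3.0   # y-intercept and slope from the example

def estimated_units(score):
    """Estimated units produced per hour for a given dexterity test score."""
    return B0 + B1 * score

for score in (12, 13, 14):
    print(score, round(estimated_units(score), 1))

# Difference between estimates one test-score point apart.
one_point_change = estimated_units(13) - estimated_units(12)
print("change per point:", round(one_point_change, 1))
```

Each one-point step in the score moves the estimate by the slope, 3.0 units per hour, regardless of the starting score.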

