
Handbook of Industrial Automation, Chapter 4




Massey University–Albany, Palmerston North, New Zealand

1.1 FITTING A MODEL TO DATA

1.1.1 What is Regression?

1.1.1.1 Historical Note

Regression is, arguably, the most commonly used technique in applied statistics. It can be used with data that are collected in a very structured way, such as sample surveys or experiments, but it can also be applied to observational data. This flexibility is its strength but also its weakness, if used in an unthinking manner.

The history of the method can be traced to Sir Francis Galton, who published in 1885 a paper with the title, "Regression toward mediocrity in hereditary stature." In essence, he measured the heights of parents and found the median height of each mother–father pair and compared these medians with the height of their adult offspring. He concluded that those with very tall parents were generally taller than average but were not as tall as the median height of their parents; those with short parents tended to be below average height but were not as short as the median height of their parents. Female offspring were combined with males by multiplying female heights by a factor of 1.08.

Regression can be used to explain relationships or to predict outcomes. In Galton's data, the median height of parents is the explanatory or predictor variable, which we denote by X, while the response or predicted variable is the height of the offspring, denoted by Y. While the individual value of Y cannot be forecast exactly, the average value can be for a given value of the explanatory variable, X.

1.1.1.2 Brief Overview

Uppermost in the minds of the authors of this chapter is the desire to relate some basic theory to the application and practice of regression. In Sec. 1.1, we set out some terminology and basic theory. Section 1.2 examines statistics and graphs to explore how well the regression model fits the data. Section 1.3 concentrates on variables and how to select a small but effective model. Section 1.4 looks to individual data points and seeks out peculiar observations.

We will attempt to relate the discussion to some data sets which are shown in Sec. 1.5. Note that data may have many different forms, and the questions asked of the data will vary considerably from one application to another. The variety of types of data is evident from the description of some of these data sets.

Example 1. Pairs (Triplets, etc.) of Variables (Sec. 1.5.1): The Y-variable in this example is the heat developed in mixing the components of certain cements which have varying amounts of four X-variables or chemicals in the mixture. There is no information about how the various amounts of the X-variables have been chosen. All variables are continuous variables.



Example 2. Grouping Variables (Sec. 1.5.2): Qualitative variables are introduced to indicate groups allocated to different safety programs. These qualitative variables differ from other variables in that they only take the values of 0 or 1.

Example 3. A Designed Experiment (Sec. 1.5.3): In this example, the values of the X-variables have been set in advance, as the design of the study is structured as a three-factor composite experimental design. The X-variables form a pattern chosen to ensure that they are uncorrelated.

1.1.1.3 What Is a Statistical Model?

A statistical model is an abstraction from the actual data and refers to all possible values of Y in the population and the relationship between Y and the corresponding X in the model. In practice, we only have sample values, y and x, so that we can only check to ascertain whether the model is a reasonable fit to these data values.

In some areas of science, there are laws, such as the relationship e = mc², in which it is assumed that the model is an exact relationship. In other words, this law is a deterministic model in which there is no error. In statistical models, we assume that the model is stochastic, by which we mean that there is an error term, e, so that the model can be written as

    Y = f(X = x) + e

In a regression model, f(.) indicates a linear function of the X-terms. The error term is assumed to be random with a mean of zero and a variance which is constant, that is, it does not depend on the value taken by the X-term. It may reflect error in the measurement of the Y-variable or the effect of variables or conditions not defined in the model. The X-variable, on the other hand, is assumed to be measured without error.

In Galton's data on heights of parents and offspring, the error term may be due to measurement error in obtaining the heights or the natural variation that is likely to occur in the physical attributes of offspring compared with their parents.

There is a saying that "No model is correct but some are useful." In other words, no model will exactly capture all the peculiarities of a data set, but some models will fit better than others.

1.1.2 How to Fit a Model

1.1.2.1 Least-Squares Method

We consider Example 1, but concentrate on the effect of the first variable, x1, which is tricalcium aluminate, on the response variable, which is the heat generated. The plot of heat on tricalcium aluminate, with the least-squares regression line, is shown in Fig. 1. The least-squares line is shown by the solid line and can be written as

    ŷ = f(X = x1) = a + b·x1 = 81.5 + 1.87·x1    (1)

where ŷ is the predicted value of y for the given value x1 of the variable X1.

Figure 1. Plot of heat, y, on tricalcium aluminate, x1.


All the points represented by (x1, y) do not fall on the line but are scattered about it. The vertical distance between each observation, y, and its respective predicted value, ŷ, is called the residual, which we denote by e. The residual is positive if the observed value of y falls above the line and negative if below it. Notice in Sec. 1.5.1 that for the fourth row in the table, the fitted value is 102.04 and the residual (shown by e in Fig. 1) is −14.44, which corresponds to one of the four points below the regression line, namely the point (x1, y) = (11, 87.6).
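The short Python sketch below illustrates this calculation. It simply evaluates the fitted line of Eq. (1) at the point discussed above; because it uses the rounded coefficients 81.5 and 1.87 quoted in the text, the result differs slightly from the 102.04 and −14.44 obtained from the unrounded fit.

    # Fitted line from Eq. (1), using the rounded coefficients quoted in the text
    a, b = 81.5, 1.87

    def fitted(x1):
        """Predicted heat for a given amount of tricalcium aluminate."""
        return a + b * x1

    x1, y = 11, 87.6          # the fourth observation discussed above
    y_hat = fitted(x1)        # about 102.1 (102.04 with unrounded coefficients)
    residual = y - y_hat      # negative: the point lies below the fitted line
    print(y_hat, residual)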

At each of the x1 values in the data set we assume that the population values of Y can be written as a linear model, by which we mean that the model is linear in the parameters. For convenience, we drop the subscript in the following discussion.

More correctly, Y should be written as Y | x, which is read as "Y given X = x."

Notice that a model, in this case a regression model, is a hypothetical device which explains relationships in the population for all possible values of Y for given values of X. The error (or deviation) term, ε, is assumed to have, for each point in the sample, a population mean of zero and a constant variance of σ², so that for X = a particular value x, Y has the following distribution:

    Y | x is distributed with mean (α + βx) and variance σ²

It is also assumed that for any two points in the sample, i and j, the deviations εi and εj are uncorrelated.

The method of least squares uses the sample of n (= 13 here) values of x and y to find the least-squares estimates, a and b, of the population parameters α and β by minimizing the deviations. More specifically, we seek to minimize the sum of squares of e, which we denote by S², which can be written as

    S² = Σ e² = Σ [y − f(x)]² = Σ [y − (a + bx)]²    (3)

The symbol Σ indicates the summation over the n = 13 points in the sample.

1.1.2.2 Normal Equations

The values of the coefficients a and b which minimize S² can be found by solving the following, which are called normal equations. We do not prove this statement, but the reader may refer to a textbook on regression, such as Brook and Arnold [1].

    Σ [y − (a + bx)] = 0,  or  na + b·Σx = Σy
    Σ x[y − (a + bx)] = 0,  or  a·Σx + b·Σx² = Σxy    (4)

From Sec. 1.5.1, we see that the mean of x is 7.5 and of y is 95.4. The normal equations become

    13a + 97b = 1240.5
    97a + 1139b = 10,032    (6)

Simple arithmetic gives the solutions as a = 81.5 and b = 1.87.
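For a straight-line fit the normal equations are just a 2 × 2 linear system, so they can be set up and solved directly. The sketch below does this in Python; the x and y arrays are a few illustrative pairs only, not the full cement data of Sec. 1.5.1, so the printed coefficients will not equal 81.5 and 1.87 unless the full columns are substituted.

    import numpy as np

    # Build and solve the normal equations for a straight-line fit.
    x = np.array([7.0, 1.0, 11.0, 11.0, 7.0, 3.0])       # placeholder data
    y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 102.7])

    n = len(x)
    A = np.array([[n,       x.sum()],
                  [x.sum(), (x * x).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])

    a, b = np.linalg.solve(A, rhs)    # least-squares intercept and slope
    print(a, b)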

1.1.3 Simple Transformations

1.1.3.1 Scaling

The size of the coefficients in a fitted model will depend on the scales of the variables, predicted and predictor. In the cement example, the X variables are measured in grams. Clearly, if these variables were changed to kilograms, the values of the X would be divided by 1000 and, consequently, the sizes of the least-squares coefficients would be multiplied by 1000. In this example, the coefficients would be large and it would be clumsy to use such a transformation.

In some examples, it is not clear what scales should be used. To measure the consumption of petrol (gas), it is usual to quote the number of miles per gallon, but for those countries which use the metric system, it is the inverse which is often quoted, namely the number of liters per 100 km travelled.
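A short check of the rescaling effect is sketched below; the data arrays are illustrative only.

    import numpy as np

    # Rescaling the predictor rescales its coefficient inversely.
    x_grams = np.array([7.0, 1.0, 11.0, 11.0, 7.0, 3.0])      # illustrative data
    y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 102.7])

    def slope(x, y):
        return ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()

    b_grams = slope(x_grams, y)
    b_kg = slope(x_grams / 1000.0, y)   # the same x expressed in kilograms
    print(b_kg / b_grams)               # 1000: the coefficient is multiplied by 1000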

1.1.3.2 Centering of Data

In some situations, it may be an advantage to change x to its deviation from its mean, that is, x − x̄. The fitted equation becomes

    ŷ = a + b(x − x̄)

but these values of a and b may differ from those in Eq. (1). Notice that the sum of the (x − x̄) terms is zero, so the slope b can be shown to be the same as in Eq. (5). The fitted line is

    ŷ = 95.42 + 1.87(x − x̄)

If the y variable is also centered and the two centered variables are again denoted simply by y and x, the fitted line is

    y = 1.87x

The important point of this section is that the inclusion of a constant term in the model leads to the same coefficient of the X term as transforming X to be centered about its mean. In practice, we do not need to perform this transformation of centering, as the inclusion of a constant term in the model leads to the same estimated coefficient for the X variable.
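A minimal numerical check of this point is sketched below, again with illustrative data rather than the full cement columns: centering x changes the intercept (it becomes the mean of y) but leaves the slope unchanged.

    import numpy as np

    # Centering x changes the intercept but not the slope.
    x = np.array([7.0, 1.0, 11.0, 11.0, 7.0, 3.0])       # illustrative data
    y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 102.7])

    X_raw = np.column_stack([np.ones_like(x), x])               # [1, x]
    X_cen = np.column_stack([np.ones_like(x), x - x.mean()])    # [1, x - xbar]

    coef_raw, *_ = np.linalg.lstsq(X_raw, y, rcond=None)
    coef_cen, *_ = np.linalg.lstsq(X_cen, y, rcond=None)

    print(coef_raw[1], coef_cen[1])   # identical slopes
    print(coef_cen[0], y.mean())      # centered intercept equals the mean of y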

1.1.4 Correlations

Readers will be familiar with the correlation coefficient between two variables. In particular, the correlation between y and x is given by

    r_xy = Sxy / √(Sxx·Syy)    (8)

There is a duality in this formula in that interchanging x and y would not change the value of r. The relationship between correlation and regression is that the coefficient b in the simple regression line above can be written as

    b = r·√(Syy/Sxx)    (9)

In regression, the duality of x and y does not hold. A regression line of y on x will differ from a regression line of x on y.
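The following sketch verifies Eq. (9) numerically; the arrays are illustrative data only.

    import numpy as np

    # Check Eq. (9): the regression slope equals r * sqrt(Syy/Sxx).
    x = np.array([7.0, 1.0, 11.0, 11.0, 7.0, 3.0])       # illustrative data
    y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 102.7])

    Sxx = ((x - x.mean()) ** 2).sum()
    Syy = ((y - y.mean()) ** 2).sum()
    Sxy = ((x - x.mean()) * (y - y.mean())).sum()

    r = Sxy / np.sqrt(Sxx * Syy)                    # Eq. (8)
    b = Sxy / Sxx                                   # slope of y on x
    print(np.isclose(b, r * np.sqrt(Syy / Sxx)))    # True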

1.1.5 Vectors

1.1.5.1 Vector Notation

The data for the cement example (Sec. 1.5) appear as equal-length columns. This is typical of data sets in regression analysis. Each column could be considered as a column vector with 13 components. We focus on the three variables y (heat generated), ŷ (FITS1 = predicted values of y), and e (RESI1 = residuals).

Notice that we represent a vector by bold type: y, ŷ, and e.

The vectors simplify the columns of data to two aspects, the lengths and directions of the vectors and, hence, the angles between them. The length of a vector can be found by the inner, or scalar, product. The reader will recall that the inner product of y is represented as y·y or yᵀy, which is simply the sum of the squares of the individual elements.

Of more interest is the inner product of ŷ with e, which can be shown to be zero. These two vectors are said to be orthogonal or "at right angles," as indicated in Fig. 2.

We will not go into many details about the geometry of the vectors, but it is usual to talk of ŷ being the projection of y in the direction of x. Similarly, e is the projection of y in a direction orthogonal to x, orthogonal being a generalization to many dimensions of "at right angles to," which becomes clear when the angle θ is considered.

Notice that e and ŷ are "at right angles" or "orthogonal." It can be shown that a necessary and sufficient condition for this to be true is that eᵀŷ = 0.

In vector terms, the predicted value of y is

    ŷ = a·1 + b·x

and the fitted model is

    y = a·1 + b·x + e

Writing the constant term as a column vector of 1's paves the way for the introduction of matrices in Sec. 1.1.7.

Figure 2. Relationship between y, ŷ and e.


1.1.5.2 Vectors: Centering and Correlations

In this section, we write the vector terms in such a way that the components are deviations from the mean.

The length of the vector y, written as |y|, is the square root of (yᵀy) = 52.11. Similarly, the lengths of ŷ and e are 38.08 and 35.57, respectively.

The inner product of y with the vector of fitted values, ŷ, determines the angle θ between them, since cos θ = yᵀŷ/(|y|·|ŷ|). As y and x are centered, the correlation coefficient of y on x can be shown to be cos θ.

1.1.6 Residuals and Fits

We return to the actual values of the X and Y variables, not the centered values as above. Figure 2 provides more insight into the normal equations, as the least-squares solution to the normal equations occurs when the vector of residuals is orthogonal to the vector of predicted values. Notice that ŷᵀe = 0 can be expanded to

    (a·1 + b·x)ᵀe = a·1ᵀe + b·xᵀe = 0    (12)

This condition will be true if each of the two parts is equal to zero, which leads to the normal equations, Eq. (4), above.

Notice that the last column of Sec. 1.5.1 confirms that the sum of the residuals is zero. It can be shown that the corollary of this is that the sum of the observed y is the same as the sum of the fitted y values; if the sums are equal the means are equal, and Sec. 1.5.1 shows that they are both 95.4.

The second normal equation in Eq. (4) could be checked by multiplying the components of the two columns marked x1 and RESI1 and then adding the result.
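These consequences of the normal equations are easy to verify numerically. The sketch below fits a straight line with numpy and checks that the residuals sum to zero and are orthogonal to both x and the fitted values; the data arrays are illustrative only.

    import numpy as np

    x = np.array([7.0, 1.0, 11.0, 11.0, 7.0, 3.0])       # illustrative data
    y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 102.7])

    X = np.column_stack([np.ones_like(x), x])             # constant term plus x
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    y_hat = X @ coef
    e = y - y_hat

    print(np.isclose(e.sum(), 0.0))      # first normal equation: 1'e = 0
    print(np.isclose(x @ e, 0.0))        # second normal equation: x'e = 0
    print(np.isclose(y_hat @ e, 0.0))    # hence the fits are orthogonal to e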

In Fig. 3, we would expect the residuals to fall approximately into a horizontal band on either side of the zero line. If the data satisfy the assumptions, we would expect that there would not be any systematic trend in the residuals. At times, our eyes may deceive us into thinking there is such a trend when in fact there is not one. We pick this topic up again later.

1.1.7 Adding a Variable

Suppose a second predictor variable, x2, is added to the model. The constant term is written as b0·x0 where, without loss of generality, x0 = 1.

The normal equations follow a similar pattern to those indicated by Eq. (4), namely

    Σ [b0 + b1x1 + b2x2] = Σ y
    Σ x1[b0 + b1x1 + b2x2] = Σ x1y
    Σ x2[b0 + b1x1 + b2x2] = Σ x2y    (13)

Figure 3. Plot of residuals against fitted values for y on x1.


Note that the entries in bold type are the same as those in the normal equations of the model with one predictor variable. It is clear that the solutions for b0 and b1 will differ from those of a and b in the normal equations, Eq. (6). It can be shown that the solutions are b0 = 52.6, b1 = 1.47, and b2 = 0.662.

Note:

1. By adding the second prediction variable x2, the coefficient for the constant term has changed from a = 81.5 to b0 = 52.6. Likewise, the coefficient for x1 has changed from 1.87 to 1.47. The structure of the normal equations gives some indication why this is so.

2. The coefficients would not change in value if the variables were orthogonal to each other. For example, if x0 were orthogonal to x2, Σx0x2 would be zero. This would occur if x2 were in the form of deviation from its mean. Likewise, if x1 and x2 were orthogonal, Σx1x2 would be zero. (A numerical sketch of points 1 and 2 follows this list.)

3. What is the meaning of the coefficients, for example b1? From the fitted regression equation, one is tempted to say that "b1 is the increase in y when x1 increases by 1." From 2, we have to add to this the words "in the presence of the other variables in the model." Hence, if you change the variables, the meaning of b1 also changes.
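The sketch below illustrates points 1 and 2 with numpy: adding a correlated second predictor changes the coefficient of x1, whereas adding a version of x2 that has been made orthogonal to the constant and to x1 leaves it unchanged. The data arrays are illustrative only, so the coefficients will not match 81.5, 1.87, 52.6 or 1.47.

    import numpy as np

    x1 = np.array([7.0, 1.0, 11.0, 11.0, 7.0, 3.0])       # illustrative data
    x2 = np.array([26.0, 29.0, 56.0, 31.0, 52.0, 71.0])
    y  = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 102.7])

    ones = np.ones_like(x1)
    b_one, *_ = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)
    b_two, *_ = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)
    print(b_one[1], b_two[1])     # the x1 coefficient differs between the two fits

    # Replace x2 by its residual after regressing it on [1, x1]; this version is
    # orthogonal to the constant and to x1, so the x1 coefficient is unchanged.
    X1 = np.column_stack([ones, x1])
    x2_orth = x2 - X1 @ np.linalg.lstsq(X1, x2, rcond=None)[0]
    b_orth, *_ = np.linalg.lstsq(np.column_stack([ones, x1, x2_orth]), y, rcond=None)
    print(np.isclose(b_one[1], b_orth[1]))   # True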

When other variables are added to the model, the formulas for the coefficients become very clumsy and it is much easier to extend the notation of vectors to that of matrices. Matrices provide a clear, generic approach to the problem.

1.1.7.2 Vectors and Matrices

As an illustration, we use the cement data in which there are four predictor variables. The model is

    y = β0x0 + β1x1 + β2x2 + β3x3 + β4x4 + ε

The fitted regression equation can be written in vector notation,

    y = b0x0 + b1x1 + b2x2 + b3x3 + b4x4 + e    (15)

The data are displayed in Sec. 1.5.1. Notice that each column vector has n = 13 entries and there are k = 5 vectors. As a block of five vectors, the predictors can be written as an n × k = 13 × 5 matrix, X.

The fitted regression equation is found by solving the normal equations which, written out in full for the cement data, are

    13b0 + 97b1 + 626b2 + 153b3 + 390b4 = 1240.5
    97b0 + 1139b1 + 4922b2 + 769b3 + 2620b4 = 10,032
    626b0 + 4922b1 + 33,050b2 + 7201b3 + 15,739b4 = 62,027.8
    153b0 + 769b1 + 7201b2 + 2293b3 + 4628b4 = 13,981.5
    390b0 + 2620b1 + 15,739b2 + 4628b3 + 15,062b4 = 34,733.3

Notice the symmetry in the coefficients of the bi. The matrix solution is

    b = (XᵀX)⁻¹XᵀY

    bᵀ = (62.4, 1.55, 0.510, 0.102, −0.144)    (18)

With the solution to the normal equations written as above, it is easy to see that the least-squares estimates of the parameters are weighted means of all the y values in the data. The estimates can be written as

    bi = Σ wi·yi

where the weights wi are functions of the x values. The regression coefficients reflect the strengths and weaknesses of means. The strengths are that each point in the data set contributes to each estimate, but the weaknesses are that one or two unusual values in the data set can have a disproportionate effect on the resulting estimates.
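In code, the matrix solution is a one-liner once the design matrix X (a leading column of ones followed by the predictor columns) has been assembled. The sketch below uses a small illustrative design matrix with two predictors; substituting the full 13 × 5 matrix built from Sec. 1.5.1 would reproduce the cement estimates quoted above.

    import numpy as np

    x1 = np.array([7.0, 1.0, 11.0, 11.0, 7.0, 3.0])       # illustrative data
    x2 = np.array([26.0, 29.0, 56.0, 31.0, 52.0, 71.0])
    y  = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 102.7])

    X = np.column_stack([np.ones_like(x1), x1, x2])

    b = np.linalg.solve(X.T @ X, X.T @ y)     # b = (X'X)^(-1) X'y
    print(b)

    # np.linalg.lstsq avoids forming X'X explicitly and is numerically safer,
    # but it returns the same estimates here.
    b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.allclose(b, b_lstsq))            # True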

1.1.7.3 The Projection Matrix, P

From the matrix solution, the fitted regression equation becomes

    ŷ = Xb = X(XᵀX)⁻¹Xᵀy, or Py    (19)

P = X(XᵀX)⁻¹Xᵀ is called the projection matrix and it has some nice properties, namely:

1. Pᵀ = P, that is, it is symmetrical.

2. PᵀP = P, that is, it is idempotent.

3. The residual vector e = y − ŷ = (I − P)y. I is the identity matrix with diagonal elements being 1 and the off-diagonal elements being 0.

4. From the triangle diagram, e is orthogonal to ŷ, which is easy to see as

    eᵀŷ = yᵀ(I − Pᵀ)Py = yᵀ(P − PᵀP)y = 0

5. P is the projection matrix onto X and ŷ is the projection of y onto X.

6. I − P is the projection matrix orthogonal to X, and the residual, e, is the projection of y onto a direction orthogonal to X.

The vector diagram of Fig. 2 becomes Fig. 4.
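Properties 1 to 4 can be confirmed directly in a few lines of numpy, as sketched below with illustrative data (forming P explicitly is fine for a small example, although it is usually avoided for large n).

    import numpy as np

    x = np.array([7.0, 1.0, 11.0, 11.0, 7.0, 3.0])       # illustrative data
    y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 102.7])
    X = np.column_stack([np.ones_like(x), x])

    P = X @ np.linalg.inv(X.T @ X) @ X.T                  # projection matrix
    I = np.eye(len(y))

    y_hat = P @ y              # fitted values: the projection of y onto X
    e = (I - P) @ y            # residuals: the orthogonal projection

    print(np.allclose(P, P.T))            # property 1: symmetric
    print(np.allclose(P @ P, P))          # property 2: idempotent
    print(np.isclose(e @ y_hat, 0.0))     # property 4: residuals orthogonal to fits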

1.1.8 Normality

1.1.8.1 Assumptions about the Models

In the discussion so far, we have seen some of the relationships and estimates which result from the least-squares method, and which are dependent on assumptions about the error, or deviation, term in the model. We now add a further restriction to these assumptions, namely that the error term, e, is distributed normally. This allows us to find the distribution of the residuals, and to find confidence intervals for certain estimates and carry out hypothesis tests on them.

The addition of the assumption of normality adds to the concept of correlation, as a zero correlation coefficient between two variables will then mean that they are independent.

We are usually more interested in the coefficient of the x term. The confidence interval (CI) for this coefficient (β1) is given by

    CI = b1 ± t(n−2)·√(s²/Sxx)    (21)
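A sketch of this interval in Python is given below, using scipy's t distribution for the critical value; the data arrays are illustrative only.

    import numpy as np
    from scipy import stats

    # 95% confidence interval for the slope, Eq. (21).
    x = np.array([7.0, 1.0, 11.0, 11.0, 7.0, 3.0])       # illustrative data
    y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 102.7])
    n = len(x)

    Sxx = ((x - x.mean()) ** 2).sum()
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
    a = y.mean() - b1 * x.mean()

    e = y - (a + b1 * x)
    s2 = (e ** 2).sum() / (n - 2)             # residual variance estimate

    t_crit = stats.t.ppf(0.975, df=n - 2)     # two-sided 95% critical value
    half_width = t_crit * np.sqrt(s2 / Sxx)
    print(b1 - half_width, b1 + half_width)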

1.1.8.3 Confidence Interval for the Mean

The 95% confidence interval for the predicted value, ŷ, when x = x0 is given by

    ŷ0 ± t(n−2)·s·√(1/n + (x0 − x̄)²/Sxx)

This confidence interval is illustrated in Fig. 5 using the cement data.

1.1.8.4 Prediction Interval for a Future Value

At times one wants to forecast the value of y for a given single future value x0 of x. This prediction interval for a future single point is wider than the confidence interval of the mean, as the variance of a single value of y around the mean is σ². In fact, the "1" under the square root symbol may dominate the other terms. The formula is given by

    ŷ0 ± t(n−2)·s·√(1 + 1/n + (x0 − x̄)²/Sxx)

Figure 4. Projections of y in terms of P.
Figure 5. Confidence and prediction intervals.

Regression is a widely used and flexible tool, applicable to many situations.

The method of least squares is the most commonly used in regression.

The resulting estimates are weighted means of the response variable at each data point. Means may not be resistant to extreme values of either X or y.

The normal, Gaussian, distribution is closely linked to least squares, which facilitates the use of the standard statistical methods of confidence intervals and hypothesis tests.

In fitting a model to data, an important result of the least-squares approach is that the vector of fitted or predicted values is orthogonal to the vector of residuals. With the added assumptions of normality, the residuals are statistically independent of the fitted values.

The data appear as columns which can be considered as vectors. Groups of X vectors can be manipulated as a matrix. A projection matrix is a useful tool in understanding the relationships between the observed values of y, the predicted y and the residuals.

1.2 GOODNESS OF FIT OF THE MODEL

1.2.1 Regression Printout from MINITAB

1.2.1.1 Regression with One or More Predictor Variables

In this section, comments are made on the printout from a MINITAB program on the cement data, using the heat evolved as y and the number of grams of tricalcium aluminate as x. This is extended to two or more predictor variables.

We have noted in Sec. 1.1.7.1 that the estimated coefficients will vary depending on the other variables in the model. With the first two variables in the model, the fitted regression equation represents a plane and the least-squares solution is

    ŷ = 52.6 + 1.47x1 + 0.662x2

In vector terms, it is clear that x1 is not orthogonal to x2.

1.2.1.3 Distribution of the Coefficients

The formula for the standard deviation (also called the standard error by some authors) of the constant term and for the x1 term is given in Sec. 1.1.8.1.

The T is the t-statistic = (estimator − hypothesized parameter)/standard deviation. The hypothesized parameter is its value under the null hypothesis, which is zero in this situation. The degrees of freedom are the same as those for the error or residual term. One measure of the goodness of fit of the model is whether the values of the estimated coefficients, and hence the values of the respective t-statistics, could have arisen by chance, and this is indicated by the p-values.

The p-value is the probability of obtaining a more extreme t-value by chance. As the p-values here are small, we conclude that the observed t-values are not due to chance but to the presence of x1 in the model. In other words, as the probabilities are small (< 0.05, which is the common level used), both the constant and b1 are significant at the 5% level.
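The sketch below shows how such a t-statistic and its two-sided p-value are computed; the estimate and standard error used here are placeholders, not the values from the MINITAB printout.

    import numpy as np
    from scipy import stats

    # t = (estimate - 0) / standard error, with the residual degrees of freedom.
    estimate, std_error = 1.87, 0.52      # placeholder values
    n, k = 13, 1                          # observations, predictor variables

    t_stat = estimate / std_error
    df = n - k - 1                        # degrees of freedom of the residual term
    p_value = 2 * stats.t.sf(abs(t_stat), df=df)   # two-sided p-value
    print(t_stat, p_value)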

1.2.1.4 R-Squared and Standard Error

S = 10.73 R-Sq = 53.4% R-Sq(adj) = 49.2%

S = 10.73 is the standard error of the residual term. We would prefer to use lower case, s, as it is an estimate of the S in the S² of Eq. (3).

R-Sq (short for R-squared) is the coefficient of determination, R², which indicates the proportion of the variation of Y explained by the regression equation:

    R² = Sŷŷ/Syy, and recall that Syy = Σ(y − ȳ)²

It can be shown that R is the correlation coefficient between ŷ and y provided that the x and y terms have been centered, and that the sum of squares of the fitted values can be written as Σŷ² = yᵀPy.

R² lies between 0, if the regression equation does not explain any of the variation of Y, and 1 if the regression equation explains all of the variation. Some authors and programs such as MINITAB write R² as a percentage between 0 and 100%. In this case, R² is only about 50%, which does not indicate a good fit. After all, this means that 50% of the variation of y is unaccounted for.

As more variables are added to the model, the value of R² will increase, as shown in the following table. The variables x1, x2, x3, and x4 were sequentially added to the model. Some authors and computer programs consider the increase in R², denoted by ΔR². In this example, x2 adds a considerable amount to R² but the next two variables add very little. In fact, x4 appears not to add any prediction power to the model, but this would suggest that the vector x4 is orthogonal to the others. It is more likely that some rounding error has occurred.

One peculiarity of R² is that it will, by chance, give a value between 0 and 100% even if the X variable is a column of random numbers. To adjust for the random effect of the k variables in the model, the R², as a proportion, is reduced by k/(n − 1) and then adjusted to fall between 0 and 1 to give the adjusted R². It could be multiplied by 100 to become a percent.

The variation of y can be partitioned as

    Sums of squares of y = Sums of squares of ŷ + Sums of squares of e

That is,

    Sums of squares, total = Sums of squares for regression + Sums of squares for residual    (26)
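The decomposition in Eq. (26) is also how R² and the adjusted R² of the printout are computed. The sketch below assumes the usual definition of the adjusted value, 1 − (1 − R²)(n − 1)/(n − k − 1), which reproduces R-Sq(adj) = 49.2% from R-Sq = 53.4% when n = 13 and k = 1; the data arrays themselves are illustrative only.

    import numpy as np

    x = np.array([7.0, 1.0, 11.0, 11.0, 7.0, 3.0])       # illustrative data
    y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 102.7])
    n, k = len(y), 1                                     # observations, predictors

    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef

    ss_total = ((y - y.mean()) ** 2).sum()               # Syy
    ss_resid = (e ** 2).sum()
    ss_regr = ss_total - ss_resid                        # sums of squares for regression

    r2 = ss_regr / ss_total
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    print(r2, r2_adj)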

The ANOVA table is set up to test the hypothesis that the parameter β = 0. If there is more than one predictor variable, the hypothesis would be

    H: ...
