2.4 One-Way ANOVA with Fixed Effects

2.4.1 The Model

The one-way analysis of variance, or one-way ANOVA, extends the two-sample situation discussed above to $a \geq 2$ groups. For example, in an agricultural setting,¹ there might be $a \geq 2$ competing fertilizer mixtures available, the best one of which (in terms of weight of crop yield) is not theoretically obvious for a certain plant under certain conditions (soil, climate, amount of rain and sunshine, etc.).

To help determine the efficacy of each fertilizer mixture, which ones are statistically the same, and, possibly, which one is best, an experiment could consist of forming $na$ equally sized plots of land on which the plant is grown, such that all external conditions are the same for each plot (sunshine, rainfall, etc.), with $n$ of the $na$ plots, randomly chosen (to help account for any exogenous factor not considered), getting treated with the $i$th fertilizer mixture, $i = 1, \ldots, a$. When the allocation of fertilizer treatments to the plots is done randomly, the crop yield of the $na$ plots can be treated as independent realizations of random variables $Y_{ij}$, where $i$ refers to the fertilizer used, $i = 1, \ldots, a$, and $j$ refers to the replication, $j = 1, \ldots, n$.

The usual assumption is that the $Y_{ij}$ are normally distributed with equal variance $\sigma^2$ and possibly different means $\mu_i$, $i = 1, \ldots, a$, so that the model is given by

$$Y_{ij} = \mu_i + \epsilon_{ij}, \qquad \epsilon_{ij} \overset{\text{i.i.d.}}{\sim} \mathrm{N}(0, \sigma^2). \tag{2.19}$$

1 The techniques of ANOVA were actually founded for use in agriculture, which can be seen somewhat in the terminology that persists, such as “treatments”, “plots”, “split-plots” and “blocks”. See Mahalanobis (1964) for a biography of Sir Ronald Fisher and others who played a role in the development of ANOVA. See also Plackett (1960) for a discussion of the major early developments in the field.

Observe that the normality assumption cannot be strictly correct, as crop yield cannot be negative. However, normality can be an excellent approximation if the probability of a crop yield falling below some small positive number is very small, and the distribution is otherwise close to Gaussian.

The first, and often primary, question of interest is the extent to which the model can be more simply expressed as $Y_{ij} = \mu + \epsilon_{ij}$, i.e., all the $\mu_i$ are the same and equal $\mu$. Formally, we wish to test

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a \ (= \mu) \tag{2.20}$$

against the alternative that at least one pair of $\mu_i$ differ. It is worth emphasizing that, for $a > 2$, the alternative is not that all the $\mu_i$ are different. If $a = 2$, then the method in Section 2.2 can be used to test $H_0$. For $a > 2$, a more general model is required. In addition, new questions can be posed, most notably: if we indeed can reject the equal-$\mu_i$ hypothesis, then precisely which pairs of $\mu_i$ actually differ from one another?

Instead of (2.19), it is sometimes convenient to work with the model parameterization given by

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, n, \tag{2.21}$$

i.e., $\mu_i = \mu + \alpha_i = \mathbb{E}[Y_{ij}]$, which can be interpreted as an overall mean $\mu$ plus a factor $\alpha_i$ for each of the $a$ treatments. The $X$ matrix is then similar to that given in (2.1), but with $a+1$ columns, the first of which is a column of all ones, and thus such that $X$ is rank deficient, with rank $a$.

In this form, we have $a+1$ parameters for the $a$ means, so the set of these $a+1$ parameters is not identified, and only some of their linear combinations are estimable, recalling the discussion in Section 1.4.2. In this case, one linear restriction on the $\alpha_i$ is necessary in order for them to be estimable. A natural choice is $\sum_{i=1}^{a} \alpha_i = 0$, so that the $\alpha_i$ can be interpreted as deviations from the overall mean $\mu$. The null hypothesis (2.20) can also be written $H_0: \alpha_1 = \cdots = \alpha_a = 0$ versus $H_a$: at least one $\alpha_i \neq 0$.

2.4.2 Estimation and Testing

Based on the model assumptions of independence and normality, we would expect that the parameter estimators for model formulation (2.19) are given by $\hat{\mu}_i = \bar{Y}_{i\bullet}$, $i = 1, \ldots, a$, and, recalling the notation of $S_i^2$ in (2.4), $\hat{\sigma}^2 = (n-1)\sum_{i=1}^{a} S_i^2 / (na - a)$, the latter being a direct generalization of the pooled variance estimator of $\sigma^2$ in the two-sample case. This is indeed the case, and to verify these we cast the model in the general linear model framework by writing $Y = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where

$$Y = (Y_{11}, Y_{12}, \ldots, Y_{1n}, Y_{21}, \ldots, Y_{an})', \qquad X = \begin{pmatrix} \mathbf{1}_n & \mathbf{0}_n & \cdots & \mathbf{0}_n \\ \mathbf{0}_n & \mathbf{1}_n & \cdots & \mathbf{0}_n \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}_n & \mathbf{0}_n & \cdots & \mathbf{1}_n \end{pmatrix} = I_a \otimes \mathbf{1}_n, \tag{2.22}$$

$\boldsymbol{\beta} = (\mu_1, \ldots, \mu_a)'$, and $\boldsymbol{\epsilon}$ is structured similarly to $Y$. Note that this generalizes the setup in (2.1) (but with the sample sizes in each group being the same) and is not the formulation in (2.21). Matrix $X$ in (2.22) has full rank $a$.

As $X$ is $na \times a$, there are $T = na$ total observations and $k = a$ regressors. The Kronecker product notation allows the design matrix to be expressed very compactly and is particularly helpful for representing $X$ in more complicated models. It is, however, only possible when the number of replications is the same per treatment, which we assume here for simplicity of presentation. In this case, the model is said to be balanced. More generally, the $i$th group has $n_i$ observations, $i = 1, \ldots, a$, and if any two of the $n_i$ are not equal, the model is unbalanced.

Using the basic facts that, for conformable matrices,

$$(A \otimes B)' = (A' \otimes B') \quad \text{and} \quad (A \otimes B)(C \otimes D) = (AC \otimes BD), \tag{2.23}$$

it is easy to verify that

$$(X'X) = (I_a \otimes \mathbf{1}_n)'(I_a \otimes \mathbf{1}_n) = nI_a \quad \text{and} \quad X'Y = (Y_{1\bullet}\ \ Y_{2\bullet}\ \ \cdots\ \ Y_{a\bullet})', \tag{2.24}$$

yielding the least squares unbiased estimators

$$\hat{\boldsymbol{\beta}} = (\bar{Y}_{1\bullet}, \bar{Y}_{2\bullet}, \ldots, \bar{Y}_{a\bullet})', \qquad S(\hat{\boldsymbol{\beta}}) = \sum_{i=1}^{a} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\bullet})^2, \qquad \hat{\sigma}^2 = \frac{S(\hat{\boldsymbol{\beta}})}{a(n-1)}, \tag{2.25}$$

with $\hat{\sigma}^2$ generalizing that given in (2.6) for $a = 2$.

For the restricted model $Y = X\boldsymbol{\gamma} + \boldsymbol{\epsilon}$, i.e., the model under the null hypothesis of no treatment (fertilizer) effect, we could use (1.69) to compute $\hat{\boldsymbol{\gamma}}$ with the $J = a-1$ restrictions represented as $H\boldsymbol{\beta} = h$ with, say,

$$H = [I_{a-1}, -\mathbf{1}_{a-1}] \quad \text{and} \quad h = \mathbf{0}. \tag{2.26}$$

It should be clear for this model that

$$\hat{\boldsymbol{\gamma}} = \hat{\mu} = \bar{Y}_{\bullet\bullet} \quad \text{and} \quad S(\hat{\boldsymbol{\gamma}}) = \sum_{i=1}^{a} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2, \tag{2.27}$$

so that the $F$ statistic (1.88) associated with the null hypothesis (2.20) can be computed. Moreover, the conditions in Example 1.8 are fulfilled, so that (1.55) (with the group means $\bar{Y}_{i\bullet}$ and grand mean $\bar{Y}_{\bullet\bullet}$) implies

$$\sum_{i=1}^{a} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\bullet})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n} (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2, \tag{2.28}$$

and, in particular,

$$S(\hat{\boldsymbol{\gamma}}) - S(\hat{\boldsymbol{\beta}}) = \sum_{i=1}^{a} \sum_{j=1}^{n} (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2 = n \sum_{i=1}^{a} (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2.$$

Thus, (1.88) gives

$$F = \frac{n \sum_{i=1}^{a} (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2 / (a-1)}{\sum_{i=1}^{a} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i\bullet})^2 / (na - a)} \sim F_{a-1,\, na-a} \tag{2.29}$$

under $H_0$ from (2.20), which, for $a = 2$, agrees with (2.10) with $m = n$.
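As a numerical illustration, the following Matlab sketch computes the $F$ statistic (2.29) and its p-value for balanced data. The function name onewayF and the $n \times a$ data layout (column $i$ holding the replications of treatment $i$) are our own conventions, not from the text; fcdf requires the Statistics Toolbox.

```matlab
% Sketch: one-way ANOVA F statistic (2.29) for balanced data.
% Y is an n x a matrix; column i holds the n replications of treatment i.
function [F, pval] = onewayF(Y)
[n, a] = size(Y);
Ybari  = mean(Y, 1);                              % treatment means (1 x a)
Ybar   = mean(Y(:));                              % grand mean
SSB    = n * sum((Ybari - Ybar).^2);              % between sum of squares
SSW    = sum(sum((Y - repmat(Ybari, n, 1)).^2));  % within sum of squares
F      = (SSB / (a-1)) / (SSW / (n*a - a));
pval   = 1 - fcdf(F, a-1, n*a - a);               % Pr(F_{a-1,na-a} > F) under H0
end
```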

Remark The pitfalls associated with (and some alternatives to) the use of statistical tests for dichotomous model selection were discussed in Section III.2.8, where numerous references can be found, including recent ones such as McShane and Gal (2016) and Briggs (2016). We presume that the reader has got the message and realizes the ludicrousness of a procedure as simple as "if the p-value is less than 0.05, the effect is significant" and "if the p-value is greater than 0.05, there is no effect". We subsequently suppress this discussion and present the usual test statistics associated with ANOVA, and common to all statistical software, using the traditional language of "reject the null" and "not reject the null", hoping the reader understands that this nonfortuitous language is not a synonym for model selection. ◾

A test of size $\alpha$ "rejects" $H_0$ if $F > c$, where $c$ is such that $\Pr(F > c) = \alpha$. We will sometimes write this as: the $F$ test in (2.29) for $H_0$ rejects if $F > F^{\alpha}_{a-1,\, na-a}$, where $F^{\alpha}_{n,d}$ is the $100(1-\alpha)$th percent quantile of the $F_{n,d}$ distribution. As a bit of notational explanation to avoid any confusion, note how, as history has it, $\alpha$ is the standard notation for the significance level of a test, and how we use $\alpha_i$ in (2.21), this also being common notation for the fixed effects. Below, in (2.40), we will express $F$ in matrix terms.

To determine the noncentrality parameter $\theta$ under the alternative hypothesis, we can use (1.82), i.e., $\theta = \boldsymbol{\beta}'H'(HAH')^{-1}H\boldsymbol{\beta}/\sigma^2$, where $A = (X'X)^{-1}$. In particular, from (2.24) and (2.26), $HAH' = n^{-1}HH'$, and $HH' = I_{a-1} + \mathbf{1}_{a-1}\mathbf{1}_{a-1}'$. From (1.70), its inverse is

$$I_{a-1} - \mathbf{1}_{a-1}(\mathbf{1}_{a-1}'I_{a-1}\mathbf{1}_{a-1} + 1)^{-1}\mathbf{1}_{a-1}' = I_{a-1} - a^{-1}\mathbf{1}_{a-1}\mathbf{1}_{a-1}',$$

so that

$$\boldsymbol{\beta}'H'(n^{-1}HH')^{-1}H\boldsymbol{\beta} = n\boldsymbol{\beta}'H'H\boldsymbol{\beta} - na^{-1}\boldsymbol{\beta}'H'\mathbf{1}_{a-1}\mathbf{1}_{a-1}'H\boldsymbol{\beta} = n\sum_{i=1}^{a-1}(\mu_i - \mu_a)^2 - \frac{n}{a}\left(\sum_{i=1}^{a-1}(\mu_i - \mu_a)\right)^{\!2}.$$

Notice that, when $a = 2$, this becomes $n$ times

$$(\mu_1 - \mu_2)^2 - \frac{1}{2}(\mu_1 - \mu_2)^2 = \frac{1}{2}(\mu_1 - \mu_2)^2,$$

so that $\theta = n(\mu_1 - \mu_2)^2/(2\sigma^2)$, which agrees with (2.12) for $m = n$.

To simplify the expression for general $a \geq 2$, we switch to the alternative notation (2.21), i.e., $\mu_i = \mu + \alpha_i$ and $\sum_{i=1}^{a}\alpha_i = 0$. Then

$$\sum_{i=1}^{a-1}(\mu_i - \mu_a)^2 = \sum_{i=1}^{a-1}(\alpha_i - \alpha_a)^2 = \sum_{i=1}^{a}(\alpha_i - \alpha_a)^2 = \sum_{i=1}^{a}\alpha_i^2 - 2\alpha_a\sum_{i=1}^{a}\alpha_i + a\alpha_a^2 = \sum_{i=1}^{a}\alpha_i^2 + a\alpha_a^2$$

and

$$\frac{1}{a}\left(\sum_{i=1}^{a-1}(\mu_i - \mu_a)\right)^{\!2} = \frac{1}{a}\left(\sum_{i=1}^{a-1}(\alpha_i - \alpha_a)\right)^{\!2} = \frac{1}{a}\left(0 - \alpha_a - (a-1)\alpha_a\right)^2 = a\alpha_a^2,$$

so that

$$\theta = \frac{n}{\sigma^2}\sum_{i=1}^{a}\alpha_i^2. \tag{2.30}$$

Thus, with $F \sim F_{a-1,\, na-a}(\theta)$, the power of the test is $\Pr(F > c)$, where $c$ is determined from (2.29) for a given probability $\alpha$.

Remark Noncentrality parameter $\theta$ in (2.30) can be derived directly using model formulation (2.21), and the reader is encouraged to do so. Hint: We do so in the more general two-way ANOVA below; see (2.64). ◾
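For instance, with the noncentral $F$ c.d.f. available (ncfcdf in Matlab's Statistics Toolbox), the power is a short computation; the effect sizes below are hypothetical values chosen only for illustration.

```matlab
% Sketch: power of the size-alpha F test via (2.30).
a = 5; n = 8; sig2 = 4; alpha = 0.05;
alphavec = [1 0.5 -0.5 -0.5 -0.5];        % hypothetical alpha_i, summing to zero
theta = n * sum(alphavec.^2) / sig2;      % noncentrality parameter (2.30)
c     = finv(1 - alpha, a-1, n*a - a);    % cutoff under H0
power = 1 - ncfcdf(c, a-1, n*a - a, theta)
```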

2.4.3 Determination of Sample Size

To determine $n$, the required number of replications in each of the $a$ treatments, for a given significance $\alpha$, power $\rho$, and value of $\sigma^2$, we solve

$$\Pr(F_{a-1,\, an-a}(0) > c) = \alpha \quad \text{and} \quad \Pr(F_{a-1,\, an-a}(\theta) > c) = \rho$$

for $n$ and $c$, and then round $n$ up to the nearest integer, giving, say, $n^*$. The program in Listing 2.1 is easily modified to compute this.
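A minimal such modification (ours, not the book's Listing 2.1) simply increments $n$ until the power reaches $\rho$; the values of $a$, $\sigma^2$, and $\sum\alpha_i^2$ below are illustrative.

```matlab
% Sketch: smallest n with power >= rho, by direct search.
a = 4; sig2 = 1; sumalpha2 = 0.125;       % e.g., delta = 0.5 gives delta^2/2
alpha = 0.05; rho = 0.90;
n = 2; power = 0;
while power < rho
    n     = n + 1;
    theta = n * sumalpha2 / sig2;         % noncentrality (2.30)
    c     = finv(1 - alpha, a-1, n*a - a);
    power = 1 - ncfcdf(c, a-1, n*a - a, theta);
end
nstar = n                                 % required replications per treatment
```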

Remark It is worth emphasizing the luxury we have with the availability of cheap modern computing power. This makes such calculations virtually trivial. Use of the saddlepoint approximation to the noncentral $F$ speeds things up further, allowing "what if" scenarios and plots of $n$ as a function of various input variables to be made essentially instantaneously. To get an idea of how sample size determination was previously done and the effort put into construction of tabulated values, see Sahai and Ageel (2000, pp. 57–60). ◾

Also similar to the sample size calculation in the two-sample case, the value of $\sigma^2$ must be specified. As $\sigma^2$ will almost always be unknown, an approximation needs to be used, for which there might be several (based on prior knowledge resulting from, perhaps, a pilot experiment, or previous, related experiments, or theoretical considerations, or, most likely, a combination of these). As $n^*$ is an increasing function of $\sigma^2$, use of the largest "educated guess" for $\sigma^2$ would lead to a conservative choice of $n^*$. Arguably even more complicated is the specification of $\sum_{i=1}^{a}\alpha_i^2$, of which $n^*$ is a decreasing function, i.e., to be conservative we need to choose the smallest relevant $\sum_{i=1}^{a}\alpha_i^2$.

One way to make such a choice is to choose a value $\delta$ that represents the smallest practically significant difference worth detecting between any two particular treatments, say 1 and 2. Then taking $|\alpha_1 - \alpha_2| = \delta$ and $\alpha_i = 0$, $i = 3, \ldots, a$, together with the constraint $\sum_{i=1}^{a}\alpha_i = 0$, implies $\alpha_1 = \pm\delta/2$, $\alpha_2 = \mp\delta/2$, and $\sum_{i=1}^{a}\alpha_i^2 = \delta^2/2$. Specification of $\delta$ appears easier than that of $\sum_{i=1}^{a}\alpha_i^2$, although it might lead to unnecessarily high choices of $n^*$ if more specific information is available about the choice of the $\alpha_i$.

In certain cases, an experiment is conducted in which the treatments are actually levels of a particular "input", the choice of which determines the amount of "output", which, say, is to be maximized. For example, the input might be the dosage of a drug, or the temperature of an industrial process, or the percentage of a chemical in a fertilizer, etc. Depending on the circumstances, the researcher might be free to choose the number of levels, $a$, as well as the replication number, $n$, but with the constraint that $na \leq N^*$. The optimal choice of $a$ and $n$ will depend not only on $N^*$ and $\sigma^2$, but also on the approximate functional form (linear, quadratic, etc.) relating the level to the output variable; see, e.g., Montgomery (2000) for further details.

Alternatively, instead of different levels of some particular treatment, the situation might involve comparing the performance of several different treatments (brands, methods, chemicals, medicines, etc.). In this case, there is often a control group that receives the "standard treatment", which might mean no treatment at all (or a placebo in medical studies involving humans), and interest centers on determining which, if any, treatments are better than the control, and, among those that are better, which is best. Common sense would suggest including only those treatments in the experiment that might possibly be better than the control. For example, imagine a study for comparing drugs that purport to increase the rate at which the human liver can remove alcohol from the bloodstream. The control group would consist of those individuals receiving no treatment (or, possibly, a placebo), while treatment with caffeine would not be included, as its (unfortunate) ineffectiveness is well known.

Example 2.1 To see the effect on the power of the $F$ test when superfluous treatments are included, let the first group correspond to the prevailing treatment and assume all other considered treatments do not have an effect. In terms of model formulation (2.21) with the natural constraint $\sum_{i=1}^{a}\alpha_i = 0$, we take $\alpha_1 = \delta$ and $\alpha_2 = \alpha_3 = \cdots = \alpha_a = -\delta/(a-1)$, so that $\sum_{i=1}^{a}\alpha_i^2 = \delta^2 a/(a-1)$. For $\sigma^2 = 1$, $n = 22$, test size $\alpha = 0.05$, and $\delta = 0.5$, the power is 0.90 for $a = 2$ and decreases as $a$ increases. Figure 2.4 plots the power, as a function of $a$, for various constellations of $n$ and $\delta$.

[Figure 2.4 Top: Power of the $F$ test as a function of $a$, for fixed $\alpha = 0.05$, $\delta = 0.5$, and $\sigma^2 = 1$, and three values of $n$ ($n = 22$, $16$, $8$). Bottom: Similar, but $n$ is fixed at 16, and three values of $\delta$ are used ($\delta = 1$, $1/2$, $1/4$). The middle dashed line is the same in both graphics.]

Observe how the total sample size $N^* = na$ increases with $a$. This might not be realistic in practice, and instead $N^*$ might be fixed, so that, as $a$ increases, $n$ decreases, and the power will drop far faster than shown in the plots. The reader is encouraged to reproduce the plots in Figure 2.4, as well as to consider the case when $N^*$ is fixed. ◾
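A sketch for reproducing the top panel of Figure 2.4 (axis cosmetics omitted); it reuses the noncentral $F$ power calculation together with the example's effect structure $\sum\alpha_i^2 = \delta^2 a/(a-1)$.

```matlab
% Sketch: power curves as in the top panel of Figure 2.4.
sig2 = 1; delta = 0.5; alpha = 0.05; avec = 2:10;
for n = [22 16 8]
    power = zeros(size(avec));
    for k = 1:length(avec)
        a        = avec(k);
        theta    = n * delta^2 * a/(a-1) / sig2;   % (2.30) with Example 2.1's alphas
        c        = finv(1 - alpha, a-1, n*a - a);
        power(k) = 1 - ncfcdf(c, a-1, n*a - a, theta);
    end
    plot(avec, power), hold on
end
xlabel('a, the number of treatment groups'), ylabel('Power of the F test')
legend('n = 22', 'n = 16', 'n = 8')
```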

2.4.4 The ANOVA Table

We endow the various sums of squares arising in this model with particular names that are common (but not universal; see the Remark below) in the literature, as follows:

- $\sum_{i=1}^{a}\sum_{j=1}^{n}(Y_{ij} - \bar{Y}_{\bullet\bullet})^2$ is called the total (corrected) sum of squares, abbreviated SST;
- $\sum_{i=1}^{a}\sum_{j=1}^{n}(Y_{ij} - \bar{Y}_{i\bullet})^2$ is the within (group) sum of squares, abbreviated SSW, also referred to as the sum of squares due to error; and,
- recalling (2.25) and (2.27), $S(\hat{\boldsymbol{\gamma}}) - S(\hat{\boldsymbol{\beta}})$ is referred to as the between (group) sum of squares, or SSB.

That is, SST = SSW + SSB from (2.28), as the sketch below confirms numerically.
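A small check with simulated data (the parameter values are our own choice):

```matlab
% Sketch: verify SST = SSW + SSB for simulated balanced data.
rng(1); a = 5; n = 8;
Y = 10 + 2*randn(n, a);                           % columns = treatments
Ybari = mean(Y, 1); Ybar = mean(Y(:));
SST = sum(sum((Y - Ybar).^2));
SSW = sum(sum((Y - repmat(Ybari, n, 1)).^2));
SSB = n * sum((Ybari - Ybar).^2);
disp(SST - (SSW + SSB))                           % zero up to rounding error
```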

Remark It is important to emphasize that this notation, while common, is not universal. For example, in the two-way ANOVA in Section 2.5 below, there will be two factors, say A and B, and we will use SSB to denote the sum of squares of the latter. Likewise, in a three-factor model, the factors would be labeled A, B, and C.

In the two-way ANOVA case, some authors refer to the "more interesting" factor A as the "treatment", and the second one as a block (block here not in the sense of "preventing", but rather of "segmenting"), such as for "less interesting" things, such as gender, age group, smoker/non-smoker, etc. As the word block coincidentally also starts with a b, its associated sum of squares is denoted SSB. ◾

A more complete sum of squares decomposition is possible by starting with the uncorrected total sum of squares,

$$\sum_{i=1}^{a}\sum_{j=1}^{n} Y_{ij}^2 = \sum_{i=1}^{a}\sum_{j=1}^{n}(Y_{ij} - \bar{Y}_{i\bullet} + \bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} + \bar{Y}_{\bullet\bullet})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n}(Y_{ij} - \bar{Y}_{i\bullet})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n}(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n}\bar{Y}_{\bullet\bullet}^2, \tag{2.31}$$

and verifying that all the cross terms are zero. For that latter task, let $P_X$ be the projection matrix based on $X$ in (2.22) and $P_{\mathbf{1}}$ the projection matrix based on a column of ones. Then the decomposition (2.31) follows directly from the algebraic identity

$$Y'IY = Y'(I - P_X)Y + Y'(P_X - P_{\mathbf{1}})Y + Y'P_{\mathbf{1}}Y, \tag{2.32}$$

and the fact that

$$S(\hat{\boldsymbol{\gamma}}) - S(\hat{\boldsymbol{\beta}}) = Y'(P_X - P_{\mathbf{1}})Y, \tag{2.33}$$

from (1.87). Recall from Theorem 1.6 that, if $\mathcal{M}_0$ and $\mathcal{M}$ are subspaces of $\mathbb{R}^T$ such that $\mathcal{M}_0 \subseteq \mathcal{M}$, then $P_{\mathcal{M}}P_{\mathcal{M}_0} = P_{\mathcal{M}_0} = P_{\mathcal{M}_0}P_{\mathcal{M}}$. Thus, from (1.80) and the fact that $\mathbf{1} \in \mathcal{C}(X)$, $P_X - P_{\mathbf{1}}$ is a projection matrix.

Remark Anticipating the discussion of the two-way ANOVA in Section 2.5 below, we rewrite (2.32), expressing the single effect as A, and thus its projection matrix as $P_A$ instead of $P_X$:

$$Y'Y = Y'P_{\mathbf{1}}Y + Y'(P_A - P_{\mathbf{1}})Y + Y'(I - P_A)Y, \tag{2.34}$$

where the three terms on the right-hand side are, respectively, the sums of squares with respect to the grand mean, the treatment effect, and the error term. Note that, in the latter term, $\mathbf{1} \in \mathcal{C}(A) = \mathcal{C}(X)$ (where $A$ here refers to the columns of $X$ associated with the factor A), which is why the term is not $Y'(I - P_A - P_{\mathbf{1}})Y$.

Further, moving the last term in (2.32), namely $Y'P_{\mathbf{1}}Y$, to the left-hand side of (2.34) gives the decomposition in terms of the corrected total sum of squares:

$$Y'(I - P_{\mathbf{1}})Y = Y'(P_A - P_{\mathbf{1}})Y + Y'(I - P_A)Y, \tag{2.35}$$

this being more commonly used. ◾

Each of the sums of squares in (2.32) has an associated number of degrees of freedom that can be determined from Theorem A.1. In particular, for SST, $\mathrm{rank}(I - P_{\mathbf{1}}) = na - 1$; for SSW, $\mathrm{rank}(I - P_X) = na - a$; and for SSB, as $P_X - P_{\mathbf{1}}$ is a projection matrix,

$$\mathrm{rank}(P_X - P_{\mathbf{1}}) = \mathrm{tr}(P_X - P_{\mathbf{1}}) = \mathrm{tr}(P_X) - \mathrm{tr}(P_{\mathbf{1}}) = \mathrm{rank}(P_X) - \mathrm{rank}(P_{\mathbf{1}}) = a - 1, \tag{2.36}$$

from Theorem 1.2. Note also that $P_X = n^{-1}XX' = (I_a \otimes J_n)/n$, with trace $na/n = a$. Clearly, the sum of squares for the mean, $na\bar{Y}_{\bullet\bullet}^2$, and the uncorrected total sum of squares have one and $na$ degrees of freedom, respectively.

From (2.33), the expected between (or treatment) sum of squares is

$$\mathbb{E}[\mathrm{SSB}] = \mathbb{E}[Y'(P_X - P_{\mathbf{1}})Y] = \sigma^2\, \mathbb{E}[(Y/\sigma)'(P_X - P_{\mathbf{1}})(Y/\sigma)], \tag{2.37}$$

so that, from (1.92), and recalling from (II.10.6) the expectation of a noncentral $\chi^2$ random variable, i.e., if $Z \sim \chi^2(n, \theta)$, then $\mathbb{E}[Z] = n + \theta$, we have, with $J = a - 1$ and $\theta$ defined in (2.30),

$$\mathbb{E}[\mathrm{SSB}] = \sigma^2(J + \boldsymbol{\beta}'X'(P_X - P_{\mathbf{1}})X\boldsymbol{\beta}/\sigma^2) = \sigma^2(a - 1 + \theta) = \sigma^2(a-1) + n\sum_{i=1}^{a}\alpha_i^2. \tag{2.38}$$

Similarly, from (1.93), $\mathbb{E}[\mathrm{SSW}] = \sigma^2(na - a)$.

Remark It is a useful exercise to derive (2.38) using the basic quadratic form result in (A.6), which states that, for $Y = X'AX$ with $X \sim \mathrm{N}_n(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, $\mathbb{E}[Y] = \mathrm{tr}(A\boldsymbol{\Sigma}) + \boldsymbol{\mu}'A\boldsymbol{\mu}$.

Before proceeding, the reader should confirm that, for $T = an$,

$$P_{\mathbf{1}} = T^{-1}\mathbf{1}_T\mathbf{1}_T' = (na)^{-1}J_a \otimes J_n. \tag{2.39}$$

This is somewhat interesting in its own right, for it says that $P_{\mathbf{1},an} = P_{\mathbf{1},a} \otimes P_{\mathbf{1},n}$, where $P_{\mathbf{1},j}$ denotes the $j \times j$ projection matrix onto $\mathbf{1}_j$.

From (2.33), we have

$$\mathbb{E}[\mathrm{SSB}] = \mathbb{E}[S(\hat{\boldsymbol{\gamma}}) - S(\hat{\boldsymbol{\beta}})] = \mathbb{E}[Y'(P_X - P_{\mathbf{1}})Y] = \mathbb{E}[Y'P_XY] - \mathbb{E}[Y'P_{\mathbf{1}}Y],$$

and, from (A.6) with

$$\mathbb{E}[Y] = \boldsymbol{\mu} = \boldsymbol{\beta} \otimes \mathbf{1}_n \quad \text{and} \quad \boldsymbol{\beta} = (\mu_1, \ldots, \mu_a)' = (\mu + \alpha_1, \ldots, \mu + \alpha_a)',$$

we have

$$\mathbb{E}[Y'P_XY] = \sigma^2\, \mathrm{tr}(P_X) + \boldsymbol{\mu}'P_X\boldsymbol{\mu} = a\sigma^2 + n^{-1}(\boldsymbol{\beta} \otimes \mathbf{1}_n)'(I_a \otimes J_n)(\boldsymbol{\beta} \otimes \mathbf{1}_n) = a\sigma^2 + n^{-1}(\boldsymbol{\beta}'I_a\boldsymbol{\beta} \otimes \mathbf{1}_n'J_n\mathbf{1}_n) = a\sigma^2 + n^{-1}(\boldsymbol{\beta}'\boldsymbol{\beta} \otimes n^2) = a\sigma^2 + n\sum_{i=1}^{a}\mu_i^2.$$

Similarly, with $P_{\mathbf{1}} = T^{-1}\mathbf{1}_T\mathbf{1}_T' = (na)^{-1}J_a \otimes J_n$,

$$\mathbb{E}[Y'P_{\mathbf{1}}Y] = \sigma^2\, \mathrm{tr}(P_{\mathbf{1}}) + \boldsymbol{\mu}'P_{\mathbf{1}}\boldsymbol{\mu} = \sigma^2 + (na)^{-1}(\boldsymbol{\beta} \otimes \mathbf{1}_n)'(J_a \otimes J_n)(\boldsymbol{\beta} \otimes \mathbf{1}_n) = \sigma^2 + (na)^{-1}(\boldsymbol{\beta}'J_a\boldsymbol{\beta} \otimes \mathbf{1}_n'J_n\mathbf{1}_n) = \sigma^2 + (na)^{-1}\left(\left(\sum_{i=1}^{a}\mu_i\right)^{\!2} \otimes n^2\right) = \sigma^2 + (na)^{-1}n^2(a\mu)^2 = \sigma^2 + na\mu^2.$$

Thus,

$$\mathbb{E}[Y'(P_X - P_{\mathbf{1}})Y] = (a-1)\sigma^2 + n\left(\sum_{i=1}^{a}\mu_i^2 - a\mu^2\right),$$

but

$$\sum_{i=1}^{a}\mu_i^2 = a\mu^2 + \sum_{i=1}^{a}\alpha_i^2 + 2\mu\sum_{i=1}^{a}\alpha_i = a\mu^2 + \sum_{i=1}^{a}\alpha_i^2,$$

so that

$$\mathbb{E}[\mathrm{SSB}] = \mathbb{E}[Y'(P_X - P_{\mathbf{1}})Y] = (a-1)\sigma^2 + n\sum_{i=1}^{a}\alpha_i^2,$$

as in (2.38). ◾
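The projection facts used above are easy to check numerically for small $a$ and $n$; a sketch:

```matlab
% Sketch: numerical check of (2.36) and (2.39).
a = 3; n = 4; T = a*n;
X  = kron(eye(a), ones(n,1));                 % design matrix (2.22): X = I_a kron 1_n
PX = X / (X'*X) * X';                         % projection onto C(X), i.e., (I_a kron J_n)/n
P1 = ones(T) / T;                             % projection onto 1_T
disp(norm(P1 - kron(ones(a)/a, ones(n)/n)))   % (2.39): P_{1,an} = P_{1,a} kron P_{1,n}
D  = PX - P1;
disp([norm(D*D - D), trace(D)])               % idempotent, with trace (= rank) a-1 = 2
```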

For conducting statistical inference, it is usually more convenient to work with the mean squares, denoted MS, which are just the sums of squares divided by their associated degrees of freedom. For this model, the important ones are $\mathrm{MSW} = \mathrm{SSW}/(na - a)$ and $\mathrm{MSB} = \mathrm{SSB}/(a - 1)$. Notice, in particular, that the $F$ statistic in (2.29) can be written as

$$F = \frac{Y'(P_X - P_{\mathbf{1}})Y / \mathrm{rank}(P_X - P_{\mathbf{1}})}{Y'(I - P_X)Y / \mathrm{rank}(I - P_X)} = \frac{\mathrm{MSB}}{\mathrm{MSW}}. \tag{2.40}$$

The expected mean squares $\mathbb{E}[\mathrm{MS}]$ are commonly reported in the analysis of variance. For this model, it follows from (2.36) and (2.38) that

$$\mathbb{E}[\mathrm{MSB}] = \sigma^2 + \frac{n}{a-1}\sum_{i=1}^{a}\alpha_i^2 = \sigma^2 + n\sigma_a^2, \tag{2.41}$$

where $\sigma_a^2$ is defined to be

$$\sigma_a^2 := (a-1)^{-1}\sum_{i=1}^{a}(\mu_i - \bar{\mu}_{\bullet})^2 = (a-1)^{-1}\sum_{i=1}^{a}(\alpha_i - \bar{\alpha}_{\bullet})^2 = (a-1)^{-1}\sum_{i=1}^{a}\alpha_i^2, \tag{2.42}$$

which follows because $\bar{\alpha}_{\bullet} = a^{-1}\sum_{i=1}^{a}\alpha_i = 0$. Similarly, $\mathbb{E}[\mathrm{MSW}] = \sigma^2$.

Higher order moments of the mean squares, while not usually reported in this context, are straightforward to compute using the results in Section II.10.1.2. In particular, for $Z \sim \chi^2(n, \theta)$, along with $\mathbb{E}[Z] = n + \theta$, we have $\mathbb{V}(Z) = 2n + 4\theta$, and, most generally, for $s \in \mathbb{R}$ with $s > -n/2$,

$$\mathbb{E}[Z^s] = \frac{2^s}{e^{\theta/2}} \frac{\Gamma(n/2 + s)}{\Gamma(n/2)}\, {}_1F_1(n/2 + s,\, n/2;\, \theta/2), \qquad s > -n/2,$$

as shown in (II.10.9). More useful for integer moments is, for $s \in \mathbb{N}$,

$$\mathbb{E}[Z^s] = 2^s\, \Gamma\!\left(s + \frac{n}{2}\right) \sum_{i=0}^{s} \binom{s}{i} \frac{(\theta/2)^i}{\Gamma(i + n/2)}, \qquad s \in \mathbb{N}. \tag{2.43}$$

The various quantities associated with the sums of squares decomposition are typically expressed in tabular form, as shown in Table 2.1. Except for the expected mean squares, the output from statistical software will include the table using the values computed from the data set under examination. The last column contains the p-value $p_B$, which is the probability that a central $F$-distributed random variable with $a-1$ and $na-a$ degrees of freedom exceeds the value of the $F$ statistic in (2.40). This number is often used for determining if there are differences between the treatments. Traditionally, a value under 0.1 (0.05, 0.01) is said to provide "modest" ("significant", "strong") evidence for differences in means, though recall the first Remark in Section 2.4.2, and the discussion in Section III.2.8.

Table 2.1 The ANOVA table for the balanced one-way ANOVA model. Mean squares denote the sums of squares divided by their associated degrees of freedom. Term $\sigma_a^2$ in the expected mean square corresponding to the treatment effect is defined in (2.42).

| Source of variation | Degrees of freedom | Sum of squares | Mean square | Expected mean square | $F$ statistic | p-value |
|---|---|---|---|---|---|---|
| Between (model) | $a-1$ | SSB | MSB | $\sigma^2 + n\sigma_a^2$ | MSB/MSW | $p_B$ |
| Within (error) | $na-a$ | SSW | MSW | $\sigma^2$ | | |
| Total (corrected) | $na-1$ | SST | | | | |
| Overall mean | $1$ | $na\bar{Y}_{\bullet\bullet}^2$ | | | | |
| Total | $na$ | $Y'Y$ | | | | |

If significant differences can be safely surmised, then the scientist would proceed with further inferential methods for ascertaining precisely which treatments differ from one another, as discussed below. Ideally, the experiment would be repeated several times, possibly with different designs and larger sample sizes, in line with Fisher's paradigm of using a "significant p-value" as (only) an indication that the experiment is worthy of repetition (as opposed to immediately declaring significance if $p_B$ is less than some common threshold such as 0.05).

2.4.5 Computing Confidence Intervals

Section 1.4.7 discussed the Bonferroni and Scheffé methods of constructing simultaneous confidence intervals on linear combinations of the parameter vector $\boldsymbol{\beta}$. For the one-way ANOVA model, there are usually two sets of intervals that are of primary interest. The first is useful when one of the treatments, say the first, serves as a control, in which case interest centers on simultaneous c.i.s for $\mu_i - \mu_1$, $i = 2, \ldots, a$. Whether or not one of the treatments is a control, the second set of simultaneous c.i.s is often computed, namely for all $a(a-1)/2$ differences $\mu_i - \mu_j$.

For the comparisons against a control, the Bonferroni procedure uses the cutoff value $c = t_{na-a}^{-1}(1 - \alpha/(2J))$, where we use the notation $t_k^{-1}(p)$ to denote the quantile of the Student's $t$ distribution with $k$ degrees of freedom, corresponding to probability $p$, $0 < p < 1$. Likewise, the Scheffé method takes $q = F_{J,\, na-a}^{-1}(1 - \alpha)$, where $J = a - 1$. For all pairwise differences, Bonferroni uses $c = t_{na-a}^{-1}(1 - \alpha/(2D))$, $D = a(a-1)/2$, while the Scheffé cutoff value is still $q = F_{a-1,\, na-a}^{-1}(1 - \alpha)$, recalling (1.102) and the fact that only $a-1$ of the $a(a-1)/2$ differences are linearly independent.

Remark Methods are also available for deciding which population has the largest mean, most notably that from Bechhofer (1954). See also Bechhofer and Goldsman (1989), Fabian (2000), and the references therein. Detailed accounts of these and other methods can be found in Miller (1981), Hochberg and Tamhane (1987), and Hsu (1996). Miller (1985), Dudewicz and Mishra (1988, Sec. 11.2), Tamhane and Dunlop (2000), and Sahai and Ageel (2000) provide good introductory accounts. ◾

We illustrate the inferential consequences of the different intervals using simulated data. The Matlab function in Listing 2.3 generates data (based on a specified "seed" value) appropriate for a one-way ANOVA, using $n = 8$, $a = 5$, $\mu_1 = 12$, $\mu_2 = 11$, $\mu_3 = 10$, $\mu_4 = 10$, $\mu_5 = 9$, and $\sigma^2 = 4$. For seed value 1, the program produces the text file anovadata.txt with contents given in Listing 2.4, which we will use shortly. A subsequent call to p=anova1(x) in Matlab yields the p-value 0.0017 and produces the ANOVA table (as a graphic) and a box plot of the treatments, as shown in Figure 2.5. While the p-value is indeed well under the harshest typical threshold of 0.01, the box plot shows that the true means are not well reflected in this data set, nor does the data appear to have homogeneous variances across treatments.

For computing the $a(a-1)/2 = 10$ simultaneous c.i.s of the differences of each pair of treatment means using $\alpha = 0.05$, the cutoff values $c = t_{35}^{-1}(1 - 0.05/20) = 2.9960$ and $q = F_{4,35}^{-1}(1 - 0.05) = 2.6415$ for the Bonferroni and Scheffé methods, respectively, are required. The appropriate value for the maximum modulus method is not readily computed, but could be obtained from tabulated sources for the standard values of $\alpha = 0.10$, 0.05, and 0.01.
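These cutoffs can be confirmed directly with the quantile functions tinv and finv from Matlab's Statistics Toolbox:

```matlab
% Sketch: Bonferroni and Scheffe cutoffs for the 10 pairwise differences.
a = 5; n = 8; alpha = 0.05; D = a*(a-1)/2;
c = tinv(1 - alpha/(2*D), n*a - a)      % Bonferroni cutoff: 2.9960
q = finv(1 - alpha, a-1, n*a - a)       % Scheffe cutoff:    2.6415
```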
