Chapter 17 Analysis of Variance
The Analysis of Variance techniques discussed in this chapter can be used to study a great variety of problems of practical interest. Below we mention a few such problems:
Crop yields corresponding to different soil treatments.
Crop yields corresponding to different soils and fertilizers.
Comparison of a certain brand of gasoline with and without an additive by using it in several cars.
Comparison of different brands of gasoline by using them in several cars.
Comparison of the wearing of different materials.
Comparison of the effect of different types of oil on the wear of several piston rings, etc.
Comparison of the yields of a chemical substance obtained by using different catalytic methods.
Comparison of the strengths of certain objects made of different batches.
17.1 One-way Layout (or One-way Classification) with the Same Number of Observations Per Cell
The models to be discussed in the present chapter are special cases of the general model which was studied in the previous chapter. In this section, we consider what is known as a one-way layout, or one-way classification, which we introduce by means of a couple of examples.
EXAMPLE 1 Consider I machines, manufactured by I different companies but all intended for the same purpose. A purchaser who is interested in acquiring a number of these machines is then faced with the question as to which brand he should choose. Of course his decision is to be based on the productivity of each one of the I different machines. To this end, let a worker run each one of the I machines for J days each, always under the same conditions, and denote by Y_ij his output the jth day he is running the ith machine. Let μ_i be the average output of the worker when running the ith machine and let e_ij be his "error" (variation) the jth day when he is running the ith machine. Then it is reasonable to assume that the r.v.'s e_ij are normally distributed with mean 0 and variance σ². It is further assumed that they are independent. Therefore the Y_ij's are r.v.'s themselves and one has the following model:

Y_ij = μ_i + e_ij, i = 1, …, I, j = 1, …, J, where the e_ij's are independent N(0, σ²) r.v.'s.   (1)
EXAMPLE 2 For an agricultural example, consider I · J identical plots arranged in an I × J orthogonal array. Suppose that the same agricultural commodity (some sort of a grain, tomatoes, etc.) is planted in all I · J plots and that the plants in the ith row are treated by the ith kind of I available fertilizers. All other conditions assumed to be the same, the problem is that of comparing the I different kinds of fertilizers with a view to using the most appropriate one on a large scale. Once again, we denote by μ_i the average yield of each one of the J plots in the ith row, and let e_ij stand for the variation of the yield from plot to plot in the ith row, i = 1, …, I. Then it is again reasonable to assume that the r.v.'s e_ij, i = 1, …, I, j = 1, …, J are independent N(0, σ²), so that model (1) applies here as well.

In the arrangement just described, the rows and columns of the array are represented by (straight) lines. In such a case there are formed IJ rectangles in the resulting rectangular array, which are also referred to as cells (see also Fig. 17.1). The same interpretation and terminology is used in similar situations throughout this chapter.
In connection with model (1), there are three basic problems we are interested in: estimation of μ_i, i = 1, …, I; testing the hypothesis H: μ_1 = ··· = μ_I (= μ, unspecified) (that is, there is no difference between the I machines, or the I kinds of fertilizers); and estimation of σ².
Set

Y = (Y_11, …, Y_1J; Y_21, …, Y_2J; …; Y_I1, …, Y_IJ)′, β = (μ_1, …, μ_I)′ and e = (e_11, …, e_1J; …; e_I1, …, e_IJ)′.

Then it is clear that Y = X′β + e. Thus we have the model described in (6) of Chapter 16 with n = IJ and p = I. Next, the I vectors (1, 0, …, 0)′, (0, 1, 0, …, 0)′, …, (0, 0, …, 0, 1)′ are, clearly, independent and any other row vector in X′ is a linear combination of them. Thus rank X′ = I (= p); that is, X′ is of full rank.

Figure 17.1 (the I × J rectangular array of cells)

Then by Theorem 2, Chapter 16, the μ_i, i = 1, …, I have uniquely determined LSE's which have all the properties mentioned in Theorem 5 of the same chapter. In order to determine their explicit expressions, we observe that

S(Y, β) = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − μ_i)²,

so that ∂S(Y, β)/∂μ_i = 0 implies

μ̂_i = Y_i. = (1/J) Σ_{j=1}^J Y_ij, i = 1, …, I,   (2)
so that, under the hypothesis H: μ_1 = ··· = μ_I (= μ, unspecified), η ∈ V_1. That is, r − q = 1 and hence q = r − 1 = p − 1 = I − 1. Therefore, according to (31) in Chapter 16, the F statistic for testing H is given by

F = [(n − r)/q] · (S_{Y,ĉ} − S_{Y,Ĉ})/S_{Y,Ĉ} = [I(J − 1)/(I − 1)] · (S_{Y,ĉ} − S_{Y,Ĉ})/S_{Y,Ĉ},   (3)

where S_{Y,ĉ} is obtained by minimizing Σ_{i=1}^I Σ_{j=1}^J (Y_ij − μ)² with respect to μ. One has then the (unique) solution

μ̂ = Y.. = [1/(IJ)] Σ_{i=1}^I Σ_{j=1}^J Y_ij.   (4)

Therefore relations (28) and (29) in Chapter 16 give
S_{Y,Ĉ} = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − μ̂_i)² = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Y_i.)² (= SS_e, say).   (5)

Likewise,

S_{Y,ĉ} = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − μ̂)² = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Y..)² (= SS_T, say),   (6)

so that

S_{Y,ĉ} − S_{Y,Ĉ} = J Σ_{i=1}^I (Y_i. − Y..)² (= SS_H, say).   (7)

Thus the F statistic in (3) becomes

F = [I(J − 1)/(I − 1)] · (SS_H/SS_e) = MS_H/MS_e,

where MS_H = SS_H/(I − 1) and MS_e = SS_e/[I(J − 1)].
Table 1 Analysis of Variance for One-Way Layout

source of variance | sums of squares | degrees of freedom | mean squares
between groups | SS_H = J Σ_{i=1}^I (Y_i. − Y..)² | I − 1 | MS_H = SS_H/(I − 1)
within groups | SS_e = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Y_i.)² | I(J − 1) | MS_e = SS_e/[I(J − 1)]
total | SS_T = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Y..)² | IJ − 1 | —
REMARK 1 From (5), (6) and (7) it follows that SS_T = SS_H + SS_e. Also from (6) it follows that SS_T stands for the sum of squares of the deviations of the Y_ij's from the grand (sample) mean Y... Next, from (5) we have that, for each i, Σ_{j=1}^J (Y_ij − Y_i.)² is the sum of squares of the deviations of Y_ij, j = 1, …, J within the ith group. For this reason, SS_e is called the sum of squares within groups. On the other hand, from (7) we have that SS_H represents the sum of squares of the deviations of the group means Y_i. from the grand mean Y.. (up to the factor J). For this reason, SS_H is called the sum of squares between groups. Finally, SS_T is called the total sum of squares for obvious reasons, and, as mentioned above, it splits into SS_H and SS_e. Actually, the analysis of variance itself derives its name from this split of SS_T.
Now, as follows from the discussion in Section 5 of Chapter 16, the quantities SS_H and SS_e are independently distributed, under H, as σ²χ²_{I−1} and σ²χ²_{I(J−1)} r.v.'s, respectively, so that F = MS_H/MS_e is F_{I−1,I(J−1)} distributed, under H. We may summarize all relevant information in a table (Table 1) which is known as an Analysis of Variance Table.
EXAMPLE 3 For a numerical example, take I = 3 and J = 5. For the data in question one finds MS_H = 315.5392 and MS_e = 7.4, so that F = 42.6404. Thus for α = 0.05, F_{2,12;0.05} = 3.8853 and the hypothesis H: μ_1 = μ_2 = μ_3 is rejected. Of course, σ̃² = MS_e = 7.4.
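The quantities in Table 1 are straightforward to compute by machine. The following is a minimal sketch in Python (not part of the original text); the data array, the choice of numpy/scipy and all names are illustrative assumptions, not the data of Example 3.

```python
import numpy as np
from scipy.stats import f as f_dist

# Hypothetical one-way layout: I = 3 groups (rows), J = 5 observations each.
Y = np.array([[82., 83., 75., 79., 78.],
              [61., 62., 67., 65., 64.],
              [78., 72., 74., 75., 72.]])
I, J = Y.shape

grand = Y.mean()                # grand mean Y..
group = Y.mean(axis=1)          # group means Y_i., the LSE's of mu_i by (2)

SS_H = J * np.sum((group - grand) ** 2)        # between groups, as in (7)
SS_e = np.sum((Y - group[:, None]) ** 2)       # within groups, as in (5)

MS_H = SS_H / (I - 1)
MS_e = SS_e / (I * (J - 1))
F = MS_H / MS_e                                # F = MS_H / MS_e

alpha = 0.05
crit = f_dist.ppf(1 - alpha, I - 1, I * (J - 1))   # F_{I-1, I(J-1); alpha}
print(F, crit, F > crit)        # reject H: mu_1 = ... = mu_I when F > crit
```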
17.2 Two-way Layout (Classification) with One Observation Per Cell

The model to be employed in this section will be introduced by an appropriate modification of Examples 1 and 2.

EXAMPLE 4 Referring to Example 1, consider the I machines mentioned there and also J workers from a pool of available workers. Each one of the J workers is assigned to each one of the I machines, which he runs for one day. Let μ_ij be the daily output of the jth worker when running the ith machine and let e_ij be his "error." His actual daily output is then an r.v. Y_ij such that Y_ij = μ_ij + e_ij. At this point it is assumed that each μ_ij is equal to a certain quantity μ, the grand mean, plus a contribution α_i due to the ith row (ith machine), called the ith row effect, plus a contribution β_j due to the jth column (jth worker), called the jth column effect. It is further assumed that the I row effects and also the J column effects cancel out each other in the sense that

Σ_{i=1}^I α_i = 0 and Σ_{j=1}^J β_j = 0.

Thus one has the model

Y_ij = μ + α_i + β_j + e_ij, i = 1, …, I, j = 1, …, J,   (10)

where Σ_{i=1}^I α_i = Σ_{j=1}^J β_j = 0 and the e_ij's are independent N(0, σ²) r.v.'s.
EXAMPLE 5 Consider the identical I · J plots described in Example 2, and suppose that J different varieties of a certain agricultural commodity are planted in each one of the I rows, one variety in each plot. Then all J plots in the ith row are treated by the ith of I different kinds of fertilizers. Then the yield of the jth variety of the commodity in question treated by the ith fertilizer is an r.v. Y_ij which is assumed again to have the structure described in (10). Here the ith row effect is the contribution of the ith fertilizer and the jth column effect is the contribution of the jth variety of the commodity in question.

From the preceding two examples it follows that the outcome Y_ij is affected by two factors, machines and workers in Example 4 and fertilizers and varieties of agricultural commodity in Example 5. The I objects (machines or fertilizers) and the J objects (workers or varieties of an agricultural commodity) associated with these factors are also referred to as levels of the factors. The same interpretation and terminology is used in similar situations throughout this chapter.
through-In connection with model (10), there are the following three problems to
be solved: Estimation of μ; αi , i = 1, , I;βj , j = 1, , J; testing the hypothesis
H A:α1=···=αI = 0 (that is, there is no row effect), H B:β1=···=βJ= 0 (that is,there is no column effect) and estimation of σ2
By setting Y and e as in the previous section and β = (μ; α_1, …, α_I; β_1, …, β_J)′, we then have

Y = X′β + e with n = IJ and p = I + J + 1.

It can be shown (see also Exercise 17.2.1) that X′ is not of full rank, but rank X′ = r = I + J − 1. However, because of the two independent restrictions

Σ_{i=1}^I α_i = 0 and Σ_{j=1}^J β_j = 0,

the minimization of

S(Y, β) = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − μ − α_i − β_j)²

still produces unique solutions: (∂/∂μ)S(Y, β) = 0 implies μ̂ = Y.., where Y.. is again given by (4); (∂/∂α_i)S(Y, β) = 0 implies α̂_i = Y_i. − Y.., where Y_i. is given by (2); and (∂/∂β_j)S(Y, β) = 0 implies β̂_j = Y_.j − Y.., where

Y_.j = (1/I) Σ_{i=1}^I Y_ij, j = 1, …, J.   (11)

That is,

μ̂ = Y.., α̂_i = Y_i. − Y.., β̂_j = Y_.j − Y.., i = 1, …, I, j = 1, …, J.   (12)

Also,

EY = η = X′(μ; α_1, …, α_I; β_1, …, β_J)′ ∈ V_r, where r = I + J − 1.

Consider the hypothesis

H_A: α_1 = ··· = α_I = 0.

Then, under H_A, η ∈ V_{r−q_A}, where r − q_A = J, so that q_A = I − 1. Next, under H_A again, S(Y, β) becomes

Σ_{i=1}^I Σ_{j=1}^J (Y_ij − μ − β_j)²,

from where, by differentiation, we determine the LSE's of μ and β_j, to be denoted by μ̂_A and β̂_{j,A}, respectively. That is, one has
μ̂_A = Y.. = μ̂, β̂_{j,A} = Y_.j − Y.. = β̂_j, j = 1, …, J.   (13)

Therefore relations (28) and (29) in Chapter 16 give, by means of (11) and (12),

S_{Y,Ĉ} = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − μ̂ − α̂_i − β̂_j)² = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Y_i. − Y_.j + Y..)² (= SS_e, say)   (14)

and

S_{Y,ĉ_A} − S_{Y,Ĉ} = J Σ_{i=1}^I (Y_i. − Y..)² = J Σ_{i=1}^I α̂_i² (= SS_A, say),   (15)

because, as is easily seen, all cross-product terms in the expansions are equal to zero. Therefore the F statistic for testing H_A is

F_A = [(I − 1)(J − 1)/(I − 1)] · (SS_A/SS_e) = MS_A/MS_e,

where MS_A = SS_A/(I − 1) and MS_e = SS_e/[(I − 1)(J − 1)] (note that here n − r = IJ − (I + J − 1) = (I − 1)(J − 1)), and SS_A, SS_e are given by (15) and (14), respectively. (However, for an expression of SS_e to be used in actual calculations, see (20) below.)
Next, for testing the hypothesis

H_B: β_1 = ··· = β_J = 0,

one finds, in exactly the same way, that under H_B

μ̂_B = Y.. = μ̂, α̂_{i,B} = Y_i. − Y.. = α̂_i, i = 1, …, I,

and

S_{Y,ĉ_B} − S_{Y,Ĉ} = I Σ_{j=1}^J (Y_.j − Y..)² = I Σ_{j=1}^J β̂_j² (= SS_B, say),

so that the F statistic for testing H_B is

F_B = [(I − 1)(J − 1)/(J − 1)] · (SS_B/SS_e) = MS_B/MS_e, where MS_B = SS_B/(J − 1).

Finally, setting

SS_T = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Y..)²

for the total sum of squares, one obtains, by expanding the squares in (14), the identity

SS_e = SS_T − SS_A − SS_B,   (20)

which is the expression of SS_e used in actual calculations.
Table 2 Analysis of Variance for Two-way Layout with One Observation Per Cell

source of variance | sums of squares | degrees of freedom | mean squares
rows | SS_A = J Σ_{i=1}^I α̂_i² = J Σ_{i=1}^I (Y_i. − Y..)² | I − 1 | MS_A = SS_A/(I − 1)
columns | SS_B = I Σ_{j=1}^J β̂_j² = I Σ_{j=1}^J (Y_.j − Y..)² | J − 1 | MS_B = SS_B/(J − 1)
residual | SS_e = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Y_i. − Y_.j + Y..)² | (I − 1)(J − 1) | MS_e = SS_e/[(I − 1)(J − 1)]
total | SS_T = Σ_{i=1}^I Σ_{j=1}^J (Y_ij − Y..)² | IJ − 1 | —
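As a quick illustration of the computations of Table 2, here is a minimal Python sketch (not from the original text); the I × J data matrix is a made-up example and the use of numpy/scipy is an assumption.

```python
import numpy as np
from scipy.stats import f as f_dist

# Hypothetical two-way layout, one observation per cell: I = 3 rows, J = 4 columns.
Y = np.array([[ 3.,  7.,  5.,  4.],
              [ 6., 10.,  9.,  8.],
              [ 5.,  8.,  7.,  6.]])
I, J = Y.shape

grand = Y.mean()          # Y..
row   = Y.mean(axis=1)    # Y_i.
col   = Y.mean(axis=0)    # Y_.j

SS_A = J * np.sum((row - grand) ** 2)      # rows sum of squares
SS_B = I * np.sum((col - grand) ** 2)      # columns sum of squares
SS_T = np.sum((Y - grand) ** 2)            # total sum of squares
SS_e = SS_T - SS_A - SS_B                  # residual, via the identity (20)

df_e = (I - 1) * (J - 1)
F_A = (SS_A / (I - 1)) / (SS_e / df_e)     # tests H_A: no row effect
F_B = (SS_B / (J - 1)) / (SS_e / df_e)     # tests H_B: no column effect

alpha = 0.05
print(F_A, f_dist.ppf(1 - alpha, I - 1, df_e))
print(F_B, f_dist.ppf(1 - alpha, J - 1, df_e))
```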
17.3 Two-way Layout (Classification) with K (≥ 2) Observations Per Cell

In order to introduce the model of this section, consider Examples 4 and 5 and suppose that K (≥ 2) observations are taken in each one of the IJ cells. This amounts to saying that we observe the yields Y_ijk, k = 1, …, K of K identical plots within the (i, j)th cell, that is, the cell where the jth agricultural commodity was planted and was treated by the ith fertilizer (in connection with Example 5); or we allow the jth worker to run the ith machine for K days instead of one day (Example 4). In the present case, the relevant model will have the form Y_ijk = μ_ij + e_ijk. However, the means μ_ij, i = 1, …, I; j = 1, …, J need not be additive any longer. In other words, except for the grand mean μ and the row and column effects α_i and β_j, respectively, which in the previous section added up to make μ_ij, we may now allow interactions γ_ij among the various factors involved, such as fertilizers and varieties of agricultural commodities, or workers and machines. It is not unreasonable to assume that, on the average, these interactions cancel out each other, and we shall do so. Thus our present model is as follows:

Y_ijk = μ + α_i + β_j + γ_ij + e_ijk,   (22)

where

Σ_{i=1}^I α_i = 0, Σ_{j=1}^J β_j = 0, Σ_{i=1}^I γ_ij = 0 (j = 1, …, J), Σ_{j=1}^J γ_ij = 0 (i = 1, …, I),

and the e_ijk's are independent N(0, σ²) r.v.'s.

Once again the problems of main interest are estimation of μ, α_i, β_j and γ_ij, i = 1, …, I; j = 1, …, J; testing the hypotheses H_A: α_1 = ··· = α_I = 0, H_B: β_1 = ··· = β_J = 0 and H_AB: γ_ij = 0, i = 1, …, I; j = 1, …, J (that is, there are no interactions present); and estimation of σ².
By setting

Y = (Y_111, …, Y_11K; …; Y_IJ1, …, Y_IJK)′, β = (μ_11, …, μ_1J; …; μ_I1, …, μ_IJ)′ and e = (e_111, …, e_IJK)′,

it is readily seen that

Y = X′β + e with n = IJK and p = IJ,   (22′)

so that model (22′) is a special case of model (6) in Chapter 16. From the form of X′ it is also clear that rank X′ = r = p = IJ; that is, X′ is of full rank (see also Exercise 17.3.1). Therefore the unique LSE's of the parameters involved are obtained by differentiating with respect to μ_ij the expression

S(Y, β) = Σ_{i=1}^I Σ_{j=1}^J Σ_{k=1}^K (Y_ijk − μ_ij)².

We have then

μ̂_ij = Y_ij., i = 1, …, I; j = 1, …, J.   (23)
Next, from the fact that μ_ij = μ + α_i + β_j + γ_ij and on the basis of the assumptions made in (22), we have

μ = μ.., α_i = μ_i. − μ.., β_j = μ_.j − μ.., γ_ij = μ_ij − μ_i. − μ_.j + μ..,   (24)

by employing the "dot" notation already used in the previous two sections. From (24) we have that μ, α_i, β_j and γ_ij are linear combinations of the parameters μ_ij. Therefore, by the corollary to Theorem 3 in Chapter 16, they are estimable, and their LSE's μ̂, α̂_i, β̂_j, γ̂_ij are given by the above-mentioned linear combinations, upon replacing the μ_ij's by their LSE's. It is then readily seen that

μ̂ = Y..., α̂_i = Y_i.. − Y..., β̂_j = Y_.j. − Y..., γ̂_ij = Y_ij. − Y_i.. − Y_.j. + Y...,   (25)

where Y_ij. = (1/K) Σ_{k=1}^K Y_ijk, Y_i.. = [1/(JK)] Σ_{j=1}^J Σ_{k=1}^K Y_ijk, Y_.j. = [1/(IK)] Σ_{i=1}^I Σ_{k=1}^K Y_ijk and Y... = [1/(IJK)] Σ_{i=1}^I Σ_{j=1}^J Σ_{k=1}^K Y_ijk. Furthermore, one has the identity

Σ_{i=1}^I Σ_{j=1}^J Σ_{k=1}^K (Y_ijk − μ − α_i − β_j − γ_ij)²
  = Σ_{i=1}^I Σ_{j=1}^J Σ_{k=1}^K (Y_ijk − Y_ij.)² + IJK(μ̂ − μ)² + JK Σ_{i=1}^I (α̂_i − α_i)² + IK Σ_{j=1}^J (β̂_j − β_j)² + K Σ_{i=1}^I Σ_{j=1}^J (γ̂_ij − γ_ij)²   (26)

because, as is easily seen, all other terms are equal to zero. (See also Exercise 17.3.2.)
From identity (26) it follows that, under the hypothesis

H_A: α_1 = ··· = α_I = 0,

the LSE's of the remaining parameters remain the same as those given in (25). It follows then that

S_{Y,Ĉ} = Σ_{i=1}^I Σ_{j=1}^J Σ_{k=1}^K (Y_ijk − Y_ij.)² (= SS_e, say)   (27)

and

S_{Y,ĉ_A} − S_{Y,Ĉ} = JK Σ_{i=1}^I (Y_i.. − Y...)² = JK Σ_{i=1}^I α̂_i² (= SS_A, say).   (28)

Therefore the F statistic in the present case is

F_A = [IJ(K − 1)/(I − 1)] · (SS_A/SS_e) = MS_A/MS_e,

where MS_A = SS_A/(I − 1) and MS_e = SS_e/[IJ(K − 1)] (here n − r = IJK − IJ = IJ(K − 1)), and SS_A, SS_e are given by (28) and (27), respectively.
For testing the hypothesis

H_B: β_1 = ··· = β_J = 0,

one finds in the same way that

S_{Y,ĉ_B} − S_{Y,Ĉ} = IK Σ_{j=1}^J (Y_.j. − Y...)² = IK Σ_{j=1}^J β̂_j² (= SS_B, say)   (31)

and

F_B = [IJ(K − 1)/(J − 1)] · (SS_B/SS_e) = MS_B/MS_e, where MS_B = SS_B/(J − 1).

Likewise, for testing the hypothesis

H_AB: γ_ij = 0, i = 1, …, I; j = 1, …, J,

one finds

S_{Y,ĉ_AB} − S_{Y,Ĉ} = K Σ_{i=1}^I Σ_{j=1}^J (Y_ij. − Y_i.. − Y_.j. + Y...)² = K Σ_{i=1}^I Σ_{j=1}^J γ̂_ij² (= SS_AB, say)   (33)

and

F_AB = {IJ(K − 1)/[(I − 1)(J − 1)]} · (SS_AB/SS_e) = MS_AB/MS_e, where MS_AB = SS_AB/[(I − 1)(J − 1)].

Finally, with

SS_T = Σ_{i=1}^I Σ_{j=1}^J Σ_{k=1}^K (Y_ijk − Y...)²   (34)

standing for the total sum of squares, one has SS_T = SS_e + SS_A + SS_B + SS_AB (see Exercise 17.3.3). Once again the main results of this section are summarized in a table, Table 3. The number of degrees of freedom of SS_T is calculated from those of SS_A, SS_B, SS_AB and SS_e, which can be shown to be independently distributed as σ²χ² r.v.'s with certain degrees of freedom.
EXAMPLE 6 For a numerical application, consider two drugs (I = 2) administered in three dosages (J = 3) to three groups, each of which consists of four (K = 4) subjects. Certain measurements are taken on the subjects and, for the resulting data, one finds

F_A = 0.8471, F_B = 12.1038, F_AB = 0.1641.

Thus for α = 0.05, we have F_{1,18;0.05} = 4.4139 and F_{2,18;0.05} = 3.5546; we accept H_A, reject H_B and accept H_AB. Finally, we have σ̃² = 183.0230.
The models analyzed in the previous three sections describe three experimental designs often used in practice. There are many others as well. Some of them are obtained from the ones just described by allowing different numbers of observations per cell, by increasing the number of factors, by allowing the row effects, column effects and interactions to be r.v.'s themselves, by randomizing the levels of some of the factors, etc. However, even a brief study of these designs would be well beyond the scope of this book.
Table 3 Analysis of Variance for Two-way Layout with K (≥ 2) Observations Per Cell

source of variance | sums of squares | degrees of freedom | mean squares
A main effects | SS_A = JK Σ_{i=1}^I α̂_i² = JK Σ_{i=1}^I (Y_i.. − Y...)² | I − 1 | MS_A = SS_A/(I − 1)
B main effects | SS_B = IK Σ_{j=1}^J β̂_j² = IK Σ_{j=1}^J (Y_.j. − Y...)² | J − 1 | MS_B = SS_B/(J − 1)
AB interactions | SS_AB = K Σ_{i=1}^I Σ_{j=1}^J γ̂_ij² = K Σ_{i=1}^I Σ_{j=1}^J (Y_ij. − Y_i.. − Y_.j. + Y...)² | (I − 1)(J − 1) | MS_AB = SS_AB/[(I − 1)(J − 1)]
within groups | SS_e = Σ_{i=1}^I Σ_{j=1}^J Σ_{k=1}^K (Y_ijk − Y_ij.)² | IJ(K − 1) | MS_e = SS_e/[IJ(K − 1)]
total | SS_T = Σ_{i=1}^I Σ_{j=1}^J Σ_{k=1}^K (Y_ijk − Y...)² | IJK − 1 | —
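The entries of Table 3 can likewise be computed mechanically. The sketch below is in Python and is not part of the original text; the dimensions match Example 6, but the observations Y_ijk are simulated, so the resulting F values will not reproduce those of the example.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)
I, J, K = 2, 3, 4                            # as in Example 6 (data simulated)
Y = rng.normal(10.0, 2.0, size=(I, J, K))    # hypothetical Y_ijk

cell  = Y.mean(axis=2)          # Y_ij.
row   = Y.mean(axis=(1, 2))     # Y_i..
col   = Y.mean(axis=(0, 2))     # Y_.j.
grand = Y.mean()                # Y...

SS_e  = np.sum((Y - cell[:, :, None]) ** 2)             # (27)
SS_A  = J * K * np.sum((row - grand) ** 2)              # (28)
SS_B  = I * K * np.sum((col - grand) ** 2)              # (31)
gamma = cell - row[:, None] - col[None, :] + grand      # gamma_ij hats, (25)
SS_AB = K * np.sum(gamma ** 2)                          # (33)

df_e = I * J * (K - 1)
F_A  = (SS_A / (I - 1)) / (SS_e / df_e)
F_B  = (SS_B / (J - 1)) / (SS_e / df_e)
F_AB = (SS_AB / ((I - 1) * (J - 1))) / (SS_e / df_e)

alpha = 0.05
for F, q in ((F_A, I - 1), (F_B, J - 1), (F_AB, (I - 1) * (J - 1))):
    print(F, f_dist.ppf(1 - alpha, q, df_e))
```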
17.3.3 Show that SS_T = SS_e + SS_A + SS_B + SS_AB, where SS_e, SS_A, SS_B, SS_AB and SS_T are given by (27), (28), (31), (33) and (34), respectively.

17.3.4 Apply the two-way layout with two observations per cell analysis of variance to the data given in the table below (take α = 0.05).
17.4 A Multicomparison Method

Consider again the one-way layout with J (≥ 2) observations per cell described in Section 17.1 and suppose that, in testing the hypothesis H: μ_1 = ··· = μ_I (= μ, unspecified), we decided to reject it on the basis of the available data. In rejecting H, we simply conclude that the μ's are not all equal; no conclusions are reached as to which specific μ's may be unequal. The multicomparison method described in this section sheds some light on this problem.

For the sake of simplicity, let us suppose that I = 6. After rejecting H, the natural quantities to look into are of the following sort:

μ_1 − μ_2, (1/3)(μ_1 + μ_2 + μ_3) − (1/3)(μ_4 + μ_5 + μ_6),

and so on. In each of these quantities the coefficients of the six μ's sum to zero. This observation gives rise to the following definition.
DEFINITION 1 Any linear combination ψ = Σ_{i=1}^I c_i μ_i of the μ's, where c_i, i = 1, …, I are known constants such that Σ_{i=1}^I c_i = 0, is said to be a contrast among the parameters μ_i, i = 1, …, I.

Let ψ = Σ_{i=1}^I c_i μ_i be a contrast among the μ's and let

ψ̂ = Σ_{i=1}^I c_i Y_i. and σ̂²(ψ̂) = (MS_e/J) Σ_{i=1}^I c_i²,

where n = IJ. We will show in the sequel that the interval [ψ̂ − Sσ̂(ψ̂), ψ̂ + Sσ̂(ψ̂)] is a confidence interval with confidence coefficient 1 − α for all contrasts ψ. Next, consider the following definition.

DEFINITION 2 Let ψ and ψ̂ be as above. We say that ψ̂ is significantly different from zero, according to the S (for Scheffé) criterion, if the interval [ψ̂ − Sσ̂(ψ̂), ψ̂ + Sσ̂(ψ̂)] does not contain zero; equivalently, if |ψ̂| > Sσ̂(ψ̂).
Now it can be shown that the F test rejects the hypothesis H if and only if there is at least one contrast ψ such that ψ̂ is significantly different from zero. Thus, following the rejection of H, one would construct a confidence interval for each contrast ψ and then proceed to find out which contrasts are responsible for the rejection of H, starting with the simplest contrasts first. The confidence intervals in question are provided by the following theorem.
THEOREM 1 Refer to the one-way layout described in Section 17.1 and, for any contrast ψ = Σ_{i=1}^I c_i μ_i (Σ_{i=1}^I c_i = 0), set

ψ̂ = Σ_{i=1}^I c_i Y_i. and σ̂²(ψ̂) = (MS_e/J) Σ_{i=1}^I c_i²,

where MS_e is given in Table 1. Then the interval [ψ̂ − Sσ̂(ψ̂), ψ̂ + Sσ̂(ψ̂)] is a confidence interval simultaneously for all contrasts ψ with confidence coefficient 1 − α, where S² = (I − 1)F_{I−1,n−I;α} and n = IJ.
PROOF Consider the problem of maximizing (minimizing), with respect to c_1, …, c_I subject to the restraint Σ_{i=1}^I c_i = 0, the quantity

f(c_1, …, c_I) = [Σ_{i=1}^I c_i (Y_i. − μ_i)]² / [(1/J) Σ_{i=1}^I c_i²].

Now, clearly, f(c_1, …, c_I) = f(γc_1, …, γc_I) for any γ > 0. Therefore the maximum (minimum) of f(c_1, …, c_I), subject to the restraint Σ_{i=1}^I c_i = 0, is the same as the maximum (minimum) of f(γc_1, …, γc_I) = f(c′_1, …, c′_I), c′_i = γc_i, i = 1, …, I, subject to the restraints

Σ_{i=1}^I c′_i = 0 and Σ_{i=1}^I c′_i² = 1.

Maximizing by means of the Lagrange multipliers λ_1, λ_2 associated with these two restraints, one finds that the extrema are attained when c′_i is proportional to (Y_i. − μ_i) − (Y.. − μ.), i = 1, …, I, where μ. = (1/I) Σ_{i=1}^I μ_i, and that

sup f = J Σ_{i=1}^I [(Y_i. − μ_i) − (Y.. − μ.)]².   (39)

Now the right-hand side of (39) is distributed as σ²χ²_{I−1} and is independent of SS_e, so that

(sup f)/[(I − 1)MS_e] is distributed as F_{I−1,n−I}.   (40)

From (40) and (39) it follows then that

P[(ψ̂ − ψ)² ≤ S²σ̂²(ψ̂) for all contrasts ψ] = P{(sup f)/[(I − 1)MS_e] ≤ F_{I−1,n−I;α}} = 1 − α,

or equivalently,

P[ψ̂ − Sσ̂(ψ̂) ≤ ψ ≤ ψ̂ + Sσ̂(ψ̂)] = 1 − α

for all contrasts ψ, as was to be seen. (This proof has been adapted from the paper "A simple proof of Scheffé's multiple comparison theorem for contrasts in the one-way layout" by Jerome Klotz in The American Statistician, 1969, Vol. 23, Number 5.) ▲
In closing, we would like to point out that a theorem similar to the one just proved can be shown for the two-way layout with K (≥ 2) observations per cell, and as a consequence of it we can construct confidence intervals for all contrasts among the α's, or the β's, or the γ's.
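A minimal Python sketch of Theorem 1 follows (not from the original text); the data are simulated under an assumed one-way layout, and the function name scheffe_interval is ours.

```python
import numpy as np
from scipy.stats import f as f_dist

# Hypothetical one-way layout: I = 6 groups, J = 4 observations per group.
rng = np.random.default_rng(1)
Y = rng.normal(loc=[0, 0, 0, 2, 2, 2], scale=1.0, size=(4, 6)).T   # shape (I, J)
I, J = Y.shape
n = I * J

group = Y.mean(axis=1)                                 # Y_i.
MS_e = np.sum((Y - group[:, None]) ** 2) / (n - I)     # MS_e from Table 1

alpha = 0.05
S = np.sqrt((I - 1) * f_dist.ppf(1 - alpha, I - 1, n - I))  # S^2 = (I-1)F_{I-1,n-I;alpha}

def scheffe_interval(c):
    """Simultaneous 1 - alpha interval for the contrast psi = sum c_i mu_i."""
    c = np.asarray(c, dtype=float)
    assert abs(c.sum()) < 1e-12               # the c_i must define a contrast
    psi_hat = c @ group                       # psi hat = sum c_i Y_i.
    se = np.sqrt(MS_e * np.sum(c ** 2) / J)   # sigma hat of psi hat
    return psi_hat - S * se, psi_hat + S * se

print(scheffe_interval([1, -1, 0, 0, 0, 0]))               # mu_1 - mu_2
print(scheffe_interval([1/3, 1/3, 1/3, -1/3, -1/3, -1/3]))
```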
Exercises
17.4.1 Show that the quantity J Σ_{i=1}^I (Y_i. − Y..)², appearing in the discussion of the one-way layout of Section 17.1, is distributed as σ²χ²_{I−1} under the null hypothesis.

17.4.2 Refer to Exercise 17.1.1 and construct confidence intervals for all contrasts of the μ's (take 1 − α = 0.95).
Chapter 18 The Multivariate Normal Distribution

18.1 Introduction
In this chapter, we introduce the Multivariate Normal distribution and establish some of its fundamental properties. Also, certain estimation and independence testing problems closely connected with it are discussed.

Let Y_j, j = 1, …, m be i.i.d. r.v.'s with common distribution N(0, 1). Then we know that for any constants c_j, j = 1, …, m and μ, the r.v. Σ_{j=1}^m c_j Y_j + μ is distributed as N(μ, Σ_{j=1}^m c_j²). Now instead of considering one (non-homogeneous) linear combination of the Y's, consider k such combinations:

X_i = Σ_{j=1}^m c_ij Y_j + μ_i, i = 1, …, k,   (1)

or, in matrix notation,

X = CY + μ,   (2)

where X = (X_1, …, X_k)′, Y = (Y_1, …, Y_m)′, μ = (μ_1, …, μ_k)′ and C = (c_ij) is a k × m matrix of constants. This leads to the following definition.
DEFINITION 1 Let Y_j, j = 1, …, m be i.i.d. r.v.'s distributed as N(0, 1) and let the r.v.'s X_i, i = 1, …, k, or the r. vector X, be defined by (1) or (2), respectively. Then the joint distribution of the r.v.'s X_i, i = 1, …, k, or the distribution of the r. vector X, is called Multivariate (or, more specifically, k-Variate) Normal.

REMARK 1 From Definition 1, it follows that if X_i, i = 1, …, k are jointly normally distributed, then any subset of them is also a set of jointly normally distributed r.v.'s.
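Definition 1 suggests a direct way of generating a k-Variate Normal vector: transform i.i.d. N(0, 1) variables. The Python sketch below is illustrative only (the matrix C, the mean vector and the sample size are assumptions).

```python
import numpy as np

# Construct X = CY + mu as in (1)-(2), with hypothetical k = m = 3.
rng = np.random.default_rng(2)
C  = np.array([[2.0, 0.0, 0.0],
               [1.0, 1.5, 0.0],
               [0.5, 0.3, 1.0]])
mu = np.array([1.0, -2.0, 0.5])

N = 100_000
Y = rng.standard_normal((3, N))     # columns of i.i.d. N(0, 1) draws Y_1, ..., Y_m
X = C @ Y + mu[:, None]             # each column is one draw of X

print(X.mean(axis=1))               # approximates the mean mu
print(np.cov(X))                    # approximates the covariance CC'
print(C @ C.T)
```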
From (2) and relation (10) of Chapter 16, it follows that EX = μ and that the covariance matrix Σ/ of X is given by Σ/ = CC′; indeed, for i, l = 1, …, k,

Cov(X_i, X_l) = Cov(Σ_{j=1}^m c_ij Y_j, Σ_{j=1}^m c_lj Y_j) = Σ_{j=1}^m c_ij c_lj,

since the Y_j's are independent with variance 1. Next, for t = (t_1, …, t_k)′, the ch.f. of X is

φ_X(t) = E exp(it′X) = E exp[it′(CY + μ)] = exp(it′μ) E exp[i(C′t)′Y] = exp(it′μ) exp[−½(C′t)′(C′t)] = exp(it′μ − ½t′Σ/t),

by the fact that the Y_j's are i.i.d. N(0, 1). Thus we have the following result.

THEOREM 1 The ch.f. of the r. vector X = (X_1, …, X_k)′, which has the k-Variate Normal distribution with mean μ and covariance matrix Σ/, is given by

φ_X(t) = exp(it′μ − ½t′Σ/t), t ∈ ℝ^k.   (6)

From (6) it follows that φ_X, and therefore the distribution of X, is completely determined by means of its mean μ and covariance matrix Σ/, a fact analogous to that of a Univariate Normal distribution. This fact justifies the following notation:

X ~ N(μ, Σ/), where μ and Σ/ are the parameters of the distribution.

Now we shall establish the following interesting result.
THEOREM 2 Let Y_j, j = 1, …, k be i.i.d. r.v.'s with distribution N(0, 1) and set X = CY + μ, where C is a k × k non-singular matrix. Then the p.d.f. f_X of X exists and is given by

f_X(x) = (2π)^{−k/2} |Σ/|^{−1/2} exp[−½(x − μ)′Σ/^{−1}(x − μ)], x ∈ ℝ^k,   (7)

where Σ/ = CC′ and |Σ/| denotes the determinant of Σ/.
PROOF From X = CY + μ we get CY = X − μ which, since C is non-singular, gives Y = C^{−1}(X − μ). The p.d.f. of Y is (2π)^{−k/2} exp(−½y′y), the Jacobian of the transformation is (det C)^{−1} in absolute value, and |det C| = |Σ/|^{1/2} because |Σ/| = |CC′| = (det C)². Relation (7) then follows by the usual transformation of variables. ▲

REMARK 2 A k-Variate Normal distribution with p.d.f. given by (7) is called a non-singular k-Variate Normal. The use of the term non-singular corresponds to the fact that |Σ/| ≠ 0; that is, the fact that Σ/ is of full rank.
COROLLARY 1 In the theorem, let k = 2. Then X = (X_1, X_2)′ and the joint p.d.f. of X_1, X_2 is the Bivariate Normal p.d.f.

PROOF By Remark 1, both X_1 and X_2 are normally distributed; let X_1 ~ N(μ_1, σ_1²), X_2 ~ N(μ_2, σ_2²) and let ρ be the correlation coefficient of X_1 and X_2. Then

Σ/ = ( σ_1²  ρσ_1σ_2 ; ρσ_1σ_2  σ_2² ), |Σ/| = σ_1²σ_2²(1 − ρ²)

and

Σ/^{−1} = [1/(σ_1²σ_2²(1 − ρ²))] ( σ_2²  −ρσ_1σ_2 ; −ρσ_1σ_2  σ_1² ),

so that (7) becomes

f(x_1, x_2) = [2πσ_1σ_2(1 − ρ²)^{1/2}]^{−1} exp{−[1/(2(1 − ρ²))][((x_1 − μ_1)/σ_1)² − 2ρ((x_1 − μ_1)/σ_1)((x_2 − μ_2)/σ_2) + ((x_2 − μ_2)/σ_2)²]},

which is the Bivariate Normal p.d.f. ▲

COROLLARY 2 Let the r.v.'s X_1, …, X_k have the k-Variate Normal distribution N(μ, Σ/), where Σ/ is the diagonal matrix with jth diagonal element σ_j², j = 1, …, k (that is, the X's are uncorrelated). Then the X's are independent.

PROOF Here |Σ/| = Π_{j=1}^k σ_j². On the other hand, |Σ/|Σ/^{−1} is also a diagonal matrix with the jth diagonal element given by Π_{i≠j} σ_i², so that Σ/^{−1} itself is a diagonal matrix with the jth diagonal element given by 1/σ_j². It follows that

f_X(x) = Π_{j=1}^k (2πσ_j²)^{−1/2} exp[−(x_j − μ_j)²/(2σ_j²)],

and this establishes the independence of the X's. ▲
REMARK 3 The really important part of the corollary is that noncorrelation plus normality implies independence, since independence implies noncorrelation in any case. It is also to be noted that noncorrelation without normality need not imply independence, as has been seen elsewhere.
Exercises
18.1.1 Use Definition 1 herein in order to conclude that the LSE β̂ of β in (9) of Chapter 16 has the n-Variate Normal distribution with mean β and covariance matrix σ²S^{−1}. In particular, (β̂_1, β̂_2)′, given by (19″) and (19′) of the same chapter, has the Bivariate Normal distribution with means β_1, β_2, variances

σ² Σ_{j=1}^n x_j² / [n Σ_{j=1}^n (x_j − x̄)²] and σ² / Σ_{j=1}^n (x_j − x̄)²,

respectively, and correlation coefficient equal to

−x̄ / [(1/n) Σ_{j=1}^n x_j²]^{1/2}.

18.1.2 Verify relation (8).

18.1.3 Let the random vector X = (X_1, …, X_k)′ be distributed as N(μ, Σ/) and suppose that Σ/ is non-singular. Then show that the conditional joint distribution of X_{i_1}, …, X_{i_m}, given X_{j_1}, …, X_{j_n} (1 ≤ m < k, m + n = k, all i_1, …, i_m different from all j_1, …, j_n), is Multivariate Normal, and specify its parameters.
18.2 Some Properties of Multivariate Normal Distributions

In this section we establish some of the basic properties of a Multivariate Normal distribution.
THEOREM 3 Let X = (X_1, …, X_k)′ be N(μ, Σ/) (not necessarily non-singular). Then for any m × k constant matrix A = (a_ij), the r. vector Y defined by Y = AX has the m-Variate Normal distribution with mean Aμ and covariance matrix AΣ/A′. In particular, if m = 1, the r.v. Y is a linear combination of the X's, Y = α′X, say, and Y has the Univariate Normal distribution with mean α′μ and variance α′Σ/α.

PROOF For t ∈ ℝ^m, φ_Y(t) = E exp(it′AX) = φ_X(A′t), so that by means of (6), we have

φ_Y(t) = exp[it′(Aμ) − ½t′(AΣ/A′)t],

and this last expression is the ch.f. of the m-Variate Normal with mean Aμ and covariance matrix AΣ/A′, as was to be seen. The particular case follows from the general one just established. ▲
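An empirical illustration of Theorem 3 in Python (not from the original text; A, mu and Sigma are made-up values):

```python
import numpy as np

# If X ~ N(mu, Sigma), then Y = AX should have mean A mu and covariance A Sigma A'.
rng = np.random.default_rng(3)
mu    = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])                         # m = 2, k = 3

X = rng.multivariate_normal(mu, Sigma, size=200_000).T   # columns are draws of X
Y = A @ X

print(Y.mean(axis=1));  print(A @ mu)                    # empirical vs. A mu
print(np.cov(Y));       print(A @ Sigma @ A.T)           # empirical vs. A Sigma A'
```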
THEOREM 4 For j = 1, …, n, let X_j be independent N(μ_j, Σ/_j) k-dimensional r. vectors and let c_j be constants. Then the r. vector X = Σ_{j=1}^n c_j X_j has the k-Variate Normal distribution N(Σ_{j=1}^n c_j μ_j, Σ_{j=1}^n c_j² Σ/_j) (a result parallel to a known one for r.v.'s).

PROOF For t ∈ ℝ^k, the independence of the X_j's gives

φ_X(t) = Π_{j=1}^n φ_{X_j}(c_j t) = Π_{j=1}^n exp(ic_j t′μ_j − ½c_j² t′Σ/_j t) = exp[it′(Σ_{j=1}^n c_j μ_j) − ½t′(Σ_{j=1}^n c_j² Σ/_j)t],

which is the ch.f. of the asserted k-Variate Normal distribution. ▲

COROLLARY If X_1, …, X_n are i.i.d. N(μ, Σ/), then X̄ = (1/n) Σ_{j=1}^n X_j is distributed as N(μ, Σ//n).

PROOF In the theorem, take μ_j = μ, Σ/_j = Σ/ and c_j = 1/n, j = 1, …, n. ▲
THEOREM 5 Let X = (X_1, …, X_k)′ be non-singular N(μ, Σ/) and set Q = (X − μ)′Σ/^{−1}(X − μ). Then Q is distributed as χ²_k.

PROOF One computes the ch.f. of Q: for t with 1 − 2it > 0, E exp(itQ) is an integral whose integrand is, apart from a constant, the p.d.f. of a k-Variate Normal with mean μ and covariance matrix Σ//(1 − 2it). Hence E exp(itQ) = (1 − 2it)^{−k/2}, which is the ch.f. of the χ²_k distribution. ▲
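Theorem 5 is easy to check by simulation; the following Python sketch (ours, with assumed parameter values) compares empirical quantiles of Q with χ²_k quantiles.

```python
import numpy as np
from scipy.stats import chi2

# Q = (X - mu)' Sigma^{-1} (X - mu) should be chi^2_k distributed.
rng = np.random.default_rng(4)
mu    = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
k = len(mu)

X = rng.multivariate_normal(mu, Sigma, size=100_000)
D = X - mu
Q = np.einsum('ni,ij,nj->n', D, np.linalg.inv(Sigma), D)

for p in (0.5, 0.9, 0.99):
    print(np.quantile(Q, p), chi2.ppf(p, k))   # empirical vs. chi^2_k quantile
```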
Trang 3018.2.1 Consider the k-dimensional random vectors X n = (X1n , , X kn)′, n =
1, 2, and X= (X1 , , X k)′ with d.f.’s F n , F and ch.f.’sφn,φ, respectively
Then we say that {Xn } converges in distribution to X as n→ ∞, and we write
for which F is continuous (see
also Definition 1(iii) in Chapter 8) It can be shown that a multidimensionalversion of Theorem 2 in Chapter 8 holds true Use this result (and alsoTheorem 3′ in Chapter 6) in order to prove that Xn d X
where X is distributed as N(μμμμμ, ΣΣΣΣΣ/) if and only if {λλλλλ′Xn} converges in distribution
as n → ∞, to an r.v Y which is distributed as Normal with mean λλλλλ′μμμμμ and
varianceλλλλλ′ΣΣΣΣΣ/λλλλλ for every λλλλλ ∈ k
18.3 Estimation of μ and Σ/ and a Test of Independence

First we formulate a theorem, without proof, providing estimators of μ and Σ/, and then we proceed with a certain testing hypothesis problem.

THEOREM 6 For j = 1, …, n, let X_j = (X_j1, …, X_jk)′ be independent, non-singular N(μ, Σ/) r. vectors and set

X̄ = (1/n) Σ_{j=1}^n X_j and S = Σ_{j=1}^n (X_j − X̄)(X_j − X̄)′.

Then:
i) X̄ and S are sufficient for (μ, Σ/);
ii) X̄ and S/(n − 1) are unbiased estimators of μ and Σ/, respectively;
iii) X̄ and S/n are MLE's of μ and Σ/, respectively.
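The estimators of Theorem 6 in Python (a sketch of ours; the parameters and sample size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
mu    = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.8]])
n = 500

X = rng.multivariate_normal(mu, Sigma, size=n)   # rows are X_1, ..., X_n
Xbar = X.mean(axis=0)                            # sample mean vector
D = X - Xbar
S = D.T @ D                                      # S = sum (X_j - Xbar)(X_j - Xbar)'

print(Xbar)          # estimator of mu
print(S / (n - 1))   # unbiased estimator of Sigma (ii)
print(S / n)         # MLE of Sigma (iii)
```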
Now suppose that the joint distribution of the r.v.'s X and Y is the Bivariate Normal distribution; that is,

f(x, y) = [2πσ_1σ_2(1 − ρ²)^{1/2}]^{−1} e^{−q/2},   (8)

where

q = [1/(1 − ρ²)][((x − μ_1)/σ_1)² − 2ρ((x − μ_1)/σ_1)((y − μ_2)/σ_2) + ((y − μ_2)/σ_2)²].

Then by Corollary 2 to Theorem 2, the r.v.'s X and Y are independent if and only if they are uncorrelated. Thus the problem of testing independence for X and Y becomes that of testing the hypothesis H: ρ = 0. For this purpose, consider an r. sample of size n, (X_j, Y_j), j = 1, …, n, from the Bivariate Normal under consideration. Then their joint p.d.f., f, is given by

f(x_1, y_1; …; x_n, y_n) = [2πσ_1σ_2(1 − ρ²)^{1/2}]^{−n} exp(−½ Σ_{j=1}^n q_j),

where

q_j = [1/(1 − ρ²)][((x_j − μ_1)/σ_1)² − 2ρ((x_j − μ_1)/σ_1)((y_j − μ_2)/σ_2) + ((y_j − μ_2)/σ_2)²], j = 1, …, n.   (9)
For testing H, we are going to employ the LR test. And although the MLE's of the parameters involved are readily given by Theorem 6, we choose to derive them directly. For this purpose, we write g(θ) for log f(θ), considered as a function of the parameter θ = (μ_1, μ_2, σ_1², σ_2², ρ)′ ∈ Ω, where the parameter space Ω is given by

Ω = {θ = (μ_1, μ_2, σ_1², σ_2², ρ)′; μ_1, μ_2 ∈ ℝ, σ_1², σ_2² > 0, −1 < ρ < 1},

and ω = {θ ∈ Ω; ρ = 0}. One has

g(θ) = −n log(2πσ_1σ_2) − (n/2) log(1 − ρ²) − ½ Σ_{j=1}^n q_j,   (10)
where q_j, j = 1, …, n are given by (9). Differentiating (10) with respect to μ_1 and μ_2 and equating the partial derivatives to zero, we get, after some simplifications,

(x̄ − μ_1)/σ_1 = ρ(ȳ − μ_2)/σ_2 and (ȳ − μ_2)/σ_2 = ρ(x̄ − μ_1)/σ_1   (11)

(see also Exercise 18.3.1). Solving system (11) for μ_1 and μ_2 (recall that ρ² < 1), we get

μ̂_1 = x̄, μ̂_2 = ȳ.   (12)

Similar differentiations with respect to σ_1² and σ_2², together with (12), provide two more equations. Next, differentiating g with respect to ρ and equating the partial derivative to zero, we obtain, after some simplifications (see also Exercise 18.3.3), a third equation; solving, one arrives at

σ̂_1² = S_x², σ̂_2² = S_y², ρ̂ = S_xy/(S_x S_y),   (16)

where

S_x² = (1/n) Σ_{j=1}^n (x_j − x̄)², S_y² = (1/n) Σ_{j=1}^n (y_j − ȳ)² and S_xy = (1/n) Σ_{j=1}^n (x_j − x̄)(y_j − ȳ).

It can further be shown (see also Exercise 18.3.5) that the values of the parameters given by (12) and (16) actually maximize f (equivalently, g).
It follows that the MLE's of μ_1, μ_2, σ_1², σ_2² and ρ, under Ω, are given by (12) and (16), which we may now denote by μ̂_{1,Ω}, μ̂_{2,Ω}, σ̂²_{1,Ω}, σ̂²_{2,Ω} and ρ̂_Ω. That is,

μ̂_{1,Ω} = x̄, μ̂_{2,Ω} = ȳ, σ̂²_{1,Ω} = S_x², σ̂²_{2,Ω} = S_y², ρ̂_Ω = S_xy/(S_x S_y),   (17)

and the corresponding maximum of the likelihood is

f(Ω̂) = [2πS_xS_y(1 − ρ̂²_Ω)^{1/2}]^{−n} e^{−n}.

Under ω (that is, for ρ = 0), it is seen (see also Exercise 18.3.6) that the MLE's of the parameters involved are given by

μ̂_{1,ω} = x̄, μ̂_{2,ω} = ȳ, σ̂²_{1,ω} = S_x², σ̂²_{2,ω} = S_y²,   (19)

and the corresponding maximum of the likelihood is

f(ω̂) = (2πS_xS_y)^{−n} e^{−n}.   (20)
Replacing the x's and y's by X's and Y's, respectively, in (17) and (20), we have that the LR statistic λ is given by

λ = (1 − R²)^{n/2},   (21)

where

R = Σ_{j=1}^n (X_j − X̄)(Y_j − Ȳ) / [Σ_{j=1}^n (X_j − X̄)² Σ_{j=1}^n (Y_j − Ȳ)²]^{1/2}   (22)

is the sample correlation coefficient. From (22), it follows that R² ≤ 1. (See also Exercise 18.3.7.) Therefore, by the fact that the LR test rejects H whenever λ < λ_0, where λ_0 is determined so that P_H(λ < λ_0) = α, we get by means of (21) that this test is equivalent to rejecting H whenever

R² > c_0; equivalently, R < −√c_0 or R > √c_0, where c_0 = 1 − λ_0^{2/n}.   (23)

In (23), in order to be able to determine the cut-off point c_0, we have to know the distribution of R under H. Now although the p.d.f. of the r.v. R can be derived, this p.d.f. is none of the usual ones. However, if we consider the function

W = (n − 2)^{1/2} R/(1 − R²)^{1/2},   (24)

it is easily seen, by differentiation, that W is an increasing function of R. Therefore, the test in (23) is equivalent to the following test:

Reject H whenever W < −c or W > c,   (25)

where c is determined so that P_H(W < −c or W > c) = α. It is shown in the sequel that the distribution of W under H is t_{n−2}, and hence c is readily determined.
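The test (25) is immediate to carry out numerically. Here is a Python sketch (ours; the paired data are simulated with an assumed dependence):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(6)
n = 30
X = rng.normal(size=n)
Y = 0.5 * X + rng.normal(size=n)     # correlated by construction, so rho != 0

R = np.corrcoef(X, Y)[0, 1]                       # sample correlation, (22)
W = np.sqrt(n - 2) * R / np.sqrt(1 - R ** 2)      # the statistic (24)

alpha = 0.05
c = t_dist.ppf(1 - alpha / 2, n - 2)              # P_H(W < -c or W > c) = alpha
print(W, c, abs(W) > c)                           # reject H: rho = 0 if |W| > c
```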