Nested Random Effects Models

Part of the document Linear Models and Time Series Analysis: Regression, ANOVA, ARMA and GARCH (pages 176–191)

An REM with two factors can be either crossed, as in Section 3.2.1, or nested, as studied now. Models with three or more factors can have aspects of both. It turns out that we have already seen an example of a nested REM: Recall the one-way model of Section 3.1 and observe how it can be envisioned as a two-stage design, whereby first, the A units, or classes, are randomly chosen from the relevant population and then, conditional on those chosen, from each a random sample of n units, or samples, is chosen. The factor corresponding to the samples is nested within the levels, or classes, of the treatment factor. While indeed a nested model, the adjective "nested" is typically used only when there are two or more factors (besides the error term), such that one is nested in another.

To see this hierarchy in the two-factor model, and how it differs from its crossed counterpart, let the first factor be school, with A = 20 schools being chosen from a large population of such, and, for each of the chosen schools, B = 8 teachers employed at the school are randomly chosen from the entire cohort of teachers. This implies that the factor "evaluator" (the teacher doing the grading) is nested within the factor "school", and there are not B classes of evaluators, but rather AB, grouped according to which school they are from. Each of the AB evaluators is asked to grade the writing assignment from n randomly chosen students, such that the jth teacher in the ith school receives n exams from students in his or her school i, i = 1,…,A, j = 1,…,B.

3.3.1 Two Factors

For the two-way nested case, we distinguish two cases: in the first, both factors are random; in the second, the first factor is fixed, giving rise to (our first example of) a mixed model.

3.3.1.1 Both Effects Random: Model and Parameter Estimation

In this setup, we observe Y_{ijk}, the kth observation in the jth subclass of the ith class, i = 1,…,A, j = 1,…,B, k = 1,…,n, and assume that it can be represented as

Y_{ijk} = μ + a_i + b_{ij} + e_{ijk}, (3.63)

with a_i i.i.d. ∼ N(0, σ_a²), b_{ij} i.i.d. ∼ N(0, σ_b²), and e_{ijk} i.i.d. ∼ N(0, σ_e²). It is imperative to notice that there is no factor b_j: Recall that the nested factor has AB levels, and thus requires the double subscript. This can be compared to model (3.37), where the term with the double subscript refers to the interaction term resulting from crossing two factors. Some authors write factor b_{ij} as b_{j(i)} to emphasize that the jth subclass is nested in the ith class.

From (3.63), it follows that

𝔼[Y_{ijk}] = μ,  Var(Y_{ijk}) = σ_a² + σ_b² + σ_e²,  Cov(Y_{ijk}, Y_{ij′k′}) = σ_a², j ≠ j′, (3.64)

and

Cov(Y_{ijk}, Y_{ijk′}) = 𝔼[(a_i + b_{ij} + e_{ijk})(a_i + b_{ij} + e_{ijk′})] = σ_a² + σ_b², k ≠ k′. (3.65)

For representing the likelihood, we stack the Y_{ijk} in the ABn×1 vector Y in the usual lexicon ordering (and similarly for the error vector e), and let a = (a_1,…,a_A)′ and b = (b_{11}, b_{12},…,b_{AB})′. Then, as in (3.39),
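The moments in (3.64) and (3.65) are easy to verify by simulation. The following pure-Python sketch (dependency-free, since the book's own code is Matlab; the σ values are illustrative, not from the text) draws many pairs of observations sharing a subclass or only a class, and compares the sample moments with σ_a² + σ_b² + σ_e², σ_a² + σ_b², and σ_a²:

```python
import random

random.seed(12345)
sa, sb, se = 1.0, 0.5, 0.8      # illustrative sigma_a, sigma_b, sigma_e
R = 200_000
v = c_sub = c_class = 0.0
for _ in range(R):
    a = random.gauss(0, sa)                              # class effect a_i
    b1, b2 = random.gauss(0, sb), random.gauss(0, sb)    # subclass effects b_{i1}, b_{i2}
    e1, e2, e3 = (random.gauss(0, se) for _ in range(3))
    y1 = a + b1 + e1     # Y_{i1k}   (mu = 0 without loss of generality for the moments)
    y2 = a + b1 + e2     # Y_{i1k'}: same subclass, k != k'
    y3 = a + b2 + e3     # Y_{i2k''}: same class, different subclass
    v += y1 * y1         # estimates Var(Y_{ijk})
    c_sub += y1 * y2     # estimates the within-subclass covariance, (3.65)
    c_class += y1 * y3   # estimates the within-class-only covariance, (3.64)
print(v / R, c_sub / R, c_class / R)
```

With these values, the three averages settle near 1.89, 1.25, and 1.00, matching σ_a² + σ_b² + σ_e², σ_a² + σ_b², and σ_a², respectively.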

Y = (1_A ⊗ 1_B ⊗ 1_n)μ + (I_A ⊗ 1_B ⊗ 1_n)a + (I_A ⊗ I_B ⊗ 1_n)b + (I_A ⊗ I_B ⊗ I_n)e, (3.66)

or, similar to (3.40),

Y = 1_{ABn} μ + (I_A ⊗ 1_{Bn})a + (I_{AB} ⊗ 1_n)b + e (3.67)

= Xβ + ε,

where X = 1_{ABn}, β = μ, and ε is the rest of (3.67). As usual, we let 𝝁 = 𝔼[Y] = Xβ. Then, (3.63) and (3.67) can be expressed as Y ∼ N_{ABn}(𝝁, Σ), where, from (3.67),

Σ = 𝕍(Y) = 𝕍(ε)

= (I_A ⊗ 1_{Bn}) Var(a) (I_A ⊗ 1_{Bn})′ + (I_{AB} ⊗ 1_n) Var(b) (I_{AB} ⊗ 1_n)′ + Var(e)

= (I_A ⊗ J_B ⊗ J_n) σ_a² + (I_A ⊗ I_B ⊗ J_n) σ_b² + (I_A ⊗ I_B ⊗ I_n) σ_e²

= (I_A ⊗ J_{Bn}) σ_a² + (I_{AB} ⊗ J_n) σ_b² + I_{ABn} σ_e². (3.68)

Simulating a vector Y for a given parameter constellation is easily done by first drawing a, b, and e, and then computing (3.67). Maximum likelihood estimation is also straightforwardly accomplished by modifying the program in Listing 3.1 to create, say, a Matlab function called REM2wayNestedMLE, which the reader is encouraged to do. Similar to the code in Listing 3.3, Listing 3.10 generates a data set and outputs it as a text file. This can be subsequently read into SAS and analyzed using several of their
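The Kronecker structure in (3.68) can be confirmed numerically for a tiny design. The sketch below (pure Python, no numpy; the sizes A = B = n = 2 and the variance values are arbitrary choices for illustration) builds Σ = (I_A ⊗ J_{Bn})σ_a² + (I_{AB} ⊗ J_n)σ_b² + I_{ABn}σ_e² and checks that its entries reproduce the moments (3.64) and (3.65):

```python
def kron(X, Y):
    """Kronecker product of matrices stored as lists of lists."""
    return [[X[i][j] * Y[k][l] for j in range(len(X[0])) for l in range(len(Y[0]))]
            for i in range(len(X)) for k in range(len(Y))]

def eye(m):  return [[float(i == j) for j in range(m)] for i in range(m)]
def ones(m): return [[1.0] * m for _ in range(m)]       # J_m, an m x m matrix of ones

A, B, n = 2, 2, 2
sa2, sb2, se2 = 1.0, 0.25, 0.64                         # illustrative variance components

terms = [(sa2, kron(eye(A), ones(B * n))),              # (I_A (x) J_{Bn}) sigma_a^2
         (sb2, kron(eye(A * B), ones(n))),              # (I_{AB} (x) J_n) sigma_b^2
         (se2, eye(A * B * n))]                         # I_{ABn} sigma_e^2
N = A * B * n
Sigma = [[sum(c * M[r][s] for c, M in terms) for s in range(N)] for r in range(N)]

# index of Y_{ijk} in lexicon order: r = (i*B + j)*n + k
assert Sigma[0][0] == sa2 + sb2 + se2   # Var(Y_{ijk})
assert Sigma[0][1] == sa2 + sb2         # same subclass, k != k'   (3.65)
assert Sigma[0][2] == sa2               # same class, j != j'      (3.64)
assert Sigma[0][4] == 0.0               # different classes
```

The same four asserts characterize every entry of Σ, since the matrix is block-structured with identical blocks down the diagonal.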

1 A=10; B=6; n=8; mu=5; siga=1; sigb=0.4; sige=0.8;
2 a = siga*randn(A,1); b = sigb*randn(A*B,1); e = sige*randn(A*B*n,1);
3 y = ones(A*B*n,1)*mu ...
4 + kron(eye(A),ones(B*n,1))*a ...
5 + kron(eye(A*B),ones(n,1))*b + e;
6 school = kron( (1:A)' , ones(B*n,1) );
7 TeacherNestedInSchool = kron( (1:(A*B))' , ones(n,1) );
8 Out=[y school TeacherNestedInSchool];
9 fname='REM2nested.txt';
10 if exist(fname,'file'), delete(fname), end
11 fileID = fopen(fname,'w');
12 fprintf(fileID,'%8.5f %4u %4u\r\n',Out'); fclose(fileID);
13 % [param, stderr, loglik, iters, bfgsok] = REM2wayNestedMLE(y,A,B,n)

Program Listing 3.10: Generates and writes to a text file a two-way nested balanced REM data set, and the associated class variables, based on the parameter constellation given in line 1, for input into SAS. The last line, 13, if uncommented, calls the custom-made Matlab program to compute the m.l.e., though note that a closed-form solution exists; see (3.81).

ods html close; ods html;
filename ein 'REM2nested.txt';
data school; infile ein stopover; input Y school Evaluator; run;
title 'REM 2-Way Nested Example';
proc varcomp method=ml;
  class school Evaluator;
  *model Y=school Evaluator;
  model Y=school Evaluator(school);
run;
proc mixed method=ml cl ratio;
  class school Evaluator;
  model Y= / cl solution;
  *random school Evaluator;
  random school Evaluator(school);
run;
proc nested;
  class school Evaluator;
  var Y;
run;

SAS Program Listing 3.4: Reads in the data from the text file generated in Listing 3.10 and uses proc varcomp and proc mixed with maximum likelihood, and proc nested (which does not support maximum likelihood, and uses only the ANOVA method of estimation). proc nested does not support use of mixed models, i.e., inclusion of fixed effects (besides the grand mean), and also assumes the input data are sorted by the class variables, which is the case by virtue of how we generated and wrote the data. The model statements that are commented out in proc varcomp and proc mixed can also be used and deliver the same output.

procedures, as shown in SAS Program Listing 3.4. The m.l.e. obtained for a particular generated data set from Listing 3.10 and use of the custom Matlab function REM2wayNestedMLE was the same as those output from both SAS procedures, as was the obtained log-likelihood, as shown in the output of proc mixed. For this model, there is a closed-form expression for the m.l.e., provided the variance component estimates are positive, obviating the need for numeric calculations; see (3.81).

Theorem 3.5 (Independence and Distribution) By squaring and summing the expression

Y_{ijk} = Ȳ_{•••} + (Ȳ_{i••} − Ȳ_{•••}) + (Ȳ_{ij•} − Ȳ_{i••}) + (Y_{ijk} − Ȳ_{ij•}), (3.69)

and confirming that the cross terms are zero, the SS decomposition is given by

SS_T = SS_μ + SS_a + SS_b + SS_e, (3.70)

where SS_T = Σ_{i=1}^A Σ_{j=1}^B Σ_{k=1}^n Y_{ijk}², and

SS_μ = ABn Ȳ_{•••}²,  SS_a = Bn Σ_{i=1}^A (Ȳ_{i••} − Ȳ_{•••})²,

SS_b = n Σ_{i=1}^A Σ_{j=1}^B (Ȳ_{ij•} − Ȳ_{i••})²,  SS_e = Σ_{i=1}^A Σ_{j=1}^B Σ_{k=1}^n (Y_{ijk} − Ȳ_{ij•})².

Proof: We wish to show that SS_μ, SS_a, SS_b, and SS_e are independent, and derive their distributions. Instead of directly showing the zero correlation between the (4 choose 2) = 6 pairs of quantities, we generalize the method used in Section 3.1.2 by defining

G_{ij} = b_{ij} + ē_{ij•}  and  H_i = a_i + b̄_{i•} + ē_{i••} = a_i + Ḡ_{i•}.

Now write

Y_{ijk} = Ȳ_{•••} + (Ȳ_{i••} − Ȳ_{•••}) + (Ȳ_{ij•} − Ȳ_{i••}) + (Y_{ijk} − Ȳ_{ij•})
= (μ + H̄_•) + (H_i − H̄_•) + (G_{ij} − Ḡ_{i•}) + (e_{ijk} − ē_{ij•})
= μ + a_i + Ḡ_{i•} + (G_{ij} − Ḡ_{i•}) + (e_{ijk} − ē_{ij•})
= μ + a_i + b_{ij} + ē_{ij•} + (e_{ijk} − ē_{ij•})
= μ + a_i + b_{ij} + e_{ijk}, (3.71)

where the second row follows because, working from right to left,

Y_{ijk} − Ȳ_{ij•} = (μ + a_i + b_{ij} + e_{ijk}) − (μ + a_i + b_{ij} + ē_{ij•}) = e_{ijk} − ē_{ij•},

Ȳ_{ij•} − Ȳ_{i••} = (μ + a_i + b_{ij} + ē_{ij•}) − (μ + a_i + b̄_{i•} + ē_{i••}) = b_{ij} − b̄_{i•} + ē_{ij•} − ē_{i••} = G_{ij} − Ḡ_{i•},

Ȳ_{i••} − Ȳ_{•••} = (μ + a_i + b̄_{i•} + ē_{i••}) − (μ + ā_• + b̄_{••} + ē_{•••}) = a_i − ā_• + b̄_{i•} − b̄_{••} + ē_{i••} − ē_{•••} = H_i − H̄_•,

and Ȳ_{•••} = μ + ā_• + b̄_{••} + ē_{•••} = μ + H̄_•.

Next observe that SS_e = ΣΣΣ (e_{ijk} − ē_{ij•})², and

SS_b = n ΣΣ (G_{ij} − Ḡ_{i•})²,  SS_a = Bn Σ (H_i − H̄_•)²,  SS_μ = ABn(μ + H̄_•)².

From the independence of X̄ and S²_X for normal samples, SS_e ⟂ ē_{ij•}. As SS_e is a function only of the e_{ijk} − ē_{ij•}, and G_{ij}, H_i, and H̄_• are functions of the ē_{ij•} and other random variables independent of SS_e, it follows that SS_e ⟂ SS_b, SS_e ⟂ SS_a, and SS_e ⟂ SS_μ.

Similarly, SS_b ⟂ Ḡ_{i•} and, because SS_b does not involve the a_i, it is independent of functions of the a_i and Ḡ_{i•}, i.e., of the H_i, so that SS_b ⟂ SS_a and SS_b ⟂ SS_μ. Finally, SS_a ⟂ H̄_•, so SS_a ⟂ SS_μ.

For the distribution of SS_e: for each given i,j pair, σ_e^{−2} Σ_{k=1}^n (e_{ijk} − ē_{ij•})² ∼ χ²_{n−1}, and, from the independence of all the e_{ijk}, σ_e^{−2} SS_e ∼ χ²_{AB(n−1)}.

Next, G_{ij} ∼ N(0, σ_b² + σ_e²/n), or √n G_{ij} ∼ N(0, γ_b), where γ_b = nσ_b² + σ_e², and SS_b/γ_b ∼ χ²_{A(B−1)}. Similarly, H_i ∼ N(0, σ_a² + σ_b²/B + σ_e²/(Bn)), or √(Bn) H_i ∼ N(0, γ_a), where γ_a = Bnσ_a² + nσ_b² + σ_e², so that

SS_a/γ_a = Σ_{i=1}^A (√(Bn) H_i − √(Bn) H̄_•)²/γ_a ∼ χ²_{A−1}. (3.72)

Lastly, μ + H̄_• ∼ N(μ, σ_a²/A + σ_b²/(AB) + σ_e²/(ABn)), or √(ABn)(μ + H̄_•) ∼ N(√(ABn) μ, γ_a), so that dividing by √γ_a and squaring gives (SS_μ/γ_a) ∼ χ²_1(ABnμ²/γ_a). ◾

Summarizing, SS_μ, SS_a, SS_b, and SS_e are independent, and

SS_μ/γ_a ∼ χ²_1(ABnμ²/γ_a),  SS_a/γ_a ∼ χ²_{A−1},  SS_b/γ_b ∼ χ²_{A(B−1)},  SS_e/σ_e² ∼ χ²_{AB(n−1)}, (3.73)

where γ_a = Bnσ_a² + nσ_b² + σ_e² and γ_b = nσ_b² + σ_e². The EMS are given by

𝔼[MS_μ] = γ_a 𝔼[χ²_1(ABnμ²/γ_a)] = γ_a (1 + ABnμ²/γ_a) = γ_a + ABnμ², (3.74)

𝔼[MS_a] = γ_a/(A−1) · 𝔼[χ²_{A−1}] = γ_a,  𝔼[MS_b] = γ_b/(A(B−1)) · 𝔼[χ²_{A(B−1)}] = γ_b, (3.75)

and

𝔼[MS_e] = σ_e²/(AB(n−1)) · 𝔼[χ²_{AB(n−1)}] = σ_e². (3.76)
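The EMS results (3.74)–(3.76) lend themselves to a quick Monte Carlo sanity check. The following pure-Python sketch (offered instead of Matlab so it is dependency-free; the design sizes and parameter values are illustrative) simulates the nested model many times, averages MS_a, MS_b, and MS_e, and compares them with γ_a, γ_b, and σ_e²:

```python
import random

random.seed(7)
A, B, n = 4, 3, 2
mu, sa, sb, se = 5.0, 1.0, 0.5, 0.8         # illustrative parameter values
ga = B * n * sa**2 + n * sb**2 + se**2      # gamma_a, the claimed E[MS_a]
gb = n * sb**2 + se**2                      # gamma_b, the claimed E[MS_b]
R = 3000
msa = msb = mse = 0.0
for _ in range(R):
    # draw Y_{ijk} = mu + a_i + b_ij + e_ijk
    y = []
    for i in range(A):
        ai = random.gauss(0, sa)
        row = []
        for j in range(B):
            bij = random.gauss(0, sb)
            row.append([mu + ai + bij + random.gauss(0, se) for _ in range(n)])
        y.append(row)
    yb_ij = [[sum(cell) / n for cell in row] for row in y]
    yb_i = [sum(r) / B for r in yb_ij]
    yb = sum(yb_i) / A
    msa += B * n * sum((m - yb)**2 for m in yb_i) / (A - 1)
    msb += n * sum((yb_ij[i][j] - yb_i[i])**2
                   for i in range(A) for j in range(B)) / (A * (B - 1))
    mse += sum((y[i][j][k] - yb_ij[i][j])**2
               for i in range(A) for j in range(B) for k in range(n)) / (A * B * (n - 1))
print(msa / R, ga, msb / R, gb, mse / R, se**2)
```

With these parameters γ_a = 7.14 and γ_b = 1.14, and the three Monte Carlo averages land close to γ_a, γ_b, and σ_e² = 0.64.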

These results are summarized in their standard fashion in Table 3.5.

It is valuable to explicitly consider how the sums of squares are computed, for an ABn×1 vector Y in the lexicon ordering (3.66), generated, say, from lines 1–3 in Listing 3.10. The key is to use the matrices in (3.67), as shown in lines 2 and 4 of the code given in Listing 3.11, which also computes the closed-form m.l.e. solution (3.81).

Table 3.5 ANOVA table for the balanced two-factor nested REM. Notation B(A) is short for "B within A", indicating the hierarchy of the nested factor.

Source   df         SS                                                    EMS
Mean     1          ABn Ȳ_{•••}²                                          σ_e² + nσ_b² + Bnσ_a² + ABnμ²
A        A−1        Bn Σ_{i=1}^A (Ȳ_{i••} − Ȳ_{•••})²                     σ_e² + nσ_b² + Bnσ_a²
B(A)     A(B−1)     n Σ_{i=1}^A Σ_{j=1}^B (Ȳ_{ij•} − Ȳ_{i••})²            σ_e² + nσ_b²
Error    AB(n−1)    Σ_{i=1}^A Σ_{j=1}^B Σ_{k=1}^n (Y_{ijk} − Ȳ_{ij•})²    σ_e²
Total    ABn        Σ_{i=1}^A Σ_{j=1}^B Σ_{k=1}^n Y_{ijk}²

1 SST=sum(y'*y); Ydddb=mean(y); SSu=A*B*n*Ydddb^2; % Ydddb = \bar{Y}_{dot dot dot}
2 H=kron(eye(A), ones(B*n,1)); Yiddb=y'*H/(B*n); % Yiddb = \bar{Y}_{i dot dot}
3 SSa=B*n*sum( (Yiddb-Ydddb).^2 );
4 H=kron(eye(A*B), ones(n,1)); Yijdb=y'*H/n; % Yijdb = \bar{Y}_{i j dot}
5 m = kron(Yiddb,ones(1,B)); SSb=n*sum( (Yijdb-m).^2 );
6 m=kron(Yijdb', ones(n,1)); SSe=sum( (y-m).^2 );
7 check=SST-(SSu+SSa+SSb+SSe) % is zero
8 % Now the MLE, if the MLE variance components are positive
9 MSe=SSe/A/B/(n-1); MSb=SSb/A/(B-1); MSa=SSa/(A-1);
10 sigma2eMLE=MSe; sigma2bMLE=(MSb-MSe)/n;
11 sigma2aMLE = ( (1-1/A)*MSa - MSb )/B/n;
12 muMLE=Ydddb; MLE=[muMLE sigma2aMLE sigma2bMLE sigma2eMLE]

Program Listing 3.11: Computes the SS values in (3.70) for a given vector Y in lexicon order, corresponding to a two-way nested, both factors random, balanced REM, and the closed-form m.l.e. solution (3.81).

With respect to hypothesis test statistics, a test for σ_a² > 0 will be based on MS_a divided not by MS_e (which would otherwise test σ_a² = σ_b² = 0) but rather MS_a divided by MS_b, i.e.,

[SS_a/(γ_a (A−1))] / [SS_b/(γ_b A(B−1))] ∼ F_{A−1, A(B−1)},  or  F_a = MS_a/MS_b ∼ (γ_a/γ_b) F_{A−1, A(B−1)}, (3.77)

a scaled central F distribution. If σ_a² = 0, then γ_a = γ_b, so that an α-level test for σ_a² = 0 versus σ_a² > 0 rejects if F_a > F^α_{A−1, A(B−1)}, where F^α_{n,d} is the 100(1−α)th percent quantile of the F_{n,d} distribution. Likewise,

[SS_b/(γ_b A(B−1))] / [SS_e/(σ_e² AB(n−1))] ∼ F_{A(B−1), AB(n−1)},  or  F_b = MS_b/MS_e ∼ (γ_b/σ_e²) F_{A(B−1), AB(n−1)}, (3.78)

is a scaled F distribution. For σ_b² = 0, γ_b = σ_e², so that an α-level test for σ_b² = 0 versus σ_b² > 0 rejects if F_b > F^α_{A(B−1), AB(n−1)}.

Now turning to point estimators, from (3.75) and (3.76), 𝔼[MS_e] = σ_e², 𝔼[MS_b] = nσ_b² + σ_e², and 𝔼[MS_a] = Bnσ_a² + nσ_b² + σ_e², so that

σ̂_e² = MS_e,  σ̂_b² = (MS_b − MS_e)/n,  and  σ̂_a² = (MS_a − MS_b)/(Bn) (3.79)

yield unbiased estimators using the ANOVA method of estimation. A closed-form solution to the set of equations that equate the first derivatives of the log-likelihood to zero is available, and is the m.l.e. if all variance component estimates are positive. For μ, the m.l.e. is

μ̂_ML = Ȳ_{•••}, (3.80)

which turns out to be true for all pure random effects models, balanced or unbalanced; see, e.g., Searle et al. (1992, p. 146). For the variance components, if they are positive,

σ̂²_{e,ML} = MS_e,  σ̂²_{b,ML} = (MS_b − MS_e)/n,  σ̂²_{a,ML} = ((1 − A^{−1})MS_a − MS_b)/(Bn); (3.81)

see, e.g., Searle et al. (1992, p. 148). Point estimators of other quantities of interest can be determined from the invariance property of the m.l.e. For example, the m.l.e. of ρ := σ_a²/σ_e² is, comparing (3.79) and (3.81),

ρ̂_ML = σ̂²_{a,ML}/σ̂²_{e,ML} ≈ σ̂_a²/σ̂_e² = (MS_a − MS_b)/(Bn · MS_e) =: ρ̂. (3.82)
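The following snippet contrasts the ANOVA-method estimators (3.79) with the m.l.e. (3.81) for one set of mean squares (the MS values below are invented for illustration, using the design sizes of Listing 3.10):

```python
A, B, n = 10, 6, 8
MSa, MSb, MSe = 40.0, 2.0, 0.6         # hypothetical mean squares

# ANOVA-method (unbiased) estimators, (3.79)
s2e_anova = MSe
s2b_anova = (MSb - MSe) / n
s2a_anova = (MSa - MSb) / (B * n)

# maximum likelihood estimators, (3.81), valid when all three are positive
s2e_ml = MSe
s2b_ml = (MSb - MSe) / n
s2a_ml = ((1 - 1 / A) * MSa - MSb) / (B * n)

print(s2a_anova, s2a_ml)   # the m.l.e. shrinks the A-factor component toward zero
```

Only σ̂_a² differs between the two methods, via the factor (1 − A^{−1}) on MS_a; the difference vanishes as A grows.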

3.3.1.2 Both Effects Random: Exact and Approximate Confidence Intervals

For confidence intervals, the easiest (and usually of least relevance) is for the error variance. Similar to (3.22) for the one-factor case, from (3.73), SS_e/σ_e² is a pivot, so that a 100(1−α)% confidence interval for σ_e² is given by (SS_e/u, SS_e/l) because

1−α = Pr(l ⩽ SS_e/σ_e² ⩽ u) = Pr(SS_e/u ⩽ σ_e² ⩽ SS_e/l), (3.83)

where l and u are given by Pr(l ⩽ χ²_{AB(n−1)} ⩽ u) = 1−α, and 0 < α < 1 is a chosen tail probability, typically 0.05.

Exact intervals for some variance ratios of interest are available. From (3.78),

(σ_e²/γ_b) F_b = [σ_e²/(nσ_b² + σ_e²)] F_b ∼ F_{A(B−1), AB(n−1)}

is a pivot, and, using similar manipulations as in the one-factor case, we obtain the intervals

1−α = Pr( L/F_b < σ_e²/(nσ_b² + σ_e²) < U/F_b )

= Pr( (F_b/U − 1)/n < σ_b²/σ_e² < (F_b/L − 1)/n )

= Pr( (F_b − U)/(nU + F_b − U) < σ_b²/(σ_e² + σ_b²) < (F_b − L)/(nL + F_b − L) ), (3.84)

where Pr(L ⩽ F_{A(B−1), AB(n−1)} ⩽ U) = 1−α.

Wald-based approximate confidence intervals for σ_a² and σ_b² can be computed in the usual way, and the Satterthwaite approximation is also available. In particular, for σ_a² = (γ_a − γ_b)/(Bn), where γ_a = Bnσ_a² + nσ_b² + σ_e² and γ_b = nσ_b² + σ_e², with h_1 = −h_2 = (Bn)^{−1} and d_1 = A−1, d_2 = A(B−1), then either from (3.29) and (3.31), or (3.33) and (3.34),

d̂ = (h_1 γ̂_a + h_2 γ̂_b)² / (h_1² γ̂_a²/d_1 + h_2² γ̂_b²/d_2) = (γ̂_a − γ̂_b)² / (γ̂_a²/(A−1) + γ̂_b²/(A(B−1))) = (MS_a − MS_b)² / ((MS_a)²/(A−1) + (MS_b)²/(A(B−1))), (3.85)

and, for 1−α = Pr(l ⩽ χ²_{d̂} ⩽ u),

1−α ≈ Pr( d̂(MS_a − MS_b)/(Bn·u) ⩽ σ_a² ⩽ d̂(MS_a − MS_b)/(Bn·l) ).

Similarly, for σ_b² = (γ_b − σ_e²)/n = n^{−1}(𝔼[MS_b] − 𝔼[MS_e]),

d̂ = (MS_b − MS_e)² / ((MS_b)²/(A(B−1)) + (MS_e)²/(AB(n−1))), (3.86)

and

1−α ≈ Pr( d̂(MS_b − MS_e)/(n·u) ⩽ σ_b² ⩽ d̂(MS_b − MS_e)/(n·l) ), (3.87)

for u and l such that 1−α = Pr(l ⩽ χ²_{d̂} ⩽ u).
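Concretely, (3.86)–(3.87) amount to a few lines of arithmetic. The sketch below (pure Python; the MS values are hypothetical, and the χ² quantile uses the standard Wilson–Hilferty approximation, which accommodates the non-integer d̂):

```python
from math import sqrt
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty chi-square quantile approximation (df may be fractional)."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * sqrt(2 / (9 * df)))**3

A, B, n = 10, 6, 8
MSb, MSe = 2.0, 0.6                      # hypothetical mean squares
alpha = 0.05

# Satterthwaite degrees of freedom, (3.86)
dhat = (MSb - MSe)**2 / (MSb**2 / (A * (B - 1)) + MSe**2 / (A * B * (n - 1)))
theta = (MSb - MSe) / n                  # point estimate of sigma_b^2, cf. (3.79)

l = chi2_quantile(alpha / 2, dhat)
u = chi2_quantile(1 - alpha / 2, dhat)
ci = (dhat * theta / u, dhat * theta / l)   # approximate interval, (3.87)
print(dhat, ci)
```

For these inputs d̂ ≈ 24.2, and the resulting interval contains the point estimate σ̂_b² = 0.175.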

As is clear from (3.82), an exact interval for ρ = σ_a²/σ_e² is not available because there is no exact pivot, but applying the Satterthwaite approximation using (3.85) results in

ρ̂/ρ = [(MS_a − MS_b)/(Bn σ_a²)] / [MS_e/σ_e²]

being approximately distributed as F_{d̂, AB(n−1)}, i.e., an approximate pivot. Thus, with L and U given by Pr(L ⩽ F_{d̂, AB(n−1)} ⩽ U) = 1−α for 0 < α < 1, an approximate c.i. for ρ is

1−α ≈ Pr( ρ̂/U < ρ < ρ̂/L ),  ρ̂ = (MS_a − MS_b)/(Bn · MS_e). (3.88)

The bootstrap/saddlepoint-based method of Butler and Paolella (2002b) is also applicable in this case and yields higher accuracy for small sample sizes.

Letting V = σ_a² + σ_b² + σ_e² be the total variance, other ratios, such as σ_a²/V, σ_b²/V, and (σ_a² + σ_b²)/V, are also of potential interest, as well as σ_a²/(σ_a² + σ_b²) and σ_b²/(σ_a² + σ_b²). In the balanced setting, if exact intervals are not available, the Satterthwaite method and/or the bootstrap/saddlepoint-based method can be invoked. These could then, in turn, be used for the unbalanced case by the bootstrap/q-calibration exercise.

Similar to the idea in the remark at the end of Section 3.1.6.2, it is highly instructional (and potentially useful) to make a program that inputs an unbalanced panel for a two-way nested REM, and outputs (among other things, such as the approximate m.l.e. based on the method discussed in Section 3.1.6.1) a confidence interval for, say, ρ = σ_a²/σ_e², based on (3.88) using the bootstrap/q-calibration exercise described in Section 3.1.6.2. Naturally, other confidence intervals, such as for the individual variance components or other ratios of interest, could also be incorporated.

In doing so, the first orders of business are to (i) write a program to compute the estimates of the missing values (via optimization, to get also the approximate covariance matrix), using the closed-form expression for the m.l.e. of the model parameters μ, σ_a², σ_b², and σ_e², and (ii) confirm that the approximate m.l.e. for μ, σ_a², and σ_b² are essentially equal to the true m.l.e. (as computed, say, by SAS), and that of σ_e² is off by a multiplicative factor of 1.0735 for the constellation of parameters and number of (and constellation of) missing values used, namely A = 10, B = 6, n = 8, μ = 5, σ_a = 1, σ_b = 0.4, σ_e = 0.8, and 30 missing observations.

The precise constellation of missing values we chose that gave rise to this multiplicative factor of 1.0735 is shown in Listing 3.12. This correction factor needs to be applied because otherwise the bootstrap inference will be jeopardized. At this point, the reader might protest: How can this be done without access to the true m.l.e., in particular, without, say, SAS? As mentioned in Section 3.1.6.1 in the context of the one-way model, one could use simulation (and only Matlab), taking the multiplicative adjustment to be that value such that the estimator's (mean, or possibly median) bias is minimized.

Program REM2wayNestedUnbalancedSatterforrho (not shown, leaving it as a wonderful exercise for the reader) accomplishes this. Reasonably reliable assessment of the actual coverage would require use of at least s = 1,000 simulated data sets, in which case, with s = 1,000, a simple 95% binomial confidence interval of the actual coverage probability (assuming the true coverage, and the observed actual, is p = 0.90) is, to two digits, 0.90 ± 1.96 √(p(1−p)/s) = (0.88, 0.92).
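The binomial half-width computation is one line; as a check of the quoted (0.88, 0.92), using the p = 0.90 and s = 1,000 from the text above:

```python
from math import sqrt

p, s = 0.90, 1000                      # assumed true coverage and number of data sets
half = 1.96 * sqrt(p * (1 - p) / s)    # 95% normal-approximation half-width
ci = (p - half, p + half)
print(ci)                              # approximately (0.88, 0.92) to two digits
```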

Use of Boot = 250 bootstrap replications (and, for each, sMiss = 250 replications of the missing data and computation of the balanced-case interval (3.88)) takes, for a single simulated data set, about 30 to 60 minutes on a typical PC at the time of writing (and use of one core only) to produce the confidence interval of ρ. Such a simulation with s = 1,000 was done (with 24 cores and 21 hours), and resulted in an actual coverage of 0.927, suggesting that the actual coverage might be slightly larger than the nominal. Use of Boot = sMiss = 1,000 takes correspondingly longer and resulted in an actual coverage of 0.926, suggesting that use of 250 is adequate and that the larger-than-nominal coverage does not stem from too small a choice of Boot or sMiss.

A histogram of the interval lengths (not shown) reveals that it is roughly Gaussian, with an elongated right tail. The average interval length was 4.0 and the sample standard deviation of the lengths was 1.9, indicating how much uncertainty is inherent in confidence intervals for (ratios of) variance components, even with a respectable sample size.

1 A=10; B=6; n=8; mu=5; siga=1; sigb=0.4; sige=0.8; bad=1;
2 while bad
3 a=siga*randn(A,1); b=sigb*randn(A*B,1); e=sige*randn(A*B*n,1);
4 y=ones(A*B*n,1)*mu ...
5 + kron(eye(A),ones(B*n,1))*a ...
6 + kron(eye(A*B),ones(n,1))*b + e;
7 iset=1:2:9; % set some values to missing, here 30 of them
8 for iloop=1:length(iset)
9 i=iset(iloop);
10 j=1; k=1; ind=B*n*(i-1)+n*(j-1)+k; y(ind)=NaN;
11 j=1; k=2; ind=B*n*(i-1)+n*(j-1)+k; y(ind)=NaN;
12 j=3; k=1; ind=B*n*(i-1)+n*(j-1)+k; y(ind)=NaN;
13 j=5; k=1; ind=B*n*(i-1)+n*(j-1)+k; y(ind)=NaN;
14 j=6; k=1; ind=B*n*(i-1)+n*(j-1)+k; y(ind)=NaN;
15 j=6; k=2; ind=B*n*(i-1)+n*(j-1)+k; y(ind)=NaN;
16 end
17 try
18 [mu_miss, V_miss]=REM2wayNestedMLEMiss(y,A,B,n);
19 bad=min(eig(V_miss)) < 0.1;
20 catch %#ok<CTCH>
21 bad=1;
22 end
23 end

Program Listing 3.12: Simulates a two-way nested, both factors random, balanced REM, with the indicated constellation of parameters, and then sets 30 of the values to missing, as indicated. The use of while bad is to ensure that the data set results in a valid approximate covariance matrix for the estimated missing values. It is very rare that this is problematic, but it is necessary when using the bootstrap procedure for approximate confidence intervals with unbalanced data. The use of try and catch is because, also rarely but possibly, the BFGS optimization algorithm, as used in program REM2wayNestedMLEMiss and based on Matlab version 2010, can fail. It is important to note that both of these selection mechanisms can induce a sample selection bias, and could affect the small-sample properties of the point and interval estimators. We ignore this issue because, first, both mechanisms are rarely engaged, and, second, because interest here centers on development of concepts and teaching. A more rigorous analysis would have to address and resolve both issues.

Figure 3.5 shows the resulting point estimates of the variance components based on the approximate m.l.e. with multiplicative factor adjustment for σ̂_e².

3.3.1.3 Mixed Model Case

Recall from Section 2.1 that a mixed effects model is one that contains both fixed and random effects, outside of the grand mean and the error term. We now describe the two-way nested mixed model, such that the first factor, A, is fixed, and the second factor, B, is nested in A, and is random. Using our perpetual example with schools and writing evaluations from the beginning of Section 3.3, the model is now similar to that of Section 3.3.1.1, where both factors are random, but now the schools are considered fixed (“because we are interested in them”).

[Figure 3.5 consists of three histogram panels, titled "Estimate of σ_a²", "Estimate of σ_b²", and "Estimate of σ_e²".]

Figure 3.5 Point estimates of the three variance components for the two-way nested REM, based on the approximate m.l.e. with multiplicative factor adjustment for σ̂_e² and use of 1,000 replications. True model parameters are those given in Listing 3.12, namely σ_a² = 1², σ_b² = (0.4)², σ_e² = (0.8)².

Exactly as in the case where both factors are assumed random, we observe Y_{ijk}, the kth observation in the jth subclass of the ith class, i = 1,…,A, j = 1,…,B, k = 1,…,n, but now assume that

Y_{ijk} = μ + α_i + b_{ij} + e_{ijk},  Σ_{i=1}^A α_i = 0,  b_{ij} i.i.d. ∼ N(0, σ_b²),  e_{ijk} i.i.d. ∼ N(0, σ_e²), (3.89)

where the A classes are fixed levels of particular interest and, for each i, the B subclasses are randomly chosen. Differing from (3.64) and (3.65), first and second moments are

𝔼[Y_{ijk}] = μ + α_i,  Var(Y_{ijk}) = σ_b² + σ_e²,  Cov(Y_{ijk}, Y_{ijk′}) = σ_b², k ≠ k′,  Cov(Y_{ijk}, Y_{ij′k′}) = 0 otherwise. (3.90)

From expression (3.69), we again get decomposition (3.70).

Vector Y is expressed exactly the same as in (3.67), but using α = (α_1, α_2,…, α_A)′ instead of a, namely

Y = 1_{ABn} μ + (I_A ⊗ 1_{Bn})α + (I_{AB} ⊗ 1_n)b + e

= Xβ + ε, (3.91)

where β = [μ, α′]′, X consists of the column 1_{ABn} followed by those of I_A ⊗ 1_{Bn}, and ε = (I_{AB} ⊗ 1_n)b + e. As always, let 𝝁 := 𝔼[Y] = Xβ. We can then express (3.89) and (3.91) as Y ∼ N_{ABn}(𝝁, Σ), where, similar to (3.68) but without the σ_a² component,

Σ = (I_{AB} ⊗ J_n)σ_b² + I_{ABn}σ_e². (3.92)

With the likelihood expressible, one could use Matlab's constrained optimization methods (to respect Σ_{i=1}^A α_i = 0, and the positivity of the two variance components), though much more efficient methods exist (see, e.g., the references given at the beginning of the chapter, as well as Galwey, 2014, and West et al., 2015) and are built into statistical software packages (along with the availability of the more popular restricted m.l.e., or REML). In particular, the m.l.e. of β is, from (i) the model structure (3.91) and (3.92), (ii) the Gaussianity assumption on ε, and (iii) results in Chapter 1, equal to the generalized least squares estimator, and, for balanced data, this turns out to be equal to the ordinary least squares estimator; see, e.g., Searle et al. (1992, Sec. 4.9) for a detailed explanation.

Recall (2.77) for computing the coefficient estimates in the two-way fixed effects ANOVA. Similarly, with 1_A an A-length column of ones, and m_i denoting the mean of the Bn elements corresponding to the ith class of the first factor, i = 1,…,A, the least squares estimator β̂ of β in (3.91) is given by the solution to the over-identified system of equations Zc = m, where

Z = [ 1_A  I_A
       0   1_A′ ],   c = (μ, α_1, α_2, …, α_A)′,   m = (m_1, m_2, …, m_A, 0)′. (3.93)

The solution is c = (Z′Z)^{−1}Z′m, with code given in Listing 3.13, assuming the relevant variables are in computer memory, e.g., as constructed by lines 1–3 in Listing 3.10. This estimator is different from what is delivered by SAS's proc mixed, which sets α̂_A = 0 as the constraint. However, one can confirm that the estimates of the estimable functions μ̂ + α̂_i agree between SAS and use of (3.93).

Maximum likelihood estimates of the remaining parameters of the model, σ_b² and σ_e², can be obtained by numerically maximizing the log-likelihood of (Y − Xβ̂) ∼ N_{ABn}(0, Σ), where Σ is given in (3.92). The reader is encouraged to construct a program, say REM2wayNestedMixedMLE, to accomplish this, and confirm that the point estimates of the two variance components are the same as those delivered by SAS. Regarding SAS code, first note that we can use the same data set as was generated by the code in Listing 3.10 for analysis in SAS, just treating the classes of factor A as fixed. The code in SAS Listing 3.5 then shows how the two-way mixed model is estimated with maximum likelihood.

As with previous models, we wish to show that SS_μ, SS_α, SS_b, and SS_e are independent, and derive their distributions and the corresponding EMS values. The independence of the SS follows the same

1 X=kron(eye(A),ones(B*n,1));
2 v=X*inv(X'*X)*X'*y; v=reshape(v,B*n,A)';
3 m=[v(:,1) ; 0]; Z=[ones(A,1), eye(A) ; 0 , ones(1,A)];
4 c=inv(Z'*Z)*Z'*m;

Program Listing 3.13: Computes the solution to (3.93).
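Because the A+1 equations in (3.93) are consistent (μ + α_i = m_i together with Σ α_i = 0 can be met exactly by μ̂ = m̄ and α̂_i = m_i − m̄) and Z has full column rank, the least squares solution c = (Z′Z)^{−1}Z′m is just that closed form. A quick check (pure Python; the class means m_i are invented for illustration):

```python
m = [5.2, 4.7, 6.1, 5.0]                 # hypothetical class means m_i (A = 4)
A = len(m)

mu_hat = sum(m) / A                      # least squares mu-hat: grand mean of the m_i
alpha_hat = [mi - mu_hat for mi in m]    # alpha_i-hat = m_i - mu_hat

# verify every equation of Zc = m holds: mu + alpha_i = m_i, and sum(alpha_i) = 0
residuals = [mu_hat + alpha_hat[i] - m[i] for i in range(A)] + [sum(alpha_hat)]
print(max(abs(r) for r in residuals))    # numerically zero
```

The same reasoning explains why Listing 3.13's c = inv(Z'*Z)*Z'*m reproduces the class means exactly in the balanced case.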
