One-Factor Balanced Random Effects Model


The one-factor REM is the simplest case, and the obvious starting point. It is also an important model, serving to introduce the various concepts and procedures common to all REMs. As such, we go through the development slowly and in much detail, and also address the unbalanced case (albeit only partially, and in a non-conventional way). For the subsequent development of higher-order models, the pace is quicker, with some derivations and computational work deferred to the end-of-chapter exercises (answers are provided).

Recall the simple example mentioned in Section 2.1, where we sample $A=20$ schools from a large population of schools belonging to some well-defined cohort (e.g., public high schools in a particular geographic area), and from each we sample $n=15$ students in the same grade whose performance on some standardized test is to be evaluated. This is an example of a one-way REM. Examples abound in numerous fields of research: in agriculture (or forestry, animal studies, etc.), different plots of land can form the classes; in manufacturing, the classes can be the factory production lines and/or the workers; in medicine, hospitals or clinics (or medical practitioners) can be the population of interest, etc. Returning to the school example, the variation due to the evaluators of the test might be the subject of interest.

3.1.1 Model and Maximum Likelihood Estimation

Let $Y_{ij}$ denote the $j$th observation in the $i$th class, $i=1,\ldots,A$, $j=1,\ldots,n$. As with the fixed effects model, the $n$ replications are random, but now the $A$ values of the object under study (e.g., schools, hospitals, factory machines, sampled batches, segments of an ocean, galaxies, etc.) are also considered to be random realizations from a large population. Under the normality assumption, the model is represented as

$$Y_{ij} = \mu + a_i + e_{ij}, \qquad a_i \overset{\text{i.i.d.}}{\sim} \mathrm{N}(0,\sigma_a^2), \qquad e_{ij} \overset{\text{i.i.d.}}{\sim} \mathrm{N}(0,\sigma_e^2), \qquad (3.1)$$

and such that $a_i$ and $e_{ij}$ are independent for all $i$ and $j$. The three model parameters, which are assumed fixed but unknown, are $\mu$, $\sigma_a^2$, and $\sigma_e^2$. From (3.1), the first two moments are

$$\mathbb{E}[Y_{ij}] = \mu, \qquad \operatorname{Var}(Y_{ij}) = \sigma_a^2 + \sigma_e^2, \qquad (3.2)$$

and

$$\operatorname{Cov}(Y_{ij}, Y_{ij'}) = \mathbb{E}[(a_i + e_{ij})(a_i + e_{ij'})] = \sigma_a^2, \qquad j' \neq j. \qquad (3.3)$$
(Notice here the use of the prime to denote "an alternative element", as opposed to a matrix transpose, or the first derivative.) In light of (3.3), $\sigma_a^2$ is denoted the intra-class variance, and we will denote $\sigma_e^2$ as the error variance.

In order to express the model in matrix notation, we first stack the $Y_{ij}$ in "lexicon order", such that index $j$ changes the fastest, giving

$$\mathbf{Y} = (Y_{11}, Y_{12}, \ldots, Y_{1n},\, Y_{21}, Y_{22}, \ldots, Y_{2n},\, \ldots,\, Y_{A1}, Y_{A2}, \ldots, Y_{An})', \qquad (3.4)$$
and define vector $\mathbf{e}$ similarly. Then, with $\mathbf{a} = (a_1, \ldots, a_A)'$ and, similar to (2.22), using Kronecker product notation,

$$\mathbf{Y} = (\mathbf{1}_A \otimes \mathbf{1}_n)\mu + (\mathbf{I}_A \otimes \mathbf{1}_n)\mathbf{a} + \mathbf{e} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad (3.5)$$

where $\mathbf{X} = \mathbf{1}_{An}$, $\boldsymbol{\beta} = \mu$, and $\boldsymbol{\epsilon} = (\mathbf{I}_A \otimes \mathbf{1}_n)\mathbf{a} + \mathbf{e}$. We can thus express (3.1) and (3.5) as $\mathbf{Y} \sim \mathrm{N}_{An}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu} = \mathbb{E}[\mathbf{Y}] = \mathbf{X}\boldsymbol{\beta}$ and, with $\mathbf{J}_n$ an $n \times n$ matrix of ones,

$$\begin{aligned}
\boldsymbol{\Sigma} = \operatorname{Var}(\mathbf{Y}) = \operatorname{Var}(\boldsymbol{\epsilon}) &= (\mathbf{I}_A \otimes \mathbf{1}_n)\operatorname{Var}(\mathbf{a})(\mathbf{I}_A \otimes \mathbf{1}_n)' + \operatorname{Var}(\mathbf{e}) \\
&= (\mathbf{I}_A \otimes \mathbf{1}_n)\,\sigma_a^2 \mathbf{I}_A\,(\mathbf{I}_A \otimes \mathbf{1}_n)' + \sigma_e^2 \mathbf{I}_{An}
= \sigma_a^2(\mathbf{I}_A \otimes \mathbf{J}_n) + \sigma_e^2(\mathbf{I}_A \otimes \mathbf{I}_n) \\
&= \mathbf{I}_A \otimes (\sigma_a^2 \mathbf{J}_n + \sigma_e^2 \mathbf{I}_n). \qquad (3.6)
\end{aligned}$$
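The equivalence of the first and last expressions in (3.6) is easy to check numerically. The following is a minimal sketch (not part of the text's listings; the values of A, n, and the variance components are arbitrary illustrative choices) that verifies it with Kronecker products in Matlab:

A=3; n=4; sigma2a=0.4; sigma2e=0.8;                   % arbitrary illustrative values
Z=kron(eye(A),ones(n,1));                             % I_A kron 1_n
Sig1=Z*(sigma2a*eye(A))*Z' + sigma2e*eye(A*n);        % first line of (3.6)
Sig2=kron(eye(A), sigma2a*ones(n,n)+sigma2e*eye(n));  % last line of (3.6)
max(max(abs(Sig1-Sig2)))                              % zero up to rounding error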

Based on the representation $\mathbf{Y} \sim \mathrm{N}_{An}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and (3.6), it is straightforward to express the likelihood and numerically maximize it to obtain the m.l.e. The code for doing this is given in Listing 3.1. This exercise is beneficial for learning to "do things oneself" using basic principles, though (i) we will see below in Section 3.1.3 that there is a closed-form expression for the m.l.e., provided $\hat{\sigma}^2_{a,\mathrm{ML}}$ is positive, and (ii) for this model and, particularly, for more complicated models (such as ones with mixed fixed and random effects, unbalanced data, continuous covariates, etc.), one would typically use canned, reliable statistical software packages for the computations, as shown in Section 3.1.5.

Simulation is the easiest way of determining the small-sample performance of the m.l.e., and this was done for the constellation $A=20$, $n=15$, $\mu=5$, $\sigma_a^2=0.4$, and $\sigma_e^2=0.8$, using $S=10,000$ replications.

Code for one such replication is shown in Listing 3.2, from which the reader can generate code to perform the simulation. (The code also computes the elements in the sums of squares decomposition given in (3.8), which we will need for other point and interval estimators.)
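For example, a minimal wrapper along the following lines (a sketch, not the code actually used to produce the figures) repeats the data generation of Listing 3.2 S times, calls the m.l.e. routine of Listing 3.1, and stores the point estimates and the BFGS standard errors:

A=20; n=15; mu=5; sigma2a=0.4; sigma2e=0.8; S=10000;
muv=ones(A*n,1)*mu; Sigma=kron(eye(A), sigma2a*ones(n,n)+sigma2e*eye(n));
est=zeros(S,3); se=zeros(S,3);              % columns: mu, siga, sige
for s=1:S
  y=mvnrnd(muv,Sigma,1)';                   % one simulated data set
  [param, stderr] = REM1wayMLE(y,A,n);      % m.l.e. from Listing 3.1
  est(s,:)=param; se(s,:)=stderr;
end
mean([est(:,1) est(:,2).^2 est(:,3).^2])    % m.l.e. of mu, sigma_a^2, sigma_e^2
std(est), mean(se)                          % compare, as in the bottom panels of Figure 3.1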

The simulation results are shown in Figure 3.1. The top panels show histograms of the estimated parameters, with the vertical dashed lines indicating the true parameters. (The m.l.e. is computed for $\mu$, $\sigma_a$, and $\sigma_e$; recall the invariance property of the m.l.e., such that $\hat{\sigma}_a^2$ is just the square of $\hat{\sigma}_a$.)

function [param, stderr, loglik, iters, bfgsok] = REM1wayMLE(y,A,n)
% param = [mu siga sige], sig is sigma, not sigma^2
ylen=length(y); if A*n ~= ylen, error('A and/or n wrong'), end
y=reshape(y,ylen,1); lo=1e-3; hi=2*std(y);
bound.lo=   [-1 lo lo]'; % mu, siga, sige
bound.hi=   [ 1 hi hi]';
bound.which=[ 0  1  1]';
initvec=[mean(y) std(y)/2 std(y)/2]';
opts=optimset('Display','None','TolX',1e-6,'MaxIter',200,...
  'MaxFunEval',600,'LargeScale','off'); bfgsok=1;
try
  [pout,fval,~,theoutput,~,hess]= fminunc(@(param) ...
    REM1_(param,y,A,n,bound),einschrk(initvec,bound),opts);
catch %#ok<CTCH>
  disp('switching to use of simplex algorithm (fminsearch)')
  [pout,fval,~,theoutput]= fminsearch(@(param) ...
    REM1_(param,y,A,n,bound),einschrk(initvec,bound),opts);
  hess=eye(length(pout)); % just a place filler.
  bfgsok=0;
end
V=inv(hess); [param,V]=einschrk(pout,bound,V); param=param';
stderr=sqrt(diag(V))'; iters=theoutput.iterations; loglik=-fval;

function loglik=REM1_(param,y,A,n,bound)
if nargin<5, bound=0; end
if isstruct(bound), param=einschrk(real(param),bound,999); end
mu=param(1); siga=param(2); sige=param(3);
sigma2a=siga^2; sigma2e=sige^2;
muv=ones(A*n,1)*mu; J=ones(n,n); tmp=sigma2a*J+sigma2e*eye(n);
Sigma=kron(eye(A),tmp); loglik=-log(mvnpdf(y,muv,Sigma));

Program Listing 3.1: Maximum likelihood estimation of the three parameters of the one-way REM.

Function einschrk is given in Listing III.4.7. An arbitrary positive lower bound is necessarily placed on the variance components. It was found that, as this bound gets closer to zero, numeric issues associated with the gradient/Hessian-based optimization method using the so-called BFGS algorithm (after the authors Charles George Broyden, Roger Fletcher, Donald Goldfarb, and David Shanno; see Section III.4.3.1) in Matlab version 2010 sometimes occur. To resolve this, if it happens, the program switches to use of the simplex method for optimization, which appears to never fail, though, for the same requested accuracy, it requires far more function evaluations and thus takes longer. For the two constellations of parameters used in the simulations, and the imposed lower bound of 0.001 on $\sigma_a$ and $\sigma_e$, the BFGS method never failed.

We see that, for this constellation, the m.l.e. appears close to unbiased and normally distributed, certainly for the fixed effect $\mu$, but notably for $\sigma_e^2$ and reasonably so for $\sigma_a^2$.

The bottom panels show the histograms of the approximate standard errors (square roots of the variances) output from the BFGS algorithm (see, e.g., Section III.4.3 for details), with the vertical dashed lines being the best approximation of the truth: the sample standard error of the $S$ m.l.e. point estimates of $\mu$, $\hat{\sigma}_a$, and $\hat{\sigma}_e$, respectively. It thus appears that, for this constellation of parameters, inference on $\mu$, $\sigma_a^2$, and $\sigma_e^2$ can safely be made using the asymptotic normal distribution and the approximate standard errors output from use of the BFGS algorithm for computing the m.l.e. Another simple approximation to the standard errors is given in Section 3.1.3.

% desired parameters
A=20; n=15; mu=5; sigma2a=0.4; sigma2e=0.8;

% make Sigma matrix and generate a sample
muv=ones(A*n,1)*mu; J=ones(n,n); tmp=sigma2a*J+sigma2e*eye(n);
Sigma=kron(eye(A),tmp);
y=mvnrnd(muv,Sigma,1)'; % this is built into Matlab

% compute the various sums of squares
SST=sum(y'*y); Yddb=mean(y); SSu=A*n*Yddb^2; % Yddb is \bar{Y}_{dot dot}
H=kron(eye(A), ones(n,1)); Yidb=y'*H/n;      % Yidb is \bar{Y}_{i dot}
SSa=n*sum( (Yidb-Yddb).^2 ); m=kron(Yidb', ones(n,1)); SSe=sum( (y-m).^2 );
check=SST-(SSu+SSa+SSe) % is zero

% MLE by brute force maximization
[param, stderr, loglik, iters, bfgsok] = REM1wayMLE(y,A,n);
AME=[param(1), param(2)^2, param(3)^2]

% MLE using closed form expression
mu_hat_MLE = mean(y); sigma2e_hat_MLE = SSe/A/(n-1);
sigma2a_hat_MLE = ( SSa/A - SSe/A/(n-1) )/n;
MLE = [mu_hat_MLE sigma2a_hat_MLE sigma2e_hat_MLE]

Program Listing 3.2: Generates a one-way REM data set, computes the sums of squares decomposition given in (3.8), calls the m.l.e. program in Listing 3.1, and computes the closed-form m.l.e. given in (3.21).


Figure 3.2 is similar to Figure 3.1, but based on $A=7$ and $n=5$. While the empirical distribution of $\hat{\sigma}_e^2$ is still close to Gaussian and the estimator appears virtually unbiased, its variance has increased markedly due to the reduction of $n=15$ to $n=5$. Having reduced $A=20$ to $A=7$, not only has the variation of $\hat{\sigma}_a^2$ increased, but it is no longer Gaussian, so that Wald confidence intervals based on the estimated standard error will not be particularly accurate. Below, we will discuss other ways of generating confidence intervals for $\sigma_a^2$ that tend to be more accurate in such situations.

While that seems beneficial for small sample sizes, the resulting intervals, however accurate, will be frustratingly wide.

3.1.2 Distribution Theory and ANOVA Table

In this and subsequent REMs, we will begin with the trivial “telescoping” identity

$$Y_{ij} = \bar{Y}_{\bullet\bullet} + (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet}) + (Y_{ij} - \bar{Y}_{i\bullet}). \qquad (3.7)$$
By squaring each term and summing, the reader is encouraged to confirm that the sums of all cross terms vanish, so that, similar to (2.28),

$$\sum_{i=1}^{A}\sum_{j=1}^{n} Y_{ij}^2 \;=\; An\bar{Y}_{\bullet\bullet}^2 \;+\; n\sum_{i=1}^{A}(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2 \;+\; \sum_{i=1}^{A}\sum_{j=1}^{n}(Y_{ij} - \bar{Y}_{i\bullet})^2,$$
$$SS_T \;=\; SS_\mu \;+\; SS_a \;+\; SS_e, \qquad (3.8)$$


Figure 3.1 Top: Histograms of the m.l.e. of the three parameters, from left to right, $\mu$, $\sigma_a^2$, and $\sigma_e^2$, of the one-way REM, based on $A=20$, $n=15$, and $S=10,000$ replications. The vertical dashed line indicates the true value of the parameter in each graph. Bottom: Histograms of the approximate standard errors output from the BFGS algorithm, with the vertical dashed lines being the sample standard error of the $S$ m.l.e. point estimates of $\mu$, $\hat{\sigma}_a$, and $\hat{\sigma}_e$.


Figure 3.2 Same as Figure 3.1 but for $A=7$ and $n=5$.

where $SS_T$ denotes the total (uncorrected) sum of squares, $SS_\mu$ is the sum of squares for the mean, $SS_a$ is the sum of squares for effect A, and $SS_e$ is the error sum of squares. Thus, $SS_T$ is partitioned into the $SS$ of the model factors.

Theorem 3.1 Independence The three terms on the right-hand side (r.h.s.) of (3.7) are independent, in which case so are sums of their squares (or any functions of them), i.e., $SS_\mu$, $SS_a$, and $SS_e$ are independent.

Proof: Observe that each term on the r.h.s. of (3.7) is normally distributed, so that we only need to verify that the covariance between each of them is zero to establish their independence. The first term, $\bar{Y}_{\bullet\bullet}$, has mean $\mu$, while the other two have mean zero. We thus need to show that the expected product of each of the three pairs of terms is zero.

Before beginning, recall the notation from (2.3), and let
$$e_{i\bullet} = \sum_{j=1}^{n} e_{ij}, \qquad \bar{e}_{i\bullet} = \frac{e_{i\bullet}}{n}, \qquad e_{\bullet\bullet} = \sum_{i=1}^{A}\sum_{j=1}^{n} e_{ij}, \qquad \bar{e}_{\bullet\bullet} = \frac{e_{\bullet\bullet}}{An},$$
and similarly for $\bar{Y}_{i\bullet}$ and $\bar{Y}_{\bullet\bullet}$, so that
$$\bar{Y}_{i\bullet} = \frac{1}{n}\sum_{j=1}^{n} Y_{ij} = \frac{n\mu + n a_i + e_{i\bullet}}{n} = \mu + a_i + \bar{e}_{i\bullet}, \qquad (3.9)$$
and
$$\bar{Y}_{\bullet\bullet} = \frac{1}{An}\sum_{i=1}^{A}\sum_{j=1}^{n} Y_{ij} = \frac{An\mu + n a_{\bullet} + e_{\bullet\bullet}}{An} = \mu + \bar{a}_{\bullet} + \bar{e}_{\bullet\bullet}.$$
Then, for the first pair,
$$\begin{aligned}
\mathbb{E}[\bar{Y}_{\bullet\bullet}(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})] &= \mathbb{E}[(\bar{a}_{\bullet} + \bar{e}_{\bullet\bullet})(a_i - \bar{a}_{\bullet} + \bar{e}_{i\bullet} - \bar{e}_{\bullet\bullet})] \\
&= \mathbb{E}[\bar{a}_{\bullet}(a_i - \bar{a}_{\bullet})] + \mathbb{E}[\bar{e}_{\bullet\bullet}(\bar{e}_{i\bullet} - \bar{e}_{\bullet\bullet})] \\
&= \mathbb{E}[\bar{a}_{\bullet} a_i] - \mathbb{E}[\bar{a}_{\bullet}^2] + \mathbb{E}[\bar{e}_{\bullet\bullet}\bar{e}_{i\bullet}] - \mathbb{E}[\bar{e}_{\bullet\bullet}^2]
= \frac{\sigma_a^2}{A} - \frac{\sigma_a^2}{A} + \frac{\sigma_e^2}{An} - \frac{\sigma_e^2}{An} = 0,
\end{aligned}$$
as in Graybill (1976, p. 610). Likewise,
$$\mathbb{E}[\bar{Y}_{\bullet\bullet}(Y_{ij} - \bar{Y}_{i\bullet})] = \mathbb{E}[(\bar{a}_{\bullet} + \bar{e}_{\bullet\bullet})(a_i - a_i + e_{ij} - \bar{e}_{i\bullet})]
= \mathbb{E}[\bar{e}_{\bullet\bullet} e_{ij}] - \mathbb{E}[\bar{e}_{\bullet\bullet}\bar{e}_{i\bullet}] = \frac{\sigma_e^2}{An} - \frac{n\sigma_e^2}{An^2} = 0,$$
and
$$\begin{aligned}
\mathbb{E}[(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})(Y_{ij} - \bar{Y}_{i\bullet})] &= \mathbb{E}[(a_i - \bar{a}_{\bullet} + \bar{e}_{i\bullet} - \bar{e}_{\bullet\bullet})(e_{ij} - \bar{e}_{i\bullet})] \\
&= \mathbb{E}[\bar{e}_{i\bullet} e_{ij}] - \mathbb{E}[\bar{e}_{i\bullet}^2] - \mathbb{E}[\bar{e}_{\bullet\bullet} e_{ij}] + \mathbb{E}[\bar{e}_{\bullet\bullet}\bar{e}_{i\bullet}]
= \frac{\sigma_e^2}{n} - \frac{\sigma_e^2}{n} - \frac{\sigma_e^2}{An} + \frac{n\sigma_e^2}{An^2} = 0,
\end{aligned}$$

confirming the result. ◾

Theorem 3.2 Distribution
$$\frac{SS_\mu}{\gamma_a} \sim \chi_1^2\!\left(\frac{An\mu^2}{\gamma_a}\right), \qquad \frac{SS_a}{\gamma_a} \sim \chi_{A-1}^2, \qquad \frac{SS_e}{\sigma_e^2} \sim \chi_{A(n-1)}^2, \qquad (3.10)$$
where $\gamma_a := n\sigma_a^2 + \sigma_e^2$.

Proof: It is not hard to show this directly (see, e.g., Graybill, 1976, p. 609), but a simple transformation can speed things up and is of great use when working with higher-order models. As in Stuart et al. (1999, p. 676), we define $H_i := a_i + \bar{e}_{i\bullet}$ and then verify that
$$Y_{ij} = \bar{Y}_{\bullet\bullet} + (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet}) + (Y_{ij} - \bar{Y}_{i\bullet})
= (\mu + \bar{H}_{\bullet}) + (H_i - \bar{H}_{\bullet}) + (e_{ij} - \bar{e}_{i\bullet})
= \mu + a_i + e_{ij}. \qquad (3.11)$$
Next, note that $\bar{H}_{\bullet} = \bar{a}_{\bullet} + \bar{e}_{\bullet\bullet}$ and $H_i \overset{\text{i.i.d.}}{\sim} \mathrm{N}(0, \sigma_a^2 + \sigma_e^2/n)$. Starting from the top right of (3.11), write
$$Y_{ij} - \bar{Y}_{i\bullet} = (\mu + a_i + e_{ij}) - (\mu + a_i + \bar{e}_{i\bullet}) = e_{ij} - \bar{e}_{i\bullet},$$
and, similarly,
$$\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet} = (\mu + a_i + \bar{e}_{i\bullet}) - (\mu + \bar{a}_{\bullet} + \bar{e}_{\bullet\bullet}) = H_i - \bar{H}_{\bullet},$$
and $\bar{Y}_{\bullet\bullet} = \mu + \bar{a}_{\bullet} + \bar{e}_{\bullet\bullet} = \mu + \bar{H}_{\bullet}$. Thus, for a given $i$, $\sigma_e^{-2}\sum_{j=1}^{n}(e_{ij} - \bar{e}_{i\bullet})^2 \sim \chi_{n-1}^2$ and
$$\sigma_e^{-2} SS_e = \sigma_e^{-2}\sum_{i=1}^{A}\sum_{j=1}^{n}(e_{ij} - \bar{e}_{i\bullet})^2 \sim \chi_{A(n-1)}^2, \qquad (3.12)$$
from the independence of the $e_{ij}$ and the summability of independent chi-square random variables.
Similarly, $\sigma_{H_i}^{-2}\sum_{i=1}^{A}(H_i - \bar{H}_{\bullet})^2 \sim \chi_{A-1}^2$. With $\gamma_a := n\sigma_a^2 + \sigma_e^2$,
$$\frac{SS_a}{\gamma_a} = \frac{n\sum_{i=1}^{A}(H_i - \bar{H}_{\bullet})^2}{n\sigma_a^2 + \sigma_e^2} = \sigma_{H_i}^{-2}\sum_{i=1}^{A}(H_i - \bar{H}_{\bullet})^2 \sim \chi_{A-1}^2. \qquad (3.13)$$
Finally, $(\mu + \bar{H}_{\bullet}) \sim \mathrm{N}(\mu,\, \sigma_a^2/A + \sigma_e^2/(An))$ implies $\sqrt{An}\,(\mu + \bar{H}_{\bullet}) \sim \mathrm{N}(\sqrt{An}\,\mu,\, \gamma_a)$, so that $\sqrt{An/\gamma_a}\,(\mu + \bar{H}_{\bullet}) \sim \mathrm{N}(\sqrt{An/\gamma_a}\,\mu,\, 1)$, which, in turn, implies
$$\frac{An}{\gamma_a}(\mu + \bar{H}_{\bullet})^2 \sim \chi_1^2\!\left(\frac{An\mu^2}{\gamma_a}\right)
\quad\text{or}\quad
\frac{SS_\mu}{\gamma_a} \sim \chi_1^2\!\left(\frac{An\mu^2}{\gamma_a}\right),$$
completing the proof. ◾

Remark Theorem 3.1 showed that $\bar{Y}_{\bullet\bullet}$, $\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet}$, and $Y_{ij} - \bar{Y}_{i\bullet}$ are independent for all $i$ and $j$, from which it follows that sums of their squares (or any functions of them) are independent, so that $SS_\mu$, $SS_a$, and $SS_e$ are independent. We can also see this in the following way.

For a given $i$, we know from the independence property of $\bar{X}$ and $S_X^2$ for normal samples (see, e.g., Section II.3.7) that $\bar{e}_{i\bullet} \perp \sum_{j}(e_{ij} - \bar{e}_{i\bullet})^2$. This is the case for any $i$, i.e., also $\bar{e}_{i'\bullet} \perp \sum_{j}(e_{ij} - \bar{e}_{i\bullet})^2$, so that, from (3.12), $\bar{e}_{i\bullet} \perp SS_e$. As $SS_a$ is a function only of $H_i = a_i + \bar{e}_{i\bullet}$, and $SS_e$ is not a function of $a_i$, we have $SS_e \perp SS_a$ (recalling $a_i \perp \bar{e}_{i\bullet}$). The same applies to $SS_\mu$, being a function of $\bar{H}_{\bullet}$ and a fixed value $\mu$, i.e., $SS_e \perp SS_\mu$. Finally, as the $H_i$ are also normally distributed, $\bar{H}_{\bullet} \perp SS_a$ and, as $SS_\mu$ is a function of $\bar{H}_{\bullet}$ and a fixed value $\mu$, $SS_\mu \perp SS_a$. ◾

Table 3.1 ANOVA table for the balanced one-factor REM. The second column is specific to our model notation (3.1), and is not necessary, but is shown for further clarity.

| Source | Terms | df | SS | EMS |
|--------|-------|-----|----|-----|
| Mean | $\mu$ | $1$ | $An\bar{Y}_{\bullet\bullet}^2$ | $\sigma_e^2 + n\sigma_a^2 + An\mu^2$ |
| A | $\{a_i\}$ | $A-1$ | $n\sum_{i=1}^{A}(\bar{Y}_{i\bullet}-\bar{Y}_{\bullet\bullet})^2$ | $\sigma_e^2 + n\sigma_a^2$ |
| Error | $\{e_{ij}\}$ | $A(n-1)$ | $\sum_{i=1}^{A}\sum_{j=1}^{n}(Y_{ij}-\bar{Y}_{i\bullet})^2$ | $\sigma_e^2$ |
| Total | $\{Y_{ij}\}$ | $An$ | $\sum_{i=1}^{A}\sum_{j=1}^{n}Y_{ij}^2$ |  |


Dividing each $SS$ term by its degrees of freedom and taking expected values yields the expected mean squares, or $EMS$. This gives, with $\gamma_a := n\sigma_a^2 + \sigma_e^2$,
$$\mathbb{E}[MS_\mu] = \mathbb{E}[SS_\mu] = \gamma_a\,\mathbb{E}\!\left[\chi_1^2\!\left(\frac{An\mu^2}{\gamma_a}\right)\right] = \gamma_a\left(1 + \frac{An\mu^2}{\gamma_a}\right) = \gamma_a + An\mu^2,
\qquad
\mathbb{E}[MS_a] = \mathbb{E}\!\left[\frac{SS_a}{A-1}\right] = \frac{\gamma_a}{A-1}\,\mathbb{E}[\chi_{A-1}^2] = \gamma_a, \qquad (3.14)$$
and
$$\mathbb{E}[MS_e] = \mathbb{E}\!\left[\frac{SS_e}{A(n-1)}\right] = \frac{\sigma_e^2}{A(n-1)}\,\mathbb{E}[\chi_{A(n-1)}^2] = \sigma_e^2, \qquad (3.15)$$
recalling that $\mathbb{E}[\chi_\delta^2(\nu)] = \delta + \nu$. These are summarized in the ANOVA table (Table 3.1).
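As a quick numerical check (a sketch assuming the variables SSu, SSa, SSe and the true parameter values from Listing 3.2 are in the workspace), the observed mean squares can be set against the EMS column of Table 3.1 evaluated at the true parameters:

MSmu=SSu; MSa=SSa/(A-1); MSe=SSe/(A*(n-1));    % observed mean squares
gama=n*sigma2a+sigma2e;                        % gamma_a = n*sigma_a^2 + sigma_e^2
[MSmu MSa MSe; gama+A*n*mu^2 gama sigma2e]     % row 1: observed MS; row 2: EMS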

Recall the discussion of sufficiency and completeness in, e.g., Chapter III.7.

Theorem 3.3 Complete, Minimal Sufficient Statistics The set of complete, minimally sufficient statistics for $\mu$, $\sigma_a^2$, and $\sigma_e^2$ is given by $SS_\mu$, $SS_a$, and $SS_e$.

Proof: Sufficiency follows by expressing $\mathbf{Y} \sim \mathrm{N}_{An}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma}$ given in (3.6) as
$$f_{\mathbf{Y}}(\mathbf{y}; \mu, \sigma_a^2, \sigma_e^2) = \frac{\exp\left\{-\dfrac{1}{2}\left[\dfrac{SS_e}{\sigma_e^2} + \dfrac{SS_a}{\gamma_a} + \dfrac{An}{\gamma_a}(\bar{Y}_{\bullet\bullet} - \mu)^2\right]\right\}}{(2\pi)^{An/2}\,(\sigma_e^2)^{A(n-1)/2}\,\gamma_a^{A/2}}, \qquad (3.16)$$
(where $\gamma_a := n\sigma_a^2 + \sigma_e^2$), which the reader is encouraged to verify. The details are provided in, e.g., Searle et al. (1992, Sec. 3.7) and Sahai and Ojeda (2004, p. 26–27). For minimal sufficiency and completeness, see, e.g., Graybill (1976). ◾

The reader can confirm (3.16) by using it for the calculation of the log-likelihood in the program in Listing 3.1. From this result, and recalling that the m.l.e. is a function of the sufficient statistics (see, e.g., Section III.7.1.2), one might expect that the m.l.e. can be algebraically expressed in terms of $SS_\mu$, $SS_a$, and $SS_e$, which is indeed the case, as given below.
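One way to do so is to replace the brute-force construction of the $An \times An$ covariance matrix in the nested function REM1_ of Listing 3.1 by a direct evaluation of (3.16). The following sketch (a hypothetical alternative, using the same parameter vector [mu siga sige] as in Listing 3.1) computes the same negative log-likelihood from the sums of squares alone:

function negll=REM1SSloglik(param,y,A,n)
% negative log-likelihood of the one-way REM via the factored density (3.16)
% y is the An x 1 data vector in lexicon order, as in Listing 3.2
mu=param(1); sigma2a=param(2)^2; sigma2e=param(3)^2;
gama=n*sigma2a+sigma2e;                          % gamma_a
Yidb=mean(reshape(y,n,A),1); Yddb=mean(y);       % class means and grand mean
SSe=sum((y-kron(Yidb',ones(n,1))).^2);
SSa=n*sum((Yidb-Yddb).^2);
ll = -0.5*( SSe/sigma2e + SSa/gama + A*n*(Yddb-mu)^2/gama ) ...
     - (A*n/2)*log(2*pi) - (A*(n-1)/2)*log(sigma2e) - (A/2)*log(gama);
negll=-ll;                                       % negate for use with fminunc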

3.1.3 Point Estimation, Interval Estimation, and Significance Testing

Observe that, from the values of $EMS$ in Table 3.1, comparing the magnitudes of $MS_a$ and $MS_e$ appears pertinent for assessing if $\sigma_a^2 > 0$. From the independence of the $SS$, the distribution of their ratio is tractable, and leads to
$$\frac{SS_a/[\gamma_a(A-1)]}{SS_e/[\sigma_e^2 A(n-1)]} \sim \mathrm{F}_{(A-1),\,A(n-1)}
\quad\text{or}\quad
F_a := \frac{MS_a}{MS_e} \sim \frac{\gamma_a}{\sigma_e^2}\,\mathrm{F}_{A-1,\,A(n-1)}, \qquad (3.17)$$
a scaled central F distribution, where, again, $\gamma_a := n\sigma_a^2 + \sigma_e^2$. If $\sigma_a^2 = 0$, then $\gamma_a = \sigma_e^2$ (and $\gamma_a/\sigma_e^2 = 1$), so that an $\alpha$-level hypothesis test for $\sigma_a^2 = 0$ versus $\sigma_a^2 > 0$ rejects if $F_a > \mathrm{F}^{\alpha}_{A-1,\,A(n-1)}$, where $\mathrm{F}^{\alpha}_{n,d}$ is the $100(1-\alpha)$th percent quantile of the $\mathrm{F}_{n,d}$ distribution.
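As a small illustration of the test, and assuming the sums of squares SSa and SSe from Listing 3.2 are in the workspace (finv and fcdf are from Matlab's Statistics Toolbox), a sketch is:

alpha=0.05;
Fa=(SSa/(A-1))/(SSe/(A*(n-1)));           % Fa = MSa/MSe as in (3.17)
Fcrit=finv(1-alpha,A-1,A*(n-1));          % 100(1-alpha)% quantile of F
pvalue=1-fcdf(Fa,A-1,A*(n-1));
reject=(Fa>Fcrit)                         % reject H0: sigma_a^2 = 0 at level alpha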

There are several useful point estimators of $\sigma_e^2$ and $\sigma_a^2$, including the method of maximum likelihood, as shown in Section 3.1.1. Others include the "ANOVA method" (see below), restricted m.l.e. (denoted REML, the most recommended in practice and the default of software such as SAS), and Bayesian methods. Discussions and comparisons of these methods can be found in, e.g., Searle et al. (1992), Miller Jr. (1997), Sahai and Ojeda (2004), and Christensen (2011).

We demonstrate the easiest of these, which is also referred to as the "ANOVA method of estimation" (Searle et al., 1992, p. 59) and amounts to equating observed and expected sums of squares. From (3.15) and (3.14), $\mathbb{E}[MS_e] = \sigma_e^2$ and $\mathbb{E}[MS_a] = n\sigma_a^2 + \sigma_e^2$, so that
$$\hat{\sigma}_e^2 = MS_e = \frac{SS_e}{A(n-1)}, \quad\text{and}\quad
\hat{\sigma}_a^2 = \frac{1}{n}(MS_a - MS_e) = \frac{1}{n}\left(\frac{SS_a}{A-1} - \frac{SS_e}{A(n-1)}\right) \qquad (3.18)$$
yield unbiased estimators. Observe, however, that $\hat{\sigma}_a^2$ in (3.18) can be negative. That (3.18) is not the m.l.e. is then intuitively obvious because the likelihood is not defined for non-positive $\sigma_a^2$. We will see below in (3.21) that $\hat{\sigma}_e^2$ is indeed the m.l.e., and $\hat{\sigma}_a^2$ is nearly so. To calculate the probability that $\hat{\sigma}_a^2 < 0$, use (3.17) to obtain
$$\Pr(\hat{\sigma}_a^2 < 0) = \Pr(MS_a < MS_e) = \Pr\!\left(\frac{MS_a}{MS_e} < 1\right)
= \Pr\!\left(\mathrm{F}_{A-1,\,A(n-1)} < \frac{\sigma_e^2}{n\sigma_a^2 + \sigma_e^2}\right).$$
Searle et al. (1992, p. 66–69) and Lee and Khuri (2001) provide a detailed discussion of how the sample sizes and true values of the variance components influence $\Pr(\hat{\sigma}_a^2 < 0)$. In practice, in the case where $\hat{\sigma}_a^2 < 0$, one typically reports that $\sigma_a^2 = 0$, though formally the estimator $\max(0, \hat{\sigma}_a^2)$ is biased, an annoying fact for hardcore frequentists. Realistically, it serves as an indication that the model might be mis-specified, or a larger sample is required.
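For instance, for the two parameter constellations used in the simulations above, this probability is directly computable (a sketch using fcdf from the Statistics Toolbox):

sigma2a=0.4; sigma2e=0.8;
A=20; n=15; p1=fcdf(sigma2e/(n*sigma2a+sigma2e), A-1, A*(n-1));
A=7;  n=5;  p2=fcdf(sigma2e/(n*sigma2a+sigma2e), A-1, A*(n-1));
[p1 p2]   % Pr(sigma2a_hat < 0) for (A,n)=(20,15) and (A,n)=(7,5)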

Note that, from Theorem 3.2 and (3.18),
$$\frac{SS_e}{\sigma_e^2} \sim \chi_{A(n-1)}^2 \quad\text{and}\quad \hat{\sigma}_e^2 = MS_e = \frac{SS_e}{A(n-1)},$$
from which it follows that
$$\operatorname{Var}(\hat{\sigma}_e^2) = \frac{\operatorname{Var}(SS_e)}{A^2(n-1)^2} = \frac{1}{A^2(n-1)^2}\operatorname{Var}\!\left(\sigma_e^2\,\frac{SS_e}{\sigma_e^2}\right) = \frac{2A(n-1)}{A^2(n-1)^2}\,\sigma_e^4 = \frac{2\sigma_e^4}{A(n-1)}. \qquad (3.19)$$
Similarly, with $\gamma_a = n\sigma_a^2 + \sigma_e^2$, as
$$\frac{SS_a}{\gamma_a} \sim \chi_{A-1}^2 \quad\text{and}\quad \hat{\sigma}_a^2 = \frac{MS_a - MS_e}{n} = \frac{1}{n}\left(\frac{SS_a}{A-1} - \frac{SS_e}{A(n-1)}\right),$$
and the independence of $SS_a$ and $SS_e$, we have
$$\begin{aligned}
\operatorname{Var}(\hat{\sigma}_a^2) &= \frac{1}{n^2}\left[\frac{\operatorname{Var}(SS_a)}{(A-1)^2} + \frac{\operatorname{Var}(SS_e)}{A^2(n-1)^2}\right]
= \frac{1}{n^2}\left[\frac{\gamma_a^2\operatorname{Var}(SS_a/\gamma_a)}{(A-1)^2} + \frac{\sigma_e^4\operatorname{Var}(SS_e/\sigma_e^2)}{A^2(n-1)^2}\right] \\
&= \frac{1}{n^2}\left[\frac{2\gamma_a^2(A-1)}{(A-1)^2} + \frac{2\sigma_e^4 A(n-1)}{A^2(n-1)^2}\right]
= \frac{2}{n^2}\left[\frac{(n\sigma_a^2 + \sigma_e^2)^2}{A-1} + \frac{\sigma_e^4}{A(n-1)}\right]. \qquad (3.20)
\end{aligned}$$

Replacing $\sigma_a^2$ and $\sigma_e^2$ by their point estimates and taking square roots, these expressions yield approximations to the standard error of $\hat{\sigma}_e^2$ and $\hat{\sigma}_a^2$, respectively (as were given in Scheffé, 1959, p. 228; see also Searle et al., 1992, p. 85), and can be used to form Wald confidence intervals for the parameters. These could be compared to the numerically obtained standard errors based on maximum likelihood estimation. The reader is invited to show that $\operatorname{Cov}(\hat{\sigma}_a^2, \hat{\sigma}_e^2) = -2\sigma_e^4/(An(n-1))$.
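A sketch of these plug-in standard errors and the resulting Wald intervals, using the ANOVA-method estimates (3.18) and assuming SSa, SSe, A, n from Listing 3.2 (norminv from the Statistics Toolbox):

alpha=0.05; z=norminv(1-alpha/2);
s2e=SSe/(A*(n-1)); s2a=(SSa/(A-1)-s2e)/n;                     % estimates (3.18)
se_s2e=sqrt(2*s2e^2/(A*(n-1)));                               % plug-in (3.19)
se_s2a=sqrt( (2/n^2)*( (n*s2a+s2e)^2/(A-1) + s2e^2/(A*(n-1)) ) );  % plug-in (3.20)
Wald_s2e=[s2e-z*se_s2e, s2e+z*se_s2e]
Wald_s2a=[s2a-z*se_s2a, s2a+z*se_s2a]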

By equating the partial derivatives of the log-likelihood $\ell(\mu, \sigma_a^2, \sigma_e^2; \mathbf{y}) = \log f_{\mathbf{Y}}(\mathbf{y}; \mu, \sigma_a^2, \sigma_e^2)$ given in (3.16) to zero and solving, one obtains (see, e.g., Searle et al., 1992, p. 80; or Sahai and Ojeda, 2004, p. 35–36)
$$\hat{\mu}_{\mathrm{ML}} = \bar{Y}_{\bullet\bullet}, \qquad \hat{\sigma}^2_{e,\mathrm{ML}} = \frac{SS_e}{A(n-1)} = MS_e, \qquad \hat{\sigma}^2_{a,\mathrm{ML}} = \frac{1}{n}\left(\frac{SS_a}{A} - \frac{SS_e}{A(n-1)}\right), \qquad (3.21)$$
provided $\hat{\sigma}^2_{a,\mathrm{ML}} > 0$. The reader is encouraged to numerically confirm this, which is very easy, using the codes in Listings 3.1 and 3.2.

Comparing (3.21) to (3.18), we see that the ANOVA method and the m.l.e. agree for $\hat{\sigma}_e^2$, and are nearly identical for $\hat{\sigma}_a^2$. The divisor of $A$ in $\hat{\sigma}^2_{a,\mathrm{ML}}$ instead of $A-1$ from the ANOVA method implies a shrinkage towards zero. Recall that, in the i.i.d. setting, for the estimators of the variance $\sigma^2$, the m.l.e. has a divisor of the sample size $n$, while the unbiased version uses $n-1$, and that the m.l.e. has a lower mean squared error. This also holds in the one-way REM setting here, i.e., $\operatorname{mse}(\hat{\sigma}^2_{a,\mathrm{ML}}) < \operatorname{mse}(\hat{\sigma}_a^2)$; see, e.g., Sahai and Ojeda (2004, Sec. 2.7) and the references therein.

We now turn to confidence intervals. Besides the Wald intervals, further interval estimators for the variance components (and various functions of them) are available. Recall (from, e.g., Chapter III.8) that a pivotal quantity, or pivot, is a function of the data and one or more (fixed but unknown model) parameters, but such that its distribution does not depend on any unknown model parameters. From (3.10),
$$Q(\mathbf{Y}, \sigma_e^2) = \frac{SS_e}{\sigma_e^2} \sim \chi_{A(n-1)}^2$$
is a pivot, so that a $100(1-\alpha)\%$ confidence interval (c.i.) for the error variance $\sigma_e^2$ is given by
$$\Pr\!\left(l \leqslant \frac{SS_e}{\sigma_e^2} \leqslant u\right) = \Pr\!\left(\frac{SS_e}{u} \leqslant \sigma_e^2 \leqslant \frac{SS_e}{l}\right), \qquad (3.22)$$
where $\Pr(l \leqslant \chi_{A(n-1)}^2 \leqslant u) = 1-\alpha$ and $\alpha$ is a chosen tail probability, typically 0.05.

Likewise, from (3.17) with $F_a = MS_a/MS_e$, $(\sigma_e^2/(n\sigma_a^2 + \sigma_e^2))\,F_a \sim \mathrm{F}_{A-1,\,A(n-1)}$, so that
$$1-\alpha = \Pr\!\left(L \leqslant \frac{F_a\,\sigma_e^2}{n\sigma_a^2 + \sigma_e^2} \leqslant U\right)
= \Pr\!\left(\frac{F_a}{U} \leqslant 1 + \frac{n\sigma_a^2}{\sigma_e^2} \leqslant \frac{F_a}{L}\right)
= \Pr\!\left(\frac{F_a/U - 1}{n} \leqslant \frac{\sigma_a^2}{\sigma_e^2} \leqslant \frac{F_a/L - 1}{n}\right),$$
where $L$ and $U$ are given by $\Pr(L \leqslant \mathrm{F}_{A-1,\,A(n-1)} \leqslant U) = 1-\alpha$.

Of particular interest is a confidence interval for the intraclass correlation coefficient, given by $\sigma_a^2/(\sigma_a^2 + \sigma_e^2)$. Taking reciprocals in the c.i. for $\sigma_a^2/\sigma_e^2$ gives
$$\Pr\!\left(\frac{n}{F_a/L - 1} \leqslant \frac{\sigma_e^2}{\sigma_a^2} \leqslant \frac{n}{F_a/U - 1}\right)
= \Pr\!\left(1 + \frac{n}{F_a/L - 1} \leqslant \frac{\sigma_a^2 + \sigma_e^2}{\sigma_a^2} \leqslant 1 + \frac{n}{F_a/U - 1}\right)$$
$$= \Pr\!\left(\frac{1}{1 + \dfrac{n}{F_a/U - 1}} \leqslant \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2} \leqslant \frac{1}{1 + \dfrac{n}{F_a/L - 1}}\right) = 1-\alpha,$$
or
$$1-\alpha = \Pr\!\left(\frac{F_a/U - 1}{F_a/U - 1 + n} \leqslant \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2} \leqslant \frac{F_a/L - 1}{F_a/L - 1 + n}\right), \qquad (3.23)$$
where $F_a = MS_a/MS_e$ and $L$ and $U$ are given by $\Pr(L \leqslant \mathrm{F}_{A-1,\,A(n-1)} \leqslant U) = 1-\alpha$.
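A sketch of (3.23) with the equal-tailed quantiles (again assuming SSa, SSe, A, n from Listing 3.2 and finv from the Statistics Toolbox); note that the lower limit can be negative, in which case it is usually truncated at zero:

alpha=0.05; dfa=A-1; dfe=A*(n-1);
Fa=(SSa/dfa)/(SSe/dfe);                              % Fa = MSa/MSe
L=finv(alpha/2,dfa,dfe); U=finv(1-alpha/2,dfa,dfe);
ci_icc=[(Fa/U-1)/(Fa/U-1+n), (Fa/L-1)/(Fa/L-1+n)]    % interval (3.23)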

It turns out that a pivot and, thus, an exact confidence interval for the intra-class covariance $\sigma_a^2$ is not available. One obvious approximation is to replace $\sigma_e^2$ with $\hat{\sigma}_e^2$ in the c.i. for $\sigma_a^2/\sigma_e^2$ to get
$$1-\alpha \approx \Pr\!\left(\frac{\hat{\sigma}_e^2(F_a/U - 1)}{n} \leqslant \sigma_a^2 \leqslant \frac{\hat{\sigma}_e^2(F_a/L - 1)}{n}\right), \qquad (3.24)$$
which (perhaps obviously) performs well if $An$ is large (Stapleton, 1995, p. 286), in which case $\hat{\sigma}_e^2 \to \sigma_e^2$. We saw in Section 3.1.1 that, when $A$ is large, the Wald c.i. based on the m.l.e. will also be accurate. A more popular approximation than (3.24), due to Williams (1962), is

$$1-2\alpha \approx \Pr\!\left(\frac{SS_a(1 - U/F_a)}{n\,u^*} \leqslant \sigma_a^2 \leqslant \frac{SS_a(1 - L/F_a)}{n\,l^*}\right), \qquad (3.25)$$
where $u^*$ and $l^*$ are such that $\Pr(l^* \leqslant \chi_{A-1}^2 \leqslant u^*) = 1-\alpha$. See also Graybill (1976, p. 618–620) for a derivation.
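Both approximations are straightforward to compute; a sketch (same workspace assumptions as in the previous snippets) follows. Note that, as stated in the text, with these quantile choices the coverage of (3.25) is approximately $1-2\alpha$:

alpha=0.05; dfa=A-1; dfe=A*(n-1);
MSe=SSe/dfe; Fa=(SSa/dfa)/MSe; s2e_hat=MSe;
L=finv(alpha/2,dfa,dfe);  U=finv(1-alpha/2,dfa,dfe);      % F quantiles
ls=chi2inv(alpha/2,dfa); us=chi2inv(1-alpha/2,dfa);       % chi-square quantiles
ci_approx  =[s2e_hat*(Fa/U-1)/n, s2e_hat*(Fa/L-1)/n]      % interval (3.24)
ci_williams=[SSa*(1-U/Fa)/(n*us), SSa*(1-L/Fa)/(n*ls)]    % interval (3.25)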

The reader is encouraged to compare the empirical coverage probabilities of these intervals to those of their asymptotically valid Wald counterparts from use of the m.l.e., recalling that, for a function $\boldsymbol{\tau}(\boldsymbol{\theta}) = (\tau_1(\boldsymbol{\theta}), \ldots, \tau_m(\boldsymbol{\theta}))'$ from $\mathbb{R}^k \to \mathbb{R}^m$,
$$\boldsymbol{\tau}(\hat{\boldsymbol{\theta}}_{\mathrm{ML}}) \overset{\mathrm{asy}}{\sim} \mathrm{N}\big(\boldsymbol{\tau}(\boldsymbol{\theta}),\; \dot{\boldsymbol{\tau}}\,\mathbf{J}^{-1}\dot{\boldsymbol{\tau}}'\big), \qquad (3.26)$$
where $\dot{\boldsymbol{\tau}} = \dot{\boldsymbol{\tau}}(\boldsymbol{\theta})$ denotes the matrix with $(i,j)$th element $\partial\tau_i(\boldsymbol{\theta})/\partial\theta_j$ (see, e.g., Section III.3.1.4). In this case, the c.i. is formed using an asymptotic pivot.
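As one concrete example of such a delta-method interval, consider the intraclass correlation $\tau = \sigma_a^2/(\sigma_a^2+\sigma_e^2)$. The following sketch is illustrative only: it uses the ANOVA-method estimates together with the variance and covariance expressions (3.19), (3.20), and $\operatorname{Cov}(\hat{\sigma}_a^2, \hat{\sigma}_e^2) = -2\sigma_e^4/(An(n-1))$ in place of the m.l.e. and its Hessian, with SSa, SSe, A, n assumed from Listing 3.2 and norminv from the Statistics Toolbox:

s2e=SSe/(A*(n-1)); s2a=(SSa/(A-1)-s2e)/n;              % ANOVA-method estimates (3.18)
Vaa=(2/n^2)*( (n*s2a+s2e)^2/(A-1) + s2e^2/(A*(n-1)) ); % plug-in Var(s2a), (3.20)
Vee=2*s2e^2/(A*(n-1));                                 % plug-in Var(s2e), (3.19)
Vae=-2*s2e^2/(A*n*(n-1));                              % plug-in Cov(s2a,s2e)
V=[Vaa Vae; Vae Vee];
tau=s2a/(s2a+s2e);                                     % estimated intraclass correlation
grad=[s2e, -s2a]/(s2a+s2e)^2;                          % dtau/d(sigma_a^2, sigma_e^2)
se_tau=sqrt(grad*V*grad');
z=norminv(0.975); ci_tau=[tau-z*se_tau, tau+z*se_tau]  % approximate 95% Wald c.i.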

The test for $\sigma_a^2 > 0$ is rather robust against leptokurtic or asymmetric alternatives, while the c.i.s for the variance components and their ratios are, unfortunately, quite sensitive to departures from normality. Miller Jr. (1997, p. 105–107) gives a discussion of the effects of non-normality on some of the hypothesis tests and confidence intervals.

3.1.4 Satterthwaite’s Method

We have seen three ways of generating a c.i. for $\sigma_a^2$, namely via the generally applicable and asymptotically valid Wald interval based on the m.l.e. and its approximate standard error (resulting from either the approximate Hessian matrix output from the BFGS algorithm or use of (3.19) and (3.20)), and use of (3.24) and (3.25). A further approximate method makes use of a result due to Satterthwaite (1946).
