Estimation of Intra-class Correlation Parameter for Correlated Binary Data in Common Correlated Models
Zhang Hao (B.Sc., Peking University)
A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY
NATIONAL UNIVERSITY OF SINGAPORE
2005
For the completion of this thesis, I would like very much to express my heartfelt gratitude to my supervisor, Associate Professor Yougan Wang, for all his invaluable advice and guidance, endless patience, kindness and encouragement during the past two years. I have learned many things from him regarding academic research and character building.
I also wish to express my sincere gratitude and appreciation to my other lecturers, namely Professors Zhidong Bai, Zehua Chen and Loh Wei Liem, among others, for imparting knowledge and techniques to me and for their precious advice and help in my study.
It is a great pleasure to record my thanks to my dear friends: to Ms Zhu Min, Mr Zhao Yudong, Mr Ng Wee Teck and Mr Li Jianwei for their advice and help in my study; to Mr and Mrs Rong, Mr and Mrs Guan, Mr and Mrs Xiao, Ms Zou Huixiao, Ms Peng Qiao and Ms Qin Xuan for their kind help and warm encouragement in my life during the past two years.
Finally, I would like to attribute the completion of this thesis to other members and staff of the department for their help in various ways and for providing such a pleasant working environment, especially to Jerrica Chua for administrative matters and Mrs. Yvonne Chow for advice in computing.
Zhang Hao
July, 2005
1 Introduction
1.1 Common Correlated Model
1.2 Two Specifications of the Common Correlated Model
1.2.1 Beta-Binomial Model
1.2.2 Generalized Binomial Model
1.3 Application Areas
1.3.1 Teratology Study
1.3.2 Other Uses
1.4 The Review of the Past Work
1.5 The Organizations of the Thesis
2 Estimating Equations
2.1 Estimation for the mean parameter π
2.2 Estimation for the ICC ρ
2.2.1 Likelihood based Estimators
2.2.2 Non-Likelihood Based Estimators
2.4 The Estimators We Compare
2.5 The Properties of the Estimators
2.5.1 The Asymptotic Variances of the Estimators
2.5.2 The Relationship of the Asymptotic Variances
3 Simulation Study
3.1 Setup
3.2 Results
3.2.1 The Overall Performance
3.2.2 The Effect of the Various Factors
3.2.3 Comparison Between Different Estimators
3.3 Conclusion
4 Real Examples
4.1 The Teratological Data Used in Paul 1982
4.2 The COPD Data Used in Liang 1992
4.3 Results
In common correlation models, the intra-class correlation parameter (ICC) provides a quantitative measure of the similarity between individuals within the same cluster. The estimation of the ICC parameter is of increasing interest and has important uses in biological and toxicological studies, such as disease aggregation studies and teratology studies.
The thesis mainly compares the following four estimators for the ICC parameter: the κ-type estimator ($\rho_{FC}$), the analysis of variance estimator ($\rho_A$), the Gaussian likelihood estimator ($\rho_G$) and a new estimator ($\rho_{UJ}$) that is based on the Cholesky decomposition. The new estimator is a specification of the UJ method proposed by Wang and Carey (2004) and has not been considered before.
Analytic expressions for the asymptotic variances of the four estimators are obtained and extensive simulation studies are carried out. The bias, standard deviation, mean square error and relative efficiency of the estimators are compared. The results show that the new estimator performs well when the mean and the correlation are small.
Two real examples are used to investigate and compare the performance of these estimators in practice.
Keywords: binary clustered data analysis, common correlation model, intra-class correlation parameter/coefficient, Cholesky decomposition, teratology study
1.1 A Typical Data in Teratological Study (Weil, 1970)
3.1 Distributions of the Cluster Size
3.2 The effect of various factors on the bias of the estimator $\rho_{UJ}$ in 1000 simulations from a beta binomial distribution
3.3 The effect of various factors on the mean square error of $\rho_{UJ}$ in 1000 simulations from a beta binomial distribution
3.4 The MSE of $\rho_{FC}$ and $\rho_{UJ}$ when the cluster size distribution is Kupper
3.5 The MSE of $\rho_{FC}$ and $\rho_{UJ}$ when the cluster size distribution is Brass
3.6 The "turning point" of ρ when π = 0.05
4.1 Shell Toxicology Laboratory, Teratology Data
4.2 COPD familial disease aggregation data
4.3 Estimating Results for the Real Data Sets
4.4 The Estimated value of the Asymptotic Variance of $\hat\rho$ (by plugging the estimates of (π, ρ) into formulas (2.29), (2.28), (2.26) and (2.21))
4.5 The Estimated value of the Asymptotic Variance of $\hat\rho$ (by using the Robust Method)
3.1 The two distributions of the cluster size $n_i$
3.2 The overall performances of the four estimators when k = 10
3.3 The overall performances of the four estimators when k = 25
3.4 The overall performances of the four estimators when k = 50
3.5 The Legend for Figure (3.8), (3.7), (3.6), (3.9) and (3.10)
3.6 The Relative Efficiencies when k = 25 and π = 0.5
3.7 The Relative Efficiencies when k = 25 and π = 0.2
3.8 The Relative Efficiencies when k = 25 and π = 0.05
3.9 The Relative Efficiencies when k = 10 and π = 0.05
3.10 The Relative Efficiencies when k = 50 and π = 0.05
Chapter 1
Introduction
Data in the form of clustered binary responses have arisen frequently in toxicological and biological studies in recent decades. Such data have the following form: each cluster contains several identical individuals, and the response of each individual is dichotomous. For ease of presentation, we name the binary responses "alive" and "dead" and impose the metric (0, 1), with 0 for "alive" and 1 for "dead".
Suppose there are $n_i$ individuals in the $i$th cluster and $k$ clusters in total. The binary response of the $j$th individual in the $i$th cluster is denoted by $y_{ij} \in \{0, 1\}$ ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n_i$), so $S_i = \sum_{j=1}^{n_i} y_{ij}$ is the total number of individuals observed to respond 1 in the $i$th cluster. It is postulated that the "death" rate is the same for all individuals in the $i$th cluster, that is, $P(y_{ij} = 1) = \pi$. The correlation between any two individuals in the same cluster is assumed to be the same. We denote this intra-class correlation parameter by $\rho = \mathrm{Corr}(y_{il}, y_{ik})$ for any $l \neq k$. Individuals from different clusters are assumed to be independent, which means that $y_{ij}$ is independent of $y_{mn}$ for any $i \neq m$.
The variance of $S_i$ is often larger than the value predicted by a simple binomial model. This phenomenon is called over-dispersion. It arises because individuals in the same cluster tend to respond more alike than individuals from different clusters.
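According to the above assumptions, we can see that:
$$E(y_{ij}) = \pi \quad \text{and} \quad \mathrm{Var}(y_{ij}) = \pi(1-\pi), \qquad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, n_i.$$
And for the sum variable $S_i = \sum_{j=1}^{n_i} y_{ij}$, which is the sufficient statistic for π:
$$E(S_i) = n_i\pi \quad \text{and} \quad \mathrm{Var}(S_i) = n_i\pi(1-\pi)[1 + (n_i - 1)\rho].$$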
The second moment of $S_i$ is determined by ρ, but the third, fourth and higher-order moments of $S_i$ may depend on other parameters. Only when the likelihood of $S_i$ is known (as in the beta-binomial model or the generalized binomial model) can we obtain closed forms for these higher-order moments.
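To make the over-dispersion concrete, the first two moments of $S_i$ can be evaluated numerically. The following is a minimal sketch and not part of the original thesis: the function name `cluster_sum_moments` and the example values are hypothetical, and the variance formula $n_i\pi(1-\pi)[1+(n_i-1)\rho]$ is the one implied by the assumptions above (a sum of $n_i$ equal variances and $n_i(n_i-1)$ equal covariances).

```python
def cluster_sum_moments(n, pi, rho):
    """Mean, variance and variance inflation of S = y_1 + ... + y_n
    under the common correlated model (hypothetical helper)."""
    mean = n * pi
    binomial_var = n * pi * (1.0 - pi)
    # n variances pi(1-pi) plus n(n-1) covariances rho*pi(1-pi)
    var = binomial_var * (1.0 + (n - 1) * rho)
    return mean, var, var / binomial_var


# Illustrative values only: a litter of 10 with pi = 0.2 and rho = 0.1.
mean, var, inflation = cluster_sum_moments(n=10, pi=0.2, rho=0.1)
print(mean, var, inflation)
```

With these illustrative values the variance is inflated by a factor of about 1.9 relative to the binomial variance, even though the correlation itself is small.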
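Define a series of parameters:
$$\phi_s = \frac{E\left[\prod_{j=1}^{s}(y_{ij} - \pi)\right]}{E\left[(y_{ij} - \pi)^s\right]}, \qquad s = 2, 3, \ldots$$
For the common correlated model, we can show that $\phi_2 = \rho$ and that the $s$th central moment $m_{si} = E(S_i - n_i\pi)^s$ of $S_i$ depends only on $\{\pi, \phi_2, \ldots, \phi_s\}$.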
When π is fixed, ρ cannot take every value in $(-1, 1)$. Prentice (1986) has shown that
$$\rho \ge -\frac{1}{n_{\max} - 1} + \frac{\omega(1-\omega)}{n_{\max}(n_{\max} - 1)\pi(1-\pi)},$$
where $n_{\max} = \max\{n_1, n_2, \ldots, n_k\}$, $\omega = n_{\max}\pi - \mathrm{int}(n_{\max}\pi)$ and $\mathrm{int}(\cdot)$ means the integer part of a real number. For different specifications of the model, the constraints might be different.
The model described above was first formally suggested as the Common Correlated Model by Landis and Koch (1977a). It includes various specifications, such as the Beta-Binomial and Extended Beta-Binomial model (BB) of Crowder (1986), the Correlated Beta-Binomial model (CB) of Kupper and Haseman (1978) and the Generalized Binomial model (GB) of Madsen (1993).
Kupper and Haseman (1978) gave an alternative specification of the common correlated model when ρ is positive. It is assumed that the probability of alive (success) varies from group to group (but stays the same for individuals within the same group) according to a distribution with mean π and variance $\rho\pi(1-\pi)$. All individuals (both within the same group and in different groups) are independent conditional on this probability. If this probability follows a Beta distribution, the result is the well-known Beta-Binomial model.
1.2 Two Specifications of the Common Correlated Model
1.2.1 Beta-Binomial Model
Of the specifications of the common correlated model, the Beta-Binomial model is the most popular. Paul (1982) and Pack (1986) have shown the superiority of the beta-binomial model for the analysis of proportions. However, Feng and Grizzle (1992) found that the BB model is too restrictive to be relied on for inference when the $n_i$ are variable.
The beta-binomial distribution is derived as a mixture distribution in which the probability of alive varies from group to group according to a beta distribution with parameters α and β, and $S_i$ is binomially distributed conditional on this probability. In terms of α and β, the marginal probability of alive for any individual is $\pi = \alpha/(\alpha + \beta)$ and the intra-class correlation parameter is $\rho = 1/(1 + \alpha + \beta)$. Writing $\theta = 1/(\alpha + \beta)$, so that $\theta = \rho/(1-\rho)$, we get the probability function of the beta-binomial distribution:
$$P(S_i = y) = \binom{n_i}{y} \frac{\prod_{j=0}^{y-1}(\pi + j\theta)\,\prod_{j=0}^{n_i-y-1}(1 - \pi + j\theta)}{\prod_{j=0}^{n_i-1}(1 + j\theta)} \qquad (1.1)$$
If the intra-class correlation satisfies $\rho > 0$, the data are said to be over-dispersed; otherwise they are under-dispersed. Over-dispersion is much more common than under-dispersion in practice, since the litter effect suggests that any two individuals in the same cluster tend to respond alike and are therefore positively correlated. But this does not mean that ρ must be positive. The BB model requires $\rho > 0$. However, Crowder (1986) showed that, to ensure that (1.1) is a probability function, ρ only needs to satisfy
$$\rho \ge \max\left\{ -\frac{\pi}{n_{\max} - \pi - 1},\; -\frac{1 - \pi}{n_{\max} - (1 - \pi) - 1} \right\}.$$
In this case ρ can take negative values, which makes the BB model suitable for under-dispersed data as well. This is called the extended beta-binomial model.
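The beta-binomial probability function can be checked numerically. The sketch below is an illustration rather than the thesis's own code: the function name `beta_binomial_pmf` and the example values are hypothetical, and it assumes the product form of the probability function in the $(\pi, \rho)$ parameterization with $\theta = \rho/(1-\rho)$, which follows from $\rho = 1/(1+\alpha+\beta)$ and $\theta = 1/(\alpha+\beta)$. Because only products are used, the same code also covers small negative ρ (the extended model), as long as every factor stays positive.

```python
from math import comb


def beta_binomial_pmf(y, n, pi, rho):
    """P(S = y) for a cluster of size n, parameterized by (pi, rho)."""
    theta = rho / (1.0 - rho)
    numerator = 1.0
    for j in range(y):            # factors for the y responses equal to 1
        numerator *= pi + j * theta
    for j in range(n - y):        # factors for the n - y responses equal to 0
        numerator *= 1.0 - pi + j * theta
    denominator = 1.0
    for j in range(n):
        denominator *= 1.0 + j * theta
    return comb(n, y) * numerator / denominator


# Illustrative check: the probabilities sum to one and reproduce the
# first two moments of the common correlated model.
n, pi, rho = 5, 0.3, 0.2
probs = [beta_binomial_pmf(y, n, pi, rho) for y in range(n + 1)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(sum(probs), mean, var, n * pi * (1 - pi) * (1 + (n - 1) * rho))
```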
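1.2.2 Generalized Binomial Model
The generalized binomial model was proposed by Madsen (1993). It can be treated as a mixture of two binomial distributions:
$$P(S_i = y) = \rho\left[\pi I(y = n_i) + (1-\pi) I(y = 0)\right] + (1-\rho)\binom{n_i}{y}\pi^{y}(1-\pi)^{n_i - y},$$
where $I(\cdot)$ is the indicator function. With probability ρ all $n_i$ responses in the cluster are identical, and with probability $1-\rho$ they are independent.
An advantage of the generalized binomial model is that ρ also contains information about the higher-order ($\ge 3$) moments. As we know, the correlation for any pair is
$$\mathrm{Corr}(y_{ij}, y_{ik}) = \frac{E(y_{ij} - \pi)(y_{ik} - \pi)}{E(y_{ij} - \pi)^2} = \rho = \phi_2.$$
For the GB model, it can be shown that:
$$\frac{E\left[\prod_{j=1}^{3}(y_{ij} - \pi)\right]}{E(y_{ij} - \pi)^3} = \phi_3 = \rho, \qquad \frac{E\left[\prod_{j=1}^{4}(y_{ij} - \pi)\right]}{E(y_{ij} - \pi)^4} = \phi_4 = \rho.$$
That means ρ also determines the third and fourth moments of $S_i$.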
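The mixture structure of the generalized binomial model is easy to check numerically. The sketch below is an illustration only: the function name `generalized_binomial_pmf` and the example values are hypothetical, and it assumes the usual reading of Madsen's model as a mixture of an "all responses identical" component (weight ρ) and an ordinary binomial component (weight 1 − ρ).

```python
from math import comb


def generalized_binomial_pmf(y, n, pi, rho):
    """P(S = y) under the assumed two-component mixture, for n >= 1."""
    binomial_part = comb(n, y) * pi ** y * (1.0 - pi) ** (n - y)
    # Component in which all n responses are identical:
    # S = n with probability pi, S = 0 with probability 1 - pi.
    identical_part = 0.0
    if y == n:
        identical_part += pi
    if y == 0:
        identical_part += 1.0 - pi
    return rho * identical_part + (1.0 - rho) * binomial_part


# Illustrative check against the variance of the common correlated model.
n, pi, rho = 4, 0.3, 0.25
probs = [generalized_binomial_pmf(y, n, pi, rho) for y in range(n + 1)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(sum(probs), mean, var, n * pi * (1 - pi) * (1 + (n - 1) * rho))
```

Under this reading the variance of $S_i$ again equals $n_i\pi(1-\pi)[1+(n_i-1)\rho]$, so the GB and BB models share their first two moments and differ only in the higher-order ones.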
1.3 Application Areas
1.3.1 Teratology Study
Of the various application areas of the common correlated model, we mainly focus on teratology studies. In a typical teratology study, female rats are exposed to different doses of a drug while they are pregnant. Each fetus is examined and a dichotomous response variable indicating the presence or absence of a particular response (e.g., malformation) is recorded. For ease of presentation, we often denote the dichotomous response as alive or dead. Applying the common correlation model and the notation above, a teratology study can be described as follows: $k$ female rats are exposed to a certain dose of a drug during pregnancy. The $i$th rat gives birth to $n_i$ fetuses, and $y_{ij}$ denotes the survival status of the $j$th of them: $y_{ij} = 1$ means the fetus is observed dead and $y_{ij} = 0$ means it is alive. Then $S_i = \sum_{j=1}^{n_i} y_{ij}$ is the total number of dead fetuses in the litter of the $i$th rat.
Table 1.1: A Typical Data in Teratological Study (Weil, 1970)
be accounted for by binomial variation (Liang and Hanfelt, 1994). This is a typical over-dispersed clustered binary response data set, and the ICC parameter ought to be positive.
1.3.2 Other Uses
Besides teratological studies, estimators of the intra-class correlation coefficient are also widely used in other fields of toxicological and biological research. For example, Donovan, Ridout and James (1994) used the ICC to quantify the extent of variation in rooting ability among somaclones of the apple cultivar Greensleeves; Gibson and Austin (1996) used an estimator of the ICC to characterize the spatial pattern of disease incidence in an orchard; Barto (1966), Fleiss and Cuzick (1979) and Kraemer et al. (2002) used the ICC as an index measuring the level of interobserver agreement; Gang et al. (1996) used the ICC to measure the efficiency of hospital staff in health delivery research; Cornfield (1978) used the ICC for estimating the required size of a cluster randomization trial.
In some clustered binary situations, the ICC parameter can be interpreted as the "heritability of a dichotomous trait" (Crowder 192, Elston, 1977). It is also frequently used to quantify the familial aggregation of disease in genetic epidemiological studies (Cohen, 1980; Liang, Qaqish and Zeger, 1992).
1.4 The Review of the Past Work
Donner (1986) gave a summary review of the estimators of the ICC for the case in which the responses are continuous. He also remarked that applying the theory for continuous responses to binary responses has severe limitations. In addition, the moment method for estimating the correlation, which is used in the GEE approach proposed by Liang and Zeger (1986), is also not appropriate for estimating the ICC when the response is binary.
A commonly used method for estimating the ICC is the maximum likelihood method based on the Beta-Binomial model (Williams 1975) or the extended beta-binomial model (Prentice 1986). However, an estimator based on a parametric model may yield inefficient or biased results when the model is wrongly specified.
Some robust estimators that do not depend on the distribution of $S_i$ have been introduced, such as the moment estimator (Kleinman, 1973), the analysis of variance estimator (Elston, 1977), the quasi-likelihood estimator (Breslow, 1990; Moore and Tsiatis, 1991), the extended quasi-likelihood estimator (Nelder and Pregibon, 1987), the pseudo-likelihood estimator (Davidian and Carroll, 1987) and the estimators based on quadratic estimating equations (Crowder 1987; Godambe and Thompson 1989).
Ridout et al. (1999) gave an excellent review of the earlier work and conducted a simulation study to compare the bias, standard deviation, mean square error and relative efficiency of 20 estimators. The review is based on data simulated from beta-binomial and mixture binomial distributions, and the simulation results showed that seven estimators performed well as far as these properties were concerned. Paul (2003) introduced 6 new estimators based on quadratic estimating equations and compared them with the 20 estimators used by Ridout et al. (1999). Paul's work shows that an estimator based on the quadratic estimating equations also performs well for the joint estimation of (π, ρ).
1.5 The Organizations of the Thesis
Chapter 1 (this chapter) gives an introduction to clustered binary data and the common correlated model, and reviews past work on the estimation of the ICC ρ. Chapter 2 introduces the commonly used estimators and the new estimator that we are going to investigate. We then obtain the asymptotic variances of the four estimators that we are going to compare: the κ-type (FC) estimator, the ANOVA estimator, the Gaussian likelihood estimator and the new estimator based on the Cholesky decomposition. Chapter 3 carries out simulation studies to compare the performance of these four estimators in terms of bias, standard deviation, mean square error and relative efficiency. To investigate the performance of the estimators in practice, Chapter 4 applies the four estimators to two real data sets. Chapter 5 gives general conclusions and describes future work.
Chapter 2
Estimating Equations
2.1 Estimation for the mean parameter π
Since $S_i$ is the sufficient statistic for π, modelling the vector response $y_{ij}$ does not give more information about π than modelling $S_i = \sum_{j=1}^{n_i} y_{ij}$. On the other hand, the estimating equation should not depend on the order of the fetuses in developmental studies. Denote the residual by $g_i = S_i - n_i\pi$ and its variance by $V_i = \mathrm{Var}(S_i - n_i\pi) = \sigma_i^2 = n_i\pi(1-\pi)[1 + (n_i - 1)\rho]$; we can get the estimating equation for π:
$$\sum_{i=1}^{k} \frac{n_i (S_i - n_i\pi)}{V_i} = 0 \qquad (2.1)$$
Simplifying (2.1), we get the quasi-likelihood estimating equation for π:
$$\sum_{i=1}^{k} \frac{S_i - n_i\pi}{1 + (n_i - 1)\rho} = 0 \qquad (2.2)$$
From another point of view, we may also use the GEE approach, which models the vector response $y_i = \{y_{i1}, y_{i2}, \ldots, y_{in_i}\}$:
$$\sum_{i=1}^{k} \mathbf{1}_{n_i}^{T}\, \mathrm{Var}(y_i)^{-1} (y_i - \pi\mathbf{1}_{n_i}) = 0 \qquad (2.3)$$
Note that (2.3) also does not depend on the order of the $y_{ij}$ even though it models the vector response. It has the same form as the quasi-likelihood estimating equation (2.1).
Consider a general set of estimators for π:
$$\hat\pi = \frac{\sum_{i} \omega_i S_i}{\sum_{i} \omega_i n_i}$$
When $\omega_i = [1 + (n_i - 1)\rho]^{-1} = \nu_i^{-1}$, we get (2.2). The weight factor $\omega_i$ can also take other values. For example, when $\omega_i = 1$ the estimator for π is $\hat\pi = \sum_i S_i / \sum_i n_i$, and when $\omega_i = 1/n_i$ the estimator for π is $\left(\sum_i S_i/n_i\right)/k$.
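The family of weighted estimators for π can be illustrated with a few lines of code. This is a minimal sketch under the assumption that the general estimator has the ratio form $\hat\pi = \sum_i \omega_i S_i / \sum_i \omega_i n_i$, which is the form consistent with the three special cases named above; the function names and the small data set are hypothetical.

```python
def weighted_pi(S, n, weights):
    """Weighted estimator of pi: sum(w_i * S_i) / sum(w_i * n_i)."""
    numerator = sum(w * s for w, s in zip(weights, S))
    denominator = sum(w * m for w, m in zip(weights, n))
    return numerator / denominator


def quasi_likelihood_weights(n, rho):
    """Weights 1 / [1 + (n_i - 1) * rho] of the quasi-likelihood equation."""
    return [1.0 / (1.0 + (m - 1) * rho) for m in n]


# Hypothetical litters: S_i responses out of n_i fetuses.
S = [1, 0, 3, 2]
n = [5, 4, 8, 3]

pooled = weighted_pi(S, n, [1.0] * len(n))                    # w_i = 1
mean_of_props = weighted_pi(S, n, [1.0 / m for m in n])       # w_i = 1/n_i
ql = weighted_pi(S, n, quasi_likelihood_weights(n, rho=0.2))  # w_i = 1/nu_i
print(pooled, mean_of_props, ql)
```

The quasi-likelihood weights interpolate between the two simple choices: at ρ = 0 they reduce to equal weights (the pooled proportion), and at ρ = 1 they reduce to $1/n_i$ (the mean of the cluster proportions).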
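2.2 Estimation for the ICC ρ
2.2.1 Likelihood based Estimators
The maximum likelihood estimators are based on parametric models. However, when the parametric model does not fit the data well, these estimators may be highly biased or inefficient.
• MLE Estimator Based on Beta Binomial Model
As mentioned in Section 1.2.1, the likelihood of the beta-binomial distribution is:
$$L(\pi, \rho) = \prod_{i=1}^{k} \binom{n_i}{S_i} \frac{\prod_{j=0}^{S_i-1}(\pi + j\theta)\,\prod_{j=0}^{n_i-S_i-1}(1 - \pi + j\theta)}{\prod_{j=0}^{n_i-1}(1 + j\theta)}, \qquad \theta = \frac{\rho}{1-\rho}.$$
Denote the log-likelihood by $l(\pi, \rho)$, so the joint estimating equations for (π, ρ) are:
$$\frac{\partial l}{\partial \pi} = \sum_{i=1}^{k}\left\{ \sum_{r=0}^{S_i-1} \frac{1-\rho}{(1-\rho)\pi + r\rho} - \sum_{r=0}^{n_i-S_i-1} \frac{1-\rho}{(1-\rho)(1-\pi) + r\rho} \right\} = 0$$
$$\frac{\partial l}{\partial \rho} = \sum_{i=1}^{k}\left\{ \sum_{r=0}^{S_i-1} \frac{r-\pi}{(1-\rho)\pi + r\rho} + \sum_{r=0}^{n_i-S_i-1} \frac{r-(1-\pi)}{(1-\rho)(1-\pi) + r\rho} - \sum_{r=0}^{n_i-1} \frac{r-1}{(1-\rho) + r\rho} \right\} = 0$$
Denote the solution of the above estimating equations by the maximum likelihood estimator $\rho_{ML}$.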
• Gaussian Likelihood Estimator
The Gaussian likelihood estimator was introduced by Whittle (1961) for dealing with continuous responses, and Crowder (1985) introduced it to the analysis of binary data. The Gaussian likelihood only needs the first two moments, which Chapter 1 specifies for the common correlated model, and it is the easiest to calculate of all the moment-based methods. Paul (2003) also showed that the Gaussian estimator for binary data performs well compared with the other known estimators of the ICC.
Assume the vector response $y_i = \{y_{i1}, y_{i2}, \ldots, y_{in_i}\}$ is distributed according to the multivariate Gaussian distribution, with mean and variance:
$$E(y_i) = \pi\mathbf{1}_{n_i}, \qquad \mathrm{Var}(y_i) = A_i^{1/2} R_i(\rho) A_i^{1/2}.$$
Here $A_i = \mathrm{diag}\{\pi(1-\pi), \pi(1-\pi), \ldots, \pi(1-\pi)\}$ is the diagonal variance matrix and $R_i(\rho)$ is the $n_i \times n_i$ correlation matrix with ones on the diagonal and ρ elsewhere. Denote the residual by $y_i - \pi\mathbf{1}_{n_i}$ and the standardized residual by
$$\varepsilon_i = A_i^{-1/2}(y_i - \pi\mathbf{1}_{n_i}) = \left( \frac{y_{i1}-\pi}{\sqrt{\pi(1-\pi)}},\; \frac{y_{i2}-\pi}{\sqrt{\pi(1-\pi)}},\; \ldots,\; \frac{y_{in_i}-\pi}{\sqrt{\pi(1-\pi)}} \right)^{T}.$$
Denote the solution of (2.5) by the Gaussian likelihood estimator $\rho_G$.
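The Gaussian working likelihood can be evaluated in closed form for an exchangeable correlation structure, which gives a simple way to experiment with $\rho_G$. The sketch below is an illustration and not the thesis's procedure: it fixes π at the pooled proportion and maximizes over a grid of ρ values instead of solving the estimating equations jointly, and the function names and the grid size are hypothetical. It uses two standard identities for the matrix with ones on the diagonal and ρ elsewhere: its determinant is $(1-\rho)^{n-1}[1+(n-1)\rho]$ and its inverse is $\{I - \rho J/[1+(n-1)\rho]\}/(1-\rho)$, where $J$ is the matrix of ones.

```python
from math import log


def gaussian_loglik(S, n, pi, rho):
    """Gaussian working log-likelihood, up to an additive constant."""
    v = pi * (1.0 - pi)
    total = 0.0
    for s, m in zip(S, n):
        nu = 1.0 + (m - 1) * rho
        # log-determinant of v * R_i(rho)
        log_det = m * log(v) + (m - 1) * log(1.0 - rho) + log(nu)
        sum_sq = s * (1.0 - pi) ** 2 + (m - s) * pi ** 2  # sum_j (y_ij - pi)^2
        resid = s - m * pi                                 # sum_j (y_ij - pi)
        quad = (sum_sq - rho * resid ** 2 / nu) / ((1.0 - rho) * v)
        total -= 0.5 * (log_det + quad)
    return total


def gaussian_rho_grid(S, n, num=1000):
    """Grid maximizer over rho with pi fixed at the pooled proportion.

    Illustration only; assumes the pooled proportion lies strictly
    between 0 and 1.
    """
    pi_hat = sum(S) / sum(n)
    n_max = max(n)
    candidates = [i / num for i in range(-num + 1, num)
                  if 1.0 + (n_max - 1) * (i / num) > 0.0]
    return max(candidates, key=lambda r: gaussian_loglik(S, n, pi_hat, r))


# Hypothetical data: S_i responses out of n_i individuals.
S = [1, 0, 3, 2, 0, 4]
n = [5, 4, 8, 3, 6, 7]
print(gaussian_rho_grid(S, n))
```

Because only $S_i$ and $n_i$ enter the closed form, the working likelihood does not depend on the order of the responses within a cluster.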
2.2.2 Non-Likelihood Based Estimators
The non-likelihood based estimators are supposed to be more robust than the maximum likelihood estimators since they do not depend on the distribution of $S_i$. We will introduce the new estimator $\rho_{UJ}$, which is based on the Cholesky decomposition, as well as some other commonly used estimators.
• New Estimator Based on Cholesky Decomposition
The new estimator is a specification of the U-J method proposed by Wang and Carey (2004), which is based on the Cholesky decomposition:
$$\frac{1 + (j - 2)\rho}{(1 - \rho)[1 + (j - 1)\rho]}$$
Let us consider all the permutations of $\varepsilon_{ij}$. We use $\varepsilon_{i[l]}$ to represent one permutation. Since there are $n_i!$ permutations for the $i$th cluster, we use $1/n_i!$ as the weight for the $i$th cluster.
Denote the solution of (2.6) by the new estimator $\rho_{UJ}$.
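The role of the Cholesky decomposition can be illustrated on the exchangeable correlation matrix itself. The sketch below is not the UJ estimating equation, which is not reproduced here; it only checks one fact that the construction relies on, under the assumption that the expression $\frac{1+(j-2)\rho}{(1-\rho)[1+(j-1)\rho]}$ is the reciprocal of the conditional variance of the $j$th standardized residual given the first $j-1$. The squared diagonal entries of the lower Cholesky factor are exactly these conditional variances. All function names are hypothetical, and the decomposition is written out in plain Python to keep the example self-contained.

```python
from math import sqrt


def exchangeable_corr(n, rho):
    """n x n matrix with 1 on the diagonal and rho elsewhere."""
    return [[1.0 if r == c else rho for c in range(n)] for r in range(n)]


def cholesky_lower(A):
    """Lower-triangular L with L L^T = A (Cholesky-Banachiewicz)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(L[r][k] * L[c][k] for k in range(c))
            if r == c:
                L[r][c] = sqrt(A[r][r] - s)
            else:
                L[r][c] = (A[r][c] - s) / L[c][c]
    return L


def conditional_variance(j, rho):
    """Variance of the j-th standardized residual given the first j - 1."""
    return (1.0 - rho) * (1.0 + (j - 1) * rho) / (1.0 + (j - 2) * rho)


rho = 0.3
L = cholesky_lower(exchangeable_corr(5, rho))
for j in range(1, 6):
    print(j, L[j - 1][j - 1] ** 2, conditional_variance(j, rho))
```

The conditional variance depends on the position $j$ of a residual within the cluster, which is consistent with the text's averaging over all $n_i!$ permutations to make the result independent of the ordering.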
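• The Analysis of Variance Estimator
The analysis of variance estimator is by definition:
$$\rho_A = \frac{MSB - MSW}{MSB + (n_A - 1)MSW},$$
where MSB and MSW are the between and within group mean squares from a one-way analysis of variance of the response $y_i$, and
$$n_A = \frac{1}{k-1}\left( N - \frac{\sum_i n_i^2}{N} \right), \qquad N = \sum_i n_i.$$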
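For binary data the ANOVA mean squares depend on the data only through $S_i$ and $n_i$, so the estimator is easy to compute. The following sketch is illustrative and assumes the standard one-way ANOVA form $\rho_A = (MSB - MSW)/[MSB + (n_A - 1)MSW]$ with $n_A = (N - \sum_i n_i^2/N)/(k-1)$, as used for example in Ridout et al. (1999); the function name `anova_icc` and the data are hypothetical.

```python
def anova_icc(S, n):
    """ANOVA estimator of the ICC from cluster totals S_i and sizes n_i.

    Assumes at least two clusters and that the denominator is non-zero.
    """
    k = len(n)
    N = sum(n)
    sum_ratio = sum(s * s / m for s, m in zip(S, n))   # sum of S_i^2 / n_i
    msb = (sum_ratio - sum(S) ** 2 / N) / (k - 1)       # between-cluster mean square
    msw = (sum(S) - sum_ratio) / (N - k)                # within-cluster mean square
    n_a = (N - sum(m * m for m in n) / N) / (k - 1)     # adjusted cluster size
    return (msb - msw) / (msb + (n_a - 1.0) * msw)


# Two extreme hypothetical data sets with four clusters of size 5:
print(anova_icc([0, 0, 5, 5], [5, 5, 5, 5]))   # clusters are all-or-nothing
print(anova_icc([1, 1, 1, 1], [5, 5, 5, 5]))   # identical cluster proportions
```

The two calls show the range of the estimator: all-or-nothing clusters give an estimate of 1, while identical cluster proportions give the most negative value possible for equal cluster sizes, $-1/(n-1)$.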
• Direct Probability Interpretation Estimators
Assume the probability that two individuals have the same response is α if they are from the same cluster and β if they are from different clusters. The assumptions of the common correlation model show that:
$$\alpha = 1 - 2\pi(1-\pi)(1-\rho), \qquad \beta = 1 - 2\pi(1-\pi),$$
and hence that
$$\rho = \frac{\alpha - \beta}{1 - \beta}.$$
Similarly, we can get other estimators by using different estimators of α and β. Mak (1988) proposed one such estimator, known as Mak's estimator.
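The relation between ρ and the two agreement probabilities can be turned into code directly. The sketch below is illustrative: `rho_from_agreement` implements $\rho = (\alpha - \beta)/(1 - \beta)$, which follows from the definitions of α and β, while `fleiss_cuzick_icc` assumes the κ-type form of Fleiss and Cuzick (1979) as reported in Ridout et al. (1999), $\rho_{FC} = 1 - \sum_i S_i(n_i - S_i)/n_i \,/\, [(N-k)\hat\pi(1-\hat\pi)]$ with $\hat\pi = \sum_i S_i/N$. That formula is an assumption here, since it is not reproduced in this section, and the function names and data are hypothetical.

```python
def rho_from_agreement(alpha, beta):
    """ICC from within-cluster (alpha) and between-cluster (beta) agreement."""
    return (alpha - beta) / (1.0 - beta)


def fleiss_cuzick_icc(S, n):
    """Kappa-type (FC) estimator in the assumed Fleiss-Cuzick form.

    Assumes the pooled proportion lies strictly between 0 and 1.
    """
    k = len(n)
    N = sum(n)
    pi_hat = sum(S) / N
    discordance = sum(s * (m - s) / m for s, m in zip(S, n))
    return 1.0 - discordance / ((N - k) * pi_hat * (1.0 - pi_hat))


# Consistency check of the identity with hypothetical parameter values.
pi, rho = 0.2, 0.35
alpha = 1.0 - 2.0 * pi * (1.0 - pi) * (1.0 - rho)
beta = 1.0 - 2.0 * pi * (1.0 - pi)
print(rho_from_agreement(alpha, beta))

# Hypothetical data: S_i responses out of n_i individuals.
print(fleiss_cuzick_icc([1, 0, 3, 2], [5, 4, 8, 3]))
```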
• Direct Calculation of Correlation Estimator
Donner (1986) suggested estimating ρ by calculating the Pearson correlation coefficient over all possible pairs within a group. Karlin et al. (1981) proposed the general form of this kind of estimator. Ridout et al. (1999) extended this method to binary data and proposed the Pearson correlation estimator.
Denote the estimator that uses the constant weight $\omega_i = 1/\sum_i n_i(n_i - 1)$ by the Pearson estimator $\rho_{Pearson}$:
$$\rho_{Pearson} = \frac{1}{\hat\mu(1-\hat\mu)}\left[ \frac{\sum_i S_i(S_i - 1)}{\sum_i n_i(n_i - 1)} - \hat\mu^2 \right], \qquad \hat\mu = \frac{\sum_i (n_i - 1)S_i}{\sum_i n_i(n_i - 1)} \qquad (2.12)$$
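With constant weights, the pairwise Pearson estimator reduces to simple sums over clusters. The sketch below is illustrative and assumes the form obtained by computing the ordinary Pearson correlation over all ordered within-cluster pairs: the pair mean is $\hat\mu = \sum_i (n_i-1)S_i / \sum_i n_i(n_i-1)$, the mean product is $\sum_i S_i(S_i-1) / \sum_i n_i(n_i-1)$, and the variance of a binary variable with mean $\hat\mu$ is $\hat\mu(1-\hat\mu)$. The function name `pearson_icc` and the data are hypothetical.

```python
def pearson_icc(S, n):
    """Pearson correlation over all ordered within-cluster pairs.

    Assumes at least one cluster with n_i >= 2 and 0 < mu < 1.
    """
    pairs = sum(m * (m - 1) for m in n)                     # number of ordered pairs
    mu = sum((m - 1) * s for s, m in zip(S, n)) / pairs     # mean response over pairs
    joint = sum(s * (s - 1) for s in S) / pairs             # P(both members respond 1)
    return (joint - mu * mu) / (mu * (1.0 - mu))


# Hypothetical data: S_i responses out of n_i individuals.
print(pearson_icc([1, 0, 3, 2], [5, 4, 8, 3]))
```

Each cluster contributes $n_i(n_i-1)$ ordered pairs, so larger clusters receive much more weight in this estimator than in the ANOVA or κ-type estimators.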
• Pseudo Likelihood Estimator
Davidian and Carroll (1987) and Carroll and Ruppert (1988) introduced the pseudo-likelihood estimator. Treat the count $S_i = \sum_j y_{ij}$ as a Gaussian random variable, so that the likelihood for $S_i$ is that of
$$S_i \sim N\left(n_i\pi,\; n_i\pi(1-\pi)[1 + (n_i - 1)\rho]\right).$$
Denote the solution of (2.13) by the pseudo-likelihood estimator $\rho_{PL}$. Note that $\rho_{PL}$ is different from the Gaussian likelihood estimator $\rho_G$: $\rho_G$ is obtained by treating the vector response $y_i = \{y_{i1}, y_{i2}, \ldots, y_{in_i}\}$ as multivariate normal, while $\rho_{PL}$ is obtained by maximizing the pseudo-likelihood of $S_i = \sum_j y_{ij}$.
• Extended Quasi Likelihood Estimator
Nelder and Pregibon (1987) extended the quasi-likelihood estimating equation for the common correlation model to estimate the ICC ρ. Note that the traditional quasi-likelihood approach cannot be used here, since the residuals $\varepsilon_i$ do not
likelihood estimator $\rho_P$. Another way is to replace $D_i(S_i, \pi)$ with $\frac{k-1}{k} D_i(S_i, \pi)$; this yields the unbiased version of the quasi-likelihood estimator, $\rho_{EQ}$.
• Moment Estimator
Kleinman (1973) proposed a set of moment estimators based on a weighted sum of squares $S_\omega$ with cluster weights $\omega_i$.
Two specifications of the moment estimators are used in Ridout et al. (1999), one with weights $\omega_i = 1/k$ and the other with $\omega_i = n_i/N$. They are labeled $\rho_{KEQ}$ and $\rho_{KPR}$. If $S_\omega$ is replaced by $S_\omega^* = \frac{k-1}{k} S_\omega$, we get two slightly different moment estimators, $\rho_{KEQ}^*$ and $\rho_{KPR}^*$.
A more general moment estimator proposed by Whittle (1982) uses an iterative algorithm. Taking $\omega_i = \frac{n_i}{1 + (n_i - 1)\hat\rho}$, where $\hat\rho$ is the current estimate of ρ, we get new moment estimators $\rho_W$ and $\rho_W^*$ (the latter by replacing $S_\omega$ with $S_\omega^*$ as mentioned above).
• Estimators Based on Quadratic Estimating Equations
The quadratic estimating equations were first proposed by Crowder (1987). They are a set of estimating equations that are quadratic in $S_i - n_i\pi$.
He also proposed that the optimal estimating equations are obtained by setting:
$$a_{i\pi} = -(\gamma_{2i\lambda} + 2) + \gamma_{1i\lambda}(1 - 2\pi)\sigma_{i\lambda}/\{\pi(1-\pi)\}$$
Here $\gamma_{1i}$ and $\gamma_{2i}$ are the skewness and kurtosis of $(S_i - n_i\pi)/n_i$, and $\sigma_{i\lambda}$ is the variance of $(S_i - n_i\pi)/n_i$. However, we do not know the exact form of $\gamma_{1i}$ and $\gamma_{2i}$ for the non-likelihood estimators. Paul (2001) suggested using the 3rd and 4th moments derived from the beta-binomial distribution instead.
The Gaussian likelihood estimator and the pseudo-likelihood estimator are special cases of the optimal quadratic estimating equations. For the Gaussian likelihood estimator, the parameters are:
$$a_{i\rho} = n_i(1 - 2\pi) \quad \text{and} \quad b_{i\rho} = \frac{n_i^2[1 + (n_i - 1)\rho^2]}{[1 + (n_i - 1)\rho]^2}$$
For the pseudo-likelihood estimator, the parameters are:
$$a_{i\rho} = 0 \quad \text{and} \quad b_{i\rho} = \frac{n_i^2(n_i - 1)}{[1 + (n_i - 1)\rho]\,m_{2i}}$$
It also coincides with the optimal estimating equations when we set $\gamma_{1i} = \gamma_{2i} = 0$.
Ridout et al. (1999) compared 20 estimators of the intra-class correlation coefficient in terms of bias, standard deviation, mean square error and relative efficiency. They suggested that the analysis of variance estimator ($\rho_A$), the κ-type estimator ($\rho_{FC}$) and the moment estimators ($\rho_{KPR}$ and $\rho_W$) performed well as far as the median of the mean square error was concerned. They also found that the Pearson estimator ($\rho_{Pearson}$) performed well when the true value of the intra-class correlation parameter ρ was small, but that the overall performance of $\rho_{Pearson}$ depends on the true value of ρ. The conclusions of Ridout et al. (1999) were based on simulation results for data generated from the beta-binomial distribution and from a mixture of two binomial distributions.
Paul (2003) introduced 6 other estimators based on quadratic estimating equations and compared these 6 estimators with the 20 estimators used by Ridout et al. (1999). With a similar simulation setup, Paul (2003) showed that the estimator based on the optimal quadratic estimating equations, $\rho_{QB}$, which uses the 3rd and 4th moments of the beta-binomial distribution, also performs well in the joint estimation of $(\hat\pi, \hat\rho)$. For data generated from the beta-binomial distribution, it even has higher efficiency than $\rho_A$. He also found that the performance of $\rho_{Pearson}$ depends on the true value of ρ, which is consistent with the findings of Ridout et al. (1999).
Zou and Donner (2004) introduced the coverage rate as a new index for comparing the performance of the estimators. They obtained closed forms for the asymptotic variances of the analysis of variance estimator $\rho_A$, the κ-type estimator $\rho_{FC}$ and the Pearson estimator $\rho_{Pearson}$ under the generalized binomial model (Madsen, 1993). The simulation results indicated that the κ-type estimator $\rho_{FC}$ performed best among these three estimators as far as the coverage rate of the confidence interval was concerned.
2.4 The Estimators We Compare
We are going to compare four estimators: the κ-type estimator $\rho_{FC}$, the analysis of variance estimator $\rho_A$, the Gaussian estimator $\rho_G$ and the UJ estimator based on the Cholesky decomposition.
The κ-type estimator $\rho_{FC}$ and the ANOVA estimator $\rho_A$ are widely used estimators of the ICC and perform well in many situations (Ridout et al. 1999). The Gaussian likelihood method is the most general form of all the moment-based methods. It also relies only on the first two moments, which are exactly what is known in the common correlated model. Besides, the Gaussian likelihood method is also the most convenient to calculate of all the pseudo-likelihood methods (Crowder 1985).
We are going to compare these three estimators with the new estimator $\rho_{UJ}$ based on the Cholesky decomposition, which is a specification of the UJ method proposed by Wang and Carey (2004).
2.5 The Properties of the Estimators
2.5.1 The Asymptotic Variances of the Estimators
The asymptotic variance quantifies the limiting properties of the estimators. As shown above, we have two types of estimators for ρ. One type is the solution of an estimating equation, such as the new estimator $\rho_{UJ}$ and the Gaussian likelihood estimator $\rho_G$. The other type has a closed form, such as the
obtain the asymptotic variances of these two types of estimators.
• Estimators Without Closed Forms
This type of estimator is the solution of an estimating equation and has no closed form. The typical example is the new (UJ) estimator.
the estimating equations for (π, ρ) jointly. So the choice of the estimator $\hat\pi$ may affect the asymptotic variance of $\hat\rho$. Here we will use (2.2):
$$\sum_{i=1}^{k} \frac{S_i - n_i\pi}{1 + (n_i - 1)\rho} = 0$$
as the estimating equation for $\hat\pi$. The advantage of this estimator is that it maximizes the efficiency of $\hat\pi$.
Of all the estimators mentioned above, the MLE ($\rho_{ML}$), the Gaussian estimator ($\rho_G$), the pseudo-likelihood estimator ($\rho_{PL}$), the extended quasi-likelihood estimator ($\rho_{EQ}$), the estimator based on the quadratic estimating equations ($\rho_{QB}$) and the new (UJ) estimator $\rho_{UJ}$ based on the Cholesky decomposition are
So the asymptotic variance-covariance matrix is
Here $\mathrm{Var}(\hat\rho) = V_{22}$. Simply plugging in the estimates $(\hat\pi, \hat\rho)$ cannot ensure that the matrix $M$ is positive, and sometimes we get negative values for the asymptotic variances of $\hat\rho_G$ and $\hat\rho_{UJ}$. Here we define:
$M^{\sharp}$ is a positive matrix. So, using $M^{\sharp}$ instead of $M$ if necessary, the asymptotic variances of $\hat\rho_G$ and $\hat\rho_{UJ}$ will always be positive.
For our choice of the estimating equation for π:
$$\mathrm{Var}(\hat\pi) = V_{11} = \left( \sum_{i=1}^{k} \frac{n_i^2}{m_{2i}} \right)^{-1},$$
where $m_{2i} = E(S_i - n_i\pi)^2 = n_i\pi(1-\pi)[1 + (n_i - 1)\rho]$ is the 2nd order central moment of $S_i$, and $m_{3i}$ and $m_{4i}$ are the 3rd and 4th order central moments of $S_i$.
Apply the sandwich method to the new (UJ) estimator and the Gaussian likelihood estimator, with $m_{3i}$ and $m_{4i}$ to denote the 3rd and 4th order centralized