Data Analysis
Min ZHU
NATIONAL UNIVERSITY OF SINGAPORE
2004
Min ZHU (B.Sc., University of Science & Technology of China)
A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY
For the completion of this thesis, I would like to express my heartfelt gratitude to my supervisor, Associate Professor Yougan Wang, for all his invaluable advice and guidance, endless patience, kindness and encouragement during my mentorship in the Department of Statistics and Applied Probability of the National University of Singapore. I have learned many things from him, especially regarding academic research and character building. I truly appreciate all the time and effort he has spent helping me to solve the problems I encountered, even when he was in the midst of his own work.
I also wish to express my sincere gratitude and appreciation to my other lecturers, namely Professors Zhidong Bai and Zehua Chen, for imparting knowledge and techniques to me and for their precious advice and help in my study.
It is a great pleasure to record my thanks to the other members and staff of the department for their help in various ways and for providing such a pleasant working environment, especially to Mrs Yvonne Chow and … for advice in computing, and to Irene Tan and others for administrative matters. I also wish to thank NUS for providing me with the overseas graduate student scholarship.
Finally, I would like to dedicate the completion of this thesis to my dearest family, who have always supported me with their encouragement and understanding in all my years. Special thanks go to all my friends who helped me in one way or another, for their friendship and encouragement.
1.1 Teratology Studies
1.2 Examples and Notation
1.3 Organization of the Thesis
2 The Models and Estimation
2.1 Introduction
2.2 Diagnostics of overdispersion
2.3 Likelihood-based Models
2.3.1 Beta-binomial Model (BB)
2.3.2 Correlated-binomial Model (CB)
2.3.3 Beta-correlated binomial Model (BCB)
2.3.4 Mixture Models
2.4 Non-likelihood-based Models
2.4.1 Quasi-likelihood Approach
2.4.2 Generalized Estimating Equations
2.5 Estimating Intraclass Correlation
2.6 Summary
3 Gaussian Working Likelihood Approach
3.1 Decoupled Gaussian Estimation
3.2 Model Comparisons
3.3 Simulation Setup
3.3.1 Beta-binomial data
3.3.2 Non-beta-binomial data
3.4 Simulation Studies
3.5 Discussion
4 Dose-response Models Incorporating Risk of Death in Utero
4.1 Introduction
4.2 The model
4.3 Specification of the mean and variance functions
4.4 An Example
4.5 Discussion
List of Tables
1.1 Data presented in Paul (1982)
1.2 Sample No. 3 of Haseman and Soares (1976)
2.1 Binomial dispersion statistics and their null expectations and variances evaluated for the data of Paul (1982)
3.1 Distribution of number of live fetuses per litter ($n_{ij}$) used in the simulation studies
3.2 Different correlation combinations considered in the exponential-gamma approach
3.3 Different response probabilities considered in the step-function approach
3.4 Biases of $\hat\beta_0$ and $\hat\beta_1$ under various methods
3.5 MSEs of $\hat\beta_0$ and $\hat\beta_1$ under various methods
3.6 Biases of $\hat\rho$ from four different methods
3.7 MSEs of $\hat\rho$ from four different methods
3.8 Biases of $\hat\theta_T$ for data generated from exponential-gamma approach
3.9 MSEs of $\hat\theta_T$ for data generated from exponential-gamma approach
3.10 Biases of $\hat\theta_T$ for data generated from step-function approach
3.11 MSEs of $\hat\theta_T$ for data generated from step-function approach
4.1 Summary of the data of Lüning et al. (1966)
List of Figures
1.1 Data structure of teratology studies
4.1 Estimated risk functions using the dominant lethal assay data
We consider statistical methods for the analysis of teratology data and investigate the bias and efficiency of different estimators in the presence of overdispersion. In particular, we focus on the decoupled Gaussian method for the analysis of correlated binary data. Both analytic and simulation studies are carried out to evaluate different models. It is found that the decoupled Gaussian method works especially well for joint estimation of the mean and intraclass correlation parameters.
Previous modelling efforts have usually been conducted on viable fetuses alone. To incorporate the information in prenatal dead/resorbed fetuses, we propose a new approach for the joint analysis of prenatal death/resorption and malformation. This approach has several advantages: (i) it provides a convenient way of modelling the unobserved in utero deaths as well as the observed defects in risk assessment; (ii) it allows flexible choices in ordinal categorical data analysis; and (iii) it yields efficient statistical inference within the framework of generalized estimating equations. Real data analyses are provided for demonstration.
My major contributions in this thesis are as follows:
(i) to verify that using litter counts instead of fetus-specific outcomes does not lose any information when the GEE approach is used to estimate mean parameters (§4.2, Chapter
(iv) to propose a multivariate model incorporating the risk of death in utero by modelling the probability of being observed (Chapter 4).
Trang 11to the growth of drug industry and more people suffering toxic reaction to drugs.
In order to identify toxicity, three test designs (Segments I, II and III) were established by the FDA in 1966 to assess specific types of effects (Molenberghs, Declerck, and Aerts, 1998):
Segment I studies are known as “fertility studies”; they are designed to assess male and female fertility and general reproductive ability. Such studies typically involve exposing males for 60 days and females for 14 days before mating.
Segment II studies are also referred to as “teratology studies”, since historically their primary goal was to study malformations. The word “teratology” comes from the Greek word “teras”, meaning monster. The methodologies discussed in this thesis mainly apply to Segment II studies.
Segment III tests are focused on effects later in gestation and involve exposingpregnant animals from the 15th day of gestation through lactation
Animal laboratory experiments are used in each of the three test designs and provide good evidence for identifying toxicity. A typical teratology experiment assigns timed-pregnant animals (mice, rats and occasionally rabbits) to several groups treated with varying doses of a compound and to a control group. These dams are exposed during major organogenesis (days 6 to 15 for mice and rats) and structural development. All dams are sacrificed prior to normal delivery, at which time the uterine contents are thoroughly examined to study the reproductive and developmental toxicity of the test agent. The number of implantations (sometimes called implantation sites) is counted. An implant or fetus may be resorbed at different stages during gestation, or die before birth; if it survives, growth reduction, such as weight loss, may occur; it may further exhibit various types of structural variation, or even one or more types of malformation (Zhu and Fung, 1996). Figure 1.1 illustrates the teratology data structure.
Since laboratory experiments involve considerable amounts of time and money, as well as large numbers of animals, it is essential that the most appropriate and efficient statistical models are used (Williams and Ryan, 1996). In addition, the analysis of teratology data, which combines multivariate and clustered data issues, raises a number of challenges.
Figure 1.1: Data structure of teratology studies.
Although a teratology experiment yields dichotomous as well as continuous outcomes, in this thesis we focus on dichotomous outcomes, namely the occurrence of malformations or fetal deaths. We now introduce several teratology data sets used in the thesis to give an intuitive understanding of teratology data.
Example 1.1 Paul (1982)
The data (from Shell Toxicology Laboratory, Sittingbourne Research Centre, Sittingbourne, Kent, England) are given in Table 1.1 and were analyzed by Paul (1982). The species used in the experiment is the banded Dutch rabbit, and skeletal and visceral abnormalities were observed. Here $n$ denotes the number of live fetuses and $s$ the number affected by treatment in each dose group. For example, the first pregnant female in the control group has one abnormal fetus out of twelve live fetuses, and the first pregnant female in the low-dose treatment group has no abnormal fetuses out of five fetuses.
Table 1.1: Data presented in Paul (1982)
Example 1.2 Haseman and Soares (1976)
The data in Table 1.2 are taken from Haseman and Soares (1976) and describe one of the control groups from dominant lethal assays. In this experiment, a drug's ability to cause damage to reproductive genetic material sufficient to kill the fertilized egg or developing embryo is tested by dosing a male mouse and mating it to one or more females. A significant increase in fetal deaths is then indicative of an adverse effect.
Table 1.2: Sample No 3 of Haseman and Soares (1976)
Observed frequency distribution of fetal death in mice
we adopt for teratology data in the following paragraphs
In general, we use capital letters to represent random variables or matrices, relying on the context to distinguish the two, and small letters for specific observations. Scalars and matrices are in normal type; vectors are in bold type.
In a typical teratology experiment, suppose we have $t$ different dose groups. There are $m_i$ litters in the $i$th dose group. The $j$th of these $m_i$ litters has size $n_{ij}$, and $y_{ij}$ of its $n_{ij}$ fetuses are abnormal. We further write the binary response vector of litter $j$ in dose group $i$ as $\mathbf{Y}_{ij} = (y_{ij1}, \ldots, y_{ijn_{ij}})^T$, where $y_{ijk}$ takes the value 1 if the $k$th fetus in the $j$th litter of the $i$th group is abnormal and 0 otherwise. Obviously, $y_{ij} = \sum_k y_{ijk}$. It may be useful to refer to the diagram below for a quick check of the notation.
The response probability is assumed to be the same for all fetuses in the same dose group; specifically, $\Pr(Y_{ijk} = 1) = \mu_i$ for all $j$ and $k$. Furthermore, the responses of fetuses from different litters are assumed to be independent.
Because of genetic similarity and identical treatment conditions, fetuses of the same mother behave more alike than fetuses of different mothers. This has been termed the “litter effect” and is one important form of clustering. As a result, responses of different fetuses within the same litter are likely to be correlated. Assume that the intraclass correlation between any pair of binary responses is affected by dose level only, that is, $\mathrm{Corr}(Y_{ijk}, Y_{ijl}) = \rho_i$ for $k \neq l$ and any litter $j$ in dose group $i$. It follows that
$$\mathrm{Var}(Y_{ij}) = n_{ij}\mu_i(1-\mu_i)\left[1 + (n_{ij}-1)\rho_i\right],$$
so that the $Y_{ij}$ are overdispersed relative to the binomial distribution if $\rho_i > 0$ and underdispersed if $\rho_i < 0$. In practice, overdispersion is much more common than underdispersion. As an important characteristic of teratological data, this extra variation must be taken into account for valid statistical inference.
This chapter has given a brief introduction to teratology data. Chapter 2 reviews the existing methods for the analysis of teratology data and compares the advantages and disadvantages of each method. Chapter 3 focuses on the decoupled Gaussian approach for the analysis of correlated binary data; simulation studies are carried out to investigate the bias and efficiency of different estimators in the presence of overdispersion. A new ordinal-response approach for the joint analysis of prenatal death/resorption and malformation is given in Chapter 4. Conclusions and further research are presented in Chapter 5.
Chapter 2
The Models and Estimation
2.1 Introduction
In the past 30 years, a number of methods have been proposed for the analysis of clustered binary data. Roughly, methods for correlated binary data can be grouped into two classes: likelihood-based methods and non-likelihood methods. For likelihood-based methods, the key problem is to find proper likelihood functions that can be used for efficient and parsimonious estimation of parameters. For non-likelihood methods, the quasi-likelihood approach and generalized estimating equations (GEE) have gained widespread popularity. In addition, the intraclass correlation, a parameter peculiar to clustered binary data, is also important, and a number of approaches have been proposed for estimating it. Before we present the existing methods for the analysis of teratology data, we first introduce a fundamental issue: identifying overdispersion.
2.2 Diagnostics of overdispersion
As mentioned in §1.2, teratological data typically exhibit overdispersion. However, this does not mean that teratological data are always overdispersed. A question naturally arises for this kind of data: how can we identify whether there is any overdispersion in the data, and, if there is, how can we quantify it relative to the binomial or multinomial distribution? As pointed out by Williams (1987), evidence of overdispersion comes from an analysis of the binomial dispersion statistics.
Consider the $f_{is}$ litters of size $s$ in dose group $i$. Let $\bar Y_{is}$ denote the mean of the responses $Y_{ij}$ for these litters, and let $U_{is}$ denote the sum of squares of the $Y_{ij}$ about this mean. Then the binomial dispersion statistic $D_{is}$ for these litters is
$$D_{is} = \frac{s\,U_{is}}{\bar Y_{is}\,(s - \bar Y_{is})},$$
which has expectation $E_{is}$ and variance $V_{is}$ under the binomial model; their explicit expressions, which involve $N_{is} = s f_{is}$, are given by Uspensky (1937, Chap. XI, §4).
If $\bar Y_{is} = 0$ or $\bar Y_{is} = s$, then $D_{is} = E_{is} = V_{is} = 0$. Overdispersion is diagnosed by an overall test that compares
$$T = \frac{\left(\sum D_{is} - \sum E_{is}\right)^2}{\sum V_{is}}$$
with the $\chi^2_1$ distribution. We use Paul's data (1982), given in Table 1.1, to demonstrate the method.
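As a minimal sketch of this diagnostic, the Python code below computes $D_{is}$ for one set of equal-sized litters and an overall statistic. It does not reproduce the exact null moments $E_{is}$ and $V_{is}$ cited from Uspensky (1937); instead it assumes the large-sample approximation $D_{is} \approx \chi^2_{f_{is}-1}$ under the binomial model, that is, $E_{is} \approx f_{is}-1$ and $V_{is} \approx 2(f_{is}-1)$. The function names are hypothetical.

```python
import math


def dispersion_statistic(counts, s):
    """D = s * U / (Ybar * (s - Ybar)) for f litters that all have size s."""
    f = len(counts)
    ybar = sum(counts) / f
    if ybar == 0 or ybar == s:
        return 0.0  # the text sets D = E = V = 0 in this degenerate case
    u = sum((y - ybar) ** 2 for y in counts)  # sum of squares about the mean
    return s * u / (ybar * (s - ybar))


def overall_dispersion_test(groups):
    """groups: list of (counts, s), one entry per (dose group, litter size) cell.

    ASSUMPTION: uses the chi-square(f - 1) approximation E = f - 1, V = 2(f - 1)
    in place of the exact null moments. Returns the statistic and its
    chi-square(1) upper-tail p-value.
    """
    d_sum = e_sum = v_sum = 0.0
    for counts, s in groups:
        f = len(counts)
        ybar = sum(counts) / f
        if ybar == 0 or ybar == s:
            continue  # contributes D = E = V = 0
        d_sum += dispersion_statistic(counts, s)
        e_sum += f - 1
        v_sum += 2 * (f - 1)
    if v_sum == 0:
        raise ValueError("no informative litters")
    stat = (d_sum - e_sum) ** 2 / v_sum
    p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    return stat, p_value


# Hypothetical counts: four litters of size 5, and two litters of size 5 with no responses.
print(overall_dispersion_test([([0, 1, 2, 3], 5), ([0, 0], 5)]))
```

The p-value uses the identity $P(\chi^2_1 > t) = \operatorname{erfc}(\sqrt{t/2})$, so no statistics library is needed.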
Table 2.1: Binomial dispersion statistics and their null expectations and variances evaluated for the data of Paul (1982).
The values of $D_{is}$, $E_{is}$ and $V_{is}$ are given in Table 2.1. In the majority of cases, $D_{is}$ exceeds its expectation $E_{is}$. We then compute the overall test statistic.
Recall our general setting for developmental toxicity data in Chapter 1: a typical teratology experiment consists of $t$ different dose groups, there are $m_i$ litters in the $i$th dose group, and the $j$th of these litters has size $n_{ij}$, of which $y_{ij}$ fetuses respond. In this section, for simplicity, we consider just one dose group and one litter, denoted by $(y, n)$. Let $\mathbf{Y} = (y_1, \ldots, y_n)^T$, where $y_k$ is the fetus-specific binary outcome taking the value 1 if malformed and 0 otherwise. Obviously, $y = \sum_{k=1}^{n} y_k$. The extension to general cases is straightforward.
Likelihood-based approaches use a marginal mean regression parameter and require full specification of the joint multivariate distribution through higher-moment assumptions. These marginal approaches are computationally intensive even in situations with a small number of fetuses in each independent experimental unit.
2.3.1 Beta-binomial Model (BB)
The beta-binomial distribution may be the most popular distribution employed for correlated binary data in likelihood methods. Building on earlier work by several statisticians (Williams, 1975; Haseman and Kupper, 1979; Segreti and Munson, 1981), the beta-binomial distribution was suggested for teratology data. The superiority of the beta-binomial model for the analysis of proportions has been shown by many authors (Paul, 1982; Pack, 1986).
• Likelihood
A beta-binomial model assumes that the malformation probability $P$ varies between litters according to a beta distribution with parameters $\alpha$ and $\beta$,
$$f(p) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha, \beta)}, \qquad 0 \le p \le 1,\ \alpha, \beta > 0,$$
where $B(\cdot, \cdot)$ denotes the beta function. Conditional on $P$, the number of malformations $Y$ in the litter follows a binomial distribution. The marginal distribution of $Y$ is
$$\Pr(Y = y \mid n) = \binom{n}{y}\frac{B(\alpha + y,\ n + \beta - y)}{B(\alpha, \beta)}, \qquad y = 0, 1, \ldots, n. \qquad (2.1)$$
An alternative parametrization is in terms of $(\mu, \theta)$, where
$$\mu = \frac{\alpha}{\alpha + \beta}, \qquad \theta = \frac{1}{\alpha + \beta}.$$
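In this parametrization, the above density is
$$\Pr(Y = y \mid n) = \binom{n}{y}\frac{\Gamma\!\left(\frac{\mu + y\theta}{\theta}\right)\Gamma\!\left(\frac{1 - \mu + (n-y)\theta}{\theta}\right)\Gamma\!\left(\frac{1}{\theta}\right)}{\Gamma\!\left(\frac{1 + n\theta}{\theta}\right)\Gamma\!\left(\frac{\mu}{\theta}\right)\Gamma\!\left(\frac{1-\mu}{\theta}\right)} = \binom{n}{y}\frac{\prod_{r=0}^{y-1}(\mu + r\theta)\prod_{r=0}^{n-y-1}(1 - \mu + r\theta)}{\prod_{r=0}^{n-1}(1 + r\theta)}.$$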
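• Mean, variance and correlation
$$E[Y] = E[E(Y \mid P)] = E(nP) = n\mu,$$
$$\mathrm{Var}[Y] = \mathrm{Var}[E(Y \mid P)] + E[\mathrm{Var}(Y \mid P)] = \mathrm{Var}(nP) + E[nP(1-P)] = n\mu(1-\mu)\left[1 + \frac{\theta}{1+\theta}(n-1)\right],$$
$$\mathrm{Corr}(Y_k, Y_l) = \frac{\mathrm{Cov}(Y_k, Y_l)}{\sqrt{\mathrm{Var}(Y_k)\,\mathrm{Var}(Y_l)}} = \frac{\theta}{1+\theta}.$$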
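The two forms of the beta-binomial probability and the moment formulas above can be checked numerically. The sketch below (hypothetical function names, standard library only) evaluates the probability through log-gamma functions and through the product form, then sums the distribution to recover $n\mu$ and $n\mu(1-\mu)[1 + (n-1)\theta/(1+\theta)]$.

```python
import math


def betabinom_pmf(y, n, mu, theta):
    """Beta-binomial Pr(Y = y | n) via log-gamma; alpha = mu/theta, beta = (1-mu)/theta."""
    a, b = mu / theta, (1.0 - mu) / theta
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    # log B(a + y, b + n - y) - log B(a, b), using B(u, v) = G(u) G(v) / G(u + v)
    log_ratio = (math.lgamma(a + y) + math.lgamma(b + n - y) - math.lgamma(a + b + n)
                 - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))
    return math.exp(log_choose + log_ratio)


def betabinom_pmf_product(y, n, mu, theta):
    """Same probability from the product form in (mu, theta)."""
    num = 1.0
    for r in range(y):
        num *= mu + r * theta
    for r in range(n - y):
        num *= 1.0 - mu + r * theta
    den = 1.0
    for r in range(n):
        den *= 1.0 + r * theta
    return math.comb(n, y) * num / den


def betabinom_moments(n, mu, theta):
    """Total probability, mean and variance obtained by summing the distribution."""
    probs = [betabinom_pmf(y, n, mu, theta) for y in range(n + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return sum(probs), mean, var


n, mu, theta = 10, 0.2, 0.25
print(betabinom_moments(n, mu, theta))
print(n * mu * (1 - mu) * (1 + theta / (1 + theta) * (n - 1)))  # closed-form variance
```

Working on the log scale avoids overflow of the gamma functions when $\theta$ is small (large $\alpha + \beta$).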
The beta-binomial model only handles data exhibiting overdispersion. Prentice (1986) presented an extended beta-binomial model which allows overdispersion as well as underdispersion.
2.3.2 Correlated-binomial Model (CB)
The correlated-binomial model was proposed by Kupper and Haseman (1978) and, independently, as an “additive binomial” generalization of the binomial distribution, by Altham (1978). This distribution is derived under the assumption that the responses of the fetuses in a litter are not mutually independent. There is a constant probability of malformation, $\mu$, for all litters within a treatment group, and we introduce $\rho$, the correlation between the responses of any two littermates.
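For concreteness, a sketch of a correlated-binomial probability function is given below. It assumes the commonly quoted Kupper and Haseman (1978) form, a second-order Bahadur expansion in which the binomial probability is multiplied by $1 + \frac{\phi}{2\mu^2(1-\mu)^2}\{(y-n\mu)^2 + y(2\mu-1) - n\mu^2\}$, with $\phi = \rho\mu(1-\mu)$ the covariance between two littermates. This parametrization is our assumption and may differ from the one used in the thesis. Because the correction factor can become negative, the code also checks whether a given $(\mu, \rho)$ yields a valid distribution; the function names are hypothetical.

```python
import math


def correlated_binomial_pmf(y, n, mu, rho):
    """Correlated-binomial Pr(Y = y | n), ASSUMED Kupper-Haseman form.

    phi = rho * mu * (1 - mu) is the covariance between two littermates.
    """
    q = 1.0 - mu
    phi = rho * mu * q
    binom = math.comb(n, y) * mu ** y * q ** (n - y)
    g = (y - n * mu) ** 2 + y * (2.0 * mu - 1.0) - n * mu ** 2
    return binom * (1.0 + phi / (2.0 * mu ** 2 * q ** 2) * g)


def is_valid(n, mu, rho):
    """True if every probability is nonnegative (the bound that must be imposed on rho)."""
    return all(correlated_binomial_pmf(y, n, mu, rho) >= 0.0 for y in range(n + 1))


n, mu, rho = 6, 0.3, 0.05
probs = [correlated_binomial_pmf(y, n, mu, rho) for y in range(n + 1)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(sum(probs), mean, var, n * mu * (1 - mu) * (1 + (n - 1) * rho))
```

The correction term has zero mean and is uncorrelated with $y$ under the binomial, so the probabilities still sum to one, the mean stays $n\mu$, and only the variance is inflated to $n\mu(1-\mu)[1+(n-1)\rho]$.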
• Likelihood
Bahadur (1961) showed that when mutual independence does not hold, the usual binomial form for $\Pr(Y = y \mid n)$ is multiplied by a correction factor as
Bahadur has shown that
2.3.3 Beta-correlated binomial Model (BCB)
Paul (1987) presents a model that incorporates separate parameters to describe both the intraclass (intralitter) correlation and the heterogeneity of outcome rates ($p_i$) between litters within a dose group. He assumes that, conditional on $P$, the litter counts $Y$ follow the correlated-binomial distribution $CB(n, p, \phi)$ (see §2.3.2), while the malformation probability $P$ follows a beta distribution with parameters $\alpha$ and $\beta$ instead of being constant. This model can be viewed as a generalization of the binomial, beta-binomial and correlated-binomial distributions, and Paul recommended it for the separate assessment of intralitter and interlitter sources of variability.
$$\cdots \binom{n}{y}\frac{B(\alpha + y,\ n + \beta - y)}{B(\alpha, \beta)} \cdots$$
Since this distribution is derived from the correlated-binomial distribution, a bound similar to the one for the correlated-binomial distribution has to be imposed on $\rho$, the correlation coefficient between any two littermates $Y_k$ and $Y_l$, for the probabilities to be nonnegative:
• Mean, Variance and correlation
$$E[Y] = E[E(Y \mid P)] = E(nP) = n\mu, \qquad \mathrm{Var}[Y] = \frac{n\mu(1-\mu)(n\theta + 1)}{1 + \theta} + n(n-1)\phi,$$
distribution (2.7)
mixture model can be written as
$$l = \gamma\, l^{(1)} + (1 - \gamma)\, l^{(2)}.$$
The first derivatives with respect to $\gamma$, $\nu$, $\mu$ and $\theta$ are:
$$\frac{\partial l}{\partial \mu} = \gamma \sum_{r=0}^{y-1} \cdots$$
The use of finite mixture models for proportions has received attention from a number of statisticians, such as William (1988) and Brooks et al. (1997). In their paper, Brooks et al. (1997) concluded that even when non-mixture models provide an acceptable description of the main body of the data, a more complicated mixture model is still worth considering.
2.4 Non-likelihood-based Models
The quasi-likelihood approach and the generalized estimating equations approach are the two main non-likelihood-based tools for the analysis of teratology data. To demonstrate these two approaches, we consider the multiple-dose case and adopt the following setting in this section:
There are $t$ dose groups in the teratology experiment. Let $m_i$ be the number of litters exposed to dose $d_i$ ($i = 1, 2, \ldots, t$), and $n_{ij}$ the size of the $j$th litter in group $i$ ($j = 1, 2, \ldots, m_i$). Among the $n_{ij}$ fetuses, $y_{ij}$ respond. The probability of an adverse event for a fetus in dose group $i$ is $\mu_i$, $i = 1, 2, \ldots, t$. When multiple doses are considered, the response probability $\mu_i$ is generally assumed to follow the logistic model
$$\mu_i = g(x_i) = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}, \qquad (2.10)$$
where the covariate $x_i$ is the dose level of the $i$th group. Recall that we adopt a constant intraclass correlation $\rho_i$ within dose group $i$ in our general setting for teratology data (see §1.2). Thus the mean and variance of $Y_{ij}$ are
$$E(Y_{ij}) = n_{ij}\mu_i, \qquad (2.11)$$
$$\mathrm{Var}(Y_{ij}) = V_{ij} = n_{ij}\mu_i(1-\mu_i)\{1 + (n_{ij}-1)\rho_i\}. \qquad (2.12)$$
2.4.1 Quasi-likelihood Approach
The main idea behind the quasi-likelihood approach (Wedderburn, 1974) is to avoid a fully specified distribution for the response variable $Y_{ij}$ when one is uncertain about the random mechanism by which the data were generated. Instead, only the relationship between the variance and the mean of $Y_{ij}$ is specified, namely (2.12).
To estimate $\beta = (\beta_0, \beta_1)^T$, we solve the quasi-score equations
for $\rho_i$, $i = 1, \ldots, t$ (see Liang and Hanfelt, 1994; Kuk, 2003).
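Under the mean (2.11), the variance (2.12) and the logistic model (2.10), $\partial(n_{ij}\mu_i)/\partial\beta = n_{ij}\mu_i(1-\mu_i)(1, x_i)^T$, so each litter contributes $(1, x_i)^T(y_{ij} - n_{ij}\mu_i)/\{1 + (n_{ij}-1)\rho_i\}$ to the quasi-score. The sketch below solves these equations by Fisher scoring while treating the $\rho_i$ as known; this is an illustrative assumption, not the thesis's procedure, which also estimates $\rho_i$. The function name and data are hypothetical.

```python
import math


def fit_quasi_likelihood(x, litters, rho, tol=1e-10, max_iter=100):
    """Fisher scoring for the quasi-score equations of the logistic dose-response model.

    x       : dose level x_i of each group
    litters : for each group, a list of (y_ij, n_ij) litter counts
    rho     : intraclass correlation rho_i of each group, treated as KNOWN
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        s0 = s1 = 0.0            # quasi-score vector
        i00 = i01 = i11 = 0.0    # expected information D^T V^{-1} D
        for xi, group, r in zip(x, litters, rho):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # logistic model (2.10)
            for y, n in group:
                w = 1.0 / (1.0 + (n - 1) * r)   # 1 / {1 + (n_ij - 1) rho_i}
                u = w * (y - n * mu)
                s0 += u
                s1 += u * xi
                h = w * n * mu * (1.0 - mu)
                i00 += h
                i01 += h * xi
                i11 += h * xi * xi
        det = i00 * i11 - i01 * i01
        if det <= 0:
            raise ValueError("information matrix is singular")
        step0 = (i11 * s0 - i01 * s1) / det   # solve I * step = score (2 x 2)
        step1 = (i00 * s1 - i01 * s0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


x = [0.0, 1.0]
litters = [[(2, 10), (3, 10)], [(5, 10), (7, 10)]]
print(fit_quasi_likelihood(x, litters, rho=[0.1, 0.2]))
```

With two dose groups and two parameters the equations force each group's weighted residual sum to zero, so this toy example reproduces the observed proportions 0.25 and 0.6, i.e. $\hat\beta_0 = \log(1/3)$ and $\hat\beta_1 = \log 4.5$.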
The variance-covariance matrix of the quasi-likelihood estimate $\hat\beta$ can be
2.4.2 Generalized Estimating Equations
The generalized estimating equations (GEE) methodology for the analysis of correlated binary data is a marginal approach proposed by Liang and Zeger (1986) and Zeger and Liang (1986). The GEE approach is an extension of quasi-likelihood to longitudinal data analysis, or equivalently an extension of the generalized linear model estimating equations to multivariate responses. Unlike the quasi-likelihood approach and GLMs, which still make between- and within-cluster independence assumptions, the GEE further relaxes the assumption of within-cluster independence. Because of the close connection to quasi-likelihood models, the optimality properties of quasi-likelihood can be extended to the solution of the GEE (Liang et al., 1992).
Unlike quasi-likelihood models for the litter counts $y_{ij}$, the GEE method works with the fetus-specific outcomes $y_{ijk}$. Let $\mathbf{Y}_{ij} = (y_{ij1}, \ldots, y_{ijn_{ij}})^T$ denote the vector of responses for the $j$th litter in dose group $i$. The GEE can be written in vector form as
$$\sum_{i=1}^{t}\sum_{j=1}^{m_i} D_{ij}^T V_{ij}^{-1}\left(\mathbf{Y}_{ij} - \boldsymbol{\mu}_i\right) = 0, \qquad (2.15)$$
where $\boldsymbol{\mu}_i$ is the mean vector of $\mathbf{Y}_{ij}$, $D_{ij}$ is the $n_{ij} \times p$ design matrix $\partial\boldsymbol{\mu}_i/\partial\beta$, and $V_{ij}$ is the $n_{ij} \times n_{ij}$ assumed covariance matrix of $\mathbf{Y}_{ij}$. Liang and Zeger (1986) suggest writing $V_{ij}$ in the form
$$V_{ij} = A_{ij}^{1/2} R_i(\alpha) A_{ij}^{1/2},$$
where $A_{ij}$ is the diagonal matrix
$$A_{ij} = \mathrm{diag}[\mathrm{Var}(Y_{ijk})] = \mathrm{diag}[\sigma_i^2] = \mathrm{diag}[\mu_i(1-\mu_i)],$$
and $R_i(\alpha)$, also called the “working correlation”, is a suitable correlation matrix for the fetus-specific outcome vector $\mathbf{Y}_{ij}$, indexed by a vector of parameters $\alpha$. Liang and Zeger (1986) suggest estimating $\alpha$ iteratively using moment-type estimators based on the residuals at each iteration. The working correlation can be chosen as independence, exchangeable, AR(1) or unstructured. The exchangeable (equicorrelated) assumption
$$\mathrm{Corr}(Y_{ijk}, Y_{ijl}) = \rho_i, \qquad k \neq l,$$
corresponding to $[R_i]_{kk} = 1$ and $[R_i]_{kl} = \rho_i$ ($k \neq l$), seems reasonable for developmental teratology data.
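If the assumed covariance of $\mathbf{Y}_{ij}$ is correct, the covariance of the estimated parameters, also called the model-based covariance, is estimated by the so-called naive estimator
$$\widehat{\mathrm{Cov}}(\hat\beta) = \left(\sum_{i=1}^{t}\sum_{j=1}^{m_i} D_{ij}^T V_{ij}^{-1} D_{ij}\right)^{-1}. \qquad (2.16)$$
Otherwise, the robust (sandwich) estimator
$$\widehat{\mathrm{Cov}}(\hat\beta) = I_0^{-1} I_1 I_0^{-1} \qquad (2.17)$$
is used, where
$$I_0 = \sum_{i=1}^{t}\sum_{j=1}^{m_i} D_{ij}^T V_{ij}^{-1} D_{ij}, \qquad I_1 = \sum_{i=1}^{t}\sum_{j=1}^{m_i} D_{ij}^T V_{ij}^{-1}\left(\mathbf{Y}_{ij} - \boldsymbol{\mu}_i\right)\left(\mathbf{Y}_{ij} - \boldsymbol{\mu}_i\right)^T V_{ij}^{-1} D_{ij}.$$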
Regardless of the choices of $R_i$ and $A_{ij}$, the estimators are consistent provided that the mean model is correctly specified, and as long as the chosen $R_i$ and $A_{ij}$ are reasonable for the data, the GEE approach yields highly efficient parameter estimates (see Liang and Zeger, 1986; Zeger and Liang, 1986).
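At the litter level (all fetuses in a litter share the mean $\mu_i$ and an exchangeable correlation is assumed), $D_{ij}^T V_{ij}^{-1} D_{ij}$ reduces to $n_{ij}\mu_i(1-\mu_i)(1, x_i)^T(1, x_i)/\{1 + (n_{ij}-1)\rho_i\}$. The sketch below computes the naive estimator $I_0^{-1}$ and the sandwich $I_0^{-1} I_1 I_0^{-1}$ at a given $(\beta_0, \beta_1)$ under those assumptions, treating the $\rho_i$ as known; the function names and data are hypothetical.

```python
import math


def _inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def _mul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def naive_and_sandwich(x, litters, rho, b0, b1):
    """Model-based I0^{-1} and robust I0^{-1} I1 I0^{-1} for (b0, b1), litter-count form."""
    i0 = [[0.0, 0.0], [0.0, 0.0]]
    i1 = [[0.0, 0.0], [0.0, 0.0]]
    for xi, group, r in zip(x, litters, rho):
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        xt = (1.0, xi)
        for y, n in group:
            w = 1.0 / (1.0 + (n - 1) * r)
            h = w * n * mu * (1.0 - mu)   # D^T V^{-1} D factor for this litter
            u = w * (y - n * mu)          # D^T V^{-1} (y - n mu) factor
            for a in range(2):
                for c in range(2):
                    i0[a][c] += h * xt[a] * xt[c]
                    i1[a][c] += u * u * xt[a] * xt[c]
    naive = _inv2(i0)
    robust = _mul2(_mul2(naive, i1), naive)
    return naive, robust


x = [0.0, 1.0]
litters = [[(2, 10), (3, 10)], [(5, 10), (7, 10)]]
naive, robust = naive_and_sandwich(x, litters, [0.1, 0.2], math.log(1 / 3), math.log(4.5))
print(naive)
print(robust)
```

The naive matrix is trustworthy only if (2.12) is the true variance; the sandwich stays valid under a misspecified working covariance, at the price of more variability in small samples.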
One interesting question, which has not been addressed in the literature, is whether modelling the fetus-specific vectors $\mathbf{Y}_{ij}$ can result in more efficient estimation than modelling $y_{ij}$, the aggregated response of each litter. With this in mind, we further investigate the GEE (2.15) by expanding it in vectors.
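One way to see what the vector expansion involves: if every fetus in a litter has mean $\mu_i$ and the working correlation is exchangeable, then $R_i\mathbf{1} = \{1 + (n_{ij}-1)\rho_i\}\mathbf{1}$, hence $\mathbf{1}^T R_i^{-1} = \mathbf{1}^T/\{1 + (n_{ij}-1)\rho_i\}$, and the fetus-level contribution $D_{ij}^T V_{ij}^{-1}(\mathbf{Y}_{ij} - \mu_i\mathbf{1})$ collapses to the litter-count term $(1, x_i)^T(y_{ij} - n_{ij}\mu_i)/\{1 + (n_{ij}-1)\rho_i\}$. The sketch below checks this numerically by solving the full $n_{ij} \times n_{ij}$ system; it is an illustration under these assumptions, with hypothetical names and data.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gee_contribution_fetus_level(y_vec, mu, rho, x):
    """D^T V^{-1} (Y - mu 1) with V = A^{1/2} R A^{1/2}, exchangeable R, common mean mu."""
    n = len(y_vec)
    var = mu * (1.0 - mu)
    v = [[var if k == l else rho * var for l in range(n)] for k in range(n)]
    z = solve(v, [yk - mu for yk in y_vec])  # V^{-1} (Y - mu 1)
    d = mu * (1.0 - mu)                      # every row of D_ij equals d * (1, x)
    s = d * sum(z)
    return (s, s * x)


def gee_contribution_litter_count(y, n, mu, rho, x):
    """(1, x) (y - n mu) / (1 + (n - 1) rho), using only the litter total y."""
    u = (y - n * mu) / (1.0 + (n - 1) * rho)
    return (u, u * x)


y_vec = [1, 0, 1, 1, 0, 0, 1]
fetus = gee_contribution_fetus_level(y_vec, 0.3, 0.15, 2.0)
litter = gee_contribution_litter_count(sum(y_vec), len(y_vec), 0.3, 0.15, 2.0)
print(fetus, litter)
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(fetus, litter)))
```

Under these assumptions the fetus-level contribution depends on $\mathbf{Y}_{ij}$ only through the total $y_{ij}$.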
2.5 Estimating Intraclass Correlation
In the analysis of littermate data from teratology studies, the intraclass correlation parameter, or dispersion parameter, receives considerable attention. Firstly, it is an important quantitative measure of similarity between fetuses within litters; secondly, in many cases the validity of statistical inference for the mean parameters depends on a good estimate of the intraclass correlation. For example, to estimate the mean parameters $\beta$ in the GEE approach, we need to solve (2.15), which involves the correlation matrix $R_i(\alpha)$, so an estimator of the intraclass correlation is necessary. In addition, estimating the covariance of $\hat\beta$, $\mathrm{Cov}(\hat\beta)$, also requires the intraclass correlation to be specified.
Usually, we assume that the correlation between any pair of responses within each dose group is constant, $\mathrm{Corr}(Y_{ijk}, Y_{ijl}) = \rho_i$, $k \neq l$, for any litter $j$ in dose group $i$. In other words, the intraclass correlation depends only on the dose level. For simplicity, we consider just one dose group ($t = 1$) in this section. The corresponding setup is: there are $m$ dams, $y_j$ of the $n_j$ fetuses of dam $j$ respond, and the correlation between fetuses $k$ and $l$ is a constant, $\mathrm{Corr}(Y_{jk}, Y_{jl}) = \rho$. Extension of the following estimators of the intraclass correlation to multiple doses is straightforward.
Various estimators of the intraclass correlation have been proposed for binary data, and several authors have provided excellent reviews of this topic. For example, Ridout et al. (1999) compare the bias, standard deviation, mean squared error and efficiency of 20 estimators, and Paul et al. (2003) further compare these properties for 26 estimators.
Intraclass correlation parameters can be obtained by maximum likelihood based on a parametric model, such as the BB, CB or BCB model (see §2.3); the corresponding score functions are given in (2.5), (2.8), (2.9), etc. Estimators based on parametric models may suffer from inefficiency or bias when the likelihood is misspecified, which motivates consideration of other, more robust estimators. Here we list several estimators that perform well in the simulation studies conducted by Paul and Saha.
• The Analysis of Variance Estimator
This estimator, originally proposed for continuous data and later used by various authors including Elston (1977) and Ridout et al. (1999), is given by
$$\hat\rho_{AOV} = \frac{MS_b - MS_w}{MS_b + (n_0 - 1)MS_w},$$
where $MS_b$ and $MS_w$ are, respectively, the between-group and within-group mean squares from a one-way analysis of variance of the binary data $y_{jk}$, and
$$n_0 = \frac{1}{m-1}\left(N - \frac{\sum_j n_j^2}{N}\right), \qquad N = \sum_j n_j.$$
As pointed out by Paul et al. (2003), this estimator has excellent statistical properties, including the least bias, standard deviation and mean squared error.
• The Estimator Based on Optimal Quadratic Estimating Equations
Following Crowder (1987), Paul (2001) obtained a set of unbiased estimating equations for the regression and dispersion parameters. Let $Z_j = Y_j/n_j$. The estimating equations for $\mu$ and $\rho$ take the general form given by Crowder (1987), in which $\gamma_{1j}$ and $\gamma_{2j}$ denote the skewness and kurtosis of $Z_j$.
In practice we lack information about the true skewness and kurtosis of the distribution that generates the data. Paul (2001) suggested using the beta-binomial distribution for the 2nd, 3rd and 4th moments, i.e.
$$\kappa_{2j} = \mu(1-\mu)\{1 + (n_j - 1)\rho\}/n_j,$$
$$\kappa_{3j} = \kappa_{2j}(1 - 2\mu)\{1 + (2n_j - 1)\rho\}/\{n_j(1 + \rho)\},$$
An estimator based on the optimal quadratic estimating equations not only performs as well as the analysis of variance estimator, but also has consistently high efficiency and the least variability.
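The analysis of variance estimator is simple to compute from litter totals, because for binary data the between- and within-litter sums of squares depend only on $y_j$ and $n_j$. The sketch below uses the usual one-way ANOVA definitions $MS_b = \{\sum_j y_j^2/n_j - (\sum_j y_j)^2/N\}/(m-1)$ and $MS_w = \{\sum_j y_j - \sum_j y_j^2/n_j\}/(N-m)$, together with $n_0 = (N - \sum_j n_j^2/N)/(m-1)$. These definitions are stated as assumptions following the standard form, and the function name is hypothetical.

```python
def rho_anova(y, n):
    """ANOVA estimator of the intraclass correlation from litter totals.

    y: number of responding fetuses per litter; n: litter sizes.
    """
    m, N = len(y), sum(n)
    if m < 2 or N <= m:
        raise ValueError("need at least two litters and N > m")
    sum_y = sum(y)
    sum_y2_n = sum(yj * yj / nj for yj, nj in zip(y, n))
    ms_b = (sum_y2_n - sum_y ** 2 / N) / (m - 1)   # between-litter mean square
    ms_w = (sum_y - sum_y2_n) / (N - m)            # within-litter mean square
    n0 = (N - sum(nj * nj for nj in n) / N) / (m - 1)
    return (ms_b - ms_w) / (ms_b + (n0 - 1) * ms_w)


print(rho_anova([2, 0], [2, 2]))  # 1.0: littermates always agree
print(rho_anova([1, 1], [2, 2]))  # -1.0: littermates always disagree
```

The two toy calls show the extremes: perfectly concordant litters give $\hat\rho_{AOV} = 1$, and perfectly discordant pairs give the lower limit $-1/(n_0 - 1) = -1$.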
In the parametric approach, the likelihood may be difficult to specify and justify. The GEE approach bypasses specification of the likelihood; it is robust and can give satisfactory parameter estimates. However, it requires supplementary estimating functions for the variance/correlation parameters, and the moment estimation method for $\alpha$ (in our case, $\rho$) may lead to infeasibility problems (see Crowder, 1995). We therefore introduce the Gaussian working likelihood approach for the analysis of correlated binary data.
Chapter 3
Gaussian Working Likelihood Approach
Whittle (1961) introduced Gaussian estimation, which uses a normal log-likelihood as a criterion function without assuming that the data are normally distributed. Crowder (1985) applied this estimating method to correlated binomial data and found that it works well even for this type of data. A possible reason is that the binomial distribution is approximately Gaussian as $n \to \infty$. We may therefore expect Gaussian estimation to perform less satisfactorily for correlated binomial data when the sample size is small or the variance of the data is misspecified.
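To make the idea concrete, the sketch below writes down a Gaussian working log-likelihood for the litter counts of a single dose group, using the mean $n_j\mu$ and the variance $n_j\mu(1-\mu)\{1 + (n_j-1)\rho\}$ from Chapter 1, and maximizes it over a grid. This is plain (not decoupled) Gaussian estimation, restricted for illustration to $0 \le \rho < 1$ and a crude grid search; the function names and data are hypothetical.

```python
import math


def gaussian_working_loglik(mu, rho, litters):
    """Normal log-likelihood (constant dropped), used only as a criterion function.

    litters: list of (y_j, n_j); mean n_j*mu, variance n_j*mu*(1-mu)*(1+(n_j-1)*rho).
    """
    total = 0.0
    for y, n in litters:
        v = n * mu * (1.0 - mu) * (1.0 + (n - 1) * rho)
        if v <= 0:
            return -math.inf  # infeasible (mu, rho) combination
        total -= 0.5 * (math.log(v) + (y - n * mu) ** 2 / v)
    return total


def gaussian_grid_estimate(litters, steps=200):
    """Maximize the working log-likelihood over mu in (0, 1) and rho in [0, 1)."""
    best_ll, best_mu, best_rho = -math.inf, None, None
    for a in range(1, steps):
        mu = a / steps
        for b in range(steps):
            rho = b / steps
            ll = gaussian_working_loglik(mu, rho, litters)
            if ll > best_ll:
                best_ll, best_mu, best_rho = ll, mu, rho
    return best_mu, best_rho


litters = [(1, 10), (9, 10), (2, 10), (8, 10)]
print(gaussian_grid_estimate(litters))
```

For these strongly overdispersed toy litters the criterion is maximized at $\mu = 0.5$ with $\hat\rho$ near $4/9$. A practical implementation would replace the grid with a numerical optimizer.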
We now adopt a widely used assumption in teratology data: the response