Overdispersion, bias and efficiency in teratology data analysis



Overdispersion, Bias and Efficiency in Teratology Data Analysis

Min ZHU

NATIONAL UNIVERSITY OF SINGAPORE

2004


Min ZHU (B.Sc., University of Science & Technology of China)

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE

DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY

NATIONAL UNIVERSITY OF SINGAPORE

2004


For the completion of this thesis, I would like to express my heartfelt gratitude to my supervisor, Associate Professor Yougan Wang, for all his invaluable advice and guidance, endless patience, kindness and encouragement during my mentorship in the Department of Statistics and Applied Probability of the National University of Singapore. I have learned many things from him, especially regarding academic research and character building. I truly appreciate all the time and effort he has spent helping me solve the problems I encountered, even when he was in the midst of his own work.

I also wish to express my sincere gratitude and appreciation to my other lecturers, namely Professors Zhidong Bai and Zehua Chen, for imparting knowledge and techniques to me, and for their precious advice and help in my study.

It is a great pleasure to record my thanks to the other members and staff of the department for their help in various ways and for providing such a pleasant working environment, especially to Mrs Yvonne Chow for advice in computing, and to Irene Tan and others for administrative matters. I also wish to thank NUS for providing me with the overseas graduate student scholarship.

Finally, I would like to dedicate the completion of this thesis to my dearest family, who have always supported me with their encouragement and understanding in all my years. Special thanks to all my friends who helped me in one way or another, for their friendship and encouragement.


1.1 Teratology Studies
1.2 Examples and Notation
1.3 Organization of the Thesis

2 The Models and Estimation
2.1 Introduction
2.2 Diagnostics of overdispersion
2.3 Likelihood-based Models
2.3.1 Beta-binomial Model (BB)
2.3.2 Correlated-binomial Model (CB)
2.3.3 Beta-correlated binomial Model (BCB)
2.3.4 Mixture Models
2.4 Non-likelihood-based Models
2.4.1 Quasi-likelihood Approach
2.4.2 Generalized Estimating Equations
2.5 Estimating Intraclass Correlation
2.6 Summary

3 Gaussian Working Likelihood Approach
3.1 Decoupled Gaussian Estimation
3.2 Model Comparisons
3.3 Simulation Setup
3.3.1 Beta-binomial data
3.3.2 Non-beta-binomial data
3.4 Simulation Studies
3.5 Discussion

4 Dose-response Models Incorporating Risk of Death in Utero
4.1 Introduction
4.2 The model
4.3 Specification of the mean and variance functions
4.4 An Example
4.5 Discussion


List of Tables

1.1 Data presented in Paul (1982)
1.2 Sample No. 3 of Haseman and Soares (1976)
2.1 Binomial dispersion statistics and their null expectations and variances evaluated for the data of Paul (1982)
3.1 Distribution of the number of live fetuses per litter ($n_{ij}$) used in the simulation studies
3.2 Different correlation combinations considered in the exponential-gamma approach
3.3 Different response probabilities considered in the step-function approach
3.4 Biases of $\hat\beta_0$ and $\hat\beta_1$ under various methods
3.5 MSEs of $\hat\beta_0$ and $\hat\beta_1$ under various methods
3.6 Biases of $\hat\rho$ from four different methods
3.7 MSEs of $\hat\rho$ from four different methods
3.8 Biases of $\hat\theta_T$ for data generated from the exponential-gamma approach
3.9 MSEs of $\hat\theta_T$ for data generated from the exponential-gamma approach
3.10 Biases of $\hat\theta_T$ for data generated from the step-function approach
3.11 MSEs of $\hat\theta_T$ for data generated from the step-function approach
4.1 Summary of the data of Lüning et al. (1966)


List of Figures

1.1 Data structure of teratology studies
4.1 Estimated risk functions using the dominant lethal assay data


We consider statistical methods for the analysis of teratology data and investigate the bias and efficiency of different estimators in the presence of overdispersion. In particular, we focus on the decoupled Gaussian method for the analysis of correlated binary data. Both analytic and simulation studies are carried out to evaluate different models. It is found that the decoupled Gaussian method works especially well for joint estimation of the mean and intraclass correlation parameters.

Previous modelling efforts have usually been conducted on viable fetuses alone. To incorporate the information in prenatal dead/resorbed fetuses, we propose a new approach for the joint analysis of prenatal death/resorption and malformation. This approach has several advantages: (i) it provides a convenient way of modelling the unobserved in utero death as well as observed defects in risk assessment; (ii) it allows flexible choices in ordinal categorical data analysis; and (iii) efficient statistical inference can be obtained within the framework of generalized estimating equations. Real data analyses are provided for demonstration.


My major contributions in this thesis are as follows:

(i) to verify that using litter counts instead of fetus-specific outcomes does not lose any information for the GEE approach to estimating mean parameters (§4.2, Chapter

(iv) to propose a multivariate model incorporating the risk of death in utero by modelling the probability of being observed (Chapter 4).


to the growth of the drug industry and the number of people suffering toxic reactions to drugs.

To identify such toxicity, three test designs (Segments I, II and III) were established by the FDA in 1966 to assess specific types of effects (Molenberghs, Declerck and Aerts, 1998):

Segment I studies, known as “fertility studies”, are designed to assess male and female fertility and general reproductive ability. Such studies typically involve exposing males for 60 days and females for 14 days before mating.

Segment II studies are also referred to as “teratology studies”, since historically their primary goal was to study malformations. The word “teratology” derives from the Greek “teras”, meaning monster. The methodologies discussed in this thesis mainly apply to Segment II studies.

Segment III tests focus on effects later in gestation and involve exposing pregnant animals from the 15th day of gestation through lactation.

Animal laboratory experiments are used in each of the three test designs and provide good evidence for identifying toxicity. A typical teratology experiment distributes timed-pregnant animals (mice, rats and occasionally rabbits) among several groups treated with varying doses of a compound and a control group. These dams are exposed during major organogenesis (days 6 to 15 for mice and rats) and structural development. All dams are sacrificed prior to normal delivery, at which time the uterine contents are thoroughly examined to study the reproductive and developmental toxicity of the test agent. The number of implantations (sometimes called implantation sites) is counted. An implant or fetus may be resorbed at different stages during gestation, or die before birth; if it survives, growth reduction, such as weight loss, may occur; it may further exhibit various types of structural variation, or even one or more types of malformation (Zhu and Fung, 1996). Figure 1.1 illustrates the teratology data structure.

[Figure 1.1: Data structure of teratology studies (diagram not reproduced)]

Since laboratory experiments involve considerable amounts of time and money, as well as large numbers of animals, it is essential that the most appropriate and efficient statistical models are used (Williams and Ryan, 1996). In addition, the analysis of teratology data, which combines multivariate and clustered-data issues, raises a number of challenges.

Although a teratology experiment yields dichotomous as well as continuous outcomes, in this thesis we focus on dichotomous outcomes, namely the occurrence of malformations or fetal deaths. We now introduce several teratology data sets that will be used in the thesis, to give an intuitive understanding of teratology data.


Example 1.1 Paul (1982)

The data (from Shell Toxicology Laboratory, Sittingbourne Research Centre, Sittingbourne, Kent, England), analyzed by Paul (1982), are given in Table 1.1. The species used in the experiment is the banded Dutch rabbit, and skeletal and visceral abnormalities were observed. Here $n$ denotes the number of live fetuses in a litter and $s$ the number of those affected by treatment, in each dose group. For example, the first pregnant female in the control group has one abnormal fetus out of twelve live fetuses, and the first pregnant female in the low-dose treatment group has no abnormal fetuses out of five.

Table 1.1: Data presented in Paul (1982)

Example 1.2 Haseman and Soares (1976)

The data in Table 1.2 are taken from Haseman and Soares (1976) and describe one of the control groups from dominant lethal assays. In this experiment, a drug's ability to cause damage to reproductive genetic material sufficient to kill the fertilized egg or developing embryo is tested by dosing a male mouse and mating it to one or more females. A significant increase in fetal deaths is then indicative of an adverse effect.

Table 1.2: Sample No 3 of Haseman and Soares (1976)

Observed frequency distribution of fetal death in mice

we adopt for teratology data in the following paragraphs

In general, we use capital letters to represent random variables or matrices, relying on the context to distinguish the two, and small letters for specific observations. Scalars and matrices are in normal type, and vectors are in bold type.

In a typical teratology experiment, suppose we have $t$ different dose groups. There are $m_i$ litters under the $i$th dose group. The $j$th of these $m_i$ litters has size $n_{ij}$, and $y_{ij}$ of its $n_{ij}$ fetuses are abnormal. We further write the binary response vector from litter $j$ in dose group $i$ as $\mathbf{Y}_{ij} = (y_{ij1}, \dots, y_{ijn_{ij}})^T$, where $y_{ijk}$ takes the value 1 if the $k$th fetus in the $j$th litter under the $i$th group is abnormal, and 0 otherwise. Obviously, $y_{ij} = \sum_k y_{ijk}$. It may be useful to refer to the diagram below for a quick check of the notation.

The response probability is assumed to be the same for all fetuses in the same dose group; specifically, $\Pr(Y_{ijk} = 1) = \mu_i$ for all $j$ and $k$. Furthermore, the responses of fetuses from different litters are assumed to be independent.

Because of genetic similarity and the same treatment conditions, fetuses of the same mother behave more alike than fetuses of different mothers. This has been termed the “litter effect” and is one important form of clustering. As a result, responses of different fetuses within the same litter are likely to be correlated. Assume that the intraclass correlation between any pair of binary responses depends on the dose level only, that is, $\mathrm{Corr}(Y_{ijk}, Y_{ijl}) = \rho_i$ for $k \neq l$ and every litter $j$ in dose group $i$. It follows that the litter count $Y_{ij} = \sum_k Y_{ijk}$ has variance

$$\mathrm{Var}(Y_{ij}) = n_{ij}\,\mu_i(1-\mu_i)\,[1 + (n_{ij}-1)\rho_i],$$

so that the $Y_{ij}$ are overdispersed relative to the binomial distribution if $\rho_i > 0$ and underdispersed if $\rho_i < 0$. In practice, overdispersion is much more common than underdispersion. As an important characteristic of teratological data, this extra variation must be taken into account for valid statistical inference.
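The variance formula follows from summing the $n_{ij}$ fetus-level variances $\mu_i(1-\mu_i)$ and the $n_{ij}(n_{ij}-1)$ pairwise covariances $\rho_i\mu_i(1-\mu_i)$. As a quick arithmetic check, the Python sketch below compares the closed form with $\mathbf{1}^T\Sigma\mathbf{1}$ for an exchangeable covariance matrix $\Sigma$. The function names are illustrative and do not come from the thesis.

```python
import numpy as np

def litter_count_variance(n, mu, rho):
    """Closed form: Var(Y) = n mu (1 - mu) [1 + (n - 1) rho]."""
    return n * mu * (1.0 - mu) * (1.0 + (n - 1) * rho)

def variance_via_covariance_matrix(n, mu, rho):
    """Sum all entries of the exchangeable covariance matrix of the n binary responses."""
    sigma2 = mu * (1.0 - mu)                  # Var(Y_k) for a single fetus
    cov = np.full((n, n), rho * sigma2)       # Cov(Y_k, Y_l) = rho * sigma2 for k != l
    np.fill_diagonal(cov, sigma2)
    ones = np.ones(n)
    return float(ones @ cov @ ones)           # Var(sum_k Y_k) = 1' Sigma 1

# A litter of 12 with mu = 0.1 and rho = 0.2: the binomial variance 1.08 is inflated by 3.2
print(litter_count_variance(12, 0.1, 0.2))           # approximately 3.456
print(variance_via_covariance_matrix(12, 0.1, 0.2))  # same value up to rounding
```

Setting `rho = 0` recovers the ordinary binomial variance $n\mu(1-\mu)$.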

This chapter has given a brief introduction to teratology data. Chapter 2 reviews the existing methods for the analysis of teratology data and compares the advantages and disadvantages of each method. Chapter 3 focuses on the decoupled Gaussian approach for the analysis of correlated binary data; simulation studies are carried out to investigate the bias and efficiency of different estimators in the presence of overdispersion. A new approach based on ordinal responses for the joint analysis of prenatal death/resorption and malformation is given in Chapter 4. Conclusions and further research are presented in Chapter 5.


Chapter 2

The Models and Estimation

In the past 30 years, a number of methods have been proposed for the analysis of clustered binary data. Roughly, these methods can be grouped into two classes: likelihood-based methods and non-likelihood methods. For likelihood-based methods, the key problem is to find proper likelihood functions that can be used for efficient and parsimonious estimation of parameters. Among non-likelihood methods, the quasi-likelihood approach and generalized estimating equations (GEE) have gained widespread popularity. In addition, the intraclass correlation, a parameter peculiar to clustered binary data, is also important, and a number of approaches have been proposed for estimating it. Before presenting the existing methods for the analysis of teratology data, we first introduce a fundamental issue: identifying overdispersion.


2.2 Diagnostics of overdispersion

As mentioned in §1.2, teratological data typically exhibit overdispersion. This does not mean, however, that teratological data are necessarily overdispersed. Two questions therefore arise naturally: how can we identify whether there is any overdispersion in the data, and, if there is, how can we quantify it relative to the ordinary binomial or multinomial distribution? As pointed out by Williams (1987), evidence of overdispersion comes from an analysis of the binomial dispersion statistics.

Consider the $f_{is}$ litters of size $s$ in dose group $i$. Let $\bar{Y}_{is}$ denote the mean of the responses $Y_{ij}$ for these litters, and let $U_{is}$ denote the sum of squares of the $Y_{ij}$ about this mean. Then the binomial dispersion statistic for these litters is

$$D_{is} = \frac{s\,U_{is}}{\bar{Y}_{is}\,(s - \bar{Y}_{is})},$$

which has null expectation $E_{is}$ and null variance $V_{is}$ (see Uspensky, 1937, Chap. XI, §4), where $N_{is} = s f_{is}$.

If $\bar{Y}_{is} = 0$ or $\bar{Y}_{is} = s$, then $D_{is} = E_{is} = V_{is} = 0$. An overall test compares

$$T = \frac{\left(\sum_{i,s} D_{is} - \sum_{i,s} E_{is}\right)^2}{\sum_{i,s} V_{is}}$$

with $\chi^2_1$ to diagnose overdispersion. We use Paul's data (1982), given in Table 1.1, to demonstrate the method.


Table 2.1: Binomial dispersion statistics and their null expectations and variances evaluated for the data of Paul (1982).


The values of $D_{is}$, $E_{is}$ and $V_{is}$ are given in Table 2.1. In the majority of cases, $D_{is}$ exceeds its null expectation $E_{is}$. We then compute the test statistic $T$.

Recall our general setting for developmental toxicity data in Chapter 1: a typical teratology experiment consists of $t$ different dose groups. There are $m_i$ litters under the $i$th dose group; the $j$th of these litters has size $n_{ij}$, and $y_{ij}$ of its $n_{ij}$ fetuses respond. In this section, for simplicity, we consider the case of just one dose group and one litter, denoted by $(y, n)$. Let $\mathbf{Y} = (y_1, \dots, y_n)^T$, in which $y_k$ is the fetus-specific binary outcome taking the value 1 if malformed and 0 otherwise. Obviously, $y = \sum_{k=1}^{n} y_k$. The extension to the general case is straightforward.

Likelihood-based approaches use a marginal mean regression parameter and require full specification of the joint multivariate distribution through higher-moment assumptions. These marginal approaches are computationally intensive even in situations with a small number of fetuses in each independent experimental unit.

2.3.1 Beta-binomial Model (BB)

The beta-binomial distribution is perhaps the most popular distribution employed for correlated binary data in likelihood methods. Building on work by several statisticians (Williams, 1975; Haseman and Kupper, 1979; Segreti and Munson, 1981), the beta-binomial distribution was suggested for teratology data. The superiority of the beta-binomial model for the analysis of proportions has been shown by many authors (Paul, 1982; Pack, 1986).

• Likelihood

A beta-binomial model assumes that the malformation probability $P$ varies between litters according to a beta distribution with parameters $\alpha$ and $\beta$, with density

$$f(p) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}, \qquad 0 \le p \le 1,\quad \alpha, \beta > 0,$$

where $B(\cdot,\cdot)$ denotes the beta function. Conditional on $P$, the number of malformations $Y$ in the litter follows a binomial distribution. The marginal distribution of $Y$ is

$$\Pr(Y = y \mid n) = \binom{n}{y}\frac{B(\alpha + y,\; n + \beta - y)}{B(\alpha,\beta)}, \qquad y = 0, 1, \dots, n. \qquad (2.1)$$

An alternative parametrization is in terms of $(\mu, \theta)$, where

$$\mu = \frac{\alpha}{\alpha+\beta}, \qquad \theta = \frac{1}{\alpha+\beta}.$$


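In this parametrization, the above density is

$$\Pr(Y = y \mid n) = \binom{n}{y}\frac{\Gamma\!\left(\frac{\mu + y\theta}{\theta}\right)\Gamma\!\left(\frac{1-\mu+(n-y)\theta}{\theta}\right)\Gamma\!\left(\frac{1}{\theta}\right)}{\Gamma\!\left(\frac{1+n\theta}{\theta}\right)\Gamma\!\left(\frac{\mu}{\theta}\right)\Gamma\!\left(\frac{1-\mu}{\theta}\right)} = \binom{n}{y}\frac{\prod_{r=0}^{y-1}(\mu + r\theta)\,\prod_{r=0}^{n-y-1}(1-\mu+r\theta)}{\prod_{r=0}^{n-1}(1+r\theta)}.$$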

• Mean, variance and correlation

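$$E[Y] = E[E(Y \mid P)] = E(nP) = n\mu,$$

$$\mathrm{Var}[Y] = \mathrm{Var}[E(Y\mid P)] + E[\mathrm{Var}(Y\mid P)] = \mathrm{Var}(nP) + E[nP(1-P)] = n\mu(1-\mu)\left[1 + \frac{\theta}{1+\theta}(n-1)\right],$$

$$\mathrm{Corr}(Y_k, Y_l) = \frac{\mathrm{Cov}(Y_k, Y_l)}{\sqrt{\mathrm{Var}(Y_k)\,\mathrm{Var}(Y_l)}} = \frac{\theta}{1+\theta}.$$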
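As a numerical sanity check of the two forms of the density and of the moments above, the Python sketch below evaluates the product form, compares it with the gamma-function form using `math.lgamma`, and recovers the mean and variance by direct summation. The function names are illustrative. The gamma form assumes $\theta > 0$, while the product form reduces to the binomial when $\theta = 0$.

```python
from math import comb, exp, lgamma

def bb_pmf_product(y, n, mu, theta):
    """Beta-binomial Pr(Y = y | n) in the (mu, theta) product form."""
    num = 1.0
    for r in range(y):
        num *= mu + r * theta
    for r in range(n - y):
        num *= 1.0 - mu + r * theta
    den = 1.0
    for r in range(n):
        den *= 1.0 + r * theta
    return comb(n, y) * num / den

def bb_pmf_gamma(y, n, mu, theta):
    """Same probability via the gamma-function form (requires theta > 0)."""
    a, b = mu / theta, (1.0 - mu) / theta            # alpha = mu/theta, beta = (1-mu)/theta
    log_ratio = (lgamma(a + y) + lgamma(b + n - y) + lgamma(a + b)
                 - lgamma(a + b + n) - lgamma(a) - lgamma(b))
    return comb(n, y) * exp(log_ratio)

def bb_moments(n, mu, theta):
    """Total probability, mean and variance by summing over y = 0, ..., n."""
    probs = [bb_pmf_product(y, n, mu, theta) for y in range(n + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return sum(probs), mean, var

n, mu, theta = 8, 0.2, 0.25
total, mean, var = bb_moments(n, mu, theta)
# expected: total 1, mean n*mu = 1.6, variance n*mu*(1-mu)*(1 + theta*(n-1)/(1+theta)) = 3.072
print(total, mean, var)
```

With $\theta = 0.25$ the implied intraclass correlation is $\theta/(1+\theta) = 0.2$, which is where the variance inflation factor $1 + 7 \times 0.2 = 2.4$ comes from.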


The beta-binomial model only handles data exhibiting overdispersion. Prentice (1986) presented an extended beta-binomial model which allows overdispersion as well as underdispersion.

The correlated-binomial model was proposed by Kupper and Haseman (1978) and, independently, as an “additive binomial” generalization of the binomial distribution, by Altham (1978). This distribution is derived under the assumption that the responses of the fetuses in a litter are not mutually independent. There is a constant probability of malformation, $\mu$, for all litters within a treatment group, and we introduce $\rho$, the correlation between the responses of any two littermates.

• Likelihood

Bahadur (1961) showed that when mutual independence does not hold, the usual binomial form for $\Pr(Y = y \mid n)$ is multiplied by a correction function, as follows:


Bahadur has shown that


Paul (1987) presents a model that incorporates separate parameters to describe both the intraclass (intralitter) correlation and the heterogeneity of outcome rates ($p_i$) between litters within a dose group. He assumes that, conditional on $P$, the litter counts $Y$ follow the correlated-binomial distribution $CB(n, p, \phi)$ (see §2.3.2), while the malformation probability $P$ comes from a beta distribution with parameters $\alpha$ and $\beta$ instead of being constant. This model can be viewed as a generalization of the binomial, the beta-binomial and the correlated-binomial distributions, and Paul recommended it for the separate assessment of intralitter and interlitter sources of variability.

$$\cdots\ \binom{n}{y}\frac{B(\alpha + y,\; n + \beta - y)}{B(\alpha, \beta)}\ \cdots$$

Since this distribution is derived from the correlated-binomial distribution, a bound similar to the one for the correlated-binomial distribution has to be imposed on $\rho$, the correlation coefficient between any two littermates $Y_k$ and $Y_l$, for the probabilities to be nonnegative:

• Mean, Variance and correlation

$$E[Y] = E[E(Y \mid P)] = E(nP) = n\mu, \qquad \mathrm{Var}[Y] = \frac{n\mu(1-\mu)(n\theta + 1)}{1+\theta} + n(n-1)\phi,$$

distribution (2.7)


mixture model can be written as

$$l = \gamma\, l^{(1)} + (1-\gamma)\, l^{(2)}.$$

The first derivatives with respect to $\gamma$, $\nu$, $\mu$ and $\theta$ are:

$$\frac{\partial l}{\partial \mu} = \gamma \cdots \sum_{r=0}^{y-1} \cdots$$

The use of finite mixture models for proportions has received attention from a number of statisticians, for example Williams (1988) and Brooks et al. (1997). Brooks et al. (1997) concluded that even when non-mixture models provide an acceptable description of the main body of the data, a more complicated mixture model is still worth considering.

The quasi-likelihood approach and the generalized estimating equations approach are the two main non-likelihood-based tools for the analysis of teratology data. To demonstrate these two approaches, we consider the multiple-dose case and adopt the following setting in this section.

There are $t$ dose groups in the teratology experiment. Let $m_i$ be the number of litters exposed to dose $d_i$ ($i = 1, 2, \dots, t$), and let $n_{ij}$ be the size of the $j$th litter in group $i$ ($j = 1, 2, \dots, m_i$). Among the $n_{ij}$ fetuses, $y_{ij}$ respond. The probability of an adverse event for a fetus in dose group $i$ is $\mu_i$, $i = 1, 2, \dots, t$. When multiple doses are considered, the response probability $\mu_i$ is generally assumed to follow the logistic model

$$\mu_i = g(x_i) = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}, \qquad (2.10)$$

where the covariate $x_i$ is the dose level for the $i$th group. Recall that in our general setting for teratology data (see §1.2) we adopt a constant intraclass correlation $\rho_i$ within dose group $i$. Thus the mean and variance of $Y_{ij}$ are

$$E(Y_{ij}) = n_{ij}\mu_i, \qquad (2.11)$$

$$\mathrm{Var}(Y_{ij}) = V_{ij} = n_{ij}\mu_i(1-\mu_i)\{1 + (n_{ij}-1)\rho_i\}. \qquad (2.12)$$

The main idea behind the quasi-likelihood approach (Wedderburn, 1974) is to avoid fully specifying the distribution of the response variable $Y_{ij}$ when one is uncertain about the random mechanism by which the data were generated. Instead, only the relationship between the variance and the mean of $Y_{ij}$ is specified, namely (2.12).

To estimate $\boldsymbol{\beta} = (\beta_0, \beta_1)^T$, we solve the quasi-score equations


for $\rho_i$, $i = 1, \dots, t$ (see Liang and Hanfelt, 1994; Kuk, 2003).
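The quasi-score equations themselves are not reproduced above. For the logistic mean (2.10) and variance (2.12), the standard quasi-likelihood form is $\sum_{i,j} D_{ij}^T V_{ij}^{-1}(y_{ij} - n_{ij}\mu_i) = 0$ with $D_{ij} = \partial(n_{ij}\mu_i)/\partial\boldsymbol\beta = n_{ij}\mu_i(1-\mu_i)(1, x_i)^T$. Each litter therefore contributes $(1, x_i)^T(y_{ij} - n_{ij}\mu_i)/\{1+(n_{ij}-1)\rho_i\}$. The Python sketch below solves these equations by Fisher scoring. It assumes the $\rho_i$ are already known; in practice they would be estimated, for example by the methods of Liang and Hanfelt (1994) or Kuk (2003). It is an illustration, not the implementation used in the thesis.

```python
import numpy as np

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def quasi_likelihood_fit(x, n, y, rho, tol=1e-10, max_iter=100):
    """Fisher scoring for the quasi-score equations with logistic mean (2.10) and variance (2.12).

    x, n, y : dose level, litter size and litter count, one entry per litter
    rho     : intraclass correlation of each litter's dose group (treated as known)
    """
    x, n, y = (np.asarray(a, dtype=float) for a in (x, n, y))
    rho = np.broadcast_to(np.asarray(rho, dtype=float), x.shape)
    X = np.column_stack([np.ones_like(x), x])        # rows (1, x_i)
    phi = 1.0 + (n - 1.0) * rho                      # overdispersion factor of each litter
    beta = np.zeros(2)
    for _ in range(max_iter):
        mu = expit(X @ beta)
        score = X.T @ ((y - n * mu) / phi)           # sum of D' V^{-1} (y - n mu)
        w = n * mu * (1.0 - mu) / phi                # D' V^{-1} D = w * (1, x)'(1, x)
        info = X.T @ (w[:, None] * X)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = expit(X @ beta)
    info = X.T @ ((n * mu * (1.0 - mu) / phi)[:, None] * X)
    return beta, np.linalg.inv(info)                 # estimate and model-based covariance

# Hypothetical litters at three dose levels, common rho = 0.1
beta_hat, cov_hat = quasi_likelihood_fit([0, 0, 1, 1, 2, 2], [10, 12, 10, 8, 11, 9],
                                         [1, 2, 3, 2, 6, 5], rho=0.1)
print(beta_hat)
```

Because every litter's contribution is divided by $1+(n_{ij}-1)\rho_i$, larger litters are down-weighted relative to a plain binomial fit when $\rho_i > 0$.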

The variance-covariance matrix of the quasi-likelihood estimate $\hat{\boldsymbol\beta}$ can be

The generalized estimating equations (GEE) methodology for the analysis of correlated binary data is a marginal approach proposed by Liang and Zeger (1986) and Zeger and Liang (1986). The GEE approach is an extension of quasi-likelihood to longitudinal data analysis, or equivalently an extension of the generalized linear model estimating equation to multivariate responses. Unlike the quasi-likelihood approach and GLMs, which still make between- and within-cluster independence assumptions, the GEE further relaxes the constraint of within-cluster independence. Because of the close connection to quasi-likelihood, optimality properties of quasi-likelihood estimators can be extended to the solution of the GEE (Liang et al., 1992).

Unlike quasi-likelihood models for the litter counts $y_{ij}$, the GEE method works with the fetus-specific outcomes $y_{ijk}$. Let $\mathbf{Y}_{ij} = (y_{ij1}, \dots, y_{ijn_{ij}})^T$ denote the vector of responses for the $j$th litter in dose group $i$. The GEE can be written in vector form as

$$\sum_{i=1}^{t}\sum_{j=1}^{m_i} D_{ij}^T V_{ij}^{-1}(\mathbf{Y}_{ij} - \boldsymbol{\mu}_i) = \mathbf{0}, \qquad (2.15)$$

where $\boldsymbol{\mu}_i$ is the mean vector of $\mathbf{Y}_{ij}$, $D_{ij} = \partial\boldsymbol{\mu}_i/\partial\boldsymbol{\beta}$ is the $n_{ij} \times p$ derivative matrix, and $V_{ij}$ is the $n_{ij} \times n_{ij}$ assumed covariance matrix of $\mathbf{Y}_{ij}$. Liang and Zeger (1986) suggest writing $V_{ij}$ in the form

$$V_{ij} = A_{ij}^{1/2} R_i(\boldsymbol{\alpha}) A_{ij}^{1/2},$$

where $A_{ij}$ is the diagonal matrix

$$A_{ij} = \mathrm{diag}[\mathrm{Var}(Y_{ijk})] = \mathrm{diag}[\sigma_i^2] = \mathrm{diag}[\mu_i(1-\mu_i)],$$

and $R_i(\boldsymbol{\alpha})$, also called the “working correlation”, is a suitable correlation matrix for the fetus-specific outcome vector $\mathbf{Y}_{ij}$, indexed by a vector of parameters $\boldsymbol{\alpha}$. Liang and Zeger (1986) suggest estimating $\boldsymbol{\alpha}$ iteratively, using moment-type estimators based on the residuals at each iteration. The working correlation can be chosen as independent, exchangeable, AR(1) or unstructured. The exchangeable (equicorrelated) assumption

$$\mathrm{Corr}(Y_{ijk}, Y_{ijl}) = \rho_i, \qquad k \neq l,$$

corresponding to $(R_i)_{kk} = 1$ and $(R_i)_{kl} = \rho_i$ for $k \neq l$, seems reasonable for developmental teratology data.

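If the assumed covariance of $\mathbf{Y}_{ij}$ is correct, the covariance of the estimated parameters, also called the model-based covariance, is estimated by the so-called naive estimator

$$\widehat{\mathrm{Cov}}(\hat{\boldsymbol\beta}) = \left(\sum_{i=1}^{t}\sum_{j=1}^{m_i} D_{ij}^T V_{ij}^{-1} D_{ij}\right)^{-1}.$$

$$\widehat{\mathrm{Cov}}(\hat{\boldsymbol\beta}) = I_0^{-1} I_1 I_0^{-1}, \qquad (2.17)$$

where

$$I_0 = \sum_{i=1}^{t}\sum_{j=1}^{m_i} D_{ij}^T V_{ij}^{-1} D_{ij}$$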

Regardless of the choices of $R_i$ and $A_{ij}$, the estimators of $\boldsymbol\beta$ are consistent, and as long as the chosen $R_i$ and $A_{ij}$ are reasonable for the data, the GEE approach yields highly efficient estimates of the parameters (see Liang and Zeger, 1986; Zeger and Liang, 1986).
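With an exchangeable working correlation and a common dose within each litter, the GEE quantities reduce to litter-level terms: $D_{ij}^T V_{ij}^{-1} D_{ij} = w_{ij}\,\mathbf{x}_i\mathbf{x}_i^T$ with $w_{ij} = n_{ij}\mu_i(1-\mu_i)/\{1+(n_{ij}-1)\rho_i\}$ and $\mathbf{x}_i = (1, x_i)^T$. The Python sketch below computes the naive covariance $I_0^{-1}$ and the sandwich covariance $I_0^{-1} I_1 I_0^{-1}$ in that setting. Because the expression for $I_1$ is not shown above, it assumes the usual empirical form of Liang and Zeger, $I_1 = \sum D_{ij}^T V_{ij}^{-1}(\mathbf{Y}_{ij}-\boldsymbol\mu_i)(\mathbf{Y}_{ij}-\boldsymbol\mu_i)^T V_{ij}^{-1} D_{ij}$, which at litter level becomes $\sum \mathbf{x}_i\mathbf{x}_i^T (y_{ij}-n_{ij}\mu_i)^2/\{1+(n_{ij}-1)\rho_i\}^2$. The function name is hypothetical.

```python
import numpy as np

def gee_covariances(beta, x, n, y, rho):
    """Naive and sandwich covariance of beta-hat for litters with a common dose x
    and exchangeable working correlation rho (one entry per litter)."""
    x, n, y = (np.asarray(a, dtype=float) for a in (x, n, y))
    rho = np.broadcast_to(np.asarray(rho, dtype=float), x.shape)
    X = np.column_stack([np.ones_like(x), x])        # rows (1, x_i)
    mu = 1.0 / (1.0 + np.exp(-(X @ np.asarray(beta, dtype=float))))
    phi = 1.0 + (n - 1.0) * rho
    w = n * mu * (1.0 - mu) / phi                    # litter weight in D' V^{-1} D
    i0 = X.T @ (w[:, None] * X)                      # I_0
    u = (y - n * mu) / phi                           # D' V^{-1}(Y - mu) = u * (1, x)'
    i1 = X.T @ ((u ** 2)[:, None] * X)               # I_1 from squared litter scores
    i0_inv = np.linalg.inv(i0)
    return i0_inv, i0_inv @ i1 @ i0_inv              # naive, sandwich
```

When the squared litter residuals happen to equal their model variances, $I_1 = I_0$ and the two estimates coincide. Otherwise, the sandwich form protects against a misspecified working correlation.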

One interesting question, which has not been addressed in the literature, is whether modelling the fetus-specific vector $\mathbf{Y}_{ij}$ can result in more efficient estimation than modelling $y_{ij}$, the aggregated response of each litter. With this in mind, we further investigate the GEE (2.15) by expanding it in vectors.
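Under two simplifying assumptions, a common dose within each litter and an exchangeable working correlation, the question can be checked numerically. Because $\mathbf{1}$ is an eigenvector of the exchangeable matrix, the fetus-level contribution $D_{ij}^T V_{ij}^{-1}(\mathbf{Y}_{ij} - \mu_i\mathbf{1})$ collapses to $\mathbf{x}_i(y_{ij} - n_{ij}\mu_i)/\{1+(n_{ij}-1)\rho_i\}$. This depends on the fetus-level outcomes only through the litter count. The Python sketch below (names hypothetical) verifies this identity for one litter. It is a check under these assumptions, not the derivation given in the thesis.

```python
import numpy as np

def fetus_level_contribution(outcomes, dose, beta, rho):
    """D' V^{-1} (Y - mu 1) for one litter, built from the n x n exchangeable working covariance."""
    y_vec = np.asarray(outcomes, dtype=float)
    n = y_vec.size
    x = np.array([1.0, dose])
    mu = 1.0 / (1.0 + np.exp(-(x @ beta)))
    sigma2 = mu * (1.0 - mu)
    R = np.full((n, n), rho)
    np.fill_diagonal(R, 1.0)                         # exchangeable working correlation
    V = sigma2 * R                                   # A^{1/2} R A^{1/2} with A = sigma2 * I
    D = sigma2 * np.outer(np.ones(n), x)             # d mu / d beta, one row per fetus
    return D.T @ np.linalg.solve(V, y_vec - mu)

def litter_level_contribution(count, n, dose, beta, rho):
    """Litter-count contribution: x (y - n mu) / (1 + (n - 1) rho)."""
    x = np.array([1.0, dose])
    mu = 1.0 / (1.0 + np.exp(-(x @ beta)))
    return x * (count - n * mu) / (1.0 + (n - 1.0) * rho)

beta = np.array([-1.5, 0.8])
outcomes = [1, 0, 0, 1, 1, 0, 0]
print(fetus_level_contribution(outcomes, 1.0, beta, 0.3))
print(litter_level_contribution(sum(outcomes), len(outcomes), 1.0, beta, 0.3))
```

Both lines print the same vector up to rounding. Reordering the 0s and 1s within the litter does not change the fetus-level result either.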

In the analysis of littermate data from teratology studies, the intraclass correlation parameter (or dispersion parameter) receives considerable attention. Firstly, it is an important quantitative measure of the similarity between fetuses within litters. Secondly, in many cases the validity of statistical inference for the mean parameters depends on a good estimate of the intraclass correlation. For example, to estimate the mean parameters $\boldsymbol\beta$ in the GEE approach we need to solve (2.15), which involves the correlation matrix $R_i(\boldsymbol\alpha)$, so an estimator of the intraclass correlation is necessary. In addition, estimating the covariance of $\hat{\boldsymbol\beta}$, $\mathrm{Cov}(\hat{\boldsymbol\beta})$, also requires specification of the intraclass correlation.

Usually, we assume that the correlation between any pair of responses within each dose group is constant, $\mathrm{Corr}(Y_{ijk}, Y_{ijl}) = \rho_i$ for $k \neq l$ and any litter $j$ in dose group $i$. In other words, the intraclass correlation depends only on the dose level. For simplicity, we continue with the case of just one dose group ($t = 1$) in this section. The corresponding setup is as follows: there are $m$ dams; $y_j$ of the $n_j$ fetuses of dam $j$ respond; and the correlation between members $k$ and $l$ is a constant, $\mathrm{Corr}(Y_{jk}, Y_{jl}) = \rho$. Extension of the following estimators of intraclass correlation to multiple doses is straightforward.

Various estimators of the intraclass correlation have been proposed for binary data, and several authors have provided excellent reviews of this topic. For example, Ridout et al. (1999) compare the bias, standard deviation, mean squared error and efficiency of 20 estimators, and Paul et al. (2003) further compare these properties for 26 estimators.

Intraclass correlation parameters can be obtained by maximum likelihood based on a parametric model, such as BB, CB or BCB (see §2.3). The corresponding score functions are given in (2.5), (2.8), (2.9), etc. Estimators based on parametric models may, however, suffer from inefficiency or bias when the likelihood is misspecified, which motivates the consideration of other, more robust estimators. Here we list several estimators which perform well in the simulation studies conducted by Paul and Saha.

• The Analysis of Variance Estimator

This estimator, originally proposed for continuous data and later used by various authors including Elston (1977) and Ridout et al. (1999), is given by

$$\hat\rho_{\mathrm{AOV}} = \frac{MS_B - MS_W}{MS_B + (n_0 - 1)\,MS_W},$$

where $MS_B$ and $MS_W$ are, respectively, the between-group and within-group mean squares from a one-way analysis of variance of the binary data $y_{jk}$, and where

$$n_0 = \frac{1}{m-1}\left(N - \frac{\sum_{j=1}^{m} n_j^2}{N}\right), \qquad N = \sum_{j=1}^{m} n_j.$$

As pointed out by Paul et al. (2003), this estimator has excellent statistical properties, including the least bias, standard deviation and mean squared error.

• The Estimator Based on Optimal Quadratic Estimating Equations

Following Crowder (1987), Paul (2001) obtained a set of unbiased estimating equations for the regression and dispersion parameters. Let $Z_j = Y_j/n_j$. The estimating equations for $\mu$ and $\rho$ take the general form given by Crowder (1987), in which $\gamma_{1j}$ and $\gamma_{2j}$ denote the skewness and kurtosis of $Z_j$.

In practice we lack information about the true skewness and kurtosis of the distribution that generates the data, so Paul (2001) suggested using the beta-binomial distribution for the second, third and fourth moments, i.e.

$$\kappa_{2j} = \mu(1-\mu)\{1 + (n_j - 1)\rho\}/n_j,$$

$$\kappa_{3j} = \kappa_{2j}(1 - 2\mu)\{1 + (2n_j - 1)\rho\}/\{n_j(1 + \rho)\},$$


The estimator based on the optimal quadratic estimating equations not only performs as well as the analysis of variance estimator, but also has consistently high efficiency and the least variability.
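The analysis of variance estimator listed above can be computed directly from litter counts. For binary data, the between- and within-litter sums of squares reduce to $\sum_j y_j^2/n_j - (\sum_j y_j)^2/N$ and $\sum_j (y_j - y_j^2/n_j)$, with $m-1$ and $N-m$ degrees of freedom. The Python sketch below assumes the standard one-way ANOVA choice $n_0 = (N - \sum_j n_j^2/N)/(m-1)$. The function name is illustrative.

```python
def anova_icc(counts, sizes):
    """Analysis of variance estimator of the intraclass correlation from litter counts.

    counts[j] = y_j affected fetuses out of sizes[j] = n_j.  Requires m >= 2 litters and
    N > m; the ratio is undefined if both mean squares are zero (e.g. all counts are 0).
    """
    m = len(counts)
    total_n = sum(sizes)
    total_y = sum(counts)
    # binary responses satisfy y^2 = y, so both sums of squares reduce to litter totals
    ss_between = sum(y * y / n for y, n in zip(counts, sizes)) - total_y ** 2 / total_n
    ss_within = sum(y - y * y / n for y, n in zip(counts, sizes))
    ms_b = ss_between / (m - 1)
    ms_w = ss_within / (total_n - m)
    n0 = (total_n - sum(n * n for n in sizes) / total_n) / (m - 1)
    return (ms_b - ms_w) / (ms_b + (n0 - 1.0) * ms_w)

print(anova_icc([0, 4], [4, 4]))   # 1.0: each litter is entirely affected or entirely unaffected
print(anova_icc([2, 2], [4, 4]))   # about -0.333: every litter sits exactly at the overall mean
```

The second example shows that the estimator can be negative, which corresponds to the underdispersed case $\rho < 0$.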

In the parametric approach, the likelihood may be difficult to specify and justify. The GEE approach bypasses specification of the likelihood. It is robust and can give satisfactory parameter estimates. However, it requires supplementary estimating functions for the variance/correlation parameters, and the moment estimation method for $\boldsymbol\alpha$ (in our case, $\rho$) may lead to infeasibility problems (see Crowder, 1995). We therefore introduce the Gaussian working likelihood approach for the analysis of correlated binary data.

Chapter 3

Gaussian Working Likelihood Approach

Whittle (1961) introduced Gaussian estimation, which uses a normal log-likelihood as a criterion function without assuming that the data are normally distributed. Crowder (1985) applied this estimation method to correlated binomial data and found that it works well even for this type of data. One possible reason is that the binomial distribution is approximated by a Gaussian distribution as $n \to \infty$. We may therefore expect Gaussian estimation to perform less satisfactorily for correlated binomial data when the sample size is small or the variance of the data is misspecified.
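To make the idea concrete, the Python sketch below applies plain Gaussian estimation to litter counts from a single dose group. This is not the decoupled variant developed in this chapter. The sketch assumes mean $n_j\mu$ and variance $V_j = n_j\mu(1-\mu)\{1+(n_j-1)\rho\}$, as in (2.12), and maximizes the normal log-likelihood $-\tfrac12\sum_j[\log V_j + (y_j - n_j\mu)^2/V_j]$ over a grid of $(\mu, \rho)$ values. A crude grid search is used only to keep the sketch dependency-free; a proper optimizer would normally replace it. Function names are hypothetical, and litter sizes are assumed to be at least 2.

```python
import numpy as np

def gaussian_working_loglik(mu, rho, y, n):
    """Normal log-likelihood (up to a constant) of litter counts y, used only as a criterion."""
    mean = n * mu
    var = n * mu * (1.0 - mu) * (1.0 + (n - 1.0) * rho)
    return -0.5 * np.sum(np.log(var) + (y - mean) ** 2 / var, axis=-1)

def gaussian_estimate(y, n, grid_size=200):
    """Grid-search maximiser of the Gaussian working likelihood over (mu, rho)."""
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    mu_grid = np.linspace(0.005, 0.995, grid_size)[:, None]   # column, broadcast over litters
    rho_min = -1.0 / (n.max() - 1.0) + 0.005                  # keeps every variance positive
    best_ll, best_mu, best_rho = -np.inf, None, None
    for rho in np.linspace(rho_min, 0.99, grid_size):
        ll = gaussian_working_loglik(mu_grid, rho, y, n)       # one value per grid mu
        k = int(np.argmax(ll))
        if ll[k] > best_ll:
            best_ll, best_mu, best_rho = ll[k], float(mu_grid[k, 0]), float(rho)
    return best_mu, best_rho

# Simulated beta-binomial litters with mu = 0.2 and rho = 1/(alpha + beta + 1) = 0.2
rng = np.random.default_rng(2004)
sizes = rng.integers(5, 15, size=2000)
counts = rng.binomial(sizes, rng.beta(0.8, 3.2, size=2000))
print(gaussian_estimate(counts, sizes))   # estimates should lie near (0.2, 0.2)
```

The data are beta-binomial rather than normal, yet the Gaussian criterion still targets the right mean and correlation. This is because its estimating equations involve only the first two moments.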

We now adopt a widely used assumption in teratology data: the response
