Infeasibility and Efficiency of Working Correlation Matrix
in Generalized Estimating Equations
NG WEE TECK
NATIONAL UNIVERSITY OF SINGAPORE
2005
Infeasibility and Efficiency of Working Correlation Matrix
in Generalized Estimating Equations
NG WEE TECK
(B.Sc. (Hons), National University of Singapore)
A THESIS SUBMITTED
FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY
NATIONAL UNIVERSITY OF SINGAPORE
2005
Contents

1.1 Organization of this thesis 1
1.2 Preliminaries 2
1.3 Generalized Estimating Equations 3
1.4 Estimation of α using the moment method 10
3.1 Quasi-Least Squares 16
3.2 Pseudolikelihood (Gaussian Estimation) 19
3.3 Cholesky Decomposition 21
3.4 Covariance of the estimates 22
3.5 Computation of Estimates 24
3.5.1 Algorithm for Quasi-Least Squares corrected for Bias 24
3.5.2 Algorithm for Gaussian (Pseudo-Likelihood) Method and Cholesky Method 25
4 Common Patterned Correlation Structures 27
4.1 Exchangeable/Equicorrelation Structure 27
4.2 AR(1) Structure 34
4.3 One-dependence Structure 36
4.4 Generalized Markov Correlation Structure 42
5 Simulation Studies 47
5.1 Conclusion 50
Acknowledgements

I would like to thank my advisor, Associate Professor Wang You-Gan, for his guidance over the past two years. Without his patience and understanding when I was down, this piece of work would probably never have been completed.
My thanks also go out to all the professors in the department who have imparted knowledge in one way or another throughout my undergraduate and graduate days. They are an exceptional bunch of folks who have taught me not just academic matters but sometimes about life in general as well. I only hope that in future I will have more opportunities to learn from them.
This thesis would never have been completed without the help of some of my fellow students, and with Yvonne's help in computer matters. To my dear friends from
PRC, I wish to thank you guys for helping me brush up my Chinese and teaching me mathematical terms in the Chinese language.
Lastly, I would like to dedicate this piece of work to my family, who have always been there for me at every step in the journey of life.
Carpe Diem
Ng Wee Teck
August 2005
List of Figures
5.1 Estimated MSE of $\hat\alpha$ and $\hat\beta_1$ for normal unbalanced data, True EXC Working AR(1), K=25 54
5.2 Estimated MSE of $\hat\alpha$ and $\hat\beta_1$ for normal unbalanced data, True EXC Working AR(1), K=100 55
5.3 Estimated MSE of $\hat\alpha$ and $\hat\beta_1$ for normal balanced data, True MA(1) Working AR(1), K=25 56
5.4 Estimated MSE of $\hat\alpha$ and $\hat\beta_1$ for normal balanced data, True MA(1) Working AR(1), K=100 57
5.5 Estimated MSE of $\hat\alpha$ and $\hat\beta_1$ for normal balanced data, True AR(1) Working EXC, K=25 58
5.6 Estimated MSE of $\hat\alpha$ and $\hat\beta_1$ for normal balanced data, True AR(1) Working EXC, K=100 59
List of Tables
5.1 Table of percentage of times infeasible answers occur, true correlation EXC and working AR(1) 52
5.2 Table of percentage of times infeasible answers occur, true correlation EXC and working AR(1) 52
5.3 Table of percentage of times infeasible answers occur, true correlation MA(1) and working AR(1) 52
5.4 Table of percentage of times infeasible answers occur, true correlation MA(1) and working AR(1) 53
5.5 Estimated MSE of $\hat\alpha$ for correlated Poisson data, True AR(1) Working AR(1) 53
5.6 Estimated MSE of $\hat\beta_1$ for correlated Poisson data, True AR(1) Working AR(1) 60
5.7 Estimated MSE of $\hat\alpha$ for correlated binary data, True EXC Working AR(1) 61
5.8 Estimated MSE of $\hat\beta_1$ for correlated binary data, True EXC Working AR(1) 61
Summary

Generalized estimating equations (GEEs) have been used extensively in the analysis of clustered and longitudinal data since the seminal paper by Liang & Zeger (1986). A key attraction of GEEs is that the regression parameter estimates remain consistent even when the 'working' correlation is misspecified. However, Crowder (1995) pointed out that there are problems with the estimation of the correlation parameters when the structure is misspecified, and this affects the consistency of the regression parameters as well. This issue has been addressed to a certain extent by Chaganty & Shults (1999); however, their estimates are asymptotically biased.
In this thesis, we aim to clarify some of these issues: firstly, the feasibility of the estimators for the correlation parameters under misspecification, and secondly, the efficiency of the various methods of estimating the correlation parameters under
misspecification are investigated. Analytic expressions for the estimating functions using the decoupled Gaussian and Cholesky decomposition methods proposed by Wang & Carey (2004) are also provided for common correlation structures such as exchangeable, AR(1) and MA(1).
Chapter 1
Introduction
1.1 Organization of this thesis
The main objective of the thesis is to study the impact of misspecification of the correlation matrix on both the regression and correlation parameters in a Generalized Estimating Equations (GEE) setup. The structure is as follows: in Chapter 1 we give a brief introduction to GEE and also introduce some estimates for common correlation structures using the moment approach.
In Chapter 2, we describe the main problem and present examples of when the infeasibility problem sets in and breaks down the robustness property of GEE.
Chapter 3 describes other techniques for obtaining estimates of the correlation parameters using estimating equations. In particular, the three methods are quasi-least squares, the Gaussian (pseudo-likelihood) method and the Cholesky decomposition method.
1.2 Preliminaries

In this thesis, we assume the usual set-up for GEE. Each response vector $y_i = (y_{i1}, \ldots, y_{in_i})'$ measured on subject $i = 1, \ldots, K$ is assumed to be independent between subjects. The vector of responses $y_i$ is measured at times $t_i = (t_{i1}, \ldots, t_{in_i})'$. For subject $i$ at time $j$, the response is denoted by $y_{ij}$ and has covariates $x_{ij}' = (x_{ij1}, \ldots, x_{ijp})$, $p$ being the number of regression parameters. We denote the expected value $E(y_{ij}) = \mu_{ij}$, and it is linked to the covariates through $\mu_{ij} = g^{-1}(x_{ij}'\beta)$, where $\beta = (\beta_1, \ldots, \beta_p)'$ are the regression parameters. The variance of an observation is $\mathrm{var}(y_{ij}) = \phi\sigma_{ij}^2$, where $\phi$ is an unknown dispersion parameter. The covariance matrix of $y_i$, denoted by $V_i$, is assumed to be of the form $\phi A_i^{1/2} R_i A_i^{1/2}$, with $A_i = \mathrm{diag}(\sigma_{ij}^2)$ and $R_i$ the correlation matrix.
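As an illustration, this set-up can be simulated for a Gaussian response with identity link and an exchangeable within-subject correlation; all parameter values below (number of subjects, cluster size, $\beta$, $\rho$) are arbitrary choices made only for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

K, n, p = 50, 4, 2            # K subjects, n_i = 4 observations each, p regressors
beta = np.array([1.0, 0.5])   # regression parameters
rho, phi = 0.3, 1.0           # exchangeable correlation and dispersion

# Exchangeable correlation R = (1 - rho) I + rho 11'
R = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
L = np.linalg.cholesky(phi * R)

X = rng.normal(size=(K, n, p))            # covariates x_ij
mu = X @ beta                             # identity link: mu_ij = x_ij' beta
y = mu + rng.normal(size=(K, n)) @ L.T    # within-subject correlated errors

print(y.shape)  # (50, 4)
```

The Cholesky factor of $\phi R$ is used to induce the within-subject covariance $\phi A_i^{1/2} R_i A_i^{1/2}$ (here $A_i = I$).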
The notation for the estimates of the correlation parameters will be as follows: $\hat\alpha_{\text{method,structure}}$, where 'method' is a single letter describing the method used and 'structure' indicates the correlation structure under study. For example, $\hat\alpha_{M,AR(1)}$ indicates a moment estimator for an AR(1) structure.
1.3 Generalized Estimating Equations
In a seminal paper, Liang & Zeger (1986) introduced Generalized Estimating Equations (GEE), which extend the Generalized Linear Models (GLM) of McCullagh & Nelder (1989) to a multivariate setting with correlated data. An important contribution of Liang & Zeger (1986) is that they incorporated the information inherently present in the correlation structure of longitudinal data into estimating functions. The theoretical justifications and asymptotic properties of the resulting estimators are also presented in that paper.
One of the key features that has encouraged the use of GEEs in clustered and longitudinal data analysis is that the regression parameter estimates ($\hat\beta$) remain consistent even if the 'working' correlation or covariance structure is misspecified. What is meant by a 'working' correlation matrix is as follows: in real life we would not know what the true correlation structure of the data is. However, in the GEE
framework, we only need to specify some structure that is a good approximation, and we call that a 'working' correlation structure. There is no need to have complete knowledge of the true correlation; we only need a 'working' correlation structure to estimate the regression parameters. Throughout this thesis, the true correlation structure will be denoted by $R_i$ and the 'working' correlation by $\bar R_i$. Although the theory of GEE indicates that we only need a 'working' correlation structure, we can expect that if the correlation or covariance structure is modeled accurately, statistical inference on the regression parameters will be improved in terms of (smaller) standard errors or improved asymptotic relative efficiency (Wang & Carey (2003)).
The results obtained in Liang & Zeger (1986) are asymptotic; thus in a finite sample setting, or when the number of subjects available is small, there is an obvious need to model the correlation structure properly due to the lack of information. Furthermore, rather than regarding the correlation and covariance parameters as nuisance parameters, there are instances when these parameters are of scientific interest, e.g., in genetic studies. Lastly, we need to emphasize the importance of proper modelling of the correlation parameters in that a gross misspecification of the structure may lead to infeasible results. This is in fact the main concern of this thesis and is explained in further detail in Chapter 2.
We next describe briefly the optimality of GEEs along the lines of the classic Gauss-Markov Theorem. Suppose we have i.i.d. observations $y_i$ with $E(y_i) = \mu$ and $\mathrm{Var}(y_i) = \sigma^2$. The Gauss-Markov Theorem states that the estimated regression parameter vector is the best linear unbiased estimator (BLUE). For example, if $y = X\beta + \varepsilon$, $E(y) = X\beta$ and $\mathrm{Cov}(y) = \sigma^2 I$, then $\hat\beta_{BLUE} = (X'X)^{-1}X'y$ has minimum variance among all linear unbiased estimators of $\beta$. Another way to look at this problem is that we are interested in finding a matrix $A$ such that $E(Ay) = \beta$ and $\mathrm{Cov}(Ay)$ has minimum variance among all estimators of this type. It can be shown that $A = (X'X)^{-1}X'$ satisfies the two conditions above.
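A quick numerical illustration of this construction: the matrix $A = (X'X)^{-1}X'$ satisfies $AX = I$ (so $E(Ay) = \beta$), and $Ay$ recovers $\beta$ from noisy data. The dimensions and parameter values are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 200, 3
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0, 0.5])
y = X @ beta + rng.normal(scale=0.1, size=n)

# A = (X'X)^{-1} X' is linear in y and unbiased: A X = I implies E(Ay) = beta
A = np.linalg.solve(X.T @ X, X.T)
beta_hat = A @ y

assert np.allclose(A @ X, np.eye(p))   # the unbiasedness condition
print(np.round(beta_hat, 2))
```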
Under the independence assumption, suppose we have $E(y_{ij}) = \mu_{ij}(\beta)$, $\mathrm{Var}(y_{ij}) = \nu_{ij}/\phi$, and the design matrix is $X_i = (x_{i1}, \ldots, x_{in_i})'$. The score equations using a likelihood analysis have the form
$$U_I(\beta) = \sum_{i=1}^{K} X_i' \Delta_i A_i^{-1} (y_i - \mu_i) = 0, \qquad (1.1)$$
where $\Delta_i = \mathrm{diag}(d\mu_{ij}/d\eta_{ij})$. Denote the solution of (1.1) by $\hat\beta_I$. It can be shown that the asymptotic variance of $\hat\beta_I$ has the usual sandwich form
$$\mathrm{Var}(\hat\beta_I) = M_I^{-1}\left\{\sum_{i=1}^{K} X_i'\Delta_i A_i^{-1}\,\mathrm{Cov}(y_i)\,A_i^{-1}\Delta_i X_i\right\} M_I^{-1}, \qquad M_I = \sum_{i=1}^{K} X_i'\Delta_i A_i^{-1}\Delta_i X_i.$$
As an extension of (1.1), the GEE setup involves solving the estimating function
$$U(\beta, \alpha) = \sum_{i=1}^{K} D_i' V_i^{-1} (y_i - \mu_i) = 0, \qquad (1.2)$$
where $D_i = \partial\mu_i/\partial\beta'$, $V_i = \phi A_i^{1/2} R_i A_i^{1/2}$, and $R_i$ is the correlation matrix of the $i$th subject.
The key difference between the independence and GEE setups is the extension of the uncorrelated response vector in GLM to the multivariate response we have in GEE: GEE includes the information from the correlation matrix $R_i$, which models the correlation in each subject/cluster. Note that GEE reduces to the independence equation when we specify $R_i = I$. This approach is also similar to the function derived from the quasi-likelihood approach proposed by Wedderburn (1974) and McCullagh (1983). The optimality of estimators arising from quasi-likelihood functions is also shown in those two papers; in particular, McCullagh (1983) shows there is a close connection between the asymptotic optimality of quasi-likelihood estimators and the Gauss-Markov Theorem.
In general, since $R_i$ is unknown, we use $\bar R_i(\alpha)$ as a 'working' correlation matrix, where $\alpha$ is a $q \times 1$ vector of unknown parameters on which the correlation matrix depends. We write the matrix $\bar R$ as a function of $\alpha$ because we cannot be sure that we have the correct model; thus it is appropriate to write it as a function of
$\alpha$, and $\alpha_i$ itself is some function of the true correlation parameter $\rho_i$. Note that $q$ can take values from 0 (independence model) to $n_i(n_i-1)/2$ in the case of a fully unstructured correlation matrix.
If $\bar R = I$ (independence model), GEE reduces to the usual GLM setup. Below are some common choices for $\bar R$; the motivation for these structures can be found in Crowder & Hand (1990).
1. $\bar R = I$, where $I_{n_i\times n_i}$ is the identity matrix. This structure implies that the measurements on the $i$th subject are independent within the subject itself, i.e., $y_{ij}$ is independent of $y_{ik}$ for all $j \neq k$.
2. AR(1). For lattice-based observations, sometimes we can expect the correlations between observations within the same subject to decrease over time. A simple way to model such a phenomenon is to allow the correlations to decrease geometrically at a rate of $\rho$ at each time point, i.e., $\mathrm{corr}(y_{ij}, y_{ik}) = \rho^{|j-k|}$.
3. Exchangeable (EXC) or compound symmetry. In clustered data, for example in teratological studies, we expect the offspring of a female rat in the same litter to share the same correlation $\rho$ for the traits we are measuring; thus this structure comes in handy.
4. MA(m). Suppose we observe that the correlations decrease at each time point depending on how far apart the observations are (typically $\rho_1 > \cdots > \rho_m$), but the correlation drops to zero when they are more than $m$ time points apart. This phenomenon can be modeled using an MA(m) structure, which is essentially a banded matrix with bandwidth $m$. For example, when $m = 1$, we have ones on the diagonal and $\rho$ on the off-diagonals above and below the main diagonal.
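The three patterned structures above can be constructed and their feasibility (positive definiteness) checked numerically. This is a sketch; the bounds asserted are the standard positive-definiteness ranges for these matrices.

```python
import numpy as np

def exchangeable(n, rho):
    """R = (1 - rho) I + rho 11'; positive definite for rho in (-1/(n-1), 1)."""
    return (1 - rho) * np.eye(n) + rho * np.ones((n, n))

def ar1(n, rho):
    """R[j, k] = rho^{|j-k|}: correlations decay geometrically with lag."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def ma1(n, rho):
    """Bandwidth-1 banded matrix: ones on the diagonal, rho on the first
    off-diagonals; positive definite for |rho| < 1/(2 cos(pi/(n+1)))."""
    idx = np.arange(n)
    off = np.abs(idx[:, None] - idx[None, :]) == 1
    return np.where(off, rho, 0.0) + np.eye(n)

n = 3
for R in (exchangeable(n, 0.4), ar1(n, 0.4), ma1(n, 0.4)):
    assert np.all(np.linalg.eigvalsh(R) > 0)          # feasible choices

# For n = 3 the MA(1) bound is 1/(2 cos(pi/4)) = 1/sqrt(2), about 0.707
assert np.linalg.eigvalsh(ma1(3, 0.72)).min() < 0     # just outside: infeasible
```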
1.4 Estimation of α using the moment method
To estimate $\alpha$, Liang & Zeger (1986) proposed the moment approach. Let $\hat\varepsilon_i = A_i^{-1/2}(y_i - \mu_i(\hat\beta))$, where $\hat\beta$ is the vector of estimated regression coefficients, and let $\hat\varepsilon_{ij}$ be the $j$th element of $\hat\varepsilon_i$. We have
$$\hat\varepsilon_{ij} = \frac{y_{ij} - \hat\mu_{ij}}{\hat\sigma_{ij}(\hat\beta)} \sim N(0, \phi),$$
and a general moment-based approach (Wang & Carey (2003)) is to solve
$$\sum_{i=1}^{K}\sum_{j<k}\left\{\hat\varepsilon_{ij}\hat\varepsilon_{ik} - \hat\phi\,\bar R_{i,jk}(\alpha)\right\} = 0. \qquad (1.3)$$
For the AR(1) model, the moment method is to solve
$$\sum_{i=1}^{K}\sum_{j<k}\left\{\hat\varepsilon_{ij}\hat\varepsilon_{ik} - \hat\phi\,\alpha^{k-j}\right\} = 0, \qquad (1.4)$$
where $\hat\phi = (N-p)^{-1}\sum_{i=1}^{K}\sum_{j=1}^{n_i}\hat\varepsilon_{ij}^2$ (based on the usual Pearson residuals), $N = \sum_{i=1}^{K} n_i$, $K$ is the number of subjects, $p$ is the number of covariates and $n_i$ is the number of observations per subject/cluster. However, there are problems with this method (Crowder (1995)) when the correlation structure is misspecified; details will be discussed in Chapter 2.
In fact, for the AR(1) or MA(1) model, we can estimate $\alpha$ using all pairs lagged by one unit of observation time, resulting in the estimate
$$\hat\rho = \frac{2\sum_{i=1}^{K}\sum_{j=1}^{n_i-1}\hat\varepsilon_{ij}\hat\varepsilon_{i,j+1}}{\sum_{i=1}^{K}\sum_{j=1}^{n_i-1}\left(\hat\varepsilon_{ij}^2 + \hat\varepsilon_{i,j+1}^2\right)}. \qquad (1.5)$$
This is similar to the Burg-type estimator in econometric time series analysis, and it is well documented in the time series literature (Pourahmadi (2001)). Its derivation is along the lines of information theory, and it is a maximum entropy estimator of $\rho$.
We will next show that the moment estimator in (1.5) is always well defined. Without loss of generality, we only need to consider the inner summation: since $2|\hat\varepsilon_{ij}\hat\varepsilon_{i,j+1}| \le \hat\varepsilon_{ij}^2 + \hat\varepsilon_{i,j+1}^2$ for every pair, the numerator can never exceed the denominator in absolute value, so $\hat\rho \in [-1, 1]$.
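A sketch of this lag-1 estimator follows; the exact form used (twice the sum of lag-1 cross-products over the sum of squared neighbours) is an assumption consistent with the boundedness property claimed in the text. The simulated AR(1) residuals are illustrative only.

```python
import numpy as np

def burg_lag1(resid):
    """Lag-1 (Burg-type) estimate of rho from a list of residual vectors.
    Since |2ab| <= a^2 + b^2 termwise, the ratio always lies in [-1, 1]."""
    num = sum(np.sum(e[:-1] * e[1:]) for e in resid)
    den = sum(np.sum(e[:-1] ** 2 + e[1:] ** 2) for e in resid)
    return 2.0 * num / den

rng = np.random.default_rng(2)
rho = 0.6
resid = []
for _ in range(200):                    # 200 subjects, 10 observations each
    e = np.empty(10)
    e[0] = rng.normal()
    for j in range(1, 10):              # AR(1) residuals with unit variance
        e[j] = rho * e[j - 1] + np.sqrt(1 - rho ** 2) * rng.normal()
    resid.append(e)

rho_hat = burg_lag1(resid)
assert -1.0 <= rho_hat <= 1.0           # always well defined
print(round(rho_hat, 2))
```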
Chapter 2
Problems with Estimation of α
In Liang & Zeger (1986), it is proved that $\hat\beta$ is consistent even if the 'working' correlation structure is different from the true correlation structure, provided that the mean structure is correctly modeled and $\hat\alpha$ and $\hat\phi$ are consistently estimated. However, Crowder (1995) pointed out that the working correlation $\bar R_i(\alpha)$ may lead to infeasible results if specified incorrectly; this leads to a breakdown of the asymptotic theory for $\hat\beta$. The following two examples illustrate how the lack of consistency of $\hat\alpha$ under misspecification can affect the estimation of $\hat\beta$.
Example 1
To illustrate the pitfalls of the moment method considered by Liang & Zeger (1986), consider (1.4) with $n_i = 3$, where the working correlation is AR(1) and the true correlation is exchangeable. To find $\hat\alpha$, we have to solve the equation
$$\hat\alpha^2 + 2\hat\alpha = 3\bar\rho, \qquad (2.1)$$
whose admissible root is
$$\hat\alpha = -1 + \sqrt{1 + 3\bar\rho}, \qquad (2.2)$$
where $\bar\rho$ can be thought of as an estimator of $\rho$ when we have correctly specified the correlation structure. Note that for an exchangeable correlation structure of dimension 3, $\rho \in (-1/2, 1)$. Hence, a problem arises when $-1/2 < \rho < -1/3$, as it would lead to the solution of (2.2) being infeasible. Even if all the roots of equation (1.4) lie in the feasible range, there would still be the problem of choosing the 'correct' solution.
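A minimal numerical sketch of the infeasibility in Example 1, assuming the working-AR(1) moment equation for $n_i = 3$ reduces to $\alpha^2 + 2\alpha = 3\bar\rho$ (an assumption of this sketch): the estimator exists only when the discriminant is non-negative, which fails exactly on part of the exchangeable parameter space.

```python
import numpy as np

def alpha_hat_working_ar1(rho_bar):
    """Solve alpha^2 + 2*alpha = 3*rho_bar for the working-AR(1) estimate
    with n_i = 3; returns nan when no real root exists (infeasible)."""
    disc = 1.0 + 3.0 * rho_bar
    return -1.0 + np.sqrt(disc) if disc >= 0 else float("nan")

# Exchangeable(3) allows rho in (-1/2, 1) ...
assert not np.isnan(alpha_hat_working_ar1(-0.3))   # rho > -1/3: feasible
# ... but the working-AR(1) estimator breaks down on (-1/2, -1/3)
assert np.isnan(alpha_hat_working_ar1(-0.4))
```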
Example 2
Assume now that the working correlation is AR(1), the true correlation is MA(1), and $n_i = 3$. Using (2.1) and taking expectations again, we have in the limit
$$\hat\alpha^2 + 2\hat\alpha = 2\rho, \quad \text{i.e.,} \quad \hat\alpha = -1 + \sqrt{1 + 2\rho}.$$
For an MA(1) structure of dimension 3, feasibility requires $-\frac{1}{2\cos(\pi/4)} \le \rho \le \frac{1}{2\cos(\pi/4)}$, i.e., $-1/\sqrt{2} \le \rho \le 1/\sqrt{2}$. Therefore, if the true $\rho$ is less than $-1/2$, there is a positive probability that $-1/\sqrt{2} \le \hat\rho \le -1/2$, and we run into the same problem as in Example 1: the estimator $\hat\alpha$ would be undefined.
Example 3
In this example, we assume that the true structure is autoregressive and the working correlation is exchangeable. Now using the moment estimators in Liang & Zeger (1986), we have
$$\hat\rho_{jk} = \frac{1}{(n-p)\hat\phi}\sum_{i=1}^{K}\hat\varepsilon_{ij}\hat\varepsilon_{ik} \approx \frac{n}{n-p}\,\rho^{|j-k|},$$
by approximating $\hat\varepsilon_{ij}\hat\varepsilon_{ik}$ with its expectation when $n$ is large. Assuming that $\hat\phi$ tends to $\phi$ and using the average correlation for estimating $\alpha$,
$$\hat\alpha = \frac{1}{d}\sum_{j<k}\sum_{i=1}^{K}\frac{\hat\varepsilon_{ij}\hat\varepsilon_{ik}}{\hat\phi} \approx \frac{n}{n-p}\cdot\frac{2}{n(n-1)}\sum_{j<k}\rho^{k-j},$$
where $d = (n-p)n(n-1)/2$. Observe that $\hat\alpha \to n/(n-p) \approx 1$ as $\rho \to 1$. Thus, although $\hat\alpha$ exists and converges as $n$ tends to infinity, it is not consistent for any recognizable 'true parameter' underlying the stochastic nature because of its dependence on the sample size $n$.
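The claim that $\hat\alpha \to n/(n-p)$ as $\rho \to 1$ can be checked numerically. The scaled average-correlation expression below is a sketch of the limiting value; the exact scaling factor $n/(n-p)$ is taken as an assumption consistent with the stated limit.

```python
import numpy as np

def alpha_limit(n, p, rho):
    """Sketch of the limiting working-exchangeable estimate under a true
    AR(1): (n/(n-p)) times the average of rho^{k-j} over pairs j < k."""
    lags = [k - j for j in range(n) for k in range(j + 1, n)]
    return (n / (n - p)) * np.mean([rho ** lag for lag in lags])

n, p = 10, 2
assert abs(alpha_limit(n, p, 0.9999) - n / (n - p)) < 1e-3   # tends to n/(n-p)
# The limit changes with n alone: no fixed 'true parameter' is recovered
assert abs(alpha_limit(20, 2, 0.9999) - 20 / 18) < 1e-3
```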
Chapter 3
Methods for Estimating α
Apart from the moment method described in Chapter 1, various authors have used estimating equations to estimate the correlation parameters. In this chapter, we present three such methods from the literature. The notation for the estimating functions introduced in this chapter will be of the form $U_Q$, $U_G$ and $U_C$, denoting the Quasi-Least Squares, Gaussian and Cholesky methods respectively.
3.1 Quasi-Least Squares

The quasi-least squares (QLS) approach minimises the generalized error sum of squares
$$Q(\beta, \alpha) = \sum_{i=1}^{K} \hat\varepsilon_i'\, R_i^{-1}(\alpha)\, \hat\varepsilon_i,$$
where $E(\varepsilon_i) = 0$ and $E(\varepsilon_i\varepsilon_i') = \phi R_i$. Thus we have the following estimating equation, and the objective is to find the $\hat\alpha_q$ ($q$ for Quasi-Least Squares) that satisfies
$$U_Q(\alpha) = \sum_{i=1}^{K} \hat\varepsilon_i'\, \frac{\partial R_i^{-1}(\alpha)}{\partial\alpha}\, \hat\varepsilon_i = 0. \qquad (3.1)$$
In that paper, the author showed that for commonly used correlation structures like exchangeable, tridiagonal, AR(1) and unstructured $R_i$'s, there exists a solution in the space where the matrix is positive definite. However, the drawback is that $\hat\alpha$ is asymptotically biased even when the correlation structure is correctly specified. The estimating equation for $\hat\beta$ is identical to that proposed in Liang & Zeger (1986); thus $\hat\beta$ is asymptotically consistent. If the investigator is only interested in the regression coefficients, QLS offers a feasible method to obtain sensible results.
In a follow-up paper, Chaganty & Shults (1999) modified their QLS method to correct for the asymptotic bias. The key assumption is that $\alpha = f(\rho)$, where $f$ is a continuous and one-to-one function. Denote by $\hat\alpha_q$ the Quasi-Least Squares estimate of $\alpha$; then $\hat\rho_q = f^{-1}(\hat\alpha_q)$ is a consistent estimate of $\rho$. Note that the working matrix $\bar R_i$ can be of any structure, but clearly the number of parameters in $\bar R_i$ must equal the number in the true $R_i$. This technique also assumes that the working correlation is correctly specified; in this thesis we will carry out studies to investigate the impact of misspecification of the working correlation matrix.
In the Chaganty & Shults (1999) paper, the authors also noted that the limiting value of $\hat\alpha$ under an AR(1) working correlation is
$$\hat\alpha_q \to \frac{1 - \sqrt{1-\rho^2}}{\rho},$$
regardless of whether the true correlation structure $R$ is AR(1), MA(1), exchangeable or independent. Thus, they propose that we should set the working correlation matrix to AR(1) and use the bias-corrected estimate (denoted by $cq$, Corrected Quasi-Least Squares)
$$\hat\alpha_{cq} = \frac{2\hat\alpha_q}{1 + \hat\alpha_q^2},$$
since AR(1), MA(1), exchangeable and independence are the correlation models most commonly used in analyzing balanced and equally spaced data.
In the case of data that are unbalanced, with $n_i$ measurements per subject taken at irregularly spaced times $t_{i1}, t_{i2}, \ldots, t_{in_i}$ for $i = 1, \ldots, K$, the authors suggested using a Markov or generalized Markov structure for modelling the correlation structure. The bias-corrected estimates can be obtained analogously, but there are no closed-form solutions and the estimates have to be computed numerically.
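The bias correction $\hat\alpha_{cq} = 2\hat\alpha_q/(1+\hat\alpha_q^2)$ can be sanity-checked numerically. The limiting value $(1-\sqrt{1-\rho^2})/\rho$ used below is an assumption of this sketch; it is precisely the value that the stated correction maps back to $\rho$ exactly.

```python
import numpy as np

def qls_limit_ar1(rho):
    """Assumed limiting value of the uncorrected QLS estimate under AR(1)."""
    return (1.0 - np.sqrt(1.0 - rho ** 2)) / rho

def bias_correct(alpha_q):
    """Corrected Quasi-Least Squares estimate: 2a / (1 + a^2)."""
    return 2.0 * alpha_q / (1.0 + alpha_q ** 2)

# The correction maps the biased limit back to the true rho exactly
for rho in (-0.8, -0.3, 0.2, 0.5, 0.9):
    assert abs(bias_correct(qls_limit_ar1(rho)) - rho) < 1e-12
```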
3.2 Pseudolikelihood (Gaussian Estimation)
To correct for the bias in the QLS method, we evaluate the bias of (3.1) as
$$E\{U_Q(\alpha)\} = \phi\sum_{i=1}^{K} \mathrm{tr}\!\left(\frac{\partial R_i^{-1}(\alpha)}{\partial\alpha}\, R_i\right),$$
which is nonzero in general; subtracting it yields the unbiased Gaussian estimating function
$$U_G(\alpha) = \sum_{i=1}^{K}\left\{\hat\varepsilon_i'\,\frac{\partial R_i^{-1}}{\partial\alpha}\,\hat\varepsilon_i + \phi\,\mathrm{tr}\!\left(R_i^{-1}\frac{\partial R_i}{\partial\alpha}\right)\right\}. \qquad (3.3)$$
For a given $\hat\beta$, minimising
$$\sum_{i}\left\{\phi^{-1}\hat\varepsilon_i'\, R_i^{-1}\hat\varepsilon_i + \log|\phi R_i|\right\}$$
shows the relation of estimating function (3.3) to the Gaussian distribution: (3.3) can be obtained by minimising $-2$ times the Gaussian log-likelihood. It can be shown that this estimating function is unbiased even though $\hat\varepsilon$ is not Gaussian (Crowder (1985)).
Another way to derive (3.3) is through the generalized least squares method, in which $\beta$ is treated as known in the covariance (weighting) function. Thus, the bias-corrected version of Quasi-Least Squares together with $U(\beta, \alpha)$ can be viewed as (Gaussian) pseudo-likelihood (Carroll & Ruppert, 1988, §3.2; Davidian & Giltinan, 1995, §2.2-2.3).
Since the parameter $\beta$ appears in both the mean and variance functions, we treat the $\beta$ in the variance function as known (or distinct from the $\beta$ in the mean function) to avoid complications in minimising the log-likelihood function; this technique is known as "decoupling". Another advantage of decoupling the parameters is that $\hat\beta$ remains consistent even when the working correlation is misspecified, which is not the case when they are not decoupled. It is clear from the estimating function that we also have to estimate the scale parameter $\phi$.
More generally, instead of using the Gaussian likelihood as a vehicle for estimation, it might be of interest to try non-Gaussian distributions in the estimation procedure. Possible candidates include the multivariate t (Lange, Little & Taylor 1989) and the multivariate skew-normal distribution of Azzalini.

3.3 Cholesky Decomposition
Thus, we can use $U_{C1}(\alpha_j, \beta)$ as another unbiased estimating function for $\alpha_j$. Analogous expressions to $U_{C1}$ and $U_{\Gamma 1}$ can be obtained by decomposing $R_i^{-1} = U_i' D_i^* U_i$, where $U_i$ is an upper triangular matrix and $D_i^*$ a diagonal matrix. Denote the estimating functions derived from this decomposition by $U_{C2}$ and $U_{\Gamma 2}$, and let $U_C = U_{C1} + U_{C2}$. The performance of these estimating functions in a finite sample setting will be investigated in Chapter 5.
3.4 Covariance of the estimates
Let the joint estimating function be $U(\theta) = (U(\beta), U(\alpha))'$ with covariance matrix $\mathrm{Cov}\{U(\theta)\}$. For the Cholesky decomposition method, the corresponding expressions are obtained analogously.

3.5.2 Algorithm for Gaussian (Pseudo-Likelihood) Method and Cholesky Method
I. Initialization. Denote the estimating function by $U_W$, where $W$ can be either $G$ for the Gaussian method or $C$ for the Cholesky method. $\hat\beta^{(0)}$ is computed under a working independence model ($R_i = I_{n_i}$) by setting $\hat\alpha^{(0)} = 0$.
II. Computation of estimates.
(a) Compute $\hat\varepsilon_i^{(k)}$ using $\hat\beta^{(k-1)}$.
(b) Solve $U_W(\alpha^{(k-1)}, \beta^{(k)}) = 0$ (eqn 3.3 or eqn 3.4) to obtain $\hat\alpha^{(k)}$.
(c) Find $\hat\beta^{(k)}$ by solving $U(\beta, \hat\alpha^{(k)}) = 0$.
(d) If $\|(\hat\beta^{(k)}, \hat\alpha^{(k)})' - (\hat\beta^{(k-1)}, \hat\alpha^{(k-1)})'\|$ is less than the specified tolerance then stop; else continue until convergence.
(e) Suppose convergence is achieved at iteration $n$; stop and return $(\hat\beta^{(n)}, \hat\alpha^{(n)})'$.
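The alternating scheme above can be sketched for a linear model with identity link and an exchangeable working correlation. The moment-style update for $\alpha$ below stands in for solving $U_W = 0$ and is an assumption of this sketch; the data-generating values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
K, n, p = 100, 4, 2
beta_true, rho_true = np.array([1.0, -0.5]), 0.3

Rt = (1 - rho_true) * np.eye(n) + rho_true * np.ones((n, n))
X = rng.normal(size=(K, n, p))
y = X @ beta_true + rng.normal(size=(K, n)) @ np.linalg.cholesky(Rt).T

def exch(n, a):
    return (1 - a) * np.eye(n) + a * np.ones((n, n))

# Step I: initialise under working independence (alpha = 0)
alpha, beta = 0.0, np.zeros(p)
for _ in range(25):                      # Step II: alternate until convergence
    Rinv = np.linalg.inv(exch(n, alpha))
    # (c) generalised least squares update for beta given alpha
    A = sum(X[i].T @ Rinv @ X[i] for i in range(K))
    b = sum(X[i].T @ Rinv @ y[i] for i in range(K))
    beta_new = np.linalg.solve(A, b)
    # (a)-(b) residuals, then a moment-style update standing in for U_W = 0
    E = y - X @ beta_new
    num = sum(E[i, j] * E[i, k] for i in range(K)
              for j in range(n) for k in range(j + 1, n))
    alpha_new = num / (K * n * (n - 1) / 2 * (E ** 2).mean())
    # (d) stop when the parameter change falls below tolerance
    done = np.abs(beta_new - beta).max() + abs(alpha_new - alpha) < 1e-8
    beta, alpha = beta_new, alpha_new
    if done:
        break

print(np.round(beta, 2), round(alpha, 2))
```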
Chapter 4

Common Patterned Correlation Structures

4.1 Exchangeable/Equicorrelation Structure

The exchangeable structure is $R(\alpha) = (1-\rho)I_n + \rho\mathbf{1}\mathbf{1}'$, where $\rho \in (-1/(n-1), 1)$ and $\mathbf{1}$ is an $n$-vector of 1's. Next, we will show that
$$R^{-1}(\alpha) = \frac{1}{1-\rho}\,I_n - \frac{\rho}{(1-\rho)(1+(n-1)\rho)}\,\mathbf{1}\mathbf{1}'.$$
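This closed-form inverse can be verified numerically before the algebraic proof:

```python
import numpy as np

def exch(n, rho):
    return (1 - rho) * np.eye(n) + rho * np.ones((n, n))

def exch_inv(n, rho):
    """Closed form: (1/(1-rho)) I - rho / ((1-rho)(1+(n-1) rho)) 11'."""
    one = np.ones((n, n))
    return np.eye(n) / (1 - rho) - rho * one / ((1 - rho) * (1 + (n - 1) * rho))

# Check R(alpha) R^{-1}(alpha) = I across feasible (n, rho) pairs
for n, rho in [(3, 0.5), (5, -0.2), (8, 0.9)]:
    assert np.allclose(exch(n, rho) @ exch_inv(n, rho), np.eye(n))
```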