School of Mathematical Sciences, Queensland University of Technology
Addressing Issues in Sparseness, Ecological Bias and Formulation of the Adjacency Matrix in Bayesian Spatio-temporal Analysis of Disease Counts
Arul Earnest
B.Soc.Sc (Hons) in Statistics, National University of Singapore
MSc in Medical Statistics, London School of Hygiene and Tropical Medicine, University of London
A thesis submitted for the degree of Doctor of Philosophy in the Faculty of Science and Technology, Queensland University of Technology, according to QUT requirements
Principal Supervisor: Professor Kerrie Mengersen
Associate Supervisors: Associate Professor Geoff Morgan, Professor Tony Pettitt
KEYWORDS
Spatial, autoregressive, disease mapping, CAR model, birth defects, ecological bias, neighbourhood weight matrix, forecasting, priors, Bayesian, MCMC, joint modelling
ABSTRACT
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models) for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analysing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and from health service planning requirements. Data from a large probabilistically linked database covering 1990 to 2004, consisting of fields from two separate registries, the Birth Defect Registry (BDR) and the Midwives Data Collection (MDC), were used in the analyses in this thesis.
The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix affects the smoothing properties of the CAR model; this is the focus of chapter 6. Secondly, I aimed to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component, as well as a shared-component model, for modelling a sparse outcome; this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes for selecting individual-level data for a hybrid ecological spatial model; this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR
For the first objective, I examined a series of neighbourhood weight matrices and showed how smoothing the relative risk estimates according to similarity in an important covariate (i.e. maternal age) improved the model's ability to recover the underlying risk, compared with the traditional adjacency (specifically the Queen) method of assigning weights.
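The two weighting schemes contrasted above can be sketched as follows. This is a minimal illustration only, assuming areas laid out on a regular grid (real SLAs are irregular polygons, so Queen contiguity would in practice be derived from shared boundary points) and a hypothetical exponential similarity function, exp(-|x_i - x_j| / scale), applied to an area-level covariate such as mean maternal age. The weight functions actually evaluated in chapter 6 may differ.

```python
import math


def queen_adjacency(nrows, ncols):
    """Binary Queen-contiguity weights for areas on a regular grid.

    Cells i and j are neighbours (w_ij = 1) if they share an edge or a corner.
    """
    n = nrows * ncols
    w = [[0.0] * n for _ in range(n)]
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < nrows and 0 <= cc < ncols:
                        w[i][rr * ncols + cc] = 1.0
    return w


def covariate_similarity_weights(adjacency, covariate, scale):
    """Hypothetical covariate-based weights: keep only adjacent pairs, but
    shrink each weight as the covariate values of the two areas diverge.

    w_ij = a_ij * exp(-|x_i - x_j| / scale)
    """
    n = len(covariate)
    return [
        [adjacency[i][j] * math.exp(-abs(covariate[i] - covariate[j]) / scale)
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    queen = queen_adjacency(3, 3)
    print([sum(row) for row in queen])  # [3.0, 5.0, 3.0, 5.0, 8.0, 5.0, 3.0, 5.0, 3.0]
    # Hypothetical mean maternal age (years) for a 2 x 2 grid of areas
    ages = [28.0, 29.0, 35.0, 36.0]
    w = covariate_similarity_weights(queen_adjacency(2, 2), ages, scale=2.0)
    print([round(x, 3) for x in w[0]])
```

Under Queen contiguity, area 0 in the 2 x 2 example gives equal weight to all three neighbours; the covariate-based version lets it borrow mostly from area 1, whose maternal-age profile is closest.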
Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared several models, including an extension of the usual Poisson model that accommodates excess zeros in the data. This was achieved via a mixture model that also incorporated a shared component, improving the estimation of sparse counts by borrowing strength through a component (e.g. latent risk factor/s) shared with a referent outcome (caesarean section in this example). Using the Deviance Information Criterion (DIC), I showed that the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation.
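DIC combines fit and complexity as DIC = Dbar + pD, where Dbar is the posterior mean of the deviance and pD = Dbar - D(posterior mean) is the effective number of parameters; lower values indicate a better trade-off. A minimal sketch for a Poisson count model, assuming posterior draws of each area's expected count are already available from an MCMC sampler (the draws in the example are made up, and the function names are illustrative):

```python
import math


def poisson_loglik(y, mu):
    """log p(y | mu) for a Poisson count y with mean mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def deviance(counts, means):
    """D(theta) = -2 * log-likelihood, summed over areas."""
    return -2.0 * sum(poisson_loglik(y, mu) for y, mu in zip(counts, means))


def dic(counts, mean_draws):
    """Return (DIC, pD) from posterior draws of the area-level Poisson means.

    mean_draws[s][i] is the expected count for area i in MCMC draw s.
    """
    n_draws = len(mean_draws)
    d_bar = sum(deviance(counts, draw) for draw in mean_draws) / n_draws
    posterior_mean = [sum(draw[i] for draw in mean_draws) / n_draws
                      for i in range(len(counts))]
    p_d = d_bar - deviance(counts, posterior_mean)
    return d_bar + p_d, p_d


if __name__ == "__main__":
    observed = [0, 2, 1, 0]
    draws = [[0.4, 1.8, 1.1, 0.3], [0.6, 2.3, 0.8, 0.2], [0.5, 1.9, 1.0, 0.4]]
    value, p_d = dic(observed, draws)
    print(f"DIC = {value:.2f}, pD = {p_d:.2f}")
```

Competing models fitted to the same counts would each yield a DIC in this way, and the one with the smaller value is preferred.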
The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I
SLAs, followed by selecting all cases in the chosen SLAs along with an equal number of controls, provided the lowest AMSE.
The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projections was also undertaken.
By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes, and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy for sampling individual-level data, along with sample size considerations for rare diseases, will also be presented. Finally, projections of birth defect categories at the SLA level will be made.
TABLE OF CONTENTS
2.2.3 Areal-level indices of socio-economic status
2.3 Definition and classification of birth defects
2.4 Spatial and temporal trends of birth defects in New South Wales, Australia
3.3.5 Common risk factors for caesarean section rates/spatial variation
5.5 Specifying the hyperprior distribution
STATEMENT OF ORIGINAL AUTHORSHIP
"The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made."
Arul Earnest
26th February 2010
ACKNOWLEDGEMENTS
I would like to thank my principal supervisor, Professor Kerrie Mengersen, from Queensland University of Technology (QUT), for her unlimited guidance and supervision throughout the course of my PhD candidature. I am indebted to her for introducing me to the field of Bayesian statistics. My appreciation also goes out to Professor Tony Pettitt for facilitating the smooth flow of my PhD studies. I would also like to express my gratitude to my associate supervisor, Associate Professor Geoff Morgan, from the Northern Rivers University Department of Rural Health (University of Sydney), for constantly providing input on my PhD, in particular on the epidemiological, study design and clinical implication aspects of the thesis. I have certainly enjoyed the numerous thought-provoking discussions we had in his office in Lismore. I am equally indebted to Professor John Beard, Director of Ageing and Lifecourse at the World Health Organisation, who was my previous supervisor. I would like to credit him with providing me with the opportunity to start this PhD, and also for his generous advice and guidance on the manuscripts resulting from this thesis. My sincere gratitude goes to Dr Lee Taylor and Dr David Muscatello from the New South Wales Department of Health for providing me with useful advice on the data on which this thesis is built, and also valuable opinions on the practical applications resulting from this thesis. I
CHAPTER 1 INTRODUCTION
1.1 Primary research aims and motivation
This thesis aims to answer questions related to the small-area analysis of sparse disease counts in a geographical region. The first question relates to the formulation of the Conditional Autoregressive (CAR) model, a commonly used statistical model in the analysis of geographically aggregated data. Specifically, I wanted to evaluate whether the formulation of the neighbourhood weight matrix has any impact on the smoothing properties of the CAR model. In addition, I wished to examine whether the adjacency and distance-based methods of assigning neighbours differ in their ability to recover the underlying relative risk estimates.
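Distance-based schemes replace the 0/1 adjacency indicator with a function of the distance between areas. A minimal sketch, assuming projected centroid coordinates (e.g. in kilometres) and a hypothetical inverse-distance function with an optional cut-off `max_dist`; other choices, such as exponential decay, are equally plausible, and this is not necessarily the form used in chapter 6.

```python
import math


def distance_weights(centroids, max_dist=None, power=1.0):
    """Inverse-distance neighbourhood weights between area centroids.

    w_ij = d_ij ** -power when d_ij <= max_dist (or for every pair if
    max_dist is None), and 0 otherwise. Assumes distinct centroids.
    """
    n = len(centroids)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(centroids[i], centroids[j])
            if max_dist is None or d <= max_dist:
                w[i][j] = d ** -power
    return w


if __name__ == "__main__":
    # Hypothetical centroids (km) for three areas
    pts = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
    for row in distance_weights(pts, max_dist=6.0):
        print([round(x, 3) for x in row])
```

With the 6 km cut-off, the third area ends up with no neighbours at all, which a CAR model cannot accommodate without adjustment; the choice of cut-off therefore matters as much as the decay function.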
The second question relates to the modelling or estimation of a sparse outcome, such as birth defects. The questions I wished to answer were: "Can we better estimate an outcome with sparse counts by jointly modelling it with another related outcome that may share some latent risk factors?" and "Can we improve on the estimates by incorporating a component (a zero-inflated Poisson component, through a mixture model) to model the excess zeros in the data?"
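In a zero-inflated Poisson (ZIP) model, each count is either a structural zero (with probability pi) or a draw from a Poisson distribution with mean mu, so P(Y = 0) = pi + (1 - pi) exp(-mu) and P(Y = k) = (1 - pi) exp(-mu) mu^k / k! for k >= 1. The sketch below holds mu and pi fixed to show only the mixture likelihood; in the spatial models of chapter 7, mu would additionally vary by area through CAR random effects. The counts and parameter values are hypothetical.

```python
import math


def zip_pmf(k, mu, pi):
    """Zero-inflated Poisson probability P(Y = k).

    With probability pi the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with mean mu.
    """
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_loglik(counts, mu, pi):
    """Log-likelihood of independent counts under a ZIP(mu, pi) model.

    pi = 0 reduces to the ordinary Poisson log-likelihood.
    """
    return sum(math.log(zip_pmf(k, mu, pi)) for k in counts)


if __name__ == "__main__":
    # Hypothetical defect counts for ten areas, with many zeros
    counts = [0, 0, 0, 0, 0, 0, 1, 3, 2, 0]
    mean = sum(counts) / len(counts)
    print("Poisson:", round(zip_loglik(counts, mean, 0.0), 2))
    print("ZIP:    ", round(zip_loglik(counts, 2.0, 0.5), 2))
```

The ZIP parameters here were picked by hand rather than estimated; even so, the mixture fits this zero-heavy pattern better than a Poisson model evaluated at its maximum-likelihood mean.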
The third broad question relates to a CAR regression model, and includes both
performed an extensive simulation analysis to evaluate 13 different scenarios, including various sampling schemes and variations in sample size.
The fourth aim of this thesis was to provide a method for forecasting sparse outcomes at the small-area level that takes into account spatial correlation in the data, an optimal formulation of the neighbourhood weight matrix, excess zeros in the data, and population (women) forecasts for the next 30 years at the Statistical Local Area (SLA) level in New South Wales, Australia. A sensitivity analysis based on different population scenarios was also carried out.
The motivation for this thesis came from the challenges faced when analysing birth defects from a large registry in New South Wales (NSW), Australia, as part of an Australian Research Council (ARC) linkage grant. The first challenge was the sparseness of the disease outcome, especially when individual birth defects were mapped to geographical locations, or even when defects were analysed in broader groupings according to the International Classification of Diseases, British Paediatric Association (ICD9-BPA) coding system. The problem was compounded when a large number of areas had zero counts of particular defects. Secondly, one had to
The impetus to fine-tune the CAR model was primarily driven by gaps in the literature, identified after an extensive review of single-disease and multiple-disease CAR models. Both spatial-only and spatio-temporal models were evaluated and compared. The review revealed that most of the models were applied to outcomes that were not rare, and to data aggregated over broad time intervals, thus ensuring that there were enough cases at each time point. Disease mapping studies involving birth defects were few, and none of them actually accounted for spatial correlation in the data. Almost all the spatial studies used the simpler Queen adjacency method of assigning neighbours, which I suspect was done out of convenience. The various formulations of the CAR models also failed to account for sparseness in the data, either implicitly or explicitly.
1.2 Content and scope of thesis
This section details what is covered in this thesis and which areas are outside the scope of this manuscript. It also describes the links between the various chapters.
In chapter 1, the motivation for undertaking the study is stated, along with the main aims of this thesis. The content, scope and structure of the thesis are also presented in this chapter. The sources of data used in the analysis are described in chapter 2, where I also provide the definition and classification of birth defects and describe the current state of birth defects in New South Wales in terms of spatial and temporal trends.
A comprehensive literature review is provided in chapter 3. Summarised components of the literature review are included in subsequent chapters, which are structured as manuscripts to be submitted for publication. Firstly, I summarise spatial analytical studies in relation to birth defects, to identify gaps in the literature. Secondly, I review selected risk factors, with a view to informing the analytical models for a subsequent analysis that combines both areal and individual risk factors (i.e. chapter 8).
Chapter 4 introduces the CAR model. I provide readers with an understanding of the context in which the CAR model is applied and describe its two main fields of application, namely disease mapping and geographical correlation studies. The mathematical properties of the CAR model are also described, along with a brief section on the adjacency matrix, which introduces a subsequent chapter (i.e. chapter 6) that examines in detail the impact of various neighbourhood weight matrices on the smoothing properties of the CAR model.
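As background, one widely used member of the CAR family, the intrinsic CAR, can be written directly in terms of the weight matrix W: its precision matrix is Q = tau * (D - W), where D is diagonal with the row sums w_i+ of W. Each area's effect then has a conditional mean equal to the weighted average of its neighbours' effects and a conditional variance of 1 / (tau * w_i+). A minimal sketch on a hypothetical chain of three areas (other CAR variants, such as the proper CAR, modify Q):

```python
def icar_precision(w, tau=1.0):
    """Precision matrix Q = tau * (D - W) of an intrinsic CAR prior.

    D is diagonal with the row sums of W; W must be symmetric with a zero diagonal.
    """
    n = len(w)
    return [[tau * ((sum(w[i]) if i == j else 0.0) - w[i][j]) for j in range(n)]
            for i in range(n)]


def conditional_moments(w, u, i, tau=1.0):
    """Mean and variance of u_i given all other u_j under the intrinsic CAR."""
    w_plus = sum(w[i])
    mean = sum(w[i][j] * u[j] for j in range(len(u)) if j != i) / w_plus
    return mean, 1.0 / (tau * w_plus)


if __name__ == "__main__":
    # Three areas in a chain: 0 - 1 - 2
    w = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    print(icar_precision(w))  # every row sums to zero, so Q is singular
    print(conditional_moments(w, [1.0, 5.0, 3.0], 1))  # (2.0, 0.5)
```

Because every row of Q sums to zero, the intrinsic CAR prior is improper (it only specifies differences between areas), which is why such effects are usually constrained to sum to zero.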
In the same chapter 4, I also discuss the strengths and limitations of the various types of CAR models commonly used. In addition, I examine the properties of spatial and spatio-temporal models, including specific comparisons of the nature of the data (sparseness of outcome) used in the studies reviewed, along with the priors and model selection techniques. The results from these comparisons inform the modelling strategy adopted in subsequent chapters. Comparisons were also made within the multivariate classes of models (i.e. models examining more than one disease outcome simultaneously), and the results were used in chapter 7.
The CAR model is predominantly fitted within the Bayesian framework. To help readers become familiar with the context of Bayesian modelling, an introduction to Bayesian
(MAUP) and Bayesian model convergence diagnostics are discussed only briefly, as these do not relate to the main objective of the thesis.
In chapter 6, I examine in detail the effect of various choices of neighbourhood weight matrices (ranging from adjacency to distance-based functions, as well as weights based on key covariates) on the smoothing properties of the CAR model. Addressing the issue of sparse disease counts is the focus of chapter 7, where I investigate the performance of a CAR model with a zero-inflated Poisson extension in terms of its ability to recover the underlying risk surface of specific birth defects, such as spina bifida and trisomy 21. I also demonstrate how the model can be strengthened by incorporating a shared component, via jointly modelling birth defects with a referent outcome (caesarean counts).
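For orientation, the generic shared component model of Knorr-Held and Best for two outcomes can be written as below, where E_{ki} are the expected counts, phi_i is the spatial component shared by both outcomes, delta > 0 scales its contribution to each outcome, and psi_{ki} are outcome-specific components. This is only the generic formulation; the chapter 7 model adds a zero-inflation component, so its exact parameterisation may differ.

```latex
\begin{aligned}
y_{ki} &\sim \operatorname{Poisson}(E_{ki}\,\theta_{ki}), \qquad k = 1, 2,\\
\log \theta_{1i} &= \alpha_1 + \delta\,\phi_i + \psi_{1i},\\
\log \theta_{2i} &= \alpha_2 + \phi_i/\delta + \psi_{2i}.
\end{aligned}
```

Because phi_i enters both equations, areas where the referent outcome (here caesarean counts) is elevated pull the estimate of the sparse outcome in the same direction, which is the sense in which strength is borrowed.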
Chapter 8 discusses in detail the major drawback of ecological analysis (i.e. potential ecological bias) and reviews the literature for suggested strategies to incorporate individual-level data with areal-level data in order to minimise this potential bias. Through extensive simulation studies, I investigate the performance of various sampling strategies, along with modifications in sample sizes, and examine how they fare for
be seen in the next 30 years from 2001 for sixteen randomly selected SLAs. The strengths and limitations of this thesis, as well as areas for future research, are the focus of the discussion in chapter 10.
In this thesis, I have excluded discussions of other seemingly related models, such as multi-level models and statistical models for point process data, as my main focus is the CAR model. The aims and objectives of these other models, as well as the nature of the data they use, generally differ from those of studies that use the CAR model, as I briefly describe here. Multi-level models, or random effects models as they are commonly known, are often used to study variables that vary at more than one level. The levels can be nested hierarchically, and the models can be formulated within both the frequentist and Bayesian frameworks. Gelman provides details on the theory behind these models, as well as various formulations and applications of multi-level models(1). In the context of our spatial analysis, the CAR convolution prior (discussed later in the thesis) is a more specific formulation of a multi-level model, in which the variance of the relative risk estimates is partitioned into spatially structured and spatially unstructured random effects.
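One standard way of writing the convolution prior, the Besag-York-Mollié formulation, makes this variance partition explicit. Here u_i is the spatially structured effect with an intrinsic CAR prior built from the neighbourhood weights w_{ij}, and v_i is the unstructured effect. This is a sketch of the usual form; the priors and hyperpriors actually used are specified in later chapters.

```latex
\begin{aligned}
y_i &\sim \operatorname{Poisson}(E_i\,\theta_i),\\
\log \theta_i &= \alpha + u_i + v_i,\\
u_i \mid u_{j \ne i} &\sim \mathcal{N}\!\left(\frac{\sum_j w_{ij} u_j}{\sum_j w_{ij}},\; \frac{\sigma_u^2}{\sum_j w_{ij}}\right),\\
v_i &\sim \mathcal{N}(0, \sigma_v^2).
\end{aligned}
```

The relative sizes of sigma_u^2 and sigma_v^2 indicate how much of the extra-Poisson variation is spatially structured.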
As for point-process models, one basic goal is to determine whether cases occur at
models have been used to fit stochastic epidemic models to study measles epidemics in one study(5). Gelfand and colleagues have also used spatio-temporally varying coefficient models to study and predict climate data, such as precipitation and temperature, measured at fixed locations(6). The fundamental difference between these models and CAR models is that, for the latter, data are available at an aggregate level rather than at fixed locations or on a continuous geographical scale.
1.3 Structure of thesis
The thesis is structured as follows. It consists of a series of chapters that are published, submitted for publication, or unpublished. Chapter 6, "Addressing the Neighbourhood Weight Matrix", has been published in the International Journal of Health Geographics. Chapter 7, "Modelling Sparse Disease Counts", has been accepted for publication in the Health and Place journal. Chapter 8, "Strategies for Combining Areal with Individual Data", and chapter 9, "Forecasting Birth Defects at the Small Area Level, NSW", have been submitted to Statistics in Medicine and BMC Health Services Research, respectively. These chapters have been included in the same format as they were submitted for publication. This explains the variations in the way the chapters are presented, the different sub-headings used in the various chapters, and the distinct bibliography formats required by the various journals. The rest of the chapters consist of unpublished work. I have included separate bibliographies at the end of each chapter for the published works, and one overall bibliography for the remaining unpublished chapters at the end of the thesis.
1.4 List of publications and conferences arising from thesis
Arul Earnest, Geoff Morgan, Kerrie Mengersen, Louise Ryan, Richard Summerhayes, John Beard. Evaluating the effect of neighbourhood weight matrices on smoothing properties of Conditional Autoregressive (CAR) models. International Journal of Health Geographics, November 2007, Volume 29;6: pp 54-65.
Arul Earnest, John Beard, Geoff Morgan, Douglas Lincoln, Richard Summerhayes, Deborah Donoghue, Therese Dunn, David Muscatello, Kerrie Mengersen. Small Area Estimation of Sparse Disease Counts using Shared Component Models: Application to Birth Defect Registry Data in New South Wales, Australia. Health and Place Journal (accepted for publication 23 February 2010).
Arul Earnest, John Beard, Geoff Morgan, Deborah Donoghue, Therese Dunn, David Muscatello, Danielle Taylor, Kerrie Mengersen. Sampling and sample size strategies for including individual with areal-level covariates in the spatial analysis of a sparse disease outcome. Submitted to Statistics in Medicine, October 2009.
Arul Earnest, Kerrie Mengersen, Geoff Morgan, John Beard. Forecasting Birth Defects
Arul Earnest. Evaluating the effect of neighbourhood weight matrices on smoothing properties of Conditional Autoregressive models. Contributed talk at Spring Bayes, 27-29 September 2006, Queensland University of Technology.
Arul Earnest, John Beard, Geoff Morgan, Douglas Lincoln, Richard Summerhayes, Deborah Donoghue, Therese Dunn, David Muscatello, Kerrie Mengersen. Modelling Sparse Disease Counts Using the Shared Component Model. Poster presentation at the International Society for Bayesian Analysis 9th World Meeting, Hamilton Island, Australia, 20-25 July 2008.
Arul Earnest, John Beard, Geoff Morgan, Douglas Lincoln, Richard Summerhayes, Deborah Donoghue, Therese Dunn, David Muscatello, Kerrie Mengersen. Modelling Sparse Disease Counts Using the Shared Component Model. Poster presentation at the National Healthcare Group Annual Scientific Congress, 7-8 November 2008, Singapore. The poster won first prize in the best poster competition for the Quality/Health Services Research section.
CHAPTER 2 DATA
2.1 Summary
The aim of this chapter is to provide readers with an understanding of the sources of data used in subsequent analyses in this thesis. Selected birth defects are also described, along with the classification or grouping of birth defects. A description of current spatial and temporal trends of birth defects in New South Wales is provided as background to subsequent work in this area. Existing official health department reports make it clear that birth defects do exhibit spatial patterns as well as a temporal gradient.
2.2 Sources of data
2.2.1 Birth defects
De-identified birth defect records were obtained from the NSW Birth Defects Register (BDR). The register has been operational since 1990; in its early years, reporting of defects was voluntary. Since 1998, doctors, hospitals and laboratories have been required by law to report all birth defects, including those observed during pregnancy, at birth or up to one year of age. Each birth defect is recorded as a separate record, so the total number of congenital abnormalities reported is
2.2.2 Births and maternal characteristics
Information on births in NSW from 1990 to 2004 was obtained from the NSW Midwives Data Collection (MDC), which, like the BDR, is a population-based register. Covering all births in NSW (including public, private and home births), the MDC depends on the attending midwife or doctor to complete and submit a notification form whenever a birth occurs(7). The registry includes all live births and stillbirths of at least 20 weeks' gestation or at least 400 grams birth weight. I also obtained maternal demographic information (e.g. residential address at the time of birth, maternal age at delivery, maternal smoking during pregnancy, maternal diabetes, and delivery in a private versus public hospital), as well as pregnancy, labour, delivery and perinatal outcomes, from the MDC.
Each birth record in NSW within the study period was geocoded (i.e. assigned a longitude and latitude) based on the mother's residential address at the time of birth. This geocoding was done by Mr Richard Summerhayes from the Northern Rivers University Department of Rural Health using geocoding software developed by NSW Health and the Australian National University. Further details on this software, called Febrl, can be found in this reference(8). Each record was then assigned to the 2001 Census Collection District (CCD) within which it fell. There are 11,706 CCDs in
also probabilistically linked to the MDC, a task carried out by the NSW Department of Health. The combined data were used in a subsequent analysis in the thesis on the association between birth defects and individual maternal characteristics, along with areal covariates such as the socio-economic status of the area in which the mother was living.
In 1998, a 2% sample of Midwives Data Collection records (N=1703) was validated against other hospital records(9). The excellent quality of this database is reflected in high levels of agreement, including 99.1% agreement on gestational diabetes (kappa 0.87), 94.9% agreement for smoking in pregnancy (kappa 0.85), 96.5% agreement for birthweight (kappa not calculated) and 84.8% agreement for gestational age (kappa 0.81). This study, and access to both the BDR and MDC databases, was approved by the New South Wales Population & Health Services Research Ethics Committee.
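Percentage agreement and Cohen's kappa answer different questions: kappa discounts the agreement expected by chance from the marginal frequencies alone. A minimal sketch for a categorical item coded in two sources, using a hypothetical 2 x 2 table rather than the validation study's counts (which are not reproduced here):

```python
def agreement_and_kappa(table):
    """Observed agreement and Cohen's kappa from a square contingency table.

    table[a][b] counts records coded as category a in the first source
    (e.g. the MDC) and category b in the second (e.g. hospital records).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[a][a] for a in range(k)) / n
    row_totals = [sum(table[a]) for a in range(k)]
    col_totals = [sum(table[a][b] for a in range(k)) for b in range(k)]
    p_exp = sum(row_totals[a] * col_totals[a] for a in range(k)) / n ** 2
    return p_obs, (p_obs - p_exp) / (1.0 - p_exp)


if __name__ == "__main__":
    # Hypothetical: rows = MDC (yes, no), columns = hospital record (yes, no)
    table = [[18, 4], [3, 975]]
    p_obs, kappa = agreement_and_kappa(table)
    print(f"agreement = {p_obs:.1%}, kappa = {kappa:.2f}")
```

For a rare condition, chance agreement on the "no" category is already high (about 96% in this example), so kappa sits well below the raw agreement even when the two sources rarely disagree.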
2.2.3 Areal-level indices of socio-economic status
I used data from the Australian Bureau of Statistics (ABS) to describe the level of social and economic well-being of areas in NSW. These data were freely available on the ABS website, and a technical paper can be found here(10). The following four indices
2. Index of Relative Social Disadvantage. Higher values reflect a lack of disadvantage, which is subtly different from the index above. The variables used to compute this index included income, educational attainment, unemployment and dwellings without motor vehicles.
3. Index of Economic Resources. Variables relating to the income, expenditure and assets of families, such as family income, rent paid, mortgage repayments and dwelling size, went into computing this index.
4. Index of Education and Occupation. This index took into account the proportion of people with a higher qualification or employed in a skilled occupation.
The data were available at the various Australian Standard Geographical Classification (ASGC) levels, from the most basic Census Collection District (CCD) up to the Statistical Local Area (SLA). Because simply averaging the CCD-level indices up to the SLA level is problematic, I used an index that was calculated at the SLA level and population-weighted; this calculation was performed by the ABS.
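One difficulty with simple averaging is that a sparsely populated CCD counts as much as a densely populated one. The sketch below, with hypothetical scores and populations for three CCDs making up one SLA, only illustrates that difference; it is not a reproduction of how the ABS derives its SLA-level indices.

```python
def simple_mean(scores):
    """Unweighted average of CCD-level index scores."""
    return sum(scores) / len(scores)


def population_weighted_mean(scores, populations):
    """Average of CCD-level scores weighted by each CCD's population."""
    return sum(s * p for s, p in zip(scores, populations)) / sum(populations)


if __name__ == "__main__":
    scores = [900, 1000, 1100]        # hypothetical CCD index scores
    populations = [2000, 300, 200]    # hypothetical CCD populations
    print(simple_mean(scores))                            # 1000.0
    print(population_weighted_mean(scores, populations))  # 928.0
```

Here most residents live in the lowest-scoring CCD, so the unweighted average overstates the well-being of the SLA's population as a whole.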
Data on the four indices were standardised by the ABS to have a mean of 1000 units and
used to derive the final index score for each CD. Further details on how these indices were derived from variables obtained from the 2001 census are given in an information paper available from the ABS(10). It should be noted that the indices measure the socio-economic well-being of a region rather than of an individual, and this subtle difference is illustrated in an analysis presented later in the thesis.
2.3 Definition and classification of birth defects
A birth defect can be thought of as a physiological or structural abnormality that is present at birth and is significant enough to be considered a problem. According to the US Centers for Disease Control and Prevention, most birth defects are thought to be caused by a complex mix of factors, including genetics, environment and behaviours(11).
Much of the analysis for this thesis draws on data from the NSW Birth Defects Register. The Register uses the following definition of a birth defect: "Any structural defect or chromosomal abnormality detected during pregnancy, at birth, or in the first year of life, excluding birth injuries and minor anomalies such as skin tags, talipes, birthmarks, or clicky hips"(7).
the state from which the data for this study were drawn, relies on the BPA classification system(13), which is organised by body system(7). Table 1 lists selected birth defects recorded by the Registry using this approach, together with a short description of each.
In the United States, the Centers for Disease Control and Prevention uses a classification system modified from the original BPA system(14). The key advantage of this system is that it allows researchers to describe birth defects and related conditions in more specific detail. In particular, it records the laterality of the defect (i.e. whether the defect was on the right or left side of the body) and provides greater specificity for a defect. One disadvantage of this approach is that the analysis of such data becomes more challenging, because defects become sparser as the classification becomes more specific.
Table 1. Description of selected birth defects from the NSW birth defect registry
Defect | Description
Anencephaly | Absence of the cranial vault, with the brain tissue completely missing or markedly reduced
Spina bifida | Defective closure of the bony encasement of the spinal cord, through which the spinal cord may protrude
Encephalocele | Protrusion of brain through a congenital opening in the skull
Hydrocephalus | Dilatation of the cerebral ventricles accompanied by an accumulation of cerebral fluid within the skull
Buphthalmos | Enlargement and distension of the fibrous coats of the
turned outward
Polydactyly | Presence of additional fingers or toes on hands or feet
Syndactyly | Attachment of adjacent fingers or toes on hands or feet
Craniosynostosis | Premature closure of the sutures of the skull
Exomphalos | Herniation of the abdominal contents into the umbilical cord
Gastroschisis | A defect in the abdominal wall not involving the umbilicus and through which the abdominal contents herniate
Cystic hygroma | A sac, cyst or bursa distended with fluid
Source: Centre for Epidemiology and Research, NSW Department of Health. New South Wales Mothers and Babies 2005. N S W Public Health Bull 2006; 18(S-1); pp 135.
2.4 Spatial and temporal trends of birth defects in New South Wales, Australia
Across all states in Australia, there has been considerable variation in the reported rates of birth defects over the past 20 years. For instance, the rate of all malformations ranged from 159.4 per 10,000 births (1981-1995) to about 175.2 per 10,000 births (1997). There was also marked spatial variation in the reported rates between states in the period 1991-1997, with the highest rates found in Victoria (229.2 per 10,000 births), followed by the ACT (222.3 per 10,000 births) and Queensland (194.3 per 10,000 births)(15).
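Rates such as these are counts scaled to a common denominator. A minimal sketch of a crude rate per 10,000 births with an approximate 95% confidence interval, assuming the case count is Poisson and using a normal approximation on the log scale (adequate for moderate counts, less so for the very small counts typical of individual defects). The counts are hypothetical.

```python
import math


def rate_per_10000(cases, births, z=1.96):
    """Crude rate per 10,000 births with an approximate confidence interval.

    Treats the number of cases as Poisson, so the standard error of
    log(rate) is about 1 / sqrt(cases). Requires cases > 0.
    """
    rate = 10000 * cases / births
    se_log = 1.0 / math.sqrt(cases)
    return rate, rate * math.exp(-z * se_log), rate * math.exp(z * se_log)


if __name__ == "__main__":
    for cases, births in [(1500, 90000), (15, 900)]:
        rate, lower, upper = rate_per_10000(cases, births)
        print(f"{cases}/{births}: {rate:.1f} per 10,000 (95% CI {lower:.1f}-{upper:.1f})")
```

Both hypothetical areas have the same rate, but the interval for the smaller one is far wider, which is why comparisons based on small numbers need care.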
However, since the notification criteria and sources vary by state, these trends may
comparisons, due to small numbers. At this stage, we try not to be unduly concerned about the reasons for spatial variation, except to note that defects exhibit both spatial and temporal variation, even at broader scales of analysis.
In NSW, state-wide surveillance of birth defects is conducted through the Birth Defects Register (BDR), which is administered by the NSW Department of Health. The overall rate of birth defects appears to have been stable between 1999 and 2004. However, when the defects were examined by individual diagnostic category, there was considerable year-to-year variation. The rate of ventricular septal defect, for instance, fell from 2.1 per 1,000 births in 2002 to 0.9 per 1,000 births in 2003, before returning to 2.1 per 1,000 births in 2004(7, 16).
Within NSW, reported birth defect rates varied across the 8 administrative health areas between 1999 and 2005. For example, the NSW Mothers and Babies Report 2005 found elevated rates of birth defects in the Hunter and New England area(7). However, there are some issues to consider when making comparisons of this sort across regions. The first involves mothers residing near state borders, who may go interstate for treatment and nominate an interstate place of residence for the duration of that treatment. It is
sources of spatial variation at this stage of the project, but rather make the point that spatial variation in reported rates of birth defects at smaller levels of aggregation is inherent in the data. Health data in NSW can be grouped at various geographical scales, including the Census Collection District (CCD) and the Statistical Local Area (SLA). CCDs are the smallest spatial unit, and there are 11,706 of them in NSW. These CCDs can be aggregated into broader groupings, including the SLA and the Local Government Area (LGA). In urban areas, the average number of dwellings per CCD is about 220, whereas this number is considerably lower for CCDs in rural areas. There is also considerable variation in the geographical size of CCDs (interquartile range 1 km² to 62 km²).
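Because CCDs nest within SLAs, area-level counts can be rolled up with a simple lookup from each CCD to its SLA. The sketch below uses made-up identifiers rather than real ABS codes. It also shows the trade-off behind much of this thesis: aggregation removes zero counts at the cost of spatial resolution.

```python
from collections import defaultdict


def aggregate_to_sla(ccd_counts, ccd_to_sla):
    """Sum CCD-level counts up to the SLA that contains each CCD."""
    totals = defaultdict(int)
    for ccd, count in ccd_counts.items():
        totals[ccd_to_sla[ccd]] += count
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical defect counts for five CCDs and their parent SLAs
    counts = {"CCD_A": 0, "CCD_B": 1, "CCD_C": 0, "CCD_D": 0, "CCD_E": 2}
    lookup = {"CCD_A": "SLA_1", "CCD_B": "SLA_1", "CCD_C": "SLA_2",
              "CCD_D": "SLA_2", "CCD_E": "SLA_2"}
    print(aggregate_to_sla(counts, lookup))  # {'SLA_1': 1, 'SLA_2': 2}
```

Three of the five CCDs have zero counts, but neither SLA does.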
CHAPTER 3 LITERATURE REVIEW
3.1 Summary
The aim of this chapter is to provide a comprehensive literature review on a few selected important topics. Firstly, current research on the spatial analysis of birth defects is summarised. Next, I examine the prior evidence on the relationship between birth defects and selected risk factors such as maternal age, maternal smoking, maternal diabetes mellitus status and socio-economic indicators (both areal and individual measures). The aim is to identify which particular defects are associated with these risk factors. This chapter also identifies risk factors that are common to birth defects and caesarean rates, and describes spatial variation in caesarean rates. The results are used in subsequent chapters examining the risk factors for birth defects, as well as in the joint modelling of two related outcomes.
The search strategy used to identify studies for discussion in this chapter is described here. I searched for all relevant research articles in MEDLINE, which contains bibliographic citations and author abstracts from more than 4,000 biomedical journals published in the United States and 70 other countries. The PubMed online search engine was used for this purpose. In addition, I went through the bibliographic lists of relevant journal articles to identify additional pieces of research to include in my
Trang 32that were performed in the laboratories, as well as those not published in English For
the section on risk factors for birth defects, I used the following search terms For
maternal age, as an example, I used “maternal age” and “birth defect”, “age” and “birth
defect”, as well as “risk factor” and “birth defect” more generally For caesarean rates,
the following key-words were used: “caesarean” and “risk factor”, “caesarean” and
spatial”, as well as “caesarean” and “geographic” I would like to add that this was not a
systematic review exercise, and hence I did not provide a summary of the results from
the literature Rather, the studies identified in the literature were used to justify the use
of the selected risk factors, in studying their association with birth defects in my thesis
3.2 Spatial analysis of birth defects
In many countries, information on birth defects is obtained from national
or regional birth defect registries and analysed. These registries often hold data on the mother’s
residential address, which enables researchers to undertake various forms of
spatial analysis of the epidemiology of birth defects. Applications of spatial analysis
range from simple mapping of defects, to identifying clusters and exploring the
influence of environmental factors, such as air pollution, contaminated sites and disinfection
byproducts from water chlorination, on the occurrence of birth defects. Ecological […]
Few studies have examined the spatial distribution of birth defects and their association
with possible spatially varying risk factors. High altitude has been implicated in at least
three studies in South America. The first, covering 53 hospitals across Latin
America (17), found significantly higher adjusted relative risks among those
living in the highlands, specifically for cleft lip (RR=1.57, 95% CI: 1.27-1.94), microtia
(RR=3.21, 95% CI: 2.35-4.79), preauricular tag (RR=2.09, 95% CI: 1.86-2.36), branchial
arch anomaly complex (RR=1.79, 95% CI: 1.23-2.61), constriction band complex
(RR=1.92, 95% CI: 1.11-3.31) and anal atresia (craniofacial defects) (RR=1.61, 95% CI:
1.01-2.57). On the other hand, risks were lower for spina bifida (RR=0.57, 95% CI:
0.37-0.78), anencephaly (RR=0.33, 95% CI: 0.20-0.54), hydrocephaly (RR=0.41, 95%
CI: 0.22-0.77) and pes equinovarus (neural tube defects) (RR=0.70, 95% CI: 0.51-0.96).
The second study also linked altitude with the risk of microtia, with a relative risk of
2.66 (p<0.01) for those living more than 1000 m above sea level versus those
living less than 500 m (18), whilst the third study (19) found that the birth prevalence of
cleft lip/palate was higher among those living at high altitude (effect size
not provided).
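The relative risks quoted above are reported with 95% confidence intervals. As an illustration only, the sketch below computes a crude (unadjusted) relative risk and a Wald-type 95% CI on the log scale from a 2×2 table. The counts and the function name `relative_risk` are hypothetical, and the adjusted estimates in the cited studies come from regression models that this sketch does not reproduce.

```python
import math


def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Crude relative risk with a Wald-type CI computed on the log scale.

    cases_exp / n_exp:     affected births / total births in the exposed group
    cases_unexp / n_unexp: affected births / total births in the unexposed group
    """
    risk_exp = cases_exp / n_exp
    risk_unexp = cases_unexp / n_unexp
    rr = risk_exp / risk_unexp
    # Standard error of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 30 affected per 10,000 births in the highlands (exposed)
# versus 20 affected per 10,000 births in the lowlands (unexposed)
rr, lower, upper = relative_risk(30, 10_000, 20, 10_000)
print(f"RR = {rr:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

With these invented counts the point estimate is 1.5, yet the interval includes 1, which illustrates how imprecise estimates for rare outcomes can be when the number of cases is small.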
In another study, researchers examined indicators of exposure to industrial activities (20)
and found significant associations between the textile industry and anencephaly (RR=1.59, […]
relative risk of having the second infant with the same defect was higher among those
who lived in the same municipality during both pregnancies (RR=11.6, 95% CI: 9.3-14.0),
compared with those who moved to another municipality (RR=5.1, 95% CI:
3.4-6.7) (21). In contrast, the second study did not find any significant change in the
frequency of facial-cleft defects among mothers who changed municipality of residence
(RR=0.9, 95% CI: 0.6-1.5) (22).
In studies that examined spatial variation (in particular geographical differences) in
the risks of specific birth defects, neural tube defects were the defects most commonly
found to be spatially correlated (23-28), followed by clefts (19, 29),
anophthalmia and microphthalmia (30), for which prevalence was higher in rural
than in urban areas, as well as diaphragmatic hernia and gastroschisis (27). Variation in
birth defect rates was also found to depend on the level of aggregation, e.g. across register
areas and hospital catchments, but not below this level (31). A study from England found variation
in reported rates by local register and hospital catchment area (p<0.001), but not by area
deprivation scores (p>0.1, effect size not provided) (32). Proximity of maternal residence
to landfill sites was found to be associated with certain birth defects such as neural tube
defects (RR=1.05, 95% CI: 1.01-1.10), hypospadias and epispadias (RR=1.07, 95% CI: […]
nonsyndromic cleft lip/palate across public health regions of residence in Texas,
USA (38).
3.3 Risk factors for birth defects
There have been numerous studies of the association between various
risk factors and the occurrence of congenital abnormalities at birth. Study designs have
included case-control studies (including the use of sick controls), cohort studies and, more
commonly, the use of birth defect registries to explore associations with risk factors.
Variables of interest have included socio-demographic, genetic, nutritional,
infectious and other environmental factors. It is not the aim of this thesis to undertake
an evaluation of all these risk factors, as this has been carried out by other authors.
Here, I discuss specific risk factors that were commonly found in the literature and
available to us for analysis: maternal age, maternal smoking, maternal
diabetes status and socio-economic status. I also examine studies that have analysed the
defects spatially/geographically, as this project’s interest is in studying risk factors from
the spatial perspective.
3.3.1 Maternal age at delivery
Maternal age is the most commonly studied risk factor in studies involving birth […]
Younger mothers have been shown to have a higher risk of giving birth to babies with
gastroschisis (39-43), chromosomal abnormalities (43, 44), cystic hygroma, autosomal
recessive disorders, monogenic disorders, ventricular septal defects (43), anencephaly,
all ear defects, female genital defects, polydactyly and omphalocele (42). Older mothers, on
the other hand, are known to have a higher risk of having babies with various types of
atresia (30, 42), anophthalmia, microphthalmia (45), heart defects, right outflow tract
defects, male genital defects including hypospadias, craniosynostosis (42), trisomy 18,
trisomy 21, dysplasia of the hip, chromosomal abnormalities (43, 46), pancreas, Down
syndrome (47) and cleft lip/palate (48).
As shown above, some defects, such as chromosomal abnormalities, are associated with
both older and younger maternal age. Neural tube defects also
display such a U-shaped relationship with maternal age (26, 28, 49, 50). In contrast,
other studies have shown no significant relationship
between maternal age and defects such as cleft lip and palate (29, 51, 52),
ventricular septal defect (53), severe birth defects (54), and anencephalus and spina
bifida (55).
[…] defects, persistent ductus arteriosus (58), isolated craniosynostosis (59), kidney
malformations (60), oral clefts (48, 61-66), limb reduction birth defects (67), deformities
of the foot (68), gastroschisis (40, 41, 69, 70), defects of the cardiovascular system (71),
hydrocephaly, polydactyly/syndactyly/adactyly (69), clubfoot (69, 72, 73) and birth defects in
general (47).
Most of these studies used a case-control design and included data from
birth defect registries. However, two meta-analyses (74, 75) examined the
combined (across-study) effect of maternal smoking on the occurrence of oral cleft
birth defects, and both found a relationship between smoking and risk of oral clefts.
Some studies (48, 62) demonstrated a dose-response relationship between smoking
and birth defects, which is one of the criteria for causality in observational studies (48, 59,
62, 65).
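One common way to test for a dose-response trend in the proportion of affected births across ordered exposure categories is the Cochran-Armitage test for trend. The sketch below assumes three hypothetical smoking categories scored 0, 1 and 2 (for example, none, light and heavy); the counts and the function name are invented for illustration, and the cited studies (48, 62) may have assessed dose-response with other methods, such as logistic regression with an ordinal exposure term.

```python
import math


def cochran_armitage_trend(cases, totals, scores):
    """Cochran-Armitage test for a linear trend in proportions.

    cases[i]  : affected births in exposure category i
    totals[i] : all births in exposure category i
    scores[i] : ordinal score for category i (e.g. 0, 1, 2)
    Returns the Z statistic and a two-sided p-value (normal approximation).
    """
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # pooled proportion affected
    # Score-weighted deviation of observed from expected cases
    t_stat = sum(s * (x - n * p_bar) for s, x, n in zip(scores, cases, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    var_t = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    z = t_stat / math.sqrt(var_t)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts for non-smokers, light smokers and heavy smokers
cases = [40, 15, 20]
totals = [20_000, 4_000, 3_000]
z, p_value = cochran_armitage_trend(cases, totals, scores=[0, 1, 2])
print(f"Z = {z:.2f}, two-sided p = {p_value:.2g}")
```

A positive Z indicates that the proportion affected rises with the exposure score; the equally spaced scores are an assumption, and different scores can change the result.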
The use of self-reported smoking as a risk factor in epidemiological studies has
been criticized because of the possibility of recall bias. This is especially true in birth
defect studies that use normal controls (i.e. mothers of babies without any birth
defect, who may not be as motivated to accurately recall past behaviours). In the
literature that we reviewed, we found some studies that used affected or sick controls […]
oral clefts, musculoskeletal malformations (78), conotruncal heart defects, limb
deficiencies (77, 79), Down syndrome (80) and nonsyndromic oral cleft (55).
3.3.3 Socio-economic indicators
There is growing evidence that socio-economic disadvantage is associated with a higher
risk of a range of adverse health outcomes. Epidemiological research in this field is often
hampered by difficulties in eliciting data on socio-economic status from study
participants, particularly because such information may be sensitive for
participants, or may not be collected by routine data collection systems. Consequently,
researchers often resort to a wide variety of surrogate indicators of socio-economic
status, including occupation, income, race, education and health insurance type.
Gonzalez provides a systematic review of studies that looked at the relationship between
socio-economic status (as measured by education or occupation) and ischemic heart
disease (IHD), and reports a clear relationship between both measures and the risk of
IHD (81). A strong inverse relationship between socio-economic status (SES) and the risk of
cardiovascular disease and mortality has also been highlighted in another study (82).
Sometimes, socio-economic information may not be collected from the individual […]
substitute for individual-level information, and any bias introduced by this approach is
likely to lead to conservative estimates of association (83).
These areal measures are often derived from data available from the census or other
ad hoc independent surveys. Composite indices measuring some form of disadvantage are
then formed using statistical techniques such as factor analysis or principal component
analysis. The areal measures are intended to capture the contextual socio-economic
effects of a person’s residential area. I reviewed the literature on birth defects to examine the
various types of socio-economic status measures used and their relationship to
specific birth defects. As I will show in a subsequent chapter, the effect of
socio-economic status on the occurrence of birth defects seems to operate at both the
individual and areal level, and complex statistical models are needed to incorporate this
hierarchical structure in the data (i.e. data measured at disparate scales).
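A composite areal index of the kind described above can be sketched with principal component analysis: standardise each census indicator, take the first principal component of their correlation matrix, and use the component scores as the index. The indicator names, values and the function name `pca_index` below are hypothetical, and the sign convention (higher score means more disadvantage) is an assumption; published indices may use different variables, weights or factor-analytic methods.

```python
import numpy as np


def pca_index(X, disadvantage_sign):
    """Composite area-level index from the first principal component.

    X                 : areas x indicators array of raw census values
    disadvantage_sign : +1 if a higher value means more disadvantage, else -1
    """
    X = np.asarray(X, dtype=float)
    # Standardise each indicator so that units do not drive the weights
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    loadings = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
    # Eigenvector sign is arbitrary; orient it so the index rises with disadvantage
    if loadings @ np.asarray(disadvantage_sign, dtype=float) < 0:
        loadings = -loadings
    scores = Z @ loadings
    explained = eigvals[-1] / eigvals.sum()  # share of variance captured
    return scores, loadings, explained


# Hypothetical areas; columns: % unemployed, % without post-school
# qualifications, median weekly household income ($)
X = [
    [4.0, 55, 900],
    [6.5, 62, 760],
    [9.0, 70, 640],
    [3.0, 48, 1050],
    [12.0, 75, 560],
    [7.5, 66, 700],
]
scores, loadings, explained = pca_index(X, disadvantage_sign=[1, 1, -1])
print(np.round(scores, 2), round(float(explained), 2))
```

Standardising first means the index is driven by the correlation structure rather than the measurement units; the resulting area scores are what would then enter a model as an areal covariate alongside individual-level measures.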
Among the various individual-level socio-economic indicators, occupation (20, 84-90)
appears to be the most commonly studied covariate, followed by education (26, 28, 48, 84,
85, 91, 92), race (23, 38, 50, 93-95), income (49, 70, 84) and insurance status (26, 47).
Areal-level measures of socio-economic status have been evaluated in a number of
studies of the occurrence of birth defects (23, 30, 85, 96-101).
[…] having babies with the defect. For facial clefts, education (48, 85) and occupation (85,
87, 88) were the two main covariates implicated. Areal-level measures were shown to be
related to neural tube defects in a number of studies (23, 97, 99), along with facial clefts
(85, 100, 101). Two studies evaluated both individual-level and areal measures of
socio-economic status, with mixed results. The first study
found a significant effect of lower individual socio-economic status and residence in a
lower-SES neighbourhood on the occurrence of neural tube defects (OR=1.7, 95% CI:
1.1-2.5), using maternal employment and neighbourhood
unemployment as indicators of SES (99). The other study (101) found an increased
risk of spina bifida (OR=2.3, 95% CI: 1.0-5.5) and cleft palate (OR=2.3, 95% CI: 1.4-3.8)
with a household SES index, but not with an individual SES measure such as maternal
unemployment (OR=1.2, 95% CI: 0.7-2.2 for spina bifida and OR=1.2, 95% CI:
0.9-1.7 for cleft palate).
3.3.4 Maternal diabetes mellitus
There are two general types of diabetes (102). Type 1 diabetes occurs when the body produces
too little insulin to make use of blood sugar for energy. Type 2 diabetes
occurs when the body makes too little insulin or is unable to use the insulin to produce