Genetic determinants of infectious disease susceptibility


1 Introduction

Since the beginning of the last century, modernization and improved health care have reduced infectious disease mortality, mostly in developed countries, but the burden of infectious diseases in developing countries, such as Indonesia, remains high[14]. Surprisingly, in recent years there has been a resurgence in TB incidence[15], even among developed nations, and this phenomenon has sparked renewed interest in epidemiological and other studies of infectious diseases. The World Health Organization (WHO) and other organizations are reaching out to developing countries, where infectious diseases are endemic[16, 17]. Despite the massive efforts made in introducing drugs and vaccines for treatment and prevention, many common infectious diseases, such as TB and hepatitis B, have yet to be brought under sufficient control, and their spread continues[15, 18-21].

Surveillance of infections revealed that certain individuals could be exposed to large doses of an infectious agent yet remain resistant to infection[22]. Furthermore, even among those infected, a large percentage have the immunity to clear the infection naturally without disease symptoms[6]. This suggests that host defense is often an effective means of controlling infection. Heritability studies also indicated a strong genetic component in determining the variable degrees of immune response to infection, and disease susceptibility among individuals[1, 2]. The burden of infectious diseases has been tremendous throughout the history of mankind, not only in economic terms, but especially as a major selective pressure on our genome[23, 24]. A diverse immunological response to a wide range of infectious agents translates into an evolutionary advantage; thus it is not surprising to find genes involved in immunity among the most abundant and diverse in the human genome[25]. Given the multiple adverse social, economic, and evolutionary effects of infectious diseases, there is strong motivation to study the role of host genetics in controlling the immune response to them[5-7].

Global initiatives, such as the Human Genome Project, the International HapMap Consortium and the recent 1000 Genomes Project, have cataloged most of the genetic variations that are common in humans[26-30]. These efforts have spurred the advent of high-throughput genotyping technology for conducting genetic association studies in a cost-effective and efficient manner[31]. This essential advancement in genomics has helped to realize the main objective of the current PhD project, which is to identify genetic variation that influences susceptibility to infection or the variable outcomes of infection. Two genetic mapping studies are used to achieve this aim: the first is a case-control study of pulmonary TB susceptibility, and the second is a study of post-vaccination antibody titer response against hepatitis B virus infection. The study populations for both studies were drawn from Indonesia, where these diseases are endemic[16, 20] (Figures 1 and 2), and there is significant interest in elucidating the genomic mechanisms of disease control.

Figure 1(A): Annual number of new reported TB cases[16]

Figure 1(B)[16]

Figure 2: Worldwide distribution of chronic HBV infection and annual incidence of primary hepatocellular carcinoma[20]

In addition, although the genomes of any two individuals are surprisingly similar (~99%), there are vast differences in many phenotypic aspects, from outward appearance to internal responses to environmental assaults, such as pathogen infections. Interestingly, small DNA variations contribute to these differences. They make up the remaining small percentage of differences, which renders each of us unique, and are important in determining our response to diseases and our well-being.

2.2 Genes and immunity

Over the course of human evolutionary and migratory history, selection and adaptation to environmental challenges have shaped our genome. Natural selection has influenced our modern population's collection of genetic variations, and may instill a population-specific genetic response to environmental triggers[15, 24, 32]. Since the beginning of our species, we have been constantly confronted with massive numbers of microbes, and the burden of infectious diseases can be tremendous. Therefore, it is not surprising that current studies on natural selection in humans have highlighted genes related to host defense and immunity as among the most strongly selected genes[23, 24]. This implies that immunity and host defense in individuals today are highly influenced by the historical experience and genetic responses of our ancestors to infections, and by the consequent selection that shaped the inherited form of our current gene pool[25, 33].

2.3 Types of genetic variation

Genetic variants are primarily ancestral mutations that arose many generations ago, were successfully passed on to offspring, and thus occur at increased frequency in the current population. When the alleles of a variant reach a frequency of more than 1 percent in the population, the variation is termed a polymorphism. Although many of these variants may confer functional changes to proteins, most of them act as markers for mapping specific genomic loci of interest, and serve to aid the identification of differences between individuals.

Restriction fragment length polymorphisms (RFLPs)

RFLPs were among the earliest types of DNA variants/polymorphisms utilized for genetic studies. An RFLP is characterized by an alteration in the electrophoresis gel pattern. When there is a base change (variant form) in the DNA, the restriction endonuclease is unable to recognize and cut the specific target sequence, and hence produces fragments of different lengths, which are identified by gel electrophoresis followed by Southern blotting. Although a useful tool in the earlier days, RFLPs had many drawbacks: they are relatively scarce in the genome, and the tedious and time-consuming laboratory steps were difficult to automate.

Microsatellites

Microsatellites are composed of multiple tandem repeats of a short DNA sequence motif, in which differences in the number of repeats differentiate between alleles. Unlike SNPs, they have an extremely high mutation rate, which gives rise to their high variability and made them a highly informative and popular choice for linkage studies, especially in the 1990s. However, owing to their highly polymorphic nature, they are not ideal for population-based association studies, which have now become the mainstay of genetic epidemiological study design. In addition, they have the disadvantage of being less amenable to cost-efficient high-throughput genotyping technology, and their finite number in the genome limits the density of microsatellite-based genetic maps.

Single nucleotide polymorphism (SNP)

Single nucleotide polymorphisms (SNPs) are variants in the form of single base substitutions in which each allele has a population frequency of at least 1%. SNPs are the most abundant and well-studied form of genetic variation in our genome. The Human Genome Project discovered at least 1.8 million SNPs and attempted to map them to their specific genomic locations, which were found to be widespread throughout the genome[26]. Most SNPs are located in introns or between genes, and are therefore non-coding, while those in protein-coding regions can be either non-synonymous or synonymous, depending on whether an amino acid change is involved. Nevertheless, all classes of SNPs could cause a change in phenotype, since intronic and intergenic SNPs may also affect the regulation and expression of genes. To date, more than 10 million common SNPs in humans have been cataloged in databases, of which many were independently validated (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi). This is a treasure trove of information that can be mined for genetic studies. In addition, SNPs are usually bi-allelic and relatively stable due to their low mutation rate, which makes them easy to genotype and interpret, and thus well suited to high-throughput technology[34]. Therefore, SNPs have been the variant of choice for the popular population-based association studies in recent years.

2.4 Infection and host defense

Harmful microorganisms, such as pathogenic bacteria and parasitic viruses, which we encounter constantly in our environment, cause infectious diseases. The intrusive attack of such infectious agents typically triggers a natural cascade of tightly regulated immune responses to control the infection, which often results in the successful eradication of the pathogens. In natural infections, there are two basic lines of defense: innate and adaptive immunity.

Although it is usually successful in controlling infections, our immune system is complex and multi-factorial, which makes it susceptible to persistent pathogens; it can thus fail in its protective role and allow the onset of disease. In addition, the variability in our genome could cause variation in the immune responses of different individuals. Although many may be exposed to the same infectious agents in the environment, only certain individuals suffer the onset of disease from the infection[22, 35]. This is supported by heritability studies on twins, and by other familial aggregation and segregation studies, which indicated a genetic component contributing to the variable immunity between individuals[3, 4, 36, 37]. This suggests a potential area of study, in which genetic variation in genes that may influence immune responses is identified in order to further understand the relationship between pathogenic exposure and actual infections[2, 4, 6, 35, 38].

2.5 Human genetic traits

2.5.1 Simple monogenic traits

Mendelian traits are controlled by a single locus and show a simple Mendelian inheritance pattern. We have been relatively successful in identifying the genetic culprits for this kind of monogenic trait. They tend to be rare single mutations that often display a severe phenotype early in life, and are therefore infrequently found in the general population[39].

However, many common diseases that most people in the general population suffer from are far more complex. Even though common traits like diabetes may still tend to cluster in families, they do not follow simple inheritance patterns, and hence encompass a different spectrum of genetic architecture[40].

2.5.2 Multi-factorial common complex traits

Most common complex traits have been shown to result from the complex interplay of multiple genes with environmental factors[41]. Even though each factor contributes only subtly, together they are able to effect phenotypic changes if allowed to accumulate sufficiently to tip the balance. Evidence from archeology and population genetics suggests that our current population size of more than 6 billion people is the result of a fairly rapid expansion over just the last 100,000 years or less from a relatively small number of ancestors who originated in Africa[42, 43]. This implies that most of our species shares common variants, and that most of the genomic variants found in us circulate ubiquitously among continental populations. This might also lay the foundation for common variants to play a role in common traits present in most people[44-46]. Moreover, in contrast to rare Mendelian diseases, common diseases typically result from common variants with subtle to moderate effects, possibly because the late age of onset of the disease reduces the impact of these variants on reproductive fitness. Hence, these variants could carry through generations more successfully, and are commonly present in the population to contribute their part towards the trait.

In recent decades, geneticists have banked on the common disease common variant (CDCV) hypothesis for studying common complex traits[41, 47]. This idea is well received because it is conceptually straightforward to conduct association tests in search of common variants. Moreover, the typically high frequency of risk alleles translates to significant population-attributable risk estimates that have an important impact on public health.

To support research in this area, resources such as the Human Genome Project, the International HapMap Consortium, and state-of-the-art genotyping technologies have also been made available.

2.6 Genetic mapping for common complex trait: linkage and association methods

2.6.1 Linkage study

Genome-wide linkage analysis is the earliest gene mapping method. When the study involves human subjects, this method is performed among genetically related individuals within a family. Each family is designated as a single unit of analysis to trace the transmission of genetic markers in their genomes. When a marker is found to accompany the trait of interest more frequently than expected, it suggests that a gene with functional effects is present close to it[48]. In Mendelian traits, linkage analysis can trace the simple transmission pattern to a defined site of interest. However, for common diseases with complex inheritance, the linkage method lacks both the power and the resolution to achieve such specificity. Instead, it usually maps to a broad region, typically tens of centimorgans in size, which often contains hundreds of genes among which the key players are difficult to distinguish. As a result, it is still necessary to employ extensive fine mapping through candidate gene studies for better identification. Incidentally, the need for a familial design to detect transmission also gives linkage studies a propensity to discover highly penetrant single-gene defects of severe effect size, which seldom cover the polygenic spectrum of common complex traits. Therefore, it is not surprising that most success stories of linkage analysis have the characteristics of Mendelian disease, of which Huntington disease has been the most celebrated since the 1980s[49]. Variants found through linkage are usually unique to the studied families. They explain few of the cases in the general population, and are rarely replicated, if at all, in subsequent studies[40, 50]. Likewise, this tendency to detect low-frequency (rare) markers would require a prohibitively large number of families to meet the required study power, which is both expensive and essentially unrealistic to achieve in the case of late disease onset.

2.6.2 Association study

Hope came from a paradigm shift to the association study, when a landmark report by Risch and Merikangas in 1996 showed that the association method is more powerful than linkage analysis for identifying genes of modest effect in common complex traits[47]. Even though an association study also utilizes genetic markers such as SNPs, it compares the allele frequency differences between response groups, such as cases versus controls, instead of tracing transmission. The association study has since become the method of choice, because it is designed to study unrelated individuals from the population at large, where each sample is considered a single study unit rather than a cluster of family members. This is important because it eases the collection of more subjects for a larger, less expensive study, and there is no need to know the mode of inheritance of the particular trait. Besides, many variants found through association are common in the population and are therefore more likely to retain the required power for detection, since a higher exposure (variant allele) frequency translates to greater study power. For decades, the community had advocated the need for large-scale association analysis, which in essence entails "a large number of polymorphisms on a reasonably large sample size" for studying the genetics of common complex diseases[47]. This has only become achievable in recent years, now that high-throughput genotyping technologies are available for genome-wide association studies.
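The comparison of allele frequencies between cases and controls can be illustrated with a minimal sketch. It assumes a simple 2x2 allelic chi-square test with one degree of freedom; the function name and the example counts are hypothetical and are not taken from the studies described here.

```python
import math


def allelic_association(case_a, case_b, ctrl_a, ctrl_b):
    """Allelic chi-square test on a 2x2 table of allele counts.

    case_a / case_b: counts of allele A / allele B among cases
    ctrl_a / ctrl_b: counts of allele A / allele B among controls
    Returns (chi2, p_value, odds_ratio). All counts are assumed non-zero.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b

    cells = [
        (case_a, row_case, col_a),
        (case_b, row_case, col_b),
        (ctrl_a, row_ctrl, col_a),
        (ctrl_b, row_ctrl, col_b),
    ]
    chi2 = 0.0
    for observed, row_total, col_total in cells:
        expected = row_total * col_total / n
        chi2 += (observed - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts, for illustration only.
chi2, p_value, odds_ratio = allelic_association(60, 40, 40, 60)
print(chi2, p_value, odds_ratio)
```

Each individual contributes two alleles to the table, so the allele counts sum to twice the number of subjects in each group.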

2.6.2.1 Direct association

Ideally, direct genotyping of the target polymorphism, which is possibly the causal variant, is the easiest to analyze and the most powerful for testing association with the trait of interest. In general, a non-synonymous SNP in the coding region of a gene is a promising candidate, because an obvious functional role is easy to predict from databases[51]. However, due to evolutionary pressure, most functional coding sequences of genes are highly conserved, and variants within them have low frequency[52]. In addition, many of the causal variants in common complex diseases may be located in non-coding regions of genes. These variants could control regulatory elements and affect gene expression, and should therefore have important functional roles as well[51]. Nevertheless, current knowledge is still insufficient to classify variants that have key regulatory roles. Thus, the difficulty of identifying candidate polymorphisms has limited our opportunities to perform direct associations.

2.6.2.2 Indirect association

Frequently, we are unaware of the causal variant's identity in our target region, and a surrogate marker in close proximity to it is usually employed for indirect association. This is possible because of linkage disequilibrium (LD) between tightly linked loci, such that information captured at one locus is shared with the others[53, 54].

Concept of linkage disequilibrium

The human genome is inherited in a distinctive block-like pattern that consists of ancestrally conserved chromosome segments called haplotype blocks, which are bounded by recombination hot spots[55, 56]. Variants in a haplotype block tend to be inherited together and are correlated. This phenomenon of non-random association of alleles between tightly linked loci in a population is called linkage disequilibrium (LD). The occurrence of recombination and mutation would lead

When r² = 1, the two loci provide identical information: they not only have a D' value of 1, they also have equal allele frequencies. Nevertheless, these measurements primarily provide us with the level of correlation and redundancy between linked loci.
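The relationship between D' and r² can be made concrete with a short sketch. It assumes the standard textbook definitions for two bi-allelic loci (D = p_AB - p_A p_B, D' = |D| / D_max, r² = D² / (p_A (1 - p_A) p_B (1 - p_B))); the function name and the example frequencies are hypothetical.

```python
def ld_measures(p_a, p_b, p_ab):
    """Return (D, D', r^2) for two bi-allelic loci.

    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    p_ab : frequency of the A-B haplotype
    Both loci are assumed polymorphic (0 < p_a < 1 and 0 < p_b < 1).
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Equal allele frequencies, A always found with B: D' and r^2 are both 1.
print(ld_measures(0.3, 0.3, 0.3))

# Unequal allele frequencies: D' is still 1, but r^2 is only 0.25.
print(ld_measures(0.5, 0.2, 0.2))
```

The second call shows why r² is the stricter measure: D' can reach 1 whenever one of the four haplotypes is absent, whereas r² reaches 1 only when the allele frequencies also match.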

The concept of linkage disequilibrium is important for the design and analysis of genetic disease mapping. A variant found to be associated with a phenotype or disease state may not necessarily contribute directly to the disease process, but may merely act as a marker in linkage disequilibrium with a neighboring causal variant that is usually unknown. Since variants in LD are correlated and provide essentially the same information, it is not essential to study all variants in an LD block. This can be a cost-saving advantage, as it only requires genotyping of the tagging SNPs that best capture the information for the region, instead of each specific variant[58, 59]. By the same principle, a large main sample set in which only a subset of variants has been genotyped can be extended by imputation to the full set of variants collected on a small reference population, as long as the two share similar LD patterns[60]. This also potentially helps to reduce the genotyping burden.
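As an illustration of how tagging SNPs might be chosen, the sketch below uses a simple greedy strategy: repeatedly pick the SNP that captures the largest number of still-untagged SNPs at or above an r² threshold. This is one common heuristic and is an assumption here, not necessarily the procedure used in the cited studies; the function name, SNP labels and r² values are hypothetical.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedy tag-SNP selection.

    snps      : list of SNP identifiers
    r2        : dict mapping (snp_i, snp_j) pairs to pairwise r^2 values;
                missing pairs are treated as r^2 = 0
    threshold : minimum r^2 for one SNP to be considered captured by another
    Returns the list of selected tag SNPs, in order of selection.
    """
    def captures(a, b):
        if a == b:
            return True
        return r2.get((a, b), r2.get((b, a), 0.0)) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP that captures the most untagged SNPs
        # (ties are broken by sorted order of the identifiers).
        best = max(sorted(snps),
                   key=lambda s: sum(captures(s, t) for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if captures(best, t)}
    return tags


# Hypothetical pairwise r^2 values, for illustration only.
pairwise_r2 = {
    ("s1", "s2"): 0.90,
    ("s2", "s3"): 0.85,
    ("s1", "s3"): 0.50,
    ("s3", "s4"): 0.10,
}
print(greedy_tag_snps(["s1", "s2", "s3", "s4"], pairwise_r2))
```

In this toy example, s2 captures s1 and s3, so only s2 and s4 need to be genotyped to cover all four SNPs.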

However, studies have found that linkage disequilibrium patterns vary between ethnic populations. As each population may have experienced a unique history of recombination, which can result in divergent haplotype sizes and allele frequencies, the degree of correlation and redundancy among SNPs might vary between distant populations. In recognition of this, resources such as the International HapMap Consortium have cataloged the genome-wide LD patterns of a number of reference populations to guide researchers in planning their genotyping resources. This has given particular impetus to the growing importance of association studies in gene mapping and in understanding disease mechanisms[27-29].

2.7 Hypothesis driven candidate gene approach

There has been substantial interest in association studies since they were established as a more powerful method for studying complex traits[47]. In the earlier days, when genotyping was small in scale and expensive, genetic association studies were only affordable for manageable candidate gene regions that usually had biological merit. The candidate gene method is solely hypothesis driven, and hence relies on published studies of the trait of interest for suggestions[51]. Even though immunological studies have been ongoing for decades, current knowledge of the full complexity of the immune system in an infected human host is still limited. Besides candidate regions from previous linkage studies, many hypotheses for infectious diseases are generated indirectly from animal models and from related infections that may share etiologies[2, 61-63].

In addition to the criticism of its limited scope, care must also be taken in the design of a candidate gene study and in the interpretation of its associations. Since most population studies are likely to be confounded by population sub-structure, this could lead to spurious associations that are unrelated to the trait of interest[64]. This is a more critical problem for the candidate gene study, as its generally low variant density renders it less capable of detecting the presence of population stratification. Hence, even though it adds to the genotyping burden, additional genotyping of carefully selected ancestry informative markers (AIMs) may be necessary for this purpose. In addition, if ancestry differences are indeed present between response groups, the association test statistics may require further genomic correction[64].

The customizable platforms currently available for targeted genotyping in candidate gene studies are small- and medium-scale technologies. Small-scale technologies include the single-plex Applied Biosystems Taqman® assay (Life Technologies Corporation, CA, USA), and the Sequenom iPlex® Gold assay (Sequenom Inc., CA, USA), which can multiplex up to 40 SNPs per sample. For medium-level multiplexing, Illumina's GoldenGate assay (Illumina Inc., CA, USA) allows genotyping of a maximum of 1,536 SNPs per sample, and newer players, which allow multiplexing of up to tens of thousands of SNPs, are Illumina's recent Infinium iSelect custom panel (Illumina Inc., CA, USA) and Affymetrix's GeneChip® custom SNP kit (Affymetrix Inc., CA, USA).

2.8 Hypothesis generating genome wide association scan (GWAS)

Our imperfect understanding of the biological mechanisms of most complex traits limits our ability to generate hypotheses for candidate gene associations. Many genes may harbor unexpected but biologically important functions that the current literature has yet to link to the trait of interest. To expand our knowledge, we need a hypothesis-free yet extensive search throughout the genome to identify these novel genes. Acknowledging the greater power afforded by association studies for common complex traits, many genetic epidemiologists are motivated toward a comprehensive genome-wide association scan (GWAS) for genes related to common traits in human populations[47]. Even though the cost of genotyping has fallen significantly over the years, it is still too expensive to query all 10 million common SNPs in the human genome. Fortunately, our genome is structured in haplotype blocks, where variants in LD tend to be inherited together. This allows us to conduct indirect association through marker tagging, which could theoretically capture the required information of all variants in the same block or region[27-29, 56]. This shortcut strategy of marker tagging to reduce genotyping and cost has been developed into microarray gene chips, which are commercialized mainly as Affymetrix GeneChip® (Affymetrix Inc., CA, USA) and Illumina's Infinium technologies (Illumina Inc., CA, USA). Depending on their generation, these microarrays have SNP densities that range from 10,000 to a million common SNPs per array, such that it is now technologically possible to query the entire genome for genes implicated in our trait of interest[65-68]. However, different commercially available fixed-content platforms have their preferred combination of SNPs tiled on the array, which provides variable degrees of tagging efficiency in a GWAS[69, 70]. Nevertheless, studies of HapMap phase II data revealed that slightly more than 500,000 SNPs are sufficient to capture all known common SNPs in phase II of HapMap with an r² of ≥ 0.8 in Asians and Caucasians, while Africans need about twice that amount, more than a million SNPs[27].

2.8.1 Multiple-stage design

Most of these genome-wide genotyping products follow the concept of the "common disease common variant" (CDCV) hypothesis in their SNP selection. The effect size of these common variants tends to be modest, with typical odds ratios of less than 2[71]. In order to have adequate power for detecting associations of this magnitude, it is essential to have a large number of samples, preferably in the range of thousands[54]. Even though the SNP microarray has increased our throughput and made high-density genotyping more cost efficient, the price of an array is still fairly high. When the first-generation microarray was introduced to the market, it sold at about a thousand dollars each. As older products are superseded by the ever-increasing SNP density of newer chips, their cost has gone down significantly over the years. Even so, if the entire large sample set is genotyped with the array, this will still cost millions just for the initial screening. This is seldom affordable for most laboratories, and is also unnecessary if a staged sampling approach is employed.

The multiple-stage study design is widely adopted for GWAS, with the main aim of reducing genotyping burden and cost while still maintaining good study power[72, 73]. To achieve this, an affordable fraction of the samples is genotyped with the microarray for an initial screen of the entire SNP collection. After association analysis, a liberal alpha significance threshold is used to identify a subset of putatively associated SNPs for validation on the remaining independent samples. It is assumed that SNPs in the first phase that do not even pass this liberal threshold are unlikely to achieve the more stringent alpha significance threshold of the whole study, and therefore can be safely discarded to minimize genotyping. On the other hand, the promising variants that are carried into the subsequent stage are tested on all samples, and the good power of the whole sample set for detecting associations should therefore be retained. Furthermore, any spurious associations that might arise in the initial stage can also be checked subsequently in the other stages.
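The saving from a staged design can be illustrated with simple arithmetic. The sketch below assumes a two-stage design in which stage 1 uses the full array and stage 2 genotypes only the follow-up SNPs on the remaining samples. The array price of about a thousand dollars comes from the text above; the sample sizes, the number of follow-up SNPs and the per-genotype cost are hypothetical values chosen only for illustration.

```python
def one_stage_cost(n_total, array_cost):
    """Cost of genotyping every sample on the full array."""
    return n_total * array_cost


def two_stage_cost(n_total, n_stage1, array_cost,
                   n_followup_snps, cost_per_genotype):
    """Cost of a two-stage design.

    Stage 1: n_stage1 samples are genotyped on the full array.
    Stage 2: the remaining samples are genotyped only for the SNPs
             that passed the liberal stage-1 threshold.
    """
    stage1 = n_stage1 * array_cost
    stage2 = (n_total - n_stage1) * n_followup_snps * cost_per_genotype
    return stage1 + stage2


# Hypothetical study: 4,000 samples, 1,000 of them in stage 1,
# 500 follow-up SNPs at an assumed 0.10 dollars per genotype.
full = one_stage_cost(4000, 1000)
staged = two_stage_cost(4000, 1000, 1000, 500, 0.10)
print(full, staged)
```

Under these assumed numbers, the staged design costs a little over a quarter of the one-stage design; the actual saving depends on how many SNPs pass the first-stage threshold and on the follow-up genotyping platform.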

3 Objectives

The main objective of my thesis project is to identify genetic variation that influences susceptibility to infectious disease. This was conducted in two studies to specifically address the following objectives:

I To understand host genetic susceptibility to the natural infection of

new cases that account for 35% of incident cases globally. Hence, it is necessary for us to study TB susceptibility in this endemic population.

Although Mycobacterium tuberculosis (M. tuberculosis) has infected around a third of the world's population, only 3-10% of those infected develop active disease during their lifetime[76]. More than 90% of infected individuals remain asymptomatic with a latent infection. This indicates that host immune and defense pathways are often highly effective in controlling this disease. Because the infection causes such a burden of disease in those unable to contain it, it is important to discover the underlying mechanisms to aid the development of more effective interventions, such as better vaccines and novel treatments for latent and active infection. Similarly, it is important to identify predictive biomarkers that might identify individuals who are most susceptible to developing active TB disease.

Heritability estimates from twin studies and other familial aggregation and segregation studies have convincingly implicated a notable genetic component contributing to this outcome[3, 36, 77-79]. Consequently, this has convinced us to use genetic mapping to discover relevant genes in pulmonary TB susceptibility. For this kind of common disease, the genetic mode of action is frequently attributed to multiple genetic factors, which individually produce modest effects and are common in the population[41, 46]. In this case, the population-based association study, which compares allele frequency differences between disease and non-disease groups, is known to be a more powerful method of mapping relevant gene regions, which may in turn help in understanding the mechanisms of TB[47].

was further confirmed by sputum culture of M. tuberculosis. Clinical information, as well as the patients' age, ethnicity, socio-economic status, and concurrent medical history, were recorded in structured questionnaires. Patients with extra-pulmonary TB or diabetes mellitus (fasting blood glucose > 126 mg/dL), and HIV-positive subjects, were excluded from the genetic study[81, 82]. For controls, we recruited 746 (mean age 33, range 15-70, 52.5% male) sex- and age-matched (+/- 10 years) subjects with no history of TB and no evidence of TB-related infiltrates in chest X-rays, who were living in the same or neighboring households as the enrolled cases, but were not known to be genetically related to them. Self and parental ethnicities recorded during recruitment were used to characterize subjects of Javanese origin from three groups, the Jawa, Betawi, and Sunda, which altogether comprised more than 80% of the total sample. Individuals in the non-Javanese category have both parents coming from other Indonesian islands, whereas subjects with one parent of non-Javanese origin were considered to have mixed parentage (Table 1). Using this information, we made an effort to avoid spurious associations arising from population stratification by excluding subjects whose self-reported ethnicity was of non-Indonesian origin from the genetic studies. We also confirmed in a previous study that the cohort does not have a significant population stratification problem[83]. In summary, for the purpose of this population stratification analysis, 299 SNPs were carefully chosen to act as a set of ancestry informative markers (AIMs). They were selected from genomic locations more than 10 kb away from any known genes, were required to be in linkage equilibrium with one another, and had average minor allele frequencies of around 30%. We genotyped these AIMs in a randomly selected subset of 330 cases and 368 controls from this cohort. One of the SNPs was out of Hardy-Weinberg equilibrium (HWE), and thus was excluded from the analysis.
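A check for departure from Hardy-Weinberg equilibrium can be sketched as follows. The text does not state which HWE test was used, so this sketch assumes the simple one-degree-of-freedom chi-square goodness-of-fit test on genotype counts; the function name and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi2, p_value) using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
print(hwe_chi_square(25, 50, 25))   # genotype counts match HWE expectations
print(hwe_chi_square(50, 0, 50))    # no heterozygotes: strong departure
```

For SNPs with small genotype counts, an exact test is often preferred over the chi-square approximation.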

We used the Devlin and Roeder method to calculate the lambda inflation factor[84]: the median of the chi-square values for all 298 SNPs was divided by 0.675 squared (approximately 0.456, the expected median of a chi-square distribution with one degree of freedom under the null hypothesis). We arrived at a value close to 1 (0.82), which indicated that there is no significant population stratification in this Indonesian group[83].
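The inflation factor calculation can be sketched in a few lines. This assumes the usual genomic-control form, in which the median of the observed one-degree-of-freedom chi-square statistics is divided by 0.675² (about 0.456); the function name and the example statistics are hypothetical and are not the values from this cohort.

```python
import statistics

# Expected median of a 1-df chi-square under the null: 0.675 squared.
NULL_MEDIAN_CHI2 = 0.675 ** 2


def genomic_inflation_factor(chi2_values):
    """Genomic-control lambda: median observed chi-square / expected median."""
    return statistics.median(chi2_values) / NULL_MEDIAN_CHI2


# Hypothetical chi-square statistics for a handful of null markers.
example_chi2 = [0.02, 0.15, 0.40, 0.46, 0.90, 1.60, 3.10]
print(genomic_inflation_factor(example_chi2))
```

A lambda close to 1 suggests little inflation of the test statistics, while values well above 1 point to stratification or other systematic bias.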

This protocol was reviewed and approved by the relevant institutional review boards in Indonesia and the Netherlands, and written informed consent was voluntarily signed by all patients and control subjects


Table 1: Demographic data of the main study population

In total, 1,912 TB patients (mean age 43.8, range 17-86, 73.8% male) were diagnosed by the local health care service based on clinical symptoms and evidence from X-rays and sputum smears. To confirm the diagnosis, sputum cultures of M. tuberculosis were performed for all patients in this study. We also recruited 2,104 adult blood bank donors with no history of TB (mean age 30, range 16-66, 75.0% male) as local population controls. Clinical information, as well as the patients’ age, ethnicity, socio-economic status, and concurrent medical history, was recorded in structured questionnaires. Based on this information, all patients with extra-pulmonary TB and all HIV-positive subjects were excluded from the study.

As reported previously, population stratification was also assessed to be minimal in this cohort [85]. Following the admixture mapping method of Smith et al. (2004) [86], 15 AIMs were selected from intergenic or intronic SNPs in non-immune genes spread across the genome. Each marker had to have a minor allele frequency greater than 10% in Europeans and a difference in allele frequency of more than 65% between European- and Asian-derived populations. All subjects in the Russian cohort were genotyped for these 15 AIMs, and all of the markers had similar allele frequencies in TB patients and healthy subjects (chi-square test p > 0.13), suggesting that population stratification is unlikely to be a problem for this cohort [85]. Permission was obtained from the local ethics committees in St Petersburg and Samara, Russia, and Cambridge, UK, and written informed consent was obtained from all participating subjects.

Ghanaian cohort

Ghanaian participants were enrolled between September 2001 and July 2004 at Korle Bu Teaching Hospital, Accra; Komfo Anokye Teaching Hospital, Kumasi; additional hospitals and polyclinics in Accra and Kumasi; and surrounding district hospitals. Cases and controls belonged to the Akan, Gaa-Adangbe, and Ewe ethnic groups and to several ethnic groups of northern Ghana. The case group consisted of 2,004 HIV-negative individuals with smear-/culture-positive pulmonary TB (mean age 34.1, range 9-60, 68.1% male). Patients were phenotyped based on medical histories and documentation of major symptoms in structured questionnaires, physical examination, HIV-1/2 testing (Capillus, Trinity Biotech, Bray, Co. Wicklow, Ireland), posterior-anterior chest X-rays, Ziehl-Neelsen staining of two independent sputum smears, and culturing of M. tuberculosis on Loewenstein-Jensen agar. The control group of 2,366 apparently healthy persons (mean age 32.4, range 6-66, 59.4% male) comprised 1,231 unrelated personal contacts of cases and 1,135 working contacts or members of the same communities. Characterization of controls included a medical history and clinical examination, a chest X-ray, and a tuberculin skin test (Tuberculin Test PPD Mérieux, bioMérieux, Nürtingen, Germany). The controls had no radiological signs of current or previous pulmonary TB. Further details of the recruitment procedure and the composition of the study group have been described previously [87].

The study protocol was approved by the Committee on Human Research, Publications and Ethics, School of Medical Sciences, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana, and the Ethics Committee of the Ghana Health Services, Accra, Ghana


4.2 Genetic study

4.2.1 Candidate gene approach

The infectious nature of M. tuberculosis makes experimental studies that inoculate the pathogen directly into humans ethically impossible. Hence, most TB candidate genes come from immunology studies, which are frequently based on animal models, or from etiologies shared among infectious diseases [2, 62]. Most previous candidate gene studies of TB have focused on variants in genes that function in innate immunity, although adaptive immunity probably also plays a role [2, 12, 88]. To make effective use of our limited genotyping resources, the first part of our TB genetic studies concentrated on replicating new and interesting associations that had yet to be studied in Southeast Asians. We hoped to use the Indonesian cohort as a representative Asian population for replication, and to test for evidence of shared genetic variance across continents.

4.2.1.1 MAL/TIRAP

Among the components of innate immunity, recent interest has centered on the Toll-like receptors (TLRs), a class of pattern recognition receptors (PRRs). TLR binding generally results in activation of NFκB, a major transcription factor that regulates genes responsible for both the innate and adaptive immune responses and thus plays a key role in regulating the immune response to infection [89]. Upon recognition of a pathogen, the TLR is stimulated and signals activation of the NFκB signaling cascade [90]. Of particular importance is the TIRAP (Toll-Interleukin-1 Receptor domain containing Adaptor Protein) gene, which encodes the MAL (MyD88 Adapter-Like) adapter protein that is essential for MYD88-dependent signaling downstream of TLR2 and TLR4. This molecule has a central position in the TLR2 and TLR4 pathways and relays the signal from the TLRs to the IRAK molecules, which eventually leads to NFκB activation and type I interferon production [91].

Many were intrigued when a non-synonymous variant (rs8177374), which causes a Ser to Leu amino acid change at position 180 of MAL/TIRAP, showed association with four different human infectious diseases, including TB [92, 93]. A functional study showed that the protective allele impairs signaling, and heterozygous carriage of this variant was reported to protect against invasive pneumococcal disease, bacteremia, malaria, and TB [2, 93]. In the original study, the association of this variant with TB (p 0.04, and corrected p 0.013) was tested in Africans with a sample size of 675 cases and 605 controls. The minor (protective) allele was rare (MAF 0.1%), and the estimated effect size of OR = 0.23 was considerably larger than those typically seen in other common complex diseases (0.5 < OR < 2) [71, 93]. It should therefore be straightforward to replicate this association if the protective allele is more frequent in another study population and/or the sample collection is of sufficient size. Our Indonesian TB cohort is of a similar sample size, but the same allele is ten times more frequent in Asians, which gives us greater than 80% power to replicate this association. Besides the Indonesians, we collaborated with other groups to obtain independent datasets from Ghana (1,913 cases and 2,293 controls) and Russia (1,867 cases and 2,076 controls), and collectively attempted to replicate the association of the TIRAP Ser180Leu variant (rs8177374) in a total of 9,441 subjects. These collections are larger than the TB collection of the initial study and represent an African, an Asian, and a European population.

70, 59.1% male) from the same area in this study

Russian cohort

Russian TB patients and controls were collected in two cities, St Petersburg and Samara, according to the same protocol. Details of this cohort are given in section 4.1.1 and were also reported previously [85]. We included 1,867 pulmonary TB patients and 2,076 healthy blood bank controls from this Russian cohort in this study.

Ghanaian cohort

Ghanaian participants were enrolled between September 2001 and July 2004 at Korle Bu Teaching Hospital, Accra; Komfo Anokye Teaching Hospital, Kumasi; additional hospitals and polyclinics in Accra and Kumasi; and surrounding district hospitals. Cases and controls belonged to the Akan, Gaa-Adangbe, and Ewe ethnic groups and to several ethnic groups of northern Ghana. Details of this cohort are given in section 4.1.1 and were also reported previously [87]. We included 1,913 cases and 2,293 controls from this Ghanaian cohort in the replication exercise of this study.

The Russian collection was genotyped using the same functionally tested Taqman assay (Applied Biosystems, ABI, CA, USA) as was used for the Indonesian cohort. Although the genotypes from this assay were double scored to minimize error, we also confirmed the results by re-sequencing a 506 bp fragment of the TIRAP gene in 65 Russian subjects using the primers ACTATGACGTCTGCGTGTGC (forward) and AATAAGTGCAGGAGCCAGGA (reverse). Full genotype concordance was observed between the two assays.

Genotypes of the Ghanaian samples were determined by dynamic allele-specific hybridization with fluorescence resonance energy transfer in a LightTyper device (Roche Diagnostics, Mannheim, Germany). The primers and probes used for genotyping were: GCTGCTTCCTGCAACTC (forward), TGACTTGACGAAAGCCAC (reverse), CCTGCTGTCGGGCCTCA (5'-Cy5 and 3'-phosphate labeled), and GGCCGAGGGCTGCACCATCC (3'-fluorescein labeled).

Data analysis

The power calculation was carried out using the Genetic Power Calculator [94], assuming a multiplicative model and a TB prevalence per 100,000 persons of 262 cases in Indonesia, 160 cases in Russia, and 376 cases in Ghana [95]. In each control group, we tested for Hardy-Weinberg equilibrium (HWE): Indonesian (p = 0.02), Russian (p = 0.1), and Ghanaian (p = 0.95), indicating no major deviation from HWE. The association analysis was based on genotype frequencies and was performed in Stata 8.2. The odds ratios and p-values shown in Table 2 were calculated with the Cochran-Mantel-Haenszel (CMH) test, controlling for the origin of the sample within the Russian or Indonesian cohorts and for ethnic group in the Ghanaian cohort. Because allele frequencies were similar between samples collected in St Petersburg and Samara (11.5% and 10.7%, respectively), Table 2 presents combined genotype data for the Russian population.
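The HWE check on control genotypes mentioned above can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test. This is a minimal illustration only: the function name is hypothetical, it is not the Stata routine used in the study, and for rare alleles an exact test would usually be preferred over this approximation.

```python
import math


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Arguments are observed genotype counts; returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic marker).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical genotype counts that match HWE proportions exactly.
chi2_balanced, p_balanced = hwe_chi2_test(25, 50, 25)
```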


4.2.1.1.2 Results

Table 2 displays the association results of rs8177374 for the risk of pulmonary TB in the three cohorts. None of the p-values fell below the alpha threshold of 0.05, and the confidence intervals of the odds ratio estimates all include 1. There is therefore no statistically significant evidence that this locus is associated with pulmonary TB.
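To make the interpretation of the confidence intervals concrete, the sketch below computes an odds ratio with a Woolf (log-based) 95% confidence interval from a single 2x2 table and checks whether the interval includes 1. It is a simplified, unstratified stand-in for the CMH estimate reported in Table 2; the function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a, b: carrier and non-carrier cases; c, d: carrier and non-carrier controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical carrier / non-carrier counts (not data from Table 2).
odds_ratio, lower, upper = odds_ratio_ci(30, 970, 32, 968)
includes_one = lower <= 1.0 <= upper
```

When the interval includes 1, as in this hypothetical example, the data are compatible with no effect at the chosen significance level.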

Table 2 Association analysis of the rs8177374 (C>T, Ser180Leu) polymorphism in the TIRAP gene with pulmonary tuberculosis

Samples Leu/Leu (%) Ser/Leu (%) Ser/Ser (%) MAF OR (95% CI) p-value Powerc(%)

a – OR for the Leu allele

b – OR for the Ser/Leu heterozygote versus Ser/Ser and Leu/Leu homozygotes

c – statistical power to detect OR = 0.23 for Ser/Leu heterozygote at one tailed p-value= 0.05/ one-tailed p-value = 0.2


4.2.1.1.3 Discussion

We found no evidence to support the association of rs8177374 with pulmonary TB in our Indonesian cohort (Table 2), under either a multiplicative allelic model or the model of protective Ser/Leu heterozygosity proposed by Khor et al. This is despite the study having power above 80% to detect the association, so a type II error is unlikely. The lack of replication in our Indonesian cohort could initially be attributed to genetic heterogeneity, since the role of the variant might differ across populations. It could also result from differences in LD and allele frequencies between ethnicities: this variant, which has yet to be confirmed as causal for TB, might tag the actual causal variant less efficiently in Asians than in the initial population of West African origin. However, negative results were also seen in the Russians and, most importantly, in the Ghanaian collection, which originates from West Africa and thus resembles the TB population of the initial study. Genetic heterogeneity can therefore be ruled out with reasonable confidence. The evidence for no association is strong, since we extended the Russian and Ghanaian collections to about three times the size of the initial collection, making these cohorts better powered to detect the originally estimated effect of OR = 0.23 (Table 2). Additionally, the Leu allele is much more common in the Russian and Indonesian populations than in West Africans (11.3%, 1.4%, and 0.1%, respectively), so these non-African populations should be adequately powered to detect even smaller effects. This also highlights that the rarity of the variant in African populations leaves both our Ghanaian cohort (MAF 0.0013) and the initial West African cohort of Khor et al. (MAF 0.006), which was even smaller in sample size, with weak power to detect association in these populations [93]. Nevertheless, it seems unlikely that our study, at least in Russians and Indonesians, missed a true association, since there is additional evidence from subsequent studies that attempted to associate this variant with the risk of TB in Colombian, South African, South Indian, and Peruvian cohorts.

A recent meta-analysis of all studies to date formally concluded that rs8177374 has no measurable effect on TB in either the allelic (OR = 0.99, 95% CI 0.88-1.11) or the heterozygote advantage (OR = 0.99, 95% CI 0.87-1.13) comparison, with heterogeneity p-values > 0.05 [92].

Although the original study reported an impressive combined statistic of p = 9.6 x 10^-8 as evidence of significant association, this was based on a joint analysis of all four infectious diseases. The evidence specifically for TB (p 0.04) was only marginally significant; hence it is not unexpected that the association of rs8177374 with TB was not replicated in our study. Moreover, TB, invasive pneumococcal disease, malaria, and bacteremia are rather different infections, involving multiple specific pathways of infection and host defense. Even if TIRAP were involved in each of these diseases in a similar way, its relative contribution would be expected to vary. Based on the results of this study, we advise against combining datasets of distinct diseases in a single association test, to avoid false conclusions. Nevertheless, we encourage a unified effort to combine well-powered independent datasets from multiple cohorts, possibly across diverse ethnicities, when studying the same trait, so that the evidence of association can be more convincing.


Paper 1:

1. Analysis of Association of the TIRAP (MAL) S180L Variant and Tuberculosis in Three Populations, Nature Genetics, 40(3): 261-262, (2008)


susceptible (C3HeB/FeJ) mouse macrophages. It potentially controls innate immunity by limiting the multiplication of the pathogen and by switching cell death from pathogenic necrosis to apoptosis, which is less damaging and promotes bacterial control [98, 101].

The biological significance of Ipr1 in TB in mice has spurred interest in its human sequence homolog (41%), the SP110 gene, which is the next candidate gene of our TB study [102]. SP110 is located on human chromosome 2q37.1 and is a member of the Sp100/Sp140 family of nuclear body components, which may function as an activator of gene transcription and as a nuclear hormone receptor co-activator [102]. The gene is primarily expressed in blood leukocytes and in the spleen, and at negligible levels in other tissues [102]. As SP110 is induced by interferon (IFN), it might play a role in the IFN response [103], mainly in response to viral proteins [103, 104]. Hence, SP110 may also be a potential target for controlling intracellular infections.

Variants in SP110 were first found to be associated with TB in a West African study [100]. Through linkage analysis of families in Gambia, three sequence variants were found to be associated with pulmonary TB in this cohort. Replication in subsequent stages in separate cohorts from Guinea and Guinea-Bissau further supported the associations of rs2114592 and rs3948464. However, subsequent attempts to replicate these findings in other populations, including another West African (Ghanaian) cohort as well as cohorts from South Africa, Russia, and India, were unsuccessful [85, 87, 105, 106]. Hoping to replicate the findings independently, we genotyped 20 SNPs located in SP110, including the two previously associated SNPs, in a Southeast Asian cohort from Indonesia. The Southeast Asian region contributes a large proportion of annual TB incidence, and Indonesia is among the top five high-burden countries [16]; hence there is an urgent need to study TB in this endemic population.

4.2.1.2.1 Methods

Subjects

Indonesian TB patients and controls were enrolled from the cities of Jakarta and Bandung on the island of Java, Indonesia, using a uniform enrollment protocol for all subjects [80]. Details of this cohort are given in section 4.1.1 and in Table 1. In summary, we studied 351 pulmonary TB patients (mean age 28, range 14-75, 61.0% male) and 364 sex- and age (+/- 10 years) matched controls (mean age 32, range 15-70, 60.0% male) for 19 of the 20 SNPs. Assuming a multiplicative model and a TB prevalence of 262 cases per 100,000 people in Indonesia, a sample of this size has at least 80% power to detect associations for risk alleles with frequency ≥ 20% and OR ≥ 1.6, at an uncorrected significance threshold of p-value 0.05 [94, 95]. For the remaining SNP, we roughly doubled the sample to 655 pulmonary TB patients (mean age 27, range 14-75, 56.0% male) and 722 sex- and age (+/- 10 years) matched controls (mean age 31, range 15-70, 53.0% male), giving at least 95% power to detect association of the previously associated locus (rs2114592) in this cohort [94, 95].

Genotyping

We genotyped 20 SNPs located in the SP110 gene in our Indonesian cohort. Genotyping of the first 19 SNPs was performed with the GoldenGate assay (Illumina Inc, CA, USA) according to the manufacturer’s protocol, and the BeadStudio software was used to call the genotypes [107]. The remaining SNP, rs2114592, was genotyped with a functionally tested Taqman assay (Applied Biosystems, ABI, CA, USA). The probe and primers were designed by the manufacturer (ABI) for Assay ID C 15816049_10. To minimize error, genotypes were double scored, using the Real-Time Ct values of both the FAM and VIC reporters as well as the Allelic Discrimination analysis of the Sequence Detection System (SDS) software (Applied Biosystems, ABI, CA, USA) to call genotypes.

For quality control, subjects were excluded if they had a call rate < 90% (N=0) or discrepancies with reported gender (N=0). SNPs were removed if they had a call rate < 90% (N=0), MAF < 0.01 (N=6, including the index SNP rs3948464), or a p-value for the HWE test (controls only) < 0.004 (N=0).
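The SNP-level filters described above can be expressed as a simple predicate. The sketch below is illustrative only: the function name, the SNP labels, and the summary values are hypothetical, while the default thresholds are the ones stated in the text.

```python
def snp_passes_qc(call_rate, maf, hwe_p,
                  min_call_rate=0.90, min_maf=0.01, min_hwe_p=0.004):
    """Return True if a SNP passes the call-rate, MAF, and HWE filters."""
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_p >= min_hwe_p)


# Hypothetical per-SNP summaries: (call rate, MAF, HWE p-value in controls).
snps = {
    "snp_common": (0.99, 0.25, 0.40),
    "snp_rare": (0.99, 0.005, 0.80),      # fails the MAF filter
    "snp_low_call": (0.85, 0.30, 0.50),   # fails the call-rate filter
}
kept = [name for name, stats in snps.items() if snp_passes_qc(*stats)]
```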

Association analysis

Association analysis was based on the Cochran-Armitage trend test, using the Exemplar software (Sapio Sciences, MD, USA). A Bonferroni correction for the number of loci tested (p-value 0.05/14 SNPs tested) was applied, which set a p-value of 0.004 as the alpha threshold of significance in our experiment.
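For readers unfamiliar with the trend test and the correction described above, the following is a minimal sketch of a Cochran-Armitage trend test with additive weights (0, 1, 2) and of the Bonferroni threshold. It is not the Exemplar implementation; the function names and the genotype counts used to exercise it are hypothetical.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for genotype counts (1 df).

    cases, controls: counts per genotype, ordered by number of minor alleles.
    Returns (chi2, p_value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    if denominator == 0:
        raise ValueError("trend test undefined (monomorphic SNP or empty group)")
    chi2 = numerator / denominator
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# 0.05 / 14 is about 0.0036, which the text rounds to 0.004.
threshold = bonferroni_alpha(0.05, 14)
```

With 14 SNPs tested, a SNP would be declared significant only if its trend-test p-value fell below this corrected threshold.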


4.2.1.2.2 Results

A total of 20 SNPs in the SP110 gene were genotyped in at least 351 pulmonary TB cases and 364 controls from Indonesia, and were subjected to quality control as described in the Methods section above. All 20 SNPs examined were in Hardy-Weinberg equilibrium in this population (Table 3). However, 6 SNPs were monomorphic or had MAF < 0.01, and were excluded from subsequent association analysis. Among the remaining 14 SNPs, which were analyzed with the genotypic trend test, none had a p-value below the alpha threshold of significance (p < 0.004) for association with pulmonary TB susceptibility in this cohort (Table 3), despite this being a well-powered study.

Date posted: 09/09/2015, 18:51
