CHAPTER 4:
DISCUSSION
4.1 LD and Variation of the MHC in Singaporean Chinese
Many association studies have highlighted strong links between the MHC and a number of diseases in the Singaporean Chinese population, and a key to identifying the genetic aetiology of these diseases lies in understanding the LD structure of the MHC. LD describes the non-independence of alleles at different loci, and knowing how LD varies across the MHC will guide the efficient design of marker panels for population-based disease association studies. However, a haplotype map of the MHC in the local Chinese population is currently not available. To this end, SNP maps of increasing marker density were constructed, and these data sets provide a comprehensive guide to the patterns of variation and LD of the MHC in the Singaporean Chinese population. The LD of the MHC was first analysed in relation to the entire chromosome 6p, before progressively drilling down at high resolution into the MHC proper. SNP data were later integrated with HLA genotypes, providing an opportunity to analyse LD with respect to the HLA haplotype background. The data presented here provide a foundation for future disease association studies in the local Chinese population.
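The two LD measures used throughout this chapter, D´ and r², can be illustrated with a minimal sketch. The function below uses the standard textbook definitions for a pair of biallelic loci; the function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for one allele at each of two biallelic loci.

    p_ab -- frequency of haplotypes carrying both allele A and allele B
    p_a  -- frequency of allele A at the first locus
    p_b  -- frequency of allele B at the second locus
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D is the excess of the observed haplotype frequency over independence.
    d = p_ab - p_a * p_b
    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2 is the squared correlation between the two alleles.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical example: a rarer allele A found only on B-bearing haplotypes.
d, d_prime, r2 = ld_measures(p_ab=0.2, p_a=0.2, p_b=0.5)
```

In this example D´ equals 1 because no recombinant haplotype is observed, whereas r² is only about 0.25 because the allele frequencies differ. This is why D´ is better suited to describing historical recombination, and r² to judging how well one marker predicts another.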
Strong correlation in SNP profile of two Asian populations – that of Singaporean Chinese and the Han Chinese in Beijing
The HapMap project has become an important resource for genetic variation in the human genome, with the SNP frequencies and haplotype structure of the 4 HapMap populations strongly influencing the design of commercially available SNP panels used in large association studies (e.g. The Wellcome Trust Case Control Consortium 2007). The utility of such a map across other populations around the world is still being investigated, and the SNP maps in this study afforded the opportunity to examine the applicability of the HapMap data to the local Chinese population. 45 Beijing Han Chinese (CHB) individuals were genotyped as part of the HapMap project and these genotype results are readily available from the project’s online resources. From these, we found that the SNP allele frequencies in the Singaporean Chinese population are tightly correlated with those in the CHB population (Figure 3.12), underlining the reliability of the SNP data sets and highlighting the ancestral proximity of the 2 populations, even though the former consists mainly of southern Chinese while the latter is skewed towards northern Chinese. The tight correlation in allele frequencies also indicates that, in the absence of variation data for other regions of the genome in the Singaporean Chinese population, the CHB panel of the HapMap project will serve as an adequate proxy. The correlations of the Singaporean Chinese SNP frequencies with the other HapMap populations also reflect the genealogy between the populations, with the Japanese population showing high correlation, followed by the Caucasians and the Africans.
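A comparison of allele frequencies between two populations can be sketched as follows. This section does not state which correlation statistic underlies Figure 3.12, so the Pearson coefficient is assumed here, and the frequencies in the example are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of allele frequencies."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 SNPs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("frequencies must vary across SNPs")
    return sxy / sqrt(sxx * syy)


# Hypothetical frequencies of the same allele at four SNPs in two populations.
local_freqs = [0.12, 0.45, 0.30, 0.80]
chb_freqs = [0.10, 0.47, 0.33, 0.78]
r = pearson_r(local_freqs, chb_freqs)
```

Each SNP contributes one point, so the coefficient summarises how closely the frequency of the same allele agrees between the two populations across the whole map.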
The transferability of HapMap-derived tag SNPs to the local population was also investigated. True to the allele frequency correlations, the tag SNPs generated using the Beijing Han Chinese population outperform all of the other sets, including a combined set of Japanese and Chinese samples (Table 3.8). This again has an important implication for SNP selection in disease association studies: the Japanese and Han Chinese population panels are typically considered as a single unit in HapMap datasets, but when utilising these data for studies of the Singaporean Chinese population, better efficiency and power will be achieved if the Han Chinese data set is considered on its own.
High-resolution map efficiently captures SNP variation in the Chinese MHC
The 1877 informative SNPs used in this study to construct the high-resolution variation map of the MHC are a subset of the total SNP variation previously identified in the region. Using the genotype data from the HapMap Han Chinese population as a proxy, almost all (98.5%) of the informative SNPs across the region are found to be well represented by at least one partner in this SNP map, and more than half are perfectly correlated with at least one of the 1877 SNPs (Figure 3.13). Hence we find that this map efficiently captures the untyped variation in the Chinese population, which further supports its effectiveness in describing LD patterns of the local population.
Linkage disequilibrium in the MHC-telomeric region is stronger than that of the MHC
Various linkage analyses and estimates from sperm typing have previously reported that the recombination rate across the MHC is lower than the chromosomal average (Cullen et al. 2002, Kong et al. 2002, Tsunoda et al. 2004). Similarly, within the local Chinese population, sliding window analysis of LD across chromosome 6p shows a striking peak of LD around 28.9Mb, with average D´ and r² plots elevated above the chromosome average in an 8Mb segment that includes the MHC (Figure 3.3). There is, however, evidence that the LD in the MHC-telomeric region is stronger than in the MHC itself, most evident in the LD heatmap of chromosome 6p (Figure 3.2), with strong pairwise D´ seen even between SNPs separated by up to 5Mb. Furthermore, of the 3 haplotype blocks identified from the low-density SNP map, two lie within this telomeric-MHC segment (Figure 3.5). This pattern is repeated in the 2.6kb resolution SNP map, where pairwise LD values between SNPs in the extended class I region are higher than in the rest of the MHC for a similar marker distance (Figure 3.14).
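A sliding window summary of LD of the kind described above can be sketched as follows. The window size, step and the rule that both SNPs of a pair must fall inside the window are assumptions made for illustration; the parameters actually used for Figure 3.3 are not given in this section.

```python
def sliding_window_ld(pairs, start, end, window, step):
    """Average pairwise LD in successive windows along a chromosome.

    pairs  -- list of (position_1, position_2, ld_value) tuples
    start  -- first coordinate to scan
    end    -- last coordinate to scan
    window -- window width, in the same units as the positions
    step   -- distance between the starts of consecutive windows

    Returns a list of (window_start, mean_ld); mean_ld is None for empty windows.
    """
    results = []
    w = start
    while w + window <= end:
        # Keep only pairs with both SNPs inside the current window.
        values = [ld for p1, p2, ld in pairs
                  if w <= p1 < w + window and w <= p2 < w + window]
        results.append((w, sum(values) / len(values) if values else None))
        w += step
    return results


# Hypothetical pairwise values (positions in Mb).
example_pairs = [(1, 2, 1.0), (2, 3, 0.5), (6, 7, 0.2)]
profile = sliding_window_ld(example_pairs, start=0, end=10, window=5, step=5)
```

Plotting the window averages against window position gives a profile in which regions of elevated LD, such as the segment containing the MHC, appear as peaks above the chromosome average.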
Several explanations for this lengthy tract of high LD exist. Firstly, recombination may be suppressed as a consequence of the tight clustering of histone and tRNA genes in this region. Transcripts of both of these gene families are required in large quantities, and the tight clustering of these gene groups may be a result of selection pressure to maximise transcriptional activity (Horton et al. 2004). The strong LD pattern across this segment could be due to a hitchhiking effect between the histone/tRNA clusters and the surrounding regions.
A second fascinating explanation for this high LD is the association between olfactory receptors and MHC alleles. The genomic arrangement of olfactory receptor and MHC genes is syntenic across humans, mice and rats. Experiments performed with semi-natural mouse populations have shown a distinct preference for MHC-dissimilar partners in mate selection (Potts et al. 1991). A study performed on a reproductively and culturally isolated human population also showed a lower number of MHC-similar spouses than expected if there were no preferential mate selection (Ober et al. 1997). Additionally, it has been shown that odorants derived from MHC gene products influence individual-specific odours (Yamazaki et al. 1999). The MHC-influenced negative assortative mating may be driven by avoidance of inbreeding and/or by an MHC-heterozygote advantage against infections or parasites (Beauchamp and Yamazaki 1997, Ehlers et al. 2000). A selection-driven linkage between the olfactory receptor cluster and MHC loci would consequently give rise to the high LD block seen in the data. Indeed, taken in this light, the overall suppressed recombination rate across the MHC loci could be a result of hitchhiking with the telo-MHC region (Horton et al. 2004).
High-resolution LD structure of the MHC in the Singaporean Chinese Population
The high-resolution SNP map provided an opportunity to describe the LD pattern within the MHC in detail and in different ways. At the population level, the LD pattern has a block-like structure (most evident in Figure 3.16), with long isolated stretches of consecutive SNPs sharing high pairwise LD, flanked by short intervals within which no allelic association is seen between neighbouring SNPs. This block-like structure of LD was first suggested in 2001 in a seminal paper (Daly et al. 2001) and subsequently seen in genome-wide scans (Gabriel et al. 2002), culminating in the construction of a haplotype map of the human genome (International HapMap Consortium 2005, 2007). The data in this study fit this model well.
The haplotype blocks were formally defined using well-established criteria that link segments of consecutive markers in significantly high D´ into haplotype blocks (Gabriel et al. 2002), and in the high-resolution map 203 haplotype blocks were identified in the 4.9Mb region. The characteristics of these haplotype blocks are remarkably similar to those identified in a Caucasian population in terms of average length, diversity and coverage (Miretti et al. 2005), and probably reflect similar historical recombination events in both populations, supporting the single “out of Africa” hypothesis (Gabriel et al. 2002, International HapMap Consortium 2005). As these 2 studies were conducted in different populations and with different marker sets, the haplotype block similarity also lends support to the robustness of the data generated in this study, as well as to the haplotype block definitions employed.
The MHC can be divided into 5 sub-regions that reflect the clustering of the different classes of HLA genes within it (Horton et al. 2004). To see whether the strength of LD varies between sub-regions, LD was analysed separately in each sub-region as well as in the MHC as a whole. While the class I, class III and extended class II regions have similar LD patterns, the extended class I and class II regions are found to be at the extreme ends. Linked to the high LD segment of the telo-MHC, the extended class I region is in strong LD and carries longer haplotype blocks. In contrast, across a similar physical length, the class II region has weaker LD with a more fragmented, shorter block structure. This difference in LD agrees with the recombination rates reported for the Caucasian population: a very low 0.195 cM/Mb in the extended class I region, and a nearly 9 times higher rate in the class II region (1.712 cM/Mb) (Miretti et al. 2005). There is a positive correlation between polymorphism levels and the recombination rate (Nachman 2001, Kauppi et al. 2003), and as the extremely polymorphic class II region is under balancing selection, this could have driven the recombination rate higher, facilitating the creation of new haplotype variants in an arms race against pathogens (Meyer and Thomson 2001).
This pattern of LD has an important implication for SNP selection in disease association studies of the MHC. In an association scan using a panel of test SNP markers, the power of the panel is related to the LD between the test markers and the disease loci through the LD parameter r² (Pritchard and Przeworski 2001). Therefore, for an association scan with a fixed sample size of cases and controls, it is important to choose a density of SNPs that reflects the LD of the tested regions, with a higher density needed in regions of low LD. This point is reiterated in the generation of tag SNPs for the MHC in Singaporean Chinese (Table 3.7). The tagging efficiency, measured by the number of tag SNPs needed per Mb, is highest in the extended class I region, where the LD is strongest and the fewest tags are needed (83 tags / Mb), and lowest in the class II region (222 tags / Mb). Consequently, in SNP selection or marker design for future MHC disease association studies, instead of an even distribution of SNPs across the MHC, a proportionally greater number should be typed in the class II region.
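The dependence of power on r² can be made concrete with the usual approximation that a study typing a marker in LD with the disease variant needs roughly 1/r² times the sample size of a study typing the disease variant directly (Pritchard and Przeworski 2001). The sketch below applies that scaling; the function name and the sample sizes are hypothetical.

```python
from math import ceil


def required_sample_size(n_direct, r2):
    """Approximate sample size needed when testing a marker instead of the causal variant.

    n_direct -- sample size that gives the desired power at the causal variant itself
    r2       -- LD (r^2) between the typed marker and the causal variant
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must lie in the interval (0, 1]")
    # Power is roughly preserved if the sample is inflated by a factor of 1 / r^2.
    return ceil(n_direct / r2)


# Hypothetical study: 1000 samples would suffice at the causal variant.
n_half = required_sample_size(1000, 0.5)      # marker in moderate LD
n_quarter = required_sample_size(1000, 0.25)  # marker in weak LD
```

With a fixed number of cases and controls, the only way to keep r² high in a region of weak LD is to type more markers there, which is the reasoning behind a denser panel in the class II region.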
A major impetus for LD maps is to guide the selection of SNPs in disease association studies based on localised LD patterns. These selected SNPs are called tag SNPs, and we have identified that 701 tag SNPs are sufficient to tag the common variation in the MHC in Singaporean Chinese.
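The idea behind tag SNP selection can be sketched with a simple greedy pairwise procedure. This is an illustration only: the tagging algorithm and r² threshold used to obtain the 701 tag SNPs are not described in this section, so the greedy strategy, the threshold of 0.8 and all names below are assumptions.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tag SNPs so that every SNP is a tag or has r^2 >= threshold with a tag.

    snps      -- list of SNP identifiers
    r2        -- dict mapping frozenset({snp_a, snp_b}) to their pairwise r^2
    threshold -- minimum r^2 for one SNP to stand in for another (assumed value)
    """
    def covers(a, b):
        # A SNP always covers itself; pairs missing from r2 count as unlinked.
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs;
        # ties are broken by the order of the input list.
        best = max(snps, key=lambda s: sum(covers(s, t) for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if covers(best, t)}
    return tags


# Hypothetical region: s1, s2 and s3 are in strong LD, s4 is independent.
example_r2 = {
    frozenset(("s1", "s2")): 0.90,
    frozenset(("s2", "s3")): 0.85,
    frozenset(("s3", "s4")): 0.10,
}
example_tags = greedy_tag_snps(["s1", "s2", "s3", "s4"], example_r2)
```

In the example, s2 is chosen first because it captures s1 and s3 as well as itself, and s4 must be typed separately because nothing else is in sufficient LD with it. Regions of strong LD therefore need few tags per Mb, and regions of weak LD need many.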
4.2 Haplotype-specific LD Patterns in the Singaporean Chinese MHC
To investigate whether LD in the MHC varies in an HLA allele- or haplotype-specific manner, the pattern of LD in common HLA alleles was analysed using 3 different approaches, the first of which is simple haplotype counting. When 2- and 3-locus HLA haplotype frequencies are counted, it becomes evident that several HLA pairs appear on the same chromosome more frequently than one would expect by chance, suggesting that LD exists between HLA pairs (Table 3.4). The second approach used SNP homozygosity plots of different combinations of HLA haplotypes (Figures 3.8-3.10, Figure 3.25). These homozygosity plots provide an indication of the conservation of HLA haplotypes, and it is seen that tightly linked HLA allele pairs carry higher levels of similarity in the homozygosity plots, underlining the fact that little recombination occurs in between. Finally, extended haplotype homozygosity (EHH) analysis was employed to describe the similarity of haplotypes carrying identical HLA alleles at increasing distance from the HLA loci, which reflects the extent of historical recombination in these haplotypes (Figure 3.18).
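The EHH statistic can be illustrated with a short sketch. It uses the common definition of EHH as the probability that two randomly chosen chromosomes carrying the same core allele are identical over all SNPs out to a given distance; whether Figure 3.18 was computed in exactly this way is an assumption, and the haplotypes below are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, n_snps):
    """Extended haplotype homozygosity over the first n_snps SNPs from the core.

    haplotypes -- list of strings, one per chromosome carrying the core allele,
                  with SNP alleles ordered outward from the core locus
    n_snps     -- number of SNPs, counted from the core, to include
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    # Group chromosomes that are identical over the interval.
    groups = Counter(h[:n_snps] for h in haplotypes)
    # Fraction of chromosome pairs that fall in the same group.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


# Hypothetical chromosomes sharing one HLA allele, four flanking SNPs each.
carriers = ["AAAA", "AAAA", "AAAT", "AATT"]
decay = [ehh(carriers, k) for k in range(1, 5)]
```

For these four chromosomes the statistic starts at 1 next to the core, falls to 0.5 once the third SNP is included and is about 0.17 at the fourth. A conserved haplotype keeps EHH close to 1 over long distances, whereas a haplotype broken up by recombination decays quickly.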
In each of these analyses there is clear evidence that the LD pattern in the MHC is HLA allele and haplotype specific. The EHH plots show that common alleles such as A*0207, A*3303, B*4601 and B*5801 are associated with long extended stretches of LD, and consequently such alleles are seen with dominant HLA partners at other loci when the extent of LD stretches from one HLA locus to another. This is also reflected by the high D´ between these HLA pairs, and the low p-values of the observed haplotype frequencies indicate that these haplotypes do not occur by chance. The SNP homozygosity plots reiterate these patterns: alleles associated with long EHH show little variability in the haplotypes that carry them, suggesting that they are conserved and not interrupted by recombination. Yet there also exist other common HLA alleles, such as A*1101 and B*4001, which are frequently found with multiple HLA partners at other loci and show short EHH stretches. The majority of SNP homozygosity plots of haplotypes carrying these alleles show high variability at multiple locations, indicating that these haplotypes are not conserved but have rather been gradually broken down by recombination.
Conserved extended haplotypes in the Singapore Chinese population
From this study we conclusively identify 4 conserved extended haplotypes (CEHs) that are common in the Singaporean Chinese population: A*0203-C*0702-B*3802-DRB1*1602 (A2-B38-DR16), A*0207-C*0102-B*4601-DRB1*0901 (A2-B46-DR9), A*1101-C*0801-B*1502-DRB1*1202 (A11-B15-DR12) and A*3303-C*0302-B*5801-DRB1*0301 (A33-B58-DR3). These conserved haplotypes can be seen through high LD between the corresponding HLA alleles, as well as in the 2-, 3- and 4-locus SNP homozygosity plots.
The haplotype alignments of the samples homozygous for the A2-B46-DR9 and A33-B58-DR3 haplotypes provide confirmation of these CEHs, and emphasize the stretch of genetic fixity in these haplotypes. Although 8 full-length Caucasian MHC sequences have recently been reported (Horton et al. 2008), the homozygous samples here provide a different dimension to the understanding of CEHs, as they represent biological replicates of the extent of conservation and also unambiguously demarcate the boundaries of this extent.
The A33-B58-DR3 conserved extended haplotype
The A33-B58-DR3 CEH is prominent due to its frequency (5%) and its length. The haplotype alignments of the 5 individuals homozygous for this CEH provide 10 unambiguously phased chromosomes that show the stretch of genetic fixity of A33-B58-DR3 haplotypes. The conservation of the A33-B58-DR3 haplotype is striking: 9 of the 10 chromosomes carrying the haplotype are indistinguishable from the telomeric end of the SNP map (28.9Mb) to 33Mb of chromosome 6p, breaking just before the HLA-DPA1 locus (Figure 3.26). This marks a stretch of at least 4Mb that is inherited as a block uninterrupted by recombination, and such a long stretch of genetic fixity across unrelated individuals has never before been reported in a variation map of this resolution.
The genetic fixity of this haplotype also raises important questions about diseases associated with this CEH. This CEH has been shown to be a genetic marker for nasopharyngeal carcinoma (NPC) (Chan et al. 1983, Hildesheim et al. 2002) as well as for adverse drug events such as Stevens-Johnson syndrome (SJS) (Hung et al. 2005). The interpretation of these associations is not trivial, as implicating a disease locus is problematic given the almost perfect LD across the entire MHC region in A33-B58-DR3 haplotypes.
Given the low diversity and suppressed recombination rates in the MHC-telomeric region, it is conceivable that the A33-B58-DR3 haplotype stretches from position 26.0Mb of chromosome 6p into and across the entire MHC. There is certainly precedent for this: the Caucasian CEH A3-B7-DR15 is in LD with HFE, located more than 4Mb telomeric to the HLA-A locus (Gandon et al. 1996, Malfroy et al. 1997).
The A2-B46-DR9 conserved extended haplotype
Compared to the A33-B58-DR3 CEH, the stretch of genetic fixity in the A2-B46-DR9 CEH is not as pronounced, but it still extends at least 3.1Mb, from 29.8Mb to 32.9Mb. Chromosomes from the 8 individuals homozygous for A2-B46-DR9 are highly similar within this segment, but there are 5 SNP locations within the MHC at which these 16 chromosomes show variation. Having established that the quality of the SNP genotyping is very high, and as the phase of these homozygous samples is unambiguous, these alternative SNP alleles within the A2-B46-DR9 haplotype are not likely to be artefacts. The mutation rate of the MHC is not higher than the genome average (Parham et al. 1995), so while we cannot completely rule out the possibility of recurrent point mutations at these locations, it is more likely that the variable SNP positions in A2-B46-DR9 haplotypes mark sites of gene conversion or double-crossover recombination events. Compared with the A33-B58-DR3 haplotype, the accumulation of variation and the shorter stretch of genetic fixity in A2-B46-DR9 haplotypes indicate that the latter is likely to have been present in the Chinese population for longer than the former.
The A2-B46-DR9 CEH is also marked by its association with several autoimmune conditions such as Graves’ disease, myasthenia gravis and psoriasis, as well as with nasopharyngeal carcinoma (Chan et al. 1978, Lu et al. 1990, Chan et al. 1993, Ren et al. 1995, Vejbaesya et al. 1998). The disease loci on this haplotype have not been conclusively identified and are likely to be masked by the strong LD seen here.
B*4001 Haplotypes are non-conserved
Consistently throughout the data there is convincing evidence that the allele B*4001, the most common HLA-B allele in the Singaporean Chinese population with a frequency of 15.6%, does not exist as part of any conserved haplotype. This is first seen from the allele and haplotype frequency counts (Tables 3.3 and 3.4), in which B*4001 is found with multiple HLA-A, -C and -DRB1 partners, mostly in complete linkage equilibrium. The SNP homozygosity plots reinforce this point: HLA haplotypes carrying B*4001 alleles all show low similarity along their lengths (Figures 3.8-3.10). The A*1101-B*4001 haplotype is fairly common in the population, with a frequency of 5.5%; however, the majority of A*1101-B*4001 haplotypes are not identical to one another. These haplotypes are likely to have arisen from recombinational shuffling rather than being remnants of a block inherited from a distant ancestor. The same may be said for the 4-locus A*1101-B*4001-C*0702-DRB1*0901 haplotype in Figure 3.25: of the 7 chromosomes carrying this combination of alleles, none are identical across the interval.
The B*4001 allele is ubiquitous in different populations and ethnic groups around the world; besides the Chinese populations in China, Singapore and Taiwan, it has a high frequency in indigenous populations in the Americas, Australia and Asia. It is also found with a frequency of at least 5% in several Caucasian populations across the United States and Europe (Middleton et al. 2003). The wide distribution of this allele suggests that it has been present in the human population for a long time, and that over time repeated recombination events have rendered the B*4001 ancestral haplotype unrecognisable.
Alleles matching at the HLA-A, -C, -B and -DRB1 loci are taken as surrogates for matching the entire MHC haplotype in solid organ or bone marrow transplants. This assumption, however, falls short for individuals carrying B*4001. A recent study of graft versus host disease (GVHD) in bone marrow transplants compared patients and donors matched only at the 4 classical HLA loci with those matched for the entire MHC haplotype, and reported an odds ratio of 4.5 for acute GVHD in MHC haplotype-unmatched compared with MHC haplotype-matched patients (Petersdorf et al. 2007). The data shown here indicate that the non-conservation of B*4001 haplotypes should be taken into consideration when matching donors to recipients in solid organ and hematopoietic cell transplantations.