3.3 An Integrated High-Resolution SNP and HLA Haplotype Map of the MHC
The previous section used a high-resolution SNP map to describe the fine-scale organization of linkage disequilibrium (LD) across the MHC. In this section, HLA genotype data were merged with the SNP genotypes to construct an integrated LD map. This allows HLA haplotype-specific differences in LD to be analysed at higher resolution. Data from HLA-homozygous samples were also included in this analysis. These samples provide valuable, unambiguous genotypes for building a high-resolution map of the conserved extended haplotypes in the local Chinese population.
Allele-level HLA genotypes at the HLA-A, HLA-B, HLA-C and HLA-DRB1 loci were determined for each sample in this study using sequence-based typing. These multi-allelic genotypes were combined with the bi-allelic SNP genotypes, and integrated full-length haplotypes were reconstructed using the program PHASE (Stephens and Scheet 2005). Phasing was performed in two steps to make the most of the phase-certain haplotypes provided by the family-based samples. First, parental haplotypes were constructed unambiguously from the parent-offspring genotypes. These phase-certain haplotypes were then seeded into the set of unrelated samples, which improves the performance of the algorithm (Stephens and Scheet 2005). PHASE provides an estimate of the quality of the phasing, and this was very high: the average phase certainty for a genotype was over 99%, and 96% of the genotypes had a phase certainty of 100%. Several factors support the reliability of the reconstruction: these high phase certainties, the high density of this map, and the fact that most SNPs are in strong LD with at least one other SNP. The in-silico haplotype reconstruction is therefore not believed to have introduced substantial bias into the results presented in the following pages.
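In a parent-offspring trio, the allele each parent transmitted can usually be read directly from the genotypes. This is what makes the first phasing step possible. The sketch below illustrates the idea for bi-allelic SNPs coded as counts of one allele (0, 1 or 2). It is a minimal illustration under those assumptions, not the procedure implemented in PHASE, and the function names are hypothetical. Multi-allelic HLA genotypes are not handled. Sites where father, mother and child are all heterozygous are left unresolved (None), which is the kind of ambiguity the second, statistical step has to deal with.

```python
from typing import List, Optional, Sequence, Tuple

Allele = Optional[int]  # 0 or 1, or None where phase cannot be resolved


def _can_transmit(genotype: int, allele: int) -> bool:
    """True if a parent with this genotype (count of allele 1) carries `allele`."""
    return genotype > 0 if allele == 1 else genotype < 2


def child_alleles(f: int, m: int, c: int) -> Tuple[Allele, Allele]:
    """(paternal, maternal) alleles received by the child at one bi-allelic SNP."""
    if c == 0:
        pat, mat = 0, 0
    elif c == 2:
        pat, mat = 1, 1
    elif f != 1:  # child heterozygous, father homozygous: he passed his only allele
        pat = f // 2
        mat = 1 - pat
    elif m != 1:  # child heterozygous, mother homozygous
        mat = m // 2
        pat = 1 - mat
    else:  # father, mother and child all heterozygous: phase is ambiguous
        return None, None
    if not (_can_transmit(f, pat) and _can_transmit(m, mat)):
        raise ValueError(f"Mendelian inconsistency: father={f}, mother={m}, child={c}")
    return pat, mat


def phase_trio(father: Sequence[int], mother: Sequence[int], child: Sequence[int]):
    """Return ((father_T, father_U), (mother_T, mother_U)): each parent's transmitted
    and untransmitted haplotypes, with None at unresolved sites."""
    f_t: List[Allele] = []
    f_u: List[Allele] = []
    m_t: List[Allele] = []
    m_u: List[Allele] = []
    for f, m, c in zip(father, mother, child):
        pat, mat = child_alleles(f, m, c)
        f_t.append(pat)
        f_u.append(None if pat is None else f - pat)  # father's other allele
        m_t.append(mat)
        m_u.append(None if mat is None else m - mat)  # mother's other allele
    return (f_t, f_u), (m_t, m_u)
```

For example, `phase_trio([0, 1, 1, 2], [1, 1, 0, 1], [0, 2, 1, 1])` resolves every site. It gives the father's transmitted haplotype as `[0, 1, 1, 1]` and his untransmitted haplotype as `[0, 0, 0, 1]`. The resolved parental haplotypes are what would then be seeded into the analysis of unrelated samples.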
3.3.1 Capturing HLA Alleles with Tag SNPs and Haplotypes
It has been shown that tag SNPs can be generated from this high-resolution map and that they act as adequate proxies for other SNPs. The next step was to explore tagging common HLA alleles in the local Chinese population using single SNPs or combinations of SNPs. To do this, an iterative algorithm was created that uses the r² coefficient to search for two kinds of tag for each common HLA allele: 1) the best single SNP and 2) the best multi-SNP haplotype. All combinatorial possibilities of 2- to 6-SNP haplotypes were assembled. For each HLA allele, the haplotype that tagged it with the maximum r² was taken as its tagging haplotype. The results are shown in Table 3.9.
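The tag search can be sketched as an exhaustive scan. The thesis does not give its implementation, so the code below is a hypothetical illustration with the following assumptions:

- Phased haplotypes are tuples of 0/1 SNP alleles, with one HLA allele label per haplotype.
- r² is computed between two 0/1 indicators: whether a chromosome carries the SNP-allele combination, and whether it carries the HLA allele.
- Only allele combinations actually observed on carrier chromosomes are tried as candidate tags.

Enumerating all combinations of up to 6 SNPs grows very quickly with the number of SNPs. In practice the search would need to be restricted, for example to a window around the HLA locus.

```python
from itertools import combinations
from typing import Hashable, Optional, Sequence, Tuple

Best = Tuple[float, Optional[Tuple[int, ...]], Optional[Tuple[int, ...]]]


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared correlation between two 0/1 indicators over the same chromosomes."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(a & b for a, b in zip(x, y)) / n
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    if denom == 0:
        return 0.0  # a monomorphic indicator cannot tag anything
    d = p_xy - p_x * p_y  # LD coefficient D
    return d * d / denom


def best_tag(
    haplotypes: Sequence[Tuple[int, ...]],
    hla_alleles: Sequence[Hashable],
    target: Hashable,
    min_snps: int = 1,
    max_snps: int = 6,
) -> Best:
    """Return (r2, snp_indices, allele_pattern) of the best tag for `target`."""
    carrier = [int(a == target) for a in hla_alleles]
    n_snps = len(haplotypes[0])
    best: Best = (0.0, None, None)
    for k in range(min_snps, max_snps + 1):
        for snps in combinations(range(n_snps), k):
            # candidate patterns: SNP-allele combinations seen on carrier chromosomes
            patterns = {tuple(h[i] for i in snps) for h, c in zip(haplotypes, carrier) if c}
            for pattern in sorted(patterns):
                tag = [int(tuple(h[i] for i in snps) == pattern) for h in haplotypes]
                r2 = r_squared(tag, carrier)
                if r2 > best[0]:
                    best = (r2, snps, pattern)
    return best
```

Calling `best_tag(..., min_snps=1, max_snps=1)` corresponds to the single-SNP search. Calling it with `min_snps=2, max_snps=6` corresponds to the multi-SNP haplotype search.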
Most HLA alleles are not captured well by single-SNP tags. Only 9 of the 26 common HLA alleles are captured with a maximum r² >= 0.8, and the mean single-tag maximum r² was 0.67. Interestingly, the ability to tag an HLA allele shows no relationship with LD, membership of a conserved haplotype, or allele frequency. Alleles that lie on conserved haplotypes, such as A*0207, A*0203, B*4601 and DRB1*0803, are not captured by a single SNP tag. Neither are common alleles such as A*1101 (30% in the population) or DRB1*1202 (9.4% in the population). No single tag SNP captures an HLA allele perfectly (r² = 1).
Multiple-SNP haplotype tags, however, perform much better than single tags. Twenty-five of the 26 HLA alleles were tagged with an r² of at least 0.8, with a mean maximum r² of 0.95. In some cases the improvement is dramatic. For example, DRB1*1602 is tagged with an r² of 0.23 using a single SNP but is captured with r² = 0.95 using a 5-SNP haplotype. The exception is HLA-A*0201, which could not be captured with high r² by either a single or a multi-SNP tag.
r² is not merely a measure of linkage disequilibrium; it is a squared correlation coefficient between alleles at two loci (Ott 1999). A high r² between a single SNP on this map and an HLA allele is therefore unlikely unless that SNP allele occurs on nearly all chromosomes carrying the HLA allele and on almost no others. This is reflected in the low single-SNP scores. Similarly, a multi-SNP haplotype tag has a high r² with an HLA allele when that combination of SNP alleles is unique, or nearly so, to chromosomes carrying that HLA allele. Such a multi-SNP haplotype therefore meets two criteria:
1) it represents a segment of the MHC that is identical-by-descent in most chromosomes carrying that HLA allele;
2) its SNP alleles are unique to that HLA allele.
The inability to capture the HLA-A*0201 allele with high r² reflects the difficulty of identifying segments on A*0201 haplotypes that fulfil both of these criteria.
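The argument can be made concrete using the usual definition of r² for two binary indicators:

- X: whether a chromosome carries the SNP allele (or SNP-allele combination);
- Y: whether it carries the HLA allele.

Their frequencies are p_X and p_Y, and their joint frequency is p_XY. The special case in the second line is an illustrative assumption, not a result from the data. It describes a SNP allele present on every chromosome carrying the HLA allele (p_XY = p_Y) but also on some others (p_X > p_Y).

```latex
D = p_{XY} - p_X\,p_Y, \qquad r^2 = \frac{D^2}{p_X(1-p_X)\,p_Y(1-p_Y)}

\text{if } p_{XY} = p_Y \text{ and } p_X > p_Y: \quad D = p_Y(1-p_X), \qquad r^2 = \frac{p_Y(1-p_X)}{p_X(1-p_Y)} < 1
```

Consider a hypothetical HLA allele at 10% frequency whose SNP allele also occurs on another 10% of chromosomes (p_Y = 0.1, p_X = 0.2). Here r² = (0.1 × 0.8)/(0.2 × 0.9) ≈ 0.44, even though the SNP allele is present on every chromosome carrying the HLA allele. r² = 1 only when X and Y either agree on every chromosome or disagree on every chromosome.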
Table 3.9 Using Tag SNPs to Define HLA Alleles
3.3.2 Extended Haplotype Homozygosity of HLA Alleles
Data from the first SNP map, described in Section 3.1.3 of this thesis, showed that LD patterns across the MHC vary in an HLA haplotype/allele-specific manner. To describe this in detail with the high-resolution SNP map, the extent of LD on haplotypes carrying specific HLA alleles was examined using Extended Haplotype Homozygosity (EHH) analysis. Consider two chromosomes that carry an allele (or haplotype) of interest at an anchor locus. EHH at a point X is the probability that these two chromosomes are identical-by-descent from the anchor locus to point X (Sabeti et al. 2002). The number of historical recombination events in the intervening interval increases with distance from the anchor locus. As a result, linkage disequilibrium decays with distance, and EHH values decay in the same way. A segment of high EHH therefore indicates an extended haplotype that has been conserved without recombination. Such segments are identical-by-descent across the respective haplotypes.
EHH calculations were performed with each of the four classical HLA loci (HLA-A, -B, -C and -DRB1) as the anchor locus. Only common HLA alleles (>= 5% in the population) were included in this analysis. EHH values were plotted as a function of physical distance and are shown as a set of charts in Figure 3.18 (panels A to D). Figure 3.19 summarizes the extent of genetic fixity for chromosomes carrying each common HLA allele by depicting the segments of high EHH as coloured blocks.
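The high-EHH segments summarized in Figure 3.19 can be read off an EHH profile by walking outwards from the anchor until EHH falls below a threshold. The sketch below is minimal and rests on a few assumptions. The EHH profile is evaluated at marker positions in Mb, the anchor locus sits at index `anchor`, and the function name is hypothetical. The reported boundaries are the last markers still above the threshold, so the true break point lies somewhere between that marker and the next one.

```python
from typing import Optional, Sequence, Tuple


def high_ehh_segment(
    positions_mb: Sequence[float],
    ehh_values: Sequence[float],
    anchor: int,
    threshold: float,
) -> Optional[Tuple[float, float, float]]:
    """(start, end, length) in Mb of the contiguous run around `anchor` with EHH >= threshold."""
    if ehh_values[anchor] < threshold:
        return None
    left = anchor
    while left > 0 and ehh_values[left - 1] >= threshold:
        left -= 1  # extend towards the start of the map
    right = anchor
    while right < len(ehh_values) - 1 and ehh_values[right + 1] >= threshold:
        right += 1  # extend towards the end of the map
    start, end = positions_mb[left], positions_mb[right]
    return start, end, end - start
```

To obtain the two sets of columns (EHH >= 0.8 and EHH >= 0.5), call the function twice with `threshold=0.8` and `threshold=0.5`. Because EHH can only stay level or fall with distance from the anchor, the 0.8 segment always lies within the 0.5 segment.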
[Figure 3.18: EHH plots anchored on HLA-A (A*3303, A*0207, A*0203, A*2402), HLA-C (C*0302, C*0702, C*0801), HLA-B (B*5801, B*1301, B*1502, B*4001, B*4601, B*3802) and HLA-DRB1. Plot annotations include HLA-F, HLA-G and the LTA-BAT2 hotspot.]
Figure 3.18 Extended Haplotype Homozygosity Plots for Common HLA Alleles
This figure shows Extended Haplotype Homozygosity (EHH) plots of 1 Mb-wide segments centred on the anchor locus. EHH values (primary Y-axis) are plotted against physical location along chromosome 6p (X-axis).
The four panels are anchored on four different classical HLA loci. Panel A: HLA-A; B: HLA-C; C: HLA-B; D: HLA-DRB1.
Recombination hotspots are mapped onto the plots. Vertical red bars rising from the X-axis are hotspots inferred from the recombination rates published by the HapMap project (International HapMap Consortium 2005). The recombination rate is shown on the secondary Y-axis of the plots.
Translucent orange areas mark the boundaries of recombination hotspots identified by Cullen et al. (2002) through genotyping of short tandem repeats in recombinant sperm. The names of these hotspots are shown in red lettering above each plot and correspond to the labels in Cullen et al. (2002).
Green bars mark recombination hotspots mapped precisely by Jeffreys et al. (2000, 2001) through DNA sequencing of recombinant sperm; their labels correspond to those published by Jeffreys et al. Where multiple hotspots fall within a small 5-10 kb window, they are collapsed into a single bar on the plots.
Figure 3.19 Extent of EHH of Common HLA Alleles
This figure summarizes the extent of EHH for all common alleles at the four HLA loci (A, B, C and DRB1).
The top table lists the segments of high EHH for each common HLA allele. Segments with EHH >= 0.5 and EHH >= 0.8 are listed.
The bottom panel illustrates the extent of EHH for all common HLA alleles. Darker shades indicate stretches where EHH is higher than 0.8, while lighter shades mark stretches where EHH is greater than 0.5.
The long stretches of high EHH on chromosomes carrying DRB1*0301 are clearly seen in the figure. DRB1*0301 is part of the A*3303-C*0302-B*5801-DRB1*0301 conserved extended haplotype.
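[Figure 3.19: table of high-EHH segments and block diagram for the common alleles A*0201, A*3303, B*1301, B*1502, B*3802, B*4001, B*4601, B*5801, C*0304, C*0702, C*0801, DRB1*0301, DRB1*0405, DRB1*0901, DRB1*1101, DRB1*1202, DRB1*0803, DRB1*1501 and DRB1*1602. The table gives the start, end and length (Mb) of each segment at EHH >= 0.8 and at EHH >= 0.5. Recovered row: A*0201, EHH >= 0.8 from 29.803 to 30.032 Mb (0.23 Mb); EHH >= 0.5 from 29.680 to 30.187 Mb (0.51 Mb).]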
The plots of EHH against physical distance clearly show EHH decaying with increasing distance from the anchor locus. EHH does not decay along a gradual curve but in a step-wise manner: segments of constant EHH are punctuated by sudden drops. Sites of EHH decay on different allelic backgrounds tend to cluster at the same locations. To test whether these clusters of EHH decay coincide with established recombination hotspots, the hotspot locations were mapped onto the EHH plots.
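The comparison between EHH drops and hotspots can be sketched as an interval-overlap test. In the sketch below, a "decay site" is any interval between adjacent markers where EHH changes by at least `min_drop`. That threshold and the function names are hypothetical, since the text does not define a numeric criterion for a significant drop. Intervals from several allelic backgrounds can be pooled (concatenated) before testing, which mirrors the clustering of decay sites across backgrounds described above.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in Mb


def decay_intervals(positions: Sequence[float], ehh_values: Sequence[float], min_drop: float) -> List[Interval]:
    """Intervals between adjacent markers where EHH changes by at least `min_drop`.

    abs() is used because EHH falls in opposite directions on either side of the anchor.
    """
    return [
        (positions[i], positions[i + 1])
        for i in range(len(positions) - 1)
        if abs(ehh_values[i] - ehh_values[i + 1]) >= min_drop
    ]


def hotspots_with_decay(hotspots: Sequence[Interval], decay: Sequence[Interval]) -> List[Interval]:
    """Hotspots that overlap at least one EHH decay interval."""
    return [h for h in hotspots if any(h[0] <= b and a <= h[1] for a, b in decay)]
```

If every hotspot appears in the output of `hotspots_with_decay` for the pooled intervals, that corresponds to the agreement between hotspots and EHH decay sites examined here.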
There are two sources of information on recombination hotspots across the MHC. The first is hotspots inferred from recombination rates estimated in the HapMap populations (International HapMap Consortium 2005). The second is hotspots identified from recombinant sperm of random individuals. Two groups have separately published sperm crossover locations within the MHC. The first group genotyped short tandem repeats in single recombinant sperm and identified six recombination hotspot regions across the MHC (Cullen et al. 2002). The second group used a slightly different approach, pooled-sperm typing, and precisely mapped another six recombination hotspots within the class II region of the MHC (Jeffreys et al. 2000, 2001).
The locations of both the inferred and the sperm-recombinant hotspots agree closely with the sites of EHH decay. In fact, every recombination hotspot, whether from the HapMap data or from sperm recombinants, lies at a location with a significant decay in EHH. This supports the hypothesis that regions of high EHH are stretches of DNA that are identical-by-descent and unbroken by recombination. These regions are separated by sites of clustered EHH drops, which are likely to result from increased homologous recombination activity.
The rate of EHH decay varies greatly depending on the HLA allele background. The A*3303-C*0302-B*5801-DRB1*0301 conserved extended haplotype (CEH) stands out immediately in the plots. On chromosomes with a B*5801, C*0302 or DRB1*0301 background, the EHH half-life (the stretch over which EHH remains above 0.5) extends more than 3 Mb across the classical MHC region (Figure 3.19). Put another way, take two chromosomes at random that carry any one of these three alleles. There is a greater than 50% chance that they are identical-by-descent across the entire classical MHC. The other three conserved extended haplotypes identified earlier do not show the same remarkable extent of EHH. The EHH half-life is about 1 Mb for alleles A*0207, B*4601 and B*1502, and the remaining CEH-associated alleles show no significant difference from non-CEH alleles.
At each HLA locus, the segments of strong homozygosity (EHH >= 0.8) and high homozygosity (EHH >= 0.5) point to regions in strong allelic association with the HLA alleles at that locus.
At HLA-A, all alleles except A*1101 show strong EHH (EHH >= 0.8) at the telomeric end, extending at least to HLA-G (located at 29.90 Mb) and HLA-F (29.81 Mb). Each HLA-A allele is therefore likely to be associated with only one HLA-F allele and one HLA-G allele. At the EHH >= 0.5 threshold, most HLA-A alleles have high homozygosity to at least position 29.68 Mb, a region that includes the MOG gene (29.74 Mb). For most A alleles, the centromeric boundary of strong EHH lies just beyond the HLA-A locus. Haplotypes carrying A*0207 show strong homozygosity to position 30.4 Mb and high EHH to position 30.7 Mb. This segment contains a TRIM cluster (30.17-30.30 Mb) and the HLA-E locus (30.56 Mb). For chromosomes carrying allele A*3303, the segment of strong EHH stretches more than 1 Mb and includes the olfactory receptor cluster at position 29.50 Mb. The EHH half-life of A*3303 chromosomes extends beyond the telomeric end of this SNP variation map. At the centromeric end, it extends beyond the HLA-B locus and into the MHC class III region.
At the HLA-B locus, all HLA-B alleles have strong homozygosity (EHH >= 0.8) that reaches past the MICA locus (31.48 Mb) and the HCP5 gene (31.54 Mb). Each HLA-B allele is therefore likely to be associated with a single allele at those loci. For four of the six HLA-B alleles, the centromeric boundary of strong homozygosity lies at a recombination hotspot around position 31.54 Mb. For the other two alleles, B*1502 and B*5801, strong homozygosity stretches to the recombination hotspot between LTA and BAT2 at around position 31.68 Mb. These two alleles are therefore likely to be in strong allelic association with genes extending into the class III region, including MICB, BAT1, NFKBIL1, LTA, TNF, LTB, NCR and AIF1. The EHH half-life segment for allele B*5801 stretches more than 3.44 Mb, from the telomeric boundary of this SNP map to beyond the DRB1 region. Apart from allele B*4001, the telomeric boundary of strong EHH for HLA-B alleles reaches at least to position 31.2 Mb. This boundary coincides with a recombination hotspot at position 31.214 Mb seen in the HapMap data. The segment includes the psoriasis susceptibility candidate genes PSORS1C1, PSORS1C2 and PSORS1C3. It also includes HLA-C and POU5F1, which encodes the OCT4 transcription factor needed to maintain stem cell pluripotency. This high-EHH block, common to all HLA-B alleles, coincides with the Cw-B frozen block described previously (Degli-Esposti et al. 1992b, Yunis et al. 2003).