3.5 Fine Mapping of Recombination Hotspots in the MHC
The vast majority of meiotic recombination hotspots annotated in the human genome are derived from population recombination rates estimated using data from genome-wide variation maps (Myers et al. 2005, International HapMap Consortium 2005), but very few recombination hotspots have been confirmed and finely mapped in recombinant haplotypes. In humans, indisputable determination of meiotic recombination hotspots is only possible by family linkage analyses and crossover mapping in recombinant sperm. However, the former is not sufficiently powered for accurate fine mapping of hotspots (Arnheim et al. 2003), while the latter does not scale up well (Kauppi et al. 2004). Hence, only a handful of recombination hotspots have been definitively mapped through recombinant sperm mapping (Jeffreys et al. 1998, 2000, 2001, Jeffreys and Neumann 2002, 2005, Cullen et al. 2002).
In the MHC, using allele-specific PCR primers to selectively amplify recombinant haplotypes in pooled sperm DNA, 6 sperm-crossover locations within the Class II region were previously finely mapped to hotspots of around 2kb in length (Jeffreys et al. 2000, 2001). Using a different approach, a separate study genotyped polymorphic STR sites in amplified DNA from single sperm to map 6 recombination hotspot regions across the entire MHC (Cullen et al. 2002). However, these hotspots were identified at a lower resolution, ranging from 35kb to 105kb.
The underlying LD structure in the Chinese population correlates strongly with the locations of the sperm-mapped recombination hotspots; all the hotspots precisely mapped by Jeffreys and co-workers (Jeffreys et al. 2000, 2001) occur outside of defined haplotype blocks. Although the lower resolution of the hotspot regions identified by Cullen and co-workers (Cullen et al. 2002) precludes comparing them to the more precise boundaries of SNP haplotype blocks, this set of hotspot regions matches up well with the EHH plots; EHH values reflect the decay of haplotypes due to recombination (Sabeti et al. 2002), and within each of these Cullen regions, EHH values of different HLA haplotypes are seen to drop (Figure 3.18). Furthermore, there are multiple examples of individuals who carry recombinant haplotypes where the haplotype breaks occur within the Cullen regions. These observations suggest that population genetics data could be utilised to narrow down crossover locations in the Cullen hotspot regions.
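To illustrate how EHH reflects haplotype decay, the minimal Python sketch below computes EHH following the definition of Sabeti et al. (2002): the probability that two randomly chosen chromosomes carrying the same core haplotype are identical across the whole interval from the core to a given marker. The function name `ehh`, the string encoding of haplotypes and the toy data are assumptions made for this example; this is not the analysis pipeline used in this study.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, upto_index):
    """EHH among chromosomes that already share the core allele.

    haplotypes: equal-length sequences of alleles, one per chromosome.
    core_index: marker index of the core.
    upto_index: marker index up to which homozygosity is evaluated.
    """
    lo, hi = sorted((core_index, upto_index))
    n = len(haplotypes)
    if n < 2:
        return 0.0
    # Group chromosomes by their full haplotype across the extended interval.
    classes = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    # Pairs of identical chromosomes divided by all possible pairs.
    return sum(comb(count, 2) for count in classes.values()) / comb(n, 2)


# Toy data: four chromosomes sharing the core allele at index 0.
toy = ["AAG", "AAG", "AAT", "AAG"]
print(ehh(toy, 0, 0), ehh(toy, 0, 2))
```

In the toy data, one chromosome diverges at the third marker, so EHH falls as the interval is extended; a localized drop of this kind is the signal used to flag candidate crossover sites.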
With this in mind, the utility of the SNP variation data in this study for fine mapping crossover locations was explored. Within each of the Cullen hotspot segments, the location of the crossover was refined by identifying intervals on the SNP map using the following factors:
1) Spikes in the population recombination rate,
2) Locations of EHH decay,
3) Boundaries of haplotype blocks,
4) Presence of recombinant haplotypes across the interval.
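The four factors above can be treated as independent lines of evidence for each SNP interval. The sketch below is a hypothetical bookkeeping aid, not the procedure used in this study: the class `SnpInterval`, its field names and the second example interval are assumptions introduced for illustration. It counts how many factors support an interval and ranks intervals by that count.

```python
from dataclasses import dataclass


@dataclass
class SnpInterval:
    start: int
    end: int
    recomb_rate_spike: bool       # factor 1
    ehh_decay: bool               # factor 2
    between_blocks: bool          # factor 3
    recombinant_haplotypes: bool  # factor 4

    def support(self) -> int:
        """Number of the four factors that point to this interval."""
        return sum([self.recomb_rate_spike, self.ehh_decay,
                    self.between_blocks, self.recombinant_haplotypes])


def rank_intervals(intervals):
    """Order intervals from most to least supported."""
    return sorted(intervals, key=lambda iv: iv.support(), reverse=True)


candidates = [
    # Hypothetical interval supported by only one factor.
    SnpInterval(32_900_000, 32_901_000, True, False, False, False),
    # TAP2 interval from the text, supported by all four factors.
    SnpInterval(32_912_195, 32_913_448, True, True, True, True),
]
best = rank_intervals(candidates)[0]
print(best.start, best.end, best.support())
```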
The 6 Cullen hotspot segments were first mapped to the MHC using the corresponding STR markers (obtained from Cullen et al. 2002). Within each of these segments, the distribution of population recombination rate was overlaid using data from the HapMap project (International HapMap Consortium 2005). Next, using the EHH data for HLA haplotypes in Section 3.3.2 of this thesis, localized sites of EHH decay across two or more HLA haplotypes were identified within the Cullen segments. Detailed LD structure from the haplotype boundaries defined in Section 3.2.3 was also overlaid across each segment. Finally, SNP haplotypes in the local Chinese population were scanned for recombinant haplotypes within these Cullen segments: samples that were homozygous telomeric to the recombination hotspot but heterozygous centromeric to it (telomeric-homozygous samples) and, similarly, samples that were homozygous centromeric to the hotspot but heterozygous telomeric to it (centromeric-homozygous samples). The homozygosity breakpoints in these samples with recombinant haplotypes define an interval of chromosomal crossover associated with the recombination hotspot. SNP intervals that are a confluence of the points above were identified as the likely crossover interval within the Cullen hotspot segments.
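The selection of telomeric- and centromeric-homozygous samples can be sketched as follows. This is a simplified illustration: the function `classify_sample`, the genotype encoding (position mapped to a pair of alleles) and the rule that every marker on one side of the interval must be homozygous are assumptions of this example, and lower coordinates are taken to be telomeric.

```python
def is_homozygous(genotype):
    """A genotype is a pair of alleles, e.g. ("A", "G")."""
    return genotype[0] == genotype[1]


def classify_sample(genotypes, interval_start, interval_end):
    """Classify one sample relative to a candidate crossover interval.

    genotypes: dict mapping SNP position -> (allele1, allele2).
    Returns "telomeric-homozygous", "centromeric-homozygous" or None.
    """
    telomeric = [is_homozygous(gt) for pos, gt in genotypes.items()
                 if pos <= interval_start]
    centromeric = [is_homozygous(gt) for pos, gt in genotypes.items()
                   if pos >= interval_end]
    if not telomeric or not centromeric:
        return None  # no markers on one side of the interval
    if all(telomeric) and not all(centromeric):
        return "telomeric-homozygous"
    if all(centromeric) and not all(telomeric):
        return "centromeric-homozygous"
    return None


# Toy sample: homozygous up to position 200, heterozygous from 300 onwards.
sample = {100: ("A", "A"), 200: ("C", "C"), 300: ("G", "T"), 400: ("A", "G")}
print(classify_sample(sample, 200, 300))
```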
Having narrowed down the hotspot location to SNP intervals, the samples with recombinant haplotypes were re-sequenced across these intervals and additional polymorphic markers were identified. The genotypic pattern of these polymorphic sites in the recombinant haplotypes was used to mark out the boundary where chromosome crossover occurred; this boundary lies between the last common homozygous markers in the telomeric- and centromeric-homozygous samples respectively. This way, the crossover window in these recombination hotspots can be unambiguously identified at a base-pair level.
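One way to read the boundary rule above is sketched below, assuming lower coordinates are telomeric. For each telomeric-homozygous sample the last homozygous marker before its first heterozygous marker is recorded, and the most telomeric of these is the marker common to all such samples; the mirror-image rule is applied to the centromeric-homozygous samples. The function names and the marker encoding are hypothetical, introduced only for this illustration.

```python
def last_hom_before_first_het(markers):
    """Scan telomere to centromere; markers are (position, is_heterozygous)."""
    previous = None
    for position, is_het in sorted(markers):
        if is_het:
            return previous
        previous = position
    return None  # sample has no heterozygous marker


def last_hom_after_last_het(markers):
    """Scan centromere to telomere; mirror image of the function above."""
    previous = None
    for position, is_het in sorted(markers, reverse=True):
        if is_het:
            return previous
        previous = position
    return None


def crossover_window(telomeric_hom_samples, centromeric_hom_samples):
    """Window bounded by the last homozygous markers common to each group."""
    left = [p for p in map(last_hom_before_first_het, telomeric_hom_samples)
            if p is not None]
    right = [p for p in map(last_hom_after_last_het, centromeric_hom_samples)
             if p is not None]
    if not left or not right:
        raise ValueError("both sample groups need an informative sample")
    return min(left), max(right)


# Toy data: (position, is_heterozygous) per marker, one list per sample.
telo = [[(100, False), (200, False), (300, True), (400, True)],
        [(100, False), (200, False), (300, False), (400, True)]]
cen = [[(100, True), (300, True), (400, False), (500, False)],
       [(100, True), (300, False), (400, False), (500, False)]]
print(crossover_window(telo, cen))
```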
The 6 Cullen recombination hotspot regions and the corresponding crossover intervals narrowed using the SNP maps are listed in Table 3.11. Sets of PCR and DNA sequencing primers were designed to amplify and re-sequence across 3 of these SNP intervals. In each of these 3, the re-sequencing results uncovered additional polymorphic sites across the interval, and these were used to frame the crossover location. As a proof of concept, the recombination hotspot mapped within the TAP2 locus (Jeffreys et al. 2000) was also re-sequenced using the same approach of SNP interval and sample selection. The re-sequencing results of each of these 4 hotspots are detailed in the following pages.
In the SNP variation map, homozygous haplotypes are seen to break across an interval from positions 32,912,195 to 32,913,448, located within the TAP2 gene. 14 individuals carrying recombinant haplotypes across this interval are seen; 6 have haplotypes that are homozygous up to the telomeric end of this interval, while 8 are homozygous up to the centromeric end. A spike in the estimated population recombination rate is reported in the HapMap data in this region of the chromosome, and it coincides with a decay in EHH across multiple HLA-DRB1 haplotypes, including DRB1*0301, DRB1*1602 and DRB1*1501 found in CEHs (Figure 3.29, panel B). This interval also falls between 2 haplotype blocks described earlier (Figure 3.29, panel A).

Table 3.11 Narrowing of Sperm-Mapped Recombination Hotspot Regions Using Population Data
The locations and sizes of the 6 recombination intervals mapped by Cullen et al. 2002 through single-sperm typing are listed in this table. Using population data from the HapMap project and the SNP variation map in this study, the hotspots in these intervals were narrowed. The first three hotspots on this list (in bold) were re-sequenced in recombinant haplotypes in this study.
Columns 1 to 3: Hotspot Segments Identified using STR Markers Typed in Single-Sperm (Cullen et al. 2002). Columns 4 to 5: Refined Locations of Hotspots Using LD Structure and HapMap Population Recombination Rate.

STR markers | Segment location | Segment size (bp) | Refined location | Refined size (bp)
RNG-CA to DPB2A2 | 33,070,852 - 33,172,428 | 101,576 | 33,129,170 - 33,133,471 | 4,301
1-9 to 1-9d | 29,914,298 - 29,995,182 | 80,884 | 29,946,621 - 30,007,656 | 61,035
3-3A to DRA-CA | 32,406,317 - 32,511,466 | 105,149 | 32,446,673 - 32,450,515 | 3,842
G5-11525 to G4-96 | 32,778,014 - 32,813,417 | 35,403 | 32,789,623 - 32,792,322 | 2,699
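The sizes listed in Table 3.11 follow directly from the coordinates, and the short check below re-derives them for the rows that are legible here. It is only an arithmetic sanity check; the tuple layout and the function name `size_mismatches` are assumptions of this sketch.

```python
# (STR markers, segment coords, listed size, refined coords, listed size)
rows = [
    ("RNG-CA to DPB2A2", (33_070_852, 33_172_428), 101_576,
     (33_129_170, 33_133_471), 4_301),
    ("1-9 to 1-9d", (29_914_298, 29_995_182), 80_884,
     (29_946_621, 30_007_656), 61_035),
    ("3-3A to DRA-CA", (32_406_317, 32_511_466), 105_149,
     (32_446_673, 32_450_515), 3_842),
    ("G5-11525 to G4-96", (32_778_014, 32_813_417), 35_403,
     (32_789_623, 32_792_322), 2_699),
]


def size_mismatches(table_rows):
    """Return marker names whose listed sizes differ from end - start."""
    bad = []
    for name, segment, segment_size, refined, refined_size in table_rows:
        if (segment[1] - segment[0] != segment_size
                or refined[1] - refined[0] != refined_size):
            bad.append(name)
    return bad


print(size_mismatches(rows))
for name, _, segment_size, _, refined_size in rows:
    # Fold reduction achieved by narrowing with population data.
    print(name, round(segment_size / refined_size, 1))
```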
To accurately determine the location of the haplotype breaks in the telomeric- and centromeric-homozygous samples, 4 overlapping PCR fragments spanning 8.5kb from position 32,907,046 to 32,915,571 were amplified from genomic DNA of these 14 samples. A total of 36 sequencing primers were designed and utilized to tile across the segment in a bidirectional manner. Polymorphisms were identified from the sequence reads, and a locus was tagged as polymorphic only if the alternate allele was seen in at least 2 samples.
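The filtering rule for polymorphic loci (alternate allele seen in at least 2 samples) can be expressed as in the sketch below. The data layout, the function name `polymorphic_sites` and the toy genotypes are assumptions for illustration and do not represent the base-calling software used for the sequence reads.

```python
from collections import defaultdict


def polymorphic_sites(genotypes_by_sample, reference, min_samples=2):
    """Positions where a non-reference allele occurs in >= min_samples samples.

    genotypes_by_sample: dict sample -> dict position -> (allele1, allele2).
    reference: dict position -> reference allele.
    """
    carriers = defaultdict(set)
    for sample, calls in genotypes_by_sample.items():
        for position, alleles in calls.items():
            if any(allele != reference[position] for allele in alleles):
                carriers[position].add(sample)
    return sorted(position for position, samples in carriers.items()
                  if len(samples) >= min_samples)


# Toy data: position 10 varies in two samples, position 20 in only one.
reference = {10: "A", 20: "C"}
calls = {
    "s1": {10: ("A", "G"), 20: ("C", "C")},
    "s2": {10: ("A", "G"), 20: ("C", "T")},
    "s3": {10: ("A", "A"), 20: ("C", "C")},
}
print(polymorphic_sites(calls, reference))
```

Requiring two carriers guards against a single sequencing error being recorded as a polymorphism, at the cost of missing variants private to one sample.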
A total of 40 polymorphic sites were identified across the interval; 2 of these were indels and the remaining 38 were bi-allelic SNPs. The genotypes of these 40 polymorphic sites in the 14 re-sequenced samples are listed in Table 3.12. The results indicate that all the telomeric-homozygous recombinant samples show a crossover after position 32,912,197, while the centromeric-homozygous recombinant samples show signs of crossing-over before position 32,914,099. Each of the 14 samples carries its first heterozygous marker within this 1.9kb window, allowing the definition of a narrow segment in which homozygosity breaks and crossovers are clustered. This crossover window starts within the first exon of the TAP2 gene and ends within its second intron, raising the possibility that the open reading frame may be affected by the crossover (Figure 3.30).
Figure 3.29 Recombination Hotspot in the TAP2 Locus
From the SNP variation map, several indicators point to a 1294bp interval in the TAP2 vicinity as a recombination hotspot.
Panel A: The LD heatmap shows that the locus falls between haplotype blocks defined earlier.
Panel B: EHH plot across the interval. Several DRB1 haplotypes show a decay of EHH across this locus. There is also a spike in the population recombination rate estimated in the HapMap project (black vertical lines). A crossover location mapped by recombinant sperm typing (Jeffreys et al. 2000) falls within this interval and is indicated by a downward red triangle.
Panel C: 14 individuals that carry recombinant haplotypes across this interval are shown in this panel. Both haplotypes of each individual are shown, together with the linked DRB1 allele. 6 samples are seen to be homozygous leading to the telomeric edge of a SNP at position 32,912,195, and the homozygosity breaks after the locus. Similarly, 8 samples are homozygous centromeric to position 32,913,027, with homozygous haplotypes breaking across the locus. This establishes a window across which multiple crossover events occurred. The TAP2 locus is centromeric to HLA-DRB1, and the DRB1 alleles of these samples are provided as a reference.
Using a pooled-sperm approach and selective PCR for amplifying recombinant haplotypes, Jeffreys and colleagues precisely mapped a recombination hotspot to a 1.4kb window between positions 32,912,006 and 32,913,385, centring in the 2nd intron of the TAP2 gene (Jeffreys et al. 2000). This sperm recombination hotspot overlaps with the crossover cluster seen here in recombinant samples, with the approximate centre of the sperm recombination hotspot (32,912,695) lying within this crossover window. The agreement between these 2 sets of data lends support to the strategy employed here of using population SNP variation data as a guide for fine mapping crossover clusters.
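The agreement between the two datasets can be checked with simple interval arithmetic on the coordinates quoted above. The sketch below only restates those numbers; the variable names are chosen for this example.

```python
# Crossover window from re-sequencing of recombinant samples (this study).
crossover_window = (32_912_197, 32_914_099)
# Sperm-mapped hotspot (Jeffreys et al. 2000).
sperm_hotspot = (32_912_006, 32_913_385)

window_length = crossover_window[1] - crossover_window[0]
hotspot_length = sperm_hotspot[1] - sperm_hotspot[0]
hotspot_centre = (sperm_hotspot[0] + sperm_hotspot[1]) / 2

# Length of the region shared by both intervals (0 if they do not overlap).
shared = max(0, min(crossover_window[1], sperm_hotspot[1])
             - max(crossover_window[0], sperm_hotspot[0]))
centre_inside = crossover_window[0] <= hotspot_centre <= crossover_window[1]

print(window_length, hotspot_length, hotspot_centre, shared, centre_inside)
```

The window length of 1,902bp and the hotspot length of 1,379bp correspond to the 1.9kb and 1.4kb figures quoted in the text, and the hotspot centre falls inside the crossover window.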
The sequence within the 1.9kb crossover window is shown in Figure 3.31. The telomeric end of the window contains a full-length MER7A, a DNA transposon. The centromeric end of the window contains the 2 TAP2 exons. There are also 18 motifs in this sequence that match 6- to 9bp motifs found over-represented in recombination hotspots (Myers et al. 2005), with most of these motifs residing in the exon of TAP2.
Figure 3.30 Genomic Location of the TAP2 Hotspot
This figure illustrates the location of the crossover window identified through re-sequencing of individuals carrying recombinant haplotypes. The crossover window is located within the TAP2 gene locus and is marked out by the pink silhouette. The exon-intron structure of the TAP2 gene is shown as orange blocks at the top of the figure, and grey tracks are used to mark out locations of repeat elements in the region.
The genotypes of the 14 re-sequenced samples are indicated at the bottom, with polymorphic sites drawn as circles. Blue and red circles indicate that the individual is homozygous and heterozygous at that location respectively. Larger circles indicate SNPs genotyped in the SNP panel, while smaller circles are additional polymorphic loci uncovered via re-sequencing.
The centre of the recombination hotspot identified in recombinant sperm (Jeffreys et al. 2000) lies within this crossover window (red arrow), supporting the strategy used to select SNP intervals and recombinant haplotypes for re-sequencing.
Figure 3.31 Sequence of the Crossover Window within the TAP2 Recombination Hotspot
The 1902bp sequence inside the TAP2 crossover window is detailed in this figure. Polymorphic sites uncovered through re-sequencing are shown as highlighted orange residues.
The approximate centre of the sperm recombination hotspot mapped by Jeffreys et al. 2000 is indicated in pink highlights.
Two DNA transposons lie within this window, marked out by black borders. This window also overlaps with 2 TAP2 exons, identified by blue borders.
Eighteen 6- to 9bp motifs in this sequence coincide with those found over-represented in recombination hotspots (Myers et al. 2005).
Annotated features: MER7A (DNA transposon), MER96 (DNA transposon), TAP2 exon 1, TAP2 exon 2.
3.5.2 Recombination Hotspot Telomeric to the HLA-F Locus
A 45kb segment flanked by 2 STR markers (RF and MOG-CA) was identified as a recombination hotspot region in single-sperm typing experiments, with a recombination rate 2.4 times greater than expected (Cullen et al. 2002). From the SNP variation map generated in this study, several factors point to an 826bp interval (between positions 29,791,787 and 29,792,613) within this segment as a crossover interval. First, the population recombination rate estimated in the HapMap project is seen to be elevated here, and this interval falls outside of the haplotype block boundaries defined earlier (Figure 3.32, panels A and B). Haplotype breaks are also seen to occur here; most notably, the ancestral A2 haplotypes start differing across this interval, as discussed in a previous section of this thesis (Figure 3.20). This haplotype break is further supported by homozygosity drops in the EHH plots (Figure 3.32, panel B). There were 7 recombinant samples that were homozygous in the SNP map telomeric to this interval and 12 recombinant samples that were homozygous centromeric to it. All 12 centromeric-homozygous samples carried A2 alleles on both chromosomes (Figure 3.32, panel C), reiterating that this interval marks a common breakpoint in A2 chromosomes. These 19 samples were re-sequenced across this interval to refine the crossover location.
To re-sequence across this window in these samples, 3 PCR fragments tiling across positions 29,790,747 to 29,798,776 were amplified in the 19 samples and sequenced using 24 primers generating overlapping reads on both strands. Based on coordinates from the reference human genome assembly (NCBI build 36, which uses the PGF haplotype as the reference across the MHC), the segment sequenced should be 8kb in total. However, the PCR product lengths did not match the expected sizes, and further inspection revealed that the reference PGF haplotype carries a large 3kb SVA retrotransposon from positions 29,793,009 to 29,796,033. This SVA element was not detected in any of the 19 samples and is also not found in the other 2 well-annotated full-length MHC haplotypes, COX and QBL (Traherne et al. 2006). The 3 PCR fragments therefore covered only 5kb of unique sequence; however, in keeping with convention, the coordinates reported here are based on the reference (PGF) sequence.
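The discrepancy between expected and observed product sizes follows from subtracting the SVA element from the reference span. A minimal sketch of that arithmetic is given below, using the coordinates quoted above; the function name `span_without_insertion` is an assumption of this example.

```python
def span_without_insertion(region, insertion):
    """Length of a reference region once a contained insertion is removed.

    region, insertion: (start, end) coordinates on the reference sequence.
    Lengths are taken as end - start, as elsewhere in this section.
    """
    region_start, region_end = region
    ins_start, ins_end = insertion
    if not (region_start <= ins_start <= ins_end <= region_end):
        raise ValueError("insertion must lie inside the region")
    return (region_end - region_start) - (ins_end - ins_start)


amplified_region = (29_790_747, 29_798_776)  # tiled by the 3 PCR fragments
sva_element = (29_793_009, 29_796_033)       # present in PGF only

reference_span = amplified_region[1] - amplified_region[0]
print(reference_span, span_without_insertion(amplified_region, sva_element))
```

The reference span of about 8kb shrinks to about 5kb in haplotypes that lack the SVA element, which is consistent with the shorter PCR products observed.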
A recombination hotspot region was identified within a 45kb segment from single-sperm typing experiments (Cullen et al. 2002). Using data from the SNP variation map and the HapMap project, this hotspot was narrowed to a SNP interval on the SNP map.
Panel A: The LD heatmap shows this SNP interval falling between haplotype blocks defined earlier.
Panel B: EHH plot across the segment. The position of the hotspot segment mapped by Cullen and co-workers is indicated by the red bar at the top of the chart. EHH values of HLA haplotypes decay across this interval. This interval also coincides with the breaks of HLA-A02 ancestral haplotypes discussed earlier (Section 3.3). There is also a spike in the population recombination rate estimated in the HapMap project (black vertical lines) across the interval.
Panel C: 19 individuals who carry recombinant haplotypes across this interval are listed in this panel, with both SNP haplotypes of each individual shown together with the linked HLA-A allele. 7 samples are seen to be homozygous leading to the telomeric edge of a SNP at position 29,791,787, while 12 samples are homozygous centromeric to position 29,792,613. All of these 12 samples carry A2 alleles at the HLA-A locus. The haplotypes of the recombinant samples narrow the crossover window within this hotspot to 826bp.