
A linkage disequilibrium map of the human major histocompatibility complex in singapore chinese conserved extended haplotypes and ancestral blocks 4


3.2 A High-Resolution Linkage Disequilibrium Map of the MHC

In the preceding section, the low-resolution, first-generation SNP map provided an overview of the linkage disequilibrium patterns of the MHC in the Singaporean Chinese population. From that map, the block-like structure of LD was evident and the conserved extended haplotypes (CEHs) stretching across megabases were described. However, the density of the first-generation SNP map limits the ability to resolve fine-scale recombination patterns in the MHC. Only 31.5% of that map falls within a haplotype block, whereas the recent HapMap publication concluded that the fraction of the genome covered by haplotype blocks is greater than 65% (International HapMap Consortium, 2005).

To delineate the fine-scale recombination patterns, a higher-resolution SNP-variation map of the MHC in the local Chinese population was created. Rapid improvements in SNP genotyping technology, coupled with increased polymorphism data from the International HapMap Project and MHC studies in other populations (e.g. Miretti et al. 2005; de Bakker et al. 2006), facilitated the construction of such a map. The conserved haplotypes established in the previous study were taken into consideration, and HLA-homozygous samples were sourced and included in the sample set. Genotyping these homozygous samples at high resolution provides a high-quality dataset for studying the conserved haplotypes in the local population in detail, and for comparing these CEHs with those reported in other populations. The data reported here will also provide a resource for studying and understanding HLA-disease associations.


3.2.1 High-Resolution SNP Variation Map of the MHC

For constructing a fine-scale variation map of the MHC, 2360 SNPs were genotyped in 284 Singaporean Chinese individuals. The bulk of these samples consisted of 214 randomly selected, unrelated individuals, 77 of whom overlapped with the samples genotyped in the previous map. Another 29 samples were taken from archived B-lymphoblastoid cell lines that were tested and selected for being homozygous at 2 or 3 HLA loci. These samples were representative of the CEHs identified in the previous section. The final 41 samples were taken from 12 parent-offspring families (each with at least both parents and a child). These 12 families provided 48 phase-unambiguous haplotypes that are useful for improving haplotype reconstruction in the unrelated individuals. A breakdown of the 284 samples is shown in Table 3.5 below.

Table 3.5: Composition of Samples Used in Constructing the High-Resolution SNP Variation Map


SNP genotyping was once again performed using the Illumina GoldenGate assay on a BeadArray platform. Of the 2360 SNP positions attempted, 2290 were successfully genotyped. The overall genotyping quality was very high: the locus success rate was over 97%, the call rate was over 99% and the reproducibility was higher than 99.99%. Of the 284 Chinese samples genotyped, results were not obtained for 6 (all belonging to the unrelated individuals group), giving a sample success rate of 97.9%.

To filter out uninformative and possibly erroneously called genotypes, a series of filters was employed. Only SNPs with a minor allele frequency (MAF) of at least 5% and in Hardy-Weinberg equilibrium (using a p-value threshold of 0.001) were retained. The minor allele frequency and heterozygosity distributions of the 2290 markers can be seen in Figure 3.11. These charts show that the 2290 SNPs had a uniform MAF distribution, with more SNPs skewing towards the higher end of the heterozygosity [...]
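The two filters described above can be illustrated with a minimal sketch. The text does not state which Hardy-Weinberg test was used, so a one-degree-of-freedom chi-square test is assumed here; the function names and the genotype counts in the usage note are hypothetical.

```python
import math

def maf_and_hwe_p(n_aa, n_ab, n_bb):
    """Return (minor allele frequency, HWE chi-square p-value) for one SNP.

    Assumption: a 1-df chi-square goodness-of-fit test; the source does not
    say whether a chi-square or an exact test was applied.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotype calls")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return min(p, q), p_value

def passes_qc(n_aa, n_ab, n_bb, min_maf=0.05, hwe_alpha=0.001):
    """Keep a SNP only if MAF >= 5% and the HWE p-value is not below 0.001."""
    maf, p_value = maf_and_hwe_p(n_aa, n_ab, n_bb)
    return maf >= min_maf and p_value >= hwe_alpha
```

With these thresholds, a SNP with hypothetical genotype counts of 50/100/50 would be retained, whereas one with no heterozygotes at all (100/0/100) or with a rare minor allele (196/4/0) would be dropped.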


To weed out other potential genotyping errors, SNPs whose genotypes were discordant with pedigree structure in more than one family were also removed.

The locations of the SNPs were re-confirmed by mapping the flanking sequences used in the design of the SNP assays back to the human genome assembly. This resulted in the remapping of 2 SNPs within the MHC: “rs2308655” was remapped from 31,345,141 to 31,430,282, while “rs1611627” was remapped from 29,965,650 to 29,905,761. In both cases the error was in the Illumina annotation, and it was communicated back to Illumina.

In total 1877 markers were retained, establishing a SNP map that covers a 4.91Mb segment of chromosome 6p, from positions 28.97Mb to 33.88Mb. With an average gap of 2.6kb (and a median of 1.6kb) between consecutive SNPs, this map is about 8 times denser than the previous one. Gap intervals range from 18bp to 71kb, with over 88% of the gaps less than 5kb. There were 6 distinct gaps spanning over 25kb, and these are listed in Table 3.6. Two of the largest gaps were within the hyper-variable HLA-DRB (71kb) and RCCX (59kb) loci, which exhibit MHC haplotype-specific lengths and gene content (Dawkins et al. 1999). Individuals carrying different MHC haplotypes may differ in the number of HLA-DRB paralogues as well as in the number of copies of the C4A/C4B genes within the RCCX locus. The other large gaps cover segments that are densely packed with large tracts of repetitive and transposable elements. These gaps most probably reflect difficulties in designing SNP assays in regions with repetitive sequences and variable-length polymorphisms, resulting in the lack of genotype information here.
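The gap statistics quoted above follow directly from the sorted SNP coordinates; for instance, 1877 markers spanning 4.91Mb leave 1876 intervals, or about 2.6kb per interval on average. The sketch below shows one way such a summary could be computed. The function name and the coordinates in the usage note are hypothetical, not taken from the study.

```python
from statistics import mean, median

def gap_summary(positions_bp, small_gap_bp=5_000, large_gap_bp=25_000):
    """Summarise the intervals between consecutive SNP positions (in bp)."""
    ordered = sorted(positions_bp)
    if len(ordered) < 2:
        raise ValueError("at least two SNP positions are required")
    intervals = list(zip(ordered, ordered[1:]))
    gaps = [right - left for left, right in intervals]
    return {
        "mean_gap": mean(gaps),
        "median_gap": median(gaps),
        # Share of gaps shorter than the 'small' cut-off (5kb in the text)
        "fraction_small": sum(g < small_gap_bp for g in gaps) / len(gaps),
        # Flanking positions of every gap above the 'large' cut-off (25kb)
        "large_gaps": [(l, r) for l, r in intervals if r - l > large_gap_bp],
    }
```

For four hypothetical markers at 100, 1,100, 3,100 and 33,100bp, the summary would report a mean gap of 11kb, a median of 2kb, and a single large gap between the last two markers.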


Table 3.6: Gaps Larger than 25kb in the High-Resolution SNP Map

Gap Length (kb) | Position Along Chromosome 6p (Mb) | Description of Loci

The large gaps in this map coincide with regions of complex polymorphism and repeat elements, reflecting the difficulty in designing SNP assays here.

For constructing the LD and haplotype maps of the Singaporean Chinese population, only genotype data from the 208 unrelated individuals (6 samples failed the Illumina genotype assays) were used. As the 29 specifically chosen homozygous cell lines and the 41 family samples were not a random sampling of the local Chinese population, they were not included in constructing the population LD maps. However, genotype information from the HLA-homozygous cell lines is a valuable source of extended haplotypes across the MHC, and these were used in the subsequent analysis of HLA haplotypes and recombination breakpoints. The family-based genotypes were used to reconstruct phase-unambiguous haplotypes, which were subsequently used to improve the haplotype phasing of the unrelated individuals (see Methods).

The allele frequencies for the SNPs in this data set were compared to those reported for the 4 populations genotyped as part of the HapMap project (International HapMap Consortium, 2005). As expected, of the 4 populations, the allele frequencies in the local Chinese show the tightest correlation with those reported in the Beijing Chinese [...] 0.84), reflecting the relatively recent shared ancestry of the 2 ethnic groups. The CHB and JPT datasets are frequently combined in HapMap data releases, but the results here indicate that, when using HapMap data to design informative genotyping panels for the local Chinese population, it is better to consider the CHB data only.

Figure 3.12 Comparing Allele Frequencies with HapMap Panels

Allele frequencies for the 1877 informative SNPs genotyped in the local Chinese population were plotted against the corresponding allele frequencies from each HapMap population, and the Pearson correlation coefficient was calculated. Clockwise from top left: CHB – Han Chinese (Beijing), JPT – Japanese (Tokyo), CEU – Caucasian (CEPH), YRI – African (Yoruba, Nigeria). Data were obtained from HapMap release 22.
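The comparison with each HapMap panel reduces to a Pearson correlation over paired allele frequencies, one pair per SNP. A minimal sketch follows; the function name is hypothetical and the input lists are assumed to be aligned so that position i in both lists refers to the same SNP and the same allele.

```python
import math

def pearson_r(freqs_local, freqs_panel):
    """Pearson correlation between two aligned lists of allele frequencies."""
    if len(freqs_local) != len(freqs_panel) or len(freqs_local) < 2:
        raise ValueError("need two aligned lists with at least two SNPs")
    n = len(freqs_local)
    mean_x = sum(freqs_local) / n
    mean_y = sum(freqs_panel) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(freqs_local, freqs_panel))
    sxx = sum((x - mean_x) ** 2 for x in freqs_local)
    syy = sum((y - mean_y) ** 2 for y in freqs_panel)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant frequencies")
    return sxy / math.sqrt(sxx * syy)
```

Care is needed that both data sets report the frequency of the same allele on the same strand; otherwise a SNP with frequency f in one set appears as 1 − f in the other and weakens the apparent correlation.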


3.2.2 Estimating Coverage of Known Variation in the MHC using the High-Resolution SNP Map

The MHC is known to be the most polymorphic region in the genome, and the 1877 SNPs genotyped in this study are a subset of the known variation here (Horton et al. 2008). The publicly available HapMap data offer the opportunity to assess how effective a proxy this 2.6kb-resolution SNP map is for the other known SNPs in the Chinese population. Having established above that the HapMap Han Chinese data are a good representative of allele frequencies in the local Chinese population, these data were used as a surrogate test set. The Han Chinese genotypes deposited in release 22 of the HapMap cover 9479 SNPs across the MHC, including the 1877 informative SNPs genotyped in this study. To test the efficacy of these 1877 in representing the variation in the remaining HapMap Han Chinese SNPs not genotyped [...] remaining HapMap SNPs were calculated from the HapMap Han Chinese genotypes. The results are plotted in 2 bar charts in Figure 3.13. The panel of SNPs used in this study represents most of the variation in the HapMap Han Chinese population well. Of the 7602 HapMap SNP loci not genotyped in this study, more than half (51.1%) [...] 0.84. Uninformative SNPs make up the bulk of the 341 HapMap SNPs that were poorly [...] poorly represented SNPs have a MAF of less than 5%.
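A coverage estimate of this kind can be sketched as follows: for every SNP outside the genotyped panel, find the panel SNP that predicts it best and record that pairwise value. The sketch assumes r² is computed as the squared correlation of unphased genotype dosages (0, 1 or 2 copies of an allele), which may differ from the calculation actually used. The threshold that defines a poorly represented SNP is left as a required argument because the value used in the study is not recoverable from this text. All names are hypothetical.

```python
def r_squared(dosages_x, dosages_y):
    """Squared Pearson correlation between two aligned genotype-dosage lists."""
    n = len(dosages_x)
    mean_x = sum(dosages_x) / n
    mean_y = sum(dosages_y) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dosages_x, dosages_y))
    sxx = sum((x - mean_x) ** 2 for x in dosages_x)
    syy = sum((y - mean_y) ** 2 for y in dosages_y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)

def best_proxy_r2(panel, untyped):
    """Map each untyped SNP to its highest r² with any SNP in the panel.

    Both arguments are dicts of SNP id -> list of dosages, with the samples
    in the same order in every list.
    """
    return {
        snp: max(r_squared(dosages, ref) for ref in panel.values())
        for snp, dosages in untyped.items()
    }

def fraction_captured(best_r2, threshold):
    """Share of untyped SNPs whose best proxy reaches the given r² threshold."""
    return sum(value >= threshold for value in best_r2.values()) / len(best_r2)
```

In practice the search would normally be restricted to panel SNPs within a fixed window around each untyped SNP, since comparing all pairs across several thousand markers is slow and distant pairs rarely act as useful proxies.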


Interestingly, the distribution of the 341 poorly represented SNPs was not uniform across the MHC. Instead, 125 of them were concentrated within a 900kb segment (32.3Mb to 33.2Mb) defined as the class II region (Horton et al. 2004), while the remaining 216 were scattered across the other 4Mb of this SNP map. Furthermore, the 125 poorly represented SNPs in the class II region had an average minor allele frequency of 8%, compared to an average of 4% for the other 216. This result suggests that although the overall performance of the 1877-SNP map in capturing HapMap variation is very high, some common variation in the class II [...] that LD in the class II region is lower than in the rest of the MHC.

Figure 3.13 Estimating Coverage of Known Variation in the MHC using the High-Resolution SNP Map

[...] between the 1877 SNPs used and the remaining HapMap SNPs within the MHC locus were calculated using the genotype data of the HapMap Han Chinese population. [...] (red portion of bar chart). Only 341 (4.5%) of the 7602 SNPs were poorly represented (defined as [...]

Panel B: Of these poorly represented SNPs, the majority are present at a frequency of less than 5% in the population, and are thus not informative in the Han Chinese population.


3.2.3 Fine-scale Linkage Disequilibrium Patterns of the MHC

Linkage-disequilibrium structure of the MHC was analysed in 2 ways. First, the [...] pairs of SNPs up to 500kb apart. This gives an overview of LD decay across the 4.9Mb SNP map. Second, as recombination is known to occur at preferred ‘hotspots’ and not uniformly across chromosomes, the detailed localised variation of LD over kilobases was resolved by describing the location of haplotype blocks across the MHC.

The MHC can be divided into 5 sub-regions that reflect the clustering of the different classes of HLA genes within it (Horton et al. 2004). To see whether LD patterns differ across these sub-regions, the SNP map was divided accordingly, with LD analysed in each sub-region separately and also across the MHC as a whole. The 5 sub-regions are: extended class I (29.0Mb to 29.8Mb), class I (29.8Mb to 31.6Mb), class III (31.6Mb to 32.3Mb), class II (32.3Mb to 33.2Mb) and extended class II (33.2Mb to 33.9Mb).

In this high-resolution variation map, all SNPs are in high LD with at least one other SNP, as determined by D′. Of the 1877 SNPs, 1872 are in perfect LD with at least one neighbouring SNP (D′=1), while the 5 remaining SNPs have a D′ of at least 0.9 with a [...] pairs, LD is seen to decay with increasing physical distance across the MHC (Figure 3.14). SNP pairs less than 20kb apart have an average D′ of 0.81, and pairs separated by 500kb have an average D′ of 0.32. However, there is a noticeable difference in the rate of decay of LD across the different sub-regions of the MHC. The class I, class III and extended class II segments show a level of LD similar to the MHC average, but SNP pairs in the extended class I region show a lower rate of LD decay with distance, while the opposite is seen for the class II region. Across the extended class I region, SNP pairs less than 20kb apart have an average D′ of 0.91, and pairs separated by 500kb have an average value of 0.36. By contrast, the corresponding values within the class II region are 0.78 and 0.28. This pattern of LD confirms the observation reported in Caucasian MHC haplotypes (Miretti et al. 2005), and is in concordance with the higher LD in the telomeric segment described in the previous section.
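For reference, the two pairwise LD measures used in this section can be computed from the four two-SNP haplotype counts using the standard definitions of D′ and r². The sketch below assumes phased haplotypes are already available; the function name and the counts in the usage note are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D', r²) for two biallelic SNPs from phased haplotype counts.

    n_AB is the number of haplotypes carrying allele A at the first SNP and
    allele B at the second, and so on for the other three combinations.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    if min(p_A, 1 - p_A, p_B, 1 - p_B) == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = n_AB / total - p_A * p_B  # raw disequilibrium coefficient D
    # D is scaled by the largest magnitude it could take given the allele
    # frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d_prime, r2
```

With hypothetical counts of 40, 10, 0 and 50, one of the four haplotypes is absent, so D′ reaches 1 while r² is only about 0.67. This is why D′ suits the detection of historical recombination, whereas r² measures how well one SNP predicts another.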

Figure 3.14 Pairwise Linkage Disequilibrium as a Function of Marker Distance

Average linkage disequilibrium values (r² – left, D′ – right) between all SNP pairs up to 500kb apart are shown as a function of physical distance. Greater physical distance affords more opportunity for recombination, hence the general trend of LD decreasing with increasing marker distance. There is a noticeable spread between LD values in the Extended Class I (blue curves) versus Class II (green curves) segments.
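A decay curve of this type can be obtained by grouping SNP pairs into distance bins and averaging the LD value within each bin. The sketch below is illustrative only: the 20kb bin width is an assumption chosen to match the "less than 20kb" figure quoted in the text, and the function name is hypothetical.

```python
from collections import defaultdict

def mean_ld_by_distance(pairs, bin_width_bp=20_000, max_distance_bp=500_000):
    """Average an LD measure (D' or r²) within physical-distance bins.

    `pairs` is an iterable of (distance_bp, ld_value) tuples, one per SNP
    pair. Returns a dict mapping the start of each bin (in bp) to the mean
    LD value of the pairs that fall in it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance_bp, ld_value in pairs:
        if distance_bp > max_distance_bp:
            continue  # pairs beyond the analysis window are ignored
        bin_start = (distance_bp // bin_width_bp) * bin_width_bp
        sums[bin_start] += ld_value
        counts[bin_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```

Running the same function separately on the pairs from each sub-region would give one curve per sub-region, which is the comparison drawn between the extended class I and class II segments.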


However, these average pairwise LD values mask the local variations seen on a finer scale. Widely spaced SNPs up to 500kb apart can be found in perfect LD (D′=1), while some closely spaced markers less than 1kb apart exist in complete equilibrium (D′=0). The distribution of linkage disequilibrium across this high-resolution SNP map can be described as consecutive runs of SNPs in strong LD interrupted by sudden breakdowns of LD between closely spaced markers, similar to the “block-like” structure of LD described in other parts of the genome (Daly et al. 2001; Dawson et al. 2002; International HapMap Consortium 2005).

To map the structure of the haplotype blocks seen in the local Chinese population, block boundaries were determined using well-established criteria (Gabriel et al. 2002) that define a consecutive run of SNPs with significantly high pairwise D′ as a block. In contrast with the previous first-generation map, this denser SNP map enables more haplotype blocks to be uncovered. Most of the SNPs on this map (1712 out of 1877, or 91%) lie within defined haplotype blocks, and a total of 203 haplotype blocks can be identified across the MHC, covering 3.7Mb of this 4.9Mb map (75.25%). This is similar to the 202 blocks covering 82% of the MHC region reported in an LD map of a Caucasian population (Miretti et al. 2005). The haplotype block coverage also falls within the range of the genome-wide average (67-87%) reported in the HapMap project (International HapMap Consortium, 2005).

The haplotype blocks have an average size of 18.2kb and range from 70bp to 180kb. As seen in the previous lower-resolution SNP map, 2 of the biggest blocks, with sizes of 180kb and 100kb, lie within the extended class I region. This indicates that the blocks identified in the lower-resolution SNP map are robust. There is an average of 7.1 haplotypes per block, which is very similar to that reported in the Caucasian population (18kb average block size, 6.4 haplotypes per block). Haplotype blocks are also segments of low diversity: within a haplotype block, 95% of the total variation in the local Chinese population is represented by an average of 4.4 haplotypes. Furthermore, each haplotype block carries an average of 3.9 common haplotypes (present in more than 5% of the population).
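The two per-block diversity figures above can be derived from the haplotype counts observed within a block. The sketch below assumes "95% of the total variation" means the smallest set of haplotypes, taken from most to least frequent, whose combined frequency reaches 95%; that reading, the function name and the counts in the usage note are all assumptions.

```python
def haplotype_diversity(haplotype_counts, coverage=0.95, common_freq=0.05):
    """Return (haplotypes needed to reach `coverage`, number of common ones).

    `haplotype_counts` maps each distinct haplotype in one block to the
    number of chromosomes carrying it.
    """
    total = sum(haplotype_counts.values())
    freqs = sorted((count / total for count in haplotype_counts.values()),
                   reverse=True)
    cumulative = 0.0
    needed = 0
    for freq in freqs:
        cumulative += freq
        needed += 1
        if cumulative >= coverage - 1e-12:  # tolerance for float rounding
            break
    # 'Common' follows the text: present in more than 5% of chromosomes
    n_common = sum(freq > common_freq for freq in freqs)
    return needed, n_common
```

For a hypothetical block with five haplotypes observed 50, 30, 13, 4 and 3 times, four haplotypes are needed to account for 95% of chromosomes and three of them are common. Averaging these two numbers over all blocks gives summary values of the kind reported above.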

The characteristics of the haplotype blocks, both as an MHC-wide average and when broken down into the 5 sub-regions of the MHC, are detailed in Figure 3.15. The number of haplotype blocks (expressed as a ratio to physical length to account for the different sizes of the MHC sub-regions) was greatest in the class II region, with over 60 blocks per Mb. By contrast, there are 24 blocks per Mb in the extended class I segment, while the MHC average is 41 blocks per Mb. Haplotype blocks in the extended class I region are larger and have higher coverage, averaging 37.1kb in length and extending across 88% of the region. Class II region haplotype blocks are about a third of that size (12.5kb) and cover only 76% of the underlying DNA sequence. This shorter, more fragmented haplotype structure of the class II region appears consistent with the greater number of recombination hotspots discovered there (Cullen et al. 1997; Jeffreys et al. 2001; Cullen et al. 2002). The haplotype block characteristics mirror the pattern of stronger and longer LD in the extended class I region, and weaker LD in the class II segment. The stark contrast between the haplotype blocks within these 2 sub-regions is clearly illustrated in the LD heatmap in Figure 3.16.
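The per-region block statistics are linked by simple arithmetic: block density multiplied by mean block size gives coverage. For the MHC as a whole, 203 blocks averaging 18.2kb cover about 3.7Mb, and 203 blocks over 4.91Mb is about 41 blocks per Mb. A minimal sketch of such a summary follows, assuming blocks are given as non-overlapping (start, end) coordinates in bp; the function name and the example coordinates are hypothetical.

```python
def block_summary(blocks_bp, region_start_bp, region_end_bp):
    """Summarise haplotype blocks within one region.

    Assumes the (start, end) blocks do not overlap and lie inside the region.
    """
    region_length = region_end_bp - region_start_bp
    if region_length <= 0 or not blocks_bp:
        raise ValueError("need a non-empty region and at least one block")
    sizes = [end - start for start, end in blocks_bp]
    covered = sum(sizes)
    return {
        "n_blocks": len(sizes),
        "mean_size_bp": covered / len(sizes),
        "coverage": covered / region_length,         # fraction of the region
        "blocks_per_mb": len(sizes) / (region_length / 1e6),
    }
```

Applying the function to each of the 5 sub-regions in turn, with the boundaries listed in Section 3.2.3, would yield the kind of per-region comparison summarised above.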

Date posted: 14/09/2015, 14:07
