
gene isolation and mapping protocols


DOCUMENT INFORMATION

Basic information

Title: Gene Isolation and Mapping Protocols
Authors: John Valdes, Danilo A. Tagle
Institution: Humana Press Inc
Field: Molecular Biology
Type: Protocols
City: Totowa
Pages: 314
File size: 20.41 MB


1

Gene Mapping Goes from FISH to Surfing the Net

John Valdes and Danilo A. Tagle

1. Introduction

The arrival of the second millennium will usher in unsurpassed information and knowledge of our genetic constitution, and will promise to revolutionize basic research and molecular medicine. The road toward a complete understanding of our genetic makeup is largely the fruit of the Human Genome Project that has initiated, advanced, and made major strides in constructing genetic and physical maps of humans and other model organisms. Already entire genomic sequences of a few prokaryotic organisms have become available, with efforts toward completion of the budding yeast not too far behind. Gene mapping and identification are critical steps in this ambitious undertaking. Unfortunately, the identification of genes, especially those responsible for the vast majority of inherited human disorders, must often proceed without any knowledge of their biochemical functions. To wit, positional cloning (1) has taken center stage toward the initial steps in the molecular characterization of the estimated 100,000 genes in the human genome. This approach has garnered over 60 “diseased” genes thus far, with many more to come as the process becomes more streamlined. Despite the framework of genetic and physical maps of the human genome achieved in the last several years, the efficient and comprehensive isolation of transcribed sequences within large targeted genomic intervals remains a formidable task. The numerous chapters in this book document the creativity and ingenuity of various investigators and laboratories in this global effort. Our aim in this introductory chapter is to give an overview of gene mapping and assess where approaches in gene isolation are headed in the near future.

From: Methods in Molecular Biology, Vol. 68: Gene Isolation and Mapping Protocols

Edited by: J. Boultwood, Humana Press Inc., Totowa, NJ


1.1 Identifying and Defining the Chromosome of Interest

The mapping of a gene that contains disease-causing mutations frequently begins with the assignment of the gene to a single chromosome or to a specific subchromosomal region. Chromosomal gene assignments can be accomplished in several ways. For diseases where a large collection of affected families exists, the gene can be localized by linkage analysis, which involves studying the segregation pattern of the disease phenotype with selected genetic markers within a pedigree. Statistical methods are used to determine the likelihood that the marker and disease are segregating independently. If the chance of independent segregation is <1 in 1000 (an LOD score of 3), then linkage is assumed. Identification of recombinant families using additional polymorphic markers allows further delineation of the linked interval. Linkage analysis has shown widespread success in mapping monogenic disorders that show clear Mendelian inheritance patterns. The same principles are now being applied to polygenic diseases (those that show complex genetic patterns, likely owing to multiple genes and/or environmental factors acting in combination), but this has proven difficult in practice (2). Proposed solutions have included use of standardized ascertainment and the incorporation of interference models (3,4), inclusion of larger sample sizes, or use of genetically homogeneous populations in linkage disequilibrium studies (5).
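The LOD threshold mentioned above can be illustrated with a minimal Python sketch. It assumes the simplest possible case, phase-known and fully informative meioses, where the likelihood at a recombination fraction theta is a binomial term; real pedigrees need dedicated linkage software. The function name and the example counts are hypothetical and are not taken from the chapter.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta.

    Assumption: phase-known, fully informative meioses, so that
    L(theta) = theta**k * (1 - theta)**(n - k), compared against the
    no-linkage likelihood L(0.5) = 0.5**n.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 1 recombinant observed in 20 informative meioses.
z = lod_score(recombinants=1, meioses=20, theta=0.05)
print(f"LOD at theta = 0.05: {z:.2f}")
# A LOD of 3 corresponds to odds of 10**3 = 1000:1 in favor of linkage.
```

In this toy case the score exceeds 3, so linkage would be assumed under the criterion stated in the text.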

Human-rodent somatic cell hybrids (either monochromosomal, regional/deletion, or radiation-reduced mapping panels) provide a convenient resource for mapping of genes by hybridization or polymerase chain reaction (PCR). Hybrid cell lines have also been useful in genetic complementation studies, such as in xeroderma pigmentosum and in Niemann-Pick disease (6). Aside from mapping, radiation hybrids provide additional information about the order and distance of markers/genes (7,8): segments of DNA that are farther apart on a chromosome are more likely to be broken apart by radiation, and thus to segregate independently in the radiation hybrid cells, than if they were closely linked.

Fluorescence in situ hybridization (FISH) is also widely used to determine the chromosomal map location and the relative order of genes and DNA sequences within a chromosomal band. Unlike hybrid panel mapping, where a cDNA clone or PCR primers are all that is needed, larger genomic clones, such as cosmids, are needed when mapping via FISH. However, FISH can readily provide more precise regional mapping than regional or radiation panels. FISH can also detect aneuploidy, gene amplification, and subtle chromosomal rearrangements. Discovery of a patient whose inherited disease has resulted from a visible chromosomal abnormality has often been the “jackpot” that has accelerated efforts to clone the causal gene (9,10). The ability to map by FISH most chromosomal translocations that interrupt or inactivate a gene has tremendous utility in the field of cancer genetics (11), where molecular events leading to the loss of tumor suppressor genes (12) or the generation of fusion genes (13) can often be detected at the chromosome level. Using FISH on normal metaphase spreads, comparative genomic hybridization (CGH) allows total genome assessment of changes in relative copy number (regions of chromosomal loss, gain, or amplification) of DNA sequences using DNA probes derived from tumor cells (14). CGH has the potential to identify previously unknown regions involved in tumorigenesis.
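The radiation hybrid principle described above (markers that are farther apart are more often separated by a radiation-induced break) can be sketched numerically. The Python example below is a rough illustration under an assumed two-point model with equal retention of all fragments, in which the chance of a discordant retention pattern is 2r(1 - r)θ for retention frequency r and breakage frequency θ. The model, the function name, and the retention patterns are assumptions for illustration and do not come from the chapter.

```python
import math


def rh_two_point(marker_a, marker_b):
    """Estimate breakage frequency and distance between two markers.

    marker_a, marker_b: lists of 0/1 flags, one per hybrid cell line,
    recording whether the marker was retained in that hybrid.
    Assumption: equal-retention model, P(discordant) = 2*r*(1-r)*theta.
    """
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("need two non-empty patterns of equal length")
    n = len(marker_a)
    discordant = sum(1 for a, b in zip(marker_a, marker_b) if a != b)
    retention = (sum(marker_a) + sum(marker_b)) / (2 * n)
    if retention in (0.0, 1.0):
        raise ValueError("markers are uninformative (never or always retained)")
    theta = min(1.0, (discordant / n) / (2 * retention * (1 - retention)))
    # Map distance in centiRays; unbounded once breakage is certain.
    distance_cr = float("inf") if theta >= 1.0 else -100 * math.log(1 - theta)
    return theta, distance_cr


# Hypothetical retention patterns across ten hybrids.
a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
theta, distance = rh_two_point(a, b)
print(f"breakage frequency = {theta:.3f}, distance = {distance:.1f} cR")
```

A low breakage frequency suggests the two markers lie close together; values near 1 suggest they behave as unlinked in the panel.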

1.2 Defining and Cloning the Physical Region

Once a genomic interval has been defined for a disease locus, the gene mapping efforts shift toward constructing a physical map of the candidate region, determining accurate distances between markers, and cloning the genomic segment in large insert clones. Physical distances can then be established and correlated with the genetic distance (e.g., if two marker probes hybridize to the same 250-kb fragment, then their maximum distance apart must be 250 kb). Physical distances between genomic markers can be refined with pulsed-field gel electrophoresis (PFGE) and a combination of rare-cutting restriction enzymes. Because the recognition sites of such enzymes occur in GC-rich sequences, the location of CpG islands, which are likely landmarks for expressed genes, can then be determined. The pulsed-field maps also provide a reliable method for verifying the extent of coverage of overlapping clones within a contig in relation to the actual genomic distance. PFGE can also be used to compare patient and normal DNA samples, looking for genomic abnormalities that may have been too small to be detected by cytogenetic techniques (13).

In long-range physical mapping, yeast artificial chromosomes (YACs) are the cloning library of choice because of their larger insert size, which means that fewer markers and clones are required to anchor and assemble the contig (15,16). Where a dense ordered array of markers is available, bacterial artificial chromosomes (BACs), P1s, or even cosmids are preferred for screening despite their smaller insert sizes (120 kb for BACs, 95 kb for P1s, and 40 kb for cosmids) because of the ease of purifying their DNA, their relative stability, and their low frequency of chimerism compared to YAC clones. Genomic clones isolated for the candidate interval are analyzed for insert size and for degree of overlap by marker content mapping using sequence-tagged sites (STSs) and repetitive element fingerprint patterns. The clones or their derivatives can be used as probes for chromosome walking until full coverage of the candidate interval is obtained. More importantly, these genomic clones provide a readily available source of DNA for isolating additional markers, for use as FISH or hybridization probes, for generating sequence data, and for gene identification.
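Marker content mapping, as described above, infers overlap between clones from the STSs they share. The following Python sketch shows one simple way this bookkeeping could be done: two clones are treated as overlapping if they share at least one STS, and contigs are the connected groups of overlapping clones. The clone names, STS names, and the "one shared STS" rule are illustrative assumptions, not a protocol from the chapter.

```python
from itertools import combinations


def sts_overlaps(clones):
    """Return the STSs shared by each pair of clones that overlap.

    clones: dict mapping clone name -> set of STS names found in it.
    """
    overlaps = {}
    for a, b in combinations(sorted(clones), 2):
        shared = clones[a] & clones[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


def assemble_contigs(clones):
    """Group clones into contigs (connected components of the overlap graph)."""
    neighbours = {name: set() for name in clones}
    for a, b in sts_overlaps(clones):
        neighbours[a].add(b)
        neighbours[b].add(a)
    contigs, seen = [], set()
    for start in sorted(clones):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            clone = stack.pop()
            if clone in members:
                continue
            members.add(clone)
            stack.extend(neighbours[clone] - members)
        seen |= members
        contigs.append(sorted(members))
    return contigs


# Hypothetical STS content for four clones.
example = {
    "cloneA": {"STS1", "STS2"},
    "cloneB": {"STS2", "STS3"},
    "cloneC": {"STS3", "STS4"},
    "cloneD": {"STS7"},
}
print(assemble_contigs(example))
```

Here cloneA, cloneB, and cloneC chain into one contig through STS2 and STS3, while cloneD remains unanchored, which would indicate a gap still to be closed by chromosome walking.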


1.3 Gene Isolation

Genetic linkage analysis and physical mapping experiments can often resolve the rough location of a gene to a region of 0.5-1 centimorgan (1 centimorgan being equivalent to a frequency of 1 recombinant/100 meioses), which is approx 1 Mb. Such an interval may contain 30-50 genes, and identifying all the genes in such a region and finding the causative gene for the disorder has been a major bottleneck in most positional cloning projects. The choice of which gene cloning strategies to utilize often depends on the available resources in a given laboratory. The common gene hunting methods can be divided into hybridization-based methods and functional detection of sequences involved in RNA splicing. Exon trapping identifies putative transcribed sequences from genomic clones (often cosmids as starting templates) based on splicing signals present in exon-intron junctions. No assumptions are made regarding the tissue-specific pattern of expression of a given gene or of its level of expression. The targeted exons can be internal (17,18) or directed toward the 3’-terminal exon (19). Numerous labs have applied the method successfully for both gene isolation (20,21) and mapping intron-exon boundaries of known genes (22).

Transcribed sequences in genomic DNA can also be detected either by using labeled cDNAs as hybridization probes on arrayed genomic clones (23) or by the converse, where genomic clones are used as probes against cDNA libraries (24,25). The former approach has taken on numerous permutations where the genomic YAC clones are either immobilized on filters (26,27), biotinylated (28-30), or used in solution hybridization schemes (32-34). These methodologies assume some prior knowledge of the targeted gene’s expression level, since moderately to abundantly expressed messages are those usually obtained, as well as an idea of the proper tissue source of library to screen. Because the techniques are hybridization-based, problems with sticky or GC-rich cDNAs, repeat sequences, pseudogenes, and related gene family members frequently accompany the final product.

None of the aforementioned methodologies is expected to garner full-length clones. The end points using these techniques are for the most part small exons or cDNA fragments that can then serve as additional expressed sequence tagged sites (ESTs) or probes for isolating larger clones.

Other gene cloning strategies take advantage of certain features in the genomic DNA or transcript. One such feature is CpG islands, areas of the human genome where the CpG dinucleotide is enriched (10-20 times greater than other regions). CpG islands tend to be associated with the 5’-ends of genes and can therefore provide a means of isolating those genes. A recent survey of 375 genes in the GenBank database demonstrated that almost all housekeeping genes, and about 40% of tissue-specific genes, are associated with these islands (35). These islands can be isolated by rare-cutting enzymes (36-38) or by PCR (39), and used as hybridization probes against cDNA libraries. Another feature is the differential expression pattern of genes in certain tissues. Subtraction techniques (40,41) have been used to isolate genes specific to one particular tissue source or developmental stage. This technique involves the use of a target cDNA library (derived from a tissue where the desired gene is likely to be expressed) and a driver cDNA library to subtract out most ubiquitously expressed sequences. Differential display (42-44) is another method for isolating genes that are unique to a particular cell type or developmental stage, and it allows the analysis of expression patterns of multiple cell types.
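The CpG enrichment that defines the islands described above can be quantified from sequence alone. The Python sketch below computes the GC content and the observed/expected CpG ratio of a sequence. The ratio formula (number of CpG × length / (number of C × number of G)) and the commonly quoted screening thresholds in the comments (GC content above 0.5, ratio above 0.6) are widely used conventions assumed here for illustration; they are not stated in the chapter, and the test sequences are made up.

```python
def cpg_stats(sequence: str):
    """Return (gc_content, observed/expected CpG ratio) for a DNA sequence.

    Assumed convention: expected CpG count = (#C * #G) / length.
    """
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c_count + g_count) / n
    if c_count == 0 or g_count == 0:
        return gc_content, 0.0
    return gc_content, cpg_count * n / (c_count * g_count)


def looks_like_cpg_island(sequence: str, min_gc=0.5, min_ratio=0.6) -> bool:
    """Crude screen using assumed thresholds; real criteria also require a minimum length."""
    gc_content, ratio = cpg_stats(sequence)
    return gc_content > min_gc and ratio > min_ratio


# Made-up sequences for illustration.
print(cpg_stats("CGCGCGCG"))
print(looks_like_cpg_island("ATATATATAT"))
```

A window-based scan over a cosmid or YAC sequence using such a function would flag candidate islands, which the text notes tend to mark the 5’-ends of genes.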

A third feature takes advantage of mutants in model organisms whose phenotype resembles that in humans. The mouse genome (as well as that of other organisms) is also being investigated as part of the Human Genome Project. Mouse genetic studies are able to take advantage of selective breeding, short generation times, and backcrosses (mating between two mice, one of which is homozygous for a recessive trait, in order to establish the genotype of the first). One possible approach to mapping a gene is to isolate the mouse homolog, determine its genetic localization within the mouse genome, and then focus efforts on the part of the human genome to which it corresponds. Comparative mapping between the mouse and human is fairly well defined: the entire genome can be separated into 68 homologous chromosomal regions (45). The observation and characterization of naturally occurring mouse mutants have also supplied model systems (46) and have accelerated the search for human disease genes (45).

1.4 Future Directions

There is no doubt that the number of genes being cloned by positional cloning approaches is increasing at a rapid rate (5). Most of these genes have been obtained using the methodologies outlined in this chapter. However, newer resources being made accessible through the Human Genome Project promise to accelerate gene mapping and isolation even further. With the increasing resolution of the chromosome physical maps, it is now feasible to embark on large-scale genomic sequencing (47). This has become possible not through significant improvement in sequencing methodology, but through a combination of faster computational machines to store and analyze the data, ready availability of sequence-ready cosmid clones and their derivatives, and dense mapping information to help minimize overlap of cosmid templates. Large-scale sequencing of genomic clones has been completed for a number of prokaryotic organisms (48,49) and implemented for diseased loci (50) as an additional gene searching tool. Sequences are queried against the sequence databases and fed to the Gene Recognition and Analysis Internet Link (GRAIL) server for exon prediction through computational analysis of the sequence (51,52).

Another critical development is the concerted effort to develop a transcript map of the human genome that involves sequencing of human cDNA clones by the Washington University Genome Sequencing Center under the auspices of Merck (Whitehouse, NJ) (53). The centerpieces of this undertaking are the oligo(dT)-primed, directionally cloned, and normalized cDNA clones from various tissue sources (54,55). Concomitant with the sequencing are efforts to develop these sequences into gene-based STSs and place them on the physical map via YACs (56,57) and radiation hybrid maps. Although attempted in the past on a limited scale, it is projected that this endeavor will generate approx 400,000 ESTs by early this year (53). The sequences, mapping information, and homology results are easily accessible via World Wide Web servers on the Internet. As the number of mapped cDNAs increases, these ESTs automatically become candidate genes if they happen to fall in an interval linked to a disease locus. The tremendous potential of this resource can be gleaned from recent statistics obtained by the National Center for Biotechnology Information at the National Institutes of Health: 79% of positionally cloned genes are actually represented in the EST database (dbEST at http://www.ncbi.nlm.nih.gov/dbEST/index.html). Positional cloning will soon be simplified to a positional candidate approach, where linkage of a particular monogenic or polygenic disorder to a particular chromosomal subregion will be followed by a survey of the interval for any interesting ESTs (5).
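The positional candidate approach described above amounts to intersecting a linked interval with a list of mapped ESTs. A minimal Python sketch of that survey step follows. The record layout, the EST names, and the map positions are all hypothetical placeholders; real transcript-map data would come from resources such as dbEST and the associated mapping databases, whose formats are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedEST:
    """A hypothetical mapped EST record (fields are illustrative only)."""
    name: str
    chromosome: str
    position_mb: float


def positional_candidates(ests, chromosome, start_mb, end_mb):
    """Return ESTs that fall inside the linked interval, ordered by position."""
    low, high = sorted((start_mb, end_mb))
    hits = [
        est for est in ests
        if est.chromosome == chromosome and low <= est.position_mb <= high
    ]
    return sorted(hits, key=lambda est: est.position_mb)


# Made-up transcript map entries and a made-up linked interval on chromosome 17.
transcript_map = [
    MappedEST("EST_0001", "17", 41.2),
    MappedEST("EST_0002", "17", 38.9),
    MappedEST("EST_0003", "11", 41.0),
    MappedEST("EST_0004", "17", 55.0),
]
for est in positional_candidates(transcript_map, "17", 38.0, 42.0):
    print(est.name, est.position_mb)
```

Each EST returned becomes a candidate gene to be evaluated for mutations in affected individuals.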

References

by transfer of a human chromosome 18. Hum. Genet. 92, 157-162.

7. Walter, M. A. and Goodfellow, P. N. (1993) Radiation hybrids: irradiation and fusion gene transfer. Trends Genet. 9, 352-356.

8. James, M. R., Richard, C. W., III, Schott, J. J., Yousry, C., Clark, K., Bell, J., Terwilliger, J. D., Hazan, J., Dubay, C., Vignal, A., Agrapart, M., Imai, T., Nakamura, Y., Polymeropoulos, M., Weissenbach, J., Cox, D. R., and Lathrop, G. M. (1994) A radiation hybrid map of 506 STS markers spanning human chromosome 11. Nat. Genet. 8, 70-76.

9. Black, G. and Redmond, R. M. (1994) The molecular biology of Norrie’s disease. Eye 8, 491-496.

10. Chotai, K. A., Brueton, L. A., van Herwerden, L., Garrett, C., Hinkel, G. K., Schinzel, A., Mueller, R. F., Speleman, F., and Winter, R. M. (1994) Six cases of 7p deletion: clinical, cytogenetic and molecular studies. Am. J. Med. Genet. 51, 270-276.

11. Cohen, M. M., Rosenblum-Vos, L. S., and Prabhakar, G. (1993) Human cytogenetics. Am. J. Dis. Child. 147, 1159-1166.

12. Johansson, B., Mertens, F., and Mitelman, F. (1993) Cytogenetic deletion maps of hematologic neoplasms: circumstantial evidence for tumor suppressor loci. Genes Chromosomes Cancer 8, 205-218.

13. Liu, P., Tarle, S. A., Hajra, A., Claxton, D. F., Marlton, P., Freedman, M., Siciliano, M. J., and Collins, F. S. (1993) Fusion between transcription factor CBF beta/PEBP2 beta and a myosin heavy chain in acute myeloid leukemia. Science 261, 1041-1044.

14. Kallioniemi, A., Kallioniemi, O. P., Sudar, D., Rutovitz, D., Gray, J. W., Waldman, F., and Pinkel, D. (1992) Comparative genomic hybridization: a rapid new method for detecting and mapping DNA amplification in tumors. Semin. Cancer Biol. 4, 41-46.

15. Ramsay, M. (1994) Yeast artificial chromosome cloning. Mol. Biotechnol. 2, 181-201.

16. Khristich, J. V., Bailis, J., Diggle, K., Rodkins, A., Romo, A., Quackenbush, J., and Evans, G. A. (1994) Large-scale screening of yeast artificial chromosome libraries using PCR. BioTechniques 17, 498-501.

17. Duyk, G. M., Kim, S., Myers, R. M., and Cox, D. R. (1990) Exon trapping: a genetic screen to identify candidate transcribed sequences in cloned mammalian genomic DNA. Proc. Natl. Acad. Sci. USA 87, 8995-8999.

18. Buckler, A. J., Chang, D. D., Graw, S. L., Brook, J. D., Haber, D. A., Sharp, P. A., and Housman, D. E. (1991) Exon amplification: a strategy to isolate mammalian genes based on RNA splicing. Proc. Natl. Acad. Sci. USA 88, 4005-4009.

19. Krizman, D. B. and Berget, S. M. (1993) Efficient selection of 3’ terminal exons from vertebrate DNA. Nucleic Acids Res. 21, 5198-5202.

20. Abel, K. J., Castilla, L. H., Buckler, A. J., Couch, F. J., Ho, P., Schaefer, I., Chandrasekharappa, S. C., Collins, F. S., and Weber, B. L. (1994) Isolation of gene sequences from the BRCA1 region of chromosome 17q21 by exon amplification, in Identification of Transcribed Sequences (Hochgeschwender, U. and Gardiner, K., eds.), Plenum, New York, pp. 183-189.

21. Andreadis, A., Nisson, P. E., Kosik, K. S., and Watkins, P. C. (1993) The exon trapping assay partly discriminates against alternatively spliced exons. Nucleic Acids Res. 21, 2217-2221.

22. Kwok, J. B., Gardner, E., Warner, J. P., Ponder, B. A., and Mulligan, L. M. (1993) Structural analysis of the human ret proto-oncogene using exon trapping. Oncogene 8, 2575-2582.

23. Hochgeschwender, U., Sutcliffe, J. G., and Brennan, M. B. (1989) Construction and screening of a genomic library specific for mouse chromosome 16. Proc. Natl. Acad. Sci. USA 86, 8482-8486.

24. Wallace, M. R., Marchuk, D. A., Anderson, L. B., Letcher, R., Odeh, H. M., Saulino, A. M., Fountain, J. W., Brereton, A., Nicholson, J., and Mitchell, A. L. (1990) Type 1 neurofibromatosis gene: identification of a large transcript disrupted in three NF1 patients. Science 249, 181-186.

25. Elvin, P., Slynn, G., Black, D., Graham, A., Butler, R., Riley, J., Anand, R., and Markham, A. F. (1990) Isolation of cDNA clones using yeast artificial chromosome probes. Nucleic Acids Res. 18, 3913-3917.

26. Lovett, M., Kere, J., and Hinton, L. M. (1991) Direct selection: a method for the isolation of cDNAs encoded by large genomic regions. Proc. Natl. Acad. Sci. USA 88, 9628-9632.

27. Parimoo, S., Patanjali, S. R., Shukla, H., Chaplin, D. D., and Weissman, S. M. (1991) cDNA selection: efficient PCR approach for the selection of cDNAs encoded in large chromosomal DNA fragments. Proc. Natl. Acad. Sci. USA 88, 9623-9627.

28. Korn, B., Sedlacek, Z., Manca, A., Kioschis, P., Konecki, D., Lehrach, H., and Poustka, A. (1992) A strategy for the selection of transcribed sequences in the Xq28 region. Hum. Mol. Genet. 1, 235-242.

29. Morgan, J. G., Dolganov, G. M., Robbins, S. E., Hinton, L. M., and Lovett, M. (1992) The selective isolation of novel cDNAs encoded by the regions surrounding the human interleukin 4 and 5 genes. Nucleic Acids Res. 20, 5173-5179.

30. Tagle, D. A., Swaroop, M., Lovett, M., and Collins, F. S. (1993) Magnetic bead capture of expressed sequences encoded within large genomic segments. Nature 361, 751-753.

31. Swaroop, A. and Yan, D. (1994) A sandwich-hybridization method for specific and efficient selection of cDNA clones from genomic regions, in Identification of Transcribed Sequences (Hochgeschwender, U. and Gardiner, K., eds.), Plenum, New York, pp. 91-100.

32. Jagadeeswaran, P., Odom, M. W., and Boland, E. J. (1994) Novel strategy for isolating unknown coding sequences from genomic DNA by generating genomic-cDNA chimeras, in Identification of Transcribed Sequences (Hochgeschwender, U. and Gardiner, K., eds.), Plenum, New York, pp. 101-110.

33. Brookes, A. J. (1994) Identifying and directly purifying transcribed elements: coincident sequence cloning, in Identification of Transcribed Sequences (Hochgeschwender, U. and Gardiner, K., eds.), Plenum, New York, pp. 111-122.

34. Hozier, J. C., Davis, L. M., Siebert, P. D., Dietrich, K., and Paterson, M. C. (1994) Finding candidate genes by preparative in situ hybridization, in Identification of Transcribed Sequences (Hochgeschwender, U. and Gardiner, K., eds.), Plenum, New York, pp. 123-138.

35. Larsen, F., Solheim, J., Kristensen, T., Kolsto, A. B., and Prydz, H. (1993) A tight cluster of five unrelated human genes on chromosome 16q22.1. Hum. Mol. Genet.

38. Tribioli, C., Maestrini, E., Bione, S., Tamanini, F., Mancini, M., Sala, C., Torri, G., Rivella, S., and Toniolo, D. (1994) Identification of genes and construction of a transcriptional map in Xq28, in Identification of Transcribed Sequences (Hochgeschwender, U. and Gardiner, K., eds.), Plenum, New York, pp. 5-10.

39. Valdes, J. M., Tagle, D. A., and Collins, F. S. (1994) Island rescue PCR: a rapid and efficient method for isolating transcribed sequences from yeast artificial chromosomes and cosmids. Proc. Natl. Acad. Sci. USA 91, 5377-5381.

40. Swaroop, A., Xu, J., Pawar, H., Jackson, C., Skolnick, C., and Agarwal, N. (1992) A conserved retina-specific gene encodes a basic motif/leucine zipper domain. Proc. Natl. Acad. Sci. USA 89, 266-270.

41. Gratas, C., Herlyn, M., and Becker, D. (1994) Isolation and analysis of novel human melanocyte-specific cDNA clones. DNA Cell Biol. 13, 515-519.

42. Liang, P., Averboukh, L., and Pardee, A. B. (1993) Distribution and cloning of eukaryotic mRNAs by means of differential display: refinements and optimization. Nucleic Acids Res. 21, 3269-3275.

43. Liang, P., Averboukh, L., and Pardee, A. B. (1993) Distribution and cloning of eukaryotic mRNAs by means of differential display: refinements and optimization. Nucleic Acids Res. 21, 3269-3275.

44. Bauer, D., Muller, H., Reich, J., Riedel, H., Ahrenkiel, V., Warthoe, P., and Strauss, M. (1993) Identification of differentially expressed mRNA species by an improved display technique (DDRT-PCR). Nucleic Acids Res. 21, 4272-4280.

45. Delezoide, A. L. and Vekemans, M. (1994) Waardenburg syndrome in man and splotch mutants in the mouse: a paradigm of the usefulness of linkage and synteny homologies in mouse and man for the genetic analysis of human congenital malformations. Biomed. Pharmacother. 48, 335-339.

46. Brown, S. D. (1994) Integrating maps of the mouse genome. Curr. Opin. Genet. Dev. 4, 389-394.

47. Olson, M. V. (1995) A time to sequence. Science 270, 394-396.

48. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, D. R., et al. (1995) Science 269, 496-512.

49. Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., Clayton, R. A., Fleischmann, R. D., et al. (1995) The minimal gene complement of Mycoplasma genitalium. Science 270, 397-403.

50. Brody, L. C., Abel, K. J., Castilla, L. H., Couch, F. J., McKinley, D. R., Yin, G. Y., Ho, P. P., Merajver, S., Chandrasekharappa, S. C., Xu, J., Cole, J. L., Struewing, J. P., Valdes, J. M., Collins, F. S., and Weber, B. L. (1995) Construction of a transcription map surrounding the BRCA1 locus of human chromosome 17. Genomics 25, 238-247.

51. Uberbacher, E. C. and Mural, R. J. (1991) Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc. Natl. Acad. Sci. USA 88, 11,261-11,265.

52. Shah, M. B., Guan, X., Einstein, J. R., Matis, S., Xu, Y., Mural, R. J., and Uberbacher, E. C. (1994) User’s guide to GRAIL and GENQUEST (sequence analysis, gene assembly and sequence comparison systems) e-mail servers and XGRAIL (Version 1.2) and XGENQUEST (Version 1.1) client-server systems. Available by anonymous ftp to arthur.epm.ornl.gov (128.219.9.76) from directory pub/xgrail or pub/xgenQuest as file Manual.grail-genquest.

53. Boguski, M. and Schuler, G. D. (1995) ESTablishing a human transcript map. Nat. Genet. 10, 369-371.

54. Soares, M. B., Bonaldo, M. F., Jelenc, P., Su, L., Lawton, L., and Efstratiadis, A. (1994) Construction and characterization of a normalized cDNA library. Proc. Natl. Acad. Sci. USA 91, 9228-9232.

55. Adams, M. D., Soares, M. B., Kerlavage, A. R., Fields, C., and Venter, J. C. (1993) Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nat. Genet. 4, 373-380.

56. Polymeropoulos, M. H., Xiao, H., Sikela, J. M., Adams, M., Venter, J. C., and Merril, C. R. (1993) Chromosomal distribution of 320 genes from a brain cDNA library. Nat. Genet. 4, 381-386.

57. Berry, R., Stevens, T. J., Walter, N. A., Wilcox, A. S., Rubano, T., Hopkins, J. A., Weber, J., Goold, R., Soares, M. B., and Sikela, J. M. (1995) Gene-based sequence-tagged-sites (STSs) as the basis for a human gene map. Nat. Genet. 10, 415-423.


Linkage Analysis of Genetic Disorders

Eugene W. Taylor, Jianfeng Xu, Ethylin Wang Jabs, and Deborah A. Meyers

1. Introduction

1.1 Definition

Genetic disorders follow a classic Mendelian dominant or recessive single-locus pattern of inheritance or a complex genetic pattern (multiple genes and environmental influences). In general, the complexity arises when the correspondence between genotype and phenotype is not one to one, owing to possible misclassification of phenotype, incomplete and age-dependent penetrance, phenocopies, genetic heterogeneity, and/or oligogenic inheritance. Errors in diagnosis could be the result of variable expression of a disease, with mildly affected individuals being misdiagnosed as unaffected. In the presence of incomplete or age-dependent penetrance, an individual who inherits a predisposing disease allele may not manifest the disease at all, or the chance of manifesting the disease may depend on his or her age. On the other hand, phenocopies are individuals who do not inherit the disease allele but have the disease in question, probably caused by environmental factors and/or other genes. Genetic heterogeneity is a situation where mutations in any one of several genes may result in an identical phenotype. Oligogenic inheritance requires the simultaneous presence of mutations in multiple genes.

1.2 Types of Approaches

The lack of a clear one-to-one relationship between genotype and phenotype makes genetic studies difficult; however, several genetic epidemiology approaches are helpful in determining if there is a genetic component to a complex disorder. These approaches are used to determine whether a disorder is caused by environmental factors, polygenes (several genes affect the disorder, each by a small amount), major genes (one or several major genes involved), or mixed polygenic and major genes (in addition to a major gene, there is still a residual polygenic effect).

1.2.1 Familial Aggregation (Relative Risk)

Although familial aggregation of a disorder could be caused by either common environmental factors within a family or genetic components, it is usually the first hint that a disorder may have a genetic component. In the presence of familial aggregation, the recurrent risk for relatives of an affected person is higher than that of the general population. Often, an accurate estimate of the recurrent risk in relatives and of the population incidence and prevalence is difficult to obtain. Well-designed large-scale epidemiological studies and even longitudinal studies are needed (1).

Relative risk, λ_R, defined as the incidence rate for a relative of an affected person divided by that for the general population, is one measure of familial aggregation. The subscript denotes the type of relative; for example, λ_O and λ_S are the risks to offspring and sibs, respectively. Risch (2) showed that genetic mapping is much easier for traits with high λ_S (for example, λ_S > 10) than for those with low λ_S (for example, λ_S < 2). Again, it may be difficult to obtain accurate estimates of these risks. Risk ratios are very high for Mendelian disorders, because family members of an affected individual may inherit the same gene, whereas the risk in the general population is very low.
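The relative risk defined above is a simple ratio, and a short Python sketch makes the thresholds concrete. The incidence figures used here are invented for illustration only; the cutoffs of 10 and 2 are the example values quoted in the text, and the function names are hypothetical.

```python
def relative_risk(incidence_in_relatives: float, population_incidence: float) -> float:
    """Risk ratio: incidence in a class of relatives divided by population incidence."""
    if population_incidence <= 0:
        raise ValueError("population incidence must be positive")
    if incidence_in_relatives < 0:
        raise ValueError("incidence in relatives cannot be negative")
    return incidence_in_relatives / population_incidence


def mapping_outlook(sib_risk_ratio: float) -> str:
    """Rough reading of the sib risk ratio, using the example cutoffs from the text."""
    if sib_risk_ratio > 10:
        return "high ratio: mapping expected to be comparatively easy"
    if sib_risk_ratio < 2:
        return "low ratio: mapping expected to be difficult"
    return "intermediate ratio"


# Invented figures: 6% of sibs affected versus 0.4% of the general population.
ratio_sibs = relative_risk(0.06, 0.004)
print(round(ratio_sibs, 1), "-", mapping_outlook(ratio_sibs))
```

As the text cautions, the inputs to such a calculation are themselves often hard to estimate accurately.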

1.2.2 Twin Studies

Twins constitute a unique sample design provided by nature, and are an excellent way to match for age and many environmental factors. The goal of twin studies is to compare similarities (correlation coefficients for quantitative traits and concordance rates for qualitative traits) in monozygotic (MZ) and dizygotic (DZ) twins. A large difference in the degree of similarity between MZ and DZ twins suggests a genetic component. For example, in a study of 938 female twin pairs, there was a concordance rate for major depression of 37.3% in MZ twins, compared to a DZ rate of 23.9%, which suggests a genetic etiology (3). Studies of twins raised apart (adopted) can be very useful in partitioning environmental influences.
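The MZ/DZ comparison described above can be expressed in a few lines of Python. The sketch below computes a pairwise concordance rate from counts of concordant and discordant pairs and compares the two rates quoted in the text (37.3% and 23.9%). The pair counts in the first call are invented, and using a simple ratio as the summary is an illustrative choice rather than the analysis used in the cited study.

```python
def concordance_rate(concordant_pairs: int, discordant_pairs: int) -> float:
    """Pairwise concordance: share of affected pairs in which both twins are affected."""
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("no affected pairs to evaluate")
    return concordant_pairs / total


def mz_dz_ratio(mz_rate: float, dz_rate: float) -> float:
    """Ratio of MZ to DZ concordance; values well above 1 hint at a genetic component."""
    if dz_rate <= 0:
        raise ValueError("DZ concordance must be positive")
    return mz_rate / dz_rate


# Invented counts, only to show the rate calculation.
print(concordance_rate(concordant_pairs=3, discordant_pairs=7))

# Rates quoted in the text for major depression in female twin pairs.
print(round(mz_dz_ratio(0.373, 0.239), 2))
```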

1.2.3 Segregation Analysis

Segregation analysis is used in Mendelian disorders to estimate various parameters, such as penetrance, whereas for complex disorders, segregation analysis is a useful tool to identify the mode of inheritance and estimate important parameters. In complex segregation analysis, the fit of various specified models to the observed inheritance pattern in the pedigrees is compared


if the classic LOD score approach is used for the linkage analysis. For example, a complex segregation analysis, using the computer program S.A.G.E. (4), of adjusted log IgE levels (adjusted for age) in a Dutch asthma family study population (5) was performed, since IgE levels, an easily measured quantitative trait, correlate with the presence of asthma. Evidence was obtained for a major gene inherited as a recessive trait, yielding a model that could be used for linkage. Unfortunately, segregation analysis is sensitive to bias in the ascertainment of families. The common ascertainment scheme for linkage analyses, of selecting only pedigrees with multiple affected members, may lead to false evidence of Mendelian inheritance and also to an overestimate of gene frequency and penetrance. However, if families are selected through a single proband, such as in the Dutch asthma study described above, segregation analysis with adjustment for ascertainment is possible. It is often difficult to detect a major gene with relatively small family sizes. Usually only single-locus analysis is performed, and it is difficult to analyze for the presence of multiple distinct loci (6). However, multilocus segregation analysis is especially worth considering if there is a quantitative measure related to disease status that is easily measured.

2 Parametric Linkage Analysis

2.1 Definition

Genetic linkage is used to elucidate the underlying genetic mechanisms for inherited disorders (traits) and to find chromosomal locations for the disease susceptibility genes. The demonstration of linkage is often considered the highest level of statistical “proof” that a disease is the result of a genetic mechanism (7). At present, there are two major categories of genetic linkage analysis: parametric linkage analysis using family pedigree methods, and allele-sharing analysis using relative pairs (especially sib pairs).

Genetic linkage is defined as the violation of Mendel’s law of independent assortment. The law states that the alleles at two chromosome locations (loci) will assort independently and are transmitted to offspring in random combinations. Nonindependent assortment occurs when genetic loci are positioned near each other on the same chromosome (Fig. 1). As the distance between two loci increases, crossovers between the two loci (measured by the recombination fraction) increase, producing new haplotypes (the alleles for a chromosomal region received by an individual from a given parent [8]).


Parametric linkage analysis involves the comparison of likelihoods of observing the segregation pattern of two loci within the pedigree for several specific hypotheses. First, the likelihood of observing the segregation pattern of two loci assuming the null hypothesis of no genetic linkage is calculated, that is, independent assortment between the two loci, or $\theta = 0.5$. Next, the likelihoods for each of several alternative hypotheses (different extents of crossing over, i.e., different recombination fractions) are calculated and compared with the likelihood of the null hypothesis by means of an “odds ratio.” This is commonly done using the LINKAGE computer programs (9). The odds ratio consists of the likelihood of an alternative hypothesis divided by the likelihood of the null hypothesis, and its base 10 logarithm is the LOD score:

$$Z(\theta) = \log_{10}\left[\frac{L(\theta)}{L(\theta = 0.5)}\right]$$

where Z = LOD, $\theta$ = recombination fraction, and L = likelihood of observing the patterns of inheritance at the given $\theta$. For Mendelian disorders, an odds ratio of >1000:1 (a LOD score of >3) is usually considered evidence for linkage (10). Clinical aspects of the disorder being studied, e.g., late age of onset, failure of affected individuals to reproduce, or mode of inheritance, are all factors that make it unlikely that a single family will provide significant evidence for linkage, so multiple small families are often used. To allow summation over pedigrees, the base 10 logarithm of the odds ratio (the LOD score) is reported at different recombination fractions, and the LOD scores are summed over pedigrees, as seen in the example in Table 1. Strong evidence for linkage of the locus for Treacher Collins syndrome and a marker on chromosome 5 (D5S210) was obtained (Table 1).

Table 1
LOD Scores of Families with Treacher Collins Syndrome and Marker D5S210^a

^a Adapted from Jabs et al. (11).

A total LOD score of 8.54 at a recombination fraction of 1% was obtained, suggesting that the disease locus maps very close to this marker (11). Family 7 by itself has a LOD score of >3; the magnitude of the resulting LOD score is affected by family size and the informativeness of a given marker. Markers with a heterozygosity of >0.70 are generally used; this helps increase the power of the study by making more pedigrees informative.
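The summation of LOD scores over pedigrees can be illustrated with a minimal Python sketch. The per-family values below are invented purely for illustration (they are not the Treacher Collins data), and the function names are hypothetical; the only facts taken from the text are that the LOD score is the base 10 logarithm of the odds ratio and that LOD scores at the same recombination fraction are added across families.

```python
import math

def lod_from_odds(odds_ratio):
    """LOD score is the base 10 logarithm of the odds ratio."""
    return math.log10(odds_ratio)

def total_lod(per_family_lods):
    """Sum LOD scores over families at each recombination fraction.

    per_family_lods: list of dicts mapping theta -> LOD for one family.
    Only recombination fractions reported for every family are summed.
    """
    thetas = sorted(set.intersection(*(set(f) for f in per_family_lods)))
    return {t: sum(f[t] for f in per_family_lods) for t in thetas}

# Hypothetical LOD scores for three small families (illustrative only).
families = [
    {0.0: 0.9, 0.1: 0.7, 0.2: 0.4},
    {0.0: 1.2, 0.1: 0.9, 0.2: 0.6},
    {0.0: -0.5, 0.1: 0.3, 0.2: 0.2},
]

totals = total_lod(families)
best_theta = max(totals, key=totals.get)
print(lod_from_odds(1000))      # an odds ratio of 1000:1 corresponds to LOD 3
print(best_theta, totals[best_theta])
```

In this invented example no single family reaches a LOD score of 3, and the third family is negative at a recombination fraction of 0, which is why the summed curve peaks at a nonzero recombination fraction.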

To search the entire genome, markers mapped at 10 cM intervals are often used, resulting in genotyping approximately 350 markers. The density of markers used in such a genome screen will depend on the informativeness of the markers, the structure and number of families, and the mode of inheritance for the disease.
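As a rough sketch of the two quantities mentioned above, the following Python fragment computes marker heterozygosity from allele frequencies and the number of markers needed for a genome screen. The heterozygosity formula (one minus the sum of squared allele frequencies) is the standard expectation under random mating, which is an assumption not stated in the text; the genome length of 3500 cM is simply what 350 markers at 10 cM spacing implies, and the function names are hypothetical.

```python
import math

def heterozygosity(allele_freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), assuming random mating."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

def markers_for_scan(genome_length_cm, spacing_cm):
    """Number of evenly spaced markers needed to cover the genome."""
    return math.ceil(genome_length_cm / spacing_cm)

# A marker with four equally frequent alleles passes the >0.70 guideline;
# a marker with two equally frequent alleles does not.
print(heterozygosity([0.25, 0.25, 0.25, 0.25]))
print(heterozygosity([0.5, 0.5]))

# Genome length of 3500 cM is an assumption implied by the text's figures.
print(markers_for_scan(3500, 10))
```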

If the two loci being studied are both genetic markers, the parametric linkage analysis is straightforward, because the mode of inheritance of a genetic marker is usually codominant and there is a one-to-one relationship between genotype and phenotype. The situation is similar for a simple Mendelian disease locus, because by definition, the disorder is controlled by a major locus with a known mode of inheritance, and it is safe to infer the genotype from the phenotype. There may be rare cases of misdiagnosis, and it may be necessary to estimate the degree of penetrance for unaffected family members. Linkage analysis has been successfully applied to many Mendelian traits. The simplest situation is when unequivocal linkage can be demonstrated in a single large pedigree with a LOD score >3, even though other families may show no linkage (genetic heterogeneity). If linkage cannot be established on the basis of any single pedigree or seen in the total sample of families, one can ask whether a subset of pedigrees collectively shows evidence of linkage. Of course, one cannot simply choose those families with positive LOD scores. Such an ex post selection criterion will always produce a positive LOD score. However, families can be selected on the basis of a priori considerations (for example, different clinical presentations). The admixture test using the computer program HOMOG can be used to test for genetic heterogeneity when the families are not divided into groups based on other criteria, such as clinical differences (8). For small families, it is difficult to estimate accurately the degree of heterogeneity from this type of analysis.

2.2 Problems in Complex Disorders

Parametric linkage analysis may not be useful for a complex disorder, mainly because of the breakdown of the simple relationship between phenotype and genotype, caused by the following:

1. Misdiagnosis: the misdiagnosed affected family members are not susceptibility gene carriers, whereas the misdiagnosed “unaffecteds” actually carry the susceptibility gene;

2. Incomplete penetrance: owing to reduced penetrance, a certain percentage of the unaffected family members are susceptibility gene carriers;

3. Phenocopy: individuals with the disorder are affected by some other mechanism and do not have the susceptibility gene under study (possibly a different gene);

4. Heterogeneity: some affected families have a genetic defect in another locus and thus do not have the susceptibility gene under study; and

5. Oligogenic inheritance: a disease phenotype is the result of several defective genes, either additive or interactive.

Thus, in a given family, the phenotype “affected” may or may not be owing to the specific gene under study. For linkage studies, it is necessary to infer an individual’s genotype for the susceptibility gene from his or her phenotype. The breakdown in the relationship between phenotype and genotype increases the difficulty of finding linkage using parametric linkage analysis (6). These factors affect all methods of linkage analysis of complex disorders, including allele-sharing methods, because they create uncertainty. However, the impact tends to be greater in parametric linkage analysis, where the results are the outcome of two components: a correctly specified model and linkage. As can be seen, the correctly specified model is often difficult to determine for complex disorders.

2.3 Strategies Used in the Analysis of Complex Disorders

Parametric linkage analysis for complex disorders, however, is by no means useless. Understanding these difficulties may help researchers to overcome them, and there are several successful examples, such as early onset breast cancer (12). Several strategies can be considered in the parametric linkage analysis of complex disorders. Overestimating the degree of penetrance can lead to spurious evidence against linkage owing to individuals who inherit a trait-causing allele but are unaffected. “Affected only” parametric linkage analysis is a common practice used to deal with the problem of incomplete and age-dependent penetrance (6). This type of analysis might decrease the effective number of meioses. However, it decreases the possible impact of false recombinants from unaffected family members who are gene carriers. In the case of an obscure phenotype where there may be a relatively high rate of misdiagnosis, various alternative diagnostic schemes can be applied. However, it is then necessary to adjust for the number of disease models used when determining significance. Another approach is first to study a related phenotype for which information on a genetic model may be available. An example of this is total serum IgE levels, a quantitative measure correlated with the presence of asthma (13). After obtaining evidence for a major locus for IgE regulation mapping to 5q, linkage analysis with the asthma phenotype was performed, resulting in evidence for linkage to this same region (14).

Parametric linkage analysis has also been successfully applied to disorders with genetic heterogeneity. If available, a clinical variable, such as age of onset or severity, can be used to subdivide a sample into two groups of pedigrees. Families can thus be selected on the basis of a priori considerations. An example of this approach is provided by the genetic mapping of a gene for early onset breast cancer (BRCA1) to chromosome 17q. Families were added to the linkage analysis in order of their average age of onset, resulting in a LOD score that rose steadily to a peak of 6.0 with the inclusion of families with onset before age 47 and then fell with the addition of late onset pedigrees (12). Notwithstanding these successes, many failed linkage studies may result from the presence of a high degree of heterogeneity. It is usually wise to try to define a clinically homogeneous set of families.

Although several simulation studies have suggested that in a disorder caused by two genes, a single-locus approximation has high power to detect linkage (15), a correctly specified two-locus model can sometimes significantly increase the evidence for linkage. An example is the parametric linkage analysis between the locus for IgE levels and markers on chromosome 5q. A LOD score of 3.0 for marker D5S436 was first reported using a one-locus recessive model in a Dutch asthma family study. After a subsequent segregation analysis suggested that a two-locus recessive model fit the inheritance pattern significantly better than a one-locus recessive model, parametric linkage analysis using the best two-locus model gave a LOD score of 4.6 for the same marker (16).


2.4 Multipoint Mapping

It is possible to combine information from several markers to increase the informativeness of the families. Families that are not informative for a specific marker may be informative for the flanking markers. This method can be used to pinpoint the most likely map location for the disease gene. In Table 2, the multipoint analysis shows that the most likely location for BRCA1 is close to D17S74. As described previously, families with a young age of onset showed the strongest evidence for linkage. Multipoint analysis is sensitive to errors in genotyping and phenotyping, and care must be taken to ensure data integrity. Linkage disequilibrium (a deviation from random occurrence of specific alleles in haplotypes) studies can then be used to refine further the location of the disease gene (8). This approach is especially effective if the families come from an isolated population, thus increasing the possibility of a founder effect.

2.5 Cautions

Misspecification of marker allele frequencies can cause false-positive linkage results, especially in families where many parents are untyped. This is because underestimation of allele frequencies may lead to spurious linkage information. For example, if cousins share a “rare” allele, this suggests the presence of linkage. However, if the grandparents are deceased, they may have been homozygous for the allele in question, and the cousins may actually have inherited different copies of the allele. Thus, it is important to consider the allele frequencies from both population data and the study sample. The other problem is multiple testing, mainly owing to the uncertainty of modes of inheritance. This will inflate the type I error (i.e., false positives) and make LOD score results difficult to interpret (6). Two approaches are very useful in these cases. The first is a computer simulation method in which marker data with no linkage are simulated using the same pedigree information (availability of typed persons) and the same characteristics of the marker (heterozygosity) where the highest LOD score was observed (6). Then, the simulated data are analyzed using the same approaches (number of models tested) that were used in the actual analysis. An empirical significance level is then obtained. The other approach is to adjust the significance level by the number of models tested (3 + log[number of models tested]) (8). It is difficult to determine the exact cut point for significance in complex disorders. On the one hand, it is important to type additional markers in any region with a suggestion of linkage, especially in regions with known candidate genes. On the other hand, it is important to realize that this may be a false-positive result. Replications between studies are very important (17).
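The two corrections for multiple testing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logarithm in the adjusted threshold is taken to be base 10 (consistent with the LOD scale, but not stated explicitly in the text), the simulated maximum LOD scores are invented numbers standing in for the output of a real simulation, and both function names are hypothetical.

```python
import math

def adjusted_lod_threshold(n_models, base_threshold=3.0):
    """Threshold of 3 + log10(number of models tested); base 10 is assumed."""
    return base_threshold + math.log10(n_models)

def empirical_p_value(simulated_max_lods, observed_max_lod):
    """Fraction of unlinked replicates whose maximum LOD score (taken over
    all models tested) reaches or exceeds the observed maximum."""
    hits = sum(1 for z in simulated_max_lods if z >= observed_max_lod)
    return hits / len(simulated_max_lods)

print(adjusted_lod_threshold(1))    # one model: the usual threshold of 3
print(adjusted_lod_threshold(10))   # ten models: threshold rises to 4

# Invented maximum LOD scores from four simulated unlinked replicates.
print(empirical_p_value([0.1, 0.5, 2.0, 3.5], observed_max_lod=2.0))
```

In practice many more replicates than four would be needed for the empirical significance level to be meaningful.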


^a Adapted from Hall et al. (12).

^b Approximate map locations. There is 10% recombination between D17S78 and D17S41, and 6% between D17S41 and D17S74.


Considering the possible problems in complex disorders, especially incomplete and age-dependent penetrance and misdiagnosis, many researchers focus on affected sib-pair methods, although the theory also applies to unaffected sib-pairs. Under the hypothesis of no linkage between a disease-predisposing locus and a marker, the sharing of marker alleles identical by descent (IBD) by affected sib-pairs will be independent of their phenotype and will follow the Mendelian expectation of sharing 0, 1, and 2 alleles IBD with frequencies of 0.25, 0.5, and 0.25. This distribution of sharing marker alleles IBD can also be expressed as a mean proportion of alleles shared IBD, $[0(0.25) + 1(0.5) + 2(0.25)]/2 = 0.5$. If, however, there is linkage between the disease-predisposing locus and a marker, the sharing of marker alleles IBD will deviate from the above distribution. Several statistical methods have been proposed to test this deviation. One of the most powerful methods is the mean test, which tests whether the mean proportion of marker alleles shared IBD is significantly different from 0.5 (19). Table 3 shows a sib-pair analysis based on the mean test for bipolar disorder and markers located on chromosome 18 (20). Increased sharing (>0.5) was observed for several markers, although most were not highly significant.
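A minimal sketch of the mean test in Python follows. It assumes that the IBD status of every affected sib-pair is known without ambiguity and uses a one-sided normal approximation; under the null distribution (0.25, 0.5, 0.25) the proportion of alleles shared IBD has mean 0.5 and variance 0.125 per pair. The counts in the example are invented, and the function name is hypothetical.

```python
import math

def mean_test(n_ibd0, n_ibd1, n_ibd2):
    """Mean test for excess allele sharing in affected sib-pairs.

    Returns (mean proportion IBD, z statistic, one-sided p value),
    assuming IBD status is known exactly for every pair.
    """
    n = n_ibd0 + n_ibd1 + n_ibd2
    mean_share = (0.5 * n_ibd1 + 1.0 * n_ibd2) / n
    null_variance = 0.125          # variance of the IBD proportion under no linkage
    z = (mean_share - 0.5) / math.sqrt(null_variance / n)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return mean_share, z, p_one_sided

# Invented counts: 100 pairs showing excess sharing.
print(mean_test(10, 50, 40))

# Counts that match the Mendelian expectation exactly give z = 0.
print(mean_test(25, 50, 25))
```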

Another newly developed affected sib-pair method is the likelihood method (2,21), in which a LOD score is calculated from the ratio of two likelihoods: the likelihood of the observed marker allele IBD sharing of affected sib-pairs, and the likelihood of sharing IBD under the null hypothesis of no linkage, that is, the Mendelian expectation.

Affected sib-pair allele-sharing methods can also be used to investigate a possible parental origin effect for the disorder. One can look at affected sib-pairs sharing paternal and maternal alleles IBD separately. This may be useful in the presence of imprinting or mitochondrial inheritance. For example, in bipolar disorder, there is evidence for linkage to chromosome 18, and excess sharing is especially pronounced in paternally transmitted alleles (p = 0.004) (20).

Table 3
Results of Affected Sib-Pair Analyses for Bipolar Disorder

Marker    # Pairs    Mean    p value^b

^a Adapted from Stine et al. (20).

^b All p values <0.1 are reported.

3.3 Quantitative Trait

The basis for the allele-sharing method for a quantitative trait is straightforward: siblings that share more alleles IBD at a locus should be more similar in phenotypic measurement than siblings that share fewer alleles. Thus, the squared difference of phenotype values between sibs can be regressed on the sharing of marker alleles IBD (18). There is evidence for linkage if the regression coefficient is significantly negative (i.e., sibs with a small difference tend to share two alleles).

An example of this method is the sib-pair analysis for total IgE levels in the Dutch asthma family study. Significant negative regression coefficients were found for several markers on 5q (5). As previously described, positive LOD scores were obtained for these same markers using the genetic model obtained from the segregation analysis.
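The regression described above can be sketched as an ordinary least-squares slope in Python. The trait values and IBD proportions below are invented for illustration, the function name is hypothetical, and the sketch stops at the slope itself; a real analysis would also test whether the coefficient is significantly negative.

```python
def squared_difference_slope(sib_pairs):
    """Least-squares slope of the squared trait difference on IBD sharing.

    sib_pairs: list of (trait_sib1, trait_sib2, proportion_ibd) tuples,
    where proportion_ibd is 0, 0.5, or 1 for a fully informative marker.
    """
    xs = [pair[2] for pair in sib_pairs]
    ys = [(pair[0] - pair[1]) ** 2 for pair in sib_pairs]
    n = len(sib_pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Invented data: pairs sharing more alleles IBD have more similar trait values.
pairs = [(5.0, 3.0, 0.0), (4.0, 2.5, 0.5), (5.0, 4.0, 1.0)]
slope = squared_difference_slope(pairs)
print(slope)   # a negative slope is the pattern expected under linkage
```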

3.4 Multipoint Sib-Pair Analysis

Most allele-sharing methods are primarily based on studying genetic markers one at a time. Such analyses may be inadequate, since the exact IBD status cannot always be inferred at the marker loci (for example, if parents


3.5 Advantages and Limitations

Allele-sharing methods are nonparametric linkage analyses, that is, they require no prior assumptions about such parameters as mode of inheritance, penetrance, phenocopy rate, and disease allele frequency. In this sense, they are more robust than parametric methods, because they do not depend on as many potentially erroneous model assumptions. Moreover, the problem of trying multiple models and correcting for inflation of the LOD score (as is often required in such cases) is avoided in these approaches, although one must still correct for multiple diagnostic schemes. The trade-off is that allele-sharing methods are often less powerful than a correctly specified linkage model (6). Sib-pair methods are important tools for linkage studies of complex disorders, and are often used for genome screens. In addition to the advantages described above, sib-pairs are relatively easy to ascertain in large numbers, and tend to be more closely matched for age and environment than other relative pairs.

It would, however, be incorrect to conclude that the genetic model of the disease is irrelevant. The fact that a model is not required in the analysis only implies that the model cannot be misspecified. Thus, false-negative or false-positive findings will not be owing to the use of an incorrect model. Instead, the mode of inheritance of the disease influences the power of allele-sharing methods directly. Determining the mode of inheritance for major genes for susceptibility to a complex disorder may provide useful information for understanding the pathophysiology of the disorder. Once evidence for linkage is obtained, more complex modeling, such as two- and three-locus analysis or MOD score analysis (changing the model to maximize the LOD score) (23), may provide further insight into disease mechanisms.

4 Summary

Basic principles and methods of genetic analysis were covered in this chapter. The approaches of linkage analysis for Mendelian or complex disorders can be summarized in the following flowchart (Fig. 2). It is important that the clinical, analytical, and molecular investigators be involved in all steps of the process. Mapping genes for complex disorders is often more difficult than mapping genes for Mendelian disorders, but both may prove to be very important in understanding disease processes and designing new treatments. Practical use of computer programs available for genetic analysis is detailed elsewhere (24).


Fig. 2. Flowchart of linkage analysis. For complex disorders, the questions addressed are whether there is a major gene and whether it is dominant or recessive, and where this major gene lies in the human genome. (A) Parametric approach: is there linkage with DNA markers under a specific genetic model? (B) Allele-sharing approach (sib-pair analyses): is there increased allele sharing for affected relatives (sib pairs) or for relatives with similar phenotype? Both approaches lead to multipoint and fine mapping, with the analysis repeated after typing additional markers in the region.

References

1. Khoury, M. J., Beaty, T. H., and Cohen, H. B. (1993) Fundamentals of Genetic Epidemiology. Oxford University Press, New York.

2. Risch, N. (1990) Linkage strategies for genetically complex traits. II. The power of affected relative pairs. Am. J. Hum. Genet. 46, 229-241.


3. Kendler, K. S., Neale, M. C., Kessler, R. C., Heath, A., and Eaves, L. J. (1993) A longitudinal twin study of 1-year prevalence of major depression in women. Arch. Gen. Psychiatry 50, 843-852.

4. S.A.G.E. (1994) Statistical Analysis for Genetic Epidemiology, Release 2.2. Computer program package available from the Department of Biometry and Genetics, LSU Medical Center, New Orleans, LA.

5. Meyers, D. A., Postma, D. S., Panhuysen, C. I. M., Xu, J., Amelung, P. J., Levitt, R. C., and Bleecker, E. R. (1994) Evidence for a locus regulating total serum IgE levels mapping to chromosome 5. Genomics 23, 464-470.

6. Lander, E. S. and Schork, N. J. (1994) Genetic dissection of complex traits. Science 265, 2037-2048.

7. Elston, R. C. (1981) Segregation analysis. Adv. Human Genet. 11, 63-120.

8. Ott, J. (1992) Analysis of Human Genetic Linkage. Johns Hopkins University Press, Baltimore, MD.

9. Lathrop, G. M., Lalouel, J. M., Julier, C., and Ott, J. (1984) Strategies for multilocus linkage analysis in humans. Proc. Natl. Acad. Sci. USA 81, 3443-3446.

10. Morton, N. E. (1955) Sequential tests for the detection of linkage. Am. J. Hum. Genet. 7, 277-318.

11. Jabs, E. W., Li, X., Coss, C. A., Taylor, E. W., Meyers, D. A., and Weber, J. L. (1991) Mapping the Treacher Collins Syndrome locus to 5q31.3-q33.3. Genomics 11, 193-198.

12. Hall, J. M., Lee, M. K., Newman, B., Morrow, J. E., Anderson, L. A., Huey, B., and King, M. C. (1990) Linkage of early-onset familial breast cancer to chromosome 17q21. Science 250(4988), 1684-1689.

13. Sears, M., Burrows, B., Flannery, E. M., Herbison, G. P., Hewitt, C. J., and Holdaway, M. D. (1991) Relation between airway responsiveness and serum IgE in children with asthma and in apparently normal children. N. Engl. J. Med. 325, 1967-1971.

14. Panhuysen, C. I. M., Levitt, R. C., Postma, D. S., Xu, J., Amelung, P. J., Holroyd, K. J., Altena, R. V., Koeter, G. H., Meyers, D. A., and Bleecker, E. R. (1995) Evidence for a susceptibility locus for asthma mapping to chromosome 5q. J. Invest. Med. 43(Suppl.), 281A.

15. Greenberg, D. A. and Hodge, S. E. (1989) Linkage analysis under “random” and “genetic” reduced penetrance. Genet. Epidemiol. 6, 259-264.

16. Xu, J., Levitt, R. C., Panhuysen, C. I. M., Postma, D. S., Taylor, E. W., Amelung, P. J., Holroyd, K. J., Bleecker, E. R., and Meyers, D. A. (1995) Evidence for two unlinked loci regulating total serum IgE levels. Am. J. Hum. Genet. 57, 425-430.

17. Thomson, G. (1994) Identifying complex disease genes: progress and paradigms


20. Stine, O. C., Xu, J. F., Koskela, R., McMahon, F. J., Gschwend, M., Friddle, C., Clark, C. D., McInnis, M. G., Sampson, S. G., Breschel, T. S., Vishio, E., Riskin, K., Feilotter, H., Chen, E., Shen, S., Folstein, S., Meyers, D. A., Botstein, D., Marr, T. G., and DePaulo, J. R. (1995) Evidence for linkage of bipolar disorder to chromosome 18 with a parent-of-origin effect. Am. J. Hum. Genet. 57, 1384-1394.

21. Holman, P. (1993) Asymptotic properties of affected-sib-pair linkage analysis. Am. J. Hum. Genet. 52, 519-527.

22. Kruglyak, L. and Lander, E. S. (1995) Complete multipoint sib-pair analysis of qualitative and quantitative traits. Am. J. Hum. Genet. 57, 439-454.

23. Hodge, S. E. and Elston, E. R. (1994) Lods, Wrods and Mods: the interpretation of lod scores calculated under different models. Genet. Epidemiol. 11, 329-342.

24. Terwilliger, J. D. and Ott, J. (1994) Handbook of Human Genetic Linkage. Johns Hopkins University Press, Baltimore, MD.


The essence of linkage analysis is a deviation from the Mendelian principle of independent, or random, assortment of gene pairs when transmitted from generation to generation. Two genes are said to be completely linked (see Section 4.1 for definitions of genetic terms) when there is no recombination between them; the same alleles or phenotypes are always transmitted together from generation to generation within a family. Two genes are completely unlinked if they are situated on different chromosomes; in this case, the transmission of alleles within a family will not deviate from the Mendelian principle of independent assortment. However, many gene pairs will be in an intermediate state of incomplete linkage, where there is a consistent and measurable deviation from independent assortment, but also a consistent recombination fraction between them. This recombination fraction is very roughly proportional to the physical distance between the two genes, and it is this principle that forms the basis of genetic linkage mapping.

In the early days of genetics, mapping by linkage analysis was used both to order genes and to estimate the genetic distance between them. Many of the maps of the human genome now available are derived from genetic linkage data. More recently, as the technology available has improved dramatically, physical maps are becoming more common, at least over smaller distances. However, linkage analysis continues to be crucial in the construction of genetic maps when the distances involved are too large to be covered by physical cloning vectors or contigs, when a physical map needs to be integrated with an existing genetic map or vice versa, or when the locus to be localized onto the existing map is detectable only by its phenotype, e.g., a physical or quantitative trait or a disease locus.

From: Methods in Molecular Biology, Vol. 68: Gene Isolation and Mapping Protocols. Edited by J. Boultwood, Humana Press Inc., Totowa, NJ

Linkage analysis in its simplest form is a matter of examining the transmission of alleles of polymorphic genes typed in DNA from fully informative individuals within pedigrees, and counting recombinants and nonrecombinants. Modern linkage analysis has become much more complex, both to maximize the amount of information that can be derived from pedigrees and to infer the likely genotypes of any missing individuals. Likelihood calculations may be used to estimate recombination fractions, the order of loci on the genetic map, evidence of linkage, and other parameters of the genetic model used, such as allele frequencies and penetrances. These calculations are therefore normally carried out using complex computer programs.

In this chapter, it is possible to give only a very brief introduction to the theory and practice of linkage analysis. For a fuller treatment, the reader is strongly recommended to refer to the literature listed in refs. 1-7. The procedures outlined here should be sufficient to allow the construction of a simple genetic map from data similar to those given in the imaginary example outlined below. However, readers should be aware that there are many possible pitfalls in the use of programs for linkage analysis, which may lead to highly misleading results. If a more extensive use of these techniques is needed, the best course of action is to contact a competent statistical geneticist.

This chapter covers the construction of a simple genetic map using polymorphic markers in human DNA from two- or three-generation families (the programs used for experimental crosses are slightly different). It is assumed that raw data from microsatellite or other alleles have already been gathered, possibly using the ABI 377 automatic DNA sequencer or other semiautomatic systems. It also covers the coding of these raw data into numbered alleles, the setting up of computer files suitable for subsequent genetic linkage analysis, and the calculation of gene frequencies. The ordering of several markers, if unknown, into a genetic map, and the calculation of recombination fractions and genetic distances between the markers are then examined. Finally, the placing of a new locus, such as a phenotype or a disease locus, on this known genetic map is covered. This chapter concentrates on the use of the Genetic Analysis System (GAS) package (8), mainly because this is among the most user-friendly linkage analysis packages available, but other options are also covered briefly (see Section 4.3).

1.1 Linkage Analysis-An Example

To illustrate the principles of gene ordering and localization using linkage analysis, an imaginary genetic problem is used throughout this chapter, concerning the localization on a genetic map of a single gene of which one allele codes for red hair. Hair color is normally a complex trait, but serves to illustrate the problem of localizing a phenotype, which may be a disease state, onto a genetic map that may include microsatellites, genes encoding polymorphic proteins, or other polymorphic markers.

Fig. 1. Pedigree of the Smith family, in which the phenotype of red hair is segregating.

In the Smith family, illustrated in Fig. 1, the single gene of which one allele encodes red hair is segregating. Two markers have been typed, D11 and M15. All the grandparents are fortunately available and are homozygous for both markers. Mrs. Smith’s mother was red-haired and her father was brown-haired. Mrs. Smith, who is red-haired and probably heterozygous at the “red” locus, must therefore have inherited allele 2 of D11 and allele 1 of M15 from her mother. Her paternal haplotype carries allele 4 of D11 and allele 5 of M15.

Mr. Smith, who is brown-haired and has two brown-haired parents, has inherited the haplotype carrying allele 3 of D11 and allele 2 of M15 from his father. His maternal haplotype carries allele 1 of D11 and allele 3 of M15.

Mr. and Mrs. Smith have four children, two boys and two girls. One of the boys and one of the girls have red hair. Both of the red-haired children have alleles 2 and 3 at locus D11. The boy also carries alleles 1 and 3 at locus M15, whereas the girl carries alleles 2 and 5. The brown-haired boy carries alleles 1 and 4 at locus D11, and alleles 1 and 2 at M15, whereas the brown-haired girl carries alleles 3 and 4 at D11, and alleles 2 and 5 at locus M15. Thus, the alleles of D11 are cosegregating with the phenotype of red hair (there are more nonrecombinants than recombinants), but the alleles of M15 are segregating more or less randomly (there are equal numbers of recombinants and nonrecombinants).

If we typed several more families, and continued to find no recombinants between locus D11 and “red,” we could conclude that these two loci were very tightly linked, with a LOD score of more than 3.0, the threshold normally accepted for significant linkage. In contrast, locus M15 appears to be unlinked to the gene responsible for red hair in this family.
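The counting argument for the Smith family can be made concrete with a short Python sketch. It uses the standard two-point LOD score for phase-known meioses, which is an assumption in the sense that the chapter itself relies on the GAS package rather than hand calculation. The counts are one reading of the example: Mrs. Smith's four children give no recombinants between D11 and “red” and two recombinants out of four between M15 and “red.” The function name is hypothetical.

```python
import math

def two_point_lod(recombinants, meioses, theta):
    """Two-point LOD score for phase-known meioses, with 0 <= theta <= 0.5.

    Z(theta) = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ]
    """
    nonrecombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        return float("-inf")   # any recombinant rules out complete linkage
    log_likelihood = 0.0
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    if nonrecombinants:
        log_likelihood += nonrecombinants * math.log10(1.0 - theta)
    return log_likelihood - meioses * math.log10(0.5)

# D11 versus "red": 0 recombinants in 4 maternal meioses.
print(two_point_lod(0, 4, 0.0))

# M15 versus "red": 2 recombinants in 4 meioses, evaluated at theta = 0.5.
print(two_point_lod(2, 4, 0.5))

# Fully informative, recombinant-free meioses needed to pass a LOD score of 3.
print(math.ceil(3.0 / math.log10(2.0)))
```

Each recombinant-free meiosis contributes log10(2), about 0.3, so the four Smith children alone give a LOD score of about 1.2 for D11, which is why several more families would be needed to pass the threshold of 3.0.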

The rest of this chapter follows all the practical steps necessary for the localization of the gene “red” with respect to the markers D11 and M15.

3 Method: Linkage Analysis

3.1 Preparation of Linkage-Format Pedigree and Data Files

If you are entering raw size data, e.g., from microsatellites genotyped on the ABI 373 or 377 automatic DNA sequencer, set your GENOTYPER program to produce a table like the following:

of any columns until it is in this format. Now save the file in text format, giving it a name such as “D11.txt”. If you have data from more than one marker and from more than one gel, you can combine all the data together in one large file (but with all the data for each marker together), or you can divide the data into one file per marker.

The allele sizes can be converted into numbered alleles (called “named” alleles in GAS), so that they can be used in subsequent analysis, and the gene frequencies and recombination fractions calculated, using the GAS package. To enter your raw data into GAS, you will need the text file or files you have just created, an initial pedigree file, which does not contain marker data, and a command or “gas” file.
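To show what converting sizes into numbered alleles and counting gene frequencies involves, here is a minimal Python sketch. It is not how GAS performs the conversion, which is described in its manual; the data layout, the sizes, the person labels, and the function names are all hypothetical, and the sizes are assumed to have been rounded to whole base pairs already.

```python
from collections import Counter

def number_alleles(genotypes):
    """Recode allele sizes (in base pairs) as alleles 1, 2, 3, ... by size.

    genotypes: dict mapping person id -> (size1, size2).
    Returns (recoded genotypes, size-to-number lookup table).
    """
    sizes = sorted({size for pair in genotypes.values() for size in pair})
    code = {size: number for number, size in enumerate(sizes, start=1)}
    recoded = {pid: tuple(sorted(code[s] for s in pair))
               for pid, pair in genotypes.items()}
    return recoded, code

def allele_frequencies(recoded, founders):
    """Gene frequencies by simple allele counting in unrelated founders."""
    counts = Counter(a for pid in founders for a in recoded[pid])
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical sizes for one marker in two founders and their child.
raw = {"founder1": (150, 150), "founder2": (154, 154), "child": (154, 150)}
recoded, code = number_alleles(raw)
print(code)
print(recoded["child"])
print(allele_frequencies(recoded, ["founder1", "founder2"]))
```

Counting only founders avoids counting the same inherited allele more than once within a family.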

3.1.1 Preparation of Pedigree File

The GAS-format pedigree file should have six or seven columns, and look like this (the top two rows are not part of the final file, but you may find it helps to have them in place when you are creating the file, and to delete them just before use).

5, and 6 and 7 are the paternal grandparents. The next two columns contain the identification for the father and mother, respectively, of each person, if the father appears in the pedigree in your database. Therefore, the paternal and maternal identification for the father of this pedigree is 6 and 7, respectively, but the mother’s father and mother do not appear in this database, so her paternal and maternal identifications are “x” for unknown. The paternal and maternal identifications of the three children are 1 and 2. The grandparents’ parents do not appear in the pedigree, so their paternal and maternal identifications are marked as “x.”

The fifth column contains the code for sex (m for male and f for female). This completes the requirements for the GAS-format pedigree, but if you wish, you can use a sixth column for affection status (if you are not using an affection locus, this column is blank): y for affected, n for unaffected, and x for unknown. Alternatively, you could put in the affection status as a separate file and merge it into the pedigree file with your allele data (see ref 8 for details). Once these columns are completed, check that every individual in the pedigree is related to at least one other person by means of the paternal and maternal identification columns (i.e., person 122.2 is identified as the mother of person 122.3). If you have missing individuals in your pedigree, such as a missing mother or father, it is worth inserting a “phantom” individual with unknown data. Missing parents or nonconsecutive identification numbers (e.g., children 3, 4, and 6) can cause some linkage programs to crash. Once you have checked the pedigree file, save it in a text format, with a name such as “red.ped”.
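The relationship checks described above can also be scripted before the pedigree is given to a linkage program. The sketch below is only an illustration: it assumes each row holds pedigree, person, father, mother, and sex (the optional affection column is left out), with “x” for an unknown parent, and the function name `check_pedigree` is hypothetical rather than anything provided by GAS. The sexes given to the three children are arbitrary.

```python
from typing import List, Tuple

# Assumed column order: pedigree, person, father, mother, sex.
Row = Tuple[str, str, str, str, str]

def check_pedigree(rows: List[Row]) -> List[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    people = {(ped, pid): sex for ped, pid, _, _, sex in rows}
    linked = set()
    for ped, pid, father, mother, _ in rows:
        for parent, expected_sex, role in ((father, "m", "father"),
                                           (mother, "f", "mother")):
            if parent == "x":
                continue
            if (ped, parent) not in people:
                problems.append(f"{ped}.{pid}: {role} {parent} is not in the pedigree")
                continue
            if people[(ped, parent)] != expected_sex:
                problems.append(f"{ped}.{pid}: {role} {parent} has the wrong sex code")
            linked.add((ped, pid))
            linked.add((ped, parent))
        # Exactly one known parent suggests a phantom individual is needed.
        if (father == "x") != (mother == "x"):
            problems.append(f"{ped}.{pid}: only one parent given; add a phantom parent")
    for ped, pid in people:
        if (ped, pid) not in linked:
            problems.append(f"{ped}.{pid}: not related to anyone else in the pedigree")
    return problems

# The family described in the text: father 1, mother 2, children 3 to 5,
# and paternal grandparents 6 and 7 (pedigree number 122 is illustrative).
family: List[Row] = [
    ("122", "1", "6", "7", "m"),
    ("122", "2", "x", "x", "f"),
    ("122", "3", "1", "2", "f"),
    ("122", "4", "1", "2", "m"),
    ("122", "5", "1", "2", "f"),
    ("122", "6", "x", "x", "m"),
    ("122", "7", "x", "x", "f"),
]

if __name__ == "__main__":
    print(check_pedigree(family))
```

The sketch does not check for nonconsecutive identification numbers, which would need a further pass over each pedigree.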

3.1.2 Preparation of “Gas” File

The next file to write is much smaller than the other two, but it is the most crucial. This is the command file, which has the extension “gas.” It can be written in any word processing program that can save in a text format, or directly in a text editor. There is a very wide range of options that can be used via the command file, which are covered fully in the GAS manual. The following is a basic introduction to the use of gas files in genetic mapping.

To write a command file to read in ABI-generated raw data, for example on the loci D11 and M15 in people who are affected or nonaffected at the locus red-hair, first you can view the distribution of the allele frequencies like this (the remarks after ! explain each line and should not be included in the file):

set outfile = bar;

! the program will write the file “bar.out”, which gives results of the barchart if there is no postscript file set

set logfile = bar;

! the program will write the file “bar.log”, which gives a record of any problems encountered

set psfile = bar;

! the program will write the file “bar.ps”, which gives results of a barchart in graphical format

set locus d11 named nofreq nodata;

! set the numbered locus D11 to be read into the pedigree file; no gene frequencies or other data known

set locus m15 named nofreq nodata;

! set the numbered locus M15 to be read into the pedigree file; no gene frequencies or other data known

read( pedigree red.ped );

! read the gas-format pedigree file

read( alsize d11.txt m15.txt locus d11 m15 graph );

! read in the two files containing raw size data for loci D11 and M15, and display a graphical barchart of the distribution of allele lengths of each allele

stop;

! end of program

Fig. 2. Allele sizes for locus d11.

Save this file as a text file with the extension “gas” (e.g., “red.gas”). Now make sure that all the files you have prepared are in the same directory as the GAS program (gas.exe). To run the gas program, simply type “gas” at the DOS or UNIX prompt, and you will be given a list of all the gas programs available. Your file “red.gas” should be among them, so type “red.gas” and press return.

If the program finds all the right files in the directory, and if the relationships between the individuals in the pedigrees are all written correctly, it will start to read in size data from the two files D11.txt and M15.txt.

When the command file has finished running (which you will know because GAS will send you a friendly little message saying “SUCCESS!”), look at the “bar.ps” file (or the “bar.out” file if your computer cannot display postscript files) to see whether your allele sizes are clustered around a 2- or 3-bp interval, or whether they have a continuous distribution. If the latter is the case, it will be difficult to score your alleles globally (identify one particular allele across different families). It may be worth looking at your experimental procedures to improve the discrimination between different alleles. Figure 2 shows the graphical output for a typical (CA)n repeat. The larger alleles fall neatly into clusters of 2 bp, but the smaller alleles show some overlap between clusters, possibly because of nonoptimal gel running conditions. If it is not possible to improve discrimination between alleles, they may have to be scored “locally” (within families), as explained in the GAS manual.

Once you have decided on the most appropriate dividing points between groups of alleles, such as 2 bp for (CA)n repeats, edit the command file “red.gas” as follows:

set allfile = red;

! the three outfiles are “red.out”, “red.log”, which gives a record of any problems encountered and the decisions taken on the scoring of allele sizes, and “red.ps”

set locus red affection;

! set the affection locus with data found in the pedigree file

set locus d11 named nofreq nodata;

! set the numbered locus D11 to be read into the pedigree file; no gene frequencies or other data known

set locus m15 named nofreq nodata;

! set the numbered locus M15 to be read into the pedigree file; no gene frequencies or other data known

read( pedigree red.ped );

! read the gas-format pedigree file

read( alsize d11.txt m15.txt locus d11 m15 samesize=1.95 diffsize=2.05 global );

! read in the two files containing raw size data for loci D11 and M15, defining alleles in 2-bp intervals, so that alleles more than 2 bp apart are defined as different alleles, and score globally (the same size is the same allele over all the families used)

write( ldata red.dat );

! write a linkage-format data file containing gene frequencies of all the alleles scored

write( lpedigree red.new );

! write a linkage-format pedigree file containing all the allele data scored

stop;

! end of program

As the program goes through the size data, it will be able to score or “bin” most of the alleles, but there will be some that are “ambiguous” or do not fit easily into one or other bin. GAS will ask you for confirmation of where to put these ambiguous alleles: in most cases following the suggested route will be fine, but if the gap between the two sizes given is very close to 2 bp, you may feel safer to follow the prompts to make a separate bin for this odd allele, or to exclude it from the pedigree file. You can fine-tune the number of alleles “included,” “excluded,” and “ambiguous” for each marker by slightly altering the “samesize” and “diffsize” criteria.
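The idea of binning raw fragment sizes and flagging the awkward ones can be sketched outside GAS. The example below is a deliberately simple stand-in and does not reproduce the GAS “samesize” and “diffsize” logic: it assumes a dinucleotide repeat with a 2-bp step, anchors the first bin at the smallest observed size, and treats a size as ambiguous when it lies more than a chosen tolerance from the centre of its bin. The function names, the tolerance of 0.5 bp, and the sizes are all hypothetical.

```python
from typing import List, Optional, Tuple

def bin_allele(size: float, base: float, step: float = 2.0,
               tolerance: float = 0.5) -> Tuple[int, bool]:
    """Return (allele number, is_ambiguous) for one fragment size in bp."""
    index = round((size - base) / step)
    residual = abs(size - (base + index * step))
    # Alleles are numbered from 1; sizes far from a bin centre are flagged.
    return index + 1, residual > tolerance

def bin_sizes(sizes: List[float], base: Optional[float] = None,
              step: float = 2.0, tolerance: float = 0.5) -> List[Tuple[int, bool]]:
    """Bin every size, anchoring bin 1 at the smallest size unless base is given."""
    if base is None:
        base = min(sizes)
    return [bin_allele(s, base, step, tolerance) for s in sizes]

if __name__ == "__main__":
    raw = [100.0, 100.2, 102.1, 104.0, 103.1]  # illustrative sizes in bp
    for size, (allele, ambiguous) in zip(raw, bin_sizes(raw)):
        flag = "ambiguous" if ambiguous else "ok"
        print(f"{size:6.1f} bp -> allele {allele} ({flag})")
```

Anchoring on the smallest size is crude, since one outlying measurement shifts every bin. In practice the anchor would be chosen after inspecting the barchart of allele sizes.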

Once the alleles have been scored, GAS will check that the inheritance of the alleles is consistent, and write you a warning message if there are any cases of non-Mendelian inheritance. In this case, the data will not be written to the new pedigree file, and you will need to check the scoring of alleles and, if necessary, re-enter them manually (see Section 4.2.). Once the program is finished, you can check the file “red.log,” which will contain a record of which allele sizes were entered into which bin.
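The consistency check described above rests on a simple rule for an autosomal marker: a child must carry one allele from each parent. The following is a minimal sketch of that rule for a single fully typed trio. It is not the GAS implementation, the function name is illustrative, the genotypes are invented for the example, and missing genotypes and sex-linked loci are not handled.

```python
from typing import Tuple

Genotype = Tuple[int, int]  # two numbered alleles at one autosomal marker

def is_mendelian(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if the child could have received one allele from each parent."""
    first, second = child
    return ((first in father and second in mother)
            or (second in father and first in mother))

if __name__ == "__main__":
    father, mother = (3, 4), (2, 5)
    for child in [(3, 2), (4, 5), (3, 4)]:
        status = "consistent" if is_mendelian(child, father, mother) else "check scoring"
        print(child, status)
```

A child typed as (3, 4) with these parents would be flagged, because both alleles can only have come from the father. The usual causes are a mis-scored allele, a sample mix-up, or an error in the pedigree.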

3.1.3 Linkage-Format Pedigree and Data Files

You can now look at the linkage-format pedigree file red.new, which contains all your data scored into numbered alleles. The format of the new pedigree file is like this (the top two rows are not included in the file):

The corresponding linkage-format datafile looks like this (the remarks after ! explain each line and are not part of the program; some of these are written in by the GAS preparation program):

! frequency of affection in homozygotes of dominant allele, in heterozygotes, in homozygotes of recessive allele

! this line is rewritten by the locus control program

Both these files are now ready for further processing, and for running a wide variety of GAS or LINKAGE programs (see Section 4.3.), which will help you in ordering and localizing your markers into a genetic map.

3.2 Programs for Linkage Analysis

Version 2.0 of GAS incorporates several programs or “routines” designed for classical linkage analysis. These routines use the Vitesse likelihood engine (9) and have the advantage of being able to analyze up to 8 loci simultaneously and to use highly polymorphic marker alleles without recoding. Although the Vitesse engine is said to be the fastest likelihood engine in existence at the time of writing, it still takes a relatively large amount of computer time to perform these routines in comparison, say, to sib-pair linkage analysis (see Section 4.4.). The routines are not yet able to deal with more complicated problems, such as sex-linked loci; for these you should use the LINKAGE package (see Section 4.3.). However, for the construction of a simple genetic map, they are generally much easier to use than the traditional programs.

3.2.1 Ordering of Loci and Calculation of Recombination Fractions

If you do not know the order of your loci, you can use the “lik2point” routine in GAS to find the most likely order and to calculate the maximum likelihood recombination fraction between each of the pairs of loci used. The same routine will estimate the maximum LOD scores between the loci and, if needed, the support levels for each.
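To see how pairwise recombination fractions constrain locus order, the sketch below ranks every possible order by the sum of adjacent recombination fractions and keeps the smallest. This is a simple heuristic swapped in for illustration; it is not the likelihood calculation that “lik2point” performs. The locus names A, B, and C, the function name, and the recombination fractions are invented for the example.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Sequence, Tuple

def order_by_sarf(loci: Sequence[str],
                  theta: Dict[FrozenSet[str], float]) -> Tuple[str, ...]:
    """Return the locus order with the smallest sum of adjacent recombination fractions."""
    def sarf(order: Tuple[str, ...]) -> float:
        return sum(theta[frozenset(pair)] for pair in zip(order, order[1:]))
    # An order and its reverse describe the same map, so keep one of each pair.
    candidates = [o for o in permutations(loci) if o[0] < o[-1]]
    return min(candidates, key=sarf)

# Illustrative pairwise recombination fractions for three hypothetical loci.
pairwise = {
    frozenset(("A", "B")): 0.10,
    frozenset(("B", "C")): 0.15,
    frozenset(("A", "C")): 0.22,
}

if __name__ == "__main__":
    print(order_by_sarf(["A", "B", "C"], pairwise))
```

With three loci this amounts to placing the two loci with the largest pairwise recombination fraction at the ends of the map. The number of orders grows factorially, so exhaustive enumeration is only practical for a handful of loci.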


A typical “gas” file using “lik2point” and the problem described above would be:

set allfile = theta;

! write results to files theta.out, theta.log, and theta.ps

read( ldata red.dat );

! read the linkage-format data file containing all the details of the loci used

read( lpedigree red.new );

! read the linkage-format pedigree file containing all the allele data

program;

call lik2point( locus red D11 M15 allorders );

! calculate the most probable recombination fractions among the loci red, D11, and M15, in all possible orders

stop;

! end of program

When the program has stopped running (which may take some time), you can read the file “theta.out” to see your recombination values and LOD scores:

3.2.2 Localization of an Unknown Locus on a Fixed Map

To localize one unknown locus on a fixed map of markers (either the map constructed using “lik2point” or a physical map), the routine “likmap” is used. This program calculates likelihoods for the entire map with the unknown locus at one position after another, indicating via LOD scores the most probable localization for the unknown locus (called “movable”), in relation to the known loci (called “fixed”). It is important to realize that “likmap” or any other parametric multipoint linkage program (in other words, one where you have to specify the mode of inheritance) should not be used if the mode of inheritance is unknown or too complex to be exactly specified in the datafiles using penetrance and liability classes. This is because inaccuracies in the specified mode of inheritance tend to drive the placing of the unknown locus toward the edge of the fixed map. Therefore, if the inheritance of your unknown locus is likely to be complex, it may be better to use one of the affected relative methods described briefly in Section 4.4., if you have access to this sort of family material.

Multipoint linkage analysis requires a considerable amount of computer time; therefore, if more than eight fixed loci are used, it is essential that the loci are analyzed only a few at a time or the program may crash Using the LINKAGE package (see Section 4.3.), each subset must be set manually, but GAS uses the “dosets” parameter to do this automatically The default value for “dosets” is four loci to be analyzed at a time The gas file used might look like this:

set allfile = likmap;

read( ldata red.dat );

read( lpedigree red.ped );

program;

call likmap( locus d11 m15 d12 d13 d14 g40 g41 g42

! these are the eight fixed loci

LIKELIHOOD MAP for locus “red”

Distances calculated in M using haldane map

is set at the left-hand edge of the map, beyond the locus D11

Position

d11 m15 d12 d13

0.000601 0.0004 -0.520
0.000801 0.0002 -0.539
0.001 -0 -0.568

! the map position of the locus (from the left-hand side) is 0

! the LOD score is -0.541 at theta = 0.805 between

Fig. 3. Likmap analysis for locus red (horizontal axis: genetic distance, Haldane).
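The likmap output reports distances in Morgans under the Haldane map function, which assumes no crossover interference and relates a recombination fraction theta to a map distance d by d = -0.5 ln(1 - 2 theta). The sketch below shows the conversion in both directions. The function names are illustrative and are not GAS routines.

```python
import math

def haldane_distance(theta: float) -> float:
    """Map distance in Morgans for a recombination fraction 0 <= theta < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must satisfy 0 <= theta < 0.5")
    return -0.5 * math.log(1.0 - 2.0 * theta)

def haldane_theta(distance: float) -> float:
    """Recombination fraction for a map distance in Morgans (distance >= 0)."""
    if distance < 0.0:
        raise ValueError("distance must not be negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance))

if __name__ == "__main__":
    for theta in (0.01, 0.10, 0.30):
        morgans = haldane_distance(theta)
        print(f"theta = {theta:.2f} -> {100 * morgans:.1f} cM")
```

For small recombination fractions the distance in Morgans is close to theta itself. As the distance grows, theta approaches but never reaches 0.5, which is why widely separated loci on the same chromosome behave as if unlinked.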

4 Notes

4.1 Genetic Analysis-Some Definitions

The following is a brief definition of genetic terms used in this chapter

1. Gene frequencies: The relative frequencies in the population of the different states or alleles of a gene or marker.

2. Polymorphism: A gene or marker is defined as polymorphic if its alleles occur so frequently that they cannot be explained by recurrent mutation (normally the most common allele has a gene frequency of <95%).

3. Locus: The site on a chromosome occupied by a particular gene or marker.

4. Haplotype: The alleles of different genes received by an individual from one parent.

5. Linkage: Two loci are said to be linked when they are relatively close together on the same chromosome, so that alleles of different genes appear to be genetically coupled together on the same haplotype.

6. Recombinants and nonrecombinants: A recombination event between two genes occurs if the alleles on an individual’s haplotype are derived from two different grandparents; if there is no recombination event, the alleles of the two genes on the individual’s haplotype will be the same as in one of the grandparents. In certain situations, such as a family in which both parents are homozygous, it is not possible to distinguish between recombinants and nonrecombinants.
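The first two definitions can be illustrated by simple gene counting. The sketch below assumes genotypes from unrelated individuals at one autosomal marker and uses the 95% cutoff quoted in the definition of polymorphism. The function names and the genotypes are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def allele_frequencies(genotypes: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Estimate gene frequencies by counting alleles in unrelated individuals."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_polymorphic(frequencies: Dict[int, float], cutoff: float = 0.95) -> bool:
    """True if the most common allele has a frequency below the cutoff."""
    return max(frequencies.values()) < cutoff

if __name__ == "__main__":
    sample = [(1, 2), (1, 1), (2, 3), (1, 3)]  # illustrative genotypes
    freqs = allele_frequencies(sample)
    print(freqs, is_polymorphic(freqs))
```

Counting alleles in related family members would bias the estimates, so in family data only founders or other unrelated individuals are normally counted.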

Date posted: 11/04/2014
