Chapter 2
Characterization of the zebrafish vtg
family: sequencing, mapping and
phylogenetic analysis
Abstract
Vitellogenins (Vtgs) are precursors of yolk proteins in oviparous species and are cleaved into three portions upon uptake by oocytes: lipovitellin I (LVI), phosvitin (PV) and lipovitellin II (LVII). In the present study, we found that the zebrafish genome contains at least seven vtg genes (vtg1-7) encoding heterogeneous vitellogenins that fall into three distinct groups: group A Vtgs (Vtg1, 4-7) contain all three major portions but lack the C-terminal half of LVII; the group B Vtg (Vtg2) is the only one with all three portions intact; the group C Vtg (Vtg3) lacks both PV and the C-terminal half of LVII. The seven vtgs were located on two different chromosomes: one (vtg3) in LG11 and the rest closely linked in LG22. Phylogenetic analysis indicated that the expansion of group A vtgs in zebrafish is lineage specific, whereas the gene duplications forming the precursors of the three vtg groups may have occurred before the radiation of teleost fish, possibly through whole-chromosome or genome duplication.
2.1 Introduction
In oviparous vertebrates, yolk is critical for embryonic development as it is a rich source of nutrients, including amino acids, phosphate, carbohydrates, lipids and vitamins. In addition, maternally derived hormones such as thyroxine and T3 have also been detected in embryonic yolk (Kobuke et al., 1987). Early studies on the composition of yolk revealed two classes of egg yolk proteins: the phosphoserine-rich, glycosylated phosvitin (PV) and the lipid-binding lipovitellin (LV) (Wallace and Jared, 1968; 1969). Both types of proteins are derived from a common precursor protein termed vitellogenin (Vtg), a name coined by Pan et al. (1969). Vtgs are calcium- and zinc-binding phospholipoglycoproteins synthesized in hepatic parenchymal cells under the influence of the female sex steroid hormone estrogen (E2) (Wallace and Jared, 1969; Montorzi et al., 1995).
Vtgs are ancient proteins belonging to a multi-member family that includes a variety of lipoproteins (Bryne et al., 1989). Mammalian apolipoprotein B, the large subunit of mammalian microsomal triglyceride transfer protein and the insect apolipophorin II/I precursor are suspected to share a common ancestor with Vtgs (Baker, 1988a&b; Babin et al., 1999). Vtgs are encoded by multigene families in essentially all oviparous species examined. For example, Wahli et al. (1979) first reported that there were four vtgs in Xenopus laevis. In the nematode Caenorhabditis elegans, six vtgs have been identified, which are located on different chromosomes (Spieth et al., 1991 and references within). The number of vtgs reported in fish varies with species; a teleost genome is believed to contain 2-20 copies. The presence of several distinct vtg EST clones in the zebrafish (Danio rerio) also indicated that multiple copies of vtgs exist in this fish species (Gong et al., 1997).
The genes or cDNAs encoding Vtgs have been described in many vertebrate species, including chicken, Xenopus and several fish species (Chen et al., 1997 and references within). Vtgs are among the largest proteins and contain up to 2,139 amino acids, with a predicted molecular weight of 250 kDa before post-translational modification (Chen et al., 1994a). After a Vtg precursor is internalized into the ovary by receptor-mediated endocytosis (RME), it is cleaved into LVI, PV and LVII (Wahli, 1988). PV is a serine-rich domain containing one or more stretches of serine residues. In Xenopus, the PV can be further cleaved into two smaller phosphoproteins, phosvettes I and II (Wahli, 1988).
In terms of genomic organization, despite the similar length of the Vtg proteins, C. elegans vtgs have 4-5 exons (Spieth et al., 1991) whereas Xenopus and chicken vtgs have 35 exons (Nardelli et al., 1987). The sizes of the corresponding exons in Xenopus and chicken vtgs are highly conserved except for exon 23, which generally codes for the PV moiety in vertebrates. Preliminary studies on Vtgs in invertebrates suggested that these organisms do not have the PV domain (Spieth et al., 1985; Trewitt et al., 1992). However, a contradictory report from Chen et al. (1994a) stated that two polyserine-rich regions were discovered in mosquito (Aedes aegypti) Vtg. Whether Vtgs without phosvitin also exist in vertebrates is unknown. Therefore, it is imperative to investigate vtgs from a wide diversity of organisms for a better understanding of the evolutionary relationships among vtgs.
The purpose of the present study was to characterize the zebrafish (Danio rerio) vtg multigene family, including identification of individual vtg members, elucidation of the primary structures of Vtg proteins, mapping of vtg genes and inference of the evolutionary relationships among Vtgs of various fish species. Based on this characterization, an appropriate Vtg candidate will be selected for preparation of a DNA carrier, which will be used in the development of a novel gene transfer method, receptor-mediated gene transfer (see Chapter 4).
2.2 Materials and Methods
2.2.1 vtg cDNA clones and DNA sequencing
All vtg cDNA clones were obtained from our previous EST clones isolated from an adult cDNA library (Gong et al., 1997). The longest clones representing vtg1 to vtg7 were sequenced completely from both ends by automatic sequencing. The deduced amino acid sequences of Vtg1-7 were determined with DNAMAN V 4.15 (Lynnon BioSoft), and putative signal peptide cleavage sites were predicted with the program SignalP V1.1, accessible at http://www.cbs.dtu.dk/services/SignalP/. The deduced amino acid sequences of Vtg1-7 were used in a Fasta search (http://www.ebi.ac.uk/fasta33/) to determine the most homologous Vtgs in other species for phylogenetic analysis.
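Deducing an amino acid sequence from a cDNA amounts to reading codons in a chosen frame until a stop codon is reached. The sketch below illustrates that step with the standard genetic code; it is not the DNAMAN procedure, and the function name `translate` and the test fragment are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# (the layout used for NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cdna: str, frame: int = 0) -> str:
    """Translate a sense-strand cDNA in the given frame, stopping at the first stop codon."""
    seq = cdna.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X marks ambiguous codons
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical 12-nt fragment, not taken from any vtg clone.
print(translate("AAAGAAGAATAA"))
```

In practice all three forward frames would be translated and the longest open reading frame kept.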
For manual sequencing, the SK primer (5’-CGCTCTAGAACTAGGATC-3’) was used for direct sequencing of the 5’ ends of cDNA inserts using the T7 Sequencing Kit (Pharmacia), and rapid denaturation and annealing were performed according to the manufacturer's instructions. For automatic sequencing, the SK and T7 (5’-GTAATACGACTCACTATAGGGC-3’) primers and various gene-specific primers (not shown) were used for complete sequencing of the whole inserts of representative vtg cDNA clones. The ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kit (Perkin Elmer Applied Biosystems) was used to prepare sequencing reactions, and the cycle sequencing reactions were performed on a GeneAmp PCR System 9600 (Perkin Elmer). Briefly, each 20 µl reaction was composed of 8 µl of Terminator Ready Reaction Mix, 3.2 pmol of primer and 200-500 ng of plasmid DNA. The reaction was performed for 25 cycles with the following parameters for each cycle: denaturation at 96 °C for 10 sec, annealing at 50 °C for 5 sec and extension at 60 °C for 4 min. The sequencing reaction products were then loaded onto a polyacrylamide gel, followed by gel electrophoresis and data processing on an automatic sequencer (ABI Prism 377, Perkin Elmer).
2.2.2 Sequence alignment and phylogenetic analysis
Putative Vtg amino acid sequences were aligned with the multiple sequence alignment program Clustal W (http://www.ebi.ac.uk/clustalw/) using default parameters. Well-aligned regions were chosen from each sequence and used in phylogenetic analysis by the parsimony method using the phylogenetic program PAUP v3.1 (Swofford, 1993). The input file was in NEXUS format, following the PROTPARS example included in the PAUP program. One hundred bootstrap replicates were run using the heuristic search option. The mosquito (Aedes aegypti) Vtg (GenBank accession No. AAA18221) or C. elegans Vtg1 (GenBank accession No. U37430) was used as an outgroup in construction of the phylogenetic tree. Sequence alignment was also performed with DNAMAN V 4.15 (Lynnon Biosoft).
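A bootstrap replicate in phylogenetics is built by resampling alignment columns with replacement and re-running the tree search on each pseudo-alignment. The following is a minimal sketch of the resampling step only, not of PAUP's implementation; the function name `bootstrap_alignment` and the toy residues are illustrative assumptions.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement to build one pseudo-replicate."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns, with replacement.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


# Toy alignment with made-up residues (assumption, for illustration only).
toy = {"seqA": "IVTTYAKK", "seqB": "IASTYVSK", "seqC": "IVCTYAKK"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(len(replicates), "pseudo-alignments of", len(replicates[0]["seqA"]), "columns")
```

Each pseudo-alignment would then be analysed by the same parsimony search, and the fraction of replicates recovering a given clade is reported as its bootstrap support.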
2.2.3 5' RACE PCR and partial genomic region amplification of vtg3
Because of the relatively high sequence divergence of vtg3, the full-length vtg3 cDNA was isolated for further characterization. Briefly, a Marathon cDNA Amplification Kit (Clontech) was used to construct an adaptor-ligated double-stranded cDNA library from total liver RNA of female zebrafish according to the manufacturer’s instructions. One gene-specific primer (primer 1: 5'-GGTAACTCAAGTGGCCAAGT-3', Figs. 2-3 and 2-12) was designed for 5' RACE-PCR using the diluted adaptor-ligated liver cDNA library as template. The missing 5’ cDNA sequence of vtg3 was obtained by PCR using primer 1 and the adaptor primer AP1 supplied by the manufacturer. Each 50 µl PCR reaction contained 29.8 µl of dH2O, 5 µl of 10 x PCR buffer, 5 µl of 10 x dNTPs (2 mM each), 3 µl of 25 mM MgCl2, 1 µl each of primer 1 and AP1 (10 µM each), 5 µl of 1/5 diluted adaptor-ligated liver cDNA library and 0.2 µl of Taq DNA polymerase (5 U/µl). PCR was performed on a DNA Thermal Cycler 480 (Perkin Elmer) with the following parameters: 94 °C for 2 min; 35 cycles of 94 °C for 30 sec, 60 °C for 1 min and 72 °C for 2 min; and finally 72 °C for 8 min. A PCR fragment of about 2 kb was recovered from an agarose gel and ligated into the pT7Blue vector (Novagen). Subsequent transformation was carried out using DH5α competent cells and the resulting colonies were screened by PCR. The plasmid harboring the retrieved 5’ sequence of vtg3 was sequenced from the multiple cloning site by automatic sequencing.
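As a quick consistency check, the component volumes of the 5' RACE PCR can be summed and converted into final concentrations. The sketch below uses only the volumes and stock concentrations stated above; the helper name `final_conc` is hypothetical, and any MgCl2 already present in the 10 x buffer is not accounted for.

```python
TOTAL_UL = 50.0

# Component volumes (µl) as listed for the 5' RACE PCR.
race_mix_ul = {
    "dH2O": 29.8,
    "10 x PCR buffer": 5.0,
    "10 x dNTPs (2 mM each)": 5.0,
    "25 mM MgCl2": 3.0,
    "primer 1 (10 µM)": 1.0,
    "AP1 (10 µM)": 1.0,
    "cDNA library (1/5 dilution)": 5.0,
    "Taq DNA polymerase (5 U/µl)": 0.2,
}


def final_conc(stock_conc: float, volume_ul: float, total_ul: float = TOTAL_UL) -> float:
    """Final concentration after diluting volume_ul of stock into total_ul."""
    return stock_conc * volume_ul / total_ul


mix_total = round(sum(race_mix_ul.values()), 1)  # should equal the 50 µl reaction volume
dntp_mM = final_conc(2.0, 5.0)     # each dNTP, mM
mgcl2_mM = final_conc(25.0, 3.0)   # added MgCl2 only, mM
primer_uM = final_conc(10.0, 1.0)  # each primer, µM
taq_units = 0.2 * 5.0              # units of Taq per reaction

print(mix_total, dntp_mM, mgcl2_mM, primer_uM, taq_units)
```

The volumes add up to 50 µl, giving 0.2 mM of each dNTP, 1.5 mM added MgCl2, 0.2 µM of each primer and 1 U of Taq per reaction.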
To determine whether the vtg3 genomic sequence contains a PV region, zebrafish genomic DNA was extracted from a single fish according to a modified protocol reported by Xu et al. (1999), which was based on a standard protocol by Sambrook et al. (1989). Two primers (primer 2: 5'-TGCACACTATCTTCACGAA-3' and primer 3: 5'-GCTGATGTATGAGTCCTAT-3', Figs. 2-3 and 2-12) flanking the putative missing PV region were designed and used in PCR amplification of a partial genomic sequence of vtg3. Briefly, 250 ng of zebrafish genomic DNA was used as template in a 50 µl PCR reaction, which contained the same reagents as described for the 5’ RACE PCR except for the template. PCR was also performed on the DNA Thermal Cycler 480 with the following parameters: 94 °C for 5 min; 30 cycles of 94 °C for 30 sec, 54 °C for 1 min and 72 °C for 2 min; and finally 72 °C for 8 min. A fragment of ~3.0 kb was amplified from the genomic DNA and cloned into the pT7Blue vector (Novagen). The 3.0-kb insert was then sequenced completely by automatic sequencing.
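Annealing temperatures for short primers are often estimated with the Wallace rule, Tm = 2(A+T) + 4(G+C). The source does not state how the annealing temperatures were chosen, so the sketch below is only an illustrative calculation applied to primers 1-3; the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature (deg C): 2 per A/T plus 4 per G/C."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


# Primer sequences as given in the text.
primers = {
    "primer 1": "GGTAACTCAAGTGGCCAAGT",
    "primer 2": "TGCACACTATCTTCACGAA",
    "primer 3": "GCTGATGTATGAGTCCTAT",
}

for name, seq in primers.items():
    print(name, len(seq), "nt", wallace_tm(seq), "deg C")
```

The estimates are 60 °C for primer 1 and 54 °C for primers 2 and 3, which coincide with the annealing temperatures reported for the two PCRs.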
2.2.4 Genome mapping of zebrafish vtgs
To map the seven vtgs, mapping PCR was carried out using the T51 radiation hybrid panel, which consists of 94 zebrafish-hamster radiation hybrid cell lines (Research Genetics) (Kwok et al., 1998; Geisler et al., 1999). For vtg1, vtg4, vtg5 and vtg7, the forward and reverse mapping primers were designed based on nucleotide sequences in the 3’ end coding regions and 3’ UTRs, respectively (Figs. 2-1, 2-4, 2-5 and 2-7). For vtg2, mapping primers were designed based on the sequence located in the unique 3’ end coding region of cDNA clone A183 (Fig. 2-2). The vtg3 mapping primers were targeted to an intron of the vtg3 gene (Fig. 2-14B).
Mapping PCR was performed using the Taq PCR Core Kit (Qiagen). 5 µl of 25 ng DNA (of radiation hybrids or controls) was used in each 20 µl reaction, which contained 6.5 µl of dH2O, 2 µl of 10 x PCR buffer, 4 µl of 5 x Q-solution, 0.4 µl of dNTPs (10 mM each), 1 µl each of forward and reverse primers (20 µM each) and 0.1 µl of Taq DNA polymerase (5 U/µl). The PCR machine was programmed with the following parameters: one cycle of initial denaturation at 95 °C for 3 min; 45 cycles of amplification at 94 °C for 30 sec, 55 °C for 15 sec and 72 °C for 1 min; and a final extension at 72 °C for 8 min. PCR products were resolved on a 1.5% agarose gel and images were recorded under UV illumination. Most of the reactions were performed in duplicate and some in triplicate. Consistent results from at least two separate reactions were scored according to SAMapper codes ("1" = specific band amplified, "0" = no band amplified, "R" = undecided) and submitted for analysis via e-mail to http://wwwmap.tuebingen.mpg.de:8082/rh/.
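The scoring rule can be sketched as a small function that turns replicate PCR results for each hybrid into a SAMapper-style code string. This is one reading of "consistent results from at least two separate reactions", not the authors' actual procedure: a call is made only when at least two replicates agree and form the majority, and everything else is scored "R". The names `score_hybrid` and `rh_vector` are hypothetical.

```python
def score_hybrid(replicates: list) -> str:
    """Score one radiation hybrid from replicate results (True = specific band seen)."""
    positives = sum(1 for r in replicates if r)
    negatives = len(replicates) - positives
    if positives >= 2 and positives > negatives:
        return "1"
    if negatives >= 2 and negatives > positives:
        return "0"
    return "R"  # undecided: replicates disagree or too few reactions


def rh_vector(panel: list) -> str:
    """Concatenate per-hybrid scores into one string, in panel order."""
    return "".join(score_hybrid(reps) for reps in panel)


# Made-up results for four hybrids: two duplicates, one discordant pair, one triplicate.
example_panel = [[True, True], [False, False], [True, False], [True, True, False]]
print(rh_vector(example_panel))
```

For the full T51 panel the resulting string would contain 94 characters, one per hybrid cell line.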
2.2.5 Zebrafish databases used
Data on mapped vtg EST clones were retrieved from the zebrafish EST database at http://zfish.wustl.edu/ (Washington University Zebrafish Genome Resources Project). The UniGene database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene) and the Ensembl zebrafish genome database v19.3.2 (http://www.ensembl.org/Danio_rerio/) were searched to determine vtg clusters and potential novel vtgs in the zebrafish genome. For comparison, the fugu (Fugu rubripes) genome database v2.0 (http://www.ensembl.org/Fugu_rubripes/) was also searched to identify homologous vtgs in this new model fish.
2.3 Results
2.3.1 Presence of multiple vtg cDNAs in zebrafish
2.3.1.1 Identification of seven distinct vtg sequences from vtg EST clones
In our preliminary zebrafish EST project, among the 401 randomly selected cDNA clones from an adult zebrafish cDNA library, 42 clones were identified to encode Vtgs (Gong et al., 1997). Among them, twenty-four (57.1%) were derived from the same gene, which was named vtg1; six (14.3%) from a second gene, named vtg2; one (2.4%) from a third distinct gene, vtg3; two (4.8%) from vtg4; seven (16.7%) from vtg5; and one each (2.4%) from vtg6 and vtg7 (Table 2-1). Thus, the zebrafish has at least seven distinct and functional vtgs. From the frequency of the seven vtgs in the cDNA library, it seems that vtg1 is the most abundantly expressed vtg; vtg2 and vtg5 have similar intermediate expression levels, whereas the other vtgs are expressed at relatively low levels.
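The share of each gene among the Vtg-encoding ESTs follows directly from the clone counts. The short sketch below recomputes the percentages from the counts given above, taking the 42 Vtg clones as the denominator; the variable names are illustrative.

```python
# Number of EST clones assigned to each gene, as reported in the text.
est_counts = {"vtg1": 24, "vtg2": 6, "vtg3": 1, "vtg4": 2, "vtg5": 7, "vtg6": 1, "vtg7": 1}

total_clones = sum(est_counts.values())  # all Vtg-encoding clones
percentages = {gene: round(100 * n / total_clones, 1) for gene, n in est_counts.items()}

for gene, pct in percentages.items():
    print(f"{gene}: {est_counts[gene]:2d} clones, {pct:4.1f}%")
```

EST frequency is only a rough proxy for transcript abundance, so these shares indicate relative expression rather than measure it.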
The longest vtg inserts in each of the seven groups of vtg EST clones were sequenced completely: A248 (vtg1), A183 (vtg2), A376 (vtg3), A391 (vtg4), A227 (vtg5), A220 (vtg6) and A349 (vtg7). The nucleotide and deduced amino acid sequences of the seven vtg cDNAs are shown in Figs. 2-1 to 2-7. Among the seven vtg cDNA clones, A248 (vtg1) has the longest insert, 3645 bp, covering the majority of the coding sequence (CDS). A vtg1 genomic clone has been isolated recently in our lab and a 573-bp 5’ end vtg1 CDS was retrieved after sequencing (Shan, 2002; Tong et al., 2004). Thus, a combined full-length vtg1 cDNA was obtained (4199 bp, excluding 5’ UTR and poly-A tail) and it encodes a complete Vtg1 of 1360 amino acid residues (Fig. 2-1). The clone of vtg3 (A376) is the second longest, with an insert of 2156 bp. After 5’ RACE-PCR,
Table 2-1 Percentage identity between vtg EST sequences and full-length cDNA
Clone  vtg1        vtg2        vtg3        vtg4        vtg5        vtg6        vtg7
A126   98% (240)   66% (242)   45% (242)   59% (239)   69% (237)   77% (240)   42% (241)
A139   100% (207)  44% (207)   43% (208)   43% (204)   95% (207)   45% (204)   41% (205)
A183   86% (316)   100% (319)  51% (316)   40% (315)   86% (313)   83% (319)   37% (317)
A186   96% (207)   76% (208)   52% (208)   94% (207)   94% (207)   93% (205)   93% (207)
A192   97% (241)   43% (239)   48% (240)   45% (238)   43% (240)   45% (240)   44% (240)
A193   92% (287)   47% (287)   44% (288)   40% (290)   96% (287)   39% (283)   41% (285)
A220   91% (237)   86% (227)   43% (235)   45% (232)   92% (237)   100% (237)  43% (233)
A227   91% (249)   47% (244)   47% (247)   49% (246)   100% (249)  47% (249)   43% (247)
A248   100% (325)  43% (318)   40% (325)   44% (325)   45% (317)   45% (324)   46% (316)
A250   99% (222)   78% (222)   49% (222)   96% (222)   96% (222)   96% (219)   95% (209)
A252   97% (318)   42% (318)   39% (316)   43% (317)   95% (318)   41% (317)   40% (315)
A253   96% (268)   80% (261)   48% (265)   97% (268)   100% (268)  97% (265)   95% (234)
A256   100% (208)  44% (206)   40% (208)   46% (209)   95% (208)   40% (210)   47% (208)
A257   100% (260)  44% (256)   47% (259)   46% (257)   97% (260)   44% (258)   45% (295)
A259   99% (250)   78% (244)   43% (249)   95% (250)   95% (250)   95% (250)   43% (249)
A269   92% (328)   46% (323)   46% (328)   43% (320)   95% (328)   41% (322)   39% (329)
A272   97% (201)   79% (200)   52% (200)   94% (201)   94% (201)   93% (198)   92% (201)
A290   98% (207)   42% (207)   43% (205)   50% (204)   96% (207)   41% (208)   45% (205)
A295   91% (219)   79% (218)   43% (218)   95% (218)   97% (218)   95% (215)   92% (218)
A296   81% (253)   99% (253)   46% (252)   81% (253)   81% (253)   82% (250)   81% (253)
A300   97% (244)   45% (244)   43% (244)   48% (238)   95% (244)   44% (242)   40% (243)
A306   94% (215)   44% (217)   48% (216)   48% (211)   40% (216)   48% (215)   44% (211)
A342   99% (258)   43% (256)   41% (256)   44% (257)   39% (256)   43% (252)   43% (252)
A349   96% (221)   81% (221)   53% (217)   94% (221)   95% (221)   95% (218)   100% (221)
A368   98% (306)   88% (289)   48% (304)   43% (304)   90% (304)   88% (299)   43% (306)
A371   88% (306)   98% (265)   47% (304)   49% (306)   86% (307)   83% (274)   43% (305)
A376   53% (339)   43% (337)   100% (341)  43% (338)   46% (337)   45% (336)   44% (341)
A377   98% (368)   44% (360)   44% (368)   43% (364)   95% (368)   42% (368)   41% (364)
A381   100% (218)  48% (214)   48% (219)   40% (215)   92% (218)   43% (217)   43% (213)
A391   85% (216)   72% (216)   49% (214)   100% (216)  92% (215)   82% (215)   44% (211)
A397   98% (197)   47% (193)   48% (195)   43% (196)   99% (197)   48% (195)   47% (193)
A401   48% (223)   98% (224)   47% (218)   49% (218)   47% (219)   50% (223)   48% (223)
A419   97% (178)   88% (178)   40% (176)   45% (174)   95% (178)   94% (178)   45% (178)
A436   46% (312)   96% (313)   47% (312)   43% (306)   43% (309)   50% (314)   40% (307)
†Clones falling into one of the seven vtg categories are indicated by bold numbers (with ≥93% sequence identities). Alignment range (bp) is listed in brackets. Note that <50% sequence identities were observed for many clones, which was due to the lack of an overlapping region.
*Full-length vtg cDNA sequence
**Partial vtg EST sequence
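The footnote's ≥93% criterion can be turned into a simple assignment rule. The sketch below is an illustration rather than the procedure actually used: it assumes the seven identity columns of Table 2-1 correspond to vtg1-7 in order, and it assigns a clone to the gene with the highest identity provided that value reaches the threshold. The function name `assign_clone` is hypothetical.

```python
from typing import List, Optional

GENES = ["vtg1", "vtg2", "vtg3", "vtg4", "vtg5", "vtg6", "vtg7"]  # assumed column order


def assign_clone(identities: List[int], threshold: int = 93) -> Optional[str]:
    """Return the best-matching gene if its identity (%) meets the threshold, else None."""
    best = max(range(len(identities)), key=lambda i: identities[i])
    return GENES[best] if identities[best] >= threshold else None


# Identity rows copied from Table 2-1.
table_rows = {
    "A253": [96, 80, 48, 97, 100, 97, 95],
    "A376": [53, 43, 100, 43, 46, 45, 44],
    "A186": [96, 76, 52, 94, 94, 93, 93],
}

for clone, row in table_rows.items():
    print(clone, assign_clone(row))
```

Because several group A genes often exceed 93% over short overlaps, the alignment ranges in brackets should be consulted before trusting any single assignment.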
Fig. 2-1 Nucleotide and deduced amino acid sequences of full-length vtg1 cDNA. It was obtained by combining the vtg1 cDNA clone A248 with the missing 5’ end sequence provided by Mr Tao Shan. The start of the vtg1 cDNA clone is indicated by an unfilled arrowhead. The stop codon is represented by an asterisk. The polyadenylation signal, AATAAA, and the 3’ end XmnI site are underlined. Primer sequences for vtg1RTF1/vtg1RTR1 (for real-time PCR) and vtg1MF1/vtg1MR1 (for mapping PCR) are underlined. The putative signal peptide cleavage site and two intron positions are marked by an arrow and black arrowheads, respectively.
Fig. 2-2 Nucleotide and deduced amino acid sequences of vtg2 cDNA clone, A183. The stop codon is represented by an asterisk and the polyadenylation signal, AATAAA, is underlined. Primer sequences for vtg2MF4/vtg2MR4 (for mapping PCR), vtg2RTF3/vtg2RTR3 (for real-time PCR) and vtg2MR1 are underlined. Intron positions are marked by arrowheads. The CGLC motif in the LVII region is boxed.
an asterisk and the polyadenylation signal, AATAAA, is underlined. Primer sequences for primers 1, 2, 3 and vtg3RTF1/vtg3RTR1 (for real-time PCR) are underlined. Intron positions are marked by black arrowheads.
Fig. 2-4 Nucleotide and deduced amino acid sequences of vtg4 cDNA clone, A391. The stop codon is represented by an asterisk and the polyadenylation signal, AATAAA, is underlined. Primer sequences for vtg4MF1/vtg4MR2 (for mapping PCR) and a 3’ end AseI site are underlined.
Fig. 2-5 Nucleotide and deduced amino acid sequences of vtg5 cDNA clone, A227. The stop codon is represented by an asterisk and the polyadenylation signal, AATAAA, is underlined. Primer sequences for vtg5MF1/vtg5MR1 (for mapping PCR) and a 3’ end BamHI site are underlined.
Fig. 2-6 Nucleotide and deduced amino acid sequences of vtg6 cDNA clone, A220. The stop codon is represented by an asterisk. The polyadenylation signal, AATAAA, and a 3’ end AseI site are underlined.
Fig. 2-7 Nucleotide and deduced amino acid sequences of vtg7 cDNA clone, A349. The stop codon is represented by an asterisk and the polyadenylation signal, AATAAA, is underlined in italic letters. Primer sequences for vtg7MF2/vtg7MR1 (for mapping PCR) and a 3’ end BamHI site are underlined.
a 5’ end 1802-bp sequence was obtained and the resulting near full-length vtg3 cDNA is 3938 bp long (excluding poly-A tail), encoding a Vtg3 with 1251 amino acid residues (Fig. 2-3). Retrieval of the missing 5’-end cDNA sequences for the other five vtg cDNA clones has not been carried out. The insert lengths for vtg2 and vtg4-7 cDNA clones range from 777 to 2138 bp, encoding partial Vtgs with 217 to 666 amino acid residues (Figs. 2-2, 2-4 to 2-7). Multiple sequence alignment of the seven vtg cDNAs revealed that cDNA clone A183 (vtg2) has a 771-bp extension of the 3’ end coding region and may represent a unique vtg. The heterogeneity of vtg cDNAs may be a common phenomenon in teleost fish since similar observations were reported previously in other species. For example, vtg cDNA clones pSG Vg 5.50 from rainbow trout (Oncorhynchus mykiss) and pOA Vg 71 from tilapia (Oreochromis aureus) were found to have 1.5-kb and 400-bp extensions towards the 3’ end, respectively, when compared with other vtg cDNA clones in the same fish (Le Guellec et al., 1988; Ding et al., 1990). It is worth noting that a “CGLC motif”, which was believed to be required for protein multimer assembly (Mayadas and Wagner, 1992), was found in the extension portion of the deduced Vtg2 amino acid sequence as well as in the similar position of rainbow trout Vtg1 (Figs. 2-2 and 2-9). However, no such motif was found in either the full-length Vtg1 or Vtg3 sequences. It is also noteworthy that the lengths of the 3’ UTRs for vtg1 and vtg4-7 are very similar (ranging from 103 to 129 bp), in contrast with that of vtg2 (84 bp) or vtg3 (180 bp) (Figs. 2-1 to 2-7).
2.3.1.2 Sequence comparison of the zebrafish Vtgs
Multiple alignment of the C-terminal deduced amino acid sequences of Vtg1-7 showed that there is a high degree of sequence similarity between most members except for Vtg2 and Vtg3. As shown in Fig. 2-8, about 84% of the C-terminal 215-217 amino acid residues are identical in Vtg1 and Vtg4-7. Among the 35 sites with different amino acid residues in different Vtgs, 30 were replaced by highly similar or similar amino acid residues. Only 5 were substituted by dissimilar amino acid residues (Fig. 2-8). Thus, it is reasonable to conclude that vtg1 and vtg4-7 might have arisen from recent duplications of a vtg1-like precursor gene, while vtg2 and vtg3 probably arose from more ancient gene amplification events.
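The "about 84%" figure can be checked from the numbers in the paragraph above: 35 variable sites within a window of 215-217 aligned residues. The sketch below does that arithmetic and also shows how identical columns could be counted in an alignment; the function name `identical_columns` and the toy sequences are illustrative assumptions.

```python
def identical_columns(aligned: list) -> tuple:
    """Return (identical, total) column counts for equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = list(zip(*aligned))
    identical = sum(1 for col in columns if len(set(col)) == 1)
    return identical, len(columns)


# Toy alignment with made-up residues: 3 of 4 columns are identical.
print(identical_columns(["KEEL", "KEDL", "KEEL"]))

# Figures reported in the text: 35 variable sites in a 215-217 residue window.
VARIABLE_SITES = 35
identity_pct = {window: round(100 * (window - VARIABLE_SITES) / window, 1) for window in (215, 217)}
print(identity_pct)
```

Both window sizes give 83.7-83.9% identical positions, in line with the stated value of about 84%.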
All known vertebrate Vtgs can be divided into three common domains, LVI, PV and LVII, based on precursor-product relationships between Vtg and yolk protein products (LaFleur, 1999). In addition, five subdomains (I-V) have been defined across the entire Vtg sequence, which can be aligned relatively easily among Vtgs of divergent species from invertebrates to oviparous vertebrates (Chen et al., 1997). By sequence comparison with rainbow trout Vtg1 (Mouchel et al., 1996), the lengths of the three domains in zebrafish Vtg1 are similar to those of the corresponding domains in rainbow trout Vtg1, except for LVII, which is 278 amino acids shorter in zebrafish Vtg1 (Fig. 2-9). Based on the definition by Chen et al. (1997), zebrafish Vtg1 contains only homologous subdomains I-III, but lacks IV and V (Fig. 2-9). As deduced from our partial EST clone A183, zebrafish Vtg2 contains a truncated LVI domain but intact PV and LVII domains (Fig. 2-9). Interestingly, zebrafish Vtg2 is unique in that its LVII domain (comparable in length to that in rainbow trout Vtg1) is the longest among those of the seven Vtgs. In contrast to Vtg1, zebrafish Vtg2 contains homologous subdomains IV and V. As for zebrafish Vtg3, it contains LVI and LVII, but no PV domain (Fig. 2-9). In addition, the LVII domain in Vtg3 is the shortest among the seven Vtgs. Similar to Vtg1, zebrafish Vtg3 contains only subdomains I-III.
Fig. 2-8 Multiple alignment of the C-terminal deduced amino acid sequences of Vtg1-7 by Clustal W. Dashes represent amino acid residues identical to those in Vtg1. Dots are inserted for maximum alignment. Asterisks mark the C-terminal ends of Vtgs. The degree of similarity for the substituted amino acid residues in Vtg1 and Vtg4-7 is indicated below the Vtg7 sequence: ‡, highly similar; †, similar; #, dissimilar. Two vertical lines define the comparison region. Sequences in the aligned region marked by horizontal lines were used in phylogenetic analysis (see Fig. 2-10).
rtVtg1 (N to C): signal peptide, 15 aa; lipovitellin I, 1073 aa; phosvitin, 57 aa; lipovitellin II, 514 aa; subdomains I-V
Fig. 2-9 Schematic representation of the deduced amino acid sequences of rainbow trout Vtg1 (rtVtg1) and zebrafish Vtg1-7 (zfVtg1-7). The three common domains (LVI, PV and LVII) in rtVtg1 were determined by Mouchel et al. (1996) and the five subdomains (I-V) were determined based on Chen et al. (1997). Signal peptide, LVI, PV and LVII are marked by gray, white, black and dots, respectively. The length (in amino acid number) of each region is listed above the boxes and the positions of flanking amino acids are indicated below. The N-terminals of zfVtg2-7 are truncated. Regions used in phylogenetic analysis are indicated by bars. “CGLC” motifs in rtVtg1 and zfVtg2 are indicated. N, N-terminal; C, C-terminal.
Zebrafish Vtg4-7 all lack subdomains IV and V in their LVII regions (Fig. 2-9). Due to the truncated N-terminal sequence, it is unclear whether Vtg7 contains the PV domain. However, as the Vtg7 sequence is highly similar to the other PV-containing Vtg sequences, it is most likely that Vtg7 also contains the PV domain. Based on the above analysis, there are clearly distinctive features in the deduced primary structures of the zebrafish Vtgs. It seems that Vtg1 (and likely Vtg4-7) contains homologous subdomains I-III in the LVI region, but lacks subdomains IV and V; Vtg2 likely contains all of the five subdomains (I-V) in both the LVI and LVII regions; Vtg3 also contains only homologous subdomains I-III in the LVI region. Furthermore, Vtg3 lacks the PV domain.
2.3.1.3 Phylogeny of zebrafish Vtgs
To infer the evolutionary relationships among the seven zebrafish Vtgs, a phylogenetic tree was constructed based on a well-conserved region in the LVII domain, which was determined after the seven deduced Vtg amino acid sequences were aligned with the multiple sequence alignment program Clustal W (Figs. 2-8 and 2-9). As shown in Fig. 2-10A, the phylogenetic analysis indicated that Vtg2 and Vtg3 are the most divergent Vtgs and that each of them belongs to a distinct Vtg group. The remaining five Vtgs, Vtg1 and Vtg4-7, are very similar to one another and cluster together, representing one Vtg group. Thus, the seven zebrafish vtgs can be classified into three groups, which were named group A (including vtg1, vtg4, vtg5, vtg6 and vtg7), group B (vtg2) and group C (vtg3). The earlier observation that the lengths of the 3’ UTRs of the seven vtgs also fall into three groups is consistent with this classification.
Fig. 2-10 Phylogenetic analysis of fish Vtgs by the parsimony method using the phylogenetic program PAUP v3.1. A: Phylogeny of the seven zebrafish Vtgs (zfVtg), Vtg1-7, based on an aligned region in the LVII domain (see Figs. 2-8 and 2-9). B: Phylogeny of fish Vtgs, including the seven zebrafish Vtgs and eighteen Vtgs from other fish species, based on aligned regions in LVII domains (see Table 2-2). C. elegans Vtg1 (ceVtg1) was used as an outgroup in each of the two phylogenetic trees. Branch lengths are proportional to the magnitude of sequence divergence. For the definition of the three groups of Vtgs, see text. For the full names of fish species, see Table 2-2.
Table 2-2 Summary of vtgs used in phylogenetic analysis
Species    Gene name    GenBank accession number (protein ID)    F/P †    Region used for phylogenetic analysis
(vitellogenin 1)
AB064320 (BAB79696)
(Melanogrammus
aeglefinus) vitellogenin B AF284034 (AAK15157)
F 1171-1330 aa hVtgB vitellogenin I U07055
(Vitellogenin-AB088474 (BAC06191)
elegans vitellogenin 1 precursor U37430 (P55155) F 1202-1368 aa ceVtg1
† F, full-length; P, partial coding sequence. *NF, near full-length. ‡ Ensembl Gene ID.