
The astacin protein family in Caenorhabditis elegans

1 Institute of Zoology, University of Heidelberg; 2 Max-Planck-Institute for Medical Research, Heidelberg, Germany

In the nematode Caenorhabditis elegans, 40 genes code for astacin-like proteins (nematode astacins, NAS). The astacins are metalloproteases present in bacteria, invertebrates and vertebrates and serve a variety of physiological functions such as digestion, hatching, peptide processing, morphogenesis and pattern formation. With the exception of one distorted pseudogene, all C. elegans astacins are expressed and are evidently functional. For 13 genes we found splicing patterns differing from the Genefinder predictions in WormBase, sometimes markedly. The GFP expression pattern for NAS-4 shows a specific localization in anterior pharynx cells and in the whole digestive tract (as the secreted form). In contrast, NAS-7 is found in the head of adult hermaphrodites, but not in pharynx cells or in the lumen of the digestive tract. In embryos, NAS-7 fluorescence becomes detectable just before hatching. In C. elegans astacins, three basic structural and functional moieties can be discerned: a prepro portion, the central catalytic chain and long C-terminal extensions with presumably regulatory functions. Within the regulatory moiety, EGF-like, CUB, SXC and TSP-1 domains can be distinguished. Based on structural differences of the regulatory unit we established six NAS subgroups, which seemingly represent different functional and evolutionary clusters. This pattern, deduced exclusively from the domain arrangement in the regulatory moiety, is perfectly reflected in an evolutionary tree constructed solely from amino acid sequence information of the catalytic chain. Related catalytic chains tend to have related regulatory extensions. One notable gene, NAS-39, shows a striking resemblance to human BMP-1 and the tolloids.

Keywords: astacin family; Astacus astacus; Caenorhabditis elegans; protein evolution; metalloproteases.

The first evidence for the existence of the astacin protein family can be traced back to 1967, when one of us (R. Zwilling) observed a proteolytic activity in the digestive fluid of the decapod crayfish Astacus astacus that was different from all other proteases known at that time [1]. Investigations of the cleavage and inhibition specificity confirmed this notion [2–4], and the elucidation of its unique amino acid sequence demonstrated definitively that the crayfish protease represented a new protein family [5]. In subsequent studies, the X-ray crystal structure of 'astacin' was solved to a resolution of 1.8 Å [6]. Astacin was recognized to be a metalloprotease exhibiting a penta-coordinated zinc ion in its active center [7]. In addition, the site of biosynthesis [8], genome organization [9] and mode of activation [10,11] have been elucidated, which made the crayfish protease a prototype for the astacins.

A second member of the astacin protein family was identified when Wang et al. and Wozney et al. (1988) studied the human bone-inducing factor BMP-1, into which a domain with high resemblance to crayfish astacin is inserted [12,13]. After that, many more astacin-like proteins or genes were described in rapid succession in vertebrates, invertebrates and even in prokaryotes [14], where they serve physiological functions as different as food digestion, hatching, peptide processing, morphogenesis and pattern formation (for an overview see [15]). In the crayfish Astacus astacus, a second astacin gene can be found in the embryo that is activated only during a narrow time window just before hatching [16].

In the model organism Caenorhabditis elegans, metalloproteases are present in great variety, as we have seen in data bank analysis (see also [17]). On the other hand, we have shown recently that the bulk of total proteolytic activity found in crude extracts of mixed-stage populations consists of acidic aspartyl proteases [18,19]. However, with regard to the number of expressed astacin genes, C. elegans surpasses any other organism studied so far. This investigation was therefore stimulated by the question of why this 959-cell organism would need more than 30 different and active astacin genes.

Correspondence to R. Zwilling, Zoologisches Institut, Universität Heidelberg, Im Neuenheimer Feld 230, D-69120 Heidelberg, Germany. Fax: +49 6221 544913, Tel.: +49 6221 545887, E-mail: RobertZwilling@t-online.de

Abbreviations: cDNA, complementary DNA; dsRNA, double-stranded RNA; EST, expressed sequence tag; GFP, green fluorescent protein; L1-4, larval stage 1–4; OST, open reading frame sequence tag; RNAi, RNA interference; RT-PCR, reverse transcription-polymerase chain reaction; NAS, nematode astacin.

Note: Supplementary figures are available at http://www.zoo.uni-heidelberg.de/moehrlen

Note: The sequences and the alignment reported in this paper have been submitted to the GenBank/EMBL/DDBJ data bank with accession numbers AJ561200, AJ561201, AJ561202, AJ561203, AJ561204, AJ561205, AJ561206, AJ561207, AJ561208, AJ561209, AJ561210, AJ561211, AJ561212, AJ561213, AJ561214, AJ561215, AJ561216, AJ561217, AJ561218, AJ561219, AJ561220, AJ561221 and ALIGN_000543.

(Received 3 September 2003, revised 15 October 2003, accepted 22 October 2003)


Materials and methods

The C. elegans wild-type strain N2 (variety Bristol) was grown as a liquid culture in S-medium [20] supplemented with Escherichia coli OP50 as food source. The cultures were
When the E. coli food source appeared to be nearly exhausted, the nematodes, representing a mixed population of adults, all four larval stages and eggs, were harvested and separated from bacteria as described elsewhere [20].

RNA purification

For the isolation of RNA, 100 µg fresh or frozen nematode pellets from a liquid culture were ground with a pestle in a mortar containing liquid nitrogen. Total RNA was extracted from the resulting powder following the protocol of Chomczynski and Sacchi [21]. Contamination by genomic DNA was avoided by treating total RNA with DNase I (RNase-free, Boehringer). Poly(A)-rich RNA was isolated by the Oligotex mRNA procedure (Qiagen, Germany).

DNA purification

Genomic DNA was isolated from 1 mL fresh nematodes from a liquid culture using a standard protocol [22].

PCR amplification and cloning

Polyadenylated RNA (1 µg) was converted into
hexamer primer as described [23]. For the amplification of the predicted astacin-like cDNA fragments, specific oligonucleotide primers derived from the genome sequencing data were used. Primer sequences are available at http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig1.htm. PCR amplification was performed on single-stranded cDNA, or on genomic DNA as a control, with 2 U high fidelity
the mutation rate inherent to the PCR reaction. The cycling
8 min. After PCR, samples were analyzed in 2% agarose gels, and discrepancies between expected and observed size of any PCR product were readily detected on visual inspection of the gels. The PCR products were then excised from 1.5% agarose gels and purified with a NucleoSpin gel-extraction kit (Macherey and Nagel, Germany). The purified fragments were subjected to the SureClone Ligation procedure and cloned into a pUC18 vector according to the manufacturer's instructions (Pharmacia, Sweden).

Plasmid DNA was prepared and nucleotide sequences were subsequently determined by double-strand sequencing according to the dideoxynucleotide chain-termination method, using T7 DNA polymerase (Amersham, Sweden). Universal M13 primers were used for sequencing. All sequences have been deposited in EMBL/GenBank/DDBJ under accession numbers AJ561200, AJ561201, AJ561202, AJ561203, AJ561204, AJ561205, AJ561206, AJ561207, AJ561208, AJ561209, AJ561210, AJ561211, AJ561212, AJ561213, AJ561214, AJ561215, AJ561216, AJ561217, AJ561218, AJ561219, AJ561220, AJ561221.

GFP fusion genes for expression studies

The genomic sequence data in WormBase [24] were used to identify a genomic DNA fragment suitable for fusion to a GFP reporter gene. To make sure that the gene-specific promoter and all cis-elements necessary for guiding tissue-specific expression were included in the reporter, the whole upstream region between the gene of interest and the neighboring upstream gene was used. For PCR amplification of the genomic DNA fragment, the forward primers NAS-4:GFP/SacI/F1 (5′-CGA GCT CTT GAG TGA AGA TGC CAA GA-3′), NAS7:GFP/BamHI/F1 (5′-CGG GAT CCT TCC GCC AAA GTC ATT TAG-3′), NAS-15:GFP/PstI/F1 (5′-AAC TGC AGC TTT TCG GAA GAC TTT TGC-3′), NAS33:GFP/KpnI/F1 (5′-GGG GTA CCC CGG ACC ACA GTA AAG AAT-3′) and the corresponding reverse primers NAS4:GFP/KpnI/R1 (5′-GGG GTA CCC TGA CAC GCT GAC CCA TAC-3′), NAS7:GFP/KpnI/R1 (5′-GGG GTA CCC GATC CTC GCA TTC TA-3′), NAS15:GFP/KpnI/R1 (5′-GGG GTA CCC GCT GGG TAG TGG AGT TG-3′), NAS33:GFP/SacI/F1 (5′-CGA GCT CTG ACA AGA AAG GCA CAA AG-3′) were used. An 8–10 kb PCR fragment, containing approximately 3–5 kb of upstream sequence down to the last 30–50 codons of the astacin genes, was fused in frame to the reporter gene GFP. Thus, the intergenic region as well as the protein coding regions of the astacin-like genes NAS-4, NAS-7, NAS-15 and NAS-33 were amplified with 2 U Elongase DNA polymerase (Invitrogen, Germany), gel purified (NucleoSpin gel-extraction kit, Macherey and Nagel, Germany) and cloned in frame into a pBD95.85 vector (having the S65C mutation and artificial introns to increase the expression of GFP; A. Fire Vector Kit, Baltimore, USA) according to standard protocols [23,25]. The molecular details of all fusion constructs are available on request. The construct, together with the marker plasmid pBx, was introduced into pha-1 hermaphrodites, and the worms having the constructs as extrachromosomal arrays
under a Zeiss Axiovert 200 microscope.

Sequence analysis and phylogenetic studies

To identify metalloprotease genes in the genome of C. elegans, we used representative vertebrate and insect proteins, or their conserved domains according to the PFAM [26] and PRINTS [27] databases, as queries for BLAST searches [28,29] of WormBase [24]. For astacin genes, the astacin domain, the zinc-binding motif or the Met-turn sequences, as listed by PRINTS, were used to repeatedly screen the whole C. elegans genomic sequence available from WormBase.
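A motif-based screen of this kind can be illustrated with a short sketch. The two regular expressions below encode the consensus commonly cited for the astacin family, HExxHxxGxxHExxRxDR for the zinc-binding motif and SxMHY for the Met-turn. These patterns, the function names and any input sequences are assumptions made for illustration; they are not the PRINTS entries or the BLAST queries actually used in this work.

```python
import re

# Assumed consensus patterns ("." stands for any residue); illustrative only,
# not copied from the PRINTS fingerprints used in the paper.
ZINC_MOTIF = re.compile(r"HE..H..G..HE..R.DR")
MET_TURN = re.compile(r"S.MHY")


def find_astacin_signatures(seq):
    """Return 0-based start positions of both motifs in a protein sequence."""
    seq = seq.upper()
    return {
        "zinc": [m.start() for m in ZINC_MOTIF.finditer(seq)],
        "met_turn": [m.start() for m in MET_TURN.finditer(seq)],
    }


def looks_like_astacin(seq):
    """True when a Met-turn occurs downstream of a zinc-binding motif."""
    hits = find_astacin_signatures(seq)
    return any(m > z for z in hits["zinc"] for m in hits["met_turn"])
```

A regular-expression scan only flags candidates; it cannot replace the similarity searches and manual inspection described in the text.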

DNA sequences of all astacin genes were further analyzed
structures were compared to the Genefinder predictions as annotated in WormBase, and to the alternative GenieGene open reading frame predictions of Kent and Zahler [31]. The splicing patterns were subsequently refined using the EST/OST sequences available in the latest WormBase release (WormBase97, 7 March 2003) and the cDNA sequences resulting from this work. Discrepancies between the WormBase and GenieGene predictions and our own cDNA sequences were communicated to those annotating the
docs/WebFig2.htm). The corrected cDNA sequences were
unconfirmed splicing patterns, those protein predictions were used for further analysis which are in accordance with the protein family alignment, showing no exceptional
ALIGN_000543).

For identification and annotation of protein domains and the analysis of domain architectures, the tools of the SMART [33], PFAM [26], ProDom [34] and INTERPRO [35] protein domain databases were used.

For phylogenetic studies, the active protease domains, covering the region from Ala-1 to Leu-200 in the prototype crayfish astacin, from the C. elegans astacins and selected other astacin family members were aligned
further manipulation. The alignment is available at the EMBL database with accession number ALIGN_000543. Phylogenetic analyses were carried out using the neighbor-joining method and the Bayesian phylogenetic method
package [37] was used. Distances between the pairs of protein sequences were calculated and corrected for multiple changes according to the PAM001 distance matrix. The reliability of the tree was tested by bootstrap analysis with 100 replications. Bayesian phylogenetic
program [40] with the WAG matrix [41], assuming a gamma distribution of substitution rates. Prior probabilities for all trees and amino acid replacement models were equal; the starting trees were random. Metropolis-coupled Markov chain Monte Carlo sampling was performed with one cold and three heated chains that were run for 50 000 generations. Trees were sampled every 10th generation. Posterior probabilities were estimated on 2000 trees.
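The bootstrap step mentioned above can be sketched in a few lines. The idea is to build each replicate by drawing alignment columns with replacement and then to rebuild a tree from every replicate. The function name, the dictionary layout of the alignment and the fixed seed are assumptions for illustration; this is not the program package used in the paper, and tree building itself is left out.

```python
import random


def bootstrap_alignments(alignment, replicates=100, seed=0):
    """Yield pseudo-alignments made by sampling columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    for _ in range(replicates):
        # The same column indices are used for every sequence, so each
        # resampled column is still a genuine column of the original alignment.
        columns = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in columns) for name in names}
```

The bootstrap value of a branch is then the number of replicate trees, out of 100 here, in which that branch reappears.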

Results and discussion

During a preliminary database survey in 1996 we observed that the 959-cell organism C. elegans accommodates a surprising number of gene sequences coding for astacin-like proteins, while for other species with a much larger genome no more than 2–3 astacin genes had been reported (G. Geier and R. Zwilling, unpublished).

The complete sequencing of the 97 megabase genome of
1998 [43] then made a thorough analysis possible. The latest WormBase release (WormBase97, 7 March 2003) now contains 21 437 coding sequences when counting 1891 alternate splice forms. Of these, the MEROPS protease database (latest release 6.11: 20 January 2003) lists 382 protease genes (E.C.3.4), of which 158 genes belong to the group of metalloproteases (E.C.3.4.24). The metalloproteases of C. elegans can be arranged into 11 protein clans and subdivided into 27 protein families, according to the nomenclature of Barrett et al. [44]. Our own BLAST searches in WormBase, using protein family consensus sequences according to the PFAM or PRINTS databases as queries, revised the number of identified
BLAST searches based on the whole astacin domain, the zinc-binding motif or the Met-turn sequence revealed some more astacin genes in C. elegans in addition to those listed by MEROPS so far, which finally brought the total number of astacin genes in C. elegans to 40 (Tables 1 and 2).

The nomenclature proposed for these 40 C. elegans astacin genes is in accordance with suggestions of the C. elegans Sequencing Consortium. In Table 2 we have numbered these C. elegans astacins (nematode astacins, NAS) from 1 to 40. The two proteins NAS-23 and NAS-40 (located on cosmids F54B8 and D1022) are not recorded in the WormPep database (predicted proteins from WormBase) but could be detected by a genomic TBLASTN search
GENSCAN did not predict a complete protein but rather an 88 amino acid fragment which is interrupted by two stop codons.

Table 1. One hundred and fifty-one genes coding for metalloproteases in C. elegans. Identification of genes was based on data available in MEROPS (the protease database, release 6.11: 20 January 2003, http://merops.sanger.ac.uk) and subsequently corrected by BLAST searches using the genome sequencing data of C. elegans. Nomenclature is according to Barrett et al. [44].

Clan    Protease family                            Number of genes
MA(E)   M1 aminopeptidase                          12
        M2 peptidyl-dipeptidase                    1
        M41 E. coli endopeptidase                  3
        M12A astacin                               40
        M12B/C ADAM                                10
        M14B carboxypeptidase E                    3
        M16B mitochondrial processing peptidase    3
MF      M17 leucyl aminopeptidase                  2
MG      M24A methionyl aminopeptidase I            5
        M20A/B glutamate carboxypeptidase          5
MJ      M38 beta-aspartyl dipeptidase              1
MK      M22 O-sialoglycoprotein endopeptidase      2
MX      M48A Ste24 endopeptidase                   1
        M67 proteasome regulatory subunit RPN11    3

Hishida et al. [45] reported that HCH-1 (= F40E10.1, NAS-34) is required for normal hatching and neuroblast migration in C. elegans. For all other astacin genes, no details were known beyond the Genefinder protein prediction in WormBase and the partial transcription analysis by the EST or open reading frame sequence tag (OST) projects. As a first step, it was therefore indispensable to confirm the existence of expression products for each gene.

Transcriptome analysis

Comparing all genomic DNA sequences of astacin genes identified by our BLAST search to the cDNA data of WormBase, it became evident that for 12 of the total of 40 genes EST or OST clones [46,47] were already known (WormBase release 57, 17 December 2001). This confirmed that the 12 genes in question were expressed at the mRNA level.

The remaining 28 genes were analyzed by RT-PCR followed by sequencing of the DNA fragments in order to demonstrate their transcription activity. For each gene, specific primer pairs were synthesized, the gene fragments amplified by PCR and the products analyzed on agarose gels (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig1.htm). In each case the PCR reaction with reverse-transcribed RNA was accompanied by a control reaction with genomic DNA. Introns within the amplified DNA regions gave rise to correspondingly larger genomic DNA fragments when compared to the cDNA fragments. For unambiguous identification and for the correction of erroneous splicing pattern predictions for all DNA fragments, the PCR products were eluted from an agarose gel, blunt-end cloned into the vector pUC18 and subsequently
docs/WebFig2.htm)
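The size comparison between the RT-PCR product and the genomic control follows directly from the exon structure, as the sketch below shows. It assumes a hypothetical gene model given as sorted, 1-based inclusive exon coordinates and two primers whose outer ends lie within exons; the function name and all coordinates are illustrative and do not come from any NAS gene.

```python
def amplicon_sizes(exons, fwd_start, rev_end):
    """Return (genomic_bp, cdna_bp) for a primer pair on a spliced gene.

    exons: list of (start, end) genomic coordinates, 1-based and inclusive.
    fwd_start, rev_end: genomic positions of the outer primer ends (assumed exonic).
    """
    # The genomic product spans everything between the primers, introns included.
    genomic_bp = rev_end - fwd_start + 1
    # The cDNA product contains only the exonic bases inside that span.
    cdna_bp = 0
    for start, end in exons:
        lo = max(start, fwd_start)
        hi = min(end, rev_end)
        if lo <= hi:
            cdna_bp += hi - lo + 1
    return genomic_bp, cdna_bp
```

For a hypothetical model with exons (1, 100), (201, 300) and (451, 600) and primers ending at positions 51 and 500, the function returns a 450 bp genomic product and a 200 bp cDNA product; the 250 bp difference is the intron sequence. An observed cDNA band that deviates from the predicted size therefore points to a wrongly predicted splice site.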

In combination with the recently available EST and OST sequences (WormBase release 97, 7 March 2003) we found for 13 genes (Table 2) splicing patterns differing from the Genefinder predictions in WormBase, sometimes markedly.

In these cases, the experimental cDNA transcripts were in good accordance with the alternative GenieGene open reading frame predictions of Kent and Zahler [31] (Table 2
WebFig2.htm). For NAS-1, NAS-21, NAS-22 and NAS-28 we observed splice sites deviating from both the Genefinder and the GenieGene predictions. The manually corrected cDNA sequences can be found at http://www
new sequence data including corrected gene structures have been submitted to WormBase and the EMBL/GenBank/DDBJ databases (for accession numbers, see footnote). The genes NAS-2, NAS-5, NAS-16, NAS-17, NAS-18 and NAS-30 showed no apparent PCR product in our RT-PCR analysis (Table 2, http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm). However, the microarray projects of Hill et al. [48,49], Kim et al. [50] and Jiang et al. [51] (for an overview see WormBase) support the expression of these genes. We would like to point out that this technique cannot unambiguously verify either the identity or the splicing pattern of a gene because no sequence data are produced.

Nevertheless, in summary it may be stated that, with the exception of the pseudogene NAS-40, transcription activity could be confirmed for all other 39 astacin genes.

Functional analysis

We attempted to analyze the function of selected astacin genes in C. elegans by investigating the expression pattern of four representative astacin genes of different subgroups (see section on Structural and phylogenetic analysis, Fig. 2) using GFP-fusion constructs. All astacin-GFP fusions were assayed for expression in animals from embryonic stages onwards. At least three independent transgenic lines were generated from at least two independent clones of each of the astacin-GFP fusion constructs to control for PCR-induced sequence errors. The reporter gene fusions NAS-15::GFP and NAS-33::GFP failed to give detectable expression in any life stage. The fusion protein NAS-4::GFP showed extensive GFP fluorescence throughout the digestive tract in larval stages and in adult worms (Fig. 1A). At higher magnification, we saw GFP staining within pharynx cells of the procorpus, metacorpus, isthmus and terminal bulb, and extracellular staining in the lumen of the terminal bulb (Fig. 1B, arrows). Therefore, NAS-4 is most likely secreted by the pharynx cells into the lumen and is then found in secreted form all the way down the lumen of the gut. We conclude from this expression pattern that NAS-4 is associated with digestive functions. Of special interest is the observation that NAS-4 and the digestive enzyme astacin from crayfish [8] have a similar domain arrangement, both lacking a C-terminal extension (see section on Structural and phylogenetic analysis). They also cluster in the
similar functions. These considerations might be extended to the whole subgroup I (Fig. 2, NAS-2–6), which shares these features.

By contrast, NAS-7::GFP staining was observed only in the head of adult hermaphrodites, but not within pharynx cells (Fig. 1C). The expressing cells are located outside of the pharynx, around the metacorpus and the terminal bulb, and could include neurons, cells of the excretory system or gland cells of still unknown function [20]. Reporter gene expression also became detectable in the embryo before hatching (Fig. 1D). While the function of the gene expressed in the adult remains open at present, in the embryo it could possibly serve as a hatching enzyme.

To further characterize the function of astacin genes in
Gonczy et al. [52], Fraser et al. [53], Maeda et al. [54], Kamath et al. [55,56], Ashrafi et al. [57], Lee et al. [58] and Pothof et al. [59]. Although nearly all astacin genes have been investigated for gene silencing by RNAi, most of them lack an obvious phenotype and no function could be deduced from the attempted inactivation. Whether this reflects incomplete dsRNA interference or a redundancy in function among the high number of expressed astacin genes remains to be established. Strong RNAi phenotypes were observed for NAS-9, -11 and -37 only, revealing these three astacin genes to be essential. Inactivated NAS-9 showed 6% embryonic lethality [54], NAS-11 showed retarded growth [56] and NAS-37 showed an abnormally long body and a molt defect [54,56]. As a rule it can be stated that all known astacin gene inactivations had only little, if any, effect. One explanation for this could be that C. elegans astacins have overlapping functions, which is also suggested by structural homologies.

Fig. 1. GFP expression pattern images for NAS-4 (A, B) and NAS-7 (C, D). (A) Extensive GFP fluorescence throughout the digestive tract in an adult hermaphrodite and an L2 larva for a NAS-4::GFP fusion gene; 100× magnification. (B) Higher magnification of the head of an adult hermaphrodite showing GFP expression for the same construct in pharynx cells and in the lumen of the terminal bulb; 400× magnification. (C) GFP expression of a NAS-7::GFP fusion gene is found in the head of adult hermaphrodites, but not in pharynx cells or in the lumen of the digestive tract; 300× magnification. (D) In embryos, NAS-7::GFP reporter gene fluorescence became detectable just before hatching; 400× magnification.

Structural and phylogenetic analysis

All known sequence data of astacin-like proteins are derived from cDNA and genomic sequences, with the exception of crayfish astacin, which in addition had been completely sequenced by Edman degradation [5].

The present analysis is based on protein sequences available from the SwissProt, TrEMBL, EMBL and GenBank databases. If necessary, open reading frames of DNA sequences were translated into amino acid sequences by the HUSAR package. For C. elegans we used the Genefinder or GenieGene predictions corrected by our cDNA
WebFig1.htm). Altogether, we found over a hundred
are known at present (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm). Considering only the eukaryote genomes sequenced completely, six astacin genes are found in human and mouse, and 12 in Drosophila melanogaster. However, the tiny 959-cell organism C. elegans exhibits the striking number of 40 astacin genes, a number far from being reached in any other organism studied up to now. With the sole exception of the pseudogene NAS-40, all these genes are expressed and seem to have specific functions. Therefore, these findings not only allow the study of an extraordinary divergence of a protein family within one single organism, but also shed light on a multiple functional fine modulation evolving from a common structural source.

Fig. 2. Schematic representation of homologues and domain structures in astacin genes in C. elegans. Pre-pro sequences, catalytic domain and presumably regulatory appendices. Diagram scale is related to amino acid length. Presequences, purple shaded boxes; prosequences, grey oval; astacin domain, red box; six-cysteine domain, SXC; EGF-like, yellow oval; CUB domains, CUB; thrombospondin-1 like, TSP1; low complexity sequences, striped boxes; not specified, open boxes.

In the astacins, three basic structural and functional moieties can typically be discerned: a pre-pro portion, the catalytic astacin chain, and long C-terminal extensions, which presumably contain signals for proper function (Fig. 2). Pro-sequences are found in all functional C. elegans astacins, while presequences (signal peptides) are lacking in nine genes (Fig. 2). The absence of signal peptides in these genes may reflect specific intracellular functions of non-secreted proteins. On the other hand, the lack of these signal peptides could also reflect problems with the still unconfirmed 5′-gene predictions of Genefinder or GenieGene, as the sequencing data produced here have been limited to PCR-derived fragments and to the reanalysis of EST and OST fragments. In some rare cases in other organisms prepro structures may be lacking completely, often combined with an N-terminally truncated catalytic domain [Cortunix
exception of the non-expressed pseudogene NAS-40) this feature was never seen. In the central domain of all
been identified in crayfish astacin as essential for catalytic activity [6,7,60,61] are preserved without exception. From this fact it may be concluded that all C. elegans astacins potentially have catalytic activity, too.

complex C-terminal extensions adjacent to the catalytic domain, which presumably define time and place of their activity (Fig. 2). Based on homology criteria, CUB, EGF, SXC and TSP-1 domains can be discerned within these appendices, while other sequences must be classified as 'non specific' or as having 'low compositional complexity' (LC). LC regions are often Ser/Thr-rich, are found in many astacins and could serve as sites for O-glycosylation. EGF domains are epidermal growth factor-like modules (PFAM accession number: PF00008). CUB domains (SMART accession number: SM0042) are named after their occurrence in complement components C1r/C1s, embryonic sea urchin protein Uegf, and BMP-1 [62]. These domains may be involved in calcium binding and in protein–protein or enzyme–substrate interactions [63]. The SXC (six-cysteine) motif was observed in several hypothetical C. elegans proteins [64,65] but was originally described in metridin, a toxin from sea anemone, and is also called ShK toxin domain (SMART accession number: SM0254). TSP-1-like domains are thrombospondin type 1 repeats (SMART accession number: SM0209), which are present in several families of metalloproteases, namely in the ADAM-TS proteases (ADAM-TS, a disintegrin-like and metalloproteinase with thrombospondin type I motifs; family M12B/C, see Table 1). TSP-1 domains are reported here for the first time for astacins.

C-terminal extensions we arranged all 40 C. elegans genes into the subgroups I–VI (Fig. 2). Subgroup I comprises five genes with no C-terminal extension (NAS-1), or with short, unspecific extensions, where probably no specific signals can be accommodated. Subgroup II exhibits in its 10 genes exclusively the SXC domain, while other domain types are completely lacking. The SXC domain appears in a single, double or triple arrangement, and the domains may be attached directly to the catalytic chain or separated from it and from each other by short, unspecific sequences. A tandem-like arrangement can only be seen with these SXC domains, while other domain types are represented only once in a regulatory chain (for an exception see subgroup VI). Subgroup III combines 15 genes that typically have an EGF-like domain directly attached to the catalytic chain, followed by a CUB domain. In gene NAS-18 the CUB domain is missing and in gene NAS-21 the EGF-like domain. In subgroup IV (two genes) an SXC domain and in subgroup V (six genes) a TSP-1 domain is added to the EGF and CUB domains, which show the same arrangement as in subgroup III. Subgroup VI is a special case: its only entry, NAS-39, shows a striking similarity to the human bone-inducing factor BMP-1. A comparison between both proteins reveals a sequence identity of the catalytic chains of 74%, while for other nematode astacins this value reaches on average only 40%. Xolloid (Xenopus), tolloid and tolkin (Drosophila) and TBL-1 (Aplysia) also have corresponding structures. The number and arrangement of CUB and EGF domains are identical in these genes. NAS-39 by far exceeds all other C. elegans genes in length. It will be interesting to see what physiological role a factor almost identical to human BMP-1 might perform in
primordial functions from which human BMP-1 has evolved. The distinctive and complex pattern that appears in the subgroups I–VI seems to provide a specific function for each C. elegans astacin gene. Members of the same subgroup might have similar or identical functions.

Fig. 3. Phylogenetic relationship of the astacins, including all C. elegans astacin proteins (shaded yellow) and selected examples from other organisms. The tree was deduced by Bayesian and neighbor-joining analysis based on the alignment of the amino acid sequences of the catalytic chain. At branching points, Bayesian posterior probabilities and bootstrap values greater than 50 of 100 replications (values in parentheses) are given as an indication of the confidence of the tree presented. The scale bar represents a distance of 0.1 accepted point mutations per site (PAM). Evolutionary subgroups of the astacin protein family are indicated on the right side. The schematic representation of the protein domains (colored bars) corresponds to that in Fig. 2. Meprin domains: MAM domain, MAM; MATH domain, MATH; I-domain, I; intervening sequence, inter; transmembrane domain, TM; cytoplasmic domain, c. For an overview, see [66]. Abbreviations and Swissprot/TREMBL/PIR accession numbers of the astacins: AA Astacin, Astacus astacus (crayfish) astacin (P07584); AC TBL-1, Aplysia californica TBL-1 (P91972); AJ EHE-4, Anguilla japonica (fish) EHE-4 (Q90Y89); CC Nephrosin, Cyprinus carpio (fish) nephrosin (O42326); DM Tolloid and Tolkin, Drosophila melanogaster Tolloid (P25723) and Tolkin (Q23995); FM Flavast, Flavobacterium meningosepticum flavastacin (Q47899); HS BMP-1, Homo sapiens bone morphogenetic protein 1 (Q14874); HS Meprin A and B, Homo sapiens meprin α (Q16819) and β (Q16820); HS TLL and TLL-2, Homo sapiens Tolloid-like 1 (Q9NQS4) and 2 (Q9UQ00); HV HMP-2, Hydra vulgaris (Cnidaria) metalloprotease 2 (Q9XZG0); MM BMP-1, Mus musculus BMP-1 (I49540); MM Meprin A and B, Mus musculus meprin α (P28825) and β (Q61847); OL LCE and HCE-1, Oryzias latipes (fish) low choreolytic enzyme (P31581) and high choreolytic enzyme 1 (EMBL:M96170); PC PMP-1, Podocoryne carnea (Cnidaria) metalloprotease 1 (O62558); PL BP-10, Paracentrotus lividus (sea urchin) blastula protease 10 (P42674); SP BMP-H, Strongylocentrotus purpuratus (sea urchin) BMP-1 homolog (P98069); SP SPAN, Strongylocentrotus purpuratus (sea urchin) SPAN (P98069); TR MP, Takifugu rubripes (fish) HCE-1 (AAL40376); XL BMP-1, Xenopus laevis BMP-1 (P98070).
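The subgroup scheme described above amounts to a small set of rules on the domains found in the C-terminal extension, which can be written down as a sketch. The function name and the domain labels ("SXC", "EGF", "CUB", "TSP1") are assumptions chosen for illustration, and the rules are a simplified reading of the text: unspecific or low-complexity stretches are ignored, and any repeated EGF or CUB domain is taken as the BMP-1-like architecture of subgroup VI.

```python
def nas_subgroup(domains):
    """Assign a NAS subgroup (I-VI) from the ordered C-terminal domain list.

    domains: e.g. ["EGF", "CUB", "TSP1"]; an empty list means no recognizable
    domain (no extension, or only short unspecific sequence).
    """
    kinds = set(domains)
    if not kinds:
        return "I"       # no extension or only unspecific sequence
    if kinds == {"SXC"}:
        return "II"      # one to three SXC domains, nothing else
    if domains.count("EGF") > 1 or domains.count("CUB") > 1:
        return "VI"      # repeated EGF/CUB domains, as in NAS-39
    if kinds <= {"EGF", "CUB"}:
        return "III"     # EGF followed by CUB; one of the two may be missing
    if kinds == {"EGF", "CUB", "SXC"}:
        return "IV"      # SXC added to EGF and CUB
    if kinds == {"EGF", "CUB", "TSP1"}:
        return "V"       # TSP-1 added to EGF and CUB
    raise ValueError(f"domain combination not covered by the scheme: {domains}")
```

Under these rules a single EGF or a single CUB domain still falls into subgroup III, which matches the treatment of NAS-18 and NAS-21 in the text.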

We constructed a phylogenetic tree comprising all 39 expressed C. elegans astacins and, in addition, selected astacin proteins from a variety of other organisms (Fig. 3). The tree is based on a multiple alignment of the amino acid sequence of the active protease domain, covering the region from Ala1 to Leu200 in the prototype, crayfish astacin. Results were corrected with the help of the known secondary structures and conserved regions of crayfish astacin. The alignment has been submitted to the EMBL databank with accession number ALIGN_000543.

Phylogenetic relationships were initially established on

program package. As outgroup we used the phylogenetically most remote flavastacin from bacteria. However, an isolated occurrence of an astacin sequence in a single bacterial species could be due to lateral gene transfer, which would render this sequence unsuitable as an outgroup. Because at least one more astacin-like protein has recently been detected in bacteria (http://www.zoo.uni-heidelberg.de/moehrlen), lateral gene transfer is most unlikely. Moreover, we also tried the phylogenetically remote cnidarian astacins (HMP-2 and PMP-1) as an outgroup, which gave exactly the same phylogenetic tree.

For statistical verification a consensus tree including 100 sequences was calculated and bootstrap values were established for each point of divergence. However, the phylogenetic tree based on the neighbor-joining method showed rather low bootstrap values (< 50) for the most ancestral nodes (Fig. 3). Pro sequences could not be used additionally to strengthen these branching points because they differ extremely in length, change rapidly or are lacking completely. A similar consideration applies to the C-terminal extensions. The robustness of the tree was therefore verified additionally by the Bayesian phylogenetic method. With this analysis the confidence of the tree increased significantly, resulting in high posterior probabilities. The evolutionary tree now presented in Fig. 3 summarizes all the above-mentioned approaches and therefore exhibits the best reliability.
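The bootstrap values discussed above rest on resampling the alignment columns with replacement and rebuilding the tree for each replicate. The following is a minimal sketch of the resampling step only, assuming an alignment held as a dictionary of equal-length strings. The function names, the toy alignment and the seed are hypothetical; the replicate count of 100 mirrors the "100 replications" given in the legend of Fig. 3. This is not the program the authors used.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one bootstrap replicate: alignment columns sampled with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    picked = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column choice to every sequence so columns stay intact.
    return {name: "".join(seq[i] for i in picked) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict[str, str], n_replicates: int = 100,
                         seed: int = 0) -> list[dict[str, str]]:
    """Generate n_replicates resampled alignments from a seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = {"seq1": "HEIGHA", "seq2": "HELGHA", "seq3": "HEFGHS"}
replicates = bootstrap_replicates(toy_alignment, n_replicates=100, seed=1)
```

Each replicate would then be passed to the tree-building method of choice; the bootstrap value of a node is the number of replicate trees in which the same grouping reappears.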

From this analysis it becomes evident that similar sequences of the catalytic chain tend to have similar C-terminal extensions (Fig. 3). All 39 complete NAS proteins can be subdivided into two different types: one having CUB domains in the regulatory moiety, and another in which these are lacking completely (see also Fig. 2). This pattern is clearly reflected in the amino acid sequence-based phylogenetic tree, where all NAS proteins exhibiting a CUB domain come closely together in one cluster (Fig. 3). The CUB domain is almost always preceded by an EGF domain (exception: NAS-21). To these either no further segments are attached (subgroup III), or an SXC domain (subgroup IV) or a TSP-1 domain (subgroup V) might follow. The second cluster comprises the NAS-1 to NAS-15 proteins, characterized by having no distinct extensions (subgroup I) or showing one, two or three SXC domains (subgroup II). NAS-39 (subgroup VI) is strikingly different from all other C. elegans astacins, but fits perfectly into the BMP-1/tolloid group, on the basis of both the sequence homologies and the complex but identical arrangement of the five CUB and the two EGF segments (Figs 2 and 3).
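The subgroup rules described above can be restated as a small decision procedure. The sketch below is only an encoding of the prose, assuming each protein is described by the ordered list of domains in its C-terminal extension. The function name and the string labels are hypothetical, and the authors assigned subgroups from the domain maps in Fig. 2, not from code. The domain order used for the subgroup VI example is illustrative; only the counts (five CUB, two EGF) are taken from the text.

```python
def classify_nas(domains: list[str]) -> str:
    """Assign a NAS subgroup (I-VI) from the C-terminal domain list.

    Labels assumed here: "EGF", "CUB", "SXC", "TSP-1".
    """
    n_cub = domains.count("CUB")
    n_egf = domains.count("EGF")
    if n_cub == 5 and n_egf == 2:
        return "VI"   # BMP-1/tolloid-like arrangement (NAS-39)
    if n_cub >= 1:
        # CUB-containing cluster; EGF usually precedes CUB but is not
        # required here because the text names NAS-21 as an exception.
        if "TSP-1" in domains:
            return "V"
        if "SXC" in domains:
            return "IV"
        return "III"
    if not domains:
        return "I"    # no distinct extensions
    if len(domains) <= 3 and all(d == "SXC" for d in domains):
        return "II"   # one, two or three SXC domains
    raise ValueError(f"domain arrangement not covered by the rules: {domains}")


examples = {
    "no extension": classify_nas([]),
    "two SXC": classify_nas(["SXC", "SXC"]),
    "EGF-CUB": classify_nas(["EGF", "CUB"]),
    "EGF-CUB-SXC": classify_nas(["EGF", "CUB", "SXC"]),
    "EGF-CUB-TSP-1": classify_nas(["EGF", "CUB", "TSP-1"]),
    "BMP-1-like": classify_nas(["CUB", "CUB", "EGF", "CUB", "EGF", "CUB", "CUB"]),
}
```

Arrangements that the prose does not cover raise an error instead of being forced into a subgroup, which keeps the sketch from implying a rule the text does not state.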

One might wonder about the expression of such a large number of related but different astacin genes in a 959-cell organism. Potentially all these genes could have different functions, as they show in each case at least clear, and in some cases marked, structural differences. However, much of this divergence seems to be due to relatively recent gene duplications. In the closely related species Caenorhabditis briggsae the genes NAS-16, -18, -19, -22, -24 and the pseudogene NAS-40 are missing. C. elegans and C. briggsae share, however, the neighboring genes NAS-17, -20, and -21. In addition, these genes show a tandem-like arrangement in clusters and are all located on chromosome V, where NAS-16, -17, -18, -19 form one cluster and, separated by various other genes, a second cluster comprising NAS-20, -21, -22 can be found. These notions are also supported by the position of these genes in the evolutionary tree (see Table 2, and Figs 2 and 3). It therefore seems reasonable to assume that these genes, comprising one half of subgroup III, resulted from recent gene duplications, which implies that they might have more or less similar functions. If one extends this kind of reasoning with some caution to the whole of the analyzed

established subgroups actually represent major functional differences, as these are based on marked differences in their regulatory units. This would reduce the number of functionally different gene types to six, a number that comes close to that found for astacins in other organisms as well. Nevertheless, the fact remains that each NAS gene is expressed and structurally distinct from the others. This constitutes a favorable starting point for the rapid acquisition of new functions, a capacity that might be a prerequisite for the ubiquitous occurrence of C. elegans in nearly all soil types. However, most NAS genes are dispersed over all six chromosomes of C. elegans, which indicates a long evolutionary history of the astacin protein family in the nematodes. The identical and complex arrangement of the seven regulatory domains in NAS-39 and BMP-1 suggests furthermore that this distinct structure has been retained unchanged for long periods and was already present in the common ancestor of nematodes and vertebrates.

Acknowledgements

This study was supported by a grant from the Deutsche Forschungsgemeinschaft, Bonn, to RZ (Zw 17/14–2). We also wish to thank Thorsten Burmester, University of Mainz, Germany, for supporting the Bayesian phylogenetic analysis.

References

1. Pfleiderer, G., Zwilling, R. & Sonneborn, H.H. (1967) On the evolution of endopeptidases, 3: a protease of molecular weight 11,000 and a trypsin-like fraction from Astacus fluviatilis Fabr. Hoppe Seylers Z. Physiol. Chem. 348, 1319–1331.

2. Sonneborn, H.H., Zwilling, R. & Pfleiderer, G. (1969) Evolution of endopeptidases. X. Cleavage specificity of low molecular weight protease from Astacus leptodactylus Esch. Hoppe Seylers Z. Physiol. Chem. 350, 1097–1102.

Date posted: 23/03/2014
