The astacin protein family in Caenorhabditis elegans
1 Institute of Zoology, University of Heidelberg; 2 Max-Planck-Institute for Medical Research, Heidelberg, Germany
In the nematode Caenorhabditis elegans, 40 genes code for astacin-like proteins (nematode astacins, NAS). The astacins are metalloproteases present in bacteria, invertebrates and vertebrates and serve a variety of physiological functions such as digestion, hatching, peptide processing, morphogenesis and pattern formation. With the exception of one distorted pseudogene, all the other C. elegans astacins are expressed and are evidently functional. For 13 genes we found splicing patterns differing from the Genefinder predictions in WormBase, sometimes markedly. The GFP expression pattern for NAS-4 shows a specific localization in anterior pharynx cells and in the whole digestive tract (as the secreted form). In contrast, NAS-7 is found in the head of adult hermaphrodites, but not in pharynx cells or in the lumen of the digestive tract. In embryos, NAS-7 fluorescence becomes detectable just before hatching. In C. elegans astacins, three basic structural and functional moieties can be discerned: a prepro portion, the central catalytic chain and long C-terminal extensions with presumably regulatory functions. Within the regulatory moiety, EGF-like, CUB, SXC, and TSP-1 domains can be distinguished. Based on structural differences of the regulatory unit we established six NAS subgroups, which seemingly represent different functional and evolutionary clusters. This pattern, deduced exclusively from the domain arrangement in the regulatory moiety, is perfectly reflected in an evolutionary tree constructed solely from amino acid sequence information of the catalytic chain. Related catalytic chains tend to have related regulatory extensions. One notable gene, NAS-39, shows a striking resemblance to human BMP-1 and the tolloids.
Keywords: astacin family; Astacus astacus; Caenorhabditis elegans; protein evolution; metalloproteases
The first evidence for the existence of the astacin protein family can be traced back to the year 1967, when one of us (R. Zwilling) observed a proteolytic activity in the digestive fluid of the decapod crayfish Astacus astacus that was different from all other proteases known at that time [1]. Investigations of the cleavage and inhibition specificity confirmed this notion [2–4], and the elucidation of its unique amino acid sequence demonstrated definitively that the crayfish protease represented a new protein family [5]. In subsequent studies, the X-ray crystal structure of 'astacin' was solved to a resolution of 1.8 Å [6]. Astacin was recognized to be a metalloprotease exhibiting a penta-coordinated zinc ion in its active center [7]. In addition, the site of biosynthesis [8], genome organization [9], and mode of activation [10,11] have been elucidated, which made the crayfish protease a prototype for the astacins.
A second member of the astacin protein family was identified when Wang et al. and Wozney et al. (1988) studied the human bone-inducing factor BMP-1, into which a domain with high resemblance to crayfish astacin is inserted [12,13]. After that, many more astacin-like proteins or genes were described in rapid succession in vertebrates, invertebrates and even in prokaryotes [14], where they serve physiological functions as different as food digestion, hatching, peptide processing, morphogenesis and pattern formation (for an overview see [15]). In the crayfish Astacus astacus, a second astacin gene can be found in the embryo that is activated only during a narrow time window just before hatching [16].
In the model organism Caenorhabditis elegans, metalloproteases are present in a great variety, as we have seen in data bank analysis (see also [17]). On the other hand, we have shown recently that the bulk of total proteolytic activity found in crude extracts of mixed-stage populations consists of acidic aspartyl proteases [18,19]. However, with regard to the number of expressed astacin genes, C. elegans surpasses any other organism studied so far. This investigation was therefore stimulated by the question of why this 959-cell organism would need more than 30 different and active astacin genes.
Correspondence to R. Zwilling, Zoologisches Institut, Universität Heidelberg, Im Neuenheimer Feld 230, D-69120 Heidelberg, Germany. Fax: + 49 6221 544913, Tel.: + 49 6221 545887, E-mail: RobertZwilling@t-online.de
Abbreviations: cDNA, complementary DNA; dsRNA, double-stranded RNA; EST, expressed sequence tag; GFP, green fluorescent protein; L1-4, larval stage 1–4; OST, open reading frame sequence tag; RNAi, RNA interference; RT-PCR, reverse transcription-polymerase chain reaction; NAS, nematode astacin.
Note: Supplementary figures are available at
http://www.zoo.uni-heidelberg.de/moehrlen
Note: The sequences and the alignment reported in this paper have
been submitted to the GenBank/EMBL/DDBJ data bank with accession
numbers AJ561200, AJ561201, AJ561202, AJ561203, AJ561204,
AJ561205, AJ561206, AJ561207, AJ561208, AJ561209, AJ561210,
AJ561211, AJ561212, AJ561213, AJ561214, AJ561215, AJ561216,
AJ561217, AJ561218, AJ561219, AJ561220, AJ561221 and
ALIGN_000543.
(Received 3 September 2003, revised 15 October 2003,
accepted 22 October 2003)
Materials and methods
The C. elegans wild-type strain N2 variant Bristol was grown as a liquid culture in S-medium [20] supplemented with Escherichia coli OP50 as food source. The cultures were … When the E. coli food source appeared to have been nearly exhausted, the nematodes, representing a mixed population of adults, all four larval stages and eggs, were harvested and separated from bacteria as described elsewhere [20].
RNA purification
For the isolation of RNA, 100 µg fresh or frozen nematode pellets from a liquid culture were ground by means of a pestle in a mortar containing liquid nitrogen. Total RNA was extracted from the resulting powder following the protocol of Chomczynski and Sacchi [21]. Contamination by genomic DNA was avoided by treating total RNA with DNase I (RNase-free, Boehringer). Poly(A)-rich RNA was isolated by the Oligotex mRNA procedure (Qiagen, Germany).
DNA purification
Genomic DNA was isolated from 1 mL fresh nematodes from a liquid culture using a standard protocol [22].
PCR amplification and cloning
Polyadenylated RNA (1 µg) was converted into … hexamer primer as described [23]. For the amplification of the predicted astacin-like cDNA fragments, specific oligonucleotide primers derived from the genome sequencing data were used. Primer sequences are available online (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig1.htm). PCR amplification was performed on single-stranded cDNA, or on genomic DNA as a control, with 2 U high fidelity … the mutation rate inherent to the PCR reaction. The cycling … 8 min. After PCR, samples were analyzed in 2% agarose gels, and discrepancies between expected and observed size of any PCR product were readily detected on visual inspection of the gels. The PCR products were then excised from 1.5% agarose gels and purified with a NucleoSpin gel-extraction kit (Macherey and Nagel, Germany). The purified fragments were subjected to the SureClone Ligation procedure and cloned into a pUC18 vector according to the manufacturer's instructions (Pharmacia, Sweden).
Plasmid DNA was prepared and subsequently nucleotide sequences were determined by double-strand sequencing according to the dideoxynucleotide chain-termination method, using T7 DNA polymerase (Amersham, Sweden). Universal M13 primers were used for sequencing. All sequences have been deposited in EMBL/GenBank/DDBJ under accession numbers AJ561200, AJ561201, AJ561202, AJ561203, AJ561204, AJ561205, AJ561206, AJ561207, AJ561208, AJ561209, AJ561210, AJ561211, AJ561212, AJ561213, AJ561214, AJ561215, AJ561216, AJ561217, AJ561218, AJ561219, AJ561220, AJ561221.
GFP fusion genes for expression studies
The genomic sequence data in WormBase [24] were used to identify a genomic DNA fragment suitable for fusion to a GFP reporter gene. In order to make sure that the gene-specific promoter and all cis-elements necessary for guiding tissue-specific expression are included in the reporter, the whole upstream region between the gene of interest and the neighboring upstream gene was used. For PCR amplification of the genomic DNA fragment the forward primers NAS-4:GFP/SacI/F1 (5′-CGA GCT CTT GAG TGA AGA TGC CAA GA-3′), NAS7:GFP/BamHI/F1 (5′-CGG GAT CCT TCC GCC AAA GTC ATT TAG-3′), NAS-15:GFP/PstI/F1 (5′-AAC TGC AGC TTT TCG GAA GAC TTT TGC-3′), NAS33:GFP/KpnI/F1 (5′-GGG GTA CCC CGG ACC ACA GTA AAG AAT-3′) and the corresponding reverse primers NAS4:GFP/KpnI/R1 (5′-GGG GTA CCC TGA CAC GCT GAC CCA TAC-3′), NAS7:GFP/KpnI/R1 (5′-GGG GTA CCC GATC CTC GCA TTC TA-3′), NAS15:GFP/KpnI/R1 (5′-GGG GTA CCC GCT GGG TAG TGG AGT TG-3′), NAS33:GFP/SacI/F1 (5′-CGA GCT CTG ACA AGA AAG GCA CAA AG-3′) were used. An 8–10 kb PCR fragment, containing approximately 3–5 kb of upstream sequence and extending down to the last 30–50 codons of the astacin gene, was fused in frame to the reporter gene GFP. Thus, the intergenic region as well as the protein coding regions of the astacin-like genes NAS-4, NAS-7, NAS-15 and NAS-33 were amplified with 2 U Elongase DNA polymerase (Invitrogen, Germany), gel purified (NucleoSpin gel-extraction kit, Macherey and Nagel, Germany) and cloned in frame into a pBD95.85 vector (having the S65C mutation and artificial introns to increase the expression of GFP; A. Fire Vector Kit, Baltimore, USA) according to standard protocols [23,25]. The molecular details of all fusion constructs are available on request. The construct, together with the marker plasmid pBx, was introduced into pha-1 hermaphrodites, and the worms having the constructs as extrachromosomal arrays … under a Zeiss Axiovert 200 microscope.
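The primer names above encode the restriction enzyme whose recognition site was added at the 5′ end. As a minimal sketch (not part of the original protocol), the following Python check confirms that each forward primer contains the site announced in its name. The recognition sequences are the standard ones for these enzymes; the helper names `has_site` and `check_primers` and the dictionary layout are illustrative assumptions.

```python
# Standard recognition sequences of the enzymes named in the primer labels.
RECOGNITION_SITES = {
    "SacI": "GAGCTC",
    "BamHI": "GGATCC",
    "PstI": "CTGCAG",
    "KpnI": "GGTACC",
}

# Forward primers as printed in the text (5' to 3'), keyed by (gene, enzyme).
FORWARD_PRIMERS = {
    ("NAS-4", "SacI"): "CGA GCT CTT GAG TGA AGA TGC CAA GA",
    ("NAS-7", "BamHI"): "CGG GAT CCT TCC GCC AAA GTC ATT TAG",
    ("NAS-15", "PstI"): "AAC TGC AGC TTT TCG GAA GAC TTT TGC",
    ("NAS-33", "KpnI"): "GGG GTA CCC CGG ACC ACA GTA AAG AAT",
}


def has_site(primer: str, enzyme: str) -> bool:
    """Return True if the primer contains the enzyme's recognition site."""
    sequence = primer.replace(" ", "").upper()
    return RECOGNITION_SITES[enzyme] in sequence


def check_primers(primers: dict) -> dict:
    """Map each gene name to whether its primer carries the named site."""
    return {gene: has_site(seq, enzyme) for (gene, enzyme), seq in primers.items()}
```

Calling `check_primers(FORWARD_PRIMERS)` reports, per gene, whether the expected site is present; the same helper can be applied to the reverse primers.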
Sequence analysis and phylogenetic studies
To identify metalloprotease genes in the genome of C. elegans, we used representative vertebrate and insect proteins, or their conserved domains according to the PFAM [26] and PRINTS [27] databases, as queries for BLAST searches [28,29] of WormBase [24]. For astacin genes, the astacin domain, the zinc binding motif or the Met-turn sequences, as listed by PRINTS, were used to repeatedly screen the whole C. elegans genomic sequence available from WormBase.
DNA sequences of all astacin genes were further analyzed … structures were compared to the Genefinder predictions as annotated in WormBase, and to the alternative GenieGene open reading frame predictions of Kent and Zahler [31]. The splicing patterns were subsequently refined using the EST/OST sequences available in the latest WormBase release (WormBase97, 7 March 2003) and the cDNA sequences resulting from this work. Discrepancies between the WormBase and GenieGene predictions and our own cDNA sequences were communicated to those annotating the … docs/WebFig2.htm). The corrected cDNA sequences were … unconfirmed splicing patterns, those protein predictions were used for further analysis which are in accordance with the protein family alignment showing no exceptional … ALIGN_000543).
For identification and annotation of protein domains and the analysis of domain architectures, the tools of the SMART [33], PFAM [26], ProDom [34] and INTERPRO [35] protein domain databases were used.
For phylogenetic studies, the active protease domains, covering the region from Ala-1 to Leu-200 in the prototype crayfish astacin, from the C. elegans astacins and selected other astacin family members were aligned … further manipulation. The alignment is available at the EMBL database with accession number ALIGN_000543. Phylogenetic analyses were carried out using the neighbor-joining method and the Bayesian phylogenetic method … package [37] was used. Distances between the pairs of protein sequences were calculated and corrected for multiple changes according to the PAM001 distance matrix. The reliability of the tree was tested by bootstrap analysis with 100 replications. Bayesian phylogenetic … program [40] with the WAG matrix [41] assuming a gamma distribution of substitution rates. Prior probabilities for all trees and amino acid replacement models were equal; the starting trees were random. Metropolis-coupled Markov chain Monte Carlo sampling was performed with one cold and three heated chains that were run for 50 000 generations. Trees were sampled every 10th generation. Posterior probabilities were estimated on 2000 trees.
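The sampling figures above imply a simple bookkeeping relation. The sketch below works it out under one assumption that the text does not state explicitly: that the 2000 trees used for the posterior probabilities are a subset of the sampled trees, so that the remainder was set aside (for example as burn-in). The function name is hypothetical, and some programs additionally record the starting tree, which would shift the total by one.

```python
def mcmc_sample_counts(generations: int, sample_every: int, trees_used: int):
    """Return (trees sampled in total, sampled trees not used for the posterior)."""
    if generations <= 0 or sample_every <= 0:
        raise ValueError("generations and sample_every must be positive")
    sampled = generations // sample_every
    if trees_used > sampled:
        raise ValueError("cannot use more trees than were sampled")
    return sampled, sampled - trees_used


# Values stated in the text: 50 000 generations, every 10th tree kept,
# posterior probabilities estimated on 2000 trees.
sampled, unused = mcmc_sample_counts(50_000, 10, 2_000)
print(sampled, unused)  # 5000 3000
```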
Results and discussion
During a preliminary database survey we observed in 1996 that the 959-cell organism C. elegans accommodates a surprising number of gene sequences coding for astacin-like proteins, while for other species with a much larger genome not more than 2–3 astacin genes had been reported (G. Geier and R. Zwilling, unpublished).
The complete sequencing of the 97 megabase genome of C. elegans in 1998 [43] then made a thorough analysis possible. The latest WormBase release (WormBase97, 7 March 2003) now contains 21 437 coding sequences when counting 1891 alternate splice forms. Of these, the MEROPS protease database (latest release 6.11: 20 January 2003) lists 382 protease genes (E.C.3.4), of which 158 genes belong to the group of metalloproteases (E.C.3.4.24). The metalloproteases of C. elegans can be arranged into 11 protein clans and subdivided into 27 protein families, according to the nomenclature of Barrett et al. [44]. Our own BLAST searches in WormBase, using protein family consensus sequences according to the PFAM or PRINTS databases as queries, revised the number of identified …
BLAST searches based on the whole astacin domain, the zinc binding motif or the Met-turn sequence revealed some more astacin genes in C. elegans in addition to those listed by MEROPS so far, which finally brought the total number of astacin genes in C. elegans to 40 (Tables 1 and 2).
The nomenclature proposed for these 40 C. elegans astacin genes is in accordance with suggestions of the
Table 1. One hundred and fifty-one genes coding for metalloproteases in C. elegans. Identification of genes was based on data available in MEROPS (the protease database, release 6.11: 20 January 2003, http://merops.sanger.ac.uk) and subsequently corrected by BLAST searches using the genome sequencing data of C. elegans. Nomenclature is according to Barrett et al. [44].
Clan    Protease family                               Number of genes
MA(E)   M1 aminopeptidase                             12
        M2 peptidyl-dipeptidase                       1
        M41 E. coli endopeptidase                     3
        M12A Astacin                                  40
        M12B/C ADAM                                   10
        M14B carboxypeptidase E                       3
        M16B mitochondrial processing peptidase       3
MF      M17 leucyl aminopeptidase                     2
MG      M24A methionyl aminopeptidase I               5
        M20A/B glutamate carboxypeptidase             5
MJ      M38 beta-aspartyl dipeptidase                 1
MK      M22 O-sialoglycoprotein endopeptidase         2
MX      M48A Ste24 endopeptidase                      1
        M67 proteasome regulatory subunit RPN11       3
C. elegans Sequencing Consortium. In Table 2 we have numbered these C. elegans astacins (nematode astacins, NAS) from 1 to 40. The two proteins NAS-23 and NAS-40 (located on cosmids F54B8 and D1022) are not recorded in the WormPep database (predicted proteins from WormBase) but could be detected by a genomic TBLASTN search. … GENSCAN did not predict a complete protein but rather an 88 amino acid fragment which is interrupted by two stop codons.
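A fragment of this kind can be recognized by translating the predicted reading frame and counting in-frame stop codons. The sketch below uses the standard genetic code; the function names are illustrative assumptions, and any sequence passed to them here is a toy example, not the NAS-40 sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; '*' marks a stop codon.

    Trailing bases that do not fill a codon are ignored. Characters other
    than A, C, G, T raise a KeyError in this minimal version.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def internal_stops(dna: str) -> int:
    """Count stop codons that occur before the final codon of the frame."""
    protein = translate(dna)
    return protein[:-1].count("*")
```

An intact open reading frame gives `internal_stops(...) == 0`; a value above zero flags an interrupted frame of the type described for the pseudogene.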
Hishida et al. [45] reported that HCH-1 (= F40E10.1, NAS-34) is required for normal hatching and neuroblast migration in C. elegans. For all other astacin genes, no further details were known beyond the Genefinder protein prediction in WormBase and the partial transcription analysis by the EST or open reading frame sequence tag (OST) projects. It was therefore indispensable, as a first step, to confirm the existence of expression products for each gene.
Transcriptome analysis
Comparing all genomic DNA sequences of astacin genes identified by our BLAST search to the cDNA data of WormBase, it became evident that for 12 of the total of 40 genes EST or OST clones [46,47] were already known (WormBase release 57, 17 December 2001). This confirmed that the 12 genes in question were expressed at the mRNA level.
The remaining 28 genes were analyzed by RT-PCR followed by sequencing of the DNA fragments in order to demonstrate their transcriptional activity. For each gene, specific primer pairs were synthesized, the gene fragments amplified by PCR and the products analyzed on agarose gels (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig1.htm). In each case the PCR reaction with reverse-transcribed RNA was accompanied by a control reaction with genomic DNA. Introns within the amplified DNA regions gave rise to correspondingly larger genomic DNA fragments when compared to their cDNA fragments. For unambiguous identification and for the correction of erroneous splicing pattern predictions, the PCR products of all DNA fragments were eluted from an agarose gel, blunt-end cloned into the vector pUC18 and subsequently … docs/WebFig2.htm).
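The size comparison between the RT-PCR product and the genomic control follows from simple arithmetic: the genomic product contains the exons plus any introns lying between the primers, whereas the cDNA product contains the exons only. A minimal sketch with hypothetical lengths (none are taken from the paper; the function names and the tolerance value are assumptions):

```python
def expected_product_sizes(exon_lengths, intron_lengths):
    """Return (cDNA size, genomic size) in bp for one primer pair."""
    cdna = sum(exon_lengths)
    genomic = cdna + sum(intron_lengths)
    return cdna, genomic


def looks_spliced(observed_cdna_bp, observed_genomic_bp, tolerance_bp=20):
    """True if the genomic band is larger than the cDNA band by more than
    the gel-reading tolerance, i.e. at least one intron was removed."""
    return observed_genomic_bp - observed_cdna_bp > tolerance_bp


# Hypothetical amplicon spanning three exons and two introns.
cdna_bp, genomic_bp = expected_product_sizes([120, 200, 180], [50, 310])
print(cdna_bp, genomic_bp)  # 500 860
```

A cDNA band that matches the genomic band in size would instead suggest either an intron-free amplicon or genomic DNA contamination, which is why the DNase treatment described in the methods matters.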
In combination with the recently available EST and OST sequences (WormBase release 97, 7 March 2003), we found for 13 genes (Table 2) splicing patterns differing from the Genefinder predictions in WormBase, sometimes markedly. In these cases, the experimental cDNA transcripts were in good accordance with the alternative GenieGene open reading frame predictions of Kent and Zahler [31] (Table 2 … WebFig2.htm). For NAS-1, NAS-21, NAS-22 and NAS-28 we observed splice sites differing from both the Genefinder and the GenieGene predictions. The manually corrected cDNA sequences can be found at http://www … new sequence data including corrected gene structures have been submitted to WormBase and EMBL/GenBank/DDBJ databases (for accession numbers, see footnote). The genes NAS-2, NAS-5, NAS-16, NAS-17, NAS-18 and NAS-30 showed no apparent PCR product in our RT-PCR analysis (Table 2, http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm). However, the microarray projects of Hill et al. [48,49], Kim et al. [50], or Jiang et al. [51] (for an overview see WormBase) support the expression of these genes. We would like to point out that this technique cannot verify either the identity or the splicing pattern of a gene, because no sequence data are produced.
Nevertheless, in summary, with the exception of the pseudogene NAS-40, transcriptional activity could be confirmed for all other 39 astacin genes.
Functional analysis
We made an attempt to analyze the function of selected astacin genes in C. elegans by investigating the expression pattern of four representative astacin genes of different subgroups (see section on Structural and phylogenetic analysis, Fig. 2) using GFP-fusion constructs. All astacin-GFP fusions were assayed for expression in animals from embryonic stages onwards. At least three independent transgenic lines were generated from at least two independent clones of each of the astacin-GFP fusion constructs to control for PCR-induced sequence errors. The reporter gene fusions NAS-15::GFP and NAS-33::GFP failed to give detectable expression in any life stage. The fusion protein NAS-4::GFP showed extensive GFP fluorescence throughout the digestive tract in larval stages and in adult worms (Fig. 1A). At higher magnification, we saw GFP staining within pharynx cells of the procorpus, metacorpus, isthmus and terminal bulb, and extracellular staining in the lumen of the terminal bulb (Fig. 1B, arrows). Therefore, NAS-4 most likely is secreted by the pharynx cells into the lumen and then is found in secreted form all the way down in the lumen of the gut.
We conclude from this expression pattern that NAS-4 is associated with digestive functions. Of special interest is the observation that NAS-4 and the digestive enzyme astacin from crayfish [8] have a similar domain arrangement, both lacking a C-terminal extension (see section on Structural and phylogenetic analysis). They also cluster in the … similar functions. These considerations might be extended to the whole subgroup I (Fig. 2, NAS-2–6), which shares these features.
By contrast, NAS-7::GFP staining was observed only in the head of adult hermaphrodites, but not within pharynx cells (Fig. 1C). The expressing cells are located outside of the pharynx, around the metacorpus and the terminal bulb, and could include neurons, cells of the excretory system or gland cells of still unknown function [20]. Reporter gene expression also became detectable in the embryo before hatching (Fig. 1D). While at this moment the function of the gene expressed in the adult remains open, in the embryo it could possibly serve as a hatching enzyme.
To further characterize the function of astacin genes in … Gonczy et al. [52], Fraser et al. [53], Maeda et al. [54], Kamath et al. [55,56], Ashrafi et al. [57], Lee et al. [58] and Pothof et al. [59]. Although nearly all astacin genes have been investigated for gene silencing by RNAi, most of them lack an obvious phenotype and no function could be deduced from the attempted inactivation. Whether this phenomenon reflects incomplete dsRNA interference or a redundancy in function among the high number of expressed astacin genes remains to be established. Strong RNAi phenotypes were observed for NAS-9, -11 and -37 only, revealing these three astacin genes to be essential. Inactivated NAS-9 showed 6% embryonic lethality [54],
Fig. 1. GFP expression pattern images for NAS-4 (A, B) and NAS-7 (C, D). (A) Extensive GFP fluorescence throughout the digestive tract in an adult hermaphrodite and an L2 larva for a NAS-4::GFP fusion gene; 100× magnification. (B) Higher magnification of the head of an adult hermaphrodite showing GFP expression for the same construct in pharynx cells and in the lumen of the terminal bulb; 400× magnification. (C) GFP expression of a NAS-7::GFP fusion gene is found in the head of adult hermaphrodites, but not in pharynx cells or in the lumen of the digestive tract; 300× magnification. (D) In embryos, NAS-7::GFP reporter gene fluorescence became detectable just before hatching; 400× magnification.
NAS-11 showed retarded growth [56] and NAS-37 showed long body deviancy and a molt defect [54,56]. As a rule, astacin gene inactivations had little, if any, effect. One explanation for this could be that C. elegans astacins have overlapping functions, which is also suggested by structural homologies.
Structural and phylogenetic analysis
All known sequence data of astacin-like proteins are derived from cDNA and genomic sequences, with the exception of crayfish astacin, which in addition had been completely sequenced by Edman degradation [5].
The present analysis is based on protein sequences available from the SwissProt, TrEMBL, EMBL, and GenBank databases. If necessary, open reading frames of DNA sequences were translated into amino acid sequences by the HUSAR Package. For C. elegans we used the Genefinder or GenieGene predictions corrected by our cDNA … WebFig1.htm). Altogether, we found over a hundred
Fig. 2. Schematic representation of homologues and domain structures in astacin genes in C. elegans. Pre-pro sequences, catalytic domain and presumably regulatory appendices. Diagram scale is related to amino acid length. Presequences, purple shaded boxes; prosequences, grey oval; astacin domain, red box; six cysteines, SXC; EGF-like, yellow oval; CUB domains, CUB; thrombospondin-1 like, TSP1; low complexity sequences, striped boxes; not specified, open boxes.
… are known at present (http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm). Considering only the eukaryote genomes sequenced completely, six astacin genes are found in human and mouse, and 12 in Drosophila melanogaster. However, the tiny 959-cell organism C. elegans exhibits the striking number of 40 astacin genes, a number by far not reached in any other organism studied up to now. With the only exception of the pseudogene NAS-40, all these genes are expressed and seem to have specific functions. Therefore, these findings not only allow the study of an extraordinary divergence of a protein family within one single organism, but also shed light on a multiple functional fine modulation evolving from a common structural source.
In the astacins, typically three basic structural and functional moieties can be discerned: a pre-pro portion, the catalytic astacin chain, and long C-terminal extensions, which presumably contain messages for proper function (Fig. 2). Pro-sequences are found in all functional C. elegans astacins, while presequences (signal peptides) are lacking in nine genes (Fig. 2). The absence of signal peptides in these genes may reflect specific intracellular functions of non-secreted proteins. On the other hand, the lack of these signal peptides could also reflect problems with the still unconfirmed 5′-gene predictions of Genefinder or GenieGene, as the sequencing data produced here have been limited to PCR-derived fragments and to the reanalysis of EST and OST fragments. In some rare cases in other organisms, prepro structures may be lacking completely, often combined with an N-terminally truncated catalytic domain [Cortunix … exception of the not expressed pseudogene NAS-40) this feature never could be seen. In the central domain of all … been identified in crayfish astacin as essential for catalytic activity [6,7,60,61] are preserved without exception. From this fact it may be concluded that all C. elegans astacins potentially have catalytic activity, too.
… complex C-terminal extensions adjacent to the catalytic domain, which presumably define time and place of their activity (Fig. 2). Based on homology criteria within these appendices, CUB, EGF, SXC, and TSP-1 domains can be discerned, while other sequences must be classified as 'non specific' or having 'low compositional complexity' (LC). LC regions are often Ser/Thr-rich, are found in many astacins and could serve as sites for O-glycosylation. EGF domains are epidermal growth factor like modules (PFAM accession number: PF00008). CUB domains (SMART accession number: SM00042) are named after their occurrence in complement components C1r/C1s, embryonic sea urchin protein Uegf, and BMP-1 [62]. These domains may be involved in calcium binding and protein–protein or enzyme–substrate interactions [63]. The SXC (six-cysteine) motif was observed in several hypothetical C. elegans proteins [64,65] but was originally described in metridin, a toxin from sea anemone, and is also called ShK toxin domain (SMART accession number: SM00254). TSP-1-like domains are thrombospondin type 1 repeats (SMART accession number: SM00209), which are present in several families of metalloproteases, namely in the ADAM-TS proteases (ADAM-TS, a disintegrin-like and metalloproteinase with thrombospondin type I motifs; family M12B/C, see Table 1). TSP-1 domains are reported here for the first time for astacins.
… C-terminal extensions we arranged all 40 C. elegans genes into the subgroups I–VI (Fig. 2). Subgroup I comprises five genes with no C-terminal extension (NAS-1), or with short, unspecific extensions, where probably no specific signals can be accommodated. Subgroup II exhibits in its 10 genes exclusively the SXC domain, while other domain types are completely lacking. The SXC domain appears in a single, double or triple arrangement, and the domains may be attached directly to the catalytic chain or separated from it and from each other by short, unspecific sequences. A tandem-like arrangement can only be seen with these SXC domains, while other domain types are represented only once in a regulatory chain (for an exception see subgroup VI). Subgroup III combines 15 genes that typically have an EGF-like domain directly attached to the catalytic chain, followed by a CUB domain. In gene NAS-18 the CUB domain and in gene NAS-21 the EGF-like domain is missing. In subgroup IV (two genes) an SXC domain and in subgroup V (six genes) a TSP-1 domain is added to EGF and CUB domains,
Fig. 3. Phylogenetic relationship of the astacins, including all C. elegans astacin proteins (shaded yellow) and selected examples from other organisms. The tree was deduced by Bayesian and neighbor-joining analysis based on the alignment of the amino acid sequences of the catalytic chain. At branching points, Bayesian posterior probabilities and bootstrap values greater than 50 of 100 replications (values in parentheses) are given as an indication of the confidence of the tree presented. The scale bar represents a distance of 0.1 accepted point mutations per site (PAM). Evolutionary subgroups of the astacin protein family are indicated on the right side. The schematic representation of the protein domains (colored bars) corresponds to that in Fig. 2. Meprin domains: MAM domain, MAM; MATH domain, MATH; I-domain, I; intervening sequence, inter; transmembrane domain, TM; cytoplasmic domain, c. For an overview, see [66]. Abbreviations and Swissprot/TREMBL/PIR accession numbers of the astacins: AA Astacin, Astacus astacus (crayfish) astacin (P07584); AC TBL-1, Aplysia californica TBL-1 (P91972); AJ EHE-4, Anguilla japonica (fish) EHE-4 (Q90Y89); CC Nephrosin, Cyprinus carpio (fish) Nephrosin (O42326); DM Tolloid and Tolkin, Drosophila melanogaster Tolloid (P25723) and Tolkin (Q23995); FM Flavast, Flavobacterium meningosepticum flavastacin (Q47899); HS BMP-1, Homo sapiens bone morphogenetic protein 1 (Q14874); HS Meprin A and B, Homo sapiens Meprin α (Q16819) and β (Q16820); HS TLL and TLL-2, Homo sapiens Tolloid like 1 (Q9NQS4) and 2 (Q9UQ00); HV HMP-2, Hydra vulgaris (Cnidaria) Metalloprotease 2 (Q9XZG0); MM BMP-1, Mus musculus BMP-1 (I49540); MM Meprin A and B, Mus musculus Meprin α (P28825) and β (Q61847); OL LCE and HCE-1, Oryzias latipes (fish) low choreolytic enzyme (P31581) and high choreolytic enzyme 1 (EMBL:M96170); PC PMP-1, Podocoryne carnea (Cnidaria) Metalloprotease 1 (O62558); PL BP-10, Paracentrotus lividus (sea urchin) blastula protease 10 (P42674); SP BMP-H, Strongylocentrotus purpuratus (sea urchin) BMP-1 homolog (P98069); SP SPAN, Strongylocentrotus purpuratus (sea urchin) SPAN (P98069); TR MP, Takifugu rubripes (fish) HCE-1 (AAL40376); XL BMP-1, Xenopus laevis BMP-1 (P98070).
which show an identical arrangement as in subgroup III. Subgroup VI is a special case: the only entry, NAS-39, shows a striking similarity to human bone-inducing factor BMP-1. A comparison between both proteins reveals a sequence identity of the catalytic chains of 74%, while for other nematode astacins this value reaches on average only 40%. Xolloid (Xenopus), tolloid and tolkin (Drosophila) and TBL-1 (Aplysia) also have corresponding structures. The number and arrangement of CUB and EGF domains are identical in these genes. NAS-39 exceeds in its length by far all other C. elegans genes. It will be interesting to see what physiological role a factor so closely resembling human BMP-1 might perform in … primordial functions from which human BMP-1 has evolved. The distinctive and complex pattern which appears in the subgroups I–VI seems to provide a specific function for each C. elegans astacin gene. Members of the same subgroup might have similar or identical functions.
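The subgroup rules described above can be written down as a small decision function. The sketch below is a simplification under stated assumptions: domain labels ("EGF", "CUB", "SXC", "TSP1") and the function name are hypothetical, subgroup VI is recognized by repeated CUB or EGF domains (the text only says that tandem arrangements of non-SXC domains occur there), and members of subgroup III that lack one of the two domains (NAS-18, NAS-21) are still placed in III. Architectures the text does not describe fall through to subgroup I.

```python
from collections import Counter

# Number of genes per subgroup as quoted in the text.
SUBGROUP_SIZES = {"I": 5, "II": 10, "III": 15, "IV": 2, "V": 6, "VI": 1}


def classify_subgroup(domains):
    """Assign a subgroup from the domains of the C-terminal extension.

    `domains` is a list of labels; unknown labels (e.g. low-complexity
    regions) are ignored.
    """
    counts = Counter(domains)
    has_egf_or_cub = counts["EGF"] > 0 or counts["CUB"] > 0
    if counts["CUB"] > 1 or counts["EGF"] > 1:
        return "VI"   # repeated CUB/EGF domains, BMP-1-like (assumed rule)
    if has_egf_or_cub and counts["TSP1"] > 0:
        return "V"    # EGF/CUB plus a TSP-1 domain
    if has_egf_or_cub and counts["SXC"] > 0:
        return "IV"   # EGF/CUB plus an SXC domain
    if has_egf_or_cub:
        return "III"  # EGF and/or CUB only
    if counts["SXC"] > 0:
        return "II"   # one to three SXC domains, nothing else
    return "I"        # no extension or an unspecific one
```

The sizes listed in `SUBGROUP_SIZES` add up to 39, which matches the number of expressed genes rather than the full set of 40.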
We constructed a phylogenetic tree comprising all 39 expressed C. elegans astacins and, in addition, selected astacin proteins from a variety of other organisms (Fig. 3). The tree is based on a multiple alignment of the amino acid sequence of the active protease domain, covering the region from Ala1 to Leu200 in the prototype, crayfish astacin. Results were corrected with the help of the known secondary structures and conserved regions of crayfish astacin. The alignment has been submitted to the EMBL databank with accession number ALIGN_000543.
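Sequence identity figures such as those quoted for the catalytic chains are derived from aligned sequences. The sketch below shows one common convention (identical positions divided by the aligned positions at which neither sequence has a gap); other denominators are in use, and the text does not state which was applied. The function name is an assumption, and any sequences passed to it here are toy strings.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * identical / compared
```

For example, `percent_identity("ACDE-", "ACDFG")` compares four columns, three of which match.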
Phylogenetic relationships were initially established on
program package. As outgroup we used the phylogenetically
most remote flavastacin from bacteria. However, an
isolated occurrence of an astacin sequence in a single
bacterial species could be due to a lateral gene transfer,
which would render this sequence unsuitable as an
outgroup. Because recently at least one more astacin-like
protein has been detected in bacteria
(http://www.zoo.uni-heidelberg.de/moehrlen), lateral gene transfer is most
unlikely. Moreover, we also tried the phylogenetically
remote Cnidaria astacins (HMP-2 and PMP-1) as an
outgroup, which gave exactly the same phylogenetic tree.
For statistical verification a consensus tree including 100
sequences was calculated and bootstrap values were
established for each point of divergence. However, the
phylogenetic tree based on the neighbor-joining method showed
rather low bootstrap values (< 50) for the most ancestral
nodes (Fig. 3). Pro sequences could not be used additionally
to strengthen these branching points because they
differ extremely in length, change rapidly or are
lacking completely. A similar consideration can be made for
the C-terminal extensions. The robustness of the tree was
therefore verified additionally by the Bayesian phylogenetic
method. With this analysis the confidence in the tree
increased significantly, as reflected in high posterior
probabilities. The evolutionary tree now presented in Fig. 3
summarizes all above-mentioned approaches and
therefore exhibits the best reliability.
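The bootstrap verification described above rests on resampling alignment columns with replacement and asking how often a given grouping reappears. The sketch below illustrates only that resampling idea. As a deliberately simplified stand-in for a full neighbor-joining tree, it scores how often two sequences remain each other's closest pair by uncorrected p-distance. All function names, sequence names and toy sequences are hypothetical, and this is not the program package or the procedure the authors used.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns without gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def resample_columns(alignment: dict, rng: random.Random) -> dict:
    """One bootstrap replicate: draw alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in cols) for n in names}


def closest_pair(alignment: dict) -> frozenset:
    """Pair of sequence names with the smallest p-distance.

    Ties are resolved in favour of the first pair in sorted name order.
    """
    names = sorted(alignment)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = p_distance(alignment[a], alignment[b])
            if best is None or d < best[0]:
                best = (d, frozenset((a, b)))
    return best[1]


def pair_support(alignment: dict, pair, replicates: int = 100, seed: int = 0) -> float:
    """Percentage of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    target = frozenset(pair)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == target
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


# Invented toy alignment for illustration only (not real astacin sequence).
toy_alignment = {
    "seq_A": "HEIGHALGFWHEQSRPDRD",
    "seq_B": "HEIGHAVGFWHEQSRPDRD",
    "seq_C": "HELMHAIGFYHEHTRMDRD",
    "seq_D": "HEFLHALGFWHEQSRADRD",
}

if __name__ == "__main__":
    print(pair_support(toy_alignment, ("seq_A", "seq_B")))
```

In a real analysis each replicate would be passed to a tree-building method, and support would be recorded per node of the consensus tree rather than per closest pair.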
From this analysis it becomes evident that similar
sequences of the catalytic chain tend to have similar
C-terminal extensions (Fig. 3). All 39 complete NAS
proteins can be subdivided into two different types: one
having CUB domains in the regulatory moiety, and
another in which these are lacking completely (see also
Fig. 2). This pattern is clearly reflected in the amino acid
sequence-based phylogenetic tree, where all NAS proteins
exhibiting a CUB domain come closely together in one
cluster (Fig. 3). The CUB domain is almost always preceded
by an EGF domain (the exception is NAS-21). To these either no
further segments are attached (subgroup III), or an SXC
domain (subgroup IV) or a TSP-1 domain (subgroup V)
might follow. The second cluster comprises the NAS-1 to
NAS-15 proteins, characterized by having no distinct
extensions (subgroup I) or showing one, two or three
SXC domains (subgroup II). NAS-39 (subgroup VI) is
strikingly different from all other C. elegans astacins, but
fits perfectly into the BMP-1/tolloid group, on the basis of both the sequence homologies and the complex but identical arrangement of the five CUB and the two EGF segments (Figs 2 and 3).
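The subgroup rules described in this paragraph can be written down as a small decision procedure over the domains found in the C-terminal extension. The sketch below is only an illustration of those rules as stated in the text. The function name, the domain labels and the example architectures are assumptions, and the domain order in the subgroup VI example is illustrative, since the classification here uses domain counts only.

```python
from collections import Counter

KNOWN_DOMAINS = {"EGF", "CUB", "SXC", "TSP-1"}


def classify_subgroup(c_terminal_domains):
    """Return the NAS subgroup ("I".."VI") implied by the C-terminal domains.

    Returns None when the architecture matches none of the six patterns
    described in the text.
    """
    counts = Counter(c_terminal_domains)
    unknown = set(counts) - KNOWN_DOMAINS
    if unknown:
        raise ValueError(f"unknown domain label(s): {sorted(unknown)}")

    if not counts:
        return "I"  # no distinct C-terminal extension

    if counts["CUB"] == 0:
        # Without CUB domains, only one to three SXC domains are described.
        if set(counts) == {"SXC"} and 1 <= counts["SXC"] <= 3:
            return "II"
        return None

    # CUB-containing architectures.
    if set(counts) == {"CUB", "EGF"} and counts["CUB"] == 5 and counts["EGF"] == 2:
        return "VI"  # BMP-1/tolloid-like: five CUB and two EGF segments
    if counts["SXC"] and not counts["TSP-1"]:
        return "IV"  # EGF/CUB followed by an SXC domain
    if counts["TSP-1"] and not counts["SXC"]:
        return "V"   # EGF/CUB followed by a TSP-1 domain
    if not counts["SXC"] and not counts["TSP-1"]:
        return "III"  # EGF/CUB only; EGF may be absent, as noted for NAS-21
    return None


if __name__ == "__main__":
    # Hypothetical example architectures, one per subgroup.
    examples = {
        "no extension": [],
        "two SXC": ["SXC", "SXC"],
        "EGF-CUB": ["EGF", "CUB"],
        "EGF-CUB-SXC": ["EGF", "CUB", "SXC"],
        "EGF-CUB-TSP-1": ["EGF", "CUB", "TSP-1"],
        "tolloid-like": ["CUB", "CUB", "EGF", "CUB", "EGF", "CUB", "CUB"],
    }
    for label, domains in examples.items():
        print(label, "->", classify_subgroup(domains))
```

Returning None for unmatched architectures, rather than forcing a subgroup, keeps the sketch from asserting a classification the text does not describe.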
One might wonder about the expression of such a large number of related but different astacin genes in a 959-cell organism. Potentially all these genes could have different functions, as each shows at least clear, and in some cases marked, structural differences. However, much of this divergence seems to be due to relatively recent gene duplications.
In the closely related species Caenorhabditis briggsae the genes NAS-16, -18, -19, -22, -24 and the pseudogene NAS-40 are missing. C. elegans and C. briggsae share, however, the neighboring genes NAS-17, -20, and -21. In addition, these genes show a tandem-like arrangement in clusters and are all located on chromosome V, where
NAS-16, -17, -18, -19 form one cluster and, separated by several other genes, a second cluster comprising NAS-20, -21, -22 can be found. These notions are also supported by the position of these genes in the evolutionary tree (see Table 2, and Figs 2 and 3). It therefore seems reasonable to assume that these genes, comprising one half of subgroup III, resulted from recent gene duplications, which implies that they might have more or less similar functions. If one extends this kind
of reasoning with some caution to the whole of the analyzed
established subgroups actually represent major functional differences, as these are based on marked differences in their regulatory units. This would reduce the number of functionally different gene types to six, a number that comes close to that found for astacins in other organisms. Nevertheless, the fact remains that each NAS gene is expressed and structurally distinct from the others. This constitutes a favorable starting point for the rapid acquisition of new functions, a capacity that might be a prerequisite for the ubiquitous occurrence of C. elegans in nearly all soil types. However, most NAS genes are dispersed over all six chromosomes of C. elegans, which indicates a long evolutionary history of the astacin protein family in the nematodes. The identical and complex arrangement of the seven regulatory domains in NAS-39 and BMP-1 suggests furthermore that this distinct structure has been retained unchanged for long periods and was already present in the common ancestor of nematodes and vertebrates.
Acknowledgements
This study was supported by a grant from the Deutsche Forschungsgemeinschaft, Bonn, to RZ (Zw 17/14–2). We also wish to thank Thorsten Burmester, University of Mainz, Germany, for supporting the Bayesian phylogenetic analysis.
References
1 Pfleiderer, G., Zwilling, R. & Sonneborn, H.H. (1967) On the evolution of endopeptidases, 3: a protease of molecular weight 11,000 and a trypsin-like fraction from Astacus fluviatilis Fabr. Hoppe Seylers Z. Physiol. Chem. 348, 1319–1331.
2 Sonneborn, H.H., Zwilling, R. & Pfleiderer, G. (1969) Evolution of endopeptidases. X. Cleavage specificity of low molecular weight protease from Astacus leptodactylus Esch. Hoppe Seylers Z. Physiol. Chem. 350, 1097–1102.