The C-type lectin-like domain superfamily
Alex N Zelensky and Jill E Gready
Computational Proteomics and Therapy Design Group, John Curtin School of Medical Research, Australian National University, Canberra, Australia, Subdivision: Proteomics
Introduction
The superfamily of proteins containing C-type lectin-like domains (CTLDs) is a large group of extracellular Metazoan proteins with diverse functions. It has been the subject of some general literature reviews [1,2], and of many more that focus on its particular functions (e.g. [3,4]). There are also several systematic studies [5–9]. A classification of the family members based on the overall domain architecture of the CTLD-containing proteins (CTLDcps), which was introduced by Drickamer in 1993 [2] and updated recently [6], has served as a useful framework for studies of the superfamily. However, despite a voluminous literature describing some of the family’s properties in great detail, we feel that a fresh critical review would be useful, as the previous review of this scale was published more than a decade ago [2]. Our approach has several main goals, outlined below.
Keywords
C-type lectin-like domain; domain
superfamily; protein evolution; carbohydrate
binding
Correspondence
J E Gready, Computational Proteomics and
Therapy Design Group, Division of
Molecular Bioscience, John Curtin School of
Medical Research, PO Box 334, Canberra
The superfamily of proteins containing C-type lectin-like domains (CTLDs) is a large group of extracellular Metazoan proteins with diverse functions. The CTLD structure has a characteristic double-loop (‘loop-in-a-loop’) stabilized by two highly conserved disulfide bridges located at the bases of the loops, as well as by a set of conserved hydrophobic and polar interactions. The second loop, called the long loop region, is structurally and evolutionarily flexible, and is involved in Ca2+-dependent carbohydrate binding and interaction with other ligands. This loop is completely absent in a subset of CTLDs, which we refer to as compact CTLDs; these include the Link/PTR domain and bacterial CTLDs. CTLD-containing proteins (CTLDcps) were originally classified into seven groups based on their overall domain structure. Analyses of the superfamily representation in several completely sequenced genomes have added 10 new groups to the classification, and have shown that it is applicable only to vertebrate CTLDcps; despite the abundance of CTLDcps in the invertebrate genomes studied, the domain architectures of these proteins do not match those of the vertebrate groups. Ca2+-dependent carbohydrate binding is the most common CTLD function in vertebrates, and apparently the ancestral one, as suggested by the many humoral defense CTLDcps characterized in insects and other invertebrates. However, many CTLDs have evolved to recognize protein, lipid and inorganic ligands specifically, including clade-specific vertebrate proteins such as snake venom proteins, fish antifreeze proteins and bird egg-shell proteins. Recent studies highlight the functional versatility of this protein superfamily and of the CTLD scaffold, and suggest that further interesting discoveries are yet to be made.
Abbreviations
CRD, carbohydrate recognition domain; CTLD, C-type lectin-like domain; CTLDcp, CTLD-containing protein; DC-SIGN, Dendritic cell-specific ICAM-grabbing nonintegrin; EST, expressed sequence tag; MBP, mannose-binding protein; NK, natural killer cell; PSP, pulmonary surfactant protein; PTR, protein tandem repeat.
The literature is strongly biased towards several groups of mammalian proteins, many of them of biomedical interest. In this review we have tried to capture the superfamily in all its variety, rather than to describe the known members in proportion to the amount of published data. In particular, we wanted to integrate the results of the systematic studies of CTLDs from lower vertebrates, such as snake venom proteins and fish CTLDs, with the classification of mammalian CTLDs. The recent inclusion of new CTLDcp groups prompted a critical reassessment of the principles on which the current domain-based classification was built. We also wanted to summarize the functional data on invertebrate CTLDs, which, to our knowledge, have never been reviewed at a general level.
In addition, numerous structural studies of CTLDs in the last decade have provided much information on the inner workings of the fold and the mechanisms of Ca2+-dependent carbohydrate binding. We have attempted to generalize these data and outline the most common elements of the domain. An important correlation between the residue composition of the primary carbohydrate-binding site and its basic specificity towards mannose- or galactose-group monosaccharides was discovered early in the history of CTLD studies and remains the most useful means of predicting CTLD function. However, several models proposed to explain the mechanism of this correlation had to be rejected as the volume of data grew, and no comprehensive explanation of this fundamental phenomenon has been published. Our goal was to analyze the current literature on this problem, to see whether an explanation is apparent.
Finally, we wanted to address the inconsistencies in CTLDcp superfamily terminology that exist in the literature, and to suggest clear definitions for the relevant terms.
The CTLD superfamily
A brief history of discovery
C-type lectins were among the first animal lectins discovered. Bovine conglutinin, which belongs to the collectin group of C-type lectins, has been known since 1906, and the agglutinating activity of snake venom lectins was first described much earlier, in 1860 [10]. In 1988 Drickamer proposed organizing animal lectins into several categories, and classified Ca2+-dependent lectins structurally similar to the asialoglycoprotein receptor as the C-type lectin group [11]. Since then, the known family has grown significantly, and now includes more than a thousand identified members (including those known from genome sequences only) from different animal species, most of which lack lectin activity.
Term definitions: CTLD, CRD, C-type lectin
The terms ‘C-type lectin’, ‘carbohydrate-recognition domain’ (CRD), ‘C-type lectin domain’ (CTLD) and ‘C-type lectin-like domain’ (also abbreviated as CTLD) are often used interchangeably in the literature, which may be a source of confusion. The history of the introduction and the common meanings of these terms are outlined below, followed by the definitions we will use in this review.
The term ‘C-type lectin’ was introduced to distinguish a group of Ca2+-dependent (C-type) carbohydrate-binding (lectin) animal proteins from the other (Ca2+-independent) types of animal lectins. When the structures of C-type lectins were established biochemically and the functions of their different domains were defined, it was found that carbohydrate-binding activity was mediated by a compact module, the ‘carbohydrate-recognition domain’ (CRD), which was present in all Ca2+-dependent lectins but not in other types of animal lectins [11–13]. Comparison of CRD sequences from different C-type lectins revealed conserved residue motifs characteristic of the domain [2,11,13], which allowed the discovery of many more proteins that contained it. At the same time, crystallographic studies confirmed that the CRD of the C-type lectins has a compact globular structure that was not similar to any known protein fold [14]. This domain has been called the ‘C-type CRD’ or the ‘C-type lectin domain’. As the number of determined sequences grew, it became clear that not all proteins containing C-type CRDs can actually bind carbohydrates or even Ca2+. To resolve this contradiction, a more general term, ‘C-type lectin-like domain’, was introduced to refer to such domains [1,3]. The usage of this term is, however, somewhat ambiguous: it is used both as a general name for the group of domains with sequence similarity to C-type lectin CRDs (regardless of their carbohydrate-binding properties), and as the name of the subset of such domains that do not bind carbohydrates, with the subset that does bind carbohydrates being called C-type CRDs [6,8]. Also, both ‘C-type CRD’ and ‘C-type lectin domain’ are still used for C-type lectin homologues that do not bind carbohydrate (e.g. [15–17]), and the group of proteins containing the domain is still often called the ‘C-type lectin family’ or ‘C-type lectins’, although most of them are not in fact lectins. The abbreviation CRD is used both in the meaning of ‘C-type carbohydrate-recognition domain’ and in the more general meaning of ‘carbohydrate-recognition domain’, which encompasses domains from different lectin groups [8]. Occasionally CRD is also used to designate the short amino-acid motifs within CTLDs that directly interact with Ca2+ and carbohydrate (e.g. [18]).
Structure comparisons add another meaning to the definition of the C-type lectin domain, as structural similarities have been discovered between C-type lectin CRDs and protein domains that show no significant sequence similarity to any of the known C-type lectins but adopt a similar fold [19–23]. As the fold is very unusual, these domains have been placed into a common group in structure classification databases. For example, in the SCOP database [24] C-type lectins and structurally related domains are grouped at the fold level (‘C-type lectin-like fold’), which is the second level from the top of the classification hierarchy. However, although the structural similarity is often acknowledged in the literature, the common meaning of the term C-type lectin-like domain does not include these domains [1,6].
Here we will use the term ‘C-type lectin-like domain’ (CTLD) in its broadest sense, to refer to protein domains that are homologous to the CRDs of the C-type lectins or whose structure resembles that of the prototypic C-type lectin CRD. Proteins harboring this domain will be called CTLD-containing proteins (CTLDcps) rather than the more common ‘C-type lectins’, as the latter term implies a carbohydrate-binding ability that most CTLDcps are not known to possess.
Phylogenetic distribution, groups
With a few exceptions, discussed below, CTLDs are found only in Metazoa, where they occur extracellularly. The domain has been a very popular evolutionary framework for generating new functions and is found in various structural and functional contexts. CTLDcps are ubiquitous in multicellular animals, and are found in a broad range of species, from sponges to humans [6,25]. CTLDcp-encoding genes have been found in all fully sequenced Metazoan genomes, generally in large numbers. For example, the CTLD is the seventh most abundant domain family in Caenorhabditis elegans [26]. The family shows both evolutionary flexibility and conservation. Whole-genome studies have shown that although there are virtually no similarities between CTLDcps from worm, fruit fly and vertebrates [8], relatively few modifications occurred within the vertebrate lineage during evolution from fish to mammals [9], with some members showing sequence conservation approaching that of histones.
Non-Metazoan CTLDs
There are several interesting examples of non-Metazoan CTLDcps, which can be divided into two groups. Members of the first group come from parasitic bacteria and viruses; these are involved in interactions with the animal host and are either hijacked host proteins or their imitations. This group includes bacterial toxins (pertussis toxin [23] and proaerolysin [22]), outer membrane adhesion proteins (intimin from enteropathogenic Escherichia coli [21] and invasin from Yersinia pseudotuberculosis [27]) and viral proteins. Viral CTLDcps are either transmembrane proteins or structural envelope proteins, and include, for example, eight ORFs in the fowlpox virus genome [28] and proteins from vaccinia virus [29,30], African swine fever virus [31], cowpox virus [32], avian adenovirus gal1 [33], myxoma virus [34], molluscum contagiosum virus [35], Epstein-Barr virus [36] and alcelaphine herpesvirus [37]. Unlike bacterial CTLDs, which were assigned to the CTLD superfamily on the basis of structural similarity only, viral proteins contain a canonical CTLD with significant similarity to those in mammalian CTLDcps.
While the presence of CTLDcps in parasites has an obvious rationale, the origin of another group of non-Metazoan CTLDcps is unclear. We have found three proteins that can be assigned to this group: two proteins from plants, and a putative protein encoded by an ORF from the marine planctomycete Pirellula sp. (GenBank ID: 32443381). The latter sequence, which is 7716 amino acids long and is encoded by the largest ORF in the genome of that bacterium [38], contains several C-type lectin-like, laminin G and cadherin domains, all of which are found almost exclusively in Metazoa. The most parsimonious explanation for the presence of all these domains in the Pirellula genome is horizontal gene transfer, but the function of the protein harboring them is a mystery, as Pirellula are free-living species. The plant CTLDcp sequences originate from the Arabidopsis thaliana genome annotation (transcript IDs At4g22160 and At1g52310) and have not been characterized functionally. At1g52310 is a transmembrane protein with a typical CTLD in the extracellular domain and a protein kinase domain in the cytoplasmic part; it has a well-conserved orthologue in the rice genome sequence.
It is not entirely clear whether the CTLD superfamily is monophyletic, as homology between the canonical and some of the compact CTLDs (see below) cannot be confidently established. There seems little doubt that the Link domain group of CTLDs emerged through deletion of the long loop region from an ancestral canonical CTLD, because the Link domains have a much narrower phylogenetic distribution (they are found only in vertebrates), are less diverse, and show detectable sequence similarity to the canonical CTLDs [19]. However, the evolutionary relationship of the compact CTLDs from the bacterial toxins to the animal CTLDs is uncertain [39]. These domains could either have been acquired by horizontal transfer or have arisen by convergent evolution, as mimics of host proteins.
The CTLD fold
The CTLD fold has a double-loop structure (Fig 1). The overall domain is a loop, with its N- and C-terminal β strands (β1, β5) coming close together to form an antiparallel β-sheet. The second loop, called the long loop region, lies within the domain; it enters and exits the core domain at the same location. Four cysteines (C1–C4), which are the most conserved CTLD residues, form disulfide bridges at the bases of the loops: C1 and C4 link β5 and α1 (the whole-domain loop), and C2 and C3 link β3 and β5 (the long loop region). The rest of the chain forms two flanking α helices (α1 and α2) and the second (‘top’) β-sheet, formed by strands β2, β3 and β4. The long loop region is involved in Ca2+-dependent carbohydrate binding, and in domain-swapping dimerization of some CTLDs (Fig 2), which occurs via a unique mechanism [40–44]. The conserved positions involved in CTLD fold maintenance and their structural roles have been discussed in detail elsewhere [5]. In addition to the four conserved cysteines, one other sequence feature needs to be mentioned here: the ‘WIGL’ motif. It is located on the β2 strand, is highly conserved and serves as a useful landmark for sequence analysis.
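Because the ‘WIGL’ motif is described above only as a conserved landmark, the following is a minimal Python sketch, not an established tool, of how one might scan a sequence for it. It assumes the motif may tolerate a small number of substitutions; the function name `find_wigl_landmark`, its `max_mismatches` parameter and the toy sequences are hypothetical.

```python
def find_wigl_landmark(seq: str, max_mismatches: int = 1) -> list[int]:
    """Return 0-based start positions of WIGL-like tetrapeptides.

    Hypothetical heuristic: every 4-residue window that differs from
    'WIGL' at no more than `max_mismatches` positions is reported.
    """
    motif = "WIGL"
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        # Count position-by-position differences (Hamming distance).
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


print(find_wigl_landmark("MKWVGLTE"))  # [2]
```

The one-substitution tolerance is an arbitrary choice made for illustration; a practical search would more likely use a profile built from an alignment of known CTLDs.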
Variations of the fold: canonical, compact, long, short
Structurally, CTLDs can be divided into two groups: canonical CTLDs having a long loop region, and compact CTLDs that lack it (Fig 2). The second group includes Link or protein tandem repeat (PTR) domains [19,20] and bacterial CTLDs [27,39,45]. Another family usually included in the CTLD superfamily is that of endostatin [1,24,46]. However, in a comparative structure analysis [5], we did not find substantial similarity between the CTLD and endostatin folds, apart from the general topology. As sequence similarity between endostatin and CTLDs is also absent, we do not consider the endostatin fold an example of a CTLD and do not discuss it further.
Fig 1 CTLD structure. A cartoon representation of a typical CTLD structure (1k9i). The long loop region is shown in blue. Cystine bridges are shown as orange sticks. The cystine bridge specific for long-form CTLDs (C0–C0′) is also shown.
Fig 2 Variation of the long loop region structure. Three common forms of the CTLD long loop region are shown. Panels (A) and (C) show canonical CTLDs in which the long loop region is tightly packed (A) or flipped out to form a domain-swapping dimer (C). A compact CTLD, the Link domain of human CD44, is shown in panel (B). The core domain and long loop region are colored green and blue, respectively.
Another subdivision of CTLDs is based on the presence of a short N-terminal extension, which forms a β-hairpin at the base of the domain (Fig 1). CTLDs containing such an extension are called ‘long form’. The hairpin is stabilized by an additional cystine bridge, and the presence of these two additional cysteines at the beginning of the CTLD sequence is used to distinguish between long- and short-form CTLDs in sequence analysis. No systematic study of the N-terminal extension, or of its possible roles, has been published.
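The cysteine-based distinction between long- and short-form CTLDs can be sketched as a simple count. This minimal sketch assumes the input is an already delimited CTLD whose only cysteines are the disulfide-forming ones (C0 and C0′ plus C1–C4 in a long form; C1–C4 in a short form). CTLDs carrying extra cysteines would need positional checks instead. The function name and toy sequences are hypothetical.

```python
def classify_ctld_form(domain_seq: str) -> str:
    """Crude long/short-form call for a delimited CTLD sequence.

    Hypothetical heuristic: six cysteines (C0, C0', C1-C4) -> 'long',
    four cysteines (C1-C4) -> 'short', any other count -> 'undetermined'.
    """
    n_cys = domain_seq.upper().count("C")
    if n_cys == 6:
        return "long"
    if n_cys == 4:
        return "short"
    return "undetermined"


print(classify_ctld_form("ACACACAC"))  # short
```

Returning 'undetermined' rather than forcing a call keeps the heuristic honest when the cysteine count fits neither pattern.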
Secondary structure element numbering
Although the CTLD fold is very well conserved among its known representatives, there is no general agreement in the literature on the numbering of CTLD secondary structure elements. The numbering scheme of the first solved CTLD structure (rat MBP-A [14]) included five strands, two helices and four loops. However, this description turned out to be insufficient, as MBP lacks some secondary structure elements that are present in long-form CTLD structures, while other small strands were not defined. Other reports describing CTLD structures with a different number of secondary structure elements from MBP-A either introduced their own numbering (β strands 1–6 in the asialoglycoprotein receptor (ASGPR) [47]; six β strands in the Link module, with labeling not consistent with ASGPR or MBP-A [20]; β1–β7 in NKG2D [48]; β1–β8 in EMBP [49]), or extended the naming scheme used for MBP-A (Ly49A secondary structure element numbering is consistent with that in MBP-A [50]). For consistency we will use a universal numbering scheme ([5], Fig 3), taking the same approach as was used in the Ly49A structure; this allows direct reference to the most studied CTLD structures (MBP-A and -C) and assigns individual numbers to the elements that are present throughout the family. Other elements are given derived names and numbers: the β strand specific for the long-form CTLD is labeled β0, the short β strand between α1 and α2 is labeled β1′, and the two β strands forming a hairpin C-terminal to β2 are labeled β2′ and β2″.
Fig 3 CTLD secondary structure element numbering. Ribbon diagrams of a compact (intimin, 1f00) and a canonical (E-selectin, 1g1t) CTLD structure. The long loop region in E-selectin, and the short α helix that replaces the long loop region in compact CTLDs, are shown in black. Secondary structure elements are numbered according to the universal numbering scheme [5].
Ca-binding sites
Four Ca2+-binding sites are found in CTLDs
Four Ca2+-binding sites recur in CTLD structures from different groups (Fig 4). Site occupancy depends on the particular CTLD sequence and on the crystallization conditions [14,51]; in different known structures zero, one, two or three sites are occupied. Sites 1, 2 and 3 are located in the upper lobe of the structure, while site 4 is involved in salt bridge formation between α2 and the β1/β5 sheet.
Sites 1 and 2 were observed in the structure of rat MBP-A complexed with holmium, which was the first CTLD structure determined [14]. Site 3 was first observed in the MBP-A complex with Ca2+ and an oligomannose asparaginyl-oligosaccharide [51]. It is located very close to site 1, and all the side chains coordinating Ca2+ in site 3 are also involved in site 1 formation. As biochemical data indicate that MBP-A binds only two calcium ions [52], Ca2+-binding site 3 is considered a crystallographic artifact [51]. However, in many CTLD structures where site 1 is occupied, a metal ion is also found in site 3; examples include the structures of DC-SIGN and DC-SIGNR [53], the invertebrate C-type lectin CEL-I [54], lung surfactant protein D [55] and the CTLD of rat aggrecan [56]. Interestingly, molecular dynamics simulations of the MBP-A/mannose complex suggested that Ca2+-3 is involved in the binding interaction [57].
Ca-binding site 2 is involved in carbohydrate binding
Residues with carbonyl side chains involved in Ca2+ coordination in site 2 form two characteristic motifs in the CTLD sequence, and together with the calcium ion itself are directly involved in monosaccharide binding. The first group of residues, the ‘EPN motif’ in MBP-A (E185, P186, N187), is contributed by the long loop region and contains two residues with carbonyl side chains separated by a proline in the cis conformation. These carbonyl side chains provide two Ca-coordination bonds, form hydrogen bonds with the monosaccharide and determine binding specificity. The cis-proline is highly conserved and maintains the backbone conformation that brings the adjacent carbonyl side chains into the positions required for Ca2+ coordination. The second group of residues, the ‘WND motif’ (positions 204–206), is contributed by the β4 strand. Although only the asparagine and aspartate are involved in Ca-coordination, the tryptophan immediately preceding them is a highly conserved contributor to the hydrophobic core (position β4W [5]) and is a useful landmark for detecting the motif in a sequence. In the MBP-A structure, Asn205 and Asp206 provide three Ca-coordination bonds (two from the side chains, one from the backbone carbonyl of Asp) and also form hydrogen bonds with the sugar. One more carbonyl side chain is involved in site 2 formation: it belongs to the residue preceding the second conserved cysteine at the end of the long loop region (Glu193 in MBP-A), and forms one coordination bond with the Ca2+ ion.
Fig 4 Ca-binding sites in CTLDs. Shown are ribbon diagrams of two representative CTLD structures, rat MBP-A and human ASGPR-I, … different sites in the text are indicated next to the arrows.
As no Ca-binding site other than site 2 is known to be involved in sugar binding, and as the site 2 residue motifs can be confidently detected in the sequence, it is common in the literature to associate the predicted Ca2+/carbohydrate-binding properties of an uncharacterized sequence with the presence of these motifs (e.g. [7,8]). Although this is a useful simplification, it should be noted that the absence of the motifs associated with Ca2+-binding site 2 does not indicate that the CTLD is incapable of binding Ca2+, as there are two independent sites (1 and 4). Also, the presence of these motifs does not guarantee lectin activity, as there are numerous examples of CTLDs that contain the conserved motifs but are not known to bind monosaccharides (see below).
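The motif-based shortcut described above, combined with the EPN/QPD correlation discussed in this review, can be sketched in Python as follows. This is an illustrative screen under stated assumptions, not the published prediction method [83]. It treats an EPN or QPD tripeptide followed later in the sequence by a WND tripeptide as a candidate site 2, and reports the EPN → mannose-type or QPD → galactose-type call. `annotate_site2`, `SPECIFICITY` and the toy sequences are hypothetical, and exact tripeptide matching will produce chance hits in real sequences.

```python
import re

# Hypothetical screening heuristic (not the method of ref. [83]):
# - EPN or QPD stands in for the long-loop motif around the conserved
#   cis-proline (EPN -> mannose-type, QPD -> galactose-type);
# - WND stands in for the beta4-strand motif;
# - the loop motif must precede WND, as in MBP-A (EPN 185-187, WND 204-206).
SPECIFICITY = {"EPN": "mannose-type", "QPD": "galactose-type"}


def annotate_site2(seq: str) -> dict:
    """Report whether candidate Ca2+ site 2 motifs are present, and their type."""
    seq = seq.upper()
    wnd_starts = [m.start() for m in re.finditer("WND", seq)]
    for m in re.finditer("EPN|QPD", seq):
        # Accept the first loop motif that has a WND motif downstream of it.
        if any(w > m.start() for w in wnd_starts):
            return {"site2_motifs": True,
                    "loop_motif": m.group(),
                    "predicted_type": SPECIFICITY[m.group()]}
    return {"site2_motifs": False, "loop_motif": None, "predicted_type": None}


print(annotate_site2("AAEPNAAAAWNDA")["predicted_type"])  # mannose-type
```

Consistent with the caveats above, a negative result says nothing about Ca2+ binding at sites 1 and 4, and a positive result does not establish lectin activity, since many CTLDs carry these motifs without any known monosaccharide binding.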
Sites 1, 2 and 4 play structural roles
Despite their spatial proximity, Ca2+-binding sites 1 and 2 should be considered independent from both the evolutionary and the structural points of view. Crystallographic studies of the rat MBP-A CTLD crystallized at a low metal ion concentration (0.325 mM Ho3+ instead of the 20 mM used to obtain the CTLD complexed with mannose) have shown that site 1 has the higher affinity for Ca2+, as it remains occupied and retains its coordination geometry while site 2 loses its metal ion [58]. On the other hand, in the fourth CTLD of the human macrophage mannose receptor, Ca2+-binding site 1 is less stable than site 2 [41,59]. This is also the case for rat pulmonary surfactant protein A (SP-A), where only some of the ligands required for Ca2+-1 are present, and these can provide only three coordination bonds to the Ca2+. In one of the two solved SP-A structures (PDB 1r14) both site 1 and site 2 are occupied by metal ions, while in the other (PDB 1r13) only site 2 is occupied [60]. SP-A is a particularly good example supporting the mutual independence of sites 1 and 2 because in its close homologue, pulmonary surfactant protein D, sites 1, 2 and 3 are occupied by Ca2+ (PDB 1pw9, 1pwb [61]). The independence of Ca2+-binding site 1 is also supported by the fact that in several CTLD structures site 1 is missing, while site 2 contains a calcium ion and is involved in carbohydrate binding. Examples of such structures are human E- and P-selectins (PDB 1esl, 1g1t, 1g1q, 1g1r, 1g1s) [62,63] and the tunicate lectin TC14 (PDB 1byf, 1tlg) [64].
Ca2+-binding site 4 was first observed in the structure of the factor IX/X-binding protein from the venom of Trimeresurus flavoviridis, where it was the only location of Ca2+ ions [40]. It is occupied by Ca2+ in several other snake venom CTLD structures. Two observations suggest that this site is a property of the CTLD in general rather than being restricted to the snake venom group of CTLDs. First, it is present in the human asialoglycoprotein receptor I [47], which is a very remote homologue of the snake venom CTLDs. Second, as shown by comparative analysis of CTLD structures [5], Ca2+-4 is involved in a stabilizing interaction that is a highly conserved structural feature observed in virtually all CTLD structures. This interaction can be mediated either by salt bridge formation between charged groups or by metal ion coordination. In one structure (the galactose-specific C-type lectin from the rattlesnake Crotalus atrox; PDB 1jzn, 1muq [65]), Na+ was found instead of Ca2+ in site 4.
A stabilizing effect of bound Ca2+ on CTLD structure has been reported for a number of proteins from different CTLD groups [52,66,67]. Ca2+ removal greatly increases CTLD susceptibility to proteolysis and changes physical properties of the domain such as circular dichroism spectra and intrinsic tryptophan fluorescence. Structures of the apo forms of human tetranectin [68] and rat MBP-C, and of the one-ion form of rat MBP-A [58], have demonstrated the mechanism underlying these changes. In these structures the compactness of the long loop region is disrupted, leading to multiple conformational changes including cis–trans isomerization of the conserved proline. However, not all CTLDs require Ca2+ to form a stable long loop region structure: NMR studies of the tunicate CTLD TC14 have shown that its loops maintain their compact fold when Ca2+ is removed [69].
Role of Ca2+ in CTLD function
The most important functional role of bound Ca2+ in CTLDs is monosaccharide binding. This function is limited to site 2 and is discussed in detail in the following section. However, in several cases, described below, Ca2+-binding sites participate in interactions that do not involve carbohydrate recognition.
In proteins, Ca2+ is usually found in a 7- or 8-coordinate form. Because the metal can interact simultaneously with multiple ligands within the protein, its binding can orchestrate dramatic rearrangements in the tertiary structure of the protein. At the same time, the reversible nature of the binding and its dependence on different parameters of the milieu (e.g. ion concentration, pH) provide mechanisms to control the structural transformations induced by metal binding. In several examples, CTLD functions are mediated by Ca2+-induced structural changes, namely the destabilization of the long loop region caused by Ca2+ removal, rather than by the involvement of Ca2+ in monosaccharide binding. Destabilization of the loops caused by pH-induced Ca2+ loss is thought to play a physiological role in the function of the CTLDs in endocytic proteins such as asialoglycoprotein receptors [52,70] and the macrophage mannose receptor [41,59]. Transition of the receptor–ligand complex from the cell surface into the acidic environment of a lysosome leads to Ca2+ loss and to release of the bound ligand. After release, the ligand is processed by lysosomal enzymes, while the receptor is recycled to the cell surface.
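The dependence of Ca2+ binding on the milieu can be illustrated with the textbook single-site binding isotherm, θ = [Ca2+]/(Kd + [Ca2+]). This is a simplifying assumption: real CTLDs have several sites, and pH is represented here only indirectly, as a change in Kd. The function name `site_occupancy` and all numeric values are hypothetical placeholders, not measurements for any receptor.

```python
def site_occupancy(ca_mM: float, kd_mM: float) -> float:
    """Equilibrium fraction of a single Ca2+ site that is occupied.

    Assumes independent one-site binding: theta = [Ca2+] / (Kd + [Ca2+]).
    """
    if ca_mM < 0 or kd_mM <= 0:
        raise ValueError("need [Ca2+] >= 0 and Kd > 0")
    return ca_mM / (kd_mM + ca_mM)


# Placeholder values for illustration only; not measured constants.
surface = site_occupancy(ca_mM=1.0, kd_mM=0.1)  # cell surface: neutral pH, higher Ca2+
acidic = site_occupancy(ca_mM=0.1, kd_mM=1.0)   # acidic compartment: less Ca2+, weaker binding
print(f"{surface:.2f} -> {acidic:.2f}")  # 0.91 -> 0.09
```

With these placeholder values the occupancy falls from about 0.91 to about 0.09, showing how a combined drop in Ca2+ concentration and affinity could empty the site and so destabilize the long loop region.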
Another example of a functional CTLD transformation induced by Ca2+ is human tetranectin. Although Ca2+-binding sites 1 and 2 are present in the tetranectin CTLD, the domain is not known to bind carbohydrates. It does, however, interact with several kringle domain-containing proteins, including plasminogen, and the interaction involves several residues from Ca2+-binding site 2. Moreover, the interaction with kringle domain 4 of plasminogen is only possible when Ca2+ is lost from the binding site [71], which leads to changes in the long loop region conformation similar to those observed in apo-MBP-C [58,68]. The physiological role of Ca2+ as an inhibitor of the tetranectin/plasminogen interaction is, however, unclear.
The antifreeze protein (AFP) from Atlantic herring provides an interesting example of a CTLD in which Ca2+ bound in site 2 is involved in an interaction with a noncarbohydrate ligand [72]. Ewart et al. have shown not only that the antifreeze activity of the protein is Ca2+-dependent [73], but also that it is disrupted by minor changes in the geometry of Ca2+-binding site 2 introduced by replacing the original galactose-type QPD motif with a mannose-type EPN motif [72]. This strongly suggests that Ca2+ site 2 in the herring antifreeze protein interacts directly with the ice crystal, altering its growth pattern.
Ligand binding
CTLDs selectively bind a wide variety of ligands. As the superfamily name suggests, carbohydrates (in various contexts) are the primary ligands for CTLDs, and the binding is Ca2+-dependent [74]. However, the fold has also been shown to bind specifically to proteins [75], lipids [76] and inorganic compounds including CaCO3 and ice [72,77–79]. In several cases the domain is multivalent and may bind both protein and sugar [80–82].
Carbohydrate binding is, however, a fundamental function of the superfamily and the best studied one. The first characterized vertebrate CTLDcps were Ca2+-dependent lectins, and most of the functionally characterized CTLDcps from lower organisms were isolated because of their sugar-binding activity. Although, as the number of CTLDcp sequences grows, it becomes clearer that the majority of them do not possess lectin properties (according to Drickamer, 85% of C. elegans and 81% of Drosophila CTLDcps are predicted to be noncarbohydrate-binding [9]), CTLDcps are still regarded as a lectin family. Unlike many other functions of the CTLDcps, Ca2+-dependent carbohydrate binding is found across the whole phylogenetic distribution of the family, from sponges to humans, and thus is likely to be the ancestral function. Also, Ca2+/carbohydrate-binding CTLDs from different species show striking similarity in their mechanisms of sugar binding. Systematic studies by Drickamer and his colleagues have provided an in-depth understanding of many aspects of this mechanism. The results of this theoretical and experimental work established a basis for bioinformatics techniques that predict CTLD sugar-binding properties from sequence with substantial reliability [83]. Whole-genome studies of the CTLD family published by Drickamer and his colleagues focused on the prediction of carbohydrate-binding properties and used these prediction methods [6–8]. Although our approach for the Fugu rubripes genome was somewhat different [9], for carbohydrate-binding prediction we used the techniques developed by Drickamer and coworkers.
An overview of the literature on the mechanism of Ca2+-dependent monosaccharide binding by CTLDs is given next.
Ca2+-dependent monosaccharide binding
The mechanism of Ca2+-dependent monosaccharide binding by several CTLDs has been studied in great detail by X-ray crystallography, site-directed mutagenesis and biochemical methods. The first crystallographic study of a complex between a CTLD and a carbohydrate was carried out on rat MBP-A and the N-glycan Man6-GlcNAc2-Asn [51]. In the structure obtained, a ternary complex between the terminal mannose moiety of the oligosaccharide, the Ca2+ ion bound in site 2 and the protein was observed. The complex is stabilized by a network of coordination and hydrogen bonds: oxygen atoms from the 3- and 4-hydroxyls of the mannose form two coordination bonds with the Ca2+ ion and four hydrogen bonds with the carbonyl side chains that form Ca2+-binding site 2 (Fig 5). This bonding pattern is fundamental for CTLD/Ca2+/monosaccharide complexes, and is observed in all known structures. It is also a major contributor to the binding affinity, especially in CTLDs specific for the mannose group of monosaccharides. For example, in MBP-A, mannose atoms form very few interactions with the protein other than the hydrogen and coordination bonds formed by the two equatorial hydroxyls, and extensive mutagenesis screening has shown that the only other significant contributor to mannose binding is Cβ of His189, which forms a hydrophobic interaction with the sugar [84].
The positioning of hydrogen-bond donors and acceptors in the binding site has two important features. First, it determines the overall positioning and orientation of the ligand in the binding site. It may seem from Fig. 5A that the sugar-binding site of CTLDs has a twofold symmetry axis relating the sugar hydroxyls, so that the hypothetical sugar shown could be rotated by 180° without changing the bonding scheme. It is now known that this is indeed the case, although some early modeling and mutagenesis studies assumed that the orientation of the sugar was fixed. However, when the structure of a complex of rat MBP-C with mannose was determined, the orientation of the bound mannose was opposite to that observed in MBP-A [85], and further studies revealed some of the factors that determine the preferred orientation [86]. Although the rat MBPs are the only established example of a CTLD that can bind carbohydrates in both orientations, different CTLDs are known to bind the same monosaccharide in different orientations (e.g. the galactose-binding MBP-A mutant and CEL-I vs. TC-14 lectin).
The second constraint imposed by the Ca2+-coordination site on the ligand concerns the properties of the carbohydrate hydroxyls that the site can accept; this is best demonstrated by the mechanism by which CTLDs discriminate between the mannose and galactose groups of monosaccharides. As noted previously, an important correlation was recognized early in the history of CTLDs between the residues flanking the conserved cis-proline in the long loop region, which are involved in forming Ca2+-binding site 2, and the specificity for either galactose or mannose. In all mannose-binding proteins known at that time, the sequence of the motif was EPN (E185 and N187 in MBP-A), whereas in the galactose-specific CTLDs it was QPD. In a series of elegant mutagenesis experiments, Drickamer and coworkers showed that replacing the EPN sequence in MBP-A with the galactose-type QPD sequence was enough to switch the specificity to galactose [87], and that further modifications around the binding site (mainly the introduction of a properly positioned aromatic ring that forms a hydrophobic interaction with the apolar face of the sugar) can raise the affinity and specificity of the mutant MBP-A for galactose to the level observed in natural galactose-binding CTLDs [88].
Crystallographic analysis of the galactose-specific MBP-A mutant showed that the EPN-to-QPD change does not substantially restructure the geometry of Ca2+-binding site 2 [89]; this suggested that the key switch in specificity was induced by swapping the hydrogen-bond donor and acceptor across the monosaccharide-binding plane, changing the hydrogen-bonding pattern from the mannose-type asymmetrical pattern (Fig. 5A, dark-grey arrows) to the galactose-type symmetrical one (Fig. 5A, light-grey arrows). The same distribution of hydrogen-bonding partners was observed in the galactose-binding lectin TC-14 from the tunicate Polyandrocarpa misakiensis [64]. The TC-14 CTLD contains an unusual EPS motif in the long loop region, which resembles the motifs of the mannose-binding proteins but has a serine as the hydrogen-bond donor instead of the asparagine in MBP-A. The crystal structure revealed that, owing to a compensatory change on the opposite side of the ligand-binding site (the 'WND' motif is changed to LDD) and a 180° rotation of the galactose residue relative to its orientation in the galactose-binding MBP-A mutant, the symmetrical hydrogen-bonding pattern is maintained.

Fig. 5. (A) Legend: coordination bond; H-bond. Protein groups that act as hydrogen donors and acceptors are not shown. Arrows show the direction of hydrogen bonds in mannose-specific CTLDs, while light-grey arrows indicate changed directions in galactose-specific CTLDs. (B) A stereoview of the MBP-A complex with mannose (PDB 2msb). Coordination bonds are orange. Hydrogen bonds where the sugar hydroxyl acts as acceptor and donor are red and blue, respectively.
Although many of the determinants of monosaccharide-binding specificity have been established experimentally, the underlying mechanism is still unclear. The mutual spatial disposition of the bonded hydroxyls, initially suggested to be the main contributor to specificity, is no longer considered so important: a growing number of crystal structures of CTLDs with the MBP-A-like ('asymmetrical') distribution of hydrogen-bond donors and acceptors have shown that the core binding site is compatible not only with any two equatorial hydroxyls (3- and 4-OH of mannose and glucose, 2- and 3-OH of fucose), but also with a combination of axial and equatorial hydroxyls (3- and 4-OH of fucose, as in the E- and P-selectin structures). A comparative study of different lectin-carbohydrate complexes by Elgavish and Shaanan [90] suggests that additional stereochemical factors must be taken into account. Elgavish and Shaanan noted a unique clustering of hydrogen-bond donors and acceptors around the 4-hydroxyl in all the structures compared, which was not observed for the other hydroxyls: in a Newman projection along the O4-C4 bond, hydrogen-bond acceptors are never gauche to both vicinal ring carbons (C3 and C5), and thus the 4-OH proton always points away from the ring. Poget et al. [64] confirmed this observation and noted that in CTLDs the same rule also holds for the 3-OH proton. However, no explanation of the unique stereochemistry of the 4-OH binding orientation has been offered.
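The Newman-projection rule can be phrased as a simple geometric test. The sketch below is only an illustration, not the procedure used by Elgavish and Shaanan or by Poget et al. It assumes that, for a given acceptor A, the pseudo-torsion angles A···O4-C4-C3 and A···O4-C4-C5 have already been measured from a structure, and it uses an arbitrary ±30° window (our assumption) around the ideal gauche (±60°) and anti (180°) values. The function names are hypothetical.

```python
def classify_torsion(angle_deg: float, tol: float = 30.0) -> str:
    """Label a torsion angle as 'gauche' (~±60°), 'anti' (~180°) or 'other'.

    The ±tol window is an illustrative assumption, not a published cutoff.
    """
    # Wrap the angle into the range [-180, 180)
    a = (angle_deg + 180.0) % 360.0 - 180.0
    if abs(abs(a) - 60.0) <= tol:
        return "gauche"
    if abs(abs(a) - 180.0) <= tol:
        return "anti"
    return "other"


def gauche_to_both_ring_carbons(tau_c3: float, tau_c5: float) -> bool:
    """True if an acceptor is gauche to both C3 and C5 about the O4-C4 bond.

    tau_c3, tau_c5: hypothetical pseudo-torsions A...O4-C4-C3 and
    A...O4-C4-C5 in degrees, assumed to be measured from a crystal structure.
    Under the rule reported by Elgavish and Shaanan, this is expected to be
    False for acceptors of the 4-OH proton.
    """
    return (classify_torsion(tau_c3) == "gauche"
            and classify_torsion(tau_c5) == "gauche")


# In a staggered projection the two torsions differ by about 120 degrees.
print(gauche_to_both_ring_carbons(60.0, -60.0))   # → True  (acceptor between C3 and C5)
print(gauche_to_both_ring_carbons(180.0, -60.0))  # → False (acceptor anti to C3)
```

Only the first arrangement, with the acceptor sitting between the two ring carbons, would contradict the observation; the others place the 4-OH proton outside the ring.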
Other contributions to monosaccharide binding affinity and specificity
Although the network of interactions between the Ca2+ ion, the carbonyl residues that coordinate it and the sugar hydroxyls determines the basic binding affinity and specificity for either mannose-type or galactose-type monosaccharides, other structural elements in the binding sites raise the affinity to the level required for efficient binding, impose steric limitations on the orientation of the ligand, and introduce selectivity for particular members within the mannose or galactose groups.
Structural determinants of specificity for particular monosaccharides from both the mannose and galactose groups were studied by protein engineering on the MBP framework [91,92] and by mutagenesis of several wild-type proteins: the mechanism of discrimination between Glc and GlcNAc by chicken hepatic lectin [93], the contribution of His189 to mannose-binding affinity in MBP-A [84], mutations affecting mannose binding by MBP-A [94], discrimination between GalNAc and Gal by ASGPR [95], increasing the affinity of mutant MBP-A for galactose [88], the role of the van der Waals interaction with Val351 in fucose recognition by human DC-SIGN [96], and residues affecting pH-dependent ligand release by ASGPR [70]. These additional contributors to binding, however, vary even between close homologues, which, combined with the inherent plasticity of the core binding site, makes any predictive modeling questionable.
Reliability of Ca2+/carbohydrate-binding prediction
As noted above, the molecular mechanism of Ca2+-dependent carbohydrate binding is conserved in all family members studied; the amino acids that form the core of the binding site form characteristic motifs ('EPN' and 'WND') that can be identified by sequence similarity and are indicative of binding specificity (mannose vs. galactose). These observations provide a simple and very popular approach to predicting whether a CTLD of unknown function is likely to bind sugar ('EPN' and 'WND' present) and whether it would preferentially bind mannose- or galactose-type ligands ('EPN' vs. 'QPD'). This simple prediction technique is widely used and has proven reliable in many cases. However, it was developed by comparing a limited set of well-characterized domains, whereas the number of uncharacterized sequences to which it is applied is growing quickly, as is the evolutionary distance between the characterized and new sequences. It is therefore important, especially in studies involving large-scale CTLD sequence analysis, to take into account the assumptions on which this approach is based and its possible limitations.
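The motif-based heuristic can be sketched in a few lines. This is a deliberately naive illustration, not the published method of Drickamer and coworkers: the function name is hypothetical, it scans a CTLD sequence for exact 'WND', 'EPN' and 'QPD' substrings instead of checking the aligned positions of the Ca2+-site-2 ligands, and it ignores the other residues that real predictions take into account.

```python
def predict_site2_sugar_binding(ctld_seq: str) -> str:
    """Naive motif-based prediction of Ca2+-dependent sugar binding.

    Hypothetical helper: exact substring matching stands in for the
    alignment-based identification of Ca2+-binding site 2 residues.
    """
    seq = ctld_seq.upper()
    # 'WND' supplies ligands on the other side of Ca2+-binding site 2
    if "WND" not in seq:
        return "no sugar binding predicted"
    # Residues flanking the conserved cis-proline set mannose vs. galactose type
    if "EPN" in seq:
        return "mannose-type"
    if "QPD" in seq:
        return "galactose-type"
    return "no sugar binding predicted"


# Toy fragments (not real CTLD sequences), used only to exercise the rule
print(predict_site2_sugar_binding("cgkEPNnwgdseWNDvk"))  # → mannose-type
print(predict_site2_sugar_binding("cgkQPDnwgdseWNDvk"))  # → galactose-type
print(predict_site2_sugar_binding("cgkEPKnwgdseWNDvk"))  # → no sugar binding predicted
```

The last toy case mirrors the situation of lung surfactant protein A: an EPK motif is not recognized by exact matching, even though that protein binds Ca2+ at site 2 and monosaccharides from the mannose group.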
The three main assumptions are: (a) the presence of Ca2+-binding site 2 strongly suggests sugar-binding activity; (b) Ca2+-dependent sugar binding involving Ca2+-binding site 2 is the only (or the major) mechanism of monosaccharide binding by CTLDs; and (c) the positioning of hydrogen-bond donors and acceptors flanking the conserved proline in the long loop region determines specificity for either mannose- or galactose-type monosaccharides.
As described above, the presence of the residue motifs associated with Ca2+-binding site 2 does not guarantee that a CTLD will bind carbohydrates. Several examples exist in the literature where the sugar-binding activity and specificity predicted from sequence were not confirmed by experiment. The CTLD of human tetranectin contains a galactose-type QPD motif and binds two Ca2+ ions, but the only demonstrated carbohydrate-binding activity of this protein [97] is not associated with the CTLD [98]. Antifreeze protein from Atlantic herring also contains a galactose-type QPD motif and binds Ca2+, but does not bind carbohydrate [99]. Although human macrophage mannose receptor CTLDs 4 and 5 both contain mannose-type EPN motifs, and the other positions typically involved in Ca2+ binding are occupied by identical or similar residues [100], monosaccharide-binding activity could be demonstrated only for CTLD 4 [101]. Conversely, lung surfactant protein A has an EPK motif in the long loop region, but binds Ca2+ at site 2 and also binds monosaccharides from the mannose group [102,103].
As to the second assumption, there is no firm evidence that an alternative mechanism of monosaccharide binding by CTLDs exists, but we found several examples in the literature that suggest this possibility. (a) A secondary site was proposed for rabbit and rat hepatic lectins on the basis of binding data [104]. (b) In a study using a photoactivatable galactose derivative to map the binding site of a galactose-specific lectin from the acorn barnacle (BRA-3), the labeled regions were not adjacent to Ca2+-binding site 2 [105]. (c) A secondary binding site was observed in one of the MBP-C crystals soaked with a high concentration (1.3 M) of α-methyl-mannose [85]. Although the second binding site was not observed at a lower monosaccharide concentration (0.2 M), and electron density for the sugar could be assigned for only one of the two copies in the asymmetric unit, it has been suggested that the secondary binding site may be part of an extended site that has significant affinity only for larger ligands [85]. Interestingly, the monosaccharide bound at the alternative site contacts regions corresponding to those labeled in the acorn barnacle lectin study. (d) Although the CTLD of human thrombomodulin does not contain the typical Ca2+-binding sequence signature, the aggregation of melanoma cells that it mediates is abolished by Ca2+ removal or by addition of mannose, chondroitin sulfate A or chondroitin sulfate C [106], which suggests a Ca2+-dependent carbohydrate-binding activity. (e) Dectin-1, which is reported to be a macrophage β-glucan receptor [107], neither contains Ca2+/carbohydrate-binding motifs nor requires Ca2+ for carbohydrate binding; the residues required for binding are located in strand β3 [108]. Other group V CTLDcps may have similar properties [81].
The evidence for an alternative mechanism of sugar binding by CTLDs is scarce and does not show any common trend. On the other hand, a surprisingly large proportion (> 80%) of CTLDs from invertebrates are predicted not to bind sugar. It is possible that some of these proteins use an alternative mechanism for sugar binding. In this regard the example of the Link group of proteins is pertinent: these proteins lack a long loop region but nevertheless bind carbohydrates, via a different mechanism.
As to the last assumption, there is no compelling explanation of the correlation between donor-acceptor positioning in site 2 and discrimination between galactose and mannose, although the correlation is supported by the majority of the CTLDs discovered since the observation was made. However, the example of the Polyandrocarpa lectin shows that the correlation is not absolute [64].
Groups of vertebrate CTLDcps
In a review of the C-type lectin family published in 1993, Drickamer separated the CTLDcps known at that time into seven groups (I to VII) based on their domain architecture, and showed that this grouping correlates well with the results of phylogenetic analysis of the CTLD sequences and captures functional similarities between the proteins [2]. The classification was revised in 2002 [6] with the addition of seven new groups (VIII to XIV). Whereas the first seven groups of CTLDcps have a substantial history and are widely referenced in the literature, the new groups were only briefly outlined in the work introducing them. Along with the updated classification, a link to the 'World-wide web-based resource for animal lectins' (http://www.imperial.ac.uk/research/animallectins/default.html) was published, where some additional information on the new groups can be found, including lists of database identifiers for the sequences used to define them. However, no functional description of the CTLDcps from the new groups, comparable to the description of groups I to VII, has been published. The domain architecture of the CTLDcps in the different groups is shown in Fig. 6. In addition to the 14 groups in Drickamer's updated classification, three new groups (XV to XVII) are shown, which we have added to accommodate the novel vertebrate proteins we identified in the study of Fugu CTLDcps [9]. Table 1 summarizes the literature on the vertebrate CTLDcp groups, focusing on the structural and functional features of the CTLDs.
Fig. 6. Domain architecture of vertebrate CTLDcps, with mammalian homologues, from different groups. Group numbers are indicated next to the domain charts: I, lecticans; II, the ASGR group; III, collectins; IV, selectins; V, NK receptors; VI, the macrophage mannose receptor group; VII, REG proteins; VIII, the chondrolectin group; IX, the tetranectin group; X, polycystin 1; XI, attractin; XII, EMBP; XIII, DGCR2; XIV, the thrombomodulin group; XV, Bimlec; XVI, SEEC; XVII, CBCP.