Title: The C-type lectin-like domain superfamily
Authors: Alex N. Zelensky, Jill E. Gready
Institution: Australian National University
Field: Computational Proteomics and Therapy Design
Type: review article
Year of publication: 2005
City: Canberra
Pages: 39


The C-type lectin-like domain superfamily

Alex N Zelensky and Jill E Gready

Computational Proteomics and Therapy Design Group, John Curtin School of Medical Research, Australian National University, Canberra, Australia, Subdivision: Proteomics

Introduction

The superfamily of proteins containing C-type lectin-like domains (CTLDs) is a large group of extracellular Metazoan proteins with diverse functions. It has been the subject of some general literature reviews [1,2], with many more focusing on its particular functions (e.g. [3,4]). There are also several systematic studies [5–9]. A classification of the family members based on the overall domain architecture of the CTLD-containing proteins (CTLDcps), which was introduced by Drickamer in 1993 [2] and updated recently [6], served as a useful framework for the superfamily studies. However, despite a voluminous literature describing some of the family’s properties in great detail, we feel that a fresh critical review would be useful, as the previous review of this scale was published more than a decade ago [2]. Our approach has several main goals, outlined below.

Keywords

C-type lectin-like domain; domain superfamily; protein evolution; carbohydrate binding

Correspondence

J. E. Gready, Computational Proteomics and Therapy Design Group, Division of Molecular Bioscience, John Curtin School of Medical Research, PO Box 334, Canberra

The superfamily of proteins containing C-type lectin-like domains (CTLDs) is a large group of extracellular Metazoan proteins with diverse functions. The CTLD structure has a characteristic double-loop (‘loop-in-a-loop’) stabilized by two highly conserved disulfide bridges located at the bases of the loops, as well as a set of conserved hydrophobic and polar interactions. The second loop, called the long loop region, is structurally and evolutionarily flexible, and is involved in Ca2+-dependent carbohydrate binding and interaction with other ligands. This loop is completely absent in a subset of CTLDs, which we refer to as compact CTLDs; these include the Link/PTR domain and bacterial CTLDs. CTLD-containing proteins (CTLDcps) were originally classified into seven groups based on their overall domain structure. Analyses of the superfamily representation in several completely sequenced genomes have added 10 new groups to the classification, and shown that it is applicable only to vertebrate CTLDcps; despite the abundance of CTLDcps in the invertebrate genomes studied, the domain architectures of these proteins do not match those of the vertebrate groups. Ca2+-dependent carbohydrate binding is the most common CTLD function in vertebrates, and apparently the ancestral one, as suggested by the many humoral defense CTLDcps characterized in insects and other invertebrates. However, many CTLDs have evolved to specifically recognize protein, lipid and inorganic ligands, including the vertebrate clade-specific snake venoms, and fish antifreeze and bird egg-shell proteins. Recent studies highlight the functional versatility of this protein superfamily and the CTLD scaffold, and suggest further interesting discoveries have yet to be made.

Abbreviations

CRD, carbohydrate recognition domain; CTLD, C-type lectin-like domain; CTLDcp, CTLD-containing protein; DC-SIGN, Dendritic cell-specific ICAM-grabbing nonintegrin; EST, expressed sequence tag; MBP, mannose-binding protein; NK, natural killer cell; PSP, pulmonary surfactant protein; PTR, protein tandem repeat.

The literature is strongly biased towards several groups of mammalian proteins, many of biomedical interest. In this review we tried to capture the superfamily in all its variety, rather than attempting to provide a description of the known members proportional to the amount of published data. In particular, we wanted to integrate the results of the systematic studies of the CTLDs from lower vertebrates, such as snake venom proteins and fish CTLDs, with the classification of mammalian CTLDs. The recent inclusion of new CTLDcp groups inspired a critical reassessment of the principles on which the current domain-based classification was built. We also wanted to summarize the functional data on invertebrate CTLDs, which to our knowledge have never been reviewed previously at a general level.

In addition, numerous structural studies of CTLDs in the last decade have provided much information on the inner workings of the fold and the mechanisms of Ca2+-dependent carbohydrate binding. We have attempted to generalize these data and outline the most common elements of the domain. An important correlation between the residue composition of the primary carbohydrate-binding site and its basic specificity towards mannose- or galactose-group monosaccharides was discovered early in the history of CTLD studies and remains the most useful means for CTLD-function prediction. However, several models suggested to explain the mechanisms of such a correlation had to be rejected as the volume of data grew, and no comprehensive explanation of this fundamental phenomenon has been published. Our goal was to analyze the current state of the literature on this problem, to see if an explanation is apparent.

Finally, we wanted to address the inconsistencies of the terminology of the CTLDcp superfamily which exist in the literature, and to suggest clear definitions for the relevant terms.

The CTLD superfamily

A brief history of discovery

C-type lectins were among the first animal lectins discovered. Bovine conglutinin, which belongs to the collectin group of C-type lectins, has been known since 1906, and the agglutinating activity of the snake venom lectins was first described much earlier, in 1860 [10]. In 1988 Drickamer proposed organizing animal lectins into several categories, and classified Ca2+-dependent lectins structurally similar to the asialoglycoprotein receptor as the C-type lectin group [11]. Since then, the known family has grown significantly, and now includes more than a thousand identified members (including those from genome sequences only) from different animal species, most of which lack lectin activity.

Term definitions: CTLD, CRD, C-type lectin

The terms ‘C-type lectin’, ‘carbohydrate recognition domain’ (CRD), ‘C-type lectin domain’ (CTLD) and ‘C-type lectin-like domain’ (also abbreviated as CTLD) are often used interchangeably in the literature. This may be a source of confusion. The history of the introduction and the common meanings of the terms are outlined below, followed by the definitions we will use in this review.

The term ‘C-type lectin’ was introduced to distinguish a group of Ca2+-dependent (C-type) carbohydrate-binding (lectin) animal proteins from the other (Ca2+-independent) types of animal lectins. When the structures of C-type lectins were established biochemically and functions of different domains were defined, it was found that carbohydrate-binding activity was mediated by a compact module, the ‘carbohydrate-recognition domain’ (CRD), which was present in all Ca2+-dependent lectins but not in other types of animal lectins [11–13]. Comparison of CRD sequences from different C-type lectins revealed conserved residue motifs characteristic of the domain [2,11,13], which allowed discovery of many more proteins that contained it. At the same time, crystallographic studies confirmed that the CRD of the C-type lectins has a compact globular structure, which was not similar to any known protein fold [14]. This domain has been called ‘C-type CRD’ or ‘C-type lectin domain’. As the number of determined sequences grew, it became clear that not all proteins containing C-type CRDs can actually bind carbohydrates or even Ca2+. To resolve the contradiction, a more general term ‘C-type lectin-like domains’ was introduced to refer to such domains [1,3]. The usage of this term is, however, somewhat ambiguous, as it is used both as a general name for the group of domains with sequence similarity to C-type lectin CRDs (regardless of the carbohydrate-binding properties), and as a name of the subset of such domains that do not bind carbohydrates, with the subset that does bind carbohydrates being called C-type CRDs [6,8]. Also, both ‘C-type CRD’ and ‘C-type lectin domain’ terms are still being used in relation to the C-type lectin homologues that do not bind carbohydrate (e.g. [15–17]), and the group of proteins containing the domain is still often called the ‘C-type lectin family’ or ‘C-type lectins’, although most of them are not in fact lectins. The abbreviation CRD is used both in the meaning of ‘C-type carbohydrate-recognition domain’ and in a more general meaning of ‘carbohydrate-recognition domain’, which encompasses domains from different lectin groups [8]. Occasionally CRD is also used to designate the short amino-acid motifs (i.e. amino-acid domain) within CTLDs that directly interact with Ca2+ and carbohydrate (e.g. [18]).

Structure comparisons add another meaning to the definition of the C-type lectin domain, as structural similarities have been discovered between C-type lectin CRDs and protein domains that did not show significant sequence similarity to any of the known C-type lectins but adopted a similar fold [19–23]. As the fold is very unusual, these domains have been separated into a common group in structure classification databases. For example, in the SCOP database [24] C-type lectins and structurally related domains are grouped at the fold level (‘C-type lectin-like fold’), which is the second level from the top of the classification hierarchy. However, although the structural similarity is often acknowledged in the literature, the common meaning of the C-type lectin-like domain does not include these domains [1,6].

Here we will use the term ‘C-type lectin-like domain’ (CTLD) in its broadest definition to refer to protein domains that are homologous to the CRDs of the C-type lectins, or which have a structure resembling that of the prototypic C-type lectin CRD. Proteins harboring this domain will be called CTLD-containing proteins (CTLDcps) instead of the more common ‘C-type lectins’, as the latter implies carbohydrate-binding ability which most of the CTLDcps are not known to possess.

Phylogenetic distribution, groups

With a few exceptions, which will be discussed below, CTLDs are found only in Metazoa, and only extracellularly. The domain has been a very popular framework evolutionarily for generating new functions and is found in various structural and functional contexts. CTLDcps are ubiquitous in multicellular animals, and are found in a broad range of species, from sponges to human [6,25]. CTLDcp-encoding genes have been found in all fully sequenced Metazoan genomes, and, in general, in large numbers. For example, the CTLD is the 7th most abundant domain family in Caenorhabditis elegans [26]. The family shows both evolutionary flexibility and conservation. Whole-genome studies have shown that although there are virtually no similarities between CTLDcps from worm, fruit fly and vertebrates [8], relatively few modifications occurred within the vertebrate lineage during evolution from fish to mammals [9], with some members showing sequence conservation approaching the conservation of histones.

Non-Metazoan CTLDs

There are several interesting examples of non-Metazoan CTLDcps, which can be divided into two groups. Members of the first group come from parasitic bacteria and viruses; these are involved in interactions with the animal host and are either hijacked host proteins or their imitations. This group includes bacterial toxins (pertussis toxin [23] and proaerolysin [22]), outer membrane adhesion proteins (intimin from enteropathogenic Escherichia coli [21] and invasin from Yersinia pseudotuberculosis [27]) and viral proteins. Viral CTLDcps are either transmembrane proteins or structural envelope proteins, and include, for example, eight ORFs in the fowlpox virus genome [28], proteins from vaccinia virus [29,30], African swine fever virus [31], cowpox virus [32], avian adenovirus gal1 [33], myxoma virus [34], molluscum contagiosum [35], Epstein-Barr virus [36], and alcelaphine herpesvirus [37]. Unlike bacterial CTLDs, which were assigned to the CTLD superfamily on the basis of structural similarity only, viral proteins contain a canonical CTLD with significant similarity to those in mammalian CTLDcps.

While the presence of CTLDcps in parasites has an obvious rationalization, the origin of another group of non-Metazoan CTLDcps is unclear. We have found three proteins that can be assigned to this group: two proteins from plants, and a putative protein encoded by an ORF from a marine planctomycete Pirellula sp. (GenBank ID: 32443381). The latter sequence, which is 7716 amino acids long and is encoded by the biggest ORF in the genome of that bacterium [38], contains several C-type lectin-like, laminin G and cadherin domains, all of which are domains almost exclusively found in Metazoa. The most parsimonious explanation of the presence of all these domains in the Pirellula genome is horizontal gene transfer, but what the function of the protein harboring them might be is a mystery, as Pirellula are free-living species. The plant CTLDcp sequences originate from the Arabidopsis thaliana genome annotation (transcript IDs At4g22160 and At1g52310) and are not characterized functionally. At1g52310 is a transmembrane protein with a typical CTLD in the extracellular domain and a protein kinase domain in the cytoplasmic part; it has a well-conserved orthologue in the rice genome sequence.

It is not absolutely clear whether the CTLD superfamily is monophyletic, as homology between the canonical and some of the compact CTLDs (see below) cannot be confidently established. There seems little doubt that the Link domain group of CTLDs has emerged as a result of a deletion of the long loop region from an ancestral canonical CTLD, because the Link domains have a much narrower phylogenetic distribution (only found in vertebrates), are less diverse, and show detectable sequence similarity to the canonical CTLDs [19]. However, the evolutionary relationship of the compact CTLDs from the bacterial toxins to the animal CTLDs is uncertain [39]. These domains could either have been acquired by horizontal transfer or could have arisen by convergent evolution, as mimicry of host proteins.

The CTLD fold

The CTLD fold has a double-loop structure (Fig. 1). The overall domain is a loop, with its N- and C-terminal β strands (β1, β5) coming close together to form an antiparallel β-sheet. The second loop, which is called the long loop region, lies within the domain; it enters and exits the core domain at the same location. Four cysteines (C1-C4), which are the most conserved CTLD residues, form disulfide bridges at the bases of the loops: C1 and C4 link β5 and α1 (the whole domain loop) and C2 and C3 link β3 and β5 (the long loop region). The rest of the chain forms two flanking α helices (α1 and α2) and the second (‘top’) β-sheet, formed by strands β2, β3 and β4. The long loop region is involved in Ca2+-dependent carbohydrate binding, and in domain-swapping dimerization of some CTLDs (Fig. 2), which occurs via a unique mechanism [40–44].

The conserved positions involved in CTLD fold maintenance and their structural roles have been discussed in detail elsewhere [5]. In addition to the four conserved cysteines, one other sequence feature needs to be mentioned here: the highly conserved ‘WIGL’ motif. It is located on the β2 strand and serves as a useful landmark for sequence analysis.

Variations of the fold: canonical, compact, long, short

Structurally, CTLDs can be divided into two groups: canonical CTLDs having a long loop region, and compact CTLDs that lack it (Fig. 2). The second group includes Link or protein tandem repeat (PTR) domains [19,20] and bacterial CTLDs [27,39,45]. Another family usually included in the CTLD superfamily is that of endostatin [1,24,46]. However, in the comparative structure analysis [5], we did not find substantial similarity between the CTLD and endostatin folds, apart from the general topology. As sequence similarity between endostatin and CTLDs is also absent, we do not regard the endostatin fold as an example of a CTLD and do not consider it further.

Fig. 1. CTLD structure. A cartoon representation of a typical CTLD structure (1k9i). The long loop region is shown in blue. Cystine bridges are shown as orange sticks. The cystine bridge specific for long form CTLDs (C0-C0′) is also shown.

Fig. 2. Variation of the long loop region structure. Three common forms of the CTLD long loop region are shown. Panels (A) and (C) show canonical CTLDs in which the long loop region is tightly packed (A) or flipped out to form a domain-swapping dimer (C). A compact CTLD from human CD44 Link domain is shown in panel (B). The core domain and long loop region are colored green and blue, respectively.

Another subdivision of CTLDs is based on the presence of a short N-terminal extension, which forms a β-hairpin at the base of the domain (Fig. 1). The CTLDs containing such an extension are called ‘long form’. The hairpin is stabilized by an additional cystine bridge, and the presence of these two additional cysteines at the beginning of the CTLD sequence is used to distinguish between long and short form CTLDs in sequence analysis. No systematic study of the N-terminal extension, or of its possible roles, has been published.
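The cysteine-based distinction between long and short form CTLDs can be illustrated with a deliberately crude sketch. It assumes the input is an already delimited CTLD sequence and simply counts cysteines: four would match the conserved C1-C4 set, six the additional C0-C0′ pair. The function name, the thresholds and the toy sequences are hypothetical and only serve the illustration; real analyses would rely on an alignment, since other cysteines can occur in a domain.

```python
def classify_ctld_form(domain_seq: str) -> str:
    """Crude long/short form call from the cysteine count of one CTLD sequence.

    Assumption (not from the review): the sequence covers exactly one CTLD
    and contains no cysteines other than C1-C4 and, if present, C0/C0'.
    """
    n_cys = domain_seq.upper().count("C")
    if n_cys >= 6:
        # Four conserved cysteines plus the extra N-terminal pair
        return "long form (candidate)"
    if n_cys >= 4:
        # Only the four conserved cysteines (or one unpaired extra)
        return "short form (candidate)"
    return "undetermined"


# Toy sequences, invented for the example
print(classify_ctld_form("CACACDCEFCGHC"))  # six cysteines
print(classify_ctld_form("ACDCEFCGHC"))     # four cysteines
```

A positional check (for example, requiring the two extra cysteines to lie N-terminal to C1) would be closer to the criterion described above, but needs domain coordinates that this sketch does not assume.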

Secondary structure element numbering

Although the CTLD fold is very well conserved among its known representatives, there is no general agreement on the numbering of CTLD secondary structure elements in the literature. The secondary structure element numbering scheme in the first solved CTLD structure (rat MBP-A [14]) included five strands, two helices and four loops. However, this description turned out to be insufficient, as MBP lacks some secondary structure elements that are present in long-form CTLD structures, while other small strands were not defined. Other reports describing the structures of CTLDs that have a different number of secondary structure elements than MBP-A either introduced their own numbering (β strands 1–6 in asialoglycoprotein receptor (ASGPR) [47]; six β strands in Link module, with labeling not consistent with ASGPR or MBP-A [20]; β1-β7 in NKG2D [48]; β1-β8 in EMBP [49]), or extended the secondary structure element naming scheme used for MBP-A (Ly49A secondary structure element numbering is consistent with that in MBP-A [50]). For consistency we will use a universal numbering scheme ([5], Fig. 3), taking the same approach as was used in the Ly49A structure; this allows both direct reference to the most studied CTLD structures (MBP-A and -C) and assigns individual numbers to the elements that are present throughout the family. Other elements will be given derived names and numbers: the β strand specific for the long-form CTLD is labeled β0, the short β strand between α1 and α2 is labeled β1′, and the two β strands forming a hairpin C-terminal to β2 are labeled β2′ and β2″.

Ca-binding sites

Four Ca2+-binding sites are found in CTLDs

Four Ca2+-binding sites recur in CTLD structures from different groups (Fig. 4). The site occupancy depends on the particular CTLD sequence and on the crystallization conditions [14,51]; in different known structures zero, one, two or three sites are occupied. Sites 1, 2 and 3 are located in the upper lobe of the structure, while site 4 is involved in salt bridge formation between α2 and the β1/β5 sheet. Sites 1 and 2 were observed in the structure of rat MBP-A complexed with holmium, which was the first CTLD structure determined [14]. Site 3 was first observed in the MBP-A complex with Ca2+ and oligomannose asparaginyl-oligosaccharide [51]. It is located very close to site 1 and all the side chains coordinating Ca2+ in site 3 are involved in site 1 formation. As biochemical data indicate that MBP-A binds only two calcium atoms [52], Ca2+-binding site 3 is considered a crystallographic artifact [51]. However, in many CTLD structures where site 1 is occupied, a metal ion is also found in site 3; examples include the structures of DC-SIGN and DC-SIGNR [53], invertebrate C-type lectin CEL-I [54], lung surfactant protein D [55] and the CTLD of rat aggrecan [56]. It is interesting to note that molecular dynamics simulations of the MBP-A/mannose complex suggested that Ca2+-3 is involved in the binding interaction [57].

Fig. 3. CTLD secondary structure element numbering. Ribbon diagrams for a compact (intimin, 1f00) and a canonical (E-selectin, 1g1t) CTLD structure. The long loop region in E-selectin, and the short α helix, which replaces the long loop region in compact CTLDs, are shown in black. Secondary structure elements are numbered according to the universal numbering scheme [5].

Ca-binding site 2 is involved in carbohydrate binding

Residues with carbonyl sidechains involved in Ca2+ coordination in site 2 form two characteristic motifs in the CTLD sequence, and together with the calcium atom itself are directly involved in monosaccharide binding. The first group of residues, the ‘EPN motif’ in MBP-A (E185, P186, N187), is contributed by the long loop region and contains two residues with carbonyl sidechains separated by a proline in cis conformation. The carbonyl side chains provide two Ca-coordination bonds, form hydrogen bonds with the monosaccharide and determine binding specificity. The cis-proline is highly conserved and maintains the backbone conformation that brings the adjacent carbonyl side chains into the positions required for Ca2+ coordination. The second group of residues, the ‘WND motif’ (positions 204–206), is contributed by the β4 strand. Although only asparagine and aspartate are involved in Ca-coordination, the tryptophan immediately preceding them is a highly conserved contributor to the hydrophobic core (position β4W [5]) and is a useful landmark for detecting the motif in a sequence. In the MBP-A structure, Asn205 and Asp206 provide three Ca-coordination bonds (two from the side chains, one from the backbone carbonyl of Asp) and also form hydrogen bonds with the sugar. One more carbonyl side chain is involved in site 2 formation. It belongs to the residue preceding the second conserved cysteine at the end of the long loop region (Glu193 in MBP-A), and forms one coordination bond with the Ca2+ ion.
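As a small bookkeeping aid, the protein-derived coordination bonds at site 2 of rat MBP-A listed above can be tallied as follows. This is only a sketch of the arithmetic in the text: the dictionary layout and names are hypothetical, and splitting the two EPN-motif bonds as one per residue (Glu185, Asn187) is an assumption made for the illustration.

```python
# Ca2+ coordination bonds contributed by the protein at site 2 of rat MBP-A,
# transcribed from the description above. The one-bond-per-residue split for
# the EPN motif is an assumption of this sketch.
SITE2_BONDS = {
    ("EPN motif", "Glu185 side chain"): 1,
    ("EPN motif", "Asn187 side chain"): 1,
    ("WND motif", "Asn205 side chain"): 1,
    ("WND motif", "Asp206 side chain"): 1,
    ("WND motif", "Asp206 backbone carbonyl"): 1,
    ("long loop end", "Glu193 side chain"): 1,
}


def bonds_by_source(bonds: dict) -> dict:
    """Sum coordination bonds per motif or region."""
    totals: dict = {}
    for (source, _ligand), count in bonds.items():
        totals[source] = totals.get(source, 0) + count
    return totals


per_source = bonds_by_source(SITE2_BONDS)
protein_total = sum(SITE2_BONDS.values())
print(per_source)
print("protein-derived bonds:", protein_total)
```

The total covers only the protein ligands named in this paragraph; any further coordination positions (for example those filled by a bound sugar) are outside the tally.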

Fig. 4. Ca-binding sites in CTLDs. Shown are ribbon diagrams of two representative CTLD structures, rat MBP-A and human ASGPR-I, different sites in the text are indicated next to the arrows.

As no other Ca-binding site except for site 2 is known to be involved in sugar binding, and as the site 2 residue motifs can be confidently detected in the sequence, it is common in the literature to associate the predicted Ca2+/carbohydrate binding properties of an uncharacterized sequence with the presence of these motifs (e.g. [7,8]). Although this is a useful simplification, it should be noted that the absence of the motifs associated with Ca2+-binding site 2 does not indicate that the CTLD is incapable of binding Ca2+, as there are two independent sites (1 and 4). Also, the presence of these motifs does not guarantee lectin activity for the CTLD, as there are numerous examples of CTLDs that contain the conserved motifs but are not known to bind monosaccharides (see below).
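The motif-based simplification described above can be sketched as a plain string scan. The example below looks for a long-loop motif (EPN, or the galactose-type QPD variant mentioned later in this review) followed within a limited distance by a WND motif. The function names, the `max_gap` parameter and its default value are hypothetical choices for the illustration (in MBP-A the two motifs start 19 residues apart), and exact-string matching is a simplification of what alignment-based methods do.

```python
import re

# Long-loop motif -> monosaccharide group it is associated with in the text
LOOP_MOTIFS = {"EPN": "mannose-type", "QPD": "galactose-type"}


def scan_site2_motifs(seq: str, max_gap: int = 40) -> list:
    """Return (loop_motif, loop_start, wnd_start) hits, 0-based positions.

    A hit needs an EPN/QPD motif followed by a WND motif that starts no more
    than max_gap residues downstream (max_gap is an assumed cut-off).
    """
    seq = seq.upper()
    hits = []
    for match in re.finditer(r"(?=(EPN|QPD))", seq):
        loop_start = match.start()
        wnd_start = seq.find("WND", loop_start + 3)
        if wnd_start != -1 and wnd_start - loop_start <= max_gap:
            hits.append((match.group(1), loop_start, wnd_start))
    return hits


def predicted_specificity(seq: str):
    """Label for the first hit, or None when no motif pair is found."""
    hits = scan_site2_motifs(seq)
    return LOOP_MOTIFS[hits[0][0]] if hits else None


# Toy sequence, invented for the example
toy = "AA" + "EPN" + "G" * 16 + "WND" + "AA"
print(scan_site2_motifs(toy), predicted_specificity(toy))
```

In line with the caveats above, a `None` result says nothing about Ca2+ binding at sites 1 or 4, and a positive hit is a prediction only, not evidence of lectin activity.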

Sites 1, 2 and 4 play structural roles

Despite their spatial proximity, from the evolutionary and structural points of view Ca2+-binding sites 1 and 2 should be considered as independent. Crystallographic studies of rat MBP-A CTLD crystallized at a low metal ion concentration (0.325 mM Ho3+ instead of 20 mM as used to obtain the CTLD complexed with mannose) have shown that site 1 has higher affinity for Ca2+, as it remains occupied and Ca2+-coordination geometry is retained while site 2 loses its metal ion [58]. On the other hand, in the 4th CTLD of the human macrophage mannose receptor, Ca2+-binding site 1 is less stable than site 2 [41,59]. This is also the case for the rat pulmonary surfactant protein A (SP-A), where only some of the required ligands for Ca2+-1 are present and these can provide only three coordination bonds to the Ca2+. In one of the two solved SP-A structures (PDB 1r14) both site 1 and site 2 are occupied by metal atoms, while in the other (PDB 1r13) only site 2 is occupied [60]. SP-A is a particularly good example supporting the mutual independence of sites 1 and 2 because in its close homologue, pulmonary surfactant protein D, sites 1, 2 and 3 are occupied by Ca2+ (PDB 1pw9, 1pwb [61]). Independence of Ca2+-binding site 1 is also supported by the fact that in several CTLD structures site 1 is missing, while site 2 contains a calcium ion and is involved in carbohydrate binding. Examples of such structures are human E- and P-selectins (PDB 1esl, 1g1t, 1g1q, 1g1r, 1g1s) [62,63] and tunicate lectin TC14 (PDB 1byf, 1tlg) [64].

Ca2+-binding site 4 was first observed in the structure of the factor IX/X-binding protein from the venom of Trimeresurus flavoviridis, where it was the only location of Ca2+ ions [40]. It is occupied by Ca2+ in several other snake venom CTLD structures. Two observations suggest that this site is a property of the CTLD in general rather than restricted to the snake venom group of CTLDs. First, it is present in the human asialoglycoprotein receptor I [47], which is a very remote homologue of the snake venom CTLDs. Second, as shown by comparative analysis of CTLD structures [5], Ca2+-4 is involved in a stabilizing interaction that is a highly conserved structural feature observed in virtually all CTLD structures. It can be mediated by salt bridge formation between charged groups and by metal ion coordination. In one structure, the galactose-specific C-type lectin from the rattlesnake Crotalus atrox (PDB 1jzn, 1muq [65]), Na+ was found instead of Ca2+ in site 4.

A stabilizing effect of bound Ca2+ on CTLD structure has been reported for a number of proteins from different CTLD groups [52,66,67]. Ca2+ removal greatly increases CTLD susceptibility to proteolysis and changes physical properties of the domain such as circular dichroism spectra and intrinsic tryptophan fluorescence. Structures of the apo forms of human tetranectin [68] and rat MBP-C, and of the one-ion form of rat MBP-A [58], have demonstrated the mechanism underlying these changes. In these structures compactness of the long loop region is disrupted, leading to multiple conformational changes including a cis-trans isomerization of the conserved proline. However, not all CTLDs require Ca2+ to form a stable long loop region structure. NMR studies of the tunicate CTLD TC14 have shown that its loops maintain their compact fold when Ca2+ is removed [69].

Role of Ca2+ in CTLD function

The most important functional role of the bound Ca2+ in CTLDs is monosaccharide binding. This function is limited to site 2 and is discussed in detail in the following section. However, in several cases, which are described below, Ca2+-binding sites participate in interactions that do not involve carbohydrate recognition.

In proteins, Ca2+ is found in 7- or 8-coordinated form. Because of the metal’s ability to simultaneously interact with multiple ligands within the protein, its binding can orchestrate dramatic rearrangements in the tertiary structure of the protein. At the same time, the reversible nature of the binding and its dependence on different parameters of the milieu (e.g. ion concentration, pH) provide mechanisms to control the structural transformations induced by metal binding.

There are several examples of CTLD functions that are mediated by Ca2+-induced structural changes, namely the destabilization of the long loop region caused by Ca2+ removal, rather than its involvement in monosaccharide binding. It is thought that the destabilization of the loops caused by pH-induced Ca2+ loss plays a physiological role in the function of the CTLDs in endocytic proteins such as asialoglycoprotein receptors [52,70] and macrophage mannose receptor [41,59]. Transition of the receptor-ligand complex from the cell surface into the acidic environment of a lysosome leads to Ca2+ loss and to the release of the bound ligand. After release, the ligand is processed by the lysosomal enzymes, while the receptor is recycled to the cell surface.

Another example of functional CTLD transformation induced by Ca2+ is human tetranectin. Although in the CTLD of tetranectin Ca2+-binding sites 1 and 2 are present, the CTLD is not known to bind carbohydrates. The domain, however, interacts with several kringle domain-containing proteins, including plasminogen, and the interaction involves several residues from the Ca2+-binding site 2. Moreover, the interaction with kringle domain 4 of plasminogen is only possible when Ca2+ is lost from the binding site [71], which leads to changes in the long loop region conformation similar to those observed in the apo-MBP-C [58,68]. The physiological role of Ca2+ as an inhibitor of the tetranectin/plasminogen interaction is, however, unclear.

The antifreeze protein (AFP) from Atlantic herring provides an interesting example of a CTLD in which Ca2+ bound in site 2 is involved in an interaction with a noncarbohydrate ligand [72]. Ewart et al. have shown that not only is the antifreeze activity of the protein Ca2+-dependent [73], but that it is disrupted by minor changes in the geometry of the Ca2+-binding site 2 introduced by replacing the original galactose-type QPD motif by a mannose-type EPN motif [72]. This strongly suggests that the Ca2+ site 2 in the herring antifreeze protein interacts directly with the ice crystal, altering its growth pattern.

Ligand binding

CTLDs selectively bind a wide variety of ligands. As the superfamily name suggests, carbohydrates (in various contexts) are primary ligands for CTLDs and the binding is Ca2+-dependent [74]. However, the fold has been shown to specifically bind proteins [75], lipids [76] and inorganic compounds including CaCO3 and ice [72,77–79]. In several cases the domain is multivalent and may bind both protein and sugar [80–82].

Carbohydrate binding is, however, a fundamental function of the superfamily and the best studied one. The first characterized vertebrate CTLDcps were Ca2+-dependent lectins, and most of the functionally characterized CTLDcps from lower organisms were isolated because of their sugar-binding activity. Although as the number of CTLDcp sequences grows it becomes clearer that the majority of them do not possess lectin properties, CTLDcps are still regarded as a lectin family (according to Drickamer, about 85% of C. elegans and 81% of Drosophila CTLDcps are predicted as noncarbohydrate binding [9]). Unlike many other functions of the CTLDcps, Ca2+-dependent carbohydrate binding is found across the whole phylogenetic distribution of the family, from sponges to human, and thus is likely to be the ancestral function. Also, Ca2+/carbohydrate-binding CTLDs from different species demonstrate amazing similarity in the mechanisms of sugar binding. Systematic studies by Drickamer and his colleagues have provided in-depth understanding of many aspects of this mechanism. The results of this theoretical and experimental work established a basis for developing bioinformatics techniques for predicting CTLD sugar-binding properties with substantial reliability by sequence analysis [83]. Whole-genome studies of the CTLD family published by Drickamer and his colleagues focused on the evolution of the carbohydrate-binding properties and used these prediction methods [6–8]. Although our approach for the Fugu rubripes genome was somewhat different [9], for carbohydrate-binding prediction we used the techniques developed by Drickamer and coworkers.

An overview of the literature on the mechanism of Ca2+-dependent monosaccharide binding by CTLDs is given next.

Ca2+-dependent monosaccharide binding

The mechanism of Ca2+-dependent monosaccharide binding by several CTLDs has been studied in great detail by X-ray crystallography, site-directed mutagenesis and biochemical methods. The first crystallographic study of a complex between a CTLD and a carbohydrate was carried out on rat MBP-A and the N-glycan Man6-GlcNAc2-Asn [51]. In the structure obtained, a ternary complex between the terminal mannose moiety of the oligosaccharide, the Ca2+ ion bound in site 2 and the protein was observed. The complex is stabilized by a network of coordination and hydrogen bonds: oxygen atoms from the 3- and 4-hydroxyls of the mannose form two coordination bonds with the Ca2+ ion and four hydrogen bonds with the carbonyl sidechains that form the Ca2+-binding site 2 (Fig. 5). This bonding pattern is fundamental for CTLD/Ca2+/monosaccharide complexes, and is observed in all known structures. It is also a major contributor to the binding affinity, especially in CTLDs specific for the mannose group of monosaccharides. For example, in MBP-A, mannose atoms form very few interactions with the protein other than hydrogen/coordination bond formation by the two equatorial hydroxyls, and extensive mutagenesis screening has shown that the only other significant contributor to mannose binding is Cβ from His189, which forms a hydrophobic interaction with the sugar [84].

The positioning of hydrogen donors and acceptors in the binding sites has two important features. First, it determines the overall positioning and orientation of the ligand in the binding site. It may seem from Fig. 5A that the sugar-binding site of CTLDs has a twofold symmetry axis relating the sugar hydroxyls, and that the hypothetical sugar shown can be rotated by 180° without introducing any changes to the bonding scheme. It is now known that this is indeed the case, although some early modeling and mutagenesis studies were based on the assumption that the orientation of the sugar was fixed. When the structure of a complex of rat MBP-C with mannose was determined, the orientation of the bound mannose was opposite to that observed in MBP-A [85], and further studies revealed some of the factors that determine the preferred orientation [86]. Although the rat MBPs are the only established example of a CTLD that can bind carbohydrates in both orientations, it is known that different CTLDs bind the same monosaccharide in different orientations (e.g. galactose-binding MBP-A mutant and CEL-I vs. TC-14 lectin).

The second constraint imposed by the Ca2+-coordination site on the ligand determines the properties of the carbohydrate hydroxyls that the site can accept, and this is best demonstrated by the mechanism of discrimination between the mannose group of monosaccharides and the galactose group of monosaccharides by CTLDs. As noted previously, early in the history of CTLDs an important correlation was noted between the residues flanking the conserved cis-proline in the long loop region, which are involved in Ca2+-binding site 2 formation, and the specificity for either galactose or mannose. In all mannose-binding proteins known at that time, the sequence of the motif was EPN (E185 and N187 in MBP-A), while in the galactose-specific CTLDs it was QPD. In a series of elegant mutagenesis experiments Drickamer and coworkers have shown that replacing the EPN sequence in MBP-A with a galactose-type QPD sequence was enough to switch the specificity to galactose [87], and that further modifications around the binding site (mainly introduction of a properly positioned aromatic ring to form a hydrophobic interaction with the apolar face of the sugar) can increase the affinity and specificity of the mutant MBP-A for galactose to the level observed in natural galactose-binding CTLDs [88].

Crystallographic analysis of the galactose-specific MBP-A mutant showed that the EPN to QPD change does not cause any serious restructuring of the Ca2+-binding site 2 geometry [89]; this suggested that the key switch in the specificity was induced by swapping the hydrogen-bond donor and acceptor across the monosaccharide-binding plane and changing the hydrogen-bonding pattern from the mannose-type asymmetrical (Fig. 5A, dark-grey arrows) to galactose-type symmetrical (Fig. 5A, light-grey arrows). The same distribution of hydrogen-bonding partners was observed in the galactose-binding lectin TC-14 from the tunicate Polyandrocarpa misakiensis [64]. The TC-14 CTLD contains an unusual EPS motif in the long loop region, which is similar to the motifs of the mannose-binding proteins but contains a serine as a hydrogen-bond donor instead of the asparagine in MBP-A. The crystal structure revealed that due to a compensatory change on the opposite side of the ligand-binding site (the 'WND' motif is changed to LDD), and a 180° rotation of the galactose residue compared with the orientation observed in the galactose-binding MBP-A mutant, the symmetrical pattern of the hydrogen bonding is maintained.

Fig. 5. … Protein groups that act as hydrogen donors and acceptors are not shown. Arrows show the direction of hydrogen bonds in mannose-specific CTLDs, while light-grey arrows indicate changed directions in galactose-specific CTLDs. (B) A stereoview of the MBP-A complex with mannose (PDB 2msb). Coordination bonds are orange. Hydrogen bonds where sugar hydroxyl acts as acceptor and donor are red and blue, …

Although many of the determinants of the monosaccharide-binding specificity have been established experimentally, the mechanism underlying them is still unclear. Mutual spatial disposition of bonded hydroxyls, which was initially suggested to be the main contributor to the specificity, is no longer considered so important; a growing number of crystal structures of CTLDs with the MBP-A-like ('asymmetrical') distribution of hydrogen-bond donors and acceptors have shown that the core binding site is compatible not only with any two equatorial hydroxyls (3- and 4-OH of mannose and glucose, 2- and 3-OH of fucose), but also with a combination of axial and equatorial hydroxyls (3- and 4-OH of fucose, as in E- and P-selectin structures). A comparative study of different lectin-carbohydrate complexes published by Elgavish and Shaanan [90] suggests that additional stereochemical factors need to be taken into consideration. Elgavish and Shaanan noted the unique clustering of hydrogen-bond donors and acceptors around the 4-OH group in all compared structures, which was not observed for other hydroxyls: in a Newman projection along the O4-C4 bond, hydrogen-bond acceptors are never gauche to both vicinal ring carbons (C3 and C5), and thus the 4-OH proton is always pointing outside the ring. Poget et al. [64] confirmed this observation and also noted that in CTLDs the same rule holds for the 3-OH proton. However, no explanation of the unique stereochemistry of the 4-OH binding orientation has been offered.

Other contributions to monosaccharide binding affinity and specificity

Although the network of interactions between the Ca2+ ion, the carbonyl residues that coordinate it and the sugar hydroxyls determines the basic binding affinity and specificity to either mannose-type or galactose-type monosaccharides, other structural elements in the binding sites increase the affinity to the level required for efficient binding, impose steric limitations on the orientation of the ligand and introduce selectivity to the particular members within the mannose or galactose groups.

Structural determinants of specificity for particular monosaccharides from both mannose and galactose groups were studied by protein engineering on the MBP framework [91,92] and by mutagenesis of several wild-type proteins (mechanisms of discrimination between Glc and GlcNAc by chicken hepatic lectin [93], contribution of His189 to the mannose-binding affinity in MBP-A [84], mutations affecting MBP-A binding of mannose [94], discrimination between GalNAc and Gal by ASGPR [95], increasing the mutant MBP-A affinity towards galactose [88], role of van der Waals interaction with Val351 in fucose recognition by human DC-SIGN [96] and residues affecting pH-dependent ligand release by ASGPR [70]). These additional contributors to binding, however, are variable even between close homologues, which, combined with the inherent plasticity of the core binding site, makes any predictive modeling questionable.

Reliability of Ca2+/carbohydrate-binding prediction

As noted above, the molecular mechanism of Ca2+-dependent carbohydrate binding is conserved in all family members studied; the amino acids that form the core of the binding sites form characteristic motifs ('EPN' and 'WND') that can be identified by sequence similarity and are indicative of the binding specificity (mannose vs. galactose). These observations provide a simple and very popular approach to predicting whether a CTLD of unknown function is likely to bind sugar ('EPN' and 'WND' present) and whether it would preferentially bind mannose- or galactose-type ligands ('EPN' vs. 'QPD'). This simple prediction technique is widely used and has proven to be reliable in many cases. However, its development was based on comparison of a limited set of well-characterized domains, whereas the number of uncharacterized sequences to which it is applied is quickly growing, as is the evolutionary distance between the characterized and new sequences. It is therefore important, especially for studies involving large-scale CTLD sequence analysis, to take into account the assumptions on which this approach is based, and its possible limitations.
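The motif rule described in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prediction method of [83]: it assumes that the long-loop tripeptide and the 'WND' region have already been extracted from an alignment against a characterized CTLD, and the function name, input format and returned labels are hypothetical.

```python
from typing import Optional

# Motif strings as named in the text; everything else here is illustrative.
MANNOSE_TYPE = "EPN"    # long-loop tripeptide in mannose-binding CTLDs
GALACTOSE_TYPE = "QPD"  # long-loop tripeptide in galactose-binding CTLDs
WND_MOTIF = "WND"       # second motif of the core binding site


def predict_sugar_binding(long_loop_motif: str, wnd_region: str) -> Optional[str]:
    """Return 'mannose-type', 'galactose-type', or None (no prediction).

    Both arguments are three-residue strings taken from the aligned
    positions (hypothetical input format; the alignment step is not shown).
    """
    loop = long_loop_motif.upper()
    wnd = wnd_region.upper()
    # The rule requires the 'WND' motif in addition to a long-loop motif.
    if wnd != WND_MOTIF:
        return None
    if loop == MANNOSE_TYPE:
        return "mannose-type"
    if loop == GALACTOSE_TYPE:
        return "galactose-type"
    # Any other tripeptide gives no prediction under the simple rule.
    return None


if __name__ == "__main__":
    print(predict_sugar_binding("EPN", "WND"))  # wild-type MBP-A motifs
    print(predict_sugar_binding("QPD", "WND"))  # EPN-to-QPD MBP-A mutant
    print(predict_sugar_binding("EPS", "LDD"))  # motifs reported for TC-14
```

With the TC-14 motifs (EPS and LDD) the sketch returns no prediction, although TC-14 is a galactose-binding lectin. A negative result from such a rule is therefore not evidence that a domain lacks sugar-binding activity.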

The three main assumptions are: (a) the presence of Ca2+-binding site 2 strongly suggests sugar-binding activity; (b) Ca2+-dependent sugar binding involving Ca2+-binding site 2 is the only (major) mechanism of monosaccharide binding by CTLDs; and (c) the positioning of hydrogen-bond donors and acceptors flanking the conserved proline in the long loop region determines specificity to either mannose- or galactose-type monosaccharides.

As described above, the presence of the residue motifs associated with Ca2+-binding site 2 does not guarantee that the CTLD will bind carbohydrates. Several examples exist in the literature where sugar-binding activity and specificity predicted from the sequence were not confirmed by experiment. The CTLD of human tetranectin contains a galactose-type QPD motif and binds two Ca2+ ions, but the only demonstrated carbohydrate-binding activity of this protein [97] is not associated with the CTLD [98]. Antifreeze protein from Atlantic herring also contains a galactose-type QPD motif and binds Ca2+, but does not bind carbohydrate [99]. Although human macrophage mannose receptor CTLDs 4 and 5 both contain mannose-type EPN motifs and other positions typically involved in Ca2+-binding are occupied by identical or similar residues [100], monosaccharide-binding activity could be demonstrated only for CTLD 4 [101]. On the other hand, lung surfactant protein A has an EPK motif in the long loop region, but binds Ca2+ at site 2 and also monosaccharides from the mannose group [102,103].

As to the second assumption, there is no firm evidence to indicate that an alternative mechanism of monosaccharide binding by CTLDs exists, but we found several examples in the literature that may suggest this possibility. These were: (a) The existence of a secondary site was proposed for rabbit and rat hepatic lectins based on binding data [104]. (b) In a study using a photo-activatable galactose derivative to map the binding site of a galactose-specific lectin from acorn barnacle (BRA-3), the labeled regions were not adjacent to the Ca2+ site 2 [105]. (c) A secondary binding site was observed in one of the MBP-C crystals soaked with a high concentration (1.3 M) of α-methyl-mannose [85]. Although the second binding site was not observed at lower monosaccharide concentration (0.2 M) and electron density for the sugar could only be assigned for one of the two copies in the asymmetric unit, it has been suggested that the secondary binding site may be a part of an extended site that has significant affinity only for larger ligands [85]. Interestingly, the monosaccharide bound at the alternative site is in contact with the regions corresponding to the regions labeled in the acorn barnacle lectin study. (d) Although the CTLD of human thrombomodulin does not contain the typical Ca2+-binding sequence signature, aggregation of melanoma cells mediated by it is abolished by Ca2+ removal or by addition of mannose, chondroitin sulfate A or chondroitin sulfate C [106], which suggests a Ca2+-dependent carbohydrate-binding activity. (e) Dectin-1, which is reported as a macrophage β-glucan receptor [107], does not contain Ca2+/carbohydrate-binding motifs or require Ca2+ for carbohydrate binding; residues required for binding are located in strand β3 [108]. Other group V CTLDcps may have similar properties [81].

The evidence for an alternative mechanism of sugar binding by CTLDs is scarce and does not show any common trend. On the other hand, a surprisingly large proportion (> 80%) of CTLDs from invertebrates is predicted not to bind sugar. It is possible that some of these proteins use an alternative mechanism for sugar binding. In this regard the example of the Link group of proteins is pertinent. These proteins do not contain a long loop region but nevertheless bind carbohydrates via a different mechanism.

As to the last assumption, there is no compelling explanation of the correlation between donor-acceptor positioning in site 2 and the discrimination between galactose and mannose, although it is supported by the majority of the CTLDs discovered since the observation was made. However, the example of the Polyandrocarpa lectin shows that the correlation is not absolute [64].

Groups of vertebrate CTLDcps

In a review of the C-type lectin family published in 1993 Drickamer separated the CTLDcps known at that time into seven groups (I to VII) based on their domain architecture and showed that such grouping correlates well with the results of phylogenetic analysis of the CTLD sequences and captures functional similarities between the proteins [2]. The classification was revised in 2002 [6] with the addition of seven new groups (VIII to XIV). Whereas the first seven groups of CTLDcps have a substantial history and are widely referenced in the literature, the new groups were only briefly outlined in the work introducing them. Along with the updated classification, a link to the 'World-wide web-based resource for animal lectins' (http://www.imperial.ac.uk/research/animallectins/default.html) was published, where some additional information on the new groups can be found, including the lists of database identifiers for the sequences that were used to define them. However, no functional description of the CTLDcps from the new groups similar to the description of the groups I to VII has been published. The domain architecture of the CTLDcps in different groups is shown in Fig. 6. In addition to the 14 groups present in Drickamer's updated classification, three new groups (XV to XVII) are shown, which we have added to accommodate the novel vertebrate proteins we identified in the study of Fugu CTLDcps [9]. Table 1 summarizes the literature on the vertebrate CTLDcp groups, focusing on the structural and functional features of the CTLDs.

Fig. 6. Domain architecture of vertebrate CTLDcps, with mammalian homologues, from different groups. Group numbers are indicated next to the domain charts. I – lecticans, II – the ASGR group, III – collectins, IV – selectins, V – NK receptors, VI – the macrophage mannose receptor group, VII – REG proteins, VIII – the chondrolectin group, IX – the tetranectin group, X – polycystin 1, XI – attractin, XII – EMBP, XIII – DGCR2, XIV – the thrombomodulin group, XV – Bimlec, XVI – SEEC, XVII – CBCP.

Date posted: 30/03/2014, 11:20
