
Structure and function of KH domains

Roberto Valverde1, Laura Edwards2 and Lynne Regan1,3

1 Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA

2 Department of Molecular and Cellular Developmental Biology, Yale University, New Haven, CT, USA

3 Department of Chemistry, Yale University, New Haven, CT, USA

Introduction

The hnRNP K homology (KH) domain was named for the human heterogeneous nuclear ribonucleoprotein K (hnRNP K), the first protein in which the motif was identified [1]. The KH motif consists of approximately 70 amino acids, and is found in a diverse variety of proteins in archaea, bacteria and eukaryota

Keywords

fragile X mental retardation; interaction motif; KH domains; K homology domain; noncrystallographic symmetry; protein motif; RNA-binding; RNA-binding protein; RNA-recognition; solvent accessibility

Correspondence

L. Regan, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA

Fax: +1 203 432 3104

Tel: +1 203 432 9843

E-mail: lynne.regan@yale.edu

(Received 3 January 2008, revised 18 February 2008, accepted 14 March 2008)

doi:10.1111/j.1742-4658.2008.06411.x

The hnRNP K homology (KH) domain was first identified in the protein human heterogeneous nuclear ribonucleoprotein K (hnRNP K) 14 years ago. Since then, KH domains have been identified as nucleic acid recognition motifs in proteins that perform a wide range of cellular functions. KH domains bind RNA or ssDNA, and are found in proteins associated with transcriptional and translational regulation, along with other cellular processes. Several diseases, e.g. fragile X mental retardation syndrome and paraneoplastic disease, are associated with the loss of function of a particular KH domain. Here we discuss the progress made towards understanding both general and specific features of the molecular recognition of nucleic acids by KH domains. The typical binding surface of KH domains is a cleft that is versatile but that can typically accommodate only four unpaired bases. Van der Waals forces and hydrophobic interactions and, to a lesser extent, electrostatic interactions, contribute to the nucleic acid binding affinity. ‘Augmented’ KH domains or multiple copies of KH domains within a protein are two strategies that are used to achieve greater affinity and specificity of nucleic acid binding. Isolated KH domains have been seen to crystallize as monomers, dimers and tetramers, but no published data support the formation of noncovalent higher-order oligomers by KH domains in solution. Much attention has been given in the literature to a conserved hydrophobic residue (typically Ile or Leu) that is present in most KH domains. The interest derives from the observation that an individual with this Ile mutated to Asn, in the KH2 domain of fragile X mental retardation protein, exhibits a particularly severe form of the syndrome. The structural effects of this mutation in the fragile X mental retardation protein KH2 domain have recently been reported. We discuss the use of analogous point mutations at this position in other KH domains to dissect both structure and function.

Abbreviations

BPS, branchpoint sequence; dFXRP, Drosophila fragile X-related protein; FBP, FUSE-binding protein; FMRP, fragile X mental retardation protein; FUSE, ssDNA far-upstream element; FXRP, fragile X-related protein; hFMRP, human fragile X mental retardation protein; hnRNP K, human heterogeneous nuclear ribonucleoprotein K; KH, hnRNP K homology; KSRP, K homology splicing regulator protein; NCS, noncrystallographic symmetry; PCBP, poly(C)-binding protein; PSI, P-element somatic inhibitor protein; SF1, splicing factor 1; Y2H, yeast two-hybrid.


[1,2]. Typically, KH domains are found in multiple copies: two in fragile X mental retardation protein (FMRP) [3–5], three in hnRNP K [1,6], and 14 in vigilin [7,8]. There are, however, a few examples of proteins with single KH motifs; Mer1p [1,9] and Sam68 [10] each have just one. The typical function of KH domains, whether they are present in single or multiple copies, is RNA or ssDNA recognition. When present in a protein in multiple copies, KH domains can function independently or cooperatively. In ssDNA far-upstream element (FUSE)-binding protein (FBP), for example, the KH3 and KH4 domains are separated by a flexible Gly linker with no interdomain contacts [11]. Each KH domain binds to a segment of ssDNA, with a linker of noncontacted ssDNA between them [12]. By contrast, the two KH domains of NusA have an extensive interdomain contact area, and bind an extended segment of RNA that runs across both domains [13–15].

KH modules are found in many different proteins, which are involved in a myriad of different biological processes, including splicing, transcriptional regulation, and translational control.

Two folds, one motif

It was pointed out by Grishin that there are actually two different versions of the KH motif, which he named type I and type II KH folds (Fig. 1) [2]. The type I fold is typically found in eukaryotic proteins, whereas the type II fold is typically found in prokaryotic proteins. Although type I and type II folds both share a ‘minimal KH motif’ in the linear sequence, the three-dimensional arrangement of the secondary structural elements is different. In the type I fold, a β-sheet composed of three antiparallel β-strands is abutted by three α-helices (α1, α2, and α′). The β-sheet in type I KH domains consists of three β-strands in the order β1, β′ and β2. The β1-strand and β2-strand are parallel to each other, and the β′-strand is antiparallel to both (Fig. 1). This all-antiparallel arrangement of strands distinguishes the type I KH fold from the type II KH fold, in which the β1-strand and β2-strand are adjacent and parallel to each other, and the β′-strand is adjacent and antiparallel to the β1-strand (Fig. 1). The length and sequence of the variable loop are different in different KH domains, be they type I or type II (the variable loop is shown as a dotted line in Fig. 1). Variable loop lengths from three to over 60 residues are known. All typical KH domains have a GXXG loop (shown in white in Fig. 1) [2], although this is sometimes altered or interrupted in divergent KH domains [16].
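The idea of a shared ‘minimal KH motif’ can be made concrete by writing out the order of secondary structure elements along the chain. The sketch below is illustrative only. The element orders (β1–α1–α2–β2–β′–α′ for type I and α′–β′–β1–α1–α2–β2 for type II) are taken from the standard nomenclature of [2] as it is commonly summarized; they are an assumption here, because the text above does not spell them out. The function name `shared_core` is hypothetical.

```python
# Order of secondary structure elements along the chain (N- to C-terminus).
# Assumed from the standard KH nomenclature of [2]; ASCII names are used:
# "b'" and "a'" stand for the beta-prime strand and the alpha-prime helix.
TYPE_I = ["b1", "a1", "a2", "b2", "b'", "a'"]
TYPE_II = ["a'", "b'", "b1", "a1", "a2", "b2"]


def shared_core(first, second):
    """Return the longest run of consecutive elements common to both folds."""
    best = []
    for i in range(len(first)):
        for j in range(len(second)):
            k = 0
            while (i + k < len(first) and j + k < len(second)
                   and first[i + k] == second[j + k]):
                k += 1
            if k > len(best):
                best = first[i:i + k]
    return best


core = shared_core(TYPE_I, TYPE_II)
print("shared core:", "-".join(core))
print("type I adds at the C-terminus:", "-".join(TYPE_I[len(core):]))
print("type II adds at the N-terminus:", "-".join(TYPE_II[:-len(core)]))
```

Under these assumed orders, the common run is the four-element core, and the two folds differ only in whether the extra strand and helix precede or follow it.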

Not only is the order of secondary structural elements in individual eukaryotic type I KH domains different from that in prokaryotic type II KH domains, but the relative orientation of tandem type I versus type II KH domains is also quite different. The comparison is limited, however, because the structure of only one of each type of tandem KH domain has been published. Here we compare the structures of the tandem KH1–KH2 domains from protein NusA (Protein Data Bank entry 2ASB) [14,15] and from human FMRP (hFMRP) (Protein Data Bank entry 2QND) [17] as examples of tandem prokaryotic KH (type II) domains and tandem eukaryotic (type I) KH domains, respectively. In NusA, an unstructured six amino acid linker connects KH1 to KH2, and an area of 1380 Å² is buried at the interface between the β-sheet of KH1 and the α-helices (α′ and α2) of KH2 (Fig. 2B). By contrast, in hFMRP(KH1–KH2Δ), the α′-helix of KH1 is linked to the β1-strand of KH2 by the single residue, Glu280, which adopts non-β non-α phi/psi angles to accomplish this tight connection, which contains minimal interdomain contacts between aliphatic residues from the α1-helix of KH1 and the β-sheet of KH2 [17,18].

Fig. 1. Type I and type II KH domain folds. Stylized representations of (A) the type I KH domain (eukaryotic) and (B) the type II KH domain (prokaryotic). The labeling of secondary structure elements is according to standard KH nomenclature [2]. The dotted line connecting the β2-strand and β′-strand represents the variable loop. The white line connecting the α1-helix and the α2-helix represents the GXXG loop.

Evolutionary relationships between KH domains

Type I domains are found in multiple copies in eukaryotic proteins, whereas type II KH domains are typically found as single copies in prokaryotic proteins. Here, therefore, we discuss eukaryotic proteins. Within a family of KH proteins with multiple KH domains (i.e. type I KH domains), the KH1 domain is always more similar to other KH1 domains in different proteins than to the KH2 and KH3 domains in the same protein. Similar relationships are seen for KH2 and KH3 domains: they are more similar to other KH2 and KH3 domains, respectively, than they are to each other or to KH1 domains (Fig. 3). This relationship holds true in all families and between species, from those within which the like-pairs of domains have very high identity [over 95% in the Nova and poly(C)-binding protein families], to those within which like-pairs of domains have much lower identity (around 50% in the FXR family; Fig. 3).
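The pattern described here can be checked directly against the percentage identities in Fig. 3. The following sketch uses the FMRP-family values from that figure. It assumes that the columns of that block follow the same order as its rows (the column headers for this block are not shown in the figure as reproduced), and the data layout and the function name `classify_pairs` are illustrative choices, not part of the original analysis.

```python
from itertools import combinations

# (protein, KH position) for each row of the FMRP-family block of Fig. 3.
DOMAINS = [("FMRP", 1), ("FMRP", 2), ("FXR1", 1), ("FXR1", 2),
           ("FXR2", 1), ("FXR2", 2), ("dmFMR1", 1), ("dmFMR1", 2)]

# Lower triangle of the identity matrix, row by row (diagonal omitted).
# Assumption: column order matches row order.
LOWER = [
    [],
    [21.7],
    [82.0, 23.0],
    [20.3, 55.2, 17.7],
    [64.3, 20.0, 68.6, 19.0],
    [21.7, 53.7, 22.9, 82.0, 22.8],
    [54.4, 22.1, 58.8, 23.1, 58.8, 25.6],
    [22.5, 43.1, 24.0, 65.3, 26.7, 62.5, 22.6],
]


def classify_pairs(domains, lower):
    """Split identities into same-position and different-position pairs."""
    same, different = [], []
    for i, j in combinations(range(len(domains)), 2):
        identity = lower[j][i]  # j > i, so the value sits in row j, column i
        if domains[i][1] == domains[j][1]:
            same.append(identity)
        else:
            different.append(identity)
    return same, different


same, different = classify_pairs(DOMAINS, LOWER)
print(f"same position:      min {min(same):.1f}%, max {max(same):.1f}%")
print(f"different position: min {min(different):.1f}%, max {max(different):.1f}%")
```

With these values, the lowest same-position identity (43.1%) is still higher than the highest different-position identity (26.7%), which is the relationship the text describes for the least conserved of the three families.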

From this observation, a number of hypotheses about the origin and evolution of the KH domains may be proposed. If multiple KH domains arose as a result of a gene duplication event, the results cited above suggest that duplication occurred before the divergent evolution of the members of each protein family. Alternatively, one could speculate that the interdomain identities are a result of convergent evolution of different domains in a parent protein, before subsequent evolutionary divergence produced different members of the family.

Nucleic acid binding by KH domains – general features

The structures of KH domains in complex with their cognate nucleic acid ligand are mostly of type I domains from eukaryotic proteins, which function in transcriptional and translational regulation. The only structures of type II KH domains in complex with nucleic acid ligand are of the bacterial protein NusA [15] (Protein Data Bank entries 2ATW and 2ASB).

Although the total number of structures in the Protein Data Bank of KH domains bound to cognate nucleic acid ligand is small, some common features of nucleic acid recognition emerge among them. The RNA or DNA is bound in an extended, single-stranded conformation across one face of the KH domain, between the α1-helix and the α2-helix and GXXG on the ‘left’, and the β2-strand and the variable loop on the right (Fig. 4A). Together, these secondary structural elements form a binding cleft that accommodates four bases. Note that the secondary structure elements that shape the binding cleft comprise, in part, the core motif found in type I and type II domains. The variable loop in type II KH domains, however, is located at the bottom of the binding cleft (Fig. 4A). The center of the binding pocket tends to be hydrophobic, with a variety of additional specific interactions stabilizing the complex. Nucleic acid base-to-protein aromatic side-chain stacking interactions, which are prevalent in other types of single-stranded nucleic acid binding motifs [19,20], are notably absent in KH domain nucleic acid recognition.

In some complexes, the bases in the ssDNA or RNA bound by the KH domain stack with each other (Fig. 4B), whereas in other examples there is no base stacking.

An adenine–backbone interaction is a feature seen in some KH domain–nucleic acid structures (Fig. 4C). Examples are (relevant adenine in bold) A42–G43–A44–A45 in NusA KH1, C48–A49–A50–U51 in NusA KH2 [15], U12–C13–A14–C15 in Nova-2 KH3 [21], and U6–A7–A8–C9 in splicing factor 1 (SF1) [22]. The adenine bases hydrogen bond to the protein backbone, mimicking a Watson–Crick base pairing pattern. Superimposing the NusA KH1 domain and ribonucleotides 42–46 on the NusA KH2 domain and ribonucleotides 48–53 reveals that the adenine bases of A44 and A50 make exactly equivalent hydrogen bonds to the protein backbone [15].

Fig. 2. The orientation of individual KH domains in tandem type I and type II arrays. Schematics are based on the crystal structures of the KH1–KH2 domains of NusA (type II) (Protein Data Bank entry 1KOR) and fragile X mental retardation protein [type I (B)] (Protein Data Bank entry 2QND). Each domain is represented as an oval with the β-sheet side colored solid black and the abutting α-helices striped.

KH domains bind ssDNA and RNA with low micromolar affinity. For example, the Kd values of the KH domain of the SF1–DNA complex and the hnRNP K KH3 domain–DNA complex are 3 μM and 1 μM, respectively [22,23]. The clustering of KH domains increases nucleic acid recognition and specificity [24]; the four tandem KH domains of P-element somatic inhibitor protein (PSI), for example, bind ligand cooperatively [25]. The KH1–KH2 domains of NusA (Protein Data Bank entries 2ATW and 2ASB) form an uninterrupted recognition surface that binds RNA with nanomolar affinity [15]. Together, the third and fourth KH domains of the K homology splicing regulator protein (KSRP) bind RNA ligand more tightly than each does separately [26].
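To place the quoted dissociation constants on an energy scale, the standard binding free energy can be estimated from ΔG° = RT ln(Kd). The sketch below assumes a temperature of 298.15 K and a 1 M standard state, neither of which is specified in the text, and the function name `binding_free_energy` is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R * T * ln(Kd / 1 M); the default temperature is an assumption.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


# Kd values quoted in the text for the two complexes.
for name, kd in [("SF1 KH", 3e-6), ("hnRNP K KH3", 1e-6)]:
    dg = binding_free_energy(kd)
    print(f"{name}: Kd = {kd:.0e} M, dG = {dg:.1f} kcal/mol")
```

On this scale the threefold difference between the two Kd values amounts to RT ln 3, roughly 0.65 kcal/mol at the assumed temperature, so the two domains bind with very similar energies.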

Finally, where the structures of both the KH–nucleic acid complex and free KH domain have been determined, ligand binding produces little or no structural change in the protein as determined by our analysis [27] and concluded in [15,21,28].

Nucleic acid recognition by KH domains – specific examples

NMR structure of the KH3 domain of hnRNP K with ssDNA bound

The type I KH3 domain of the transcriptional regulator hnRNP K binds to a 10mer ssDNA, specifically recognizing the tetrad 5′-dTCCC (Fig. 5) [23] (Protein Data Bank entry 1J5K). The authors propose that the complex is stabilized by methyl-to-oxygen hydrogen bonds between three Ile side-chains and the O2 and N3 atoms of the two central cytosine bases. Methyl-to-oxygen hydrogen bonds are uncommon and weak, but not without precedent [29,30]. Additional interactions that stabilize the complex include protein backbone and side-chain hydrogen bonds to bases, and electrostatic interactions between positively charged side-chains on the protein and the phosphate backbone of the nucleic acid.

            FMRP KH1    FMRP KH2    FXR1 KH1    FXR1 KH2    FXR2 KH1    FXR2 KH2    dmFMR1 KH1  dmFMR1 KH2
FMRP KH1    100.0       -           -           -           -           -           -           -
FMRP KH2    21.7        100.0       -           -           -           -           -           -
FXR1 KH1    82.0        23.0        100.0       -           -           -           -           -
FXR1 KH2    20.3        55.2        17.7        100.0       -           -           -           -
FXR2 KH1    64.3        20.0        68.6        19.0        100.0       -           -           -
FXR2 KH2    21.7        53.7        22.9        82.0        22.8        100.0       -           -
dmFMR1 KH1  54.4        22.1        58.8        23.1        58.8        25.6        100.0       -
dmFMR1 KH2  22.5        43.1        24.0        65.3        26.7        62.5        22.6        100.0

            NOVA-1 KH1  NOVA-1 KH2  NOVA-1 KH3  NOVA-2 KH1  NOVA-2 KH2  NOVA-2 KH3
NOVA-1 KH1  100.0       -           -           -           -           -
NOVA-1 KH2  35.3        100.0       -           -           -           -
NOVA-1 KH3  40.3        37.3        100.0       -           -           -
NOVA-2 KH1  95.5        36.8        36.8        100.0       -           -
NOVA-2 KH2  32.4        86.3        34.3        34.3        100.0       -
NOVA-2 KH3  38.8        35.8        90.9        37.3        35.4        100.0

            PCB1 KH1    PCB1 KH2    PCB1 KH3    PCB2 KH1    PCB2 KH2    PCB2 KH3
PCB1 KH1    100.0       -           -           -           -           -
PCB1 KH2    33.8        100.0       -           -           -           -
PCB1 KH3    35.4        33.8        100.0       -           -           -
PCB2 KH1    95.2        33.8        32.3        100.0       -           -
PCB2 KH2    35.4        93.8        31.0        35.4        100.0       -
PCB2 KH3    33.8        35.4        92.1        30.8        32.4        100.0
PCB3 KH1    88.7        32.3        36.9        90.3        33.8        35.4
PCB3 KH2    35.4        84.6        35.4        33.8        89.2        36.9
PCB3 KH3    36.4        38.5        84.1        33.3        40.0        84.1
PCB4 KH1    74.2        35.4        35.4        69.4        33.8        35.4
PCB4 KH2    35.4        76.9        36.9        33.8        80.0        38.5
PCB4 KH3    33.8        33.8        66.7        30.8        35.4        71.4

            PCB3 KH1    PCB3 KH2    PCB3 KH3    PCB4 KH1    PCB4 KH2    PCB4 KH3
PCB1 KH1    -           -           -           -           -           -
PCB1 KH2    -           -           -           -           -           -
PCB1 KH3    -           -           -           -           -           -
PCB2 KH1    -           -           -           -           -           -
PCB2 KH2    -           -           -           -           -           -
PCB2 KH3    -           -           -           -           -           -
PCB3 KH1    100.0       -           -           -           -           -
PCB3 KH2    33.4        100.0       -           -           -           -
PCB3 KH3    36.9        40.0        100.0       -           -           -

Fig. 3. Table showing sequence identities of KH domains within protein families. Data for the FMRP, Nova and PCBP families are shown. For each family, the sequences of individual KH domains were aligned with KH domains at different positions in the same protein, and KH domains at the same position in different proteins. The highest percentage identities were consistently those between KH domains at the same position in different members of a protein family (highlighted in purple).

Poly(C)-binding proteins

Poly(C)-binding proteins (PCBPs) contain three type I KH domains, which appear to function independently, because they are separated by long linkers: KH1–(16 amino acid spacer)–KH2–(67 to > 100 amino acid spacer)–KH3. They bind to poly(C)-rich DNA and RNA sequences and function in a diverse range of cellular processes, including mRNA stabilization, translational activation, and translational silencing [31,32].

Crystal structures have been solved of PCBP2 KH1 in complex with a 12-nucleotide ssDNA and with its RNA equivalent (Protein Data Bank entries 2PQU and 2PQY, respectively) [33]. In both the ssDNA and RNA complexes, the 12 nucleotides correspond to two repeats of the human C-rich strand telomeric DNA, 5′-AACCCTAACCCT-3′ (a single repeat is underlined, and the core recognition sequence is in bold). The asymmetric unit of both ssDNA and RNA crystals contains two KH1 molecules tethered by one oligonucleotide ligand. The crystal structures of PCBP2 KH1 in complex with either 12-nucleotide ssDNA or its equivalent RNA are similar, with no indication that the hydroxyl groups of the RNA bases are involved in interactions with the protein (Fig. 6A). The CCCT/U tetranucleotide motif constitutes the core recognition sequence.

Fig. 4. Common features of KH domain–nucleic acid interactions. (A) Type I KH domain; the binding cleft comprises the secondary structural elements α1-helix, GXXG loop, α2-helix, β2-strand, and variable loop (colored green), and recognizes four nucleotides (cyan sticks). The green dotted line represents the location of the variable loop in type II KH domains. (B) Nucleic acid bases of the ligand stacking with each other. Coordinates from Protein Data Bank entry 1J5K were used in (A) and (B), and coordinates from Protein Data Bank entry 2ASB were used in (C).

Interestingly, however, when PCBP2 KH1 was crystallized with a seven-nucleotide single repeat ssDNA ligand 5′-AACCCTA-3′ (core recognition sequence in bold), a different ‘register’ of the nucleic acid–protein complex was observed [28] (Protein Data Bank entry 2AXY, shown in Fig. 6B). In all structures, the nucleic acid was in the ‘typical’ cleft, but its position relative to the protein was shifted up by one base in the 5′-direction in the seven-nucleotide structure (ACCC versus CCCT; Fig. 6A–C). The first position of the core recognition motif sits on top of the α1-helix, and then the phosphate backbone of the next two nucleotides interacts with the α1-helix and the GXXG motif on the left, and the β2-strand and the variable loop on the right. Base stacking is observed between the third and fourth position nucleotides of the core recognition sequence. The recently solved high-resolution structure of the third KH domain of PCBP2 bound to ssDNA, 5′-dAACCCTA-3′ [34] (Protein Data Bank entry 2P2R) is similar to previous structures of the first KH domain of PCBP2. However, because the crystals diffracted to ultra-high resolution, hydrogen bonding and water molecules mediating protein–DNA contacts were observed that previously could not be resolved in other crystal structures. Specifically, the binding cleft is occupied by the tetrad 5′-CCCT-3′, with direct water-mediated contacts stabilizing the last two bases, and protein–nucleic acid contacts to two additional bases beyond the binding cleft were seen. Also of interest is the observation that in different crystal forms, the KH domains of PCBP2 were either monomeric or were a crystal-contact-mediated dimer (see section on KH dimers).
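The one-base change in register between the two PCBP2 KH1 structures can be illustrated by locating each bound tetrad within the seven-nucleotide repeat. This is a minimal sketch: the sequences are those quoted in the text, and the function name `tetrad_offset` is hypothetical.

```python
LIGAND = "AACCCTA"  # seven-nucleotide single-repeat ssDNA ligand from the text


def tetrad_offset(ligand, tetrad):
    """Return the 0-based position of a bound tetrad within the ligand."""
    position = ligand.find(tetrad)
    if position < 0:
        raise ValueError(f"{tetrad} does not occur in {ligand}")
    return position


seven_mer = tetrad_offset(LIGAND, "ACCC")   # register seen with the 7-mer
twelve_mer = tetrad_offset(LIGAND, "CCCT")  # register seen with the 12-mer
print("ACCC starts at", seven_mer, "and CCCT starts at", twelve_mer)
print("difference in register:", twelve_mer - seven_mer, "base")
```

The two tetrads overlap by three cytosines and differ only in which flanking base (A on the 5′ side or T on the 3′ side) occupies the cleft.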

RNA recognition by a single KH domain in cooperation with a QUA2 domain

SF1 specifically recognizes the intron branchpoint sequence (BPS) UACUAAC in pre-mRNA transcripts [35], with KH domain binding augmented by additional interactions with an N-terminal helix known as the QUA2 domain (labeled in Fig. 7) [36]. The RNA adopts an extended single-stranded conformation, and is bound in a hydrophobic groove between QUA2, the GXXG loop and the variable loop of the KH domain [22] (Fig. 7; Protein Data Bank entry 1K1G). The QUA2 region recognizes the 5′-nucleotides of the BPS (ACU), with the α1-helix and α2-helix and the β2-strand of the KH domain region interacting with the next nucleotides of the RNA in ‘typical’ fashion. A large surface area of predominantly aliphatic hydrophobic residues is buried at the protein–RNA interface. In addition, positively charged side-chains undergo electrostatic interactions with the solvent-exposed phosphate backbone. Protein contacts to the 3′-end of the RNA are provided by the variable loop and the β2-strand.

Binding of the seven-nucleotide RNA BPS requires both the QUA2 and KH regions. Another example of an augmented KH domain is the fourth KH domain of KSRP [26], which contains a novel fourth β-strand that is located adjacent and angled to the β1-strand and contributes to the stability of the protein (Protein Data Bank entry 2HH2). It is not yet known whether the fourth β-strand is involved in contacts with RNA [26].

X-ray structure of Nova-2 KH3 plus SELEX RNA

The X-ray structure of the KH3 domain of Nova-2 bound to an in vitro selected stem–loop RNA containing the 5′-UCAC-3′ core recognition sequence has been solved [21] (Fig. 8). This structure is something of an ‘outlier’, because the nucleic acid has a double-stranded hairpin stretch (not shown in Fig. 8), which may be a consequence of stability requirements for selection in vitro [37].

The stem of the hairpin adopts the A-form double-helical conformation, with four Watson–Crick base pairs (G1–C20, A2–U19, G3–C18, G4–C17) and a single hydrogen bond between A5 and C16 (N1–O2 = 2.4 Å).
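The stem pairs listed above can be checked for canonical pairing and for consistency with a single hairpin, in which the residue numbers of paired bases sum to a constant. The sketch below is illustrative; the pair strings are transcribed from the text and the function name `is_watson_crick` is hypothetical.

```python
# Canonical Watson-Crick partners in RNA.
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}

# Stem pairs as listed in the text: base type followed by residue number.
STEM_PAIRS = ["G1-C20", "A2-U19", "G3-C18", "G4-C17", "A5-C16"]


def is_watson_crick(pair):
    """True if a pair such as 'G1-C20' joins canonical Watson-Crick partners."""
    left, right = pair.split("-")
    return (left[0], right[0]) in WATSON_CRICK


for pair in STEM_PAIRS:
    left, right = pair.split("-")
    total = int(left[1:]) + int(right[1:])
    kind = "Watson-Crick" if is_watson_crick(pair) else "non-canonical"
    print(f"{pair}: {kind}, residue numbers sum to {total}")
```

The first four pairs are canonical, while A5–C16 is not, which fits its description as a single hydrogen bond rather than a full base pair.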

The extended target RNA (A11, U12, C13, A14, C15) lies upon a hydrophobic platform (formed by the α1-helix and the edge of the β2-strand), where it contacts both the invariant GXXG motif and the variable loop.

Nucleic acid binding by tandem but independent KH domains – NMR structure of the KH3 and KH4 domains of FBP in complex with FUSE ssDNA

FUSE-binding protein has four KH domains, which are separated by linkers of varying lengths [11]. FBP regulates c-myc expression by binding to FUSE [38]. The NMR structure of a complex between the KH3 and KH4 domains of FBP and a 29-base ssDNA fragment from FUSE [12] shows that each KH domain binds to a separate 9- to 10-base segment of ssDNA (Fig. 9). The KH domains are connected by a flexible Gly-rich linker, and behave independently. In addition, the two ssDNA segments to which the KH domains bind are themselves separated by a five-base linker of ssDNA. There are no protein contacts between the KH domains, and the linker DNA is not in contact with protein.

Fig. 5. Solution structure of the KH3 domain of hnRNP K bound to ssDNA. The third KH domain of hnRNP K (Protein Data Bank entry 1J5K) recognizes a tetrad of sequence 5′-dTCCC (purple sticks). Regions on the protein that are in contact with the nucleic acid ligand are colored green (hydrophobic) and cyan (polar). The sugar phosphate backbone curves around the α1-helix near the GXXG loop before proceeding parallel to the α2-helix. The first base sits on top of the α1-helix, and the 5′-dCCC bases of the tetrad fill the interior of the predominantly hydrophobic cleft and base stack with each other (see Fig. 4B). The ends of the ssDNA sugar backbone are stabilized by electrostatic interactions with positively charged residues that line the ridge of the cleft on the GXXG loop and α2-helix.

In both KH domains, the ssDNA is bound in the typical extended orientation, in the groove between the α1-helix and α2-helix plus the GXXG loop on one side, and the β2-strand and the variable loop on the other. The center of the groove is hydrophobic, and the edges are hydrophilic and charged, with the narrow binding site (10 Å) favoring pyrimidines over purines.

NusA – crystal structure of tandem type II KH domains

NusA regulates transcriptional elongation, pausing, termination and antitermination in prokaryotes [39–41]. The protein contains two tandem type II KH domains, which are connected by a short six-residue linker [14,15]. This short linker, combined with a tight turn between the domains, results in a structure in which the two KH domains are in contact and form an extended and continuous surface for RNA binding. NusA binds with high affinity and specificity to BoxB–BoxA–BoxC antitermination sequences within the leader region of the rRNA operon [15]. Ligand binding produces no change in the structure or relative orientation of the KH domains (Protein Data Bank entries 1KOR and 2ASB) [27]. The ssRNA is bound in an extended conformation and is in contact with large areas on both KH domains (Fig. 10).

Despite having type II connectivity, each KH domain of NusA contains a ‘typical’ binding cleft. The variable loop, however, hangs at the bottom of the cleft (Fig. 4A) instead of up and across from the GXXG loop, as in type I KH domains. The 5′-end of the RNA (bases A42 through A45) is buried in and across the groove between the α1-helix and α2-helix and the β2-strand of KH1. Intimate contacts between protein and RNA continue across the cusp of the KH1 and KH2 domains. C46 binds to the loop connecting the β′-strand and α′-helix of KH2, and U47 and C48 make contacts with the α1-helix and the GXXG loop of KH2. Finally, the nucleotides at the 3′-end of the RNA (A49–A52) pack against the groove comprising the α1-helix and α2-helix and the β2-strand of KH2. Hydrogen bonds to both amino acid side-chains and the protein backbone, electrostatic and polar interactions and, to a lesser extent, hydrophobic interactions between bases and nonaromatic amino acid side-chains stabilize the protein–RNA complex.

Fig. 6. Crystal structures of the first KH domain from PCBP2 in complex with ssDNA. The first KH domain of PCBP2 recognizes the tetrad sequence 5′-dCCCT [(A) Protein Data Bank entry 2PQU] and 5′-dACCC [(B) Protein Data Bank entry 2AXY]. Polar and hydrophobic residues that make contacts with nucleic acid (purple sticks) are colored cyan and green, respectively. Waters (gray spheres) that bridge protein and ssDNA contacts were unambiguously resolved in the high-resolution structure in (B). Both structures are representative molecules within the asymmetric unit. In (C), the tetrad sequence (purple letters) of each structure is aligned with respect to the seven-nucleotide single repeat ssDNA ligand. The register of the sequence is shifted in the 5′-direction in (A). In both structures, the nucleotide at the 5′-end of the ssDNA strand sits on the top of the α1-helix, and is stabilized by contacts that can recognize an adenine or cytosine nucleotide. The central cytosine bases of the tetrad sequence occupy the hydrophobic interior of the binding cleft. The last nucleotide at the 3′-end of the ssDNA strand (dC in 2AXY; dT in 2PQU) is participating in base-stacking interactions with the preceding cytosine base.

The interaction of the NusA tandem KH domains with RNA is quite different from that seen in the double KH domain of FBP bound to ssDNA from FUSE, the only other structure of a double KH domain bound to a nucleic acid target. In FBP, the two KH domains are connected by a flexible 30-residue Gly-rich linker and behave like beads on a string [12]. In the protein–DNA complex, each KH domain interacts with a separate ssDNA recognition sequence, and a five-nucleotide noninteracting spacer separates the two bound DNA recognition sequences.

Although in both examples the coupling of two RNA-binding domains will effectively increase the specificity and affinity of the RNA–protein interaction, the two different binding modes have very different consequences for the type and length of RNA bound.

KH crystal dimers – a tenuous relationship

Crystallographic data

Different KH domains crystallize as monomers, dimers, or tetramers. This and other observations have led to the proposal that the functional form of certain KH domains may involve noncovalent dimers or higher-order oligomers. Here we review the data.

Crystals of the single KH3 domain of the protein Nova-2 contain four KH molecules per asymmetric unit (Protein Data Bank entry 1DTJ) related by pseudo-222 noncrystallographic symmetry (NCS; Fig. 11A) with two different surfaces on each KH domain mediating protein–protein contacts (Fig. 11B,C). One protein–protein interface comprises primarily two β1-strands from two KH domains related by two-fold NCS. This arrangement creates an augmented antiparallel β-sheet stabilized by cross-strand side-chain interactions [42] and a buried surface area of 890 Å² [18,43] (reported as 950 Å² in [44]) (Fig. 12A). The other interface comprises two α′-helices with a 50° packing angle [45] of the two KH domains related by NCS that buries 1000 Å² (reported as 1250 Å² in [44]; Fig. 11C).

Fig. 7. Solution structure of the QUA2 and KH domains of SF1 in complex with RNA. The QUA2 and KH domains of SF1, together, recognize RNA BPS 5′-UACUAAC (blue sticks; Protein Data Bank entry 1K1G). Protein side-chains making polar and hydrophobic contacts with RNA are colored cyan and green, respectively. The QUA2 domain (labeled) abuts the α2-helix of the KH domain, giving rise to an expanded contact with RNA, with the five nucleotides at the 5′-end of the RNA contacting the QUA2 domain, exclusively. The base of Ura6 is buried between the α1-helix and the QUA2 helix. The RNA then continues in single-stranded, extended conformation into the ‘typical’ KH groove. Finally, the RNA loops over to the right and makes contact with the β2-strand. Note also the very long variable loop, 24 amino acids, which loops back over the RNA from the right.

Fig. 8. Crystal structure of Nova-2 KH3 bound to SELEX RNA. The third KH domain of the protein Nova-2 binds to the tetranucleotide sequence 5′-UCAC (blue sticks; Protein Data Bank entry 1EC6), which is part of the larger SELEX RNA. Protein side-chains making polar and hydrophobic contacts with RNA are shown in cyan and green, respectively. U12–C13–A14 rests on a hydrophobic platform formed by the α1-helix and the β2-strand. Electrostatic interactions between protein side-chains, nucleic acid bases and the sugar phosphate backbone further stabilize the complex. Bases A14 and C15 participate in base-stacking interactions with each other. The 2′-hydroxyl groups of the tetrad hydrogen bond with protein or other bases, making it unlikely that ssDNA could bind tightly to this KH domain.

Interestingly, the same KH domain in complex with a SELEX RNA crystallizes with only two KH molecules in the asymmetric unit related by NCS [21]. The two KH molecules interact through related α′-helices and bury 1000 Å² (Fig. 12B). This arrangement is identical to the protein–protein interactions observed in crystals of apo-Nova (Fig. 11C).

Crystals of the first KH domain of PCBP2 in complex with ssDNA contain two identical dimer complexes per asymmetric unit related by two-fold NCS [28] (Protein Data Bank entry 2AXY; Fig. 13A). The dimer buries 1890 Å², and as in the protein–protein interface depicted in Fig. 12A, an augmented antiparallel β-sheet is formed by symmetry-related β1-strands and further stabilized by interactions between α′-helices (Fig. 13B). This dimeric arrangement is reproduced in crystals of two PCBP2 KH1 molecules tethered by one ssDNA or RNA ligand [33] (Protein Data Bank entries 2PQU and 2PYQ). In the cocrystal structure of the third KH domain of human PCBP2 with DNA [34], however, no protein–protein contacts were observed in the crystal. Instead, crystal contacts were solely formed by base-stacking interactions of DNA molecules from adjacent asymmetric units. A1 of the heptanucleotide stacks on C3 of a symmetry-related DNA and vice versa.

For neither the apo nor the nucleic acid-bound forms of these KH domains are there published solution data in support of the idea that these KH domains may exist as dimers or higher-order oligomers in solution [17,44], nor have dimers or higher-order oligomers been shown to be of functional significance in vivo.

Fig. 9. Solution structure of the FBP KH3–KH4 domain bound to ssDNA. The third and fourth KH domains of FBP recognize ssDNA 5′-dTTTT (A) and 5′-ATTC (B), respectively. In both domains, the binding cleft makes hydrophobic contacts with the ssDNA bases, and polar residues lining the edge of the cleft contact the sugar phosphate backbone. The bases of the DNA ligand stack with each other, with the methyl groups of thymine pointing away from the binding cleft. Both domains behave independently. Although both the KH domains and both the DNA-binding sites were present as a single unit, neither the Gly-rich protein linker nor the noncontacted ssDNA were resolved.

In crystals of the tandem KH domains from human FMRP, there are also two molecules in the asymmetric unit related by NCS [17] (Protein Data Bank entry 2QND). Contacts between NCS-related β2-strands and, to a lesser extent, α1-helices bury 2100 Å² (Fig. 14A). This β-sheet augmentation is similar to that seen with apo-Nova-2 KH3 and PCBP2 KH1, but its interface comprises primarily β2–β2 and not β1–β1 interactions (compare Figs 12 and 13 with Fig. 14B). When the C2 operation is applied to the asymmetric unit, another interface is formed between neighboring KH domains. This interface is mediated by symmetry-related α′-helices, as seen in crystals of RNA-bound Nova-2 KH3 (Figs 11C and 12B), and buries 1200 Å², significantly less than observed in the asymmetric unit.

In summary, two interfaces are commonly observed in the crystals: (a) helix–helix packing between symmetry-related α′-helices with a 50° packing angle, as seen in the Nova-2 KH3–RNA structure; and (b) β-sheet augmentation achieved by contacts between β1 or β2 symmetry-related strands, as seen in Nova-2 KH3, hFMRP (KH1–KH2Δ), and PCBP2 KH1. Caution is advised in extrapolating from crystal structures to predict the solution oligomeric state of KH domains. Although several KH domains form

Fig. 10. Crystal structure of tandem type II KH domains of NusA in complex with RNA. The tandem KH1–KH2 domains of NusA recognize RNA ligand 5′-GAACUCAAUAG. (A) The KH1–KH2 domains of NusA bound to cognate RNA ligand (Protein Data Bank entry 2ASB). The RNA–protein contact surface spans across both domains. In particular, A45 makes contacts with residues in both KH1 and KH2. Additional polar contacts with 2′-hydroxyls specify RNA recognition. The KH1 and KH2 domains are shown separately in (B) and (C), respectively. Type II KH domains are connected differently. The variable loop, for example, is located at the bottom and to the left of the binding cleft. Although the connection of type II KH domains is different, the structural elements that comprise the binding cleft are the same as in type I domains, and accommodate four nucleotides as well.

Fig. 11. Protein–protein interfaces in Nova-2 KH3 in crystals. This figure is an adaptation of Figs 6 and 7 from Lewis et al. [44], using Protein Data Bank coordinates 1DTJ. (A) Contents of the asymmetric unit with the two-fold NCS axis labeled. The tetrameric arrangement of molecules produces two protein–protein interfaces. (B) One protein–protein interface generated by two-fold NCS. (C) Other protein–protein interfaces also generated by two-fold NCS.

Title	Structure and function of KH domains
Authors	Roberto Valverde, Laura Edwards, Lynne Regan
Institution	Yale University
Field	Molecular Biophysics & Biochemistry
Type	review article
Year of publication	2008
City	New Haven
