Structure and function of KH domains
Roberto Valverde1, Laura Edwards2 and Lynne Regan1,3
1 Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA
2 Department of Molecular and Cellular Developmental Biology, Yale University, New Haven, CT, USA
3 Department of Chemistry, Yale University, New Haven, CT, USA
Introduction
The hnRNP K homology (KH) domain was named for the human heterogeneous nuclear ribonucleoprotein K (hnRNP K), the first protein in which the motif was identified [1]. The KH motif consists of approximately 70 amino acids, and is found in a diverse variety of proteins in archaea, bacteria and eukaryota
Keywords
fragile X mental retardation; interaction
motif; KH domains; K homology domain;
noncrystallographic symmetry; protein motif;
RNA-binding; RNA-binding protein;
RNA-recognition; solvent accessibility
Correspondence
L Regan, Yale University, 266 Whitney
Avenue, New Haven, CT 06520, USA
Fax: +1 203 432 3104
Tel: +1 203 432 9843
E-mail: lynne.regan@yale.edu
(Received 3 January 2008, revised 18
February 2008, accepted 14 March 2008)
doi:10.1111/j.1742-4658.2008.06411.x
The hnRNP K homology (KH) domain was first identified in the protein human heterogeneous nuclear ribonucleoprotein K (hnRNP K) 14 years ago. Since then, KH domains have been identified as nucleic acid recognition motifs in proteins that perform a wide range of cellular functions. KH domains bind RNA or ssDNA, and are found in proteins associated with transcriptional and translational regulation, along with other cellular processes. Several diseases, e.g. fragile X mental retardation syndrome and paraneoplastic disease, are associated with the loss of function of a particular KH domain. Here we discuss the progress made towards understanding both general and specific features of the molecular recognition of nucleic acids by KH domains. The typical binding surface of KH domains is a cleft that is versatile but that can typically accommodate only four unpaired bases. Van der Waals forces and hydrophobic interactions and, to a lesser extent, electrostatic interactions, contribute to the nucleic acid binding affinity. ‘Augmented’ KH domains or multiple copies of KH domains within a protein are two strategies that are used to achieve greater affinity and specificity of nucleic acid binding. Isolated KH domains have been seen to crystallize as monomers, dimers and tetramers, but no published data support the formation of noncovalent higher-order oligomers by KH domains in solution. Much attention has been given in the literature to a conserved hydrophobic residue (typically Ile or Leu) that is present in most KH domains. The interest derives from the observation that an individual with this Ile mutated to Asn, in the KH2 domain of fragile X mental retardation protein, exhibits a particularly severe form of the syndrome. The structural effects of this mutation in the fragile X mental retardation protein KH2 domain have recently been reported. We discuss the use of analogous point mutations at this position in other KH domains to dissect both structure and function.
Abbreviations
BPS, branchpoint sequence; dFXRP, Drosophila fragile X-related protein; FBP, FUSE-binding protein; FMRP, fragile X mental retardation protein; FUSE, ssDNA far-upstream element; FXRP, fragile X-related protein; hFMRP, human fragile X mental retardation protein; hnRNP K, human heterogeneous nuclear ribonucleoprotein K; KH, hnRNP K homology; KSRP, K homology splicing regulator protein; NCS, noncrystallographic symmetry; PCBP, poly(C)-binding protein; PSI, P-element somatic inhibitor protein; SF1, splicing factor 1; Y2H, yeast two-hybrid.
[1,2]. Typically, KH domains are found in multiple copies: two in fragile X mental retardation protein (FMRP) [3–5], three in hnRNP K [1,6], and 14 in vigilin [7,8]. There are, however, a few examples of proteins with single KH motifs; Mer1p [1,9] and Sam68 [10] each have just one. The typical function of KH domains, whether they are present in single or multiple copies, is RNA or ssDNA recognition. When present in a protein in multiple copies, KH domains can function independently or cooperatively. In ssDNA far-upstream element (FUSE)-binding protein (FBP), for example, the KH3 and KH4 domains are separated by a flexible Gly linker with no interdomain contacts [11]. Each KH domain binds to a segment of ssDNA, with a linker of noncontacted ssDNA between [12]. By contrast, the two KH domains of NusA have an extensive interdomain contact area, and bind an extended segment of RNA that runs across both domains [13–15]. KH modules are found in many different proteins, which are involved in a myriad of different biological processes, including splicing, transcriptional regulation, and translational control.
Two folds, one motif
It was pointed out by Grishin that there are actually two different versions of the KH motif, which he named type I and type II KH folds (Fig. 1) [2]. The type I fold is typically found in eukaryotic proteins, whereas the type II fold is typically found in prokaryotic proteins. Although type I and type II folds both share a ‘minimal KH motif’ in the linear sequence, the three-dimensional arrangement of the secondary structural elements is different. In the type I fold, a β-sheet composed of three antiparallel β-strands is abutted by three α-helices (α1, α2, and α′). The β-sheet in type I KH domains consists of three β-strands in the order β1, β′ and β2. The β1-strand and β2-strand are parallel to each other, and the β′-strand is antiparallel to both (Fig. 1). This all-antiparallel arrangement of strands distinguishes the type I KH fold from the type II KH fold, in which the β1-strand and β2-strand are adjacent and parallel to each other, and the β′-strand is adjacent and antiparallel to the β1-strand (Fig. 1). The length and sequence of the variable loop are different in different KH domains, be they type I or type II (the variable loop is shown as a dotted line in Fig. 1). Variable loop lengths from three to over 60 residues are known. All typical KH domains have a GXXG loop (shown in white in Fig. 1) [2], although this is sometimes altered or interrupted in divergent KH domains [16].
Not only is the order of secondary structural elements in individual eukaryotic type I KH domains different from that in prokaryotic type II KH domains, but the relative orientation of tandem type I versus type II KH domains is also quite different. The comparison is limited, however, because the structure of only one of each type of tandem KH domain has been published. Here we compare the structures of the tandem KH1–KH2 domains from protein NusA (Protein Data Bank entry 2ASB) [14,15] and from human FMRP (hFMRP) (Protein Data Bank entry 2QND) [17] as examples of tandem prokaryotic KH (type II) domains and tandem eukaryotic (type I) KH domains, respectively. In NusA, an unstructured six amino acid linker connects KH1 to KH2, and an area of 1380 Å² is buried at the interface between the β-sheet of KH1 and the α-helices (α′ and α2) of KH2 (Fig. 2B). By contrast, in hFMRP(KH1–KH2Δ), the α′-helix of KH1 is linked to the β1-strand of KH2 by the single residue, Glu280, which adopts non-β non-α
Fig. 1. Type I and type II KH domain folds. Stylized representations of (A) the type I KH domain (eukaryotic) and (B) the type II KH domain (prokaryotic). The labeling of secondary structure elements is according to standard KH nomenclature [2]. The dotted line connecting the β2-strand and β′-strand represents the variable loop. The white line connecting the α1-helix and the α2-helix represents the GXXG loop.
phi/psi angles to accomplish this tight connection, which contains minimal interdomain contacts between aliphatic residues from the α1-helix of KH1 and the β-sheet of KH2 [17,18].
Evolutionary relationships between KH domains
Type I domains are found in multiple copies in eukaryotic proteins, whereas type II KH domains are typically found as single copies in prokaryotic proteins. Here, therefore, we discuss eukaryotic proteins. Within a family of KH proteins with multiple KH domains (i.e. type I KH domains), the KH1 domain is always more similar to other KH1 domains in different proteins than to the KH2 and KH3 domains in the same protein. Similar relationships are seen for KH2 and KH3 domains – they are more similar to other KH2 and KH3 domains, respectively, than they are to each other or to KH1 domains (Fig. 3). This relationship holds true in all families and between species, from those within which the like-pairs of domains have very high identity [over 95% in the Nova and poly(C)-binding protein families], to those within which like-pairs of domains have much lower identity (around 50% in the FXR family; Fig. 3).
From this observation, a number of hypotheses about the origin and evolution of the KH domains may be proposed. If multiple KH domains arose as a result of a gene duplication event, the results cited above suggest that duplication occurred before the divergent evolution of the members of each protein family. Alternatively, one could speculate that the interdomain identities are a result of convergent evolution of different domains in a parent protein, before subsequent evolutionary divergence produced different members of the family.
Nucleic acid binding by KH domains – general features
The structures of KH domains in complex with their cognate nucleic acid ligand are mostly of type I domains from eukaryotic proteins, which function in transcriptional and translational regulation. The only structures of type II KH domains in complex with nucleic acid ligand are of the bacterial protein NusA [15] (Protein Data Bank entries 2ATW and 2ASB).
Although the total number of structures in the Protein Data Bank of KH domains bound to cognate nucleic acid ligand is small, some common features of nucleic acid recognition emerge among them. The RNA or DNA is bound in an extended, single-stranded conformation across one face of the KH domain, between the α1-helix, the α2-helix and the GXXG loop on the ‘left’, and the β2-strand and the variable loop on the ‘right’ (Fig. 4A). Together, these secondary structural elements form a binding cleft that accommodates four bases. Note that the secondary structure elements that shape the binding cleft comprise, in part, the core motif found in type I and type II domains. The variable loop in type II KH domains, however, is located at the bottom of the binding cleft (Fig. 4A). The center of the binding pocket tends to be hydrophobic, with a variety of additional specific interactions stabilizing the complex. Nucleic acid base-to-protein aromatic side-chain stacking interactions, which are prevalent in other types of single-stranded nucleic acid binding motifs [19,20], are notably absent in KH domain nucleic acid recognition.
In some complexes, the bases in the ssDNA or RNA bound by the KH domain stack with each other (Fig. 4B), whereas in other examples there is no base stacking.
An adenine–backbone interaction is a feature seen in some KH domain–nucleic acid structures (Fig. 4C). Examples are (relevant adenine in bold) A42–G43–A44–A45 in NusA KH1, C48–A49–A50–U51 in NusA KH2 [15], U12–C13–A14–C15 in Nova-2 KH3 [21], and U6–A7–A8–C9 in splicing factor 1 (SF1) [22]. The adenine bases hydrogen bond to the protein backbone, mimicking a Watson–Crick base pairing pattern. Superimposing the NusA KH1 domain and ribonucleotides 42–46 on the NusA KH2 domain and ribonucleotides 48–53 reveals that the adenine bases of A44 and A50 make exactly equivalent hydrogen bonds to the protein backbone [15].
Fig. 2. The orientation of individual KH domains in tandem type I and type II arrays. Schematics are based on the crystal structures of the KH1–KH2 domains of NusA (type II) (Protein Data Bank entry 1KOR) and fragile X mental retardation protein [type I (B)] (Protein Data Bank entry 2QND). Each domain is represented as an oval with the β-sheet side colored solid black and the abutting α-helices striped.
KH domains bind ssDNA and RNA with low micromolar affinity. For example, the Kd values of the KH domain of the SF1–DNA complex and the hnRNP K KH3 domain–DNA complex are 3 μM and 1 μM, respectively [22,23]. The clustering of KH domains increases nucleic acid recognition and specificity [24]; the four tandem KH domains of P-element somatic inhibitor protein (PSI), for example, bind ligand cooperatively [25]. The KH1–KH2 domains of NusA (Protein Data Bank entries 2ATW and 2ASB) form an uninterrupted recognition surface that binds RNA with nanomolar affinity [15]. Together, the third and fourth KH domains of the K homology splicing regulator protein (KSRP) bind RNA ligand more tightly than each does separately [26].
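To put the quoted affinities on a common energy scale, a dissociation constant can be converted to a standard binding free energy with ΔG° = RT ln(Kd), taking 1 M as the standard state. The sketch below is illustrative only: the temperature of 298.15 K is an assumption (the measurement conditions are not given here), and the 10 nM entry is a hypothetical stand-in for "nanomolar affinity", not a reported value.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kJ/mol) from Kd in mol/L.

    Uses dG = R*T*ln(Kd / 1 M). The default temperature is an assumption.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)


examples = [
    ("SF1 KH, Kd = 3 uM", 3e-6),               # value quoted in the text
    ("hnRNP K KH3, Kd = 1 uM", 1e-6),          # value quoted in the text
    ("hypothetical 10 nM complex", 1e-8),      # illustrative assumption
]

for label, kd in examples:
    print(f"{label}: dG = {binding_free_energy(kd):.1f} kJ/mol")
```

On this scale, each 10-fold tightening of Kd corresponds to roughly 5.7 kJ/mol at 25 °C, which gives a rough sense of what a second, cooperating KH domain has to contribute to move binding from the micromolar to the nanomolar range.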
Finally, where the structures of both the KH–nucleic acid complex and free KH domain have been determined, ligand binding produces little or no structural change in the protein as determined by our analysis [27] and concluded in [15,21,28].
Nucleic acid recognition by KH domains – specific examples
NMR structure of the KH3 domain of hnRNP K with ssDNA bound
The type I KH3 domain of the transcriptional regulator hnRNP K binds to a 10mer ssDNA, specifically recognizing the tetrad 5′-dTCCC (Fig. 5) [23] (Protein
            FMRP KH1  FMRP KH2  FXR1 KH1  FXR1 KH2  FXR2 KH1  FXR2 KH2  dmFMR1 KH1  dmFMR1 KH2
FMRP KH1    100.0     -         -         -         -         -         -           -
FMRP KH2    21.7      100.0     -         -         -         -         -           -
FXR1 KH1    82.0      23.0      100.0     -         -         -         -           -
FXR1 KH2    20.3      55.2      17.7      100.0     -         -         -           -
FXR2 KH1    64.3      20.0      68.6      19.0      100.0     -         -           -
FXR2 KH2    21.7      53.7      22.9      82.0      22.8      100.0     -           -
dmFMR1 KH1  54.4      22.1      58.8      23.1      58.8      25.6      100.0       -
dmFMR1 KH2  22.5      43.1      24.0      65.3      26.7      62.5      22.6        100.0

            NOVA-1 KH1  NOVA-1 KH2  NOVA-1 KH3  NOVA-2 KH1  NOVA-2 KH2  NOVA-2 KH3
NOVA-1 KH1  100.0       -           -           -           -           -
NOVA-1 KH2  35.3        100.0       -           -           -           -
NOVA-1 KH3  40.3        37.3        100.0       -           -           -
NOVA-2 KH1  95.5        36.8        36.8        100.0       -           -
NOVA-2 KH2  32.4        86.3        34.3        34.3        100.0       -
NOVA-2 KH3  38.8        35.8        90.9        37.3        35.4        100.0

          PCB1 KH1  PCB1 KH2  PCB1 KH3  PCB2 KH1  PCB2 KH2  PCB2 KH3
PCB1 KH1  100.0     -         -         -         -         -
PCB1 KH2  33.8      100.0     -         -         -         -
PCB1 KH3  35.4      33.8      100.0     -         -         -
PCB2 KH1  95.2      33.8      32.3      100.0     -         -
PCB2 KH2  35.4      93.8      31.0      35.4      100.0     -
PCB2 KH3  33.8      35.4      92.1      30.8      32.4      100.0
PCB3 KH1  88.7      32.3      36.9      90.3      33.8      35.4
PCB3 KH2  35.4      84.6      35.4      33.8      89.2      36.9
PCB3 KH3  36.4      38.5      84.1      33.3      40.0      84.1
PCB4 KH1  74.2      35.4      35.4      69.4      33.8      35.4
PCB4 KH2  35.4      76.9      36.9      33.8      80.0      38.5
PCB4 KH3  33.8      33.8      66.7      30.8      35.4      71.4

          PCB3 KH1  PCB3 KH2  PCB3 KH3  PCB4 KH1  PCB4 KH2  PCB4 KH3
PCB1 KH1  -         -         -         -         -         -
PCB1 KH2  -         -         -         -         -         -
PCB1 KH3  -         -         -         -         -         -
PCB2 KH1  -         -         -         -         -         -
PCB2 KH2  -         -         -         -         -         -
PCB2 KH3  -         -         -         -         -         -
PCB3 KH1  100.0     -         -         -         -         -
PCB3 KH2  33.4      100.0     -         -         -         -
PCB3 KH3  36.9      40.0      100.0     -         -         -
Fig. 3. Table showing sequence identities (%) of KH domains within protein families. Data for the FMRP, Nova and PCBP families are shown. For each family, the sequences of individual KH domains were aligned with KH domains at different positions in the same protein, and KH domains at the same position in different proteins. The highest percentage identities were consistently those between KH domains at the same position in different members of a protein family (highlighted in purple).
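The pattern described in the caption can be checked directly from the tabulated numbers. The sketch below uses the Nova family values from the table and separates pairs of domains at the same position (for example KH1 versus KH1) from pairs at different positions. The variable and function names are illustrative choices, not part of any published analysis.

```python
# Lower-triangular percent identities for the Nova family, as tabulated above.
labels = [
    "NOVA-1 KH1", "NOVA-1 KH2", "NOVA-1 KH3",
    "NOVA-2 KH1", "NOVA-2 KH2", "NOVA-2 KH3",
]
lower = [
    [100.0],
    [35.3, 100.0],
    [40.3, 37.3, 100.0],
    [95.5, 36.8, 36.8, 100.0],
    [32.4, 86.3, 34.3, 34.3, 100.0],
    [38.8, 35.8, 90.9, 37.3, 35.4, 100.0],
]


def kh_position(label):
    """Return the domain position ('KH1', 'KH2' or 'KH3') from a label."""
    return label.split()[-1]


same_position = []   # e.g. NOVA-1 KH1 vs NOVA-2 KH1
diff_position = []   # every other off-diagonal pair

for i in range(len(labels)):
    for j in range(i):  # strictly below the diagonal
        identity = lower[i][j]
        if kh_position(labels[i]) == kh_position(labels[j]):
            same_position.append(identity)
        else:
            diff_position.append(identity)

print("lowest same-position identity:", min(same_position))
print("highest different-position identity:", max(diff_position))
```

For this family the two groups do not overlap: every same-position pair is more similar than any different-position pair, which is the relationship the evolutionary argument rests on.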
Data Bank entry 1J5K). The authors propose that the complex is stabilized by methyl-to-oxygen hydrogen bonds between three Ile side-chains and the O2 and N3 atoms of the two central cytosine bases. Methyl-to-oxygen hydrogen bonds are uncommon and weak, but not without precedent [29,30]. Additional interactions that stabilize the complex include protein backbone and side-chain hydrogen bonds to bases, and electrostatic interactions between positively charged side-chains on the protein and the phosphate backbone of the nucleic acid.
Poly(C)-binding proteins
Poly(C)-binding proteins (PCBPs) contain three type I KH domains, which appear to function independently, because they are separated by long linkers: KH1–(16 amino acid spacer)–KH2–(67 to > 100 amino acid spacer)–KH3. They bind to poly(C)-rich DNA and RNA sequences and function in a diverse range of cellular processes, including mRNA stabilization, translational activation, and translational silencing [31,32].
Crystal structures have been solved of the PCBP2 KH1 in complex with a 12-nucleotide ssDNA and with its RNA equivalent (Protein Data Bank entries 2PQU and 2PQY, respectively) [33]. In both the ssDNA and RNA complexes, the 12 nucleotides correspond to two repeats of the human C-rich strand telomeric DNA, 5′-AACCCTAACCCT-3′ (a single repeat is underlined, and the core recognition sequence is in bold). The asymmetric unit of both ssDNA and RNA crystals contains two KH1 molecules tethered by one oligonucleotide ligand. The crystal structures of PCBP2 KH1 in complex with either 12-nucleotide ssDNA or its equivalent RNA are similar, with no indication that the hydroxyl groups of the RNA bases are involved in interactions with the protein (Fig. 6A). The CCCT/U tetranucleotide motif constitutes the core recognition sequence.
Interestingly, however, when PCBP2 KH1 was crystallized with a seven-nucleotide single repeat ssDNA ligand 5′-AACCCTA-3′ (core recognition sequence in bold), a different ‘register’ of the nucleic acid–protein complex was observed [28] (Protein Data Bank entry 2AXY, shown in Fig. 6B). In all structures, the nucleic acid was in the ‘typical’ cleft, but its position relative to the protein was shifted up by one base in the 5′-direction in the seven-nucleotide structure (ACCC versus CCCT; Fig. 6A–C). The first position of the core recognition motif sits on top of the α1-helix, and then the phosphate backbone of the next two nucleotides interacts with the α1-helix and the GXXG motif on the left, and the β2-strand and the variable loop on the right. Base stacking is observed between the third and fourth position nucleotides of the core recognition sequence. The recently solved high-resolution structure of the third KH domain of PCBP2 bound to ssDNA, 5′-dAACCCTA-3′ [34] (Protein Data Bank entry 2P2R), is similar to previous structures of the first KH domain of PCBP2. However, because the crystals diffracted to ultra-high resolution, hydrogen bonding and water molecules mediating protein–DNA contacts were observed that previously could not be resolved in other crystal structures. Specifically, the binding cleft is occupied by the tetrad 5′-CCCT-3′, with direct and water-mediated contacts stabilizing the last two bases, and protein–nucleic acid contacts to two additional bases beyond the binding cleft were seen. Also of interest is
Fig. 4. Common features of KH domain–nucleic acid interactions. (A) Type I KH domain; the binding cleft comprises the secondary structural elements α1-helix, GXXG loop, α2-helix, β2-strand, and variable loop (colored green), and recognizes four nucleotides (cyan sticks). The green dotted line represents the location of the variable loop in type II KH domains. (B) Nucleic acid bases of the ligand stacking with each other. Coordinates from Protein Data Bank entry 1J5K were used in (A) and (B), and coordinates from Protein Data Bank entry 2ASB were used in (C).
the observation that in different crystal forms, the KH domains of PCBP2 were either monomeric or were a crystal-contact-mediated dimer (see section on KH dimers).
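The one-base register shift between the two PCBP2 KH1 complexes can be made concrete by locating each bound tetrad within the single telomeric repeat. This is a minimal sketch using plain string search; the function name is an illustrative choice, and positions are counted from zero at the 5′-end.

```python
REPEAT = "AACCCTA"  # seven-nucleotide single-repeat ligand, written 5' to 3'


def tetrad_offset(ligand, tetrad):
    """Return the 0-based index at which the tetrad starts in the ligand."""
    index = ligand.find(tetrad)
    if index < 0:
        raise ValueError(f"{tetrad} does not occur in {ligand}")
    return index


# Tetrad occupying the cleft in the seven-nucleotide complex (2AXY).
offset_seven_mer = tetrad_offset(REPEAT, "ACCC")
# Tetrad occupying the cleft in the 12-nucleotide complex (2PQU).
offset_twelve_mer = tetrad_offset(REPEAT, "CCCT")

# A positive value means the seven-nucleotide register lies 5' of the other.
register_shift = offset_twelve_mer - offset_seven_mer

print("ACCC starts at position", offset_seven_mer)
print("CCCT starts at position", offset_twelve_mer)
print("register shift (bases):", register_shift)
```

The two tetrads overlap in the central CCC, so the same three cytosines can sit in the cleft in either register; what changes is whether an adenine or a cytosine occupies the first position.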
RNA recognition by a single KH domain in
cooperation with a QUA2 domain
SF1 specifically recognizes the intron branchpoint sequence (BPS) UACUAAC in pre-mRNA transcripts [35], with KH domain binding augmented by additional interactions with a C-terminal helix known as the QUA2 domain (labeled in Fig. 7) [36]. The RNA
adopts an extended single-stranded conformation, and is bound in a hydrophobic groove between QUA2, the GXXG loop and the variable loop of the KH domain [22] (Fig. 7; Protein Data Bank entry 1K1G). The QUA2 region recognizes the 5′-nucleotides of the BPS (ACU), with the α1-helix and α2-helix and the β2-strand of the KH domain region interacting with the next nucleotides of the RNA in ‘typical’ fashion.
A large surface area of predominantly aliphatic hydrophobic residues is buried at the protein–RNA interface. In addition, positively charged side-chains undergo electrostatic interactions with the solvent-exposed phosphate backbone. Protein contacts to the 3′-end of the RNA are provided by the variable loop and the β2-strand.
Binding of the seven-nucleotide RNA BPS requires both the QUA2 and KH regions. Another example of an augmented KH domain is the fourth KH domain of KSRP [26], which contains a novel fourth β-strand located adjacent and angled to the β1-strand and contributes to the stability of the protein (Protein Data Bank entry 2HH2). It is not yet known whether the fourth β-strand is involved in contacts with RNA [26].
X-ray structure of Nova-2 KH3 plus SELEX RNA
The X-ray structure of the KH3 domain of Nova-2 bound to an in vitro selected stem–loop RNA containing the 5′-UCAC-3′ core recognition sequence has been solved [21] (Fig. 8). This structure is something of an ‘outlier’, because the nucleic acid has a double-stranded hairpin stretch (not shown in Fig. 8), which may be a consequence of stability requirements for selection in vitro [37].
The stem of the hairpin adopts the A-form double-helical conformation, with four Watson–Crick base pairs (G1–C20, A2–U19, G3–C18, G4–C17) and a single hydrogen bond between A5 and C16 (N1–O2 = 2.4 Å).
The extended target RNA (A11, U12, C13, A14, C15) lies upon a hydrophobic platform (formed by the α1-helix and the edge of the β2-strand), where it contacts both the invariant GXXG motif and the variable loop.
Nucleic acid binding by tandem but independent KH domains – NMR structure of the KH3 and KH4 domains of FBP in complex with FUSE ssDNA
FUSE-binding protein has four KH domains, which are separated by linkers of varying lengths [11]. FBP regulates c-myc expression by binding to FUSE [38]. The NMR structure of a complex between the KH3
Fig. 5. Solution structure of the KH3 domain of hnRNP K bound to ssDNA. The third KH domain of hnRNP K (Protein Data Bank entry 1J5K) recognizes a tetrad of sequence 5′-dTCCC (purple sticks). Regions on the protein that are in contact with the nucleic acid ligand are colored green (hydrophobic) and cyan (polar). The sugar phosphate backbone curves around the α1-helix near the GXXG loop before proceeding parallel to the α2-helix. The first base sits on top of the α1-helix, and the 5′-dCCC bases of the tetrad fill the interior of the predominantly hydrophobic cleft and base stack with each other (see Fig. 4B). The ends of the ssDNA sugar backbone are stabilized by electrostatic interactions with positively charged residues that line the ridge of the cleft on the GXXG loop and α2-helix.
and KH4 domains of FBP and a 29-base ssDNA fragment from FUSE [12] shows that each KH domain binds to a separate 9- to 10-base segment of ssDNA (Fig. 9). The KH domains are connected by a flexible Gly-rich linker, and behave independently. In addition, the two ssDNA segments to which the KH domains bind are themselves separated by a five-base linker of ssDNA. There are no protein contacts between the KH domains, and the linker DNA is not in contact with protein.
In both KH domains, the ssDNA is bound in the typical extended orientation, in the groove between the α1-helix and α2-helix plus the GXXG loop on one side, and the β2-strand and the variable loop on the other. The center of the groove is hydrophobic, and the edges are hydrophilic and charged, with the narrow binding site (10 Å) favoring pyrimidines over purines.
NusA – crystal structure of tandem type II KH domains
NusA regulates transcriptional elongation, pausing, termination and antitermination in prokaryotes [39–41]. The protein contains two tandem type II KH domains, which are connected by a short six-residue linker [14,15]. This short linker, combined with a tight turn between the domains, results in a structure in which the two KH domains are in contact and form an extended and continuous surface for RNA binding. NusA binds with high affinity and specificity to BoxB–BoxA–BoxC antitermination sequences within the leader region of the rRNA operon [15]. Ligand binding produces no change in the structure or relative orientation of the KH domains (Protein Data Bank entries 1KOR and 2ASB) [27]. The ssRNA is bound in an extended conformation and is in contact with large areas on both KH domains (Fig. 10).
Despite having type II connectivity, each KH domain of NusA contains a ‘typical’ binding cleft. The variable loop, however, hangs at the bottom of the cleft (Fig. 4A) instead of up and across from the GXXG loop, as in type I KH domains. The 5′-end of the RNA (bases A42 through A45) is buried in and across the groove between the α1-helix and α2-helix and the β2-strand of KH1. Intimate contacts between protein and RNA continue across the cusp of the KH1 and KH2 domains. C46 binds to the
Fig. 6. Crystal structures of the first KH domain from PCBP2 in complex with ssDNA. The first KH domain of PCBP2 recognizes the tetrad sequence 5′-dCCCT [(A) Protein Data Bank entry 2PQU] and 5′-dACCC [(B) Protein Data Bank entry 2AXY]. Polar and hydrophobic residues that make contacts with nucleic acid (purple sticks) are colored cyan and green, respectively. Waters (gray spheres) that bridge protein and ssDNA contacts were unambiguously resolved in the high-resolution structure in (B). Both structures are representative molecules within the asymmetric unit. In (C), the tetrad sequence (purple letters) of each structure is aligned with respect to the seven-nucleotide single repeat ssDNA ligand. The register of the sequence is shifted in the 5′-direction in (A). In both structures, the nucleotide at the 5′-end of the ssDNA strand sits on the top of the α1-helix, and is stabilized by contacts that can recognize an adenine or cytosine nucleotide. The central cytosine bases of the tetrad sequence occupy the hydrophobic interior of the binding cleft. The last nucleotide at the 3′-end of the ssDNA strand (dC in 2AXY; dT in 2PQU) participates in base-stacking interactions with the preceding cytosine base.
loop connecting the β′-strand and α′-helix of KH2, and U47 and C48 make contacts with the α1-helix and the GXXG loop of KH2. Finally, the nucleotides at the 3′-end of the RNA (A49–A52) pack against the groove comprising the α1-helix and α2-helix and the β2-strand of KH2. Hydrogen bonds to both amino acid side-chains and the protein backbone, electrostatic and polar interactions and, to a lesser extent, hydrophobic interactions between bases and nonaromatic amino acid side-chains stabilize the protein–RNA complex.
The interaction of the NusA tandem KH domains with RNA is quite different from that seen in the double KH domain of FBP bound to ssDNA from FUSE – the only other structure of a double KH domain bound to a nucleic acid target. In FBP, the two KH domains are connected by a flexible 30-residue Gly-rich linker and behave like beads on a string [12]. In the protein–DNA complex, each KH domain interacts with a separate ssDNA recognition sequence, and a five-nucleotide noninteracting spacer separates the two bound DNA recognition sequences.
Although in both examples the coupling of two RNA-binding domains will effectively increase the specificity and affinity of the RNA–protein interaction, the two different binding modes have very different consequences for the type and length of RNA bound.
KH crystal dimers – a tenuous relationship
Crystallographic data
Different KH domains crystallize as monomers, dimers, or tetramers. This and other observations have
Fig. 7. Solution structure of the QUA2 and KH domains of SF1 in complex with RNA. The QUA2 and KH domains of SF1, together, recognize RNA BPS 5′-UACUAAC (blue sticks; Protein Data Bank entry 1K1G). Protein side-chains making polar and hydrophobic contacts with RNA are colored cyan and green, respectively. The QUA2 domain (labeled) abuts the α2-helix of the KH domain, giving rise to an expanded contact with RNA, with the five nucleotides at the 5′-end of the RNA contacting the QUA2 domain, exclusively. The base of Ura6 is buried between the α1-helix and the QUA2 helix. The RNA then continues in single-stranded, extended conformation into the ‘typical’ KH groove. Finally, the RNA loops over to the right and makes contact with the β2-strand. Note also the very long variable loop, 24 amino acids, which loops back over the RNA from the right.
Fig. 8. Crystal structure of Nova-2 KH3 bound to SELEX RNA. The third KH domain of the protein Nova-2 binds to the tetranucleotide sequence 5′-UCAC (blue sticks; Protein Data Bank entry 1EC6), which is part of the larger SELEX RNA. Protein side-chains making polar and hydrophobic contacts with RNA are shown in cyan and green, respectively. U12–C13–A14 rests on a hydrophobic platform formed by the α1-helix and the β2-strand. Electrostatic interactions between protein side-chains, nucleic acid bases and the sugar phosphate backbone further stabilize the complex. Bases A14 and C15 participate in base-stacking interactions with each other. The 2′-hydroxyl groups of the tetrad hydrogen bond with protein or other bases, making it unlikely that ssDNA could bind tightly to this KH domain.
led to the proposal that the functional form of certain KH domains may involve noncovalent dimers or higher-order oligomers. Here we review the data.
Crystals of the single KH3 domain of the protein Nova-2 contain four KH molecules per asymmetric unit (Protein Data Bank entry 1DTJ) related by pseudo-222 noncrystallographic symmetry (NCS; Fig. 11A), with two different surfaces on each KH domain mediating protein–protein contacts (Fig. 11B,C). One protein–protein interface comprises primarily two β1-strands from two KH domains related by two-fold NCS. This arrangement creates an augmented antiparallel β-sheet stabilized by cross-strand side-chain interactions [42] and a buried surface area of 890 Å² [18,43] (reported as 950 Å² in [44]) (Fig. 12A). The other interface comprises two α′-helices, with a 50° packing angle [45], of the two KH domains related by NCS, and buries 1000 Å² (reported as 1250 Å² in [44]; Fig. 11C).
Interestingly, the same KH domain in complex with a SELEX RNA crystallizes with only two KH molecules in the asymmetric unit related by NCS [21]. The two KH molecules interact through related α′-helices and bury 1000 Å² (Fig. 12B). This arrangement is identical to the protein–protein interactions observed in crystals of apo-Nova (Fig. 11C).
Crystals of the first KH domain of PCBP2 in complex with ssDNA contain two identical dimer complexes per asymmetric unit related by two-fold NCS [28] (Protein Data Bank entry 2AXY; Fig. 13A). The dimer buries 1890 Å², and as in the protein–protein interface depicted in Fig. 12A, an augmented antiparallel β-sheet is formed by symmetry-related β1-strands and further stabilized by interactions between α′-helices (Fig. 13B). This dimeric arrangement is reproduced in crystals of two PCBP2 KH1 molecules tethered by one ssDNA or RNA ligand [33] (Protein Data Bank entries 2PQU and 2PYQ). In the cocrystal structure of the third KH domain of human PCBP2 with DNA [34], however, no protein–protein contacts were observed in the crystal. Instead, crystal contacts were solely formed by base-stacking interactions of DNA molecules from adjacent asymmetric units. A1 of the heptanucleotide stacks on C3 of a symmetry-related DNA and vice versa.
For neither the apo nor nucleic acid-bound forms of these KH domains are there published solution data in support of the idea that these KH domains may exist as dimers or higher-order oligomers in solution [17,44], nor have dimers or higher-order oligomers been shown to be of functional significance in vivo.
Fig. 9. Solution structure of the FBP KH3–KH4 domain bound to ssDNA. The third and fourth KH domains of FBP recognize ssDNA 5′-dTTTT (A) and 5′-ATTC (B), respectively. In both domains, the binding cleft makes hydrophobic contacts with the ssDNA bases, and polar residues lining the edge of the cleft contact the sugar phosphate backbone. The bases of the DNA ligand stack with each other, with the methyl groups of thymine pointing away from the binding cleft. Both domains behave independently. Although both the KH domains and both the DNA-binding sites were present as a single unit, neither the Gly-rich protein linker nor the noncontacted ssDNA were resolved.
In crystals of the tandem KH domains from human FMRP, there are also two molecules in the asymmetric unit related by NCS [17] (Protein Data Bank entry 2QND). Contacts between NCS-related β2-strands and, to a lesser extent, α1-helices bury 2100 Å² (Fig. 14A). This β-sheet augmentation is similar to that seen with apo-Nova-2 KH3 and PCBP2 KH1, but its interface comprises primarily β2–β2 and not β1–β1 interactions (compare Figs 12 and 13 with Fig. 14B). When the C2 operation is applied to the asymmetric unit, another interface is formed between neighboring KH domains. This interface is mediated by symmetry-related α′-helices, as seen in crystals of RNA-bound Nova-2 KH3 (Figs 11C and 12B), and buries 1200 Å² – significantly less than observed in the asymmetric unit.
In summary, two interfaces are commonly observed in the crystals: (a) helix–helix packing between symmetry-related α′-helices with a 50° packing angle, as seen in the Nova-2 KH3–RNA structure; and (b) β-sheet augmentation achieved by contacts between β1 or β2 symmetry-related strands, as seen in Nova-2 KH3, hFMRP (KH1–KH2Δ), and PCBP2 KH1.
Caution is advised in extrapolating from crystal structures to predict the solution oligomeric state of KH domains. Although several KH domains form
Fig. 10. Crystal structure of tandem type II KH domains of NusA in complex with RNA. The tandem KH1–KH2 domains of NusA recognize RNA ligand 5′-GAACUCAAUAG. (A) The KH1–KH2 domains of NusA bound to cognate RNA ligand (Protein Data Bank entry 2ASB). The RNA–protein contact surface spans across both domains. In particular, A45 makes contacts with residues in both KH1 and KH2. Additional polar contacts with 2′-hydroxyls specify RNA recognition. The KH1 and KH2 domains are shown separately in (B) and (C), respectively. Type II KH domains are connected differently. The variable loop, for example, is located at the bottom and to the left of the binding cleft. Although the connection of type II KH domains is different, the structural elements that comprise the binding cleft are the same as in type I domains, and accommodate four nucleotides as well.
Fig. 11. Protein–protein interfaces in Nova-2 KH3 in crystals. This figure is an adaptation of Figs 6 and 7 from Lewis et al. [44], using Protein Data Bank coordinates 1DTJ. (A) Contents of the asymmetric unit with the two-fold NCS axis labeled. The tetrameric arrangement of molecules produces two protein–protein interfaces. (B) One protein–protein interface generated by two-fold NCS. (C) Other protein–protein interfaces also generated by two-fold NCS.