
DOCUMENT INFORMATION

Basic information

Title: The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression
Authors: Christophe Maris, Cyril Dominguez, Frédéric H.-T. Allain
Institution: Institute for Molecular Biology and Biophysics, Swiss Federal Institute of Technology Zurich (ETH Zurich)
Field: Molecular biology and biophysics
Type: Minireview
Year of publication: 2005
City: Zurich
Pages: 14
File size: 1.07 MB


The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression

Christophe Maris*, Cyril Dominguez* and Frédéric H.-T. Allain

Institute for Molecular Biology and Biophysics, Swiss Federal Institute of Technology Zurich, ETH-Hönggerberg, Zürich, Switzerland

Keywords

RNA recognition motif; protein–RNA complex; structure–function relationship; RNA-binding specificity

Correspondence

F. H.-T. Allain, Institute for Molecular Biology and Biophysics, Swiss Federal Institute of Technology Zurich, ETH-Hönggerberg, CH-8093 Zürich, Switzerland
Fax: +41 1 6331294
Tel: +41 1 6333940
E-mail: allain@mol.biol.ethz.ch
Website: http://www.mol.biol.ethz.ch/groups/allain_group

*These authors contributed equally to the work

(Received 16 December 2004, accepted 7 March 2005)

doi:10.1111/j.1742-4658.2005.04653.x

The RNA recognition motif (RRM), also known as RNA-binding domain (RBD) or ribonucleoprotein domain (RNP), is one of the most abundant protein domains in eukaryotes. Based on the comparison of more than 40 structures including 15 complexes (RRM–RNA or RRM–protein), we reviewed the structure–function relationships of this domain. We identified and classified the different structural elements of the RRM that are important for binding a multitude of RNA sequences and proteins. Common structural aspects were extracted that allowed us to define a structural leitmotif of the RRM–nucleic acid interface with its variations. Outside of the two conserved RNP motifs that lie in the center of the RRM β-sheet, the two external β-strands, the loops, the C- and N-termini, or even a second RRM domain allow high RNA-binding affinity and specific recognition. Protein–RRM interactions that have been found in several structures reinforce the notion of an extreme structural versatility of this domain supporting the numerous biological functions of the RRM-containing proteins.

Abbreviations

ACF, APOBEC-1 complementary factor; CBP, cap binding protein; CstF, cleavage stimulation factor; hnRNP, heterogeneous nuclear ribonucleoprotein; HuD, Hu protein D; LRR, leucine rich repeat; MIF4G, middle domain of the translation initiation factor 4G; PABP, polyadenylate binding protein; PIE, polyadenylation inhibition element; PTB, polypyrimidine tract binding protein; RBD, RNA-binding domain; RNP, ribonucleoprotein; RRM, RNA recognition motif; SR, serine/arginine rich proteins; TLS, translocated in liposarcoma; U1A, U2A′, U2B″, U1 snRNP protein A and U2 snRNP proteins A′ and B″; U2AF, U2 snRNP auxiliary factor; UHM, U2AF homology motif; UPF, up-frameshift protein.

History – what defines an RRM?

The RNA recognition motif (RRM), also known as the RNA-binding domain (RBD) or ribonucleoprotein domain (RNP), was first identified in the late 1980s when it was demonstrated that mRNA precursors (pre-mRNA) and heterogeneous nuclear RNAs (hnRNAs) are always found in complex with proteins (reviewed in [1]). Biochemical characterizations of the mRNA polyadenylate binding protein (PABP) and the hnRNP protein C shed light on a consensus RNA-binding domain of approximately 90 amino acids containing a central sequence of eight conserved residues that are mainly aromatic and positively charged [2,3]. This sequence, termed the RNP consensus sequence, was thought to be involved in RNA interaction and was defined as Lys/Arg-Gly-Phe/Tyr-Gly/Ala-Phe/Tyr-Val/Ile/Leu-X-Phe/Tyr, where X can be any amino acid. Later, a second consensus sequence, less conserved than the previously characterized one [1], was identified. This six-residue sequence, located at the N-terminus of the domain, was defined as Ile/Val/Leu-Phe/Tyr-Ile/Val/Leu-X-Asn-Leu. The first consensus sequence was therefore referred to as RNP 1 and the second as RNP 2 (Fig. 1). It was then shown that this protein domain was necessary and sufficient for binding RNA molecules with a wide range of specificities and affinities (reviewed in [4–6]).
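The two consensus sequences defined above translate directly into regular expressions. The following is a minimal sketch, assuming a strict reading of the consensus (every position must match) and using two rows copied from the Fig. 1 alignment; the names `RNP1`, `RNP2`, `strip_gaps` and `find_rnp_motifs` are illustrative and not part of the review.

```python
import re

# Strict regex translations of the two consensus sequences quoted in the text.
# "X" (any amino acid) becomes "."; alternatives such as Lys/Arg become a class.
RNP1 = re.compile(r"[KR]G[FY][GA][FY][ILV].[FY]")  # eight residues
RNP2 = re.compile(r"[ILV][FY][ILV].NL")            # six residues


def strip_gaps(aligned: str) -> str:
    """Remove alignment gaps and spaces, keeping only residue letters."""
    return "".join(ch for ch in aligned.upper() if ch.isalpha())


def find_rnp_motifs(aligned: str) -> dict:
    """Return the first strict RNP 1 and RNP 2 match (or None) in a sequence."""
    seq = strip_gaps(aligned)
    hit1 = RNP1.search(seq)
    hit2 = RNP2.search(seq)
    return {
        "RNP1": hit1.group(0) if hit1 else None,
        "RNP2": hit2.group(0) if hit2 else None,
    }


# Two rows copied from the Fig. 1 alignment (gaps are stripped before matching).
HNRNPA1_RRM1 = "KLFIGGLSFETTDESLR-SHFEQ WGTLTDC -VVMRDPNTKRSRGFGFVTYATVEEVDAAMNARP-HKVD GRVVEPK"
U1A_RRM1 = "TIYINNLNEKIKKDELKKSLYAI FSQFGQI -LDILVSRSLKMRGQAFVIFKEVSSATNALRSMQGFPFY DKPMRIQ"

if __name__ == "__main__":
    print("hnRNP A1 RRM 1:", find_rnp_motifs(HNRNPA1_RRM1))
    print("U1A RRM 1:", find_rnp_motifs(U1A_RRM1))
```

On these two rows the strict patterns find RGFGFVTY (RNP 1) in hnRNP A1 RRM 1 and IYINNL (RNP 2) in U1A, and no strict match for the other motif in each. This fits the description of RNP 1 and RNP 2 as consensus sequences rather than invariant signatures; a practical search would need a tolerant profile rather than an exact pattern.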

Here we review the structural properties of the RRM domain in its isolated form and in complex with RNAs and/or proteins. This review shows how such a simple domain can modulate its fold to recognize many RNAs and proteins in order to achieve a multitude of biological functions often associated with post-transcriptional gene regulation.

An abundant and ancient fold with multiple biological functions

Genome sequencing projects recently showed that the RRM is found abundantly in all kingdoms of life, including prokaryotes and viruses, although at lower abundance than in eukaryotes. To date, only 85 proteins containing an RRM domain have been identified in bacteria (mostly cyanobacteria [7]), and six such proteins in viruses. Prokaryotic RRM proteins are rather small (about 100 amino acids) and have a single copy of the RRM domain. In eukaryotes, the RNA recognition motif is one of the most abundant protein domains. To date, a total of 6056 RRM motifs have been identified in 3541 different proteins (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00076) [8]. In humans, 497 proteins containing at least one RRM have been identified. Assuming about 20 000–25 000 human genes, the RRM would therefore be present in about 2% of gene products. In eukaryotic proteins, RRMs are often found as multiple copies within a protein (44%, two to six RRMs) and/or together with other domains (21%). Among the latter, the most abundant are the zinc fingers of the CCCH and CCHC type (21% of those with an additional domain), the polyadenylate binding protein C-terminal domain (PABP or PABC, 10%), and the WW domain (9%). Interestingly, contrary to the well-known CCHH zinc fingers that bind double-stranded DNA or RNA, the CCCH and CCHC zinc fingers are domains that bind single-stranded RNA [9,10]. The PABP and the WW domains [11] are protein–protein interaction domains involved in translation [12,13] and pre-spliceosome formation, respectively [14]. By association with different types of protein domains, the RRM domain can modulate its RNA-binding affinity and specificity and diversify its biological functions.

PTB (1SJQ) 60 VIHIRKLPIDVTEGEVISLGLP -FGKVTNL -LMLKG -KNQAFIEMNTEEAANTMVNYYTSVTPVLRGQPIYIQ 147
PTB (1SRJ) 183 RIIVENLFYPVTLDVLH-QIFSK FGTVLKI -ITFTKNN QFQALLQYADPVSAQHAKLSLDGQNIYNACCTLRID 282
PTB (1QM9) 338 VLLVSNLNPERVTPQSLFILFGV YGDVQRV -KILFNK -KENALVQMADGNQAQLAMSHLNGHKLH GKPIRIT 407
PTB (1QM9) 455 TLHLSNIPPSVSEEDLK-VLFSS NGGVVKG -FKFFQKD RKMALIQMGSVEEAVQALIDLHNHDLG-ENHHLRVS 531
Cstf-64 (1P1T) 17 SVFVGNIPYEATEEQLK-DIFSE VGPVVSF -RLVYDRETGKPKGYGFCEYQDQETALSAMRNLNGREFS GRALRVD 90
LA (1OWX) 244 LKFSGDLDDQTCREDLHILFSNH GEIK -WIDFVRGA KEGIILFKEKAKEALGKAKDANNGNLQLRNKEVTWEV 305
TAP (1FO1) 121 KITIPYGRKYDK-AWLLSMIQSKCSVPFTPIEFHYENTRAQFFVEDASTASALKAVNYKILDRENRRISIIINSSAP PHS 290
ALY (1NO8) 106 KLLVSNLDFGVSDADIQ-ELFAE FGTLKKA -AVHYDRSGR-SLGTADVHFERKADALKAMKQYNGVPLD GRPMNIQ 178
hnRNP A1 (1UP1) 15 KLFIGGLSFETTDESLR-SHFEQ WGTLTDC -VVMRDPNTKRSRGFGFVTYATVEEVDAAMNARP-HKVD GRVVEPK 87
hnRNP A1 (1HA1) 105 KIFVGGIKEDTEEHHLR-DYFEQ YGKIEVI -EIMTDRGSGKKRGFAFVTFDDHDSVDKIVIQKY-HTVN GHNCEVR 177
HUD (1FXL) 47 NLIVNYLPQNMTQEEFR-SLFGS IGEIESC -KLVRDKITGQSLGYGFVNYIDPKDAEKAINTLNGLRLQ TKTIKV 119
HUD (1FXL) 133 NLYVSGLPKTMTQKELE-QLFSQ YGRIITS -RILVDQVTGVSRGVGFIRFDKRIEAEEAIKGLNGQKPSGATEPITVK 206
SXL (2SXL) 126 NLIVNYLPQDMTDRELY-ALFRA IGPINTC -RIMRDYKTGYSYGYAFVDFTSEMDSQRAIKVLNGITVR NKRLKV 199
SXL (1SXL) 212 NLYVTNLPRTITDDQLD-TIFGK YGSIVQK -NILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVR 290
PABP (1CVJ) 12 SLYVGDLHPDVTEAMLY-EKFSP AGPILSI -RVCRDMITRRSLGYAYVNFQQPADAERALDTMNFDVIK GKPVRI 84
PABP (1CVJ) 99 NIFIKNLDKSIDNKALYDTFSAF GNILSCK -VVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKS 175
Nucleolin (1FJE) 309 NLFIGNLNPNKSVAELKVAISEL FAKND -LAVVDVRTGTNRKFGYVDFESAEDLEKAL-ELTGLKVF GNEIKLE 380
Nucleolin (1FJE) 396 LLAKNLSFNITEDELKEVFEDAL EIRLVSQ -DGKSKGIAYIEFKS EADAEKNLEEKQGAEID GRSVSLY 463
U1A (1DZ5) 11 TIYINNLNEKIKKDELKKSLYAI FSQFGQI -LDILVSRSLKMRGQAFVIFKEVSSATNALRSMQGFPFY DKPMRIQ 85
U2B″ (1A9N) 8 TIYINNMNDKIKKEELKRSLYAL FSQFGHV -VDIVALKTMKMRGQAFVIFKELGSSTNALRQLQGFPFY GKPMRI 81
CBP20 (1H2T) 41 TLYVGNLSFYTTEEQIY-ELFSK SGDIKKI -IMGLDKMKKTACGFCFVEYYSRADAENAMRYINGTRLD DRIIRTD 114
Y14 (1P27) 74 ILFVTGVHEEATEEDIH-DKFAE YGEIKNI -HLNLDRRTGYLKGYTLVEYETYKEAQAAMEGLNGQDLM GQPISVD 147
UPF3 (1UW4) 52 KVVIRRLPPTLTKEQLQEHLQPM PEHDYFE FFSNDTSLYPHMYARAYINFKNQEDIILFRDRFDGYVFLDNKGQEYPA 131
U2AF65 (1U2F) 150 RLYVGNIPFGITEEAMM-DFFNAQMR-LGGLTQAPG -NPVLAVQINQDKNFAFLEFRSVDETTQAM-AFDGIIFQ GQSLKIR 227
U2AF65 (2U2F) 260 KLFIGGLPNYLNDDQVK-ELLTS FGPLKAF -NLVKDSATGLSKGYAFCEYVDINVTDQAIAGLNGMQLG DKKLLVQ 333
U2AF35 (1JMT) 66 RSAVSDVEMQEHYDEFFEEVFTEMEEKYGEVEEM -NVC-DNLGDHLVGNVYVKFRREEDAEKAVIDLNNRWFN GQPIHA 143

Fig. 1. Sequence alignment of a selection of RRM domains for which the structure has been solved (PDB codes are indicated in brackets). The alignment was generated by the program CLUSTALW (http://www.ebi.ac.uk/clustalw/) [55] and manually optimized. The conserved RNP 1 and RNP 2 sequences are displayed in yellow. The amino acids highlighted in boxes refer to the aromatic residues important for primary RNA binding.
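The abundance figures quoted in this section can be checked with a few lines of arithmetic. This is only a sketch: it assumes one gene product per gene, which is the simplification the text itself appears to make, and the function names are illustrative.

```python
# Counts quoted in the text.
HUMAN_RRM_PROTEINS = 497
HUMAN_GENE_RANGE = (20_000, 25_000)   # assumed number of human genes
TOTAL_RRMS = 6056
TOTAL_RRM_PROTEINS = 3541


def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


def human_rrm_share() -> tuple:
    """Lower and upper bound (in %) for human gene products containing an RRM."""
    fewest_genes, most_genes = HUMAN_GENE_RANGE
    return (
        percent(HUMAN_RRM_PROTEINS, most_genes),    # more genes, smaller share
        percent(HUMAN_RRM_PROTEINS, fewest_genes),  # fewer genes, larger share
    )


def mean_rrms_per_protein() -> float:
    """Average number of RRMs per RRM-containing protein."""
    return TOTAL_RRMS / TOTAL_RRM_PROTEINS


if __name__ == "__main__":
    low, high = human_rrm_share()
    print(f"RRM-containing share of human gene products: {low:.1f}% to {high:.1f}%")
    print(f"Mean RRMs per RRM protein: {mean_rrms_per_protein():.2f}")
```

The share comes out between roughly 2.0% and 2.5%, consistent with the "about 2%" quoted, and the average of about 1.7 RRMs per protein reflects the frequent occurrence of multiple copies.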

A protein domain in such abundance is necessarily biologically important and associated with many functions in the cell. Indeed, eukaryotic RRM proteins are present in all post-transcriptional events: pre-mRNA processing (for example CstF-64, LA, or UPF3 proteins), splicing (U2B″, U2AF35, U2AF65, hnRNPA1 or Y14 proteins), alternative splicing (hnRNPA1, PTB, sex-lethal, SR proteins), mRNA stability (CBP20, PABP or HuD), RNA editing (ACF), mRNA export (TLS), pre-rRNA complex formation (nucleolin), translation regulation (PABP) and degradation [6]. In plants, RRM proteins are present in chloroplasts and are involved in 3′ end processing of chloroplast mRNA [15]. They have also been discovered in plant mitochondria. Their functions, however, remain unclear [16]. Similarly, their roles in bacteria and viruses are still unknown. The numerous three-dimensional structures of the RRM in isolation, and in complex with RNA or other proteins, shed light on the function of RRM proteins, as shown below.

The structure of the RRM, a βαββαβ fold with some variations and extensions

The RRM folds into an αβ sandwich structure with a β1α1β2β3α2β4 topology (Figs 1 and 2) as demonstrated by the first structure of an RNA recognition motif, the N-terminal RRM of U1A [17]. The fold is composed of one four-stranded antiparallel β-sheet spatially arranged in the order β4β1β3β2 from left to right when facing the sheet (Fig. 2, hnRNP A1-RRM 2, front view) and two α-helices (α1 and α2) packed against the β-sheet. Most of the conserved residues of the RRM are in the hydrophobic core of the domain [17], except four conserved residues that contribute to RNA binding, namely RNP 1 positions 1, 3 and 5 and RNP 2 position 2 (see the following section and Fig. 1). The RNP 1 and RNP 2 motifs are located in the central strands of the β-sheet, namely β3 and β1, respectively, and are highly conserved apart from a few RRM domains such as ALY and TAP (Fig. 1) [18,19].

To date, more than 30 RRM structures have been determined either by NMR or X-ray crystallography and reveal unexpected variations as shown in Fig. 2. The loops between the secondary structure elements (loops 1–5 as indicated in Figs 1 and 2) can have different lengths and are often disordered in the free form. An exception to this is loop 5, which often forms a small two-stranded β-sheet (β3′ and β3″) (Fig. 2). The N- and C-terminal regions, outside the RRM, are usually poorly ordered in the isolated domains, with a few exceptions where they can adopt a secondary structure (Fig. 2, PTB-RRM 3, La C-terminal RRM and CstF-64). In the structures of La C-terminal RRM [20], U1A N-terminal RRM [21] and CstF-64 RRM [22], the C-terminus forms an α-helix that lies on the β-sheet surface, while in PTB-RRM 2 and 3 it extends the size of the β-sheet by forming an extra β-strand (β5) antiparallel to β2 [23,24]. CstF-64 RRM also has an additional short α-helix in its N-terminal region (Fig. 2) [22]. Finally, secondary structure elements of the domain can be modified; for example, α-helix 1 in U2AF35 RRM is three times longer than in a canonical RRM (Fig. 2). This unusual helix 1 is involved in protein–protein interactions [25] (see the RRM–protein complexes section).
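The relationship between the sequence topology and the spatial strand order described above can be written down as a small data structure. This is an illustrative sketch only; the constant and function names are assumptions, and the element labels are spelled out in ASCII (`beta1`, `alpha1`) for portability.

```python
# Secondary-structure elements in sequence order (N- to C-terminus), as stated in the text.
TOPOLOGY = ["beta1", "alpha1", "beta2", "beta3", "alpha2", "beta4"]

# Left-to-right strand order when facing the beta-sheet, as stated in the text.
SHEET_ORDER = ["beta4", "beta1", "beta3", "beta2"]

# Strand carrying each conserved motif, as stated in the text.
RNP_LOCATION = {"RNP 1": "beta3", "RNP 2": "beta1"}


def central_strands(order):
    """Strands that have a neighbour on both sides in the sheet."""
    return order[1:-1]


def external_strands(order):
    """The two edge strands of the sheet."""
    return [order[0], order[-1]]


def sheet_neighbours(order):
    """Pairs of strands that lie side by side in the sheet."""
    return list(zip(order, order[1:]))


if __name__ == "__main__":
    print("central strands:", central_strands(SHEET_ORDER))
    print("external strands:", external_strands(SHEET_ORDER))
    print("neighbouring pairs:", sheet_neighbours(SHEET_ORDER))
```

Written this way, it is easy to see that both RNP motifs sit on the two central strands, while the first and last strands in sequence order (β1 and β4) are neighbours in the sheet even though they are far apart in sequence.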

Fig. 2. hnRNPA1 RRM 2, a typical RRM fold, and its structural variations as illustrated by these different protein structures (hnRNPA1 RRM 2 [52], PTB RRM 3 [23], La C-terminal RRM [20], CstF-64 RRM [22] and U2AF35 [51]). This figure was generated with the program MOLMOL [56].


A true single-stranded nucleic acid binding domain

Since the first structure of an RRM in complex with RNA (the N-terminal domain of U1A in complex with U1 snRNA stem-loop II [26]), which founded our understanding of RRM–RNA recognition, 10 structures of RRMs in complex with RNA or DNA (for hnRNPA1) have been determined either by NMR [27–30] or X-ray crystallography [31–36]. All of the structures present intrinsic common features and differences in RNA recognition, reflecting the remarkable adaptability of this domain in order to achieve high affinity and specificity.

Systematic visual analysis of the conserved residues at the RRM–RNA interface for all 11 published complexes led us to define a common structural archetype of the RRM–nucleic acid interaction, exemplified by hnRNPA1, an RRM protein binding both DNA and RNA with high affinity. In the structure of hnRNPA1 RRM 2 in complex with DNA [34] (Fig. 3A), two deoxynucleotides, A209 and G210, stack on two aromatic rings located on the β1 (Phe108, RNP 2 position 2) and β3 (Phe150, RNP 1 position 5) strands, respectively. The contacts with these two RNP positions result in a characteristic arrangement of the nucleic acid strand on the β-sheet surface in which the 5′ end is located on the first half of the β-sheet (β4β1) and the 3′ end on the second half (β3β2) (Fig. 3B). A third aromatic residue located on β3 (Phe148, RNP 1 position 3) interacts hydrophobically with the sugar rings of A209 and G210. Finally, a positively charged side chain (Arg146, RNP 1 position 1) forms a salt bridge with the phosphate between A209 and G210. This small set of RRM–nucleic acid interactions in the center of the domain, involving four conserved protein side chains of the RRM consensus sequence and two nucleotides, illustrates how well the RRM is adapted for binding single-stranded nucleic acids of any sequence. Indeed, the essential chemical elements of this dinucleotide, namely the two bases, the two sugar rings and the phosphate in between, are recognized. The two bases are stacked on conserved aromatic rings and, correspondingly, RNP 2 position 2 and RNP 1 position 5 are planar residues (Phe, Tyr, His or Trp) in 78% and 72% of the 70 RRMs studied by Birney et al. [6], respectively. The two sugar rings are in contact with a hydrophobic side chain (RNP 1 position 3) that is present in 81% (67% Phe or Tyr) of the RRMs, and finally the negatively charged phosphodiester group is neutralized by a positively charged side chain (RNP 1 position 1) present in 68% of the RRMs [6].

Although the residue conservation at these four positions is strong, these four characteristic contacts are not always found all together [34]. Among the RRM–RNA/DNA complexes, the two RRMs of hnRNPA1 in complex with DNA have all four characteristic contacts, whereas only one to three of those are found in the other structures (Fig. 4). The most frequent ones are the two stacking interactions involving RNP 2 position 2 (always present except in nucleolin RRM 2 [37]) and RNP 1 position 5 (always present except in CBP20 [36]). The contacts between the sugars and RNP 1 position 3 are present in five RRM–RNA complexes (CBP20, PABP RRM 1, nucleolin RRM 1 and RRM 2 and sex-lethal RRM 1). The RNP 1 position 1 residue does not necessarily interact with the phosphate within the central dinucleotide, because in all structures apart from hnRNPA1 it contacts an RNA base or a phosphate oxygen of other nucleotides. Also, the RRM interactions with the sugar–phosphate backbone are fairly limited compared to other types of RNA-binding proteins, such as ribosomal proteins, suggesting a less important role for this type of interaction [38].

Fig. 3. hnRNPA1 RRM 2 as a model of single-stranded nucleic acid binding [25]. (A) Structure of hnRNPA1 RRM 2 in complex with single-stranded telomeric DNA and scheme of the β-sheet annotated with the conserved RNP 1 and RNP 2 aromatic residue positions numbered according to each RNP sequence numbering. The conserved aromatic residues are highlighted by green circles [34]. (B) Structural arrangement of the DNA strand on the β-sheet of hnRNPA1–RRM 2. (C) Hydrogen bond and van der Waals interaction network conferring base-binding specificity (hnRNPA1–RRM 2 complex). This figure was generated with the program MOLMOL [56].

Fig. 4. The RRM domain, a highly plastic platform for nucleic acid binding. (A) Nucleolin RRM 2–sNRE complex [28]. (B) Sex-lethal RRM 1–polyU–Tra mRNA [31]. (C) Sex-lethal RRM 2–Tra mRNA precursor complex [31]. (D) hnRNPA1 RRM 1–telomeric DNA complex [34]. (E) Poly(A)-binding protein RRM 1–polyadenylate RNA complex [33]. (F) Heterodimeric nuclear cap binding complex with 5′ capped polymerase II transcripts [36]. In all figures, the RNA is shown in yellow and the protein side chains in green. The ribbon of the RRM is shown in grey. The N- and C-terminal extensions of the RRM are shown in green and red, respectively. This figure was generated with the program MOLMOL [56].
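The four canonical contacts described in this section for hnRNPA1 RRM 2 can be collected in a small lookup structure. This is a sketch for illustration: the `Contact` class and helper names are hypothetical, and the entries only restate residues and partners given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    motif: str      # "RNP 1" or "RNP 2"
    position: int   # 1-based position within the motif
    strand: str     # beta-strand carrying the residue
    residue: str    # three-letter code plus residue number
    partner: str    # part of the dinucleotide that is contacted
    kind: str       # type of interaction


# The four contacts reported in the text for hnRNPA1 RRM 2 bound to DNA.
HNRNPA1_RRM2_CONTACTS = [
    Contact("RNP 2", 2, "beta1", "Phe108", "base of A209", "stacking"),
    Contact("RNP 1", 5, "beta3", "Phe150", "base of G210", "stacking"),
    Contact("RNP 1", 3, "beta3", "Phe148", "sugars of A209 and G210", "hydrophobic"),
    Contact("RNP 1", 1, "beta3", "Arg146", "phosphate between A209 and G210", "salt bridge"),
]


def residue_number(residue: str) -> int:
    """Extract the number from a label such as 'Phe108'."""
    return int(residue[3:])


def numbering_is_consistent(contacts, motif: str) -> bool:
    """Within one motif, residue number minus motif position should be constant."""
    offsets = {residue_number(c.residue) - c.position
               for c in contacts if c.motif == motif}
    return len(offsets) == 1


def contacts_by_kind(contacts) -> dict:
    """Group residue labels by interaction type."""
    grouped = {}
    for c in contacts:
        grouped.setdefault(c.kind, []).append(c.residue)
    return grouped


if __name__ == "__main__":
    print(contacts_by_kind(HNRNPA1_RRM2_CONTACTS))
    print("RNP 1 numbering consistent:",
          numbering_is_consistent(HNRNPA1_RRM2_CONTACTS, "RNP 1"))
```

The consistency check is a simple sanity test: Arg146, Phe148 and Phe150 are spaced exactly as positions 1, 3 and 5 of a contiguous eight-residue RNP 1 motif should be.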

This basic binding platform common to all RRMs is not in essence sequence-specific, as eight of the 16 dinucleotide combinations have already been found: AA [33], AG [34], CG [28], CA [26], GU [31], UC [28], UG (S. D. Auweter and F. H.-T. Allain, unpublished data) and UU [31], with any type of nucleotide either at the 5′ or the 3′ position. The nucleotides at these two positions always adopt an anti conformation, except for the G at the 3′ position, which is always found in a syn conformation. Specificity of this central dinucleotide recognition is provided by other nonconserved elements of the RRMs. The two most frequently observed elements are the protein side chains at the surface of the β-sheet (RNP 1 position 7 and the two adjacent positions in β1) (Fig. 3A) and the backbone and side chains of the few amino acids just C-terminal to β4. These residues are base-specifically hydrogen-bonded to the RNA or DNA functional groups, as illustrated by the multiple base–amino acid contacts in hnRNPA1 RRM 2 (Fig. 3C).
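The dinucleotide count given above is easy to verify by enumeration. The sketch below lists the combinations not yet observed on the central binding platform and checks that every base occurs at both the 5′ and the 3′ position; the names are illustrative, and the AG pair from the hnRNPA1–DNA complex is written with the same letters as the RNA pairs.

```python
from itertools import product

BASES = "ACGU"

# Central dinucleotides (5' base first) listed in the text as already observed.
OBSERVED = {"AA", "AG", "CG", "CA", "GU", "UC", "UG", "UU"}


def all_dinucleotides(bases: str = BASES) -> list:
    """All ordered two-base combinations, 5' base first."""
    return ["".join(pair) for pair in product(bases, repeat=2)]


def unobserved(observed=OBSERVED) -> list:
    """Combinations not in the observed set."""
    return [d for d in all_dinucleotides() if d not in observed]


def bases_at(position: int, observed=OBSERVED) -> set:
    """Set of bases seen at the 5' (position 0) or 3' (position 1) site."""
    return {d[position] for d in observed}


if __name__ == "__main__":
    print(len(OBSERVED), "of", len(all_dinucleotides()), "combinations observed")
    print("not yet observed:", unobserved())
    print("bases at 5' site:", sorted(bases_at(0)))
    print("bases at 3' site:", sorted(bases_at(1)))
```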

A highly plastic domain to achieve high RNA-binding affinity and specificity

Many RRMs bind RNA with high affinity (in the nM range) and high sequence-specificity, in particular all those whose structures have been determined to date. Nevertheless, sequence-specificity does not necessarily imply high affinity; for example, PTB specifically recognizes pyrimidine tracts but does not provide sufficient binding enthalpy to reach nM affinity (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished data). To achieve higher affinity, some RRM proteins use the two external β4 and β2 strands, while others use loops 1, 3 or 5, or the C- and N-termini [39]. In many proteins, multiple RRMs associate to bind longer nucleotide stretches. In these cases, the interdomain linker is an essential component of RNA recognition. In addition, the RNA secondary structure can be an important determinant of the protein binding affinity. All of these aspects are presented in detail below.

Role of the two external β-strands and the loops

The β-sheet surface of an RRM can be modulated by using only one or up to four β-strands for RNA binding. Figure 4 clearly illustrates that the β-sheet surface is not used to the same extent in each RRM–nucleic acid complex. Exceptionally, in hnRNPA1 RRM 1, each β-strand binds one nucleotide, the DNA being spread on the β-sheet from β4 to β2 in the 5′ to 3′ direction. More often, the nucleotide at the 5′ end of the central dinucleotide contacts the loops at the bottom of the β-sheet (loop 1 and loop 3 in particular, Fig. 4C) and the one at the 3′ end stacks over the previous nucleotide (Fig. 4A). In PABP RRM 1, it is different again: while A6 and A8 stack on the protein side chains at the canonical positions on β1 and β3, respectively, the nucleotide in between, A7, interacts with loop 3 (Fig. 4E).

Role of the N- and C-terminal regions

The N- and C-terminal regions of the RRM are often of crucial importance for enhancing the RNA-binding affinity by increasing the protein–RNA interaction network. In most RRM–RNA complexes, the base stacking on the aromatic residue at RNP 2 position 2 is sandwiched either by a protein side chain from the N-terminal region (CBP20) or by one from the C-terminal region of the RRM (Fig. 4D–F) [36]. This side chain can be one residue after the end of β4 as in U1A [26,27] or 16 residues afterwards as in hnRNPA1 RRM 1 [34] (Fig. 4D). The C-terminus of hnRNPA1 RRM 1 is particularly interesting because it is unstructured in the free form and becomes ordered upon DNA binding, forming a 3₁₀ helix. This structural rearrangement reinforces the concept of binding by induced fit, initially proposed with the structure of the U1A–RNA complex [27]. Side chains of this helix, His101 and Arg92, stack over A203 and G204, respectively (Fig. 4D) [34].

The C-terminus can also contribute to differentiating RNA from DNA by interacting with the 2′OH group of the sugar ring as shown in Fig. 4B,E. The hydroxyl group can act as a hydrogen bond acceptor interacting with protein side chains (Fig. 4E, Arg94; Fig. 4B, Arg202) as well as with the backbone amide (Fig. 4B, Gly205) and/or as a hydrogen bond donor interacting with the carbonyl oxygen of the protein backbone [38]. Other parts of the RRM domain, such as the β2-strand and the loops, also interact with the 2′OHs and help to discriminate RNA from DNA [26,31,33,35].

The C-terminal region does not always enhance RNA binding but can also inhibit it, as shown in the structure of CBP20 [36] (Fig. 4F). Two residues (Asn116 and Arg123) of the C-terminus form a salt bridge located above the RNP 1 residue at position 5 (Phe85), preventing any RNA binding at this key position. Similarly in PTB, the C-terminal region of all the RRMs hydrophobically interacts with RNP 1 position 5, thereby masking this binding site (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished data).


Role of the RNA secondary structure in RRM binding

Some proteins, such as the N-terminal RRM of U1A, bind single-stranded RNA with high affinity only if the RNA is embedded within a secondary structure, either a stem loop (hairpin loop II of U1 snRNA [26]) or an internal loop (the regulatory element of the U1A 3′ untranslated region [27]). For example, the U1A protein that recognizes a stem loop has a much weaker affinity (10⁴-fold) for a single-stranded 23-mer RNA with no base pairs, even though the proper single-stranded recognition sequence is present [26]. U1A RRM 1 specifically recognizes the secondary structure of the target RNA through its loops 1 and 3 binding to a specific base pair. In the case of U1A bound to a fragment of U1 snRNA hairpin II, Arg52 (loop 3) makes crucial interactions with the loop-closing GC base pair and its substitution by Glu completely abolishes RNA binding [26] (Fig. 5A). U1A not only binds a stem loop but also an internal loop [27,29]. This ability to bind RNA in different environments shows the adaptability of the protein to recognize different secondary structures as long as the key protein–RNA interactions are conserved. The closely related U2B″ RRM binds the same hexanucleotide sequence, AUUGCA, as U1A but within a different stem loop (U2 snRNA hairpin IV) and only when in complex with U2A′ (Fig. 5B). The adaptability of the RRM domain is further illustrated here, as the key residue Arg52 still interacts with the RNA stem although the closing base pair is a UU base pair in U2 snRNA SLIV instead of a GC in U1 snRNA SLII.

While both U1A and U2B″ recognize the bases at the top of the stem through numerous hydrogen bonds, nucleolin contacts the nucleolin recognition element (sNRE) RNA stem essentially by van der Waals interactions [28] (Fig. 5C). The two RRMs of nucleolin sandwich the seven-nucleotide loop, and RRM 1 and its C-terminal part recognize the unusual loop E structure [28]. The substitution of the loop E by two GC base pairs separated by a bulge increases the dissociation constant more than 100-fold (from 5 nM to 0.8 µM) [30] and, as shown in Fig. 5D, this substitution eliminates all van der Waals interactions (only one hydrogen bond from Lys95 is retained). The double-stranded stem is important for two reasons: first, it restricts the conformation of the RNA loop and reduces the entropy loss accompanying protein binding; and second, some structural features of the RNA that close the RNA loop, such as the base pair (U1A and U2B″) or loop E (nucleolin), are crucial for positioning the RRM onto the RNA. It was postulated that the RNA structure is essential because it induces conformational changes in order to reach the bound state [27,40].
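To put the affinity changes in this section on an energy scale, a ratio of dissociation constants can be converted into a binding free energy difference with the standard relation ΔΔG = RT ln(Kd,weak / Kd,tight). The sketch below applies it to the nucleolin values quoted above (5 nM versus 0.8 µM). The temperature of 298.15 K is an assumption, since the review does not state the measurement conditions, and the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd_ratio(ratio: float, temperature_k: float = 298.15) -> float:
    """Free energy penalty (kcal/mol) for a given fold increase in Kd.

    Uses ddG = R * T * ln(ratio); the default temperature is an assumption.
    """
    return R_KCAL * temperature_k * math.log(ratio)


def ddg_between(kd_tight_molar: float, kd_weak_molar: float,
                temperature_k: float = 298.15) -> float:
    """Free energy difference (kcal/mol) between two dissociation constants."""
    return ddg_from_kd_ratio(kd_weak_molar / kd_tight_molar, temperature_k)


if __name__ == "__main__":
    # Nucleolin: loop E replaced by a bulge, Kd goes from 5 nM to 0.8 uM.
    penalty = ddg_between(5e-9, 0.8e-6)
    print(f"fold change: {0.8e-6 / 5e-9:.0f}")
    print(f"ddG at 298.15 K: {penalty:.1f} kcal/mol")
```

Under this assumption the 160-fold loss in affinity corresponds to roughly 3 kcal/mol. Because the dependence is logarithmic, even a very large fold change maps to a modest energy difference.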

Fig. 5. Role of the RNA secondary structure in RRM binding. (A) U1A spliceosomal protein–U1 snRNA hairpin II complex [26]. (B) U2B″–U2A′ protein complex bound to U2 snRNA hairpin IV [32]. (C) Nucleolin–sNRE complex [28]. The loop E motif is composed of a sheared G5-A18 base pair, an A6-U17-G16 triple and a symmetric (trans-Hoogsteen) locally parallel A7-A15 base pair. (D) Nucleolin–b2NRE complex with the loop E motif substituted by a bulge (U15 between two GC base pairs) [30]. The color schemes are the same as in Fig. 4, except that the protein loops and the C-terminus are shown in blue. This figure was generated with the program MOLMOL [56].

Role of additional RRMs

The combination of two or more RRM domains allows the continuous recognition of a long nucleotide sequence (8–10 nucleotides), often drastically increasing the affinity (Kd in the sub-nM range). As shown previously, the β-sheet surface can bind up to four nucleotides, and up to six if loops 1 and 3 contribute extensively to binding (S. D. Auweter and F. H.-T. Allain, unpublished data). Thus, recognition of a longer single-stranded DNA or RNA requires more than one RRM to form a larger binding platform. Four structures of two consecutive RRMs in complex with RNA (sex-lethal [31], HuD [35], PABP [33] and nucleolin [28,30]) and one with DNA (hnRNPA1 [34]) have been determined. In all five cases, the two RRMs and the interdomain linker cooperatively bind the nucleic acid, providing high affinity and specificity. In the free forms of sex-lethal and nucleolin, the linkers are disordered and the two RRM domains tumble independently [37,41]. In some cases (PABP, nucleolin), the interdomain linker (that is, the C-terminal region of the N-terminal RRM as described above) acts as a bridge, mediating the cooperative binding of two RRM domains with the RNA. More interesting is the range of new possible conformations provided by the association of two RRMs (Fig. 6). In PABP, a large binding platform is created for the RNA; in sex-lethal and HuD, the two RRMs form a cleft in which the RNA lies; and in nucleolin the RNA is sandwiched between the RRMs. As a consequence of the relative arrangement of the two domains in sex-lethal, HuD and nucleolin, several intra-RNA interactions are created upon RNA binding that contribute to the overall enthalpy of the complex, while in PABP almost no intra-RNA interactions are present. In contrast, hnRNPA1 RRMs 1–2 and PTB RRMs 3–4 (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished results) are arranged in such a way that only distantly located sequences of the same RNA can bind simultaneously to both RRMs. These opposite topologies might reflect the opposite functions of the various RRM proteins, as both sex-lethal and HuD are splicing activators, while hnRNPA1 and PTB are splicing repressors [42].

The RRM, also a protein–protein interaction domain

Over the last few years, biochemical and structural studies have shown that the RRM is not only involved in RNA recognition but also in protein–protein interaction. In addition to structures of multiple RRM-containing proteins as described in the previous section, structures of RRM domains in complex with various proteins or domains have been solved [32,43–51]. Analysis of these structures shows that protein recognition by RRM domains is very diverse with no general mechanism emerging. For clarity, we distinguish three main classes of RRM–protein interactions: between two RRMs, between an RNA-binding RRM and a non-RRM protein, and finally between RRMs that do not bind RNA and another protein.

Fig. 6. The RRM–RRM interactions. Several protein structures, either free or in a complex, in which two RRM domains interact are shown. Structures of (A) UP1 in the free form [53] (pdb:1up1), (B) nucleolin in complex with RNA [28] (pdb:1fje), (C) sex-lethal in complex with RNA [31] (pdb:1b7f), (D) PABP in complex with RNA [33] (pdb:1cvj), and (E) U1A homodimer in complex with RNA [29] (pdb:1dz5). The RNA backbone is shown in yellow (A–E), the N-terminal RRM domain is displayed in green, the C-terminal domain in blue, and the linker region in red. In the U1A homodimer, one monomer is displayed in green and the other in blue. In all cases, important residues for the protein–protein interaction are displayed as balls and sticks. This figure was generated using the programs MOLSCRIPT and RASTER3D [57,58].

Protein interaction involving two RRM domains

The first structure showing an interaction between two RRMs is the N-terminal region of hnRNPA1 (UP1) in its free form that contains two RRM domains separated by a short linker [52,53]. The two RRMs form a compact fold and interact with each other via their α-helix 2. The interaction is stabilized by two salt bridges connecting two arginines of the first RRM and two aspartic acids of the second (Fig. 6A). This arrangement positions the β-sheets of both domains adjacently, forming an extended surface of eight β-strands. Similarly, PTB RRMs 3 and 4, separated by a 24-residue linker region, do not tumble independently in the free form (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished data).

These RRM–RRM interactions are not a general feature of all RRM proteins. In the case of sex-lethal and nucleolin, in the free proteins, the linker is flexible and the two RRM domains are independent [28,41]. However, upon RNA binding, the two RRM domains adopt a fixed orientation and contact each other. In the nucleolin structure, the RRMs interact via two salt bridges located in the loops (Fig. 6B), and in the structure of hnRNPA1, the RRMs interact by salt bridges located in the α2-helix. Other examples of RNA-induced RRM–RRM interactions have also been described in the case of sex-lethal [31], PABP [33], and HuD [35]. In sex-lethal and HuD, the interdomain interaction is mainly governed by two hydrogen bonds between residues located in β1 and β4 of RRM 1 and in β2 of RRM 2 (Fig. 6C). Furthermore, additional contacts between RRM 2 and the linker region are observed. In the case of PABP, the interdomain interactions are mediated through many salt bridges and van der Waals contacts between α2 and β4 of RRM 1 and β2 and α1 of RRM 2, respectively (Fig. 6D).

Another interesting example of RRM–RRM interaction is found in the structure of the N-terminal RRM domain of the U1A protein in complex with the polyadenylation inhibition element (PIE) RNA [29]. In this case, two U1A proteins bind cooperatively to the PIE RNA [54]. The structure shows that, when bound to RNA, U1A RRM 1 forms a homodimer stabilized by interactions between the two α-helical C-termini (Fig. 6E). One side of the C-terminal α-helix contains charged residues that interact with the RNA, and the opposite side contains hydrophobic residues that constitute the dimer interface.

All of these structures clearly show that RRM domains can be involved in RRM–RRM interactions in addition to RNA binding. In most of these complexes, these additional interactions contribute to the formation of a larger RNA-binding interface and are therefore critical for reaching high RNA-binding affinity and specificity. This feature is likely to be found frequently in proteins containing multiple RRMs, especially if the interdomain linker is short.

Protein interaction involving one RRM domain and another domain

In some cases, it has been demonstrated that RRM-containing proteins can associate with RNA only in the presence of another protein that acts as a cofactor. Both U2B″ and CBP20 need a cofactor, U2A′ and CBP80, respectively, to recognize RNA. Structures of these ternary complexes have been solved and partially explain the importance of a cofactor in RNA–RRM binding [32,43–45]. U2A′ consists of five consecutive leucine-rich repeats, and CBP80 of three helical hairpin repeats very similar to the fold of the middle domain of translation initiation factor 4G (MIF4G). In both cases, the RRM domains of U2B″ and CBP20 interact with the leucine-rich repeat (LRR) motif or the MIF4G domain through their α-helices and loop 4, keeping the β-sheet accessible for RNA binding (Fig. 7). The interactions, however, are different: they are governed mainly by hydrophobic contacts in the U2B″–U2A′ complex, and by salt bridges and hydrogen bonds in the CBP20–CBP80 complex. Furthermore, in the case of CBP20, the N- and C-terminal extensions flanking the RRM domain become structured only when in complex with both RNA and CBP80. As with RRM–RRM interactions, these RRM–protein interactions contribute to RNA-binding specificity: U2A′ contacts the RNA, and CBP80 stabilizes both the N- and C-termini of the CBP20 RRM, two key components of CBP20–RNA recognition (Fig. 4) [44].

RRM domains involved only in protein recognition

Some proteins containing RRM domains are involved in protein–protein but not in protein–RNA interactions.


Recently, three-dimensional structures of such proteins in complex have partially explained this unexpected behavior of the RRM domain. Two different situations, however, have been reported. In the first, the protein interaction involves the β-sheet of the RRM domain, thus preventing RNA binding, as in the Y14–Magoh complex [46–49] or the UPF2–UPF3 complex [50]. In the second, the interaction is mediated through the α-helices, leaving the β-sheet solvent-exposed and therefore theoretically able to bind RNA, as in the U2AF35–U2AF65 [51] and U2AF65–SF1 [46] complexes. In this latter case, it was postulated that the particular behavior of these RRM domains is due mainly to the identity of the amino acids on the surface of the β-sheet (see below [25]).

Y14 and Magoh are part of the exon junction complex, which comprises several proteins. Y14 and Magoh form a highly stable complex with nanomolar binding affinity [48]. The C-terminal domain of Y14 has a typical RRM fold, and the RNP 1 and RNP 2 amino acid sequences of Y14 are very similar to those of other RRM domains (Fig. 1). However, Y14 does not bind RNA. Structures of the Y14–Magoh heterodimer show that Y14 binds Magoh through its entire β-sheet [46–48] (Fig. 8). This mode of complex formation neatly explains why some RRM domains do not have RNA-binding activity. Similarly, in the structure of the UPF2–UPF3 complex involved in nonsense-mediated mRNA decay, the β-sheet of the N-terminal RRM domain of UPF3 binds UPF2 [50]. Although both RRM proteins interact through their β-sheet, their partners, Magoh and UPF2, adopt completely different folds. UPF2 has an all-α-helical MIF4G fold very similar to that of CBP80, whereas Magoh has an αβ fold (Fig. 8). Also striking is the fact that both UPF2 and CBP80 adopt a MIF4G fold but recognize the RRM in totally different manners, UPF2 recognizing the RRM β-sheet and CBP80 the RRM α-helices.

The structures of the splicing factors U2AF35–U2AF65 and U2AF65–SF1 are another example of the diversity encountered in protein–RRM recognition. U2AF65 contains three RRM domains, the two


Fig. 7. The RRM–protein–RNA trimolecular complexes. (A) The U2B″–U2A′–RNA ternary complex [32]. (B) The CBP20–CBP80–RNA complex [36]. The RNA is shown in yellow, the RRM domain in green, and leucine-rich repeats or MIF4G domains in blue. Residues important for the interaction are displayed as balls and sticks. This figure was generated using the programs MOLSCRIPT and RASTER3D [57,58].


Fig. 8. The Y14–Magoh complex [48]. Y14 is shown in green, and Magoh is shown in blue. RNP 1 and RNP 2 of Y14 are shown in red. This figure was generated using the programs MOLSCRIPT and RASTER3D [57,58].
