Structural and functional analysis of ataxin-2 and ataxin-3
Mario Albrecht1,*, Michael Golatta2,*, Ullrich Wüllner3 and Thomas Lengauer1
1Max-Planck-Institute for Informatics, Saarbrücken, Germany; 2Institute for Medical Biometry, Informatics, and Epidemiology, University of Bonn, Germany; 3Department of Neurology, University of Bonn, Germany
Spinocerebellar ataxia types 2 (SCA2) and 3 (SCA3) are autosomal-dominantly inherited, neurodegenerative diseases caused by CAG repeat expansions in the coding regions of the genes encoding ataxin-2 and ataxin-3, respectively. To provide a rationale for further functional experiments, we explored the protein architectures of ataxin-2 and ataxin-3. Using structure-based multiple sequence alignments of homologous proteins, we investigated domains, sequence motifs, and interaction partners. Our analyses focused on presumably functional amino acids and the construction of tertiary structure models of the RNA-binding Lsm domain of ataxin-2 and the deubiquitinating Josephin domain of ataxin-3. We also speculate about distant evolutionary relationships of ubiquitin-binding UIM, GAT, UBA and CUE domains and helical ANTH and UBX domain extensions.
Keywords: spinocerebellar ataxia; Machado–Joseph disease; polyglutamine disorder; ubiquitin; valosin-containing protein.
Spinocerebellar ataxia types 2 (SCA2) and 3 (SCA3) are autosomal-dominantly inherited, neurodegenerative disorders [1,2]. SCA3 has also been known as Machado–Joseph disease (MJD), and SCA2 and SCA3 belong to a heterogeneous group of trinucleotide repeat disorders. This group includes Huntington disease (HD), dentatorubral-pallidoluysian atrophy (DRPLA), and other spinocerebellar ataxia types such as SCA1, SCA7 and SCA17 [3–7]. The age of onset of SCA2 and SCA3 is in the third to fourth decade [8]. The disorders share common phenotypic features such as the degeneration of specific vulnerable neuron populations and the presence of intracellular aggregations of the mutant proteins in affected neurons. In contrast, the expression of the disease-associated genes occurs in a great variety of tissues and is not restricted to neuronal cells.
The SCA2 and SCA3/MJD genes have been mapped to chromosomes 12q24.1 and 14q32.1, respectively [1,2]. The common underlying genetic basis of SCA2 and SCA3 is the expansion of a CAG repeat region beyond a certain threshold. These CAG repeats encode a polyglutamine (polyQ) tract in the respective proteins ataxin-2 and ataxin-3. The polyQ stretch in ataxin-2 lies near the N-terminus at the 5′-end of the coding region of exon 1 [9], but the polyQ region of ataxin-3 is contained in exon 10 close to the C-terminus [10]. While ataxin-2 is located predominantly in the Golgi apparatus [11], ataxin-3 is found in both the nucleus and the cytoplasm of cells [12].
To provide a rationale for further experiments, we characterized the protein architectures of ataxin-2 and ataxin-3 and investigated domains, sequence motifs, and interaction partners. To explore the functional implications, we assembled a multiple sequence alignment for the Lsm domain of ataxin-2 homologues including the yeast homologue Pbp1. We also constructed a 3D structural model for the RNA-binding Lsm domain of ataxin-2. Similarly, we used a structure-based multiple sequence alignment of the Josephin domain of ataxin-3 homologues to derive a 3D model of this domain and to analyse specific residues involved in deubiquitination.
Correspondence to M. Albrecht, Max-Planck-Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany
E-mail: mario.albrecht@mpi-sb.mpg.de
Abbreviations: A2BP1, ataxin-2 binding protein 1; DRPLA, dentatorubral-pallidoluysian atrophy; DUB, deubiquitinating enzymes; HD, Huntington disease; MJD, Machado–Joseph disease; NLS, nuclear localization signal; OTU, otubains; PABP, poly(A)-binding protein; RMSD, root mean square deviation; SCA, spinocerebellar ataxia; snRNPs, small nuclear ribonucleoproteins; UBP, ubiquitin-specific protease; UCH, ubiquitin C-terminal hydrolase; UIM, ubiquitin-interacting motif; VCP, valosin-containing protein
*Note: M. Albrecht and M. Golatta contributed equally to this work
(Received 6 April 2004, accepted 7 June 2004)
Materials and methods
Protein sequences were retrieved from the NCBI [13], Ensembl [14], and SWISS-PROT/TrEMBL (SPTrEMBL) [15] databases and protein domain architectures from the Pfam [16] and SCOP [17] databases. Sequence accession numbers are given in the respective figure legends and Tables S1 and S2. Species names are abbreviated by first letters (Table S3). Protein structures were obtained from the PDB database [18]. The secondary structure assignments of PDB structures were taken from the DSSP database [19]. A single capital letter appended to the actual PDB identifier denotes the chosen structure chain. We used the PSI-BLAST suite of programs [20] to search for homologues (E-value cut-off 0.005) and the web servers PSIPRED [21], SAM-T99 [22], and SSpro2 [23] to predict the secondary structure of proteins and to form a consensus prediction by majority voting [24]. To predict intrinsically unstructured and disordered regions in proteins, we explored the consensus of the results returned by the DisEMBL [25], DISOPRED [26], GlobPlot [27], NORSp [28] and PONDR [10] online servers. The nuclear localization signals in ataxin-3 homologues were discovered with the help of the prediction server PSORT II [29].
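The consensus prediction by majority voting mentioned above can be illustrated with a minimal sketch. It assumes that each server returns a per-residue string over the three states H (helix), E (strand) and C (coil) for the same sequence; the function name, the tie-breaking rule (coil) and the example strings are illustrative assumptions and not taken from the cited servers or from reference [24].

```python
from collections import Counter


def consensus_secondary_structure(predictions, tie_symbol="C"):
    """Per-residue majority vote over equally long H/E/C prediction strings.

    Ties are resolved as `tie_symbol`; this rule is an assumption of the sketch.
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("need at least one prediction, all of the same length")
    consensus = []
    for column in zip(*predictions):
        # most_common() sorts states by how many servers voted for them
        (top, top_count), *rest = Counter(column).most_common()
        if rest and rest[0][1] == top_count:
            consensus.append(tie_symbol)  # no clear majority
        else:
            consensus.append(top)
    return "".join(consensus)


# Hypothetical outputs of three servers for a six-residue peptide
example = ["HHHEEC", "HHCEEC", "HHHECC"]
print(consensus_secondary_structure(example))  # HHHEEC
```

With three predictors, a tie can only arise when all three disagree at a position, so the tie rule rarely matters in practice.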
Multiple sequence alignments were assembled by means of T-COFFEE [30] and improved manually by minor adjustments based on structure prediction results and pairwise structure superpositions computed by the program CE [31]. The root mean square deviations (RMSDs) were taken from the CE superpositions. We investigated the results of all state-of-the-art fold recognition methods available via the online meta-server BioInfo.PL [32], which contacts a dozen other prediction servers (the names of which are listed on the web site http://Bioinfo.PL/Meta/). The associated 3D-Jury system allows for the comparison and evaluation of the predicted 3D models in a consensus view [33]. To model the protein structures of ataxin-2 and ataxin-3, we submitted the constructed sequence–structure alignments to the 3D modelling server WHAT IF [34]. The sequence alignments depicted in the figures were prepared in the SEAVIEW editor [35] and illustrated by the web service ESPript [36]. The protein structure images were drawn in the Accelrys Discovery Studio ViewerLight. The online version of this manuscript contains supplementary material, and our web site will provide additional pictures.
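For readers unfamiliar with the RMSD values quoted from the CE superpositions, the following sketch shows how a root mean square deviation is computed for two sets of equivalent atoms. It assumes that the coordinates are already superimposed and paired one to one; it does not reproduce the alignment and superposition steps performed by CE, and the function name and toy coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of already superimposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between paired atoms
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom is displaced by 1 Å along z
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(a, b))
```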
Results and discussion
Protein architecture of ataxin-2
Ataxin-2 has 1312 residues (including 22 glutamines of the polyQ stretch) and a molecular mass of 140 kDa. Ataxin-2 is a highly basic protein except for one acidic region (amino acid 254–475) containing 46 acidic amino acids (Fig. 1). This region covers roughly exons 2–7 and is predicted to consist of two globular domains named Lsm (Like Sm, amino acid 254–345) [37] and LsmAD (Lsm-associated domain, amino acid 353–475). The LsmAD domain of ataxin-2 contains both a clathrin-mediated trans-Golgi signal (YDS, amino acid 414–416) and an endoplasmic reticulum (ER) exit signal (ERD, amino acid 426–428) [11,38]. It is composed mainly of α-helices according to the results from secondary structure prediction servers.
The rest of ataxin-2 outside of the Lsm and LsmAD domains is only weakly conserved in eukaryotic ataxin-2 homologues and is predicted to be intrinsically unstructured according to the consensus result from the DisEMBL, DISOPRED, GlobPlot, NORSp and PONDR online servers. These nonglobular, flexible N- and C-terminal tails (amino acid 1–253 and 476–1312) contain the polyQ region (amino acid 166–187), several highly conserved short sequence motifs as possible protein interaction sites, and conspicuous (R)RG peptides at the C-terminus of the LsmAD domain. One of the sequence motifs constitutes a putative PABP [poly(A)-binding protein] interacting motif PAM2 (amino acid 908–925) [39], and (R)RG peptides are well known to bind RNA in other proteins [40]. The N- and C-terminal tails of ataxin-2 also have a high content of proline (179 prolines out of 1090 amino acids, 16.4%). This property and the low complexity of unstructured sequence regions may lead to several significant, but probably false-positive, hits during a PSI-BLAST search for homologues of ataxin-2. For instance, despite the use of the standard low complexity filter, our PSI-BLAST search with human ataxin-2 homologues found several questionable hits outside globular domains to homologues of the polyglutamine DRPLA gene product atrophin. When the PSI-BLAST search is started with an Arabidopsis thaliana ataxin-2 homologue (SPTrEMBL: Q94AM9), human atrophin is retrieved in the third iteration with an E-value of 5 × 10⁻¹¹. Conversely, using the rat atrophin homologue (SPTrEMBL: Q62901) as the start sequence, human ataxin-2 was detected in the second iteration with an E-value of 8 × 10⁻⁴.
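The tail length and proline content quoted above follow directly from the residue ranges given in the text, as the short check below illustrates. The variable and function names are ours; only the residue ranges and the proline count are taken from the paragraph.

```python
def segment_length(start, end):
    """Number of residues in an inclusive range start..end."""
    return end - start + 1


# N- and C-terminal tails of ataxin-2 as given in the text
TAILS = [(1, 253), (476, 1312)]
PROLINES_IN_TAILS = 179

tail_length = sum(segment_length(s, e) for s, e in TAILS)
proline_fraction = PROLINES_IN_TAILS / tail_length

# The globular region 254-475 plus both tails should cover all 1312 residues
assert tail_length + segment_length(254, 475) == 1312

print(f"{tail_length} residues, {proline_fraction:.1%} proline")
# 1090 residues, 16.4% proline
```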
Fig. 1. Protein architectures of human ataxin-2, its yeast homologue Pbp1, and the P. falciparum homologue PF13_0048 of the decapping enzyme DCP2 (DCP2_Pf).
RNA binding of ataxin-2
The Lsm domain of ataxin-2 is typical of RNA-binding Sm and Sm-like proteins, which often form cyclic 6-, 7- or even 14-oligomers [41–43]. Generally, Lsm domain proteins are involved in a variety of essential RNA processing events including RNA modification, pre-mRNA splicing, and mRNA decapping and degradation. Some of them are also important components of spliceosomal small nuclear ribonucleoproteins (snRNPs).
The LsmAD domain is contained in the Pfam database under the name Ataxin-2_N and also occurs in the as yet uncharacterized Plasmodium falciparum/yoelii yoelii gene products PF13_0048/PY07327, which lack an Lsm domain (Fig. 1). Both Plasmodium gene products have an additional N-terminal DCP2 domain (also termed box A), which is always followed by a NUDIX domain [44] in all known DCP2 homologues. This NUDIX domain constitutes the catalytic subunit of the mRNA decapping holoenzyme DCP1–DCP2 [45,46].
The physiological function of ataxin-2 and closely related eukaryotic homologues in RNA processing is as yet quite unexplored [47–50]. Interestingly, ataxin-2 has been observed to interact with A2BP1 (ataxin-2 binding protein 1) [38], whose RNA-binding Caenorhabditis elegans homologue, fox-1, regulates tissue-specific alternative splicing [51]. Disruption of the human A2BP1 gene may cause epilepsy or mental retardation [52]. In addition, ataxin-2 shows significant homology to the yeast protein Pbp1 (Pab1/PABP-binding protein 1), which also contains the Lsm and LsmAD domains; regions outside of these two globular domains are predicted to be mainly unstructured in Pbp1 as in ataxin-2.
Although the C-terminal tail of Pbp1 does not contain a PAM2 motif [39], this yeast protein regulates polyadenylation after pre-mRNA splicing and interacts with the C-terminal part of the yeast homologue PAB1 of the human PABP [53]. A2BP1 and PABP are also evolutionarily related and possess RNA recognition motifs [38]. These observations strongly suggest that ataxin-2 is involved in similar mRNA processing tasks.
Structural modelling of ataxin-2
First, we compiled a list of ataxin-2 homologues including the yeast homologue Pbp1 and several Lsm domains of snRNPs and other Sm and Sm-like proteins from various species. Then, we assembled a structure-based multiple sequence alignment of the Lsm domains, whose crystallographically determined structures reveal a close structural homology between archaeal and eukaryotic proteins (Fig. 2) [42,43,54–65]. This suggests that the function and the RNA-binding mode of the Lsm domain have been preserved during evolution.
The RNA-binding Lsm domain is characterized by a conserved sequence motif consisting of two short segments known as Sm1 and Sm2, which are separated by a variable linker [66,67]. The very strong conservation of certain glycine residues is especially striking and also demonstrates the evolutionary relationship of ataxin-2 to Lsm domain proteins. The amide groups of the glycines are known to stabilize the protein structure when forming hydrogen bonds to adjacent β-strands [55]. The secondary structure predictions of ataxin-2 and its yeast homologue Pbp1 are also in good agreement with the known structure of the Lsm domain as an open β-barrel, consisting of an N-terminal α-helix followed by a strongly bent five-stranded antiparallel β-sheet with a 3₁₀ helical turn in some cases before the fifth β-strand.
The top two alignment rows in Fig. 2 show human ataxin-2 aligned with the Pyrococcus abyssi Sm1 protein (PDB identifier 1m8v, chain A), the crystal structure of which consists of a heptameric ring with a central cavity like other Lsm domain oligomers [65]. This Sm1 protein provides the only Lsm domain structure that is bound to RNA inside and outside of the doughnut-shaped ring at an internal and an external binding site. Therefore, we used this alignment of ataxin-2 to Sm1 to model the 3D structure of the Lsm domain of ataxin-2 in complex with RNA and Lsm domains of ataxin-2 protomers (Fig. 3).
Functional analysis of the Lsm domain
We applied the same colour scheme to functionally relevant residues shown in the multiple sequence alignment and the 3D model of ataxin-2 (Figs 2 and 3). Based on the crystal structure of Sm1 from P. abyssi bound to uridine heptamers (U7), we marked several amino acids in Sm1 that are involved in RNA binding [65] and are mostly physicochemically conserved in ataxin-2 (Sm1/ataxin-2 residue numbers). The residues forming the internal U7 binding site are H37/K299, N39/L302 and R63/K330, while ionic interactions between K22/K284, R63/K330 and D65/S332 stabilize the RNA-binding area. The residues involved in the external U7 binding site are R4/R266, H10/T272 and Y34/Y296, stabilized by a hydrogen bond between H10/T272 and Y34/Y296. It is interesting to note that Sm1 from P. abyssi and from Archaeoglobus fulgidus (PDB identifier 1i4k, chain A) share identical RNA-binding residues except for H10, which is replaced by an asparagine [59,65].
Furthermore, we investigated whether ataxin-2 may also form oligomers through the Lsm domain. To this end, we used the detailed crystal structure analyses of the very similar snRNP heterodimers D1–D2 and D3–B [55]. Because of analogous intermolecular interactions in both dimers, we focused on the complex of D3 with B. This complex is stabilized mainly by the pairing of the fifth β-strand (β5) from D3 with the fourth β-strand (β4) from B (D3/ataxin-2–B/ataxin-2): R69/V335–R73/K330, L71/V337–L71/L328, and L73/F339–L69/S326. In addition, two hydrophobic clusters formed by residues of D3 and B contribute to the stability of the dimer. The first cluster includes F70/V336 and I72/Q338 (both in β5 strand) of D3 and F27/Y289 (β2 strand), L67/M324, V70/I327 and L72/L328 (all in β4 strand) of B. The second cluster consists of P6/M267, L10/L271 (both in α-helix), V18/C279 (β1 strand), L32/F293 (β2 strand), I33/K294 (loop after β2 strand), I68/F334, L71/V337 and L73/F339 (all in β5 strand) of D3 and I41/L304, C43/A306 (both in β3), L69/S326 and L71/L328 (both in β4) of B. Stacking interactions between guanidinium groups of arginines R69/V335 of D3 and R25/G287 and R49/T312 of B as well as an ionic interaction between E21/Q282 of D3 and R65/S322 of B stabilize the dimer further. However, the latter salt bridge is not observed in the D1–D2 complex despite identical amino acids. Altogether, the degree of conservation of amino acids relevant for heterodimerization is only moderate, but may still suggest that ataxin-2 can form Lsm domain oligomers.
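The template/target residue notation used above (for example H37/K299 for Sm1/ataxin-2) lends itself to a simple conservation tally. The sketch below parses such pairs and labels each as identical, similar or different. The physicochemical grouping is a coarse, illustrative assumption of ours and not the classification used in the analysis; other groupings would give different counts.

```python
# Coarse residue classes; an illustrative grouping, not taken from the paper
CLASSES = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "polar": set("STNQ"),
    "aromatic": set("FWY"),
    "aliphatic": set("AVLIM"),
    "special": set("CGP"),
}


def residue_class(aa):
    """Return the name of the class that contains the one-letter code `aa`."""
    for name, members in CLASSES.items():
        if aa in members:
            return name
    raise ValueError(f"unknown residue: {aa}")


def parse_pair(pair):
    """'H37/K299' -> (('H', 37), ('K', 299))."""
    left, right = pair.split("/")
    return (left[0], int(left[1:])), (right[0], int(right[1:]))


def conservation(pair):
    """Classify a template/target pair as identical, similar or different."""
    (a, _), (b, _) = parse_pair(pair)
    if a == b:
        return "identical"
    return "similar" if residue_class(a) == residue_class(b) else "different"


# Sm1/ataxin-2 pairs of the internal and external U7 binding sites (from the text)
RNA_SITE_PAIRS = ["H37/K299", "N39/L302", "R63/K330", "K22/K284",
                  "D65/S332", "R4/R266", "H10/T272", "Y34/Y296"]

for pair in RNA_SITE_PAIRS:
    print(pair, conservation(pair))
```

Under this particular grouping, five of the eight listed pairs come out as identical or similar.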
Protein architecture of ataxin-3
The longest splice variant of ataxin-3 possesses 376 amino acids (including 22 glutamines of the polyQ stretch, amino acid 296–317) and an approximate molecular weight of 42 kDa. Ataxin-3 consists of a globular deubiquitinating N-terminal Josephin domain (amino acid 1–170) [68,69] and a flexible C-terminal tail containing two ubiquitin-interacting motifs (UIMs) [70] (also termed LALAL motifs and PUBs [71], amino acid 223–240 and 243–260) and the polyQ region (amino acid 296–317) (Fig. 4) [72]. A slightly shorter alternative splice variant of ataxin-3 with 373 amino acids has a third UIM (amino acid 334–351) at the C-terminus. An as yet uncharacterized ataxin-3 paralogue on the X chromosome (sequence identity 70%) is expressed in testis (ataxin-3t) [10]. The Josephin domain is also found without a C-terminal tail in other, as yet uncharacterized, proteins named josephins (Fig. 5) [73].
A highly conserved, putative nuclear localization signal (NLS) is found upstream of the polyQ stretch (RKRR, amino acid 282–285), which may be bipartite in the Caenorhabditis elegans homologue of ataxin-3, consisting of 17 residues (RRDRQKFLERFEKKKEE, amino acid 296–312). This NLS follows a potential casein kinase II (CK-II) phosphorylation site (TSEE, amino acid 277–280), which may determine the rate of the observed ataxin-3 transport into the nucleus [74]. Ataxin-3 may also contain a nuclear export signal (NES) following the Josephin domain (ADQLLQMIRV, amino acid 174–183) based on our comparison with a published sequence profile of nuclear export signals [75]. Furthermore, ataxin-3 contains several conserved sequence motifs similar to NR- and CoRNR-boxes L-x-x-L-L/[IL]-x-x-[IV]-I of transcriptional coactivators and corepressors, respectively [73]. Indeed, ataxin-3 interacts with histones and the histone acetyltransferases CBP, p300, and PCAF, which work as transcriptional coactivators. In particular, dependent on these cofactors, ataxin-3 represses histone acetylation and transcription [76], and altered protein acetylation has already been implicated in polyglutamine disease processes [77]. Generally, the (de-)ubiquitination of histones has been linked to transcriptional regulation [78], which may also explain the observed interactions of ataxin-3.
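Short linear motifs such as the NR-box (L-x-x-L-L) and CoRNR-box ([IL]-x-x-[IV]-I) patterns quoted above can be located in a sequence with regular expressions. The sketch below is only an illustration of such a pattern scan and not the procedure used in this study; the test peptide is hypothetical and is not an ataxin-3 sequence.

```python
import re

# Patterns transcribed from the motif notation in the text (x = any residue).
# The lookahead makes overlapping occurrences visible as well.
NR_BOX = re.compile(r"(?=(L..LL))")
CORNR_BOX = re.compile(r"(?=([IL]..[IV]I))")


def find_motifs(sequence, pattern):
    """Return (1-based start position, matched peptide) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical peptide used only to exercise the patterns
peptide = "MSLKELLAAILDDVIQ"
print(find_motifs(peptide, NR_BOX))     # [(3, 'LKELL')]
print(find_motifs(peptide, CORNR_BOX))  # [(11, 'LDDVI')]
```

Patterns this short match frequently by chance, so hits of this kind are only candidates that need support from conservation or experiment.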
Ataxin-3 is evolutionarily conserved in eukaryotes including P. falciparum and plants, but not yeast. The P. falciparum homologue PFL1295w of ataxin-3 (ataxin-3_Pf), whose gene expression is upregulated similarly to the P. falciparum josephin homologue PF11_0125 in gametocytes [79–81], constitutes an exception because it has only the second UIM conserved (amino acid 250–267) and has an additional ubiquitin-like UBX domain [82–85] at the C-terminus (amino acid 271–381) instead of the polyQ-containing region [69]. Like human ataxin-3, this ataxin-3 homologue PFL1295w also has a potential casein kinase II phosphorylation site (TSDE, amino acid 278–281) close to basic amino acids, which can be indicative of an NLS (KKIH, amino acid 293–296) near the N-terminus of the UBX domain. In contrast, the prediction server PSORT II returns another region inside the UBX domain as a possible NLS (PRRK, amino acid 339–342). It is unclear which NLS motif may be functionally more relevant because both NLS motifs correspond to amino acids at solvent-exposed N-termini of the second and fourth β-strand in the crystal structure of the UBX domain of the cofactor p47 (PDB identifier 1s3s) [86]. Similar to the P. falciparum homologue, the Cryptosporidium parvum homologue of ataxin-3 also possesses only one UIM (amino acid 266–283) and a C-terminal UBX domain (amino acid 288–397) instead of a polyQ region.
Fig. 3. 3D model of the Lsm domain of ataxin-2 using three adjacent protomers of the Sm1 protein from P. abyssi as template (PDB identifier 1m8v, chains A, B and G). The model illustrates predicted internal (blue) and external (green) binding sites of ataxin-2 to RNA (grey). α-Helices are shown in red, β-strands are shown in cyan. Only functionally relevant residues of the central ataxin-2 protomer are annotated as follows: dark blue boxes point to residues forming the internal site, and light blue boxes mark amino acids stabilizing the RNA binding area; dark green boxes highlight residues involved in the external site, and light green ones indicate stabilizing hydrogen bonds.
Ubiquitin binding of ataxin-3
Ubiquitination fulfills many cellular functions in cytoplasmic trafficking, guiding specific proteins through the endocytic pathways, and targeting proteins to the proteasome [84,87–93]. Above all, the ubiquitin–proteasomal pathway is involved in processing mutant or damaged proteins that cause neurodegenerative diseases. The small ubiquitin protein can be covalently linked to other proteins as a single molecule or as a polyubiquitin chain.
Recently, the two UIMs between the Josephin domain and the polyQ stretch of ataxin-3 have been shown to be capable of binding tetraubiquitin and polyubiquitinated proteins [68,94–97]. In our previous study, we used the C-terminal ANTH domain extension, which consists of an antiparallel three-helix bundle, to model the structure of the UIMs in the C-terminal tail of ataxin-3 [73]. In fact, novel structure determinations have shown that UIM peptides are α-helices and can form helix bundles in the crystal structure [98]. In contrast, the NMR solution structures of UIM peptides reveal that they are single amphipathic α-helices connected by unstructured linkers [99,100]. The latter observation is in agreement with the observed flexibility of the C-terminal tail of ataxin-3 [72].
Furthermore, the ANTH domain itself is evolutionarily, structurally, and functionally related to a VHS domain [101]. Lately, the structure of the GAT (GGAs and Tom1) domain directly following the VHS domain of Tom1 and GGAs (Golgi-associated, γ-adaptin ear-containing, Arf-binding proteins) was determined crystallographically [102–105]. The GAT domain contains a three-helix bundle, which we found to superimpose very well with the helical bundle of the C-terminal ANTH domain extension (RMSD 3.1 Å, PDB identifiers 1o3x and 1hx8, A chains).
Interestingly, the GAT domains of GGAs and Tom1 have been reported to interact with ubiquitin [106–108]. The corresponding ubiquitin binding site was located to the third α-helix of the GAT three-helix bundle, and hydrophobic amino acids like leucines are important for the interaction (Fig. 5). The same residue type also plays an essential role in binding ubiquitin to the UIM α-helix [98–100] and the third α-helix of the helical bundle in the homologous CUE and UBA domains [109]. However, the sequence similarity is quite low, and thus it is difficult to deduce an evolutionary relationship, although the ubiquitin binding sequence of GGAs and Tom1 resembles a noncanonical UIM in which the otherwise strictly conserved serine residue is replaced by an asparagine, except in the case of human GGA3 (Fig. 5).
Further interaction partners of ataxin-3
It has been shown that ataxin-3 interacts with the ubiquitin-like (UBL) domain of the homologous ubiquitin- and proteasome-binding factors hHR23A and hHR23B, whose yeast orthologue is Rad23 [96,110–112]. The latter factors are also involved in the nucleotide excision repair pathway by targeting the ubiquitinated nucleotide excision repair factor XPC/Rad4 to the proteasome [113]. Their UBL domain binds to a UIM helix of the 26S proteasome subunit S5a, and this interaction disrupts the interdomain contacts between the N-terminal ubiquitin-mimicking UBL domain and the two C-terminal ubiquitin-binding UBA domains, thereby inducing the change from a closed to an open protein conformation [109,111,114,115]. Rad23 and the yeast orthologue Rpn10 of S5a serve as alternative ubiquitin receptors for the proteasome [116], and the UBA domains of Rad23 inhibit proteasome-catalysed proteolysis by sequestering Lys48-linked polyubiquitin chains [117,118]. In particular, the NMR solution structures of the UBL domain of hHR23A/B bound to a UIM peptide of S5a [99,119] could be used to model the complex of hHR23A/B and ataxin-3. Similarly, the complex of a UIM of ataxin-3 with ubiquitin could be modelled based on the NMR solution structure of the UIM of the Vps27 protein bound to ubiquitin [100].
Fig. 4. Protein architectures of human ataxin-3, its P. falciparum homologue PFL1295w (ataxin-3_Pf), and human josephin 1.
The C-terminal region of ataxin-3 including the polyQ region interacts with the N-terminal cofactor/substrate-binding adaptor domain of the valosin-containing protein VCP/p97/Cdc48/VAT/ter94 [96,120–123]. VCP is an important multifunctional AAA+ ATPase with two C-terminal ATPase domains after the adaptor domain, which provide the energy for major conformational changes [124]. VCP forms hexamers and works as a molecular chaperone involved in a variety of intracellular functions including cell cycle progression, membrane fusion, vesicle-mediated transport, transcription activation, apoptosis prevention, and ubiquitin-proteasome degradation, modulating polyglutamine-induced neurodegeneration [96,120–123,125–127]. VCP binds the ubiquitin E3 ligase and chain assembly factor UFD2a/E4B, which is a U box homologue of yeast Ufd2 [128], and interacts with and regulates the degradation of the proteasome-associated ataxin-3, forming a trimeric complex of ataxin-3, VCP, and UFD2a [96,127,129–131]. Interestingly, Ufd2 binds the UBL domain of Rad23 and competes with Rad23 for binding to the Rpn1 proteasome subunit, while the N-terminal UBL domain of the ubiquitin C-terminal hydrolase Ubp6 interacts with Rpn1 without competition with Rad23 [116,132].
Furthermore, VCP also binds the C-terminal UBX domain of the membrane fusion adaptor p47/SHP1/EYC/Ubx3 [85,86,133], which consists of three domains UBA-SEP-UBX [134]. The crystallographically determined complex of the N-terminal adaptor domain of VCP with this UBX domain (PDB identifier 1s3s) indicates the interacting residues [86] and could be used to model the putative complex of VCP with the C-terminal UBX domain of the ataxin-3 homologue from P. falciparum (ataxin-3_Pf). Like the UBX domain of p47, ataxin-3_Pf contains the conserved loop that is essential for an interaction with VCP because it inserts into a hydrophobic pocket of VCP [86]. The UBX domain structure of p47 is extended at its N-terminus by a disordered peptide structure and an additional α-helix of as yet unknown functional relevance [86]. The length of this α-helix is similar to a UIM α-helix (Fig. 5), and such a UIM also precedes the UBX domain of ataxin-3_Pf. Therefore, this α-helix of p47 might be related to the second UIM in ataxin-3 homologues (recall that the first UIM is missing in ataxin-3_Pf). In addition, the arrangement of one UIM helix followed by a C-terminal UBX domain is also found in the cofactor Ubx2 with domain architecture UBA-UAS-UIM-UBX [133]. The UIM of Ubx2 binds ubiquitin chains, and the UBX domain interacts with VCP. Thus the same interactions can be expected for ataxin-3_Pf.
Fig. 5. Multiple sequence alignment of UIM peptides, divided into groups by horizontal lines from top to bottom: UIM sequences of the Pfam seed alignment including first, second, and third UIMs of ataxin-3 homologues, UIM-like peptides from GGAs and Tom1, and related AP180 sequences. The latter are derived from the 3D structure superposition of the GAT domain of human GGA1 with the AP180 extensions from Rattus norvegicus and D. melanogaster (PDB identifiers 1hf8 and 1hx8, respectively). The second group of UIMs in ataxin-3 homologues also includes the similar N-terminal α-helix of the UBX domain extension of p47 (PDB identifier 1s3s). For each group, amino acids in alignment columns with a majority of identical residues are printed on a black background, and similar amino acids are highlighted in grey.
The C-terminal, presumably VCP-binding, UBX domain of ataxin-3_Pf appears to correspond to the VCP-binding C-terminal part of human ataxin-3, which follows the second UIM and includes the polyQ region [120,123,131]. In addition, the polyQ tract of ataxin-3 has been shown to be indispensable for the interaction with VCP, and its length correlates with the strength of the interaction. These observations raise the question of how human ataxin-3 binds VCP in contrast to its P. falciparum homologue. This is particularly interesting because VCP may suppress polyQ-induced neurodegeneration, and mutations in VCP have been observed to cause cytoplasmic vacuoles followed by cell death because of a dysfunctional second ATPase domain and inclusion body formation [120–123,127,135,136]. We also observed that all VCP sequence variations associated with Paget disease of bone and frontotemporal dementia (IBMPFD) [135] are not located in the binding interface of a UBX domain with the N-terminal adaptor domain of VCP, but are involved in interactions between protein regions (for details see the online supplement). Therefore, motions of the adaptor domain, which are essential for proper VCP function [124,127], may be impaired by IBMPFD-associated mutations.
According to a recent yeast two-hybrid screen [137], a josephin homologue from Drosophila melanogaster (CG3781) on the X chromosome interacts with the heat shock protein HSP60b (CG2830), which is involved in spermatogenesis [138,139], suppresses ubiquitination [140] and associates with 38 further proteins including a ubiquitin E3 ligase, but no other deubiquitinating enzyme except josephin (CG8184). Interestingly, HSP40 and HSP70 chaperones have already been observed to associate with VCP, and they also colocalize with intranuclear ataxin-3 aggregates and may play an important role in the disease process and the impairment of the ubiquitin-proteasome system [121,141–149].
Structural modelling of the Josephin domain
Recently, it has been observed that the Josephin domain contains highly conserved amino acids reminiscent of the catalytic residues of a deubiquitinating cysteine protease [69], and first experimental results support this function hypothesis [68]: decrease of polyubiquitination of 125I-labelled lysozyme by removal of ubiquitin, cleavage of the ubiquitin protease substrate ubiquitin-AMC, and binding of the specific ubiquitin protease inhibitor ubiquitin-aldehyde (Ubal). Mutating the catalytic cysteine in ataxin-3 inhibits these functions [68].
Previously, we modelled the 3D structure of ataxin-3 based on the ANTH domain [150] of the adaptin AP180 as structural template [73]. However, this prediction has to be revised with regard to the N-terminal Josephin domain because of the identified cysteine protease signature [69]. In contrast to our previous prediction [73], which relied on the secondary structure prediction from a single server, we now formed the consensus result of the three state-of-the-art secondary structure prediction servers PSIPRED, SAM-T99, and SSpro2. All three online servers basically returned the same secondary structure for human ataxin-3 and josephin 1, resulting in a much more reliable secondary structure prediction of β-strands besides α-helices. We propose that the increased accuracy of this prediction is due, at least in part, to a substantial growth of protein sequence and structure databases. The predicted β-strands in the Josephin domain corroborate a cysteine protease fold of deubiquitinating enzymes (DUBs) and do not support the ANTH domain structure consisting solely of α-helices. In hindsight, the fold recognition methods applied in the past to predict the structure of ataxin-3 may have been misguided by the pronounced prediction of α-helices only.
DUBs process ubiquitin proteolytically at the C-terminus and can be divided into at least two evolutionarily related families of cysteine proteases, UBPs (ubiquitin-specific proteases) and UCHs (ubiquitin C-terminal hydrolases) [151,152] However, new ubiquitin-specific families such as otubains (OTU) and JAMMs with low sequence similarity
to known DUBs are still being discovered [151] A consensus of fold recognition servers now selects both available UCH domain structures of human UCH-L3 [153] and yeast YUH1 [154], which superimpose with a low RMSD of 2.0 A˚ (PDB identifiers 1uch and 1cmxA, respectively), as best modelling templates with a moderate confidence score for human josephin 1, but still with only a weak score for ataxin-3 The pairwise sequence–structure alignments returned by the structure prediction servers for 3D modelling differ mainly in the central part of the Josephin domain (amino acid 47–117 in ataxin-3) aligned to DUBs This finding underpins the distant relationship of the Josephin domain to known DUBs The central part does not contain catalytic residues and is thus less conserved, containing insertions of variable length and structure in other cysteine proteases [155].
Based on a multiple sequence alignment of Josephin domain homologues (Fig. 6), we used the crystallographically determined structure of YUH1 bound to the ubiquitin-like inhibitor Ubal (PDB identifier 1cmx, chains A and B, respectively) to model the tertiary structure of the Josephin domain of ataxin-3 in complex with Ubal (Fig. 7). Thus, the structure of ataxin-3 is predicted to be distinct from the finger–palm–thumb architecture of UBPs such as USP7/HAUSP [156]. Because of the low degree of conservation in the central part, we believe that ataxin-3 and josephin 1 adopt slightly different structures in this part, which are not very similar to YUH1. In addition, we observed that the Josephin domain also resembles the OTU domain because both have a highly conserved histidine three residues downstream of the catalytic cysteine. Interestingly, like ataxin-3, the deubiquitinating OTU domain protein VCIP135 interacts with the N-terminal adaptor domain of VCP through the C-terminal tail including a UBL domain and dissociates p47 from the complex with VCP during ATP hydrolysis of VCP [157,158]. This observation also indicates a close functional relationship of the homologous ubiquitin-like UBL and UBX domains.
Functional analysis of the Josephin domain
The active site of UCHs is divided into two parts as follows (YUH1/ataxin-3 residue numbers) [153,154]: The N-terminal part consists of a glutamine (Q84/Q9) upstream of a cysteine (C90/C14), both of which form an oxyanion hole to accommodate the negative charge on the substrate carbonyl oxygen during catalysis. The C-terminal part contains a histidine (H166/H119), which is thought to be deprotonated, and an asparagine or aspartate (D181/N134), both of which activate the side chain of the cysteine to unleash a nucleophilic attack on the carbonyl carbon atom of the scissile peptide bond. The cysteine, histidine, and asparagine/aspartate constitute the catalytic triad characteristic of cysteine proteases such as papain.
Fig. 6. Structure-based multiple sequence alignment of the Josephin domains of ataxin-3 homologues with the crystallographically determined UCH domains of human UCH-L3 and yeast YUH1. The known DSSP secondary structure assignments of UCH-L3 and YUH1 are shown at the top of the alignment (curled lines for α-helices, arrows for β-strands). The corresponding consensus secondary structure predictions for human ataxin-3 and josephin 1 are also depicted. Alignment columns with identical residues are highlighted in purple-coloured boxes, those with more than 50% physico-chemically similar amino acids in yellow boxes (bold-printed letters). Text labels (including UCH-L3/YUH1 and ataxin-3/josephin 1 residue numbers) point to catalytic residues (four grey-shaded boxes) and to other highly conserved amino acids in the Josephin domain. The PDB/SPTrEMBL identifiers of UCH-L3 and YUH1 are 1uch/P15374 and 1cmxA/P35127, respectively. NCBI or Ensembl accession numbers for Josephin domain homologues are given in Table S3.
While all four discussed catalytic residues are strictly conserved in the Josephin domain (Fig. 6), a functionally relevant disordered loop (E144–N164/V79–Q100) positioned over the catalytic cleft is aligned in the less conserved central part. This loop maintains an inaccessible active site, but becomes ordered upon binding of Ubal [154].
Therefore, it may control substrate specificity together with
further strongly conserved amino acids such as N88/L13,
which forms hydrogen bonds with main chain groups of the
loop, and Y167/W120 next to the catalytic histidine [154].
Unfortunately, the structure of the central part and the
function of the loop remain unclear for the Josephin domain
because of insufficient sequence similarity to UCHs. The
Josephin domain is also missing the N-terminal extensions
of UCHs, which are involved in substrate recognition [154].
In addition, the functional relevance of a second strictly conserved histidine H17, two highly conserved asparagines N20 and N21, and another identical glutamine Q24 downstream of the catalytic cysteine C14 cannot be derived from the structural model of the Josephin domain either (Figs 6 and 7). However, considering their distance from the active site and their location inside the protein, they may be important solely for the stability of the domain fold. This may also hold true for the strictly conserved S135 and P140 after the catalytically active N134. In contrast, it is easy to interpret an alternative splice variant of ataxin-3 [10], which carries a deletion of the residues from E10 to Q64, including the catalytic cysteine, and thus cannot possess proteolytic activity.
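The reasoning about the splice variant can be made explicit with a minimal sketch that checks which of the residues named in the text fall inside the deleted range E10 to Q64. The residue numbers are the ataxin-3 positions given above; the dictionary names and the helper `residues_in_deletion` are hypothetical and introduced for illustration only.

```python
# Ataxin-3 residue numbers as given in the text.
CATALYTIC = {"Q9": 9, "C14": 14, "H119": 119, "N134": 134}
OTHER_CONSERVED = {
    "H17": 17, "N20": 20, "N21": 21, "Q24": 24, "S135": 135, "P140": 140,
}


def residues_in_deletion(residues, first, last):
    """Return labels of residues located in [first, last], ordered by position."""
    hits = [(pos, label) for label, pos in residues.items() if first <= pos <= last]
    return [label for pos, label in sorted(hits)]


# Deletion reported for the alternative splice variant: E10 to Q64.
DELETION = (10, 64)

lost_catalytic = residues_in_deletion(CATALYTIC, *DELETION)
lost_conserved = residues_in_deletion(OTHER_CONSERVED, *DELETION)
print("catalytic residues lost:", lost_catalytic)
print("other conserved residues lost:", lost_conserved)
```

Under these numbers, the catalytic cysteine C14 lies inside the deletion, whereas Q9, H119 and N134 lie outside it. Loss of the nucleophilic cysteine alone is consistent with the absence of proteolytic activity stated in the text.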