Three-dimensional structure of a thermostable native
Alice Grassick1,*, Patrick G. Murray1,*, Roisin Thompson2, Catherine M. Collins1, Lucy Byrnes3, Gabriel Birrane4, Timothy M. Higgins2 and Maria G. Tuohy1
1Molecular Glycobiotechnology Group, Department of Biochemistry, 2Department of Chemistry, and 3Department of Biochemistry, National University of Ireland, Galway, Ireland; 4Beth Israel Deaconess Medical Centre, Harvard Institutes of Medicine, Harvard Medical School, Boston, MA, USA
The X-ray structure of native cellobiohydrolase IB (CBH IB) from the filamentous fungus Talaromyces emersonii, PDB 1Q9H, was solved to 2.4 Å by molecular replacement. 1Q9H is a glycoprotein that consists of a large, single domain with dimensions of 60 Å × 40 Å × 50 Å and an overall β-sandwich structure, the characteristic fold of Family 7 glycosyl hydrolases (GH7). It is the first structure of a native glycoprotein and cellulase from this thermophilic eukaryote. The long cellulose-binding tunnel seen in GH7 Cel7A from Trichoderma reesei is conserved in 1Q9H, as are the catalytic residues. As a result of deletions and other changes in loop regions, the binding and catalytic properties of T. emersonii 1Q9H are different. The gene (cel7) encoding CBH IB was isolated from T. emersonii and expressed heterologously, with an N-terminal polyHis-tag, in Escherichia coli. The deduced amino acid sequence of cel7 is homologous to fungal cellobiohydrolases in GH7. The recombinant cellobiohydrolase was virtually inactive against methylumbelliferyl-cellobioside and chloronitrophenyl-lactoside, but partial activity could be restored after refolding of the urea-denatured enzyme. Profiles of cel7 expression in T. emersonii, investigated by Northern blot analysis, revealed that expression is regulated at the transcriptional level. Putative regulatory element consensus sequences for cellulase transcription factors have been identified in the upstream region of the cel7 genomic sequence.
Keywords: 3D structure; cel7 gene; GH Family 7 glycoprotein; Talaromyces emersonii; thermophilic.
Cellulose is the major constituent of all plant materials and is the most abundant organic molecule on Earth [1,2]. Microbial breakdown of cellulose creates the potential for the production of energy [3–5]. Cellulases are used in waste recycling processes and in the processing of cellulose-rich raw materials for the paper and textile industries [6]. Cellulose is composed of repeating glucose units, where each glucose unit is rotated 180° relative to its neighbours along the main axis, so that the basic repeating unit is cellobiose. Plant cellulose exists in a highly crystalline form. Hydrolysis of cellulose requires the co-operative action of three classes of cellulolytic enzymes, namely endo-β-1,4-glucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) and β-glucosidases (EC 3.2.1.21). The CAZy (carbohydrate active enzymes) [7] classification system collates glycosyl hydrolase (GH) enzymes into families according to sequence similarity, which has been shown to reflect shared structural features. To date, GH enzymes are members of 87 families, of which 43 have been assigned a retaining mechanism of action and 24 an inverting mechanism; the stereochemical mode of action of the remaining families has yet to be determined. The endoglucanases are commonly characterized by a groove or a cleft into which a linear cellulose chain can fit in a random manner. Classically, exoglucanases such as the cellobiohydrolases (CBHs) possess tunnel-like active sites, which can only accept a substrate chain via its terminal regions [8]. These exo-acting CBH enzymes act by threading the cellulose chain through the tunnel, where successive cellobiose units are removed in a sequential manner. Sequential hydrolysis of a cellulose chain is termed processivity [9]. However, some cellulase enzymes are capable of both endo- and exo-actions [10,11]. Moreover, some GH families include both endo- and exo-enzymes, indicating that the mode of action can be independent of sequence homology and structural fold. Relatively minor changes in the lengths of relevant loops in the general proximity of the active site in such enzymes may dictate the endo- or exo-mode of action without
Correspondence to M. G. Tuohy, Molecular Glycobiotechnology Group, Department of Biochemistry, National University of Ireland, Galway, Ireland. Fax: +353 91 512504, Tel.: +353 91 524411, E-mail: maria.tuohy@nuigalway.ie
Abbreviations: CBH, cellobiohydrolase; CNP, chloronitrophenyl; Cre, catabolite repressor element; GH, glycosyl hydrolase; reCBH, recombinant cellobiohydrolase.
Enzymes: endo-β-1,4-glucanase (EC 3.2.1.4); cellobiohydrolase (EC 3.2.1.91); β-glucosidase (EC 3.2.1.21).
*Both authors contributed equally to this work.
(Received 12 July 2004, revised 21 September 2004,
accepted 4 October 2004)
significant differences in the overall fold. In Trichoderma reesei Cel7A, deletion of the exo-loop (residues 243–256) has been shown to decrease activity against crystalline cellulose. It was therefore postulated that the exo-loop has evolved to facilitate processive hydrolysis of crystalline cellulose by T. reesei Cel7A [12]. Fungal cellulolytic enzymes reported to date comprise a single polypeptide chain, frequently glycosylated, which contains a catalytic domain usually connected to a cellulose-binding domain by a proline/serine/threonine-rich linker [13]. CBHs from Humicola grisea [14], Phanerochaete chrysosporium [15] and Aspergillus niger [16] have been shown to consist solely of a catalytic domain. The best-characterized CBH members of GH7 are Cel7A from T. reesei [8] and Cel7D (CBH58) from P. chrysosporium [17]. Both CBHs consist of two β-sheets that pack face-to-face to form a β-sandwich [18]. Cel7A from T. reesei has long loops, on one face of the sandwich, that form a cellulose-binding tunnel of 50 Å. The catalytic residues are glutamates 212 and 217, which are located on opposite sides of the active site, separated by a distance consistent with a double-displacement retaining mechanism [18]. Members of GH7 are thought to follow a retaining mechanism of action. Kinetic parameters and enzyme–ligand interactions of GH7 enzymes are well characterized [19–21]. Genes from this family have been cloned and characterized from a variety of fungal sources, including H. grisea [14], T. reesei [22,23], Penicillium janthinellum [24], P. chrysosporium [15] and Aspergillus species [16,25,26], but, until recently, never from a truly thermophilic fungal species [27].
The thermophilic aerobic fungus, Talaromyces emersonii, isolated from composting biomass, produces a completely thermostable cellulase system that has not been fully characterized to date [21,28–30]. CBH enzymes from T. emersonii have been purified, characterized and assigned to GH families 6 and 7 [21,27,31]. Protein thermostability is not, however, reflected in the overall fold of a protein and is thought to be the result of more localized differences, causing thermophilic enzymes to be somewhat less flexible than mesophilic enzymes [32]. In this article we present the 3D structure of the native CBH IB from T. emersonii, the first structure of any protein from this source and the first structure of a native fungal CBH core (glycoprotein). Molecular cloning, transcriptional regulation analysis and overexpression of the cel7 gene in Escherichia coli are also reported. The 3D structure has been deposited in the Protein Data Bank as 1Q9H.
Experimental procedures
Fungal strain and growth conditions
Mycelia harvested from cultures of T. emersonii strain CBS 814.70, grown at 45 °C on Sabouraud dextrose agar, were used to inoculate liquid nutrient media, as described previously [30]. Cultures were grown at 45 °C with shaking at 220 r.p.m. At appropriate time-points, mycelia were harvested by filtration through several layers of fine-grade muslin, washed with 75 mM sodium citrate, pH 7.5, and frozen immediately under liquid nitrogen for nucleic acid extraction.
PCR cloning of genomic DNA
Chromosomal DNA was isolated from T. emersonii mycelia harvested after 24 h of culture on 2% (w/v) glucose, by using the method of Raeder & Broda [33]. Amplification of a DNA fragment encoding a portion of the catalytic domain of T. emersonii cel7 was performed by using PCR and degenerate primers designed from alignments of existing CBH sequences in the databases. Reaction cocktails contained 2.5 U of Qiagen HotStar Taq DNA polymerase, 1× buffer (Qiagen, Crawley, West Sussex, UK), 0.5× Q solution, 200 µM of each deoxynucleotide triphosphate, 1.5 mM MgCl2 and 1 µM of the appropriate gene-specific primers. Reaction conditions for PCR amplification were 94 °C for 15 min (initial DNA polymerase activation), followed by 30 cycles of 94 °C for 1 min, 50–60 °C for 1 min and 72 °C for 1 min, and a final extension of 10 min. PCR products were separated by electrophoresis through a 1.2% (w/v) agarose gel, purified by using a Wizard PCR preps DNA purification system (Promega, Southampton, UK) and subcloned into the pGEM-T easy vector (Promega), following the manufacturer’s guidelines. Plasmids were purified from E. coli JM109 cultures by using a spin miniprep kit (Qiagen), and sequenced. Sequencing reactions were carried out by Altabioscience Laboratories (University of Birmingham, Birmingham, UK). Sequence analysis and database similarity searches were performed by using the online program BLAST [34] against protein (BLASTP) and nucleotide (BLASTX and BLASTN) sequences stored at the National Centre for Biotechnology Information (NCBI).
Rapid amplification of cDNA ends
RNA (10 µg), isolated after growth (48 h) of T. emersonii on solka floc (ball-milled cellulose), was used as a template for RACE, which involved a modification of the manufacturer’s (Ambion Europe Ltd., Huntingdon, Cambridgeshire, UK) RACE protocol described previously [27]. An aliquot (1 µL) of the reaction mixture was used as a template for performing 5′- and 3′-RACE PCRs by using the outer and inner RACE primers supplied by the manufacturer and the outer and inner gene-specific primers designed from the cel7 PCR products. The cel7 outer and inner RACE primers were as follows: outer 5′-RACE primer [...]; inner 5′-RACE primer 5′-GTTTGCTTCCCAGACATCCATC-3′; outer 3′-RACE primer 5′-ATGCTGTGGTTGGATTCCGACTAC-3′; and inner 3′-RACE primer 5′-AACTCCTACGTGACCTACTCGAAC-3′. PCR products were cloned and sequenced as described above.
Isolation of cel7 cDNA and genomic genes
Full-length genomic and cDNA sequences corresponding to cel7 were amplified from T. emersonii genomic DNA and first-strand cDNA, respectively, by PCR with primers corresponding to the 5′ start and 3′ stop sequences identified in the 5′- and 3′-RACE products. The cel7 sense and antisense primers were 5′-ATGCTTCGACGGGCTCTTCTTCTA-3′ and 5′-TCACGAAGCGGTGAAGGTCGAGTT-3′, respectively. Reactions contained 1.25 U of Pfu DNA polymerase, 1× Pfu reaction buffer, 200 µM of each deoxynucleotide triphosphate and 1 µM of the appropriate gene-specific primers. PCR products were gel purified, subcloned and sequenced as described above.
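The full-length primers quoted above lend themselves to a quick sanity check. The following is a minimal sketch, not part of the published protocol: it assumes that the spaces inside the printed primer sequences are line-wrap artefacts, and the helper names (reverse_complement, gc_fraction, wallace_tm) are hypothetical. The melting temperature uses the simple 2(A+T) + 4(G+C) rule of thumb, which is only a crude estimate for primers of this length.

```python
# Illustrative primer checks; helper names are hypothetical, not from the paper.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature, 2*(A+T) + 4*(G+C), in degrees C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# cel7 sense and antisense primers as given in the text (spaces removed).
SENSE = "ATGCTTCGACGGGCTCTTCTTCTA"
ANTISENSE = "TCACGAAGCGGTGAAGGTCGAGTT"

for name, primer in (("sense", SENSE), ("antisense", ANTISENSE)):
    print(name, len(primer), round(gc_fraction(primer), 2), wallace_tm(primer))

# The sense primer begins at the ATG start codon; the antisense primer,
# read on the coding strand, ends at a TGA stop codon.
print(reverse_complement(ANTISENSE))
```

Read this way, the sense primer starts with the ATG start codon and the antisense primer covers a TGA stop codon, consistent with primers designed on the 5′ start and 3′ stop sequences.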
Northern blot analysis and genomic library screening
Northern blot analysis of cel7 expression was carried out as described previously [27]. A T. emersonii Sau3A genomic library was prepared in LambdaGEM-11 (Promega). E. coli KW251 was used as the host strain in the preparation and screening of the genomic library. Plaque lifts were carried out as described by Sambrook et al. [35]. Hybridization was conducted overnight at 68 °C in 5× NaCl/Cit, 0.1% (w/v) N-lauroylsarcosine, 0.02% (w/v) SDS and 1% (w/v) blocking reagent. Full-length digoxigenin (Roche Molecular Biochemicals, Roche Diagnostics Ltd., Lewes, East Sussex, UK)-labelled cel7 (20 ng·mL⁻¹ of hybridization buffer) was used as a probe. Detection was performed according to the manufacturer’s instructions. The presence of the full-length gene in positively hybridizing single plaque-forming units was confirmed by PCR, and the plaques were purified by using a lambda purification kit (Qiagen). The plaques were then sequenced directly, using a cel7 gene-specific sequencing primer (5′-GCATTCCTGCCATGTCAG-3′) to generate sequence data for the 5′ region upstream of the ATG start codon.
Expression of cel7 in E. coli
Primers F1 (5′-CACCCAGCAGGCCGGCACGGCG-3′) and R1 (5′-TCACGAAGCGGTGAAGGTCGAGTT-3′), corresponding to the N- and C-terminal regions of the mature protein, were used to amplify cel7 cDNA (the N-terminal signal peptide, i.e. amino acids 1–18, was removed). The CACC at the 5′ end of primer F1 corresponds to the GTGG overhang in the TOPO cloning vector (Invitrogen Ltd., Paisley, UK). The purified PCR product was ligated into the pENTR/SD/D-TOPO vector and transformed into One Shot Top10 E. coli competent cells, according to the manufacturer’s instructions. An LR recombination reaction between the entry clone, pE-Cel7, and the destination vector, pDEST-17 (Invitrogen), was transformed into E. coli DH5α library-efficient cells, thereby generating the expression clone, pD-Cel7, with an N-terminal poly-histidine tag. Multiple transformants were analysed by restriction analysis and PCR to confirm the presence and correct orientation of the insert at all stages. For expression, plasmid DNA was purified and transformed into BL21-AI competent E. coli cells (Invitrogen), which were cultured to mid-log phase, and expression was induced by the addition of 0.2% (w/v) arabinose followed by a further growth period of 4 h at 37 °C. Pilot experiments indicated that the CBH protein was expressed in the inclusion body fraction. Cells were harvested by centrifugation (3630 g for 5 min) from a 50 mL culture, and the cell pellet was resuspended in 8 M urea. The cell lysate was sonicated with three 5-s high-intensity pulses and centrifuged at 1307 g for 15 min to pellet cellular debris, and the supernatant was applied to a nickel-nitrilotriacetic acid purification matrix (Invitrogen). The lysate was allowed to interact with the matrix at room temperature for 30 min with gentle agitation. The matrix was then washed with 2 volumes of wash solution (containing 8 M urea, 20 mM sodium phosphate, 500 mM NaCl, pH 7.8), followed by 2 volumes of a second wash solution (containing 8 M urea, 20 mM sodium phosphate, 500 mM NaCl, pH 6.0). The column was then washed with 4 volumes of a final wash solution of 50 mM sodium phosphate and 20 mM imidazole, pH 8.0. Recombinant CBH (reCBH) was eluted from the column by application of a solution of 50 mM sodium phosphate, pH 8.0, containing 250 mM imidazole.
Denaturation and refolding of reCBH and enzyme assay
reCBH (1 mg) was denatured by incubation in a solution of 8 M urea/0.1 M Tris/HCl, pH 8.0, in the presence of 100 mM dithiothreitol and 1 mM EDTA, for 2 h at 20 °C. The pH was lowered to pH 4.0 by dropwise addition of 1 M HCl, and the dithiothreitol was removed by dialysis against the same buffer without the dithiothreitol. Denatured and reduced reCBH was diluted 1 : 100 in a buffer solution containing 0.1 M Tris/HCl, pH 8.5/1 mM EDTA/0.3 mM oxidized glutathione/3 mM glutathione, and then incubated in a renaturation buffer containing 2.5 mg of protein disulphide isomerase at 30 °C for 30 h. reCBH was dialysed against 100 mM sodium acetate, pH 5.0, and then concentrated in a Millipore microconcentrator fitted with a 10 kDa cut-off membrane. reCBH activity was measured by incubating 10 µL of renatured enzyme with 100 µL of 1 mM chloronitrophenyl-lactate (CNP-lactate), 1 mM 4-nitrophenyl-cellobioside or 50 µM 4-methylumbelliferyl-cellobioside, at 50 °C. Reactions were terminated by the addition of 100 µL of 1 M Na2CO3 or 0.2 M glycine/sodium hydroxide, pH 10.5, and the absorbance (405 nm) or UV fluorescence was measured.
Purification of CBH IB
CBH IB was purified from 2% (w/v) solka floc cellulose-induced cultures and characterized as described previously [21]. The purified enzyme was concentrated to 20 mg·mL⁻¹ in 20 mM Tris buffer, pH 7.5, and stored at 4 °C. Peptide sequence information for native CBH IB was determined by Edman degradation on an automated sequenator (J. Gray, University of Newcastle-upon-Tyne, Newcastle-upon-Tyne, UK).
Crystallization and data collection
Native CBH IB from T. emersonii was crystallized by using the hanging-drop vapour-diffusion method with ammonium phosphate (dibasic) as a precipitant at pH 8.5. Crystals of CBH IB, which diffracted to 2.4 Å, were obtained. Data were collected at room temperature on the multipolar wiggler beamline, BW7B, at the DORIS storage ring, EMBL Hamburg Outstation, Germany, using a Mar345 area detector. Data processing indicated that CBH IB crystallized in the tetragonal space group P41212, with unit cell dimensions a = b = 74.42 Å, c = 176.92 Å [31].
Structure solution
The structure was solved by molecular replacement by utilizing the program [36]. Molecular replacement was completed using two separate search models, chosen based on sequence homology. The models used were the catalytic domain of T. reesei Cel7A (PDB 1CEL) and the catalytic domain of P. chrysosporium Cel7D (PDB 1GPI).
Structure refinement
A total of 5% of the reflections in the data set was set aside for free R-factor calculations during refinement. REFMAC5 [37] from the CCP4 [38] suite of programs was used throughout this refinement, with the program TURBO [36] being employed for graphical displays and manipulation of the models. With each round of refinement, maps were produced and the model was rebuilt where electron density supported the changes. Water molecules were located and refined by using the program ARP_WARP [39]. The stereochemical quality of the model was followed by using the program PROCHECK [40].
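For readers less familiar with the free R-factor, the quantity monitored during refinement can be sketched as follows. This is a minimal illustration of the conventional definition, R = Σ||Fo| − |Fc|| / Σ|Fo|, evaluated separately on the working reflections and on the reflections held back from refinement. It is not the REFMAC5 implementation: the scale factor between observed and calculated amplitudes is assumed to be 1, the function names are hypothetical, the amplitudes are made-up numbers, and the free set here is chosen as every 20th reflection for reproducibility, whereas refinement programs normally select it at random.

```python
# Sketch of the conventional crystallographic R-factor (assumed scale factor 1).
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over paired structure-factor amplitudes."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, every=20):
    """Hold back one reflection in `every` (5% when every=20) as the free set."""
    free = [i for i in range(n_reflections) if i % every == 0]
    work = [i for i in range(n_reflections) if i % every != 0]
    return work, free


# Toy amplitudes, purely illustrative.
f_obs = [100.0 + i for i in range(40)]
f_calc = [f * 0.9 if i % 2 else f * 1.1 for i, f in enumerate(f_obs)]

work, free = split_free_set(len(f_obs))
r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
print(len(work), len(free), round(r_work, 3), round(r_free, 3))
```

Because the free reflections never enter the refinement target, a free R-factor that falls together with the working R-factor indicates that model changes are supported by the data rather than fitted to noise.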
Results
Isolation of genomic and cDNA clones
The cel7 degenerate primers amplified a 719 bp PCR product from T. emersonii chromosomal DNA. The product was cloned, sequenced and found to exhibit homology to other fungal cel7 gene sequences. Based on this sequence, 5′- and 3′ outer and inner RACE PCR primers were designed to amplify the 5′- and 3′ ends of the cel7 gene. Sequence analysis confirmed the RACE products to be part of the cel7 gene, which included a 54 bp 5′ untranslated region and a 281 bp 3′ untranslated region, including a polyA tail. The full-length genomic (GenBank AF439935) and cDNA (GenBank AY081766) cel7 clones were amplified from chromosomal DNA and first-strand cDNA, respectively, by using N- and C-terminal gene-specific primers based on the RACE products. Cel7 was encoded by a 1365 bp open reading frame encoding 455 amino acids and interrupted by two introns (52 and 61 bp), with consensus 5′- and 3′ intron splice sites (Fig. 1).
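The figures quoted for the open reading frame can be cross-checked with simple arithmetic. The sketch below is illustrative only: it takes the 1365 bp, 52 bp and 61 bp values and the 18-residue signal peptide from the text, assumes that the 1365 bp figure counts the 455 sense codons without the stop codon (the text does not say either way), and uses a made-up toy sequence and a hypothetical splice helper to show how intron removal yields the coding sequence.

```python
# Values taken from the text; the stop codon is assumed not to be counted.
ORF_BP = 1365
INTRONS_BP = (52, 61)
SIGNAL_PEPTIDE_AA = 18

codons, remainder = divmod(ORF_BP, 3)       # 455 codons, no leftover bases
genomic_span_bp = ORF_BP + sum(INTRONS_BP)  # coding region plus both introns
mature_aa = codons - SIGNAL_PEPTIDE_AA      # residues left after signal peptide removal


def splice(genomic, introns):
    """Remove introns given as (start, end) 0-based, half-open intervals."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        exons.append(genomic[cursor:start])
        cursor = end
    exons.append(genomic[cursor:])
    return "".join(exons)


# Toy gene (hypothetical): two short exons around a GT...AG intron.
toy_gene = "ATGGCC" + "GTAAGTTTCAG" + "AAATGA"

print(codons, remainder, genomic_span_bp, mature_aa)
print(splice(toy_gene, [(6, 17)]))
```

Under these assumptions the mature polypeptide has 437 residues, which agrees with the residue count given for CBH IB in the structure refinement results.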
Sequence analysis
Peptides sequenced from native CBH IB confirmed the identity of the T. emersonii cel7 gene/gene product; the location of these peptides in the deduced polypeptide sequence is given in the legend to Fig. 1. Comparison of the deduced cel7 amino acid sequence from T. emersonii with those from P. chrysosporium (GenBank AAA19802), T. reesei (GenBank CAA49596), A. niger (GenBank AAF04491), H. grisea (GenBank AAD11942) and A. aculeatus (GenBank BAA25183) gave sequence identity values of 65%, 64%, 73%, 51% and 68%, respectively (Fig. 2) [41]. Alignment of the deduced polypeptide sequence of T. emersonii cel7 reveals the presence of a terminal catalytic domain. Other cel7 (cbhI) gene products possessing a catalytic domain exclusively have been identified and include those from H. grisea [42] and A. niger [16]. Cel7 genes from T. reesei [43], P. chrysosporium [44] and A. aculeatus [23], however, encode a modular structure composed of a C-terminal carbohydrate-binding module linked via a proline/serine/threonine-rich linker to the catalytic domain [45].
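Sequence identity values such as those quoted above depend on how aligned positions are counted. As a hedged illustration, the sketch below computes percentage identity over the columns of a pairwise alignment in which both sequences carry a residue; columns containing a gap are ignored. This is one common convention and not necessarily the one used by the alignment software behind Fig. 2, and the function name and the short sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment: five comparable columns, four of them identical.
print(percent_identity("ACDE-F", "ACDQGF"))  # → 80.0
```

Dividing instead by the full alignment length or by the length of the shorter sequence would give somewhat different percentages for the same alignment, so identity values are only directly comparable when the convention is the same.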
There are two predicted N-glycosylation sites in the catalytic domain of 1Q9H (Fig. 3), i.e. sites matching the Asn-X-Ser/Thr consensus sequence (where X is any amino acid except proline), at Asn267 and Asn431. There are 18 residues corresponding to the signal peptide at the N-terminus of the translated protein product. Alignment of the existing fungal CBH sequences revealed that 1Q9H from T. emersonii comprises features found in both Cel7D of P. chrysosporium and Cel7A of T. reesei.
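The Asn-X-Ser/Thr rule described above is straightforward to express as a pattern search. The following minimal sketch scans a protein sequence for the sequon and reports 1-based asparagine positions; the function name and the short test sequence are hypothetical and are not taken from the cel7 sequence. A lookahead is used so that overlapping sequons are not missed.

```python
import re

# N followed by any residue except P, then S or T; the lookahead keeps
# matches zero-width after the N so overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr sequons (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Toy sequence: NGT and NQS are sequons, NPS is excluded because X is proline.
print(find_sequons("MNGTANPSKNQS"))  # → [2, 10]
```

A sequon match only predicts a potential site; in the structure itself, occupancy is judged from the electron density at each asparagine.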
Analysis of the T. emersonii cel7 upstream region
Initial screening of 6000 λ phage clones from the T. emersonii Sau3A genomic library identified two positively hybridizing clones. Sequence analysis of the 5′ region upstream from the start codon of the purified cel7 clones revealed putative TATA-like and CCAAT box sequences located upstream of the start codon at bp −99, −132, −340, −1040, −1242, −1348, −1476 and −1694. In filamentous fungi [46], and in higher eukaryotes [47], the CCAAT sequence is known as an upstream activating sequence. The binding sites for putative cellulase transcription factors [activator of cellulase expression I (ACEI) and ACEII] [48,49] are located upstream of the start codon at bp −562, −844, −853 and −1175, while putative binding sites for the catabolite repressor element (Cre) [50,51] are located upstream of the start codon at bp −239, −265, −320, −359, −460, −977, −1404 and −1523.
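Scans of this kind amount to matching a degenerate consensus against the upstream sequence. The sketch below shows one way to do this for the catabolite repressor consensus SYRGG (S = C or G, Y = C or T, R = A or G in IUPAC notation) and to report hits as negative positions relative to the start codon. It is an assumption-laden illustration rather than the procedure used in this study: both strands are searched, the position convention (−1 for the base immediately 5′ of the ATG, hits reported at their 5′-most base on the given strand) is a choice made here, and the function names and test sequences are hypothetical.

```python
import re

# IUPAC codes needed for the SYRGG consensus and its reverse complement.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "S": "[CG]", "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "S": "S", "Y": "R", "R": "Y", "N": "N"}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[code] for code in consensus)


def reverse_complement(consensus):
    """Reverse complement of an IUPAC consensus (SYRGG becomes CCYRS)."""
    return "".join(COMPLEMENT[code] for code in reversed(consensus))


def find_sites(upstream, consensus="SYRGG"):
    """Return (position, strand) hits; position -1 is the base just 5' of the ATG."""
    upstream = upstream.upper()
    hits = []
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        # Lookahead wrapper so that overlapping sites are all found.
        pattern = re.compile("(?=(" + consensus_to_regex(motif) + "))")
        for match in pattern.finditer(upstream):
            hits.append((match.start() - len(upstream), strand))
    return sorted(hits)


# Toy upstream fragments (hypothetical), each 5'->3' and ending just before the ATG.
print(find_sites("TTGCGGGAAAA"))  # → [(-9, '+')]
print(find_sites("AACCTGCAA"))    # → [(-7, '-')]
```

A five-base degenerate motif occurs frequently by chance, so hits from such a scan are candidate sites only, which is why the sites listed above are described as putative.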
Northern blot analysis of T. emersonii cel7 expression
Solka floc cellulose, lactose and beechwood xylan induce high levels of cel7 expression in T. emersonii (Fig. 4). Similar cellulase expression with complex cellulose has been documented in P. chrysosporium [52] and T. reesei [53]. Methyl xylose and gentiobiose, a β-1,6-linked glucose disaccharide, induce low levels of cel7 expression, relative to solka floc, in T. emersonii. Gentiobiose has been shown to induce other cellulases in T. emersonii [27]. Other researchers have reported that the CBH A and B genes and endoglucanase genes from A. niger are also induced by D-xylose [16]. Sophorose, a β-1,2-linked disaccharide of glucose, has previously been shown to be a poor inducer of cellulase activity in T. emersonii [54], and it has been postulated that sophorose could be the natural inducer of cellulase expression in T. reesei. Cellobiose is a poor inducer of the T. emersonii cellulases and did not induce detectable levels of cel7 in this study. Glucose-induced cultures displayed no detectable levels of cel7. Indeed, the addition of 2% (w/v) glucose for 2 h to T. emersonii mycelia, previously cultured on solka floc for 48 h, resulted in the abolition of the cel7 signal. The regulatory proteins, CreA [55] and Cre1 [51], similar to Mig1 in Saccharomyces cerevisiae [56], mediate glucose repression in Aspergillus and Trichoderma species. The 5′ upstream region of T. emersonii cel7 has eight potential catabolite repressor-binding sites (SYRGG). The sequence of a gene encoding CreA from T. emersonii has recently been submitted to the GenBank database (AF440004).
Expression of cel7 in E. coli
A recombinant protein, of 57 000 relative molecular mass, was expressed in E. coli BL21-AI. Under the conditions tested, reCBH was present in the insoluble inclusion fraction (Fig. 5A). The protein was purified under hybrid conditions (denaturing/renaturing) on a Ni-nitrilotriacetic acid column. reCBH was inactive against CNP-lactate. Denaturation of reCBH, followed by refolding, with concomitant disulphide bond formation, in the presence of protein disulphide isomerase in renaturation buffer, successfully restored partial biological activity of reCBH against CNP-lactate and methylumbelliferyl-cellobioside (Fig. 5B).
Structure solution and refinement of CBH IB
Molecular replacement was performed by using the CCP4 (1994) programs contained in the Automated package for Molecular Replacement (AMORE) [33]. Both 1GPI, which was solved at 1.32 Å resolution, and 1CEL, which was solved at 1.8 Å resolution, were used as search models. Rotational and translational searches were performed at different resolutions in the range of 69 Å to 2.4 Å. Rigid body refinement was carried out after each translation function to refine the position of the potential solution. Euler angles, fractional coordinates, correlation coefficients and R factors for the best molecular replacement solution for each model, using P41212 as the space group of the 1Q9H crystal, were found. The best solutions had correlation coefficients of 56.1% and 55.3%, and R-factors of 40.0% and 40.9% for 1CEL and 1GPI, respectively. Refinement of the models produced by AMORE was performed by using the CCP4 program, REFMAC5 [34]. REFMAC5 was used to carry out restrained refinement on X-ray data by using the maximum likelihood method. The Roverall and Rfree values from the first round of REFMAC5 cycles on the 1GPI model were 28.6% and 33.6%, respectively, while those for the 1CEL model were 28.9%
Fig. 1. Nucleotide, deduced amino acid sequence and relevant features of the Talaromyces emersonii cel7 gene. The stop codon is denoted by an asterisk. The N-glycosylation sites are underlined. Catalytic residues are boxed. Cysteine residues involved in the formation of disulphide bridges are in bold (19–25, 50–71, 61–67, 135–401, 169–207, 173–206, 227–253, 235–240 and 258–334 bp). Four tryptophan residues involved in the glucosyl-binding platform (W38, W40, W371, W380) in the active-site tunnel are in bold italics and underlined. Four peptides sequenced from native CBH IB had complete identity with Y124-D129, Y267-D272, I295-P300 and F445-S455 in the deduced protein sequence.
and 35.5%, respectively. The 1GPI model was used for further analysis. The graphics program TURBO [36] was used to examine both models and the maps produced by REFMAC5. 2Fo-Fc and Fo-Fc maps were used and analysed with contour levels set to 1.0. The amino acid sequence of the 1GPI model was changed to that of 1Q9H, and the model was rebuilt where changes were supported by the electron density. Changes in R factors were used as a guide to improvements to the overall structure. Further rounds of model mutation and rebuilding resulted in a model with an R-factor of 16.1% and an R-free of 22.9% (Table 1). Electron density maps showed almost continuous density for the backbone of CBH IB. The final model, 1Q9H, included 430 of the 437 amino acid residues of CBH IB. The final two amino acids and the loop region (from amino acids 193–197) were not visible in electron density maps. In addition, no side-chain density was apparent for four residues, which were subsequently modelled as alanine. All of these residues are located on the surface of the protein and are presumed to be disordered. Three N-acetylglucosamine and 175 water molecules were located within the model. Average isotropic temperature factors (B factors) for the 1Q9H structure were calculated by using the CCP4 program. Average isotropic temperature factors
Fig. 2. Multiple sequence alignment of cellobiohydrolase IB (Tal.em; CBH IB) with glycosyl hydrolases from Humicola grisea (Hgrisea; GenBank AAD11942), Aspergillus niger (Aspniger; GenBank AAF04491), Phanerochaete chrysosporium (Phcry; GenBank AAA19802), and Trichoderma reesei (T.reesei; GenBank CAA49596). Residues in white against a black background are amino acids that are identical or have a conserved substitution in all five sequences. Residues in white against a grey background are amino acids that are identical or conserved in four out of the five sequences.
for the main chain were 20.91 Å2, and root-mean-square deviations (rmsd) from ideal bond lengths and angles were 0.009 Å and 1.281°, respectively. PROCHECK was used to verify the stereochemical quality of the model. The Ramachandran plot showed that 86% of residues lie in the most favoured regions and 13.8% lie in the allowed region, while Ser311 was the only nonglycine residue in the generously allowed region and there were no residues in the disallowed regions. Peptide bond planarity for the main chain was found to be 7.0, nonbonded interactions were 0.6 per 100 residues, α-carbon tetrahedral distortion was 1.8, the standard deviation of the hydrogen bond energies was 0.7 and the overall G-factor, a measure of the normality of the structure, was 0.0.
Overall structure of T. emersonii 1Q9H
1Q9H is a large single-domain protein with overall dimensions of 60 Å × 40 Å × 50 Å (Fig. 6). About one-third of this domain is arranged in two large antiparallel β-sheets, which are stacked face-to-face and are highly curved, forming convex and concave surfaces. The convex and concave sheets of the β-sandwich are composed of seven β-strands. Many of the side-chains in the β-sheets are hydrophobic, and interactions between these residues appear to hold the β-sandwich in position. With the exception of four α-helices and two pairs of short β-strands, the rest of the protein consists almost entirely of loops connecting the β-strands. The loops extending from the
Fig. 4. Northern blot analysis of Talaromyces emersonii cel7 expression with various carbon sources at 2% (w/v). Glucose at 24 h and 48 h (lanes 1 and 2); methyl glucose at 48 h (lane 3); sorbitol at 48 h (lane 4); galactose at 48 h (lane 5); galactitol at 48 h (lane 6); methyl xylose at 36 h (lane 7); glycerol at 48 h (lane 8); gentiobiose at 48 h (lane 9); cellobiose at 48 h (lane 10); and beech wood xylan at 48 h (lane 11). Time course transcription, 24 h, 48 h and 96 h (lanes 12, 13 and 14), of cel7 after transfer to Solka floc (ball-milled cellulose). Addition of 2% (w/v) glucose to 48 h cultures of T. emersonii cultured on Solka floc, with RNA isolated after a further 2 h (lane 15). Time course transcription, 24 h and 48 h (lanes 16 and 17), of cel7 after transfer to lactose. The bottom panel is the 18S ribosomal RNA loading control.
Fig. 3. Electron maps at the two N-glycosylation sites. (A) Asn267 with two GlcNAc (2-amino-2-N-acetylamino-D-glucose) residues. (B) Asn431 with one GlcNAc residue.
Fig. 5. SDS/PAGE and activity analysis of recombinant cellobiohydrolase. (A) SDS/PAGE [10% (w/v) gel] analysis of purified recombinant cellobiohydrolase (reCBH). Lane 1, molecular markers; lane 2, uninduced cells; lane 3, induced cells (4 h); and lane 4, purified His6-reCBH. (B) reCBH activity against methylumbelliferyl-cellobioside after 0, 1 and 24 h. Substrate controls, lane 1; enzyme reactions, lane 2.
β-sandwich form a tunnel, which runs the length of the concave sheet, into which the cellulose substrate can be accommodated. The β-sandwich represents the characteristic fold of GH7 and is also the fold of the legume-lectin family and of GH16 [57].
The loops extending from the β-sandwich are stabilized by the presence of nine disulphide bonds, which are located between residues 19–25, 50–71, 61–67, 135–401, 169–207, 173–206, 227–253, 235–240 and 258–334. The N-terminal glutamine residue is present as the modified pyroglutamate group, as observed in other GH structures [17,58]. Electron density corresponding to N-glycosylation is visible at two asparagine residues, namely Asn267 and Asn431 (Fig. 3). It was possible to position two N-acetylglucosamine residues, linked via a β-1,4 bond, at Asn267. A single N-linked N-acetylglucosamine was seen in the model at position Asn431.
Structure of T. emersonii 1Q9H, in comparison with P. chrysosporium 1GPI and T. reesei 1CEL
A BLAST search of the Protein Data Bank (PDB) revealed that the protein structures with the highest sequence homology to 1Q9H were structures 1GPI and 1CEL, which are the catalytic domains of CBH Cel7D from P. chrysosporium [17] and CBH Cel7A [8] from T. reesei, respectively.
Cel7D from P. chrysosporium has a sequence identity of 67% with 1Q9H, while Cel7A from T. reesei has an identity of 66%.
While the levels of sequence homology between 1Q9H and each of 1CEL and 1GPI are similar, the areas of shared homology differ. Superimposing the C-alpha traces of 1GPI and 1CEL on 1Q9H gave rmsd values of 0.71 Å and 0.67 Å, respectively (Fig. 7).
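The rmsd values quoted for the superimposed C-alpha traces follow the usual definition: the square root of the mean squared distance between equivalent atoms. The sketch below computes this for two lists of paired coordinates. It assumes that the structures have already been optimally superimposed and that equivalent C-alpha atoms have been paired, for example by a structural alignment; it does not perform the superposition itself. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates, in input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy C-alpha positions in angstroms: the second set is shifted 1 angstrom along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x, y, z + 1.0) for x, y, z in model]
print(rmsd(model, shifted))  # → 1.0
```

Values well below 1 Å over the paired C-alpha atoms, as reported here, indicate that the backbone folds are very similar and that the differences lie mainly in the loops discussed in the following sections.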
Substrate-binding subsites
The X-ray structure of the T. reesei CBH, with eight glucose residues bound (PDB 7CEL), identifies some 20 residues involved in enzyme–substrate interactions. Superposition of this structure on 1Q9H shows that all but two of these residues are conserved and suitably positioned for interactions with the substrate. Four tryptophan residues form a glucosyl-binding platform in sites −7, −4, −2 and +1 in the tunnel of 1CEL; equivalent tryptophan residues are found in 1Q9H at positions 38, 40, 371 and 380. A tyrosine residue (Tyr47) present in the T. emersonii CBH IB sequence, and seen in 1GPI but not in 1CEL, is located at the entrance of the tunnel; Munoz et al. suggest that it may constitute an additional binding subsite [17]. Three arginine residues in the product sites of 1CEL (+1, +2 and +3) are proposed to assist in the binding and positioning of the substrate and to play a role in the recognition of the reducing end of the cellulose chain. Arginine side-chains are present in all equivalent locations in 1Q9H (Fig. 7).
Tunnel-forming loops
There are four major loops involved in the cellulose-binding tunnel in 1CEL. It is postulated that Asn197 and Asn198
Fig. 6. Stereoview of Talaromyces emersonii 1Q9H. The substrate is superimposed onto 1Q9H. The figure was drawn by using TURBO [36].
Table 1. Final statistics for the structure of Talaromyces emersonii 1Q9H. Values in parentheses refer to the last resolution shell.
Unit cell dimensions
Resolution range (Å)            20–2.40
Completeness (%)                94.3 (94.3)
Mean I > 2σ(I) (%)              78.8 (56.8)
No. of water molecules          175
No. of sugar molecules          3
rms bond lengths (Å)            0.009
Average B main chain (Å²)       20.91
Average B water (Å²)            31.28
make van der Waals interactions with Tyr370 and Tyr371, on
the opposite loop, thus enabling it to form a fully enclosing
tunnel [8]. While sequence analysis shows that 1Q9H
possesses the equivalent Asn residues (Asn193 and
Asn194), one of the tyrosine residues on the opposite loop
is replaced by an alanine (Ala374), forming a more open
tunnel; however, electron density in this area of 1Q9H is
poor. In 1GPI, neither asparagine residue is present and a
histidine and an alanine residue are found in the equivalent
tyrosine positions. The tunnel-forming loop (amino acids
240–248) in 1GPI is significantly shorter than in 1CEL and
1Q9H, owing to a six amino acid deletion, resulting in a more
exposed catalytic site for 1GPI. In 1CEL, three amino acids
form a tight turn over site −6, with Gln101 hydrogen bonding
to the glycosyl residue in site −5, thus forming the lid of the
binding site. The structures 1Q9H and 1GPI have a deletion
of these three residues, thus leading to a more open
substrate-binding site.
Catalytic binding site
Brooks et al. showed, by NMR, that the CBHs I from
T. emersonii have a retaining mechanism of action [20]. This
type of mechanism, as shown by Davies & Henrissat [9], involves a proton donor and a base separated by 5.5 Å [59–61]. Henrissat [62] classified all members of GH7, which catalyse the hydrolysis of the β-1,4-glycosidic bond of cellulose, as retaining enzymes, i.e. they retain the configuration of the anomeric carbon. Glu212 and Glu217 have been identified as the proton donor and acceptor, respectively, in 1CEL. Sequence analysis of 1Q9H and 1GPI shows that these residues are conserved, suggesting that they carry out the same function. Based on the proposed mechanism of action from 1CEL, Glu209 of 1Q9H may act as the nucleophile, while the proton donor is likely to be Glu214. The proposed catalytic residues are separated by 5.57 Å. The Asp211 residue of 1Q9H is in a position to share a proton with the nucleophile, in a short hydrogen bond (O–O distance 2.51 Å; Fig. 8). The residue Glu214 forms a weak hydrogen bond to Asn138. A platform of hydrophobic residues has recently been identified as being mechanistically relevant as a transition-state stabilizing factor in GH family members [63]. A tyrosine residue (Tyr142), present near the −1 subsite in 1Q9H, is thought to be involved in this platform.
Discussion
This article presents the first report on the purification and 3D structural determination of a native core CBH protein, and on the cloning and over-expression of the corresponding gene, from a thermophilic fungal source. CBH IB is extremely thermostable, with a temperature optimum of
68 °C at pH 5.0 and a half-life (t½) of 68.0 min at 80 °C and
pH 5.0. In comparison, Cel7A from T. reesei has a temperature optimum of 62 °C over the pH range 3.5–5.6. The cel7 gene from T. emersonii was cloned and the deduced amino acid sequence used during the structure solution of the native enzyme. Family 7 contains both CBHs and endoglucanases. The structure of CBHs is distinguished from that of endoglucanases by the presence of loops of polypeptide chain covering the active site residues, which convert the active site cleft of endoglucanases into the characteristic tunnel of CBHs [8]. Three CBHs belonging to GH Family 7
(T. reesei 1CEL, P. chrysosporium 1GPI and T. emersonii 1Q9H) are generally similar in structure. The catalytic domains are single domain proteins with two large antiparallel β-sheets that stack face-to-face to form a β-sandwich. The rest of the three CBHs consist almost entirely of loops
Fig. 7. C-alpha trace of 1Q9H (yellow) superimposed on the C-alpha trace of 5CEL (white), illustrating the more open active site of 1Q9H. The sugar residues are superimposed in blue. The catalytic residues are shown in red. The figure was drawn by using TURBO.
Fig. 8. Diagram of the active site of 1Q9H showing the distance between the proposed catalytic residues.
connecting the β-strands. However, on closer inspection of
the structures, local variations are reflected in the sequence
differences. The cellulose-binding sites in 1Q9H are more
accessible than those in 1CEL. The absence of the three
amino acids that are observed to form a tight turn over the
−5/−6 subsites in 1CEL confers a more open entrance to the
cellulose-binding sites in 1Q9H. This proposal is supported
by the replacement of Asn7 in 1CEL by a smaller threonine
residue in 1Q9H and 1GPI at the −7 subsite, and of Tyr371 in
1CEL by Ala374 in 1Q9H at the −3/−4 subsites. A tyrosine
residue present in 1GPI and 1Q9H, but absent in 1CEL, has
been suggested, by Munoz et al. [17], to be an additional
substrate-binding site. The more open tunnel structure is
probably an adaptation to the lack of a CBM, allowing short
chain oligosaccharides more access to the active site, with
supporting evidence from the higher catalytic rate (kcat)
and catalytic efficiency (kcat/Km) of 1Q9H, 13.4 s⁻¹ and
3.6 s⁻¹·mM⁻¹ [21] (compared with 0.093 s⁻¹ and
0.23 s⁻¹·mM⁻¹ [12] for 1CEL), with the oligosaccharide
derivative 4-NP-lactopyranoside. An insertion of eight
amino acid residues common to 1Q9H, P. chrysosporium
Cel7D and T. reesei endoglucanase Cel7B can be seen.
Although this insertion is located at the outer regions of the
structure, it could potentially have implications for function
and will be the target of future protein engineering studies.
The probable catalytic residues, nucleophile Glu209 and
proton donor Glu214, of 1Q9H are located approximately
on opposite sides of the cleavable glycosidic linkage in the −1/
+1 subsites, with their carboxylic groups 5.57 Å apart. Four
tryptophan residues located along the substrate-binding
tunnel in 1CEL, which are the determinants of the
glycosyl-binding sites, are conserved in 1Q9H. Density was poor for
one of the tunnel-forming loops of 1Q9H (residues 193–197).
The tunnel is composed of loops that are inherently flexible,
and the absence of good density in the loops is perhaps
indicative of their flexibility. It is worth noting that the
structures of T. reesei GH7 CBHs were solved in the presence
of substrates. However, as 1Q9H was solved in the absence of
bound substrate, one could imagine that, if a substrate were
present in the structure, the loops would close over the
substrate, yielding a structure more like that of 1CEL.
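The kinetic comparison quoted above for 4-NP-lactopyranoside can be made more concrete with a little arithmetic. The Python sketch below takes the reported kcat and kcat/Km values and derives the fold differences and the Km values they imply. The Km figures are not stated in the text; they are back-calculated here under the assumption that each kcat and kcat/Km pair comes from the same Michaelis–Menten fit, and the helper name `implied_km` is introduced for illustration only.

```python
def implied_km(kcat, efficiency):
    """Km (mM) implied by kcat (s^-1) and kcat/Km (s^-1 mM^-1).

    Hypothetical helper: assumes both numbers come from the same fit.
    """
    return kcat / efficiency


# Values quoted in the text for 4-NP-lactopyranoside.
cbh_ib = {"kcat": 13.4, "efficiency": 3.6}    # 1Q9H, ref. [21]
cel7a = {"kcat": 0.093, "efficiency": 0.23}   # 1CEL, ref. [12]

# Fold differences between the two enzymes.
kcat_fold = cbh_ib["kcat"] / cel7a["kcat"]
efficiency_fold = cbh_ib["efficiency"] / cel7a["efficiency"]

# Km values implied by the quoted numbers (derived, not reported).
km_cbh_ib = implied_km(cbh_ib["kcat"], cbh_ib["efficiency"])
km_cel7a = implied_km(cel7a["kcat"], cel7a["efficiency"])

print(f"kcat fold difference:       {kcat_fold:.0f}")
print(f"efficiency fold difference: {efficiency_fold:.1f}")
print(f"implied Km, 1Q9H (mM):      {km_cbh_ib:.2f}")
print(f"implied Km, 1CEL (mM):      {km_cel7a:.2f}")
```

On these numbers the turnover rate differs far more than the catalytic efficiency does, because the implied Km of 1Q9H for this substrate is also higher. That is consistent with a more accessible but less tightly binding active site, although the two data sets come from different studies and may not have been measured under identical conditions.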
The cel7 gene consists of a 1365 bp open reading frame
encoding 455 amino acids, interrupted by two introns. The
deduced amino acid sequence revealed a secretory signal
peptide and a CBH catalytic domain. The 5′ upstream region
of cel7 has eight potential Cre-binding sites, and it is
probable that glucose repression of cellulase transcription is
mediated through a Cre protein in T. emersonii (the gene
sequence for a Cre-like protein has been cloned from
T. emersonii). It has been shown previously that sophorose is
a weak inducer of cellulases in T. emersonii [64], but is the
proposed natural inducer of cellulase expression in T. reesei.
Induction of cel7 and cbhII [27] expression by gentiobiose
suggests that this glucose disaccharide may be the natural
cellulase inducer in T. emersonii and is indicative of an
alternative cellulase induction mechanism in this fungus.
The carbohydrate-binding module and linker region that are
characteristic of some other GH family members were not
encoded in the gene, in contrast to cbh 2 from the same
source [27].
Biochemical analysis of the CBHs from
T. emersonii, previously reported from this laboratory, has
shown that the hydrolysis of crystalline cellulose (Avicel) by
CBH IB is 77% lower than that observed with CBH IA [21]. Earlier studies revealed that removal of the carbohydrate-binding module from the T. reesei CBH resulted in a 90% decrease in activity against Avicel [65,66]. More recently, Nutt et al. [67] have shown, by progress-curve analysis, that intact CBHs from T. reesei and P. chrysosporium show higher activities than their corresponding cores against bacterial microcrystalline cellulose. Takashima et al. [68] suggest that the exoglucanase (EXO1) of H. grisea displays lower activity towards crystalline cellulose than the corresponding CBH I enzyme from this organism. The same study indicated exo-synergism between EXO1 and CBH I in the hydrolysis of crystalline cellulose, and a similar co-operativity between CBHs in T. emersonii may occur.
Despite the reduced activity of CBH IB against crystalline cellulose, the enzyme hydrolyses Avicel in a processive manner. In processive cellulose hydrolysis, initial hydrolytic attack occurs at the chain end, with glucose or cellotetraose produced only upon initial attack and cellobiose being the principal product of hydrolysis thereafter. During hydrolysis of Avicel by CBH IB, glucose production is markedly low and remains constant after the initial hydrolytic attack. Cellobiose is the predominant product of hydrolysis and increases in concentration as the reaction proceeds [21]. The exo-loop of 1CEL (amino acids 243–256) forms the roof of the active site tunnel at the catalytic centre. Deletion of this loop has been shown to lead to a decreased processivity of 1CEL against crystalline cellulose [69]. This exo-loop is conserved in 1Q9H and is presumed to contribute to the processivity of CBH IB against crystalline cellulose. It should be noted, however, that 1GPI has a natural deletion of the exo-loop, yet Cel7D, from P. chrysosporium, is able to maintain high processivity, leading to efficient crystalline cellulose hydrolysis [69]. Therefore, conclusions drawn for one enzyme do not necessarily apply to others within the same family, because of different substrate preferences.
We were able to restore biological activity of the denatured reCBH, although enzyme activity remained very low. 1Q9H has nine disulphide bridges, and so regeneration of the native CBH enzyme in high yield by in vitro reoxidation of the reduced, denatured polypeptide is extremely complex. Expression at lower temperatures has been carried out and has yielded similar activity results. Therefore, heterologous expression studies in other hosts are currently in progress. Future site-directed mutagenesis of specific residues in 1Q9H should provide a valuable insight into the structural basis of the enhanced thermostability of the CBH IB protein from T. emersonii.
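To give a feel for what the thermostability figure reported in the Discussion means in practice (a half-life of 68.0 min at 80 °C and pH 5.0), the following Python sketch converts a half-life into a rate constant and a residual-activity curve. It assumes simple first-order thermal inactivation, which the text does not state and which real enzymes do not always follow; the function names are hypothetical.

```python
import math


def inactivation_rate(half_life_min):
    """First-order rate constant (min^-1) from a half-life in minutes.

    Assumes single-exponential decay of activity (an assumption made
    here for illustration, not a result from the paper).
    """
    return math.log(2) / half_life_min


def residual_activity(minutes, half_life_min):
    """Fraction of the initial activity remaining after `minutes`."""
    return math.exp(-inactivation_rate(half_life_min) * minutes)


HALF_LIFE_80C = 68.0  # min, CBH IB at 80 °C and pH 5.0 (from the text)

print(f"k = {inactivation_rate(HALF_LIFE_80C):.4f} per min")
for t in (30, 68, 136):
    print(f"{t:>3} min: {residual_activity(t, HALF_LIFE_80C):.2f} of initial activity")
```

Under this assumption roughly three-quarters of the activity would remain after 30 min at 80 °C and one-quarter after two half-lives; measured inactivation curves would be needed to confirm that the decay is in fact first order.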
Acknowledgements
This work was funded by HEA pre-PRTLI and Enterprise Ireland awards to M.G.T. C.M.C. and R.T. are grateful for junior teaching fellowships from NUI, Galway, and postgraduate scholarships from Enterprise Ireland.
References
1. Enari, T.M. (1983) Microbial cellulases. In Microbial Enzymes and Biotechnology (Fogarty, W.M., ed.), pp. 183–223. Elsevier Applied Science, London.
2. Coughlan, M.P. (1985) Enzymatic hydrolysis of cellulose: an overview. Biotechnol. Genet. Eng. Rev. 3, 39–169.