Three-dimensional structure of a thermostable native cellobiohydrolase, CBH IB, and molecular characterization of the cel7 gene from the filamentous fungus, Talaromyces emersonii



Alice Grassick1,*, Patrick G. Murray1,*, Roisin Thompson2, Catherine M. Collins1, Lucy Byrnes3, Gabriel Birrane4, Timothy M. Higgins2 and Maria G. Tuohy1

1 Molecular Glycobiotechnology Group, Department of Biochemistry, 2 Department of Chemistry, and 3 Department of Biochemistry, National University of Ireland, Galway, Ireland; 4 Beth Israel Deaconess Medical Centre, Harvard Institutes of Medicine, Harvard Medical School, Boston, MA, USA

The X-ray structure of native cellobiohydrolase IB (CBH IB) from the filamentous fungus Talaromyces emersonii, PDB 1Q9H, was solved to 2.4 Å by molecular replacement. 1Q9H is a glycoprotein that consists of a large, single domain with dimensions of 60 Å × 40 Å × 50 Å and an overall β-sandwich structure, the characteristic fold of Family 7 glycosyl hydrolases (GH7). It is the first structure of a native glycoprotein and cellulase from this thermophilic eukaryote. The long cellulose-binding tunnel seen in GH7 Cel7A from Trichoderma reesei is conserved in 1Q9H, as are the catalytic residues. As a result of deletions and other changes in loop regions, the binding and catalytic properties of T. emersonii 1Q9H differ from those of T. reesei Cel7A. The gene (cel7) encoding CBH IB was isolated from T. emersonii and expressed heterologously, with an N-terminal polyHis-tag, in Escherichia coli. The deduced amino acid sequence of cel7 is homologous to fungal cellobiohydrolases in GH7. The recombinant cellobiohydrolase was virtually inactive against methylumbelliferyl-cellobioside and chloronitrophenyl-lactoside, but partial activity could be restored after refolding of the urea-denatured enzyme. Profiles of cel7 expression in T. emersonii, investigated by Northern blot analysis, revealed that expression is regulated at the transcriptional level. Putative regulatory element consensus sequences for cellulase transcription factors have been identified in the upstream region of the cel7 genomic sequence.

Keywords: 3D structure; cel7 gene; GH Family 7 glycoprotein; Talaromyces emersonii; thermophilic.

Correspondence to M. G. Tuohy, Molecular Glycobiotechnology Group, Department of Biochemistry, National University of Ireland, Galway, Ireland. Fax: +353 91 512504, Tel.: +353 91 524411, E-mail: maria.tuohy@nuigalway.ie

Abbreviations: CBH, cellobiohydrolase; CNP, chloronitrophenyl; Cre, catabolite repressor element; GH, glycosyl hydrolase; reCBH, recombinant cellobiohydrolase.

Enzymes: endo-β-1,4-glucanase (EC 3.2.1.4); cellobiohydrolase (EC 3.2.1.91); β-glucosidase (EC 3.2.1.21).

*Both authors contributed equally to this work.

(Received 12 July 2004, revised 21 September 2004, accepted 4 October 2004)

Cellulose is the major constituent of all plant materials and is the most abundant organic molecule on Earth [1,2]. Microbial breakdown of cellulose creates the potential for the production of energy [3–5]. Cellulases are used in waste recycling processes and in the processing of cellulose-rich raw materials for the paper and textile industries [6]. Cellulose is composed of repeating glucose units, where each glucose unit is rotated 180° relative to its neighbours along the main axis, so that the basic repeating unit is cellobiose. Plant cellulose exists in a highly crystalline form. Hydrolysis of cellulose requires the co-operative action of three classes of cellulolytic enzymes, namely endo-β-1,4-glucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) and β-glucosidases (EC 3.2.1.21). The CAZy (carbohydrate active enzymes) [7] classification system collates glycosyl hydrolase (GH) enzymes into families according to sequence similarity, which has been shown to reflect shared structural features. To date, GH enzymes are members of 87 families, of which 43 have been assigned a retaining mechanism of action and 24 an inverting mechanism; the stereochemical mode of action of the remaining families has yet to be determined. The endoglucanases are commonly characterized by a groove or a cleft into which a linear cellulose chain can fit in a random manner. Classically, exoglucanases such as the cellobiohydrolases (CBHs) possess tunnel-like active sites, which can only accept a substrate chain via its terminal regions [8]. These exo-acting CBH enzymes act by threading the cellulose chain through the tunnel, where successive cellobiose units are removed in a sequential manner. Sequential hydrolysis of a cellulose chain is termed processivity [9]. However, some cellulase enzymes are capable of both endo- and exo-actions [10,11]. Moreover, some GH families include both endo- and exo-enzymes, indicating that the mode of action can be independent of sequence homology and structural fold. Relatively minor changes in the lengths of relevant loops in the general proximity of the active site in such enzymes may dictate the endo- or exo-mode of action without significant differences in the overall fold. In Trichoderma reesei Cel7A, deletion of the exo-loop (residues 243–256) has been shown to decrease activity against crystalline cellulose. It was therefore postulated that the exo-loop has evolved to facilitate processive hydrolysis of crystalline cellulose by T. reesei Cel7A [12]. Fungal cellulolytic enzymes reported to date comprise a single polypeptide chain, frequently glycosylated, which contains a catalytic domain usually connected to a cellulose-binding domain by a proline/serine/threonine-rich linker [13].

CBHs from Humicola grisea [14], Phanerochaete chrysosporium [15] and Aspergillus niger [16] have been shown to consist solely of a catalytic domain. The best-characterized CBH members of GH7 are Cel7A from T. reesei [8] and Cel7D (CBH58) from P. chrysosporium [17]. Both CBHs consist of two β-sheets that pack face-to-face to form a β-sandwich [18]. Cel7A from T. reesei has long loops, on one face of the sandwich, that form a cellulose-binding tunnel of 50 Å. The catalytic residues are glutamates 212 and 217, which are located on opposite sides of the active site, separated by an intervening distance consistent with a double-displacement retaining mechanism [18]. Members of GH7 are thought to follow a retaining mechanism of action. Kinetic parameters and enzyme–ligand interactions of GH7 enzymes are well characterized [19–21]. Genes from this family have been cloned and characterized from a variety of fungal sources, including H. grisea [14], T. reesei [22,23], Penicillium janthinellum [24], P. chrysosporium [15] and Aspergillus species [16,25,26], but until recently, never from a truly thermophilic fungal species [27].

The thermophilic aerobic fungus, Talaromyces emersonii, isolated from composting biomass, produces a completely thermostable cellulase system that has not been fully characterized to date [21,28–30]. CBH enzymes from T. emersonii have been purified, characterized and assigned to GH families 6 and 7 [21,27,31]. Protein thermostability is not, however, reflected in the overall fold of a protein and is thought to be the result of more localized differences, causing thermophilic enzymes to be somewhat less flexible than mesophilic enzymes [32]. In this article we present the 3D structure of the native CBH IB from T. emersonii, the first structure of any protein from this source and the first structure of a native fungal CBH core (glycoprotein). Molecular cloning, transcriptional regulation analysis and overexpression of the cel7 gene in Escherichia coli are also reported. The 3D structure has been deposited in the Protein Data Bank as 1Q9H.

Experimental procedures

Fungal strain and growth conditions

Mycelia harvested from cultures of T. emersonii strain CBS 814.70 grown at 45 °C on Sabouraud dextrose agar were used to inoculate liquid nutrient media, as described previously [30]. Cultures were grown at 45 °C with shaking at 220 r.p.m. At appropriate time-points, mycelia were harvested by filtration through several layers of fine-grade muslin, washed with 75 mM sodium citrate, pH 7.5, and frozen immediately under liquid nitrogen for nucleic acid extraction.

PCR cloning of genomic DNA

Chromosomal DNA was isolated from T. emersonii mycelia harvested after 24 h of culture on 2% (w/v) glucose, by using the method of Raeder & Broda [33]. Amplification of a DNA fragment encoding a portion of the catalytic domain of T. emersonii cel7 was performed by using PCR and degenerate primers designed from alignments of existing CBH sequences in the databases. Reaction cocktails contained 2.5 U of Qiagen HotStar™ Taq DNA polymerase, 1× buffer (Qiagen, Crawley, West Sussex, UK), 0.5× Q solution, 200 µM of each deoxynucleotide triphosphate, 1.5 mM MgCl2 and 1 µM of the appropriate gene-specific primers. Reaction conditions for PCR amplification were 94 °C for 15 min (initial DNA polymerase activation), then 30 cycles of 94 °C for 1 min, 50–60 °C for 1 min and 72 °C for 1 min, followed by a final extension of 10 min. PCR products were separated by electrophoresis through a 1.2% (w/v) agarose gel, purified by using a Wizard PCR preps DNA purification system (Promega, Southampton, UK) and subcloned into the pGEM-T easy vector (Promega), following the manufacturer's guidelines. Plasmids were purified from E. coli JM109 cultures by using a spin miniprep kit (Qiagen), and sequenced. Sequencing reactions were carried out by Altabioscience Laboratories (University of Birmingham, Birmingham, UK). Sequence analysis and database similarity searches were performed by using the online program BLAST [34] against protein (BLASTP) and nucleotide (BLASTX and BLASTN) sequences stored at the National Centre for Biotechnology Information (NCBI).

Rapid amplification of cDNA ends

RNA (10 µg), isolated after growth (48 h) of T. emersonii on solka floc (ball-milled cellulose), was used as a template for RACE, which involved a modification of the manufacturer's (Ambion Europe Ltd., Huntingdon, Cambridgeshire, UK) RACE protocol described previously [27]. An aliquot (1 µL) of the reaction mixture was used as a template for performing 5′- and 3′-RACE PCRs by using the outer and inner RACE primers supplied by the manufacturer and the outer and inner gene-specific primers designed from the cel7 PCR products. The Cel7 outer and inner RACE primers were as follows: outer 5′-RACE primer […]; inner 5′-RACE primer 5′-GTTTGCTTCCCAGACATCCATC-3′; outer 3′-RACE primer 5′-ATGCTGTGGTTGGATTCCGACTAC-3′; and inner 3′-RACE primer 5′-AACTCCTACGTGACCTACTCGAAC-3′. PCR products were cloned and sequenced as described previously.

Isolation of cel7 cDNA and genomic genes

Full-length cDNA and genomic sequences corresponding to cel7 were amplified from T. emersonii first-strand cDNA and genomic DNA, respectively, by PCR with primers corresponding to the 5′ start and 3′ stop sequences identified in the 5′- and 3′-RACE products. The cel7 sense and antisense primers were 5′-ATGCTTCGACGGGCTCTTCTTCTA-3′ and 5′-TCACGAAGCGGTGAAGGTCGAGTT-3′, respectively. Reactions contained 1.25 U of Pfu DNA polymerase, 1× Pfu reaction buffer, 200 µM of each deoxynucleotide triphosphate and 1 µM of the appropriate gene-specific primers. PCR products were gel purified, subcloned and sequenced as described previously.
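As a quick sanity check on the full-length cel7 primers quoted above, the following Python sketch reverse-complements the antisense primer and tests whether the sense primer begins with a start codon and the antisense primer corresponds to a coding-strand sequence that ends in a stop codon. The helper names are illustrative only and are not part of the original work.

```python
# Illustrative helpers; none of these names come from the original paper.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Full-length cel7 primers as quoted in the text (5' to 3').
CEL7_SENSE = "ATGCTTCGACGGGCTCTTCTTCTA"
CEL7_ANTISENSE = "TCACGAAGCGGTGAAGGTCGAGTT"

# The antisense primer anneals to the coding strand, so its reverse
# complement is the 3' end of the open reading frame as written 5' to 3'.
coding_end = reverse_complement(CEL7_ANTISENSE)
starts_with_atg = CEL7_SENSE.startswith("ATG")
ends_with_stop = coding_end[-3:] in STOP_CODONS
```

Here `coding_end` finishes with TGA, which is consistent with the antisense primer having been designed over the stop codon.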

Northern blot analysis and genomic library screening

Northern blot analysis of cel7 expression was carried out as described previously [27]. A T. emersonii Sau3A genomic library was prepared in LambdaGEM-11 (Promega). E. coli KW251 was used as the host strain in the preparation and screening of the genomic library. Plaque lifts were carried out as described by Sambrook et al. [35]. Hybridization was conducted overnight at 68 °C in 5× NaCl/Cit, 0.1% (w/v) N-lauroylsarcosine, 0.02% (w/v) SDS and 1% (w/v) blocking reagent. Full-length digoxigenin (Roche Molecular Biochemicals, Roche Diagnostics Ltd., Lewes, East Sussex, UK)-labelled cel7 (20 ng·mL⁻¹ of hybridization buffer) was used as a probe. Detection was performed according to the manufacturer's instructions. The presence of the full-length gene in positively hybridizing single plaque-forming units was confirmed by PCR, and the plaques were purified by using a lambda purification kit (Qiagen). The plaques were then sequenced directly, using a cel7 gene-specific sequencing primer (5′-GCATTCCTGCCATGTCAG-3′) to generate sequence data for the 5′ region upstream of the ATG start codon.

Expression of cel7 in E. coli

Primers F1 (5′-CACCCAGCAGGCCGGCACGGCG-3′) and R1 (5′-TCACGAAGCGGTGAAGGTCGAGTT-3′), corresponding to the N- and C-terminal regions of the mature protein, were used to amplify cel7 cDNA (the N-terminal signal peptide, i.e. amino acids 1–18, was removed). The CACC at the 5′ end of F1 corresponds to the GTGG overhang in the TOPO cloning vector (Invitrogen Ltd., Paisley, UK). The purified PCR product was ligated into the pENTR/SD/D-TOPO vector and transformed into One Shot Top10 E. coli competent cells, according to the manufacturer's instructions. The product of an LR recombination reaction between the entry clone, pE-Cel7, and the destination vector, pDEST-17 (Invitrogen), was transformed into E. coli DH5α library-efficient cells, thereby generating the expression clone, pD-Cel7, with an N-terminal poly-histidine tag. Multiple transformants were analysed by restriction analysis and PCR to confirm the presence and correct orientation of the insert at all stages. For expression, plasmid DNA was purified and transformed into BL21-AI competent E. coli cells (Invitrogen), which were cultured to mid-log phase, and expression was induced by the addition of 0.2% (w/v) arabinose followed by a further growth period of 4 h at 37 °C. Pilot experiments indicated that the CBH protein was expressed in the inclusion body fraction. Cells were harvested by centrifugation (3630 g for 5 min) from a 50 mL culture, and the cell pellet was resuspended in 8 M urea. The cell lysate was sonicated with three 5-s high-intensity pulses and centrifuged at 1307 g for 15 min to pellet cellular debris, and the supernatant was applied to a nickel-nitrilotriacetic acid purification matrix (Invitrogen). The lysate was allowed to interact with the matrix at room temperature for 30 min with gentle agitation, and the matrix was then washed with 2 volumes of wash solution (containing 8 M urea, 20 mM sodium phosphate, 500 mM NaCl, pH 7.8), followed by 2 volumes of a second wash solution (containing 8 M urea, 20 mM sodium phosphate, 500 mM NaCl, pH 6.0). The column was then washed with 4 volumes of a final wash solution of 50 mM sodium phosphate and 20 mM imidazole, pH 8.0. Recombinant CBH (reCBH) was eluted from the column by application of a solution of 50 mM sodium phosphate, pH 8.0, containing 250 mM imidazole.

Denaturation and refolding of reCBH and enzyme assay

reCBH (1 mg) was denatured by incubation in a solution of 8 M urea/0.1 M Tris/HCl, pH 8.0, in the presence of 100 mM dithiothreitol and 1 mM EDTA, for 2 h at 20 °C. The pH was lowered to pH 4.0 by dropwise addition of 1 M HCl, and the dithiothreitol was removed by dialysis against the same buffer without the dithiothreitol. Denatured and reduced reCBH was diluted 1 : 100 in a buffer solution containing 0.1 M Tris/HCl, pH 8.5/1 mM EDTA/0.3 mM oxidized glutathione/3 mM glutathione, and then incubated in a renaturation buffer containing 2.5 mg of protein disulphide isomerase at 30 °C for 30 h. reCBH was dialysed against 100 mM sodium acetate, pH 5.0, followed by concentration in a Millipore microconcentrator fitted with a 10 kDa cut-off membrane. reCBH activity was measured by incubating 10 µL of renatured enzyme with 100 µL of 1 mM chloronitrophenyl-lactoside (CNP-lactoside), 1 mM 4-nitrophenyl-cellobioside or 50 µM 4-methylumbelliferyl-cellobioside, at 50 °C. Reactions were terminated by the addition of 100 µL of 1 M Na2CO3 or 0.2 M glycine/sodium hydroxide, pH 10.5, and the absorbance (405 nm) or UV fluorescence was measured.

Purification of CBH IB

CBH IB was purified from 2% (w/v) solka floc cellulose-induced cultures and characterized as described previously [21]. The purified enzyme was concentrated to 20 mg·mL⁻¹ in 20 mM Tris buffer, pH 7.5, and stored at 4 °C. Peptide sequence information for native CBH IB was determined by Edman degradation on an automated sequenator (J. Gray, University of Newcastle-upon-Tyne, Newcastle-upon-Tyne, UK).

Crystallization and data collection

Native CBH IB from T. emersonii was crystallized by using the hanging-drop vapour-diffusion method with ammonium phosphate (dibasic) as a precipitant at pH 8.5. Crystals of CBH IB, which diffracted to 2.4 Å, were obtained. Data were collected at room temperature on the multipolar wiggler beamline, BW7B, at the DORIS storage ring, EMBL Hamburg Outstation, Germany, using a Mar345 area detector. Data processing indicated that CBH IB crystallised in the tetragonal space group P41212, with unit cell dimensions a = b = 74.42 Å, c = 176.92 Å [31].

Structure solution

The structure was solved by molecular replacement by utilizing the program [36]. Molecular replacement was completed using two separate search models, chosen based on sequence homology. The models used were the catalytic domain of T. reesei Cel7A (PDB 1CEL) and the catalytic domain of P. chrysosporium Cel7D (PDB 1GPI).
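The unit cell reported in the crystallization section (a = b = 74.42 Å, c = 176.92 Å, space group P41212) can be turned into a cell volume with a few lines of Python. This is a minimal sketch, not a step described by the authors; it relies only on the tetragonal relation V = a²c and on the fact that P41212 has eight general positions. The function and constant names are illustrative.

```python
def tetragonal_cell_volume(a: float, c: float) -> float:
    """Volume of a tetragonal unit cell (a = b, all angles 90 degrees)."""
    return a * a * c


# Cell edges reported in the text, in angstroms.
A_EDGE = 74.42
C_EDGE = 176.92

# P4(1)2(1)2 has eight general positions, so one unit cell
# contains eight asymmetric units.
ASYMMETRIC_UNITS_PER_CELL = 8

cell_volume = tetragonal_cell_volume(A_EDGE, C_EDGE)        # cubic angstroms
volume_per_asu = cell_volume / ASYMMETRIC_UNITS_PER_CELL    # cubic angstroms
```

Dividing `volume_per_asu` by the molecular mass of the contents of one asymmetric unit would give the Matthews coefficient; that mass is not stated in this section, so the calculation stops at the volume.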

Structure refinement

A total of 5% of the reflections in the data set was set aside for free R-factor calculations during refinement. REFMAC5 [37] from the CCP4 [38] suite of programs was used throughout this refinement, with the program TURBO [36] being employed for graphical displays and manipulation of the models. With each round of refinement, maps were produced and the model was rebuilt where electron density supported the changes. Water molecules were located and refined by using the program ARP_WARP [39]. The stereochemical quality of the model was followed by using the program PROCHECK [40].
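To make the role of the 5% free set concrete, the sketch below uses the conventional definition R = Σ||Fo| − |Fc|| / Σ|Fo| and evaluates it separately on a working set and a randomly chosen free set. It is an illustration under stated assumptions, not the REFMAC5 implementation: the amplitudes are assumed to be already on a common scale, the random selection scheme is hypothetical, and all function names are invented for this example.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum(||Fo| - |Fc||) / sum(|Fo|) over paired amplitudes."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Randomly flag about `fraction` of the reflection indices as the free set.

    Returns (work_indices, free_indices), both sorted.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * fraction))
    free = set(indices[:n_free])
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_work_and_free(f_obs, f_calc, fraction=0.05, seed=0):
    """Return (R_work, R_free) for two equal-length amplitude lists."""
    work, free = split_free_set(len(f_obs), fraction, seed)

    def pick(values, idx):
        return [values[i] for i in idx]

    return (r_factor(pick(f_obs, work), pick(f_calc, work)),
            r_factor(pick(f_obs, free), pick(f_calc, free)))
```

In practice the free set is chosen once and excluded from refinement throughout, so that R-free reports on reflections the model has never been fitted against.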

Results

Isolation of genomic and cDNA clones

The cel7 degenerate primers amplified a 719 bp PCR product from T. emersonii chromosomal DNA. The product was cloned, sequenced and found to exhibit homology to other fungal cel7 gene sequences. Based on this sequence, 5′ and 3′ outer and inner RACE PCR primers were designed to amplify the 5′ and 3′ ends of the cel7 gene. Sequence analysis confirmed the RACE products to be part of the cel7 gene, which included a 54 bp 5′ untranslated region and a 281 bp 3′ untranslated region, including a polyA tail. The full-length genomic (GenBank AF439935) and cDNA (GenBank AY081766) cel7 clones were amplified from chromosomal DNA and first-strand cDNA, respectively, by using N- and C-terminal gene-specific primers based on the RACE products. Cel7 is encoded by a 1365 bp open reading frame that specifies 455 amino acids and is interrupted by two introns (52 and 61 bp), with consensus 5′ and 3′ intron splice sites (Fig. 1).
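The lengths quoted here can be cross-checked with simple arithmetic. The sketch below is illustrative only: the numbers are taken from the text (1365 bp reading frame, introns of 52 and 61 bp, an 18-residue signal peptide), and the genomic span is computed under the assumption that the 1365 bp figure counts exon sequence only.

```python
ORF_BP = 1365                  # open reading frame length quoted in the text
INTRON_BP = (52, 61)           # the two introns quoted in the text
SIGNAL_PEPTIDE_RESIDUES = 18   # signal peptide length quoted in the text

# A reading frame must be a whole number of codons.
assert ORF_BP % 3 == 0
codons = ORF_BP // 3

# Residues remaining after removal of the signal peptide.
mature_residues = codons - SIGNAL_PEPTIDE_RESIDUES

# Assumption: 1365 bp refers to exon sequence only, so the genomic
# start-to-end span is the reading frame plus both introns.
genomic_span_bp = ORF_BP + sum(INTRON_BP)
```

The codon count reproduces the 455 amino acids given for the precursor, and the mature length of 437 residues agrees with the residue total quoted for CBH IB in the structure refinement results.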

Sequence analysis

Peptides sequenced from native CBH IB confirmed the identity of the T. emersonii cel7 gene/gene product; the location of these peptides in the deduced polypeptide sequence is given in the legend to Fig. 1. Comparison of the deduced cel7 amino acid sequence from T. emersonii with those from P. chrysosporium (GenBank AAA19802), T. reesei (GenBank CAA49596), A. niger (GenBank AAF04491), H. grisea (GenBank AAD11942) and A. aculeatus (GenBank BAA25183) gave sequence identity values of 65%, 64%, 73%, 51% and 68%, respectively (Fig. 2) [41].
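The text does not say how the identity percentages were computed from the alignment. One common convention, shown here as an assumption rather than as the authors' method, is to count identical residues over the alignment columns in which neither sequence has a gap. The function name and the toy sequences in the usage note are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of a pairwise alignment.

    Both inputs must be rows of the same alignment, i.e. of equal length,
    with `gap` marking insertions and deletions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared
```

For example, `percent_identity("ACDE-G", "ACDFHG")` compares five ungapped columns, four of which match. Other conventions divide by the full alignment length or by the shorter sequence length, which gives different values for the same alignment.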

Alignment of the deduced polypeptide sequence of T. emersonii cel7 reveals the presence of a terminal catalytic domain. Other cel7 (cbhI) gene products possessing a catalytic domain exclusively have been identified and include those of H. grisea [42] and A. niger [16]. Cel7 gene products from T. reesei [43], P. chrysosporium [44] and A. aculeatus [23], however, have a modular structure composed of a C-terminal carbohydrate-binding module linked via a proline/serine/threonine-rich linker to the catalytic domain [45].

There are two predicted N-glycosylation sites in the catalytic domain of 1Q9H (Fig. 3), i.e. sites matching the Asn-X-Ser/Thr consensus sequence (where X is any amino acid except proline), at Asn267 and Asn431. There are 18 residues corresponding to the signal peptide at the N-terminus of the translated protein product. Alignment of the existing fungal CBH sequences revealed that 1Q9H from T. emersonii comprises features found in both Cel7D of P. chrysosporium and in Cel7A of T. reesei.
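The Asn-X-Ser/Thr consensus described above is straightforward to scan for with a regular expression. The following Python sketch is an illustration, not the authors' procedure; the function name is invented, and the positions it returns are 1-based within whatever sequence is supplied, so they match the paper's residue numbers only if the same numbering scheme is used.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead keeps matches zero-width past the Asn, so that
# overlapping sequons (e.g. N-N-T-S) are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]
```

For the toy sequence `MNGSANPTQNNTS`, the call returns positions 2, 10 and 11; the Asn at position 6 is skipped because it is followed by proline. A sequon is a necessary motif for N-glycosylation, not proof of occupancy, which is why the electron density at each site is examined separately.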

Analysis of the T. emersonii cel7 upstream region

Initial screening of 6000 λ phage clones from the T. emersonii Sau3A genomic library identified two positively hybridizing clones. Sequence analysis of the 5′ region upstream from the start codon of the purified cel7 clones revealed putative TATA-like and CCAAT box sequences located upstream of the start codon at bp −99, −132, −340, −1040, −1242, −1348, −1476 and −1694. In filamentous fungi [46], and in higher eukaryotes [47], the CCAAT sequence is known as an upstream activating sequence. The binding sites for putative cellulase transcription factors [activator of cellulase expression I (ACEI) and ACEII] [48,49] are located upstream of the start codon at bp −562, −844, −853 and −1175, while putative binding sites for the catabolite repressor element (Cre) [50,51] are located upstream of the start codon at bp −239, −265, −320, −359, −460, −977, −1404 and −1523.
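The catabolite repressor-binding consensus is quoted in this paper as SYRGG, written in IUPAC ambiguity codes (S = C or G, Y = C or T, R = A or G). The sketch below shows one way such a degenerate motif could be located in an upstream sequence and reported as negative offsets from the start codon. It is a hypothetical illustration: the paper does not describe how the sites were identified, scanning both strands is an assumption, and all names are invented.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as SYRGG into a regex fragment."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement_iupac(motif: str) -> str:
    """Reverse complement of a motif, keeping ambiguity codes."""
    return motif.upper().translate(IUPAC_COMPLEMENT)[::-1]


def find_motif(upstream: str, motif: str = "SYRGG"):
    """Return sorted (offset, strand) hits for `motif` in `upstream`.

    `upstream` is the top-strand sequence ending immediately before the
    start codon, so its last base is at offset -1. The offset reported is
    that of the leftmost base of the hit on the top strand.
    """
    seq = upstream.upper()
    length = len(seq)
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement_iupac(motif))):
        # Lookahead allows overlapping matches to be found.
        regex = re.compile("(?=(" + motif_to_regex(pattern) + "))")
        for match in regex.finditer(seq):
            hits.append((match.start() - length, strand))
    return sorted(hits)
```

A five-base degenerate motif occurs frequently by chance, so hits of this kind are candidate sites only, in keeping with the "putative" wording used in the text.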

Northern blot analysis of T. emersonii cel7 expression

Solka floc cellulose, lactose and beechwood xylan induce high levels of cel7 expression in T. emersonii (Fig. 4). Similar cellulase expression with complex cellulose has been documented in P. chrysosporium [52] and T. reesei [53]. Methyl xylose and gentiobiose, a β-1,6-linked glucose disaccharide, induce low levels of cel7 expression, relative to solka floc, in T. emersonii. Gentiobiose has been shown to induce other cellulases in T. emersonii [27]. Other researchers have reported that the CBH A and B genes and endoglucanase genes from A. niger are also induced by D-xylose [16]. Sophorose, a β-1,2-linked disaccharide of glucose, has previously been shown to be a poor inducer of cellulase activity in T. emersonii [54], although it has been postulated that sophorose could be the natural inducer of cellulase expression in T. reesei. Cellobiose is a poor inducer of the T. emersonii cellulases and did not induce detectable levels of cel7 in this study. Cultures grown on glucose displayed no detectable levels of cel7. Indeed, the addition of 2% (w/v) glucose for 2 h to T. emersonii mycelia, previously cultured on solka floc for 48 h, resulted in the abolition of the cel7 signal. The regulatory proteins, CreA [55] and Cre1 [51], similar to Mig1 in Saccharomyces cerevisiae [56], mediate glucose repression in Aspergillus and Trichoderma species. The 5′ upstream region of T. emersonii cel7 has eight potential catabolite repressor-binding sites (SYRGG). The sequence of a gene encoding CreA from T. emersonii has recently been submitted to the GenBank database (AF440004).

Expression of cel7 in E. coli

A recombinant protein, of 57 000 relative molecular mass, was expressed in E. coli BL21-AI. Under the conditions tested, reCBH was present in the insoluble inclusion fraction (Fig. 5A). The protein was purified under hybrid conditions (denaturing/renaturing) on a Ni-nitrilotriacetic acid column. reCBH was inactive against CNP-lactoside. Denaturation of reCBH, followed by refolding, with concomitant disulphide bond formation, in the presence of protein disulphide isomerase in renaturation buffer, successfully restored partial biological activity of reCBH against CNP-lactoside and methylumbelliferyl-cellobioside (Fig. 5B).

Structure solution and refinement of CBH IB

Molecular replacement was performed by using the CCP4 (1994) programs contained in the Automated package for Molecular Replacement (AMORE) [33]. Both 1GPI, which was solved at 1.32 Å resolution, and 1CEL, which was solved at 1.8 Å resolution, were used as search models. Rotational and translational searches were performed at different resolutions in the range of 69 Å to 2.4 Å. Rigid body refinement was carried out after each translation function to refine the position of the potential solution. Euler angles, fractional coordinates, correlation coefficients and R factors for the best molecular replacement solution for each model, using P41212 as the space group of the 1Q9H crystal, were found. The best solutions had correlation coefficients of 56.1% and 55.3%, and R-factors of 40.0% and 40.9% for 1CEL and 1GPI, respectively. Refinement of the models produced by AMORE was performed by using the CCP4 program, REFMAC5 [34]. REFMAC5 was used to carry out restrained refinement on X-ray data by using the maximum likelihood method. The R-overall and R-free values from the first round of REFMAC5 cycles on the 1GPI model were 28.6% and 33.6%, respectively, while those for the 1CEL model were 28.9% and 35.5%, respectively. The 1GPI model was used for further analysis. The graphics program TURBO [36] was used to examine both models and the maps produced by REFMAC5. 2Fo-Fc and Fo-Fc maps were used and analysed with contour levels set to 1.0. The amino acid sequence of the 1GPI model was changed to that of 1Q9H, and the model was rebuilt where changes were supported by the electron density. Changes in R factors were used as a guide to improvements to the overall structure. Further rounds of model mutation and rebuilding resulted in a model with an R-factor of 16.1% and an R-free of 22.9% (Table 1).

Electron density maps showed almost continuous density for the backbone of CBH IB. The final model, 1Q9H, included 430 of the 437 amino acid residues of CBH IB. The final two amino acids and the loop region (from amino acids 193–197) were not visible in electron density maps. In addition, no side-chain density was apparent for four residues, which were subsequently modelled as alanine. All of these residues are located on the surface of the protein and are presumed to be disordered. Three N-acetylglucosamine and 175 water molecules were located within the model. Average isotropic temperature factors (B factors) for the 1Q9H structure were calculated by using a program from the CCP4 suite. Average isotropic temperature factors for the main chain were 20.91 Å² and root-mean-square deviations (rmsd) from ideal bond lengths and angles were 0.009 Å and 1.281°, respectively. PROCHECK was used to verify the stereochemical quality of the model. The Ramachandran plot showed that 86% of residues lie in the most favoured regions and 13.8% lie in the allowed region, while Ser311 was the only nonglycine residue in the generously allowed region and there were no residues in the disallowed regions. Peptide bond planarity for the main chain was found to be 7.0, nonbonded interactions were 0.6 per 100 residues, α-carbon tetrahedral distortion was 1.8, the standard deviation of the hydrogen bond energies was 0.7 and the overall G-factor, a measure of the normality of the structure, was 0.0.

Fig. 1. Nucleotide, deduced amino acid sequence and relevant features of the Talaromyces emersonii cel7 gene. The stop codon is denoted by an asterisk. The N-glycosylation sites are underlined. Catalytic residues are boxed. Cysteine residues involved in the formation of disulphide bridges are in bold (19–25, 50–71, 61–67, 135–401, 169–207, 173–206, 227–253, 235–240 and 258–334). Four tryptophan residues involved in the glucosyl-binding platform (W38, W40, W371, W380) in the active-site tunnel are in bold italics and underlined. Four peptides sequenced from native CBH IB had complete identity with Y124-D129, Y267-D272, I295-P300 and F445-S455 in the deduced protein sequence.

Fig. 2. Multiple sequence alignment of cellobiohydrolase IB (Tal.em; CBH IB) with glycosyl hydrolases from Humicola grisea (Hgrisea; GenBank AAD11942), Aspergillus niger (Aspniger; GenBank AAF04491), Phanerochaete chrysosporium (Phcry; GenBank AAA19802), and Trichoderma reesei (T.reesei; GenBank CAA49596). Residues in white against a black background are amino acids that are identical or have a conserved substitution in all five sequences. Residues in white against a grey background are amino acids that are identical or conserved in four out of the five sequences.

Overall structure of T. emersonii 1Q9H

1Q9H is a large single-domain protein with overall dimensions of 60 Å × 40 Å × 50 Å (Fig. 6). About one-third of this domain is arranged in two large antiparallel β-sheets, which are stacked face-to-face and are highly curved, forming convex and concave surfaces. The convex and concave sheets of the β-sandwich are composed of seven β-strands. Many of the side-chains in the β-sheets are hydrophobic, and interactions between these residues appear to hold the β-sandwich in position. With the exception of four α-helices and two pairs of short β-strands, the rest of the protein consists almost entirely of loops connecting the β-strands. The loops extending from the β-sandwich form a tunnel, which runs the length of the concave sheet, into which the cellulose substrate can be accommodated. The β-sandwich represents the characteristic fold of GH7 and is also the fold of the legume-lectin family and of GH16 [57].

Fig. 3. Electron density maps at the two N-glycosylation sites. (A) Asn267 with two GlcNAc (2-amino-2-N-acetylamino-D-glucose) residues. (B) Asn431 with one GlcNAc residue.

Fig. 4. Northern blot analysis of Talaromyces emersonii cel7 expression with various carbon sources at 2% (w/v). Glucose at 24 h and 48 h (lanes 1 and 2); methyl glucose at 48 h (lane 3); sorbitol at 48 h (lane 4); galactose at 48 h (lane 5); galactitol at 48 h (lane 6); methyl xylose at 36 h (lane 7); glycerol at 48 h (lane 8); gentiobiose at 48 h (lane 9); cellobiose at 48 h (lane 10); and beech wood xylan at 48 h (lane 11). Time course of cel7 transcription at 24 h, 48 h and 96 h (lanes 12, 13 and 14) after transfer to Solka floc (ball-milled cellulose). Addition of 2% (w/v) glucose to 48 h cultures of T. emersonii cultured on Solka floc, with RNA isolated after a further 2 h (lane 15). Time course of cel7 transcription at 24 h and 48 h (lanes 16 and 17) after transfer to lactose. The bottom panel is the 18S ribosomal RNA loading control.

Fig. 5. SDS/PAGE and activity analysis of recombinant cellobiohydrolase. (A) SDS/PAGE [10% (w/v) gel] analysis of purified recombinant cellobiohydrolase (reCBH). Lane 1, molecular markers; lane 2, uninduced cells; lane 3, induced cells (4 h); and lane 4, purified His6-reCBH. (B) reCBH activity against methylumbelliferyl-cellobioside after 0, 1 and 24 h. Substrate controls, lane 1; enzyme reactions, lane 2.

The loops extending from the β-sandwich are stabilized by the presence of nine disulphide bonds, which are located between residues 19–25, 50–71, 61–67, 135–401, 169–207, 173–206, 227–253, 235–240 and 258–334. The N-terminal glutamine residue is present as the modified pyroglutamate group, as observed in other GH structures [17,58]. Electron density corresponding to N-glycosylation is visible at two asparagine residues, namely Asn267 and Asn431 (Fig. 3). It was possible to position two N-acetylglucosamine residues, linked via a β-1,4 bond, at Asn267. A single N-linked N-acetylglucosamine was seen in the model at position Asn431.

Structure of T. emersonii 1Q9H, in comparison with P. chrysosporium 1GPI and T. reesei 1CEL

ABLASTsearch of the Protein Data Bank (PDB) revealed that the protein structures with the highest sequence homology to 1Q9H were structures 1GPI and 1CEL, which are the catalytic domains of CBH Cel7D from P chrysos-porium[17] and CBH Cel7A [8] from T reesei, respectively

P chrysosporium has a sequence identity of 67% with Cel7A, while T reesei has an identity of 66%

Although the levels of sequence homology of 1CEL and 1GPI to 1Q9H are similar, the areas of shared homology differ. Superimposing the Cα traces of 1GPI and 1CEL on 1Q9H gave rmsd values of 0.71 Å and 0.67 Å, respectively (Fig. 7).
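The rmsd values quoted for the superimposed traces summarize how far paired Cα atoms lie from one another. The following is a minimal sketch of that final calculation only. It assumes the two coordinate sets have already been superimposed and paired residue by residue; the function name and the toy coordinates are illustrative and are not taken from 1Q9H, 1GPI or 1CEL.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input, e.g. angstroms)
    between two equally long lists of (x, y, z) tuples.

    Assumption: the two sets are already superimposed and atom i in
    coords_a is paired with atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical toy traces: every paired atom is displaced by 1 unit along z.
    trace_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    trace_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(trace_a, trace_b))
```

In practice the superposition itself (for example, a least-squares rigid-body fit) would be carried out first by a structure-analysis program. The sketch covers only the distance measure that is reported afterwards.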

Substrate-binding subsites

The X-ray structure of the T. reesei CBH with eight glucose residues bound (PDB 7CEL) identifies some 20 residues involved in enzyme–substrate interactions. Superposition of this structure on 1Q9H shows that all but two of these residues are conserved and suitably positioned for interactions with the substrate. Four tryptophan residues form a glucosyl-binding platform in sites −7, −4, −2 and +1 in the tunnel of 1CEL; equivalent tryptophan residues are found in 1Q9H at positions 38, 40, 371 and 380. A tyrosine residue (Tyr47) present in the T. emersonii CBH IB sequence, and seen in 1GPI but not in 1CEL, is located at the entrance of the tunnel and, as Munoz et al. suggest, may constitute an additional binding subsite [17]. Three arginine residues in the product sites of 1CEL (+1, +2 and +3) are proposed to assist in the binding and positioning of the substrate and to play a role in the recognition of the reducing end of the cellulose chain. Arginine side-chains are present in all equivalent locations in 1Q9H (Fig. 7).

Tunnel-forming loops

There are four major loops involved in the cellulose-binding tunnel in 1CEL. It is postulated that Asn197 and Asn198

Fig. 6. Stereoview of Talaromyces emersonii 1Q9H. The substrate is superimposed onto 1Q9H. The figure was drawn by using TURBO [36].

Table 1. Final statistics for the structure of Talaromyces emersonii 1Q9H. Values in parentheses refer to the last resolution shell.

Unit cell dimensions
Resolution range (Å)           20–2.40
Completeness (%)               94.3 (94.3)
Mean I > 2σ(I) (%)             78.8 (56.8)
No. of water molecules         175
No. of sugar molecules         3
rms bond lengths (Å)           0.009
Average B main chain (Å²)      20.91
Average B water (Å²)           31.28


make van der Waals interactions with Tyr370 and Tyr371, on the opposite loop, thus enabling the loops to form a fully enclosed tunnel [8]. While sequence analysis shows that 1Q9H possesses the equivalent Asn residues (Asn193 and Asn194), one of the tyrosine residues on the opposite loop is replaced by an alanine (Ala374), forming a more open tunnel; however, electron density in this area of 1Q9H is poor. In 1GPI, neither asparagine residue is present, and a histidine and an alanine residue are found in the equivalent tyrosine positions. The tunnel-forming loop (amino acids 240–248) in 1GPI is significantly shorter than in 1CEL and 1Q9H, owing to a six amino acid deletion, which leaves the catalytic site of 1GPI more exposed. In 1CEL, three amino acids form a tight turn over site −6, with Gln101 hydrogen bonding to the glycosyl residue in site −5, thus forming the lid of the binding site. The structures 1Q9H and 1GPI have a deletion of these three residues, thus leading to a more open substrate-binding site.

Catalytic binding site

Brooks et al. showed, by NMR, that the CBHs I from T. emersonii have a retaining mechanism of action [20]. This type of mechanism, as shown by Davies & Henrissat [9], involves a proton donor and a base separated by 5.5 Å [59–61]. Henrissat [62] classified all members of GH7, which catalyse the hydrolysis of the β-1,4-glycosidic bond of cellulose, as retaining enzymes, i.e. they retain the configuration of the anomeric carbon. Glu212 and Glu217 have been identified as the nucleophile and proton donor, respectively, in 1CEL. Sequence analysis of 1Q9H and 1GPI shows that these residues are conserved, suggesting that they carry out the same function. Based on the proposed mechanism of action from 1CEL, Glu209 of 1Q9H may act as the nucleophile, while the proton donor is likely to be Glu214. The proposed catalytic residues are separated by 5.57 Å. The Asp211 residue of 1Q9H is in a position to share a proton with the nucleophile, in a short hydrogen bond (O–O distance 2.51 Å; Fig. 8). The residue Glu214 forms a weak hydrogen bond to Asn138. A platform of hydrophobic residues has recently been identified as being mechanistically relevant as a transition-state stabilizing factor in GH family members [63]. A tyrosine residue (Tyr142), present near the −1 subsite in 1Q9H, is thought to be involved in this platform.

Discussion

This article presents the first report on the purification and 3D structural determination of a native core CBH protein, and on the cloning and over-expression of the corresponding gene, from a thermophilic fungal source. CBH IB is extremely thermostable, with a temperature optimum of 68 °C at pH 5.0 and a half-life (t½) of 68.0 min at 80 °C and pH 5.0. In comparison, Cel7A from T. reesei has a temperature optimum of 62 °C over the pH range 3.5–5.6. The cel7 gene from T. emersonii was cloned and the deduced amino acid sequence was used during the structure solution of the native enzyme. Family 7 contains both CBHs and endoglucanases. The structure of CBHs is distinguished from that of endoglucanases by the presence of loops of polypeptide chain covering the active site residues, which convert the active site cleft of endoglucanases into the characteristic tunnel of CBHs [8]. Three CBHs belonging to GH Family 7 (T. reesei 1CEL, P. chrysosporium 1GPI and T. emersonii 1Q9H) are generally similar in structure. The catalytic domains are single-domain proteins with two large antiparallel β-sheets that stack face-to-face to form a β-sandwich. The rest of the three CBHs consist almost entirely of loops

Fig. 7. Cα trace of 1Q9H (yellow) superimposed on the Cα trace of 5CEL (white), illustrating the more open active site of 1Q9H. The sugar residues are superimposed in blue. The catalytic residues are shown in red. The figure was drawn by using TURBO.

Fig. 8. Diagram of the active site of 1Q9H showing the distance between the proposed catalytic residues.


connecting the β-strands. However, on closer inspection of the structures, local variations are reflected in the sequence differences. The cellulose-binding sites in 1Q9H are more accessible than those in 1CEL. The absence of the three amino acids that form a tight turn over the −5/−6 subsites in 1CEL confers a more open entrance to the cellulose-binding sites in 1Q9H. This proposal is supported by the replacement of Asn7 in 1CEL by a smaller threonine residue in 1Q9H and 1GPI at the −7 subsite, and of Tyr371 in 1CEL by Ala374 in 1Q9H at the −3/−4 subsites. A tyrosine residue present in 1GPI and 1Q9H, but absent in 1CEL, has been suggested by Munoz et al. [17] to be an additional substrate-binding site. The more open tunnel structure is probably an adaptation to the lack of a CBM, allowing short-chain oligosaccharides more access to the active site. Supporting evidence comes from the higher catalytic rate (kcat) and catalytic efficiency (kcat/Km) of 1Q9H with the oligosaccharide derivative 4-NP-lactopyranoside: 13.4 s⁻¹ and 3.6 s⁻¹·mM⁻¹ [21], compared with 0.093 s⁻¹ and 0.23 s⁻¹·mM⁻¹ [12] for 1CEL. An insertion of eight amino acid residues common to 1Q9H, P. chrysosporium Cel7D and T. reesei endoglucanase Cel7B can be seen. Although this insertion is located at the outer regions of the structure, it could potentially have implications for function and will be the target of future protein engineering studies.
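The kinetic constants quoted for 4-NP-lactopyranoside can be put side by side as fold differences. The short sketch below does only that arithmetic on the figures given in the text. The dictionary layout and function names are illustrative. The implied Km values are an inference from Km = kcat / (kcat/Km), not figures reported here, and because the two sets of constants come from different studies ([21] and [12]) the comparison is indicative only.

```python
# Values quoted in the text for 4-NP-lactopyranoside.
# kcat in s^-1; efficiency (kcat/Km) in s^-1 mM^-1.
KINETICS = {
    "1Q9H": {"kcat": 13.4, "efficiency": 3.6},    # ref. [21]
    "1CEL": {"kcat": 0.093, "efficiency": 0.23},  # ref. [12]
}


def fold_change(parameter):
    """Ratio of the 1Q9H value to the 1CEL value for 'kcat' or 'efficiency'."""
    return KINETICS["1Q9H"][parameter] / KINETICS["1CEL"][parameter]


def implied_km(enzyme):
    """Km in mM implied by kcat / (kcat/Km); an inference, not a reported value."""
    entry = KINETICS[enzyme]
    return entry["kcat"] / entry["efficiency"]


if __name__ == "__main__":
    print("kcat fold difference:", round(fold_change("kcat"), 1))
    print("kcat/Km fold difference:", round(fold_change("efficiency"), 1))
    for name in KINETICS:
        print(name, "implied Km (mM):", round(implied_km(name), 2))
```

On these figures the turnover number of 1Q9H is roughly 144-fold higher and the catalytic efficiency roughly 16-fold higher than for 1CEL. The smaller gain in efficiency reflects a higher implied Km for 1Q9H (about 3.7 mM against about 0.40 mM).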

The probable catalytic residues of 1Q9H, nucleophile Glu209 and proton donor Glu214, are located approximately on opposite sides of the cleavable glycosidic linkage in the −1/+1 subsites, with their carboxylic groups 5.57 Å apart. Four tryptophan residues located along the substrate-binding tunnel in 1CEL, which are the determinants of the glycosyl-binding sites, are conserved in 1Q9H. Density was poor for one of the tunnel-forming loops of 1Q9H (residues 193–197). The tunnel is composed of loops that are inherently flexible, and the absence of good density in the loops is perhaps indicative of their flexibility. It is worth noting that the structures of T. reesei GH7 CBHs were solved in the presence of substrates. However, as 1Q9H was solved in the absence of bound substrate, one could imagine that, if a substrate were present in the structure, the loops would close over the substrate, yielding a structure more like that of 1CEL.

The cel7 gene consists of a 1365 bp open reading frame, encoding 455 amino acids, interrupted by two introns. The deduced amino acid sequence revealed a secretory signal peptide and a CBH catalytic domain. The 5′ upstream region of cel7 has eight potential Cre-binding sites, and it is probable that glucose repression of cellulase transcription is mediated through a Cre protein in T. emersonii (the gene sequence for a Cre-like protein has been cloned from T. emersonii). It has been shown previously that sophorose is a weak inducer of cellulases in T. emersonii [64], but is the proposed natural inducer of cellulase expression in T. reesei. Induction of cel7 and cbhII [27] expression by gentiobiose suggests that this glucose disaccharide may be the natural cellulase inducer in T. emersonii and is indicative of an alternative cellulase induction mechanism in this fungus.
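Putative repressor-binding sites of the kind counted in the cel7 upstream region are typically located by scanning both DNA strands for a degenerate consensus. The sketch below illustrates such a scan. It assumes the consensus 5′-SYGGRG-3′ that is commonly reported for CreA/Cre1-type repressors; the text here does not state which consensus was used. The function names and the toy sequence are hypothetical and the sequence is not the cel7 promoter.

```python
import re

# IUPAC ambiguity codes needed for the assumed consensus 5'-SYGGRG-3'.
IUPAC = {"S": "[CG]", "Y": "[CT]", "R": "[AG]", "G": "G"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, consensus="SYGGRG"):
    """Return sorted (position, strand) hits for the consensus.

    Positions are 0-based and refer to the leftmost base of the site on the
    forward strand. A lookahead is used so overlapping sites are not missed.
    """
    seq = seq.upper()
    core = "".join(IUPAC[code] for code in consensus)
    pattern = re.compile("(?=(" + core + "))")
    width = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Sites on the reverse strand are found by scanning the reverse complement
    # and mapping each start back to forward-strand coordinates.
    for m in pattern.finditer(reverse_complement(seq)):
        hits.append((len(seq) - m.start() - width, "-"))
    return sorted(hits)


if __name__ == "__main__":
    toy_upstream = "ATCTGGGGTTTACCCCAGAT"  # hypothetical sequence
    for position, strand in find_sites(toy_upstream):
        print(position, strand)
```

A match to a six-base degenerate motif is only a candidate site. Whether any such site is functional would still need to be shown experimentally.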

The carbohydrate-binding module and linker region that are characteristic of some other GH family members were not encoded in the gene, in contrast to cbh2 from the same source [27]. Biochemical analysis of the CBHs from T. emersonii, previously reported from this laboratory, has shown that the hydrolysis of crystalline cellulose (Avicel) by CBH IB is 77% lower than that observed with CBH IA [21]. Earlier studies revealed that removal of the carbohydrate-binding module from the T. reesei CBH resulted in a 90% decrease in activity against Avicel [65,66]. More recently, Nutt et al. [67] have shown, by progress curve analysis, that intact CBHs from T. reesei and P. chrysosporium show higher activities than their corresponding cores against bacterial microcrystalline cellulose. Takashima et al. [68] suggest that the exoglucanase (EXO1) of H. grisea displays lower activity towards crystalline cellulose than the corresponding CBH I enzyme from this organism. The same study indicated exo-synergism between EXO1 and CBH I in the hydrolysis of crystalline cellulose, and a similar co-operativity between CBHs in T. emersonii may occur.

Despite the reduced activity of CBH IB against crystalline cellulose, the enzyme hydrolyses Avicel in a processive manner. In processive cellulose hydrolysis, initial hydrolytic attack occurs at the chain end; glucose or cellotetraose is produced only upon initial attack, and cellobiose is the principal product of hydrolysis thereafter. During hydrolysis of Avicel by CBH IB, glucose production is markedly low and remains constant after the initial hydrolytic attack. Cellobiose is the predominant product of hydrolysis and increases in concentration as the reaction proceeds [21]. The exo-loop of 1CEL (amino acids 243–256) forms the roof of the active site tunnel at the catalytic centre. Deletion of this loop has been shown to lead to a decreased processivity of 1CEL against crystalline cellulose [69]. This exo-loop is conserved in 1Q9H and is presumed to contribute to the processivity of CBH IB against crystalline cellulose. It should be noted, however, that 1GPI has a natural deletion of the exo-loop, yet Cel7D from P. chrysosporium is able to maintain high processivity, leading to efficient crystalline cellulose hydrolysis [69]. Therefore, conclusions drawn for one enzyme do not necessarily apply to others within the same family, because of different substrate preferences.

We were able to restore biological activity of the denatured reCBH, although enzyme activity remained very low. 1Q9H has nine disulphide bridges, and so regeneration of the native CBH enzyme in high yield by in vitro reoxidation of the reduced, denatured polypeptide is extremely complex. Expression at lower temperatures has been carried out and has yielded similar activity results. Therefore, heterologous expression studies in other hosts are currently in progress. Future site-directed mutagenesis of specific residues in 1Q9H should provide a valuable insight into the structural basis of the enhanced thermostability of the CBH IB protein from T. emersonii.

Acknowledgements

This work was funded by HEA pre-PRTLI and Enterprise Ireland awards to M.G.T. C.M.C. and R.T. are grateful for junior teaching fellowships from NUI, Galway, and postgraduate scholarships from Enterprise Ireland.

References

1. Enari, T.M. (1983) Microbial cellulases. In Microbial Enzymes and Biotechnology (Fogarty, W.M., ed.), pp. 183–223. Elsevier Applied Science, London.

2. Coughlan, M.P. (1985) Enzymatic hydrolysis of cellulose: an overview. Biotechnol. Genet. Eng. Rev. 3, 39–169.
