The University of Toledo
The University of Toledo Digital Repository
Theses and Dissertations
2013
Naturally-occurring fusion between the regulatory and catalytic components of type IIP restriction- modification systems
Jixiao Liang
The University of Toledo
Follow this and additional works at: http://utdr.utoledo.edu/theses-dissertations
This Thesis is brought to you for free and open access by The University of Toledo Digital Repository. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of The University of Toledo Digital Repository. For more information, please see the repository's About page.
Recommended Citation
Liang, Jixiao, "Naturally-occurring fusion between the regulatory and catalytic components of type IIP restriction-modification systems" (2013). Theses and Dissertations. Paper 134.
A Thesis entitled
Naturally-Occurring Fusion Between the Regulatory and Catalytic Components of Type IIP Restriction-Modification Systems
by Jixiao Liang
Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Biomedical Sciences
Copyright 2013, Jixiao Liang
This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.
An Abstract of
Naturally-Occurring Fusion Between the Regulatory and Catalytic Components of Type IIP Restriction-Modification Systems
by Jixiao Liang
Submitted to the Graduate Faculty as partial fulfillment of the requirements for the
Master of Science Degree in Biomedical Sciences
The University of Toledo
December 2013
Restriction-modification (R-M) systems play key roles in controlling gene flow among bacteria and archaea, and their own genetic mobility depends critically on their regulation, but the regulation of these systems is poorly understood. The PvuII R-M system is a Type IIP R-M system, in that the protective DNA methyltransferase (MTase) is a separate and independently active protein from the potentially lethal restriction endonuclease (REase). PvuII is one of the best studied of the R-M systems that use a positive feedback regulatory loop, involving a transcriptional regulator called C protein, to delay expression of the REase relative to that of the MTase. This allows protective methylation of a new host cell's DNA before the REase is produced. In searching for R-M systems related to PvuII, in order to study the evolution and variation of its regulatory system, a putative system was found in the genome sequence of the bacterium Niabella soli strain DSM 19437, in which the regulatory C protein and the REase are translationally fused. The hypothesis is that N. soli truly produces a fused C-R protein, and that it is active both as a REase and as an autogenous regulator. The genes for the N. soli R-M system were synthesized, and the encoded fusion protein was produced and purified with affinity tags; production of full-length C-REase fusion protein was confirmed. The dual activity of the fusion protein was determined by in vitro restriction of known DNAs, and by in vivo transcriptional activation of a lacZ fusion to the promoter on which the C protein acts.
This work is dedicated to my parents, Zhao-jun Liang and Gui-ying Xu, for their love and support.
Acknowledgements
This thesis and the associated research would not have been possible without the ever-patient guidance of my mentor and major advisor, Dr. Robert Blumenthal. I would like to express my sincere gratitude to him for his continuous support of my graduate study and research, and for his patience and encouragement. He recognizes my strengths and weaknesses, which keeps me motivated. I am also grateful for all his advice about life, career and everything else.
I would additionally like to thank my committee members, Dr. Jason Huntley and Dr. Steve Patrick, for their valuable time, constructive suggestions, and criticisms during my study.
Further, for her constant support as an instructor in the lab and a friend in life, I would like to sincerely thank my lab mate Dr. Kristen Williams. Also, my friends Dr. Guo-ping Ren and Dr. Gang Ren have offered me valuable advice and help on my experiments. Last but not least, I would like to thank all the students, faculty, and staff in the Medical Microbiology and Immunology Department. Thank you all!
Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
Chapter 1: Literature Review
Chapter 2: Materials and Methods
Chapter 3: Results
Chapter 4: Discussion and Conclusion
References
List of Figures
Figure 1. Complex formed by R.PvuII and its cognate DNA
Figure 2. PvuII R-M system control region
Figure 3. Structure of C.AhdI
Figure 4. Sequence of synthesized NsoJS138I R-M system
Figure 5. Vector map of constructed plasmids
Figure 6. Alignment of CR fusion proteins orthologous to C.PvuII and R.PvuII
Figure 7. Test of CR fusion protein production
Figure 8. Test of CR fusion protein production
Figure 9. Assessment of REase activity in CR.NsoJS138I
Figure 10. Confirmation of specific digestion conditions
Figure 11. Assessment of C activity in CR.NsoJS138I
Figure 12. Possible interactions of C-REase fusion polypeptides
Chapter 1
Literature Review
1 Restriction-modification (R-M) systems
The biological phenomena of restriction and modification were first recognized in the early 1950s, and the first R-M system was cloned in E. coli in the late 1970s [1]. R-M systems are present in the great majority of bacteria and archaea, with more than 3000 found to date (most by detecting MTase gene sequences) [2]. As the term indicates, a typical R-M system comprises two activities: a restriction endonuclease (REase) that cleaves DNA at a target sequence, and a methyltransferase (MTase) that modifies the same sequence to protect it from the cognate REase [2]. Four broad types of R-M systems have been reported so far, each with unique characteristics, and the two enzymes have been combined into a single multi-subunit protein in some of the systems [3]. However, in Type IIP R-M systems, the REase and MTase separately execute their opposing intracellular enzymatic activities [3].
1.1 Restriction Endonuclease (REase)
The REase catalyzes the cleavage of double-stranded DNA, generally on both strands. REases recognize specific sequences on the target DNA, and cleavage occurs via hydrolysis of one phosphate-deoxyribose bond in the backbone of each DNA strand [4]. Typically, such enzymatic activity takes place without energy input, but commonly requires Mg2+ or a similar divalent cation; some REases also require, or are stimulated by, ATP or S-adenosylmethionine (AdoMet) [5]. REases appear to come from very different evolutionary backgrounds, and are difficult to identify from their sequences alone [6-8].
1.2 Modification Methyltransferase (MTase)
REase cleavage of DNA could be lethal to cells producing R-M systems. To protect endogenous DNA from the REase, the paired (cognate) MTase catalyzes addition of a methyl group to one nucleotide in each strand of the recognition sequence, with the identities and positions varying from MTase to MTase [9]. AdoMet always serves as the methyl donor and is thus an essential cofactor for methylation [10]. The sensitivity of the REase of R-M systems to methylation on the recognition sequences usually prevents cleavage of endogenous DNA. However, while cleavage can be prevented by the cognate methylation, noncognate methylation occurring elsewhere in the recognition sequence may or may not prevent cleavage [11].
1.3 Types of restriction modification systems
R-M systems are classified based on enzyme composition and cofactor requirements, recognition sequence symmetry, and cleavage position [3, 12]. Because my research defines a new subtype of R-M system, in which the REase and regulatory C protein are fused, it is appropriate to describe the various known types of R-M systems.
1.3.1 Type I Systems
Type I systems are considered the most complex R-M systems, as they consist of three polypeptides: R (Restriction), M (Modification) and S (Specificity). These form a complex that can both cleave and methylate DNA in an energy (ATP)-dependent manner. About half of bacterial genomes contain closely linked genes that are predicted to code for these three polypeptides, based on screening of the present database of complete genomic sequences [13]. Furthermore, because cleavage occurs at a considerable distance from the recognition site in most cases, it is difficult to visualize discrete bands by gel electrophoresis [14]. Thus these enzymes have substantial biological significance, but have not yet found major biotechnological uses.
1.3.2 Type II Systems
Type II systems are believed to be the simplest and most prevalent R-M systems. In contrast to Type I systems, the Type II REase and MTase act independently without the need for a specificity protein, and each has its own simple catalytic requirement: the REase requires Mg2+ (or a similar divalent cation) and the MTase requires AdoMet [14]. Type IIP REases are generally active as homodimers, while most Type II MTases act as monomers in catalyzing the addition of methyl groups to the cognate DNA [14, 15]. Early on it was recognized that while typical Type II enzymes recognized palindromic sequences and cleaved symmetrically within them, the Type IIS enzymes cut outside their normally asymmetric sequences and differed in other interesting ways [16]. There are many subdivisions of Type II enzymes, classified based on their recognition and cleavage differences [3]. Specifically, some of the criteria are based on the sequence cleaved and others on the structure of the enzymes themselves, so not all subdivisions are mutually exclusive [3]. Type IIP designates the enzymes that recognize symmetric sequences (palindromes) [3]. Some new subclasses of Type II R-M systems involve fusion of components, such as between the REase and MTase [17-20].
1.3.3 Type III and Type IV Systems
The Type III MTase and REase form a complex that carries out both modification and cleavage [21]. As in Type II systems, Mg2+ and AdoMet are essential cofactors for the Type III REase and MTase, respectively; in the presence of these cofactors, the REase-MTase complex competes internally between modifying and restricting at the same DNA position [22]. As a consequence, incomplete digestions are typical [14]. The Type IV REases cleave only modified DNA, which may contain methylated, hydroxymethylated or glucosyl-hydroxymethylated bases [3]. However, their recognition sequences have usually not been well defined except for EcoK-McrBC, and cleavage occurs ~30 bp away from one of the sites [3]. The Escherichia coli McrBC enzyme, the best studied of the Type IV REases and the only one that is commercially available, requires two purine methylcytosine/hydroxymethylcytosine sites separated by 40–3000 base pairs for cleavage [23].
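The spacing requirement described for McrBC can be made concrete with a minimal sketch. It assumes that the positions of modified purine-methylcytosine sites are already known and that "separation" means the difference between site coordinates; the function name and the example coordinates are hypothetical and serve only as illustration.

```python
def mcrbc_site_pairs(site_positions, min_gap=40, max_gap=3000):
    """Return pairs of modified sites whose spacing could support McrBC cleavage.

    site_positions: coordinates (bp) of modified sites (assumed to be known).
    min_gap/max_gap: the 40-3000 bp window quoted in the text; treating the
    spacing as a simple coordinate difference is an assumption of this sketch.
    """
    positions = sorted(site_positions)
    pairs = []
    for i, first in enumerate(positions):
        for second in positions[i + 1:]:
            gap = second - first
            if gap > max_gap:
                break  # positions are sorted, so later sites are even farther away
            if gap >= min_gap:
                pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    # Hypothetical site coordinates, for illustration only
    print(mcrbc_site_pairs([0, 10, 50, 4000]))
```

A single isolated site, or sites spaced too closely or too far apart, yield no pairs under this sketch, which is consistent with the two-site requirement stated above.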
2 Roles and Control of R-M systems
One major function of R-M systems is to protect bacterial cells from bacteriophage infection or invasion by foreign DNA [24]. In addition to being bacterial defense systems, R-M systems serve a diverse range of functions, such as stabilization of genomic islands, maintenance of bacterial fitness and nutrition, immigration control, recombination and genome rearrangement, evolution of genomes, enforcing methylation on the genome, and so forth [25].
Lethal DNA damage would occur if the two R-M enzymatic activities (MTase and REase) were unbalanced [26]. This is particularly true when R-M genes first enter a new host cell that has completely unmethylated DNA [27]. Therefore, a timing delay between expression of the MTase and REase is expected to occur in Type IIP systems, and this has been documented in PvuII [28]. Specifically, there is a ~10-min delay between the appearance of MTase and REase transcripts and activities [28]. This finding improves our understanding of the mobility of R-M systems.
3 PvuII R-M system and its regulatory characteristics
3.1 Overview of PvuII
PvuII was discovered [29] and then cloned into E. coli from its original host Proteus vulgaris about three decades ago [30]. Since then, it has been the subject of many regulatory studies [31-34]. This system was also the first R-M system to have had both the REase [35, 36] and MTase [37] structures crystallographically determined. Because this study reports a REase-C protein fusion, it is important to discuss the structures of those two components.
3.2 Structure and function of PvuII restriction endonuclease
Figure 1. Complex formed by R.PvuII and its cognate DNA. In this view, the enzyme is shown in ribbon representation in purple, with the DNA strands in green and cyan. The amino termini of the two REase subunits are at the right. The image is structure 1EYU of the Protein Data Bank (managed by the Research Collaboratory for Structural Bioinformatics). The image is in the public domain.
With the application of X-ray crystallography, active PvuII endonuclease has been shown to be a homodimeric protein, with the subunit interface region consisting of a pseudo three-helix bundle at the amino end [35]. Three regions have been identified in R.PvuII, namely the subunit interface region, the catalytic region and the DNA recognition region. The recognition sequence for R.PvuII cleavage is CAG↓CTG, and such cleavage is prevented by N4-methylcytosine (yielding CAGN4mCTG), generated by its cognate methyltransferase [27].
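The recognition and protection rules just described can be illustrated with a minimal sketch. It assumes a linear DNA molecule given as a single-strand string, a blunt cut in the middle of each CAGCTG site, and a set of site start positions that have been protected by cognate methylation. All function and variable names are hypothetical.

```python
import re

PVUII_SITE = "CAGCTG"  # recognition sequence given in the text
CUT_OFFSET = 3         # CAG^CTG: the cut falls after the third base of the site


def pvuii_cut_positions(seq, methylated_sites=()):
    """Return cut coordinates (index of the first base after each cut).

    methylated_sites holds 0-based start positions of sites assumed to carry
    the protective N4-methylcytosine; those sites are skipped.
    """
    seq = seq.upper()
    cuts = []
    for match in re.finditer("(?=" + PVUII_SITE + ")", seq):
        if match.start() in methylated_sites:
            continue  # cognate methylation prevents cleavage
        cuts.append(match.start() + CUT_OFFSET)
    return cuts


def digest(seq, methylated_sites=()):
    """Split a linear sequence into the fragments expected after digestion."""
    bounds = [0] + pvuii_cut_positions(seq, methylated_sites) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "AAACAGCTGTTTCAGCTGGG"  # made-up sequence with two sites
    print(digest(dna))                        # both sites cut
    print(digest(dna, methylated_sites={3}))  # first site protected
```

Passing the start position of a site in `methylated_sites` leaves that site uncut, mirroring the protection conferred by the cognate MTase.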
3.3 C- protein and its regulatory roles
3.3.1 Overview
In addition to the MTase and REase genes, a subset of Type II R-M systems contains regulatory genes. The regulatory C (controller) gene was first discovered in the PvuII [38, 39] and BamHI [40] R-M systems. A milestone in characterizing the PvuII system was the identification of regulatory elements called “C-boxes” between the pvuIIM and pvuIIR genes, which control the timing of REase and MTase expression [28, 39, 41]. C-boxes are where the C protein binds to exert its effects [31]. While the location of the MTase gene varies among R-M systems, in those that have C proteins the C gene is typically upstream of the REase gene [31].
Figure 2. PvuII R-M system control region. Two transcription starts for pvuIICR are identified by rightward bent arrows: from the C-independent weak promoter (left) and the C-dependent strong promoter (right) [38]. The two pvuIIM promoters are also shown (leftward bent arrows). Gray wavy lines represent the resulting mRNAs.
3.3.2 C protein-dependent regulatory circuit in PvuII
C proteins (encoded by the C gene), where tested, activate transcription of their own gene (‘autogenous’ activation). They are believed to be responsible for the delay in REase activity, since the REase gene typically does not have its own promoter [33] and is completely dependent on transcription from the upstream autogenously regulated C gene [42]. Thus, when the R-M genes enter a new cell and no C protein is present, MTase is expressed while C protein (and REase) are initially produced at very low levels. As C protein accumulates, the positive feedback loop results in a sharp increase in C and REase expression [33, 34]. The C protein acts as both an activator and a repressor, so it can prevent overexpression of the REase [43].
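The qualitative behavior of such a positive feedback loop (a slow start from basal expression, followed by a sharp rise) can be sketched with a toy model. The model below is an illustrative assumption and not a description of the PvuII circuit: C protein is made at a low basal rate plus a self-activated rate described by a Hill function, and is lost by first-order decay. All parameter values are arbitrary, and the repressor role of C protein at high concentration is not modeled.

```python
def simulate_c_protein(basal=0.02, vmax=1.0, k=0.5, hill=2,
                       decay=0.1, dt=0.01, t_end=100.0):
    """Euler integration of dC/dt = basal + vmax*C^n/(k^n + C^n) - decay*C.

    All parameters are hypothetical (arbitrary units); the function returns
    a list of (time, C) tuples starting from C = 0 (a newly entered host).
    """
    c = 0.0
    trace = [(0.0, c)]
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        activation = vmax * c ** hill / (k ** hill + c ** hill)
        c += dt * (basal + activation - decay * c)
        trace.append((i * dt, c))
    return trace


if __name__ == "__main__":
    with_feedback = simulate_c_protein()
    without_feedback = simulate_c_protein(vmax=0.0)
    for label, trace in (("feedback", with_feedback),
                         ("no feedback", without_feedback)):
        print(label, "final C =", round(trace[-1][1], 3))
```

In this sketch the accumulation rate is lowest at the start, when only the basal (C-independent) term contributes, and increases as C protein builds up. Setting `vmax=0.0` removes the feedback and leaves only the low basal level.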
3.3.3 C protein structure
Studies in Type II R-M systems have indicated that C proteins are only active as homodimers [44, 45]. The dimerization of C proteins is required for DNA binding and, considering the relatively low stability of the dimer itself, this appears to be an important component of the genetic switch that delays transcription of the C gene, and consequently that of the endonuclease (R) gene transcribed from the same promoter [46]. The regulatory C protein of another R-M system, AhdI, has been crystallized [44], and a high-resolution crystal structure of C.AhdI was described two years later by the same group [47]. The high-resolution structure of C.AhdI reveals a compact, single-domain homodimer that can be classified as an all-alpha protein: 65% of the residues are in a helical conformation, with no beta-sheet present [44] (Figure 3).
Figure 3. Structure of C.AhdI [44]. In this view, the dimeric protein is shown in ribbon representation. The image is structure 1Y7Y of the Protein Data Bank (managed by the Research Collaboratory for Structural Bioinformatics). The image is in the public domain.
4 CR fusion protein in Type II R-M systems
The PvuII R-M system is one of the best studied of the group that uses a positive feedback regulatory loop to delay restriction endonuclease (REase) expression with respect to DNA methyltransferase (MTase) expression [43], allowing protective methylation of a new host cell’s DNA before the REase is produced. To better understand the variation in and evolution of this regulatory system, I searched for other R-M systems closely related to PvuII. This work is described under Results; briefly, a group of related systems had naturally-occurring fusions between the C and REase proteins. I provide here some background on the considerations underlying my studies on one of these fused systems. Gene fusion is a major contributor to the evolution of multi-domain bacterial proteins, and typically results in one long composite protein in one organism in place of two or more smaller split proteins in another organism [48, 49].
4.1 Identification of the CR fused Type II R-M systems
To search for R-M systems closely related to PvuII, the REase (R.PvuII) amino acid sequence was used as the search seed in TBLASTN [61]. This was done because the C proteins are fairly well conserved [31, 33, 50], and the MTase proteins have well-conserved motifs [24, 51], so using them as search seeds would likely give a higher background of unrelated R-M systems. In contrast, the generally poor conservation of REases implies that two R-M systems would have similar REase sequences only if they are closely related. One fused polypeptide with portions similar to both C.PvuII and R.PvuII was found in the bacterium Niabella soli.
4.1.1 Overview of Niabella soli
The genus Niabella was proposed by Kim et al. (2007) [52] for a bacterium isolated from soil. This genus was characterized as Gram-negative, aerobic, non-flagellated, flexirubin-pigment-producing bacteria that form short rods. Shortly after that, a dark yellow-colored bacterium, JS13-8T, was isolated from a soil sample from Jeju Island, Republic of Korea [53]. The cells were aerobic, Gram-negative, non-motile, short rods. Growth occurred at 15–35 °C (optimally at 30 °C). On the basis of the phylogenetic, physiological and chemotaxonomic data, strain JS13-8T was deemed to represent a novel species of the genus Niabella, for which the name Niabella soli sp. nov. was proposed [53]. Subsequent to our discovery of a fused system in N. soli, it was also detected by an automated sequence search by the curators of REBASE [2], which is a continuously updated R-M system database. We have adopted their nomenclature, NsoJS138I, for this system, following their entry on April 10, 2013. They performed no biochemical characterization of the R-M system.
4.1.2 Translational frameshifting as a possible mechanism for production of free C protein in such fused systems
C.NsoJS138I and R.NsoJS138I are clearly fused at the sequence level, as described in Results. However, it is possible that a certain amount of free NsoJS138I C or REase protein is produced via translational frameshifting or post-translational processing. Post-translational processing could involve proteolytic cleavage that yields free C and free REase polypeptides. Alternatively, free C protein (but not free REase) could result from ribosomal frameshifting during translation, which can occur when a ribosome encounters certain sequence patterns in the mRNA [54]. Translational frameshifting represents an alternative process of protein translation [55], and occurs much more frequently than was originally expected [56]. For instance, a study of ribosomal frameshifting on the sequence GCAAAA has shown that this pattern is associated with efficient -1 ribosomal frameshifting in Escherichia coli [57].
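A first step in evaluating the frameshifting possibility is simply to locate candidate motifs in a coding sequence. The minimal sketch below scans a sequence for the GCAAAA pattern mentioned above and reports each position together with its offset relative to the reading frame that starts at the first base. The function name and the example sequences are hypothetical, and the presence of the motif alone does not demonstrate that frameshifting occurs.

```python
def find_motif_sites(mrna, motif="GCAAAA"):
    """Return (position, frame_offset) for every occurrence of the motif.

    position is the 0-based index of the first base of the motif;
    frame_offset is position % 3, i.e. where the motif starts relative to
    codon boundaries, assuming the reading frame begins at index 0.
    RNA input is accepted by converting U to T before searching.
    """
    seq = mrna.upper().replace("U", "T")
    sites = []
    start = seq.find(motif)
    while start != -1:
        sites.append((start, start % 3))
        start = seq.find(motif, start + 1)
    return sites


if __name__ == "__main__":
    # Made-up sequences, for illustration only
    print(find_motif_sites("ATGGCAAAAGGTGCAAAATAA"))
    print(find_motif_sites("AUGAGCAAAAGG"))
```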
4.1.3 Novel demonstration of CR fusions in Type II R-M systems
Natural and synthetic fusions of the REase and MTase polypeptides have been observed, and found to be active [17-20]. However, this thesis focuses on naturally-occurring fusions between the REase and the regulatory C protein. Such fusions have been predicted by automated annotation systems, such as REBASE, but have never been tested and shown to be active for either the REase or the C protein component.
Chapter 2
Materials and Methods
Gene synthesis
The sequence containing the complete R-M system of Niabella soli (1837 nt, from the NCBI database; GenBank accession # NZ_AGSA01000028) was obtained from Genscript Inc. (Piscataway, NJ). Some modifications were made to optimize the distribution of restriction sites, but without changing the specified amino acids (Figure 4). The inferred NsoJS138I C-box and promoter region (161 nt) was also obtained from Genscript, and for cloning purposes the restriction sites XmaI (at the C gene end) and BamHI (at the R gene end) were appropriately placed.
Cloning strategy
The R-M system Mru1279I (~2.4 kbp) was cloned into the high-copy vector pUC19, using NruI (at the CR gene end) and BamHI (at the M gene end) sites. Genscript synthesized the complete NsoJS138I system, but could only clone it into the low-copy-number vector pCC1 (they normally use the higher-copy pUC57). This presumably resulted from a frameshift error in the MTase gene, due to an error in the requested sequence. To avoid the apparent toxicity, a truncated version was subcloned, consisting of only the fused CR gene of NsoJS138I and missing a portion of the COOH-end of the REase (so the MTase would not be required). The truncated NsoJS138ICR was cloned into the pACYCDuet-1 vector (Novagen®), with the N-terminus (C protein end) in-frame with the His-tag (using the BamHI and SalI sites), and preceded by a T7 promoter. This plasmid, pJL100, is referred to for readability as “pNsoShort”. Full-length NsoJS138ICR was also cloned into this vector, by transforming an E. coli strain containing the pre-expressed PvuII MTase [58], with the NsoJS138ICR COOH-terminus (REase end) in-frame with the His-tag (using the NcoI site); this plasmid was named pJL200 (“pNso”). The truncated product would be ~1.5 kDa smaller than the full-length one.
The synthesized NsoJS138I “C-box” region was digested with BamHI and XmaI and ligated into pBH403, which is a derivative of pKK232-8 and contains a promoterless lacZ gene between two bidirectional transcription terminators [59], yielding pJL300 (“pBoxLac”). These plasmids are illustrated in Figure 5. The oligonucleotide primers used for PCR amplification are shown below (all in the 5'→3' direction).
Primer set for cloning the complete Mru1279I R-M system:
ggtTCGCGActtccgggtctacacctcaa; ggtGGATCCagccctaaccagccgtaaat
Primer set for making the truncated NsoJS138ICR PCR product for pJL100:
aatGTCGACttatttgggattattaatatccttatcac; aatGGATCCgatgaacgaaccaaatgc
Primer set for making the full-length NsoJS138ICR PCR product for pJL200:
cgtCCATGGacaaaagtcttatgccat; cgtCCATGGatgaacgaaccaaatgctta
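As a consistency check on the primer sets above, the uppercase portion of each primer can be compared with the recognition sequences of the enzymes named in the cloning strategy. The sketch below uses the standard recognition sequences for NruI, BamHI, SalI and NcoI. The primer labels ("primer 1", "primer 2") and function names are hypothetical, and no assumption is made about which primer of a pair is forward or reverse.

```python
import re

# Standard recognition sequences of the enzymes named in the cloning strategy
SITES = {"NruI": "TCGCGA", "BamHI": "GGATCC", "SalI": "GTCGAC", "NcoI": "CCATGG"}

# Primers copied from the text; the labels are arbitrary
PRIMERS = {
    "Mru1279I primer 1": "ggtTCGCGActtccgggtctacacctcaa",
    "Mru1279I primer 2": "ggtGGATCCagccctaaccagccgtaaat",
    "pJL100 primer 1": "aatGTCGACttatttgggattattaatatccttatcac",
    "pJL100 primer 2": "aatGGATCCgatgaacgaaccaaatgc",
    "pJL200 primer 1": "cgtCCATGGacaaaagtcttatgccat",
    "pJL200 primer 2": "cgtCCATGGatgaacgaaccaaatgctta",
}


def uppercase_site(primer):
    """Return the first run of uppercase bases, which marks the added site."""
    match = re.search(r"[ACGT]+", primer)
    return match.group(0) if match else None


def identify_site(primer):
    """Return the name of the enzyme whose site matches the uppercase run."""
    site = uppercase_site(primer)
    for enzyme, recognition in SITES.items():
        if recognition == site:
            return enzyme
    return None


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, uppercase_site(primer), identify_site(primer))
```

Under these assumptions, the GTCGAC site in the first pJL100 primer corresponds to SalI, and both pJL200 primers carry NcoI sites, in agreement with the enzymes used for each construct.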
Figure 4. Sequence of synthesized NsoJS138I R-M system. The initiators of the CR and M genes are in green. The red arrow near the top indicates the position at which the C-REase gene is interrupted in the truncated clone (pJL100, pNsoShort).
Figure 5. Vector map of constructed plasmids.
(A) pJL100 (“pNsoShort”), truncated version of the C-REase fusion with N-terminal His tag;
(B) pJL200 (“pNso”), full-length version of the C-REase fusion with COOH-terminal His tag;
(C) pJL300 (“pBoxLac”), promoter-C-box region fused to promoterless lacZ reporter gene.
Protein expression & purification
Twenty ng each of the truncated and full-length versions of NsoJS138ICR DNA were transformed into a BL21 (DE3) E. coli strain (Invitrogen™) that has isopropylthio-β-D-galactoside (IPTG)-inducible T7 RNA polymerase expression. Overnight cultures of cells in stationary phase were subcultured at a 1:20 dilution into 250 mL of LB medium (as per the QIAexpress® protocol for His-tagged protein purification) and grown at 37 °C. IPTG was added to a final concentration of 0.5 mM when the subculture reached mid-log phase (OD600 ~0.46). Cells were grown for another 2.5 h before being harvested by centrifugation and frozen at -80 °C. The QIAexpress® Ni-NTA Fast Start Kit was used to purify 6xHis-tagged protein (under native conditions). PMSF protease inhibitor was added (final concentration of 0.5 mM) to the lysis buffer immediately before purification of full-length NsoJS138ICR. Purified protein was added immediately to either Diluent B (NEB#B8002S, for protection of REase activity) or 2x SDS-PAGE sample buffer (1:1 solution), and stored at -20 °C. Protein concentration was determined by the Pierce 660 nm Assay (Thermo Scientific).
Western blots
Purified proteins were separated by SDS-PAGE (Novex® 10–20% Tris-Glycine gradient gel), and either stained with standard Coomassie blue or blotted onto PVDF membranes at 30 V for 2 h using an Xcell apparatus (Invitrogen). For signal detection, membranes were blocked by incubation at 4 °C overnight in 1% BSA-0.1% Tween-20 in PBS, followed by incubation with a 1:1,000 dilution of mouse anti-His tag monoclonal antibody (Millipore) for 2 h at 4 °C, followed by three 10-min washes. The blots were then incubated with horseradish peroxidase (HRP)-conjugated goat anti-mouse IgG (1:15,000, Invitrogen) for 2 h at room temperature. After three 10-min washes, protein bands were visualized by ECL Plus enhanced chemiluminescence (GE Healthcare) and images were captured using an Alpha Innotech FluorChem HD Imaging System. Minor adjustments of brightness and contrast were made to better visualize the data, but in all cases the same adjustments were applied to the complete image panel as a whole. The pre-stained MW markers used were SeeBluePlus (Invitrogen).
Restriction activity assay
To assess the enzymatic activity of NsoJS138I REase, bacteriophage lambda DNA (NEB#N3011S) was used as substrate. Restriction enzyme PvuII (NEB#R0151S), whose digestion pattern on lambda DNA is already known, was used as a standard control. NsoJS138IR (2.36 µg) or 10 u PvuII was incubated with 1.5 µg of lambda DNA for 1 h at 27, 32, 37 or 42 °C, in each of four NEB standard buffers, and the DNA was resolved on 0.8% agarose gels. Empty pUC19 vector DNA (0.8 µg) was also used as substrate.
Assays for C protein activity
pJL300 was transformed into an IPTG-inducible Tn7 E. coli DE3 strain (Lac-) carrying the NsoJS138IC RM system. The LacZ assay was based on hydrolysis of O-nitrophenyl-β-D-thiogalactoside using the Miller units as modified by others [60]. Briefly, β-galactosidase activity and culture density were measured at 20–30 min intervals during exponential growth. The units for this assay were calculated by dividing