TOWARDS AUTOMATIC GENE SYNTHESIS WITH BIOINFORMATICS SOFTWARE, NOVEL ONE-STEP REAL-TIME PCR ASSEMBLY, AND LAB-CHIP
GENE SYNTHESIS
HUANG MO CHAO
(B.Eng.), XJTU
A THESIS SUBMITTED
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF ELECTRICAL & COMPUTER
ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2009
ACKNOWLEDGEMENTS
It is the blessing from my Lord Jesus Christ, who has made this work possible.
I would like to express my sincere thanks to my supervisor Dr Li Mo-Huang, for his patience and unfailing guidance. Without his valuable suggestions and support, this work would not have been successful. Special thanks also go to Associate Professor Adekunle Olusola Adeyeye, my NUS supervisor, for all his kind help and support during my four years of PhD study.
Many thanks should be addressed to Dr Yang Yi Yan for the valuable suggestions in developing the hydrogel valve, Dr Danny Van Noort for the useful comments on the device design, Fan Lee for providing different hydrogel materials, and all my friends in the Institute of Bioengineering and Nanotechnology for their endless help. Special thanks go to Professor Jackie Ying and Ms Noreena AbuBakar for providing me the opportunity to work in IBN and supporting me all the time.
I am deeply grateful for the various team mates I encountered during my stay at IBN. I would like to thank Dr Cheong Wai Chye, Dr Bode Marcus, Dr Ali Emril Mohamed, Wei Jiashen, Chua Jay, Muller Stefanie, Kuan Yoke Kong, Ye Hongye, Sim Choon Kiat, Khor Samuel, Loh Nicholas and the students in the high-throughput gene synthesis group in IBN. In addition to their generous help and strong support, I also enjoyed great companionship.
My greatest appreciation should go to my parents and grandmother for their endless love and concern throughout my life. Without them, I would not have made it so far in life. Finally, I would like to give special thanks to my boyfriend, Brian, who has been supportive, loving and encouraging all the time.
This thesis is especially dedicated to all of you.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
SUMMARY
LIST OF FIGURES
LIST OF TABLES
CHAPTER I INTRODUCTION
1.1 Overview of gene synthesis
1.2 Challenges in Gene Synthesis
1.3 Motivation
1.4 Objectives of this PhD thesis
1.5 Thesis Outline
CHAPTER II GENE SYNTHESIS METHODS
2.1 Introduction
2.2 Bioinformatics in Gene Synthesis
2.3 Biochemical method of gene synthesis
2.3.1 LCR based gene assembly
2.3.2 PCR based gene assembly
2.3.3 Real time PCR
2.3.4 DNA extraction and purification
2.3.5 Enzymatic error filtering
2.3.6 Cloning and sequencing of synthetic DNA
2.4 Fundamentals of lab-on-a-chip
2.4.1 Microvalves
2.4.2 Micromixers
2.4.3 On-chip PCR
2.4.4 On-chip DNA purification
CHAPTER III DESIGN AND OPTIMIZATION OF OLIGONUCLEOTIDES
3.1 Introduction
3.2 TmPrime oligonucleotide design methods and functional modules
3.2.1 Fast and flexible oligonucleotide design
3.2.2 Multiple-pool assembly
3.2.3 Mis-hybridization screening
3.2.4 Codon optimization
3.3 Experimental evaluation of TmPrime performance
3.3.1 Target Proteins
3.3.2 Real-time gene assembly and amplification
3.3.3 LCR assembly
3.4 Results
3.4.1 Designing oligonucleotides for target proteins
3.4.2 Oligonucleotide assembly and amplification
3.4.3 Comparison with existing oligonucleotide design programs
3.5 Discussion
3.6 Conclusion
CHAPTER IV TOPDOWN ONE-STEP GENE SYNTHESIS
4.1 Introduction
4.2 Principle of Top-Down PCR based gene synthesis
4.3 Experimental verification of TopDown one-step gene synthesis
4.3.1 Design of oligonucleotide for gene synthesis
4.3.2 Non-competition one-step real-time gene synthesis
4.3.3 One-step and two-step PCR-based gene synthesis
4.3.4 Agarose gel electrophoresis
4.4 Performance of TopDown one-step gene synthesis and its real-time analysis
4.4.1 Performance of TD one-step gene synthesis
4.4.2 Analysis of real-time gene synthesis
4.5 Discussion
4.6 Conclusion
CHAPTER V AUTOMATIC TOUCHDOWN ONE-STEP GENE SYNTHESIS
5.1 Introduction
5.1.1 Principle of Automatic TouchDown one-step gene synthesis
5.1.2 Mechanisms of PCR synthesis process
5.2.1 Design of oligonucleotides for gene synthesis
5.2.2 Automatic TouchDown one-step real-time gene synthesis
5.2.3 Gel electrophoresis
5.3 Theoretical analysis of DNA hybridization kinetics
5.4 Real-time performance study of ATD one-step gene synthesis
5.4.1 Effect of varying extension time during ATD one-step gene synthesis
5.4.2 Effect of varying initial oligonucleotides concentration
5.4.3 Effect of varying annealing temperature
5.4.4 Synthesis of long gene by ATD process
5.4.5 Effect of varying dNTP concentration
5.4.6 Effect of melting temperature uniformity of partitioned oligonucleotides
5.5 Discussion
CHAPTER VI INTEGRATED TWO-STEP GENE SYNTHESIS ON CHIP
6.1 Introduction
6.2 Device fabrication and thermal cycling system construction
6.2.1 Microfluidic device fabrication
6.2.2 Preparation of hydrogel valves
6.2.3 PCR thermal cycling
6.3 Experimental verification of the integrated two-step gene synthesis chip
6.3.1 Gene assembly and amplification
6.3.2 Solid-phase buffer exchange
6.3.3 Agarose gel electrophoresis
6.3.4 DNA sequencing
6.4 Results and discussion
6.4.1 Device operation
6.4.2 In situ hydrogel valve
6.4.3 PCR thermal cycling
6.4.4 Comparison of one-step and two-step gene syntheses
6.4.5 Thermally enhanced solid-phase PCR purification
6.5 Conclusion
CHAPTER VII CONCLUSIONS AND FUTURE PLAN
7.1 Summary
7.2 Future work
7.2.1 Synthesis of long difficult genes
7.2.2 Error filter
7.2.3 Integration of real-time fluorescence detection with gene synthesis system
Author’s Publications
References
Appendix I
Appendix II
Appendix III
Appendix IV
Towards automatic gene synthesis with bioinformatics software, novel one-step real-time PCR assembly, and lab-chip gene synthesis
Huang Mo Chao
Under the supervision of Associate Professor Adekunle Olusola Adeyeye at National University of Singapore and Dr Li Mo-Huang at Institute of Bioengineering and Nanotechnology
SUMMARY
This PhD thesis presents the full course of gene synthesis method development and optimization, including the development of the bioinformatics software TmPrime and of the TopDown and Automatic TouchDown one-step gene synthesis methods. Based on the developed protocols, the thesis also demonstrates an integrated gene synthesis device that is capable of performing two-step gene synthesis and of purifying the synthesized product for downstream applications.
The bioinformatics software TmPrime is developed to optimize oligonucleotide design. It designs oligonucleotides with homogeneous melting temperatures for both ligase chain reaction (LCR) and gapless PCR assembly of very long gene sequences. Potential mis-hybridization, hetero-dimer, homo-dimer and hairpin formations among the oligonucleotides are screened by pair-wise sequence alignment. The utility of TmPrime is demonstrated by synthesizing three genes using gapless one-step or two-step processes.
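The pair-wise screening idea can be illustrated with a minimal Python sketch. It is not TmPrime's algorithm: it replaces a scored sequence alignment with a simple search for the longest exact complementary run, and the function names and the threshold `min_run` are assumptions made for illustration only. In a real oligonucleotide set, the intended overlaps between neighbouring oligonucleotides would also be reported and would have to be excluded.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def longest_common_run(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def screen_pairs(oligos, min_run=8):
    """List (i, j, run) where oligo i can base-pair with oligo j over at
    least min_run contiguous bases. i == j flags self-complementarity
    (possible homo-dimer or hairpin). min_run is a hypothetical threshold."""
    hits = []
    for i in range(len(oligos)):
        for j in range(i, len(oligos)):
            run = longest_common_run(oligos[i].upper(),
                                     reverse_complement(oligos[j]))
            if run >= min_run:
                hits.append((i, j, run))
    return hits


# Hypothetical toy input: two fully complementary 10-mers.
print(screen_pairs(["AAAAAAAAAA", "TTTTTTTTTT"]))
```

A production screen would score mismatches and gaps as well, since imperfect duplexes can still mis-hybridize.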
The TopDown (TD) one-step gene synthesis method combines the advantages of the one-step and two-step gene synthesis processes. It conducts gene synthesis with outer primers and inner oligonucleotides that are designed and partitioned by TmPrime to have distinctly different melting temperatures (∆Tm > 8°C). This reaction condition provides several advantages: (i) eliminating potential competition between the assembly and amplification reactions, (ii) minimizing the possibility of truncated oligonucleotides participating in the assembly process and the resulting errors, (iii) providing a stringent annealing condition to reduce the potential for forming secondary structures, and (iv) increasing the specificity of oligonucleotide hybridization, as in touchdown PCR. All of these help prevent the generation of faulty sequences, especially for genes with high GC content.
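As a small illustration of the ∆Tm condition, the following Python sketch checks whether a set of melting temperatures keeps the inner oligonucleotides and the outer primers separated by more than a chosen gap. The function names and the example Tm values are hypothetical. The direction of the gap (inner oligonucleotides above outer primers) is an assumption inferred from the annealing temperatures quoted for the TD process (67 °C for assembly, then 49 °C for amplification).

```python
def tm_gap(inner_tms, primer_tms):
    """Smallest separation (°C) between the lowest inner-oligonucleotide Tm
    and the highest outer-primer Tm."""
    return min(inner_tms) - max(primer_tms)


def meets_topdown_gap(inner_tms, primer_tms, min_gap=8.0):
    """True if the Tm separation exceeds min_gap (8 °C, as in the summary)."""
    return tm_gap(inner_tms, primer_tms) > min_gap


# Hypothetical Tm values in °C, not taken from the thesis data.
inner = [68.0, 70.5, 69.2]
primers = [50.0, 52.0]
print(tm_gap(inner, primers), meets_topdown_gap(inner, primers))
```

Taking Tm values as inputs keeps the check independent of whichever melting temperature model is used to compute them.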
The Automatic TouchDown (ATD) one-step gene synthesis method is developed to further improve the TopDown method. It enables the synthesis of long DNA of up to 1.5 kbp with only one polymerase chain reaction (PCR) process. The method involves two key steps: (i) design of outer primers with two melting temperatures, and (ii) utilization of DNA annealing kinetics to selectively control the oligonucleotide assembly and full-length template amplification. With the help of a novel real-time PCR approach to monitor the gene assembly process, the capability of the ATD method has been demonstrated in the design and synthesis of human protein kinase B-2 (PKB2) (1446 bp) and the promoter of human calcium-binding protein A4 (S100A4) (752 bp) with oligonucleotide concentrations as low as 1 nM.
The integrated two-step gene synthesis device is established based on the developed protocols. It is capable of performing two-step gene synthesis to assemble a pool of oligonucleotides into genes with the desired coding sequence. The device comprises two polymerase chain reaction (PCR) modules, temperature-controlled hydrogel valves, an electromagnetic micromixer, a shuttle micromixer, volume meters, and a magnetic-bead-based solid-phase PCR purification module, and is fabricated using a fast prototyping method without a lithography process. The fabricated device is combined with a miniaturized thermal cycler to perform gene synthesis. The device has been demonstrated to successfully synthesize a green fluorescent protein fragment (GFPuv) (760 bp), with synthesis yield and error rate comparable to those of experiments conducted in PCR tubes within a commercial thermal cycler. To our knowledge, this is the first microfluidic device demonstrating integrated two-step gene synthesis.
LIST OF FIGURES
Figure 1.1: Generic gene synthesis process. It takes about two weeks to construct and deliver an error-free DNA.
Figure 2.1: Flowchart of bioinformatics software. Both protein sequence and DNA sequence are eligible input files. The program generates the optimized partition themes of the input sequences according to user requirements.
Figure 2.2: Process steps of gene synthesis. Oligonucleotides are synthesized as building blocks for polymerase cycling assembly or ligase chain reaction. Synthesized mismatch DNA is filtered out via enzymatic error filtering.
Figure 2.3: LCR based gene synthesis. (a) Oligo phosphorylation by modifying their 5’ ends from hydroxyl group to phosphate group using a kinase; (b) Oligos are linked together gradually to form template DNA using thermostable ligase enzyme.
Figure 2.4: Operation principle of two-step overlapping polymerase cycling assembly. Different pools of oligos with partially overlapping sequences are first assembled into long DNA blocks. Then the outer primers are added to amplify the amount of assembled full-length DNA.
Figure 2.5: (a) Successive extension polymerase cycling assembly method. DNA is elongated successively from oligo R5 and F5. (b) Thermodynamically balanced inside-out polymerase cycling assembly method. DNA construction starts from inside oligos F1 and R1, and is gradually extended using outside oligos.
Figure 2.6: Schematic illustrations of non-specific and specific DNA purifications using (a) ChargeSwitch magnetic beads, (b) streptavidin magnetic beads, (c) oligo (dT)25 magnetic beads.
Figure 2.7: Principle steps of MutS error filtering. After re-annealing of the assembled DNA, mismatched heteroduplex DNA is captured by MutS enzyme and separated from the DNA with correct sequence by gel electrophoresis [60].
Figure 2.8: Working principle of enzymatic cleavage proteins. Endonucleases such as T4E7 and T7E1 recognize and bind to the mutation site of mismatched DNA and cleave the DNA into two segments.
Figure 3.1: Scheme of LCR or gapless PCR assembly. The input sequence is the serial connection of overlap regions of oligonucleotides.
Figure 3.2: An overview of the oligonucleotide design scheme. The software first divides the input sequence into approximately equal-temperature (Equi-Tm) or equal-length fragments (Equi-space) using markers based on the user-specified melting temperature. The positions of the markers are iteratively shifted to globally minimize the deviation in melting temperature among the fragments (Tm Equilibrate). Two adjacent fragments are joined together to generate oligonucleotides for gapless assembly.
Figure 3.3: Web interface for TmPrime. TmPrime is implemented as functional modules, each module reflecting a different aspect of the oligonucleotide design process with interface elements organized in a coherently grouped fashion.
Figure 3.4: Base composition plot of gene sequence. GC content and melting temperature plot of overlap regions of oligonucleotides partitioned using Equi-space approach. (a) PKB2 (1446 bp, G+C: 58.4%). (b) S100A4 (752 bp, G+C: 56%). The GC plot was obtained using Isochore (http://emboss.bioinformatics.nl/cgi-bin/emboss/isochore). The highly similar profiles of GC and melting temperature plots clearly indicated the effects of GC cluster on the Tm homogeneity of oligonucleotides.
Figure 3.5: Agarose gel electrophoresis of assembled products. One-step synthesis of GFPuv (760 bp) from TmPrime: (Lane 1) optimized and (Lane 2) fixed-length control oligonucleotides. (Lane 3) One-step synthesis of PKB2 (1446 bp). Two-step synthesis of PKB2: (Lane 4) assembly and (Lane 5) amplification. (Lane 6) One-step synthesis of S100A4 (752 bp). Two-step synthesis of S100A4: (Lane 7) assembly and (Lane 8) amplification. The annealing temperatures for the PCR process are as follows: GFPuv, 50°C; PKB2, 61°C; S100A4, 58°C (assembly) and 49°C (amplification).
Figure 3.6: (a) Melting peak analyses of the assembled products for GFPuv from one-step synthesis: ( -) optimized and (—) fixed-length control oligos. Melting peak analyses of the assembled products for (b) PKB2 and (c) S100A4 from one-step and two-step syntheses; two replicas were performed for each set of oligos. The corresponding agarose gel electrophoresis results of the assembled products are shown in Figure 3.5. The measured Tm values are 86.5°C for GFPuv, 91.5°C for PKB2, and 90.5°C for S100A4.
Figure 3.7: Agarose gel electrophoresis of LCR assembled GFPuv with TmPrime-optimized oligonucleotides. (a) LCR products (2, 4 and 8 hrs assembly) before second PCR. (b) Second PCR after LCR (2, 4 and 8 hrs). Lane (L): 100 bp DNA ladder.
Figure 4.1: Schematic illustration of TopDown one-step gene synthesis combining PCR assembly and amplification into a single stage with different annealing temperatures designed for assembly and amplification. Inner oligonucleotides and outer primers are designed with melting temperature difference > 15°C to minimize potential interference during PCR.
Figure 4.2: Agarose gel (1.5%) electrophoresis results of one-step (30 cycles), TopDown (TD) one-step (40 cycles), and two-step (PCA: 30 cycles; PCR: 30 cycles) gene syntheses. The TD one-step process is conducted with annealing temperature of 67 °C for the first 20 cycles followed by another 20 cycles with annealing temperature of 49 °C. The concentrations of oligonucleotides and outer primers are 10 nM and 400 nM, respectively.
Figure 4.3: Continuous fluorescence monitoring of real-time gene synthesis with 1× LCGreen I. The first 20 cycles are conducted with annealing temperature of 67 °C followed by another 20 cycles with annealing temperature of 49 °C. The concentrations of oligonucleotides and outer primers are 10 nM and 400 nM, respectively.
Figure 4.4: Concentration effects of SYBR Green I and LCGreen I for TD one-step real-time gene synthesis of S100A4. (a) 0.25× to 5× SYBR Green I. The fluorescence intensity of 1× LCGreen I is also included in this plot for comparison. The fluorescence curves of SYBR Green I are insensitive to the number of PCR cycles, and fail to indicate the DNA length extension during gene synthesis. (b) 0.25× to 5× LCGreen I. The annealing temperatures for assembly and amplification are 58°C and 49°C, respectively. The concentrations of oligonucleotide and outer primer are 64 nM and 400 nM, respectively.
Figure 4.5: The MgSO4 concentration is critical for successful gene synthesis. (a) Fluorescence of 1× LCGreen I as a function of PCR cycle number for various concentrations of MgSO4: 1.5 mM (◊), 2.5 mM (□), 3.0 mM (∆), 3.5 mM (×), 4.0 mM (●), and 5.0 mM (○). (b) The corresponding agarose gel electrophoresis results. The TD one-step gene synthesis is conducted with annealing temperatures of 58°C and 49°C for assembly and amplification, respectively, 1 mM each of dNTP, 10 nM of oligonucleotides, and 400 nM of forward and reverse primers. Gene synthesis with 4 mM of MgSO4 provides the best yield of full-length product.
Figure 4.6: The oligonucleotide concentration is critical for successful gene synthesis. S100A4 (752 bp) is synthesized with various oligonucleotide concentrations ranging from 5 nM to 80 nM, and annealing temperatures of 67°C for the first 20 cycles and 49°C for the next 20 cycles. (a) Fluorescence as a function of PCR cycle number for oligonucleotide concentrations of 5 nM (◊), 7 nM (□), 10 nM (∆), 13 nM (+), 17 nM (×), 20 nM (○), 40 nM (●), 64 nM (▲), and 80 nM (♦). The slopes of fluorescence increment in the early cycles and cycles #21 indicate the efficiencies of the assembly and amplification processes. (b) The corresponding agarose gel electrophoresis results.
Figure 4.7: S100A4 (752 bp) is successfully synthesized with various primer concentrations ranging from 60 nM to 1 µM, as indicated by the sharp, narrow gel band of the desired length. (a) Fluorescence as a function of PCR cycle number for outer primer concentrations of 60 nM (◊), 120 nM (□), 200 nM (∆), 300 nM (×), 400 nM (+), and 1 µM (○). The inset shows the fluorescence signal of the first 20 cycles. (b) The corresponding agarose gel electrophoresis results.
Figure 4.8: S100A4 is synthesized with various assembly cycles (6-20 cycles), followed by another 20 cycles for amplification. Agarose gel (1.5%) electrophoresis results indicate full-length assembly is achieved within 11 cycles.
Figure 4.9: S100A4 (752 bp) synthesized with various assembly annealing temperatures ranging from 58°C to 70°C for the first 20 cycles, followed by an annealing temperature of 49°C for the next 20 cycles. (a) Fluorescence as a function of PCR cycle number for annealing temperatures of 58°C (◊), 60°C (□), 62°C (∆), 65°C (×), 67°C (+), and 70°C (○). The inset shows the middle 15 cycles (#13–27). (b) The corresponding agarose gel electrophoresis results. Higher synthesis yield was obtained with a stringent assembly annealing temperature (> 67°C).
Figure 5.1: Schematic illustration of Automatic TouchDown (ATD) one-step gene synthesis combining PCR assembly and amplification into a single stage. The melting temperatures of inner oligonucleotides (Tmo) and outer primers (Tp1 and Tp2) are designed with the conditions of Tp2 ≥ 72°C and Tmo - Tp1 ≥ 5°C to minimize potential assembly-amplification interference and maximize the full-length amplification during PCR.
Figure 5.2: Effect of hybridization reaction time. Top: Agarose gel results of (a) S100A4-1, (b) S100A4-2, and (c) PKB2 synthesized with: (1) 10-s annealing (70°C) plus 10-s extension (72°C), and (2) 30-s annealing (70°C) plus 90-s extension (72°C). Bottom: The corresponding fluorescent curves for S100A4-1 (□: 20 s, ■: 120 s), S100A4-2 (Δ: 20 s, ▲: 120 s), and PKB2 (○: 20 s, ●: 120 s). The concentrations of oligonucleotides and outer primers are 10 nM and 400 nM, respectively.
Figure 5.3: The synthesis yield is dependent on the extension time. S100A4-2 (752 bp) is synthesized with various extension times from 30 s to 120 s at an annealing temperature of 70°C (30 s) with oligonucleotide concentration of (a,c) 10 nM and (b,d) 1 nM. (a, b) Fluorescence as a function of extension time of 30 s (◊), 60 s (▲), 90 s (♦), and 120 s (□). (c, d) The corresponding agarose gel electrophoresis results. The synthesis from 10 nM oligonucleotides reaches the plateau within 30 cycles, while the reaction from 1 nM oligonucleotides only enters the amplification phase after 30 cycles.
Figure 5.4: The effect of oligonucleotide concentration on successful gene synthesis. S100A4-2 (752 bp) is synthesized with various oligonucleotide concentrations ranging from 1 nM to 40 nM. All PCRs are conducted with 30-s annealing at 70°C and 90-s extension at 72°C. (a) Fluorescence as a function of PCR cycle number for oligonucleotide concentrations of 1 nM (□), 5 nM (∆), 10 nM (▲), 15 nM (○), 20 nM (●), and 40 nM (◊). The change in the slopes of fluorescence increment indicates the emergence of full-length template. (b) The corresponding agarose gel electrophoresis results. The arrow indicates the undesired DNA with 2× length of full-length template, generated from non-specific full-length amplification of excess PCR.
Figure 5.5: (a,c) S100A4-2 (752 bp) and (b,d) PKB2 (1446 bp) synthesized with various annealing temperatures ranging from 58°C to 70°C (30 s) and 90-s extension at 72°C. (a,b) Fluorescence as a function of PCR cycle number for annealing temperatures of 58°C (◊), 60°C (∆), 62°C (□), 65°C (♦), 67°C (○), and 70°C (▲). (c,d) The corresponding agarose gel electrophoresis results. Higher synthesis yield is obtained with a stringent assembly annealing temperature (70°C). The slope changes in fluorescence intensity indicate the automatic switch feature in the assembly and amplification processes.
Figure 5.6: Agarose gel electrophoresis results of conventional 1-step and ATD one-step (30-cycle) gene syntheses with dNTPs concentrations of 4 mM and 0.8 mM for (a) S100A4-1 (752 bp), (b) S100A4-2 (752 bp) and (c) PKB2 (1446 bp). All PCRs are conducted with 30-s annealing at 70°C and 90-s extension at 72°C. The concentrations of oligonucleotides and outer primers are 10 nM and 400 nM, respectively.
Figure 5.7: Fluorescent curves of conventional 1-step (▲,♦) and ATD one-step gene syntheses (Δ,◊) with dNTPs concentration of 4 mM (♦,◊) and 0.8 mM (▲,Δ) for (a) S100A4-1 (752 bp), (b) S100A4-2 (752 bp), and (c) PKB2 (1446 bp). All PCRs are conducted with 30-s annealing at 70°C and 90-s extension at 72°C. The concentrations of oligonucleotides and outer primers are 10 nM and 400 nM, respectively.
Figure 5.8: Agarose gel electrophoresis results of S100A4-1 (lanes 1 and 3) and S100A4-2 (lanes 2 and 4) with oligonucleotide concentrations of 10 nM and 1 nM, and PKB2 (lane 5) with 1 nM oligonucleotides. The arrow indicates the full-length DNA. Syntheses are performed with 30 and 36 cycles, respectively, for 10 nM and 1 nM oligonucleotides, with 30-s annealing at 70°C and 90-s extension at 72°C.
Figure 6.1: Schematic illustration of PCR-based gene synthesis. One-step synthesis combines PCA and PCR amplification into a single stage. The two-step synthesis is performed with separate stages for assembly and amplification.
Figure 6.2: (a) Fabrication process of microfluidic chip. (b) Fabrication process of hydrogel valve. The PCR reactions and hydrogel valves are controlled by two separate thermoelectric heaters (TE 1 and TE 2). The inset shows a closed hydrogel valve. (c) Photograph of a two-step gene synthesis chip with solid-phase PCR purification (65 mm × 50 mm).
Figure 6.3: (A) Device operation diagram with process time of each step. (B) Detailed schematic diagrams of each step: (a) Oligonucleotides and PCR mixture were loaded into PCA chamber (highlighted in red) from A1. PCA was then conducted. (b) PCA-assembled solution (pumped through B1) was mixed with fresh PCR mixture containing outer primers (pumped through A2). The mixed PCR precursor was illustrated in green. (c) Mixed PCR precursor (green color) was positioned in PCR chamber, and the PCR amplification was performed. (d) PCR-synthesized product (highlighted in green) and ChargeSwitch reagent (illustrated in yellow with black dots) were pumped and loaded into beads chamber. After mixing and incubation, the magnetic beads were captured by a magnet. (e) Magnetic beads were washed by washing buffer pumped from A5. (f) Elution buffer was loaded and mixed with magnetic beads; after incubation, the magnet was applied to fix the beads. Synthesis product was eluted into elution buffer and collected through A7 (highlighted in green).
Figure 6.4: (a) Photographs of micromixer. Colored dyes (blue and red) were well mixed after being shuttled three times between two chambers. (b) Schematic illustration of the experimental arrangement with a syringe pump, electromagnetic mixer, thermoelectric heaters and data acquisition.
Figure 6.5: The thermal response of in situ photopolymerized hydrogel valve. The valve functions were highly repeatable. The insets showed the transitions of valve functions.
Figure 6.6: Thermal cycling profiles of the custom-built PCR thermal cycler. A thermocouple mounted on the heater was used in the temperature feedback control (heater temperature) for thermal cycling. The temperature difference between the heater surface and within the PCR chamber
Figure 6.7: Agarose gel (1.5%) electrophoresis showing the synthesis yields with oligonucleotide concentrations of 5–25 nM and outer primer concentrations of 0.1–0.4 μM for the two-step process. Syntheses were conducted using a commercial thermal cycler. (a) PCA results. (b) PCR amplification results.
Figure 6.8: Agarose gel (1.5%) electrophoresis comparing the synthesis results conducted within commercial thermal cycler (machine) and microfluidic device. (a) One-step process (device: single-chamber chip) and (b) two-step process (device: two-step chip) conducted with an oligonucleotide concentration of 10 nM and a primer concentration of 0.4 µM.
Figure 6.9: The effect of elution temperature and incubation time on DNA extraction conducted within microfluidic device (■: 3 min) and standard PCR tube (□: 3 min; ◊: 2 min).
Figure 7.1: Schematic illustration of chip based error filter module. Error enriched PCR product is pumped through the large inlet of the device, and the mismatched DNA is captured by the MutS proteins, which are immobilized on the Ni2+ beads. The error depleted DNA is collected at the small outlet of the device.
Figure S1: Scheme of overlapping PCR gene synthesis.
Figure S2: Calculated annealing possibility distribution of (a) S100A4-1 and (b) S100A4-2 at oligonucleotide concentration of 1 nM (dash line) and 10 nM (solid line). Plotted for oligonucleotides with minimum Tm (black line), maximum Tm (gray line) and average Tm (blue line).
Figure S3: The melting temperature versus oligonucleotide concentration plot for oligonucleotide sets of S100A4-1 (dash line) and S100A4-2 (solid line). Plotted for oligonucleotides with minimum Tm (black line), maximum Tm (gray line) and average Tm (blue line). Both oligonucleotide sets contain more than 30 different oligonucleotides. The slopes of the average Tm versus the logarithmic oligonucleotide concentration were ~ 1.21 and 1.28 for S100A4-1 and S100A4-2, respectively.
LIST OF TABLES
Table 1.1: Previous works of gene synthesis.
Table 1.2: Gene synthesis companies.
Table 2.1: Comparisons of the oligonucleotide design features of TmPrime with other gene synthesis programs.
Table 2.2: Comparison of different methods of DNA assembly.
Table 3.1: Data on oligonucleotides.
Table 3.2: Comparisons of the oligonucleotide design performance of TmPrime with other gene synthesis programs for S100A4, PKB2, GFPuv and the whole genome of Poliovirus [1] (Genbank FJ517648; 7418 bp) and øX174 bacteriophage [3] (Genbank J02482; 5386 bp) with oligonucleotide concentration of 10 nM.
Table 4.1: Data of oligonucleotide set.
Table 4.2: PCR conditions for one-step, non-competition (NC) one-step and two-step gene synthesis.
Table 4.3: Some reported optimal gene synthesis conditions.
Table 5.1: Data of oligonucleotide set.
Table 5.2: Summary of primers for conventional one-step, and ATD one-step gene syntheses. All PCR assemblies are performed with an annealing temperature of 70°C.
Table 6.1: Errors and efficiencies in the synthesis of GFPuv using one-step and two-step processes in the microfluidic device vs standard PCR tube (machine).
Table A1.1: TmPrime optimized oligonucleotides set designed for the E. coli codon-optimized GFPuv [1].
Table A1.2: Fixed-length oligonucleotides set designed for the E. coli codon-optimized GFPuv [1].
Table A1.3: Oligonucleotides set designed for E. coli codon-optimized PKB2 [2].
Table A1.4: Oligonucleotides set designed for S100A4.
Table A2.1: Oligonucleotides set designed for S100A4.
Table A3.1: Semi-optimized oligonucleotides set (S100A4-1) designed for S100A4 with oligonucleotide concentration of 10 nM.
Table A3.2: Optimized oligonucleotides set (S100A4-2) designed for S100A4 with oligonucleotide concentration of 10 nM.
Table A3.3: Oligonucleotides set designed for PKB2 with oligonucleotide concentration of 10 nM.
Table A3.4: Partial list of potential mishybridizations for S100A4 gene synthesis predicted by TmPrime gene synthesis software (http://prime.ibn.a-star.edu.sg). The oligonucleotides are alternately displayed in upper and lower case for ease of finding the oligonucleotide boundaries. Both the forward and reverse mishybridizations are reported, which have the same number of matched bases, but may generate different mishybridization formations during the assembly.
Table A3.5: Partial list of potential mishybridizations for PKB2 gene synthesis predicted by TmPrime gene synthesis software (http://prime.ibn.a-star.edu.sg).
Table A4.1: Summary of melting temperatures of S100A4-1, S100A4-2 and PKB2 oligonucleotide sets at oligonucleotide concentrations of 10 nM and 1 nM.
CHAPTER I INTRODUCTION
For the last decade, molecular biologists at large have focused most of their resources and efforts on decoding, sequencing and analyzing naturally occurring deoxyribonucleic acids (DNAs). It was not until the beginning of the 21st century that attention shifted to synthetic biology, which requires the artificial creation of non-natural genes, genomes, proteins, biological processes and organisms. Gene synthesis, an area of molecular biology that draws on organic chemistry and molecular biology procedures, is a highly efficient technology capable of creating full-length genes, operons and even genomes de novo [1]. This technique, first demonstrated by Har Gobind Khorana in 1979 [2], allows the generation of synthetic genes without a biological template and was conceived as a means of gene acquisition. It also gives biologists the unique flexibility of considering multiple gene design parameters in parallel. For example, codon optimization, suppression of deleterious secondary DNA structures, and generation of specific restriction sites or motifs can all be taken into consideration simultaneously. Cello et al. [1] successfully utilized this de novo gene synthesis method to assemble a viral genome of 7.5 kbp in 2002. Likewise, Smith et al. [3] and Kodumal et al. [4] demonstrated the assembly of a bacteriophage genome of 5.4 kbp in 2003 and a gene cluster as large as 32 kb in 2004, respectively. The longest synthetic DNA reported to date is the 582 kbp genome of Mycoplasma genitalium, synthesized by Venter and co-workers [5] in 2008. These remarkable achievements were the results of meticulous planning, long hours of laborious and repetitive bench work, and the consumption of large quantities of chemical reagents.
Indeed, gene synthesis has become an enabling technology for many fields of recombinant or synthetic gene technology. For instance, synthetic genes can be used for protein over-expression in heterologous systems [6-8], drug and vaccine development [9, 10], gene therapy, and molecular or protein engineering [11, 12]. Gene synthesis technology is also widely used in the study of ancestral gene construction as well as in the development of artificial gene networks and synthetic genomes [13, 14].
The context and challenges of gene synthesis are complex, as they require simultaneous attention to numerous interconnected parameters. The following sections aim to give readers a basic understanding of gene synthesis, the shortcomings and limitations of various synthesis schemes, and an appreciation of the complexity of current gene synthesis methods.
1.1 Overview of gene synthesis
In general, gene synthesis employs a “top-down” approach that involves a series of highly complex processing steps, as shown in Figure 1.1. It comprises the sequential activities of (i) pre-synthesis oligonucleotide design, (ii) oligonucleotide synthesis (oligonucleotide synthesizer) and (iii) gene synthesis (gene synthesizer). To enhance the quality of the synthesized genes, (iv) post-synthesis processes (such as gene purification and error filtering) may be required to remove incorrect or unwanted genes from the final synthesized products.
As illustrated in Figure 1.1, the success of gene synthesis depends heavily on the accuracy of the design of the short oligonucleotides (single-stranded DNA) and on the prediction of correct synthesis conditions. This is a very demanding task that requires considerable computational power from bioinformatics software such as DNAWorks [14] and GeneDesign [15]. Such software requires the user to input a text file of the target DNA sequence as well as other critical synthesis parameters, such as the oligonucleotide concentration and the outer-primer concentration. Based on these user-input parameters, the software partitions the desired gene sequence into the short oligonucleotide sequences required by the oligonucleotide synthesizer. Some software also provides supplementary information such as overlapping sites, the temperature uniformity of the partitioned oligonucleotides, and possible mishybridization sites. It should be noted that the success of any gene synthesis process hinges on the ability of the bioinformatics software to accurately predict correct synthesis conditions.
Figure 1.1: Generic gene synthesis process. It takes about two weeks to construct and deliver an error-free DNA.
The computed short oligonucleotide sequences are fed into an oligonucleotide synthesizer, where short fragments of single-stranded nucleic acid with defined sequences are synthesized. This is a highly efficient and inexpensive technology for generating short oligonucleotides of the desired sequence and length. State-of-the-art oligonucleotide synthesizers can automatically synthesize oligonucleotides of up to about 200 bases. However, to limit the errors that oligonucleotide synthesis introduces into the final gene product, the partitioned oligonucleotides used for gene synthesis are commonly kept to 15 to 40 bases in length. To date, oligonucleotide synthesis has been commonly used to produce antisense oligonucleotides, small interfering RNA, primers for gene synthesis, and probes for detecting complementary DNA (DNA microarray technology).
Next, the synthesized short oligonucleotides (~40 to 90 bases) are fed into a gene synthesizer for gene assembly by ligase chain reaction (LCR) or polymerase cycling assembly (PCA). To increase the amount of the target gene, most protocols include a DNA amplification step known as polymerase chain reaction (PCR). PCR involves the denaturation and in vitro enzymatic replication of the target DNA through the combined action of primers (short oligonucleotides complementary to the 3’ ends of both strands of the target DNA) and DNA polymerase over iterative thermal cycles. In each PCR cycle, the DNA molecules are replicated by the DNA polymerase, doubling the number of DNA molecules. Each of these molecules is then replicated in a second cycle, resulting in four times the original number of molecules; each is replicated again in a third cycle, and so on. This technique allows a single piece of DNA to be amplified exponentially, creating millions of copies of the original DNA. PCR has been extensively modified to perform a wide array of genetic manipulations, diagnostic tests and many other uses.
Before the synthesized genes are sent to users for further application, they may be subjected to a post-synthesis treatment to screen out unwanted genes from the final product pool. The target gene (of the desired length) can be separated from other impurities (such as truncated genes) in different ways, such as gel electrophoresis, magnetic charge-switch beads, or enzymatic digestion. In general, most gene synthesis techniques have an error rate of about 1 to 5 bases per 1000 bp, owing to errors accumulated throughout all synthesis processes. In most cases, these errors result from poor-quality short oligonucleotides or from errors that occur during the gene assembly and amplification stages. They can be removed by error filtering, which uses enzymes to recognize and capture or digest mismatched DNA [16, 17]. This is a complicated but very important process that determines the quality of the synthesized gene. A well-designed error filtering scheme effectively increases the overall yield of gene synthesis, because incorrect sequences (mismatches, mutations, insertions and deletions) are removed from the gene pool. Error filtering in gene synthesis is still in its infancy and requires extensive development effort. The error-filtered genes are then ready to be used for cell-free protein synthesis [18], or to be inserted into vectors and cloned for sequencing before being used in further applications.
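To give a feel for what an error rate of 1 to 5 bases per 1000 bp means in practice, the following minimal sketch estimates the fraction of error-free products and the number of clones that would need to be screened to find a correct one. It assumes that errors are independent and uniformly distributed along the sequence; that assumption, the function names and the 95% confidence level are illustrative choices made here, not values from this work.

```python
import math


def fraction_error_free(length_bp: int, errors_per_kb: float) -> float:
    """Probability that a product of `length_bp` bases carries no error,
    assuming independent errors at a uniform per-base rate."""
    per_base_rate = errors_per_kb / 1000.0
    return (1.0 - per_base_rate) ** length_bp


def clones_to_screen(p_correct: float, confidence: float = 0.95) -> int:
    """Smallest number of clones that gives at least `confidence`
    probability of finding one correct clone."""
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_correct))


if __name__ == "__main__":
    for rate in (1.0, 5.0):  # errors per 1000 bp, the range quoted in the text
        p = fraction_error_free(1000, rate)
        print(f"{rate:.0f} errors/kb, 1 kb gene: "
              f"{p:.4f} error-free, screen {clones_to_screen(p)} clones")
```

Under these assumptions the error-free fraction falls off exponentially with gene length, which is one way to see why error filtering becomes more important as the target DNA gets longer.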
1.2 Challenges in Gene Synthesis
Table 1.1 shows the results obtained by various research groups in synthesizing DNA of different lengths (from 139 bp to 5.38 kbp) using various assembly and amplification protocols. It is interesting to note that there is no direct correlation between the number of mers of the oligonucleotides and the target length of the desired DNA. The parameters stated in the table were obtained after many iterative experiments. Indeed, successful gene synthesis is often resource-intensive, as there is a genuine lack of a standardized protocol for synthesizing genes of various lengths and sequence complexity.
Table 1.1: Previous works of gene synthesis
The advancement of DNA synthesis technologies is greatly impeded by high cost (~USD 0.85 to USD 1.20 per bp) and long turnaround time, which are mainly attributable to the costs of manpower and laboratory equipment, the large amounts of expensive chemicals used, and the complex and time-consuming synthesis processes. Gene synthesis also requires high accuracy: even a single error in the sequence of a synthetic DNA may lead to the total failure of the downstream applications [6-14]. Hence, the main challenge faced by current gene synthesis technologies is to develop novel methods that produce low-cost, high-fidelity synthetic genes with fast turnaround.
Table 1.2: Gene synthesis companies
Name Price (USD/bp) Delivery (weeks)
of kinetics and mechanisms of PCR-based gene synthesis, a novel gene synthesis approach with ultra-low oligonucleotide concentration, and finally lab-chip devices that integrate the tedious gene synthesis process into a chip.
Several bioinformatics programs have been developed, such as DNAWorks [14], Gene2Oligo [24], GEMS [25] and GeneDesign [15]. These programs aim to partition a gene sequence into short oligonucleotides with uniform melting temperature, and to provide information on potential mishybridization sites. Some of them also offer useful features such as dividing long DNA into segments and codon optimization for heterologous protein expression. However, no existing gene synthesis program provides all of these features for long DNA (> 5 kbp) or multiplex gene synthesis. This prompted us to develop our own bioinformatics software, TmPrime, which provides all of the desired functions.
Several PCR-based gene synthesis methods have been reported, including one-step/two-step overlapping PCR [8, 22], successive PCR [8] and thermodynamically balanced inside-out (TBIO) PCR-based gene synthesis [26]. These approaches were developed to optimize the PCR process for long DNA synthesis, or to enhance the efficiency and accuracy of the synthesis process. Their performance has been demonstrated on only a limited number of genes (< 5), based on end-point gel electrophoresis results, and there is no model that can predict the outcome of gene synthesis. In this thesis we establish an accurate gene synthesis model based on a novel TopDown gene synthesis method with the help of real-time fluorescence monitoring. This is the first time that gene synthesis has been combined with real-time fluorescence analysis to clearly reveal the kinetics of the gene synthesis process. The model provides deeper insight into the PCR-based synthesis process and helps identify optimal reaction conditions.
The production of synthetic genes is to a large extent hampered by their high cost (~USD 0.85 per base pair), with the major expenditure coming from oligonucleotides (~USD 0.1 per base), which limits applications in large-scale, systematic studies [27, 28]. This could potentially be solved by performing gene synthesis with oligonucleotides from DNA microarrays, which would offer a significantly reduced cost. Current microchips have very small surface areas, and hence only a small amount of oligonucleotide (0.1 pmol/mm2) can be produced [29]. The resulting concentration of eluted oligonucleotides (<1 nM; 100 µm × 100 µm spot size and 1 µl PCR volume) might therefore be insufficient for effective hybridization. Moreover, the eluted solution contains thousands of different oligonucleotides, which increases the complexity of the DNA assembly process and the possibility of mishybridization. This prompted us to study the gene synthesis process at ultra-low oligonucleotide concentration (1 nM), with the oligonucleotide quantity matched to that of a DNA microarray. A novel approach termed Automatic TouchDown gene synthesis is developed, which [...] this is the first reported method that is able to synthesize relatively long DNA from an ultra-low oligonucleotide concentration of 1 nM.
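The concentration estimate quoted for microarray-derived oligonucleotides can be checked with a short calculation. The sketch below assumes a spot of 100 µm × 100 µm and a reaction volume of 1 µl; the units are read here as micrometres and microlitres because that reading reproduces the stated figure of about 1 nM. The function name and parameters are hypothetical.

```python
def eluted_concentration_nM(density_pmol_per_mm2: float,
                            spot_side_um: float,
                            volume_ul: float) -> float:
    """Concentration (nM) of one oligonucleotide eluted from a square spot.

    density_pmol_per_mm2: surface synthesis density (0.1 pmol/mm2 in the text)
    spot_side_um:         side length of the square spot, in micrometres
    volume_ul:            reaction volume, in microlitres
    """
    area_mm2 = (spot_side_um / 1000.0) ** 2      # um -> mm, then squared
    amount_pmol = density_pmol_per_mm2 * area_mm2
    # 1 pmol per ul equals 1 uM, i.e. 1000 nM
    return amount_pmol / volume_ul * 1000.0


if __name__ == "__main__":
    c = eluted_concentration_nM(0.1, 100.0, 1.0)
    print(f"estimated concentration: {c:.2f} nM")
```

A spot of this size yields about 1 fmol of oligonucleotide, so the concentration can only be raised by reducing the reaction volume or by pooling material from several spots.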
The cost of manpower and laboratory equipment makes up a large portion of the total expenditure of the synthesis process. A possible way to reduce these costs is to integrate the tedious gene synthesis process into a lab-chip device, providing an automated microsystem for gene synthesis. Numerous integrated microchip-based PCR devices have been constructed using lab-on-a-chip technologies [30-33]. However, most of the reported micro-PCR devices are designed for genetic analysis, not for gene synthesis. So far only one work, reported by Kong et al. [34], has demonstrated a polydimethylsiloxane (PDMS) device for one-step PCR gene synthesis, and this device did not include other steps of the gene synthesis process such as DNA purification and error filtering. In this PhD work, we have established a two-step gene synthesis microfluidic platform that integrates polymerase cycling assembly, amplification and a DNA extraction module into a single chip, using a fast prototyping method without a lithography process. Microfluidic syntheses were successfully achieved with a low oligonucleotide concentration of 10 nM and a primer concentration of 0.4 µM. The synthesized products were verified by DNA sequencing, and the error rate was comparable to that of control experiments conducted in PCR tubes with a commercial thermal cycler. This device would be useful for constructing a more comprehensive system for fully automated gene synthesis.
1.4 Objectives of this PhD thesis
The primary objective of this thesis is to address the problem of synthesizing high-quality genes in a cost-effective manner with short turnaround time. To achieve this, a “parallel-topdown” approach to defining the key synthesis parameters is of the utmost necessity. This requires careful consideration of the design of the short oligonucleotides, the processing conditions of the gene synthesis process, and the implementation of error-filtering schemes in post-synthesis treatment. In addition, to increase the efficiency of the gene synthesis process, a fully automated lab-on-chip (LOC) based gene synthesizer will be constructed to validate effectiveness [...] one has demonstrated the whole gene synthesis process in an integrated microfluidic chip. It should be noted that, in this study, the synthesis of the short oligonucleotides is outsourced; hence, its contribution to the error rate of the final synthesized gene could not be studied numerically.
In addressing the issues of gene synthesis discussed above, the following tasks have been completed:
Constructed new bioinformatics software that can accurately partition a target DNA (text file) into oligonucleotides. The new software is capable of suggesting good synthesis conditions based on user-defined parameters. It is able to generate short oligonucleotide sequences with highly uniform melting temperatures while suppressing the formation of deleterious secondary structures. It also includes codon optimization and can advise the user on the generation of specific restriction sites to prevent mishybridization during the gene assembly process.
Designed and implemented a gene synthesis model that allows the systematic analysis and study of the kinetics of gene synthesis. This is critical given the current lack of knowledge of the kinetics of gene synthesis processes. Vital information can be gathered and fed back into upstream and downstream processes to optimize gene synthesis performance. The developed protocols are universal for synthesizing genes of different lengths and complexity at ultra-low oligonucleotide concentration, and make zero, or at most a minimal, contribution to the overall error rate of the synthesized gene.
Designed and constructed an integrated gene synthesizer on a microfluidic platform using lab-on-a-chip technology. This was an extremely challenging task even with the current level of technology. To ensure the success of on-chip gene synthesis, new synthesis protocols were developed, as the synthesis kinetics differ considerably between a microfluidic and a bench platform. In addition, a new microfluidic system was developed to facilitate the precise metering, mixing, pumping, isolating, positioning and transporting of fluids in the integrated chip. Device material was also carefully evaluated to [...]
Designed and constructed a highly accurate thermal cycling module for the integrated gene synthesizer. This module is capable of rapid and accurate control of thermal cycling conditions throughout the gene synthesis process. It does not allow temperature fluctuations of more than 0.5 ºC, in order to ensure the success of the delicate gene assembly and amplification.
1.5 Thesis Outline
This dissertation covers the following areas: (1) background knowledge of the biochemistry of gene synthesis and an overview of gene synthesis related lab-on-a-chip technology; (2) development of the bioinformatics software and its experimental verification; (3) development and real-time experimental analysis of the TopDown one-step gene synthesis protocol; (4) development and real-time experimental analysis of the automatic TouchDown gene synthesis protocol; (5) design of an integrated gene synthesis microfluidic platform and its individual functional components, fabrication and experimental setup of the system, and preliminary results of integrated gene synthesis on a chip; (6) discussion of the present work and future research.
Chapter 2 addresses the conventional methods used in gene synthesis, including an introduction to bioinformatics software, oligonucleotide synthesis, LCR-based gene synthesis, PCR-based gene synthesis, real-time PCR, DNA purification/extraction and enzymatic error filtering. The fundamentals of gene synthesis related lab-on-a-chip technology, including microvalves, micromixers and PCR on a chip, are also presented.
Chapter 3 focuses on the development of the bioinformatics software and its experimental verification. In particular, a novel approach, the real-time fluorescence analysis of the gene synthesis process, is introduced, which reveals unique information about gene synthesis that is not available from gel electrophoresis studies.
Chapter 4 discusses the protocol development and real-time experimental analysis of the TopDown one-step gene synthesis method. An accurate model for predicting optimized PCR conditions is also addressed.
Chapter 5 presents the protocol development and real-time experimental analysis of automatic TouchDown one-step gene synthesis, which is a universal method for the construction of other genes. Furthermore, the theoretical study of PCR-based gene synthesis kinetics is introduced in detail.
Chapter 6 introduces the design considerations of an integrated gene synthesis microfluidic platform and its individual functional components. The fabrication and experimental setup of each functional component are also presented.
Chapter 7 summarizes the work of this PhD thesis and discusses the future research plan, including further optimization of the gene synthesis protocol as well as the development of post-synthesis processes such as error filtering. Optimization of the performance of the integrated gene synthesis system is also discussed.
CHAPTER II GENE SYNTHESIS METHODS
2.1 Introduction
This chapter reviews the background knowledge and concepts essential to this project. It comprises three main sections. Section 2.2 introduces bioinformatics programs for gene synthesis, an important tool for optimizing oligonucleotide design. In Section 2.3, various gene synthesis approaches and related techniques are introduced in detail, including PCA, PCR, real-time PCR, DNA purification, error filtering, and the cloning and sequencing of synthetic genes. Section 2.4 briefly outlines the fundamentals of gene synthesis related lab-on-chip (LOC) technology, including microvalves, micromixers, micro-PCR and on-chip DNA purification.
2.2 Bioinformatics in Gene Synthesis
The design and manufacture of custom genes is fast becoming an indispensable tool in synthetic biology [35] and protein engineering [36, 37]. Several steps are involved in designing a gene with a high expression level. First, the required protein sequence is reverse-translated into a nucleotide sequence, with a codon assigned to each amino acid in the protein sequence. Second, the codons of the reverse-translated nucleotide sequence are modified, without altering its translation, to generate a sequence optimized for protein expression. Finally, the designed DNA sequence is divided into short oligonucleotides for LCR [38] or PCR assembly [22]. Figure 2.1 shows the links between the different steps of the gene design process.
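The first step, reverse translation, can be sketched in a few lines. The example below assigns one fixed codon to each amino acid; the table `PREFERRED_CODON` is an illustrative choice of valid codons from the standard genetic code and is not measured codon usage for any organism, nor the table used by any of the programs discussed here. A real codon optimization step would replace it with organism-specific usage data.

```python
# One valid codon per amino acid (standard genetic code).
# Illustrative choice only, NOT real codon usage frequencies.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop
}

# Inverse map, valid only for DNA built from the table above.
CODON_TO_AA = {codon: aa for aa, codon in PREFERRED_CODON.items()}


def reverse_translate(protein: str) -> str:
    """Assign one codon to each amino acid of a one-letter protein string."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


def translate(dna: str) -> str:
    """Translate DNA produced by reverse_translate back into protein."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    protein = "MKWLF*"
    dna = reverse_translate(protein)
    print(dna, translate(dna) == protein)
```

The round trip through `translate` illustrates the constraint stated in the text: codons may be changed freely as long as the encoded protein sequence is unaltered.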
The input can be either the original protein sequence or a DNA sequence. With the codon optimization module, a normal coding DNA sequence can be modified into a highly expressible sequence. User-defined sequences, such as restriction enzyme sites, can also be added to enable subsequent cloning and other applications [39]. Even when the DNA sequence is codon-optimized, many challenges in oligonucleotide design still need to be resolved in order to prevent errors during the chemical gene synthesis process. First, the oligonucleotides should be rather short to reduce the possibility of errors during oligonucleotide synthesis, but they should still be long enough to provide stable overlaps. Second, the oligonucleotides should share similar thermodynamic properties (melting temperature) to ensure uniform hybridization during gene assembly. Third, the oligonucleotides should be highly specific to their targets to prevent incorrect assembly, and any deleterious secondary structures in the oligonucleotides, namely hetero-dimers, homo-dimers and hairpin loops, must be avoided [14, 24].
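As an illustration of what screening for a deleterious secondary structure can involve, the sketch below looks for a possible hairpin stem: a short stretch whose reverse complement appears further along the same oligonucleotide. This is a simple string-matching heuristic written for this example, not the algorithm used by the programs cited above, and the stem length of 4 and minimum loop of 3 bases are assumed thresholds.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_hairpin_stems(oligo: str, stem: int = 4, min_loop: int = 3):
    """Return (i, j) pairs where oligo[i:i+stem] could pair with
    oligo[j:j+stem], leaving a loop of at least `min_loop` bases between."""
    oligo = oligo.upper()
    n = len(oligo)
    hits = []
    for i in range(n - 2 * stem - min_loop + 1):
        partner = reverse_complement(oligo[i:i + stem])
        for j in range(i + stem + min_loop, n - stem + 1):
            if oligo[j:j + stem] == partner:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    for candidate in ("GGGCAAAAGCCC", "ACACACACACAC"):
        stems = find_hairpin_stems(candidate)
        print(candidate, "possible hairpin" if stems else "no stem found", stems)
```

The same reverse-complement comparison, applied between two different oligonucleotides instead of within one, gives a first-pass check for hetero-dimers.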
Figure 2.1: Flowchart of bioinformatics software. Both protein sequences and DNA sequences are eligible input files. The program generates optimized partition schemes for the input sequences according to user requirements.
As presented above, oligonucleotide design can be a time-consuming and complicated process, because many important factors need to be taken into consideration, such as the GC content and its distribution in the gene, codon usage, restriction enzyme sites and the possibility of secondary structure formation in the synthesized DNA. So far, several programs have been developed to design oligonucleotides automatically based on a user-specified hybridization temperature and oligonucleotide length [14, 15, 24, 25, 40, 41].
DNAWorks
DNAWorks (http://mcl1.ncifcrf.gov/lubkowski.html) is the benchmark among current gene synthesis bioinformatics software for the automatic design of oligonucleotides. It is capable of both gapped and gapless oligonucleotide design. It requires simple input information, such as the target protein or DNA sequence and the melting temperature of the synthetic oligonucleotides. The program also includes a codon optimization function, which outputs a pool of codon-optimized oligonucleotide sequences for effective protein expression in a chosen organism. These optimized oligonucleotides have highly homogeneous melting temperatures and are very unlikely to form deleterious secondary structures such as hairpin loops, homodimers and heterodimers. DNAWorks is suitable for designing oligonucleotides for PCR-based gene construction [39].
Gene2Oligo
Gene2Oligo (http://berry.engin.umich.edu/gene2oligo/) is also a web-based bioinformatics program. It is suitable for partitioning long input DNA sequences. The output is a set of contiguous oligonucleotides corresponding to both DNA strands. Although the length of the oligonucleotides can be adjusted to ensure both specificity and uniform melting temperatures, the software sometimes fails to converge when designing genes with high GC content or GC clusters. Gene2Oligo provides only gapless oligonucleotide design, which is suitable for both LCR and PCR-based gene synthesis methods.
by the software, many of which exceeded 5 kb in length, and the longest one was 32 kb [4]. GEMS provides gapless oligo design for both PCR assembly and ligation-based synthesis methods.
GeneDesign
Another set of web-based programs, GeneDesign (http://slam.bs.jhmi.edu/gd), was developed by Richardson et al. [15] for optimizing protein expression and/or redesigning a gene of interest for detailed structural or functional studies (e.g., mutagenesis). It combines many modules, such as codon juggling, reverse translation, silent site insertion, silent site removal, oligo design, random DNA generation, enzyme choosing and vector choosing, to provide a platform for the design of large genes for rapid synthesis.
Our research group has also developed an oligonucleotide design program, namely TmPrime. It uses a novel approach to divide input gene sequences, including very long ones, into oligonucleotides with homogeneous melting temperatures. Unlike in existing programs, the user can define the number of oligo fragments instead of the intended melting temperature, and the program determines the melting temperature automatically. The number of fragments corresponds to the average length of a fragment or overlap region (sequence length/fragment number). Besides the core oligonucleotide design module, various modules such as secondary structure screening and codon optimization are integrated in a cohesive manner. TmPrime is fast and flexible, with no fundamental limitation on gene length or GC content. It can design oligonucleotides for both gapless and gapped assemblies, whereas GeneDesign [15] employs only gapped design and GEMS [25] generates only gapless oligonucleotides with lengths of ~40 nt. TmPrime is also able to handle much longer sequences than DNAWorks [14] and Gene2Oligo [24].
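The relation between the user-defined fragment number and the average fragment length (sequence length/fragment number) can be illustrated as follows. The sketch only produces an equal-length starting partition; it is not TmPrime's algorithm, which is described in Chapter 3, and the function names are hypothetical. A melting-temperature-equalizing program would subsequently shift these boundaries.

```python
def equal_length_boundaries(seq_len: int, n_fragments: int) -> list:
    """Boundary positions that split `seq_len` bases into `n_fragments`
    pieces whose lengths differ by at most one base."""
    if not 0 < n_fragments <= seq_len:
        raise ValueError("need 0 < n_fragments <= seq_len")
    return [(i * seq_len) // n_fragments for i in range(n_fragments + 1)]


def split_sequence(seq: str, n_fragments: int) -> list:
    """Cut `seq` at the equal-length boundaries."""
    bounds = equal_length_boundaries(len(seq), n_fragments)
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    seq = "ATGAAAACCGGCTGCCATATTCTGAGC"
    pieces = split_sequence(seq, 4)
    print([len(p) for p in pieces], "average", len(seq) / 4)
```

Specifying the fragment number rather than a target melting temperature means the partition always exists for any sequence; the melting temperature then follows from the sequence instead of being imposed on it.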
Long DNAs can be partitioned automatically into pools of smaller pieces with uniform annealing temperature. This feature is essential for whole-genome synthesis [3] and artificial protein construction [4]. Table 2.1 summarizes the functions of TmPrime and conventional gene synthesis software.
Table 2.1: Comparison of the oligonucleotide design features of TmPrime with other gene synthesis programs
Tm optimization
Automatic pooling for long DNA
Mispriming analysis
Codon optimization
Ultra-long DNA analysis‡
‡ TmPrime is capable of handling DNA lengths up to 40 kbp, a unique feature for genome synthesis.
2.3 Biochemical method of gene synthesis
Gene synthesis requires several sequential steps, as shown in Figure 2.2. First, oligos are designed by a bioinformatics program (such as TmPrime) and then chemically synthesized using an oligo synthesizer. Pools of oligos are assembled into the full-length target DNA template by polymerase cycling assembly (PCA) or ligase chain reaction (LCR), and the template is then amplified by a second polymerase chain reaction (PCR).
Figure 2.2: Process steps of gene synthesis. Oligonucleotides are synthesized as building blocks for polymerase cycling assembly or ligase chain reaction. Synthesized mismatched DNA is filtered out via enzymatic error filtering.
Mismatched DNA always exists in the assembled products owing to the poor quality of synthetic oligonucleotides. Thus, an error filtering step is added to remove those mismatched [...] To create a long DNA sequence, such as 10 kbp, intermediate DNAs with lengths of around 1 kbp can be linked together using the same approach.
2.3.1 LCR based gene assembly
LCR can be used to assemble a target gene with a long DNA sequence from a pool of short oligos. Figure 2.3 shows the basic principle of LCR-based gene synthesis. The synthetic oligos are first phosphorylated with T4 kinase to convert their 5’ ends from hydroxyl groups to phosphate groups (Figure 2.3a). These oligos are then annealed and linked together by a thermostable ligase to create a long DNA sequence (Figure 2.3b).
For the ligation reaction, no gap is allowed between two adjacent oligos. Although LCR-based gene synthesis is accurate and mishybridization of oligos rarely occurs during the reaction, this method requires an extra step to phosphorylate the 5’ ends of the oligos. Moreover, a high concentration of oligos (in the μM range) is required to ensure successful hybridization, which further increases the oligonucleotide cost. In addition, it takes around 18 hours to complete the entire ligation reaction [3].
Figure 2.3: LCR based gene synthesis. (a) Oligos are phosphorylated by modifying their 5’ ends from hydroxyl groups to phosphate groups using a kinase; (b) Oligos are linked together gradually to form template DNA using a thermostable ligase enzyme.
2.3.2 PCR based gene assembly
Similar to LCR, the polymerase chain reaction (PCR) is also a nucleic acid amplification technique. Instead of a ligase, PCR uses a DNA polymerase to amplify template DNA exponentially through thermal cycles of denaturation, annealing and elongation, in the presence of a pair of forward and reverse primers. Several PCR-based gene synthesis methods have been developed, including: 1) two-step/one-step overlapping synthesis, 2) two-step/one-step successive synthesis, 3) thermodynamically balanced inside-out synthesis, 4) TopDown one-step synthesis (Chapter 4), and 5) Automatic TouchDown one-step synthesis (Chapter 5).
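The exponential character of PCR amplification can be made concrete with a short calculation. The sketch below assumes that a fixed fraction of templates is copied in every cycle; the `efficiency` parameter is an assumption introduced here for illustration (1.0 corresponds to ideal doubling) and is not a value measured in this work.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Number of template copies after `cycles` thermal cycles.

    efficiency is the fraction of templates replicated per cycle:
    1.0 gives ideal doubling, lower values give slower growth.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (1, 2, 3, 20, 30):
        print(f"cycle {n:2d}: ideal {pcr_copies(1, n):.0f}, "
              f"at 90% efficiency {pcr_copies(1, n, 0.9):.0f}")
```

With ideal doubling, a single molecule passes one million copies at cycle 20, which is why a modest loss of efficiency per cycle has a large effect on the final yield.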
Two-step/one-step overlapping PCA gene synthesis
Figure 2.4 shows the principle of the two-step overlapping polymerase cycling assembly (PCA) method. Oligo sequences F1 and R1, R1 and F2, and so on, are designed to overlap. The oligos are usually 40 to 60 bases long, with overlapping regions of around 20 bases. The oligos anneal at the overlapping regions (at 52°C, for example), are extended by DNA polymerase in the 5’ to 3’ direction at 72°C, and become longer double-stranded DNA. These DNA segments are denatured into single strands by raising the solution temperature to 95°C. The extended strands subsequently anneal at their overlapping regions to form even longer DNA segments. By repeating this process several times, short oligos can be assembled into long DNA fragments (500 to 1,000 bp in length). A second PCR is then conducted, with outer primers added, to amplify the assembled full-length DNA. Long DNA can be synthesized effectively using this method.
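The overlapping layout of F1, R1, F2 and so on can be sketched as a simple tiling of the target sequence. The example below uses fixed-length 40-base oligos with 20-base overlaps, alternating between the sense strand and its reverse complement; fixed lengths are a simplification made for this illustration, since the design programs discussed in Section 2.2 adjust the lengths to equalize melting temperature. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def overlapping_oligos(target: str, oligo_len: int = 40, overlap: int = 20):
    """Tile `target` with oligos that alternate between strands.

    Returns a list of (name, sequence) pairs, each written 5' to 3'.
    """
    target = target.upper()
    step = oligo_len - overlap
    if step <= 0:
        raise ValueError("overlap must be shorter than oligo_len")
    oligos, start, index = [], 0, 0
    while True:
        end = min(start + oligo_len, len(target))
        window = target[start:end]
        if index % 2 == 0:   # forward oligo, sense strand as written
            oligos.append((f"F{index // 2 + 1}", window))
        else:                # reverse oligo, anti-sense strand
            oligos.append((f"R{index // 2 + 1}", reverse_complement(window)))
        if end == len(target):
            break
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    gene = "ACGTTGCAAGGCTTAACCGG" * 5   # 100-base toy target
    for name, seq in overlapping_oligos(gene):
        print(name, seq)
```

Because each reverse oligo is the reverse complement of its window, the 3' end of every oligo is complementary to the 3' end of its neighbour, which is what allows the polymerase to extend both strands from the annealed overlap.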
The two steps of the gene synthesis method described above can be combined into a single-step PCR reaction [22], in which the inner oligonucleotide mixture and the outer primers are added together and subjected to a combined PCA/PCR thermal cycling process. During thermal cycling, the DNA fragments are extended as well as amplified. The advantage of this method is that it simplifies the procedure and shortens the time taken by the two-step PCR process. It is a good method for [...] long DNA segments (> 1.0 kb) due to the increasing possibility of mishybridization among oligos. Also, high-concentration outer primers tend to anneal with and amplify the DNA fragments being assembled, which leads to the depletion of dNTPs before the formation of the full-length product.
Figure 2.4: Operation principle of two-step overlapping polymerase cycling assembly. Different pools of oligos with partially overlapping sequences are first assembled into long DNA blocks. The outer primers are then added to amplify the assembled full-length DNA.
Two-step/one-step successive PCA
One-step and two-step successive PCA methods [8] perform DNA assembly successively from one end of the target DNA. As shown in Figure 2.5a, two-step DNA assembly starts with the hybridization and extension of F5 and R5. After each thermal cycle, the DNA fragments extend successively from the F5/R5 end. The full-length DNA template can be formed within 20 to 30 cycles, depending on the length of the gene. Outer primers are subsequently added for PCR amplification of the assembled full-length DNA template. In one-step successive PCR, as in one-step overlapping PCR, inner oligos and outer primers are added together in a combined assembly and amplification process.
Thermodynamically balanced inside-out (TBIO)
Thermodynamically balanced inside-out (TBIO) PCA [26] aims to balance the thermodynamic characteristics of the oligos by increasing the concentration of the oligo mixture from the inside out. As shown in Figure 2.5b, DNA assembly starts with the annealing and extension of the overlapping oligo pair F1 (sense) and R1 (anti-sense). The extended F1/R1 pair then anneals with the primer set F2 and R2, and so on until the full-length DNA is generated. For TBIO, the concentrations of the inside to outside primers can be adjusted to yield the fully amplified DNA product in one PCR. The synthesis process involves several systematic bidirectional extension cycles, and each cycle completes the formation of a fully synthesized inside fragment before the next round of annealing and extension takes place. The method can thereby produce a well-defined and narrow range of DNA products.
A comparison of the gene synthesis methods is shown in Table 2.2. In terms of error rate, synthesis time and process cost, the two-step overlapping method performs better than the other methods mentioned above [8].
Figure 2.5: (a) Successive extension polymerase cycling assembly method. DNA is elongated successively from oligos R5 and F5. (b) Thermodynamically balanced inside-out polymerase cycling assembly method. DNA construction starts from the inside oligos F1 and R1 and is gradually extended using the outside oligos.
Table 2.2: Comparison of different methods of DNA assembly
Methods Oligo conc Error rate Process time (1kb) Yield
Overlapping Low ≤ 1kb, low
Time, temperature and fluorescence are the three most important parameters used to describe real-time fluorescence monitoring of PCR. As the amplification reaction proceeds, the fluorescent signal, which represents the amount of product, shows an overall increasing trend. The signal also varies within each thermal cycle, and it can only be seen when the DNA being amplified is double-stranded. In melting peak analysis, the amplified sequences can be characterized by their apparent melting temperature (Tm), which is a function of product length and base composition. Because the Tm of a product is not determined by length alone, melting curve analysis offers a superior alternative to gel electrophoresis for separating products of the same size [42, 43]. Real-time PCR is used for absolute and relative quantification of DNA and RNA template molecules and for genotyping in a variety of applications. It is also a very important tool in the diagnosis of tumours [44-47]. We use real-time PCR as a novel approach to monitor gene assembly results and compare assembly efficiency. The purity of the assembled products can be estimated by melting curve analysis, which eliminates the need for agarose gel electrophoresis. This process is presented in detail in the following chapters.
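Melting peak analysis is commonly carried out by plotting the negative derivative of fluorescence with respect to temperature, -dF/dT, and taking the temperature at its maximum as the apparent Tm. The following Python sketch illustrates that calculation on a synthetic sigmoidal melting curve. The function name melting_peak, the temperature range, the step size and the assumed Tm of 84 °C are all hypothetical and are not measurements from this work.

```python
import math


def melting_peak(temps, fluor):
    """Return (apparent Tm, peak height) from the -dF/dT melting peak.

    temps: increasing temperatures in degrees Celsius
    fluor: fluorescence readings taken at those temperatures
    The derivative is estimated with central differences at interior points.
    """
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = -(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp, best_slope


# Synthetic data: fluorescence falls as the double-stranded product melts.
assumed_tm = 84.0                                 # hypothetical value
temps = [60.0 + 0.5 * i for i in range(71)]       # 60 to 95 degrees C
fluor = [1.0 / (1.0 + math.exp(t - assumed_tm)) for t in temps]

tm, height = melting_peak(temps, fluor)
```

A pure product would be expected to give a single dominant peak, whereas a mixture of full-length and truncated products would show additional peaks. Real data would normally be smoothed before differentiation, which this sketch omits.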
2.3.4 DNA extraction and purification
Besides the full-length assembled DNA, the assembly mixture also contains unused dNTPs, short oligonucleotides and shorter intermediate DNAs (truncated products). These impurities can greatly affect the downstream steps of gene synthesis and should preferably be removed from the PCR-assembled products.
The target DNA can be isolated by the conventional method, in which the band of target DNA is cut out of the gel and the DNA is then extracted using one of various gel purification kits. This method is reliable, but it is tedious and time-consuming. It is therefore desirable to develop solid-phase purification methods that can be easily integrated with microfluidic chips.
Solid-phase DNA extraction and purification is widely used to extract contaminant-free DNA from samples such as whole blood and saliva, and to purify amplified PCR product for further processing [48]. Generally speaking, there are two categories of DNA extraction methods: non-specific and specific. Most modern non-specific DNA purification methods depend on the adsorption of DNA onto a solid-phase surface, either through hydrogen bonding to silica or via electrostatic interactions [49]. These purification methods are normally used to purify PCR product or to extract DNA and RNA from biological samples such as blood, saliva and tissue for downstream applications. Several chip-based nucleic acid extraction methods have been developed using microfabricated silicon structures [50, 51], silica beads [52] and sol-gel silica [49].
To simplify device integration with the other gene synthesis components, silica-coated magnetic beads (ChargeSwitch® beads from Invitrogen) are an attractive approach. The ChargeSwitch® beads [53] work on a principle similar to that of the other reported methods [49-52]. DNA is first adsorbed onto the silica surface of the beads under low-pH conditions. The unbound impurities are then washed away (Figure 2.6a). The adsorbed DNA is released into a solution of higher pH, and a short heat shock is applied during elution to improve the DNA release efficiency.
Besides non-specific DNA purification with ChargeSwitch beads, full-length DNA can be purified specifically using streptavidin or dT25 beads (Figures 2.6b,c). Specific DNA extraction isolates a specific target DNA from a sample, based on either selective DNA hybridization or the streptavidin-biotin interaction [54-58]. Sequence-specific oligonucleotides immobilized on a solid phase serve as probes to capture single-stranded target DNA whose sequence is complementary to the probe [54]. Alternatively, streptavidin-coated paramagnetic beads can be used to capture biotinylated target DNA [54-58]. Compared with non-specific purification, these methods provide selective extraction of target DNA for highly sensitive applications such as DNA analysis, mutation detection and molecular diagnostics.
Figure 2.6: Schematic illustrations of non-specific and specific DNA purification using (a) ChargeSwitch magnetic beads, (b) streptavidin magnetic beads, and (c) oligo (dT)25 magnetic beads.
2.3.5 Enzymatic error filtering
Errors, or incorrect bases, are unavoidable in the initial assembly products. Generally, the error rate of a synthetic gene assembled from unpurified oligos is about 1-5 errors/kbp. Errors can arise from the poor quality of the short oligos and from the limited fidelity of the polymerase enzyme [59]. The poor quality of