A combined statistical and in silico framework for analysis and characterization of microbial and mammalian metabolic networks


A COMBINED STATISTICAL AND IN SILICO FRAMEWORK

FOR ANALYSIS AND CHARACTERIZATION OF

MICROBIAL AND MAMMALIAN METABOLIC NETWORKS

SELVARASU SURESH

(B Tech, University of Madras, Chennai, India)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF CHEMICAL & BIOMOLECULAR ENGINEERING

NATIONAL UNIVERSITY OF SINGAPORE

2009


Acknowledgements

It is with great pleasure that I take this opportunity to express my gratitude to all those who have helped me in my research progress and, more so, in shaping my PhD into an enriching experience. The research guidance that I received from my advisors, Prof. I. A. Karimi and Dr. Lee Dong-Yup at NUS, was much more than I had expected. With due respect, I express my sincere gratitude to them for being wonderful and inspiring supervisors. Without their immense support, timely inputs, guidance and encouragement, my progress would have been impossible. There are no words to explain their influence on my research. I also wish to thank them for involving me in several projects, and especially in collaborations with research institutes (BTI), which gave me a very good chance to learn more. It was indeed a privilege to work with them.

I would like to thank Dr. Victor Wong and Dr. Dave Ow from BTI, Singapore, for their immense help in providing me with the experimental data. I appreciate their patience in explaining the nuances of experimental strategies whenever I approached them. I extend my thanks to A/P Loh Chee and A/P Sanjay Swarup for kindly agreeing to be on the panel of examiners and for their valuable suggestions for planning this research during the qualifying exam. I also thank the final reviewers for spending time on evaluating this thesis. I also express my gratitude to Dr. Lakshminarayanan for his valuable suggestions at different times during my PhD.

I wish to thank all the anonymous reviewers of our publications, who gave constructive feedback on all our manuscripts and helped us to bring out the best in this research. I also take this opportunity to appreciate and thank all those dedicated researchers who shared their research in the form of literature, website notes, and freely available online data. This information has played a major part in strengthening this research work.

I also express my gratitude to all the professors at ChBE/NUS whose valuable lectures and seminars led to good ideas for this research. Special thanks to Prof. I. A. Karimi and Prof. Rangaiah for giving me an opportunity to teach undergraduate students (it was an enriching experience for me at NUS), which earned me the best tutor award. It is indeed an honor. I also thank the ChBE department for financially supporting all my conference visits.

Special thanks to all my labmates and other friends at NUS (if I start naming them, the list would keep rolling) for their affectionate support and interactions, which made my journey at NUS a wonderful experience. I would also like to thank all the GSA office bearers for helping me in one way or another and bringing out the best in me at different times during GSA activities. Lastly, I thank all my professors, students and affectionate friends who trained and inspired me to be what I am today. I will cherish this wonderful journey for a long time.

And most importantly, I thank my parents (Mr. Selvarasu and Mrs. Vijaya), my sister (Mrs. Veni), and my niece and nephew (Dharani and Nirmal) for always being my source of inspiration. Their love, continued support and motivation were the main driving force for me during my PhD. I am ever grateful and indebted for their care and affection.

Table of Contents

List of Tables
List of Figures
Nomenclature
Abbreviations
SUMMARY

1 Introduction
1.1 Cellular organisms and their complex functioning
1.2 Systems biology: a new paradigm in biological research
1.2.1 Knowledge required for systems biology
1.2.2 Approaches in systems biology
1.2.3 Opportunities to unravel biological functions
1.3 Analysis techniques available in the data rich environment
1.4 Motivation for research
1.5 Scope of the present work
1.6 Organization of the thesis

2 Modeling and analysis of biological systems: An overview
2.1 Tools available for modeling biological systems
2.2 Genome-scale modeling
2.3 Constraints-based modeling approach
2.4 Other metabolic network simulations
2.5 Algorithms available for characterizing metabolic networks
2.6 Systems biotechnology: An approach for systematic strain improvement
2.7 In silico techniques available for strain improvement
2.8 Tools for multivariate data analyses
2.9 Research directions

3 Framework for combined analysis using statistical and in silico approaches
3.1 Introduction
3.2 Experimental data and their trend
3.3 Data preprocessing and elemental balancing
3.3.1 Cumulative consumption and specific rates calculation
3.4 Multivariate statistical data analysis (PCA and PLS)
3.4.1 Principal component analysis (PCA)
3.4.2 Partial least squares regression (PLS)
3.5 In silico modeling and analysis
3.5.1 Metabolic network reconstruction
3.5.2 Constraints-based flux analysis
3.6 Application of the framework

4 Application of framework for characterizing Escherichia coli DH5α growth and metabolism in a complex medium
4.1 Introduction
4.2 Materials and methods
4.2.1 Strains and culture conditions
4.2.2 Analytical techniques
4.2.3 Data preprocessing for statistical analysis
4.2.4 Constraints-based flux analysis
4.3 Results and discussion
4.3.1 Growth, metabolite uptake and excretion profiles during batch culture
4.3.2 Elemental balancing
4.3.3 Multivariate statistical analysis
4.3.4 In silico metabolic flux analysis
4.3.5 Sensitivity analysis of amino acid and glucose consumption
4.3.6 Analysis of the metabolite consumption and utilization
4.3.7 Availability of other nutrients in the medium
4.3.8 Exploring the statistical analysis results using in silico analysis
4.4 Concluding remarks

5 Genome-scale modeling and in silico analysis of mouse cell metabolism
5.1 Introduction
5.2 Materials and methods
5.2.1 Metabolic network reconstruction
5.2.2 Network visualization
5.2.3 Statistical network analysis
5.2.4 Constraints-based flux analysis
5.3 Results and discussion
5.3.1 Genome-scale reconstruction of mouse metabolic network
5.3.2 Comparison of mouse model with yeast and E. coli genome-scale models
5.3.3 In silico model validation
5.3.4 Structural and functional characterization of mouse metabolism
5.3.5 Important role of lipid pathway in mouse metabolism
5.3.6 Alternate flux distributions and flux variations
5.4 Conclusion

6 Application of framework to elucidate mouse hybridoma cell growth and metabolism in a fed-batch culture
6.1 Introduction
6.2 Materials and methods
6.2.1 Cell line and culture medium
6.2.2 Analytical techniques
6.2.3 Data preprocessing for statistical analysis
6.2.4 Constraints-based flux analysis
6.3 Results and discussion
6.3.1 Fed-batch cell culture
6.3.2 Elemental balancing on fed-batch data
6.3.3 Multivariate statistical analysis
6.3.4 In silico metabolic flux analysis
6.3.5 Other possible cellular objectives
6.3.6 Understanding cellular behavior from combined analysis
6.4 Conclusion

7 Identification of necessary genes and evaluating their perturbations for strain improvement in E. coli
7.1 Introduction
7.2 Algorithm for identifying sufficient and necessary genes
7.2.1 Mathematical formulations and algorithm
7.2.2 Identifying set of necessary genes
7.3 Application of the algorithm
7.3.1 Analysis in E. coli DH5α metabolic network
7.4 Application of the necessary gene sets to identify knockout combinations for succinate production
7.5 Concluding remarks

8 Contributions and future recommendations
8.1 Summary of the contributions
8.2 Future directions
8.2.1 Expanding the horizon of mouse cell metabolism
8.2.2 Reconstruction of metabolic network of CHO cell lines

References
Appendices
List of Publications
VITAE


List of Tables

2.1 List of available genome-scale models for various organisms

3.1 List of public resources available for reconstruction of genome-scale metabolic models

4.1 Comparison of metabolic reaction fluxes of amino acids biosynthetic reactions

4.2 Sensitivity of amino acids, glucose and trehalose uptake on cell biomass production in phase 1a

4.3 Sensitivity of amino acids, glucose and trehalose uptake on cell biomass production in phase 2a

4.4 Consumption or production of amino acids for biosynthetic demand as well as for other metabolites production in phase 1a

4.5 Consumption or production of amino acids for biosynthetic demand as well as for other metabolites production in phase 2a

4.6 Comparison of ATP consuming metabolic pathways for complex and minimal medium conditions

5.1 Online resources for reconstructing genome-scale mouse metabolic network

5.2 Characteristics of the mouse genome-scale metabolic network and its comparison with the previous generic model

5.3 Comparison of mouse genome-scale network characteristics with yeast and E. coli

6.1 Summary of specific consumption or production rate of measured metabolites during the exponential growth phase of the cell culture

6.2 Production and utilization of pyruvate in central metabolism during the exponential growth phase of the cell culture

6.3 Energy production from central carbon metabolism in all states

7.1 List of necessary reactions for both cell growth and succinate production

7.2 List of double knockout gene combinations that enhance succinate production in E. coli DH5α


List of Figures

1.1 Interaction of the different expertise in performing systems biology research

1.2 Flowchart showing the major focus of the current research work and the organization of the addressed research issues in different chapters of the thesis

2.1 Genome-scale reconstruction of metabolic network and elucidation of the systemic properties using constraints-based analysis approach

3.1 Schematic illustration of the workflow involved in the analysis using combined statistical and in silico framework

4.1 Profiles of optical cell density and residual concentration of various nutrient components and products in the complex medium. Highlighted regions correspond to three different growth phases of the culture. Phase 1: initial exponential growth phase; phase 2: late exponential growth phase; phase 3: acetate consumption phase. A: Optical density values (OD600), concentration of glucose, trehalose and acetate. B: concentration of amino acids which were rapidly consumed; L-aspartate (ASP), glycine (GLY), proline (PRO), methionine (MET), serine (SER), L-asparagine (ASN), L-tyrosine (TYR), L-threonine (THR), L-glutamate (GLU) and L-alanine (ALA). C: concentration of amino acids which were not completely consumed; L-valine (VAL), L-lysine (LYS), L-isoleucine (ILE), L-leucine (LEU), L-phenylalanine (PHE), L-histidine (HIS) and L-arginine (ARG).

4.2 Results obtained from multivariate statistical analysis

4.3 Results of PLS analysis. Black arrows indicate positive correlation between those amino acids and cell growth. Dotted arrows indicate positive correlation between those amino acids and acetate production. The negative effect of the set of amino acids on acetate is shown using bold lines and on cell growth is shown with a dashed line. A: correlation based on PLS and B: strategies for feed medium design for enhancing cell viability.

4.4 Specific consumption rates of all the measured nutrients and specific growth rate during the initial exponential phase (phase 1) and the late growth phase (phase 2). The value for histidine in phase 1 corresponds to its specific production rate. The rates are ranked according to their specific consumption rates in phase 1.

4.5 Schematic diagrams of metabolic flux distributions and flux-sum across the metabolites serine, pyruvate and acetate. A: Metabolic flux distribution across the central metabolic pathways and amino acids biosynthetic pathways during the exponential growth phase (phase 1: underlined) and late growth phase (phase 2: normal) of the microbial culture. Reactions with higher flux values are highlighted with red (phase 1) and green (phase 2). Serine, pyruvate and acetate are highlighted with squares. B: consumption and production of the metabolites serine, pyruvate and acetate are shown using the flux-sum values across each of the metabolites for phase 1 and phase 2. Percentage contributions to each of the metabolites are also shown. PEP, phosphoenolpyruvate; GLC, glucose; PYR, pyruvate; GLY, glycine; TRE, trehalose; MAL, L-malate; TRP, L-tryptophan; ALAC-S, (S)-2-acetolactate; ACCOA, acetyl coenzyme A; 23DHDP, 2,3-dihydrodipicolinate; 2AHBUT, (S)-2-Aceto-2-hydroxybutanoate; ACSER, O-acetyl-L-serine; PS_EC, phosphatidylserine; CIT, citrate. Annotation of other metabolites follows that of the iJR904 model (Reed et al., 2003).

4.6 Interpretation of statistical and in silico analysis results. A: set of positively correlated amino acids with cell growth and acetate production and the intracellular conversion of amino acids into various metabolites. B: the plausible effect of reducing amino acids (gly, ile, val and his) in the complex medium at the intracellular level. Arrow with bold outline: positive correlation with cell growth; arrow with dashed line: positive correlation with acetate production.

5.1 Schematic representation of the iterative approach employed in the reconstruction and analysis of the genome-scale mouse model. The existing model was used as a template and the network was expanded by compiling the information (genome, biochemical and mouse physiological data). Missing links and redundant reactions were then identified to refine the model with such available resources. The resultant expanded model underwent the validation process using constraints-based flux analysis with cell culture and in vivo gene essentiality data for verifying the prediction. The presence of knowledge gaps was explored and again the model can be improved interactively. Subsequently, the model was analyzed both structurally and functionally to characterize mouse metabolism and identify key pathways, reactions and metabolites.

5.2 Functional classifications of metabolic reactions in the mouse genome-scale model, (A) current updated model and (B) old model. Numbers on pie charts indicate reactions in each subsystem. Metabolic subsystems with number of gene and non-gene associated reactions are detailed in the table.

5.3 Comparison of metabolites across mouse, yeast and E. coli genome-scale models. Only metabolites from the cytosol were considered for comparison.

5.4 Comparison of in silico growth rate with experimentally observed growth rate during batch culture. Specific growth rate is in h^-1; mAb production rate in mg gDCW^-1 h^-1. The bars with black and white colors represent specific consumption and production rates, respectively.

5.5 Comparison of in silico substrate requirements with experimentally observed substrate requirements for cell growth. Essential nutrients in the media are highlighted in red colour and non-essential nutrients are highlighted in blue colour.

5.6 The connectivity of metabolites in different reactions in the metabolic network. The reactions involved in significantly improved metabolic subsystems such as carbohydrates, lipids and amino acids metabolisms are indicated by their edge colours: green, blue and red, respectively. Metabolite colors: blue - cytosol, red - mitochondria, green - extracellular and yellow - cofactors. Metabolites and reactions from amino acids, lipids and carbohydrates metabolism were extracted to draw individual edge generated graphs. Essential reactions and metabolites in the sub-networks are highlighted using cross and star-shaped nodes. Network diameter and average path lengths (APL) for the main network and the three sub-networks are also shown.

5.7 Correlation between metabolite degree and betweenness centrality for (A) all metabolites, (B) essential metabolites and (C) non-essential metabolites. The metabolite can be identified as essential when its removal leads to no growth. Highly-connected, bridging metabolites are highlighted in (A). ACP: acyl carrier protein, ACCOA: acetyl-coA, ACCOAm: acetyl-coA mitochondrial, AKG: α-ketoglutarate, AKGm: α-ketoglutarate mitochondrial, AMASA: L-2-Aminoadipate-6-semialdehyde, ANA: N-acetylneuraminate, CAR: carnitine, GLAC: D-galactose, GLC: D-glucose, GLU: L-glutamate, GLY: glycine, MALACP: malonyl-[acyl-carrier-protein], PPIXm: Protoporphyrin mitochondrial, PYR: pyruvate, SAH: homocysteine, SAM: S-adenosyl-L-methionine, SUCC: succinate, SER: L-serine and URI: uridine.

5.8 Visualization of the ACCOA interaction across lipid metabolism, TCA cycle and glycolysis. The enlarged section shows the high connectivity and bridging characteristics of ACCOA. Blue edges: lipid metabolic reactions, green: TCA cycle and red: glycolysis. ACCOA: acetyl-coA.

5.9 Comparison of (A) metabolite flux-sum and (B) metabolic flux distribution during cell growth under normal and AKG deletion conditions. Metabolite flux-sum and flux distributions in carbohydrates and nucleotides metabolisms are shown in the enlarged sections. Blue and red color bars represent normal and AKG deletion conditions, respectively. AKG: α-ketoglutarate.

5.10 Classification of essential (A) reactions and (B) metabolites according to different metabolic subsystems in the mouse metabolism

5.11 Reaction usages in multiple optimal flux distributions. The graph shows the fraction of the metabolic flux distributions that utilize a specific reaction categorized under different subsystems.

6.1 Profiles of viable cell density, mAb, amino acids, glucose, OUR, lactate and ammonia in the fed-batch culture. A: Viable cell density and mAb concentration. B: Glucose, glutamine, OUR, lactate and ammonia concentrations. C: Concentration profiles of all essential amino acids. D: Non-essential amino acids concentrations. mAb- monoclonal antibodies (IgG1); ARG- arginine; THR- threonine; SER- serine; GLY- glycine; TYR- tyrosine; PHE- phenylalanine; MET- methionine; HIS- histidine; ASN- asparagine; ASP- aspartate; LYS- lysine; VAL- valine; ILE- isoleucine; GLU- glutamate; LEU- leucine; ALA- alanine; GLN- glutamine; GLC- glucose; LAC- lactate; NH3- ammonia; OUR- oxygen uptake rate. The concentrations of the amino acids tryptophan, cysteine and proline were negligible.


6.2 Summary of the results from multivariate statistical analysis using PCA and PLS for fed-batch mouse hybridoma cell culture. Amino acids consumption/production rates were clustered using PCA. Correlation between the variables obtained from PLS analysis is also shown. PCA- Principal Component Analysis; PLS- Partial Least Squares.

6.3 Schematic illustration of the correlation identified by PLS analysis. Dotted lines indicate the negative interaction of the amino acids (asp, glu and ala) with cell growth and mAb production rate.

6.4 Experimental and simulated growth rates for different time points during the exponential growth phase of the culture

6.5 Metabolic flux distributions across the carbohydrate metabolism in hybridoma cells. Flux across the three pathways including glycolysis, pentose phosphate pathway and TCA cycle are shown for all the 12 time points during the exponential growth phase.

6.6 Overall distributions of simulated internal fluxes across different metabolic pathways on the left for time point V in Figure 3. The expanded region on the right details the simulated flux values within the central carbon metabolism. Bar length and direction indicate the minimum and maximum possible flux values achieved by flux variability analysis.

6.7 The resulting flux distributions from MFA illustrate consumption of all essential and non-essential amino acids from the media and subsequent utilization of all essential amino acids for the production of non-essential amino acids within the cell.

6.8 Metabolic activities of the consumed nutrients inside the cell. Metabolites in purple colour, EAA; green, NEAA; black, ala, glu, lac and NH3; red, cell growth and mAb. EAA, essential amino acids; NEAA, non-essential amino acids.

7.1 The algorithm represents an iterative method to identify the set of sufficient genes and their corresponding reactions. For executing the algorithm, the cellular objectives (growth, biochemical productions) are fixed at different levels of their maximum values and minimum sets of genes are determined.

7.2 Illustration of sufficient genes identification approach. Circles 1, 2, and 3 represent different levels of cellular objectives. The shaded region in dark circle shows the essential set of genes for achieving cellular objective values for all the three cases and the remaining regions correspond to necessary genes.

7.3 Number of sufficient genes required for maintaining cell growth rates

7.4 Succinate production limits for wild type and the mutants. The bold line indicates the limits for the wild type strain and the points indicate double knockout mutants. Red color circle: result of SUCD1i/SUCD4 and NADH6 knockout. Blue color circle: result of SUCD1i and PGL. Other combinations are described in Table 7.2.


Nomenclature

V  Culture volume (ml)

X_v  Viable cell concentration (10^6 cells ml^-1)

µ  Specific growth rate (h^-1)

q_s  Specific substrate consumption rate (mmol h^-1 cell^-1)

q_p  Specific production rate (mmol h^-1 cell^-1)

S  Substrate concentration (mM)

P  Product concentration (mM)

S_f  Substrate feed concentration (mM)

P_f  Product feed concentration (mM)

F  Feed flow rate (ml h^-1)

t  Time (h)

v_j  Reaction flux (mmol gDCW^-1 h^-1)

α_j  Lower bound for reaction flux (mmol gDCW^-1 h^-1)

β_j  Upper bound for reaction flux (mmol gDCW^-1 h^-1)

S_ij  Stoichiometric coefficient of metabolite i in reaction j (dimensionless)

Z  Objective function in the optimization problem

c_j  Weight associated with the reaction fluxes in objective function (dimensionless)

M  Number of metabolites in the network (dimensionless)

N  Number of reactions in the network (dimensionless)
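
For orientation, the flux-related symbols above fit together in the standard flux balance analysis linear program. The block below is the usual textbook form under a steady-state assumption; it is given here as a reading aid and is not necessarily the exact formulation developed in the later chapters.

```latex
\begin{aligned}
\max_{v}\quad & Z = \sum_{j=1}^{N} c_j v_j \\
\text{s.t.}\quad & \sum_{j=1}^{N} S_{ij}\, v_j = 0, \qquad i = 1, \dots, M \\
& \alpha_j \le v_j \le \beta_j, \qquad j = 1, \dots, N
\end{aligned}
```

The equality constraints state that each of the M metabolites is produced and consumed at equal rates, and the bounds α_j and β_j restrict each of the N reaction fluxes, for example to measured uptake rates.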


Abbreviations

FBA Flux Balance Analysis

GAMS The General Algebraic Modeling System

IgG1 Immunoglobulin G1

LP Linear Programming

mAb Monoclonal Antibody

MDS Multidimensional Scaling

MFA Metabolic Flux Analysis

MILP Mixed Integer Linear Programming

MINLP Mixed Integer Nonlinear Programming

MOMA Minimization of Metabolic Adjustments

OMNI Optimal Metabolic Network Identification

PCA Principal Component Analysis

PCR Principal Component Regression

PLS Partial Least Squares regression

QP Quadratic Programming

ROOM Regulatory On/Off Minimization


SUMMARY

With advances in experimental technologies, high-throughput experimental data are being generated that describe the micro- and macromolecular cell functions of complex biological systems. Understanding these functions is essential for improvements in biomedical research and, more importantly, for biotechnological processes. Microbial and mammalian cells are commonly used in these processes for producing very high-value therapeutics. In recent years, there has been an increasing demand for these compounds, which points to the need for improved cell culture performance. However, there are complexities associated with cell culture, mainly due to deviations in the culture conditions and heterogeneous interactions among different variables in the culture and between different cellular components, which make it difficult to elucidate the cellular functions. In addition, accumulation of toxic metabolites in the culture also leads to reduced productivity or cell death. These complexities pose a major challenge in developing high-yielding cell cultures. Motivated by these challenges, the main objectives of this research include reviewing potential unresolved issues pertaining to understanding the complex functionalities associated with microbial and mammalian metabolism, and resolving them using suitable techniques, which would enable us to improve the performance of fermentation processes for producing high-value therapeutics.

Multivariate statistical techniques have often been used to extract biologically relevant information from high-throughput experimental data, even though they do not provide any insight into the organism's internal metabolic activities. To address this, genome-scale modeling approaches can be useful in improving our understanding of the internal cellular metabolism of organisms. Thus, these two approaches can be used concomitantly to better understand and characterize complex microbial and mammalian cellular systems.

The emphasis of this research is to develop a framework that systematically combines statistical and in silico approaches to identify important nutrient components in a culture medium based on the experimental data, and to study the effect of these components on the internal metabolic behavior of cellular systems. This understanding would be crucial for modifying or designing organisms to enhance byproduct yield and for developing efficient biotechnological processes.

The major research issues addressed in this work and their corresponding outcomes are:

Combined framework: The first part of the thesis involves development of the combined framework using multivariate statistical analysis techniques and in silico modeling approaches for characterizing cell culture fermentation and exploring the internal cell metabolism. The most relevant statistical methods for examining the experimental data are described. Subsequently, the various steps and procedures involved in reconstructing a genome-scale metabolic model and conducting in silico analysis are also detailed.

Application to microbial system: The second part of the thesis includes application of the framework to microbial metabolic networks. E. coli was chosen as the model organism due to its applicability to biotechnological processes. The framework was applied to examine the growth and metabolism of the E. coli DH5α strain grown in a complex medium. Highly correlated nutrients from the culture medium were obtained using statistical analysis, and the effect of nutrient consumption on intracellular metabolism was explored using constraints-based genome-scale modeling.

Application to mammalian system: The third part of the thesis considers analysis and characterization of the mammalian metabolic system. In this case, mouse cell lines were used due to their wide application in both the biomedical and biotechnological communities. Initially, we reconstructed the mouse cell metabolic network by resorting to the genome-scale modeling approach and investigated its structural properties. Subsequently, statistical analysis was performed for a fed-batch culture of mouse hybridoma cells grown in a complex medium, producing IgG1 (monoclonal antibody). In silico analysis was then performed using the reconstructed model to elucidate the internal metabolic states of mouse cells based on the observations from statistical analysis.

Strain improvement strategies: The last part of the thesis deals with the development of a novel optimization algorithm to identify the set of necessary genes/reactions in the metabolic network for cell growth and byproduct production. This algorithm can be used to select gene knockout candidates for mutant phenotypes that can enhance the yield of desired byproducts (e.g., amino acids, succinate, etc.). The applicability of this approach can be easily tested and verified experimentally for developing high-yielding microbial or mammalian cell lines.
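
The statistical half of the framework summarized above relies on PCA to group nutrients with similar consumption patterns. As a rough, non-authoritative sketch (not the implementation used in this thesis), the first principal component of a small table of specific rates can be found by power iteration on the covariance matrix. The function name `first_principal_component`, the starting vector, and the sample values are assumptions made purely for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit loading vector, explained-variance fraction) of the first PC.

    rows: list of observations (e.g. time points); each observation is a list
    of variables (e.g. specific uptake rates of individual nutrients).
    """
    n = len(rows)
    d = len(rows[0])
    # Mean-center every variable (column).
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the dominant eigenvector of cov.
    # Assumed starting vector; it must not be orthogonal to that eigenvector.
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("degenerate data or unlucky starting vector")
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the variance captured along vec.
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(d))
                     for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return vec, eigenvalue / total_variance


# Hypothetical specific uptake rates: rows are time points, columns are
# three nutrients. These numbers are invented for the example.
demo_rates = [
    [0.9, 0.4, 0.1],
    [0.7, 0.3, 0.2],
    [0.4, 0.2, 0.2],
    [0.2, 0.1, 0.3],
]
loadings, explained = first_principal_component(demo_rates)
print("PC1 loadings:", loadings)
print("Fraction of variance explained:", explained)
```

Nutrients whose loadings share a sign and similar magnitude vary together across time points, which is the kind of grouping the PCA step is meant to expose. In practice a library routine based on singular value decomposition would be preferable to this hand-rolled loop.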


1 Introduction

1.1 Cellular organisms and their complex functioning

Cellular functions are often complex due to the high degree of interaction among various molecules and organelles within a cell and across cells. Based on the level of these internal complexities, living cells/systems have been mainly classified into prokaryotes and eukaryotes. Prokaryotes possess the simplest cell structure. Nevertheless, their functioning is highly complex due to their molecular interactions at different time and spatial scales. Eukaryotes exhibit much more complex functions due to the presence of different organelles within the cell, making their functions even harder to understand. Until recently, most biological research was devoted to understanding the properties of isolated molecules by reducing the complexity involved in biological systems. However, in reality, cell functions vary under interactions. The presence of surrounding molecules in a cellular environment may result in activation, suppression, or regulation of a molecule. Thus, functions that arise from the interaction of different molecules cannot be easily understood or predicted by studying isolated molecules. Biological systems often differ significantly from physical systems due to their complex microscopic and macroscopic behavior, resulting from the interactions of several thousand to a few million different components (Hartwell et al., 1999). This calls for a higher-level approach for handling the complexities as well as integrating functionalities and interactions at different levels, in order to elucidate cellular functions at both microscopic and macroscopic levels. Such inferences are not easily achieved by the conventional reductionist approach.

The availability of genome sequences for different microbial and mammalian organisms, together with technological advancements in the field of genomics and high-throughput experimental techniques, has generated a wealth of biological data that give information on genes, mRNA, proteins, and metabolic products and their functions. So far, biologists have not effectively utilized these billions of data points due to the challenges involved in integrating them. This complexity is coupled with the difficulty involved in integrating the functioning of different cellular organelles. Such integration underlies the emergence of "Systems Biology", an interdisciplinary research field that aims to develop a quantitative understanding of cellular functions. It involves characterizing different components of a biological system using the knowledge and techniques of systems engineering (Kitano, 2002). The post-genomic era of cellular biology can focus on utilizing this approach to understand the mechanisms through which biological functions emerge from the interaction of numerous molecular components.

1.2 Systems biology: a new paradigm in biological research

Systems biology is a new scientific discipline that studies the behavior of complex biological organizations through the integration of diverse quantitative information and mathematical modeling to generate predictive hypotheses and elucidate the functions of a biological system (Aderem, 2005; Hartwell et al., 1999; Hood et al., 2004; Westerhoff and Palsson, 2004). Although engineers have applied the concept of integrating the systems behavior of biological systems for years, the term systems biology emerged as a distinct research paradigm only in recent years. The significance of this interdisciplinary research area is evident from the number of publications found under the name of systems biology in an ISI Web of Science search. The number of articles on the topic of systems biology was merely 9 in 2001. This has grown several-fold in recent years, to the extent that the number of such articles exceeded 1000 in 2008 (ISI Web of Science).

Systems biology research has been propelled by the successes of molecular biology and genetics, which have made available the genomic blueprints of numerous organisms, together with extensive experimental data covering most aspects of cell function. These successes also create an opportunity for theory to play a significant role in guiding experiments by developing increasingly complex hypotheses, formed by modeling the phenomena and analyzing genomic and other experimental data. Beyond the cell level, systems biology addresses questions of how multicellular organisms develop and function, and how populations interact on the ecological scale.

1.2.1 Knowledge required for systems biology

Progress in systems biology requires a deep and detailed understanding of biological systems, which is essential for identifying the “right” questions. It requires the development of novel concepts geared towards living systems, which are extremely heterogeneous, non-generic, and nontrivially coupled to the environment. Since it is an intrinsically interdisciplinary field, it draws on expertise and perspectives from disciplines such as engineering, biology, computer science, physics, chemistry, and mathematics. Ideas and concepts from these diverse fields will enrich physical science as it strives to describe the complexity of living matter. Biology provides a complementary perspective from which to consider, analyze, and ultimately understand the living world, whereas physics and chemistry come in handy for probing the behavior of molecules and their activity inside living cells. Engineering applications can effectively harness the power of the living system and solve problems that cannot be solved in any other way. Mathematics is important for developing accurate first-principles models of a biological system (starting with a small subsystem of a single cell) and then predicting its dynamics over time.

Figure 1.1 Interaction of the different areas of expertise involved in systems biology research

Advanced technical expertise in bioinformatics, computation, statistical analysis, and mathematical modeling is pivotal for integrating and making sense of the large and complex datasets generated through high-throughput experimental techniques. Through integration and modeling, such studies would allow us to better exploit the complexity of genomic data and extract its biological and clinical significance. The integration and modeling of such diverse information can vastly enhance the power of the systems biology approach, help us decipher the mechanisms behind metabolic behavior, and provide new insights for exploration. This approach can also be explored further with the analysis, modeling, simulation, and design techniques that are already used in electronic, control, and systems engineering. Furthermore, a combined effort by science, engineering, and mathematics can be useful in exploring the complex functional interactions of the system (Fig. 1.1).

1.2.2 Approaches in systems biology

Model-driven analyses and their experimental validation are the two major components of systems biology research. Analyses of biological systems using this approach fall into two main categories. The first is quantitative systems biology, which deals with the extraction of quantified information, such as the molecular responses of a biological system to a given perturbation. Some of the technology platforms used for this approach are:

• Gene expression measurement through DNA microarrays and SAGE

• Protein levels through two-dimensional gel electrophoresis and mass spectrometry, including phosphoproteomics and other methods to detect chemically modified proteins

• Metabolomics for small-molecule metabolites

• Glycomics for sugars

These techniques are frequently combined with large-scale perturbation methods, including gene-based approaches (RNAi, misexpression of wild-type and mutant genes) and chemical approaches using small-molecule libraries. Robots and automated sensors enable such large-scale experimentation and data acquisition. These technologies are still emerging, and many face the problem that the larger the quantity of data produced, the lower its quality. A wide variety of quantitative scientists (computational biologists, statisticians, mathematicians, computer scientists, engineers, and physicists) are working to improve the quality of these approaches and to create, refine, and retest models until the predicted behavior accurately reflects the observed phenotype.

The second category of systems biology derives qualitative predictions, using knowledge from molecular biology to develop causal models that mimic the biological system of interest and to propose hypotheses that explain its systemic properties. These hypotheses can then be tested and used as a basis for developing mathematical models of the system. The causal models are used to explain the effects of biological perturbations qualitatively, while the mathematical models are used to predict quantitatively how different perturbations in the system's environment affect the system.

1.2.3 Opportunities to unravel biological functions

Two important questions arise from using the systems biology approach:

• Is systems biology suitable for exploring most complex biological problems?

• What kinds of opportunities and challenges does this field of research provide, and what would be the intellectual outcome in the future?

The main goal of systems biology is to utilize the knowledge available in systems engineering to gain a clear understanding of basic biological functionalities at the microscopic and macroscopic levels. It can be foreseen that, a few decades from now, systems biology research will have generated a vast amount of new information about life processes, from the role of specific genes to the metabolism of whole organisms. This technology has the potential to bring about changes in medicine, agriculture, industry, bioremediation, and energy. When such technology is utilized, the mysteries of biological evolution may be unlocked, and the knowledge gained can be put to use for the benefit of humankind.

1.3 Analysis techniques available in the data rich environment

Recent advances in experimental techniques, automation, and sophisticated measurement technology have resulted in high-precision, high-speed, and high-throughput data. This has stimulated extensive interest, and investigations are being carried out with the aim of improving the quality of the data obtained from different biotechnological and biomedical processes. A huge number of datasets are available in public databases, and it is possible to perform extensive database searches and data mining to extract information of biological interest. The increasing number of genome projects, most of which are completed or currently in progress, has also accelerated the availability of datasets that provide gene, protein, and physiological data for a multitude of organisms. The heavy reliance of the biotechnological, biopharmaceutical, and biomedical industries on these vast datasets provides an opportunity to apply data processing techniques to gain knowledge from them. However, the complexity of the data obtained from these experiments poses a serious challenge to the research community. This has resulted in systems-level studies for querying and understanding biological datasets. Statistical analysis techniques at various levels have been extensively employed for processing experimental data and gaining valuable information. The work presented here also applies statistical data analysis, mainly to fermentation processes involving microbial and mammalian cell lines that produce products ranging from important metabolites to recombinant proteins.

1.4 Motivation for research

A detailed literature review on the significance of analyzing complex biological systems and their functioning is provided in Chapter 2, organized under important subtopics. The need for systems-level analysis of biological systems, growing confidence in the credibility of computational analysis techniques, and the use of statistical data processing techniques for bioprocesses are some of the important features that stand out in the recent scientific literature. This review identified important problems yet to be solved in the following areas:

Systems biology - an overview of current research activities, with emphasis on the computational analysis of complex networks

Statistical analysis techniques - the various techniques available for data mining and preprocessing of experimental data obtained from different cell culture experiments

Microbial and mammalian metabolism - the genomic and biochemical information available for microbial and mammalian metabolic systems, their biotechnological applications, and limitations

Genome-scale models - an overview of available genome-scale models and methods for their reconstruction, which would enable us to develop similar models for microbial and mammalian systems

Analysis of genome-scale models - the various analysis techniques available for genome-scale models, their merits and limitations, and potential strain improvement techniques

The limitations of existing methodologies and techniques that call for further improvement, the knowledge gaps in metabolic systems, and the need for a combined systems approach to understand the behavior of biological systems are the key issues that motivated this research work. Some of the potential challenges addressed in this work are highlighted below:

• Challenges involved in completely understanding the behavior of biological systems, in particular microbial and mammalian systems

• Challenges in addressing the complexity of data obtained from experiments such as batch and fed-batch fermentation cultures

• Challenges in integrating and applying the data analysis techniques that are available for these fermentation processes

• Challenges involved in the reconstruction of genome-scale models in terms of available biological information

• Challenges in combining modeling and data analysis techniques

• Challenges associated with designing new biological systems for strain improvement

1.5 Scope of the present work

Advances brought by the genomic revolution and the increasing availability of biological experimental data motivated the core objectives of the current research work. The work involves developing a framework for modeling and analyzing microbial and mammalian systems using a combined statistical and in silico approach, to gain insight into the effect of the external cellular environment on internal metabolic behavior. This would enable us to infer the systemic properties of the networks and propose testable hypotheses for cellular reengineering through strain improvement. The following are some of the specific issues addressed in this study:

• Review of various systems level analysis techniques available for analyzing complex

biological systems

• Review of various data analysis techniques available for processing biological data

and utilizing effective methods for preprocessing experimental data

• Identifying biological systems in the context of biomedical research and

biopharmaceutical applications

• Reconstructing metabolic reaction network of the identified organisms with available

genome information

• Identifying suitable in silico analysis techniques for understanding cellular genotype

and its relation to phenotype


• Identifying cellular capabilities and validating the predictions with the available

experimental information

• Development of novel techniques for designing mutant phenotypes in the context of

strain improvement

Figure 1.2 highlights the important issues covered in this research. It summarizes the depth of the research in terms of modeling approaches and their combinations for analyzing biological systems, and its breadth in terms of applications to well-known microbial and mammalian metabolic systems. The work has also focused on addressing the major issues in developing strategies for biotechnological advancement through strain improvement for byproduct production.

1.6 Organization of the thesis

Chapter 2 gives an extensive review of current research initiatives in systems biology, with emphasis on the computational approaches available for modeling and analysis of microbial and mammalian metabolism, as well as the fermentation processes that use these organisms. The various techniques used to analyze metabolic models are summarized, and their merits and demerits are described in detail. The major challenges and bottlenecks of current approaches are highlighted, and the objectives of this research work are derived from them.

Chapter 3 gives an overview of the modeling and analysis framework for the experimental data and the metabolic networks. The first step of the framework involves the collection of experimental data from fermentation cultures, followed by data preprocessing and statistical analysis to gain information on extracellular and environmental effects on the cell culture process. The effect of the external environment on internal cell metabolism is then explored by constraints-based in silico analysis with the aid of reconstructed genome-scale models of the organisms.

Chapter 4 implements and tests the developed framework on one of the most well-studied microbes, E. coli, to gain insight into cell metabolism from both statistical and in silico perspectives.

Chapter 5 describes the reconstruction of the genome-scale model of mouse (an important mammalian system), based on a previous generic version and on updated genome, biochemical, and cell physiological data for Mus musculus.

Chapter 6 applies the framework to mouse metabolism to elucidate metabolic behavior under varying environmental conditions. The reconstructed genome-scale mouse model is used for the in silico analysis.

Chapter 7 describes the development of an efficient optimization algorithm for identifying the set of necessary genes that can be used as knockout candidates for strain improvement. This technique has been applied to the E. coli metabolic network for improving succinate production.

Chapter 8 highlights the key findings and contributions of the current research. Potential extensions and recommendations for future work are identified and suggested. The overall thesis organization is shown in Figure 1.2.


Figure 1.2 Flowchart showing the major focus of the current research work and the organization

of the addressed research issues in different chapters of the thesis


2 Modeling and analysis of biological systems:

An overview

The normal and abnormal behaviors of a living cell or cellular system are governed by complex networks of interacting biomolecules. Modeling these networks allows us to make predictions about cellular behavior under a variety of environmental cues (Price and Shmulevich 2007), which is valuable to both the scientific and commercial communities. The main advantage of modeling approaches is the reduction in the time and cost of experimental investigations, achieved by pinpointing the effect of important parameters or conditions on the system. The value of modeling cellular behavior using mathematical representations and in silico simulations of complex cellular functions has long been recognized (Price et al. 2003). So far, many theoretical approaches have been attempted (Rigoutsos and Stephanopoulos 2007; Wolkenhauer 2002) to model biological systems at scales ranging from a few atoms and small systems to large systems such as the whole-cell functions of organisms (Guardia 2002).

Most modeling approaches attempted to build cellular systems based on metabolic pathways, as these were well characterized both qualitatively and quantitatively (Rigoutsos and Stephanopoulos 2007; Stephanopoulos et al. 1998; Varner and Ramkrishna 1999). These works mainly focused on modeling in vitro cell cultures (Sidoli et al. 2004) and their behavior under different conditions. A macroscopic approach to biochemical networks was adopted to simplify the problem by lumping cell regions and species concentrations (Bower and Bolouri 2001). This major assumption helps in formulating the cell as a reactor using kinetic and transport equations (Stephanopoulos and Stafford 2002). Attempts have also been made to introduce the effect of randomness into in silico models through intrinsic (gene expression, mutations, intracellular product accumulation) and extrinsic (cell growth, degradation, environmental changes) stochasticity (Kepler and Elston 2001; Meng et al. 2004). Such analysis helps to explain the stability, reliability, and robustness of biosystems and to explore the existence of systems with multiple stable states (Kauffman 1969; Kitano 2004).

2.1 Tools available for modeling biological systems

As the emphasis of this research is mainly on modeling and analysis approaches, this section reviews some of the important tools and techniques used for modeling biological systems and, in particular, metabolic networks.

Many modeling approaches are currently used to model cellular processes. Because of the many parameters, variables, and constraints involved, a variety of numerical and computational techniques are used in biosystems modeling and analysis (Haefner 1996). In the last two decades, various computational tools such as Cellware, CellDesigner, MetaFluxNet, Gepasi, and KINsolver have been proposed for modeling biological systems. Some of the modeling techniques commonly used in these tools are described next.

Kinetic modeling: This technique involves modeling reaction kinetics to understand metabolic pathways (Steuer et al. 2006) and to simulate gene interaction circuits (de Jong 2002). Dynamic simulation of biological systems using sets of ordinary differential equations (ODEs) (Mendes 1993), models representing the cell division and growth cycle in bacteria [90], and quantitative models of the metabolism of a whole (hypothetical) cell (Tomita et al. 1999) also use kinetic modeling approaches.
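As an illustration of the ODE-based kinetic modeling described above, the following minimal Python sketch integrates a single irreversible Michaelis-Menten reaction S → P with a classical fourth-order Runge-Kutta step. The reaction, the parameter values (vmax, km, the initial concentrations) and the function names are illustrative assumptions; they are not taken from any of the cited models.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = vmax * s / (km + s)."""
    return vmax * s / (km + s)


def simulate(s0, p0, vmax, km, dt, steps):
    """Integrate ds/dt = -v(s), dp/dt = +v(s) with classical RK4.

    All parameter values are hypothetical. Returns a list of
    (time, substrate, product) tuples.
    """
    s, p = s0, p0
    trajectory = [(0.0, s, p)]
    for k in range(1, steps + 1):
        k1 = -mm_rate(s, vmax, km)
        k2 = -mm_rate(s + 0.5 * dt * k1, vmax, km)
        k3 = -mm_rate(s + 0.5 * dt * k2, vmax, km)
        k4 = -mm_rate(s + dt * k3, vmax, km)
        ds = dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        s += ds
        p -= ds  # every unit of substrate consumed appears as product
        trajectory.append((k * dt, s, p))
    return trajectory


# Example run with assumed parameters: S0 = 10, vmax = 1, km = 2, t = 0..20
for t, s, p in simulate(10.0, 0.0, 1.0, 2.0, 0.1, 200)[::50]:
    print(f"t={t:5.1f}  S={s:7.4f}  P={p:7.4f}")
```

Even this single reaction needs two kinetic parameters, which hints at why parameter requirements grow quickly when whole pathways are modeled this way.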

Stochastic modeling: This technique identifies the dynamic interactions of different processes in a complex biological system (Wilkinson 2006). It provides a quantitative understanding of cell physiology at multiple scales. Such modeling techniques have been successfully applied to metabolic systems such as E. coli to study the lactose regulation system (Julius et al. 2008).
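One common way to simulate such stochastic models is Gillespie's stochastic simulation algorithm. The sketch below applies it to a hypothetical birth-death process (constant production and first-order degradation of a single molecular species). It is a generic illustration only: the process, the rate constants and the function name are assumptions, and it is not the lactose regulation model cited above.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Simulate 0 -> X (rate k_prod) and X -> 0 (rate k_deg * n).

    Returns the event times and the molecule count after each event.
    Rate constants and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod          # propensity of the production reaction
        a_deg = k_deg * n        # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0:
            break                # nothing can happen any more
        t += rng.expovariate(a_total)  # waiting time to the next event
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, n0=0, t_end=5.0)
print("events simulated:", len(times) - 1, "final count:", counts[-1])
```

Repeating the run with different seeds gives a distribution of trajectories rather than a single deterministic curve, which is the point of the stochastic treatment.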

Cybernetic modeling: These models also incorporate kinetic information and predict the dynamic interactions of the biological system (Kompala et al. 1984). The effects of perturbations in enzyme levels on the rates of substrate production or product formation are explored. These models also describe the coupling between metabolic fluxes and environmental conditions. They are used to describe complex dynamic phenomena such as steady-state multiplicity, oscillatory behavior, unbalanced growth, and futile cycling in a metabolic network (Kompala 1999; Varner and Ramkrishna 1999).

Although these methods are useful, they need many parameters to model complex biological functions such as complete cellular metabolism, which also increases computational complexity. Genome-scale modeling techniques are therefore widely applied for analyzing biological functions, especially intracellular metabolism, as they do not require any kinetic parameters. The current research work mainly focuses on genome-scale modeling. Thus, an extensive and in-depth review of this modeling technique is given below.

2.2 Genome-scale modeling

The sequencing of the first bacterial genome signified a transition in biology from a data-poor to a data-rich environment. Since then, various sets of ‘omics’ data, such as genomics, proteomics, transcriptomics, metabolomics, and fluxomics, have become available, shifting modeling approaches to a new horizon. This also created a new challenge: developing models with organism-specific information that are capable of integrating various omics data and accounting for inherent biological functions (Joyce and Palsson 2006; Kell 2006; Yaspo 2001). Information on fully sequenced genomes and their annotations for different organisms is available in many public databases (Eppig et al. 2007; Kanehisa et al. 2002). This abundance of genomic information and omics data has accelerated the emergence of ‘genome-scale modeling’, which represents a reconstructed metabolic network at the genome level (Papin et al. 2003). Genome-scale models are reconstructed by incorporating diverse data, including genome annotation, biochemical reaction information, and cell physiology experiments. The term “genome-scale” describes the inclusion of the metabolic reactions that can be determined to take place in an organism based on genome annotation and the biochemical literature. Many such genome-scale models are available for different organisms, and their reconstruction was based on genome annotations that were 60–70% complete. Table 2.1 lists several such genome-scale models, with the numbers of genes, reactions, and metabolites in each. With such models, analytical frameworks can be developed and transformed into mathematical models for describing cellular functions. These mathematical representations are important for analyzing the metabolism and identifying the gene functions of the organisms. Analyzing such models through modeling and simulation leads to a better understanding of the system's structure and behavior, and interpretations based on the analysis and predictions can explain the cellular functions of the organism. Hence, it is possible to identify the genotype-phenotype relationship.
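To make the idea of a reconstructed network as a mathematical object more concrete, the sketch below assembles a stoichiometric matrix from a list of reactions and checks the steady-state mass balance for a candidate flux vector. The six-reaction network, its metabolite and reaction names, and the helper functions are all hypothetical; real genome-scale reconstructions contain hundreds to thousands of reactions derived from annotation and literature.

```python
# Hypothetical toy network: each reaction maps metabolite -> coefficient
# (negative = consumed, positive = produced).
reactions = {
    "uptake":  {"A": 1},             # -> A
    "R1":      {"A": -1, "B": 1},    # A -> B
    "R2":      {"A": -1, "C": 1},    # A -> C
    "R3":      {"C": -1, "B": 1},    # C -> B
    "biomass": {"B": -1},            # B ->
    "export":  {"C": -1},            # C ->
}


def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction names, S) with S[i][j] the
    coefficient of metabolite i in reaction j."""
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    names = list(reactions)
    S = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, S


def mass_balance(S, v):
    """Net production rate of each metabolite, i.e. the product S v."""
    return [sum(coef * flux for coef, flux in zip(row, v)) for row in S]


metabolites, names, S = stoichiometric_matrix(reactions)
for met, row in zip(metabolites, S):
    print(met, row)

# A flux vector is a steady state when every entry of S v is zero.
v = [9, 6, 3, 3, 9, 0]
print("S v =", mass_balance(S, v))
```

The matrix contains only reaction stoichiometry, which is why this representation needs no kinetic parameters.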

2.3 Constraints-based modeling approach

The challenges involved in genome-scale modeling are mainly addressed by a constraints-based approach, which gives rise to flux balance analysis (FBA) (Bonarius et al. 1997; Varma and Palsson 1994a; Varma and Palsson 1994b). FBA is a method for studying the capabilities of metabolic networks at steady state; it uses linear optimization techniques to provide quantitative predictions and testable hypotheses (e.g., optimal growth rate). Analysis of a genome-scale model using the FBA approach involves several steps (Fig. 2.1):

• First step: Reconstructing the metabolic network of the organism.

• Second step: Developing constraints for the model that reflect the working state of the organism. These include reaction stoichiometry, enzyme capacity, reaction reversibility, and biochemical loops constrained by thermodynamics.

• Third step: The constraints define a solution space in which any solution to the network equations satisfies physiologically meaningful operating conditions. This solution space contains all possible functions of the reconstructed network, that is, all allowable phenotypes. Traditionally, linear programming has been widely used to identify optimal states within this space, such as maximal growth or ATP production.
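The three steps above amount to a linear program: maximize an objective flux subject to the steady-state balance S v = 0 and capacity bounds lb ≤ v ≤ ub. The sketch below solves this problem for a hypothetical six-reaction network. To stay dependency-free it enumerates the vertices of the feasible region with exact fractions, a brute-force stand-in for a proper LP solver that is only workable for toy problems. The network, the bounds and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]


def fba(S, lb, ub, c):
    """Maximize c.v subject to S v = 0, lb <= v <= ub.

    Assumes S has full row rank and all bounds are finite, so the optimum
    lies at a vertex: m basic fluxes solved from S v = 0, the rest at bounds.
    """
    m, n = len(S), len(S[0])
    best = None
    for basic in combinations(range(n), m):
        fixed = [j for j in range(n) if j not in basic]
        A = [[S[i][j] for j in basic] for i in range(m)]
        for values in product(*[(lb[j], ub[j]) for j in fixed]):
            rhs = [-sum(S[i][j] * x for j, x in zip(fixed, values))
                   for i in range(m)]
            sol = solve_square(A, rhs)
            if sol is None:
                break  # singular basis: try another set of basic fluxes
            v = [Fraction(0)] * n
            for j, x in zip(fixed, values):
                v[j] = Fraction(x)
            for j, x in zip(basic, sol):
                v[j] = x
            if all(lb[j] <= v[j] <= ub[j] for j in range(n)):
                objective = sum(cj * vj for cj, vj in zip(c, v))
                if best is None or objective > best[0]:
                    best = (objective, v)
    return best


# Hypothetical network; columns: uptake, R1 (A->B), R2 (A->C), R3 (C->B),
# biomass (B->), export (C->); rows: metabolites A, B, C.
S = [[1, -1, -1,  0,  0,  0],
     [0,  1,  0,  1, -1,  0],
     [0,  0,  1, -1,  0, -1]]
lb = [0, 0, 0, 0, 0, 0]
ub = [10, 6, 1000, 3, 1000, 1000]   # assumed uptake and enzyme capacities
c = [0, 0, 0, 0, 1, 0]              # objective: biomass flux

growth, fluxes = fba(S, lb, ub, c)
print("maximal biomass flux:", growth)
print("one optimal flux distribution:", [str(x) for x in fluxes])
```

In this toy case the capacities of R1 and R3, rather than the uptake limit, determine the optimum, and more than one flux distribution reaches it. For networks of realistic size, a dedicated linear programming solver would be used in place of the enumeration.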

Genome-scale metabolic models using the constraints-based approach have been successfully developed and tested for several organisms, such as Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Saccharomyces cerevisiae, and Staphylococcus aureus.

2.4 Other metabolic network simulations

Constraints-based analysis forms the basic platform for carrying out several in silico analyses that explore the systemic behavior of metabolic networks. These techniques and their merits and demerits are briefly summarized below.

Table 2.1 List of available genome-scale models for various organisms

Genes in model   Metabolites   Reactions   Reference          Year
Bacteria
400              451           461         Schilling et al.   2000
291              340           388         Schilling et al.   2002
551              604           712         Heinemann et al.   2005
Archaea
Eukaryotes
672              636           1,038       Kuepfer et al.     2005
800              1,013         1,446       Nookaew et al.     2008

Figure 2.1 Genome-scale reconstruction of metabolic network and elucidation of the systemic properties using constraints-based analysis approach

Gene deletion studies: In these studies, the reactions associated with individual genes are deleted from in silico models, and the consequences of the deletions are assessed (Edwards and Palsson 2000b; Forster et al. 2003; Varma and Palsson 1993). Gene deletions in general modify the allowable states of the metabolic network and reduce the wild-type solution space. These studies have been useful in predicting the phenotypes of Helicobacter pylori and E. coli with an accuracy of 60-90% (Reed and Palsson 2003; Schilling et al. 2002). However, gene deletion studies can also lead to false predictions. For instance, if a gene deletion is predicted to be lethal in silico but is experimentally found to be non-lethal, this suggests that an alternate pathway exists to fulfill the cell's requirements. Conversely, if a deletion is predicted to be non-lethal in silico but is experimentally found to be lethal, this suggests that a factor other than biomass synthesis makes the gene deletion lethal (Duarte et al. 2004). Such failure modes are very important, as they indicate that the metabolic network is incomplete, and they can help in updating and improving the in silico models for better prediction and, thereby, a better understanding of the organism's physiology.

Double knockouts: Double deletion analysis is similar to single gene deletion analysis. In this case, genes are deleted in pairs, and the resulting consequences are assessed and compared with experimental data (Thiele et al. 2005). This identifies essential pairs of genes (at least one of the two genes must be present; removing both leads to a lethal phenotype). Such analyses can shed light on the underlying regulatory mechanisms between genes.

Gene additions: Although gene deletion is the most commonly used approach for elucidating cellular physiology, gene addition can also be used. Adding reactions (corresponding to the added genes) to the network expands the wild-type solution space. Gene addition studies have been evaluated for applications such as increasing the theoretical yield of amino acids in the E. coli metabolic network (Burgard and Maranas 2001; Pharkya and Maranas 2006). The results from a wild-type strain were compared with those of a strain that has access to 3400 additional reactions available in other species. An increase in amino acid production was achieved by the addition of only one or two genes (reactions).
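The deletion and addition analyses described above can be sketched on a hypothetical six-reaction network. Each "gene" is assumed to map to exactly one reaction. A deletion fixes that reaction's flux at zero and an addition appends a new column to the stoichiometric matrix, after which the maximal biomass flux is recomputed. The optimizer is a brute-force vertex enumeration standing in for a linear programming solver. The network, bounds, added reaction and function names are all illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]


def max_flux(S, lb, ub, target):
    """Largest feasible flux through reaction `target` with S v = 0 and
    lb <= v <= ub (toy vertex enumeration; S must have full row rank)."""
    m, n = len(S), len(S[0])
    best = None
    for basic in combinations(range(n), m):
        fixed = [j for j in range(n) if j not in basic]
        A = [[S[i][j] for j in basic] for i in range(m)]
        for values in product(*[(lb[j], ub[j]) for j in fixed]):
            rhs = [-sum(S[i][j] * x for j, x in zip(fixed, values))
                   for i in range(m)]
            sol = solve_square(A, rhs)
            if sol is None:
                break  # singular basis
            v = dict(zip(fixed, values))
            v.update(zip(basic, sol))
            if all(lb[j] <= v[j] <= ub[j] for j in range(n)):
                if best is None or v[target] > best:
                    best = v[target]
    return best


# Hypothetical network; rows are metabolites A, B, C.
names = ["uptake", "R1", "R2", "R3", "biomass", "export"]
S = [[1, -1, -1,  0,  0,  0],
     [0,  1,  0,  1, -1,  0],
     [0,  0,  1, -1,  0, -1]]
lb = [0] * 6
ub = [10, 6, 1000, 3, 1000, 1000]
BIOMASS = names.index("biomass")


def growth_after_knockout(knocked):
    """Maximal biomass flux when the named reactions are forced to zero."""
    ub_ko = [0 if names[j] in knocked else ub[j] for j in range(len(names))]
    return max_flux(S, lb, ub_ko, BIOMASS)


wild_type = growth_after_knockout(())
single = {r: growth_after_knockout((r,)) for r in names}

# Pairs that are lethal together although each single deletion is viable.
viable = [r for r in names if single[r] > 0]
synthetic_lethal = [pair for pair in combinations(viable, 2)
                    if growth_after_knockout(pair) == 0]

# Gene addition: a hypothetical extra reaction R4 (A -> B) without a cap.
S_add = [row + [coef] for row, coef in zip(S, [-1, 1, 0])]
added = max_flux(S_add, lb + [0], ub + [1000], BIOMASS)

print("wild type:", wild_type)
print("single deletions:", {r: str(g) for r, g in single.items()})
print("synthetic lethal pairs:", synthetic_lethal)
print("with added reaction:", added)
```

In this sketch, deletions can only shrink the feasible flux space and the addition can only enlarge it, mirroring the reduction and expansion of the wild-type solution space described in the text.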

Extreme pathways: In metabolic networks, the solution space is bounded by a unique set of basis pathways called extreme pathways, and all possible flux distributions can be described as non-negative linear combinations of these pathways. The cellular functions of a biological
