Haja N. Kadarmideen
Editor

Systems Biology in Animal Production and Health, Vol. 1

Library of Congress Control Number: 2016956674

© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature

The registered company is Springer International Publishing AG Switzerland

The registered company address is Gewerbestrasse 11, 6330 Cham, Switzerland

The increased prominence of “systems biology” in biological research over the past two decades is arguably a reaction to the reductionist approach exemplified by the genome sequencing phase of the Human Genome Project. A simplistic view of the genome projects was that the genome sequence of a species, whether humans, model organisms, plants or farmed animals, represents a blueprint for the organism of interest, and thus characterising the sequence would reveal the relevant instructions. Subsequent targets for the reductionist or cataloguing approach were complete lists of transcripts (transcriptomes) and proteins (proteomes) for the organism of interest. The ‘omics approach to the comprehensive characterisation of an organism, tissue or cell has also been extended to metabolites and hence metabolomes.

A catalogue of parts, however, is insufficient to understand how an organism functions. Thus, a holistic approach that recognises the interactions between components of the system was required. Given the size and complexity of the data and the possible interactions, it was necessary to use advanced mathematical and computational methods to attempt to make sense of the data. Thus, “systems biology” in the ‘omics era is widely considered to concern the use of mathematical modelling and analysis together with ‘omics data (genome sequence, transcriptomes, proteomes, metabolomes) to understand complex biological systems. The predictive aspect of these models is viewed as particularly important. Moreover, it is desirable that the models’ predictions can be tested experimentally. Systems biology, therefore, contributes in part to converting large ‘omics data sets from data-driven biology experiments into testable hypotheses.

Systems approaches and the use of predictive mathematical models in biological systems long pre-date the post genome project (re-)emergence of systems biology. Population biologists/geneticists, epidemiologists, agricultural scientists, quantitative geneticists and plant and animal breeders have been developing and successfully exploiting predictive mathematical models and systems approaches for decades. Quantitative geneticists and animal breeders, for example, have been remarkably successful at developing statistical animal models that are effective predictors of future performance. For decades, these successes were achieved without any knowledge of the underlying molecular components. The accuracy of these models has been increased by using high-density molecular (single nucleotide polymorphism, SNP) genotypes in so-called genomic selection. However, whilst the sequences and genome locations of the SNP markers are known, little is known about the functional impact or relevance of the individual SNP loci. Further improvements could be achieved through the use of genome sequence data and by adding knowledge of the likely effects of the sequence variants, whether coding or regulatory. Thus, there is a growing commonality between the systems approaches of quantitative geneticists and animal breeders and the ‘omics version of systems biology.

Animals are not only complex biological systems but also function within wider complex systems. The recognition that an animal’s phenotype is determined by a combination of its genotype and environmental factors simply restates the latter. The environmental factors include, amongst others, feed, pathogens and the microbiomes present in the gastrointestinal tract and other locations. The ‘omics technologies allow not only the characterisation of the components of the animal of interest, but also those of its commensal microbes and the microbes, including pathogens, present in its environment.

As noted earlier, it is desirable that the mathematical models developed in systems biology are predictive and that the associated hypotheses are testable. Genome editing technologies, which have been demonstrated in farmed animal species, facilitate hypothesis testing at the level of modifying the genome sequence that determines components of the system of interest.

This volume of Systems Biology in Animal Production and Health, edited by Professor Haja Kadarmideen, explores some aspects of both quantitative genetics and ‘omics-led approaches to applying systems approaches to tackling the challenges of improving animal productivity and reducing the burden of disease. This book contains some chapters with R code and other computer programs, and workflows/pipelines for processing and analysing multi-omic datasets from the laboratory all the way to interpretation of results. Hence, this book would be particularly useful for students, teachers and practitioners of integrative genomics, bioinformatics and systems biology in animal and veterinary sciences.

Adhil et al. (chapter “Advanced Computational Methods, NGS Tools, and Software for Mammalian Systems Biology”) review the computational methods and tools required to analyse and integrate multi-omics data from different levels including genome sequence, transcriptomics, proteomics and metabolomics. The analysis of transcriptomic data, and specifically RNA-Seq data, is described in greater detail by Heras-Saldana et al. (chapter “RNA Sequencing Applied to Livestock Production”).

Whilst it is generally challenging to identify the causal genetic variants for complex phenotypes, identifying loci with effects on primary traits such as the level of gene expression or levels of a metabolite is easier as effects are often delivered close to the gene. For example, many expression quantitative trait loci (eQTL) are detected as cis-effects with the causal genetic variation located in the regulatory sequences of the gene of interest. Of course, most phenotypes of importance to animal production or health are controlled by the effects of many genes. Wang and Michoel (chapter “Detection of Regulator Genes and eQTL Gene Networks”) address the challenge of identifying the gene networks that capture the interaction between genes from eQTL data. Systems genetics and systems biology using gene network methods with application for obesity using pig models is reviewed by Kogelman and Kadarmideen (chapter “Applications of Systems Genetics and Biology for Obesity Using Pig Models”). Fontanesi (chapter “Merging Metabolomics, Genetics, and Genomics in Livestock to Dissect Complex Production Traits”) reviews metabolite QTL (mQTL), which have similar advantages to eQTL in respect of ease of identification, in pigs and cattle.

Rosa et al. (chapter “Applications of Graphical Models in Quantitative Genetics and Genomics”) discuss the use of stochastic graphical models with an emphasis on Bayesian networks to predict phenotypes, including primary traits such as gene expression levels and end traits from sequence variants and thus arguably traversing the path from sequence to consequence.

Professor Alan L. Archibald FRSE
Deputy Director, Head of Genetics and Genomics
The Roslin Institute and Royal (Dick) School of Veterinary Studies
University of Edinburgh
Easter Bush, Midlothian EH25 9RG, UK

Systems biology is a research discipline at the crossroads of statistical, computational, quantitative, and molecular biology methods. It involves joint modeling, combined analysis and interpretation of high-throughput omics (HTO) data collected at many “levels or layers” of the biological systems within and across individuals in the population. The systems biology approach is often aimed at studying associations and interactions between different “layers or levels”, but not necessarily one layer or level in isolation. For instance, it involves study of multidimensional associations or interactions among DNA polymorphisms, gene expression levels, proteins or metabolite abundances. With modern HTO biotechnologies and their decreasing costs, hugely comprehensive multi-omic data at all “levels or layers” of the biological system are now available. This “big data” at lower costs, along with development of genome scale models, network approaches and computational power, has spearheaded the progress of the systems biology era, including applications in human biology and medicine. Systems biology is an established independent discipline in humans and increasingly so in animals, plants and microbial research. However, joint modeling and analysis of multilayer HTO data, in large volumes on a scale that has never been seen before, poses enormous challenges from both computational and statistical points of view. Systems biology tackles such joint modeling and analyses of multiple HTO datasets using a combination of statistical, computational, quantitative and molecular biology methods and bioinformatics tools. As I wrote in my review article (Livestock Science 2014, 166:232–248), systems biology is not only about multilayer HTO data collection from populations of individuals and subsequent analyses and interpretations; it is also about a philosophy and a hypothesis-driven predictive modeling approach that feeds into new experimental designs, analyses and interpretations. In fact, systems biology revolves and iterates between these “wet” and “dry” approaches to converge on a coherent understanding of the whole biological system behind a disease or phenotype and provide a complete blueprint of functions that leads to a phenotype or a complex disease.

It is equally important to introduce, alongside systems biology, the sub-discipline of systems genetics as a branch of systems biology. It is akin to considering “genetics” as a sub-discipline of “biology”. It is well known that quantitative genetics/genomics links genome-wide genetic variation with variation in disease risks or a performance (phenotype or trait) that we can easily measure or observe in a population of individuals. However, systems genetics or systems genomics not only performs such genome-wide association studies (GWAS), but also links genetic variations (e.g. SNPs, CNVs, QTLs etc.) at the DNA sequence level with variation in molecular profiles or traits (e.g. gene expression or metabolomic or proteomic levels etc. in tissues and biological fluids) that we can measure using high-throughput next- and third-generation biotechnologies. The systems genetics approach is still “genetics”, because we are looking at those genetic variants that exert their effects from DNA to phenotypic expression or disease manifestations through a number of intermediate molecular profiles. Hence, systems genetics derives its name, as originally proposed in my earlier article (Mammalian Genome, 2006, 17:548–564), by being able to integrate analyses of all underlying genetic factors acting at different biological levels, namely, QTL, eQTL, mQTL, pQTL and so on. I have provided a complete up-to-date review and illustration of systems genetics or systems genomics and multi-omic data integration and analyses in our review paper published in Genetics Selection Evolution (2016), 48:38. Overall, systems genetics/genomics leads us to provide a holistic view on complex trait heredity at different biological layers or levels.

Whether it is systems biology or systems genetics, the gene ontology annotation is one of the most important and valuable means of assigning functional information using standardized vocabulary. This would include annotation of genetic variants falling into functional groups such as trait QTL, eQTL, mQTL, pQTL. Molecular pathway profiling, signal transduction and gene set enrichment analyses along with various types of annotations form the “icing on the cake”. For this purpose, several bioinformatics tools are frequently used. Most chapters in this book and its associated volume cover these aspects.

I would like to point out that systems biology approaches have been proven to be very powerful and shown to produce accurate and replicable discoveries of genes, proteins and metabolites and their networks that are involved in complex diseases or traits. In very practical terms, it delivers biomarkers, drug targets, vaccine targets, target transcripts or metabolites, genetic markers, pathway targets etc. to diagnose and treat diseases better or improve traits or characteristics in animals, plants and humans. In the world of genomic prediction and genomic selection, there have been an increasing number of studies that have shown high accuracy and predictive power when models include functional QTLs such as eQTL, mQTL, pQTL which, in fact, are results from systems genetics methods.

This book and its associated volume cover the above-mentioned principles, theory and application of systems biology and systems genetics in livestock and animal models and provide a comprehensive overview of open source and commercially available software tools, computer programming codes and other reading materials to learn, use and successfully apply systems biology and systems genetics in animals. Overall, I believe this book is an extremely valuable source for students interested in learning the basics and could serve as a textbook in higher educational institutes and universities around the world. Equally, the book chapters are very relevant and useful for scientists interested in learning and applying advanced HTO studies, integrative HTO data analyses (e.g. eQTLs and mQTLs) and computational systems biology techniques to animal production, health and welfare. One of the chapters focuses on systems genomics models and computational methods applied to animal models for elucidating systems biology of human obesity and diabetes. The two volumes of this book are a result of contributions from highly reputed scientists and practitioners who originate from renowned universities and multinational companies in the UK, Denmark, France, Italy, Australia, USA, Brazil and India.

I would like to thank the publisher Springer for inviting me to edit two volumes on this subject, publishing in an excellent form and promoting the book across the globe. I am grateful to all contributing authors and co-authors of this book. I also wish to thank Ms. Gilda Kischinovsky from my research group for proofreading and the staff at Springer involved in production of this book. Last but not least, I wish to thank my wife and children who have given me moral support and strength while I reviewed and edited this book.

September, 2016

Detection of Regulator Genes and eQTLs in Gene Networks
Lingfei Wang and Tom Michoel

Applications of Systems Genetics and Biology for Obesity Using Pig Models
Lisette J.A. Kogelman and Haja N. Kadarmideen

Merging Metabolomics, Genetics, and Genomics in Livestock to Dissect Complex Production Traits
Luca Fontanesi

RNA Sequencing Applied to Livestock Production
Sara de las Heras-Saldana, Hawlader A. Al-Mamun, Mohammad H. Ferdosi, Majid Khansefid, and Cedric Gondro

Applications of Graphical Models in Quantitative Genetics and Genomics
Guilherme J.M. Rosa, Vivian P.S. Felipe, and Francisco Peñagaricano

Advanced Computational Methods, NGS Tools, and Software for Mammalian Systems Biology
Mohamood Adhil, Mahima Agarwal, Prahalad Achutharao, and Asoke K. Talukder

works, and to validate predicted networks in silico.

Genetic differences between individuals are responsible for variation in the observable phenotypes. This principle underpins genomewide association studies (GWAS), which map the genetic architecture of complex traits by measuring genetic variation at single-nucleotide polymorphisms (SNPs) on a genomewide scale across many individuals (Mackay et al. 2009). GWAS have resulted in major improvements in plant and animal breeding (Goddard and Hayes 2009) and in numerous insights into the genetic basis of complex diseases in humans (Manolio 2013). However, quantitative trait loci (QTLs) with large effects are uncommon and a molecular explanation for their trait association rarely exists (Mackay et al. 2009). The vast majority of QTLs indeed lie in noncoding genomic regions and presumably play a gene regulatory role (Hindorff et al. 2009; Schaub et al. 2012). Consequently, numerous studies have identified cis- and trans-acting DNA variants that influence gene expression levels (i.e., “expression QTLs”; eQTLs) in model organisms, plants, farm animals, and humans (reviewed in Rockman and Kruglyak 2006; Georges 2007; Cookson et al. 2009; Cheung and Spielman 2009; Cubillos et al. 2012). Gene expression programs are of course highly tissue- and cell-type specific, and the properties and complex relations of eQTL associations across multiple tissues are only beginning to be mapped (Dimas et al. 2009; Foroughi Asl et al. 2015; Greenawalt et al. 2011; Ardlie et al. 2015). At the molecular level, a mounting body of evidence shows that cis-eQTLs primarily cause variation in transcription factor (TF) binding to gene regulatory DNA elements, which then causes changes in histone modifications, DNA methylation, and mRNA expression of nearby genes; trans-eQTLs in turn can usually be attributed to coding variants in regulatory genes or cis-eQTLs of such genes (Albert and Kruglyak 2015).

Taken together, these results motivate and justify a systems biological view of quantitative genetics (“systems genetics”), where it is hypothesized that genetic variation, together with environmental perturbations, affects the status of molecular networks of interacting genes, proteins, and metabolites; these networks act within and across different tissues and collectively control physiological phenotypes (Williams 2006; Kadarmideen et al. 2006; Rockman 2008; Schadt 2009; Schadt and Björkegren 2012; Civelek and Lusis 2014; Björkegren et al. 2015). Studying the impact of genetic variation on gene regulation networks is of crucial importance in understanding the fundamental biological mechanisms by which genetic variation causes variation in phenotypes (Chen et al. 2008), and it is expected to lead to the discovery of novel disease biomarkers and drug targets in human and veterinary medicine (Schadt et al. 2009). Because the direct experimental mapping of genetic, protein–protein, or protein–DNA interactions is an immensely challenging task, further exacerbated by the cell-type-specific and dynamic nature of these interactions (Walhout 2006), comprehensive, experimentally verified molecular networks will not become available for multi-cellular organisms in the foreseeable future. Statistical and computational methods are therefore essential to reconstruct trait-associated causal networks by integrating diverse omics data (Rockman 2008; Schadt 2009; Ritchie et al. 2015).

A typical systems genetics study collects genotype and gene, protein, and/or metabolite expression data from a large number of individuals segregating for one or more traits of interest. After raw data processing and normalization, eQTLs are identified for each of the expression data types, and a coexpression matrix is constructed. Causal Bayesian gene networks, coexpression modules (i.e., clusters), and/or causal Bayesian module networks are then reconstructed. The in silico validation of predicted networks and modules using independent data confirms their overall validity, ideally followed by the experimental validation of the most promising findings in a relevant cell line or model organism (Fig. 1). Here we review the main analytic principles behind each of the steps from eQTL identification to in silico network validation and present a selection of most commonly used methods and software for each step. Throughout this chapter, we tacitly assume that all data have been quality controlled, preprocessed, and normalized to suit the assumptions of the analytic methods presented here. For expression data, this usually means working with log-transformed data where each gene expression profile is centered around zero with standard deviation one. We also assume that the data have been corrected for any confounding factors, either by regressing out known covariates or by estimating hidden factors (Stegle et al. 2012).
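The standardization assumed above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the chapter's own pipeline: the function name `standardize_rows`, the base-2 logarithm, and the pseudocount of 1 are hypothetical choices, and the toy input is random.

```python
import numpy as np

def standardize_rows(expr, pseudocount=1.0):
    """Log-transform a (genes x samples) matrix, then scale each row to mean 0 and SD 1."""
    # Assumption: log2 with a pseudocount; other transforms may suit other platforms.
    logged = np.log2(np.asarray(expr, dtype=float) + pseudocount)
    centered = logged - logged.mean(axis=1, keepdims=True)
    # Population SD (ddof=0), so each row ends up with a sum of squares equal to n.
    sd = centered.std(axis=1, keepdims=True)
    if np.any(sd == 0):
        raise ValueError("constant expression profile cannot be scaled")
    return centered / sd

rng = np.random.default_rng(0)
raw = rng.gamma(shape=2.0, scale=50.0, size=(5, 20))  # toy non-negative intensities
X = standardize_rows(raw)
```

Scaling with the population standard deviation makes every row's sum of squares equal to the number of samples, which is the convention the later matrix formulas rely on.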

[Figure 1: flow chart. Steps recoverable from the figure: adequate experimental design and data collection; appropriate data preprocessing and quality control; expression quantitative trait loci analysis (matrix-eQTL, kruX); choice of correlation function and calculation of gene co-expression; co-expression module detection; Gene Expression Omnibus and ArrayExpress (listed as resources); experimental verification of regulatory pathways.]

Fig. 1 A flow chart for a typical systems genetics study and the corresponding software. Steps in light yellow are covered in this chapter.

2 Genetics of Gene Expression

A first step toward identifying molecular networks affected by DNA variants is to identify the variants (eQTLs) that underpin variation in the expression levels of transcripts (Cookson et al. 2009), proteins (Foss et al. 2007), or metabolites (Nicholson et al. 2011) across individuals. When studying a single trait, as in GWAS, it is possible to consider multiple statistical models to explicitly account for additive and/or dominant genetic effects (Laird and Lange 2011). However, when the possible effects of a million or more SNPs on tens of thousands of molecular abundance traits need to be tested, as is common in modern genetics of gene expression studies, the computational cost of testing SNP–trait associations one by one becomes prohibitive. To address this problem, new methods have been developed to calculate the test statistics for the parametric linear regression and analysis of variance (ANOVA) models (Shabalin 2012) and the nonparametric ANOVA model (or Kruskal–Wallis test) (Qi et al. 2014) using fast matrix multiplication algorithms, implemented in the software Matrix eQTL (http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/) (Shabalin 2012) and kruX (https://github.com/tmichoel/krux) (Qi et al. 2014).

In both programs, genotype values of $s$ genetic markers and expression levels of $k$ transcripts, proteins, or metabolites in $n$ individuals are organized in an $s \times n$ genotype matrix $G$ and a $k \times n$ expression data matrix $X$. Genetic markers take values $0, 1, \ldots, \ell$, where $\ell$ is the maximum number of alleles ($\ell = 2$ for biallelic markers), whereas molecular traits take continuous values. In the linear model, a linear relation is tested between the expression level of gene $i$ and the genotype value (i.e., the number of reference alleles) of SNP $j$. The corresponding test statistic is the Pearson correlation between the $i$th row of $X$ and the $j$th row of $G$, for all values of $i$ and $j$. Standardizing the data matrices to zero mean and unit variance, such that for all $i$ and $j$,

$$\sum_{l=1}^{n} X_{il} = \sum_{l=1}^{n} G_{jl} = 0 \quad\text{and}\quad \sum_{l=1}^{n} X_{il}^{2} = \sum_{l=1}^{n} G_{jl}^{2} = n,$$

the correlation values can be written as

$$\frac{1}{n}\sum_{l=1}^{n} X_{il}\,G_{jl} = \frac{1}{n}\left(X G^{T}\right)_{ij},$$

where $G^{T}$ denotes the transpose of $G$. Hence, a single matrix multiplication suffices to compute the test statistics for the linear model for all pairs of traits and SNPs.

The ANOVA models test if expression levels in different genotype groups originate from the same distribution. Therefore, ANOVA models can account for both additive and dominant effects of a genetic variant on expression levels. In the parametric ANOVA model, suppose the test samples are divided into $\ell + 1$ groups by the SNP $j$. The mean expression level for gene $i$ in each group $m$ can be written as

$$\bar{X}_{i}^{(m,j)} = \frac{1}{n^{(m,j)}} \sum_{l:\,G_{jl}=m} X_{il},$$

where $n^{(m,j)}$ is the number of samples in genotype group $m$ for SNP $j$.

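Again assuming that the expression data are standardized, the F-test statistic for testing gene $i$ against SNP $j$ can be written as

$$F_{i}^{(j)} = \frac{n-\ell-1}{\ell}\,\frac{SS_{i}^{(j)}}{n - SS_{i}^{(j)}}, \qquad SS_{i}^{(j)} = \sum_{m=0}^{\ell} n^{(m,j)} \left(\bar{X}_{i}^{(m,j)}\right)^{2}.$$

Let us define the $n \times s$ indicator matrix $I^{(m)}$ for genotype group $m$, i.e., $I^{(m)}_{lj} = 1$ if $G_{jl} = m$ and 0 otherwise. Then

$$\sum_{l:\,G_{jl}=m} X_{il} = \left(X I^{(m)}\right)_{ij}.$$

Hence, for each pair of expression level $X_i$ and SNP $G_j$, the sum of squares matrix $SS_{i}^{(j)}$ can be computed via $\ell - 1$ matrix multiplications.¹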
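To make the indicator-matrix idea concrete, here is a small NumPy sketch that computes one-way ANOVA F statistics for every trait and SNP pair using one matrix product per genotype group. It is an illustration only, not the Matrix eQTL implementation: the function name `anova_f_all_pairs` and the simulated data are hypothetical, each SNP is assumed to have at least two non-empty genotype groups, and the degrees of freedom are taken from the number of non-empty groups (which equals the number of genotype values when no group is empty).

```python
import numpy as np

def anova_f_all_pairs(X, G, n_groups=3):
    """One-way ANOVA F statistics for all (trait, SNP) pairs.

    X: k x n expression matrix, rows standardized to mean 0 and sum of squares n.
    G: s x n integer genotype matrix with values 0 .. n_groups - 1.
    """
    k, n = X.shape
    ss = np.zeros((k, G.shape[0]))       # between-group sum of squares per pair
    used = np.zeros(G.shape[0])          # number of non-empty groups per SNP
    for m in range(n_groups):
        ind = (G == m).T.astype(float)   # n x s indicator matrix for group m
        sizes = ind.sum(axis=0)          # group sizes for every SNP
        sums = X @ ind                   # group sums for all pairs in one product
        ss += np.divide(sums ** 2, sizes, out=np.zeros_like(sums), where=sizes > 0)
        used += sizes > 0
    # Standardized rows have total sum of squares n, so the residual is n - ss.
    return (ss / (used - 1)) / ((n - ss) / (n - used))

rng = np.random.default_rng(2)
n, k, s = 60, 3, 4
G = rng.integers(0, 3, size=(s, n))
X = rng.normal(size=(k, n))
X[1] += (G[3] == 1) * 1.5                # plant a dominance-like effect
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
F = anova_f_all_pairs(X, G)
```

The loop runs over genotype groups only, so the cost is a handful of dense matrix products regardless of how many trait and SNP pairs are tested.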

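In the nonparametric ANOVA model, the expression data matrix is converted to a matrix $T$ of data ranks, independently over each row. In the absence of ties, the Kruskal–Wallis test statistic is given by

$$\frac{12}{n(n+1)} \sum_{m=0}^{\ell} n^{(m,j)} \left(\bar{T}_{i}^{(m,j)}\right)^{2} - 3(n+1), \qquad \bar{T}_{i}^{(m,j)} = \frac{1}{n^{(m,j)}} \sum_{l:\,G_{jl}=m} T_{il},$$

where $\bar{T}_{i}^{(m,j)}$ is the average expression rank of gene $i$ in genotype group $m$ of SNP $j$, which can be similarly obtained from the $\ell - 1$ matrix multiplications.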
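The rank-based statistic can be sketched in the same style. The code below is a hedged illustration rather than the kruX implementation: the function name `kruskal_wallis_all_pairs` and the simulated data are hypothetical, ranks are computed with a double `argsort` (valid only when there are no ties), and each SNP is assumed to have at least one sample in every genotype group used for the comparison.

```python
import numpy as np

def kruskal_wallis_all_pairs(X, G, n_groups=3):
    """Kruskal-Wallis statistics (no ties) for all (trait, SNP) pairs.

    X: k x n expression matrix; G: s x n integer genotypes in 0 .. n_groups - 1.
    """
    # Ranks 1..n within each row; a double argsort is valid when there are no ties.
    T = np.argsort(np.argsort(X, axis=1), axis=1) + 1.0
    k, n = T.shape
    total = np.zeros((k, G.shape[0]))
    for m in range(n_groups):
        ind = (G == m).T.astype(float)   # n x s indicator matrix for group m
        sizes = ind.sum(axis=0)          # group sizes for every SNP
        rank_sums = T @ ind              # rank sums for all pairs in one product
        total += np.divide(rank_sums ** 2, sizes,
                           out=np.zeros_like(rank_sums), where=sizes > 0)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

rng = np.random.default_rng(4)
n, k, s = 40, 3, 5
G = rng.integers(0, 3, size=(s, n))
X = rng.normal(size=(k, n))
X[0] += 1.2 * (G[2] == 2)                # plant a non-additive effect
H = kruskal_wallis_all_pairs(X, G)
```

Because a group's size times its squared mean rank equals its squared rank sum divided by its size, the code works with rank sums directly.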

There is as yet no consensus about which statistical model is most appropriate for eQTL detection. Nonparametric methods were introduced in the earliest eQTL studies (Brem et al. 2002; Schadt et al. 2008) and have remained popular, as they are robust against variations in the underlying genetic model and trait distribution. More recently, the linear model implemented in Matrix eQTL has been used in a number of large-scale studies (Ardlie et al. 2015; Lappalainen et al. 2013). A comparison on a data set of 102 human whole blood samples showed that the parametric ANOVA method was highly sensitive to the presence of outlying gene expression values and SNPs with singleton genotype groups. Linear models reported the highest number of eQTL associations after empirical False Discovery Rate (FDR) correction, with an expected bias toward additive linear associations. The Kruskal–Wallis test was most robust against data outliers and heterogeneous genotype group sizes and detected a higher proportion of nonlinear associations but was more conservative for calling additive linear associations than linear models (Qi et al. 2014).

¹ There are only $\ell - 1$ matrix multiplications, because the data standardization implies that $\sum_{m=0}^{\ell} n^{(m,j)} \bar{X}_{i}^{(m,j)} = \sum_{l=1}^{n} X_{il} = 0$.

In summary, when large numbers of traits and markers have to be tested for association, efficient matrix multiplication methods can be used to calculate all test statistics at once, leading to a dramatic reduction in computation time compared with calculating these statistics one by one for every pair using traditional methods. Matrix multiplication is a basic mathematical operation, which has been purposely studied and optimized for decades (Golub and Van Loan 1996). Highly efficient packages, such as BLAS (http://www.netlib.org/blas/) and LAPACK (http://www.netlib.org/lapack/), are available for use on generic CPUs and are indeed used in most mainstream scientific computing software and programming languages, such as Matlab and R. In recent years, graphics processing unit (GPU)-accelerated computing, such as CUDA, has revolutionized scientific calculations that involve repetitive operations in parallel on bulky data, offering even more speedup than the existing CPU-based packages. The first applications of GPU computing in eQTL analysis have already appeared (e.g., Hemani et al. 2014), and more can be expected.
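As a minimal sketch of the "all test statistics at once" idea for the linear model, the following NumPy code computes the Pearson correlation of every trait with every SNP through a single matrix product (which NumPy typically dispatches to a BLAS routine). The function names and the simulated data are hypothetical, and SNPs are assumed to be polymorphic so that each genotype row has nonzero variance.

```python
import numpy as np

def standardize(M):
    """Scale each row to mean 0 and unit (population) variance."""
    M = np.asarray(M, dtype=float)
    M = M - M.mean(axis=1, keepdims=True)
    return M / M.std(axis=1, keepdims=True)

def all_pairs_correlation(X, G):
    """Pearson correlation of every row of X (k x n) with every row of G (s x n)."""
    n = X.shape[1]
    return standardize(X) @ standardize(G).T / n   # k x s matrix

rng = np.random.default_rng(1)
n, k, s = 50, 4, 6
G = rng.integers(0, 3, size=(s, n))      # toy genotypes coded 0, 1, 2
X = rng.normal(size=(k, n))
X[0] += 0.8 * G[2]                       # plant one additive association
R = all_pairs_correlation(X, G)
```

Each entry of `R` matches what a pairwise call to `np.corrcoef` would return, but the whole matrix comes from one product instead of a double loop over traits and SNPs.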

The Pearson correlation is the simplest and computationally most efficient similarity measure for gene expression profiles. For genes $i$ and $j$, their Pearson correlation is, for standardized data,

$$C_{ij} = \frac{1}{n}\sum_{l=1}^{n} X_{il}\,X_{jl} = \frac{1}{n}\left(X X^{T}\right)_{ij}. \qquad (1)$$

[…] permuted data, a discrete coexpression network is obtained. Assuming that a high degree of coexpression signifies that genes are involved in the same biological processes, graph theoretical methods can be used, for instance, to predict gene function (Sharan et al. 2007).

One drawback of the Pearson correlation is that by definition, it is biased toward linear associations. To overcome this limitation, other measures are available. The Spearman correlation uses expression data ranks (cf. Section 2) in Eq. (1) and gives a high score to monotonic relations. Mutual information is the most general measure and detects both linear and nonlinear associations. For a pair of discrete random variables $A$ and $B$ (representing the expression levels of two genes) taking values $a_l$ and $b_m$, respectively, the mutual information is defined as

$$MI(A,B) = \sum_{l,m} P(a_l, b_m)\,\log\frac{P(a_l, b_m)}{P(a_l)\,P(b_m)}.$$

[…] 2005; Faith et al. 2007)
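The difference between the three similarity measures can be illustrated with a short NumPy sketch. This is an assumption-laden toy example, not a method from the cited papers: the function names are hypothetical, continuous profiles are discretized into five equal-width bins before a plug-in mutual information estimate (in nats), and the data are simulated.

```python
import numpy as np

def spearman_matrix(X):
    """Spearman correlation: Pearson correlation of within-row ranks (no ties assumed)."""
    T = np.argsort(np.argsort(X, axis=1), axis=1).astype(float)
    T -= T.mean(axis=1, keepdims=True)
    T /= T.std(axis=1, keepdims=True)
    return T @ T.T / X.shape[1]

def mutual_information(a, b, bins=5):
    """Plug-in mutual information estimate (nats) after equal-width binning."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)    # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)    # marginal of b, shape (1, bins)
    nz = p_ab > 0                            # skip empty cells (0 * log 0 = 0)
    return float((p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])).sum())

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=2000)
y = x ** 2 + rng.normal(scale=0.01, size=2000)   # strong but non-monotonic dependence
z = rng.normal(size=2000)                        # unrelated profile

pearson_xy = np.corrcoef(x, y)[0, 1]
mi_xy = mutual_information(x, y)
mi_xz = mutual_information(x, z)
```

In this toy setting the Pearson correlation between `x` and `y` stays close to zero even though `y` is almost a function of `x`, whereas the mutual information estimate is clearly larger for the dependent pair than for the unrelated one. The binning choice strongly affects such estimates, so the number of bins should be treated as a tuning assumption.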

It is generally understood that cellular functions are carried out by “modules,” groups of molecules that operate together and whose function is separable from that of other modules (Hartwell et al. 1999). Clustering gene expression data (i.e., dividing genes into discrete groups on the basis of similarities in their expression profiles) is a standard approach to detect such functionally coherent gene modules. The literature on gene expression clustering is vast and cannot possibly be reviewed comprehensively here. It includes “standard” methods such as hierarchical clustering (Eisen et al. 1998), k-means (Tavazoie et al. 1999), graph-based methods that operate directly on coexpression networks (Sharan and Shamir 2000), and model-based clustering algorithms which assume that the data are generated by a mixture of probability distributions, one for each cluster (Medvedovic and Sivaganesan 2002). Here we briefly describe a few recently developed methods with readily available software.

3.2.1 Modularity Maximization

Modularity maximization is a network-clustering method that is particularly popular in the physical and social sciences, based on the assumption that intramodule connectivity should be much denser than intermodule connectivity (Newman and Girvan 2004; Newman 2006). In the context of coexpression networks, this method can be used to identify gene modules directly from the correlation matrix $C$ (Ayroles et al. 2009). Suppose the genes are grouped into $N$ modules $M_l$, $l = 1, \ldots, N$. Each module $M_l$ is a nonempty set that can contain any combination of the genes $i = 1, \ldots, k$, but each gene is contained by exactly one module. Also define $M_0$ as the set containing all genes. The modularity score function is defined as

$$Q = \sum_{l=1}^{N} \left[ \frac{W(M_l, M_l)}{W(M_0, M_0)} - \left( \frac{W(M_l, M_0)}{W(M_0, M_0)} \right)^{2} \right],$$

where $W(A,B) = \sum_{i \in A,\, j \in B,\, i \neq j} w(C_{ij})$ is a weight function, summing over all the edges that connect one vertex in $A$ with another vertex in $B$, and $w(x)$ is a monotonic function to map correlation values to edge strengths. Common functions are $w(x) = |x|$, $|x|^{\beta}$ (power law) (Langfelder and Horvath 2008), $e^{\beta |x|}$ (exponential) (Ayroles et al. 2009), or $1/(1 + e^{-\beta x})$ (sigmoid) (Lee et al. 2009).

A modularity maximization software tool particularly suited for large networks is fast modularity (http://www.cs.unm.edu/aaron/research/fastmodularity.htm) (Clauset et al. 2004).
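As a minimal sketch of how a modularity score can be evaluated for a given partition, the following assumes the commonly used form in which each module contributes its share of within-module edge weight minus the squared share of all edge weight attached to it. The function name `modularity_score`, the default weight function `abs`, and the toy correlation matrix are illustrative assumptions and are not taken from any of the cited packages.

```python
def modularity_score(corr, modules, w=abs):
    """Modularity score of a partition of genes into modules.

    corr    : symmetric k x k correlation matrix (list of lists)
    modules : disjoint, nonempty lists of gene indices covering 0..k-1
    w       : monotonic map from correlation to edge weight (assumed |x|)
    """
    k = len(corr)
    all_genes = range(k)

    def weight(a, b):
        # Summed edge weight between vertices in a and in b, no self-loops.
        return sum(w(corr[i][j]) for i in a for j in b if i != j)

    total = weight(all_genes, all_genes)
    score = 0.0
    for module in modules:
        within = weight(module, module) / total
        attached = weight(module, all_genes) / total
        score += within - attached ** 2
    return score


# Toy example: two tightly correlated gene pairs, weakly linked to each other.
corr = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
good = modularity_score(corr, [[0, 1], [2, 3]])  # groups the correlated pairs
bad = modularity_score(corr, [[0, 2], [1, 3]])   # splits them apart
print(good, bad)
```

A modularity maximization algorithm searches over partitions for the highest such score; this sketch only evaluates the score and does not perform the search.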

3.2.2 Markov Cluster Algorithm

The Markov cluster (MCL) algorithm is a graph-based clustering algorithm, which emulates random walks among gene vertices to detect clusters in a graph obtained directly from the coexpression matrix C. It is implemented in the MCL software (http://micans.org/mcl/) (Van Dongen 2001; Enright et al. 2002). The MCL algorithm starts with the correlation matrix C as the probability flow matrix of a random walk and then iteratively suppresses weak structures of the network and performs a multistep random walk. In the end, only backbones of the network structure remain, essentially capturing the modules of the coexpression network. To be precise, the MCL algorithm performs the following two operations on C alternatingly:

• Inflation: The algorithm first contrasts stronger direct connections against weaker ones, using an element-wise power law transformation, and normalizes each column separately to sum to one, such that the element $C_{ij}$ corresponds to the dissipation rate from vertex $X_i$ to $X_j$ in a single step. The inflation operation hence updates C as $C \to \Gamma_{\alpha} C$, where the contrast rate $\alpha > 1$ is a predefined parameter of the algorithm. After operation $\Gamma_{\alpha}$, each element of C becomes

$$C_{ij} \to \frac{|C_{ij}|^{\alpha}}{\sum_{p=1}^{k} |C_{pj}|^{\alpha}}.$$

• Expansion: The probability flow matrix C controls the random walks performed in the expansion phase. After some integer $\beta \geq 2$ steps of random walk, gene pairs with strong direct connections and/or strong indirect connections through other genes tend to see more probability flow exchanges, suggesting higher probabilities of belonging to the same gene modules. The expansion operation for the β-step random walk corresponds to the matrix power operation

$$C \to C^{\beta}.$$

The MCL algorithm performs the above two operations iteratively until convergence. Nonzero entries in the convergent matrix C connect gene pairs belonging to the same cluster, whereas all inter-cluster edges attain the value zero, so that cluster structure can be obtained directly from this matrix (Van Dongen 2001; Enright et al. 2002).
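The alternation of inflation and expansion can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the MCL software itself: the function name `mcl`, the default parameters, the convergence tolerance `tol`, and the threshold `eps` below which an entry is treated as zero are all choices made for this example.

```python
def mcl(corr, alpha=2.0, beta=2, max_iter=100, tol=1e-9, eps=1e-6):
    """Toy Markov clustering of a k x k correlation matrix (list of lists)."""
    k = len(corr)

    def inflate(m):
        # Element-wise power, then rescale every column to sum to one.
        powered = [[abs(x) ** alpha for x in row] for row in m]
        col_sums = [sum(powered[p][j] for p in range(k)) for j in range(k)]
        return [[powered[i][j] / col_sums[j] for j in range(k)]
                for i in range(k)]

    def matmul(a, b):
        return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(k)]
                for i in range(k)]

    def expand(m):
        # beta-step random walk: the matrix power m ** beta.
        out = m
        for _ in range(beta - 1):
            out = matmul(out, m)
        return out

    m = inflate(corr)
    for _ in range(max_iter):
        new = inflate(expand(m))
        change = max(abs(new[i][j] - m[i][j])
                     for i in range(k) for j in range(k))
        m = new
        if change < tol:
            break

    # Genes linked by a (numerically) nonzero entry share a cluster.
    parent = list(range(k))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(k):
        for j in range(k):
            if m[i][j] > eps:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(k):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


corr = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
clusters = mcl(corr)
print(clusters)
```

Larger values of `alpha` sharpen the contrast between strong and weak connections and tend to yield more, smaller clusters. The dense matrix products used here scale poorly, so a sparse implementation would be needed for genome-scale networks.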

3.2.3 Weighted Gene Coexpression Network Analysis

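With higher than average correlation or edge densities within clusters, genes from the same cluster typically share more neighboring (i.e., correlated) genes. The weighted number of shared neighboring genes hence can be another measure of gene function similarity. This information is captured in the so-called topological overlap matrix Ω, first defined by Ravasz et al. (2002) for binary networks as

$$\Omega_{ij} = \frac{A_{ij} + \sum_{u \neq i,j} A_{iu} A_{uj}}{\min(k_i, k_j) + 1 - A_{ij}},$$

where A is the adjacency matrix of the network and $k_i = \sum_{u \neq i} A_{iu}$ is the connectivity of gene i. For weighted networks, the adjacency matrix is derived from the correlation matrix C as

$$A_{ij} = \left( \frac{1 + C_{ij}}{2} \right)^{\alpha} \quad \text{or} \quad A_{ij} = |C_{ij}|^{\alpha},$$

such that $0 \leq A_{ij} \leq 1$ (Zhang and Horvath 2005). Note that in the first case, only positive correlations have high edge weight, whereas in the second case, positive and negative correlations are treated equally. The parameter $\alpha > 1$ is determined such that the weighted network with adjacency matrix A has approximately a scale-free degree distribution (Zhang and Horvath 2005).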

In principle, any clustering algorithm (including the aforementioned ones) can be applied to the topological overlap matrix Ω. In the popular WGCNA software (http://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/Rpackages/WGCNA/) (Langfelder and Horvath 2008), which is a multipurpose toolbox for network analysis, hierarchical clustering with a dynamic tree-cut algorithm (Langfelder et al. 2008) is used.
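To make the topological overlap concrete, here is a small plain-Python sketch, independent of the WGCNA R package. It assumes the usual definition in which the overlap of genes i and j is the direct connection strength plus the summed strength of shared neighbors, divided by the smaller of the two connectivities plus one minus the direct connection strength. The function names, the power `alpha=6.0`, and the convention of setting the diagonal to one are assumptions of this example.

```python
def adjacency(corr, alpha=6.0, signed=False):
    """Soft-thresholded adjacency matrix from a correlation matrix."""
    k = len(corr)

    def strength(x):
        # Signed: only positive correlations get high weight.
        # Unsigned: positive and negative correlations are treated equally.
        return ((1.0 + x) / 2.0) ** alpha if signed else abs(x) ** alpha

    return [[0.0 if i == j else strength(corr[i][j]) for j in range(k)]
            for i in range(k)]


def topological_overlap(adj):
    """Topological overlap matrix for an adjacency matrix with zero diagonal."""
    k = len(adj)
    connectivity = [sum(row) for row in adj]
    tom = [[1.0] * k for _ in range(k)]  # diagonal set to 1 by convention
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(k) if u != i and u != j)
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = (adj[i][j] + shared) / denom
    return tom


# Binary path graph 0 - 1 - 2: genes 0 and 2 are not linked directly
# but share gene 1 as their only neighbor.
path = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
tom = topological_overlap(path)
print(tom[0][2], tom[0][1])
```

A dissimilarity such as one minus the overlap can then be passed to any clustering routine, in the same spirit as the hierarchical clustering described above.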

3.2.4 Model-Based Clustering

Model-based clustering approaches assume that the observed data are generated by a mixture of probability distributions, one for each cluster, and take explicitly into account the noise of gene expression data. To infer model parameters and cluster assignments, techniques such as expectation maximization (EM) or Gibbs sampling are used (Liu 2002). A recently developed method assumes that the expression levels of genes in a cluster are random samples drawn from a mixture of normal distributions, where each mixture component corresponds to a clustering of samples for that module, i.e., it performs a two-way co-clustering operation (Joshi et al. 2008). The method is available as part of the Lemon-Tree package (https://github.com/eb00/lemon-tree) and has been successfully used in a variety of applications (Bonnet et al. 2015).

The co-clustering is carried out by a Gibbs sampler, which iteratively updates the assignment of each gene and, within each gene cluster, the assignment of each experimental condition. The co-clustering operation results in the full posterior distribution, which can be written as

$$p(\mathcal{C} \mid X) \propto \prod_{l=1}^{N} \prod_{u=1}^{L_l} \iint p(\mu, \tau) \prod_{i \in M_l} \prod_{m \in E_{l,u}} p(x_{i,m} \mid \mu, \tau) \, d\mu \, d\tau,$$

where $\mathcal{C} = \{M_l, E_{l,u}\}$ is a co-clustering consisting of N gene modules $M_l$, each of which has a set of $L_l$ sample clusters $E_{l,u}$, $p(x \mid \mu, \tau)$ is a normal distribution function with mean μ and precision τ, and p(μ, τ) is a noninformative normal-gamma prior. Detailed investigations of the convergence properties of the Gibbs sampler showed that the best results are obtained by deriving consensus clusters from multiple independent runs of the sampler. In the Lemon-Tree package, consensus clustering is performed by a novel spectral graph clustering algorithm (Michoel and Nachtergaele 2012) applied to the weighted graph of pairwise frequencies with which two genes are assigned to the same gene module (Bonnet et al. 2015).
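The weighted graph of pairwise co-assignment frequencies is straightforward to build once several independent clusterings are available. The sketch below shows only this first step; it does not reproduce the Gibbs sampler or the spectral clustering step of Lemon-Tree. The function name `coassignment_frequencies` and the three toy runs are hypothetical.

```python
def coassignment_frequencies(runs):
    """Fraction of runs in which each pair of genes shares a module.

    runs : list of clusterings; each clustering is a list that gives
           the module label of every gene (same gene order in all runs).
    """
    k = len(runs[0])
    counts = [[0] * k for _ in range(k)]
    for labels in runs:
        for i in range(k):
            for j in range(k):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


# Three hypothetical sampler runs on four genes. Module labels are
# arbitrary within a run, so only co-membership is comparable across runs.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
freq = coassignment_frequencies(runs)
print(freq[0][1], freq[1][2], freq[2][3])
```

Because labels are not comparable between runs, working with co-membership frequencies avoids having to match module labels before forming a consensus.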

in Coexpression Networks

Pairwise correlations between gene expression traits define undirected coexpression networks. Several studies have shown that pairs of gene expression traits can be causally ordered using genotype data (Zhu et al. 2004; Chen et al. 2007; Aten et al. 2008; Schadt et al. 2005; Neto et al. 2008, 2013; Millstein et al. 2009). Although varying in their statistical details, these methods conclude that gene A is causal for gene B if the expression of B associates significantly with A’s eQTLs, and this association is abolished by conditioning on the expression of A and on any other known confounding factors. In essence, this is the principle of “Mendelian randomization,” first introduced in epidemiology as an experimental design to detect causal effects of environmental exposures on human health (Smith and Ebrahim 2003), applied to gene expression traits.

To illustrate how these methods work, let A and B be two random variables representing two gene expression traits, and let E be a random variable representing a SNP, which is an eQTL for genes A and B. Because genotype cannot be altered by gene expression (i.e., E cannot have any incoming edges), there are three possible regulatory models to explain the joint association of E to A and B:

1. $E \to A \to B$: the association of E to B is indirect and due to a causal interaction from A to B.

2. $E \to B \to A$: the same, with the roles of A and B reversed.

3. $A \leftarrow E \to B$: A and B are independently associated to E.

To determine if gene A mediates the effect of SNP E on gene B (model 1), one can test whether conditioning on A abolishes the correlation between E and B, using the partial correlation coefficient

$$\mathrm{cor}(E, B \mid A) = \frac{\mathrm{cor}(E, B) - \mathrm{cor}(E, A)\,\mathrm{cor}(A, B)}{\sqrt{\left(1 - \mathrm{cor}(E, A)^2\right)\left(1 - \mathrm{cor}(A, B)^2\right)}}.$$

If model 1 is correct, then $\mathrm{cor}(E, B \mid A)$ is expected to be zero, and this can be tested, for example, using Fisher’s Z transform to assess the significance of a sample correlation coefficient. The same approach can be used to test model 2, and if neither is significant, it is concluded that no inference on the causal direction between A and B can be made (using SNP E), i.e., that model 3 is correct. For more details, see Aten et al. (2008), who have implemented this approach in the NEO software (http://labs.genetics.ucla.edu/horvath/htdocs/aten/NEO/).
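The mediation test can be illustrated with simulated data in which the true structure is E → A → B. This is a sketch under simple assumptions and not the NEO implementation: the genotype is coded as an allele count in {0, 1, 2}, all effects are linear with Gaussian noise, and the helper names `pearson`, `partial_cor`, and `fisher_z_pvalue` are introduced here for illustration.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_cor(e, b, a):
    """First-order partial correlation of e and b given a."""
    r_eb, r_ea, r_ab = pearson(e, b), pearson(e, a), pearson(a, b)
    return (r_eb - r_ea * r_ab) / math.sqrt((1 - r_ea ** 2) * (1 - r_ab ** 2))


def fisher_z_pvalue(r, n, n_cond=0):
    """Two-sided p-value for a (partial) correlation via Fisher's Z.

    n_cond is the number of variables conditioned on.
    """
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Simulate E -> A -> B (assumed effect sizes, unit-variance noise).
random.seed(0)
n = 500
E = [random.choice([0, 1, 2]) for _ in range(n)]
A = [e + random.gauss(0, 1) for e in E]
B = [a + random.gauss(0, 1) for a in A]

r_marginal = pearson(E, B)
r_partial = partial_cor(E, B, A)
p_marginal = fisher_z_pvalue(r_marginal, n)
p_partial = fisher_z_pvalue(r_partial, n, n_cond=1)
print(r_marginal, p_marginal)
print(r_partial, p_partial)
```

Under this simulation the marginal association between E and B is strong, whereas the association conditioned on A should be close to zero, which is the pattern expected under model 1.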

Other approaches are based on the same principle but use statistical model selection to identify the most likely causal model, with the probability density functions (PDF) for the models as follows:

$$p(E \to A \to B) = p(E)\,p(A \mid E)\,p(B \mid A),$$

$$p(E \to B \to A) = p(E)\,p(B \mid E)\,p(A \mid B),$$

$$p(A \leftarrow E \to B) = p(E)\,p(A \mid E)\,p(B \mid E, A),$$

where E follows a Bernoulli distribution, $A \mid E$ follows a normal distribution whose mean depends on E, and $B \mid A$ follows a conditional normal distribution whose mean and variance depend in part on A. For $(B \mid E, A)$, the mean of B also depends on E. The parameters of all distributions can be estimated by maximum likelihood, and the model with the highest likelihood is selected as the most likely causal model. The number of free parameters can be accounted for using penalties such as the Akaike information criterion (AIC) (Schadt et al. 2005).
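A simplified version of likelihood-based model selection can be written with correlations alone. The sketch assumes that every conditional distribution is a normal linear regression with constant variance (three free parameters per single-predictor regression, four for B given both E and A) and drops the term p(E), which is shared by all three models. These simplifications and the function names are assumptions of this example rather than the exact published procedure.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gauss_loglik(y, r2):
    """Maximized log-likelihood of a normal linear model for y that
    explains a fraction r2 of its variance."""
    n = len(y)
    my = sum(y) / n
    rss = sum((v - my) ** 2 for v in y) * (1.0 - r2)
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1.0)


def aic_scores(e, a, b):
    r_ea, r_eb, r_ab = pearson(e, a), pearson(e, b), pearson(a, b)
    # Partial correlation of E and B given A gives the extra variance in B
    # explained by adding E to the regression on A.
    p_eb_a = (r_eb - r_ea * r_ab) / math.sqrt(
        (1 - r_ea ** 2) * (1 - r_ab ** 2))
    r2_b_ae = 1.0 - (1.0 - r_ab ** 2) * (1.0 - p_eb_a ** 2)
    loglik = {
        "E->A->B": gauss_loglik(a, r_ea ** 2) + gauss_loglik(b, r_ab ** 2),
        "E->B->A": gauss_loglik(b, r_eb ** 2) + gauss_loglik(a, r_ab ** 2),
        "A<-E->B": gauss_loglik(a, r_ea ** 2) + gauss_loglik(b, r2_b_ae),
    }
    n_params = {"E->A->B": 6, "E->B->A": 6, "A<-E->B": 7}
    return {m: 2 * n_params[m] - 2 * loglik[m] for m in loglik}


# Simulate E -> A -> B with a binary genotype (assumed effect sizes).
random.seed(0)
n = 500
E = [random.choice([0, 1]) for _ in range(n)]
A = [2 * e + random.gauss(0, 1) for e in E]
B = [a + random.gauss(0, 1) for a in A]

aic = aic_scores(E, A, B)
best = min(aic, key=aic.get)  # lowest AIC is preferred
print(aic, best)
```

Because the third model nests the first, its likelihood can never be lower; only the penalty for its extra parameter allows the simpler mediation model to be preferred.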

The approach has been extended in various ways. In the study of Chen et al. (2007), likelihood ratio tests, comparison to randomly permuted data, and false discovery rate estimation techniques are used to convert the three model scores into a single probability value $P(A \to B)$ for a causal interaction from gene A to B. This method is available in the Trigger software (https://www.bioconductor.org/packages/release/bioc/html/trigger.html). In the studies of Millstein et al. (2009) and Neto et al. (2013), the model selection task is recast into a single hypothesis test, using F-tests and Vuong’s model selection test, respectively, resulting in a significance p-value for each gene–gene causal interaction.

It should be noted that all of these approaches suffer from limitations due to their inherent model assumptions. In particular, the presence of unequal levels of measurement noise among genes, or of hidden regulatory factors causing additional correlation among genes, can confuse causal inference. For example, an excessive error level in the expression data of gene A may cause the true structure $E \to A \to B$ to be mistaken for $E \to B \to A$. These limitations are discussed by Rockman (2008) and Li et al. (2010).

Mechanisms

Bayesian networks are probabilistic graphical models that encode conditional dependencies between random variables in a directed acyclic graph (DAG). Although Bayesian networks cannot fully reflect certain pathways in gene regulation, such as self-regulation or feedback loops, they still serve as a popular method for modeling gene regulation networks, as they provide a clear methodology for learning statistical dependency structures from possibly noisy data (Friedman et al. 1999a, 2000; Koller and Friedman 2009).


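We adopt our previous convention in Section 2, where we have the gene expression data X and genetic markers G. The model contains a total of k vertices (i.e., random variables), $X_i$ with $i = 1, \dots, k$, corresponding to the expression level of gene i. Given a DAG $\mathcal{G}$, and denoting the parental vertex set of $X_i$ by $\mathrm{Pa}^{(\mathcal{G})}(X_i)$, the acyclic property of $\mathcal{G}$ allows us to define the joint probability distribution function as

$$p(X_1, \dots, X_k \mid \mathcal{G}) = \prod_{i=1}^{k} p\left(X_i \mid \mathrm{Pa}^{(\mathcal{G})}(X_i)\right). \qquad (2)$$

In its simplest form, each conditional distribution is modeled as a normal distribution whose mean depends linearly on the parents,

$$p\left(X_i \mid \mathrm{Pa}^{(\mathcal{G})}(X_i)\right) = \mathcal{N}\Big( \alpha_i + \sum_{X_j \in \mathrm{Pa}^{(\mathcal{G})}(X_i)} \beta_{ji} X_j,\; \sigma_i^2 \Big),$$

where $(\alpha_i, \sigma_i)$ and $\beta_{ji}$ are parameters for vertex $X_i$ and edge $X_j \to X_i$, respectively, as part of the DAG structure $\mathcal{G}$. Under such modeling, the Bayesian network is called a linear Gaussian network.

The likelihood of data X given the graph $\mathcal{G}$ is

$$p(X \mid \mathcal{G}) = \prod_{i=1}^{k} \prod_{l=1}^{n} p\left(X_{il} \mid \{X_{jl} : X_j \in \mathrm{Pa}^{(\mathcal{G})}(X_i)\}\right),$$

where n is the number of samples and $X_{il}$ is the expression level of gene i in sample l. By Bayes’ rule, the log-posterior of the graph given the data is

$$\log p(\mathcal{G} \mid X) = \log p(X \mid \mathcal{G}) + \log p(\mathcal{G}) - \log p(X),$$

where $p(\mathcal{G})$ is the prior probability for $\mathcal{G}$, and p(X) is a constant when the expression data are provided, so the follow-up calculations do not rely on it.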

Typically, a locally optimal DAG is found by starting from a random graph and randomly ascending the likelihood by adding, modifying, or removing one directed edge at a time (Friedman et al. 1999a, 2000; Koller and Friedman 2009). Alternatively, the posterior distribution $p(\mathcal{G} \mid X)$ can be estimated with Bayesian inference using Markov chain Monte Carlo simulation, allowing us to estimate the significance levels at an extra computational cost. The parameter values of α, β, and σ, as part of $\mathcal{G}$, can be estimated with maximum likelihood.

When a Bayesian network is modified by a single edge, only the vertices that receive a change require a recalculation, whereas all others remain intact. This significantly reduces the amount of computation needed for each random step. A further speedup is achievable if we constrain the maximum number of parents each vertex can have, either by using the same fixed number for all nodes or by preselecting a variable number of potential parents for each node using, for instance, a preliminary L1-regularization step (Schmidt et al. 2007).
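The locality of the score can be seen in a small plain-Python sketch of a linear Gaussian network. It assumes that each vertex is a normal linear regression on its parents with parameters fitted by maximum likelihood, so that the log-likelihood of the graph is a sum of one term per vertex. The function names and the toy data are illustrative and do not correspond to any of the cited implementations.

```python
import math
import random


def solve(mat, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def node_loglik(data, child, parents):
    """Maximized log-likelihood of one vertex given its parents."""
    y = data[child]
    n = len(y)
    design = [[1.0] + [data[p][l] for p in parents] for l in range(n)]
    q = len(parents) + 1
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(q)]
           for a in range(q)]
    xty = [sum(row[a] * y[l] for l, row in enumerate(design))
           for a in range(q)]
    coef = solve(xtx, xty)
    rss = sum((y[l] - sum(c * v for c, v in zip(coef, design[l]))) ** 2
              for l in range(n))
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1.0)


def graph_loglik(data, parents_of):
    """Log-likelihood of a DAG: one independent term per vertex."""
    return sum(node_loglik(data, v, pa) for v, pa in parents_of.items())


random.seed(1)
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + 0.5 * random.gauss(0, 1) for v in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
data = {"X1": x1, "X2": x2, "X3": x3}

empty = {"X1": [], "X2": [], "X3": []}
with_edge = {"X1": [], "X2": ["X1"], "X3": []}

# Adding the edge X1 -> X2 changes only the term of the child X2.
delta_local = node_loglik(data, "X2", ["X1"]) - node_loglik(data, "X2", [])
delta_full = graph_loglik(data, with_edge) - graph_loglik(data, empty)
print(delta_local, delta_full)
```

A greedy search would evaluate such local differences for candidate edge changes, reject changes that create a directed cycle, and typically add a prior or complexity penalty to the score.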

Two DAGs are called Markov equivalent if they result in the same PDF (Koller and Friedman 2009). Clearly, using gene expression data alone, Bayesian networks can only be resolved up to Markov equivalence. To break this equivalence and uncover a more specific causal gene regulation network, genotype data are incorporated in the model inference process. The most straightforward approach is to use any of the methods in the previous section to calculate the probability $P(X_i \to X_j)$ of a causal interaction from $X_i$ to $X_j$ (Zhu et al. 2004, 2008, 2012; Zhang et al. 2013), for example, by defining the prior as

[…]

in the model, with the constraint that traits can depend on SNPs, but not vice versa. However, the additional complexity of both methods means that they are computationally expensive and have only been applied to problems with a handful of traits (Neto et al. 2010; Scutari et al. 2014).

A few additional “tips and tricks” are worth mentioning:

• First, when the number of vertices is much larger than the sample count, we may break the problem into independent subproblems by learning a separate Bayesian network for each coexpression module (Section 3.1 and Zhang et al. 2013). Dependencies between modules could then be learned as a Bayesian network among the module eigengenes (Langfelder and Horvath 2007), although this does not seem to have been explored.

• Second, Bayesian network learning algorithms inevitably result in locally optimal models, which may contain a high number of false positives. To address this problem, we can run the algorithm multiple times and report an averaged network, only consisting of edges that appear sufficiently frequently.

• Finally, another technique that helps in distinguishing genuine dependencies from false positives is bootstrapping, where resampling with replacement is executed on the existing sample pool. A fixed number of samples are randomly selected and then processed to predict a Bayesian network. This process is repeated many times, essentially regarding the distribution of the sample pool as the true PDF, and allowing one to estimate the robustness of each predicted edge, so that only those with high significance are retained (Friedman et al. 1999b). In theory, even the whole pipeline of Fig. 1 up to the in silico validation could be simulated in this way. Although bootstrapping is computationally expensive and mostly suited for small data sets, it could be used in conjunction with the separation into modules on larger data sets.
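The bootstrap procedure is independent of the particular network learner, which the following sketch makes explicit by taking the learner as an argument. The function `bootstrap_edge_confidence` and the stand-in learner `correlation_edges` are hypothetical: the stand-in simply thresholds pairwise correlations and is not a Bayesian network algorithm, so a real structure learner would be substituted in practice.

```python
import math
import random


def bootstrap_edge_confidence(samples, learn_edges, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each edge is predicted.

    samples     : list of observations (one tuple of values per sample)
    learn_edges : callable mapping a list of samples to a set of edges
    """
    rng = random.Random(seed)
    n = len(samples)
    counts = {}
    for _ in range(n_boot):
        # Resample the sample pool with replacement.
        resampled = [samples[rng.randrange(n)] for _ in range(n)]
        for edge in learn_edges(resampled):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_boot for edge, c in counts.items()}


def correlation_edges(samples, threshold=0.5):
    """Stand-in learner: undirected edges between strongly correlated genes."""
    k = len(samples[0])
    cols = [[s[i] for s in samples] for i in range(k)]

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)

    return {(i, j) for i in range(k) for j in range(i + 1, k)
            if abs(pearson(cols[i], cols[j])) > threshold}


# Toy data: gene 1 follows gene 0, gene 2 is independent noise.
random.seed(0)
samples = []
for _ in range(100):
    g0 = random.gauss(0, 1)
    samples.append((g0, g0 + 0.5 * random.gauss(0, 1), random.gauss(0, 1)))

confidence = bootstrap_edge_confidence(samples, correlation_edges)
print(confidence)
```

Edges whose confidence exceeds a chosen cutoff would be retained in the reported network; the cutoff itself is a tuning choice.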

Mechanisms

Module network inference is a statistically well-grounded method that uses probabilistic graphical models to reconstruct modules of coregulated genes and their upstream regulatory programs and that has proven useful in many biological case studies (Bonnet et al. 2015; Segal et al. 2003; Friedman 2004; Qu et al. 2016). The module network model was originally introduced as a method to infer regulatory networks from large-scale gene expression compendia, as implemented in the Genomica software (http://genomica.weizmann.ac.il) (Segal et al. 2003). Subsequently, the method has been extended to integrate eQTL and gene expression data (Lee et al. 2006, 2009; Zhang et al. 2010). The module network model starts from the same formula as Eq. (2). It is then assumed that genes belonging to the same module share the same parents and conditional distributions; these conditional distributions are parameterized as decision trees, with the parental genes on the internal (decision) nodes and normal distributions on the leaf nodes (Segal et al. 2003). Recent algorithmic innovations decouple the module assignment and tree structure learning from the parental gene assignment and use Gibbs sampling and ensemble methods for improved module network inference (Joshi et al. 2008, 2009). These algorithms are implemented in the Lemon-Tree software (https://github.com/eb00/lemon-tree), a command line software suite for module network inference (Bonnet et al. 2015).

We have recently identified genomewide significant eQTLs for 6500 genes in seven tissues from the Stockholm Atherosclerosis Gene Expression (STAGE) study (Foroughi Asl et al. 2015) and performed coexpression clustering and causal network reconstruction (Talukdar et al. 2016). To illustrate the above concepts, we show some results for a coexpression cluster in visceral fat (88 samples, 324 genes), which was highly enriched for tissue development genes ($P = 5 \times 10^{-10}$) and contained 10 genomewide significant eQTL genes and 25 transcription factors, including eight members of the homeobox family (Fig. 2a).

A representative example of an inferred causal interaction is given by the coexpression interaction between huntingtin-associated protein 1 (HAP1, chr17 q21.2-21.3) and forkhead box G1 (FOXG1, chr14 q11-q13). The expression of both genes is highly correlated ($r = 0.85$, $P = 4.4 \times 10^{-24}$, Fig. 2b). HAP1 expression shows a significant, nonlinear association with its eQTL rs1558285 ($P = 1.2 \times 10^{-4}$); this SNP also associates significantly with FOXG1 expression in the cross-association test (P = 0.0024), but no longer after conditioning FOXG1 on HAP1 and its own eQTL rs7160881 (P = 0.67) (Fig. 2c). By contrast, although FOXG1 expression is significantly associated with its eQTL rs7160881 (P = 0.0028), there is no association between this SNP and HAP1 expression (P = 0.037), and conditioning on FOXG1 and HAP1’s eQTL has only a limited effect (P = 0.19) (Fig. 2d). Using conditional independence tests (Section 4.1), this results in a high-confidence prediction that HAP1 → FOXG1 is causal.

A standard greedy Bayesian network search algorithm (Schmidt et al. 2007) was run on the aforementioned cluster of 324 genes. Figure 2e shows the predicted consensus subnetwork of causal interactions between the 10 eQTLs and the 25 TFs. This illustrates how a sparse Bayesian network can accurately represent the fully connected coexpression network (all 35 genes have high mutual coexpression, cf. Fig. 2a).


Figure 2f shows a typical regulatory module inferred by the Lemon-Tree software, also from the STAGE data. Here, a heat map is shown of the genotypes of an eQTL (top) and the expression levels of a regulatory gene (middle), predicted to regulate a coexpression module of 11 genes (bottom). The red lines indicate sample clusters representing separate normal distributions inferred by the model-based co-clustering algorithm (Section 3.2).

Networks

Gene regulation networks reconstructed from omics data represent hypotheses about the downstream molecular implications of genetic variations in a particular cell or tissue type. An essential first step toward using these networks in concrete applications (e.g., discovering novel candidate drug target genes and pathways) consists of validating them using independent data. The following is a nonexhaustive list of typical in silico validation experiments.

Fig. 2 (a) Heat map of standardized expression profiles across 88 visceral fat samples for 10 eQTL genes and 25 TFs belonging to a coexpression cluster inferred from the STAGE data. (b) Coexpression of HAP1 and FOXG1 across 88 visceral fat samples. (c) Association between HAP1’s eQTL (rs1558285) and expression of HAP1 (red), FOXG1 (blue), and FOXG1 adjusted for HAP1 and FOXG1’s eQTL (green). (d) Association between FOXG1’s eQTL (rs7160881) and expression of FOXG1 (blue), HAP1 (red), and HAP1 adjusted for FOXG1 and HAP1’s eQTL (green). (e) Causal interactions inferred between the same genes as in (a) using Bayesian network inference. (f) Example of a regulatory module inferred by Lemon-Tree from the STAGE data. See Section 4.4 for further details.

Model Likelihood Comparison and Cross Validation

When different algorithms are used to infer gene network models, their log-likelihoods can be compared to select the best one. (With the caveat that the same data that was used to learn the models is used to compare them, this comparison is meaningful only when the algorithms optimize exactly the same (penalized) log-likelihood functions.) In a K-fold cross-validation experiment, the available samples are divided into K subsets of approximately equal size. For each subset, models are learned from a data set consisting of the K − 1 other subsets, and the model likelihood is calculated using only the unseen data subset. Thus, cross validation is used to test the generalizability of the inferred network models to unseen data. For an example where model likelihood comparison and cross validation were used to compare two module network inference strategies, see Joshi et al. (2009).
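The K-fold procedure can be sketched generically, with the model fitting and scoring steps passed in as functions. The names `k_fold_indices` and `cross_validated_loglik`, as well as the single-variable Gaussian model used to exercise them, are placeholders for this example; in practice `fit` would be a network inference algorithm and `loglik` its likelihood on held-out samples.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Split sample indices 0..n-1 into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validated_loglik(samples, fit, loglik, k=5, seed=0):
    """Sum of held-out log-likelihoods over k folds."""
    total = 0.0
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        test = [samples[i] for i in held_out]
        model = fit(train)            # learn on the other k - 1 folds
        total += loglik(model, test)  # score on the unseen fold only
    return total


# Placeholder model: a single normal distribution.
def fit_gaussian(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def gaussian_loglik(model, values):
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)
               for v in values)


random.seed(0)
expression = [random.gauss(2.0, 1.5) for _ in range(60)]
folds = k_fold_indices(len(expression), 5)
cv_score = cross_validated_loglik(expression, fit_gaussian, gaussian_loglik)
print([len(f) for f in folds], cv_score)
```

Competing inference strategies can then be compared by their cross-validated scores on identical folds, which is why the fold assignment is controlled by a fixed seed here.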

Functional Enrichment

Organism-specific gene ontology databases contain structured functional gene annotations (Ashburner et al. 2000). These databases can be used to construct gene signature sets composed of genes annotated to the same biological process, molecular function, or cellular component. Reconstructed gene networks can then be validated by testing for enriched connectivity of gene signature sets using a method proposed by Zhu et al. (2008). For a given gene set, this method considers all network nodes belonging to the set and their nearest neighbors, and from this set of nodes and edges, the largest connected subnetwork is identified. Then the enrichment of the gene set in this subnetwork is tested using the Fisher exact test and compared with the enrichment of randomly selected gene sets of the same size.
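The enrichment step alone can be written compactly: the one-sided Fisher exact test for over-representation is the upper tail of a hypergeometric distribution. The sketch below covers only this test, not the construction of the largest connected subnetwork or the comparison with random gene sets, and the function and argument names are illustrative.

```python
from math import comb


def enrichment_pvalue(n_universe, n_signature, n_subnetwork, n_overlap):
    """One-sided Fisher exact (hypergeometric) p-value.

    Probability of seeing at least n_overlap signature genes when
    n_subnetwork genes are drawn without replacement from a universe of
    n_universe genes that contains n_signature signature genes.
    """
    upper = min(n_signature, n_subnetwork)
    tail = sum(
        comb(n_signature, x) * comb(n_universe - n_signature, n_subnetwork - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_subnetwork)


# Toy numbers: 10 network genes, 3 in the signature set, and a connected
# subnetwork of 3 genes that contains all 3 signature genes.
p_all = enrichment_pvalue(10, 3, 3, 3)
p_none = enrichment_pvalue(10, 3, 3, 0)
print(p_all, p_none)
```

In the first toy case only one of the 120 possible three-gene subsets contains all signature genes, whereas requiring an overlap of at least zero is always satisfied.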

Comparison with Physical Interaction Networks

Networks of transcription factor–target interactions based on ChIP-sequencing data (Furey 2012) from diverse cell and tissue types are available from the ENCODE (The ENCODE 2012), Roadmap Epigenomics (Kundaje et al. 2015), and modENCODE (Gerstein et al. 2010; Roy et al. 2010; Yue et al. 2014) projects, whereas physical protein–protein interaction networks are available for many organisms through databases such as the BioGRID (Chatr-Aryamontri et al. 2015). Because of indirect effects, networks predicted from gene expression data rarely show a significant overlap with networks of direct physical interactions. A more appropriate validation is therefore to test for enrichment for short connection paths in the physical networks between pairs predicted to interact in the reconstructed networks (Bonnet et al. 2015).

Gene Perturbation Experiments

Gene knockout experiments provide the ultimate gold standard of a causal network intervention, and genes differentially expressed between knockout and control experiments can be considered as true positive direct or indirect targets of the knockout gene. Predicted gene networks can be validated by compiling relevant (i.e., performed in a relevant cell or tissue type) gene knockout experiments from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) or ArrayExpress (https://www.ebi.ac.uk/arrayexpress/), and comparing the overlap between gene sets responding to a gene knockout and network genes predicted to be downstream of the knockout gene. Overlap significance can be estimated by using randomized networks with the same degree distribution as the predicted network.
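One common way to obtain randomized networks with the same degree distribution is repeated edge swapping, sketched below for directed networks. The source does not specify which randomization scheme is meant, so this choice, the function name `randomize_preserving_degrees`, and the toy edge list are assumptions of the example.

```python
import random
from collections import Counter


def randomize_preserving_degrees(edges, n_swaps=1000, seed=0):
    """Shuffle a directed network while keeping every in- and out-degree.

    Two edges a->b and c->d are repeatedly rewired to a->d and c->b,
    unless this would create a self-loop or a duplicate edge.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i = rng.randrange(len(edge_list))
        j = rng.randrange(len(edge_list))
        (a, b), (c, d) = edge_list[i], edge_list[j]
        new1, new2 = (a, d), (c, b)
        if (i == j or a == d or c == b
                or new1 in edge_set or new2 in edge_set):
            continue
        edge_set.difference_update([(a, b), (c, d)])
        edge_set.update([new1, new2])
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


# Toy predicted network (regulator, target).
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("d", "a"), ("b", "d")]
shuffled = randomize_preserving_degrees(edges, n_swaps=200, seed=1)

out_before = Counter(s for s, _ in edges)
out_after = Counter(s for s, _ in shuffled)
in_before = Counter(t for _, t in edges)
in_after = Counter(t for _, t in shuffled)
print(shuffled)
```

Repeating the overlap calculation on many such randomized networks gives an empirical null distribution against which the overlap of the predicted network can be compared.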

Although combining genotype and transcriptome data to reconstruct causal gene networks has led to important discoveries in a variety of applications (Civelek and Lusis 2014), important details are not incorporated in the resulting network models, particularly regarding the causal molecular mechanisms linking eQTLs to their target genes, and the relation between variation in transcript levels and protein levels, with the latter ultimately determining phenotypic responses. Several recent studies have shown that at the molecular level, cis-eQTLs primarily cause variation in transcription factor binding to gene regulatory DNA elements, which then causes changes in histone modifications, DNA methylation, and mRNA expression of nearby genes (reviewed in Albert and Kruglyak 2015). Although mRNA expression can be used as a surrogate for protein expression, due to diverse posttranscriptional regulation mechanisms, the correlation between mRNA and protein levels is known to be modest (Lu et al. 2007; Schwanhausser et al. 2011), and genetic loci that affect mRNA and protein expression levels do not always overlap (Foss et al. 2007; Wu et al. 2013). Thus, an ideal systems genetics study would integrate genotype data and molecular measurements at all levels of gene regulation from a large number of individuals.

Human lymphoblastoid cell lines (LCLs) are emerging as the primary model system to test such an approach. Whole-genome mRNA and micro-RNA sequencing data are available for 462 LCL samples from five populations genotyped by the 1000 Genomes Project (Lappalainen et al. 2013); protein levels from quantitative mass spectrometry for 95 samples (Wu et al. 2013); ribosome occupancy levels from the sequencing of ribosome-protected mRNA for 50 samples (Cenik et al. 2015); DNA-occupancy levels of the regulatory TF PU.1, the RNA polymerase II subunit RPB2, and three histone modifications from the ChIP sequencing of 47 samples (Waszak et al. 2015); and the same three histone modifications from the ChIP sequencing of 75 samples (Grubert et al. 2015). These population-level data sets can be combined further with three-dimensional chromatin contact data from Hi-C (Rao et al. 2014) and ChIA-PET (Grubert et al. 2015), knockdown experiments followed by microarray measurements for 59 transcription-associated factors and chromatin modifiers (Cusanovich et al. 2014), and more than 260 ENCODE assays (including the ChIP sequencing of 130 TFs) (The ENCODE 2012) in a reference LCL cell line (GM12878). Although the number of samples where all measures are simultaneously available is currently small, this number is sure to rise in the coming years, along with the availability of similar measurements in other cell types. Despite the challenging heterogeneity of data and analyses in the integration of multi-omics data, web-based toolboxes, such as GenomeSpace (http://www.genomespace.org) (Qu et al. 2016), can prove helpful to nonprogrammer researchers.

Conclusions

In this chapter, we have reviewed the main methods and software to carry out a systems genetics analysis, which combines genotype and various omics data to identify eQTLs and their associated genes, to reconstruct coexpression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico. Several method and software options are available for each of these steps, and by necessity, a subjective choice about which ones to include had to be made, based largely on their ability to handle large data sets, their popularity in the field, and our personal experience of using them. Where methods have been compared in the literature, the comparisons have usually been performed on a small number of data sets for a specific subset of tasks, and results have rarely been conclusive. That is, although each of the presented methods will give somewhat different results, no objective measurements will consistently select one of them as the “best” one. Given this lack of objective criterion, the reader may well prefer to use a single software package that performs all of the presented analyses, but such an integrated software package does not currently exist.

Nearly all of the examples discussed referred to the integration of genotype and transcriptome data, reflecting the current dominant availability of these two data types. However, omics technologies are evolving at a fast pace, and it is clear that data on the variation of TF binding, histone modifications, and post-transcriptional and protein expression levels will soon become more widely available. Developing appropriate statistical models and computational methods to infer causal gene regulation networks from these multi-omics data sets is surely the most important challenge for the field.

Acknowledgments The authors’ work is supported by the BBSRC (BB/M020053/1) and Roslin Institute Strategic Grant funding from the BBSRC (BB/J004235/1).


Basso K et al (2005) Reverse engineering of regulatory networks in human B cells. Nat Genet 37:382–390

Björkegren JL et al (2015) Genome-wide significant loci: how important are they?: systems genetics to understand heritability of coronary artery disease and other common complex disorders

Cusanovich DA et al (2014) The functional consequences of variation in transcription factor binding. PLoS Genet 10, e1004226

Daub CO et al (2004) Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data. BMC Bioinf 5:118

Dimas AS et al (2009) Common regulatory variation impacts gene expression in a cell type–dependent manner. Science 325:1246–1250

Eisen MB et al (1998) Cluster analysis and display of genome-wide expression patterns. PNAS 95:14863–14868

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30:1575–1584

Faith JJ et al (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5, e8

Foroughi Asl H et al (2015) Expression quantitative trait loci acting across multiple tissues are enriched in inherited risk of coronary artery disease. Circulation Cardiovasc Genet 8:305–315

Foss EJ et al (2007) Genetic basis of proteome variation in yeast. Nat Genet 39:1369–1375

Friedman N (2004) Inferring cellular networks using probabilistic graphical models. Science 303:799–805

Friedman N, Nachman I, Pe’er D (1999a) Learning Bayesian network structure from massive datasets: the “sparse candidate” algorithm. In: Proceedings of the fifteenth conference on uncertainty in artificial intelligence, UAI’99. Morgan Kaufmann Publishers Inc., San Francisco, pp 206–215

Friedman N, Goldszmidt M, Wyner A (1999b) Data analysis with Bayesian networks: a bootstrap approach. In: Proceedings of the fifteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., San Francisco, pp 196–205


Friedman N et al (2000) Using Bayesian networks to analyze expression data. J Comput Biol 7:601–620

Furey TS (2012) ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions. Nat Rev Genet 13:840–852

Georges M (2007) Mapping, fine mapping, and molecular dissection of quantitative trait loci in domestic animals. Annu Rev Genomics Hum Genet 8:131–162

Gerstein M et al (2010) Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330:1775–1787

Goddard ME, Hayes BJ (2009) Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nat Rev Genet 10:381–391

Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. The Johns Hopkins University Press, Baltimore

Greenawalt DM et al (2011) A survey of the genetics of stomach, liver, and adipose gene expression from a morbidly obese cohort. Genome Res 21:1008–1016

Grubert F et al (2015) Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell 162:1051–1065

Hartwell LH et al (1999) From molecular to modular cell biology. Nature 402:C47–C52

Hemani G et al (2014) Detection and replication of epistasis influencing transcription in humans. Nature 508:249–253

Hindorff LA et al (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci 106:9362–9367

Joshi A, Van de Peer Y, Michoel T (2008) Analysis of a Gibbs sampler for model based clustering of gene expression data. Bioinformatics 24:176–183

Joshi A et al (2009) Module networks revisited: computational assessment and prioritization of model predictions. Bioinformatics 25:490–496

Kadarmideen HN, von Rohr P, Janss LL (2006) From genetical genomics to systems genetics: potential applications in quantitative genomics and animal breeding. Mamm Genome 17:548–564

Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. The MIT Press, Cambridge, MA

Kundaje A et al (2015) Integrative analysis of 111 reference human epigenomes. Nature 518:317–330

Laird N, Lange C (2011) The fundamentals of modern statistical genetics. Springer, New York

Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Syst Biol 1:54

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinf 9:559

Langfelder P, Zhang B, Horvath S (2008) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24:719–720

Lappalainen T et al (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501:506–511

Lee S et al (2006) Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proc Natl Acad Sci U S A 103:14062–14067

Lee SI et al (2009) Learning a prior on regulatory potential from eQTL data. PLoS Genet 5, e1000358

Li Y et al (2010) Critical reasoning on causal inference in genome-wide linkage and association studies. Trends Genet 26:493–498

Liu JS (2002) Monte Carlo strategies in scientific computing. Springer, New York

Lu P et al (2007) Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotech 25:117–124

Mackay TF, Stone EA, Ayroles JF (2009) The genetics of quantitative traits: challenges and prospects. Nat Rev Genet 10:565–577

Manolio TA (2013) Bringing genome-wide association findings into clinical use. Nat Rev Genet 14:549–558

Medvedovic M, Sivaganesan S (2002) Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics 18:1194–1206

Trang 33

Michoel T, Nachtergaele B (2012) Alignment and integration of complex networks by hypergraph- based spectral clustering Phys Rev E 86:056111

Millstein J et al (2009) Disentangling molecular relationships with a causal inference test BMC Genet 10:23

Neto EC et al (2008) Inferring causal phenotype networks from segregating populations Genetics 179:1089–1100

Neto EC et al (2010) Causal graphical models in systems genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes Ann Appl Stat 4:320

Neto EC et al (2013) Modeling causality for pairs of phenotypes in system genetics Genetics 193:1003–1013

Newman MEJ (2006) Modularity and community structure in networks PNAS 103:8577–8582 Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks Phys Rev E 69:026113

Nicholson G et al (2011) A genome-wide metabolic QTL analysis in Europeans implicates two loci shaped by recent positive selection PLoS Genet 7, e1002270

Qi J et al (2014) kruX: Matrix-based non-parametric eQTL discovery BMC Bioinf 15:11

Qu K et al (2016) Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace Nat Methods 13:245–247

Rao SS et al (2014) A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping Cell 159:1665–1680

Ravasz E et al (2002) Hierarchical organization of modularity in metabolic networks Science 297:1551–1555

Ritchie MD et al (2015) Methods of integrating data to uncover genotype-phenotype interactions Nat Rev Genet 16:85–97

Rockman MV (2008) Reverse engineering the genotype–phenotype map with natural genetic ation Nature 456:738–744

vari-Rockman MV, Kruglyak L (2006) Genetics of global gene expression Nat Rev Genet 7:862–872 Roy S et al (2010) Identification of functional elements and regulatory circuits by Drosophila modENCODE Science 330:1787–1797

Schadt EE (2009) Molecular networks as sensors and drivers of common human diseases Nature 461:218–223

Schadt EE, Björkegren JL (2012) New: network-enabled wisdom in biology, medicine, and health care Sci Transl Med 4:115rv1

Schadt EE et al (2005) An integrative genomics approach to infer causal associations between gene expression and disease Nat Genet 37:710–717

Schadt EE et al (2008) Mapping the genetic architecture of gene expression in human liver PLoS Biol 6, e107

Schadt EE, Friend SH, Shaywitz DA (2009) A network view of disease and compound screening Nat Rev Drug Disc 8:286–295

Schaub MA et al (2012) Linking disease associations with regulatory information in the human genome Genome Res 22:1748–1759

Schmidt M, Niculescu-Mizil A, Murphy K (2007) Learning graphical model structure using L1-regularization paths AAAI 7:1278–1283

Schwanhausser B et al (2011) Global quantification of mammalian gene expression control Nature 473:337–342

Scutari M et al (2014) Multiple quantitative trait analysis using Bayesian networks Genetics 198:129–137

Segal E et al (2003) Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data Nat Genet 34:166–167

Shabalin AA (2012) Matrix eQTL: ultra fast eQTL analysis via large matrix operations Bioinformatics 28:1353–1358

Trang 34

Sharan R, Shamir R (2000) CLICK: a clustering algorithm with applications to gene expression analysis In Proc Int Conf Intell Syst Mol Biol 8:16

Sharan R, Ulitsky I, Shamir R (2007) Network-based prediction of protein function Mol Syst Biol 3:88

Smith GD, Ebrahim S (2003) ‘mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 32:1–22

Stegle O et al (2012) Using probabilistic estimation of expression residuals (peer) to obtain increased power and interpretability of gene expression analyses Nat Protoc 7:500–507 Talukdar H et al (2016) Cross-tissue regulatory gene networks in coronary artery disease Cell Syst 2:196–208

Tavazoie S et al (1999) Systematic determination of genetic network architecture Nat Genet 22:281–285

The ENCODE (2012) Project Consortium An integrated encyclopedia of DNA elements in the human genome Nature 489:57–74

Van Dongen SM (2001) Graph clustering by flow simulation Dissertation, Utrecht University Repository

Walhout AJ (2006) Unraveling transcription regulatory networks by protein–DNA and protein– protein interaction mapping Genome Res 16:1445–1454

Waszak SM et al (2015) Population variation and genetic control of modular chromatin ture in humans Cell 162:1039–1050

architec-Williams RW (2006) Expression genetics and the phenotype revolution Mamm Genome 17:496–502

Wu L et al (2013) Variation and genetic control of protein abundance in humans Nature 499:79–82

Yue F et al (2014) A comparative encyclopedia of DNA elements in the mouse genome Nature 515:355–364

Zhang B, Horvath S (2005) A general framework for weighted gene co-expression network sis Stat Appl Genet Mol Biol 4:17

analy-Zhang W et al (2010) A Bayesian partition method for detecting pleiotropic and epistatic eQTL modules PLoS Comput Biol 6, e1000642

Zhang B et al (2013) Integrated systems approach identifies genetic nodes and networks in late- onset Alzheimer’s disease Cell 153:707–720

Zhu J et al (2004) An integrative genomics approach to the reconstruction of gene networks in segregating populations Cytogenet Genome Res 105:363–374

Zhu J et al (2008) Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks Nat Genet 40:854–861

Zhu J et al (2012) Stitching together multiple data dimensions reveals interacting metabolomic and transcriptomic networks that modulate cell regulation PLoS Biol 10, e1001301

Trang 35

© Springer International Publishing Switzerland 2016

H.N Kadarmideen (ed.), Systems Biology in Animal Production and Health, Vol 1,

In this chapter, we describe the state of the art of genetic studies on human obesity that use pig populations. We describe the features of the pig as a model for human obesity, briefly discuss the genetics of obesity, and focus on systems genetics research performed in pigs and its contribution to human obesity research.

1 The Pig as a Model for Human Obesity

Throughout the history of biomedical research, animals have been used extensively as models for human diseases. Animal models have several advantages with respect to costs, ethical considerations, and the measurement of phenotypic characteristics. The use of animal models in biomedical research has been described in depth (Hau 2008), showing that the choice of animal model is highly dependent on the genetic, physiological, and/or psychological features of both the animal and the disease under study. For human obesity, rodents are a commonly used animal model, but because of major anatomical and physiological differences, translation to human medical science has been limited (Houpt et al. 1979; Spurlock and Gabler 2008). To overcome those differences, the pig (Sus scrofa) has successfully been used as a model for human obesity.

L. J. A. Kogelman (*) • H. N. Kadarmideen

Department of Large Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 7, 1870 Frederiksberg C, Denmark

e-mail: lisette.kogelman@regionh.dk

The digestive tract is one of the key organs in obesity research (Halsted 1999) and therefore needs to be considered when choosing an animal model for human obesity. The anatomy of the pig's digestive tract is very similar to that of humans. Both species are omnivorous, and their digestive tract consists of the esophagus, stomach, small intestine (duodenum, jejunum, and ileum), and large intestine (cecum, colon, rectum, and anus). Furthermore, the digestive tract is proportionally similar in size: the stomach has a capacity of 6–8 l in the pig compared to 2–4 l in humans (Curtus and Barnes 1994), the small intestine is approximately 18 m long in the pig (Razmaite et al. 2009) compared to 7 m in humans (Gray 1918), and the large intestine is approximately 6 m long in the pig (Razmaite et al. 2009) compared to 1.5 m in humans.

Also genetically, the pig is very similar to humans. Recently (2012), the pig genome was assembled and analyzed (Groenen et al. 2012), resulting in the annotation of protein-coding genes and gene transcripts, with numbers comparable to the human genome (Table 1).

Based on the pig's genetic background, Groenen et al. (2012) also showed the potential of the pig as a biomedical model. For example, they detected 112 positions at which the pig amino acid sequence was identical to that of human orthologs implicated in human disease. Moreover, several studies have used the pig as a biomedical model for, among others, heart physiology, the brain, gut physiology and nutrition, biomechanics, respiratory function, and infectious disease (Lunney 2007; Michael Swindle and Smith 2008). Here, we focus on the use of the pig as a model for human obesity and its application in systems biology research.

Already in 1979, the use of the pig as a biomedical model to study human obesity was reviewed (Houpt et al. 1979). Several similarities between pigs and humans were discussed with respect to obesity-related phenotypes. For example, in both humans and pigs, fat is mainly stored in subcutaneous adipose tissue, and fat cell size and number are similar (Gurr et al. 1977). High-density lipoprotein (HDL) is also structurally and compositionally similar (Davis et al. 1974). Importantly, as in humans, there is no indication that obesity in pigs is caused by a single locus or gene; it is most likely caused by several loci or by homozygous recessive genotypes for obesity. Genetically, however, pig breeds differ: a number of breeds show a strong propensity for obesity (e.g., Ossabaw pig and Göttingen minipig), whereas others have been bred for centuries for their lean meat content (e.g., Yorkshire and Duroc).

Table 1 Overview genome

                         Pig              Human
No. of chromosomes       19 pairs         23 pairs
No. of base pairs        2,596,639,456    3,272,480,989
Protein-coding genes     21,640           20,345

As mentioned, the pig as a biomedical model has major advantages in comparison with human studies with respect to the measurement of phenotypic characteristics. Because of costs and ethical reasons, it is easier to measure a wide range of phenotypes under controlled experimental conditions, and after the study period, the animals can be slaughtered and samples from different tissues and cells collected. Also during the study period, deep phenotyping can be obtained using dual-energy X-ray absorptiometry (DXA) scanning. DXA has shown its potential in human obesity studies because of its precise measurement of body composition with respect to fat mass. Several studies have shown the potential of estimating the fat mass percentage of pigs using DXA scanning. For example, it was shown that the percentage of body fat measured by DXA was not significantly different from the estimate obtained by chemical analysis, but more extensive calibration may be needed for total body analysis (Mitchell et al. 1996), which was similarly shown in small pigs (Mitchell et al. 1998). Likewise, a study using production pigs (Large White × Landrace) showed high accuracy in determining body composition using DXA scanning (Suster et al. 2003).

One of the well-known pig breeds in relation to obesity is the Göttingen Minipig. This minipig is bred for its small size and ease of handling, which is one of its main advantages in an experimental setting (Johansen et al. 2001). It has been shown that this breed becomes severely obese when fed a high-fat, high-energy diet, and its glycemic control is similar to what has been observed in humans (Johansen et al. 2001). For example, pigs fed the high-fat, high-energy diet had a fat content of 15.2% ± 0.7% vs. 10.0% ± 1.2% in pigs fed the low-fat, low-energy diet. However, dissimilarities were also observed with the high-fat, high-energy diet: the obese pig showed an increase in triglyceride and HDL cholesterol concentrations, whereas in the obese human, triglyceride levels are increased but HDL cholesterol values are decreased. Surprisingly, the Göttingen Minipig also develops obesity on a normal, ad libitum diet and is therefore classified as prone to obesity (Bollen et al. 2005).

Another breed with high potential in obesity research is the Ossabaw pig, another miniature pig breed. Ossabaw pigs possess a thrifty genotype, similar to humans, which results in the ability to store large amounts of fat during feasting and consequently survive periods of famine (Dyson et al. 2006). Studies have shown their excellent potential as a model for obesity and the progression to type 2 diabetes and other consequences of obesity (e.g., coronary artery disease). Extensive discussions on the types and uses of many minipig breeds in biomedical research, and on sources from which to procure them, can be found in the book The Minipig in Biomedical Research (McAnulty et al. 2011).


In contrast to the obese breeds, production pigs (e.g., Duroc and Yorkshire) have been bred for centuries for less fat and a high lean meat content to meet the standards for human consumption, leading to breeds that are genetically predisposed to leanness. Although those animals are less valuable in an experimental setting because of their size, the normal production setting has great potential. In research performed by, e.g., animal breeding industries, a large amount of data is collected to breed animals that grow fast and lean, with a high feed efficiency. These data offer vast opportunities for human obesity studies, providing knowledge about the genetic architecture of, for example, eating behavior and the development of lean/fat content.

2 The Complexity of Human Obesity in a Nutshell

Obesity is the excessive accumulation of body adipose tissue, commonly the result of a chronic imbalance between energy intake and expenditure (Galgani and Ravussin 2010). Obesity is mostly the result of both environmental and genetic factors and interactions among and within them (multifactorial) (Bougnères 2002). Worldwide, the prevalence of obesity has been growing exponentially over the last decades (World Health Organization 2012), which may be largely due to the increased availability of energy-rich foods and the reduced need for physical activity (O'Rahilly and Farooqi 2006). However, it is also known that there is a large genetic component: quantitative genetic studies have estimated the heritability of obesity to be between 40% and 70% (Speliotes et al. 2010). The regulation of energy balance (homeostasis) is a very important aspect of human obesity. Many different tissues, biological processes, and hormones are involved, and genes also play a major role. For example, energy/food intake is strongly associated with appetite and satiety. Those states are mainly regulated by the central nervous system, with several organs and hormones involved. One of these hormones is ghrelin, an appetite-stimulatory signal secreted by the stomach (Wren et al. 2001). On the other side, leptin, called the satiety hormone, is released by adipose tissue and functions through its receptor in the hypothalamus, leading to a reduction of food intake and an increase of energy expenditure (Friedman 2002).

Obesity is also very closely related to the human immune system because of the variety of functions of adipose tissue, i.e., endocrine, inflammatory, and metabolic functions (Heber 2010). White adipose tissue mainly consists of adipocytes that store energy as fat. Adipocytes secrete a large number of adipokines (also called adipocytokines) with important roles in energy homeostasis (Fantuzzi 2005). The most abundant ones are leptin (function mentioned previously) and adiponectin (Tilg and Moschen 2006). Adiponectin is involved in insulin sensitivity and has anti-inflammatory and anti-atherogenic properties (Diez and Iglesias 2003). Besides adipocytes, several immune cells are present in adipose tissue (Ferrante 2013), of which the most abundant are macrophages, with an important role in phagocytosis. Among others, neutrophils (critical for the first immune response) and mast cells (several immune functions, e.g., a role in allergy) are strongly increased in individuals with obesity (Elgazar-Carmon et al. 2008; Liu et al. 2009).

The complexity of obesity, including the relationship of adipose tissue with many different hormones and cells, results in its association with several other (complex) diseases, such as type 2 diabetes, cardiovascular problems, and several types of cancer. For instance, decreased insulin sensitivity (leading to type 2 diabetes) is a consequence of the chronic, low-grade inflammatory state caused by the increased level of immune cells that accompanies the large number of adipose cells (Xu et al. 2003; Shoelson et al. 2007). Insulin has an important function in regulating the uptake of glucose, which originates from carbohydrates in the diet. After food intake, blood glucose levels rise, and insulin ensures that glucose is transported to the cells and taken up so that it can be used as an energy source. In the case of decreased insulin sensitivity, cells respond insufficiently to insulin, resulting in a limited uptake of glucose (type 2 diabetes). As the human body is dependent on an adequate glucose supply, the disturbed glucose uptake has major consequences, such as heart and vascular problems, diabetic retinopathy (affected eyesight), and kidney failure. Furthermore, insulin also has an anti-inflammatory effect, again relating obesity to the immune system.

The complexity of obesity becomes clear when looking at all the associated tissues, cells, hormones, and biological mechanisms, through which it affects the longevity and quality of human life. Its consequences for human health and the subsequent financial burden on society increase the need for a better understanding of the biological and genetic background of obesity.

3 Single Gene Studies in Obesity: What Do We Know So Far?

For several years, genome-wide association studies (GWAS) have been very important in the detection of genes associated with diseases. Since 2007, GWAS have been published regarding obesity. Most commonly, the body mass index (BMI) was used as a measure of obesity. The first GWAS performed on BMI, using 4741 individuals and 362,129 SNPs, led to the detection of the fat mass and obesity-associated (FTO) gene (Scuteri et al. 2007). Approximately 8 years later, the latest GWAS performed on BMI, comprising 339,224 individuals and approximately 2.5 million SNPs (GIANT consortium), detected 97 obesity-related loci (Locke et al. 2015). Although this is a huge increase in detected loci/genes associated with obesity, results thus far are disappointing, as they explain only approximately 2.7% of the BMI variation. Over the years and with new findings, the understanding of the biological mechanisms behind obesity has also changed. Whereas FTO was discovered as an actual fat mass gene (related to feed intake), pathway analysis of the 97 obesity-related loci discovered by Locke et al. (2015) points toward a major role for the central nervous system.
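To make the scale of this gap concrete, the following minimal sketch computes BMI using its standard definition (body mass in kilograms divided by the square of height in metres) and expresses the approximately 2.7% of BMI variation explained by the 97 loci as a share of the 40–70% heritability range quoted in Section 2. The function names and the example individual are hypothetical, and dividing explained variance by heritability is only a rough back-of-the-envelope comparison, not a formal estimate from the cited studies.

```python
def bmi(mass_kg: float, height_m: float) -> float:
    """Body mass index: mass in kg divided by squared height in m."""
    if mass_kg <= 0 or height_m <= 0:
        raise ValueError("mass and height must be positive")
    return mass_kg / height_m ** 2


def share_of_heritability(explained: float, heritability: float) -> float:
    """Fraction of the heritable variance captured by the detected loci."""
    if not 0 < heritability <= 1:
        raise ValueError("heritability must be in (0, 1]")
    return explained / heritability


# Hypothetical individual, for illustration only.
example_bmi = bmi(mass_kg=95.0, height_m=1.75)

# Figures taken from the text: 97 loci explain about 2.7% of BMI variation;
# the heritability of obesity is estimated at 40-70%.
explained = 0.027
low_share = share_of_heritability(explained, 0.70)
high_share = share_of_heritability(explained, 0.40)

print(f"BMI: {example_bmi:.1f} kg/m^2")
print(f"Share of heritability explained: {low_share:.1%} to {high_share:.1%}")
```

Under these assumptions, the detected loci account for only a small fraction of the estimated heritable component, which is the sense in which the results are described as disappointing.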

Furthermore, several other phenotypes have been used to detect obesity-related genes. For example, it has been shown that central obesity has more negative health consequences than general obesity. As a consequence, the waist-to-hip ratio (WHR) might be more informative than the BMI (Vazquez et al. 2007; Molarius and Seidell 1998). Recently, 49 loci related to WHR were detected using the GIANT consortium database; these loci contained genes that were highly enriched in adipocyte-related tissues (Shungin et al. 2015). As with GWAS on BMI, the variance explained by those loci is very low: only approximately 1.4%.
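As a small illustration of the two measures compared above, this sketch computes the waist-to-hip ratio as waist circumference divided by hip circumference and places the explained variance for WHR (about 1.4%, 49 loci) beside that for BMI (about 2.7%, 97 loci). The function name and the example measurements are hypothetical, and the per-locus average is a crude summary introduced here for illustration, not a quantity reported in the cited studies.

```python
def waist_to_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist circumference divided by hip circumference (same units)."""
    if waist_cm <= 0 or hip_cm <= 0:
        raise ValueError("circumferences must be positive")
    return waist_cm / hip_cm


# Hypothetical measurements, for illustration only.
example_whr = waist_to_hip_ratio(waist_cm=102.0, hip_cm=100.0)

# Figures taken from the text: trait -> (number of loci, variance explained).
gwas_results = {"BMI": (97, 0.027), "WHR": (49, 0.014)}

# Crude average contribution per locus; an illustrative summary only.
per_locus = {
    trait: explained / n_loci
    for trait, (n_loci, explained) in gwas_results.items()
}

print(f"WHR: {example_whr:.2f}")
for trait, value in per_locus.items():
    print(f"{trait}: {value:.4%} of variance per locus on average")
```

For both traits the average contribution per locus is far below 0.1% of the variance, consistent with the many small effects expected for a complex phenotype.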

Many more studies have been performed to elucidate the genetic background of obesity and to gain understanding of the biological mechanisms of this complex phenotype. As mentioned, the use of animal models in biomedical research has some outstanding advantages, such as costs and ethical considerations. Several research groups have made use of these advantages and investigated obesity using the pig as a biomedical model, either using data sets from the pig industry or using an experimental animal model, both having their own (dis)advantages (Fig. 1).

4 Human Obesity Genes Present in Pigs

The first gene that was directly associated with obesity in humans is the FTO gene. This gene has also been related to obesity-related traits in pigs by several studies. In 2009, a study focused on the alleles of the FTO gene and studied the relationship of this gene with several measured traits in seven pig breeds (Fontanesi et al. 2009). They showed, and later reconfirmed, that FTO was significantly associated with obesity-related phenotypes in Duroc pigs, for example, intramuscular fat content (Fontanesi et al. 2010). Also in an ISU Berkshire × Yorkshire population, this gene showed a significant association with average daily gain and total lipid percentage in muscle (Fan et al. 2009). Furthermore, an expression study showed elevated levels of FTO in brain tissues, with significantly higher levels in the cerebellum compared with the cortex of pigs fed a high-cholesterol diet (Madsen et al. 2009).

Another well-known human obesity gene is MC4R. The gene is active in the hypothalamic leptin–melanocortin signaling pathway and has been associated with suppression of food intake (Santini et al. 2009). Also in pigs, the gene has been associated with several obesity-related traits. In Italian Duroc and Italian Large White pigs, it has been associated with daily gain, feed conversion ratio, and ham weight (Davoli et al. 2012). Another study using Large White pigs showed the association of MC4R with backfat depth, average daily gain, and daily feed intake (Houston
