© INRA, EDP Sciences, 2002
DOI: 10.1051/gse:2002030
Original article
The covariance between relatives
conditional on genetic markers
Yuefu LIU^a,∗, Gerald B JANSEN^a, Ching Y LIN^a,b
^a Department of Animal and Poultry Science, University of Guelph, Guelph, ON N1G 2W1, Canada
^b Dairy and Swine Research and Development Centre, Agriculture and Agri-Food Canada
(Received 13 August 2001; accepted 3 June 2002)
Abstract – The development of molecular genotyping techniques makes it possible to analyze quantitative traits on the basis of individual loci. With marker information, the classical theory of estimating the genetic covariance between relatives can be reformulated to improve the accuracy of estimation. In this study, an algorithm was derived for computing the conditional covariance between relatives given genetic markers. Procedures for calculating the conditional relationship coefficients for additive, dominance, additive by additive, additive by dominance, dominance by additive and dominance by dominance effects were developed. The relationship coefficients were computed based on conditional QTL allelic transmission probabilities, which were inferred from the marker allelic transmission probabilities. An example data set with pedigree and linked markers was used to demonstrate the methods developed. Although this study dealt with two QTLs coupled with linked markers, the same principle can be readily extended to the situation of multiple QTL. The treatment of missing marker information and unknown linkage phase between markers for calculating the covariance between relatives was discussed.
covariance between relatives / molecular marker / QTL / transmission probability / relationship matrix
∗ Correspondence and reprints
E-mail: yuefuliu@uoguelph.ca
1 INTRODUCTION
Quantifying the resemblance between relatives is a fundamental issue in quantitative genetics. It is needed for estimating genetic parameters, predicting breeding values, planning mating schemes, QTL mapping and marker assisted genetic evaluation. The study of the correlation between relatives can be traced back to the beginning of the last century [29, 36]. Kempthorne [22] summarized the work on this topic up to Malecot's study [27]. Fisher [12] first studied the two-locus epistatic deviations and their effects on the covariance between relatives such as parents and descendants, full sibs, uncles and cousins. Cockerham [6, 7] partitioned the two-locus epistatic variance into additive by additive, additive by dominance and dominance by dominance. Kempthorne [21, 22] applied the analysis of factorial experiments to partition the genetic variance and studied the covariance between relatives in random mating populations [21, 23], inbred populations [24] and a simple autotetraploid population [25]. Plum [31] formulated a recursive method for calculating the relationship and inbreeding coefficients. Cockerham [8] and Weir et al. [37] analyzed the influence of linkage on the covariance between relatives. The theory and computational algorithms for the correlation between relatives were well established in the early development of quantitative genetics.
The resemblance between relatives is attributed to gene transmission from the parents to the descendants, so that the relatives share identical genes by descent with certain probabilities. Since the gene transmission between generations is not observable, the transmission probability of an allele is generally taken to be 0.5. Actually, the transmission of an allele from a parent to offspring follows an all-or-none pattern. With information from molecular markers, it becomes possible to track the transmission of a linked gene more precisely than by using pedigree data alone.
There have been several studies on the conditional covariance between relatives. Fernando and Grossman [11] developed a method for calculating the gametic covariance conditional on a single linked marker, assuming completely informative markers. Van Arendonk et al. [3] designed a computing procedure for the gametic relationship matrix given a single linked marker, which is valid when the parental origin of the offspring's alleles is known. Goddard [16] derived the conditional gametic covariance due to allelic effects in terms of genetic effects without using the concept of identity probabilities, where parental origins of marker alleles and linkage phases among markers are assumed to be known. However, the parental origin of the offspring's alleles is often unknown in real data analysis. Wang et al. [35] extended Fernando and Grossman's [11] method to accommodate situations where the parental origin of marker alleles cannot be determined unequivocally. However, the method used to account for this biological uncertainty has been developed only for a single marker linked to a QTL. In QTL mapping for human populations, Fulker and Cardon [13, 14] used a regression approach to approximate the identity by descent (IBD) of QTL from the IBD of flanking markers. Their development is based on the method of Haseman and Elston [18], which considers the expected IBD of a locus as a linear function of the IBD of another linked locus. Kruglyak and Lander [26] developed a hidden Markov model to estimate the IBD states of a putative QTL using the probability distribution of the marker IBDs. This approach is more accurate than Fulker and Cardon's approximation [13, 14], but is more complicated to compute. Xu and Gessler [38] made a compromise between the two methods and proposed an approximate hidden Markov model to improve the computing speed at the expense of estimation accuracy. Almasy and Blangero [2] improved Fulker and Cardon's method [13, 14] in regard to the sib-pair approach of QTL mapping and developed a general framework of multipoint identity by descent. Pong-Wong et al. [32] combined the method of Haseman and Elston [18] for estimating identity by descent between sibs, often used in human genetics, and the method of Wang et al. [35] for general pedigrees to derive a simple method for calculating the gametic identity-by-descent matrix of QTLs. Meuwissen and Goddard [28] developed a method of predicting gametic identity probability from marker haplotypes by a simplified coalescence process, assuming that the number of generations since the base population and the effective population size are known. These studies on conditional identity measures of relatives have generally focused on the identity by descent due to allelic effects. The theory of conditional covariance due to non-additive effects has been little studied. Aside from the covariance due to allelic effects, the quantification of the conditional covariance components due to additive and non-additive effects is also frequently required to refine the statistical model for marker assisted analysis of quantitative traits.
This study aimed to develop a general theory for constructing the conditional covariance between relatives in the presence of additive, dominance and epistatic effects and to update the classical theory when both pedigree and marker data are available. The development relaxed the assumptions of previous studies and applied both single and flanking marker inferences with known or unknown parental origins of offspring's haplotypes.
[…] at the flanking locus $N^l$. The superscript $l$ will be dropped for simplicity whenever a single QTL is considered. These symbols are random variables. For example, when an individual $i$ has the genotype $A_1A_2$ at marker locus $m$, then $M_i^{m1} = A_1$ and $M_i^{m2} = A_2$. The symbol “≡” stands for the identity between alleles and the symbol “⇐” for the allelic transmission from a parent to a descendant.
[Figure: for each of $s$, $d$, $i$, $s'$, $d'$ and $j$, the two haplotypes $M^{lk}Q^{lk}N^{lk}$ ($k = 1, 2$) at the chromosomal regions $l = m$ and $l = n$.]
Figure 1. The marker and QTL genotypes for individuals $i$ and $j$, and their respective parents $s$, $d$ and $s'$, $d'$.
2.2 Genetic covariance components
If there are $q$ loci controlling a quantitative trait, the covariance between the genotypic values ($g$) of individuals $i$ and $j$ is given by the classical formula (1) [21, 22], under the assumption of no inbreeding and linkage equilibrium among loci. When there is only one locus ($q = 1$), formula (1) reduces to $\mathrm{Cov}(g_i, g_j) = r_{ij}\sigma^2_A + u_{ij}\sigma^2_D$. Traditionally, the coefficients $r_{ij}$ and $u_{ij}$ are assumed to be identical for the $q$ loci because the allelic transmission at each individual locus cannot be traced. Considering only two loci, say $m$ and $n$, the genetic covariance due to these two QTL can be written as formula (2).
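For reference, formulas (1) and (2) can be written out as follows. This is a hedged reconstruction from the classical result cited above [21, 22], not a verbatim copy of the published typesetting. The summation indices $a$ and $d$ and the symbol $\sigma^2_{A^aD^d}$ (the variance due to interactions involving additive effects at $a$ loci and dominance effects at $d$ loci) are notational assumptions; formula (2) is formula (1) with $q = 2$ and the variance components labelled by locus.

```latex
% (1) q loci with common coefficients r_ij and u_ij (assumed notation)
\[
\mathrm{Cov}(g_i, g_j) \;=\; \sum_{\substack{a,\,d \,\ge\, 0 \\ 1 \,\le\, a + d \,\le\, q}}
  r_{ij}^{\,a}\, u_{ij}^{\,d}\, \sigma^2_{A^a D^d}
\tag{1}
\]

% (2) two loci m and n, still with common r_ij and u_ij
\[
\begin{aligned}
\mathrm{Cov}(g_i, g_j) \;=\;& r_{ij}\,\sigma^2_{A_m} + r_{ij}\,\sigma^2_{A_n}
  + u_{ij}\,\sigma^2_{D_m} + u_{ij}\,\sigma^2_{D_n} \\
 &+ r_{ij}^{2}\,\sigma^2_{A_mA_n} + r_{ij}u_{ij}\,\sigma^2_{A_mD_n}
  + u_{ij}r_{ij}\,\sigma^2_{D_mA_n} + u_{ij}^{2}\,\sigma^2_{D_mD_n}
\end{aligned}
\tag{2}
\]
```

Setting $q = 1$ in (1) leaves only the terms with $(a, d) = (1, 0)$ and $(0, 1)$, which gives the single-locus expression $r_{ij}\sigma^2_A + u_{ij}\sigma^2_D$.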
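In formula (2), $\sigma^2_{A_m}$ and $\sigma^2_{A_n}$ are the additive variances and $\sigma^2_{D_m}$ and $\sigma^2_{D_n}$ are the dominance variances at the two loci. The epistatic variances for additive by additive, additive by dominance, dominance by additive and dominance by dominance between loci $m$ and $n$ are $\sigma^2_{A_mA_n}$, $\sigma^2_{A_mD_n}$, $\sigma^2_{D_mA_n}$ and $\sigma^2_{D_mD_n}$, respectively.
Information on the markers linked to QTL affecting a trait can be used to refine the covariance among relatives. Conditional on the marker information, the relationship coefficients can differ from locus to locus. Therefore, formula (2) needs to be rewritten as formula (3), in which $r^m_{ij}$, $r^n_{ij}$, $u^m_{ij}$ and $u^n_{ij}$ are the additive and dominance relationship coefficients between individuals $i$ and $j$ at loci $m$ and $n$. These coefficients depend on the conditional probability of QTL allelic identities between individuals $i$ and $j$.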
2.3 Conditional probability of QTL allelic identity by descent
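For every pair of individuals $i$ and $j$ in a population, there are four possible QTL allelic identities: $(Q_i^1 \equiv Q_j^1)$, $(Q_i^1 \equiv Q_j^2)$, $(Q_i^2 \equiv Q_j^1)$ and $(Q_i^2 \equiv Q_j^2)$. The probabilities of these identities can be inferred conditional on the marker information. Let matrix $\mathbf{P}_{ij}$ contain the probabilities of the four QTL allelic identities between individuals $i$ and $j$:
$$\mathbf{P}_{ij} = \begin{pmatrix} \Pr(Q_i^1 \equiv Q_j^1 \mid M) & \Pr(Q_i^1 \equiv Q_j^2 \mid M) \\ \Pr(Q_i^2 \equiv Q_j^1 \mid M) & \Pr(Q_i^2 \equiv Q_j^2 \mid M) \end{pmatrix}$$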
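As a hedged illustration of how a matrix of allelic identity probabilities feeds into the covariance, the sketch below computes locus-specific coefficients and combines them over two loci. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the dictionary keys are hypothetical, the additive coefficient is taken as half the sum of the four identity probabilities (twice the coancestry), and the dominance coefficient uses the classical product form, which assumes that $i$ and $j$ are not inbred and that allele 1 is paternal and allele 2 maternal in both individuals. The two-locus combination assumes that the identity events at loci $m$ and $n$ are independent.

```python
import numpy as np


def additive_coefficient(P):
    """Half the sum of the four allelic identity probabilities (assumed form)."""
    return 0.5 * float(np.sum(P))


def dominance_coefficient(P):
    """Probability that both allele pairs are identical.

    Assumption: alleles are ordered by parental origin and neither
    individual is inbred, so the product form applies.
    """
    return float(P[0, 0] * P[1, 1] + P[0, 1] * P[1, 0])


def two_locus_covariance(P_m, P_n, var):
    """Combine locus-specific coefficients with the variance components.

    `var` is a hypothetical dict keyed by component name.
    """
    r_m, u_m = additive_coefficient(P_m), dominance_coefficient(P_m)
    r_n, u_n = additive_coefficient(P_n), dominance_coefficient(P_n)
    return (
        r_m * var["A_m"] + r_n * var["A_n"]
        + u_m * var["D_m"] + u_n * var["D_n"]
        + r_m * r_n * var["A_mA_n"] + r_m * u_n * var["A_mD_n"]
        + u_m * r_n * var["D_mA_n"] + u_m * u_n * var["D_mD_n"]
    )


# Full sibs with pedigree information only, alleles ordered by parental origin.
P_full_sibs = np.array([[0.5, 0.0], [0.0, 0.5]])
unit_variances = {key: 1.0 for key in (
    "A_m", "A_n", "D_m", "D_n", "A_mA_n", "A_mD_n", "D_mA_n", "D_mD_n")}
cov_full_sibs = two_locus_covariance(P_full_sibs, P_full_sibs, unit_variances)
```

With pedigree information only, the full-sib example recovers the classical coefficients of 1/2 (additive) and 1/4 (dominance); marker information would replace the 0.5 entries with locus-specific probabilities.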
The additive and dominance relationship coefficients between individuals $i$ and $j$ […] where the $\mathbf{t}$'s are all $(2 \times 1)$ column vectors. Similarly, the QTL allelic transmission probabilities from parents $s'$ and $d'$ to descendant $j$ can be defined in matrix $\mathbf{T}_j$. The QTL allelic identity probabilities between individuals $i$ and $j$, i.e. $\mathbf{P}_{ij}$, can be calculated as: […]
Formula (7) corresponds to Falconer's [10] “basic rule” for calculating coancestry, whereas formula (6) relates to the “supplementary rule”. Computationally, formula (6) is more efficient than formula (7). Both (6) and (7) indicate that the QTL allelic identity probabilities in a population can be tabulated recursively from ancestors to descendants.
The same principle applies to the derivation of the QTL allelic identity probabilities of individual $i$ with itself. Letting $j = i$, $s' = s$ and $d' = d$ in formula (7), and replacing the marginal probabilities with conditional probabilities in $\mathbf{T}_j$ of formula (7) because the allelic transmission from a parent to the first allele of offspring $i$ is not independent of that to the second allele, the QTL allelic identity probabilities of individual $i$ with itself ($\mathbf{P}_{ii}$) can be derived from formula (7) and take the following form:
where $\mathbf{P}_{sd}$ and the $\mathbf{t}$'s are as defined above and $\mathbf{1} = (1\ 1)$. Matrix $\mathbf{P}_{ii}$ is always symmetric. When the parental origins of the two QTL alleles are known (e.g. $Q_i^1$ is from the father and $Q_i^2$ from the mother), formula (8) simplifies to […] the QTL identity probabilities of an individual $i$ with itself when the parental origins of the offspring's alleles are known. This explains why formula (8) of Van Arendonk et al. [3] works in the same way as the method of Wang et al. [35] when parental origins are known.
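The recursive tabulation described in this section can be sketched as follows. This is a minimal sketch under stated assumptions rather than the published formulas (6) and (7): the transmission matrix is laid out here as a $2 \times 4$ array whose rows are the two alleles of the descendant and whose columns are the parental alleles in the order $(s_1, s_2, d_1, d_2)$, and the function names are hypothetical. The case of an individual with itself, which needs conditional rather than marginal transmission probabilities, is not covered.

```python
import numpy as np


def identity_supplementary(T_i, P_sj, P_dj):
    """Supplementary-rule style: identities of i with j through i's parents.

    T_i  : (2, 4) transmission probabilities to the alleles of i (assumed layout)
    P_sj : (2, 2) identity probabilities between sire s and individual j
    P_dj : (2, 2) identity probabilities between dam d and individual j
    """
    return T_i @ np.vstack([P_sj, P_dj])


def identity_basic(T_i, P_parents, T_j):
    """Basic-rule style: identities of i with j through both sets of parents.

    P_parents : (4, 4) block matrix [[P_ss', P_sd'], [P_ds', P_dd']]
    """
    return T_i @ P_parents @ T_j.T


# Pedigree-only transmission with known parental origin:
# allele 1 comes from the sire, allele 2 from the dam.
T = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
I2, Z2 = np.eye(2), np.zeros((2, 2))

# Unrelated, non-inbred founders s and d; i and j are full sibs.
P_parents = np.block([[I2, Z2], [Z2, I2]])
P_ij_basic = identity_basic(T, P_parents, T)

# Same result built recursively: first j with each parent, then i with j.
P_js = identity_supplementary(T, I2, Z2)
P_jd = identity_supplementary(T, Z2, I2)
P_ij_supp = identity_supplementary(T, P_js.T, P_jd.T)
```

Both routes give the same full-sib matrix here; with marker information the entries of the transmission matrices would move away from 0.5 towards the all-or-none values.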
2.4 QTL allelic transmission probabilities
The parental origin of QTL alleles is usually unknown because the QTL allelic transmission is not directly observable. Therefore, the eight transmission probabilities of QTL alleles from parents $s$ and $d$ to descendant $i$ ($\mathbf{T}_i$) have to be assessed based on the marker alleles transmitted from parents $s$ and $d$ to the offspring $i$ and the genetic distances between QTL and markers. When two flanking markers are available, the transmission probability from QTL allele $k_p$ ($k_p = 1, 2$) of parent $p$ ($p = s, d$) to allele $k_i$ ($k_i = 1, 2$) of descendant $i$ can […] is the conditional probability given in the 5th column of Table I when $k_p = 1$ and in the 6th column when $k_p = 2$.
Matrix $\mathbf{T}_i$ can now be expressed in terms of the marker allelic transmission probabilities, $\mathbf{S}_i$, and the recombination rates between QTL and markers and between flanking markers: […] case of Wang et al. [35]. Formula (10) is identical to formula (5) of Wang et al. [35] if their B matrix is transposed.
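To make the role of the recombination rates concrete, the sketch below computes the probability that a parent transmitted its first QTL allele, given which of its alleles were transmitted at the two flanking markers. This is a hedged illustration and not a reproduction of Table I: it uses the standard results for a QTL bracketed by two markers with no interference, and the function names and the symbols `theta1` (marker M to QTL) and `theta2` (QTL to marker N) are assumptions introduced here.

```python
def marker_recombination(theta1, theta2):
    """Recombination rate between the flanking markers, assuming no interference."""
    return theta1 + theta2 - 2.0 * theta1 * theta2


def prob_qtl_allele1(marker_m, marker_n, theta1, theta2):
    """Pr(parent's QTL allele 1 was transmitted | transmitted marker alleles).

    marker_m, marker_n : 1 or 2, the parental haplotype of origin of the
                         transmitted alleles at markers M and N
    Assumes 0 < theta1, theta2 < 0.5, so no denominator is zero.
    """
    theta = marker_recombination(theta1, theta2)
    if (marker_m, marker_n) == (1, 1):   # non-recombinant, haplotype 1
        return (1.0 - theta1) * (1.0 - theta2) / (1.0 - theta)
    if (marker_m, marker_n) == (1, 2):   # recombinant, M from haplotype 1
        return (1.0 - theta1) * theta2 / theta
    if (marker_m, marker_n) == (2, 1):   # recombinant, N from haplotype 1
        return theta1 * (1.0 - theta2) / theta
    if (marker_m, marker_n) == (2, 2):   # non-recombinant, haplotype 2
        return theta1 * theta2 / (1.0 - theta)
    raise ValueError("marker_m and marker_n must each be 1 or 2")


# QTL midway between two markers, 10% recombination on each side.
p_nonrecombinant = prob_qtl_allele1(1, 1, 0.1, 0.1)
p_recombinant = prob_qtl_allele1(1, 2, 0.1, 0.1)
```

The probability of QTL allele 2 is one minus the returned value, so a pair of such values per parent would fill the corresponding entries of the transmission matrix once they are weighted by the marker transmission probabilities.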
2.5 Marker haplotype transmission probability
Although marker genotypes can be observed through genotyping techniques, the parental origin of a descendant's haplotype is often uncertain. For example, if a descendant and its parents all have genotype $A_1A_2$ at a single marker, there is no way to ascertain which parent each of the descendant's haplotypes comes from. Furthermore, when a parent is homozygous, it is impossible to determine which parental gamete a descendant's haplotype comes from. In this development, we trace all possible paths from parental gametes to a descendant's marker haplotype. Because the inference is always conditional on marker information, the notation for conditioning on marker information ($\mid M$) will be dropped hereafter for ease of presentation.
The assessment of the marker haplotype transmission involves three steps. First, the transmission probability of each path from parental gametes to a descendant's haplotype needs to be quantified. For this, we need to infer which parent a descendant's haplotype comes from (parental origin), and which parental gamete type the descendant's haplotype originates from given the parental origin (gametic frequency). The probability of each transmission path is the product of the probability of the parental origin and the gametic frequency given the parental origin, following the Law of Compound Probability [5]. There are four mutually exclusive paths for each descendant's haplotype in a single marker case and eight in a flanking marker case. Second, we need to determine the probability of each descendant's haplotype given the transmission path from a parental gamete to the descendant's haplotype. This can be done by comparing the descendant's haplotype with the parental gametic type. Third, our purpose is to determine the probability of each transmission path from parental gametes to a descendant's haplotype given that the descendant's haplotype is observed. This requires calculating the reverse probability of each path given the observed haplotype of the descendant using Bayes' theorem [5].
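The three steps above can be sketched for the single marker case as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the function name, the argument names and the default prior of 0.5 for a paternal origin are hypothetical, a single haplotype of the descendant is treated in isolation (the constraint that its two haplotypes come from different parents is ignored), and the likelihood of the observed allele given a path is taken as 1 when the parental allele matches and 0 otherwise.

```python
def path_posteriors(desc_allele, sire_genotype, dam_genotype, prior_paternal=0.5):
    """Posterior probability of each of the four transmission paths.

    desc_allele    : observed marker allele on the descendant's haplotype
    sire_genotype  : pair of marker alleles (gamete types 1 and 2) of the sire
    dam_genotype   : pair of marker alleles (gamete types 1 and 2) of the dam
    prior_paternal : assumed prior probability that the haplotype is paternal
    Returns a dict keyed by (parent, gamete type), e.g. ("s", 1).
    """
    joint = {}
    parents = (("s", sire_genotype, prior_paternal),
               ("d", dam_genotype, 1.0 - prior_paternal))
    for parent, genotype, origin_prob in parents:
        for gamete_type, allele in enumerate(genotype, start=1):
            # Step 1: path probability = parental origin x gametic frequency (0.5).
            path_prob = origin_prob * 0.5
            # Step 2: probability of the observed allele given the path.
            likelihood = 1.0 if allele == desc_allele else 0.0
            joint[(parent, gamete_type)] = path_prob * likelihood
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("observed allele is incompatible with both parents")
    # Step 3: Bayes' theorem, normalising over the four exclusive paths.
    return {path: value / total for path, value in joint.items()}


# Descendant allele A1 with both parents A1A2: parental origin stays ambiguous.
ambiguous = path_posteriors("A1", ("A1", "A2"), ("A1", "A2"))
# Homozygous sire A1A1 and dam A2A2: origin is known, gamete type is not.
homozygous = path_posteriors("A1", ("A1", "A1"), ("A2", "A2"))
```

In the first example the posterior mass is split between the first gamete type of the sire and that of the dam; in the second it is split between the two gamete types of the homozygous sire, which matches the two sources of uncertainty described in the text.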
Consider the single marker case first. A marker haplotype $M_i^{k_i}$ […]. There are two possible parental origins for $M_i^{k_i}$. It may be paternal, i.e. […] sum to one as expected. In the single marker case, the frequencies of parental gametes given parental origins are all 0.5.
For each realization of $M_i^{k_i}$ […]
In the case of flanking markers, there are eight mutually exclusive marker transmission paths for each haplotype $M_i^{k_i}$ […] $M$ and $N$, respectively. The paternal and maternal gametic frequencies given parental origins are $(1 - \theta)/2$, $\theta/2$, $\theta/2$ and $(1 - \theta)/2$. In a similar way, the probabilities of parental origins of $M_i^{k_i}$ […] not be inferred, it is assumed that both $\Pr(M_i^{k_i}$ […]