Probability distributions and random number generation

Excerpt from: CRC Using R and RStudio for Data Management, Statistical Analysis, and Graphics, 2nd edition (pages 58-61)

Quantiles and cumulative distribution values can be calculated easily within R. Random variables are commonly needed for simulation and analysis. These can be generated for a large number of distributions.

A seed can be specified for the random number generator. This is important to allow replication of results (e.g., while testing and debugging). Information about random number seeds can be found in 3.1.3.

Table 3.1 summarizes support for quantiles, cumulative distribution functions, and random numbers. More information on probability distributions can be found in the CRAN probability distributions task view (http://cran.r-project.org/web/views/Distributions.html).

3.1.1 Probability density function

Example: 3.4.1 Here we use the normal distribution as an example; others are shown in Table 3.1 (p. 34).

y = pnorm(1.96, mean=0, sd=1)

Note: This calculates the probability that the random variable is less than the first argument. The xpnorm() function within the mosaic package provides a graphical display.
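As a minimal sketch of how this cumulative probability can be used (the variable names below are illustrative and not part of the original text), the following computes lower-tail, upper-tail, and interval probabilities for a standard normal, and contrasts them with the density returned by dnorm():

```r
# lower-tail probability P(Z <= 1.96) for a standard normal
p_lower = pnorm(1.96, mean=0, sd=1)

# upper-tail probability P(Z > 1.96), computed two equivalent ways
p_upper = 1 - p_lower
p_upper2 = pnorm(1.96, lower.tail=FALSE)

# probability of falling between -1.96 and 1.96
p_within = pnorm(1.96) - pnorm(-1.96)

# for contrast: the density (height of the curve) at 1.96
d_at = dnorm(1.96)
```

The lower.tail=FALSE form is generally preferable to 1 - pnorm() far out in the tail, where subtraction from 1 loses precision.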

3.1.2 Quantiles of a probability density function

Example: 4.2 Similar syntax is used for a variety of distributions. Here we use the normal distribution as an example; others are shown in Table 3.1 (p. 34).

y = qnorm(.975, mean=0, sd=1)
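The quantile function is the inverse of the cumulative distribution function. A short sketch of that relationship, with illustrative variable names that are not from the original text:

```r
# 97.5th percentile of the standard normal
q = qnorm(.975, mean=0, sd=1)

# applying pnorm() to the quantile recovers the probability
p = pnorm(q, mean=0, sd=1)

# qnorm() is vectorized, so several quantiles can be requested at once
qs = qnorm(c(.025, .5, .975))
```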

Table 3.1: Quantiles, probabilities, and pseudo-random number generation: available distributions.

Distribution         R DISTNAME
Beta                 beta
Beta-binomial        betabinom∗
Beta-normal          betanorm∗
binomial             binom
Cauchy               cauchy
chi-square           chisq
exponential          exp
F                    f
gamma                gamma
geometric            geom
hypergeometric       hyper
inverse normal       inv.gaussian∗
Laplace              alap∗
logistic             logis
lognormal            lnorm
negative binomial    nbinom
normal               norm
Poisson              pois
Student's t          t
Uniform              unif
Weibull              weibull

Note: Prepend d to DISTNAME to compute the density function, dDISTNAME(xvalue, parm1, ..., parmn); p for the cumulative distribution function, pDISTNAME(xvalue, parm1, ..., parmn); q for the quantile function, qDISTNAME(prob, parm1, ..., parmn); and r to generate random variables, rDISTNAME(nrand, parm1, ..., parmn), where in the last case the result is a vector of nrand values.

∗The betabinom(), betanorm(), inv.gaussian(), and alap() (Laplace) families of distributions are available using the VGAM package.
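To illustrate the d/p/q/r naming convention described in the note to Table 3.1, here is a small sketch using the binomial distribution (DISTNAME binom). The parameter values and variable names are arbitrary choices for the example:

```r
# binomial distribution with 10 trials and success probability 0.3
d = dbinom(3, size=10, prob=0.3)     # density: P(X = 3)
p = pbinom(3, size=10, prob=0.3)     # cumulative probability: P(X <= 3)
q = qbinom(0.5, size=10, prob=0.3)   # quantile: the median

set.seed(1)                          # for a replicable result
r = rbinom(5, size=10, prob=0.3)     # vector of five random draws
```

The same four prefixes apply to every DISTNAME in the table; only the distribution-specific parameters change.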

3.1.3 Setting the random number seed

Example: 12.1.3 The default random number seed is based on the system clock. To generate a replicable series of variates, first run set.seed(seedval) where seedval is a single integer for the default Mersenne–Twister random number generator.

set.seed(42)

set.seed(Sys.time())

Note: More information can be found using help(.Random.seed).
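A brief sketch of why the seed matters for replication: resetting the same seed reproduces the same sequence of variates, while continuing to draw without resetting does not. The seed value and variable names are arbitrary.

```r
set.seed(42)
a = runif(3)        # first three draws after seeding

set.seed(42)        # reset to the same seed
b = runif(3)        # reproduces the same three draws

identical(a, b)     # TRUE: same seed, same sequence

more = runif(3)     # continues the stream, so these differ from a
```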

3.1.4 Uniform random variables

Example: 10.1.1

x = runif(n, min=0, max=1)

Note: The arguments specify the number of variables to be created and the range over which they are distributed.
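A minimal sketch of generating uniform variates over a range other than (0, 1); the sample size, seed, and endpoints are assumptions chosen for the example:

```r
set.seed(123)
n = 1000

# n uniform variates on the interval (2, 5)
x = runif(n, min=2, max=5)

# equivalent construction: shift and scale standard uniforms
y = 2 + (5 - 2) * runif(n)

range(x)    # all values lie within (2, 5)
mean(x)     # should be close to the midpoint (2 + 5)/2
```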


3.1.5 Multinomial random variables

library(Hmisc)

x = rMultinom(matrix(c(p1, p2, ..., pr), 1, r), n)

Note: The function rMultinom() from the Hmisc package allows the specification of the desired multinomial probabilities ($\sum_r p_r = 1$) as a 1 × r matrix. The final parameter is the number of variates to be generated (see also rmultinom() in the stats package).
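For comparison, a sketch using only base R. Here rmultinom() from the stats package returns counts per category for each draw, and sample() is used to produce individual category labels. The probabilities, sizes, and variable names are assumptions for the example.

```r
set.seed(10)
probs = c(0.2, 0.3, 0.5)     # category probabilities, summing to 1

# five multinomial draws of size 100; each column holds the category counts
counts = rmultinom(5, size=100, prob=probs)
colSums(counts)              # every column sums to the draw size

# 1000 individual category labels drawn with the same probabilities
labels = sample(1:3, size=1000, replace=TRUE, prob=probs)
table(labels) / 1000         # observed proportions, close to probs
```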

3.1.6 Normal random variables

Example: 3.4.1

x1 = rnorm(n)

x2 = rnorm(n, mean=mu, sd=sigma)

Note: The arguments specify the number of variables to be created and (optionally) the mean and standard deviation (default μ = 0 and σ = 1).
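A short sketch with concrete values (the seed, sample size, mean, and standard deviation are arbitrary assumptions), which also shows that shifting and scaling standard normal variates is an equivalent way to obtain a given mean and standard deviation:

```r
set.seed(2024)
n = 10000
mu = 50
sigma = 10

x1 = rnorm(n)                       # standard normal variates
x2 = rnorm(n, mean=mu, sd=sigma)    # normal with mean 50 and sd 10

# equivalent location-scale construction from the standard normals
x3 = mu + sigma * x1

c(mean(x2), sd(x2))                 # should be close to mu and sigma
```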

3.1.7 Multivariate normal random variables

For the following, we first create a 3×3 covariance matrix. Then we generate 1000 realizations of a multivariate normal vector with the appropriate correlation or covariance.

library(MASS)
mu = rep(0, 3)
Sigma = matrix(c(3, 1, 2,
                 1, 4, 0,
                 2, 0, 5), nrow=3)
xvals = mvrnorm(1000, mu, Sigma)
apply(xvals, 2, mean)

or

# a function to generate random multivariate Gaussians
rmultnorm = function(n, mu, vmat, tol=1e-07)
{
  p = ncol(vmat)
  if (length(mu)!=p)
    stop("mu vector is the wrong length")
  if (max(abs(vmat - t(vmat))) > tol)
    stop("vmat not symmetric")
  vs = svd(vmat)
  vsqrt = t(vs$v %*% (t(vs$u) * sqrt(vs$d)))
  ans = matrix(rnorm(n * p), nrow=n) %*% vsqrt
  ans = sweep(ans, 2, mu, "+")
  dimnames(ans) = list(NULL, dimnames(vmat)[[2]])
  return(ans)
}
xvals = rmultnorm(1000, mu, Sigma)
apply(xvals, 2, mean)

Note: The returned object xvals, of dimension 1000×3, is generated from the variance-covariance matrix denoted by Sigma, which has first row and column (3,1,2). An arbitrary mean vector can be specified using the c() function.

Several techniques are illustrated in the definition of the rmultnorm function. The first lines test for the appropriate arguments and return an error if the conditions are not satisfied. The singular value decomposition (see 3.3.15) is carried out on the variance-covariance matrix, and the sweep function is used to transform the univariate normal random variables generated by rnorm to the desired mean and covariance. The dimnames() function applies the existing names (if any) for the variables in vmat, and the result is returned.
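As an alternative sketch (not the method used in the text, which relies on the singular value decomposition), the same kind of transformation can be carried out with a Cholesky factor, which exists whenever the covariance matrix is positive definite. The sample size, seed, and variable names here are assumptions.

```r
set.seed(7)
mu = rep(0, 3)
Sigma = matrix(c(3, 1, 2,
                 1, 4, 0,
                 2, 0, 5), nrow=3)
n = 10000

# upper-triangular Cholesky factor, so that t(R) %*% R equals Sigma
R = chol(Sigma)

# independent standard normals, one row per realization
z = matrix(rnorm(n * 3), nrow=n)

# impose the covariance, then add the mean to each column
xchol = sweep(z %*% R, 2, mu, "+")

round(cov(xchol), 1)    # sample covariance, close to Sigma
```

Comparing the sample covariance with Sigma is a quick check that the transformation was applied in the right orientation.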

3.1.8 Truncated multivariate normal random variables

See also 4.1.1.

library(tmvtnorm)

x = rtmvnorm(n, mean, Sigma, lower, upper)

Note: The arguments specify the number of variables to be created, the mean, the covariance matrix, and vectors of the lower and upper truncation values.
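To make the idea of truncation concrete, here is a simple rejection-sampling sketch in base R. It is not the algorithm used by the tmvtnorm package; rtrunc_reject is a hypothetical helper written for this illustration, and it is only practical when the truncation region has reasonably high probability, since otherwise the loop runs for a long time.

```r
# hypothetical helper: draw untruncated multivariate normals and keep
# only the rows that fall inside the box [lower, upper]
rtrunc_reject = function(n, mu, Sigma, lower, upper) {
  p = length(mu)
  R = chol(Sigma)                       # t(R) %*% R equals Sigma
  out = matrix(numeric(0), ncol=p)
  while (nrow(out) < n) {
    cand = sweep(matrix(rnorm(n * p), nrow=n) %*% R, 2, mu, "+")
    inside = apply(cand, 1, function(row) all(row >= lower & row <= upper))
    out = rbind(out, cand[inside, , drop=FALSE])
  }
  out[1:n, , drop=FALSE]                # return exactly n accepted rows
}

set.seed(99)
mu = c(0, 0)
Sigma = matrix(c(1, 0.5,
                 0.5, 1), nrow=2)
x = rtrunc_reject(500, mu, Sigma, lower=c(-1, -1), upper=c(1, 1))
apply(x, 2, range)                      # every value lies within the bounds
```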

3.1.9 Exponential random variables

x = rexp(n, rate=lambda)

Note: The arguments specify the number of variables to be created and (optionally) the inverse of the mean (default λ = 1).
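A short sketch confirming that the rate is the inverse of the mean; the rate, seed, and sample size are arbitrary assumptions:

```r
set.seed(5)
lambda = 2

# 10000 exponential variates with rate 2
x = rexp(10000, rate=lambda)

mean(x)                   # should be close to 1/lambda

# empirical and theoretical probability of a value at most 1
mean(x <= 1)
pexp(1, rate=lambda)
```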

3.1.10 Other random variables

Example: 3.4.1 The list of probability distributions supported within R can be found in Table 3.1, page 34. In addition to these distributions, the inverse probability integral transform can be used to generate arbitrary random variables with an invertible cumulative distribution function $F$ (exploiting the fact that $F(X) \sim U(0,1)$, so that $F^{-1}(U)$ has distribution $F$). As an example, consider the generation of random variates from an exponential distribution with rate parameter $\lambda$, where $F(X) = 1 - \exp(-\lambda X) = U$. Solving for $X$ yields $X = -\log(1 - U)/\lambda$. If we generate a Uniform(0,1) variable, we can use this relationship to generate an exponential with the desired rate parameter.

lambda = 2

expvar = -log(1-runif(1))/lambda
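The same transform can be applied to a whole vector of uniforms at once. The sketch below (seed, sample size, and variable names are assumptions) also compares the result with R's built-in exponential quantile function, qexp(), which computes the same inverse:

```r
set.seed(11)
lambda = 2
n = 10000

u = runif(n)                        # Uniform(0,1) variates
expvars = -log(1 - u) / lambda      # inverse transform, vectorized

mean(expvars)                       # should be close to 1/lambda

# largest discrepancy from the built-in quantile function applied to u
max(abs(expvars - qexp(u, rate=lambda)))
```

In practice rexp() is the simpler choice for the exponential; the transform is most useful for distributions without a built-in generator.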
