3.1 Probability distributions and random number generation

Quantiles and cumulative distribution values can be calculated easily within R. Random variables are commonly needed for simulation and analysis. These can be generated for a large number of distributions.

A seed can be specified for the random number generator. This is important to allow replication of results (e.g., while testing and debugging). Information about random number seeds can be found in 3.1.3.

Table 3.1 summarizes support for quantiles, cumulative distribution functions, and random numbers. More information on probability distributions can be found in the CRAN probability distributions task view (http://cran.r-project.org/web/views/Distributions.html).

3.1.1 Probability density function

Example: 3.4.1 Here we use the normal distribution as an example; others are shown in Table 3.1 (p. 34).

y = pnorm(1.96, mean=0, sd=1)

Note: This calculates the probability that the random variable is less than the first argument. The xpnorm() function within the mosaic package provides a graphical display.
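As a brief illustration of the note above, the following sketch contrasts the cumulative probability returned by pnorm() with the upper-tail probability and with the density returned by dnorm(). The variable names (p_lower, p_upper, dens) are chosen here for illustration only.

```r
# Cumulative probability P(Z <= 1.96) for a standard normal
p_lower = pnorm(1.96, mean=0, sd=1)

# Upper-tail probability P(Z > 1.96), requested via lower.tail=FALSE
p_upper = pnorm(1.96, mean=0, sd=1, lower.tail=FALSE)

# Density (height of the normal curve) at the same point
dens = dnorm(1.96, mean=0, sd=1)

print(c(lower=p_lower, upper=p_upper, density=dens))
```

The lower and upper tail probabilities sum to one, while the density is a height of the curve rather than a probability.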

3.1.2 Quantiles of a probability density function

Example: 4.2 Similar syntax is used for a variety of distributions. Here we use the normal distribution as an example; others are shown in Table 3.1 (p. 34).

y = qnorm(.975, mean=0, sd=1)
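The quantile function is the inverse of the cumulative distribution function. A minimal sketch of that relationship follows; the values mean=10 and sd=2 are arbitrary choices made for this example.

```r
# Quantile of the standard normal at probability 0.975 (approximately 1.96)
q = qnorm(0.975, mean=0, sd=1)

# Applying pnorm() to that quantile recovers the original probability
p = pnorm(q, mean=0, sd=1)

# For a normal with mean 10 and sd 2, the quantile is shifted and scaled
q_shift = qnorm(0.975, mean=10, sd=2)

print(c(q=q, p=p, q_shift=q_shift))
```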

Table 3.1: Quantiles, probabilities, and pseudo-random number generation: available distributions.

Distribution         R DISTNAME
Beta                 beta
Beta-binomial        betabinom∗
Beta-normal          betanorm∗
binomial             binom
Cauchy               cauchy
chi-square           chisq
exponential          exp
F                    f
gamma                gamma
geometric            geom
hypergeometric       hyper
inverse normal       inv.gaussian∗
Laplace              alap∗
logistic             logis
lognormal            lnorm
negative binomial    nbinom
normal               norm
Poisson              pois
Student's t          t
Uniform              unif
Weibull              weibull

Note: Prepend d to the command to compute density functions of a distribution, dDISTNAME(xvalue, parm1, ..., parmn); p for the cumulative distribution function, pDISTNAME(xvalue, parm1, ..., parmn); q for the quantile function, qDISTNAME(prob, parm1, ..., parmn); and r to generate random variables, rDISTNAME(nrand, parm1, ..., parmn), where in the last case a vector of nrand values is the result.

∗The betabinom(), betanorm(), inv.gaussian(), and alap() (Laplace) families of distributions are available using the VGAM package.
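The d/p/q/r naming convention can be sketched with the exponential distribution (DISTNAME exp). The rate of 2 and the evaluation point 0.5 are arbitrary values assumed for this illustration.

```r
set.seed(1)   # fixed seed so the random draws are replicable
rate = 2      # illustrative rate parameter (an assumption of this example)

d = dexp(0.5, rate)   # density at x = 0.5, equal to 2 * exp(-1)
p = pexp(0.5, rate)   # cumulative probability P(X <= 0.5), equal to 1 - exp(-1)
q = qexp(p, rate)     # quantile at probability p, which recovers 0.5
r = rexp(5, rate)     # vector of 5 pseudo-random exponential variates

print(c(d=d, p=p, q=q))
print(r)
```

The same four prefixes apply to every DISTNAME in the table, with the parameters changing to suit each distribution.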

3.1.3 Setting the random number seed

Example: 12.1.3 The default random number seed is based on the system clock. To generate a replicable series of variates, first run set.seed(seedval) where seedval is a single integer for the default Mersenne–Twister random number generator.

set.seed(42)

set.seed(Sys.time())

Note: More information can be found using help(.Random.seed).
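A short sketch of why setting the seed matters for replication: resetting the same seed reproduces the same stream of variates, while continuing without a reset gives new values. The object names first, second, and third are illustrative.

```r
set.seed(42)
first = runif(3)    # three uniform variates from the seeded stream

set.seed(42)
second = runif(3)   # same seed, so the same three values

third = runif(3)    # no reset: the stream continues with new values

print(identical(first, second))
print(identical(first, third))
```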

3.1.4 Uniform random variables

Example: 10.1.1

x = runif(n, min=0, max=1)

Note: The arguments specify the number of variables to be created and the range over which they are distributed.
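For concreteness, the sketch below generates uniform variates on the default interval and on a different range. The sample size of 1000 and the interval from 5 to 10 are assumptions made for this example.

```r
set.seed(123)
n = 1000

x = runif(n, min=0, max=1)    # uniform on (0, 1)
y = runif(n, min=5, max=10)   # uniform on (5, 10)

print(range(x))
print(range(y))
print(mean(y))   # should be near the midpoint, 7.5
```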


3.1.5 Multinomial random variables

library(Hmisc)

x = rMultinom(matrix(c(p1, p2, ..., pr), 1, r), n)

Note: The function rMultinom() from the Hmisc package allows the specification of the desired multinomial probabilities ($\sum_r p_r = 1$) as a 1 × r matrix. The final parameter is the number of variates to be generated (see also rmultinom() in the stats package).
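As a sketch using only base R, rmultinom() from the stats package returns counts per category, and sample() can draw individual category labels. The probability vector and sample sizes here are illustrative assumptions, and this is not a substitute for the Hmisc call shown above.

```r
set.seed(7)
probs = c(0.2, 0.3, 0.5)   # illustrative probabilities, summing to 1

# 4 multinomial vectors, each distributing 10 trials over 3 categories;
# the result is a 3 x 4 matrix with one column per vector
counts = rmultinom(4, size=10, prob=probs)
print(counts)
print(colSums(counts))     # each column sums to size

# 100 individual category labels (1, 2, or 3) drawn with these probabilities
draws = sample(1:3, size=100, replace=TRUE, prob=probs)
print(table(draws))
```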

3.1.6 Normal random variables

Example: 3.4.1

x1 = rnorm(n)

x2 = rnorm(n, mean=mu, sd=sigma)

Note: The arguments specify the number of variables to be created and (optionally) the mean and standard deviation (default μ = 0 and σ = 1).
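The following sketch checks the effect of the mean and sd arguments on a large sample, and shows that shifting and scaling standard normal variates gives the same kind of result. The values n = 10000, mu = 5, and sigma = 2 are assumptions chosen for illustration.

```r
set.seed(2024)
n = 10000
mu = 5
sigma = 2

x1 = rnorm(n)                      # standard normal variates
x2 = rnorm(n, mean=mu, sd=sigma)   # normal with mean mu and sd sigma
x3 = mu + sigma * x1               # shifting and scaling x1 by hand

print(c(mean(x2), sd(x2)))   # should be close to mu and sigma
print(c(mean(x3), sd(x3)))
```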

3.1.7 Multivariate normal random variables

For the following, we first create a 3×3 covariance matrix. Then we generate 1000 realizations of a multivariate normal vector with the appropriate correlation or covariance.

library(MASS)
mu = rep(0, 3)
Sigma = matrix(c(3, 1, 2,
                 1, 4, 0,
                 2, 0, 5), nrow=3)
xvals = mvrnorm(1000, mu, Sigma)
apply(xvals, 2, mean)

rmultnorm = function(n, mu, vmat, tol=1e-07) {
   # a function to generate random multivariate Gaussians
   p = ncol(vmat)
   if (length(mu)!=p)
      stop("mu vector is the wrong length")
   if (max(abs(vmat - t(vmat))) > tol)
      stop("vmat not symmetric")
   vs = svd(vmat)
   vsqrt = t(vs$v %*% (t(vs$u) * sqrt(vs$d)))
   ans = matrix(rnorm(n * p), nrow=n) %*% vsqrt
   ans = sweep(ans, 2, mu, "+")
   dimnames(ans) = list(NULL, dimnames(vmat)[[2]])
   return(ans)
}

xvals = rmultnorm(1000, mu, Sigma)
apply(xvals, 2, mean)

Note: The returned object xvals, of dimension 1000×3, is generated from the variance–covariance matrix denoted by Sigma, which has first row and column (3,1,2). An arbitrary mean vector can be specified using the c() function.

Several techniques are illustrated in the definition of the rmultnorm function. The first lines test for the appropriate arguments and return an error if the conditions are not satisfied.

The singular value decomposition (see 3.3.15) is carried out on the variance–covariance matrix to obtain a matrix square root. Multiplying the univariate normal random variables generated by rnorm by this square root gives them the desired covariance, and the sweep function then adds the desired mean to each column. The dimnames() function applies the existing names (if any) for the variables in vmat, and the result is returned.
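The role of the singular value decomposition can be checked numerically. The sketch below rebuilds the matrix square root, confirms that it reproduces the covariance matrix, and compares the sample covariance of simulated values with the target. The mean vector c(1, 2, 3) and the sample size of 20000 are assumptions made for this illustration.

```r
set.seed(99)
Sigma = matrix(c(3, 1, 2,
                 1, 4, 0,
                 2, 0, 5), nrow=3)
mu = c(1, 2, 3)   # illustrative mean vector
n = 20000

# Matrix square root from the SVD, computed as in the function above
vs = svd(Sigma)
vsqrt = t(vs$v %*% (t(vs$u) * sqrt(vs$d)))

# t(vsqrt) %*% vsqrt should reproduce Sigma up to rounding error
recon = t(vsqrt) %*% vsqrt
print(max(abs(recon - Sigma)))

# Independent standard normals, transformed to covariance Sigma and mean mu
z = matrix(rnorm(n * 3), nrow=n)
x = sweep(z %*% vsqrt, 2, mu, "+")

print(round(cov(x), 2))        # close to Sigma
print(round(colMeans(x), 2))   # close to mu
```

Because the rows of z have identity covariance, the rows of z %*% vsqrt have covariance t(vsqrt) %*% vsqrt, which is why the reconstruction check matters.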

3.1.8 Truncated multivariate normal random variables
