Quantiles and cumulative distribution values can be calculated easily within R. Random variables are commonly needed for simulation and analysis. These can be generated for a large number of distributions.
A seed can be specified for the random number generator. This is important to allow replication of results (e.g., while testing and debugging). Information about random number seeds can be found in 3.1.3.
Table 3.1 summarizes support for quantiles, cumulative distribution functions, and random numbers. More information on probability distributions can be found in the CRAN probability distributions task view (http://cran.r-project.org/web/views/Distributions.html).
3.1.1 Cumulative distribution function
Example: 3.4.1 Here we use the normal distribution as an example; others are shown in Table 3.1 (p. 34).
y = pnorm(1.96, mean=0, sd=1)
Note: This calculates the probability that the random variable is less than or equal to the first argument. The xpnorm() function within the mosaic package provides a graphical display.
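A short sketch of closely related normal-distribution calls: dnorm() returns the density itself, and the lower.tail argument of pnorm() returns upper-tail probabilities. The variable names are illustrative only.

```r
# Density (height of the normal curve) at x = 1.96
d = dnorm(1.96, mean=0, sd=1)

# Cumulative probability P(X <= 1.96)
p_lower = pnorm(1.96, mean=0, sd=1)

# Upper-tail probability P(X > 1.96); more accurate than 1 - p_lower far in the tail
p_upper = pnorm(1.96, mean=0, sd=1, lower.tail=FALSE)

# Probability of falling between -1.96 and 1.96
p_between = pnorm(1.96) - pnorm(-1.96)

round(c(d, p_lower, p_upper, p_between), 4)
```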
3.1.2 Quantiles of a probability density function
Example: 4.2 Similar syntax is used for a variety of distributions. Here we use the normal distribution as an example; others are shown in Table 3.1 (p. 34).
y = qnorm(.975, mean=0, sd=1)
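The quantile function is the inverse of the cumulative distribution function, so pnorm() applied to the result of qnorm() recovers the original probability. A minimal sketch, with the same q prefix applied to two other distributions from Table 3.1 (the degrees of freedom are arbitrary choices for illustration):

```r
# qnorm() inverts pnorm(): the 0.975 quantile maps back to probability 0.975
z = qnorm(0.975, mean=0, sd=1)
pnorm(z)

# Quantiles of other distributions follow the same pattern
t10 = qt(0.975, df=10)      # Student's t with 10 degrees of freedom
chi2 = qchisq(0.95, df=1)   # chi-square with 1 degree of freedom

# The probability argument is vectorized
qnorm(c(0.025, 0.5, 0.975))
```

Since a chi-square variable with 1 degree of freedom is the square of a standard normal, chi2 equals z squared.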
Table 3.1: Quantiles, probabilities, and pseudo-random number generation: available distributions.

Distribution         R DISTNAME
Beta                 beta
Beta-binomial        betabinom∗
Beta-normal          betanorm∗
binomial             binom
Cauchy               cauchy
chi-square           chisq
exponential          exp
F                    f
gamma                gamma
geometric            geom
hypergeometric       hyper
inverse normal       inv.gaussian∗
Laplace              alap∗
logistic             logis
lognormal            lnorm
negative binomial    nbinom
normal               norm
Poisson              pois
Student's t          t
Uniform              unif
Weibull              weibull
Note: Prepend d to DISTNAME to compute the density (or probability mass) function of a distribution, dDISTNAME(xvalue, parm1, ..., parmn); p for the cumulative distribution function, pDISTNAME(xvalue, parm1, ..., parmn); q for the quantile function, qDISTNAME(prob, parm1, ..., parmn); and r to generate random variables, rDISTNAME(nrand, parm1, ..., parmn), where in the last case a vector of nrand values is the result.
∗The betabinom(), betanorm(), inv.gaussian(), and alap() (Laplace) families of distributions are available using the VGAM package.
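As an illustration of the d/p/q/r naming convention, here are all four functions for one distribution from the table, the Poisson (DISTNAME pois, single parameter lambda). The value lambda = 3 is an arbitrary choice for this sketch.

```r
lambda = 3
dpois(2, lambda=lambda)     # P(X = 2)
ppois(2, lambda=lambda)     # P(X <= 2)
qpois(0.4, lambda=lambda)   # smallest x with P(X <= x) >= 0.4
set.seed(10)
x = rpois(5, lambda=lambda) # five random Poisson counts
x
```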
3.1.3 Setting the random number seed
Example: 12.1.3 The default random number seed is based on the system clock. To generate a replicable series of variates, first run set.seed(seedval) where seedval is a single integer for the default Mersenne–Twister random number generator.
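set.seed(42)          # replicable: the same seed gives the same variates
set.seed(Sys.time())  # reseed from the system clock (not replicable)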
Note: More information can be found using help(.Random.seed).
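A quick check of replicability: resetting the same seed before each call reproduces the identical stream of variates. The seed value 42 is arbitrary.

```r
set.seed(42)
a = runif(3)
set.seed(42)
b = runif(3)      # same seed, so the same three values
identical(a, b)
RNGkind()         # reports the generators currently in use
```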
3.1.4 Uniform random variables
Example: 10.1.1
x = runif(n, min=0, max=1)
Note: The arguments specify the number of variables to be created and the range over which they are distributed.
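A minimal sketch with a range other than the default (0, 1); the interval (-1, 3) and n = 1000 are illustrative choices. The sample mean should be near the midpoint of the interval.

```r
set.seed(1)
n = 1000
x = runif(n, min=-1, max=3)   # uniform on the interval (-1, 3)
range(x)
mean(x)                       # should be near (-1 + 3)/2 = 1
```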
3.1. PROBABILITY DISTRIBUTIONS AND RANDOM NUMBER GENERATION 35
3.1.5 Multinomial random variables
library(Hmisc)
x = rMultinom(matrix(c(p1, p2, ..., pr), 1, r), n)
Note: The function rMultinom() from the Hmisc package allows the specification of the desired multinomial probabilities (with p1 + p2 + ... + pr = 1) as a 1 × r matrix. The final parameter is the number of variates to be generated (see also rmultinom() in the stats package).
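Without Hmisc, base R offers two related options: sample() with a prob argument draws individual category labels, while rmultinom() from stats returns counts, with one column per multinomial vector. The probabilities in p are illustrative.

```r
set.seed(7)
p = c(0.2, 0.5, 0.3)

# 100 draws of a single categorical outcome, coded 1, 2, or 3
xcat = sample(1:3, size=100, replace=TRUE, prob=p)
table(xcat)

# 5 multinomial count vectors, each from 100 trials (one column per vector)
counts = rmultinom(5, size=100, prob=p)
counts
```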
3.1.6 Normal random variables
Example: 3.4.1
x1 = rnorm(n)
x2 = rnorm(n, mean=mu, sd=sigma)
Note: The arguments specify the number of variables to be created and (optionally) the mean and standard deviation (default μ = 0 and σ = 1).
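A concrete version of the template above, assuming illustrative values mu = 100 and sigma = 15. Shifting and scaling standard normals gives the same distribution, which is why the default μ = 0 and σ = 1 loses no generality.

```r
set.seed(5)
n = 10000
mu = 100
sigma = 15
x2 = rnorm(n, mean=mu, sd=sigma)
c(mean(x2), sd(x2))          # should be close to 100 and 15

# Equivalent construction: shift and scale standard normal variates
x3 = mu + sigma * rnorm(n)
```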
3.1.7 Multivariate normal random variables
For the following, we first create a 3×3 covariance matrix. Then we generate 1000 realizations of a multivariate normal vector with the appropriate correlation or covariance.
library(MASS)
mu = rep(0, 3)
Sigma = matrix(c(3, 1, 2,
                 1, 4, 0,
                 2, 0, 5), nrow=3)
xvals = mvrnorm(1000, mu, Sigma)
apply(xvals, 2, mean)
or
# a function to generate random multivariate Gaussians
rmultnorm = function(n, mu, vmat, tol=1e-07) {
  p = ncol(vmat)
  # check the arguments
  if (length(mu) != p)
    stop("mu vector is the wrong length")
  if (max(abs(vmat - t(vmat))) > tol)
    stop("vmat not symmetric")
  # matrix square root of vmat via the singular value decomposition
  vs = svd(vmat)
  vsqrt = t(vs$v %*% (t(vs$u) * sqrt(vs$d)))
  # transform standard normals to the desired covariance, then add the mean
  ans = matrix(rnorm(n * p), nrow=n) %*% vsqrt
  ans = sweep(ans, 2, mu, "+")
  dimnames(ans) = list(NULL, dimnames(vmat)[[2]])
  return(ans)
}
xvals = rmultnorm(1000, mu, Sigma)
apply(xvals, 2, mean)
Note: The returned object xvals, of dimension 1000×3, is generated from the variance–covariance matrix denoted by Sigma, which has first row and column (3,1,2). An arbitrary mean vector can be specified using the c() function.
Several techniques are illustrated in the definition of the rmultnorm() function. The first lines test for the appropriate arguments and return an error if the conditions are not satisfied.
The singular value decomposition (see 3.3.15) of the variance–covariance matrix is used to construct its matrix square root, vsqrt. Multiplying the matrix of standard normal random variables generated by rnorm() by vsqrt yields the desired covariance, and the sweep() function then adds the mean vector to each row. The dimnames() function applies the existing names (if any) for the variables in vmat, and the result is returned.
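Any matrix square root works in place of the SVD-based one. The sketch below swaps in the Cholesky factor (a different technique from the one used in rmultnorm(), shown for comparison) and checks the sample covariance against Sigma. It also uses the empirical argument of MASS::mvrnorm(), which rescales the draws so that the sample mean and covariance match mu and Sigma exactly.

```r
library(MASS)

set.seed(2)
mu = rep(0, 3)
Sigma = matrix(c(3, 1, 2,
                 1, 4, 0,
                 2, 0, 5), nrow=3)

# chol() returns an upper-triangular R with t(R) %*% R equal to Sigma,
# so post-multiplying standard normal rows by R gives covariance Sigma
R = chol(Sigma)
z = matrix(rnorm(1000 * 3), nrow=1000)
xchol = sweep(z %*% R, 2, mu, "+")
round(cov(xchol), 2)   # close to Sigma, up to sampling error

# empirical=TRUE: sample moments match mu and Sigma exactly
xexact = mvrnorm(1000, mu, Sigma, empirical=TRUE)
round(cov(xexact), 2)
```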
3.1.8 Truncated multivariate normal random variables
See also 4.1.1.
library(tmvtnorm)
x = rtmvnorm(n, mean, Sigma, lower, upper)
Note: The arguments specify the number of random vectors to be created, the mean vector, the covariance matrix, and vectors of the lower and upper truncation values.
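To make the idea of truncation concrete, here is a plain rejection sampler built on MASS::mvrnorm(). The helper rtmvnorm_reject() is hypothetical and is not the tmvtnorm implementation; it draws from the untruncated distribution and keeps only rows inside the box defined by lower and upper, so it becomes very slow when that region has low probability.

```r
library(MASS)

# Hypothetical helper: rejection sampling from a truncated multivariate normal
rtmvnorm_reject = function(n, mean, Sigma, lower, upper) {
  p = length(mean)
  out = matrix(numeric(0), ncol=p)
  while (nrow(out) < n) {
    # matrix() guards against mvrnorm() returning a vector when n = 1
    cand = matrix(mvrnorm(n, mean, Sigma), ncol=p)
    # keep candidates whose every coordinate lies within [lower, upper]
    inside = apply(cand, 1, function(row) all(row >= lower & row <= upper))
    out = rbind(out, cand[inside, , drop=FALSE])
  }
  out[1:n, , drop=FALSE]
}

set.seed(3)
Sigma2 = matrix(c(1, 0.5,
                  0.5, 1), nrow=2)
xtrunc = rtmvnorm_reject(500, mean=c(0, 0), Sigma=Sigma2,
                         lower=c(-1, 0), upper=c(Inf, 2))
apply(xtrunc, 2, range)
```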
3.1.9 Exponential random variables
x = rexp(n, rate=lambda)
Note: The arguments specify the number of variables to be created and (optionally) the rate λ, which is the inverse of the mean (default λ = 1).
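A brief check of the rate parameterization, using λ = 2 as an illustrative value: the sample mean should be near 1/λ, and pexp() gives the matching cumulative probability 1 - exp(-λx).

```r
set.seed(6)
lambda = 2
x = rexp(10000, rate=lambda)
mean(x)                 # should be near 1/lambda = 0.5
pexp(1, rate=lambda)    # P(X <= 1) = 1 - exp(-2)
```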
3.1.10 Other random variables
Example: 3.4.1 The probability distributions supported within R are listed in Table 3.1, page 34. In addition to these distributions, the inverse probability integral transform can be used to generate arbitrary random variables with an invertible cumulative distribution function F (exploiting the fact that if U ∼ U(0,1), then F⁻¹(U) has distribution function F). As an example, consider the generation of random variates from an exponential distribution with rate parameter λ, where F(X) = 1 - exp(-λX) = U. Solving for X yields X = -log(1 - U)/λ. If we generate a Uniform(0,1) variable, we can use this relationship to generate an exponential with the desired rate parameter.
lambda = 2
expvar = -log(1-runif(1))/lambda
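The same transform applies to a whole vector of uniforms at once. Because qexp() is the exponential quantile function F⁻¹, it should give the same values as the explicit formula; the sample size and seed below are arbitrary.

```r
set.seed(4)
lambda = 2
u = runif(10000)

# Inverse probability integral transform, vectorized over u
expvars = -log(1 - u)/lambda

# qexp() implements F^(-1) directly, so the values agree
expvars2 = qexp(u, rate=lambda)

mean(expvars)   # should be near 1/lambda = 0.5
```

The same recipe works with any qDISTNAME function from Table 3.1 applied to runif() output.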