TABLE 4.3  Bootstrap estimates of accumulation factor quantiles

THE LOGNORMAL MODEL
For the lognormal model, with Y_t ~ N(μ, σ²), the one-year accumulation factor is itself lognormally distributed, with parameters 12μ and 12σ². We use as one of the defining equations the mean one-year accumulation factor, equated to the empirical mean of 1.116122 (the data set is TSE 300, from 1956 to 1999). This gives the first equation to solve for μ and σ.
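As a sketch of the resulting calibration system (using the 2.5 percent one-year calibration point of 0.76 from Table 4.1, quoted later in this section; the notation is illustrative rather than the book's own equations), the mean condition and a typical quantile condition are

\[
e^{12\mu + 6\sigma^{2}} = 1.116122,
\qquad
\Pr[S_1 < 0.76] \;=\; \Phi\!\left(\frac{\ln 0.76 - 12\mu}{\sigma\sqrt{12}}\right) \;\ge\; 0.025,
\]

where S_1 denotes the one-year accumulation factor and Φ is the standard normal distribution function.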
FIGURE 4.1  Comparison of lognormal and RSLN distributions, before and after calibration. (Accumulated proceeds of a 10-year unit investment, TSE parameters; curves for the lognormal and RSLN models with maximum likelihood parameters.)

ANALYTIC CALIBRATION OF OTHER MODELS
Calibration of the AR(1) and RSLN models can be done analytically, similarly to the lognormal model, though the calculations are a little more complex.
For the AR(1) model, the sum of the log-returns is normally distributed, so the accumulation factors are lognormally distributed, with the distribution given in equation 4.18, and it is possible to calculate probabilities analytically for the accumulation factors. This assumes a neutral starting position for the process; that is, Y_0 = μ, so that the first value of the series is Y_1 = μ + σε_1. To prove equation 4.18, it is simpler to work with the detrended process Z_t = Y_t − μ, so that Z_t = aZ_{t−1} + σε_t. Once again, we use as one of the defining equations for the calibration the mean one-year accumulation factor, equated to its empirical value.
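The following is a minimal sketch of the analytic calculation for the AR(1) model (it is not the book's own code, and the function name, parameter names, and example values are illustrative only). It uses the neutral start Y_0 = μ described above, under which the sum of n monthly log-returns is normal and the accumulation factor is lognormal.

```python
from math import log, sqrt
from statistics import NormalDist

def ar1_accumulation_tail_prob(x, years, mu, a, sigma):
    """Pr[accumulation factor < x] for an AR(1) log-return model.

    Monthly log-returns: Y_t = mu + a*(Y_{t-1} - mu) + sigma*eps_t,
    started neutrally at Y_0 = mu, so the detrended process
    Z_t = Y_t - mu satisfies Z_t = a*Z_{t-1} + sigma*eps_t with Z_0 = 0.
    The sum of the n monthly log-returns is then normal, and the
    accumulation factor exp(sum) is lognormal.
    """
    n = 12 * years
    mean = n * mu
    # Var(sum of Z_t): each eps_j contributes sigma * (1 - a**(n-j+1)) / (1 - a)
    var = sum((sigma * (1 - a ** (n - j + 1)) / (1 - a)) ** 2
              for j in range(1, n + 1))
    return NormalDist().cdf((log(x) - mean) / sqrt(var))

# Example: probability that the one-year accumulation factor falls below 0.76
# (illustrative parameter values, not fitted estimates).
print(ar1_accumulation_tail_prob(0.76, years=1, mu=0.0081, a=0.08, sigma=0.045))
```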
The distribution function of the accumulation factor for the RSLN-2 model was derived in equation 2.30 in the section on RSLN in Chapter 2. In that section, we showed how to derive a probability distribution for the total number of months spent in regime 1 for an n-month process. Here we again use that total sojourn random variable and its probability function.
Using this equation, it is straightforward to calculate the probabilities for the various maximum quantile points in Table 4.1. Consider, for example, the maximum likelihood parameters for the RSLN distribution fitted to the TSE 300 data from 1956 to 1999. Using the estimated transition probabilities, and using the recursion from equations 2.20 and 2.26, gives the distribution for the total sojourn shown in Table 4.4. Applying this distribution, together with the estimates of the regime means and standard deviations, gives:
Pr[accumulation factor < 0.85] = 0.030,   Pr[accumulation factor < 1.05] = 0.057,   Pr[accumulation factor < 1.35] = 0.12.

In each case, the probability that the accumulation factor is less than the table value is greater than the percentile specified in the table. For example, for the top left table entry, we need at least 2.5 percent probability that the one-year accumulation factor is less than 0.76; we have a probability of 3.2 percent, so the RSLN distribution with these parameters satisfies the requirement for the first entry. Similarly, all the probabilities calculated are greater than the minimum values, so the maximum likelihood estimators satisfy all the quantile-matching criteria. The mean one-year accumulation factor is 1.1181, and the standard deviation is 18.23 percent.
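A minimal sketch of this analytic calculation is given below (not the book's own code). It assumes the conditional-on-sojourn mixture form of equation 2.30: given that the process spends r of the n months in regime 1, the sum of the log-returns is normal with mean r·μ1 + (n − r)·μ2 and variance r·σ1² + (n − r)·σ2². The function and parameter names, and the sojourn probabilities passed in, are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist

def rsln_accumulation_tail_prob(x, n, sojourn_probs, mu1, sig1, mu2, sig2):
    """Pr[n-month accumulation factor < x] for the two-regime RSLN model.

    sojourn_probs[r] = Pr[R = r], the probability that r of the n months
    are spent in regime 1 (from the recursion of Chapter 2).
    Conditional on R = r, the sum of the n log-returns is normal.
    """
    phi = NormalDist().cdf
    total = 0.0
    for r, pr in enumerate(sojourn_probs):  # r = 0, 1, ..., n
        mean = r * mu1 + (n - r) * mu2
        var = r * sig1 ** 2 + (n - r) * sig2 ** 2
        total += pr * phi((log(x) - mean) / sqrt(var))
    return total
```

With the sojourn probabilities from Table 4.4 and the fitted regime parameters, this calculation should reproduce figures of the kind quoted above (for example, the 3.2 percent probability below 0.76 for the one-year factor).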
The Simulation Method

The CIA calibration criteria allow calibration using simulation, but stipulate that the fitted values must be adequate with a high (95 percent) probability. The reason for this stipulation is that simulation adds sampling variability to the estimation process, which needs to be allowed for. Simulation is useful where analytic calculation of the distribution function for the accumulation factors is not practical. This would be true, for example, for the conditionally heteroscedastic models.
The simulation calibration process runs as follows:

1. Simulate a large number of values, N say, for each of the three accumulation factors in Table 4.1.
2. For each cell in Table 4.1, count how many simulated values of the accumulation factor fall below the maximum quantile in the table; let this number be M. That is, for the first calibration point, M is the number of simulated values of the one-year accumulation factor that are less than 0.76.
3. The proportion p̃ = M/N is an estimate of p, the true underlying probability that the accumulation factor is less than the calibration value. This means that the table quantile value lies, approximately, at the p̃-quantile of the accumulation-factor distribution.
4. If the lower 95 percent confidence bound for p, that is p̃ − 1.645 √(p̃(1 − p̃)/N), is greater than the required probability (0.025, 0.05, or 0.1), then we can be 95 percent certain that the parameters satisfy the calibration criterion (a sketch of this check in code follows the list).
5. If the calibration criteria are not all satisfied, it will be necessary to adjust the parameters and return to step 1.
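The following is a minimal sketch of the check in step 4 (not the book's own code); the function and argument names are illustrative, and the example figures are those quoted for the GARCH model below.

```python
from math import sqrt

def passes_calibration(num_below, num_sims, required_prob, z=1.645):
    """95 percent-confidence check that the left-tail probability is large enough.

    num_below     -- simulated values falling below the table quantile (M)
    num_sims      -- total number of simulated values (N)
    required_prob -- 0.025, 0.05, or 0.1, depending on the calibration point
    """
    p_hat = num_below / num_sims
    lower_bound = p_hat - z * sqrt(p_hat * (1 - p_hat) / num_sims)
    return lower_bound > required_prob

# Example using the GARCH figures quoted below:
# 2,629 of 100,000 simulated one-year factors fall below 0.76.
print(passes_calibration(2629, 100000, 0.025))  # True: the bound is about 0.02546
```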
The GARCH Model

The maximum likelihood estimates of the generalized autoregressive conditionally heteroscedastic (GARCH) model were given in Table 3.4 in Chapter 3. Using these parameter estimates to generate 20,000 values of each of the one-year, five-year, and ten-year accumulation factors, we find that the left-tail probabilities at the calibration quantiles are too small at all durations. Also, the mean one-year accumulation factor is rather large, at around 1.128. Reducing the mean parameter to, for example, 0.0077 per month is consistent with the lognormal model and will bring the mean down. Increasing any of the other parameters will increase the standard deviation for the process and, therefore, increase the portion of the distribution in the left tail. The two parameters multiplying the lagged squared innovation and the lagged variance determine the dependence of the variance process on earlier values. After some experimentation, it appears most appropriate to increase the constant term in the variance equation and leave those two parameters unchanged. Here, appropriateness is being measured in terms of the overall fit at each duration for the accumulation factors.
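A minimal sketch of the simulation step for the GARCH model is given below (not the book's own code). It assumes a standard GARCH(1,1) form for the monthly log-return, Y_t = mu + sigma_t·eps_t with sigma_t² = a0 + a1·(Y_{t−1} − mu)² + b·sigma_{t−1}²; the function name, parameter names, and parameter values are illustrative and are not the fitted values of Table 3.4.

```python
import numpy as np

def simulate_accumulation_factors(n_sims, months, mu, a0, a1, b, seed=0):
    """Simulate n_sims accumulation factors over `months` months
    from a GARCH(1,1) model for the monthly log-return."""
    rng = np.random.default_rng(seed)
    var = np.full(n_sims, a0 / (1.0 - a1 - b))   # start at the stationary variance
    log_sum = np.zeros(n_sims)
    for _ in range(months):
        y = mu + np.sqrt(var) * rng.standard_normal(n_sims)   # monthly log-return
        log_sum += y
        var = a0 + a1 * (y - mu) ** 2 + b * var               # variance for the next month
    return np.exp(log_sum)

# Illustrative check of the one-year 2.5 percent calibration point.
factors = simulate_accumulation_factors(100_000, 12, mu=0.0077,
                                        a0=0.00053, a1=0.10, b=0.65)
print(np.mean(factors < 0.76))
```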
Increasing the constant term in the variance equation to 0.00053 satisfies the quantile criteria. Using 100,000 simulations of the one-year accumulation factor, we find 2,629 are smaller than 0.76, giving an estimated 2.629 percent of the distribution falling below 0.76. Allowing for sampling variability, we are 95 percent certain that the probability of this distribution falling below 0.76 is at least

0.02629 − 1.645 √(0.02629 (1 − 0.02629)/100000) = 0.02546.

All the other quantile criteria are met comfortably; the 2.5 percent quantile for the one-year accumulation factor is the most stringent test for the GARCH distribution, as it was for the lognormal distribution. Using the simulated one-year accumulation factors, the mean lies in the range (1.114, 1.117), and the standard deviation is estimated at 21.2 percent.
Parameter uncertainty is incorporated in projections at the end of this chapter, and we return to the subject in much more depth in Chapter 11, where we will show that parameter uncertainty may significantly affect estimated capital requirements for equity-linked contracts.
The term “Bayesian” comes from Bayes’ theorem, which states that for random variables X and Y, the joint, conditional, and marginal probability functions are related as f(x, y) = f(x | y) f(y) = f(y | x) f(x). This relation is used in Bayesian parameter estimation with the unknown parameter vector θ as one of the random variables and the random sample used to fit the distribution, x, as the other. Then we may determine the probability (density) functions for θ given x and for x given θ, as well as the marginal probability (density) functions for θ and x.
Originally, Bayesian methods were constrained by the difficulty of deriving distributions for the data and the parameters; only a small number of problems could be handled analytically.
The maximum likelihood estimation (MLE) procedure discussed in Chapter 3 is a classical frequentist technique. The parameter θ is assumed to be fixed but unknown. A random sample x = (x_1, x_2, ..., x_n) is drawn from a population with distribution dependent on θ and used to draw inference about the likely value of θ. The resulting estimator, θ̂, is assumed to be a random variable through its dependence on the random sample. The Bayesian approach, as we have mentioned, is to treat θ itself as a random variable. We are really using the language of random variables to model the uncertainty about θ.
Before any data is collected, we may have some information about θ; this is expressed in terms of a probability distribution for θ, π(θ), known as the prior distribution. If we have little or no information prior to observing the data, we can choose a prior distribution with a very large variance or with a flat density function. If we have good information, we may choose a prior distribution with a small variance, indicating little uncertainty about the parameter. The mean of the prior distribution represents the best estimate of the parameter before observing the data. After having observed the data x = (x_1, ..., x_n), it is possible to construct the probability density function for the parameter conditional on the data. This is the posterior distribution, π(θ | x), and it combines the information in the prior distribution with the information provided by the sample.
We can connect all this in terms of the probability density functions involved, considering the sample and the parameter as random variables. For simplicity, we assume all distribution and density functions are continuous, and the argument of the density function indicates the random variables involved (i.e., f(x) could be written f_X(x), but that tends to become cumbersome). Where the variable is θ, we use π() to denote the probability density function.

Let f(x_j | θ) denote the density of x_j given the parameter θ. The joint density for the random sample, conditional on the parameter, is

f(x | θ) = f(x_1 | θ) f(x_2 | θ) ··· f(x_n | θ).

This is the likelihood function that was used extensively in Chapter 3. The likelihood function plays a crucial role in Bayesian inference as well as in frequentist methods.
Let π(θ) denote the prior distribution of θ; then, from Bayes’ theorem, the joint density of (x, θ) is

f(x, θ) = f(x | θ) π(θ),

and the posterior density of θ given the data is

π(θ | x) = f(x | θ) π(θ) / f(x).

The denominator is the marginal joint distribution for the sample. Since it does not involve θ, it can be thought of as the constant required so that π(θ | x) integrates to 1.
The posterior distribution for θ can then be used with the sample to derive the predictive distribution. This is the marginal distribution of future observations of the process, taking into consideration the information about the variability of the parameter θ, as adjusted by the previous data. In terms of the density functions, the predictive distribution is

f(x_{n+1} | x) = ∫ f(x_{n+1} | θ) π(θ | x) dθ.

In Chapter 11, some examples are given of how to apply the predictive distribution using the Markov chain Monte Carlo method, described in this chapter, as part of a stochastic simulation analysis of equity-linked contracts.
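In simulation terms, a minimal sketch of sampling from the predictive distribution (not the book's own procedure; the function and argument names are illustrative) is to draw a parameter vector from the posterior sample and then simulate the future observation conditional on that draw:

```python
import numpy as np

rng = np.random.default_rng()

def predictive_sample(posterior_draws, simulate_given_theta, n):
    """Draw n values from the predictive distribution.

    posterior_draws      -- list of parameter vectors from the MCMC output
    simulate_given_theta -- function simulating one future observation given theta
    """
    out = []
    for _ in range(n):
        theta = posterior_draws[rng.integers(len(posterior_draws))]
        out.append(simulate_given_theta(theta))
    return np.array(out)
```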
We can use the moments of the posterior distribution to derive estimators of the parameters and standard errors. An estimator for the parameter θ is the posterior expected value E[θ | x]. For parameter vectors, the posterior distribution is multivariate, giving information about how the parameters are interrelated.
Both the classical and the Bayesian methods can be used for statistical inference: estimating parameters, constructing confidence intervals, and so on. Both are highly dependent on the likelihood function. With maximum likelihood we know only the asymptotic relationships between parameter estimates; whereas, with the Bayesian approach, we derive full joint distributions between the parameters. The price paid for this is additional structure imposed with the prior distribution.
MARKOV CHAIN MONTE CARLO: AN INTRODUCTION

For all but very simple models, direct calculation of the posterior distribution is not possible. In particular, for a parameter vector θ, an analytical derivation of the joint posterior distribution is, in general, not feasible. For some time, this limited the applicability of the Bayes approach. In the 1980s the Markov chain Monte Carlo (MCMC) technique was developed. This technique can be used to simulate a sample from the posterior distribution of θ. So, although we may not know the analytic form for the posterior distribution, we can generate a sample from it.
Technically, the MCMC algorithm is used to construct a Markov chain of parameter values θ^(1), θ^(2), ..., which has as its stationary distribution the required posterior, π(θ | x). So, if we generate a large number of simulated values of the parameter set using the algorithm, after a while the process will reach a stationary distribution. From that point, the algorithm generates random values from the posterior distribution for the parameter vector. We can use the simulated values to estimate the marginal density and distribution functions for the individual parameters, or the joint density or distribution functions for the parameter vector.

The early values of the chain, before the chain achieves the limiting distribution, are called the “burn-in.” These values are discarded. The remaining values are a random, nonindependent sample from the posterior distribution π(θ | x), enabling estimation of the joint moments of the posterior distribution.
One of the reasons that the MCMC method is so effective is that we can update the parameter vector one parameter at a time. This makes the simulation much easier to construct. For example, assume we are estimating a three-parameter distribution, Θ = (θ_1, θ_2, θ_3). We can update Θ one parameter at a time: we take the posterior distribution of θ_1, conditional on the data and on the current values of θ_2 and θ_3, and simulate a value from this distribution; we can then use this value in the next distribution and so proceed, simulating a value of θ_2 conditional on the data, the new value of θ_1, and the current value of θ_3, and then a value of θ_3 conditional on the data and the new values of θ_1 and θ_2. This gives us a new parameter vector Θ, and the iteration proceeds. The problem then reduces to simulating from the posterior distributions for each of the parameters, assuming known values for all the remaining parameters.

For a general parameter vector Θ, the posterior distribution of interest with respect to parameter θ_i is the distribution of θ_i conditional on the data and on the current values of all the remaining parameters. Where this conditional distribution has a standard form, we may be able to simulate from it directly; where it does not,
we may be able to use the Metropolis-Hastings algorithm. Both of these methods are described in much more detail, along with full derivations of the algorithms, in Gilks, Richardson, and Spiegelhalter (1996). Their book also gives other examples of MCMC in practice and discusses implementation issues around, for example, convergence, which are not discussed in detail here.
The Metropolis-Hastings algorithm (MHA) is relatively straightforward to apply, provided the likelihood function can be calculated. The algorithm steps are described in the following sections. Prior distributions are assigned before the simulation; the other steps are followed through in turn for each parameter, for each simulation. In the descriptions below, we assume that the rth simulation is complete, and we are now generating the (r + 1)th values for the parameters.
For each parameter in the parameter vector we need to assign a prior distribution. These can be independent, or, if there is a reason to use joint distributions for subsets of parameters, that is also possible. In the examples that we use, the prior distributions for all the parameters are independent. The center of the prior distribution indicates the best initial estimate of where the parameter lies. If the maximum likelihood estimate is available, that will be a good starting point. The variance of the prior distribution indicates the uncertainty associated with the initial estimate. If the variance is very large, then the prior distribution will have little effect on the posterior distribution, which will depend strongly on the data alone. If the variance is small, the prior will have a large effect on the shape and location of the posterior distribution. The exact form of the prior distribution depends on the parameter under consideration. For example, a normal distribution may be appropriate for a mean parameter, but not for a variance parameter, which we know must be greater than zero. In practice, prior distributions and candidate distributions for parameters will often be the same family. The choice of candidate distributions is discussed in the next section.
The Candidate Distribution
The algorithm uses an acceptance-rejection method. This requires a random value, the candidate value, generated from a candidate distribution with a known probability density function. The candidate value will be accepted or rejected as the new value of the parameter using the acceptance probability defined below.

For the candidate distribution we can use any distribution that spans the parameter space for the parameter concerned, but some candidate distributions will be more efficient than others. “Efficiency” here refers to the speed with which the chain reaches the stationary distribution. Choosing a candidate distribution usually requires some experimentation. For unrestricted parameters (such as the mean parameter for an autoregressive [AR], autoregressive conditionally heteroscedastic [ARCH], or generalized autoregressive conditionally heteroscedastic [GARCH] model), the normal distribution centered on the previous value of the parameter has advantages and is a common choice. That is, the candidate for the (r + 1)th value of the parameter is a random number generated from a normal distribution centered on the rth value, with a standard deviation chosen to ensure that the acceptance probability is in an efficient region.

The normal distribution can sometimes be used even if the parameter space is restricted, provided the probability of generating a value outside the parameter space is kept to a near impossibility. For example, with the AR(1) model, the normal distribution works as a candidate distribution for the autoregressive parameter a, even though we require |a| < 1. This is because we can use a normal distribution with variance of around 0.1, with generated values for the parameter well within the permitted range. For variance parameters that are constrained to be strictly positive, popular distributions in the literature are the gamma and inverted gamma distributions. Again, there are advantages in centering the candidate distribution on the previous value of the series.
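As a minimal illustration of these choices (not the book's own code), candidates for an unrestricted parameter and for a strictly positive parameter might be generated as follows; the function names, the default standard deviation, and the coefficient-of-variation value are placeholders.

```python
import numpy as np

rng = np.random.default_rng()

def normal_candidate(previous, sd=0.005):
    """Candidate for an unrestricted parameter: normal, centered on the previous value."""
    return rng.normal(previous, sd)

def gamma_candidate(previous, cv=0.2):
    """Candidate for a strictly positive parameter: gamma, centered on the previous value.

    Mean = previous and coefficient of variation = cv, so
    shape = 1/cv**2 and scale = previous * cv**2.
    """
    shape = 1.0 / cv ** 2
    return rng.gamma(shape, previous * cv ** 2)
```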
The Acceptance-Rejection Procedure

The candidate value may be accepted as the next value of the parameter, or it may be rejected, in which case the next value in the chain is the previous value. Acceptance or rejection is a random process; the algorithm provides the probability of acceptance.
For the (r + 1)th iteration for the parameter θ_i, we have the most recent value, θ_i^(r), a candidate value, θ̃, and the most current values for all the remaining parameters. Define the ratio

A = L(θ̃) π(θ̃) / ( L(θ_i^(r)) π(θ_i^(r)) ),     (5.9)

where L(θ̃) is the likelihood calculated using θ̃ for parameter i, with all other parameters taken from the current parameter vector, and the π() terms give the values of the prior distribution for θ_i, evaluated at the candidate and the current values. The acceptance probability then becomes

α = min(1, A).     (5.10)

If A ≥ 1, then the candidate is assigned to be the next value of the parameter, θ_i^(r+1) = θ̃; otherwise the candidate is accepted with probability A, and if it is rejected the previous value is retained, so that θ_i^(r+1) = θ_i^(r).
It is worth considering equation 5.10. If the prior distribution is disperse, it will not have a large effect on the calculations, because it will be numerically much smaller than the likelihood. So a major part of the acceptance probability is the ratio of the likelihood with the candidate value to the likelihood with the previous value. If the likelihood improves, then A is likely to exceed 1, depending on the ratio of the prior values, and we probably accept the candidate value. If the likelihood decreases very strongly, A will be small and we probably keep the previous value. If the likelihood decreases a little, then the value may or may not change. So the process is very similar to a Monte Carlo search for the joint maximum likelihood, and the posterior density for θ will be roughly proportional to the likelihood function. The results from the MHA with disperse priors will therefore have similarities with the results of the maximum likelihood approach; in addition, we have the joint probabilities of the parameter estimates.
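A minimal sketch of a single parameter update under this procedure is given below (not the book's own code). The log_likelihood and log_prior functions, and the use of a symmetric normal candidate, are assumptions for illustration; with a non-symmetric candidate distribution (such as the gamma or beta candidates used later), a full Metropolis-Hastings update would also include the ratio of candidate densities.

```python
import math
import numpy as np

rng = np.random.default_rng()

def update_parameter(i, theta, log_likelihood, log_prior, cand_sd):
    """One acceptance-rejection update of parameter i, holding the others fixed.

    theta          -- current parameter vector (list or array)
    log_likelihood -- function of the full parameter vector
    log_prior      -- function of the single parameter theta[i]
    cand_sd        -- standard deviation of the normal candidate distribution
    """
    current = theta[i]
    candidate = rng.normal(current, cand_sd)   # candidate centered on the current value

    proposal = list(theta)
    proposal[i] = candidate

    # log of the ratio A in equation 5.9: likelihood ratio times prior ratio
    log_A = (log_likelihood(proposal) + log_prior(candidate)
             - log_likelihood(theta) - log_prior(current))

    # accept with probability min(1, A)
    if math.log(rng.uniform()) < log_A:
        return proposal, True     # candidate accepted
    return list(theta), False     # candidate rejected; keep the previous value
```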
It is important to look at the sample paths and the acceptance frequencies to assess the appropriateness of the distributions. A poor choice for the candidate distribution will result in acceptance probabilities being too low or too high. If the acceptance probability is too low, then the series takes a long time to converge to the limiting distribution, because the chain will frequently stay at one value for long periods. If it is too large, the values tend not to reach the tails of the limiting distribution quickly, again resulting in slow convergence. Roberts (1996) suggests acceptance rates should lie in the range [0.15, 0.5].
In Figure 5.1 are some examples of sample paths for the mean parameter generated for an AR(1) model, using the MHA sample of parameters and using the TSE 300 data for the years 1956 to 1999. In the top figure, the candidate distribution is normal, centered on the previous value, with standard deviation 0.05. The acceptance probability is very low; the relatively high variance of the candidate distribution means that candidates tend to generate low values for the likelihood, and are therefore usually rejected. The process gets stuck for long periods, and convergence to the stationary distribution will take some time. In the middle figure, the candidate distribution has a very low standard deviation of 0.00025. The process moves very slowly around the parameter space, and it takes a long time to move from the initial value (zero) to the area of the long-term mean value (around 0.009). Values are very highly serially correlated. The bottom figure uses a candidate standard deviation of 0.005. This looks about right; the process appears to have reached a stationary state, and the sample space appears to be fully explored. Serial correlations are very much lower than in the other two diagrams. The correlation between the rth and (r + 5)th values is 0.73 for the top diagram, 0.96 for the second, and 0.10 for the third. These correlations ignore the first 200 values.
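The diagnostics quoted here are straightforward to compute from the simulated chain; a minimal sketch (not the book's own code) is given below, with the burn-in length of 200 taken from the discussion above.

```python
import numpy as np

def chain_diagnostics(chain, burn_in=200, lag=5):
    """Acceptance frequency and lag-k serial correlation of an MCMC sample path."""
    kept = np.asarray(chain)[burn_in:]
    # a step counts as an acceptance whenever the chain moves to a new value
    acceptance_rate = np.mean(kept[1:] != kept[:-1])
    lag_corr = np.corrcoef(kept[:-lag], kept[lag:])[0, 1]
    return acceptance_rate, lag_corr
```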
MCMC FOR THE RSLN MODEL

In this section, the application of the MCMC method to the RSLN model is described in detail. Many other choices of prior and candidate distribution would work equally well and give very similar results. The choices listed were derived after some experimentation with different distributions and parameters. Without strong prior information, it is appropriate to set the variances for the prior distributions to be large enough that the effect of the prior on the acceptance probability is very small in virtually all cases.
For the means of the two regimes, we use identical normal prior distributions, centered at zero with standard deviation 0.02. The candidate distribution for the first regime mean is normal, centered on the previous value, with standard deviation 0.005; for the second regime mean the candidate standard deviation is 0.02. The candidate density in each case is therefore the normal density centered on the previous value of the parameter.
The candidate variance is chosen to give an appropriate probability of acceptance. The acceptance probabilities for the two regime means depend on the distributions used for the other parameters; using those described below, we have acceptance probabilities of around 40 percent for both variables.
It is conventional to work with the inverse variance, known as the precision. The prior distribution for the regime 1 precision is a gamma distribution with prior mean 865 and variance 849; the prior distribution for the regime 2 precision is gamma with mean 190 and variance 1,000. The prior distributions are centered around the likelihood estimates, but are both very disperse, providing little influence on the final distribution.
The candidate distributions are also gamma, centered on the previous value of each precision; different coefficients of variation (where CV² = variance/mean²) are determined heuristically for the two precisions, to give acceptance probabilities within the desired range. The acceptance probabilities for the two precision candidates are approximately 20 percent to 35 percent.
Obviously, the transition probability parameters are constrained to lie in (0, 1), which indicates the beta distribution for prior and candidate distributions. The prior distributions used for the transition probabilities are p_12 ~ Beta(2, 48) and p_21 ~ Beta(2, 6), giving prior means of 0.04 and 0.25, and standard deviations of 0.027 and 0.145, respectively, for p_12 and p_21. The candidate distributions are also beta, with candidate ~ Beta(1.2, 28.8) for p_12 and candidate ~ Beta(1, 3) for p_21. These have the same means as the prior distributions but are more widely distributed, to ensure that candidates from the tails of the distribution are adequately sampled. The acceptance rates for p_12 and p_21 are approximately 35 percent.
MCMC Results for RSLN Model

The results given here are from 10,000 simulations of the parameters, separately for the TSE and S&P data. The first 500 simulations are ignored in both cases to allow for burn-in.

Table 5.1 gives the means and standard deviations of the posterior parameter distributions. The means of the posterior distributions are Bayesian point estimates for the individual parameters. These are very similar to the maximum likelihood estimates in Table 3.5. This is not surprising, given that the prior distributions are very disperse, so that the posterior distributions are dominated by the likelihood.