Applications of Constrained Non-parametric Smoothing
Methods in Computing Financial Risk
Chung To (Charles) Wong, BCom (Melb), GradDip (RMIT)
Submitted for the degree of Doctor of Philosophy
School of Mathematical Sciences, Queensland University of Technology,
Brisbane
December, 2008
Abstract

The aim of this thesis is to improve risk measurement estimation by incorporating extra information, in the form of constraints, into completely non-parametric smoothing techniques. A similar approach has been applied in empirical likelihood analysis. The method of constraints incorporates bootstrap resampling techniques, in particular the biased bootstrap. This thesis brings together formal estimation methods, empirical information use, and computationally intensive methods.

In this thesis, the constraint approach is applied to non-parametric smoothing estimators to improve the estimation or modelling of risk measures. We consider estimation of Value-at-Risk, of intraday volatility for market risk, and of recovery rate densities for credit risk management.

Firstly, we study Value-at-Risk (VaR) and Expected Shortfall (ES) estimation. VaR and ES estimation are strongly related to quantile estimation. Hence, tail estimation is of interest in its own right. We employ constrained and unconstrained kernel density estimators to estimate tail distributions, and we estimate quantiles from the fitted tail distribution. The constrained kernel density estimator is an application of the biased bootstrap technique proposed by Hall & Presnell (1998). The estimator that we use for the constrained kernel estimator is the Harrell-Davis (H-D) quantile estimator. We calibrate the performance of the constrained and unconstrained kernel density estimators by estimating tail densities based on samples from Normal and Student-t distributions. We find a significant improvement in fitting heavy tail distributions using the constrained kernel estimator, when used in conjunction with the H-D quantile estimator. We also present an empirical study demonstrating VaR and ES calculation.

A credit event in financial markets is defined as the event that a party fails to pay an obligation to another, and credit risk is defined as the measure of uncertainty of such events. Recovery rate, in the credit risk context, is the rate of recuperation when a credit event occurs. It is defined as Recovery rate = 1 − LGD, where LGD is the rate of loss given default. From this point of view, the recovery rate is a key element both for credit risk management and for pricing credit derivatives. Only credit risk management is considered in this thesis. To avoid the strong assumptions about the form of the recovery rate density made in current approaches, we propose a non-parametric technique incorporating a mode constraint, with the adjusted Beta kernel employed to estimate the recovery rate density function. An encouraging result for the constrained Beta kernel estimator is illustrated by a large number of simulations, as genuine data are very confidential and difficult to obtain.

Modelling high frequency data is a popular topic in contemporary finance. The intraday volatility patterns of standard indices and market-traded assets have been well documented in the literature. They show that the volatility patterns reflect the different characteristics of different stock markets, such as the double U-shaped volatility pattern reported in the Hang Seng Index (HSI). We aim to capture this intraday volatility pattern using a non-parametric regression model. In particular, we propose a constrained function approximation technique to formally test the structure of the pattern and to approximate the location of the anti-mode of the U-shape. We illustrate this methodology on the HSI as an empirical example.

Keywords: Constraint Method; Expected Shortfall; Non-parametric approach; Recovery rate density; Intraday Volatility; Risk Management; Value-at-Risk
Contents

1 Introduction
1.1 Risk
1.2 Risk management
1.2.1 Value-at-Risk
1.2.2 Recovery rate
1.2.3 Intraday Volatility
1.3 Constraint methods for risk management
1.3.1 Constraint
1.3.2 Background
1.4 Aim of this thesis
1.5 Structure of this thesis
2 Estimation of Value-at-Risk and Expected Shortfall
2.1 Introduction
2.1.1 Background
2.1.2 Description of the Problem
2.2 Methodology
2.2.1 Constrained Kernel Estimator
2.3 Value-at-Risk
2.4 Simulation Study
2.4.1 Densities Investigated
2.4.2 Choice of Kernel Function and Bandwidth
2.4.3 Measure of Discrepancy: Mean Integrated Squared Error
2.4.4 Simulation Results for the Quantile Estimators
2.4.5 Convergence of the Constrained Kernel Estimator
2.4.6 Test of quantile estimation using CKE
2.5 Expected Shortfall
2.5.1 Simulation Results for the ES Estimators
2.6 Empirical Study
2.6.1 Dataset
2.6.2 Risk factor
2.6.3 Value-at-Risk
2.6.4 Empirical VaR and ES Estimation
2.6.5 Confidence intervals for VaR
2.6.6 Confidence intervals for ES
2.7 Backtesting
2.7.1 Dataset
2.7.2 Backtesting: Result
2.8 Conclusion
3 Estimation of recovery rate density
3.1 Introduction
3.1.1 Credit Risk and Recovery Rate
3.1.2 Background
3.1.3 Aim and structure of this chapter
3.2 Methodology
3.2.1 Overview of methodology
3.2.2 The Beta kernel estimator
3.2.3 The constrained Beta kernel estimator
3.2.4 Objective function
3.2.5 Constraint
3.2.6 First derivative of the Beta kernel estimator
3.2.7 Optimisation
3.3 Bandwidth selection
3.4 Simulation Study
3.4.1 Mode Estimation
3.4.2 Density Estimation
3.5 Conclusion
4 Modelling intraday volatility patterns
4.1 Introduction
4.1.1 High frequency volatility
4.1.2 Background
4.1.3 Aim and structure of this chapter
4.2 Regression estimator
4.2.1 Nadaraya-Watson estimator
4.2.2 Distance measure
4.2.3 The U-shape constraint
4.2.4 First derivative of linear estimator
4.2.5 Optimisation for constrained Nadaraya-Watson estimator
4.2.6 Bandwidth selection
4.3 Constrained function approximation (CFA)
4.3.1 Computational aspects of the CFA
4.3.2 The Initialisation procedure
4.3.3 The Adding procedure
4.3.4 The Optimisation procedure
4.3.5 Model diagnostics
4.3.6 Anti-mode estimation
4.4 Simulation
4.4.1 Simulation results
4.5 Empirical Study
4.5.1 Dataset
4.5.2 Empirical result
4.6 Conclusion
5 Conclusion
A Empirical Study: All Ordinaries Index
A.0.1 Dataset
A.0.2 Risk factor
A.0.3 Value-at-Risk
A.0.4 Empirical VaR and ES Estimation
A.0.5 Confidence intervals for VaR
A.0.6 Confidence intervals for ES
A.1 Backtesting
A.1.1 Dataset
A.1.2 Backtesting: Result
Abbreviations

Chapter 2
• CKE: Constrained Kernel Estimator
• EQ: Empirical Quantile Estimator
• ES: Expected Shortfall
• EVT: Extreme Value Theory
• GPD: Generalised Pareto Distribution
• HSI: Hang Seng Index
• ISE: Integrated Squared Error
• K-L: Kaigh-Lachenbruch Estimator
• MSP bandwidth: Maximal Smoothing Principle
• NKE: Naïve Kernel Estimator
• NR bandwidth: Normal Reference bandwidth
• POT: peaks-over-threshold
• SJ: Sheather and Jones’s bandwidth
• VaR: Value-at-Risk
Chapter 3
• B-kernel: Beta kernel
• CB-kernel: Constrained Beta kernel
• EDM: Empirical Density Mode
• ERM: Empirical Relationship Mode
• GM: Grenander Mode
• HSM: Half Sample Mode
• MSE: Mean Square Error
• ROT: Rule of Thumb
• SP: Semi-Parametric
• SPM: Standard Parametric Mode
• T-Gauss: Transformed Gaussian
Chapter 4
• AAE: Average Absolute Error
• CFA: Constrained Function Approximation
• LLS: Linear Least Squares
Statement of Originality
This is to certify that the work contained in this thesis has never previously been submitted for a degree or diploma in any university and that, to the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made in the thesis itself.
Acknowledgements

The author thanks Professor Rodney Wolff (Queensland University of Technology, Australia) for supervising the project throughout three years, and Dr Steven Li (University of South Australia) for discussions on Monte Carlo simulation.
Chapter 1
Introduction
Risk is defined as a potential loss in an uncertain event. The concept of risk plays an important role in financial markets, as risk management is an essential task for a financial institution. According to the Basel Accord Committee (1996), banks must satisfy the minimum capital requirements of the Basel Accord. The Basel Accord is a standard bank regulatory policy regulated by the Basel Committee on Banking Supervision. The purpose of the regulation is to require banks to retain a certain amount of capital to compensate for the corresponding risk exposure.
Secondly, the VaR measure, which is a cross-sectional measure in practice, considers the extreme loss at a given probability at a given time. Statistically, it is given by the quantile of the hypothetical return distribution. The definition of VaR is given in Jorion (2001) and Duffie & Pan (1997).

Therefore, a consistently accurate estimation of the risk measure is an important task for risk management.
This thesis is comprised of three case studies. We consider estimation of the Value-at-Risk, of the intraday volatility for market risk, and of the density of the recovery rate for credit risk management. Our approach to risk measurement is briefly introduced in the following Sections. Also, each case study will contain its own detailed literature review.
of the tail distribution itself
Current Approaches to Computing Value-at-Risk
Empirical Quantile This approach is based on a fixed period of historical data. First of all, changes in well-defined risk factors are calculated from these data, and then an hypothetical portfolio value is computed. This generates the hypothetical return directly. Repeated application of this, by incorporating variation (such as 'tweaking' the data by jittering, or using different historical subsets), can then create an hypothetical empirical distribution of returns. VaR at level α is then given by the (n × α)th ordered hypothetical return, which is the empirical percentile at level α, where n is the sample size.
Problem with Empirical Quantile The empirical quantile is very sensitive to the sample size. For instance, the empirical quantile at the 0.01 level with 1000 observations depends on the smallest 10 observations.
Monte Carlo Simulation This approach assumes an hypothetical distribution for the risk factor, and then generates a large number of random variables from this distribution. Using these random risk factors, hypothetical returns are computed. Then, a distribution of hypothetical returns is obtained, and VaR at level α is given by the (n × α)th ordered hypothetical return, where n is the sample size.
Normal-based Methods This approach is based on the assumption that the risk factor distribution is Normal with mean zero1 and the standard deviation of the risk factors. VaR is then simply the inverse cumulative distribution function evaluated at level α.
Problems with Monte Carlo Simulation and Normal-based Methods The Monte Carlo simulation and Normal-based methods assume that the risk factors belong to some assumed distribution and a Normal distribution, respectively. Such assumptions restrict the flexibility and shape of the densities of the risk factors. Further, the risk factor density has a heavier tail than the Normal distribution.
1 The mean is ignored since it is very small relative to σ∆t
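To make the three approaches concrete, the following sketch (an editorial illustration, not code from the thesis; the simulated return series, sample size and level are arbitrary placeholder choices) computes a one-period VaR at level α = 0.01 under each of them.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha = 0.01                                       # VaR level
returns = rng.standard_t(df=5, size=1000) * 0.01   # hypothetical daily returns

# Empirical quantile: the (n x alpha)-th ordered hypothetical return.
n = len(returns)
ordered = np.sort(returns)
var_empirical = -ordered[max(int(n * alpha) - 1, 0)]

# Normal-based: mean taken as zero, scale from the sample standard deviation.
sigma = returns.std(ddof=1)
var_normal = -norm.ppf(alpha, loc=0.0, scale=sigma)

# Monte Carlo: draw from an assumed risk-factor distribution (here Normal),
# then read off the same ordered-return quantile from the simulated sample.
sim = rng.normal(0.0, sigma, size=100_000)
var_mc = -np.sort(sim)[int(len(sim) * alpha) - 1]

print(var_empirical, var_normal, var_mc)
```

With heavy-tailed simulated returns, the empirical VaR typically exceeds the Normal-based figure, which is the sensitivity to distributional assumptions discussed above.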
1.2.2 Recovery rate
Recovery rate is the rate of recuperation when a default event2 occurs. It is a measure of the extent to which the size of the loss is minimised when a default event occurs: a high recovery rate can reduce loss. For this reason, the recovery rate is a key factor in the estimation of Credit Value-at-Risk (CVaR), which is the potential loss in a credit default event. However, in some cases, the recovery rate is not taken into account by CVaR; indeed, sometimes the recovery rate is assumed to be 0%, which means that nothing is retrieved when a default event occurs. This will induce an overestimated CVaR. There is clearly a need for better estimation of recovery rates to improve the calculation of CVaR.
Current Practical Approach to Recovery Rate Densities
Recovery rate, in the credit risk context, is the rate of retrieval when a default event occurs. It is defined as Recovery rate = 1 − LGD, where LGD is the rate of loss given default. The current practical approach to estimating the recovery rate is based on parametric methods. Specifically, the recovery rate distribution is usually assumed to be a Beta distribution, and its parameters are calibrated by the method-of-moments (by equating the respective sample moments to those of the Beta distribution, and solving the resultant set of equations). Also, the recovery rate is usually categorised by seniority and by industry, because recovery rates are related to the value of collateral and the liquidity of the company.
Problems for the recovery rate with an assumed distribution There is no statistical or theoretical evidence to show that the recovery rate density necessarily belongs to the Beta family of distributions. Also, the literature reports a bimodal density of recovery rates, which is not supported by the Beta distribution.
2 A default event is a credit event in which a party fails to pay an obligation to another.
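As an illustration of this method-of-moments calibration (a sketch with simulated, not genuine, recovery rates), the Beta parameters can be obtained by matching the sample mean and variance to the Beta moments:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
recovery = rng.beta(2.0, 5.0, size=500)   # simulated recovery rates in (0, 1)

# Method-of-moments: match the sample mean and variance to those of Beta(a, b).
m, v = recovery.mean(), recovery.var(ddof=1)
common = m * (1.0 - m) / v - 1.0
a_hat, b_hat = m * common, (1.0 - m) * common

print(f"calibrated Beta parameters: a = {a_hat:.3f}, b = {b_hat:.3f}")
# The fitted Beta density cannot be bimodal, which is the limitation noted above.
print("fitted density at 0.5:", beta.pdf(0.5, a_hat, b_hat))
```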
1.2.3 Intraday Volatility
Volatility is one of the primary measures of risk. It measures the change in return over a given period. As technology improves, access to high frequency data becomes more convenient. This enhances the potential for high frequency data modelling. Stock markets and stock indices are general cases which render high frequency data, attracting many researchers to investigate the daily periodic pattern.
Capturing the intraday volatility pattern
Nowadays, numerous ways exist to access intraday data. The data can be used to measure volatility within a day, and the literature explores the non-linearity in intraday stock index volatility in financial markets. It is known that the share return volatility pattern changes throughout the day, with large volatility at the start of the day, slumping at around lunch time, then building up in a step-wise pattern throughout the afternoon to the end of the day.
1.3.1 Constraint

Applying a constraint can be understood as placing a restriction upon a model in order to 'encourage' a particular behaviour. Such constraints are usually based on some property of the underlying distribution, and are accessed by means of a statistic. For instance, a specific quantile of a return distribution can be used to improve the tail estimation for VaR, or the mode of a density can be used to improve density estimation of the recovery rate in credit risk modelling. Our approach will be to estimate the form of the constraint using the sample, and then to use the appropriate statistic in the estimation technique.
1.3.2 Background
The constraint approach is widely applied in non-parametric frameworks. Empirical likelihood (Owen (2001)) is a non-parametric approach incorporating constraints. Chen (1997) proposes estimating the density by using an empirical likelihood-based kernel estimator when extra information is available. In this thesis, we assume that the expectation of a transformed random variable is zero, which is given by E{g(X)} = 0, where g(X) is a known function, as in Chen (1997). This constraint can improve the estimation of a density function by a significant reduction of the Mean Integrated Squared Error3.
Hall & Presnell (1998), Hall & Presnell (1999a) and Hall & Presnell (1999b) propose biased bootstrap methods. This method incorporates extra information with the standard bootstrap resampling method of Efron & Tibshirani (1993). The biased bootstrap method assigns a sampling weight to each observation, xj, given by

P(X* = xj | χ) = pj,

where X* is the resampled observation, χ is the original sample, and pj is the sampling weight. In these papers, they also mention that one application of the biased bootstrap is a kernel estimator4 under constraint. Hall, Huang, Gifford & Gijbels (2001) and Hall & Huang (2001) successfully apply the biased bootstrap method to kernel regression to estimate the hazard rate under assumptions of monotonicity. Hall & Turlach (1999) apply the biased bootstrap method to curve estimation by minimising the empirical mean integrated squared error and by assuming unimodality. Cheng, Gasser & Hall (1999) apply the biased bootstrap method to curve estimation under both assumptions of unimodality and monotonicity. Also, a unimodality assumption is applied to density estimation in Hall & Huang (2002), who provide some theoretical properties of the choice of the distance measure for the unimodality assumption with the density estimation.

3 Mean Integrated Squared Error = E ∫ (f̂ − f)^2 dx = ∫ {f − E(f̂)}^2 dx + ∫ [E(f̂^2) − {E(f̂)}^2] dx = Integrated Squared Bias + Integrated Variance, assuming integration and expectation can be interchanged.

4 Kernel estimator here refers to the kernel density estimator and the kernel regression model.
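A minimal sketch of the resampling step just described, with illustrative weights p_j chosen arbitrarily; in the cited applications the weights would come from an optimisation under the relevant constraint.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)                      # original sample (chi)

# Standard bootstrap: uniform resampling weights 1/n.
uniform_w = np.full(len(x), 1.0 / len(x))

# Biased bootstrap: unequal weights p_j (here an arbitrary tilt towards the
# left tail, purely for illustration), normalised to sum to one.
tilt = np.exp(-x)
p = tilt / tilt.sum()

standard_resample = rng.choice(x, size=len(x), replace=True, p=uniform_w)
biased_resample = rng.choice(x, size=len(x), replace=True, p=p)

print("standard bootstrap mean:", standard_resample.mean())
print("biased bootstrap mean:  ", biased_resample.mean())
```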
1.4 Aim of this thesis

Estimation of risk measures is divided into two principal approaches: parametric and non-parametric methods. Parametric methods are based on strong assumptions about distributions, and non-parametric methods are distribution-free. As non-parametric methods usually accept only that there is a fixed but unknown underlying distribution, characteristics of the distribution are, in some sense, ignored.

Our main interest in this thesis is to improve non-parametric methods for modelling and estimation in a selection of problems in finance by incorporating key distributional characteristics into non-parametric estimation.
1.5 Structure of this thesis

In Chapter 2, we introduce a technique to improve tail estimation. The weighted kernel density estimator is combined with a quantile estimator to highlight the character of the tail density. Then the Value-at-Risk and the Expected Shortfall can be computed directly from this estimator. Both simulation and empirical results are presented in this chapter, with comparisons against existing approaches.

In Chapter 3, we apply the constraint method to the Beta kernel density estimator. As a number of papers give empirical evidence about the recovery rate density5, this empirical evidence is imposed as a constraint on the Beta kernel estimator. The performance of the density estimation is then shown by simulation.

In Chapter 4, we further adopt the constraint method in Nadaraya-Watson regression for intraday volatility. We impose the intraday volatility pattern, such as the double U-shaped pattern, as a constraint in the Nadaraya-Watson regression estimator. Also, we develop a non-parametric technique to approximate the shape of the function. From this technique, we can analyse the shape of the function. Furthermore, these techniques are applied to the Hang Seng Index as an empirical study.

Chapter 5 concludes the thesis.

5 The recovery rate density is a bimodal density function.
Chapter 2

Estimation of Value-at-Risk and Expected Shortfall

to calculating VaR.

For this reason, many researchers, such as Harrell & Davis (1982), Huang & Brill (1999) and Huang (2001), consider estimating quantiles. Furthermore, the performance of quantile estimators is an interesting topic in its own right. Parrish (1990) and Dielman, Lowry & Pfaffenberger (1994) compare the performance of several quantile estimators in order to find out which performs the best against specified criteria. They conclude that the Harrell and Davis (H-D) estimator1 is the best for a wide range of quantile estimation scenarios, as demonstrated in simulation results from both symmetric and skewed distributions. Since all these estimators are distribution-free, they are part of a non-parametric framework.
One of the alternative risk measurement methods is to consider the tail density, called the Expected Shortfall (ES). It is defined as the expected loss given that the loss exceeds the VaR. In other words, ES is the conditional expectation of the returns given that the returns are beyond the corresponding VaR. Therefore, the ES is also known as the Conditional VaR. Artzner, Delbaen, Eber & Heath (1999) propose four axioms that any risk measure (ρ) should satisfy, namely

• Translation Invariance: if X is the risk factor and a is real, then ρ(X + a) = ρ(X) − a;

• Subadditivity: ρ(X + Y) ≤ ρ(X) + ρ(Y);

• Positive Homogeneity: ρ(λX) = λρ(X) for λ ≥ 0;

• Monotonicity: if X ≤ Y, then ρ(Y) ≤ ρ(X).

ES is a coherent risk measure in terms of these four axioms. The mathematical properties of ES are studied in Bertsimas, Lauprete & Samarov (2002), Rockafellar & Uryasev (2000) and Tasche (2000).
VaR estimation is interesting, but so is estimation of entire tail distributions to compute ES, as the ES is the first moment of the density over the partition below the corresponding VaR. An accurate estimator of the tail of a density function carries accurate information both about extreme quantiles or percentiles, and about conditional expectations. From this point of view, much research has been done using parametric methods, in which a specified distribution is fitted to the observed returns by calibrating the parameters. This method is, of course, very sensitive to the assumption of the distribution. For instance, based on various simulation results in Chang, Hung & Wu (2003) comparing empirical coverage, the average length of confidence intervals and the variance of the length of confidence intervals, the H-D estimator is preferred over parametric models in estimating the VaR.
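As a small numerical illustration (not from the thesis), ES can be read off a sample directly once a quantile estimate is available, here using the empirical quantile as a stand-in for the estimators compared later; the sample and level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
returns = rng.standard_t(df=5, size=5000) * 0.01
alpha = 0.01

# VaR as the empirical alpha-quantile of returns (large negative returns are losses).
var_alpha = np.quantile(returns, alpha)

# ES: the average of the returns that fall at or below the VaR.
tail = returns[returns <= var_alpha]
es_alpha = tail.mean()
print(f"VaR (loss): {-var_alpha:.4f}")
print(f"ES  (loss): {-es_alpha:.4f}")   # ES exceeds VaR in magnitude
```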
There are several popular ways to estimate VaR based on a series of financial returns. One of them is to use the parametric approach. The parametric approach assumes that returns follow a specified distribution (e.g., the Normal distribution). Although this approach may be able to exploit properties of the specified distribution, it involves calibrating parameters. Also, the shape of the density lacks flexibility due to the small number of parameters in many popular models, such as the Normal and Student-t distributions. One consideration here is that evidence of risky behaviour in the data set (outliers) may adversely affect parameter estimation, even when a robust method is used.
There are two popular methods to estimate model parameters: Method-of-Moments and Maximum Likelihood. In the Method-of-Moments approach, the parameters are calibrated by equating the respective sample moments and the distribution moments, and solving the resulting set of equations in the parameters. It is a straightforward method for calibrating the parameters. However, there is no guarantee that the estimators will be unbiased, even in simple distributional cases, and the variance of the estimator is often greater than that of many other estimators and does not achieve the Cramér-Rao lower bound2. In other words, this estimator is not always maximally efficient. Moreover, the sample moments are very sensitive to outliers, especially the first few moments. Hence, optimal parameters may not be obtained.

2 The Cramér-Rao lower bound is the minimum variance of any unbiased estimator.
Efficiency can be improved dramatically by using the method of Maximum Likelihood (ML). The variance of the estimator is always asymptotically smaller than that of many other comparable estimators. (The variance and the bias diminish as the sample size increases.) In contrast, a large variance and a heavy bias may occur for small sample sizes. Furthermore, we need grounds for a strong conviction that the specified distribution is the true distribution.

On the other hand, the empirical quantile is a very popular estimator of VaR and the corresponding ES in practice. However, the bias of the VaR estimation is very sensitive to the sample size. Also, it associates zero probability with a loss greater than the largest loss. This may present a huge problem when estimating the ES.

Due to the problems of these parametric approaches, this chapter proposes a non-parametric approach. We propose to modify standard kernel density estimators: we expect that the constrained kernel estimator can increase the accuracy of both density and quantile estimation with a smaller sample size than the naïve approach. Silverman (1986a) and Pagan & Ullah (1999) give detailed foundation discussions about kernel density estimators. They are also known as distribution-free density estimators. Throughout this chapter, we refer to the kernel density estimator as the naïve kernel estimator. The naïve kernel density estimator provides an estimate of the global shape of the density. However, for a tail estimate, when there is a small number of outliers or extreme values, the naïve kernel density estimator puts equal weight on these values. The use of equal weights may induce bias in tail estimation. Moreover, the bias of the naïve kernel density estimator is very sensitive to the sample size and the heaviness of the tails. Hence, the naïve kernel density estimator is not a precise quantile estimator.

Since the kernel density estimator is very sensitive to sample size, and since using one year of daily returns to evaluate the VaR is very popular in practice, the naïve kernel estimator in its usual form is not a suitable estimator of VaR: the sample size of one year of daily returns is not large enough to capture the extreme tail density. The kernel estimator of a density f, based on a sample X1, ..., Xn, is given by

f̂(x) = (1/(nh)) Σ_{j=1}^{n} K((x − X_j)/h),    (2.1)
where X_j is the jth observed return, K is a kernel function, h is a bandwidth and n is the sample size. The kernel function K is defined to be a symmetric and continuous probability density function, and the bandwidth h controls the smoothness of the estimated density, and so it affects the bias of the estimated density. Equation (2.1) shows that the kernel density estimator is an equally weighted linear combination of the kernel function K evaluated at each observation, with weight 1/n at each kernel function or observation. We now consider using unequal weights instead of equal weights in order to optimise some specific feature of the estimator.

In order to specify the possibly different weights corresponding to each term of the sum in Equation (2.1), we propose a kernel density estimator constrained by the value of a quantile, which directly incorporates additional information into the kernel density estimator. By using this constraint, we can evaluate the weight corresponding to each kernel function. The form of the constrained kernel density estimator is given by

f̂_p(x) = (1/h) Σ_{j=1}^{n} p_j K((x − X_j)/h),    (2.2)

where p_j is the weight attached to the jth observation.
Now, let us introduce our quantile constraint:

∫_{−∞}^{v} f̂_p(x) dx = α,    (2.3)

where v is an estimate of the quantile of the return distribution at level α. Throughout, the Constrained Kernel Estimator with a quantile constraint at level α is written as the "α% Constrained Kernel Estimator" or "α% CKE".
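As an illustrative aside (an editorial sketch, not code from the thesis), the fragment below evaluates the naïve estimator in Equation (2.1) and the weighted form in Equation (2.2) with a Gaussian kernel; the sample, bandwidth and tail-tilted weights are placeholder choices rather than those derived in this chapter.

```python
import numpy as np
from scipy.stats import norm

def naive_kde(x_grid, sample, h):
    """Equation (2.1): equal weight 1/n on each kernel."""
    u = (x_grid[:, None] - sample[None, :]) / h
    return norm.pdf(u).sum(axis=1) / (len(sample) * h)

def weighted_kde(x_grid, sample, h, p):
    """Equation (2.2): weight p_j on the kernel centred at X_j."""
    u = (x_grid[:, None] - sample[None, :]) / h
    return (norm.pdf(u) * p[None, :]).sum(axis=1) / h

rng = np.random.default_rng(3)
sample = rng.standard_t(df=5, size=250)
h = 0.4                                    # placeholder bandwidth
grid = np.linspace(-6, 6, 201)

# Arbitrary tilt: extra weight on observations in the lower 5% tail.
tilt = 1.0 + 0.5 * (sample < np.quantile(sample, 0.05))
p_tail = tilt / tilt.sum()

f_naive = naive_kde(grid, sample, h)
f_weighted = weighted_kde(grid, sample, h, p_tail)
print(np.trapz(f_naive, grid), np.trapz(f_weighted, grid))   # both integrate to about 1
```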
In order to estimate the value of v, a quantile estimator is used. A popular quantile estimator is an L-estimator, which is a weighted linear combination of the order statistics. The form of the L-estimator is given by

Q̂(p) = Σ_{j=1}^{n} w_j X_(j),

where the weights satisfy Σ_{j=1}^{n} w_j = 1, with 0 ≤ w_j ≤ 1 for j = 1, ..., n, and X_(j) is the jth order statistic. There are several quantile estimators proposed by Harrell & Davis (1982), Huang & Brill (1999) and Huang (2001). Dielman et al. (1994) show that the H-D estimator gives the most reliable estimates, through a large number of simulations comparing other L-estimators. The original H-D estimator is given by

Q̂_{HD}(p) = Σ_{j=1}^{n} [ ∫_{(j−1)/n}^{j/n} t^{(n+1)p−1} (1 − t)^{(n+1)q−1} / β((n+1)p, (n+1)q) dt ] X_(j),    (2.5)

where q = 1 − p and β(·, ·) is the Beta function.
Before we carry out any calculations, we need to define each weight p_j. Firstly, p_j must be a non-negative number and all the weights must satisfy

Σ_{j=1}^{n} p_j = 1,    (2.8)

where 0 ≤ p_j ≤ 1. These are some of our constraints.
Before we add the constraints to the density function, we need to have a preliminary density function. The constrained kernel estimator is a modified version of the naïve kernel estimator, which is chosen to be our preliminary density function. Then, to ascertain the efficacy of our method, we can measure the distance between the naïve kernel estimator and the constrained kernel estimator by using the Kullback-Leibler distance, as in Cressie & Read (1984).
Trang 30func-Now, we let d be the Kullback-Leibler distance, so that d = P
jpjln(pj/qj),where pj is the weight which optimises the constrained kernel estimator and qj isthe fixed weight for the na¨ıve kernel estimator We consider the Kullback-Leiblerdistance between the constrained kernel estimator and na¨ıve kernel estimator issubject to the above constraints, where qj = 1/n for j = 0, 1, 2, , n to representthe na¨ıve kernel estimator Minimising the Kullback-Leibler distance is equivalent
to maximising Qn
j=1pj subject to the same constraints, for which we use LagrangeMultipliers Owen (1988) and Owen (1990) state that a unique maximum exists,provided that α is inside the convex hull of the points (a1, a1, , an), where
"
1h
− α
# (2.9)
In Equations (2.10)-(2.12), we have the partial derivatives of S with respect to p_j (j = 1, ..., n), λ_1 and λ_2, respectively, and we maximise Π_{i=1}^{n} p_i by setting these equations to zero. We rearrange the partial derivative of S with respect to p_1 to obtain an expression for Π_{j=1}^{n} p_j in terms of λ_1, λ_2 and the kernel integrals ∫_{−∞}^{v} (1/h) K((x − X_k)/h) dx; combining this expression with the constraints reduces the problem to finding the roots of a single function, g, of one parameter µ determined by the Lagrange multipliers.
We can employ the Powell Dogleg Method3 (Powell 1970) to find the roots. However, there are special cases of µ that give the function g the value zero. Firstly, µ = 0 is one such value. This solution induces the weights p_1 = p_2 = ... = p_n = 1/n. If we put these weights into Equation (2.2), we get Equation (2.1), which is exactly the naïve kernel density estimator. Moreover, µ = 0 implies that λ_2 = 0. If we substitute λ_2 = 0 into Equation (2.9), then the quantile constraint term disappears in the expression for the Lagrange Multiplier, which means that the quantile at the constrained level using the naïve kernel estimator equals the H-D quantile, and also gives a zero Kullback-Leibler distance between these two estimators. This implies the constrained kernel estimator is the same as the naïve kernel estimator. In this case, the constraint is not necessary. Also, Owen (1988) and Owen (1990) state that a unique maximum exists under the constraints in Equations (2.3) and (2.8).
3 It is a built-in function in Matlab 2006a with an optimisation package.
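A sketch of how the constrained weights might be computed in practice. Instead of the Lagrange-multiplier and Powell Dogleg route used in the text, it maximises Σ_j log p_j (which has the same maximiser as Π_j p_j) subject to Equations (2.8) and (2.3), using a general-purpose solver; the sample, bandwidth and level are placeholders, and the H-D estimate supplies v as described above.

```python
import numpy as np
from scipy.stats import norm, beta
from scipy.optimize import minimize

def hd_quantile(sample, p):
    """Harrell-Davis estimate of the p-quantile (cf. Equation (2.5))."""
    n = len(sample)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    edges = np.arange(n + 1) / n
    w = beta.cdf(edges[1:], a, b) - beta.cdf(edges[:-1], a, b)
    return np.sum(w * np.sort(sample))

rng = np.random.default_rng(4)
x = rng.standard_t(df=5, size=100)
alpha, h = 0.01, 0.4                       # placeholder level and bandwidth
v = hd_quantile(x, alpha)                  # quantile constraint value

# a_j: integral of the j-th Gaussian kernel up to v, as in the convex-hull
# condition above; for a Gaussian kernel this is a Normal CDF.
a = norm.cdf((v - x) / h)
n = len(x)

def neg_log_prod(p):
    return -np.sum(np.log(p))              # minimising this maximises the product

constraints = (
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},       # Equation (2.8)
    {"type": "eq", "fun": lambda p: np.sum(p * a) - alpha},  # Equation (2.3)
)
res = minimize(neg_log_prod, x0=np.full(n, 1.0 / n),
               bounds=[(1e-10, 1.0)] * n, constraints=constraints,
               method="SLSQP", options={"maxiter": 200})
p_hat = res.x
print("constrained tail mass:", np.sum(p_hat * a), "target:", alpha)
```

When the naïve-kernel quantile already equals the H-D estimate, the solver returns the uniform weights 1/n, mirroring the µ = 0 case discussed above.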
2.3 Value-at-Risk

In this Section, a number of quantile estimators will be introduced. They are all commonly used in estimating the VaR. These estimators will be used for comparison with the quantile estimators based on the constrained and naïve kernel estimators. Before introducing the estimators, we define n to be the sample size, p to be the quantile level and X_(i) to be the ith order statistic.
Empirical Quantile Estimator (EQ) This is a very basic quantile estimator. This estimator has several versions. The one that we use is given by

Q̂_{EQ}(p) = w X_([np+0.5]) + (1 − w) X_([np+0.5]+1),

where

w = 0.5 + [np + 0.5] − np,

where [ ] denotes the integer part, and we set X_(0) = X_(1). This estimator is such that X_(i) has quantile level p = (i − 0.5)/n. Also, the quantile is estimated by linear interpolation if the quantile level is between p = (i − 0.5)/n and p = (i + 0.5)/n.

Kaigh-Lachenbruch Estimator (K-L) The K-L estimator is an L-estimator, which is likewise a weighted linear combination of the order statistics.
H-D Quantile Estimator (H-D) The form of the H-D estimator has been given in Equation (2.5). Inui, Kijima & Kitano (2005) give a very detailed description of the behaviour of the H-D estimator. They found that, for the H-D estimator, the bias increases as the quantile level and sample size decrease. Also, a better performance of the H-D estimator eventuates when it is applied to less heavy-tailed distributions.
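The two estimators just described can be sketched as follows (an illustration only; the empirical-quantile interpolation follows the weight w given above, and the sample and levels are arbitrary).

```python
import numpy as np
from scipy.stats import beta

def empirical_quantile(sample, p):
    """Interpolated empirical quantile with w = 0.5 + [np + 0.5] - np."""
    n = len(sample)
    x = np.sort(sample)
    x = np.concatenate(([x[0]], x))        # set X_(0) = X_(1)
    i = int(np.floor(n * p + 0.5))         # [np + 0.5]
    w = 0.5 + i - n * p
    upper = x[min(i + 1, n)]
    return w * x[i] + (1.0 - w) * upper

def harrell_davis(sample, p):
    """Harrell-Davis L-estimator: Beta((n+1)p, (n+1)q) weights on order statistics."""
    n = len(sample)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    edges = np.arange(n + 1) / n
    w = beta.cdf(edges[1:], a, b) - beta.cdf(edges[:-1], a, b)
    return np.sum(w * np.sort(sample))

rng = np.random.default_rng(5)
sample = rng.standard_t(df=5, size=250)
for p in (0.01, 0.02, 0.05):
    print(p, empirical_quantile(sample, p), harrell_davis(sample, p))
```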
Trang 34Extreme Value Theory McNeil (1999) describes the application of ExtremeValue Theory (EVT) in risk management VaR estimation by EVT is based onthe peaks-over-threshold (POT) model Let F (x) be the distribution function ofnegative returns (losses) and u be the threshold value:
F (x + u) = F (u) + (1 − F (u)){1 − (1 + ξx/β)−1/ξ}
Let y = x + u, and then
F (y) = F (u) + (1 − F (u)){1 − (1 + ξ(y − u)/β)−1/ξ}
After rearranging,
F (y) = 1 − (1 − F (u)) + (1 − F (u))1 − (1 + ξ(y − u)/β)−1/ξ
F (y) = 1 − (1 − F (u))1 −1 − (1 + ξ(y − u)/β)−1/ξ
F (y) = 1 − (1 − F (u)){1 + ξ(y − u)/β}−1/ξ (2.21)and
Trang 35Equation (2.21) gives the distribution function of loss under the POT model andEquation (2.22) gives the inverse of the distribution function of loss For the VaRwith negative return, an estimator is given by
The unknown parameters β and ξ are estimated by Maximum Likelihood (ML)
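A hedged sketch of the POT calculation (not the thesis's implementation): the GPD is fitted to threshold excesses by maximum likelihood using scipy's genpareto, 1 − F(u) is replaced by the proportion of exceedances, and the data, threshold and level are arbitrary choices.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(6)
losses = -rng.standard_t(df=4, size=2000) * 0.01   # negative returns as losses

u = np.quantile(losses, 0.95)                      # illustrative threshold choice
excess = losses[losses > u] - u
n, n_u = len(losses), len(excess)

# Maximum-likelihood fit of the GPD to the excesses; location fixed at zero.
xi_hat, _, beta_hat = genpareto.fit(excess, floc=0.0)

q = 0.99                                           # probability level for the loss quantile
var_pot = u + (beta_hat / xi_hat) * (((n / n_u) * (1 - q)) ** (-xi_hat) - 1.0)
print("POT VaR at the 99% level:", var_pot)
```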
2.4 Simulation Study

In this Section, we study the performance of the tail and quantile estimation using the constrained kernel estimator for tail density functions of different heaviness. We carry out a simulation experiment using several distributions and examine properties of their tail distributions. In this study, we draw 100 samples from each candidate distribution and estimate the tail individually using both the naïve kernel and the constrained kernel estimators. The properties of these candidate distributions are described in Section 2.4.1. Then, the choice of the kernel function and the bandwidth for the naïve and constrained kernel estimators is described in Section 2.4.2. The convergence and the tail estimation of the constrained kernel estimator are investigated by comparison with the naïve kernel estimators in Sections 2.4.5 and 2.4.3. Also, the EQ, H-D, K-L, EVT and naïve kernel estimators are used for comparison with the quantile estimator based on the constrained kernel estimator in Section 2.4.4. A hypothesis test of the constrained kernel estimator is illustrated in Section 2.4.6.
2.4.1 Densities Investigated

In many papers, the Normal distribution is assumed for return distributions. The Student-t distributions with 5 and 7 degrees of freedom are also bell-shaped densities, and these densities have different heaviness of the tails. We can thus compare performance for distributions with flat tails and thick tails. These densities are plotted in Figure 2.1.
Figure 2.1: The true tail density of the Normal and Student-t densities with degrees of freedom 5 and 7. They are represented by the solid line, dotted line and dashed line, respectively. The locations of the true quantiles at 0.01 and 0.02 are represented by a diamond and a triangle, respectively.
Bandwidth
Silverman (1986a) points out that the choice of bandwidth is much more important than that of the kernel function. This is because the bandwidth determines the smoothness of the estimated density. In other words, the trade-off between the smoothness and the bias of the estimated density depends on the choice of the bandwidth. Also, if the estimated density is oversmoothed, then the estimated density may lose some local turning points. In contrast, an undersmoothed estimated density will have too many local, possibly spurious, turning points. Turlach (1993) gives an overview of bandwidth selection. There are several reasons why there is no unique optimal bandwidth. First of all, the optimal bandwidth is derived by minimising error, and there is more than one type of error measurement, such as the Mean Integrated Squared Error (MISE) and the Approximate Mean Integrated Squared Error (AMISE) (see Equation (2.26)). Secondly, the expression for the error measurement always involves some unknown term(s), and these unknown term(s) are always related to the true density. Many different methods to estimate these unknown term(s) are used in the literature, such as Hall & Marron (1987), Sheather & Jones (1991) and Hall & Wolff (1995).
Our simulation study will use the Maximal Smoothing Principle (see Terrell (1990)) for both the constrained and naïve kernel density estimators. Furthermore, the Sheather & Jones (1991) Plug-in bandwidth will also be applied to the naïve kernel estimator for comparison of quantile and tail estimation. These bandwidths are based on minimising the Approximate Mean Integrated Squared Error (AMISE), which is given by

AMISE(h) = R(K)/(nh) + (1/4) h^4 µ_2^2(K) R(f′′),    (2.26)

where K is the kernel function, R(L) = ∫ L^2(x) dx, n is the sample size, h is the bandwidth, µ_2(L) is the second moment of L, f is the true density and f′′ is the second derivative of f. The optimal bandwidth, h∞, is given by

h∞ = [ R(K) / {n R(f′′) µ_2^2(K)} ]^{1/5}.    (2.27)
Rule of Thumb and Maximal Smoothing Principle
The optimal bandwidth using the Rule of Thumb method (h_NR) is based on a reference distribution. The estimation of the unknown term R(f′′) in Equation (2.27) is based on this reference distribution. In the most popular case, this reference distribution is assumed to be the Normal distribution with standard deviation σ̂. If K is the Gaussian kernel, then h_NR ≈ 1.06 σ̂ n^{−1/5}.
Similar to the Rule of Thumb, the Maximal Smoothing Principle (MSP) is proposed by Terrell (1990) and Terrell & Scott (1985). Terrell (1990) obtained a lower bound for R(f′′). Hence, an upper bound for the optimal bandwidth can be obtained by substituting this lower bound for R(f′′) into Equation (2.27). Also, he proposes to use this upper bound as the optimal bandwidth, taking the standard deviation as the spread measure, which is the measurement of the variability of the sample. The bandwidth is given by

h_MSP = 3 (35)^{−1/5} σ̂ {R(K)/µ_2^2(K)}^{1/5} n^{−1/5}.

If K is the Gaussian kernel, then R(K) = (2√π)^{−1} and µ_2 = 1, and h_MSP ≈ 1.144 σ̂ n^{−1/5}.
Sheather and Jones Plug-in Method
The Sheather and Jones Plug-in Method is proposed by Sheather & Jones (1991). This method uses kernel-based estimators of R(f′′) (Equation (2.29)), in which ϕ^{(6)}, the sixth derivative of the Gaussian kernel, appears with pilot bandwidths a = 0.920λn^{−1/7} and b = 0.912λn^{−1/9}, where λ is the interquartile range. The function f in R̂_a and R̂_b uses the Normal distribution as the reference distribution. Finally, a function of h can be derived by substituting Equation (2.29) into Equation (2.27), and the optimal bandwidth h_SJ can be obtained by solving the resulting equation.
Approximate 95% confidence interval
In our simulation study, the 95% confidence interval (C.I.) is approximated by

C.I. ≈ µ ± 2σ/√n.
4 See Sheather & Marron (1990).