Methodology and Theory for the Bootstrap

P. Hall

Handbook of Econometrics, Chapter 39



Abstract

A brief account is given of the methodology and theory for the bootstrap. Methodology is developed in the context of the "equation" approach, which allows attention to be focussed on specific criteria for excellence, such as coverage error of a confidence interval or expected value of a bias-corrected estimator. This approach utilizes a definition of the bootstrap in which the key component is replacing a true distribution function by its empirical estimator. Our theory is Edgeworth expansion based, and is aimed specifically at elucidating properties of different methods for constructing bootstrap confidence intervals in a variety of settings. The reader interested in more detail than can be provided here is referred to the recent monograph of Hall (1992).


Barnard (1963), Hope (1968) and Marriott (1979). Our definition of the bootstrap would not regard Monte Carlo testing as a bootstrap procedure. That may be seen as either an advantage or a disadvantage, depending on one's view.

A second objection that one may have to defining the "bootstrap" strictly in terms of whether or not Monte Carlo methods are employed is that the method of numerical computation becomes intrinsic to the definition. To cite an extreme case, one would not usually think of using Monte Carlo methods to compute a sample mean or variance, but nevertheless those quantities might reasonably be regarded as bootstrap estimators of the population mean and variance, respectively. In a less obvious instance, estimators of bootstrap distribution functions, which would usually be candidates for approximation by Monte Carlo methods, may sometimes be computed most effectively by exact, non-Monte Carlo methods. See for example Fisher and Hall (1991). In other settings, saddlepoint methods provide excellent alternatives to simulation; see Davison and Hinkley (1988) and Reid (1988). Does a technique stop being a bootstrap method as soon as non-Monte Carlo methods are employed? To argue that it does seems unnecessarily pedantic, but to deny that it does would cause some problems for a bootstrap definition based on the notion of simulation.

The name "bootstrap" was introduced by Efron (1979), and it is appropriate here to emphasize the fundamental contributions that he made. As Efron was careful to point out, bootstrap methods (in the sense of replacing $F_0$ by $F_1$) had been around for many years before his seminal paper. But he was perhaps the first to perceive the enormous breadth of this class of methods. He saw too that the power of modern computing machinery could be harnessed to allow functionals of $F_1$ to be computed in very diverse circumstances. The combination of these two observations is extremely powerful, and its ultimate effect on Statistics will be revolutionary. Necessarily, these two observations go together; the vast range of applications of bootstrap methods would not be possible without a facility for extremely rapid simulation. However, that fact does not imply that bootstrap methods are restricted to situations where simulation is employed for calculation.

Statistical scientists who thought along lines similar to Efron include Hartigan (1969, 1971), who used resampled sub-samples to construct point and interval estimators, and who stressed connections with Mahalanobis' "interpenetrating samples" and the jackknife of Quenouille (1949, 1956) and Tukey (1958); and Simon (1969, Chapters 23-25), who described a variety of Monte Carlo methods.

Let us accept, for the sake of argument, that bootstrap methods are defined by the "replace $F_0$ by $F_1$" rule, described above. Two challenges immediately emerge in response to this definition. First, we must determine how to "focus" this concept, so as to make the bootstrap responsive to statistical demands. That is, how do we decide which functionals of $F_0$ should be estimated? This requires a "principle" that enables us to implement bootstrap methods in a range of circumstances. The second challenge is that of calculating the values of those functionals in a practical setting. The latter problem may be solved partly by providing simulation methods or related devices, such as saddlepoint arguments, for numerical approximation. Space limitations mean that a thorough account of these techniques is beyond the scope of this chapter. However, a detailed account of efficient methods of bootstrap simulation may be found in Appendix II of Hall (1992). A key part of the answer to the first question is the development of theory describing the relative performance of different forms of the bootstrap, and that issue will be addressed at some length here.

Our answer to the first question is provided in Section 2, where we describe an "equation approach" to focussing attention on specific statistical questions. This technique was discussed in more detail by Hall and Martin (1988), Martin (1989) and Hall (1992, Chapter 1). It leads naturally to bootstrap iteration, which is discussed in Section 3. Section 4 presents theory that enables comparisons to be made of different bootstrap approaches to inference about distributions. The reader is referred to Hinkley (1988) and DiCiccio and Romano (1988) for excellent reviews of bootstrap methods.

Our discussion is necessarily kept brief and is essentially an abbreviated form of an account that may be found in Hall (1992). In undertaking that abbreviation we have omitted discussion of a variety of different approaches to the bootstrap. In particular, we do not discuss various forms of bias correction, not because we do not recommend it but because space does not permit an adequate survey. We readily concede that the restricted account of bootstrap methods and theory presented here is in need of a degree of bias correction itself!

We do not address in any detail the bootstrap for dependent data, but pause here to outline the main issues. There are two main approaches to implementing the bootstrap in dependent settings. The first is to model the dependent process as one that is driven by independent and identically distributed disturbances; examples include autoregressions and moving averages. We describe briefly here a technique which may be used when no parametric assumptions are made about the distribution of the disturbances. First estimate the parameters of the model, and calculate the residuals (i.e. the estimated values of the independent disturbances). Then run the process over and over again, by Monte Carlo simulation, with parameter values set equal to their estimated values and with the bootstrapped independent disturbances obtained by resampling randomly, with replacement, from the set of residuals. Each resampled process should be of the same length as the original one, and bootstrap inference may be conducted by averaging over the independent Monte Carlo replications. Bose (1988) addresses the efficacy of this procedure in the context of autoregressive models, and derives results that may be viewed as analogues (in the case of autoregressive processes) of some of those discussed later in this chapter for independent data.
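To make the recipe concrete, here is a minimal sketch for a first-order autoregression; the AR(1) model, the least-squares estimator, and all function names are our own illustrative assumptions rather than anything prescribed in the text.

```python
import numpy as np

def ar1_residual_bootstrap(x, B=999, rng=None):
    """Residual bootstrap for an AR(1) model x_t = rho * x_{t-1} + e_t.
    Returns the estimate of rho and B bootstrap replicates of it."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Step 1: estimate the model parameter (least squares).
    rho_hat = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    # Step 2: compute the residuals (estimated i.i.d. disturbances), centred.
    resid = x[1:] - rho_hat * x[:-1]
    resid -= resid.mean()
    # Step 3: re-run the process B times, driving it with disturbances
    # resampled randomly, with replacement, from the residuals; each
    # resampled series has the same length as the original.
    rho_star = np.empty(B)
    for b in range(B):
        e_star = rng.choice(resid, size=n - 1, replace=True)
        x_star = np.empty(n)
        x_star[0] = x[0]
        for t in range(1, n):
            x_star[t] = rho_hat * x_star[t - 1] + e_star[t - 1]
        rho_star[b] = (np.dot(x_star[1:], x_star[:-1])
                       / np.dot(x_star[:-1], x_star[:-1]))
    return rho_hat, rho_star
```

Bootstrap inference then proceeds by averaging over, or taking quantiles of, the `rho_star` replicates.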

If the distribution of disturbances is assumed known then, rather than estimate residuals and resample with replacement from those, the parameters of the assumed distribution may be estimated. The bootstrap disturbances may now be derived by resampling from the hypothesized distribution, with parameters estimated.


The other major way of bootstrapping dependent processes is to divide the data sequence into blocks, and resample the blocks rather than individual data values. This approach has application in spatial as well as "linear" or time series contexts, and indeed was apparently first suggested for spatial data; see Hall (1985). Blocking methods may involve either non-overlapping blocks, as in the technique treated by Carlstein (1986), or overlapping blocks, as proposed by Künsch (1989). (Both methods were considered for spatial data by Hall (1985).) In sheer asymptotic terms Künsch's method has advantages over Carlstein's, but those advantages are not always apparent in practice. This matter has been addressed by Hall and Horowitz (1993) in the context of estimating bias or variance, and there the matter of optimal block width has been treated. The issue of distribution estimation using blocking methods has been discussed by Götze and Künsch (1990), Lahiri (1991, 1992) and Davison and Hall (1993).
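The following sketch contrasts the two blocking schemes; the block length, function names and use of NumPy are our own assumptions, and choosing the block width well is exactly the optimality issue discussed above.

```python
import numpy as np

def block_bootstrap(x, block_len, B=999, overlapping=True, rng=None):
    """Resample a series in blocks rather than individual values.
    overlapping=True: moving blocks in the spirit of Kunsch (1989);
    overlapping=False: non-overlapping blocks as in Carlstein (1986)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    if overlapping:
        blocks = np.array([x[i:i + block_len]
                           for i in range(n - block_len + 1)])
    else:
        n_full = n // block_len
        blocks = x[:n_full * block_len].reshape(n_full, block_len)
    k = -(-n // block_len)  # ceil(n / block_len): blocks per resampled series
    out = np.empty((B, k * block_len))
    for b in range(B):
        idx = rng.integers(0, len(blocks), size=k)
        out[b] = blocks[idx].ravel()
    return out[:, :n]  # trim each resampled series to the original length
```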

2 A formal definition of the bootstrap principle

Much of statistical inference involves describing the relationship between a sample and the population from which the sample was drawn. Formally, given a functional $f_t$ from a class $\{f_t : t \in \mathcal{T}\}$, we wish to determine that value $t_0$ of $t$ that solves an equation such as

$$E\{f_t(F_0, F_1) \mid F_0\} = 0, \qquad (2.1)$$

where $F = F_0$ denotes the population distribution function and $\hat F = F_1$ is the distribution function "of the sample". An explicit definition of $F_1$ will be given shortly. Conditioning on $F_0$ in (2.1) serves to stress that the expectation is taken with respect to the distribution $F_0$. We call (2.1) the population equation because we need properties of the population if we are to solve this equation exactly.

For example, let $\theta_0 = \theta(F_0)$ denote a true parameter value, such as the $r$th power


We call this the sample equation because we know (or can find out) everything about it once we know the sample distribution function $F_1$. In particular, its solution $\hat t_0$ is a function of the sample values.

We call $\hat t_0$ and $E\{f_t(F_1, F_2) \mid F_1\}$ "the bootstrap estimators" of $t_0$ and $E\{f_t(F_0, F_1) \mid F_0\}$, respectively. They are obtained by replacing $F_0$ by $F_1$ in formulae for $t_0$ and $E\{f_t(F_0, F_1) \mid F_0\}$. In the bias correction problem, where $f_t$ is given by (2.2), the bootstrap version of our bias-corrected estimator is $\hat\theta + \hat t_0$. In the confidence interval problem where (2.3) describes $f_t$, our bootstrap confidence interval is $(\hat\theta - \hat t_0,\ \hat\theta + \hat t_0)$. The latter is commonly called a (symmetric) percentile-method confidence interval for $\theta_0$.

The "bootstrap principle" might be described in terms of this approach to estimation of a population equation.

It is appropriate now to give detailed definitions of $F_1$ and $F_2$. There are two approaches, suitable for nonparametric and parametric problems respectively. In both, inference is based on a sample $\mathcal{X}$ of $n$ random (independent and identically distributed) observations of the population. In the nonparametric case, $F_1$ is simply the empirical distribution function of $\mathcal{X}$; that is, the distribution function of the distribution that assigns mass $n^{-1}$ to each point in $\mathcal{X}$. The associated empirical probability measure assigns to a region $B$ a value equal to the proportion of the sample that lies within $B$. Similarly, $F_2$ is the empirical distribution function of a sample drawn at random from the population with distribution function $F_1$; that is, the empiric of a sample $\mathcal{X}^*$ drawn randomly, with replacement, from $\mathcal{X}$. If we denote the population by $\mathcal{X}_0$ then we have a nest of sampling operations: $\mathcal{X}$ is drawn at random from $\mathcal{X}_0$ and $\mathcal{X}^*$ is drawn at random from $\mathcal{X}$.
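In code, the nest of sampling operations is just resampling with replacement; a minimal sketch (NumPy and the particular population are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# X is drawn from the population X0 (here an exponential law stands in
# for F0); the empirical distribution function of X is F1.
x = rng.exponential(size=20)

# X* is drawn randomly, with replacement, from X; the empirical
# distribution function of X* is F2.
x_star = rng.choice(x, size=len(x), replace=True)
```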


In the parametric case, $F_0$ is assumed completely known up to a finite vector $\lambda_0$ of unknown parameters. To indicate this dependence we write $F_0 = F_{(\lambda_0)}$, an element of a class $\{F_{(\lambda)},\ \lambda \in \Lambda\}$ of possible distributions. Let $\hat\lambda$ be an estimator of $\lambda_0$ computed from $\mathcal{X}$, often (but not necessarily) the maximum likelihood estimator. It will be a function of sample values, so we may write it as $\hat\lambda = \lambda(\mathcal{X})$. Then $F_1 = F_{(\hat\lambda)}$, the distribution function obtained on replacing "true" parameter values by their sample estimates. Let $\mathcal{X}^*$ denote the sample drawn at random from the distribution with distribution function $F_{(\hat\lambda)}$ (not simply drawn from $\mathcal{X}$ with replacement), and let $\hat\lambda^* = \lambda(\mathcal{X}^*)$ denote the version of $\hat\lambda$ computed for $\mathcal{X}^*$ instead of $\mathcal{X}$. Then $F_2 = F_{(\hat\lambda^*)}$.

It is appropriate now to discuss two examples that illustrate the bootstrap principle.

Example 2.1 Bias reduction

Here the function $f_t$ is given by (2.2), and the sample equation (2.4) assumes the form

$$E\{\theta(F_2) - \theta(F_1) + t \mid F_1\} = 0.$$

Its solution may be approximated by Monte Carlo simulation: conditional on $F_1$, draw $B$ resamples $\{\mathcal{X}_b^*,\ 1 \le b \le B\}$ independently from the distribution with distribution function $F_1$. In the nonparametric case, where $F_1$ is the empirical distribution function of the sample $\mathcal{X}$, let $F_{2,b}$ denote the empirical distribution function of $\mathcal{X}_b^*$. In the parametric case, let $\hat\lambda_b^* = \lambda(\mathcal{X}_b^*)$ be the estimator of $\lambda_0$ computed from resample $\mathcal{X}_b^*$, and put $F_{2,b} = F_{(\hat\lambda_b^*)}$. Define $\hat\theta_b^* = \theta(F_{2,b})$ and $\hat\theta = \theta(F_1)$. Then in both parametric and nonparametric circumstances,

$$B^{-1} \sum_{b=1}^{B} \hat\theta_b^*$$

converges to $E\{\theta(F_2) \mid F_1\} = E(\hat\theta^* \mid \mathcal{X})$ (with probability one, conditional on $F_1$) as $B \to \infty$.
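A minimal sketch of this Monte Carlo approximation, and of the resulting bias-corrected estimator $\hat\theta + \hat t_0 = 2\hat\theta - B^{-1}\sum_b \hat\theta_b^*$, in the nonparametric case (the function names and the choice of $\theta$ as the squared mean are our own illustrations):

```python
import numpy as np

def bias_corrected(x, theta, B=999, rng=None):
    """Bootstrap additive bias correction: 2*theta(F1) - mean_b theta(F2_b)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    theta_hat = theta(x)  # theta(F1)
    theta_star = np.array([theta(rng.choice(x, size=len(x), replace=True))
                           for _ in range(B)])  # theta(F2_b), b = 1, ..., B
    # theta_star.mean() approximates E{theta(F2) | F1} as B -> infinity.
    return 2 * theta_hat - theta_star.mean()

rng = np.random.default_rng(1)
data = rng.normal(size=30)
est = bias_corrected(data, lambda s: s.mean() ** 2, rng=2)
```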


Example 2.2 Confidence interval

A symmetric confidence interval for $\theta_0 = \theta(F_0)$ may be constructed by applying the resampling principle using the function $f_t$ given by (2.3). The sample equation then assumes the form

$$P\{\theta(F_2) - t \le \theta(F_1) \le \theta(F_2) + t \mid F_1\} - 0.95 = 0. \qquad (2.6)$$

In a nonparametric context $\theta(F_2)$, conditional on $F_1$, has a discrete distribution and so it would seldom be possible to solve (2.6) exactly. However, any error in the solution of (2.6) will usually be very small, since the size of even the largest atom of the distribution of $\theta(F_2)$ decreases exponentially quickly with increasing $n$. The largest atom is of size only $3.6 \times 10^{-4}$ when $n = 10$. We could remove this minor difficulty by smoothing the distribution function $F_1$. In parametric cases, (2.6) may usually be solved exactly for $t$.

The interval $(\hat\theta - \hat t_0,\ \hat\theta + \hat t_0)$ is a bootstrap confidence interval for $\theta_0 = \theta(F_0)$, usually called a (two-sided, symmetric) percentile interval since $\hat t_0$ is a percentile of the distribution of $|\theta(F_2) - \theta(F_1)|$ conditional on $F_1$. Other nominal 95% percentile intervals include the two-sided, equal-tailed interval $(\hat\theta - \hat t_{01},\ \hat\theta + \hat t_{02})$ and the one-sided interval $(-\infty,\ \hat\theta + \hat t_{03})$, where $\hat t_{01}$, $\hat t_{02}$ and $\hat t_{03}$ solve sample equations analogous to (2.6).

Still other 95% percentile intervals are $\hat I_1 = (\hat\theta - \hat t_{02},\ \hat\theta + \hat t_{01})$ and $\hat I_2 = (-\infty,\ \hat\theta + \hat t_{04})$, where $\hat t_{04}$ is the solution of

$$P\{\theta(F_2) \le \theta(F_1) - t \mid F_1\} - 0.05 = 0.$$

These do not fit naturally into a systematic development of bootstrap methods by frequentist arguments, and we find them a little contrived. They are sometimes


motivated as follows. Define $\hat\theta^* = \theta(F_2)$, $\hat H(x) = P(\hat\theta^* \le x \mid \mathcal{X})$ and

$$\hat H^{-1}(\alpha) = \inf\{x : \hat H(x) \ge \alpha\}.$$

Then

$$\hat I_1 = [\hat H^{-1}(0.025),\ \hat H^{-1}(0.975)] \quad \text{and} \quad \hat I_2 = [-\infty,\ \hat H^{-1}(0.95)].$$

All these intervals cover $\theta_0$ with probability approximately 0.95, which might be called the nominal coverage. Coverage error is defined to be true coverage minus nominal coverage; it generally converges to zero as sample size increases.
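A minimal sketch of the symmetric percentile interval in the nonparametric case, with $\hat t_0$ approximated by Monte Carlo (function names and defaults are our own assumptions):

```python
import numpy as np

def symmetric_percentile_interval(x, theta, B=999, alpha=0.95, rng=None):
    """(theta_hat - t0, theta_hat + t0), where t0 is the alpha-quantile of
    |theta(F2) - theta(F1)| conditional on F1."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    theta_hat = theta(x)
    dev = np.array([abs(theta(rng.choice(x, size=len(x), replace=True))
                        - theta_hat) for _ in range(B)])
    t0 = np.quantile(dev, alpha)
    return theta_hat - t0, theta_hat + t0
```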

We now treat in more detail the construction of two-sided, symmetric percentile intervals in parametric problems. There, provided the distribution functions $F_{(\lambda)}$ are continuous, equation (2.6) may be solved exactly. We focus attention on the cases where $\theta_0 = \theta(F_0)$ is a population mean and the population is normal or exponential. Our main aim is to bring out the virtues of pivoting, which usually amounts to rescaling so that the distribution of a statistic depends less on unknown parameters.

If the population is Normal $N(\mu, \sigma^2)$ and we use the maximum likelihood estimator $\hat\lambda = (\bar X, \hat\sigma^2)$ to estimate $\lambda_0 = (\mu, \sigma^2)$, then the sample equation (2.6) may be rewritten as

$$P(|N| \le n^{1/2} t / \hat\sigma) - 0.95 = 0, \qquad (2.7)$$

where $N$ is Standard Normal; its solution is $\hat t_0 = n^{-1/2} x_{0.95}\, \hat\sigma$, with $x_{0.95}$ defined by $P(|N| \le x_{0.95}) = 0.95$. The resulting interval $(\bar X - n^{-1/2} x_{0.95}\hat\sigma,\ \bar X + n^{-1/2} x_{0.95}\hat\sigma)$ has coverage error

$$P(\bar X - n^{-1/2} x_{0.95}\hat\sigma \le \mu \le \bar X + n^{-1/2} x_{0.95}\hat\sigma) - 0.95 = P\{n^{1/2} |\bar X - \mu| / \hat\sigma \le x_{0.95}\} - 0.95. \qquad (2.8)$$

Of course, $n^{1/2}(\bar X - \mu)/\hat\sigma$ does not have a Normal distribution, but a rescaled Student's $t$ distribution with $n - 1$ degrees of freedom. Therefore the coverage error is essentially that which results from approximating Student's $t$ distribution by a Normal distribution, and so is $O(n^{-1})$. (See Kendall and Stuart (1977, p. 404).) That is disappointing, particularly as classical methods lead so easily to an interval with precisely known coverage in this important special case.


To appreciate why the percentile interval has this inadequate performance, let us go back to our parametric example involving the Normal distribution. The root cause of the problem there is that $\hat\sigma$, and not $\sigma$, appears on the right-hand side in (2.8). This happens because the sample equation (2.6), equivalent here to (2.7), depends on $\hat\sigma$. Put another way, the population equation (2.1), equivalent to

$$P\{|\theta(F_1) - \theta(F_0)| \le t\} = 0.95,$$

depends on $\sigma^2$, the population variance. This occurs because the distribution of $|\theta(F_1) - \theta(F_0)|$ depends on the unknown $\sigma$. We should try to eliminate, or at least minimize, this dependence.

A function $T$ of both the data and an unknown parameter is said to be (exactly) pivotal if it has the same distribution for all values of the unknowns. It is asymptotically pivotal if, for sequences of known constants $\{a_n\}$ and $\{b_n\}$, $a_n T + b_n$ has a proper nondegenerate limiting distribution not depending on unknowns. We may convert $\theta(F_1) - \theta(F_0)$ into a pivotal statistic by correcting for scale, changing it to $T = \{\theta(F_1) - \theta(F_0)\}/\hat\tau$, where $\hat\tau = \tau(F_1)$ is an appropriate scale estimator. In our example about the mean there are usually many different choices for $\hat\tau$, e.g. the sample standard deviation $\{n^{-1}\sum_i (X_i - \bar X)^2\}^{1/2}$, the square root of the unbiased variance estimate, Gini's mean difference, and the interquartile range. In more complex problems, a jackknife standard deviation estimator is usually an option. Note that exactly the same confidence interval will be obtained if $\hat\tau$ is replaced by $c\hat\tau$, for any given $c \ne 0$, and so it is inessential that $\hat\tau$ be consistent for the asymptotic standard deviation of $\theta(F_1)$. What is important is pivotalness: exact pivotalness if we are to obtain a confidence interval with zero coverage error, asymptotic pivotalness if exact pivotalness is unattainable. If we change to a pivotal statistic then the function $f_t$ alters from the form given in (2.3) to

$$f_t(F_0, F_1) = I\{\theta(F_1) - t\,\tau(F_1) \le \theta(F_0) \le \theta(F_1) + t\,\tau(F_1)\} - 0.95. \qquad (2.9)$$

In the case of our parametric Normal model, any reasonable scale estimator $\hat\tau$ will give exact pivotalness. We shall take $\hat\tau = \hat\sigma$, where $\hat\sigma^2 = \sigma^2(F_1) = n^{-1}\sum_i (X_i - \bar X)^2$ denotes sample variance. Then $f_t$ becomes

$$f_t(F_0, F_1) = I\{\theta(F_1) - t\,\sigma(F_1) \le \theta(F_0) \le \theta(F_1) + t\,\sigma(F_1)\} - 0.95.$$

Using this functional in place of that at (2.3), but otherwise arguing exactly as before, equation (2.7) changes to

$$P\{|T_{n-1}| \le (n-1)^{1/2} t \mid F_1\} - 0.95 = 0, \qquad (2.10)$$

where $T_{n-1}$ has Student's $t$ distribution with $n - 1$ degrees of freedom and is stochastically independent of $F_1$. (Therefore the conditioning on $F_1$ in (2.10) is irrelevant.) Thus, the solution of the sample equation is $\hat t_0 = (n-1)^{-1/2} w_{0.95}$, where $w_\alpha = w_\alpha(n)$ is given by $P(|T_{n-1}| \le w_\alpha) = \alpha$. The bootstrap confidence interval is $(\bar X - \hat t_0\hat\sigma,\ \bar X + \hat t_0\hat\sigma)$, with perfect coverage accuracy,

$$P\{\bar X - (n-1)^{-1/2} w_{0.95}\,\hat\sigma \le \mu \le \bar X + (n-1)^{-1/2} w_{0.95}\,\hat\sigma\} = 0.95.$$

(Of course, the latter statement applies only to the parametric bootstrap under the assumption of a Normal model.)

Such confidence intervals are usually called percentile-t intervals, since $\hat t_0$ is a percentile of the Student's $t$-like statistic $|\theta(F_2) - \theta(F_1)|/\tau(F_2)$.

Perfect coverage accuracy of percentile-t intervals usually holds only in parametric problems where the underlying statistic is exactly pivotal. More generally, if symmetric percentile-t intervals are constructed in parametric and nonparametric problems by solving the sample equation when $f_t$ is defined by (2.9), where $\tau(F_1)$ is chosen so that $T = \{\theta(F_1) - \theta(F_0)\}/\tau(F_1)$ is asymptotically pivotal, then coverage error will usually be $O(n^{-2})$ rather than the $O(n^{-1})$ associated with ordinary percentile intervals.
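For comparison with the percentile sketch above, here is a minimal percentile-t sketch for a mean, using the sample standard deviation as the scale estimate $\tau$ (function names and defaults are our own assumptions):

```python
import numpy as np

def symmetric_percentile_t_interval(x, B=999, alpha=0.95, rng=None):
    """Symmetric percentile-t interval for a population mean: t0 is the
    alpha-quantile of |theta(F2) - theta(F1)| / tau(F2), given F1."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_hat, tau_hat = x.mean(), x.std()
    t_star = np.empty(B)
    for b in range(B):
        xs = rng.choice(x, size=n, replace=True)
        t_star[b] = abs(xs.mean() - theta_hat) / xs.std()  # studentized
    t0 = np.quantile(t_star, alpha)
    return theta_hat - t0 * tau_hat, theta_hat + t0 * tau_hat
```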

We conclude this example with remarks on the computation of critical points, such as $\hat v_\alpha$, by uniform Monte Carlo simulation. Further details, including an account of efficient Monte Carlo simulation, are given in Section 5.

Assume we wish to compute the solution $\hat v_\alpha$ of the equation

$$P[\{\theta(F_2) - \theta(F_1)\}/\tau(F_2) \le x \mid F_1] = \alpha, \qquad (2.11)$$

or, to be more precise, the value

$$\hat v_\alpha = \inf\{x : P[\{\theta(F_2) - \theta(F_1)\}/\tau(F_2) \le x \mid F_1] \ge \alpha\}.$$

Choose integers $B \ge 1$ and $1 \le v \le B$ such that $v/(B + 1) = \alpha$. For example, if $\alpha = 0.95$ then we could take $(v, B) = (95, 99)$ or $(950, 999)$. Conditional on $F_1$, draw $B$ resamples $\{\mathcal{X}_b^*,\ 1 \le b \le B\}$ independently from the distribution with distribution function $F_1$. In the nonparametric case, write $F_{2,b}$ for the empirical distribution function of $\mathcal{X}_b^*$. In the parametric case, where the population distribution function is $F_{(\lambda_0)}$ and $\lambda_0$ is a vector of unknown parameters, let $\hat\lambda$ and $\hat\lambda_b^*$ denote the estimates of $\lambda_0$ computed from the sample $\mathcal{X}$ and the resample $\mathcal{X}_b^*$, respectively, and put $F_{2,b} = F_{(\hat\lambda_b^*)}$. For both cases put $T_b^* = \{\theta(F_{2,b}) - \theta(F_1)\}/\tau(F_{2,b})$, and write $T^*$ for a generic $T_b^*$. In this notation, equation (2.11) is equivalent to $P(T^* \le \hat v_\alpha \mid \mathcal{X}) = \alpha$. Let $\hat v_{\alpha,B}$ denote the $v$th order statistic of the values $T_b^*$ (i.e. the $v$th smallest). Then $\hat v_{\alpha,B} \to \hat v_\alpha$ with probability one, conditional on $\mathcal{X}$, as $B \to \infty$. The value $\hat v_{\alpha,B}$ is a Monte Carlo approximation to $\hat v_\alpha$.
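In the same notation, the order-statistic approximation $\hat v_{\alpha,B}$ can be sketched as follows (the function name is our own; `t_star` stands for the array of resample statistics $T_b^*$, as computed in the percentile-t sketch above):

```python
import numpy as np

def critical_point(t_star, alpha=0.95):
    """Monte Carlo critical point: with v / (B + 1) = alpha, return the
    order statistic of rank v of the B resample statistics T*_b."""
    B = len(t_star)
    v = int(round(alpha * (B + 1)))  # e.g. alpha = 0.95, B = 999 gives v = 950
    return np.sort(t_star)[v - 1]
```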


3 Iterating the principle

Recall that in Section 2 we suggested that statistical inference often involves describing a relationship between the sample and the population. We argued that this leads to a bootstrap principle, which may be enunciated in terms of finding an empirical solution to a population equation, (2.1). The empirical solution is obtained by solving a sample version, (2.4), of the population equation. The notation employed in those equations includes taking $F_0$, $F_1$ and $F_2$ to denote the true population distribution function, the empirical distribution function, and the resample version of the empiric, respectively. The solution of the population equation is a functional of $F_0$, say $T(F_0)$, and the solution of the sample equation is the corresponding functional of the empiric, $T(F_1)$. The population equation may then be represented as

$$E\{f_{T(F_0)}(F_0, F_1) \mid F_0\} = 0,$$

with approximate solution $T(F_1)$,

$$E\{f_{T(F_1)}(F_0, F_1) \mid F_0\} \approx 0. \qquad (3.1)$$

The solution of the sample equation represents an approximation to the solution of the population equation. In many instances we would like to improve on this approximation; for example, to further reduce bias in a bias correction problem, or to improve coverage accuracy in a confidence interval problem. Therefore we introduce a correction term $t$ to the functional $T$, so that $T(\cdot)$ becomes $U(\cdot, t)$ with $U(\cdot, 0) \equiv T(\cdot)$. The adjustment may be multiplicative, for example $U(\cdot, t) \equiv (1 + t)\,T(\cdot)$. Or it may be an additive correction, as in $U(\cdot, t) = T(\cdot) + t$. Or $t$ might adjust some particular feature of $T$, as in the level-error correction for confidence intervals, which we shall discuss shortly. In all cases, the functional $U(\cdot, t)$ should be smooth in $t$. Our aim is to choose $t$ so as to improve on the approximation (3.1).

Ideally, we would like to solve the equation

$$E\{f_{U(F_1, t)}(F_0, F_1) \mid F_0\} = 0, \qquad (3.2)$$

or equivalently, the corresponding sample equation

$$E\{f_{U(F_2, t)}(F_1, F_2) \mid F_1\} = 0. \qquad (3.3)$$

This has solution $\hat t_{02} = T_1(F_1)$, say, giving us a new approximate equation of the same form as the first approximation (3.1), and being the result of iterating that earlier approximation,

$$E\{f_{U[F_1, T_1(F_1)]}(F_0, F_1) \mid F_0\} \approx 0. \qquad (3.4)$$

Our hope is that the approximation here is better than that in (3.1), so that in a sense $U[F_1, T_1(F_1)]$ is a better estimate than $T(F_1)$ of the solution $t_0$ to equation (2.1). Of course, this does not mean that $U[F_1, T_1(F_1)]$ is closer to $t_0$ than $T(F_1)$, only that the left-hand side of (3.4) is closer to zero than the left-hand side of (3.1).

If we revise notation and call $U[F_1, T_1(F_1)]$ the "new" $T(F_1)$, we may run through the argument again, obtaining a third approximate solution of (2.1). In principle, these iterations may be repeated as often as desired.

We have given two explicit methods, multiplicative and additive, for modifying our original estimate $\hat t_0 = T(F_1)$ of the solution of (2.1) so as to obtain the adjustable form $U(F_1, t)$. Those modifications may be used in a wide range of circumstances. In the special case of confidence intervals, an alternative approach is to modify the nominal coverage probability of the confidence interval. To explain the argument we shall concentrate on the special case of symmetric percentile-method intervals discussed in Example 2.2. Corrections for other types of intervals may be introduced in like manner.

An $\alpha$-level symmetric percentile-method interval for $\theta_0 = \theta(F_0)$ is given by $[\theta(F_1) - \hat t_0,\ \theta(F_1) + \hat t_0]$, where $\hat t_0$ is chosen to solve the sample equation

$$P\{\theta(F_2) - t \le \theta(F_1) \le \theta(F_2) + t \mid F_1\} - \alpha = 0.$$

(In our earlier examples, $\alpha = 0.95$.) This $\hat t_0$ is an estimator of the solution $t_0 = T(F_0)$ of the population equation.


Write $x_\alpha$ as $x(F_0)_\alpha$, the quantile when $F_0$ is the true distribution function. Then $t_0 = T(F_0)$ is just $x(F_0)_\alpha$, and we might take $U(\cdot, t)$ to be

$$U(\cdot, t) = x(\cdot)_{\alpha + t}.$$

This is an alternative to multiplicative and additive corrections, which in the present problem are

$$U(\cdot, t) \equiv (1 + t)\, x(\cdot)_\alpha \quad \text{and} \quad U(\cdot, t) = x(\cdot)_\alpha + t,$$

respectively. In general, each will give slightly different numerical results although, as we shall prove shortly, each provides the same order of correction.

Concise definitions of $F_j$ are different in parametric and nonparametric cases. In the former we work within a class $\{F_{(\lambda)},\ \lambda \in \Lambda\}$ of distributions that are completely specified up to an unknown vector $\lambda$ of parameters. The "true" distribution is $F_0 = F_{(\lambda_0)}$; we estimate $\lambda_0$ by $\hat\lambda = \lambda(\mathcal{X})$, where $\mathcal{X} = \mathcal{X}_1$ is an $n$-sample drawn from $F_0$, and we take $F_1$ to be $F_{(\hat\lambda)}$. To define $F_j$, let $\hat\lambda_j = \lambda(\mathcal{X}_j)$ denote the estimator $\hat\lambda$ computed for an $n$-sample $\mathcal{X}_j$ drawn from $F_{j-1}$, and put $F_j = F_{(\hat\lambda_j)}$. The nonparametric case is conceptually simpler. There, $F_j$ is the empirical distribution of an $n$-sample drawn randomly from $F_{j-1}$, with replacement.

To explain how high-index $F_j$'s enter into the computation of bootstrap iterations, we shall discuss calculation of the solution to equation (3.3). That requires calculation of $U(F_2, t)$, defined in the additive case, for example, by $U(F_2, t) = T(F_2) + t$, where $T(F_2)$ solves the resample-level equation

$$E\{f_{T(F_2)}(F_2, F_3) \mid F_2\} = 0.$$

Thus, to find the second bootstrap iterate, the solution of (3.3), we must construct $F_1$, $F_2$ and $F_3$. Calculation of $F_2$ "by simulation" typically involves order $B$ sampling operations ($B$ resamples drawn from the original sample), whereas calculation of $F_3$ "by simulation" involves order $B^2$ sampling operations ($B$ resamples drawn from each of $B$ resamples) if the same number of operations is used at each level. Thus, $i$ bootstrap iterations could require order $B^i$ computations, and so complexity would increase rapidly with the number of iterations.


In regular cases, expansions of the error in formulae such as (3.1) are usually power series in $n^{-1/2}$ or $n^{-1}$, often resulting from Edgeworth expansions of the type that we shall discuss in Section 4. Each bootstrap iteration reduces the order of magnitude of error by a factor of at least $n^{-1/2}$. However, in many problems with an element of symmetry, such as two-sided confidence intervals, expansions of error are power series in $n^{-1}$ rather than $n^{-1/2}$, and each bootstrap iteration reduces error by a factor of $n^{-1}$, not just $n^{-1/2}$.

Example 3.1 Bias reduction

In this situation, each bootstrap iteration reduces the order of magnitude of bias by the factor $n^{-1}$. (See Hall 1992, Section 1.5, for further details.) To investigate further the effect of bootstrap iteration on bias, observe that, in the case of bias reduction by an additive correction,

$$f_t(F_0, F_1) = \theta(F_1) - \theta(F_0) + t.$$

Therefore the sample equation,

$$E\{\theta(F_2) - \theta(F_1) + t \mid F_1\} = 0,$$

has solution $t = T(F_1) = \theta(F_1) - E\{\theta(F_2) \mid F_1\}$, and so the once-iterated estimate is

$$\theta(F_1) + T(F_1) = 2\,\theta(F_1) - E\{\theta(F_2) \mid F_1\}.$$
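A minimal sketch of the once- and twice-iterated additive corrections via nested (double-bootstrap) resampling, which also exhibits the order $B^2$ cost noted earlier in this section; the algebra follows the additive correction above, while the function names and resample sizes are our own assumptions:

```python
import numpy as np

def iterated_bias_correction(x, theta, B1=200, B2=50, rng=None):
    """Return the once- and twice-iterated bias-corrected estimates."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    theta_hat = theta(x)                                # theta(F1)
    theta2 = np.empty(B1)                               # theta(F2_b)
    t_level2 = np.empty(B1)                             # T(F2_b)
    for b in range(B1):
        xs = rng.choice(x, size=n, replace=True)        # resample from F1
        theta2[b] = theta(xs)
        theta3 = np.array([theta(rng.choice(xs, size=n, replace=True))
                           for _ in range(B2)])         # theta(F3): inner level
        t_level2[b] = theta2[b] - theta3.mean()         # T(F2_b)
    t1 = theta_hat - theta2.mean()                      # T(F1)
    est_once = theta_hat + t1          # = 2*theta(F1) - E{theta(F2) | F1}
    # Twice-iterated: theta(F1) + T(F1) + [theta(F1) - E{theta(F2) + T(F2) | F1}]
    est_twice = theta_hat + 2 * t1 - t_level2.mean()
    return est_once, est_twice
```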

Example 3.2 Confidence interval

Here, each iteration generally reduces the order of coverage error by the factor $n^{-1}$ in the case of two-sided intervals, and by $n^{-1/2}$ for one-sided intervals. To appreciate the effect of iteration in more detail, let us consider the case of parametric, percentile confidence intervals for a mean, assuming a Normal $N(\mu, \sigma^2)$ population, as discussed in Example 2.2. Let $N$ denote a Normal $N(0, 1)$ random variable. Estimate the


$$f_t(F_1, F_2) = I\{\theta(F_2) - t \le \theta(F_1) \le \theta(F_2) + t\} - 0.95,$$

and the sample equation (2.4) has solution $t = T(F_1) = n^{-1/2} x_{0.95}\,\sigma(F_1)$, where $x_{0.95}$ is given by $P(|N| \le x_{0.95}) = 0.95$. This gives the percentile interval $(\bar X - n^{-1/2} x_{0.95}\hat\sigma,\ \bar X + n^{-1/2} x_{0.95}\hat\sigma)$ of Example 2.2. Define $w_\alpha$, as before, by

$$P(|T_{n-1}| \le w_\alpha) = \alpha.$$


The resulting bootstrap confidence interval is

$$[\theta(F_1) - n^{-1/2}\sigma(F_1)(x_{0.95} + \hat t_0),\ \theta(F_1) + n^{-1/2}\sigma(F_1)(x_{0.95} + \hat t_0)] = [\bar X - (n-1)^{-1/2} w_{0.95}\,\hat\sigma,\ \bar X + (n-1)^{-1/2} w_{0.95}\,\hat\sigma].$$

This is identical to the percentile-t (not the percentile) confidence interval derived in Example 2.2, and has perfect coverage accuracy.

The methodology of bootstrap iteration was introduced by Efron (1983), Hall (1986), Beran (1987) and Loh (1987).

4 Asymptotic theory

4.1 Summary

We begin by describing circumstances where Edgeworth expansions, in the usual rather than the bootstrap sense, may be generated under rigorous regularity conditions; see Section 4.2. Major contributors to this theory include Chibisov (1972, 1973a, 1973b), Sargan (1975, 1976) and Bhattacharya and Ghosh (1978). Our account is based on the latter paper. Following that, in Section 4.3, we discuss bootstrap versions of those expansions and then describe the conclusions that may be drawn from those results. Our first conclusions, about the efficacy of pivotal methods, are given towards the end of Section 4.3. Sections 4.4, 4.5, 4.6 and 4.7 describe respectively a variety of different confidence intervals, properties of bootstrap estimates of critical points, properties of coverage error, and the special case of regression. The last case is of particular interest because, in the context of intervals for slope parameters, it admits bootstrap methods with unusually good coverage accuracy.

The main conclusions drawn in this section relate to the virtues of pivoting. That subject was touched on in Section 2, but there we lacked the technical devices necessary to provide a broad description of the relative performances of pivotal and non-pivotal methods. The Edgeworth expansion techniques introduced in Section 4.2 fill this gap. In particular, they enable us to show that pivotal methods generally yield greater accuracy in the estimation of critical points (Section 4.5) and smaller asymptotic order of coverage error of one-sided confidence intervals (Section 4.6). Nevertheless, it should be borne in mind that these results are asymptotic in character and that, while they provide a valuable guide, they do not tell the whole story. For example, the performance of pivotal methods with small samples depends in large part on the relative accuracy of the variance estimator, and can be very poor in cases where an accurate variance estimator is not available. Examples which feature poor accuracy include interval estimation for the correlation coefficient and for a ratio of means when the denominator mean is close to zero.


Theory for the bootstrap, along the lines of that described here, was developed by Bickel and Freedman (1980), Singh (1981), Beran (1982, 1987), Babu and Singh (1983, 1984, 1985), Hall (1986, 1988a, 1988b), Efron (1987), Liu and Singh (1987) and Robinson (1987). Further work on the bootstrap in regression models is described by Bickel and Freedman (1981, 1983), Freedman (1981), Freedman and Peters (1984) and Peters and Freedman (1984a, 1984b).

4.2 Edgeworth and Cornish-Fisher expansions

We begin by describing a general model that allows Edgeworth and Cornish-Fisher expansions to be established rigorously. Let $\Phi$, $\phi$ denote respectively the Standard Normal distribution and density functions. Let $X, X_1, X_2, \ldots$ be independent and identically distributed random column $d$-vectors with mean $\mu$, and put $\bar X = n^{-1}\sum X_i$. Let $A : \mathbb{R}^d \to \mathbb{R}$ be a smooth function satisfying $A(\mu) = 0$. We have in mind a function such as $A(x) = \{g(x) - g(\mu)\}/h(\mu)$, where $\theta_0 = g(\mu)$ is the (scalar) parameter estimated by $\hat\theta = g(\bar X)$ and $\sigma^2 = h(\mu)^2$ is the asymptotic variance of $n^{1/2}\hat\theta$; or $A(x) = \{g(x) - g(\mu)\}/h(x)$, where $\hat\sigma^2 = h(\bar X)^2$ is an estimator of $\sigma^2 = h(\mu)^2$. (Thus, we assume $h$ is a known function.)

This "smooth function model" allows us to study problems where $\theta_0$ is a mean, or a variance, or a ratio of means or variances, or a difference of means or variances, or a correlation coefficient, etc. For example, if $\{W_1, \ldots, W_n\}$ were a random sample from a univariate population with mean $m$ and variance $\beta^2$, and if we wished to estimate $\theta_0 = m$, then we would take $d = 2$, $X = (X^{(1)}, X^{(2)})^T = (W, W^2)^T$, $\mu = E(X)$, and

$$h(\bar X)^2 = n^{-1}\sum_{i=1}^{n} (W_i - \bar W)^2 = \hat\beta^2.$$

(Note that $E(W - m)^4 - \beta^4$ equals the asymptotic variance of $n^{1/2}\hat\beta^2$.) The cases where $\theta_0$ is a correlation coefficient (a function of five means), or a variance ratio (a function of four means), among others, may be treated similarly.
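A small numerical sketch of the smooth function model for $\theta_0 = m$; the true mean is taken as known only so that the pivot $n^{1/2}A(\bar X)$ can be exhibited, and the distribution and names are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.exponential(size=50)      # W_1, ..., W_n; Exp(1) has mean m = 1
m = 1.0

# d = 2 smooth function model: X_i = (W_i, W_i^2)^T, mu = E(X).
xbar = np.array([w.mean(), (w ** 2).mean()])
g = lambda v: v[0]                        # g(mu) = m, the parameter
h = lambda v: np.sqrt(v[1] - v[0] ** 2)   # h(xbar) = sample standard deviation
# A(xbar) = {g(xbar) - g(mu)} / h(xbar); n^{1/2} A(xbar) is the
# (asymptotically pivotal) studentized mean.
t_stat = np.sqrt(len(w)) * (g(xbar) - m) / h(xbar)
```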

The following result may be established under the model described above. We first present a little notation. Put $\mu = E(X)$, and let $\sigma^2$ denote the asymptotic variance of $n^{1/2} A(\bar X)$.

Theorem 4.1. Assume that the function $A$ has $j + 2$ continuous derivatives in a neighbourhood of $\mu = E(X)$, that $A(\mu) = 0$, that $E(\|X\|^{j+2}) < \infty$, and that the characteristic function $\chi$ of $X$ satisfies

$$\limsup_{\|t\| \to \infty} |\chi(t)| < 1. \qquad (4.1)$$

Suppose $\sigma > 0$. Then for $j \ge 1$,

$$P\{n^{1/2} A(\bar X)/\sigma \le x\} = \Phi(x) + n^{-1/2} p_1(x)\phi(x) + \cdots + n^{-j/2} p_j(x)\phi(x) + o(n^{-j/2}) \qquad (4.2)$$


uniformly in $x$, where $p_j$ is a polynomial of degree at most $3j - 1$, odd for even $j$ and even for odd $j$, with coefficients depending on moments of $X$ up to order $j + 2$. In particular,

$$p_1(x) = -\{A_1\sigma^{-1} + \tfrac{1}{6} A_2\sigma^{-3}(x^2 - 1)\},$$

where $A_1$ and $A_2$ are constants depending on moments of $X$ and on derivatives of $A$ at $\mu$. See Bhattacharya and Ghosh (1978) for a proof.

Condition (4.1) is a multivariate form of Cramér's continuity condition. It is satisfied if the distribution of $X$ is nonsingular (i.e. has a nondegenerate absolutely continuous component) or if $X = (W, W^2, \ldots, W^d)^T$, where $W$ is a random variable with a nonsingular distribution.

Two versions of (4.2), corresponding to the two choices of $A$ described earlier, are

$$P\{n^{1/2}(\hat\theta - \theta_0)/\sigma \le x\} = \Phi(x) + n^{-1/2} p_1(x)\phi(x) + \cdots$$

and

$$P\{n^{1/2}(\hat\theta - \theta_0)/\hat\sigma \le x\} = \Phi(x) + n^{-1/2} q_1(x)\phi(x) + \cdots,$$

for polynomials $p_j$ and $q_j$ of the type described in Theorem 4.1.

The Edgeworth expansion in Theorem 4.1 is readily inverted so as to yield a Cornish-Fisher expansion of the critical point of a distribution. To appreciate how, first define $w_\alpha = w_\alpha(n)$, the $\alpha$-level quantile of the distribution of $S_n = n^{1/2} A(\bar X)$, by

$$P(S_n \le w_\alpha) = \alpha.$$
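The inversion itself is standard; a sketch in our notation, assuming the one-term version of (4.2), $P(S_n \le x) = \Phi(x) + n^{-1/2} p_1(x)\phi(x) + O(n^{-1})$:

```latex
% Write w_alpha = z_alpha + n^{-1/2} c_1 + O(n^{-1}), where Phi(z_alpha) = alpha.
% Substituting into the expansion and Taylor-expanding Phi about z_alpha gives
%   alpha = Phi(z_alpha) + n^{-1/2} {c_1 + p_1(z_alpha)} phi(z_alpha) + O(n^{-1}),
% so c_1 = -p_1(z_alpha), yielding the Cornish-Fisher expansion
\[
  w_\alpha \;=\; z_\alpha \;-\; n^{-1/2}\, p_1(z_\alpha) \;+\; O(n^{-1}).
\]
```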
