Abstract
A brief account is given of the methodology and theory for the bootstrap. Methodology is developed in the context of the “equation” approach, which allows attention to be focussed on specific criteria for excellence, such as coverage error of a confidence interval or expected value of a bias-corrected estimator. This approach utilizes a definition of the bootstrap in which the key component is replacing a true distribution function by its empirical estimator. Our theory is Edgeworth expansion based, and is aimed specifically at elucidating properties of different methods for constructing bootstrap confidence intervals in a variety of settings. The reader interested in more detail than can be provided here is referred to the recent monograph of Hall (1992).
Barnard (1963), Hope (1968) and Marriott (1979). Our definition of the bootstrap would not regard Monte Carlo testing as a bootstrap procedure. That may be seen as either an advantage or a disadvantage, depending on one’s view.
A second objection that one may have to defining the “bootstrap” strictly in terms of whether or not Monte Carlo methods are employed is that the method of numerical computation becomes intrinsic to the definition. To cite an extreme case, one would not usually think of using Monte Carlo methods to compute a sample mean or variance, but nevertheless those quantities might reasonably be regarded as bootstrap estimators of the population mean and variance, respectively. In a less obvious instance, estimators of bootstrap distribution functions, which would usually be candidates for approximation by Monte Carlo methods, may sometimes be computed most effectively by exact, non-Monte Carlo methods; see for example Fisher and Hall (1991). In other settings, saddlepoint methods provide excellent alternatives to simulation; see Davison and Hinkley (1988) and Reid (1988). Does a technique stop being a bootstrap method as soon as non-Monte Carlo methods are employed? To argue that it does seems unnecessarily pedantic, but to deny that it does would cause some problems for a bootstrap definition based on the notion of simulation.
The name “bootstrap” was introduced by Efron (1979), and it is appropriate here to emphasize the fundamental contributions that he made. As Efron was careful to point out, bootstrap methods (in the sense of replacing $F$ by $\hat F$) had been around for many years before his seminal paper. But he was perhaps the first to perceive the enormous breadth of this class of methods. He saw too that the power of modern computing machinery could be harnessed to allow functionals of $\hat F$ to be computed in very diverse circumstances. The combination of these two observations is extremely powerful, and its ultimate effect on Statistics will be revolutionary. Necessarily, these two observations go together; the vast range of applications of bootstrap methods would not be possible without a facility for extremely rapid simulation. However, that fact does not imply that bootstrap methods are restricted to situations where simulation is employed for calculation.
Statistical scientists who thought along lines similar to Efron include Hartigan (1969, 1971), who used resampled sub-samples to construct point and interval estimators, and who stressed connections with Mahalanobis’ “interpenetrating samples” and the jackknife of Quenouille (1949, 1956) and Tukey (1958); and Simon (1969, Chapters 23-25), who described a variety of Monte Carlo methods.
Let us accept, for the sake of argument, that bootstrap methods are defined by the “replace $F$ by $\hat F$” rule described above. Two challenges immediately emerge in response to this definition. First, we must determine how to “focus” this concept, so as to make the bootstrap responsive to statistical demands. That is, how do we decide which functionals of $F$ should be estimated? This requires a “principle” that enables us to implement bootstrap methods in a range of circumstances. The second challenge is that of calculating the values of those functionals in a practical setting. The latter problem may be solved partly by providing simulation methods or related devices, such as saddlepoint arguments, for numerical approximation. Space limitations mean that a thorough account of these techniques is beyond the scope of this chapter. However, a detailed account of efficient methods of bootstrap simulation may be found in Appendix II of Hall (1992). A key part of the answer to the first question is the development of theory describing the relative performance of different forms of the bootstrap, and that issue will be addressed at some length here.
Our answer to the first question is provided in Section 2, where we describe an “equation approach” to focussing attention on specific statistical questions. This technique was discussed in more detail by Hall and Martin (1988), Martin (1989) and Hall (1992, Chapter 1). It leads naturally to bootstrap iteration, which is discussed in Section 3. Section 4 presents theory that enables comparisons to be made of different bootstrap approaches to inference about distributions. The reader is referred to Hinkley (1988) and DiCiccio and Romano (1988) for excellent reviews of bootstrap methods.
Our discussion is necessarily kept brief and is essentially an abbreviated form of an account that may be found in Hall (1992). In undertaking that abbreviation we have omitted discussion of a variety of different approaches to the bootstrap. In particular, we do not discuss various forms of bias correction, not because we do not recommend it but because space does not permit an adequate survey. We readily concede that the restricted account of bootstrap methods and theory presented here is in need of a degree of bias correction itself!
We do not address in any detail the bootstrap for dependent data, but pause here to outline the main issues. There are two main approaches to implementing the bootstrap in dependent settings. The first is to model the dependent process as one that is driven by independent and identically distributed disturbances; examples include autoregressions and moving averages. We describe briefly here a technique which may be used when no parametric assumptions are made about the distribution of the disturbances. First estimate the parameters of the model, and calculate the residuals (i.e. the estimated values of the independent disturbances). Then run the process over and over again, by Monte Carlo simulation, with parameter values set equal to their estimated values and with the bootstrapped independent disturbances obtained by resampling randomly, with replacement, from the set of residuals. Each resampled process should be of the same length as the original one, and bootstrap inference may be conducted by averaging over the independent Monte Carlo replications. Bose (1988) addresses the efficacy of this procedure in the context of autoregressive models, and derives results that may be viewed as analogues (in the case of autoregressive processes) of some of those discussed later in this chapter for independent data.
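To fix ideas, the following sketch implements the residual-resampling scheme just described for a first-order autoregression. It is illustrative only: the function name, the least-squares estimator of the autoregressive coefficient, and the centring of the residuals are our own choices, not part of the original account.

```python
import numpy as np

def ar1_residual_bootstrap(x, n_boot=999, seed=None):
    """Residual-resampling bootstrap for an AR(1) model x_t = rho*x_{t-1} + e_t.

    Returns the estimate rho_hat and n_boot bootstrap replications of it,
    each computed from a resampled series of the same length as x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)

    # Step 1: estimate the model parameter (least squares).
    rho_hat = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])

    # Step 2: calculate the residuals, i.e. the estimated disturbances.
    resid = x[1:] - rho_hat * x[:-1]
    resid -= resid.mean()

    # Steps 3-4: rerun the process with disturbances resampled, with
    # replacement, from the residuals, re-estimating rho each time.
    rho_star = np.empty(n_boot)
    for b in range(n_boot):
        e_star = rng.choice(resid, size=n, replace=True)
        x_star = np.empty(n)
        x_star[0] = x[0]
        for t in range(1, n):
            x_star[t] = rho_hat * x_star[t - 1] + e_star[t]
        rho_star[b] = np.dot(x_star[1:], x_star[:-1]) / np.dot(x_star[:-1], x_star[:-1])
    return rho_hat, rho_star
```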
If the distribution of the disturbances is assumed known then, rather than estimate residuals and resample with replacement from those, the parameters of the assumed distribution may be estimated. The bootstrap disturbances may now be derived by resampling from the hypothesized distribution, with the parameters set equal to their estimates.
The other major way of bootstrapping dependent processes is to divide the data sequence into blocks, and resample the blocks rather than individual data values. This approach has application in spatial as well as “linear” or time series contexts, and indeed was apparently first suggested for spatial data; see Hall (1985). Blocking methods may involve either non-overlapping blocks, as in the technique treated by Carlstein (1986), or overlapping blocks, as proposed by Künsch (1989). (Both methods were considered for spatial data by Hall (1985).) In sheer asymptotic terms Künsch’s method has advantages over Carlstein’s, but those advantages are not always apparent in practice. This matter has been addressed by Hall and Horowitz (1993) in the context of estimating bias or variance, where the matter of optimal block width has also been treated. The issue of distribution estimation using blocking methods has been discussed by Götze and Künsch (1990), Lahiri (1991, 1992) and Davison and Hall (1993).
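A minimal sketch of the overlapping-block (moving-block) scheme follows; the function name and the rule of truncating the concatenated blocks to the original series length are our own illustrative choices.

```python
import numpy as np

def moving_block_bootstrap(x, block_len, seed=None):
    """Generate one moving-block bootstrap replicate of the series x.

    Overlapping blocks of length block_len are drawn with replacement
    and concatenated until the replicate has the original length.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    n_blocks_avail = n - block_len + 1      # number of overlapping blocks
    n_draws = -(-n // block_len)            # ceil(n / block_len)
    starts = rng.integers(0, n_blocks_avail, size=n_draws)
    pieces = [x[s:s + block_len] for s in starts]
    return np.concatenate(pieces)[:n]       # truncate to length n
```

Carlstein's non-overlapping variant is obtained by restricting the admissible block starts to multiples of block_len.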
2 A formal definition of the bootstrap principle
Much of statistical inference involves describing the relationship between a sample and the population from which the sample was drawn. Formally, given a functional $f_t$ from a class $\{f_t : t \in \mathcal{T}\}$, we wish to determine that value $t_0$ of $t$ that solves an equation such as

$$E\{f_{t_0}(F_0, F_1) \mid F_0\} = 0, \qquad (2.1)$$

where $F = F_0$ denotes the population distribution function and $\hat F = F_1$ is the distribution function “of the sample”. An explicit definition of $F_1$ will be given shortly. Conditioning on $F_0$ in (2.1) serves to stress that the expectation is taken with respect to the distribution $F_0$. We call (2.1) the population equation because we need properties of the population if we are to solve this equation exactly.
For example, let $\theta_0 = \theta(F_0)$ denote a true parameter value, such as the rth power of a population mean. In the problem of correcting $\hat\theta = \theta(F_1)$ for bias we might take

$$f_t(F_0, F_1) = \theta(F_1) - \theta(F_0) + t, \qquad (2.2)$$

while in the problem of constructing a nominal 95% confidence interval for $\theta_0$ we might take

$$f_t(F_0, F_1) = I\{\theta(F_1) - t \le \theta(F_0) \le \theta(F_1) + t\} - 0.95. \qquad (2.3)$$

The bootstrap estimate $\hat t_0$ of the solution $t_0$ of the population equation is defined as the solution of the equation obtained on replacing $(F_0, F_1)$ by $(F_1, F_2)$,

$$E\{f_t(F_1, F_2) \mid F_1\} = 0. \qquad (2.4)$$
We call this the sample equation, because we know (or can find out) everything about it once we know the sample distribution function $F_1$. In particular, its solution $\hat t_0$ is a function of the sample values.
We call $\hat t_0$ and $E\{f_t(F_1, F_2) \mid F_1\}$ “the bootstrap estimators” of $t_0$ and $E\{f_t(F_0, F_1) \mid F_0\}$, respectively. They are obtained by replacing $F_0$ by $F_1$ in formulae for $t_0$ and $E\{f_t(F_0, F_1) \mid F_0\}$. In the bias correction problem, where $f_t$ is given by (2.2), the bootstrap version of our bias-corrected estimator is $\hat\theta + \hat t_0$. In the confidence interval problem, where (2.3) describes $f_t$, our bootstrap confidence interval is $(\hat\theta - \hat t_0, \hat\theta + \hat t_0)$. The latter is commonly called a (symmetric) percentile-method confidence interval for $\theta_0$.
The “bootstrap principle” might be described in terms of this approach to estimation of a population equation.
It is appropriate now to give detailed definitions of $F_1$ and $F_2$. There are two approaches, suitable for nonparametric and parametric problems respectively. In both, inference is based on a sample $\mathcal{X}$ of $n$ random (independent and identically distributed) observations of the population. In the nonparametric case, $F_1$ is simply the empirical distribution function of $\mathcal{X}$; that is, the distribution function of the distribution that assigns mass $n^{-1}$ to each point in $\mathcal{X}$. The associated empirical probability measure assigns to a region $\mathcal{B}$ a value equal to the proportion of the sample that lies within $\mathcal{B}$. Similarly, $F_2$ is the empirical distribution function of a sample drawn at random from the population with distribution function $F_1$; that is, the empiric of a sample $\mathcal{X}^*$ drawn randomly, with replacement, from $\mathcal{X}$. If we denote the population by $\mathcal{X}_0$ then we have a nest of sampling operations: $\mathcal{X}$ is drawn at random from $\mathcal{X}_0$, and $\mathcal{X}^*$ is drawn at random from $\mathcal{X}$.
In the parametric case, $F_0$ is assumed completely known up to a finite vector $\lambda_0$ of unknown parameters. To indicate this dependence we write $F_0 = F_{(\lambda_0)}$, an element of a class $\{F_{(\lambda)}, \lambda \in \Lambda\}$ of possible distributions. Let $\hat\lambda$ be an estimator of $\lambda_0$ computed from $\mathcal{X}$, often (but not necessarily) the maximum likelihood estimator. It will be a function of the sample values, so we may write it as $\hat\lambda = \lambda(\mathcal{X})$. Then $F_1 = F_{(\hat\lambda)}$, the distribution function obtained on replacing “true” parameter values by their sample estimates. Let $\mathcal{X}^*$ denote the sample drawn at random from the distribution with distribution function $F_{(\hat\lambda)}$ (not simply drawn from $\mathcal{X}$ with replacement), and let $\hat\lambda^* = \lambda(\mathcal{X}^*)$ denote the version of $\hat\lambda$ computed for $\mathcal{X}^*$ instead of $\mathcal{X}$. Then $F_2 = F_{(\hat\lambda^*)}$.
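The contrast between the two resampling schemes can be made concrete in a few lines of code. The exponential model, the use of the sample mean as maximum likelihood estimator of its scale, and all names below are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=50)    # the observed sample X

# Nonparametric case: X* is drawn from X with replacement,
# and F2 is the empirical distribution function of X*.
x_star_nonpar = rng.choice(x, size=x.size, replace=True)

# Parametric case, assuming an exponential class F_(lambda):
lam_hat = x.mean()                          # lambda-hat = lambda(X), the MLE
x_star_par = rng.exponential(scale=lam_hat, size=x.size)   # X* ~ F_(lambda-hat)
lam_star = x_star_par.mean()                # lambda-hat* = lambda(X*)
# F2 is then F_(lambda-hat*), not the empirical distribution of X*.
```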
It is appropriate now to discuss two examples that illustrate the bootstrap principle.
Example 2.1 Bias reduction
Here the function $f_t$ is given by (2.2), and the sample equation (2.4) assumes the form

$$E\{\theta(F_2) - \theta(F_1) + t \mid F_1\} = 0. \qquad (2.5)$$

To approximate its solution by Monte Carlo simulation, draw $B$ resamples $\{\mathcal{X}_b^*, 1 \le b \le B\}$ independently from the distribution with distribution function $F_1$. In the nonparametric case, where $F_1$ is the empirical distribution function of the sample $\mathcal{X}$, let $F_{2b}$ denote the empirical distribution function of $\mathcal{X}_b^*$. In the parametric case, let $\hat\lambda_b^* = \lambda(\mathcal{X}_b^*)$ be that estimator of $\lambda_0$ computed from the resample $\mathcal{X}_b^*$, and put $F_{2b} = F_{(\hat\lambda_b^*)}$. Define $\hat\theta_b^* = \theta(F_{2b})$ and $\hat\theta = \theta(F_1)$. Then in both parametric and nonparametric circumstances,

$$B^{-1}\sum_{b=1}^{B} \hat\theta_b^*$$

converges to $\hat u = E\{\theta(F_2) \mid F_1\} = E(\hat\theta^* \mid \mathcal{X})$ (with probability one, conditional on $\mathcal{X}$) as $B \to \infty$.
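Combining this Monte Carlo average with the additive correction implied by (2.2) (see also Example 3.1 below) gives the bias-corrected estimate $2\hat\theta - B^{-1}\sum_b \hat\theta_b^*$. A nonparametric sketch, with illustrative names of our own choosing:

```python
import numpy as np

def bootstrap_bias_corrected(x, theta, n_boot=999, seed=None):
    """Monte Carlo bootstrap bias correction of the estimator theta.

    The average of theta over resamples approximates E{theta(F2)|F1},
    so theta_hat - (mean(theta*) - theta_hat) = 2*theta_hat - mean(theta*)
    is the bias-corrected estimate.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    theta_hat = theta(x)
    theta_star = np.array([
        theta(rng.choice(x, size=x.size, replace=True))
        for _ in range(n_boot)
    ])
    return 2.0 * theta_hat - theta_star.mean()

# Example: correcting the downward-biased plug-in variance estimator.
# rng = np.random.default_rng(1)
# print(bootstrap_bias_corrected(rng.normal(size=30), np.var))
```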
Example 2.2 Confidence interval
A symmetric confidence interval for $\theta_0 = \theta(F_0)$ may be constructed by applying the resampling principle using the function $f_t$ given by (2.3). The sample equation then assumes the form

$$P\{\theta(F_2) - t \le \theta(F_1) \le \theta(F_2) + t \mid F_1\} - 0.95 = 0. \qquad (2.6)$$

In a nonparametric context $\theta(F_2)$, conditional on $F_1$, has a discrete distribution, and so it would seldom be possible to solve (2.6) exactly. However, any error in the solution of (2.6) will usually be very small, since the size of even the largest atom of the distribution of $\theta(F_2)$ decreases exponentially quickly with increasing $n$: the largest atom, corresponding to the resample that reproduces each of the $n$ distinct sample values exactly once, has probability $n!/n^n$, which is only $3.6 \times 10^{-4}$ when $n = 10$. We could remove this minor difficulty by smoothing the distribution function $F_1$. In parametric cases, (2.6) may usually be solved exactly for $t$.
The interval $(\hat\theta - \hat t_0, \hat\theta + \hat t_0)$ is a bootstrap confidence interval for $\theta_0 = \theta(F_0)$, usually called a (two-sided, symmetric) percentile interval, since $\hat t_0$ is a percentile of the distribution of $|\theta(F_2) - \theta(F_1)|$ conditional on $F_1$. Other nominal 95% percentile intervals include the two-sided, equal-tailed interval $(\hat\theta - \hat t_{01}, \hat\theta + \hat t_{02})$ and the one-sided interval $(-\infty, \hat\theta + \hat t_{03})$, where $\hat t_{01}$, $\hat t_{02}$ and $\hat t_{03}$ solve

$$P\{\theta(F_2) \le \theta(F_1) + t \mid F_1\} - 0.975 = 0, \quad P\{\theta(F_2) \le \theta(F_1) - t \mid F_1\} - 0.025 = 0$$

and

$$P\{\theta(F_2) \le \theta(F_1) - t \mid F_1\} - 0.05 = 0,$$

respectively. Still other 95% percentile intervals are $\hat I_1 = (\hat\theta - \hat t_{02}, \hat\theta + \hat t_{01})$ and $\hat I_2 = (-\infty, \hat\theta + \hat t_{04})$, where $\hat t_{04}$ is the solution of

$$P\{\theta(F_1) \le \theta(F_2) - t \mid F_1\} - 0.05 = 0.$$

These do not fit naturally into a systematic development of bootstrap methods by frequentist arguments, and we find them a little contrived. They are sometimes
motivated as follows. Define $\hat\theta^* = \theta(F_2)$, $\hat H(x) = P(\hat\theta^* \le x \mid \mathcal{X})$ and

$$\hat H^{-1}(\alpha) = \inf\{x : \hat H(x) \ge \alpha\}.$$

Then

$$\hat I_1 = [\hat H^{-1}(0.025), \hat H^{-1}(0.975)] \quad\text{and}\quad \hat I_2 = (-\infty, \hat H^{-1}(0.95)].$$
All these intervals cover $\theta_0$ with probability approximately 0.95, which might be called the nominal coverage. Coverage error is defined to be true coverage minus nominal coverage; it generally converges to zero as sample size increases.
We now treat in more detail the construction of two-sided, symmetric percentile intervals in parametric problems. There, provided the distribution functions $F_{(\lambda)}$ are continuous, equation (2.6) may be solved exactly. We focus attention on the cases where $\theta_0 = \theta(F_0)$ is a population mean and the population is Normal or Exponential. Our main aim is to bring out the virtues of pivoting, which usually amounts to rescaling so that the distribution of a statistic depends less on unknown parameters.
If the population is Normal $N(\mu, \sigma^2)$ and we use the maximum likelihood estimator $\hat\lambda = (\bar X, \hat\sigma^2)$ to estimate $\lambda_0 = (\mu, \sigma^2)$, then the sample equation (2.6) may be rewritten as

$$P(|N| \le n^{1/2}t/\hat\sigma) - 0.95 = 0, \qquad (2.7)$$

where $N$ denotes a Normal $N(0,1)$ random variable. Its solution is $\hat t_0 = n^{-1/2}x_{0.95}\hat\sigma$, where $x_{0.95}$ is defined by $P(|N| \le x_{0.95}) = 0.95$, and the resulting percentile interval has coverage error

$$P(\bar X - n^{-1/2}x_{0.95}\hat\sigma \le \mu \le \bar X + n^{-1/2}x_{0.95}\hat\sigma) - 0.95 = P\{|n^{1/2}(\bar X - \mu)/\hat\sigma| \le x_{0.95}\} - 0.95. \qquad (2.8)$$
Of course, $n^{1/2}(\bar X - \mu)/\hat\sigma$ does not have a Normal distribution, but a rescaled Student’s $t$ distribution with $n - 1$ degrees of freedom. Therefore the coverage error is essentially that which results from approximating Student’s $t$ distribution by a Normal distribution, and so is $O(n^{-1})$. (See Kendall and Stuart (1977, p. 404).) That is disappointing, particularly as classical methods lead so easily to an interval with precisely known coverage in this important special case.
To appreciate why the percentile interval has this inadequate performance, let us go back to our parametric example involving the Normal distribution. The root cause of the problem there is that $\hat\sigma$, and not $\sigma$, appears on the right-hand side in (2.8). This happens because the sample equation (2.6), equivalent here to (2.7), depends on $\hat\sigma$. Put another way, the population equation (2.1), equivalent to

$$P\{|\theta(F_1) - \theta(F_0)| \le t\} = 0.95,$$

depends on $\sigma^2$, the population variance. This occurs because the distribution of $|\theta(F_1) - \theta(F_0)|$ depends on the unknown $\sigma$. We should try to eliminate, or at least minimize, this dependence.
A function $T$ of both the data and an unknown parameter is said to be (exactly) pivotal if it has the same distribution for all values of the unknowns. It is asymptotically pivotal if, for sequences of known constants $\{a_n\}$ and $\{b_n\}$, $a_nT + b_n$ has a proper nondegenerate limiting distribution not depending on unknowns. We may convert $\theta(F_1) - \theta(F_0)$ into a pivotal statistic by correcting for scale, changing it to $T = \{\theta(F_1) - \theta(F_0)\}/\hat\tau$, where $\hat\tau = \tau(F_1)$ is an appropriate scale estimator. In our example about the mean there are usually many different choices for $\hat\tau$, e.g. the sample standard deviation $\{n^{-1}\sum(X_i - \bar X)^2\}^{1/2}$, the square root of the unbiased variance estimate, Gini’s mean difference and the interquartile range. In more complex problems, a jackknife standard deviation estimator is usually an option. Note that exactly the same confidence interval will be obtained if $\hat\tau$ is replaced by $c\hat\tau$, for any given $c \ne 0$, and so it is inessential that $\hat\tau$ be consistent for the asymptotic standard deviation of $\theta(F_1)$. What is important is pivotalness: exact pivotalness if we are to obtain a confidence interval with zero coverage error, asymptotic pivotalness if exact pivotalness is unattainable. If we change to a pivotal statistic then the function $f_t$ alters from the form given in (2.3) to
$$f_t(F_0, F_1) = I\{\theta(F_1) - t\tau(F_1) \le \theta(F_0) \le \theta(F_1) + t\tau(F_1)\} - 0.95. \qquad (2.9)$$
In the case of our parametric Normal model, any reasonable scale estimator $\hat\tau$ will give exact pivotalness. We shall take $\hat\tau = \hat\sigma$, where $\hat\sigma^2 = \sigma^2(F_1) = n^{-1}\sum(X_i - \bar X)^2$ denotes the sample variance. Then $f_t$ becomes

$$f_t(F_0, F_1) = I\{\theta(F_1) - t\sigma(F_1) \le \theta(F_0) \le \theta(F_1) + t\sigma(F_1)\} - 0.95.$$
Using this functional in place of that at (2.3), but otherwise arguing exactly as before, equation (2.7) changes to

$$P\{|T_{n-1}| \le (n-1)^{1/2}t \mid F_1\} - 0.95 = 0, \qquad (2.10)$$

where $T_{n-1}$ has Student’s $t$ distribution with $n - 1$ degrees of freedom and is stochastically independent of $F_1$. (Therefore the conditioning on $F_1$ in (2.10) is irrelevant.) Thus, the solution of the sample equation is $\hat t_0 = (n-1)^{-1/2}w_{0.95}$, where $w_\alpha = w_\alpha(n)$ is given by $P(|T_{n-1}| \le w_\alpha) = \alpha$. The bootstrap confidence interval is $(\bar X - \hat t_0\hat\sigma, \bar X + \hat t_0\hat\sigma)$, with perfect coverage accuracy,
$$P\{\bar X - (n-1)^{-1/2}w_{0.95}\hat\sigma \le \mu \le \bar X + (n-1)^{-1/2}w_{0.95}\hat\sigma\} = 0.95.$$
(Of course, the latter statement applies only to the parametric bootstrap under the assumption of a Normal model.)
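The exactness is quickly verified from the definitions above (a short check, using the relation between $\hat\sigma$ and the unbiased standard deviation $S$):

$$\frac{n^{1/2}(\bar X - \mu)}{\hat\sigma} = \Big(\frac{n}{n-1}\Big)^{1/2}\frac{n^{1/2}(\bar X - \mu)}{S} = \Big(\frac{n}{n-1}\Big)^{1/2}T_{n-1}, \qquad S^2 = (n-1)^{-1}\sum_{i}(X_i - \bar X)^2,$$

so that

$$P\{\bar X - (n-1)^{-1/2}w_{0.95}\hat\sigma \le \mu \le \bar X + (n-1)^{-1/2}w_{0.95}\hat\sigma\} = P\{|T_{n-1}| \le w_{0.95}\} = 0.95.$$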
Such confidence intervals are usually called percentile-$t$ intervals, since $\hat t_0$ is a percentile of the Student’s $t$-like statistic $|\theta(F_2) - \theta(F_1)|/\tau(F_2)$.
Perfect coverage accuracy of percentile-$t$ intervals usually holds only in parametric problems where the underlying statistic is exactly pivotal. More generally, if symmetric percentile-$t$ intervals are constructed in parametric and nonparametric problems by solving the sample equation when $f_t$ is defined by (2.9), where $\tau(F_1)$ is chosen so that $T = \{\theta(F_1) - \theta(F_0)\}/\tau(F_1)$ is asymptotically pivotal, then coverage error will usually be $O(n^{-2})$ rather than the $O(n^{-1})$ associated with ordinary percentile intervals.
We conclude this example with remarks on the computation of critical points, such as $\hat v_\alpha$, by uniform Monte Carlo simulation. Further details, including an account of efficient Monte Carlo simulation, are given in Section 5.
Assume we wish to compute the solution $\hat v_\alpha$ of the equation

$$P[\{\theta(F_2) - \theta(F_1)\}/\tau(F_2) \le v \mid F_1] = \alpha \qquad (2.11)$$

or, to be more precise, the value

$$\hat v_\alpha = \inf\big\{x : P[\{\theta(F_2) - \theta(F_1)\}/\tau(F_2) \le x \mid F_1] \ge \alpha\big\}.$$
Choose integers $B \ge 1$ and $1 \le \nu \le B$ such that $\nu/(B + 1) = \alpha$. For example, if $\alpha = 0.95$ then we could take $(\nu, B) = (95, 99)$ or $(950, 999)$. Conditional on $F_1$, draw $B$ resamples $\{\mathcal{X}_b^*, 1 \le b \le B\}$ independently from the distribution with distribution function $F_1$. In the nonparametric case, write $F_{2b}$ for the empirical distribution function of $\mathcal{X}_b^*$. In the parametric case, where the population distribution function is $F_{(\lambda_0)}$ and $\lambda_0$ is a vector of unknown parameters, let $\hat\lambda$ and $\hat\lambda_b^*$ denote the estimates of $\lambda_0$ computed from the sample $\mathcal{X}$ and the resample $\mathcal{X}_b^*$, respectively, and put $F_{2b} = F_{(\hat\lambda_b^*)}$. For both cases put $T_b^* = \{\theta(F_{2b}) - \theta(F_1)\}/\tau(F_{2b})$, and write $T^*$ for a generic $T_b^*$. In this notation, equation (2.11) is equivalent to $P(T^* \le \hat v_\alpha \mid \mathcal{X}) = \alpha$. Let $\hat v_{\alpha B}$ denote the $\nu$th order statistic (the $\nu$th smallest) of the values $T_b^*$. Then $\hat v_{\alpha B} \to \hat v_\alpha$ with probability one, conditional on $\mathcal{X}$, as $B \to \infty$. The value $\hat v_{\alpha B}$ is a Monte Carlo approximation to $\hat v_\alpha$.
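A direct implementation of this recipe, in the nonparametric case (function names and the example choice of $\tau$ are ours):

```python
import numpy as np

def percentile_t_critical_point(x, theta, tau, alpha=0.95, B=999, seed=None):
    """Uniform Monte Carlo approximation to the critical point v_alpha.

    Draws B resamples, forms T*_b = (theta* - theta_hat)/tau*, and
    returns the nu-th order statistic, where nu/(B + 1) = alpha.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    nu = round(alpha * (B + 1))              # e.g. (nu, B) = (950, 999)
    theta_hat = theta(x)
    t_star = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=x.size, replace=True)
        t_star[b] = (theta(xb) - theta_hat) / tau(xb)
    return np.sort(t_star)[nu - 1]

# Example: critical point for a Studentized mean, with tau the
# (scaled) standard deviation of each resample.
# v_hat = percentile_t_critical_point(
#     sample, np.mean, lambda z: z.std() / np.sqrt(z.size))
```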
3 Iterating the principle
Recall that in Section 2 we suggested that statistical inference often involves describing a relationship between the sample and the population. We argued that this leads to a bootstrap principle, which may be enunciated in terms of finding an empirical solution to a population equation, (2.1). The empirical solution is obtained by solving a sample version, (2.4), of the population equation. The notation employed in those equations includes taking $F_0$, $F_1$ and $F_2$ to denote the true population distribution function, the empirical distribution function, and the resample version of the empiric, respectively. The solution of the population equation is a functional of $F_0$, say $T(F_0)$, and the solution of the sample equation is the corresponding functional of the empiric, $T(F_1)$. The population equation may then be represented as

$$E\{f_{T(F_0)}(F_0, F_1) \mid F_0\} = 0,$$

with approximate solution $T(F_1)$:

$$E\{f_{T(F_1)}(F_0, F_1) \mid F_0\} \approx 0. \qquad (3.1)$$
The solution of the sample equation represents an approximation to the solution of the population equation. In many instances we would like to improve on this approximation; for example, to further reduce bias in a bias correction problem, or to improve coverage accuracy in a confidence interval problem. Therefore we introduce a correction term $t$ to the functional $T$, so that $T(\cdot)$ becomes $U(\cdot, t)$ with $U(\cdot, 0) \equiv T(\cdot)$. The adjustment may be multiplicative, for example $U(\cdot, t) \equiv (1 + t)T(\cdot)$. Or it may be an additive correction, as in $U(\cdot, t) \equiv T(\cdot) + t$. Or $t$ might adjust some particular feature of $T$, as in the level-error correction for confidence intervals, which we shall discuss shortly. In all cases, the functional $U(\cdot, t)$ should be smooth in $t$. Our aim is to choose $t$ so as to improve on the approximation (3.1).
Ideally, we would like to choose $t$ to solve the equation

$$E\{f_{U(F_1,t)}(F_0, F_1) \mid F_0\} = 0, \qquad (3.2)$$

but that is impossible without knowledge of $F_0$, so we pass instead to the sample version of (3.2),

$$E\{f_{U(F_2,t)}(F_1, F_2) \mid F_1\} = 0. \qquad (3.3)$$

This has solution $\hat t_{02} = T_1(F_1)$, say, giving us a new approximate equation of the same form as the first approximation (3.1), and being the result of iterating that earlier approximation,

$$E\{f_{U[F_1, T_1(F_1)]}(F_0, F_1) \mid F_0\} \approx 0. \qquad (3.4)$$
Our hope is that the approximation here is better than that in (3.1), so that in a sense $U[F_1, T_1(F_1)]$ is a better estimate than $T(F_1)$ of the solution $t_0$ to equation (2.1). Of course, this does not mean that $U[F_1, T_1(F_1)]$ is closer to $t_0$ than $T(F_1)$, only that the left-hand side of (3.4) is closer to zero than the left-hand side of (3.1). If we revise notation and call $U[F_1, T_1(F_1)]$ the “new” $T(F_1)$, we may run through the argument again, obtaining a third approximate solution of (2.1). In principle, these iterations may be repeated as often as desired.
We have given two explicit methods, multiplicative and additive, for modifying our original estimate $\hat t_0 = T(F_1)$ of the solution of (2.1) so as to obtain the adjustable form $U(F_1, t)$. Those modifications may be used in a wide range of circumstances. In the special case of confidence intervals, an alternative approach is to modify the nominal coverage probability of the confidence interval. To explain the argument we shall concentrate on the special case of symmetric percentile-method intervals, discussed in Example 2.2. Corrections for other types of intervals may be introduced in like manner.
An $\alpha$-level symmetric percentile-method interval for $\theta_0 = \theta(F_0)$ is given by $[\theta(F_1) - \hat t_0, \theta(F_1) + \hat t_0]$, where $\hat t_0$ is chosen to solve the sample equation

$$P\{\theta(F_2) - t \le \theta(F_1) \le \theta(F_2) + t \mid F_1\} - \alpha = 0.$$

(In our earlier examples, $\alpha = 0.95$.) This $\hat t_0$ is an estimator of the solution $t_0 = T(F_0)$ of the population equation

$$P\{\theta(F_1) - t \le \theta(F_0) \le \theta(F_1) + t \mid F_0\} - \alpha = 0,$$

whose solution is a quantile $x_\alpha$ of the distribution of $|\theta(F_1) - \theta(F_0)|$.
Write $x_\alpha$ as $x(F_0)_\alpha$, the quantile when $F_0$ is the true distribution function. Then $t_0 = T(F_0)$ is just $x(F_0)_\alpha$, and we might take $U(\cdot, t)$ to be

$$U(\cdot, t) = x(\cdot)_{\alpha + t}.$$

This is an alternative to multiplicative and additive corrections, which in the present problem are

$$U(\cdot, t) \equiv (1 + t)\,x(\cdot)_\alpha \quad\text{and}\quad U(\cdot, t) \equiv x(\cdot)_\alpha + t,$$

respectively. In general, each will give slightly different numerical results although, as we shall prove shortly, each provides the same order of correction.
Concise definitions of $F_j$ are different in parametric and nonparametric cases. In the former we work within a class $\{F_{(\lambda)}, \lambda \in \Lambda\}$ of distributions that are completely specified up to an unknown vector $\lambda$ of parameters. The “true” distribution is $F_0 = F_{(\lambda_0)}$; we estimate $\lambda_0$ by $\hat\lambda = \lambda(\mathcal{X})$, where $\mathcal{X} = \mathcal{X}_1$ is an $n$-sample drawn from $F_0$, and we take $F_1$ to be $F_{(\hat\lambda)}$. To define $F_j$, let $\hat\lambda_j = \lambda(\mathcal{X}_j)$ denote the estimator $\hat\lambda$ computed for an $n$-sample $\mathcal{X}_j$ drawn from $F_{j-1}$, and put $F_j = F_{(\hat\lambda_j)}$. The nonparametric case is conceptually simpler. There, $F_j$ is the empirical distribution of an $n$-sample drawn randomly, with replacement, from $F_{j-1}$.
To explain how high-index $F_j$’s enter into the computation of bootstrap iterations, we shall discuss calculation of the solution to equation (3.3). That requires calculation of $U(F_2, t)$, defined for example by the additive correction

$$U(F_2, t) = T(F_2) + t,$$

where $T(F_2)$, the version of $T(F_1)$ computed from a resample, is itself defined by an expectation over $F_3$ conditional on $F_2$. Thus, to find the second bootstrap iterate, the solution of (3.3), we must construct $F_1$, $F_2$ and $F_3$. Calculation of $F_2$ “by simulation” typically involves order $B$ sampling operations ($B$ resamples drawn from the original sample), whereas calculation of $F_3$ “by simulation” involves order $B^2$ sampling operations ($B$ resamples drawn from each of $B$ resamples) if the same number of operations is used at each level. Thus, $i$ bootstrap iterations could require order $B^i$ computations, and so complexity would increase rapidly with the number of iterations.
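The nesting is easiest to see in code. The sketch below carries out one iteration of the additive bias correction, using an equal Monte Carlo budget $B$ at each level (so order $B^2$ resampling operations in all); the function name and the equal-budget choice are ours.

```python
import numpy as np

def double_bootstrap_bias_correction(x, theta, B=199, seed=None):
    """One bootstrap iteration of the additive bias correction.

    Outer resamples approximate T(F1) = theta(F1) - E{theta(F2)|F1};
    inner resamples approximate T(F2b) for each outer resample, as
    required to solve the iterated sample equation (3.3).
    Returns (once-corrected, twice-corrected) estimates.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    theta_hat = theta(x)

    outer = np.empty(B)      # theta(F2b)
    t_inner = np.empty(B)    # T(F2b), each requiring B inner resamples
    for b in range(B):
        xb = rng.choice(x, size=x.size, replace=True)
        outer[b] = theta(xb)
        inner_mean = np.mean([
            theta(rng.choice(xb, size=x.size, replace=True))
            for _ in range(B)
        ])
        t_inner[b] = outer[b] - inner_mean

    T1 = theta_hat - outer.mean()                 # T(F1)
    # Solve E{theta(F2) + T(F2) + t - theta(F1) | F1} = 0 for t:
    t_adj = theta_hat - (outer + t_inner).mean()
    return theta_hat + T1, theta_hat + T1 + t_adj
```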
In regular cases, expansions of the error in formulae such as (3.1) are usually power series in $n^{-1/2}$ or $n^{-1}$, often resulting from Edgeworth expansions of the type that we shall discuss in Section 4. Each bootstrap iteration reduces the order of magnitude of error by a factor of at least $n^{-1/2}$. However, in many problems with an element of symmetry, such as two-sided confidence intervals, expansions of error are power series in $n^{-1}$ rather than $n^{-1/2}$, and each bootstrap iteration reduces error by a factor of $n^{-1}$, not just $n^{-1/2}$.
Example 3.1 Bias reduction
In this situation, each bootstrap iteration reduces the order of magnitude of bias by the factor $n^{-1}$. (See Hall 1992, Section 1.5, for further details.) To investigate further the effect of bootstrap iteration on bias, observe that, in the case of bias reduction by an additive correction,

$$f_t(F_0, F_1) = \theta(F_1) - \theta(F_0) + t.$$

Therefore the sample equation,

$$E\{\theta(F_2) - \theta(F_1) + t \mid F_1\} = 0,$$

has solution $t = T(F_1) = \theta(F_1) - E\{\theta(F_2) \mid F_1\}$, and so the once-iterated estimate is

$$\theta(F_1) + T(F_1) = 2\theta(F_1) - E\{\theta(F_2) \mid F_1\}.$$
Example 3.2 Confidence interval
Here, each iteration generally reduces the order of coverage error by the factor $n^{-1}$ in the case of two-sided intervals, and by $n^{-1/2}$ for one-sided intervals. To appreciate the effect of iteration in more detail, let us consider the case of parametric, percentile confidence intervals for a mean, assuming a Normal $N(\mu, \sigma^2)$ population, as discussed in Example 2.2. Let $N$ denote a Normal $N(0, 1)$ random variable. Estimate the parameters $\lambda_0 = (\mu, \sigma^2)$ by maximum likelihood, as before. From (2.3),
$$f_t(F_0, F_1) = I\{\theta(F_1) - t \le \theta(F_0) \le \theta(F_1) + t\} - 0.95,$$

and the sample equation (2.4) has solution $t = T(F_1) = n^{-1/2}x_{0.95}\sigma(F_1)$, where $x_{0.95}$ is given by $P(|N| \le x_{0.95}) = 0.95$. This gives the percentile interval $(\bar X - n^{-1/2}x_{0.95}\hat\sigma, \bar X + n^{-1/2}x_{0.95}\hat\sigma)$. Applying an additive correction to the critical point $x_{0.95}$, and solving the iterated sample equation (3.3), yields $x_{0.95} + \hat t_0 = \{n/(n-1)\}^{1/2}w_{0.95}$, where $w_\alpha$ is defined by

$$P(|T_{n-1}| \le w_\alpha) = \alpha.$$
The resulting bootstrap confidence interval is

$$[\theta(F_1) - n^{-1/2}\sigma(F_1)(x_{0.95} + \hat t_0),\ \theta(F_1) + n^{-1/2}\sigma(F_1)(x_{0.95} + \hat t_0)] = [\bar X - (n-1)^{-1/2}w_{0.95}\hat\sigma,\ \bar X + (n-1)^{-1/2}w_{0.95}\hat\sigma].$$

This is identical to the percentile-$t$ (not the percentile) confidence interval derived in Example 2.2, and has perfect coverage accuracy.
The methodology of bootstrap iteration was introduced by Efron (1983), Hall (1986), Beran (1987) and Loh (1987).
4 Asymptotic theory
4.1 Summary
We begin by describing circumstances where Edgeworth expansions, in the usual rather than the bootstrap sense, may be generated under rigorous regularity conditions; see Section 4.2. Major contributors to this theory include Chibisov (1972, 1973a, 1973b), Sargan (1975, 1976) and Bhattacharya and Ghosh (1978). Our account is based on the latter paper. Following that, in Section 4.3, we discuss bootstrap versions of those expansions and then describe the conclusions that may be drawn from those results. Our first conclusions, about the efficacy of pivotal methods, are given towards the end of Section 4.3. Sections 4.4, 4.5, 4.6 and 4.7 describe, respectively, a variety of different confidence intervals, properties of bootstrap estimates of critical points, properties of coverage error, and the special case of regression. The last case is of particular interest because, in the context of intervals for slope parameters, it admits bootstrap methods with unusually good coverage accuracy.
The main conclusions drawn in this section relate to the virtues of pivoting. That subject was touched on in Section 2, but there we lacked the technical devices necessary to provide a broad description of the relative performances of pivotal and non-pivotal methods. The Edgeworth expansion techniques introduced in Section 4.2 fill this gap. In particular, they enable us to show that pivotal methods generally yield greater accuracy in the estimation of critical points (Section 4.5) and smaller asymptotic order of coverage error of one-sided confidence intervals (Section 4.6). Nevertheless, it should be borne in mind that these results are asymptotic in character and that, while they provide a valuable guide, they do not tell the whole story. For example, the performance of pivotal methods with small samples depends in large part on the relative accuracy of the variance estimator, and can be very poor in cases where an accurate variance estimator is not available. Examples which feature poor accuracy include interval estimation for the correlation coefficient, and for a ratio of means when the denominator mean is close to zero.
Theory for the bootstrap, along the lines of that described here, was developed by Bickel and Freedman (1980), Singh (1981), Beran (1982, 1987), Babu and Singh (1983, 1984, 1985), Hall (1986, 1988a, 1988b), Efron (1987), Liu and Singh (1987) and Robinson (1987). Further work on the bootstrap in regression models is described by Bickel and Freedman (1981, 1983), Freedman (1981), Freedman and Peters (1984) and Peters and Freedman (1984a, 1984b).
4.2 Edgeworth and Cornish-Fisher expansions
We begin by describing a general model that allows Edgeworth and Cornish-Fisher expansions to be established rigorously. Let $\Phi$, $\phi$ denote respectively the Standard Normal distribution and density functions. Let $X, X_1, X_2, \ldots$ be independent and identically distributed random column $d$-vectors with mean $\mu$, and put $\bar X = n^{-1}\sum_{i \le n} X_i$. Let $A: \mathbb{R}^d \to \mathbb{R}$ be a smooth function satisfying $A(\mu) = 0$. We have in mind a function such as $A(x) = \{g(x) - g(\mu)\}/h(\mu)$, where $\theta_0 = g(\mu)$ is the (scalar) parameter estimated by $\hat\theta = g(\bar X)$ and $\sigma^2 = h(\mu)^2$ is the asymptotic variance of $n^{1/2}\hat\theta$; or $A(x) = \{g(x) - g(\mu)\}/h(x)$, where $\hat\sigma^2 = h(\bar X)^2$ is an estimator of $h(\mu)^2$. (Thus, we assume $h$ is a known function.)
This “smooth function model” allows us to study problems where $\theta_0$ is a mean, or a variance, or a ratio of means or variances, or a difference of means or variances, or a correlation coefficient, etc. For example, if $\{W_1, \ldots, W_n\}$ were a random sample from a univariate population with mean $m$ and variance $\beta^2$, and if we wished to estimate $\theta_0 = m$, then we would take $d = 2$, $X = (X^{(1)}, X^{(2)})^T = (W, W^2)^T$, $\mu = E(X)$,

$$g(x) = x^{(1)} \quad\text{and}\quad h(x) = \{x^{(2)} - (x^{(1)})^2\}^{1/2},$$

so that $\hat\theta = g(\bar X) = \bar W$ and

$$h(\bar X)^2 = n^{-1}\sum_{i=1}^{n}(W_i - \bar W)^2 = \hat\beta^2.$$

(Note that $E(W - m)^4 - \beta^4$ equals the asymptotic variance of $n^{1/2}\hat\beta^2$, which is the relevant scale factor if instead $\theta_0 = \beta^2$ is the parameter of interest.) The cases where $\theta_0$ is a correlation coefficient (a function of five means), or a variance ratio (a function of four means), among others, may be treated similarly.
The following result may be established under the model described above. We first present a little notation. Put $\mu = E(X)$, and let

$$a_i = \frac{\partial A}{\partial x^{(i)}}(\mu), \quad a_{ij} = \frac{\partial^2 A}{\partial x^{(i)}\partial x^{(j)}}(\mu), \quad \mu_{ij} = E\{(X^{(i)} - \mu^{(i)})(X^{(j)} - \mu^{(j)})\},$$

with $\mu_{ijk}$ defined analogously for third-order moments, and put $\sigma^2 = \sum_i\sum_j a_i a_j \mu_{ij}$, the asymptotic variance of $n^{1/2}A(\bar X)$.
Theorem 4.1. Assume that the function $A$ has $j + 2$ continuous derivatives in a neighbourhood of $\mu = E(X)$, that $A(\mu) = 0$, that $E(\|X\|^{j+2}) < \infty$, and that the characteristic function $\chi$ of $X$ satisfies

$$\limsup_{\|t\| \to \infty} |\chi(t)| < 1. \qquad (4.1)$$

Suppose $\sigma > 0$. Then for $j \ge 1$,

$$P\{n^{1/2}A(\bar X)/\sigma \le x\} = \Phi(x) + n^{-1/2}p_1(x)\phi(x) + \cdots + n^{-j/2}p_j(x)\phi(x) + o(n^{-j/2}) \qquad (4.2)$$
Trang 20uniformly in x, where pj is a polynomial of degree at most 3j - 1, odd for even j and even for odd j, with coefficients depending on moments of 2 up to order j + 2 In particular,
pl(x) = (‘4,o-’ + $42a-3(X2 - l)}
See Bhattacharya and Ghosh (1978) for a proof.
Condition (4.1) is a multivariate form of Cramér’s continuity condition. It is satisfied if the distribution of $X$ is nonsingular (i.e. has a nondegenerate absolutely continuous component), or if $X = (W, W^2, \ldots, W^d)^T$ where $W$ is a random variable with a nonsingular distribution.
Two versions of (4.2) are given by

$$P\{n^{1/2}(\hat\theta - \theta_0)/\sigma \le x\} = \Phi(x) + n^{-1/2}p_1(x)\phi(x) + \cdots + n^{-j/2}p_j(x)\phi(x) + o(n^{-j/2})$$

and

$$P\{n^{1/2}(\hat\theta - \theta_0)/\hat\sigma \le x\} = \Phi(x) + n^{-1/2}q_1(x)\phi(x) + \cdots + n^{-j/2}q_j(x)\phi(x) + o(n^{-j/2}),$$

corresponding to the two choices of $A$ introduced earlier; the polynomials $q_i$ of the Studentized case generally differ from the $p_i$.
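The leading term of (4.2) is easily checked numerically. For a standardized mean the one-term correction is the classical skewness term $-n^{-1/2}(\gamma/6)(x^2 - 1)\phi(x)$; in the sketch below $A$ is linear (a standardized mean of unit exponentials, so $A_1 = 0$ and $\gamma = 2$), and the choices of $n$, of the evaluation points and of the exponential population are ours.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, reps = 20, 200_000
gamma = 2.0                                    # skewness of the unit exponential

# S_n = n**0.5 * A(X-bar) with A(x) = x - 1 for unit-mean exponentials.
samples = rng.exponential(size=(reps, n))
s = np.sqrt(n) * (samples.mean(axis=1) - 1.0)

for x in (-1.0, 0.0, 1.0):
    mc = (s <= x).mean()                       # Monte Carlo estimate of P(S_n <= x)
    one_term = norm.cdf(x) - (gamma / 6) / np.sqrt(n) * (x**2 - 1) * norm.pdf(x)
    print(f"x={x:+.1f}  MC={mc:.4f}  Phi={norm.cdf(x):.4f}  Edgeworth={one_term:.4f}")
```

The one-term expansion visibly improves on the plain Normal approximation at this sample size.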
The Edgeworth expansion in Theorem 4.1 is readily inverted so as to yield a Cornish-Fisher expansion of the critical point of a distribution. To appreciate how, first define $w_\alpha = w_\alpha(n)$, the $\alpha$-level quantile of the distribution of $S_n = n^{1/2}A(\bar X)$, by

$$P(S_n \le w_\alpha) = \alpha.$$