CHAPTER 16*
Asymptotic test procedures
As discussed in Chapter 14, the main problem in hypothesis testing is to construct a test statistic $\tau(X)$ whose distribution we know under both the null hypothesis $H_0$ and the alternative $H_1$, and which does not depend on the unknown parameter(s) $\theta$. The first part of the problem, that of constructing $\tau(X)$, can be handled relatively easily using the various methods discussed above (Neyman-Pearson lemma, likelihood ratio) when certain conditions are satisfied. The second part of the problem, that of determining the distribution of $\tau(X)$ under both $H_0$ and $H_1$, is much more difficult to 'solve' and often we have to resort to asymptotic theory. This amounts to deriving the asymptotic distribution of $\tau_n(X)$ and using that to determine the rejection region $C_1$ (or $C_0$) and the associated probabilities. For a given sample size $n$, however, these will be only as accurate as the asymptotic distribution of $\tau_n(X)$ is an accurate approximation of its finite sample distribution. Moreover, since the distribution of $\tau_n(X)$ for a given $n$ is not known (otherwise we would use that), we do not know how 'good' the approximation is. This suggests that when asymptotic results are used we should be aware of their limitations and the inaccuracies they can lead to (see Chapter 10).
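The warning about finite-sample accuracy can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: it takes an exponential random sample with mean $\theta$ (so that $I(\theta) = 1/\theta^2$), uses a Wald-type statistic of the kind introduced in Section 16.2, and compares the share of rejections under $H_0$ with the nominal size 0.05 implied by the asymptotic $\chi^2(1)$ distribution. The model, the function names, the seed and the number of replications are all choices made for this example, not part of the text.

```python
import math
import random

CRIT_CHI2_1 = 3.841458820694124  # upper 5% point of chi-square(1), i.e. 1.959964**2

def wald_stat(sample, theta0):
    """Wald-type statistic n (theta_hat - theta0)^2 / theta_hat^2 for the mean
    of an exponential sample (theta_hat is the sample mean, I(theta) = 1/theta^2)."""
    n = len(sample)
    theta_hat = sum(sample) / n
    return n * (theta_hat - theta0) ** 2 / theta_hat ** 2

def empirical_size(n, reps, rng, theta0=1.0):
    """Share of replications, generated under H0, in which the test rejects."""
    rejections = 0
    for _ in range(reps):
        # expovariate takes the rate, which is 1 / mean
        sample = [rng.expovariate(1.0 / theta0) for _ in range(n)]
        if wald_stat(sample, theta0) >= CRIT_CHI2_1:
            rejections += 1
    return rejections / reps

rng = random.Random(12345)  # arbitrary seed, for reproducibility only
size_small = empirical_size(n=10, reps=2000, rng=rng)
size_large = empirical_size(n=400, reps=2000, rng=rng)
print("nominal 0.05, n=10:", size_small, "n=400:", size_large)
```

In a run of this kind the rejection frequency for the larger sample should sit near the nominal 0.05, while for small $n$ it can differ noticeably, which is exactly the inaccuracy the paragraph above warns about.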
16.1 Asymptotic properties
Consider the test defined by the rejection region
$$C_1^n = \{x : \tau_n(x) \ge c_n\}$$
and whose power function is
$$\mathcal{P}_n(\theta) = \Pr(x \in C_1^n), \quad \theta \in \Theta.$$
Since the distribution of $\tau_n(X)$ is not known we cannot determine $c_n$ or $\mathcal{P}_n(\theta)$. If the asymptotic distribution of $\tau_n(X)$ is available, however, we can use that instead to define $c_n$ from some fixed $\alpha$ and the asymptotic power function $\pi(\theta)$, $\theta \in \Theta$.
In this sense we can think of $\{\tau_n(X), n \ge 1\}$ as a sequence of test statistics defining a sequence of rejection regions $\{C_1^n, n \ge 1\}$ with power functions $\{\mathcal{P}_n(\theta), n \ge 1, \theta \in \Theta\}$, and we can choose $c_n$ accordingly to ensure that the sequence of tests has the same size $\alpha$ if
$$\max_{\theta \in \Theta_0} \pi(\theta) = \alpha.$$
Note that $\lim_{n \to \infty} \mathcal{P}_n(\theta) = \pi(\theta)$. In this context the various criteria for tests discussed above must be reformulated in terms of the asymptotic power function $\pi(\theta)$; see Bickel and Doksum (1977).
Definition 1
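The sequence of tests for $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ defined by $\{C_1^n, n \ge 1\}$ is said to be consistent of size $\alpha$ if
$$\max_{\theta \in \Theta_0} \pi(\theta) = \alpha$$
and
$$\pi(\theta) = 1, \quad \theta \in \Theta_1.$$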
As in the case of estimation, consistency is a reasonable property but only a minimal one. In order to be able to make comparisons between various tests we need approximations to the power that are more informative than the limiting value 1. With this in mind we define asymptotic unbiasedness.
Definition 2
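A sequence of tests as defined above is said to be asymptotically unbiased of size $\alpha$ if
$$\max_{\theta \in \Theta_0} \pi(\theta) = \alpha$$
and
$$\alpha < \pi(\theta) < 1, \quad \theta \in \Theta_1.$$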
Definition 3
A sequence of tests as defined above is said to be uniformly most powerful (UMP) of size $\alpha$ if
$$\max_{\theta \in \Theta_0} \pi(\theta) = \alpha$$
and
$$\pi(\theta) \ge \pi^*(\theta), \quad \theta \in \Theta_1,$$
for any size $\alpha$ test with asymptotic power function $\pi^*(\theta)$.
In asymptotic tests we are often interested in local alternatives of the form
$$H_1: \theta_n = \theta_0 + \frac{b}{\sqrt{n}}, \quad b \ne 0, \qquad (16.11)$$
in order to assess the power of the test around the null. When $I_n(\theta) = O_p(n)$ then
$$\sqrt{n}(\hat\theta - \theta_0) \sim N(b, I_\infty(\theta)^{-1}) \qquad (16.12)$$
asymptotically, for $\hat\theta$ the MLE of $\theta$. In this case we consider only local power, and a test with greatest local power is called locally uniformly most powerful.
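The difference between a fixed alternative and a local alternative of the form (16.11) can be seen in a case where the power function is available in closed form. The sketch below assumes, purely for illustration, a random sample from $N(\theta, 1)$ and the one-sided test of $H_0: \theta = 0$ that rejects when $\sqrt{n}\bar{X}_n$ exceeds the upper 5% point of $N(0, 1)$; the names `power` and `Z_ALPHA` and the values of $b$ and $n$ are choices made for this example.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

Z_ALPHA = 1.6448536269514722  # upper 5% point of N(0, 1)

def power(theta, n, theta0=0.0):
    """Power of the one-sided z-test of H0: theta = theta0 for X_i ~ N(theta, 1)."""
    return 1.0 - normal_cdf(Z_ALPHA - math.sqrt(n) * (theta - theta0))

b = 1.5
for n in (25, 400, 10000):
    fixed = power(0.3, n)                # fixed alternative theta = 0.3
    local = power(b / math.sqrt(n), n)   # local alternative theta_n = b / sqrt(n)
    print(n, round(fixed, 4), round(local, 4))
```

Against the fixed alternative the power tends to 1 (consistency), so it cannot discriminate between consistent tests; along the local sequence the power settles at a value strictly between the size and 1, which is what makes local power useful for comparisons.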
16.2 The likelihood ratio and related test procedures
In this section three general test procedures which give rise to asymptotically optimal tests will be considered: the likelihood ratio, Wald and Lagrange multiplier test procedures. All three test procedures can be interpreted as utilising the information incorporated in the log likelihood function in different but asymptotically equivalent ways.
For expositional purposes the test procedures will be considered in the context of the simplest statistical model where
(i) $\Phi = \{f(x; \theta), \theta \in \Theta\}$ is the probability model; and
(ii) $X = (X_1, X_2, \ldots, X_n)'$ is a random sample.
The results can be easily generalised to the non-random sample case where $I_n(\theta) = O_p(n)$, as explained in Chapter 13 above in the context of maximum likelihood estimation. For most results the generalisation amounts to replacing $I(\theta)$ with $I_\infty(\theta)$ and reinterpreting the results.
(1) Simple null hypothesis
Let the null hypothesis be $H_0: \theta = \theta_0$, $\theta \in \Theta \equiv \mathbb{R}^m$, against $H_1: \theta \ne \theta_0$.
(i) The likelihood ratio test
The likelihood ratio test statistic discussed in Section 14.4 takes the form
$$\lambda(x) = \frac{L(\theta_0; x)}{\max_{\theta \in \Theta} L(\theta; x)} = \frac{L(\theta_0; x)}{L(\hat\theta; x)},$$
where $\hat\theta$ is the MLE of $\theta$. In cases where $\lambda(x)$ or some monotonic function of it has a tractable distribution there is no need for asymptotic theory. Commonly, however, this is not the case and asymptotic theory is called for.
(ii) The Wald test
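Under certain regularity conditions, which include CR1-CR3 (see Chapter 13), $\log L(\theta; x)$ can be expanded in a Taylor series at $\theta = \hat\theta$:
$$\log L(\theta; x) = \log L(\hat\theta; x) + (\theta - \hat\theta)'\left[\frac{\partial \log L(\hat\theta; x)}{\partial\theta}\right] + \frac{1}{2}(\theta - \hat\theta)'\left[\frac{\partial^2 \log L(\theta^*; x)}{\partial\theta\,\partial\theta'}\right](\theta - \hat\theta) + o_p(1), \qquad (16.14)$$
where $|\theta^* - \hat\theta| < |\theta - \hat\theta|$ and $o_p(1)$ refers to asymptotically negligible terms (see Chapter 10). Since
$$\frac{\partial \log L(\hat\theta; x)}{\partial\theta} = 0,$$
being the first-order conditions for the MLE, and
$$\frac{1}{n}\,\frac{\partial^2 \log L(\theta^*; x)}{\partial\theta\,\partial\theta'} \overset{P}{\to} -I(\theta),$$
the above expansion can be simplified (see Serfling (1980)) to:
$$\log L(\theta; x) = \log L(\hat\theta; x) - \frac{1}{2}\,n(\hat\theta - \theta)' I(\theta)(\hat\theta - \theta) + o_p(1). \qquad (16.17)$$
The quadratic term enters with a negative sign because the Hessian of the log likelihood converges to $-I(\theta)$; this is the sign required for (16.19) to follow from (16.18).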
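This implies that, since
$$-2\log\lambda(x) = 2[\log L(\hat\theta; x) - \log L(\theta_0; x)], \qquad (16.18)$$
$$-2\log\lambda(x) = n(\hat\theta - \theta_0)' I(\theta)(\hat\theta - \theta_0) + o_p(1). \qquad (16.19)$$
From the asymptotic properties of MLE's it is known that under certain regularity conditions
$$\sqrt{n}(\hat\theta - \theta_0) \overset{H_0}{\sim} N(0, I(\theta)^{-1}).$$
Using this we can deduce (see property Q1, Chapter 15) that
$$LR = -2\log\lambda(x) \simeq n(\hat\theta - \theta_0)' I(\theta)(\hat\theta - \theta_0) \overset{H_0}{\sim} \chi^2(m), \qquad (16.21)$$
being a quadratic form in asymptotically normal random variables (r.v.'s).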
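Wald (1943), using the above approximation of $-2\log\lambda(x)$, proposed an alternative test statistic by replacing $I(\theta)$ with $I(\hat\theta)$:
$$W = n(\hat\theta - \theta_0)' I(\hat\theta)(\hat\theta - \theta_0) \overset{H_0}{\sim} \chi^2(m),$$
given that $I(\hat\theta) \overset{P}{\to} I(\theta)$. This is the so-called Wald statistic.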
(iii) The Lagrange multiplier test
Rao (1947), using the asymptotic distribution of the score function (instead of that of $\hat\theta$), i.e.
$$q(\theta_0) = \frac{1}{\sqrt{n}}\,\frac{\partial \log L(\theta_0; x)}{\partial\theta} \sim N(0, I(\theta_0)), \qquad (16.23)$$
proposed the efficient score (or Lagrange multiplier) test statistic
$$LM = q(\theta_0)' I(\theta_0)^{-1} q(\theta_0) \sim \chi^2(m), \qquad (16.24)$$
which is again a quadratic form in asymptotically normally distributed r.v.'s.
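For all three test statistics (LR, W, LM) the rejection region takes the form
$$C_1 = \{x : l(x) \ge c_\alpha\},$$
where $l(x)$ stands for any of the three test statistics and the critical value $c_\alpha$ is defined by $\int_{c_\alpha}^{\infty} d\chi^2(m) = \alpha$, $\alpha$ being the size of the test. Under local alternatives with a Pitman type drift of the form
$$H_1: \theta_n = \theta_0 + \frac{b}{\sqrt{n}},$$
all three test statistics are asymptotically distributed as noncentral chi-square,
$$\chi^2(m; \delta), \quad \delta = b' I(\theta_0) b,$$
since
$$\sqrt{n}(\hat\theta - \theta_0) = \sqrt{n}(\hat\theta - \theta_n) + b \sim N(b, I(\theta_0)^{-1}), \qquad (16.28)$$
$$q(\theta_0) = I(\theta_0)\sqrt{n}(\hat\theta - \theta_0) + o_p(1) \sim N(I(\theta_0) b, I(\theta_0)), \qquad (16.29)$$
where $q(\theta_0) = n^{-1/2}\,\partial \log L(\theta_0; x)/\partial\theta$ is the normalised score.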
Fig. 16.1 The LR, W and LM tests compared (plot of the score function; axis labels $q(\theta)$ and $q(\theta_0)$).
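Hence, the power function for all three test statistics takes the form
$$\mathcal{P}(\theta_n) = \int_{c_\alpha}^{\infty} d\chi^2(m; \delta), \quad \delta = b' I(\theta_0) b,$$
and thus LR, W and LM are asymptotically equivalent in the sense that they have the same asymptotic properties.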
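For a scalar parameter ($m = 1$) the asymptotic local power can be evaluated without special functions, because a noncentral $\chi^2(1; \delta)$ variable is the square of a $N(\sqrt{\delta}, 1)$ variable. The sketch below relies on that fact; the function name `local_power`, the 5% size and the illustrative values of $b$ and $I(\theta_0)$ are assumptions of this example.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

SQRT_CRIT = 1.959963984540054  # square root of the upper 5% point of chi-square(1)

def local_power(b, info):
    """P(chi-square(1; delta) >= c_alpha) with delta = b * info * b.
    For m = 1 the statistic behaves like (Z + sqrt(delta))**2 with Z ~ N(0, 1)."""
    root_delta = math.sqrt(b * info * b)
    upper_tail = 1.0 - normal_cdf(SQRT_CRIT - root_delta)
    lower_tail = normal_cdf(-SQRT_CRIT - root_delta)
    return upper_tail + lower_tail

for b in (0.0, 1.0, 2.0, 3.0):
    print(b, round(local_power(b, info=1.0), 4))
```

At $b = 0$ the function returns the size of the test, and the power depends on the drift only through $\delta = b' I(\theta_0) b$, so it is the same for $b$ and $-b$ and grows with the information $I(\theta_0)$.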
Fig. 16.1, due to Pagan (1982), shows the relationship between LR, W and LM in the case of a scalar $\theta$. Note that all three test statistics can be interpreted as functions of the score function.
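As a worked illustration of the simple null case, the three statistics can be written out explicitly for a Poisson random sample, a model chosen here only because the MLE ($\hat\theta = \bar{x}$) and the information ($I(\theta) = 1/\theta$) are available in closed form. The data, the value of $\theta_0$ and the function name are made up for this sketch.

```python
import math

CRIT_CHI2_1 = 3.841458820694124  # upper 5% point of chi-square(1)

def poisson_tests(counts, theta0):
    """LR, W and LM statistics for H0: theta = theta0 in a Poisson(theta) random sample.
    Here theta_hat is the sample mean and I(theta) = 1 / theta."""
    n = len(counts)
    theta_hat = sum(counts) / n
    # LR = 2 [log L(theta_hat) - log L(theta0)]; the log(x_i!) terms cancel.
    lr = 2.0 * n * (theta_hat * math.log(theta_hat / theta0) - (theta_hat - theta0))
    # W uses the information evaluated at the unrestricted MLE.
    w = n * (theta_hat - theta0) ** 2 / theta_hat
    # LM uses the score and the information evaluated at theta0.
    q = math.sqrt(n) * (theta_hat - theta0) / theta0   # n**(-1/2) times the score at theta0
    lm = q * theta0 * q                                # q' I(theta0)^{-1} q
    return lr, w, lm

counts = [3, 1, 4, 2, 2, 5, 0, 3, 2, 4]   # made-up data, for illustration only
lr, w, lm = poisson_tests(counts, theta0=2.0)
print(lr, w, lm, [stat >= CRIT_CHI2_1 for stat in (lr, w, lm)])
```

The three values differ in a finite sample even though they share the same asymptotic $\chi^2(1)$ distribution under $H_0$: W evaluates the information at $\hat\theta$, LM evaluates everything at $\theta_0$, and LR uses both points.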
(2) Composite null hypothesis
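Consider the case where $H_0$ is composite, i.e.
$$H_0: \theta \in \Theta_0 \quad \text{against} \quad H_1: \theta \in \Theta_1, \quad \Theta_0 \subset \Theta \subseteq \mathbb{R}^m.$$
It is both convenient and practical to parametrise $\Theta_0$ in the form
$$\Theta_0 = \{\theta : R(\theta) = 0,\ \theta \in \Theta\},$$
where $R(\theta) = 0$ represents $r$ non-linear equations, i.e. $R(\theta) = (R_1(\theta), R_2(\theta), \ldots, R_r(\theta))'$. In most situations in practice the parametrised form arises naturally in the form of restrictions such as $R_1(\theta) = \theta_1\theta_2 + \theta_3$, $R_2(\theta) = \log\theta_1 - \theta_2$, $R_3(\theta) = \theta_1^2 + \theta_2 - 1$, $R_4(\theta) = \theta_1 - 2\theta_2$, etc. If we define $\hat\theta$ to be the maximum likelihood estimator (MLE) of $\theta$, i.e. $\hat\theta$ is the solution of $[\partial \log L(\theta; x)]/\partial\theta = 0$, then from
$$\sqrt{n}(\hat\theta - \theta) \sim N(0, I(\theta)^{-1})$$
and
$$\frac{1}{\sqrt{n}}\,\frac{\partial \log L(\theta; x)}{\partial\theta} \sim N(0, I(\theta)), \qquad (16.36)$$
we can deduce that
$$\sqrt{n}(R(\hat\theta) - R(\theta)) \overset{H_0}{\sim} N(0, R_\theta' I(\theta)^{-1} R_\theta), \qquad (16.37)$$
since $R(\hat\theta)$ can be approximated around $\theta$ by
$$R(\hat\theta) \simeq R(\theta) + R_\theta'(\hat\theta - \theta),$$
where
$$R_\theta = \frac{\partial R(\theta)'}{\partial\theta}$$
is the $m \times r$ matrix of derivatives of the restrictions.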
(i) The Wald test procedure
If the null hypothesis $H_0$ is true we expect the MLE $\hat\theta$, obtained without imposing the restrictions, to be close to satisfying the restrictions, i.e. if $H_0$ is true, $R(\hat\theta) \simeq 0$. This implies that a natural measure for any departure from $H_0$ should be
$$\|R(\hat\theta)\|.$$
If this is 'significantly' different from zero it will be a good indication that $H_0$ is false. The problem is to formalise the concept 'significantly different from zero'. The obvious way to proceed is to construct a pivot based on $\|R(\hat\theta)\|$ in order to enable us to turn this statement into a precise probabilistic statement.
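In constructing such a pivot there are two basic problems to overcome. The first is that $\|R(\hat\theta)\|$ depends on the units of measurement and the second is that absolute values are not easy to manipulate. A quantity which 'solves' both problems is the quadratic form
$$R(\hat\theta)'[V(R(\hat\theta))]^{-1} R(\hat\theta),$$
where $V(R(\hat\theta))$ represents the covariance of $R(\hat\theta)$. Determining $V(R(\hat\theta))$ can be a very difficult task since we often know very little about the distribution of $\hat\theta$. Asymptotically, however, we know the distribution of $R(\hat\theta)$ and hence we can deduce that
$$n R(\hat\theta)'[R_\theta' I(\theta)^{-1} R_\theta]^{-1} R(\hat\theta) \overset{H_0}{\sim} \chi^2(r). \qquad (16.42)$$
Wald's suggestion amounts to replacing $V(R(\hat\theta))$ with a consistent estimator, i.e.
$$W = n R(\hat\theta)'[R_{\hat\theta}' I(\hat\theta)^{-1} R_{\hat\theta}]^{-1} R(\hat\theta) \overset{H_0}{\sim} \chi^2(r).$$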
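Note that the Wald procedure can be used in conjunction with any asymptotically normal estimator $\theta^*$ (not just MLE's) since if
$$\sqrt{n}(\theta^* - \theta) \sim N(0, V(\theta))$$
then
$$W^* = n R(\theta^*)'[R_{\theta^*}' V(\theta^*) R_{\theta^*}]^{-1} R(\theta^*) \overset{H_0}{\sim} \chi^2(r).$$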
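The Wald statistic for a single non-linear restriction ($r = 1$) needs only the value of the restriction, its gradient and an estimate of the asymptotic covariance of the estimator. The sketch below assumes a hypothetical two-parameter problem with restriction $R(\theta) = \theta_1\theta_2 - 1 = 0$; the estimates, the sample size and the covariance matrix are made-up values, and `wald_scalar_restriction` is a name introduced for this example.

```python
def wald_scalar_restriction(n, r_value, gradient, cov):
    """W = n R(theta*)^2 / (g' V g) for a single restriction (r = 1).
    gradient: g = dR/dtheta evaluated at theta*;
    cov: estimate of the asymptotic covariance V of sqrt(n)(theta* - theta)."""
    m = len(gradient)
    g_v_g = sum(gradient[i] * cov[i][j] * gradient[j]
                for i in range(m) for j in range(m))
    return n * r_value ** 2 / g_v_g

# Hypothetical estimates; restriction R(theta) = theta_1 * theta_2 - 1 = 0.
theta_star = [2.1, 0.5]
cov = [[1.0, 0.0], [0.0, 1.0]]   # assumed V (equal to I(theta)^{-1} when theta* is the MLE)
r_value = theta_star[0] * theta_star[1] - 1.0
gradient = [theta_star[1], theta_star[0]]   # (dR/dtheta_1, dR/dtheta_2)
w = wald_scalar_restriction(50, r_value, gradient, cov)
print(w)
```

The resulting value would be compared with a critical value from $\chi^2(1)$. Nothing in the computation requires $\theta^*$ to be the MLE; only asymptotic normality and a consistent covariance estimate are used.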
(ii) Lagrange multiplier test procedure
In contrast to the Wald test procedure, the Lagrange multiplier procedure is based solely on the restricted MLE of $\theta$, say $\tilde\theta$. Although the Lagrange multiplier test statistic can take various equivalent forms we consider only two such forms in what follows. Estimation of $\theta$ subject to the restrictions $R(\theta) = 0$ is based on the optimisation of the Lagrangian function
$$l(\theta, \mu) = \log L(\theta; x) - \mu' R(\theta),$$
where $\mu$ is an $r \times 1$ vector of multipliers. The restricted MLE of $\theta$ is defined to be the solution of the system of equations:
$$\frac{\partial \log L(\tilde\theta; x)}{\partial\theta} - R_{\tilde\theta}\,\mu(\tilde\theta) = 0, \qquad R(\tilde\theta) = 0.$$
In the case of the Wald procedure we began our search for an asymptotic pivot using $R(\hat\theta)$, which should be close to zero when $H_0$ is true. In the present case, however, $R(\tilde\theta) = 0$ by definition and thus it cannot be used. But although in the Wald procedure the score function evaluated at $\theta = \hat\theta$ is zero, i.e.
$$\frac{\partial \log L(\hat\theta; x)}{\partial\theta} = 0,$$
this is not the case for $[\partial \log L(\tilde\theta; x)]/\partial\theta$ and we can use it to construct an asymptotic pivot. Equivalently, the Lagrange multipliers $\mu(\tilde\theta)$ can be used instead. The intuition underlying the use of $\mu(\tilde\theta)$ is that these multipliers can be interpreted as shadow prices for the constraints and should register all departures from $H_0$; if $\tilde\theta$ is close to $\hat\theta$, $\mu(\tilde\theta)$ is small and vice versa. Hence, a reasonable thing to do is to consider the quantity $\|\mu(\tilde\theta) - 0\|$. Using the same argument as in the Wald procedure for $\|R(\hat\theta) - 0\|$ we set up the quadratic form
$$\mu(\tilde\theta)'[V(\mu(\tilde\theta))]^{-1}\mu(\tilde\theta).$$
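Using the fact that
$$\frac{1}{\sqrt{n}}\,\frac{\partial \log L(\theta; x)}{\partial\theta} \sim N(0, I(\theta)),$$
we can deduce that
$$\frac{1}{\sqrt{n}}(\mu(\tilde\theta) - \mu(\theta)) \sim N(0, [R_\theta' I(\theta)^{-1} R_\theta]^{-1}). \qquad (16.52)$$
Hence,
$$\frac{1}{n}\,\mu(\tilde\theta)'[R_\theta' I(\theta)^{-1} R_\theta]\,\mu(\tilde\theta) \overset{H_0}{\sim} \chi^2(r).$$
The Lagrange multiplier test statistic takes the form
$$LM = \frac{1}{n}\,\mu(\tilde\theta)'[R_{\tilde\theta}' I(\tilde\theta)^{-1} R_{\tilde\theta}]\,\mu(\tilde\theta) \overset{H_0}{\sim} \chi^2(r), \qquad (16.54)$$
or, equivalently,
$$LM = \frac{1}{n}\left(\frac{\partial \log L(\tilde\theta; x)}{\partial\theta}\right)' I(\tilde\theta)^{-1}\left(\frac{\partial \log L(\tilde\theta; x)}{\partial\theta}\right), \qquad (16.55)$$
which is the efficient score form.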
The likelihood ratio test statistic takes the form
$$LR = 2(\log L(\hat\theta; x) - \log L(\tilde\theta; x)) \overset{H_0}{\sim} \chi^2(r). \qquad (16.56)$$
Using the Taylor series expansions we can show that, under $H_0$, the three statistics differ only by asymptotically negligible terms:
$$LR \simeq W \simeq LM.$$
Thus, although the three test statistics are based on three different asymptotic pivots, as $n \to \infty$ the test statistics become equivalent. All three tests share the same asymptotic properties; they are all consistent as well as asymptotically locally UMP against local alternatives of the form considered above. In the absence of any information relating to higher-order approximations of the distribution of these test statistics under both $H_0$ and $H_1$, the choice between them is based on computational convenience. The Wald test statistic is constructed in terms of $\hat\theta$, the unrestricted MLE of $\theta$, the Lagrange multiplier in terms of $\tilde\theta$, the restricted MLE of $\theta$, and the likelihood ratio in terms of both.
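The division of labour between $\hat\theta$ and $\tilde\theta$ can be seen in a model where both are available in closed form. The sketch below assumes, for illustration only, a random sample from $N(\mu, \sigma^2)$ with $H_0: \mu = \mu_0$ and $\sigma^2$ left unrestricted ($r = 1$); in that case the three statistics reduce to simple functions of the unrestricted and restricted variance estimates. The data and the function name are made up for this example.

```python
import math

def normal_mean_tests(x, mu0=0.0):
    """W, LR and LM statistics for H0: mu = mu0 in a N(mu, sigma^2) random sample,
    with sigma^2 left unrestricted (so r = 1)."""
    n = len(x)
    mean = sum(x) / n
    s2_unrestricted = sum((xi - mean) ** 2 for xi in x) / n   # MLE of sigma^2
    s2_restricted = sum((xi - mu0) ** 2 for xi in x) / n      # MLE of sigma^2 under H0
    w = n * (mean - mu0) ** 2 / s2_unrestricted    # uses the unrestricted MLE only
    lm = n * (mean - mu0) ** 2 / s2_restricted     # uses the restricted MLE only
    lr = n * math.log(s2_restricted / s2_unrestricted)   # uses both
    return w, lr, lm

x = [0.8, -0.3, 1.5, 0.2, 1.1, -0.6, 0.9, 0.4]   # made-up data, for illustration only
w, lr, lm = normal_mean_tests(x)
print(w, lr, lm)
```

In this particular model the statistics satisfy $W \ge LR \ge LM$ in every sample, since with $t = (\bar{x} - \mu_0)^2/\hat\sigma^2$ they equal $nt$, $n\log(1+t)$ and $nt/(1+t)$ respectively. This ordering is a property of the example rather than a general result, and it shows how tests that are asymptotically equivalent can still lead to different decisions in a finite sample.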
In order to be able to discriminate between the above three tests we need to derive higher-order approximations such as Edgeworth approximations (see Chapter 10). Rothenberg (1984) gives an excellent discussion of various ways to derive such higher-order approximations.
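Of particular interest in practice is the case where $\theta = (\theta_1, \theta_2)$ and $H_0: \theta_1 = \theta_1^0$ against $H_1: \theta_1 \ne \theta_1^0$, with $\theta_1$: $r \times 1$ and $\theta_2$: $(m - r) \times 1$ left unrestricted. In this case the three test statistics take the form
$$LR = -2(\log L(\tilde\theta; x) - \log L(\hat\theta; x)), \qquad (16.58)$$
$$W = n(\hat\theta_1 - \theta_1^0)'[I_{11}(\hat\theta) - I_{12}(\hat\theta) I_{22}^{-1}(\hat\theta) I_{21}(\hat\theta)](\hat\theta_1 - \theta_1^0), \qquad (16.59)$$
$$LM = \frac{1}{n}\,\mu(\tilde\theta)'[I_{11}(\tilde\theta) - I_{12}(\tilde\theta) I_{22}^{-1}(\tilde\theta) I_{21}(\tilde\theta)]^{-1}\mu(\tilde\theta), \qquad (16.60)$$
where
$$\hat\theta = (\hat\theta_1, \hat\theta_2), \quad \tilde\theta = (\theta_1^0, \tilde\theta_2), \quad \mu(\tilde\theta) = \left.\frac{\partial \log L(\theta; x)}{\partial\theta_1}\right|_{\theta = \tilde\theta}.$$
This is because for $R(\theta) = \theta_1 - \theta_1^0$,
$$I(\theta) = \begin{pmatrix} I_{11}(\theta) & I_{12}(\theta) \\ I_{21}(\theta) & I_{22}(\theta) \end{pmatrix}, \qquad R_\theta = (I_r : 0)',$$
and hence
$$R_\theta' I(\theta)^{-1} R_\theta = [I_{11}(\theta) - I_{12}(\theta) I_{22}^{-1}(\theta) I_{21}(\theta)]^{-1}. \qquad (16.61)$$
For further discussion of the above asymptotic test procedures see the survey by Engle (1984).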
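The partitioned-inverse identity (16.61) is easy to verify numerically in the smallest case, $r = 1$ and $m = 2$, where every block of $I(\theta)$ is a scalar. The numbers below are an arbitrary positive definite matrix chosen for this check, and the two function names are introduced only for the example.

```python
def efficient_block(i11, i12, i22):
    """I_11 - I_12 I_22^{-1} I_21 for scalar blocks (r = 1, m = 2, I_21 = I_12)."""
    return i11 - i12 * i12 / i22

def inverse_top_left(i11, i12, i22):
    """(1,1) element of the inverse of [[i11, i12], [i12, i22]],
    i.e. R' I^{-1} R for R = (1, 0)'."""
    det = i11 * i22 - i12 * i12
    return i22 / det

i11, i12, i22 = 2.0, 0.6, 1.5   # an arbitrary positive definite information matrix
lhs = inverse_top_left(i11, i12, i22)
rhs = 1.0 / efficient_block(i11, i12, i22)
print(lhs, rhs)
```

Both sides agree, which is why W in (16.59) carries the bracketed matrix itself while LM in (16.60) carries its inverse. When $I_{12}(\theta) = 0$ the bracket reduces to $I_{11}(\theta)$, so the unrestricted parameters $\theta_2$ have no effect on the test of $\theta_1$.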