(iii) Var(y_t/X_t = x_t) = σ², homoskedastic;
[7] θ = (β, σ²) are time-invariant.
In each of the Sections 21.2-21.5 the above assumptions will be relaxed one at a time, retaining the others, and the following interrelated questions will be discussed:
(a) what are the implications of the departures considered?
(b) how do we detect such departures? and
(c) how do we proceed if departures are detected?
It is important to note at the outset that the following discussion, which considers individual assumptions being relaxed separately, limits the scope of misspecification analysis, because it is rather rare to encounter such conditions in practice. More often than not various assumptions are invalid simultaneously. This is considered in more detail in Section 21.1. Section 21.6 discusses the problem of structural change, which constitutes a particularly important form of departure from [7].
21.1 Misspecification testing and auxiliary regressions
Misspecification testing refers to the testing of the assumptions underlying a statistical model. In this context the null hypothesis is uniquely defined as the assumption(s) in question being valid. The alternative takes a particular form of departure from the null, which is invariably non-unique. This is
because departures from a given assumption can take numerous forms, with the specified alternative being only one such form. Moreover, most misspecification tests are based on the questionable presupposition that the other assumptions of the model are valid. This is because joint misspecification testing is considerably more involved. For these reasons the choice in a misspecification test is between rejecting and not rejecting the null; accepting the alternative should be excluded at this stage.
An important implication for the question of how to proceed if the null is rejected is that, before any action is taken, the results of the other misspecification tests should also be considered. It is often the case that a particular form of departure from one assumption might also affect other assumptions. For example, when the assumption of sample independence [8] is invalid the other misspecification tests are influenced (see Chapter 22).
In general the way to proceed when any of the assumptions [6]-[8] are invalid is first to narrow down the source of the departures by relating them back to the NIID assumption of {Z_t, t ∈ T}, and then respecify the model taking into account the departure from NIID. The respecification of the model involves a reconsideration of the reduction from D(Z₁, Z₂, ..., Z_T; ψ) to D(y_t/X_t; θ) so as to account for the departures from the assumptions involved. As argued in Chapters 19-20, this reduction comes in the form

D(Z₁, Z₂, ..., Z_T; ψ) = ∏_{t=1}^T D(Z_t; φ) = ∏_{t=1}^T D(y_t/X_t; θ₁)·D(X_t; θ₂)

(the first equality resting on the independent and identically distributed components of NIID), which suggests that it is sensible to test assumption [8] first and then proceed with the other assumptions if [8] is not rejected. The sequence of misspecification tests considered in what follows is chosen only for expositional purposes.
With the above discussion in mind let us consider the question of general procedures for the derivation of misspecification tests. In cases where the alternative in a misspecification test is given a specific parametric form, the various procedures encountered in specification testing (F-type tests, Wald,
Lagrange multiplier and likelihood ratio) can be easily adapted to apply in the present context. In addition to these procedures several specific misspecification test procedures have been proposed in the literature (see White (1982), Bierens (1982), inter alia). Of particular interest in the present book are the procedures based on the 'omitted variables' argument which lead to auxiliary regressions (see Ramsey (1969), (1974), Pagan and Hall (1983), Pagan (1984), inter alia). This particular procedure is given a prominent role in what follows because it is easy to implement in practice and it provides a common-sense interpretation of most other misspecification tests.
The 'omitted variables' argument was criticised in Section 20.2 because it was based on the comparison of two 'non-comparable' statistical GM's. This was because the information sets underlying the latter were different. It was argued, however, that the argument could be reformulated by postulating the same sample information sets. In particular, if both parametrisations can be derived from D(Z₁, Z₂, ..., Z_T; ψ) by using alternative reduction arguments, then the two statistical GM's can be made comparable.
Let {Z_t, t ∈ T} be a vector stochastic process defined on the probability space (S, ℱ, P(·)) which includes the stochastic variables of interest. In Chapter 17 it was argued that for a given σ-field 𝒟_t ⊂ ℱ,

y_t = E(y_t/𝒟_t) + u_t (21.3)

defines a general statistical GM with

μ_t = E(y_t/𝒟_t), u_t = y_t − E(y_t/𝒟_t), (21.4)

satisfying some desirable properties by construction, including the orthogonality condition:

E(μ_t u_t) = 0. (21.5)

It is important to note, however, that (3)-(4) as defined above are just 'empty boxes'. These are filled when {Z_t, t ∈ T} is given a specific probabilistic structure such as NIID. In the latter case (3)-(4) take the specific forms:

y_t = β'x_t + u_t, with μ_t = β'x_t and u_t = y_t − β'x_t, (21.6)

with the conditioning information set being 𝒟_t = {X_t = x_t}.
When any of the assumptions in NIID are invalid, however, the various properties of μ_t and u_t no longer hold for the corresponding components μ_t* and u_t*. In particular the orthogonality condition (5) is invalid. The non-orthogonality

E(μ_t* u_t*) ≠ 0 (21.9)

can be used to derive various misspecification tests. If we specify the alternative in a parametric form which includes the null as a special case, (9) could be used to derive misspecification tests based on certain auxiliary regressions.
In order to illustrate this procedure let us consider two important parametric forms which can provide the basis of several misspecification tests. The first is a polynomial in powers of the systematic component μ_t = β'x_t (see (82) below); the second,

g(x_t) = a₀ + Σ_i a_i x_it + Σ_i Σ_j a_ij x_it x_jt + Σ_i Σ_j Σ_k a_ijk x_it x_jt x_kt + ...,

is known as the Kolmogorov-Gabor polynomial (see Ivakhnenko (1984)). Both of these polynomials can be used to specify a general parametric form for the alternative systematic component:

μ_t* = β₀'x_t + γ'z_t*,

where z_t* represents known functions of the variables Z_{t−1}, ..., Z₁, X_t. This gives rise to the alternative statistical GM

y_t = β₀'x_t + γ'z_t* + ε_t, (21.13)

which includes (6) as a special case under

H₀: γ = 0. (21.14)

A direct comparison between (13) and (6) gives rise to the auxiliary regression

u_t = (β₀ − β)'x_t + γ'z_t* + ε_t, (21.15)

whose operational form

û_t = (β₀ − β)'x_t + γ'z_t* + ε_t (21.16)
can be used to test (14) directly, where û_t are the OLS residuals from (6). The most obvious test is the F-type test discussed in Sections 19.5 and 20.3. The F-test will take the general form

FT(y) = ((RRSS − URSS)/URSS)·((T − k*)/m),

where RRSS and URSS refer to the residual sums of squares from (6) and (16) (or (13)), respectively; k* being the number of parameters in (13) and m the number of restrictions.
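To make the mechanics of the auxiliary regression and the F-type statistic concrete, here is a minimal sketch in Python; the language, the numpy dependency and the function name are choices of this illustration, not part of the original text.

```python
import numpy as np

def ftype_misspecification_test(y, X, Z_star):
    """F-type test of H0: gamma = 0 in y = X b + Z* g + error."""
    def rss(M):
        coef, *_ = np.linalg.lstsq(M, y, rcond=None)
        resid = y - M @ coef
        return resid @ resid

    T, k = X.shape
    m = Z_star.shape[1]                  # number of restrictions
    k_star = k + m                       # parameters in the unrestricted GM (13)
    RRSS = rss(X)                        # restricted RSS, from (6)
    URSS = rss(np.hstack([X, Z_star]))   # unrestricted RSS, from (16)/(13)
    return ((RRSS - URSS) / URSS) * ((T - k_star) / m)
    # compare with an F(m, T - k_star) critical value
```

The same sketch applies to all the auxiliary regressions considered below, with Z_star holding the added terms z_t*.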
This procedure could be easily extended to the higher central moments of D(y_t/X_t; θ).

21.2 Normality

Let us relax the normality assumption and postulate instead that

y_t = β'x_t + u_t, (y_t/X_t = x_t) ~ D(β'x_t, σ²), (21.21)

where D(·) is an unknown distribution, and discuss the problem of testing whether D(·) is in fact normal or not.
(1) Consequences of non-normality
Let us consider the effect of the non-normality assumption in (21) on the specification, estimation and testing in the context of the linear regression model discussed in Chapter 19.
As far as specification (see Section 19.2) is concerned, only marginal changes are needed. After removing assumption [6](i) the other assumptions can be reinterpreted in terms of D(β'x_t, σ²). This suggests that relaxing normality but retaining linearity and homoskedasticity might not constitute a major break from the linear regression framework.
The first casualty of (21) as far as estimation (see Section 19.4) is concerned is the method of maximum likelihood itself, which cannot be used unless the form of D(·) is known. We could, however, use the least-squares method of estimation briefly discussed in Section 13.1, where the form of the underlying distribution is 'apparently' not needed.
Least-squares is an alternative method of estimation which is historically much older than maximum likelihood or the method of moments. The least-squares method estimates the unknown parameters θ by minimising the squares of the distance between the observable random variables y_t, t ∈ T, and h_t(θ) (a function of θ purporting to approximate the mechanism giving rise to the observed values y_t), weighted by a precision factor 1/κ_t which is assumed known, i.e.

min_{θ∈Θ} Σ_{t=1}^T (1/κ_t)(y_t − h_t(θ))².
It is interesting to note that this method was first suggested by Gauss in 1794 as an alternative to maximising what we nowadays call the log-likelihood function under the normality assumption (see Section 13.1 for more details). In an attempt to motivate the least-squares method he argued that:

the most probable value of the desired parameters will be that in which the sum of the squares of differences between the actually observed and computed values multiplied by numbers that measure the degree of precision, is a minimum.
This clearly shows a direct relationship between the normality assumption and the least-squares method of estimation. It can be argued, however, that the least-squares method can be applied to estimation problems without assuming normality. In relation to such an argument Pearson (1920) warned that:

we can only assert that the least-squares methods are theoretically accurate on the assumption that our observations obey the normal law. Hence in disregarding normal distributions and claiming great generality by merely using the principle of least-squares the apparent generalisation has been gained merely at the expense of theoretical validity.
Despite this forceful argument, let us consider the estimation of the linear regression model without assuming normality, but retaining linearity and homoskedasticity as in (21).
The least-squares method suggests minimising

l(β) = Σ_{t=1}^T (y_t − β'x_t)²/σ²,

which yields the OLS estimator b = (X'X)⁻¹X'y and, as an estimator of σ², ŝ² = (1/(T − k)) Σ_{t=1}^T (y_t − b'x_t)².
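As a small numerical illustration of these estimators, the following sketch simulates data with non-normal errors; the design matrix, parameter values and sample size are arbitrary choices of this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 80, 4
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
beta = np.array([1.0, 0.5, -0.3, 0.2])
u = rng.uniform(-1.7, 1.7, T)              # non-normal, zero-mean errors
y = X @ beta + u

b = np.linalg.solve(X.T @ X, X.T @ y)      # b = (X'X)^{-1} X'y
s2 = np.sum((y - X @ b) ** 2) / (T - k)    # unbiased estimator of sigma^2
```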
Finite sample properties of b and ŝ²
Although b is identical to β̂ (the MLE of β under normality), the similarity does not extend to the properties unless D(y_t/X_t; θ) is normal.
(a) Since b = Ly, where L = (X'X)⁻¹X', the OLS estimator is linear in y.
Using the properties of the expectation operator E(·) we can deduce:
(b) E(b) = E(β + Lu) = β + LE(u) = β, i.e. b is an unbiased estimator of β.
(c) E[(b − β)(b − β)'] = E(Luu'L') = σ²LL' = σ²(X'X)⁻¹.
Given that we have the mean and variance of b but not its distribution, what other properties can we deduce?
Clearly, we cannot say anything about sufficiency or full efficiency without knowing D(y_t/X_t; θ), but we can discuss relative efficiency within the class of estimators satisfying (a) and (b). The Gauss-Markov theorem provides us with such a result.
Gauss-Markov theorem
Under assumption (21), b, the OLS estimator of β, has minimum variance within the class of linear and unbiased estimators (for a proof see Judge et al. (1982)).
As far as ŝ² is concerned, we can show that
(d) E(ŝ²) = σ², i.e. ŝ² is an unbiased estimator of σ²,
using only the properties of the expectation operator relative to D(β'x_t, σ²).
In order to test any hypotheses or set up confidence intervals for θ = (β, σ²) we need the distribution of the OLS estimators b and ŝ². Thus, unless we specify the form of D(β'x_t, σ²), no test or confidence interval statistics can be derived. The question which naturally arises is to what extent 'asymptotic theory' can at least provide us with large sample results.
Asymptotic distribution of b and ŝ²

Lemma 21.1
√T(b − β) ~ N(0, σ²Q_x⁻¹) asymptotically, where Q_x = lim_{T→∞}(X'X/T) is assumed to be non-singular.

Lemma 21.2
√T(ŝ² − σ²) ~ N(0, μ₄ − σ⁴) asymptotically,

where μ₄ refers to the fourth central moment of D(y_t/X_t; θ), assumed to be finite (see Schmidt (1976)). Note that in the case where D(y_t/X_t; θ) is normal, μ₄ = 3σ⁴ and the asymptotic variance of ŝ² reduces to 2σ⁴.
From the above lemmas we can see that although the asymptotic distribution of b coincides with the asymptotic distribution of the MLE, this is not the case with ŝ². The asymptotic distribution of b does not depend on
D(y_t/X_t; θ) but that of ŝ² does, via μ₄. The question which naturally arises is to what extent the various results related to tests about θ = (β, σ²) (see Section 19.5) are at least asymptotically justifiable. Let us consider the F-test for H₀: Rβ = r against H₁: Rβ ≠ r. From lemma 21.1 we can deduce that under H₀, √T(Rb − r) ~ N(0, σ²RQ_x⁻¹R') asymptotically, which implies that

mτ_T(y) = (Rb − r)'[Rŝ²(X'X)⁻¹R']⁻¹(Rb − r) ~ χ²(m)

asymptotically under H₀, and thus the F-test is robust with respect to the non-normality assumption (21) above. Although the asymptotic distribution of τ_T(y) is chi-square, in practice the F-distribution provides a better approximation for a small T (see Section 19.5). This is particularly true when D(β'x_t, σ²) has heavy tails. The significance t-test, being a special case of the F-test, shares this asymptotic robustness. The same is not true, however, for tests based on ŝ²: since the asymptotic distribution of ŝ² depends on μ₄, the actual size and power of these tests can be very different from the ones implied by the postulated value of α. This can seriously affect all tests which depend on the distribution of s², such as some heteroskedasticity and structural change tests (see Sections 21.4-21.6 below). In order to get non-normality robust tests in such cases we need to modify them to take account of μ₄.
(2) Testing for departures from normality
Tests for normality can be divided into parametric and non-parametric tests, depending on whether the alternative is given a parametric form or not.
(a) Non-parametric tests
The Kolmogorov-Smirnov test
Based on the assumption that {u_t/X_t, t ∈ T} is an IID process we can use the results of Appendix 11.1 to construct a test with rejection region

C₁ = {y: √T Δ_T(y) > c_α},

where Δ_T(y) = sup_z |F̂_T(z) − F₀(z)| is the maximum distance between the empirical cumulative distribution function of the residuals and the postulated (normal) one.
The Shapiro-Wilk test
This test is based on the ratio of two different estimators of the variance σ²:

W(y) = (Σ_{i=1}^T a_i û_(i))² / Σ_{t=1}^T (û_t − ū)²,

where û_(i) denotes the ith ordered residual and a_i is a weight coefficient tabulated by Shapiro and Wilk (1965) for sample sizes 2 ≤ T ≤ 50. The rejection region takes the form:

C₁ = {y: W(y) < c_α},

where the c_α are tabulated in the above paper.
(b) Parametric tests
The skewness-kurtosis test
The most widely used parametric test for normality is the skewness-kurtosis test. The parametric alternative in this test comes in the form of the Pearson family of densities.
The Pearson family of distributions is based on the differential equation

d ln f(z)/dz = (z − a)/(c₀ + c₁z + c₂z²),

whose solution for different values of (a, c₀, c₁, c₂) generates a large number of interesting distributions, such as the gamma, beta and Student's t. It can be shown that knowledge of σ², α₃ and α₄ (the variance, skewness and kurtosis coefficients) can be used to determine the distribution of Z within the Pearson family. In particular:
(a) c₂ = 0, c₁ ≠ 0. This gives rise to gamma-type distributions, with the chi-square an important member of this class of distributions. For Z ~ χ²(m), α₃ = √(8/m) and α₄ = 3 + 12/m.
(b) c₁ = 0, c₀ > 0, c₂ > 0. An important member of this class of distributions is the Student's t. For Z ~ t(m), α₃ = 0, α₄ = 3 + 6/(m − 4), (m > 4).
(c) c₀ < 0 < c₂. This gives rise to beta-type distributions, which are directly related to the chi-square and F-distributions. In particular, if Z_i ~ χ²(m_i), i = 1, 2, and Z₁, Z₂ are independent, then Z₁/(Z₁ + Z₂) is beta distributed with parameters (½m₁, ½m₂).
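These skewness and kurtosis values can be verified numerically; a quick check, assuming scipy is available (the library choice is an assumption of this illustration):

```python
from scipy import stats

m = 10
skew, ex_kurt = stats.chi2.stats(m, moments="sk")
print(skew, ex_kurt + 3)        # sqrt(8/m) = 0.894..., 3 + 12/m = 4.2

skew_t, ex_kurt_t = stats.t.stats(m, moments="sk")
print(skew_t, ex_kurt_t + 3)    # 0.0, 3 + 6/(m - 4) = 4.0
```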
It is interesting to note that (48) also characterises normality within the 'short' (first four moments) Gram-Charlier expansion:

g(z) = [1 + (1/6)α₃(z³ − 3z) + (1/24)(α₄ − 3)(z⁴ − 6z² + 3)]φ(z), (21.49)

where φ(z) denotes the standard normal density (see Section 10.6).
Bera and Jarque (1982), using the Pearson family as the parametric alternative, derived the following skewness-kurtosis test as a Lagrange multiplier test:

τ*(y) = T[α̂₃²/6 + (α̂₄ − 3)²/24], (21.50)

where α̂₃ and α̂₄ are the sample skewness and kurtosis coefficients of the residuals. The rejection region is defined by

C₁ = {y: τ*(y) > c_α}, ∫_{c_α}^∞ dχ²(2) = α. (21.53)

With α̂₃ and α̂₄ being asymptotically independent (see Kendall and Stuart (1969)), we can add the squares of their standardised forms to derive (50); see Section 6.3.
Let us consider the skewness-kurtosis test for the money equation

m_t = 2.896 + 0.690y_t + 0.865p_t − 0.055i_t + û_t, (21.56)
(1.034) (0.105) (0.020) (0.013) (0.039)

R² = 0.995, R̄² = 0.995, s = 0.0393, log L = 147.4,
T = 80, α̂₃² = 0.005, (α̂₄ − 3)² = 0.145.

Thus τ*(y) = 0.55, and since c_α = 5.99 for α = 0.05 we can deduce that, under the assumption that the other assumptions underlying the linear regression model are valid, the null hypothesis H₀: α₃ = 0 and α₄ = 3 is not rejected for α = 0.05.
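A minimal sketch of the statistic (50) in Python, checked against the money-equation figures quoted above (the function name is a choice of this illustration):

```python
import numpy as np

def skewness_kurtosis_stat(resid):
    T = resid.size
    u = resid - resid.mean()
    s2 = np.mean(u ** 2)
    a3 = np.mean(u ** 3) / s2 ** 1.5       # sample skewness
    a4 = np.mean(u ** 4) / s2 ** 2         # sample kurtosis
    return T * (a3 ** 2 / 6 + (a4 - 3) ** 2 / 24)

# the money-equation figures: T = 80, a3^2 = 0.005, (a4 - 3)^2 = 0.145
print(80 * (0.005 / 6 + 0.145 / 24))       # 0.55, as reported in the text
```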
There are several things to note about the above skewness-kurtosis test. Firstly, it is an asymptotic test and caution should be exercised when the sample size T is small; for higher-order approximations of the finite sample distribution of α̂₃ and α̂₄ see Pearson, D'Agostino and Bowman (1977), Bowman and Shenton (1975), inter alia. Secondly, the test is sensitive to 'outliers' ('unusually large' deviations). This can be both a blessing and a hindrance. The first reaction of a practitioner whose residuals fail this normality test is to look for such outliers. When the apparent non-normality can be explained by the presence of these outliers, the problem can be solved when the presence of the outliers can itself be explained. Otherwise, alternative forms of tackling non-normality need to be considered, as discussed below. Thirdly, in the case where the standard error of the regression s is relatively large (because very little of the variation in y_t is actually explained), it can dominate the test statistic τ*(y). It will be suggested in Chapter 23 that the acceptance of normality in the case of the money equation above is largely due to this. Fourthly, rejection of normality using the skewness-kurtosis test gives us no information as to the nature of the departures from normality unless it is due to the presence of outliers.
A natural way to extend the skewness-kurtosis test is to include cumulants of order higher than four, which are zero under normality (see Appendix 6.1).
(3) Tackling non-normality
When the normality assumption is invalid there are two possible ways to proceed. One is to postulate a more appropriate distribution for D(y_t/X_t; θ) and respecify the linear regression model accordingly. This option is rarely considered, however, because most of the results in this context are developed under the normality assumption. For this reason the second way to proceed, based on normalising transformations, is by far the most commonly used way to tackle non-normality. This approach amounts to applying a transformation to y_t or/and X_t so as to induce normality. Because of the relationship between normality, linearity and homoskedasticity, these transformations commonly induce linearity and homoskedasticity as well.
One of the most interesting families of transformations in this context is the Box-Cox (1964) transformation. For an arbitrary (positive) random variable Z the Box-Cox transformation takes the form

Z* = (Z^δ − 1)/δ,

with the following important special cases:
(i) δ = −1, Z* = 1 − (1/Z) — reciprocal;
(ii) δ = ½, Z* = 2(√Z − 1) — square root;
(iii) δ = 0, Z* = log_e Z — logarithmic (21.60)
(note: lim_{δ→0} Z* = log_e Z).
The first two cases are not commonly used in econometric modelling because of the difficulties involved in interpreting Z* in the context of an empirical econometric model. Often, however, the square-root transformation might be convenient as a homoskedasticity-inducing transformation. This is because certain economic time series exhibit variances which change with their trending mean (m_t), i.e. Var(Z_t) = m_t σ², t = 1, 2, ..., T. In such cases the square-root transformation can be used as a variance-stabilising one (see Appendix 21.1), since Var(Z*_t) ≈ σ².
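A minimal Python sketch of the Box-Cox transformation, with the δ → 0 limit handled explicitly:

```python
import numpy as np

def box_cox(z, delta):
    z = np.asarray(z, dtype=float)     # requires z > 0
    if abs(delta) < 1e-12:
        return np.log(z)               # the limiting logarithmic case (iii)
    return (z ** delta - 1.0) / delta

z = np.array([0.5, 1.0, 2.0, 4.0])
print(box_cox(z, 0.5))                 # square-root-type case
print(box_cox(z, 0.0))                 # logarithmic case
```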
The logarithmic transformation is of considerable interest in econometric modelling for a variety of reasons. Firstly, for a random variable Z_t whose distribution is closer to the log-normal, gamma or chi-square (i.e. positively skewed), the distribution of log_e Z_t is approximately normal (see Johnson and Kotz (1970)). The log_e transformation induces 'near symmetry' to the original skewed distribution and allows Z*_t to take negative values even though Z_t could not. For economic data which take only positive values this can be a useful transformation to achieve near normality. Secondly, the log_e transformation can be used as a variance-stabilising transformation in the case where the heteroskedasticity takes the form

Var(y_t/X_t = x_t) = σ_t² = (μ_t)²σ², t = 1, 2, ..., T. (21.61)

For y*_t = log_e y_t, Var(y*_t/X_t = x_t) ≈ σ², t = 1, 2, ..., T. Thirdly, the log transformation can be used to define useful economic concepts such as elasticities and growth rates. For example, in the case of the money equation considered above the variables are all in logarithmic form and the estimated coefficients can be interpreted as elasticities (assuming that the estimated equation constitutes a well-defined statistical model; a doubtful assumption). Moreover, the growth rate of Z_t, defined by Ż_t = (Z_t − Z_{t−1})/Z_{t−1}, can be approximated by Δlog_e Z_t = log_e Z_t − log_e Z_{t−1}, because Δlog_e Z_t = log_e(1 + Ż_t) ≈ Ż_t.
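A quick numerical check of this growth-rate approximation, with made-up figures:

```python
import numpy as np

Z = np.array([100.0, 102.0, 105.1, 107.2])
growth = np.diff(Z) / Z[:-1]     # (Z_t - Z_{t-1}) / Z_{t-1}
dlog = np.diff(np.log(Z))        # log Z_t - log Z_{t-1}
print(growth)
print(dlog)                      # close to the growth rates when these are small
```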
In practice the Box-Cox transformation can be used with δ unspecified, letting the data determine its value (see Zarembka (1974)). For the money equation the original variables M_t, Y_t, P_t and I_t were used in the Box-Cox transformed equation, and the data were allowed to determine the value of δ. The estimated value chosen was δ̂ = 0.530 and

β̂₁ = 0.252, β̂₂ = 0.865, β̂₃ = 0.005, β̂₄ = −0.00007.
(0.223) (0.119) (0.0001) (0.00002)
'Does this mean that the original logarithmic transformation is inappropriate?' The answer is, not necessarily. This is because the estimated value of δ depends on the estimated equation being a well-defined statistical GM (no misspecification). In the money equation example there is enough evidence to suggest that various forms of misspecification are indeed present (see also Sections 21.3-21.7 and Chapter 22).
The alternative way to tackle non-normality, by postulating a more appropriate form for the distribution of Z_t, remains largely unexplored. Most of the results in this direction are limited to multivariate distributions closely related to the normal, such as the elliptical family of distributions (see Section 21.3 below). On the question of robust estimation see Amemiya, inter alia.

21.3 Linearity

The linearity assumption E(y_t/X_t = x_t) = β'x_t should be understood as linearity in x_t: the conditional mean might be non-linear in the original variables x*_t but linear in x_t = l(x*_t), where l(·) is a well-behaved transformation such as x_t = log x*_t or x_t = (x*_t)^{1/2}. Moreover, terms such as

c₀ + c₁t + c₂t² + ... + c_n t^n

present no difficulty, since they are linear in the parameters. The forms of non-linearity considered in this section are the ones which cannot be accommodated into a linear conditional mean after transformation, i.e. departures from

E(y_t/X_t = x_t) = β'x_t (21.63)

and

Var(y_t/X_t = x_t) = σ².
It is important to note that, by postulating (63) without assuming normality of D(y_t, X_t; ψ), we limit the class of symmetric distributions to which D(y_t, X_t; ψ) could belong to that of elliptical distributions, denoted by EL(μ, Σ) (see Kelker (1970)). These distributions provide an extension of the multivariate normal distribution which preserves its bell-like shape and symmetry. Assuming that Z_t = (y_t, X_t')' ~ EL(μ, Σ), it can be shown that the conditional mean E(y_t/X_t = x_t) is linear in x_t, while the conditional variance is in general a function of x_t. This shows that the assumption of linearity is not as sensitive to some departures from normality as the homoskedasticity assumption. Indeed, homoskedasticity of the conditional variance characterises the normal distribution within the class of elliptical distributions (see Chmielewski (1981)).
(1) Implications of non-linearity
Let us consider the implications of non-linearity for the results of Chapter 19 related to the estimation, testing and prediction in the context of the linear regression model. In particular, what are the implications of assuming that D(Z_t; ψ) is not normal, and of estimating the postulated linear GM

y_t = β'x_t + u_t (21.69)

when the true statistical GM is

y_t = μ_t + ε_t, (21.70)

where μ_t = E(y_t/X_t = x_t) = h(x_t) and ε_t = y_t − E(y_t/X_t = x_t)? Comparing (69) and (70) we can see that the error term in the former is no longer white noise but u_t = y_t − β'x_t = h(x_t) − β'x_t + ε_t = g(x_t) + ε_t. Moreover, E(u_t/X_t = x_t) = g(x_t), E(u_t u_s) ≠ 0, and β̂ and s² are inconsistent estimators of β and σ².
As we can see, the consequences of non-linearity are quite serious as far as the properties of β̂ and s² are concerned, being biased and inconsistent estimators of β and σ² in general. What is more, the testing and prediction results derived in Chapter 19 are generally invalid in the case of non-linearity. In view of this the question arises as to what it is we are estimating when the linear GM is fitted to a non-linear conditional mean. Under certain regularity conditions we can show that β̂ → β* and s² → σ²(β*) (see White (1980)), where β* provides the best linear least-squares approximation to h(x_t).
(2) Testing for non-linearity
In view of the serious implications of non-linearity for the results of Chapter 19, it is important to be able to test for departures from the linearity assumption. In particular we need to construct tests for

H₀: E(y_t/X_t = x_t) = β'x_t

against

H₁: E(y_t/X_t = x_t) = h(x_t) ≠ β'x_t.

This, however, raises the question of postulating a particular functional form for h(x_t), which is not available unless we are prepared to assume a particular form for D(Z_t; ψ). Alternatively, we could use the parametrisation related to the Kolmogorov-Gabor and systematic component polynomials introduced in Section 21.1.
Using, say, a third-order Kolmogorov-Gabor polynomial (KG(3)) we can postulate the alternative statistical GM:

y_t = γ₁'x_t + γ₂'ψ_{2t} + γ₃'ψ_{3t} + ε_t, (21.77)

where ψ_{2t} and ψ_{3t} refer to the second- and third-order cross-products of the elements of x_t. Note that x_{1t} is assumed to be the constant. Assuming that T is large enough to enable us to estimate (77), we can test linearity in the form of:

H₀: γ₂ = 0 and γ₃ = 0, H₁: γ₂ ≠ 0 or γ₃ ≠ 0,
using the usual F-type test (see Section 21.1). An asymptotically equivalent test can be based on the R² of the auxiliary regression:

û_t = (β₀ − β)'x_t + γ₂'ψ_{2t} + γ₃'ψ_{3t} + ε_t, (21.80)

using the Lagrange multiplier test statistic

LM(y) = TR² = T((RRSS − URSS)/RRSS) ~ χ²(q),

q being the number of restrictions (see Engle (1984)). Its rejection region is

C₁ = {y: LM(y) > c_α}, ∫_{c_α}^∞ dχ²(q) = α.

For small T the F-type test is preferable in practice because of the degrees of freedom adjustment; see Section 19.5.
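A minimal sketch of this LM linearity test based on a KG(3) auxiliary regression; the helper names are choices of this illustration, and collinear terms are not pruned, as they would be in careful applied work:

```python
import numpy as np
from itertools import combinations_with_replacement

def kg_cross_products(X, order=3):
    """Second- up to `order`-order cross-products of the columns of X."""
    cols = []
    for r in range(2, order + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), r):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

def lm_linearity_test(y, X):
    """T R^2 from the auxiliary regression (80); X[:, 0] is the constant."""
    T = X.shape[0]
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ b
    W = np.hstack([X, kg_cross_products(X[:, 1:])])
    g, *_ = np.linalg.lstsq(W, u_hat, rcond=None)
    v = u_hat - W @ g
    R2 = 1.0 - (v @ v) / (u_hat @ u_hat)
    return T * R2     # compare with chi-square(q), q = number of added terms
```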
Using the polynomial in μ_t we can postulate the alternative GM of the form:

y_t = β'x_t + c₂μ_t² + c₃μ_t³ + ... + c_mμ_t^m + v_t, (21.82)

where μ_t = β'x_t. A direct comparison between (75) and (82) gives rise to a RESET-type test (see Ramsey (1974)) for linearity based on H₀: c₂ = c₃ = ... = c_m = 0, H₁: c_i ≠ 0, i = 2, ..., m. Again this can be tested using the F-type test or the LM test, both based on the auxiliary regression:

û_t = (β₀ − β)'x_t + Σ_{i=2}^m c_i μ̂_t^i + v_t, μ̂_t = b'x_t. (21.83)

Let us apply these tests to the money equation estimated in Section 19.4.
The F-test based on (77) with terms up to third order (but excluding certain terms because of collinearity) yielded:

FT(y) = ((0.117520 − 0.045477)/0.045477)·(67/9) = 11.79.

Given that c_α = 2.02, the null hypothesis of linearity is strongly rejected.
Similarly, the RESET-type test based on (82) with m = 4 (excluding μ̂_t² because of collinearity) yielded:

FT(y) = ((0.117520 − 0.06028)/0.06028)·(74/2) = 35.13.

Again, with c_α = 3.12, linearity is strongly rejected.
It is important to note that although the RESET-type test is based on a more restrictive form of the alternative (compare (77) with (82)), it might be the only test available in the case where the degrees of freedom are at a premium (see Chapter 23).
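A minimal sketch of a RESET-type F-test along the lines of (82)-(83); the function name is a choice of this illustration and no collinear powers are dropped:

```python
import numpy as np

def reset_test(y, X, m=4):
    """F-type RESET test: add powers 2..m of the fitted values."""
    T, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    mu_hat = X @ b
    powers = np.column_stack([mu_hat ** i for i in range(2, m + 1)])
    W = np.hstack([X, powers])
    g, *_ = np.linalg.lstsq(W, y, rcond=None)
    q = powers.shape[1]
    RRSS = np.sum((y - mu_hat) ** 2)
    URSS = np.sum((y - W @ g) ** 2)
    return ((RRSS - URSS) / URSS) * ((T - k - q) / q)   # F(q, T - k - q) under H0
```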
(3) Tackling non-linearity
As argued in Section 21.1, the results of the various misspecification tests should be considered simultaneously because the assumptions are closely interrelated. For example, in the case of the estimated money equation it is highly likely that the linearity assumption was rejected because the independent sample assumption [8] is invalid. In cases, however, where the source of the departure is indeed the normality assumption (leading to non-linearity) we need to consider the question of how to proceed by relaxing the normality of {Z_t, t ∈ T}. One way to proceed is to postulate a general distribution D(y_t, X_t; ψ) and derive the specific form of the conditional expectation E(y_t/X_t = x_t) = h(x_t). Choosing the form of D(y_t, X_t; ψ) will determine both the form of the conditional expectation as well as the conditional variance (see Chapter 7).
An alternative way to proceed is to use some normalising transformation on the original variables y_t and X_t so as to ensure that the transformed variables y*_t and X*_t are indeed jointly normal, and hence E(y*_t/X*_t = x*_t) is linear in x*_t (see Box and Tidwell (1962)).
In practice non-linear regression models are used in conjunction with the normality of the conditional distribution (see Judge et al. (1985), inter alia). The question which naturally arises is, 'how can we reconcile the non-linearity of the conditional expectation and the normality of D(y_t/X_t; θ)?' As mentioned in Section 19.2, the linearity of μ_t = E(y_t/X_t = x_t) is a direct consequence of the normality of the joint distribution D(y_t, X_t; ψ). One way the non-linearity of E(y_t/X_t = x_t) and the normality of D(y_t/X_t; θ) can be reconciled is to argue that the conditional distribution is normal in the transformed variables X*_t = h(X_t), i.e. D(y_t/X*_t = x*_t) is normal with a mean linear in x*_t but non-linear in x_t. Moreover, the parameters of interest are not the linear regression parameters θ = (β, σ²) but φ = (γ, σ²). It must be emphasised that non-linearity in the present context refers to both non-linearity in parameters (γ) and variables (X_t).
Non-linear regression models based on the statistical GM

y_t = g(x_t; γ) + ε_t

can be estimated by least-squares based on the minimisation of

Σ_{t=1}^T (y_t − g(x_t; γ))².

This minimisation will give rise to certain non-linear normal equations which can be solved numerically (see Harvey (1981), Judge et al. (1985), Malinvaud (1970), inter alia) to provide least-squares estimators for γ, an m × 1 vector. σ² can then be estimated by

σ̂² = (1/T) Σ_{t=1}^T (y_t − g(x_t; γ̂))².
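A minimal sketch of this numerical minimisation using scipy.optimize.least_squares; the functional form g(x; γ), the data and the starting values are hypothetical choices of this illustration:

```python
import numpy as np
from scipy.optimize import least_squares

def g(x, gamma):
    # hypothetical non-linear systematic component
    return gamma[0] + gamma[1] * x ** gamma[2]

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 5.0, 200)
y = g(x, [1.0, 2.0, 0.7]) + 0.1 * rng.standard_normal(200)

fit = least_squares(lambda gamma: y - g(x, gamma), x0=np.array([0.5, 1.0, 1.0]))
gamma_hat = fit.x                                  # numerical least-squares estimates
sigma2_hat = np.mean((y - g(x, gamma_hat)) ** 2)   # estimator of sigma^2
```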
21.4 Homoskedasticity
The assumption that Var(y_t/X_t = x_t) = σ² is free of x_t is a consequence of the assumption that D(y_t, X_t; ψ) is multivariate normal. As argued above, the assumption of homoskedasticity is inextricably related to the assumption of normality, and we cannot retain one and reject the other uncritically. Indeed, as mentioned above, homoskedasticity of Var(y_t/X_t = x_t) characterises the normal distribution within the elliptical class. For argument's sake, let us assume that the probability model is in fact based on D(β'x_t, σ_t²), where D(·) is some unknown distribution and σ_t² = h(x_t).

(1) Implications of heteroskedasticity
As far as the estimators β̂ and s² are concerned we can show that:
(i) E(β̂) = β, i.e. β̂ is an unbiased estimator of β;
(ii) β̂ → β, i.e. β̂ is a consistent estimator of β.
These results suggest that β̂ = (X'X)⁻¹X'y retains some desirable properties such as unbiasedness and consistency, although it might be inefficient. β̂ is usually compared with the so-called generalised least-squares (GLS) estimator of β, denoted β̃, derived by minimising

l(β) = (y − Xβ)'Ω⁻¹(y − Xβ), Ω = σ²Λ,

i.e.

∂l(β)/∂β = 0 ⟹ β̃ = (X'Ω⁻¹X)⁻¹X'Ω⁻¹y = (X'Λ⁻¹X)⁻¹X'Λ⁻¹y,

which can be shown to be relatively more efficient than β̂. This relative efficiency, however, rests on the presupposition that Λ is known a priori, and thus the above efficiency comparison is largely irrelevant. It should surprise nobody to 'discover' that by supplementing the statistical model with additional information we can get a more efficient estimator. Moreover, when Λ is known there is no need for GLS, because we can transform the original variables in order to return to a homoskedastic conditional variance of the form

Var(y*_t/X*_t = x*_t) = σ².
This can be achieved by transforming y and X into

y* = Hy and X* = HX, where H'H = Λ⁻¹. (21.98)

In terms of the transformed variables the statistical GM takes the form

y* = X*β + u*, u* = Hu, (21.99)

and the linear regression assumptions are valid for y* and X*. Indeed, it can be verified that

β* = (X*'X*)⁻¹X*'y* = (X'Λ⁻¹X)⁻¹X'Λ⁻¹y = β̃. (21.100)

Hence, the GLS estimator is rather unnecessary in the case where Λ is known a priori.
The question which naturally arises at this stage is, 'what happens when Ω is unknown?' The conventional wisdom has been that, since Ω involves T unknown incidental parameters whose number increases with the sample size, it is clearly out of the question to estimate T + k parameters from T observations. Moreover, although β̂ = (X'X)⁻¹X'y is both unbiased and consistent, s²(X'X)⁻¹ is an inconsistent estimator of Cov(β̂) = (X'X)⁻¹X'ΩX(X'X)⁻¹, and the difference

s²(X'X)⁻¹ − (X'X)⁻¹X'ΩX(X'X)⁻¹

can be positive or negative. Hence, no inference on β, based on β̂, is possible, since for a consistent estimator of Cov(β̂) we need to know Ω (or estimate it consistently). So, the only way to proceed is to model σ_t² so as to 'solve' the incidental parameters problem.
Although there is an element of truth in the above viewpoint, White (1980) pointed out that for consistent inference based on β̂ we do not need to estimate Ω by itself but rather (X'ΩX), and the two problems are not equivalent. The natural estimator of σ_t² is û_t² = (y_t − β̂'x_t)², which is clearly unsatisfactory because it is based on only one observation and no further information accrues by increasing the sample size. On the other hand, there is a perfectly acceptable estimator for (1/T)(X'ΩX) coming in the form of

(1/T) Σ_{t=1}^T û_t² x_t x_t'.
The most important implication of this is that consistent inference, such as the F-test, is asymptotically justifiable, although the loss in efficiency should be kept in mind. In particular, a test for heteroskedasticity could be based on the difference

(1/T) Σ_{t=1}^T û_t² x_t x_t' − s²(X'X/T),

which should be 'small' under homoskedasticity.
Before we consider this test it is important to summarise the argument so far.
Under the assumption that the probability model is based on the distribution D(β'x_t, σ_t²), although no consistent estimator of Ω = diag(σ₁², ..., σ_T²) is possible, β̂ = (X'X)⁻¹X'y is both unbiased and consistent (under certain conditions), and a consistent estimator of Cov(β̂) is available in the form of

Ŵ_T = (X'X)⁻¹(Σ_{t=1}^T û_t² x_t x_t')(X'X)⁻¹.

This enables us to use β̂ for hypothesis testing related to β. The argument of 'modelling' σ_t² will be taken up after we consider the question of testing for departures from homoskedasticity.
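A minimal sketch of this heteroskedasticity-consistent covariance estimator (the function name is a choice of this illustration):

```python
import numpy as np

def white_cov(y, X):
    """Heteroskedasticity-consistent estimator of Cov(beta_hat)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ b
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (u_hat ** 2)[:, None])   # sum_t u_t^2 x_t x_t'
    return XtX_inv @ meat @ XtX_inv
```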
(2) Testing departures from homoskedasticity
White (1980), after proposing a consistent estimator for X’QOX, went on to
Trang 24use the difference (equivalent to (105)):
the natural estimator of (107) Given that (108) is symmetric we can express
the 4k(k — 1) different elements in the form
suggest an asymptotically equivalent test based on the R? of the auxiliary
Example
For the money equation estimated above, the estimated auxiliary equation of the form

û_t² = c₀ + γ'ψ_t + v_t

yielded R² = 0.190, FT(y) = 2.8 and TR² = 15.2. In view of the fact that F(6, 73) = 2.73 and χ²(6) = 12.6 for α = 0.05, the null hypothesis of homoskedasticity is rejected by both tests.
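A minimal sketch of the TR² version of the White test as used in the example; the function name is a choice of this illustration and collinear cross-products are not pruned:

```python
import numpy as np
from itertools import combinations_with_replacement

def white_test(y, X):
    """T R^2 from regressing u_hat^2 on the cross-products psi_t; X[:, 0] is the constant."""
    T = X.shape[0]
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ b) ** 2
    Z = X[:, 1:]                                   # non-constant regressors
    psi = np.column_stack([Z[:, i] * Z[:, j] for i, j in
                           combinations_with_replacement(range(Z.shape[1]), 2)])
    W = np.column_stack([np.ones(T), psi])
    g, *_ = np.linalg.lstsq(W, u2, rcond=None)
    v = u2 - W @ g
    R2 = 1.0 - (v @ v) / np.sum((u2 - u2.mean()) ** 2)
    return T * R2                                  # compare with chi-square(psi.shape[1])
```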
The most important feature of the above White heteroskedasticity test is that 'apparently' no particular form of heteroskedasticity is postulated. In subsection (3) below, however, it is demonstrated that the White test is an exact test in the case where D(Z_t; ψ) is assumed to be multivariate t. In this case the conditional mean is μ_t = β'x_t, but the conditional variance takes the form of a quadratic function of x_t. Using the 'omitted variables' argument for u_t² = E(u_t²/X_t = x_t) + v_t, we can derive the above auxiliary regression (see Spanos (1985b)). This suggests that although the test is likely to have positive power for various forms of heteroskedasticity, it will have highest power for alternatives in the multivariate t direction, that is, multivariate distributions for D(Z_t; ψ) which are symmetric but have heavier tails than the normal.
In practice it is advisable to use the White test in conjunction with other tests based on particular forms of heteroskedasticity, in particular tests which allow first- and higher-order terms to enter the auxiliary regression, such as the Breusch-Pagan test (see (128) below).
Important examples of heteroskedasticity considered in the econometric literature (see Judge et al. (1985), Harvey (1981)) are: