
THE LINEAR REGRESSION MODEL III



(iii) Var(y_t/X_t = x_t) = σ², homoskedastic;
(iv) θ = (β, σ²) are time-invariant.

In each of the Sections 2-5 the above assumptions will be relaxed one at a time, retaining the others, and the following interrelated questions will be discussed:

(a) what are the implications of the departures considered?
(b) how do we detect such departures? and
(c) how do we proceed if departures are detected?

It is important to note at the outset that the following discussion, which considers individual assumptions being relaxed separately, limits the scope of misspecification analysis because it is rather rare to encounter such conditions in practice. More often than not various assumptions are invalid simultaneously. This is considered in more detail in Section 7. Section 6 discusses the problem of structural change, which constitutes a particularly important form of departure from [7].

21.1 Misspecification testing and auxiliary regressions

Misspecification testing refers to the testing of the assumptions underlying a statistical model. In its context the null hypothesis is uniquely defined as the assumption(s) in question being valid. The alternative takes a particular form of departure from the null which is invariably non-unique. This is



because departures from a given assumption can take numerous forms, with the specified alternative being only one such form. Moreover, most misspecification tests are based on the questionable presupposition that the other assumptions of the model are valid. This is because joint misspecification testing is considerably more involved. For these reasons the choice in a misspecification test is between rejecting and not rejecting the null; accepting the alternative should be excluded at this stage.

An important implication for the question of how to proceed if the null is rejected is that before any action is taken the results of the other misspecification tests should also be considered. It is often the case that a particular form of departure from one assumption might also affect other assumptions. For example, when the assumption of sample independence [8] is invalid the other misspecification tests are influenced (see Chapter 22).

In general the way to proceed when any of the assumptions [6]-[8] are invalid is first to narrow down the source of the departures by relating them back to the NIID assumption of {Z_t, t ∈ T}, and then respecify the model taking into account the departure from NIID. The respecification of the model involves a reconsideration of the reduction from D(Z₁, Z₂, ..., Z_T; ψ) to D(y_t/X_t; θ) so as to account for the departures from the assumptions involved. As argued in Chapters 19-20, the very first step of this reduction rests on the independent sample assumption [8], which suggests that it is natural to test assumption [8] first and then proceed with the other assumptions if [8] is not rejected. The sequence of misspecification tests considered in what follows is chosen only for expositional purposes.

With the above discussion in mind let us consider the question of general procedures for the derivation of misspecification tests. In cases where the alternative in a misspecification test is given a specific parametric form the various procedures encountered in specification testing (F-type tests, Wald,



Lagrange multiplier and likelihood ratio) can be easily adapted to apply in the present context. In addition to these procedures several specific misspecification test procedures have been proposed in the literature (see White (1982), Bierens (1982), inter alia). Of particular interest in the present book are the procedures based on the 'omitted variables' argument which lead to auxiliary regressions (see Ramsey (1969), (1974), Pagan and Hall (1983), Pagan (1984), inter alia). This particular procedure is given a prominent role in what follows because it is easy to implement in practice and it provides a common-sense interpretation of most other misspecification tests.

The 'omitted variables' argument was criticised in Section 20.2 because it was based on the comparison of two 'non-comparable' statistical GM's. This was because the information sets underlying the latter were different. It was argued, however, that the argument could be reformulated by postulating the same sample information sets. In particular, if both parametrisations can be derived from D(Z₁, Z₂, ..., Z_T; ψ) by using alternative reduction arguments then the two statistical GM's can be made comparable.

Let {Z_t, t ∈ T} be a vector stochastic process defined on the probability space (S, F, P(·)) which includes the stochastic variables of interest. In Chapter 17 it was argued that for a given D_t ⊂ F,

y_t = E(y_t/D_t) + u_t (21.3)

defines a general statistical GM with

μ_t = E(y_t/D_t), u_t = y_t − E(y_t/D_t), (21.4)

satisfying some desirable properties by construction, including the orthogonality condition:

E(μ_t u_t) = 0. (21.5)

It is important to note, however, that (3)-(4) as defined above are just 'empty boxes'. These are filled when {Z_t, t ∈ T} is given a specific probabilistic structure such as NIID. In the latter case (3)-(4) take the specific forms:

μ_t = β′x_t, u_t = y_t − β′x_t, i.e. y_t = β′x_t + u_t, (21.6)

with the conditioning information set being D_t = {X_t = x_t}.

When any of the assumptions in NIID are invalid, however, the various properties of μ_t and u_t no longer hold for μ_t* and u_t*. In particular the


orthogonality condition (5) is invalid. The non-orthogonality

E(μ_t* u_t*) ≠ 0 (21.9)

can be used to derive various misspecification tests. If we specify the alternative in a parametric form which includes the null as a special case, (9) could be used to derive misspecification tests based on certain auxiliary regressions.

In order to illustrate this procedure let us consider two important parametric forms which can provide the basis of several misspecification tests: a polynomial in the systematic component μ_t, and g(x_t), known as the Kolmogorov-Gabor polynomial (see Ivakhnenko (1984)). Both of these polynomials can be used to specify a general parametric form for the alternative systematic component:

μ_t* = β₀′x_t + γ′z_t*,

where z_t* represents known functions of the variables Z_{t−1}, ..., Z₁, X_t. This gives rise to the alternative statistical GM

y_t = β₀′x_t + γ′z_t* + ε_t, (21.13)

which includes (6) as a special case under

H₀: γ = 0. (21.14)

A direct comparison between (13) and (6) gives rise to the auxiliary regression

u_t = (β₀ − β)′x_t + γ′z_t* + ε_t, (21.15)

whose operational form

û_t = (β₀ − β)′x_t + γ′z_t* + ε_t, (21.16)

with û_t the OLS residuals from (6), can be used to test (14) directly. The most obvious test is the F-type test discussed in Sections 19.5 and 20.3. The F-test will take the general form

τ(y) = ((RRSS − URSS)/URSS) × ((T − k*)/m), approximately F(m, T − k*) under H₀,



where RRSS and URSS refer to the residual sums of squares from (6) and (16) (or (13)), respectively; k* being the number of parameters in (13) and m the number of restrictions.
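As an illustration of this auxiliary-regression F-test, here is a minimal simulated sketch in Python; the data-generating process, the choice of powers of x_t as the z_t* terms and all names are invented for the example:

import numpy as np

rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=T)
X0 = np.column_stack([np.ones(T), x])   # regressors of the null GM (6)
y = 1.0 + 2.0 * x + rng.normal(size=T)

# Alternative GM (13): augment with z*_t terms (here powers of x_t,
# a Kolmogorov-Gabor-style choice).
Z = np.column_stack([X0, x**2, x**3])

def rss(A, y):
    # residual sum of squares from an OLS fit of y on A
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ c
    return e @ e

RRSS = rss(X0, y)                 # restricted: the null specification (6)
URSS = rss(Z, y)                  # unrestricted: the alternative (13)
m = Z.shape[1] - X0.shape[1]      # number of restrictions
k_star = Z.shape[1]               # number of parameters in (13)

F = ((RRSS - URSS) / m) / (URSS / (T - k_star))
print(F)   # compared with an F(m, T - k*) critical value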

This procedure could be easily extended to the higher central moments of the error term u_t.

21.2 Normality

Let us take the probability model to be based on

D(y_t/X_t; θ) = D(β′x_t, σ²), (21.21)

where D(·) is an unknown distribution, and discuss the problem of testing whether D(·) is in fact normal or not.

(1) Consequences of non-normality

Let us consider the effect of the non-normality assumption in (21) on the specification, estimation and testing in the context of the linear regression model discussed in Chapter 19.

As far as specification (see Section 19.2) is concerned only marginal changes are needed. After removing assumption [6](i) the other assumptions can be reinterpreted in terms of D(β′x_t, σ²). This suggests that relaxing normality but retaining linearity and homoskedasticity might not constitute a major break from the linear regression framework.


The first casualty of (21) as far as estimation (see Section 19.4) is concerned is the method of maximum likelihood itself, which cannot be used unless the form of D(·) is known. We could, however, use the least-squares method of estimation briefly discussed in Section 13.1, where the form of the underlying distribution is 'apparently' not needed.

Least-squares is an alternative method of estimation which is historically much older than the maximum likelihood or the method of moments. The least-squares method estimates the unknown parameters θ by minimising the squares of the distance between the observable random variables y_t, t ∈ T, and h_t(θ) (a function of θ purporting to approximate the mechanism giving rise to the observed values y_t), weighted by a precision factor 1/κ_t which is assumed known, i.e.

min_{θ∈Θ} Σ_{t=1}^T (1/κ_t)(y_t − h_t(θ))².

It is interesting to note that this method was first suggested by Gauss in 1794 as an alternative to maximising what we, nowadays, call the log-likelihood function under the normality assumption (see Section 13.1 for more details). In an attempt to motivate the least-squares method he argued that:

the most probable value of the desired parameters will be that in which the sum of the squares of differences between the actually observed and computed values multiplied by numbers that measure the degree of precision, is a minimum

This clearly shows a direct relationship between the normality assumption and the least-squares method of estimation. It can be argued, however, that the least-squares method can be applied to estimation problems without assuming normality. In relation to such an argument Pearson (1920) warned that:

we can only assert that the least-squares methods are theoretically accurate on the assumption that our observations obey the normal law. Hence in disregarding normal distributions and claiming great generality by merely using the principle of least-squares the apparent generalisation has been gained merely at the expense of theoretical validity.

Despite this forceful argument let us consider the estimation of the linear regression model without assuming normality, but retaining linearity and homoskedasticity as in (21). The least-squares method suggests minimising

l(β) = Σ_{t=1}^T (y_t − β′x_t)²,

which yields b = (X′X)⁻¹X′y, the OLS estimator of β.


Finite sample properties of b and s²

Although b is identical to β̂ (the MLE of β) the similarity does not extend to the properties unless D(y_t/X_t; θ) is normal.

(a) Since b = Ly, where L = (X′X)⁻¹X′, the OLS estimator is linear in y.

Using the properties of the expectation operator E(·) we can deduce:

(b) E(b) = E(β + Lu) = β + LE(u) = β, i.e. b is an unbiased estimator of β.

(c) E[(b − β)(b − β)′] = E(Luu′L′) = σ²LL′ = σ²(X′X)⁻¹.

Given that we have the mean and variance of b but not its distribution, what other properties can we deduce? Clearly, we cannot say anything about sufficiency or full efficiency without knowing D(y_t/X_t; θ), but hopefully we could discuss relative efficiency within the class of estimators satisfying (a) and (b). The Gauss-Markov theorem provides us with such a result.

Gauss-Markov theorem

Under the assumption (21), b, the OLS estimator of β, has minimum variance among the class of linear and unbiased estimators (for a proof see Judge et al. (1982)).

As far as s² is concerned, we can show that

(d) E(s²) = σ², i.e. s² is an unbiased estimator of σ²,

using only the properties of the expectation operator relative to D(β′x_t, σ²).
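A small simulation can illustrate properties (b) and (d) without normality: with deliberately skewed chi-square errors, the averages of b and s² across replications stay close to β and σ². The design below is invented purely for illustration:

import numpy as np

rng = np.random.default_rng(1)
T, reps = 100, 5000
X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta = np.array([1.0, 0.5])
sigma2 = 4.0    # a centred chi-square(2) error has variance 2*2 = 4

b_draws, s2_draws = [], []
for _ in range(reps):
    u = rng.chisquare(2, size=T) - 2.0     # non-normal, mean 0, variance 4
    y = X @ beta + u
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    b_draws.append(b)
    s2_draws.append(e @ e / (T - X.shape[1]))   # s^2 with df correction

print(np.mean(b_draws, axis=0))   # close to beta: unbiasedness (b)
print(np.mean(s2_draws))          # close to sigma2: property (d)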

In order to test any hypotheses or set up confidence intervals for


θ = (β, σ²) we need the distribution of the OLS estimators b and s². Thus, unless we specify the form of D(β′x_t, σ²), no test and/or confidence interval statistics can be derived. The question which naturally arises is to what extent 'asymptotic theory' can at least provide us with large sample results.

Asymptotic distribution of b and s²

Lemma 21.1: √T(b − β) ~ N(0, σ²Q_x⁻¹) asymptotically, where Q_x = lim_{T→∞}(X′X/T) is assumed to be finite and non-singular.

Lemma 21.2: √T(s² − σ²) ~ N(0, μ₄ − σ⁴) asymptotically,

where μ₄ refers to the fourth central moment of D(y_t/X_t; θ), assumed to be finite (see Schmidt (1976)). Note that in the case where D(y_t/X_t; θ) is normal, μ₄ = 3σ⁴ and the asymptotic variance reduces to 2σ⁴.

From the above lemmas we can see that although the asymptotic distribution of b coincides with the asymptotic distribution of the MLE, this is not the case with s². The asymptotic distribution of b does not depend on



D(y_t/X_t; θ) but that of s² does via μ₄. The question which naturally arises is to what extent the various results related to tests about θ = (β, σ²) (see Section 19.5) are at least asymptotically justifiable. Let us consider the F-test for H₀: Rβ = r against H₁: Rβ ≠ r. From lemma 21.1 we can deduce that under H₀, √T(Rb − r) ~ N(0, σ²(RQ_x⁻¹R′)) asymptotically, which implies that τ(y) is asymptotically chi-square distributed under H₀, and thus the F-test is robust with respect to the non-normality assumption (21) above. Although the asymptotic distribution of τ(y) is chi-square, in practice the F-distribution provides a better approximation for a small T (see Section 19.5). This is particularly true when D(β′x_t, σ²) has heavy tails. The significance t-test, being a special case of the F-test, is also robust. The same cannot be said, however, about tests based on s², whose asymptotic distribution depends on μ₄;

the size α and power of these tests can be very different from the ones based on the postulated value of α. This can seriously affect all tests which depend on the distribution of s², such as some heteroskedasticity and structural change tests (see Sections 21.4-21.6 below). In order to get non-normality robust tests in such cases we need to modify them to take account of μ₄.

(2) Testing for departures from normality

Tests for normality can be divided into parametric and non-parametric tests, depending on whether the alternative is given a parametric form or not.


(a) Non-parametric tests

The Kolmogorov-Smirnov test

Based on the assumption that {u_t/X_t, t ∈ T} is an IID process we can use the results of Appendix 11.1 to construct a test with rejection region

C₁ = {y: D_T > c_α}, where D_T = sup_z |F̂_T(z) − Φ(z)|,

F̂_T(·) being the empirical cumulative distribution function of the standardised residuals and Φ(·) the standard normal CDF.
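A quick sketch of such a test using scipy follows; the residual series is simulated, and note that when the mean and variance are estimated from the same data the tabulated Kolmogorov-Smirnov critical values are only approximate:

import numpy as np
from scipy import stats

u_hat = np.random.default_rng(2).normal(size=80)   # stand-in for regression residuals

z = (u_hat - u_hat.mean()) / u_hat.std(ddof=1)     # standardised residuals
D, pval = stats.kstest(z, "norm")                  # D_T and its p-value
print(D, pval)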

The Shapiro-Wilk test

This test is based on the ratio of two different estimators of the variance σ²,

W = (Σ_{i=1}^T a_{iT} z_(i))² / Σ_{i=1}^T (z_i − z̄)²,

where z_(i) are the ordered residuals and a_{iT} is a weight coefficient tabulated by Shapiro and Wilk (1965) for sample sizes 2 ≤ T ≤ 50. The rejection region takes the form:

C₁ = {y: W < c_α},

where c_α is tabulated in the above paper.
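scipy implements this test (with extensions beyond the original T ≤ 50 tables); a minimal sketch on simulated residuals:

import numpy as np
from scipy import stats

u_hat = np.random.default_rng(3).normal(size=50)   # residuals, T = 50
W, pval = stats.shapiro(u_hat)                     # small W is evidence against normality
print(W, pval)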

(b) Parametric tests

The skewness-kurtosis test

The most widely used parametric test for normality is the skewness-kurtosis test. The parametric alternative in this test comes in the form of the Pearson family of densities. The Pearson family of distributions is based on the differential equation

d ln f(z)/dz = (z − a)/(c₀ + c₁z + c₂z²),


where the solution for different values of (a, c₀, c₁, c₂) generates a large number of interesting distributions such as the gamma, beta and Student's t. It can be shown that knowledge of σ², α₃ and α₄ can be used to determine the distribution of Z within the Pearson family. In particular:

(a) c₂ = 0, c₁ ≠ 0. This gives rise to gamma-type distributions, with the chi-square an important member of this class of distributions. For Z ~ χ²(m), α₃ = √(8/m) and α₄ = 3 + 12/m.

(b) c₁ = 0, c₀ > 0, c₂ > 0. An important member of this class of distributions is the Student's t. For Z ~ t(m), α₃ = 0, α₄ = 3 + 6/(m − 4), (m > 4).

(c) c₀ < 0 < c₂. This gives rise to beta-type distributions which are directly related to the chi-square and F-distributions. In particular, if Z_i ~ χ²(m_i), i = 1, 2, and Z₁, Z₂ are independent, then

Z = (Z₁/m₁)/(Z₂/m₂) ~ F(m₁, m₂).

It is interesting to note that (48) also characterises normality within the 'short' (first four moments) Gram-Charlier expansion:

φ*(z) = [1 + (1/6)α₃(z³ − 3z) + (1/24)(α₄ − 3)(z⁴ − 6z² + 3)]φ(z), (21.49)

where φ(z) is the standard normal density (see Section 10.6).

Bera and Jarque (1982), using the Pearson family as the parametric alternative, derived the following skewness-kurtosis test as a Lagrange multiplier test:

τ*(y) = T[α̂₃²/6 + (α̂₄ − 3)²/24], asymptotically χ²(2) under H₀, (21.50)

where α̂₃ and α̂₄ are the sample skewness and kurtosis coefficients of the residuals.


The rejection region is defined by

C₁ = {y: τ*(y) > c_α}, ∫_{c_α}^∞ dχ²(2) = α. (21.53)

With α̂₃ and α̂₄ being asymptotically independent (see Kendall and Stuart (1969)) we can add the squares of their standardised forms to derive (50); see Section 6.3.

Let us consider the skewness-kurtosis test for the money equation

m_t = 2.896 + 0.690y_t + 0.865p_t − 0.055i_t + û_t, (21.56)
     (1.034) (0.105)  (0.020)  (0.013)  (0.039)

R² = 0.995, R̄² = 0.995, s = 0.0393, log L = 147.4, T = 80, α̂₃² = 0.005, (α̂₄ − 3)² = 0.145.

Thus, τ*(y) = 0.55 and since c_α = 5.99 for α = 0.05 we can deduce that, under the assumption that the other assumptions underlying the linear regression model are valid, the null hypothesis H₀: α₃ = 0 and α₄ = 3 is not rejected.
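The reported statistic can be reproduced directly from the quoted quantities; a short check, assuming (50) has the form given above:

T = 80
skew_sq = 0.005          # estimated skewness squared
excess_kurt_sq = 0.145   # (kurtosis - 3) squared

tau = T * (skew_sq / 6 + excess_kurt_sq / 24)
print(round(tau, 2))     # 0.55, below the chi-square(2) 5% critical value 5.99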

There are several things to note about the above skewness-kurtosis test. Firstly, it is an asymptotic test and caution should be exercised when the sample size T is small. For higher-order approximations of the finite sample distribution of α̂₃ and α̂₄ see Pearson, D'Agostino and Bowman (1977), Bowman and Shenton (1975), inter alia. Secondly, the test is sensitive to 'outliers' ('unusually large' deviations). This can be both a blessing and a



hindrance. The first reaction of a practitioner whose residuals fail this normality test is to look for such outliers. When the apparent non-normality can be explained by the presence of these outliers, the problem can be solved when the presence of the outliers can itself be explained. Otherwise, alternative forms of tackling non-normality need to be considered, as discussed below. Thirdly, in the case where the standard error of the regression s is relatively large (because very little of the variation in y_t is actually explained), it can dominate the test statistic τ*(y). It will be suggested in Chapter 23 that the acceptance of normality in the case of the money equation above is largely due to this. Fourthly, rejection of normality using the skewness-kurtosis test gives us no information as to the nature of the departures from normality unless it is due to the presence of outliers.

A natural way to extend the skewness-kurtosis test is to include cumulants of order higher than four which are zero under normality (see Appendix 6.1).

(3) Tackling non-normality

When the normality assumption is invalid there are two possible ways to proceed. One is to postulate a more appropriate distribution for D(y_t/X_t; θ) and respecify the linear regression model accordingly. This option is rarely considered, however, because most of the results in this context are developed under the normality assumption. For this reason the second way to proceed, based on normalising transformations, is by far the most commonly used way to tackle non-normality. This approach amounts to applying a transformation to y_t and/or X_t so as to induce normality. Because of the relationship between normality, linearity and homoskedasticity these transformations commonly induce linearity and homoskedasticity as well.

One of the most interesting families of transformations in this context is the Box-Cox (1964) transformation. For an arbitrary positive random variable Z the Box-Cox transformation takes the form

Z* = (Z^δ − 1)/δ, δ ≠ 0.

The main special cases are:

(i) δ = 1: Z* = Z − 1 (linear);
(ii) δ = 1/2: Z* = 2(√Z − 1) (square root);


(iii) δ = 0: Z* = log_e Z (logarithmic) (21.60)

(note: lim_{δ→0} Z* = log_e Z).
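A minimal implementation of the transformation, including the δ → 0 limit, might look as follows (names invented):

import numpy as np

def box_cox(z, delta):
    # Box-Cox transform of a positive array z; the delta -> 0 limit is log.
    z = np.asarray(z, dtype=float)
    if np.isclose(delta, 0.0):
        return np.log(z)
    return (z**delta - 1.0) / delta

z = np.array([0.5, 1.0, 2.0, 4.0])
print(box_cox(z, 1.0))    # linear case (i)
print(box_cox(z, 0.5))    # square-root case (ii)
print(box_cox(z, 0.0))    # logarithmic case (iii)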

The first two cases are not commonly used in econometric modelling because of the difficulties involved in interpreting Z* in the context of an empirical econometric model. Often, however, the square-root transformation might be convenient as a homoskedasticity-inducing transformation. This is because certain economic time series exhibit variances which change with their trending mean (m_t), i.e. Var(Z_t) = m_t σ², t = 1, 2, ..., T. In such cases the square-root transformation can be used as a variance-stabilising one (see Appendix 21.1) since Var(Z_t*) ≈ σ².

The logarithmic transformation is of considerable interest in econometric modelling for a variety of reasons. Firstly, for a random variable Z whose distribution is closer to the log-normal, gamma or chi-square (i.e. positively skewed), the distribution of log_e Z is approximately normal (see Johnson and Kotz (1970)). The log_e transformation induces 'near symmetry' to the original skewed distribution and allows Z* to take negative values even though Z could not. For economic data which take only positive values this can be a useful transformation to achieve near normality. Secondly, the log_e transformation can be used as a variance-stabilising transformation in the case where the heteroskedasticity takes the form

Var(y_t/X_t = x_t) = σ_t² = (μ_t)²σ², t = 1, 2, ..., T. (21.61)

For y_t* = log_e y_t, Var(y_t*/X_t = x_t) ≈ σ², t = 1, 2, ..., T. Thirdly, the log transformation can be used to define useful economic concepts such as elasticities and growth rates. For example, in the case of the money equation considered above the variables are all in logarithmic form and the estimated coefficients can be interpreted as elasticities (assuming that the estimated equation constitutes a well-defined statistical model; a doubtful assumption). Moreover, the growth rate of Z_t, defined by Ż_t = (Z_t − Z_{t−1})/Z_{t−1}, can be approximated by Δlog_e Z_t = log_e Z_t − log_e Z_{t−1}, because Δlog_e Z_t = log_e(1 + Ż_t) ≈ Ż_t.

In practice the Box-Cox transformation can be used with δ unspecified, letting the data determine its value (see Zarembka (1974)). For the money equation the original variables M_t, Y_t, P_t and I_t were used in the Box-Cox transformed equation



and allowed the data to determine the value of δ. The estimated δ value chosen was δ̂ = 0.530 and

β̂₁ = 0.252, β̂₂ = 0.865, β̂₃ = 0.005, β̂₄ = −0.00007.
     (0.223)   (0.119)   (0.0001)  (0.00002)
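As a sketch of how such a δ can be chosen by the data, scipy's boxcox picks the exponent (its "lambda") by maximum likelihood when it is not supplied; the positively skewed series below is synthetic:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
z = rng.lognormal(mean=0.0, sigma=0.7, size=200)   # positively skewed data

z_star, delta_hat = stats.boxcox(z)    # delta estimated by maximum likelihood
print(delta_hat)                       # near 0 for log-normal data, i.e. the log transform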

'Does this mean that the original logarithmic transformation is inappropriate?' The answer is: not necessarily. This is because the estimated value of δ depends on the estimated equation being a well-defined statistical GM (no misspecification). In the money equation example there is enough evidence to suggest that various forms of misspecification are indeed present (see also Sections 21.3-7 and Chapter 22).

The alternative way to tackle non-normality, by postulating a more appropriate form for the distribution of Z_t, remains largely unexplored. Most of the results in this direction are limited to multivariate distributions closely related to the normal, such as the elliptical family of distributions (see Section 21.3 below). On the question of robust estimation see Amemiya.

21.3 Linearity

The linearity assumption excludes cases where the conditional mean is non-linear in x_t* but linear in x_t = l(x_t*), where l(·) is a well-behaved transformation such as x_t = log x_t* and x_t = (x_t*)². Moreover, terms such as

c₀ + c₁t + c₂t² + ··· + c_n t^n

can be accommodated within a linear conditional mean; the non-linearities considered in this section are the ones which cannot be accommodated into a linear conditional mean after transformation. Let us postulate linearity in the form

E(y_t/X_t = x_t) = β′x_t (21.63)

and

Var(y_t/X_t = x_t) = σ².


It is important to note that by postulating (63), without assuming normality of D(y_t, X_t; ψ), we limit the class of symmetric distributions to which D(y_t, X_t; ψ) could belong to that of elliptical distributions, denoted by EL(μ, Σ) (see Kelker (1970)). These distributions provide an extension of the multivariate normal distribution which preserves its bell-like shape and symmetry. Assuming that D(y_t, X_t; ψ) is elliptical, the conditional mean retains the linear form (63), while the conditional variance is in general a function of x_t.

This shows that the assumption of linearity is not as sensitive to some departures from normality as the homoskedasticity assumption. Indeed, homoskedasticity of the conditional variance characterises the normal distribution within the class of elliptical distributions (see Chmielewski (1981)).

(1) Implications of non-linearity

Let us consider the implications of non-linearity for the results of Chapter 19 related to the estimation, testing and prediction in the context of the linear regression model. In particular, what are the implications of assuming that D(Z_t; ψ) is not normal and E(y_t/X_t = x_t) = h(x_t) ≠ β′x_t? In this case the postulated GM

y_t = β′x_t + u_t (21.69)

should be compared with

y_t = h(x_t) + ε_t, (21.70)

where μ_t = E(y_t/X_t = x_t) = h(x_t) and ε_t = y_t − E(y_t/X_t = x_t). Comparing (69) and (70) we can see that the error term in the former is no longer white noise but u_t = y_t − β′x_t = h(x_t) − β′x_t + ε_t = g(x_t) + ε_t. Moreover,


E(u_t/X_t = x_t) = g(x_t) and E(u_t u_s) ≠ 0, so that β̂ and s² are inconsistent estimators of β and σ².

As we can see, the consequences of non-linearity are quite serious as far as the properties of β̂ and s² are concerned, these being biased and inconsistent estimators of β and σ², in general. What is more, the testing and prediction results derived in Chapter 19 are generally invalid in the case of non-linearity. In view of this the question arises as to what it is we are estimating. Defining β* as the value of β which minimises the expected squared approximation error E[(h(X_t) − β′X_t)²], we can show that β̂ → β* and s² → σ²(β*) (see White (1980)).

(2) Testing for non-linearity

In view of the serious implications of non-linearity for the results of Chapter 19 it is important to be able to test for departures from the linearity assumption. In particular we need to construct tests for

H₀: E(y_t/X_t = x_t) = β′x_t


against

H₁: E(y_t/X_t = x_t) = h(x_t) ≠ β′x_t.

This, however, raises the question of postulating a particular functional form for h(x_t), which is not available unless we are prepared to assume a particular form for D(Z_t; ψ). Alternatively, we could use the parametrisation related to the Kolmogorov-Gabor and systematic component polynomials introduced in Section 21.1.

Using, say, a third-order Kolmogorov-Gabor polynomial (KG(3)) we can postulate the alternative statistical GM:

y_t = γ₁′x_t + γ₂′ψ₂t + γ₃′ψ₃t + ε_t, (21.77)

where ψ₂t and ψ₃t denote the second- and third-order cross-products of the elements of x_t. Note that x₁t is assumed to be the constant. Assuming that T is large enough to enable us to estimate (77) we can test linearity in the form of:

H₀: γ₂ = 0 and γ₃ = 0, H₁: γ₂ ≠ 0 or γ₃ ≠ 0,

using the usual F-type test (see Section 21.1). An asymptotically equivalent test can be based on the R² of the auxiliary regression:

û_t = (β₀ − β)′x_t + γ₂′ψ₂t + γ₃′ψ₃t + ε_t, (21.80)

using the Lagrange multiplier test statistic

LM(y) = TR² = T((RRSS − URSS)/RRSS), asymptotically χ²(q) under H₀,

q being the number of restrictions (see Engle (1984)). Its rejection region is

C₁ = {y: LM(y) > c_α}, ∫_{c_α}^∞ dχ²(q) = α.

For small T the F-type test is preferable in practice because of the degrees of freedom adjustment; see Section 19.5.
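A minimal sketch of the LM(y) = TR² version of the test, on simulated data with an invented non-linear GM (powers of a single regressor stand in for the ψ₂t, ψ₃t terms):

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
T = 120
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y = 1.0 + 0.8 * x + 0.5 * x**2 + rng.normal(size=T)   # truth is non-linear

b, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ b                       # residuals from the linear fit

# Auxiliary regression (80): residuals on x_t plus second/third-order terms.
Z = np.column_stack([X, x**2, x**3])
g, *_ = np.linalg.lstsq(Z, u_hat, rcond=None)
e = u_hat - Z @ g
R2 = 1.0 - (e @ e) / (u_hat @ u_hat)    # uncentred R-squared = (RRSS-URSS)/RRSS

q = 2
LM = T * R2
print(LM, stats.chi2.ppf(0.95, q))      # compare LM with the chi-square(2) critical value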

Using the polynomial in μ_t we can postulate the alternative GM of the form:

y_t = β′x_t + c₂μ̂_t² + c₃μ̂_t³ + ··· + c_m μ̂_t^m + v_t, (21.82)

where μ̂_t = β̂′x_t. A direct comparison between (75) and (82) gives rise to a



RESET-type test (see Ramsey (1974)) for linearity based on H₀: c₂ = c₃ = ··· = c_m = 0, H₁: c_i ≠ 0, i = 2, ..., m. Again this can be tested using the F-type test or the LM test, both based on the auxiliary regression:

û_t = (β₀ − β)′x_t + Σ_{i=2}^m c_i μ̂_t^i + v_t. (21.83)

Let us apply these tests to the money equation estimated in Section 19.4.

The F-test based on (77) with terms up to third order (but excluding some terms because of collinearity with y_t) yielded an F(y) statistic computed from RRSS = 0.117520, URSS = 0.045477 and T − k* = 67. Given that c_α = 2.02, the null hypothesis of linearity is strongly rejected.

Similarly, the RESET-type test based on (82) with m = 4 (excluding μ̂_t² because of collinearity) yielded:

F(y) = ((0.117520 − 0.06028)/0.06028) × (74/2) = 35.13.

Again, with c_α = 3.12, linearity is strongly rejected.

It is important to note that although the RESET-type test is based on a more restrictive form of the alternative (compare (77) with (82)), it might be the only test available in the case where the degrees of freedom are at a premium (see Chapter 23).
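A RESET-type sketch along the lines of (82)-(83), with an invented non-linear data-generating process, can make the mechanics concrete:

import numpy as np

rng = np.random.default_rng(6)
T = 120
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y = np.exp(0.5 * x) + rng.normal(scale=0.3, size=T)   # non-linear in x_t

def rss(A, y):
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ c
    return r @ r

mu_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]     # fitted values from the linear GM
RRSS = rss(X, y)

Xa = np.column_stack([X, mu_hat**2, mu_hat**3])       # augment with powers of mu_hat
URSS = rss(Xa, y)

m, k_star = 2, Xa.shape[1]
F = ((RRSS - URSS) / m) / (URSS / (T - k_star))
print(F)   # compared with an F(2, T - k*) critical value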

(3) Tackling non-linearity

As argued in Section 21.1 the results of the various misspecification tests should be considered simultaneously because the assumptions are closely interrelated. For example, in the case of the estimated money equation it is highly likely that the linearity assumption was rejected because the independent sample assumption [8] is invalid. In cases, however, where the source of the departure is indeed the normality assumption (leading to non-linearity) we need to consider the question of how to proceed by relaxing the normality of {Z_t, t ∈ T}. One way to proceed is to postulate a general distribution D(y_t, X_t; ψ) and derive the specific form of the conditional expectation. Choosing the form of D(y_t, X_t; ψ) will determine both the form of the conditional expectation as well as the conditional variance (see Chapter 7). An alternative way to proceed is to use some normalising transformation


on the original variables y_t and X_t so as to ensure that the transformed variables y_t* and X_t* are indeed jointly normal, and hence E(y_t*/X_t* = x_t*) is linear in x_t* (see Box and Tidwell (1962)).

In practice non-linear regression models are used in conjunction with the normality of the conditional distribution (see Judge et al. (1985), inter alia). The question which naturally arises is, 'how can we reconcile the non-linearity of the conditional expectation and the normality of D(y_t/X_t; θ)?' As mentioned in Section 19.2, the linearity of μ_t = E(y_t/X_t = x_t) is a direct consequence of the normality of the joint distribution D(y_t, X_t; ψ). One way the non-linearity of E(y_t/X_t = x_t) and the normality of D(y_t/X_t; θ) can be reconciled is to argue that the conditional distribution is normal in the transformed variables X_t* = h(X_t), i.e. D(y_t/X_t* = x_t*) is linear in x_t* but non-linear in x_t, i.e.

E(y_t/X_t* = x_t*) = γ′x_t* = γ′h(x_t).

Moreover, the parameters of interest are not the linear regression parameters θ = (β, σ²) but φ = (γ, σ*²). It must be emphasised that non-linearity in the present context refers to both non-linearity in parameters (γ) and variables (X_t).

Non-linear regression models based on the statistical GM:

y_t = h(x_t; γ) + ε_t, t = 1, 2, ..., T,

can be estimated by least-squares based on the minimisation of

l(γ) = Σ_{t=1}^T (y_t − h(x_t; γ))².



This minimisation will give rise to certain non-linear normal equations which can be solved numerically (see Harvey (1981), Judge et al. (1985), Malinvaud (1970), inter alia) to provide least-squares estimators for γ (m × 1). σ² can then be estimated by

σ̂² = (1/T) Σ_{t=1}^T (y_t − h(x_t; γ̂))².
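As a sketch, scipy's curve_fit solves these non-linear normal equations numerically; the regression function h(x; γ) below is hypothetical:

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(7)
x = rng.uniform(0.1, 3.0, size=150)

def h(x, g0, g1):
    # a hypothetical non-linear regression function h(x; gamma)
    return g0 * np.exp(g1 * x)

y = h(x, 2.0, -0.7) + rng.normal(scale=0.1, size=x.size)

gamma_hat, _ = curve_fit(h, x, y, p0=[1.0, -0.5])   # numerical least squares
resid = y - h(x, *gamma_hat)
sigma2_hat = resid @ resid / x.size                 # estimator of sigma^2 as above
print(gamma_hat, sigma2_hat)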

21.4 Homoskedasticity

The assumption that Var(y_t/X_t = x_t) = σ² is free of x_t is a consequence of the assumption that D(y_t, X_t; ψ) is multivariate normal. As argued above, the assumption of homoskedasticity is inextricably related to the assumption of normality and we cannot retain one and reject the other uncritically. Indeed, as mentioned above, homoskedasticity of Var(y_t/X_t = x_t) characterises the normal distribution within the elliptical class. For argument's sake, let us assume that the probability model is in fact based on D(β′x_t, σ_t²), where D(·) is some unknown distribution and σ_t² = h(x_t).

(1) Implications of heteroskedasticity

As far as the estimators β̂ and s² are concerned we can show that:

(i) E(β̂) = β, i.e. β̂ is an unbiased estimator of β;
(ii) β̂ → β, i.e. β̂ is a consistent estimator of β.

These results suggest that β̂ = (X′X)⁻¹X′y retains some desirable properties, such as unbiasedness and consistency, although it might be inefficient. β̂ is usually compared with the so-called generalised least-squares (GLS) estimator of β, β̃, derived by minimising

l(β) = (y − Xβ)′Ω⁻¹(y − Xβ),


which yields

∂l(β)/∂β = 0 ⇒ β̃ = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y.

The relative efficiency of β̃ over β̂, however, rests on the presupposition that Ω is known a priori, and thus the above efficiency comparison is largely irrelevant. It should surprise nobody to 'discover' that by supplementing the statistical model with additional information we can get a more efficient estimator. Moreover, when Ω is known there is no need for GLS because we can transform the original variables in order to return to a homoskedastic conditional variance of the form

Var(y_t*/X_t* = x_t*) = σ².

This can be achieved by transforming y and X into

y* = Hy and X* = HX, where H′H = Ω⁻¹. (21.98)

In terms of the transformed variables the statistical GM takes the form

y* = X*β + u*, u* = Hu,

and the linear regression assumptions are valid for y* and X*. Indeed, it can be verified that

β* = (X*′X*)⁻¹X*′y* = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y = β̃. (21.100)

Hence, the GLS estimator is rather unnecessary in the case where Ω is known a priori.
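The equivalence (100) is easy to verify numerically; in the sketch below Ω is a known diagonal matrix of heteroskedastic variances, invented for the example:

import numpy as np

rng = np.random.default_rng(8)
T = 50
X = np.column_stack([np.ones(T), rng.normal(size=T)])
omega = rng.uniform(0.5, 3.0, size=T)              # known variances on the diagonal of Omega
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(omega))

Omega_inv = np.diag(1.0 / omega)
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)   # direct GLS

H = np.diag(1.0 / np.sqrt(omega))                  # H'H = Omega^{-1}
b_ols_star = np.linalg.lstsq(H @ X, H @ y, rcond=None)[0]           # OLS on transformed data

print(np.allclose(b_gls, b_ols_star))              # True, as in (100)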

The question which naturally arises at this stage is, 'what happens when Ω is unknown?' The conventional wisdom has been that, since Ω involves T unknown incidental parameters and increases with the sample size, it is clearly out of the question to estimate T + k parameters from T observations. Moreover, although β̂ = (X′X)⁻¹X′y is both unbiased and consistent, s²(X′X)⁻¹ is an inconsistent estimator of Cov(β̂) = (X′X)⁻¹X′ΩX(X′X)⁻¹, and the difference



can be positive or negative. Hence, no inference on β, based on β̂, is possible since for a consistent estimator of Cov(β̂) we need to know Ω (or estimate it consistently). So, the only way to proceed is to model σ_t² so as to 'solve' the incidental parameters problem.

Although there is an element of truth in the above viewpoint, White (1980) pointed out that for consistent inference based on β̂ we do not need to estimate Ω by itself but (X′ΩX), and the two problems are not equivalent.

The natural estimator of σ_t² is û_t² = (y_t − β̂′x_t)², which is clearly unsatisfactory because it is based on only one observation and no further information accrues by increasing the sample size. On the other hand, there is a perfectly acceptable estimator for (X′ΩX) coming in the form of

Ŵ_T = (1/T) Σ_{t=1}^T û_t² x_t x_t′.

The most important implication of this is that consistent inference, such as the F-test, is asymptotically justifiable, although the loss in efficiency should be kept in mind. In particular, a test for heteroskedasticity could be based on the difference

Ŵ_T − s²(X′X/T). (21.105)

Before we consider this test it is important to summarise the argument so far.

Under the assumption that the probability model is based on the distribution D(β′x_t, σ_t²), although no estimator of Ω = diag(σ₁², ..., σ_T²) is possible, β̂ = (X′X)⁻¹X′y is both unbiased and consistent (under certain conditions), and a consistent estimator of Cov(β̂) is available based on Ŵ_T. This enables us to use β̂ for hypothesis testing related to β. The argument of 'modelling' σ_t² will be taken up after we consider the question of testing for departures from homoskedasticity.
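A sketch of this heteroskedasticity-consistent covariance estimator, with an invented heteroskedastic design, comparing it with the usual s²(X′X)⁻¹ formula:

import numpy as np

rng = np.random.default_rng(9)
T = 200
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
u = rng.normal(scale=np.exp(0.5 * x))        # error variance depends on x_t
y = X @ np.array([1.0, 2.0]) + u

b = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ b

XtX_inv = np.linalg.inv(X.T @ X)
meat = (X * u_hat[:, None]**2).T @ X         # sum of u_hat_t^2 x_t x_t'
cov_white = XtX_inv @ meat @ XtX_inv         # White's consistent estimator of Cov(beta_hat)

s2 = u_hat @ u_hat / (T - X.shape[1])
cov_naive = s2 * XtX_inv                     # inconsistent under heteroskedasticity
print(np.sqrt(np.diag(cov_white)))
print(np.sqrt(np.diag(cov_naive)))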

(2) Testing departures from homoskedasticity

White (1980), after proposing a consistent estimator for X′ΩX, went on to


use the difference (equivalent to (105)):

(1/T) Σ_{t=1}^T (û_t² − s²) x_t x_t′, (21.108)

the natural estimator of (107). Given that (108) is symmetric we can express its (1/2)k(k − 1) distinct elements in the form

(1/T) Σ_{t=1}^T (û_t² − s²) ψ_t,

where ψ_t denotes the vector of distinct cross-products x_{it}x_{jt}, and suggest an asymptotically equivalent test based on the R² of the auxiliary regression

û_t² = c₀ + γ′ψ_t + v_t.


Example

For the money equation estimated above, the estimated auxiliary equation of the form

û_t² = c₀ + γ′ψ_t + v_t

yielded R² = 0.190, F(y) = 2.8 and TR² = 15.2. In view of the fact that F(6, 73) = 2.73 and χ²(6) = 12.6 for α = 0.05, the null hypothesis of homoskedasticity is rejected by both tests.
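statsmodels provides this test as het_white; a minimal sketch on simulated data (the design and coefficients are invented):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(10)
T = 80
x = rng.normal(size=(T, 2))
X = sm.add_constant(x)
u = rng.normal(scale=1.0 + 0.8 * np.abs(x[:, 0]))   # heteroskedastic in the first regressor
y = X @ np.array([1.0, 0.5, -0.3]) + u

res = sm.OLS(y, X).fit()
lm, lm_pval, fval, f_pval = het_white(res.resid, X)  # TR^2 and F versions
print(lm, lm_pval, fval, f_pval)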

The most important feature of the above White heteroskedasticity test is that 'apparently' no particular form of heteroskedasticity is postulated. In subsection (3) below, however, it is demonstrated that the White test is an exact test in the case where D(Z_t; ψ) is assumed to be multivariate t. In this case the conditional mean is μ_t = β′x_t but the conditional variance takes a form which is quadratic in x_t.

Using the 'omitted variables' argument for u_t² = E(u_t²/X_t = x_t) + v_t we can derive the above auxiliary regression (see Spanos (1985b)). This suggests that although the test is likely to have positive power for various forms of heteroskedasticity, it will have highest power for alternatives in the multivariate t direction, that is, multivariate distributions for D(Z_t; ψ) which are symmetric but have heavier tails than the normal.

In practice it is advisable to use the White test in conjunction with other tests based on particular forms of heteroskedasticity; in particular, tests which allow first and higher-order terms to enter the auxiliary regression, such as the Breusch-Pagan test (see (128) below).

Important examples of heteroskedasticity considered in the econometric literature (see Judge et al. (1985), Harvey (1981)) are:
