
THE LINEAR REGRESSION MODEL I


In relation to the Gauss linear model discussed in Chapter 18, apart from some apparent similarity in the notation and the mathematical manipulations involved in the statistical analysis, the linear regression model purports to model a very different situation from the one envisaged by the former. In particular the Gauss linear model could be considered to be the appropriate statistical model for analysing estimable models of the form

where $M_t$ refers to money and $Q_{it}$, $i = 1, 2, 3$, to quarterly dummy variables, in view of the non-stochastic nature of the $x_{it}$s involved. On the other hand, estimable models such as

referring to a demand for money function ($M$: money, $Y$: income, $P$: price level, $I$: interest rate), could not be analysed in the context of the Gauss linear model. This is because it is rather arbitrary to discriminate on probabilistic grounds between the variable giving rise to the observed data chosen for $M$ and those for $Y$, $P$ and $I$. For estimable models such as (3) the linear regression model as sketched in Chapter 17 seems more appropriate, especially if the observed data chosen do not exhibit time dependence. This will become clearer in the present chapter after the specification of the linear regression model in Section 19.2. The money demand function (3) is used to illustrate the various concepts and results introduced throughout this chapter.

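19.2 Specification

$$\begin{pmatrix} y_t \\ X_t \end{pmatrix} \sim N\left( \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$$

in an obvious notation (see Chapter 15). It is interesting to note at this stage that these assumptions seem rather restrictive for most economic data in general and time-series in particular.

On the basis of the assumption that $\{Z_t, t \in T\}$, $Z_t \equiv (y_t, X_t')'$, is a NIID vector stochastic process we can proceed to reduce the joint distribution $D(Z_1, \dots, Z_T; \psi)$ in order to define the statistical GM of the linear regression model using the general form

$$y_t = \mu_t + u_t, \quad t \in T,$$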

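where

$\mu_t = E(y_t / X_t = x_t)$ is the systematic component,

and

$u_t = y_t - E(y_t / X_t = x_t)$ the non-systematic component

(see Chapter 17). In view of the normality of $\{Z_t, t \in T\}$ we deduce that

$$\mu_t = E(y_t / X_t = x_t) = \beta_0 + \beta' x_t \quad \text{(linear in } x_t\text{)}, \tag{19.6}$$

where $\beta_0 = m_1 - \sigma_{12}\Sigma_{22}^{-1} m_2$, $\beta = \Sigma_{22}^{-1}\sigma_{21}$, and

$$\mathrm{Var}(y_t / X_t = x_t) = \sigma^2 \quad \text{(homoskedastic)},$$

where $\sigma^2 = \sigma_{11} - \sigma_{12}\Sigma_{22}^{-1}\sigma_{21}$ (see Chapter 15). The time invariance of the parameters $\beta_0$, $\beta$ and $\sigma^2$ stems from the identically distributed (ID) assumption related to $\{Z_t, t \in T\}$. It is important, however, to note that the ID assumption provides only a sufficient condition for the time invariance of the statistical parameters.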
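The mapping from the moments of the joint normal distribution to the regression parameters can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the function name `conditional_parameters` and the numerical moments are hypothetical choices made only for illustration. It uses the standard conditional-normal results $\beta = \Sigma_{22}^{-1}\sigma_{21}$, $\beta_0 = m_1 - \sigma_{12}\Sigma_{22}^{-1}m_2$ and $\sigma^2 = \sigma_{11} - \sigma_{12}\Sigma_{22}^{-1}\sigma_{21}$.

```python
import numpy as np

def conditional_parameters(m, Sigma):
    """Map the joint moments of Z_t = (y_t, X_t')' to (beta_0, beta, sigma^2).

    Hypothetical helper: the first element of m and the first row/column of
    Sigma are taken to refer to y_t, the remaining ones to X_t.
    """
    m = np.asarray(m, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    m1, m2 = m[0], m[1:]
    s11 = Sigma[0, 0]          # sigma_11 = Var(y_t)
    s21 = Sigma[1:, 0]         # sigma_21 = Cov(X_t, y_t)
    S22 = Sigma[1:, 1:]        # Sigma_22 = Cov(X_t), must be non-singular
    beta = np.linalg.solve(S22, s21)   # beta = Sigma_22^{-1} sigma_21
    beta0 = m1 - beta @ m2             # beta_0 = m_1 - sigma_12 Sigma_22^{-1} m_2
    sigma2 = s11 - s21 @ beta          # sigma^2 = sigma_11 - sigma_12 Sigma_22^{-1} sigma_21
    return beta0, beta, sigma2

# Illustrative (made-up) moments for one y_t and two regressors.
m = [1.0, 2.0, 3.0]
Sigma = [[4.0, 1.0, 1.0],
         [1.0, 2.0, 0.0],
         [1.0, 0.0, 1.0]]
beta0, beta, sigma2 = conditional_parameters(m, Sigma)
```

Any change in the joint moments (for instance a mean or covariance that varies with $t$) changes the values returned, which is one way of seeing why the time invariance of $\beta_0$, $\beta$ and $\sigma^2$ rests on the ID assumption.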

In order to simplify the notation let us assume that $m = 0$ without any loss of generality, given that we can easily transform the original variables into mean deviation form $(y_t - m_1)$ and $(X_t - m_2)$. This implies that $\beta_0$, the coefficient of the constant, is zero and the systematic component becomes

$$\mu_t = E(y_t / X_t = x_t) = \beta' x_t.$$

In practice, however, unless the observed data are in mean deviation form the constant should never be dropped, because the estimates derived otherwise are not estimates of the regression coefficients $\beta = \Sigma_{22}^{-1}\sigma_{21}$ but of $\beta^* = [E(X_t X_t')]^{-1} E(X_t y_t)$; see Appendix 19.1 on the role of the constant. The statistical GM of the linear regression model takes the particular form

$$y_t = \beta' x_t + u_t, \quad t \in T, \tag{19.9}$$

with $\theta \equiv (\beta, \sigma^2)$ being the statistical parameters of interest, that is, the parameters in terms of which the statistical GM is defined. By construction the systematic and non-systematic components of (9) satisfy the following properties:

(i) $E(u_t / X_t = x_t) = E[(y_t - E(y_t / X_t = x_t)) / X_t = x_t] = 0$;

(ii) $E(u_t u_s / X_t = x_t) = \sigma^2$ for $t = s$, and $0$ for $t \neq s$;

(iii) $E(\mu_t u_s / X_t = x_t) = \mu_t E(u_s / X_t = x_t) = 0$, $t, s \in T$.

The first two properties define $\{u_t, t \in T\}$ to be a white-noise process and (iii) establishes the orthogonality of the two components. It is important to note that the above expectation operator $E(\cdot / X_t = x_t)$ is defined in terms of $D(y_t / X_t; \theta)$, which is the distribution underlying the probability model for (9). However, the above properties hold for $E(\cdot)$ defined in terms of $D(Z_t; \psi)$ as well, given that:

(i)′ $E(u_t) = E\{E(u_t / X_t = x_t)\} = 0$;

(ii)′ $E(u_t u_s) = E\{E(u_t u_s / X_t = x_t)\} = \sigma^2$ for $t = s$, and $0$ for $t \neq s$;

and

(iii)′ $E(\mu_t u_s) = E\{E(\mu_t u_s / X_t = x_t)\} = 0$, $t, s \in T$

(see Section 7.2 on conditional expectation).

The conditional distribution $D(y_t / X_t; \theta)$ is related to the joint distribution $D(y_t, X_t; \psi)$ via the decomposition

$$D(y_t, X_t; \psi) = D(y_t / X_t; \psi_1) \cdot D(X_t; \psi_2) \tag{19.10}$$

(see Chapter 5). In defining the probability model of the linear regression model as based on $D(y_t / X_t; \theta)$ we choose to ignore $D(X_t; \psi_2)$ for the estimation of the statistical parameters of interest $\theta$. For this to be possible we need to ensure that $X_t$ is weakly exogenous with respect to $\theta$ for the sample period $t = 1, 2, \dots, T$ (see Section 19.3, below).

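For the statistical parameters of interest $\theta \equiv (\beta, \sigma^2)$ to be well defined we need to ensure that $\Sigma_{22}$ is non-singular, in view of the formulae $\beta = \Sigma_{22}^{-1}\sigma_{21}$, $\sigma^2 = \sigma_{11} - \sigma_{12}\Sigma_{22}^{-1}\sigma_{21}$, at least for the sample period $t = 1, 2, \dots, T$. This requires that the sample equivalent of $\Sigma_{22}$, $(1/T)(X'X)$ where $X \equiv (x_1, x_2, \dots, x_T)'$, is indeed non-singular, i.e.

$$\mathrm{rank}(X'X) = \mathrm{rank}(X) = k,$$

$x_t$ being a $k \times 1$ vector.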
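The rank condition on the data matrix is easy to check in practice. The snippet below is a small sketch, assuming NumPy; the helper name `has_full_column_rank` and both data matrices are made up for illustration. The second matrix contains an exactly collinear column, so $(1/T)(X'X)$ is singular and the sample analogue of $\Sigma_{22}^{-1}$ does not exist.

```python
import numpy as np

def has_full_column_rank(X):
    """Return True when the T x k data matrix X has rank k (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

# Illustrative data: a constant and one regressor (T = 4, k = 2).
X_ok = np.array([[1.0, 0.5],
                 [1.0, 1.5],
                 [1.0, 2.5],
                 [1.0, 4.0]])

# Adding a column that is an exact multiple of an existing one breaks the condition.
X_bad = np.column_stack([X_ok, 2.0 * X_ok[:, 1]])

full_rank_ok = has_full_column_rank(X_ok)
full_rank_bad = has_full_column_rank(X_bad)
```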

As argued in Chapter 17, the statistical parameters of interest do not necessarily coincide with the theoretical parameters of interest $\xi$. We need, however, to ensure that $\xi$ is uniquely defined in terms of $\theta$ for $\xi$ to be identifiable. In constructing empirical econometric models we proceed from a well-defined estimated statistical GM (see Chapter 22) to reparametrise it in terms of the theoretical parameters of interest. Any restrictions induced by the reparametrisation, however, should be tested for their validity. For this reason no a priori restrictions are imposed on $\theta$ at the outset to make such restrictions testable at a later stage.

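As argued above, the probability model underlying (9) is defined in terms of $D(y_t / X_t; \theta)$ and takes the form

$$D(y_t / X_t; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^2}(y_t - \beta' x_t)^2 \right\}, \quad \theta \in \mathbb{R}^k \times \mathbb{R}_+, \quad t \in T.$$

Having defined all three components of the linear regression model let us collect all the assumptions together and specify the statistical model properly.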

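The linear regression model: specification

(I) Statistical GM, $y_t = \beta' x_t + u_t$, $t \in T$

[1] $\mu_t = E(y_t / X_t = x_t)$ is the systematic component; $u_t = y_t - E(y_t / X_t = x_t)$ is the non-systematic component.

[2] $\theta \equiv (\beta, \sigma^2)$, $\beta = \Sigma_{22}^{-1}\sigma_{21}$, $\sigma^2 = \sigma_{11} - \sigma_{12}\Sigma_{22}^{-1}\sigma_{21}$, are the statistical parameters of interest. (Note: $\Sigma_{22} = \mathrm{Cov}(X_t)$, $\sigma_{21} = \mathrm{Cov}(X_t, y_t)$, $\sigma_{11} = \mathrm{Var}(y_t)$.)

[3] $X_t$ is weakly exogenous with respect to $\theta$, $t = 1, 2, \dots, T$.

[4] No a priori information on $\theta$.

[5] The observed data matrix $X \equiv (x_1, x_2, \dots, x_T)'$, $T \times k$, is of full rank, $\mathrm{rank}(X) = k$.

(II) Probability model

[6] (i) $D(y_t / X_t; \theta)$ is normal;

(ii) $E(y_t / X_t = x_t) = \beta' x_t$, linear in $x_t$;

(iii) $\mathrm{Var}(y_t / X_t = x_t) = \sigma^2$, homoskedastic (free of $x_t$).

[7] $\theta$ is time invariant.

(III) Sampling model

[8] $y \equiv (y_1, \dots, y_T)'$ represents an independent sample sequentially drawn from $D(y_t / X_t; \theta)$, $t = 1, 2, \dots, T$.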

An important point to note about the above specification is that the model is specified directly in terms of $D(y_t / X_t; \theta)$, making no assumptions about $D(Z_t; \psi)$. For the specification of the linear regression model there is no need to make any assumptions related to $\{Z_t, t \in T\}$. The problem, however, is that the additional generality gained by going directly to $D(y_t / X_t; \theta)$ is more apparent than real. Despite the fact that the assumption that $\{Z_t, t \in T\}$ is a NIID process is only sufficient (not necessary) for [6] to [8] above, it considerably enhances our understanding of econometric modelling in the context of the linear regression model. This is, firstly, because it is commonly easier in practice to judge the appropriateness of probabilistic assumptions related to $Z_t$ rather than $(y_t / X_t = x_t)$; and, secondly, because in the context of misspecification analysis possible sources for the departures from the underlying assumptions are of paramount importance. Such sources can commonly be traced to departures from the assumptions postulated for $\{Z_t, t \in T\}$ (see Chapters 21-22).

Before we discuss the above assumptions underlying the linear regression model it is of some interest to compare the above specification with the standard textbook approach where the probabilistic assumptions are made in terms of the error term.

Standard textbook specification of the linear regression model

$$y = X\beta + u$$

(1) $(u / X) \sim N(0, \sigma^2 I_T)$;

(2) no a priori information on $(\beta, \sigma^2)$;

(3) $\mathrm{rank}(X) = k$.

Assumption (1) implies the orthogonality $E(X_t u_t / X_t = x_t) = 0$, $t = 1, 2, \dots, T$, and assumptions [6] to [8], the probability and the sampling models respectively. This is because $(y / X)$ is a linear function of $u$ and thus normally distributed (see Chapter 15), i.e.

$$(y / X) \sim N(X\beta, \sigma^2 I_T).$$

As we can see, the sampling model assumption of independence is ‘hidden’ behind the form of the conditional covariance $\sigma^2 I_T$. Because of this the independence assumption and its implications are not clearly recognised in certain cases when the linear regression model is used in econometric modelling. As argued in Chapter 17, the sampling model of an independent sample is usually inappropriate when the observed data come in the form of aggregate economic time series. Assumptions (2) and (3) are identical to [4] and [5] above. The assumptions related to the parameters of interest $\theta \equiv (\beta, \sigma^2)$ and the weak exogeneity of $X_t$ with respect to $\theta$ ([2] and [3] above) are not made in the context of the standard textbook specification. These assumptions related to the parametrisation of the statistical GM play a very important role in the context of the methodology proposed in Chapter 1 (see also Chapter 26). Several concepts such as weak exogeneity (see Section 19.3, below) and collinearity (see Sections 20.5-6) are only definable with respect to a given parametrisation. Moreover, the statistical GM is turned into an econometric model by reparametrisation, going from the statistical to the theoretical parameters of interest.

The most important difference between the specification [1]-[8] and (1)-(3), however, is the role attributed to the error term. In the context of the latter the probabilistic and sampling model assumptions are made in terms of the error term, not in terms of the observable random variables involved as in [1]-[8]. This difference has important implications in the context of misspecification testing (testing the underlying assumptions) and action thereof. The error term in the context of a statistical model as specified in the present book is by construction white-noise relative to a given information set $\mathcal{D} \subset \mathcal{F}$.

19.3 Discussion of the assumptions

[1] The systematic and non-systematic components

As argued in Chapter 17 (see also Chapter 26) the specification of a statistical model is based on the joint distribution of $Z_t$, $t = 1, 2, \dots, T$, i.e.

$$D(Z_1, Z_2, \dots, Z_T; \psi) \equiv D(Z; \psi) \tag{19.14}$$

which includes the relevant sample and measurement information. The specification of the linear regression model can be viewed as directly related to (14) and derived by ‘reduction’ using the assumptions of normality and IID. The independence assumption enables us to reduce $D(Z; \psi)$ into the product of the marginal distributions $D(Z_t; \psi_t)$, $t = 1, 2, \dots, T$.

Under the NIID assumptions $\mu_t$ and $u_t$ take the particular forms:

$$\mu_t = \beta' x_t, \qquad u_t = y_t - \beta' x_t.$$

Again, if the NIID assumptions are invalid then

(see Chapters 21-22).

[2] The parameters of interest

As discussed in Chapter 17, the parameters in terms of which the statistical GM is defined constitute by definition the statistical parameters of interest and they represent a particular parametrisation of the unknown parameters of the underlying probability model. In the case of the linear regression model the parameters of interest come in the form of $\theta \equiv (\beta, \sigma^2)$ where $\beta = \Sigma_{22}^{-1}\sigma_{21}$, $\sigma^2 = \sigma_{11} - \sigma_{12}\Sigma_{22}^{-1}\sigma_{21}$. As argued above the parametrisation $\theta$ depends not only on $D(Z; \psi)$ but also on the assumptions of NIID. Any changes in $Z_t$ and/or the NIID assumptions will in general change the parametrisation.

[3] Exogeneity

In the linear regression model we begin with $D(y_t, X_t; \psi)$ and then we concentrate exclusively on $D(y_t / X_t; \psi_1)$ where

$$D(y_t, X_t; \psi) = D(y_t / X_t; \psi_1) \cdot D(X_t; \psi_2),$$

which implies that we choose to ignore the marginal distribution $D(X_t; \psi_2)$. In order to be able to do that, this distribution must contain no information relevant for the estimation of the parameters of interest, $\theta \equiv (\beta, \sigma^2)$, i.e. the stochastic structure of $X_t$ must be irrelevant for any inference on $\theta$. Formalising this intuitive idea we say that: $X_t$ is weakly exogenous over the sample period for $\theta$ if there exists a reparametrisation with $\psi \equiv (\psi_1, \psi_2)$ such that:

(i) $\theta$ is a function of $\psi_1$ ($\theta = h(\psi_1)$);

(ii) $\psi_1$ and $\psi_2$ are variation free ($(\psi_1, \psi_2) \in \Psi_1 \times \Psi_2$).

Variation free means that for any specific value $\psi_2$ in $\Psi_2$, $\psi_1$ can take any other value in $\Psi_1$ and vice versa. For more details on exogeneity see Engle, Hendry and Richard (1983). When the above conditions are not satisfied the marginal distribution of $X_t$ cannot be ignored because it contains relevant information for any inference on $\theta$.


[4] No a priori information on $\theta \equiv (\beta, \sigma^2)$

This assumption is made at the outset in order to avoid imposing invalid testable restrictions on $\theta$. At this stage the only relevant interpretation of $\theta$ is as statistical parameters, directly related to $\psi_1$ in $D(y_t / X_t; \psi_1)$. As such no a priori information seems likely to be available for $\theta$. Such information is commonly related to the theoretical parameters of interest $\xi$. Before $\theta$ is used to define $\xi$, however, we need to ensure that the underlying statistical model is well defined (no misspecification) in terms of the observed data chosen.

[5] The observed data matrix X is of full rank

For the data matrix $X \equiv (x_1, x_2, \dots, x_T)'$, $T \times k$, we need to assume that

$$\mathrm{rank}(X) = k.$$

The matrix $(1/T)(X'X)$ can be seen as the sample moment equivalent to $\Sigma_{22}$.

[6] Normality, linearity, homoskedasticity

The assumption of normality of $D(y_t, X_t; \psi)$ plays an important role in the specification as well as the statistical analysis of the linear regression model. As far as specification is concerned, normality of $D(y_t, X_t; \psi)$ implies:

(i) $D(y_t / X_t; \theta)$ is normal (see Chapter 15);

(ii) $E(y_t / X_t = x_t) = \beta' x_t$, a linear function of the observed value $x_t$ of $X_t$;

(iii) $\mathrm{Var}(y_t / X_t = x_t) = \sigma^2$, the conditional variance is free of $x_t$, i.e. homoskedastic.

Moreover, (i)-(iii) come very close to implying that $D(y_t, X_t; \psi)$ is normal as well (see Chapter 24.2).

[7] Parameter time-invariance

As far as the parameter invariance assumption is concerned we can see that it stems from the time invariance of the parameters of the distribution $D(y_t, X_t; \psi)$; that is, from the identically distributed (ID) component of the normal IID assumption related to $Z_t$.

[8] Independent sample

The assumption that $y$ is an independent sample from $D(y_t / X_t; \theta)$, $t = 1, 2, \dots, T$, is one of the most crucial assumptions underlying the linear regression model. In econometrics this assumption should be looked at very closely because most economic time series have a distinct time dimension (dependency) which cannot be modelled exclusively in terms of exogenous random variables $X_t$. In such cases the non-random sample assumption (see Chapter 23) might be more appropriate.

19.4 Estimation

(1) Maximum likelihood estimators

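Let us consider the estimation of the linear regression model as specified by the assumptions [1]-[8] discussed above. Using the assumptions [6] to [8] we can deduce that the likelihood function for the model takes the form

$$L(\theta; y) \propto \prod_{t=1}^{T} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^2}(y_t - \beta' x_t)^2 \right\},$$

so that

$$\log L = \text{const} - \frac{T}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - \beta' x_t)^2.$$

Maximising $\log L$ with respect to $\beta$ and $\sigma^2$ we find that

$$\hat{\beta} = \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} \sum_{t=1}^{T} x_t y_t, \qquad \hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}(y_t - \hat{\beta}' x_t)^2 \equiv \frac{1}{T}\sum_{t=1}^{T}\hat{u}_t^2,$$

in an obvious notation, are the maximum likelihood estimators (MLE's) of $\beta$ and $\sigma^2$, respectively. If we were to write the statistical GM, $y_t = \beta' x_t + u_t$, $t = 1, 2, \dots, T$, in the matrix notation form

$$y = X\beta + u,$$

where $y \equiv (y_1, \dots, y_T)'$, $T \times 1$, $X \equiv (x_1, \dots, x_T)'$, $T \times k$, and $u \equiv (u_1, \dots, u_T)'$, $T \times 1$, the MLE's take the more suggestive form

$$\hat{\beta} = (X'X)^{-1}X'y, \qquad \hat{\sigma}^2 = \frac{1}{T}\hat{u}'\hat{u}, \quad \hat{u} \equiv y - X\hat{\beta}.$$

The information matrix $I_T(\theta)$ is defined by

$$I_T(\theta) = E\left[ \left( \frac{\partial \log L}{\partial \theta} \right)\left( \frac{\partial \log L}{\partial \theta} \right)' \right] = -E\left[ \frac{\partial^2 \log L}{\partial \theta\, \partial \theta'} \right],$$

which for this model gives

$$I_T(\theta) = \begin{pmatrix} \dfrac{X'X}{\sigma^2} & 0 \\ 0 & \dfrac{T}{2\sigma^4} \end{pmatrix}, \qquad [I_T(\theta)]^{-1} = \begin{pmatrix} \sigma^2 (X'X)^{-1} & 0 \\ 0 & \dfrac{2\sigma^4}{T} \end{pmatrix}. \tag{19.30}$$

In order to get some idea of what these matrix formulae look like let us consider these formulae for the simple model:

Compare these formulae with those of Chapter 18.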
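The matrix form of the MLE's, $\hat{\beta} = (X'X)^{-1}X'y$ and $\hat{\sigma}^2 = (1/T)\hat{u}'\hat{u}$, translates directly into a few lines of code. What follows is a minimal sketch assuming NumPy; the function name `mle_linear_regression`, the simulated data and the 'true' parameter values are all hypothetical and serve only to exercise the formulae.

```python
import numpy as np

def mle_linear_regression(y, X):
    """MLE's of (beta, sigma^2) under assumptions [6]-[8] (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T = X.shape[0]
    # Solve the normal equations (X'X) beta = X'y rather than inverting X'X.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat              # estimated non-systematic component
    sigma2_hat = (u_hat @ u_hat) / T      # divisor T, not T - k
    return beta_hat, sigma2_hat, u_hat

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
beta_true = np.array([1.0, 0.5, -0.3])
y = X @ beta_true + rng.normal(scale=0.2, size=T)

beta_hat, sigma2_hat, u_hat = mle_linear_regression(y, X)
```

Solving the normal equations is numerically preferable to forming $(X'X)^{-1}$ explicitly, although both correspond to the same estimator; the residuals satisfy $X'\hat{u} = 0$ up to rounding error.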

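One very important feature of the MLE $\hat{\beta}$ above is that it preserves the original orthogonality between the systematic and non-systematic components, $\mu_t \perp u_t$, as an orthogonality between the estimated systematic and non-systematic components in the form $\hat{\mu} \perp \hat{u}$, where

$$\hat{\mu} = X\hat{\beta}, \qquad \hat{u} = y - X\hat{\beta},$$

respectively. This is because

$$\hat{\mu} = P_X y, \qquad \hat{u} = (I - P_X) y,$$

where $P_X = X(X'X)^{-1}X'$ is a symmetric ($P_X' = P_X$), idempotent ($P_X^2 = P_X$) matrix (i.e. it represents an orthogonal projection) and

$$\begin{aligned} E(\hat{\mu}\hat{u}') &= E(P_X y y'(I - P_X)) \\ &= E(P_X y u'(I - P_X)), \quad \text{since } (I - P_X)y = (I - P_X)u, \\ &= \sigma^2 P_X (I - P_X), \quad \text{since } E(yu') = \sigma^2 I_T, \\ &= 0, \quad \text{since } P_X(I - P_X) = 0. \end{aligned}$$

In other words, the systematic and non-systematic components were estimated in such a way so as to preserve the original orthogonality. Geometrically $P_X$ and $(I - P_X)$ represent orthogonal projectors onto the subspace spanned by the columns of $X$, say $\mathcal{M}(X)$, and onto its orthogonal complement $\mathcal{M}(X)^{\perp}$, respectively. The systematic component was estimated by projecting $y$ onto $\mathcal{M}(X)$ and the non-systematic component by projecting $y$ onto $\mathcal{M}(X)^{\perp}$, i.e.

$$y = P_X y + (I - P_X) y = \hat{\mu} + \hat{u}.$$

Moreover, this orthogonality, which is equivalent to independence in this context, is passed over to the MLE's $\hat{\beta}$ and $\hat{\sigma}^2$ since $\hat{\mu}$ is independent of $\hat{u}'\hat{u} = y'(I - P_X)y$, the residual sum of squares, because $P_X(I - P_X) = 0$ (see Q6, Chapter 15). Given that $\hat{\beta} = (X'X)^{-1}X'\hat{\mu}$ and $\hat{\sigma}^2 = (1/T)\hat{u}'\hat{u}$ we can deduce that $\hat{\beta}$ and $\hat{\sigma}^2$ are independent; see (E2) of Section 7.1.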
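The algebraic properties of the projection matrix $P_X = X(X'X)^{-1}X'$ used in this argument can be verified numerically. The sketch below assumes NumPy and uses randomly generated, purely illustrative data; the variable names are hypothetical. It checks symmetry, idempotency, $P_X(I - P_X) = 0$, and the resulting orthogonality of the estimated components.

```python
import numpy as np

# Illustrative data: any full-rank X and any y will do.
rng = np.random.default_rng(1)
T, k = 30, 3
X = rng.normal(size=(T, k))
y = rng.normal(size=T)

# P_X projects onto the column space of X; M_X onto its orthogonal complement.
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(T) - P

mu_hat = P @ y     # estimated systematic component, X beta_hat
u_hat = M @ y      # estimated non-systematic component (residuals)

symmetric = np.allclose(P, P.T)
idempotent = np.allclose(P @ P, P)
annihilates = np.allclose(P @ M, 0.0, atol=1e-10)
inner_product = mu_hat @ u_hat   # zero up to rounding error
```

Because $\hat{\mu}'\hat{u} = 0$, the sum of squares $y'y$ splits exactly into $\hat{\mu}'\hat{\mu} + \hat{u}'\hat{u}$, and the trace of $I - P_X$ equals $T - k$.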

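Another feature of the MLE's $\hat{\beta}$ and $\hat{\sigma}^2$ worth noting is the suggestive similarity between these estimators and the parameters $\beta$, $\sigma^2$:

$$\beta = \Sigma_{22}^{-1}\sigma_{21}, \qquad \hat{\beta} = \left( \frac{X'X}{T} \right)^{-1} \frac{X'y}{T};$$

$$\sigma^2 = \sigma_{11} - \sigma_{12}\Sigma_{22}^{-1}\sigma_{21}, \qquad \hat{\sigma}^2 = \frac{y'y}{T} - \frac{y'X}{T}\left( \frac{X'X}{T} \right)^{-1} \frac{X'y}{T}.$$

Using the orthogonality of the estimated components $\hat{\mu}$ and $\hat{u}$ we could decompose the variation in $y$ as measured by $y'y$ into

$$y'y = \hat{\mu}'\hat{\mu} + \hat{u}'\hat{u},$$

denoted as

$$\underset{\text{(total)}}{TSS} = \underset{\text{(explained)}}{ESS} + \underset{\text{(residual)}}{RSS},$$

where SS stands for sums of squares. The multiple correlation coefficient in this case takes the form

$$\hat{R}^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}.$$

When the variation in $y$ is measured around its sample mean $\bar{y}$, the corresponding measure is

$$\tilde{R}^2 = \frac{\hat{\mu}'\hat{\mu} - T\bar{y}^2}{y'y - T\bar{y}^2} = 1 - \frac{\hat{u}'\hat{u}}{y'y - T\bar{y}^2}.$$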

Note that $R^2$ was used in Chapter 15 to denote the population multiple correlation coefficient but in the econometrics literature $R^2$ is also used to denote $\hat{R}^2$ and $\tilde{R}^2$.

Both of the above measures of ‘goodness of fit’, $\hat{R}^2$ and $\tilde{R}^2$, have variously been defined to be the sample multiple correlation coefficient in the econometric literature. Caution should be exercised when reading different textbooks because $\hat{R}^2$ and $\tilde{R}^2$ have different properties. For example, $0 \le \hat{R}^2 \le 1$, but no such restriction exists for $\tilde{R}^2$ unless one of the regressors in $X_t$ is the constant term. On the role of the constant term see Appendix 19.1.

One serious objection to the use of $\tilde{R}^2$ as a goodness-of-fit measure is the fact that as the number $k$ of regressors increases, $\tilde{R}^2$ increases as well (it can never decrease), irrespective of whether the regressors are relevant or not. For this reason a ‘corrected’ goodness-of-fit measure is defined by

$$\bar{R}^2 = 1 - \frac{\hat{u}'\hat{u}/(T - k)}{(y'y - T\bar{y}^2)/(T - 1)}.$$

The correction is the division of the statistics involved by their corresponding degrees of freedom; see Theil (1971).
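The difference between the goodness-of-fit measures can be seen in a small numerical sketch. The code below assumes NumPy; the function names and the four-observation data set are hypothetical, and the labels `r2_hat` (variation measured by $y'y$) and `r2_tilde` (variation measured around $\bar{y}$) follow one reading of the notation, which should be treated as an assumption. The regression deliberately omits the constant term, so the mean-deviation measure falls outside $[0, 1]$.

```python
import numpy as np

def goodness_of_fit(y, X):
    """Return (r2_hat, r2_tilde, r2_bar) for a regression of y on X (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat
    rss = u_hat @ u_hat
    tss_raw = y @ y                         # variation measured by y'y
    tss_dev = tss_raw - T * y.mean() ** 2   # variation around the sample mean
    r2_hat = 1.0 - rss / tss_raw
    r2_tilde = 1.0 - rss / tss_dev
    r2_bar = 1.0 - (rss / (T - k)) / (tss_dev / (T - 1))   # degrees-of-freedom correction
    return r2_hat, r2_tilde, r2_bar

def r2_from_sums(rss, tss, T, k):
    """R^2 and corrected R^2 computed from reported sums of squares."""
    r2 = 1.0 - rss / tss
    r2_bar = 1.0 - (rss / (T - k)) / (tss / (T - 1))
    return r2, r2_bar

# Made-up data with no constant among the regressors.
y = np.array([10.0, 11.0, 12.0, 13.0])
X = np.array([[1.0], [-1.0], [1.0], [-1.0]])
r2_hat, r2_tilde, r2_bar = goodness_of_fit(y, X)

# Sums of squares reported for the money demand equation in this chapter.
r2_chapter, r2_bar_chapter = r2_from_sums(rss=0.118, tss=24.954, T=80, k=4)
```

With the reported RSS = 0.118 and TSS = 24.954 ($T = 80$, $k = 4$) the second helper reproduces, to four decimal places, the values 0.9953 and 0.9951 quoted for the estimated money equation.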

(2) An empirical example

In order to illustrate some of the concepts and results introduced so far let us consider estimating a transactions demand for money. Using the simplest form of a demand function we can postulate the theoretical model:

$$M^D = h(Y, P, I),$$

where $M^D$ is the transactions demand for money, $Y$ is income, $P$ is the price level and $I$ is the short-run interest rate referring to the opportunity cost of holding transactions money. Assuming a multiplicative form for $h(\cdot)$ the demand function takes the form

$$M^D = A Y^{\alpha_1} P^{\alpha_2} I^{\alpha_3}$$

or

$$\ln M^D = \alpha_0 + \alpha_1 \ln Y + \alpha_2 \ln P + \alpha_3 \ln I, \tag{19.47}$$

where $\ln$ stands for $\log_e$ and $\alpha_0 = \ln A$.

For expositional purposes let us adopt the commonly accepted approach to econometric modelling (see Chapter 1) in an attempt to highlight some of the problems associated with it. If we were to ignore the discussion on econometric modelling in Chapter 1 and proceed by using the usual ‘textbook’ approach, the next step is to transform the theoretical model to an econometric model by adding an error term, i.e. the econometric model is

$$m_t = \alpha_0 + \alpha_1 y_t + \alpha_2 p_t + \alpha_3 i_t + u_t, \tag{19.48}$$

where $m_t = \ln M_t$, $y_t = \ln Y_t$, $p_t = \ln P_t$, $i_t = \ln I_t$ and $u_t \sim NI(0, \sigma^2)$. Choosing some observed data series corresponding to the theoretical variables $M$, $Y$, $P$ and $I$, say:

$\tilde{M}_t$: M1 money stock;

$\tilde{Y}_t$: real consumers' expenditure;

$\tilde{P}_t$: implicit price deflator of $\tilde{Y}_t$;

$\tilde{I}_t$: interest rate on 7 days' deposit account (see Chapter 17 and its appendix for these data series),

respectively, the above equation can be transformed into the linear regression statistical GM:

$$\tilde{m}_t = \beta_0 + \beta_1 \tilde{y}_t + \beta_2 \tilde{p}_t + \beta_3 \tilde{i}_t + u_t. \tag{19.49}$$

Estimation of this equation for the period 1963i-1982iv ($T = 80$) using quarterly seasonally adjusted (for convenience) data yields

$$\hat{\beta} = (2.896, \; 0.690, \; 0.865, \; -0.055)',$$

$s^2 = 0.00155$, $R^2 = 0.9953$, $\bar{R}^2 = 0.9951$,

TSS = 24.954, ESS = 24.836, RSS = 0.118.

That is, the estimated equation takes the form

$$\tilde{m}_t = 2.896 + 0.690\tilde{y}_t + 0.865\tilde{p}_t - 0.055\tilde{i}_t + \hat{u}_t. \tag{19.50}$$

The danger at this point is to get carried away and start discussing the plausibility of the sign and size of the estimated ‘elasticities’ (?). For example, we might be tempted to argue that the estimated ‘elasticities’ have both a ‘correct’ sign and the size assumed on a priori grounds. Moreover, the ‘goodness of fit’ measures show that we explain 99.5% of the variation. Taken together these results ‘indicate’ that (50) is a good empirical model for the transactions demand for money. This, however, will be rather premature in view of the fact that before any discussion of a priori economic theory information we need to have a well-defined estimated statistical model which at least summarises the sample information adequately. Well defined in the present context refers to ensuring that the assumptions underlying the statistical model adopted are valid. This is because any formal testing of a priori restrictions could only be based on the underlying assumptions, which when invalid render the testing procedures incorrect. Looking at the above estimated equation in view of the discussion of econometric modelling in Chapter 1 several objections might be raised:

(i) The observed data chosen do not correspond one-to-one to the theoretical variables and thus the estimable model might be different from the theoretical model (see Chapter 23).

(ii) The sampling model of an independent sample seems questionable in view of the time paths of the observed data (see Fig. 17.1).

(iii) The high $R^2$ (and $\bar{R}^2$) is due to the fact that the data series for $\tilde{M}_t$ and $\tilde{P}_t$ have a very similar time trend (see Fig. 17.1(a) and (c)). If we look at the time path of the actual ($y_t$) and fitted ($\hat{y}_t$) values we notice that $\hat{y}_t$ ‘tracks’ (explains) largely the trend and very little else (see Fig. 19.1). An obvious way to get some idea of the trend's contribution in $R^2$ is to subtract $p_t$ from both sides of the money equation in an attempt to ‘detrend’ the dependent variable.

Fig. 19.2 Actual $y_t = \ln(M/P)_t$ and fitted $\hat{y}_t$ from (19.51).

In Fig. 19.2 the actual and fitted values of the ‘largely’ detrended dependent variable $(m_t - p_t)$ are shown to emphasise the point. The new regression equation yielded

$$(m_t - p_t) = 2.896 + 0.690y_t - 0.135p_t - 0.055i_t + \hat{u}_t. \tag{19.51}$$

Looking at this estimated equation we can see that the coefficients of the constant, $y_t$ and $i_t$, are identical in value to the previous estimated equation. The estimated coefficient of $p_t$ is, as expected, the original estimate minus one ($0.865 - 1 = -0.135$) and the $s^2$ is identical for both estimated equations. These suggest that the two estimated equations are identical as far as the estimated coefficients are concerned. This is a special case of a more general result related to arbitrary linear combinations of the $x_{it}$s subtracted from both sides of the statistical GM. In order to see this let us subtract $\gamma' x_t$ from both sides of the statistical GM:

$$y_t - \gamma' x_t = (\beta - \gamma)' x_t + u_t$$

or

$$y_t^* = \beta^{*\prime} x_t + u_t,$$

in an obvious notation. It is easy to see that the non-systematic component as well as $\sigma^2$ remain unchanged. Moreover, in view of the equality

It is important to note at this stage that trending data series can be a problem when the asymptotic properties of the MLE's are used uncritically (see sub-section (4) below).
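The result that subtracting a linear combination $\gamma' x_t$ from both sides leaves the residuals unchanged and shifts the coefficients by exactly $\gamma$ can be checked on simulated data. The sketch below assumes NumPy; the data-generating values are made up (they only loosely mimic the magnitudes of the money equation) and the helper `ols` is hypothetical.

```python
import numpy as np

def ols(y, X):
    """Return (beta_hat, residuals) from a least-squares fit (hypothetical helper)."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    return beta_hat, y - X @ beta_hat

# Simulated regressors: a constant and three series, T = 80 (illustrative only).
rng = np.random.default_rng(2)
T = 80
X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])
y = X @ np.array([2.9, 0.7, 0.9, -0.05]) + rng.normal(scale=0.04, size=T)

# gamma picks out the third column, i.e. subtracts that regressor from y,
# in the same way that p_t is subtracted from m_t in the text.
gamma = np.array([0.0, 0.0, 1.0, 0.0])

beta_hat, u_hat = ols(y, X)
beta_star, u_star = ols(y - X @ gamma, X)
```

Here `beta_star` equals `beta_hat - gamma` and `u_star` equals `u_hat` up to rounding error, so $s^2$ is the same in both regressions while any goodness-of-fit measure based on the variation of the new dependent variable will in general differ.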

(3) Properties of the MLE $\hat{\theta} \equiv (\hat{\beta}, \hat{\sigma}^2)$: finite sample

In order to decide whether the MLE $\hat{\theta}$ is a ‘good’ estimator of $\theta$ we need to consider its properties. The finite sample properties (see Chapters 12 and 13) will be considered first and then the asymptotic properties.

$\hat{\theta}$ being a MLE satisfies certain properties by definition:

(1) For a Borel function $h(\cdot)$ the MLE of $h(\theta)$ is $h(\hat{\theta})$. For example, the MLE of $\log(\beta'\beta)$ is $\log(\hat{\beta}'\hat{\beta})$.

In order to discuss any other properties of the MLE $\hat{\theta}$ of $\theta$ we need to derive the sampling distribution of $\hat{\theta}$. Given that $\hat{\beta}$ and $\hat{\sigma}^2$ are independent we can consider them separately. For $\hat{\beta}$,

$$\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1}).$$

(3(i)) $\hat{\beta}$ is an unbiased estimator of $\beta$ since $E(\hat{\beta}) = \beta$, i.e. the sampling distribution of $\hat{\beta}$ has mean equal to $\beta$.

(4(i)) $\hat{\beta}$ is a fully efficient estimator of $\beta$ since $\mathrm{Cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$, i.e. $\mathrm{Cov}(\hat{\beta})$ achieves the Cramer-Rao lower bound; see (30) above.

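For $\hat{\sigma}^2$, with $M_X \equiv I - P_X$ so that $\hat{u} = M_X u$,

$$\frac{T\hat{\sigma}^2}{\sigma^2} = \frac{u' M_X u}{\sigma^2} \sim \chi^2(\mathrm{tr}\, M_X), \tag{19.61}$$

where $\mathrm{tr}\, M_X$ refers to the trace of $M_X$ ($\mathrm{tr}\, A = \sum_{i=1}^{n} a_{ii}$, $A$: $n \times n$),

$$\mathrm{tr}\, M_X = \mathrm{tr}\, I - \mathrm{tr}\, X(X'X)^{-1}X' \quad (\text{since } \mathrm{tr}(A + B) = \mathrm{tr}\, A + \mathrm{tr}\, B)$$

$$= T - \mathrm{tr}\,(X'X)^{-1}X'X = T - k \quad (\text{since } \mathrm{tr}(AB) = \mathrm{tr}(BA)).$$

Using (61) we can deduce that

$$E\left( \frac{T\hat{\sigma}^2}{\sigma^2} \right) = T - k \quad \text{and} \quad \mathrm{Var}\left( \frac{T\hat{\sigma}^2}{\sigma^2} \right) = 2(T - k)$$

(see Appendix 6.1). These results imply that

$$E(\hat{\sigma}^2) = \frac{T - k}{T}\sigma^2 \neq \sigma^2,$$

$$\mathrm{Var}(\hat{\sigma}^2) = \frac{2(T - k)}{T^2}\sigma^4 \neq \frac{2\sigma^4}{T} \quad \text{(the Cramer-Rao lower bound)}.$$

That is:

(3(ii)) $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$; and

(4(ii)) $\hat{\sigma}^2$ is not a fully efficient estimator of $\sigma^2$.

However, 3(ii) implies that for

$$s^2 = \frac{1}{T - k}\hat{u}'\hat{u}$$

we have $E(s^2) = \sigma^2$ and $\mathrm{Var}(s^2) = \dfrac{2\sigma^4}{T - k} > \dfrac{2\sigma^4}{T}$ (the Cramer-Rao bound).

That is, $s^2$ is an unbiased estimator of $\sigma^2$, although it does not quite achieve the Cramer-Rao lower bound given by the information matrix (30) above. It turns out, however, that no other unbiased estimator of $\sigma^2$ achieves that bound and among such estimators $s^2$ has minimum variance. In statistical inference relating to the linear regression model $s^2$ is preferred to $\hat{\sigma}^2$ as an estimator of $\sigma^2$.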
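The bias of $\hat{\sigma}^2$ and the unbiasedness of $s^2$ can be illustrated by simulation. The following Monte Carlo sketch assumes NumPy; the sample size, number of regressors, seed and number of replications are arbitrary illustrative choices. It uses the fact that the residual sum of squares equals $u'M_X u$ with $M_X = I - X(X'X)^{-1}X'$, so only the errors need to be simulated.

```python
import numpy as np

# Arbitrary illustrative settings.
rng = np.random.default_rng(3)
T, k, sigma2, reps = 20, 3, 1.0, 20000

# A fixed design with a constant; M_X is the residual-maker matrix.
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)

# Each row of U is one draw of the error vector u ~ N(0, sigma2 * I_T).
U = rng.normal(scale=np.sqrt(sigma2), size=(reps, T))

# Residual sum of squares u' M_X u for every replication.
rss = np.einsum("ri,ij,rj->r", U, M, U)

sigma2_hat = rss / T        # MLE, divisor T
s2 = rss / (T - k)          # unbiased estimator, divisor T - k

mean_sigma2_hat = sigma2_hat.mean()   # close to (T - k) / T * sigma2 = 0.85
mean_s2 = s2.mean()                   # close to sigma2 = 1.0
var_s2 = s2.var()                     # close to 2 * sigma2**2 / (T - k)
```

With $T = 20$ and $k = 3$ the downward bias of the MLE is substantial (a factor of $17/20$), which is why the distinction matters in small samples and fades as $T$ grows.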

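The sampling distributions of the estimators $\hat{\beta}$ and $s^2$ involve the unknown parameters $\beta$ and $\sigma^2$. In practice the covariance of $\hat{\beta}$ is needed to assess the ‘accuracy’ of the estimates. From the above analysis it is known that

$$\mathrm{Cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1}.$$

An estimate of this covariance, with $s^2$ in place of $\sigma^2$, gives the standard errors shown in parentheses below the coefficient estimates:

$$m_t = \underset{(1.034)}{2.896} + \underset{(0.105)}{0.690}\,y_t + \underset{(0.020)}{0.865}\,p_t - \underset{(0.013)}{0.055}\,i_t + \underset{(0.039)}{\hat{u}_t}, \tag{19.66}$$

$R^2 = 0.9953$, $\bar{R}^2 = 0.9951$, $s = 0.0393$, $\log L = 147.412$, $T = 80$.

Note that having made the distinction between theoretical variables and observed data the upper tildes denoting observed data have been dropped for notational convenience and $R^2$ is used instead of $\tilde{R}^2$ in order to comply with the traditional econometric notation.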
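Coefficient standard errors of the kind reported in parentheses are conventionally obtained as the square roots of the diagonal of $s^2 (X'X)^{-1}$. The sketch below assumes NumPy and that convention; the function name and the five-observation example are hypothetical. The example regresses $y$ on a constant only, where the formula reduces to the familiar $s/\sqrt{T}$ for the sample mean.

```python
import numpy as np

def coefficient_standard_errors(y, X):
    """Return (beta_hat, standard errors, s^2) using s^2 (X'X)^{-1} (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    u_hat = y - X @ beta_hat
    s2 = (u_hat @ u_hat) / (T - k)           # unbiased estimator of sigma^2
    se = np.sqrt(s2 * np.diag(XtX_inv))      # square roots of the diagonal of s^2 (X'X)^{-1}
    return beta_hat, se, s2

# Made-up example: a constant as the only regressor, so beta_hat is the sample mean.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.ones((5, 1))
beta_hat, se, s2 = coefficient_standard_errors(y, X)
```

In this example $\hat{\beta} = 3$, $s^2 = 10/4 = 2.5$ and the standard error is $\sqrt{2.5/5}$.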

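(4) Properties of the MLE $\hat{\theta}_T \equiv (\hat{\beta}, \hat{\sigma}^2)$: asymptotic

An obvious advantage of MLE's is the fact that under certain regularity conditions they satisfy a number of desirable asymptotic properties (see Chapter 13).

(1) Consistency ($\hat{\theta}_T \xrightarrow{P} \theta$)

Looking at the information matrix (30) we can deduce that:

(i) $\hat{\sigma}^2$ is a consistent estimator of $\sigma^2$, i.e.

$$\lim_{T \to \infty} \Pr(|\hat{\sigma}^2 - \sigma^2| < \varepsilon) = 1,$$

since $\mathrm{MSE}(\hat{\sigma}^2) \to 0$ as $T \to \infty$; and

(ii) if

$$\lim_{T \to \infty} (X'X)^{-1} = \lim_{T \to \infty} \left( \sum_{t=1}^{T} x_t x_t' \right)^{-1} = 0, \tag{19.67}$$

then $\lim_{T \to \infty} \Pr(|\hat{\beta} - \beta| < \varepsilon) = 1$, i.e. $\hat{\beta}$ is a consistent estimator of $\beta$.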
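The claim that $\mathrm{MSE}(\hat{\sigma}^2) \to 0$ can be made explicit with a short worked derivation. It is a sketch that takes as given the standard finite-sample moments $E(\hat{\sigma}^2) = \frac{T-k}{T}\sigma^2$ and $\mathrm{Var}(\hat{\sigma}^2) = \frac{2(T-k)}{T^2}\sigma^4$, and assumes the number of regressors $k$ is held fixed as $T$ grows:

```latex
\mathrm{MSE}(\hat{\sigma}^2)
  = \mathrm{Var}(\hat{\sigma}^2) + \bigl[E(\hat{\sigma}^2) - \sigma^2\bigr]^2
  = \frac{2(T-k)}{T^2}\,\sigma^4 + \left(\frac{k}{T}\,\sigma^2\right)^2
  = \frac{2(T-k) + k^2}{T^2}\,\sigma^4
  \;\longrightarrow\; 0 \quad \text{as } T \to \infty .
```

Both the variance term and the squared bias are of order $1/T$ or smaller, so convergence in mean square, and hence in probability, follows.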

Date posted: 17/12/2013
