
THE LINEAR REGRESSION AND RELATED STATISTICAL MODELS

Title: Simple statistical models
Subject: Econometrics
Type: Textbook chapter

CHAPTER 17

Statistical models in econometrics

17.1 Simple statistical models

The main purpose of Parts II and III has been to formulate and discuss the concept of a statistical model, which will form the backbone of the discussion in Part IV. A statistical model has been defined as made up of two related components:

(i) a probability model, $\Phi = \{D(y; \theta),\ \theta \in \Theta\}$,† specifying a parametric family of densities indexed by $\theta$; and

(ii) a sampling model, $y = (y_1, y_2, \ldots, y_T)'$, defining a sample from $D(y; \theta_0)$, for some ‘true’ $\theta_0$ in $\Theta$.

The probability model provides the framework in the context of which the stochastic environment of the real phenomenon being studied can be defined, and the sampling model describes the relationship between the probability model and the observable data. By postulating a statistical model we transform the uncertainty relating to the mechanism giving rise to the observed data to uncertainty relating to some unknown parameter(s) $\theta$ whose estimation determines the stochastic mechanism $D(y; \theta)$.

An example of such a statistical model in econometrics is provided by the modelling of the distribution of personal income. In studying the distribution of personal income higher than a lower limit $y_0$ the following statistical model is often postulated:

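(i) $\Phi = \left\{ D(y/y_0; \theta) = \dfrac{\theta}{y_0}\left(\dfrac{y_0}{y}\right)^{\theta+1},\ \theta \in \mathbb{R}_+,\ y > y_0 \right\}$;

(ii) $y = (y_1, y_2, \ldots, y_T)'$ is a random sample from $D(y/y_0; \theta)$.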
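As a minimal sketch of this statistical model, the Python snippet below codes a density of the form $(\theta/y_0)(y_0/y)^{\theta+1}$ for $y > y_0$ and draws a random sample from it by inverting the distribution function $F(y) = 1 - (y_0/y)^{\theta}$. The function names, the seed and the values $y_0 = 5000$, $\theta = 1.6$, $T = 8$ are assumptions chosen for illustration only.

```python
import numpy as np


def pareto_density(y, y0, theta):
    """Density (theta / y0) * (y0 / y) ** (theta + 1) for y > y0, zero otherwise."""
    y = np.asarray(y, dtype=float)
    inside = y > y0
    # Evaluate the formula only where y > y0; elsewhere the density is zero.
    safe_y = np.where(inside, y, y0)
    return np.where(inside, (theta / y0) * (y0 / safe_y) ** (theta + 1.0), 0.0)


def draw_random_sample(T, y0, theta, rng):
    """Draw T IID observations by inverting F(y) = 1 - (y0 / y) ** theta."""
    u = rng.random(T)  # uniform draws on [0, 1)
    return y0 * (1.0 - u) ** (-1.0 / theta)


rng = np.random.default_rng(0)  # seed fixed only for reproducibility
sample = draw_random_sample(T=8, y0=5000.0, theta=1.6, rng=rng)
print(sample)
```

Every draw comes from the same density and the draws are independent of each other, which is what the random sample assumption in (ii) requires.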

† The notation in Part IV will be somewhat different from the one used in Parts II and III. This change in notation has been made to conform with the established econometric notation.


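For $y$ a random sample the likelihood function is

$$L(\theta; y) = \prod_{t=1}^{T} \frac{\theta}{y_0}\left(\frac{y_0}{y_t}\right)^{\theta+1} = \theta^{T} y_0^{T\theta} (y_1 y_2 \cdots y_T)^{-(\theta+1)},$$

$$\log L(\theta; y) = T \log \theta + T\theta \log y_0 - (\theta + 1) \sum_{t=1}^{T} \log y_t,$$

$$\frac{d \log L}{d\theta} = \frac{T}{\theta} + T \log y_0 - \sum_{t=1}^{T} \log y_t = 0,$$

so that

$$\hat\theta = T \left[ \sum_{t=1}^{T} \log\left(\frac{y_t}{y_0}\right) \right]^{-1}$$

is the maximum likelihood estimator (MLE) of the parameter $\theta$. Since $(d^2 \log L)/d\theta^2 = -T/\theta^2$, the asymptotic distribution of $\hat\theta$ takes the form (see Chapter 13):

$$\sqrt{T}(\hat\theta - \theta) \sim N(0, \theta^2).$$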
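The estimator can be sketched in a few lines of Python. The snippet assumes the MLE has the closed form $\hat\theta = T\,[\sum_t \log(y_t/y_0)]^{-1}$ and that the asymptotic variance is $\theta^2/T$, evaluated at $\hat\theta$. The data are simulated, not the income data of the text, and all names and parameter values are illustrative assumptions.

```python
import numpy as np


def pareto_mle(y, y0):
    """ML estimate of theta: T divided by the sum of log(y_t / y0)."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= y0):
        raise ValueError("all observations must exceed the lower limit y0")
    T = y.size
    return T / np.sum(np.log(y / y0))


def asymptotic_variance(theta_hat, T):
    """Variance implied by sqrt(T)(theta_hat - theta) ~ N(0, theta^2), at theta_hat."""
    return theta_hat ** 2 / T


# Simulated sample (an assumption for illustration only).
rng = np.random.default_rng(1)
true_theta, y0, T = 1.6, 5000.0, 200_000
y = y0 * (1.0 - rng.random(T)) ** (-1.0 / true_theta)

theta_hat = pareto_mle(y, y0)
print(theta_hat, asymptotic_variance(theta_hat, T))
```

With a sample this large the estimate should lie close to the value used to generate the data, in line with the asymptotic result.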

Although in general the finite sample distribution is not frequently available, in this particular case we can derive $D(\hat\theta)$ analytically (see Appendix 6.1). This distribution of $\hat\theta$ can be used to consider the finite sample properties of $\hat\theta$ as well as to test hypotheses or set up confidence intervals for the unknown parameter $\theta$. For instance, in view of the fact that

$$E(\hat\theta) = \left(\frac{T}{T-2}\right)\theta$$

we can deduce that $\hat\theta$ is a biased estimator of $\theta$.

It is of interest in this particular case to assess the ‘accuracy’ of the asymptotic distribution of $\hat\theta$ for a small $T$ ($T = 8$), by noting that

$$\mathrm{Var}(\hat\theta) = \frac{T^2\theta^2}{(T-2)^2(T-3)}$$

(see Johnson and Kotz (1970)). Using the data on income distribution (see Chapter 2), for $y > 5000$ (reproduced below) to estimate $\theta$,

(Table: Income lower …; No. of …)

we get

$$\hat\theta = T \left[ \sum_{t=1}^{T} \log\left(\frac{y_t}{y_0}\right) \right]^{-1} = 1.6$$

as the ML estimate.

Using the invariance property of MLE’s (see Section 13.3) we can deduce that

$$E(\hat\theta) = 2.13, \qquad \mathrm{Var}(\hat\theta) = 0.91.$$

As we can see, for a small sample ($T = 8$) the estimates of the mean and the variance are considerably larger than the ones given by the asymptotic distribution:

$$E(\hat\theta) = 1.6, \qquad \mathrm{Var}(\hat\theta) = \frac{\hat\theta^2}{T} = 0.32.$$

On the other hand, for a much larger sample, say $T = 100$,

$$E(\hat\theta) = 1.63, \qquad \mathrm{Var}(\hat\theta) = 0.028,$$

as compared with

$$E(\hat\theta) = 1.6, \qquad \mathrm{Var}(\hat\theta) = 0.026.$$

These results exemplify the danger of using asymptotic results for small samples and should be viewed as a warning against uncritical use of asymptotic theory. For a more general discussion of asymptotic theory and how to improve upon the asymptotic results see Chapter 10.
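The comparison can be reproduced with a short calculation. The sketch below assumes the finite-sample moments $E(\hat\theta) = T\theta/(T-2)$ and $\mathrm{Var}(\hat\theta) = T^2\theta^2/[(T-2)^2(T-3)]$, which match the figures quoted above approximately, and sets them against the asymptotic mean $\theta$ and variance $\theta^2/T$. Both are evaluated at the reported estimate 1.6. The function names are illustrative.

```python
def finite_sample_moments(theta, T):
    """Mean and variance of the estimator under the assumed finite-sample formulas."""
    if T <= 3:
        raise ValueError("the variance formula requires T > 3")
    mean = T * theta / (T - 2)
    variance = T ** 2 * theta ** 2 / ((T - 2) ** 2 * (T - 3))
    return mean, variance


def asymptotic_moments(theta, T):
    """Mean and variance implied by the asymptotic normal approximation."""
    return theta, theta ** 2 / T


theta_hat = 1.6  # ML estimate reported in the text
for T in (8, 100):
    exact = finite_sample_moments(theta_hat, T)
    approx = asymptotic_moments(theta_hat, T)
    print(f"T={T}: finite-sample mean={exact[0]:.3f}, var={exact[1]:.3f}; "
          f"asymptotic mean={approx[0]:.3f}, var={approx[1]:.3f}")
```

For $T = 8$ the two sets of moments differ markedly, while for $T = 100$ they nearly coincide, which is the point of the warning.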

The statistical inference results derived above in relation to the income distribution example depend crucially on the appropriateness of the statistical model postulated. That is, the statistical model should represent a good approximation of the real phenomenon to be explained in a way which takes account of the nature of the available data. For example, if the data were collected using stratified sampling then the random sample assumption is inappropriate (see Section 17.2 below). When any of the assumptions underlying the statistical model are invalid the above estimation results are unwarranted.

In the next three sections it is argued that for the purposes of econometric modelling we need to extend the simple statistical model based on a random sample, illustrated above, in certain specific directions as required by the particular features of econometric modelling. In Section 17.2 we consider the nature of economic data commonly available and discuss its implications for the form of the sampling model. It is argued that for most forms of economic data the random sample assumption is inappropriate. Section 17.3 considers the question of constructing probability models if the identically distributed assumption does not hold. The concept of a statistical generating mechanism (GM) is introduced in Section 17.4 in order to supplement the probability and sampling models. This additional component enables us to accommodate certain specific features of econometric modelling. In Section 17.5 the main statistical models of interest in econometrics are summarised as a prelude to the discussion which follows.

17.2 Economic data and the sampling model

Economic data are usually non-experimental in nature and come in one of three forms:

(i) time series, measuring a particular variable at successive points in time (annual, quarterly, monthly or weekly);

(ii) cross-section, measuring a particular variable at a given point in time over different units (persons, households, firms, industries, countries, etc.);

(iii) panel data, which refer to cross-section data over time.

Economic data such as M1 money stock (M), real consumers’ expenditure (Y) and its implicit deflator (P), interest rate on 7 days’ deposit account (I), over time, are examples of time-series data (see Appendix, Table 17.2). The income data used in Chapter 2 are cross-section data on 23 000 households in the UK for 1979-80. Using the same 23 000 households of the cross-section observed over time we could generate panel data on income. In practice, panel data are rather rare in econometrics because of the difficulties involved in gathering such data. For a thorough discussion of econometric modelling using panel data see Chamberlain (1984).

The econometric modeller is rarely involved directly with the data collection and refinement and often has to use published data knowing very little about their origins. This lack of knowledge can have serious repercussions on the modelling process and lead to misleading conclusions. Ignorance related to how the data were collected can lead to an erroneous choice of an appropriate sampling model. Moreover, if the choice of the data is based only on the name they carry and not on intimate knowledge about what exactly they are measuring, it can lead to an inappropriate choice of the statistical GM (see Section 17.4 below) and some misleading conclusions about the relationship between the estimated econometric model and the theoretical model as suggested by economic theory (see Chapter 1). Let us consider the relationship between the nature of the data and the sampling model in some more detail.

In Chapter 11 we discussed three basic forms of a sampling model:

(i) random sample: a set of independent and identically distributed (IID) random variables (r.v.’s);

(ii) independent sample: a set of independent but not identically distributed r.v.’s; and

(iii) non-random sample: a set of non-IID r.v.’s.
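The three sampling models can be illustrated with simulated data. In the sketch below the normal distribution, the linear trend in the mean and the first-order autoregression are assumptions chosen only to produce one example of each case. They are not the models used in the text.

```python
import numpy as np


def random_sample(T, rng):
    """IID: every y_t is drawn from the same N(0, 1) distribution."""
    return rng.normal(loc=0.0, scale=1.0, size=T)


def independent_sample(T, rng):
    """Independent but not identically distributed: the mean changes with t."""
    means = np.linspace(0.0, 5.0, T)
    return rng.normal(loc=means, scale=1.0)


def non_random_sample(T, rng, rho=0.9):
    """Non-IID: each y_t depends on y_{t-1} through y_t = rho * y_{t-1} + e_t."""
    shocks = rng.normal(size=T)
    y = np.empty(T)
    y[0] = shocks[0]
    for t in range(1, T):
        y[t] = rho * y[t - 1] + shocks[t]
    return y


def lag1_correlation(y):
    """Sample correlation between y_t and y_{t-1}."""
    return np.corrcoef(y[:-1], y[1:])[0, 1]


rng = np.random.default_rng(2)
T = 5000
iid = random_sample(T, rng)
indep = independent_sample(T, rng)
dependent = non_random_sample(T, rng)

print("IID lag-1 correlation:", lag1_correlation(iid))
print("independent sample, mean of each half:", indep[: T // 2].mean(), indep[T // 2:].mean())
print("non-random sample lag-1 correlation:", lag1_correlation(dependent))
```

The shift in the mean marks the second case as heterogeneous, and the sizeable lag-1 correlation marks the third as dependent.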

For cross-section data selected by the simple random sampling method (where every unit in the target population has the same probability of being selected), the sampling model of a random sample seems the most appropriate choice. On the other hand, for cross-section data selected by the stratified sampling method (the target population divided into a number of groups (strata) with every unit in each group having the same probability of being selected), the identically distributed assumption seems rather inappropriate. The fact that the groups are chosen a priori in some systematic way renders the identically distributed assumption inappropriate. For such cross-section data the sampling model of an independent sample seems more appropriate. The independence assumption can be justified if sampling within and between groups is random.
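The difference between the two selection methods can be sketched as follows. The population, the two strata and the sample sizes are hypothetical and serve only to show why observations from different strata need not share a distribution.

```python
import numpy as np


def simple_random_sample(population, n, rng):
    """Every unit in the population has the same chance of selection."""
    return rng.choice(population, size=n, replace=False)


def stratified_sample(strata, n_per_stratum, rng):
    """Draw n_per_stratum units at random from each pre-defined group."""
    return {name: rng.choice(units, size=n_per_stratum, replace=False)
            for name, units in strata.items()}


rng = np.random.default_rng(3)
# Hypothetical population of incomes split into two strata chosen a priori.
strata = {
    "lower": rng.uniform(5_000.0, 10_000.0, size=900),
    "upper": rng.uniform(10_000.0, 50_000.0, size=100),
}
population = np.concatenate(list(strata.values()))

srs = simple_random_sample(population, n=50, rng=rng)
by_stratum = stratified_sample(strata, n_per_stratum=25, rng=rng)
print(srs.mean(), {name: units.mean() for name, units in by_stratum.items()})
```

Units drawn from the upper stratum follow a different distribution from those drawn from the lower one, so the pooled stratified sample is independent but not identically distributed.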

For time-series data the sampling models of a random or an independent sample seem rather unrealistic on a priori grounds, leaving the non-random sample as the most likely sampling model to postulate at the outset. For the time-series data plotted against time in Fig. 17.1(a)-(d) the assumption that they represent realisations of stochastic processes (see Chapter 8) seems more realistic than their being realisations of IID r.v.’s. The plotted series exhibit considerable time dependence. This is confirmed in Chapter 23 where these series are used to estimate a money adjustment equation. In Chapters 19-22 the sampling model of an independent sample is intentionally maintained for the example which involves these data series and several misleading conclusions are noted throughout.
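One simple way to check a series for time dependence is to compute its sample autocorrelations. The sketch below does so for two simulated series, since the data behind Fig. 17.1 are not reproduced here. The white-noise and random-walk series, the seed and the function name are assumptions for illustration.

```python
import numpy as np


def sample_autocorrelation(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series y."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:-k]) / denom
                     for k in range(1, max_lag + 1)])


rng = np.random.default_rng(4)
T = 2000
white_noise = rng.normal(size=T)             # IID benchmark
random_walk = np.cumsum(rng.normal(size=T))  # strongly time-dependent series

print(sample_autocorrelation(white_noise, 4))
print(sample_autocorrelation(random_walk, 4))
```

Autocorrelations near zero are what IID r.v.’s would produce, whereas values near one at short lags indicate the kind of time dependence that makes a non-random sample the natural sampling model.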

In order to be able to take explicitly into consideration the nature of the observed data chosen in the context of econometric modelling, the statistical models of particular interest in econometrics will be specified in terms of the observable r.v.’s giving rise to the data rather than the error term, the usual approach in econometrics textbooks (see Theil (1971), Maddala (1977), Judge et al. (1982) inter alia). The approach adopted in the present book is to extend the statistical models considered so far in Part III in order to accommodate certain specific features of econometric modelling. In particular a third component, called a statistical generating mechanism (GM), will be added to the probability and sampling models in order to enable us to summarise the information involved in a way which provides ‘an adequate’ approximation to the actual DGP giving rise to the observed data (see Chapter 1). This additional component will be considered extensively in Section 17.4 below. In the next section the nature of the probability models required in econometric modelling will be discussed in view of the above discussion of the sampling model.

Fig. 17.1. (a) Money stock (£ million). (b) Real consumers’ expenditure. (c) Implicit price deflator. (d) Interest rate on 7 days’ deposit account. Each series is plotted against time.


17.3 Economic data and the probability model

In Chapter 1 it was argued that the specification of statistical models should take account not only of the theoretical a priori information available but of the nature of the observed data chosen as well. This is because the specification of statistical models proposed in the present book is based on the observable random variable giving rise to the observed data and not on attaching a white-noise error term to the theoretical model. This strategy implies that the modeller should consider assumptions such as independence, stationarity, mixing (see Chapter 8) in relation to the observed data at the outset.

As argued in Section 17.2, the sampling model of a random sample seems rather unrealistic for most situations in econometric modelling in view of the economic data usually available. Because of the interrelationship between the sampling and the probability model we need to extend the simple probability model $\Phi = \{D(y; \theta),\ \theta \in \Theta\}$ associated with a random sample to ones related to independent and non-random samples.

An independent (but non-identically distributed) sample $y = (y_1, \ldots, y_T)'$ raises questions of time-heterogeneity in the context of the corresponding probability model. This is because in general every element $y_t$ of $y$ has its own distribution with different parameters, $D(y_t; \theta_t)$. The parameters $\theta_t$ which depend on $t$ are called incidental parameters. A probability model related to $y$ takes the general form

$$\Phi = \{D(y_t; \theta_t),\ \theta_t \in \Theta,\ t \in \mathbb{T}\} \qquad (17.1)$$

where $\mathbb{T} = \{1, 2, \ldots\}$ is an index set.

A non-random sample $y$ raises questions not only of time-heterogeneity but of time-dependence as well. In this case we need the joint distribution of $y$ in order to define an appropriate probability model of the general form

$$\Phi = \{D(y_1, y_2, \ldots, y_T; \theta_T),\ \theta_T \in \Theta,\ T_1 = (1, 2, \ldots, T) \subseteq \mathbb{T}\} \qquad (17.2)$$

In both of the above cases the observed data can be viewed as realisations of the stochastic process $\{y_t,\ t \in \mathbb{T}\}$ and for modelling purposes we need to restrict its generality using assumptions such as normality, stationarity and asymptotic independence and/or supplement the sample and theoretical information available. In order to illustrate these let us consider the simplest case of an independent sample and one incidental parameter:

(i) $\Phi = \left\{ D(y_t; \theta_t) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\dfrac{1}{2\sigma^2}(y_t - \mu_t)^2 \right\},\ \theta_t = (\mu_t, \sigma^2) \in \mathbb{R} \times \mathbb{R}_+,\ t \in \mathbb{T} \right\}$;

(ii) $y = (y_1, y_2, \ldots, y_T)'$ is an independent sample from $D(y_t; \theta_t)$, $t = 1, 2, \ldots, T$, respectively.

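The probability model postulates a normal density with mean $\mu_t$ (an incidental parameter) and variance $\sigma^2$. The sampling model allows each $y_t$ to have a different mean but the same variance and to be independent of the other $y_t$’s. The distribution of the sample for the above statistical model, $D(y; \theta)$, where $y = (y_1, y_2, \ldots, y_T)'$ and $\theta = (\mu_1, \mu_2, \ldots, \mu_T, \sigma^2)$, is

$$D(y; \theta) = \prod_{t=1}^{T} D(y_t; \mu_t, \sigma^2) = (2\pi\sigma^2)^{-T/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - \mu_t)^2 \right\}.$$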

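As we can see, there are $T + 1$ unknown parameters, $\theta = (\sigma^2, \mu_1, \mu_2, \ldots, \mu_T)$, to be estimated and only $T$ observations, which provides us with sufficient warning that there will be problems. This is indeed confirmed by the maximum likelihood (ML) method. The log likelihood is

$$\log L(\theta; y) = \text{const} - \frac{T}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - \mu_t)^2, \qquad (17.4)$$

$$\frac{\partial \log L}{\partial \mu_t} = -\frac{1}{2\sigma^2}(-2)(y_t - \mu_t) = 0, \quad t = 1, 2, \ldots, T, \qquad (17.5)$$

$$\frac{\partial \log L}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{t=1}^{T} (y_t - \mu_t)^2 = 0. \qquad (17.6)$$

These first-order conditions imply that $\hat\mu_t = y_t$, $t = 1, 2, \ldots, T$, and $\hat\sigma^2 = 0$.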

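Before we rush into pronouncing these as MLE’s it is important to look at the second-order conditions for a maximum:

$$\frac{\partial^2 \log L}{\partial \mu_t^2} = -\frac{1}{\sigma^2}, \qquad \frac{\partial^2 \log L}{\partial (\sigma^2)^2} = \frac{T}{2\sigma^4} - \frac{1}{\sigma^6} \sum_{t=1}^{T} (y_t - \mu_t)^2,$$

which are unbounded at $\hat\sigma^2 = 0$, and hence $\hat\mu_t$ and $\hat\sigma^2$ are not MLE’s; see Section 13.3.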

This suggests that there is not enough information in the statistical model (i)-(ii) above to estimate the statistical parameters $\theta = (\mu_1, \mu_2, \ldots, \mu_T, \sigma^2)$.
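The breakdown can be seen numerically. The sketch below assumes the normal log likelihood with one mean per observation and a common variance. Setting each $\mu_t$ equal to $y_t$ makes the sum of squares vanish, so the log likelihood reduces to $-(T/2)\log(2\pi\sigma^2)$, which grows without bound as $\sigma^2$ approaches zero. The data, the seed and the function name are illustrative assumptions.

```python
import numpy as np


def log_likelihood(y, mu, sigma2):
    """Normal log likelihood with observation-specific means and common variance."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    T = y.size
    return -0.5 * T * np.log(2.0 * np.pi * sigma2) - np.sum((y - mu) ** 2) / (2.0 * sigma2)


rng = np.random.default_rng(5)
# Five observations, each with its own mean (hypothetical values 0, 1, 2, 3, 4).
y = rng.normal(loc=np.arange(5.0), scale=1.0)

# Fit every mean perfectly (mu_t = y_t) and let the variance shrink towards zero.
variances = (1.0, 1e-2, 1e-4, 1e-8)
values = [log_likelihood(y, mu=y, sigma2=s2) for s2 in variances]
for s2, value in zip(variances, values):
    print(f"sigma^2 = {s2:g}: log L = {value:.2f}")
```

The log likelihood keeps rising as the variance shrinks, so no finite maximum exists: $T$ observations cannot pin down $T + 1$ parameters.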

An obvious way to supplement this information is in the form of panel data for $y_t$, say $y_{it}$, $i = 1, 2, \ldots, N$, $t = 1, 2, \ldots, T$. In the case where $N$ realisations of $y_t$ are available at each $t$, $\theta$ could be estimated by

Date posted: 17/12/2013
