The linear regression and related statistical models
CHAPTER 17
Statistical models in econometrics
17.1 Simple statistical models
The main purpose of Parts II and III has been to formulate and discuss the concept of a statistical model, which will form the backbone of the discussion in Part IV. A statistical model has been defined as made up of two related components:
(i) a probability model, $\Phi = \{D(y; \theta),\ \theta \in \Theta\}$,† specifying a parametric family of densities indexed by $\theta$; and
(ii) a sampling model, $\mathbf{y} = (y_1, y_2, \ldots, y_T)'$, defining a sample from $D(y; \theta_0)$, for some 'true' $\theta_0$ in $\Theta$.
The probability model provides the framework within which the stochastic environment of the real phenomenon being studied can be defined, and the sampling model describes the relationship between the probability model and the observable data. By postulating a statistical model we transform the uncertainty relating to the mechanism giving rise to the observed data into uncertainty relating to some unknown parameter(s) $\theta$ whose estimation determines the stochastic mechanism $D(y; \theta)$.
An example of such a statistical model in econometrics is provided by the modelling of the distribution of personal income. In studying the distribution of personal income higher than a lower limit $y_0$, the following statistical model is often postulated:
(i) $\Phi = \left\{ D(y/y_0; \theta) = \dfrac{\theta}{y_0}\left(\dfrac{y_0}{y}\right)^{\theta+1},\ \theta \in \mathbb{R}_+,\ y > y_0 \right\}$;
(ii) $\mathbf{y} = (y_1, y_2, \ldots, y_T)'$ is a random sample from $D(y/y_0; \theta)$.
† The notation in Part IV will be somewhat different from the one used in Parts II and III. This change in notation has been made to conform with the established econometric notation.
Note:
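For $\mathbf{y}$ a random sample the likelihood function is
$$L(\theta; \mathbf{y}) = \prod_{t=1}^{T} \frac{\theta}{y_0}\left(\frac{y_0}{y_t}\right)^{\theta+1} = \theta^{T} y_0^{T\theta} (y_1 y_2 \cdots y_T)^{-(\theta+1)}, \quad \theta \in \mathbb{R}_+,$$
$$\log L(\theta; \mathbf{y}) = T \log \theta + T\theta \log y_0 - (\theta + 1) \sum_{t=1}^{T} \log y_t,$$
$$\frac{d \log L}{d\theta} = \frac{T}{\theta} + T \log y_0 - \sum_{t=1}^{T} \log y_t = 0,$$
so that
$$\hat{\theta} = T \left[ \sum_{t=1}^{T} \log\left(\frac{y_t}{y_0}\right) \right]^{-1}$$
is the maximum likelihood estimator (MLE) of the parameter $\theta$. Since $(d^2 \log L)/d\theta^2 = -T/\theta^2$, the asymptotic distribution of $\hat{\theta}$ takes the form (see Chapter 13):
$$\sqrt{T}(\hat{\theta} - \theta) \sim N(0, \theta^2).$$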
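The closed-form estimator above is easy to check numerically. The following is a minimal Python sketch, not part of the original text: it draws a Pareto sample by inverting the distribution function $F(y) = 1 - (y_0/y)^{\theta}$ implied by the density in (i), and then evaluates $\hat{\theta} = T[\sum_t \log(y_t/y_0)]^{-1}$. The function names, the seed and the parameter values ($\theta = 1.6$, $y_0 = 5000$, $T = 200\,000$) are assumptions chosen only for illustration.

```python
import math
import random


def simulate_pareto(theta, y0, T, seed=0):
    """Draw T values with density (theta/y0) * (y0/y)**(theta + 1) above y0.

    Uses the inverse of F(y) = 1 - (y0/y)**theta applied to uniform draws.
    """
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the negative power is always defined.
    return [y0 * (1.0 - rng.random()) ** (-1.0 / theta) for _ in range(T)]


def pareto_mle(y, y0):
    """MLE of theta for a known lower limit y0: T / sum_t log(y_t / y0)."""
    return len(y) / sum(math.log(v / y0) for v in y)


# Hypothetical settings, chosen only for illustration.
sample = simulate_pareto(theta=1.6, y0=5000.0, T=200_000, seed=42)
theta_hat = pareto_mle(sample, y0=5000.0)
print(f"theta_hat = {theta_hat:.3f}")
```

With a sample this large the estimate should lie close to the value of $\theta$ used in the simulation, in line with the asymptotic result.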
Although in general the finite sample distribution is rarely available, in this particular case $D(\hat{\theta})$ can be derived analytically (see Appendix 6.1). This distribution of $\hat{\theta}$ can be used to consider the finite sample properties of $\hat{\theta}$ as well as to test hypotheses or set up confidence intervals for the unknown parameter $\theta$. For instance, in view of the fact that
$$E(\hat{\theta}) = \left(\frac{T}{T-2}\right)\theta$$
we can deduce that $\hat{\theta}$ is a biased estimator of $\theta$.
It is of interest in this particular case to assess the 'accuracy' of the asymptotic distribution of $\hat{\theta}$ for a small $T$ ($T = 8$), by noting that
$$\mathrm{Var}(\hat{\theta}) = \frac{T^2 \theta^2}{(T-2)^2 (T-3)}$$
(see Johnson and Kotz (1970)). Using the data on income distribution (see Chapter 2) for $y > 5000$ (reproduced below) to estimate $\theta$,
[Table with columns 'Income lower ...' and 'No. of ...'; entries not recoverable]
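we get
$$\hat{\theta} = T \left[ \sum_{t=1}^{T} \log\left(\frac{y_t}{y_0}\right) \right]^{-1} = 1.6$$
as the ML estimate. Using the invariance property of MLEs (see Section 13.3) we can deduce that
$$E(\hat{\theta}) = 2.13, \quad \mathrm{Var}(\hat{\theta}) = 0.91.$$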
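As we can see, for a small sample ($T = 8$) the estimates of the mean and the variance are considerably larger than the ones given by the asymptotic distribution:
$$E(\hat{\theta}) = 1.6, \quad \mathrm{Var}(\hat{\theta}) = \frac{\hat{\theta}^2}{T} = 0.32.$$
On the other hand, for a much larger sample, say $T = 100$,
$$E(\hat{\theta}) = 1.63, \quad \mathrm{Var}(\hat{\theta}) = 0.028,$$
as compared with
$$E(\hat{\theta}) = 1.6, \quad \mathrm{Var}(\hat{\theta}) = 0.026.$$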
These results exemplify the danger of using asymptotic results for small samples and should be viewed as a warning against uncritical use of asymptotic theory. For a more general discussion of asymptotic theory and how to improve upon the asymptotic results see Chapter 10.
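The comparison above can be reproduced in a few lines of Python. The sketch below assumes the finite-sample moments $E(\hat{\theta}) = T\theta/(T-2)$ and $\mathrm{Var}(\hat{\theta}) = T^2\theta^2/[(T-2)^2(T-3)]$, which agree with the values quoted in the text, together with the asymptotic variance $\theta^2/T$; the function names are illustrative and the unknown $\theta$ is replaced by the estimate 1.6.

```python
def finite_sample_moments(theta, T):
    """Mean and variance of theta_hat under the assumed exact formulas (needs T > 3)."""
    mean = T * theta / (T - 2)
    var = (T ** 2) * theta ** 2 / ((T - 2) ** 2 * (T - 3))
    return mean, var


def asymptotic_moments(theta, T):
    """Mean and variance implied by sqrt(T) * (theta_hat - theta) ~ N(0, theta**2)."""
    return theta, theta ** 2 / T


theta_hat = 1.6  # ML estimate reported in the text
for T in (8, 100):
    fs_mean, fs_var = finite_sample_moments(theta_hat, T)
    as_mean, as_var = asymptotic_moments(theta_hat, T)
    print(f"T={T:3d}  finite-sample: mean={fs_mean:.2f} var={fs_var:.3f}  "
          f"asymptotic: mean={as_mean:.2f} var={as_var:.3f}")
```

The gap between the two sets of moments shrinks as $T$ grows, which is the point of the warning above.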
The statistical inference results derived above in relation to the income distribution example depend crucially on the appropriateness of the statistical model postulated. That is, the statistical model should represent a good approximation of the real phenomenon to be explained in a way which takes account of the nature of the available data. For example, if the data were collected using stratified sampling then the random sample assumption is inappropriate (see Section 17.2 below). When any of the assumptions underlying the statistical model are invalid the above estimation results are unwarranted.
In the next three sections it is argued that for the purposes of econometric modelling we need to extend the simple statistical model based on a random sample, illustrated above, in certain specific directions as required by the particular features of econometric modelling. In Section 17.2 we consider the nature of economic data commonly available and discuss its implications for the form of the sampling model. It is argued that for most forms of economic data the random sample assumption is inappropriate. Section 17.3 considers the question of constructing probability models if the identically distributed assumption does not hold. The concept of a statistical generating mechanism (GM) is introduced in Section 17.4 in order to supplement the probability and sampling models. This additional component enables us to accommodate certain specific features of econometric modelling. In Section 17.5 the main statistical models of interest in econometrics are summarised as a prelude to the discussion which follows.
17.2 Economic data and the sampling model
Economic data are usually non-experimental in nature and come in one of three forms:
(i) time series, measuring a particular variable at successive points in time (annual, quarterly, monthly or weekly);
(ii) cross-section, measuring a particular variable at a given point in time over different units (persons, households, firms, industries, countries, etc.);
(iii) panel data, which refer to cross-section data over time.
Economic data such as M1 money stock (M), real consumers' expenditure (Y) and its implicit deflator (P), interest rate on 7 days' deposit account (I), over time, are examples of time-series data (see Appendix, Table 17.2). The income data used in Chapter 2 are cross-section data on 23 000 households in the UK for 1979-80. Using the same 23 000 households of the cross-section observed over time we could generate panel data on income. In practice, panel data are rather rare in econometrics because of the difficulties involved in gathering such data. For a thorough discussion of econometric modelling using panel data see Chamberlain (1984).
The econometric modeller is rarely involved directly with the data collection and refinement and often has to use published data knowing very little about their origins. This lack of knowledge can have serious repercussions on the modelling process and lead to misleading conclusions. Ignorance of how the data were collected can lead to an erroneous choice of sampling model. Moreover, if the choice of the data is based only on the name they carry and not on intimate knowledge of what exactly they are measuring, it can lead to an inappropriate choice of the statistical GM (see Section 17.4 below) and some misleading conclusions about the relationship between the estimated econometric model and the theoretical model as suggested by economic theory (see Chapter 1). Let us consider the relationship between the nature of the data and the sampling model in some more detail.
In Chapter 11 we discussed three basic forms of a sampling model:
(i) random sample: a set of independent and identically distributed (IID) random variables (r.v.'s);
(ii) independent sample: a set of independent but not identically distributed r.v.'s; and
(iii) non-random sample: a set of non-IID r.v.'s.
For cross-section data selected by the simple random sampling method (where every unit in the target population has the same probability of being selected), the sampling model of a random sample seems the most appropriate choice. On the other hand, for cross-section data selected by the stratified sampling method (the target population is divided into a number of groups (strata), with every unit in each group having the same probability of being selected), the identically distributed assumption seems rather inappropriate, because the groups are chosen a priori in some systematic way. For such cross-section data the sampling model of an independent sample seems more appropriate. The independence assumption can be justified if sampling within and between groups is random.
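To see why stratification undermines the identically distributed assumption, consider the following minimal Python sketch. It builds a hypothetical population split into two strata with different income levels and samples at random within each stratum; the strata, the income figures and all names are invented for illustration and are not the data used in the text.

```python
import random
import statistics


def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw n_per_stratum units at random, without replacement, from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


# Hypothetical population: two strata whose incomes follow different distributions.
pop_rng = random.Random(1)
strata = {
    "low": [pop_rng.gauss(8_000, 1_000) for _ in range(5_000)],
    "high": [pop_rng.gauss(30_000, 5_000) for _ in range(5_000)],
}

draws = stratified_sample(strata, n_per_stratum=200, seed=2)
stratum_means = {name: statistics.mean(values) for name, values in draws.items()}
print(stratum_means)
```

Units drawn from different strata are independent of each other but come from different distributions, which is the independent sample case described above.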
For time-series data the sampling models of a random or an independent sample seem rather unrealistic on a priori grounds, leaving the non-random sample as the most likely sampling model to postulate at the outset. For the time-series data plotted against time in Fig. 17.1(a)-(d) the assumption that they represent realisations of stochastic processes (see Chapter 8) seems more realistic than their being realisations of IID r.v.'s. The plotted series exhibit considerable time dependence. This is confirmed in Chapter 23 where these series are used to estimate a money adjustment equation. In Chapters 19-22 the sampling model of an independent sample is intentionally maintained for the example which involves these data series and several misleading conclusions are noted throughout.
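A simple diagnostic for time dependence is the first-order sample autocorrelation. The Python sketch below uses simulated series, not the data of Fig. 17.1: a first-order autoregression with coefficient 0.9 is an assumed stand-in for a time-dependent economic series, and the same recursion with coefficient 0 yields IID draws for comparison. All names and parameter values are illustrative.

```python
import random


def simulate_ar1(rho, T, seed=0):
    """Simulate y_t = rho * y_{t-1} + e_t with standard normal e_t.

    The series is started from a standard normal draw.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0)]
    for _ in range(T - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y


def lag1_autocorr(x):
    """Sample first-order autocorrelation coefficient."""
    mean = sum(x) / len(x)
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, len(x)))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


iid_series = simulate_ar1(rho=0.0, T=2_000, seed=1)        # rho = 0 gives IID draws
dependent_series = simulate_ar1(rho=0.9, T=2_000, seed=1)  # strongly time dependent
print(lag1_autocorr(iid_series), lag1_autocorr(dependent_series))
```

An autocorrelation near zero is compatible with an independent sample; a value well away from zero points to the non-random sample as the sampling model to postulate.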
In order to be able to take explicitly into consideration the nature of the observed data chosen in the context of econometric modelling, the statistical models of particular interest in econometrics will be specified in terms of the observable r.v.'s giving rise to the data rather than the error term, the usual approach in econometrics textbooks (see Theil (1971), Maddala (1977), Judge et al. (1982) inter alia). The approach adopted in the present book is to extend the statistical models considered so far in Part III in order to accommodate certain specific features of econometric modelling. In particular a third component, called a statistical generating mechanism (GM), will be added to the probability and sampling models in order to enable us to summarise the information involved in a way which provides 'an adequate' approximation to the actual DGP giving rise to the observed data (see Chapter 1). This additional component will be considered extensively in Section 17.4 below. In the next section the nature of the probability models required in econometric modelling will be discussed in view of the above discussion of the sampling model.
Fig. 17.1 (a) Money stock, £ (million). (b) Real consumers' expenditure. (c) Implicit price deflator. (d) Interest rate on 7 days' deposit account. Each panel is plotted against time.
17.3 Economic data and the probability model
In Chapter 1 it was argued that the specification of statistical models should take account not only of the theoretical a priori information available but of the nature of the observed data chosen as well. This is because the specification of statistical models proposed in the present book is based on the observable random variables giving rise to the observed data and not on attaching a white-noise error term to the theoretical model. This strategy implies that the modeller should consider assumptions such as independence, stationarity and mixing (see Chapter 8) in relation to the observed data at the outset.
As argued in Section 17.2, the sampling model of a random sample seems rather unrealistic for most situations in econometric modelling in view of the economic data usually available. Because of the interrelationship between the sampling and the probability model we need to extend the simple probability model $\Phi = \{D(y; \theta),\ \theta \in \Theta\}$ associated with a random sample to ones related to independent and non-random samples.
An independent (but non-identically distributed) sample $\mathbf{y} = (y_1, \ldots, y_T)'$ raises questions of time-heterogeneity in the context of the corresponding probability model. This is because in general every element $y_t$ of $\mathbf{y}$ has its own distribution with different parameters, $D(y_t; \theta_t)$. The parameters $\theta_t$, which depend on $t$, are called incidental parameters. A probability model related to $\mathbf{y}$ takes the general form
$$\Phi = \{D(y_t; \theta_t),\ \theta_t \in \Theta,\ t \in \mathbb{T}\}, \tag{17.1}$$
where $\mathbb{T} = \{1, 2, \ldots\}$ is an index set.
A non-random sample $\mathbf{y}$ raises questions not only of time-heterogeneity but of time-dependence as well. In this case we need the joint distribution of $\mathbf{y}$ in order to define an appropriate probability model of the general form
$$\Phi = \{D(y_1, y_2, \ldots, y_T; \theta_T),\ \theta_T \in \Theta,\ T_1 = (1, 2, \ldots, T) \subseteq \mathbb{T}\}. \tag{17.2}$$
In both of the above cases the observed data can be viewed as realisations of the stochastic process $\{y_t, t \in \mathbb{T}\}$, and for modelling purposes we need to restrict its generality using assumptions such as normality, stationarity and asymptotic independence and/or supplement the sample and theoretical information available. In order to illustrate these let us consider the simplest case of an independent sample and one incidental parameter:
(i) $\Phi = \left\{ D(y_t; \theta_t) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\dfrac{1}{2\sigma^2}(y_t - \mu_t)^2 \right\},\ \theta_t = (\mu_t, \sigma^2) \in \mathbb{R} \times \mathbb{R}_+,\ t \in \mathbb{T} \right\}$;
(ii) $\mathbf{y} = (y_1, y_2, \ldots, y_T)'$ is an independent sample from $D(y_t; \theta_t)$, $t = 1, 2, \ldots, T$, respectively.
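The probability model postulates a normal density with mean $\mu_t$ (an incidental parameter) and variance $\sigma^2$. The sampling model allows each $y_t$ to have a different mean but the same variance and to be independent of the other $y_t$s. The distribution of the sample for the above statistical model, $D(\mathbf{y}; \theta)$, where $\mathbf{y} = (y_1, y_2, \ldots, y_T)'$ and $\theta = (\mu_1, \mu_2, \ldots, \mu_T, \sigma^2)$, is
$$D(\mathbf{y}; \theta) = \prod_{t=1}^{T} D(y_t; \mu_t, \sigma^2) = (2\pi\sigma^2)^{-T/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - \mu_t)^2 \right\}.$$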
As we can see, there are $T + 1$ unknown parameters, $\theta = (\sigma^2, \mu_1, \mu_2, \ldots, \mu_T)$, to be estimated and only $T$ observations, which provides us with sufficient warning that there will be problems. This is indeed confirmed by the maximum likelihood (ML) method. The log likelihood is
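$$\log L(\theta; \mathbf{y}) = \text{const} - \frac{T}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - \mu_t)^2, \tag{17.4}$$
$$\frac{\partial \log L}{\partial \mu_t} = -\frac{1}{2\sigma^2}(-2)(y_t - \mu_t) = 0, \quad t = 1, 2, \ldots, T, \tag{17.5}$$
$$\frac{\partial \log L}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{t=1}^{T} (y_t - \mu_t)^2 = 0. \tag{17.6}$$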
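These first-order conditions imply that $\hat{\mu}_t = y_t$, $t = 1, 2, \ldots, T$, and $\hat{\sigma}^2 = 0$. Before we rush into pronouncing these as MLEs it is important to look at the second-order conditions for a maximum:
$$\frac{\partial^2 \log L}{\partial \mu_t^2} = -\frac{1}{\sigma^2}, \qquad \frac{\partial^2 \log L}{\partial (\sigma^2)^2} = \frac{T}{2\sigma^4} - \frac{1}{\sigma^6} \sum_{t=1}^{T} (y_t - \mu_t)^2,$$
which are unbounded at $\hat{\sigma}^2 = 0$, and hence $\hat{\mu}_t$ and $\hat{\sigma}^2$ are not MLEs; see Section 13.3.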
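The breakdown can also be seen numerically. The following is a minimal Python sketch, assuming the normal log likelihood $-(T/2)\log(2\pi\sigma^2) - (1/2\sigma^2)\sum_t (y_t - \mu_t)^2$ and a handful of made-up observations: once $\mu_t$ is set equal to $y_t$ the sum of squares vanishes, and the log likelihood keeps increasing as $\sigma^2$ is pushed towards zero, so no finite maximum exists.

```python
import math


def log_likelihood(y, mu, sigma2):
    """Normal log likelihood with observation-specific means and a common variance."""
    T = len(y)
    ss = sum((yt - mt) ** 2 for yt, mt in zip(y, mu))
    return -0.5 * T * math.log(2.0 * math.pi * sigma2) - ss / (2.0 * sigma2)


y = [1.2, 0.7, 3.1, 2.4, 1.9]   # made-up observations, for illustration only
mu_hat = list(y)                # the first-order conditions give mu_hat_t = y_t

# Evaluate the log likelihood along a sequence of shrinking variances.
values = [log_likelihood(y, mu_hat, s2) for s2 in (1.0, 1e-2, 1e-4, 1e-8)]
print(values)
```

Each step towards $\sigma^2 = 0$ raises the log likelihood further, which is the numerical counterpart of having $T + 1$ parameters and only $T$ observations.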
This suggests that there is not enough information in the statistical model (i)-(ii) above to estimate the statistical parameters $\theta = (\mu_1, \mu_2, \ldots, \mu_T, \sigma^2)$. An obvious way to supplement this information is in the form of panel data for $y_t$, say $y_{it}$, $i = 1, 2, \ldots, N$, $t = 1, 2, \ldots, T$. In the case where $N$ realisations of $y_t$ are available at each $t$, $\theta$ could be estimated by