Interpreting the Linear Model


As already stressed in Chapter 2, the linear model

$$y_i = x_i'\beta + \varepsilon_i \qquad (3.1)$$

has little meaning unless we complement it with additional assumptions on $\varepsilon_i$. It is common to state that $\varepsilon_i$ has expectation zero and that the $x_i$s are taken as given. A formal way of stating this is that it is assumed that the expected value of $\varepsilon_i$ given $X$ (the collection of all $x_i$s, $i = 1, \ldots, N$), or the expected value of $\varepsilon_i$ given $x_i$, is zero, that is,

$$E\{\varepsilon_i|X\} = 0 \quad \text{or} \quad E\{\varepsilon_i|x_i\} = 0, \qquad (3.2)$$


respectively, where the latter condition is implied by the first. Under $E\{\varepsilon_i|x_i\} = 0$, we can interpret the regression model as describing the conditional expected value of $y_i$ given values for the explanatory variables $x_i$. For example, what is the expected wage for an arbitrary woman of age 40, with a university education and 14 years of experience? Or, what is the expected unemployment rate given wage rates, inflation and total output in the economy? The first consequence of (3.2) is the interpretation of the individual $\beta$ coefficients. For example, $\beta_k$ measures the expected change in $y_i$ if $x_{ik}$ changes by one unit, whereas the other variables in $x_i$ do not change. That is,

$$\frac{\partial E\{y_i|x_i\}}{\partial x_{ik}} = \beta_k. \qquad (3.3)$$

It is important to realize that we had to state explicitly that the other variables in $x_i$ did not change. This is the so-called ceteris paribus condition. In a multiple regression model, single coefficients can only be interpreted under ceteris paribus conditions. For example, $\beta_k$ could measure the effect of age on the expected wage of a woman, if the education level and years of experience are kept constant. An important consequence of the ceteris paribus condition is that it is not possible to interpret a single coefficient in a regression model without knowing what the other variables in the model are. If interest is focused on the relationship between $y_i$ and $x_{ik}$, the other variables in $x_i$ act as control variables. For example, we may be interested in the relationship between house prices and the number of bedrooms, controlling for differences in lot size and location.
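To make this concrete, here is a minimal simulated sketch of the house-price example (the data-generating process and all numbers are invented for illustration), using Python with numpy and statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000

# Invented data: bedrooms and lot size are positively correlated.
lot_size = rng.normal(500, 100, n)
bedrooms = np.round(2 + lot_size / 250 + rng.normal(0, 0.5, n))
price = 50 + 0.1 * lot_size + 10 * bedrooms + rng.normal(0, 20, n)

# Without the control, the bedroom coefficient also picks up the
# effect of lot size (a different ceteris paribus condition).
res_short = sm.OLS(price, sm.add_constant(bedrooms)).fit()

# Controlling for lot size isolates the effect of an extra bedroom,
# keeping lot size fixed.
X = sm.add_constant(np.column_stack([bedrooms, lot_size]))
res_long = sm.OLS(price, X).fit()

print(res_short.params[1])  # well above 10: bedrooms proxy for lot size
print(res_long.params[1])   # close to the true ceteris paribus effect of 10
```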

Depending upon the question of interest, we may decide to control for some factors but not for all (see Wooldridge, 2012, Section 6.3, for more discussion).

Sometimes these ceteris paribus conditions are hard to maintain. For example, in the wage equation case, a change in age almost always corresponds to a change in years of experience. Although the $\beta_k$ coefficient in this case still measures the effect of age, keeping years of experience (and the other variables) fixed, it may not be very well identified from a given sample owing to the collinearity between the two variables. In some cases it is just impossible to maintain the ceteris paribus condition, for example if $x_i$ includes both age and age-squared. Clearly, it is ridiculous to say that a coefficient $\beta_k$ measures the effect of age given that age-squared is constant. In this case, one should go back to the derivative (3.3). If $x_i'\beta$ includes, say, $age_i\beta_2 + age_i^2\beta_3$, we can derive

$$\frac{\partial E\{y_i|x_i\}}{\partial age_i} = \beta_2 + 2\,age_i\,\beta_3, \qquad (3.4)$$

which can be interpreted as the marginal effect of a changing age if the other variables in $x_i$ (excluding $age_i^2$) are kept constant. This shows how the marginal effects of explanatory variables can be allowed to vary over the observations by including additional terms involving these variables (in this case $age_i^2$). For example, we can allow the effect of age to be different for men and women by including an interaction term $age_i male_i$ in the regression, where $male_i$ is a dummy for males. Thus, if the model includes $age_i\beta_2 + age_i male_i\beta_3$, the effect of a changing age is

$$\frac{\partial E\{y_i|x_i\}}{\partial age_i} = \beta_2 + male_i\,\beta_3, \qquad (3.5)$$

which is $\beta_2$ for females and $\beta_2 + \beta_3$ for males. Sections 3.4 and 3.6 will illustrate the use of such interaction terms.
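As a small illustration, the sketch below evaluates these derivatives for a hypothetical specification combining the terms of (3.4) and (3.5), $age_i\beta_2 + age_i^2\beta_3 + age_i male_i\beta_4$; the coefficient values are invented purely for illustration:

```python
# Invented coefficients for a hypothetical wage equation.
b2 = 0.080    # age
b3 = -0.0008  # age squared
b4 = 0.010    # age x male interaction

def marginal_effect_age(age, male):
    """Derivative of E{y|x} with respect to age for the combined
    specification: b2 + 2*age*b3 + male*b4 (cf. (3.4) and (3.5))."""
    return b2 + 2 * age * b3 + male * b4

# The effect of age declines with age and differs between men and women.
for age in (25, 40, 55):
    print(age, marginal_effect_age(age, male=0), marginal_effect_age(age, male=1))
```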

In general, the inclusion of (many) interaction terms complicates direct interpretation of the regression coefficients. For example, when the model of interest contains the interaction term $x_{i2}x_{i3}$, the coefficient for $x_{i2}$ measures the partial effect of $x_{i2}$ when $x_{i3} = 0$, which may be irrelevant or uninteresting. When the model is expanded to include $x_{i2}x_{i4}$, interpretation becomes even more involved. This does not imply that we should not use interaction terms. Instead, we should be careful with the interpretation of our estimation results (and make sure that all relevant interaction terms are clearly reported).

When interaction terms are used, it is typically recommended also to include the original variables themselves in the regression model, unless there is a very good reason not to do so. That is, when $x_{i2}x_{i3}$ is included in the model, so should be $x_{i2}$ and $x_{i3}$. If not, the interaction term may pick up the effect of the original variables; see the discussion on omitted variables in the next section.

The economic interpretation of the regression coefficient $\beta_k$ in (3.3) depends upon the units in which $y_i$ and $x_{ik}$ are measured. If the variables are rescaled, the magnitude of the coefficient and its estimate change accordingly. For example, if $x_{ik}$ is measured in 1000s of euros rather than euros, its coefficient will be 1000 times larger, such that the economic interpretation is equivalent. Moreover, the coefficient estimate and its standard error will also change proportionally, such that the $t$-statistic and statistical significance are unaffected. In general, if $x_{ik}$ is multiplied by a constant $c$, its coefficient is divided by $c$. If $y_i$ is multiplied by $c$, all coefficients are multiplied by $c$, whereas $t$-statistics, $F$-statistics and $R^2$ are unaffected. It may be attractive to scale the variables in a model such that the order of magnitude of the coefficients is reasonably similar. Adding or subtracting a constant from a variable does not affect the slope coefficients in a regression, whereas the intercept will adapt. For example, replacing $x_{ik}$ by $x_{ik} - d$ increases the intercept by $\beta_k d$.
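These invariance claims are easy to verify numerically. A sketch with simulated data (all numbers invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x_euros = rng.normal(30000, 8000, n)            # a regressor measured in euros
y = 5 + 0.0002 * x_euros + rng.normal(0, 1, n)

res1 = sm.OLS(y, sm.add_constant(x_euros)).fit()
res2 = sm.OLS(y, sm.add_constant(x_euros / 1000)).fit()  # in 1000s of euros

print(res1.params[1], res2.params[1])    # second coefficient is 1000 times larger
print(res1.tvalues[1], res2.tvalues[1])  # t-statistics are identical
print(res1.rsquared, res2.rsquared)      # R^2 is unaffected
```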

Occasionally, researchers 'standardize' the variables in a regression model. This means that each variable is replaced by a standardized version obtained by subtracting the sample average and dividing by the sample standard deviation. Whereas this does not affect statistical significance, the resulting regression coefficients now measure the expected change in $y_i$ related to a change in $x_{ik}$ in 'units of standard deviation'. For example, if $x_{ik}$ changes by one standard deviation, we expect $y_i$ to increase by $\beta_k$ standard deviations. The regression coefficients in this case are referred to as standardized coefficients and can be compared more easily across explanatory variables.¹ Note that standardization does not make much sense when explanatory variables are dummy variables, variables with a small number of discrete outcomes or interaction variables. Standardization is particularly useful when an explanatory variable is measured on a scale that may be difficult to interpret (e.g. test scores, or measures of concepts like happiness and satisfaction).
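A minimal sketch of standardization with simulated data (numbers invented); the two regressors have very different scales, but the standardized coefficients are directly comparable:

```python
import numpy as np
import statsmodels.api as sm

def standardize(v):
    """Subtract the sample average, divide by the sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(0, 1, n)    # e.g. a test score on an arbitrary scale
x2 = rng.normal(0, 10, n)   # a regressor with a much larger scale
y = 1 + 2 * x1 + 0.1 * x2 + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([standardize(x1), standardize(x2)]))
res = sm.OLS(standardize(y), X).fit()

# Expected change in y (in standard deviations) for a one-standard-
# deviation change in each regressor; here roughly 0.82 and 0.41.
print(res.params[1:])
```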

The interpretation of (3.1) as a conditional expectation does not necessarily imply that we can interpret the parameters in $\beta$ as measuring the causal effect of $x_i$ upon $y_i$. For example, it is not unlikely that expected wage rates vary between married and unmarried workers, even after controlling for many other factors, but it is not very likely that being married causes people to have higher wages. Rather, marital status proxies for a variety of (un)observable characteristics that also affect a person's wage. Similarly, if you try to relate regional crime rates to, say, the number of police officers, you will probably find a positive relationship. This is because regions with more crime tend to spend more money on law enforcement and therefore have more police, not because the police are causing the crime. Angrist and Pischke (2009) provide an excellent discussion of the challenges of identifying causal effects in empirical work. If we wish to interpret coefficients causally, the ceteris paribus condition should include all other (observable and unobservable) factors, not just the observed variables that we happen to include in our model. Whether or not such an extended interpretation of the ceteris paribus condition makes sense, and a causal interpretation is appropriate, depends crucially upon the economic context. Unfortunately, statistical tests provide very little guidance on this issue. Accordingly, we should be very careful attaching a causal interpretation to estimated coefficients. In Chapter 5 we shall come back to this issue.

¹ See Bring (1994) for a critical note on the interpretation of standardized coefficients as a measure of the relative importance of explanatory variables.

Frequently, economists are interested in elasticities rather than marginal effects.

An elasticity measures the relative change in the dependent variable owing to a relative change in one of the $x_i$ variables. Often, elasticities are estimated directly from a linear regression model involving the (natural) logarithms of most explanatory variables (excluding dummy variables), that is,

$$\log y_i = (\log x_i)'\gamma + v_i, \qquad (3.6)$$

where $\log x_i$ is shorthand notation for a vector with elements $(1, \log x_{i2}, \ldots, \log x_{iK})$ and it is assumed that $E\{v_i|\log x_i\} = 0$. We shall call this a loglinear model. In this case,

$$\frac{\partial E\{y_i|x_i\}}{\partial x_{ik}} \cdot \frac{x_{ik}}{E\{y_i|x_i\}} \approx \frac{\partial E\{\log y_i|\log x_i\}}{\partial \log x_{ik}} = \gamma_k, \qquad (3.7)$$

where the $\approx$ is due to the fact that $E\{\log y_i|\log x_i\} = E\{\log y_i|x_i\} \neq \log E\{y_i|x_i\}$. Note that (3.3) implies that in the linear model

$$\frac{\partial E\{y_i|x_i\}}{\partial x_{ik}} \cdot \frac{x_{ik}}{E\{y_i|x_i\}} = \frac{x_{ik}}{x_i'\beta}\,\beta_k, \qquad (3.8)$$

which shows that the linear model implies that elasticities are nonconstant and vary with $x_i$, whereas the loglinear model imposes constant elasticities. Although in many cases the choice of functional form is dictated by convenience in economic interpretation, other considerations may play a role. For example, explaining $\log y_i$ rather than $y_i$ often helps to reduce heteroskedasticity problems, as illustrated in Sections 3.6 and 4.5. Note that elasticities are independent of the scaling of the variables. In Section 3.3 we shall briefly consider statistical tests for a linear versus a loglinear specification.
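The nonconstant elasticity implied by (3.8) is easy to see in a one-regressor sketch (intercept and slope values invented for illustration):

```python
beta0, beta_k = 10.0, 2.0   # invented intercept and slope of a linear model

def elasticity_linear(x):
    """Elasticity implied by the linear model, as in (3.8):
    x * beta_k / (x'beta), here with a single regressor."""
    return x * beta_k / (beta0 + beta_k * x)

# The elasticity rises with x (approaching 1 here), whereas a loglinear
# model would impose the same elasticity gamma_k at every x.
for x in (1.0, 5.0, 25.0):
    print(x, elasticity_linear(x))   # 0.167, 0.5, 0.833
```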

If $x_{ik}$ is a dummy variable (or another variable that may take nonpositive values), we cannot take its logarithm and we include the original variable in the model. Thus we estimate

$$\log y_i = x_i'\beta + \varepsilon_i. \qquad (3.9)$$

Of course, it is possible to include some explanatory variables in logs and some in levels. In (3.9) the interpretation of a coefficient $\beta_k$ is the relative change in the expected value of $y_i$ owing to an absolute change of one unit in $x_{ik}$. This is referred to as a semi-elasticity. For example, if $x_{ik}$ is a dummy for males, $\beta_k = 0.10$ tells us that the (ceteris paribus) relative wage differential between men and women is 10%. Again, this holds only approximately (see Subsection 3.6.2). The use of the natural logarithm in (3.9), rather than the log with base 10, is essential for this interpretation.
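The exact ceteris paribus differential implied by a coefficient $\beta_k$ on a dummy variable is $\exp(\beta_k) - 1$, since the dummy shifts $\log y_i$ by $\beta_k$. A quick check of how the approximation deteriorates as $\beta_k$ grows:

```python
import numpy as np

# Coefficient on a dummy in a log equation versus the exact relative
# differential exp(beta) - 1; the approximation is good only for small beta.
for b in (0.05, 0.10, 0.25, 0.50):
    print(b, np.expm1(b))   # 0.051, 0.105, 0.284, 0.649
```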

The inequality of $E\{\log y_i|x_i\}$ and $\log E\{y_i|x_i\}$ also has some consequences for prediction purposes. Suppose we start from the loglinear model (3.6) with $E\{v_i|\log x_i\} = 0$. Then, we can determine the predicted value of $\log y_i$ as $(\log x_i)'\gamma$. However, if we are interested in predicting $y_i$ rather than $\log y_i$, it is not the case that $\exp\{(\log x_i)'\gamma\}$ is a good predictor for $y_i$ in the sense that it corresponds to the expected value of $y_i$, given $x_i$. That is, $E\{y_i|x_i\} \geq \exp\{E\{\log y_i|x_i\}\} = \exp\{(\log x_i)'\gamma\}$. This inequality is referred to as Jensen's inequality and will be important when the variance of $v_i$ is not very small. The reason is that taking logarithms is a nonlinear transformation, whereas the expected value of a nonlinear function is not this nonlinear function of the expected value. The only way to get around this problem is to make distributional assumptions. If, for example, it can be assumed that $v_i$ in (3.6) is normally distributed with mean zero and variance $\sigma_v^2$, it implies that the conditional distribution of $y_i$ is lognormal (see Appendix B) with mean

$$E\{y_i|x_i\} = \exp\left\{E\{\log y_i|x_i\} + \tfrac{1}{2}\sigma_v^2\right\} = \exp\left\{(\log x_i)'\gamma + \tfrac{1}{2}\sigma_v^2\right\}. \qquad (3.10)$$

Sometimes, the additional half-variance term is also added when the error terms are not assumed to be normal. Often, it is simply omitted. Additional discussion on predicting $y_i$ when the dependent variable is $\log y_i$ is provided in Wooldridge (2012, Section 6.4).
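A simulation sketch of this correction, assuming normal errors and invented values for $(\log x_i)'\gamma$ and $\sigma_v$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

mu = 2.0        # invented value of (log x_i)'gamma for a given x_i
sigma_v = 0.8   # invented standard deviation of v_i
y = np.exp(mu + rng.normal(0, sigma_v, n))   # y_i is lognormal

print(y.mean())                        # approx exp(mu + sigma_v**2 / 2) = 10.2
print(np.exp(mu))                      # naive predictor 7.4: too small (Jensen)
print(np.exp(mu + 0.5 * sigma_v**2))   # corrected predictor, as in (3.10)
```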

The logarithmic transformation cannot be used if a variable is negative or equal to zero.

An alternative transformation that is occasionally used, also when $y_i \leq 0$, is the inverse hyperbolic sine transformation (see Burbidge, Magee and Robb, 1988), given by

$$\text{ihs}(y_i) = \log\left(y_i + \sqrt{y_i^2 + 1}\right).$$

Although this looks complicated, the inverse hyperbolic sine is approximately equal to $\log(2) + \log(y_i)$ for $y_i$ larger than 4, so estimation results can be interpreted pretty much in the same way as with a standard logarithmic dependent variable. When $y_i$ is close to zero, the transformation is almost linear. Alternatively, authors often use $\log(c + y_i)$ in cases where $y_i$ can be zero or very close to zero, for some small constant $c$, even though results will be sensitive to the choice of $c$.
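A sketch of the transformation and its logarithmic approximation; note that numpy's built-in arcsinh computes exactly this function:

```python
import numpy as np

def ihs(y):
    """Inverse hyperbolic sine: log(y + sqrt(y^2 + 1))."""
    return np.log(y + np.sqrt(y**2 + 1))

y = np.array([-2.0, 0.0, 0.5, 4.0, 100.0])
print(ihs(y))                       # defined for zero and negative values
print(np.log(2) + np.log(y[3:]))    # close to ihs(y) for y larger than 4
print(np.allclose(ihs(y), np.arcsinh(y)))   # True
```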

Another consequence of (3.2) is often overlooked. If we change the set of explanatory variables $x_i$ to $z_i$, say, and estimate another regression model,

$$y_i = z_i'\gamma + v_i \qquad (3.11)$$

with the interpretation that $E\{y_i|z_i\} = z_i'\gamma$, there is no conflict with the previous model stating that $E\{y_i|x_i\} = x_i'\beta$. Because the conditioning variables are different, both conditional expectations can be correct in the sense that both are linear in the conditioning variables. Consequently, if we interpret the regression models as describing the conditional expectation given the variables that are included, there can never be any conflict between them. They are just two different things in which we might be interested. For example, we may be interested in the expected wage as a function of gender only, but also in the expected wage as a function of gender, education and experience. Note that, because of a different ceteris paribus condition, the coefficients for gender in these two models do not have the same interpretation. Often, researchers implicitly or explicitly make the assumption that the set of conditioning variables is larger than those that are included. Sometimes it is suggested that the model contains all relevant observable variables (implying that observables that are not included in the model are in the conditioning set but irrelevant). If it is argued, for example, that the two linear models presented earlier should be interpreted as

$$E\{y_i|x_i,z_i\} = z_i'\gamma \quad \text{and} \quad E\{y_i|x_i,z_i\} = x_i'\beta,$$

respectively, then the two models are typically in conflict and at most one of them can be correct.² Only in such cases does it make sense to compare the two models statistically and to test, for example, which model is correct and which one is not. We come back to this issue in Subsection 3.2.3.
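The distinction can be illustrated with a simulated wage sketch (variables and numbers invented): both the short regression on gender alone and the long regression adding education estimate valid conditional expectations, but the gender coefficients answer different ceteris paribus questions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000

male = rng.integers(0, 2, n).astype(float)
educ = 10 + 2 * male + rng.normal(0, 2, n)   # education correlated with gender
wage = 5 + 1.0 * male + 0.8 * educ + rng.normal(0, 2, n)

res_short = sm.OLS(wage, sm.add_constant(male)).fit()
res_long = sm.OLS(wage, sm.add_constant(np.column_stack([male, educ]))).fit()

print(res_short.params[1])  # approx 2.6: raw expected wage differential
print(res_long.params[1])   # approx 1.0: differential given education
```

Neither regression is "wrong" here; they simply condition on different information sets.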
