A term in ∑x*_t² can be cancelled from the numerator and denominator of (4A.29), and, recalling that x*_t = (x_t − x̄), this gives the variance of the slope coefficient as

var(β̂) = s² / ∑(x_t − x̄)²

It is possible to express α̂ as a function of the true α and of the disturbances, u_t,

α̂ = α + ∑_t g_t u_t,  where g_t = 1/T − x̄x*_t / ∑_j x*_j²

so that

var(α̂) = s² ∑_t g_t²   (4A.34)

Writing (4A.34) out in full for g_t² and expanding the brackets,

var(α̂) = s² ∑_t [1/T² − 2x̄x*_t/(T ∑_j x*_j²) + x̄²x*_t²/(∑_j x*_j²)²] = s² [1/T + x̄²/∑_t x*_t²]

where the middle term vanishes on summing over t, since ∑_t x*_t = 0. This looks rather complex, but, fortunately, if we take ∑x_t² outside the square brackets in the numerator, the remaining numerator cancels with a term in the denominator to leave the required result:

var(α̂) = s² ∑_t x_t² / (T ∑_t (x_t − x̄)²)
Further issues in regression analysis
Learning outcomes
In this chapter, you will learn how to
● construct models with more than one explanatory variable;
● derive the OLS parameter and standard error estimators in the
multiple regression context;
● determine how well the model fits the data;
● understand the principles of nested and non-nested models;
● test multiple hypotheses using an F-test;
● form restricted regressions; and
● test for omitted and redundant variables.
5.1 Generalising the simple model to multiple linear regression
Previously, a model of the following form has been used:
y_t = α + βx_t + u_t,  t = 1, 2, . . . , T   (5.1)

Equation (5.1) is a simple bivariate regression model. That is, changes in the dependent variable are explained by reference to changes in one single explanatory variable x. What if the real estate theory or the idea that is sought to be tested suggests that the dependent variable is influenced by more than one independent variable, however? For example, simple estimation and tests of the capital asset pricing model can be conducted using an equation of the form of (5.1), but arbitrage pricing theory does not suppose that there is only a single factor affecting stock returns. So, to give one illustration, REIT excess returns might be purported to depend on their sensitivity to unexpected changes in:
(1) inflation;
(2) the differences in returns on short- and long-dated bonds;
(3) the dividend yield; or
(4) default risks.
Having just one independent variable would be no good in this case. It would, of course, be possible to use each of the four proposed explanatory factors in separate regressions. It is of greater interest, though, and it is also more valid, to have more than one explanatory variable in the regression equation at the same time, and therefore to examine the effect of all the explanatory variables together on the explained variable.
It is very easy to generalise the simple model to one with k regressors (independent variables). Equation (5.1) becomes

y_t = β_1 + β_2 x_2t + β_3 x_3t + · · · + β_k x_kt + u_t,  t = 1, 2, . . . , T   (5.2)
The variables x_2t, x_3t, . . . , x_kt are therefore a set of k − 1 explanatory variables that are thought to influence y, and the coefficient estimates β_2, β_3, . . . , β_k are the parameters that quantify the effect of each of these explanatory variables on y. The coefficient interpretations are slightly altered in the multiple regression context. Each coefficient is now known as a partial regression coefficient, interpreted as representing the partial effect of the given explanatory variable on the explained variable, after holding constant, or eliminating the effect of, all the other explanatory variables. For example, β̂_2 measures the effect of x_2 on y after eliminating the effects of x_3, x_4, . . . , x_k. Stating this in other words, each coefficient measures the average change in the dependent variable per unit change in a given independent variable, holding all other independent variables constant at their average values.
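To see the 'partial effect' interpretation numerically, the following sketch (illustrative only: the data-generating process, seed and variable names are invented for this example) fits a multiple regression on two correlated regressors and contrasts it with a bivariate regression that omits one of them.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500
x3 = rng.normal(size=T)
x2 = 0.8 * x3 + rng.normal(size=T)   # x2 and x3 are deliberately correlated
u = rng.normal(size=T)
y = 1.0 + 2.0 * x2 - 3.0 * x3 + u    # true partial effects: +2 and -3

# Multiple regression of y on a constant, x2 and x3: recovers the
# partial effect of each regressor, holding the other constant
X = np.column_stack([np.ones(T), x2, x3])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)       # approximately [1.0, 2.0, -3.0]

# Bivariate regression of y on x2 alone: the slope also picks up part
# of x3's effect, because x2 and x3 move together
X_short = np.column_stack([np.ones(T), x2])
beta_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
print(beta_short)     # slope noticeably different from 2.0
```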
5.2 The constant term
In (5.2) above, astute readers will have noticed that the explanatory variables are numbered x_2, x_3, . . . – i.e. the list starts with x_2 and not x_1. So, where is x_1? In fact, it is the constant term, usually represented by a column of ones of length T:

x_1 = (1, 1, . . . , 1)′

Thus there is a variable implicitly hiding next to β_1, which is a column vector of ones, the length of which is the number of observations in the sample.
The x_1 in the regression equation is not usually written, in the same way that one unit of p and two units of q would be written as 'p + 2q' and not '1p + 2q'. β_1 is the coefficient attached to the constant term (which was called α in the previous chapter). This coefficient can still be referred to as the intercept, which can be interpreted as the average value that y would take if all the explanatory variables took a value of zero.
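As a concrete illustration of the implicit x_1 (a minimal sketch with invented numbers), the column of ones can be made explicit when the data are stacked into a matrix:

```python
import numpy as np

# Five observations on a single explanatory variable
x = np.array([2.0, 3.0, 5.0, 7.0, 11.0])

# The constant term x1 is a column of ones placed alongside x
X = np.column_stack([np.ones_like(x), x])
print(X)
# [[ 1.  2.]
#  [ 1.  3.]
#  [ 1.  5.]
#  [ 1.  7.]
#  [ 1. 11.]]
```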
A tighter definition of k, the number of explanatory variables, is probably now necessary. Throughout this book, k is defined as the number of 'explanatory variables' or 'regressors', including the constant term. This is equivalent to the number of parameters that are estimated in the regression equation. Strictly speaking, it is not sensible to call the constant an explanatory variable, since it does not explain anything and it always takes the same values. This definition of k will be employed for notational convenience, with the constant counted as one of the k columns in the X matrix. Such a notation may seem unnecessarily complex, but, in fact, the matrix notation is usually more compact and convenient. So, for example, if k is two – i.e. there are two regressors, one of which is the constant term (equivalent to a simple bivariate regression y_t = α + βx_t + u_t) – it is possible to write

⎡ y_1 ⎤   ⎡ 1  x_1 ⎤       ⎡ u_1 ⎤
⎢ y_2 ⎥ = ⎢ 1  x_2 ⎥ ⎡α⎤ + ⎢ u_2 ⎥
⎢  ⋮  ⎥   ⎢ ⋮   ⋮  ⎥ ⎣β⎦   ⎢  ⋮  ⎥
⎣ y_T ⎦   ⎣ 1  x_T ⎦       ⎣ u_T ⎦

i.e. y = Xβ + u, where y is T × 1, X is T × 2, β is 2 × 1 and u is T × 1, so that all T observations are stacked. The equation is then conformable – in other words, there is a valid matrix multiplication and addition on the RHS.¹
5.3 How are the parameters (the elements of the β vector) calculated in the generalised case?

Previously, the residual sum of squares, ∑ û_t², was minimised with respect to α and β. In the multiple regression context, in order to obtain estimates of the parameters, β_1, β_2, . . . , β_k, the RSS would be minimised with respect to all the elements of β. Now, the residuals can be stacked in a vector:

û = (û_1, û_2, . . . , û_T)′

Denoting the vector of coefficient estimates as β̂, it can be shown (see the appendix to this chapter) that the coefficient estimates will be given by the elements of the expression

β̂ = (X′X)⁻¹X′y   (5.8)

If one were to check the dimensions of the RHS of (5.8), it would be observed to be k × 1. This is as required, since there are k parameters to be estimated by the formula for β̂.
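Equation (5.8) translates directly into code. The sketch below is a minimal implementation (the names and test data are invented); solving the normal equations, or calling a library routine such as numpy.linalg.lstsq, is numerically preferable to forming (X′X)⁻¹ explicitly.

```python
import numpy as np

def ols_beta(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS coefficient estimates from (5.8): beta_hat = (X'X)^{-1} X'y."""
    # Solve (X'X) beta = X'y rather than inverting X'X explicitly
    return np.linalg.solve(X.T @ X, X.T @ y)

# Dimension check: with T observations and k regressors, X is T x k,
# y is T x 1 and the result is k x 1, as required
rng = np.random.default_rng(0)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)
print(ols_beta(X, y).shape)   # (3,)
```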
¹ This notation – e.g. x_2t, in which the first subscript denotes the variable and the second the observation – is common in the econometrics literature, although the ordering of the indices is different from that used in the mathematics of matrix algebra (as presented in chapter 2 of this book). In the latter, the first index denotes the row and the second the column, whereas, in the notation used from this point of the book onwards, it is the other way around.
How are the standard errors of the coefficient estimates calculated, though? Previously, to estimate the variance of the errors, σ², an estimator denoted by s² was used:

s² = ∑ û_t² / (T − 2)   (5.9)

The denominator in (5.9) is T − 2 because two degrees of freedom are 'lost' in estimating the two model parameters – i.e. in deriving estimates for α and β. In the case in which there is more than one explanatory variable plus a constant, and using the matrix notation, (5.9) would be modified to

s² = û′û / (T − k)   (5.10)

The variance–covariance matrix of the coefficient estimates is then given by

var(β̂) = s²(X′X)⁻¹

so that the variance of β̂_1 is the first diagonal element, the variance of β̂_2 is the second element on the leading diagonal and the variance of β̂_k is the kth diagonal element. The coefficient standard errors are simply given therefore by taking the square roots of each of the terms on the leading diagonal.
Example 5.1
The following model with k = 3 (two explanatory variables plus a constant) is estimated over fifteen observations,

y = β_1 + β_2 x_2 + β_3 x_3 + u

and the following data have been calculated from the original xs:

(X′X)⁻¹ = ⎡ 2.0  3.5  −1.0 ⎤          ⎡ −3.0 ⎤
          ⎢ 3.5  1.0   6.5 ⎥ ,  X′y = ⎢  2.2 ⎥ ,  û′û = 10.96
          ⎣ −1.0 6.5   4.3 ⎦          ⎣  0.6 ⎦

Calculate the coefficient estimates and their standard errors.
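The example can be solved (and checked) numerically by plugging the given quantities into (5.8) and (5.10). The sketch below reproduces, up to rounding, the coefficient estimates and standard errors that underlie the t-ratios reported later in this section.

```python
import numpy as np

# Quantities given in example 5.1
XtX_inv = np.array([[ 2.0, 3.5, -1.0],
                    [ 3.5, 1.0,  6.5],
                    [-1.0, 6.5,  4.3]])
Xty = np.array([-3.0, 2.2, 0.6])
uu, T, k = 10.96, 15, 3

beta_hat = XtX_inv @ Xty               # (5.8): (X'X)^{-1} X'y
s2 = uu / (T - k)                      # (5.10): s^2 = u'u / (T - k)
se = np.sqrt(s2 * np.diag(XtX_inv))    # square roots of the leading diagonal

print(beta_hat)   # [ 1.1  -4.4  19.88]
print(se)         # approximately [1.35  0.96  1.98]
```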
Recall from equation (4.29) in the previous chapter that the formula under a test of significance approach to hypothesis testing using a t-test for variable i is

test statistic = (β̂_i − β*_i) / SE(β̂_i)   (5.20)

If the test is

H_0: β_i = 0
H_1: β_i ≠ 0

i.e. a test that the population parameter is zero against a two-sided alternative – this is known as a t-ratio test. Since β*_i = 0, the expression in (5.20) collapses to

test statistic = β̂_i / SE(β̂_i)
Thus the ratio of the coefficient to its standard error, given by this expression, is known as the t-ratio or t-statistic. In the last example above, the t-ratios associated with each of the three coefficients would be given by

             β̂_1     β̂_2     β̂_3
Coefficient  1.10   −4.40   19.88
t-ratio      0.81   −4.63   10.04

Note that, if a coefficient is negative, its t-ratio will also be negative. In order to test (separately) the null hypotheses that β_1 = 0, β_2 = 0 and β_3 = 0, the test statistics would be compared with the appropriate critical value from a t-distribution. In this case, the number of degrees of freedom, given by T − k, is equal to 15 − 3 = 12. The 5 per cent critical value for this two-sided test (remember, 2.5 per cent in each tail for a 5 per cent test) is 2.179, while the 1 per cent two-sided critical value (0.5 per cent in each tail) is 3.055.
Given these t-ratios and critical values, would the following null hypotheses be rejected?

H_0: β_1 = 0?  No
H_0: β_2 = 0?  Yes
H_0: β_3 = 0?  Yes

If a null hypothesis of this kind cannot be rejected, this is an indication that the variable is not helping to explain variations in y, and that it could therefore be removed from the regression equation. For example, if the t-ratio associated with x_3 had been 1.04 rather than 10.04, the variable would be classed as insignificant – i.e. not statistically different from zero. The only insignificant term in the above regression is the intercept. There are good statistical reasons for always retaining the constant, even if it is not significant; see chapter 6.

It is worth noting that, for degrees of freedom greater than around twenty-five, the 5 per cent two-sided critical value is approximately ±2. So, as a rule of thumb (i.e. a rough guide), the null hypothesis would be rejected if the t-statistic exceeds two in absolute value.
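These calculations are easy to verify (a sketch assuming scipy is available; small discrepancies with the quoted t-ratios reflect rounding in the standard errors):

```python
import numpy as np
from scipy import stats

coef = np.array([1.10, -4.40, 19.88])
se = np.array([1.35, 0.96, 1.98])
t_ratios = coef / se                    # ~[0.81, -4.58, 10.04]

df = 15 - 3                             # degrees of freedom, T - k
crit_5 = stats.t.ppf(0.975, df)         # 2.179 (2.5 per cent in each tail)
crit_1 = stats.t.ppf(0.995, df)         # 3.055 (0.5 per cent in each tail)

print(np.abs(t_ratios) > crit_5)        # [False  True  True]
```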
Some authors place the t-ratios in parentheses below the corresponding coefficient estimates rather than the standard errors. Accordingly, one needs to check which convention is being used in each particular application, and also to state this clearly when presenting estimation results.
5.5 Goodness of fit statistics
It is desirable to have some measure of how well the regression model actually fits the data. In other words, it is desirable to have an answer to the question 'How well does the model containing the explanatory variables that was proposed actually explain variations in the dependent variable?' Quantities known as goodness of fit statistics are available to test how well the sample regression function (SRF) fits the data – that is, how 'close' the fitted regression line is to all the data points taken together. Note that it is not possible to say how well the sample regression function fits the population regression function – i.e. how the estimated model compares with the true relationship between the variables – as the latter is never known.
What measures might therefore make plausible candidates to be goodness of fit statistics? A first response to this might be to look at the residual sum of squares. Recall that OLS selected the coefficient estimates that minimised this quantity, so the lower the minimised value of the RSS was, the better the model fitted the data. Consideration of the RSS is certainly one possibility, but the RSS is unbounded from above (strictly, it is bounded from above by the total sum of squares – see below) – i.e. it can take any (non-negative) value. So, for example, if the value of the RSS under OLS estimation was 136.4, what does this actually mean? It would be very difficult, by looking at this number alone, to tell whether the regression line fitted the data closely or not. The value of the RSS depends to a great extent on the scale of the dependent variable. Thus one way to reduce the RSS pointlessly would be to divide all the observations on y by ten!
Trang 10In fact, a scaled version of the residual sum of squares is usually employed The most common goodness of fit statistic is known as R2 One way to define
R2is to say that it is the square of the correlation coefficient between y and
ˆ
y– that is, the square of the correlation between the values of the dependentvariable and the corresponding fitted values from the model A correlationcoefficient must lie between−1 and +1 by definition Since R2 (defined inthis way) is the square of a correlation coefficient, it must lie between zeroand one If this correlation is high, the model fits the data well, while, if thecorrelation is low (close to zero), the model is not providing a good fit to thedata
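For an OLS regression that includes a constant, this definition agrees with the RSS-based definition given next. A quick numerical confirmation on simulated data (all numbers invented for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
fitted = X @ beta_hat

rss = np.sum((y - fitted) ** 2)
tss = np.sum((y - y.mean()) ** 2)

r2_ratio = 1.0 - rss / tss                   # 1 - RSS/TSS
r2_corr = np.corrcoef(y, fitted)[0, 1] ** 2  # squared corr(y, y-hat)
print(r2_ratio, r2_corr)                     # identical up to rounding
```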
Another definition of R² requires a consideration of what the model is attempting to explain. What the model is trying to do in effect is to explain variability of y about its mean value, ȳ. This quantity, ȳ, which is more specifically known as the unconditional mean of y, acts like a benchmark, since, if the researcher had no model for y, he/she could do no worse than to regress y on a constant only. In fact, the coefficient estimate for this regression would be the mean of y. So, from the regression

y_t = β_1 + u_t

the coefficient estimate, β̂_1, will be the mean of y – i.e. ȳ. The total variation across all observations of the dependent variable about its mean value is known as the total sum of squares, TSS, which is given by

TSS = ∑_t (y_t − ȳ)²

The TSS can be split into two parts: the part that has been explained by the model (known as the explained sum of squares, ESS) and the part that the model was not able to explain (the RSS):

TSS = ESS + RSS
∑_t (y_t − ȳ)² = ∑_t (ŷ_t − ȳ)² + ∑_t û_t²

since a residual for observation t is defined as the difference between the actual and fitted values for that observation. The goodness of fit statistic is given by the ratio of the explained sum of squares to the total sum of squares,

R² = ESS / TSS

but, since TSS = ESS + RSS, it is also possible to write

R² = ESS/TSS = (TSS − RSS)/TSS = 1 − RSS/TSS   (5.27)

Consider what happens in two extreme cases: first, a model in which RSS = TSS (so that ESS = 0) and, second, one in which RSS = 0 (so that ESS = TSS).

[Figure 5.1 R² = 0, demonstrated by a flat estimated line]
In the first case, the model has not succeeded in explaining any of the variability of y about its mean value, and hence the residual and total sums of squares are equal. This would happen only when the estimated values of all the coefficients were exactly zero. In the second case, the model has explained all the variability of y about its mean value, which implies that the residual sum of squares will be zero. This would happen only in the case in which all the observation points lie exactly on the fitted line. Neither of these two extremes is likely in practice, of course, but they do show that R² is bounded to lie between zero and one, with a higher R² implying, everything else being equal, that the model fits the data better.
To sum up, a simple way (but crude, as explained next) to tell whether the regression line fits the data well is to look at the value of R². A value of R² close to one indicates that the model explains nearly all the variability of the dependent variable about its mean value, while a value close to zero indicates that the model fits the data poorly. The two extreme cases, in which R² = 0 and R² = 1, are indicated in figures 5.1 and 5.2 in the context of a simple bivariate regression.
[Figure 5.2 R² = 1: all data points lie exactly on the estimated line]
Example 5.2 Measuring goodness of fit
We now estimate the R² for equation (4.28), applying formula (5.27): RSS = …

R² is simple to calculate and intuitive to understand, and provides a broad indication of the fit of the model to the data. There are a number of problems with R² as a goodness of fit measure, however, which are outlined in box 5.1.
Box 5.1 Problems with R² as a goodness of fit measure
(1) R² is defined in terms of variation about the mean of y, so that, if a model is reparameterised (rearranged) and the dependent variable changes, R² will change, even if the second model is a simple rearrangement of the first, with identical RHS variables. Thus it is not sensible to compare the value of R² across models with different dependent variables.
(2) R² never falls if more regressors are added to the regression. For example, consider the following two models:

Regression 1: y = β_1 + β_2 x_2 + β_3 x_3 + u
Regression 2: y = β_1 + β_2 x_2 + β_3 x_3 + β_4 x_4 + u

R² will always be at least as high for regression 2 relative to regression 1. The R² from regression 2 would be exactly the same as that for regression 1 only if the estimated value of the coefficient on the new variable were exactly zero – i.e. β̂_4 = 0. In practice, β̂_4 will virtually never be exactly zero, so R² rises whenever a variable is added, making it impossible to use R² to determine whether a given variable should be present in the model or not.
(3) R² can take values of 0.9 or higher for time series regressions, and hence it is not good at discriminating between models, as a wide array of models will frequently have broadly similar (and high) values of R².
In order to get round the second of these three problems, a modification to R² is often made that takes into account the loss of degrees of freedom associated with adding extra variables. This is known as R̄², or adjusted R², which is defined as

R̄² = 1 − [(T − 1)/(T − k)](1 − R²)

where k is the number of parameters to be estimated in the model and T is the sample size. If an extra regressor (variable) is added to the model, k increases and, unless R² increases by a more than offsetting amount, R̄² will actually fall. Hence R̄² can be used as a decision-making tool for determining whether a given variable should be included in a regression model or not, with the rule being: include the variable if R̄² rises and do not include it if R̄² falls.
There are still problems with the use of R̄² as a criterion for model selection, however.
(1) It is a 'soft' rule, implying that, by itself, it will tend to select models containing large numbers of marginally significant or insignificant variables.
(2) There is no distribution available for R̄² or R², so hypothesis tests cannot be conducted using them. The implication is that one can never tell whether the R² or the R̄² from one model is significantly higher than that of another model in a statistical sense.
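The R̄² formula and the associated decision rule are easily expressed in code (a minimal sketch; the sample numbers are invented):

```python
def adjusted_r2(r2: float, T: int, k: int) -> float:
    """R-bar-squared: 1 - [(T - 1)/(T - k)] * (1 - R^2)."""
    return 1.0 - (T - 1) / (T - k) * (1.0 - r2)

# Adding a regressor nudges R^2 up from 0.230 to 0.231, but with
# T = 60 and k rising from 3 to 4 the adjusted measure falls,
# so the rule says: do not include the extra variable.
print(adjusted_r2(0.230, T=60, k=3))   # ~0.203
print(adjusted_r2(0.231, T=60, k=4))   # ~0.190
```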
5.6 Tests of non-nested hypotheses
All the hypothesis tests conducted thus far in this book have been in the context of 'nested' models. This means that, in each case, the test involved imposing restrictions on the original model to arrive at a restricted formulation that would be a subset of, or nested within, the original specification. Sometimes, however, it is of interest to compare between non-nested models. For example, suppose that there are two researchers working independently, each with a separate real estate theory for explaining the variation in some variable, y_t. The respective models selected by the researchers could be

y_t = α_1 + α_2 x_2t + u_t   (5.31)
y_t = β_1 + β_2 x_3t + v_t   (5.32)
where u_t and v_t are iid error terms. Model (5.31) includes variable x_2 but not x_3, while model (5.32) includes x_3 but not x_2. In this case, neither model can be viewed as a restriction of the other, so how then can the two models be compared as to which better represents the data, y_t? Given the discussion in the previous section, an obvious answer would be to compare the values of R² or adjusted R² between the models. Either would be equally applicable in this case, since the two specifications have the same number of RHS variables. Adjusted R² could be used even in cases in which the number of variables was different in the two models, since it employs a penalty term that makes an allowance for the number of explanatory variables. Adjusted R² is based upon a particular penalty function, however (that is, T − k appears in a specific way in the formula). This form of penalty term may not necessarily be optimal.

Moreover, given the statement above that adjusted R² is a soft rule, it is likely on balance that use of it to choose between models will imply that models with more explanatory variables are favoured. Several other similar rules are available, each having more or less strict penalty terms; these are collectively known as 'information criteria'. These are explained in some detail in chapter 8, but suffice to say for now that a different strictness of the penalty term will in many cases lead to a different preferred model.
An alternative approach to comparing between non-nested models would be to estimate an encompassing or hybrid model. In the case of (5.31) and (5.32), the relevant encompassing model would be

y_t = γ_1 + γ_2 x_2t + γ_3 x_3t + w_t   (5.33)

where w_t is an error term. Formulation (5.33) contains both (5.31) and (5.32) as special cases, when γ_3 and γ_2 are zero, respectively. Therefore a test for the best model would be conducted via an examination of the significances of γ_2 and γ_3 in model (5.33). There will be four possible outcomes (box 5.2).
Trang 15Box 5.2 Selecting between models
and the latter is the preferred model.
and the latter is the preferred model.
retained Models (5.31) and (5.32) are both ditched and (5.33) is the preferred model.
dropped, and some other method for choosing between them must be employed.
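A sketch of the encompassing procedure on simulated data (everything here is invented for illustration; statsmodels is assumed to be available for the t-ratios):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
T = 200
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
y = 1.0 + 0.8 * x2 + rng.normal(size=T)   # data generated by 'model (5.31)'

# Estimate the hybrid model (5.33): y on a constant, x2 and x3
X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()
print(res.tvalues)   # t-ratio on x2 is large; on x3, close to zero
# Outcome (1) of box 5.2: gamma_2 is significant and gamma_3 is not,
# so (5.33) collapses to (5.31)
```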
There are several limitations to the use of encompassing regressions to select between non-nested models, however. Most importantly, even if models (5.31) and (5.32) have a strong theoretical basis for including the RHS variables that they do, the hybrid model may be meaningless. For example, it could be the case that real estate theory suggests that y could either follow model (5.31) or model (5.32), but model (5.33) is implausible.

In addition, if the competing explanatory variables x_2 and x_3 are highly related – i.e. they are near-collinear – it could be the case that, if they are both included, neither γ_2 nor γ_3 is statistically significant, while each is significant in its separate regression, (5.31) or (5.32); see chapter 6 for an explanation of why this may happen.
An alternative approach is via the J-encompassing test due to Davidson and MacKinnon (1981). Interested readers are referred to their work or to Gujarati (2009) for further details.
Example 5.3 A multiple regression in real estate
Amy, Ming and Yuan (2000) study the Singapore office market and focus on obtaining empirical estimates for the natural vacancy rate and rents utilising existing theoretical frameworks. Their empirical analysis includes the estimation of different specifications for rents. For their investigation, quarterly data are available. One of the models they estimate is given by equation (5.34),

%R_t = β_1 + β_2(%E_t) + β_3 V_{t−1} + u_t   (5.34)

where % denotes a percentage change (over the previous quarter), R_t is the nominal rent (hence %R_t is the percentage change in nominal rent this quarter over the preceding one), E_t is the operating costs (due to data limitations, the authors approximate this variable with the consumer price index; the CPI reflects the cost-push elements in an inflationary environment as landlords push for higher rents to cover inflation and expenses) and V_{t−1} is the vacancy rate (in per cent) in the previous quarter. The fitted model is

…   (5.35)

A fall [in the vacancy rate] of 1 per cent will push up the rate of nominal rent growth by 2.07 per cent. The t-statistics in parentheses confirm that the parameters are statistically significant.

The above model explains approximately 23 per cent of the variation in nominal rent growth, which means that model (5.35) has quite low explanatory power. Both the low explanatory power and the small sensitivity of rents to vacancy are perhaps a result of model misspecification, which the authors detect and attempt to address in their paper. We consider such issues of model misspecification in the following chapter.
An alternative model that Amy, Ming and Yuan run is a bivariate regression of %RR_t, the quarterly percentage change in real rents, on the lagged vacancy rate (note that, in equation (5.34), nominal growth was used). The following equation is the outcome:

…

The estimated sensitivity of rents to vacancy is greater than that in the previous model. The explanatory power remains low, however.
Although we have not completed the treatment of regression analysis, one may ask whether we can take a view as to which model is more appropriate to study office rents in Singapore. This book equips the reader with the tools to answer this question, in particular by means of the tests we discuss in the next chapter and the evaluation of forecast performance in later chapters. On the basis of the