In this section we consider an empirical illustration concerning the relationship between sale prices of houses and their characteristics. The resulting price function can be referred to as a hedonic price function, because it allows the estimation of hedonic prices (see Rosen, 1974). A hedonic price refers to the implicit price of a certain attribute (e.g. the number of bedrooms) as revealed by the sale price of a house. In this context, a house is considered as a bundle of such attributes. Typical products for which hedonic price functions are estimated are computers, cars and houses. For our purpose, the important conclusion is that a hedonic price function describes the expected price (or log price) as a function of a number of characteristics. Berndt (1991, Chapter 4) discusses additional economic and econometric issues relating to the use, interpretation and estimation of such price functions.
The data we use are taken from a study by Anglin and Gençay (1996) and contain sale prices of 546 houses, sold during July, August and September of 1987, in the city of Windsor, Canada, along with their important features. The following characteristics are available: the lot size of the property in square feet, the numbers of bedrooms, full bathrooms and garage places and the number of stories. In addition there are dummy variables for the presence of a driveway, recreational room, full basement and central air conditioning, for being located in a preferred area and for using gas for hot water heating.
To start our analysis, we shall first estimate a model that explains the log of the sale price from the log of the lot size, the numbers of bedrooms and bathrooms and the presence of air conditioning. OLS estimation produces the results in Table 3.1. These results indicate a reasonably high R² of 0.57 and fairly high t-ratios for all coefficients. The coefficient for the air conditioning dummy indicates that a house that has central air conditioning is expected to sell at a 21% higher price than a house without it, both houses having the same number of bedrooms and bathrooms and the same lot size. A 10% larger lot, ceteris paribus, increases the expected sale price by about 4%, while an additional bedroom is estimated to raise the price by almost 8%. The expected log sale price of a house with four bedrooms, one full bathroom, a lot size of 5000 sq. ft and no air conditioning can be computed as
7.094 + 0.400 log(5000) + 0.078 × 4 + 0.216 = 11.028,
which corresponds to an expected price of exp{11.028 + 0.5 × 0.2456²} = 63 460 Canadian dollars. The latter term in this expression corresponds to one-half of the estimated error variance (s²) and is based upon the assumption that the error term is normally distributed (see (3.10)). Omitting this term produces an expected price of only 61 575 dollars. To appreciate the half-variance term, consider the fitted values of our model. Taking the exponential of these fitted values produces predicted prices for the houses in our sample. The average predicted price is 66 679 dollars, while the sample average of actual prices is 68 122. This indicates that without any corrections we would systematically underpredict prices. When the half-variance term is added, the average predicted price based on the model explaining log prices increases to 68 190, which is fairly close to the actual average.

Table 3.1 OLS results hedonic price function
Dependent variable: log(price)

Variable            Estimate   Standard error   t-ratio
constant               7.094            0.232    30.636
log(lotsize)           0.400            0.028    14.397
bedrooms               0.078            0.015     5.017
bathrooms              0.216            0.023     9.386
air conditioning       0.212            0.024     8.923

s = 0.2456   R² = 0.5674   R̄² = 0.5642   F = 177.41
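The prediction for the four-bedroom house and its half-variance correction can be retraced with a few lines of Python. This is only a sketch: the dictionary keys and function names are illustrative labels, and the coefficients are the rounded values printed in Table 3.1, so the results differ slightly from the figures quoted in the text, which are based on unrounded estimates.

```python
import math

# Rounded estimates copied from Table 3.1 (dependent variable: log price).
# The key names are illustrative labels, not names from the original data set.
coef = {"constant": 7.094, "log_lotsize": 0.400, "bedrooms": 0.078,
        "bathrooms": 0.216, "aircon": 0.212}
s = 0.2456  # standard error of the regression reported in Table 3.1


def expected_log_price(lotsize, bedrooms, bathrooms, aircon):
    """Fitted log price for a house with the given characteristics."""
    return (coef["constant"]
            + coef["log_lotsize"] * math.log(lotsize)
            + coef["bedrooms"] * bedrooms
            + coef["bathrooms"] * bathrooms
            + coef["aircon"] * aircon)


def expected_price(log_price, s, correct=True):
    """Convert a fitted log price into a price.

    With correct=True, half the estimated error variance is added before
    exponentiating, which assumes normally distributed errors.
    """
    adjustment = 0.5 * s ** 2 if correct else 0.0
    return math.exp(log_price + adjustment)


log_p = expected_log_price(lotsize=5000, bedrooms=4, bathrooms=1, aircon=0)
naive = expected_price(log_p, s, correct=False)
corrected = expected_price(log_p, s)

print(f"expected log price:       {log_p:.3f}")
print(f"price without correction: {naive:,.0f}")
print(f"price with correction:    {corrected:,.0f}")
```

The ratio of the corrected to the uncorrected prediction is exp(0.5 s²), roughly 1.03 here, which is why omitting the term leads to systematic underprediction of about 3%.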
To test the functional form of this simple specification, we can use the RESET test. This means that we generate predicted values from our model, take powers of them, include them in the original equation and test their significance. Note that these latter regressions are run for testing purposes only and are not meant to produce a meaningful model. Including the squared fitted value produces a t-statistic of 0.514 (p = 0.61), and including the squared and cubed fitted values gives an F-statistic of 0.56 (p = 0.57). Neither test indicates particular misspecifications of our model. Nevertheless, we may be interested in including additional variables in our model because prices may also be affected by characteristics like the number of garage places or the location of the house.
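The mechanics of the RESET test can be sketched in plain Python. The example below does not use the Windsor housing data: it runs on synthetic data generated purely for illustration, and the helper functions `solve`, `ols` and `reset_f` are hypothetical names written for this sketch. The statistic is computed as an F-statistic from the restricted and unrestricted residual sums of squares; when only the squared fitted value is added, this F-statistic equals the square of the t-statistic on that term.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(y, X):
    """OLS via the normal equations; returns (fitted values, residual SS)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, rss


def reset_f(y, X, max_power=2):
    """RESET: add powers 2..max_power of the fitted values, F-test them jointly."""
    fitted, rss_r = ols(y, X)
    X_aug = [row + [f ** p for p in range(2, max_power + 1)]
             for row, f in zip(X, fitted)]
    _, rss_u = ols(y, X_aug)
    q = max_power - 1                 # number of added regressors
    df = len(y) - len(X_aug[0])       # residual degrees of freedom
    return ((rss_r - rss_u) / q) / (rss_u / df)


# Synthetic data (an assumption for illustration only).
random.seed(0)
x = [random.uniform(0.0, 2.0) for _ in range(200)]
X = [[1.0, xi] for xi in x]
y_linear = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5) for xi in x]
y_curved = [1.0 + 2.0 * xi + 1.5 * xi ** 2 + random.gauss(0.0, 0.5) for xi in x]

f_linear = reset_f(y_linear, X)   # linear model is correct here
f_curved = reset_f(y_curved, X)   # linear model omits the quadratic term
print(f"RESET F, correctly specified model: {f_linear:.2f}")
print(f"RESET F, misspecified model:        {f_curved:.2f}")
```

In the misspecified case the statistic is far above conventional critical values, whereas a correctly specified model typically produces a small value, as in the house price regression above. In applied work one would rely on an established regression package rather than on hand-written normal equations, which become numerically fragile when the powers of the fitted values are highly collinear with the regressors.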
To this end, we include all other variables in our model to obtain the specification that is reported in Table 3.2. Given that the R² increases to 0.68 and that all the individual t-statistics are larger than 2, this extended specification appears to perform significantly better in explaining house prices than the previous one. A joint test on the hypothesis that all seven additional variables have a zero coefficient is provided by the F-test, where the test statistic is computed on the basis of the respective R²s as
F = [(0.6865 − 0.5674)/7] / [(1 − 0.6865)/(546 − 12)] = 28.99,
which is highly significant for an F distribution with 7 and 534 degrees of freedom (p = 0.000). Looking at the point estimates, the ceteris paribus effect of a 10% larger lot size is now estimated to be only 3%. This is almost certainly due to the change in the ceteris paribus condition; for example, houses with larger lot sizes tend to have a driveway relatively more often.¹⁰ Similarly, the estimated impact of the other variables is reduced compared with the estimates in Table 3.1. As expected, all coefficient estimates are positive and relatively straightforward to interpret. Ceteris paribus, a house in a preferred neighbourhood of the city is expected to sell at a 13% higher price than a house located elsewhere.

Table 3.2 OLS results hedonic price function, extended model
Dependent variable: log(price)

Variable             Estimate   Standard error   t-ratio
constant                7.745            0.216    35.801
log(lotsize)            0.303            0.027    11.356
bedrooms                0.034            0.014     2.410
bathrooms               0.166            0.020     8.154
air conditioning        0.166            0.021     7.799
driveway                0.110            0.028     3.904
recreational room       0.058            0.026     2.225
full basement           0.104            0.022     4.817
gas for hot water       0.179            0.044     4.079
garage places           0.048            0.011     4.178
preferred area          0.132            0.023     5.816
stories                 0.092            0.013     7.268

s = 0.2104   R² = 0.6865   R̄² = 0.6801   F = 106.33
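The joint F-statistic for the seven added variables can be retraced from the two reported R² values. The function name and argument names below are illustrative; the formula applies because both models have the same dependent variable and include an intercept. With the R² values rounded to four decimals the result is about 28.98, marginally below the 28.99 quoted in the text, which is presumably based on unrounded values.

```python
def f_from_r2(r2_restricted, r2_unrestricted, n_restrictions, n_obs, n_params):
    """F-statistic for a set of zero restrictions, computed from two R² values.

    n_params is the number of coefficients (including the intercept) in the
    unrestricted model.
    """
    numerator = (r2_unrestricted - r2_restricted) / n_restrictions
    denominator = (1.0 - r2_unrestricted) / (n_obs - n_params)
    return numerator / denominator


# R² values from Tables 3.1 and 3.2; 546 houses, 12 coefficients, 7 restrictions.
f_stat = f_from_r2(0.5674, 0.6865, n_restrictions=7, n_obs=546, n_params=12)
print(f"F = {f_stat:.2f} with 7 and {546 - 12} degrees of freedom")
```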
As before, we can test the functional form of the specification by performing one or more RESET tests. With a t-value of 0.06 for the squared fitted values and an F-statistic of 0.04 for the squared and cubed terms, there is again no evidence of misspecification of the functional form. An inspection of the auxiliary regression results, though, suggests that this may be attributable to a lack of power owing to multicollinearity. Instead, it is possible to consider more specific alternatives when testing the functional form. For example, one could hypothesize that an additional bedroom implies a larger price increase when the house is in a preferred neighbourhood. If this is the case, the model should include an interaction term between the location dummy and the number of bedrooms. If the model is extended to include this interaction term, the t-test on the new variable produces a highly insignificant value of −0.131. Overall, the current model appears surprisingly well specified.
The model allows us to compute the expected log sale price of an arbitrary house in Windsor. If you own a two-storeyed house on a lot of 10 000 square feet, located in a preferred neighbourhood of the city, with four bedrooms, one bathroom, two garage places, a driveway, a recreational room, air conditioning and a full and finished basement, using gas for water heating, the expected log price is 11.87. This indicates that the hypothetical price of your house, if sold in the summer of 1987, is estimated to be slightly more than 146 000 Canadian dollars.
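The prediction for this hypothetical house can be retraced from the rounded estimates in Table 3.2. The sketch below uses illustrative key names and assumes that the quoted price includes the half-variance term 0.5 s² discussed earlier. With rounded coefficients the log price comes out at about 11.87, and the implied price is close to, but not exactly equal to, the figure in the text.

```python
import math

# Rounded estimates copied from Table 3.2 (dependent variable: log price).
# The key names are illustrative labels, not names from the original data set.
coef = {
    "constant": 7.745, "log_lotsize": 0.303, "bedrooms": 0.034,
    "bathrooms": 0.166, "aircon": 0.166, "driveway": 0.110,
    "recroom": 0.058, "fullbase": 0.104, "gas_hot_water": 0.179,
    "garage_places": 0.048, "preferred_area": 0.132, "stories": 0.092,
}
s = 0.2104  # standard error of the regression reported in Table 3.2

# Characteristics of the hypothetical house described in the text.
house = {
    "constant": 1, "log_lotsize": math.log(10_000), "bedrooms": 4,
    "bathrooms": 1, "aircon": 1, "driveway": 1, "recroom": 1,
    "fullbase": 1, "gas_hot_water": 1, "garage_places": 2,
    "preferred_area": 1, "stories": 2,
}

log_price = sum(coef[name] * house[name] for name in coef)
price = math.exp(log_price + 0.5 * s ** 2)  # assumes normally distributed errors

print(f"expected log price: {log_price:.3f}")
print(f"expected price:     {price:,.0f} Canadian dollars")
```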
Instead of modelling log prices, we could also consider explaining prices. Table 3.3 reports the results of a regression model where prices are explained as a linear function of lot size and all other variables. Compared with the previous model, the coefficients now reflect absolute differences in prices rather than relative differences. For example, the presence of a driveway (ceteris paribus) is expected to increase the house price by 6 688 dollars, while Table 3.2 implies an estimated increase of 11%. It is not directly clear from a comparison of the results in Tables 3.2 and 3.3 which of the two specifications is preferable. Recall that the R² does not provide an appropriate means of comparison, because the two models have different dependent variables.

Table 3.3 OLS results hedonic price function, linear model
Dependent variable: price

Variable             Estimate   Standard error   t-ratio
constant             −4038.35          3409.47    −1.184
lot size                3.546            0.350    10.124
bedrooms              1832.00          1047.00     1.750
bathrooms            14335.56          1489.92     9.622
air conditioning     12632.89          1555.02     8.124
driveway              6687.78          2045.25     3.270
recreational room     4511.28          1899.96     2.374
full basement         5452.39          1588.02     3.433
gas for hot water    12831.41          3217.60     3.988
garage places         4244.83           840.54     5.050
preferred area        9369.51          1669.09     5.614
stories               6556.95           925.29     7.086

s = 15 423   R² = 0.6731   R̄² = 0.6664   F = 99.97

¹⁰ The sample correlation coefficient between log lot size and the driveway dummy is 0.33.
As discussed in Subsection 3.2.3, it is possible to test these two non-nested models against each other. Using the PE test we can test the two hypotheses that the linear model is appropriate and that the loglinear model is appropriate. When testing the linear model, we obtain a test statistic of −6.196. Given the critical values of a standard normal distribution, this implies that the specification in Table 3.3 has to be rejected. This does not automatically imply that the specification in Table 3.2 is appropriate. Nevertheless, when testing the loglinear model (where only price and lot size are in logs) we find a test statistic of −0.569, so that it is not rejected.