I INTRODUCTION
1 Overall about econometrics
2 Why choosing OLS?
II QUESTION OF INTEREST
III ECONOMIC MODEL
1 Choosing the variables
2 Embedding that target in a general unrestricted model (GUM)
IV ECONOMETRICS MODEL
1 Population regression function (PRF)
2 Sample regression function (SRF)
V DATA COLLECTION
1 Data overview
2 Data description
VI ESTIMATION OF ECONOMETRIC MODEL
1 Checking the correlation among variables
2 Regression run
VII CHECK MULTICOLLINEARITY AND HETEROSKEDASTICITY
1 Multicollinearity
2 Heteroskedasticity
VIII HYPOTHESES POSTULATED
1 The t test
2 Confidence Intervals
3 P Value
4 Testing the overall significance: The F test
IX RESULT ANALYSIS AND POLICY IMPLICATION
X CONCLUSION
XI REFERENCES
I INTRODUCTION
1. Overall about econometrics
Econometrics is the application of statistical methods to economic data and is described as the branch of economics that aims to give empirical content to economic relations. More precisely, it is the quantitative analysis of actual economic problems, based on the concurrent development of theory and observation, related by appropriate methods of inference. Economists often compare econometrics to a tool that sifts through mountains of data to extract simple relationships.
Econometrics is effective because it combines economic theory with statistical theory and mathematical statistics to develop and evaluate econometric methods. In practice, econometrics helps economists assess economic theories, develop econometric models, analyze economic history and forecast economic outcomes.
Aware of the importance of econometrics for understanding economic phenomena, our group decided to carry out an econometric study, “The factors that have influence on median housing price”, which aims to analyze the statistics and to point out differences in price levels and the reasons for them.
The data set has 506 observations with 12 variables in total.
We choose 6 variables (price, crime, nox, rooms, dist and proptax) for the research, in which price is the dependent variable and the other five are independent variables. The general method used is ordinary least squares (OLS).
This is the first time our group has carried out econometric research, so mistakes are unavoidable.
We would be grateful to receive your feedback so that we can do better next time.
2. Why choosing OLS?
Ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the given dataset and the values predicted by the linear function.
With the six selected variables, we use OLS because all regressors are treated as exogenous and the effects of the independent variables on the dependent variable are assumed to be linear. In addition, under the assumptions listed below, the OLS estimators are linear and unbiased and have the smallest variance among all linear unbiased estimators.
When using OLS, we make the following basic assumptions:
1 The regression model is linear in the parameters.
2 X values are fixed in repeated sampling, which implies that Xi and ui are uncorrelated.
3 Zero mean value of the disturbance: E(ui) = 0.
4 Homoscedasticity, or equal variance of ui: Var(ui) = σ².
5 No correlation between disturbances.
6 The model is correctly specified.
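The least squares principle described above can be illustrated with a minimal sketch in Python. It is not the procedure used in this report (the report's estimates come from Stata); the function names ols_fit and ssr and the toy data are assumptions made purely for illustration. The sketch solves the normal equations (X'X)b = X'y and checks that moving a coefficient away from the solution does not lower the sum of squared residuals.

```python
def ols_fit(X, y):
    """Return OLS coefficients [b0, b1, ..., bk] by solving (X'X) b = X'y."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def ssr(X, y, b):
    """Sum of squared residuals for coefficients b (intercept first)."""
    return sum(
        (yi - (b[0] + sum(bj * xj for bj, xj in zip(b[1:], xi)))) ** 2
        for xi, yi in zip(X, y)
    )


# Hypothetical toy data: y = 2 + 3*x1 - x2 plus a small fixed disturbance
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 2)]
noise = [0.5, -0.3, 0.2, -0.4, 0.1, -0.1]
y = [2 + 3 * a - b + e for (a, b), e in zip(X, noise)]

beta = ols_fit(X, y)
print("estimates:", [round(b, 4) for b in beta])
print("SSR at the OLS solution:", round(ssr(X, y, beta), 4))
```

Any other coefficient vector passed to ssr gives a sum of squared residuals at least as large as the one at beta, which is exactly what "least squares" means.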
III ECONOMIC MODEL
According to the provided data, the economic model used in this report is an empirical one. The underlying model is mathematical; in an empirical model, however, data are gathered for the variables and accepted statistical techniques are used to estimate the model's parameters.
size: 22,770

variable   variable label
price      median housing price, $
crime      crimes committed per capita
nox        nit ox concen; parts per 100m
rooms      avg number of rooms
dist       wght dist to 5 employ centers
…          access. index to rad. hghwys
proptax    property tax per $1000
stratio    average student teacher ratio
…          perc of people 'lower status'
Figure 1
The table above lists the factors in the data set that may influence housing price, based on 506 observations. After careful discussion, our group chose price as the dependent variable Y and crime, nox, rooms, dist and proptax as the independent variables.
A brief description of each variable is given in Figure 1.
Name    Meaning    Expected sign
Dependent
Variable   Obs   Mean       Std. Dev.   Min     Max
crime      …     …          8.590247    0.006   88.976
nox        …     5.549783   1.15839     …       …
rooms      506   6.284051   0.70259     …       …
dist       506   3.795751   …           …       …
corr price crime nox rooms dist proptax
From the matrix, we can infer that the correlation between price and each of the independent variables is strong enough to justify running the regression model. Specifically:
Correlation coefficient between price and crime is -0.3879 => price and crime have a moderate negative relationship.
Correlation coefficient between price and nox is -0.426 => price and nox have a moderate negative relationship.
Correlation coefficient between price and rooms is 0.6958 => price and rooms have a moderate positive relationship.
Correlation coefficient between price and dist is 0.2493 => price and dist have a weak positive relationship.
Correlation coefficient between price and proptax is -0.4671 => price and proptax have a moderate negative relationship.
The independent variables rooms and dist have correlation coefficients larger than 0, which means they move in the same direction as the dependent variable. The highest correlation is between price and rooms (0.6958).
In addition, no correlation coefficient is larger than 0.8 in absolute value, so multicollinearity is unlikely to be a serious problem in this model.
price     Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
proptax   …       …           …   …       -136.3551   -43.55923
_cons     …       …           …   0.023   -16877.67   -1242.937
crime, nox, rooms, dist and proptax all have statistically significant effects on price at the 5% significance level (as all their p-values are below 0.05).
β̂4 = -791.2588 means that if the weighted distance to 5 employment centers increases by 1 unit, the average housing price decreases by $791.2588, holding other factors fixed.
Detect multicollinearity
o Method 1: Use the corr command to examine multicollinearity
If independent variables are strongly correlated (|r| > 0.8), multicollinearity may occur.
          price     crime    nox      rooms     dist      proptax
proptax   -0.4671   0.5828   0.6670   -0.2921   -0.5344   1.0000
Figure 6
From the table above, we can see that the correlation coefficients among the independent variables are all smaller than 0.8 in absolute value. As a result, the correlation matrix gives no sign of serious multicollinearity in this model.
Based on the 2 methods above, we conclude that multicollinearity is not a worrisome problem for this data set.
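The correlation screen in Method 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names pearson and flag_high_pairs and the toy columns x1, x2, x3 are hypothetical and are not taken from the housing data. The 0.8 cut-off is the rule of thumb used in this report.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_high_pairs(data, threshold=0.8):
    """Return the pairs of regressors whose |r| exceeds the threshold."""
    names = list(data)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 4)))
    return flagged


# Hypothetical toy regressors: x2 is almost exactly 2 * x1, x3 is unrelated
toy = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "x3": [5, 1, 4, 2, 6, 3],
}
flagged = flag_high_pairs(toy)
print(flagged)  # only the (x1, x2) pair should be reported
```

Applied to the five regressors of this report, an empty result would correspond to the conclusion drawn from Figure 6.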
2 Heteroskedasticity
Another problem that our model may suffer from is heteroskedasticity. Under heteroskedasticity the least squares estimators remain unbiased but are no longer efficient. In addition, the usual estimators of their variances become biased, which makes the usual t and F tests unreliable.
Homoskedasticity is the assumption that the variance of each error term Ui stays the same as i moves from 1 to n. It can be written as:
Var(Ui) = Var(Uj) for all i, j = 1, 2, 3, …, n
When that assumption is violated, heteroskedasticity appears.
Causes
o The nature of the economic phenomenon: heteroskedasticity may arise when the phenomenon is examined on subjects that differ in scale, or over periods that differ in their level of fluctuation.
o The functional form of the model is misspecified, for example because relevant variables are omitted or the wrong functional form is used.
o The data do not fully and correctly reflect the nature of the economic phenomenon, for example when outliers appear. Including or excluding such observations has a large impact on the regression analysis.
o Errors tend to decrease as the techniques for collecting, storing and processing data improve.
o People learn from their past behavior, so their errors become smaller over time.
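Hypothesis:
H0: the error variance is constant (homoskedasticity)
H1: the error variance is not constant (heteroskedasticity)
Prob > chi2 = 0.0000
We can see that Prob > chi2 = 0.0000 < 0.05 => We reject H0 in favor of H1.
We can conclude that heteroskedasticity does occur in this model.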
R-squared = 0.5883
Root MSE = 5937.9

                      Robust
price     Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
crime     -150.0703   30.45247    -4.93   0.000   -209.9009   -90.23976
nox       -1737.66    389.6642    -4.46   0.000   -2503.241   -972.0787
rooms     7707.327    670.6304    11.49   0.000   6389.726    9024.928
dist      -791.2588   175.744     -4.50   0.000   -1136.546   -445.9712
proptax   -89.95717   26.84788    -3.35   0.001   -142.7057   -37.20862
_cons     -9060.303   5398.964    -1.68   0.094   -19667.75   1547.148
Figure 8
Note that, compared with the earlier regression, none of the coefficient estimates changed, but the standard errors and hence the t values are different, which gives more reliable p values in the presence of heteroskedasticity.
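For reference, the robust regression in Figure 8 can be written out as a fitted equation. This is a sketch under two assumptions: the signs of the coefficients are taken from the directions stated in the report's conclusions (higher crime, nox, dist and proptax go with lower prices; more rooms with higher prices), and the robust standard errors are shown in parentheses under each estimate.

```latex
\begin{aligned}
\widehat{price} ={}& \underset{(5398.964)}{-9060.303}
  \;-\; \underset{(30.45247)}{150.0703}\, crime
  \;-\; \underset{(389.6642)}{1737.66}\, nox
  \;+\; \underset{(670.6304)}{7707.327}\, rooms \\
 &\;-\; \underset{(175.744)}{791.2588}\, dist
  \;-\; \underset{(26.84788)}{89.95717}\, proptax,
  \qquad n = 506
\end{aligned}
```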
t_0.025(500) = 1.965 < |ts| => Reject H0
Conclusion: Nitrogen oxide concentration (parts per 100m) has a statistically significant effect on median housing price. The higher the nitrogen oxide concentration, the lower the median housing price.
Hypothesis: H0: β3 = 0; H1: β3 ≠ 0
t_0.025(500) = 1.965 < |ts| => Reject H0
Conclusion: The average number of rooms has a statistically significant effect on median housing price. The higher the average number of rooms, the higher the median housing price.
Hypothesis: H0: β4 = 0; H1: β4 ≠ 0
Conclusion: Weighted distance to 5 employment centers has a statistically significant effect on median housing price. The higher the weighted distance to 5 employment centers, the lower the median housing price.
Hypothesis: H0: β5 = 0; H1: β5 ≠ 0
t_0.025(500) = 1.965 < |ts| => Reject H0
Conclusion: Property tax per $1000 has a statistically significant effect on median housing price. The higher the property tax per $1000, the lower the median housing price.
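The t tests above can be reproduced from the coefficients and robust standard errors in Figure 8. The following Python sketch is illustrative only: the function name t_test is hypothetical, the critical value 1.965 is the one quoted in this report for 500 degrees of freedom, and the coefficient signs are an assumption taken from the directions stated in the conclusions.

```python
T_CRIT = 1.965  # two-sided 5% critical value for 500 df, as quoted in the report

# (coefficient, robust standard error) from Figure 8; signs are assumed from
# the directions stated in the report's conclusions
estimates = {
    "crime": (-150.0703, 30.45247),
    "nox": (-1737.66, 389.6642),
    "rooms": (7707.327, 670.6304),
    "dist": (-791.2588, 175.744),
    "proptax": (-89.95717, 26.84788),
    "_cons": (-9060.303, 5398.964),
}


def t_test(coef, se, t_crit=T_CRIT):
    """Return the t statistic for H0: beta = 0 and whether H0 is rejected."""
    t_stat = coef / se
    return t_stat, abs(t_stat) > t_crit


results = {name: t_test(coef, se) for name, (coef, se) in estimates.items()}
for name, (t_stat, reject) in results.items():
    print(f"{name:8s} t = {t_stat:7.2f}  reject H0: {reject}")
```

Under these inputs, H0 is rejected for all five slope coefficients but not for the constant term, whose |t| is below the critical value.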
X5    5%    (-142.7057 ; -37.20862)
Figure 9
We can see that for all slope coefficients, 0 does not belong to the confidence interval, so we reject the hypotheses H0: β1 = 0, β2 = 0, β3 = 0, β4 = 0, β5 = 0.
Conclusion: The number of crimes committed per capita, nitrogen oxide concentration (parts per 100m), the average number of rooms, weighted distance to 5 employment centers and property tax per $1000 all have statistically significant effects on median housing price at the 95% confidence level.
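The confidence-interval rule used here can be sketched as follows: a 95% interval is the estimate plus or minus the critical value times the standard error, and H0: β = 0 is rejected when the interval excludes 0. The function name conf_interval is hypothetical, the coefficient signs are assumed from the report's conclusions, and the rounded critical value 1.965 is used, so the bounds differ slightly from the Stata output.

```python
def conf_interval(coef, se, t_crit=1.965):
    """Two-sided confidence interval: coef +/- t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


# (coefficient, robust standard error) from Figure 8; signs are assumed from
# the directions stated in the report's conclusions
estimates = {
    "crime": (-150.0703, 30.45247),
    "nox": (-1737.66, 389.6642),
    "rooms": (7707.327, 670.6304),
    "dist": (-791.2588, 175.744),
    "proptax": (-89.95717, 26.84788),
    "_cons": (-9060.303, 5398.964),
}

intervals = {name: conf_interval(c, s) for name, (c, s) in estimates.items()}
for name, (low, high) in intervals.items():
    contains_zero = low <= 0 <= high
    print(f"{name:8s} ({low:10.3f} ; {high:10.3f})  contains 0: {contains_zero}")
```

The intervals for the five slopes exclude 0, matching the conclusion above; the interval for the constant term includes 0.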
In particular, with the sample we have, the estimated result shows that one more crime committed per capita decreases median housing price by $150.07, holding other factors fixed.
Hypothesis testing:
Pvalue = 0.0004 < α = 0.05 => Reject H0
Nitrogen oxide concentration (parts per 100m) has a statistically significant effect on median housing price. The higher the nitrogen oxide concentration, the lower the median housing price.
In particular, with the sample we have, the estimated result shows that one more unit of nitrogen oxide concentration decreases median housing price by $1737.66, holding other factors fixed.
Hypothesis testing:
Pvalue = 0.0004 < α = 0.05 => Reject H0
The average number of rooms has a statistically significant effect on median housing price. The higher the average number of rooms, the higher the median housing price.
Pvalue = 0.0004 < α = 0.05 => Reject H0
Weighted distance to 5 employment centers has a statistically significant effect on median housing price. The higher the weighted distance to 5 employment centers, the lower the median housing price.
In particular, with the sample we have, the estimated result shows that a one-unit increase in weighted distance to 5 employment centers decreases median housing price by $791.26, holding other factors fixed.
Hypothesis testing:
Pvalue = 0.0008 < α = 0.05 => Reject H0
Property tax per $1000 has a statistically significant effect on median housing price. The higher the property tax per $1000, the lower the median housing price.
In particular, with the sample we have, the estimated result shows that a $1 increase in property tax per $1000 decreases median housing price by $89.96, holding other factors fixed.
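The p-value decision rule can be illustrated with a short Python sketch. It is an approximation, not the exact Stata computation: with 500 degrees of freedom the t distribution is close to the standard normal, so the two-sided p-value is approximated here with the normal distribution via math.erfc. The function name two_sided_p_normal is hypothetical, and the property tax coefficient and robust standard error are taken from Figure 8 with the sign assumed from the report's conclusions.

```python
from math import erfc, sqrt

ALPHA = 0.05


def two_sided_p_normal(t_stat):
    """Two-sided p-value using a standard normal approximation to the t distribution."""
    return erfc(abs(t_stat) / sqrt(2))


# Property tax: coefficient and robust standard error from Figure 8
t_proptax = -89.95717 / 26.84788
p_proptax = two_sided_p_normal(t_proptax)

print(f"t = {t_proptax:.2f}, approximate p-value = {p_proptax:.4f}")
print("Reject H0" if p_proptax < ALPHA else "Do not reject H0")
```

Under this approximation the p-value for property tax comes out at about 0.0008, in line with the value reported above, and H0 is rejected at the 5% level.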
we do not reduce the model by dropping this subset.
IX RESULT ANALYSIS AND POLICY IMPLICATION
From the data analysis in the previous sections, we have gained an overall view of the data set in terms of the statistical relationship between housing prices and each of the proposed factors. As mentioned at the beginning of this report, we aim to learn how the security of the neighborhood, air pollution, the size of the house, accessibility and the property tax are associated with housing price. In other words, we are concerned with how much buyers are willing to pay for these components.
Following the data analysis, the regression run and the hypothesis testing, we can conclude that the security of the neighborhood, air pollution, the size of the house, accessibility and the property tax all have statistically significant effects on housing prices. Therefore, tenants, investors and constructors should take all of these factors into account when making deals.
X CONCLUSION
This report was completed thanks to the dedicated contribution of each member and the knowledge gained from our study of Econometrics. This research has given us a good opportunity to practice what we have learned and to gain a deeper understanding of data analysis and the relevant tests. We hope that our research sheds some light on the relationship between housing prices and the other factors considered.
Again, due to our limited understanding and resources, our report may contain misinterpretations. We hope that our teacher and readers can give us constructive comments on the report so that we can improve and do better in the future.