I INTRODUCTION
1 Overview of econometrics
2 Why choose OLS?
II QUESTION OF INTEREST
III ECONOMIC MODEL
1 Choosing the variables
2 Embedding that target in a general unrestricted model (GUM)
IV ECONOMETRICS MODEL
1 Population regression function (PRF)
2 Sample regression function (SRF)
V DATA COLLECTION
1 Data overview
2 Data description
VI ESTIMATION OF ECONOMETRIC MODEL
1 Checking the correlation among variables
2 Regression run
VII CHECK MULTICOLLINEARITY AND HETEROSKEDASTICITY
1 Multicollinearity
2 Heteroskedasticity
VIII HYPOTHESES POSTULATED
1 The t test
2 Confidence intervals
3 P-value
4 Testing the overall significance: The F test
IX RESULT ANALYSIS AND POLICY IMPLICATION
X CONCLUSION
XI REFERENCES
I INTRODUCTION
1 Overview of econometrics
Econometrics is the application of statistical methods to economic data and is described as the branch of economics that aims to give empirical content to economic relations. More precisely, it is the quantitative analysis of actual economic problems, based on the concurrent development of theory and observation, related by appropriate methods of inference. Economists therefore often compare econometrics to a tool that sifts through mountains of data to extract simple relationships.
Econometrics is effective because it combines economic theory with statistical theory and mathematical statistics to develop and evaluate its methods. In practice, econometrics helps economists to assess economic theories, build econometric models, analyze economic history and forecast economic outcomes.
Aware of the importance of econometrics to the study of economic phenomena, our group decided to carry out an econometric study, “The factors that have influence on median housing price”, which aims to analyze the statistics and to identify differences in price levels and the reasons for them.
The data set has 506 observations with 12 variables in total. We choose 6 variables: price, crime, nox, rooms, dist and proptax. Price is the dependent variable and the other five are independent variables. The general method used in this research is OLS (ordinary least squares), and the estimation is run in Stata.
While carrying out this research, our group was lucky to be guided thoroughly by Dr Dinh Thi Thanh Binh. We are grateful for everything you have taught us!
This is the first time our group has carried out an econometric study, so mistakes in our work are unavoidable. We would be pleased to receive your feedback so that we can do better next time.
2 Why choose OLS?
Ordinary least squares (OLS) is a type of linear least squares method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the given dataset and the values predicted by the linear function.
With the six selected variables, we use OLS because all regressors are assumed to be exogenous and the effects of the independent variables on the dependent variable are assumed to be linear. In addition, under the classical assumptions listed below, the OLS estimators are linear, unbiased and have the smallest variance among all linear unbiased estimators.
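The least-squares principle described above can be illustrated with a minimal sketch for a single regressor. The data are hypothetical (they are not taken from our data set), and the function names `ols_simple` and `rss` are our own; the report itself estimates the model in Stata.

```python
# Minimal sketch of the least-squares principle with one regressor.
# The data below are hypothetical and are NOT taken from the report's data set.

def ols_simple(x, y):
    """Return (intercept, slope) that minimise the residual sum of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rss(x, y, intercept, slope):
    """Residual sum of squares for a candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


rooms = [4.0, 5.0, 6.0, 7.0, 8.0]        # hypothetical number of rooms
price = [12.0, 18.0, 22.0, 30.0, 33.0]   # hypothetical prices, in $1000

b0, b1 = ols_simple(rooms, price)
print("intercept:", b0, "slope:", b1)
print("RSS at the OLS line:", rss(rooms, price, b0, b1))
print("RSS at a shifted line:", rss(rooms, price, b0 + 0.1, b1))
```

Any other line, such as the shifted one in the last call, gives a larger residual sum of squares than the OLS line.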
When using OLS, we rely on the following basic assumptions:
1. The regression model is linear in the parameters
2. X values are fixed in repeated sampling, so that Xi and ui are uncorrelated
3. Zero mean value of the disturbance: E(ui) = 0
4. Homoscedasticity, or equal variance of ui: var(ui) = σ2
5. No correlation between disturbances
6. The model is correctly specified
7. The number of observations must be greater than the number of parameters to be estimated
8. X values in a given sample must not all be the same
9. No perfect multicollinearity
10. The disturbances ui are normally distributed
II QUESTION OF INTEREST
We have always wondered: “Why do housing prices among locations and regions differ so much?” Housing prices are affected by many different factors such as structure, neighborhood, accessibility, air pollution and so on. To answer that question, our group uses the collected data to build and run a regression model, and then analyzes the results to answer the question of interest above.
III ECONOMIC MODEL
Based on the provided data, the economic model used in this report is an empirical one. Note that the fundamental model is mathematical; with an empirical model, however, data are gathered for the variables and, using accepted statistical techniques, are used to provide estimates of the model's values.
1 Choosing the variables
Having described the data with the command “des” in file… from Stata, we obtain the following result:
des
obs: 506
vars: 12          31 Oct 1996 16:37
size: 22,770
variable   variable label
price      median housing price, $
crime      crimes committed per capita
dist       wght dist to 5 employ centers
…          average student-teacher ratio
…          perc of people 'lower status'
lnox       log(nox)   (float, %9.0g)
Figure 1
The above table lists the factors that may influence housing price, recorded over 506 observations. After careful discussion, our group decided to use price as the dependent variable Y and crime, nox, rooms, dist and proptax as the independent variables:
Price = f(crime, nox, rooms, dist, proptax)
A brief description of each variable is given in Figure 2.
Name      Meaning                                                    Expected sign
Dependent variable (Y)
Price     Median housing price                                       +
Independent variables (X)
Crime     Number of crimes committed per capita                      -
Nox       Nitrogen oxide concentration in the air, parts per 100m    …
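IV ECONOMETRICS MODEL
1 Population regression function (PRF)
PRF:
$\text{price}_i = \beta_0 + \beta_1\,\text{crime}_i + \beta_2\,\text{nox}_i + \beta_3\,\text{rooms}_i + \beta_4\,\text{dist}_i + \beta_5\,\text{proptax}_i + u_i$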
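2 Sample regression function (SRF)
SRF:
$\widehat{\text{price}}_i = \hat{\beta}_0 + \hat{\beta}_1\,\text{crime}_i + \hat{\beta}_2\,\text{nox}_i + \hat{\beta}_3\,\text{rooms}_i + \hat{\beta}_4\,\text{dist}_i + \hat{\beta}_5\,\text{proptax}_i$
where:
$\hat{\beta}_0$ is the estimated intercept of the regression model
$\hat{\beta}_j$ ($j = 1, \dots, 5$) is the estimated slope coefficient of the $j$-th independent variable
$u_i$ is the disturbance of the regression model (it appears in the PRF)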
V DATA COLLECTION
Variable   Obs   Mean       Std. Dev.   Min     Max
crime      …     …          8.590247    0.006   88.976
nox        506   5.549783   1.15839…    …       …
rooms      506   6.284051   0.7025938   3.56    8.78
dist       506   3.795751   2.106137    1.13    12.13
proptax    …     40.82372   16.8537…    …       …
Figure 3
where:
Obs is the number of observations
Mean is the sample mean of the variable
Std. Dev. is the standard deviation of the variable
Min is the minimum value of the variable
Max is the maximum value of the variable
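VI ESTIMATION OF ECONOMETRIC MODEL
1 Checking the correlation among variables
           price     crime     nox       rooms     dist      proptax
price      1.0000
crime     -0.3879    1.0000
nox       -0.4260    …         1.0000
rooms      0.6958   -0.2188   -0.3028    1.0000
dist       0.2493   -0.3799   -0.7702    0.2054    1.0000
proptax   -0.4671    0.5828    0.6670   -0.2921   -0.5344    1.0000
Figure 4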
First, the correlation between price and nox, crime, rooms, dist and proptax is checked by calculating the correlation coefficients among these variables. The correlation coefficient measures the strength and direction of the linear relationship between two variables. In Stata, the correlation matrix is generated with the command:
corr price crime nox rooms dist proptax
From the matrix, the correlation between price and each of the independent variables is strong enough to justify running the regression model. Specifically:
- The correlation coefficient between price and crime is -0.3879 => price and crime have a moderate negative relationship
- The correlation coefficient between price and nox is -0.426 => price and nox have a moderate negative relationship
- The correlation coefficient between price and rooms is 0.6958 => price and rooms have a moderate positive relationship, the strongest of the five
- The correlation coefficient between price and dist is 0.2493 => price and dist have a weak positive relationship
- The correlation coefficient between price and proptax is -0.4671 => price and proptax have a moderate negative relationship
The independent variables rooms and dist have correlation coefficients larger than 0, which means they are positively related to the dependent variable. The highest coefficient, 0.6958 (between rooms and price), indicates that rooms has the strongest linear association with price: when rooms increases, price tends to increase considerably. On the other hand, the correlation coefficient between price and dist is 0.2493, which implies a weak connection: when dist increases, price tends to increase, but not by much.
In addition, all pairwise correlation coefficients among the independent variables are below 0.8 in absolute value (the largest is -0.7702, between nox and dist), so the correlation matrix gives no indication of a serious multicollinearity problem.
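To make the meaning of the correlation coefficient concrete, the following is a minimal sketch of the Pearson formula that underlies a correlation matrix. The two short series are hypothetical and only mimic the negative nox-dist relationship; `pearson_r` is our own helper name.

```python
import math

# Minimal sketch of the Pearson correlation coefficient.
# The two series are hypothetical and are NOT taken from the report's data set.

def pearson_r(x, y):
    """Sample correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


dist = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical distances
nox = [8.0, 7.1, 5.9, 5.2, 4.0]    # hypothetical pollution, falling with distance

r = pearson_r(dist, nox)
print("r =", round(r, 4))
print("strongly correlated (|r| > 0.8):", abs(r) > 0.8)
```

The sign of r gives the direction of the relationship and its absolute value gives the strength, which is how the entries of the matrix above are read.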
2 Regression run
Having checked the correlation among the variables, we are ready to run the regression model. In Stata, this is done with the command:
reg price nox crime rooms dist proptax
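Figure 5: Stata output of the OLS regression of price on nox, crime, rooms, dist and proptax. Recoverable entries: proptax, 95% confidence interval [-136.3551, -43.55923]; _cons, P>|t| = 0.023, 95% confidence interval [-16877.67, -1242.937].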
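From the table above we have the sample regression function:
$\widehat{\text{price}} = -9060.303 - 150.0703\,\text{crime} - 1737.66\,\text{nox} + 7707.327\,\text{rooms} - 791.2588\,\text{dist} - 89.95717\,\text{proptax}$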
From the result, it can be inferred that crime, nox, rooms, dist and proptax all have statistically significant effects on price at the 5% significance level (as all their p-values are smaller than 0.05). In particular, those effects are given by the regression coefficients as follows:
$\hat{\beta}_0 = -9060.303$ is the estimated intercept.
$\hat{\beta}_1 = -150.0703$ means that if crimes committed per capita increase by one unit, the housing price is expected to decrease by 150.0703, holding other factors constant.
$\hat{\beta}_2 = -1737.66$ means that if the nitrogen oxide concentration (parts per 100m) increases by one unit, the housing price is expected to decrease by 1737.66, holding other factors constant.
$\hat{\beta}_3 = 7707.327$ means that if the average number of rooms increases by one, the housing price is expected to increase by 7707.327, holding other factors constant.
$\hat{\beta}_4 = -791.2588$ means that if the weighted distance to 5 employment centers increases by one unit, the housing price is expected to decrease by 791.2588, holding other factors constant.
$\hat{\beta}_5 = -89.95717$ means that if the property tax per $1000 increases by one unit, the housing price is expected to decrease by 89.95717, holding other factors constant.
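The "other factors held constant" reading of these coefficients can be checked with a minimal sketch. The coefficients are copied from the estimates reported above; the neighborhood characteristics in `base` are hypothetical values chosen only for illustration, and `predict_price` is our own helper name.

```python
# Coefficients copied from the estimated regression reported in the text.
COEF = {
    "_cons": -9060.303,
    "crime": -150.0703,
    "nox": -1737.66,
    "rooms": 7707.327,
    "dist": -791.2588,
    "proptax": -89.95717,
}


def predict_price(crime, nox, rooms, dist, proptax):
    """Fitted price from the sample regression function."""
    values = {"crime": crime, "nox": nox, "rooms": rooms,
              "dist": dist, "proptax": proptax}
    return COEF["_cons"] + sum(COEF[name] * value for name, value in values.items())


# Hypothetical neighborhood (illustrative values, not from the data set).
base = dict(crime=3.0, nox=5.5, rooms=6.0, dist=4.0, proptax=40.0)
one_more_room = dict(base, rooms=base["rooms"] + 1)

fitted = predict_price(**base)
effect = predict_price(**one_more_room) - fitted
print("fitted price:", round(fitted, 2))
print("effect of one extra room:", round(effect, 3))
```

Changing only rooms by one unit moves the fitted price by exactly the rooms coefficient, whatever values the other four variables take.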
The coefficient of determination is R-squared = 0.5883: the independent variables (crime, nox, rooms, dist, proptax) jointly explain 58.83% of the variation in the dependent variable (price); other factors not included in the model account for the remaining 41.17% of the variation in price.
Other indicators:
- Adjusted coefficient of determination: adj R-squared = 0.5842
- Total Sum of Squares: TSS = 4.28E+10
- Explained Sum of Squares: ESS = 2.52E+10
- Residual Sum of Squares: RSS = 1.76E+10 (consistent with Root MSE = 5937.9, since RSS = Dfr × Root MSE² = 500 × 5937.9² ≈ 1.76E+10)
- Degrees of freedom of the model: Dfm = 5
- Degrees of freedom of the residual: Dfr = 500
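As a check on these indicators, the adjusted R-squared can be recomputed from R-squared, the number of observations and the number of regressors. The sketch below uses only figures reported in this report (R-squared = 0.5883, 506 observations, 5 regressors); `adjusted_r2` is our own helper name.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared: penalises R-squared for the number of regressors."""
    df_total = n_obs - 1
    df_resid = n_obs - n_regressors - 1
    return 1.0 - (1.0 - r2) * df_total / df_resid


n_obs, n_regressors = 506, 5
df_model = n_regressors                  # Dfm
df_resid = n_obs - n_regressors - 1      # Dfr

adj = adjusted_r2(0.5883, n_obs, n_regressors)
print("Dfm =", df_model, "Dfr =", df_resid)
print("adj R-squared =", round(adj, 4))
```

The result rounds to 0.5842, which agrees with the value reported by Stata.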
VII CHECK MULTICOLLINEARITY AND HETEROSKEDASTICITY
1 Multicollinearity
Multicollinearity is a high degree of correlation among the explanatory variables, which may make it difficult to separate the effects of the individual regressors: standard errors may be inflated and t-values depressed.
Detecting multicollinearity
o Method 1: Use the corr command to examine multicollinearity
If independent variables are strongly correlated (|r| > 0.8), multicollinearity may occur
           price     crime     nox       rooms     dist      proptax
price      1.0000
crime     -0.3879    1.0000
nox       -0.4260    …         1.0000
rooms      0.6958   -0.2188   -0.3028    1.0000
dist       0.2493   -0.3799   -0.7702    0.2054    1.0000
proptax   -0.4671    0.5828    0.6670   -0.2921   -0.5344    1.0000
Figure 6
From the table above, the correlation coefficients among the independent variables are all smaller than 0.8 in absolute value; the largest is -0.7702, between nox and dist. As a result, this method gives no indication of serious multicollinearity in the model.
o Method 2: Use the variance inflation factor (VIF)
If VIF > 10, multicollinearity occurs
Variable VIF 1/VIF
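The VIF rule can be illustrated with a minimal sketch of the definition VIF_j = 1 / (1 - R_j²), where R_j² is the R-squared from regressing regressor j on the other regressors. We do not have those auxiliary R-squared values here, so, as an assumption for illustration only, the sketch plugs in the squared pairwise correlation between nox and dist (-0.7702). Because R_j² from all the other regressors is at least as large as the squared correlation with any single one of them, this gives a lower bound on the VIF of nox and dist, not the VIF that Stata reports.

```python
def vif(r2_aux):
    """Variance inflation factor from an auxiliary R-squared in [0, 1)."""
    if not 0.0 <= r2_aux < 1.0:
        raise ValueError("auxiliary R-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r2_aux)


# Pairwise correlation between nox and dist, taken from the correlation matrix.
r_nox_dist = -0.7702

# Lower bound only: the full auxiliary regression can only raise R-squared.
vif_lower_bound = vif(r_nox_dist ** 2)
print("lower bound on VIF for nox and dist:", round(vif_lower_bound, 4))

# The rule of thumb VIF > 10 corresponds to an auxiliary R-squared above 0.9.
print("VIF at R-squared = 0.9:", round(vif(0.9), 4))
```

The column 1/VIF in the Stata table is the tolerance, 1 - R_j².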
2 Heteroskedasticity
Another problem that our model may suffer from is heteroskedasticity. Under heteroskedasticity, the least squares estimators are still unbiased but are no longer efficient; in addition, the usual estimators of their variances become biased, so the t and F tests based on them become unreliable.
The homoskedasticity assumption states that the variance of each error term $u_i$ stays the same as $i$ moves from 1 to $n$. It can also be written as:
$\mathrm{Var}(u_i) = \mathrm{Var}(u_j) = \sigma^2$ for all $i, j = 1, 2, 3, \dots, n$
When that assumption is violated, heteroskedasticity appears.
Causes
o The nature of economic phenomena: the subjects examined differ in scale, or the periods examined differ in their level of fluctuation.
o The functional form of the model is misspecified, for example because relevant variables are omitted or the functional form is wrong.
o The data cannot fully and correctly reflect the nature of the economic phenomena, for example when outliers are present. Including or excluding such observations has a large impact on the regression analysis.
o Errors tend to decrease as data collection, storage and processing techniques improve.
o Agents learn from past behavior, so their errors become smaller over time.
Hypotheses:
H0: the error variance is constant (homoskedasticity)
H1: the error variance is not constant (heteroskedasticity)
Using the command estat hettest in Stata:
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1) = 26.56
Prob > chi2 = 0.0000
We can see that Prob > chi2 = 0.0000 < 0.05, so we reject H0 in favor of H1 at the 5% significance level.
We conclude that heteroskedasticity does occur in this model.
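The reported p-value can be reproduced from the test statistic alone. The sketch below relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function; `chi2_sf_1df` is our own helper name, and the formula is valid only for 1 degree of freedom.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


chi2_stat = 26.56                     # chi2(1) reported by estat hettest
p_value = chi2_sf_1df(chi2_stat)
reject_h0 = p_value < 0.05

print("p-value:", p_value)
print("reject H0 at the 5% level:", reject_h0)

# Sanity check: 3.841 is the usual 5% critical value of chi2(1).
print("p-value at 3.841:", round(chi2_sf_1df(3.841), 4))
```

The p-value is far below 0.0001, which is why Stata displays it as 0.0000.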
Correcting heteroskedasticity
We use the command:
reg price crime nox rooms dist proptax, robust
and obtain the following result:
R-squared = 0.5883
Root MSE = 5937.9
price      Coef.       Robust Std. Err.   t       P>|t|   [95% Conf. Interval]
crime      -150.0703   30.45247           -4.93   0.000   -209.9009   -90.23976
nox        -1737.66    389.6642           -4.46   0.000   -2503.241   -972.0787
rooms      7707.327    670.6304           11.49   0.000   6389.726    9024.928
dist       -791.2588   175.744            -4.50   0.000   -1136.546   -445.9712
proptax    -89.95717   26.84788           -3.35   0.001   -142.7057   -37.20862
_cons      -9060.303   5398.964           -1.68   0.094   -19667.75   1547.148
Figure 8
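Compared with the earlier regression, none of the coefficient estimates has changed, but the standard errors, and hence the t-values, are different; the resulting p-values remain valid under heteroskedasticity. With robust standard errors, all five slope coefficients remain significant at the 5% level, while the intercept is not (p = 0.094).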
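The t-values and confidence intervals in the robust output follow directly from the coefficients and robust standard errors. The sketch below recomputes them for crime and for the intercept. The critical value 1.96472 is our assumption for the 97.5th percentile of the t distribution with 500 degrees of freedom; it is not printed in the Stata output, and `t_and_ci` is our own helper name.

```python
# Assumed critical value: 97.5th percentile of t with 500 degrees of freedom.
T_CRIT = 1.96472

# (coefficient, robust standard error) copied from the robust regression output.
ROBUST = {
    "crime": (-150.0703, 30.45247),
    "_cons": (-9060.303, 5398.964),
}


def t_and_ci(coef, se, t_crit=T_CRIT):
    """Return the t statistic and the 95% confidence interval bounds."""
    t_stat = coef / se
    half_width = t_crit * se
    return t_stat, coef - half_width, coef + half_width


t_crime, lo_crime, hi_crime = t_and_ci(*ROBUST["crime"])
t_cons, lo_cons, hi_cons = t_and_ci(*ROBUST["_cons"])

print("crime: t =", round(t_crime, 2), "CI =", (round(lo_crime, 4), round(hi_crime, 4)))
print("_cons: t =", round(t_cons, 2), "CI =", (round(lo_cons, 2), round(hi_cons, 2)))
print("_cons interval contains zero:", lo_cons < 0.0 < hi_cons)
```

An interval that excludes zero, as for crime, corresponds to a coefficient that is significant at the 5% level; the intercept's interval contains zero, in line with its p-value of 0.094.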
VIII HYPOTHESES POSTULATED
Conclusion: The number of crimes committed per capita has a statistically significant effect on median housing price. The higher the number of crimes committed per capita, the lower the median housing price.