FOREIGN TRADE UNIVERSITY
FACULTY OF INTERNATIONAL ECONOMY
REPORT ECONOMETRICS
HOUSING PRICES AND FACTORS AFFECTING HOUSING PRICES IN CALIFORNIA, USA
Instructor: MSc Chu Thi Mai Phuong
Student: Group AH – KTEE309.2
1 Nguyen Lan Anh 1612250005
Tasks: estimate the lin-lin and log-lin models; test the lin-lin model; find source data; describe the data and variables; edit the report.
Evaluation: enthusiastic; responsible for assigned work; has a sense of responsibility for the work, the members' work and deadlines.
EVALUATION
INTRODUCTION
ABSTRACT
ANALYSIS
SECTION 1: DESCRIBE THE VARIABLES, DATA AND CORRELATION
I Describe the variables
II Describe the data
III Describe the correlation between variables
SECTION 2: ESTIMATED MODEL AND STATISTICAL INFERENCES
I Linear – linear model
II Log – linear model
III Log – log model
CONCLUSION
REFERENCES
Econometrics is a social science in which the tools of economic theory, mathematics and statistical inference are applied to analyze economic problems. Econometrics uses methods of mathematical statistics to uncover what the data reveal, draw conclusions from the collected statistics and make predictions about economic phenomena.
Since its inception, econometrics has given economists a sharp instrument for measuring economic relations. As economics students, we recognize the need to study econometrics for logical reasoning and problem analysis. We wanted to better understand how econometrics is put into practice and how to apply it effectively and correctly. For this reason, our team developed this ECONOMETRICS REPORT under the guidance of MSc Chu Thi Mai Phuong. In this report, we used the econometric analysis tool GRETL to analyze the topic:
"Housing Prices and Factors Affecting Housing Prices in California, USA".
We sincerely thank our instructor, MSc Chu Thi Mai Phuong, for helping us carry out this report. Despite all our efforts, the report will certainly contain some errors. We look forward to your comments so that our team can improve it.
According to a recent report by the National Association of Realtors in the United States, Vietnam is one of the 10 countries investing most heavily in US real estate (VnExpress). California is home to many overseas Vietnamese. The state also has a vibrant real estate market that attracts Vietnamese and other foreign investors. So what affects housing prices in this area? As economics students interested in real estate, we decided to research the topic "Housing Prices and Factors Affecting Housing Prices in California, USA".
While searching for documents, we read many foreign studies on the factors affecting housing prices in regions and countries around the world. Examples include "Macroeconomic Determinants of the Housing Market" (LSE) and "House Price Dynamics in the United States" (IMF). After synthesizing and discussing these studies, we selected a few factors that affect the price of a house as the subject of our research.
Due to limited time, we could only consider a few prominent factors, and we hope for your understanding. Thank you!
SECTION 1: DESCRIBE THE VARIABLES, DATA AND CORRELATION
I Describe the variables
The function in this report includes the following variables:
Dependent variable: salepric – sale price of a house in 2 communities of California, Dove Canyon and Coto de Caza (thousands of dollars)
Independent variables:
- sqft – Living area in square feet
- garage – Number of car spaces
- city – City: 1 for Coto de Caza and 0 for Dove Canyon
II Describe the data
Data collection
We collected data on the sale price and characteristics of houses in 2 communities of California, Dove Canyon and Coto de Caza. The data come from the Ramanathan datasets in Gretl and contain 224 observations.
III Describe the correlation between variables
Correlation Matrix for Linear – linear Model:
Correlation coefficients, using the observations 1 - 224
5% critical value (two-tailed) = 0.1311 for n = 224
- salepric is positively correlated with sqft; the correlation between these two variables is quite high.
- salepric is positively correlated with garage; the correlation between these two variables is moderate.
- salepric is positively correlated with city; the correlation between these two variables is moderate.
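The 5% two-tailed critical value of 0.1311 reported by Gretl can be reproduced from the t distribution with n - 2 degrees of freedom, using r* = t* / sqrt(t*² + n - 2). The sketch below is a minimal check under one assumption: t* ≈ 1.9707 for 222 degrees of freedom, taken from a t table rather than from the report.

```python
import math

def critical_r(t_crit: float, n: int) -> float:
    """Two-tailed critical value of a Pearson correlation for sample size n.

    Rearranges t = r * sqrt(n - 2) / sqrt(1 - r**2) to solve for r.
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Assumption: 5% two-tailed t critical value for 222 df, read from a t table.
T_CRIT_222 = 1.9707
r_star = critical_r(T_CRIT_222, 224)
print(f"critical |r| = {r_star:.4f}")  # a correlation above this is significant at 5%
```

Any correlation in the matrices whose absolute value exceeds this threshold is significantly different from zero at the 5% level.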
Correlation Matrix for Log – linear Model:
Correlation coefficients, using the observations 1 - 224
5% critical value (two-tailed) = 0.1311 for n = 224
Correlation Matrix for Log – log Model:
Correlation coefficients, using the observations 1 - 224
5% critical value (two-tailed) = 0.1311 for n = 224
Variables: l_salepric, l_sqft, l_garage, city
SECTION 2: ESTIMATED MODEL AND STATISTICAL INFERENCES
I Linear – linear model
Describe the basic results of the estimated function:
- The Population regression function is set up:
$salepric_i = \beta_1 + \beta_2 sqft_i + \beta_3 garage_i + \beta_4 city_i + u_i$
- The Sample regression function is set up:
$\widehat{salepric}_i = \hat{\beta}_1 + \hat{\beta}_2 sqft_i + \hat{\beta}_3 garage_i + \hat{\beta}_4 city_i$
- Equation of regression:
$\widehat{salepric} = -704.854 + 0.220060\,sqft + 129.286\,garage + 101.275\,city$
Interpretation of the coefficients:
$\hat{\beta}_2$ = 0.220060: When sqft increases by 1 square foot, holding garage and city constant, the estimated value of salepric increases by 0.220060 thousand dollars (about $220).
$\hat{\beta}_3$ = 129.286: When garage increases by 1 car space, holding sqft and city constant, the estimated value of salepric increases by 129.286 thousand dollars.
$\hat{\beta}_4$ = 101.275: Holding sqft and garage constant, the expected sale price of a house in Coto de Caza is 101.275 thousand dollars higher than that of a house in Dove Canyon.
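The coefficient interpretations above can be checked by plugging values into the estimated lin-lin equation. The sketch below uses only the reported coefficients. The house it prices (3,000 sq ft, 2 car spaces) is hypothetical and chosen for illustration. Comparing two otherwise identical houses shows that the city gap equals $\hat{\beta}_4$.

```python
# Coefficients of the estimated lin-lin model (salepric in thousands of dollars).
B1, B2, B3, B4 = -704.854, 0.220060, 129.286, 101.275

def predict_salepric(sqft: float, garage: int, city: int) -> float:
    """Fitted sale price (thousands of dollars) from the lin-lin regression."""
    return B1 + B2 * sqft + B3 * garage + B4 * city

# Hypothetical house: 3,000 sq ft with 2 car spaces, priced in each community.
coto = predict_salepric(3000, 2, city=1)  # Coto de Caza
dove = predict_salepric(3000, 2, city=0)  # Dove Canyon
print(f"Coto de Caza: {coto:.3f}, Dove Canyon: {dove:.3f}, gap: {coto - dove:.3f}")
```

Adding one square foot to either house raises the fitted price by exactly $\hat{\beta}_2$ = 0.220060 thousand dollars. This is what "holding the other variables constant" means in a linear model.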
The coefficient of determination R²:
In our results, R² indicates the proportion of the variability of the response data around its mean that the model explains.
R² = 0.881604 is quite high, which suggests that the model fits the data well. It means that 88.1604% of the sample variation in the dependent variable (sale price) is explained by changes in the independent variables (living area, number of car spaces and city).
2 Testing
2.1 Testing hypotheses
2.1.1 Testing an individual regression coefficient
Purpose: Test the statistical significance of the effect of each independent variable on the dependent variable. We have: α = 0.05
Testing the variable of living area in square feet (sqft):
Given that the hypothesis is:
$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
We see: P-value of sqft < 0.0001 < 0.05 → Reject $H_0$ → The coefficient $\beta_2$ is statistically significant.
Testing the variable of number of car spaces (garage):
Given that the hypothesis is:
$H_0: \beta_3 = 0$
$H_1: \beta_3 \neq 0$
We see: P-value of garage < 0.0001 < 0.05 → Reject $H_0$ → The coefficient $\beta_3$ is statistically significant.
Testing the variable of city:
Given that the hypothesis is:
$H_0: \beta_4 = 0$
$H_1: \beta_4 \neq 0$
We see: P-value of city < 0.0001 < 0.05 → Reject $H_0$ → The coefficient $\beta_4$ is statistically significant.
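Each p-value above comes from a t test on one coefficient. The model has n = 224 observations and k = 4 parameters. The decision rule can be written as below. The critical value of about 1.97 is an approximate table value we assume, not a number from the report.

$$t_j = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \sim t(n-k), \qquad n - k = 224 - 4 = 220$$

$$\text{Reject } H_0 \text{ if } |t_j| > t_{0.025}(220) \approx 1.97, \text{ equivalently if p-value} < \alpha = 0.05$$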
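2.1.2 Testing the overall significance.
Purpose: Test the null hypothesis that none of the explanatory variables has an effect on the dependent variable. We have: α = 0.05
Given that the hypothesis is:
$H_0: \beta_2 = \beta_3 = \beta_4 = 0$
$H_1$: at least one $\beta_j \neq 0$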
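Gretl's overall F statistic is not reproduced in the text. It can, however, be recovered up to rounding from the reported R² = 0.881604, with n = 224 observations and k = 4 parameters, using F = (R²/(k - 1)) / ((1 - R²)/(n - k)). This is a sketch. The 5% critical value of about 2.65 for F(3, 220) is an approximate table value we assume, not a number from the report.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F statistic from R^2, with k parameters including the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

F = f_from_r2(0.881604, n=224, k=4)
F_CRIT = 2.65  # assumption: approximate 5% critical value of F(3, 220) from tables
print(f"F = {F:.2f}; reject H0: {F > F_CRIT}")
```

Because F far exceeds the critical value, $H_0$ is rejected and the regressors are jointly significant.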
2.2 Testing the model’s problems.
2.2.1 Testing for omitted variables
Given that the hypothesis is:
$H_0$: The model does not omit variables
$H_1$: The model omits variables
Ramsey’s RESET:
Auxiliary regression for RESET specification test
OLS, using observations 1-224
Dependent variable: salepric
with p-value = P(F(2,218) > 13.7772) = 2.32e-006
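We see: p-value = P(F(2,218) > 13.7772) = 2.32e-006 < α = 0.05 → Reject $H_0$ → The model omits variables. More precisely, RESET signals a specification error, which may be an omitted variable or an incorrect functional form.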
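Ramsey's RESET re-estimates the model with powers of the fitted values added and tests whether they matter. The F(2, 218) degrees of freedom are consistent with adding the squares and cubes of the fitted values (224 - 4 - 2 = 218). This is our reading of the output rather than something the report states. On that reading, the auxiliary regression is:

$$salepric_i = \beta_1 + \beta_2 sqft_i + \beta_3 garage_i + \beta_4 city_i + \gamma_1 \widehat{salepric}_i^{\,2} + \gamma_2 \widehat{salepric}_i^{\,3} + v_i$$

$$H_0: \gamma_1 = \gamma_2 = 0, \qquad F \sim F(2,\; 224 - 6) = F(2, 218)$$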
Method: Given the limited scope of our research, we will read further literature to find out which variables are omitted.
2.2.2 Testing multicollinearity.
We use the variance inflation factor (VIF) to examine multicollinearity. If a variable has VIF > 10, the model may suffer from multicollinearity.
Using the VIF command in Gretl, we obtain the following result:
Variance Inflation Factors
Minimum possible value = 1.0
Values > 10.0 may indicate a collinearity problem
VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient between variable j and the other independent variables
Belsley-Kuh-Welsch collinearity diagnostics:
lambda = eigenvalues of X'X, largest to smallest
cond = condition index
note: variance proportions columns sum to 1.0
We see: VIF(sqft) = 1.742 < 10
VIF(garage) = 1.512 < 10
VIF(city) = 1.224 < 10
→ The model does not suffer from serious multicollinearity.
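Inverting VIF(j) = 1/(1 - R(j)^2) shows how much of each regressor is explained by the other two. The sketch below applies this to the reported VIF values.

```python
# VIF values reported by Gretl for the lin-lin model.
vifs = {"sqft": 1.742, "garage": 1.512, "city": 1.224}

# Invert VIF(j) = 1 / (1 - R(j)^2) to get the auxiliary R^2 of each regressor.
aux_r2 = {name: 1 - 1 / v for name, v in vifs.items()}
for name, r2 in aux_r2.items():
    print(f"{name}: R(j)^2 = {r2:.3f}")
```

Even for sqft, the other regressors explain less than half of its variation. This agrees with the VIF < 10 conclusion.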
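2.2.3 Testing heteroskedasticity.
Given that the hypothesis is:
$H_0$: The model has homoskedastic errors
$H_1$: The model has heteroskedasticity
White's test for heteroskedasticity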
OLS, using observations 1-224
Dependent variable: uhat^2
Unadjusted R-squared = 0.666709
Test statistic: TR^2 = 149.342835,
with p-value = P(Chi-square(8) > 149.342835) = 2.68837e-028
We see: p-value = P(Chi-square(8) > 149.342835) = 2.68837e-028 < α = 0.05 → Reject $H_0$ → The model has a heteroskedasticity problem.
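White's statistic is n times the unadjusted R² of the auxiliary regression of the squared residuals. The 8 degrees of freedom are consistent with 3 regressors, 2 squares (city² duplicates city because city is a dummy) and 3 cross products. The sketch below recomputes TR² from the rounded R². The 5% critical value of 15.507 for chi-square(8) is a table value we assume.

```python
n = 224
r2_aux = 0.666709  # unadjusted R-squared of the White auxiliary regression
tr2 = n * r2_aux
CHI2_CRIT_8 = 15.507  # assumption: 5% critical value of chi-square(8) from tables
print(f"TR^2 = {tr2:.3f}; reject H0: {tr2 > CHI2_CRIT_8}")
```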
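Method: Use robust standard errors to deal with the problem:
Model 2: OLS, using observations 1-224
Dependent variable: salepric
Heteroskedasticity-robust standard errors
→ The coefficient estimates are unchanged and the t- and F-tests are now valid. However, the heteroskedasticity itself is still present, so OLS is no longer BLUE.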
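Robust (White) standard errors replace the usual OLS variance formula with a "sandwich" estimator, while the coefficients themselves stay the same. In its basic HC0 form, the estimator is shown below. Gretl's output line above is truncated, so it does not show which variant was used. HC1, for example, rescales the result by n/(n - k).

$$\widehat{Var}(\hat{\beta}) = (X'X)^{-1}\left(\sum_{i=1}^{n} \hat{u}_i^2\, x_i x_i'\right)(X'X)^{-1}$$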
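2.2.4 Testing normality of residuals.
Given that the hypothesis is:
$H_0$: The residuals are normally distributed
$H_1$: The residuals are not normally distributed
Using the normality test of residuals in Gretl: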
Test for normality of residual
-Null hypothesis: error is normally distributed
Test statistic: Chi-square(2) = 265.203
with p-value = 2.58197e-058
We see: Chi-square(2) = 265.203 with p-value = 2.58197e-058 < α = 0.05 → Reject $H_0$ → The residuals are not normally distributed.
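For a chi-square distribution with 2 degrees of freedom, the upper-tail probability has the closed form P(X > x) = exp(-x/2). This lets the reported p-value be checked with the standard library alone. The sketch below does this for the lin-lin statistic.

```python
import math

def chi2_df2_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square(2) statistic: P(X > x) = exp(-x / 2)."""
    return math.exp(-stat / 2)

p = chi2_df2_pvalue(265.203)
print(f"p-value = {p:.5e}")  # agrees with Gretl's 2.58197e-058 up to rounding of the statistic
```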
Method: Increase the number of observations to n ≥ 384.
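The report does not say where the threshold of 384 observations comes from. It matches Cochran's sample-size formula for estimating a proportion at 95% confidence (z = 1.96), with a 5% margin of error and p = 0.5. That origin is our assumption. The general idea behind a larger sample is that, in large samples, the OLS estimators are approximately normal even when the errors are not.

```python
def cochran_n(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> float:
    """Cochran's sample size for estimating a proportion: n = z^2 p (1 - p) / e^2."""
    return z ** 2 * p * (1 - p) / e ** 2

n0 = cochran_n()
print(f"n0 = {n0:.2f}")  # about 384
```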
II Log – linear model.
● Describe the basic results of the estimated function:
- The Population regression function is set up:
$\ln(salepric_i) = \beta_1 + \beta_2 sqft_i + \beta_3 garage_i + \beta_4 city_i + u_i$
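- The Sample regression function is set up:
$\widehat{\ln(salepric_i)} = \hat{\beta}_1 + \hat{\beta}_2 sqft_i + \hat{\beta}_3 garage_i + \hat{\beta}_4 city_i$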
✓ $\hat{\beta}_2$ = 0.000207498: When sqft increases by 1 square foot, holding garage and city constant, the expected value of salepric increases by about 0.0207498%.
✓ $\hat{\beta}_3$ = -0.117941: When garage increases by 1 car space, holding sqft and city constant, the expected value of salepric decreases by about 11.7941%.
✓ $\hat{\beta}_4$ = 0.267482: Holding sqft and garage constant, the expected sale price of a house in Coto de Caza is about 26.7482% higher than that of a house in Dove Canyon.
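Reading 100·β as a percentage change is an approximation that works well only for small coefficients. In a log-lin model, the exact percentage change in salepric for a one-unit change in a regressor is 100·(exp(β) - 1). The sketch below applies this to the garage and city coefficients reported above.

```python
import math

def exact_pct_change(b: float) -> float:
    """Exact % change in salepric for a one-unit change in a regressor (log-lin model)."""
    return 100 * (math.exp(b) - 1)

garage_effect = exact_pct_change(-0.117941)  # one more car space
city_effect = exact_pct_change(0.267482)     # Coto de Caza vs Dove Canyon
print(f"garage: {garage_effect:.2f}%, city: {city_effect:.2f}%")
```

The exact city premium (about 30.7%) is noticeably larger than the 26.7% approximation. The garage effect changes less (about -11.1%).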
● The coefficient of determination R²:
In our results, R² indicates the proportion of the variability of the response data around its mean that the model explains.
R² = 0.888862 is quite high, which suggests that the model fits the data well. It means that 88.8862% of the sample variation in the dependent variable, ln(salepric), is explained by changes in the independent variables (sqft, garage and city).
2 Testing.
2.1 Testing hypotheses
2.1.1 Testing an individual regression coefficient
Purpose: Test the statistical significance of the effect of each independent variable on the dependent variable. We have: α = 0.05
Testing the sqft:
Given that the hypothesis is: H0: β2 = 0; H1: β2 ≠ 0
We see: P-value of sqft < 0.0001 < 0.05 → Reject H0 → The coefficient β2 is statistically significant.
Testing the garage:
Given that the hypothesis is: H0: β3 = 0; H1: β3 ≠ 0
We see: P-value of garage < 0.0001 < 0.05 → Reject H0 → The coefficient β3 is statistically significant.
Testing the city:
Given that the hypothesis is: H0: β4 = 0; H1: β4 ≠ 0
We see: P-value of city < 0.0001 < 0.05 → Reject H0 → The coefficient β4 is statistically significant.
2.1.2 Testing the overall significance.
Purpose: Test the null hypothesis that none of the explanatory variables has an effect on the dependent variable. We have: α = 0.05
Given that the hypothesis is: H0: β2 = β3 = β4 = 0; H1: at least one βj ≠ 0
→ The model is statistically significant overall.
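As with the lin-lin model, the overall F statistic can be recovered from R² = 0.888862 with n = 224 and k = 4. The value is approximate because R² is rounded, and Gretl's own figure is not shown in the text. It compares with an assumed 5% critical value of about 2.65 for F(3, 220):

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{0.888862/3}{(1-0.888862)/220} \approx 586.5 > 2.65$$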
2.2 Testing the model’s problems.
2.2.1 Testing for omitted variables.
Given that the hypothesis is:
H0: The model does not omit variables
H1: The model omits variables
Ramsey’s RESET:
Auxiliary regression for RESET specification test
OLS, using observations 1-224
Dependent variable: l_salepric
with p-value = P(F(2,218) > 11.3453) = 2.05e-005
We see: p-value = P(F(2,218) > 11.3453) = 2.05e-005 < α = 0.05 → Reject H0 → The model omits variables
Method: Given the limited scope of our research, we will read further literature to find out which variables are omitted.
2.2.2 Testing multicollinearity.
We use the variance inflation factor (VIF) to examine multicollinearity. If a variable has VIF > 10, the model may suffer from multicollinearity.
Using the VIF command in Gretl, we obtain the following result:
Variance Inflation Factors
Minimum possible value = 1.0
Values > 10.0 may indicate a collinearity problem
VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient between variable j and the other independent variables
Belsley-Kuh-Welsch collinearity diagnostics:
lambda = eigenvalues of X'X, largest to smallest
note: variance proportions columns sum to 1.0
We see: VIF (sqft) = 1.742 < 10
VIF (garage) = 1.512 < 10
VIF (city) = 1.224 < 10
→ The model does not suffer from serious multicollinearity
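2.2.3 Testing heteroskedasticity.
Given that the hypothesis is:
H0: The model has homoskedastic errors
H1: The model has heteroskedasticity
White's test for heteroskedasticity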
OLS, using observations 1-224
Dependent variable: uhat^2
Test statistic: TR^2 = 48.607688,
with p-value = P(Chi-square(8) > 48.6077) = 7.55903e-008
We see: p-value = P(Chi-square(8) > 48.6077) = 7.55903e-008 < α = 0.05 → Reject H0 → The model has a heteroskedasticity problem
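The auxiliary R² of the White regression is not shown for the log-lin model. It can be backed out from TR² = 48.607688 by dividing by n = 224. The 5% critical value of 15.507 for chi-square(8) is a table value we assume.

```python
n = 224
tr2 = 48.607688  # White test statistic reported by Gretl for the log-lin model
r2_aux = tr2 / n  # unadjusted R-squared of the auxiliary regression
CHI2_CRIT_8 = 15.507  # assumption: 5% critical value of chi-square(8) from tables
print(f"auxiliary R^2 = {r2_aux:.4f}; reject H0: {tr2 > CHI2_CRIT_8}")
```

The auxiliary R² (about 0.22) is much lower than in the lin-lin model. The log transformation weakens the heteroskedasticity but does not remove it.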
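Method: Use robust standard errors to deal with the problem:
Model 4: OLS, using observations 1-224
Dependent variable: l_salepric
Heteroskedasticity-robust standard errors
→ The coefficient estimates are unchanged and the t- and F-tests are now valid. However, the heteroskedasticity itself is still present, so OLS is no longer BLUE.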
2.2.4 Testing normality of residuals
Given that the hypothesis is:
H0: The residuals are normally distributed
H1: The residuals are not normally distributed
Using the normality test of residuals in Gretl:
Frequency distribution for uhat1, obs 1-224 number of bins = 15, mean = 3.17207e-017, sd = 0.135479
Test for null hypothesis of normal distribution:
Chi-square(2) = 16.779 with p-value 0.00023
We see: Chi-square(2) = 16.779 with p-value 0.00023 < α = 0.05 → Reject H0 → The residuals are not normally distributed
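For a chi-square(2) variable, P(X > c) = exp(-c/2). This gives both the 5% critical value, -2·ln(0.05), and the p-value in closed form. The sketch below checks the log-lin result.

```python
import math

# For chi-square(2), P(X > c) = exp(-c / 2), so the 5% critical value is -2 ln(0.05).
crit_5pct = -2 * math.log(0.05)
stat = 16.779
p_value = math.exp(-stat / 2)
print(f"critical value = {crit_5pct:.4f}; p-value = {p_value:.5f}; reject H0: {stat > crit_5pct}")
```

The statistic is far smaller than in the lin-lin model (265.203), but it still exceeds the critical value of about 5.99.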
Method: Increase the number of observations to n ≥ 384
III Log – log model.
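● Describe the basic results of the estimated function:
- The Population regression function is set up:
$\ln(salepric_i) = \beta_1 + \beta_2 \ln(sqft_i) + \beta_3 \ln(garage_i) + \beta_4 city_i + u_i$
- The Sample regression function is set up:
$\widehat{\ln(salepric_i)} = \hat{\beta}_1 + \hat{\beta}_2 \ln(sqft_i) + \hat{\beta}_3 \ln(garage_i) + \hat{\beta}_4 city_i$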
✓ $\hat{\beta}_2$ = 1.10258: When sqft increases by 1%, holding garage and city constant, the expected value of salepric increases by about 1.10258%.
✓ $\hat{\beta}_3$ = 0.421886: When garage increases by 1%, holding sqft and city constant, the expected value of salepric decreases by 0.421886%.
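✓ $\hat{\beta}_4$ = 0.235163: Holding sqft and garage constant, the expected sale price of a house in Coto de Caza is about 23.5163% higher than that of a house in Dove Canyon. Because city is a dummy and is not logged, its coefficient is read as 100·β percent, not as an elasticity.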
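In a log-log model, the sqft coefficient is an elasticity, so it is exact only for small changes. The city dummy enters unlogged, so its exact effect is 100·(exp(β) - 1). The sketch below evaluates a hypothetical 10% larger living area and the exact Coto de Caza premium. It uses only the coefficients reported above and leaves garage aside.

```python
import math

B_SQFT = 1.10258   # elasticity of salepric with respect to sqft
B_CITY = 0.235163  # coefficient of the city dummy (not logged)

# Exact effect of a hypothetical 10% larger living area, holding garage and city constant.
sqft_10pct = 100 * (1.10 ** B_SQFT - 1)
# Exact Coto de Caza premium over Dove Canyon implied by the dummy.
city_premium = 100 * (math.exp(B_CITY) - 1)
print(f"+10% sqft -> {sqft_10pct:.2f}% price; city premium {city_premium:.2f}%")
```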
● The coefficient of determination R²:
In our results, R² indicates the proportion of the variability of the response data around its mean that the model explains.
R² = 0.887465 is quite high, which suggests that the model fits the data well. It means that 88.7465% of the sample variation in the dependent variable, ln(salepric), is explained by changes in the independent variables (sqft, garage and city).
2 Testing
2.1 Testing hypothesis
2.1.1 Testing an individual regression coefficient
Purpose: Test the statistical significance of the effect of each independent variable on the dependent variable. We have: α = 0.05