
FOREIGN TRADE UNIVERSITY FACULTY OF INTERNATIONAL ECONOMY

REPORT ECONOMETRICS

HOUSING PRICES AND FACTORS AFFECTING HOUSING PRICES IN CALIFORNIA, USA

Instructor: MSc Chu Thi Mai Phuong

Student: Group AH – KTEE309.2

1 Nguyen Lan Anh 1612250005

EVALUATION

- Tasks: estimate the lin-lin and log-lin models; test the lin-lin model; find sources; describe the data and variables; edit the report
- Assessment: enthusiastic; responsible for the assigned work; have the sense [...]; members work [...]; deadline [...]

EVALUATION
INTRODUCTION
ABSTRACT
ANALYSIS
SECTION 1: DESCRIBE THE VARIABLES, DATA AND CORRELATION
I Describe the variables
II Describe the data
III Describe the correlation between variables
SECTION 2: ESTIMATED MODEL AND STATISTICAL INFERENCES
I Linear – linear model
II Log – linear model
III Log – log model
CONCLUSION
REFERENCES

INTRODUCTION

Econometrics is the branch of the social sciences in which the tools of economic theory, mathematics and statistical inference are applied to analyze economic problems. It uses the methods of mathematical statistics to uncover the relationships underlying collected data and to draw conclusions that can support predictions about economic phenomena.

Since its inception, econometrics has provided economists with a sharp instrument for measuring economic relations. As economics students, we recognize the need to study econometrics for logical reasoning and problem analysis. To better understand how to put econometrics into practice and apply it effectively and correctly, our team developed this econometrics report under the guidance of MSc Chu Thi Mai Phuong. In this report, we used the econometric analysis tool GRETL to analyze the topic "Housing Prices and Factors Affecting Housing Prices in California, USA".

We sincerely thank our instructor, MSc Chu Thi Mai Phuong, for helping us complete this report. Despite all our efforts, the report certainly contains errors; we look forward to your comments so that our team can improve it.

ABSTRACT

According to a recent report by the National Association of Realtors in the United States, Vietnam is one of the 10 countries with the highest investment in real estate in the USA (VnExpress). California is home to many overseas Vietnamese and has a vibrant real estate market in which Vietnamese people and people from other countries would like to invest. So what affects housing prices in this area? As economics students interested in real estate, we decided to research the topic "Housing Prices and Factors Affecting Housing Prices in California, USA".

While searching for documents, we read many foreign studies on the factors affecting housing prices in various regions and countries, such as "Macroeconomic Determinants of the Housing Market" (LSE) and "House Price Dynamics in the United States" (IMF). After synthesizing and discussing them, we selected a few factors that affect the price of a house as the subject of our research.

Because of limited time, we could only consider a few prominent factors, and we hope for your understanding. Thank you!

ANALYSIS

SECTION 1: DESCRIBE THE VARIABLES, DATA AND CORRELATION

I Describe the variables

The regression function in this report includes the following variables:

Dependent variable:

- salepric – Sale price of a house in 2 communities of California: Dove Canyon and Coto de Caza (thousands of dollars)

Independent variables:

- sqft – Living area in square feet
- garage – Number of car spaces
- city – City: 1 for Coto de Caza and 0 for Dove Canyon

II Describe the data

Data collection

We collected data on the sale prices and characteristics of houses in 2 communities of California, Dove Canyon and Coto de Caza, from the Ramanathan data sets in Gretl.

III Describe the correlation between variables

● Correlation matrix for the linear – linear model:

Correlation coefficients, using the observations 1 - 224
5% critical value (two-tailed) = 0.1311 for n = 224

- salepric is positively correlated with sqft. The correlation between these two variables is quite high.
- salepric is positively correlated with garage. The correlation between these two variables is moderate.
- salepric is positively correlated with city. The correlation between these two variables is moderate.

● Correlation matrix for the log – linear model:

Correlation coefficients, using the observations 1 - 224
5% critical value (two-tailed) = 0.1311 for n = 224

● Correlation matrix for the log – log model:

Correlation coefficients, using the observations 1 - 224
5% critical value (two-tailed) = 0.1311 for n = 224

Variables: l_salepric, l_sqft, l_garage, city
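The 5% critical value of 0.1311 quoted with each correlation matrix can be related to the usual t test for a zero correlation. The sketch below is an illustration rather than part of the original Gretl output: the function name `t_stat_from_r` is hypothetical, and it applies the standard formula t = r·sqrt(n − 2)/sqrt(1 − r²) with n = 224.

```python
import math


def t_stat_from_r(r: float, n: int) -> float:
    """t statistic for H0: correlation = 0, from a sample correlation r and n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Critical correlation reported by Gretl for n = 224 (two-tailed, 5%).
t_at_critical_r = t_stat_from_r(0.1311, 224)
print(f"t statistic at r = 0.1311: {t_at_critical_r:.3f}")
```

A sample correlation larger than 0.1311 in absolute value therefore gives a t statistic beyond the two-tailed 5% threshold for 222 degrees of freedom, which is close to 1.97.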

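SECTION 2: ESTIMATED MODEL AND STATISTICAL INFERENCES

I Linear – linear model

● Basic results of the estimated function:

- The population regression function:

$\text{salepric}_i = \beta_1 + \beta_2\,\text{sqft}_i + \beta_3\,\text{garage}_i + \beta_4\,\text{city}_i + u_i$

- The sample regression function:

$\widehat{\text{salepric}}_i = \hat{\beta}_1 + \hat{\beta}_2\,\text{sqft}_i + \hat{\beta}_3\,\text{garage}_i + \hat{\beta}_4\,\text{city}_i$

- Estimated regression equation:

$\widehat{\text{salepric}} = -704.854 + 0.220060\,\text{sqft} + 129.286\,\text{garage} + 101.275\,\text{city}$

● Interpretation of the coefficients: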

- $\hat{\beta}_2 = 0.220060$: When sqft increases by 1 square foot, holding garage and city constant, the estimated value of salepric increases by 0.220060 thousand dollars.

- $\hat{\beta}_3 = 129.286$: When garage increases by 1 car space, holding sqft and city constant, the estimated value of salepric increases by 129.286 thousand dollars.

- $\hat{\beta}_4 = 101.275$: Holding sqft and garage constant, the expected sale price of a house in Coto de Caza is 101.275 thousand dollars higher than that of a house in Dove Canyon.

● The coefficient of determination $R^2$:

$R^2$ indicates the proportion of the variability of the dependent variable around its mean that the model explains.

Here $R^2 = 0.881604$ is quite high, which suggests that the model fits the data well: 88.1604% of the sample variation in the dependent variable (sale price) is explained by the independent variables (living area, number of car spaces and city).
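To make the linear – linear coefficients concrete, the estimated equation reported above can be evaluated for a single house. This is a minimal sketch: the function name `predict_salepric` and the example house (3000 square feet, 3 car spaces, Coto de Caza) are hypothetical, and only the four coefficients come from the report.

```python
def predict_salepric(sqft: float, garage: int, city: int) -> float:
    """Fitted sale price (thousands of dollars) from the lin-lin equation in the report."""
    return -704.854 + 0.220060 * sqft + 129.286 * garage + 101.275 * city


# Hypothetical house, chosen only for illustration.
in_coto = predict_salepric(sqft=3000, garage=3, city=1)
in_dove = predict_salepric(sqft=3000, garage=3, city=0)

print(f"Coto de Caza: {in_coto:.3f} thousand dollars")
print(f"Dove Canyon:  {in_dove:.3f} thousand dollars")
print(f"Difference:   {in_coto - in_dove:.3f} thousand dollars")
```

The difference between the two predictions equals the city coefficient, which is what the interpretation of the dummy variable states.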

2 Testing

2.1 Testing hypothesis

2.1.1 Testing an individual regression coefficient

● Purpose: Test the statistical significance of the effect of each independent variable on the dependent variable. We have: α = 0.05

Testing the variable of Living area in square feet (sqft):

● Given that the hypothesis is: $H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$

● We see: P-value of sqft is < 0.0001 < 0.05 → Reject $H_0$ → The coefficient $\beta_2$ is statistically significant.

Testing the variable of Number of car spaces (garage):

● Given that the hypothesis is: $H_0: \beta_3 = 0$; $H_1: \beta_3 \neq 0$

● We see: P-value of garage is < 0.0001 < 0.05 → Reject $H_0$ → The coefficient $\beta_3$ is statistically significant.

Testing the variable of City:

● Given that the hypothesis is: $H_0: \beta_4 = 0$; $H_1: \beta_4 \neq 0$

● We see: P-value of city is < 0.0001 < 0.05 → Reject $H_0$ → The coefficient $\beta_4$ is statistically significant.

2.1.2 Testing the overall significance.

● Purpose: Test the null hypothesis that none of the explanatory variables has an effect on the dependent variable. We have: α = 0.05
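The overall-significance F statistic can be recovered from the coefficient of determination alone. The sketch below is an illustration and not a Gretl result quoted in the report: it assumes the standard relation F = (R²/k) / ((1 − R²)/(n − k − 1)) with k = 3 slope coefficients and n = 224 observations, and the function name `f_stat_from_r2` is hypothetical.

```python
def f_stat_from_r2(r2: float, n: int, k: int) -> float:
    """Overall-significance F statistic from R^2, n observations and k slope coefficients."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# R^2 of the lin-lin model as reported in this section.
f_value = f_stat_from_r2(0.881604, n=224, k=3)
print(f"F(3, 220) = {f_value:.2f}")
```

A value this far above typical 5% critical values for F(3, 220), which are below 3, leads to rejecting the null hypothesis that all slope coefficients are zero.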

2.2 Testing the model's problems.

2.2.1 Testing for omitted variables

● Given that the hypothesis is:

$H_0$: The model has no omitted variables
$H_1$: The model has omitted variables

● Ramsey's RESET:

Auxiliary regression for RESET specification test

OLS, using observations 1-224

Dependent variable: salepric

with p-value = P(F(2,218) > 13.7772) = 2.32e-006

● We see: p-value = P(F(2,218) > 13.7772) = 2.32e-006 < α = 0.05 → Reject $H_0$ → The model omits relevant variables.

● Remedy: Because our research so far is limited, we will spend more time reading further literature to find out which variables are omitted.

2.2.2 Testing multicollinearity.

● Use the vif command after the regression to examine multicollinearity. The command reports the variance inflation factor (VIF) of each regressor; if a variable has VIF > 10, the model may suffer from multicollinearity.

● Using the "VIF" command in Gretl, we have the following result:

Variance Inflation Factors

Minimum possible value = 1.0

Values > 10.0 may indicate a collinearity problem

VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient
between variable j and the other independent variables

Belsley-Kuh-Welsch collinearity diagnostics:

lambda = eigenvalues of X'X, largest to smallest
cond = condition index
note: variance proportions columns sum to 1.0

● We see:

VIF(sqft) = 1.742 < 10
VIF(garage) = 1.512 < 10
VIF(city) = 1.224 < 10
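The formula VIF(j) = 1/(1 - R(j)^2) in the Gretl output can be inverted to see how much of each regressor is explained by the others. The sketch below only rearranges that formula; the function names are hypothetical, and the three VIF values are the ones reported above.

```python
def vif_from_r2(r2_j: float) -> float:
    """Variance inflation factor from the auxiliary R^2 of regressor j."""
    return 1.0 / (1.0 - r2_j)


def r2_from_vif(vif_j: float) -> float:
    """Auxiliary R^2 implied by a reported VIF (inverse of vif_from_r2)."""
    return 1.0 - 1.0 / vif_j


reported_vif = {"sqft": 1.742, "garage": 1.512, "city": 1.224}
for name, vif in reported_vif.items():
    print(f"{name}: VIF = {vif}, implied auxiliary R^2 = {r2_from_vif(vif):.3f}")

# The rule-of-thumb threshold VIF = 10 corresponds to an auxiliary R^2 of 0.9.
print(f"threshold R^2 at VIF = 10: {r2_from_vif(10.0):.1f}")
```

All three implied auxiliary R² values lie well below the 0.9 that corresponds to the VIF = 10 rule of thumb.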

2.2.3 Testing heteroskedasticity.

White's test for heteroskedasticity

OLS, using observations 1-224

Dependent variable: uhat^2


Unadjusted R-squared = 0.666709

Test statistic: TR^2 = 149.342835,

with p-value = P(Chi-square(8) > 149.342835) = 2.68837e-028

● We see: p-value = P(Chi-square(8) > 149.342835) = 2.68837e-028 < α = 0.05 → Reject $H_0$ → The model has a heteroskedasticity problem.

● Remedy: Re-estimate the model with robust standard errors:

Model 2: OLS, using observations 1-224
Dependent variable: salepric
Heteroskedasticity-robust standard errors

→ Robust standard errors make inference on the coefficients valid, but they do not remove the heteroskedasticity: the OLS estimates remain unbiased, yet they are no longer BLUE.
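The White test statistic reported by Gretl is the number of observations times the unadjusted R-squared of the auxiliary regression of the squared residuals. The sketch below reproduces that arithmetic from the two numbers printed in the output; the function name `white_lm_statistic` is hypothetical, and the small gap to the reported value comes from the rounding of R-squared.

```python
def white_lm_statistic(n_obs: int, aux_r2: float) -> float:
    """LM form of White's test: TR^2, with T observations and the auxiliary R^2."""
    return n_obs * aux_r2


# Values printed in the White's test output for the lin-lin model.
lm = white_lm_statistic(224, 0.666709)
print(f"TR^2 = {lm:.4f} (Gretl reports 149.342835)")
```

The statistic is compared with a chi-square distribution with 8 degrees of freedom, which matches an auxiliary regression on the three regressors, the squares of sqft and garage, and the three cross products (the square of the dummy city duplicates city itself).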

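2.2.4 Testing normality of residual.

● Given that the hypothesis is:

$H_0$: The residuals are normally distributed
$H_1$: The residuals are not normally distributed

● Using the test for normality of residuals in Gretl:

Test for normality of residual -
Null hypothesis: error is normally distributed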

Test statistic: Chi-square(2) = 265.203

with p-value = 2.58197e-058
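The p-value above can be checked by hand, because the upper-tail probability of a chi-square distribution with 2 degrees of freedom has the closed form P(X > x) = exp(−x/2). The sketch below uses only that identity; the function name `chi2_sf_df2` is hypothetical. The second statistic, 16.779, is the one reported for the log – linear model later in the report.

```python
import math


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


for label, stat in [("lin-lin", 265.203), ("log-lin", 16.779)]:
    print(f"{label}: Chi-square(2) = {stat}, p-value = {chi2_sf_df2(stat):.5e}")
```

Both p-values are far below α = 0.05, in line with the figures printed by Gretl.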

● We see: Chi-square(2) = 265.203 with p-value 2.58197e-058 < α = 0.05 → Reject $H_0$ → The residuals are not normally distributed.

● Remedy: Increase the number of observations until n ≥ 384.

II Log – linear model.

● Basic results of the estimated function:

- The population regression function:

$\ln(\text{salepric}_i) = \beta_1 + \beta_2\,\text{sqft}_i + \beta_3\,\text{garage}_i + \beta_4\,\text{city}_i + u_i$

- The sample regression function:

$\widehat{\ln(\text{salepric}_i)} = \hat{\beta}_1 + \hat{\beta}_2\,\text{sqft}_i + \hat{\beta}_3\,\text{garage}_i + \hat{\beta}_4\,\text{city}_i$

✓ $\hat{\beta}_2 = 0.000207498$: When sqft increases by 1 square foot, holding garage and city constant, the expected value of salepric increases by approximately 0.0207498%.

✓ $\hat{\beta}_3 = -0.117941$: When garage increases by 1 car space, holding sqft and city constant, the expected value of salepric decreases by approximately 11.7941%.

✓ $\hat{\beta}_4 = 0.267482$: Holding sqft and garage constant, the expected sale price of a house in Coto de Caza is approximately 26.7482% higher than that of a house in Dove Canyon.
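The percentages quoted for the log – linear coefficients use the approximation 100·β, which is accurate for small coefficients and drifts for larger ones. The sketch below compares it with the exact change 100·(exp(β) − 1) implied by a model in ln(salepric); the function name `exact_percent_change` is hypothetical, and the coefficients are the ones reported above.

```python
import math


def exact_percent_change(beta: float) -> float:
    """Exact % change in the dependent variable for a one-unit increase in a regressor
    when the dependent variable enters the model in logs."""
    return 100.0 * (math.exp(beta) - 1.0)


coefficients = {"sqft": 0.000207498, "garage": -0.117941, "city": 0.267482}
for name, beta in coefficients.items():
    print(f"{name}: approx {100.0 * beta:.4f}%, exact {exact_percent_change(beta):.4f}%")
```

For sqft the two figures are practically identical, while for the city dummy the exact difference is noticeably larger than the approximate 26.7482%.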

● The coefficient of determination $R^2$:

$R^2$ indicates the proportion of the variability of the dependent variable around its mean that the model explains.

Here $R^2 = 0.888862$ is quite high, which suggests that the model fits the data well: 88.8862% of the sample variation in the dependent variable (the logarithm of salepric) is explained by the independent variables (sqft, garage and city).

2 Testing.

2.1 Testing hypothesis

2.1.1 Testing an individual regression coefficient

● Purpose: Test the statistical significance of the effect of each independent variable on the dependent variable. We have: α = 0.05

● Testing the sqft:

Given that the hypothesis is: $H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$

● We see: P-value of sqft is < 0.0001 < 0.05 → Reject $H_0$ → The coefficient $\beta_2$ is statistically significant.

● Testing the garage:

Given that the hypothesis is: $H_0: \beta_3 = 0$; $H_1: \beta_3 \neq 0$

● We see: P-value of garage is < 0.0001 < 0.05 → Reject $H_0$ → The coefficient $\beta_3$ is statistically significant.

● Testing the city:

Given that the hypothesis is: $H_0: \beta_4 = 0$; $H_1: \beta_4 \neq 0$

● We see: P-value of city is < 0.0001 < 0.05 → Reject $H_0$ → The coefficient $\beta_4$ is statistically significant.

2.1.2 Testing the overall significance.

● Purpose: Test the null hypothesis that none of the explanatory variables has an effect on the dependent variable. We have: α = 0.05

Given that the hypothesis is: $H_0: \beta_2 = \beta_3 = \beta_4 = 0$; $H_1$: at least one of these coefficients is different from zero

→ The model is statistically significant overall.

2.2 Testing the model's problems.

2.2.1 Testing for omitted variables.

● Given that the hypothesis is:

$H_0$: The model has no omitted variables
$H_1$: The model has omitted variables

● Ramsey's RESET:

Auxiliary regression for RESET specification test

OLS, using observations 1-224

Dependent variable: l_salepric

with p-value = P(F(2,218) > 11.3453) = 2.05e-005

● We see: p-value = P(F(2,218) > 11.3453) = 2.05e-005 < α = 0.05 → Reject $H_0$ → The model omits relevant variables.

● Remedy: Because our research so far is limited, we will spend more time reading further literature to find out which variables are omitted.

2.2.2 Testing multicollinearity.

● Use the vif command after the regression to examine multicollinearity. The command reports the variance inflation factor (VIF) of each regressor; if a variable has VIF > 10, the model may suffer from multicollinearity.

● Using the "VIF" command in Gretl, we have the following result:

Variance Inflation Factors

Minimum possible value = 1.0

Values > 10.0 may indicate a collinearity problem

VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient
between variable j and the other independent variables

Belsley-Kuh-Welsch collinearity diagnostics:

lambda = eigenvalues of X'X, largest to smallest
note: variance proportions columns sum to 1.0

● We see:

VIF(sqft) = 1.742 < 10
VIF(garage) = 1.512 < 10
VIF(city) = 1.224 < 10

→ All VIF values are well below 10, so the model does not show a serious multicollinearity problem.

2.2.3 Testing heteroskedasticity.

White's test for heteroskedasticity

OLS, using observations 1-224

Dependent variable: uhat^2


Test statistic: TR^2 = 48.607688,

with p-value = P(Chi-square(8) > 48.6077) = 7.55903e-008

● We see: p-value = P(Chi-square(8) > 48.6077) = 7.55903e-008 < α = 0.05 → Reject $H_0$ → The model has a heteroskedasticity problem.

● Remedy: Re-estimate the model with robust standard errors:

Model 4: OLS, using observations 1-224
Dependent variable: l_salepric
Heteroskedasticity-robust standard errors

→ Robust standard errors make inference on the coefficients valid, but they do not remove the heteroskedasticity: the OLS estimates remain unbiased, yet they are no longer BLUE.

2.2.4 Testing normality of residual

● Given that the hypothesis is:

$H_0$: The residuals are normally distributed
$H_1$: The residuals are not normally distributed

● Using the test for normality of residuals in Gretl:

Frequency distribution for uhat1, obs 1-224 number of bins = 15, mean = 3.17207e-017, sd = 0.135479


Test for null hypothesis of normal distribution:

Chi-square(2) = 16.779 with p-value 0.00023

● We see: Chi-square(2) = 16.779 with p-value 0.00023 < α = 0.05 → Reject $H_0$ → The residuals are not normally distributed.

● Remedy: Increase the number of observations until n ≥ 384.

III Log – log model.

● Basic results of the estimated function:

- The population regression function:

$\ln(\text{salepric}_i) = \beta_1 + \beta_2 \ln(\text{sqft}_i) + \beta_3 \ln(\text{garage}_i) + \beta_4\,\text{city}_i + u_i$

- The sample regression function:

$\widehat{\ln(\text{salepric}_i)} = \hat{\beta}_1 + \hat{\beta}_2 \ln(\text{sqft}_i) + \hat{\beta}_3 \ln(\text{garage}_i) + \hat{\beta}_4\,\text{city}_i$

✓ $\hat{\beta}_2 = 1.10258$: When sqft increases by 1%, holding garage and city constant, the expected value of salepric increases by 1.10258%.

✓ $\hat{\beta}_3 = 0.421886$: When garage increases by 1%, holding sqft and city constant, the expected value of salepric decreases by 0.421886%.

✓ $\hat{\beta}_4 = 0.235163$: Holding sqft and garage constant, the expected sale price of a house in Coto de Caza is approximately 23.5163% higher than that of a house in Dove Canyon.
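In the log – log model the coefficient on ln(sqft) is an elasticity, so the "1% gives β%" reading is exact only for small changes. The sketch below evaluates the implied change for larger increases in living area, using 100·((1 + p/100)^β − 1); the function name `percent_change_loglog` is hypothetical, and the elasticity 1.10258 is the value reported above.

```python
def percent_change_loglog(elasticity: float, pct_increase: float) -> float:
    """Exact % change in the dependent variable when a logged regressor rises by pct_increase %."""
    return 100.0 * ((1.0 + pct_increase / 100.0) ** elasticity - 1.0)


sqft_elasticity = 1.10258
for pct in (1.0, 10.0, 25.0):
    change = percent_change_loglog(sqft_elasticity, pct)
    print(f"sqft +{pct:.0f}% -> salepric {change:+.3f}%")
```

Because the elasticity exceeds 1, the implied price change grows slightly faster than the change in living area.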

● The coefficient of determination $R^2$:

$R^2$ indicates the proportion of the variability of the dependent variable around its mean that the model explains.

Here $R^2 = 0.887465$ is quite high, which suggests that the model fits the data well: 88.7465% of the sample variation in the dependent variable (the logarithm of sale price) is explained by the independent variables (sqft, garage and city).

2 Testing

2.1 Testing hypothesis

2.1.1 Testing an individual regression coefficient

● Purpose: Test the statistical significance of the effect of each independent variable on the dependent variable. We have: α = 0.05


Date posted: 22/06/2020, 21:34
