FOREIGN TRADE UNIVERSITY
ECONOMETRICS REPORT
Analysing certain factors that affect air pollution
Tutor: MSc Quynh Thuy Nguyen
Class: KTEE218(1-1920).1_LT
Group members:
Phạm Đức Thịnh – 1814450065
Nguyễn Gia Dương – 1814450024
Nguyễn Quang Huy – 1814450043
Nguyễn Anh Đức – 1814450020
Bạch Quốc Hoàng – 1814450039
INTRODUCTION
CONTENTS
I DATA DESCRIPTION
1 REGRESSION MODEL FORMATION
2 DATA DEPICTION
II REGRESSION ANALYSIS
1 REGRESSION FUNCTION
2 CORRELATIONS ANALYSIS
3 EXPOSING REGRESSION FUNCTION AND RESULTS
4 REGRESSION MODEL EXAMINATIONS
III EXAMINATION OF VIOLATED ESTIMATIONS IN THE MODEL
1 Normal distribution
2 Multicollinearity
CONCLUSION
References
INTRODUCTION
Amid growing environmental destruction, air pollution is one of humanity's most significant impacts on our own habitat and on the natural world. Air pollution occurs when harmful or excessive quantities of substances, including gases, particles, and biological molecules, are introduced into Earth's atmosphere. It may cause diseases, allergies and even death in humans; it may also harm other living organisms such as animals and food crops, and may damage the natural or built environment. Both human activity and natural processes can generate air pollution. Indoor air pollution and poor urban air quality are listed as two of the world's worst toxic pollution problems in the 2008 Blacksmith Institute World's Worst Polluted Places report. Outdoor air pollution alone causes 2.1 to 4.21 million premature deaths annually. According to the 2014 World Health Organization report, air pollution in 2012 caused the deaths of around 7 million people worldwide, an estimate roughly echoed by the International Energy Agency.

Understanding the urgency of this phenomenon, our group decided to choose the topic "Analysing certain factors that affect air pollution" for our econometrics assignment. Econometric tools let us build a suitable model for this analysis, because econometric relationships capture the random behaviour of economic relationships, which is generally not considered in purely economic or mathematical formulations. We agreed on the following factors (variables) to explain the level of air pollution, measured by AQI: HPI (Happy Planet Index), GDP per capita, oil consumption per day and population density, for 35 countries and their biggest cities around the world in the 2016-2019 period.

We would like to express our deepest appreciation to our lecturer and tutor for the helpful guidance. We have made our best effort throughout this report, but mistakes are unavoidable. If any are significant, please let us know.
CONTENTS
I DATA DESCRIPTION
1 REGRESSION MODEL FORMATION
➢ Dependent variable (output): the level of air pollution, measured by AQI (Air Quality Index). Government agencies use the AQI to communicate to the public how polluted the air currently is or how polluted it is forecast to become. The AQI is computed from these components: PM2.5 (small particles), PM10 (large particles), ozone, CO and NO2. We use it as a single summary figure for the level of air pollution.
o Y = AQI
➢ Independent variables:
o X1: HPI (Happy Planet Index).
o X2: Gper (GDP per capita).
o X3: Ocons (Oil consumption per day).
o X4: Dens (People per sq. kilometer).
POPULATION REGRESSION MODEL (PRM)
The model studies the dependence of the level of air pollution in representative cities of 35 nations on their Happy Planet Index, GDP per capita (USD), oil consumption per day (thousand barrels) and population density (people per sq. kilometer).
• X1: HPI (Happy Planet Index) measures sustainable wellbeing for all. It tells us how well nations are doing at achieving long, happy, sustainable lives. This variable was added to test whether populations with high life expectancy, high wellbeing and low inequality have a positive impact on their living environment.
• X2: Gper, GDP per capita (USD). This variable is a convenient statistic for comparing economic growth internationally, or productivity per capita over a period of time. We would like to find out the relationship between people's productivity and the level of air pollution.
• X3: Ocons, oil consumption (thousand barrels/day). This variable is expected to affect the output, because the AQI components include emissions from oil-consuming human activities such as powering vehicles and manufacturing operations.
• X4: Dens, population density. The higher the density, the greater the exploitation of natural resources. A dense area faces more social issues, including air pollution. However, we still want to test the significance of this variable.
2 DATA DEPICTION
o Using the "des" command in STATA 12 to describe the data, we obtained this result:
. des aqi hpi Gper Ocons Dens
              storage  display    value
variable name   type   format     label      variable label
----------------------------------------------------------------
Ocons           float  %8.0g                 Oil Consumtion
Dens            long   %8.0g                 Population Density
o We continue with the "sum" command to summarize the data. It reports the number of observations (Obs), the mean, the standard deviation (Std. Dev.), and the minimum (Min) and maximum (Max) values of each variable:
. sum aqi hpi Gper Ocons Dens
    Variable |  Obs       Mean   Std. Dev.     Min      Max
-------------+---------------------------------------------
         aqi |  148   55.03378   35.07532       16      188
         hpi |  148       28.9   6.302273     15.9     40.7
        Gper |  148   29417.34   21348.31     1923    82773
       Ocons |  148   2224.618   3732.938    141.1    20094
        Dens |  148   5928.304   5100.263      864    46781
o The table shows that Gper has the widest gap between its largest (82773) and smallest (1923) values, and also the largest figures of all the variables.
o Along with Gper, Ocons and Dens also show a noticeable disparity between their minimum and maximum values. The reason is that our sample consists of randomly chosen countries from around the world, including both developed and under-developed nations.
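The disparity described above can be quantified directly from the summary table. The following is a minimal sketch in Python (not part of the original STATA workflow) that computes the range, the max/min ratio and the coefficient of variation for each variable; the helper name `spread` is our own.

```python
# Summary statistics copied from the "sum" output: (mean, std dev, min, max).
summary = {
    "aqi":   (55.03378, 35.07532, 16.0, 188.0),
    "hpi":   (28.9, 6.302273, 15.9, 40.7),
    "Gper":  (29417.34, 21348.31, 1923.0, 82773.0),
    "Ocons": (2224.618, 3732.938, 141.1, 20094.0),
    "Dens":  (5928.304, 5100.263, 864.0, 46781.0),
}

def spread(mean, sd, lo, hi):
    """Return range, max/min ratio and coefficient of variation (sd / mean)."""
    return {"range": hi - lo, "ratio": hi / lo, "cv": sd / mean}

spreads = {name: spread(*stats) for name, stats in summary.items()}

for name, s in spreads.items():
    print(f"{name:6s} range={s['range']:10.1f} max/min={s['ratio']:7.2f} cv={s['cv']:.2f}")
```

Gper has the widest absolute range, while Ocons is the only variable whose standard deviation exceeds its mean, so its relative spread is the largest.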
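II REGRESSION ANALYSIS
1 REGRESSION FUNCTION
a Population regression function (PRF):
$E(Y_i \mid X_{1i}, X_{2i}, X_{3i}, X_{4i}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i}$
b Sample regression function (SRF):
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{\beta}_4 X_{4i}$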
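The coefficients of the sample regression function are obtained by ordinary least squares (OLS). As an illustration only, the sketch below solves the normal equations (X'X)b = X'y in plain Python on a small toy data set; the data and the helper names `solve` and `ols` are hypothetical and are not the report's data or STATA's implementation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Return OLS coefficients [b0, b1, ..., bk] for y regressed on the columns of rows."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve(xtx, xty)

# Toy data (hypothetical), generated exactly as y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * a - 3 * b for a, b in rows]
beta = ols(rows, y)
print([round(b, 6) for b in beta])
```

Because the toy data contain no noise, the estimates recover the generating coefficients; with real data the same equations give the least-squares fit.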
2 CORRELATIONS ANALYSIS
➢ Using the "corr" command to test the correlation between variables:
Y = aqi
X1 = hpi, X2 = Gper, X3 = Ocons, X4 = Dens
. corr aqi hpi Gper Ocons Dens
(obs=148)
             |      aqi      hpi     Gper    Ocons     Dens
-------------+---------------------------------------------
         aqi |   1.0000
         hpi |  -0.1581   1.0000
        Gper |  -0.5124  -0.0516   1.0000
       Ocons |   0.2017  -0.2333   0.0702   1.0000
        Dens |   0.1935   0.1305  -0.2629   0.1231   1.0000
➢ Explaining the relationships between variables:
- The correlation coefficient between aqi and hpi is -0.1581.
- The correlation coefficient between aqi and Gper is -0.5124.
- The correlation coefficient between aqi and Ocons is 0.2017.
- The correlation coefficient between aqi and Dens is 0.1935.
According to the table, no correlation coefficient between any two independent variables exceeds 0.8 in absolute value (the largest is -0.2629, between Gper and Dens) ➔ there is no sign of multicollinearity in our model.
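The 0.8 rule of thumb can be checked mechanically. The sketch below, in Python rather than STATA, copies the correlations from the "corr" output and looks only at pairs of independent variables, since multicollinearity concerns the regressors and not their correlation with aqi; the variable names in the code are our own.

```python
from itertools import combinations

# Lower-triangle correlations copied from the "corr" output.
corr = {
    ("aqi", "hpi"): -0.1581, ("aqi", "Gper"): -0.5124,
    ("aqi", "Ocons"): 0.2017, ("aqi", "Dens"): 0.1935,
    ("hpi", "Gper"): -0.0516, ("hpi", "Ocons"): -0.2333,
    ("hpi", "Dens"): 0.1305, ("Gper", "Ocons"): 0.0702,
    ("Gper", "Dens"): -0.2629, ("Ocons", "Dens"): 0.1231,
}
regressors = ["hpi", "Gper", "Ocons", "Dens"]
THRESHOLD = 0.8  # rule of thumb used in the report

# Keep only pairs of regressors; aqi is the dependent variable and is excluded.
pairs = {pair: corr[pair] for pair in combinations(regressors, 2)}
worst_pair = max(pairs, key=lambda p: abs(pairs[p]))
flagged = [p for p, r in pairs.items() if abs(r) > THRESHOLD]
print(worst_pair, pairs[worst_pair], flagged)
```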
3 EXPOSING REGRESSION FUNCTION AND RESULTS
a Running the regression: using the "reg" command to estimate the regression model in STATA
Y = aqi
X1 = hpi, X2 = Gper, X3 = Ocons, X4 = Dens
. reg aqi hpi Gper Ocons Dens
      Source |       SS       df       MS              Number of obs =     148
-------------+------------------------------           F(  4,   143) =   18.37
       Model |  61389.6976     4  15347.4244           Prob > F      =  0.0000
    Residual |  119461.134   143  835.392542           R-squared     =  0.3394
-------------+------------------------------           Adj R-squared =  0.3210
       Total |  180850.831   147  1230.27776           Root MSE      =  28.903
------------------------------------------------------------------------------
         aqi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hpi |  -.8097223   .3944118    -2.05   0.042    -1.589353   -.0300916
        Gper |  -.0008548   .0001164    -7.34   0.000     -.001085   -.0006247
       Ocons |   .0018599    .000669     2.78   0.006     .0005375    .0031823
        Dens |   .0003528   .0004963     0.71   0.478    -.0006282    .0013338
       _cons |   97.35311   12.57025     7.74   0.000     72.50559    122.2006
------------------------------------------------------------------------------
(Table 1)
b Linear regression function:
With the result from STATA, we have the estimated regression function:
aqi = 97.35311 - 0.8097223 hpi - 0.0008548 Gper + 0.0018599 Ocons + 0.0003528 Dens + ei
where ei is the residual.
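As a sanity check on the estimated equation, the sketch below evaluates it in Python at the sample means taken from the "sum" output. An OLS line with an intercept passes through the point of means, so the fitted value should be close to the mean of aqi (55.03378), up to rounding of the printed coefficients. The function name `predict_aqi` is our own.

```python
# Coefficients copied from Table 1 (STATA "reg" output).
COEF = {"_cons": 97.35311, "hpi": -0.8097223, "Gper": -0.0008548,
        "Ocons": 0.0018599, "Dens": 0.0003528}

def predict_aqi(hpi, gper, ocons, dens):
    """Fitted AQI for one observation under the estimated equation."""
    return (COEF["_cons"] + COEF["hpi"] * hpi + COEF["Gper"] * gper
            + COEF["Ocons"] * ocons + COEF["Dens"] * dens)

# Sample means from the "sum" output.
at_means = predict_aqi(hpi=28.9, gper=29417.34, ocons=2224.618, dens=5928.304)
print(round(at_means, 3))

# Ceteris paribus effect of one extra HPI point: equals the hpi coefficient.
effect_hpi = (predict_aqi(29.9, 29417.34, 2224.618, 5928.304) - at_means)
print(round(effect_hpi, 4))
```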
c Explaining results:
• Estimator of β1 = -0.8097223 < 0: Ceteris paribus, when hpi increases by 1 unit, aqi decreases by 0.8097 on average. This negative relationship matches our expectation: in a nation with a high HPI, citizens tend to reduce their impact on the environment, including air pollution.
• Estimator of β2 = -0.0008548 < 0: Ceteris paribus, when Gper increases by 1 USD, aqi decreases by 0.0008548 on average. Somewhat surprisingly, a rise in a nation's GDP per capita is accompanied by a reduction in the level of air pollution.
• Estimator of β3 = 0.0018599 > 0: Ceteris paribus, when Ocons increases by 1 unit (1 thousand barrels per day), aqi increases by 0.00186 on average. An increase in oil consumption per day therefore leads to an increase in the level of air pollution.
• Estimator of β4 = 0.0003528 > 0: Ceteris paribus, when Dens increases by 1 person per sq. kilometer, aqi increases by 0.0003528 on average, which means that an increase in population density goes with a rise in the level of air pollution.
R2 = 0.3394: the independent variables explain 33.94% of the variation in the dependent variable.
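The goodness-of-fit figures in Table 1 follow from its sums of squares. The sketch below, a Python check rather than part of the STATA session, recomputes R-squared, adjusted R-squared and Root MSE from the Model and Residual rows.

```python
import math

# Sums of squares and degrees of freedom copied from Table 1.
ss_model, df_model = 61389.6976, 4
ss_resid, df_resid = 119461.134, 143
ss_total = ss_model + ss_resid          # about 180850.83, matching the reported total
n = df_model + df_resid + 1             # 148 observations

r2 = ss_model / ss_total                        # share of variation explained
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid      # penalised for the 4 regressors
root_mse = math.sqrt(ss_resid / df_resid)       # standard error of the regression

print(round(r2, 4), round(adj_r2, 4), round(root_mse, 3))
```

The recomputed values agree with the reported R-squared = 0.3394, Adj R-squared = 0.3210 and Root MSE = 28.903.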
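4 REGRESSION MODEL EXAMINATIONS
a Testing the overall significance of the model:
➢ Method 1: using the critical value
Considering this hypothesis:
$H_0: R^2 = 0 \ (\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0)$
$H_1: R^2 \neq 0$
Using the "test" command to test the hypothesis:
. test hpi Gper Ocons Dens
 ( 1)  hpi = 0
 ( 2)  Gper = 0
 ( 3)  Ocons = 0
 ( 4)  Dens = 0
       F(  4,   143) =   18.37
            Prob > F =    0.0000
. di Ftail(4, 143, .05)
.99526432
Note that Ftail(4, 143, .05) returns the upper-tail probability at F = 0.05, not a critical value. The 5% critical value is given by invFtail(4, 143, .05), which is approximately 2.43.
F(4, 143) = 18.37 > F0.05(4, 143) ≈ 2.43 → Reject H0.
- Conclusion: the regression model is statistically significant as a whole.
➢ Method 2: using the p-value
Considering this hypothesis:
$H_0: R^2 = 0 \ (\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0)$
$H_1: R^2 \neq 0$
- From Table 1, we have p-value = 0.0000 < α = 0.05 → Reject H0.
- Conclusion: the regression model is statistically significant as a whole.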
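The F statistic of the overall test can also be derived from R-squared alone. The sketch below is a Python illustration; the critical value 2.43 is our assumption for the 5% point of F(4, 143), taken as an approximation rather than from the STATA output, and the function name `overall_f` is our own.

```python
R2, N, K = 0.3394, 148, 4      # from Table 1: R-squared, observations, regressors
F_CRIT = 2.43                   # assumed approximate 5% critical value of F(4, 143)

def overall_f(r2, n, k):
    """F statistic for H0: all slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f_stat = overall_f(R2, N, K)
reject_h0 = f_stat > F_CRIT
print(round(f_stat, 2), reject_h0)
```

The statistic is far above any plausible 5% critical value, so the decision does not depend on the exact value assumed for `F_CRIT`.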
b Testing the regression coefficients
➢ Level of significance: α = 0.05
i | Variable | P-value   | Statistically significant | Conclusion
1 | hpi      | 0.042 < α | Yes                       | Happy Planet Index affects air quality
2 | Gper     | 0.000 < α | Yes                       | GDP per capita affects air quality
3 | Ocons    | 0.006 < α | Yes                       | Oil consumption per day affects air quality
4 | Dens     | 0.478 > α | No                        | Population density has no statistically significant effect on the level of air pollution
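The same conclusions follow from the t statistics, which are simply each coefficient divided by its standard error. The sketch below is in Python; the critical value 1.977 is our assumed approximation of the two-sided 5% point of the t distribution with 143 degrees of freedom, not a figure from the report.

```python
# (coefficient, standard error) pairs copied from Table 1.
estimates = {
    "hpi":   (-0.8097223, 0.3944118),
    "Gper":  (-0.0008548, 0.0001164),
    "Ocons": (0.0018599, 0.000669),
    "Dens":  (0.0003528, 0.0004963),
}
T_CRIT = 1.977  # assumed approximate two-sided 5% critical value of t(143)

t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
significant = [name for name, t in t_stats.items() if abs(t) > T_CRIT]

for name, t in t_stats.items():
    print(f"{name:6s} t = {t:6.2f}  significant: {name in significant}")
```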
III EXAMINATION OF VIOLATED ESTIMATIONS IN THE MODEL
1 Normal distribution
➢ Method 1: Testing normal distribution by graphs
The graphs suggest that AQI looks like a continuous, roughly bell-shaped variable, but no conclusion can be drawn from the graphs alone.
➢ Method 2: Testing normal distribution by considering skewness and kurtosis values
- A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis. Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or a lack of outliers. A uniform distribution would be the extreme case. A variable is approximately normally distributed when its skewness is close to 0 and its kurtosis is close to 3.
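To make the two measures concrete, the sketch below computes moment-based skewness and kurtosis in Python for two toy samples. The samples are hypothetical and are not the report's data, and the function name `skewness_kurtosis` is our own.

```python
def skewness_kurtosis(data):
    """Return moment-based skewness (m3 / m2**1.5) and kurtosis (m4 / m2**2)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Toy samples (hypothetical).
symmetric = [1, 2, 3, 4, 5]            # evenly spread, no tails
right_tailed = [1, 1, 2, 2, 3, 20]     # one large outlier on the right

print(skewness_kurtosis(symmetric))
print(skewness_kurtosis(right_tailed))
```

The symmetric sample has zero skewness and a kurtosis below 3 (light tails), while the sample with an outlier has positive skewness and a kurtosis above 3 (heavy tail).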
• Variable description with STATA:
                      AQI
      Percentiles    Largest     Std. Dev.   35.07532
90%       108          165       Variance    1230.278
99%       165          188       Kurtosis    5.162938
• Testing the two values:
. sktest aqi
                    Skewness/Kurtosis tests for Normality
                                                         ------- joint ------
    Variable |    Obs   Pr(Skewness)   Pr(Kurtosis)  adj chi2(2)    Prob>chi2
-------------+---------------------------------------------------------------
         aqi |    148      0.0000         0.0008        37.18         0.0000
For aqi (Air Quality Index), the p-values of the skewness and kurtosis tests are both smaller than α = 0.05 → we must reject H0 (normal distribution).
➔ AQI, the dependent variable of the regression model, is not normally distributed.
However, with n = 148 observations, we believe the model still gives acceptable output and can be used for statistical inference.
2 Multicollinearity
a. CORRELATIVE ANALYSIS: presented in II.2.
b. VARIANCE INFLATION FACTORS (VIF):
    Variable |       VIF       1/VIF
-------------+----------------------
        Dens |      1.13    0.887027
       Ocons |      1.10    0.911262
         hpi |      1.09    0.919770
        Gper |      1.09    0.920100
-------------+----------------------
    Mean VIF |      1.10
Conclusion: All VIF values are smaller than 10 ➔ the regression model does not suffer from multicollinearity.
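The VIF of a regressor is 1 / (1 - R²j), where R²j comes from regressing that variable on the other independent variables, so the reported 1/VIF column equals 1 - R²j. The sketch below, in Python, recovers the VIFs, the auxiliary R-squared values and the mean VIF from that column and applies the VIF > 10 rule used in the report.

```python
# Tolerances (1/VIF) copied from the VIF table.
tolerance = {"Dens": 0.887027, "Ocons": 0.911262, "hpi": 0.919770, "Gper": 0.920100}

vif = {name: 1 / tol for name, tol in tolerance.items()}
aux_r2 = {name: 1 - tol for name, tol in tolerance.items()}  # R-squared of each auxiliary regression
mean_vif = sum(vif.values()) / len(vif)
has_problem = any(v > 10 for v in vif.values())

print({k: round(v, 2) for k, v in vif.items()}, round(mean_vif, 2), has_problem)
```

Even the least independent regressor, Dens, has only about 11% of its variation explained by the other three variables.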
CONCLUSION
After describing the data, estimating the model and examining possible violations of its assumptions, we reach these conclusions:
• Estimated sample regression model:
aqi = 97.35311 - 0.8097223 hpi - 0.0008548 Gper + 0.0018599 Ocons + 0.0003528 Dens
These steps have helped us answer the question posed in the introduction: do HPI (Happy Planet Index), GDP per capita, oil consumption and population density affect the level of air pollution (AQI), and to what extent? At the 5% level, HPI, GDP per capita and oil consumption have a statistically significant effect on AQI, while population density does not. Using STATA, our group produced specific figures, estimated the regression model and carried out the main tasks of regression analysis.
• Some limitations in implementation and recommendations:
- Limitations:
+ The data were collected manually from a variety of internet sources, so errors are unavoidable.
+ In fact, there are countless factors besides HPI, Gper, Ocons and Dens that affect the level of air pollution. The chosen variables might not be the most appropriate ones to explain the output.
- Recommendations: If possible, more variables should be added to the model, such as environmental tax, average rainfall and constructions per sq. mile, for a more incisive overview of the research question.
References:
- Basic Econometrics, 5th edition, by D. N. Gujarati
- STATA 12 software
- Internet sources:
o https://air.plumelabs.com/en/
o http://happyplanetindex.org/
o http://statisticstimes.com/economy/projected-world-gdp-ranking.php
o https://ceoworld.biz/2018/11/13/the-worlds-biggest-oil-consuming-countries/
o https://populationof2019.com/population-of-beijing-2019.html
o http://www.thongke.info.vn/Desktop.aspx/Quan_ly_so_lieu/Phan-bo-chuan-Normal-distribution-trong-Stata/Phan_bo_chuan_Normal_distribution_trong_Stata/