ECONOMETRICS REPORT
Analysing certain factors that affect air pollution
Ha Noi, September 2019
FOREIGN TRADE UNIVERSITY
Tutor: MSc Quynh Thuy Nguyen
Class: KTEE218(1-1920).1_LT
Group members:
Phạm Đức Thịnh – 1814450065
Nguyễn Gia Dương – 1814450024
Nguyễn Quang Huy – 1814450043
Nguyễn Anh Đức – 1814450020
Bạch Quốc Hoàng – 1814450039
INTRODUCTION
CONTENTS
I DATA DESCRIPTION
1 REGRESSION MODEL FORMATION
2 DATA DEPICTION
II REGRESSION ANALYSIS
1 REGRESSION FUNCTION
2 CORRELATIONS ANALYSIS
3 EXPOSING REGRESSION FUNCTION AND RESULTS
4 REGRESSION MODEL EXAMINATIONS
III EXAMINATION OF VIOLATED ESTIMATIONS IN THE MODEL
1 Normal distribution
2 Multicollinearity
CONCLUSION
References
Amid widespread environmental destruction, air pollution is one of the most significant impacts humanity has had on its own habitat and on the natural world.
Air pollution occurs when harmful or excessive quantities of substances, including gases, particles, and biological molecules, are introduced into Earth's atmosphere.
It may cause diseases, allergies and even death in humans; it may also harm other living organisms such as animals and food crops, and may damage the natural or built environment. Both human activity and natural processes can generate air pollution. Indoor air pollution and poor urban air quality are listed as two of the world's worst toxic pollution problems in the 2008 Blacksmith Institute World's Worst Polluted Places report. Outdoor air pollution alone causes 2.1 to 4.21 million premature deaths annually. According to the 2014 World Health Organization report, air pollution in 2012 caused the deaths of around 7 million people worldwide, an estimate roughly echoed by the International Energy Agency.
Recognizing the urgency of this problem, our group chose the topic "Analysing certain factors that affect air pollution" for our econometrics assignment. Econometric tools let us specify a suitable model for the analysis, because econometric relationships capture the random behaviour of economic relationships, which is generally not considered in purely economic or mathematical formulations. We unanimously decided to use the following factors (variables) to explain the level of air pollution, measured by AQI: HPI (Happy Planet Index), GDP per capita, oil consumption per day and population density, for 35 countries and their biggest cities around the world in the 2016-2019 period.
We would like to express our deepest appreciation to our lecturer, who is also our tutor, for the helpful guidance. We have given our best effort throughout the making of this report, but mistakes are unavoidable. If you find significant ones, please let us know.
CONTENTS
I DATA DESCRIPTION
1 REGRESSION MODEL FORMATION
➢ Dependent variable (output): the level of air pollution measured by AQI (Air Quality Index). Government agencies use the AQI to communicate to the public how polluted the air currently is or how polluted it is forecast to become. The AQI is computed from these components: PM2.5 (fine particles), PM10 (coarse particles), ozone, CO and NO2. We regard it as the most precise single statistic for measuring the level of air pollution.
o Y = AQI
➢ Independent variables:
o X1: HPI (Happy Planet Index)
o X2: Gper (GDP per capita)
o X3: Ocons (Oil consumption per day)
o X4: Dens (People per sq. kilometer)
POPULATION REGRESSION MODEL (PRM)
$Output_i = \beta_0 + \beta_1 HPI_i + \beta_2 Gper_i + \beta_3 Ocons_i + \beta_4 Dens_i + u_i$
The model studies how the level of air pollution in representative cities of 35 nations depends on their Happy Planet Index, GDP per capita (USD), oil consumption per day (thousand barrels) and population density (people per sq. kilometer).
• X1: HPI, the Happy Planet Index, measures what matters: sustainable wellbeing for all. It tells us how well nations are doing at achieving long, happy, sustainable lives. We include this variable to test whether populations with high life expectancy, high wellbeing and low inequality have a positive impact on their living environment.
• X2: Gper, GDP per capita (USD). This is the most convenient statistic for comparing economic growth across countries, or productivity per capita, over a period of time. We would like to find the relationship between people's productivity and the level of air pollution.
• X3: Ocons, oil consumption (thousand barrels/day). We expect this variable to affect the output because the AQI components include emissions from oil-consuming human activities: powering vehicles, manufacturing operations, and so on.
• X4: Dens, population density (people per sq. kilometer). The higher the density, the greater the exploitation of natural resources. A dense area faces more social issues, including air pollution. However, we still want to test the significance of this variable.
2 DATA DEPICTION
o Using the des command in STATA 12 to describe the variables, we obtained this result:
des aqi hpi Gper Ocons Dens

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------
aqi             int     %8.0g                 AQI
hpi             float   %8.0g                 HPI
Gper            long    %8.0g                 GDP per capita
Ocons           float   %8.0g                 Oil Consumtion
Dens            long    %8.0g                 Population Density
o We continue with the "sum" command for data description. "sum" shows the number of observations (Obs), mean, standard deviation (Std. Dev.), minimum value (Min) and maximum value (Max) of each variable:
sum aqi hpi Gper Ocons Dens

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         aqi |       148    55.03378    35.07532         16        188
         hpi |       148        28.9    6.302273       15.9       40.7
        Gper |       148    29417.34    21348.31       1923      82773
       Ocons |       148    2224.618    3732.938      141.1      20094
        Dens |       148    5928.304    5100.263        864      46781
o The table clearly shows the large gap between the largest (82773) and smallest (1923) values of Gper, which is the widest range of any variable.
o Like Gper, Ocons and Dens also show a noticeable disparity between their minimum and maximum values. The reason is that our sample covers countries chosen at random from around the world, including both developed and underdeveloped nations.
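The disparity described above can be checked directly from the summary table. The sketch below is only an illustration: the numbers are copied from the "sum" output, and the helper name `spread` and the use of the coefficient of variation (standard deviation divided by mean) as a scale-free measure are our own assumptions, not part of the original analysis.

```python
# Summary statistics copied from the `sum` output: (mean, std dev, min, max).
summary = {
    "aqi":   (55.03378, 35.07532, 16, 188),
    "hpi":   (28.9, 6.302273, 15.9, 40.7),
    "Gper":  (29417.34, 21348.31, 1923, 82773),
    "Ocons": (2224.618, 3732.938, 141.1, 20094),
    "Dens":  (5928.304, 5100.263, 864, 46781),
}


def spread(stats):
    """Return (range, coefficient of variation) for one variable."""
    mean, std, lo, hi = stats
    return hi - lo, std / mean


spreads = {name: spread(stats) for name, stats in summary.items()}

# Variable with the widest absolute range (max minus min).
widest = max(spreads, key=lambda name: spreads[name][0])

for name, (rng, cv) in spreads.items():
    print(f"{name:6s} range={rng:10.1f}  cv={cv:.2f}")
print("widest range:", widest)
```

Because the variables are measured in different units, the raw ranges are not directly comparable; the coefficient of variation gives a rough unit-free comparison of how dispersed each variable is.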
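II REGRESSION ANALYSIS
1 REGRESSION FUNCTION
a Population regression function (PRF):
$E(Y \mid X_1, X_2, X_3, X_4) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$
b Sample regression function (SRF):
$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{\beta}_4 X_{4i} + \hat{u}_i$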
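To make the step from the PRF to the SRF concrete, here is a minimal sketch of how ordinary least squares coefficients can be obtained by solving the normal equations (X'X)b = X'y. It is an illustration only: the function names `solve` and `ols` and the toy data are hypothetical, and this is not a description of how STATA computes its estimates internally.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def ols(rows, y):
    """Return OLS coefficients [b0, b1, ...] of y on the columns of rows plus an intercept."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Toy data (hypothetical, not the report's dataset): y = 1 + 2*x1 - 3*x2 exactly.
toy_x = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
toy_y = [1 + 2 * x1 - 3 * x2 for x1, x2 in toy_x]
coefs = ols(toy_x, toy_y)
print([round(c, 6) for c in coefs])
```

With noise-free toy data the recovered coefficients match the generating values up to rounding error; with real data such as ours, the residuals are non-zero and the coefficients are estimates.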
2 CORRELATIONS ANALYSIS
➢ Using the "corr" command to examine the correlations between the variables, with Y = aqi, X1 = hpi, X2 = Gper, X3 = Ocons, X4 = Dens:
corr aqi hpi Gper Ocons Dens
(obs=148)

             |      aqi      hpi     Gper    Ocons     Dens
-------------+---------------------------------------------
         aqi |   1.0000
         hpi |  -0.1581   1.0000
        Gper |  -0.5124  -0.0516   1.0000
       Ocons |   0.2017  -0.2333   0.0702   1.0000
        Dens |   0.1935   0.1305  -0.2629   0.1231   1.0000
➢ Explaining the relationships between variables:
- The correlation coefficient between aqi and hpi is -0.1581
- The correlation coefficient between aqi and Gper is -0.5124
- The correlation coefficient between aqi and Ocons is 0.2017
- The correlation coefficient between aqi and Dens is 0.1935
According to the table, no pair of independent variables has a correlation coefficient greater than 0.8 in absolute value (the largest is -0.2629, between Gper and Dens) ➔ there is no sign of multicollinearity in our model.
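The screening rule above can be written out as a short check. This sketch copies the coefficients from the "corr" output; the dictionary layout and the names `pairs`, `strongest` and `flagged` are our own illustrative choices, and the 0.8 cut-off is the rule of thumb used in this report rather than a formal test.

```python
from itertools import combinations

# Pairwise correlations copied from the `corr` output.
corr = {
    ("aqi", "hpi"): -0.1581, ("aqi", "Gper"): -0.5124,
    ("aqi", "Ocons"): 0.2017, ("aqi", "Dens"): 0.1935,
    ("hpi", "Gper"): -0.0516, ("hpi", "Ocons"): -0.2333,
    ("hpi", "Dens"): 0.1305, ("Gper", "Ocons"): 0.0702,
    ("Gper", "Dens"): -0.2629, ("Ocons", "Dens"): 0.1231,
}
regressors = ["hpi", "Gper", "Ocons", "Dens"]
THRESHOLD = 0.8  # rule-of-thumb cut-off used in the report

# Only correlations between explanatory variables matter for multicollinearity.
pairs = {p: corr[p] for p in combinations(regressors, 2)}
strongest = max(pairs, key=lambda p: abs(pairs[p]))
flagged = [p for p, r in pairs.items() if abs(r) > THRESHOLD]

print("strongest pair:", strongest, pairs[strongest])
print("pairs above threshold:", flagged)
```

The comparison uses absolute values because a strong negative correlation between two regressors is as problematic as a strong positive one.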
3 EXPOSING REGRESSION FUNCTION AND RESULTS
a Running the regression: using the "reg" command to estimate the regression model in STATA, with
Y = aqi
X1 = hpi, X2 = Gper, X3 = Ocons, X4 = Dens
reg aqi hpi Gper Ocons Dens

      Source |       SS       df       MS              Number of obs =     148
-------------+------------------------------           F(  4,   143) =   18.37
       Model |  61389.6976     4  15347.4244           Prob > F      =  0.0000
    Residual |  119461.134   143  835.392542           R-squared     =  0.3394
-------------+------------------------------           Adj R-squared =  0.3210
       Total |  180850.831   147  1230.27776           Root MSE      =  28.903

------------------------------------------------------------------------------
         aqi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hpi |  -.8097223   .3944118    -2.05   0.042    -1.589353   -.0300916
        Gper |  -.0008548   .0001164    -7.34   0.000     -.001085   -.0006247
       Ocons |   .0018599    .000669     2.78   0.006     .0005375    .0031823
        Dens |   .0003528   .0004963     0.71   0.478    -.0006282    .0013338
       _cons |   97.35311   12.57025     7.74   0.000     72.50559    122.2006
------------------------------------------------------------------------------
(Table 1)
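The summary statistics in the upper right of Table 1 all follow from the sums of squares in the upper left. As a consistency check, the sketch below recomputes them with the standard textbook formulas; the variable names are our own, and only the two sums of squares and their degrees of freedom are taken from the table.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA panel of Table 1.
ss_model, df_model = 61389.6976, 4
ss_resid, df_resid = 119461.134, 143
ss_total, df_total = ss_model + ss_resid, df_model + df_resid

ms_model = ss_model / df_model   # mean square explained by the model
ms_resid = ss_resid / df_resid   # mean square of the residuals

f_stat = ms_model / ms_resid                  # overall F statistic
r2 = ss_model / ss_total                      # R-squared
adj_r2 = 1 - (1 - r2) * df_total / df_resid   # adjusted R-squared
root_mse = math.sqrt(ms_resid)                # root mean squared error

print(f"F({df_model}, {df_resid}) = {f_stat:.2f}")
print(f"R-squared     = {r2:.4f}")
print(f"Adj R-squared = {adj_r2:.4f}")
print(f"Root MSE      = {root_mse:.3f}")
```

The recomputed values agree with the figures STATA reports, which confirms that the table was transcribed consistently.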
b Linear regression function:
From the STATA result, the estimated linear regression function is:
aqi = 97.35311 - 0.8097223 hpi - 0.0008548 Gper + 0.0018599 Ocons + 0.0003528 Dens + $\hat{u}_i$
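The estimated function can be used to compute a fitted AQI for given values of the regressors. The sketch below is illustrative: the function name `fitted_aqi` is hypothetical, the coefficients are the rounded values from Table 1, and the evaluation point is the vector of sample means from the "sum" output. For OLS with an intercept, the fitted value at the sample means should equal the sample mean of the dependent variable, up to rounding of the reported coefficients.

```python
# Coefficients copied from Table 1 (rounded as reported by STATA).
COEF = {
    "_cons": 97.35311,
    "hpi": -0.8097223,
    "Gper": -0.0008548,
    "Ocons": 0.0018599,
    "Dens": 0.0003528,
}


def fitted_aqi(hpi, gper, ocons, dens):
    """Fitted AQI from the estimated regression function (residual set to zero)."""
    return (COEF["_cons"]
            + COEF["hpi"] * hpi
            + COEF["Gper"] * gper
            + COEF["Ocons"] * ocons
            + COEF["Dens"] * dens)


# Sample means of hpi, Gper, Ocons and Dens from the `sum` output.
at_means = fitted_aqi(28.9, 29417.34, 2224.618, 5928.304)
mean_aqi = 55.03378  # sample mean of aqi from the same output

print(f"fitted AQI at sample means: {at_means:.3f}")
print(f"sample mean of AQI:         {mean_aqi:.3f}")
```

The two numbers differ only in the third decimal place, which is consistent with rounding in the reported coefficients.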
c Explaining results:
For each regression coefficient we give its value and its meaning.
• $\hat{\beta}_1 = -0.8097223 < 0$ (estimator of $\beta_1$): ceteris paribus, when hpi increases by 1 unit, aqi decreases by 0.8097. The relationship is negative, as we predicted: in a nation with a high HPI, citizens reduce their air pollution impact on the environment.
• $\hat{\beta}_2 = -0.0008548 < 0$ (estimator of $\beta_2$): ceteris paribus, when Gper increases by 1 unit (1 USD), aqi decreases by 0.0008548. Surprisingly, an increase in a nation's GDP per capita is accompanied by a reduction in the level of air pollution.
• $\hat{\beta}_3 = 0.0018599 > 0$ (estimator of $\beta_3$): ceteris paribus, when Ocons increases by 1 unit (1 thousand barrels per day), aqi increases by about 0.00186. Therefore, an increase in oil consumption per day leads to an increase in the level of air pollution.
• $\hat{\beta}_4 = 0.0003528 > 0$ (estimator of $\beta_4$): ceteris paribus, when Dens increases by 1 unit (1 person per sq. kilometer), aqi increases by 0.0003528, which means an increase in population density leads to a rise in the level of air pollution. This coefficient is not statistically significant at the 5% level (P-value 0.478 in Table 1).
$R^2 = 0.3394$: the independent variables explain 33.94% of the variation in the dependent variable.
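Because the regressors are measured in very different units, the raw coefficients are hard to compare. One rough way to compare them, which is our own addition and not part of the original analysis, is to multiply each coefficient by the sample standard deviation of its variable, giving the change in fitted AQI associated with a one-standard-deviation increase. The coefficients come from Table 1 and the standard deviations from the "sum" output.

```python
# Coefficients from Table 1 and sample standard deviations from the `sum` output.
coef = {"hpi": -0.8097223, "Gper": -0.0008548, "Ocons": 0.0018599, "Dens": 0.0003528}
std_dev = {"hpi": 6.302273, "Gper": 21348.31, "Ocons": 3732.938, "Dens": 5100.263}

# Change in fitted AQI for a one-standard-deviation increase in each regressor,
# holding the other regressors fixed.
effect = {name: coef[name] * std_dev[name] for name in coef}

# Order the regressors from largest to smallest effect in absolute value.
ranked = sorted(effect, key=lambda name: abs(effect[name]), reverse=True)

for name in ranked:
    print(f"{name:6s} {effect[name]:+.2f} AQI points per one std dev")
```

On this scale GDP per capita has by far the largest association with AQI, which matches its having the largest t statistic in absolute value in Table 1.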
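4 REGRESSION MODEL EXAMINATIONS
a Testing the overall significance of the model:
➢ Critical value method:
- Consider the hypotheses:
$H_0: R^2 = 0$ (equivalently $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$)
$H_1: R^2 > 0$ (at least one $\beta_j \neq 0$)
- Using the "test" command to test the hypothesis:
test hpi Gper Ocons Dens
 ( 1)  hpi = 0
 ( 2)  Gper = 0
 ( 3)  Ocons = 0
 ( 4)  Dens = 0

       F(  4,   143) =   18.37
            Prob > F =    0.0000
- The 5% critical value is $F_{0.05}(4, 143) \approx 2.43$, which STATA returns with invFtail(4, 143, .05). The command di Ftail(4, 143, .05) returns .99526432, but that figure is the upper-tail probability of the F distribution at the point 0.05, not a critical value.
F(4, 143) = 18.37 > $F_{0.05}(4, 143) \approx 2.43$ → Reject H0
- Conclusion: the regression model is statistically significant as a whole.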
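The link between the hypothesis on $R^2$ and the F statistic can be made explicit. As a worked check, assuming the standard relation between F and $R^2$ for a model with an intercept, with k = 4 regressors and n = 148 observations (so n - k - 1 = 143):

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
  = \frac{0.3394 / 4}{(1 - 0.3394) / 143}
  = \frac{0.08485}{0.0046196}
  \approx 18.37
```

This reproduces the F(4, 143) value reported by STATA, so testing $R^2 = 0$ and testing that all four slope coefficients are zero are the same test.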
➢ P-value method:
- Consider the hypotheses:
$H_0: R^2 = 0$ (equivalently $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$)
$H_1: R^2 > 0$ (at least one $\beta_j \neq 0$)
- From Table 1, we have P-value = 0.0000 < $\alpha$ = 0.05 → Reject H0
- Conclusion: the regression model is statistically significant as a whole.
b Testing the regression coefficients
➢ Level of significance: $\alpha = 0.05$

$\hat{\beta}_i$   Variable   P-value             Statistically significant   Conclusion
1      hpi        0.042 < $\alpha$    Yes                         Happy Planet Index affects air quality
2      Gper       0.000 < $\alpha$    Yes                         GDP per capita affects air quality
3      Ocons      0.006 < $\alpha$    Yes                         Oil consumption per day affects air quality
4      Dens       0.478 > $\alpha$    No                          Population density has no statistically significant effect on the level of air pollution
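The same decisions can be reached from the t statistics and the confidence intervals in Table 1. In the sketch below, the coefficients, standard errors and interval bounds are copied from Table 1. The 5% two-sided critical value is not printed by STATA, so as an assumption of this example it is backed out from the hpi interval as half-width divided by standard error; the names `table1`, `t_crit` and `results` are our own.

```python
# (coefficient, std. error, lower 95% bound, upper 95% bound) copied from Table 1.
table1 = {
    "hpi":   (-0.8097223, 0.3944118, -1.589353, -0.0300916),
    "Gper":  (-0.0008548, 0.0001164, -0.001085, -0.0006247),
    "Ocons": (0.0018599, 0.000669, 0.0005375, 0.0031823),
    "Dens":  (0.0003528, 0.0004963, -0.0006282, 0.0013338),
}

# Critical value implied by the hpi interval: half-width / std. error.
# This is inferred from the table, not a number reported by STATA.
_, se_hpi, lo_hpi, hi_hpi = table1["hpi"]
t_crit = (hi_hpi - lo_hpi) / 2 / se_hpi

results = {}
for name, (b, se, lo, hi) in table1.items():
    t = b / se                        # t statistic for H0: coefficient = 0
    significant = abs(t) > t_crit     # reject H0 at the 5% level?
    ci_contains_zero = lo <= 0 <= hi  # equivalent check via the interval
    results[name] = (round(t, 2), significant, ci_contains_zero)

print(f"implied critical value: {t_crit:.3f}")
for name, (t, significant, ci_contains_zero) in results.items():
    print(f"{name:6s} t={t:6.2f} significant={significant} CI contains 0={ci_contains_zero}")
```

A coefficient is significant at the 5% level exactly when its 95% confidence interval excludes zero, so the two checks always agree; here only Dens fails both.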
III EXAMINATION OF VIOLATED ESTIMATIONS IN THE MODEL
1 Normal distribution
Testing hypothesis:
$H_0$: $u_i$ follows a normal distribution
$H_1$: $u_i$ does not follow a normal distribution
➢ Method 1: Testing normal distribution by graphs
[Figure: distribution graphs of AQI (axes: AQI, Frequency)]
The graphs suggest that AQI looks roughly like a continuous, bell-shaped (normally distributed) variable, but graphs alone do not allow us to draw a conclusion.
➢ Method 2: Testing normal distribution by considering skewness and kurtosis values
- A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis. Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers. A uniform distribution would be the extreme case. A variable is approximately normally distributed when its skewness is close to 0 and its kurtosis is close to 3.
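The two measures described above can be computed from central moments. The sketch below uses the moment-based definitions (skewness as m3 / m2^1.5 and kurtosis as m4 / m2^2, where m_r is the average of the r-th power of deviations from the mean); the function name `shape` and both example data sets are hypothetical and are not taken from our sample.

```python
def shape(data):
    """Return (skewness, kurtosis) from central moments m2, m3 and m4."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical examples: a symmetric data set and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 20]

for label, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    skew, kurt = shape(data)
    print(f"{label:13s} skewness={skew:.3f} kurtosis={kurt:.3f}")
```

The symmetric example has zero skewness and a kurtosis below 3 (light tails), while the single large value in the second example produces positive skewness, the same pattern a few very polluted cities would create in AQI data.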
• Variable description with STATA:

                             AQI
-------------------------------------------------------------
      Percentiles      Smallest
 1%           18             16
 5%           23             18
10%           25             19       Obs                 148
25%         30.5             21       Sum of Wgt.         148

50%         41.5                      Mean           55.03378
                        Largest       Std. Dev.      35.07532
75%           67            162
90%          108            165       Variance       1230.278
95%          126            165       Skewness       1.597139
99%          165            188       Kurtosis       5.162938
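• Testing the two values:
sktest aqi

                    Skewness/Kurtosis tests for Normality
                                                         ------- joint ------
    Variable |    Obs   Pr(Skewness)   Pr(Kurtosis)  adj chi2(2)    Prob>chi2
-------------+---------------------------------------------------------------
         aqi |    148      0.0000         0.0008        37.18         0.0000

From the result for the AQI variable (Air Quality Index), the p-values of the skewness test, the kurtosis test and the joint test are all smaller than $\alpha$ = 0.05 → we must reject H0
➔ The regression model does not satisfy the normal distribution assumption. Note that the test was run on aqi itself; the disturbances $u_i$ named in the hypothesis were not tested directly.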
However, with n = 148 observations, we believe the sample is large enough for the estimators to be approximately normally distributed, so the model still gives acceptable output and can be used for statistical inference.
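As a cross-check on the normality result, the reported skewness and kurtosis can be plugged into the Jarque-Bera statistic. This is a different test from the sktest used above, so its statistic is not expected to match the adjusted chi2 of 37.18; it is shown only as an illustrative alternative. The inputs come from the detailed summary of AQI, and the closed-form tail probability exp(-x/2) is specific to the chi-square distribution with 2 degrees of freedom.

```python
import math

# Sample size, skewness and kurtosis of aqi copied from the detailed summary.
n, skew, kurt = 148, 1.597139, 5.162938

# Jarque-Bera statistic: under normality it is approximately chi-square with 2 df.
jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# For a chi-square(2) variable the upper-tail probability is exp(-x / 2),
# so the p-value and the 5% critical value have closed forms.
p_value = math.exp(-jb / 2)
critical = -2 * math.log(0.05)

print(f"JB statistic   = {jb:.2f}")
print(f"5% critical    = {critical:.3f}")
print(f"reject H0 (5%) = {jb > critical}")
```

The statistic is far above the critical value, which agrees with the conclusion that AQI is not normally distributed.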
2 Multicollinearity
a CORRELATION ANALYSIS: presented in II.2
b TESTING VARIANCE INFLATION FACTOR (VIF)
We have: $VIF_j = \frac{1}{1 - R_j^2}$, where $R_j^2$ is the R-squared from regressing $X_j$ on the other independent variables.

    Variable |       VIF       1/VIF
-------------+----------------------
        Dens |      1.13    0.887027
       Ocons |      1.10    0.911262
         hpi |      1.09    0.919770
        Gper |      1.09    0.920100
-------------+----------------------
    Mean VIF |      1.10
Conclusion: All VIF values are smaller than 10 ➔ The regression model does not suffer from multicollinearity.
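The VIF column can be reproduced from the 1/VIF column (the tolerance), and the same numbers give the auxiliary R-squared of each regressor on the others. The sketch below copies the tolerances from the table above; the names `tolerance`, `vif` and `aux_r2` are our own, and the cut-off of 10 is the rule of thumb used in this report.

```python
# Tolerances (1/VIF) copied from the VIF table.
tolerance = {"Dens": 0.887027, "Ocons": 0.911262, "hpi": 0.919770, "Gper": 0.920100}

# VIF_j = 1 / (1 - R_j^2), so the tolerance equals 1 - R_j^2.
vif = {name: 1 / tol for name, tol in tolerance.items()}
aux_r2 = {name: 1 - tol for name, tol in tolerance.items()}
mean_vif = sum(vif.values()) / len(vif)

for name in tolerance:
    print(f"{name:6s} VIF={vif[name]:.2f}  auxiliary R-squared={aux_r2[name]:.4f}")
print(f"Mean VIF = {mean_vif:.2f}")
print("any VIF above 10:", any(v > 10 for v in vif.values()))
```

Each regressor is explained by the other three to an extent of at most about 11%, which is why every VIF is close to 1.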
CONCLUSION
After describing the data, estimating and analysing the model, and checking for violations of the model assumptions, we have reached these conclusions:
• Estimated sample regression model:
Output = 97.35311 - 0.8097223 hpi - 0.0008548 Gper + 0.0018599 Ocons + 0.0003528 Dens
The steps above have helped us answer the question posed in the introduction: do HPI (Happy Planet Index), GDP per capita, oil consumption and population density affect the level of air pollution (AQI), and to what extent? At the 5% significance level, HPI, GDP per capita and oil consumption have a statistically significant effect on AQI, while population density does not. With the help of STATA, our group has produced specific figures and run the model in order to find the best-fitting regression model and complete all the tasks of the regression analysis.
• Some limitations in implementation and recommendations:
- Limitations:
+ The data were collected manually from a variety of internet sources, so errors are unavoidable.
+ In fact, countless factors besides HPI, Gper, Ocons and Dens affect the level of air pollution. The chosen variables may not be the most suitable ones to explain the output.
- Recommendations: if possible, we should add more variables to the model, such as environmental tax, average rainfall, constructions per sq. mile, etc., for a more incisive overview of our research.