FOREIGN TRADE UNIVERSITY
FACULTY OF INTERNATIONAL ECONOMICS
ECONOMETRICS REPORT
FACTORS AFFECTING HUMAN
LIFE EXPECTANCY
GROUP 23
2 Nguyễn Thị Hải Yến 1815520240
3 Nguyễn Thị Huyền Trang 1815520234
Lecturer : Mrs Tu Thuy Anh
Ha Noi, October 2019
TABLE OF CONTENTS
INTRODUCTION
Chapter 1 : LITERATURE REVIEW
1 Pickett and Wilkinson (2005)
2 Erdal Demirhan, Mahmut Masca (2008)
3 Lillard, Burkhauser, Hahn, and Wilkins (2014)
Chapter 2 : METHODOLOGY
1 Methodology used
1.1 Methodology in collecting data
1.2 Methodology in analyzing data
3 Data Description
3.1 Data source
3.2 Statistics Description
3.3 Correlation Description Among Variables
Chapter 3 : ESTIMATED RESULT & STATISTICAL INFERENCE
1 Estimated Model
2 Testing Assumption
2.1 Testing Multicollinearity
2.2 Testing Autocorrelation
2.3 Testing Normality of residual
2.4 Testing Heteroskedasticity
4 Hypothesis Testing
4.1 Statistical Significance of Coefficients (in model 2)
4.2 Statistical Significance of Model 2
5 Recommendation
CONCLUSION
REFERENCES
APPENDIX
INTRODUCTION
Human life expectancy is a statistical measure of the average time a person is expected to live, based on the year of birth, current age and other demographic factors, including gender. Especially in modern life, with rapid development and an increasing trend towards international integration, life expectancy is perhaps the most important measure of health as well as of national development.
A great many factors affecting human longevity have been researched before; however, it is only realistic to cover a comparatively small number of such factors in a statistical analysis. Thus the question arises: which factors actually have an effect on human life expectancy? Aware of how important life expectancy is, we decided to learn more about the elements, whether beneficial or harmful, that affect the human lifespan. The variables we test are GDP per capita, GNI per capita and the level of air pollution.
In researching the topic "Factors affecting human life expectancy", we aim to understand these factors and find solutions that contribute to raising human life expectancy. To accomplish this goal, we estimate a regression model and test the significance of its variables.
The econometrics software Gretl was used throughout this project. In the end, we expect to gain an objective view of the issue and to propose appropriate measures for improving problems related to life expectancy.
Apart from the introduction and conclusion, this report includes three main parts:
- Chapter 1 : Literature Review
- Chapter 2 : Methodology
- Chapter 3 : Estimated Result & Statistical Inference
Chapter 1 : LITERATURE REVIEW
Before studying the relationship between average lifespan and poverty, it was crucial to examine other studies that dealt with a similar topic. Life expectancy is an important index that reflects the standard of living, the social situation and the level of economic development of a country. Therefore, in recent decades, many studies have investigated life expectancy and the potential factors affecting it.
Three papers were specifically chosen for this literature review to show that the project topic is significant. Pickett and Wilkinson (2005) reviewed other research papers' findings on the relationship between income inequality and population health; Erdal Demirhan and Mahmut Masca (2008) explored seven determinants of FDI in 38 developing countries from 2000 to 2004; and Lillard, Burkhauser, Hahn, and Wilkins (2014) examined a self-reported health survey and income inequality.
1 Pickett and Wilkinson (2005)
The paper reviewed research on the relationship between income inequality and population health and classified each study's results as "wholly supportive," "unsupportive," or "partially supportive" of the claim that the two variables are related. "Wholly supportive" meant that all the reported associations between the two variables were statistically significant and positive. "Unsupportive" meant that there were no statistically significant positive associations. "Partially supportive" meant that only some of the relationships showed statistically significant positive associations.
Seventy percent of the studies suggested that the larger the income inequality, the poorer the health of the population. The paper found that it was important to sample large areas to reveal the true nature of income inequality. For example, studies that looked at small areas were less likely to find a relationship between income inequality and health than international studies or studies examining large subnational regions.
Another issue in a few of the studies was identifying the proper control variables. For example, the authors acknowledge that as countries become wealthier per capita, the relationship between life expectancy and GNI per capita becomes weaker. Once these two issues were taken into account, Wilkinson and Pickett reviewed all of the papers and found that only 8% of them were unsupportive of the claim that health and income inequality are related. Therefore, health and income inequality appear to be associated.
2 Erdal Demirhan, Mahmut Masca (2008)
In the paper "Determinants of foreign direct investment flows to developing countries: a cross-sectional analysis", Erdal Demirhan and Mahmut Masca explored seven determinants of FDI using cross-country data on 38 developing countries over the five-year period from 2000 to 2004. One of these determinants that is directly related to life expectancy at birth (LEB) is GDP per capita, whose growth rate is used in the study as a proxy for market size.
Before building their own model, the authors reviewed the findings of several existing studies on the topic. Market size, measured by GDP per capita, appears to be the most robust FDI determinant in econometric studies (Artige and Nicolini, 2005). The idea is also supported and further explained by Jordaan (2004), who argued that FDI tends to flow into economies with larger and expanding markets, that is, greater purchasing power or higher GDP per capita, where firms have a better chance of earning higher returns on invested capital and thus higher profits. Higher GDP per capita, in turn, is linked to higher life expectancy.
3 Lillard, Burkhauser, Hahn, and Wilkins (2014)
This research investigated the relationship between a US-born adult's self-reported health and income inequality. The dependent variable was self-reported health on a scale from 1 to 5 (1 being "poor" and 5 being "excellent"). The independent variables were the income share held by the top 1% when the individual was aged 0-4, and whether or not the individual was considered poor while growing up.
The main finding of this paper was that individuals who experienced high income inequality early in life were more likely to have worse health, and this association is statistically significant for both genders. For example, a male who grew up in a society with high income inequality would be more likely to have worse health.
However, there are some issues with this paper that the researchers acknowledge. Since the income inequality measure only changes over time and does not differ across groups living in different regions of the US, there may be omitted variable bias. Furthermore, the paper uses a linear model of the relationship between inequality and health, when the true model may in fact be nonlinear. Though the paper does not explain why health and income inequality may be associated, it does encourage future studies to examine the mechanism behind the relationship.
➔ From the literature review above, we can see that GDP per capita and GNI per capita affect the level of human life expectancy. However, no existing study includes the impact of all these factors together, so we decided to conduct this research to find out how they affect the life expectancy of 180 countries around the world.
Chapter 2 : METHODOLOGY
1 Methodology used:
During the project, we used our knowledge of econometrics and macroeconomics, supported mainly by Gretl, Microsoft Excel and Microsoft Word, to complete this report.
1.1 Methodology in collecting data:
We collected a data set on factors affecting human life expectancy: GDP per capita, air pollution and GNI per capita. This secondary data was gathered from a reliable source of information, the World Bank.
1.2 Methodology in analyzing data
We use Gretl to estimate the parameters of the multiple regression model by the Ordinary Least Squares (OLS) method. As a result, we can:
- Use the Normality of residual command to test whether the residuals are normally distributed
To describe the relationship between human life expectancy and the other factors, the regression function can be written in stochastic form as follows:
● Population Regression Function (PRF):
$$LE = \beta_0 + \beta_1 GDP + \beta_2 AP + \beta_3 GNI + u$$
● Sample Regression Function (SRF):
$$LE = b_0 + b_1 GDP + b_2 AP + b_3 GNI + e$$
In which:
$\beta_0$: the intercept of the regression model
$\beta_i$: the slope coefficient of the independent variable $X_i$ ($i = 1, 2, 3$)
$u$: the disturbance of the regression
$b_0$: the estimate of $\beta_0$
$b_i$: the estimate of $\beta_i$
$e$: the residual, i.e. the estimate of $u$
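Gretl performs the OLS estimation internally. As a rough illustration of what it computes, the sketch below solves the normal equations (X'X)b = X'y in plain Python and also returns R² = 1 − SSR/SST. The function names `solve` and `ols` and the toy data are hypothetical, not part of Gretl; statistical packages generally use more numerically stable matrix decompositions than the normal equations used here.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(y, columns):
    """Regress y on a constant plus the given columns; return (coefficients, R^2)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # sum of squared residuals e
    sst = sum((yi - mean_y) ** 2 for yi in y)                # total sum of squares
    return b, 1 - ssr / sst


# Hypothetical toy data generated exactly from y = 1 + 2*x1 - 3*x2,
# so OLS should recover b = [1, 2, -3] with R^2 = 1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a - 3 * c for a, c in zip(x1, x2)]
coefficients, r2 = ols(y, [x1, x2])
print([round(v, 6) for v in coefficients], round(r2, 6))
```

With the real data, `y` would be LE and the columns GDP, AP and GNI, giving b0 to b3 of the SRF.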
Variables Explanation
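➢ Dependent variable: LE: human life expectancy (years)
➢ Independent variables:
- GDP: GDP per capita (US$)
- AP: level of air pollution (micrograms per cubic meter)
- GNI: GNI per capita (US$)
Exhibit 2.1 Variables Explanation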
- We use the "summary statistics" command in Gretl to obtain descriptive statistics for the variables: the average value (Mean), the middle value (Median), the standard deviation (S.D.), the minimum value (Min) and the maximum value (Max).
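The same five indicators can be reproduced with Python's standard `statistics` module, for instance to cross-check a column exported from Gretl or Excel. This is a sketch: it uses the sample standard deviation (n − 1 denominator), which we assume matches the S.D. that Gretl reports, and the input values below are made up for illustration.

```python
import statistics


def summarize(values):
    """Return the indicators listed in the summary statistics table for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation, n - 1 denominator
        "min": min(values),
        "max": max(values),
    }


# Hypothetical life expectancy values (years), for illustration only.
print(summarize([62.5, 71.0, 75.3, 80.1, 83.4]))
```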
Exhibit 2.2 Statistics Description (Source : Gretl)
Summary Statistics, using the observations 1 – 40
From the results in Exhibit 2.2 above:
o The standard deviation of LE is 8.035814, which is high and means that life expectancy differs considerably across countries. Rich, developed countries, mainly in the Americas and Europe, often have a high average life expectancy (over 80 years), while many developing countries in Asia and Africa have an average life expectancy of around 60 to 70 years.
o The standard deviation of GDP is 17554.63. This is also a high standard deviation, which shows that the gap in average income between countries is quite large. This is understandable because the level of economic development differs markedly among nations: GDP per capita in the Americas or Europe is often much higher than in Asian or African countries.
o The mean value of AP, 28.34444, is slightly above the safe level of 25, indicating a mild level of pollution on average, and its standard deviation is 19.77875. Countries with severe levels of pollution are mostly in Asia (including the Middle East) and Africa (for example: Qatar: 107, Saudi Arabia: 106, India: 74, Kuwait: 67), whereas in developed countries in Europe, North America and Oceania, pollution levels are very low (for example: Australia: 6, USA: 8, Denmark: 11, UK: 12).
3.3 Correlation Description Among Variables
- Before running the regression model, we examine the correlation among variables using the "correlation matrix" command in Gretl.
- We obtained the correlation matrix shown in Exhibit 2.3:
Exhibit 2.3 Correlation Matrix (Source: Gretl)
Correlation coefficients, using the observations 1 - 180
5% critical value (two-tailed) = 0.1463 for n = 180
● The correlation coefficient between LE and GDP is 0.6291, which is positive, fairly high and consistent with theory. It suggests that GDP per capita is positively associated with LE: countries with higher GDP per capita tend to have a considerably higher life expectancy.
● The correlation coefficient between LE and AP is -0.3224, which is consistent with theory. Air pollution is negatively associated with LE: countries with more air pollution tend to have a somewhat lower life expectancy.
● The correlation coefficient between LE and GNI is 0.6413, which is positive, relatively high and consistent with theory. GNI per capita is thus positively associated with LE: countries with higher GNI per capita tend to have a considerably higher life expectancy.
➔ According to the table, all the correlation coefficients between the dependent variable and the independent variables are below 0.8. Multicollinearity, however, concerns the correlations among the independent variables themselves, so it is tested formally with the VIF in Chapter 3.
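The 5% critical value of 0.1463 for n = 180 reported with the correlation matrix follows from the t-test for a correlation coefficient, t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom; solving for r gives r_crit = t_crit/√(t_crit² + n − 2). The sketch below reproduces that number with the standard library only: it integrates the Student t density numerically (Simpson's rule) and inverts it by bisection, a simple stand-in for a statistics library's t quantile function. All function names are our own.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0: 0.5 plus a Simpson's-rule integral of the t density on [0, t]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(v):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(v * v / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Invert t_cdf by bisection; valid for 0.5 <= p < 1."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def critical_r(n, alpha=0.05):
    """Two-tailed critical value of |r| for a sample of size n."""
    df = n - 2
    t = t_quantile(1 - alpha / 2, df)
    return t / math.sqrt(t * t + df)


print(round(critical_r(180), 4))
```

Running it prints 0.1463, matching Gretl. Since |0.6291|, |−0.3224| and |0.6413| all exceed 0.1463, each correlation with LE is statistically significant at the 5% level.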
Chapter 3 : ESTIMATED RESULT & STATISTICAL INFERENCE
1 Estimated Model
- To estimate the regression model, we run the "Ordinary Least Squares" command in Gretl. The result is shown in Exhibit 3.1 below; the numbers below the coefficients are t-ratios.
Exhibit 3.1 Estimated Result
No multicollinearity: not violated (satisfied)
No autocorrelation: not violated (satisfied)
Normal distribution of u: not violated (satisfied)
● From the results in Exhibit 3.1, the Sample Regression Function is as follows:
$$LE = 69.8247 + 8.04672 \times 10^{-5}\,GDP - 0.100591\,AP + 0.000187\,GNI + e$$
● The coefficient of determination:
+ R² = 0.499 means that the independent variables in the model explain 49.9% of the variation in the dependent variable; the remaining 50.1% depends on other factors.
● Meaning of the estimated coefficients:
+ b0 = 69.8247: if all independent variables equal 0, the expected human life expectancy is 69.8247 years.
+ b1 = 8.04672e-05 > 0 => consistent with theory.
Holding other factors constant, when GDP per capita increases by 1 US$, average human life expectancy increases by 8.04672e-05 years (about 0.08 years per 1,000 US$).
+ b2 = -0.100591 < 0 => consistent with theory.
Holding other factors constant, when the air pollution level increases by 1 microgram per cubic meter, average human life expectancy decreases by 0.100591 years.
+ b3 = 0.000187 > 0 => consistent with theory.
Holding other factors constant, when GNI per capita increases by 1 US$, average human life expectancy increases by 0.000187 years (about 0.19 years per 1,000 US$).
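The interpretations above can be checked with a few lines of arithmetic. The sketch below plugs the reported estimates into the SRF, taking b3 as positive as stated in the interpretation of b3, and shows that per-dollar effects that look tiny become easier to read per 1,000 US$. The helper names `predicted_le` and `change_in_le` are illustrative only.

```python
# Estimated coefficients as reported in the text; b3 is taken as +0.000187 (b3 > 0).
B0, B1, B2, B3 = 69.8247, 8.04672e-05, -0.100591, 0.000187


def predicted_le(gdp, ap, gni):
    """Fitted life expectancy (years) from the estimated SRF, ignoring the residual e."""
    return B0 + B1 * gdp + B2 * ap + B3 * gni


def change_in_le(d_gdp=0.0, d_ap=0.0, d_gni=0.0):
    """Change in fitted LE when the regressors change, holding everything else constant."""
    return B1 * d_gdp + B2 * d_ap + B3 * d_gni


print(round(change_in_le(d_gdp=1000), 4))  # +1,000 US$ GDP per capita
print(round(change_in_le(d_ap=10), 4))     # +10 micrograms per cubic meter of air pollution
print(round(change_in_le(d_gni=1000), 4))  # +1,000 US$ GNI per capita
```

The three lines correspond to roughly +0.08 years, −1.01 years and +0.19 years of life expectancy.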
2 Testing Assumption
2.1 Testing Multicollinearity
• Multicollinearity is the occurrence of high intercorrelations among the independent variables in a multiple regression model. It can lead to wider confidence intervals and less reliable probability values (p-values) for the independent variables.
• One way to test for multicollinearity is to use the VIF (variance inflation factor), which can be calculated in Gretl. A VIF greater than 10 indicates serious multicollinearity: the variance of that coefficient estimate is inflated, so its standard error is larger and the variable may appear less significant than it actually is.
• Using the Analysis - Collinearity command in Gretl, we obtain the VIFs. The result is shown in Exhibit 3.2 below.
Exhibit 3.2 Collinearity
• Summarizing the result, we get:
VIF(GDP) = 3.383 < 10
VIF(AP) = 1.124 < 10
VIF(GNI) = 3.199 < 10
➢ As the results above show, the VIF of every independent variable is lower than 10. Therefore, we can conclude that there is no serious multicollinearity in this model.
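The VIF of regressor j is VIF_j = 1/(1 − R_j²), where R_j² comes from regressing that regressor on the other independent variables, and √VIF_j is the factor by which its standard error is inflated. The short sketch below inverts this formula for the VIFs reported in Exhibit 3.2; it only does arithmetic on those three numbers.

```python
import math

# VIF values as reported in Exhibit 3.2.
REPORTED_VIF = {"GDP": 3.383, "AP": 1.124, "GNI": 3.199}


def auxiliary_r2(vif):
    """Invert VIF = 1 / (1 - R^2) to recover the auxiliary-regression R^2."""
    return 1 - 1 / vif


for name, vif in REPORTED_VIF.items():
    print(
        f"{name}: auxiliary R^2 = {auxiliary_r2(vif):.3f}, "
        f"standard error inflated x{math.sqrt(vif):.2f}, VIF > 10: {vif > 10}"
    )
```

GDP and GNI each share about 70% of their variation with the other regressors (R² of about 0.704 and 0.687), which is noticeable but well below the R² of 0.9 that a VIF of 10 corresponds to.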
2.2 Testing Autocorrelation
- Autocorrelation mainly affects time-series data, while this data set is cross-sectional, so we skip testing this assumption.
2.3 Testing Normality of residual
- This assumption states that the disturbance should be normally distributed. If it is not, t-tests, F-tests and the confidence intervals based on them may be unreliable (too wide or too narrow), especially in small samples.
- To test the normality of the residuals, we use the Normality of residual command in Gretl and obtain the following result:
Exhibit 3.3 Histogram
Exhibit 3.4 Residual Distribution
❏ Hypotheses:
+ H0: The disturbance follows a normal distribution
+ H1: The disturbance does not follow a normal distribution
As shown in Exhibit 3.4, p-value = 0.00001 < 0.05.
➔ At the 5% level of significance, we have enough evidence to reject H0.
➔ The disturbance does not follow the normal distribution
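As an illustration of how such a normality test works, the sketch below computes the Jarque-Bera statistic, which combines the skewness S and kurtosis K of the residuals as JB = n/6·(S² + (K − 3)²/4) and compares it with a chi-square distribution with 2 degrees of freedom. This is a stand-in under an assumption: Gretl's Normality of residual command may report a different chi-square(2) statistic (such as Doornik-Hansen), so the p-values would not match Exhibit 3.4 exactly. The residual lists below are made up.

```python
import math


def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its chi-square(2) p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # survival function of chi-square with 2 df
    return jb, p_value


# Symmetric residuals: small statistic, large p-value (H0 not rejected).
print(jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0]))
# One large outlier: strong skewness, tiny p-value (H0 rejected).
print(jarque_bera([0.0] * 19 + [10.0]))
```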
According to the Central Limit Theorem, when independent random variables are added, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed. In our case, with 180 observations, the OLS estimators are approximately normally distributed, so the usual significance tests can still be conducted.
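The idea can be illustrated with a small simulation, under the simplifying assumption that we look at sample means rather than OLS coefficients (which are also weighted sums of the disturbances). The exponential distribution used here is strongly right-skewed (theoretical skewness 2); averaging 180 draws shrinks the skewness towards 0 (theoretically 2/√180, about 0.15), i.e. towards the shape of a normal distribution. The sample size of 180 mirrors our data set; the distribution, seed and number of replications are arbitrary choices.

```python
import random


def skewness(values):
    """Sample skewness m3 / m2^1.5 (0 for a symmetric distribution such as the normal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def clt_demo(sample_size=180, replications=2000, seed=1):
    """Compare the skewness of single exponential draws with that of means of sample_size draws."""
    rng = random.Random(seed)
    single_draws = [rng.expovariate(1.0) for _ in range(replications)]
    sample_means = [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(replications)
    ]
    return skewness(single_draws), skewness(sample_means)


raw_skew, mean_skew = clt_demo()
print(f"skewness of single draws: {raw_skew:.2f}, of means of 180 draws: {mean_skew:.2f}")
```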
2.4 Testing Heteroscedasticity
- If the errors are heteroskedastic, the usual standard errors of the OLS estimates cannot be trusted, so the confidence intervals will be either too narrow or too wide. In addition, OLS then tends to give too much weight to the portion of the data with the largest error variance.
- To test for heteroskedasticity, we use White's test in Gretl.
Exhibit 3.5 White test for Heteroskedasticity
- Hypotheses:
+ H0: var(ui) = constant for all i
+ H1: var(ui) is not constant for all i
- As shown in Exhibit 3.5, p-value = 0.004 < 0.05.
➔ At the 5% level of significance, we have enough evidence to reject H0.
➔ Our model suffers from heteroskedasticity.
➔ We can conclude that Model 1 is not the best estimation for this data set.
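White's test regresses the squared OLS residuals on the original regressors, their squares and their cross-products, and compares n·R² from that auxiliary regression with a chi-square distribution whose degrees of freedom equal the number of auxiliary regressors (9 for our three regressors: 3 levels, 3 squares and 3 cross-products). A minimal standard-library sketch follows; the function names and the toy data are hypothetical, and it shows the mechanics rather than reproducing Exhibit 3.5.

```python
import math


def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fitted(y, columns):
    """OLS of y on a constant plus the given columns; return the fitted values."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    b = solve(xtx, xty)
    return [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]


def r_squared(y, fitted):
    mean_y = sum(y) / len(y)
    ssr = sum((a - f) ** 2 for a, f in zip(y, fitted))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1 - ssr / sst


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term = total = math.exp(-half)  # df = 2 case
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    total = math.erfc(math.sqrt(half))  # df = 1 case
    for nu in range(1, df, 2):          # raise df by 2 at each step
        total += math.exp(nu / 2 * math.log(half) - half - math.lgamma(nu / 2 + 1))
    return total


def white_test(y, columns):
    """Return White's LM statistic n*R^2 and its chi-square p-value."""
    fitted = ols_fitted(y, columns)
    u2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]  # squared residuals
    aux = list(columns)
    for i in range(len(columns)):
        for j in range(i, len(columns)):
            aux.append([a * b for a, b in zip(columns[i], columns[j])])  # squares, cross-products
    stat = len(y) * r_squared(u2, ols_fitted(u2, aux))
    return stat, chi2_sf(stat, len(aux))


# Hypothetical data whose error spread grows with x, so H0 should be rejected.
x = [float(i) for i in range(1, 51)]
y = [1 + 2 * xi + (xi if i % 2 == 0 else -xi) for i, xi in enumerate(x)]
stat, p_value = white_test(y, [x])
print(round(stat, 2), p_value)
```

With the real data, the columns would be GDP, AP and GNI, and `chi2_sf` would be evaluated with 9 degrees of freedom.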