TEAM ASSIGNMENT
Lecturer: GREENI MAHESHWARI
Topic: Life Expectancy at Birth
TEAM 3 - SG-G04
Table of Contents
PART 1: MULTIPLE REGRESSION
1.1 Dataset I - All countries (ALL)
1.2 Dataset II - Low-income countries (LI)
1.3 Dataset III - Middle-income countries (MI)
1.4 Dataset IV - High-income countries (HI)
PART 2: TEAM REGRESSION CONCLUSION
Comparison:
Conclusion:
PART 3: TIME SERIES
a Regression output
b Formulas of trend models
c Predictions of Life Expectancy at Birth in 2015, 2016, and 2017
d Predictions’ errors of Life Expectancy at Birth in 2015, 2016, and 2017
e Calculations of MADs and SSEs of countries
PART 4: TEAM TIME SERIES CONCLUSION
Comment:
Time Series Output comparison:
PART 5: OVERALL TEAM CONCLUSION
PART 6: REFERENCE LIST
PART 7: APPENDICES
HI’s backward elimination output updates
LI’s backward elimination output updates
MI’s backward elimination output updates
HI’s backward elimination output updates
PART 1: MULTIPLE REGRESSION
The four data sets have the same dependent and independent variables, which are:
- Dependent variable: Life Expectancy at birth (years)
- Independent variables:
+ Domestic general government health expenditure per capita - PPP (current international $) (PPP)
+ People using at least basic drinking water services (% of the population) (BDWS)
+ Smoking prevalence, total (ages 15+) (% of the population) (SP)
+ GNI per capita, Atlas method (current US$) (GNI)
1.1 Dataset I - All countries (ALL)
Hypothesis Testing: (applied to all subsequent backward eliminations)
- H0: None of the variables has a relationship with life expectancy at birth
- H1: At least one of the variables (PPP, BDWS, SP, GNI) has a relationship with life expectancy at birth
- P-value Test:
+ If p-value < α → reject H0: at least one variable affects life expectancy at birth
+ If p-value > α → do not reject H0: none of the variables has a relationship with life expectancy at birth
- Backward Elimination: given significance level α = 0.05
(regression outputs can be found in the appendix)
+ In the initial multiple regression output, Smoking Prevalence (SP) was the independent variable with the highest p-value (0.886), which is greater than α and indicates an insignificant relationship between this variable and life expectancy at birth. Hence, we applied backward elimination and removed SP before running a new multiple regression.
+ In the second multiple regression output, GNI per capita had the highest p-value (0.857), much larger than α, meaning it was not significant. Following the rule of backward elimination, it was removed from the model before the next multiple regression was run.
- The FINAL model of multiple regression:
+ After eliminating the two insignificant variables, we are left with the final regression output, in which PPP and the percentage of people using basic drinking water services are significant variables with p-values of approximately 0 (< 0.05).
Figure 1 Final model of multiple regression of all countries
+ Regression Equation: y = 43.49 + 0.002X1 + 0.31X2
y: Life expectancy at birth (years)
X1: PPP per capita (current international $)
X2: People using at least basic water services (% of the population)
+ Interpretations of regression coefficients:
● The slope of PPP (0.002) shows that life expectancy at birth in all countries is estimated to increase by 0.002 years for every one international $ increase in PPP per capita, holding the other variable constant
● The slope of BDWS (0.31) shows that life expectancy at birth in all countries is estimated to increase by 0.31 years for every one-percentage-point increase in the share of the population using at least basic water services, holding the other variable constant
● The intercept (43.49) means that with zero general government health expenditure (PPP) and no access to basic drinking water services, the estimated life expectancy at birth in all countries is 43.49 years
+ The coefficient of determination (R² = 0.8239) implies that 82.39% of the variation in life expectancy at birth in all countries can be explained by the variation in PPP per capita and in the percentage of people using at least basic water services in each country. The remaining 17.61% may be due to other ‘demographic and socio-economic’ factors (Mondal and Shitan 2014, pa 1, p.118)
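As a worked illustration of how the final ALL model produces an estimate, the sketch below plugs values into the equation using the rounded coefficients quoted in the interpretations (43.49, 0.002 and 0.31). The function name and the input values (PPP of 1,000 international $ and 90% water access) are hypothetical, chosen only to show the arithmetic, and are not taken from the dataset.

```python
# Rounded coefficients as quoted in the interpretations for the ALL dataset.
INTERCEPT = 43.49   # estimated LEAB when both predictors are zero
SLOPE_PPP = 0.002   # years per international $ of PPP per capita
SLOPE_BDWS = 0.31   # years per percentage point of basic water access


def predict_leab_all(ppp_per_capita, bdws_percent):
    """Estimate life expectancy at birth (years) from the final ALL model."""
    return INTERCEPT + SLOPE_PPP * ppp_per_capita + SLOPE_BDWS * bdws_percent


# Hypothetical country: PPP = 1,000 international $, 90% basic water access.
# 43.49 + 0.002 * 1000 + 0.31 * 90 = 43.49 + 2.0 + 27.9, about 73.39 years.
example = predict_leab_all(1000, 90)
print(round(example, 2))
```

Because the coefficients are rounded, results from this sketch can differ slightly from predictions made with the full-precision regression output.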
1.2 Dataset II - Low-income countries (LI)
- Backward Eliminations: (regression outputs can be found in the appendix)
+ In the first regression output, Smoking prevalence (SP) has the highest p-value. Its p-value (0.3) is higher than α (0.05), so this IV is insignificant and shows no relationship with LEAB. Hence we eliminate SP
+ In the second regression output, GNI per capita is the least significant variable, with a p-value of 0.09, higher than α (0.05). Consequently, this IV shows no relationship with LEAB. Hence we eliminate GNI per capita
+ In the third regression output, People using at least basic drinking water services (% of the population) has the highest p-value. The p-value of BDWS (0.33) is higher than α (0.05). As a result, this IV is insignificant and shows no relationship with LEAB. Hence we eliminate BDWS
+ Finally, the p-value of Domestic general government health expenditure per capita (0.21) is still higher than α (0.05). This IV is therefore insignificant and shows no relationship with LEAB. Hence we eliminate PPP
⇨ Conclusion: As all variables have been eliminated, there is no significant relationship between any of the four variables and Life Expectancy at Birth among low-income countries. Hence, no final regression output is constructed for the LI dataset, which suggests that life expectancy at birth in low-income countries may depend on other factors
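The elimination steps described for each dataset follow the same loop: fit the model, find the variable with the highest p-value, drop it if that p-value exceeds α, and refit. A minimal sketch of that loop is given below. It assumes the p-values come from some regression routine supplied by the caller (here a hypothetical `fit_p_values` callback); the variable names A, B, C and their p-values are toy placeholders, not figures from the report.

```python
from typing import Callable, Dict, List, Tuple


def backward_eliminate(
    variables: List[str],
    fit_p_values: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Drop the least significant variable until all p-values are <= alpha.

    fit_p_values is a caller-supplied (hypothetical) function that refits the
    regression on the given variables and returns one p-value per variable.
    """
    remaining = list(variables)
    dropped = []
    while remaining:
        p_values = fit_p_values(remaining)
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # every remaining variable is significant
        dropped.append((worst, p_values[worst]))
        remaining.remove(worst)
    return remaining, dropped


# Toy stand-in for a regression fit: made-up p-values keyed by variable set.
TOY_P_VALUES = {
    frozenset({"A", "B", "C"}): {"A": 0.01, "B": 0.60, "C": 0.20},
    frozenset({"A", "C"}): {"A": 0.02, "C": 0.04},
}


def toy_fit(remaining):
    return TOY_P_VALUES[frozenset(remaining)]


kept, dropped = backward_eliminate(["A", "B", "C"], toy_fit)
print(kept, dropped)  # ['A', 'C'] [('B', 0.6)]
```

If every variable is dropped, as in the LI dataset, the function returns an empty list of kept variables, which corresponds to having no final model.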
1.3 Dataset III - Middle-income countries (MI)
- Backward Eliminations: (multiple regressions updates can be found in the appendix)
+ Based on the initial multiple regression output, SP is the independent variable with the highest p-value (0.691), which is larger than α (0.691 > 0.05). Since it is insignificant, implying no connection with LEAB, we apply the rule of backward elimination and eliminate this variable
+ Based on our first updated multiple regression output, PPP is the independent variable with the highest p-value (0.556), which is larger than α (0.556 > 0.05). Similar to SP, PPP in this output is insignificant and shows no connection with LEAB. Again, we apply the rule of backward elimination and eliminate this variable
+ In our second updated multiple regression output, GNI is the independent variable with the highest p-value (0.593), which is larger than α (0.593 > 0.05). Hence, GNI in our second update is insignificant and shows no relationship with LEAB. Again, we apply the rule of backward elimination and eliminate this variable
Figure 2 Final multiple regression output of MI countries
- After several eliminations, the final regression contains only BDWS. This significant variable has a p-value of approximately 0, which is smaller than α (0 < 0.05)
- Regression Equation: y = 30.819 + 0.458X
+ y: Life Expectancy at birth (years)
+ X: People using at least basic drinking water services (% of the population)
- Coefficient of determination: R² = 0.7542 indicates that 75.42% of the variation in LEAB can be explained by the variation in the percentage of people using at least basic drinking water services. The remaining 24.58% of the variation may be due to other factors
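For readers who want to see where a figure such as R² = 0.7542 comes from, the sketch below computes the coefficient of determination as 1 - SSE/SST, the share of total variation around the mean that the fitted values account for. The data are toy numbers chosen so the arithmetic is easy to check by hand; they are not the MI dataset.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    # SSE: variation left unexplained by the model.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # SST: total variation of the observations around their mean.
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst


# Toy observations and fitted values (hypothetical).
toy_actual = [1.0, 2.0, 3.0, 4.0]
toy_fitted = [1.1, 1.9, 3.2, 3.8]

# SSE = 0.10 and SST = 5.0, so R² is about 0.98 (98% explained, 2% unexplained).
print(round(r_squared(toy_actual, toy_fitted), 4))
```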
1.4 Dataset IV - High-income countries (HI)
- Backward Elimination: (multiple regression updates can be found in the appendix)
+ In the initial regression output, PPP has the highest p-value, which is higher than α (0.939 > 0.05). Therefore, this IV is insignificant and shows no relationship with LEAB; hence we eliminate PPP
+ In the first updated regression output, BDWS has the highest p-value, which is higher than α (0.314 > 0.05). Therefore, this IV is insignificant and shows no relationship with LEAB; hence we eliminate BDWS
+ In the second updated regression output, SP has the highest p-value, which is higher than α (0.084 > 0.05). Therefore, this IV is insignificant and shows no relationship with LEAB; hence we eliminate SP
Figure 3 Final multiple regression model of HI countries
- Equation: y = b0 + b1x = 77.151 + 0.000072x
+ y: Life Expectancy at birth (years)
+ x: Gross National Income (GNI) per capita (US$)
- Regression Coefficient of Determination:
+ R square = 0.3645 means 36.45% of the variation in LEAB can be explained by variation in GNI per capita, and the remaining 63.55% of the variation in LEAB may
be due to other factors
PART 2: TEAM REGRESSION CONCLUSION
Comparison:
- The significant independent variables differ among the multiple regression models of the different datasets. The all-countries model has two significant variables, PPP and BDWS. The high-income and middle-income models each have only one significant variable, GNI and BDWS respectively. In contrast, life expectancy in low-income countries may be due to other factors, since none of the four variables showed a significant relationship with it in the backward elimination steps
- The regression model of all countries will provide the best estimate of life expectancy at birth because it has the highest R square (82.39%), indicating a strong relationship between life expectancy and the variables, while the models of MI and HI countries have lower values (75.42% and 36.45% respectively, as reported in sections 1.3 and 1.4), implying weaker relationships. In particular, in all countries, 82.39% of the variation in life expectancy at birth can be explained by the variation in PPP per capita and in the percentage of people using at least basic water services (BDWS) in each country; the remaining 17.61% may be due to other factors
Conclusion:
Water and healthcare services are essential factors reflecting a person’s living standards and have significant correlations with life expectancy at birth. Internationally, 82.39% of the variation in life expectancy at birth across countries is explained by these two indicators, both of which have a positive relationship with it. This suggests that if governments invest in and provide better drinking water and medical services, overall lifespan is likely to increase noticeably. Such correlations can also be seen in research from Queensland University of Technology, which shows that people living in rural Australia live 10 years less than those residing in urban areas with fully equipped, good-quality water sources (Queensland University of Technology 2015). Therefore, the more people have access to basic drinking water, the longer the life expectancy they can attain. Additionally, it has been shown that ‘health care expenditure significantly influences health status through improving life expectancy at birth, reducing death and infant mortality rates’ (Novignon, Olakojo & Nonvignon 2012, pa 3)
PART 3: TIME SERIES
a Regression output
- Uganda and El Salvador
+ Linear trend models
Figure 4 Uganda’s linear output Figure 5 El Salvador’s Linear Output
+ Quadratic trend models
Figure 6 Uganda’s Quadratic Output Figure 7 El Salvador’s Quadratic Output
+ Exponential trend models
Figure 8 Uganda’s Exponential Output Figure 9 El Salvador’s Exponential Output
- United Arab Emirates and Burundi
+ Linear trend models
Figure 10 ARE’s linear output Figure 11 Burundi’s linear output
+ Quadratic trend models
Figure 12 ARE’s quadratic output Figure 13 Burundi’s quadratic output
+ Exponential trend models
Figure 14 ARE’s exponential output Figure 15 Burundi’s exponential output
- Qatar and Nigeria
+ Linear trend models
Figure 16 Qatar’s Linear Output Figure 17 Nigeria’s linear output
+ Quadratic trend models
Figure 18 Qatar’s quadratic output Figure 19 Nigeria’s quadratic output
+ Exponential trend models
Figure 20 Qatar’s exponential output Figure 21 Nigeria’s exponential output
- Slovenia and Rwanda
+ Linear trend models
Figure 22 Slovenia’s linear output Figure 23 Rwanda’s linear output
+ Quadratic trend models
Figure 24 Slovenia’s quadratic output Figure 25 Rwanda’s quadratic output
+ Exponential trend models
Figure 26 Slovenia’s exponential output Figure 27 Rwanda’s exponential output
b Formulas of trend models
Countries Linear Formula Quadratic Formula Exponential Formula
Figure 28 Table of trend model formulas from eight countries
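To make the trend formulas concrete, the sketch below fits a linear trend y = b0 + b1·t by ordinary least squares and an exponential trend by fitting a straight line to log10(y) and transforming back. Two points are assumptions rather than details confirmed by the report: time is coded t = 0, 1, 2, ... from the first year in the series, and the exponential model uses base-10 logarithms, as is common in textbook treatments. The series below are toy values, not the countries' data, and the quadratic model is omitted for brevity.

```python
import math


def fit_linear_trend(values):
    """Least-squares fit of y = b0 + b1 * t with t = 0, 1, 2, ... (assumed coding)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


def fit_exponential_trend(values):
    """Fit log10(y) = b0 + b1 * t; values must be positive."""
    return fit_linear_trend([math.log10(v) for v in values])


def forecast_linear(b0, b1, t):
    return b0 + b1 * t


def forecast_exponential(b0, b1, t):
    # Undo the log10 transform to return to the original units.
    return 10 ** (b0 + b1 * t)


# Toy series (hypothetical): a perfectly linear one and a perfectly doubling one.
lin_b0, lin_b1 = fit_linear_trend([50.0, 52.0, 54.0, 56.0])
exp_b0, exp_b1 = fit_exponential_trend([10.0, 20.0, 40.0, 80.0])

print(round(forecast_linear(lin_b0, lin_b1, 4), 3))       # next period, linear
print(round(forecast_exponential(exp_b0, exp_b1, 4), 3))  # next period, exponential
```

Forecasting a later year only requires converting that year to its time code under the chosen coding, for example t = 30 for 2015 if the series starts in 1985 at t = 0.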
c Predictions of Life Expectancy at Birth in 2015, 2016, and 2017
Uganda (UGA)
Figure 29 Table of life expectancy predictions from 2015 to 2017
d Predictions’ errors of Life Expectancy at Birth in 2015, 2016, and 2017
Figure 30 Table of prediction errors in eight countries
e Calculations of MADs and SSEs of countries
- Uganda: the linear model is recommended for its lowest MAD and SSE (1.644 and 8.193); its R² also shows a relatively strong linear relationship
- El Salvador: the quadratic model is recommended for its lowest MAD and SSE (0.891 and 2.491), indicating low errors; its R² also implies a strong quadratic relationship
- United Arab Emirates and Burundi: the quadratic trend model is recommended for its smallest MAD and SSE in each nation (0.0043 and 0.0012 in [ARE]; 0.0043 and 0.0012 in [BDI]). High R² values also show strong quadratic relationships (99.98% in ARE and 96.24% in BDI)
- Qatar and Nigeria: the linear model is recommended for its lowest MAD and SSE in each nation (0.038 and 0.005 in QAT; 0.327 and 0.462 in NGA). High R² values (98.96% in QAT and 99.75% in NGA) also show a strong linear relationship
- Slovenia and Rwanda: the linear model is recommended for its lowest MAD and SSE (0.167 and 0.142 in SVN; 0.883 and 3.72 in RWA). The R² values (98.33% in Slovenia and 60.76% in Rwanda) also show a relatively strong linear relationship
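The recommendations above rest on two error measures computed from the prediction errors: MAD, the mean of the absolute errors, and SSE, the sum of the squared errors. The sketch below shows both calculations and a simple rule that picks the model with the smallest MAD. The actual and forecast values are hypothetical placeholders for three forecast years, and `best_model` is an illustrative helper, not a function used in the report.

```python
def mad(actual, predicted):
    """Mean absolute deviation of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def sse(actual, predicted):
    """Sum of squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def best_model(actual, forecasts):
    """Return the name of the model whose forecasts have the lowest MAD."""
    return min(forecasts, key=lambda name: mad(actual, forecasts[name]))


# Hypothetical actual values and forecasts for three years (toy numbers).
toy_actual = [70.0, 71.0, 72.0]
toy_forecasts = {
    "linear": [70.5, 70.8, 72.4],
    "quadratic": [71.0, 72.0, 73.5],
}

for name, predicted in toy_forecasts.items():
    print(name, round(mad(toy_actual, predicted), 4), round(sse(toy_actual, predicted), 4))
print(best_model(toy_actual, toy_forecasts))
```

SSE penalizes large individual errors more heavily than MAD, so the two measures can occasionally rank models differently; in that case the choice of criterion should be stated explicitly.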
PART 4: TEAM TIME SERIES CONCLUSION
Figure 32 Life expectancy at birth of 8 countries from 1985 to 2017 (series: Burundi [BDI], Uganda [UGA], Rwanda [RWA], Nigeria [NGA], El Salvador [SLV], Slovenia [SVN], United Arab Emirates [ARE], Qatar [QAT])
The 8 countries clearly shared the same upward pattern over the period of 32 years.
Before 1999, the LEAB trends of the 8 countries showed several differences, especially in Rwanda, where LEAB dropped and bottomed out in 1993. In the following 18 years, however, apart from a sharp rise of 15 years in Rwanda, the other 7 nations saw a gradual increase in LEAB.
The chart clearly divides into 2 groups: Rwanda, Nigeria, Burundi and Uganda are scattered across the lower range, while Slovenia, El Salvador, Qatar and the United Arab Emirates sit in the higher range. The first group consists of developing countries with GNI per capita from $260 to $2,880, while those in the second group are developed countries with income per capita ranging from $3,430 to $75,150.
Time Series Output comparison:
Linear model (5): Uganda, Qatar, Nigeria, Slovenia, Rwanda
Quadratic model (3): United Arab Emirates, Burundi, El Salvador
Figure 34 Table of trend model pursued by eight countries
From the time series predictions, we can see that five countries follow the linear model: Uganda, Qatar, Nigeria, Slovenia and Rwanda. The rest follow the quadratic model. Based on the time series analysis, the linear model most often has the smallest MAD and SSE values, so it is the most likely of the three models to give predictions closest to the actual values. Furthermore, in some countries the R-squared values are very high, showing that the observations lie extremely close to the trendlines and hence give more accurate predictions. Specifically, Qatar’s linear model has an R Square of 98.96% while also having the smallest MAD and SSE, of 0.038 and 0.005 respectively. Therefore, we choose its linear trend model as the best predictor of life expectancy at birth
PART 5: OVERALL TEAM CONCLUSION
After conducting various analyses of LEAB, we find that its connection with income (GNI per capita), though positive, is not substantially strong. In fact, PPP and BDWS are two more prominent determinants of life expectancy. Additionally, by using a linear model we predict that life expectancy at birth in 2025 will be 79.253 years. The optimistic signs shown in our previous findings also lead us to believe that the United Nations can attain its goal of increasing longevity in low-income countries by 2030
From the regression models of countries across different income ranges, GNI per capita appears to have no prominent correlation with LEAB. Although it is a significant variable in the HI countries’ multiple regression model, the R-squared of 0.3647 indicates a weak relationship, as only 36.47% of the variation in LEAB is related to the variation in income. The USA, one of the wealthiest countries, is an example: it ranks 9th in GNI but only 53rd in LEAB globally (Geoba.se 2018 & World Data 2018)