School of Business & Management
Course code: ECON1193B
Course name: Business Statistics 1
Title of Assignment: Assignment 3A: Team Assignment Report
Name and Student ID:
+ Nguyen Le Thanh Tung - s3818087
+ Vu Thanh Nam - s3817688
+ Hua Thanh Thanh - s3741199
+ Phan Huynh Anh Thy - s3817741
+ Le Thi Mai Thao - s3864168
Class Group: SGS – Group 6 – Team 5
Pages: 12 (excluding tables/figures, references and any appendix)
Table of Contents:
PART 1: DATA COLLECTION
PART 2: DESCRIPTIVE ANALYSIS
PART 3: MULTIPLE REGRESSION
I Low-Income countries
II Lower-Middle Income countries
III Upper-Middle Income countries
IV High-Income countries
PART 4: TEAM REGRESSION CONCLUSION
PART 5: TIME SERIES
I The significant trend models
A Low-Income (LI) – Ethiopia
B Lower-Middle Income (LMI) – Lao PDR
C Upper-Middle Income (UMI) – Malaysia
D High-Income (HI) – Poland
II Recommended Trend Model for Prediction & Explanation
III Predictions for fertility rate, total (births per woman) in 2018, 2019, 2020
PART 6: TIME SERIES CONCLUSION
I Graph Analysis
II Recommended trend model
PART 7: OVERALL TEAM CONCLUSION
REFERENCES
APPENDICES
PART 1: DATA COLLECTION
First, we determined the information that needed to be collected. From the World Bank database, data for five variables were collected by searching specific keywords: Fertility rate, total (births per woman); GNI per capita, Atlas method (current US$); Life expectancy at birth, total (years); Labor force, female (% of total labor force); and Compulsory education, duration (years). All data are for 2017. From the data obtained for 217 countries, any country missing even one variable was eliminated, which left 125 countries. From those countries with complete data, 102 were then chosen as the raw data for further processing. The collected data were then grouped into four categories (Low-Income, Lower-Middle Income, Upper-Middle Income and High-Income countries) based on GNI, as specified in the question.
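The filtering and grouping steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows are hypothetical, and the GNI per capita cut-offs are assumed values chosen to resemble the World Bank income classification, since the report only says the groups follow the assignment question.

```python
# Hypothetical rows: (country, fertility, gni_per_capita, life_expectancy,
# female_labor_share, compulsory_education_years). None marks a missing value.
SAMPLE_ROWS = [
    ("Country A", 4.5, 700, 62.0, 45.0, 8),
    ("Country B", 2.1, None, 75.0, 40.0, 9),   # dropped: GNI is missing
    ("Country C", 1.6, 30000, 81.0, 46.0, 10),
]

# Assumed upper bounds in current US$ (not stated in the report).
THRESHOLDS = [
    (995, "Low-Income"),
    (3895, "Lower-Middle Income"),
    (12055, "Upper-Middle Income"),
]


def complete_cases(rows):
    """Keep only the rows in which every variable is present."""
    return [row for row in rows if all(value is not None for value in row)]


def income_group(gni_per_capita):
    """Return the income category for a GNI per capita value."""
    for upper_bound, label in THRESHOLDS:
        if gni_per_capita <= upper_bound:
            return label
    return "High-Income"


kept = complete_cases(SAMPLE_ROWS)
groups = {row[0]: income_group(row[2]) for row in kept}
print(groups)
```

Dropping incomplete rows before grouping mirrors the order used in the report: elimination first, classification second.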
PART 2: DESCRIPTIVE ANALYSIS
Table 1: The central tendency of average births per woman of the four country categories (Low-, Lower-Middle, Upper-Middle and High-Income) in 2017
From Table 1, high-income countries had the largest median, 2.249 births per woman, followed by upper-middle-income countries with 2.243 births per woman. The gap between the two was small, only 0.0065. In other words, half of the high-income countries had a fertility rate above 2.249 births per woman, compared with 2.243 for upper-middle-income countries. Lower-middle-income countries ranked third with 1.979 births per woman, while low-income countries had the smallest median, 1.74 births per woman. The median fertility rate of lower-middle-income countries was therefore 0.2395 higher than that of low-income countries. All in all, the fertility rate of upper-middle- and high-income countries was high, which could lead to a population bomb if officials do not restrain it.
PART 3: MULTIPLE REGRESSION
To build a multiple regression model for the number of births per woman from the collected dataset, it is crucial to identify the independent and dependent variables. There are four independent variables, $X_1$, $X_2$, $X_3$ and $X_4$, and one dependent variable, $Y$:
$Y$: Fertility rate, total (births per woman)
$X_1$: Compulsory education, duration (years)
$X_2$: Life expectancy at birth (years)
$X_3$: Labor force, female (% of total labor force)
$X_4$: GNI per capita, Atlas method (current US$)
To remove the insignificant independent variables and keep only those related to the dependent variable Y, the backward elimination method is applied in this part to construct the final regression model for each of the four income-level groups. The process of obtaining each final regression model is presented in Appendix 1.
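The control flow of backward elimination can be sketched as follows. This is a minimal illustration, not the procedure used to produce the outputs in Appendix 1: `fit_p_values` is a hypothetical callback standing in for refitting the regression, and the p-values in `demo_fit` are invented purely for the demonstration.

```python
def backward_elimination(variables, fit_p_values, alpha=0.05):
    """Drop the least significant variable until every p-value is <= alpha.

    fit_p_values(variables) must refit the model on the given variables
    and return a dict mapping each variable name to its p-value.
    """
    remaining = list(variables)
    while remaining:
        p_values = fit_p_values(remaining)
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # every remaining variable is significant
        remaining.remove(worst)
    return remaining


def demo_fit(variables):
    """Hypothetical stand-in for a regression fit.

    A real refit would produce new p-values each round; fixed values
    are used here only to keep the sketch self-contained.
    """
    invented = {"X1": 0.40, "X2": 0.62, "X3": 0.001, "X4": 0.20}
    return {name: invented[name] for name in variables}


final_variables = backward_elimination(["X1", "X2", "X3", "X4"], demo_fit)
print(final_variables)
```

Only one variable is removed per round, because removing a variable changes the p-values of those that remain.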
I Low-income countries:
Figure 1: The final regression output of Low-income countries
The scatter plot between Fertility rate, total (births per woman) and Labor force, female (% of total labor force)
The coefficient of determination ($R^2$) is 0.8185 = 81.85%, meaning that 81.85% of the change in the total fertility rate (births per woman) (dependent variable $\hat{Y}$) can be explained by the variation in the female labor force (% of total labor force) (independent variable $X_3$).
II Lower-Middle Income countries:
Figure 3: The final regression output of Lower-Middle income countries
The scatter plot between the fertility rate, total (births per woman) and the life expectancy at birth, total (years)
The coefficient of determination ($R^2$) is 0.4519 = 45.19%, meaning that 45.19% of the change in the total fertility rate (births per woman) (dependent variable $\hat{Y}$) can be explained by the variation in the total life expectancy at birth (years) (independent variable $X_2$).
III Upper-Middle Income countries:
Figure 5: The final regression output of Upper-Middle income countries.
Based on the final output for upper-middle income countries in Figure 5, the regression equation is:
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 = 12.91 + 0.05X_1 - 0.13X_2 - 0.04X_3$
+ $\hat{Y}$: Fertility rate, total (births per woman)
+ $X_1$: Compulsory education, duration (years)
+ $X_2$: Life expectancy at birth (years)
+ $X_3$: Labor force, female (% of total labor force)
The duration of compulsory education (years) has a regression coefficient of 0.05, meaning that the predicted total fertility rate will increase by 0.05 births per woman when the duration of compulsory education increases by 1 year, given that the life expectancy at birth (years) and the female labor force (% of total labor force) remain constant.
The life expectancy at birth (years) has a regression coefficient of -0.13, meaning that the predicted total fertility rate will decrease by 0.13 births per woman when the life expectancy at birth increases by 1 year, given that the duration of compulsory education (years) and the female labor force (% of total labor force) remain constant.
The female labor force (% of total labor force) has a regression coefficient of -0.04, meaning that the predicted total fertility rate will decrease by 0.04 births per woman when the female labor force increases by 1 percentage point of the total labor force, given that the duration of compulsory education (years) and the life expectancy at birth (years) remain constant.
The coefficient of determination ($R^2$) is 0.7820 = 78.20%, meaning that 78.20% of the change in the total fertility rate (births per woman) (dependent variable $\hat{Y}$) can be explained by the variation in the total life expectancy at birth (years) (independent variable $X_2$), the duration of compulsory education (years) (independent variable $X_1$) and the female labor force (% of total labor force) (independent variable $X_3$).
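To make the coefficient interpretation concrete, the upper-middle income equation can be evaluated directly. The sketch below uses the coefficients reported above; the input values (9 years of compulsory education, a life expectancy of 75 years, a female labor force share of 40%) are hypothetical and chosen only for illustration.

```python
def predict_fertility_umi(education_years, life_expectancy, female_labor_share):
    """Predicted births per woman from the upper-middle income equation."""
    return (12.91
            + 0.05 * education_years
            - 0.13 * life_expectancy
            - 0.04 * female_labor_share)


# Hypothetical country profile, not taken from the dataset.
base = predict_fertility_umi(9, 75, 40)

# Raise the female labor force share by one point, holding the rest constant.
one_more_point = predict_fertility_umi(9, 75, 41)

print(round(base, 2))                   # predicted births per woman
print(round(one_more_point - base, 2))  # change equals the coefficient of X3
```

Holding two inputs fixed while changing the third reproduces the "given that the other variables remain constant" condition used in the interpretation.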
IV High-Income countries:
Figure 6: The final regression output of High-income countries
Based on the final output for high-income countries in Figure 6, the regression equation is:
$\hat{Y} = b_0 + b_1 X_1 + b_3 X_3 = 1.75 + 0.1X_1 - 0.02X_3$
+ $\hat{Y}$: Fertility rate, total (births per woman)
+ $X_1$: Compulsory education, duration (years)
+ $X_3$: Labor force, female (% of total labor force)
The duration of compulsory education (years) has a regression coefficient of 0.1, meaning that the predicted total fertility rate will increase by 0.1 births per woman when the duration of compulsory education increases by 1 year, given that the female labor force (% of total labor force) remains constant.
The female labor force (% of total labor force) has a regression coefficient of -0.02, meaning that the predicted total fertility rate will decline by 0.02 births per woman when female participation in the labor force increases by 1 percentage point of the total labor force, given that the duration of compulsory education (years) remains constant.
The coefficient of determination ($R^2$) is 0.3906 = 39.06%, meaning that 39.06% of the change in the total fertility rate (births per woman) (dependent variable $\hat{Y}$) can be explained by the variation in the duration of compulsory education (years) (independent variable $X_1$) and the female labor force (% of total labor force) (independent variable $X_3$).
PART 4: TEAM REGRESSION CONCLUSION
The multiple regressions in Part 3 show that the final models do not all retain the same significant independent variables. In most country categories the fertility rate is significantly affected, at the 0.05 level of significance, by the female labor force, the duration of compulsory education, or both, whereas lower-middle-income countries are influenced only by a different variable, life expectancy at birth. Both of these variables significantly affect the fertility rate of high-income countries at the 0.05 level of significance. Upper-middle-income countries have one more significant variable in addition to these two, life expectancy at birth, which also affects the fertility rate at the 0.05 level of significance. Low-income countries, in contrast, are affected only by the female labor force. In addition, the regression model for low-income countries has the highest coefficient of determination of all the country categories, 0.818. This indicates that the fertility rate of low-income countries is explained very well by the variation in the female labor force.
On the other hand, the female labor force and the duration of compulsory education together would build the best regression model for assessing births per woman. The female labor force is a significant independent variable in every country category except lower-middle-income countries, and in low-income countries it is the only variable left after backward elimination. Lower-middle-income countries, in particular, are not affected by either of these two variables. For most country categories the female labor force is the strongest predictor, and the fertility rate is predicted to decrease when the female labor force increases. Similarly, life expectancy at birth is the most important variable in the regression model for lower-middle-income countries: the fertility rate is predicted to rise if life expectancy at birth falls, given that other factors are unchanged.
In Part 2, low-income countries had the lowest average fertility rate, 2.171 births per woman, while high-income countries had the highest, 2.667 births per woman. Theoretically, the fertility rate is expected to decline in developed countries, which leads to a reduction in the birth rate of those countries (Nargund 2009).
PART 5: TIME SERIES
Notes: Throughout section I, $\hat{Y}$ is the estimated value of the fertility rate, total (births per woman) in the country (1990-2015), and T is the time period.
A Low Income (LI) – Ethiopia:
1 Linear Trend Model:
$H_0$: $\beta_1 = 0$ (No linear trend in the fertility rate, total in the Low-Income country (1990-2015))
$H_1$: $\beta_1 \neq 0$ (Linear trend in the fertility rate, total in the Low-Income country (1990-2015) observed)
As seen in the regression output above, the p-value of variable T equals $2.766 \times 10^{-22}$, which is much smaller than the significance level $\alpha$ (0.05). Therefore, we reject the null hypothesis $H_0$ in favor of $H_1$. This means that, with a 95% level of confidence, there is sufficient evidence that the linear trend is a significant trend model for the fertility rate, total (births per woman) of the Low-Income country, Ethiopia (1990-2015).
c) Formula & Coefficient explanation:
$\hat{Y} = 7.642 - 0.116 \times T$
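The decision rule and the fitted linear trend for Ethiopia can be checked with a short sketch. The p-value and coefficients are the ones reported above; how T is coded (which calendar year corresponds to T = 0) is not stated in this part of the report, so the function simply takes T as given.

```python
ALPHA = 0.05  # significance level used throughout the report


def is_significant(p_value, alpha=ALPHA):
    """Reject H0 (no trend) when the p-value is below alpha."""
    return p_value < alpha


def linear_trend_ethiopia(t):
    """Estimated births per woman from the linear trend model."""
    return 7.642 - 0.116 * t


print(is_significant(2.766e-22))  # p-value of variable T reported above
print(linear_trend_ethiopia(0))   # intercept
print(linear_trend_ethiopia(10))  # estimate after ten time periods
```

Each additional time period lowers the estimate by 0.116 births per woman, which is the slope of the model.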
2 Exponential Trend Model:
$H_0$: $\beta_1 = 0$ (No exponential trend in the fertility rate, total in the Low-Income country (1990-2015))
$H_1$: $\beta_1 \neq 0$ (Exponential trend in the fertility rate, total in the Low-Income country (1990-2015) observed)
As seen in the regression output above, the p-value of variable T equals $1.093 \times 10^{-19}$, which is much smaller than the significance level $\alpha$ (0.05). Therefore, we reject $H_0$ in favor of $H_1$. This means that, with a 95% level of confidence, there is sufficient evidence that the exponential trend is also a significant trend model for the fertility rate, total (births per woman) of the Low-Income country, Ethiopia, from 1990 to 2015.
c) Formula & Coefficient explanation:
- Linear format: $\log(\hat{Y}) = 0.894 - 0.009T$
- Non-linear format: $\hat{Y} = 7.828 \times 0.981^{T}$ (Note: $7.828 \approx 10^{0.894}$ and $0.981 \approx 10^{-0.009}$)
$\beta_1 = 0.981$. Thus, the estimated annual compound growth rate of the fertility rate, total of the Low-Income country, Ethiopia (1990-2015) = $(0.981 - 1) \times 100\% = -1.90\%$.
This illustrates that for every 1 year, on average, the fertility rate, total of the Low-Income country, Ethiopia (1990-2015) is estimated to decrease by 1.90%.
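The conversion from the linear (log) format to the non-linear format, and the growth rate that follows from it, can be sketched as below. It assumes a base-10 logarithm, as the note on the powers of 10 implies. Because the log coefficients are reported to three decimals, the recomputed values only approximate the 7.828 and 0.981 shown above.

```python
def to_nonlinear(log_b0, log_b1):
    """Convert base-10 log coefficients to the non-linear form Y = b0 * b1**T."""
    return 10 ** log_b0, 10 ** log_b1


def annual_growth_rate_percent(b1):
    """Annual compound growth rate implied by the non-linear slope b1."""
    return (b1 - 1) * 100


# Rounded coefficients from the linear format for Ethiopia.
b0, b1 = to_nonlinear(0.894, -0.009)
print(round(b0, 3), round(b1, 3))

# Growth rate computed from the slope as reported in the non-linear format.
rate_from_report = annual_growth_rate_percent(0.981)
print(round(rate_from_report, 2))
```

A slope below 1 in the non-linear format always gives a negative growth rate, which is why the fertility rate is estimated to fall each year.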
B Lower-Middle Income (LMI) – Lao PDR:
1 Linear Trend Model:
$H_0$: $\beta_1 = 0$ (No linear trend in the fertility rate, total in the Lower Middle-Income country (1990-2015))
$H_1$: $\beta_1 \neq 0$ (Linear trend in the fertility rate, total in the Lower Middle-Income country (1990-2015) observed)
As seen in the regression output above, the p-value of variable T equals $3.954 \times 10^{-19}$, which is much smaller than the significance level $\alpha$ (0.05). Therefore, we reject $H_0$ in favor of $H_1$. This means that, with a 95% level of confidence, there is sufficient evidence that the linear trend is a significant trend model for the fertility rate, total (births per woman) of the Lower Middle-Income country, Lao PDR, from 1990 to 2015.
c) Formula & Coefficient explanation:
$\hat{Y} = 6.119 - 0.143 \times T$
$\beta_0 = 6.119$ shows that the fertility rate, total of the Lower Middle-Income country, Lao PDR (1990-2015) is expected to be around 6.119 births per woman when the time period T is 0. However, this does not make sense because it is outside our observation scope. Therefore, this is the portion of the fertility rate, total that is not explained by time period T.
2 Exponential Trend Model:
As seen in the regression output above, the p-value of variable T equals $9.64 \times 10^{-26}$, which is much smaller than the significance level $\alpha$ (0.05). Therefore, we reject $H_0$ in favor of $H_1$. This means that, with a 95% level of confidence, the exponential trend model is significant enough to represent the fertility rate, total (births per woman) of the Lower Middle-Income country, Lao PDR, from 1990 to 2015.
c) Formula & Coefficient explanation:
- Linear format: $\log(\hat{Y}) = 0.807 - 0.015T$
- Non-linear format: $\hat{Y} = 6.415 \times 0.967^{T}$ (Note: $6.415 \approx 10^{0.807}$ and $0.967 \approx 10^{-0.015}$)
$\beta_1 = 0.967$. Thus, the estimated annual compound growth rate of the fertility rate, total of the Lower Middle-Income country, Lao PDR (1990-2015) = $(0.967 - 1) \times 100\% = -3.30\%$.
This illustrates that for every 1 year, on average, the fertility rate, total of the Lower Middle-Income country, Lao PDR (1990-2015) is predicted to drop by 3.30%.
C Upper-Middle Income (UMI) – Malaysia:
1 Linear Trend Model:
Based on the calculations, the p-value of the linear trend model is close to 0 (2.98E-16, that is $2.98 \times 10^{-16}$) and smaller than the significance level $\alpha$, so we can reject $H_0$. As a result, with a 95% level of confidence, the linear trend is a significant trend model for the fertility rate, total (births per woman) of the Upper-Middle Income country, Malaysia.
$\beta_0 = 3.068$ shows that the fertility rate, total of Malaysia is expected to be around 3.068 births per woman when the time period T is 0. However, this does not make sense because it is outside our observation scope. Therefore, this is the portion of the fertility rate, total that is not explained by time period T.
We have $\beta_1 = -0.069$, so the fertility rate falls with each additional unit of time period T. The slope indicates that for every one year, on average, the total fertility rate is predicted to decrease by 0.069 births per woman in Malaysia, which matches the downward slope of its linear trend model.
2 Quadratic trend model:
From the above figure, both p-values are close to 0 (1.38E-11, that is $1.38 \times 10^{-11}$) and smaller than the significance level $\alpha$. Thus, we can