Subject's Code: ECON1193
Subject's Name: Business Statistics
Location & Campus: RMIT Vietnam, Hanoi
Student's Name & Number: Vu Quang Hung - s3743984
Lecturer’s
This report contains 5 parts:
1 Summary descriptive statistics
2 Confidence intervals
3 Hypothesis testing
4 Regression analysis
5 Overall conclusion
Part 1: Summarize descriptive statistics
From my assignment 1 results, I concluded that crude death rate (CDR) and national income are related. The analyses below, taken from that earlier work, show why these two variables are related.
Firstly, Iceland has a lower death rate than Uganda. Iceland is a more developed country (GNI of 49960 in 2015), whereas Uganda has a much lower income (GNI of 670 in 2015) and is still a developing country.
(CDR from 1980 to 2016: Iceland & Uganda)
Taking the larger sample of 35 countries, the results from the contingency table suggest that high- and middle-income countries have a higher probability of having a low crude death rate.
(CDR Contingency Table)
Moreover, as can be seen in Table 1 below, CDR and income are not independent, which means that the death rate might be affected by national earnings and vice versa.
(Table 1)
Taken together, this evidence suggests that there is a relationship between the income and the death rate of a country.
Part 2: Confidence Intervals
Death rate, crude (per 1,000 people)
Firstly, I choose a level of significance of 5% -> confidence level is 95% -> α = 0.05, α/2 = 0.025
The population standard deviation (σ) is unknown, so I substitute the sample standard deviation S for σ and use the t-table to calculate the confidence interval.
Sample size n = 35
d.F = (n-1) = 35-1 = 34
t34,0.025 = 2.0322
(Distribution plot)
(table 2)
Point estimate X = 8.22 (Mean value in table 2)
S = 2.26
-> Confidence interval estimation: µ = X ± t34,0.025 × S/√n = {8.22-0.7766;8.22+0.7766} = {7.444;8.997} -> 7.444 < µ < 8.997
This means that I can be 95% confident that the true mean crude death rate falls between 7.444 and 8.997 (per 1000 people).
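The interval can be reproduced with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the critical value 2.0322 is hard-coded from the t-table rather than computed. Because it uses the rounded inputs X = 8.22 and S = 2.26, its result agrees with the reported interval only to about two decimal places.

```python
import math

def t_confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

# Values reported in this part: X = 8.22, S = 2.26, n = 35, t34,0.025 = 2.0322
lower, upper = t_confidence_interval(8.22, 2.26, 35, 2.0322)
print(f"{lower:.3f} < mu < {upper:.3f}")
```

The same function applies to the GNI and health expenditure intervals, given their sample means and standard deviations.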
Gross National Income (GNI) per capita
Apply the same method
+) Level of significance: 5% (α = 0.05)
+) Confidence interval estimation calculated is µ = {11408.137;27545.577} -> 11408.137 < µ < 27545.577
+) I am 95% confident that the average Gross National Income per capita is between 11408.137 and 27545.577 (US dollars)
Domestic general government health expenditure per capita, PPP
+) Level of significance: 5 % (α= 0.05)
+) Confidence interval calculated is µ = {854.93;2065.128}
-> 854.93 < µ < 2065.128
+) That means I can be 95% confident that the average domestic general government health expenditure per capita is between 854.93 and 2065.128 (current international $)
Assumptions:
No normality assumption is required in this case: the sample size (n = 35) is greater than 30, so the Central Limit Theorem applies and the t-table can be used.
Suppose the number of countries will double:
o Firstly, the confidence interval width will decrease if the number of countries were doubled.
According to the equation above, the width of the confidence interval is inversely proportional to the square root of the number of measurements (n). Therefore, when we double the number of countries from 35 to 70 (double the n), the standard error decreases and the confidence interval range becomes smaller.
Overall, the confidence interval will become narrower around the true mean of the distribution, which increases the precision of the estimate. Moreover, increasing the number of countries involved in the test will make the result more representative of all 195 countries.
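To give a rough sense of the size of this effect, the sketch below compares the margin of error at n = 35 and n = 70. It assumes, purely for illustration, that S and the critical t-value stay the same after doubling; in practice both would change slightly with the new sample. The function name is hypothetical.

```python
import math

def margin_of_error(s, n, t_crit):
    """Half-width of the confidence interval: t_crit * s / sqrt(n)."""
    return t_crit * s / math.sqrt(n)

s = 2.26         # sample standard deviation from Part 2 (assumed unchanged)
t_crit = 2.0322  # assumption: held fixed; the true value for d.F = 69 is slightly smaller

m35 = margin_of_error(s, 35, t_crit)
m70 = margin_of_error(s, 70, t_crit)
ratio = m70 / m35  # equals 1 / sqrt(2) under the assumptions above
print(f"n=35: +/-{m35:.4f}, n=70: +/-{m70:.4f}, ratio = {ratio:.4f}")
```

Under these assumptions the interval shrinks to about 71% of its original width, not to half: doubling n divides the standard error by √2.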
Part 3: Hypothesis Testing
a Prediction:
As calculated in part 2 above, I am 95% confident that the world average crude death rate is between 7.444 and 8.997. Given that the sample mean X is 8.2 in 2015 and the mean of the population in 2013 was 7.7, it seems that the world average crude death rate increased to some extent.
However, for a better judgement, hypothesis testing is required to determine whether the crude death rate per thousand people has increased or not.
b Hypothesis testing:
Level of significance: α = 0.05
Sample size n = 35 > 30, so the Central Limit Theorem is applicable and the sampling distribution of the mean is approximately normal. I will use the t-value to do the test.
Null hypothesis and alternative hypothesis:
H0 : µ ≤ 7.7 (the average crude death rate has not increased)
H1 : µ > 7.7 (the average crude death rate has increased)
This is a one-tail test (upper-tailed test) since the alternative hypothesis H1 is focused on the upper tail above the mean of 7.7
Alpha = 0.05
d.F = 34
Upper-tailed test
-> t34,0.05 = 1.6909
Compute test statistic: (Sample mean X = 8.22, µ = 7.7, sample standard deviation S = 2.26, n = 35)
-> t = (X - µ) / (S/√n) = 1.36
t = 1.36 < 1.6909, so the test statistic falls into the non-rejection region; therefore we do not reject the null hypothesis H0.
As H0 is not rejected, at the 5% level of significance there is insufficient evidence to conclude that the average crude death rate has increased.
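The test statistic and the decision rule can be checked with a few lines of code. This is a minimal sketch using the rounded values reported in this part; the function name is hypothetical and the critical value 1.6909 is taken from the t-table rather than computed.

```python
import math

def one_sample_t(mean, mu0, s, n):
    """t statistic for a one-sample test of the mean: (mean - mu0) / (s / sqrt(n))."""
    return (mean - mu0) / (s / math.sqrt(n))

t_stat = one_sample_t(mean=8.22, mu0=7.7, s=2.26, n=35)
t_crit = 1.6909  # t34,0.05 from the t-table (upper-tailed test)

# Upper-tailed test: reject H0 only if the statistic exceeds the critical value
reject_h0 = t_stat > t_crit
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```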
By not rejecting H0, I might have committed a type II error (failing to reject a false null hypothesis).
It means that there is a probability that the average world crude death rate has increased (H0 false) but I still do not claim an increase (do not reject H0).
Type II error can be reduced by picking a larger sample size (n). By increasing the sample size, I make the hypothesis test more sensitive, which means that it is more likely to reject the null hypothesis when it is, in fact, false. Another solution is to increase the level of significance α, which makes the rejection area larger and hence makes it less likely to miss a false null hypothesis, at the cost of a higher risk of type I error.
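Both remedies can be illustrated with an approximate power calculation (power = 1 - probability of a type II error). The sketch below is not part of the original analysis: it uses a normal approximation instead of the t-distribution, and it assumes a hypothetical scenario in which the true mean is 0.52 above 7.7 and the population standard deviation equals S = 2.26.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n, alpha, delta, sigma):
    """Approximate power of an upper-tailed test of the mean (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # critical value on the standard normal scale
    return 1 - z.cdf(z_alpha - delta * sqrt(n) / sigma)

# Hypothetical scenario (assumption): true increase of 0.52, sigma = 2.26
delta, sigma = 0.52, 2.26
base = approx_power(35, 0.05, delta, sigma)
bigger_n = approx_power(70, 0.05, delta, sigma)      # double the sample size
bigger_alpha = approx_power(35, 0.10, delta, sigma)  # raise alpha to 10%
print(f"base: {base:.2f}, n=70: {bigger_n:.2f}, alpha=0.10: {bigger_alpha:.2f}")
```

In this scenario both changes raise the power, which corresponds to a lower probability of a type II error.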
Part 4: Regression analysis
a Dependent variable and independent variables:
Dependent variable: Death rate, crude (per 1,000 people)
Independent variables:
- GNI per capita, Atlas method (current US$)
- Domestic general government health expenditure per capita, PPP (current international $)
- Immunization, measles (% of children ages 12-23 months)
- Smoking prevalence, total (age 15+)
b Regression analysis:
1 CDR & GNI
Expected result: Negative linear relationship. As I concluded in the previous report, higher income countries tend to have a lower crude death rate.
Scatter plot of CDR and GNI:
(Scatter plot: CDR & GNI; x-axis: GNI per capita (current US$))
(Regression model)
Comment on scatter plot:
- According to the graph, there seems to be no linear relationship between these two variables: many countries in this plot have similar income but very different CDR.
Regression output (from excel):
- Based on the summary output, the Simple Linear Regression equation is
- Regression coefficient (slope): b1 = 0.00001 shows the estimated average increase in the crude death rate (per 1,000 people) when the gross national income per capita grows by 1$
- Coefficient of determination R square: 2%. This means that 2% of the variance of the crude death rate is explained by the country income
- Test the significance of the independent variable:
o Using t-value method: (two-tailed test)
H0 : β1 = 0 (no linear relationship)
H1 : β1 ≠ 0 (linear relationship does exist)
α = 0.05 -> α/2 = 0.025
d.F = n-2 = 35-2 = 33
t critical value (from t-table): t33,0.025 = ±2.0345
Test statistic t = 0.813
t statistic is in the non-rejection area -> Do not reject H0
o Using p-value method
Pb1 = 0.422; α = 0.05
Pb1 > α -> Do not reject H0
There is insufficient evidence to conclude a linear relationship between crude death rate and gross national income
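The two decision rules used above (t-value and p-value) can be written as one small helper. This is a minimal sketch with a hypothetical function name; the inputs are the figures reported from the Excel output, and the critical value comes from the t-table.

```python
def slope_is_significant(t_stat, t_crit, p_value, alpha=0.05):
    """Apply both two-tailed decision rules for H0: beta1 = 0.

    Returns (reject_by_t, reject_by_p); the two should agree.
    """
    reject_by_t = abs(t_stat) > t_crit  # statistic falls in either rejection tail
    reject_by_p = p_value < alpha
    return reject_by_t, reject_by_p

# CDR & GNI: t = 0.813, t33,0.025 = 2.0345, p-value = 0.422
by_t, by_p = slope_is_significant(t_stat=0.813, t_crit=2.0345, p_value=0.422)
print(f"reject H0 by t-value: {by_t}, by p-value: {by_p}")
```

The same helper can be reused for the other three independent variables by substituting their t statistics and p-values.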
2 CDR & Domestic health expenditure
Expected result: Negative linear relationship. Countries that spend more money on domestic health programs should have fewer people dying from health problems and thus a lower CDR.
Scatter plot of CDR & Domestic general government health expenditure:
(Scatter plot: CDR & Domestic health expenditure; x-axis: domestic health expenditure (current international $))
Comment on the scatter plot: This scatter plot shows that CDR and domestic health expenditure do not have a linear relationship. Almost half of the 35 countries spend approximately the same budget on health (less than 500$ per capita), but these countries have very different death rates, which suggests that money spent on health programs has no clear effect on CDR.
Regression output:
- Based on the regression statistics, linear equation is
- Regression slope b1 = 0.00023 -> For every additional dollar the government spends on healthcare per citizen, the death rate is estimated to increase by 0.00023 per 1000 people
- Coefficient of determination R square = 0.033 -> 3.3% of the variation in CDR is explained by domestic healthcare expenses
- Test the significance of the independent variable:
o Using t-value method:
H0 : β1 = 0 (no linear relationship)
H1 : β1 ≠ 0 (linear relationship does exist)
α = 0.05 -> α/2 = 0.025
t critical value (from t-table): t33,0.025 = ±2.0345
Test statistic t = 1.0623
t statistic is in the non-rejection area -> Do not reject H0
o Using p-value method
Pb1 = 0.2958; α= 0.05
Pb1 > α -> Do not reject H0
There is insufficient evidence to conclude a linear relationship between crude death rate and domestic general government health expenditure
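As a consistency check on the regression output, in a simple linear regression with one independent variable the coefficient of determination and the slope's t statistic are linked by R square = t² / (t² + d.F), with d.F = n - 2. The sketch below applies this identity to the reported figures; the function name is hypothetical.

```python
def r_squared_from_t(t_stat, df):
    """R square implied by the slope's t statistic in simple linear regression."""
    return t_stat**2 / (t_stat**2 + df)

# CDR & Domestic health expenditure: t = 1.0623, d.F = 33, reported R square = 0.033
r2 = r_squared_from_t(1.0623, 33)
print(f"implied R square = {r2:.3f}")
```

The implied value agrees with the reported R square of 0.033, and the same check can be run on the other three regressions.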
3 CDR & Immunization, measles
Expected result: Negative linear relationship. Children aged 12-23 months who get immunized are less likely to contract measles in the future, which might reduce the number of people dying from this disease. Therefore, the overall death rate may decrease.
Scatter plot of CDR and Immunization, measles (%)
(Scatter plot: CDR & Immunization, measles; x-axis: Immunization, measles (% of children ages 12-23 months))
Comment on the scatter plot: This scatter plot shows that CDR and measles immunization do have a linear relationship. The more children are immunized, the lower the death rate, hence this relationship is negative. However, since the actual values lie quite far from the trend line, this is a weak relationship.
Regression output:
- Based on the regression statistics, linear equation is
- Regression slope b1 = -0.06871 shows how much the dependent variable (death rate) decreases, on average, if the proportion of children who get measles immunization increases by 1 percentage point.
- Coefficient of determination R square = 0.126 -> 12.6% of the variation in CDR is explained by measles immunization percentage, while the remaining 87.4% is due to other factors
- Test the significance of the independent variable:
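o Using t-value method:
H0 : β1 = 0 (no linear relationship)
H1 : β1 ≠ 0 (linear relationship does exist)
α = 0.05 -> α/2 = 0.025
t critical value (from t-table): t33,0.025 = ±2.0345
Test statistic t = -2.182
t statistic is in the rejection area (-2.182 < -2.0345) -> Reject H0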
o Using p-value method
Pb1 = 0.036; α= 0.05
Pb1 < α -> Reject H0
There is sufficient evidence to conclude a linear relationship between crude death rate and the proportion of children aged 12-23 months who have measles immunization
4 CDR & Smoking prevalence
Expected result: Positive linear relationship. Smoking causes numerous deadly diseases. Therefore, the death rate is expected to rise if smoking prevalence among people aged 15 and over rises.
Scatter plot of CDR & Smoking prevalence:
(Scatter plot: CDR & Smoking prevalence; x-axis: Smoking prevalence (%))
Comment on the scatter plot: This scatter plot shows that CDR and smoking prevalence (SP) have no clear linear relationship. It can be seen that the data points scatter randomly around the trend line.
Regression output:
- Based on the regression statistics, linear equation is
- Regression slope b1 = 0.0467. This means that an increase of 1 percentage point in smoking prevalence is associated with a rise of 0.0467 (per 1000 people) in the crude death rate
- Coefficient of determination R square = 0.040 -> 4% of the variation in CDR is explained by smoking prevalence
- Test the significance of the independent variable:
o Using t-value method:
H0 : β1 = 0 (no linear relationship)
H1 : β1 ≠ 0 (linear relationship does exist)
α = 0.05 -> α/2 = 0.025
t critical value (from t-table): t33,0.025 = ±2.0345
Test statistic t = 1.177
t statistic is in the non-rejection area -> Do not reject H0
o Using p-value method
Pb1 = 0.248; α = 0.05
Pb1 > α -> Do not reject H0
There is insufficient evidence to conclude a linear relationship between crude death rate and smoking prevalence
c Variable recommended for further research on crude death rate:
After considering all 4 independent variables above, I recommend immunization, measles (% of children ages 12-23 months) for further research on crude death rate. The main reason is that, among the 4 given variables, it has the strongest linear association with CDR (R square = 0.126), although the relationship is weak. Also, according to the regression analysis, it is the only variable for which I can conclude a linear relationship with CDR.
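The comparison behind this recommendation can be summarised in a few lines. The sketch below simply collects the R square and p-values reported in part b for the four regressions and picks out the significant variables and the one with the highest R square; the variable names in the list are shortened labels.

```python
# (label, R square, p-value of the slope) as reported in the regression outputs
results = [
    ("GNI per capita", 0.02, 0.422),
    ("Domestic health expenditure", 0.033, 0.2958),
    ("Immunization, measles", 0.126, 0.036),
    ("Smoking prevalence", 0.040, 0.248),
]

alpha = 0.05
# Variables whose slope is significant at the 5% level
significant = [name for name, r2, p in results if p < alpha]
# Variable that explains the largest share of the variation in CDR
best = max(results, key=lambda row: row[1])[0]

print("significant:", significant)
print("highest R square:", best)
```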
Part 5: Overall conclusion.
According to the findings of this report and the results from the previous report, I have made some conclusions on the world average crude death rate. Based on the confidence interval part, I can be 95% confident that the world average CDR is between 7.444 and 8.997 (per 1000 people). At the 5% level of significance, the hypothesis test in part 3 did not find sufficient evidence that the average CDR increased from the mean of 7.7/1000 in 2013, although the sample mean in 2015 was 8.2. In part 4 (Regression analysis), I found that only independent variable X3 (immunization, measles) has a significant linear relationship with the dependent variable CDR, and it accounts for only a small 12.6% of the total variation in CDR (significance level 5%), while income, expenditure on healthcare and smoking prevalence seem to have no linear relationship with CDR. Therefore, apart from these 4 variables, there are likely still many prominent factors that have a stronger impact on CDR and are worth researching, but they have not yet been covered in this report.
To sum up, by looking at the crude death rate, we are able to have a better measurement of how well we ensure human well-being (Soares 2007). Our mission is to expand our studies to find other factors affecting the death rate and, in the end, to reduce the world average CDR.
Reference list:
1. Soares, R.R. 2007, On the Determinants of Mortality Reductions in the Developing World, Population and Development Review, vol. 33, pp. 247-287, viewed 23 December 2018, Wiley Online Library database.