RMIT International University Vietnam
ECON1193B – BUSINESS STATISTICS 1
Assignment 3: Team Report (35%)
Course name Business Statistics 1
Assignment #3: Team Report
Members & sID
Nguyen Thu Huong s3878242
Nguyen Thi Thuy Trang s3892060
Nguyen Dinh Khanh Uyen s3891815
Tran Lam Kim Ngan s3891636
Nguyen Ton Minh Nhat s3878695
Words count 3964 (content only)
Number of Pages 26
TABLE OF CONTENTS
PART 1: DATA COLLECTION
PART 2: DESCRIPTIVE STATISTICS
1 Measures Of Central Tendency
2 Measures Of Variation
3 Measures Of Shape
PART 3:
REGION A: NORTH AMERICA
REGION B: ASIA
PART 4: TEAM REGRESSION CONCLUSION
PART 5: TIME SERIES
REGION A – NORTH AMERICA
REGION B – ASIA
PART 6: TIME SERIES CONCLUSION
1 Line Chart
2 Best Trend Model For Predicting The World's Number Of Deaths Due To Covid-19
PART 7: OVERALL TEAM CONCLUSION
1 The Main Factor That Impacts The Number Of Deaths Due To Covid-19?
2 Predict The Number Of Deaths On October 31
3 The Prediction In The Number Of Covid-19 Deaths At The End Of 2021
4 Two Variables Might Impact The Global Deaths Case.
REFERENCES
APPENDIX
PART CONTRIBUTION
First name | Student ID | Parts contributed | Contribution % | Signature
Part 1: Data collection
The data analyzed in this report were collected from 20 countries in Asia and 20 countries in North America and cover six categories in the Excel file. These figures represent each country's population (in millions) in 2021, the estimated total number of deaths from Covid-19 from April 1 to July 31, 2021 (per million population), the average rainfall (in millimeters) and average temperature (in degrees Celsius) from 1991 to 2020, and the total number of hospital beds and medical doctors (per 10,000 people) in 2017. The figures were collected from highly reliable sources: the World Bank database, Our World in Data, the World Health Organization (WHO), and government sources. Because some countries are missing information, this report analyzes only 20 countries in each region so that every country has complete data for the remaining parts.
Part 2: Descriptive Statistics
1 Measures of Central Tendency
Min >,<,= Lower Max >,<,= Upper Result
Table 1: Test of outliers of total deaths (per million population) due to
COVID-19 in Asia from April 1st to July 31st, 2021
Before the descriptive statistics are calculated, the data of each region are tested for outliers. The test reveals no outliers in Asia's data, while there is one in North America's data.
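The outlier test can be sketched in Python as follows. This is only an illustration: it assumes the common 1.5 × IQR fence rule and Excel-style inclusive quartiles, neither of which the report states explicitly, and the sample values are hypothetical rather than the report's data.

```python
from statistics import quantiles

def outlier_fences(values):
    """Return (lower_fence, upper_fence) under the assumed 1.5 x IQR rule."""
    # "inclusive" mirrors Excel's QUARTILE.INC; this choice is an assumption.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def has_outliers(values):
    """Compare Min with the lower fence and Max with the upper fence."""
    lower, upper = outlier_fences(values)
    return min(values) < lower or max(values) > upper

# Hypothetical death rates per million population, NOT the report's data.
sample = [10.0, 12.0, 14.0, 15.0, 18.0, 20.0, 95.0]
print(outlier_fences(sample), has_outliers(sample))
```

Comparing only the minimum and maximum against the fences is enough to decide whether any outlier exists, which matches the Min/Lower and Max/Upper columns of Table 1.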
Type of Measures | Region A: Asia | Comparison (>,<,=) | Region B: North America
Table 2: Measures of Central Tendency of total deaths (per million population) due to COVID-19 in two
regions from April 1st to July 31st, 2021
Table 2 shows that neither A nor B has a Mode value. The Mean is influenced by outliers in the sample (Frost n.d.), and one outlier was found in North America's data. The Median, in contrast, is not affected by extreme values (Frost n.d.), so it is the more suitable measure for comparing the overall death statistics of the two regions. The median number of COVID-19 deaths is 49.58 per million population in Asia versus 75.84 in North America. COVID-19 has had a significant impact on both regions, which Sandoiu (2020) links to climate change.
Table 3: Measures of Variation of total deaths (per million population) due to
COVID-19 in two regions from April 1st to July 31st, 2021
The Interquartile Range (IQR) is the best choice for comparing the Asian and North American datasets since it removes the impact of outliers. Because the IQR measures the spread between the upper and lower quartiles, that is, the middle 50% of the data, it is well suited to skewed distributions (Bhandari 2020). Table 3 indicates that the IQR is 77.76 deaths per million population in North America and 44.38 in Asia, so the middle half of North America's data is more widely spread. The comparison by IQR therefore points in the same direction as the comparison by Median.
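As a quick arithmetic check, the IQR is simply the third quartile minus the first quartile. The sketch below assumes that the box limits quoted in the box-and-whisker discussion (47.26 to 125.03 for North America, 35.63 to 80.01 for Asia) are the first and third quartiles; the report does not label them as such.

```python
def iqr(q1, q3):
    """Interquartile range: the spread of the middle 50% of the data."""
    return q3 - q1

# Box limits quoted in the report, assumed here to be Q1 and Q3.
north_america = iqr(47.26, 125.03)
asia = iqr(35.63, 80.01)

print(round(north_america, 2), round(asia, 2))
```

Under that assumption the result for Asia matches Table 3 (44.38), and the result for North America differs from the reported 77.76 only in the last decimal, presumably because the quartiles were rounded before being quoted.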
3 Measures of Shape
Box-and-whisker Plot
Figure 1: Box and whisker plots of total deaths (per million population) due to COVID-19 in
Asia from April 1st to July 31st, 2021
Table 4: Comparison of Asia's COVID-19 death rate
Figure 2: Box and whisker plots of total deaths (per million population) due to COVID-19 in
North America from April 1st to July 31st, 2021
Table 5: Comparison of North America's COVID-19 death rate
The box-and-whisker plot is preferred because outliers are visible on the chart. While the death rates differ, Asia and North America appear to have similarly shaped distributions: both are asymmetric and positively (right) skewed. Ranging from 47.26 to 125.03, the box for North America is both wider and higher than the box for Asia (35.63 to 80.01). COVID-19 reportedly impacted North America more than Asia, according to one report.
Figure 3: Final regression model of Region A: North America (NA)
2 Regression Equation
In this equation, the units are:
Estimated total number of deaths due to COVID-19 (per million population)
Population (in millions) in 2021
Average temperature (in degrees Celsius) from 1991 to 2020
3 Regression coefficients:
Temperature coefficient = 0.01: the total number of fatalities due to COVID-19 in NA will rise by 0.01 deaths per million people for every 1 degree Celsius increase in average temperature, assuming the population remains constant.
Intercept = 0.399: under the assumption that the average temperature is 0 degrees Celsius and there is no population, the estimated total number of fatalities attributable to COVID-19 in NA between April 1 and July 31, 2021 is 0.399 per million population. This implies that about 0.399 deaths per million are independent of average temperature.
Population coefficient = 9.03: the total number of fatalities attributable to COVID-19 in NA between April 1 and July 31, 2021 will rise by 9.03 deaths per million population for every increase of 1 million in the country's population, assuming the average temperature remains constant.
4 Coefficients of determination:
According to the coefficient of determination (R Square = 0.39), variations in average temperature and population account for 39% of the variation in total COVID-19 fatalities in NA between April 1 and July 31, 2021, while other factors account for the remaining 61%. This regression model therefore has a moderate goodness of fit and explanatory power.
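To make the meaning of R Square concrete, the sketch below computes it as one minus the ratio of unexplained variation (SSE) to total variation (SST). The observed and predicted values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not taken from the report's regression output.

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` explained by `predicted`."""
    mean_y = sum(observed) / len(observed)
    # Unexplained variation: squared gaps between data and fitted values.
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total variation: squared gaps between data and their mean.
    sst = sum((y - mean_y) ** 2 for y in observed)
    return 1 - sse / sst

# Hypothetical values, NOT the report's data.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]
print(r_squared(observed, predicted))
```

Here SSE is 4 and SST is 20, so the fitted values explain 80% of the variation and the remaining 20% is left to other factors, which is the same reading applied to the 39% and 61% split above.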
Region B: Asia
1 Regression Analysis:
According to backward elimination, the linear regression for Asia has just one significant independent variable, hospital beds, at the 0.05 level of significance.
Figure 4: Final regression model of Region B: Asia
2 Regression Equation
In this equation, the units are:
Estimated total number of deaths due to COVID-19 (per million population)
Hospital beds (per 10,000 people)
3 Regression coefficients:
Y-intercept b0 = 0.465: when the number of hospital beds is zero, the estimated number of fatalities is 0.465 per million population. That implies that about 0.465 deaths per million do not depend on hospital beds.
Slope coefficient of hospital beds b1 = 0.01: for every additional hospital bed per 10,000 people, the overall number of fatalities would rise by 0.01 deaths per million population.
→ Hospital beds significantly affect the overall number of fatalities caused by COVID-19 in Asia's nations. Additionally, the relationship is positive, since a greater number of hospital beds corresponds to a greater number of COVID-19 deaths.
However, a positive relationship between hospital beds and fatalities is unrealistic in practice, so this conclusion does not make sense.
4 Coefficients of determination:
R Square = 0.02 indicates that hospital beds explain approximately 2% of the variance in the total number of COVID-19 fatalities. Other variables, on the other hand, account for the remaining 98% of the variance.
Part 4: Team Regression conclusion
Part 3 shows that the two regions' regression models have different significant independent variables. In North America, two independent variables, average temperature (in Celsius) from 1991 to 2020 and population (in millions) in 2021, can be used to explain changes in the total deaths due to COVID-19 between April 1 and July 31, 2021; in Asia, only one significant independent variable, the number of hospital beds per 10,000 people, can be used.
The significant independent variables explain 39 percent of the total variation in the total deaths due to COVID-19 in NA and, judging by the regression coefficients of the two variables, the average temperature has a larger stimulatory effect on the number of people who died due to COVID-19 in the period than the population of the country. In Asia, by contrast, only 2% of the overall variance in the total number of COVID-19-related fatalities can be explained by the number of hospital beds in our research.
Following a comparison of central tendency by Median and of shape by boxplot, Part 2 finds that the number of deaths due to COVID-19 in NA is considerably higher than in Asia. In addition, because the absolute regression coefficient in NA is larger than that in Asia, in our study the total COVID-19 deaths in NA are more affected by the significant independent variables than those in Asia. In conclusion, North America has been hit harder than Asia by the pandemic.
Part 5: TIME SERIES
REGION A – North America
Linear trend Quadratic trend Exponential trend
Table 6: P-value and R2 of the three trend models.
1 Trend Models
Through calculations, the P-values of the linear, quadratic, and exponential trends are collected in the table above. In the next stage, those numbers are used to examine the trends in the number of people who died of Covid-19 from April 1st, 2021 to July 31st, 2021.
H0: $\beta_1 = 0$ (There is no trend in the total number of deaths in North America)
H1: $\beta_1 \neq 0$ (There is a trend in the total number of deaths in North America)
Linear trend: According to the table above, the p-value is $2.474 \times 10^{-29}$ < α = 0.05, so H0 is rejected. Therefore, with 95% confidence, North American nations are seeing a linear trend in the daily number of deaths caused by COVID-19.
Quadratic trend: Based on Table 6, we reject H0 because the p-value (0.000209075) is lower than α (0.05). As a result, with 95% confidence, there is a quadratic trend in the number of COVID-19 fatalities per day in North America from April 1st to July 31st.
Exponential trend: We reject H0 because the p-value = $2.373 \times 10^{-31}$ < α = 0.05 (Table 6). Thus, from April 1st to July 31st, North American countries had an exponential trend in the daily deaths caused by COVID-19, with 95% confidence.
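The decision rule applied to each trend model can be sketched as a simple comparison of the p-value with α. The p-values below are the ones quoted above for North America; the function and variable names are illustrative only.

```python
ALPHA = 0.05  # significance level used throughout the report

def trend_is_significant(p_value, alpha=ALPHA):
    """Reject H0 (no trend) when the p-value is below alpha."""
    return p_value < alpha

# P-values quoted in the text for North America (Table 6).
p_values = {
    "linear": 2.474e-29,
    "quadratic": 0.000209075,
    "exponential": 2.373e-31,
}

decisions = {name: trend_is_significant(p) for name, p in p_values.items()}
print(decisions)
```

All three models pass the test, so significance alone cannot single out one model; that is why the report goes on to compare goodness-of-fit and error measures.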
Regression Output – QUA
Figure 5: Quadratic trend regression output of North America.
The figure above shows the regression output. By comparing the coefficient of determination (R2) calculated for the three trend models of North American nations (Region A), we can identify the best model to forecast the daily number of deaths due to COVID-19. Out of the three trend models, the Quadratic trend model is the most effective option for estimating future death cases in North American nations since it produces the least error.
Formula & Coefficient Explanation
$\hat{Y} = 2.6935 - 0.0255T + 0.0001T^2$
$\frac{d\hat{Y}}{dT} = -0.0255 + 0.0001 \times 2T$
When T = 0, $\hat{Y} = 2.6935$, which would be the estimated number of daily COVID-19 deaths (per million population) across North American nations. However, because T = 0 lies outside the observed period, this interpretation is illogical.
In contrast, $\frac{d\hat{Y}}{dT} = -0.0255 + 0.0001 \times 2T$ shows that as time passes and T rises by one day, the estimated number of daily COVID-19 deaths across North America changes by $-0.0255 + 0.0001 \times 2T$.
2 Recommended Trend Model
Table 7: SSE & MAD results of the three trend models.
To evaluate which trend model is the best choice for estimating the number of COVID-19 fatalities in North America, we created a table of two error measures, SSE and MAD. As the table above demonstrates, the Exponential trend model produces the smallest SSE and MAD results. Therefore, the Exponential trend model appears to be the most suitable trend model for forecasting the number of deaths caused by COVID-19 in North America.
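The two error measures can be sketched as follows, taking SSE as the sum of squared errors and MAD as the mean absolute deviation between observed and fitted values (the usual definitions in trend-model comparison, assumed here because the report does not spell them out). The numbers are hypothetical.

```python
def sse(observed, predicted):
    """Sum of squared errors between observed and fitted values."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted))

def mad(observed, predicted):
    """Mean absolute deviation between observed and fitted values."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)

# Hypothetical observations and fitted values, NOT the report's data.
observed = [1.0, 2.0, 3.0]
fitted = [1.0, 3.0, 5.0]
print(sse(observed, fitted), mad(observed, fitted))
```

The model with the smallest SSE and MAD is preferred; SSE penalizes large individual errors more heavily because the errors are squared, while MAD weights all errors equally.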
3 Prediction for the number of COVID-19 deaths
The following formulas are used to estimate the number of fatalities on September 28, September 29, and September 30:
September 28: $\hat{Y} = 2.531 \times 0.992^{181} = 0.591$
September 29: $\hat{Y} = 2.531 \times 0.992^{182} = 0.587$
September 30: $\hat{Y} = 2.531 \times 0.992^{183} = 0.582$
The calculations above predict that North America will record 0.591, 0.587, and 0.582 new deaths per million people due to the COVID-19 pandemic on September 28, September 29, and September 30, respectively.
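These forecasts can be reproduced with a few lines of Python. The sketch assumes that April 1, 2021 is coded as T = 1, which is consistent with the exponents 181 to 183 used above; the coefficients are the rounded values printed in the formulas.

```python
from datetime import date

B0 = 2.531   # level coefficient from the report's exponential formula
B1 = 0.992   # daily growth factor; below 1, so the forecast declines
START = date(2021, 4, 1)  # assumption: April 1, 2021 is coded as T = 1

def time_index(day):
    """Convert a calendar date to the time period T."""
    return (day - START).days + 1

def forecast(day):
    """Exponential trend forecast: B0 * B1 ** T."""
    return B0 * B1 ** time_index(day)

for d in (28, 29, 30):
    day = date(2021, 9, d)
    print(day.isoformat(), time_index(day), round(forecast(day), 3))
```

Because the growth factor 0.992 is below 1, each additional day multiplies the forecast by 0.992, which is why the three predictions fall slightly from one day to the next.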
REGION B – Asia
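Measure | Linear trend | Quadratic trend | Exponential trend
P-value | 0.852 | T: $4.191 \times 10^{-14}$; T-Squared: $1.312 \times 10^{-14}$ | 0.146
R2 | 0.029% | 39.416% | 1.754%
Table 9: P-value and R2 of the three trend models.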
1 Trend Models
As for Region A, a table of the P-values of the three trend models is constructed for further analysis. Based on those outcomes, we can determine the significant trend model of the total number of COVID-19 deaths in Asia between April 01 and July 31, 2021.
H0: $\beta_1 = 0$ (There is no trend in the total number of deaths in Asia)
H1: $\beta_1 \neq 0$ (There is a trend in the total number of deaths in Asia)
Linear Trend: Since the p-value is larger than α (0.852 > 0.05) (Table 9), we do not reject H0. Hence, at the 95% confidence level, there is insufficient evidence of a linear trend in the total number of daily deaths due to COVID-19 in Asia from April 01 to July 31, 2021.
Quadratic Trend: As can be seen from the table above, both p-values of T ($4.191 \times 10^{-14}$) and T-squared ($1.312 \times 10^{-14}$) are smaller than α (0.05). Consequently, we reject H0. With 95% confidence, the total number of COVID-19 daily deaths in Asia between April 01 and July 31, 2021 has a quadratic trend.
Exponential Trend: As the p-value is higher than α (0.146 > 0.05), we do not reject H0. At the 95% confidence level, there is insufficient evidence that the total number of daily deaths due to COVID-19 in Asia from April 01 to July 31, 2021 follows an exponential trend.
According to the findings above, the total number of daily deaths due to COVID-19 in Asian countries follows a quadratic trend model. In addition, the smaller the R2, the less well the regression model fits the observations (Statistics By Jim n.d.). Thus, the fact that the highest R2 comes from the Quadratic trend model (Table 9) reinforces our conclusion that Quadratic is the significant trend model of the total number of COVID-19 death cases in Asia.
Regression Output – QUA
Figure 6: Quadratic trend regression output of the total number of daily
COVID-19 deaths in Asia.
Formula & Coefficient Explanation
$\hat{Y} = 0.4017 + 0.0192T - 0.0002T^2$
$\frac{d\hat{Y}}{dT} = 0.0192 - 0.0002 \times 2T$
When T = 0, $\hat{Y} = 0.4017$, which would be the estimated total number of daily COVID-19 deaths in Asia. Nevertheless, T = 0 lies outside the observed period, so this interpretation is unreasonable.
Conversely, $\frac{d\hat{Y}}{dT} = 0.0192 - 0.0002 \times 2T$ indicates that as each day goes by and T increases by one, the total number of daily COVID-19 deaths in Asia changes by $0.0192 - 0.0002 \times 2T$.
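The quadratic trend and its rate of change can be explored with a short Python sketch. It uses the rounded coefficients printed in the equation above, so the results are approximate; the function names are illustrative.

```python
# Rounded coefficients from the report's quadratic trend equation for Asia.
B0, B1, B2 = 0.4017, 0.0192, -0.0002

def trend(t):
    """Fitted value of the quadratic trend at time period t."""
    return B0 + B1 * t + B2 * t ** 2

def rate_of_change(t):
    """First derivative of the trend: B1 + 2 * B2 * t."""
    return B1 + 2 * B2 * t

def turning_point():
    """Time period at which the rate of change equals zero."""
    return -B1 / (2 * B2)

print(rate_of_change(1), rate_of_change(100), turning_point())
```

With these rounded coefficients the rate of change is positive early in the period and negative later, switching sign at about T = 48, so the fitted curve rises first and then declines.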
2 Recommended Trend Model