RMIT University Vietnam [ECON1193] Business Statistics
CASE STUDY
FACTORS AFFECTING NUMBER
OF DEATHS DUE TO COVID-19
Lecturer: Pham Thi Minh Thuy Students and Work Allocation:
Student Name Student ID Parts Contributed Contribution % Signature
TABLE OF CONTENTS
I Data Collection
The data are collected and shown in the Excel file.
II Descriptive Statistics
1 Measures of Central Tendency
Table 1: Measures of Central Tendency for the numbers of COVID-19 deaths
by April 23rd, 2020, distributed by region
As can be seen from Table 1, the mean number of COVID-19 deaths in the EU, 3347.519 deaths, is higher than the figure for the Middle East (ME), 421.634. Hence, on average there are more deaths in EU countries than in ME countries. Regarding the median, which is the middle value of the data set when the data are put in ascending order, half of the EU countries have more than 225 deaths, whereas half of the ME countries have more than only 13. In terms of the mode, there is no mode in the EU, while in the ME the mode is 7 deaths, meaning that the most frequent number of deaths among ME countries is 7.
Outliers are present in both categories (see Figure 1). Therefore the mean, which is affected by outliers, is not a reliable basis for comparing the two categories. Moreover, because most values within each group differ from one another, the mode is not suitable either. Thus, the median is the best measure. The EU's median is about 17 times that of the ME (225 versus 13), indicating that COVID-19 has had a more severe impact on the population of EU countries than on that of ME countries.
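The sensitivity of the mean to outliers can be illustrated with a minimal Python sketch. The death counts below are hypothetical and are not the report's data; only the standard `statistics` module is used.

```python
from statistics import mean, median, multimode

# Hypothetical death counts for six countries (NOT the report's data).
deaths = [7, 7, 13, 40, 95, 5400]

print("mean:", mean(deaths))
print("median:", median(deaths))
print("mode(s):", multimode(deaths))

# Drop the single extreme value and compare how far each measure moves.
trimmed = deaths[:-1]
mean_shift = mean(deaths) - mean(trimmed)
median_shift = median(deaths) - median(trimmed)
print("shift in mean:", mean_shift)
print("shift in median:", median_shift)
```

In this toy sample one extreme country moves the mean by several hundred deaths while the median moves by only a few, which is the reason for preferring the median when outliers are present.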
2 Measures of Variation
Table 2: Measures of Variation for the numbers of COVID-19 deaths
by April 23rd, 2020, distributed by region
Measures of variation indicate how data are spread or, in this case, how the numbers of COVID-19 deaths of the countries in each region vary from one another. While the range, the gap between the minimum and maximum values of the data set, is around 25 thousand deaths for the EU, that of the Middle East is only slightly more than one fifth of the EU's. This means that the numbers of deaths in EU countries are spread much more widely than the already wide range of the Middle East. Apart from that, the sample variance, standard deviation and coefficient of variation all increase as data spread more widely around the mean, and the standard deviation is the most commonly used of the three. Again, both the EU's and the Middle East's standard deviations are quite large, and the EU's is substantially higher than the Middle East's, which supports the same conclusion as the ranges.
However, as the data on the number of COVID-19 deaths contain outliers (see Figure 1), the above measures might not describe the variation well, because they are affected by outliers. The interquartile range, a measure that is not affected by outliers, should therefore be the primary one to consider. The interquartile range is the range of the middle 50% of values in an ordered data set. Although the interquartile range of the EU's data is still high at nearly 2000, that of the Middle East is small at 76 deaths. In other words, the EU countries' COVID-19 mortality numbers are likely to differ considerably from one another, in contrast with the moderately comparable values of the Middle East's countries. This might result from the lack of solidarity and the political games among EU countries that prevent them from making joint efforts to deal with the coronavirus pandemic (Szucs 2020), while the Middle East countries maintain good communication and risk management between states (Pietromarchi 2020).
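The measures of variation discussed above can be sketched in Python as follows. The data are hypothetical, the helper name `spread` is introduced only for illustration, and `statistics.quantiles` uses its default "exclusive" method, which may not match the quartile convention used in the Excel file.

```python
from statistics import mean, quantiles, stdev, variance

def spread(values):
    """Return the measures of variation discussed in this section."""
    q1, _, q3 = quantiles(values, n=4)  # default "exclusive" quartiles
    sd = stdev(values)                  # sample standard deviation
    return {
        "range": max(values) - min(values),
        "variance": variance(values),   # sample variance
        "std_dev": sd,
        "cv_percent": sd / mean(values) * 100,
        "iqr": q3 - q1,
    }

# Hypothetical death counts (NOT the report's data).
without_outlier = [2, 5, 7, 7, 9, 13, 20, 40, 95]
with_outlier = without_outlier + [5400]

for label, data in (("without outlier", without_outlier),
                    ("with outlier", with_outlier)):
    print(label, {k: round(v, 1) for k, v in spread(data).items()})
```

Adding a single extreme value inflates the range, variance and standard deviation by orders of magnitude, while the interquartile range changes far less.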
3 Measures of Shape
Figure 1: Box and Whisker Plots for the numbers of COVID-19 deaths
by April 23rd, 2020, distributed by region
Figure 1 illustrates the distributions of the total number of deaths due to COVID-19 for countries in the European Union and the Middle East. It shows a highly right-skewed distribution of the total number of deaths in both regions, due to the extremely high totals of some countries. This indicates that in both groups most countries have lower-than-average numbers of deaths due to COVID-19, which is positive news for both regions. Considering the position of the mean in relation to the boxes and whiskers, the EU's mean death toll lies in the fourth quartile, showing that more than 75% of countries in the EU have total deaths below the region's average. Meanwhile, the Middle East has a less severe and deadly situation, with only one country having total deaths higher than the region's average, as the mean falls outside the Middle East's boxplot. Moreover, the box-and-whisker plot of the Middle East lies below the second quartile of the EU, which indicates that 13 out of 14 Middle East countries (all except Iran) have lower COVID-19 death tolls than over 50% of the EU countries. This also shows that the COVID-19 situation is much more serious in the EU than in the Middle East.
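The two readings taken from the boxplots (the share of countries below the mean, and which countries count as outliers) can be sketched in Python. The data are hypothetical, the function names are illustrative, and the 1.5 x IQR fences are the common boxplot convention, assumed here rather than taken from the report.

```python
from statistics import mean, quantiles

def share_below_mean(values):
    """Fraction of values strictly below the arithmetic mean."""
    avg = mean(values)
    return sum(v < avg for v in values) / len(values)

def iqr_outliers(values):
    """Values outside the 1.5 x IQR fences (assumed boxplot convention)."""
    q1, _, q3 = quantiles(values, n=4)
    fence = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - fence or v > q3 + fence]

# Hypothetical right-skewed death counts (NOT the report's data).
deaths = [2, 5, 7, 7, 9, 13, 20, 40, 95, 5400]

print("share below mean:", share_below_mean(deaths))
print("outliers:", iqr_outliers(deaths))
```

In a right-skewed sample like this one, nearly all values sit below the mean, which mirrors the observation that the mean falls in the upper part of, or outside, the box.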
III Multiple Regression
All dataset cases are applied:
● Dependent variable (DV): Total number of deaths due to COVID-19 between January 22 and April 23, 2020
● Independent variables (IV):
- Average rainfall (in mm)
- Average temperature (in Celsius)
- Hospital beds (per 10,000 people)
- Population of the country (in 1000s)
- Medical doctors (per 10,000)
Since the significance level is α = 0.05:
- When p-value > 0.05, the variable is insignificant
- When p-value < 0.05, the variable is significant
According to Upton and Cook (2014), backward elimination, the counterpart of forward selection, is a search procedure in which the initial model contains all variables and ineffective variables are removed one by one. At each step, the variable with the highest p-value is removed from the model, and the procedure continues until all remaining variables have p-values below a given threshold, which is 0.05 in this case.
In this case, we start the backward elimination process with the full model, which contains all predictors: average rainfall, average temperature, hospital beds, country population, and medical doctors.
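The elimination loop can be sketched in Python. This is a minimal illustration, not the procedure actually run in Excel: instead of refitting a regression at each step, it looks up the p-values reported in the EU regression outputs of this report, and the names `REPORTED_P_VALUES`, `lookup` and `backward_eliminate` are hypothetical.

```python
ALPHA = 0.05

# p-values copied from the four EU regression outputs in this report,
# keyed by the set of predictors present in each model.
REPORTED_P_VALUES = {
    frozenset({"rainfall", "temperature", "beds", "population", "doctors"}): {
        "population": 0.0000029, "beds": 0.01206,
        "rainfall": 0.643, "doctors": 0.627, "temperature": 0.474,
    },
    frozenset({"temperature", "beds", "population", "doctors"}): {
        "population": 0.0000014, "beds": 0.01005,
        "doctors": 0.591, "temperature": 0.422,
    },
    frozenset({"temperature", "beds", "population"}): {
        "population": 0.000001, "beds": 0.01001, "temperature": 0.508,
    },
    frozenset({"beds", "population"}): {
        "population": 0.000001, "beds": 0.0102,
    },
}

def lookup(model):
    """Stand-in for refitting the regression (assumption: no refit here)."""
    return REPORTED_P_VALUES[model]

def backward_eliminate(predictors, p_values_for, alpha=ALPHA):
    """Drop the predictor with the largest p-value until all are below alpha."""
    current = set(predictors)
    removed = []
    while current:
        p_values = p_values_for(frozenset(current))
        worst = max(p_values, key=p_values.get)
        if p_values[worst] < alpha:
            break  # every remaining predictor is significant
        current.remove(worst)
        removed.append(worst)
    return current, removed

kept, removed = backward_eliminate(
    ["rainfall", "temperature", "beds", "population", "doctors"], lookup
)
print("removed in order:", removed)
print("kept:", sorted(kept))
```

With a real data set, `lookup` would be replaced by a function that refits the model on the current predictors and returns their p-values.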
A EU countries:
1 Regression output 1 - All variables
● Significant variables: Population (p-value: 0.0000029) and Hospital beds (p-value: 0.01206)
● Insignificant variables: Average rainfall (highest p-value: 0.643), Medical doctors (p-value: 0.627), Average temperature (p-value: 0.474)
2 Regression output 2 - Exclude Average rainfall
● Significant variables: Population (p-value: 0.0000014) and Hospital beds (p-value: 0.01005)
● Insignificant variables: Medical doctors (highest p-value: 0.591), Average temperature (p-value: 0.422)
3 Regression output 3 - Exclude Medical doctors
● Significant variables: Population (p-value: 0.000001) and Hospital beds (p-value:
0.01001)
● Insignificant variables: Average temperature (p-value: 0.508).
4 Regression FINAL Output - Exclude Average temperature
● Significant variables: Population (p-value: 0.000001) and Hospital beds (p-value: 0.0102)
● Insignificant variables: None
- b(hospital beds) = -151.453 is the coefficient of the number of hospital beds (per 10,000 people). For 1 extra hospital bed per 10,000 people, the number of COVID-19 deaths is predicted to decrease by approximately 151, assuming that the other variables are held constant.
- R^2 = 0.665 is the coefficient of determination: 66.5% of the variation in total COVID-19 deaths in the EU is explained by the population of EU countries and the number of hospital beds.
B Middle East countries:
1 Regression Output 1 - All variables
● Significant variables: Population (p-value: 0.0093)
● Insignificant variables: Average temperature (p-value: 0.951), Average rainfall (highest
p-value: 0.959), Hospital beds (p-value: 0.859), and Medical doctors (p-value: 0.635)
2 Regression Output 2 - Exclude Rainfall
● Significant variables: Population (p-value: 0.0037)
● Insignificant variables: Average temperature (highest p-value: 0.971), Hospital beds (p-value: 0.855), and Medical doctors (p-value: 0.568)
3 Regression Output 3 - Exclude Temperature
● Significant variables: Population (p-value: 0.001)
● Insignificant variables: Hospital beds (highest p-value: 0.804), and Medical doctors
(p-value: 0.540)
4 Regression Output 4 - Exclude Hospital Beds
● Significant variables: Population (p-value: 0.00032)
● Insignificant variables: Medical doctors (highest p-value: 0.316).
5 Regression FINAL Output - Exclude Medical Doctors
● Significant variables: Population (p-value: 0.00027)
● Insignificant variables: None.
6 Regression Equation
Total death = - 613.6408143 + 0.00005597 × (Population)
7 Interpretation
- b(population) = 0.00005597 indicates that the total number of deaths due to COVID-19 increases by 1 when the population of a Middle East country increases by approximately 17,867 (= 1/0.00005597) people.
- R^2 = 71.5% (= 0.715). This means that 71.5% of the variation in the total number of deaths due to COVID-19 is explained by the population of the countries in the Middle East. The relationship between the total number of deaths and the population of Middle East countries is therefore relatively strong.
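The slope interpretation can be checked with a short Python sketch that evaluates the Middle East regression equation from section 6. The function name and the example population of 20,000,000 are hypothetical, and the population is assumed to be in the same unit as in the regression data.

```python
INTERCEPT = -613.6408143
SLOPE = 0.00005597  # extra predicted deaths per additional unit of population

def predicted_total_deaths(population):
    """Evaluate Total death = intercept + slope x Population."""
    return INTERCEPT + SLOPE * population

# Population increase associated with one additional predicted death.
people_per_extra_death = 1 / SLOPE
print(round(people_per_extra_death))

# Hypothetical population, used only to show how the equation is applied.
example_population = 20_000_000
print(round(predicted_total_deaths(example_population), 2))
```

Because the intercept is negative, the equation yields negative predictions for sufficiently small populations, so it is best read as a description of the sampled countries rather than a rule for any country.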
IV Team Regression Conclusion
From the previous section, at the 5% significance level, total COVID-19 deaths in the EU are related to two independent variables: population and hospital beds (per 10,000 people), while those in the Middle East are related only to total population. Moreover, the regression model of the Middle East gives a better fit, with a higher coefficient of determination (R^2 = 0.744) than that of the EU (R^2 = 0.671), which indicates that more of the variation in total deaths is explained by the independent variables in the Middle East (74%) than in the EU (67%).
Based on the regression analysis, the EU appears to be more affected by this pandemic. Total deaths in both regions are partly explained by total population; however, the EU's population slope (0.0002) is higher than that of the Middle East (0.00005), indicating that EU countries experience a larger change in total deaths when total population changes. This is understandable: the larger a country's population, the more widely the virus can spread, resulting in more infected people and, in turn, more deaths. In addition, the EU's total COVID-19 deaths are related to another variable, hospital beds (per 10,000 people). The Middle East countries have fewer COVID-19 infections and deaths, so they might manage to care for their COVID-19 patients with the hospital beds already available. Meanwhile, this might not be the case for EU members, as they have been burdened by a lack of hospital beds due to the large number of infections and deaths (Furlong & Hirsch 2020). As a result, the number of hospital beds might matter to the EU while it might not matter much to the Middle East. This second variable also makes the EU more sensitive to COVID-19's impacts, because its death toll is likely to be affected by more factors than that of the Middle East.
Therefore, in order to reduce the number of COVID-19 deaths, especially in these regions, it is advisable that they focus on limiting the virus's spread in the community, as well as increasing the number of hospital beds.
V Time Series
1 Significant Models
According to the significance of the three trend models for each region as shown below, we conclude that all three trend models (linear, quadratic and exponential) are significant for both regions, since all models have p-values < 0.05.
a EU Countries
● Linear Model
Formula: y = -445.068 + 46.804 × (number of days since 1st death)
● Quadratic Model
Formula: y = -975.628 + 81.617 × (number of days since 1st death) - 0.530 × (number of days since 1st death)^2
● Exponential Model
Formula: y = 0.002 × 1.291^(number of days since 1st death)
b Middle East Countries
● Linear Model
Formula: y = 24.745 + 1.897 × (number of days since 1st death)
● Quadratic Model
Formula: y = -54.540 + 8.414 × (number of days since 1st death) - 0.091 × (number of days since 1st death)^2
● Exponential Model
Formula: y = 2.82 × 1.07^(number of days since 1st death)
2 Recommendation on the Best Model
To determine the best forecasting model, we measured the forecast errors of each model using SSE and MAD. The sum of squared errors (SSE) measures how far the observations are scattered from the fitted trend line or, in other words, the errors we make when predicting the observations with that line. When the observations are scattered far from the line, SSE is high, and vice versa. However, because the errors are squared, SSE is strongly affected by outliers (observations that lie unusually far from the rest). The mean absolute deviation (MAD) measures the average absolute distance between the observations and the values predicted by the model. Unlike SSE, it does not square the errors, so it is less sensitive to extreme observations.
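The two error measures can be sketched in Python. The sketch takes MAD to be the mean absolute forecast error, the usual definition in trend forecasting; that this is how the values in Table 3 were computed is an assumption. The actual and predicted series are hypothetical.

```python
def sse(actual, predicted):
    """Sum of squared errors between observations and model predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def mad(actual, predicted):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical daily deaths and model predictions (NOT the report's data).
# The last observation is an outlier that the model misses by 40.
actual = [10, 12, 15, 60]
predicted = [11, 13, 14, 20]

print("SSE:", sse(actual, predicted))
print("MAD:", mad(actual, predicted))
```

The single large miss contributes 1,600 of the 1,603 squared-error units but only 40 of the 43 absolute-error units, which shows why SSE reacts more strongly to outliers than MAD.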
Table 3: SSE and MAD Values of each Time Model for each Region
The trend model with the smallest errors is taken to be the most suitable trend model. For the EU, although the MAD of the quadratic model is slightly higher than that of the linear model, the quadratic model's SSE is significantly smaller than the linear one's, and the differences between adjacent daily EU death counts are not too large, so outliers contribute only moderately to the errors. For the Middle East, both the SSE and the MAD of the quadratic model are the smallest among the three models. Therefore, the quadratic model is chosen as the best trend model for both the EU and the Middle East to predict the number of deaths due to COVID-19 (NCD).
- On May 30th - day number 106 since the first confirmed death in the EU
Predicted NCD = -975.628 + 81.617 × 106 - 0.530 × 106^2 = 1,720.694 ≈ 1,721 (people)
- On May 31st - day number 107 since the first confirmed death in the EU
Predicted NCD = -975.628 + 81.617 × 107 - 0.530 × 107^2 = 1,689.421 ≈ 1,689 (people)
Based on the formula above, the numbers of deaths due to COVID-19 on May 29th, 30th and 31st are predicted to be approximately 1,751, 1,721 and 1,689 people respectively. Hence, these results suggest that the number of deaths due to COVID-19 in the EU would follow a downward trend in the future.
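The EU forecasts can be reproduced with a short Python sketch that uses the coefficients from the worked calculations (-975.628, 81.617 and -0.530). Day 105 is assumed to correspond to May 29th, one day before day 106, and the function name is illustrative.

```python
def eu_quadratic(day):
    """Quadratic trend for daily EU deaths, coefficients as in the worked calculations."""
    return -975.628 + 81.617 * day - 0.530 * day ** 2

# Days 105, 106, 107 since the first confirmed EU death (May 29, 30, 31).
predictions = {day: round(eu_quadratic(day)) for day in (105, 106, 107)}
print(predictions)

# Vertex of the parabola: the day on which the fitted curve peaks.
peak_day = 81.617 / (2 * 0.530)
print(round(peak_day, 1))
```

The fitted curve peaks at about day 77, so every later day, including the last three days of May, lies on the declining side of the parabola. This is a property of the fitted quadratic rather than independent evidence of a decline.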
b Middle East Countries
The daily number of COVID-19 deaths on a particular day in the Middle East can then be predicted with the quadratic model:
- On May 31st - day number 102 since the first confirmed death in the Middle East
Predicted NCD = -54.540 + 8.414 × 102 - 0.091 × 102^2 = -143.076 ≈ -143 (people)
After applying the quadratic formula, the predicted numbers of new confirmed deaths in the Middle East on the last three days of May are -124, -133 and -143 respectively. However, these figures are negative, which does not make sense because daily confirmed deaths must be greater than or equal to 0. Hence, the results may suggest that the Middle East will not have any new COVID-19 deaths on May 29, May 30 and May 31.
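The Middle East forecasts and the treatment of negative values can be sketched in Python. The coefficients are the rounded ones shown in the quadratic formula, so results may differ slightly from the reported figures if those were computed from unrounded coefficients. Days 100 and 101 are assumed to correspond to May 29 and May 30, and clamping at zero is one simple way to express the constraint that daily deaths cannot be negative.

```python
def me_quadratic(day):
    """Quadratic trend for daily Middle East deaths (rounded coefficients)."""
    return -54.540 + 8.414 * day - 0.091 * day ** 2

# Days 100, 101, 102 since the first confirmed death (May 29, 30, 31).
raw = {day: me_quadratic(day) for day in (100, 101, 102)}

# Daily deaths cannot be negative, so negative forecasts are clamped to 0.
clamped = {day: max(0, round(value)) for day, value in raw.items()}

for day in raw:
    print(day, round(raw[day], 3), clamped[day])
```

A quadratic with a negative squared term must eventually fall below zero, so negative forecasts this far beyond the fitted period reflect a limit of extrapolating the model as much as an expected absence of deaths.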
VI Time Series Conclusion
Figure 2: Daily Recorded Number of COVID-19 Deaths
in 2 Regions since the 1st recorded death in each Region
Figure 2 provides an overview of the daily recorded number of COVID-19 deaths in the EU
and the Middle East. These line charts make the level of error easy to see. Although there
are two extreme values in the data set of the Middle East, the remaining values do not show much