ASSESSMENT TASK 3A
TEAM REPORT
1 Check for outlier
1 Regression output and formula of the significant trend model
1 The main factors that impact the number of deaths due to COVID-19
European Quality Assurance Register for Higher Education 2020; United Nations 2020; Food and Agriculture Organization of the United Nations 2020). Spreading rapidly, the disease has now reached every continent except Antarctica and has caused over 311,000 deaths (Worldometer 2020). In the West in particular, the disease appeared later than in the East; however, the situation there is more complex and has not shown any sign of easing (French Institute of International Relations 2020).
This paper examines two Western regions, Europe and the European Union (EU), by analyzing the number of Covid 19 deaths in each region and its relationship with related factors, examining the trend model of Covid 19 deaths in both regions, and making some predictions.
I DATA COLLECTION
This section provides the data needed for the analysis in parts II and III. According to Romos (2018), Europe contains 54 nations and territories, while the EU consists of 27 countries. However, because information is lacking for some countries, our datasets consist of all 27 EU countries but only 41 European countries.
The datasets contain six variables: the total number of Covid 19 deaths in each country for the period 22 January to 24 April, population, medical doctors per 10,000 people, hospital beds per 10,000 people, average temperature of the first four months, and average rainfall of the first four months. For details, please see Appendix 1 and Appendix 2.
II DESCRIPTIVE STATISTICS
Table 1 Europe and European Union dataset descriptive statistics
1 Check for outlier
For the Europe dataset, no observation is smaller than the lower bound (Q1 - 1.5 × IQR), but seven observations are bigger than the upper bound (Q3 + 1.5 × IQR). Thus, there are seven outliers in this dataset.
For the European Union dataset, no observation is smaller than the lower bound (Q1 - 1.5 × IQR), but five observations are bigger than the upper bound (Q3 + 1.5 × IQR). Thus, there are six outliers in this dataset.
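The outlier check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `iqr_outliers` and the `toy_deaths` values are hypothetical and are not the report's data, and the quartiles use one of several common conventions, so the bounds may differ slightly from those produced by other tools.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return (lower_bound, upper_bound, outliers) using the 1.5 x IQR rule."""
    # method="inclusive" interpolates between data points and treats the
    # sample as containing its own minimum and maximum.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical death counts for nine countries (NOT the report's data).
toy_deaths = [10, 20, 30, 40, 50, 60, 70, 80, 1000]
lower, upper, outliers = iqr_outliers(toy_deaths)
print(f"lower bound = {lower}, upper bound = {upper}, outliers = {outliers}")
```

In this toy sample only the very large value falls above the upper bound, which mirrors the situation in both datasets, where all outliers lie on the high side.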
2 Measure of central tendency
Since both datasets have outliers, the Median, a measure of central tendency that is not affected by outliers, is likely the most suitable measurement in this case.
The Medians of the Europe and EU datasets are 169 and 225, respectively. For Europe, this implies that 50% of countries in the region have fewer than 169 Covid 19 deaths (period 22 January to 23 April) and 50% have more than 169 deaths. Therefore, a typical country in the Europe dataset has about 169 Covid 19 deaths. A similar conclusion can be made for the European Union: 50% of countries in this region have fewer than 225 Covid 19 deaths (period 22 January to 23 April) and 50% have more than 225 deaths, so a typical European Union country has about 225 Covid 19 deaths.
3 Measure of variation
In this context, the Interquartile Range (IQR) is the most suitable measurement, as it is not affected by outliers and measures how much the middle 50% of observations spread out around the Median, the chosen average in this case.
The Interquartile Ranges of the Europe and European Union datasets are 952 and 1889, respectively. The IQR shows how widely the middle 50% of observations are spread: the smaller the IQR, the more consistent the middle 50% of observations are. Thus, the European Union dataset is less consistent in terms of Covid 19 deaths per country. In other words, the number of Covid 19 fatalities differs more from nation to nation in the European Union than in Europe.
4 Box and Whisker plots
According to the box and whisker plots, both datasets are right-skewed because the right part of each plot is longer than the left part. This implies that, for each dataset, the majority of the data are located on the low side of the graph, with a long tail of high values. In other words, most of the countries (in both the European Union and Europe) have a relatively low number of Covid 19 deaths, while a few countries have a very high number.
The graph also indicates that there are outliers in both datasets, meaning both datasets contain extreme values which would affect the objectivity of some sensitive measurements, such as the Mean or Range. Thus, those sensitive measurements are not recommended for assessing the two datasets.
The box plot shows that Europe's Median (quartile 2) is slightly smaller than the European Union's Median, meaning that, in general, the number of Covid 19 deaths in EU countries is higher than in European countries.
Comparing the two boxes, the European Union's box is bigger than Europe's, meaning that the number of deaths of the middle 50% of European Union countries spreads out more widely from the Median than that of the middle 50% of European countries. Since the middle 50% of observations are generally considered the most concentrated part of a dataset, it is possible that the number of Covid 19 deaths varies more between European Union countries than between European countries.
Figure 1 Europe and EU dataset box and whisker
5 Conclusion
Having analyzed the descriptive statistics of the European Union and Europe datasets, it can be concluded that, in general, from January 22 to April 23, European Union countries have more Covid 19 deaths than European countries because the Europe Median is smaller than the EU Median. Since both datasets contain outliers, sensitive measurements such as the Mean or Range are likely to be unreliable and should not be used in assessing these two datasets. Moreover, the IQR results reveal the variation in the number of Covid 19 deaths in the two regions. The European Union's IQR is larger than Europe's IQR, so the number of Covid 19 deaths among European Union countries is considered less consistent than in Europe, meaning the difference in Covid 19 fatalities among European Union countries is bigger than the difference among European countries.
III MULTIPLE REGRESSION
In this part, we use the data of two regions, Europe and the European Union (note that all the countries within the European Union are also included in Europe). The data of the two regions are collected on 6 different variables:
● Total number of deaths due to COVID 19
● Average temperature (in Celsius)
● Average rainfall (in mm)
● Medical doctors (per 10,000 people)
● Hospital beds (per 10,000 people)
● Population (in thousands)
Among the 6 variables above, the total number of deaths due to COVID 19 is the only dependent variable. The other 5 variables, namely average rainfall (in mm), average temperature (in Celsius), hospital beds (per 10,000 people), population of the country in 2018 (in thousands), and medical doctors (per 10,000 people), are all considered independent variables.
Based on these 5 independent variables, we build multiple regression models for the Europe region and the European Union region to predict the number of deaths due to COVID 19. For the dataset of each region, we apply backward elimination to reach the final model containing only the variables that are significant at the 5% level of significance.
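The backward elimination procedure can be sketched in plain Python as follows. This is a minimal illustration, not the spreadsheet workflow used for the regression outputs in this report. The function names, the toy data and the variable values are all hypothetical, and the p-values use a normal approximation to the t-distribution, so they will differ slightly from the p-values in a standard regression summary.

```python
import math


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(columns, y):
    """Fit y on an intercept plus the given columns.

    Returns (coefficients, p_values); index 0 is the intercept.
    ASSUMPTION: p-values come from a normal approximation, not the exact t-test.
    """
    n_obs = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n_obs)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n_obs)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n_obs)]
    sigma2 = sum(e * e for e in residuals) / (n_obs - k)
    p_values = []
    for j in range(k):
        t_stat = beta[j] / math.sqrt(sigma2 * inv[j][j])
        p_values.append(math.erfc(abs(t_stat) / math.sqrt(2)))
    return beta, p_values


def backward_elimination(data, y, alpha=0.05):
    """Repeatedly drop the predictor with the highest p-value above alpha."""
    names = list(data)
    while names:
        beta, p_values = ols([data[name] for name in names], y)
        worst = max(range(len(names)), key=lambda j: p_values[j + 1])
        if p_values[worst + 1] <= alpha:
            return names, beta  # every remaining predictor is significant
        names.pop(worst)
    return names, None


# Toy data (NOT the report's data): y depends on "population" only.
toy = {
    "population": [1, 2, 3, 4, 5, 6, 7, 8],
    "rainfall": [3, 3, 1, 1, 1, 1, 3, 3],
}
toy_y = [5.5, 6.5, 8.5, 11.5, 13.5, 14.5, 16.5, 19.5]
kept, beta = backward_elimination(toy, toy_y)
print(kept, beta)
```

As in the steps that follow, the loop refits the model after each removal, because the p-values of the remaining variables change once a variable is dropped.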
1 Building Regression model of Europe and European Union (applying Backward Elimination)
* Europe
Step 1: Regression output for Europe (1)
It can be seen that there are 4 independent variables which are insignificant at the 0.05 significance level. We first eliminate the Average temperature (in Celsius) since this non-significant independent variable has the highest p-value.
Figure 2 Regression Output for Europe (1)
Step 2: Regression output for Europe (2)
After eliminating the Average temperature (in Celsius) from the independent variables, we run the regression analysis again and obtain the summary output below:
From the regression output, there are still 3 non-significant independent variables since their p-values are larger than 0.05.
We remove the Medical doctor (per 10,000 people) since this non-significant independent variable has the highest p-value.
Step 3: Regression output for Europe (3)
After eliminating the Medical doctor (per 10,000 people) from the independent variables, we run the regression analysis again and obtain the summary output below:
As can be seen in the summary output, there is still 1 non-significant independent variable, the Average rainfall (in mm), since its p-value is larger than 0.05. Thus, we remove it from the independent variables.
Figure 3 Regression Output for Europe (2)
Figure 4 Regression Output for Europe (3)
Step 4: Regression output for Europe (4) (FINAL Model)
After removing the non-significant independent variables, namely the Average rainfall (in mm), the Average temperature (in Celsius), and the Medical doctor (per 10,000 people), we reach the FINAL regression model, in which the two remaining independent variables, the Hospital bed (per 10,000 people) and the Population (in thousands), are significant at the 5% significance level (p-value < 0.05). This indicates that these two significant independent variables have an effect on the number of deaths due to COVID 19 at the 5% level of significance.
* European Union
Step 1: Regression output for European Union (1)
It can be seen that there are 3 independent variables which are insignificant at the 0.05 significance level. We first eliminate the Medical doctor (per 10,000 people) since this non-significant independent variable has the highest p-value.
Figure 5 Regression Output for Europe (4) (Final Model)
Figure 6 Regression Output for European Union (1)
Step 2: Regression output for European Union (2)
After eliminating the Medical doctor (per 10,000 people) from the independent variables, we run the regression analysis again and obtain the summary output below:
From the regression output, there are still 2 non-significant independent variables since their p-values are larger than 0.05.
We remove the Average temperature (in Celsius) since this non-significant independent variable has the highest p-value.
Step 3: Regression output for European Union (3)
After eliminating the Average temperature (in Celsius) from the independent variables, we run the regression analysis again and obtain the summary output below:
As can be seen in the summary output, there is still 1 non-significant independent variable, the Average rainfall (in mm), since its p-value is larger than 0.05. Thus, we remove it from the independent variables.
Figure 7 Regression Output for European Union (2)
Step 4: Regression output for European Union (4) (FINAL Model)
After eliminating 3 non-significant independent variables, namely the Average temperature (in Celsius), the Average rainfall (in mm) and the Medical doctor (per 10,000 people), we reach the FINAL regression model, in which the two remaining independent variables, the Hospital bed (per 10,000 people) and the Population (in thousands), are significant at the 5% significance level (p-value < 0.05). This indicates that these two significant independent variables have an impact on the number of deaths due to COVID 19 at the 5% level of significance.
2 FINAL Regression Model of each region
* Europe’s FINAL Regression model
a) Regression Output
Figure 8 Regression Output for European Union (3)
Figure 9 Regression Output for European Union (4) (Final Model)
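b) Regression Equation
Predicted number of deaths due to COVID 19 = b0 - 102.152 × Hospital bed (per 10,000 people) + 0.118 × Population (in thousands), where b0 is the intercept reported in the regression output.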
c) Interpret the regression coefficient of the significant independent variables:
● The coefficient of Hospital bed (per 10,000 people) (b1 = -102.152) denotes the negative relationship between the two variables: for every additional hospital bed per 10,000 people, the number of deaths due to COVID 19 is estimated to decrease by 102.152 deaths, holding the population constant.
● The coefficient of Population (in thousands) (b2 = 0.118) indicates the positive relationship between the two variables: for every increase of 1,000 people in the population, the predicted number of deaths due to COVID 19 increases by 0.118 deaths (about 118 deaths per additional 1,000,000 people), holding the hospital beds constant.
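The arithmetic behind these interpretations can be checked with a short Python sketch. The coefficients are the ones reported above for Europe; the function name `predicted_change` is hypothetical. Because population is measured in thousands, one unit of the Population variable corresponds to 1,000 people, and the intercept cancels out when two predictions are compared.

```python
def predicted_change(b1, b2, delta_beds=0.0, delta_population_thousands=0.0):
    """Change in the predicted number of deaths for given changes in the predictors."""
    # The intercept does not appear: it cancels when two predictions are subtracted.
    return b1 * delta_beds + b2 * delta_population_thousands


# Coefficients of Europe's FINAL model as reported in the text.
EUROPE = {"b1": -102.152, "b2": 0.118}

one_more_bed = predicted_change(**EUROPE, delta_beds=1)
thousand_more_people = predicted_change(**EUROPE, delta_population_thousands=1)
million_more_people = predicted_change(**EUROPE, delta_population_thousands=1000)

print(one_more_bed, thousand_more_people, million_more_people)
```

The same function can be reused for the European Union's model by passing its own coefficients.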
d) Interpret the coefficient of determination
The Coefficient of Determination (R2 = 0.320) shows that 32% of the variation in the number of deaths due to COVID 19 can be explained by the variation in the hospital bed (per 10,000 people) and the population (in thousands). The remaining 68% of the variation in the number of deaths due to COVID 19 is due to other factors.
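For reference, the coefficient of determination is the share of the total variation in the dependent variable that the fitted values account for, R2 = 1 - SSE/SST. The sketch below shows the calculation on made-up numbers; the function name and the toy values are hypothetical and unrelated to the report's datasets.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(observed) / len(observed)
    sst = sum((y - mean_y) ** 2 for y in observed)            # total variation
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # unexplained variation
    return 1 - sse / sst


# Toy values (NOT the report's data).
toy_observed = [10.0, 20.0, 30.0, 40.0]
toy_fitted = [12.0, 18.0, 32.0, 38.0]
r2 = r_squared(toy_observed, toy_fitted)
print(f"R2 = {r2:.3f}")
```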
* European Union’s FINAL Regression model
a) Regression output
Figure 10 Regression Output for Europe (Final Model)
Figure 11 Regression Output for European Union (Final Model)
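b) Regression Equation
Predicted number of deaths due to COVID 19 = b0 - 146.198 × Hospital bed (per 10,000 people) + 0.265 × Population (in thousands), where b0 is the intercept reported in the regression output.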
c) Interpret the regression coefficient of the significant independent variables
● The coefficient of Hospital bed (per 10,000 people) (b1 = -146.198) denotes the negative relationship between the two variables: for every additional hospital bed per 10,000 people, the number of deaths due to COVID 19 is expected to decrease by 146.198 deaths, holding the population constant.
● The coefficient of Population (in thousands) (b2 = 0.265) indicates the positive relationship between the two variables: for every increase of 1,000 people in the population, the average number of deaths due to COVID 19 is estimated to increase by 0.265 deaths (about 265 deaths per additional 1,000,000 people), given that the hospital bed (in the model) remains constant.
d) Interpret the Coefficient of Determination
The Coefficient of Determination (R2 = 0.663) shows that 66.3% of the variation in the number of deaths due to COVID 19 can be explained by the variation in the hospital bed (per 10,000 people) and the population (in thousands). The remaining 33.7% of the variation in the number of deaths due to COVID 19 may be due to other factors.
IV TEAM REGRESSION CONCLUSION
REGION | SIGNIFICANT INDEPENDENT VARIABLES | COEFFICIENT OF THE SIGNIFICANT INDEPENDENT VARIABLES | COEFFICIENT OF DETERMINATION (R2)
Europe | Hospital bed (per 10,000 people) | -102.152 | 0.320
Europe | Population (in thousands) | 0.118 | 0.320
European Union | Hospital bed (per 10,000 people) | -146.198 | 0.663
European Union | Population (in thousands) | 0.265 | 0.663
Table 2: Regression Conclusion for Europe and European Union datasets
According to the FINAL regression models of each region, both regions (European Union and Europe) have the same significant independent variables, which are the Hospital bed (per 10,000 people) and the Population (in thousands).
We will analyze the coefficients of the two significant independent variables to assess which region is more impacted by this pandemic. With regard to the coefficient of the hospital bed (per 10,000 people), both regions show a negative relationship between the number of deaths due to COVID 19 and the hospital bed (per 10,000 people). However, the absolute value of b1 in the European Union's model is higher than in Europe's model (146.198 > 102.152), which shows that the hospital bed (per 10,000 people) has a greater effect on the number of deaths due to COVID 19 in the European Union: for every additional hospital bed per 10,000 people, the number of deaths is estimated to decrease by 146.198 in the European Union, whereas in Europe it decreases by 102.152, holding the population constant.
One of the substantial challenges faced by many hospitals is the lack of intensive care unit (ICU) beds for COVID 19 patients, especially when handling a big wave of coronavirus cases. Providing more hospital beds is important in saving patients' lives. Without enough ICU beds, patients may not be hospitalized and receive the required treatment from doctors, which may result in a rise in the number of deaths due to COVID 19 (Mangan & Schoen 2020). Therefore, an increase in ICU beds is expected to lead to a reduction in the number of deaths caused by COVID 19.
In terms of the coefficient of the population (in thousands), both regions show a positive relationship between the number of deaths due to COVID 19 and the population (in thousands). However, the value of b2 in the European Union's model is higher than the value of b2 in Europe's model (0.265 > 0.118), which shows that the population (in thousands) has a greater effect on the number of deaths due to COVID 19 in the European Union. This implies that the number of deaths resulting from COVID 19 in the European Union is estimated to increase by 0.265 deaths for every increase of 1,000 people in the population (about 265 deaths per additional 1,000,000 people), holding the hospital bed (in the model) constant. An increase in population density in an area may make it difficult to implement social-distancing measures. According to Dr Steven Goodman, an epidemiologist at Stanford University, “density is really an enemy in a situation like this”, since the virus tends to spread faster in larger population centers where people interact more with each other, which may increase coronavirus cases and deaths. As reported by public health experts, density is also the main reason for the increasing number of coronavirus cases in New York (Rosenthal 2020). Therefore, the higher the population density, the higher the number of deaths resulting from COVID 19 is likely to be.
Since the coefficients of the hospital bed (per 10,000 people) and the population (in thousands) are larger in absolute value in the European Union's regression model, these two variables have a greater impact on the number of deaths due to COVID 19 in the European Union. Therefore, it can be concluded that the European Union is more impacted by this pandemic, which means a lack of hospital beds and an increase in population in the European Union will result in more deaths compared to the same changes in these variables in Europe.
Moreover, when comparing the coefficients of determination (R2) of the two regions, the European Union's dataset has a higher R2 (66.3%) than Europe's dataset (32%), which means 66.3% of the variation in the number of deaths caused by COVID 19 in the European Union is explained by the variation in the hospital bed (per 10,000 people) and the population (in thousands). The larger R2 of the European Union indicates a better goodness-of-fit, meaning the observed values fall closer to the fitted regression line. Hence, with an acceptable R2, the European Union's regression model can reasonably be used for predicting the number of deaths due to COVID 19.
In Europe's model, by contrast, the R2 is small (32%), and the remaining 68% may be due to other factors that are not included in the model. Therefore, further research on other significant independent variables not covered in this regression analysis should be carried out to obtain a better estimate of the number of deaths due to COVID 19. For instance, an increase in protective equipment (such as respirators, eye protection, gloves, and gowns) for frontline health-care workers will lessen the rate of coronavirus infections, since these critical items protect them from being infected with the coronavirus when providing treatment for COVID 19 patients (Choudhury 2020).
V TIME SERIES
This part presents the trend model analysis for the European Union and Europe, using daily data on the total number of deaths due to COVID-19 in each region from January 1 to April 30, 2020 (Appendix 3). Specifically, the paper provides the regression output and formula of the significant trend model in each region, gives recommendations about trend models for prediction, and forecasts the number of deaths due to Covid-19 on certain days in both regions.
1 Regression output and formula of the significant trend model
Hypothesis testing (at a 5% significance level) for Linear Trend in European Union
Step 1: H0: β1 = 0 (There is no linear trend)
H1: β1 ≠ 0 (There is a linear trend)
Step 2: p-value (0.00) < α (0.05) → Reject H0
Step 3: With a 95% level of confidence, we can say that there is a linear trend in the number of deaths in the European Union.
b0 = -694.858 → b0 shows that the number of deaths in the European Union would be -694.858 on day 0 (before February 15, 2020), which makes no sense in real life. Therefore, there should be 0 deaths on February 14, 2020.
b1 = 62.211 → b1 shows that the number of deaths in the European Union will increase by 62.211 deaths every day.
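To illustrate how the linear trend coefficients are used, the following Python sketch evaluates the fitted line b0 + b1 × t for a given calendar date. It assumes, based on the interpretation of b0 above, that day 0 is February 14, 2020 and day 1 is February 15, 2020; the constant and function names are hypothetical.

```python
from datetime import date

# Coefficients of the European Union linear trend as reported in the text.
B0 = -694.858
B1 = 62.211

# ASSUMPTION: day 0 is February 14, 2020, so day 1 is February 15, 2020.
DAY_ZERO = date(2020, 2, 14)


def day_index(d):
    """Number of days elapsed since day 0."""
    return (d - DAY_ZERO).days


def linear_trend(t):
    """Fitted number of deaths on day t under the linear trend model."""
    return B0 + B1 * t


t_end = day_index(date(2020, 4, 30))
forecast = linear_trend(t_end)
print(f"day {t_end}: fitted value = {forecast:.3f}")
```

Note that the fitted values stay negative for the first days of the period, which is the same issue raised in the interpretation of b0.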
Quadratic Trend