
ASSESSMENT TASK 3a TEAM REPORT. OVERALL TEAM CONCLUSION: the main factors that impact the number of deaths due to COVID 19


DOCUMENT INFORMATION

Basic information

Title: Assessment Task 3a Team Report. Overall Team Conclusion: The Main Factors That Impact The Number Of Deaths Due To COVID 19
Institution: RMIT University
Major: Data Analysis and Statistical Methods
Type: Report
Year of publication: 2020
City: Melbourne
Pages: 27
File size: 0.99 MB

Content


ASSESSMENT TASK 3A

TEAM REPORT

1 Check for outlier

1 Regression output and formula of the significant trend model

3 Prediction on number of deaths on May 29, May 30, May 31

1 The main factors that impact the number of deaths due to COVID-19

medical (The European Quality Assurance Register for Higher Education 2020; United Nations 2020; Food and Agriculture Organization of the United Nations 2020). Spreading rapidly, the disease has now reached every continent except Antarctica and has caused over 311,000 deaths (Worldometer 2020). In the West in particular, the disease appeared later than in the East, yet the situation there is more complex and has not shown any sign of easing (French Institute of International Relations 2020). This paper looks at two specific Western regions, Europe and the European Union (EU), by analyzing the number of COVID-19 deaths in each region and its relationship with related factors, examining the trend model of COVID-19 deaths in both regions, and giving some predictions.

This section provides the data that are necessary for the analysis in parts II and III. According to Romos (2018), Europe contains 54 nations and territories while the EU consists of 27 countries. However, due to the lack of information for some countries, our datasets consist of all 27 EU countries but only 41 European countries.

The datasets contain six variables: total number of COVID-19 deaths in each country for the period 22 January to 24 April, population, medical doctors per 10,000 people, hospital beds per 10,000 people, average temperature of the first four months, and average rainfall of the first four months. For details, please see Appendix 1 and Appendix 2.

Table 1 Europe and European Union dataset descriptive statistics

1 Check for outlier

For the Europe dataset, no observation is smaller than the lower bound (Q1 - 1.5 × IQR), but seven observations are bigger than the upper bound (Q3 + 1.5 × IQR). Thus, there are seven outliers in this dataset.

For the European Union dataset, no observation is smaller than the lower bound (Q1 - 1.5 × IQR), but five observations are bigger than the upper bound (Q3 + 1.5 × IQR). Thus, there are five outliers in this dataset.
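The outlier check described above can be sketched in Python. This is only an illustration: the report does not state which quartile convention was used, so the inclusive interpolation method of `statistics.quantiles` is an assumption, and the `deaths` values are hypothetical rather than taken from the report's datasets.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return (lower_bound, upper_bound, outliers) using the 1.5 x IQR rule."""
    # Assumption: inclusive quartile interpolation; other conventions
    # can give slightly different Q1 and Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical death counts for nine countries (illustrative only).
deaths = [1, 2, 3, 4, 5, 6, 7, 8, 100]
lower, upper, outliers = iqr_outliers(deaths)
print(lower, upper, outliers)
```

With these made-up values only the largest observation lies beyond the upper bound, which mirrors the pattern reported for both datasets: no low outliers, several high ones.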

2 Measure of central tendency

Since both datasets have outliers, the Median, a measure of central tendency that is not affected by outliers, is likely the most suitable measure in this case.

The Medians of the Europe and EU datasets are 169 and 225, respectively. For Europe, this implies that 50% of the countries in the region have fewer than 169 COVID-19 deaths (period 22 January to 23 April) and 50% have more than 169 deaths. Therefore, a typical country in the Europe dataset has about 169 COVID-19 deaths. A similar conclusion holds for the European Union: 50% of its countries have fewer than 225 COVID-19 deaths over the same period and 50% have more than 225, so a typical European Union country has about 225 COVID-19 deaths.

3 Measure of variation

In this context, the Interquartile Range (IQR) is the most suitable measure because it is not affected by outliers and it measures how much the middle 50% of observations spread out around the Median, the chosen average in this case.

The interquartile ranges of the Europe and European Union datasets are 952 and 1889, respectively. The IQR shows how the middle 50% of observations spread out: the smaller the IQR, the more consistent the middle 50% of observations. Thus, it could be concluded that the European Union dataset is less consistent in terms of COVID-19 deaths per country. In other words, the differences in the number of COVID-19 fatalities between nations are bigger in the European Union than in Europe.
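To illustrate why the Median and IQR are preferred here, the following sketch compares them with the Mean and Range on two hypothetical lists that differ only in one extreme value. The numbers are invented for illustration, and the inclusive quartile method is an assumption, as the report does not name its quartile convention.

```python
from statistics import mean, median, quantiles


def summarize(values):
    """Compute outlier-sensitive and outlier-robust summary measures."""
    # Assumption: inclusive quartile interpolation.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),                 # sensitive to extreme values
        "median": median(values),             # robust
        "range": max(values) - min(values),   # sensitive to extreme values
        "iqr": q3 - q1,                       # robust
    }


# Hypothetical data: the second list replaces 9 with an extreme value of 100.
without_outlier = summarize([1, 2, 3, 4, 5, 6, 7, 8, 9])
with_outlier = summarize([1, 2, 3, 4, 5, 6, 7, 8, 100])

for name in ("mean", "median", "range", "iqr"):
    print(name, without_outlier[name], with_outlier[name])
```

In this toy example the Median and IQR are identical for both lists, while the Mean and Range change substantially once the extreme value is introduced.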

4 Box and Whisker plots

According to the box and whisker plots, both datasets are right-skewed because the right part of each plot is longer than the left part. This implies that, for each dataset, the majority of the data are located on the low side of the graph, with a long tail toward high values. In other words, most countries (in both the European Union and Europe) have a relatively low number of COVID-19 deaths, while a few countries have very high numbers.

The graph also indicates that there are outliers in both datasets, meaning both datasets contain extreme values which would affect the objectiveness of some sensitive measurements, such as the Mean or Range. Thus, those sensitive measurements are not recommended for assessing the two datasets.

The box plot shows that Europe's Median (quartile 2) is slightly smaller than the European Union's Median, meaning that, in general, the number of COVID-19 deaths in EU countries is higher than in European countries.

Comparing the two boxes, it could be concluded that the European Union's box is bigger than Europe's, meaning that the numbers of deaths of the middle 50% of European Union countries spread out more widely from the Median than those of the middle 50% of Europe. Since the middle 50% of observations are generally considered the most concentrated part of the dataset, it is possible that the number of COVID-19 deaths varies more between European Union countries than between European countries.

Figure 1 Europe and EU dataset box and whisker

5 Conclusion

Having analyzed the descriptive statistics of the European Union and Europe datasets, it could be concluded that in general, from January 22 to April 23, European Union countries have more COVID-19 deaths than European countries because Europe's Median is smaller than the EU's Median. Since both datasets contain outliers, sensitive measurements such as the Mean or Range are likely to be unreliable and should not be used in assessing these two datasets. Moreover, the IQR results reveal the variation in the number of COVID-19 deaths in the two regions. The European Union's IQR is larger than Europe's IQR (1889 versus 952), so the number of COVID-19 deaths among European Union countries is considered less consistent than in Europe, meaning the differences in the number of COVID-19 fatalities among European Union countries are bigger than the differences among European countries.

III MULTIPLE REGRESSION

In this part, we use the data of two regions, Europe and the European Union (note that all countries within the European Union are also included in Europe). The data of the two regions are collected on 6 different variables:

● Total number of deaths due to COVID 19

● Average temperature (in Celsius)

● Average rainfall (in mm)

● Medical doctors (per 10,000 people)

● Hospital beds (per 10,000 people)

● Population (in thousands)

Among the 6 variables above, the total number of deaths due to COVID-19 is the only dependent variable. The other 5 variables, namely average rainfall (in mm), average temperature (in Celsius), hospital beds (per 10,000 people), population of the country in 2018 (in thousands), and medical doctors (per 10,000 people), are all treated as independent variables.

Based on these 5 independent variables, we will build multiple regression models for the Europe region and the European Union region to predict the number of deaths due to COVID-19. For the dataset of each region, we will apply backward elimination to reach the final model containing only the variables that are significant at the 5% level of significance.
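The backward elimination procedure can be sketched as a simple loop. The sketch below covers only the control flow: the regression fit itself is abstracted behind a hypothetical `fit_p_values` callback, and the stand-in `fake_fit` returns made-up, fixed p-values (a real fit would recompute them after each removal). The variable names and p-values are illustrative assumptions, not the report's regression output.

```python
def backward_eliminate(variables, fit_p_values, alpha=0.05):
    """Drop the least significant variable until all remaining ones pass."""
    remaining = list(variables)
    while remaining:
        p_values = fit_p_values(remaining)  # refit on the current variable set
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining variable is significant at alpha
        remaining.remove(worst)  # remove the highest p-value, then refit
    return remaining


# Hypothetical stand-in for a real regression fit (made-up p-values).
MADE_UP_P = {
    "avg_temperature": 0.90,
    "medical_doctors": 0.60,
    "avg_rainfall": 0.20,
    "hospital_beds": 0.01,
    "population": 0.001,
}


def fake_fit(names):
    return {name: MADE_UP_P[name] for name in names}


final_variables = backward_eliminate(list(MADE_UP_P), fake_fit)
print(final_variables)
```

In practice `fit_p_values` would wrap whatever regression tool is used (the report's outputs appear to come from a spreadsheet-style regression summary), returning the p-value of each coefficient for the current set of variables.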

1 Building Regression model of Europe and European Union (applying Backward Elimination)

* Europe

Step 1: Regression output for Europe (1)

It can be seen that there are 4 independent variables which are insignificant at the 0.05 significance level, but we first eliminate the Average temperature (in Celsius) since this non-significant independent variable has the highest p-value.

Figure 2 Regression Output for Europe (1)


Step 2: Regression output for Europe (2)

After eliminating the Average temperature (in Celsius) out of those independent variables, we run the regression analysis again and have the summary output as below:

From the regression output, there are still 3 non-significant independent variables since their p-values are larger than 0.05.

We remove the Medical doctor (per 10,000 people) since this non-significant independent variable has the highest p-value.

Figure 3 Regression Output for Europe (2)

Step 3: Regression output for Europe (3)

After eliminating the Medical doctor (per 10,000 people) from the independent variables, we run the regression analysis again and have the summary output as below:

As can be seen in the summary output, there is still 1 non-significant independent variable, the Average rainfall (in mm), since its p-value is larger than 0.05. Thus, we remove it from the independent variables.

Figure 4 Regression Output for Europe (3)


Step 4: Regression output for Europe (4) (FINAL Model)

After removing the non-significant independent variables, namely the Average rainfall (in mm), the Average temperature (in Celsius), and the Medical doctor (per 10,000 people), we reach the FINAL regression model, in which the two remaining independent variables, the Hospital bed (per 10,000 people) and the Population (in thousands), are significant at the 5% significance level (p-value < 0.05). This indicates that these two independent variables have an effect on the number of deaths due to COVID-19 at the 5% level of significance.

Figure 5 Regression Output for Europe (4) (Final Model)

* European Union

Step 1: Regression output for European Union (1)

It can be seen that there are 3 independent variables which are insignificant at the 0.05 significance level, but we first eliminate the Medical doctor (per 10,000 people) since this non-significant independent variable has the highest p-value.

Figure 6 Regression Output for European Union (1)


Step 2: Regression output for European Union (2)

After eliminating the Medical doctor (per 10,000 people) from the independent variables, we run the regression analysis again and have the summary output as below:

From the regression output, there are still 2 non-significant independent variables since their p-values are larger than 0.05.

We remove the Average Temperature (in Celsius) since this non-significant independent variable has the highest p-value.

Figure 7 Regression Output for European Union (2)

Step 3: Regression output for European Union (3)

After eliminating the Average Temperature (in Celsius) out of those independent variables, we run the regression analysis again and have the summary output as below:

As can be seen in the summary output, there is still 1 non-significant independent variable, the Average rainfall (in mm), since its p-value is larger than 0.05. Thus, we remove it from the independent variables.

Figure 8 Regression Output for European Union (3)

Step 4: Regression output for European Union (4) (FINAL Model)

After eliminating the 3 non-significant independent variables, namely the Average temperature (in Celsius), the Average rainfall (in mm), and the Medical doctor (per 10,000 people), we reach the FINAL regression model, in which the two remaining independent variables, the Hospital bed (per 10,000 people) and the Population (in thousands), are significant at the 5% significance level (p-value < 0.05). This indicates that these two independent variables have an impact on the number of deaths due to COVID-19 at the 5% level of significance.

Figure 9 Regression Output for European Union (4) (Final Model)

2 FINAL Regression Model of each region

* Europe’s FINAL Regression model

a) Regression Output


Figure 10 Regression Output for Europe (Final Model)

b) Regression Equation

$$\widehat{\text{Number of deaths due to COVID-19}} = 5717.342 - 102.152 \times \text{Hospital bed (per 10,000 people)} + 0.118 \times \text{Population (in thousands)}$$

c) Interpret the regression coefficient of the significant independent variables:

The coefficient of Hospital bed (per 10,000 people) (b1 = -102.152) denotes a negative relationship between the two variables: for every additional hospital bed per 10,000 people, the number of deaths due to COVID-19 is estimated to decrease by 102.152 deaths, holding the population constant.

The coefficient of Population (in thousands) (b2 = 0.118) indicates a positive relationship between the two variables: for every increase of 1,000 people in the population, the predicted number of deaths due to COVID-19 increases by 0.118 deaths (about 118 deaths per additional million people), holding the hospital bed constant.

d) Interpret the coefficient of determination

The Coefficient of Determination (R² = 0.320) shows that 32% of the variation in the number of deaths due to COVID-19 can be explained by the variation in the hospital bed (per 10,000 people) and the population (in thousands), whereas the remaining 68% of the variation in the number of deaths due to COVID-19 is due to other factors.
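For readers who want to see how a coefficient of determination is obtained, the sketch below computes it as one minus the ratio of the residual sum of squares to the total sum of squares, which for a least-squares fit with an intercept is the share of variation explained by the model. The `observed` and `fitted` values are hypothetical and unrelated to the report's data.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(observed) / len(observed)
    sst = sum((y - mean_y) ** 2 for y in observed)             # total variation
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # unexplained part
    return 1 - sse / sst


# Hypothetical observed values and model predictions (illustrative only).
observed = [2, 4, 6, 8]
fitted = [3, 3, 7, 7]
r2 = r_squared(observed, fitted)
print(round(r2, 3))
```

A value of 0.320, as in Europe's model, therefore means that the unexplained variation is 68% of the total variation in deaths.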

* European Union’s FINAL Regression model

a) Regression output

Figure 11 Regression Output for European Union (Final Model)


b) Regression Equation

$$\widehat{\text{Number of deaths due to COVID-19}} = 6508.490 - 146.198 \times \text{Hospital bed (per 10,000 people)} + 0.265 \times \text{Population (in thousands)}$$

c) Interpret the regression coefficient of the significant independent variables

● The coefficient of Hospital bed (per 10,000 people) (b1 = -146.198) denotes a negative relationship between the two variables: for every additional hospital bed per 10,000 people, the number of deaths due to COVID-19 is expected to decrease by 146.198 deaths, holding the population constant.

● The coefficient of Population (in thousands) (b2 = 0.265) indicates a positive relationship between the two variables: for every increase of 1,000 people in the population, the average number of deaths due to COVID-19 is estimated to increase by 0.265 deaths (about 265 deaths per additional million people), given that the hospital bed (in the model) remains constant.

d) Interpret the Coefficient of Determination

The Coefficient of Determination (R² = 0.663) shows that 66.3% of the variation in the number of deaths due to COVID-19 can be explained by the variation in the hospital bed (per 10,000 people) and the population (in thousands), whereas the remaining 33.7% of the variation in the number of deaths due to COVID-19 may be due to other factors.
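The two final regression equations can be evaluated directly. The sketch below encodes the coefficients reported for each region and applies them to a hypothetical country; the input values (50 hospital beds per 10,000 people, a population of 10 million) and the function and key names are illustrative assumptions.

```python
# Coefficients of the final models as reported in the regression equations.
MODELS = {
    "Europe": {"intercept": 5717.342, "beds": -102.152, "population": 0.118},
    "European Union": {"intercept": 6508.490, "beds": -146.198, "population": 0.265},
}


def predict_deaths(region, beds_per_10k, population_thousands):
    """Predicted COVID-19 deaths from a region's final regression equation."""
    m = MODELS[region]
    return (m["intercept"]
            + m["beds"] * beds_per_10k
            + m["population"] * population_thousands)


# Hypothetical country: 50 beds per 10,000 people, 10 million inhabitants.
europe_pred = predict_deaths("Europe", 50, 10_000)
eu_pred = predict_deaths("European Union", 50, 10_000)
print(round(europe_pred, 3), round(eu_pred, 3))

# Effect of 1,000 extra inhabitants (one unit of the population variable).
extra = predict_deaths("Europe", 50, 10_001) - europe_pred
print(round(extra, 3))
```

Because the population variable is measured in thousands, one extra unit corresponds to 1,000 people, so the last line reproduces the population coefficient itself.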

IV TEAM REGRESSION CONCLUSION

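REGION | SIGNIFICANT INDEPENDENT VARIABLES | COEFFICIENT | R²
Europe | Hospital bed (per 10,000 people) | -102.152 | 0.320
Europe | Population (in thousands) | 0.118 |
European Union | Hospital bed (per 10,000 people) | -146.198 | 0.663
European Union | Population (in thousands) | 0.265 |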

Table 2: Regression Conclusion for Europe and European Union datasets

According to the FINAL regression models of each region, both regions (European Union and Europe) have the same significant independent variables: the Hospital bed (per 10,000 people) and the Population (in thousands).

We will analyze the coefficients of the two significant independent variables to assess which region is more impacted by this pandemic. With regard to the coefficient of the hospital bed (per 10,000 people), both regions show a negative relationship between the number of deaths due to COVID-19 and the hospital bed (per 10,000 people). However, the absolute value of b1 in the European Union's model is higher than in Europe's model (146.198 > 102.152), which shows that the hospital bed (per 10,000 people) has a greater effect on the number of deaths due to COVID-19 in the European Union: there, the number of deaths is estimated to decrease by 146.198 for every additional hospital bed per 10,000 people, holding the population unchanged. In Europe, by contrast, the number of deaths is estimated to decrease by 102.152 for every additional hospital bed per 10,000 people, holding the population constant.

One of the substantial challenges faced by many hospitals is the lack of intensive care unit (ICU) beds for COVID-19 patients, especially when handling a big wave of coronavirus cases. Providing more hospital beds is important for saving patients' lives. Without enough ICU beds, patients may not be hospitalized and may not receive the required treatment from doctors, which may result in a rise in the number of deaths due to COVID-19 (Mangan & Schoen 2020). Therefore, an increase in ICU beds is expected to lead to a reduction in the number of deaths caused by COVID-19.

In terms of the coefficient of the population (in thousands), both regions show a positive relationship between the number of deaths due to COVID-19 and the population (in thousands). However, the value of b2 in the European Union's model is higher than the value of b2 in Europe's model (0.265 > 0.118), which shows that the population (in thousands) has a greater effect on the number of deaths due to COVID-19 in the European Union. This implies that, in the European Union, the number of deaths resulting from COVID-19 is estimated to increase by 0.265 for every increase of 1,000 people in the population (about 265 deaths per additional million people), keeping the hospital bed (in the model) constant.

An increase in population density in an area may make it difficult to implement social-distancing measures. According to Dr Steven Goodman, an epidemiologist at Stanford University, “density is really an enemy in a situation like this” since the virus tends to spread faster in larger population centers where people interact more with each other, which may increase coronavirus cases and deaths. As reported by public health experts, density is also the main reason for the increasing number of coronavirus cases in New York (Rosenthal 2020). Therefore, the higher the population density, the higher the number of deaths resulting from COVID-19.

Since the coefficients of the hospital bed (per 10,000 people) and the population (in thousands) are larger in absolute value in the European Union's regression model, these two variables have a greater impact on the number of deaths due to COVID-19 in the European Union. Therefore, it can be concluded that the European Union is more impacted by this pandemic, which means that a lack of hospital beds and an increase in population in the European Union will result in more deaths compared to the same changes in Europe.

Moreover, when comparing the coefficients of determination (R²) of the two regions, the European Union's dataset has a higher R² (66.3%) than Europe's dataset (32%), which means 66.3% of the variation in the number of deaths caused by COVID-19 in the European Union is explained by the variation in the hospital bed (per 10,000 people) and the population (in thousands). The larger R² of the European Union indicates a better goodness-of-fit, meaning the observed values fall closer to the fitted regression line. Hence, with an acceptable R², the European Union's regression model is reliable enough to be used for predicting the number of deaths due to COVID-19.

In Europe's model, by contrast, the R² is small (32%), and the remaining 68% may be due to other factors that are not included in the model. Therefore, further research on other significant independent variables that are not covered in the regression analysis should be carried out to obtain a better estimate of the number of deaths due to COVID-19. For instance, an increase in protective equipment (such as respirators, eye protection, gloves, and gowns) for frontline health-care workers will lessen the rate of coronavirus infections, since these critical items protect them from being infected with the coronavirus when treating COVID-19 patients (Choudhury 2020).

V TIME SERIES


This part presents the trend model analysis for the European Union and Europe, based on daily data for the total number of deaths due to COVID-19 in each region from January 1 to April 30, 2020 (Appendix 3). Specifically, the paper provides the regression output and formula of the significant trend model in each region, gives recommendations about trend models for prediction, and forecasts the number of deaths due to COVID-19 on certain days in both regions.

1 Regression output and formula of the significant trend model

a European Union

Since the number of deaths in the European Union remains 0 from January 1 to February 14, our exponential trend model, the model used to indicate a variable's growth or decay, will be based on data starting from February 15, when the number of deaths first exceeds 0. For consistency, the linear and quadratic trends will also start from that day.

*Regression output and hypothesis test for significant trend model

Linear Trend

Figure 12 Regression output of the linear trend of the number of deaths in the European Union from February 15 to April 30, 2020

Hypothesis testing (at a 5% significance level) for Linear Trend in European Union

Step 1: H0: β1 = 0 (There is no linear trend)

H1: β1 ≠ 0 (There is a linear trend)

Step 2: p-value (0.00) < α (0.05) → Reject H0

Step 3: With a 95% level of confidence, we can say that there is a linear trend in the number of deaths in the European Union.

b0 = -694.858 → b0 shows that the number of deaths in the European Union would be -694.858 on day 0 (before February 15, 2020), which makes no sense in real life. Therefore, there should be 0 deaths on February 14, 2020.

b1 = 62.211 → b1 shows that the number of deaths in the European Union increases by 62.211 deaths every day, on average.
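The linear trend can be evaluated for any date once the time index is fixed. The sketch below assumes that day 0 is February 14, 2020, so that February 15 is t = 1, which follows from the interpretation of b0 above; the function names are illustrative.

```python
from datetime import date

B0 = -694.858   # intercept of the linear trend (European Union)
B1 = 62.211     # slope: change in deaths per day
DAY_ZERO = date(2020, 2, 14)  # assumption: t = 0 is the day before the series starts


def time_index(day):
    """Number of days since day 0 (February 15, 2020 gives t = 1)."""
    return (day - DAY_ZERO).days


def linear_trend(day):
    """Fitted number of deaths from the linear trend model."""
    return B0 + B1 * time_index(day)


first_day = linear_trend(date(2020, 2, 15))
last_day = linear_trend(date(2020, 4, 30))
print(time_index(date(2020, 4, 30)), round(first_day, 3), round(last_day, 3))
```

The fitted values stay negative for the first eleven days of the series, which illustrates the same limitation noted for b0: a straight line cannot represent a count that starts at zero and then accelerates.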

Quadratic Trend

Date posted: 10/05/2022, 08:49
