(ESSAY) Best trend model for predicting the world's number of deaths due to Covid-19

DOCUMENT INFORMATION

Basic information

Title: Best trend model for predicting the world's number of deaths due to Covid-19
Supervisor: Mr. Nguyen Thanh Liem
School: RMIT International University Vietnam
Major: Business Statistics
Type: assignment report
Year of publication: 2021
City: Ho Chi Minh City
Number of pages: 26
File size: 1.21 MB

Content

RMIT International University Vietnam ECON1193B – BUSINESS STATISTICS 1 Assignment 3: Team Report (35%)

TABLE OF CONTENTS

Course name Business Statistics 1

Assignment #3: Team Report

Members & sID

Nguyen Thu Huong s3878242
Nguyen Thi Thuy Trang s3892060
Nguyen Dinh Khanh Uyen s3891815
Tran Lam Kim Ngan s3891636
Nguyen Ton Minh Nhat s3878695

Word count 3964 (content only)

Number of Pages 26

PART 1: DATA COLLECTION
PART 2: DESCRIPTIVE STATISTICS
1 Measures Of Central Tendency
2 Measures Of Variation
3 Measures Of Shape
PART 3:
REGION A: NORTH AMERICA
REGION B: ASIA
PART 4: TEAM REGRESSION CONCLUSION
PART 5: TIME SERIES
REGION A – NORTH AMERICA
REGION B – ASIA
PART 6: TIME SERIES CONCLUSION
1 Line Chart
2 Best Trend Model For Predicting The World's Number Of Deaths Due To Covid-19
PART 7: OVERALL TEAM CONCLUSION
1 The Main Factor That Impacts The Number Of Deaths Due To Covid-19?
2 Predict The Number Of Deaths On October 31
3 The Prediction In The Number Of Covid-19 Deaths At The End Of 2021
4 Two Variables Might Impact The Global Deaths Case
REFERENCES
APPENDIX

PART CONTRIBUTION

contributed %

Part 1: Data collection

The data analyzed in this report were collected from 20 countries in Asia and 20 countries in North America, organized into six categories in the Excel file. These figures represent each country's population (in millions) in 2021, the estimated total number of deaths from Covid-19 from April 1 to July 31, 2021 (per million population), the average rainfall (in millimeters) and average temperature (in degrees Celsius) from 1991 to 2020, and the total number of hospital beds and medical doctors (per 10,000 people) in 2017. The figures were collected from highly reliable sources: the World Bank database, Our World in Data, the World Health Organization (WHO), and government sources. Because some countries are missing information, this report analyzes only 20 countries in each region, so that every country has complete data for the remaining parts.

Part 2: Descriptive Statistics

1 Measures of Central Tendency

Table 1: Test of outliers of total deaths (per million population) due to COVID-19 in Asia from April 1st to July 31st, 2021

To compute the descriptive statistics properly, each region's data are first tested for outliers. The table reveals no outliers in Asia's data, while there is one in North America's data.

Type of Measures    Region A: Asia    Comparison (>, <, =)    Region B: North America

Table 2: Measures of Central Tendency of total deaths (per million population) due to COVID-19 in two regions from April 1st to July 31st, 2021

Table 2 shows that neither region has a Mode. Outliers in the sample influence the Mean (Frost n.d.), whereas the Median is not affected by extreme values (Frost n.d.). The Median is therefore the better measure for comparing and evaluating the overall death statistics. Asia had a median of 49.58 COVID-19 deaths per million population, versus 75.84 in North America. COVID-19 has a significant impact on both regions because of climate change (Sandoiu 2020).
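The contrast between the Mean and the Median can be illustrated with a minimal sketch. The values below are hypothetical deaths-per-million figures invented for this illustration; they are not taken from the report's dataset.

```python
from statistics import mean, median

# Hypothetical deaths-per-million values, invented for illustration only.
sample = [12.0, 35.6, 49.6, 80.0, 95.3]

# The same sample with one extreme value appended.
with_outlier = sample + [900.0]

# How far each measure moves once the extreme value is included.
mean_shift = mean(with_outlier) - mean(sample)
median_shift = median(with_outlier) - median(sample)

print(f"mean shift:   {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
```

In this sketch the Mean moves far more than the Median when a single extreme value is added, which is the property the paragraph relies on.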

2 Measures of Variation

Type of Measures    Region A: Asia    Comparison (>, <, =)    Region B: North America

Table 3: Measures of Variation of total deaths (per million population) due to COVID-19 in two regions from April 1st to July 31st, 2021

The Interquartile Range (IQR) is the best choice for comparing the Asian and North American datasets since it removes the impact of outliers. Because the IQR measures the spread of the middle 50% of the data (the distance between the upper and lower quartiles), it is well suited to skewed distributions (Bhandari 2020). Table 3 indicates an IQR of 77.76 deaths per million population for North America versus 44.38 for Asia. Relative to each region's Median, the two IQRs are of similar size.
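A minimal sketch of the IQR calculation follows. The function name `iqr_summary` and the 1.5 × IQR outlier fences are assumptions for illustration: the report does not state which outlier test or quartile method it used, so the "inclusive" method here is only one common choice. The last two lines recompute the IQRs from the quartiles quoted in the box plot discussion, which agree with Table 3 to within rounding.

```python
from statistics import quantiles

def iqr_summary(values):
    """Return quartiles, IQR, and 1.5*IQR fences (an assumed outlier rule)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": q1 - 1.5 * iqr,  # values below this would be flagged
        "upper_fence": q3 + 1.5 * iqr,  # values above this would be flagged
    }

# Small hypothetical sample, not the report's data.
summary = iqr_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])

# IQR recomputed from the quartiles quoted in the report (Q3 - Q1).
na_iqr = round(125.03 - 47.26, 2)
asia_iqr = round(80.01 - 35.63, 2)
```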

3 Measures of Shape

• Box-and-whisker Plot


Figure 1: Box and whisker plots of total deaths (per million population) due to COVID-19 in Asia from April 1st to July 31st, 2021

Table 4: Comparison of Asia's COVID-19 death rate

Figure 2: Box and whisker plots of total deaths (per million population) due to COVID-19 in North America from April 1st to July 31st, 2021

Table 5: Comparison of North America's COVID-19 death rate

This plot is preferred because outliers are visible on the chart. Although their death rates differ, Asia and North America appear to have similarly shaped distributions: both are asymmetric and positively (right) skewed. The box for North America, from 47.26 to 125.03, is wider and sits higher than the box for Asia (35.63 to 80.01). COVID-19 reportedly impacted North America more than Asia, according to one report.

Part 3:

Region A: North America

1 Regression Analysis:

Following backward elimination, we found that the average temperature (in degrees Celsius) from 1991 to 2020 and the population (in millions) in 2021 are the two significant independent variables at the 0.05 level of significance in the final model.

Figure 3 FINAL regression model of Region A: NA

2 Regression Equation

In this equation, the variables and their units are:

• Estimated total number of deaths due to COVID-19 (per million population)

• Population (in millions) in 2021

• Average temperature (in degrees Celsius) from 1991 to 2020

3 Regression coefficients:

Slope for average temperature = 0.01: the total number of deaths due to COVID-19 in NA rises by 0.01 deaths per million population for every 1 degree Celsius increase in average temperature, assuming the population remains constant.

Intercept = 0.399: under the assumption that the average temperature is 0 degrees Celsius and the population is zero, the estimated total number of deaths due to COVID-19 in NA between April 1 and July 31, 2021 is 0.399 per million population. This implies that about 0.399 deaths per million are independent of average temperature.

Slope for population = 9.03: the total number of deaths due to COVID-19 in NA between April 1 and July 31, 2021 rises by 9.03 deaths per million population for every 1 million increase in the country's population, assuming the average temperature remains constant.

4 Coefficients of determination:

According to the coefficient of determination (R Square = 0.39), variations in average temperature and population account for 39% of the variation in total COVID-19 deaths in NA between April 1 and July 31, 2021, while other factors account for the remaining 61%. The regression model therefore has moderate goodness of fit and explanatory power: 61 percent of the variation in deaths is not explained by the regression line.
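How the coefficient of determination splits variation into explained and unexplained shares can be sketched as follows. The function `r_squared` and the sample numbers are illustrative assumptions; the report's own value comes from its Excel regression output, which is not reproduced here.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SSE / SST: the share of variation explained by the model."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in actual)                # total
    return 1 - sse / sst

# Hypothetical observations and fitted values, for illustration only.
example_r2 = r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5])

# With the reported R Square of 0.39, the unexplained share is 1 - 0.39.
unexplained_share = 1 - 0.39
```

Read this way, an R Square of 0.39 describes the proportion of variation accounted for by the model, not the proportion of points lying on the regression line.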

Region B: Asia

1 Regression Analysis:

According to backward elimination, the linear regression for Asia has just one significant variable, hospital beds, at the 0.05 level of significance.

Figure 4 FINAL regression model of Region B: Asia

2 Regression Equation

In this equation, the variables and their units are:

• Estimated total number of deaths due to COVID-19 (per million population)

• Hospital beds (per 10,000 people)

3 Regression coefficients:

With the Y-intercept $b_0 = 0.465$, when the number of hospital beds is zero, the estimated number of deaths is 0.465. That implies about 0.465 deaths do not depend on hospital beds.

The slope coefficient for hospital beds, $b_1 = 0.01$, indicates that for every additional hospital bed per 10,000 people, the total number of deaths rises by 0.01.

→ Hospital beds significantly affect the total number of deaths caused by COVID-19 in Asian countries. The relationship is positive: a greater number of hospital beds corresponds to a greater number of COVID-19 deaths.

However, such a positive relationship between deaths and hospital beds is unrealistic in practice, so this conclusion does not make sense.
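The fitted line implied by the two coefficients quoted above can be evaluated with a short sketch. The function name `predict_deaths_asia` is hypothetical, and the output is assumed to be in the same units as the dependent variable listed for the equation.

```python
B0 = 0.465  # intercept quoted in the text
B1 = 0.01   # slope per hospital bed per 10,000 people, quoted in the text

def predict_deaths_asia(beds_per_10k):
    """Evaluate the simple linear regression line at a given bed count."""
    return B0 + B1 * beds_per_10k

# At zero beds the prediction equals the intercept.
at_zero = predict_deaths_asia(0)

# Each extra bed per 10,000 people adds B1 to the prediction.
difference_for_20_beds = predict_deaths_asia(20) - predict_deaths_asia(0)
```

Given that this model explains very little of the variation in deaths, predictions from this line should be read with caution.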

4 Coefficients of determination:

R Square = 0.02 indicates that hospital beds explain approximately 2% of the variation in the total number of COVID-19 deaths. Other variables, on the other hand, account for 98% of that variation.

Part 4: Team Regression conclusion

Part 3 shows that the two regions' regression models have different significant independent variables. In North America, two independent variables, average temperature (in degrees Celsius) from 1991 to 2020 and population (in millions) in 2021, can be used to explain changes in the total deaths due to COVID-19 between April 1 and July 31, 2021; in Asia, only one significant independent variable, the number of hospital beds per 10,000 people, can be used.

The significant independent variables explain 39 percent of the total variation in deaths due to COVID-19 in NA; comparing the regression coefficients of the two variables, the report finds that average temperature has a larger positive effect on the number of COVID-19 deaths in the period than population. In Asia, by contrast, only 2% of the overall variation in the total number of COVID-19 deaths can be explained by the number of hospital beds in our research.

Comparing central tendency by the Median and skewness by the box plot, Part 2 finds that the number of deaths due to COVID-19 in NA is considerably higher than in Asia. Because the regression coefficients for NA are larger in absolute value than the coefficient for Asia, total COVID-19 deaths in NA are, in our study, more affected by the significant independent variables than those in Asia. Finally, North America has been hit harder than Asia by the pandemic.

Part 5: TIME SERIES

REGION A – North America

Linear trend Quadratic trend Exponential trend

Table 6: P-value and $R^2$ of the three trend models.

1 Trend Models

Through calculation, the p-values of the linear, quadratic, and exponential trends are collected in the table above. In the next stage, those numbers are used to examine the trends in the number of people who died from Covid-19 from April 1st, 2021 to July 31st, 2021.

$H_0: \beta_1 = 0$ (There is no trend in the total number of deaths in North America)

$H_1: \beta_1 \neq 0$ (There is a trend in the total number of deaths in North America)

Linear trend: According to the table above, the p-value is $2.474 \times 10^{-29} < \alpha = 0.05$, so $H_0$ is rejected. Therefore, with 95% confidence, North American nations are seeing a linear trend in the daily number of deaths caused by COVID-19.

Quadratic trend: Based on Table 6, we reject $H_0$ in favor of $H_1$ because the p-value (0.000209075) is lower than $\alpha$ (0.05). As a result, with 95% confidence, there is a quadratic trend in the number of COVID-19 deaths per day in North America from April 1st to July 31st.

Exponential trend: We reject $H_0$ because the p-value $= 2.373 \times 10^{-31} < \alpha = 0.05$ (Table 6). Thus, with 95% confidence, North American countries had an exponential trend in the daily deaths caused by COVID-19 from April 1st to July 31st.

• Regression Output – QUA

Figure 5: Quadratic trend regression output of North America.

The figure above shows the regression output for North American nations (Region A). Comparing the coefficient of determination ($R^2$) across the three trend models identifies the best model for forecasting the daily number of new deaths due to COVID-19. Out of the three trend models, the Quadratic trend model is the most effective option for estimating future deaths in North American nations since it has the smallest error.

• Formula & Coefficient Explanation

$\hat{Y} = 2.6935 - 0.0255\,T + 0.0001\,T^2$

$\frac{d\hat{Y}}{dT} = -0.0255 + 0.0001 \times 2T$

When $T = 0$, $\hat{Y} = b_0 = 2.6935$, the estimated number of COVID-19 deaths per day across North American nations. However, because $T = 0$ is an unobservable value, this interpretation is illogical.

In contrast, the slope $\frac{d\hat{Y}}{dT} = -0.0255 + 0.0001 \times 2T$ shows that as time passes and $T$ rises, the number of COVID-19 deaths across North America changes by $-0.0255 + 0.0001 \times 2T$ per day.
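The quadratic trend for North America and its rate of change can be evaluated directly, as in the sketch below. The function names are hypothetical, and the coefficients are the rounded values printed in the formula, so results are approximate.

```python
def na_trend(t):
    """Quadratic trend for North America, using the rounded coefficients."""
    return 2.6935 - 0.0255 * t + 0.0001 * t ** 2

def na_trend_slope(t):
    """Derivative of the trend with respect to T (change per time period)."""
    return -0.0255 + 0.0001 * 2 * t

# Check the analytic slope against a central finite difference at T = 50.
h = 1e-3
numeric_slope = (na_trend(50 + h) - na_trend(50 - h)) / (2 * h)
analytic_slope = na_trend_slope(50)
```

Evaluating `na_trend_slope` at different values of T shows how the daily change itself varies over time, which is what distinguishes the quadratic model from a linear trend with a constant slope.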

2 Recommended Trend Model

Table 7: SSE & MAD results of the three trend models.

To evaluate which trend model is the best choice for estimating the number of COVID-19 deaths in North America, we created a table of two error measures: SSE and MAD. As the table above demonstrates, the Exponential trend model produces the smallest SSE and MAD. Therefore, the Exponential trend model appears to be the most suitable for forecasting the number of deaths caused by COVID-19 in North America.
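The two error measures can be sketched as follows, taking SSE as the sum of squared errors and MAD as the mean absolute deviation between observed and fitted values. The function names and the sample numbers are illustrative assumptions, not the figures from Table 7.

```python
def sse(actual, fitted):
    """Sum of squared errors between observed and fitted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted))

def mad(actual, fitted):
    """Mean absolute deviation between observed and fitted values."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)

# Hypothetical observations and two hypothetical sets of fitted values.
observed = [2.0, 4.0, 6.0]
fits = {
    "model_a": [1.0, 4.0, 8.0],
    "model_b": [2.5, 3.5, 6.5],
}

# Score each candidate and pick the one with the smallest SSE.
scores = {name: (sse(observed, f), mad(observed, f)) for name, f in fits.items()}
best_by_sse = min(scores, key=lambda name: scores[name][0])
```

Because SSE squares each error, a few large misses weigh on it more heavily than on MAD, so the two measures can rank models differently.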

3 Prediction for the number of COVID-19 deaths

$H_0: \beta_1 = 0$ (There is no trend in the total number of deaths in Asia)

$H_1: \beta_1 \neq 0$ (There is a trend in the total number of deaths in Asia)

Linear Trend: Since the p-value is larger than $\alpha$ (0.852 > 0.05) (Table 9), we do not reject $H_0$. Hence, at the 95% confidence level, there is no evidence of a linear trend in the total number of daily deaths due to COVID-19 in Asia from April 1 to July 31, 2021.

Quadratic Trend: As can be seen from the table above, the p-values of both $T$ ($4.191 \times 10^{-14}$) and $T^2$ ($1.312 \times 10^{-14}$) are smaller than $\alpha$ (0.05). Consequently, we reject $H_0$. With 95% confidence, the total number of daily COVID-19 deaths in Asia between April 1 and July 31, 2021 has a quadratic trend.

Exponential Trend: As $\alpha$ is lower than the p-value (0.05 < 0.146), we do not reject $H_0$. At the 95% confidence level, the total number of daily deaths due to COVID-19 in Asia from April 1 to July 31, 2021 does not follow an exponential trend.

• According to the findings above, the total number of daily deaths due to COVID-19 in Asian countries follows a quadratic trend model. In addition, the smaller the $R^2$, the less well the regression model fits the observations (Statistics By Jim n.d.). The fact that the highest $R^2$ comes from the Quadratic trend model (Table 9) reinforces our conclusion that Quadratic is the significant trend model for the total number of COVID-19 deaths in Asia.
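The decision rule applied to the three Asia trend tests can be sketched in a few lines, using the p-values quoted in the text. The helper `has_significant_trend` is hypothetical; requiring every listed p-value to fall below the significance level mirrors how the quadratic case is argued here and is an assumption about the general rule.

```python
ALPHA = 0.05  # significance level used throughout the report

def has_significant_trend(p_values, alpha=ALPHA):
    """Reject H0 (no trend) only if every listed p-value is below alpha."""
    return all(p < alpha for p in p_values)

# P-values for Asia as quoted in the text.
asia_p_values = {
    "linear": [0.852],
    "quadratic": [4.191e-14, 1.312e-14],  # T and T-squared terms
    "exponential": [0.146],
}

decisions = {
    name: has_significant_trend(p) for name, p in asia_p_values.items()
}
significant_models = [name for name, ok in decisions.items() if ok]
```

Under this rule only the quadratic model is flagged as significant for Asia, matching the conclusion drawn in the text.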

• Regression Output – QUA

Figure 6: Quadratic trend regression output of the total number of daily COVID-19 deaths in Asia.

Formula & Coefficient Explanation

$\hat{Y} = 0.4017 + 0.0192\,T - 0.0002\,T^2$

$\frac{d\hat{Y}}{dT} = 0.0192 - 0.0002 \times 2T$

When $T = 0$, $\hat{Y} = b_0 = 0.4017$, the estimated total number of daily COVID-19 deaths in Asia. Nevertheless, $T = 0$ is a value that cannot be observed, so this interpretation is unreasonable.

Conversely, the slope $\frac{d\hat{Y}}{dT} = 0.0192 - 0.0002 \times 2T$ indicates that as each day goes by and $T$ increases, the total number of daily COVID-19 deaths in Asia changes by $0.0192 - 0.0002 \times 2T$.
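As a worked illustration of the Asia formula, the sketch below evaluates the trend and its slope, and finds the value of T at which the slope is zero (where the fitted curve stops rising and begins to fall). The names are hypothetical and the coefficients are the rounded ones from the formula, so the turning point is only approximate.

```python
B0 = 0.4017   # intercept from the formula
B1 = 0.0192   # coefficient on T
B2 = -0.0002  # coefficient on T squared

def asia_trend(t):
    """Quadratic trend for Asia, using the rounded coefficients."""
    return B0 + B1 * t + B2 * t ** 2

def asia_trend_slope(t):
    """Derivative of the trend with respect to T."""
    return B1 + 2 * B2 * t

# The slope is zero where B1 + 2 * B2 * T = 0.
turning_point = -B1 / (2 * B2)
```

With these rounded coefficients the slope is positive before roughly T = 48 and negative afterwards; because the negative T-squared term dominates for large T, forecasts far beyond the sample period should be treated cautiously.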

2 Recommended Trend Model

Table 10: SSE & MAD results of the three trend models.

To find the best trend model for forecasting COVID-19 deaths in Asia, we made a table comprising two error measures: SSE and MAD. The Exponential trend model has the lowest SSE, whereas the Quadratic trend model has the lowest MAD. SSE is directly affected by the presence of outliers; however, as no outliers are found in our dataset (Appendix 1), SSE can be relied on here, and Exponential is the most suitable trend model for forecasting the number of deaths due to COVID-19 in Asia.

3 Prediction for the number of COVID-19 deaths

Table 11: Correspondence Time Period.

As the Exponential trend model is confirmed to be the best trend model, its formula will be used to forecast the number of COVID-19 deaths in Asia on September 28, September 29, and September 30, 2021, respectively:
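The mapping from calendar dates to time periods, and the general shape of an exponential forecast, can be sketched as below. Two assumptions are made: April 1, 2021 is taken as T = 1 (consistent with T = 0 being described as unobservable), and the exponential trend is taken in the common base-10 form log10(Ŷ) = b0 + b1·T. The fitted coefficients are not given in this excerpt, so `b0` and `b1` are left as parameters rather than filled in.

```python
from datetime import date

START = date(2021, 4, 1)  # assumed to correspond to T = 1

def time_index(day):
    """Number of the time period for a calendar date (April 1, 2021 -> 1)."""
    return (day - START).days + 1

def exponential_forecast(t, b0, b1):
    """Forecast from an assumed log10-linear trend: Y = 10 ** (b0 + b1 * t)."""
    return 10 ** (b0 + b1 * t)

# Time periods for the three forecast dates named in the text.
forecast_periods = {
    d.isoformat(): time_index(d)
    for d in (date(2021, 9, 28), date(2021, 9, 29), date(2021, 9, 30))
}
```

Once the fitted coefficients from the regression output are known, each forecast is obtained by passing the corresponding period and those coefficients to `exponential_forecast`.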

Date posted: 02/12/2022, 06:20
