DOCUMENT INFORMATION

Basic information

Title: Data Cleaning and Descriptive Statistics of World Bank Dataset 2004
Supervisor: Teck Lee Yap (Stanley)
School: RMIT International University Vietnam
Major: Business Statistics
Type: Assignment
Year of publication: 2004
City: Ho Chi Minh City
Number of pages: 29
File size: 2.64 MB

RMIT International University Vietnam

ECON1193 - Business Statistics 1

PART 1:

For this part of the assignment, our team collected nine variables from the World Bank dataset for the year 2004: GDP per capita growth rate (annual %); Life expectancy at birth, total (years); GNI per capita, Atlas method (current US$); GDP per capita (current US$); Foreign direct investment, net inflow (% of GDP); Exports of goods and services (% of GDP); Imports of goods and services (% of GDP); Trade (% of GDP); and Population (ages 15-64 (total) years).

We initially collected data for over 55 countries from both regions. The cleaning process is straightforward: any country with a missing variable is removed. The following pictures show how the data were cleaned.

Figure 1: How to download data from the World Bank

Figure 2: World Bank data of GDP (growth annual %)

Once the countries were chosen, we downloaded the file in Excel format, as shown in the picture above.

The raw data in the downloaded Excel file looked messy, so we transformed it into a layout that is much easier to read.

Figure 3: World Bank missing information

Then, as shown in the picture above, we marked in red any country with a missing variable so that we knew which countries should be deleted.

Figure 4: Chosen country data

Finally, after removing all of these countries, we finalised the list of chosen countries, as shown above.

PART 2: DESCRIPTIVE STATISTICS

1 Measures of Central Tendency:

Q3 + 1.5*IQR = 7.1876624645

6.489599681 < 7.1876624645 => Max < Q3 + 1.5*IQR => Max is not an outlier in upper values

Q1 - 1.5*IQR = -2.1719459235

-6.10287512 < -2.1719459235 => Min < Q1 - 1.5*IQR => Min is an outlier in lower values
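The outlier check above can be sketched in a few lines of Python. This is only an illustration of the 1.5*IQR rule: the function names are hypothetical, and the fences and extreme values are the ones quoted in the text.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value: float, lower: float, upper: float) -> bool:
    """A value is flagged when it lies below the lower or above the upper fence."""
    return value < lower or value > upper


# Fences and extremes as reported in the text above.
LOWER, UPPER = -2.1719459235, 7.1876624645
print("max is outlier:", is_outlier(6.489599681, LOWER, UPPER))
print("min is outlier:", is_outlier(-6.10287512, LOWER, UPPER))
```

When the quartiles themselves are available, `tukey_fences(q1, q3)` produces the two fences directly.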

European countries' average GDP per capita growth rate (mean) in 2004 was much higher than that of African countries (4.218861687% > 1.958022018%). Due to the existence of outliers, the mean is no longer the best measure of central tendency. Meanwhile, no mode could be identified for either European or African countries. Therefore, the median is the best measure for analyzing the GDP per capita growth rate (annual %) of European and African countries, since it is not affected by outliers. From Table 1, European countries have a higher median than African ones (almost 3.93% compared to about 3%), so it can be said that European countries had a higher GDP per capita growth rate (annual %) than countries in Africa.
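To illustrate why the median is preferred when outliers are present, the short sketch below compares the mean and the median before and after a single extreme value is added. The numbers are hypothetical and are not taken from the assignment's dataset.

```python
from statistics import mean, median

# Hypothetical growth rates (%), for illustration only.
rates = [2.0, 2.5, 3.0, 3.5, 4.0]
with_outlier = rates + [-6.0]   # add one extreme low value

print("without outlier:", mean(rates), median(rates))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

The mean moves much further than the median once the extreme value is included, which is the behaviour the paragraph above relies on.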

The interquartile range of European countries is larger than that of African countries (3.876496621 > 2.339902097). As for the coefficient of variation, the result for European countries is considerably lower than for African countries: almost 59.5% compared to nearly 160%. The coefficients of variation for both regions are fairly high, implying that the data are widely dispersed around the mean in both regions. In other words, the GDP per capita growth rates recorded in 2004 for African and European countries varied by considerable margins.
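As a reminder of how the coefficient of variation is obtained, here is a minimal sketch assuming the usual definition (sample standard deviation divided by the mean, times 100). The function name and the sample values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample CV in percent: (sample standard deviation / mean) * 100."""
    return stdev(values) / mean(values) * 100


# Hypothetical values, for illustration only.
sample = [1.0, 2.0, 3.0]
print(round(coefficient_of_variation(sample), 1))
```

Note that the measure is not meaningful when the mean is zero or very close to zero.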

Figure 1: Box and Whisker graph of GDP per capita growth rates (annual %) of 2 country categories in 2004

According to Figure 1, the most obvious feature is that African countries had the lowest GDP per capita growth rates, whereas European countries dominated the upper end. The box plot of European countries is right-skewed, whereas the box plot of African countries is left-skewed. No European country has a negative GDP per capita growth rate, whereas the minimum GDP per capita growth rate among African countries is -6.10287512 (Table 3). All five values for European countries (min, Q1, median, Q3 and max) are higher than those for African countries.

PART 3: MULTIPLE REGRESSION (2004)

In this case, we use backward elimination to analyze the regression of Region A (Europe). The final regression model after applying backward elimination includes only the variable(s) that are significant at the 5% level.
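The backward elimination procedure can be sketched as a simple loop. This is only an outline of the control flow under stated assumptions: the model fitting is delegated to a hypothetical callback `fit_p_values` (in practice this would be a regression tool that returns one p-value per predictor), and the p-values in `fake_fit` are invented for illustration.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    predictors: List[str],
    fit_p_values: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the least significant predictor until all p-values are below alpha."""
    remaining = list(predictors)
    while remaining:
        p_values = fit_p_values(remaining)        # refit on the current set
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] < alpha:               # everything left is significant
            break
        remaining.remove(worst)                   # remove one variable, then refit
    return remaining


def fake_fit(names: List[str]) -> Dict[str, float]:
    # Hypothetical fixed p-values; a real fit would recompute them each round.
    table = {"life_expectancy": 0.01, "population": 0.03, "trade": 0.40, "fdi": 0.20}
    return {name: table[name] for name in names}


print(backward_eliminate(["life_expectancy", "population", "trade", "fdi"], fake_fit))
```

Only one variable is removed per round because the p-values of the remaining variables can change after each refit.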

1 Regression Output and Scatter Plots

• Region A: Europe

Figure X: Final regression model of Europe

Figure Y: Line fit plot of GDP per capita growth rate (annual %), actual and predicted, against Life expectancy at birth, total (years)

As shown in Figure Y:

• The Life expectancy at birth, total (years) values recorded in 2004 for European countries in the dataset were all higher than 60 years. The points were quite close to one another, showing that the variations in life expectancy at birth among European nations were not very large.

• The trendline had a decreasing slope, indicating a negative relationship between GDP per capita growth rate and life expectancy at birth.

Figure Z: The scatter plot of GDP per capita growth rate (annual %) and Population (ages 15-64 (total) years) of European countries

As shown in Figure Z:

• Most European countries recorded Population (ages 15-64 (total) years) values lower than 60 million, whereas two of them were outliers with more than 100 million.

• The trendline had an increasing slope, indicating a positive relationship between GDP per capita growth rate and Population.

From both Figures Y and Z, we can see that no European country recorded a negative GDP per capita growth rate, as all the points are greater than 0. The points were also quite far from each other, notably with two outliers at almost 11% to 12% GDP per capita growth.

• Region B: Africa

Figure 1: Final regression model of Africa

Figure 2: Line fit plot of GDP per capita growth rate (annual %), actual and predicted, against Foreign direct investment, net inflow (% of GDP)

As shown in Figure 2:

• The majority of African countries had positive Foreign direct investment, net inflow (% of GDP), while one had a negative value. The points were mostly distributed between 0 and 6 percent of GDP, which means that there are some gaps in Foreign direct investment, net inflow among African nations.

• As with Foreign direct investment, net inflow (% of GDP), the GDP per capita growth rates recorded by African countries are positive, whereas one of them had a negative result. The values are quite close to each other, indicating that the differences in GDP per capita growth rate between countries in this region were not very large.

• The trendline was going upward, which means that the relationship between GDP per capita growth rate and Foreign direct investment, net inflow (% of GDP) was positive.

2 Regression Equation

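The regression equations are built as below, where b0, b1 and b2 are the coefficients from the final regression output of each region:

Region A: Europe

Y^ = b0 + b1*X1 + b2*X2

Where:

Y^: GDP per capita growth (annual %)

X1: Life expectancy at birth, total (years)

X2: Population (ages 15-64 (total) years)

Region B: Africa

Y^ = b0 + b1*X1

Where:

Y^: GDP per capita growth (annual %)

X1: Foreign direct investment, net inflow (% of GDP)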

For Europe, the coefficient of determination (R² = 0.668) means that 66.8% of the variation in European countries' GDP per capita growth rate can be explained by the variation in their Life expectancy at birth and Population (ages 15-64).

Region B: Africa

Similarly, for Africa, R² = 0.407 states that 40.7% of the variation in GDP per capita growth rate can be explained by the variation in Foreign direct investment, net inflow (% of GDP).
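For reference, the coefficient of determination can be computed from actual and predicted values as 1 - SSE/SST. The sketch below assumes that standard definition; the function name and the sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SSE/SST, with SST measured around the mean of `actual`."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))   # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in actual)                 # total variation
    return 1 - sse / sst


# Hypothetical values: a perfect fit, and a fit that always predicts the mean.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

A value of 0.668 therefore corresponds to SSE being 33.2% of SST.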

PART 4: TEAM REGRESSION CONCLUSION

• Looking at both regression models for Africa and Europe, we can conclude that for Europe the most significant independent variables are Life expectancy and Population, while for Africa it is Foreign direct investment.

• Based on our scatter plots and some academic research, it is safe to say that African countries experience higher economic growth compared to European countries, due to the fact that over the past decades Africa has increased its trade with the rest of the world by 200% (Ighobor 2012).

• The models for Europe in 2004 showed a positive relationship between GDP per capita growth rate and Population, while the scatter plot for life expectancy at birth and GDP per capita growth rate shows a negative relationship. For African countries, on the other hand, the relationship between Foreign direct investment, net inflow (% of GDP) and GDP per capita growth rate is positive.

• The regression model for Europe shows a high coefficient of determination. By implementing the backward elimination process, only variables that are significant at the 5% level are included.

• The regression model of Africa does not include Life expectancy at birth, total (years), GNI per capita, Atlas method (current US$), GDP per capita (current US$), Exports of goods and services (% of GDP), Imports of goods and services (% of GDP), Trade (% of GDP) and Population (ages 15-64 (total) years). These variables did not reach significance at the 5% level, although some p-values were close to it.

PART 5: TIME SERIES

Linear (LIN), quadratic (QUA) and exponential (EXP) trends of four countries (South Africa, Congo, the Netherlands and Moldova)

Region B: Africa

Low income country: Congo (1990-2015)

b1 = 10.7386 is the estimated increase in Congo's GDP each year

2 Quadratic regression trend (QUA)

b2 = -1.28, so 2*b2 = -2.56 is the estimated amount by which Congo's yearly GDP change falls each year

3 Exponential trend (EXP)

A Regression output:

B Formula & Coefficient explanation:

Formula:

Linear format: log(Y^) = 2.154 + 0.016*T

Non-linear format: Y^ = 142.56*1.03^T

b1 = 1.03 => annual compound growth rate = (1.03 - 1)*100 = 3%

This is the estimated GDP growth of Congo every year.

Middle Income country: South Africa (1990-2015)

b0 = 183.16 is the estimated GDP of South Africa when T = 0

=> This does not make sense because T = 0 is not included in the range

b1 = 183.166 means that, on average, the GDP of South Africa (1990-2015) is estimated to increase by $183.166 every year

2 Quadratic regression trend (QUA)

b2 = 7.028, so 2*b2 = 14.056 is the estimated amount by which South Africa's yearly GDP change rises each year

3 Exponential regression trend (EXP)

A Regression output:

B Formula & Coefficient explanation:

Formula: Linear format: log(Y^) = 3.418 + 0.0165*T

Non-linear format: Y^ = 2618*1.03^T

Annual growth rate: (1.03 - 1)*100 = 3%

South Africa's GDP will increase by 3% each year

High income country: Netherlands

b1 = 1486.5 is the estimated increase in GDP for each time period T

2 Quadratic regression trend (QUA):

B Formula & Coefficient explanation:

Formula: Linear format: log(Y^) = 4.305 + 0.0179*T

Non-linear format: Y^ = 20183*1.039^T

Annual growth rate: (1.039 - 1)*100 = 3.9%

The Netherlands' GDP will increase by 3.9% each year

Low-Middle Income country: Moldova

1 Linear regression trend (LIN)

A Regression trend output

B Formula & Coefficient explanation:

Formula: Y^ = -246.4711 + 158.653*T

Coefficient explanation:

b0 = -246.4711 is the estimated GDP when T = 0. This does not make sense because T = 0 is not included in the range, so it is not related to the trend.

b1 = 158.653 is the estimated increase in GDP for each time period T

2 Quadratic regression trend (QUA)

b2 = 8.6, so 2*b2 = 17.2 is the estimated amount by which Moldova's yearly GDP change rises each year

3 Exponential regression trend (EXP)

A Regression output

B Formula & Coefficient explanation:

Formula: Linear format: log(Y^) = 2.5206 + 0.049*T

Non-linear format: Y^ = 331.5*1.119^T

Annual growth rate: (1.119 - 1)*100 = 11.9%

Moldova's GDP will increase by 11.9% each year
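The conversion from the linear (log) format to the non-linear format can be checked with a short calculation. This sketch assumes that log denotes the base-10 logarithm, which is consistent with the Moldova figures; the function name is hypothetical.

```python
def exponential_trend(log_b0: float, log_b1: float):
    """Convert log10(Y^) = log_b0 + log_b1*T into Y^ = b0 * b1**T.

    Returns (b0, b1, annual compound growth rate in percent).
    """
    b0 = 10 ** log_b0
    b1 = 10 ** log_b1
    growth_pct = (b1 - 1) * 100
    return b0, b1, growth_pct


# Moldova's coefficients as quoted in the text above.
b0, b1, growth = exponential_trend(2.5206, 0.049)
print(b0, b1, growth)
```

The same function can be applied to the log-format coefficients of the other three countries to check their growth rates.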

Time series forecast: after calculating both SSE and MAD for all three trend types, the trend with the smallest SSE and MAD was used for each prediction:

• Quadratic regression trend

South Africa GDP Prediction:

• Linear regression trend

Netherlands GDP Prediction:
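The model selection step can be sketched as follows, assuming SSE is the sum of squared errors and MAD is the mean absolute deviation between actual and fitted values. The function names and the sample numbers are hypothetical and do not come from the assignment's data.

```python
def sse(actual, predicted):
    """Sum of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mad(actual, predicted):
    """Mean absolute deviation."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def best_model(actual, fitted_by_model):
    """Return the model name with the smallest (SSE, MAD) pair; SSE is compared first."""
    return min(
        fitted_by_model,
        key=lambda name: (sse(actual, fitted_by_model[name]), mad(actual, fitted_by_model[name])),
    )


# Hypothetical actual values and fitted values from two trend models.
actual = [10.0, 12.0, 15.0]
fitted = {"LIN": [9.0, 12.0, 16.0], "QUA": [10.0, 12.0, 14.0]}
print(best_model(actual, fitted))
```

If SSE and MAD disagree about the best model, this sketch follows SSE; a different tie-breaking rule may be more appropriate depending on the assignment's guidelines.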

PART 6: Time series Conclusion:

Figure 1: Three low- and middle-income countries: South Africa, Moldova and Congo, Dem. Rep.

Description: As the graph shows, the three countries above have different income levels:

Low income: Congo, Dem. Rep. is a country with a low GDP, below 1000. From 2002 to 2015 there was an upward trend and, according to the above prediction, it will continue to increase by 8% in each of 2017, 2018 and 2019.

Low-middle income: Moldova grew rapidly from 2004 to 2014. Based on calculated projections, it will continue to grow at 7% per year in 2017, 2018 and 2019.

Middle income: South Africa's GDP decreased and reached its lowest value of the 1990-2015 period in 2002. It then gradually increased until 2012, after which it tended to decrease again. However, with the above prediction, from 2017 to 2019 there will be an upward trend in GDP with a growth rate of about 5%.

Figure 2: High income country: Netherlands

High income country (Netherlands): growth was rapid from 2000 to 2007 and, as predicted above, GDP will continue to grow by 3% in 2018 and 2% in 2019.

Conclusion: Both Region A and Region B follow the same trend line: the similarity is that they all grew rapidly from 2000 onwards.

Predicted world trend: After comparison and analysis, the SSE and MAD of the quadratic regression trend of Congo are the lowest.

• World's quadratic model: Y^ = 282.79 - 24.01*T + 1.28*(T^2)
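As a rough sketch, the quadratic model can be evaluated for any year once the year is converted to a T value. The coding T = 1 for 1990 is an assumption (it matches a T value of 41 for 2030), and the function names are hypothetical.

```python
def year_to_t(year: int, first_year: int = 1990) -> int:
    """Assumes T = 1 for the first year of the series (1990)."""
    return year - first_year + 1


def quadratic_trend(t: int, b0: float = 282.79, b1: float = -24.01, b2: float = 1.28) -> float:
    """Evaluate Y^ = b0 + b1*T + b2*T^2 with the coefficients quoted above."""
    return b0 + b1 * t + b2 * t ** 2


t_2030 = year_to_t(2030)
print(t_2030, quadratic_trend(t_2030))
```

Because the forecast year lies well outside the 1990-2015 range used to fit the model, any value obtained this way should be read with caution.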

PART 7: TEAM CONCLUSION:

1&2 Predicted GDP Per Capita Growth Rate In 2030:

Applying the formula derived from the quadratic regression trend model of the low-income country, Congo, the year 2030 has a T value of 41, which shows a rate of -0.154, lower than in the previous years. The main factors are possibly the urban concentration level and the takeover of technology.

3 Recommendations:

Our datasets only include information gathered from African and European countries, but there are 195 countries in the world, which means many places are still not evaluated in this research. Expanding the sample size would make the data more reliable. Our analysis covered 8 factors for evaluating the GDP per capita growth rate, but studies point to some other factors:

• Dayıoğlu & Aydın (IntechOpen, 2020) demonstrate the importance of the inflation rate and the unemployment rate in economic growth and their influence on GDP per capita.

• Vernon Henderson (World Bank) suggests that the urban concentration level contributes to the GDP growth rate and that this is also related to the level of technology.

References:

• Dayıoğlu, Tuğba, and Yılmaz Aydın, September 2020, Relationship between Economic Growth, Unemployment, Inflation and Current Account Balance: Theory and Case of Turkey, IntechOpen, viewed 30 May 2021, <www.intechopen.com/books/linear-and-non-linear-financial-econometrics-theory-and-practice/relationship-between-economic-growth-unemployment-inflation-and-current-account-balance-theory-and-c.>

• Worldbank, How Urban Concentration Affects Economic Growth, Worldbank, viewed 30 May 2021, <www.intechopen.com/books/linear-and-non-linear-financial-econometrics-theory-and-practice/relationship-between-economic-growth-unemployment-inflation-and-current-account-balance-theory-and-c.>

• Kingsley Ighobor, August 2012, "African economy capture world attention", [Access online], viewed 29 May 2021, <https://www.un.org/africarenewal/magazine/august-2012/african-economies-capture-world-attention>

• The World Bank 2021, DataBank World Development Indicators, viewed 28 May 2021,

Date posted: 02/12/2022, 18:14
