GROUP ASSIGNMENT APPLIED STATISTICS IN BUSINESS section 1 descriptive statistics with tabular and graphical displays


HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY SCHOOL OF ECONOMICS AND MANAGEMENT

GROUP ASSIGNMENT APPLIED STATISTICS IN BUSINESS

NGUYEN PHUONG ANH

HANOI, June 2021

Lecturer’s Signature


We affirm that this is our own research report. All the data and figures in the report are from our own study and are fully cited from known sources. We did not copy from any documents and did not violate the regulations on plagiarism.


The success and final outcome of this project required a lot of guidance and assistance from many people, and we are extremely fortunate to have received this throughout the completion of our project work. Whatever we have done is only due to such guidance and assistance, and we would not forget to thank them.

We respect and thank Mr Nguyen Tien Dung for giving us an opportunity to do the project work in Applied Statistics and Experimental Design. We are extremely grateful to him for providing such nice lectures in every online class on Microsoft Teams.


TABLE OF CONTENTS

List of Figures

List of Tables

Executive Summary

Section 1 Descriptive Statistics with Tabular and Graphical Displays
1.1 Question 1
1.2 Question 2
1.3 Question 3
1.4 Question 4
1.5 Question 5

Section 2 Descriptive Statistics with Numerical Measures
2.1 Question 1
2.2 Question 2
2.3 Question 3
2.4 Question 4
2.5 Question 5

Section 3 Hypothesis Tests
3.1 Question 1
3.2 Question 2
3.3 Question 3
3.4 Question 4
3.5 Question 5

Section 4 Experimental Design and ANOVA
4.1 Question 1
4.2 Question 2
4.3 Question 3
4.4 Question 4
4.5 Question 5

Section 5 Statistical Analysis with Real Data
5.1 Data Description
5.2 Analysis Objectives
5.3 Data Analysis and Interpretation
5.4 Concluding Remarks

References

Appendices


LIST OF FIGURES

Figure 1.1 xxxx

Figure 2.1 xxxx

Figure 2.2 xxxx

LIST OF TABLES

Table 1.1 xxxx

Table 2.1 xxxx

Table 2.2 xxxx


SECTION 1 DESCRIPTIVE STATISTICS WITH TABULAR AND GRAPHICAL DISPLAYS

1.1 Question 1

A frequency distribution is a tabular summary of data showing the number (frequency) of observations in each of several nonoverlapping categories or classes.

A percent frequency distribution shows, for each class, the percentage of observations that fall in that class.

Thus, to help management develop a customer profile, we first construct the percent frequency distributions for the key variables, which in this case are type of customer, items, net sales, method of payment, gender, marital status and age group.

1.1.1 Percent frequency distribution for Type of Customers

Table 1.1 Percent frequency distribution for Type of Customers

From the table above, we can see that in the sample of 100 customers, there are 70 promotional customers and 30 regular customers.
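As a rough illustration of how such a percent frequency distribution is built, the following Python sketch tallies the two customer types using the counts reported above (70 promotional, 30 regular). The function name `percent_frequency` is an illustrative choice; the report itself was prepared in Excel, so this is only an equivalent sketch, not the method actually used.

```python
from collections import Counter


def percent_frequency(values):
    """Return {category: (frequency, percent)} for a list of categorical values."""
    counts = Counter(values)
    n = len(values)
    return {category: (freq, 100.0 * freq / n) for category, freq in counts.items()}


# Customer types as reported in the text: 70 promotional and 30 regular customers.
customer_types = ["Promotional"] * 70 + ["Regular"] * 30

distribution = percent_frequency(customer_types)
for category, (freq, pct) in distribution.items():
    print(f"{category:<12} {freq:>4} {pct:>6.1f}%")
```

The frequencies sum to the number of observations and the percentages sum to 100, which is a quick check that no observation was dropped.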

1.1.2 Percent frequency distribution for Items


Table 1.6 Percent frequency distribution for Age Group

The sum of the frequencies in each frequency distribution is 100, which equals the number of observations. In addition, the sum of the percentages in a percent frequency distribution always equals 100.

These percent frequency distributions provide a profile of Pelican’s customers. We can conclude that:

- Over half of the customers purchase 1 or 2 items, but a few make numerous purchases.

- The percent frequency distribution of net sales shows that 61% of the customers spent $50 or more.

- Customers are distributed across all adult age groups.

- The overwhelming majority of customers are female.

- Most of the customers are married.

1.2 Question 2

To construct a bar chart showing the number of customer purchases attributable to the method of payment, we counted the number of customers for each method of payment by using a PivotTable in Excel.

Excel’s PivotTable Report is an interactive tool that allows us to quickly summarize data in a variety of ways, including developing a frequency distribution for quantitative data.

1.2.1 PivotTable showing the number of customer purchases attributable to the method of payment

Table 1.7 The number of customer purchases attributable to the method of payment

1.2.2 Bar chart showing the number of customer purchases attributable to the method of payment


Figure 1.1 The number of customer purchases attributable to the method of payment

From the bar chart above, we conclude that a large majority of the customers use the proprietary credit card.

Net sales classes ($): 0-24.99, 25.00-49.99, 50.00-74.99, 75.00-99.99, 100.00-124.99, 125.00-149.99, 150.00-174.99, 175.00-199.99, 200+, Total

Table 1.8 A crosstabulation of types of customer versus net sales

1.3.2 Comment on similarities or differences present

In terms of similarities between promotional and regular customers, we can draw some conclusions:

- For both types of customers, the most frequent net sales classes are 25.00-49.99 and 50.00-74.99.

- Few customers are charged above $125.00.

In terms of differences, we can conclude that:

- Some customers who use promotional coupons have net sales above 175.00, but no regular customers do.

In conclusion, from the crosstabulation above, it appears that net sales are larger for promotional customers.
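A crosstabulation of this kind can be sketched in a few lines of Python. The sketch assumes $25-wide net sales classes with an open-ended top class starting at $200; the records, the function names `net_sales_class` and `crosstab`, and the constants are hypothetical and serve only to show the mechanics, not to reproduce the Pelican data.

```python
from collections import defaultdict

CLASS_WIDTH = 25.0        # assumed class width in dollars
TOP_CLASS_START = 200.0   # assumed start of the open-ended top class


def net_sales_class(amount):
    """Map a net sales amount to its class label, e.g. 39.50 -> '25.00-49.99'."""
    if amount >= TOP_CLASS_START:
        return "200+"
    lower = int(amount // CLASS_WIDTH) * CLASS_WIDTH
    return f"{lower:.2f}-{lower + CLASS_WIDTH - 0.01:.2f}"


def crosstab(records):
    """Count records by (customer type, net sales class)."""
    table = defaultdict(lambda: defaultdict(int))
    for customer_type, amount in records:
        table[customer_type][net_sales_class(amount)] += 1
    return table


# Hypothetical (customer type, net sales) records, for illustration only.
records = [
    ("Promotional", 39.50), ("Promotional", 102.40), ("Promotional", 253.00),
    ("Regular", 44.50), ("Regular", 31.60), ("Regular", 70.82),
]

table = crosstab(records)
for customer_type, row in table.items():
    print(customer_type, dict(row), "Total:", sum(row.values()))
```

Reading across a row gives the net sales distribution for one customer type, which is what the comparison of similarities and differences relies on.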


Figure 1.2 The relationship between net values and customer age (axis label: Sales)

A trendline has been fitted to the data. From this, it appears that there is no relationship between net sales and age. Thus, age is not a factor in determining net sales.

1.5 Conclusion

By using the tabular and graphical methods of descriptive statistics, we can conclude that promotional coupons and the proprietary card may affect the store’s net sales, specifically by increasing them, while age is not a factor in determining net sales.


SECTION 2 DESCRIPTIVE STATISTICS WITH NUMERICAL MEASURES

2.1 Question 1

2.1.1 Descriptive statistics on net sales

Table 2.1 Descriptive statistics on net sales

From the above statistics, it can be observed that the average net sales is 77.60 units. The median is 59.71, which is less than the mean. This indicates that the data is right-skewed. That is, half of the customers have net sales of 59.71 units or less, while a few large purchases pull the mean upward. This conclusion is supported by the skewness value of 1.715.

The maximum net sales is 287.59 units while the minimum is 13.23 units.

2.1.2 Descriptive statistics on net sales by various classifications of customers

Table 2.2 Descriptive statistics on net sales by promotional and regular customers

From above statistics, a few observations can be made:

Customers taking advantage of the promotional coupons spent more money on average. The mean amount spent by all customers is $77.60; the average amount spent by promotional customers was $84.29.

The standard deviation of sales is $55.66. This indicates a fairly wide variability in purchase amounts across customers. This variability is quite a bit smaller for the regular customers.

The distribution of the sales data is skewed to the right. The mean ($77.60) is larger than the median ($59.71) and the skewness measure (1.71) is positive. Positive skewness is typical for this kind of data. There are no negative sales amounts and there are a few large purchases.
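The link between mean, median and skewness can be checked with a short Python sketch. The data below are hypothetical: only the minimum (13.23), median (59.71) and maximum (287.59) are borrowed from the text, and the remaining values are made up. The skewness formula is the adjusted sample skewness commonly reported by spreadsheet software; it is assumed, not confirmed, that the report's value was computed this way.

```python
import statistics


def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


# Hypothetical net sales; a few large purchases create a long right tail.
net_sales = [13.23, 31.60, 39.50, 44.50, 59.71, 70.82, 102.40, 160.40, 287.59]

mean = statistics.mean(net_sales)
median = statistics.median(net_sales)
skew = sample_skewness(net_sales)

print(f"mean={mean:.2f} median={median:.2f} skewness={skew:.3f}")
print("right-skewed" if mean > median and skew > 0 else "not right-skewed")
```

When a few large values stretch the right tail, the mean is pulled above the median and the skewness measure is positive, which is the pattern described above.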

2.2 Question 2

To determine the relationship between Age and Net sales, we calculate the correlation coefficient.

Let $x$ be the age variable and $y$ be the net sales variable.

We applied the formula of the correlation coefficient for a sample, denoted by $r_{xy}$:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

where $s_{xy}$ is the sample covariance, and $s_x$ and $s_y$ are the sample standard deviations of Age and Net sales.

We use MegaStat to determine descriptive statistics on Age.

Table 2.3 Descriptive statistics on age

It indicates the sample standard deviation of Age, $s_x$.

The sample covariance is calculated by using the formula

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

To get the result, we use Excel as an assistant system.

Since the value of the correlation coefficient is near zero, it indicates a weak linear relationship between the Net sales and Age variables. In other words, age is not a factor in determining Net sales.
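For readers who want to reproduce the calculation outside Excel, the following Python sketch computes the sample covariance and the sample correlation coefficient from their textbook definitions. The age and net sales values are hypothetical, and the function names are illustrative choices.

```python
import math


def sample_covariance(x, y):
    """Sample covariance: sum((x_i - mean_x)(y_i - mean_y)) / (n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_std(values):
    """Sample standard deviation, i.e. the square root of cov(x, x)."""
    return math.sqrt(sample_covariance(values, values))


def sample_correlation(x, y):
    """Sample correlation coefficient: covariance / (std_x * std_y)."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


# Hypothetical ages and net sales for a handful of customers.
ages = [32, 36, 44, 48, 52, 30, 58, 40]
net_sales = [39.50, 102.40, 22.50, 100.40, 54.00, 44.50, 78.00, 22.50]

r = sample_correlation(ages, net_sales)
print(f"r = {r:.3f}")
```

A value of r close to zero points to a weak linear relationship, while values near +1 or -1 point to a strong one.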


SECTION 3 HYPOTHESIS TEST

3.1 Question 1

After conducting a hypothesis test for the 4 samples at the 0.01 level of significance, we obtained the following hypothesis testing results:

Table 3.1 Hypothesis testing results

Only sample 3 leads to the rejection of the null hypothesis $H_0$. Thus, corrective action is warranted for sample 3. The other samples indicate that $H_0$ cannot be rejected; thus, the process is operating satisfactorily. Sample 3 shows the process is operating below the desired mean. Sample 4 is on the high side, but the $p$-value of 0.03 is not sufficient to reject $H_0$.
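The decision rule used here (reject when the p-value is at or below 0.01) can be sketched as a two-tailed z test for a mean with known population standard deviation. Whether the report used exactly this test is an assumption, and the hypothesized mean, standard deviation, sample size and sample means below are hypothetical values chosen only to show one rejection and one non-rejection.

```python
import math
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.01):
    """Return (z, p_value, reject) for H0: mu = mu0 against Ha: mu != mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value <= alpha


# Hypothetical process: target mean 12, known sigma 0.21, samples of size 30.
for label, sample_mean in [("low sample", 11.89), ("high sample", 12.08)]:
    z, p_value, reject = two_tailed_z_test(sample_mean, mu0=12, sigma=0.21, n=30)
    decision = "reject H0" if reject else "do not reject H0"
    print(f"{label}: z={z:.2f}, p-value={p_value:.4f}, {decision}")
```

A sample mean can sit visibly above the target and still fail to reach significance at the 0.01 level, which mirrors the situation described for sample 4.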

This would be an increase in the probability of making a Type I error.

3.5 Conclusion


SECTION 4 EXPERIMENTAL DESIGN AND ANOVA

4.1 Question 1

Anova: Single Factor

Data from Medical 1

Hypothesis tested

$H_0$: There is no significant difference in the mean depression score of healthy people in the three locations

$H_a$: There is a significant difference in the mean depression score of healthy people in the three locations

where:

$\mu_1$ = the mean depression score of healthy people in Florida

$\mu_2$ = the mean depression score of healthy people in New York

$\mu_3$ = the mean depression score of healthy people in North Carolina

Rejection Rule: Reject the null hypothesis if the calculated value of the F statistic is greater than the critical value of F.

Conclusion: The null hypothesis is rejected because the sample provides enough evidence of a difference in the mean depression score of healthy people across the three locations ($F \ge F_{\text{crit}}$), so the geographical means are not all equal. The difference is driven mainly by the means for New York and Florida.

Medical 2

Hypothesis tested

$H_0$: There is no significant difference in the mean depression score of healthy people in the three locations

$H_a$: There is a significant difference in the mean depression score of healthy people in the three locations

where:

$\mu_1$ = the mean depression score of healthy people in Florida

$\mu_2$ = the mean depression score of healthy people in New York

$\mu_3$ = the mean depression score of healthy people in North Carolina

Rejection Rule: Reject the null hypothesis if the calculated value of the F statistic is greater than the critical value of F.

Conclusion: The null hypothesis cannot be rejected because the sample does not provide enough evidence of a difference in the mean depression score across the three locations ($F \le F_{\text{crit}}$), so the geographical means can be treated as equal. There is no evidence of a relation between location and depression score.
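The F statistic behind a single-factor ANOVA can be computed directly from the between-group and within-group sums of squares. The Python sketch below uses hypothetical depression scores for the three locations; the critical value is not computed but taken as an assumed table lookup (about 3.89 for 2 and 12 degrees of freedom at a 0.05 level, a significance level that is itself an assumption here).

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, group_means))

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical depression scores, five people per location.
florida = [3, 7, 7, 3, 8]
new_york = [8, 11, 9, 7, 8]
north_carolina = [10, 7, 3, 5, 11]

f_stat, df_b, df_w = one_way_anova_f([florida, new_york, north_carolina])
F_CRIT = 3.89  # assumed lookup from an F table for (2, 12) df at alpha = 0.05

print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
print("reject H0" if f_stat > F_CRIT else "do not reject H0")
```

The same function applies to both data sets; only the input scores and the degrees of freedom used for the critical value change.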

4.3 Question 3

From the above two output results we observe that:

- There is no interaction between health and location.

- There is a big difference in depression scores between people in good health and people with chronic health conditions.

- For people in reasonably good health, geographical location affects the level of depression. However, for people with some kind of chronic health problem, depression does not vary across states.


SECTION 5 STATISTICAL ANALYSIS WITH REAL DATA


5.1 Introduction

5.1.1 Population and Sample

Our team found the dataset on a website named “Kaggle”. The survey gathered basic information such as height and weight from 500 respondents. Our major goal is to see whether there is a difference in average height between boys and girls, and to examine the relationship between the respondents' height and weight.

5.1.2 Sample size

Following a discussion among team members, everyone agreed to select a sample of 500 students. This number is sufficiently large for us to obtain a definite and appropriate proportion for our testing and should give a more accurate result. Furthermore, when the sample size is too small, the picture the sample gives of height and weight may not reflect the actual condition of the total number of respondents.

5.1.3 Sampling method

There are several sampling methods available for studying people's height and weight. However, because the total population is very large (the total number of obese people is approximately 650 million) and a sample size of 500 was chosen, we decided to use a simple random sampling method to collect the responses.
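Simple random sampling means that every member of the sampling frame has the same chance of being selected, without replacement. A minimal Python sketch follows; the sampling frame of 10,000 respondent IDs, the seed and the function name are hypothetical and only illustrate the idea.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical sampling frame: respondent IDs 1..10000.
population_ids = list(range(1, 10001))
sample_ids = simple_random_sample(population_ids, 500, seed=2021)

print(len(sample_ids), "respondents selected; first five:", sample_ids[:5])
```

Fixing the seed is a design choice that lets other readers redraw exactly the same sample when checking the analysis.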

5.1.4 Data Collection

After selecting the main objective and content, our group sought datasets on the Internet. Thanks to a recommendation by Mr. Nguyen Tien Dung, we came across a survey of height and weight on a website named “Kaggle”. The data has been viewed by nearly 100 thousand users; therefore, we are confident that the collected data is highly authentic. Finally, we documented the variables (gender, height, weight, index) and processed the data with the help of Google Excel.

The tables of the data are presented in Appendix C of this report

5.1.5 Data Processing

Acknowledging that the information obtained is too large to compute manually, we collected the information and analysed the figures with the assistance of Microsoft Excel. We entered the data into Excel and computed statistics by using “MegaStat” and “PivotTable”. Furthermore, our team also used the chart tools of Excel to visualize the data in charts, namely pie charts, histograms and a regression line, which make the results much easier for readers to understand.

5.1.6 Significance level of sample test

According to our research, the average height of men and women is roughly 170 cm. Therefore, based on what we have learnt, we form the hypothesis that there is no difference in the average height between boys and girls, tested at the 5% level of significance.
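A hypothesis of no difference between two group means can be tested, for large samples, with a two-sample z test on summary statistics. The Python sketch below is one possible way to carry out such a test, not necessarily the procedure used in this report; all means, standard deviations and group sizes in it are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Test H0: mu1 = mu2 against Ha: mu1 != mu2 using large-sample z."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value <= alpha


# Hypothetical summary statistics for boys (group 1) and girls (group 2), in cm.
z, p_value, reject = two_sample_z_test(
    mean1=169.0, sd1=17.0, n1=245,
    mean2=170.0, sd2=16.0, n2=255,
)

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
print("reject H0" if reject else "do not reject H0 at the 5% level")
```

With groups of a few hundred respondents each, the normal approximation is usually adequate; for small groups a t test would be the more cautious choice.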

5.2 Descriptive Statistics

After obtaining the raw data, we decided to divide the data into 2 groups based on gender in order to conduct further statistics.

Figure 5.1 The gender proportion (pie chart; legend: Male, Female; slices: 49.00% and 51.00%)

The percentages of boys and girls among the 500 respondents are roughly equal, as seen in the pie chart, at 49 percent and 51 percent respectively. This balance helps ensure that the comparison between the two groups is not biased toward either gender.

Firstly, we computed descriptive statistics for males. The table is shown below:

Table 5.1 The male’s height descriptive statistic

Then we constructed the frequency distribution table and histogram.


lower upper midpoint width frequency percent

Figure 5.2 The male’s height histogram

We can observe some basic information from the table and histogram. The sample includes 245 boys with an average height of 169 cm. Their heights range from 140 to 199 centimeters, with a sample standard deviation of 17.07 centimeters. Furthermore, among boys, the most common height (mode) is 179 cm, which is higher than the median (171 cm) and mean (169 cm), indicating a left-skewed distribution.
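A frequency distribution for quantitative data, with the columns lower, upper, midpoint, width, frequency and percent, can be sketched in Python as follows. The heights are hypothetical, and the starting point of 140 cm and the class width of 10 cm are assumptions chosen to cover the 140 to 199 cm range mentioned above.

```python
def frequency_table(values, lower, width, n_classes):
    """Build rows of (lower, upper, midpoint, width, frequency, percent)."""
    n = len(values)
    rows = []
    for i in range(n_classes):
        class_lower = lower + i * width
        class_upper = class_lower + width
        # Each class includes its lower limit and excludes its upper limit.
        frequency = sum(1 for v in values if class_lower <= v < class_upper)
        rows.append({
            "lower": class_lower,
            "upper": class_upper,
            "midpoint": (class_lower + class_upper) / 2,
            "width": width,
            "frequency": frequency,
            "percent": 100.0 * frequency / n,
        })
    return rows


# Hypothetical heights in centimeters.
heights = [140, 152, 158, 163, 169, 171, 171, 179, 179, 179, 185, 199]

rows = frequency_table(heights, lower=140, width=10, n_classes=6)
for row in rows:
    print(f"{row['lower']:>5} {row['upper']:>5} {row['midpoint']:>7.1f} "
          f"{row['width']:>5} {row['frequency']:>4} {row['percent']:>6.1f}%")
```

Plotting the frequency column against the class midpoints gives the histogram, and the side on which the tail stretches out shows the direction of skewness.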

Range 110

Confidence interval 95% lower 102.31

Confidence interval 95% upper 110.32

Table 5.3 The male’s weight descriptive statistic

Figure 5.3 The male’s weight histogram

The sample mean weight is 106.31 kg, according to the data. The range is 110 kg, with the lowest point being 50 kg. Furthermore, the most common weight (mode) is 80 kg, which is lower than the median (105 kg) and mean (106.31 kg), indicating that the frequency distribution is right-skewed with a long right tail.

We did the same step for females:

Table 5.5 The female’s height descriptive statistic
