
STATISTICS ANOVA AND MULTIPLE REGRESSION REPORT


Statistics for Business

Instructor: Hồ Thanh Vũ

PROJECT

REPORT

Group:

Student name Student ID

• NGUYỄN THỊ ÁNH TUYẾT BABAWE12088

• HUỲNH THANH TONG BABAWE14143

• TRẦN HOÀNG ANH BABAWE14100

• TRẦN LÊ NGỌC THỊNH BABAWE14230

• TRẦN VŨ KHA BABAWE14113

VIETNAM NATIONAL UNIVERSITY OF HCMC

INTERNATIONAL UNIVERSITY


I Descriptive statistics about the data

II Method presentation and result interpretation for the statements

1 Women are likely to shop online rather than men

2 There is some difference in the amount of money spent among different ages in groups of women

3 There is some difference in the amount of money spent among different ages in groups of men

III Multiple linear regression between revenue/month to the five factors

IV Multiple regression with 5 factors for the following group:

1 Male under 18

2 Male aged 18 – 27

3 Male over 27

4 Female under 18

5 Female aged 18 – 27

6 Female over 27

I Descriptive statistics about the data

In 2015, Lazada hired Neslien CA, a market research company, to study the factors that affect customer decisions when shopping at Lazada. A total of 1,500 surveys were issued and 1,241 responses were received (a response rate of 82.73%). The table and graph below show how the 1,241 responses are distributed by gender and age group.

GENDER   MALE   FEMALE
<18      114    215
18-27    125    308
>27      130    349
Total    369    872

The survey covers five factors that are believed to influence customer decisions in online shopping: 1) Price, 2) Brand awareness, 3) Security, 4) Ease of payment, and 5) Promotion and Marketing.

Each of the 5 factors is scored on a scale of [-3 ; 3], running from the lowest to the highest evaluation of the Lazada online service by consumers.

The responses for the 5 factors are then summarized with the following descriptive statistics.

The Mean, or average, is probably the most commonly used measure of central tendency.

In this case, the mean of Price is -0.0757, which means that, on average, consumers do not expect the price of Lazada to outperform other brands. Expectations are a bit higher for the rest: the means of Brand (0.359), Security (0.275), Payments (0.3158), and Promotions and Marketing (0.302) are slightly above the neutral score of 0, but still too close to it to say that Lazada is expected to outperform other brands on average.

The standard error is the standard deviation of the sampling distribution of a statistic, most commonly of the mean.

In this case, the standard errors of the 5 factors are close to each other: Price is 0.057, slightly larger than Brand (0.055), Security (0.056), Payments (0.056), and Promotions and Marketing (0.056).

The Median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order and then locate the score in the center of the sample. In this case, the median of Price is 0, equal to the medians of Brand (0), Security (0), Payments (0), and Promotions and Marketing (0), because all five factors are scored on the same symmetric scale.

The mode is the most frequently occurring value in the set of scores. To determine the mode, you can again order the scores as described above and count each one. The most frequently occurring value is the mode.

In this case, the mode of Price is -3, which means that the most common response in the survey expects Lazada to be completely outperformed by other brands on price. Meanwhile, the other 4 factors most often receive the highest score: Brand (3), Security (3), Payments (3), and Promotions and Marketing (3).

The standard deviation (σ) is a measure used to quantify the amount of variation or dispersion of a set of data values. A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values. In this case, the SD of Price is the highest (2.022), followed by Security (1.991), Promotion and Marketing (1.972), Payment (1.967), and Brand (1.955).

The variance is the expectation of the squared deviation of a random variable from its mean, and it informally measures how far a set of (random) numbers is spread out from its mean. In this case, the variance of Price is also the highest (4.091), followed by Security (3.966), Promotion and Marketing (3.891), Payment (3.869), and Brand (3.823).

The skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive or negative, or even undefined. In this case, Price has the highest value (0.053), followed by Payment (-0.08), Security (-0.112), Promotion and Marketing (-0.113), and Brand (-0.114).

Kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosis is a descriptor of the shape of a probability distribution, and, just as for skewness, there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample from a population. Depending on the particular measure of kurtosis that is used, there are various interpretations of kurtosis and of how particular measures should be interpreted. In this case, Brand has the highest value (-1.216), followed by Security (-1.230), Promotion and Marketing (-1.249), Payment (-1.253), and Price (-1.259).

The Range is simply the difference between the highest and the lowest score and is determined by subtraction. If the range is small, the scores are close together; if it is large, the scores are more spread out. In this case, all five factors have the same range of 6 score levels.

The maximum and minimum show up in the calculations for other summary statistics. Both numbers are used to calculate the range, which is simply the difference between the maximum and the minimum. In this case, all five factors share the same minimum (-3) and the same maximum (3).

The count is the number of surveys collected, here from 1,241 consumers of LAZADA.
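The indicators described above can be computed with a short sketch such as the one below. It is only an illustration: the `describe` helper and the `sample_scores` list are hypothetical (the real survey responses are not reproduced in this report), and the skewness and kurtosis use the sample formulas that Excel's SKEW and KURT apply, which is an assumption about how the report's figures were produced.

```python
import math
import statistics
from collections import Counter


def describe(scores):
    """Return Excel-style descriptive statistics for a list of numbers."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)
    z3 = sum(((x - mean) / sd) ** 3 for x in scores)
    z4 = sum(((x - mean) / sd) ** 4 for x in scores)
    # Sample skewness and excess kurtosis (the formulas behind SKEW and KURT)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "standard_error": sd / math.sqrt(n),
        "median": statistics.median(scores),
        "mode": Counter(scores).most_common(1)[0][0],
        "standard_deviation": sd,
        "variance": statistics.variance(scores),
        "kurtosis": kurt,
        "skewness": skew,
        "range": max(scores) - min(scores),
        "minimum": min(scores),
        "maximum": max(scores),
        "count": n,
    }


# Hypothetical scores on the survey's [-3, 3] scale (not the real responses)
sample_scores = [-3, -3, -3, -1, 0, 0, 1, 2, 3, 3]
summary = describe(sample_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```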

The same indicators are used to describe Lazada's revenue/month from the surveyed consumers, that is, their monthly spending on online shopping at Lazada.

REVENUE/MONTH
Mean                 167.3296132
Standard Error       11.20328795
Standard Deviation   394.6675223
Sample Variance      155762.4532
Kurtosis             96.70825409
Skewness             9.86533105

The average amount of money spent by each consumer is 167.3296132.

The amount of money spent by most consumers of Lazada is 139.95.

The standard deviation is 394.6675223 and the sample variance is 155762.4532.
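As a quick consistency check on the revenue figures, the standard error should equal the standard deviation divided by the square root of the number of responses, and the sample variance should equal the standard deviation squared. The sketch below uses only the values reported in this section; the variable names are illustrative.

```python
import math

n = 1241                     # number of responses
sd = 394.6675223             # reported standard deviation of revenue/month

standard_error = sd / math.sqrt(n)   # standard error of the mean
sample_variance = sd ** 2            # variance is the squared standard deviation

print(round(standard_error, 4), round(sample_variance, 2))
```

Both results agree with the reported Standard Error (11.20328795) and Sample Variance (155762.4532) up to rounding.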


II Illustrate the following issues

1 Are women more likely to shop online than men?

We can accept this statement if the average amount of money a woman spends on online shopping is larger than that of a man. In this case:

• First, conduct a hypothesis test in which the means for females and males are µ1 and µ2 respectively.

H0: µ1 - µ2 ≤ 0, indicating that women do not spend more money than men

H1: µ1 - µ2 > 0, indicating that women spend more money than men

• Next, use Excel to produce the z-test table, assuming normally distributed populations and independent random samples. The population variances are not given.

z-Test: Two Sample for Means

                               female   male
Hypothesized Mean Difference   0
P(Z<=z) one-tail               0.000146106
z Critical one-tail            1.644853627
P(Z<=z) two-tail               0.000292212
z Critical two-tail            1.959963985

This is a right-tailed z-test. Since Ztest > Z Critical one-tail (1.645), and equivalently the one-tail p-value (0.000146) is below 0.05, we have enough evidence to reject H0. From the data given, we can conclude that women are more likely to shop online than men.
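The right-tailed z-test above can be sketched as follows. This is a minimal illustration rather than the exact Excel procedure: the function name `z_test_right_tail` is made up for this example, and the group means and variances passed to it are hypothetical, because the report does not list the inputs of its z-test. Only the group sizes (872 women, 369 men) and the 0.05 significance level come from the report.

```python
import math
from statistics import NormalDist


def z_test_right_tail(mean1, mean2, var1, var2, n1, n2, alpha=0.05):
    """Test H0: mu1 - mu2 <= 0 against H1: mu1 - mu2 > 0."""
    z = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
    p_one_tail = 1 - NormalDist().cdf(z)          # area to the right of z
    z_critical = NormalDist().inv_cdf(1 - alpha)  # 1.6449 for alpha = 0.05
    return z, p_one_tail, z_critical, z > z_critical


# Hypothetical means and variances, used only to exercise the function
z, p, z_crit, reject = z_test_right_tail(
    mean1=180.0, mean2=130.0, var1=200000.0, var2=600.0, n1=872, n2=369)
print(f"z = {z:.3f}, p = {p:.6f}, z critical = {z_crit:.6f}, reject H0: {reject}")
```

The critical value returned for alpha = 0.05 matches the "z Critical one-tail" entry in the Excel output.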

2 For the group of women, is there any difference in the amount of money spent at different ages?

To see whether the amount of money spent differs between ages in the women group, we compare the average money spent per female in each age group. Using the ANOVA testing method and Excel, we reach the following conclusion.

• First, set up the hypotheses for the average money spent by females under 18 (μ1), females aged 18 – 27 (μ2), and females over 27 (μ3).

H0: μ1 = μ2 = μ3, indicating that there is no significant difference

H1: Not all μi (i = 1,2,3) are equal, indicating that the money spent differs significantly for at least one age group of females

• Next, use the analysis tool in Excel to produce the ANOVA table and get the F ratio.

SUMMARY

Groups   Count   Sum        Average       Variance
<18YR    215     29699.99   138.1394884   77936.8052
18-27    308     69024.64   224.105974    358031.9505
>27YR    349     62171.33   178.1413467   185320.5575

ANOVA

Source of Variation   SS           df    MS            F
Between Groups        959348.650   2     479674.3254   2.181412242
Within Groups         191085839    869   219891.6446

The test statistic value:

FT = F-ratio = 2.1814

At α = 0.05, the critical value:

FC = F(2, 869, 0.05) = 3.0061

Thus, at the 0.05 level of significance, we cannot reject the null hypothesis since FT < FC. Based on the ANOVA table and the hypothesis test, there is not enough evidence of a difference in the amount of money spent on online shopping between different age groups of women.

3 For the group of men, is there any difference in the amount of money spent at different ages?

To reach the conclusion, the same method and tools used previously are applied to this question.

• First, set up the hypothesis test for the means of revenue/month from each age group of males.

H0: μ1 = μ2 = μ3

H1: Not all μi (i = 1,2,3) are equal

• Next, use Excel to produce the Anova: Single Factor table.

SUMMARY

Groups   Count   Sum        Average       Variance
<18YR    114     15070.32   132.1957895   627.0838352
18-27    125     16344.47   130.75576     456.4155972
>27YR    130     15345.3    118.0407692   709.7350056

ANOVA

Source of Variation   SS            df    MS            F            P-value       F crit
Between Groups        15246.90229   2     7623.451145   12.7398744   4.47957E-06   3.020386874
Within Groups         219011.8232   366   598.3929594

The test statistic value:

FT = F-ratio = 12.7399

At α = 0.05, FC = F(2, 366, 0.05) = 3.0203

Since FC < FT, there is enough evidence to reject the null hypothesis H0.

Based on the ANOVA table and the hypothesis test, there is some difference in the amount of money spent between age groups of men.
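Both F ratios can be recomputed from the group summaries alone (count, average, variance), without the raw data. The sketch below is one way to do that; the helper `anova_from_summary` is a name invented for this example. The male figures are taken from the SUMMARY table; for the female groups, the counts 215 and 308 for the two younger groups are assumptions derived from the gender totals and from Sum / Average.

```python
def anova_from_summary(groups):
    """One-way ANOVA from (count, mean, variance) triples, one per group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group variation: group means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group variation: recovered from each sample variance
    ss_within = sum((n - 1) * v for n, _, v in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# (count, average, variance) for the <18, 18-27 and >27 groups
female = [(215, 138.1394884, 77936.8052),
          (308, 224.105974, 358031.9505),
          (349, 178.1413467, 185320.5575)]
male = [(114, 132.1957895, 627.0838352),
        (125, 130.75576, 456.4155972),
        (130, 118.0407692, 709.7350056)]

f_female = anova_from_summary(female)[4]
f_male = anova_from_summary(male)[4]
print(f"F (women) = {f_female:.4f}, F (men) = {f_male:.4f}")
```

Comparing each F ratio with its critical value (3.0061 for women, 3.0203 for men) gives the same decisions as in the text: H0 is not rejected for women and is rejected for men.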

III The multiple linear regression of revenue/month on the five factors

• First, we conduct a hypothesis test based on the ANOVA table to establish whether revenue/month depends on the 5 factors.

The coefficients of price, brand awareness, security, ease of payment, and promotion and marketing are β1, β2, β3, β4, β5 respectively.

H0: β1 = β2 = β3 = β4 = β5 = 0, indicating that there is no regression relationship between revenue and any of the 5 factors

H1: not all the βi (i = 1,2,3,4,5) are zero, indicating that revenue depends on at least 1 of the 5 factors

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.28280788
R Square            0.079980297
Adjusted R Square
Standard Error      379.3213759
Observations        1241

ANOVA

             df     SS            MS            F             Significance F
Regression   5      15447829.79   3089565.957   21.47251115   1.188E-20
Residual     1235   177697612.1   143884.7062
Total        1240   193145441.9

At the 0.05 level of significance, the critical value is Fcrit = F(0.05, 5, 1235) = 2.22132601.

Since F ratio > Fcrit, there is enough evidence to reject H0. Thus, revenue/month depends on at least 1 of the 5 factors.
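The overall F ratio can also be recovered from R Square, the number of observations and the number of predictors, which is a useful cross-check on the ANOVA table. The sketch below uses the values reported in the regression output; the function name `overall_f` is only illustrative.

```python
def overall_f(r_squared, n_obs, n_predictors):
    """F statistic for H0: all slope coefficients are zero."""
    df_regression = n_predictors
    df_residual = n_obs - n_predictors - 1
    f = (r_squared / df_regression) / ((1 - r_squared) / df_residual)
    return f, df_regression, df_residual


f_ratio, df_reg, df_res = overall_f(r_squared=0.079980297, n_obs=1241, n_predictors=5)
f_crit = 2.22132601  # critical value F(0.05, 5, 1235) quoted in the report
print(f"F = {f_ratio:.4f} on ({df_reg}, {df_res}) df, reject H0: {f_ratio > f_crit}")
```

Note that an R Square of about 0.08 means the five factors explain only about 8% of the variation in revenue/month, even though the regression as a whole is statistically significant.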

• Now we can construct a regression model from the regression analysis table below, produced from the given data with the Data Analysis tool in Excel.

              Coefficients   Standard Error   t Stat   P-value   Lower 95%
PRICE X1      37.530
BRAND X2      7.383
SECURITY X3   11.926
P & M X5      36.557         5.464            6.691    0.000     25.838

• From the coefficient table, we can set up the regression equation as follows:

Y = 150.35 + 37.53 X1 + 7.38 X2 + 11.92 X3 + 8.98 X4 + 36.55 X5 + ε
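To illustrate how the fitted equation is used, the sketch below turns it into a small prediction function. The coefficients are the ones in the equation above; the names `predict_revenue`, `neutral` and `top` are hypothetical, and the error term ε is omitted because the function returns the expected revenue/month for a given set of scores.

```python
INTERCEPT = 150.35
COEFFICIENTS = {
    "price": 37.53,       # X1
    "brand": 7.38,        # X2
    "security": 11.92,    # X3
    "payment": 8.98,      # X4
    "promotion": 36.55,   # X5
}


def predict_revenue(scores):
    """Expected revenue/month for factor scores on the [-3, 3] scale."""
    for name in COEFFICIENTS:
        if not -3 <= scores[name] <= 3:
            raise ValueError(f"{name} score must lie in [-3, 3]")
    return INTERCEPT + sum(COEFFICIENTS[name] * scores[name] for name in COEFFICIENTS)


neutral = {name: 0 for name in COEFFICIENTS}   # every factor scored 0
top = {name: 3 for name in COEFFICIENTS}       # every factor scored 3
print(predict_revenue(neutral), round(predict_revenue(top), 2))
```

With every factor at 0 the prediction is just the intercept, and each extra point on a factor adds that factor's coefficient to the expected revenue/month.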

• To test whether the individual variables of the regression model are significant, we conduct a t-test on each regression parameter. The null and alternative hypotheses for each variable are:

H0: βi = 0

H1: βi ≠ 0 (i = 1, 2, 3, 4, 5)

The critical value at the 0.05 level of significance, with df = 1235:

T critical = 1.960

• The conclusions are summarized in the table below. Among the 5 factors, price, security, and promotion and marketing are found to have a significant effect on revenue/month.

Factor     Comparison            Decision
PRICE      T test > T critical   Reject H0: β1 = 0
BRAND      T test < T critical   Do not reject H0: β2 = 0
SECURITY   T test > T critical   Reject H0: β3 = 0
PAYMENTS   T test < T critical   Do not reject H0: β4 = 0
P & M      T test > T critical   Reject H0: β5 = 0

IV Regression models for specific groups

1 Male under 18

H0: β1 = β2 = β3 = β4 = β5 = 0

H1: not all βi (i = 1,2,3,4,5) are equal to zero

ANOVA

             MS            F             Significance F
Regression   7553.158725   24.64870935   1.7193E-16

At the 0.05 significance level, the critical value is:

F(0.05, 5, 108) ≈ 2.30

FT > FC indicates that there is enough evidence to reject the null hypothesis, which means that at least 1 factor affects the money spent by males under 18.

The regression table for the group of males under 18:

            Coefficients    Standard Error   t Stat
Intercept   124.4638167     1.848918513      67.31709153
PRICE       0.689368267     0.816347265      0.844454678
BRAND       -0.993916385    0.813290658      -1.222092466
SECURITY    6.233068927     0.949998207      6.561137566
PAYMENTS    4.627027576     0.947222083      4.884839213
P & M       6.866677223     0.833018299      8.243128913

From the table, we can set up the regression equation as follows:

Y = 124.464 + 0.689 X1 - 0.994 X2 + 6.233 X3 + 4.627 X4 + 6.867 X5 + ε
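Each t Stat in the coefficient table for males under 18 is the coefficient divided by its standard error, and a factor is significant when the absolute t Stat exceeds the two-tailed critical value. The sketch below recomputes those values; the coefficients and standard errors come from the table, 1.9822 is the critical value the report quotes for 108 degrees of freedom, and the names `t_tests` and `table` are illustrative.

```python
T_CRITICAL = 1.9822  # two-tailed critical value quoted in the report (108 df)

# (coefficient, standard error) from the male under 18 regression table
table = {
    "PRICE": (0.689368267, 0.816347265),
    "BRAND": (-0.993916385, 0.813290658),
    "SECURITY": (6.233068927, 0.949998207),
    "PAYMENTS": (4.627027576, 0.947222083),
    "P & M": (6.866677223, 0.833018299),
}


def t_tests(rows, t_critical):
    """Return {factor: (t statistic, reject H0: beta = 0?)}."""
    results = {}
    for name, (coefficient, std_error) in rows.items():
        t_stat = coefficient / std_error
        results[name] = (t_stat, abs(t_stat) > t_critical)
    return results


results = t_tests(table, T_CRITICAL)
for name, (t_stat, significant) in results.items():
    label = "Significant" if significant else "Removable"
    print(f"{name:<9} t = {t_stat:8.4f}  {label}")
```

The absolute value matters here: BRAND has a negative t Stat, and it is its magnitude (about 1.22) that is compared with the critical value.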

The null and alternative hypotheses for each variable are:

H0: βi = 0

H1: βi ≠ 0 (i = 1, 2, 3, 4, 5)

The regression has df = 5. With α = 0.05 (α/2 = 0.025), the critical value is ±tC = ±t(108, 0.025) = ±1.9822. Based on the coefficient table, we can conclude:

Factor     Comparison            Decision                   Status
PRICE      T test < T critical   Do not reject H0: β1 = 0   Removable
BRAND      T test < T critical   Do not reject H0: β2 = 0   Removable
SECURITY   T test > T critical   Reject H0: β3 = 0          Significant
PAYMENTS   T test > T critical   Reject H0: β4 = 0          Significant
P & M      T test > T critical   Reject H0: β5 = 0          Significant

2 Male aged 18 – 27

H0: β1 = β2 = β3 = β4 = β5 = 0

H1: not all βi (i = 1,2,3,4,5) are equal to zero

ANOVA

Significance F   2.23276E-26

At the 0.05 significance level, the critical value is:

F(0.05, 5, 119) = 2.2899

FT > FC indicates that there is enough evidence to reject the null hypothesis, which means that at least 1 factor affects the money spent by males aged 18-27.

The regression table for the group of males aged 18-27:

            Coefficients    Standard Error   t Stat         P-value       Lower 95%
Intercept   122.3481617     1.272018764      96.18424284    1.1E-114      119.8294375
PRICE       -0.508473755    0.561728716      -0.905194519   0.367191378   -1.620752717
BRAND       0.611516825     0.578680059      1.05674425     0.292768888   -0.534327488
SECURITY    5.899148815     0.624324642      9.448848274    3.78409E-16   4.662923669
PAYMENTS    5.983963271     0.591661516      10.11382879    9.92225E-18   4.812414376
P & M       -1.027075128    0.572507111      -1.793995407   0.075353189   -2.160696388

From the table, we can set up the regression equation as follows:

Y = 122.348 - 0.508 X1 + 0.612 X2 + 5.899 X3 + 5.984 X4 - 1.027 X5 + ε

The null and alternative hypotheses for each variable are the same as those used previously.

The regression has df = 5. With α = 0.05 (α/2 = 0.025), the critical value is ±tC = ±t(119, 0.025) = ±1.9801. Based on the coefficient table, we can conclude:

Factor     Comparison            Decision
PRICE      T test < T critical   Do not reject H0: β1 = 0
BRAND      T test < T critical   Do not reject H0: β2 = 0
SECURITY   T test > T critical   Reject H0: β3 = 0
PAYMENTS   T test > T critical   Reject H0: β4 = 0
P & M      T test < T critical   Do not reject H0: β5 = 0

3 Male over 27

H0: β1 = β2 = β3 = β4 = β5 = 0

H1: not all βi (i = 1,2,3,4,5) are equal to zero

ANOVA

             df   SS           MS            F             Significance F
Regression   5    64974.6426   12994.92852   60.62076828   1.14242E-31

At the 0.05 significance level, the critical value is:

F(0.05, 5, 124) = 2.2899

FT > FC indicates that there is enough evidence to reject the null hypothesis, which means that at least 1 factor affects the money spent by males aged over 27.
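The regression ANOVA row for males over 27 can be cross-checked against the descriptive figures reported earlier for the same group (130 respondents, sample variance 709.7350056), because the total sum of squares equals (n - 1) times the sample variance. The sketch below is an illustrative consistency check under that identity; the variable names are not from the report.

```python
n = 130                        # males over 27 (Count in the ANOVA summary)
sample_variance = 709.7350056  # variance of revenue/month for that group
ss_regression = 64974.6426     # SS in the regression ANOVA row
k = 5                          # number of predictors

ss_total = (n - 1) * sample_variance       # total sum of squares
ss_residual = ss_total - ss_regression     # what the model leaves unexplained
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)    # 124 residual degrees of freedom
f_ratio = ms_regression / ms_residual
r_squared = ss_regression / ss_total

print(f"F = {f_ratio:.4f}, R Square = {r_squared:.4f}")
```

The recomputed F ratio agrees with the reported 60.62076828, and the implied R Square of roughly 0.71 shows that the five factors explain far more of the variation within this group than in the pooled model of section III.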

Date posted: 09/08/2016, 14:49
