Statistics of Business
Instructor: Hồ Thanh Vũ
PROJECT
REPORT
Group:
Student name Student ID
• NGUYỄN THỊ ÁNH TUYẾT BABAWE12088
• HUỲNH THANH TONG BABAWE14143
• TRẦN HOÀNG ANH BABAWE14100
• TRẦN LÊ NGỌC THỊNH BABAWE14230
• TRẦN VŨ KHA BABAWE14113
VIETNAM NATIONAL UNIVERSITY OF HCMC
INTERNATIONAL UNIVERSITY
I Descriptive statistics about the data
II Method presentation and result interpretation for the statements
1 Women are more likely than men to shop online
2 There is some difference in the amount of money spent among age groups of women
3 There is some difference in the amount of money spent among age groups of men
III Multiple linear regression of revenue/month on the five factors
IV Multiple regression with the 5 factors for the following groups:
1 Male under 18
2 Male aged 18 – 27
3 Male over 27
4 Female under 18
5 Female aged 18 – 27
6 Female over 27
I Descriptive statistics about the data
In 2015, Lazada hired Neslien CA, a market research company, to study the factors that affect customer decisions when shopping at Lazada. Of the 1,500 surveys issued, 1,241 were returned (a response rate of 82.73%). The table and graph below show how the 1,241 responses are distributed by gender across age groups.
GENDER   MALE   FEMALE
<18      114    215
18-27    125    308
>27      130    349
Total    369    872
The survey covers five factors believed to influence customer decisions in online shopping: 1) Price, 2) Brand awareness, 3) Security, 4) Ease of payment, and 5) Promotion and Marketing.
Each factor is scored on the range [-3; 3], from the lowest to the highest evaluation consumers give Lazada's online service.
The responses on the 5 factors are summarized by the following descriptive statistics.
The mean, or average, is probably the most commonly used measure of central tendency.
In this case, the mean of Price is -0.0757, which means that, on average, consumers do not expect Lazada's prices to outperform those of other brands. Expectations are slightly higher for the other factors: Brand (0.359), Security (0.275), Payments (0.3158), and Promotions and Marketing (0.302). Still, all of these means are close to 0, so on average Lazada is not expected to clearly outperform other brands on them either.
The standard error is the standard deviation of the sampling distribution of a statistic, most commonly of the mean; for the mean, it equals the sample standard deviation divided by √n.
In this case, the standard errors of the 5 factors are close to each other: Price has 0.057, slightly higher than Brand (0.055), Security (0.056), Payments (0.056), and Promotions and Marketing (0.056).
The median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order and then locate the score in the center of the sample. In this case, the median of every factor is 0: Price (0), Brand (0), Security (0), Payments (0), and Promotions and Marketing (0). That is, for each factor the middle response is the neutral score 0.
The mode is the most frequently occurring value in the set of scores. To determine the mode, order the scores as above and count how often each one occurs; the most frequent value is the mode.
In this case, the mode of Price is -3: the single most common response rated Lazada's prices as completely outperformed by other brands. For the other 4 factors, the most common response is the highest score: Brand (3), Security (3), Payments (3), and Promotions and Marketing (3).
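As a minimal sketch of the measures above (assuming Python 3.8+ for `statistics.multimode`), the hypothetical helper `central_tendency` below computes the mean, the standard error of the mean, the median and the mode of a list of scores. The ratings are made up for illustration and are not the survey data. The last lines check the reported standard error of Price against s/√n, using the reported SD of Price (2.022) and the 1,241 responses.

```python
import math
import statistics


def central_tendency(scores):
    """Mean, standard error of the mean, median and mode(s) of a list of scores."""
    n = len(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 denominator)
    return {
        "mean": statistics.mean(scores),
        "standard_error": sd / math.sqrt(n),   # SE of the mean = s / sqrt(n)
        "median": statistics.median(scores),   # middle value of the sorted scores
        "mode": statistics.multimode(scores),  # most frequent value(s), as a list
    }


# Hypothetical ratings on the [-3; 3] scale (illustration only, not survey data)
ratings = [-3, -3, -3, -2, 0, 0, 1, 2, 3]
print(central_tendency(ratings))

# Check the reported SE of Price from its reported SD and the number of responses
price_se = 2.022 / math.sqrt(1241)
print(round(price_se, 3))
```

The mode is returned as a list because a set of scores can have several equally frequent values.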
The standard deviation (σ, or s for a sample) is a measure used to quantify the amount of variation or dispersion of a set of data values. A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values. In this case, Price has the highest SD (2.022), followed by Security (1.991), Promotion and Marketing (1.972), Payments (1.967), and Brand (1.955).
The variance is the expectation of the squared deviation of a random variable from its mean (the square of the standard deviation), and it informally measures how far a set of (random) numbers is spread out from its mean. In this case, Price also has the highest variance (4.091), followed by Security (3.966), Promotion and Marketing (3.891), Payments (3.869), and Brand (3.823).
Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, negative, or even undefined. In this case, Price has the highest skewness (0.053), followed by Payments (-0.08), Security (-0.112), Promotion and Marketing (-0.113), and Brand (-0.114). All five values are close to 0, so the score distributions are nearly symmetric.
Kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosis describes the shape of a probability distribution, and there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample. Depending on the measure used, kurtosis can be interpreted in various ways. In this case, Brand has the highest kurtosis (-1.216), followed by Security (-1.230), Promotion and Marketing (-1.249), Payments (-1.253), and Price (-1.259). Excel reports excess kurtosis (0 for a normal distribution), so these negative values indicate flatter distributions with lighter tails than the normal distribution, with scores spread fairly evenly across the scale.
The range is simply the difference between the highest and the lowest score and is determined by subtraction. If the range is small, the scores are close together; if it is large, the scores are more spread out. In this case, all five factors have the same range: 3 - (-3) = 6.
The maximum and minimum also appear in the calculation of other summary statistics; in particular, the range is simply the difference between the maximum and the minimum. In this case, all five factors have the same minimum (-3) and maximum (3).
The count is the number of survey responses collected from Lazada consumers: 1,241.
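The dispersion and shape measures above can be sketched as follows. We assume the report's figures come from Excel, so the hypothetical helper `dispersion_and_shape` uses the sample formulas Excel documents for STDEV.S/VAR.S, SKEW and KURT (KURT returns excess kurtosis). SKEW needs at least 3 values and KURT at least 4. The example sample is hypothetical: one response at each score from -3 to 3.

```python
import statistics


def dispersion_and_shape(scores):
    """Sample SD, variance, Excel-style skewness and excess kurtosis, min, max, range, count."""
    n = len(scores)  # SKEW needs n >= 3 and KURT needs n >= 4
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)              # sample standard deviation
    z = [(x - mean) / s for x in scores]      # standardized scores
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))  # excess kurtosis, 0 for a normal
    return {
        "sd": s,
        "variance": statistics.variance(scores),  # s squared
        "skewness": skew,
        "kurtosis": kurt,
        "min": min(scores),
        "max": max(scores),
        "range": max(scores) - min(scores),
        "count": n,
    }


# Hypothetical, perfectly even spread over the 7 possible scores
even_spread = [-3, -2, -1, 0, 1, 2, 3]
print(dispersion_and_shape(even_spread))
```

For this even spread, the skewness is 0 and the excess kurtosis is -1.2. This is close to the survey's values (about -1.22 to -1.26), which is consistent with ratings spread fairly evenly over the scale.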
The same indicators are used to describe Lazada's revenue/month from the surveyed consumers, i.e., the money each of them spends per month shopping online on Lazada:
REVENUE/MONTH
Mean 167.3296132
Standard Error 11.20328795
Standard Deviation 394.6675223
Sample Variance 155762.4532
Kurtosis 96.70825409
Skewness 9.86533105
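On average, each consumer spends 167.33 per month.
The most common monthly amount spent by Lazada consumers is 139.95.
The standard deviation is 394.6675223 and the sample variance is 155762.4532. The very large skewness (9.87) and kurtosis (96.71) show that the distribution has a long right tail: a small number of consumers spend far more than the rest.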
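The standard error and the sample variance in the REVENUE/MONTH table both follow from the standard deviation. The quick check below recomputes them from the reported SD and the 1,241 responses, as a consistency sketch.

```python
import math

# Figures reported in the REVENUE/MONTH table
n = 1241
sd = 394.6675223

standard_error = sd / math.sqrt(n)  # SE of the mean = s / sqrt(n)
variance = sd ** 2                  # sample variance = s squared

print(round(standard_error, 4), round(variance, 2))
```

Both values agree with the table (11.2033 and 155762.45).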
II Illustrate the following issues
1 Are women more likely than men to shop online?
We can accept this statement if the average amount of money a woman spends on online shopping is significantly larger than that of a man.
• First, set up a hypothesis test in which µ1 and µ2 are the mean monthly spending of female and male consumers, respectively:
H0: µ1 - µ2 ≤ 0, indicating that women do not spend more money than men
H1: µ1 - µ2 > 0, indicating that women spend more money than men
• Next, use Excel's z-Test: Two Sample for Means tool. We assume normally distributed populations and independent random samples. The population variances are not given, but with these large samples (872 women and 369 men) the sample variances are reasonable estimates of them.
z-Test: Two Sample for Means
                               female         male
Hypothesized Mean Difference   0
P(Z<=z) one-tail               0.000146106
z Critical one-tail            1.644853627
P(Z<=z) two-tail               0.000292212
z Critical two-tail            1.959963985
This is a right-tailed z-test. The one-tail p-value (0.000146) is far below α = 0.05. This means the test statistic z exceeds the one-tail critical value (1.645), so there is enough evidence to reject H0. From the data given, we can conclude that women tend to spend more money on online shopping than men.
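The z statistic itself is not shown in the output above. The sketch below recomputes it from the per-age-group summaries (count, sum, variance) reported in the ANOVA tables of this part. It first pools each gender's three age groups into one mean and one sample variance, then applies the two-sample z formula with a one-tail p-value. Two assumptions apply. First, that the Excel tool was given the sample variances. Second, that the garbled female sums read 29699.99 and 69024.64. The helper names `pool` and `z_test_right_tailed` are hypothetical.

```python
import math


def pool(counts, sums, variances):
    """Combine per-group summaries into one group's size, mean and sample variance."""
    n = sum(counts)
    mean = sum(sums) / n
    within = sum((c - 1) * v for c, v in zip(counts, variances))
    between = sum(c * (s / c - mean) ** 2 for c, s in zip(counts, sums))
    return n, mean, (within + between) / (n - 1)  # total SS / (n - 1)


def z_test_right_tailed(n1, mean1, var1, n2, mean2, var2, diff0=0.0):
    """Two-sample z statistic for H1: mu1 - mu2 > diff0, with its one-tail p-value."""
    z = (mean1 - mean2 - diff0) / math.sqrt(var1 / n1 + var2 / n2)
    p_one_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
    return z, p_one_tail


# Age groups <18, 18-27, >27 (counts, sums, variances from the ANOVA summaries)
female = pool([215, 308, 349], [29699.99, 69024.64, 62171.33],
              [77936.8052, 358031.9505, 185320.5575])
male = pool([114, 125, 130], [15070.32, 16344.47, 15345.3],
            [627.0838352, 456.4155972, 709.7350056])

z, p = z_test_right_tailed(*female, *male)
print(f"z = {z:.3f}, one-tail p = {p:.6f}")
```

Under these assumptions, z comes out at about 3.62, well above 1.645. The p-value should be close to the reported 0.000146.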
2 For the group of women, is there any difference in the amount of money spent at different ages?
To see whether the amount of money spent differs between ages in the women group, we compare the average money spent per woman in each age group. Using the ANOVA testing method and the Excel tools, we reach the following conclusion.
• First, set up the hypotheses, with the average money spent by women under 18 (μ1), women aged 18 – 27 (μ2), and women over 27 (μ3):
H0: μ1 = μ2 = μ3
• there is no significant difference in the money spent among the age groups of women
H1: Not all μi (i = 1,2,3) are equal
• there is a significant difference in the money spent by at least one age group of women
• Next, use the analysis tool in Excel to build the ANOVA table and obtain the F ratio
SUMMARY
Groups   Count   Sum        Average       Variance
<18YR    215     29699.99   138.1394884   77936.8052
18-27    308     69024.64   224.105974    358031.9505
>27YR    349     62171.33   178.1413467   185320.5575
ANOVA
Source of Variation   SS           df    MS           F
Between Groups        959348.650   2     479674.325   2.18141224
Within Groups         191085839    869   219891.644
The test statistic value:
FT = F-ratio = 2.1814
At α = 0.05, the critical value is:
FC = F(2, 869, 0.05) = 3.0061
Thus, at the 0.05 level of significance, we cannot reject the null hypothesis since FT < FC. Based on the ANOVA table and the hypothesis test, there is no significant difference in the amount of money spent on online shopping among the age groups of women.
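The F ratios in this part can be reproduced from the group summaries alone. The sketch below is a minimal one-way ANOVA from each group's count, mean and sample variance. The between-groups SS weights each group's squared distance from the grand mean, and the within-groups SS adds up (count - 1) × variance. The function name is hypothetical. The female counts (215, 308, 349) are taken from the gender and age table, and the male figures from the SUMMARY table in question 3. Critical values are not computed here; compare with the FC values quoted in the text.

```python
def anova_from_summary(counts, means, variances):
    """One-way ANOVA (SS, df and F ratio) from per-group count, mean and sample variance."""
    k, n = len(counts), sum(counts)
    grand_mean = sum(c * m for c, m in zip(counts, means)) / n
    ss_between = sum(c * (m - grand_mean) ** 2 for c, m in zip(counts, means))
    ss_within = sum((c - 1) * v for c, v in zip(counts, variances))
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Women: <18, 18-27, >27
female = anova_from_summary([215, 308, 349],
                            [138.1394884, 224.105974, 178.1413467],
                            [77936.8052, 358031.9505, 185320.5575])
# Men: <18, 18-27, >27
male = anova_from_summary([114, 125, 130],
                          [132.1957895, 130.75576, 118.0407692],
                          [627.0838352, 456.4155972, 709.7350056])

print("female F =", round(female[4], 4))  # compare with FC = F(2, 869, 0.05) = 3.0061
print("male F =", round(male[4], 4))      # compare with FC = F(2, 366, 0.05) = 3.0204
```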
3 For the group of men, is there any difference in the amount of money spent at different ages?
To reach a conclusion, we apply the same method and tools as in the previous question.
• First, set up the hypotheses for the mean revenue/month of men under 18 (μ1), men aged 18 – 27 (μ2), and men over 27 (μ3):
H0: μ1 = μ2 = μ3
H1: Not all μi (i = 1,2,3) are equal
• Next, use Excel to build the Anova: Single Factor table
SUMMARY
Groups   Count   Sum        Average       Variance
<18YR    114     15070.32   132.1957895   627.0838352
18-27    125     16344.47   130.75576     456.4155972
>27YR    130     15345.3    118.0407692   709.7350056
ANOVA
Source of Variation   SS            df    MS            F            P-value       F crit
Between Groups        15246.90229   2     7623.451145   12.7398744   4.47957E-06   3.020386874
Within Groups         219011.8232   366   598.3929594
The test statistic value:
FT = F-ratio = 12.7399
At α = 0.05, FC = F(2, 366, 0.05) = 3.0204
Since FT > FC, there is enough evidence to reject the null hypothesis H0 (the p-value, 4.48E-06, is also far below 0.05).
Based on the ANOVA table and the hypothesis test, there are differences in the amount of money spent among the age groups of men.
III The multiple linear regression of revenue/month on the five factors
• First, we conduct a hypothesis test based on the ANOVA table to check whether revenue/month depends on the 5 factors.
The coefficients of price, brand awareness, security, ease of payment, and promotion and marketing are β1, β2, β3, β4, and β5, respectively.
H0: β1 = β2 = β3 = β4 = β5 = 0, indicating that there is no regression relationship between revenue and any of the 5 factors
H1: not all βi (i = 1,2,3,4,5) are zero; revenue depends on at least 1 of the 5 factors
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.28280788
R Square            0.079980297
Adjusted R Square
Standard Error      379.3213759
Observations        1241
ANOVA
             df     SS            MS            F             Significance F
Regression   5      15447829.79   3089565.957   21.47251115   1.188E-20
Residual     1235   177697612.1   143884.7062
Total        1240   193145441.9
At the 0.05 level of significance, the critical value is:
Fcrit = F(0.05, 5, 1235) = 2.22132601
Since the F ratio (21.4725) > Fcrit, there is enough evidence to reject H0. Thus, revenue/month depends on at least 1 of the 5 factors.
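The overall-fit figures in the SUMMARY OUTPUT are all functions of the two sums of squares in the ANOVA table. As a consistency sketch (the helper name `regression_anova` is hypothetical), the code recomputes R Square, Multiple R, Adjusted R Square, the standard error of the regression and the F ratio from SS Regression, SS Residual, n = 1241 and k = 5 predictors.

```python
import math


def regression_anova(ss_regression, ss_residual, n, k):
    """Overall-fit statistics of a multiple regression with k predictors and n observations."""
    ss_total = ss_regression + ss_residual
    df_reg, df_res = k, n - k - 1
    ms_reg, ms_res = ss_regression / df_reg, ss_residual / df_res
    r2 = ss_regression / ss_total
    return {
        "R Square": r2,
        "Multiple R": math.sqrt(r2),
        "Adjusted R Square": 1 - (1 - r2) * (n - 1) / df_res,
        "Standard Error": math.sqrt(ms_res),  # square root of the residual mean square
        "F": ms_reg / ms_res,
    }


stats = regression_anova(15447829.79, 177697612.1, n=1241, k=5)
for name, value in stats.items():
    print(f"{name}: {value:.6f}")
```

R Square is only about 0.08, so although the F test is highly significant, the five factors explain a small share of the variation in revenue/month.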
• Now we can construct a regression model from the regression analysis table below, produced with Excel's Data Analysis tool from the given data:
Variable        Coefficients   Standard Error   t Stat   P-value   Lower 95%
PRICE (X1)      37.530
BRAND (X2)      7.383
SECURITY (X3)   11.926
P & M (X5)      36.557         5.464            6.691    0.000     25.838
• From the table of coefficients, we can set up the regression equation as follows:
Y = 150.35 + 37.53X1 + 7.38X2 + 11.93X3 + 8.98X4 + 36.56X5 + ε
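To show how the fitted equation is read, the sketch below evaluates it for a consumer's five factor scores. It takes the coefficients from the table of coefficients (the intercept 150.35 and the X4 coefficient 8.98 come from the equation, since the table rows for them are missing). The helper `predict_revenue` and the input scores are hypothetical.

```python
# Intercept and slopes X1..X5 (price, brand, security, payments, promotion & marketing)
INTERCEPT = 150.35
SLOPES = (37.530, 7.383, 11.926, 8.98, 36.557)


def predict_revenue(scores):
    """Predicted revenue/month for one consumer's five factor scores, each in [-3, 3]."""
    if len(scores) != len(SLOPES):
        raise ValueError("expected five factor scores")
    if any(not -3 <= s <= 3 for s in scores):
        raise ValueError("scores must lie in [-3, 3]")
    return INTERCEPT + sum(b * x for b, x in zip(SLOPES, scores))


print(predict_revenue([0, 0, 0, 0, 0]))  # all factors neutral: the intercept
print(predict_revenue([1, 0, 0, 0, 0]))  # one point higher on price only
```

Each slope is the expected change in revenue/month for a one-point increase in that factor, holding the other four fixed. For example, one extra point on price adds about 37.53.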
• To test whether the variables of the regression model are significant, we conduct a t-test on each individual regression parameter. Our null and alternative hypotheses for each variable:
H0: β1 = 0
H1: β1 ≠ 0
H0: β2 = 0
H1: β2 ≠ 0
H0: β3 = 0
H1: β3 ≠ 0
H0: β4 = 0
H1: β4 ≠ 0
H0: β5 = 0
H1: β5 ≠ 0
The critical value at the 0.05 level of significance, df = 1235:
T critical = 1.962
• The conclusions are summarized in the table below. Thus, among the 5 factors, price, security, and promotion and marketing are expected to have a significant effect on revenue/month.
PRICE (β1)      |T test| > T critical   Reject H0: β1 = 0
BRAND (β2)      |T test| < T critical   Non reject
SECURITY (β3)   |T test| > T critical   Reject
PAYMENTS (β4)   |T test| < T critical   Non reject
P & M (β5)      |T test| > T critical   Reject
IV Regression models for specific groups
1 Male<18
H0: β1 = β2 = β3 = β4 = β5 = 0
H1: not all βi (i = 1,2,3,4,5) are equal to zero
ANOVA
Source       df   MS           F             Significance F
Regression   5    7553.15872   24.64870935   1.7193E-16
At the 0.05 level of significance, the critical value is:
F(0.05, 5, 108) ≈ 2.30
FT > FC indicates that there is enough evidence to reject the null hypothesis, which means that at least 1 factor affects the money spent by males under 18.
The regression table for the group of male under 18:
Coefficients Standard Error t Stat
Intercept 124.4638167 1.848918513 67.31709153
PRICE 0.689368267 0.816347265 0.844454678
BRAND -0.993916385 0.813290658 -1.222092466
SECURITY 6.233068927 0.949998207 6.561137566
PAYMENTS 4.627027576 0.947222083 4.884839213
P & M 6.866677223 0.833018299 8.243128913
From the table, we can set up the regression equation as follows:
Y = 124.464 + 0.689X1 - 0.994X2 + 6.233X3 + 4.627X4 + 6.867X5 + ε
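Our null and alternative hypotheses for each variable:
H0: βi = 0
H1: βi ≠ 0 (i = 1,2,3,4,5)
With α = 0.05 (α/2 = 0.025) and 114 - 5 - 1 = 108 residual degrees of freedom, the critical value is ±tC = ±t(108, 0.025) = ±1.9822. Based on the table of coefficients, we can conclude that security (t = 6.561), payments (t = 4.885), and promotion and marketing (t = 8.243) have significant effects on the money spent by males under 18, while price (t = 0.844) and brand (t = -1.222) do not.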
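The t test for each coefficient divides the coefficient by its standard error and compares the absolute value with the two-tailed critical value. As a minimal sketch (the helper `t_test_decisions` is hypothetical), the code applies this rule to the male-under-18 regression table with tC = 1.9822.

```python
# Male under 18: (coefficient, standard error) from the regression table
coefficients = {
    "PRICE": (0.689368267, 0.816347265),
    "BRAND": (-0.993916385, 0.813290658),
    "SECURITY": (6.233068927, 0.949998207),
    "PAYMENTS": (4.627027576, 0.947222083),
    "P & M": (6.866677223, 0.833018299),
}
T_CRITICAL = 1.9822  # t(108, 0.025): two-tailed test at alpha = 0.05


def t_test_decisions(coefs, t_crit):
    """t statistic (coefficient / SE) and the two-tailed decision for each variable."""
    results = {}
    for name, (b, se) in coefs.items():
        t_stat = b / se
        decision = "Reject H0" if abs(t_stat) > t_crit else "Non reject"
        results[name] = (round(t_stat, 4), decision)
    return results


for name, (t_stat, decision) in t_test_decisions(coefficients, T_CRITICAL).items():
    print(f"{name:9s} t = {t_stat:8.4f}  {decision}")
```

The comparison uses |t|, so a large negative coefficient would also count as significant.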
2 Case 2: Male 18-27
H0: β1 = β2 = β3 = β4 = β5 = 0
H1: not all βi (i = 1,2,3,4,5) are equal to zero
ANOVA
Source       df   Significance F
Regression   5    2.23276E-26
At the 0.05 level of significance, the critical value is:
F(0.05, 5, 119) = 2.2899
FT > FC indicates that there is enough evidence to reject the null hypothesis (the Significance F, 2.23276E-26, is far below 0.05), which means that at least 1 factor affects the money spent by males aged 18-27.
The regression table for the group of male aged 18-27:
Coefficients Standard Error t Stat P-value Lower 95%
Intercept 122.3481617 1.272018764 96.18424284 1.1E-114 119.8294375
PRICE -0.508473755 0.561728716 -0.905194519 0.367191378 -1.620752717
BRAND 0.611516825 0.578680059 1.05674425 0.292768888 -0.534327488
SECURITY 5.899148815 0.624324642 9.448848274 3.78409E-16 4.662923669
PAYMENTS 5.983963271 0.591661516 10.11382879 9.92225E-18 4.812414376
P & M -1.027075128 0.572507111 -1.793995407 0.075353189 -2.160696388
From the table, we can set up the regression equation as follows:
Y = 122.348 - 0.508X1 + 0.612X2 + 5.899X3 + 5.984X4 - 1.027X5 + ε
• Our null and alternative hypotheses for each variable are the same as before: H0: βi = 0, H1: βi ≠ 0 (i = 1,2,3,4,5).
With α = 0.05 (α/2 = 0.025) and 125 - 5 - 1 = 119 residual degrees of freedom, the critical value is ±tC = ±t(119, 0.025) = ±1.9801.
PRICE (β1)      |T test| < T critical   Non reject H0: β1 = 0   Removable
BRAND (β2)      |T test| < T critical   Non reject              Removable
SECURITY (β3)   |T test| > T critical   Reject                  Significant
PAYMENTS (β4)   |T test| > T critical   Reject                  Significant
P & M (β5)      |T test| < T critical   Non reject              Removable
Based on the table of coefficients, we can conclude that security and payments have significant effects on the money spent by males aged 18-27. Price, brand, and promotion and marketing do not; for promotion and marketing, t = -1.794 and the P-value is 0.075.
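The Lower 95% column gives the same verdicts as the t tests. A coefficient is significant at α = 0.05 exactly when its 95% confidence interval, coefficient ± tC × standard error, excludes 0. The sketch below (hypothetical helper names) rebuilds the intervals for the male 18-27 table with tC = 1.9801. It assumes that the PRICE and P & M coefficients are negative, as their negative t statistics and lower bounds imply.

```python
# Male aged 18-27: (coefficient, standard error) from the regression table
rows = {
    "PRICE": (-0.508473755, 0.561728716),
    "BRAND": (0.611516825, 0.578680059),
    "SECURITY": (5.899148815, 0.624324642),
    "PAYMENTS": (5.983963271, 0.591661516),
    "P & M": (-1.027075128, 0.572507111),
}
T_CRITICAL = 1.9801  # t(119, 0.025) quoted in the text


def confidence_interval(coef, se, t_crit=T_CRITICAL):
    """95% confidence interval: coefficient -/+ t_crit * standard error."""
    margin = t_crit * se
    return coef - margin, coef + margin


def excludes_zero(coef, se, t_crit=T_CRITICAL):
    """True when the interval excludes 0, i.e. |coef / se| > t_crit (reject H0: beta = 0)."""
    low, high = confidence_interval(coef, se, t_crit)
    return low > 0 or high < 0


for name, (b, se) in rows.items():
    low, high = confidence_interval(b, se)
    verdict = "Significant" if excludes_zero(b, se) else "Removable"
    print(f"{name:9s} [{low:8.4f}, {high:8.4f}]  {verdict}")
```

Only SECURITY and PAYMENTS have intervals that lie entirely above 0, which matches the decision table.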
3 Case 3: Male > 27
H0: β1 = β2 = β3 = β4 = β5 = 0
H1: not all βi (i = 1,2,3,4,5) are equal to zero
ANOVA
Source       df   SS           MS            F             Significance F
Regression   5    64974.6426   12994.92852   60.62076828   1.14242E-31
At the 0.05 level of significance, the critical value is:
F(0.05, 5, 124) = 2.2899
FT > FC indicates that there is enough evidence to reject the null hypothesis, which means that at least 1 factor affects the money spent by males over 27.
PRICE (β1)      |T test| < T critical   Non reject H0: β1 = 0
BRAND (β2)      |T test| < T critical   Non reject
SECURITY (β3)   |T test| > T critical   Reject
PAYMENTS (β4)   |T test| > T critical   Reject
P & M (β5)      |T test| < T critical   Non reject
Thus, for males over 27, security and payments have significant effects on the money spent, while price, brand, and promotion and marketing do not.
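Only the Regression row of this ANOVA table is shown. The residual part can still be recovered, because F = MS Regression / MS Residual. The sketch below derives the residual SS and R Square for males over 27 from the reported SS, df and F, with n = 130 respondents in this group. As a cross-check, the resulting total SS should match 129 × 709.7350056, the total SS implied by the >27 male variance in the ANOVA summary of Part II.

```python
# Male over 27: Regression row of the ANOVA table
ss_regression, df_regression, f_ratio = 64974.6426, 5, 60.62076828
n = 130                                    # respondents in the >27 male group
df_residual = n - df_regression - 1        # 124

ms_regression = ss_regression / df_regression
ms_residual = ms_regression / f_ratio      # since F = MS_regression / MS_residual
ss_residual = ms_residual * df_residual
r_square = ss_regression / (ss_regression + ss_residual)

print(round(ms_residual, 3), round(ss_residual, 1), round(r_square, 4))
```

Under these figures, the five factors explain roughly 71% of the variation in spending for this group, a far better fit than the all-respondent model in Part III.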