Case study analysis academic performance of university students by TWO WAY ANOVA test


HANOI UNIVERSITY FACULTY OF MANAGEMENT AND TOURISM

STATISTICS FOR ECONOMICS

(FALL 2019)

Case study analysis: Academic Performance of

University Students

by TWO-WAY ANOVA Test

Instructor: Ms Lai Hoai Phuong Tutorial 5 - Group 6

Group members:

TABLE OF CONTENTS

I Introduction

II Answering questions

Question 1

Question 2

Question 3

Question 4

Question 5

Question 6

I INTRODUCTION

Analysis of variance (ANOVA) is a statistical technique that assesses potential differences in a scale-level dependent variable across the categories of a nominal-level variable with two or more categories. The method divides the aggregate variability in a dataset into two parts: variability due to systematic factors and variability due to random factors. Two types of ANOVA are commonly used to determine whether differences exist among population means: one-way and two-way.

A one-way ANOVA has just one independent variable and estimates the effect of a single factor on a response variable. A two-way ANOVA uses two independent variables. In this case study, we examine the relationship, if any, between classroom seating position and academic performance (GPA) for female and male students at a large university in the United States, using the two-way ANOVA method. The aim of our project is to show how the two-way ANOVA model can be applied to a real case study.

II ANSWERING THE QUESTIONS

1 What inference technique should be considered for this study? Explain.

The objective of the survey is to test for a significant interaction between classroom seating position and gender, and to test for significant differences in academic performance (GPA) due to seat preference and gender. The suitable inference technique for this study is therefore the two-way ANOVA model. Two-way ANOVA compares the mean differences among groups that have been split by two independent factors, each with several levels. Respondents were asked to specify one of three levels of seat preference: “front”, “middle” and “back”, so seating position is the first factor, with 3 levels. The second factor is gender, with 2 levels: male and female. With two factors, two-way ANOVA can also reveal the interaction between them. Each combination of factor levels is called a cell, so the combinations of seat and gender give 3 × 2 = 6 cells.
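The design described above corresponds to the usual two-way ANOVA model with interaction. As a sketch in conventional textbook notation (the symbols are an assumption and are not taken from the report), let $i$ index the 3 seat levels, $j$ the 2 genders, and $k$ the students within a cell:

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2)
```

Here $\alpha_i$ is the seat effect, $\beta_j$ is the gender effect, and $(\alpha\beta)_{ij}$ is the interaction term, which is zero for every cell when the two factors do not interact.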

2 Produce descriptive statistics for the dataset. You are expected to generate as many relevant descriptive statistics as possible using ALL the relevant tools introduced in the labs of this course. Remember to provide appropriate interpretations for the descriptive statistics. Try not to include unnecessary or irrelevant descriptive statistics.

2.1 Sample size

The sample consists of 300 respondents, which is large enough for the analysis, and the observations are treated as independent because the respondents are assumed to be randomly selected. The GPA observations are assumed to be normally distributed; this is checked in Question 3. There are three variables: GPA, Gender (male, female) and Seat (front, middle and back).

2.2 Mean and Standard deviation

We want the mean and standard deviation of GPA for each group defined by the two other variables, so we first convert Gender and Seat into factors with the factor() function, then use the by() function to get each statistic for every combination of the two factors at once.

Convert the variables Gender and Seat into factors and produce a cross-tabulation of Gender and Seat:

❖ StudentSurvey$Gender <- factor(StudentSurvey$Gender, levels=c("Male","Female"))

❖ StudentSurvey$Seat <- factor(StudentSurvey$Seat, levels=c("Back","Front","Middle"))


❖ table(StudentSurvey$Seat,StudentSurvey$Gender)

❖ Mean of GPA for each combination of Seat and Gender :


❖ Standard deviation of GPA for each combination of Seat and Gender:

From this output, the highest standard deviation belongs to the combination of back seat and male gender, at 0.4958685, and the lowest, 0.3795011, belongs to the combination of front seat and female gender.
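The by() calls behind the two outputs above can be sketched as follows. The real StudentSurvey.csv is not reproduced in this report, so the data frame below is a small simulated stand-in (an assumption, with 10 students per cell); its numbers will not match the values quoted above.

```r
# Hypothetical stand-in for StudentSurvey.csv: 60 simulated students, 10 per cell.
set.seed(1)
StudentSurvey <- data.frame(
  Gender = factor(rep(c("Male", "Female"), each = 30), levels = c("Male", "Female")),
  Seat   = factor(rep(c("Back", "Front", "Middle"), times = 20),
                  levels = c("Back", "Front", "Middle")),
  GPA    = round(rnorm(60, mean = 3.1, sd = 0.4), 2)
)

# Grouping variables: every Seat x Gender combination is one cell.
cells <- list(Seat = StudentSurvey$Seat, Gender = StudentSurvey$Gender)

# Mean and standard deviation of GPA for each cell, as in the report.
print(by(StudentSurvey$GPA, cells, mean))
print(by(StudentSurvey$GPA, cells, sd))

# The same statistics as 3 x 2 matrices, which are easier to compare.
cell_means <- tapply(StudentSurvey$GPA, cells, mean)
cell_sds   <- tapply(StudentSurvey$GPA, cells, sd)
```

tapply() is used only as a compact alternative to by(); both group GPA by the same two factors.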

2.3 Boxplot and mean plot

❖ Graphical description

boxplot(GPA ~ interaction(Seat,Gender), data = StudentSurvey, xlab = "Seat and Gender", ylab = "expected GPA", col = c("red", "blue", "yellow","grey","pink","green"))


Judging from the boxplots above, female students have more stable GPAs across seat groups than male students. Among male students, the lowest GPA appears in the group that preferred the back, while among female students it appears in the middle group. The black line, which represents the median of each group, is highest in “Front.Female” and lowest in “Back.Male”. Furthermore, there are seven outliers in total across the boxplots.

➢ install.packages("gplots")

➢ library(gplots)

➢ plotmeans(GPA ~ interaction(Seat,Gender), data = StudentSurvey, xlab = "Seat and Gender", ylab = "expected GPA", main="Mean Plot with 95% CI")


The mean plot shows the mean GPA of each combination together with its 95% confidence interval. The front seat combined with female gender has the highest mean GPA, at more than 3.3, followed by “Back.Female” at nearly 3.2; the lowest is “Back.Male”, at only 3.0.

3 Check all the assumptions of the inference technique you suggest in Question 1. Are the assumptions satisfied? Explain.

There are 3 assumptions required to use two-way ANOVA:

Samples are independent, simple random samples.

All populations are normally distributed.

All populations have the same standard deviation: $\sigma_1 = \sigma_2 = \dots = \sigma_k$

3.1 Samples are independent, simple random samples

An independent sample is one that has no connection to any other sample: the occurrence of one observation does not influence the probability of another.

table(StudentSurvey$Seat,StudentSurvey$Gender)

As can be seen, the survey has a total sample size of 300 observations, provided in the accompanying file StudentSurvey.csv, and consists of six groups: Back-Male, Back-Female, Front-Male, Front-Female, Middle-Male and Middle-Female. Since there is no information on how respondents were selected, we assume that they were chosen randomly. Each response came from a different person, and his/her answer is not affected by anyone else's. Therefore, we treat the samples as independent and randomly selected.

3.2 All populations have the same standard deviation


To check whether all populations have the same standard deviation, we look at the ratio of the largest sample standard deviation to the smallest one. If this ratio is smaller than 2, we can treat the population standard deviations as equal.

From the by() output in Question 2, the largest SD is 0.4958685, while the smallest SD is 0.3795011. The ratio of the two is about 1.31, which is smaller than 2. Therefore, we can conclude that the assumption of equal standard deviations is reasonable.
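The rule of thumb above is simple arithmetic on the two standard deviations reported in Question 2. A minimal sketch (the variable names are illustrative only):

```r
# Largest and smallest cell standard deviations quoted in the report.
sd_largest  <- 0.4958685  # Back seat, Male
sd_smallest <- 0.3795011  # Front seat, Female

# Rule of thumb: equal spread is plausible when largest / smallest < 2.
ratio <- sd_largest / sd_smallest
equal_sd_plausible <- ratio < 2

print(round(ratio, 2))
print(equal_sd_plausible)
```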

Another technique for checking this assumption is the Levene test. This test checks the homogeneity of variance, so its null hypothesis is that all the group variances are equal. We compare the p-value of the Levene test with our significance level (α = 0.05). The rejection rule is to reject Ho if the p-value is smaller than α.

The Levene test is in the “car” package, so the “car” package must be installed first.

R code:

install.packages("car")

library(car)

leveneTest(StudentSurvey$GPA, interaction(StudentSurvey$Seat, StudentSurvey$Gender), center = mean)

The outcome:

Levene's Test for Homogeneity of Variance (center = mean)

       Df  F value  Pr(>F)
group   5   1.1739   0.322
      294

The p-value of the test is 0.322, which is larger than our α of 0.05. Therefore we do not reject the null hypothesis, and we cannot conclude that the standard deviations are different.
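The p-value in the Levene output can be cross-checked from the reported F value and degrees of freedom alone, using the upper tail of the F distribution. This is only a sanity check on the printed table, not a re-run of the test on the data:

```r
# Values copied from the Levene test output above.
f_value <- 1.1739
df1 <- 5     # groups - 1 = 6 - 1
df2 <- 294   # observations - groups = 300 - 6
alpha <- 0.05

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_value, df1, df2, lower.tail = FALSE)
reject_h0 <- p_value < alpha

print(round(p_value, 3))
print(reject_h0)
```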

However, since the ratio is smaller than 2, conducting the Levene test is not strictly necessary in this case. If the ratio were larger than 3, we should choose another test instead of two-way ANOVA.

3.3 All populations are normal distributions

We can check normality using a Q-Q plot of the residuals (the Q-Q plot was made in RStudio) with this code and output:

library(car)

qqPlot(lm(GPA ~ Gender * Seat, data = StudentSurvey), simulate = TRUE, main = "Q-Q Plot", labels = FALSE)

The outcome:

The Q-Q plot shows that all points, including the outliers, lie within the confidence envelope, which suggests that the residuals are approximately normal and that the normality assumption is satisfied.

4 Perform the inference technique you suggest in Question 1. Remember to provide all the necessary steps. What are your interpretations and conclusions? Explain.

Two-way ANOVA test:

Step 1: Identify the null and alternative hypotheses:

Ho: There is no significant interaction between seat preference and gender in their effect on GPA

Ha: There is a significant interaction between seat preference and gender in their effect on GPA

Step 2: Test statistic and p-value:

We used RStudio for the calculation and obtained the following output:

> StudentSurvey.result<-aov(GPA ~ Gender*Seat, data = StudentSurvey)

> summary(StudentSurvey.result)

Df Sum Sq Mean Sq F value Pr(>F)

---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Step 3: Level of significance: α=0.05

Step 4: Decision rule:

Reject Ho if p-value < α

From the R output, the interaction between seat preference and gender has a p-value of 0.0339 < α.

Therefore we reject Ho: the interaction effect between seat preference and gender is significant.

Step 5: Conclusion:

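We have enough statistical evidence to conclude that there is a significant interaction between seat preference and gender in their effect on GPA; that is, the effect of seat preference on GPA differs between male and female students.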
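The five steps above can be sketched in R as follows. Because StudentSurvey.csv is not reproduced here, the data frame is a simulated stand-in (an assumption), so the p-value it produces is not the 0.0339 reported above; the point is only how the interaction row is read from the aov() table and compared with α.

```r
# Hypothetical stand-in for StudentSurvey.csv: 60 simulated students, 10 per cell.
set.seed(1)
StudentSurvey <- data.frame(
  Gender = factor(rep(c("Male", "Female"), each = 30), levels = c("Male", "Female")),
  Seat   = factor(rep(c("Back", "Front", "Middle"), times = 20),
                  levels = c("Back", "Front", "Middle")),
  GPA    = round(rnorm(60, mean = 3.1, sd = 0.4), 2)
)

# Two-way ANOVA with interaction, as in the report.
StudentSurvey.result <- aov(GPA ~ Gender * Seat, data = StudentSurvey)
anova_table <- summary(StudentSurvey.result)[[1]]

# Row names are padded with spaces in the summary, so trim them first.
rownames(anova_table) <- trimws(rownames(anova_table))

# Decision rule: reject Ho when the interaction p-value is below alpha.
alpha <- 0.05
p_interaction <- anova_table["Gender:Seat", "Pr(>F)"]
decision <- if (p_interaction < alpha) "Reject Ho" else "Do not reject Ho"

print(anova_table)
print(decision)
```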

5 Draw an interaction plot and interpret the plot.

Since there is a significant interaction between gender and seat in their effect on GPA, we draw the interaction plot with the following code:

R code:

interaction.plot(StudentSurvey$Gender, StudentSurvey$Seat, StudentSurvey$GPA, type = "b", col = c("red", "blue"), pch = c(16, 18), main = "Interaction between Gender and Seat")

Figure 7: Interaction Plot between Gender and Seat

As we can see from the interaction plot, the male and female student groups differ noticeably across the front, middle and back seats. In detail, the female group sitting in the front has the highest GPA, at over 3.3, while the male group sitting in the same area has 3.1. The female group sitting in the middle has approximately 3.1, and the male group there has a slightly higher GPA. The female group sitting in the back is similar to the female group in the middle, but the male group in the back has the lowest GPA (less than 3.0). From this plot, we can conclude that students who sit from the middle to the front tend to have a higher GPA. Yet the female group sitting in the back also has a notable result.

The seat lines intersect in the interaction plot above. This indicates that there is an interaction between gender and seat position. Female students sitting in the front and the back of the class perform better than male students, while the opposite is seen in the middle seat group.

6 Discuss the credibility of the interpretations and conclusions of Question 4. Is there anything we should be concerned about? Explain.

a Credibility of the interpretations

First, two-way ANOVA is an appropriate and useful tool for this case study, because the purpose is to compare population means when the population is categorized by two categorical factors. Secondly, a significance level of 0.05 is used, which limits the probability of wrongly rejecting a true null hypothesis to 5%. At the same time, the p-value is small (0.0339 < 0.05), which gives evidence against the null hypothesis. Besides, all the assumptions of the test are checked, with evidence and an explanation for each, in the third part of the report. It should be highlighted that although the by() output already gives a ratio of largest to smallest standard deviation of about 1.3 (< 2), we still apply the Levene test to confirm the equal-variance assumption. Finally, the interaction plot and its interpretation form an important part of the case study.

b Limitations of the case

First of all, one of the assumptions is that the sample is a simple random sample. However, nothing here ensures that the sample was chosen randomly from its population. Moreover, the ANOVA test assumes that the data are normally distributed, and a violation of this assumption can greatly affect the results. In this case any violation is moderate, so even with some outliers in the Q-Q plot the assumption can still be regarded as satisfied.

Another limitation is the condition of equal variances: the greater the difference in variances between groups, the greater the chance that the conclusion of the test is inaccurate. Finally, when running ANOVA to test the difference in GPA due to Gender and Seat position, the result only tells whether there is a difference; it does not indicate which groups differ or by how much.
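One common way to address the last limitation is a post-hoc comparison such as Tukey's HSD, which reports which pairs of cells differ and by how much. This is not part of the original analysis; the sketch below only illustrates the idea on a simulated stand-in for StudentSurvey.csv (an assumption), so its output says nothing about the real survey.

```r
# Hypothetical stand-in for StudentSurvey.csv: 60 simulated students, 10 per cell.
set.seed(1)
StudentSurvey <- data.frame(
  Gender = factor(rep(c("Male", "Female"), each = 30), levels = c("Male", "Female")),
  Seat   = factor(rep(c("Back", "Front", "Middle"), times = 20),
                  levels = c("Back", "Front", "Middle")),
  GPA    = round(rnorm(60, mean = 3.1, sd = 0.4), 2)
)

# Same two-way model as in Question 4.
fit <- aov(GPA ~ Gender * Seat, data = StudentSurvey)

# Tukey's HSD on the six Gender x Seat cells: 15 pairwise comparisons,
# each with an estimated difference, a 95% interval and an adjusted p-value.
posthoc <- TukeyHSD(fit, which = "Gender:Seat")
pairwise <- posthoc[["Gender:Seat"]]

print(round(pairwise, 3))
```

Each row of the result compares two cells, so it shows the direction and size of a difference rather than only whether some difference exists.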

III Conclusion

The assumptions of the two-way ANOVA used to address this case are satisfied. The test leads us to the conclusion that there is a significant interaction between classroom seating position and gender in their effect on academic performance (GPA).


1 Read R code with file “StudentSurvey.csv”

2 Mean and standard deviation


3 Check assumption


4 Interaction plot


STATISTICS FOR ECONOMICS - PEER EVALUATION FORM

Please fill out this form to perform evaluation of your group members. Discuss with all members and agree on the final evaluations.

Please evaluate each member out of a scale of 100%. Allocation should be based upon group opinions regarding how satisfactorily the member fulfilled his/her assigned tasks within the group’s case study. For example, a 100% rating should be given to members who fulfilled satisfactorily the tasks assigned by the group.

Group members should ask themselves the following questions before assigning the percentages to others:

1 Did he/she do his/her fair share of the work on schedule and to the group’s satisfaction?

2 Did he/she cooperate with other group members?

3 Did he/she participate in, contribute to and share ideas in all relevant discussions?

4 Did he/she attend group meetings when required?

5 Did he/she relate and communicate to other group members?

Team members Contribution Signature (all members)

(100%)

Guidelines for peer evaluation:

Disregard your general impression and concentrate on group members’ performance in the case study within this course only.

Make a fair, objective and impartial evaluation of group members.

Sign the evaluation form to indicate group consensus.

Attach the evaluation form at the end of the report.

Note: Your final mark for the case study will be equal to Your group result * Your peer rating.
