Contrasts and Post Hoc Tests on Teacher

From SPSS® Data Analysis for Univariate, Bivariate and Multivariate Statistics (2019), pages 80–83.

A rejection of the null hypothesis in the ANOVA suggests that somewhere among the means, there are population mean differences. What a statistically significant F does not tell us, however, is where those differences are. Theoretically, we could investigate pairwise differences for our data by performing multiple t-tests between teachers 1 vs. 2, 1 vs. 3, 1 vs. 4, 2 vs. 3, and so on. However, recall that each t-test carries with it a type I error rate, set at the significance level of the test. This error rate compounds across tests, and so for the family of comparisons, the overall type I error rate will be quite high.
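To see how quickly the familywise error rate grows, here is a quick sketch in Python. It assumes the tests are independent (pairwise t-tests on the same data are not strictly independent, so this is only an approximation of the inflation):

```python
# Approximate familywise type I error rate across m independent tests,
# each conducted at level alpha: 1 - (1 - alpha)^m.
# With 4 teachers there are 4 * 3 / 2 = 6 pairwise comparisons.
alpha = 0.05
m = 6
familywise = 1 - (1 - alpha) ** m
print(round(familywise, 3))  # about 0.265, far above the nominal 0.05
```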

On the other hand, if we only had one or two comparisons to make, we could possibly get away with not controlling the familywise type I error rate, especially if we did not want to do all comparisons. This is true especially if we know a priori (i.e. before looking at the data) which comparisons we want to make based on theory. For instance, suppose that instead of making all pairwise comparisons, we only wished to compare the means of teachers 1 and 2 with the means of teachers 3 and 4:

71.00, 72.50 vs. 80.0, 92.67

Performing only this comparison would keep the type I error rate at 0.05, the level we set for the comparison. That is, by doing only a single comparison, we have no concern that the type I error rate will inflate. To accomplish this comparison between means, we could formulate what is known as a contrast. A contrast is a linear combination of the form:

C_i = c1·μ1 + c2·μ2 + c3·μ3 + c4·μ4

where c1 through c4 are integer weights such that the sum of the weights equals 0. That is, a contrast is a linear combination of means such that

Σ_{j=1}^{J} c_j = 0.

How shall we weight the means? Well, for our contrast, since we want to contrast the means of teachers 1 and 2 with the means of teachers 3 and 4, we need to assign weights that will achieve this. The following would work:

C_i = (1)μ1 + (1)μ2 + (−1)μ3 + (−1)μ4

Notice that the sum of the weights is equal to 0, and if there is no mean difference between teachers 1 and 2 vs. 3 and 4, then C_i will equal 0. If there is a difference or an "imbalance" among teachers 1 and 2 vs. 3 and 4, then we would expect C_i to be unequal to 0. Notice we could have accomplished the same contrast by using weights 2, 2 and −2, −2, for instance, since Σ_{j=1}^{J} c_j = 0 would still hold and we would still be comparing the means we wished to compare. Theoretically, we could use any integer weights that represent the contrast of interest to us.
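As a quick check of the weighting argument, the following Python sketch (using the sample means reported in the output) verifies that both sets of weights sum to zero and that the 2, 2, −2, −2 weighting simply doubles the contrast value without changing the comparison being made:

```python
# Sample means for teachers 1-4, taken from the SPSS output.
means = [71.00, 72.50, 80.0, 92.67]

w1 = [1, 1, -1, -1]   # contrast weights: (1 and 2) vs. (3 and 4)
w2 = [2, 2, -2, -2]   # an equivalent weighting of the same contrast

assert sum(w1) == 0 and sum(w2) == 0   # both are valid contrast weightings

c1 = sum(c * m for c, m in zip(w1, means))
c2 = sum(c * m for c, m in zip(w2, means))
print(c1, c2)  # c2 is exactly 2 * c1; both equal 0 only if the sets of means balance
```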

To do the above contrast in SPSS, we enter the following syntax:

ONEWAY ac BY teach

/CONTRAST = 1 1 -1 -1.

ANOVA
ac
                  Sum of Squares   df   Mean Square        F    Sig.
Between Groups         1764.125     3       588.042   31.210    .000
Within Groups           376.833    20        18.842
Total                  2140.958    23

Contrast Coefficients
            teach
Contrast    1.00   2.00   3.00   4.00
1              1      1     -1     -1

Contrast Tests
                                       Contrast   Value of Contrast   Std. Error        t       df   Sig. (2-tailed)
ac   Assume equal variances                   1            -29.1667      3.54417   -8.229       20              .000
     Does not assume equal variances          1            -29.1667      3.54417   -8.229   15.034              .000

We see in the output that SPSS performs the ANOVA for the achievement data once more but then carries on with the contrast below the summary table. Notice the coefficients of 1, 1 and −1, −1 correspond to the contrast we wished to make.

The Contrast Tests reveal the p-value for the contrast. Assuming variances in each group are unequal (let us assume so for this example simply for demonstration, though both lines yield the same decision on the null hypothesis anyway), we see the value of the contrast is equal to −29.1667, with an associated t-statistic of −8.229, evaluated on 15.034 degrees of freedom. The two-tailed p-value is equal to 0.000, and so we reject the null hypothesis that Ci = 0 and conclude Ci ≠ 0.

That is, we have evidence that in the population from which these data were drawn, the means for teachers 1 and 2, taken as a set, are different from the means of teachers 3 and 4.

A contrast comparing achievement means for teachers 1 and 2 with 3 and 4 was performed. For both variances assumed to be equal and unequal, the null hypothesis of equality was rejected (p < 0.001), and hence we have inferential support to suggest a mean difference on achievement between teachers 1 and 2 vs. teachers 3 and 4.

Notice we would have gotten the same contrast value had we computed it manually, computing the estimated comparison Ĉi using sample means as follows:

Ĉ_i = (1)ȳ1 + (1)ȳ2 + (−1)ȳ3 + (−1)ȳ4
    = (1)(71.00) + (1)(72.50) + (−1)(80.0) + (−1)(92.67)
    = 143.5 − 172.67
    = −29.17

Notice that the value of −29.17 agrees with what was generated in SPSS for the value of the contrast. Incidentally, we do not really care about the sign of the contrast; we only care about whether it is sufficiently different from zero in the sample for us to reject the null hypothesis that Ci = 0. We have evidence then that, taken collectively, the means of teachers 1 and 2 are different from the means of teachers 3 and 4 on the dependent variable of achievement.
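Under the equal-variances assumption, the standard error and t-statistic reported by SPSS can also be reproduced from the summary output. The sketch below assumes n = 6 observations per group (with J = 4 groups and N − J = 20 within-groups degrees of freedom, N = 24) and uses the standard formula SE(Ĉ) = sqrt(MSE × Σc²/n):

```python
import math

# Reconstructing the equal-variances contrast test from summary statistics.
mse = 18.842                        # within-groups mean square from the ANOVA
n = 6                               # observations per group (assumed equal n)
weights = [1, 1, -1, -1]
means = [71.00, 72.50, 80.0, 92.67]

c_hat = sum(c * m for c, m in zip(weights, means))     # about -29.17
se = math.sqrt(mse * sum(c * c for c in weights) / n)  # about 3.544
t = c_hat / se                                         # about -8.23
print(round(c_hat, 2), round(se, 3), round(t, 2))
```

The small discrepancies from the SPSS values (−29.1667, 3.54417, −8.229) are due to the rounded means used here.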

Contrasts are fine so long as we have some theory guiding us regarding which comparisons we wish to make, so as not to inflate our type I error rate. Usually, however, we do not have strong theory guiding us and wish to make many more comparisons than just a few. But as mentioned, when we make several comparisons, we can expect our type I error rate to be inflated for the entire set. Post hoc tests allow us to make pairwise mean comparisons with some control over the type I error rate, preventing it from "skyrocketing" across the family of comparisons. Though there are a variety of post hoc tests available for "snooping" one's data after a statistically significant overall F from the ANOVA, they range in terms of how conservative vs. liberal they are in deciding whether a difference truly exists:

● A conservative post hoc test will indicate a mean difference only if there is very good evidence of one. That is, conservative tests make it fairly difficult to reject the null, but if the null is rejected, you can have fairly high confidence that a mean difference truly does exist.

● A liberal post hoc test will indicate a mean difference more easily than a conservative post hoc test. That is, liberal tests make it much easier to reject null hypotheses but with less confidence that a difference truly does exist in the population.

● Ideally, for most research situations, you would like a test that is not overly conservative, since an overly conservative test will not give you much power to reject null hypotheses. On the opposite extreme, if you choose a test that is very liberal, then although you can reject many more null hypotheses, it is more likely that at least some of those rejections will be type I errors.

So, which test to choose for most research situations? The Tukey test is considered by many to be a reasonable post hoc test for most research situations. It provides a reasonable balance between controlling the type I error rate and still having enough power to reject null hypotheses, and hence for the majority of situations in which you need a basic post hoc test, you really cannot go wrong with Tukey's HSD ("honestly significant difference").
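The Tukey criterion itself can be sketched from the summary statistics. Assuming n = 6 per group and SciPy ≥ 1.7 (for scipy.stats.studentized_range), the HSD critical difference is q(.05, k, df) × sqrt(MSE/n):

```python
import math
from scipy.stats import studentized_range

mse, n = 18.842, 6       # within-groups mean square; per-group n (assumed)
k, df_error = 4, 20      # number of groups; within-groups df

q_crit = studentized_range.ppf(0.95, k, df_error)  # q(.05, 4, 20), about 3.96
hsd = q_crit * math.sqrt(mse / n)                  # about 7.01
print(round(hsd, 2))
```

Any pairwise mean difference exceeding this value in absolute magnitude is significant at the .05 level: |−1.50| (teachers 1 vs. 2) falls short, while |7.50| and |−9.00| exceed it, in agreement with the Tukey output.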

Recall that we had already requested the Tukey test for our achievement data.

Results of the test are below:

Multiple Comparisons
Dependent Variable: ac
Tukey HSD

(I) teach   (J) teach   Mean Difference (I-J)   Std. Error    Sig.   95% Confidence Interval
                                                                     Lower Bound   Upper Bound
1.00        2.00                   -1.5000         2.50610    .931       -8.5144        5.5144
            3.00                   -9.0000*        2.50610    .009      -16.0144       -1.9856
            4.00                  -21.6667*        2.50610    .000      -28.6811      -14.6522
2.00        1.00                    1.5000         2.50610    .931       -5.5144        8.5144
            3.00                   -7.5000*        2.50610    .033      -14.5144        -.4856
            4.00                  -20.1667*        2.50610    .000      -27.1811      -13.1522
3.00        1.00                    9.0000*        2.50610    .009        1.9856       16.0144
            2.00                    7.5000*        2.50610    .033         .4856       14.5144
            4.00                  -12.6667*        2.50610    .000      -19.6811       -5.6522
4.00        1.00                   21.6667*        2.50610    .000       14.6522       28.6811
            2.00                   20.1667*        2.50610    .000       13.1522       27.1811
            3.00                   12.6667*        2.50610    .000        5.6522       19.6811

Based on observed means.
The error term is Mean Square(Error) = 18.842.
*. The mean difference is significant at the .05 level.

The table shows the comparisons between teach levels 1 through 4. We note the following from the output:

● The mean difference between teach = 1 and teach = 2 is −1.500, and is not statistically significant (p = 0.931).

● The mean difference between teach = 1 and teach = 3 is −9.00 and is statistically significant (p = 0.009).

● The mean difference between teach = 1 and teach = 4 is −21.667 and is statistically significant (p = 0.000).

● The remaining pairwise differences are interpreted in analogous fashion to the above.

● The 95% confidence intervals provide a likely range for the true mean difference parameter. For instance, for the comparison teach 1 vs. teach 2, in 95% of samples drawn from this population, the true mean difference is expected to lie between the lower limit of −8.51 and the upper limit of 5.51; since this interval contains 0, it is consistent with the nonsignificant result for that comparison.
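These Tukey confidence limits can likewise be reconstructed from the summary statistics. In a sketch assuming n = 6 per group and SciPy ≥ 1.7, the half-width of each interval is q(.05, k, df) × sqrt(MSE/n):

```python
import math
from scipy.stats import studentized_range

# Reconstructing the Tukey 95% CI for teach 1 vs. teach 2.
diff = -1.5000                           # mean difference (I-J) from the output
mse, n, k, df_error = 18.842, 6, 4, 20
half = studentized_range.ppf(0.95, k, df_error) * math.sqrt(mse / n)
lower, upper = diff - half, diff + half
print(round(lower, 2), round(upper, 2))  # close to the reported (-8.51, 5.51)
```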

A Tukey HSD multiple comparisons post hoc procedure was used to follow up on the statistically significant ANOVA findings so as to learn where pairwise mean differences exist among teacher groups. Statistically significant mean differences were found between teachers 1 and 3 (p = 0.009), 1 and 4 (p = 0.000), 2 and 3 (p = 0.033), 2 and 4 (p = 0.000), and 3 and 4 (p = 0.000). A difference was not found between teachers 1 and 2 (p = 0.931).
