Two-sample test of a proportion

Part of the document A gentle introduction to stata, fourth edition (pages 187-192)

On the right side of the output is a one-tailed alternative hypothesis, Ha: p > 0.5.

This means the researcher thought that most adults favored school prayer and (incorrectly) ruled out the possibility that support could run the other way. The results show that we cannot reject the null hypothesis. With only 39.7% of our sample supporting school prayer, we clearly do not have a significant result showing that most adults supported school prayer in 2002.

Distinguishing between two p-values

In testing proportions, we have two different quantities called p. Do not let this confuse you.

One p is the proportion of the sample coded 1. In this example, this is the proportion that supports school prayer, 0.397 or 39.7%.

The second p is the p-value, or level of significance. In this example, p < 0.001 refers to the probability that we would obtain a result this far from the null value by chance if the null hypothesis were true. Because we would get such a result fewer than 1 time in 1,000 by chance, p < 0.001, we can reject the null hypothesis. Many researchers will reject a null hypothesis whenever this probability is less than 0.05, that is, p < 0.05.

Proportions and percentages

Often we report proportions as percentages because many readers are more comfortable with percentages than with proportions. Stata, however, requires us to use proportions. As you may recall, the conversion is simple. We divide a percentage by 100 to get a proportion, so 78.9% corresponds to a proportion of 0.789.

Similarly, we multiply a proportion by 100 to get a percentage, so a proportion of 0.258 corresponds to a percentage of 25.8%.
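Both conversions can be written as a pair of one-line helpers. This is a minimal Python sketch for illustration only; the function names percent_to_proportion and proportion_to_percent are hypothetical and are not part of Stata.

```python
def percent_to_proportion(percent: float) -> float:
    """Divide a percentage by 100 to get a proportion (78.9 -> 0.789)."""
    return percent / 100.0


def proportion_to_percent(proportion: float) -> float:
    """Multiply a proportion by 100 to get a percentage (0.258 -> 25.8)."""
    return proportion * 100.0


# Round only for display; floating-point arithmetic can leave tiny residues.
print(round(percent_to_proportion(78.9), 3))
print(round(proportion_to_percent(0.258), 1))
```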

7.6 Two-sample test of a proportion

Sometimes a researcher wants to compare a proportion across two samples. For example, you might have an experiment testing a new drug (wide.dta). You randomly assign 40 study participants so that 20 are in a treatment group receiving the drug and 20 are in a control group receiving a sugar pill. You record whether each person is cured by assigning a 1 to those who were cured and a 0 to those who were not. Here are the data:

. list

        treat   control
   1.       1         1
   2.       0         0
   3.       1         0
   4.       1         0
   5.       1         0
   6.       1         1
   7.       1         1
   8.       0         0
   9.       1         0
  10.       1         0
  11.       1         0
  12.       1         1
  13.       1         1
  14.       0         1
  15.       1         1
  16.       1         0
  17.       0         0
  18.       1         0
  19.       0         0
  20.       1         0
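Because each column is coded 1 for cured and 0 for not cured, the proportion cured in a group is simply the mean of its column; this is what Stata reports under Mean in the prtest output. The Python sketch below is illustrative only (not Stata code); the lists transcribe the treat and control columns of the listing, and the helper name proportion_coded_one is hypothetical.

```python
# The treat and control columns from the listing (1 = cured, 0 = not cured).
treat = [1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
control = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def proportion_coded_one(scores):
    """Mean of a 0/1 variable = proportion of cases coded 1."""
    return sum(scores) / len(scores)


print(sum(treat), "of", len(treat), "->", proportion_coded_one(treat))        # 15 of 20 -> 0.75
print(sum(control), "of", len(control), "->", proportion_coded_one(control))  # 7 of 20 -> 0.35
```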

In the treatment group, we have 15 of the 20 people, or 0.75, cured; that is, they have a score of 1. In the control group, just 7 of the 20 people, or 0.35, are cured. Before proceeding, we need a null and an alternative hypothesis. The null hypothesis is that the two groups have the same proportion cured. The alternative hypothesis is that the proportions are unequal and, therefore, that the difference between them will not equal zero:

$p(\text{treat}) - p(\text{control}) \neq 0$. Your statistics book may state the alternative hypothesis as $p(\text{treat}) \neq p(\text{control})$. The two ways of stating the alternative hypothesis are equivalent.

Both are two-tailed tests because we are saying only that the proportions cured in the two groups are not equal. We could argue for a one-tailed test that the proportion in the treatment group is higher, but this means that we would need to rule out the possibility that it could be lower. The null hypothesis is that the two proportions are equal; hence, there is no difference between them: $p(\text{treat}) - p(\text{control}) = 0$.

Alternative hypothesis $H_a$: $p(\text{treat}) - p(\text{control}) \neq 0$

Null hypothesis $H_0$: $p(\text{treat}) - p(\text{control}) = 0$

These are independent samples, and the data for the two groups are entered as two variables. To open the dialog box for this test, select Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Proportion test. Because we have two samples, those participants in the treatment condition and those participants in the control condition, we select Two-sample using variables in the Tests of proportions section of the Main tab. Type treat in the box for the First variable, and type control in the box for the Second variable. That is all there is to it, and our results are as follows:

. prtest treat == control

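Two-sample test of proportions                    treat: Number of obs =       20
                                                control: Number of obs =       20
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |        .75   .0968246                      .5602273    .9397727
     control |        .35   .1066536                      .1409627    .5590373
-------------+----------------------------------------------------------------
        diff |         .4   .1440486                      .1176699    .6823301
             |  under Ho:   .1573213    2.54    0.011
------------------------------------------------------------------------------
        diff = prop(treat) - prop(control)                        z =   2.5426
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.9945         Pr(|Z| > |z|) = 0.0110          Pr(Z > z) = 0.0055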

These results have a layout similar to that of the one-sample proportion test.

The difference is that we now have two groups, so we get statistical information for each group. The Mean column gives the proportion of observations coded 1 in each group: 0.75 (75%) of the treatment group are coded as cured, compared with just 0.35 (35%) of the control group. The difference between the treatment-group mean and the control-group mean is 0.40; that is, 0.75 − 0.35 = 0.40. This appears in the table as the row labeled diff, with a mean of 0.4.
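The Std. Err. column and the confidence intervals can be checked by hand. As a worked sketch, assuming the standard large-sample formulas (they are not spelled out in the text), each group's standard error is $\sqrt{\hat p(1-\hat p)/n}$, the standard error of the difference combines the two group variances, and the 95% interval is the difference plus or minus about 1.96 standard errors:

$$
\begin{aligned}
SE_{\text{treat}} &= \sqrt{\frac{0.75 \times 0.25}{20}} = \sqrt{0.009375} \approx 0.0968 \\
SE_{\text{control}} &= \sqrt{\frac{0.35 \times 0.65}{20}} = \sqrt{0.011375} \approx 0.1067 \\
SE_{\text{diff}} &= \sqrt{0.009375 + 0.011375} = \sqrt{0.02075} \approx 0.1440 \\
0.40 \pm 1.96 \times 0.1440 &\approx (0.1177,\ 0.6823)
\end{aligned}
$$

These agree with .0968246, .1066536, and .1440486 in the output and with the interval (.1176699, .6823301) reported for diff.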

Directly below the table is the definition of diff, with the computed z test, z = 2.5426, to its right; below that is the null hypothesis that the difference in the proportion cured in the two groups is 0. Below this are the three alternative hypotheses we might have selected. Using a two-tailed approach, we can say that z = 2.54, p < 0.05. If we had a one-tailed test that the treatment-group proportion was greater than the control-group proportion, our z would still be 2.54, but our p would be p < 0.01. This one-tailed test has a p-value, 0.0055, that is exactly one-half of the two-tailed p-value. If someone had hypothesized that the treatment group would have a lower proportion cured than the control group, then he or she would use the results on the far left. Here the results would not be significant because p = 0.9945 is far greater than the required p < 0.05.
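The z statistic uses a different standard error from the diff row: under the null hypothesis the two groups share a single proportion, so the 22 cures among 40 participants are pooled (22/40 = 0.55), and the resulting standard error is the .1573213 on the under Ho: line. The Python sketch below is an independent check using only the standard library, not Stata's own code; the function name two_proportion_z is hypothetical. It reproduces z and the three p-values, including the right-tail p-value being half the two-tailed one.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z test of p1 - p2 (x = number coded 1, n = sample size)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                  # common proportion under Ho
    se_h0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_h0
    std_normal = NormalDist()                       # mean 0, standard deviation 1
    p_left = std_normal.cdf(z)                      # Ha: diff < 0
    p_right = 1 - std_normal.cdf(z)                 # Ha: diff > 0
    p_two = 2 * (1 - std_normal.cdf(abs(z)))        # Ha: diff != 0
    return z, se_h0, p_left, p_two, p_right


# Treatment: 15 cured of 20; control: 7 cured of 20.
z, se_h0, p_left, p_two, p_right = two_proportion_z(15, 20, 7, 20)
print(f"se under Ho = {se_h0:.7f}, z = {z:.4f}")
print(f"Pr(Z < z) = {p_left:.4f}  Pr(|Z| > |z|) = {p_two:.4f}  Pr(Z > z) = {p_right:.4f}")
```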

This difference-of-proportions test requires data to be entered in what is called a wide format. Each group (treatment and control) is treated as a variable with the scores on the outcome variable coded under each group, as illustrated in the listing that appeared above. Data in statistics books and related exercises often present the scores this way.

When dealing with survey data, it is common to use what is called a long format, in which one variable is a grouping variable indicating whether someone is in the treatment group, coded 1, or the control group, coded 0. The second variable is the score on the dependent variable, which is also a binary variable coded as 1 if the person is cured and 0 if the person is not cured. This appears in the following long-format listing (long.dta).

. list

        group   cure
   1.       1      1
   2.       1      0
   3.       1      1
   4.       1      1
   5.       1      1
   6.       1      1
   7.       1      1
   8.       1      0
   9.       1      1
  10.       1      1
  11.       1      1
  12.       1      1
  13.       1      1
  14.       1      0
  15.       1      1
  (output omitted)
  36.       0      0
  37.       0      0
  38.       0      0
  39.       0      0
  40.       0      0
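The wide and long layouts hold the same 40 observations. The Python sketch below illustrates the data structure only; it assumes, consistent with the two listings, that the treatment rows come first with group coded 1 and the control rows follow with group coded 0. The helper name wide_to_long is hypothetical.

```python
# Wide layout: one variable per group (1 = cured, 0 = not cured).
treat = [1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
control = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def wide_to_long(treat_scores, control_scores):
    """Return (group, cure) rows: group 1 = treatment, group 0 = control."""
    rows = [(1, score) for score in treat_scores]
    rows += [(0, score) for score in control_scores]
    return rows


long_rows = wide_to_long(treat, control)
for obs, (group, cure) in enumerate(long_rows, start=1):
    if obs <= 3 or obs >= 38:          # print a few rows from each end
        print(f"{obs:>3}. {group} {cure}")
```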

When your data are entered this way, you need to use a different test for the difference of proportions. Select Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Proportion test as before, but this time select Two-group using groups in the Tests of proportions section. Type cure under Variable name (the dependent variable) and group under Group variable name (the independent variable).

Click on OK to obtain the following results:


. prtest cure, by(group)

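Two-sample test of proportions                        0: Number of obs =       20
                                                      1: Number of obs =       20
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           0 |        .35   .1066536                      .1409627    .5590373
           1 |        .75   .0968246                      .5602273    .9397727
-------------+----------------------------------------------------------------
        diff |        -.4   .1440486                     -.6823301   -.1176699
             |  under Ho:   .1573213   -2.54    0.011
------------------------------------------------------------------------------
        diff = prop(0) - prop(1)                                  z =  -2.5426
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0055         Pr(|Z| > |z|) = 0.0110          Pr(Z > z) = 0.9945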

Here we have two means: 0.35 for the group coded 0 and 0.75 for the group coded 1. These are, of course, the means for our control group and our treatment group, respectively. Below these is a row labeled diff, the difference in proportions, which has a value of −0.40 because the mean of the group coded 1 is subtracted from the mean of the group coded 0. Be careful interpreting the sign of this difference.

Here a negative value means that the treatment group is higher than the control group: 0.75 versus 0.35. The z test for this difference is z = −2.54. This is the same absolute value that we had with the test for the wide format, but the sign is reversed. Interpret this sign with the same care as the difference: it happens to be negative because we are subtracting the group coded 1 from the group coded 0.

We can interpret these results, including the negative sign on the z test, as follows.

The control group has a mean of 0.35 (35% of participants were cured), and the treatment group has a mean of 0.75 (75% of participants were cured). The result z = −2.54, p < 0.05 indicates that the control group had a significantly lower success rate than the treatment group. Pay close attention to the way the difference of proportions was computed so that you interpret the sign of the z test correctly.
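The sign flip depends only on which group is subtracted from which. This Python sketch, an illustration rather than Stata's implementation, computes the pooled z test from long-format (group, cure) rows in both orders: prop(0) − prop(1), as in the output, and the reverse. The helper name diff_by_group is hypothetical. Both diff and z change sign, while the two-tailed p-value stays the same.

```python
from math import sqrt
from statistics import NormalDist

# Long layout: (group, cure) with group 1 = treatment, group 0 = control.
rows = [(1, c) for c in [1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1]]
rows += [(0, c) for c in [1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]]


def diff_by_group(data, first, second):
    """Pooled z test of prop(first) - prop(second) for a 0/1 outcome."""
    a = [cure for group, cure in data if group == first]
    b = [cure for group, cure in data if group == second]
    pa, pb = sum(a) / len(a), sum(b) / len(b)
    pooled = (sum(a) + sum(b)) / (len(a) + len(b))
    se_h0 = sqrt(pooled * (1 - pooled) * (1 / len(a) + 1 / len(b)))
    z = (pa - pb) / se_h0
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))
    return pa - pb, z, p_two


# Same order as the output above: prop(0) - prop(1).
diff, z, p_two = diff_by_group(rows, first=0, second=1)
print(f"diff = {diff:.2f}, z = {z:.4f}, two-tailed p = {p_two:.4f}")

# Reversing the order flips both signs but leaves the two-tailed p unchanged.
diff_r, z_r, p_two_r = diff_by_group(rows, first=1, second=0)
print(f"diff = {diff_r:.2f}, z = {z_r:.4f}, two-tailed p = {p_two_r:.4f}")
```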

This long format is widely used in surveys. For example, in the General Social Survey 2002 data (gss2002_chapter7.dta), there is an item, abany, asking if abortion is okay anytime. The response option is binary, namely, yes or no. If we wanted to see whether more women said yes than did men, we could use a difference-of-proportions test. The grouping variable would be sex, and the dependent variable would be abany. First, we must see how they are coded. To do this, we run the command codebook abany sex.

We see that abany is coded 1 for yes and 2 for no. We must recode abany into a new variable coded 0 for no and 1 for yes. We do not need to change the coding of sex: the independent variable has to be binary (just two values), but it does not have to be coded 0 and 1. Try this, and then do the difference-of-proportions test.
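The recoding step is a simple value mapping. The Python sketch below shows only the logic, assuming, as the text states, that abany is coded 1 for yes and 2 for no; treating every other value (for example, survey missing-value codes) as missing is an assumption of this sketch, so check the codebook output before relying on it. The names ABANY_TO_BINARY and recode_abany are hypothetical, and the sample values are made up.

```python
from typing import Optional

# Assumed coding from the codebook: 1 = yes, 2 = no.
ABANY_TO_BINARY = {1: 1, 2: 0}


def recode_abany(value: Optional[int]) -> Optional[int]:
    """Return 1 for yes, 0 for no, and None (missing) for anything else."""
    return ABANY_TO_BINARY.get(value)


responses = [1, 2, 2, 1, None]          # made-up values, for illustration only
print([recode_abany(v) for v in responses])
```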

