Percentages and measures of association

Excerpt from A Gentle Introduction to Stata, Fourth Edition (pages 160-163)

We have already discussed the use of percentages. These are often the easiest and best way to describe a relationship between two variables. In our last example, the percentage of men who said abortion was okay for any reason was slightly greater than, although not statistically significantly greater than, the percentage of women who said abortion was okay for any reason. Percentages often tell us what we want to know.

Other ways of describing an association, called measures of association, try to summarize the strength of a relationship with a single number.

The value of chi-squared depends on two things. First, the stronger the association between the variables, the bigger chi-squared will be. Second, because we have more confidence in our results when we have larger samples, the more cases we have, the bigger chi-squared will be. In fact, for a given relationship expressed in percentages, chi-squared is directly proportional to sample size. If you had the same relationship as in our example but, instead of having 1,939 observations, you had 19,390 (10 times as many), then chi-squared would be 20.254, also 10 times as big. There would still be 1 degree of freedom, but here the results would be statistically significant, p < 0.001. With large samples, researchers sometimes misinterpret a statistically significant chi-squared value as indicating a strong relationship. With a large sample, even a weak relationship can be statistically significant. This makes sense because with a large sample we have the power to detect even small effects.
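The proportionality to sample size can be checked by hand outside Stata. The following is a minimal Python sketch, not Stata code and not taken from the book; the function name chi_squared is hypothetical, and the counts are the small 2×2 table shown in the box later in this section, because the sex-by-abany cell counts are not reproduced here.

```python
def chi_squared(table):
    """Pearson chi-squared for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# The 2x2 table from the box later in this section (N = 60)
small = [[10, 20], [20, 10]]
# Same percentages, 10 times as many cases in every cell (N = 600)
large = [[10 * cell for cell in row] for row in small]

chi2_small = chi_squared(small)
chi2_large = chi_squared(large)

print(round(chi2_small, 4))  # 6.6667
print(round(chi2_large, 4))  # 66.6667
```

The percentages in the two tables are identical, so the only thing driving the larger statistic is the number of cases.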

One way to minimize this potential misinterpretation is to divide chi-squared by the maximum value it could be for a table of a particular shape and number of observations.

This is simple in the case of 2×2 tables, such as the one we are using. The maximum value of chi-squared for a 2×2 table is the sample size, N. Thus in our example, if the relationship were as strong as possible, chi-squared would be 1,939. Our chi-squared of 2.03 is tiny in comparison. The coefficient φ (phi) is defined as the positive or negative square root of the quantity chi-squared divided by N:

$$
\phi = \sqrt{\frac{\chi^2}{N}}
$$

Stata uses a different formula for calculating the value of φ. Stata's formula produces a positive or negative value directly, depending on the arrangement of rows and columns; see the box on page 132. A φ with an absolute value from 0.0 to 0.19 is considered weak, from 0.20 to 0.49 is considered moderate, and 0.50 and above is considered strong.

In our example, φ = 0.03, which we would thus describe as a weak relationship, meaning that the strength of the relationship is weak regardless of whether it is statistically significant. This is an important distinction between the strength of the relationship (some call this substantive or practical significance) and statistical significance. So long as the same distribution based on percentages describes a table, φ will have the same value whether we have 194 observations, 1,939 observations, or 19,390 observations.
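The arithmetic behind φ = 0.03 can be sketched in a few lines of Python (a hand check, not Stata code). The names phi_from_chi2 and strength are hypothetical. The value 2.0254 is an assumption inferred from the text, which reports 20.254 as 10 times the original chi-squared. The text gives the cutoffs to two decimals only, so treating values below 0.20 as weak is also an assumption.

```python
import math


def phi_from_chi2(chi2, n):
    """Unsigned phi for a 2x2 table: square root of chi-squared over N."""
    return math.sqrt(chi2 / n)


def strength(phi):
    """Label the strength of phi using the cutoffs given in the text."""
    size = abs(phi)
    if size < 0.20:  # assumption: values between 0.19 and 0.20 count as weak
        return "weak"
    if size < 0.50:
        return "moderate"
    return "strong"


# chi-squared of about 2.0254 with N = 1,939 (value inferred from the text)
phi = phi_from_chi2(2.0254, 1939)
# Same percentages with 10 times the cases: chi-squared and N both scale by 10
phi_large = phi_from_chi2(20.254, 19390)

print(round(phi, 2), strength(phi))              # 0.03 weak
print(round(phi_large, 2), strength(phi_large))  # 0.03 weak
```

Because chi-squared and N grow by the same factor, their ratio, and therefore φ, does not change with sample size.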

Because both Cramér's V and φ are the square root of chi-squared divided by its maximum possible value, and because φ can be thought of as a special case of V, Stata simply has an option to compute Cramér's V. However, if you have a 2×2 table, you should call this measure of association φ to avoid confusion and recognize that it may be either a positive or a negative value. In the dialog box for doing the cross-tabulation with measures of association, simply check Cramer's V under the list of Test statistics. This results in the command tabulate sex abany, chi2 row V. If you type this command directly into the Command window, remember to capitalize the V; this is a rare example where you must use an uppercase letter in Stata.

The ability to have a positive or negative φ does not extend to larger tables, where Cramér's V is the appropriate measure of association. The maximum value chi-squared can attain for a larger table is N times the smaller of R−1 or C−1, where R is the number of rows and C is the number of columns in the table. For these tables, we report the positive square root of the ratio of chi-squared to its maximum possible value and call it V:

$$
V = \sqrt{\frac{\chi^2}{N \times \min(R-1,\ C-1)}}
$$
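As a rough illustration of this formula, here is a minimal Python sketch (not Stata code) that computes V from a table of counts. The function name cramers_v is hypothetical. The 2×2 counts come from the box in this section; the 3×3 counts are invented for illustration only, to show a perfectly associated table reaching the maximum.

```python
import math


def cramers_v(table):
    """Cramér's V for an R x C table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Largest chi-squared possible for a table of this shape and size
    max_chi2 = n * min(len(row_totals) - 1, len(col_totals) - 1)
    return math.sqrt(chi2 / max_chi2)


# 2x2 table from the box in this section: V equals the absolute value of phi
v_2x2 = cramers_v([[10, 20], [20, 10]])

# Hypothetical 3x3 table with a perfect association (illustrative counts only)
v_perfect = cramers_v([[10, 0, 0], [0, 10, 0], [0, 0, 10]])

print(round(v_2x2, 4))      # 0.3333
print(round(v_perfect, 4))  # 1.0
```

Unlike the signed φ that Stata reports for a 2×2 table, this quantity is never negative, because it takes the positive square root.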

132 Chapter 6 Statistics and graphs for two categorical variables

Why can φ be negative?

Stata uses a special formula for calculating φ, and this formula gives us the positive or negative sign for φ directly. The formula that Stata uses is

$$
\phi = \frac{n_{11} n_{22} - n_{12} n_{21}}{\sqrt{n_{1.}\, n_{2.}\, n_{.1}\, n_{.2}}}
$$

Let's look at how this formula applies to a sample table. In the following table, the first subscript is the row and the second subscript is the column. Hence, n11 is the number of cases in row 1, column 1, and n21 is the number of cases in row 2, column 1. A dot is used to refer to all cases in both rows or both columns. Thus n1. is the number of people who are in row 1 summed over both columns; it is the row total. Similarly, n.1 is the column total for the first column. Here is how this looks in a 2×2 table:

           |      column
       row |     1      2 |  Total
-----------+--------------+-------
         1 |   n11    n12 |    n1.
         2 |   n21    n22 |    n2.
-----------+--------------+-------
     Total |   n.1    n.2 |    n..

Applying the above formula to a simple table, we obtain φ = −0.3333 (reported in the table as Cramér's V):

           |      column
       row |     1      2 |  Total
-----------+--------------+-------
         1 |    10     20 |     30
         2 |    20     10 |     30
-----------+--------------+-------
     Total |    30     30 |     60

Pearson chi2(1) = 6.6667   Pr = 0.010
Cramér's V = -0.3333

However, if we rearrange the rows and columns, we obtain φ = 0.3333. Here are the rearranged table and results:

           |      column
       row |     1      2 |  Total
-----------+--------------+-------
         1 |    20     10 |     30
         2 |    10     20 |     30
-----------+--------------+-------
     Total |    30     30 |     60

Pearson chi2(1) = 6.6667   Pr = 0.010
Cramér's V = 0.3333

For most 2×2 tables, we do not care about how the rows and columns are arranged and so use only the positive value of V and φ.
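The sign flip in the two tables above can be reproduced by applying the signed formula directly. This is a minimal Python sketch of that formula, not Stata's own code; the function name signed_phi is hypothetical.

```python
import math


def signed_phi(table):
    """Signed phi for a 2x2 table given as [[n11, n12], [n21, n22]]."""
    (n11, n12), (n21, n22) = table
    row1, row2 = n11 + n12, n21 + n22  # row totals n1. and n2.
    col1, col2 = n11 + n21, n12 + n22  # column totals n.1 and n.2
    return (n11 * n22 - n12 * n21) / math.sqrt(row1 * row2 * col1 * col2)


original = [[10, 20], [20, 10]]
rearranged = [[20, 10], [10, 20]]

phi_original = signed_phi(original)
phi_rearranged = signed_phi(rearranged)

print(round(phi_original, 4))    # -0.3333
print(round(phi_rearranged, 4))  # 0.3333
```

Swapping the two rows (or the two columns) exchanges the diagonal and off-diagonal products in the numerator, which changes only the sign and leaves the magnitude at 0.3333.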
