The chi-square statistic measures how much our observed counts differ from the counts we expect under the null hypothesis. This statistic is shown in Formula 10.2.
Formula 10.2: $$X^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}$$
where
O is the observed count in each cell
E is the expected count in each cell
$\sum$ means add the results from each cell
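Formula 10.2 translates almost directly into code. The following is a minimal Python sketch; the function name `chi_square_statistic` and the toy counts are hypothetical and chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all cells, as in Formula 10.2."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical toy counts (not from the text): two cells, 15 expected in each.
toy_observed = [10, 20]
toy_expected = [15, 15]
toy_statistic = chi_square_statistic(toy_observed, toy_expected)
print(round(toy_statistic, 4))  # each cell contributes 25/15
```

Each cell is 5 away from its expected count, so both cells contribute the same amount even though one difference is negative and the other positive.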
Why does this statistic work? The term (O - E) is the difference between what we observe and what we expect under the null hypothesis. To measure the total amount of deviation between Observed and Expected, it is tempting to just add together the individual differences. But this doesn’t work, because the expected counts and the observed counts always add to the same value; if we sum up the differences, they will always add to 0.
You can see that the differences between Observed and Expected add to 0 in Table 10.6, which lists the data from Example 1 together with the expected counts and the differences (Observed minus Expected).
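The cancellation can also be checked numerically. This short Python sketch copies the observed and expected counts from Table 10.6; the variable names are ours, and tiny floating-point rounding error may keep the sum from being exactly 0.

```python
# Observed and expected counts copied from Table 10.6, in row order.
observed = [120, 203, 142, 173, 64, 88, 88, 95, 157, 157]
expected = [143.3, 179.7, 139.76, 175.24, 67.44, 84.56, 81.19, 101.81, 139.31, 174.69]

differences = [o - e for o, e in zip(observed, expected)]
total_difference = sum(differences)

# Both lists add to the same total, so the differences cancel.
print(sum(observed), round(sum(expected), 2))
print(round(total_difference, 6))
```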
One reason why the chi-square statistic uses squared differences is that by squaring the differences, we always get a positive value, because both negative and positive numbers multiplied by themselves result in positive numbers:
$$\frac{(-23.3)^2}{143.3} + \frac{23.3^2}{179.7} + \frac{2.24^2}{139.76} + \frac{(-2.24)^2}{175.24} + \cdots$$
506 CHAPTER 10 ASSOCIATIONS BETWEEN CATEGORICAL VARIABLES
TABLE 10.6 Gender and opinion on same-sex marriage, emphasizing the Observed minus Expected values.

Outcome                    Observed Counts   Expected Counts   Observed minus Expected
Strongly Agree, Male       120               143.3             −23.3
Strongly Agree, Female     203               179.7             23.3
Agree, Male                142               139.76            2.24
Agree, Female              173               175.24            −2.24
Neutral, Male              64                67.44             −3.44
Neutral, Female            88                84.56             3.44
Disagree, Male             88                81.19             6.81
Disagree, Female           95                101.81            −6.81
Strongly Disagree, Male    157               139.31            17.69
Strongly Disagree, Female  157               174.69            −17.69
Why divide by the expected count? The reason is that a difference between the expected and actual counts of, say, 2 is a small difference if we were expecting 1000 counts. But if we were expecting only 5 counts, then this difference of 2 is substantial.
By dividing by the expected count, we’re controlling for the size of the expected count.
Basically, for each cell, we are finding what proportion of the expected count the squared difference is.
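To make this concrete, take the same difference of 2 in both situations described above (expected counts of 1000 and of 5) and compute each cell's contribution to the statistic:

```latex
\frac{2^2}{1000} = 0.004
\qquad \text{versus} \qquad
\frac{2^2}{5} = 0.8
```

The same raw difference contributes 200 times as much when only 5 counts were expected.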
If we apply this formula to the data in Example 1, we get $X^2 = 12.26$. We must still decide whether this value discredits the null hypothesis that gender and the answer to the question are independent. Keep reading.
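As a rough check on this value, the Python sketch below applies Formula 10.2 to the counts printed in Table 10.6. Because the table shows rounded expected counts, the result may differ from 12.26 in the second decimal place; the variable names are ours.

```python
# Counts copied from Table 10.6 (expected counts are rounded as printed).
observed = [120, 203, 142, 173, 64, 88, 88, 95, 157, 157]
expected = [143.3, 179.7, 139.76, 175.24, 67.44, 84.56, 81.19, 101.81, 139.31, 174.69]

# One term (O - E)^2 / E per cell, then add them up.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

print(round(chi_square, 2))
```

The two Strongly Agree cells and the two Strongly Disagree cells account for most of the total, which matches the large differences in those rows of the table.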
EXAMPLE 2 Viewing Violent TV as a Child and Abusiveness as an Adult
Table 10.7 shows summary statistics from a study that asked whether there was an association between watching violent TV as a child and aggressive behavior toward one’s spouse later in life. The table shows both actual counts and expected counts (in parentheses).
QUESTION Find the chi-square statistic to measure the difference between the observed counts and expected counts for the study of the effect of violent TV on future behavior.
SOLUTION We use Formula 10.2 with the values for O and E taken from Table 10.7.
$$
\begin{aligned}
X^2 &= \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \\
&= \frac{(25 - 16.45)^2}{16.45} + \frac{(57 - 65.55)^2}{65.55} + \frac{(41 - 49.55)^2}{49.55} + \frac{(206 - 197.45)^2}{197.45} = 7.4047
\end{aligned}
$$

TABLE 10.7 A two-way summary of the effect of viewing TV violence on later abusiveness (expected values are shown in parentheses).

                      High TV Violence   Low TV Violence   Total
Yes, Physical Abuse   25 (16.45)         57 (65.55)        82
No Physical Abuse     41 (49.55)         206 (197.45)      247
Total                 66                 263               329
10.1 THE BASIC INGREDIENTS FOR TESTING WITH CATEGORICAL VARIABLES CHAPTER 10 507
CONCLUSION $X^2 = 7.40$. Later we will see whether this is an unusually large value for two independent variables.
TRY THIS! Exercise 10.9
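The arithmetic in Example 2 can be reproduced cell by cell. The Python sketch below uses the observed and expected counts exactly as printed in Table 10.7; the cell labels are our own shorthand.

```python
# (observed, expected) pairs copied from Table 10.7.
cells = {
    "High TV violence, abuse": (25, 16.45),
    "Low TV violence, abuse": (57, 65.55),
    "High TV violence, no abuse": (41, 49.55),
    "Low TV violence, no abuse": (206, 197.45),
}

chi_square = 0.0
for label, (o, e) in cells.items():
    contribution = (o - e) ** 2 / e  # this cell's term in Formula 10.2
    chi_square += contribution
    print(f"{label}: {contribution:.4f}")

print(f"X^2 = {chi_square:.4f}")
```

Printing each contribution shows which cells drive the statistic. Here the cell with the smallest expected count (16.45) contributes the most, even though every cell has the same absolute difference of 8.55.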
As you might expect, for tables with many cells, these calculations can quickly become tiresome. Fortunately, technology comes to our rescue. Most statistical software will calculate the chi-square statistic for you, given data summarized in a two-way table (as in Table 10.7) or presented as raw data (as in Table 10.1), and some software will even display the expected counts alongside the observed counts. Figure 10.1 shows the output from StatCrunch for these data.
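For a rough picture of what such software does with a two-way table, here is a small Python sketch. It is not StatCrunch's code. It assumes the usual rule for expected counts under independence (row total times column total, divided by the grand total), which this passage does not restate, and the function names are hypothetical.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_from_table(table):
    """Chi-square statistic (Formula 10.2) for a two-way table of observed counts."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Observed counts from Table 10.7 (rows: abuse yes/no; columns: high/low TV violence).
tv_table = [[25, 57], [41, 206]]
tv_expected = expected_counts(tv_table)
tv_statistic = chi_square_from_table(tv_table)

for obs_row, exp_row in zip(tv_table, tv_expected):
    print(obs_row)                         # observed counts
    print([round(e, 2) for e in exp_row])  # expected counts printed below them
print(round(tv_statistic, 2))
```

Because this sketch keeps the expected counts unrounded, its result can differ slightly in later decimal places from a hand calculation that uses the rounded values in the table.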
When the null hypothesis is true, our real-life observations will usually differ slightly from the expected counts just by chance. When this happens, the chi-square statistic will be a small value.
If reality is very different from what our null hypothesis claims, then our observed counts should differ substantially from the expected counts. When that happens, the chi-square statistic is a big value.
The trick, then, is to decide what values of the chi-square statistic are “big.” Big values discredit the null hypothesis. To determine whether an observed value is big, we need to know its probability distribution when the null hypothesis is true.
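One informal way to get a feel for which values count as big is to simulate data for which the null hypothesis is true by construction. The Python sketch below is only an illustration, not the method the text goes on to develop. It assumes 329 people whose two yes/no outcomes are generated independently, with probabilities set (as an assumption) to the marginal proportions in Table 10.7; the seed and the 1000 repetitions are arbitrary choices.

```python
import random


def chi_square_2x2(table):
    """Chi-square statistic for a 2x2 table, with expected counts from the margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            if e > 0:  # skip cells with no expected count (vanishingly rare here)
                stat += (table[i][j] - e) ** 2 / e
    return stat


def simulate_null(n_people, p_row, p_col, rng):
    """One 2x2 table in which the row and column variables are independent."""
    table = [[0, 0], [0, 0]]
    for _ in range(n_people):
        i = 0 if rng.random() < p_row else 1
        j = 0 if rng.random() < p_col else 1
        table[i][j] += 1
    return table


rng = random.Random(2024)  # arbitrary seed, only for reproducibility
stats = sorted(
    chi_square_2x2(simulate_null(329, 82 / 329, 66 / 329, rng))
    for _ in range(1000)
)
median_stat = stats[len(stats) // 2]
share_at_least_7_40 = sum(s >= 7.40 for s in stats) / len(stats)
print(f"median simulated statistic: {median_stat:.2f}")
print(f"share of simulated statistics >= 7.40: {share_at_least_7_40:.3f}")
```

In a run like this, most simulated statistics should be small and only a small share should reach 7.40, which fits the idea that large values make us suspicious of the null hypothesis.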
FIGURE 10.1 StatCrunch output for TV violence and abusiveness. The expected values are below the observed values.
KEY POINT If the data conform to the null hypothesis, then the value of the chi-square statistic will be small. For this reason, large values of the chi-square statistic make us suspicious of the null hypothesis.