Two-sample test of group means

A researcher says that men make more money than women because men work more hours a week. The argument is that a lot of women work part-time jobs, and these neither pay as well nor offer opportunities for advancement. What happens if we only consider people who say they work full time? Do men still make more than women when both the men and the women are working full time?

The General Social Survey 2002 dataset (gss2002_chapter7.dta) has a question asking about the respondent’s income, variable rincom98. Like many surveys, the General Social Survey does not report the actual income but reports income in categories (for example, under $1,000, $1,000 to $2,999). Run a tabulate command to see the coding the surveyors used. For some reason, they have not defined a code of 24 as a “missing value”, but this is what a code of 24 represents: it was assigned to people who did not report their income. Even with highly respected national datasets like the General Social Survey, you need to check for coding errors. Many researchers have used rincom98 as it is coded (I hope after defining a code of 24 as a missing value). However, this coding is problematic because the intervals are not equal. The first interval, under $1,000, is $1,000 wide, but the second interval, $1,000 to $2,999, is nearly $2,000 wide. Some intervals are as much as $10,000 wide.

Before we can compare the means for women and men, we need to recode the rincom98 variable. We could recode it by using the dialog box as described in section 3.4, but let’s type the commands instead. You could enter the following commands in the Command window, one by one. A much better approach would be to enter them into a do-file so that we can easily modify or reproduce the routine later if we need to.

Remember that we should add some comments at the top of the do-file that include the name of the file and its purpose. For this example, you must use Stata/MP or Stata/SE because the number of columns exceeds the 20-column limit of Stata/IC.

* recode income.do (sample do-file)
* This is a short do-file that recodes income. It does a
* tabulation to see how income is coded (tab rincom98). People
* given a value of 24 are recoded as missing (mvdecode rincom98,
* mv(24)). We generate a new variable called inc that is equal to the
* old variable, rincom98. We recode each interval with its
* midpoint. We do a cross-tabulation of the new and old income
* variables as a check.

version 13
tabulate rincom98, missing
mvdecode rincom98, mv(24)
gen inc = rincom98
replace inc = 500 if rincom98 == 1
replace inc = 2500 if rincom98 == 2
replace inc = 3500 if rincom98 == 3
replace inc = 4500 if rincom98 == 4
replace inc = 5500 if rincom98 == 5
replace inc = 6500 if rincom98 == 6
replace inc = 7500 if rincom98 == 7
replace inc = 9000 if rincom98 == 8
replace inc = 11250 if rincom98 == 9
replace inc = 13250 if rincom98 == 10
replace inc = 16250 if rincom98 == 11
replace inc = 18750 if rincom98 == 12
replace inc = 21250 if rincom98 == 13
replace inc = 23750 if rincom98 == 14
replace inc = 27500 if rincom98 == 15
replace inc = 32500 if rincom98 == 16
replace inc = 37500 if rincom98 == 17
replace inc = 45000 if rincom98 == 18
replace inc = 55000 if rincom98 == 19
replace inc = 67500 if rincom98 == 20
replace inc = 82500 if rincom98 == 21
replace inc = 100000 if rincom98 == 22
replace inc = 110000 if rincom98 == 23
tabulate inc rincom98, missing

There are several lines commenting on what the do-file will do. The first line after the version command runs a tabulation (tabulate), including missing values. The results help us understand how the variable was coded. The next line makes the code of 24 into a missing value so that anybody who has a score of 24 on rincom98 is defined as having a missing value (mvdecode). The next line generates (gen) a new variable called inc that is equal to the old variable rincom98. Following this command is a series of commands to replace (replace) each interval with the value of its midpoint. Thus a code of 8 for rincom98 is given a value of 9000 on inc. The final command does a cross-tabulation (tabulate) of the two variables to check for coding errors. Economists and demographers may not be happy with this coding system. Those who have a rincom98 code of 1 may include people who lost a fortune, so substituting a value of 500 may not be ideal. The commands make no adjustment for these possible negative incomes.

Those who have a code of 23 include people who make $110,000 but also may include people who make $1,000,000 or more. We hope that there are relatively few such cases at either end of the distribution, so the values we use here are reasonable.
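As a compact alternative to a long chain of replace commands, the same kind of mapping can be sketched with Stata's recode command. The few respondents below are made-up toy data, not the GSS file, and only four of the 23 codes are mapped, to keep the sketch short.

```stata
* Toy data: hypothetical respondents, NOT the GSS file
clear
input rincom98
1
2
8
23
24
end

* Code 24 (income not reported) becomes missing
mvdecode rincom98, mv(24)

* Map each listed code to the midpoint used in the do-file;
* missing values stay missing because no rule touches them
recode rincom98 (1 = 500) (2 = 2500) (8 = 9000) (23 = 110000), generate(inc)

* Cross-tabulate new against old as a check
tabulate inc rincom98, missing
```

Any code that is not listed in a rule is copied unchanged, so a full version would need one rule for each of the 23 categories.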

Now that we have income measured in dollars, we are ready to compare the income of women and men who work full time by using a two-sample t test. Open the dialog box by selecting Statistics ⊲ Summaries, tables, and tests ⊲ Classical tests of hypotheses ⊲ t test (mean-comparison test). In this dialog box, select Two-sample using groups in the t tests section of the Main tab because the data are arranged in the common long format.

Type inc (outcome variable) as the Variable name and sex as the Group variable name.

This dialog box is shown in figure 7.2.

Figure 7.2. Two-sample t test using groups dialog box

Statistics books discuss assumptions for doing this t test, one of which is that the variance of the outcome variable, inc, is equal for both categories of the grouping variable, sex. That is, the variance in income is the same for women as it is for men.

If we believed that the variances were unequal, we could click on the Unequal variances box, and Stata would automatically adjust everything accordingly.
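For readers who prefer commands to the dialog box, here is a minimal sketch of the same choice using the immediate commands sdtesti and ttesti. The six numbers are the group sizes, means, and standard deviations that this section reports for full-time men and women. The scalar name df_unequal is introduced here only for illustration.

```stata
* Variance-ratio test of the equal-variance assumption
* (arguments: #obs1 #mean1 #sd1 #obs2 #mean2 #sd2)
sdtesti 671 44567.81 27319.7 589 33081.07 21743.74

* Two-sample t test that does not assume equal variances
ttesti 671 44567.81 27319.7 589 33081.07 21743.74, unequal
scalar df_unequal = r(df_t)
display "Satterthwaite degrees of freedom = " df_unequal
```

With the unequal option, Stata reports Satterthwaite's approximate degrees of freedom, which will generally not be a whole number.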

We can click on Submit at the bottom of the dialog box at this point, and we will find a huge difference between women and men, with men making much more on average than women. However, remember that we only wanted to include people who work full time. To implement this restriction, click on the by/if/in tab in the dialog box. In the Restrict observations section, type wrkstat == 1 (remember to use the double equal signs) in the If: (expression) box. Here are the results:

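. ttest inc if wrkstat == 1, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |     671    44567.81    1054.665     27319.7    42496.96    46638.66
  female |     589    33081.07    895.9353    21743.74    31321.45    34840.69
---------+--------------------------------------------------------------------
combined |    1260    39198.21    718.7267    25512.27    37788.18    40608.25
---------+--------------------------------------------------------------------
    diff |            11486.74    1404.217                8731.874    14241.61
------------------------------------------------------------------------------
    diff = mean(male) - mean(female)                              t =   8.1802
Ho: diff = 0                                     degrees of freedom =     1258

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000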

The layout of these results is similar to what we had for the one-sample t test.

Using a two-tailed hypothesis (center column, below the main table), we see that t = 8.1802 has p < 0.001 (p = 0.0000). The N = 671 men who work full time have an average income of just under $44,568, compared with just over $33,081 for the N = 589 women who are employed full time. Notice that the degrees of freedom, 1,258, is two fewer than the total number of observations, 1,260, because we used two means and lost one degree of freedom for each of them. For this two-sample t test, we have N − 2 degrees of freedom. A good layout for reporting a two-sample t test is t(1258) = 8.18, p < 0.001.
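If the dataset is not at hand, the test can be approximately reproduced from the summary statistics alone with the immediate command ttesti. This is a sketch that copies the group sizes, means, and standard deviations from the output above. The scalar names t_stat and df are introduced here only for illustration.

```stata
* Immediate form: #obs1 #mean1 #sd1 #obs2 #mean2 #sd2
ttesti 671 44567.81 27319.7 589 33081.07 21743.74

* Keep the stored results and compare the degrees of freedom with N - 2
scalar t_stat = r(t)
scalar df     = r(df_t)
display "t = " %6.2f t_stat "   df = " df "   N - 2 = " 671 + 589 - 2
```

Because the inputs are rounded, the t statistic should agree with the table only to about two decimal places.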

Is this result substantively significant? This question is pretty easy to answer because we all understand income. Men make about $11,487 more on average than do women, and this is true even though both our men and our women are working full time. It is sometimes helpful to compare the difference of means with the standard deviations.

The $11,487 difference of means is roughly one-half of a standard deviation if we use the average of the two standard deviations as a guide. A difference of less than 0.2 standard deviations is considered a weak effect, a difference of 0.2 to 0.49 is considered a moderate effect, and a difference of 0.5 or more is considered a strong effect. If your statistics book covers the delta statistic, δ, you can get a precise number, but here we have just eyeballed the difference.
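The eyeballed comparison can be checked with Stata's calculator. This sketch averages the two standard deviations from the t test output and divides the mean difference by that average. The scalar names avg_sd and ratio are introduced here only for illustration.

```stata
* Average of the two group standard deviations (male, female)
scalar avg_sd = (27319.7 + 21743.74) / 2

* Mean difference expressed in standard-deviation units
scalar ratio = 11486.74 / avg_sd

display "average SD = " %9.2f avg_sd
display "difference in SD units = " %5.3f ratio
```

The ratio comes out a little under 0.5, which is why the difference is described as roughly one-half of a standard deviation.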

So far, we have been using the group-comparison t test, which assumes that the data are in the long format. The dependent variable, inc, is coded with the income of each participant. Income is compared across groups by sex, which is coded 1 for each participant who is a man and 2 for each participant who is a woman. If the data were arranged in the wide format, we would still have two variables, but they would be different. One variable would be the income for each man and would have 671 entries; the other variable would be the income for each woman and would have 589 entries. This would look like the wide format shown previously for comparing proportions, except that the variables would be called maleinc and femaleinc.

When our data are in a wide format, we again select Statistics ⊲ Summaries, tables, and tests ⊲ Classical tests of hypotheses ⊲ t test (mean-comparison test) to open the dialog box; however, we now select Two-sample using variables in the t tests section of the Main tab. We simply enter the names of the two variables, maleinc and femaleinc.

The resulting command would be ttest maleinc == femaleinc, unpaired. I cannot illustrate this process here because the data are in the long format. If you are interested, you can use Stata’s reshape command to convert between formats; see help reshape.
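To make the two layouts concrete, here is a small sketch with made-up incomes (hypothetical values, not GSS data). It runs the wide-format command and then stacks the two columns into long format to show that the group-comparison form gives the same t. The stack command is used here instead of reshape because the two columns are unmatched groups, not paired measurements. That choice is an assumption of this sketch and not the text's own method.

```stata
* Hypothetical incomes in wide format (made-up values)
clear
input maleinc femaleinc
45000 32500
37500 27500
55000 37500
32500 21250
27500 .
end

* Wide format: one variable per group
ttest maleinc == femaleinc, unpaired
scalar t_wide = r(t)

* Long format: one outcome (inc) plus a group marker (_stack = 1 or 2)
stack maleinc femaleinc, into(inc) clear
ttest inc, by(_stack)
scalar t_long = r(t)

display "wide t = " %8.4f t_wide "   long t = " %8.4f t_long
```

The missing value in femaleinc mirrors the unequal group sizes in the text: in wide format the shorter column is simply padded with missing values.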


Effect size

There are two measures of effect size that are sometimes used to measure the strength of the difference between means. These are $R^2$ and Cohen’s $d$ ($\delta$). At the time this book was written, $R^2$ was not directly computed by Stata, but it can be computed using Stata’s built-in calculator. The formula is $R^2 = t^2/(t^2 + df)$.

Using the results of the two-sample t test comparing income of women and men, $R^2 = 8.1802^2/(8.1802^2 + 1258)$. We can compute this with a hand calculator or with the Stata command

. display "r-squared = " 8.1802^2/(8.1802^2 + 1258)
r-squared = .05050561

The square root of this value is sometimes called the point biserial correlation. A value of 0.01 to 0.09 is a small effect, a value of 0.10 to 0.25 is a medium effect, and a value of over 0.25 is a large effect. If you use the Stata calculator with a negative t, it is important to insert parentheses correctly so that Stata does not see the negative sign as making the $t^2$ a negative value. If we had a t = −4.0 with 100 degrees of freedom, the Stata command would be

. display "r-squared = " (-4.0)^2/((-4.0)^2 + 100)

or you could simply use the absolute value of t when doing the calculations.
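One way to avoid retyping t and worrying about parentheses is to work from stored results. The sketch below reruns the test from the summary statistics with ttesti and computes the effect size from r(t) and r(df_t). The scalar names rsq, tneg, and rsq_neg are introduced here only for illustration.

```stata
* Reproduce the full-time comparison from its summary statistics
ttesti 671 44567.81 27319.7 589 33081.07 21743.74

* R-squared = t^2 / (t^2 + df), using the stored results
scalar rsq = r(t)^2 / (r(t)^2 + r(df_t))
display "r-squared = " %6.4f rsq
display "point biserial correlation = " %6.4f sqrt(rsq)

* A negative t held in a scalar is squared as a value, so no extra
* parentheses are needed around it
scalar tneg = -4.0
scalar rsq_neg = tneg^2 / (tneg^2 + 100)
display "r-squared for t = -4.0, df = 100: " %6.4f rsq_neg
```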

Cohen’s d measures how much of a standard deviation separates the two groups.

$$\text{Cohen's } d = \frac{\text{mean difference}}{\text{pooled standard deviation}}, \qquad s_p = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{df}}$$
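As a worked check of the formula, Cohen's d is the mean difference divided by the pooled standard deviation, where the pooled variance weights each group's variance by its size minus one and divides by the degrees of freedom. The sketch below applies this to the summary statistics of the full-time comparison. The scalar names are introduced here only for illustration.

```stata
* Group sizes and standard deviations from the two-sample t test
scalar n1 = 671
scalar s1 = 27319.7
scalar n2 = 589
scalar s2 = 21743.74

* Pooled standard deviation, with df = n1 + n2 - 2
scalar sp = sqrt(((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2))

* Cohen's d = mean difference / pooled standard deviation
scalar d = 11486.74 / sp

display "pooled SD = " %9.2f sp
display "Cohen's d = " %5.3f d
```

The result is about 0.46 of a standard deviation, slightly below the eyeballed one-half because the pooled value gives more weight to the larger group.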

Stata does have an effect-size command to compute Cohen’s d, esize. This command has a similar structure to the two-sample ttest command: esize twosample inc if wrkstat==1, by(sex). This command results in Cohen’s d = 0.461 and also reports a 95% confidence interval. When you read a study that reports the means and standard deviations for each of two groups but does not report Cohen’s d, you can compute the d using an immediate form of esize, that is, with the esizei command. For example, you can type esizei #obs1 #mean1 #sd1 #obs2 #mean2 #sd2. Suppose we read that in a study with 100 participants in each group the means for the two groups were 60.2 and 65.3 and their respective standard deviations were 9.0 and 10.1. For our example, we can type esizei 100 60.2 9.0 100 65.3 10.1. Stata will report that the effect size is Cohen’s d = −0.533. You need to be careful to interpret the sign. In our example, group two has a larger mean, 65.3, than group one, 60.2. Hence, the negative sign simply reflects that group one’s mean is less than group two’s mean.
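The sign convention is easy to confirm by entering the two groups in both orders. This sketch uses the numbers from the example above and saves the stored result r(d) each time. The scalar names d_12 and d_21 are introduced here only for illustration.

```stata
* Group one first: mean 60.2 versus mean 65.3
esizei 100 60.2 9.0 100 65.3 10.1
scalar d_12 = r(d)

* Group two first: same data, groups swapped
esizei 100 65.3 10.1 100 60.2 9.0
scalar d_21 = r(d)

display "group one first: d = " %6.3f d_12
display "group two first: d = " %6.3f d_21
```

The magnitude is the same either way; only the sign changes with the order in which the groups are entered.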

You can type help esize to find other applications of the effect-size command.
