Now let’s make a boxplot comparing males and females on math achievement. This is similar to what we did in Chapter 3, but here we request statistics and stem-and-leaf plots.
4.3. Create a boxplot for math achievement split by gender.
Use these commands:
• Analyze → Descriptive Statistics → Explore.
• The Explore window (Fig. 4.5) will appear.
• Click on math achievement and move it to the Dependent List.
• Next, click on gender and move it to the Factor (or independent variable) List.
• Click on Both under Display. This will produce both a table of descriptive statistics and two kinds of plots: stem-and-leaf and box-and-whiskers.
UNDERSTANDING YOUR DATA 65
Fig. 4.5. Explore.
• Click on OK.
You will get an output file complete with syntax, statistics, stem-and-leaf plots, and boxplots. See Output 4.3 and compare it to your own output and syntax. As with most SPSS procedures, we could have requested a wide variety of other statistics by clicking on Statistics and/or Plots in Fig. 4.5.
Output 4.3: Boxplots Split by Gender With Statistics and Stem-and-Leaf Plots
EXAMINE VARIABLES=mathach BY gender /PLOT BOXPLOT STEMLEAF
/COMPARE GROUP
/STATISTICS DESCRIPTIVES /CINTERVAL 95
/MISSING LISTWISE /NOTOTAL.
Explore
Gender
Case Processing Summary
                                                   Cases
                                     Valid        Missing        Total
                         gender    N   Percent   N   Percent   N   Percent
math achievement test    male     34   100.0%    0     .0%    34   100.0%
                         female   41   100.0%    0     .0%    41   100.0%
Descriptives
math achievement test
gender   Statistic                                        Value      Std. Error
male     Mean                                             14.7550    1.03440
         95% Confidence Interval for Mean, Lower Bound    12.6505
         95% Confidence Interval for Mean, Upper Bound    16.8595
         5% Trimmed Mean                                  14.8454
         Median                                           14.3330
         Variance                                         36.379
         Std. Deviation                                   6.03154
         Minimum                                          3.67
         Maximum                                          23.7
         Range                                            20.00
         Interquartile Range                              10.00
         Skewness                                         -.156      .403
         Kurtosis                                         -.963      .788
female   Mean                                             10.7479    1.04576
         95% Confidence Interval for Mean, Lower Bound    8.6344
         95% Confidence Interval for Mean, Upper Bound    12.8615
         5% Trimmed Mean                                  10.6454
         Median                                           10.3330
         Variance                                         44.838
         Std. Deviation                                   6.69612
         Minimum                                          -1.7
         Maximum                                          23.7
         Range                                            25.33
         Interquartile Range                              10.50
         Skewness                                         .331       .369
         Kurtosis                                         -.698      .724
math achievement test
Stem-and-Leaf Plots
math achievement test Stem-and-Leaf Plot for gender= male
 Frequency    Stem &  Leaf
     1.00        0 .  3
     7.00        0 .  5557799
    11.00        1 .  01123444444
     7.00        1 .  5578899
     8.00        2 .  11123333
 Stem width:  10.0
 Each leaf:   1 case(s)
Note that we have circled, for males and for females, three key statistics: mean, variance, and skewness.
Leaf column: the last digit of each person’s math achievement score.
Third line of the male plot (11.00  1 . 01123444444): this line indicates that 11 persons had stems of 1 and leaves of 0 through 4 (i.e., scores between 10 and 14). See the interpretation box for more explanation.
math achievement test Stem-and-Leaf Plot for gender= female
 Frequency    Stem &  Leaf
     1.00       -0 .  1
     7.00        0 .  1123344
    12.00        0 .  555666778999
    11.00        1 .  00002334444
     5.00        1 .  77779
     5.00        2 .  02233
 Stem width:  10.0
 Each leaf:   1 case(s)
First line of the female plot: 1 person had a negative score (stem -0) of -1.
[Figure: boxplots of math achievement test (vertical axis, -5.00 to 25.00) for each gender (male and female).]
Interpretation of Output 4.3
The first table under Explore provides descriptive statistics about the number of males and females with Valid and Missing data. Note that we have 34 males and 41 females with valid math achievement test scores.
The Descriptives table contains many different statistics for males and females separately. Several of them are beyond what we cover in this book. Note that the average math achievement test score is 14.76 for males and 10.75 for females. We discuss the variances and skewness below under assumptions.
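Some of the other entries in the Descriptives table follow directly from N, the mean, and the standard deviation. The sketch below, in Python rather than SPSS, recomputes the standard error of the mean and the 95% confidence interval for each gender. The critical t values are taken from a standard t table and are an assumption of this sketch; they do not appear in the output.

```python
import math

# Values copied from the Descriptives table in Output 4.3.
groups = {
    "male": {"n": 34, "mean": 14.7550, "sd": 6.03154},
    "female": {"n": 41, "mean": 10.7479, "sd": 6.69612},
}

# Two-tailed 95% critical t values for df = n - 1 (33 and 40).
# Assumption: looked up in a standard t table, not part of the SPSS output.
T_CRIT = {"male": 2.0345, "female": 2.0211}


def mean_ci(n, mean, sd, t_crit):
    """Return (standard error, lower bound, upper bound) for the mean."""
    se = sd / math.sqrt(n)
    return se, mean - t_crit * se, mean + t_crit * se


results = {
    gender: mean_ci(v["n"], v["mean"], v["sd"], T_CRIT[gender])
    for gender, v in groups.items()
}

for gender, (se, lower, upper) in results.items():
    print(f"{gender}: SE = {se:.4f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The results should agree with the Std. Error, Lower Bound, and Upper Bound entries in the table to about three decimal places; small differences come from rounding in the printed mean and standard deviation.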
The Stem-and-Leaf Plots for each gender separately are next. These plots are like a histogram or frequency distribution turned on the side. They give a visual impression of the distribution, and they show each person’s score on the dependent variable (math achievement).
Note that the legend indicates that Stem width equals 10 and Each leaf equals one case. This means that entries that have 0 for the stem are less than 10, those with 1 as the stem range from 10 to 19, and so forth. Each number in the Leaf column represents the last digit of one person’s math achievement score. The numbers in the Frequency column indicate how many participants had scores in the range represented by that stem and range of leaves. Thus, in the male plot, one student had a Stem of 0 and a Leaf of 3, that is, a score of 03 (or 3). The Frequency of students with scores between 05 and 09 (a Stem of 0 and leaves of 5 through 9) is 7: there were three scores of 5, two of 7, and two of 9. One student had a Stem of 1 and a Leaf of 0 (a score of 10); two had scores of 11, and so forth.
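The way scores are sorted into stems and leaves can be sketched in a few lines of Python (not SPSS). This is an illustration only: the male scores below are the whole-number values read back from the leaves in Output 4.3, because the raw decimal scores are not listed in the output, and `stem_and_leaf` is a hypothetical helper, not an SPSS function.

```python
from collections import defaultdict

# Male scores as whole numbers, read back from the leaves in Output 4.3.
male_scores = [
    3, 5, 5, 5, 7, 7, 9, 9,
    10, 11, 11, 12, 13, 14, 14, 14, 14, 14, 14,
    15, 15, 17, 18, 18, 19, 19,
    21, 21, 21, 22, 23, 23, 23, 23,
]


def stem_and_leaf(scores, stem_width=10):
    """Group non-negative whole-number scores into (frequency, stem, leaves) rows.

    Each stem is split into two rows (leaves 0-4 and leaves 5-9),
    matching the layout of the plots in Output 4.3.
    """
    rows = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, stem_width)
        half = 0 if leaf < stem_width // 2 else 1
        rows[(stem, half)].append(str(leaf))
    return [
        (len(leaves), stem, "".join(leaves))
        for (stem, _half), leaves in sorted(rows.items())
    ]


male_rows = stem_and_leaf(male_scores)
for freq, stem, leaves in male_rows:
    print(f"{freq:9.2f} {stem:8d} .  {leaves}")
```

The Frequency on each row is simply the number of leaves on that row, so the frequencies sum to the 34 males with valid scores.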
Boxplots are the last part of the output. This figure has two boxplots (one for males and one for females). By inspecting the plots, we can see that the median score for males is quite a bit higher than that for females, although there is substantial overlap of the boxplots, with the highest female score equaling the highest male score. We therefore need to be careful in concluding that males score higher than females, especially based on a small sample of students. In Chapter 9, we show how an inferential statistic (the t test) can help us know how likely it is that this apparent difference could have occurred by chance.
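The comparison of medians and highest scores can also be checked against the stem-and-leaf plots. The Python sketch below decodes the rows of both plots back into whole-number scores. Because the leaves carry no decimals, the medians it gives (14 and 10) are the whole-number part of the medians in the Descriptives table (14.33 and 10.33). The `decode` helper is hypothetical and written only for this illustration.

```python
from statistics import median

# (stem, leaves) rows copied from the stem-and-leaf plots in Output 4.3.
# A stem of "-0" marks a negative score.
male_plot = [
    ("0", "3"), ("0", "5557799"), ("1", "01123444444"),
    ("1", "5578899"), ("2", "11123333"),
]
female_plot = [
    ("-0", "1"), ("0", "1123344"), ("0", "555666778999"),
    ("1", "00002334444"), ("1", "77779"), ("2", "02233"),
]


def decode(plot, stem_width=10):
    """Turn (stem, leaves) rows back into a sorted list of whole-number scores."""
    scores = []
    for stem, leaves in plot:
        sign = -1 if stem.startswith("-") else 1
        base = abs(int(stem)) * stem_width
        scores.extend(sign * (base + int(leaf)) for leaf in leaves)
    return sorted(scores)


male_scores = decode(male_plot)
female_scores = decode(female_plot)

for label, scores in (("male", male_scores), ("female", female_scores)):
    print(label, len(scores), median(scores), min(scores), max(scores))
```

Both groups have the same highest whole-number score, which is the overlap at the top of the two boxplots described above.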
Using the output to check your data for errors. Checking the boxplots and stem-and-leaf plots can help identify outliers that might be data entry errors. In this case there aren’t any.
Using the output to check your data for assumptions. As noted in the interpretation of Outputs 4.2a and 4.2b, you can tell if a variable is grossly non-normal by looking at the boxplots. The stem-and-leaf plots provide similar information. You can also examine the skewness values for each gender separately in the table of Descriptives (see the circled skewness values). Note that for both males and females, the skewness values are less than one in absolute value (-.156 and .331), which indicates that math achievement is approximately normal for both genders. This is an assumption of the t test.
The Descriptives table also provides the variances for males and females. A key assumption of the t test is that the variances are approximately equal (i.e., the assumption of homogeneity of variances). Note that the variance is 36.38 for males and 44.84 for females. These do not seem grossly different, and we find out in Chapter 9 that they are, in fact, not significantly different. Thus, the assumption of homogeneous variances is not violated.
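The two assumption checks can be written out as a short Python sketch using the values from the Descriptives table. The skewness screen is the rule of thumb used in the text. The variance ratio is only a descriptive aid added here for illustration; it is not a significance test and does not replace the test reported in Chapter 9.

```python
# Skewness and variance values copied from the Descriptives table in Output 4.3.
descriptives = {
    "male": {"skewness": -0.156, "variance": 36.379},
    "female": {"skewness": 0.331, "variance": 44.838},
}

# Rule of thumb from the text: skewness less than one (in absolute value).
skew_ok = {
    gender: abs(stats["skewness"]) < 1
    for gender, stats in descriptives.items()
}

# Larger variance divided by smaller variance.
# Assumption: used here as a rough descriptive screen only.
variances = [stats["variance"] for stats in descriptives.values()]
variance_ratio = max(variances) / min(variances)

print(skew_ok)
print(round(variance_ratio, 2))
```

A ratio close to 1 means the two variances are similar in size; here the female variance is about 1.23 times the male variance.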