In this problem, we use all of the HSB variables that we initially labeled as ordinal or scale in the Variable View. With those types of variables, it is important to see if the means make sense (are they close to what you expected?), to examine the dispersion/spread of the data, and to check the shape of the distribution (i.e., skewness value).
UNDERSTANDING YOUR DATA 57
4.1. Examine the data to get a good understanding of the central tendency, variability, range of scores, and the shape of the distribution for each of the ordinal and scale variables. Which variables are normally distributed?
This problem includes descriptive statistics and ways to examine your data to see if the variables are approximately normally distributed, an assumption of most of the parametric inferential statistics that we use. Remember that skewness is an important statistic for understanding whether a variable is normally distributed; it is an index that helps determine how much a variable’s distribution deviates from the distribution of the normal curve. Skewness refers to the lack of symmetry in a frequency distribution. Distributions with a long “tail” to the right have a positive skew and those with a long tail on the left have a negative skew. If a frequency distribution of a variable has a large (plus or minus) skewness, that variable is said to deviate from normality. In this assignment, we examine this assumption for several key variables.
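To make the sign of the skewness concrete, here is a minimal sketch in Python (outside SPSS) that computes a skewness value for three small sets of scores. The scores are hypothetical, invented only for illustration, and the formula is the adjusted Fisher-Pearson estimator, which we assume is the one behind the Skewness statistic in the output.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (assumed to match the SPSS statistic)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed_z = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_z


# Hypothetical scores, invented for illustration only.
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 10]
long_left_tail = [11 - x for x in long_right_tail]  # mirror image of the above
symmetric = [1, 2, 3, 4, 5]

for label, scores in [
    ("long tail to the right", long_right_tail),
    ("long tail to the left", long_left_tail),
    ("symmetric", symmetric),
]:
    print(f"{label}: skewness = {sample_skewness(scores):.3f}")
```

The right-tailed scores give a positive value, their mirror image gives the same value with a negative sign, and the symmetric scores give a value of zero.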
However, some of the parametric inferential statistics that we use later in the book are robust or quite insensitive to violations of normality. Thus, we assume that it is okay to use parametric statistics to answer most of our research questions as long as the variables are not extremely skewed.
We answer Problem 4.1 by using the Descriptives command, which makes a compact, space-efficient output. You could instead run Frequencies because you can get the same statistics with that command. We use the Frequencies command later in the chapter. We use the Descriptives command to compute the basic descriptive statistics for all of the variables that we initially labeled ordinal or scale. We will not include the nominal variables (ethnicity and religion) or gender, algebra1, algebra2, geometry, trigonometry, calculus, and math grades, which are dichotomous variables but are labeled nominal here. We use them in a later problem.
4.1a. First, we will compute Descriptives for the ordinal variables. Use these steps:
• Select Analyze → Descriptive Statistics → Descriptives…
After selecting Descriptives, you will be ready to compute the mean, standard deviation, skewness, minimum, and maximum for all participants or cases on all the variables that we initially called ordinal under Measure in the Variable View of the Data Editor.
• While holding down the control key (i.e., the key marked “Ctrl”), click on all of the variables in the left box that we called ordinal so that they are highlighted. These include father’s education, mother’s education, grades in h.s., and all the “item” variables (item 01 through item 11 reversed). Note that each of these variables has the ordinal symbol next to it, which indicates that it was called ordinal in the Variable View.
• Click on the arrow button pointing right to produce Fig. 4.1.
• Be sure that all of the requested variables have moved out of the left window.
Fig. 4.1. Descriptives.
• Click on Options. The Descriptives: Options window (Fig. 4.2) will open.
• Be sure that Mean has a check next to it.
• Under Dispersion, select Std. Deviation, Variance, Range, Minimum, and Maximum so that each has a check.
• Under Distribution, check Skewness. Your window should look like Fig. 4.2.
Fig. 4.2. Descriptives: Options.
• Click on Continue to get back to Fig. 4.1.
• Click on OK to produce Output 4.1a.
4.1b. Next, we compute Descriptives for the variables that were labeled scale in the Data Editor.
Note that these variables have the scale symbol next to them.
• Click on Reset in Fig. 4.1 to move the ordinal variables back to the left. This also deletes what we chose under Options.
• Highlight math achievement, mosaic pattern test, visualization test, visualization retest, scholastic aptitude test – math, competence scale, and motivation scale and move them to the Variables box.
• Click on Options and check the same descriptive statistics as you did in Fig. 4.2.
• Click on Continue, then on OK to produce Output 4.1b.
Compare your syntax and output to Outputs 4.1a and 4.1b. If they look the same, you have done the steps correctly. If the syntax is not showing in your output, consult Appendix A to see how to set your computer so that the syntax is displayed.
Output 4.1a: Descriptives for the Variables Initially Labeled Ordinal
DESCRIPTIVES VARIABLES=faed maed grades item01 item02 item03 item04 item05 item06 item07 item08 item09 item10 item11 item12 item13 item14 item04r item05r item08r item11r /STATISTICS=MEAN STDDEV VARIANCE RANGE MIN MAX SKEWNESS.
Descriptives
Descriptive Statistics
Variable             N  Range  Minimum  Maximum  Mean  Std. Deviation  Variance  Skewness  Skewness Std. Error
father's education  73      8        2       10  4.73           2.830     8.007      .684                 .281
mother's education  75      8        2       10  4.11           2.240     5.015     1.124                 .277
grades in h.s.      75      6        2        8  5.68           1.570     2.464     -.332                 .277
item01 motivation   74      3        1        4  2.96            .928      .861     -.763                 .279
item02 pleasure     75      3        1        4  3.52            .906      .821    -1.910                 .277
item03 competence   74      3        1        4  2.82            .897      .804     -.579                 .279
item04 low motiv    74      3        1        4  2.16            .922      .850      .422                 .279
item05 low comp     75      3        1        4  1.61            .971      .943     1.581                 .277
item06 low pleas    75      3        1        4  2.43            .975      .951     -.058                 .277
item07 motivation   75      3        1        4  2.76           1.051     1.104     -.433                 .277
item08 low motiv    75      3        1        4  1.95            .914      .835      .653                 .277
item09 competence   74      3        1        4  3.32            .760      .578    -1.204                 .279
item10 low pleas    75      3        1        4  1.41            .737      .543     1.869                 .277
item11 low comp     75      3        1        4  1.36            .747      .558     2.497                 .277
item12 motivation   75      3        1        4  3.00            .822      .676     -.600                 .277
item13 motivation   75      3        1        4  2.67            .794      .631     -.320                 .277
item14 pleasure     75      3        1        4  2.84            .717      .515     -.429                 .277
item04 reversed     74      3        1        4  2.84            .922      .850     -.422                 .279
item05 reversed     75      3        1        4  3.39            .971      .943    -1.581                 .277
item08 reversed     75      3        1        4  3.05            .914      .835     -.653                 .277
item11 reversed     75      3        1        4  3.64            .747      .558    -2.497                 .277
Valid N (listwise)  69
Syntax or log file shows the variables and statistics that you requested.
Output 4.1b: Descriptives for Variables Initially Labeled as Scale
DESCRIPTIVES
VARIABLES=mathach mosaic visual visual2 satm competence motivation /STATISTICS=MEAN STDDEV VARIANCE RANGE MIN MAX SKEWNESS .
Descriptives
Descriptive Statistics
Variable                          N  Range  Minimum  Maximum     Mean  Std. Deviation  Variance  Skewness  Skewness Std. Error
math achievement test            75  25.33    -1.67    23.67  12.5645         6.67031    44.493      .044                 .277
mosaic, pattern test             75   60.0     -4.0     56.0   27.413          9.5738    91.658      .529                 .277
visualization test               75  15.00     -.25    14.75   5.2433         3.91203    15.304      .536                 .277
visualization retest             75   9.50      .00     9.50   4.5467         3.01816     9.109      .235                 .277
scholastic aptitude test - math  75    480      250      730   490.53          94.553  8940.252      .128                 .277
Competence scale                 73   3.00     1.00     4.00   3.2945          .66450      .442    -1.634                 .281
Motivation scale                 73   2.83     1.17     4.00   2.8744          .63815      .407     -.570                 .281
Valid N (listwise)               71
Interpretation of Outputs 4.1a and 4.1b
These outputs provide descriptive statistics for all of the variables labeled as ordinal (4.1a) and scale (4.1b). Notice that the variables are listed down the left column of the outputs and the requested descriptive statistics are listed across the top row. The descriptive statistics included in the output are the number of subjects (N), the Range, Minimum (lowest) and Maximum (highest) scores, the Mean (or average) for each variable, the Std. (the standard deviation), the Variance, the Skewness statistic, and the Std. error of the skewness. Note, from the bottom line of the outputs, that the Valid N (listwise) is 69 for Output 4.1a and 71 for 4.1b rather than 75, which is the number of participants in the data file. This is because the listwise N only includes the persons with no missing data on any variable requested in the output. Notice that several variables (e.g., father’s education, item01, motivation, and competence) each have one or two participants missing.
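The difference between the N for each variable and the Valid N (listwise) can be sketched in a few lines of Python (outside SPSS). The variable names faed, maed, and grades come from the syntax in Output 4.1a, but the five rows of scores are hypothetical and are not taken from the HSB data file.

```python
def valid_n(rows, variable):
    """Count the cases that have a value for one variable."""
    return sum(1 for row in rows if row[variable] is not None)


def valid_n_listwise(rows, variables):
    """Count the cases that have a value for every requested variable."""
    return sum(1 for row in rows if all(row[v] is not None for v in variables))


# Hypothetical cases; None marks a missing value.
rows = [
    {"faed": 4, "maed": 3, "grades": 6},
    {"faed": None, "maed": 5, "grades": 7},
    {"faed": 2, "maed": 2, "grades": None},
    {"faed": 8, "maed": 6, "grades": 5},
    {"faed": None, "maed": 4, "grades": 4},
]
variables = ["faed", "maed", "grades"]

for v in variables:
    print(f"N for {v}: {valid_n(rows, v)}")
print(f"Valid N (listwise): {valid_n_listwise(rows, variables)}")
```

Because a case is dropped from the listwise count if it is missing on any one of the requested variables, the listwise N can be smaller than the N for every individual variable, as it is in both outputs.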
Using your output to check your data for errors. For both the ordinal and scale variables, check to make sure that all Means seem reasonable. That is, you should check your means to see if they are within the ranges you expected (given the information in your codebook) and if the means are close to what you might expect (given your understanding of the variable). Next, check the output to see that the Minimum and Maximum are within the appropriate (codebook) range for each variable. If the minimum is smaller or the maximum is bigger than you expected (e.g., 0 or 100 for a variable that has 1–50 for possible values), then there was an error somewhere.
Finally, you should check the N column to see if the Ns are what you were expecting. If it happens that you have more participants missing than you expected, check the original data to see if you forgot to enter some data, or data were entered incorrectly. Notice that the competence scale and the motivation scale each have two participants missing.
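The minimum and maximum check described above amounts to a simple comparison, sketched here in Python (outside SPSS). The function name is hypothetical; the first example uses the 1–50 variable mentioned in the text, and the second uses the 1–4 range of the “item” variables.

```python
def range_problems(observed_min, observed_max, allowed_min, allowed_max):
    """List the ways the observed scores fall outside the codebook range."""
    problems = []
    if observed_min < allowed_min:
        problems.append(f"minimum {observed_min} is below {allowed_min}")
    if observed_max > allowed_max:
        problems.append(f"maximum {observed_max} is above {allowed_max}")
    return problems


# The example from the text: possible values are 1-50, but 0 and 100 appear.
print(range_problems(0, 100, 1, 50))

# An "item" variable from Output 4.1a: possible values 1-4, observed 1 and 4.
print(range_problems(1, 4, 1, 4))
```

An empty list means the observed minimum and maximum are both inside the codebook range; anything else points to a data entry or coding error to track down.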
Using the output to check assumptions. The main assumption that we can check from this output is normality. We won’t pay much attention to the skewness for item 01 to item 11 reversed, which have only four levels (1–4). Because these ordinal variables have fewer than five levels, they will not be considered to be scale even though some of the “item” variables are not very skewed. We do not use them as individual variables because we are combining them to create summated variables (the motivation, competence, and pleasure scales) before using inferential statistics.
Most statistics books do not provide advice about how to decide whether a variable is at least approximately normal. SPSS/PASW recommends that you divide the skewness by its standard error. If the absolute value of the result is less than 2.5 (which is approximately the p = .01 level), then the skewness is not significantly different from that of a normal distribution. A problem with this method, aside from having to use a calculator, is that the standard error depends on the sample size, so with large samples most variables would be found to be non-normal, even though data for large samples are actually more likely to be normal. A simpler guideline is that if the absolute value of the skewness (the value without considering whether or not there is a negative sign) is less than one, the variable is at least approximately normal.
From Output 4.1a, we can see that two of the variables that we initially called ordinal (father’s education and grades in h.s.) are approximately normally distributed. These ordinal variables, with five or more levels, have skewness values between –1 and 1. Thus, we can assume that they are more like scale variables, and we can use inferential statistics that have the assumption of normality. To better understand these variables, it may be helpful to change the Measure column in the Variable View so that these two variables are labeled as scale; we call them normal.
We expect the variables that we initially labeled as scale to be normally distributed. Look at the Skewness Statistic in Output 4.1b to see if it is between –1 and 1. From the output we see that most of these variables have skewness values between –1 and 1, but one (competence) does not, so it may be helpful to change it to ordinal in the Measure column.
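Both guidelines can be applied with a few lines of arithmetic, sketched here in Python (outside SPSS). The formula for the standard error of the skewness is the standard one that depends only on N; it is an assumption in the sense that the text does not state it, but it reproduces the .277, .279, and .281 shown in the outputs. The function names are hypothetical, and the skewness values and Ns are copied from Outputs 4.1a and 4.1b.

```python
import math


def skewness_std_error(n):
    """Standard error of the skewness, which depends only on the sample size."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def passes_ratio_rule(skewness, n, cutoff=2.5):
    """Skewness divided by its standard error is smaller than the cutoff."""
    return abs(skewness / skewness_std_error(n)) < cutoff


def passes_simple_rule(skewness):
    """Absolute value of the skewness is less than one."""
    return abs(skewness) < 1.0


# (N, skewness) copied from Outputs 4.1a and 4.1b.
variables = {
    "father's education": (73, 0.684),
    "mother's education": (75, 1.124),
    "grades in h.s.": (75, -0.332),
    "math achievement": (75, 0.044),
    "competence scale": (73, -1.634),
}

for name, (n, skewness) in variables.items():
    ratio = skewness / skewness_std_error(n)
    print(
        f"{name}: skewness/SE = {ratio:.2f}, "
        f"ratio rule: {passes_ratio_rule(skewness, n)}, "
        f"simple rule: {passes_simple_rule(skewness)}"
    )
```

For these five variables the two guidelines agree: mother’s education and the competence scale fail both, and the other three pass both. Because the standard error shrinks as N grows, the same skewness value would be more likely to fail the ratio rule in a larger sample, which is the drawback noted above.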
There are several ways to check this assumption in addition to checking the skewness value. If the mean, median, and mode, which can be obtained with the Frequencies command, are approximately equal, then you can assume that the distribution is approximately normal. For example, remember from Chapter 3 (Fig. 3.8) that the mean (490.53), median (490.00), and mode (500) for scholastic aptitude test – math were very similar values, and the skewness value was .128 (see Output 4.1b). Thus, we can assume that SAT-math is approximately normally distributed.
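One way to judge how similar the mean, median, and mode are is to express the gap between the largest and smallest of the three in standard deviation units. The sketch below, in Python (outside SPSS), does this for SAT-math using the values quoted above and the standard deviation from Output 4.1b. The function name is hypothetical, and the text does not give a cutoff for this gap, so none is assumed here.

```python
def central_spread_in_sd(mean, median, mode, std_deviation):
    """Gap between the largest and smallest of mean, median, and mode, in SD units."""
    centers = [mean, median, mode]
    return (max(centers) - min(centers)) / std_deviation


# Values for scholastic aptitude test - math, taken from the text and Output 4.1b.
spread = central_spread_in_sd(mean=490.53, median=490.00, mode=500, std_deviation=94.553)
print(f"Mean, median, and mode span {spread:.3f} standard deviations")
```

For SAT-math the three values lie within about a tenth of a standard deviation of one another, which is consistent with the small skewness value.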