About all we can do to summarize a categorical variable that is unordered is to report the mode and show a frequency distribution or a graph (pie chart or bar chart). In this chapter, you will use a dataset, descriptive_gss.dta, that includes three categorical, unordered variables. sex, marital, and polviews are nominal-level variables. The variable sex is coded as male or female, marital is coded by marital status, and polviews is coded by political view. There is no order to sex in that being coded male or female does not make one higher or lower on sex. Similarly, there is no order to marital in that having a particular status (that is, never married, married, separated, divorced, or widowed) does not make one higher or lower on marital status. These are just different statuses.
We can use the tabulate command to get frequency distributions for sex, marital, and polviews. We could type tabulate sex, then tabulate marital, and then tabulate polviews. However, Stata has another command named tab1 that will repeatedly issue the tabulate command on each of the specified variables: tab1 sex marital polviews. This is so simple that you probably want to enter it directly, but if you want to use the dialog box, select Statistics ⊲ Summaries, tables, and tests ⊲ Frequency tables ⊲ Multiple one-way tables; see figure 5.3. Be sure to select Multiple one-way tables rather than One-way tables.
Figure 5.3. Dialog box for frequency tabulation
5.4 Statistics and graphs—unordered categories
Tabulating a series of variables and including missing values
There are three variations of the tabulate command. tabulate, followed by one variable name, produces a one-way table of frequency counts for the values of that variable. To do a tabulation of a variable, say, educ, the command is tabulate educ.
The tab1 command, as mentioned above, takes a list of variable names and issues the tabulate command separately for each of them. Suppose that you want to do a tabulation on educ, sex, and polviews, and you want to do this using one command. You would use tab1 educ sex polviews. Sometimes you might want to have the tabulation show missing values. To do this, add the missing option. To do a tabulation of the three variables in one command and show the missing values, the command is tab1 educ sex polviews, missing.
tabulate, followed by two variable names, produces a two-way table of frequency counts for the combinations of values of those variables. The tab2 command takes a list of variable names and issues the tabulate command separately for each possible pair of variable names.
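As a rough illustration of these variations, the commands below could be typed in the Command window or run from a do-file. This is only a sketch: it assumes that descriptive_gss.dta is in the current working directory and that it contains educ, which is used here only because the text above uses it as an example.

```stata
* Load the chapter dataset (assumed to be in the working directory)
use descriptive_gss.dta, clear

* One-way table for a single variable
tabulate educ

* Separate one-way tables for several variables, showing missing values
tab1 educ sex polviews, missing

* Two-way table for one pair of variables
tabulate sex marital

* Two-way tables for every possible pair of the listed variables
tab2 sex marital polviews
```

With three variables, tab2 produces three tables: sex by marital, sex by polviews, and marital by polviews.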
Using the dialog box, we enter the variables sex, marital, and polviews. Clicking on OK produces the following result:
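. tab1 sex marital polviews

-> tabulation of sex

respondents |
        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
       male |      1,228       44.41       44.41
     female |      1,537       55.59      100.00
------------+-----------------------------------
      Total |      2,765      100.00

-> tabulation of marital

      marital |
       status |      Freq.     Percent        Cum.
--------------+-----------------------------------
      married |      1,269       45.90       45.90
      widowed |        247        8.93       54.83
     divorced |        445       16.09       70.92
    separated |         96        3.47       74.39
never married |        708       25.61      100.00
--------------+-----------------------------------
        Total |      2,765      100.00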
Chapter 5 Descriptive statistics and graphs for one variable
-> tabulation of polviews

    think of self as |
          liberal or |
        conservative |      Freq.     Percent        Cum.
---------------------+-----------------------------------
   extremely liberal |         47        3.53        3.53
             liberal |        143       10.74       14.27
    slightly liberal |        159       11.95       26.22
            moderate |        522       39.22       65.44
slghtly conservative |        209       15.70       81.14
        conservative |        210       15.78       96.92
extrmly conservative |         41        3.08      100.00
---------------------+-----------------------------------
               Total |      1,331      100.00
The first tabulations for sex and marital tell us a lot. Some 55.6% of our sample of 2,765 adults are women (1,537), and 44.4% are men (1,228). For the marital variable, 45.9% (1,269) of adults are married. This is a clear mode because this marital status is so much more frequent than any of the other statuses. By contrast, the mode for sex, female, is not as predominant a category.
A Stata user, Ben Jann, wrote a command called fre in 2007 (with subsequent revisions) that provides more detail than tab1. To install the fre command, type search fre. A new Viewer window will open with a list of all packages with the keyword “fre”.
You can scroll through these to find the command fre, or you can press Ctrl+f on Windows or Cmd+f on Mac, type fre in the resulting box, and then press Enter until you find the command fre. Click on the link, and in the new window, click on the click here to install link, which will install the fre ado-file and help file. Because you know the name of the command, you could install the command more simply by typing ssc install fre in the Command window. This will automatically install the command.
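The shorter installation route described above might look like the following sketch. It assumes a working Internet connection; which and help are standard Stata commands used here only to check that the installation succeeded.

```stata
* Install fre from the SSC archive (requires an Internet connection)
ssc install fre

* Confirm that Stata can now find the command
which fre

* Open the help file that was installed with it
help fre
```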
Enter the command fre sex marital polviews. This gives you the result we had before, but fre is more useful when there are missing values. In this dataset, there are 1,434 people who were not asked, or at least did not report, their political views. In the column headed “Percent”, the fre command provides the percentage of the total sample who selected each category (18.88% picked moderate). We usually do not want to use the percentages in this column because we are usually interested in the percentage of the valid responses.
The valid responses are those 1,331 survey participants who answered the item. The column headed with “Valid” shows us that 39.22% of those valid responses picked the moderate category. The last column on the right is labeled “Cum.”, which is the cumulative percentage for the valid responses. We see that 65.44% of the participants who answered picked moderate or more liberal responses, but only 14.27% picked the liberal or the extremely liberal categories. One of the nicest features of the fre command is the way it provides both the value label and the numeric value in one table. Being able to see the label and value together is quite useful when you are recoding or creating a scale.
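The difference between the two percentage columns is only a matter of which denominator is used. As a small check, the display command can reproduce the figures for the moderate category from the counts reported in this section (522 moderates, 1,331 valid responses, 1,434 missing, 2,765 in total).

```stata
* Percentage of the total sample who picked moderate
display %5.2f 100*522/2765      // 18.88

* Percentage of the valid responses who picked moderate
display %5.2f 100*522/1331      // 39.22

* Percentage of the total sample with a missing value on polviews
display %5.2f 100*1434/2765     // 51.86
```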
Comparing the fre results with the tab1 results above, we see that the variable label is easier to read with the fre command.
. fre sex marital polviews

sex -- respondents sex
                 |      Freq.    Percent      Valid       Cum.
-----------------+--------------------------------------------
Valid   1 male   |       1228      44.41      44.41      44.41
        2 female |       1537      55.59      55.59     100.00
        Total    |       2765     100.00     100.00

marital -- marital status
                        |      Freq.    Percent      Valid       Cum.
------------------------+--------------------------------------------
Valid   1 married       |       1269      45.90      45.90      45.90
        2 widowed       |        247       8.93       8.93      54.83
        3 divorced      |        445      16.09      16.09      70.92
        4 separated     |         96       3.47       3.47      74.39
        5 never married |        708      25.61      25.61     100.00
        Total           |       2765     100.00     100.00

polviews -- think of self as liberal or conservative
                               |      Freq.    Percent      Valid       Cum.
-------------------------------+--------------------------------------------
Valid   1 extremely liberal    |         47       1.70       3.53       3.53
        2 liberal              |        143       5.17      10.74      14.27
        3 slightly liberal     |        159       5.75      11.95      26.22
        4 moderate             |        522      18.88      39.22      65.44
        5 slghtly conservative |        209       7.56      15.70      81.14
        6 conservative         |        210       7.59      15.78      96.92
        7 extrmly conservative |         41       1.48       3.08     100.00
        Total                  |       1331      48.14     100.00
Missing .                      |       1434      51.86
Total                          |       2765     100.00
Obtaining both numbers and value labels
Before doing the tabulations, you might want to type the command numlabel _all, add. After you enter this command, whenever you type the tabulate command, Stata reports both the numbers you use for coding the data (1, 2, 3, 4, and 5) and the value labels (married, widowed, divorced, separated, and never married). Later, if you do not want to include both of these, you can drop the numerical values by using the command numlabel _all, remove. Practice using these commands as an exercise on your own.
The tables with both numbers and value labels may not look great, so you may want two tables for each variable, with one showing the value labels without the numeric codes and the other showing the numeric codes without the value labels. The default gives you the value labels. On the dialog box, there is an option to Display numeric codes rather than value labels. This option produces the numeric values without the value labels. If you want both the numeric values and the value labels using official Stata commands, you need to run either the numlabel command or run the tab1 command twice: once with Display numeric codes rather than value labels checked and once with it not checked. It is probably simpler just to run Ben Jann’s fre command once you have installed it.
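The two official-Stata routes described in this box might be sketched as follows. The nolabel option of tabulate is assumed here to be the command-line counterpart of the Display numeric codes rather than value labels checkbox.

```stata
* Route 1: prefix every value label with its numeric code
* (labels then appear as, for example, "1. married")
numlabel _all, add
tabulate marital

* Restore the original value labels afterward
numlabel _all, remove

* Route 2: two tables, one with value labels and one with numeric codes
tabulate marital
tabulate marital, nolabel
```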
In chapter 4, we created a pie chart. Here we will create a pie chart for marital status. Select Graphics ⊲ Pie chart and look at the Main tab. If this dialog still has information entered from a previous pie chart, you should click on the (reset) icon in the lower left of the view screen to clear the dialog box. Type marital as the Category variable. This uses the categories we want to show as pieces of the pie. Leave the Variable: (optional) box blank. Under the Titles tab, enter a nice title in the Title box and the name of the dataset we used as a Note. Under the Options tab, click on Order by this variable and type marital. Also check Exclude observations with missing values (casewise deletion) because we do not want these, if there are any, to appear as a separate piece of the pie. The dialog box for the Options tab is shown in figure 5.4.
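For readers who prefer typing commands, the dialog choices above correspond roughly to the graph pie command sketched below. The mapping is an assumption rather than the exact command the dialog generates: sort(marital) stands in for Order by this variable, and cw stands in for the casewise-deletion checkbox. The /// line continuations work in a do-file but not in the Command window.

```stata
* Pie chart of marital status, slices ordered by the values of marital
graph pie, over(marital) cw sort(marital) ///
    title("Marital Status in the United States") ///
    note("descriptive_gss.dta")
```

With a single over() variable, graph pie leaves out missing categories unless the missing option is added, so cw is included here mainly to mirror the dialog box.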
Figure 5.4. The Options tab for pie charts (by category)
The initial pie chart on the left in figure 5.5 provides a visual display of the distribution of marital statuses in the United States. The size of each piece of the pie is proportional to the percentage of the people in that status. This pie chart shows that the most common status of adults is married.
[Figure: initial (left) and edited (right) pie charts, each titled "Marital Status in the United States" with the note descriptive_gss.dta. Legend of the initial chart: married, widowed, divorced, separated, never married. Legend of the edited chart: Married, Widowed, Divorced, Separated, Never married.]
Figure 5.5. Pie charts of marital status in the United States
It is possible to improve the default pie chart. The default pie chart is a bit hard to read because it assumes you want each slice a different color, and this will not work well when printing in black and white. Because many publications require black and white printing, we should edit the pie chart. From our dialog box for the pie chart, we could click on the Overall tab, from which we could select a monochrome scheme from the drop-down Scheme list. However, there are several other ways we can improve this pie chart, so let’s open the Graph Editor.
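The monochrome-scheme route can also be taken from the command line with the scheme() option. The sketch below uses s1mono, one of the monochrome schemes shipped with Stata; the text does not say which scheme to pick, so this choice is an assumption.

```stata
* Same pie chart drawn with a black-and-white scheme
graph pie, over(marital) cw sort(marital) scheme(s1mono)
```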
We can open the Graph Editor by right-clicking on the pie chart and selecting Start Graph Editor, clicking on the icon above the graph that has a bar chart with a pencil, or in Windows, we can select File ⊲ Start Graph Editor. This expands the window that has the graph and adds a panel on the side of the chart with things we might want to change (see figure 5.6). On the left is the pie chart we will edit, and on the right are the names of the parts of the pie chart.
Figure 5.6. The Graph Editor
Notice that the labels in the legend of the initial pie chart are not capitalized. Click on the plus sign by legend and the plus sign by key region. Double-click on label[1] and change the Text from married to Married. Do the same for each of the other labels. We could also do this by double-clicking on the label married.
Next click on the plus sign by plotregion1, and then double-click on pieslices[1]. Here we will pick Black as the Color and 100% as the Fill intensity. For pieslices[2], pick Black and 70%. For pieslices[3], pick Black and 50%. For pieslices[4], pick Black and 30%, check Explode slice, and make sure the Distance is Medium. Finally, for pieslices[5], pick Black and 10%. You can experiment with other options. The pie chart on the right in figure 5.5 is what we have created. The exploded slice, Separated, would be useful if you wanted to emphasize the size of this group.
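Similar edits can be approximated without the Graph Editor by using the pie() and legend() options of graph pie. The sketch below is not an exact equivalent of the Editor steps: the gray-scale colors gs0 (black) through gs14 (light gray) are stand-ins chosen here for the 100% to 10% fill intensities, and explode uses Stata's default distance rather than the Medium setting.

```stata
* Gray-scale pie chart with the fourth slice (separated) pulled out
graph pie, over(marital) cw sort(marital) ///
    pie(1, color(gs0)) ///
    pie(2, color(gs5)) ///
    pie(3, color(gs8)) ///
    pie(4, color(gs11) explode) ///
    pie(5, color(gs14)) ///
    legend(label(1 "Married") label(2 "Widowed") label(3 "Divorced") ///
           label(4 "Separated") label(5 "Never married")) ///
    title("Marital Status in the United States") ///
    note("descriptive_gss.dta")
```

Putting the options in a do-file has the advantage that the same chart can be redrawn later without repeating the point-and-click steps.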
We have only scratched the surface of what you can do with the powerful Graph Editor. For example, you can click somewhere on the figure, then click on the T (Add Text Tool) to the left of the graph, and a dialog box opens so you can add text. You could then click on the \ (slash or Add Line Tool) just below the T, and draw a line from the text to the piece of the pie it describes. As an exercise, you might add the text “Less than half married” with a line pointing to the piece of the pie for married.
Within Stata’s Graph Editor, you can record the changes you make to a graph by using the Graph Recorder. When the Graph Editor is open, there are symbols like those you might see on a recorder or a video player (at the top right of the Editor). Clicking on the red circle starts the Recorder, and clicking on the pair of vertical bars pauses a recording. When you are done making your changes, click on the red circle again before you save the graph or exit the Graph Editor, and the Graph Recorder will prompt you to name the recording. Suppose that we call the recording myscheme and save it. The next time we do a similar graph and want to make the same changes, we can click on the arrow to the right of the pair of vertical bars, and it will give us a list of recordings we have saved. We can pick myscheme to apply the changes we recorded and saved in myscheme to our current graph.
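A saved recording can also be replayed from the command line with graph play. The sketch below assumes that a recording named myscheme was saved as described above and that the new graph has the same five slices, so that the recorded edits have something to apply to.

```stata
* Draw a fresh pie chart, then apply the edits stored in the recording
graph pie, over(marital) cw sort(marital)
graph play myscheme
```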
A bar chart is more attractive than a pie chart for many applications. Instead of selecting Bar Chart from the Graphics menu, select Histogram. Here we are creating a bar chart rather than a histogram, but this is the best way to produce a high-quality bar chart using Stata.
On the Main tab, type marital in the Variable box. Click on the button next to Data are discrete. In the section labeled Y axis, click on the button next to Percent.
The trick to making this a bar chart is to click on Bar properties in the lower left of the Main tab. This opens another dialog box where the default is to have no gap between the bars. Change this to a gap of 10, which sets the gap between bars to 10 percent of the width of a bar. Click on Accept. If you switch to the Titles tab, you can enter a title, such as Marital Status in the United States. Next switch to the X axis tab and click on Major tick/label properties. This opens another dialog box where you select the Labels tab and check the box for Use value labels. Sometimes the value labels are too wide to fit under each bar. You may need to create new value labels that are shorter. If they are just a little bit too wide, you can change the angle. Click on Angle and select 45 degrees from the drop-down menu. Click on Accept. Finally, switch back to the Main tab. In the lower right corner of the dialog box, click on Add height labels to bars. Because we are reporting percentages, this option will show the percentage in each marital status at the top of each bar. The Main tab is shown in figure 5.7.
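The dialog steps above correspond roughly to a single histogram command. The sketch below is an approximation of what the dialog produces, not a copy of it; in particular, the explicit tick rule 1(1)5 is an assumption based on marital being coded 1 through 5.

```stata
* Bar chart of marital status built with histogram
*   discrete   treat marital as a discrete variable (one bar per value)
*   percent    scale the y axis as percentages
*   gap(10)    shrink each bar to leave a gap of 10 percent of its width
*   addlabels  print the height (percentage) above each bar
histogram marital, discrete percent gap(10) addlabels ///
    xlabel(1(1)5, valuelabel angle(45)) ///
    title("Marital Status in the United States")
```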
Figure 5.7. Using the histogram dialog box to make a bar chart
Figure 5.8 shows the resulting bar chart, which has the percentage in each status at the top of each bar. Married is the most common status, but never married is second.
This dataset includes people who are 18 and older, and it is likely that many of those in the never-married status are between 18 and 30.
[Figure: bar chart titled "Marital Status in the United States". x axis: Marital Status (married, widowed, divorced, separated, never married); y axis: Percent, 0 to 50. Bar height labels: 45.9, 8.933, 16.09, 3.472, 25.61.]
Figure 5.8. Bar chart of marital status of U.S. adults
Sometimes you may have a larger number of categories. When this happens, Stata’s default will show only a limited number of value labels, so some of the bars will be unlabeled. If you want to label all of them, you need to go to the X axis tab and click on Major tick/label properties to open the dialog box we opened before. On the Rule tab, click on Suggest # of ticks and enter the number of bars in the box by Ticks. Another