Creating and modifying variables

Part of the document A Gentle Introduction to Stata, Fourth Edition (pages 93–98)

3.5 Creating and modifying variables

There are several commands that are used in creating, replacing, and modifying variables. Useful commands include generate, egen (short for extended generate), rename, and clonevar. Suppose that we want to give the positive items names that make more sense to us. A simple way to do this is to create new variables that are clones of the original ones.

The clonevar newname = oldname command does this. clonevar has several advantages: the command itself is simple, it keeps the missing values coded the same way as they were in the old variable, and it keeps the same value labels. We could run the following commands from the Command window:

. clonevar mompraise = R3483600
. clonevar momhelp = R3483800
. clonevar dadpraise = R3485200
. clonevar dadhelp = R3485400

Let’s do the same for three other variables in the dataset: identification number, gender, and age.

. clonevar id = R0000100
. clonevar sex = R3828700
. clonevar age = R3828100

You can open the dialog box for generate by selecting Data > Create or change data > Create new variable. The generate command does much the same thing as the clonevar command, but it does not transfer value labels. It does transfer the missing-value codes we used: .a, .b, .c, .d, and .e. If we wanted to run the generate command directly without using the dialog box, we would type the following commands. (Do not type these commands now because we already created the new variables by using the clonevar command.)

. generate mompraise = R3483600
. generate momhelp = R3483800
. generate dadpraise = R3485200
. generate dadhelp = R3485400
. generate id = R0000100
. generate sex = R3828700
. generate age = R3828100
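If you want to see the difference between clonevar and generate for yourself, here is a small sketch on made-up data. The variable item and the value label agree are hypothetical and are not part of the dataset used in this chapter. Both copies should keep the extended missing code .a, but only the clone should carry the value label.

```stata
* Made-up data: one item with a value label and a -1 missing-value code
clear
input byte item
0
1
-1
4
end
label define agree 0 "Strongly disagree" 4 "Strongly agree"
label values item agree
mvdecode item, mv(-1=.a)

* Copy the variable both ways
clonevar item_clone = item
generate item_gen = item

* Both copies keep the extended missing code .a in observation 3
list item item_clone item_gen, nolabel

* The "value label" column shows agree only for item and item_clone
describe item item_clone item_gen
```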

The generate command can also be used to create new variables by using an arithmetic expression. Table 3.4 shows the arithmetic symbols that can be used in these expressions.

Table 3.4. Arithmetic symbols

Symbol   Operation              Example
+        Addition               mscore + fscore + sibscore
-        Subtraction            balance - expenses - penalty
*        Multiplication         income * .75
/        Division               expenses/income
^        Exponentiation (x²)    x^2
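To try each symbol in table 3.4, here is a sketch that builds one observation with made-up values for the variables named in the table (the values, and the result names addition through power, are hypothetical) and applies each operation once.

```stata
* One observation of made-up values for the variables in table 3.4
clear
set obs 1
generate mscore   = 10
generate fscore   = 12
generate sibscore = 8
generate balance  = 100
generate expenses = 50
generate penalty  = 10
generate income   = 200
generate x        = 3

generate addition    = mscore + fscore + sibscore   // 30
generate subtraction = balance - expenses - penalty // 40
generate product     = income * .75                 // 150
generate quotient    = expenses/income              // .25
generate power       = x^2                          // 9
list addition-power
```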

Attempts to do arithmetic with missing values will lead to missing values. So in the addition example in table 3.4, if sibscore were missing (say, it was a single-child household), the whole sum in the example would be set to missing for that observation.
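Here is a minimal sketch of that single-child case, using two made-up households. The second household has no sibling score, so its total comes out missing even though mscore and fscore are known.

```stata
* Two made-up households; the second has no sibling, so sibscore is missing
clear
input mscore fscore sibscore
3 4 2
3 4 .
end
generate total = mscore + fscore + sibscore
list
* Extended missing codes such as .a also turn into system missing (.)
display .a + 1
```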

From the Command window, type help generate to see several more examples of what can be done with the generate command. If you ever have trouble finding a dialog box but you know the command name, you can open the help file first. The upper-right corner of a help file opened in the Viewer window shows a Dialog menu. Select the command name from the drop-down menu, and the dialog box for that command will open.

For more complicated expressions, order of operations can be important, and you can use parentheses to control the order in which things are done. Parentheses contain expressions too, and those are calculated before the expressions outside of parentheses.

Parentheses are never wrong. They might be unnecessary to get Stata to calculate the correct value, but they are not wrong. If you think they make an expression easier to read or understand, use as many as you need.

Fortunately, the rules are pretty simple. Stata reads expressions from left to right, and the order in which Stata calculates things inside expressions is to

1. Do everything in parentheses. If one set of parentheses contains another set, do the inside set first.

2. Exponentiate (raise to a power).

3. Multiply and divide.

4. Add and subtract.
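You can check these rules directly in Stata. The sketch below uses assert, which stops with an error if its expression is false for any observation, so a silent run means every rule held. One empty observation is created only so that assert has something to evaluate.

```stata
* A silent run means every claim below is true
clear
set obs 1
assert 2 + 3*4 == 14              // multiply before adding
assert (2 + 3)*4 == 20            // parentheses first
assert 2*3^2 == 18                // exponentiate before multiplying
assert 12/2*3 == 18               // same priority: left to right
assert 12/(2*3) == 2              // parentheses override left to right
assert (1 + (2 + 3)*2)^2 == 121   // innermost parentheses first
```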

Let’s go step by step through an example:

. generate example = weight/.45*(5+1/age^2)

When Stata looks at the expression to the right of the equal sign, it notices the parentheses (priority #1) and looks inside them. There it sees that it first has to square age (priority #2), then divide 1 by the result (priority #3), and then add 5 to that result (priority #4). Once it is done with everything in the parentheses, it reads from left to right, so it divides weight by 0.45 and then multiplies the result by the final value it got from calculating what was inside the parentheses. That final value is put into a variable called example.

Stata does not care about spaces in expressions, but spaces can help readability. For example, instead of typing the expression the way we just did, we can add some spaces to make the meaning clearer, as in

. generate example = weight/.45 * (5 + 1/age^2)

If we wanted to be even more explicit, it would not be wrong to type

. generate example = (weight/.45) * (5 + 1/(age^2))
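To confirm that the three versions of the command compute the same thing, here is a sketch with two made-up observations (the weight and age values are hypothetical). It also adds one deliberately different grouping, weight/(.45*(...)), to show that moving parentheses, unlike adding spaces, changes the result. The double storage type is used so the comparisons are not affected by float rounding.

```stata
* Made-up values; with weight = 90 and age = 2 the result is 200 * 5.25 = 1050
clear
input weight age
90 2
150 5
end
generate double ex1 = weight/.45*(5+1/age^2)
generate double ex2 = weight/.45 * (5 + 1/age^2)
generate double ex3 = (weight/.45) * (5 + 1/(age^2))
* Different grouping: divides weight by the whole product instead
generate double other = weight/(.45*(5 + 1/age^2))
list
```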

Let’s take another look at reverse coding. We already reverse-coded variables by using a set of explicit rules like (0=4), but we could accomplish the same thing using arithmetic. Because this is a relatively simple problem, we will use it to introduce some of the things you may need to be concerned about with more complex problems.

Reversing a scale means swapping its ends. The scale runs from 0 to 4, so we can swap the ends by subtracting the current value from 4. If the original value is 4 and we subtract it from 4, we get 0, which is the rule we specified with recode. If the original value is 0 and we subtract it from 4, we get 4, which is also the rule we specified with recode.

This scale starts at 0, so to reverse it, you just subtract each value from the largest value in the scale, in this case 4. If our scale were 0 to 6, we would subtract each value from 6; if it were 0 to 3, we would subtract from 3. If the scale started at 1 instead of 0, we would need to add 1 to the largest value before subtracting. So for a 1 to 5 scale, we would subtract from 6 (6−1 = 5 and 6−5 = 1); for a 1 to 3 scale, we would subtract from 4 (4−3 = 1 and 4−1 = 3). In general, the reversed value is (smallest value + largest value) − original value.
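Here is a sketch that checks the arithmetic rule against recode on made-up items, one on a 0 to 4 scale and one on a 1 to 5 scale (the names item04, item15, rev04, rec04, and rev15 are hypothetical). If the two approaches agree, rev04 and rec04 are identical in every observation.

```stata
* Made-up items covering every point of a 0-4 scale and a 1-5 scale
clear
input item04 item15
0 1
1 2
2 3
3 4
4 5
end
* 0 to 4 scale: subtract from 0 + 4 = 4
generate rev04 = 4 - item04
recode item04 (0=4) (1=3) (2=2) (3=1) (4=0), generate(rec04)
* 1 to 5 scale: subtract from 1 + 5 = 6
generate rev15 = 6 - item15
list
```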

What we have said so far is correct, as far as it goes, but it does not take into account missing values or their codes. The missing values in this dataset are coded as −1 to −5, and if we subtract those codes from 4 along with the item responses, we end up with values from 4−(−1) = 5 to 4−(−5) = 9, which are meaningless numbers outside the 0 to 4 range. So we must first convert the missing-value codes to Stata’s missing-value codes (such as . or .a) and then do the arithmetic to reverse the scale.
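The following sketch shows the problem and the fix on a made-up item that uses the −1 to −5 codes described above. The variable wrong is computed before mvdecode and turns the codes into 5 and 9; right is computed afterward, so the coded observations simply become missing.

```stata
* Made-up item: two real answers and two missing-value codes
clear
input item
0
4
-1
-5
end
generate wrong = 4 - item       // -1 becomes 5 and -5 becomes 9
mvdecode item, mv(-1=.a \ -2=.b \ -3=.c \ -4=.d \ -5=.e)
generate right = 4 - item       // the coded observations become .
list
```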

Many researchers would code the values 1–5 rather than 0–4. To reverse a scale that starts at 1 and goes to 5, you need to subtract the current value from 1 more than the maximum value. Thus you would use 6−variable. If the variable’s current value is 5, then 6−5 = 1.

Let’s try this example. We have already run the mvdecode command on our variables, so let’s reverse code R3485300 and call it facritr. We will use the generate dialog box to create the variable. Select Data > Create or change data > Create new variable, type facritr for the Variable name, and type 4 - R3485300 in the Specify a value or an expression box. Figure 3.5 shows the completed dialog box.


Figure 3.5. The generate dialog box

Here is the output of the generate command:

. generate facritr = 4 - R3485300
(5611 missing values generated)

Whenever you see that missing values are generated (there are 5,611 of them in this example!), it is a good idea to make sure you know why they are missing. These variables have only a small set of values they can take, so we can compare the original variable with the new variable in a table and see what got turned into what. Select Statistics > Summaries, tables, and tests > Frequency tables > Two-way table with measures of association, which will bring up the dialog box shown in figure 3.6.


Figure 3.6. Two-way tabulation dialog box

Here we select facritr as the Row variable and R3485300 as the Column variable.

We also check two of the option boxes. Because we are interested in the actual values the variables take, we select the option to suppress the value labels (Suppress value labels).

We are also interested in the missing values, so we check the box to have them included in the table (Treat missing values like other values).

. tabulate facritr R3485300, miss nolabel

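           |              Father criticizes child’s ideas
   facritr |          0          1          2          3          4 |     Total
-----------+--------------------------------------------------------+----------
         0 |          0          0          0          0        117 |       117
         1 |          0          0          0        247          0 |       247
         2 |          0          0        811          0          0 |       811
         3 |          0      1,078          0          0          0 |     1,078
         4 |      1,120          0          0          0          0 |     1,120
         . |          0          0          0          0          0 |     5,611
-----------+--------------------------------------------------------+----------
     Total |      1,120      1,078        811        247        117 |     8,984

           |         Father criticizes child’s ideas
   facritr |         .a         .b         .d         .e |     Total
-----------+---------------------------------------------+----------
         0 |          0          0          0          0 |       117
         1 |          0          0          0          0 |       247
         2 |          0          0          0          0 |       811
         3 |          0          0          0          0 |     1,078
         4 |          0          0          0          0 |     1,120
         . |        775      4,816          4         16 |     5,611
-----------+---------------------------------------------+----------
     Total |        775      4,816          4         16 |     8,984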

From the table, we can see that everything happened as anticipated. Those adolescents who had a score of 0 on the original variable (the column with a 0 at the top) now have a score of 4 on the new variable. There are 1,120 of these observations.

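We need to check the other combinations as well: the 1,078 scores of 1 became 3, the 811 scores of 2 stayed 2, the 247 scores of 3 became 1, and the 117 scores of 4 became 0. The observations with missing-value codes are all missing on the new variable too, but we lost the distinctions between the different reasons an answer is missing: all 5,611 of them (.a, .b, .d, and .e) became the system missing value (.). Apart from that, the new variable, facritr, is a correct reverse coding of our old variable, R3485300.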
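Reading a table works well for a handful of values, but assert can do the same check in a do-file and will stop with an error if anything is wrong. The sketch below uses a small made-up stand-in for R3485300 (the names crit and critr are hypothetical). It also shows one possible way to keep the reasons for missingness: after generating the reversed variable, copy the extended missing codes back from the original with replace.

```stata
* Hypothetical stand-in for R3485300: answers 0-4 plus codes -1, -2, -4
clear
input crit
0
3
4
-1
-2
-4
end
mvdecode crit, mv(-1=.a \ -2=.b \ -3=.c \ -4=.d \ -5=.e)
generate critr = 4 - crit

* assert is silent when the claim holds for every observation
assert critr == 4 - crit if !missing(crit)
assert critr == . if missing(crit)     // every code collapsed to .

* Copy .a to .e back so the reasons for missingness survive
replace critr = crit if missing(crit)
tabulate critr crit, missing nolabel
```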
